Comparison of the Different Methods Used to Calculate Feature Importance

Alexandra Chirilov; James Pitcher; Andrzej Surma


Presented at the 10th Annual BayesiaLab Conference on Wednesday, October 26, 2022.

Abstract

The presentation compares different ways of calculating feature importance: Shapley value regression in R, a Bayesian network in the BayesiaLab software, a random forest in R, and the shap library in Python. The paper aims to show the similarities and differences between these approaches.
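To make the first of these methods concrete: Shapley value regression splits the R² of a linear model among the predictors by averaging each predictor's marginal gain in R² over all subsets of the other predictors. The sketch below is a minimal pure-Python illustration of that idea, not the R implementation used in the presentation. The function names (`shapley_r2`, `r_squared`) and the simulated data are hypothetical; R² for each subset is computed from the correlation matrix as r′R⁻¹r, which equals the OLS R² with an intercept.

```python
import itertools
import math
import random


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def r_squared(X, y, subset):
    """OLS R^2 (with intercept) of y on the predictor columns in `subset`."""
    if not subset:
        return 0.0
    r_xx = [[correlation(X[i], X[j]) for j in subset] for i in subset]
    r_xy = [correlation(X[i], y) for i in subset]
    beta = solve(r_xx, r_xy)  # standardized regression coefficients
    return sum(b * r for b, r in zip(beta, r_xy))


def shapley_r2(X, y):
    """Shapley decomposition of the full-model R^2 across predictors."""
    p = len(X)
    cache = {}

    def r2(subset):
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = r_squared(X, y, key)
        return cache[key]

    shares = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for subsets S without j
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in itertools.combinations(others, size):
                total += weight * (r2(subset + (j,)) - r2(subset))
        shares.append(total)
    return shares


# Hypothetical simulated data: x1 and x2 are correlated, x3 is independent.
rng = random.Random(42)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [a + 0.5 * b + 0.2 * c + rng.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]
X = [x1, x2, x3]

shares = shapley_r2(X, y)
print("full R^2:", round(r_squared(X, y, (0, 1, 2)), 3))
print("Shapley shares:", [round(s, 3) for s in shares])
```

Because the shares are Shapley values, they always add up to the full-model R², which is what makes the method attractive when predictors are correlated. The cost grows with 2^p subsets, so this exhaustive version only suits a modest number of predictors.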

The entire process is based on data simulated with copulas, under which different scenarios are tested to account for the limitations of survey data, such as skewness.
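The copula idea can be sketched as follows: correlated latent normal variables fix the dependence between items, and each margin is then shaped separately, here into a skewed 5-point Likert item. This is a minimal Gaussian-copula sketch; the presentation does not say which copula family or parameters were used, so the correlation, the category probabilities, and the helper names `simulate_pair` and `to_likert` are all hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def to_likert(u, probs):
    """Map a uniform draw u in (0, 1) to a category 1..k via cumulative probabilities."""
    cumulative = 0.0
    for category, p in enumerate(probs, start=1):
        cumulative += p
        if u <= cumulative:
            return category
    return len(probs)  # guards against floating-point shortfall in the cumulative sum


def simulate_pair(n, rho, probs, seed=0):
    """Draw n pairs of ordinal items linked by a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # corr(z1, z2) = rho
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)  # latent normals -> uniforms
        rows.append((to_likert(u1, probs), to_likert(u2, probs)))
    return rows


# Hypothetical skewed 5-point margin: most respondents pick the top boxes.
SKEWED = [0.05, 0.05, 0.10, 0.30, 0.50]
data = simulate_pair(2000, rho=0.6, probs=SKEWED, seed=1)
print(data[:5])
```

Passing a two-entry `probs` such as `[0.7, 0.3]` produces a binary item instead, and changing `rho` changes the strength of the relationship, which is the kind of scenario variation the abstract describes. Note that the correlation between the discretized items is attenuated relative to `rho`.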

In the paper, the author tests different strengths of the relationships between the independent variables, different numbers of predictors, and different measurement scales (binary, Likert scale).

The author used a model-agnostic approach, permutation feature importance, as a comparison benchmark.
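Permutation feature importance treats the fitted model as a black box: shuffle one predictor's column, re-score the model, and record how much the error grows. The pure-Python sketch below illustrates the technique under stated assumptions; it is not the benchmark code from the presentation. The model `black_box`, the mean-squared-error metric, the repeat count, and the simulated data are hypothetical choices. (scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.)

```python
import random


def mse(model, rows, y):
    """Mean squared error of model predictions on rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(model, rows, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, y)
    p = len(rows[0])
    importances = []
    for j in range(p):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: y depends strongly on x0, weakly on x1, not at all on x2.
rng = random.Random(7)
rows = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(1000)]
y = [2.0 * a + 0.5 * b + rng.gauss(0, 0.5) for a, b, _ in rows]


def black_box(row):
    # Stand-in for any fitted model: a callable mapping one row to a prediction.
    return 2.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(black_box, rows, y)
print([round(s, 3) for s in scores])
```

Because the procedure only needs predictions, the same function can score a regression, a random forest, or a Bayesian network, which is why it serves as a neutral benchmark. In practice the importances should be computed on held-out data so that overfitting does not inflate them.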

Presentation Video

Presentation Slides

2022-10-26-BayesiaLab-Conference-Andrzej-Surma.pdf_xonpue.pdf


Please also see the second presentation by GfK at the BayesiaLab Conference:

Driver Analysis in Brand Trackers — Bayesian Network vs Shapley Value Regression

About the Presenters

Alexandra Chirilov, Head of Global Marketing Science, GfK SE

Alexandra Chirilov leads GfK’s Global Product Development Practice for consumer and brand intelligence. Her insights and research have been featured in outlets such as ESOMAR, the Journal of Marketing Research, and Sawtooth. She has won the ESOMAR Corporate Young Professional Award, among other industry awards.

James Pitcher, GfK Marketing Sciences Lead UK

James Pitcher leads GfK’s Marketing Sciences team in the UK, which designs and delivers sophisticated analytical solutions to clients’ high-value problems. He has spent the last 15 years providing statistical advice and consultancy within the market research industry, working with clients across many sectors and regions. James is an expert in conjoint analysis, brand research, pricing, consumer segmentation, and a wide range of multivariate techniques, including Bayesian network analysis. He contributes to the development of innovative techniques and regularly presents at international conferences.

Andrzej Surma, Senior Marketing Scientist, GfK

Andrzej Surma works as a methodological lead for the Global Product Development Team. He has more than ten years of experience in data analysis. With a background in mathematics, Andrzej loves solving problems, such as recognizing the Greek letters in formulas that describe mathematical models! Recently, he co-created a Bayesian network approach to running key driver analysis on brand tracking data. Spatial data analysis is another of his particular interests. In his free time, Andrzej likes to stay active, playing football and riding bikes, and he is inspired daily by his wife and their three children.
