Recorded on March 23, 2018.
Big Data is not the Answer
One might be tempted to think that all the data nowadays being generated from the retail sector would easily capture all the sales dynamics related to cannibalization. Hence, one might speculate that it would be straightforward to quantify the amount of sales a new product Y is taking away from an existing product X. This calculation, however, is not possible, regardless of how much data we collect.
It's a Causal Question
Why not? The key issue is that cannibalization is a fundamentally causal question. Looking forward, we wish to know how the introduction of a new product will cause a change in sales of the existing product. Looking backward, we are interested in the counterfactual, i.e., had it not been for the introduction of the new product, how much would we have sold of the existing product. To answer these questions, we require a causal model, which we cannot generate from data alone, unfortunately.
Machine Learning & Domain Knowledge
How can Bayesian networks help? We can machine-learn Bayesian networks from data and approximate the joint distribution of the recorded sales data. Then, we can formally introduce into the network our human domain knowledge and provide causal assumptions. With that, we have a formal causal model, which allows us to compute causal and counterfactual inference. A particularly attractive property of Bayesian networks is that we can perform omnidirectional inference, i.e., we can compute the effect of product X on Y, and vice versa, using the same model. Furthermore, we can extend this to any number of products whose cannibalization effects we wish to investigate.
No Data, No Inference?
Perhaps this seems plausible for estimating the cannibalization that occurred over a time frame for which we have collected a lot of data. But what about looking forward, before product X hits the market? Once again, human domain knowledge is needed. So, are Bayesian networks even helpful in this context? Indeed, they are. Until now, the challenge has been to encode the knowledge of experts systematically into a unified model. It is perhaps easy to state that, for instance, product X will take away 10% of sales of Y. But what about the other way around? What if we also consider product Z? It can easily happen that a sequence of numerical estimates by experts turns out to be mathematically impossible. Encoding assumptions into a Bayesian network, however, enforces their consistency and prevents contradictions.
The Wisdom of Crowds
Another opportunity arises from the possibility to elicit knowledge from a group of experts, thus overcoming any potential individual biases. The Bayesia Expert Knowledge Elicitation Environment (BEKEE) allows us to systematically query multiple experts and "merge" their knowledge into a Bayesian network. Thus, we can derive a mathematically correct summary of the prevailing opinions of all the involved experts. As a result, we obtain a quantitative model that gives us consistent estimates of cannibalization on the basis of what is knowable.