Customer characterization, development of profiles
Customer databases can be used to build profiles with data mining methods. These profiles give the marketing department objective information. They can also reduce the cost of campaigns by selecting only the prospects that have a high probability of responding positively, and they can be exploited for fraud detection.
Marketing
We describe here the use of BayesiaLab to profile customers with respect to a bank product that comes in several modalities. The database contains variables describing the customer (age, socio-professional group, ...), the customer's bank account (facilities, consumption, ...), and the modality of the bank product held.
Because this last variable can serve as a target, supervised learning methods can be used. Instead of learning a Bayesian network that represents all the probabilistic relations holding in the database, one can use the supervised learning algorithms of BayesiaLab. For example, the Markov Blanket Learning algorithm focuses the search on the variables that really characterize the target variable. The screenshot below shows the Bayesian network that was learned automatically.
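To make the notion of a Markov blanket concrete, here is a minimal sketch that reads the blanket of a target node (its parents, its children, and the other parents of its children) off a network whose structure is already known. It does not reproduce BayesiaLab's learning algorithm, which finds this set from data. The function name `markov_blanket` and the toy structure are assumptions made purely for illustration.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a directed acyclic graph.

    `parents` maps each node name to the set of its parent nodes.
    """
    # Children: nodes that list the target among their parents.
    children = {node for node, pa in parents.items() if target in pa}
    # Spouses: the other parents of the target's children.
    spouses = set()
    for child in children:
        spouses |= parents[child]
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)
    return blanket


# Hypothetical structure, for illustration only (not the network from the study).
toy_parents = {
    "Age": set(),
    "Group": set(),
    "Region": set(),
    "Consumption": {"Region"},
    "Product": {"Age", "Group"},
    "Facilities": {"Product", "Consumption"},
}

blanket = markov_blanket(toy_parents, "Product")
print(sorted(blanket))
```

In this toy structure, "Region" influences the target only through "Consumption", so it falls outside the blanket and can be left out of the analysis of "Product".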
This Bayesian network, which represents the Markov blanket of the bank product, makes the analysis easier by reducing the number of variables to take into account. The complete analysis toolbox of BayesiaLab is also very helpful: arc length reflecting the strength of the probabilistic relations, automatic positioning of the nodes that takes this strength into account, and analysis of the probabilistic relations between the variables and the target variable or one of its modalities.
Evaluating the learned Bayesian network on a data set that was not used for learning showed a 20% gain in precision over the profile in use at the time. The screenshot below shows the evaluation tools of BayesiaLab: the total precision of the Bayesian network; the confusion matrix, which measures the quality of the model more precisely (displayed as occurrences, reliability, or precision); and the Lift and ROC curves, which are useful for choosing the probability thresholds used in the decision rules.
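The evaluation measures mentioned above can be sketched in a few lines. The example below counts the cells of a confusion matrix, computes the overall fraction of correct predictions (taken here, as an assumption, to be what "total precision" means), and computes the lift in the top fraction of cases ranked by predicted probability. All data are invented for illustration, and the function names are hypothetical rather than part of BayesiaLab.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count occurrences of each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))


def total_precision(actual, predicted):
    """Fraction of cases whose predicted modality equals the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def lift_at(scores, is_target, fraction):
    """Lift within the top `fraction` of cases ranked by descending score."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(flag for _, flag in ranked[:top_n]) / top_n
    base_rate = sum(is_target) / len(is_target)
    return top_rate / base_rate


# Invented held-out cases: actual and predicted modality of the product.
actual = ["A", "A", "B", "B", "B", "A", "B", "B"]
predicted = ["A", "B", "B", "B", "A", "A", "B", "B"]
# Invented predicted probabilities of modality "A" for the same cases.
scores_a = [0.9, 0.4, 0.2, 0.1, 0.6, 0.8, 0.3, 0.2]
is_a = [1 if a == "A" else 0 for a in actual]

matrix = confusion_counts(actual, predicted)
accuracy = total_precision(actual, predicted)
lift_top_quarter = lift_at(scores_a, is_a, 0.25)
print(dict(matrix), accuracy, lift_top_quarter)
```

A lift above 1 in the top fraction means that ranking cases by predicted probability concentrates the target modality better than a random selection would, which is what makes the curve useful for choosing a decision threshold.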
BayesiaLab can then exploit the Bayesian network to build adaptive questionnaires, which order the monitor variables (i.e. the questions) according to both the information each one brings about the target variable and the cost of obtaining its answer.
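One simple way to combine information and cost is to rank each question by its mutual information with the target divided by its cost. This ratio is an assumption chosen for illustration; the exact criterion used by BayesiaLab is not described here. The sketch only orders the first question. A truly adaptive questionnaire would recompute the ranking after each answer, conditioning on the evidence already collected. The question names, answers, and costs are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information, in bits, between two discrete variables,
    estimated from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) written with raw counts.
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def rank_questions(questions, target):
    """Order question names by information about the target per unit cost."""
    scores = {}
    for name, (answers, cost) in questions.items():
        scores[name] = mutual_information(list(zip(answers, target))) / cost
    return sorted(scores, key=scores.get, reverse=True), scores


# Invented sample: target modality and answers to three hypothetical questions.
target = ["A", "A", "B", "B"]
questions = {
    "age_band": (["y", "y", "n", "n"], 2.0),       # fully informative, costly
    "has_overdraft": (["y", "y", "y", "n"], 0.5),  # partly informative, cheap
    "region": (["y", "n", "y", "n"], 1.0),         # uninformative
}

order, scores = rank_questions(questions, target)
print(order)
```

With these invented figures, the cheap but only partly informative question is ranked ahead of the fully informative but costly one, which shows how cost can change the order of the questionnaire.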
The Bayesian network can also be used to predict the bank product for new customers. Because BayesiaLab returns the probability associated with each prediction, this probability can be used to select prospects and so reduce the cost of campaigns.
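As an illustration of probability-based selection, the sketch below contacts a prospect only when the expected gain from a positive reply exceeds the cost of the contact. The break-even rule, the cost and gain figures, and the customer identifiers are all assumptions for the example; in practice the probabilities would come from the learned network.

```python
def select_prospects(probabilities, contact_cost, gain_per_reply):
    """Keep prospects whose expected gain exceeds the contact cost.

    `probabilities` maps a customer id to the predicted probability
    of a positive reply.
    """
    threshold = contact_cost / gain_per_reply  # break-even probability
    return [cid for cid, p in probabilities.items() if p > threshold]


def expected_profit(probabilities, selected, contact_cost, gain_per_reply):
    """Expected profit of contacting only the selected prospects."""
    return sum(probabilities[cid] * gain_per_reply - contact_cost for cid in selected)


# Invented predicted probabilities for four hypothetical prospects.
predicted = {"c1": 0.05, "c2": 0.30, "c3": 0.12, "c4": 0.60}

selected = select_prospects(predicted, contact_cost=2.0, gain_per_reply=20.0)
profit_selected = expected_profit(predicted, selected, 2.0, 20.0)
profit_everyone = expected_profit(predicted, list(predicted), 2.0, 20.0)
print(selected, profit_selected, profit_everyone)
```

Under these assumed figures, dropping the one prospect below the break-even probability lowers the number of contacts while raising the expected profit of the campaign.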
Fraud detection
When the same methodology was applied to data from a telecommunications operator, the supervised Markov Blanket Learning algorithm improved precision by 23% compared with the filters in use at the time. This Bayesian network was learned 115 times faster than the Bayesian network representing all the probabilistic dependencies, and it also outperformed decision trees (+12%) and neural networks (+7%).
The supervised Markov Blanket Learning algorithm is also a very powerful tool for selecting the relevant variables (cf. also the Microarrays analysis). In this study, for example, it focused the analysis on just 21 variables that were really relevant for characterizing fraud, out of the 224 variables available.




