BayesiaLab: Application Examples of Bayesian networks
Customer Profiling
Lionel Jouffe
Bayesia
Customer databases can be used to build profiles with Data Mining methods.
These profiles can bring objective information to the Marketing department.
They can also be used to reduce the cost of campaigns by selecting only the
prospects that have a high probability of replying positively, or they can be
exploited for fraud detection.
MARKETING
We describe here the use of BayesiaLab
for profiling customers with respect to a bank product that has various
modalities. The database used contains variables describing the customer (age,
socioprofessional group, ...), the customer's bank account (facilities,
consumption, ...) and the modality of the customer's bank product.
This last variable serves as the target, which makes it possible to use
supervised learning methods. Instead of learning a Bayesian network that
represents all the probabilistic relations that hold in the database, one can
use the supervised learning algorithms of BayesiaLab. For example, the Markov
Blanket Learning algorithm focuses the search on the variables that really
characterize the target variable. The screen shot below represents the
Bayesian network that has been automatically learned.
This Bayesian network, which represents the Markov Blanket
of the bank product, makes the analysis easier by reducing the number of variables
to take into account. The complete analysis
toolbox of BayesiaLab is also very helpful (arc length relative to the
strength of the probabilistic relations, automatic positioning of the nodes
that takes this strength into account, analysis of the probabilistic relations
between the variables and the target variable or one of its modalities).
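To make the notion of a Markov Blanket concrete: in a Bayesian network, the Markov Blanket of a target node consists of its parents, its children, and the other parents of its children. The sketch below only extracts that set from a network structure that is already known; it is not the Markov Blanket Learning algorithm of BayesiaLab, and the network and variable names are hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov Blanket of `target` in a directed acyclic graph.

    `parents` maps each node to the set of its direct parents.
    """
    blanket = set(parents.get(target, ()))      # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:              # `node` is a child of the target
            blanket.add(node)
            blanket.update(node_parents)        # other parents of that child
    blanket.discard(target)                     # the target is not in its own blanket
    return blanket


# Hypothetical toy structure, for illustration only.
toy_parents = {
    "Age": set(),
    "Group": set(),
    "Product": {"Age"},
    "Facilities": {"Product", "Group"},
    "Consumption": {"Facilities"},
}

print(sorted(markov_blanket(toy_parents, "Product")))
```

In this toy structure, "Consumption" is left out: once the variables of the blanket are known, it brings no further information about "Product".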
Evaluating the learned Bayesian network on
a data set that was not used for learning showed a
gain of 20% in precision with respect to the profile used
at the time. The screen shot below represents the evaluation tools of BayesiaLab:
the total precision of the Bayesian network; the confusion matrix, which measures
the quality of the model more precisely (displayed with occurrences, reliability
or precision); and the Lift and ROC curves, which are useful tools for choosing
the probability thresholds that will be used in the decision rules.
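The evaluation measures mentioned above can be sketched in a few lines. The definitions used here are assumptions for illustration: "total precision" is taken as the share of correct predictions, "reliability" as the share of correct cases among those predicted with a given modality, and per-modality "precision" as the share of correctly predicted cases among those that actually have that modality. The labels and data are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count occurrences of each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))


def total_precision(matrix):
    """Share of cases whose predicted modality equals the actual one."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


def reliability(matrix, label):
    """Assumed definition: among cases predicted as `label`, share that are `label`."""
    predicted_as = sum(n for (_, p), n in matrix.items() if p == label)
    return matrix[(label, label)] / predicted_as if predicted_as else 0.0


def class_precision(matrix, label):
    """Assumed definition: among cases that are `label`, share predicted as `label`."""
    actually = sum(n for (a, _), n in matrix.items() if a == label)
    return matrix[(label, label)] / actually if actually else 0.0


# Hypothetical held-out test set with three product modalities.
actual = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "B", "A"]

matrix = confusion_matrix(actual, predicted)
print("total precision:", round(total_precision(matrix), 3))
for label in sorted(set(actual)):
    print(label, round(reliability(matrix, label), 3),
          round(class_precision(matrix, label), 3))
```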

BayesiaLab can then exploit the Bayesian network to build
adaptive questionnaires that sort the monitor variables (i.e. the questions)
based both on the information each one brings about the target variable
and on the cost of obtaining the answer to that question.
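One way to picture such a trade-off is to score each question by the information it carries about the target, divided by its cost. The sketch below uses mutual information estimated from records as the information measure; this choice, the scoring ratio, and all variable names and costs are assumptions for illustration, not the criterion actually used by BayesiaLab.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between two discrete variables, from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_questions(records, target, costs):
    """Order question variables by information about `target` per unit of cost."""
    scores = {
        question: mutual_information([(r[question], r[target]) for r in records]) / cost
        for question, cost in costs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical records and question costs.
records = [
    {"age": "young", "group": "g1", "overdraft": "yes", "product": "A"},
    {"age": "young", "group": "g1", "overdraft": "no", "product": "A"},
    {"age": "old", "group": "g2", "overdraft": "yes", "product": "B"},
    {"age": "old", "group": "g2", "overdraft": "no", "product": "B"},
]
costs = {"overdraft": 1.0, "group": 2.0, "age": 1.0}

print(rank_questions(records, "product", costs))
```

Here "age" and "group" are equally informative, but "group" is assumed to cost twice as much to ask, so it is ranked second. A questionnaire that is adaptive in the sense described above would recompute the ranking after each answer; the sketch shows a single ranking step.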
The Bayesian network can also be used to predict the bank
product for new customers. As BayesiaLab returns the probability associated
with that prediction, this probability can be used to reduce the cost
of the campaigns by selecting only the prospects most likely to reply positively.
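As a minimal sketch of this selection step, assume each prospect comes with a predicted probability of replying positively and that every contact has the same cost. The threshold, the cost per contact, and the prospect data below are hypothetical.

```python
def campaign_summary(scored, threshold, cost_per_contact):
    """Summarize a campaign that contacts only prospects scoring at least `threshold`.

    `scored` is a list of (prospect_id, probability_of_positive_reply) pairs.
    """
    selected = [(name, p) for name, p in scored if p >= threshold]
    return {
        "selected": [name for name, _ in selected],
        "cost": len(selected) * cost_per_contact,
        # Expected number of positive replies under the predicted probabilities.
        "expected_replies": sum(p for _, p in selected),
    }


# Hypothetical predictions for four prospects.
scored = [("c1", 0.82), ("c2", 0.10), ("c3", 0.55), ("c4", 0.05)]

everyone = campaign_summary(scored, threshold=0.0, cost_per_contact=2.0)
targeted = campaign_summary(scored, threshold=0.5, cost_per_contact=2.0)

print(everyone["cost"], round(everyone["expected_replies"], 2))
print(targeted["selected"], targeted["cost"], round(targeted["expected_replies"], 2))
```

With these made-up numbers, contacting only the two prospects above the threshold halves the cost while keeping most of the expected positive replies. In practice the threshold would be chosen from the Lift or ROC curves of the model.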
FRAUD DETECTION
Using the same methodology as the one described above
on data coming from a telecommunication operator, the supervised Markov
Blanket Learning algorithm improved
precision by 23% compared to the filters that were used
at the time. This Bayesian network was learned 115 times faster
than the Bayesian network representing all
the probabilistic dependencies, and its performance was also better than that
obtained with Decision trees (+12%) and with Neural Networks (+7%).
The supervised Markov Blanket Learning algorithm is
also a very powerful tool for selecting the relevant variables (cf. also
the Microarrays analysis). In that study, for
example, it allowed focusing the analysis on just the 21 variables that
were really relevant for characterizing the fraud, among the 224
variables that were available.