BayesiaLab: Application Examples of Bayesian networks
Transcriptome Analysis
Lionel Jouffe
Bayesia
DNA microarray technology provides information on the functional
role of genes by measuring the differential expression of a large number of
genes simultaneously. The challenge is then to analyse all this information,
for example to identify the group of genes that play a significant role in a
particular physiological situation, or to identify potential interactions
between gene products.

We describe here the use of BayesiaLab
to analyse data from a DNA microarray dedicated to the study of colon
cancer. This microarray measures the expression levels of 2000 genes in
62 different samples (22 corresponding to tumor biopsies). This data set is
available on the University
of Edinburgh's web site.
Because there are many genes and few samples, the rows of the data set
correspond to the genes and the columns correspond to the samples.
To analyse the genes as variables, the data must therefore be transposed, for
example with BayesiaLab's data import wizard.
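
The transposition step can be illustrated outside BayesiaLab with a minimal
Python sketch. The gene names, sample names and values below are hypothetical
toy data, not taken from the colon cancer data set, and the `transpose` helper
is only an illustration of the operation, not the import wizard itself.

```python
# Hypothetical toy matrix: one row per gene, one column per sample.
gene_names = ["gene_a", "gene_b", "gene_c"]
sample_names = ["s1", "s2"]
by_gene = [
    [1.2, 0.4],  # gene_a measured in s1, s2
    [3.1, 2.9],  # gene_b
    [0.7, 5.5],  # gene_c
]


def transpose(matrix):
    """Return the matrix with rows and columns swapped."""
    return [list(column) for column in zip(*matrix)]


by_sample = transpose(by_gene)

# Each row now describes one sample and each column one gene (one variable).
for name, row in zip(sample_names, by_sample):
    print(name, dict(zip(gene_names, row)))
```

After transposition each sample becomes one observation and each gene one
variable, which is the layout needed to learn relations between genes.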

We add a Boolean variable to this data set to indicate whether the
sample corresponds to a tumor biopsy. This new variable is useful
for two reasons:
- Gene expressions are represented as continuous variables. To take these
variables into account, we need to discretize them. We could use classical
algorithms based on equal frequencies or equal distances, but BayesiaLab
has a much more powerful discretization algorithm that finds the relevant
thresholds by decision tree induction. This algorithm, however, needs a
discrete target variable. The tumor tag can play that role.
- The tumor tag also allows us to use the powerful supervised learning
algorithms of BayesiaLab to select the genes that are important for
characterizing colon cancer. The screenshot below shows
the Bayesian network obtained with our Augmented Markov Blanket algorithm.
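
To give an idea of what a target-driven discretization does, here is a minimal
Python sketch that searches for the single cut point maximizing the information
gain with respect to a Boolean target, which is the split criterion of a
one-level decision tree. It is a simplified stand-in and not BayesiaLab's
algorithm; the function names and the expression values are hypothetical.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of Boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best single binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = ((low + high) / 2, gain)
    return best


# Hypothetical expression levels of one gene and the matching tumor tags.
expression = [0.2, 0.5, 0.9, 2.1, 2.4, 3.0]
is_tumor = [False, False, False, True, True, True]

threshold, gain = best_threshold(expression, is_tumor)
print(threshold, gain)
```

A gene whose best split yields a gain close to zero carries little information
about the tumor tag, so the same score also hints at which genes a supervised
selection step is likely to keep.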

Eleven genes are selected. By keeping only these genes, it
is possible to use the unsupervised learning algorithms of BayesiaLab to discover
the interactions between them. What-if scenarios can then be carried
out to see the impact of the activation levels of some genes on the levels of
the other genes. It is also possible to use BayesiaLab's analysis
toolbox.
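
The idea behind a what-if scenario can be sketched in Python as a conditional
distribution: fix the level of one gene as evidence and look at the resulting
distribution of another. The sketch below estimates this directly from
relative frequencies on hypothetical discretized data. A Bayesian network
performs the same kind of query through inference on the learned model, so
this is only an illustration of the question being asked, not of BayesiaLab's
inference engine.

```python
from collections import Counter

# Hypothetical discretized levels of two genes across eight samples.
samples = [
    {"gene_a": "high", "gene_b": "high"},
    {"gene_a": "high", "gene_b": "high"},
    {"gene_a": "high", "gene_b": "high"},
    {"gene_a": "high", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "high"},
]


def conditional_distribution(rows, target, evidence):
    """Estimate P(target | evidence) by relative frequencies."""
    matching = [
        row for row in rows
        if all(row[name] == level for name, level in evidence.items())
    ]
    counts = Counter(row[target] for row in matching)
    total = sum(counts.values())
    if total == 0:
        return {}  # the evidence never occurs in the data
    return {level: count / total for level, count in counts.items()}


# What if gene_a is observed at a high level?
print(conditional_distribution(samples, "gene_b", {"gene_a": "high"}))
# And with no evidence at all (the marginal distribution of gene_b)?
print(conditional_distribution(samples, "gene_b", {}))
```

Comparing the distribution with and without evidence shows how strongly the
level of one gene shifts the expected level of the other.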

It is also possible to analyse the relations that hold between
samples (i.e. without transposing the data). The Bayesian network below is
the one obtained with the SopLEQ algorithm. Nodes with a red label
represent tumor biopsies. As can be seen in this network, which was laid out
automatically by our genetic positioning
algorithm, the red nodes are mostly grouped in the same area.
