Biocomputing transcriptome analysis
DNA microarray technology provides information on the functional role of genes by measuring the differential expression of a large number of genes simultaneously. The challenge is then to analyse all this information, for example to identify the group of genes that play a significant role in a particular physiological situation, or to identify potential interactions between gene products.
We describe here the use of BayesiaLab to analyse data from a DNA microarray dedicated to the study of colon cancer. This microarray measures the expression levels of 2000 genes in 62 different samples (22 corresponding to tumor biopsies). This data set is available on the University of Edinburgh's web site.
Because there are many genes and few samples, the rows of the data file correspond to the genes and the columns correspond to the samples. To analyse the genes as variables, the data must first be transposed, for example with BayesiaLab's data import wizard.
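The transposition step can also be illustrated outside BayesiaLab. The following is a minimal sketch in plain Python, assuming a comma-separated file with one row per gene and one column per sample; the function name `transpose_expression_table` and the tiny table are made up for illustration and are not part of BayesiaLab or of the colon cancer data set.

```python
import csv
import io


def transpose_expression_table(text):
    """Turn a genes-as-rows table into a samples-as-rows table."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) groups the i-th cell of every row, i.e. it reads the
    # table column by column, which is exactly a transposition.
    return [list(column) for column in zip(*rows)]


# Hypothetical input: two genes measured in three samples.
raw = "gene,s1,s2,s3\nG1,0.1,0.4,0.9\nG2,1.2,1.1,0.3\n"
transposed = transpose_expression_table(raw)

for row in transposed:
    print(row)
```

After transposition each row describes one sample and each column one gene, which is the layout needed to treat genes as variables.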
We add a Boolean variable to this data set to indicate whether the sample corresponds to a tumor biopsy. This new variable is useful for two reasons:
- Gene expressions are represented as continuous variables. To take these variables into account, we need to discretize them. We could use classical algorithms based on equal frequencies or equal distances, but BayesiaLab has a much more powerful discretization algorithm that finds the relevant thresholds by Decision Tree induction. However, this algorithm needs a discrete target variable. The tumor tag can play that role.
- The tumor tag also allows us to use the supervised learning algorithms of BayesiaLab to select the genes that are important for the characterization of colon cancer. The screen shot below corresponds to the Bayesian network obtained with our Augmented Markov Blanket algorithm.
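To make the idea of target-driven discretization concrete, here is a minimal sketch of a single entropy-minimizing split, which is what a depth-one decision tree does. It is an illustration under simplifying assumptions (one threshold per gene, a binary tumor tag), not BayesiaLab's actual algorithm; the names `entropy` and `best_threshold` and the sample values are hypothetical.

```python
import math


def entropy(labels):
    """Binary class entropy in bits for a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_threshold(values, labels):
    """Return the cut point that minimizes the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate two identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut


# Made-up expression levels for one gene and the matching tumor tags.
expression = [0.2, 0.3, 0.5, 1.4, 1.6, 1.9]
is_tumor = [0, 0, 0, 1, 1, 1]

cut = best_threshold(expression, is_tumor)
levels = ["high" if v > cut else "low" for v in expression]
print(cut, levels)
```

Unlike equal-frequency or equal-distance binning, the threshold here is placed where it best separates tumor from non-tumor samples, which is why a discrete target variable is required.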
Eleven genes have been selected. By keeping only these genes, it is possible to use the unsupervised learning algorithms of BayesiaLab to discover the interactions between them. What-if scenarios can then be carried out to see the impact of the activation levels of some genes on the levels of the other genes. BayesiaLab's analysis toolbox can also be applied to the resulting network.
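A what-if question of this kind amounts to asking for a conditional distribution. The sketch below estimates one by simple counting over discretized samples; it is a simplified stand-in for inference in a learned Bayesian network, and the function `conditional_distribution`, the gene names, and the records are all hypothetical.

```python
from collections import Counter


def conditional_distribution(records, target, evidence):
    """Estimate P(target | evidence) by counting matching records."""
    matching = [
        r for r in records
        if all(r[name] == level for name, level in evidence.items())
    ]
    counts = Counter(r[target] for r in matching)
    total = sum(counts.values())
    if total == 0:
        return {}  # the evidence never occurs in the data
    return {level: count / total for level, count in counts.items()}


# Made-up discretized expression levels for two genes.
samples = [
    {"gene_a": "high", "gene_b": "high"},
    {"gene_a": "high", "gene_b": "high"},
    {"gene_a": "high", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "low"},
    {"gene_a": "low", "gene_b": "low"},
]

# What if gene_a is observed at a high level?
what_if = conditional_distribution(samples, "gene_b", {"gene_a": "high"})
print(what_if)
```

Direct counting only works when the evidence occurs often enough in the data; a Bayesian network factorizes the joint distribution so that such queries remain usable with few samples.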
It is also possible to analyse the relations that hold between samples (i.e. without transposing the data). The Bayesian network below is the one obtained by the SopLEQ algorithm. Nodes with a red label represent tumor biopsies. As can be seen on this Bayesian network, which was laid out automatically by our genetic positioning algorithm, these red nodes are mostly located in the same zone.


