BayesiaLab features a comprehensive array of highly optimized learning algorithms that can quickly uncover structures in datasets. The optimization criteria in BayesiaLab’s learning algorithms are based on information theory (e.g., the Minimum Description Length). As a result, the algorithms make no assumptions about the distributions of the variables. They can be applied to problem domains of all kinds and sizes, sometimes including thousands of variables with millions of potentially relevant relationships.
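To make the idea of a Minimum Description Length criterion concrete, the following is a minimal sketch of a two-part MDL score for a discrete network: the bits needed to encode the parameters plus the bits needed to encode the data given the network. The function name `mdl_score`, the data layout, and the specific parameter penalty (half of log2 N bits per free parameter) are assumptions chosen for illustration; BayesiaLab's exact formulation may differ.

```python
import math
from collections import Counter

def mdl_score(data, structure, arities):
    """Two-part MDL score in bits (lower is better).

    data:      list of dicts mapping variable name -> discrete value
    structure: dict mapping each node -> tuple of its parent nodes
    arities:   dict mapping each node -> number of states
    All three layouts are illustrative assumptions.
    """
    n = len(data)
    model_bits = 0.0
    data_bits = 0.0
    for node, parents in structure.items():
        # Free parameters: (states - 1) per configuration of the parents.
        parent_configs = math.prod(arities[p] for p in parents)
        model_bits += 0.5 * math.log2(n) * (arities[node] - 1) * parent_configs

        # Negative log-likelihood of the node given its parents,
        # using maximum-likelihood (frequency) estimates.
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        marginal = Counter(tuple(row[p] for p in parents) for row in data)
        for (config, _), count in joint.items():
            data_bits -= count * math.log2(count / marginal[config])
    return model_bits + data_bits

# Two perfectly dependent binary variables: the arc X -> Y pays for itself.
rows = [{"X": 0, "Y": 0}] * 50 + [{"X": 1, "Y": 1}] * 50
arities = {"X": 2, "Y": 2}
without_arc = mdl_score(rows, {"X": (), "Y": ()}, arities)
with_arc = mdl_score(rows, {"X": (), "Y": ("X",)}, arities)
print(with_arc < without_arc)
```

The score rewards an arc only when the reduction in data bits exceeds the cost of the extra parameters, which is how a criterion of this kind balances fit against complexity without any distributional assumptions.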
Unsupervised Structural Learning
In statistics, “unsupervised learning” is typically understood to be a classification or clustering task. To make a very clear distinction, we place emphasis on “structural” in “Unsupervised Structural Learning,” which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between a large number of variables without having to specify input or output nodes. One might say that this is a quintessential form of knowledge discovery, as no assumptions are required to run these algorithms on unknown datasets.
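As a simple illustration of discovering relationships without designating any input or output node, the sketch below computes the empirical mutual information between every pair of discrete variables and keeps the strongest links that form a tree (a maximum spanning tree built with Kruskal's method). This is a classic textbook approach used here purely as an example; it is not presented as the algorithm BayesiaLab runs, and the function names and data layout are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def maximum_spanning_tree(columns):
    """columns: dict name -> list of values. Returns (a, b, weight) edges."""
    weighted = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(columns, 2)),
        reverse=True,
    )
    root = {name: name for name in columns}  # union-find parent map

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path compression
            v = root[v]
        return v

    edges = []
    for weight, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            root[ra] = rb
            edges.append((a, b, weight))
    return edges

columns = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to A
    "C": [0, 0, 0, 1, 1, 1, 1, 0],  # only loosely related to A and B
}
for a, b, weight in maximum_spanning_tree(columns):
    print(a, b, round(weight, 3))
```

No variable is treated as a target here: every pair is scored symmetrically, and the structure emerges from the data alone.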
Supervised Learning
Supervised Learning in BayesiaLab has the same objective as many traditional modeling methods, i.e., to develop a model for predicting a target variable. Note that numerous statistical packages also offer “Bayesian Networks” as a predictive modeling technique. However, in most cases, these packages are restricted to a single type of network, i.e., the Naive Bayes network. BayesiaLab offers a much greater number of Supervised Learning algorithms to search for the Bayesian network that best predicts the target variable while also taking into account the complexity of the resulting network.
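For reference, the Naive Bayes network mentioned above has a fixed structure: the target is the sole parent of every predictor, so the prediction is the class that maximizes P(target) multiplied by the product of P(predictor | target). The following is a minimal sketch for discrete data with add-one (Laplace) smoothing; the function names, the data layout, and the smoothing choice are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, target):
    """rows: list of dicts of discrete values; target: name of the class column."""
    class_counts = Counter(row[target] for row in rows)
    feature_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values = defaultdict(set)              # feature -> set of observed values
    for row in rows:
        for feature, value in row.items():
            if feature != target:
                feature_counts[(feature, row[target])][value] += 1
                values[feature].add(value)
    return class_counts, feature_counts, values

def predict(model, observation):
    """Return the most probable class for a dict of predictor values."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, count in class_counts.items():
        log_p = math.log(count / total)  # log prior of the class
        for feature, value in observation.items():
            seen = feature_counts[(feature, cls)]
            # Add-one smoothing keeps unseen values from zeroing the product.
            log_p += math.log((seen[value] + 1) / (count + len(values[feature])))
        scores[cls] = log_p
    return max(scores, key=scores.get)

rows = [
    {"smoker": "yes", "cough": "yes", "disease": "yes"},
    {"smoker": "yes", "cough": "yes", "disease": "yes"},
    {"smoker": "yes", "cough": "no", "disease": "yes"},
    {"smoker": "no", "cough": "no", "disease": "no"},
    {"smoker": "no", "cough": "no", "disease": "no"},
    {"smoker": "no", "cough": "yes", "disease": "no"},
]
model = fit_naive_bayes(rows, target="disease")
print(predict(model, {"smoker": "yes", "cough": "yes"}))
```

Because the structure is fixed in advance, nothing is learned about how the predictors relate to one another, which is the limitation that a search over richer network structures is meant to address.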
We should highlight the Markov Blanket algorithm for its speed, which is particularly helpful when dealing with a large number of variables. In this context, the Markov Blanket algorithm can serve as an efficient variable selection algorithm. An example of Supervised Learning using this algorithm, and the closely related Augmented Markov Blanket algorithm, is presented in Chapter 6.
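The reason a Markov Blanket works as a variable selector follows from its definition: in a Bayesian network, the Markov Blanket of a node consists of its parents, its children, and the other parents of its children, and given these nodes the target is independent of all remaining variables. The sketch below reads the blanket off a network that is already known; it does not show how BayesiaLab learns the blanket from data, and the function name and the parent-set representation are assumptions.

```python
def markov_blanket(parents, target):
    """parents: dict mapping each node of a DAG to the set of its parents."""
    blanket = set(parents.get(target, ()))  # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:
            blanket.add(node)               # child of the target
            blanket.update(node_parents)    # the child's other parents
    blanket.discard(target)
    return blanket

# Hypothetical network: A -> T -> C <- B, C -> D, and an isolated node E.
network = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
    "E": set(),
}
print(sorted(markov_blanket(network, "T")))
```

In this hypothetical network, D and E fall outside the blanket of T, so they could be dropped as predictors of T without losing information once A, B, and C are observed.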
Clustering
Clustering in BayesiaLab covers both Data Clustering and Variable Clustering. The former applies to the grouping of records (or observations) in a dataset; the latter performs a grouping of variables according to the strength of their mutual relationships (Figure 3.8).
A third variation of this concept is of particular importance in BayesiaLab: Multiple Clustering can be characterized as a kind of nonlinear, nonparametric and nonorthogonal factor analysis. Multiple Clustering often serves as the basis for developing Probabilistic Structural Equation Models with BayesiaLab.
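The idea behind Variable Clustering, grouping variables by the strength of their mutual relationships, can be illustrated with a deliberately simple sketch: given a table of pairwise relationship strengths, link every pair whose strength reaches a threshold and treat each connected group as a cluster. The strengths, the threshold, and the function name are hypothetical, and this thresholding scheme is a stand-in for illustration rather than the procedure BayesiaLab uses.

```python
def cluster_variables(strengths, threshold):
    """strengths: dict mapping (var_a, var_b) -> relationship strength.

    Returns a list of sets, one per cluster of linked variables.
    """
    neighbours = {}
    for (a, b), strength in strengths.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if strength >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, assigned = [], set()
    for start in neighbours:
        if start in assigned:
            continue
        # Depth-first walk collects everything reachable from `start`.
        cluster, stack = set(), [start]
        while stack:
            variable = stack.pop()
            if variable not in cluster:
                cluster.add(variable)
                stack.extend(neighbours[variable] - cluster)
        assigned |= cluster
        clusters.append(cluster)
    return clusters

# Hypothetical pairwise strengths (for example, mutual information in bits).
strengths = {
    ("price", "cost"): 0.80,
    ("price", "age"): 0.05,
    ("cost", "age"): 0.02,
    ("age", "income"): 0.60,
    ("price", "income"): 0.01,
    ("cost", "income"): 0.03,
}
print([sorted(c) for c in cluster_variables(strengths, threshold=0.5)])
```

Each resulting group of variables is a natural candidate for being summarized by a single latent factor, which is the step that Multiple Clustering builds on.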