What is a Bayesian Network?
Overview
- A Bayesian network, also known as a Bayesian belief network (BBN), is a probabilistic model that represents a set of random variables and their conditional dependencies using a directed acyclic graph (DAG) and associated conditional probability distributions.
- Each node in the DAG corresponds to a random variable, and each directed arc indicates a conditional dependency between variables. Every node is associated with a conditional probability table (CPT) that defines the probability distribution of the variable given its parent nodes.
- Bayesian networks are used to model complex systems and support reasoning under uncertainty. They are applicable to a wide range of tasks, including classification, prediction, diagnosis, decision-making, and causal inference.
- A key advantage of Bayesian networks is their ability to compactly represent and efficiently compute high-dimensional probability distributions, especially when variable relationships are intricate or difficult to model using traditional statistical approaches.
- Bayesian networks support rigorous probabilistic inference by propagating evidence observed on a subset of variables throughout the network, updating beliefs about unobserved variables accordingly.
- In addition to representing expert knowledge, uncertain beliefs, and qualitative knowledge in an intuitive graphical form, Bayesian networks can also serve as powerful tools for knowledge discovery when combined with machine learning and data mining techniques.
Probabilistic Model
From a technical point of view, a Bayesian network consists of two parts:
- Qualitative: a Directed Acyclic Graph (DAG), i.e., a special kind of directed graph that does not include cycles.
- Directed Acyclic Graphs are composed of nodes that represent the variables of the domain (e.g., the temperature of a device, a feature of an object, the occurrence of an event, the age of a patient), and the links represent statistical (informational) or causal dependencies among the variables.
- The DAG is the formal definition of the factorization of the Joint Probability Distribution over the set of all variables in the domain.
- Quantitative: conditional probability distributions that quantify the dependencies of each node given its parents in the DAG.
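The two parts above — a DAG plus one conditional distribution per node — can be sketched in a few lines of plain Python. The three binary variables A, B, and C and all of their probabilities below are hypothetical, chosen only to show how the qualitative structure (parent lists) and the quantitative part (CPTs) combine to define a joint distribution.

```python
from itertools import product

# A minimal sketch of a discrete Bayesian network: each node stores its
# parent list and a CPT mapping (parent values) -> {value: probability}.
# The variables A, B, C and all numbers are hypothetical illustrations.
network = {
    "A": ([], {(): {"t": 0.3, "f": 0.7}}),
    "B": ([], {(): {"t": 0.6, "f": 0.4}}),
    "C": (["A", "B"], {
        ("t", "t"): {"t": 0.9, "f": 0.1},
        ("t", "f"): {"t": 0.5, "f": 0.5},
        ("f", "t"): {"t": 0.4, "f": 0.6},
        ("f", "f"): {"t": 0.1, "f": 0.9},
    }),
}

def joint(assignment):
    """P(assignment) as the product of each node's CPT entry given its parents."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        key = tuple(assignment[parent] for parent in parents)
        p *= cpt[key][assignment[node]]
    return p

# The factorization guarantees the joint distribution sums to 1
# over all possible assignments of the three variables.
total = sum(joint(dict(zip("ABC", values)))
            for values in product("tf", repeat=3))
```

Note how compact this is: instead of storing all 2³ = 8 joint probabilities, the network stores one unconditional table per root node and one small CPT per child, which is exactly the saving that scales to high-dimensional distributions.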
Example
Let’s take two variables: Age and Gray Hair. The corresponding DAG has two nodes, one for Age and another for Gray Hair.
As there is a probabilistic (and causal) relationship between Age and Gray Hair, there is a directed link from the Age node to the Gray Hair node.
This graph defines the following factorization of the joint probability distribution: P(Age, Gray Hair) = P(Age) × P(Gray Hair | Age).
Marginal Probability Distribution
The probability distributions in a Bayesian network are typically represented with tables. The marginal distribution of the Age node is shown in the following table:
This table indicates that 16.4% of the population under study is under 30 years old, while 9.4% is over 70 years old.
Conditional Probability Table
The following Conditional Probability Table quantifies the relationship between Age and Gray Hair:
This suggests that 66% of individuals under 30 do not have any gray hair. Conversely, 30.8% of those over 70 are completely gray.
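Two of the figures quoted above are enough to illustrate how the factorization combines the marginal table with the CPT. The sketch below multiplies the quoted marginal (16.4% under 30) by the quoted conditional (66% of under-30s have no gray hair) to obtain one cell of the joint distribution; the remaining cells of the tables are not given in the text, so only this one entry is computed.

```python
# Figures quoted in the text; the joint is the product defined by the DAG:
# P(Age, Gray Hair) = P(Age) * P(Gray Hair | Age).
p_age_under_30 = 0.164            # marginal: 16.4% of the population is under 30
p_no_gray_given_under_30 = 0.66   # CPT: 66% of under-30s have no gray hair

# Probability of being under 30 AND having no gray hair
p_joint = p_age_under_30 * p_no_gray_given_under_30
```

So roughly 10.8% of the population under study is both under 30 and entirely free of gray hair.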
Probabilistic Inference
- The DAG and the probability distributions associated with each node allow a compact representation of the Joint Probability Distribution over all the variables.
- Inference algorithms allow Bayesian networks to be used as probabilistic expert systems or inference engines, computing the posterior probability distributions of unobserved nodes given evidence set on any number of other nodes.
- Moreover, observational inference in Bayesian networks is omnidirectional: it is possible to reason from parent nodes to child nodes (simulation), from child nodes to parent nodes (diagnosis), or any combination of the two.
- However, it is essential to point out that causal inference can only be used in the context of simulation with a causal Bayesian network.
Examples
The following Monitors show the marginal probability distributions of Age and Gray Hair:
Simulation Example
In this example, we simulate the posterior probability distribution of Gray Hair, given evidence observed on Age.
- The top Monitor shows the evidence observed on Age.
- The bottom Monitor displays the posterior probability distribution of Gray Hair.
Diagnosis Example
Now, we reason in the reverse direction. We observe Gray Hair and infer (or diagnose) the posterior probability distribution of Age.
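Diagnosis in this two-node network amounts to applying Bayes’ theorem: the marginal of Age and the CPT of Gray Hair given Age are combined and renormalized. The sketch below uses a hypothetical two-state version of each variable, with made-up probabilities (the text does not provide complete tables), purely to show the direction of reasoning.

```python
# Hypothetical two-state version of the Age -> Gray Hair network;
# the numbers are illustrative and not taken from the text.
p_age = {"young": 0.7, "old": 0.3}
p_gray_given_age = {"young": {"none": 0.8, "gray": 0.2},
                    "old":   {"none": 0.2, "gray": 0.8}}

def diagnose(observed_gray):
    """P(Age | Gray Hair = observed_gray) via Bayes' theorem."""
    unnormalized = {age: p_age[age] * p_gray_given_age[age][observed_gray]
                    for age in p_age}
    z = sum(unnormalized.values())  # evidence term P(Gray Hair = observed_gray)
    return {age: p / z for age, p in unnormalized.items()}

# Observing gray hair shifts belief toward the "old" state.
posterior = diagnose("gray")
```

With these illustrative numbers, observing gray hair raises the probability of "old" from the prior 30% to roughly 63%, which is exactly the child-to-parent direction of inference described above.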
Bayesian Network Design
There are two ways to create a Bayesian network:
- Knowledge Modeling: You can use any available expert knowledge to manually design a Bayesian network and define the corresponding probability distributions.
- Machine Learning: You can machine-learn a Bayesian network from data and estimate the corresponding probability distributions.
Within the same theoretical framework, BayesiaLab offers a broad set of data mining algorithms:
- Unsupervised Structural Learning: BayesiaLab induces a Bayesian network to compactly represent the joint probability distribution sampled by the data set; all the variables have the same importance in this context.
- Supervised Learning: BayesiaLab can learn a Bayesian network entirely focused on the characterization (or prediction) of a target variable.
- Data Clustering: BayesiaLab creates a Bayesian network with a hidden variable to represent uniform groups of individuals/observations.
- Variable Clustering: BayesiaLab identifies strongly connected variables that can be clustered into factors.
- Probabilistic Structural Equation Models: BayesiaLab builds a hierarchical Bayesian network using hidden variables (or factors) that were identified during Variable Clustering.
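To make the Machine Learning path above concrete, the sketch below estimates a conditional probability table from data by simple maximum-likelihood counting, for the fixed structure Age → Gray Hair. This is a generic illustration under assumed toy records, not a description of BayesiaLab’s own learning algorithms, which also induce the structure itself.

```python
from collections import Counter

# Hypothetical (age, gray_hair) records; in practice these would come
# from a real data set.
records = [("young", "none"), ("young", "none"), ("young", "gray"),
           ("old", "gray"), ("old", "gray"), ("old", "none")]

pair_counts = Counter(records)                 # count(age, gray)
age_counts = Counter(age for age, _ in records)  # count(age)

# Maximum-likelihood estimate of the CPT:
# P(Gray Hair = g | Age = a) = count(a, g) / count(a)
cpt = {(age, gray): count / age_counts[age]
       for (age, gray), count in pair_counts.items()}
```

Each row of the estimated CPT sums to 1 by construction, since the counts for a given Age value partition that value's records.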