BayesiaLab is a powerful desktop application (Windows/Mac/Unix) that provides scientists with a comprehensive “laboratory” for machine learning, knowledge modeling, probabilistic reasoning (incl. diagnosis and simulation), causal inference, and optimization.
BayesiaLab utilizes the Bayesian network framework for gaining deep insights into problem domains and reasoning about them.
BayesiaLab is the result of more than twenty years of research by Dr. Lionel Jouffe and Dr. Paul Munteanu and their team of computer scientists. Their company, Bayesia S.A.S., is headquartered in Laval in northwestern France, with affiliates in the U.S. and Singapore.
Today, Bayesia S.A.S. is the world’s leading supplier of Bayesian network software, serving hundreds of major corporations and research organizations around the world.
Learn about the innovations implemented in the latest version of BayesiaLab here: What's New?
Executive Summary
This executive summary in PDF format explains on two pages how BayesiaLab can support you in your research and decision-making workflows. Pass it along to anyone in your organization who needs to know — in non-technical terms — what BayesiaLab can do.
The inherent ability of Bayesian networks to explicitly model uncertainty makes them suitable for a broad range of real-world applications.
In the Bayesian network framework, diagnosis, prediction, and simulation are identical computations. They all consist of observational inference conditional upon evidence:
Inference from observed effects to causes: diagnosis or abduction.
Inference from observed causes to effects: simulation or prediction.
This distinction, however, only exists from the perspective of the researcher, who would presumably see the symptom of a disease as the effect and the disease itself as the cause. Hence, carrying out inference based on observed symptoms is interpreted as a “diagnosis.”
One of the central benefits of Bayesian networks is that they represent the Joint Probability Distribution and can therefore carry out inference “omnidirectionally.”
Given an observation with any type of evidence on any of the network's nodes (or a subset of nodes), BayesiaLab computes the posterior probabilities of all other nodes in the network, regardless of arc directions.
Both exact and approximate observational inference algorithms are implemented in BayesiaLab.
Hard Evidence: no uncertainty regarding the state of the variable (node).
Likelihood/Virtual Evidence: defined by likelihoods associated with each variable state.
Probabilistic/Soft Evidence: defined by marginal probability distributions.
Numerical Evidence: for numerical variables or for categorical/symbolic variables that have associated numerical values.
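To make the idea of omnidirectional inference concrete, here is a minimal sketch in plain Python (not BayesiaLab code) using a hypothetical two-node network, Disease → Symptom, with made-up probabilities. The same joint distribution answers both the diagnostic query (effect to cause) and the predictive query (cause to effect):

```python
# Hypothetical two-node network: Disease -> Symptom (illustrative numbers only).
p_disease = {0: 0.99, 1: 0.01}                      # P(D)
p_symptom_given = {0: {0: 0.95, 1: 0.05},           # P(S | D=0)
                   1: {0: 0.20, 1: 0.80}}           # P(S | D=1)

# Joint probability table P(D, S) = P(D) * P(S | D)
joint = {(d, s): p_disease[d] * p_symptom_given[d][s]
         for d in (0, 1) for s in (0, 1)}

def posterior(query_var, evidence_var, evidence_val):
    """Condition the joint on evidence and marginalize. This works in
    either direction, regardless of the arc's orientation."""
    idx = {"D": 0, "S": 1}
    mass = {0: 0.0, 1: 0.0}
    for assignment, p in joint.items():
        if assignment[idx[evidence_var]] == evidence_val:
            mass[assignment[idx[query_var]]] += p
    z = sum(mass.values())
    return {v: m / z for v, m in mass.items()}

# Diagnosis: observed effect -> cause, P(D=1 | S=1)
diag = posterior("D", "S", 1)
# Prediction/simulation: observed cause -> effect, P(S=1 | D=1)
pred = posterior("S", "D", 1)
print(round(diag[1], 3), round(pred[1], 3))  # -> 0.139 0.8
```

Note that the symptom strongly raises the probability of the rare disease without making it likely, a typical pattern when the prior is low.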
See Examples & Learn More
Beyond observational inference, BayesiaLab can also perform causal inference for computing the impact of intervening on a subset of variables instead of merely observing these variables.
Pearl’s Graph Surgery and Jouffe’s Likelihood Matching are available for this purpose.
See Examples & Learn More
Many research activities focus on estimating the size of an effect, e.g., to establish the treatment effect of a new drug or to determine the sales boost from a new advertising campaign. Other studies attempt to decompose observed effects into their causes, i.e., they perform attribution.
Because parameters as such do not exist in this nonparametric framework, BayesiaLab computes effects by performing simulations.
As all the domain dynamics are encoded in discrete Conditional Probability Tables (CPT), effect sizes only manifest themselves when different conditions are simulated.
Total Effects Analysis, Target Mean Analysis, and several other functions offer ways to study effects, including nonlinear and variable interactions.
BayesiaLab’s ability to perform inference over all possible states of all nodes in a network also provides the basis for searching for node values that optimize a target criterion. BayesiaLab’s Target Optimization is a set of tools for this purpose.
Using these functions in combination with Direct Effects is of particular interest when searching for the optimum combination of variables that have a nonlinear relationship with the target, as well as correlations among themselves.
A typical example would be searching for the optimum mix of marketing expenditures to maximize sales. BayesiaLab’s Genetic Target Optimization will search, within the specified constraints, for those scenarios that optimize the target criterion.
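A minimal sketch of the underlying idea, with a hypothetical response function standing in for network inference: search over constrained spending combinations for the one that maximizes the expected target. (BayesiaLab would obtain the expected values by inference in the network and would use a genetic search rather than brute force for large spaces; the numbers and function here are invented for illustration.)

```python
from itertools import product

# Hypothetical response model: expected sales lift as a function of two
# marketing spend levels (illustrative numbers only).
def expected_sales(tv, web):
    # Nonlinear responses plus an interaction term between channels.
    return 10 * tv ** 0.5 + 6 * web ** 0.5 + 2 * (tv * web) ** 0.25

budget = 100
levels = range(0, budget + 1, 10)

# Exhaustive search under the constraint tv + web <= budget.
best = max(((tv, web) for tv, web in product(levels, levels)
            if tv + web <= budget),
           key=lambda s: expected_sales(*s))
print(best, round(expected_sales(*best), 2))
```

Because of the diminishing returns and the interaction term, the optimum splits the budget unevenly across channels rather than spending it all on the stronger one.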
Generating a Bayesian network, either by expert knowledge modeling or through machine learning, is all about a computer acquiring knowledge.
However, a Bayesian network can also be a remarkably powerful tool for humans to extract or “harvest” knowledge.
Given that a Bayesian network can serve as a high-dimensional representation of a real-world domain, BayesiaLab allows us to interactively — even playfully — engage with this domain to learn about it.
Through visualization, simulation, and analysis functions, plus the graphical nature of the network model itself, BayesiaLab becomes an instructional device that can effectively retrieve and communicate the knowledge contained within the Bayesian network.
As such, BayesiaLab becomes a bridge between artificial intelligence and human intelligence.
BayesiaLab provides a range of functions for systematically utilizing the knowledge contained in a Bayesian network. They make a Bayesian network accessible as a probabilistic expert system that can be queried interactively by an end-user.
The Adaptive Questionnaire function provides guidance regarding the optimum sequence for seeking evidence.
BayesiaLab determines dynamically, given the evidence already gathered, the next best piece of evidence to obtain in order to maximize the information gain with respect to the Target Node while minimizing the cost of acquiring such evidence.
In a medical context, for instance, this would allow for the optimal “escalation” of diagnostic procedures from “low-cost/small-gain” evidence (e.g., measuring the patient’s blood pressure) to “high-cost/large-gain” evidence (e.g., performing an MRI scan).
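The selection criterion can be sketched as follows: rank each candidate piece of evidence by its mutual information with the target, divided by its acquisition cost. The joint distributions and costs below are hypothetical, and this generic information-gain-per-cost heuristic is only an illustration of the principle, not BayesiaLab's exact criterion.

```python
from math import log2

# Toy setup (hypothetical numbers): a binary target T and two candidate
# tests, each summarized by its joint distribution P(T, test).
joints = {
    "blood_pressure": {(0, 0): 0.55, (0, 1): 0.15,   # cheap, moderately informative
                       (1, 0): 0.10, (1, 1): 0.20},
    "mri_scan":       {(0, 0): 0.65, (0, 1): 0.05,   # expensive, very informative
                       (1, 0): 0.05, (1, 1): 0.25},
}
costs = {"blood_pressure": 1.0, "mri_scan": 20.0}

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def info_gain(joint):
    """Mutual information I(T; E) = H(T) - H(T | E)."""
    p_t, p_e = {}, {}
    for (t, e), p in joint.items():
        p_t[t] = p_t.get(t, 0) + p
        p_e[e] = p_e.get(e, 0) + p
    h_t_given_e = sum(
        p_e[e] * entropy({t: joint[(t, e)] / p_e[e] for t in p_t})
        for e in p_e)
    return entropy(p_t) - h_t_given_e

# Rank candidate evidence by information gain per unit of cost.
ranking = sorted(joints, key=lambda v: info_gain(joints[v]) / costs[v],
                 reverse=True)
print(ranking)
```

Here the MRI has the larger absolute information gain, but the blood pressure measurement wins per unit of cost, reproducing the "low-cost/small-gain first" escalation described above.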
See Examples & Learn More
The BayesiaLab WebSimulator is a platform for publishing interactive models and Adaptive Questionnaires via the web, which means that any Bayesian network built with BayesiaLab can be shared privately with clients or publicly with a broader audience.
Once a model is published via the WebSimulator, end users can try out scenarios and examine the dynamics of that model.
Batch Inference is available for automatically performing inference on many records in a dataset. For example, Batch Inference can be used to produce a predictive score for all customers in a database.
With the same objective, BayesiaLab’s optional Code Export Module can translate predictive network models into static code that can run in external programs. Modules are available that can generate code for R, SAS, PHP, VBA, Python, and JavaScript.
Developers can also access many of BayesiaLab’s functions—outside the graphical user interface—by using the Bayesia Engine API.
The Bayesia Modeling Engine allows you to construct and edit networks.
The Bayesia Inference Engine can access network models programmatically for performing automated inference, e.g., as part of a real-time application with streaming data.
Finally, the Bayesia Learning Engine gives you programmatic access to BayesiaLab's discretization and learning algorithms.
The Bayesia Engine APIs are implemented as pure Java class libraries (jar files), which can be integrated into any software project.
See Examples & Learn More
Subject matter experts often express their causal understanding of a domain in the form of diagrams in which arrows indicate causal directions.
This visual representation of causes and effects has a direct analog in the network graph in BayesiaLab.
Nodes (representing variables) can be added and positioned on BayesiaLab’s Graph Panel with a mouse click, and arcs (representing relationships) can be “drawn” between nodes.
The causal direction can be encoded by orienting the arcs from cause to effect.
The quantitative nature of relationships between variables, plus many other attributes, can be managed in BayesiaLab’s Node Editor.
In this way, BayesiaLab facilitates the straightforward encoding of one’s understanding of a domain.
Simultaneously, BayesiaLab enforces internal consistency so that impossible conditions cannot be encoded accidentally.
See Examples & Learn More
Webinar: Optimizing Health Policies
In addition to directly encoding explicit knowledge in BayesiaLab, the Bayesia Expert Knowledge Elicitation Environment (BEKEE) is available to acquire the probabilities of a network from a group of experts.
BEKEE is a web service that allows you to systematically elicit both explicit and tacit knowledge from multiple expert stakeholders.
BayesiaLab contains all “parameters” describing probabilistic relationships between variables in Conditional Probability Tables (CPT), meaning no functional forms are utilized.
Given this nonparametric, discrete approach, BayesiaLab can conveniently handle nonlinear relationships between variables. However, this CPT-based representation requires a preparation step for dealing with continuous variables, namely discretization. This consists of manually or automatically defining a discrete representation of all continuous values.
BayesiaLab offers several tools for discretization, which are accessible in the Data Import Wizard, in the Node Editor, and in a standalone Discretization function. Univariate, bivariate, and multivariate discretization algorithms are available in this context.
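As an example of the univariate case, the sketch below applies equal-frequency (quantile) discretization, one common strategy for turning a continuous variable into discrete states. This is a generic NumPy illustration, not BayesiaLab's implementation, and the data are synthetic.

```python
import numpy as np

# Synthetic continuous variable (e.g., a measurement around 50 +/- 10).
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=1000)

n_bins = 4
# Interior cut points at the 25th/50th/75th percentiles give bins with
# (approximately) equal numbers of observations.
edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
bins = np.digitize(values, edges)        # discrete state 0..3 per record

counts = np.bincount(bins, minlength=n_bins)
print(edges.round(2), counts)
```

Equal-frequency binning guarantees that every CPT cell receives data, which is often preferable to equal-width bins when the distribution is skewed.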
BayesiaLab features a comprehensive array of highly optimized algorithms to efficiently learn Bayesian networks from data (structure and parameters).
The optimization criteria in BayesiaLab’s learning algorithms are mostly based on information theory (e.g., the Minimum Description Length).
As a result, no assumptions regarding the variable distributions are made. These algorithms can be used for all kinds and sizes of problem domains, sometimes including thousands of variables with millions of potentially relevant relationships.
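To illustrate what an MDL-style score trades off, the sketch below uses the generic textbook form (not necessarily BayesiaLab's exact formula): the structure penalty grows with the number of free CPT parameters, while the data term rewards fit. The counts are synthetic.

```python
from math import log2

# MDL(G, D) = (log2(N) / 2) * k - log-likelihood of D under G,
# where k is the number of free CPT parameters; lower is better.
# Synthetic data: 1,000 joint observations of two correlated binary
# variables A and B, summarized as counts over (a, b).
counts = {(0, 0): 400, (0, 1): 100, (1, 0): 100, (1, 1): 400}
n = sum(counts.values())

def marginal_a(a):
    return sum(c for (a2, _), c in counts.items() if a2 == a) / n

def marginal_b(b):
    return sum(c for (_, b2), c in counts.items() if b2 == b) / n

def score(structure):
    ll = 0.0
    if structure == "independent":   # A  B (no arc): k = 2 free parameters
        k = 2
        for (a, b), c in counts.items():
            ll += c * log2(marginal_a(a) * marginal_b(b))
    else:                            # A -> B: k = 3 free parameters
        k = 3
        for (a, b), c in counts.items():
            p_b_given_a = c / (marginal_a(a) * n)   # count(a,b) / count(a)
            ll += c * log2(marginal_a(a) * p_b_given_a)
    return (log2(n) / 2) * k - ll

scores = {s: score(s) for s in ("independent", "arc")}
print({s: round(v, 1) for s, v in scores.items()})
```

With strongly dependent data, the extra parameter of the arc structure pays for itself in fit, so the connected network receives the lower (better) MDL score; with independent data, the penalty would tip the balance the other way.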
In statistics, “unsupervised learning” is typically understood to be a classification or clustering task. To make a clear distinction, we emphasize “structural” in “Unsupervised Structural Learning,” which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between many variables without having to specify input or output nodes. One might say that this is a quintessential form of knowledge discovery, as no assumptions are required to perform these algorithms on unknown datasets.
See Also
Webinar: Analyzing Capital Flows of Exchange-Traded Funds
Supervised Learning in BayesiaLab has the same objective as many traditional modeling methods, i.e., to develop a model for predicting a target variable.
Note that numerous statistical packages also offer “Bayesian Networks” as a predictive modeling technique. However, in most cases, these packages are restricted in their capabilities to one type of network, i.e., the Naive Bayes network.
BayesiaLab offers a much greater number of Supervised Learning algorithms to search for the Bayesian network that best predicts the target variable while also considering the complexity of the resulting network.
We should highlight the set of Markov Blanket algorithms for their speed, which is particularly helpful when dealing with many variables. In this context, Markov Blanket learning can also serve as an efficient variable selection method.
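The reason the Markov Blanket works as a variable selector is structural: given its blanket (parents, children, and the children's other parents), the target is conditionally independent of every other node. A minimal sketch over a hypothetical edge list:

```python
# Markov blanket of a node in a DAG: its parents, its children, and the
# other parents of its children ("spouses"). Generic sketch; the example
# graph is hypothetical.
edges = [("A", "T"), ("B", "T"),          # A, B are parents of target T
         ("T", "C"), ("D", "C"),          # C is a child of T; D is a spouse
         ("E", "F")]                      # unrelated to T

def markov_blanket(node, edges):
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    spouses = {u for u, v in edges if v in children and u != node}
    return parents | children | spouses

print(sorted(markov_blanket("T", edges)))  # -> ['A', 'B', 'C', 'D']
```

Variables outside the blanket, such as E and F here, can be discarded without losing any predictive information about the target.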
See Examples & Learn More
Markov Blanket Learning Algorithms (9.0)
Chapter 6: Supervised Learning
Webinar: Diagnostic Decision Support
Clustering in BayesiaLab covers both Data Clustering and Variable Clustering.
Data Clustering creates a Latent Variable whose states represent groups of observations (records) that share certain characteristics.
Variable Clustering groups variables according to the strength of their relationships.
Multiple Clustering is one of the steps in BayesiaLab's Probabilistic Structural Equation Model (PSEM) workflow. It consists of iteratively applying Data Clustering to the subsets of variables defined by Variable Clustering, thereby creating Latent Variables that represent the hidden causes sensed by the Manifest Variables. This can be considered a kind of nonlinear, nonparametric, and nonorthogonal factor analysis.
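A minimal sketch of the variable-clustering idea, using synthetic data driven by two hidden factors: group variables whose pairwise relationship strength (here, absolute correlation, as a stand-in for the information-theoretic measures BayesiaLab uses) exceeds a threshold. This is a generic single-linkage illustration, not BayesiaLab's algorithm.

```python
import numpy as np

# Synthetic data: four manifest variables driven by two hidden factors.
rng = np.random.default_rng(1)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)
data = np.column_stack([
    f1 + 0.3 * rng.normal(size=n),   # x0: manifest variable of factor 1
    f1 + 0.3 * rng.normal(size=n),   # x1: manifest variable of factor 1
    f2 + 0.3 * rng.normal(size=n),   # x2: manifest variable of factor 2
    f2 + 0.3 * rng.normal(size=n),   # x3: manifest variable of factor 2
])

# Variables are "strongly related" if |correlation| exceeds a threshold.
corr = np.abs(np.corrcoef(data, rowvar=False))
adj = corr > 0.5

# Clusters = connected components of the strong-relationship graph.
n_vars = data.shape[1]
labels = [-1] * n_vars
cluster = 0
for i in range(n_vars):
    if labels[i] != -1:
        continue
    stack = [i]
    while stack:
        j = stack.pop()
        if labels[j] != -1:
            continue
        labels[j] = cluster
        stack.extend(k for k in range(n_vars)
                     if adj[j, k] and labels[k] == -1)
    cluster += 1

print(labels)  # x0/x1 share one cluster, x2/x3 another
```

Each recovered variable cluster would then be summarized by a Latent Variable via Data Clustering, which is the Multiple Clustering step described above.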
See Examples & Learn More
Data Clustering (7.0)
Variable Clustering (7.0)
Multiple Clustering (9.0)
Chapter 8: Probabilistic Structural Equation Models Webinar: Factor Analysis Reinvented — Probabilistic Latent Factor Induction