To this day, no reliable methods exist to find causal relationships in data alone. More specifically, given a statistical association between two variables, it is impossible to establish from the data which variable is the cause and which is the effect.
As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, has always been necessary to determine the causal direction in bivariate relationships.
Given the importance of domain knowledge, the BayesiaLab team has been developing tools for expert knowledge elicitation for many years, such as the Bayesia Expert Knowledge Elicitation Environment (BEKEE).
Thus, the arrival of ChatGPT last year prompted the Bayesia team to immediately leverage the potential of this new type of AI within BayesiaLab.
Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes
In the context of machine learning and natural language processing (NLP), embedding refers to a mathematical representation of a word, phrase, sentence, or any other linguistic unit in a continuous vector space. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.
A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.
In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.
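To make the embedding idea concrete, here is a minimal sketch of how embeddings could be used to propose links for a semantic network. It assumes the `openai` Python package (version 1.x), a valid API key, and an illustrative similarity threshold; none of this is Hellixia's actual implementation.

```python
# Sketch: propose semantic-network links from embedding similarity.
# Assumptions: openai>=1.0 installed, OPENAI_API_KEY set, threshold is arbitrary.
from itertools import combinations
import math

from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Return one embedding vector per input text (1,536 dimensions for ada-002)."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

concepts = ["Customer Satisfaction", "Brand Loyalty", "Price Sensitivity"]
vectors = dict(zip(concepts, embed(concepts)))

# Link concept pairs whose embeddings are sufficiently similar (0.8 is illustrative).
links = [(a, b) for a, b in combinations(concepts, 2)
         if cosine(vectors[a], vectors[b]) > 0.8]
print(links)
```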
A typical research workflow with Hellixia consists of the following steps:
BayesiaLab utilizes its structural learning algorithms to find associations between variables.
Then, Hellixia obtains the causal directions for the learned associations and applies them as structural priors to the network (a sketch of this step follows below).
Finally, with these newly defined structural priors, BayesiaLab relearns the network. The final network now represents statistical knowledge from data plus the causal knowledge obtained from ChatGPT.
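As an illustration of the second step, the sketch below shows one way to ask a chat model for the causal direction between two variables. The prompt wording, the answer format, and the model choice are assumptions made for this example, not Hellixia's actual prompt or parser.

```python
# Sketch: query a chat model for the causal direction between two variables.
# Assumptions: openai>=1.0 installed, OPENAI_API_KEY set; prompt/format invented.
from openai import OpenAI

client = OpenAI()

def causal_direction(var_a: str, var_b: str) -> str:
    prompt = (
        f"Variables: '{var_a}' and '{var_b}'. "
        "Answer with exactly one token: 'A->B' if the first causes the second, "
        "'B->A' if the second causes the first, or 'none' if no causal link exists."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(causal_direction("Altitude", "Temperature"))  # expected: 'A->B'
```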
The Tübingen Cause-Effect Pairs is a well-known dataset for assessing the performance of causal discovery methods. When tested against this dataset, Hellixia achieves 98% accuracy. The only errors are related to financial relationships for which Hellixia could not retrieve any causal relationships from ChatGPT.
The feature highlight of BayesiaLab 11 is the integration of Hellixia, a subject matter assistant that leverages ChatGPT for structural knowledge elicitation.
In this presentation from the 2023 BayesiaLab Spring Conference, we show how the new Hellixia functions integrate GPT-4 directly into BayesiaLab, including:
Chat Completion
Image Generation
Embedding Generation
As a result, the new Hellixia subject matter assistant can improve research workflows in several ways:
Accelerate the qualitative part of knowledge elicitation.
Generate practical natural language descriptions for latent factors created through BayesiaLab's clustering functions.
Automatically create images to illustrate nodes in a network.
An All-New Website for BayesiaLab 11
With the release of BayesiaLab 11, we are also transitioning to an entirely new website. If you can't find the content you are looking for on this new site, please check the Legacy Edition of the BayesiaLab Knowledge Hub.
Learn about the latest innovations in BayesiaLab 11
Version 11 of BayesiaLab is the latest iteration of our flagship product that has been under continuous development for nearly 25 years. No other organization has invested as many resources in developing technologies around the Bayesian network paradigm.
Release 11 once again features many innovations, including the native integration of an LLM-based subject matter assistant (OpenAI, OpenAI GPT Assistants, Azure, Mistral, ...).
Here is a selection of the most important new features:
Hellixia is the name of BayesiaLab's subject matter assistant based on Large Language Models. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Dimension Elicitor: Identify relevant dimensions of a problem domain by using a large set of keywords and create the corresponding nodes.
Comment Generator: Utilize a comprehensive set of keywords to pinpoint relevant dimensions within a problem domain and add them as comments to the nodes.
Embedding Generator: This tool creates embeddings encapsulating node semantics, featuring vectors of 1,536 dimensions, enabling the learning of semantic networks.
Class Description Generator: Generate descriptive summaries for sets of nodes, to be used, for instance, as names for latent variables.
Semantic Variable Clustering: Create clusters of nodes based on their semantics.
Pairwise Causal Link: This function evaluates the causal relationship between two nodes, adding an arc if a link exists. It also quantifies the causal effect (ranging from -100 to 100) and creates or updates the conditional probability table accordingly.
Causal Structural Priors: This tool assesses the causal relationship between two nodes and creates a Structural Prior if a relationship exists. The value of the prior reflects the confidence level in the relationship's existence.
Causal Arc Explainer: This tool examines the causal relationship between two nodes, providing a detailed description of the causal mechanism when a relationship is identified. Additionally, it quantifies the causal effect, with values ranging from -100 to 100.
Causal Network Generator: This tool develops a Causal Bayesian Network focused on the chosen node. It generates new nodes, adds detailed comments for each causal link explaining the mechanism, determines causal effects (with values between -100 and 100), and constructs the conditional probability tables.
Causal Relationships Finder: This tool, akin to the Causal Network Generator, is designed to build a causal network using a predefined set of nodes instead of centering around a single node and generating new nodes.
Image Generator: This feature produces icons that visually represent the information linked to the nodes.
Translator: This function translates various network elements — including names of nodes, states, and comments on nodes and arcs — into the chosen language.
Report Analyzer: This tool processes the output from the Relationship Analysis Report, such as arc and node forces, and creates an HTML report that details the key dynamics of the domain represented by the network.
The Independence of Causal Influence (ICI) tool has been enhanced with several updates:
SumPos(): An asymmetrical variation of the Sum function focusing on positive local mechanical effects.
SumNeg(): A counterpart that emphasizes negative local mechanical effects.
MinMax(): A function that implements the min method for negative values and the max method for positive ones.
A Condensed Display option has been introduced. This feature creates a network where the local effects are snapped to their parent and the combination nodes to their respective children.
The Expert Editor has been rebranded as the SMEs & BEKEE Session Manager.
Subject Matter Experts (SMEs) can now be identified with specific colors for better differentiation.
There's an option to decide whether to send out invitation emails to the SMEs.
In terms of qualitative knowledge elicitation, specifically the qualitative segment of the Delphi Method, you can now utilize the Assessment Editor to produce Notes directly on the Graph Panel, derived from the comments provided by experts.
When eliciting a node, its current distribution can be dispatched as a prior to all experts in BEKEE, serving as an alternative to the default uniform distribution.
Node Contextual Menu:
Generate from Assessments: This function facilitates the creation of distributions based on the weighted votes of chosen experts.
Generate Assessments: This feature uses the node's current probability distribution to create an assessment associated with a selected expert. When Prior Weights are linked to the node, there's an option to use these weights to determine the expert's confidence level in the assessments.
Delete Zero-Confidence Assessments: This option removes all assessments in which the expert's confidence level is set to 0.
Delete Assessments: This feature deletes the assessments linked to the chosen experts.
Hellinger Distance: Measures the distance between experts' votes and a reference expert (usually the consensus).
2D/3D Mapping incorporates new metrics derived from experts' assessments.
The Formulas tab in the Node Editor now supports local variables.
Additionally, new functions have been introduced, with some of the most notable being:
TriangularMD(v1, x): The triangular membership degree in fuzzy logic (under Special Functions).
Deciban(x): The deciban is a logarithmic unit, much like the decibel or the Richter scale, introduced by Alan Turing for expressing probabilities. It is a tenth of a ban, which is also known as the base-10 log odds (under Arithmetic Functions).
Hellinger(v1, v2): The Hellinger distance is a measure of the similarity between two probability distributions (under Inference Functions).
NoisySum(s, leak, v1, w1, vn, wn): Used for representing situations where the variable s is the weighted (wi) sum of its parents (vi), plus an additional noise term (leak) to model uncertainty or random fluctuations.
DualNoisyOr(s, leak, c1, p1, cn, pn): This function implements a modified Noisy-Or model that operates based on the combined effect of all pi values. The parameters ci represent conditions or boolean variables, while pi are their associated effects (positive or negative). When the aggregated sum of pi values is positive, the function executes a Noisy-Or with an overall effect equal to this sum, effectively determining the probability of the True state. Conversely, when the sum is negative, the function applies the Noisy-Or logic to the False state, adjusting the likelihood of the outcome being False according to this negative sum. A sketch of the underlying Noisy-Or combination follows after this list.
SingleMode(v): A function designed to ascertain whether the distribution of variable v is unimodal (under Inference Functions).
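For readers unfamiliar with the Noisy-Or family, the following sketch implements the classic leaky Noisy-Or combination that these functions build upon. It is an illustrative reimplementation, not BayesiaLab's internal code.

```python
# Sketch: classic leaky Noisy-Or. Parameter names mirror the signatures above,
# but this is an illustration, not BayesiaLab's implementation.

def noisy_or(leak: float, conditions: list[bool], effects: list[float]) -> float:
    """P(child = True) given boolean parent states and per-parent effects."""
    p_false = 1.0 - leak  # probability that no cause (not even the leak) fires
    for active, p in zip(conditions, effects):
        if active:
            p_false *= 1.0 - p
    return 1.0 - p_false

# Two active causes with effects 0.7 and 0.5, plus a 10% leak:
print(noisy_or(0.1, [True, True], [0.7, 0.5]))  # 1 - 0.9*0.3*0.5 = 0.865
```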
Weight of Evidence now features four new types of analyses:
Most/Least Relevant Explanations
Most/Least Confirmatory Clues
The EQ-based learning algorithms are now disabled in scenarios where the score of an arc is not equivalent in both directions. This can occur due to filtered states, constraints, structural priors, etc. The assumption of equivalence is no longer theoretically valid in such contexts and could result in invalid networks with cycles.
The data associated with the network can now be exported into an evidence scenario file.
Scenarios are now editable, allowing adjustments to the index, weight, and comments.
A new Evidence Scenario Report is now accessible, offering a detailed description of the scenarios' content.
The redesigned Target Evaluation function now features dedicated tabs for:
Classification
Posterior Probabilities
Regression
Triage
Dynamic Grid Layout: This innovative layout algorithm, particularly suitable for creating readable graphs featuring badges with associated comments, excels in handling graphs created with Hellixia.
View Menu: four new functions have been introduced to optimize the display of graphs. Users can now shrink or stretch graphs both vertically and horizontally, offering enhanced visualization flexibility.
Position Menu: this new item has been introduced to enable the adjustment of the graphical layers of Nodes and Notes. It's available via their contextual menus.
Horizontal and Vertical Stacking: These new alignment tools enable the positioning of the selected nodes horizontally or vertically, aligning them automatically closely without extra space.
Highlight a Class: Accessible from the Note Contextual Menu, this feature lets you select a Class and then automatically adjusts the size and position of the note to encompass all nodes belonging to that class.
Arc Editor: Accessible by double-clicking an arc, this feature enables you to edit the text associated with the arc as well as its rendering properties.
Moving Arc Comments: You can now reposition comments along their corresponding arcs.
Color Linked: This new feature, added to the Rendering Properties of Badges, Monitors, Bars, and Gauges, automatically applies the node's associated color to the Name Background Color. Additionally, it also automatically selects white for the Name Color on dark backgrounds and black on lighter ones.
By pressing 'Z', a selection zone can be initiated, regardless of whether an object on the graph is clicked.
Numerical Evidence Entry for Gauges and Bars: A new approach is introduced for inputting numerical evidence through shift-clicking on a node. Utilize the 'M' and 'B' icons to select the Distribution Estimation Method (MinXEnt and Binary, respectively), with the three icon colors representing the Observation Type: No Fixing, Fix Mean, and Fix Probabilities, respectively.
Pseudo Root-Nodes: If a node exclusively has Function Nodes as parents, making it a root node of its subnetwork, and the parents of these Function Nodes have fixed observed values, then the distribution of these pseudo root-nodes is also automatically set to fixed.
Boolean Conversion: Featured in the Tools menu, this function enables the conversion of selected nodes into boolean nodes.
The 2D mapping has been enhanced to incorporate an additional dimension for node analysis: Font Size, supplementing the existing Node Size and Color dimensions. This enables font sizes to be proportional to the selected metric.
The Node Analysis section has been enriched with the addition of numerous metrics, providing a more comprehensive analysis capability:
Mutual Information with Target Node
Mutual Information with Target State
Bayes Factor
Normalized Bayes Factor
Kullback-Leibler
Normalized Kullback-Leibler
Total Effect on Target
Standardized Total Effect on Target
Direct Effect on Target
Standardized Direct Effect on Target
Number of Assessments
Assessment Completion Rate
Maximum Assessment Divergence
Overall Assessment Divergence
Missing Value Rate
Comments associated with the nodes are now displayed when you hover over them.
The option Hide Text for Ignored Nodes conceals the names of nodes that are not observable.
BayesiaLab features a comprehensive array of highly optimized algorithms to efficiently learn Bayesian networks from data (structure and parameters).
The optimization criteria in BayesiaLab’s learning algorithms are mostly based on information theory (e.g., the Minimum Description Length).
With that, no assumptions regarding the variable distributions are made. These algorithms can be used for all kinds and all sizes of problem domains, sometimes including thousands of variables with millions of potentially relevant relationships.
In statistics, “unsupervised learning” is typically understood to be a classification or clustering task. To make a clear distinction, we emphasize “structural” in “Unsupervised Structural Learning,” which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between many variables without having to specify input or output nodes. One might say that this is a quintessential form of knowledge discovery, as no assumptions are required to perform these algorithms on unknown datasets.
See Also
Webinar: Analyzing Capital Flows of Exchange-Traded Funds
Supervised Learning in BayesiaLab has the same objective as many traditional modeling methods, i.e., to develop a model for predicting a target variable.
Note that numerous statistical packages also offer “Bayesian Networks” as a predictive modeling technique. However, in most cases, these packages are restricted in their capabilities to one type of network, i.e., the Naive Bayes network.
BayesiaLab offers a much greater number of Supervised Learning algorithms to search for the Bayesian network that best predicts the target variable while also considering the complexity of the resulting network.
We should highlight the set of Markov Blanket algorithms for their speed, which is particularly helpful when dealing with many variables. In this context, the Markov Blanket algorithm can be an efficient variable selection algorithm.
See Examples & Learn More
Markov Blanket Learning Algorithms (9.0)
Chapter 6: Supervised Learning
Webinar: Diagnostic Decision Support
Clustering in BayesiaLab covers both Data Clustering and Variable Clustering.
Data Clustering applies to creating a Latent Variable whose states represent groups of observations (records) that share some characteristics.
Variable Clustering groups variables according to the strength of their relationships.
Multiple Clustering is one of the steps of BayesiaLab's Probabilistic Structural Equation Model (PSEM) workflow. It consists of iteratively using Data Clustering on subsets of data defined by Variable Clustering to create Latent Variables that represent the hidden causes that have been sensed by Manifest Variables. This can be considered as a kind of nonlinear, nonparametric, and nonorthogonal factor analysis.
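To give a flavor of the variable-grouping idea (and only that), here is a generic sketch that clusters variables by the strength of their pairwise relationships using hierarchical clustering on a correlation-based distance. BayesiaLab's Variable Clustering is its own information-theoretic algorithm, so treat this as an assumption-laden analogy.

```python
# Sketch: group variables by relationship strength (correlation-based analogy).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 6))
data[:, 1] += data[:, 0]          # make variables 0 and 1 related
data[:, 4] += data[:, 3]          # and variables 3 and 4

corr = np.corrcoef(data, rowvar=False)
distance = 1 - np.abs(corr)       # strong relationship -> small distance

# Condensed (upper-triangular) distances, then cut the tree into 3 clusters.
condensed = distance[np.triu_indices(6, 1)]
clusters = fcluster(linkage(condensed, method="average"), t=3, criterion="maxclust")
print(clusters)                   # related variables tend to share a cluster label
```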
See Examples & Learn More
Data Clustering (7.0)
Variable Clustering (7.0)
Multiple Clustering (9.0)
Chapter 8: Probabilistic Structural Equation Models
Webinar: Factor Analysis Reinvented — Probabilistic Latent Factor Induction
BayesiaLab is a powerful desktop application (Windows/Mac/Unix) that provides scientists with a comprehensive “laboratory” for machine learning, knowledge modeling, probabilistic reasoning (incl. diagnosis and simulation), causal inference, and optimization.
BayesiaLab utilizes the Bayesian network framework for gaining deep insights into problem domains and reasoning about them.
BayesiaLab is the result of more than twenty years of research by Dr. Lionel Jouffe and Dr. Paul Munteanu and their team of computer scientists. Their company, Bayesia S.A.S., is headquartered in Laval in northwestern France, with affiliates in the U.S. and Singapore.
Today, Bayesia S.A.S. is the world’s leading supplier of Bayesian network software, serving hundreds of major corporations and research organizations around the world.
Learn about the innovations implemented in the latest version of BayesiaLab here: What's New?
Executive Summary This executive summary in PDF format explains on two pages how BayesiaLab can support you in your research and decision-making workflows. Pass it along to anyone in your organization who needs to know — in non-technical terms — what BayesiaLab can do.
Subject matter experts often express their causal understanding of a domain in the form of diagrams in which arrows indicate causal directions.
This visual representation of causes and effects has a direct analog in the network graph in BayesiaLab.
Nodes (representing variables) can be added and positioned on BayesiaLab’s Graph Panel with a mouse click, and arcs (representing relationships) can be “drawn” between nodes.
The causal direction can be encoded by orienting the arcs from cause to effect.
The quantitative nature of relationships between variables, plus many other attributes, can be managed in BayesiaLab’s Node Editor.
In this way, BayesiaLab facilitates the straightforward encoding of one’s understanding of a domain.
Simultaneously, BayesiaLab enforces internal consistency so that impossible conditions cannot be encoded accidentally.
See Examples & Learn More
Webinar: Optimizing Health Policies
In addition to directly encoding explicit knowledge in BayesiaLab, the Bayesia Expert Knowledge Elicitation Environment (BEKEE) is available to acquire the probabilities of a network from a group of experts.
The Bayesia Expert Knowledge Elicitation Environment (BEKEE) is a web service that allows you to systematically elicit both explicit and tacit knowledge from multiple expert stakeholders.
BayesiaLab contains all “parameters” describing probabilistic relationships between variables in Conditional Probability Tables (CPT), meaning no functional forms are utilized.
Given this nonparametric, discrete approach, BayesiaLab can conveniently handle nonlinear relationships between variables. However, this CPT-based representation requires a preparation step for dealing with continuous variables, namely discretization. This consists of manually or automatically defining a discrete representation of all continuous values.
BayesiaLab offers several tools for discretization, which are accessible in the Data Import Wizard, in the Node Editor, and in a standalone Discretization function. Univariate, bivariate, and multivariate discretization algorithms are available in this context.
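As an illustration of the general idea, here is a sketch of univariate equal-frequency (quantile) discretization, one family of approaches mentioned above. BayesiaLab's own discretization algorithms are built into the Data Import Wizard and Node Editor, so this stands only as a generic example.

```python
# Sketch: equal-frequency binning of a continuous variable into discrete states.
import numpy as np

def equal_frequency_bins(values: np.ndarray, n_bins: int) -> np.ndarray:
    """Return bin boundaries so that each bin holds roughly the same count."""
    quantiles = np.linspace(0, 100, n_bins + 1)
    return np.percentile(values, quantiles)

ages = np.random.default_rng(0).normal(45, 15, size=200)
edges = equal_frequency_bins(ages, 4)        # 4 states, as in the Age example
states = np.digitize(ages, edges[1:-1])      # map each value to a state index
print(edges, np.bincount(states))            # roughly 50 observations per state
```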
Generating a Bayesian network, whether through expert knowledge modeling or machine learning, is all about a computer acquiring knowledge.
However, a Bayesian network can also be a remarkably powerful tool for humans to extract or “harvest” knowledge.
Given that a Bayesian network can serve as a high-dimensional representation of a real-world domain, BayesiaLab allows us to interactively — even playfully — engage with this domain to learn about it.
Through visualization, simulation, and analysis functions, plus the graphical nature of the network model itself, BayesiaLab becomes an instructional device that can effectively retrieve and communicate the knowledge contained within the Bayesian network.
As such, BayesiaLab becomes a bridge between artificial intelligence and human intelligence.
BayesiaLab provides a range of functions for systematically utilizing the knowledge contained in a Bayesian network. They make a Bayesian network accessible as a probabilistic expert system that can be queried interactively by an end-user.
The Adaptive Questionnaire function provides guidance regarding the optimum sequence for seeking evidence.
BayesiaLab determines dynamically, given the evidence already gathered, the next best piece of evidence to obtain in order to maximize the information gain with respect to the Target Node while minimizing the cost of acquiring such evidence.
In a medical context, for instance, this would allow for the optimal “escalation” of diagnostic procedures from “low-cost/small-gain” evidence (e.g., measuring the patient’s blood pressure) to “high-cost/large-gain” evidence (e.g., performing an MRI scan).
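Conceptually, the selection criterion can be sketched as a cost-adjusted information-gain score. The `mutual_information` helper below is hypothetical and stands in for whatever an inference engine would provide; this is not BayesiaLab's actual algorithm.

```python
# Sketch: pick the unobserved node with the highest information gain per unit cost.
# `mutual_information(node, target, evidence)` is a hypothetical helper.

def next_best_evidence(candidates, target, evidence, costs, mutual_information):
    def score(node):
        return mutual_information(node, target, evidence) / costs[node]
    return max(candidates, key=score)

# Example: blood pressure is cheap, an MRI is expensive; with these made-up
# information gains, the cheap test is asked for first.
costs = {"BloodPressure": 1.0, "MRI": 50.0}
mi = lambda node, target, evidence: {"BloodPressure": 0.2, "MRI": 0.6}[node]
print(next_best_evidence(["BloodPressure", "MRI"], "Disease", {}, costs, mi))
```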
See Examples & Learn More
The BayesiaLab WebSimulator is a platform for publishing interactive models and Adaptive Questionnaires via the web, which means that any Bayesian network built with BayesiaLab can be shared privately with clients or publicly with a broader audience.
Once a model is published via the WebSimulator, end users can try out scenarios and examine the dynamics of that model.
Batch Inference is available for automatically performing inference on many records in a dataset. For example, Batch Inference can be used to produce a predictive score for all customers in a database.
With the same objective, BayesiaLab’s optional Code Export Module can translate predictive network models into static code that can run in external programs. Modules are available that can generate code for R, SAS, PHP, VBA, Python, and JavaScript.
Developers can also access many of BayesiaLab’s functions—outside the graphical user interface—by using the Bayesia Engine API.
The Bayesia Modeling Engine allows you to construct and edit networks.
The Bayesia Inference Engine can access network models programmatically for performing automated inference, e.g., as part of a real-time application with streaming data.
Finally, the Bayesia Learning Engine gives you programmatic access to BayesiaLab's discretization and learning algorithms.
The Bayesia Engine APIs are implemented as pure Java class libraries (jar files), which can be integrated into any software project.
See Examples & Learn More
The inherent ability of Bayesian networks to explicitly model uncertainty makes them suitable for a broad range of real-world applications.
In the Bayesian network framework, diagnosis, prediction, and simulation are identical computations. They all consist of observational inference conditional upon evidence:
Inference from observed effects to causes: diagnosis or abduction.
Inference from observed causes to effects: simulation or prediction.
This distinction, however, only exists from the perspective of the researcher, who would presumably see the symptom of a disease as the effect and the disease itself as the cause. Hence, carrying out inference based on observed symptoms is interpreted as a “diagnosis.”
One of the central benefits of Bayesian networks is that they represent the Joint Probability Distribution and can therefore carry out inference “omnidirectionally.”
Given an observation with any type of evidence on any of the networks’ nodes (or a subset of nodes), BayesiaLab computes the posterior probabilities of all other nodes in the network, regardless of arc directions.
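A two-node example makes this concrete: even when the arc points from Disease to Symptom, Bayes' theorem lets us compute the posterior of the cause from an observed effect. The numbers below are invented for illustration.

```python
# Sketch: inference "against the arrow" in a Disease -> Symptom network.
p_disease = 0.01                  # P(Disease = True), made-up prior
p_symptom_given_d = 0.90          # P(Symptom | Disease)
p_symptom_given_not_d = 0.05      # P(Symptom | no Disease)

# Predictive direction (cause -> effect): marginal probability of the symptom.
p_symptom = p_symptom_given_d * p_disease + p_symptom_given_not_d * (1 - p_disease)

# Diagnostic direction (effect -> cause): posterior probability of the disease.
p_disease_given_symptom = p_symptom_given_d * p_disease / p_symptom

print(f"P(Symptom) = {p_symptom:.4f}")                          # 0.0585
print(f"P(Disease | Symptom) = {p_disease_given_symptom:.4f}")  # 0.1538
```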
Both exact and approximate observational inference algorithms are implemented in BayesiaLab.
Hard Evidence: no uncertainty regarding the state of the variable (node).
Likelihood/Virtual Evidence: defined by likelihoods associated with each variable state.
Probabilistic/Soft Evidence: defined by marginal probability distributions.
Numerical Evidence: for numerical variables or for categorical/symbolic variables that have associated numerical values.
See Examples & Learn More
Beyond observational inference, BayesiaLab can also perform causal inference for computing the impact of intervening on a subset of variables instead of merely observing these variables.
Pearl’s Graph Surgery and Jouffe’s Likelihood Matching are available for this purpose.
See Examples & Learn More
Many research activities focus on estimating the size of an effect, e.g., to establish the treatment effect of a new drug or to determine the sales boost from a new advertising campaign. Other studies attempt to decompose observed effects into their causes, i.e., they perform attribution.
BayesiaLab performs simulations to compute effects, as parameters as such do not exist in this nonparametric framework.
As all the domain dynamics are encoded in discrete Conditional Probability Tables (CPT), effect sizes only manifest themselves when different conditions are simulated.
Total Effects Analysis, Target Mean Analysis, and several other functions offer ways to study effects, including nonlinear and variable interactions.
BayesiaLab’s ability to perform inference over all possible states of all nodes in a network also provides the basis for searching for node values that optimize a target criterion. BayesiaLab’s Target Optimization is a set of tools for this purpose.
Using these functions in combination with Direct Effects is of particular interest when searching for the optimum combination of variables that have a nonlinear relationship with the target, plus co-relations between them.
A typical example would be searching for the optimum mix of marketing expenditures to maximize sales. BayesiaLab’s Genetic Target Optimization will search, within the specified constraints, for those scenarios that optimize the target criterion.
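As a toy illustration of scenario search under constraints (not BayesiaLab's Genetic Target Optimization), the sketch below brute-forces evidence combinations with a hypothetical `posterior_mean` helper and a budget constraint; all names and numbers are invented for the example.

```python
# Sketch: exhaustively score evidence scenarios against a target criterion.
from itertools import product

def best_scenario(levels_per_node, budget, cost, posterior_mean):
    best, best_score = None, float("-inf")
    for scenario in product(*levels_per_node.values()):
        assignment = dict(zip(levels_per_node.keys(), scenario))
        if sum(cost(n, v) for n, v in assignment.items()) > budget:
            continue  # skip scenarios that exceed the spending constraint
        score = posterior_mean(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Nonlinear toy objective with an interaction term between the two channels:
levels = {"TV": [0, 1, 2], "Digital": [0, 1, 2]}
cost = lambda node, v: 10 * v
posterior_mean = lambda a: 3 * a["TV"] + 4 * a["Digital"] - 0.5 * a["TV"] * a["Digital"]
print(best_scenario(levels, budget=30, cost=cost, posterior_mean=posterior_mean))
```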
With Open, you can select a Bayesian network file via a File Dialog and load it into a Graph Window.
To the right of the file list, a preview panel shows you the structure of the Bayesian network to be loaded.
Additionally, you can specify what you wish to load along with the to-be-opened file. Clicking on the icons to the right of the file list allows you to toggle on and off specific file contents:
The Files of Type dropdown menu allows you to filter the types of Bayesian network formats to be displayed in the file list.
In addition to BayesiaLab's XBL format, select versions of BIF, NET, SSS, SCI, and DNE formats may be supported.
Bayesia does not guarantee the compatibility of BayesiaLab with any third-party or open-source Bayesian network formats.
Since BayesiaLab's initial release in 2002, this User Guide has grown from a small help file into comprehensive software documentation, now exceeding 1,500 topics.
With the BayesiaLab software eco-system continuing to grow rapidly, this User Guide is very much a living document, with more details being added daily. Plus, the annual cycle of major releases adds countless new features.
Beyond documenting the software functionality, this User Guide also serves as a reference to BayesiaLab-related nomenclature.
Many of BayesiaLab's analysis functions are entirely new and unique in the world of research, so many BayesiaLab-specific terms are neologisms. Here, you can find what we mean by expressions such as "Target Dynamic Profile" or "Likelihood Matching."
In this User Guide, you will also find many cross-references to examples and case studies presented in seminars, webinars, and our e-book, now available as a free online edition within this Knowledge Hub.
This User Guide's tree structure mirrors the BayesiaLab software's structure.
For instance, if you want to learn about the details of the function located in BayesiaLab's menu structure at Main Menu > Analysis > Visual > Overall > Arc > Mutual Information, the corresponding documentation resides in this User Guide at: Main Menu | Analysis | Visual | Overall | Arc | Mutual Information.
If the same function is accessible via multiple paths, e.g., from the Main Menu, the Graph Panel Context Menu, and the Node Context Menu, the main documentation of this function will be attached to the highest level in the hierarchy, in this case, the Main Menu. All other mentions of the function will refer back to this main entry.
BayesiaLab runs inside the Application Window.
Inside the Application Window, there are four main elements:
We offer an unrestricted trial version of the BayesiaLab software so you can evaluate our technology at your leisure.
All BayesiaLab functions are available in the trial version. There are no restrictions on the number of nodes and observations.
Upon registering for the BayesiaLab trial version, we will typically send you the download and activation instructions within 24 hours.
The instructions you receive will include download links for a number of operating systems, including Windows, macOS (Intel and ARM), and Unix/Linux.
From the date you receive your trial license credentials, you can use BayesiaLab for 30 days.
The 30-day trial period starts with the delivery of your credentials, not the date you install the trial.
Please don't use the evaluation version to restore or upgrade your existing BayesiaLab license. The installation files are different from the licensed versions of BayesiaLab.
If you require an update, you can download the latest version via the Help menu in BayesiaLab: Main Menu > Help > Check for Updates.
Load the Dataset with the network.
Load the Evidence Scenario File with the network, if available.
Load the Junction Tree with the network, if available.
Load the Virtual Dataset with the network, if available.
Load the Simulator, i.e., load the configuration, if available.
The Main Menu serves as the top-level navigation to all features and tools in BayesiaLab.
The Toolbar provides quick one-click access to frequently used functions.
The Graph Panel is your work surface for creating and editing Bayesian network graphs.
Each Graph Window corresponds to a Bayesian network, which you can save as a file in XBL format.
The bar at the bottom of the Main Window allows you to manage multiple Graph Windows, which can all be opened simultaneously.
Saves the Bayesian network in the active Graph Window using the XBL format.
By default, any dataset associated with the Bayesian network and any Evidence Scenario Files will be saved in the same XBL file so that they can be jointly loaded again later.
You can edit these default settings under Main Menu > Window > Preferences > Data.
If the Bayesian network has a Junction Tree, it will also be automatically saved in the same file.
This command saves the current Bayesian network in a new file, adding an iteration number in parentheses to the current file name as a suffix.
If your current network is named Graph.xbl, the Increment & Save function will save it as Graph(2).xbl and not overwrite the original Graph.xbl file.
With each further iteration of Increment & Save, the counter in the suffix will increase by 1 unit, i.e., Graph(3).xbl, Graph(4).xbl, etc.
This is a helpful function for maintaining a history when developing a model, allowing you to revert to an earlier version when necessary.
The Main Menu serves as the top-level navigation in BayesiaLab.
Most functions and tools are available through multiple levels of submenus attached to the Main Menu.
However, in many cases, these functions are also accessible in the context of specific workflows. As a result, there are often multiple ways of launching the same tool or function.
The Main Menu can appear in three different configurations, and certain menu items and icons will only be available in specific contexts:
Save As lets you choose a new file name and location for your current Bayesian network.
Additionally, you can specify what you wish to include in the to-be-saved file. Clicking on the icons to the right of the file list allows you to toggle on and off specific contents:
Save the Dataset with the network.
Save the Evidence Scenario File with the network, if available.
Save the Junction Tree with the network, if available.
Save the Virtual Dataset with the network, if available.
Save the Simulator, i.e., save the configuration, if available.
The Network menu includes a range of standard functions related to:
Creating new files
Opening and closing files
Generating reports with network statistics
Clicking on the menu item Startup Page brings up a window featuring 12 quick-access cards.
The top row features some of the most common user actions after starting BayesiaLab:
Manually Create a Network
Open a Bayesian Network
Learn a Network from Data
Open the Media Center
The bottom two rows of cards show the most recently opened files with a network preview.
By default, the Startup screen is displayed right after launching BayesiaLab. The checkbox allows you to disable its automatic display.
Closes the active Graph Window and prompts you to save the corresponding file, if your network, the associated dataset, or the Evidence Scenario File were changed since the last Save operation.
Closes all open Graph Windows, except for the active one, and prompts you to save the corresponding files, if any of them were modified.
Closes all open Graph Windows and prompts you to save the corresponding files if any of them were modified.
Provides a list of the most recently opened networks so you can quickly reopen them as needed.
Set Working Directory allows you to define a Working Directory, i.e., a workspace, by associating a name with a specific directory.
Subsequently, you can recall the directories you defined with the menu item Recent Working Directories.
With Recent Working Directories, you can quickly recall a Working Directory you previously specified.
The list features the name you assigned plus the corresponding path.
The size of this list can be modified under Main Menu > Window > Preferences > Menus. See Recent Networks.
Closes all graphs, prompts you to save if needed, and closes BayesiaLab.
Allows exporting the Markov blanket of the target variable of the current network into a language selected in the following dialog box:
Once the network is exported in a language, it can be used to infer the value of the target variable according to the observations of the other variables.
Allows locking the network with a password to prevent it from being edited. The network can then be used only in Validation Mode. This menu gives access to the lock manager.
Prints the Bayesian network of the active Graph Window. An assistant gives access to:
the page setup,
the configuration of the printer,
the selection of the desired scale for the network,
the option of displaying reference marks. These marks are useful when the network has to be printed on more than one page. They indicate the page number (column, row), the border, and the vicinity,
the option to center the network.
The size of this list can be modified under Main Menu > Window > Preferences > Menus.
Enter your desired Working Directory name into the Name field and select the corresponding Path using the Directory dialog.
Clicking Recent Working Directories opens up a list of the most recently used Working Directories, from which you can pick the one you wish to recall.
The Reports submenu within the Network menu offers an array of information about the Bayesian network in the active Graph Window.
The Network Comments Report displays the information recorded in a network's Comments field.
And if available, the Network Comments Report also lists the associations of .
The Network Report is a very comprehensive documentation of the network in the active Graph Window.
It includes statistics about the network structure as a whole, plus details for each node, such as the Node States, the Conditional Probability Tables, and equations.
As such, it presents all qualitative and quantitative knowledge contained in the network as a long, tabular report.
To some extent, you could recreate the network from all these details.
Select Main Menu > Network > Reports > Network to create the Network Report.
The report can be quite substantial, depending on your network's size and complexity.
The following screenshot only shows the top portion of a much longer report:
For a thorough offline analysis, you may want to save the Network Report as an HTML file, which you can then open as a spreadsheet in Excel.
This password locking mechanism allows you to share your networks while making sure they will not be modified by unauthorized users.
When a network is locked, you cannot validate and save the modifications done in the Node Editor, add or delete arcs and nodes, associate dictionaries and databases for learning, modify classes, etc.
To start protecting a network, select Main Menu > Network > Protect.
Unless the network already has a lock, the following dialog box is displayed:
When the network is unlocked, the menu Network | Lock displays the following dialog box:
This dialog box allows you to:
lock the network using the existing password,
completely remove the Lock,
change the Lock Password.
However, you can still edit the costs associated with the nodes, as they are utilized only in Validation Mode (e.g., Adaptive Questionnaire, non-observable nodes, etc.).
Upon confirmation of your password, the Lock icon appears in the Status Bar to indicate that the network is unlocked.
You just have to click on the icon to lock/unlock the network. The icon then updates to indicate that the network is locked.
Occurrences refer to the number of observations in a cell of a Probability Table or a Conditional Probability Table.
The number of cells in a Conditional Probability Table is a function of the following parameters:
The number of Parent Nodes.
The number of Node States of the Parent Nodes.
The number of Node States of the Child Nodes.
Here, Age is discretized into 4 states and BMI into 6 for a total of 48 cells in the table associated with BMI.
The numbers in each cell are counts of observations or Occurrences. In our case, each Occurrence represents one person from the sample of 200 individuals.
For instance, the Occurrence table associated with BMI states that Count(BMI≤20 | Age≤30)=2. So, we have only two Occurrences of that particular condition, i.e., only two individuals who are 30 years old or younger have a BMI of 20 or lower.
To create a Bayesian network, BayesiaLab needs to translate the Occurrences in each cell into probabilities.
However, with a small number of Occurrences, that can become an issue.
We have repeatedly referenced a rule of thumb, which says that we should have a minimum of 5 Occurrences per cell to estimate a Probability Table or Conditional Probability Table reliably.
In our example, several cells fall below the recommended minimum.
Such deficiencies are easy to recognize in a small example, but in more complex networks, it can be difficult to spot such weaknesses.
That is the motivation for the Occurrence Report. It displays all tables in a network and visually highlights potentially problematic cells with low Occurrences.
Select the nodes you want to include in the Occurrences Report. If none are selected, the analysis will be performed on all nodes.
Select Main Menu > Network > Reports > Occurrences to create the Occurrences Report.
The Occurrence Report opens up and shows all Probability Tables and Conditional Probability Tables.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked in red.
Cells with 5 Occurrences are marked in yellow. This is generally considered the minimum number of Occurrences.
Cells with 40 or more Occurrences are marked in green.
Furthermore, the Occurrence Report calculates the mean number of Occurrences for each row in all Probability Tables and Conditional Probability Tables.
If the mean value of any row in any of the nodes drops below the threshold of 5, the corresponding nodes are called out at the top of the report.
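The color-coding and row-mean logic described above can be sketched in a few lines. The thresholds are taken from the text, while the exact boundary handling (e.g., whether yellow covers all counts up to 5) is an assumption of this illustration.

```python
# Sketch: flag low-occurrence cells and rows in a table of CPT counts.
# Thresholds (0, 5, 40) come from the report description; boundary handling is assumed.

def classify(count: int) -> str:
    if count == 0:
        return "red"      # no observations at all
    if count <= 5:
        return "yellow"   # at or below the rule-of-thumb minimum
    if count >= 40:
        return "green"    # comfortably estimated
    return "ok"

def flag_low_rows(cpt_counts, threshold=5.0):
    """Return indices of CPT rows whose mean occurrence count is below threshold."""
    return [i for i, row in enumerate(cpt_counts)
            if sum(row) / len(row) < threshold]

cpt = [[2, 3, 1], [50, 42, 61]]   # counts for two parent configurations
print([[classify(c) for c in row] for row in cpt])
print(flag_low_rows(cpt))          # [0]
```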
Whenever you learn a Bayesian network from a small dataset, you must consider whether the number of observations is sufficient for correctly estimating all Probability Tables and Conditional Probability Tables in the network.
For a deeper analysis, BayesiaLab can produce the Confidence Intervals Report, which we discuss on this page.
To understand how Confidence Intervals can be computed, we first need to explain the estimation of probabilities in the Probability Tables and Conditional Probability Tables, the so-called parameters.
In BayesiaLab, these parameters are estimated using Maximum Likelihood, i.e., using the frequencies observed in the dataset:

$\hat{p}(x_i) = \frac{N(x_i)}{N}$

where $\hat{p}(x_i)$ is the estimated probability, $x_i$ is the state of variable $X$, $N(\cdot)$ represents the number of occurrences of its argument in the dataset, and $N$ is the total number of observations.
So, the Parameter Estimation is straightforward and happens entirely in the background in BayesiaLab.
As a result, we may not always be aware of what numbers gave rise to the probabilities we see in a Probability Table or Conditional Probability Table, as the following diagram illustrates:
However, in terms of our confidence in the estimate, the two approaches are not the same. Our intuition tells us that we should have more confidence in the 0.1 value calculated based on the sample of 10,000.
BayesiaLab uses precisely this Frequentist approach (see the formulas further below) for the Confidence Intervals Report.
However, in BayesiaLab, you can avoid resorting to this heuristic (the Rule of Three, described below) by using Uniform Prior Samples.
Within this network, focus on the three nodes BMI, Age, and Gender:
Go to Main Menu > Network > Reports > Confidence Intervals to start the Confidence Intervals Report.
The Confidence Interval Report window opens up.
At the top of the report, the Confidence Level that serves as the basis for the reported Confidence Intervals is displayed.
Then, for each node, one table is shown.
For each cell containing a parameter estimate, an adjacent cell to the right displays the corresponding Confidence Interval in percentage points.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked with a red background.
Cells with 5 Occurrences are highlighted with a yellow background. This is generally considered the minimum acceptable number of Occurrences.
Cells with 40 or more Occurrences are marked with a green background.
You can adjust the Confidence Level used for this report.
Go to Main Menu > Window > Preferences > Tools > Statistical Tools.
Select the desired value from the Confidence Level dropdown menu.
Note that your selection here also applies to all other statistical tools and tests used in BayesiaLab.
The following example with one Parent Node (Age, measured in years) and one Child Node (BMI, i.e., Body Mass Index, measured in kg/m²) illustrates this with numbers:
The affected nodes in the Graph Panel are also marked with an information icon.
For instance, using the Occurrence Report, you can evaluate whether all Conditional Probability Tables in your network meet the rule-of-thumb criterion of at least 5 observations per cell.
BayesiaLab could have estimated a probability of 0.1 (or 10%) for a state $x_i$ in numerous ways, e.g., based on a sample of 10 or of 10,000: $\frac{1}{10} = \frac{1{,}000}{10{,}000} = 0.1$.
From Frequentist Statistics, we know how to calculate a Confidence Interval for a proportion in a sample, which is exactly what the parameter $\hat{p}$ represents. So, for a Confidence Level of 95%, the Confidence Interval is calculated as:

$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$

where $n$ is the number of observations used for estimating $\hat{p}$.
If zero observations were observed for a given state, e.g., $N(x_i) = 0$, the Rule of Three would have to be used instead to produce Confidence Intervals: with $n$ observations, the 95% Confidence Interval for the probability of the unobserved state is approximately $[0, 3/n]$.
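A quick numeric check of these formulas, using the normal approximation and the Rule of Three; this is a generic statistics sketch, not BayesiaLab's report code.

```python
# Sketch: normal-approximation CI for a proportion, plus the Rule of Three.
import math

def proportion_ci(p_hat: float, n: int, z: float = 1.96):
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# The same estimate of 0.1 with very different confidence:
print(proportion_ci(0.1, 10))      # roughly (0.000, 0.286)
print(proportion_ci(0.1, 10_000))  # roughly (0.094, 0.106)

# Rule of Three: with n observations and zero occurrences of a state,
# the 95% interval for its probability is approximately [0, 3/n].
print(3 / 200)                     # 0.015 for a sample of 200
```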
To illustrate the Confidence Intervals Report, we use the following network:
The color-coding scheme is identical to the one used in the Occurrence Report.
Network Comments provides space for notes, descriptions, and references regarding a Bayesian network.
In the Network Comments field, you can enter and edit paragraph-style text.
You can access the Network Comments Editor in two ways:
Main Menu > Network > Properties > Comments
Graph Panel Context Menu > Properties > Comments
A new window opens featuring the Network Comments Editor.
By default, the Network Comments field contains the date and time the file was created, plus the user who created the file.
Alternatively, the Network Comments field displays any custom text you may have defined, such as a problem domain description.
You can apply HTML-style formatting to your text using the toolbar, including links and images.
Note that Network Comments are automatically saved with the network file.
If you share your network file with others, the information contained in Network Comments will be accessible to them.
Data
This menu item opens the file selector or the database selector and then starts the Data Import Wizard.
Text file: Once the file is read and the pre-processing done, a fully unconnected network is created in a new graph window, with one node for each attribute. The set of Bayesian network learning methods then becomes available.
Database: Once the database table is loaded and the pre-processing done, a fully unconnected network is created in a new graph window, with one node for each attribute. The set of Bayesian network learning methods then becomes available.
Recent Databases: keeps a list of the recently opened databases. The Data Import Wizard is opened directly on the selected file. The size of this list can be modified under Main Menu > Window > Preferences > Menus.
This menu item opens the Data Association Wizard in order to associate data from a text file or a database with an existing Bayesian network.
Recent Databases: keeps a list of the recently opened databases. The Data Association Wizard is opened directly on the selected file. The size of this list can be modified under Main Menu > Window > Preferences > Menus.
When the network structure is modified during the association (addition of nodes or states), the conditional probability tables are automatically recomputed from the database. If the structure remains unmodified, the conditional probability tables are not modified.
This menu item allows defining the properties of the active Bayesian network using text files. These properties concern arcs, nodes, and states:
Arc:
Arcs: allows associating a set of arcs with the network. The indicated arcs can be added to or removed from the network. Arc removals are always carried out before arc additions. Before adding an arc, all the constraints defined on the Bayesian network, as well as the arc constraints and the temporal indices, are checked. If a constraint is not satisfied, the arc won't be added.
Forbidden Arcs: allows associating a set of forbidden arcs with the network.
Arc Comments: allows associating a set of arc comments with the network.
Arc Colors: allows associating a set of colors with the arcs of the network.
Fixed Arcs: allows defining whether arcs are fixed or not.
Node:
Node Renaming: allows renaming each node with a new name. These new names must, of course, all be different.
Comments: allows associating a comment with each node that is in the file.
Classes: allows organizing nodes in subsets called classes. A node can belong to several classes at the same time. These classes allow generalizing some node properties to the nodes belonging to the same classes. They also allow creating constraints over arc creation during learning.
Colors: allows associating colors with the nodes or classes that are in the file. The colors are written as Red Green Blue with 8 bits per channel in hexadecimal (web) format: for example, red is 255 red, 0 green, 0 blue, which gives FF0000. Green gives 00FF00, yellow gives FFFF00, etc.
Images: allows associating images with the nodes or classes that are in the file. The images are represented by their paths relative to the directory where the dictionary is.
Costs: allows associating a cost with each node. A node without a cost is called not observable.
Temporal Indices: allows associating temporal indices with the nodes that are in the file. These temporal indices are used by BayesiaLab's learning algorithms to take into account constraints over the probabilistic relations, such as not adding arcs from future nodes to past nodes. The rule that is used to add an arc from node N1 to node N2 is:
If the temporal index of N1 is positive or null, then the arc from N1 to N2 is only possible if the temporal index of N2 is greater than or equal to the index of N1.
Local Structural Coefficients: allows setting the local structural coefficient of each specified node or each node of each specified class.
State Virtual Numbers: allows setting the state virtual number of each specified node or each node of each specified class.
Locations: allows setting the position of each node.
State:
State Renaming: allows renaming each state of each node with a new name.
State Values: allows associating a numerical value with each state of each node.
State Long Names: allows associating with each state of each node a long name that is more explicit than the default state name. This name can be used in the different database export options, in the HTML reports, and in the monitors.
Filtered States: allows defining, for each node, a state as a filtered state.
As indicated by the syntax, the name of a node, class, or state in the text file cannot contain equal, space, or tab characters. If the node names contain such characters in the network, those characters must be preceded by a \ (backslash) character in the text file: for example, the node named Visit Asia will be written Visit\ Asia in the file.
In order to differentiate a name that is the same for a class, a node, or a state, you must add the suffix "c" for a class, "n" for a node, or "s" for a state at the end of the name.
If your network contains non-ASCII characters, you must save your dictionaries with UTF-8 (Unicode) encoding. For example, in MS Excel, choose "Save As" and select "Text Unicode (*.txt)" as the file type. In Notepad, choose "Save As" and select "UTF-8" as the encoding. If your file contains only ASCII characters, you can keep the default encoding (which depends on the platform), but it is strongly encouraged to use UTF-8 (Unicode) encoding in order to create dictionary files that do not depend on the user's platform. So, for example, a Chinese dictionary can be read by a German user without any problem, whatever the platforms used. If you are not sure how to save a file with UTF-8 encoding, you can export a dictionary with BayesiaLab, modify and save it (with any text editor), and load it back into BayesiaLab.
This menu item allows exporting the different kinds of dictionaries in text files.
The dictionary files are saved with UTF-8 (Unicode) encoding in order to support any character of any language. An option in the Import and Associate preferences, Save Format, allows saving or not saving the BOM (Byte Order Mark) at the beginning of the file. The BOM increases compatibility with Microsoft applications. On other platforms like Unix, Linux, or macOS, the BOM is not necessary and, in some cases, is treated as extra characters at the beginning of the file.
This menu item allows associating an evidence scenario file with the network.
This menu item allows exporting an evidence scenario file associated with the network into a text file.
This menu item allows saving the database associated with the network, including the results of the various pre-processing steps that have been carried out within the Data Import Wizard (discretization, aggregation, filtering, etc.). If the imported database still contains missing values and if the algorithm selected to process the missing values is one of the two imputation algorithms (static or dynamic), this option will allow you to carry out all your imputation tasks by saving a database without any missing values. Indeed, each missing value is replaced by taking into account its conditional probability distribution, returned by the Bayesian network, given all the known values of the row. If the database contains data for test and data for learning, the user can choose which kind of data to save: only learning data, only test data, or the whole data. It is also possible to save only the data corresponding to the selected nodes.
The states' long names can be saved instead of the states' names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and if the option is checked, the numerical values will be created by randomly generating a value in each concerned interval. If the database contains weights, they will be saved as the first column in the output file.
Allows the imputation of the missing values of the associated database according to the mode selected in the following dialog box:
The data will be saved in the specified file, and the long names of the states will be used as specified. If the database contains data for test and data for learning, the user can choose on which kind of data to perform imputation: only learning data, only test data, or the whole data. The states' long names can be saved instead of the states' names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and if the option is checked, the numerical values will be created by randomly generating a value in each concerned interval. However, if there are numerical values in the database, the missing numerical values will be generated from the distribution function of each interval. If the database contains weights, they will be saved as the first column in the output file.
Opens the graph editor if a database is associated with the current network.
Dictionary File Structures
Arc
Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; true for an added arc or false for a removed arc. The last occurrence is always chosen.
Forbidden Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class.
Comments
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (HTML or plain text). The last occurrence is always chosen.
Colors
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; color. The color is defined as an RGB color with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. The last occurrence is always chosen.
Fixed Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; true for a fixed arc or false for a non-fixed arc. The last occurrence is always chosen.
Node
Node Renaming
Name of the node; Equal, Space, or Tab; new node name. The new name must be valid (different from t or T and not containing a question mark). If a node appears more than once, the last occurrence is chosen.
Comments
Name of the node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (HTML or plain text). If a node appears more than once, the last occurrence is chosen.
Classes
Name of the node; Equal, Space, or Tab; name of the class. The class can be any character string. A node present several times will be associated with each of the listed classes.
Colors
Name of a node or a class; Equal, Space, or Tab; color. The color is defined as an RGB color with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. If a node appears more than once, the last occurrence is chosen.
Images
Name of a node or a class; Equal, Space, or Tab; path to the image, relative to the directory containing the dictionary. The image path must be a valid relative path or an empty string. If a node appears more than once, the last occurrence is chosen.
Costs
Name of the node; Equal, Space, or Tab; value of the cost, or empty to make the node non-observable. The cost is an empty string or a real number greater than or equal to 1. If a node appears more than once, the last occurrence is chosen.
Temporal Indices
Name of the node; Equal, Space, or Tab; value of the index, or empty to delete an existing index. The index is an integer. If a node appears more than once, the last occurrence is chosen.
Local Structural Coefficients
Name of the node; Equal, Space, or Tab; value of the local structural coefficient, or empty to reset it to the default value 1. The local structural coefficient is an empty string or a real number greater than 0. If a node appears more than once, the last occurrence is chosen.
State Virtual Numbers
Name of the node; Equal, Space, or Tab; virtual number of states, or empty to delete an existing number. The state virtual number is an empty string or an integer greater than or equal to 2. If a node appears more than once, the last occurrence is chosen.
Locations
Name of the node; Equal, Space, or Tab; position. The location is represented by two real numbers separated by a Space: the first number represents the x-coordinate of the node and the second the y-coordinate. If a node appears more than once, the last occurrence is chosen.
State
State Renaming
Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; new state name. Alternatively, use state name; Equal, Space, or Tab; new state name to rename the state for all nodes. The new name must be a valid state name. If a state appears more than once, the last occurrence is chosen.
State Values
Name of the node or class, dot (.), name of the state; Space or Tab; real value. Alternatively, use name of the state; Equal, Space, or Tab; real value to associate a value with a state regardless of the node. The value is a real number. If a state appears more than once, the last occurrence is chosen.
State Long Names
Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; long name. Alternatively, use name of the state; Equal, Space, or Tab; long name to associate a long name with a state regardless of the node. The long name is a string. If a state appears more than once, the last occurrence is chosen.
Filtered States
Name of the node or class, dot (.), name of the filtered state. Alternatively, use the name of the filtered state alone to set the filter property for that state regardless of the node. If a state appears more than once, the last occurrence is chosen.
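For illustration, here is a small, hypothetical arc dictionary following the structure described above (the node names are invented):

```
Age -> Income = true
Smoking -- Cancer = true
Age -> Income = false
```

Because the last occurrence is always chosen, the Age -> Income arc ends up removed (false), while the Smoking -- Cancer line declares an arc whose orientation may go either way.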
The Data Import Wizard is the principal tool in BayesiaLab for preprocessing and importing external data.
You can use BayesiaLab's Data Import Wizard to import data from two types of sources:
Data tables in text format, in which data fields are separated by delimiters, such as comma, semicolon, tab, or pipe "|". The most common format is CSV.
Data tables in SQL-compatible databases can be accessed via a JDBC driver. Third-party JDBC drivers are available for all major databases.
All data sources must be structured as a single table, i.e., with rows and columns. All table joins must be performed before importing the data into BayesiaLab.
To launch the Data Import Wizard for a data table in a:
text file, select Main Menu > Data > Open Data Source > Text File.
database, select Main Menu > Data > Open Data Source > Database.
Then, the Data Import Wizard guides you through five sequential steps. The first step of the Data Import Wizard depends on the data source, i.e., text file or database. All subsequent steps of the Data Import Wizard are the same for both types of data sources.
Data Structure Definition
Data table in a database
Definition of Variable Types
Data Selection, Filtering, and Missing Value Processing
Discretization and Aggregation
Import Report
Step 3 of the five-step Data Import Wizard deals with Data Selection, Filtering, and Missing Values Processing.
Information (same as in Step 2 — Definition of Variable Types)
We start with the Data panel — although it is at the bottom of the window — as it can help inform decisions about Missing Values Processing.
This Data panel resembles the Data panel from Step 2 — Definition of Variable Types.
However, there are several important additional pieces of information available:
For Discrete variables, it shows the frequencies of all states, including Missing Values and Filtered Values:
As you experiment with checking/unchecking, you can see how the Number of Rows in the Information panel changes.
In terms of a data query, the Filter checkbox would be the equivalent of a nominal value row filter.
Note that the number of Filtered Values does not refer to the number of excluded rows due to an unchecked Filter checkbox.
For Continuous variables, it shows the standard statistics, such as Minimum, Maximum, Mean, and Standard Deviation. Additionally, the table displays the frequencies of non-missing values, Missing Values, and Filtered Values:
The Select Values panel relates to the Filter checkboxes plus any Required Minima and Maxima applied in the Data panel.
Three actions are available in this panel:
You can choose the logic for combining the Filters and Minima/Maxima assigned in the Data panel:
OR: a row will be removed if ANY of the selected Filters or specified Minima/Maxima across all variables apply to that row.
AND: a row will only be removed if ALL of the selected Filters and specified Minima/Maxima across all variables apply to that row.
Click the Show Selections button to review what Filters and Minima/Maxima are currently in place.
Note the syntax for Discrete variables: The variable name is followed by "in" (i.e., is an element of) followed by the included values shown as an array in square brackets.
Further logical expressions are shown as conjunctions (AND) or disjunctions (OR) in separate lines.
Clicking the Delete Selections button removes all Filters and Minima/Maxima currently in place.
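For comparison, the OR/AND logic can be mimicked with pandas (a minimal sketch; the column names and thresholds are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Age": [16, 34, 52, 71],
                   "Income": [0, 40000, 85000, 12000]})

# Conditions marking a row for removal: a deselected Filter value
# (Age == 16) and a violated Required Minimum (Income < 10000).
remove_age = df["Age"] == 16
remove_income = df["Income"] < 10000

# OR: remove a row if ANY condition applies to that row.
df_or = df[~(remove_age | remove_income)]

# AND: remove a row only if ALL conditions apply to that row.
df_and = df[~(remove_age & remove_income)]
```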
In the Missing Value Processing panel you can specify which kind of processing to apply to variables with Missing Values, i.e., Filter, Replace, and Infer.
The Filter function allows you to remove rows from the dataset that contain Missing Values. This is equivalent to what is commonly known as casewise deletion.
You can apply the Filter individually to any variable that contains Missing Values.
Usage
In the Data panel, click on the header or into the column of the variable with Missing Values.
Then, check the Filter checkbox in the Missing Values Processing panel.
Next, choose the logical condition to apply when you select multiple variables to be subject to the Filter.
OR: a row will be removed if ANY of the selected variables contain a Missing Value in that row.
AND: a row will only be removed if ALL of the selected variables contain a Missing Value in that row.
Before applying Filter, please consider the implications discussed in Chapter 9: Missing Values Processing.
With the Replace By function, you can specify a value for replacing the Missing Values in the selected variable.
You have several options in this regard:
You can set a specific value:
For a Discrete variable, you can select among the values observed in the variable from a drop-down list.
Alternatively, you can choose the Modal value, i.e., the most frequently occurring value of the variable in the dataset.
For a Continuous variable, you can select to use the Mean value computed from the dataset.
As an alternative, you can specify any arbitrary value.
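Outside of BayesiaLab, the same Replace By strategies can be sketched in pandas (the column names are invented; this is an illustration, not BayesiaLab's implementation):

```python
import pandas as pd

df = pd.DataFrame({"Color": ["red", None, "blue", "red"],
                   "Height": [170.0, None, 182.5, 165.0]})

# Discrete variable: replace Missing Values with the Modal value.
df["Color"] = df["Color"].fillna(df["Color"].mode()[0])

# Continuous variable: replace Missing Values with the Mean value
# (or any arbitrary value, e.g., df["Height"].fillna(0.0)).
df["Height"] = df["Height"].fillna(df["Height"].mean())
```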
For practical analysis purposes, the Infer option is the most common method for Missing Values Processing.
To learn about Missing Values Processing beyond Filter and Replace By, please see Missing Values Processing in Chapter 9 of our e-book.
The Methods in Detail:
Infer — Static Imputation
Infer — Dynamic Imputation
Infer — Structural EM
Infer — Entropy-Based Imputations
The Information panel is identical in its functionality to the Information panel in Step 2 — Definition of Variable Types. Please refer to that topic for details.
In Step 2 — Definition of Variable Types of the five-step Data Import Wizard, you need to define variable types.
Step 2 contains four panels that relate to each other in their content and available actions.
With the radio buttons in the Type panel, you can define the type of each variable.
Before you start making your determinations, BayesiaLab has already made some guesses regarding the appropriate variable type, i.e., Discrete versus Continuous.
Furthermore, some variables have limited options regarding the variable type because of their distributions:
If a variable has the same value for all observations, it falls into the Unused variable type. Such a "not distributed" variable cannot be imported into BayesiaLab at all.
Variables that contain any text values cannot be declared Continuous variables.
Variables with Missing Values cannot be of the type Weight, Row Identifier, or Learn/Test.
To select a variable, click on the variable header or click anywhere inside the column in the Data panel.
You can perform the selection of multiple variables with keystroke combinations commonly used in spreadsheet editing:
Ctrl+Click: add a variable to the current selection.
Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
Ctrl+A: select all variables in the Data panel.
Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
The current selection is highlighted by showing the selected columns in a darker shade of their current color.
Discrete
The Discrete type considers each unique value of the variable a distinct state.
Any variable that contains text will be considered Discrete by default.
The maximum number of unique values that can be accommodated can be specified under Main Menu > Window > Preferences > Editing > Node > Maximum Number of States.
Continuous
The Continuous type applies to numerical variables, which must be discretized in Step 4 — Discretization and Aggregation.
If a variable contains more distinct integer values than a certain threshold, it will be considered Continuous.
You can specify this threshold under Main Menu > Windows > Preferences > Data > Import & Associate > Threshold for Assuming Integers as Continuous. The default threshold value is 5.
Learn more about Discrete and Continuous nodes in the Node Editor topic.
Weight
Weighting is often applied to surveys to make a survey sample representative of the demographics of the underlying population.
If your dataset contains such a Weight variable, select it by clicking on the corresponding column.
Then, select the Weight button in the Type panel.
Later, in Step 4 — Discretization and Aggregation, you can specify whether or not to normalize the Weight variable.
Learning/Test
For a dataset that has already been split into a Learning Set and a Test Set, you can use such an existing definition to import your data into BayesiaLab.
Both the Learning Set and the Test Set need to be in the same data table, rather than in separate files.
A binary indicator variable needs to identify each set with a unique code.
With a Learning/Test variable defined, in Step 4 — Discretization and Aggregation of the Data Import Wizard, you need to assign which of your codes corresponds to BayesiaLab's Learning and Test states.
Row Identifier
You can assign one or more variables to serve as Row Identifiers. The values of Row Identifiers are imported but not processed in any way. They serve as labels that are attached to each record.
There are numerous functions in BayesiaLab that allow you to look up what record in the dataset corresponds to what is currently on display on the screen.
For instance, Automatic Evidence-Setting displays the Row Identifier in the Status Bar.
By selecting the Unused button, you can skip the import of the selected variables. In previous versions of BayesiaLab, this option was also known as "Not Distributed."
Unused is automatically applied to variables containing only a single value across all observations, i.e., when the variable is "not distributed," hence the original name.
Unused variables will appear grayed out in the remaining steps of the Data Import Wizard.
The Multiple Typing panel allows you to quickly assign variable types across multiple variables.
Click Set All to Discrete to apply the Discrete type to all variables, if possible.
Click Set All to Continuous to apply the Continuous type to all variables, if possible.
By clicking either button, all previous type assignments are replaced.
You can automatically remove variables, i.e., set them to the Unused type, if the percentage of Missing Values in their column exceeds a certain threshold.
Click the Set Missing Values Threshold button.
From the pop-up window, set the percentage.
All variables that exceed the specified threshold are set to Unused.
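The rule is easy to express in code for comparison (a sketch in pandas; the 20% threshold is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, None, 4],
                   "B": [1, 2, 3, 4]})

threshold = 0.20  # maximum tolerated share of Missing Values per column
unused = [col for col in df.columns if df[col].isna().mean() > threshold]
# Column "A" has 50% Missing Values and would be set to Unused.
```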
The Information panel provides a range of statistics relating to the current type assignment of variables:
Number of Rows refers to the number of records in the to-be-imported dataset. In the context of datasets, rows, records, cases, samples, and observations all have equivalent meanings.
Discrete shows the absolute count of variables currently assigned to the Discrete type. The percentage refers to the proportion of Discrete variables among all variables, including the type Unused.
Continuous shows the absolute count of variables currently assigned to the Continuous type. The percentage refers to the proportion of Continuous variables among all variables, including the type Unused.
Others displays the count of all variables assigned to the types Row Identifier, Weight, or Learn/Test.
Unused shows the absolute count of variables currently assigned to the Unused type. The percentage refers to the proportion of Unused variables among all variables.
Missing Values displays the count of cells in the dataset that contain Missing Values. The percentage refers to the proportion of cells in the dataset that contain Missing Values, including all variable types, even Unused, Row Identifier, and Learning/Test.
Filtered Values displays the count of cells in the dataset that contain Filtered Values, as indicated by the asterisk (*). The percentage refers to the proportion of cells in the dataset that contain Filtered Values, including all variable types, even Unused, Row Identifier, and Learning/Test.
The Data panel visualizes the current variable selection and type assignment with colors (see Usage above).
Horizontal and vertical scrolling allows you to view the entire dataset that will be imported.
BayesiaLab requires the discretization of all Continuous variables, and in this screen, you need to specify how to discretize those variables.
The Discretization process determines how a Continuous variable will be imported into BayesiaLab, i.e.,
the number of intervals (or bins);
the values of the thresholds which define the ranges of the intervals.
These attributes define the transformation of the underlying Continuous variable in the dataset into a discretized Continuous node in BayesiaLab.
To learn more about the important distinction between Continuous and Discrete nodes, please see these topics:
Continuous Nodes
Discrete Nodes
Select one or more Continuous variables by clicking one of the headers or anywhere inside the corresponding columns.
The Discretization panel appears.
The first item in the Discretization panel is the Discretization Type drop-down menu.
The items on this list can be grouped into Automatic Discretization versus Manual Discretization.
The bottom item on the drop-down menu, Manual, refers to a Manual Discretization approach in which you have full control over thresholds, etc.
The remaining eleven items all refer to different kinds of Automatic Discretization.
However, even in Manual Discretization, you take advantage of the algorithms available with Automatic Discretization.
Manual Discretization
Automatic Discretization
Step 4 — Discretization and Aggregation requires you to make several more important choices before concluding the import process.
As opposed to the previous steps, which each consisted of a single screen, Step 4 provides one screen per variable type, i.e., six screens in total.
As you go from Step 3 to Step 4, the variable that you last selected in Step 3 remains highlighted.
Depending on the variable type, Step 4 starts with one of six possible screens, one for each variable type.
Note that for Row Identifier and Unused variables, no actions are available. Except for the Data panel, the corresponding screens are blank.
For all other variable types, we discuss all available options in detail in separate sections:
Weights
Learning/Test
Discretization
Aggregation
Click on the Weight variable in the Data panel, and the Normalize Weights checkbox appears as the only option on the screen.
You need to determine whether to apply Normalize Weights or not:
If yes, the Weights will be normalized so that the total number of cases considered by BayesiaLab for machine learning is equal to the actual number of samples in the dataset.
If no, the Weight variable will be treated as representing the actual number of observed cases. So, a weight of 10 for one observation would be treated and counted like ten instances of that same observation. As a result, the total number of cases considered by BayesiaLab would correspond to the population from which the weight was calculated.
This example illustrates the situation for a survey consisting of 10 observations:
If you normalize, BayesiaLab considers the correct proportions of the weighted samples but still only considers ten observations in total for learning purposes.
If you have specified a Weight variable, it will be taken into account in the Discretization and Aggregation algorithms.
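Conceptually, normalization rescales the weights so that they sum to the number of observations. A minimal sketch, using the same ten weights as the example table shown later in this section:

```python
import numpy as np

weights = np.array([10, 12, 8, 9, 11, 13, 7, 4, 15, 11])  # sums to 100

# Normalized weights sum to the number of observations (10), preserving
# the proportions while keeping the effective sample size at 10.
normalized = weights * len(weights) / weights.sum()
# -> [1.0, 1.2, 0.8, 0.9, 1.1, 1.3, 0.7, 0.4, 1.5, 1.1]
```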
BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.
In Step 1 — Data Structure Definition: Text File of the five-step Data Import Wizard, you need to define the dataset structure for BayesiaLab so that the data can be imported and interpreted correctly.
The Data Structure Definition window opens up.
Specify all Settings & Options (see below).
Many of the settings can be immediately reviewed and validated in the Data Preview panel. However, mischaracterized Missing Values or Filtered Values can go unnoticed and later introduce major problems, leading to misleading analysis results.
The Data Import Wizard will attempt to automatically identify the separator or delimiter of the fields in the data table.
However, there can be ambiguous situations in which you need to specify the separator by checking the appropriate box:
Tab
Semicolon
Comma
Space
Other
If you prepare a dataset externally for import into BayesiaLab, ensure that separators are unique and do not appear as content in any data field. So, if any data fields contain text with commas as content, you cannot use commas as the separator. In such a case, try a tab or semicolon.
The Encoding drop-down list allows you to select an alternative encoding for the dataset to be imported. This can become necessary for importing data from certain legacy systems.
Specifying the correct code for Missing Values is very important so that BayesiaLab can process such Missing Values appropriately.
The list shows a number of codes that are commonly used for Missing Values. However, this is not necessarily comprehensive, and your dataset may contain different codes, such as "." (dot) or "-9999", etc.
Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.
Deleting a default entry such as NR (for no response) may become necessary, for instance, if a data field contains the string "NR" as a valid value. That would be the case if your data set included New York Stock Exchange ticker symbols. In this context, "NR" would be the symbol of Newpark Resources, Inc. Unless you address this issue, all "NR" strings would be treated as Missing Values.
You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.
Just as important as the correct definition of Missing Values is a clear understanding of a Filtered Value.
A Filtered Value occurs when a variable cannot have any value for logical reasons. For instance, a demographic dataset could include a field Age at Retirement. In the record of a 16-year-old high school student, however, there can be no value for this field. This situation must not be treated as a Missing Value! A Missing Value implies that a value exists but is unknown. In the case of the student's record, a value is logically impossible, not missing. So, instead of a numerical value or a blank, you must specify a code that says that there can be no value. This is the purpose of assigning a Filtered Value code.
Importantly, you must encode any Filtered Values before importing your dataset into BayesiaLab. In BayesiaLab, you merely need to declare what code you used in your dataset to represent Filtered Values. BayesiaLab will create a Filtered State as an additional state in each node for which Filtered Values are encountered during data import.
Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.
You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.
In Data Preview, all Filtered Values are marked with an asterisk (*) in the data table.
Understanding the difference between Missing and Filtered Values is critically important.
Clicking the Define Sample button opens a window that allows you to sample records from your data source.
This is particularly useful for the preliminary analysis of large datasets. By default, BayesiaLab imports all records from the data source.
You can define a subset in three ways:
Random Sample — Size in Percent: specify the size of the random sample as a percentage of the original dataset size.
Random Sample — Size: specify the number of records in the sampled dataset.
Custom Range — First Row to Last Row: specify the range of records to be imported.
Checking the option Fixed Seed and specifying a number ensures that you can repeat exactly the same random sampling for each iteration of the import. This allows you to reproduce your results as you develop your model.
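For comparison, the three sampling modes map naturally onto pandas (the file name is hypothetical; random_state plays the role of the Fixed Seed):

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical dataset

# Random Sample — Size in Percent (here 10%), with a fixed seed.
sample_pct = df.sample(frac=0.10, random_state=42)

# Random Sample — Size (an absolute number of records).
sample_n = df.sample(n=1000, random_state=42)

# Custom Range — First Row to Last Row (here rows 100 through 199).
sample_range = df.iloc[100:200]
```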
By default, the Data Import Wizard loads the entire dataset as a Learning Set.
By clicking the Define Learning/Test Sets button, you can set aside a Test Set (or holdout sample).
You can define the Learning Set/Test Set split in three ways:
Random Test Set — Size in Percent: specify the size of the Test Set as a percentage of the original dataset size.
Random Test Set — Size: specify the number of records in the Test Set.
Custom Test Set — First Row to Last Row: select a specific range of records for a Test Set.
Checking the option Fixed Seed and specifying a number ensures that you can obtain the same Test Set with each iteration of the import. This allows you to reproduce your results and validation measures as you develop your model.
In addition to specifying a Learning Set/Test Set split here, you can define a split in other ways:
You can designate a variable in the original dataset to assign records to the Learning Set and Test Set. You can select such a variable in the next step of the Data Import Wizard: Step 2 — Definition of Variable Types.
Main Menu > Data > Data Set > Generate Learning/Test Split
Furthermore, you can remove the Learning Set/Test Set split at any time:
Main Menu > Data > Data Set > Remove Learning/Test Split.
The Options Panel allows you to manage the interpretation of the to-be-imported dataset.
Title Line:
By checking this option, BayesiaLab reads the first row of the dataset and uses its values as column headers.
If the values in the first row are not compatible, e.g., due to missing values or duplicate values, you are prompted to accept the proposed corrections, which include adding suffixes for duplicate names and substituting missing values with generic column headers, e.g., N0, N1, N2, etc.
End of Line Character:
With some files, it may be necessary to specify a certain character so that BayesiaLab can correctly detect the end of a row in a data table.
Consider Identical Consecutive Separators as One:
Check this box so that if you have multiple consecutive separators of the same type, e.g., “;;;”, the Data Import Wizard will treat them as a single separator.
Consider Different Consecutive Separators as One:
Check this box so that if you have multiple consecutive separators of any type, e.g., “;,|”, the Data Import Wizard will treat them as a single separator.
Double Quotes: choose whether to Remove them or treat them As String Delimiters.
Simple Quotes: choose whether to Remove them or treat them As String Delimiters.
Transpose:
By default, BayesiaLab expects the data source to be arranged in
columns corresponding to variables and
rows corresponding to samples, records, or observations.
Checking the Transpose option allows you to accept an alternate format, i.e.,
rows corresponding to variables and
columns corresponding to samples, records, or observations.
The transposed format is commonly used in bioinformatics. For instance, variables representing genes — sometimes tens of thousands — are arranged row by row. Observations — sometimes only a few dozen — are placed in columns side by side.
The data table at the bottom of the window provides a preview of how the Data Import Wizard sees and interprets your dataset.
Blank fields indicate a Missing Value.
Asterisks (*) mark Filtered Values. In the dataset shown below, for instance, Filtered Values were assigned to all males and post-menopausal women for the variable Pregnancy Status. For those two groups and for obvious reasons, pregnancy is impossible.
Horizontal and vertical sliders allow you to scroll and view the entire dataset. Alternatively, you can move your mouse's scroll wheel up and down.
If a variable name exceeds the column width, you can click on the divider between column headers and drag it into the desired position. Alternatively, double-click the divider to auto-fit the column width to the variable name.
In the following animation, we show a dataset that requires numerous settings to be adjusted for proper import:
The dataset uses the pipe character ("|") as a delimiter.
All fields are enclosed in double quotes.
Multiple, arbitrary codes are used for Missing Values:
"Refused"
"unknown"
"Not Applicable" is the code for Filtered Value used in this dataset.
Note that there are no standardized codes for Missing Values and Filtered Values. They can be as arbitrary as in this example. Therefore, it is of utmost importance that whoever prepares the dataset must convey the precise meaning of these codes to the analyst who imports the data into BayesiaLab.
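For reference, an equivalent import in pandas would declare the same settings explicitly (the file name is invented; note that pandas has no native notion of a Filtered Value):

```python
import pandas as pd

df = pd.read_csv(
    "buyers.csv",                       # hypothetical file
    sep="|",                            # pipe character as delimiter
    quotechar='"',                      # fields enclosed in double quotes
    na_values=["Refused", "unknown"],   # arbitrary Missing Value codes
)
# "Not Applicable" marks Filtered Values, a concept specific to BayesiaLab,
# and must therefore not be listed among the Missing Value codes.
```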
A Missing Values icon indicates the presence of at least one Missing Value in the corresponding variable.
A triangle icon indicates that variable-specific statistics are available. It appears on all variable headers with the exception of variables of the type Row Identifier and Unused.
Clicking on the triangle icon or the associated variable header brings up a table with variable statistics:
The Filter checkboxes allow you to uncheck/deselect specific values.
The checked box means that the value is included, which is the default condition.
The unchecked box means that the value is excluded and that all rows that contain that value will be filtered, i.e., removed.
This panel is only active if you select one of the variables that feature a small question mark icon. This icon indicates that the corresponding variable contains at least one Missing Value.
The Discretization screen is part of Step 4 — Discretization and Aggregation within the Data Import Wizard.
This screen is only available if you designated at least one Continuous variable in Step 2 — Definition of Variable Types.
At the bottom of the screen, the Data panel carries over from the previous steps, although now without any options.
This screen is only available if you designated a Weight variable in Step 2 — Definition of Variable Types.
If you do not normalize, BayesiaLab would consider a sample of 100 for learning purposes and presumably find spurious relationships. This "over-counting" by a factor of 10 has the same effect as reducing the Structural Coefficient to 0.1.
Open Data Source brings data into BayesiaLab to create a new Bayesian network.
In Modeling Mode, select Main Menu > Data > Open Data Source > Text File.
Click Next to proceed to the next step of the Data Import Wizard.
Right-click on the database icon in the Status Bar and select Generate Learning/Test Split.
Right-click on the database icon in the Status Bar and select Remove Learning/Test Split.
| Observation No. | Weight | Normalized Weight |
|---|---|---|
| 1 | 10 | 1.0 |
| 2 | 12 | 1.2 |
| 3 | 8 | 0.8 |
| 4 | 9 | 0.9 |
| 5 | 11 | 1.1 |
| 6 | 13 | 1.3 |
| 7 | 7 | 0.7 |
| 8 | 4 | 0.4 |
| 9 | 15 | 1.5 |
| 10 | 11 | 1.1 |
| Sum | 100 | 10 |
Tree is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Tree is a bivariate discretization method. It machine-learns a decision tree that uses the to-be-discretized variable to represent the conditional probability distribution of the Target variable. Once the tree is learned, it is analyzed to extract the most useful thresholds.
It is the method of choice in the context of Supervised Learning, i.e., if you plan to machine-learn a model to predict the Target variable.
At the same time, we do not recommend using Tree in the context of Unsupervised Learning. The Tree algorithm creates bins that are biased toward the designated Target variable. Naturally, emphasizing one particular variable would run counter to the intent of Unsupervised Learning.
Note that if the to-be-discretized variable is independent of the selected Target variable, it will be impossible to build a tree, and BayesiaLab will prompt you to select a univariate discretization algorithm.
All manually discretized variables can be used as a Target variable for Tree discretization.
Using a Target variable for Discretization does not create a Target Node in the network.
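BayesiaLab's exact procedure is internal to the software, but the core idea can be sketched with scikit-learn: learn a shallow decision tree that predicts the Target from the to-be-discretized variable, then read off the split thresholds (all names and parameters below are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x = rng.normal(size=1000).reshape(-1, 1)   # to-be-discretized variable
y = (x.ravel() > 0.5).astype(int)          # hypothetical Target variable

tree = DecisionTreeClassifier(max_leaf_nodes=4).fit(x, y)

# Internal nodes store split thresholds; leaf nodes are marked with -2.
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)
print(thresholds)  # candidate interval boundaries
```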
This screen is only available if you designated a Learning/Test variable in Step 2 — Definition of Variable Types.
Select the Learning/Test variable by clicking on its header or into the corresponding column.
Select BayesiaLab's learning and test labels from the drop-down lists to match the codes in your dataset.
Additionally, you can see the proportion of cases for each code in your dataset.
Given that you have a variable of the type Learn/Test, only the "learning" rows will be taken into account for Discretization and Aggregation. Otherwise, you would partially defeat the purpose of having a hold-out set.
Automatic Discretization covers numerous discretization algorithms that are part of Step 4 — Discretization and Aggregation of the Data Import Wizard.
Except for Manual, all items in the Type menu represent Automatic Discretization algorithms.
Most of these algorithms can also be accessed via the Generate a Discretization function within the Manual Discretization screen.
Selecting a Discretization algorithm applies variable by variable, i.e., you can use a different algorithm for each Continuous variable.
To select a variable, click on the variable header or anywhere inside the column.
You can perform the selection and deselection of multiple variables with keystroke combinations commonly used in spreadsheet editing:
Ctrl+Click: add a variable to the current selection.
Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
Ctrl+A: select all variables in the Data panel. However, selecting all variables is not useful here in Step 4, as there are no actions that can apply to all variable types.
Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
Click the Select All Continuous button to select all Continuous variables.
Note that this action will also select any variables which you have already discretized manually. As a result, you may override your previous choices.
Note that Continuous variables already discretized manually are highlighted in soft blue.
If you do not specify an algorithm for a variable that was not manually discretized either, the default Discretization algorithm with its default settings will be used.
You can set the default Discretization algorithm under Main Menu > Window > Preferences > Discretization.
For the following algorithms, a Log Transformation is available as an option:
Applying the Log Transformation is useful if you have a high density of values at the bottom end of the variable domain. This "stretches" the scale for small values approaching zero.
Note that the Log Transformation is only used temporarily for discretization purposes. Thus, the values of the thresholds and values of the intervals can all be interpreted based on the original scale.
For the following algorithms, the option Isolate Zeros is available:
Separating 0 into a separate interval can be useful for zero-inflated distributions so as to clearly separate small values from "absolutely nothing."
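Both options are easy to picture in code. A sketch of a temporary Log Transformation (assuming strictly positive values): the thresholds are computed on the log scale but reported on the original scale.

```python
import numpy as np

values = np.random.default_rng(1).lognormal(size=1000)  # right-skewed data

# Bin on the log scale (here: 4 equal-frequency intervals), then map the
# thresholds back so they remain interpretable on the original scale.
log_thresholds = np.quantile(np.log(values), [0.25, 0.50, 0.75])
thresholds = np.exp(log_thresholds)
```

Isolate Zeros would additionally reserve a dedicated interval for the value 0 before binning the nonzero values.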
Click Finish to perform the Discretization.
A progress bar displays the status of the Discretization process:
If a Filtered Value is defined for a Continuous variable, a new artificial interval with an infinitesimally small width of 10⁻⁷ will be added after the intervals defined in this step. This newly created state will serve as the Filtered State, and "*", i.e., the asterisk character, will be its State Name.
At its conclusion, BayesiaLab opens up a Graph Window with all imported variables now represented as nodes.
Simultaneously, a window pops up that offers you an optional Import Report, which is Step 5 of the Data Import Wizard.
Manual Discretization is one type of Discretization available in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Select Manual from the drop-down menu.
Several additional items and buttons appear on the left side, plus a Cumulative Distribution Function (CDF) is shown on the right. This CDF plot can help in selecting appropriate discretization intervals.
In the screenshot below, the variable Standing Height (cm) is selected, meaning that the CDF plot corresponds to that variable.
Click on the Density Function button, and the Probability Density Function (PDF) of the same variable appears.
Now the button reads Distribution Function, and by clicking it, you can toggle back to the CDF view.
By default, only one threshold is placed at the mean value of the corresponding variable.
This threshold appears as a horizontal line on the CDF and a vertical line on the PDF.
The CDF and PDF plots are interactive; you can add, delete, and modify thresholds.
The following instructions apply to both plots:
To select a threshold, left-click on that threshold.
The selected threshold is highlighted in red.
The remaining thresholds on the plot remain blue.
The precise numerical value of a selected threshold is shown in the Threshold Value field to the right of the plot.
To move a threshold, click and hold it, drag it to the desired position, and release to fix it.
The percentages displayed at the end of a selected threshold refer to the share of observations that fall into the intervals above and below this threshold.
Instead of moving the selected threshold with your cursor, you can type a specific value into the Threshold Value field.
To add an additional threshold, right-click with your cursor on the desired position.
To remove an existing threshold, right-click on it to delete it.
A zoom function is available for examining the plot in detail:
Hold the Ctrl key, click and hold the left mouse button, then move the cursor across the range you wish to focus on.
To revert to the default zoom, hold Ctrl, then double-click anywhere in the plot area.
You can zoom in repeatedly until you have reached the desired magnification level.
As an alternative to selecting a threshold by left-clicking, you can scroll through all thresholds using the Previous and Next buttons.
Note that as soon as a threshold is defined on a Continuous variable, it is considered Discretized, and the variable's data column is colored in soft blue.
The interactive CDF and PDF plots are similar to the editing functions available under Curve View in the Node Editor.
We re-use the dataset from the previous steps, so we can fast-forward to Step 4 and focus on that step.
While remaining on the Manual Discretization screen, you can also utilize the Generate a Discretization function.
It allows you to use the algorithms from Automatic Discretization but in a more controlled environment where you can closely observe the results of the Discretization.
Click on the Generate a Discretization button.
Then, select the Type from the drop-down menu, e.g., the R2-GenOpt algorithm. You have nine algorithms available, i.e., the univariate methods only.
Choose the number of Intervals, e.g., 5.
Set a Minimum Interval Weight, which defines the minimum prior probability of an interval in percent. The default value is 1%.
Note that you can set defaults for the above settings under Main Menu > Window > Preferences > Discretization.
Additionally, there are options for Log Transformation and Isolate Zeros, which we discuss in the context of Automatic Discretization.
Click OK to perform the Discretization.
In certain situations, you may carefully choose thresholds for a variable (see Manual Discretization Workflow Animation). Perhaps another variable, or multiple variables, should have exactly the same discretization. In this context, you can use the Transfer the Discretization Thresholds button.
Select the source variable from which you wish to copy the thresholds.
Click the Transfer the Discretization Thresholds button.
A new window opens up that allows you to select one or more target variables.
Select the target variables.
Click OK.
This checkbox is synchronized across Manual and Automatic Discretization processes.
If checked, BayesiaLab automatically creates Classes for each type of Discretization, i.e., all variables that are discretized with the same algorithm will belong to the same Class.
Note that variables that were discretized manually, even if you used the Generate a Discretization button, will all become members of the Class MANUAL.
You can review the Class memberships in the Class Editor after the data import process is complete.
This function allows you to load a Discretization Dictionary with saved Discretization Intervals and Discretization Methods.
This approach is particularly helpful when you repeatedly import datasets with the same variables for which you have already found a suitable discretization.
The following text file illustrates the syntax of a Discretization Dictionary.
R2-GenOpt* is a modified version of R2-GenOpt and uses a specific MDL score to choose the number of bins.
With 100 observations, even though we selected 8 bins, only 3 were created for the variable 8- Wrist girth.
With 1,500 observations, even though we selected 10 bins, only 5 have been created for AGN, and 6 for ALL.
R2-GenOpt* is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Supervised Multivariate is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Supervised Multivariate discretization algorithm focuses on representing the multivariate probabilistic dependencies involving a Target variable.
It utilizes Random Forests to find the most useful thresholds for predicting the Target variable.
Its function can be summarized as follows:
Data Perturbation generates a range of datasets.
For each perturbed dataset, a multivariate tree is learned to predict the Target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.
Extracting the most frequent thresholds produces the final discretization.
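As a rough sketch of the principle (not BayesiaLab's implementation), one can bootstrap the dataset, learn one tree per replicate, and keep the most frequent thresholds:

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                # predictor variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hypothetical Target

counts = Counter()
for _ in range(50):                          # Data Perturbation
    idx = rng.integers(0, len(X), len(X))    # bootstrap resample
    tree = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    t = tree.tree_
    for feat, thr in zip(t.feature, t.threshold):
        if feat >= 0:                        # internal (non-leaf) node
            counts[(feat, round(thr, 1))] += 1  # pool nearby thresholds

# The most frequent thresholds per variable form its discretization.
print(counts.most_common(5))
```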
The Supervised Multivariate algorithm takes into account the Minimum Interval Weight and can improve the generalization capability of the model.
Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature.
After the conclusion of the Data Import Wizard, the Supervised Multivariate discretization algorithm is also available from Main Menu > Learning > Discretization.
Note that the Supervised Multivariate discretization algorithm is not available via Node Context Menu > Node Editor > States > Curve > Generate a Discretization.
R2-GenOpt is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The R2-GenOpt algorithm utilizes a Genetic Algorithm to find a discretization that maximizes the R2 between the discretized variable and its corresponding (hidden) Continuous variable.
As such, it is the optimal approach for achieving the first objective of discretization, i.e., finding a precise representation of the values of a Continuous variable.
This algorithm takes into account the Minimum Interval Weight and can also create a specific bin for representing zeros if the Isolate Zeros option is set.
In Validation Mode, the R2 value between the Discretized variable and its corresponding Continuous variable can be retrieved in the Information Mode by hovering over the monitor.
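The R2 criterion itself is straightforward to state: replace each value by the mean of its interval and measure how much of the original variance this representation retains. A sketch with arbitrary thresholds:

```python
import numpy as np

values = np.random.default_rng(2).normal(size=1000)
thresholds = [-1.0, 0.0, 1.0]                # arbitrary discretization

bins = np.digitize(values, thresholds)
reconstructed = np.empty_like(values)
for b in np.unique(bins):                    # each bin's mean as its value
    reconstructed[bins == b] = values[bins == b].mean()

ss_res = np.sum((values - reconstructed) ** 2)
ss_tot = np.sum((values - values.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                     # share of variance retained
```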
K-Means is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The K-Means algorithm is based on the classical K-Means data clustering algorithm but uses only one dimension, which is the to-be-discretized variable.
K-Means returns a discretization that directly depends on the Probability Density Function of the variable.
More specifically, it employs the Expectation-Maximization algorithm with the following steps:
Initialization: random creation of K centers
Expectation: each point is associated with the closest center
Maximization: each center position is computed as the barycenter of its associated points
Steps 2 and 3 are repeated until convergence is reached.
Based on the K centers, the discretization thresholds are defined as the midpoints between consecutive centers: for the sorted centers c_1, ..., c_K, the threshold between c_i and c_(i+1) is (c_i + c_(i+1)) / 2.
The following figure illustrates how the algorithm works with K=3.
For example, applying a three-bin K-Means Discretization to a normally distributed variable would create a central bin representing 50% of the data points and one bin of 25% each for the distribution's tails.
Without a Target variable, or if little else is known about the variation domain and distribution of the Continuous variables, K-Means is recommended as the default method.
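A one-dimensional sketch with scikit-learn, following the description above (K=3; in one dimension, the boundary between two neighboring clusters is the midpoint of their centers):

```python
import numpy as np
from sklearn.cluster import KMeans

values = np.random.default_rng(3).normal(size=1000).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values)
centers = np.sort(km.cluster_centers_.ravel())

# Thresholds at the midpoints between consecutive centers.
thresholds = (centers[:-1] + centers[1:]) / 2
```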
The Density Approximation discretization detects changes in the sign of the derivative of the Probability Density Function (PDF) in order to identify local minima and maxima.
Between each local minimum and maximum, the algorithm creates a threshold.
Also, the algorithm automatically detects the optimal number of bins, although you can specify the maximum number of bins.
The minimum size permitted for bins is 1% of the data points.
Density Approximation is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
This multivariate discretization method is based on analyzing the relationship between variables.
The Unsupervised Multivariate discretization algorithm focuses on representing multivariate probabilistic dependencies using Random Forests.
Its functionality can be described as follows:
A new dataset is created as a clone of the original one.
In this new dataset, each variable is independently shuffled to render all the variables independent while keeping the same statistics for each variable.
The cloned dataset is concatenated with the original dataset. Then, a target variable is created to differentiate the clone from the original, indicating the independent set versus the original dependent set.
Various datasets are generated from this concatenated dataset with Data Perturbation.
For each perturbed dataset, a multivariate tree is learned to predict the target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.
Extracting the most frequent thresholds produces the discretization.
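The clone-and-shuffle construction described above is compact to express (a sketch of the dataset construction only; the column names are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(500, 3)), columns=["A", "B", "C"])

# Shuffle each column independently: marginal statistics are preserved,
# but all dependencies between the variables vanish.
clone = df.apply(lambda col: rng.permutation(col.to_numpy()))

# Concatenate and label: 1 = original (dependent), 0 = clone (independent).
data = pd.concat([df, clone], ignore_index=True)
data["target"] = [1] * len(df) + [0] * len(clone)
```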
Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature, particularly when the number of variables is large.
The Unsupervised Multivariate discretization algorithm is also available after the data import via Main Menu > Learning > Discretization.
However, it is not available in the Node Editor (Node Context Menu > Edit > Curve > Generate a Discretization).
Unsupervised Multivariate is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Equal Frequency is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
This Equal Frequency algorithm defines thresholds so that each interval contains the same number of observations.
This approach typically produces a uniform distribution.
As a result, the shape of the original density function is no longer apparent upon discretization.
This also leads to an artificial increase in the entropy of the system, directly affecting the complexity of machine-learned models.
However, this type of discretization can be useful — once a structure is learned — for further increasing the precision of the representation of continuous values.
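Equal-frequency thresholds are simply quantiles. A sketch for four intervals:

```python
import numpy as np

values = np.random.default_rng(5).exponential(size=1000)

# Four intervals with approximately equal counts: the thresholds are
# the 25th, 50th, and 75th percentiles.
thresholds = np.quantile(values, [0.25, 0.50, 0.75])
```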
Open Data Source (Data Import Wizard) brings data into BayesiaLab to create a new Bayesian network, while Associate Data Source (Data Association Wizard) adds new data to a pre-existing network.
BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.
There are a total of six steps in the Data Association Wizard, which are mostly identical to the steps in the Data Import Wizard.
To launch the Data Association Wizard for a data table in a:
text file, select Main Menu > Data > Associate Data Source > Text File.
database, select Main Menu > Data > Associate Data Source > Database.
See Step 1 of the Data Import Wizard.
See Step 2 of the Data Import Wizard.
Additionally, clicking the Unmatched Columns button displays all the columns in the database that are not in the network.
The Unmatched Columns window allows you to select whether to use or not use the unmatched columns from the new dataset.
This step links the variables in the dataset to the nodes of the network.
As such, this step depends on the three previous steps and the selection of variable types.
Here you can define how the variables in the to-be-associated dataset will be mapped to the nodes already in the network.
The following assignments are possible:
Discrete variable in the dataset → Discrete node in the network
Discrete variable in the dataset → Continuous node in the network
Continuous variable in the dataset → Continuous node in the network
If variables in the dataset have the same name and type as existing nodes in the network, BayesiaLab will automatically propose an association.
You can proceed in the same way for the continuous node N. You can also select and add several nodes at the same time.
Zone 3 contains the buttons used to add or remove associations.
Zone 4 contains the list of associations. It can also contain added variables from the database that will be treated as new nodes in the network. Double-clicking an association displays, if necessary, a dialog used to edit a discrete or a continuous association. As you can see, some associations show a warning icon. This icon indicates that unusual behavior is present in those associations.
Zone 6 contains three buttons. The first and second buttons automatically extend the minimum and maximum of each continuous node that does not fit the database's limits. The third button automatically filters out each row that does not fit the network's limits.
When you want to add or edit an association between a discrete column of the database and a discrete or continuous node, a dialog box appears:
Zone 3 contains the buttons to add or remove state associations.
By default, the database's states that are identical to the network's states, to the aggregates, or to the states' long names will be automatically linked.
If filtered values exist in the database but are not declared in the network, it is possible to merge them with the specific state *, if it exists. In this case, this state will be automatically defined as filtered for each concerned node.
When you want to add or edit an association between a continuous column of the database and a continuous node, a dialog box appears:
This dialog is displayed only if the limits of the variable from the database are outside the limits of the node from the network.
By default, the limits of the node of the network are used and all the values outside these limits will be removed from the database. If you want to keep them, use the corresponding options.
If filtered values exist in the database but are not declared in the network, it is possible to merge them with the specific state *, if it exists. In this case, this state will be automatically defined as filtered for each concerned node.
This step occurs only when some columns of the database are not linked to nodes of the network but are distributed. These columns will create new nodes in the network; they must be discretized if they are continuous, and their states can be aggregated if they are discrete.
Same as Step 4 of the Data Import Wizard.
The modified nodes table:
For discrete nodes, it indicates, where applicable, the correspondence between the states in the database and in the network.
For continuous nodes, it indicates, where applicable, the initial minimum of the data and the retained final minimum, as well as the initial maximum and the retained final maximum.
The hidden nodes table: indicates the nodes that are in the network but do not have any associated data.
The added nodes table: indicates the list of variables added to the network from the database. This table is the same as in the import report.
Zone 1 contains the list of the variables contained in the database that are not yet associated with a node of the network or added as a new node. As you can see, the variable Geographic Zone contained in the database is discrete and has no corresponding node in the network. If you want to add it as a new node, select it and click the corresponding button; otherwise, do nothing.
Zone 2 contains the list of the nodes contained in the network that are not yet associated with a column of the database. If you want to link a variable from the database to a node of the network, simply select each one and press the corresponding button. All remaining nodes in this list will not be linked to a column of the database and will be considered hidden nodes in the network.
Zone 5 contains a list with the details of each association warning located in Zone 4. If you select a warning in the list, the corresponding association is selected in Zone 4. When the mouse hovers over the list, a tooltip shows the content of the warning. Double-clicking a warning opens the corresponding association editor so you can verify or modify the association. If you want to remove an association or an added node, select it in the list and press the corresponding button.
Zone 1 contains the list of the states from the database that are not yet linked to a state of the node or directly added as a new state. To perform an association, select a state in Zone 1 and a state in Zone 2 and press the association button. If you want to add a state without linking it to a state of the node, simply select it and press the corresponding button.
Zone 2 contains the list of the states of the node from the network. This list is never modified: even when an association is made, the corresponding state remains in the list and can be reused for another association. This allows linking several states from the database to the same state of the node in the network. To perform an association, select a state in Zone 1 and a state in Zone 2 and press the association button. The state from Zone 1 will be removed, and the association will be added to Zone 4.
Zone 4 contains all the associated and added states. An association can be removed by selecting it in the list and pressing the corresponding button.
After the associations are made, the dialog looks as shown. If some states remain unlinked in Zone 1, they will be removed from the database.
After a successful data association, it is possible to display the HTML association report. This report may contain three tables:
Unlike the Discretization step, which is mandatory for Continuous variables, Aggregation is optional for Discrete variables.
Note that an analogous function, Generate Aggregations, is also available for Discrete nodes in the States tab of the Node Editor.
This function is useful when dealing with a large number of values in a Discrete variable. Once imported, the large number of resulting Node States would make it difficult to discover any relationships with that node.
The Aggregation function in the Data Import Wizard is available for single Discrete variables and for multiple Discrete variables.
Please see the usage instructions and examples in the corresponding sub-topics:
Aggregation of Single Variable
Aggregation of Multiple Variables
Similar to the workflow for the Aggregation of a Single Variable, you can also perform an Aggregation of Multiple Variables.
We use the same auto buyer survey dataset to illustrate the process. In the auto industry, numerous schemes are used to group vehicle types and body styles into so-called segments. Each segment carries a descriptive name, e.g., Compact Car, Full-Size SUV, Minivan, Mid-Size Pickup, Mid-Size Crossover. In our dataset, we have four variables, which each represent such a segmentation scheme. While all these segmentation schemes roughly convey the same information, they differ in their granularity: for instance, variable Segmentation 3 has 23 states; Segmentation 4 has 33. Our objective is now to reduce each one of the segmentation schemes down to three states.
This time, instead of Price, we use the variable MPG - Combined as a target. It represents the survey respondents' estimates of their vehicles' combined fuel economy in miles per gallon (MPG). In other words, we want to create a new aggregation for each segmentation scheme based on fuel economy. Also, the variable MPG - Combined only has two intervals, with one threshold at 22.5. This number has been used in the past as a criterion for so-called "gas guzzlers." So, we are going to use the state <=22.5 as a proxy for poor fuel economy. As a result, we expect each of the existing segments to be "remapped" according to fuel economy.
In the Data panel, using Ctrl+Click or Shift+Click, select the variables Segmentation 1, Segmentation 2, Segmentation 3, and Segmentation 4.
This brings up the Multiple Aggregation panel.
Set Target to MPG - Combined, and State to <=22.5.
Set Final Number of States to 3.
Click the Aggregate button to perform the aggregation.
Note that there will be no immediate feedback regarding the results of the aggregation.
Rather, we can only see the results of the aggregation in the Import Report in Step 5 of the Data Import Wizard.
Click Finish to complete Step 4 of the Data Import Wizard.
BayesiaLab opens a new Graph Window with all variables now presented as nodes.
Simultaneously, a prompt comes up offering to display the Import Report.
Click Yes, and the Import Report — featuring all variables, not just the aggregated variables — appears in a new window.
The Import Report is the fifth and final step of the Data Import Wizard.
Depending on the size of your dataset, the selected discretization algorithms, and the number of Missing Values, this may take anywhere from a fraction of a second to several minutes.
At the same time, a prompt appears, offering you the Import Report.
Click Yes to bring up the Import Report window.
The first column displays the names of the imported variables.
The second column displays the type associated with each variable.
For a Weight variable, no further information is available or provided.
For a Learn/Test variable, the association with BayesiaLab's Learn and Test labels is shown, plus the corresponding number of cases.
The third column shows all States of each variable, if applicable.
The right part of the report depends on the variable type:
Discrete Variables:
The report shows each state and, adjacent to it, any aggregations that were performed. Furthermore, the color of the rightmost cell in the row highlights that an aggregation took place.
Continuous Variables:
The names of the discretized states are shown.
The next two columns to the right report the lower and upper thresholds for each interval.
The rightmost column is colored according to the discretization algorithm used.
Asked/Obtained indicates the requested discretization algorithm versus the one that was used as the fallback option.
Note that you can save this Import Report as an HTML file, so you can subsequently open the fully-formatted report in Excel or any other spreadsheet software.
To illustrate all related workflows, we use an American auto buyer satisfaction survey containing 42,397 responses. Each record contains attributes of the purchased vehicle, such as make (or brand), model, body style, vehicle segment, number of cylinders, transmission, price paid, self-reported fuel economy, plus hundreds of other variables.
First, we want to manually aggregate all 37 automobile brands that appear in the survey into just two states, i.e., Premium Brands and Non-Premium Brands.
This manual aggregation will be based exclusively on our subjective perception of the auto industry as of 2009, which is when this particular survey was conducted.
Click on the Brand variable in the Data panel.
From the States list on the left, select the values you wish to aggregate using Shift+Click or Ctrl+Click.
Then, click the Aggregate button.
The newly-formed, aggregated state appears in the Aggregates list on the right.
By default, the original values are concatenated using the "+" symbol as a delimiter. An underscore "_" is added as a prefix.
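For example, aggregating the states BMW and Audi (hypothetical selections from this dataset) would produce the aggregated state name _BMW+Audi.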
As necessary, you can select more values from the States list and create additional aggregated states.
In the list of Aggregates, you can now replace the automatically-generated state names with more meaningful ones.
You can now proceed to any other variable or click Finish to conclude the Data Import Wizard.
Continuing with the previous example, we now perform an aggregation of the same variable, Brand. Now, however, we use each brand's correlation with Price as a guide instead of our judgment.
For the purpose of this demonstration, we have already discretized the Price variable manually into three (arbitrary) intervals using two thresholds, i.e., $25,000 and $45,000.
We now want to use the correlation of each brand with the top interval, i.e., $45,000+, as a measure of its "premium appeal" so that we can reduce the 37 brands into three states, Mainstream, Premium, and Luxury.
For reference, 8.65% of all survey responses reported a vehicle purchase price of $45,000 or higher.
Click on the Brand variable in the Data panel.
Click the Show Correlations box.
Select Target and State.
Review the values shown in the Correlations column. By hovering with your cursor over the Correlation bars in each row, a Tooltip displays the percentage difference of the corresponding row versus the marginal value.
The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.
Select the states to aggregate using Ctrl+Click.
Once you have selected the values, click the Aggregate button.
The newly aggregated values now appear as a single item in the Aggregates list.
Review the newly aggregated states and, if necessary, assign new names to replace the ones that were generated automatically.
To reverse the aggregation, select the aggregated items in the Aggregates list and click Delete.
The principal difference from the manual workflow is that you don't select the to-be-aggregated values manually but rather specify thresholds that determine the aggregation.
Click on a Discrete variable in the Data panel.
Click the Show Correlations box.
Select Target and State.
Review the values shown in the Correlations column. By hovering with your cursor over the Correlation bars in each row, a Tooltip displays the percentage difference of the corresponding row versus the marginal value.
The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.
Now, instead of manually selecting the values you want to aggregate, click the Automatic Aggregation button.
The Automatic Aggregation window opens up.
The colored bar at the top visualizes the percentage differences versus the marginal probability of the selected state of the target.
In our example, there is one brand, Mercury, which had no observations in the $45,000+ interval. As a result, it marks the bottom end of the spectrum, i.e., it is 8.65 percentage points below the marginal probability.
On the other end of the spectrum, Porsche is 83.97 percentage points above the marginal probability.
A default threshold is shown at 0, marked by the pink-to-red color change in the bar.
You can manually add thresholds by right-clicking on the bar.
As soon as you add a threshold, a corresponding entry appears in the list below.
Right-clicking again on an existing threshold removes that threshold.
You can move an existing threshold by clicking on it and then dragging it to the desired value.
Also, in the table below the colored bar, you can type in a threshold value.
By clicking OK, you confirm the specified thresholds, and all values in the States list will be aggregated accordingly.
Alternatively, you can click on Generate Aggregates and specify the desired number of intervals.
You obtain a set of aggregation thresholds, which you can further modify or accept by clicking OK.
Now you have a new set of states in the list of Aggregates.
After you click Finish in the Data Import Wizard, progress bars inform you about the status of the following steps:
Data Discretization:
Dataset Creation:
Missing Values Estimation:
Once completed, BayesiaLab opens up a new Graph Window with all imported variables now represented as nodes.
Note that this report is entirely optional. Whether or not you display it does not affect the completion of the Data Import Wizard.
Individual variables can be aggregated manually or automatically in the Data Import Wizard.
In addition to the manual aggregation described above, BayesiaLab can support you in making aggregation decisions. For this purpose, BayesiaLab can show how the original values of the to-be-aggregated variable correlate with those of other variables.
The Correlation-Aided Automatic Aggregation is very similar to the correlation-aided manual aggregation described above. So, the initial steps are analogous to the manual workflow.
To associate a Dictionary for Arc properties, select
Main Menu > Data > Associate Dictionary > Arc >
and then select the property from the submenu:
Arcs
Specifies the addition or removal of arcs for the currently active Bayesian network. If an arc removal is specified, it will precede any addition of an arc.
Before adding arcs, any constraints applicable to the active Bayesian network and the Temporal Indices will be checked. If a specified arc addition is inconsistent with the existing constraint, the arc won't be added.
Syntax Examples:
N1->N2=true adds an arc from N1 to N2.
N1->N2=false removes the arc from N1 to N2.
N1<-N2=true adds an arc from N2 to N1 (the reversed arrow symbol <- produces an arc in the opposite direction).
Note that you need to add an escape character \ before any spaces in node names. Otherwise, a space will be interpreted as a delimiter:
N\ 1->Node\ 2=true adds an arc from N 1 to Node 2.
Instead of the -> characters, you can also use a space, the equal sign =, or -- as the delimiter between the start node and the end node. With these alternative delimiters, the order of the nodes determines the arc direction.
N1 N2=true adds an arc from N1 to N2.
N1=N2=true adds an arc from N1 to N2.
N1--N2=true adds an arc from N1 to N2.
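For illustration, a minimal Arcs dictionary file could combine several such entries; N1, N2, and N3 are placeholder node names:
N1->N2=true
N2->N3=true
N1->N3=false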
Forbidden Arcs
Specifies the addition or removal of Forbidden Arcs between nodes and classes
Syntax Examples:
N1->N2 adds a Forbidden Arc from N1 to N2.
N1--N2 adds a Forbidden Arc between N1 and N2.
ClassA->ClassB applies Forbidden Arcs from any nodes in ClassA to any nodes in ClassB.
N1 N2 removes any existing Forbidden Arc between N1 and N2. Note the space in the syntax, which triggers the removal of the Forbidden Arc.
Arc Comments
Adds, updates, or removes Arc Comments on arcs in the active network. Arc Comments are stored in HTML format.
Syntax Examples:
N1->N2=<p>This is a sample <b>Arc Comment</b>.</p> adds an Arc Comment to the arc between N1 and N2.
N1->N2= removes an existing Arc Comment from the arc between N1 and N2.
The added Arc Comment can be edited in the Arc Editor: Arc Contextual Menu > Edit
Arc Colors
Defines colors for arcs in the active network. You can specify the color for each arc individually by providing the color's hex code.
Syntax Examples:
N1->N2=000000 changes the color of the arc between N1 and N2 to black.
N1->N2=FF0000 changes the color of the arc between N1 and N2 to bright red.
Note that there is no option to revert an arc color to the default color. When changing Arc Colors via a Dictionary, the colors must always be specified explicitly.
Structural Priors
Assigns Structural Priors to arcs in the active network.
Fixed Arcs
Applies Fixed Arcs to the active network or removes them.
Syntax Examples:
N1->N2=true changes the arc between N1 and N2 to a Fixed Arc.
N1->N2=false changes the arc between N1 and N2 back to a normal, non-fixed arc.
Dictionaries offer a convenient way to manage a large set of properties related to a Bayesian network using text files with a human-readable syntax.
Dictionaries are plain text files that can be opened and edited outside of BayesiaLab in any text editor.
Using Dictionaries, you can export the properties of a given network or associate properties that you previously saved.
Dictionaries are specific to the elements of a Bayesian network, e.g., Arcs, Nodes, and States and their respective properties.
Dictionary File Structure
As indicated by the syntax, the name of a node, class, or state in the text file cannot contain equal, space, or tab characters. If node names contain such characters in the network, those characters must be preceded by a backslash (\) in the text file: for example, the node named Visit Asia is written Visit\ Asia in the file.
To explicitly differentiate a name that is shared by a class, a node, or a state, add a suffix at the end of the name: "c" for a class, "n" for a node, and "s" for a state.
If your network contains non-ASCII characters, you must save your dictionaries with UTF-8 (Unicode) encoding. For example, in MS Excel, choose "Save As" and select "Unicode Text (*.txt)" as the file type; in Notepad, choose "Save As" and select "UTF-8" as the encoding. If your file contains only ASCII characters, you can keep the default platform-dependent encoding, but we strongly encourage UTF-8 (Unicode) encoding so that dictionary files do not depend on the user's platform. That way, for example, a Chinese dictionary can be read by a German user without any problem, regardless of the platforms used. If you are not sure how to save a file with UTF-8 encoding, export a dictionary with BayesiaLab, modify and save it with any text editor, and load it back into BayesiaLab.
BayesiaLab offers learning algorithms that allow you to generate a Bayesian network from data.
However, with a given Bayesian network, BayesiaLab can also generate data.
For this purpose, BayesiaLab draws samples from the Joint Probability Distribution encoded by the Bayesian network and saves the obtained samples as observations.
Select Main Menu > Data > Generate Data.
You can choose to generate the data as an internal database. You can also specify the rate of missing values and, if the database is written to a file, use the states' long names. In addition, you can generate a database with test examples by specifying the desired percentage.
State:
State Renaming: allows renaming each state of each node with a new name.
State Values: allows associating a numerical value with each state of each node.
State Long Names: allows associating with each state of each node a long name that is more explicit than the default state name. This name can be used in the various database export formats, in the HTML reports, and in the Monitors.
Filtered States: allows defining one state of each node as a Filtered State.
Node Renaming: allows renaming each node with a new name. These new names must, of course, all be different.
Comments: allows associating a comment with each node that is in the file.
Classes: allows organizing nodes in subsets called classes. A node can belong to several classes at the same time. Classes make it possible to generalize some node properties to all nodes belonging to the same class. They also allow creating constraints on arc creation during learning.
Colors: allows associating colors with the nodes or classes that are in the file. Colors are written as Red-Green-Blue values with 8 bits per channel in hexadecimal (web) format: for example, red is 255 red, 0 green, 0 blue, which gives FF0000. Green gives 00FF00, yellow gives FFFF00, etc.
Images: allows associating images with the nodes or classes that are in the file. Images are represented by their paths relative to the directory containing the dictionary.
Costs: allows associating a cost with each node. A node without a cost is considered Not Observable.
Temporal Indices: allows associating temporal indices with the nodes that are in the file. These temporal indices are used by BayesiaLab's learning algorithms to take into account constraints on the probabilistic relations, such as not adding arcs from future nodes to past nodes. The rule used to add an arc from node N1 to node N2 is:
If the temporal index of N1 is positive or null, the arc from N1 to N2 is only possible if the temporal index of N2 is greater than or equal to the index of N1. For example, if N1 has index 2 and N2 has index 1, an arc from N1 to N2 is not allowed.
Local Structural Coefficients: allows setting the local structural coefficient of each specified node or each node of each specified class.
State Virtual Numbers: allows setting the state virtual number of each specified node or each node of each specified class.
Locations: allows setting the position of each node.
Dictionary File Structures
The states' long names can be saved instead of the state names. If the user wants to save continuous values, numerical values are generated randomly within each interval. If the data are generated in Validation Mode, the current evidence is taken into account.
Arc
Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, true for an added arc or false for a removed arc. The last occurrence is always chosen.
Forbidden Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class.
Comments
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, comment. The comment can be any single-line character string (in HTML or not). The last occurrence is always chosen.
Colors
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, color. The color is defined as a Red-Green-Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. The last occurrence is always chosen.
Fixed Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, true for a fixed arc or false for a non-fixed arc. The last occurrence is always chosen.
Node
Node Renaming
Name of the node, Equal, Space, or Tab, new node name. The new name must be valid (different from t or T and without a question mark). A node should be present only once; otherwise, the last occurrence is chosen.
Comments
Name of the node or class, Equal, Space, or Tab, comment. The comment can be any single-line character string (in HTML or not). A node should be present only once; otherwise, the last occurrence is chosen.
Classes
Name of the node, Equal, Space, or Tab, name of the class. The class name can be any character string. A node present several times will be associated with several classes.
Colors
Name of a node or class, Equal, Space, or Tab, color. The color is defined as a Red-Green-Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. A node should be present only once; otherwise, the last occurrence is chosen.
Images
Name of a node or class, Equal, Space, or Tab, path to the image relative to the directory containing the dictionary. The image path must be a valid relative path or an empty string. A node should be present only once; otherwise, the last occurrence is chosen.
Costs
Name of the node, Equal, Space, or Tab, value of the cost, or empty to make the node Not Observable. The cost is an empty string or a real number greater than or equal to 1. A node should be present only once; otherwise, the last occurrence is chosen.
Temporal Indices
Name of the node, Equal, Space, or Tab, value of the index, or empty to delete an existing index. The index is an integer. A node should be present only once; otherwise, the last occurrence is chosen.
Local Structural Coefficients
Name of the node, Equal, Space, or Tab, value of the local structural coefficient, or empty to reset it to the default value of 1. The local structural coefficient is an empty string or a real number greater than 0. A node should be present only once; otherwise, the last occurrence is chosen.
State Virtual Numbers
Name of the node, Equal, Space, or Tab, virtual number of states, or empty to delete an existing number. The state virtual number is an empty string or an integer greater than or equal to 2. A node should be present only once; otherwise, the last occurrence is chosen.
Locations
Name of the node, Equal, Space, or Tab, position. The position is represented by two real numbers separated by a Space: the first number represents the node's x-coordinate, and the second its y-coordinate. A node should be present only once; otherwise, the last occurrence is chosen.
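As a minimal sketch with hypothetical node names and coordinates, a Locations dictionary could contain:
N1=120.0 85.5
N2=240.0 85.5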
State
State Renaming
Name of the node or class, dot (.), name of the state, Equal, Space, or Tab, new state name. Alternatively, to rename the state for all nodes: state name, Equal, Space, or Tab, new state name. The new name must be a valid state name. A state should be present only once; otherwise, the last occurrence is chosen.
State Values
Name of the node or class, dot (.), name of the state, Space or Tab, real value. Alternatively, to associate a value with a state regardless of the node: state name, Equal, Space, or Tab, real value. The value is a real number. A state should be present only once; otherwise, the last occurrence is chosen.
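For illustration, assuming a hypothetical node N1 with states Low and High, a State Values dictionary could contain:
N1.Low 0
N1.High 1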
State Long Names
Name of the node or class, dot (.), name of the state, Equal, Space, or Tab, long name. Alternatively, to associate a long name with a state regardless of the node: state name, Equal, Space, or Tab, long name. The long name is a character string. A state should be present only once; otherwise, the last occurrence is chosen.
Filtered States
Name of the node or class, dot (.), name of the filtered state. Alternatively, to set the filter property for the state regardless of the node: name of the filtered state. A state should be present only once; otherwise, the last occurrence is chosen.
When working with multiple networks that contain the same nodes, or at least some of the same nodes, it can be useful to share an Evidence Scenario File between them as well.
For that purpose, you can export an Evidence Scenario File and subsequently associate it with another network or use it with a WebSimulator.
Main Menu > Data > Evidence Scenario File > Export.
Then choose a file name and click Save.
The Evidence Scenario File is now saved in a human-readable and easily editable text format.
This allows you to modify the Evidence Scenario File with a text editor, e.g., to add a number of new Evidence Scenarios.
Please see the sub-topic Evidence Scenario File Syntax for a detailed discussion of the format.
In BayesiaLab, you can manage sets of actual or potential observations in a Bayesian network using Evidence Scenario Files.
For instance, an Evidence Scenario File can serve as a convenient way to manage multiple sets of assumptions, such as what-if scenarios. This is particularly helpful when scenarios contain many individual assumptions. Imagine the business case of an airline represented as a Bayesian network. It would have to include assumptions regarding travel demand for all origin-destination pairs. Manually setting and modifying assumptions for hundreds of nodes would not be practical.
An Evidence Scenario File consists of one or more Evidence Scenarios.
And, each Evidence Scenario contains one or more node-specific observations, as illustrated below:
Applying an Evidence Scenario means setting the stored pieces of evidence to the corresponding nodes.
Note that evidence cannot be set on Not Observable Nodes, i.e., nodes that have a Cost of 0 (see Cost Management).
With a given Bayesian network, any current observation on a node or sets of observations set on multiple nodes can be recorded as an Evidence Scenario. As soon as you store an Evidence Scenario, BayesiaLab "starts a tab" by creating an internal Evidence Scenario File.
Four types of evidence can be saved as an Evidence Scenario:
Hard Evidence
Likelihood Evidence
Probabilistic Evidence
Numerical Evidence
To learn more about setting evidence, please see the section on Setting Evidence in Contextual Menu of Monitors.
Then, enter an optional comment in the pop-up window and assign a Weight to the Evidence Scenario you are storing. If you don't enter a comment, the Evidence Scenario will merely be indexed sequentially, starting with 0.
Click OK to confirm.
You can add further Evidence Scenarios to the ones already stored in the internal Evidence Scenario File.
Upon selecting (and therefore applying) an Evidence Scenario, the corresponding comment, if available, appears in the Status Bar.
Note that an Evidence Scenario File is saved with the Bayesian network file. So, reopening the saved network makes all stored Evidence Scenarios available again.
In addition to recalling Evidence Scenarios one by one, you can also use them in BayesiaLab batch-processing functions:
Batch Labeling
Batch Inference
Batch Joint Probability
Batch Outlier Explanation
In this context, the Evidence Scenario File provides the observations in the same way as an internal or external dataset.
To store an observation as an Evidence Scenario, click the icon.
Now, an additional icon in the Status Bar indicates that there is an Evidence Scenario File.
Right-clicking the icon in the Status Bar brings up the list of stored Evidence Scenarios, enumerated by an index and, if available, with corresponding comments.
So, the next time you click the icon, the pop-up window asks whether you want to append the new Evidence Scenario to the list or replace a particular existing Evidence Scenario.
To apply (or recall) a stored Evidence Scenario, right-click on the Evidence Scenario File icon in the Status Bar and click on the scenario you want to apply to the network.
Also, hovering over the Evidence Scenario File icon with your pointer brings up the number of available Evidence Scenarios.
You can remove the current Evidence Scenario File by left-clicking on the icon.
As with BayesiaLab's Dictionaries, the syntax of an Evidence Scenario File is straightforward. However, we need to distinguish between the syntax for Contemporaneous and Temporal networks:
Each line of an Evidence Scenario File represents one Evidence Scenario.
Encoding an Evidence Scenario always follows the same pattern, with the node name and the evidence separated by a colon (:). The optional scenario name follows after a double slash (//).
?<NodeName>?:<Evidence>//<ScenarioName>
Evidence can be encoded in several ways in an Evidence Scenario File:
Hard Evidence:
?<NodeA>?:<State1>//Scenario1
Numerical Evidence:
?<NodeB>?:m{<value>}//Scenario2
Probabilistic Evidence:
?<NodeC>?:p{<StateA>:0.3;<StateB>:0.5;<StateC>:0.2}//Scenario3
Likelihood Evidence:
?<NodeD>?:l{<StateX>:1;<StateY>:0.5}//Scenario4
To encode multiple pieces of evidence in one Evidence Scenario, simply separate the individual pieces of evidence with a semicolon. The scenario name remains at the end of the line, separated by a double slash.
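For illustration, a single scenario combining Hard and Numerical Evidence on two placeholder nodes could read:
?<NodeA>?:<State1>;?<NodeB>?:m{<value>}//Scenario5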
For Temporal Bayesian networks, the syntax of the Evidence Scenario File is slightly different. Here, each line in the text file refers to a time step, in which the evidence specified in that line will be applied.
Each line starts with an integer value that represents the time step, in which the evidence of that line will be set.
Evidence can be encoded in several ways in an Evidence Scenario File:
To encode multiple pieces of evidence in one Time Step, simply separate the individual pieces of evidence with a semicolon.
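As a sketch only, assuming the time step integer is separated from the evidence by a space, a Temporal Evidence Scenario File could contain:
0 ?<NodeA>?:<State1>
1 ?<NodeA>?:<State2>;?<NodeB>?:m{<value>}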
For Temporal networks, recalling evidence from the Evidence Scenario File is different compared to Contemporaneous networks.
Now, the time-specific Evidence Scenarios will be set automatically as you perform a temporal simulation.
A saved Evidence Scenario File can be reimported into the network from where it originated (e.g., after external modification, see Evidence Scenario File Syntax) or it can be loaded into an entirely different network file.
Main Menu > Data > Evidence Scenario File > Associate.
If the newly-associated Evidence Scenario File contains incompatible content, e.g., nonexistent nodes in the network, BayesiaLab shows a corresponding error message:
However, the remaining, compatible content will be available in the now-attached Evidence Scenario File.
In Validation Mode, you can perform network validation, simulation, and analysis.
In Validation Mode, both the Graph Panel and the Monitor Panel are displayed in the Graph Window.
There are several ways to switch to Validation Mode:
Press the shortcut F5.
Select Main Menu > View > Validation Mode.
In Modeling Mode, you can conduct all modeling activities, such as learning and editing network graphs.
In Modeling Mode, only the Graph Panel is visible and accessible inside the Graph Window, i.e., the Graph Panel fills the Graph Window entirely.
There are several ways to switch to Modeling Mode:
Press the shortcut F4.
Select Main Menu > View > Modeling Mode.
The Graph Window can only be in one of two possible modes, i.e., Modeling Mode and Validation Mode.
Click the icon in the lower-left corner of the Graph Panel.
In any workflow with BayesiaLab, switching between Modeling Mode and Validation Mode is very frequent. Hence, we highly recommend that new users start using the F4 and F5 shortcuts straight away.
Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes
BayesiaLab integrates functionality provided by OpenAI's ChatGPT, a machine learning-based service for compiling human knowledge obtained from the Internet. However, Bayesia* and its affiliates are not affiliated with OpenAI.
Bayesia* makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the ChatGPT feature. Therefore, any reliance on such information is strictly at your own risk.
In no event will Bayesia* be liable for any loss or damage, including indirect or consequential loss or damage, arising out of, or in connection with, the use of ChatGPT through BayesiaLab.
Please note that the responses generated by ChatGPT are created by a machine-learning model and do not reflect the opinions or policies of Bayesia*.
ChatGPT may sometimes produce inappropriate or offensive content. While OpenAI states that mechanisms exist in ChatGPT to reduce such occurrences, Bayesia* has no control over the delivery of such content and cannot prevent such instances.
*References to "Bayesia" include Bayesia S.A.S. and its affiliates Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd.
To utilize the Hellixia functions, BayesiaLab must connect to the OpenAI API using a personal API Key.
OpenAI is a third-party service that can be accessed through BayesiaLab; however, it is not part of the BayesiaLab software. As a result, Bayesia makes no representations regarding this service.
A subscription fee payable to OpenAI may be required to obtain your personal API Key.
Obtain your personal API key from the OpenAI website.
Once you have obtained your API Key, enter it into your locally-installed BayesiaLab software under Main Menu > Windows > Preferences > Tools > OpenAI.
If you want to utilize an alternative to OpenAI, you can deploy models in your own Microsoft Azure account. The process involves creating endpoints.
The URL is structured as follows: https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
In this URL:
{your-resource-name}
should be replaced with the name of your Azure OpenAI resource.
{deployment-id}
should be replaced with the ID of the specific deployment.
{api-version}
should be replaced with the version of the API you're using. This follows the YYYY-MM-DD format.
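For illustration only, with hypothetical resource and deployment names (and an assumed API version; use the version that applies to your account), a complete endpoint could look like this:
https://my-resource.openai.azure.com/openai/deployments/my-gpt4-deployment/chat/completions?api-version=2023-05-15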
If you're operating behind a proxy that enforces SSL rewriting or redirection, you might encounter the following error message:
'PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.'
If you encounter this issue, it will be necessary to point BayesiaLab towards the truststore, where the approved certificates are kept.
Go to Main Menu > Windows > Preferences > General.
Click on the folder icon to locate and select the BayesiaLab.cfg file.
Navigate to the end of the file, where you'll locate the [JavaOptions] section.
If you're using Windows, you should add the following two lines:
java-options=-Djavax.net.ssl.trustStoreType=Windows-ROOT
java-options=-Djavax.net.ssl.trustStore=NUL
For MacOSX users, instead, add:
java-options=-Djavax.net.ssl.trustStoreType=KeychainStore
java-options=-Djavax.net.ssl.trustStore=/dev/null
After these changes, save the file and then restart BayesiaLab for the updates to take effect.
To illustrate Semantic Text Analysis, we selected Dr. Martin Luther King's famous speech, I Have a Dream:
To start the process, we open a new graph and add a single node.
By default, the name of the new node is N1. However, we can change the name to a more descriptive title, e.g., "I Have a Dream."
This node will host the content we wish to analyze.
We now need to enter the speech as a Node Comment.
From the Node Contextual Menu, select Edit, then select the Comment tab.
Now paste the speech into the text field.
Note that the Node Comment can accommodate any text length, whereas the Node Name and the Node Long Name are limited.
This icon indicates that a Node Comment is associated with this node.
Select the node of interest, which is I Have a Dream.
Select Main Menu > Hellixia > Dimension Elicitor.
The Dimension Elicitor window opens up, in which we need to specify several settings:
Question Settings
Keyword: We select Causes, Achievements, and Objectives from the list of Keywords.
Groups: With Groups, we can bundle several keywords so they can easily be retrieved later when the analysis needs to be repeated. We name our group of keywords "Civil Rights."
Responses per Keyword specifies the maximum number of items to be retrieved per Keyword.
Exclude Duplicates automatically removes duplicates from the list of results. This is helpful as the query can produce identical Dimensions in the context of different Keywords.
Completion Model: From the drop-down menu, the following models are available:
GPT_35_TURBO
GPT_35_TURBO_16K
GPT_4
Context
Knowledge File: This text file allows you to specify a broader context for a query. For example, you might embed chunks of documents related to your domain of study into a dataset. Then, you can identify and use the chunks with embeddings closest to that of your query to construct your knowledge file.
General Context: Checking the box and entering a heading provides relevant context. In our example, we use the title "Civil Rights Movement."
Subject of the Query
Checkboxes for Node Name, Node Long Name, and Node Comment are available.
In this example, however, the relevant subject is only stored in the Node Comment, i.e., the entire speech, I Have a Dream.
Options
By checking Create a Class per Keyword, BayesiaLab assigns all newly-discovered dimensions to new BayesiaLab Classes.
Submit Query
Clicking the Submit Query button starts the query.
Upon completion of the query, the table at the bottom of the window lists all discovered dimensions and provides a corresponding comment.
The checkboxes at the end of each row allow you to select whether or not to keep the found Dimensions and add them to the Graph Panel. This allows you to override the default selection of all Dimensions.
Click OK to add the dimensions to the Graph Panel.
The Dimensions are now shown as nodes on the Graph Panel.
Furthermore, if you select the option Create a Class per Keyword, the Dimension nodes are grouped based on their associated Keyword. Additionally, a Note is added to visually group each set of nodes that corresponds to a particular Keyword/Dimension.
In machine learning and Natural Language Processing (NLP), embedding is a mathematical representation of a token, word, phrase, sentence, or any other linguistic unit with a continuous high-dimensional vector. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.
The embeddings used by Hellixia have 1,536 dimensions and allow capturing the semantics of the linguistic units defined by the nodes (names, long names, comments).
To demonstrate the workflow for generating embeddings, we start with a set of 54 nodes representing a selection of influential 19th and 20th-century painters.
Go to Main Menu > Hellixia > Embedding Generator.
Select one or more Input Types from the Hellixia Embedding Generator Window, i.e., Node Name, Node Long Name, and Node Comment. In the example, only Node Names are defined, so that is the only Input Type you need to select.
Click OK.
Each node now has 1,536 observations, which is indicated by the Tooltip associated with the database icon.
A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.
In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.
With the embeddings now stored as observations, we can machine-learn a semantic network.
For this purpose, we use one of BayesiaLab's Unsupervised Learning algorithms.
The Maximum Weight Spanning Tree (MWST) is the best choice in this context. The algorithm is quick and renders an easily interpretable network.
After the learning is completed, the resulting network appears in the following screenshot:
We can apply one of BayesiaLab's layout algorithms to interpret this graph more easily.
For instance, select Main Menu > View > Layout > Symmetric Layout.
The resulting graph is shown outside the BayesiaLab window so that its structure can be viewed and interpreted more easily.
For further information, visit Microsoft's official documentation at
Semantic Text Analysis closely mirrors the process of identifying the dimensions of a particular subject (see Dimension Elicitor).
On the Toolbar, click on the Node icon , and place a new node on the Graph Panel.
After entering the speech and closing the Node Editor, the Information icon appears next to the node name:
As the first formal step in the Semantic Text Analysis, we need to use the Dimension Elicitor again.
An Information icon is attached to each node. This means that the Comments generated by the Dimension Elicitor are stored as Node Comments.
In Modeling Mode, select the nodes on the Graph Panel for which you want to generate embeddings. In our example, we select all 54 nodes.
Upon retrieving the embeddings, the Main Window shows the database icon in the bottom right corner. This indicates that the embeddings are now attached as a dataset.
By default, the observations associated with each node are discretized into quintiles, which you can see by switching into Validation Mode and bringing up any of the Monitors.
From the Modeling Mode, select Main Menu > Learning > Unsupervised Structural Learning > Maximum Spanning Tree.
Hellixia's Comment Generator is similar to the Dimension Elicitor.
In the case of the Dimension Elicitor, Hellixia creates new nodes.
With the Comment Generator, Hellixia retrieves Dimension Names and the related Comments from ChatGPT and adds them automatically to the Node Comment.
Create a node representing the subject of interest, e.g., "Judea Pearl."
Move your pointer to the desired location to place your new node on the Graph Panel.
Give the node a meaningful name representing the subject to be studied, i.e., "Judea Pearl."
You can also add a Node Long Name and a Node Comment to provide more information.
Select the newly-created node, and then select Main Menu > Hellixia > Comment Generator, which brings up the Comment Generator window.
There is a range of settings you need to specify in the Comment Generator window:
Under Question Settings,
Specify the Keyword from the dropdown menu.
If needed, stipulate the maximum number of responses per Keyword.
Select the Completion Model from the dropdown menu, e.g., GPT_35 or GPT_4.
Under Context,
Open a Knowledge File, if available.
Provide a General Context for the query. In our example, use "Artificial Intelligence."
Under Main Subject of the Query, select all fields that contain relevant information for the query, i.e., Node Name, Node Long Name, and Node Comment. Check all that apply. Both the Node Long Names and Node Comments are optional properties. If they're selected but not defined for a given node, Hellixia will use the Node Name by default.
Click Submit Query, and Hellixia retrieves the responses from ChatGPT and lists them in a table at the bottom of the Comment Generator window.
The Subject Node column displays the main subject of the query.
The Keyword column lists the keyword used for the Dimension retrieved in that row.
The Index column assigns an index to each Dimension retrieved for a Keyword.
The Comment column further describes the Dimension retrieved.
The Keep column indicates which Keyword/Dimensions row to keep.
Under Output Settings, specify what part of the results table will be added to the Node Comment.
By checking Dimension Name and Comment as well as Concatenate Output to Current Comment, you obtain a Comment like the one shown, which you can view in the Node Editor.
The first step in formulating a new Bayesian network about a problem domain is typically defining the dimensions of that domain. This would also be the first step in the BEKEE workflow (see Bayesia Expert Knowledge Elicitation Environment (BEKEE)).
Depending on your familiarity with the field of study, exploring a subject's facets and aspects may require a significant brainstorming effort. The Hellixia Dimension Elicitor assists by querying ChatGPT and proposing a list of dimensions.
To illustrate the Dimension Elicitor, we want to discover the dimensions related to the concept of "Bayesian Belief Networks."
Create a node representing the subject of interest, e.g., "Bayesian Belief Networks."
Move your pointer to the desired location to place your new node on the Graph Panel.
Give the node a meaningful name representing the subject to be studied, i.e., "Bayesian Belief Networks."
You can also add a Long Name and a Node Comment to provide more information.
Select the newly-created node, and then select Main Menu > Hellixia > Dimension Elicitor, which brings up the Dimension Elicitor Window.
In the Question Settings of the Dimension Elicitor Window, specify the keywords to be investigated. The list offers 145 keywords that Hellixia can use to query ChatGPT.
Select Advantages, Characteristics, Components, Contributions, Dimensions, and Strengths as Keywords to follow our example.
Responses per Keyword specifies the maximum number of items to be retrieved per keyword.
Exclude Duplicates automatically removes duplicates from the list of results. This is helpful as the query can produce identical Dimensions in the context of different Keywords.
Depending on your OpenAI account and available resources, you can select the appropriate Completion Model from the dropdown menu, e.g., GPT-3.5 or GPT-4.
You can provide additional context by submitting a Knowledge File.
This text file allows you to specify a broader context for a query.
For example, you might embed chunks of documents related to your domain of study into a dataset.
Then, you can identify and use the chunks with embeddings closest to that of your query to construct your Knowledge File.
You can also provide a General Context for the query, e.g., "Artificial Intelligence."
The Main Subject of the Query is determined by the selected nodes.
You can use the Node Name, the Node Long Name, or the Node Comments.
Node Long Names and Node Comments have the advantage that they can include longer text and provide more information for the query.
Both the Node Long Names and Node Comments are optional properties of a node. If they are selected as a Main Subject for the Query but have no content, Hellixia will use the Node Name by default.
Click Submit Query to start the elicitation process.
Once the query is complete, a table at the bottom of the window shows the results.
The Subject Node column displays the Main Subject of the Query.
The Keyword column lists the keyword used for the dimension retrieved in that row.
The Index column assigns an index to each dimension retrieved for a Keyword.
The Comment column further describes the dimension retrieved. This comment will also be used as a Node Comment.
The Keep column indicates which Keyword/Dimensions row to keep. If you checked Exclude Duplicates, only unique Keyword/Dimension combinations will be kept.
However, you can modify the selection by checking and unchecking items in the Keep columns.
All Dimensions are added as nodes to the Graph Panel upon clicking OK.
If you select the option Create a Class per Keyword, the Dimension nodes are grouped by their associated Keyword. Additionally, a Note is added to visually group each set of nodes corresponding to a particular Keyword/Dimension.
To manage groups of nodes, BayesiaLab offers Classes.
Nodes can be added to Classes manually or automatically. For instance, the Variable Clustering function can assign nodes to new Classes representing latent factors. By default, newly-created Classes have generic names, such as [Factor_0], which carries no meaning.
Finding suitable descriptions for Classes can be time-consuming.
The Class Description function can assist you in finding meaningful summaries of a Class of nodes.
With the Hellixia Class Description Generator, we can quickly find a useful description for a subset of nodes we select.
In our example, we have a large number of nodes from an auto buyer satisfaction survey.
We are interested in a subset of nodes related to the quality perception of the vehicle interior, i.e.:
Interior Colors
Quality of Interior Materials
Interior Trim & Finish
Quality of Seat Materials
Select these nodes of interest.
Then select Main Menu > Hellixia > Class Description.
Specify a Context, if applicable.
Indicate by ticking the checkboxes where the subject matter is stored, i.e., Node Name, Node Long Name, or Node Comment. Check all that apply.
Clicking OK starts generating the Class Description.
The chime confirms when the process is complete.
Opening the Class Editor shows the Class Description that was generated.
Select Graph Contextual Menu > Edit Classes.
The Description column shows the newly-generated Class Description.
BayesiaLab's Clustering function produces new Factors and associated Classes.
So, having a dozen or more new Classes is quite common in this context.
By default, the newly-generated Classes have generic and non-informative names, like [Factor_0], [Factor_1], etc.
Given that the Factors and Classes are meant to represent meaningful concepts, naming them is important but can be tedious.
In the following example, 57 Factors (and Classes) were created from 240 manifest nodes. Each manifest node measures the degree of agreement or disagreement with statements in a personality test, such as, "I get angry easily" or "I remain calm under pressure."
These original statements are included as Node Comments with every node.
Semantic Variable Clustering groups nodes based on the semantics of their Node Names.
For this example, we use a list of 49 positive character traits.
All character traits are represented by nodes in an unconnected Bayesian network.
The nodes are named after character traits; no other information is available, e.g., in the Node Long Names or the Node Comments.
Select all nodes you wish to cluster.
To start the Semantic Variable Clustering, select Main Menu > Hellixia > Semantic Variable Clustering.
In the Semantic Variable Clustering window, you can specify the following items:
Your Completion Model, which depends on your OpenAI subscription
The Context that may apply to the nodes to be clustered
The Maximum Number of Clusters allows you to limit how many clusters are generated.
Clicking OK initiates Hellixia's communication with ChatGPT.
Upon completing the task, BayesiaLab presents the Semantic Variable Clustering Report in a new window.
To this day, no reliable methods exist to find causal relationships in data. Given a statistical association between two variables, it is impossible, based on data alone, to establish which variable is the cause and which is the effect.
As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, remains necessary to determine the causal direction in bivariate relationships.
With ChatGPT, it is now possible to let BayesiaLab tap into external domain knowledge. BayesiaLab's Hellixia can ask ChatGPT about the causal relationship between two nodes.
Select two nodes of interest, e.g., Smoking and Lung Cancer.
Select Main Menu > Hellixia > Causality Search.
In the Causality Search Window:
Specify the Completion Model.
Provide any applicable context to the Context field.
Check which fields contain the subjects under study, e.g., Node Name, Node Long Name, and Node Comment.
Click OK to launch the search.
If ChatGPT believes a causal relationship exists, BayesiaLab adds a corresponding arc.
Clicking Export produces a so-called Structural Prior Dictionary, which is a text file containing all arc attributes, i.e.,
Start and End of arc
Structural Prior for each arc
Arc Comment, which, in this context, contains the Explanation for the causal directions as obtained from ChatGPT.
We can now use this Structural Prior Dictionary as an Arc Dictionary and replace the original, machine-learned arcs with the ChatGPT-informed causal arcs.
First, select Graph Panel Contextual Menu > Delete All Arcs to remove all existing arcs.
Then, select Main Menu > Data > Associate Dictionary > Arc > Arcs.
The network now features the causal arc directions as obtained from ChatGPT.
With the final arc directions now in place, we should arrange the nodes into a more intuitive layout, i.e., positioning parent nodes above child nodes.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
The network now displays the correct causal order of nodes and arcs.
The Causal Structural Priors function extends this concept to more than two nodes.
We illustrate the Causal Structural Priors workflow with the well-known "Visit Asia" example from the domain of lung diseases.
We have a synthetic dataset from this domain, which has already been imported into BayesiaLab.
So, our starting point is an unconnected network, as shown in the following screenshot.
For instance, the node Smoking has an associated Node Comment that says, "The patient is a regular smoker."
Our objective is to find the causal relationships between risk factors, conditions, symptoms, and diagnostic imaging.
However, we know that machine learning alone cannot discover the true causal structure of this domain.
We begin with machine learning the associations between all nodes anyway and use the Unsupervised EQ learning algorithm for that purpose.
This newly-learned Bayesian network features directed arcs, but they clearly cannot be interpreted as causal, e.g., Smoking could not possibly be a cause of Age.
Applying the Genetic Grid layout highlights the implausibility of the arc directions.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
In the past, we would have had to use any available domain knowledge from experts to correct the arc directions.
With Hellixia, however, we can tap into the domain knowledge available via ChatGPT.
So, select all arcs and then select Main Menu > Hellixia > Causal Structural Priors.
In the Causal Structural Priors window, you need to specify a number of items:
Under Completion Model, choose a model for which you have a subscription, e.g., GPT_35 or GPT_4.
You can specify a General Context of the problem domain. In this example, "Lung Diseases" would be appropriate.
Under Subject of the Query, check all fields that contain information regarding the subject matter. We have information in the Node Name and the Node Comment in the example.
Clicking OK starts the search for causal relationships via ChatGPT. The progress bar at the bottom of the Graph Panel shows the search status.
A chime marks the completion of the search.
The resulting table displays the causal arc directions obtained from ChatGPT in its three left columns.
The reason for the arc orientation is provided in the Explanation column.
Clicking Preview opens a window showing a simplified view of the causal arc directions proposed by ChatGPT.
Now, there are two ways to proceed, as illustrated in the following workflows 1 and 2.
Select Toolbar > Node Creation Mode
These newly-created clusters are now represented as Classes, indicated by the Classes icon .
Furthermore, BayesiaLab adds an Arc Comment with any contextual information ChatGPT provides. The Arc Comment icon indicates that such a comment was added.
Clicking the Show Arc Comment button in the Toolbar displays the comment.
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
Clicking the Show Arc Comment button in the Toolbar displays the comments on the arcs. The Arc Comments show the explanations for the causal directions retrieved from ChatGPT.
With the Causality Search function, Hellixia allows you to retrieve domain knowledge from ChatGPT about a potential causal relationship between two nodes.
In addition to the descriptive and self-explanatory node names, Comments are associated with each node, as indicated by the information icon .
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
Furthermore, the Structural Priors icon appears in the bottom-right corner of the Graph Panel.
To view the Structural Priors obtained from ChatGPT, you can click on the Structural Priors icon or select Graph Panel Contextual Menu > Edit Structural Priors.
The final column, Check, indicates whether or not the causal direction matches the current orientation.
In addition to utilizing ChatGPT, BayesiaLab's Hellixia subject matter assistant also employs DALL-E.
DALL-E is a variant of the GPT model designed to generate images from textual descriptions.
This functionality is useful for creating small images that visualize what the node represents.
To use the Image Generator, select the nodes for which you want an image produced.
Select Main Menu > Hellixia > Image Generator.
In the Image Generator window, specify the fields that contain the subjects, i.e., the textual descriptions of the images to be generated. Check all that apply.
Under Context, you can state the overall domain of the image subjects, if applicable.
In Workflow 1, we exported a Structural Prior Dictionary, including the Causal Structural Priors, and then imported this dictionary as an Arc Dictionary to create a causal network with these priors.
In this Workflow 2, we will immediately utilize the Causal Structural Priors to machine-learn a new network, without the export/import step.
However, these new Causal Structural Priors have not yet been used to update the arc directions in the network.
Select Main Menu > Learning > Unsupervised Structural Learning > Taboo.
Like Arc Constraints, Structural Priors, Temporal Indices, and Filtered States, Causal Structural Priors impose constraints on learning. As a result, EQ-based algorithms are not available under those conditions.
This newly learned network now reflects the causal order obtained from ChatGPT.
With the final arc directions in place, we should arrange the nodes into a more intuitive layout, i.e., positioning parent nodes above child nodes.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
So, our starting point is the machine-learned network, for which Hellixia has already obtained the Causal Structural Priors. The Structural Prior icon indicates that Structural Priors are associated with the network.
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
The Hellixia Node Translator is powered by ChatGPT and DeepL.
It allows you to easily translate the Node Names, the State Names, and any Node Comments into another language.
We use an unconnected network featuring 240 statements that all relate to personality and character traits, e.g., "I get angry easily" or "I smile a lot".
These statements are contained in the Node Names.
The Node Names are in English, and we want to translate them into German.
Select all nodes to be translated.
Then, select Main Menu > Hellixia > Node Translator.
In the Node Translator window, you can pick the target language from the dropdown menu.
You can also specify the Translator Model, e.g., GPT-3.5, GPT-4, or DeepL.
Finally, select which Node Properties should be translated, e.g., Node Names, State Names, or Node Comments. Check all that apply.
Clicking OK starts the translation process.
Once the process has concluded, all node names appear in German.
In this second installment, we examine another passage from Montaigne's 'Essais': 'Of Liars' (Book I).
This is a formidable challenge for Hellixia, given that it relies on a translation of Montaigne's 16th-century French.
When they disguise and change, when they are often put back on the same story, it is difficult for them not to make mistakes, because the thing as it is, having lodged itself first in memory and having been imprinted there by way of knowledge and science, it is difficult for it not to be represented in the imagination by dislodging the falsehood, which cannot have as firm and steady a foothold, and for the circumstances of the first learning not to cause the memory of the added, false or bastardized pieces to be lost. In what they invent completely, because there is no contrary impression that contradicts their falsehood, they seem to have all the less to fear to make mistakes. However, this fiction, because it is a vain and ungraspable body, readily escapes memory if it is not well secured. If, like truth, lies had only one face, we would be in a better position, for we would take the opposite of what the liar said as certain. But the reverse of truth has a hundred thousand faces and an indefinite field. The Pythagoreans posit that good is certain and finite, evil infinite and uncertain. A thousand roads deviate from the goal, only one leads to it.
This post is also linked to a discussion we had at Marcello Di Bello's presentation, "Cross-Examination with Bayesian Networks" (BayesiaLab Conference, 2022).
Create a new node: Start by creating a new node and label it as "Montaigne". This node will serve as a container for the text you want to analyze.
Enter the excerpt: Input the selected text into the "Montaigne" node as a comment.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment (a conceptual sketch of this step follows this workflow).
Review the dimensions: Examine the dimensions or keywords returned by Hellixia. Remove any dimensions that seem redundant or irrelevant to your analysis.
Use the Embedding Generator on all remaining nodes. This tool captures and quantifies the semantics associated with the names and comments of each node.
Set the target node: Set "Montaigne" as the Target Node. The subsequent analyses and operations will focus on this node.
Run the Naive Learning algorithm.
Change node styles: Alter the style of all nodes to "Badges". This style will display the comment within each node.
Switch to Validation Mode.
Run the Arc Force analysis.
Apply the Radial Layout: While still in the Arc Force analysis tool, run the Radial Layout. This layout arranges the nodes in a clockwise manner according to the strength of their relationships with the target node.
Show the Arc Comments: These comments will provide information about the strength of the relationships between nodes.
Copy the "Montaigne" node: Begin by copying the node titled "Montaigne".
Paste the node into a new graph: Create a new graph and paste the copied "Montaigne" node into it.
Run the Dimension Elicitor using the following keywords to guide the analysis of the node: Contents, Ideas, Milestones, Rules, Themes, Theses, and the General Context set to "Philosophy".
Review the returned dimensions: Examine the dimensions provided by Hellixia. Remove any dimensions that appear redundant or irrelevant to your analysis.
Exclude the "Montaigne" node.
Use the Embedding Generator on all remaining nodes. This will help capture the semantic associations of their names and comments.
Create a semantic network: Use the Maximum Weight Spanning Tree algorithm to form a semantic network from the analyzed text (see the embedding-and-spanning-tree sketch after this workflow).
Change node styles to "Badges". This style will allow the comment within each node to be shown.
Apply the Dynamic Grid Layout: Use this layout option to organize the nodes on your graph. Note that this layout algorithm is not deterministic, meaning it doesn't always produce the same results given the same input. It randomly favors vertical, horizontal, or mixed orientations. Run this layout multiple times until you find a layout that best suits your preferences.
Switch to Validation Mode.
Select Skeleton View: Since the network you're generating does not represent causal relationships, choose the Skeleton View. This will remove the arc orientations, leaving only connections between nodes without indicating a direction.
Switch back to Modeling Mode.
Change node styles to Discs.
Apply the Symmetric Layout.
Enter Validation Mode.
Analyze Node Force.
Run Variable Clustering: This will identify and group similar variables based on their semantics.
Open the Class Editor.
Within the Class Editor, activate the Class Description Generator. Use it to create meaningful names for the factors you're working with (a minimal sketch of this idea follows this workflow).
Save the descriptions you've just created using the Export Descriptions feature.
Switch back to Modeling Mode.
Execute Multiple Clustering to create latent variables.
Next, execute the structural learning algorithm Taboo. Make sure to enable the option "Delete Unfixed Arcs." This should result in the creation of a hierarchical network.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation Mode.
Use Node Force.
Michel de Montaigne's "Essais", first published in 1580, is a large collection (three books) of short, subjective treatments of various topics. Montaigne's stated design in writing, publishing, and revising the "Essais" over the period from approximately 1570 to 1592 was to record "some traits of [his] character and of [his] humours." The "Essais" are regarded as an important work that established the essay as a recognized genre in literature, and the work can be characterized as introspective philosophy.
Montaigne's "Essais" is not just a foundational work in the history of ideas; it's also a unique insight into the mind of one of the most curious and open thinkers in Western history. His observations on society, culture, and humanity are as relevant today as in the 16th century.
We are launching our 'Philosophical Minute' series with an excerpt from Book 2, 'Apology of Raimond de Sebonde'. We believe this passage serves as the perfect introduction to the series, as it refers to the profound wisdom of Socrates.
The wisest man who ever lived, when asked what he knew, replied that he knew that he knew nothing. He confirmed what is said, that the greatest part of what we know is the least of what we do not know: that is to say, even what we think we know is a small part of our ignorance.
Create a new node: Start by creating a new node and label it as "Montaigne". This node will serve as a container for the text you want to analyze.
Enter the excerpt: Input the selected text into the "Montaigne" node as a comment.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Review the dimensions: Examine the dimensions or keywords returned by Hellixia. Remove any dimensions that seem redundant or irrelevant to your analysis.
Use the Embedding Generator on all remaining nodes. This tool captures and quantifies the semantics associated with the names and comments of each node.
Set the target node: Set "Montaigne" as the Target Node. The subsequent analyses and operations will focus on this node.
Run the Naive Learning algorithm.
Change node styles: Alter the style of all nodes to "Badges". This style will display the comment within each node.
Switch to Validation Mode.
Run the Arc Force analysis.
Apply the Radial Layout: While still in the Arc Force analysis tool, run the Radial Layout. This layout arranges the nodes in a clockwise manner according to the strength of their relationships with the target node.
Show the Arc Comments: These comments will provide information about the strength of the relationships between nodes.
Create a new node titled "Montaigne" that will contain the text we want to analyze.
Enter the excerpt as a comment within the "Montaigne" node.
Use the Dimension Elicitor with the General Context set to "Philosophy" and the keywords (Dimensions, Ideas, Themes, and Theses) to analyze the comment within your node. Review the dimensions that Hellixia returns, and remove any that appear to be redundant or irrelevant.
Apply the Embedding Generator to all remaining nodes, capturing the semantics related to their names and comments.
Exclude the "Montaigne" node.
Use the Maximum Weight Spanning Tree algorithm to create a semantic network that describes the analyzed text.
Change all node styles to Badges so that the comment within each node is displayed.
Apply the Dynamic Grid Layout to arrange the nodes.
Switch to Validation Mode.
Since the graph we're creating doesn't represent causal relationships, select the Skeleton View to remove any arc orientations.
Switch back to Modeling Mode.
Exclude the "Montaigne" node.
Change all node styles to the Discs format.
Enter Validation Mode.
Use the Symmetric Layout.
Analyze the Node Force.
Run Variable Clustering.
Open the Class Editor and utilize the Class Description Generator to assign meaningful names to the three factors you're dealing with.
Save these descriptions using the Export Descriptions feature.
Switch back to Modeling Mode.
Execute Multiple Clustering to create latent variables.
Run Taboo, enabling the option Delete Unfixed Arcs, to create a hierarchical network.
Rename the latent variables you've just created by using the previously exported descriptions as a dictionary for naming the node names.
Switch to Validation Mode.
Utilize the Node Force function.
Baruch Spinoza's "Ethics" (often referred to as "Ethica" from its Latin title "Ethica, ordine geometrico demonstrata", meaning "Ethics Demonstrated in Geometrical Order") is a philosophical treatise written in the mid-17th century. It is one of the most significant and controversial works of the Enlightenment, and it presents Spinoza's metaphysical, epistemological, moral, and political views.
The structure of "Ethics" is unique: it is laid out like a geometrical treatise, akin to Euclid's "Elements". Starting with definitions and axioms, Spinoza proceeds with propositions, proofs, corollaries, and scholia (notes), aiming to demonstrate his philosophy with mathematical precision.
In this particular semantic analysis, we explore one of the famous quotes from Ethics:
Desire is the very essence of man, insofar as it is conceived as determined to some action by any of its affections.
Start by creating a new node. Label this node "Spinoza".
Input the chosen excerpt of text into the comment section of the "Spinoza" node.
Use the keyword "Keywords" to guide the Dimension Elicitor in analyzing the comment in the "Spinoza" node. Specify the General Context for your analysis as "Philosophy". By setting this context, you are providing direction for the Dimension Elicitor to understand the broader topic of your text. The Dimension Elicitor will then identify and extract relevant dimensions or keywords from the comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Spinoza" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Ideas, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
The third episode of our Philosophical Minute post is about the famous philosophical statement by René Descartes, Cogito Ergo Sum, "I think, therefore I am." This statement is at the core of Western philosophy and is the starting point of Descartes' philosophical methodology, the foundational element of his metaphysics.
Descartes sought a fundamental element that could be beyond any doubt as a basis for all knowledge. He posited that the very act of doubting one's own existence served as proof of the reality of one's own mind. In essence, if one is questioning, then one must exist to be able to do so.
Considering that all the same thoughts that we have while awake can also come to us when we sleep, without any of them being true at that time, I resolved to pretend that all the things that had ever entered my mind were no more true than the illusions of my dreams.
But immediately afterwards, I noticed that while I wanted to think that everything was false, it was necessary that I, who was thinking, be something; and realizing that this truth, I think, therefore I am, was so firm and so certain that even the most extravagant suppositions of skeptics were not capable of shaking it, I judged that I could accept it without hesitation as the first principle of the philosophy I was seeking.
Start by creating a new node. Label this node "Descartes".
Input the chosen excerpt of text into the comment section of the "Descartes" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Descartes" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Descartes." Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Contents, Matters, Milestones, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Descartes" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Blaise Pascal's "Pensées" (which translates to "Thoughts" in English) is a collection of fragments on theology and philosophy. Pascal, a French mathematician, physicist, and religious philosopher, began writing "Pensées" as a defense of the Christian religion, but he died before he could complete the work. The fragments he left behind were posthumously assembled and published in 1670.
This Philosophical Minute centers around a passage from Pensées, which delves into the human propensity to neglect the present moment, habitually yearning for the future, or dwelling on the past.
We never care about the present. We anticipate the future as too slow to come, as if to hasten its course; or we recall the past to stop it as too quick: so careless, we wander in times that are not ours, and do not think of the only one that belongs to us; and so vain, we think of those that are nothing anymore, and let slip without reflection the only one that remains.
It is because the present, usually, hurts us. We hide it from our sight, because it afflicts us; and if it is pleasant to us, we regret seeing it slip away.
The present is never our end: the past and the present are our means; the only future is our end. Thus we never live, but we hope to live; and, always preparing to be happy, it is inevitable that we never are.
Create a new node: Start by generating a new node named "Blaise Pascal - Pensées". This node will hold the text that you plan to analyze.
Insert the text: Add the selected excerpt into the comment section of the "Blaise Pascal - Pensées" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Assess the extracted dimensions: Evaluate the keywords or dimensions identified by Hellixia and eliminate any that are redundant or irrelevant.
Use the Embedding Generator for all remaining nodes. This tool will distill the semantics of the names and comments of each node into a quantifiable form.
Set "Blaise Pascal - Pensées" as the Target Node.
Run the Naive Learning algorithm.
Change the style of all nodes to "Badges". This style will display the comment embedded within each node.
Switch to Validation Mode.
Perform an Arc Force analysis.
While within the Arc Force analysis tool, run the Radial Layout. This will arrange the nodes in a clockwise pattern in relation to their connection strength with the target node.
Show the Arc Comments, which will provide information about the strength of the relationships between nodes.
Start by making a copy of the node named "Blaise Pascal - Pensées".
Open a new graph and paste the copied "Blaise Pascal - Pensées" node.
Use the following keywords to guide the Dimension Elicitor in its analysis of the node: Arguments, Matters, Milestones, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Blaise Pascal - Pensées" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network based on the text analysis.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor.
Run Class Description Generator: Use this function to generate descriptive names for your identified factors. This helps to make the output more understandable and interpretable.
Save these descriptions by using the Export Descriptions function.
Switch back to Modeling Mode.
Run Multiple Clustering.
Run the Taboo algorithm: Use this structural learning algorithm to learn a hierarchical network. Make sure to enable the "Delete Unfixed Arcs" option to remove unnecessary connections and streamline your model.
Use the descriptions you exported earlier as a dictionary to rename the latent variables you've just created. This helps in making your model more understandable and keeps the nodes' names consistent with their semantic meaning.
Switch to Validation Mode.
Apply Node Force.
In this fifth episode, we delve into another passage from Blaise Pascal's Pensées. This particular segment sheds light on the compromise required to uphold societal harmony, a state considered the highest form of good.
Without doubt, the equality of goods is just; but, unable to make it force to obey justice, we have made it just to obey force; unable to strengthen justice, force was justified, so that the just and the strong might be together, and peace might be, which is the sovereign good.
Create a new node: Start by generating a new node named "Blaise Pascal - Pensées". This node will hold the text that you plan to analyze.
Insert the text: Add the selected excerpt into the comment section of the "Blaise Pascal - Pensées" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Assess the extracted dimensions: Evaluate the keywords or dimensions identified by Hellixia and eliminate any that are redundant or irrelevant.
Use the Embedding Generator for all remaining nodes. This tool will distill the semantics of the names and comments of each node into a quantifiable form.
Set "Blaise Pascal - Pensées" as the Target Node.
Run the Naive Learning algorithm.
Change the style of all nodes to "Badges". This style will display the comment embedded within each node.
Switch to Validation Mode.
Perform an Arc Force analysis.
While within the Arc Force analysis tool, run the Radial Layout. This will arrange the nodes in a clockwise pattern in relation to their connection strength with the target node.
Show the Arc Comments, which will provide information about the strength of the relationships between nodes.
Start by making a copy of the node named "Blaise Pascal - Pensées".
Open a new graph and paste the copied "Blaise Pascal - Pensées" node.
Use the following keywords to guide the Dimension Elicitor in its analysis of the node: Arguments, Contents, Ideas, Matters, Milestones, Motifs, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Blaise Pascal - Pensées" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network based on the text analysis.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor.
Run Class Description Generator: Use this function to generate descriptive names for your identified factors. This helps to make the output more understandable and interpretable.
Save these descriptions by using the Export Descriptions function.
Switch back to Modeling Mode.
Run Multiple Clustering.
Run the Taboo algorithm: Use this structural learning algorithm to learn a hierarchical network. Make sure to enable the "Delete Unfixed Arcs" option to remove unnecessary connections and streamline your model.
Use the descriptions you exported earlier as a dictionary to rename the latent variables you've just created. This helps in making your model more understandable and keeps the nodes' names consistent with their semantic meaning.
Switch to Validation Mode.
Apply Node Force.
Welcome to the eighth installment of the Philosophical Minute, where we continue our exploration of the works of Baruch Spinoza. Today's focus is a captivating foray into Spinoza's reflections on desire, and its profound influence on our perceptions of good and evil:
We consider good the thing that we desire; and consequently, we call the thing that inspires us with aversion, bad; so that everyone judges according to their passions what is good or bad, what is better or worse, what is most excellent or most contemptible.
Spinoza, in his meticulous examination, sheds light on the intrinsic nature of desire and its pivotal role in shaping human behavior and ethics. How does what we desire dictate our moral compass? Why do we perceive certain desires as virtuous and others as vice? Spinoza's insights into these questions offer a deep dive into the undercurrents of human psychology and the constructs of morality.
Node Creation: Start by generating a new node. Name it "Spinoza".
Text Inclusion: Insert your chosen text excerpt into the comment section of this "Spinoza" node.
Dimension Elicitation: Use the Dimension Elicitor with the keyword "Keywords" to analyze the comment within the "Spinoza" node. Define the General Context as "Philosophy". This context directs the elicitor to frame the analysis within the broader realm of philosophical discourse.
Dimension Review: Evaluate the dimensions or keywords identified by Hellixia. Remove any that seem redundant or not pertinent to your objective.
Semantic Quantification: Run the Embedding Generator for all the nodes that are still in play. This process translates the semantic elements of each node's name and comments into quantifiable metrics.
Target Node Designation: Designate "Spinoza" as your primary or target node.
Learning Algorithm: Launch the Naive Learning algorithm.
Visualization: Alter the visual representation of every node to the "Badges" style. It ensures that the comments associated with each node are directly visible.
Validation: Transition your workspace to the Validation Mode.
Arc Analysis: Run the Arc Force analysis.
Graph Layout: While still in the Arc Force analysis tool, run the Radial Layout. This method organizes nodes in a circle around your target node, positioning them based on the strength of their connection to the target.
Arc Visualization: Activate the Arc Comments. This feature superimposes a visualization layer on your network, displaying information about the arcs' strengths.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Ideas, Matters, Milestones, Motifs, Rules, Themes, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Switch back to Modeling Mode and change the visual representation of each node to the "Discs" style. The disc style offers a clean and straightforward visual, which might be easier to interpret in some contexts compared to the badge style.
Use the Symmetric Layout tool.
Switch to Validation Mode and run the Node Force analysis.
Carry out Variable Clustering: This action will group similar variables together based on their semantic connections.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Switch back to Modeling Mode and run Multiple Clustering to produce latent variables.
Launch the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Continuing with Baruch Spinoza's Ethics (see Episode 6), we focus on another passage about desire, determinism, and perceived free will in human actions:
All men are born in ignorance of causes, and a universal appetite of which they are conscious drives them to seek what is useful to them.
A first consequence of this principle is that men believe they are free, because they are conscious of their volitions and desires, and do not think at all about the causes that predispose them to desire and to want.
The result, secondly, is that men always act with an end in mind, namely, their own utility, the natural object of their desire.
The supreme end of man, guided by reason, his supreme desire, this desire by which he strives to regulate all others, is therefore the desire that drives him to adequately understand both himself and all things that fall within his comprehension.
Start by creating a new node. Label this node "Spinoza".
Input the chosen excerpt of text into the comment section of the "Spinoza" node.
Use the keyword "Keywords" to guide the Dimension Elicitor in analyzing the comment in the "Spinoza" node. Specify the General Context for your analysis as "Philosophy". By setting this context, you are providing direction for the Dimension Elicitor to understand the broader topic of your text. The Dimension Elicitor will then identify and extract relevant dimensions or keywords from the comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Spinoza" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Ideas, Matters, Milestones, Motifs, Rules, Themes, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Delve into the intricate world of John Locke's "Two Treatises of Government" in this dedicated section. Using the power of Hellixia, we aim to dissect this seminal work, which stands as a cornerstone of modern political philosophy. The text, rooted in the theories of natural rights and the social contract, has played a pivotal role in shaping democratic governance and individual liberties. Through our in-depth analysis, we will construct semantic networks that elucidate Locke's arguments, laying bare the foundational principles of his thoughts on society, governance, and the very nature of human rights. Join us on this enlightening journey as we navigate the depths of "Two Treatises," unraveling its philosophical intricacies and enduring relevance.
Start by creating the node "Two Treatises of Government, by John Locke".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Considerations", "Concepts", and many more, to conduct an exhaustive analysis of the essay (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Two Treatises of Government, by John Locke" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our section where we utilize the power of Hellixia to explore the fascinating world of literature. Here, we go beyond conventional textual analyses to create semantic networks, unraveling the rich layers of classics such as 'Hamlet' by William Shakespeare, 'Madame Bovary' by Gustave Flaubert, 'A Tale of Two Cities' by Charles Dickens, and 'Middlemarch' by George Eliot. But our exploration doesn't stop at individual works. We also delve into the relationships between authors from diverse styles - from magic realism and gothic fiction to surrealism and science fiction, and beyond. This innovative approach illuminates the subtle interconnections within and across genres.
Let's embark on this literary journey together, weaving semantic networks that capture the unique essence of literary works, authors, and genres, and provide a refreshing perspective on the magnificent tapestry of literature.
Embark on a literary journey with us as we use Hellixia to uncover the rich interconnections among hundreds of authors, spanning a variety of literary styles such as magic realism, gothic fiction, surrealism, and science fiction. By mapping these intricate relationships, our semantic network becomes a personalized guide, helping you discover your next potential favorite read. This network is not just a visual tool; it's your passport to uncharted literary territories, ready to guide your reading adventure.
Start by creating the node "Magic Realism".
Utilize the Dimension Elicitor with "Competitors" as the guiding keyword, setting the General Context to "Literature Style", to discover other literary styles.
Inspect the dimensions Hellixia generates and discard any that appear irrelevant or extraneous to your analysis.
Select all nodes.
Run the Dimension Elicitor again using "Members" as the keyword and "Influential Writers" as the General Context. This process aims to discover influential writers for each style, focusing on Node Comments as the Main Subject of the Query. Set the Responses per Keyword parameter to 20 to get a wide range of results.
Inspect the resulting dimensions from Hellixia and remove any that appear irrelevant or superfluous.
Repeat the last 2 steps with the Node Names and Comments as Main Subject of the Query. This will enable the discovery of additional writers.
Use the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change node styles to Badges to display each node's comment.
Apply the Dynamic Grid Layout for positioning the nodes on your graph. This algorithm is not deterministic and favors vertical, horizontal, or mixed orientations randomly. Running this layout multiple times might be necessary until you achieve an arrangement that suits your preferences.
Switch to Validation Mode and activate Skeleton View. As your network does not represent causal relations, the Skeleton View will only show the connections between nodes without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Optional: Delete all Arcs. This can be helpful for achieving a cleaner graph layout.
Use the Distance Mapping algorithm based on Mutual Information. This algorithm creates a 2D layout where the nodes' distances are proportional to the semantic proximity between the nodes (considering both names and comments).
Step into the world of William Shakespeare's "Macbeth," a profound tragedy that navigates the treacherous terrain of ambition, power, and the human psyche. In this section, we'll embark on a comprehensive exploration of this iconic play, divided into two illuminating parts:
1. Narrative Analysis: We'll dissect the plot's complexities, unravel character dynamics, and spotlight key events that shape Macbeth's tragic trajectory.
2. Holistic Analysis: Beyond the surface, we'll step back to capture overarching themes, moral implications, and the timeless resonance that gives "Macbeth" its enduring impact.
Join us on this analytical odyssey as we traverse the profound layers of Shakespeare's masterpiece, using semantic networks to illuminate its essence and offer fresh insights into the complexities of the human condition.
Uncover the plot's intricacies, character dynamics, and pivotal moments in the dedicated narrative analysis of "Macbeth." With the guidance of Hellixia, we'll unravel the story's threads, shedding light on the twists and turns that drive this iconic tragedy.
Start by creating the node "Macbeth, by Shakespeare."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Macbeth, by Shakespeare" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative, our focus shifts to the broader canvas of "Macbeth." Through Hellixia's lens, we'll delve into overarching themes, explore moral complexities, and unearth the enduring significance beneath the surface.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
In the vast tapestry of Shakespearean tragedies, "King Lear" stands out as a potent tale of familial strife, ambition, and the relentless quest for power. As we journey into this masterwork, we find ourselves amidst the tumultuous relationships of a king with his children, set against the backdrop of a kingdom in disarray.
Having ventured into the intricate worlds of "Hamlet" and "Macbeth", we now shift our focus to this powerful narrative. Our exploration is structured in two parts: we begin with a detailed narrative analysis, diving deep into plot intricacies and character dynamics. Following this, we transition to a holistic analysis, capturing overarching themes, motives, and the very essence that makes "King Lear" a cornerstone of literary greatness.
With the precision of Hellixia guiding our analysis, join us in this enlightening expedition as we endeavor to unveil the complexities and profundities that Shakespeare so masterfully wove into the fabric of "King Lear".
Navigating "King Lear", our narrative analysis dissects the play's pivotal events and character dynamics. We'll unravel the tale of a father, his daughters, and a kingdom in turmoil, shedding light on Shakespeare's intricate storytelling.
Start by creating the node "King Lear, by Shakespeare."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "King Lear, by Shakespeare" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Stepping beyond the immediate narrative, our holistic examination delves into the deeper themes, sentiments, and philosophical underpinnings of "King Lear." This lens allows us to grasp the timeless essence and profound messages that Shakespeare interwove within the play's fabric.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
Welcome to our in-depth analysis of "The Demon" by Hubert Selby Jr. In this concise yet comprehensive section, we use Hellixia to facilitate a two-part exploration of this riveting novel.
First, we embark on a narrative analysis, dissecting the plot and characters to reveal the underlying themes that Selby skillfully interweaves throughout the story. This part offers a vivid glimpse into Selby's dark and immersive world.
Next, we transition to a holistic analysis, where we zoom out to evaluate the novel's broader philosophical and societal undertones. This segment intends to illuminate the novel's intricate interplay of themes, values, and impacts, showcasing its rich complexity and literary significance.
Join us for this enriching journey that offers a fresh and insightful perspective on "The Demon".
In this first segment, we focus on the narrative intricacies of "The Demon". Through Hellixia's lens, we will dissect the vibrant characters and the entwined plot that makes Selby's novel an evocative journey.
Start by creating the node "The Demon."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships. Also set the General Context to "Hubert Selby Novel".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Demon" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments. Please note that "The Demon" is not as widely recognized, and GPT might hallucinate, i.e., occasionally generate responses that align with other more prominent works by Selby, such as "Requiem for a Dream".
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Moving forward, we transition to a more expansive view in our holistic analysis. Utilizing Hellixia, we aim to delve deeper, exploring the broader themes, societal influences, and underlying philosophies encapsulated in "The Demon".
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, Topics, and Values.
Welcome to a deep dive into the depths of "A Tale of Two Cities," Charles Dickens' renowned novel that weaves a tapestry of intertwined lives against the backdrop of the French Revolution. With the help of Hellixia, we will create a detailed semantic network that exposes the complex relationships and themes embedded within this literary masterpiece. From its iconic characters and their motivations to the social and political currents driving the narrative, we'll explore the intricate layers that make this novel a timeless classic. Brace yourself for a journey through love, sacrifice, and redemption as we unravel Dickens' narrative in a way you've never seen before.
Start by creating the node "A Tale of Two Cities".
Use the Dimension Elicitor, employing a broad array of keywords like "Agents", "Aspects", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the book (see the exhaustive list of keywords below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "A Tale of Two Cities" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
"Madame Bovary" is a novel written by the French author Gustave Flaubert, published in 1857. It is one of the most influential literary works of the 19th century and is widely regarded as a seminal work of realism in literature. Flaubert's meticulous attention to detail and his pursuit of the "mot juste" (the exact right word) have made the novel a benchmark in the development of the modern novel.
Flaubert's portrayal of Emma Bovary is complex and multi-dimensional. While she can be seen as self-centered and even morally corrupt, she is also a victim of her environment, upbringing, and limited means of escaping her circumstances.
Semantic networks produced by Hellixia reveal the relationship between the characters and the structure of themes with unprecedented clarity.
Start by creating the node "Madame Bovary".
Use the Dimension Elicitor, employing a broad array of keywords like "Agents", "Aspects", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Madame Bovary" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our film analysis section, where we use Hellixia's capabilities to delve into the intricate narratives of iconic movies like "The Good, The Bad, and The Ugly" and "Apocalypse Now." With Hellixia's assistance, we'll generate semantic networks that capture these films' complex character relationships, thematic depth, and contextual subtleties. From the moral and psychological complexities of warfare depicted in "Apocalypse Now" to the multi-layered exploration of good and evil in "The Good, The Bad, and The Ugly," our analyses will offer a fresh perspective on these cinematic masterpieces. This section is a cinephile's dream, providing an engaging blend of art and technology to deepen our understanding and appreciation of film.
Welcome to a comprehensive analysis of "The Good, The Bad, and The Ugly," a quintessential spaghetti western directed by the legendary Sergio Leone. With the power of Hellixia, we will create a detailed semantic network, offering an in-depth exploration into this cinematic masterwork. We will dissect its iconic characters, intricate plot lines, dramatic settings, and the moral dilemmas they embody. This film's subtle commentaries on good, evil, and the gray areas in between will be laid bare through our network. Prepare for a fascinating journey as we unravel the intricate layers of "The Good, The Bad, and The Ugly," a film that forever changed the landscape of western cinema.
Start by creating the node "The Good, the Bad and the Ugly".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below). Set the General Context to "Sergio Leone Movie".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Good, the Bad and the Ugly" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Embark with us on a journey into James Joyce's "Ulysses," a literary masterpiece revered for its complexity and depth. Leveraging Hellixia, we will navigate the intricate labyrinth of themes, symbols, and linguistic innovations present in the text. From exploring the psychological depths of its characters to interpreting its myriad of allusions, we will construct a comprehensive semantic network that illuminates the intricate facets of "Ulysses." Prepare for a compelling expedition into the heart of Joyce's modernist vision, a textual exploration that unravels the compelling richness of this universally admired work.
Start by creating the node "Ulysses".
Use the Dimension Elicitor, employing a broad array of keywords like "Characteristics", "Emotions", "Features", "Strengths", "Traits", and "Weaknesses" to conduct an exhaustive analysis of the book. Set the General Context to "James Joyce."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Ulysses" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Welcome to an in-depth analysis of "Apocalypse Now," Francis Ford Coppola's seminal film that probes into the heart of darkness represented by the Vietnam War. Utilizing Hellixia, we will generate a sophisticated semantic network to illuminate the complex themes, characters, and cinematic techniques of this iconic film. From its profound critique of war and colonialism to its exploration of human nature and morality, we'll dissect the multi-layered narrative that defines this cinematic masterpiece. Strap in for an intellectual journey as we delve into the chaotic world of "Apocalypse Now" and shine a light on its profound commentary on the human condition.
Start by creating the node "Apocalypse Now".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Apocalypse Now" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our exploration of Sergio Leone's epic masterpiece, "Once Upon a Time in America." Spanning decades, this cinematic tour de force weaves a complex tale of friendship, ambition, betrayal, and redemption against the backdrop of organized crime in 20th-century America.
Leone's storytelling prowess, coupled with a haunting score by Ennio Morricone and remarkable performances by a stellar cast, including Robert De Niro and James Woods, make this film an unforgettable journey through time and human emotion.
From the gritty streets of New York's Lower East Side to the lavish elegance of 1960s' Manhattan, "Once Upon a Time in America" unfolds its narrative with a richness and complexity rarely seen in cinema. The film's non-linear structure, exquisite cinematography, and deeply layered themes make it an object of fascination and study.
Join us as we delve into this magnum opus, unraveling its intricate narrative threads and uncovering the symbolism, motifs, and philosophical undertones that elevate this movie to the status of timeless art. Whether you're revisiting this classic or discovering it for the first time, our analysis promises to provide new insights into a film that continues to captivate audiences worldwide.
Start by creating the node "Once Upon a Time in America".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below). Set the General Context to "Sergio Leone Movie".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Once Upon a Time in America" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our section dedicated to the profound world of song lyrics, where we harness the capabilities of Hellixia to dissect and interpret musical narratives. Venturing beyond mere words, we craft semantic networks that spotlight the underlying stories and sentiments of iconic tracks like "The Mercy Seat," "Red Right Hand," "Last Great American Whale," and "Jungleland." We aim to unravel the richness of these compositions, gleaning insights into their essence and cultural resonance. Dive deep with us as we illuminate the intricate nuances of these songs, offering a fresh, interconnected perspective on their lyrical artistry.
Join us as we plunge into the lyrical depths of "Last Great American Whale", a song by the iconic musician Lou Reed. Known for his distinctive storytelling and unique blend of rock, this track from his 1989 album, 'New York', stands as a testament to Reed's keen observation of American society and culture.
In "Last Great American Whale", Reed weaves a tale that resonates with environmental and social commentary, a narrative that's as poignant today as it was when first penned. To navigate through this multifaceted piece of music, we'll be enlisting the aid of Hellixia, BayesiaLab's subject matter assistant.
Harnessing Hellixia's ability to create intricate semantic networks, we aim to dissect the themes, motifs, and narratives hidden within Reed's lyrics. This song, ripe with symbolism and metaphor, offers a rich landscape for such analysis.
From the overarching narratives of environmentalism and social critique to the individual threads of American culture, Hellixia will guide us through the complex lyrical world that Reed has created. So come, immerse yourself in the rhythm and the words, as we unravel the enigma of Lou Reed's 'Last Great American Whale'.
Start by creating the node "Last Great American Whale".
Use the Dimension Elicitor, employing a broad array of keywords like "Developments", "Influencers", "Events", "Entities", and many more, to conduct an exhaustive analysis of the song lyrics (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Last Great American Whale" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
In the realm of advanced data analysis and knowledge modeling, understanding the underpinnings of causality is crucial. Hellixia, at the forefront of this analytical revolution, offers a set of functions dedicated to causality, enabling the generation of Causal Semantic Networks (CSN) and Causal Bayesian Networks (CBN). This section is aimed at engineers and researchers seeking to unravel the complexity of cause-and-effect relationships in their field.
Welcome to our in-depth analysis of "The Mercy Seat," an iconic song by Nick Cave. Through this exploration, we will delve into the intricate narratives and powerful emotions embedded within the song. Using Hellixia, we will construct a semantic network that reveals the song's complex themes and the relationships among them, shedding light on the profound depths of Cave's storytelling. Join us as we journey into the haunting world of "The Mercy Seat."
Start by creating the node "The Mercy Seat".
Use the Dimension Elicitor with a broad array of keywords like "Achievements," "Characteristics," "Ideas," and "Impacts" (see the exhaustive list below), and set the General Context to "Nick Cave Song." By doing so, you're informing the tool to approach the analysis from the perspective of a Nick Cave song.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Mercy Seat" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Prepare to embark on an explorative journey through "Jungleland," a sonic masterpiece by none other than the legendary Bruce Springsteen. The epic closer of his 1975 breakthrough album 'Born to Run', "Jungleland" is a symphony of vivid storytelling, resounding saxophone solos, and the raw intensity that characterizes Springsteen's work.
In the rich tapestry of "Jungleland", Springsteen paints a picture of urban struggle and young love, masterfully set against the backdrop of a gritty cityscape. His intricate lyrics tell a tale that's profoundly human and deeply emotive.
To guide us through the labyrinth of Springsteen's poetic narrative, we'll be utilizing Hellixia, BayesiaLab's subject matter assistant. Harnessing the power of Hellixia's semantic network generation, we will delve into the depths of Springsteen's lyrics, dissecting the themes, metaphors, and underlying emotions that make "Jungleland" a celebrated piece of musical storytelling.
From the hustle of the city streets to the poignant silent reverence in the face of loss, Hellixia will enable us to explore the intricate interplay of love, struggle, and resilience in "Jungleland". So join us as we navigate the urban landscape of Springsteen's imagination, diving into the heart of his narrative genius.
Start by creating the node "Jungleland".
Use the Dimension Elicitor, employing a broad array of keywords like "Milestones", "Agents", "Connections", "Forces", and many more, to conduct an exhaustive analysis of the song lyrics (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Jungleland" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Step into the enigmatic realm of Nick Cave & The Bad Seeds with their masterful song, "Red Right Hand." Renowned for its rich imagery and profound thematic undertones, this song offers a narrative tapestry begging to be unraveled. Leveraging Hellixia, our exploration will commence with a narrative analysis of the lyrics, delving deep into the song's storytelling elements. Following this, we'll transition into a holistic examination, piecing together the broader themes and emotional resonances that Cave artfully embeds. Join us as we navigate this iconic track's poetic and musical depths.
Let's delve into the very fabric of "Red Right Hand," examining its lyrical landscape to uncover the embedded stories, motifs, and emotions they evoke.
Start by creating the node "Lyrics of The Red Right Hand, by Nick Cave & the Bad Seeds."
Input the lyrics into the comment section of the node:
Use the Dimension Elicitor, employing the keywords "Agents, Keywords, Events, Relationships, Developments, Contexts, Highlights, Milestones, Entities, Progressions, Motifs, and Locations" to conduct an exhaustive narrative analysis of the lyrics.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Lyrics of The Red Right Hand, by Nick Cave & the Bad Seeds" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Moving beyond the narrative, we'll now capture the broader essence of "Red Right Hand," exploring its overarching themes, sentiments, and the cultural resonances embedded within.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Join us as we delve into a detailed examination of the New Deal, an essential historical period shaped by the repercussions of the Great Depression. Triggered by our reading of John Steinbeck's poignant "The Grapes of Wrath," we will harness the power of Hellixia to create a causal semantic network. This network will depict the policies enacted during the New Deal and explore their cause-and-effect relationships. Through this analysis, we aim to shed light on the complex interplay between economic conditions, policy decisions, and societal outcomes during this transformative era in American history.
Create a node named "New Deal".
Use the following keywords to guide the Dimension Elicitor's node analysis: Characteristics, Causes, Elements, Keywords, Features, Years, Dimensions, Definitions, Traits, Outcomes, Factors, Consequences, Aims, Descriptions, and Goals.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "New Deal" node.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Utilize Hellixia's Causal Structural Priors to evaluate whether the correlations highlighted by the maximum spanning tree indeed signify causal relationships.
Inspect the Structural Priors Explanations suggested by Hellixia. Any priors that are irrelevant should be removed.
Run the Taboo Learning algorithm with the remaining Structural Priors. These priors will reduce the cost of adding arcs that embody these causal relations.
Use Hellixia's Causal Structural Priors again to examine whether the correlations highlighted by the maximum spanning tree suggest causal relationships.
Repeat the above three steps as necessary until the model is satisfactory.
Inspect the final set of Structural Priors Explanations and remove any irrelevant priors.
Export the Structural Priors.
Delete all arcs.
Use the saved Structural Priors as an arc dictionary. This will generate a causal network based on these priors.
Utilize the Structural Priors as an arc comment dictionary to store the descriptions of the causal relationships.
Apply the Genetic Grid Layout algorithm to neatly arrange the nodes on your graph, reflecting the causal directionality.
The screenshot below displays the explanations associated with the Structural Priors. These explanations detail the causal relationships and the logical connections inferred by Hellixia's analysis between the different nodes in the network. They provide valuable insights into the underlying structure and dynamics of the system being studied.
The blue icon in the 'Check' column signifies that an arc in the network currently represents the corresponding structural prior. A red icon indicates that the arc is reversed, and no icon at all means the arc is absent from the network. The only red icon in the example below indicates that Hellixia identified a causal explanation in both directions.
This section showcases how to use Hellixia to create semantic networks around a wide array of subjects, including authors, philosophers, dogs, and more.
In the complex and dynamic field of air transport, it is crucial for airlines to understand and mitigate flight delays. With the advent of sophisticated analytical tools like Hellixia, we now have the opportunity to delve deeper into the causal factors behind these delays. This article explores the innovative application of Hellixia in the creation of Causal Bayesian Networks (CBN), a method that transcends traditional data analysis to uncover the root causes of flight delays.
Using Hellixia for this purpose represents a significant advance in the field of causal analysis. By building causal Bayesian networks, we can map the complex web of factors contributing to delays, from weather conditions to logistical challenges.
In the following sections, we'll look at how Hellixia facilitates the construction of these causal networks, and the insights they provide into the management and prevention of flight delays.
First, we will perform a semantic analysis of the domain to obtain an overview of the key concepts and variables within the aircraft delay domain.
For our analysis of "Delays in scheduled flight departures", we start by building a semantic network, followed by a hierarchical semantic network, similar to our previous workflows (for example, as demonstrated with Hamlet). This process is essential for mapping the semantic landscape surrounding flight delays, providing a solid foundation for understanding the underlying dynamics of this issue.
We begin our analysis by creating a node entitled "Delays in scheduled flight departures" and proceed to use Hellixia's Dimension Elicitor, using two distinct groups of keywords: 'Ancestors' and 'Descendants'. This approach allows us to explore in depth the factors leading to and resulting from flight delays.
We carefully examine the dimensions provided by Hellixia, removing any that seem extraneous or irrelevant to our analysis. Next, we exclude 'Delays in scheduled flight departures' and run the Embedding Generator on the remaining nodes. This step is crucial to understanding the semantic relationships linked to their names and comments.
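For readers curious about what this step amounts to outside the BayesiaLab interface, here is a minimal sketch of generating such embeddings, assuming an OpenAI-style embedding endpoint and hypothetical node names and comments (the exact model and format Hellixia calls are not specified here):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical node names with Hellixia-generated comments.
nodes = {
    "Weather Conditions": "Meteorological factors such as storms and fog "
                          "that can hold aircraft at the gate.",
    "Crew Availability": "Whether a complete, legally rested crew is "
                         "available at the scheduled departure time.",
}

# Embed the concatenation of each node's name and comment.
texts = [f"{name}. {comment}" for name, comment in nodes.items()]
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
embeddings = [item.embedding for item in response.data]
print(len(embeddings), len(embeddings[0]))  # e.g., 2 vectors of 1536 floats
```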
We have two large sets of nodes: one representing "Ancestors" (42 nodes) and the other "Descendants" (69 nodes). Our approach is to learn a separate network for each group. To do this, we define specific constraints that prohibit relationships between nodes that do not belong to the same class.
We then run the Maximum Weight Spanning Tree algorithm to find the most significant semantic relationships between nodes.
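As a rough illustration of what this algorithm does with the embeddings, the sketch below computes cosine similarities between hypothetical node vectors, removes cross-class edges (the constraint described above), and extracts a maximum weight spanning tree. BayesiaLab performs all of this internally; this is only a conceptual sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical inputs: one embedding vector per node, plus a class label
# ("Ancestors" or "Descendants") used to forbid cross-class links.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 384))       # stand-in for real embeddings
classes = np.array(["A"] * 4 + ["D"] * 4)    # "Ancestors" / "Descendants"

# Cosine similarity between every pair of node embeddings.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T

# Structural constraint: zero out (i.e., remove) cross-class edges.
same_class = classes[:, None] == classes[None, :]
sim = np.where(same_class, sim, 0.0)
np.fill_diagonal(sim, 0.0)

# A *maximum* weight spanning tree is a *minimum* spanning tree on negated
# weights; with cross-class edges removed, we obtain one tree per class,
# matching the "separate network for each group" approach described above.
forest = minimum_spanning_tree(csr_matrix(-sim))
for i, j in zip(*forest.nonzero()):
    print(f"link {i} -- {j}  (similarity {sim[i, j]:.3f})")
```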
To improve visibility, we change the node styles to Badges, clearly displaying the comment associated with each node. Next, we run the Dynamic Grid Layout to position the nodes on the graph. It's important to note that this algorithm is not deterministic, resulting in random orientations (vertical, horizontal, or mixed). As a result, we may have to apply this layout several times to get a configuration that matches our preferences.
Next, we switch to Validation Mode and opt for the Skeleton View. In this context, since our network doesn't represent causal relationships, this view is particularly useful as it retains only the connections between nodes, omitting direction indicators.
Next, we run Variable Clustering. This step categorizes variables that are similar, grouping them based on the semantic relationships identified between them.
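Variable Clustering itself operates on the learned network; purely as an intuition for how semantically similar variables end up grouped, one can picture a hierarchical clustering over the same pairwise similarities (an analogy, not BayesiaLab's actual algorithm):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical pairwise cosine similarities between five node embeddings.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.3, 0.2, 0.2],
    [0.2, 0.3, 1.0, 0.8, 0.7],
    [0.1, 0.2, 0.8, 1.0, 0.9],
    [0.3, 0.2, 0.7, 0.9, 1.0],
])

# Convert similarity to distance and group semantically close variables.
clustering = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
)
labels = clustering.fit_predict(1.0 - sim)
print(labels)  # e.g., [0 0 1 1 1]: two clusters of similar variables
```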
We can now proceed with the creation of two hierarchical semantic networks.
Opening Class Editor: We begin by accessing the Class Editor and then running the Class Description Generator. This generates descriptive names for the factors we're examining.
Exporting Descriptions: Next, we use the Export Descriptions function to save the newly created factor descriptions.
Returning to Modeling Mode: We then switch back to Modeling Mode and conduct Multiple Clustering to create latent variables.
Running the Structural Learning Algorithm (Taboo): We run the Taboo algorithm for structural learning, ensuring that the Delete Unfixed Arcs option is selected.
Renaming Latent Variables with Exported Descriptions: We utilize the descriptions we previously exported as a Dictionary to rename the latent variables, adding clarity to our model.
Switching to Validation Mode and Running Node Force: Finally, we go back to Validation Mode and run the Node Force analysis, which helps us understand the dynamics and strength of the connections within our network.
Having established a global understanding of the domain via semantic networks, we're now ready to move forward with the construction of causal Bayesian networks, taking advantage of the latest capabilities introduced in Hellixia as part of BayesiaLab version 11.2.
We initiate the process by creating a node named Delays in Scheduled Flight Departures and then proceed to use the Causal Network Generator feature.
Given the complexity of the prompt, generation takes one to two minutes, after which we obtain a small but fully specified causal Bayesian network (graph and probabilities). This network features directed arcs to signify causal relationships, with each arc accompanied by a succinct explanation of its causal link and an estimate of the effect, scaled from -100 (shown in red) to 100 (shown in blue).
To differentiate nodes by depth using different colors, we first run the Edit Class function and select Generate a Predefined Class - Depth. We then select the four depth classes that have been created and apply the Colors - Associate Random Colors with Classes function to assign distinct colors to each class.
Nodes marked with an icon representing a function are parameterized using BayesiaLab's new DualNoisyOr() formula. This formula integrates both positive and negative interactions between Boolean variables (the causal effects returned by Hellixia).
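The exact form of DualNoisyOr() is not spelled out here, but a plausible reading, combining a classic noisy-OR over the positive causes with noisy inhibitors for the negative ones, can be sketched as follows (a hypothetical illustration, not BayesiaLab's actual formula):

```python
def dual_noisy_or(active_pos, active_neg):
    """Probability that a Boolean effect is true given its active parents.

    active_pos: causal strengths in (0, 1] of the active positive causes
    active_neg: causal strengths in (0, 1] of the active negative causes

    Hypothetical reading of a dual noisy-OR, for intuition only.
    """
    p = 1.0
    for s in active_pos:
        p *= 1.0 - s   # chance that no positive cause triggers the effect
    p = 1.0 - p        # classic noisy-OR over the positive causes
    for s in active_neg:
        p *= 1.0 - s   # each active negative cause may inhibit the effect
    return p

# Two active positive causes and one active inhibitor:
print(dual_noisy_or([0.8, 0.5], [0.3]))  # (1 - 0.2 * 0.5) * 0.7 = 0.63
```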
By selecting the Create Corresponding Structural Priors option in the Causal Network Generator wizard, we now have access to Structural Priors. The value of each prior is derived from the absolute value of the causal effect returned by Hellixia. In addition, the explanation provided for each prior corresponds to the description of its causal relationship. These structural priors can then be used later for network learning when relevant data becomes available.
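The derivation of the prior values is simple arithmetic. Assuming the -100 to 100 effects are rescaled by their absolute value (our reading of the description above, not a documented formula), it amounts to:

```python
# Hypothetical causal effects returned by Hellixia, on the -100..100 scale.
effects = {
    ("Adverse Weather", "Delays in Scheduled Flight Departures"): 85,
    ("Schedule Padding", "Delays in Scheduled Flight Departures"): -40,
}

# Structural prior per arc: absolute value of the effect (rescaled to 0..1
# here; the exact internal scale BayesiaLab uses is an assumption).
structural_priors = {arc: abs(e) / 100 for arc, e in effects.items()}
print(structural_priors)
```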
To finalize this first causal network, we employ the Hellixia Image Generator to create a unique icon for each node, based on its comment.
Let's move on to the creation of a more complex causal network by setting Complexity to High.
The next crucial step is an in-depth examination of this automatically generated network. For example, we observe that Fueling Delays is identified as a direct cause. Interestingly, Aircraft Turnaround Time is also identified as a direct cause. This leads us to speculate that Fueling Delays could be a direct cause of Aircraft Turnaround Time, which would have an indirect effect on flight delays.
To verify this hypothesis, we select the two nodes, Fueling Delays and Aircraft Turnaround Time, and apply the Hellixia Pairwise Causal Link feature. This will help us ascertain the nature of the causal relationship between these variables.
Hellixia validates the existence of this causal relationship and accordingly updates the conditional probability distribution of Aircraft Turnaround Time. This update incorporates a DualNoisyOr() function with a coefficient of 0.75, reflecting the quantified impact of Fueling Delays on Aircraft Turnaround Time.
Following this update, our next step involves removing the direct link from Fueling Delays to Delays in Scheduled Flight Departures. Subsequently, we need to adjust the DualNoisyOr() formula to accurately reflect this change in the network's structure.
Curious to delve deeper, we explore the causes of causes: we select the relevant node and once again make use of the Causal Network Generator, this time on Fueling Delays.
Upon reviewing the newly added nodes and relationships, we identify three relationships that are incorrectly marked as negative, contrary to the descriptions in their respective link comments. To rectify this, we change the color of these links to accurately reflect their positive nature and update the DualNoisyOr() formula of Operational Efficiency accordingly.
To conclude our analysis, we're going to build a final causal network, this time using the Causal Relationships Finder function. Unlike the Causal Network Generator, which added new nodes for creating the network, this feature works directly with selected nodes. To begin with, we use the Dimension Elicitor tool to identify the 5 main Causes and 5 main Effects associated with Delays in Scheduled Flight Departures.
We proceed by selecting the 10 causes and effects, along with the Delays in Scheduled Flight Departures node. With these nodes selected, we then run the Hellixia Causal Relationships Finder to create the network.
As a result, we obtain the bow-tie network structure below.
This brings us to the end of our article. For further insights, we invite you to view the recorded webinar on this topic, which was conducted in January 2024.
Venture with us into the fascinating world of the Labrador Retriever, one of the most cherished dog breeds across the globe. Harnessing the power of Hellixia, we will delve into the various characteristics that define this breed. From its temperament and physical attributes to its historical background and unique quirks, we will construct a detailed semantic network that reveals the intricate aspects of the Labrador Retriever. Join us as we explore what makes this breed so special and universally adored.
Create a node named "Labrador Retriever".
Use the following keywords to guide the Dimension Elicitor in its node analysis: Advantages, Aims, Behaviors, Characteristics, Competitors, Components, Definitions, Descriptions, Dimensions, Elements, Factors, Features, and Traits.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Labrador Retriever" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph (a rough intuition for this measure is sketched after this workflow).
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
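Node Force appears throughout these workflows; as a rough, purely illustrative proxy for the idea, the sketch below scores each node by the summed mutual information of the arcs touching it. BayesiaLab's actual Node Force computation is based on the learned Bayesian network and may differ substantially:

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete variables."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(px, py)[nz])).sum())

# Hypothetical binary data: B depends on A, C is independent noise.
rng = np.random.default_rng(7)
a = rng.integers(0, 2, 1000)
b = a ^ (rng.random(1000) < 0.2).astype(int)   # B = A flipped 20% of the time
c = rng.integers(0, 2, 1000)
data = {"A": a, "B": b, "C": c}
arcs = [("A", "B"), ("B", "C")]                # arcs of a (toy) learned network

# Proxy "node force": total strength of the arcs connected to each node.
force = dict.fromkeys(data, 0.0)
for u, v in arcs:
    w = mutual_information(data[u], data[v])
    force[u] += w
    force[v] += w
print(force)  # A and B carry most of the force; C barely any
```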
Welcome to an engaging exploration of man's best friend, as seen through the lens of semantic networks. In this example, we will use the power of Hellixia to unravel the intricate web of relationships between different dog breeds.
Whether you're a canine enthusiast, a professional breeder, or simply curious about the method, you'll find this demonstration enlightening and entertaining. Let's embark on this journey to better understand the world of dog breeds.
Create a node named "Dog Breeds".
Use the Dimension Elicitor, and enter "Sample" as a single keyword to extract the 10 main breeds.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Dog Breeds" node.
Select the 10 created nodes.
Open the Dimension Elicitor and enter "Competitors" as the keyword.
Set the General Context to "Dog Breeds". This ensures that the elicitor will only consider elements related to "Dog Breeds" during the analysis.
Adjust the settings of the Dimension Elicitor to extract 10 breeds per node.
Run the Dimension Elicitor with the node name as the Main Subject of the Query.
Review the results. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Repeat the same workflow on the new nodes.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Select all remaining nodes on your graph.
Open the Embedding Generator tool and set the Linguistic Unit to "Node Name" and "Node Comment". The linguistic unit refers to the part of the node that the Embedding Generator will use; in this case, it will analyze both the node names (i.e., the breeds of the dogs) and the node comments (i.e., the descriptions of the breeds).
Run the Embedding Generator.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Normalized Equal Distance is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Normalized Equal Distance algorithm pre-processes the data with a smoothing algorithm to remove outliers before computing equal partitions.
As a result, the algorithm is less sensitive to outliers than the Equal Distance algorithm.
The algorithm also takes into account the Minimum Interval Weight that defines the minimum prior probability of a bin.
You can adjust the default Minimum Interval Weight under Main > Menu > Window > Preferences > Discretization.
Equal Distance is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Equal Distance algorithm computes the equal distances based on the range of the variable.
This method is particularly useful for discretizing variables that share the same variation domain (e.g. satisfaction measures in surveys).
Additionally, this method is suitable for obtaining a discrete representation of the density function.
However, the Equal Distance algorithm is extremely sensitive to outliers and can generate intervals that do not contain any data points. Please see the Normalized Equal Distance algorithm, which addresses this particular issue.
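To make the contrast concrete, here is a small sketch of both approaches. BayesiaLab's actual smoothing step is not documented in detail, so the moving average below is only an assumption; a full implementation would also enforce the Minimum Interval Weight (the minimum prior probability per bin):

```python
import numpy as np

def equal_distance_bins(x, n_bins):
    """Equal-width thresholds over the raw range: simple, but a single
    outlier can stretch the range and leave empty intervals."""
    return np.linspace(x.min(), x.max(), n_bins + 1)

def normalized_equal_distance_bins(x, n_bins, window=5):
    """Rough sketch of the normalized variant: smooth the sorted data first
    (here, a moving average) so outliers influence the range less, then
    compute equal-width thresholds. The moving average is an assumption;
    BayesiaLab's exact smoothing algorithm is not published here."""
    xs = np.sort(x)
    smoothed = np.convolve(xs, np.ones(window) / window, mode="valid")
    return np.linspace(smoothed.min(), smoothed.max(), n_bins + 1)

data = np.append(np.random.default_rng(1).normal(0, 1, 500), 40.0)  # one outlier
print(equal_distance_bins(data, 5))             # range stretched up to ~40
print(normalized_equal_distance_bins(data, 5))  # range shrunk by smoothing
```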
Perturbed Tree is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Perturbed Tree algorithm is designed to optimize the representation of the probabilistic dependency between a Target variable and the to-be-discretized variable. It is an extension of the Tree discretization algorithm, and it functions as follows:
Data Perturbation generates a range of datasets.
For each perturbed dataset, a univariate tree is learned to predict the Target variable with the to-be-discretized continuous variable.
Extracting the most frequent thresholds produces the final discretization.
The Perturbed Tree algorithm takes into account the Minimum Interval Weight and can reduce the number of bins if necessary. It can also be more robust than the simple Tree discretization.
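The description above maps naturally onto a few lines of code. The sketch below uses bootstrap resampling as the perturbation and a scikit-learn decision tree as the univariate learner; both are assumptions, since BayesiaLab's exact Data Perturbation scheme is not spelled out here:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def perturbed_tree_thresholds(x, y, n_bins=2, n_perturbations=50, seed=0):
    """Sketch of the Perturbed Tree idea: perturb the dataset, learn a
    univariate tree predicting the target, collect the split thresholds,
    and keep the most frequent ones. Bootstrapping is an assumed stand-in
    for BayesiaLab's Data Perturbation step."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_perturbations):
        idx = rng.integers(0, len(x), size=len(x))        # bootstrap resample
        tree = DecisionTreeClassifier(max_leaf_nodes=n_bins)
        tree.fit(x[idx].reshape(-1, 1), y[idx])
        real = tree.tree_.threshold[tree.tree_.threshold > -2]  # skip leaves
        counts.update(np.round(real, 2))
    # The n_bins - 1 most frequent thresholds define the final discretization.
    return sorted(t for t, _ in counts.most_common(n_bins - 1))

# Hypothetical binary target that switches at x = 0.5:
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = (x > 0.5).astype(int)
print(perturbed_tree_thresholds(x, y))  # one threshold close to 0.5
```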
Welcome to the vibrant section of our website dedicated to showcasing Hellixia's semantic network examples, where analysis takes on a new dimension. This part of our site is a hub for curious minds eager to explore the complex interconnections within various domains such as philosophy, literature, cinema, song lyrics, and more.
With the help of Hellixia, we unravel the intricate relationships between ideas, themes, characters, and authors. From examining the moral quandaries in philosophical works like Machiavelli's "The Prince" or Hobbes' "Leviathan" to uncovering the essence of Shakespeare's "Hamlet" and Flaubert's "Madame Bovary," our analyses reach new depths.
But our exploration doesn't stop at books. We venture into the world of cinema, dissecting masterpieces like "Apocalypse Now," and dive into the poignant lyrics of songs by artists such as Nick Cave. Through Hellixia's power, we bring to life semantic networks that vividly illustrate the multifaceted connections and underlying themes in these works.
Whether you're a lover of classic literature, a cinema enthusiast, or a philosopher at heart, this section invites you to explore, learn, and engage with content in a way that transcends traditional analysis. Join us in this exciting journey where technology and creativity intersect, providing unique insights and fostering a deeper understanding of the world around us.
Step with us into the realms of power, strategy, and human nature as we set our sights on Niccolò Machiavelli's The Prince. Crafted in the crucible of Renaissance Florence, this timeless piece of literature stands as one of the most impactful texts in political philosophy, its influence reaching far beyond its era.
Machiavelli's frank, pragmatic exploration of power and statecraft provides a view of leadership that is as intriguing as it is controversial, and understanding his complex narrative requires a nuanced approach. To achieve this, we enlist the capabilities of Hellixia, BayesiaLab's subject matter assistant.
Using Hellixia's ability to generate intricate semantic networks, we can delve deep into the narrative threads of The Prince, illuminating the interconnected concepts, themes, and motifs that form the foundation of Machiavelli's groundbreaking treatise.
From the cunning strategies of political maneuvering to the paradoxical virtues of a successful leader, we'll explore the sophisticated landscape of The Prince, powered by the detailed semantic analysis provided by Hellixia. So, come and join us on this captivating journey as we uncover the layers of Machiavelli's enduring masterpiece.
Start by creating the node "The Prince".
Use the Dimension Elicitor, employing a broad array of keywords like "Characteristics", "Contributions", "Motivations", "Influencers", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below). Set the General Context to "Niccolò Machiavelli Political Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Prince" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
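A brief aside on what Node Force measures: in BayesiaLab, the force of an arc reflects the Kullback-Leibler divergence between the joint distribution represented by the network with the arc and the one obtained after removing it, and the force of a node aggregates the forces of the arcs connected to it. Schematically (the exact definitions are in the BayesiaLab documentation):

```latex
% Arc force: information lost when the arc X -> Y is removed from network B
F(X \rightarrow Y) = D_{KL}\!\left(P_{B} \,\middle\|\, P_{B \setminus \{X \rightarrow Y\}}\right)
                   = \sum_{\omega} P_{B}(\omega) \, \log \frac{P_{B}(\omega)}{P_{B \setminus \{X \rightarrow Y\}}(\omega)}

% Node force: the sum of the forces of the arcs incident to X
F(X) = \sum_{A \,\in\, \mathrm{arcs}(X)} F(A)
```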
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
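As a rough intuition for this step, variables whose embeddings lie close together in the semantic space end up in the same factor. The sketch below approximates the idea with agglomerative clustering over the toy vectors from the earlier sketch; BayesiaLab's Variable Clustering actually operates on the learned network itself, so treat this purely as an illustration.

```python
# Rough analogue of grouping semantically similar variables: agglomerative
# clustering on embedding vectors. Names and vectors are placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

names = ["Political Power", "Virtue (Virtu)", "Fortune (Fortuna)", "Statecraft"]
vectors = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.7, 0.3, 0.5],
    [0.9, 0.2, 0.2],
])

# Ward linkage over Euclidean distances between embeddings.
Z = linkage(vectors, method="ward")

# Cut the dendrogram into (here) two clusters of related concepts.
labels = fcluster(Z, t=2, criterion="maxclust")
for name, label in zip(names, labels):
    print(f"Factor_{label}: {name}")
```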
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
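Taboo is a score-based search: it repeatedly moves to the best neighboring structure while keeping a short "tabu" list of recent moves, which is what lets it escape local optima. The sketch below shows the generic pattern with a toy scoring function; BayesiaLab's actual Taboo learner (its score, operators, and constraint handling) is considerably more sophisticated, so this is only an illustration of the search strategy.

```python
# Generic score-based tabu search over arc sets, to give a feel for what a
# "Taboo"-style structure learner does. Everything here is illustrative.
import itertools

def neighbors(arcs, nodes):
    """All structures one arc-addition or arc-deletion away (arc reversal and
    acyclicity checks are omitted for brevity)."""
    for move in itertools.permutations(nodes, 2):
        yield (move, arcs - {move}) if move in arcs else (move, arcs | {move})

def tabu_search(nodes, score, iterations=100, tabu_size=10):
    current = frozenset()                     # start from the empty graph
    best, best_score = current, score(current)
    tabu = []                                 # recently touched arcs
    for _ in range(iterations):
        candidates = [(mv, s) for mv, s in neighbors(current, nodes)
                      if mv not in tabu]
        if not candidates:
            break
        # Move to the best neighbor even if it scores worse than the current
        # structure: this is what lets tabu search climb out of local optima.
        move, current = max(candidates, key=lambda c: score(c[1]))
        tabu = (tabu + [move])[-tabu_size:]
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best

# Toy score: reward arcs of a hypothetical "true" structure, penalize extras.
TRUTH = {("A", "B"), ("B", "C")}
score = lambda arcs: len(arcs & TRUTH) - 0.5 * len(arcs - TRUTH)
print(sorted(tabu_search(["A", "B", "C"], score)))   # [('A', 'B'), ('B', 'C')]
```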
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Welcome to our dedicated section, where we leverage Hellixia, BayesiaLab's new subject matter assistant, to explore the realm of philosophical essays. Here, we unpack the thoughts and arguments contained within works such as Niccolò Machiavelli's "The Prince," Thomas Hobbes' "Leviathan," and John Locke's "Two Treatises of Government." Through our analyses, we aim to construct semantic networks illuminating the complex webs of ideas and ideologies these essays present.
As we journey through each essay, we'll uncover the layers of philosophical discourse, revealing insights that have shaped political and moral thought for centuries. Join us as we navigate the pathways of these seminal philosophical works and gain a fresh understanding of their significance.
Embarking on an exploration of one of the most influential works in the realm of political philosophy, we turn our attention to Thomas Hobbes' Leviathan. Penned in a time of civil strife, Leviathan serves as a cornerstone of Western political thought, offering insights into the nature of social contract, sovereignty, and the legitimacy of political power.
Hobbes' arguments and reasoning, profound yet intricate, necessitate a thoughtful and systematic approach to understanding. That is where Hellixia, BayesiaLab's subject matter assistant, comes into play. With the power to construct detailed semantic networks, Hellixia provides us with a uniquely comprehensive way to interpret and examine the depth of Leviathan.
Utilizing these semantic networks, we will delve into the complex themes and ideas that Hobbes presents, mapping out the interconnections and dissecting the concepts that lie at the heart of Leviathan. From the notions of the state of nature and the social contract to the role and extent of sovereignty, our journey through this foundational text, powered by Hellixia's semantic analysis, promises a fresh perspective and new insights into Hobbes' grand political treatise.
Start by creating the node "Leviathan".
Use the Dimension Elicitor, employing a broad array of keywords like "Points", "Considerations", "Approaches", "Concepts", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Leviathan" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Prepare to delve into the richly complex world of William Shakespeare's Hamlet, one of the most influential works in English literature. With its iconic characters and timeless themes of power, revenge, morality, and madness, Hamlet continues to captivate audiences centuries after its creation.
To navigate the intricacies of this monumental work, we will create and explore semantic networks, providing a unique lens through which to view and understand Hamlet.
Through these semantic networks, we'll uncover the deep interconnections between the play's characters, themes, and motifs, illuminating the layered narrative and providing fresh insights into this enduring classic. Join us on this enlightening journey as we explore Hamlet in a way you've never seen before, brought to life through the power of Hellixia's semantic analysis.
Start by creating the node "Hamlet".
Use the Dimension Elicitor, employing a broad array of keywords like "Developments", "Ideas", "Perspectives", "Milestones", and many more, to conduct an exhaustive analysis of the play (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Hamlet" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our comprehensive exploration of Montesquieu's seminal work, "The Spirit of the Laws." Through the lens of Hellixia, we will embark on an intellectual journey to dissect and understand this monumental text, which remains a cornerstone in the realms of political science and philosophy.
In this section, we will conduct a detailed holistic analysis, delving deep into the complex layers that constitute this influential work. Focusing on various aspects like Concepts, Values, Impacts, and Perspectives, we aim to forge a rich, multidimensional exploration of Montesquieu's political theory. This analysis explains in depth Montesquieu's views on systems of governance, law, and the underlying principles that drive societies.
Join us as we traverse the intricate pathways of "The Spirit of the Laws", illuminating the timeless wisdom encapsulated within its pages and unraveling the broader implications and influences of Montesquieu's revolutionary thoughts on the modern world.
Start by creating the node "The Spirit of the Laws, by Montesquieu."
Use the Dimension Elicitor with this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, Topics, and Values.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "The Spirit of the Laws, by Montesquieu" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our specialized section on creating Causal Semantic Networks. This segment is dedicated to showcasing the process and benefits of constructing networks that represent the semantic relationships between different factors and set the causal orientations that drive those relationships. Through various case studies and demonstrations, we will illustrate how Hellixia, our subject matter assistant, aids in identifying and defining these causations. From historical events to scientific phenomena, these causal semantic networks will provide a rich, contextual understanding of complex systems. Let's embark on this journey of exploration and insight, seeking to make the invisible visible and the complex comprehensible.
Step into a realm where two of the Enlightenment's most profound thinkers, Thomas Hobbes and John Locke, are set side by side for scrutiny. This section is dedicated to a comparative analysis of these philosophical giants using the insights provided by Hellixia. While both philosophers tackled the nature of the social contract, governance, and human nature, their conclusions often diverged, leading to rich philosophical debates that resonate today. With the aid of semantic networks, we'll untangle the intricate threads of their arguments, highlighting areas of agreement and divergence. This exploration promises a study of their philosophies and a deeper understanding of the broader political and ethical landscape they helped shape. Join us in this captivating journey as we traverse the intricate terrains of Hobbesian and Lockean thought.
Start by creating the node "Thomas Hobbes and John Locke".
Use the Dimension Elicitor with a broad array of keywords like "Perspectives, Rules, Divergences, Ideas, Topics, Similarities and Differences", and set the General Context to "Political Philosophy."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Thomas Hobbes and John Locke" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
In this section, we harness the power of Hellixia, crafting a temporal and causal semantic network to delve into the relationships between 25 philosophers across time. With the help of Hellixia's Comment Generator, we construct a Temporal Indices Dictionary, enabling us to set temporal constraints.
Begin by creating a node named "Influential Philosophers".
Utilize the Dimension Elicitor with "Samples" as Keyword. Adjust the Responses per Keyword setting to 25 to ensure a broad collection of answers.
Review the dimensions returned by Hellixia, eliminating any that seem redundant or irrelevant to your analysis.
Select all nodes.
Run the Comment Generator with "Years" as the Keyword, setting the Responses per Keyword to 1, and checking the Node Name as the Main Subject of the Query. Set the Output Settings to Dimension Name. This step replaces the existing comments tied to the nodes with the primary date associated with each philosopher.
Review the comments to ensure their accuracy. Convert BC dates to negative values.
Export the Node Comments as a Dictionary and associate it with Node Temporal Indices. These indices will be automatically used as structural constraints to orient the arcs from past to future.
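To make the mechanism concrete: the dictionary simply maps each node name to an integer year, with BC dates expressed as negative numbers, and an arc is then only admissible from a lower index to a higher one. A minimal sketch, using illustrative philosophers with approximate birth years:

```python
# Illustrative temporal indices dictionary: node -> year, with BC dates as
# negative integers. Arcs are only allowed from past to future.
temporal_indices = {
    "Socrates":  -470,   # c. 470 BC
    "Plato":     -428,   # c. 428 BC
    "Aristotle": -384,   # 384 BC
    "Descartes":  1596,
    "Kant":       1724,
}

def arc_allowed(cause, effect):
    """Temporal constraint: a cause cannot postdate its effect."""
    return temporal_indices[cause] <= temporal_indices[effect]

print(arc_allowed("Plato", "Aristotle"))   # True
print(arc_allowed("Kant", "Socrates"))     # False
```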
Select all nodes.
Run the Comment Generator again, this time using "Field" as the Keyword and "Philosophy" as the General Context. Set Responses per Keyword to 2, set the Node Name as the Main Subject of the Query, and set the Output Settings to Dimension Name. Make sure to check the box for Append Output to Current Comment. This action appends each philosopher's two main fields of study to the node's existing comment.
Use the Maximum Weight Spanning Tree algorithm to construct the Causal/Temporal Semantic Network.
Select all nodes and change the node styles to Badges, which allows the display of each node's comment.
Run the Genetic Grid Layout algorithm to efficiently organize the nodes on your graph, reflecting the causal/temporal directionality of the connections.
Venture into the haunting narrative of "The Horla," Guy de Maupassant's masterful exploration of sanity's fragile line and the unknown's unsettling embrace. In this section, with Hellixia as our analytical compass, we will journey through two distinct facets of this chilling tale:
Narrative Analysis: We'll dissect the plot intricacies, key events, and character dynamics, laying bare the psychological currents that drive this unsettling story forward.
Holistic Analysis: Beyond the immediate narrative, we'll step back to capture the broader themes, motifs, and overarching sentiments that give "The Horla" its enduring resonance.
Together, let's plunge into the depths of this classic horror story, using semantic networks to illuminate its layers and offer fresh insights into Maupassant's unsettling vision.
In this section, we'll unravel the plot intricacies, key events, and character dynamics that form the backbone of Maupassant's haunting tale. Through the lens of Hellixia, witness the story's unfolding as we navigate its chilling corridors.
Start by creating the node "The Horla, by Guy de Maupassant."
Use the Dimension Elicitor, employing the keywords "Context, Developments, Entities, Events, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships," to conduct an exhaustive narrative analysis of the book.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "The Horla, by Guy de Maupassant" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative, we now embark on a holistic exploration of "The Horla." With Hellixia's insights, we'll delve into the deeper themes, emotions, and overarching concepts that permeate Maupassant's masterpiece, capturing its essence beyond just the storyline.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Welcome to the Holistic Analysis of Salman Rushdie's "Midnight's Children," facilitated by the advanced tools of Hellixia. In this comprehensive exploration, we follow our usual workflow to delve into the multifaceted narrative, characters, and themes of Rushdie's iconic work.
Adding a new dimension to our analysis, we will now also utilize the innovative Hellixia Report Analyzer feature. This state-of-the-art tool is adept at providing a useful summary of the novel's domain, focusing on the nuanced analysis of node forces and the strengths of the relationships within the story's network.
By integrating this feature into our holistic analysis, we aim to not only maintain our thorough examination but also enhance it with a succinct and insightful summary, capturing the essence of Rushdie's narrative in a way that complements our deep dive into the text.
Start by creating the node "Midnight's Children, by Salman Rushdie"
Use the Dimension Elicitor, employing a broad array of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Midnight's Children, by Salman Rushdie" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Switch to Validation Mode.
Generate the Relationship Report. This report returns two key pieces of information: the Node Force, which indicates the influence and importance of each node within the network, and the strength of all relationships as described in the network. This provides a comprehensive view of how nodes are interconnected and the significance of these connections.
Run the Report Analyzer: With the Relationship Report in hand, proceed to run the Report Analyzer. This tool is designed to synthesize the data into a narrative form. It interprets the node forces and relationship strengths to create a story that summarizes the main dynamics of the domain. This narrative provides a digestible and insightful summary of the complex relationships and key elements within the network.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors.
Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
After our initial exploration using the Report Analyzer on the network of "manifest variables," we are now set to delve deeper. Our next step involves generating a new report, this time concentrating on the hierarchical network – the domain of latent variables.
Immerse yourself in George Eliot's "Middlemarch," a literary masterpiece that profoundly looks into 19th-century provincial life in England. Leveraging the capabilities of Hellixia, our journey into this classic will be navigated through semantic networks, dividing our exploration into two distinct stages:
Narrative Analysis: By examining the plot intricacies, character dynamics, and the socio-personal currents influencing them, we'll draw deeper connections within the narrative.
Holistic Analysis: Stepping back from the immediate narrative, Hellixia will guide us through a broader examination of the novel. Tapping into diverse categories such as Achievements, Emotions, Themes, and Values, we aim to capture the multifaceted essence of "Middlemarch."
Join us in this exploration, where we aim to unravel the nuances and complexities of "Middlemarch" that continue to resonate with readers across generations.
From the unfolding Events to pivotal Milestones and distinct Locations to underlying Motifs, we'll spotlight the interwoven Relationships among the novel's Entities. Guided by essential keywords like Context, Developments, and Progressions, this section seeks to unveil the narrative depth and intricacies of Eliot's masterpiece.
Start by creating the node "Middlemarch."
Use the Dimension Elicitor, employing the keywords "Context, Developments, Entities, Events, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships," to conduct an exhaustive narrative analysis of the book. Set the General Context to "George Eliot novel".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Middlemarch" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative details, our next phase delves into the broader essence of "Middlemarch." Here, we venture beyond the story to understand its Achievements, Emotions, Themes, and Values, capturing the multifaceted heart of Eliot's work. This comprehensive exploration offers a panoramic view of the novel's enduring impact and significance.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Welcome to our Causal Bayesian Networks section, where we leverage Hellixia as a Subject Matter Assistant for constructing Causal Bayesian Networks. These networks feature directional arcs that convey causality. In contrast to Causal Semantic Networks, which primarily offer qualitative insights by highlighting semantic causal relationships between variables, Causal Bayesian Networks offer a dual approach, encompassing both qualitative and quantitative aspects. They serve not only to improve our understanding of a domain, but also to enable probabilistic and causal inference.
A Causal Knowledge Discovery Case Study in Dermatology
Skin hyperpigmentation is a common condition where patches of skin become darker than the surrounding skin. This conceptual example explores opportunities for developing new treatments and therapies. The starting point of any such endeavor should be a thorough causal understanding of the problem domain.
In this example, we leverage the capabilities of Hellixia, BayesiaLab's new subject matter assistant, to analyze the cause-and-effect interplay related to this skin condition.
Our focus is on constructing a comprehensive causal semantic network that highlights the factors influencing the onset and severity of hyperpigmentation. From genetic predispositions and environmental triggers to lifestyle habits, we search for the connections that are relevant to this condition. This exploration offers insights into the dynamics of skin hyperpigmentation.
Create a node named "Skin Hyperpigmentation with Visible Light."
Use the following keywords to guide the Dimension Elicitor's node analysis: Causes, Effects, Milestones, and Mechanisms, and set the General Context to "Dermatology."
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Skin Hyperpigmentation with Visible Light" node.
Change the style of all nodes to "Badges". This will display the comment within each node.
Given that the keywords 'Causes' and 'Effects' already embody causal semantics, our primary task now is to manually scrutinize the relationships between the nodes generated by the keywords "Mechanisms" and "Milestones". Generating embeddings and using structural learning can be beneficial during this analysis phase.
Manually draw arcs between the nodes to denote a causal relationship.
Select all arcs and utilize Hellixia's Explanation of Causal Arcs. If Hellixia concurs with the proposed causal relationship, it will provide an explanation, which will then be associated with the arc comments.
Run the Genetic Grid Layout: This will arrange the nodes on your graph while considering the causal directions of the connections. It positions the nodes so that the causal flow, as represented by the directed arcs, generally goes from the top of the graph toward the bottom, thereby providing a clear, hierarchical visual representation of the causal relationships.
Atopic Dermatitis, commonly known as eczema, manifests as red, itchy, and occasionally painful rashes, affecting both children and adults to varying degrees.
This section examines the many facets of atopic dermatitis, where genetic, environmental, and immunological factors converge to influence its development and progression. To better understand this complex disease, we use Hellixia to generate causal Bayesian networks, which provide a structured framework for deciphering cause-and-effect relationships.
But first, we'll start with a semantic analysis of the domain to get an overview of the main concepts, variables, and relationships in the field of atopic dermatitis.
We finally select the factors only (i.e., we focus on the higher level of this hierarchical network), and use the Hellixia Report Analyzer to generate a concise summary of the Relationship Analysis Report.
Having gained an overall understanding of the domain through semantic networks, we now move on to the construction of Causal Bayesian Networks using Hellixia's new capabilities that will be released in BayesiaLab 11.2.
We start by creating a node called "Atopic Dermatitis Mechanism", then select the Causal Network Generator feature.
After one or two minutes (the prompt is indeed quite complex), we obtain a fully specified Causal Bayesian Network (graph and probabilities). This network is characterized by causally oriented arcs, each accompanied by a concise explanation of the causal relationship and an estimate of the causal effect, scaled between -100 (shown in red) and 100 (shown in blue). To translate these causal effects into conditional probability tables, we use a new BayesiaLab formula, DualNoisyOr(), specially designed to integrate positive and negative effects between Boolean variables.
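The exact definition of DualNoisyOr() ships with BayesiaLab, but the underlying idea can be sketched: positive effects act like independent noisy-OR causes that push the child toward true, while negative effects act as independent inhibitors. Below is one hypothetical way effects scaled to [-100, 100] could be combined; it illustrates the dual noisy-OR idea, not BayesiaLab's formula.

```python
import math

# Hypothetical dual noisy-OR: positive parents raise P(child = true) in
# noisy-OR fashion, negative parents independently inhibit the activation.
# This is NOT BayesiaLab's DualNoisyOr() definition, only a plausible sketch.
def dual_noisy_or(effects, leak=0.01):
    """effects: causal effects in [-100, 100] of the parents that are true."""
    pos = [e / 100.0 for e in effects if e > 0]
    neg = [-e / 100.0 for e in effects if e < 0]
    # Noisy-OR activation from the leak term and the active positive causes.
    p_on = 1.0 - (1.0 - leak) * math.prod(1.0 - p for p in pos)
    # Each active negative cause independently dampens the activation.
    return p_on * math.prod(1.0 - q for q in neg)

# One strong positive cause (+80) partially countered by an inhibitor (-50):
print(round(dual_noisy_or([80, -50]), 3))   # 0.401
```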
Naturally, the networks generated by Hellixia MUST undergo rigorous evaluation by Subject-Matter Experts. This verification is crucial not only from a qualitative point of view to ensure that the network accurately represents real causal relationships but also from a quantitative point of view to confirm the relevance of the suggested causal effects.
Let's delve further into this domain by exploring the underlying causes of "Microbial Infection." To do this, we select the respective node in the network and proceed to the Causal Network Generator.
Displayed below is the generated causal network, showcasing the expanded view with detailed aspects of Microbial Infection. The yellow nodes are common to both the original and expanded networks, the grey nodes represent the original network nodes only, and the red nodes indicate the newly added dimensions specific to microbial infection.
We finally use the Hellixia Report Analyzer to generate a concise summary of the (Causal) Relationship Analysis Report.
We will now adopt a different workflow to construct a Causal Network for Atopic Dermatitis. We start by using Hellixia's Dimension Elicitor to identify relevant dimensions. With these nodes generated, we diverge from our usual practice of generating embeddings for semantic networks. Instead, we utilize Hellixia's new Causal Relationships Finder feature to automatically create a Causal Network based on our set of selected nodes.
We select a range of keywords to guide the Dimension Elicitor process in Hellixia, encompassing various aspects of the domain under study. These keywords include 'Accelerators,' 'Catalyzers,' 'Causes,' 'Drivers,' 'Mechanisms,' 'Consequences,' 'Symptoms,' 'Inhibitors,' 'Moderators,' 'Preventers,' and 'Treatments.'
We run the Causal Relationships Finder on the nodes elicited for the Atopic Dermatitis Mechanism. This tool examines potential causal connections among these nodes and, if required, generates latent variables to enhance the network's explanatory power.
Similar to the Causal Network Generator, the tool does more than identify causal links; it also quantifies the causal effects, which are represented on a scale ranging from -100 (indicated in red) to 100 (indicated in blue).
We conclude this section by utilizing the Hellixia Report Analyzer, which efficiently generates a concise summary of the (causal) Relationship Analysis Report for this latest network.
We create the node "Atopic Dermatitis" and then go through our usual workflow for creating a semantic network and then a hierarchical semantic network (see previous sections, e.g., ) to perceive the semantic landscape surrounding atopic dermatitis and lay the foundations for a deeper understanding of its underlying dynamics.
In this section, we demonstrate how Hellixia can be utilized to form a semantic network from the 145 keywords provided by the Dimension Elicitor, illustrating the semantic connections between these keywords.
Create Nodes: create a node for each keyword by importing a CSV file with all the keywords on the first line (one per column), a row of '0's on the second line, and a row of '1's on the third line. This structure lets BayesiaLab interpret each column as a variable.
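A small script can generate such a CSV directly from the keyword list (the file name and keywords below are placeholders):

```python
# Write a CSV that BayesiaLab can import as one variable per keyword:
# row 1 = keyword names, row 2 = all zeros, row 3 = all ones.
import csv

keywords = ["Achievements", "Concepts", "Emotions", "Themes"]  # ... up to 145

with open("keywords.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(keywords)
    writer.writerow([0] * len(keywords))
    writer.writerow([1] * len(keywords))
```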
Generate Embeddings: Once you have created your nodes, select them all and use the Embedding Generator. This tool will capture the semantic meaning associated with the node names.
Learn Semantic Relationships: Use the Maximum Weight Spanning Tree algorithm to learn the semantic relationships between these nodes (variables). This algorithm retains the most significant connections between the nodes, forming a tree that maximizes the total connection weight.
Automatic Node Positioning: Apply the Symmetric Layout algorithm to the nodes for automatic positioning. This will organize your nodes in a visually clear and understandable way.
Switch to Validation Mode and conduct a Node Force Analysis.