1 of 100

BAYESIA

Bayesia Homepage

An All-New Website for BayesiaLab 11

With the release of BayesiaLab 11, we are also transitioning to an entirely new website. If you can't find the content you are looking for on this new site, please check the Legacy Edition of the BayesiaLab Knowledge Hub.

BayesiaLab

What's New?

Learn about the latest innovations in BayesiaLab 11

BayesiaLab 11

Version 11 of BayesiaLab is the latest iteration of our flagship product that has been under continuous development for nearly 25 years. No other organization has invested as many resources in developing technologies around the Bayesian network paradigm.

Release 11 once again features many innovations, including the native integration of a LLM-based subject matter assistant (OpenAI, OpenAI GPT Assistants, Azure, Mistral, ...).

Here is a selection of the most important new features:

Hellixia

Hellixia is the name of BayesiaLab's subject matter assistant based on Large Language Models. Hellixia offers a wide range of functions to help you characterize a given problem domain:

Dimension Elicitor: Identify relevant dimensions of a problem domain by using a large set of keywords and create the corresponding nodes.
Comment Generator: Utilize a comprehensive set of keywords to pinpoint relevant dimensions within a problem domain and add them as comments to the nodes.
Embedding Generator: This tool creates embeddings encapsulating node semantics, featuring vectors of 1,536 dimensions, enabling the learning of semantic networks.
Class Description Generator: Generate descriptive summaries for set of nodes to use as names for latent variables, for instance.
Semantic Variable Clustering: Create clusters of nodes based on their semantic.
Pairwise Causal Link: This function evaluates the causal relationship between two nodes, adding an arc if a link exists. It also quantifies the causal effect (ranging from -100 to 100) and creates or updates the conditional probability table accordingly.
Causal Structural Priors: This tool assesses the causal relationship between two nodes and creates a Structural Prior if a relationship exists. The value of the prior reflects the confidence level in the relationship's existence.
Causal Arc Explainer: This tool examines the causal relationship between two nodes, providing a detailed description of the causal mechanism when a relationship is identified. Additionally, it quantifies the causal effect, with values ranging from -100 to 100.
Causal Network Generator: This tool develops a Causal Bayesian Network focused on the chosen node. It generates new nodes, adds detailed comments for each causal link explaining the mechanism, determines causal effects (with values between -100 and 100), and constructs the conditional probability tables.
Causal Relationships Finder: This tool, akin to the Causal Network Generator, is designed to build a causal network using a predefined set of nodes instead of centering around a single node and generating new nodes.
Image Generator: This feature produces icons that visually represent the information linked to the nodes.
Translator: This function translates various network elements — including names of nodes, states, and comments on nodes and arcs — into the chosen language.
Report Analyzer: This tool processes the output from the Relationship Analysis Report, such as arc and node forces, and creates an HTML report that details the key dynamics of the domain represented by the network.

Independence of Causal Influence

The Independence of Causal Influence (ICI) tool has been enhanced with several updates:

New Combination Functions

SumPos(): An asymmetrical variation of the Sum function focusing on positive local mechanical effects.
SumNeg(): A counterpart that emphasizes negative local mechanical effects.
MinMax(): A function that implements the min method for negative values and the max method for positive ones.

ICI Wizard Enhancement

A Condensed Display option has been introduced. This feature creates a network where the local effects are snapped to their parent and the combination nodes to their respective children.

The Expert Editor has been rebranded as the SMEs & BEKEE Session Manager.
Subject Matter Experts (SMEs) can now be identified with specific colors for better differentiation.
There's an option to decide whether to send out invitation emails to the SMEs.
In terms of qualitative knowledge elicitation, specifically the qualitative segment of the Delphi Method, you can now utilize the Assessment Editor to produce Notes directly on the Graph Panel, derived from the comments provided by experts.

When eliciting a node, its current distribution can be dispatched as a prior to all experts in BEKEE, serving as an alternative to the default uniform distribution.
Node Contextual Menu:
- Generate from Assessments: this function facilitates the creation of distributions based on the weighted votes of chosen experts.
- Generate Assessments: This feature uses the node's current probability distribution to create an assessment associated with a selected expert. When Prior Weights are linked to the node, there's an option to use these weights to determine the expert's confidence level in the assessments.
- Delete Zero-Confidence Assessments: this option removes all assessments in which the expert's confidence level is set to 0.
- Delete Assessments: his feature deletes the assessments linked to the chosen experts.
Hellinger Distance: Measures the distance between experts' votes and a reference expert (usually the consensus).
2D/3D Mapping incorporates new metrics derived from experts' assessments.

Formulas

The Formulas tab in the Node Editor now supports local variables.

Additionally, new functions have been introduced, with some of the most notable being:

TriangularMD(v1, x), i.e., triangular membership degree in fuzzy logic (under Special Functions)

Deciban(x): The deciban is a logarithmic unit — much like the decibel or the Richter scale — introduced by Alan Turing for expressing probabilities. It is a tenth of a ban, which is also known as the base-10 log odds (under Arithmetic Functions)
Hellinger(v1, v2): The Hellinger distance is a measure of the similarity between two probability distributions (under Inference Functions)
NoisySum(s, leak, v1, w1, vn, wn): Used for representing situations where the variable s is the weighted (wi) sum of its parents (vi) plus an additional noise term (leak) to model uncertainty or random fluctuations
DualNoisyOr(s, leak, c1, p1, cn, pn): This function implements a modified Noisy-Or model that operates based on the combined effect of all pi values. The parameters ci represent conditions or boolean variables, while pi are their associated effects (positive or negative). When the aggregated sum of pi values is positive, the function executes a Noisy-Or with an overall effect equal to this sum, effectively determining the probability of the True state. Conversely, when the sum is negative, the function applies the Noisy-Or logic to the False state, adjusting the likelihood of the outcome being False according to this negative sum
SingleMode(v): A function designed to ascertain whether the distribution of variable v is unimodal (under Inference Functions).

Weight of Evidence

Weight of Evidence now features four new types of analyses:

Most/Least Relevant Explanations
Most/Least Confirmatory Clues

Structural Learning Algorithms

The EQ-based learning algorithms are now disabled in scenarios where the score of an arc is not equivalent in both directions. This can occur due to filtered states, constraints, structural priors, etc. The assumption of equivalence is no longer theoretically valid in such contexts and could result in invalid networks with cycles.

Evidence Scenario Files

The data associated with the network can now be exported into an evidence scenario file.
Scenarios are now editable, allowing adjustments to the index, weight, and comments.
A new Evidence Scenario Report is now accessible, offering a detailed description of the scenarios' content.

Target Evaluation Tool

The redesigned Target Evaluation function now features dedicated tabs for:

Classification
Posterior Probabilities
Regression
Triage

Graph Layout, Rendering and Edition

Dynamic Grid Layout: This innovative layout algorithm, particularly suitable for creating readable graphs featuring badges with associated comments, excels in handling graphs created with Hellixia.

View Menu: four new functions have been introduced to optimize the display of graphs. Users can now shrink or stretch graphs both vertically and horizontally, offering enhanced visualization flexibility.

Position Menu: this new item has been introduced to enable the adjustment of the graphical layers of Nodes and Notes. It's available via their contextual menus.

Horizontal and Vertical Stacking: These new alignment tools enable the positioning of the selected nodes horizontally or vertically, aligning them automatically closely without extra space.

Highlight a Class: Accessible from the Note Contextual Menu, this feature lets you select a Class and then automatically adjusts the size and position of the note to encompass all nodes belonging to that class.

Arc Editor: Accessible by double-clicking an arc, this feature enables you to edit the text associated with the arc as well as its rendering properties.
Moving Arc Comments: You can now reposition comments along their corresponding arcs.

Color Linked: This new feature, added to the Rendering Properties of Badges, Monitors, Bars, and Gauges, automatically applies the node's associated color to the Name Background Color. Additionally, it also automatically selects white for the Name Color on dark backgrounds and black on lighter ones.
By pressing 'Z', a selection zone can be initiated, regardless of whether an object on the graph is clicked.
Numerical Evidence Entry for Gauges and Bars: A new approach is introduced for inputting numerical evidence through shift-clicking on a node. Utilize the 'M' and 'B' icons to select the Distribution Estimation Method (MinXEnt and Binary, respectively), with the three icon colors representing the Observation Type: No Fixing, Fix Mean, and Fix Probabilities, respectively.

Pseudo Root-Nodes: If a node exclusively has Function Nodes as parents, making it a root node of its subnetwork, and the parents of these Function Nodes have fixed observed values, then the distribution of these pseudo root-nodes is also automatically set to fixed.

Boolean Conversion: Featured in the Tools menu, this function enables the conversion of selected nodes into boolean nodes.

2D Mapping

The 2D mapping has been enhanced to incorporate an additional dimension for node analysis: Font Size, supplementing the existing Node Size and Color dimensions. This enables font sizes to be proportional to the selected metric.

The Node Analysis section has been enriched with the addition of numerous metrics, providing a more comprehensive analysis capability:
- Mutual Information with Target Node
- Mutual Information with Target State
- Bayes Factor
- Normalized Bayes Factor
- Kullback-Leibler
- Normalized Kullback-Leibler
- Total Effect on Target
- Standardized Total Effect on Target
- Direct Effect on Target
- Standardized Direct Effect on Target
- Number of Assessments
- Assessment Completion Rate
- Maximum Assessment Divergence
- Overall Assessment Divergence
- Missing Value Rate
Comments associated with the nodes are now displayed when you hover over them.
The option Hide Text for Ignored Nodes conceals the names of nodes that are not observable.

Introducing Hellixia

Background & Motivation

To this day, no reliable methods exist to find causal relationships in data. More specifically, given a statistical association between two variables, it is impossible to establish which variable is the cause and which is the effect.

As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, has always been necessary to determine the causal direction in bivariate relationships.

Thus, the arrival of ChatGPT last year has prompted the Bayesia team to immediately leverage the potential of this new type of AI with BayesiaLab.

Hellixia, BayesiaLab's New Subject Matter Assistant

Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:

Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes

Embeddings

In the context of machine learning and natural language processing (NLP), embedding refers to a mathematical representation of a word, phrase, sentence, or any other linguistic unit in a continuous vector space. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.

Semantic Network

A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.

In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.

Hellixia Workflow

A typical research workflow with Hellixia consists of the following steps:

BayesiaLab utilizes its structural learning algorithms to find associations between variables.
Then, Hellixia obtains the causal directions for the learned associations and applies them as structural priors to the network.
Finally, with these newly defined structural priors, BayesiaLab relearns the network. The final network now represents statistical knowledge from data plus the causal knowledge obtained from ChatGPT.

Hellixia Performance

The feature highlight of the BayesiaLab 11 is the integration of Hellixia, a subject matter assistant that leverages ChatGPT for structural knowledge elicitation.

Hellixia Preview at the 2023 BayesiaLab Spring Conference

Chat Completion
Image Generation
Embedding Generation

As a result, the new Hellixia subject matter assistant can improve research workflows in several ways:

Accelerate the qualitative part of knowledge elicitation.
Generate practical natural language descriptions for latent factors created through BayesiaLab's clustering functions.
Automatically create images to illustrate nodes in a network.

Presentation Video

Learn More

BayesiaLab

Overview

BayesiaLab is a powerful desktop application (Windows/Mac/Unix) that provides scientists with a comprehensive “laboratory” for machine learning, knowledge modeling, probabilistic reasoning (incl. diagnosis and simulation), causal inference, and optimization.
BayesiaLab utilizes the Bayesian network framework for gaining deep insights into problem domains and reasoning about them.
BayesiaLab is the result of more than twenty years of research by Dr. Lionel Jouffe and Dr. Paul Munteanu and their team of computer scientists. Their company, Bayesia S.A.S., is headquartered in Laval in northwestern France, with affiliates in the U.S. and Singapore.
Today, Bayesia S.A.S. is the world’s leading supplier of Bayesian network software, serving hundreds of major corporations and research organizations around the world.

Executive Summary

Executive Summary This executive summary in PDF format explains on two pages how BayesiaLab can support you in your research and decision-making workflows. Pass it along to anyone in your organization who needs to know — in non-technical terms — what BayesiaLab can do.

BayesiaLab's Core Features & Functions

Knowledge Modeling

Subject matter experts often express their causal understanding of a domain in the form of diagrams in which arrows indicate causal directions.
This visual representation of causes and effects has a direct analog in the network graph in BayesiaLab.
The causal direction can be encoded by orienting the arcs from cause to effect.
The quantitative nature of relationships between variables, plus many other attributes, can be managed in BayesiaLab’s Node Editor.
In this way, BayesiaLab facilitates the straightforward encoding of one’s understanding of a domain.
Simultaneously, BayesiaLab enforces internal consistency so that impossible conditions cannot be encoded accidentally.

See Examples & Learn More

Webinar: Optimizing Health Policies

Knowledge Elicitation

See Examples & Learn More

Discrete, Nonlinear, and Nonparametric Modeling

Machine Learning with BayesiaLab

BayesiaLab features a comprehensive array of highly optimized algorithms to efficiently learn Bayesian networks from data (structure and parameters).
The optimization criteria in BayesiaLab’s learning algorithms are mostly based on information theory (e.g., the Minimum Description Length).
With that, no assumptions regarding the variable distributions are made. These algorithms can be used for all kinds and all sizes of problem domains, sometimes including thousands of variables with millions of potentially relevant relationships.

Unsupervised Learning

In statistics, “unsupervised learning” is typically understood to be a classification or clustering task. To make a clear distinction, we emphasize “structural” in “Unsupervised Structural Learning,” which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between many variables without having to specify input or output nodes. One might say that this is a quintessential form of knowledge discovery, as no assumptions are required to perform these algorithms on unknown datasets.

Supervised Learning

Supervised Learning in BayesiaLab has the same objective as many traditional modeling methods, i.e., to develop a model for predicting a target variable.
Note that numerous statistical packages also offer “Bayesian Networks” as a predictive modeling technique. However, in most cases, these packages are restricted in their capabilities to one type of network, i.e., the Naive Bayes network.
BayesiaLab offers a much greater number of Supervised Learning algorithms to search for the Bayesian network that best predicts the target variable while also considering the complexity of the resulting network.
We should highlight the set of Markov Blanket algorithms for their speed, which is particularly helpful when dealing with many variables. In this context, the Markov Blanket algorithm can be an efficient variable selection algorithm.

See Examples & Learn More

Markov Blanket Learning Algorithms (9.0)
Chapter 6: Supervised Learning
Webinar: Diagnostic Decision Support

Clustering

Clustering in BayesiaLab covers both Data Clustering and Variable Clustering.
- Data Clustering applies to creating a Latent Variable whose states represent groups of observations (records) that share some characteristics.
- Variable Clustering groups variables according to the strength of their relationships.
Multiple Clustering is one of the steps of BayesiaLab's Probabilistic Structural Equation Model (PSEM) workflow. It consists of iteratively using Data Clustering on subsets of data defined by Variable Clustering to create Latent Variables that represent the hidden causes that have been sensed by Manifest Variables. This can be considered as a kind of nonlinear, nonparametric, and nonorthogonal factor analysis.

See Examples & Learn More

Data Clustering (7.0)
Variable Clustering (7.0)
Multiple Clustering (9.0)
Chapter 8: Probabilistic Structural Equation Models Webinar: Factor Analysis Reinvented — Probabilistic Latent Factor Induction

Inference: Diagnosis, Prediction, and Simulation

The inherent ability of Bayesian networks to explicitly model uncertainty makes them suitable for a broad range of real-world applications.
In the Bayesian network framework, diagnosis, prediction, and simulation are identical computations. They all consist of observational inference conditional upon evidence:
- Inference from observed effects to causes: diagnosis or abduction.
- Inference from observed causes to effects: simulation or prediction.
This distinction, however, only exists from the perspective of the researcher, who would presumably see the symptom of a disease as the effect and the disease itself as the cause. Hence, carrying out inference based on observed symptoms is interpreted as a “diagnosis.”

Observational Inference

One of the central benefits of Bayesian networks is that they represent the Joint Probability Distribution and can therefore carry out inference “omnidirectionally.”
Given an observation with any type of evidence on any of the networks’ nodes (or a subset of nodes), BayesiaLab computes the posterior probabilities of all other nodes in the network, regardless of arc directions.
Both exact and approximate observational inference algorithms are implemented in BayesiaLab.

Types of Evidence

Hard Evidence: no uncertainty regarding the state of the variable (node).
Likelihood/Virtual Evidence is defined by likelihoods associated with each variable state.
Probabilistic/Soft Evidence, defined by marginal probability distributions.
Numerical Evidence, for numerical variables or for categorical/symbolic variables that have associated numerical values.

See Examples & Learn More

Types of Evidence in Chapter 7: Unsupervised Learning

Causal Inference

Beyond observational inference, BayesiaLab can also perform causal inference for computing the impact of intervening on a subset of variables instead of merely observing these variables.
Pearl’s Graph Surgery and Jouffe’s Likelihood Matching are available for this purpose.

See Examples & Learn More

Effects Analysis

Many research activities focus on estimating the size of an effect, e.g., to establish the treatment effect of a new drug or to determine the sales boost from a new advertising campaign. Other studies attempt to decompose observed effects into their causes, i.e., they perform attribution.
BayesiaLab performs simulations to compute effects, as parameters as such do not exist in this nonparametric framework.
As all the domain dynamics are encoded in discrete Conditional Probability Tables (CPT), effect sizes only manifest themselves when different conditions are simulated.
Total Effects Analysis, Target Mean Analysis, and several other functions offer ways to study effects, including nonlinear and variable interactions.

See Examples & Learn More

Optimization

BayesiaLab’s ability to perform inference over all possible states of all nodes in a network also provides the basis for searching for node values that optimize a target criterion. BayesiaLab’s Target Optimization is a set of tools for this purpose.
Using these functions in combination with Direct Effects is of particular interest when searching for the optimum combination of variables that have a nonlinear relationship with the target, plus co-relations between them.
A typical example would be searching for the optimum mix of marketing expenditures to maximize sales. BayesiaLab’s Genetic Target Optimization will search, within the specified constraints, for those scenarios that optimize the target criterion.

Model Utilization

BayesiaLab provides a range of functions for systematically utilizing the knowledge contained in a Bayesian network. They make a Bayesian network accessible as a probabilistic expert system that can be queried interactively by an end-user.

Adaptive Questionnaire

In a medical context, for instance, this would allow for the optimal “escalation” of diagnostic procedures from “low-cost/small-gain” evidence (e.g., measuring the patient’s blood pressure) to “high-cost/large-gain” evidence (e.g., performing an MRI scan).

See Examples & Learn More

WebSimulator

Once a model is published via the WebSimulator, end users can try out scenarios and examine the dynamics of that model.

Code Export

Bayesia Engine API

The Bayesia Modeling Engine allows you to construct and edit networks.
The Bayesia Inference Engine can access network models programmatically for performing automated inference, e.g., as part of a real-time application with streaming data.
Finally, the Bayesia Learning Engine gives you programmatic access to BayesiaLab's discretization and learning algorithms.
The Bayesia Engine APIs are implemented as pure Java class libraries (jar files), which can be integrated into any software project.

See Examples & Learn More

Knowledge Communication

While generating a Bayesian network, either by expert knowledge modeling or through machine learning, is all about a computer acquiring knowledge.
However, a Bayesian network can also be a remarkably powerful tool for humans to extract or “harvest” knowledge.
Given that a Bayesian network can serve as a high-dimensional representation of a real-world domain, BayesiaLab allows us to interactively — even playfully — engage with this domain to learn about it.

Visualization in 2D, 3D, and VR

Through visualization, simulation, and analysis functions, plus the graphical nature of the network model itself, BayesiaLab becomes an instructional device that can effectively retrieve and communicate the knowledge contained within the Bayesian network.
As such, BayesiaLab becomes a bridge between artificial intelligence and human intelligence.

See Examples & Learn More

BayesiaLab Trial Software

Free 30-Day BayesiaLab Trial

We offer an unrestricted trial version of the BayesiaLab software so you can evaluate our technology at your leisure.

All BayesiaLab functions are available in the trial version. There are no restrictions on the number of nodes and observations.
Upon registering for the BayesiaLab trial version, we will typically send you the download and activation instructions within 24 hours.
The instructions you receive will include download links for a number of operating systems, including Windows, macOS (Intel and ARM), and Unix/Linux.
From the date you receive your trial license credentials, you can use BayesiaLab for 30 days.

The 30-day trial period starts with the delivery of your credentials, not the date you install the trial.

BayesiaLab Trial Registration

Note to Current BayesiaLab Users

Please don't use the evaluation version to restore or upgrade your existing BayesiaLab license. The installation files are different from the licensed versions of BayesiaLab.
If you require an update, you can download the latest version via the Help menu in BayesiaLab: Main Menu > Help > Check for Updates.

User Guide

Introduction

Since BayesiaLab's initial release in 2002, this User Guide has grown from a small help file to a comprehensive software documentation, now exceeding 1,500 topics.
With the BayesiaLab software eco-system continuing to grow rapidly, this User Guide is very much a living document, with more details being added daily. Plus, the annual cycle of major releases adds countless new features.
Beyond documenting the software functionality, this User Guide also serves as a reference to BayesiaLab-related nomenclature.
Many of BayesiaLab's analysis functions are entirely new and unique in the world of research, so many BayesiaLab-specific terms are neologisms. Here, you can find what we mean with expressions such as "Target Dynamic Profile" or "Likelihood Matching."
In this User Guide, you will also find many cross-references to examples and case studies presented in seminars, webinars, and our e-book, now available as a free online edition within this Knowledge Hub.

User Guide Structure

This User Guide's tree structure mirrors the BayesiaLab software's structure.
For instance, if you want to learn about the details of the function located in BayesiaLab's menu structure at Main Menu > Analysis > Visual > Overall > Arc > Mutual Information, the corresponding documentation resides in this User Guide at: Main Menu | Analysis | Visual | Overall | Arc | Mutual Information.
If the same function is accessible via multiple paths, e.g., from the Main Menu, from the Graph Panel Context Menu, and the Node Context Menu, the main documentation of this function will be attached to the highest level in the hierarchy in this case, the Main Menu. All other mentions of the function will refer back to this main entry.

BayesiaLab's Graphical User Interface

BayesiaLab runs inside the Application Window.
Inside the Application Window, there are four main elements:
- The Main Menu serves as the top-level navigation to all features and tools in BayesiaLab.
- The Toolbar provides quick one-click access to frequently used functions.
- The Graph Window is your work surface for creating and editing Bayesian network graphs.
- Each Graph Window corresponds to a Bayesian network, which you can save as a file in XBL format.
- The Graph Bar, at the bottom of the Main Window, allows you to manage multiple Graph Windows, which can all be opened simultaneously.
Main Menu
Toolbar
Graph Windows
Graph Bar

Main Menu

Context

The Main Menu serves as the top-level navigation in BayesiaLab.
Most functions and tools are available through multiple levels of submenus attached to the Main Menu.
However, in many cases, these functions are also accessible in the context of specific workflows.
As a result, this User Guide often shows multiple ways of launching the same tool or function.
The Main Menu can appear in three different configurations, and certain menu items and icons will only be available in specific contexts:

No Graph Open

Modeling Mode

Validation Mode

Open

With Open, you can select a Bayesian network file via a File Dialog and load it into Graph Window.

To the right of the file list, a preview panel shows you the structure of the Bayesian network to be loaded.
Additionally, you can specify what you wish to load along with the to-be-opened file. Clicking on the icons to the right of the file list allows you to toggle on and off specific file contents:
The Files of Type dropdown menu allows you to filter the types of Bayesian network formats to be displayed in the file list.

In addition to BayesiaLab's XBL format, select versions of BIF, NET, SSS, SCI, and DNE formats may be supported.

Bayesia does not guarantee the compatibility of BayesiaLab with any third-party or open-source Bayesian network formats.

Increment & Save

This command saves the current Bayesian network in a new file, adding an iteration number in parentheses to the current file name as a suffix.
If your current network is named Graph.xbl, the Increment & Save function will save it as Graph(2).xbl and not overwrite the original Graph.xbl file.
With each further iteration of Increment & Save, the counter in the suffix will increase by 1 unit, i.e., Graph(3).xbl, Graph(4).xbl, etc.
This is a helpful function for maintaining a history when developing a model, allowing you to revert to an earlier version when necessary.

Protect

Context

This password locking mechanism allows you to share your networks to make sure they will not be modified by unauthorized users.

When a network is locked, you cannot validate and save the modifications done in the Node Editor, add or delete arcs and nodes, associate dictionaries and databases for learning, modify classes, etc.

However, you can still edit the costs associated with the nodes as they are utilized only in the Validation Mode (e.g, Adaptive Questionnaire, not observable nodes, etc.).

Usage

To start protecting a network select Main Menu > Network > Protect.
Unless the network already has a lock, the following dialog box is displayed:

When the network is unlocked, the menu Network | Lock displays the following dialog box:

This dialog box allows you to:

lock the network using the existing password,
remove completely the Lock,
change the Lock Password.

Reports

Context

The Reports submenu within the Network menu offers an array of information about the Bayesian network in the active Graph Window.

Network Comments Report

Context

The Network Comments Report displays the information recorded in a network's Comment field.
And if available, the Network Comments Report also lists the associations of Node Names, Long Names, and Node Comments.

Usage

Select Main Menu > Network > Report > Comments.
The Network Comments window opens, and a typical report resembles the following screenshot.

For a network that is in an early stage of development with little customization, the Network Comments Report may only feature default information:

Network Report

Context

The Network Report is a very comprehensive documentation of the network in the active Graph Window.
It includes statistics about the network structure as a whole, plus details for each node, such as the Node States, the Conditional Probability Tables, and equations.
As such, it presents all qualitative and quantitative knowledge contained in the network as a long, tabular report.
To some extent, you could recreate the network from all these details.

Usage

Select Main Menu > Network > Reports > Network to create the Network Report.
The report can be quite substantial, depending on your network's size and complexity.
The following screenshot only shows the top portion of a much longer report:

For a thorough offline analysis, you may want to save the Network Report as an HTML file, which you can then open as a spreadsheet in Excel.

Occurrences Report

Context, Background, and Motivation

Occurrences refer to the number of observations in a cell of a Probability Table or a Conditional Probability Table.
The number of cells in a Conditional Probability Table is a function of the following parameters:
- The number of Parent Nodes.
- The number of Node States of the Parent Nodes.
- The number of Node States of the Child Nodes.
Here, Age is discretized into 4 states and BMI into 6 for a total of 48 cells in the table associated with BMI.

The numbers in each cell are counts of observations or Occurrences. In our case, each Occurrence represents one person from the sample of 200 individuals.
For instance, the Occurrence table associated with BMI states that Count(BMI≤20 | Age≤30)=2. So, we have only two Occurrences of that particular condition, i.e., only two individuals who are 30 years old or younger have a BMI of 20 or lower.
To create a Bayesian network, BayesiaLab needs to translate the Occurrences in each cell into probabilities.
However, with a small number of Occurrences, that can become an issue.
We have repeatedly referenced a rule of thumb, which says that we should have a minimum of 5 Occurrences per cell to estimate a Probability Table or Conditional Probability Table reliably.
In our example, several cells fall below the recommended minimum.
Such deficiencies are easy to recognize in a small example, but in more complex networks, it can be difficult to spot such weaknesses.
That is the motivation for the Occurrence Report. It displays all tables in a network and visually highlights potentially problematic cells with low Occurrences.

Usage

Select the nodes you want to include in the Occurrences Report. I none are selected, the analysis will be performed on all nodes.
Select Main Menu > Network > Reports > Reports> Occurrences to create the Occurrences Report.
The Occurrence Report opens up and shows all Probability Tables and Conditional Probability Tables.

The fields in the report are color-coded to highlight potential issues:
- Cells with 0 Occurrences are marked in red.
- Cells with 5 Occurrences are marked in yellow. This is generally considered the minimum number of Occurrences.
- Cells with 40 or more Occurrences are marked in green.
Furthermore, the Occurrence Report calculates the mean number of Occurrences for each row in all Probability Tables and Conditional Probability Tables.
If the mean value of any row in any of the nodes drops below the threshold of 5, the corresponding nodes are called out at the top of the report.

Confidence Intervals Report

Context

Whenever you learn a Bayesian network from a small dataset, you must consider whether the number of observations is sufficient for correctly estimating all Probability Tables and Conditional Probability Tables in the network.
For instance, using the Occurrences Report, you can evaluate whether all Conditional Probability Tables in your network meet the rule-of-thumb criterion of at least 5 observations per cell.
For a deeper analysis, BayesiaLab can produce the Confidence Intervals Report, which we discuss on this page.

Frequentist Parameter Estimation

To understand how Confidence Intervals can be computed, we first need to explain the estimation of probabilities in the Probability Tables and Conditional Probability Tables, the so-called parameters.
In BayesiaLab, these parameters are estimated using Maximum Likelihood, i.e., using the frequencies observed in the dataset:

where:
- ${\hat \theta }$ is the estimated probability,
- ${{x_i}}$ is the state $i$ of variable $x$ ,
- $N( \cdot )$ represents the number of occurrences of the argument in the data set.
So, the Parameter Estimation is straightforward and happens entirely in the background in BayesiaLab.
As a result, we may not always be aware of what numbers gave rise to the probabilities we see in a Probability Table or Conditional Probability Table, as the following diagram illustrates:

So, BayesiaLab could have estimated a probability of 0.1 (or 10%) for ${x_0}$ in numerous ways, e.g., based on a sample of 10 or 10,000: $\frac{1}{{10}} = \frac{{1,000}}{{10,000}} = 0.1$ .
However, in terms of our confidence in the estimate, the two approaches are not the same. Our intuition tells us that we should have more confidence in the 0.1 value calculated based on the sample of 10,000.

Confidence Intervals

From Frequentist Statistics, we know how to calculate a Confidence Interval ${I_c}$
for a proportion in a sample, which is exactly what the parameter ${\hat \theta }$ represents.
BayesiaLab is using precisely the same approach for the Confidence Intervals Report.
So, for a Confidence Level of 95%, the Confidence Interval ${I_c}$ is calculated as: ${I_c} = \left[ {{{\hat \theta }_v} - 1.96 \cdot \frac{{\sigma (V)}}{{\sqrt N }};{{\hat \theta }_v} + 1.96 \cdot \frac{{\sigma (V)}}{{\sqrt N }}} \right]$
where $\sigma (V) = \sqrt {{{\hat \theta }_v}(1 - {{\hat \theta }_v})}$
If zero observations were observed for a given state, e.g., $N(X = {x_0}) = 0$ , the Rule of Three would have to be used instead to produce Confidence Intervals: ${I_c} = \left[ {{{\hat \theta }_v} - \frac{3}{{\sqrt N }};{{\hat \theta }_v} + \frac{3}{{\sqrt N }}} \right]$
However, in BayesiaLab, you can avoid resorting to this heuristic by using Uniform Prior Samples.

Usage

Within this network, focus on the three nodes BMI, Age, and Gender:

Go to Main Menu > Network > Reports > Confidence Intervals to start the Confidence Intervals Report.
The Confidence Interval Report window opens up.

At the top of the report, the Confidence Level that serves as the basis for the reported Confidence Intervals is displayed.
Then, for each node, one table is shown.
For each cell containing a parameter estimate, an adjacent cell to the right displays the corresponding Confidence Interval in percentage points.
The color-coding scheme is identical to the one used in the Occurrences Report.
The fields in the report are color-coded to highlight potential issues:
- Cells with 0 Occurrences are marked with a red background.
- Cells with 5 Occurrences are highlighted with a yellow background. This is generally considered the minimum acceptable number of Occurrences.
- Cells with 40 or more Occurrences are marked with a green background.

Options

You can adjust the Confidence Level used for this report.
Go to Main Menu > Window > Preferences > Tools > Statistical Tools.
Select the desired value from the Confidence Level dropdown menu.

Note that your selection here also applies to all other statistical tools and tests used in BayesiaLab.

Network Comments

Context

Network Comments provides space for notes, descriptions, and references regarding a Bayesian network.
In the Network Comments field, you can enter and edit paragraph-style text.

Usage

You can access the Network Comments Editor in two ways:
- Main Menu > Network > Properties > Comments.
- Graph Panel Context Menu > Properties > Comments.
A new window opens featuring the Node Comments Editor.
By default, the Node Comments field contains the date and time the file was created, plus the user that created the file.
Alternatively, the Network Comments field displays any custom text you may have defined, such as a problem domain description.
You can apply HTML-style formatting to your text using the toolbar, including links and images.
Note that Network Comments are automatically saved with the network file.
If you share your network file with others, the information contained in Network Comments will be accessible to them.

Open Data Source (Data Import Wizard)

Context

The Data Import Wizard is the principal tool in BayesiaLab for preprocessing and importing external data.

Data Sources

You can use BayesiaLab's Data Import Wizard to import data from two types of sources:

Data tables in text format, in which data fields are separated by delimiters, such as comma, semicolon, tab, or pipe "|". The most common format is CSV.
Data tables in SQL-compatible databases can be accessed via a JDBC driver. Third-party JDBC drivers are available for all major databases.

All data sources must be structured as a single table, i.e., with rows and columns. All table joins must be performed before importing the data in BayesiaLab.

Usage

To launch the Data Import Wizard for a data table in a
- text file, select Main Menu > Data > Open Data Source > Text File.
- database, select Main Menu > Data > Open Data Source > Database.

Then, the Data Import Wizard guides you through five sequential steps. The first step of the Data Import Wizard depends on the data source, i.e., text file or database. All subsequent steps of the Data Import Wizard are the same for both types of data sources.

Data Structure Definition
- Data table in a text format
- Data table in a database
Definition of Variable Types
Data Selection, Filtering, and Missing Value Processing
Discretization and Aggregation
Import Report

Step 1 — Data Structure Definition: Text File

Context

Open Data Source (Data Import Wizard) brings data into BayesiaLab to create a new Bayesian network.

BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.

Usage

In Step 1 — Data Structure Definition: Text File of the five-step Data Import Wizard, you need to define the dataset structure for BayesiaLab so that the data can be imported and interpreted correctly.
The Data Structure Definition window opens up.
Specify all Settings & Options (see below).
Click Next to proceed the Step 2 — Definition of Variable Types.

Many of the settings can be immediately reviewed and validated in the Data Preview panel. However, Missing Values or Filtered Values can be mischaracterized and yet go unnoticed and, later, introduce major problems causing misleading analysis results.

Separators

The Data Import Wizard will attempt to automatically identify the separator or delimiter of the fields in the data table.

However, there can be ambiguous situations in which you need to specify the separator by checking the appropriate box:

Tab
Semicolon
Comma
Space
Other

If you prepare a dataset externally for import into BayesiaLab, ensure that separators are unique and do not appear as content in any data field. So, if any data fields contain text with commas as content, you cannot use commas as the separator. In such a case, try a tab or semicolon.

Encoding

The Encoding drop-down list allows you to select an alternative encoding for the dataset to be imported. This can become necessary for importing data from certain legacy systems.

Missing Values

Specifying the correct code for Missing Values is very important so that BayesiaLab can process such Missing Values appropriately.

The list shows a number of codes that are commonly used for Missing Values. However, this is not necessarily comprehensive, and your dataset may contain different codes, such as "." (dot) or "-9999", etc.

Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.

Deleting a default entry such as NR (for no response) may become necessary, for instance, if a data field contains the string "NR" as a valid value. That would be the case if your data set included New York Stock Exchange ticker symbols. In this context, "NR" would be the symbol of Newpark Resources, Inc. Unless you address this issue, all "NR" strings would be treated as Missing Values.

You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.

Filtered Values

Just as important as the correct definition of Missing Values is a clear understanding of a Filtered Value.

A Filtered Value occurs when a variable cannot have any value for logical reasons. For instance, in a demographic dataset, there could be a field Age at Retirement. However, in the record of a 16-year-old high school student in this dataset, there could be no value for the field Age at Retirement. However, this situation must not be treated as a Missing Value! A Missing Value implies that a value exists but is unknown. In the case of the student's record, a value is logically impossible, not missing. So, instead of a numerical value or a blank, you must specify a code that says that there can be no value. This is the purpose of assigning a Filtered Value code.

Importantly, you must encode any Filtered Values before importing your dataset into BayesiaLab. In BayesiaLab, you merely need to declare what code you used in your dataset to represent Filtered Values. BayesiaLab will create a Filtered State as an additional state in each node for which Filtered Values are encountered during data import.

Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.

You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.

In Data Preview, all Filtered Values are marked with an asterisk (*) in the data table.

Understanding the difference between Missing and Filtered Values is critically important.

Sampling

Clicking Define Sample button opens a window that allows you to sample records from your data source.

This is particularly useful for the preliminary analysis of large datasets. By default, BayesiaLab imports all records from the data.

You can define a subset in three ways:

Random Sample — Size in Percent: specify the size of the random sample as a percentage of the original dataset size.
Random Sample — Size: specify the number of records in the sampled dataset.
Custom Range — First Row to Last Row: specify the range of records to be imported.

Checking the option Fixed Seed and specifying a number ensures that you can repeat exactly the same random sampling for each iteration of the import. This allows you to reproduce your results as you develop your model.

Learning/Test

By default, the Data Import Wizard loads the entire dataset as a Learning Set.

By clicking the Define Learning/Test Sets button, you can set aside a Test Set (or holdout sample).

You can define the Learning Set/Test Set split in three ways:

Random Test Set — Size in Percent: specify the size of the Test Set as a percentage of the original dataset size.
Random Test Set — Size: specify the number of records in the Test Set.
Custom Test Set — First Row to Last Row: select a specific range of records for a Test Set.

Checking the option Fixed Seed and specifying a number ensures that you can obtain the same Test Set with each iteration of the import. This allows you to reproduce your results and validation measures as you develop your model.

In addition to specifying a Learning Set/Test Set split here, you can define a split in other ways:

You can designate a variable in the original dataset to assign records to the Learning Set and Test Set. You can select such a variable in the next step of the Data Import Wizard: Step 2 — Definition of Variable Types.
Main Menu > Data > Data Set > Generate Learning/Test Split

Furthermore, you can remove the Learning Set/Test Set split at any time:

Main Menu > Data > Data Set > Remove Learning/Test Split.

Options

The Options Panel allows you to manage the interpretation of the to-be-imported dataset.

Title Line:
- By checking this option, BayesiaLab reads the first row of the dataset and uses its values as column headers.
- If the values in the first row are not compatible, e.g., due to missing values or duplicate values, you are prompted to accept the proposed corrections, which include adding suffixes for duplicate names and substituting missing values with generic column headers, e.g., N0, N1, N2, etc.
End of Line Character:
- With some files, it may be necessary to specify a certain character so that BayesiaLab can correctly detect the end of a row in a data table.
Consider Identical Consecutive Separators as One:
- Check this box so that if you have multiple consecutive separators of the same type, e.g., “;;;”, the Data Import Wizard will treat them as a single separator.
Consider Different Consecutive Separators as One:
- Check this box so that if you have multiple consecutive separators of any type, e.g., “;,|”, the Data Import Wizard will treat them as a single separator.
Double Quotes:
- Remove
- As String Delimiters
Simple Quotes:
- Remove
- As String Delimiters
Transpose:
- By default, BayesiaLab expects the data source to be arranged in
  - columns corresponding to variables and
  - rows corresponding to samples, records, or observations.
- Checking the Transpose option allows you to accept an alternate format, i.e.,
  - rows corresponding to variables and
  - columns corresponding to samples, records, or observations.
- The transposed format is commonly used in bioinformatics. For instance, variables representing genes — sometimes tens of thousands — are arranged row by row. Observations — sometimes only a few dozen — are placed in columns side by side.

The data table at the bottom of the window provides a preview of how the Data Import Wizard sees and interprets your dataset.

Blank fields indicate a Missing Value.
Asterisks (*) mark Filtered Values. In the dataset shown below, for instance, Filtered Values were assigned to all males and post-menopausal women for the variable Pregnancy Status. For those two groups and for obvious reasons, pregnancy is impossible.
Horizontal and vertical sliders allow you to scroll and view the entire dataset. Alternatively, you can move your mouse's scroll wheel up and down.
If a variable name exceeds the column width, you can click on the divider between column headers and drag it into the desired position. Alternatively, double-click the divider to auto-fit the column width to the variable name.

Workflow Animation

In the following animation, we show a dataset that requires numerous settings to be adjusted for proper import:

The dataset uses the pipe character ("|") as a delimiter.
All fields are enclosed in double quotes.
Multiple, arbitrary codes are used for Missing Values:
- "Refused"
- "unknown"
"Not Applicable" is the code for Filtered Value used in this dataset.

Note that there are no standardized codes for Missing Values and Filtered Values. They can be as arbitrary as in this example. Therefore, it is of utmost importance that whoever prepares the dataset must convey the precise meaning of these codes to the analyst who imports the data into BayesiaLab.

Step 2 — Definition of Variable Types

Context

In Step 2 — Definition of Variable Types of the five-step Data Import Wizard, you need to define variable types.
Step 2 contains four panels that relate to each other in their content and available actions.

Overview of Elements in Step 2

Type

With the radio buttons in the Type panel, you can define the type of each variable.
Before you start making your determinations, BayesiaLab has already made some guesses regarding the appropriate variable type, i.e., Discrete versus Continuous.
Furthermore, some variables have limited options regarding the variable type because of their distributions:
- If a variable has the same value for all observations, it falls into the Unused variable type. Such a not-distributed variable cannot be imported at all into BayesiaLab.
- Variables that contain any text values cannot be declared Continuous variables.
- Variables with Missing Values cannot be of the type Weight, Row Identifier, or Learn/Test.

Usage

You can perform the selection of multiple variables with keystroke combinations commonly used in spreadsheet editing:
- Ctrl+Click: add a variable to the current selection.
- Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
- Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
- Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
The current selection is highlight by showing the selected columns in a darker shade of their current color.

Discrete

The Discrete type considers each unique value of the variable a distinct state.
Any variable that contains text will be considered Discrete by default.
The maximum number of unique values that can be accommodated can be specified under Main Menu > Window > Preference > Editing > Node > Maximum Number of States.

Continuous

The Continuous type applies to numerical variables, which must be discretized in Step 4 — Discretization and Aggregation.
If a variable contains integer values above a certain threshold, the variable will be considered Continuous.
You can specify this threshold under Main Menu > Windows > Preferences > Data > Import & Associate > Threshold for Assuming Integers as Continuous. The default threshold value is 5.

Learn more about Discrete and Continuous nodes in the Node Editor topic.

Weight

Weighting is often applied to surveys to make a survey sample representative of the demographics of the underlying population.

If your dataset contains such a Weight variable, select it by clicking on the corresponding column.
Then, select the Weight button in the Type panel.
Later, in Step 4 — Discretization and Aggregation, you can specify whether or not to normalize the Weight variable.

Learning/Test

For a dataset that has already been split into a Learning Set and a Test Set, you can use such an existing definition to import your data into BayesiaLab.

Both the Learning Set and the Test Set need to be in the same data table, rather than in separate files.
A binary indicator variable needs to identify each set with a unique code.
With a Learning/Test variable defined, in Step 4 — Discretization and Aggregation of the Data Import Wizard, you need to assign which of your codes corresponds to BayesiaLab's Learning and Test states.

Row Identifier

You can assign one or more variables to serve as Row Identifiers. The values of Row Identifiers are imported but not processed in any way. They serve as labels that are attached to each record.

There are numerous functions in BayesiaLab that allow you to look up what record in the dataset corresponds to what is currently on display on the screen.
For instance, Automatic Evidence-Setting displays the Row Identifier in the Status Bar.

Unused

By selecting the Unused button, you can skip the import of the selected variables. In previous versions of BayesiaLab, this option was also known as "Not Distributed."

Unused is automatically applied to variables containing only a single value across all observations, i.e., when the variable is "not distributed," hence the original name.
Unused variables will appear grayed out in the remaining steps of the Data Import Wizard.

Multiple Typing

The Multiple Typing panel allows you to quickly assign variable types across multiple variables.

By clicking either button, all previous type assignments are replaced.

You can automatically remove variables, i.e., set them to the Unused type, if they exceed a certain column percentage of Missing Values.

Click the Set Missing Values Threshold button.
From the pop-up window, set the percentage.

All variables that exceed the specified threshold are set to Unused.

Information

The Information panel provides a range of statistics relating to the current type assignment of variables:

Number of Rows refers to the number of records in the to-be-imported datasets. In the context of datasets, rows, records, cases, samples, and observations all have equivalent meanings.
Others displays the count of all the variable assigned to the types Row Identifier, Weight, or Learn/Test.
Unused shows the absolute count of variables currently assigned to the Unused type. The percentage refers to the proportion of Unused variables among all variables.
Missing Values displays the count of cells in the dataset that contain Missing Values. The percentage refers to the proportion of cells in the dataset that contain Missing Values, including all variables types, even Unused, Row Identifier, and Learning/Test.
Filtered Values displays the count of cells in the dataset that contain Filtered Values, as indicated by the asterisk (*). The percentage refers to the proportion of cells in the dataset that contain Filtered Values, including all variable types, even Unused, Row Identifier, and Learning/Test.

Data

Horizontal and vertical scrolling allows you to view the entire dataset that will be imported.

Workflow Animation

Step 3 — Data Selection, Filtering, and Missing Value Processing

Context

Step 3 of the five-step Data Import Wizard deals with Data Selection, Filtering, and Missing Values Processing.

Overview of Elements

Data

This Data panel resembles the Data panel from Step 2 — Definition of Variable Types.

However, there are several important additional pieces of information available:

- For Discrete variables, it shows the frequencies of all states, including Missing Values and Filtered Values:

As you experiment with checking/unchecking, you can see how the Number of Rows in the Information panel changes.

In terms of a data query, the Filter checkbox would be the equivalent of a nominal value row filter.

Note that the number of Filtered Values does not refer to the number of excluded rows due to an unchecked Filter checkbox.

For Continuous variables, it shows the standard statistics, such as Minimum, Maximum, Mean, and Standard Deviation. Additionally, the table displays the frequencies of non-missing values, Missing Values, and Filtered Values:

Select Values

Three actions are available in this panel:

You can choose the logic for combining the Filters and Minima/Maxima assigned in the Data panel:
- OR: a row will be removed if ANY of the selected Filters or specified Minima/Maxima across all variables apply to that row.
- AND: a row will only be removed of ALL of the selected Filters and specified Minima/Maxima across all variables that apply to that row.
Click the Show Selections button to review what Filters and Minima/Maxima are currently in place.
Note the syntax for Discrete variables: The variable name is followed by "in" (i.e., is an element of) followed by the included values shown as an array in square brackets.
Further logical expressions are shown as conjunctions (AND) or disjunctions (OR) in separate lines.

Clicking the Delete Selections button removes all Filters and Minima/Maxima currently in place.

Missing Values Processing

In the Missing Value Processing panel you can specify which kind of processing to apply to variables with Missing Values, i.e., Filter, Replace, and Infer.

Filter

The Filter function allows you to remove rows from the dataset that contain Missing Values. This is equivalent to what is commonly known as casewise deletion.

You can apply the Filter individually to any variable that contains Missing Values.

Usage

Then, check the Filter checkbox in the Missing Values Processing panel.
Next, choose the logical condition to apply when you select multiple variables to be subject to the Filter.
- OR: a row will be removed if ANY of the selected variables contain a Missing Value in that row.
- AND: a row will only be removed of ALL of the selected variables containing a Missing Value in that row.

Before applying Filter, please consider the implications discussed in Chapter 9: Missing Values Processing.

Replace By

With the Replace By function, you can specify a value for replacing the Missing Values in the selected variable.

You have several options in this regard:

You can set a specific value:
- For a Discrete variable, you can select among the values observed in the variable from a drop-down list.

Alternatively, you can choose the Modal value, i.e., the most frequently occurring value of the variable in the dataset.

For a Continuous variable, you can select to use the Mean value computed from the dataset.

As an alternative, you can specify any arbitrary value.

Infer

For practical analysis purposes, the Infer option is the most common method for Missing Values Processing.

The Methods in Detail:

Infer — Static Imputation
Infer — Dynamic Imputation
Infer — Structural EM
Infer — Entropy-Based Imputations

Information

Weights

Context

This screen is only available if you designated a Weight variable in Step 2 — Definition of Variable Types.

Usage

Click on that Weight variable in the Data panel, and the Normalize Weights checkbox appears as the only option on the screen.

You need to determine whether to apply Normalize Weights or not:
- If yes, the Weights will be normalized so that the total number of cases considered by BayesiaLab for machine learning is equal to the actual number of samples in the dataset.
- If no, the Weight variable will be treated as representing the actual number of observed cases. So, a weight of 10 for one observation would be treated and counted like ten instances of that same observation. As a result, the total number of cases considered by BayesiaLab would correspond to the population from which the weight was calculated.
- This example illustrates the situation for a survey consisting of 10 observations:
- If you do not normalize, BayesiaLab would consider a sample of 100 for learning purposes and presumably find spurious relationships. This "over-counting" by a factor of 10 has the same effect as reducing the Structural Coefficient to 0.1.
- If you normalize, BayesiaLab considers the correct proportions of the weighted samples but still only considers ten observations in total for learning purposes.

If you have specified a Weight variable, it will be taken into account in the Discretization and Aggregation algorithms.

Discretization

Context

BayesiaLab requires the discretization of all Continuous variables, and in this screen, you need to specify how to discretize those variables.
The Discretization process determines how a Continuous variable will be imported into BayesiaLab, i.e.,
- the number of intervals (or bins);
- the values of the thresholds which define the ranges of the intervals.
These attributes define the transformation of the underlying Continuous variable in the dataset into a discretized Continuous node in BayesiaLab.

To learn more about the important distinction between Continuous and Discrete nodes, please see these topics:

Continuous Nodes
Discrete Nodes

Usage

Select one or more Continuous variables and click into one of the headers or one of the corresponding columns.
The Discretization panel appears.

Discretization Types Overview

The first item in the Discretization panel is the Discretization Type drop-down menu.
The items on this list can be grouped into Automatic Discretization versus Manual Discretization.
- The bottom item on the drop-down menu, Manual, refers to a Manual Discretization approach in which you have full control over thresholds, etc.
- The remaining eleven items all refer to different kinds of Automatic Discretization.

However, even in Manual Discretization, you take advantage of the algorithms available with Automatic Discretization.

Discretization Types in Detail

Manual Discretization
Automatic Discretization

Export Dictionary

Edit

View

Hellixia

Learning

Inference

Analysis

Monitor

Tools

Window

Help

Network

Edit

View

Display

Modeling Tools

Monitor Tools

Inference Tools

Analysis Tools

Miscellaneaous Tools

Graph Bar

What's New?

Learn about the latest innovations in BayesiaLab 11

BayesiaLab 11

Release 11 once again features many innovations, including the native integration of a LLM-based subject matter assistant (OpenAI, OpenAI GPT Assistants, Azure, Mistral, ...).

Here is a selection of the most important new features:

Hellixia

Hellixia is the name of BayesiaLab's subject matter assistant based on Large Language Models. Hellixia offers a wide range of functions to help you characterize a given problem domain:

Dimension Elicitor: Identify relevant dimensions of a problem domain by using a large set of keywords and create the corresponding nodes.
Comment Generator: Utilize a comprehensive set of keywords to pinpoint relevant dimensions within a problem domain and add them as comments to the nodes.
Embedding Generator: This tool creates embeddings encapsulating node semantics, featuring vectors of 1,536 dimensions, enabling the learning of semantic networks.
Class Description Generator: Generate descriptive summaries for set of nodes to use as names for latent variables, for instance.
Semantic Variable Clustering: Create clusters of nodes based on their semantic.
Pairwise Causal Link: This function evaluates the causal relationship between two nodes, adding an arc if a link exists. It also quantifies the causal effect (ranging from -100 to 100) and creates or updates the conditional probability table accordingly.
Causal Structural Priors: This tool assesses the causal relationship between two nodes and creates a Structural Prior if a relationship exists. The value of the prior reflects the confidence level in the relationship's existence.
Causal Arc Explainer: This tool examines the causal relationship between two nodes, providing a detailed description of the causal mechanism when a relationship is identified. Additionally, it quantifies the causal effect, with values ranging from -100 to 100.
Causal Network Generator: This tool develops a Causal Bayesian Network focused on the chosen node. It generates new nodes, adds detailed comments for each causal link explaining the mechanism, determines causal effects (with values between -100 and 100), and constructs the conditional probability tables.
Causal Relationships Finder: This tool, akin to the Causal Network Generator, is designed to build a causal network using a predefined set of nodes instead of centering around a single node and generating new nodes.
Image Generator: This feature produces icons that visually represent the information linked to the nodes.
Translator: This function translates various network elements — including names of nodes, states, and comments on nodes and arcs — into the chosen language.
Report Analyzer: This tool processes the output from the Relationship Analysis Report, such as arc and node forces, and creates an HTML report that details the key dynamics of the domain represented by the network.

pageIntroducing Hellixia

Independence of Causal Influence

The Independence of Causal Influence (ICI) tool has been enhanced with several updates:

New Combination Functions

SumPos(): An asymmetrical variation of the Sum function focusing on positive local mechanical effects.
SumNeg(): A counterpart that emphasizes negative local mechanical effects.
MinMax(): A function that implements the min method for negative values and the max method for positive ones.

ICI Wizard Enhancement

A Condensed Display option has been introduced. This feature creates a network where the local effects are snapped to their parent and the combination nodes to their respective children.

The Expert Editor has been rebranded as the SMEs & BEKEE Session Manager.
Subject Matter Experts (SMEs) can now be identified with specific colors for better differentiation.
There's an option to decide whether to send out invitation emails to the SMEs.
In terms of qualitative knowledge elicitation, specifically the qualitative segment of the Delphi Method, you can now utilize the Assessment Editor to produce Notes directly on the Graph Panel, derived from the comments provided by experts.

When eliciting a node, its current distribution can be dispatched as a prior to all experts in BEKEE, serving as an alternative to the default uniform distribution.
Node Contextual Menu:
- Generate from Assessments: this function facilitates the creation of distributions based on the weighted votes of chosen experts.
- Generate Assessments: This feature uses the node's current probability distribution to create an assessment associated with a selected expert. When Prior Weights are linked to the node, there's an option to use these weights to determine the expert's confidence level in the assessments.
- Delete Zero-Confidence Assessments: this option removes all assessments in which the expert's confidence level is set to 0.
- Delete Assessments: his feature deletes the assessments linked to the chosen experts.
Hellinger Distance: Measures the distance between experts' votes and a reference expert (usually the consensus).
2D/3D Mapping incorporates new metrics derived from experts' assessments.

Formulas

The Formulas tab in the Node Editor now supports local variables.

Additionally, new functions have been introduced, with some of the most notable being:

TriangularMD(v1, x), i.e., triangular membership degree in fuzzy logic (under Special Functions)

Deciban(x): The deciban is a logarithmic unit — much like the decibel or the Richter scale — introduced by Alan Turing for expressing probabilities. It is a tenth of a ban, which is also known as the base-10 log odds (under Arithmetic Functions)
Hellinger(v1, v2): The Hellinger distance is a measure of the similarity between two probability distributions (under Inference Functions)
NoisySum(s, leak, v1, w1, vn, wn): Used for representing situations where the variable s is the weighted (wi) sum of its parents (vi) plus an additional noise term (leak) to model uncertainty or random fluctuations
DualNoisyOr(s, leak, c1, p1, cn, pn): This function implements a modified Noisy-Or model that operates based on the combined effect of all pi values. The parameters ci represent conditions or boolean variables, while pi are their associated effects (positive or negative). When the aggregated sum of pi values is positive, the function executes a Noisy-Or with an overall effect equal to this sum, effectively determining the probability of the True state. Conversely, when the sum is negative, the function applies the Noisy-Or logic to the False state, adjusting the likelihood of the outcome being False according to this negative sum
SingleMode(v): A function designed to ascertain whether the distribution of variable v is unimodal (under Inference Functions).

Weight of Evidence

Weight of Evidence now features four new types of analyses:

Most/Least Relevant Explanations
Most/Least Confirmatory Clues

Structural Learning Algorithms

Evidence Scenario Files

The data associated with the network can now be exported into an evidence scenario file.
Scenarios are now editable, allowing adjustments to the index, weight, and comments.
A new Evidence Scenario Report is now accessible, offering a detailed description of the scenarios' content.

Target Evaluation Tool

The redesigned Target Evaluation function now features dedicated tabs for:

Classification
Posterior Probabilities
Regression
Triage

Graph Layout, Rendering and Edition

Dynamic Grid Layout: This innovative layout algorithm, particularly suitable for creating readable graphs featuring badges with associated comments, excels in handling graphs created with Hellixia.

View Menu: four new functions have been introduced to optimize the display of graphs. Users can now shrink or stretch graphs both vertically and horizontally, offering enhanced visualization flexibility.

Position Menu: this new item has been introduced to enable the adjustment of the graphical layers of Nodes and Notes. It's available via their contextual menus.

Horizontal and Vertical Stacking: These new alignment tools enable the positioning of the selected nodes horizontally or vertically, aligning them automatically closely without extra space.

Highlight a Class: Accessible from the Note Contextual Menu, this feature lets you select a Class and then automatically adjusts the size and position of the note to encompass all nodes belonging to that class.

Arc Editor: Accessible by double-clicking an arc, this feature enables you to edit the text associated with the arc as well as its rendering properties.
Moving Arc Comments: You can now reposition comments along their corresponding arcs.

Color Linked: This new feature, added to the Rendering Properties of Badges, Monitors, Bars, and Gauges, automatically applies the node's associated color to the Name Background Color. Additionally, it also automatically selects white for the Name Color on dark backgrounds and black on lighter ones.
By pressing 'Z', a selection zone can be initiated, regardless of whether an object on the graph is clicked.
Numerical Evidence Entry for Gauges and Bars: A new approach is introduced for inputting numerical evidence through shift-clicking on a node. Utilize the 'M' and 'B' icons to select the Distribution Estimation Method (MinXEnt and Binary, respectively), with the three icon colors representing the Observation Type: No Fixing, Fix Mean, and Fix Probabilities, respectively.

Pseudo Root-Nodes: If a node exclusively has Function Nodes as parents, making it a root node of its subnetwork, and the parents of these Function Nodes have fixed observed values, then the distribution of these pseudo root-nodes is also automatically set to fixed.

Boolean Conversion: Featured in the Tools menu, this function enables the conversion of selected nodes into boolean nodes.

2D Mapping

The 2D mapping has been enhanced to incorporate an additional dimension for node analysis: Font Size, supplementing the existing Node Size and Color dimensions. This enables font sizes to be proportional to the selected metric.

The Node Analysis section has been enriched with the addition of numerous metrics, providing a more comprehensive analysis capability:
- Mutual Information with Target Node
- Mutual Information with Target State
- Bayes Factor
- Normalized Bayes Factor
- Kullback-Leibler
- Normalized Kullback-Leibler
- Total Effect on Target
- Standardized Total Effect on Target
- Direct Effect on Target
- Standardized Direct Effect on Target
- Number of Assessments
- Assessment Completion Rate
- Maximum Assessment Divergence
- Overall Assessment Divergence
- Missing Value Rate
Comments associated with the nodes are now displayed when you hover over them.
The option Hide Text for Ignored Nodes conceals the names of nodes that are not observable.