To this day, no reliable methods exist for discovering causal relationships from observational data alone. More specifically, given only a statistical association between two variables, it is impossible to establish which variable is the cause and which is the effect.
As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, has always been necessary to determine the causal direction in bivariate relationships.
Given the importance of domain knowledge, the BayesiaLab team has been developing tools for expert knowledge elicitation for many years, such as the Bayesia Expert Knowledge Elicitation Environment (BEKEE).
Thus, the arrival of ChatGPT last year prompted the Bayesia team to immediately leverage the potential of this new type of AI within BayesiaLab.
Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes
In the context of machine learning and natural language processing (NLP), embedding refers to a mathematical representation of a word, phrase, sentence, or any other linguistic unit in a continuous vector space. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.
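To make this concrete, semantic similarity between embeddings is typically measured with cosine similarity. The vectors below are made-up toy values for illustration, not actual model outputs (real embedding models, such as the one behind Hellixia's Embedding Generator, produce vectors with hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional embeddings (illustrative values only)
king = [0.8, 0.65, 0.1, 0.05]
queen = [0.78, 0.7, 0.12, 0.04]
apple = [0.05, 0.1, 0.9, 0.7]

print(cosine_similarity(king, queen))  # close to 1: semantically similar
print(cosine_similarity(king, apple))  # much lower: semantically distant
```

Words with similar meanings end up with nearby vectors, which is what makes embeddings useful for learning semantic networks.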
A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.
In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.
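A minimal sketch of such a structure, using hypothetical concepts and relation labels (none taken from BayesiaLab), represents nodes as strings and labeled links as dictionary entries:

```python
# A tiny semantic network: nodes are concepts, labeled links are relations.
# The concepts and relation labels below are illustrative examples only.
semantic_network = {
    ("Smoking", "Lung Cancer"): "causal",
    ("Lung Cancer", "Disease"): "is-a (hierarchical)",
    ("Smoking", "Nicotine"): "associative",
}

def neighbors(node):
    # All concepts directly linked to `node`, with the relation label
    return [(a if b == node else b, label)
            for (a, b), label in semantic_network.items()
            if node in (a, b)]

print(neighbors("Lung Cancer"))
```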
A typical research workflow with Hellixia consists of the following steps:
BayesiaLab utilizes its structural learning algorithms to find associations between variables.
Then, Hellixia obtains the causal directions for the learned associations and applies them as structural priors to the network.
Finally, with these newly defined structural priors, BayesiaLab relearns the network. The final network now represents statistical knowledge from data plus the causal knowledge obtained from ChatGPT.
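The three steps above can be sketched as follows. The function names, the stubbed association list, and the prior representation are hypothetical stand-ins for BayesiaLab's and Hellixia's actual interfaces, shown only to illustrate the flow of information:

```python
# Hypothetical sketch of the workflow; not BayesiaLab's actual API.

def learn_associations(data):
    # Stand-in for BayesiaLab's structural learning: undirected pairs
    return [("Smoking", "Lung Cancer"), ("Altitude", "Temperature")]

def query_causal_direction(a, b):
    # Stand-in for Hellixia querying an LLM; returns (cause, effect)
    oracle = {("Smoking", "Lung Cancer"): ("Smoking", "Lung Cancer"),
              ("Altitude", "Temperature"): ("Altitude", "Temperature")}
    return oracle[(a, b)]

def build_structural_priors(data):
    priors = []
    for a, b in learn_associations(data):
        cause, effect = query_causal_direction(a, b)
        priors.append({"arc": (cause, effect), "type": "structural_prior"})
    # BayesiaLab would then relearn the network subject to these priors
    return priors

print(build_structural_priors(data=None))
```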
The Tübingen Cause-Effect Pairs collection is a well-known benchmark dataset for assessing the performance of causal discovery methods. When tested against this dataset, Hellixia achieves 98% accuracy. The only errors are related to financial relationships for which Hellixia could not retrieve any causal relationships from ChatGPT.
The feature highlight of BayesiaLab 11 is the integration of Hellixia, a subject matter assistant that leverages ChatGPT for structural knowledge elicitation.
In this presentation from the 2023 BayesiaLab Spring Conference, we show how the new Hellixia functions integrate GPT-4 directly into BayesiaLab, including:
Chat Completion
Image Generation
Embedding Generation
As a result, the new Hellixia subject matter assistant can improve research workflows in several ways:
Accelerate the qualitative part of knowledge elicitation.
Generate practical natural language descriptions for latent factors created through BayesiaLab's clustering functions.
Automatically create images to illustrate nodes in a network.
Learn about the latest innovations in BayesiaLab 11
Version 11 of BayesiaLab is the latest iteration of our flagship product that has been under continuous development for nearly 25 years. No other organization has invested as many resources in developing technologies around the Bayesian network paradigm.
Release 11 once again features many innovations, including the native integration of an LLM-based subject matter assistant (OpenAI, OpenAI GPT Assistants, Azure, Mistral, ...).
Here is a selection of the most important new features:
Hellixia is the name of BayesiaLab's subject matter assistant based on Large Language Models. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Dimension Elicitor: Identify relevant dimensions of a problem domain by using a large set of keywords and create the corresponding nodes.
Comment Generator: Utilize a comprehensive set of keywords to pinpoint relevant dimensions within a problem domain and add them as comments to the nodes.
Embedding Generator: This tool creates embeddings encapsulating node semantics, featuring vectors of 1,536 dimensions, enabling the learning of semantic networks.
Class Description Generator: Generate descriptive summaries for sets of nodes, to be used, for instance, as names for latent variables.
Semantic Variable Clustering: Create clusters of nodes based on their semantics.
Pairwise Causal Link: This function evaluates the causal relationship between two nodes, adding an arc if a link exists. It also quantifies the causal effect (ranging from -100 to 100) and creates or updates the conditional probability table accordingly.
Causal Structural Priors: This tool assesses the causal relationship between two nodes and creates a Structural Prior if a relationship exists. The value of the prior reflects the confidence level in the relationship's existence.
Causal Arc Explainer: This tool examines the causal relationship between two nodes, providing a detailed description of the causal mechanism when a relationship is identified. Additionally, it quantifies the causal effect, with values ranging from -100 to 100.
Causal Network Generator: This tool develops a Causal Bayesian Network focused on the chosen node. It generates new nodes, adds detailed comments for each causal link explaining the mechanism, determines causal effects (with values between -100 and 100), and constructs the conditional probability tables.
Causal Relationships Finder: This tool, akin to the Causal Network Generator, is designed to build a causal network using a predefined set of nodes instead of centering around a single node and generating new nodes.
Image Generator: This feature produces icons that visually represent the information linked to the nodes.
Translator: This function translates various network elements — including names of nodes, states, and comments on nodes and arcs — into the chosen language.
Report Analyzer: This tool processes the output from the Relationship Analysis Report, such as arc and node forces, and creates an HTML report that details the key dynamics of the domain represented by the network.
The Independence of Causal Influence (ICI) tool has been enhanced with several updates:
SumPos(): An asymmetrical variation of the Sum function focusing on positive local mechanical effects.
SumNeg(): A counterpart that emphasizes negative local mechanical effects.
MinMax(): A function that implements the min method for negative values and the max method for positive ones.
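Assuming each function combines a list of signed local effects, the three variants might behave as in this sketch. The exact BayesiaLab semantics may differ; in particular, combining the min and max results by addition in MinMax is an assumption of this sketch:

```python
def sum_pos(effects):
    # Asymmetrical Sum: only the positive local effects contribute
    return sum(e for e in effects if e > 0)

def sum_neg(effects):
    # Counterpart: only the negative local effects contribute
    return sum(e for e in effects if e < 0)

def min_max(effects):
    # min over the negative effects, max over the positive ones;
    # combining the two results by addition is an assumption of this sketch
    neg = min((e for e in effects if e < 0), default=0)
    pos = max((e for e in effects if e > 0), default=0)
    return neg + pos

effects = [-0.4, 0.2, 0.7, -0.1]
print(sum_pos(effects), sum_neg(effects), min_max(effects))
```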
A Condensed Display option has been introduced. This feature creates a network where the local effects are snapped to their parent and the combination nodes to their respective children.
The Expert Editor has been rebranded as the SMEs & BEKEE Session Manager.
Subject Matter Experts (SMEs) can now be identified with specific colors for better differentiation.
There's an option to decide whether to send out invitation emails to the SMEs.
In terms of qualitative knowledge elicitation, specifically the qualitative segment of the Delphi Method, you can now utilize the Assessment Editor to produce Notes directly on the Graph Panel, derived from the comments provided by experts.
When eliciting a node, its current distribution can be dispatched as a prior to all experts in BEKEE, serving as an alternative to the default uniform distribution.
Node Contextual Menu:
Generate from Assessments: This function facilitates the creation of distributions based on the weighted votes of chosen experts.
Generate Assessments: This feature uses the node's current probability distribution to create an assessment associated with a selected expert. When Prior Weights are linked to the node, there's an option to use these weights to determine the expert's confidence level in the assessments.
Delete Zero-Confidence Assessments: This option removes all assessments in which the expert's confidence level is set to 0.
Delete Assessments: This feature deletes the assessments linked to the chosen experts.
Hellinger Distance: Measures the distance between experts' votes and a reference expert (usually the consensus).
2D/3D Mapping incorporates new metrics derived from experts' assessments.
The Formulas tab in the Node Editor now supports local variables.
Additionally, new functions have been introduced, with some of the most notable being:
TriangularMD(v1, x): The triangular membership degree in fuzzy logic (under Special Functions).
Deciban(x): The deciban is a logarithmic unit — much like the decibel or the Richter scale — introduced by Alan Turing for expressing probabilities. It is a tenth of a ban, which is also known as the base-10 log odds (under Arithmetic Functions).
Hellinger(v1, v2): The Hellinger distance is a measure of the similarity between two probability distributions (under Inference Functions).
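Both quantities have standard textbook definitions, sketched below. Assuming the deciban function takes a probability as its argument, it is ten times the base-10 logarithm of the odds; the Hellinger distance between two discrete distributions ranges from 0 (identical) to 1 (disjoint support):

```python
import math

def deciban(p):
    # Deciban of a probability p: ten times the base-10 log odds
    return 10 * math.log10(p / (1 - p))

def hellinger(p, q):
    # Hellinger distance between two discrete probability distributions
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2)

print(deciban(0.5))                        # 0.0: even odds carry no evidence
print(hellinger([0.5, 0.5], [0.5, 0.5]))   # 0.0: identical distributions
print(hellinger([1.0, 0.0], [0.0, 1.0]))   # 1.0: maximally different
```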
NoisySum(s, leak, v1, w1, vn, wn): Used for representing situations where the variable s is the weighted (wi) sum of its parents (vi), plus an additional noise term (leak) to model uncertainty or random fluctuations.
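Ignoring the stochastic part, the deterministic core of such a model is just a weighted sum. Treating the leak as a constant additive offset is an assumption of this sketch; in the full model it would contribute noise rather than a fixed value:

```python
def noisy_sum_mean(leak, values, weights):
    # Expected value of s: weighted sum of parent values plus the leak term.
    # Treating the leak as an additive constant is an assumption of this
    # sketch; in a full model it would be a random noise contribution.
    return leak + sum(w * v for v, w in zip(values, weights))

print(noisy_sum_mean(1, values=[2, 3], weights=[1, 2]))  # 1 + 2 + 6 = 9
```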
DualNoisyOr(s, leak, c1, p1, cn, pn): This function implements a modified Noisy-Or model that operates based on the combined effect of all pi values. The parameters ci represent conditions or boolean variables, while pi are their associated effects (positive or negative). When the aggregated sum of pi values is positive, the function executes a Noisy-Or with an overall effect equal to this sum, effectively determining the probability of the True state. Conversely, when the sum is negative, the function applies the Noisy-Or logic to the False state, adjusting the likelihood of the outcome being False according to this negative sum.
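One plausible reading of this dispatch logic can be sketched as follows. The clamping of the aggregated effect into [0, 1] and the exact Noisy-Or formula used here are assumptions of this sketch, not BayesiaLab's documented implementation:

```python
def dual_noisy_or(leak, conditions, effects):
    # Aggregate the effects of the active (True) conditions
    total = sum(p for c, p in zip(conditions, effects) if c)
    overall = min(abs(total), 1.0)  # clamp the magnitude into [0, 1]
    # Noisy-Or with a single overall effect plus the leak term
    p = 1 - (1 - leak) * (1 - overall)
    if total >= 0:
        return p       # net positive evidence: probability of the True state
    return 1 - p       # net negative evidence flows to the False state

# Two active conditions pushing in opposite directions, net positive:
print(dual_noisy_or(leak=0.1, conditions=[True, True], effects=[0.6, -0.2]))
```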
SingleMode(v): A function designed to ascertain whether the distribution of variable v is unimodal (under Inference Functions).
Weight of Evidence now features four new types of analyses:
Most/Least Relevant Explanations
Most/Least Confirmatory Clues
The EQ-based learning algorithms are now disabled in scenarios where the score of an arc is not equivalent in both directions. This can occur due to filtered states, constraints, structural priors, etc. The assumption of equivalence is no longer theoretically valid in such contexts and could result in invalid networks with cycles.
The data associated with the network can now be exported into an evidence scenario file.
Scenarios are now editable, allowing adjustments to the index, weight, and comments.
A new Evidence Scenario Report is now accessible, offering a detailed description of the scenarios' content.
The redesigned Target Evaluation function now features dedicated tabs for:
Classification
Posterior Probabilities
Regression
Triage
Dynamic Grid Layout: This innovative layout algorithm, particularly suitable for creating readable graphs featuring badges with associated comments, excels in handling graphs created with Hellixia.
View Menu: Four new functions have been introduced to optimize the display of graphs. Users can now shrink or stretch graphs both vertically and horizontally, offering enhanced visualization flexibility.
Position Menu: This new item has been introduced to enable the adjustment of the graphical layers of Nodes and Notes. It's available via their contextual menus.
Horizontal and Vertical Stacking: These new alignment tools position the selected nodes horizontally or vertically, automatically aligning them closely with no extra space between them.
Highlight a Class: Accessible from the Note Contextual Menu, this feature lets you select a Class and then automatically adjusts the size and position of the note to encompass all nodes belonging to that class.
Arc Editor: Accessible by double-clicking an arc, this feature enables you to edit the text associated with the arc as well as its rendering properties.
Moving Arc Comments: You can now reposition comments along their corresponding arcs.
Color Linked: This new feature, added to the Rendering Properties of Badges, Monitors, Bars, and Gauges, automatically applies the node's associated color to the Name Background Color. Additionally, it also automatically selects white for the Name Color on dark backgrounds and black on lighter ones.
By pressing 'Z', a selection zone can be initiated, regardless of whether an object on the graph is clicked.
Numerical Evidence Entry for Gauges and Bars: A new approach is introduced for inputting numerical evidence through shift-clicking on a node. Utilize the 'M' and 'B' icons to select the Distribution Estimation Method (MinXEnt and Binary, respectively), with the three icon colors representing the Observation Type: No Fixing, Fix Mean, and Fix Probabilities, respectively.
Pseudo Root-Nodes: If a node exclusively has Function Nodes as parents, making it a root node of its subnetwork, and the parents of these Function Nodes have fixed observed values, then the distribution of these pseudo root-nodes is also automatically set to fixed.
Boolean Conversion: Featured in the Tools menu, this function enables the conversion of selected nodes into boolean nodes.
The 2D mapping has been enhanced to incorporate an additional dimension for node analysis: Font Size, supplementing the existing Node Size and Color dimensions. This enables font sizes to be proportional to the selected metric.
The Node Analysis section has been enriched with the addition of numerous metrics, providing a more comprehensive analysis capability:
Mutual Information with Target Node
Mutual Information with Target State
Bayes Factor
Normalized Bayes Factor
Kullback-Leibler
Normalized Kullback-Leibler
Total Effect on Target
Standardized Total Effect on Target
Direct Effect on Target
Standardized Direct Effect on Target
Number of Assessments
Assessment Completion Rate
Maximum Assessment Divergence
Overall Assessment Divergence
Missing Value Rate
Comments associated with the nodes are now displayed when you hover over them.
The option Hide Text for Ignored Nodes conceals the names of nodes that are not observable.