To this day, no reliable methods exist to find causal relationships in data alone. More specifically, given a statistical association between two variables, it is impossible to establish from the data which variable is the cause and which is the effect.
As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, has always been necessary to determine the causal direction in bivariate relationships.
Given the importance of domain knowledge, the BayesiaLab team has been developing tools for expert knowledge elicitation for many years, such as the Bayesia Expert Knowledge Elicitation Environment (BEKEE).
Thus, the arrival of ChatGPT last year prompted the Bayesia team to immediately leverage the potential of this new type of AI within BayesiaLab.
Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes
In the context of machine learning and natural language processing (NLP), embedding refers to a mathematical representation of a word, phrase, sentence, or any other linguistic unit in a continuous vector space. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.
A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.
In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.
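To make the embedding idea concrete, here is a minimal sketch of how embeddings could be used to propose links for a semantic network. It assumes the `openai` Python package (version 1.x), a valid API key, and an illustrative similarity threshold; none of this is Hellixia's actual implementation.

```python
# Sketch: propose semantic-network links from embedding similarity.
# Assumptions: openai>=1.0 installed, OPENAI_API_KEY set, threshold is arbitrary.
from itertools import combinations
import math

from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Return one embedding vector per input text (1,536 dimensions for ada-002)."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

concepts = ["Customer Satisfaction", "Brand Loyalty", "Price Sensitivity"]
vectors = dict(zip(concepts, embed(concepts)))

# Link concept pairs whose embeddings are sufficiently similar (0.8 is illustrative).
links = [(a, b) for a, b in combinations(concepts, 2)
         if cosine(vectors[a], vectors[b]) > 0.8]
print(links)
```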
A typical research workflow with Hellixia consists of the following steps:
BayesiaLab utilizes its structural learning algorithms to find associations between variables.
Then, Hellixia obtains the causal directions for the learned associations and applies them as structural priors to the network (a sketch of this step follows below).
Finally, with these newly defined structural priors, BayesiaLab relearns the network. The final network now represents statistical knowledge from data plus the causal knowledge obtained from ChatGPT.
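As an illustration of the second step, the sketch below shows one way to ask a chat model for the causal direction between two variables. The prompt wording, the answer format, and the model choice are assumptions made for this example, not Hellixia's actual prompt or parser.

```python
# Sketch: query a chat model for the causal direction between two variables.
# Assumptions: openai>=1.0 installed, OPENAI_API_KEY set; prompt/format invented.
from openai import OpenAI

client = OpenAI()

def causal_direction(var_a: str, var_b: str) -> str:
    prompt = (
        f"Variables: '{var_a}' and '{var_b}'. "
        "Answer with exactly one token: 'A->B' if the first causes the second, "
        "'B->A' if the second causes the first, or 'none' if no causal link exists."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(causal_direction("Altitude", "Temperature"))  # expected: 'A->B'
```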
The Tübingen Cause-Effect Pairs is a well-known dataset for assessing the performance of causal discovery methods. When tested against this dataset, Hellixia achieves 98% accuracy. The only errors are related to financial relationships for which Hellixia could not retrieve any causal relationships from ChatGPT.
The feature highlight of BayesiaLab 11 is the integration of Hellixia, a subject matter assistant that leverages ChatGPT for structural knowledge elicitation.
In this presentation from the 2023 BayesiaLab Spring Conference, we show how the new Hellixia functions integrate GPT-4 directly into BayesiaLab, including:
Chat Completion
Image Generation
Embedding Generation
As a result, the new Hellixia subject matter assistant can improve research workflows in several ways:
Accelerate the qualitative part of knowledge elicitation.
Generate practical natural language descriptions for latent factors created through BayesiaLab's clustering functions.
Automatically create images to illustrate nodes in a network.
An All-New Website for BayesiaLab 11
With the release of BayesiaLab 11, we are also transitioning to an entirely new website. If you can't find the content you are looking for on this new site, please check the Legacy Edition of the BayesiaLab Knowledge Hub.
Learn about the latest innovations in BayesiaLab 11
Version 11 of BayesiaLab is the latest iteration of our flagship product that has been under continuous development for nearly 25 years. No other organization has invested as many resources in developing technologies around the Bayesian network paradigm.
Release 11 once again features many innovations, including the native integration of an LLM-based subject matter assistant (OpenAI, OpenAI GPT Assistants, Azure, Mistral, ...).
Here is a selection of the most important new features:
Hellixia is the name of BayesiaLab's subject matter assistant based on Large Language Models. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Dimension Elicitor: Identify relevant dimensions of a problem domain by using a large set of keywords and create the corresponding nodes.
Comment Generator: Utilize a comprehensive set of keywords to pinpoint relevant dimensions within a problem domain and add them as comments to the nodes.
Embedding Generator: This tool creates embeddings encapsulating node semantics, featuring vectors of 1,536 dimensions, enabling the learning of semantic networks.
Class Description Generator: Generate descriptive summaries for sets of nodes, to be used, for instance, as names for latent variables.
Semantic Variable Clustering: Create clusters of nodes based on their semantics.
Pairwise Causal Link: This function evaluates the causal relationship between two nodes, adding an arc if a link exists. It also quantifies the causal effect (ranging from -100 to 100) and creates or updates the conditional probability table accordingly.
Causal Structural Priors: This tool assesses the causal relationship between two nodes and creates a Structural Prior if a relationship exists. The value of the prior reflects the confidence level in the relationship's existence.
Causal Arc Explainer: This tool examines the causal relationship between two nodes, providing a detailed description of the causal mechanism when a relationship is identified. Additionally, it quantifies the causal effect, with values ranging from -100 to 100.
Causal Network Generator: This tool develops a Causal Bayesian Network focused on the chosen node. It generates new nodes, adds detailed comments for each causal link explaining the mechanism, determines causal effects (with values between -100 and 100), and constructs the conditional probability tables.
Causal Relationships Finder: This tool, akin to the Causal Network Generator, is designed to build a causal network using a predefined set of nodes instead of centering around a single node and generating new nodes.
Image Generator: This feature produces icons that visually represent the information linked to the nodes.
Translator: This function translates various network elements — including names of nodes, states, and comments on nodes and arcs — into the chosen language.
Report Analyzer: This tool processes the output from the Relationship Analysis Report, such as arc and node forces, and creates an HTML report that details the key dynamics of the domain represented by the network.
The Independence of Causal Influence (ICI) tool has been enhanced with several updates:
SumPos(): An asymmetrical variation of the Sum function focusing on positive local mechanical effects.
SumNeg(): A counterpart that emphasizes negative local mechanical effects.
MinMax(): A function that implements the min method for negative values and the max method for positive ones.
A Condensed Display option has been introduced. This feature creates a network where the local effects are snapped to their parent and the combination nodes to their respective children.
The Expert Editor has been rebranded as the SMEs & BEKEE Session Manager.
Subject Matter Experts (SMEs) can now be identified with specific colors for better differentiation.
There's an option to decide whether to send out invitation emails to the SMEs.
In terms of qualitative knowledge elicitation, specifically the qualitative segment of the Delphi Method, you can now utilize the Assessment Editor to produce Notes directly on the Graph Panel, derived from the comments provided by experts.
When eliciting a node, its current distribution can be dispatched as a prior to all experts in BEKEE, serving as an alternative to the default uniform distribution.
Node Contextual Menu:
Generate from Assessments: This function facilitates the creation of distributions based on the weighted votes of chosen experts.
Generate Assessments: This feature uses the node's current probability distribution to create an assessment associated with a selected expert. When Prior Weights are linked to the node, there's an option to use these weights to determine the expert's confidence level in the assessments.
Delete Zero-Confidence Assessments: This option removes all assessments in which the expert's confidence level is set to 0.
Delete Assessments: This feature deletes the assessments linked to the chosen experts.
Hellinger Distance: Measures the distance between experts' votes and a reference expert (usually the consensus).
2D/3D Mapping incorporates new metrics derived from experts' assessments.
The Formulas tab in the Node Editor now supports local variables.
Additionally, new functions have been introduced, with some of the most notable being:
TriangularMD(v1, x): The triangular membership degree in fuzzy logic (under Special Functions).
Deciban(x): The deciban is a logarithmic unit, much like the decibel or the Richter scale, introduced by Alan Turing for expressing probabilities. It is a tenth of a ban, which is also known as the base-10 log odds (under Arithmetic Functions).
Hellinger(v1, v2): The Hellinger distance is a measure of the similarity between two probability distributions (under Inference Functions).
NoisySum(s, leak, v1, w1, vn, wn): Used for representing situations where the variable s is the weighted (wi) sum of its parents (vi), plus an additional noise term (leak) to model uncertainty or random fluctuations.
DualNoisyOr(s, leak, c1, p1, cn, pn): This function implements a modified Noisy-Or model that operates based on the combined effect of all pi values. The parameters ci represent conditions or boolean variables, while pi are their associated effects (positive or negative). When the aggregated sum of pi values is positive, the function executes a Noisy-Or with an overall effect equal to this sum, effectively determining the probability of the True state. Conversely, when the sum is negative, the function applies the Noisy-Or logic to the False state, adjusting the likelihood of the outcome being False according to this negative sum. A sketch of the underlying Noisy-Or combination follows after this list.
SingleMode(v): A function designed to ascertain whether the distribution of variable v is unimodal (under Inference Functions).
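For readers unfamiliar with the Noisy-Or family, the following sketch implements the classic leaky Noisy-Or combination that these functions build upon. It is an illustrative reimplementation, not BayesiaLab's internal code.

```python
# Sketch: classic leaky Noisy-Or. Parameter names mirror the signatures above,
# but this is an illustration, not BayesiaLab's implementation.

def noisy_or(leak: float, conditions: list[bool], effects: list[float]) -> float:
    """P(child = True) given boolean parent states and per-parent effects."""
    p_false = 1.0 - leak  # probability that no cause (not even the leak) fires
    for active, p in zip(conditions, effects):
        if active:
            p_false *= 1.0 - p
    return 1.0 - p_false

# Two active causes with effects 0.7 and 0.5, plus a 10% leak:
print(noisy_or(0.1, [True, True], [0.7, 0.5]))  # 1 - 0.9*0.3*0.5 = 0.865
```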
Weight of Evidence now features four new types of analyses:
Most/Least Relevant Explanations
Most/Least Confirmatory Clues
The EQ-based learning algorithms are now disabled in scenarios where the score of an arc is not equivalent in both directions. This can occur due to filtered states, constraints, structural priors, etc. The assumption of equivalence is no longer theoretically valid in such contexts and could result in invalid networks with cycles.
The data associated with the network can now be exported into an evidence scenario file.
Scenarios are now editable, allowing adjustments to the index, weight, and comments.
A new Evidence Scenario Report is now accessible, offering a detailed description of the scenarios' content.
The redesigned Target Evaluation function now features dedicated tabs for:
Classification
Posterior Probabilities
Regression
Triage
Dynamic Grid Layout: This innovative layout algorithm, particularly suitable for creating readable graphs featuring badges with associated comments, excels in handling graphs created with Hellixia.
View Menu: four new functions have been introduced to optimize the display of graphs. Users can now shrink or stretch graphs both vertically and horizontally, offering enhanced visualization flexibility.
Position Menu: this new item has been introduced to enable the adjustment of the graphical layers of Nodes and Notes. It's available via their contextual menus.
Horizontal and Vertical Stacking: These new alignment tools enable the positioning of the selected nodes horizontally or vertically, aligning them automatically closely without extra space.
Highlight a Class: Accessible from the Note Contextual Menu, this feature lets you select a Class and then automatically adjusts the size and position of the note to encompass all nodes belonging to that class.
Arc Editor: Accessible by double-clicking an arc, this feature enables you to edit the text associated with the arc as well as its rendering properties.
Moving Arc Comments: You can now reposition comments along their corresponding arcs.
Color Linked: This new feature, added to the Rendering Properties of Badges, Monitors, Bars, and Gauges, automatically applies the node's associated color to the Name Background Color. Additionally, it also automatically selects white for the Name Color on dark backgrounds and black on lighter ones.
By pressing 'Z', a selection zone can be initiated, regardless of whether an object on the graph is clicked.
Numerical Evidence Entry for Gauges and Bars: A new approach is introduced for inputting numerical evidence through shift-clicking on a node. Utilize the 'M' and 'B' icons to select the Distribution Estimation Method (MinXEnt and Binary, respectively), with the three icon colors representing the Observation Type: No Fixing, Fix Mean, and Fix Probabilities, respectively.
Pseudo Root-Nodes: If a node exclusively has Function Nodes as parents, making it a root node of its subnetwork, and the parents of these Function Nodes have fixed observed values, then the distribution of these pseudo root-nodes is also automatically set to fixed.
Boolean Conversion: Featured in the Tools menu, this function enables the conversion of selected nodes into boolean nodes.
The 2D mapping has been enhanced to incorporate an additional dimension for node analysis: Font Size, supplementing the existing Node Size and Color dimensions. This enables font sizes to be proportional to the selected metric.
The Node Analysis section has been enriched with the addition of numerous metrics, providing a more comprehensive analysis capability:
Mutual Information with Target Node
Mutual Information with Target State
Bayes Factor
Normalized Bayes Factor
Kullback-Leibler
Normalized Kullback-Leibler
Total Effect on Target
Standardized Total Effect on Target
Direct Effect on Target
Standardized Direct Effect on Target
Number of Assessments
Assessment Completion Rate
Maximum Assessment Divergence
Overall Assessment Divergence
Missing Value Rate
Comments associated with the nodes are now displayed when you hover over them.
The option Hide Text for Ignored Nodes conceals the names of nodes that are not observable.
BayesiaLab features a comprehensive array of highly optimized algorithms to efficiently learn Bayesian networks from data (structure and parameters).
The optimization criteria in BayesiaLab’s learning algorithms are mostly based on information theory (e.g., the Minimum Description Length).
With that, no assumptions regarding the variable distributions are made. These algorithms can be used for all kinds and all sizes of problem domains, sometimes including thousands of variables with millions of potentially relevant relationships.
In statistics, “unsupervised learning” is typically understood to be a classification or clustering task. To make a clear distinction, we emphasize “structural” in “Unsupervised Structural Learning,” which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between many variables without having to specify input or output nodes. One might say that this is a quintessential form of knowledge discovery, as no assumptions are required to perform these algorithms on unknown datasets.
See Also
Webinar: Analyzing Capital Flows of Exchange-Traded Funds
Supervised Learning in BayesiaLab has the same objective as many traditional modeling methods, i.e., to develop a model for predicting a target variable.
Note that numerous statistical packages also offer “Bayesian Networks” as a predictive modeling technique. However, in most cases, these packages are restricted in their capabilities to one type of network, i.e., the Naive Bayes network.
BayesiaLab offers a much greater number of Supervised Learning algorithms to search for the Bayesian network that best predicts the target variable while also considering the complexity of the resulting network.
We should highlight the set of Markov Blanket algorithms for their speed, which is particularly helpful when dealing with many variables. In this context, the Markov Blanket algorithm can be an efficient variable selection algorithm.
See Examples & Learn More
Markov Blanket Learning Algorithms (9.0)
Chapter 6: Supervised Learning
Webinar: Diagnostic Decision Support
Clustering in BayesiaLab covers both Data Clustering and Variable Clustering.
Data Clustering applies to creating a Latent Variable whose states represent groups of observations (records) that share some characteristics.
Variable Clustering groups variables according to the strength of their relationships.
Multiple Clustering is one of the steps of BayesiaLab's Probabilistic Structural Equation Model (PSEM) workflow. It consists of iteratively using Data Clustering on subsets of data defined by Variable Clustering to create Latent Variables that represent the hidden causes that have been sensed by Manifest Variables. This can be considered as a kind of nonlinear, nonparametric, and nonorthogonal factor analysis.
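To give a flavor of the variable-grouping idea (and only that), here is a generic sketch that clusters variables by the strength of their pairwise relationships using hierarchical clustering on a correlation-based distance. BayesiaLab's Variable Clustering is its own information-theoretic algorithm, so treat this as an assumption-laden analogy.

```python
# Sketch: group variables by relationship strength (correlation-based analogy).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 6))
data[:, 1] += data[:, 0]          # make variables 0 and 1 related
data[:, 4] += data[:, 3]          # and variables 3 and 4

corr = np.corrcoef(data, rowvar=False)
distance = 1 - np.abs(corr)       # strong relationship -> small distance

# Condensed (upper-triangular) distances, then cut the tree into 3 clusters.
condensed = distance[np.triu_indices(6, 1)]
clusters = fcluster(linkage(condensed, method="average"), t=3, criterion="maxclust")
print(clusters)                   # related variables tend to share a cluster label
```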
See Examples & Learn More
Data Clustering (7.0)
Variable Clustering (7.0)
Multiple Clustering (9.0)
Chapter 8: Probabilistic Structural Equation Models
Webinar: Factor Analysis Reinvented — Probabilistic Latent Factor Induction
BayesiaLab is a powerful desktop application (Windows/Mac/Unix) that provides scientists with a comprehensive “laboratory” for machine learning, knowledge modeling, probabilistic reasoning (incl. diagnosis and simulation), causal inference, and optimization.
BayesiaLab utilizes the Bayesian network framework for gaining deep insights into problem domains and reasoning about them.
BayesiaLab is the result of more than twenty years of research by Dr. Lionel Jouffe and Dr. Paul Munteanu and their team of computer scientists. Their company, Bayesia S.A.S., is headquartered in Laval in northwestern France, with affiliates in the U.S. and Singapore.
Today, Bayesia S.A.S. is the world’s leading supplier of Bayesian network software, serving hundreds of major corporations and research organizations around the world.
Learn about the innovations implemented in the latest version of BayesiaLab here: What's New?
Executive Summary This executive summary in PDF format explains on two pages how BayesiaLab can support you in your research and decision-making workflows. Pass it along to anyone in your organization who needs to know — in non-technical terms — what BayesiaLab can do.
Subject matter experts often express their causal understanding of a domain in the form of diagrams in which arrows indicate causal directions.
This visual representation of causes and effects has a direct analog in the network graph in BayesiaLab.
Nodes (representing variables) can be added and positioned on BayesiaLab’s Graph Panel with a mouse click, and arcs (representing relationships) can be “drawn” between nodes.
The causal direction can be encoded by orienting the arcs from cause to effect.
The quantitative nature of relationships between variables, plus many other attributes, can be managed in BayesiaLab’s Node Editor.
In this way, BayesiaLab facilitates the straightforward encoding of one’s understanding of a domain.
Simultaneously, BayesiaLab enforces internal consistency so that impossible conditions cannot be encoded accidentally.
See Examples & Learn More
Webinar: Optimizing Health Policies
In addition to directly encoding explicit knowledge in BayesiaLab, the Bayesia Expert Knowledge Elicitation Environment (BEKEE) is available to acquire the probabilities of a network from a group of experts.
The Bayesia Expert Knowledge Elicitation Environment (BEKEE) is a web service that allows you to systematically elicit both explicit and tacit knowledge from multiple expert stakeholders.
BayesiaLab contains all “parameters” describing probabilistic relationships between variables in Conditional Probability Tables (CPT), meaning no functional forms are utilized.
Given this nonparametric, discrete approach, BayesiaLab can conveniently handle nonlinear relationships between variables. However, this CPT-based representation requires a preparation step for dealing with continuous variables, namely discretization. This consists of manually or automatically defining a discrete representation of all continuous values.
BayesiaLab offers several tools for discretization, which are accessible in the Data Import Wizard, in the Node Editor, and in a standalone Discretization function. Univariate, bivariate, and multivariate discretization algorithms are available in this context.
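As an illustration of the general idea, here is a sketch of univariate equal-frequency (quantile) discretization, one family of approaches mentioned above. BayesiaLab's own discretization algorithms are built into the Data Import Wizard and Node Editor, so this stands only as a generic example.

```python
# Sketch: equal-frequency binning of a continuous variable into discrete states.
import numpy as np

def equal_frequency_bins(values: np.ndarray, n_bins: int) -> np.ndarray:
    """Return bin boundaries so that each bin holds roughly the same count."""
    quantiles = np.linspace(0, 100, n_bins + 1)
    return np.percentile(values, quantiles)

ages = np.random.default_rng(0).normal(45, 15, size=200)
edges = equal_frequency_bins(ages, 4)        # 4 states, as in the Age example
states = np.digitize(ages, edges[1:-1])      # map each value to a state index
print(edges, np.bincount(states))            # roughly 50 observations per state
```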
Generating a Bayesian network, whether through expert knowledge modeling or machine learning, is all about a computer acquiring knowledge.
However, a Bayesian network can also be a remarkably powerful tool for humans to extract or “harvest” knowledge.
Given that a Bayesian network can serve as a high-dimensional representation of a real-world domain, BayesiaLab allows us to interactively — even playfully — engage with this domain to learn about it.
Through visualization, simulation, and analysis functions, plus the graphical nature of the network model itself, BayesiaLab becomes an instructional device that can effectively retrieve and communicate the knowledge contained within the Bayesian network.
As such, BayesiaLab becomes a bridge between artificial intelligence and human intelligence.
BayesiaLab provides a range of functions for systematically utilizing the knowledge contained in a Bayesian network. They make a Bayesian network accessible as a probabilistic expert system that can be queried interactively by an end-user.
The Adaptive Questionnaire function provides guidance regarding the optimum sequence for seeking evidence.
BayesiaLab determines dynamically, given the evidence already gathered, the next best piece of evidence to obtain in order to maximize the information gain with respect to the Target Node while minimizing the cost of acquiring such evidence.
In a medical context, for instance, this would allow for the optimal “escalation” of diagnostic procedures from “low-cost/small-gain” evidence (e.g., measuring the patient’s blood pressure) to “high-cost/large-gain” evidence (e.g., performing an MRI scan).
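Conceptually, the selection criterion can be sketched as a cost-adjusted information-gain score. The `mutual_information` helper below is hypothetical and stands in for whatever an inference engine would provide; this is not BayesiaLab's actual algorithm.

```python
# Sketch: pick the unobserved node with the highest information gain per unit cost.
# `mutual_information(node, target, evidence)` is a hypothetical helper.

def next_best_evidence(candidates, target, evidence, costs, mutual_information):
    def score(node):
        return mutual_information(node, target, evidence) / costs[node]
    return max(candidates, key=score)

# Example: blood pressure is cheap, an MRI is expensive; with these made-up
# information gains, the cheap test is asked for first.
costs = {"BloodPressure": 1.0, "MRI": 50.0}
mi = lambda node, target, evidence: {"BloodPressure": 0.2, "MRI": 0.6}[node]
print(next_best_evidence(["BloodPressure", "MRI"], "Disease", {}, costs, mi))
```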
See Examples & Learn More
The BayesiaLab WebSimulator is a platform for publishing interactive models and Adaptive Questionnaires via the web, which means that any Bayesian network built with BayesiaLab can be shared privately with clients or publicly with a broader audience.
Once a model is published via the WebSimulator, end users can try out scenarios and examine the dynamics of that model.
Batch Inference is available for automatically performing inference on many records in a dataset. For example, Batch Inference can be used to produce a predictive score for all customers in a database.
With the same objective, BayesiaLab’s optional Code Export Module can translate predictive network models into static code that can run in external programs. Modules are available that can generate code for R, SAS, PHP, VBA, Python, and JavaScript.
Developers can also access many of BayesiaLab’s functions—outside the graphical user interface—by using the Bayesia Engine API.
The Bayesia Modeling Engine allows you to construct and edit networks.
The Bayesia Inference Engine can access network models programmatically for performing automated inference, e.g., as part of a real-time application with streaming data.
Finally, the Bayesia Learning Engine gives you programmatic access to BayesiaLab's discretization and learning algorithms.
The Bayesia Engine APIs are implemented as pure Java class libraries (jar files), which can be integrated into any software project.
See Examples & Learn More
The inherent ability of Bayesian networks to explicitly model uncertainty makes them suitable for a broad range of real-world applications.
In the Bayesian network framework, diagnosis, prediction, and simulation are identical computations. They all consist of observational inference conditional upon evidence:
Inference from observed effects to causes: diagnosis or abduction.
Inference from observed causes to effects: simulation or prediction.
This distinction, however, only exists from the perspective of the researcher, who would presumably see the symptom of a disease as the effect and the disease itself as the cause. Hence, carrying out inference based on observed symptoms is interpreted as a “diagnosis.”
One of the central benefits of Bayesian networks is that they represent the Joint Probability Distribution and can therefore carry out inference “omnidirectionally.”
Given an observation with any type of evidence on any of the networks’ nodes (or a subset of nodes), BayesiaLab computes the posterior probabilities of all other nodes in the network, regardless of arc directions.
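A two-node example makes this concrete: even when the arc points from Disease to Symptom, Bayes' theorem lets us compute the posterior of the cause from an observed effect. The numbers below are invented for illustration.

```python
# Sketch: inference "against the arrow" in a Disease -> Symptom network.
p_disease = 0.01                  # P(Disease = True), made-up prior
p_symptom_given_d = 0.90          # P(Symptom | Disease)
p_symptom_given_not_d = 0.05      # P(Symptom | no Disease)

# Predictive direction (cause -> effect): marginal probability of the symptom.
p_symptom = p_symptom_given_d * p_disease + p_symptom_given_not_d * (1 - p_disease)

# Diagnostic direction (effect -> cause): posterior probability of the disease.
p_disease_given_symptom = p_symptom_given_d * p_disease / p_symptom

print(f"P(Symptom) = {p_symptom:.4f}")                          # 0.0585
print(f"P(Disease | Symptom) = {p_disease_given_symptom:.4f}")  # 0.1538
```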
Both exact and approximate observational inference algorithms are implemented in BayesiaLab.
Hard Evidence: no uncertainty regarding the state of the variable (node).
Likelihood/Virtual Evidence: defined by likelihoods associated with each variable state.
Probabilistic/Soft Evidence: defined by marginal probability distributions.
Numerical Evidence: for numerical variables or for categorical/symbolic variables that have associated numerical values.
See Examples & Learn More
Beyond observational inference, BayesiaLab can also perform causal inference for computing the impact of intervening on a subset of variables instead of merely observing these variables.
Pearl’s Graph Surgery and Jouffe’s Likelihood Matching are available for this purpose.
See Examples & Learn More
Many research activities focus on estimating the size of an effect, e.g., to establish the treatment effect of a new drug or to determine the sales boost from a new advertising campaign. Other studies attempt to decompose observed effects into their causes, i.e., they perform attribution.
BayesiaLab performs simulations to compute effects, as parameters as such do not exist in this nonparametric framework.
As all the domain dynamics are encoded in discrete Conditional Probability Tables (CPT), effect sizes only manifest themselves when different conditions are simulated.
Total Effects Analysis, Target Mean Analysis, and several other functions offer ways to study effects, including nonlinear and variable interactions.
BayesiaLab’s ability to perform inference over all possible states of all nodes in a network also provides the basis for searching for node values that optimize a target criterion. BayesiaLab’s Target Optimization is a set of tools for this purpose.
Using these functions in combination with Direct Effects is of particular interest when searching for the optimum combination of variables that have a nonlinear relationship with the target, plus co-relations between them.
A typical example would be searching for the optimum mix of marketing expenditures to maximize sales. BayesiaLab’s Genetic Target Optimization will search, within the specified constraints, for those scenarios that optimize the target criterion.
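As a toy illustration of scenario search under constraints (not BayesiaLab's Genetic Target Optimization), the sketch below brute-forces evidence combinations with a hypothetical `posterior_mean` helper and a budget constraint; all names and numbers are invented for the example.

```python
# Sketch: exhaustively score evidence scenarios against a target criterion.
from itertools import product

def best_scenario(levels_per_node, budget, cost, posterior_mean):
    best, best_score = None, float("-inf")
    for scenario in product(*levels_per_node.values()):
        assignment = dict(zip(levels_per_node.keys(), scenario))
        if sum(cost(n, v) for n, v in assignment.items()) > budget:
            continue  # skip scenarios that exceed the spending constraint
        score = posterior_mean(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Nonlinear toy objective with an interaction term between the two channels:
levels = {"TV": [0, 1, 2], "Digital": [0, 1, 2]}
cost = lambda node, v: 10 * v
posterior_mean = lambda a: 3 * a["TV"] + 4 * a["Digital"] - 0.5 * a["TV"] * a["Digital"]
print(best_scenario(levels, budget=30, cost=cost, posterior_mean=posterior_mean))
```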
With Open, you can select a Bayesian network file via a File Dialog and load it into a Graph Window.
To the right of the file list, a preview panel shows you the structure of the Bayesian network to be loaded.
Additionally, you can specify what you wish to load along with the to-be-opened file. Clicking on the icons to the right of the file list allows you to toggle on and off specific file contents:
The Files of Type dropdown menu allows you to filter the types of Bayesian network formats to be displayed in the file list.
In addition to BayesiaLab's XBL format, select versions of BIF, NET, SSS, SCI, and DNE formats may be supported.
Bayesia does not guarantee the compatibility of BayesiaLab with any third-party or open-source Bayesian network formats.
Since BayesiaLab's initial release in 2002, this User Guide has grown from a small help file into comprehensive software documentation, now exceeding 1,500 topics.
With the BayesiaLab software eco-system continuing to grow rapidly, this User Guide is very much a living document, with more details being added daily. Plus, the annual cycle of major releases adds countless new features.
Beyond documenting the software functionality, this User Guide also serves as a reference to BayesiaLab-related nomenclature.
Many of BayesiaLab's analysis functions are entirely new and unique in the world of research, so many BayesiaLab-specific terms are neologisms. Here, you can find what we mean by expressions such as "Target Dynamic Profile" or "Likelihood Matching."
In this User Guide, you will also find many cross-references to examples and case studies presented in seminars, webinars, and our e-book, now available as a free online edition within this Knowledge Hub.
This User Guide's tree structure mirrors the BayesiaLab software's structure.
For instance, if you want to learn about the details of the function located in BayesiaLab's menu structure at Main Menu > Analysis > Visual > Overall > Arc > Mutual Information, the corresponding documentation resides in this User Guide at: Main Menu | Analysis | Visual | Overall | Arc | Mutual Information.
If the same function is accessible via multiple paths, e.g., from the Main Menu, the Graph Panel Context Menu, and the Node Context Menu, the main documentation of this function will be attached to the highest level in the hierarchy, in this case, the Main Menu. All other mentions of the function will refer back to this main entry.
BayesiaLab runs inside the Application Window.
Inside the Application Window, there are four main elements:
We offer an unrestricted trial version of the BayesiaLab software so you can evaluate our technology at your leisure.
All BayesiaLab functions are available in the trial version. There are no restrictions on the number of nodes and observations.
Upon registering for the BayesiaLab trial version, we will typically send you the download and activation instructions within 24 hours.
The instructions you receive will include download links for a number of operating systems, including Windows, macOS (Intel and ARM), and Unix/Linux.
From the date you receive your trial license credentials, you can use BayesiaLab for 30 days.
The 30-day trial period starts with the delivery of your credentials, not the date you install the trial.
Please don't use the evaluation version to restore or upgrade your existing BayesiaLab license. The installation files are different from the licensed versions of BayesiaLab.
If you require an update, you can download the latest version via the Help menu in BayesiaLab: Main Menu > Help > Check for Updates.
Load the Dataset with the network.
Load the Evidence Scenario File with the network, if available.
Load the Junction Tree with the network, if available.
Load the Virtual Dataset with the network, if available.
Load the Simulator, i.e., load the configuration, if available.
The Main Menu serves as the top-level navigation to all features and tools in BayesiaLab.
The Toolbar provides quick one-click access to frequently used functions.
The Graph Panel is your work surface for creating and editing Bayesian network graphs.
Each Graph Window corresponds to a Bayesian network, which you can save as a file in XBL format.
The bar at the bottom of the Main Window allows you to manage multiple Graph Windows, which can all be opened simultaneously.
Saves the Bayesian network in the active Graph Window using the XBL format.
By default, any dataset associated with the Bayesian network and any Evidence Scenario Files will be saved in the same XBL file so that they can be jointly loaded again later.
You can edit these default settings under Main Menu > Window > Preferences > Data.
If the Bayesian network has a Junction Tree, it will also be automatically saved in the same file.
This command saves the current Bayesian network in a new file, adding an iteration number in parentheses to the current file name as a suffix.
If your current network is named Graph.xbl, the Increment & Save function will save it as Graph(2).xbl and not overwrite the original Graph.xbl file.
With each further iteration of Increment & Save, the counter in the suffix will increase by 1 unit, i.e., Graph(3).xbl, Graph(4).xbl, etc.
This is a helpful function for maintaining a history when developing a model, allowing you to revert to an earlier version when necessary.
The Main Menu serves as the top-level navigation in BayesiaLab.
Most functions and tools are available through multiple levels of submenus attached to the Main Menu.
However, in many cases, these functions are also accessible in the context of specific workflows. As a result, there are often multiple ways of launching the same tool or function.
The Main Menu can appear in three different configurations, and certain menu items and icons will only be available in specific contexts:
Save As lets you choose a new file name and location for your current Bayesian network.
Additionally, you can specify what you wish to include in the to-be-saved file. Clicking on the icons to the right of the file list allows you to toggle on and off specific contents:
Save the Dataset with the network.
Save the Evidence Scenario File with the network, if available.
Save the Junction Tree with the network, if available.
Save the Virtual Dataset with the network, if available.
Save the Simulator, i.e., save the configuration, if available.
The Network menu includes a range of standard functions related to:
Creating new files
Opening and closing files
Generating reports with network statistics
Clicking on the menu item Startup Page brings up a window featuring 12 quick-access cards.
The top row features some of the most common user actions after starting BayesiaLab:
Manually Create a Network
Open a Bayesian Network
Learn a Network from Data
Open the Media Center
The bottom two rows of cards show the most recently opened files with a network preview.
By default, the Startup screen is displayed right after launching BayesiaLab. The checkbox allows you to disable its automatic display.
Closes the active Graph Window and prompts you to save the corresponding file, if your network, the associated dataset, or the Evidence Scenario File were changed since the last Save operation.
Closes all open Graph Windows, except for the active one, and prompts you to save the corresponding files, if any of them were modified.
Closes all open Graph Windows and prompts you to save the corresponding files if any of them were modified.
Provides a list of the most recently opened networks so you can quickly reopen them as needed.
Set Working Directory allows you to define a Working Directory, i.e., a workspace, by associating a name with a specific directory.
Subsequently, you can recall the directories you defined with the menu item Recent Working Directories.
With Recent Working Directories, you can quickly recall a Working Directory you previously specified.
The list features the name you assigned plus the corresponding path.
The size of this list can be modified under Main Menu > Window > Preferences > Menus. See Recent Networks.
Closes all graphs, prompts you to save if needed, and closes BayesiaLab.
Allows exporting the Markov blanket of the target variable of the current network into a language selected in the following dialog box:
Once the network is exported in a language, it can be used to infer the value of the target variable according to the observations of the other variables.
Allows locking the network with a password to prevent it from being edited. The network can then be used only in Validation Mode. This menu gives access to the lock manager.
Prints the Bayesian network of the active Graph Window. An assistant gives access to:
the page setup,
the configuration of the printer,
the selection of the desired scale for the network,
the option of displaying reference marks. These marks are useful when the network has to be printed on more than one page. They indicate the page number (column, row), the border, and the vicinity,
the option to center the network.
The size of this list can be modified under Main Menu > Window > Preferences > Menus.
Enter your desired Working Directory name into the Name field and select the corresponding Path using the Directory dialog.
Clicking Recent Working Directories opens up a list of the most recently used Working Directories, from which you can pick the one you wish to recall.
The Reports submenu within the Network menu offers an array of information about the Bayesian network in the active Graph Window.
The Network Comments Report displays the information recorded in a network's Comments field.
And if available, the Network Comments Report also lists the associations of .
The Network Report is a very comprehensive documentation of the network in the active Graph Window.
It includes statistics about the network structure as a whole, plus details for each node, such as the Node States, the Conditional Probability Tables, and equations.
As such, it presents all qualitative and quantitative knowledge contained in the network as a long, tabular report.
To some extent, you could recreate the network from all these details.
Select Main Menu > Network > Reports > Network to create the Network Report.
The report can be quite substantial, depending on your network's size and complexity.
The following screenshot only shows the top portion of a much longer report:
For a thorough offline analysis, you may want to save the Network Report as an HTML file, which you can then open as a spreadsheet in Excel.
This password locking mechanism allows you to share your networks while making sure they will not be modified by unauthorized users.
When a network is locked, you cannot validate and save the modifications done in the Node Editor, add or delete arcs and nodes, associate dictionaries and databases for learning, modify classes, etc.
To start protecting a network, select Main Menu > Network > Protect.
Unless the network already has a lock, the following dialog box is displayed:
When the network is unlocked, the menu Network | Lock displays the following dialog box:
This dialog box allows you to:
lock the network using the existing password,
completely remove the Lock,
change the Lock Password.
However, you can still edit the costs associated with the nodes, as they are utilized only in Validation Mode (e.g., Adaptive Questionnaire, non-observable nodes, etc.).
Upon confirmation of your password, the Lock icon appears in the Status Bar to indicate that the network is unlocked.
You just have to click on the icon to lock/unlock the network. The icon then updates to indicate that the network is locked.
Occurrences refer to the number of observations in a cell of a Probability Table or a Conditional Probability Table.
The number of cells in a Conditional Probability Table is a function of the following parameters:
The number of Parent Nodes.
The number of Node States of the Parent Nodes.
The number of Node States of the Child Nodes.
Here, Age is discretized into 4 states and BMI into 6 for a total of 48 cells in the table associated with BMI.
The numbers in each cell are counts of observations or Occurrences. In our case, each Occurrence represents one person from the sample of 200 individuals.
For instance, the Occurrence table associated with BMI states that Count(BMI≤20 | Age≤30)=2. So, we have only two Occurrences of that particular condition, i.e., only two individuals who are 30 years old or younger have a BMI of 20 or lower.
To create a Bayesian network, BayesiaLab needs to translate the Occurrences in each cell into probabilities.
However, with a small number of Occurrences, that can become an issue.
We have repeatedly referenced a rule of thumb, which says that we should have a minimum of 5 Occurrences per cell to estimate a Probability Table or Conditional Probability Table reliably.
In our example, several cells fall below the recommended minimum.
Such deficiencies are easy to recognize in a small example, but in more complex networks, it can be difficult to spot such weaknesses.
That is the motivation for the Occurrence Report. It displays all tables in a network and visually highlights potentially problematic cells with low Occurrences.
Select the nodes you want to include in the Occurrences Report. If none are selected, the analysis will be performed on all nodes.
Select Main Menu > Network > Reports > Occurrences to create the Occurrences Report.
The Occurrence Report opens up and shows all Probability Tables and Conditional Probability Tables.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked in red.
Cells with 5 Occurrences are marked in yellow. This is generally considered the minimum number of Occurrences.
Cells with 40 or more Occurrences are marked in green.
Furthermore, the Occurrence Report calculates the mean number of Occurrences for each row in all Probability Tables and Conditional Probability Tables.
If the mean value of any row in any of the nodes drops below the threshold of 5, the corresponding nodes are called out at the top of the report.
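The color-coding and row-mean logic described above can be sketched in a few lines. The thresholds are taken from the text, while the exact boundary handling (e.g., whether yellow covers all counts up to 5) is an assumption of this illustration.

```python
# Sketch: flag low-occurrence cells and rows in a table of CPT counts.
# Thresholds (0, 5, 40) come from the report description; boundary handling is assumed.

def classify(count: int) -> str:
    if count == 0:
        return "red"      # no observations at all
    if count <= 5:
        return "yellow"   # at or below the rule-of-thumb minimum
    if count >= 40:
        return "green"    # comfortably estimated
    return "ok"

def flag_low_rows(cpt_counts, threshold=5.0):
    """Return indices of CPT rows whose mean occurrence count is below threshold."""
    return [i for i, row in enumerate(cpt_counts)
            if sum(row) / len(row) < threshold]

cpt = [[2, 3, 1], [50, 42, 61]]   # counts for two parent configurations
print([[classify(c) for c in row] for row in cpt])
print(flag_low_rows(cpt))          # [0]
```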
Whenever you learn a Bayesian network from a small dataset, you must consider whether the number of observations is sufficient for correctly estimating all Probability Tables and Conditional Probability Tables in the network.
For a deeper analysis, BayesiaLab can produce the Confidence Intervals Report, which we discuss on this page.
To understand how Confidence Intervals can be computed, we first need to explain the estimation of probabilities in the Probability Tables and Conditional Probability Tables, the so-called parameters.
In BayesiaLab, these parameters are estimated using Maximum Likelihood, i.e., using the frequencies observed in the dataset:

$\hat{p}(x_i) = \frac{N(x_i)}{N}$

where $\hat{p}(x_i)$ is the estimated probability, $x_i$ is the state of variable $X$, $N(\cdot)$ represents the number of occurrences of its argument in the dataset, and $N$ is the total number of observations.
So, the Parameter Estimation is straightforward and happens entirely in the background in BayesiaLab.
As a result, we may not always be aware of what numbers gave rise to the probabilities we see in a Probability Table or Conditional Probability Table, as the following diagram illustrates:
However, in terms of our confidence in the estimate, the two approaches are not the same. Our intuition tells us that we should have more confidence in the 0.1 value calculated based on the sample of 10,000.
BayesiaLab uses precisely this Frequentist approach (see the formulas further below) for the Confidence Intervals Report.
However, in BayesiaLab, you can avoid resorting to this heuristic (the Rule of Three, described below) by using Uniform Prior Samples.
Within this network, focus on the three nodes BMI, Age, and Gender:
Go to Main Menu > Network > Reports > Confidence Intervals to start the Confidence Intervals Report.
The Confidence Interval Report window opens up.
At the top of the report, the Confidence Level that serves as the basis for the reported Confidence Intervals is displayed.
Then, for each node, one table is shown.
For each cell containing a parameter estimate, an adjacent cell to the right displays the corresponding Confidence Interval in percentage points.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked with a red background.
Cells with 5 Occurrences are highlighted with a yellow background. This is generally considered the minimum acceptable number of Occurrences.
Cells with 40 or more Occurrences are marked with a green background.
You can adjust the Confidence Level used for this report.
Go to Main Menu > Window > Preferences > Tools > Statistical Tools.
Select the desired value from the Confidence Level dropdown menu.
Note that your selection here also applies to all other statistical tools and tests used in BayesiaLab.
The following example with one Parent Node (Age, measured in years) and one Child Node (BMI, i.e., Body Mass Index, measured in kg/m²) illustrates this with numbers:
The affected nodes in the Graph Panel are also marked with an information icon.
For instance, using the Occurrence Report, you can evaluate whether all Conditional Probability Tables in your network meet the rule-of-thumb criterion of at least 5 observations per cell.
BayesiaLab could have estimated a probability of 0.1 (or 10%) for a state $x_i$ in numerous ways, e.g., based on a sample of 10 or of 10,000: $\frac{1}{10} = \frac{1{,}000}{10{,}000} = 0.1$.
From Frequentist Statistics, we know how to calculate a Confidence Interval for a proportion in a sample, which is exactly what the parameter $\hat{p}$ represents. So, for a Confidence Level of 95%, the Confidence Interval is calculated as:

$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$

where $n$ is the number of observations used for estimating $\hat{p}$.
If zero observations were observed for a given state, e.g., $N(x_i) = 0$, the Rule of Three would have to be used instead to produce Confidence Intervals: with $n$ observations, the 95% Confidence Interval for the probability of the unobserved state is approximately $[0, 3/n]$.
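A quick numeric check of these formulas, using the normal approximation and the Rule of Three; this is a generic statistics sketch, not BayesiaLab's report code.

```python
# Sketch: normal-approximation CI for a proportion, plus the Rule of Three.
import math

def proportion_ci(p_hat: float, n: int, z: float = 1.96):
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# The same estimate of 0.1 with very different confidence:
print(proportion_ci(0.1, 10))      # roughly (0.000, 0.286)
print(proportion_ci(0.1, 10_000))  # roughly (0.094, 0.106)

# Rule of Three: with n observations and zero occurrences of a state,
# the 95% interval for its probability is approximately [0, 3/n].
print(3 / 200)                     # 0.015 for a sample of 200
```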
To illustrate the Confidence Intervals Report, we use the following network:
The color-coding scheme is identical to the one used in the Occurrence Report.
Network Comments provides space for notes, descriptions, and references regarding a Bayesian network.
In the Network Comments field, you can enter and edit paragraph-style text.
You can access the Network Comments Editor in two ways:
Main Menu > Network > Properties > Comments
Graph Panel Context Menu > Properties > Comments
A new window opens featuring the Network Comments Editor.
By default, the Network Comments field contains the date and time the file was created, plus the user who created the file.
Alternatively, the Network Comments field displays any custom text you may have defined, such as a problem domain description.
You can apply HTML-style formatting to your text using the toolbar, including links and images.
Note that Network Comments are automatically saved with the network file.
If you share your network file with others, the information contained in Network Comments will be accessible to them.
Data
This menu item opens the file selector or the database selector and then starts the Data Import Wizard.
Text file: Once the file is read and the pre-processing done, a fully unconnected network is created in a new graph window, with one node for each attribute. The set of Bayesian network learning methods then becomes available.
Database: Once the database table is loaded and the pre-processing done, a fully unconnected network is created in a new graph window, with one node for each attribute. The set of Bayesian network learning methods then becomes available.
Recent Databases: keeps a list of the recently opened databases. The Data Import Wizard is opened directly on the selected file. The size of this list can be modified under Main Menu > Window > Preferences > Menus.
This menu item opens the Data Association Wizard in order to associate data from a text file or a database with an existing Bayesian network.
Recent Databases: keeps a list of the recently opened databases. The Data Association Wizard is opened directly on the selected file. The size of this list can be modified under Main Menu > Window > Preferences > Menus.
When the network structure is modified during the association (addition of nodes or states), the conditional probability tables are automatically recomputed from the database. If the structure remains unmodified, the conditional probability tables are not modified.
This menu item allows defining the properties of the active Bayesian network using text files. These properties concern arcs, nodes, and states:
Arc:
Arcs: allows associating a set of arcs with the network. The indicated arcs can be added to or removed from the network. Arc removals are always carried out before arc additions. Before adding an arc, all the constraints defined on the Bayesian network, as well as the arc constraints and the temporal indices, are checked. If a constraint is not satisfied, the arc won't be added.
Forbidden Arcs: allows associating a set of forbidden arcs with the network.
Arc Comments: allows associating a set of arc comments with the network.
Arc Colors: allows associating a set of colors with the arcs of the network.
Fixed Arcs: allows defining whether arcs are fixed or not.
Node:
Node Renaming: allows renaming each node with a new name. These new names must, of course, all be different.
Comments: allows associating a comment with each node that is in the file.
Classes: allows organizing nodes in subsets called classes. A node can belong to several classes at the same time. These classes allow generalizing some node properties to the nodes belonging to the same classes. They also allow creating constraints over arc creation during learning.
Colors: allows associating colors with the nodes or classes that are in the file. The colors are written as Red Green Blue with 8 bits per channel in hexadecimal (web) format: for example, red is 255 red, 0 green, 0 blue, which gives FF0000. Green gives 00FF00, yellow gives FFFF00, etc.
Images: allows associating images with the nodes or classes that are in the file. The images are represented by their paths relative to the directory where the dictionary is.
Costs: allows associating a cost with each node. A node without a cost is called not observable.
Temporal Indices: allows associating temporal indices with the nodes that are in the file. These temporal indices are used by BayesiaLab's learning algorithms to take into account constraints over the probabilistic relations, such as not adding arcs from future nodes to past nodes. The rule that is used to add an arc from node N1 to node N2 is:
If the temporal index of N1 is positive or null, then the arc from N1 to N2 is only possible if the temporal index of N2 is greater than or equal to the index of N1.
Local Structural Coefficients: allows setting the local structural coefficient of each specified node or each node of each specified class.
State Virtual Numbers: allows setting the state virtual number of each specified node or each node of each specified class.
Locations: allows setting the position of each node.
State:
State Renaming: allows renaming each state of each node with a new name.
State Values: allows associating a numerical value with each state of each node.
State Long Names: allows associating with each state of each node a long name that is more explicit than the default state name. This name can be used in the different database export options, in the HTML reports, and in the monitors.
Filtered States: allows defining, for each node, a state as a filtered state.
As indicated by the syntax, the name of a node, class, or state in the text file cannot contain equal, space, or tab characters. If the node names contain such characters in the network, those characters must be preceded by a \ (backslash) character in the text file: for example, the node named Visit Asia will be written Visit\ Asia in the file.
In order to differentiate a name that is the same for a class, a node, or a state, you must add the suffix "c" for a class, "n" for a node, or "s" for a state at the end of the name.
If your network contains non-ASCII characters, you must save your dictionaries with UTF-8 (Unicode) encoding. For example, in MS Excel, choose "Save As" and select "Text Unicode (*.txt)" as the file type. In Notepad, choose "Save As" and select "UTF-8" as the encoding. If your file contains only ASCII characters, you can keep the default encoding (which depends on the platform), but it is strongly encouraged to use UTF-8 (Unicode) encoding in order to create dictionary files that do not depend on the user's platform. So, for example, a Chinese dictionary can be read by a German user without any problem, whatever the platforms used. If you are not sure how to save a file with UTF-8 encoding, you can export a dictionary with BayesiaLab, modify and save it (with any text editor), and load it back into BayesiaLab.
This menu item allows exporting the different kinds of dictionaries in text files.
The dictionary files are saved with UTF-8 (Unicode) encoding in order to support any character of any language. An option in the Import and Associate preferences, Save Format, allows saving or not saving the BOM (Byte Order Mark) at the beginning of the file. The BOM increases compatibility with Microsoft applications. On other platforms like Unix, Linux, or macOS, the BOM is not necessary and, in some cases, is treated as extra characters at the beginning of the file.
This menu item allows associating an evidence scenario file with the network.
This menu item allows exporting an evidence scenario file associated with the network into a text file.
This menu item allows saving the database associated with the network, including the results of the various pre-processing steps that have been carried out within the Data Import Wizard (discretization, aggregation, filtering, etc.). If the imported database still contains missing values and if the algorithm selected to process the missing values is one of the two imputation algorithms (static or dynamic), this option will allow you to carry out all your imputation tasks by saving a database without any missing values. Indeed, each missing value is replaced by taking into account its conditional probability distribution, returned by the Bayesian network, given all the known values of the row. If the database contains data for test and data for learning, the user can choose which kind of data to save: only learning data, only test data, or the whole data. It is also possible to save only the data corresponding to the selected nodes.
The states' long names can be saved instead of the states' names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and if the option is checked, the numerical values will be created by randomly generating a value in each concerned interval. If the database contains weights, they will be saved as the first column in the output file.
Allows the imputation of the missing values of the associated database according to the mode selected in the following dialog box:
The data will be saved in the specified file, and the long names of the states will be used as specified. If the database contains data for test and data for learning, the user can choose on which kind of data to perform imputation: only learning data, only test data, or the whole data. The states' long names can be saved instead of the states' names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and if the option is checked, the numerical values will be created by randomly generating a value in each concerned interval. However, if there are numerical values in the database, the missing numerical values will be generated from the distribution function of each interval. If the database contains weights, they will be saved as the first column in the output file.
Opens the graph editor if a database is associated with the current network.
Dictionary File Structures
Arc
Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; true for an added arc or false for a removed arc. The last occurrence is always chosen.
Forbidden Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class.
Comments
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (HTML or plain text). The last occurrence is always chosen.
Colors
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; color. The color is defined as an RGB color with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. The last occurrence is always chosen.
Fixed Arcs
Name of the arc's starting node or class; ->, <-, or -- to indicate either possible orientation; name of the arc's ending node or class; Equal, Space, or Tab; true for a fixed arc or false for a non-fixed arc. The last occurrence is always chosen.
Node
Node Renaming
Name of the node; Equal, Space, or Tab; new node name. The new name must be valid (different from t or T and not containing a question mark). If a node appears more than once, the last occurrence is chosen.
Comments
Name of the node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (HTML or plain text). If a node appears more than once, the last occurrence is chosen.
Classes
Name of the node; Equal, Space, or Tab; name of the class. The class can be any character string. A node present several times will be associated with each of the listed classes.
Colors
Name of a node or a class; Equal, Space, or Tab; color. The color is defined as an RGB color with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. If a node appears more than once, the last occurrence is chosen.
Images
Name of a node or a class; Equal, Space, or Tab; path to the image, relative to the directory containing the dictionary. The image path must be a valid relative path or an empty string. If a node appears more than once, the last occurrence is chosen.
Costs
Name of the node; Equal, Space, or Tab; value of the cost, or empty to make the node non-observable. The cost is an empty string or a real number greater than or equal to 1. If a node appears more than once, the last occurrence is chosen.
Temporal Indices
Name of the node; Equal, Space, or Tab; value of the index, or empty to delete an existing index. The index is an integer. If a node appears more than once, the last occurrence is chosen.
Local Structural Coefficients
Name of the node; Equal, Space, or Tab; value of the local structural coefficient, or empty to reset it to the default value 1. The local structural coefficient is an empty string or a real number greater than 0. If a node appears more than once, the last occurrence is chosen.
State Virtual Numbers
Name of the node; Equal, Space, or Tab; virtual number of states, or empty to delete an existing number. The state virtual number is an empty string or an integer greater than or equal to 2. If a node appears more than once, the last occurrence is chosen.
Locations
Name of the node; Equal, Space, or Tab; position. The location is represented by two real numbers separated by a Space: the first number represents the x-coordinate of the node and the second the y-coordinate. If a node appears more than once, the last occurrence is chosen.
State
State Renaming
Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; new state name. Alternatively, use state name; Equal, Space, or Tab; new state name to rename the state for all nodes. The new name must be a valid state name. If a state appears more than once, the last occurrence is chosen.
State Values
Name of the node or class, dot (.), name of the state; Space or Tab; real value. Alternatively, use name of the state; Equal, Space, or Tab; real value to associate a value with a state regardless of the node. The value is a real number. If a state appears more than once, the last occurrence is chosen.
State Long Names
Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; long name. Alternatively, use name of the state; Equal, Space, or Tab; long name to associate a long name with a state regardless of the node. The long name is a string. If a state appears more than once, the last occurrence is chosen.
Filtered States
Name of the node or class, dot (.), name of the filtered state. Alternatively, use the name of the filtered state alone to set the filter property for that state regardless of the node. If a state appears more than once, the last occurrence is chosen.
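For illustration, here is a small, hypothetical arc dictionary following the structure described above (the node names are invented):

```
Age -> Income = true
Smoking -- Cancer = true
Age -> Income = false
```

Because the last occurrence is always chosen, the Age -> Income arc ends up removed (false), while the Smoking -- Cancer line declares an arc whose orientation may go either way.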
The Data Import Wizard is the principal tool in BayesiaLab for preprocessing and importing external data.
You can use BayesiaLab's Data Import Wizard to import data from two types of sources:
Data tables in text format, in which data fields are separated by delimiters, such as comma, semicolon, tab, or pipe "|". The most common format is CSV.
Data tables in SQL-compatible databases can be accessed via a JDBC driver. Third-party JDBC drivers are available for all major databases.
All data sources must be structured as a single table, i.e., with rows and columns. All table joins must be performed before importing the data into BayesiaLab.
To launch the Data Import Wizard for a data table in a:
text file, select Main Menu > Data > Open Data Source > Text File.
database, select Main Menu > Data > Open Data Source > Database.
Then, the Data Import Wizard guides you through five sequential steps. The first step of the Data Import Wizard depends on the data source, i.e., text file or database. All subsequent steps of the Data Import Wizard are the same for both types of data sources.
Data Structure Definition
Data table in a database
Definition of Variable Types
Data Selection, Filtering, and Missing Value Processing
Discretization and Aggregation
Import Report
Step 3 of the five-step Data Import Wizard deals with Data Selection, Filtering, and Missing Values Processing.
Information (same as in Step 2 — Definition of Variable Types)
We start with the Data panel — although it is at the bottom of the window — as it can help inform decisions about Missing Values Processing.
This Data panel resembles the Data panel from Step 2 — Definition of Variable Types.
However, there are several important additional pieces of information available:
For Discrete variables, it shows the frequencies of all states, including Missing Values and Filtered Values:
As you experiment with checking/unchecking, you can see how the Number of Rows in the Information panel changes.
In terms of a data query, the Filter checkbox would be the equivalent of a nominal value row filter.
Note that the number of Filtered Values does not refer to the number of excluded rows due to an unchecked Filter checkbox.
For Continuous variables, it shows the standard statistics, such as Minimum, Maximum, Mean, and Standard Deviation. Additionally, the table displays the frequencies of non-missing values, Missing Values, and Filtered Values:
The Select Values panel relates to the Filter checkboxes plus any Required Minima and Maxima applied in the Data panel.
Three actions are available in this panel:
You can choose the logic for combining the Filters and Minima/Maxima assigned in the Data panel:
OR: a row will be removed if ANY of the selected Filters or specified Minima/Maxima across all variables apply to that row.
AND: a row will only be removed if ALL of the selected Filters and specified Minima/Maxima across all variables apply to that row.
Click the Show Selections button to review what Filters and Minima/Maxima are currently in place.
Note the syntax for Discrete variables: The variable name is followed by "in" (i.e., is an element of) followed by the included values shown as an array in square brackets.
Further logical expressions are shown as conjunctions (AND) or disjunctions (OR) in separate lines.
Clicking the Delete Selections button removes all Filters and Minima/Maxima currently in place.
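For comparison, the OR/AND logic can be mimicked with pandas (a minimal sketch; the column names and thresholds are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Age": [16, 34, 52, 71],
                   "Income": [0, 40000, 85000, 12000]})

# Conditions marking a row for removal: a deselected Filter value
# (Age == 16) and a violated Required Minimum (Income < 10000).
remove_age = df["Age"] == 16
remove_income = df["Income"] < 10000

# OR: remove a row if ANY condition applies to that row.
df_or = df[~(remove_age | remove_income)]

# AND: remove a row only if ALL conditions apply to that row.
df_and = df[~(remove_age & remove_income)]
```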
In the Missing Value Processing panel you can specify which kind of processing to apply to variables with Missing Values, i.e., Filter, Replace, and Infer.
The Filter function allows you to remove rows from the dataset that contain Missing Values. This is equivalent to what is commonly known as casewise deletion.
You can apply the Filter individually to any variable that contains Missing Values.
Usage
In the Data panel, click on the header or into the column of the variable with Missing Values.
Then, check the Filter checkbox in the Missing Values Processing panel.
Next, choose the logical condition to apply when you select multiple variables to be subject to the Filter.
OR: a row will be removed if ANY of the selected variables contain a Missing Value in that row.
AND: a row will only be removed if ALL of the selected variables contain a Missing Value in that row.
Before applying Filter, please consider the implications discussed in Chapter 9: Missing Values Processing.
With the Replace By function, you can specify a value for replacing the Missing Values in the selected variable.
You have several options in this regard:
You can set a specific value:
For a Discrete variable, you can select among the values observed in the variable from a drop-down list.
Alternatively, you can choose the Modal value, i.e., the most frequently occurring value of the variable in the dataset.
For a Continuous variable, you can select to use the Mean value computed from the dataset.
As an alternative, you can specify any arbitrary value.
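Outside of BayesiaLab, the same Replace By strategies can be sketched in pandas (the column names are invented; this is an illustration, not BayesiaLab's implementation):

```python
import pandas as pd

df = pd.DataFrame({"Color": ["red", None, "blue", "red"],
                   "Height": [170.0, None, 182.5, 165.0]})

# Discrete variable: replace Missing Values with the Modal value.
df["Color"] = df["Color"].fillna(df["Color"].mode()[0])

# Continuous variable: replace Missing Values with the Mean value
# (or any arbitrary value, e.g., df["Height"].fillna(0.0)).
df["Height"] = df["Height"].fillna(df["Height"].mean())
```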
For practical analysis purposes, the Infer option is the most common method for Missing Values Processing.
To learn about Missing Values Processing beyond Filter and Replace By, please see Missing Values Processing in Chapter 9 of our e-book.
The Methods in Detail:
Infer — Static Imputation
Infer — Dynamic Imputation
Infer — Structural EM
Infer — Entropy-Based Imputations
The Information panel is identical in its functionality to the Information panel in Step 2 — Definition of Variable Types. Please refer to that topic for details.
In Step 2 — Definition of Variable Types of the five-step Data Import Wizard, you need to define variable types.
Step 2 contains four panels that relate to each other in their content and available actions.
With the radio buttons in the Type panel, you can define the type of each variable.
Before you start making your determinations, BayesiaLab has already made some guesses regarding the appropriate variable type, i.e., Discrete versus Continuous.
Furthermore, some variables have limited options regarding the variable type because of their distributions:
If a variable has the same value for all observations, it falls into the Unused variable type. Such a "not distributed" variable cannot be imported into BayesiaLab at all.
Variables that contain any text values cannot be declared Continuous variables.
Variables with Missing Values cannot be of the type Weight, Row Identifier, or Learn/Test.
To select a variable, click on the variable header or click anywhere inside the column in the Data panel.
You can perform the selection of multiple variables with keystroke combinations commonly used in spreadsheet editing:
Ctrl+Click: add a variable to the current selection.
Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
Ctrl+A: select all variables in the Data panel.
Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
The current selection is highlighted by showing the selected columns in a darker shade of their current color.
Discrete
The Discrete type considers each unique value of the variable a distinct state.
Any variable that contains text will be considered Discrete by default.
The maximum number of unique values that can be accommodated can be specified under Main Menu > Window > Preferences > Editing > Node > Maximum Number of States.
Continuous
The Continuous type applies to numerical variables, which must be discretized in Step 4 — Discretization and Aggregation.
If a variable contains more distinct integer values than a certain threshold, it will be considered Continuous.
You can specify this threshold under Main Menu > Windows > Preferences > Data > Import & Associate > Threshold for Assuming Integers as Continuous. The default threshold value is 5.
Learn more about Discrete and Continuous nodes in the Node Editor topic.
Weight
Weighting is often applied to surveys to make a survey sample representative of the demographics of the underlying population.
If your dataset contains such a Weight variable, select it by clicking on the corresponding column.
Then, select the Weight button in the Type panel.
Later, in Step 4 — Discretization and Aggregation, you can specify whether or not to normalize the Weight variable.
Learning/Test
For a dataset that has already been split into a Learning Set and a Test Set, you can use such an existing definition to import your data into BayesiaLab.
Both the Learning Set and the Test Set need to be in the same data table, rather than in separate files.
A binary indicator variable needs to identify each set with a unique code.
With a Learning/Test variable defined, in Step 4 — Discretization and Aggregation of the Data Import Wizard, you need to assign which of your codes corresponds to BayesiaLab's Learning and Test states.
Row Identifier
You can assign one or more variables to serve as Row Identifiers. The values of Row Identifiers are imported but not processed in any way. They serve as labels that are attached to each record.
There are numerous functions in BayesiaLab that allow you to look up what record in the dataset corresponds to what is currently on display on the screen.
For instance, Automatic Evidence-Setting displays the Row Identifier in the Status Bar.
By selecting the Unused button, you can skip the import of the selected variables. In previous versions of BayesiaLab, this option was also known as "Not Distributed."
Unused is automatically applied to variables containing only a single value across all observations, i.e., when the variable is "not distributed," hence the original name.
Unused variables will appear grayed out in the remaining steps of the Data Import Wizard.
The Multiple Typing panel allows you to quickly assign variable types across multiple variables.
Click Set All to Discrete to apply the Discrete type to all variables, if possible.
Click Set All to Continuous to apply the Continuous type to all variables, if possible.
By clicking either button, all previous type assignments are replaced.
You can automatically remove variables, i.e., set them to the Unused type, if the percentage of Missing Values in their column exceeds a certain threshold.
Click the Set Missing Values Threshold button.
From the pop-up window, set the percentage.
All variables that exceed the specified threshold are set to Unused.
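The rule is easy to express in code for comparison (a sketch in pandas; the 20% threshold is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, None, 4],
                   "B": [1, 2, 3, 4]})

threshold = 0.20  # maximum tolerated share of Missing Values per column
unused = [col for col in df.columns if df[col].isna().mean() > threshold]
# Column "A" has 50% Missing Values and would be set to Unused.
```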
The Information panel provides a range of statistics relating to the current type assignment of variables:
Number of Rows refers to the number of records in the to-be-imported dataset. In the context of datasets, rows, records, cases, samples, and observations all have equivalent meanings.
Discrete shows the absolute count of variables currently assigned to the Discrete type. The percentage refers to the proportion of Discrete variables among all variables, including the type Unused.
Continuous shows the absolute count of variables currently assigned to the Continuous type. The percentage refers to the proportion of Continuous variables among all variables, including the type Unused.
Others displays the count of all variables assigned to the types Row Identifier, Weight, or Learn/Test.
Unused shows the absolute count of variables currently assigned to the Unused type. The percentage refers to the proportion of Unused variables among all variables.
Missing Values displays the count of cells in the dataset that contain Missing Values. The percentage refers to the proportion of cells in the dataset that contain Missing Values, including all variable types, even Unused, Row Identifier, and Learning/Test.
Filtered Values displays the count of cells in the dataset that contain Filtered Values, as indicated by the asterisk (*). The percentage refers to the proportion of cells in the dataset that contain Filtered Values, including all variable types, even Unused, Row Identifier, and Learning/Test.
The Data panel visualizes the current variable selection and type assignment with colors (see Usage above).
Horizontal and vertical scrolling allows you to view the entire dataset that will be imported.
BayesiaLab requires the discretization of all Continuous variables, and in this screen, you need to specify how to discretize those variables.
The Discretization process determines how a Continuous variable will be imported into BayesiaLab, i.e.,
the number of intervals (or bins);
the values of the thresholds which define the ranges of the intervals.
These attributes define the transformation of the underlying Continuous variable in the dataset into a discretized Continuous node in BayesiaLab.
To learn more about the important distinction between Continuous and Discrete nodes, please see these topics:
Continuous Nodes
Discrete Nodes
Select one or more Continuous variables by clicking one of the headers or anywhere inside the corresponding columns.
The Discretization panel appears.
The first item in the Discretization panel is the Discretization Type drop-down menu.
The items on this list can be grouped into Automatic Discretization versus Manual Discretization.
The bottom item on the drop-down menu, Manual, refers to a Manual Discretization approach in which you have full control over thresholds, etc.
The remaining eleven items all refer to different kinds of Automatic Discretization.
However, even in Manual Discretization, you take advantage of the algorithms available with Automatic Discretization.
Manual Discretization
Automatic Discretization
Step 4 — Discretization and Aggregation requires you to make several more important choices before concluding the import process.
As opposed to the previous steps, which each consisted of a single screen, Step 4 provides one screen per variable type, i.e., six screens in total.
As you go from Step 3 to Step 4, the variable that you last selected in Step 3 remains highlighted.
Depending on the variable type, Step 4 starts with one of six possible screens, one for each variable type.
Note that for Row Identifier and Unused variables, no actions are available. Except for the Data panel, the corresponding screens are blank.
For all other variable types, we discuss all available options in detail in separate sections:
Weights
Learning/Test
Discretization
Aggregation
Click on the Weight variable in the Data panel, and the Normalize Weights checkbox appears as the only option on the screen.
You need to determine whether to apply Normalize Weights or not:
If yes, the Weights will be normalized so that the total number of cases considered by BayesiaLab for machine learning is equal to the actual number of samples in the dataset.
If no, the Weight variable will be treated as representing the actual number of observed cases. So, a weight of 10 for one observation would be treated and counted like ten instances of that same observation. As a result, the total number of cases considered by BayesiaLab would correspond to the population from which the weight was calculated.
This example illustrates the situation for a survey consisting of 10 observations:
If you normalize, BayesiaLab considers the correct proportions of the weighted samples but still only considers ten observations in total for learning purposes.
If you have specified a Weight variable, it will be taken into account in the Discretization and Aggregation algorithms.
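Conceptually, normalization rescales the weights so that they sum to the number of observations. A minimal sketch, using the same ten weights as the example table shown later in this section:

```python
import numpy as np

weights = np.array([10, 12, 8, 9, 11, 13, 7, 4, 15, 11])  # sums to 100

# Normalized weights sum to the number of observations (10), preserving
# the proportions while keeping the effective sample size at 10.
normalized = weights * len(weights) / weights.sum()
# -> [1.0, 1.2, 0.8, 0.9, 1.1, 1.3, 0.7, 0.4, 1.5, 1.1]
```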
BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.
In Step 1 — Data Structure Definition: Text File of the five-step Data Import Wizard, you need to define the dataset structure for BayesiaLab so that the data can be imported and interpreted correctly.
The Data Structure Definition window opens up.
Specify all Settings & Options (see below).
Many of the settings can be immediately reviewed and validated in the Data Preview panel. However, mischaracterized Missing Values or Filtered Values can go unnoticed and later introduce major problems, leading to misleading analysis results.
The Data Import Wizard will attempt to automatically identify the separator or delimiter of the fields in the data table.
However, there can be ambiguous situations in which you need to specify the separator by checking the appropriate box:
Tab
Semicolon
Comma
Space
Other
If you prepare a dataset externally for import into BayesiaLab, ensure that separators are unique and do not appear as content in any data field. So, if any data fields contain text with commas as content, you cannot use commas as the separator. In such a case, try a tab or semicolon.
The Encoding drop-down list allows you to select an alternative encoding for the dataset to be imported. This can become necessary for importing data from certain legacy systems.
Specifying the correct code for Missing Values is very important so that BayesiaLab can process such Missing Values appropriately.
The list shows a number of codes that are commonly used for Missing Values. However, this is not necessarily comprehensive, and your dataset may contain different codes, such as "." (dot) or "-9999", etc.
Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.
Deleting a default entry such as NR (for no response) may become necessary, for instance, if a data field contains the string "NR" as a valid value. That would be the case if your data set included New York Stock Exchange ticker symbols. In this context, "NR" would be the symbol of Newpark Resources, Inc. Unless you address this issue, all "NR" strings would be treated as Missing Values.
You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.
Just as important as the correct definition of Missing Values is a clear understanding of a Filtered Value.
A Filtered Value occurs when a variable cannot have any value for logical reasons. For instance, a demographic dataset could include a field Age at Retirement. In the record of a 16-year-old high school student, however, there can be no value for this field. This situation must not be treated as a Missing Value! A Missing Value implies that a value exists but is unknown. In the case of the student's record, a value is logically impossible, not missing. So, instead of a numerical value or a blank, you must specify a code that says that there can be no value. This is the purpose of assigning a Filtered Value code.
Importantly, you must encode any Filtered Values before importing your dataset into BayesiaLab. In BayesiaLab, you merely need to declare what code you used in your dataset to represent Filtered Values. BayesiaLab will create a Filtered State as an additional state in each node for which Filtered Values are encountered during data import.
Click Add to create a new entry in this list for the current data import.
Clicking Remove deletes the selected entries.
You can set your own default list of codes under Main Menu > Windows > Preferences > Data > Import & Associate > Missing & Filtered Values.
In Data Preview, all Filtered Values are marked with an asterisk (*) in the data table.
Understanding the difference between Missing and Filtered Values is critically important.
Clicking the Define Sample button opens a window that allows you to sample records from your data source.
This is particularly useful for the preliminary analysis of large datasets. By default, BayesiaLab imports all records from the data source.
You can define a subset in three ways:
Random Sample — Size in Percent: specify the size of the random sample as a percentage of the original dataset size.
Random Sample — Size: specify the number of records in the sampled dataset.
Custom Range — First Row to Last Row: specify the range of records to be imported.
Checking the option Fixed Seed and specifying a number ensures that you can repeat exactly the same random sampling for each iteration of the import. This allows you to reproduce your results as you develop your model.
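For comparison, the three sampling modes map naturally onto pandas (the file name is hypothetical; random_state plays the role of the Fixed Seed):

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical dataset

# Random Sample — Size in Percent (here 10%), with a fixed seed.
sample_pct = df.sample(frac=0.10, random_state=42)

# Random Sample — Size (an absolute number of records).
sample_n = df.sample(n=1000, random_state=42)

# Custom Range — First Row to Last Row (here rows 100 through 199).
sample_range = df.iloc[100:200]
```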
By default, the Data Import Wizard loads the entire dataset as a Learning Set.
By clicking the Define Learning/Test Sets button, you can set aside a Test Set (or holdout sample).
You can define the Learning Set/Test Set split in three ways:
Random Test Set — Size in Percent: specify the size of the Test Set as a percentage of the original dataset size.
Random Test Set — Size: specify the number of records in the Test Set.
Custom Test Set — First Row to Last Row: select a specific range of records for a Test Set.
Checking the option Fixed Seed and specifying a number ensures that you can obtain the same Test Set with each iteration of the import. This allows you to reproduce your results and validation measures as you develop your model.
In addition to specifying a Learning Set/Test Set split here, you can define a split in other ways:
You can designate a variable in the original dataset to assign records to the Learning Set and Test Set. You can select such a variable in the next step of the Data Import Wizard: Step 2 — Definition of Variable Types.
Main Menu > Data > Data Set > Generate Learning/Test Split
Furthermore, you can remove the Learning Set/Test Set split at any time:
Main Menu > Data > Data Set > Remove Learning/Test Split.
The Options Panel allows you to manage the interpretation of the to-be-imported dataset.
Title Line:
By checking this option, BayesiaLab reads the first row of the dataset and uses its values as column headers.
If the values in the first row are not compatible, e.g., due to missing values or duplicate values, you are prompted to accept the proposed corrections, which include adding suffixes for duplicate names and substituting missing values with generic column headers, e.g., N0, N1, N2, etc.
End of Line Character:
With some files, it may be necessary to specify a certain character so that BayesiaLab can correctly detect the end of a row in a data table.
Consider Identical Consecutive Separators as One:
Check this box so that if you have multiple consecutive separators of the same type, e.g., “;;;”, the Data Import Wizard will treat them as a single separator.
Consider Different Consecutive Separators as One:
Check this box so that if you have multiple consecutive separators of any type, e.g., “;,|”, the Data Import Wizard will treat them as a single separator.
Double Quotes: choose whether to Remove them or treat them As String Delimiters.
Simple Quotes: choose whether to Remove them or treat them As String Delimiters.
Transpose:
By default, BayesiaLab expects the data source to be arranged in
columns corresponding to variables and
rows corresponding to samples, records, or observations.
Checking the Transpose option allows you to accept an alternate format, i.e.,
rows corresponding to variables and
columns corresponding to samples, records, or observations.
The transposed format is commonly used in bioinformatics. For instance, variables representing genes — sometimes tens of thousands — are arranged row by row. Observations — sometimes only a few dozen — are placed in columns side by side.
The data table at the bottom of the window provides a preview of how the Data Import Wizard sees and interprets your dataset.
Blank fields indicate a Missing Value.
Asterisks (*) mark Filtered Values. In the dataset shown below, for instance, Filtered Values were assigned to all males and post-menopausal women for the variable Pregnancy Status. For those two groups and for obvious reasons, pregnancy is impossible.
Horizontal and vertical sliders allow you to scroll and view the entire dataset. Alternatively, you can move your mouse's scroll wheel up and down.
If a variable name exceeds the column width, you can click on the divider between column headers and drag it into the desired position. Alternatively, double-click the divider to auto-fit the column width to the variable name.
In the following animation, we show a dataset that requires numerous settings to be adjusted for proper import:
The dataset uses the pipe character ("|") as a delimiter.
All fields are enclosed in double quotes.
Multiple, arbitrary codes are used for Missing Values:
"Refused"
"unknown"
"Not Applicable" is the code for Filtered Value used in this dataset.
Note that there are no standardized codes for Missing Values and Filtered Values. They can be as arbitrary as in this example. Therefore, it is of utmost importance that whoever prepares the dataset must convey the precise meaning of these codes to the analyst who imports the data into BayesiaLab.
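For reference, an equivalent import in pandas would declare the same settings explicitly (the file name is invented; note that pandas has no native notion of a Filtered Value):

```python
import pandas as pd

df = pd.read_csv(
    "buyers.csv",                       # hypothetical file
    sep="|",                            # pipe character as delimiter
    quotechar='"',                      # fields enclosed in double quotes
    na_values=["Refused", "unknown"],   # arbitrary Missing Value codes
)
# "Not Applicable" marks Filtered Values, a concept specific to BayesiaLab,
# and must therefore not be listed among the Missing Value codes.
```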
A Missing Values icon indicates the presence of at least one Missing Value in the corresponding variable.
A triangle icon indicates that variable-specific statistics are available. It appears on all variable headers with the exception of variables of the type Row Identifier and Unused.
Clicking on the triangle icon or the associated variable header brings up a table with variable statistics:
The Filter checkboxes allow you to uncheck/deselect specific values.
The checked box means that the value is included, which is the default condition.
The unchecked box means that the value is excluded and that all rows that contain that value will be filtered, i.e., removed.
This panel is only active if you select one of the variables that feature a small question mark icon. This icon indicates that the corresponding variable contains at least one Missing Value.
The Discretization screen is part of Step 4 — Discretization and Aggregation within the Data Import Wizard.
This screen is only available if you designated at least one Continuous variable in Step 2 — Definition of Variable Types.
At the bottom of the screen, the Data panel carries over from the previous steps, although now without any options.
This screen is only available if you designated a Weight variable in Step 2 — Definition of Variable Types.
If you do not normalize, BayesiaLab would consider a sample of 100 for learning purposes and presumably find spurious relationships. This "over-counting" by a factor of 10 has the same effect as reducing the Structural Coefficient to 0.1.
Open Data Source brings data into BayesiaLab to create a new Bayesian network.
In Modeling Mode, select Main Menu > Data > Open Data Source > Text File.
Click Next to proceed to the next step of the Data Import Wizard.
Right-click on the database icon in the Status Bar and select Generate Learning/Test Split.
Right-click on the database icon in the Status Bar and select Remove Learning/Test Split.
| Observation No. | Weight | Normalized Weight |
|---|---|---|
| 1 | 10 | 1.0 |
| 2 | 12 | 1.2 |
| 3 | 8 | 0.8 |
| 4 | 9 | 0.9 |
| 5 | 11 | 1.1 |
| 6 | 13 | 1.3 |
| 7 | 7 | 0.7 |
| 8 | 4 | 0.4 |
| 9 | 15 | 1.5 |
| 10 | 11 | 1.1 |
| Sum | 100 | 10 |
Tree is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Tree is a bivariate discretization method. It machine-learns a decision tree that uses the to-be-discretized variable to represent the conditional probability distribution of the Target variable. Once the tree is learned, it is analyzed to extract the most useful thresholds.
It is the method of choice in the context of Supervised Learning, i.e., if you plan to machine-learn a model to predict the Target variable.
At the same time, we do not recommend using Tree in the context of Unsupervised Learning. The Tree algorithm creates bins that are biased toward the designated Target variable. Naturally, emphasizing one particular variable would run counter to the intent of Unsupervised Learning.
Note that if the to-be-discretized variable is independent of the selected Target variable, it will be impossible to build a tree, and BayesiaLab will prompt you to select a univariate discretization algorithm.
All manually discretized variables can be used as a Target variable for Tree discretization.
Using a Target variable for Discretization does not create a Target Node in the network.
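BayesiaLab's exact procedure is internal to the software, but the core idea can be sketched with scikit-learn: learn a shallow decision tree that predicts the Target from the to-be-discretized variable, then read off the split thresholds (all names and parameters below are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x = rng.normal(size=1000).reshape(-1, 1)   # to-be-discretized variable
y = (x.ravel() > 0.5).astype(int)          # hypothetical Target variable

tree = DecisionTreeClassifier(max_leaf_nodes=4).fit(x, y)

# Internal nodes store split thresholds; leaf nodes are marked with -2.
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)
print(thresholds)  # candidate interval boundaries
```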
This screen is only available if you designated a Learning/Test variable in Step 2 — Definition of Variable Types.
Select the Learning/Test variable by clicking on its header or into the corresponding column.
Select BayesiaLab's learning and test labels from the drop-down lists to match the codes in your dataset.
Additionally, you can see the proportion of cases for each code in your dataset.
Given that you have a variable of the type Learn/Test, only the "learning" rows will be taken into account for Discretization and Aggregation. Otherwise, you would partially defeat the purpose of having a hold-out set.
Automatic Discretization covers numerous discretization algorithms that are part of Step 4 — Discretization and Aggregation of the Data Import Wizard.
Except for Manual, all items in the Type menu represent Automatic Discretization algorithms.
Most of these algorithms can also be accessed via the Generate a Discretization function within the Manual Discretization screen.
Selecting a Discretization algorithm applies variable by variable, i.e., you can use a different algorithm for each Continuous variable.
To select a variable, click on the variable header or anywhere inside the column.
You can perform the selection and deselection of multiple variables with keystroke combinations commonly used in spreadsheet editing:
Ctrl+Click: add a variable to the current selection.
Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
Ctrl+A: select all variables in the Data panel. However, selecting all variables is not useful here in Step 4, as there are no actions that can apply to all variable types.
Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
Click the Select All Continuous button to select all Continuous variables.
Note that this action will also select any variables which you have already discretized manually. As a result, you may override your previous choices.
Note that Continuous variables already discretized manually are highlighted in soft blue.
If you do not specify an algorithm for a variable that was not manually discretized either, the default Discretization algorithm with its default settings will be used.
You can set the default Discretization algorithm under Main Menu > Window > Preferences > Discretization.
For the following algorithms, a Log Transformation is available as an option:
Applying the Log Transformation is useful if you have a high density of values at the bottom end of the variable domain. This "stretches" the scale for small values approaching zero.
Note that the Log Transformation is only used temporarily for discretization purposes. Thus, the values of the thresholds and values of the intervals can all be interpreted based on the original scale.
For the following algorithms, the option Isolate Zeros is available:
Separating 0 into a separate interval can be useful for zero-inflated distributions so as to clearly separate small values from "absolutely nothing."
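Both options are easy to picture in code. A sketch of a temporary Log Transformation (assuming strictly positive values): the thresholds are computed on the log scale but reported on the original scale.

```python
import numpy as np

values = np.random.default_rng(1).lognormal(size=1000)  # right-skewed data

# Bin on the log scale (here: 4 equal-frequency intervals), then map the
# thresholds back so they remain interpretable on the original scale.
log_thresholds = np.quantile(np.log(values), [0.25, 0.50, 0.75])
thresholds = np.exp(log_thresholds)
```

Isolate Zeros would additionally reserve a dedicated interval for the value 0 before binning the nonzero values.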
Click Finish to perform the Discretization.
A progress bar displays the status of the Discretization process:
If a Filtered Value is defined for a Continuous variable, a new artificial interval with an infinitesimally small width of 10⁻⁷ will be added after the intervals defined in this step. This newly created state will serve as the Filtered State, and "*", i.e., the asterisk character, will be its State Name.
At its conclusion, BayesiaLab opens up a Graph Window with all imported variables now represented as nodes.
Simultaneously, a window pops up that offers you an optional Import Report, which is Step 5 of the Data Import Wizard.
Manual Discretization is one type of Discretization available in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Select Manual from the drop-down menu.
Several additional items and buttons appear on the left side, plus a Cumulative Distribution Function (CDF) is shown on the right. This CDF plot can help in selecting appropriate discretization intervals.
In the screenshot below, the variable Standing Height (cm) is selected, meaning that the CDF plot corresponds to that variable.
Click on the Density Function button, and the Probability Density Function (PDF) of the same variable appears.
Now the button reads Distribution Function, and by clicking it, you can toggle back to the CDF view.
By default, only one threshold is placed at the mean value of the corresponding variable.
This threshold appears as a horizontal line on the CDF and a vertical line on the PDF.
The CDF and PDF plots are interactive; you can add, delete, and modify thresholds.
The following instructions apply to both plots:
To select a threshold, left-click on that threshold.
The selected threshold is highlighted in red.
The remaining thresholds on the plot remain blue.
The precise numerical value of a selected threshold is shown in the Threshold Value field to the right of the plot.
To move a threshold, click and hold it, drag it to the desired position, and release to fix it.
The percentages displayed at the end of a selected threshold refer to the share of observations that fall into the intervals above and below this threshold.
Instead of moving the selected threshold with your cursor, you can type a specific value into the Threshold Value field.
To add an additional threshold, right-click with your cursor on the desired position.
To remove an existing threshold, right-click on it to delete it.
A zoom function is available for examining the plot in detail:
Hold the Ctrl key, click and hold the left mouse button, then move the cursor across the range you wish to focus on.
To revert to the default zoom, hold Ctrl, then double-click anywhere in the plot area.
You can zoom in repeatedly until you have reached the desired magnification level.
As an alternative to selecting a threshold by left-clicking, you can scroll through all thresholds using the Previous and Next buttons.
Note that as soon as a threshold is defined on a Continuous variable, it is considered Discretized, and the variable's data column is colored in soft blue.
The interactive CDF and PDF plots are similar to the editing functions available under Curve View in the Node Editor.
We re-use the dataset from the previous steps, so we can fast-forward to Step 4 and focus on that step.
While remaining on the Manual Discretization screen, you can also utilize the Generate a Discretization function.
It allows you to use the algorithms from Automatic Discretization but in a more controlled environment where you can closely observe the results of the Discretization.
Click on the Generate a Discretization button.
Then, select the Type from the drop-down menu, e.g., the R2-GenOpt algorithm. You have nine algorithms available, i.e., the univariate methods only.
Choose the number of Intervals, e.g., 5.
Set a Minimum Interval Weight, which defines the minimum prior probability of an interval in percent. The default value is 1%.
Note that you can set defaults for the above settings under Main Menu > Window > Preferences > Discretization.
Additionally, there are options for Log Transformation and Isolate Zeros, which we discuss in the context of Automatic Discretization.
Click OK to perform the Discretization.
In certain situations, you may carefully choose thresholds for a variable (see Manual Discretization Workflow Animation). Perhaps another variable, or multiple variables, should have exactly the same discretization. In this context, you can use the Transfer the Discretization Thresholds button.
Select the source variable from which you wish to copy the thresholds.
Click the Transfer the Discretization Thresholds button.
A new window opens up that allows you to select one or more target variables.
Select the target variables.
Click OK.
This checkbox is synchronized across Manual and Automatic Discretization processes.
If checked, BayesiaLab automatically creates Classes for each type of Discretization, i.e., all variables that are discretized with the same algorithm will belong to the same Class.
Note that variables that were discretized manually, even if you used the Generate a Discretization button, will all become members of the Class MANUAL.
You can review the Class memberships in the Class Editor after the data import process is complete.
This function allows you to load a Discretization Dictionary with saved Discretization Intervals and Discretization Methods.
This approach is particularly helpful when you repeatedly import datasets with the same variables for which you have already found a suitable discretization.
The following text file illustrates the syntax of a Discretization Dictionary.
R2-GenOpt* is a modified version of R2-GenOpt and uses a specific MDL score to choose the number of bins.
With 100 observations, even though we selected 8 bins, only 3 were created for the variable 8- Wrist girth.
With 1,500 observations, even though we selected 10 bins, only 5 have been created for AGN, and 6 for ALL.
R2-GenOpt* is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Supervised Multivariate is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Supervised Multivariate discretization algorithm focuses on representing the multivariate probabilistic dependencies involving a Target variable.
It utilizes Random Forests to find the most useful thresholds for predicting the Target variable.
Its function can be summarized as follows:
Data Perturbation generates a range of datasets.
For each perturbed dataset, a multivariate tree is learned to predict the Target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.
Extracting the most frequent thresholds produces the final discretization.
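As a rough sketch of the principle (not BayesiaLab's implementation), one can bootstrap the dataset, learn one tree per replicate, and keep the most frequent thresholds:

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                # predictor variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hypothetical Target

counts = Counter()
for _ in range(50):                          # Data Perturbation
    idx = rng.integers(0, len(X), len(X))    # bootstrap resample
    tree = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    t = tree.tree_
    for feat, thr in zip(t.feature, t.threshold):
        if feat >= 0:                        # internal (non-leaf) node
            counts[(feat, round(thr, 1))] += 1  # pool nearby thresholds

# The most frequent thresholds per variable form its discretization.
print(counts.most_common(5))
```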
The Supervised Multivariate algorithm takes into account the Minimum Interval Weight and can improve the generalization capability of the model.
Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature.
After the conclusion of the Data Import Wizard, the Supervised Multivariate discretization algorithm is also available from Main Menu > Learning > Discretization.
Note that the Supervised Multivariate discretization algorithm is not available via Node Context Menu > Node Editor > States > Curve > Generate a Discretization.
R2-GenOpt is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The R2-GenOpt algorithm utilizes a Genetic Algorithm to find a discretization that maximizes the R2 between the discretized variable and its corresponding (hidden) Continuous variable.
As such, it is the optimal approach for achieving the first objective of discretization, i.e., finding a precise representation of the values of a Continuous variable.
This algorithm takes into account the Minimum Interval Weight and can also create a specific bin for representing zeros if the Isolate Zeros option is set.
In Validation Mode, the R2 value between the Discretized variable and its corresponding Continuous variable can be retrieved in the Information Mode by hovering over the monitor.
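The R2 criterion itself is straightforward to state: replace each value by the mean of its interval and measure how much of the original variance this representation retains. A sketch with arbitrary thresholds:

```python
import numpy as np

values = np.random.default_rng(2).normal(size=1000)
thresholds = [-1.0, 0.0, 1.0]                # arbitrary discretization

bins = np.digitize(values, thresholds)
reconstructed = np.empty_like(values)
for b in np.unique(bins):                    # each bin's mean as its value
    reconstructed[bins == b] = values[bins == b].mean()

ss_res = np.sum((values - reconstructed) ** 2)
ss_tot = np.sum((values - values.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                     # share of variance retained
```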
K-Means is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The K-Means algorithm is based on the classical K-Means data clustering algorithm but uses only one dimension, which is the to-be-discretized variable.
K-Means returns a discretization that directly depends on the Probability Density Function of the variable.
More specifically, it employs the Expectation-Maximization algorithm with the following steps:
Initialization: random creation of K centers
Expectation: each point is associated with the closest center
Maximization: each center position is computed as the barycenter of its associated points
Steps 2 and 3 are repeated until convergence is reached.
Based on the K centers, the discretization thresholds are defined as the midpoints between consecutive centers: for the sorted centers c_1, ..., c_K, the threshold between c_i and c_(i+1) is (c_i + c_(i+1)) / 2.
The following figure illustrates how the algorithm works with K=3.
For example, applying a three-bin K-Means Discretization to a normally distributed variable would create a central bin representing 50% of the data points and one bin of 25% each for the distribution's tails.
Without a Target variable, or if little else is known about the variation domain and distribution of the Continuous variables, K-Means is recommended as the default method.
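A one-dimensional sketch with scikit-learn, following the description above (K=3; in one dimension, the boundary between two neighboring clusters is the midpoint of their centers):

```python
import numpy as np
from sklearn.cluster import KMeans

values = np.random.default_rng(3).normal(size=1000).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values)
centers = np.sort(km.cluster_centers_.ravel())

# Thresholds at the midpoints between consecutive centers.
thresholds = (centers[:-1] + centers[1:]) / 2
```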
The Density Approximation discretization detects changes in the sign of the derivative of the Probability Density Function (PDF) in order to identify local minima and maxima.
Between each local minimum and maximum, the algorithm creates a threshold.
Also, the algorithm automatically detects the optimal number of bins, although you can specify the maximum number of bins.
The minimum size permitted for bins is 1% of the data points.
Density Approximation is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
This multivariate discretization method is based on analyzing the relationship between variables.
The Unsupervised Multivariate discretization algorithm focuses on representing multivariate probabilistic dependencies using Random Forests.
Its functionality can be described as follows:
A new dataset is created as a clone of the original one.
In this new dataset, each variable is independently shuffled to render all the variables independent while keeping the same statistics for each variable.
The cloned dataset is concatenated with the original dataset. Then, a target variable is created to differentiate the clone from the original, indicating the independent set versus the original dependent set.
Various datasets are generated from this concatenated dataset with Data Perturbation.
For each perturbed dataset, a multivariate tree is learned to predict the target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.
Extracting the most frequent thresholds produces the discretization.
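The clone-and-shuffle construction described above is compact to express (a sketch of the dataset construction only; the column names are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(500, 3)), columns=["A", "B", "C"])

# Shuffle each column independently: marginal statistics are preserved,
# but all dependencies between the variables vanish.
clone = df.apply(lambda col: rng.permutation(col.to_numpy()))

# Concatenate and label: 1 = original (dependent), 0 = clone (independent).
data = pd.concat([df, clone], ignore_index=True)
data["target"] = [1] * len(df) + [0] * len(clone)
```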
Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature, particularly when the number of variables is large.
The Unsupervised Multivariate discretization algorithm is also available after the data import via Main Menu > Learning > Discretization.
However, it is not available in the Node Editor (Node Context Menu > Edit > Curve > Generate a Discretization).
Unsupervised Multivariate is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
Equal Frequency is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
This Equal Frequency algorithm defines thresholds so that each interval contains the same number of observations.
This approach typically produces a uniform distribution.
As a result, the shape of the original density function is no longer apparent upon discretization.
This also leads to an artificial increase in the entropy of the system, directly affecting the complexity of machine-learned models.
However, this type of discretization can be useful — once a structure is learned — for further increasing the precision of the representation of continuous values.
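Equal-frequency thresholds are simply quantiles. A sketch for four intervals:

```python
import numpy as np

values = np.random.default_rng(5).exponential(size=1000)

# Four intervals with approximately equal counts: the thresholds are
# the 25th, 50th, and 75th percentiles.
thresholds = np.quantile(values, [0.25, 0.50, 0.75])
```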
Open Data Source (Data Import Wizard) brings data into BayesiaLab to create a new Bayesian network, while Associate Data Source (Data Association Wizard) adds new data to a pre-existing network.
BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.
There are a total of six steps in the Data Association Wizard, which are mostly identical to the steps in the Data Import Wizard.
To launch the Data Association Wizard for a data table in a:
text file, select Main Menu > Data > Associate Data Source > Text File.
database, select Main Menu > Data > Associate Data Source > Database.
See Step 1 of the Data Import Wizard.
See Step 2 of the Data Import Wizard.
Additionally, clicking the Unmatched Columns button displays all the columns in the database that are not in the network.
The Unmatched Columns window allows you to select whether to use or not use the unmatched columns from the new dataset.
This step links the variables in the dataset to the nodes of the network.
As such, this step depends on the three previous steps and the selection of variable types.
Here you can define how the variables in the to-be-associated dataset will be mapped to the nodes already in the network.
The following assignments are possible:
Discrete variable in the dataset → Discrete node in the network
Discrete variable in the dataset → Continuous node in the network
Continuous variable in the dataset → Continuous node in the network
If variables in the dataset have the same name and type as existing nodes in the network, BayesiaLab will automatically propose an association.
You can proceed in the same way for the continuous node N. You can also select and add several nodes at the same time.
Zone 3 contains the buttons used to add or remove associations.
Zone 4 contains the list of associations. It can also contain added variables from the database that will be treated as new nodes in the network. Double-clicking an association displays, if necessary, a dialog used to edit a discrete or a continuous association. As you can see, some associations show a warning icon. This icon indicates that unusual behavior is present in those associations.
Zone 6 contains three buttons. The first and second buttons automatically extend the minimum and maximum of each continuous node that does not fit the database's limits. The third button automatically filters out each row that does not fit the network's limits.
When you want to add or edit an association between a discrete column of the database and a discrete or continuous node, a dialog box appears:
Zone 3 contains the buttons to add or remove state associations.
By default, the database's states that are identical to the network's states, to the aggregates, or to the states' long names will be automatically linked.
If filtered values exist in the database but are not declared in the network, it is possible to merge them with the specific state *, if it exists. In this case, this state will be automatically defined as filtered for each concerned node.
When you want to add or edit an association between a continuous column of the database and a continuous node, a dialog box appears:
This dialog is displayed only if the limits of the variable from the database are outside the limits of the node from the network.
By default, the limits of the node of the network are used and all the values outside these limits will be removed from the database. If you want to keep them, use the corresponding options.
If filtered values exist in the database but are not declared in the network, it is possible to merge them with the specific state *, if it exists. In this case, this state will be automatically defined as filtered for each concerned node.
This step occurs only when some columns of the database are not linked to nodes of the network but are distributed. These columns will create new nodes in the network; they must be discretized if they are continuous, and their states can be aggregated if they are discrete.
Same as Step 4 of the Data Import Wizard.
The modified nodes table:
For discrete nodes, it indicates, where applicable, the correspondence between the states in the database and in the network.
For continuous nodes, it indicates, where applicable, the initial minimum of the data and the retained final minimum, as well as the initial maximum and the retained final maximum.
The hidden nodes table: indicates the nodes that are in the network but do not have any associated data.
The added nodes table: indicates the list of variables added to the network from the database. This table is the same as in the import report.
Zone 1 contains the list of the variables contained in the database that are not yet associated with a node of the network or added as a new node. As you can see, the variable Geographic Zone contained in the database is discrete and has no corresponding node in the network. If you want to add it as a new node, select it and click the corresponding button; otherwise, do nothing.
Zone 2 contains the list of the nodes contained in the network that are not yet associated with a column of the database. If you want to link a variable from the database to a node of the network, simply select each one and press the corresponding button. All remaining nodes in this list will not be linked to a column of the database and will be considered hidden nodes in the network.
Zone 5 contains a list with the details of each association warning located in Zone 4. If you select a warning in the list, the corresponding association is selected in Zone 4. When the mouse hovers over the list, a tooltip shows the content of the warning. Double-clicking a warning opens the corresponding association editor so you can verify or modify the association. If you want to remove an association or an added node, select it in the list and press the corresponding button.
Zone 1 contains the list of the states from the database that are not yet linked to a state of the node or directly added as a new state. To perform an association, select a state in Zone 1 and a state in Zone 2 and press the association button. If you want to add a state without linking it to a state of the node, simply select it and press the corresponding button.
Zone 2 contains the list of the states of the node from the network. This list is never modified: even when an association is made, the corresponding state remains in the list and can be reused for another association. This allows linking several states from the database to the same state of the node in the network. To perform an association, select a state in Zone 1 and a state in Zone 2 and press the association button. The state from Zone 1 will be removed, and the association will be added to Zone 4.
Zone 4 contains all the associated and added states. An association can be removed by selecting it in the list and pressing the corresponding button.
After the associations are made, the dialog looks as shown. If some states remain unlinked in Zone 1, they will be removed from the database.
After a successful data association, it is possible to display the HTML association report. This report may contain three tables:
Unlike the Discretization step, which is mandatory for Continuous variables, Aggregation is optional for Discrete variables.
Note that an analogous function, Generate Aggregations, is also available for Discrete nodes in the States tab of the Node Editor.
This function is useful when dealing with a large number of values in a Discrete variable. Once imported, the large number of resulting Node States would make it difficult to discover any relationships with that node.
The Aggregation function in the Data Import Wizard is available for single Discrete variables and for multiple Discrete variables.
Please see the usage instructions and examples in the corresponding sub-topics:
Aggregation of Single Variable
Aggregation of Multiple Variables
Similar to the workflow for the Aggregation of a Single Variable, you can also perform an Aggregation of Multiple Variables.
We use the same auto buyer survey dataset to illustrate the process. In the auto industry, numerous schemes are used to group vehicle types and body styles into so-called segments. Each segment carries a descriptive name, e.g., Compact Car, Full-Size SUV, Minivan, Mid-Size Pickup, Mid-Size Crossover. In our dataset, we have four variables, which each represent such a segmentation scheme. While all these segmentation schemes roughly convey the same information, they differ in their granularity: for instance, variable Segmentation 3 has 23 states; Segmentation 4 has 33. Our objective is now to reduce each one of the segmentation schemes down to three states.
This time, instead of Price, we use the variable MPG - Combined as a target. It represents the survey respondents' estimates of their vehicles' combined fuel economy in miles per gallon (MPG). In other words, we want to create a new aggregation for each segmentation scheme based on fuel economy. Also, the variable MPG - Combined only has two intervals, with one threshold at 22.5. This number has been used in the past as a criterion for so-called "gas guzzlers." So, we are going to use the state <=22.5 as a proxy for poor fuel economy. As a result, we expect each of the existing segments to be "remapped" according to fuel economy.
In the Data panel, using Ctrl+Click or Shift+Click, select the variables Segmentation 1, Segmentation 2, Segmentation 3, and Segmentation 4.
This brings up the Multiple Aggregation panel.
Set Target to MPG - Combined, and State to <=22.5.
Set Final Number of States to 3.
Click the Aggregate button to perform the aggregation.
Note that there will be no immediate feedback regarding the results of the aggregation.
Rather, we can only see the results of the aggregation in the Import Report in Step 5 of the Data Import Wizard.
Click Finish to complete Step 4 of the Data Import Wizard.
BayesiaLab opens a new Graph Window with all variables now presented as nodes.
Simultaneously, a prompt comes up offering to display the Import Report.
Click Yes, and the Import Report — featuring all variables, not just the aggregated variables — appears in a new window.
The Import Report is the fifth and final step of the Data Import Wizard.
Depending on the size of your dataset, the selected discretization algorithms, and the number of Missing Values, this may take anywhere from a fraction of a second to several minutes.
At the same time, a prompt appears, offering you the Import Report.
Click Yes to bring up the Import Report window.
The first column displays the names of the imported variables.
The second column displays the type associated with each variable.
For a Weight variable, no further information is available or provided.
For a Learn/Test variable, the association with BayesiaLab's Learn and Test labels is shown, plus the corresponding number of cases.
The third column shows all States of each variable, if applicable.
The right part of the report depends on the variable type:
Discrete Variables:
The report shows each state and, adjacent to it, any aggregations that were performed. Furthermore, the color of the rightmost cell in the row highlights that an aggregation took place.
Continuous Variables:
The names of the discretized states are shown.
The next two columns to the right report the lower and upper thresholds for each interval.
The rightmost column is colored according to the discretization algorithm used.
Asked/Obtained indicates the requested discretization algorithm versus the one that was used as the fallback option.
Note that you can save this Import Report as an HTML file, so you can subsequently open the fully-formatted report in Excel or any other spreadsheet software.
To illustrate all related workflows, we use an American auto buyer satisfaction survey containing 42,397 responses. Each record contains attributes of the purchased vehicle, such as make (or brand), model, body style, vehicle segment, number of cylinders, transmission, price paid, self-reported fuel economy, plus hundreds of other variables.
First, we want to manually aggregate all 37 automobile brands that appear in the survey into just two states, i.e., Premium Brands and Non-Premium Brands.
This manual aggregation will be based exclusively on our subjective perception of the auto industry as of 2009, which is when this particular survey was conducted.
Click on the Brand variable in the Data panel.
From the States list on the left, select the values you wish to aggregate using Shift+Click or Ctrl+Click.
Then, click the Aggregate button.
The newly-formed, aggregated state appears in the Aggregates list on the right.
By default, the original values are concatenated using the "+" symbol as a delimiter. An underscore "_" is added as a prefix.
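For example, aggregating the states BMW and Audi (hypothetical selections from this dataset) would produce the aggregated state name _BMW+Audi.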
As necessary, you can select more values from the States list and create additional aggregated states.
In the list of Aggregates, you can now replace the automatically-generated state names with more meaningful ones.
You can now proceed to any other variable or click Finish to conclude the Data Import Wizard.
Continuing with the previous example, we now perform an aggregation of the same variable, Brand. Now, however, we use each brand's correlation with Price as a guide instead of our judgment.
For the purpose of this demonstration, we have already discretized the Price variable manually into three (arbitrary) intervals using two thresholds, i.e., $25,000 and $45,000.
We now want to use the correlation of each brand with the top interval, i.e., $45,000+, as a measure of its "premium appeal" so that we can reduce the 37 brands into three states, Mainstream, Premium, and Luxury.
For reference, 8.65% of all survey responses reported a vehicle purchase price of $45,000 or higher.
Click on the Brand variable in the Data panel.
Click the Show Correlations box.
Select Target and State.
Review the values shown in the Correlations column. By hovering with your cursor over the Correlation bars in each row, a Tooltip displays the percentage difference of the corresponding row versus the marginal value.
The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.
Select the states to aggregate using Ctrl+Click.
Once you have selected the values, click the Aggregate button.
The newly aggregated values now appear as a single item in the Aggregates list.
Review the newly aggregated states and, if necessary, assign new names to replace the ones that were generated automatically.
To reverse the aggregation, select the aggregated items in the Aggregates list and click Delete.
The principal difference from the manual workflow is that you don't select the to-be-aggregated values manually but rather specify thresholds that determine the aggregation.
Click on a Discrete variable in the Data panel.
Click the Show Correlations box.
Select Target and State.
Review the values shown in the Correlations column. By hovering with your cursor over the Correlation bars in each row, a Tooltip displays the percentage difference of the corresponding row versus the marginal value.
The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.
Now, instead of manually selecting the values you want to aggregate, click the Automatic Aggregation button.
The Automatic Aggregation window opens up.
The colored bar at the top visualizes the percentage differences versus the marginal probability of the selected state of the target.
In our example, there is one brand, Mercury, which had no observations in the $45,000+ interval. As a result, it marks the bottom end of the spectrum, i.e., it is 8.65 percentage points below the marginal probability.
On the other end of the spectrum, Porsche is 83.97 percentage points above the marginal probability.
A default threshold is shown at 0, marked by the pink-to-red color change in the bar.
You can manually add thresholds by right-clicking on the bar.
As soon as you add a threshold, a corresponding entry appears in the list below.
Right-clicking again on an existing threshold removes that threshold.
You can move an existing threshold by clicking on it and then dragging it to the desired value.
Also, in the table below the colored bar, you can type in a threshold value.
By clicking OK, you confirm the specified thresholds, and all values in the States list will be aggregated accordingly.
Alternatively, you can click on Generate Aggregates and specify the desired number of intervals.
You obtain a set of aggregation thresholds, which you can further modify or accept by clicking OK.
Now you have a new set of states in the list of Aggregates.
After you click Finish in the Data Import Wizard, progress bars inform you about the status of the following steps:
Data Discretization:
Dataset Creation:
Missing Values Estimation:
Once completed, BayesiaLab opens up a new Graph Window with all imported variables now represented as nodes.
Note that this report is entirely optional. Whether or not you display it does not affect the completion of the Data Import Wizard.
Individual variables can be aggregated manually or automatically in the Data Import Wizard.
In addition to the manual aggregation described above, BayesiaLab can support you in making aggregation decisions. For this purpose, BayesiaLab can show how the original values of the to-be-aggregated variable correlate with those of other variables.
The Correlation-Aided Automatic Aggregation is very similar to the correlation-aided manual aggregation described above. So, the initial steps are analogous to the manual workflow.
To associate a Dictionary for Arc properties, select
Main Menu > Data > Associate Dictionary > Arc >
and then select the property from the submenu:
Arcs
Specifies the addition or removal of arcs for the currently active Bayesian network. If an arc removal is specified, it will precede any addition of an arc.
Before adding arcs, any constraints applicable to the active Bayesian network and the Temporal Indices will be checked. If a specified arc addition is inconsistent with the existing constraint, the arc won't be added.
Syntax Examples:
N1->N2=true adds an arc from N1 to N2.
N1->N2=false removes the arc from N1 to N2.
N1<-N2=true adds an arc from N2 to N1 (the reversed arrow symbol <- produces an arc in the opposite direction).
Note that you need to add an escape character \ before any spaces in node names. Otherwise, a space will be interpreted as a delimiter:
N\ 1->Node\ 2=true adds an arc from N 1 to Node 2.
Instead of the -> characters, you can also use a space, the equal sign =, or -- as the delimiter between the start node and the end node. With these alternative delimiters, the order of the nodes determines the arc direction.
N1 N2=true adds an arc from N1 to N2.
N1=N2=true adds an arc from N1 to N2.
N1--N2=true adds an arc from N1 to N2.
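For illustration, a minimal Arcs dictionary file could combine several such entries; N1, N2, and N3 are placeholder node names:
N1->N2=true
N2->N3=true
N1->N3=false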
Forbidden Arcs
Specifies the addition or removal of Forbidden Arcs between nodes and classes
Syntax Examples:
N1->N2 adds a Forbidden Arc from N1 to N2.
N1--N2 adds a Forbidden Arc between N1 and N2.
ClassA->ClassB applies Forbidden Arcs from any nodes in ClassA to any nodes in ClassB.
N1 N2 removes any existing Forbidden Arc between N1 and N2. Note the space in the syntax, which triggers the removal of the Forbidden Arc.
Arc Comments
Adds, updates, or removes Arc Comments on arcs in the active network. Arc Comments are stored in HTML format.
Syntax Examples:
N1->N2=<p>This is a sample <b>Arc Comment</b>.</p> adds an Arc Comment to the arc between N1 and N2.
N1->N2= removes an existing Arc Comment from the arc between N1 and N2.
The added Arc Comment can be edited in the Arc Editor: Arc Contextual Menu > Edit
Arc Colors
Defines colors for arcs in the active network. You can specify the color for each arc individually by providing the color's hex code.
Syntax Examples:
N1->N2=000000 changes the color of the arc between N1 and N2 to black.
N1->N2=FF0000 changes the color of the arc between N1 and N2 to bright red.
Note that there is no option to revert an arc color to the default color. When changing Arc Colors via a Dictionary, the colors must always be specified explicitly.
Structural Priors
Assigns Structural Priors to arcs in the active network.
Fixed Arcs
Applies Fixed Arcs to the active network or removes them.
Syntax Examples:
N1->N2=true changes the arc between N1 and N2 to a Fixed Arc.
N1->N2=false changes the arc between N1 and N2 back to a normal, non-fixed arc.
Dictionaries offer a convenient way to manage a large set of properties related to a Bayesian network using text files with a human-readable syntax.
Dictionaries are plain text files that can be opened and edited outside of BayesiaLab in any text editor.
Using Dictionaries, you can export the properties of a given network or associate properties that you previously saved.
Dictionaries are specific to the elements of a Bayesian network, e.g., Arcs, Nodes, and States and their respective properties.
Dictionary File Structure
As indicated by the syntax, the name of a node, class, or state in the text file cannot contain equal, space, or tab characters. If node names contain such characters in the network, those characters must be preceded by a backslash (\) in the text file: for example, the node named Visit Asia is written Visit\ Asia in the file.
To explicitly differentiate a name that is shared by a class, a node, or a state, add a suffix at the end of the name: "c" for a class, "n" for a node, and "s" for a state.
If your network contains non-ASCII characters, you must save your dictionaries with UTF-8 (Unicode) encoding. For example, in MS Excel, choose "Save As" and select "Unicode Text (*.txt)" as the file type; in Notepad, choose "Save As" and select "UTF-8" as the encoding. If your file contains only ASCII characters, you can keep the default platform-dependent encoding, but we strongly encourage UTF-8 (Unicode) encoding so that dictionary files do not depend on the user's platform. That way, for example, a Chinese dictionary can be read by a German user without any problem, regardless of the platforms used. If you are not sure how to save a file with UTF-8 encoding, export a dictionary with BayesiaLab, modify and save it with any text editor, and load it back into BayesiaLab.
BayesiaLab offers learning algorithms that allow you to generate a Bayesian network from data.
However, with a given Bayesian network, BayesiaLab can also generate data.
For this purpose, BayesiaLab draws samples from the Joint Probability Distribution encoded by the Bayesian network and saves the obtained samples as observations.
Select Main Menu > Data > Generate Data.
You can choose to generate the data as an internal database. You can also specify the rate of missing values and, if the database is written to a file, use the states' long names. In addition, you can generate a database with test examples by specifying the desired percentage.
State:
State Renaming: allows renaming each state of each node with a new name.
State Values: allows associating a numerical value with each state of each node.
State Long Names: allows associating with each state of each node a long name that is more explicit than the default state name. This name can be used in the various database export formats, in the HTML reports, and in the Monitors.
Filtered States: allows defining one state of each node as a Filtered State.
Node Renaming: allows renaming each node with a new name. These new names must, of course, all be different.
Comments: allows associating a comment with each node that is in the file.
Classes: allows organizing nodes in subsets called classes. A node can belong to several classes at the same time. Classes make it possible to generalize some node properties to all nodes belonging to the same class. They also allow creating constraints on arc creation during learning.
Colors: allows associating colors with the nodes or classes that are in the file. Colors are written as Red-Green-Blue values with 8 bits per channel in hexadecimal (web) format: for example, red is 255 red, 0 green, 0 blue, which gives FF0000. Green gives 00FF00, yellow gives FFFF00, etc.
Images: allows associating images with the nodes or classes that are in the file. Images are represented by their paths relative to the directory containing the dictionary.
Costs: allows associating a cost with each node. A node without a cost is considered Not Observable.
Temporal Indices: allows associating temporal indices with the nodes that are in the file. These temporal indices are used by BayesiaLab's learning algorithms to take into account constraints on the probabilistic relations, such as not adding arcs from future nodes to past nodes. The rule used to add an arc from node N1 to node N2 is:
If the temporal index of N1 is positive or null, the arc from N1 to N2 is only possible if the temporal index of N2 is greater than or equal to the index of N1. For example, if N1 has index 2 and N2 has index 1, an arc from N1 to N2 is not allowed.
Local Structural Coefficients: allows setting the local structural coefficient of each specified node or each node of each specified class.
State Virtual Numbers: allows setting the state virtual number of each specified node or each node of each specified class.
Locations: allows setting the position of each node.
Dictionary File Structures
The states' long names can be saved instead of the state names. If the user wants to save continuous values, numerical values are generated randomly within each interval. If the data are generated in Validation Mode, the current evidence is taken into account.
Arc
Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, true for an added arc or false for a removed arc. The last occurrence is always chosen.
Forbidden Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class.
Comments
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, comment. The comment can be any single-line character string (in HTML or not). The last occurrence is always chosen.
Colors
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, color. The color is defined as a Red-Green-Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. The last occurrence is always chosen.
Fixed Arcs
Name of the arc's starting node or class, -> , <- , or -- (to indicate both possible orientations), name of the arc's ending node or class, Equal, Space, or Tab, true for a fixed arc or false for a non-fixed arc. The last occurrence is always chosen.
Node
Node Renaming
Name of the node, Equal, Space, or Tab, new node name. The new name must be valid (different from t or T and without a question mark). A node should be present only once; otherwise, the last occurrence is chosen.
Comments
Name of the node or class, Equal, Space, or Tab, comment. The comment can be any single-line character string (in HTML or not). A node should be present only once; otherwise, the last occurrence is chosen.
Classes
Name of the node, Equal, Space, or Tab, name of the class. The class name can be any character string. A node present several times will be associated with several classes.
Colors
Name of a node or class, Equal, Space, or Tab, color. The color is defined as a Red-Green-Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green gives 00FF00, yellow gives FFFF00, blue gives 0000FF, pink gives FFC0FF, etc. A node should be present only once; otherwise, the last occurrence is chosen.
Images
Name of a node or class, Equal, Space, or Tab, path to the image relative to the directory containing the dictionary. The image path must be a valid relative path or an empty string. A node should be present only once; otherwise, the last occurrence is chosen.
Costs
Name of the node, Equal, Space, or Tab, value of the cost, or empty to make the node Not Observable. The cost is an empty string or a real number greater than or equal to 1. A node should be present only once; otherwise, the last occurrence is chosen.
Temporal Indices
Name of the node, Equal, Space, or Tab, value of the index, or empty to delete an existing index. The index is an integer. A node should be present only once; otherwise, the last occurrence is chosen.
Local Structural Coefficients
Name of the node, Equal, Space, or Tab, value of the local structural coefficient, or empty to reset it to the default value of 1. The local structural coefficient is an empty string or a real number greater than 0. A node should be present only once; otherwise, the last occurrence is chosen.
State Virtual Numbers
Name of the node, Equal, Space, or Tab, virtual number of states, or empty to delete an existing number. The state virtual number is an empty string or an integer greater than or equal to 2. A node should be present only once; otherwise, the last occurrence is chosen.
Locations
Name of the node, Equal, Space, or Tab, position. The position is represented by two real numbers separated by a Space: the first number represents the node's x-coordinate, and the second its y-coordinate. A node should be present only once; otherwise, the last occurrence is chosen.
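As a minimal sketch with hypothetical node names and coordinates, a Locations dictionary could contain:
N1=120.0 85.5
N2=240.0 85.5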
State
State Renaming
Name of the node or class, dot (.), name of the state, Equal, Space, or Tab, new state name. Alternatively, to rename the state for all nodes: state name, Equal, Space, or Tab, new state name. The new name must be a valid state name. A state should be present only once; otherwise, the last occurrence is chosen.
State Values
Name of the node or class, dot (.), name of the state, Space or Tab, real value. Alternatively, to associate a value with a state regardless of the node: state name, Equal, Space, or Tab, real value. The value is a real number. A state should be present only once; otherwise, the last occurrence is chosen.
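For illustration, assuming a hypothetical node N1 with states Low and High, a State Values dictionary could contain:
N1.Low 0
N1.High 1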
State Long Names
Name of the node or class, dot (.), name of the state, Equal, Space, or Tab, long name. Alternatively, to associate a long name with a state regardless of the node: state name, Equal, Space, or Tab, long name. The long name is a character string. A state should be present only once; otherwise, the last occurrence is chosen.
Filtered States
Name of the node or class, dot (.), name of the filtered state. Alternatively, to set the filter property for the state regardless of the node: name of the filtered state. A state should be present only once; otherwise, the last occurrence is chosen.
When working with multiple networks that contain the same nodes, or at least some of the same nodes, it can be useful to share an Evidence Scenario File between them as well.
For that purpose, you can export an Evidence Scenario File and subsequently associate it with another network or use it with a WebSimulator.
Main Menu > Data > Evidence Scenario File > Export.
Then choose a file name and click Save.
The Evidence Scenario File is now saved in a human-readable and easily editable text format.
This allows you to modify the Evidence Scenario File with a text editor, e.g., to add a number of new Evidence Scenarios.
Please see the sub-topic Evidence Scenario File Syntax for a detailed discussion of the format.
In BayesiaLab, you can manage sets of actual or potential observations in a Bayesian network using Evidence Scenario Files.
For instance, an Evidence Scenario File can serve as a convenient way to manage multiple sets of assumptions, such as what-if scenarios. This is particularly helpful when scenarios contain many individual assumptions. Imagine the business case of an airline represented as a Bayesian network. It would have to include assumptions regarding travel demand for all origin-destination pairs. Manually setting and modifying assumptions for hundreds of nodes would not be practical.
An Evidence Scenario File consists of one or more Evidence Scenarios.
And, each Evidence Scenario contains one or more node-specific observations, as illustrated below:
Applying an Evidence Scenario means setting the stored pieces of evidence to the corresponding nodes.
Note that evidence cannot be set on Not Observable Nodes, i.e., nodes that have a Cost of 0 (see Cost Management).
With a given Bayesian network, any current observation on a node or sets of observations set on multiple nodes can be recorded as an Evidence Scenario. As soon as you store an Evidence Scenario, BayesiaLab "starts a tab" by creating an internal Evidence Scenario File.
Four types of evidence can be saved as an Evidence Scenario:
Hard Evidence
Likelihood Evidence
Probabilistic Evidence
Numerical Evidence
To learn more about setting evidence, please see the section on Setting Evidence in Contextual Menu of Monitors.
Then, enter an optional comment in the pop-up window and assign a Weight to the Evidence Scenario you are storing. If you don't enter a comment, the Evidence Scenario will merely be indexed sequentially, starting with 0.
Click OK to confirm.
You can add further Evidence Scenarios to the ones already stored in the internal Evidence Scenario File.
Upon selecting (and therefore applying) an Evidence Scenario, the corresponding comment, if available, appears in the Status Bar.
Note that an Evidence Scenario File is saved with the Bayesian network file. So, reopening the saved network makes all stored Evidence Scenarios available again.
In addition to recalling Evidence Scenarios one by one, you can also use them in BayesiaLab batch-processing functions:
Batch Labeling
Batch Inference
Batch Joint Probability
Batch Outlier Explanation
In this context, the Evidence Scenario File provides the observations in the same way as an internal or external dataset.
To store an observation as an Evidence Scenario, click the icon.
Now, an additional icon in the Status Bar indicates that there is an Evidence Scenario File.
Right-clicking the icon in the Status Bar brings up the list of stored Evidence Scenarios, enumerated by an index and, if available, with corresponding comments.
So, the next time you click the icon, the pop-up window asks whether you want to append the new Evidence Scenario to the list or replace a particular existing Evidence Scenario.
To apply (or recall) a stored Evidence Scenario, right-click on the Evidence Scenario File icon in the Status Bar and click on the scenario you want to apply to the network.
Also, hovering over the Evidence Scenario File icon with your pointer brings up the number of available Evidence Scenarios.
You can remove the current Evidence Scenario File by left-clicking on the icon.
As with BayesiaLab's Dictionaries, the syntax of an Evidence Scenario File is straightforward. However, we need to distinguish between the syntax for Contemporaneous and Temporal networks:
Each line of an Evidence Scenario File represents one Evidence Scenario.
Encoding an Evidence Scenario always follows the same pattern, with the node name and the evidence separated by a colon (:). The optional scenario name follows after a double slash (//).
?<NodeName>?:<Evidence>//<ScenarioName>
Evidence can be encoded in several ways in an Evidence Scenario File:
Hard Evidence:
?<NodeA>?:<State1>//Scenario1
Numerical Evidence:
?<NodeB>?:m{<value>}//Scenario2
Probabilistic Evidence:
?<NodeC>?:p{<StateA>:0.3;<StateB>:0.5;<StateC>:0.2}//Scenario3
Likelihood Evidence:
?<NodeD>?:l{<StateX>:1;<StateY>:0.5}//Scenario4
To encode multiple pieces of evidence in one Evidence Scenario, simply separate the individual pieces of evidence with a semicolon. The scenario name remains at the end of the line, separated by a double slash.
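For illustration, a single scenario combining Hard and Numerical Evidence on two placeholder nodes could read:
?<NodeA>?:<State1>;?<NodeB>?:m{<value>}//Scenario5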
For Temporal Bayesian networks, the syntax of the Evidence Scenario File is slightly different. Here, each line in the text file refers to a time step, in which the evidence specified in that line will be applied.
Each line starts with an integer value that represents the time step, in which the evidence of that line will be set.
Evidence can be encoded in several ways in an Evidence Scenario File:
To encode multiple pieces of evidence in one Time Step, simply separate the individual pieces of evidence with a semicolon.
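As a sketch only, assuming the time step integer is separated from the evidence by a space, a Temporal Evidence Scenario File could contain:
0 ?<NodeA>?:<State1>
1 ?<NodeA>?:<State2>;?<NodeB>?:m{<value>}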
For Temporal networks, recalling evidence from the Evidence Scenario File is different compared to Contemporaneous networks.
Now, the time-specific Evidence Scenarios will be set automatically as you perform a temporal simulation.
A saved Evidence Scenario File can be reimported into the network from where it originated (e.g., after external modification, see Evidence Scenario File Syntax) or it can be loaded into an entirely different network file.
Main Menu > Data > Evidence Scenario File > Associate.
If the newly-associated Evidence Scenario File contains incompatible content, e.g., nonexistent nodes in the network, BayesiaLab shows a corresponding error message:
However, the remaining, compatible content will be available in the now-attached Evidence Scenario File.
In Validation Mode, you can perform network validation, simulation, and analysis.
In Validation Mode, both the Graph Panel and the Monitor Panel are displayed in the Graph Window.
There are several ways to switch to Validation Mode:
Press the shortcut F5.
Select Main Menu > View > Validation Mode.
In Modeling Mode, you can conduct all modeling activities, such as learning and editing network graphs.
In Modeling Mode, only the Graph Panel is visible and accessible inside the Graph Window, i.e., the Graph Panel fills the Graph Window entirely.
There are several ways to switch to Modeling Mode:
Press the shortcut F4.
Select Main Menu > View > Modeling Mode.
The Graph Window can only be in one of two possible modes, i.e., Modeling Mode and Validation Mode.
Click the icon in the lower-left corner of the Graph Panel.
In any workflow with BayesiaLab, switching between Modeling Mode and Validation Mode is very frequent. Hence, we highly recommend that new users start using the F4 and F5 shortcuts straight away.
Hellixia is the name of BayesiaLab's subject matter assistant powered by ChatGPT. Hellixia offers a wide range of functions to help you characterize a given problem domain:
Identify relevant dimensions of a problem domain
Extract dimensions from a text
Generate embeddings for learning a semantic network
Generate meaningful descriptions for classes of nodes
Provide tools for causal analysis
Translate names and comments of nodes into different languages
Generate images to be associated with nodes
BayesiaLab integrates functionality provided by OpenAI's ChatGPT, a machine learning-based service for compiling human knowledge obtained from the Internet. However, Bayesia* and its affiliates are not affiliated with OpenAI.
Bayesia* makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the ChatGPT feature. Therefore, any reliance on such information is strictly at your own risk.
In no event will Bayesia* be liable for any loss or damage, including indirect or consequential loss or damage, arising out of, or in connection with, the use of ChatGPT through BayesiaLab.
Please note that the responses generated by ChatGPT are created by a machine-learning model and do not reflect the opinions or policies of Bayesia*.
ChatGPT may sometimes produce inappropriate or offensive content. While OpenAI states that mechanisms exist in ChatGPT to reduce such occurrences, Bayesia* has no control over the delivery of such content and cannot prevent such instances.
*References to "Bayesia" include Bayesia S.A.S. and its affiliates Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd.
To utilize the Hellixia functions, BayesiaLab must connect to the OpenAI API using a personal API Key.
OpenAI is a third-party service that can be accessed through BayesiaLab; however, it is not part of the BayesiaLab software. As a result, Bayesia makes no representations regarding this service.
A subscription fee payable to OpenAI may be required to obtain your personal API Key.
Obtain your personal API key from the OpenAI website.
Once you have obtained your API Key, enter it into your locally-installed BayesiaLab software under Main Menu > Windows > Preferences > Tools > OpenAI.
If you want to utilize an alternative to OpenAI, you can deploy models in your own Microsoft Azure account. The process involves creating endpoints.
The URL is structured as follows: https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
In this URL:
{your-resource-name}
should be replaced with the name of your Azure OpenAI resource.
{deployment-id}
should be replaced with the ID of the specific deployment.
{api-version}
should be replaced with the version of the API you're using. This follows the YYYY-MM-DD format.
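For illustration only, with hypothetical resource and deployment names (and an assumed API version; use the version that applies to your account), a complete endpoint could look like this:
https://my-resource.openai.azure.com/openai/deployments/my-gpt4-deployment/chat/completions?api-version=2023-05-15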
If you're operating behind a proxy that enforces SSL rewriting or redirection, you might encounter the following error message:
'PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.'
If you encounter this issue, it will be necessary to point BayesiaLab towards the truststore, where the approved certificates are kept.
Go to Main Menu > Windows > Preferences > General.
Click on the folder icon to locate and select the BayesiaLab.cfg file.
Navigate to the end of the file, where you'll locate the [JavaOptions] section.
If you're using Windows, you should add the following two lines:
java-options=-Djavax.net.ssl.trustStoreType=Windows-ROOT
java-options=-Djavax.net.ssl.trustStore=NUL
For MacOSX users, instead, add:
java-options=-Djavax.net.ssl.trustStoreType=KeychainStore
java-options=-Djavax.net.ssl.trustStore=/dev/null
After these changes, save the file and then restart BayesiaLab for the updates to take effect.
To illustrate Semantic Text Analysis, we selected Dr. Martin Luther King's famous speech, I Have a Dream:
To start the process, we open a new graph and add a single node.
By default, the name of the new node is N1. However, we can change the name to a more descriptive title, e.g., "I Have a Dream."
This node will host the content we wish to analyze.
We now need to enter the speech as a Node Comment.
From the Node Contextual Menu, select Edit, then select the Comment tab.
Now paste the speech into the text field.
Note that the Node Comment can accommodate any text length, whereas the Node Name and the Node Long Name are limited.
This icon indicates that a Node Comment is associated with this node.
Select the node of interest, which is I Have a Dream.
Select Main Menu > Hellixia > Dimension Elicitor.
The Dimension Elicitor window opens up, in which we need to specify several settings:
Question Settings
Keyword: We select Causes, Achievements, and Objectives from the list of Keywords.
Groups: With Groups, we can bundle several keywords so they can easily be retrieved later when the analysis needs to be repeated. We name our group of keywords "Civil Rights."
Responses per Keyword specifies the maximum number of items to be retrieved per Keyword.
Exclude Duplicates automatically removes duplicates from the list of results. This is helpful as the query can produce identical Dimensions in the context of different Keywords.
Completion Model: From the drop-down menu, the following models are available:
GPT_35_TURBO
GPT_35_TURBO_16K
GPT_4
Context
Knowledge File: This text file allows you to specify a broader context for a query. For example, you might embed chunks of documents related to your domain of study into a dataset. Then, you can identify and use the chunks with embeddings closest to that of your query to construct your knowledge file.
General Context: Checking the box and entering a heading provides relevant context. In our example, we use the title "Civil Rights Movement."
Subject of the Query
Checkboxes for Node Name, Node Long Name, and Node Comment are available.
In this example, however, the relevant subject is only stored in the Node Comment, i.e., the entire speech, I Have a Dream.
Options
By checking Create a Class per Keyword, BayesiaLab assigns all newly-discovered dimensions to new BayesiaLab Classes.
Submit Query
Clicking the Submit Query button starts the query.
Upon completion of the query, the table at the bottom of the window lists all discovered dimensions and provides a corresponding comment.
The checkboxes at the end of each row allow you to select whether or not to keep the found Dimensions and add them to the Graph Panel. This allows you to override the default selection of all Dimensions.
Click OK to add the dimensions to the Graph Panel.
The Dimensions are now shown as nodes on the Graph Panel.
Furthermore, if you select the option Create a Class per Keyword, the Dimension nodes are grouped based on their associated Keyword. Additionally, a Note is added to visually group each set of nodes that corresponds to a particular Keyword/Dimension.
In machine learning and Natural Language Processing (NLP), embedding is a mathematical representation of a token, word, phrase, sentence, or any other linguistic unit with a continuous high-dimensional vector. Word embeddings, in particular, are widely used representations that capture the semantic and syntactic properties of words.
The embeddings used by Hellixia have 1,536 dimensions and allow capturing the semantics of the linguistic units defined by the nodes (names, long names, comments).
To demonstrate the workflow for generating embeddings, we start with a set of 54 nodes representing a selection of influential 19th and 20th-century painters.
Go to Main Menu > Hellixia > Embedding Generator.
Select one or more Input Types from the Hellixia Embedding Generator Window, i.e., Node Name, Node Long Name, and Node Comment. In the example, only Node Names are defined, so that is the only Input Type you need to select.
Click OK.
Each node now has 1,536 observations, which is indicated by the Tooltip associated with the database icon.
A semantic network is a graphical representation of knowledge or concepts organized in a network-like structure. It is a form of knowledge representation that depicts how different concepts or entities are related to each other through meaningful connections.
In a semantic network, concepts are represented as nodes, and their relationships are depicted as labeled links or arcs. These links indicate the connections or associations between the concepts, such as hierarchical, associative, or causal relationships.
With the embeddings now stored as observations, we can machine-learn a semantic network.
For this purpose, we use one of BayesiaLab's Unsupervised Learning algorithms.
The Maximum Weight Spanning Tree (MWST) is the best choice in this context. The algorithm is quick and renders an easily interpretable network.
After the learning is completed, the resulting network appears in the following screenshot:
We can apply one of BayesiaLab's layout algorithms to interpret this graph more easily.
For instance, select Main Menu > View > Layout > Symmetric Layout.
The resulting graph is shown outside the BayesiaLab window so that its structure can be viewed and interpreted more easily.
For further information, visit Microsoft's official documentation at
Semantic Text Analysis closely mirrors the process of identifying the dimensions of a particular subject (see Dimension Elicitor).
On the Toolbar, click on the Node icon , and place a new node on the Graph Panel.
After entering the speech and closing the Node Editor, the Information icon appears next to the node name:
As the first formal step in the Semantic Text Analysis, we need to use the Dimension Elicitor again.
An Information icon is attached to each node. This means that the Comments generated by the Dimension Elicitor are stored as Node Comments.
In Modeling Mode, select the nodes on the Graph Panel for which you want to generate embeddings. In our example, we select all 54 nodes.
Upon retrieving the embeddings, the Main Window shows the database icon in the bottom right corner. This indicates that the embeddings are now attached as a dataset.
By default, the observations associated with each node are discretized into quintiles, which you can see by switching into Validation Mode and bringing up any of the Monitors.
From the Modeling Mode, select Main Menu > Learning > Unsupervised Structural Learning > Maximum Spanning Tree.
Hellixia's Comment Generator is similar to the Dimension Elicitor.
In the case of the Dimension Elicitor, Hellixia creates new nodes.
With the Comment Generator, Hellixia retrieves Dimension Names and the related Comments from ChatGPT and adds them automatically to the Node Comment.
Create a node representing the subject of interest, e.g., "Judea Pearl."
Move your pointer to the desired location to place your new node on the Graph Panel.
Give the node a meaningful name representing the subject to be studied, i.e., "Judea Pearl."
You can also add a Node Long Name and a Node Comment to provide more information.
Select the newly-created node, and then select Main Menu > Hellixia > Comment Generator, which brings up the Comment Generator window.
There is a range of settings you need to specify in the Comment Generator window:
Under Question Settings,
Specify the Keyword from the dropdown menu.
If needed, stipulate the maximum number of responses per Keyword.
Select the Completion Model from the dropdown menu, e.g., GPT_35 or GPT_4.
Under Context,
Open a Knowledge File, if available.
Provide a General Context for the query. In our example, use "Artificial Intelligence."
Under Main Subject of the Query, select all fields that contain relevant information for the query, i.e., Node Name, Node Long Name, and Node Comment. Check all that apply. Both the Node Long Names and Node Comments are optional properties. If they're selected but not defined for a given node, Hellixia will use the Node Name by default.
Click Submit Query, and Hellixia retrieves the responses from ChatGPT and lists them in a table at the bottom of the Comment Generator window.
The Subject Node column displays the main subject of the query.
The Keyword column lists the keyword used for the Dimension retrieved in that row.
The Index column assigns an index to each Dimension retrieved for a Keyword.
The Comment column further describes the Dimension retrieved.
The Keep column indicates which Keyword/Dimensions row to keep.
Under Output Settings, specify what part of the results table will be added to the Node Comment.
By checking Dimension Name and Comment as well as Concatenate Output to Current Comment, you obtain a Comment like the one shown, which you can view in the Node Editor.
The first step in formulating a new Bayesian network about a problem domain is typically defining the dimensions of that domain. This would also be the first step in the BEKEE workflow (see Bayesia Expert Knowledge Elicitation Environment (BEKEE)).
Depending on your familiarity with the field of study, exploring a subject's facets and aspects may require a significant brainstorming effort. The Hellixia Dimension Elicitor assists by querying ChatGPT and proposing a list of dimensions.
To illustrate the Dimension Elicitor, we want to discover the dimensions related to the concept of "Bayesian Belief Networks."
Create a node representing the subject of interest, e.g., "Bayesian Belief Networks."
Move your pointer to the desired location to place your new node on the Graph Panel.
Give the node a meaningful name representing the subject to be studied, i.e., "Bayesian Belief Networks."
You can also add a Long Name and a Node Comment to provide more information.
Select the newly-created node, and then select Main Menu > Hellixia > Dimension Elicitor, which brings up the Dimension Elicitor Window.
In the Question Settings of the Dimension Elicitor Window, specify the keywords to be investigated. The list offers 145 keywords that Hellixia can use to query ChatGPT.
Select Advantages, Characteristics, Components, Contributions, Dimensions, and Strengths as Keywords to follow our example.
Responses per Keyword specifies the maximum number of items to be retrieved per keyword.
Exclude Duplicates automatically removes duplicates from the list of results. This is helpful as the query can produce identical Dimensions in the context of different Keywords.
Depending on your OpenAI account and available resources, you can select the appropriate Completion Model from the dropdown menu, e.g., GPT-3.5 or GPT-4.
You can provide additional context by submitting a Knowledge File.
This text file allows you to specify a broader context for a query.
For example, you might embed chunks of documents related to your domain of study into a dataset.
Then, you can identify and use the chunks with embeddings closest to that of your query to construct your Knowledge File.
You can also provide a General Context for the query, e.g., "Artificial Intelligence."
The Main Subject of the Query is determined by the selected nodes.
You can use the Node Name, the Node Long Name, or the Node Comments.
Node Long Names and Node Comments have the advantage that they can include longer text and provide more information for the query.
Both the Node Long Names and Node Comments are optional properties of a node. If they are selected as a Main Subject for the Query but have no content, Hellixia will use the Node Name by default.
Click Submit Query to start the elicitation process.
Once the query is complete, a table at the bottom of the window shows the results.
The Subject Node column displays the Main Subject of the Query.
The Keyword column lists the keyword used for the dimension retrieved in that row.
The Index column assigns an index to each dimension retrieved for a Keyword.
The Comment column further describes the dimension retrieved. This comment will also be used as a Node Comment.
The Keep column indicates which Keyword/Dimensions row to keep. If you checked Exclude Duplicates, only unique Keyword/Dimension combinations will be kept.
However, you can modify the selection by checking and unchecking items in the Keep columns.
All Dimensions are added as nodes to the Graph Panel upon clicking OK.
If you select the option Create a Class per Keyword, the Dimension nodes are grouped by their associated Keyword. Additionally, a Note is added to visually group each set of nodes corresponding to a particular Keyword/Dimension.
To manage groups of nodes, BayesiaLab offers Classes.
Nodes can be added to Classes manually or automatically. For instance, the Variable Clustering function can assign nodes to new Classes representing latent factors. By default, newly-created Classes have generic names, such as [Factor_0], which carries no meaning.
Finding suitable descriptions for Classes can be time-consuming.
The Class Description function can assist you in finding meaningful summaries of a Class of nodes.
With the Hellixia Class Description Generator, we can quickly find a useful description for a subset of nodes we select.
In our example, we have a large number of nodes from an auto buyer satisfaction survey.
We are interested in a subset of nodes related to the quality perception of the vehicle interior, i.e.:
Interior Colors
Quality of Interior Materials
Interior Trim & Finish
Quality of Seat Materials
Select these nodes of interest.
Then select Main Menu > Hellixia > Class Description.
Specify a Context, if applicable.
Indicate by ticking the checkboxes where the subject matter is stored, i.e., Node Name, Node Long Name, or Node Comment. Check all that apply.
Clicking OK starts generating the Class Description.
The chime confirms when the process is complete.
Opening the Class Editor shows the Class Description that was generated.
Select Graph Contextual Menu > Edit Classes.
The Description column shows the newly-generated Class Description.
BayesiaLab's Clustering function produces new Factors and associated Classes.
So, having a dozen or more new Classes is quite common in this context.
By default, the newly-generated Classes have generic and non-informative names, like [Factor_0], [Factor_1], etc.
Given that the Factors and Classes are meant to represent meaningful concepts, naming them is important but can be tedious.
In the following example, 57 Factors (and Classes) were created from 240 manifest nodes. Each manifest node measures the degree of agreement or disagreement with statements in a personality test, such as, "I get angry easily" or "I remain calm under pressure."
These original statements are included as Node Comments with every node.
Semantic Variable Clustering groups nodes based on the semantics of their Node Names.
For this example, we use a list of 49 positive character traits.
All character traits are represented by nodes in an unconnected Bayesian network.
The nodes are named after character traits; no other information is available, e.g., in the Node Long Names or the Node Comments.
Select all nodes you wish to cluster.
To start the Semantic Variable Clustering, select Main Menu > Hellixia > Semantic Variable Clustering.
In the Semantic Variable Clustering window, you can specify the following items:
Your Completion Model, which depends on your OpenAI subscription
The Context that may apply to the nodes to be clustered
The Maximum Number of Clusters allows you to limit how many clusters are generated.
Clicking OK initiates Hellixia's communication with ChatGPT.
Upon completing the task, BayesiaLab presents the Semantic Variable Clustering Report in a new window.
To this day, no reliable methods exist to find causal relationships in data. Given a statistical association between two variables, it is impossible, based on data alone, to establish which variable is the cause and which is the effect.
As a result, acquiring additional external information, such as human expert knowledge or the temporal order of the variables, remains necessary to determine the causal direction in bivariate relationships.
With ChatGPT, it is now possible to let BayesiaLab tap into external domain knowledge. BayesiaLab's Hellixia can ask ChatGPT about the causal relationship between two nodes.
Select two nodes of interest, e.g., Smoking and Lung Cancer.
Select Main Menu > Hellixia > Causality Search.
In the Causality Search Window:
Specify the Completion Model.
Provide any applicable context to the Context field.
Check which fields contain the subjects under study, e.g., Node Name, Node Long Name, and Node Comment.
Click OK to launch the search.
If ChatGPT believes a causal relationship exists, BayesiaLab adds a corresponding arc.
Clicking Export produces a so-called Structural Prior Dictionary, which is a text file containing all arc attributes, i.e.,
Start and End of arc
Structural Prior for each arc
Arc Comment, which, in this context, contains the Explanation for the causal directions as obtained from ChatGPT.
We can now use this Structural Prior Dictionary as an Arc Dictionary and replace the original, machine-learned arcs with the ChatGPT-informed causal arcs.
First, select Graph Panel Contextual Menu > Delete All Arcs to remove all existing arcs.
Then, select Main Menu > Data > Associate Dictionary > Arc > Arcs.
The network now features the causal arc directions as obtained from ChatGPT.
With the final arc directions now in place, we should arrange the nodes into a more intuitive layout, i.e., positioning parent nodes above child nodes.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
The network now displays the correct causal order of nodes and arcs.
The Causal Structural Priors function extends this concept to more than two nodes.
We illustrate the Causal Structural Priors workflow with the well-known "Visit Asia" example from the domain of lung diseases.
We have a synthetic dataset from this domain, which has already been imported into BayesiaLab.
So, our starting point is an unconnected network, as shown in the following screenshot.
For instance, the node Smoking has an associated Node Comment that says, "The patient is a regular smoker."
Our objective is to find the causal relationships between risk factors, conditions, symptoms, and diagnostic imaging.
However, we know that machine learning alone cannot discover the true causal structure of this domain.
We begin with machine learning the associations between all nodes anyway and use the Unsupervised EQ learning algorithm for that purpose.
This newly-learned Bayesian network features directed arcs, but they clearly cannot be interpreted as causal, e.g., Smoking could not possibly be a cause of Age.
Applying the Genetic Grid layout highlights the implausibility of the arc directions.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
In the past, we would have had to use any available domain knowledge from experts to correct the arc directions.
With Hellixia, however, we can tap into the domain knowledge available via ChatGPT.
So, select all arcs and then select Main Menu > Hellixia > Causal Structural Priors.
In the Causal Structural Priors window, you need to specify a number of items:
Under Completion Model, choose a model for which you have a subscription, e.g., GPT_35 or GPT_4.
You can specify a General Context of the problem domain. In this example, "Lung Diseases" would be appropriate.
Under Subject of the Query, check all fields that contain information regarding the subject matter. We have information in the Node Name and the Node Comment in the example.
Clicking OK starts the search for causal relationships via ChatGPT. The progress bar at the bottom of the Graph Panel shows the search status.
A chime marks the completion of the search.
The resulting table displays the causal arc directions obtained from ChatGPT in its three left columns.
The reason for the arc orientation is provided in the Explanation column.
Clicking Preview opens a window showing a simplified view of the causal arc directions proposed by ChatGPT.
Now, there are two ways to proceed, as illustrated in the following workflows 1 and 2.
Select Toolbar > Node Creation Mode
These newly-created clusters are now represented as Classes, indicated by the Classes icon .
Furthermore, BayesiaLab adds an Arc Comment with any contextual information ChatGPT provides. The Arc Comment icon indicates that such a comment was added.
Clicking the Show Arc Comment button in the Toolbar displays the comment.
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
Clicking the Show Arc Comment button in the Toolbar displays the comments on the arcs. The Arc Comments show the explanations for the causal directions retrieved from ChatGPT.
With the Causality Search function, Hellixia allows you to retrieve domain knowledge from ChatGPT about a potential causal relationship between two nodes.
In addition to the descriptive and self-explanatory node names, Comments are associated with each node, as indicated by the information icon .
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
Furthermore, the Structural Priors icon appears in the bottom-right corner of the Graph Panel.
To view the Structural Priors obtained from ChatGPT, you can click on the Structural Priors icon or select Graph Panel Contextual Menu > Edit Structural Priors.
The final column, Check, indicates whether or not the causal direction matches the current orientation.
In addition to utilizing ChatGPT, BayesiaLab's Hellixia subject matter assistant also employs DALL-E.
DALL-E is a variant of the GPT model designed to generate images from textual descriptions.
This functionality is useful for creating small images that visualize what the node represents.
To use the Image Generator, select the nodes for which you want an image produced.
Select Main Menu > Hellixia > Image Generator.
In the Image Generator window, specify the fields that contain the subjects, i.e., the textual descriptions of the images to be generated. Check all that apply.
Under Context, you can state the overall domain of the image subjects, if applicable.
In Workflow 1, we exported a Structural Prior Dictionary, including the Causal Structural Priors, and then imported this dictionary as an Arc Dictionary to create a causal network with these priors.
In this Workflow 2, we will immediately utilize the Causal Structural Priors to machine-learn a new network, without the export/import step.
However, these new Causal Structural Priors have not yet been used to update the arc directions in the network.
Select Main Menu > Learning > Unsupervised Structural Learning > Taboo.
Like Arc Constraints, Structural Priors, Temporal Indices, and Filtered States, Causal Structural Priors impose constraints on learning. As a result, EQ-based algorithms are not available under those conditions.
This newly learned network now reflects the causal order obtained from ChatGPT.
With the final arc directions in place, we should arrange the nodes into a more intuitive layout, i.e., positioning parent nodes above child nodes.
Select Main Menu > View > Layout > Genetic Grid Layout > Top-Down Repartition.
So, our starting point is the machine-learned network, for which Hellixia has already obtained the Causal Structural Priors. The Structural Prior icon indicates that Structural Priors are associated with the network.
Note that the algorithm keeps searching for a better layout until you stop the process by clicking the red button to the left of the Progress Bar.
The Hellixia Node Translator is powered by ChatGPT and DeepL.
It allows you to easily translate the Node Names, the State Names, and any Node Comments into another language.
We use an unconnected network featuring 240 statements that all relate to personality and character traits, e.g., "I get angry easily" or "I smile a lot".
These statements are contained in the Node Names.
The Node Names are in English, and we want to translate them into German.
Select all nodes to be translated.
Then, select Main Menu > Hellixia > Node Translator.
In the Node Translator window, you can pick the target language from the dropdown menu.
You can also specify the Translator Model, e.g., GPT-3.5, GPT-4, or DeepL.
Finally, select which Node Properties should be translated, e.g., Node Names, State Names, or Node Comments. Check all that apply.
Clicking OK starts the translation process.
Once the process has concluded, all node names appear in German.
In this second installment, we examine another passage from Montaigne's 'Essais': 'Of Liars' (Book I).
This is a formidable challenge for Hellixia, given that it relies on a translation of Montaigne's 16th-century French.
When they disguise and change, when they are often put back on the same story, it is difficult for them not to make mistakes, because the thing as it is, having lodged itself first in memory and having been imprinted there by way of knowledge and science, it is difficult for it not to be represented in the imagination by dislodging the falsehood, which cannot have as firm and steady a foothold, and for the circumstances of the first learning not to cause the memory of the added, false or bastardized pieces to be lost. In what they invent completely, because there is no contrary impression that contradicts their falsehood, they seem to have all the less to fear to make mistakes. However, this fiction, because it is a vain and ungraspable body, readily escapes memory if it is not well secured. If, like truth, lies had only one face, we would be in a better position, for we would take the opposite of what the liar said as certain. But the reverse of truth has a hundred thousand faces and an indefinite field. The Pythagoreans posit that good is certain and finite, evil infinite and uncertain. A thousand roads deviate from the goal, only one leads to it.
This post is also linked to a discussion we had at Marcello Di Bello's presentation, "Cross-Examination with Bayesian Networks" (BayesiaLab Conference, 2022).
Create a new node: Start by creating a new node and label it as "Montaigne". This node will serve as a container for the text you want to analyze.
Enter the excerpt: Input the selected text into the "Montaigne" node as a comment.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment (a conceptual sketch of this step follows this workflow).
Review the dimensions: Examine the dimensions or keywords returned by Hellixia. Remove any dimensions that seem redundant or irrelevant to your analysis.
Use the Embedding Generator on all remaining nodes. This tool captures and quantifies the semantics associated with the names and comments of each node.
Set the target node: Set "Montaigne" as the Target Node. The subsequent analyses and operations will focus on this node.
Run the Naive Learning algorithm.
Change node styles: Alter the style of all nodes to "Badges". This style will display the comment within each node.
Switch to Validation Mode.
Run the Arc Force analysis.
Apply the Radial Layout: While still in the Arc Force analysis tool, run the Radial Layout. This layout arranges the nodes in a clockwise manner according to the strength of their relationships with the target node.
Show the Arc Comments: These comments will provide information about the strength of the relationships between nodes.
Copy the "Montaigne" node: Begin by copying the node titled "Montaigne".
Paste the node into a new graph: Create a new graph and paste the copied "Montaigne" node into it.
Run the Dimension Elicitor using the following keywords to guide the analysis of the node: Contents, Ideas, Milestones, Rules, Themes, Theses, and the General Context set to "Philosophy".
Review the returned dimensions: Examine the dimensions provided by Hellixia. Remove any dimensions that appear redundant or irrelevant to your analysis.
Exclude the "Montaigne" node.
Use the Embedding Generator on all remaining nodes. This will help capture the semantic associations of their names and comments.
Create a semantic network: Use the Maximum Weight Spanning Tree algorithm to form a semantic network from the analyzed text (see the embedding-and-spanning-tree sketch after this workflow).
Change node styles to "Badges". This style will allow the comment within each node to be shown.
Apply the Dynamic Grid Layout: Use this layout option to organize the nodes on your graph. Note that this layout algorithm is not deterministic, meaning it doesn't always produce the same results given the same input. It randomly favors vertical, horizontal, or mixed orientations. Run this layout multiple times until you find a layout that best suits your preferences.
Switch to Validation Mode.
Select Skeleton View: Since the network you're generating does not represent causal relationships, choose the Skeleton View. This will remove the arc orientations, leaving only connections between nodes without indicating a direction.
Switch back to Modeling Mode.
Change node styles to Discs.
Apply the Symmetric Layout.
Enter Validation Mode.
Analyze Node Force.
Run Variable Clustering: This will identify and group similar variables based on their semantics.
Open the Class Editor.
Within the Class Editor, activate the Class Description Generator. Use it to create meaningful names for the factors you're working with (a minimal sketch of this idea follows this workflow).
Save the descriptions you've just created using the Export Descriptions feature.
Switch back to Modeling Mode.
Execute Multiple Clustering to create latent variables.
Next, execute the structural learning algorithm Taboo. Make sure to enable the option "Delete Unfixed Arcs." This should result in the creation of a hierarchical network.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation Mode.
Use Node Force.
Michel de Montaigne's "Essais", first published in 1580, is a large collection (three books) of short, subjective treatments of various topics. Montaigne's stated design in writing, publishing, and revising the "Essais" over the period from approximately 1570 to 1592 was to record "some traits of [his] character and of [his] humours." The "Essais" are regarded as an important work that established the essay as a recognized genre in literature, and the work can be characterized as introspective philosophy.
Montaigne's "Essais" is not just a foundational work in the history of ideas; it's also a unique insight into the mind of one of the most curious and open thinkers in Western history. His observations on society, culture, and humanity are as relevant today as in the 16th century.
We are launching our 'Philosophical Minute' series with an excerpt from Book 2, 'Apology of Raimond de Sebonde'. We believe this passage serves as the perfect introduction to the series, as it refers to the profound wisdom of Socrates.
The wisest man who ever lived, when asked what he knew, replied that he knew that he knew nothing. He confirmed what is said, that the greatest part of what we know is the least of what we do not know: that is to say, even what we think we know is a small part of our ignorance.
Create a new node: Start by creating a new node and label it as "Montaigne". This node will serve as a container for the text you want to analyze.
Enter the excerpt: Input the selected text into the "Montaigne" node as a comment.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Review the dimensions: Examine the dimensions or keywords returned by Hellixia. Remove any dimensions that seem redundant or irrelevant to your analysis.
Use the Embedding Generator on all remaining nodes. This tool captures and quantifies the semantics associated with the names and comments of each node.
Set the target node: Set "Montaigne" as the Target Node. The subsequent analyses and operations will focus on this node.
Run the Naive Learning algorithm.
Change node styles: Alter the style of all nodes to "Badges". This style will display the comment within each node.
Switch to Validation Mode.
Run the Arc Force analysis.
Apply the Radial Layout: While still in the Arc Force analysis tool, run the Radial Layout. This layout arranges the nodes in a clockwise manner according to the strength of their relationships with the target node.
Show the Arc Comments: These comments will provide information about the strength of the relationships between nodes.
Create a new node titled "Montaigne" that will contain the text we want to analyze.
Enter the excerpt as a comment within the "Montaigne" node.
Use the Dimension Elicitor with the General Context set to "Philosophy" and the keywords (Dimensions, Ideas, Themes, and Theses) to analyze the comment within your node. Review the dimensions that Hellixia returns, and remove any that appear to be redundant or irrelevant.
Apply the Embedding Generator to all remaining nodes, capturing the semantics related to their names and comments.
Exclude the "Montaigne" node.
Use the Maximum Weight Spanning Tree algorithm to create a semantic network that describes the analyzed text.
Change all node styles to Badges so that the comment within each node is displayed.
Apply the Dynamic Grid Layout to arrange the nodes.
Switch to Validation Mode.
Since the graph we're creating doesn't represent causal relationships, select the Skeleton View to remove any arc orientations.
Switch back to Modeling Mode.
Exclude the "Montaigne" node.
Change all node styles to the Discs format.
Enter Validation Mode.
Use the Symmetric Layout.
Analyze the Node Force.
Run Variable Clustering.
Open the Class Editor and utilize the Class Description Generator to assign meaningful names to the three factors you're dealing with.
Save these descriptions using the Export Descriptions feature.
Switch back to Modeling Mode.
Execute Multiple Clustering to create latent variables.
Run Taboo, enabling the option Delete Unfixed Arcs, to create a hierarchical network.
Rename the latent variables you've just created by using the previously exported descriptions as a dictionary for naming the node names.
Switch to Validation Mode.
Utilize the Node Force function.
Baruch Spinoza's "Ethics" (often referred to as "Ethica" from its Latin title "Ethica, ordine geometrico demonstrata", meaning "Ethics Demonstrated in Geometrical Order") is a philosophical treatise written in the mid-17th century. It is one of the most significant and controversial works of the Enlightenment, and it presents Spinoza's metaphysical, epistemological, moral, and political views.
The structure of "Ethics" is unique: it is laid out like a geometrical treatise, akin to Euclid's "Elements". Starting with definitions and axioms, Spinoza proceeds with propositions, proofs, corollaries, and scholia (notes), aiming to demonstrate his philosophy with mathematical precision.
In this particular semantic analysis, we explore one of the famous quotes from Ethics:
Desire is the very essence of man, insofar as it is conceived as determined to some action by any of its affections.
Start by creating a new node. Label this node "Spinoza".
Input the chosen excerpt of text into the comment section of the "Spinoza" node.
Use the keyword "Keywords" to guide the Dimension Elicitor in analyzing the comment in the "Spinoza" node. Specify the General Context for your analysis as "Philosophy". By setting this context, you are providing direction for the Dimension Elicitor to understand the broader topic of your text. The Dimension Elicitor will then identify and extract relevant dimensions or keywords from the comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Spinoza" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Ideas, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
The third episode of our Philosophical Minute post is about the famous philosophical statement by René Descartes, Cogito Ergo Sum, "I think, therefore I am." This statement is at the core of Western philosophy and is the starting point of Descartes' philosophical methodology, the foundational element of his metaphysics.
Descartes sought a fundamental element that could be beyond any doubt as a basis for all knowledge. He posited that the very act of doubting one's own existence served as proof of the reality of one's own mind. In essence, if one is questioning, then one must exist to be able to do so.
Considering that all the same thoughts that we have while awake can also come to us when we sleep, without any of them being true at that time, I resolved to pretend that all the things that had ever entered my mind were no more true than the illusions of my dreams.
But immediately afterwards, I noticed that while I wanted to think that everything was false, it was necessary that I, who was thinking, be something; and realizing that this truth, I think, therefore I am, was so firm and so certain that even the most extravagant suppositions of skeptics were not capable of shaking it, I judged that I could accept it without hesitation as the first principle of the philosophy I was seeking.
Start by creating a new node. Label this node "Descartes".
Input the chosen excerpt of text into the comment section of the "Descartes" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Descartes" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Descartes." Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Contents, Matters, Milestones, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Descartes" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Blaise Pascal's "Pensées" (which translates to "Thoughts" in English) is a collection of fragments on theology and philosophy. Pascal, a French mathematician, physicist, and religious philosopher, began writing "Pensées" as a defense of the Christian religion, but he died before he could complete the work. The fragments he left behind were posthumously assembled and published in 1670.
This Philosophical Minute centers around a passage from Pensées, which delves into the human propensity to neglect the present moment, habitually yearning for the future, or dwelling on the past.
We never care about the present. We anticipate the future as too slow to come, as if to hasten its course; or we recall the past to stop it as too quick: so careless, we wander in times that are not ours, and do not think of the only one that belongs to us; and so vain, we think of those that are nothing anymore, and let slip without reflection the only one that remains.
It is because the present, usually, hurts us. We hide it from our sight, because it afflicts us; and if it is pleasant to us, we regret seeing it slip away.
The present is never our end: the past and the present are our means; the only future is our end. Thus we never live, but we hope to live; and, always preparing to be happy, it is inevitable that we never are.
Create a new node: Start by generating a new node named "Blaise Pascal - Pensées". This node will hold the text that you plan to analyze.
Insert the text: Add the selected excerpt into the comment section of the "Blaise Pascal - Pensées" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Assess the extracted dimensions: Evaluate the keywords or dimensions identified by Hellixia and eliminate any that are redundant or irrelevant.
Use the Embedding Generator for all remaining nodes. This tool will distill the semantics of the names and comments of each node into a quantifiable form.
Set "Blaise Pascal - Pensées" as the Target Node.
Run the Naive Learning algorithm.
Change the style of all nodes to "Badges". This style will display the comment embedded within each node.
Switch to Validation Mode.
Perform an Arc Force analysis.
While within the Arc Force analysis tool, run the Radial Layout. This will arrange the nodes in a clockwise pattern in relation to their connection strength with the target node.
Show the Arc Comments, which will provide information about the strength of the relationships between nodes.
Start by making a copy of the node named "Blaise Pascal - Pensées".
Open a new graph and paste the copied "Blaise Pascal - Pensées" node.
Use the following keywords to guide the Dimension Elicitor in its analysis of the node: Arguments, Matters, Milestones, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Blaise Pascal - Pensées" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network based on the text analysis.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor.
Run Class Description Generator: Use this function to generate descriptive names for your identified factors. This helps to make the output more understandable and interpretable.
Save these descriptions by using the Export Descriptions function.
Switch back to Modeling Mode.
Run Multiple Clustering.
Run the Taboo algorithm: Use this structural learning algorithm to learn a hierarchical network. Make sure to enable the "Delete Unfixed Arcs" option to remove unnecessary connections and streamline your model.
Use the descriptions you exported earlier as a dictionary to rename the latent variables you've just created. This helps in making your model more understandable and keeps the nodes' names consistent with their semantic meaning.
Switch to Validation Mode.
Apply Node Force.
In this fifth episode, we delve into another passage from Blaise Pascal's Pensées. This particular segment sheds light on the compromise required to uphold societal harmony, a state considered the highest form of good.
Without doubt, the equality of goods is just; but, unable to make it force to obey justice, we have made it just to obey force; unable to strengthen justice, force was justified, so that the just and the strong might be together, and peace might be, which is the sovereign good.
Create a new node: Start by generating a new node named "Blaise Pascal - Pensées". This node will hold the text that you plan to analyze.
Insert the text: Add the selected excerpt into the comment section of the "Blaise Pascal - Pensées" node.
Run the Dimension Elicitor, set the General Context to "Philosophy", and input "Keywords" as the keyword for the analysis of the node comment.
Assess the extracted dimensions: Evaluate the keywords or dimensions identified by Hellixia and eliminate any that are redundant or irrelevant.
Use the Embedding Generator for all remaining nodes. This tool will distill the semantics of the names and comments of each node into a quantifiable form.
Set "Blaise Pascal - Pensées" as the Target Node.
Run the Naive Learning algorithm.
Change the style of all nodes to "Badges". This style will display the comment embedded within each node.
Switch to Validation Mode.
Perform an Arc Force analysis.
While within the Arc Force analysis tool, run the Radial Layout. This will arrange the nodes in a clockwise pattern in relation to their connection strength with the target node.
Show the Arc Comments, which will provide information about the strength of the relationships between nodes.
Start by making a copy of the node named "Blaise Pascal - Pensées".
Open a new graph and paste the copied "Blaise Pascal - Pensées" node.
Use the following keywords to guide the Dimension Elicitor in its analysis of the node: Arguments, Contents, Ideas, Matters, Milestones, Motifs, Rules, Themes, Theses, Topics, and the General Context set to "Philosophy".
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Blaise Pascal - Pensées" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network based on the text analysis.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor.
Run Class Description Generator: Use this function to generate descriptive names for your identified factors. This helps to make the output more understandable and interpretable.
Save these descriptions by using the Export Descriptions function.
Switch back to Modeling Mode.
Run Multiple Clustering.
Run the Taboo algorithm: Use this structural learning algorithm to learn a hierarchical network. Make sure to enable the "Delete Unfixed Arcs" option to remove unnecessary connections and streamline your model.
Use the descriptions you exported earlier as a dictionary to rename the latent variables you've just created. This helps in making your model more understandable and keeps the nodes' names consistent with their semantic meaning.
Switch to Validation Mode.
Apply Node Force.
Welcome to the eighth installment of the Philosophical Minute, where we continue our exploration of the works of Baruch Spinoza. Today's focus is a captivating foray into Spinoza's reflections on desire, and its profound influence on our perceptions of good and evil:
We consider good the thing that we desire; and consequently, we call the thing that inspires us with aversion, bad; so that everyone judges according to their passions what is good or bad, what is better or worse, what is most excellent or most contemptible.
Spinoza, in his meticulous examination, sheds light on the intrinsic nature of desire and its pivotal role in shaping human behavior and ethics. How does what we desire dictate our moral compass? Why do we perceive certain desires as virtuous and others as vice? Spinoza's insights into these questions offer a deep dive into the undercurrents of human psychology and the constructs of morality.
Node Creation: Start by generating a new node. Name it "Spinoza".
Text Inclusion: Insert your chosen text excerpt into the comment section of this "Spinoza" node.
Dimension Elicitation: Use the Dimension Elicitor with the keyword "Keywords" to analyze the comment within the "Spinoza" node. Define the General Context as "Philosophy". This context directs the elicitor to frame the analysis within the broader realm of philosophical discourse.
Dimension Review: Evaluate the dimensions or keywords identified by Hellixia. Remove any that seem redundant or not pertinent to your objective.
Semantic Quantification: Run the Embedding Generator for all the nodes that are still in play. This process translates the semantic elements of each node's name and comments into quantifiable metrics.
Target Node Designation: Designate "Spinoza" as your primary or target node.
Learning Algorithm: Launch the Naive Learning algorithm.
Visualization: Alter the visual representation of every node to the "Badges" style. It ensures that the comments associated with each node are directly visible.
Validation: Transition your workspace to the Validation Mode.
Arc Analysis: Run the Arc Force analysis.
Graph Layout: While still in the Arc Force analysis tool, run the Radial Layout. This method organizes nodes in a circle around your target node, positioning them based on the strength of their connection to the target.
Arc Visualization: Activate the Arc Comments. This feature superimposes a visualization layer on your network, displaying information about the arcs' strengths.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Ideas, Matters, Milestones, Motifs, Rules, Themes, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Switch back to Modeling Mode and change the visual representation of each node to the "Discs" style. The disc style offers a clean and straightforward visual, which might be easier to interpret in some contexts compared to the badge style.
Use the Symmetric Layout tool.
Switch to Validation Mode and run the Node Force analysis.
Carry out Variable Clustering: This action will group similar variables together based on their semantic connections.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Switch back to Modeling Mode and run Multiple Clustering to produce latent variables.
Launch the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Continuing with Baruch Spinoza's Ethics (see Episode 6), we focus on another passage about desire, determinism, and perceived free will in human actions:
All men are born in ignorance of causes, and a universal appetite of which they are conscious drives them to seek what is useful to them.
A first consequence of this principle is that men believe they are free, because they are conscious of their volitions and desires, and do not think at all about the causes that predispose them to desire and to want.
The result, secondly, is that men always act with an end in mind, namely, their own utility, the natural object of their desire.
The supreme end of man, guided by reason, his supreme desire, this desire by which he strives to regulate all others, is therefore the desire that drives him to adequately understand both himself and all things that fall within his comprehension.
Start by creating a new node. Label this node "Spinoza".
Input the chosen excerpt of text into the comment section of the "Spinoza" node.
Use the keyword "Keywords" to guide the Dimension Elicitor in analyzing the comment in the "Spinoza" node. Specify the General Context for your analysis as "Philosophy". By setting this context, you are providing direction for the Dimension Elicitor to understand the broader topic of your text. The Dimension Elicitor will then identify and extract relevant dimensions or keywords from the comment.
Examine the dimensions or keywords that Hellixia has identified. Any dimensions that appear irrelevant or redundant should be removed from your analysis.
Use the Embedding Generator on all remaining nodes. This tool will quantify the semantics associated with the names and comments of each node.
Set the "Spinoza" node as your Target Node.
Run the Naive Learning algorithm.
Update the visual style of all nodes to appear as "Badges". This will allow the comments within each node to be displayed.
Switch to Validation Mode.
Run an Arc Force analysis.
Use the Radial Layout while you are still within the Arc Force analysis tool. This will arrange the nodes in a clockwise fashion based on the strength of their relationships with the target node.
Show the Arc Comments to visualize information regarding the strength of the relationships between the nodes.
Start by copying the node "Spinoza". Then, create a new graph and paste the node.
Utilize the Dimension Elicitor with the subsequent keywords: Arguments, Ideas, Matters, Milestones, Motifs, Rules, Themes, and the General Context set to "Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Spinoza" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network from the excerpt.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; bear in mind that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Delve into the intricate world of John Locke's "Two Treatises of Government" in this dedicated section. Using the power of Hellixia, we aim to dissect this seminal work, which stands as a cornerstone of modern political philosophy. The text, rooted in the theories of natural rights and the social contract, has played a pivotal role in shaping democratic governance and individual liberties. Through our in-depth analysis, we will construct semantic networks that elucidate Locke's arguments, laying bare the foundational principles of his thoughts on society, governance, and the very nature of human rights. Join us on this enlightening journey as we navigate the depths of "Two Treatises," unraveling its philosophical intricacies and enduring relevance.
Start by creating the node "Two Treatises of Government, by John Locke".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Considerations", "Concepts", and many more, to conduct an exhaustive analysis of the essay (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Two Treatises of Government, by John Locke" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our section where we utilize the power of Hellixia to explore the fascinating world of literature. Here, we go beyond conventional textual analyses to create semantic networks, unraveling the rich layers of classics such as 'Hamlet' by William Shakespeare, 'Madame Bovary' by Gustave Flaubert, 'A Tale of Two Cities' by Charles Dickens, and 'Middlemarch' by George Eliot. But our exploration doesn't stop at individual works. We also delve into the relationships between authors from diverse styles - from magic realism and gothic fiction to surrealism and science fiction, and beyond. This innovative approach illuminates the subtle interconnections within and across genres.
Let's embark on this literary journey together, weaving semantic networks that capture the unique essence of literary works, authors, and genres, and provide a refreshing perspective on the magnificent tapestry of literature.
Embark on a literary journey with us as we use Hellixia to uncover the rich interconnections among hundreds of authors, spanning a variety of literary styles such as magic realism, gothic fiction, surrealism, and science fiction. By mapping these intricate relationships, our semantic network becomes a personalized guide, helping you discover your next potential favorite read. This network is not just a visual tool; it's your passport to uncharted literary territories, ready to guide your reading adventure.
Start by creating the node "Magic Realism".
Utilize the Dimension Elicitor with "Competitors" as the guiding keyword, setting the General Context to "Literature Style", to discover other literary styles.
Inspect the dimensions Hellixia generates and discard any that appear irrelevant or extraneous to your analysis.
Select all nodes.
Run the Dimension Elicitor again using "Members" as the keyword and "Influential Writers" as the General Context. This process aims to discover influential writers for each style, focusing on Node Comments as the Main Subject of the Query. Set the Responses per Keyword parameter to 20 to get a wide range of results.
Inspect the resulting dimensions from Hellixia and remove any that appear irrelevant or superfluous.
Repeat the last 2 steps with the Node Names and Comments as Main Subject of the Query. This will enable the discovery of additional writers.
Use the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change node styles to Badges to display each node's comment.
Apply the Dynamic Grid Layout for positioning the nodes on your graph. This algorithm is not deterministic and favors vertical, horizontal, or mixed orientations randomly. Running this layout multiple times might be necessary until you achieve an arrangement that suits your preferences.
Switch to Validation Mode and activate Skeleton View. As your network does not represent causal relations, the Skeleton View will only show the connections between nodes without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Optional: Delete all Arcs. This can be helpful for achieving a cleaner graph layout.
Use the Distance Mapping algorithm based on Mutual Information. This algorithm creates a 2D layout where the nodes' distances are proportional to the semantic proximity between the nodes (considering both names and comments).
Step into the world of William Shakespeare's "Macbeth," a profound tragedy that navigates the treacherous terrain of ambition, power, and the human psyche. In this section, we'll embark on a comprehensive exploration of this iconic play, divided into two illuminating parts:
1. Narrative Analysis: We'll dissect the plot's complexities, unravel character dynamics, and spotlight key events that shape Macbeth's tragic trajectory.
2. Holistic Analysis: Beyond the surface, we'll step back to capture overarching themes, moral implications, and the timeless resonance that gives "Macbeth" its enduring impact.
Join us on this analytical odyssey as we traverse the profound layers of Shakespeare's masterpiece, using semantic networks to illuminate its essence and offer fresh insights into the complexities of the human condition.
Uncover the plot's intricacies, character dynamics, and pivotal moments in the dedicated narrative analysis of "Macbeth." With the guidance of Hellixia, we'll unravel the story's threads, shedding light on the twists and turns that drive this iconic tragedy.
Start by creating the node "Macbeth, by Shakespeare."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Macbeth, by Shakespeare" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative, our focus shifts to the broader canvas of "Macbeth." Through Hellixia's lens, we'll delve into overarching themes, explore moral complexities, and unearth the enduring significance beneath the surface.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
In the vast tapestry of Shakespearean tragedies, "King Lear" stands out as a potent tale of familial strife, ambition, and the relentless quest for power. As we journey into this masterwork, we find ourselves amidst the tumultuous relationships of a king with his children, set against the backdrop of a kingdom in disarray.
Having ventured into the intricate worlds of "Hamlet" and "Macbeth", we now shift our focus to this powerful narrative. Our exploration is structured in two parts: we begin with a detailed narrative analysis, diving deep into plot intricacies and character dynamics. Following this, we transition to a holistic analysis, capturing overarching themes, motives, and the very essence that makes "King Lear" a cornerstone of literary greatness.
With the precision of Hellixia guiding our analysis, join us in this enlightening expedition as we endeavor to unveil the complexities and profundities that Shakespeare so masterfully wove into the fabric of "King Lear".
Navigating "King Lear", our narrative analysis dissects the play's pivotal events and character dynamics. We'll unravel the tale of a father, his daughters, and a kingdom in turmoil, shedding light on Shakespeare's intricate storytelling.
Start by creating the node "King Lear, by Shakespeare."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "King Lear, by Shakespeare" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Stepping beyond the immediate narrative, our holistic examination delves into the deeper themes, sentiments, and philosophical underpinnings of "King Lear." This lens allows us to grasp the timeless essence and profound messages that Shakespeare interwove within the play's fabric.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
Welcome to our in-depth analysis of "The Demon" by Hubert Selby Jr. In this concise yet comprehensive section, we use Hellixia to facilitate a two-part exploration of this riveting novel.
First, we embark on a narrative analysis, dissecting the plot and characters to reveal the underlying themes that Selby skillfully interweaves throughout the story. This part offers a vivid glimpse into Selby's dark and immersive world.
Next, we transition to a holistic analysis, where we zoom out to evaluate the novel's broader philosophical and societal undertones. This segment intends to illuminate the novel's intricate interplay of themes, values, and impacts, showcasing its rich complexity and literary significance.
Join us for this enriching journey that offers a fresh and insightful perspective on "The Demon".
In this first segment, we focus on the narrative intricacies of "The Demon". Through Hellixia's lens, we will dissect the vibrant characters and the entwined plot that makes Selby's novel an evocative journey.
Start by creating the node "The Demon."
Use the Dimension Elicitor, employing a broad array of keywords: Agents, Contexts, Developments, Entities, Events, Highlights, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships. Also set the General Context to "Hubert Selby Novel".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Demon" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments. Please note that "The Demon" is not as widely recognized, and GPT might hallucinate, i.e., occasionally generate responses that align with other more prominent works by Selby, such as "Requiem for a Dream".
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Moving forward, we transition to a more expansive view in our holistic analysis. Utilizing Hellixia, we aim to delve deeper, exploring the broader themes, societal influences, and underlying philosophies encapsulated in "The Demon".
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, Topics, and Values.
Welcome to a deep dive into the depths of "A Tale of Two Cities," Charles Dickens' renowned novel that weaves a tapestry of intertwined lives against the backdrop of the French Revolution. With the help of Hellixia, we will create a detailed semantic network that exposes the complex relationships and themes embedded within this literary masterpiece. From its iconic characters and their motivations to the social and political currents driving the narrative, we'll explore the intricate layers that make this novel a timeless classic. Brace yourself for a journey through love, sacrifice, and redemption as we unravel Dickens' narrative in a way you've never seen before.
Start by creating the node "A Tale of Two Cities".
Use the Dimension Elicitor, employing a broad array of keywords like "Agents", "Aspects", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the book (see the exhaustive list of keywords below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "A Tale of Two Cities" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
"Madame Bovary" is a novel written by the French author Gustave Flaubert, published in 1857. It is one of the most influential literary works of the 19th century and is widely regarded as a seminal work of realism in literature. Flaubert's meticulous attention to detail and his pursuit of the "mot juste" (the exact right word) have made the novel a benchmark in the development of the modern novel.
Flaubert's portrayal of Emma Bovary is complex and multi-dimensional. While she can be seen as self-centered and even morally corrupt, she is also a victim of her environment, upbringing, and limited means of escaping her circumstances.
Semantic networks produced by Hellixia reveal the relationship between the characters and the structure of themes with unprecedented clarity.
Start by creating the node "Madame Bovary".
Use the Dimension Elicitor, employing a broad array of keywords like "Agents", "Aspects", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Madame Bovary" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our film analysis section, where we use Hellixia's capabilities to delve into the intricate narratives of iconic movies like "The Good, The Bad, and The Ugly" and "Apocalypse Now." With Hellixia's assistance, we'll generate semantic networks that capture these films' complex character relationships, thematic depth, and contextual subtleties. From the moral and psychological complexities of warfare depicted in "Apocalypse Now" to the multi-layered exploration of good and evil in "The Good, The Bad, and The Ugly," our analyses will offer a fresh perspective on these cinematic masterpieces. This section is a cinephile's dream, providing an engaging blend of art and technology to deepen our understanding and appreciation of film.
Welcome to a comprehensive analysis of "The Good, The Bad, and The Ugly," a quintessential spaghetti western directed by the legendary Sergio Leone. With the power of Hellixia, we will create a detailed semantic network, offering an in-depth exploration into this cinematic masterwork. We will dissect its iconic characters, intricate plot lines, dramatic settings, and the moral dilemmas they embody. This film's subtle commentaries on good, evil, and the gray areas in between will be laid bare through our network. Prepare for a fascinating journey as we unravel the intricate layers of "The Good, The Bad, and The Ugly," a film that forever changed the landscape of western cinema.
Start by creating the node "The Good, the Bad and the Ugly".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below). Set the General Context to "Sergio Leone Movie".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Good, the Bad and the Ugly" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Embark with us on a journey into James Joyce's "Ulysses," a literary masterpiece revered for its complexity and depth. Leveraging Hellixia, we will navigate the intricate labyrinth of themes, symbols, and linguistic innovations present in the text. From exploring the psychological depths of its characters to interpreting its myriad of allusions, we will construct a comprehensive semantic network that illuminates the intricate facets of "Ulysses." Prepare for a compelling expedition into the heart of Joyce's modernist vision, a textual exploration that unravels the compelling richness of this universally admired work.
Start by creating the node "Ulysses".
Use the Dimension Elicitor, employing a broad array of keywords like "Characteristics", "Emotions", "Features", "Strengths", "Traits", and "Weaknesses" to conduct an exhaustive analysis of the book. Set the General Context to "James Joyce."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Ulysses" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Welcome to an in-depth analysis of "Apocalypse Now," Francis Ford Coppola's seminal film that probes into the heart of darkness represented by the Vietnam War. Utilizing Hellixia, we will generate a sophisticated semantic network to illuminate the complex themes, characters, and cinematic techniques of this iconic film. From its profound critique of war and colonialism to its exploration of human nature and morality, we'll dissect the multi-layered narrative that defines this cinematic masterpiece. Strap in for an intellectual journey as we delve into the chaotic world of "Apocalypse Now" and shine a light on its profound commentary on the human condition.
Start by creating the node "Apocalypse Now".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Apocalypse Now" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our exploration of Sergio Leone's epic masterpiece, "Once Upon a Time in America." Spanning decades, this cinematic tour de force weaves a complex tale of friendship, ambition, betrayal, and redemption against the backdrop of organized crime in 20th-century America.
Leone's storytelling prowess, coupled with a haunting score by Ennio Morricone and remarkable performances by a stellar cast, including Robert De Niro and James Woods, make this film an unforgettable journey through time and human emotion.
From the gritty streets of New York's Lower East Side to the lavish elegance of 1960s' Manhattan, "Once Upon a Time in America" unfolds its narrative with a richness and complexity rarely seen in cinema. The film's non-linear structure, exquisite cinematography, and deeply layered themes make it an object of fascination and study.
Join us as we delve into this magnum opus, unraveling its intricate narrative threads and uncovering the symbolism, motifs, and philosophical undertones that elevate this movie to the status of timeless art. Whether you're revisiting this classic or discovering it for the first time, our analysis promises to provide new insights into a film that continues to captivate audiences worldwide.
Start by creating the node "Once Upon a Time in America".
Use the Dimension Elicitor, employing a broad array of keywords like "Achievements", "Characteristics", "Components", "Milestones", and many more, to conduct an exhaustive analysis of the film (see the complete list of keywords below). Set the General Context to "Sergio Leone Movie".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Once Upon a Time in America" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our section dedicated to the profound world of song lyrics, where we harness the capabilities of Hellixia to dissect and interpret musical narratives. Venturing beyond mere words, we craft semantic networks that spotlight the underlying stories and sentiments of iconic tracks like "The Mercy Seat," "Red Right Hand," "Last Great American Whale," and "Jungleland." We aim to unravel the richness of these compositions, gleaning insights into their essence and cultural resonance. Dive deep with us as we illuminate the intricate nuances of these songs, offering a fresh, interconnected perspective on their lyrical artistry.
Join us as we plunge into the lyrical depths of "Last Great American Whale", a song by the iconic musician Lou Reed. Known for his distinctive storytelling and unique blend of rock, this track from his 1989 album, 'New York', stands as a testament to Reed's keen observation of American society and culture.
In "Last Great American Whale", Reed weaves a tale that resonates with environmental and social commentary, a narrative that's as poignant today as it was when first penned. To navigate through this multifaceted piece of music, we'll be enlisting the aid of Hellixia, BayesiaLab's subject matter assistant.
Harnessing Hellixia's ability to create intricate semantic networks, we aim to dissect the themes, motifs, and narratives hidden within Reed's lyrics. This song, ripe with symbolism and metaphor, offers a rich landscape for such analysis.
From the overarching narratives of environmentalism and social critique to the individual threads of American culture, Hellixia will guide us through the complex lyrical world that Reed has created. So come, immerse yourself in the rhythm and the words, as we unravel the enigma of Lou Reed's 'Last Great American Whale'.
Start by creating the node "Last Great American Whale".
Use the Dimension Elicitor, employing a broad array of keywords like "Developments", "Influencers", "Events", "Entities", and many more, to conduct an exhaustive analysis of the song lyrics (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Last Great American Whale" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
In the realm of advanced data analysis and knowledge modeling, understanding the underpinnings of causality is crucial. Hellixia, at the forefront of this analytical revolution, offers a set of functions dedicated to causality, enabling the generation of Causal Semantic Networks (CSN) and Causal Bayesian Networks (CBN). This section is aimed at engineers and researchers seeking to unravel the complexity of cause-and-effect relationships in their field.
Welcome to our in-depth analysis of "The Mercy Seat," an iconic song by Nick Cave. Through this exploration, we will delve into the intricate narratives and powerful emotions embedded within the song. Using Hellixia, we will construct a semantic network that reveals the song's complex themes and the relationships among them, shedding light on the profound depths of Cave's storytelling. Join us as we journey into the haunting world of "The Mercy Seat."
Start by creating the node "The Mercy Seat".
Use the Dimension Elicitor with a broad array of keywords like "Achievements," "Characteristics," "Ideas," and "Impacts" (see the exhaustive list below), and set the General Context to "Nick Cave Song." By doing so, you're informing the tool to approach the analysis from the perspective of a Nick Cave song.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Mercy Seat" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Prepare to embark on an explorative journey through "Jungleland," a sonic masterpiece by none other than the legendary Bruce Springsteen. The epic closer of his 1975 breakthrough album 'Born to Run', "Jungleland" is a symphony of vivid storytelling, resounding saxophone solos, and the raw intensity that characterizes Springsteen's work.
In the rich tapestry of "Jungleland", Springsteen paints a picture of urban struggle and young love, masterfully set against the backdrop of a gritty cityscape. His intricate lyrics tell a tale that's profoundly human and deeply emotive.
To guide us through the labyrinth of Springsteen's poetic narrative, we'll be utilizing Hellixia, BayesiaLab's subject matter assistant. Harnessing the power of Hellixia's semantic network generation, we will delve into the depths of Springsteen's lyrics, dissecting the themes, metaphors, and underlying emotions that make "Jungleland" a celebrated piece of musical storytelling.
From the hustle of the city streets to the poignant silent reverence in the face of loss, Hellixia will enable us to explore the intricate interplay of love, struggle, and resilience in "Jungleland". So join us as we navigate the urban landscape of Springsteen's imagination, diving into the heart of his narrative genius.
Start by creating the node "Jungleland".
Use the Dimension Elicitor, employing a broad array of keywords like "Milestones", "Agents", "Connections", "Forces", and many more, to conduct an exhaustive analysis of the song lyrics (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Jungleland" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Step into the enigmatic realm of Nick Cave & The Bad Seeds with their masterful song, "Red Right Hand." Renowned for its rich imagery and profound thematic undertones, this song offers a narrative tapestry begging to be unraveled. Leveraging Hellixia, our exploration will commence with a narrative analysis of the lyrics, delving deep into the song's storytelling elements. Following this, we'll transition into a holistic examination, piecing together the broader themes and emotional resonances that Cave artfully embeds. Join us as we navigate this iconic track's poetic and musical depths.
Let's delve into the very fabric of "Red Right Hand," examining its lyrical landscape to uncover the embedded stories, motifs, and emotions they evoke.
Start by creating the node "Lyrics of The Red Right Hand, by Nick Cave & the Bad Seeds."
Input the lyrics into the comment section of the node:
Use the Dimension Elicitor, employing the keywords "Agents, Keywords, Events, Relationships, Developments, Contexts, Highlights, Milestones, Entities, Progressions, Motifs, and Locations" to conduct an exhaustive narrative analysis of the lyrics.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "Lyrics of The Red Right Hand, by Nick Cave & the Bad Seeds" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Moving beyond the narrative, we'll now capture the broader essence of "Red Right Hand," exploring its overarching themes, sentiments, and the cultural resonances embedded within.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Join us as we delve into a detailed examination of the New Deal, an essential historical period shaped by the repercussions of the Great Depression. Triggered by our reading of John Steinbeck's poignant "The Grapes of Wrath," we will harness the power of Hellixia to create a causal semantic network. This network will depict the policies enacted during the New Deal and explore their cause-and-effect relationships. Through this analysis, we aim to shed light on the complex interplay between economic conditions, policy decisions, and societal outcomes during this transformative era in American history.
Create a node named "New Deal".
Use the following keywords to guide the Dimension Elicitor's node analysis: Characteristics, Causes, Elements, Keywords, Features, Years, Dimensions, Definitions, Traits, Outcomes, Factors, Consequences, Aims, Descriptions, and Goals.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "New Deal" node.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Utilize Hellixia's Causal Structural Priors to evaluate whether the correlations highlighted by the maximum spanning tree indeed signify causal relationships.
Inspect the Structural Priors Explanations suggested by Hellixia. Any priors that are irrelevant should be removed.
Run the Taboo Learning algorithm with the remaining Structural Priors. These priors will reduce the cost of adding arcs that embody these causal relations.
Use Hellixia's Causal Structural Priors again to examine whether the correlations highlighted by the maximum spanning tree suggest causal relationships.
Repeat the above three steps as necessary until the model is satisfactory.
Inspect the final set of Structural Priors Explanations and remove any irrelevant priors.
Export the Structural Priors.
Delete all arcs.
Use the saved Structural Priors as an arc dictionary. This will generate a causal network based on these priors.
Utilize the Structural Priors as an arc comment dictionary to store the descriptions of the causal relationships.
Apply the Genetic Grid Layout algorithm to neatly arrange the nodes on your graph, reflecting the causal directionality.
The screenshot below displays the explanations associated with the Structural Priors. These explanations detail the causal relationships and the logical connections inferred by Hellixia's analysis between the different nodes in the network. They provide valuable insights into the underlying structure and dynamics of the system being studied.
The blue icon in the 'Check' column signifies that an arc in the network currently represents the corresponding structural prior. A red icon indicates that the arc is reversed, and no icon at all means the arc is absent from the network. The only red icon in the example below indicates that Hellixia identified a causal explanation in both directions.
This section showcases how to use Hellixia to create semantic networks around a wide array of subjects, including authors, philosophers, dogs, and more.
In the complex and dynamic field of air transport, it is crucial for airlines to understand and mitigate flight delays. With the advent of sophisticated analytical tools like Hellixia, we now have the opportunity to delve deeper into the causal factors behind these delays. This article explores the innovative application of Hellixia in the creation of Causal Bayesian Networks (CBN), a method that transcends traditional data analysis to uncover the root causes of flight delays.
Using Hellixia for this purpose represents a significant advance in the field of causal analysis. By building causal Bayesian networks, we can map the complex web of factors contributing to delays, from weather conditions to logistical challenges.
In the following sections, we'll look at how Hellixia facilitates the construction of these causal networks, and the insights they provide into the management and prevention of flight delays.
First, we will perform a semantic analysis of the domain to obtain an overview of the key concepts and variables within the aircraft delay domain.
For our analysis of "Delays in scheduled flight departures", we start by building a semantic network, followed by a hierarchical semantic network, similar to our previous workflows (for example, as demonstrated with Hamlet). This process is essential for mapping the semantic landscape surrounding flight delays, providing a solid foundation for understanding the underlying dynamics of this issue.
We begin our analysis by creating a node entitled "Delays in scheduled flight departures" and proceed to use Hellixia's Dimension Elicitor, using two distinct groups of keywords: 'Ancestors' and 'Descendants'. This approach allows us to explore in depth the factors leading to and resulting from flight delays.
We carefully examine the dimensions provided by Hellixia, removing any that seem extraneous or irrelevant to our analysis. Next, we exclude 'Delays in scheduled flight departures' and run the Embedding Generator on the remaining nodes. This step is crucial to understanding the semantic relationships linked to their names and comments.
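For readers curious about what this step amounts to outside the BayesiaLab interface, here is a minimal sketch of generating such embeddings, assuming an OpenAI-style embedding endpoint and hypothetical node names and comments (the exact model and format Hellixia calls are not specified here):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical node names with Hellixia-generated comments.
nodes = {
    "Weather Conditions": "Meteorological factors such as storms and fog "
                          "that can hold aircraft at the gate.",
    "Crew Availability": "Whether a complete, legally rested crew is "
                         "available at the scheduled departure time.",
}

# Embed the concatenation of each node's name and comment.
texts = [f"{name}. {comment}" for name, comment in nodes.items()]
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
embeddings = [item.embedding for item in response.data]
print(len(embeddings), len(embeddings[0]))  # e.g., 2 vectors of 1536 floats
```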
We have two large sets of nodes: one representing "Ancestors" (42 nodes) and the other "Descendants" (69 nodes). Our approach is to learn a separate network for each group. To do this, we define specific constraints that prohibit relationships between nodes that do not belong to the same class.
We then run the Maximum Weight Spanning Tree algorithm to find the most significant semantic relationships between nodes.
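As a rough illustration of what this algorithm does with the embeddings, the sketch below computes cosine similarities between hypothetical node vectors, removes cross-class edges (the constraint described above), and extracts a maximum weight spanning tree. BayesiaLab performs all of this internally; this is only a conceptual sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical inputs: one embedding vector per node, plus a class label
# ("Ancestors" or "Descendants") used to forbid cross-class links.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 384))       # stand-in for real embeddings
classes = np.array(["A"] * 4 + ["D"] * 4)    # "Ancestors" / "Descendants"

# Cosine similarity between every pair of node embeddings.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T

# Structural constraint: zero out (i.e., remove) cross-class edges.
same_class = classes[:, None] == classes[None, :]
sim = np.where(same_class, sim, 0.0)
np.fill_diagonal(sim, 0.0)

# A *maximum* weight spanning tree is a *minimum* spanning tree on negated
# weights; with cross-class edges removed, we obtain one tree per class,
# matching the "separate network for each group" approach described above.
forest = minimum_spanning_tree(csr_matrix(-sim))
for i, j in zip(*forest.nonzero()):
    print(f"link {i} -- {j}  (similarity {sim[i, j]:.3f})")
```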
To improve visibility, we change the node styles to Badges, clearly displaying the comment associated with each node. Next, we run the Dynamic Grid Layout to position the nodes on the graph. It's important to note that this algorithm is not deterministic, resulting in random orientations (vertical, horizontal, or mixed). As a result, we may have to apply this layout several times to get a configuration that matches our preferences.
Next, we switch to Validation Mode and opt for the Skeleton View. In this context, since our network doesn't represent causal relationships, this view is particularly useful as it retains only the connections between nodes, omitting direction indicators.
Next, we run Variable Clustering. This step categorizes variables that are similar, grouping them based on the semantic relationships identified between them.
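Variable Clustering itself operates on the learned network; purely as an intuition for how semantically similar variables end up grouped, one can picture a hierarchical clustering over the same pairwise similarities (an analogy, not BayesiaLab's actual algorithm):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical pairwise cosine similarities between five node embeddings.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.3, 0.2, 0.2],
    [0.2, 0.3, 1.0, 0.8, 0.7],
    [0.1, 0.2, 0.8, 1.0, 0.9],
    [0.3, 0.2, 0.7, 0.9, 1.0],
])

# Convert similarity to distance and group semantically close variables.
clustering = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
)
labels = clustering.fit_predict(1.0 - sim)
print(labels)  # e.g., [0 0 1 1 1]: two clusters of similar variables
```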
We can now proceed with the creation of two hierarchical semantic networks.
Opening Class Editor: We begin by accessing the Class Editor and then running the Class Description Generator. This generates descriptive names for the factors we're examining.
Exporting Descriptions: Next, we use the Export Descriptions function to save the newly created factor descriptions.
Returning to Modeling Mode: We then switch back to Modeling Mode and conduct Multiple Clustering to create latent variables.
Running the Structural Learning Algorithm (Taboo): We run the Taboo algorithm for structural learning, ensuring that the Delete Unfixed Arcs option is selected.
Renaming Latent Variables with Exported Descriptions: We utilize the descriptions we previously exported as a Dictionary to rename the latent variables, adding clarity to our model.
Switching to Validation Mode and Running Node Force: Finally, we go back to Validation Mode and run the Node Force analysis, which helps us understand the dynamics and strength of the connections within our network.
Having established a global understanding of the domain via semantic networks, we're now ready to move forward with the construction of causal Bayesian networks, taking advantage of the latest capabilities introduced in Hellixia as part of BayesiaLab version 11.2.
We initiate the process by creating a node named Delays in Scheduled Flight Departures and then proceed to use the Causal Network Generator feature.
Given the complexity of the prompt, generation takes one to two minutes, after which we obtain a small but fully specified causal Bayesian network (graph and probabilities). This network features directed arcs to signify causal relationships, with each arc accompanied by a succinct explanation of its causal link and an estimate of the effect, scaled from -100 (shown in red) to 100 (shown in blue).
To differentiate nodes by depth using different colors, we first run the Edit Class function and select Generate a Predefined Class - Depth. We then select the four depth classes that have been created and apply the Colors - Associate Random Colors with Classes function to assign distinct colors to each class.
Nodes marked with an icon representing a function are parameterized using BayesiaLab's new DualNoisyOr() formula. This formula integrates both positive and negative interactions between Boolean variables (the causal effects returned by Hellixia).
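The exact form of DualNoisyOr() is not spelled out here, but a plausible reading, combining a classic noisy-OR over the positive causes with noisy inhibitors for the negative ones, can be sketched as follows (a hypothetical illustration, not BayesiaLab's actual formula):

```python
def dual_noisy_or(active_pos, active_neg):
    """Probability that a Boolean effect is true given its active parents.

    active_pos: causal strengths in (0, 1] of the active positive causes
    active_neg: causal strengths in (0, 1] of the active negative causes

    Hypothetical reading of a dual noisy-OR, for intuition only.
    """
    p = 1.0
    for s in active_pos:
        p *= 1.0 - s   # chance that no positive cause triggers the effect
    p = 1.0 - p        # classic noisy-OR over the positive causes
    for s in active_neg:
        p *= 1.0 - s   # each active negative cause may inhibit the effect
    return p

# Two active positive causes and one active inhibitor:
print(dual_noisy_or([0.8, 0.5], [0.3]))  # (1 - 0.2 * 0.5) * 0.7 = 0.63
```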
By selecting the Create Corresponding Structural Priors option in the Causal Network Generator wizard, we now have access to Structural Priors. The value of each prior is derived from the absolute value of the causal effect returned by Hellixia. In addition, the explanation provided for each prior corresponds to the description of its causal relationship. These structural priors can then be used later for network learning when relevant data becomes available.
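The derivation of the prior values is simple arithmetic. Assuming the -100 to 100 effects are rescaled by their absolute value (our reading of the description above, not a documented formula), it amounts to:

```python
# Hypothetical causal effects returned by Hellixia, on the -100..100 scale.
effects = {
    ("Adverse Weather", "Delays in Scheduled Flight Departures"): 85,
    ("Schedule Padding", "Delays in Scheduled Flight Departures"): -40,
}

# Structural prior per arc: absolute value of the effect (rescaled to 0..1
# here; the exact internal scale BayesiaLab uses is an assumption).
structural_priors = {arc: abs(e) / 100 for arc, e in effects.items()}
print(structural_priors)
```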
To finalize this first causal network, we employ the Hellixia Image Generator to create a unique icon for each node, based on its comment.
Let's move on to the creation of a more complex causal network by setting Complexity to High.
The next crucial step is an in-depth examination of this automatically generated network. For example, we observe that Fueling Delays is identified as a direct cause. Interestingly, Aircraft Turnaround Time is also identified as a direct cause. This leads us to speculate that Fueling Delays could be a direct cause of Aircraft Turnaround Time, which would have an indirect effect on flight delays.
To verify this hypothesis, we select the two nodes, Fueling Delays and Aircraft Turnaround Time, and apply the Hellixia Pairwise Causal Link feature. This will help us ascertain the nature of the causal relationship between these variables.
Hellixia validates the existence of this causal relationship and accordingly updates the conditional probability distribution of Aircraft Turnaround Time. This update incorporates a DualNoisyOr() function with a coefficient of 0.75, reflecting the quantified impact of Fueling Delays on Aircraft Turnaround Time.
Following this update, our next step involves removing the direct link from Fueling Delays to Delays in Scheduled Flight Departures. Subsequently, we need to adjust the DualNoisyOr() formula to accurately reflect this change in the network's structure.
Curious to delve deeper, we explore the causes of causes: we select the relevant node and once again make use of the Causal Network Generator, this time on Fueling Delays.
Upon reviewing the newly added nodes and relationships, we identify three relationships that are incorrectly marked as negative, contrary to the descriptions in their respective link comments. To rectify this, we change the color of these links to accurately reflect their positive nature and update the DualNoisyOr() formula of Operational Efficiency accordingly.
To conclude our analysis, we're going to build a final causal network, this time using the Causal Relationships Finder function. Unlike the Causal Network Generator, which added new nodes for creating the network, this feature works directly with selected nodes. To begin with, we use the Dimension Elicitor tool to identify the 5 main Causes and 5 main Effects associated with Delays in Scheduled Flight Departures.
We proceed by selecting the 10 causes and effects, along with the Delays in Scheduled Flight Departures node. With these nodes selected, we then run the Hellixia Causal Relationships Finder to create the network.
As a result, we obtain the bow-tie network structure below.
This brings us to the end of our article. For further insights, we invite you to view the recorded webinar on this topic, which was conducted in January 2024.
Venture with us into the fascinating world of the Labrador Retriever, one of the most cherished dog breeds across the globe. Harnessing the power of Hellixia, we will delve into the various characteristics that define this breed. From its temperament and physical attributes to its historical background and unique quirks, we will construct a detailed semantic network that reveals the intricate aspects of the Labrador Retriever. Join us as we explore what makes this breed so special and universally adored.
Create a node named "Labrador Retriever".
Use the following keywords to guide the Dimension Elicitor in its node analysis: Advantages, Aims, Behaviors, Characteristics, Competitors, Components, Definitions, Descriptions, Dimensions, Elements, Factors, Features, and Traits.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Labrador Retriever" node.
Use the Embedding Generator on all remaining nodes.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph (a rough intuition for this measure is sketched after this workflow).
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
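Node Force appears throughout these workflows; as a rough, purely illustrative proxy for the idea, the sketch below scores each node by the summed mutual information of the arcs touching it. BayesiaLab's actual Node Force computation is based on the learned Bayesian network and may differ substantially:

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete variables."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(px, py)[nz])).sum())

# Hypothetical binary data: B depends on A, C is independent noise.
rng = np.random.default_rng(7)
a = rng.integers(0, 2, 1000)
b = a ^ (rng.random(1000) < 0.2).astype(int)   # B = A flipped 20% of the time
c = rng.integers(0, 2, 1000)
data = {"A": a, "B": b, "C": c}
arcs = [("A", "B"), ("B", "C")]                # arcs of a (toy) learned network

# Proxy "node force": total strength of the arcs connected to each node.
force = dict.fromkeys(data, 0.0)
for u, v in arcs:
    w = mutual_information(data[u], data[v])
    force[u] += w
    force[v] += w
print(force)  # A and B carry most of the force; C barely any
```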
Welcome to an engaging exploration of man's best friend, as seen through the lens of semantic networks. In this example, we will use the power of Hellixia to unravel the intricate web of relationships between different dog breeds.
Whether you're a canine enthusiast, a professional breeder, or simply curious about the method, you'll find this demonstration enlightening and entertaining. Let's embark on this journey to better understand the world of dog breeds.
Create a node named "Dog Breeds".
Use the Dimension Elicitor, and enter "Sample" as a single keyword to extract the 10 main breeds.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Dog Breeds" node.
Select the 10 created nodes.
Open the Dimension Elicitor and enter "Competitors" as the keyword.
Set the General Context to "Dog Breeds". This ensures that the elicitor will only consider elements related to "Dog Breeds" during the analysis.
Adjust the settings of the Dimension Elicitor to extract 10 breeds per node.
Run the Dimension Elicitor with the node name as the Main Subject of the Query.
Review the results. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Repeat the same workflow on the new nodes.
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Select all remaining nodes on your graph.
Open the Embedding Generator tool and set the Linguistic Unit to "Node Name" and "Node Comment". The linguistic unit refers to the part of the node that the Embedding Generator will use; in this case, it will analyze both the node names (i.e., the breeds of the dogs) and the node comments (i.e., the descriptions of the breeds).
Run the Embedding Generator.
Run the Maximum Weight Spanning Tree algorithm to create a semantic network.
Change the style of all nodes to "Badges". This will display the comment within each node.
Run the Dynamic Grid Layout to organize the nodes on your graph. Note that this algorithm's output is not deterministic; it may favor vertical, horizontal, or mixed orientations. Execute this layout multiple times until you find the most suitable arrangement.
Switch to Validation Mode.
As the graph you are building does not represent causal relationships, opt for the Skeleton View. This will remove all arc directions, leaving only the node connections without any specified direction.
Switch back to Modeling Mode.
Change all node styles to Discs.
Use the Symmetric Layout to organize your nodes in the graph.
Go to Validation Mode.
Conduct a Node Force analysis to evaluate the strength of associations in your graph.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Normalized Equal Distance is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Normalized Equal Distance algorithm pre-processes the data with a smoothing algorithm to remove outliers before computing equal partitions.
As a result, the algorithm is less sensitive to outliers than the Equal Distance algorithm.
The algorithm also takes into account the Minimum Interval Weight that defines the minimum prior probability of a bin.
You can adjust the default Minimum Interval Weight under Main > Menu > Window > Preferences > Discretization.
Equal Distance is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Equal Distance algorithm computes the equal distances based on the range of the variable.
This method is particularly useful for discretizing variables that share the same variation domain (e.g. satisfaction measures in surveys).
Additionally, this method is suitable for obtaining a discrete representation of the density function.
However, the Equal Distance algorithm is extremely sensitive to outliers and can generate intervals that do not contain any data points. Please see the Normalized Equal Distance algorithm, which addresses this particular issue.
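To make the contrast concrete, here is a small sketch of both approaches. BayesiaLab's actual smoothing step is not documented in detail, so the moving average below is only an assumption; a full implementation would also enforce the Minimum Interval Weight (the minimum prior probability per bin):

```python
import numpy as np

def equal_distance_bins(x, n_bins):
    """Equal-width thresholds over the raw range: simple, but a single
    outlier can stretch the range and leave empty intervals."""
    return np.linspace(x.min(), x.max(), n_bins + 1)

def normalized_equal_distance_bins(x, n_bins, window=5):
    """Rough sketch of the normalized variant: smooth the sorted data first
    (here, a moving average) so outliers influence the range less, then
    compute equal-width thresholds. The moving average is an assumption;
    BayesiaLab's exact smoothing algorithm is not published here."""
    xs = np.sort(x)
    smoothed = np.convolve(xs, np.ones(window) / window, mode="valid")
    return np.linspace(smoothed.min(), smoothed.max(), n_bins + 1)

data = np.append(np.random.default_rng(1).normal(0, 1, 500), 40.0)  # one outlier
print(equal_distance_bins(data, 5))             # range stretched up to ~40
print(normalized_equal_distance_bins(data, 5))  # range shrunk by smoothing
```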
Perturbed Tree is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.
The Perturbed Tree algorithm is designed to optimize the representation of the probabilistic dependency between a Target variable and the to-be-discretized variable. It is an extension of the Tree discretization algorithm, and it functions as follows:
Data Perturbation generates a range of datasets.
For each perturbed dataset, a univariate tree is learned to predict the Target variable with the to-be-discretized continuous variable.
Extracting the most frequent thresholds produces the final discretization.
The Perturbed Tree algorithm takes into account the Minimum Interval Weight and can reduce the number of bins if necessary. It can also be more robust than the simple Tree discretization.
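The description above maps naturally onto a few lines of code. The sketch below uses bootstrap resampling as the perturbation and a scikit-learn decision tree as the univariate learner; both are assumptions, since BayesiaLab's exact Data Perturbation scheme is not spelled out here:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def perturbed_tree_thresholds(x, y, n_bins=2, n_perturbations=50, seed=0):
    """Sketch of the Perturbed Tree idea: perturb the dataset, learn a
    univariate tree predicting the target, collect the split thresholds,
    and keep the most frequent ones. Bootstrapping is an assumed stand-in
    for BayesiaLab's Data Perturbation step."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_perturbations):
        idx = rng.integers(0, len(x), size=len(x))        # bootstrap resample
        tree = DecisionTreeClassifier(max_leaf_nodes=n_bins)
        tree.fit(x[idx].reshape(-1, 1), y[idx])
        real = tree.tree_.threshold[tree.tree_.threshold > -2]  # skip leaves
        counts.update(np.round(real, 2))
    # The n_bins - 1 most frequent thresholds define the final discretization.
    return sorted(t for t, _ in counts.most_common(n_bins - 1))

# Hypothetical binary target that switches at x = 0.5:
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = (x > 0.5).astype(int)
print(perturbed_tree_thresholds(x, y))  # one threshold close to 0.5
```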
Welcome to the vibrant section of our website dedicated to showcasing Hellixia's semantic network examples, where analysis takes on a new dimension. This part of our site is a hub for curious minds eager to explore the complex interconnections within various domains such as philosophy, literature, cinema, song lyrics, and more.
With the help of Hellixia, we unravel the intricate relationships between ideas, themes, characters, and authors. From examining the moral quandaries in philosophical works like Machiavelli's "The Prince" or Hobbes' "Leviathan" to uncovering the essence of Shakespeare's "Hamlet" and Flaubert's "Madame Bovary," our analyses reach new depths.
But our exploration doesn't stop at books. We venture into the world of cinema, dissecting masterpieces like "Apocalypse Now," and dive into the poignant lyrics of songs by artists such as Nick Cave. Through Hellixia's power, we bring to life semantic networks that vividly illustrate the multifaceted connections and underlying themes in these works.
Whether you're a lover of classic literature, a cinema enthusiast, or a philosopher at heart, this section invites you to explore, learn, and engage with content in a way that transcends traditional analysis. Join us in this exciting journey where technology and creativity intersect, providing unique insights and fostering a deeper understanding of the world around us.
Step with us into the realms of power, strategy, and human nature as we set our sights on Niccolò Machiavelli's The Prince. Crafted in the crucible of Renaissance Florence, this timeless piece of literature stands as one of the most impactful texts in political philosophy, its influence reaching far beyond its era.
Machiavelli's frank, pragmatic exploration of power and statecraft provides a view of leadership that is as intriguing as it is controversial, and understanding his complex narrative requires a nuanced approach. To achieve this, we enlist the capabilities of Hellixia, BayesiaLab's subject matter assistant.
Using Hellixia's ability to generate intricate semantic networks, we can delve deep into the narrative threads of The Prince, illuminating the interconnected concepts, themes, and motifs that form the foundation of Machiavelli's groundbreaking treatise.
From the cunning strategies of political maneuvering to the paradoxical virtues of a successful leader, we'll explore the sophisticated landscape of The Prince, powered by the detailed semantic analysis provided by Hellixia. So, come and join us on this captivating journey as we uncover the layers of Machiavelli's enduring masterpiece.
Start by creating the node "The Prince".
Use the Dimension Elicitor, employing a broad array of keywords like "Characteristics", "Contributions", "Motivations", "Influencers", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below). Set the General Context to "Niccolò Machiavelli Political Philosophy".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, disregard the "The Prince" node and run the Embedding Generator on all remaining nodes to apprehend the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges to ensure each node's comment is visible. Then, apply the Dynamic Grid Layout to position the nodes on your graph; remember that this algorithm is not deterministic, and its orientation—vertical, horizontal, or mixed—is random. You might need to execute this layout several times to obtain an arrangement that aligns with your taste.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
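A brief aside on what Node Force measures: in BayesiaLab, the force of an arc reflects the Kullback-Leibler divergence between the joint distribution represented by the network with the arc and the one obtained after removing it, and the force of a node aggregates the forces of the arcs connected to it. Schematically (the exact definitions are in the BayesiaLab documentation):

```latex
% Arc force: information lost when the arc X -> Y is removed from network B
F(X \rightarrow Y) = D_{KL}\!\left(P_{B} \,\middle\|\, P_{B \setminus \{X \rightarrow Y\}}\right)
                   = \sum_{\omega} P_{B}(\omega) \, \log \frac{P_{B}(\omega)}{P_{B \setminus \{X \rightarrow Y\}}(\omega)}

% Node force: the sum of the forces of the arcs incident to X
F(X) = \sum_{A \,\in\, \mathrm{arcs}(X)} F(A)
```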
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
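As a rough intuition for this step, variables whose embeddings lie close together in the semantic space end up in the same factor. The sketch below approximates the idea with agglomerative clustering over the toy vectors from the earlier sketch; BayesiaLab's Variable Clustering actually operates on the learned network itself, so treat this purely as an illustration.

```python
# Rough analogue of grouping semantically similar variables: agglomerative
# clustering on embedding vectors. Names and vectors are placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

names = ["Political Power", "Virtue (Virtu)", "Fortune (Fortuna)", "Statecraft"]
vectors = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.7, 0.3, 0.5],
    [0.9, 0.2, 0.2],
])

# Ward linkage over Euclidean distances between embeddings.
Z = linkage(vectors, method="ward")

# Cut the dendrogram into (here) two clusters of related concepts.
labels = fcluster(Z, t=2, criterion="maxclust")
for name, label in zip(names, labels):
    print(f"Factor_{label}: {name}")
```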
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
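Taboo is a score-based search: it repeatedly moves to the best neighboring structure while keeping a short "tabu" list of recent moves, which is what lets it escape local optima. The sketch below shows the generic pattern with a toy scoring function; BayesiaLab's actual Taboo learner (its score, operators, and constraint handling) is considerably more sophisticated, so this is only an illustration of the search strategy.

```python
# Generic score-based tabu search over arc sets, to give a feel for what a
# "Taboo"-style structure learner does. Everything here is illustrative.
import itertools

def neighbors(arcs, nodes):
    """All structures one arc-addition or arc-deletion away (arc reversal and
    acyclicity checks are omitted for brevity)."""
    for move in itertools.permutations(nodes, 2):
        yield (move, arcs - {move}) if move in arcs else (move, arcs | {move})

def tabu_search(nodes, score, iterations=100, tabu_size=10):
    current = frozenset()                     # start from the empty graph
    best, best_score = current, score(current)
    tabu = []                                 # recently touched arcs
    for _ in range(iterations):
        candidates = [(mv, s) for mv, s in neighbors(current, nodes)
                      if mv not in tabu]
        if not candidates:
            break
        # Move to the best neighbor even if it scores worse than the current
        # structure: this is what lets tabu search climb out of local optima.
        move, current = max(candidates, key=lambda c: score(c[1]))
        tabu = (tabu + [move])[-tabu_size:]
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best

# Toy score: reward arcs of a hypothetical "true" structure, penalize extras.
TRUTH = {("A", "B"), ("B", "C")}
score = lambda arcs: len(arcs & TRUTH) - 0.5 * len(arcs - TRUTH)
print(sorted(tabu_search(["A", "B", "C"], score)))   # [('A', 'B'), ('B', 'C')]
```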
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Welcome to our dedicated section, where we leverage Hellixia, BayesiaLab's new subject matter assistant, to explore the realm of philosophical essays. Here, we unpack the thoughts and arguments contained within works such as Niccolò Machiavelli's "The Prince," Thomas Hobbes' "Leviathan," and John Locke's "Two Treatises of Government." Through our analyses, we aim to construct semantic networks illuminating the complex webs of ideas and ideologies these essays present.
As we journey through each essay, we'll uncover the layers of philosophical discourse, revealing insights that have shaped political and moral thought for centuries. Join us as we navigate the pathways of these seminal philosophical works and gain a fresh understanding of their significance.
Embarking on an exploration of one of the most influential works in the realm of political philosophy, we turn our attention to Thomas Hobbes' Leviathan. Penned in a time of civil strife, Leviathan serves as a cornerstone of Western political thought, offering insights into the nature of social contract, sovereignty, and the legitimacy of political power.
Hobbes' arguments and reasoning, profound yet intricate, necessitate a thoughtful and systematic approach to understanding. That is where Hellixia, BayesiaLab's subject matter assistant, comes into play. With the power to construct detailed semantic networks, Hellixia provides us with a uniquely comprehensive way to interpret and examine the depth of Leviathan.
Utilizing these semantic networks, we will delve into the complex themes and ideas that Hobbes presents, mapping out the interconnections and dissecting the concepts that lie at the heart of Leviathan. From the notions of the state of nature and the social contract to the role and extent of sovereignty, our journey through this foundational text, powered by Hellixia's semantic analysis, promises a fresh perspective and new insights into Hobbes' grand political treatise.
Start by creating the node "Leviathan".
Use the Dimension Elicitor, employing a broad array of keywords like "Points", "Considerations", "Approaches", "Concepts", and many more, to conduct an exhaustive analysis of the book (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Leviathan" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Prepare to delve into the richly complex world of William Shakespeare's Hamlet, one of the most influential works in English literature. With its iconic characters and timeless themes of power, revenge, morality, and madness, Hamlet continues to captivate audiences centuries after its creation.
To navigate the intricacies of this monumental work, we will create and explore semantic networks, providing a unique lens through which to view and understand Hamlet.
Through these semantic networks, we'll uncover the deep interconnections between the play's characters, themes, and motifs, illuminating the layered narrative and providing fresh insights into this enduring classic. Join us on this enlightening journey as we explore Hamlet in a way you've never seen before, brought to life through the power of Hellixia's semantic analysis.
Start by creating the node "Hamlet".
Use the Dimension Elicitor, employing a broad array of keywords like "Developments", "Ideas", "Perspectives", "Milestones", and many more, to conduct an exhaustive analysis of the play (see the keywords that are listed in the Class Editor below).
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Hamlet" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our comprehensive exploration of Montesquieu's seminal work, "The Spirit of the Laws." Through the lens of Hellixia, we will embark on an intellectual journey to dissect and understand this monumental text, which remains a cornerstone in the realms of political science and philosophy.
In this section, we will conduct a detailed holistic analysis, delving deep into the complex layers that constitute this influential work. Focusing on various aspects like Concepts, Values, Impacts, and Perspectives, we aim to forge a rich, multidimensional exploration of Montesquieu's political theory. This analysis explains in depth Montesquieu's views on systems of governance, law, and the underlying principles that drive societies.
Join us as we traverse the intricate pathways of "The Spirit of the Laws", illuminating the timeless wisdom encapsulated within its pages and unraveling the broader implications and influences of Montesquieu's revolutionary thoughts on the modern world.
Start by creating the node "The Spirit of the Laws, by Montesquieu."
Use the Dimension Elicitor with this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, Topics, and Values.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "The Spirit of the Laws, by Montesquieu" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on their semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and apply Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Welcome to our specialized section on creating Causal Semantic Networks. This segment is dedicated to showcasing the process and benefits of constructing networks that represent the semantic relationships between different factors and set the causal orientations that drive those relationships. Through various case studies and demonstrations, we will illustrate how Hellixia, our subject matter assistant, aids in identifying and defining these causations. From historical events to scientific phenomena, these causal semantic networks will provide a rich, contextual understanding of complex systems. Let's embark on this journey of exploration and insight, seeking to make the invisible visible and the complex comprehensible.
Step into a realm where two of the Enlightenment's most profound thinkers, Thomas Hobbes and John Locke, are set side by side for scrutiny. This section is dedicated to a comparative analysis of these philosophical giants using the insights provided by Hellixia. While both philosophers tackled the nature of the social contract, governance, and human nature, their conclusions often diverged, leading to rich philosophical debates that resonate today. With the aid of semantic networks, we'll untangle the intricate threads of their arguments, highlighting areas of agreement and divergence. This exploration promises a study of their philosophies and a deeper understanding of the broader political and ethical landscape they helped shape. Join us in this captivating journey as we traverse the intricate terrains of Hobbesian and Lockean thought.
Start by creating the node "Thomas Hobbes and John Locke".
Use the Dimension Elicitor with a broad array of keywords like "Perspectives, Rules, Divergences, Ideas, Topics, Similarities and Differences", and set the General Context to "Political Philosophy."
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Thomas Hobbes and John Locke" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
In this section, we harness the power of Hellixia, crafting a temporal and causal semantic network to delve into the relationships between 25 philosophers across time. With the help of Hellixia's Comment Generator, we construct a Temporal Indices Dictionary, enabling us to set temporal constraints.
Begin by creating a node named "Influential Philosophers".
Utilize the Dimension Elicitor with "Samples" as Keyword. Adjust the Responses per Keyword setting to 25 to ensure a broad collection of answers.
Review the dimensions returned by Hellixia, eliminating any that seem redundant or irrelevant to your analysis.
Select all nodes.
Run the Comment Generator with "Years" as the Keyword, setting the Responses per Keyword to 1, and checking the Node Name as the Main Subject of the Query. Set the Output Settings to Dimension Name. This step replaces the existing comments tied to the nodes with the primary date associated with each philosopher.
Review the comments to ensure their accuracy. Convert BC dates to negative values.
Export the Node Comments as a Dictionary and associate it with Node Temporal Indices. These indices will be automatically used as structural constraints to orient the arcs from past to future.
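To make the mechanism concrete: the dictionary simply maps each node name to an integer year, with BC dates expressed as negative numbers, and an arc is then only admissible from a lower index to a higher one. A minimal sketch, using illustrative philosophers with approximate birth years:

```python
# Illustrative temporal indices dictionary: node -> year, with BC dates as
# negative integers. Arcs are only allowed from past to future.
temporal_indices = {
    "Socrates":  -470,   # c. 470 BC
    "Plato":     -428,   # c. 428 BC
    "Aristotle": -384,   # 384 BC
    "Descartes":  1596,
    "Kant":       1724,
}

def arc_allowed(cause, effect):
    """Temporal constraint: a cause cannot postdate its effect."""
    return temporal_indices[cause] <= temporal_indices[effect]

print(arc_allowed("Plato", "Aristotle"))   # True
print(arc_allowed("Kant", "Socrates"))     # False
```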
Select all nodes.
Run the Comment Generator again, this time using "Field" as the Keyword and "Philosophy" as the General Context. Set Responses per Keyword to 2, set the Node Name as the Main Subject of the Query, and set the Output Settings to Dimension Name. Make sure to check the box for Append Output to Current Comment. This action appends each philosopher's two main fields of study to the node's existing comment.
Use the Maximum Weight Spanning Tree algorithm to construct the Causal/Temporal Semantic Network.
Select all nodes and change the node styles to Badges, which allows the display of each node's comment.
Run the Genetic Grid Layout algorithm to efficiently organize the nodes on your graph, reflecting the causal/temporal directionality of the connections.
Venture into the haunting narrative of "The Horla," Guy de Maupassant's masterful exploration of sanity's fragile line and the unknown's unsettling embrace. In this section, with Hellixia as our analytical compass, we will journey through two distinct facets of this chilling tale:
Narrative Analysis: We'll dissect the plot intricacies, key events, and character dynamics, laying bare the psychological currents that drive this unsettling story forward.
Holistic Analysis: Beyond the immediate narrative, we'll step back to capture the broader themes, motifs, and overarching sentiments that give "The Horla" its enduring resonance.
Together, let's plunge into the depths of this classic horror story, using semantic networks to illuminate its layers and offer fresh insights into Maupassant's unsettling vision.
In this section, we'll unravel the plot intricacies, key events, and character dynamics that form the backbone of Maupassant's haunting tale. Through the lens of Hellixia, witness the story's unfolding as we navigate its chilling corridors.
Start by creating the node "The Horla, by Guy de Maupassant."
Use the Dimension Elicitor, employing the keywords "Context, Developments, Entities, Events, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships," to conduct an exhaustive narrative analysis of the book.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "The Horla, by Guy de Maupassant" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative, we now embark on a holistic exploration of "The Horla." With Hellixia's insights, we'll delve into the deeper themes, emotions, and overarching concepts that permeate Maupassant's masterpiece, capturing its essence beyond just the storyline.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Welcome to the Holistic Analysis of Salman Rushdie's "Midnight's Children," facilitated by the advanced tools of Hellixia. In this comprehensive exploration, we follow our usual workflow to delve into the multifaceted narrative, characters, and themes of Rushdie's iconic work.
Adding a new dimension to our analysis, we will now also utilize the innovative Hellixia Report Analyzer feature. This state-of-the-art tool is adept at providing a useful summary of the novel's domain, focusing on the nuanced analysis of node forces and the strengths of the relationships within the story's network.
By integrating this feature into our holistic analysis, we aim to not only maintain our thorough examination but also enhance it with a succinct and insightful summary, capturing the essence of Rushdie's narrative in a way that complements our deep dive into the text.
Start by creating the node "Midnight's Children, by Salman Rushdie"
Use the Dimension Elicitor, employing a broad array of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Theses, and Values.
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Midnight's Children, by Salman Rushdie" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and alter the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Switch to Validation Mode.
Generate the Relationship Report. This report returns two key pieces of information: the Node Force, which indicates the influence and importance of each node within the network, and the strength of all relationships as described in the network. This provides a comprehensive view of how nodes are interconnected and the significance of these connections.
Run the Report Analyzer: With the Relationship Report in hand, proceed to run the Report Analyzer. This tool is designed to synthesize the data into a narrative form. It interprets the node forces and relationship strengths to create a story that summarizes the main dynamics of the domain. This narrative provides a digestible and insightful summary of the complex relationships and key elements within the network.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors.
Use the Export Descriptions function and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
After our initial exploration using the Report Analyzer on the network of "manifest variables," we are now set to delve deeper. Our next step involves generating a new report, this time concentrating on the hierarchical network – the domain of latent variables.
Immerse yourself in George Eliot's "Middlemarch," a literary masterpiece that profoundly looks into 19th-century provincial life in England. Leveraging the capabilities of Hellixia, our journey into this classic will be navigated through semantic networks, dividing our exploration into two distinct stages:
Narrative Analysis: By examining the plot intricacies, character dynamics, and the socio-personal currents influencing them, we'll draw deeper connections within the narrative.
Holistic Analysis: Stepping back from the immediate narrative, Hellixia will guide us through a broader examination of the novel. Tapping into diverse categories such as Achievements, Emotions, Themes, and Values, we aim to capture the multifaceted essence of "Middlemarch."
Join us in this exploration, where we aim to unravel the nuances and complexities of "Middlemarch" that continue to resonate with readers across generations.
From the unfolding Events to pivotal Milestones and distinct Locations to underlying Motifs, we'll spotlight the interwoven Relationships among the novel's Entities. Guided by essential keywords like Context, Developments, and Progressions, this section seeks to unveil the narrative depth and intricacies of Eliot's masterpiece.
Start by creating the node "Middlemarch."
Use the Dimension Elicitor, employing the keywords "Context, Developments, Entities, Events, Keywords, Locations, Milestones, Motifs, Progressions, and Relationships," to conduct an exhaustive narrative analysis of the book. Set the General Context to "George Eliot novel".
Inspect the dimensions returned by Hellixia and eliminate any that seem superfluous or unrelated to your analysis. Next, exclude the "Middlemarch" node and run the Embedding Generator on all remaining nodes to capture the semantic associations of their names and comments.
Use the Maximum Weight Spanning Tree algorithm to generate a semantic network.
Change node styles to Badges so that each node's comment is visible. Then apply the Dynamic Grid Layout to position the nodes on your graph; note that this algorithm is not deterministic, and its orientation (vertical, horizontal, or mixed) is random, so you may need to run the layout several times to obtain an arrangement you like.
Switch over to Validation Mode and select Skeleton View. Since your network doesn't represent causal relations, Skeleton View will maintain only node connections without indicating a direction.
Return to Modeling Mode and change the node styles to Discs.
Use the Symmetric Layout and switch to Validation Mode to run a Node Force analysis.
Execute Variable Clustering: This operation will categorize analogous variables based on semantic relationships.
Open the Class Editor and run Class Description Generator to generate descriptive names for the factors in question. Use the Export Descriptions function, and save the newly created descriptions.
Return to Modeling Mode and run Multiple Clustering to generate latent variables.
Run the structural learning algorithm Taboo. Ensure the "Delete Unfixed Arcs" option is enabled.
Use the descriptions you exported earlier as a Dictionary to rename the latent variables you've created.
Switch to Validation and run Node Force.
Given the size of this network, we can focus on the upper level of the hierarchical network. Below is the Node Force analysis on these factors only, i.e., excluding all manifest variables before the analysis.
Transitioning from the narrative details, our next phase delves into the broader essence of "Middlemarch." Here, we venture beyond the story to understand its Achievements, Emotions, Themes, and Values, capturing the multifaceted heart of Eliot's work. This comprehensive exploration offers a panoramic view of the novel's enduring impact and significance.
Follow the workflow outlined in the Narrative Analysis section, but use this set of keywords: Achievements, Characteristics, Components, Concepts, Considerations, Contributions, Domains, Elements, Emotions, Features, Feelings, Forces, Ideas, Impacts, Perspectives, Purposes, Sentiments, Subjects, Themes, Theses, and Values.
Welcome to our Causal Bayesian Networks section, where we leverage Hellixia as a Subject Matter Assistant for constructing Causal Bayesian Networks. These networks feature directional arcs that convey causality. In contrast to Causal Semantic Networks, which primarily offer qualitative insights by highlighting semantic causal relationships between variables, Causal Bayesian Networks offer a dual approach, encompassing both qualitative and quantitative aspects. They serve not only to improve our understanding of a domain, but also to enable probabilistic and causal inference.
A Causal Knowledge Discovery Case Study in Dermatology
Skin hyperpigmentation is a common condition where patches of skin become darker than the surrounding skin. This conceptual example explores opportunities for developing new treatments and therapies. The starting point of any such endeavor should be a thorough causal understanding of the problem domain.
In this example, we leverage the capabilities of Hellixia, BayesiaLab's new subject matter assistant, to analyze the cause-and-effect interplay related to this skin condition.
Our focus is on constructing a comprehensive causal semantic network that highlights the factors influencing the onset and severity of hyperpigmentation. From genetic predispositions and environmental triggers to lifestyle habits, we search for the connections that are relevant to this condition. This exploration offers insights into the dynamics of skin hyperpigmentation.
Create a node named "Skin Hyperpigmentation with Visible Light."
Use the following keywords to guide the Dimension Elicitor's node analysis: Causes, Effects, Milestones, and Mechanisms, and set the General Context to "Dermatology."
Inspect the dimensions suggested by Hellixia. Any dimensions that are irrelevant or redundant should be removed from your analysis.
Exclude the "Skin Hyperpigmentation with Visible Light" node.
Change the style of all nodes to "Badges". This will display the comment within each node.
Given that the keywords 'Causes' and 'Effects' already embody causal semantics, our primary task now is to manually scrutinize the relationships between the nodes generated by the keywords "Mechanisms" and "Milestones". Generating embeddings and using structural learning can be beneficial during this analysis phase.
Manually draw arcs between the nodes to denote a causal relationship.
Select all arcs and utilize Hellixia's Explanation of Causal Arcs. If Hellixia concurs with the proposed causal relationship, it will provide an explanation, which will then be associated with the arc comments.
Run the Genetic Grid Layout: This will arrange the nodes on your graph while considering the causal directions of the connections. It positions the nodes so that the causal flow, as represented by the directed arcs, generally goes from the top of the graph toward the bottom, thereby providing a clear, hierarchical visual representation of the causal relationships.
Atopic Dermatitis, commonly known as eczema, manifests as red, itchy, and occasionally painful rashes, affecting both children and adults to varying degrees.
This section examines the many facets of atopic dermatitis, where genetic, environmental, and immunological factors converge to influence its development and progression. To better understand this complex disease, we use Hellixia to generate causal Bayesian networks, which provide a structured framework for deciphering cause-and-effect relationships.
But first, we'll start with a semantic analysis of the domain to get an overview of the main concepts, variables, and relationships in the field of atopic dermatitis.
We finally select the factors only (i.e., we focus on the higher level of this hierarchical network), and use the Hellixia Report Analyzer to generate a concise summary of the Relationship Analysis Report.
Having gained an overall understanding of the domain through semantic networks, we now move on to the construction of Causal Bayesian Networks using Hellixia's new capabilities that will be released in BayesiaLab 11.2.
We start by creating a node called "Atopic Dermatitis Mechanism", then select the Causal Network Generator feature.
After one or two minutes (the prompt is indeed quite complex), we obtain a fully specified Causal Bayesian Network (graph and probabilities). This network is characterized by causally oriented arcs, each accompanied by a concise explanation of the causal relationship and an estimate of the causal effect, scaled between -100 (shown in red) and 100 (shown in blue). To translate these causal effects into conditional probability tables, we use a new BayesiaLab formula, DualNoisyOr(), specially designed to integrate positive and negative effects between Boolean variables.
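The exact definition of DualNoisyOr() ships with BayesiaLab, but the underlying idea can be sketched: positive effects act like independent noisy-OR causes that push the child toward true, while negative effects act as independent inhibitors. Below is one hypothetical way effects scaled to [-100, 100] could be combined; it illustrates the dual noisy-OR idea, not BayesiaLab's formula.

```python
import math

# Hypothetical dual noisy-OR: positive parents raise P(child = true) in
# noisy-OR fashion, negative parents independently inhibit the activation.
# This is NOT BayesiaLab's DualNoisyOr() definition, only a plausible sketch.
def dual_noisy_or(effects, leak=0.01):
    """effects: causal effects in [-100, 100] of the parents that are true."""
    pos = [e / 100.0 for e in effects if e > 0]
    neg = [-e / 100.0 for e in effects if e < 0]
    # Noisy-OR activation from the leak term and the active positive causes.
    p_on = 1.0 - (1.0 - leak) * math.prod(1.0 - p for p in pos)
    # Each active negative cause independently dampens the activation.
    return p_on * math.prod(1.0 - q for q in neg)

# One strong positive cause (+80) partially countered by an inhibitor (-50):
print(round(dual_noisy_or([80, -50]), 3))   # 0.401
```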
Naturally, the networks generated by Hellixia MUST undergo rigorous evaluation by Subject-Matter Experts. This verification is crucial not only from a qualitative point of view to ensure that the network accurately represents real causal relationships but also from a quantitative point of view to confirm the relevance of the suggested causal effects.
Let's delve further into this domain by exploring the underlying causes of "Microbial Infection." To do this, we select the respective node in the network and proceed to the Causal Network Generator.
Displayed below is the generated causal network, showcasing the expanded view with detailed aspects of Microbial Infection. The yellow nodes are common to both the original and expanded networks, the grey nodes represent the original network nodes only, and the red nodes indicate the newly added dimensions specific to microbial infection.
We finally use the Hellixia Report Analyzer to generate a concise summary of the (Causal) Relationship Analysis Report.
We will now adopt a different workflow to construct a Causal Network for Atopic Dermatitis. We start by using Hellixia's Dimension Elicitor to identify relevant dimensions. With these nodes generated, we diverge from our usual practice of generating embeddings for semantic networks. Instead, we utilize Hellixia's new Causal Relationships Finder feature to automatically create a Causal Network based on our set of selected nodes.
We select a range of keywords to guide the Dimension Elicitor process in Hellixia, encompassing various aspects of the domain under study. These keywords include 'Accelerators,' 'Catalyzers,' 'Causes,' 'Drivers,' 'Mechanisms,' 'Consequences,' 'Symptoms,' 'Inhibitors,' 'Moderators,' 'Preventers,' and 'Treatments.'
We run the Causal Relationships Finder on the nodes elicited for the Atopic Dermatitis Mechanism. This tool examines potential causal connections among these nodes and, if required, generates latent variables to enhance the network's explanatory power.
Similar to the Causal Network Generator, the tool does more than identify causal links; it also quantifies the causal effects, which are represented on a scale ranging from -100 (indicated in red) to 100 (indicated in blue).
We conclude this section by utilizing the Hellixia Report Analyzer, which efficiently generates a concise summary of the (causal) Relationship Analysis Report for this latest network.
We create the node "Atopic Dermatitis" and then go through our usual workflow for creating a semantic network and then a hierarchical semantic network (see previous sections, e.g., ) to perceive the semantic landscape surrounding atopic dermatitis and lay the foundations for a deeper understanding of its underlying dynamics.
In this section, we demonstrate how Hellixia can be utilized to form a semantic network from the 145 keywords provided by the Dimension Elicitor, illustrating the semantic connections between these keywords.
Create Nodes: create a node for each keyword by importing a CSV file with all the keywords on the first line (one per column), a row of '0's on the second line, and a row of '1's on the third line. This structure lets BayesiaLab interpret each column as a variable.
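A small script can generate such a CSV directly from the keyword list (the file name and keywords below are placeholders):

```python
# Write a CSV that BayesiaLab can import as one variable per keyword:
# row 1 = keyword names, row 2 = all zeros, row 3 = all ones.
import csv

keywords = ["Achievements", "Concepts", "Emotions", "Themes"]  # ... up to 145

with open("keywords.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(keywords)
    writer.writerow([0] * len(keywords))
    writer.writerow([1] * len(keywords))
```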
Generate Embeddings: Once you have created your nodes, select them all and use the Embedding Generator. This tool will capture the semantic meaning associated with the node names.
Learn Semantic Relationships: Use the Maximum Weight Spanning Tree algorithm to learn the semantic relationships between these nodes (variables). This algorithm retains the most significant connections between the nodes, forming a tree that maximizes the total connection weight.
Automatic Node Positioning: Apply the Symmetric Layout algorithm to the nodes for automatic positioning. This will organize your nodes in a visually clear and understandable way.
Switch to Validation Mode and conduct a Node Force Analysis.