If you want to learn about the practical applications of Bayesian networks, you've come to the right place.
You will find countless real-world examples and many hours of tutorial videos in this section.
Most examples include the corresponding raw data in CSV format plus the associated BayesiaLab XBL files.
This allows you to replicate all the steps shown in the examples.
Dr. Lionel Jouffe & Stefan Conrady
In this webinar, you'll learn about a new approach to causality analysis using Bayesian networks. We will showcase the innovative causal analysis features of Hellixia, BayesiaLab's new subject matter assistant. All of these new Hellixia features are part of BayesiaLab 11.2, which is now available.
Whether you're a data scientist, researcher, or analytics professional, understanding the causal direction between variables is critical to fully understanding a given problem domain. However, despite many advances in machine learning in recent years, discovering causalities purely from data remains elusive.
You may be familiar with our "Where is my bag?" example, which received much publicity through Judea Pearl's best-selling book, The Book of Why. This example was about correctly reasoning about the probability of receiving a piece of checked luggage after landing at a destination airport.
The key to dealing with this problem was encoding our limited causal knowledge into a Bayesian network, which then allowed us to perform inference correctly. In other words, we provided our knowledge, and then the computer, i.e., the BayesiaLab software, could reason for us.
Today, we once again use an example from the field of air travel. Now, we are interested in the causes and consequences of flight delays. However, we are not relying on any knowledge we might already have to build a Bayesian network model. Instead, we leverage BayesiaLab's new subject matter assistant, Hellixia, to assemble any knowledge that may be accessible through Large Language Models, such as ChatGPT.
To explore the topic of flight delays, we start by evaluating pairwise relationships between variables that are presumably relevant, such as "Holidays" and "Flight Delays" or "Flight Delays" and "Flight Safety."
Without using any data from which a network could potentially be machine-learned, Hellixia utilizes Large Language Models to investigate the presence of causal relationships between selected pairs of variables. This feature determines whether a causal link exists and provides an estimated measure of causal effect, including both positive and negative impacts.
Establishing causality requires much more than discovering associations between variables. It requires that we identify the correct direction and the nature of the influence between the variables of interest. Hellixia can directly tap into the subject matter expertise contained in Large Language Models and use that knowledge to assign causal structural priors to a set of selected arcs. BayesiaLab's Structural Learning algorithms can then use these priors to machine-learn causal Bayesian Networks.
Once causal directions have been established for specific arcs within a Bayesian network, it becomes important to communicate their meaning to stakeholders. Hellixia can elaborate on the causal mechanisms represented by arcs in the Bayesian network, thereby helping stakeholders understand the underlying causal processes.
In our example about flight delays, Hellixia would add comments to the causal arcs shown in the following network, which includes variables such as Crew Availability, Air Traffic, Flight Scheduling, and Passenger Boarding.
Creating a new Bayesian network of a complex domain exclusively from human expert knowledge can be challenging and time-consuming. In this webinar, we'll demonstrate how Hellixia can substantially simplify and accelerate this process. Given a particular variable of interest as a starting point, Hellixia can automatically create nodes and build a comprehensive and fully specified Bayesian network representing the problem domain. Fully specified means that both the causal network structure and the parameters are obtained by Hellixia. As a result, experts can review and build upon such a Hellixia-generated model instead of having to start from zero.
In our example domain, the starting point is "Delays in Scheduled Flight Departures." Hellixia finds causes, such as "Air Traffic" and "Weather Conditions," as well as consequences, e.g., "Passenger Satisfaction" and "Operational Costs."
In addition to building a causal Bayesian network from scratch, as with the #causal-network-generator function above, we can use Hellixia to create a causal network from a defined set of variables (e.g., created with Hellixia's Dimension Elicitor).
Here, we provide a set of causes and consequences and let Hellixia determine how they are causally related.
Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel holds a Ph.D. in Computer Science from the University of Rennes and has worked in Artificial Intelligence since the early 1990s. While working as a Professor/Researcher at ESIEA, Lionel started exploring the potential of Bayesian networks.
After co-founding Bayesia in 2001, he and his team have been working full-time on the development of BayesiaLab. Since then, BayesiaLab has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. It enjoys broad acceptance in academic communities, business, and industry.
Dr. Lionel Jouffe, Bayesia S.A.S. & Stefan Conrady, Bayesia USA
In the realm of Bayesian Belief Networks, integrating advanced tools can profoundly enhance the modeling, understanding, and interpretation of complex systems. This presentation introduces Hellixia, BayesiaLab's cutting-edge subject matter assistant powered by ChatGPT, as a game-changing tool in this domain.
Discover how Hellixia assists users in identifying pertinent dimensions/nodes within any problem domain. Beyond identification, we delve into exploiting the Independence of Causal Influence principle. This pivotal concept provides a strategic avenue for model simplification, resulting in expedited model-building phases.
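The Independence of Causal Influence principle is most commonly realized as a noisy-OR gate, which replaces a conditional probability table that grows exponentially with the number of parents by a single "causal strength" parameter per parent. The sketch below illustrates the idea with made-up probabilities; it is not Hellixia or BayesiaLab code:

```python
# Noisy-OR under Independence of Causal Influence: each active parent i
# independently fails to produce the effect with probability (1 - p_i);
# an optional leak term covers causes not modeled explicitly.
def noisy_or(parent_probs, active, leak=0.0):
    """P(Y=1 | active parents), assuming independent causal influence."""
    p_no_effect = 1.0 - leak
    for p, on in zip(parent_probs, active):
        if on:
            p_no_effect *= 1.0 - p
    return 1.0 - p_no_effect

# Illustrative: two causes with strengths 0.8 and 0.5, no leak.
p_both = noisy_or([0.8, 0.5], [True, True])    # 1 - 0.2 * 0.5 = 0.9
p_first = noisy_or([0.8, 0.5], [True, False])  # 0.8
```

With n binary parents, this reduces the parameter count from 2^n rows to n strengths plus a leak, which is what makes the model-simplification step in the presentation possible.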
The presentation further explores Hellixia's capability to generate embeddings, enabling the discovery of semantic relationships between nodes and facilitating the seamless creation of semantic networks for rapid domain comprehension.
Finally, we'll explore how Hellixia can assist us in identifying causal relationships between nodes.
To provide a hands-on perspective, attendees will be treated to a live demonstration of these features using BayesiaLab, offering a tangible insight into the transformative potential of Hellixia in Bayesian Belief network modeling.
For each example, we provide an excerpt from Pearl's book to introduce the problem domain and then offer a solution in the form of a Bayesian network.
We share all networks in BayesiaLab's XBL format and publish them as interactive WebSimulators so you can experiment with the models without installing BayesiaLab.
Dr. Lionel Jouffe, Bayesia S.A.S. & Stefan Conrady, Bayesia USA
In this webinar, Dr. Lionel Jouffe, Bayesia's CEO, explains all the development steps from the initial medical knowledge elicitation to releasing a smartphone app for diagnostic decision support — all within the framework of Bayesian networks and BayesiaLab. In this context, you will learn about Bayesia's new REST API and how it can make inferences with Bayesian networks available to any application.
Over the past 12 months, the growing number of COVID-19 infections has led to an urgent need for the reliable detection of new infections. Given the similarity of their symptoms, the common cold, influenza, and COVID-19 have remained difficult to differentiate.
Bayesian networks are recognized as powerful tools for risk analysis and decision support and have become a popular model for clinical decision support systems (CDSS). Bayesian networks are particularly suitable for healthcare as they can
model complex problems with causal dependencies under high degrees of uncertainty;
combine different sources of information, including empirical data and expert opinion;
have an interpretable graphical structure;
model interventions in diagnostic and prognostic ways.
Based on the success of the COVID-19 WebSimulator implementation, we joined forces with Dr. Jordi Ochando at the Spanish Society for Immunology and Dr. Manisha Brahmachary at the Mount Sinai School of Medicine to further refine our web-based diagnostic tool and turn it into an app for iOS and Android. Numerous medical research institutes collaborated in our efforts, including Hôpital Foch (France), the University of Mons (Belgium), and the Hospital Universitario Donostia (Spain).
You can install the app via the following links:
We are starting this new section to share and explain Bayesian networks inspired by Judea Pearl's The Book of Why.
In this context, the Bayesia team has been at the forefront of developing diagnostic expert systems for COVID-19 using the Bayesian network framework. In March 2020, we assembled a working group of medical researchers, epidemiologists, and clinicians to develop a Bayesian network for distinguishing COVID-19 from other respiratory infections. Since then, we have maintained a public WebSimulator and continuously updated it with the latest understanding of the evolving pandemic.
We present this workflow as a platform for compiling and encoding knowledge from domain specialists. It aggregates the available medical knowledge into a Bayesian network, i.e., an updatable expert system. Clinicians and patients can then access the Bayesian network expert system through various interfaces, such as the WebSimulator or the smartphone app.
The WebSimulator has become a popular platform for publishing BayesiaLab models via a web interface to any audience, from local collaborators to the general public. This way, researchers can provide access to a newly developed Bayesian network so that anyone with an Internet connection can use that model for simulation.
At the core of the WebSimulator is an API that allows you to access the learning and inference functions of BayesiaLab programmatically. As of the beginning of 2021, we've also made this API available as a RESTful service. Your program code can make an HTTP call to Bayesia's API server to perform inference on a Bayesian network model.
For the first implementation of Bayesia's new REST API in a smartphone app, it was an obvious choice for our team to create a COVID-19 diagnostic tool, building on our existing research. This new app allows individuals to perform a COVID-19 self-assessment guided by BayesiaLab's Adaptive Questionnaire algorithm. Since its launch in February, the app has already produced tens of thousands of assessments based on self-reported symptoms.
For Android:
For iOS:
"Let me give you an example in which probabilities make all the difference. It echoes the public debate that erupted in Europe when the smallpox vaccine was first introduced. Unexpectedly, data showed that more people died from smallpox inoculations than from smallpox itself. Naturally, some people used this information to argue that inoculation should be banned, when in fact it was saving lives by eradicating smallpox."
We implement this example as a Causal Bayesian network. "Causal" means that the arc directions represent causal relationships between the variables.
In this network, the green node #Dead is a Function Node that calculates the number of children who died within a population of 1 million.
We created a WebSimulator that allows you to experiment with this model and try out different scenarios: https://simulator.bayesialab.com/#!simulator/685025884871
"I can empathize with the parents who might march to the health department with signs saying, 'Vaccines kill!' And the data seem to be on their side; the vaccinations indeed cause more deaths than smallpox itself. But is logic on their side? Should we ban vaccination or take into account the deaths prevented?" (Pearl, p. 44)
We attempt to answer this counterfactual question in BayesiaLab.
To do so, we need to set Vaccinated=False as Hard Evidence, thus simulating a counterfactual world in which no children are vaccinated.
The Bayesian network infers that not vaccinating would cost the lives of 4,000 children, as shown in the green Function Node.
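The arithmetic behind that counterfactual can be reconstructed from the parameters Pearl uses in the book's example (a 1-in-50 chance of contracting smallpox when unvaccinated and a 1-in-5 case fatality rate); treat these numbers as our reading of the example, not as the model's exact conditional probability tables:

```python
# Counterfactual: nobody in the cohort of 1 million children is vaccinated.
# Parameters are taken from our reading of Pearl's example, not from the XBL file.
POP = 1_000_000
P_SMALLPOX = 1 / 50      # P(smallpox | unvaccinated)
P_DIE_SMALLPOX = 1 / 5   # case fatality rate of smallpox

deaths_no_vaccine = POP * P_SMALLPOX * P_DIE_SMALLPOX
print(deaths_no_vaccine)  # ≈ 4,000 deaths, matching the green Function Node
```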
To replicate the same step in the WebSimulator, you need to move the slider Vaccinated=False to 100.
"So far I have emphasized only one aspect of Bayesian networks—namely, the diagram and its arrows that preferably point from cause to effect. Indeed, the diagram is like the engine of the Bayesian network. But like any engine, a Bayesian network runs on fuel. The fuel is called a conditional probability table [...]
Let’s look at a concrete example, suggested by Stefan Conrady and Lionel Jouffe of BayesiaLab, Inc. It’s a scenario familiar to all travelers: we can call it “Where Is My Bag?” Suppose you’ve just landed in Zanzibar after making a tight connection in Aachen, and you’re waiting for your suitcase to appear on the carousel. Other passengers have started to get their bags, but you keep waiting… and waiting… and waiting. What are the chances that your suitcase did not actually make the connection from Aachen to Zanzibar? The answer depends, of course, on how long you have been waiting. If the bags have just started to show up on the carousel, perhaps you should be patient and wait a little bit longer. If you’ve been waiting a long time, then things are looking bad."
"This table, though large, should be easy to understand. The first eleven rows say that if your bag didn’t make it onto the plane (bag on plane = false) then, no matter how much time has elapsed, it won’t be on the carousel (carousel = false). That is, P(carousel = false | bag on plane = false) is 100 percent. That is the meaning of the 100s in the first eleven rows. The other eleven rows say that the bags are unloaded from the plane at a steady rate. If your bag is indeed on the plane, there is a 10 percent probability it will be unloaded in the first minute, a 10 percent probability in the second minute, and so forth. For example, after 5 minutes there is a 50 percent probability it has been unloaded, so we see a 50 for P(carousel = true | bag on plane = true, time = 5). After ten minutes, all the bags have been unloaded, so P(carousel = true | bag on plane = true, time = 10) is 100 percent. Thus we see a 100 in the last entry of the table." (Pearl, p. 119)
"The most interesting thing to do with this Bayesian network, as with most Bayesian networks, is to solve the inverse-probability problem: if x minutes have passed and I still haven’t gotten my bag, what is the probability that it was on the plane? Bayes’s rule automates this computation and reveals an interesting pattern. After one minute, there is still a 47 percent chance that it was on the plane. (Remember that our prior assumption was a 50 percent probability.) After five minutes, the probability drops to 33 percent. After ten minutes, of course, it drops to zero." (Pearl, p. 119)
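Pearl's numbers follow directly from Bayes's rule. With a 50% prior that the bag made the plane and a uniform unloading rate of 10% per minute, the posterior after t minutes without the bag works out to (10−t)/(20−t). A sketch of that arithmetic (ours, not BayesiaLab's internal computation):

```python
def p_bag_on_plane(t, prior=0.5, rate=0.1):
    """P(bag on plane | not on carousel after t minutes)."""
    # If the bag is on the plane, the chance it has NOT yet appeared
    # after t minutes is 1 - rate*t (zero once all bags are unloaded).
    p_no_bag_if_on_plane = max(0.0, 1.0 - rate * t)
    # If the bag missed the plane, it never appears (likelihood 1).
    numerator = prior * p_no_bag_if_on_plane
    denominator = numerator + (1.0 - prior) * 1.0
    return numerator / denominator

# The "Curve of Abandoning Hope":
print(round(p_bag_on_plane(1) * 100))   # 47
print(round(p_bag_on_plane(5) * 100))   # 33
print(p_bag_on_plane(10))               # 0.0
```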
In BayesiaLab, you can automatically generate and plot the "Curve of Abandoning Hope."
First, you need to define Bag on Plane as your Target Node.
Then set Bag on Carousel=False as Hard Evidence.
Finally, select Main Menu > Analysis > Visual > Target > Target's Posterior > Histogram.
The x-axis represents Elapsed Time, the y-axis the posterior probability of Bag on Plane=True given Bag on Carousel=False and Elapsed Time.
You can download the XBL file and open it with any version of BayesiaLab: BoW_VaccineSmallpox.xbl
You can find the original and complete description of this example in Judea Pearl's The Book of Why.
We encoded the problem domain as a Bayesian network in XBL format, which you can download here:
The Conditional Probability Table shown below is associated with the node Bag on Carousel and encodes our assumptions regarding the delivery of bags at the airport.
You can experiment with this model in our WebSimulator and see how the probabilities evolve as a function of time.
"Subjectivity (Ed, i.e., the prior) is sometimes seen as a deficiency of Bayesian inference. Others regard it as a powerful advantage; it permits us to express our personal experience mathematically and combine it with data in a principled and transparent way. Bayes’s rule informs our reasoning in cases where ordinary intuition fails us or where emotion might lead us astray. We will demonstrate this power in a situation familiar to all of us.
Suppose you take a medical test to see if you have a disease, and it comes back positive. How likely is it that you have the disease? For specificity, let’s say the disease is breast cancer, and the test is a mammogram."
We implement this example as a causal Bayesian network, which means the arc between Breast Cancer and Mammogram represents a causal relationship.
You can also experiment with this model via our WebSimulator: https://simulator.bayesialab.com/#!simulator/186824514911
"Suppose a forty-year-old woman gets a mammogram to check for breast cancer, and it comes back positive. The hypothesis, D (for “disease”), is that she has cancer. The evidence, T (for “test”), is the result of the mammogram. How strongly should she believe the hypothesis? Should she have surgery?" (Pearl, p. 105)
We use the probabilities described by Pearl to set the parameters of the Causal Bayesian Network:
For a typical forty-year-old woman, the probability of getting breast cancer in the next year is about one in seven hundred, 0.14%. We use that as our prior;
The sensitivity (true-positive) of a mammogram is 73%;
The specificity (true-negative) of a mammogram is 88%.
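These three numbers are all that Bayes's rule needs. A quick check of the posterior reported below (the helper function is our own sketch, not a BayesiaLab API):

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes's rule."""
    true_pos = prior * sensitivity                  # diseased and test positive
    false_pos = (1.0 - prior) * (1.0 - specificity) # healthy but test positive
    return true_pos / (true_pos + false_pos)

p = posterior(1 / 700, 0.73, 0.88)
print(f"{p:.2%}")  # 0.86%
```

The tiny group of true positives (0.73/700 of all women) is swamped by the 12% false-positive rate among the healthy majority, which is exactly Pearl's point.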
Notice the Input component Breast Cancer—Your Prior Estimate in the WebSimulator. This allows you to set your own initial belief that a patient has breast cancer.
Upon setting Mammogram=Positive as Hard Evidence, the probability of Breast Cancer=True increases from 0.14% to 0.86%.
"The conclusion is startling. I think that most forty-year-old women who have a positive mammogram would be astounded to learn that they still have less than a 1 percent chance of having breast cancer. Figure 3.3 might make the reason easier to understand: the tiny number of true positives (i.e., women with breast cancer) is overwhelmed by the number of false positives." (Pearl, p. 106)
"However, the story would be very different if our patient had a gene that put her at high risk for breast cancer—say, a one-in-twenty chance within the next year. [...]
For a woman in this situation, the chances that the test provides lifesaving information are much higher. That is why the task force continued recommending annual mammograms for high-risk women.
This example shows that P(disease | test) is not the same for everyone; it is context-dependent (Ed: it depends on the prior). If you know that you are at high risk for a disease to begin with, Bayes’s rule allows you to factor that information in. Or if you know that you are immune, you need not even bother with the test!" (Pearl, pp. 107–108)
To answer this question with BayesiaLab, you can either modify the model by setting the prior of Breast Cancer to 5% via the Node Editor, or you can set a Probabilistic Evidence via the Monitor.
In the WebSimulator, you would set the Input Breast Cancer—Your Prior Estimate (initial belief) to 5%.
Upon setting Mammogram=Positive, the probability of Breast Cancer=True increases to 24.25%.
To illustrate the impact of the prior (or prevalence), we added a parent node to Breast Cancer that defines this prior. This is what we call a "hyperparameter."
You can now set Mammogram=Positive as Hard Evidence.
With this evidence set, you can use Target Mean Analysis to explore a range of values for the prior, from 0% to 100%: Main Menu > Analysis > Visual > Target > Target's Posterior > Curves > Total Effects.
You will obtain a plot in which the x-axis represents the prior of Breast Cancer=True, i.e., the hyperparameter.
The y-axis represents the updated probability of Breast Cancer=True given a positive mammogram result.
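The same curve can be traced numerically: hold the mammogram's sensitivity and specificity fixed and sweep the prior across its range. This is a sketch of the Total Effects curve, not BayesiaLab output:

```python
def posterior(prior, sensitivity=0.73, specificity=0.88):
    """P(Breast Cancer=True | Mammogram=Positive) as a function of the prior."""
    tp = prior * sensitivity
    fp = (1.0 - prior) * (1.0 - specificity)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

# Sample the hyperparameter (prior of Breast Cancer=True) on a 5% grid.
curve = [(p / 100, posterior(p / 100)) for p in range(0, 101, 5)]
# Endpoints behave as expected: a 0% prior stays at 0%, a 100% prior at 100%.
# The high-risk case from the text: a 5% prior yields roughly 24.25%.
```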
"To see how Bayes’s method works, let’s start with a simple example about customers in a teahouse, for whom we have data documenting their preferences. Data, as we know from Chapter 1, are totally oblivious to cause-effect asymmetries and hence should offer us a way to resolve the inverse-probability puzzle."
Upon completing the data import, the two variables, Tea and Scones, are represented as nodes.
Now we manually add an arc from Tea to Scones to represent a relationship between the nodes.
Then, we let BayesiaLab estimate the probabilities of this relationship using Maximum Likelihood Estimation: Main Menu > Learning > Parameter Estimation.
Note that the arc between Tea and Scones does not have any causal meaning here. It merely represents the association between Tea and Scones.
As a result, we could invert the arc without changing the representation of this non-causal example.
The following screen capture from the WebSimulator illustrates that the proportion of customers who ordered both tea and scones is indeed 1/3, i.e., the Joint Probability equals 1/3, as shown in the Output Panel on the right.
"[...] This innocent-looking equation came to be known as “Bayes’s rule.” If we look carefully at what it says, we find that it offers a general solution to the inverse-probability problem." (Pearl, p. 101)
To answer this question, we need to perform probabilistic inference with the WebSimulator by setting Scones to Yes.
Then, the WebSimulator automatically infers the probability of Tea=Yes, which is now 80%.
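The quoted probabilities are mutually consistent, as the definition of conditional probability requires: P(T | S) = P(T, S) / P(S). Using the values the monitors report (joint probability 1/3, marginal 5/12), a quick check with exact fractions:

```python
from fractions import Fraction

joint_tea_scones = Fraction(1, 3)   # P(Tea=Yes, Scones=Yes), from the monitor
marginal_scones = Fraction(5, 12)   # P(Scones=Yes) = 41.67%

p_tea_given_scones = joint_tea_scones / marginal_scones
print(p_tea_given_scones)  # 4/5, i.e., the 80% the WebSimulator infers
```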
"We can also look at Bayes’s rule as a way to update our belief in a particular hypothesis. This is extremely important to understand because a large part of human belief about future events rests on the frequency with which they or similar events have occurred in the past. [...]
As we saw, Bayes’s rule is formally an elementary consequence of his definition of conditional probability. But epistemologically, it is far from elementary. It acts, in fact, as a normative rule for updating beliefs in response to evidence. In other words, we should view Bayes’s rule not just as a convenient definition of the new concept of “conditional probability” but as an empirical claim to faithfully represent the English expression “given that I know.”" (Pearl, pp. 101–102)
The resulting Bayesian network in XBL format is available here: BoW_BreastCancer.xbl
The updated network, including the hyperparameter, is available here: BoW_BreastCancer_Prevalence.xbl
To reason about this domain, we first import a small CSV file, which represents Table 3.1 from the book, into BayesiaLab.
The resulting Bayesian network is available in XBL format here:
You can experiment with this model in BayesiaLab or via this WebSimulator page:
"... let P(T) denote the probability that a customer orders tea and P(S) denote the probability he orders scones. If we already know a customer has ordered tea, then P(S | T) denotes the probability that he orders scones. (Remember that the vertical line stands for “given that.”) Likewise, P(T | S) denotes the probability that he orders tea, given that we already know he ordered scones ...
The Joint Probability of 41.67% corresponds to the Marginal Likelihood P(S), i.e., the prior probability of a customer ordering a scone.
Causal effect estimation is the topic of Chapter 10 in our book, Bayesian Networks & BayesiaLab. In this context, we discuss the central role of confounders and non-confounders in identifying and estimating causal effects. Much of what we explain in that chapter is a practical illustration of Judea Pearl's teaching on causality.
As the originator of an entire school of thought on causality, Judea Pearl is certainly at liberty to take a more light-hearted and playful approach in presenting this serious topic. He titled Chapter 4 of The Book of Why "Confounding and Deconfounding: Or, Slaying the Lurking Variable." In fact, Pearl presents the task of "deconfounding" for causal effect estimation as a series of "games," which we now wish to illustrate with Bayesian networks.
We begin with a selection of quotes from the beginning of Chapter 4 to provide motivation for the forthcoming examples.
"To understand the back-door criterion, it helps first to have an intuitive sense of how information flows in a causal diagram. I like to think of the links as pipes that convey information from a starting point X to a finish Y. Keep in mind that the conveying of information goes in both directions, causal and noncausal, as we saw in Chapter 3.
In fact, the noncausal paths are precisely the source of confounding."
"To deconfound two variables X and Y, we need only to block every noncausal path between them without blocking or perturbing any causal paths."
"With these rules, deconfounding becomes so simple and fun that you can treat it like a game."
For each of the proposed games in Chapter 4, we prepare a corresponding Bayesian network in BayesiaLab. These networks allow you to experiment with the "pipes that convey information" as if they were set up in a laboratory, where you can look inside the tubes and measure the flows in the pipes:
"In the mid-1960s, Jacob Yerushalmy pointed out that a mother’s smoking during pregnancy seemed to benefit the health of her newborn baby, if the baby happened to be born underweight."
Pearl, Judea. The Book of Why: The New Science of Cause and Effect (p. 183). Basic Books. Kindle Edition.
We implemented this counterintuitive example as a causal Bayesian network, which means the arcs represent causal relationships.
Since the problem's description in the book is purely qualitative, and no data is available, we associated arbitrary probability distributions with the nodes. Although arbitrary, we specified the probabilities so that the network produces the paradoxical behavior described by Pearl.
Alternatively, you can experiment with this model using our WebSimulator: https://simulator.bayesialab.com/#!simulator/115411982911
The birth-weight paradox can be highlighted with two observations:
Babies of smokers have a lower birth weight than babies of non-smokers.
Low-birth-weight babies of smoking mothers have a higher survival rate than low-birth-weight babies of non-smoking mothers.
"Smoking may be harmful in that it contributes to low birth weight, but certain other causes of low birth weight, such as serious or life-threatening genetic abnormalities, are much more harmful. There are two possible explanations for low birth weight in one particular baby: it might have a smoking mother, or it might be affected by one of those other causes." (Pearl, pp. 184–185)
In other words, Low-Birth-Weight is a collider in the structure Smoking Mother → Low-Birth-Weight ← Birth Defect.
By observing Low-Birth-Weight, we open a noncausal ("back-door") path between Smoking Mother and Mortality of Child, which gives rise to the paradox. Please see our discussion of the Back-Door Criterion for more details on noncausal paths.
In BayesiaLab, we can illustrate what happens by highlighting all information paths:
Set Mortality of Child as Target Node.
Set evidence on Low-Birth-Weight.
Select Smoking Mother. Then, run Main Menu > Analysis > Visual > Graph > Influence Paths to Target.
Now, all influence paths are visible.
If we observe Smoking Mother=True, this explains away Low-Birth-Weight=True and reduces the probability of Birth Defect=True;
On the other hand, if we observe Smoking Mother=False, the probability of Birth Defect=True increases, and the probability of Mortality of Child=True increases, too.
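The explaining-away effect can be reproduced with a toy simulation. The probabilities below are arbitrary (like those in the network itself); they only need to make Low-Birth-Weight a common effect of two independent causes:

```python
import random

random.seed(0)
n = 200_000
defects = {True: 0, False: 0}   # birth defects among LBW babies, by smoking
lbw_count = {True: 0, False: 0} # LBW babies, by mother's smoking status

for _ in range(n):
    smoking = random.random() < 0.30   # the two causes are independent...
    defect = random.random() < 0.05
    # ...but both raise the probability of low birth weight (the collider).
    p_lbw = 0.9 if defect else (0.4 if smoking else 0.1)
    if random.random() < p_lbw:        # condition on Low-Birth-Weight=True
        lbw_count[smoking] += 1
        defects[smoking] += defect

p_defect_smoker = defects[True] / lbw_count[True]
p_defect_nonsmoker = defects[False] / lbw_count[False]
# Among low-birth-weight babies, the mother's smoking "explains away"
# the birth defect: p_defect_smoker < p_defect_nonsmoker.
```

Conditioning on the collider makes the two originally independent causes negatively associated, which is the engine of the paradox.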
Alternatively, we can use the WebSimulator to replicate these two scenarios:
Dr. Lionel Jouffe, Bayesia S.A.S. & Stefan Conrady, Bayesia USA
Compartmental models represent the most common approach for characterizing the development of an epidemic. In an earlier webinar, we introduced a compartmental S-I-R-D model and created a highly simplified Bayesian network to illustrate the principles. Given its great relevance, we believe the topic warrants a more detailed explanation beyond the initial "toy model."
For the purpose of this BayesiaLab Tech Talk, we present a more comprehensive S-E-I-R-D model. Each letter denotes a compartment (or state) of individuals in a population:
S: number of susceptible
E: number of exposed
I: number of infected
R: number of recovered
D: number of dead
Additionally, we further differentiate within the states of exposed and infected to account for contagiousness and disease severity.
In standard models, a set of differential equations describes how individuals move between the compartments/states. In this Tech Talk, we implement the differential equations as probabilistic, temporal relationships between nodes in a Bayesian network.
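In discrete time, those differential equations become simple update rules between consecutive time steps. The sketch below uses illustrative parameters, not the France-calibrated values from the Tech Talk, and a deterministic mean-field update rather than BayesiaLab's probabilistic temporal nodes:

```python
# One-day Euler step of a deterministic S-E-I-R-D model (illustrative parameters).
def seird_step(s, e, i, r, d, beta=0.3, sigma=1/5, gamma=1/10, mu=0.01):
    n = s + e + i + r                 # living population, assumed well-mixed
    new_exposed = beta * s * i / n    # S -> E through contact with infectious
    new_infectious = sigma * e        # E -> I after ~5 days of incubation
    new_removed = gamma * i           # I leaves after ~10 days infectious
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed * (1 - mu),   # survivors recover
            d + new_removed * mu)         # fraction mu dies

state = (999_000.0, 0.0, 1_000.0, 0.0, 0.0)   # 0.1% initially infected
for _ in range(100):
    state = seird_step(*state)
# Individuals only move between compartments, so the total is conserved.
```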
While we often use fictional values in webinars to emphasize methodology over the subject matter, we take a different approach here: The numerical values and parameters presented in this Tech Talk are derived from current COVID-19 observations in France. As a result, the model attempts to represent the actual pandemic situation in France and forecast the pandemic progression.
You can download this Bayesian network in XBL format and open it with any version of BayesiaLab: BoW_SmokingNewBorns.xbl
"In Games 1 and 2 you didn’t have to do anything, but this time you do. There is one back-door path from X to Y, X←B→Y, which can only be blocked by controlling for B. If B is unobservable, then there is no way of estimating the effect of X on Y without running a randomized controlled experiment. Some (in fact, most) statisticians in this situation would control for A, as a proxy for the unobservable variable B, but this only partially eliminates the confounding bias and introduces a new collider bias." (Pearl, p. 160)
As with the earlier games, we encode Game 3 as a causal Bayesian network graph:
Again, the probabilities are fictitious and irrelevant.
We select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
Given the presence of a noncausal path (highlighted in pink), it becomes clear that we need to control for B to block that path.
Here, "fixing the probabilities" of B is a practical way of controlling for that variable. Note that the states and the values of the variable are irrelevant.
Now, after controlling for B, only one causal path remains, highlighted in blue, which allows us to estimate the effect of X on Y.
However, if B were unobservable ("not observable" or "hidden" in BayesiaLab terminology), some statisticians would perhaps propose to control for A as a proxy of B.
Let's try that scenario as well. We are now fixing A while leaving B "open."
The Influence Path Analysis reveals that controlling for proxy A does not achieve our objective.
Not only does controlling for A fail to block the noncausal path X←B→Y, it also opens an additional noncausal path X→A←B→Y, i.e., another bias that prevents us from estimating the effect of X on Y.
This phenomenon is known as "collider bias," as it is produced by conditioning on a collider, such as A.
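The effect of these adjustment choices can also be reproduced outside of BayesiaLab. The sketch below simulates a hypothetical linear version of Game 3 (all coefficients are invented for illustration; the true effect of X on Y is set to 0.5) and compares three adjustment strategies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear version of Game 3: B confounds X and Y,
# and A is a collider (a common effect of X and B).
B = rng.normal(size=n)
X = B + rng.normal(size=n)
A = X + B + 0.5 * rng.normal(size=n)
Y = 0.5 * X + B + rng.normal(size=n)   # true causal effect of X on Y: 0.5

def coef_on_x(*controls):
    """OLS coefficient on X after adjusting for the given control variables."""
    design = np.column_stack([np.ones(n), X, *controls])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta[1]

print(f"no adjustment:   {coef_on_x():.2f}")    # confounded by B
print(f"adjusting for B: {coef_on_x(B):.2f}")   # recovers the true effect, ~0.5
print(f"adjusting for A: {coef_on_x(A):.2f}")   # collider bias: far from 0.5
```

With the back-door path blocked by B, the estimate settles near the true 0.5; conditioning on the proxy A instead pushes the estimate far off target, in line with Pearl's warning.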
Stefan Conrady, Bayesia USA
These days, our daily movements are largely governed by "social distancing" mandates. Their purpose is self-explanatory, and following them is certainly the prudent thing to do.
Given the economic impact of keeping individuals apart, the question arises about what kind of distancing is most effective. How much worse is it, for instance, to have groups of 100 people versus groups of 10 in one place? What is the risk of traveling by bus compared to carpooling? How should employees be spaced across open-floor offices once work resumes?
This webinar will neither answer these specific questions nor provide policy recommendations. However, we will endeavor to provide a formal framework for reasoning about such questions.
In this context, the distribution of distances between random points within a given space is very important. For example, if 1000 people were evenly distributed in a square hall, what would be the distribution of their distances, i.e., how many would be close together versus far apart? Unfortunately, the mathematical solution to this question is far from trivial. The distribution of distances l within a unit square is as follows:
P(l) = \left\{ {\begin{array}{*{20}{c}} {2l\left( {{l^2} - 4l + \pi } \right)}&{0 \leqslant l \leqslant 1} \\ {2l\left( { - {l^2} - 4{{\tan }^{ - 1}}\left( {\sqrt {{l^2} - 1} } \right) + 4\sqrt {{l^2} - 1} + \pi - 2} \right)}&{1 < l \leqslant \sqrt 2 } \end{array}} \right.
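As a sanity check on this density, a quick Monte Carlo simulation (a sketch in plain Python) reproduces its key summary statistic: the mean distance between two uniform random points in the unit square, (2 + √2 + 5·asinh(1))/15 ≈ 0.5214.

```python
import math
import random

def pdf(l):
    """Analytical density of the distance l between two uniform points in the unit square."""
    if 0 <= l <= 1:
        return 2 * l * (l**2 - 4 * l + math.pi)
    if 1 < l <= math.sqrt(2):
        s = math.sqrt(l**2 - 1)
        return 2 * l * (-l**2 - 4 * math.atan(s) + 4 * s + math.pi - 2)
    return 0.0

# Monte Carlo: sample random point pairs and average their distances.
random.seed(0)
n = 100_000
total = 0.0
for _ in range(n):
    x1, y1, x2, y2 = (random.random() for _ in range(4))
    total += math.hypot(x2 - x1, y2 - y1)

print(f"empirical mean distance: {total / n:.4f}")  # analytical value: ~0.5214
```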
Presumably, the probability of transmitting an infection is closely related to the distance between individuals. But that's a secondary question for now. First, we need to consider that individuals in groups are typically not evenly spaced. Secondly, environments can have any shape, although rectangles are very common. And, finally, people move about all the time. With that, an algebraic solution for the distribution of real-world distances between individuals seems unattainable.
Bayesian networks can help with a fairly simple and intuitive solution to this problem. A simple Bayesian network with only five nodes can compute the same distributions as the complicated formula above.
In fact, with this network, we can compute the distributions for any arbitrary positioning of individuals in any type of space. For instance, we could take the U.S. population within the geographic bounds of the country and determine the distribution of distances between any two individuals, and all we need is the above network. Also, we can go beyond Euclidean distances and utilize the great-circle distance between points on the Earth's surface.
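For the great-circle case, the distance computation would use the haversine formula rather than the Euclidean norm. A minimal sketch (the city coordinates and the spherical-Earth radius are illustrative assumptions):

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine formula: great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Example: approximate city-center coordinates for New York and Los Angeles
print(f"{great_circle_km(40.71, -74.01, 34.05, -118.24):.0f} km")
```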
Our proposed approach will be the backbone of our discussion about "social distancing." It will allow us to quantify the effect of mandating or changing distances at different levels, similar to using signal filters for attenuating certain frequencies within a frequency range.
Beyond the current pandemic, there are presumably many other applications for this approach. We illustrated one of them in a recent webinar on Geographic Optimization with Bayesian Networks and BayesiaLab. One particular application was finding the optimal warehouse location given the distribution of manufacturing sites and end customers.
A key benefit of this methodology is that changing distributions does not require recalculating the distances between thousands of points. Rather, the Bayesian network can instantly perform inference given new inputs, thus allowing us to simulate different configurations, such as desk arrangements in a classroom. It also opens up applications such as maximizing the distances between the seat assignments of airplane passengers.
Compared to earlier webinars, we will emphasize the technical aspects of implementing the proposed methodologies with BayesiaLab. In this context, we will also cover foundational elements which may already be familiar to current BayesiaLab users. The objective is that you can replicate all examples presented in the webinar independently after the event.
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, in his role as Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
This simple network model illustrates how BayesiaLab can quickly learn a Bayesian network classifier from a dataset consisting of 7129 genes from 72 tumor samples.
Golub, Todd R., et al. "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring." Science 286.5439 (1999): 531–537.
Stefan Conrady, Bayesia USA
With the outbreak of the COVID-19 pandemic, reasoning about diseases has gone mainstream. No longer is it just healthcare professionals that perform differential diagnoses. Newspapers and social media have been publicizing charts that compare symptoms of COVID-19, the "regular" flu, and the common cold so individuals can potentially self-diagnose and reduce the burden on healthcare providers.
While a chart can list symptoms, it is not an "inference engine." Deliberate reasoning still has to happen in the mind of the self-diagnosing individual to reach a conclusion. That turns out to be the difficult part, as humans are ill-equipped to handle probabilistic inference from effect back to the cause, i.e., from symptom to disease.
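The difficulty is essentially base-rate neglect: reasoning from symptom back to disease requires weighing likelihoods by priors via Bayes' theorem. A toy illustration (all probabilities below are invented for the sketch and are not medical data):

```python
# Hypothetical priors and symptom likelihoods; not medical data.
priors = {"covid": 0.01, "flu": 0.04, "cold": 0.95}
p_fever_given = {"covid": 0.85, "flu": 0.80, "cold": 0.10}

# Bayes' theorem: P(disease | fever) is proportional to P(fever | disease) * P(disease)
unnormalized = {d: p_fever_given[d] * priors[d] for d in priors}
z = sum(unnormalized.values())
posterior = {d: p / z for d, p in unnormalized.items()}

for disease, p in posterior.items():
    print(f"{disease}: {p:.2f}")
```

Even though fever is far more likely under COVID-19 or the flu than under a cold, the cold's large prior keeps it the most probable explanation, which is exactly the kind of inversion human intuition handles poorly.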
In this webinar, we present Bayesian networks as a framework for encoding knowledge about diseases and symptoms. Given this knowledge base, we then use BayesiaLab's inference algorithms to update the probabilities of the potential conditions given the observed symptoms. A very similar model, the so-called "Visit Asia" network, was one of the earliest examples that illustrated the reasoning capabilities of Bayesian networks.
Please note that this webinar does not constitute medical advice. Although the example is based on current events, we focus solely on the reasoning process. Thus, all numerical values and probabilities shown in the presentation should be considered fictional.
Dr. Lionel Jouffe, Bayesia S.A.S. & Stefan Conrady, Bayesia USA
This webinar introduces our collaborative knowledge elicitation project for the differential diagnosis of COVID-19 and influenza-like diseases.
We present a comprehensive knowledge elicitation and reasoning framework that is built on the Bayesian network paradigm. You will see the practical steps for eliciting knowledge with the Bayesia Expert Knowledge Elicitation Environment and see the resulting knowledge base in the form of a Bayesian network. This workflow aggregates emerging medical knowledge and produces an evolving expert system that clinicians can use through a public web portal.
We also briefly present the principles of probabilistic inference and the fundamental challenges that humans — including experts — have with reasoning from symptoms back to their potential causes. In this context, we introduce Bayesian networks as a reasoning framework that can help overcome these cognitive limitations and provide normative inference given the available knowledge.
Please note that the COVID-19 WebSimulator is experimental and not meant to provide medical advice to patients. Always consult your healthcare professional regarding any symptoms or health conditions you may have!
This example is based on a dataset that characterizes the transactions of single-family homes in Ames, Iowa, from 2006 to 2010 (De Cock, 2011)
Comprising 2930 entries, the dataset includes a wide array of explanatory variables, including 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables.
Dean De Cock. Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project. Journal of Statistical Education, 19(3), 2011.
This example features an expert-designed Bayesian network for automated situation assessment in command and control systems. This model provides Combat-ID and Threat Assessment decision support in naval anti-air warfare.
The two attached network examples represent derivations of the well-known "Visit Asia" example, which was first presented in Lauritzen and Spiegelhalter (1988).
Lauritzen, S., & Spiegelhalter, D. (1988). Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157-224. Retrieved June 24, 2020, from www.jstor.org/stable/2345762
Example: Wisconsin Breast Cancer Database
Webinar: Diagnostic Decision Support
Stefan Conrady, Bayesia USA
Everybody knows the meaning of "importance," right? "What is important?" is a common question in daily life, and it is presumably the most common question in research. It's all about attempting to understand what matters within the context of a given domain.
Upon entering the world of statistics and analytics, we encounter a myriad of measures all related to importance, e.g., correlation, weight, significance, indirect/direct effect size, temporal/contemporaneous effects, unit effect, standardized effect, Bayes Factor, Mutual Information, KL-Divergence, contribution, elasticity, etc. Additionally, some of these measures should not be used in isolation but instead need to be seen in conjunction with other quantities, such as joint probability, for decision-making purposes. This highlights that "importance" is not at all a narrowly defined concept but instead covers a broad and diverse spectrum of notions.
While none of these measures are tied to Bayesian networks, we employ this framework to explain the major and minor differences between these concepts. More specifically, we attempt to develop an intuition for all of the above concepts using machine-learned Bayesian network models. Our objective is to understand which measures of importance are most appropriate in which context.
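One recurring distinction can be shown in a few lines: Pearson correlation only captures linear association, whereas Mutual Information also registers nonlinear dependence. The sketch below (using numpy and a crude histogram-based MI estimate) contrasts the two measures on Y = X²:

```python
import numpy as np

# For Y = X^2 with symmetric X, Pearson correlation is ~0
# even though Y is fully determined by X.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2

corr = np.corrcoef(x, y)[0, 1]

def mutual_information(a, b, bins=30):
    """Histogram-based mutual information estimate in bits."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

print(f"correlation: {corr:.3f}, MI estimate: {mutual_information(x, y):.2f} bits")
```

The correlation comes out near zero while the MI estimate is clearly positive, which is why relying on a single importance measure can be misleading.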
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, in his role as Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
Stefan Conrady, Bayesia USA
Attribution and contribution often appear in a similar context, and both concepts are closely related to causality. In general, attribution identifies the cause of an observed outcome. In the marketing domain, however, attribution has a somewhat unique interpretation and often refers to the origin of a consumer’s journey toward a purchase. In this particular context, observed outcomes are attributed to specific prior touchpoints, such as website visits or ad clicks.
On the other hand, contribution, as the name implies, refers to the confluence of multiple factors or causes with regard to an effect. In the marketing context, multiple advertising campaigns and promotions, beyond just single touchpoints, would contribute to sales, for instance. So, the definition of contribution is reasonably straightforward.
The decomposition and quantification of the contributing causes are the problem. This challenge is not new, as this quote from the late 19th century suggests: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half” (no pun intended, but the attribution of this quote is uncertain). In other words, we do not know how promotional activities contribute to the outcome, i.e., sales. Put differently, calculating the contributions means that we proportionally allocate a given outcome to any number of potential causes.
While contribution appears to be rather straightforward in conceptual terms, a mathematical definition is not nearly as obvious.
We propose distinguishing between two types of contributions, which we shall call Type 1 and Type 2 Contributions. Both types rely on computing the difference between factual and counterfactual outcomes corresponding to factual and counterfactual conditions of multiple causes.
A factual outcome is simply an actual observation of an outcome, e.g., sales. Associated with a factual outcome are multiple causes at their observed, factual levels. A counterfactual outcome results from setting causes to hypothetical, counterfactual conditions. This raises the question of how to calculate a counterfactual outcome: we do so by simulating a counterfactual intervention using a causal model. In our case, we use a Bayesian network, which provides numerous advantages for our purposes.
We introduce an elementary fictional domain with three causes and one outcome as an example. In fact, we make up the “laws of nature” and, thus, have perfect knowledge of this data-generating process (DGP).
From this generated data, we then machine-learn a Bayesian network that approximates the joint probability distribution of the data as if we did not know the DGP. By default, of course, any machine-learned network would be non-causal. However, by utilizing VanderWeele’s Disjunctive Cause Criterion for confounder selection, we can indeed utilize the learned Bayesian network for causal inference. Hence, we can simulate the effect of setting all three causes to counterfactual states. That choice, however, requires making assumptions from expert knowledge.
In this webinar, we perform machine learning with BayesiaLab and use its Likelihood Matching algorithm for causal inference computations. In addition to calculating contributions, we can determine the “baseline level” of the outcome variable and estimate synergies (positive and negative) between multiple causes.
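The factual-minus-counterfactual logic can be sketched with a stand-in for the webinar's fictional data-generating process (the `outcome` function, its coefficients, and the cause names are all invented here):

```python
# Fictional data-generating process: three causes, one outcome, and a synergy term.
def outcome(tv, radio, promo):
    return 100 + 20 * tv + 10 * radio + 15 * promo + 5 * tv * promo

factual = {"tv": 1, "radio": 1, "promo": 1}
baseline = {"tv": 0, "radio": 0, "promo": 0}
y_factual = outcome(**factual)  # 150

# Contribution of each cause: factual outcome minus the counterfactual outcome
# obtained by resetting that single cause to its baseline level.
contributions = {}
for cause in factual:
    counterfactual = {**factual, cause: baseline[cause]}
    contributions[cause] = y_factual - outcome(**counterfactual)

print(contributions)                      # {'tv': 25, 'radio': 10, 'promo': 20}
print(sum(contributions.values()))        # 55
print(y_factual - outcome(**baseline))    # 50
```

Note how the individual contributions (55) overshoot the total lift over the baseline (50) by exactly the synergy term, which is why the webinar treats baseline levels and synergies alongside the contributions themselves.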
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, in his role as Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
Stefan Conrady, Bayesia USA
Even though we may not have any actual observations from a domain, we can still speculate and hypothesize about possible rare events, i.e., we can reason on theoretical grounds as to what could possibly go wrong.
The objective of this webinar is to present Bayesian networks as a framework to merge machine-learned knowledge from data with theoretical knowledge from domain experts to produce a joint probability distribution that includes common and rare events simultaneously.
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, in his role as Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
The case study we present addresses some of the challenges of Modern Portfolio Theory and was inspired by Rebonato & Denev's book.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
Stefan Conrady, Bayesia USA
In this webinar, we develop a WebSimulator that allows decision-makers to experiment with assumptions for business planning purposes, such as sales forecasts, inventory levels, cost estimates, etc. More specifically, we create a Bayesian network model that explicitly accounts for the uncertainty in all assumptions rather than utilizing single-point forecasts. This Bayesian network serves as the inference engine that drives the WebSimulator output.
Given the Bayesian network, we can also use BayesiaLab's Policy Learning function to formally search for optimal decisions in the presence of uncertainty. Importantly, any such "machine-learned solutions" can be easily replicated by stakeholders, who can individually try out various alternative assumptions and policy scenarios in the WebSimulator.
When the original BayesiaLab WebSimulator was introduced in February 2015, it opened Bayesian network models to a broader audience. The BayesiaLab WebSimulator allows you to publish interactive models via the web without having to install any additional software on the end user's side. Any Bayesian network model built with BayesiaLab can be instantly shared privately and securely with clients or publicly with the wider world.
"Version 1" of the WebSimulator turned out to be a huge success, and it rapidly became an integral part of research workflows. With its growing popularity, however, BayesiaLab users have been developing applications that require greater flexibility and a more sophisticated web interface for the end-user.
With the recent release of BayesiaLab 8, we also introduced an updated WebSimulator. Entirely new is the WebSimulator Editor inside BayesiaLab 8, which allows you to design and configure an elaborate web interface with many customizable elements, including bar charts, gauges, etc. You can immediately review its final appearance with a new preview function.
In this webinar, we will demonstrate all the new features of the WebSimulator by taking you through a complete workflow, from model development to publishing the model via the Bayesia WebSimulator Server.
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, as the Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks to research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
Stefan Conrady, Bayesia USA
Bayesian networks have been gaining prominence among scientists over the last decade, and insights generated with this new paradigm can now be found in books and papers that circulate well beyond the academic community. Practitioners and managerial decision-makers see references to Bayesian networks in studies ranging from biostatistics to marketing analytics. Therefore, it is not surprising that the relatively new Bayesian network framework prompts comparisons with more conventional methods, such as Factor Analysis, which remains widely used in many fields of study.
This webinar aims to compare a traditional statistical factor analysis with BayesiaLab's new workflow for Probabilistic Latent Factor Induction using a psychometric example.
Factor Analysis is a statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors. It is possible, for example, that variations in three or four observed variables mainly reflect the variations in a single unobserved variable or in a reduced number of unobserved variables. The observed variables can be seen as manifestations of abstract underlying (and unobserved) dimensions or (latent) factors.
Probabilistic Latent Factor Induction is a workflow within the BayesiaLab software package, which has the same objective as traditional factor analysis, i.e., variable reduction, but works entirely within the framework of Bayesian networks and is based on principles derived from information theory. This approach also takes advantage of recent advances in machine learning, especially BayesiaLab's Unsupervised Learning algorithms.
Given that factor analysis originated in psychometrics, we shall explore a prototypical psychometric dataset, namely the HEXACO Personality Inventory. The HEXACO model of personality conceptualizes human personality in six dimensions. It was proposed as an alternative to the Big Five/FFM (Five-Factor Model):
Honesty-Humility (H)
Emotionality (E)
Extraversion (X)
Agreeableness (versus Anger) (A)
Conscientiousness (C)
Openness to Experience (O)
Our objective is to reexamine the proposed HEXACO factors and present an alternative latent variable structure. For this purpose, we utilize the publicly available HEXACO dataset from the Open Source Psychometrics Project. Finally, we wish to highlight the speed of the factor induction and validation process with BayesiaLab, which helps researchers focus on the substantive interpretation of results and spend less time dealing with statistical minutiae.
This seminar was recorded on December 3, 2018, at the Virginia Tech Applied Research Center in Arlington, Virginia.
In this workshop, we demonstrate how to elicit human knowledge for developing a high-dimensional computational model of an underlying problem domain in the form of a Bayesian network. This type of "Artificial Intelligence" allows us to reason formally—and quantitatively, despite the absence of numerical data—about the given issue.
As our case study topic, we examine a hypothetical geopolitical scenario that has the characteristics of actual events, such as the sinking of the Russian submarine Kursk in 2000 and the search for the Argentinian submarine San Juan in 2018. More specifically, we reason about the (fictional) disappearance of an American attack submarine in the disputed waters of the South China Sea. Our objective is to determine where and when to launch a search and rescue effort, if at all, given the risk of a military conflict with China.
In this day and age of "Big Data," we may be led to believe that truth can only be established from data, especially in the context of a scientific inquiry. This is a misconception. Even without data, humans do possess knowledge, qualitative or quantitative, tacit or explicit, about many aspects of the world. We believe that a useful amount of knowledge exists regarding the conflict under study. Also, there is one particular type of knowledge that data on its own can never yield, and that is causality. For that, we always have to rely on human expertise.
Although there may not be a single expert among our seminar participants who can fully comprehend all the complexities of our case study topic, there may be several individuals who are more or less knowledgeable about different aspects of the conflict. It is our objective to break down the overall problem into numerous simpler questions, which are perhaps more easily "knowable," at least to some.
So, we are not looking for a single authoritative opinion. Rather, we are looking to collect and consolidate the full spectrum of thought, including causal relationships, from the seminar's participants. This is where the idea of the "wisdom of crowds" comes into play. We want attendees to provide their individual and independent assessments of different elements and relationships within the problem domain.
While the objective of collecting multiple opinions is straightforward, there are many technical and practical challenges in terms of implementing such a process. In the early days of the Cold War, the RAND Corporation proposed the so-called Delphi Method, which facilitated expert knowledge elicitation by iteratively querying stakeholders through a series of questionnaires that were distributed and collected by mail. After each round of questioning, the aggregated results were circulated for review and discussion, and all participating experts could further adjust their assessments based on the collective feedback. Needless to say, before the availability of electronic means of communication, such a process was tedious, bordering on the impractical.
Today, we propose an entirely new, web-based approach, the Bayesia Expert Knowledge Elicitation Environment (BEKEE), which has the same objective as the original Delphi Method. In this seminar, we plan to use BEKEE to collect the opinions of the participants in real time from their own devices via a convenient web interface.
We shall see that systematically eliciting and encoding numerous pieces of (admittedly imperfect) knowledge into a Bayesian network can produce a remarkably useful approximation of the underlying domain, which provides us with a common framework for reasoning about policy options and evaluating their consequences.
Furthermore, by using a Bayesian network model, we can preserve all the uncertainty in our collective knowledge and consciously take it into account when performing inference. Thus, we avoid two common and problematic extremes in reasoning: (i) suppressing uncertainty by calculating with the false precision of single-point estimates, or (ii) being overwhelmed by uncertainty and abandoning quantitative reasoning altogether. A Bayesian network can explicitly represent the uncertainty arising from the diversity of opinions captured via BEKEE.
Based on the newly generated Bayesian network, we can use BayesiaLab to reason probabilistically about the implications of various hypothetical interventions and simulate the outcomes of different policies.
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, as the Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
Stefan Conrady, Bayesia USA
This half-day seminar presents a complete workflow for developing a Probabilistic Structural Equation Model (PSEM) based on Bayesian networks and utilizing the BayesiaLab software platform. Our objective is to identify key drivers of satisfaction with a PSEM that is machine-learned from consumer survey data. A key challenge in this context is to resolve the conflict between "driver" as a causal concept versus the non-causal nature of non-experimental survey data. Furthermore, we illustrate how quantifying the joint probability of hypothetical scenarios is critical for establishing priorities for improving customer satisfaction.
Structural Equation Modeling is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.
What we call Probabilistic Structural Equation Models (PSEMs) in BayesiaLab are conceptually similar to traditional SEMs. However, PSEMs are based on a Bayesian network structure as opposed to a series of equations. More specifically, PSEMs can be distinguished from SEMs in terms of key characteristics:
All relationships in a PSEM are probabilistic—hence the name, as opposed to having deterministic relationships plus error terms in traditional SEMs.
PSEMs are nonparametric, facilitating the representation of nonlinear relationships and relationships between categorical variables.
The structure of PSEMs is partially or fully machine-learned from data.
Specifying and estimating a traditional SEM requires a high degree of statistical expertise. Additionally, the multitude of manual steps involved can make the entire SEM workflow extremely time-consuming. On the other hand, the PSEM workflow in BayesiaLab is accessible to non-statistician subject matter experts. Perhaps more importantly, it can be faster by several orders of magnitude. Finally, once a PSEM is validated, it can be utilized like any other Bayesian network. This means that the full array of analysis, simulation, and optimization tools is available to leverage the knowledge represented in the PSEM.
In this seminar, we present a prototypical PSEM application: key drivers analysis and product optimization based on consumer survey data. We examine how consumers perceive product attributes and how these perceptions relate to the consumers’ purchase intent for specific products.
Given the inherent uncertainty of survey data, we also wish to identify higher-level variables, i.e., “latent” variables that represent concepts that are not directly measured in the survey. We do so by analyzing the relationships between the so-called “manifest” variables, i.e., variables directly measured in the survey. Including such concepts helps in building more stable and reliable models than what would be possible using manifest variables only.
Our overall objective is to make surveys clearer to interpret by researchers and make them “actionable” for managerial decision-makers. The ultimate goal is to use the generated PSEM for prioritizing marketing and product initiatives to maximize purchase intent.
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan, which included assignments in North America, Europe, and Asia.
Today, in his role as Managing Partner of Bayesia USA and Bayesia Singapore, he is recognized as a thought leader in applying Bayesian networks for research, analytics, and reasoning.
Recently, Stefan and his colleague Dr. Lionel Jouffe co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, which is now available as an e-book.
This seminar illustrates how Bayesian networks can serve as a powerful modeling and reasoning framework for health economics research and public policy development.
For five different case studies, we present a complete analysis workflow using the BayesiaLab 8 software platform:
Diagnostic decision support: using a machine-learned Bayesian network for cost-effective evidence-seeking in diagnosing coronary heart disease. This example introduces information-theoretic measures, such as Entropy and Mutual Information.
Quantifying the value of information in field triage for optimizing trauma activation thresholds with regard to hospital resource utilization.
Developing universal health policies under extreme uncertainty, i.e., without any data: "test & treat" or presumptive malaria treatment in sub-Saharan Africa.
Childhood Literacy Campaign: Simpson's Paradox rears its ugly head and leads to misguided policies.
Causal inference from observational healthcare data: using machine learning and the Disjunctive Cause Criterion to reduce—but not eliminate—the need for causal assumptions.
We present the motivation, proposed methodology, and practical implementation for each example.
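For the first case study, the two information-theoretic measures can be computed directly from a joint distribution. A toy disease-test table (with made-up probabilities) serves as a sketch:

```python
import math

# Toy joint distribution P(disease, symptom) for a hypothetical screening test
# (invented numbers, for illustrating the information-theoretic measures only).
joint = {
    ("sick", "positive"): 0.08, ("sick", "negative"): 0.02,
    ("well", "positive"): 0.09, ("well", "negative"): 0.81,
}

def marginal(axis):
    m = {}
    for (d, s), p in joint.items():
        k = d if axis == 0 else s
        m[k] = m.get(k, 0.0) + p
    return m

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_d, p_s = marginal(0), marginal(1)

# Mutual Information: how much the symptom reduces uncertainty about the disease.
mi = sum(p * math.log2(p / (p_d[d] * p_s[s])) for (d, s), p in joint.items())
print(f"H(Disease) = {entropy(p_d):.3f} bits, MI = {mi:.3f} bits")
```

Mutual Information is bounded by the entropy of the target, so the test can at best resolve the 0.47 bits of uncertainty about the disease; here it resolves roughly a third of that.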
"Suppose that a prisoner is about to be executed by a firing squad. A certain chain of events must occur for this to happen. First, the court orders the execution. The order goes to a captain, who signals the soldiers on the firing squad (A and B) to fire. We’ll assume that they are obedient and expert marksmen, so they only fire on command, and if either one of them shoots, the prisoner dies."
We implemented Pearl's Firing Squad problem as a Causal Bayesian Network in BayesiaLab.
"Causal" means that the arc directions represent causal relationships between the nodes.
You can download this XBL file and open it with any version of BayesiaLab: BoW_Firing Squad.xbl
Alternatively, you can experiment with the Firing Squad model on this WebSimulator page: https://simulator.bayesialab.com/#!simulator/211001563973
"Using this diagram, we can start answering causal questions from different rungs of the ladder. First, we can answer questions of association (i.e., what one fact tells us about another). If the prisoner is dead, does that mean the court order was given?" (Pearl, p. 40)
To answer this question in BayesiaLab, you set a Hard Positive Evidence on Death=True (double-click on the state True) to indicate that you learned that the prisoner had been executed (first rung of the ladder).
In the WebSimulator, you move the slider Death=True to 100%. The Observed Box is automatically checked upon releasing the mouse button, and the evidence is propagated in the network to update the probability distributions of the other variables.
Once we know that the prisoner is dead, we infer that both soldiers fired and that the court order was given.
"Suppose we find out that A fired. What does that tell us about B? By following the arrows, the computer concludes that B must have fired too. (A would not have fired if the captain hadn’t signaled, so B must have fired as well.) This is true even though A does not cause B (there is no arrow from A to B)." (Pearl, p. 40)
This is another observational query, the "first rung of the ladder."
In BayesiaLab, you set a Hard Positive Evidence on Soldier A=True (double-click on the state True) to indicate that you found out that Soldier A fired.
In the WebSimulator, you move the slider Soldier A=True to 100%.
Given that we know that Soldier A fired, we infer that the court order was given and that Soldier B also fired.
"Going up the Ladder of Causation, we can ask questions about intervention. What if Soldier A decides on his own initiative to fire, without waiting for the captain’s command? Will the prisoner be dead or alive?" (Pearl, p. 40)
You can answer this causal question in BayesiaLab by setting Soldier A to Intervention Mode (Monitor Context Menu > Intervention) and then setting Soldier A=True.
This triggers the "mutilation of the graph" (or "graph surgery"), which blocks the associational path between Soldier A and Death that goes via Captain.
Alternatively, instead of setting Soldier A to Intervention Mode, you can hold constant the probability distribution of Captain to block the path: Monitor Context Menu > Fix Probabilities.
Then, you set Hard Evidence on Soldier A=True.
In the WebSimulator, you can simulate this intervention by first controlling for Court Order, i.e., checking the box Observed, and then setting Soldier A=True to 100%.
If Soldier A decides to fire on his own initiative, this implies the death of the prisoner without affecting our belief regarding the other variables.
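Both rungs of this example can be reproduced outside BayesiaLab by enumerating the joint distribution of the five binary variables. The sketch below is our own: the deterministic rules follow Pearl's description, while the 50% prior on the court order and all function names are arbitrary placeholders.

```python
from itertools import product

# Prior belief that the court orders the execution (arbitrary placeholder).
P_ORDER = 0.5

def joint(order, captain, a, b, death, do_a=None):
    """Probability of one configuration; do_a severs A from the captain."""
    p = P_ORDER if order else 1 - P_ORDER
    p *= 1.0 if captain == order else 0.0          # captain relays the order
    if do_a is None:
        p *= 1.0 if a == captain else 0.0          # A fires only on command
    else:
        p *= 1.0 if a == do_a else 0.0             # intervention: graph surgery
    p *= 1.0 if b == captain else 0.0              # B fires only on command
    p *= 1.0 if death == (a or b) else 0.0         # either shot is lethal
    return p

def query(target, do_a=None, **evidence):
    """P(target = True | evidence), optionally under do(Soldier A)."""
    num = den = 0.0
    for order, captain, a, b, death in product((False, True), repeat=5):
        cfg = dict(order=order, captain=captain, a=a, b=b, death=death)
        if any(cfg[k] != v for k, v in evidence.items()):
            continue
        p = joint(order, captain, a, b, death, do_a=do_a)
        den += p
        if cfg[target]:
            num += p
    return num / den

query('order', a=True)       # -> 1.0: seeing A fire implies the order was given
query('b', a=True)           # -> 1.0: and B must have fired as well
query('order', do_a=True)    # -> 0.5: making A fire says nothing about the order
query('death', do_a=True)    # -> 1.0: but it does imply the prisoner's death
```

Observing Soldier A=True pulls Court Order to certainty, whereas intervening on Soldier A leaves it at its prior, which is exactly the graph-surgery behavior described above.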
The causal Bayesian network for Game 1 is available for download here:
In the spirit of games, we took advantage of BayesiaLab's ability to embellish nodes and added "start" and "finish" icons for the variables X and Y, respectively.
In each game, you need to determine the set of variables you need to adjust for (if any), to estimate the causal effect of X (Start) on Y (Finish) without bias.
As with all networks presented here, we will reason purely based on the causal structure and do not need to consider parameters or numerical values.
For demonstration purposes, the Bayesian networks you can download do contain states and numerical values. However, we chose them arbitrarily, and you should feel free to replace them with any other values of your choice. As long as you maintain the causal structure, the content of the nodes, e.g., numerical or categorical, does not matter at all.
BayesiaLab offers the Influence Paths to Target function, which highlights causal and noncausal paths in a network.
This feature analyzes the paths from the selected node X to the Target Node Y.
To start the function, select Main Menu > Analysis > Visual > Graph > Influence Paths to Target.
This analysis highlights causal paths in blue and noncausal paths in pink.
However, no pink paths appear, which means that no noncausal paths exist from X to Y.
As a result, no noncausal paths need to be blocked, and, therefore, we do not need to control for any variables to estimate the causal effect of X on Y. The association between X and Y corresponds to the causal effect.
"In this example you should think of A, B, C, and D as “pretreatment” variables. (The treatment, as usual, is X.) Now there is one back-door path X←A→ B←D→E→Y. This path is already blocked by the collider at B, so we don’t need to control for anything." (Pearl, p. 160)
For Game 2, we have once again created a causal Bayesian network, which is available for download here:
Note that the associated probability tables are fictitious. For our purposes, only the causal graph is relevant.
As before, we select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
We can see that there is no noncausal path.
Hence, there is no need to control for any variables.
"Game 5 is just Game 4 with a little extra wrinkle. Now a second back-door path X←B←C→Y needs to be closed. If we close this path by controlling for B, then we open up the M-shaped path X←A→B←C→Y. To close that path, we must control for A or C as well. However, notice that we could just control for C alone; that would close the path X←B←C→Y and not affect the other path." (Pearl, p. 162)
Here we have the causal Bayesian network corresponding to Game 5:
Select Main Menu > Analysis > Visual > Graph > Influence Paths to Target and see that there is a noncausal path that needs to be blocked.
You can block this path by fixing the probability distribution of variable C.
You can check if this proposed approach is correct by setting your evidence — Fix Probabilities on C — and then running the Influence Paths Analysis again.
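For readers who want to verify the blocking logic outside BayesiaLab, here is a minimal d-separation test using the standard moralized ancestral graph construction. The encoding of Game 5 and all names are our own; the causal arc from X to Y is omitted so that only the back-door paths are examined.

```python
def ancestral_set(parents, nodes):
    """All nodes in `nodes` plus their ancestors in the DAG."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        for p in parents.get(frontier.pop(), ()):
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result

def d_separated(parents, x, y, given):
    """True iff x and y are d-separated by `given`: keep the ancestors of
    {x, y} | given, marry co-parents, drop arc directions, delete the
    conditioned nodes, and test whether x and y are still connected."""
    keep = ancestral_set(parents, {x, y} | set(given))
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:                        # undirected parent-child edges
            adj[child].add(p); adj[p].add(child)
        for i, u in enumerate(ps):          # marry parents of a common child
            for v in ps[i + 1:]:
                adj[u].add(v); adj[v].add(u)
    seen, frontier = {x}, [x]               # BFS avoiding conditioned nodes
    while frontier:
        n = frontier.pop()
        if n == y:
            return False
        for m in adj[n] - seen - set(given):
            seen.add(m)
            frontier.append(m)
    return True

# Back-door structure of Game 5: X<-A->B<-C->Y and X<-B<-C->Y (X->Y removed).
game5 = {'X': ('A', 'B'), 'B': ('A', 'C'), 'Y': ('C',)}

d_separated(game5, 'X', 'Y', set())    # False: X<-B<-C->Y is open
d_separated(game5, 'X', 'Y', {'B'})    # False: controlling B opens the M-path
d_separated(game5, 'X', 'Y', {'C'})    # True: C alone blocks both paths
```

The three checks mirror Pearl's argument: with no adjustment one back-door path is open, adjusting for B alone opens the M-shaped path, and adjusting for C closes everything.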
"This one introduces a new kind of bias, called 'M-bias' (named for the shape of the graph). [...]
M-bias puts a finger on what is wrong with the traditional approach. It is incorrect to call a variable, like B, a confounder merely because it is associated with both X and Y. To reiterate, X and Y are unconfounded if we do not control for B. B only becomes a confounder when you control for it!" (Pearl, pp. 161–162)
The structure of this example seems simple and can be easily analyzed in BayesiaLab:
Given that B is a collider, there is no open path and, thus, there is no effect of X on Y at all.
As a result, nothing needs to be blocked.
However, as Pearl explains, if one were to apply the traditional three-step test for identifying a confounder, one might (incorrectly) conclude that B should be controlled for as a confounder.
Let's try this scenario in BayesiaLab and see what happens.
By controlling for B, we inadvertently open up a noncausal path between X and Y, i.e., we are introducing a bias.
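The same collider behavior can be reproduced numerically. The toy parameterization of the M-structure below is our own invention, chosen only to make the effect visible: marginally, X carries no information about Y, but conditioning on B manufactures a dependence.

```python
from itertools import product
from fractions import Fraction

def joint():
    """Joint distribution of a toy M-structure X<-A->B<-C->Y."""
    for a, c in product((0, 1), repeat=2):
        p = Fraction(1, 4)        # A and C: independent fair coins
        x, y, b = a, c, a ^ c     # X copies A, Y copies C, B is the collider
        yield p, x, y, b

def p_y1(x=None, b=None):
    """P(Y = 1 | optional evidence on X and B)."""
    num = den = Fraction(0)
    for p, xx, yy, bb in joint():
        if (x is not None and xx != x) or (b is not None and bb != b):
            continue
        den += p
        if yy == 1:
            num += p
    return num / den

p_y1(x=0), p_y1(x=1)            # both 1/2: X and Y are independent
p_y1(x=0, b=1), p_y1(x=1, b=1)  # 1 and 0: controlling for B creates dependence
```

Without evidence on B, knowing X leaves the belief about Y untouched; once B is observed, X becomes perfectly informative about Y, which is precisely the bias Pearl warns against.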
The Influence Path Analysis highlights the M-shape that gives this bias its name.
Judea Pearl concludes Chapter 4 in The Book of Why with a model from a paper by Andrew Forbes and Elizabeth Williamson on the effect of smoking (X) on adult asthma (Y). It is the final example illustrating the Back-Door Criterion. (Pearl, p. 164)
Here is the causal Bayesian network for this problem domain:
You can use Main Menu > Analysis > Visual > Graph > Influence Paths to Target to find all paths from Smoking to Asthma.
As it turns out, there are 14 noncausal paths and one causal path!
Our task is to block all 14 noncausal paths while keeping the one causal path open. If we can't do that, we won't be able to estimate the causal effect of Smoking on Asthma.
In this example, the variable Predisposition toward Asthma provides an extra challenge. It is an unobservable (or hidden) variable. Hence, you cannot adjust for it, which means you cannot use it to block any of the 14 noncausal paths.
In the end, you have to adjust for five variables (highlighted in green in the screen capture) to block all noncausal paths to estimate the causal effect of Smoking on Asthma.
After controlling for these variables, only one causal path remains, representing the relationship of interest, i.e., the effect of Smoking on Asthma.
“Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, ‘Do you want to pick door #2?’ Is it to your advantage to switch your choice of doors?”
Pearl, Judea. The Book of Why: The New Science of Cause and Effect (p. 190). Basic Books. Kindle Edition.
We implemented the Monty Hall Problem as a Causal Bayesian Network in which arcs represent causal relationships.
The game can be described with three nodes, each of which has the same three states: Door #1, Door #2, and Door #3.
Your Door, which represents the initial choice that you make as the contestant in this game. We assume a uniform prior distribution, i.e., each door has the same probability of being picked by you.
Location of Car, as the name implies, refers to the door behind which the car is hidden. We also assume a uniform prior distribution, i.e., you do not have any knowledge as to where the prize might be located.
Door Opened is the door that Monty Hall, the game host, opens. He chooses the door according to the following two rules:
He won't open the door that you just selected;
He also knows where the car is, so he won't open the door behind which the car is located.
We represent these two rules in the following Conditional Probability Table:
We can interpret the first row of this Conditional Probability Table as "If you choose Door #1 and the car is behind Door #1, then Monty Hall will open either Door #2 or Door #3."
The second row reads: "If you choose Door #1 and the car is behind Door #2, then Monty Hall can only open Door #3."
Analogously for the third row, "If you choose Door #1 and the car is behind Door #3, then Monty Hall can only open Door #2."
If you experiment with this network in BayesiaLab or the WebSimulator, you will quickly discover that the optimal policy is always to change your door choice.
Let's go through this step by step and try it out in BayesiaLab or the WebSimulator:
You choose Door #1 and set evidence accordingly on the corresponding node: Your Door=Door #1
As per the game rules, Monty Hall cannot open Door #1, which you just picked.
As a result, Monty Hall could only open Door #2 or Door #3.
However, behind one of the two doors is the car, and Monty Hall knows where the car is.
As per the game rules, he won't reveal the car and, therefore, must open a door that presents a goat.
We simulate that Monty Hall opens Door #2 and set the evidence Door Opened=Door #2.
With Door #2 having revealed a goat, the car can only hide behind Door #1 or Door #3.
Given these pieces of evidence set so far, the Bayesian network updates the distribution of the node Location of Car:
Door #1 remains at 1/3.
Door #3 increases from 1/3 to 2/3.
This means that the grand prize, the car, is twice as likely to be behind Door #3 compared to Door #1.
As a result, you should indeed revise your original door selection and pick the other closed door instead.
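The update can be replicated without BayesiaLab by enumerating the joint distribution implied by the network. The sketch below encodes Monty's two rules directly; the function and variable names are ours.

```python
from fractions import Fraction

def car_posterior(your_door, opened_door, doors=(1, 2, 3)):
    """P(Location of Car | Your Door, Door Opened), by enumerating the
    joint distribution of the causal network described above."""
    weights = {}
    for car in doors:                          # uniform 1/3 prior on the car
        # Monty's rules: he opens neither your door nor the car's door,
        # choosing uniformly when more than one door qualifies.
        options = [d for d in doors if d not in (your_door, car)]
        if opened_door in options:
            w = Fraction(1, 3) * Fraction(1, len(options))
            weights[car] = weights.get(car, Fraction(0)) + w
    total = sum(weights.values())
    return {car: w / total for car, w in weights.items()}

posterior = car_posterior(your_door=1, opened_door=2)
# posterior[1] == 1/3, posterior[3] == 2/3: switching doubles your chances
```

Because Monty had two admissible doors when the car was behind Door #1 but only one when it was behind Door #3, the posterior splits 1/3 versus 2/3 rather than evenly.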
The optimal decision policy could be different if we also considered the psychological cost of regret and the expected utility from the prizes.
"If I should switch no matter what door I originally chose, then it means that the producers somehow read my mind. How else could they position the car so that it is more likely to be behind the door I did not choose?
The key element in resolving this paradox is that we need to take into account not only the data (i.e., the fact that the host opened a particular door) but also the data-generating process—in other words, the rules of the game." (Pearl, pp. 191–192)
The Causal Bayesian Network we encoded above does indeed represent the data-generating process of this domain: Monty Hall decides which door to open based on two criteria, (1) the choice of the contestant and (2) the location of the car.
This particular arrangement of two causes and their common effect is called a V-structure. In such a V-structure, we call the common effect a Collider. In our example, the node Door Opened is such a Collider.
V-structures have important characteristics: parent nodes, which are initially independent, become dependent if we set any pieces of evidence on their common child node, i.e., the Collider or any of the Collider's descendants. In other words, in a V-structure, parent nodes are marginally independent but conditionally dependent given evidence on their descendants.
Now we know that conditioning on Door Opened allows for information to flow from Your Door to Location of Car, as visualized by the green arrow below:
"It is a bizarre dependence for sure, one of a type that most of us are unaccustomed to. It is a dependence that has no cause." (Pearl, p. 194)
It is precisely the difficulty in understanding this conditional dependency that has made this game so intriguing.
Most casual observers, however, would attempt to reason about this problem the way we illustrate in the following noncausal Bayesian network.
In this context, the following Conditional Probability Tables would apply:
For the node Door Opened: Monty Hall cannot open the door the player chose:
For Location of Car: It cannot be behind the door that Monty Hall opened:
Now that we have formally encoded such a (mis)understanding of the domain, we can simulate the game again:
We pick Door #1, then Monty Hall opens Door #2.
With the given (incorrect) network, we would infer that Door #1 and Door #3 have equal probabilities of containing the car. As a result, there would be no reason to reconsider our initial choice.
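The flawed reasoning can be encoded in the same style. The sketch below (names ours) conditions only on which doors remain closed and ignores Monty's selection rules:

```python
from fractions import Fraction

def naive_car_posterior(your_door, opened_door, doors=(1, 2, 3)):
    """The (incorrect) noncausal reading: the car is taken to be uniform
    over the doors Monty left closed, with no regard for the rules he
    used to pick the door he opened."""
    closed = [d for d in doors if d != opened_door]
    return {d: Fraction(1, len(closed)) for d in closed}

naive_car_posterior(your_door=1, opened_door=2)
# {1: 1/2, 3: 1/2}: no apparent reason to switch
```

Dropping the data-generating process collapses the 1/3 versus 2/3 split into an even one, which is exactly the trap most casual observers fall into.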
The problem that Judea Pearl describes is based on the popular game show Let's Make a Deal. The show, launched in 1963, was hosted for nearly 30 years by Monty Hall. Given its counterintuitive (and controversial) solution, the stated problem has been debated extensively in academia and popular science and became widely known as the "Monty Hall Problem."
You can download this Bayesian network in XBL format here:
Alternatively, you can experiment with different game scenarios via our WebSimulator:
Please see Structures Within a DAG in our book to learn more about the important characteristics of different network structures.
The disagreement between the normative solution we explained earlier and the "common-sense" reasoning presented just now has fueled fierce debates and puzzled great minds for decades. With all that has been written about this paradox over the years, and having used the Monty Hall Problem extensively as an example in our own training sessions, we should let the host himself illuminate us. The matter was settled once and for all in 1991 with an experiment at the dining table of Monty Hall's residence in Beverly Hills. New York Times journalist John Tierney shares Monty Hall's perspective on the controversy in his article.
Stefan Conrady, Bayesia USA
The share of renewable energy sources appears to be growing around the world. Headlines suggest that some places can already meet a large portion of their electricity needs by wind and solar power alone.
The accounting seems straightforward: If wind and solar sources jointly produce more electricity than the amount of electricity consumed, fossil fuel independence seems to be at hand, right?
For obvious reasons, wind and solar energy sources do not provide electricity at a steady rate. If you will, their power delivery is at the whim of nature. Even if their average energy production over a period of time, e.g., a day or a week, equals the average demand over that timeframe, that may not keep the lights on as electricity needs to be produced the very instant it is consumed.
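A toy calculation makes the point concrete; the numbers below are entirely arbitrary and serve only to illustrate the gap between average and instantaneous balance.

```python
# Four hypothetical hours with matching totals (40 units each):
demand = [10, 10, 10, 10]   # steady consumption
wind   = [0, 20, 20, 0]     # intermittent production

assert sum(wind) == sum(demand)   # the averages balance perfectly ...

# ... yet half the demand arrives when the wind does not blow:
shortfall = sum(max(d, - s, 0) if False else max(d - s, 0)
                for d, s in zip(demand, wind))
# shortfall == 20: those units must come from backup sources or storage
```

Even with production and consumption equal on average, 20 of the 40 units of demand go unserved unless some other source or a storage battery bridges the gap.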
Would installing more solar panels and more wind turbines help provide reliable energy? Could large storage batteries perhaps compensate for the volatility of wind and sun? Could efficiency improvements make a difference in terms of stability?
Future technical innovations are indeed unknowable, but physical laws are firmly established. In general, the efficiency of any system cannot exceed 100%. Specifically for wind turbines, Betz's Law states that no turbine can extract more than 59.3% of the kinetic energy of the wind passing through it. With that, firm upper boundaries limit our considerations. Yet, at the same time, they highlight the potential for substantial improvement when compared to the technical realities of today.
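The Betz figure follows from elementary actuator-disk theory, where the power coefficient is Cp(a) = 4a(1 - a)^2 for axial induction factor a and peaks at a = 1/3. A quick numerical check of the maximum:

```python
def power_coefficient(a):
    """Ideal actuator-disk power coefficient, Cp(a) = 4a(1 - a)^2."""
    return 4 * a * (1 - a) ** 2

# Scan the physically meaningful range 0 <= a <= 1/2 on a fine grid:
best_a = max((i / 1000 for i in range(501)), key=power_coefficient)
betz_limit = power_coefficient(best_a)
# best_a is close to 1/3, and betz_limit is close to 16/27, about 0.593
```

The maximum, 16/27 of the wind's kinetic energy, is the hard ceiling referenced above, independent of any future turbine design.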
With so many unknowns, how can we estimate the long-term potential of renewable energy sources? This is a case for reasoning under uncertainty with Bayesian networks.
As a case study, we explore El Paso, Texas, a city that — at first glance — seems to be a plausible place for exploiting renewable energy sources. It's very sunny and fairly windy, with lots of open space for wind and solar farms.
We also happen to have access to a multi-year dataset of El Paso's hourly electricity usage plus an hourly history of solar radiation and wind speed. Thus, we have the actual conditions in which alternative energy sources would need to perform.
To start the webinar, we demonstrate how to machine-learn probabilistic relationships from historical energy and climate data. Then, we augment this machine-learned Bayesian network with additional nodes that represent the physics of solar panels and wind turbines. Our objective is to explore the boundaries of what might be feasible in the future, given physical and natural constraints.
With such a framework in place, subject matter experts can then provide their beliefs regarding the future efficiency of energy technologies and their costs. Our Bayesian network can take into account these estimated future characteristics of solar panels and wind turbines, plus the anticipated efficiency and cost of traditional power sources or storage batteries, which may still have to supplement the renewables.
Based on learned probabilistic relationships, known physical constants, plus a range of expert assumptions, we can then calculate future scenarios and derive optimal energy mixes for each case. The Bayesian network model of this domain can also help us identify the areas of opportunity by examining the sensitivities of the estimated parameters.
Although this problem domain is undoubtedly important and interesting, we focus on the Bayesian network methodology instead of developing policy recommendations. No part of this webinar should be considered an endorsement or criticism of current renewable energy initiatives.
The Wisconsin Breast Cancer Database is a well-known dataset for evaluating machine-learned predictive models.
This data was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison.
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
William H. Wolberg and O.L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
O. L. Mangasarian, R. Setiono, and W.H. Wolberg: "Pattern recognition via linear programming: Theory and application to medical diagnosis", in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.
K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).
You can find a detailed discussion of this example in our ebook.
Stefan Conrady, Bayesia USA
Common sense suggests that we always choose the activity with the strongest positive effect on the desired outcome as our top priority for action. For instance, in analyzing call center performance, a statistical model may suggest that the average wait time has the strongest effect on callers' overall satisfaction. With that, and ignoring the cost, for now, we would want to reduce the wait time as our top priority, right? Maybe not.
The critical concept to consider here is joint probability. Unfortunately, in many modeling frameworks, this quantity does not even appear. Hence, any optimization effort would not be able to utilize the joint probability in determining the order of priorities.
Modeling a problem domain with Bayesian networks and BayesiaLab, however, one can calculate the joint probability and use it for optimization purposes. BayesiaLab performs this particular type of optimization with its Target Dynamic Profile function.
In this webinar, we illustrate how we can use Target Dynamic Profile to identify the sequential order in which the key drivers should be improved to maximize overall customer satisfaction.
Stefan Conrady, Bayesia USA
Given the cost and technical challenges of experimenting with real physical systems, it is common practice in engineering to approximate such domains with computer models. These models incorporate the physical laws, typically in the form of hundreds of differential equations, that govern the system's real-world behavior. That way, the system performance can be examined in detail before building a hardware prototype. Simulations allow engineers to try out different parameters of components in pursuit of performance objectives while taking into account constraints.
While increasing computing power has made simulations more accessible and affordable than ever, performance remains a significant constraint. Why? For each set of parameter values, a new simulation must be run, in which many differential equations need to be solved. While this might only take a few seconds on modern computers, one generally needs to test many levels of many parameters, which would require thousands or even millions of iterations.
So, as the computing requirements grow exponentially with the number of parameters and levels, alternative strategies must be sought. One approach would be to be very selective in terms of which parameters and levels to test, i.e., narrowing the search space by utilizing a search algorithm.
In our webinar, we pursue a different strategy. We approximate the original simulation model, which is based on differential equations, with another model, a so-called meta-model. In effect, we are simulating the simulation of the real-world system. This approach is generally known as Response Surface Methodology, in which the relationships between parameters and target variables are approximated with functions that are unrelated to the underlying physical laws.
In our case, we machine-learn a Bayesian network meta-model from previously computed simulation data and, thus, capture the relationships between parameters and target variables entirely nonparametrically. With such a meta-model, we can perform inference much faster and see the effect of parameter changes in real-time.
In our case study, we examine the impact of tire stiffness, spring rates, and damping constants on the ride comfort of vehicle passengers. We create a so-called "quarter-vehicle model" using a two-mass spring/damper system in Modelica. On that basis, we simulate the vehicle's response to vertical excitations from a synthesized, irregular road surface. This produces the sample observations we need for learning a Bayesian network model with BayesiaLab.
Bayesian networks offer another key advantage in that variables can represent distributions in the frequency domain. This allows for a direct evaluation of the impact of parameter changes on the frequency response of the vertical vehicle body acceleration. This frequency response will be our principal measure for judging ride quality and body control. Generally speaking, we want to minimize peaks in the frequency range between 0 Hz and 200 Hz.
In this context, the entropy of the frequency response curve, an information-theoretic measure of its uniformity, is very helpful. For instance, BayesiaLab's built-in optimization algorithms can search for parameter values that maximize the entropy of the frequency response, i.e., make the curve as uniform as possible.
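To make the entropy criterion concrete, here is a small self-contained sketch (our own construction, not a BayesiaLab function): the curve is normalized like a probability distribution, so a perfectly flat response attains the maximum possible entropy while a resonance peak drives it down.

```python
from math import log

def curve_entropy(values):
    """Shannon entropy of a frequency-response curve, after normalizing
    the (non-negative) curve so that it sums to one like a distribution."""
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * log(p) for p in probs)

flat   = [1.0] * 8                                  # uniform response
peaked = [0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1, 0.1]   # one resonance peak

curve_entropy(flat) > curve_entropy(peaked)   # True: flatter means higher entropy
```

Flattening the curve pushes its entropy toward the theoretical maximum of log(n) for n frequency bins, so an optimizer can use this single number as its objective.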
Meta models are ultimately shortcuts for a faster system evaluation. As it turns out, we implicitly use meta-models in the qualitative evaluation of systems all the time. Reasoning about a vehicle suspension, we would presumably argue that "softer spring and damper settings" generally provide a "smoother ride," and we could make such a (correct) statement entirely without solving any differential equations.
A recent presentation by Zack Xuereb Conti on simulation meta-modeling with Bayesian networks provided the impetus for developing this case study.