Webinar: Key Driver Analysis with Bayesian Networks — From Observational Data to Causal Inference
Recorded on April 24, 2025.
Executive Summary
Our free webinar offers a comprehensive tutorial on leveraging Bayesian networks, BayesiaLab, and GenAI for Key Driver Analysis and Decision Support. This webinar's case study focuses on HR Analytics, and we utilize data from the 2023 Federal Employee Viewpoint Survey (FEVS) to identify the optimal prioritization of management initiatives that would most effectively and efficiently enhance employee satisfaction. With years of consulting experience, we have successfully implemented this methodology for the world’s largest CPG, skincare, beverage, food, and healthcare companies.
Given the large number of variables in the survey, we begin our workflow by utilizing BayesiaLab's machine learning and clustering algorithms to create a Bayesian network model and reduce its dimensionality by identifying latent factors. Additionally, we use BayesiaLab's GenAI assistant, Hellixia, to generate meaningful characterizations of these newly identified factors.
Our first research objective is to prioritize opportunities for enhancing overall employee satisfaction at each federal agency by performing an observational Key Driver Analysis using the Bayesian network model learned from the FEVS data.
Next, we transition this machine-learned Bayesian network into a model designed for causal inference, enabling simulations of policy interventions, such as evaluating the effects of mandating a return to the office for employees. To support this shift from observational to causal inference, we incorporate domain knowledge into the model by applying the Disjunctive Cause Criterion.
This case study demonstrates how machine-learned Bayesian networks, combined with causal domain knowledge, can facilitate policy analysis for robust and audit-proof decision support.
Webinar Workflow
Bayesian Network Learning
We start our decision-support workflow by learning a Bayesian network model from the 2023 Federal Employee Viewpoint Survey (FEVS) data with BayesiaLab's Unsupervised Machine-Learning Algorithms. This approach enables us to capture high-dimensional associations within the underlying problem domain.
Initial Analysis
To explore our initial model, we evaluate information-theoretic measures, such as Mutual Information and Kullback-Leibler Divergence, to learn about the strength of associations within the network. This step serves as a critical sanity check, confirming that the learned structure aligns with the expected data patterns.
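To make these two measures concrete, here is a minimal Python sketch that estimates Mutual Information from paired discrete responses and computes the Kullback-Leibler Divergence between two distributions. It is only an illustration and not BayesiaLab's implementation; the function names and the toy responses are assumptions made for this example and are not FEVS data.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equally long lists of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def kl_divergence(p, q):
    """D(P||Q) in bits for two distributions listed over the same states.

    Assumes q is positive wherever p is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy answers to three survey questions (not FEVS data).
q_a = ["agree", "agree", "disagree", "disagree"]
q_b = ["agree", "agree", "disagree", "disagree"]  # moves in lockstep with q_a
q_c = ["agree", "disagree", "agree", "disagree"]  # unrelated to q_a

mi_dependent = mutual_information(q_a, q_b)
mi_independent = mutual_information(q_a, q_c)
kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
kl_shifted = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

In this toy example, the two questions that move in lockstep share one full bit of information, while the unrelated pair shares none. A strong arc in a learned network should correspond to a pair of variables with high Mutual Information.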
Latent Factor Induction
With this initial Bayesian network established, we utilize BayesiaLab’s clustering algorithms to identify latent concepts among the manifest variables recorded in the survey. These latent variables represent major themes or concepts that can be indirectly observed through the manifest variables. Using Multiple Clustering, we formally introduce these latent variables as factors within the network, allowing them to summarize the manifest variables and, therefore, provide a big-picture view of the problem domain.
To generate meaningful names for the newly created factors, we use Hellixia, BayesiaLab's new GenAI assistant. This ensures that the new factor names provide conceptual abstractions of the manifest variable names, while the factors represent mathematical summaries of the manifest variables.
Key Driver Analysis — Observational Inference
To apply this framework to a specific research question, we designate Q43, "I recommend my organization as a good place to work," as the Target Node. In the original FEVS study, this variable served as an overall satisfaction index. By connecting this Target Node to the newly created factors, our Bayesian network transforms into a Probabilistic Structural Equation Model.
Building on this foundation, we conduct a Key Driver Analysis to evaluate how the factors within the model influence the Target Node. However, a key challenge must be highlighted: while the term "driver" implies causality, our survey data is strictly observational, meaning that we are limited — for now — to performing observational inference to estimate the "effects" of the factors on the Target Node. As a result, we can simulate what-if scenarios but cannot compute the causal impact of policy interventions.
Despite this limitation, we leverage BayesiaLab's Target Dynamic Profile function to identify the optimal order of priorities among variables for improving Q43. However, for formal decision support regarding policy options, we ultimately require causal inference.
Transition to Causal Inference for Impact Analysis
To overcome this limitation, we must integrate external causal information into the model, e.g., from human domain knowledge. We do that by applying the Disjunctive Cause Criterion to distinguish between Confounders and Non-confounders within the network. Under specific conditions, we can now perform causal inference using our machine-learned, non-causal model.
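The difference between observational and causal inference can be illustrated with a small Python sketch. It assumes that domain knowledge has already identified a single Confounder Z, and it estimates the causal effect with the standard adjustment formula, P(Y | do(X=x)) = sum over z of P(Y | X=x, Z=z) * P(Z=z). The variable meanings, counts, and function names are hypothetical and chosen for illustration only; this is not BayesiaLab's procedure and not FEVS data.

```python
from collections import defaultdict


def observational(records, x_val):
    """P(Y=1 | X=x_val): plain conditioning without any adjustment."""
    ys = [y for _, x, y in records if x == x_val]
    return sum(ys) / len(ys)


def adjusted(records, x_val):
    """P(Y=1 | do(X=x_val)) = sum_z P(Y=1 | X=x_val, Z=z) * P(Z=z).

    Assumes Z is a sufficient adjustment set and that every stratum of Z
    contains at least one record with X=x_val (positivity).
    """
    n = len(records)
    strata = defaultdict(list)
    for z, x, y in records:
        strata[z].append((x, y))
    total = 0.0
    for rows in strata.values():
        ys = [y for x, y in rows if x == x_val]
        total += (sum(ys) / len(ys)) * (len(rows) / n)
    return total


def make(z, x, n, n_satisfied):
    """Build n hypothetical records (z, x, y) with n_satisfied having y=1."""
    return [(z, x, 1)] * n_satisfied + [(z, x, 0)] * (n - n_satisfied)


# Hypothetical data: Z = job type, X = works in office, Y = satisfied.
records = (
    make(z=0, x=1, n=8, n_satisfied=6)
    + make(z=0, x=0, n=2, n_satisfied=1)
    + make(z=1, x=1, n=2, n_satisfied=1)
    + make(z=1, x=0, n=8, n_satisfied=2)
)

obs_effect = observational(records, 1) - observational(records, 0)
causal_effect = adjusted(records, 1) - adjusted(records, 0)
```

In this toy data, the unadjusted difference overstates the effect of X on Y because job type influences both where people work and how satisfied they are. Adjusting for Z removes that spurious part of the association, which is why identifying the Confounders is the critical step.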
With this framework, we can simulate the causal effects of hypothetical policy interventions. For our purposes, we estimate the impact of a hypothetical return-to-office policy on employee perceptions and workplace dynamics. This impact analysis of a proposed policy now serves as the basis for decision support.
For formal decision support, however, we require more than the estimated impact of a policy under consideration. As we will see, the impact of the policy goes beyond the Target Node and "spreads" to a wide range of variables throughout the network. While the policy impact on the Target Node might be desirable from a decision-maker's perspective, the side effects might not.
Formal Decision Analysis and Support Using Utilities
To provide a comprehensive assessment of the domain as a result of a policy intervention, we need to introduce so-called Utilities and define them for all relevant variables. These Utilities represent the value judgments of decision-makers and stakeholders concerning the states of variables in this domain. For instance, higher employee satisfaction would have a positive Utility, but an increase in employee turnover should be considered negative. Summing the expected Utilities across all variables under each policy option allows us to compare the proposed policy with its alternatives and inform the pending decision.
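The summation of Utilities can be sketched in a few lines of Python. The variables, state probabilities, and Utility values below are hypothetical and serve only to illustrate the arithmetic; in practice, the distributions would come from the causal simulation and the Utilities from the decision-makers.

```python
def expected_utility(distribution, utilities):
    """Expected Utility of one variable: sum of P(state) * Utility(state)."""
    return sum(p * utilities[state] for state, p in distribution.items())


def total_utility(outcome, utility_table):
    """Sum of the expected Utilities across all variables with defined Utilities."""
    return sum(
        expected_utility(outcome[var], utility_table[var]) for var in utility_table
    )


# Hypothetical value judgments: satisfaction is good, turnover is costly.
utility_table = {
    "satisfaction": {"high": 10.0, "low": 0.0},
    "turnover": {"high": -20.0, "low": 0.0},
}

# Hypothetical simulated distributions under two policy options.
status_quo = {
    "satisfaction": {"high": 0.6, "low": 0.4},
    "turnover": {"high": 0.1, "low": 0.9},
}
proposed_policy = {
    "satisfaction": {"high": 0.7, "low": 0.3},
    "turnover": {"high": 0.3, "low": 0.7},
}

u_status_quo = total_utility(status_quo, utility_table)
u_policy = total_utility(proposed_policy, utility_table)
best_option = "proposed_policy" if u_policy > u_status_quo else "status_quo"
```

In this made-up scenario, the proposed policy raises satisfaction but also raises turnover, and the cost of the side effect outweighs the gain on the target. Evaluating the total Utility rather than the Target Node alone is what exposes this trade-off.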
Similar Studies and Applications
While this case study focuses on personnel management, the same approach applies to many other domains. In the past, we have shown similar workflows in the context of consumer surveys with the objective of improving product satisfaction.
About the Presenter
Stefan Conrady has over 20 years of experience in decision analysis, analytics, market research, and product strategy, having worked with Mercedes-Benz, BMW Group, Rolls-Royce Motor Cars, and Nissan across North America, Europe, and Asia.
As Managing Partner of Bayesia USA and Bayesia Singapore, he is widely recognized as a thought leader in applying Bayesian networks to research, analytics, and decision-making. Together with his business partner, Dr. Lionel Jouffe, he co-authored Bayesian Networks & BayesiaLab — A Practical Introduction for Researchers, an influential resource now widely cited in academic literature.
With their deep expertise in Bayesian networks for Key Driver Analysis and Optimization, Stefan and Lionel are highly sought-after consultants, advising global leaders such as Procter & Gamble, Coca-Cola, UnitedHealth Group, L’Oréal, the World Bank, and many of the world’s largest market research firms.