Chapter 8: Probabilistic Structural Equation Models for Key Driver Analysis

Structural Equation Modeling is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of a Structural Equation Model (SEM) was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943), and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000). SEMs allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.
What we call Probabilistic Structural Equation Models (PSEMs) in BayesiaLab are conceptually similar to traditional SEMs. However, PSEMs are based on a Bayesian network structure as opposed to a series of equations. More specifically, PSEMs can be distinguished from SEMs in terms of key characteristics:
  • All relationships in a PSEM are probabilistic (hence the name), as opposed to having deterministic relationships plus error terms in traditional SEMs.
  • PSEMs are nonparametric, which facilitates the representation of nonlinear relationships as well as relationships between categorical variables.
  • The structure of PSEMs is partially or fully machine-learned from data.
In general, specifying and estimating a traditional SEM requires a high degree of statistical expertise. Additionally, the multitude of manual steps involved can make the entire SEM workflow extremely time-consuming. The PSEM workflow in BayesiaLab, on the other hand, is accessible to non-statistician subject matter experts. Perhaps more importantly, it can be faster by several orders of magnitude. Finally, once a PSEM is validated, it can be utilized like any other Bayesian network. This means that the full array of analysis, simulation, and optimization tools is available to leverage the knowledge represented in the PSEM.

Example: Consumer Survey

In this chapter, we present a prototypical PSEM application: key drivers analysis and product optimization based on consumer survey data. We examine how consumers perceive product attributes and how these perceptions relate to the consumers’ purchase intent for specific products.
Given the inherent uncertainty of survey data, we also wish to identify higher-level variables, i.e., “latent” variables that represent concepts that are not directly measured in the survey. We do so by analyzing the relationships between the so-called “manifest” variables, i.e., variables that are directly measured in the survey. Including such concepts helps in building more stable and reliable models than what would be possible using manifest variables only.
Our overall objective is to make surveys easier for researchers to interpret and to make them “actionable” for decision-makers. The ultimate goal is to use the generated PSEM for prioritizing marketing and product initiatives to maximize purchase intent.

This study is based on a monadic consumer survey about perfumes, which was conducted by a market research agency in France. In this study, each respondent evaluated only one perfume.
In this example, we use survey responses from 1,320 women who have evaluated a total of 11 fragrances (representative of the French market) on a wide range of attributes:
  • 27 ratings on fragrance-related attributes, such as Sweet, Flowery, Feminine, etc., measured on a 1–10 scale.
  • 12 ratings with regard to imagery about someone who wears the respective fragrance, e.g., Sexy, Modern, measured on a 1–10 scale.
  • 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1–5 scale. The variable Intensity is listed separately due to the a priori knowledge of its non-linearity and the existence of a “just-about-right” level.
  • 1 variable for Purchase Intent, measured on a 1–6 scale.
  • 1 nominal variable, Product, for product identification.

Workflow Overview

A PSEM is a hierarchical Bayesian network that can be generated through a series of machine-learning and analysis tasks:
  • Unsupervised Learning to discover the strongest relationships between the manifest variables.
  • Variable Clustering, based on the learned Bayesian network, to identify groups of variables that are strongly connected.
  • Multiple Clustering: we consider the strong intra-cluster connections identified in the Variable Clustering step to be due to a “hidden common cause.” For each cluster of variables, we use Data Clustering (on the variables within the cluster only) to induce a latent variable representing the hidden cause.
  • Unsupervised Learning to find the interrelations between the newly-created latent variables and their relationships with the Target Node.

Of particular interest is BayesiaLab’s Contingency Table Fit (CTF), which measures the quality of the representation of the joint probability distribution (JPD). It is defined as:
\[CTF(B) = \frac{\overline{ll}(B) - \overline{ll}(B_u)}{\overline{ll}(B_f) - \overline{ll}(B_u)}\]
where \(\overline{ll}(B)\) is the mean of the log-likelihood of the data given the network currently under study, \(\overline{ll}(B_u)\) is the mean of the log-likelihood of the data given the fully unconnected network, i.e., the “worst-case scenario,” and \(\overline{ll}(B_f)\) is the mean of the log-likelihood of the data given the fully connected network, i.e., the “best-case scenario.”
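To make the roles of the three log-likelihood terms concrete, here is a minimal sketch. It assumes that the score is normalized so that the fully unconnected network scores 0 and the fully connected network scores 1. The function name and the numbers are hypothetical and are not taken from BayesiaLab or from the study.

```python
def contingency_table_fit(ll_network, ll_unconnected, ll_fully_connected):
    """Return a normalized fit score from three mean log-likelihoods.

    Assumption: 0 corresponds to the fully unconnected network and
    1 to the fully connected network.
    """
    span = ll_fully_connected - ll_unconnected
    if span <= 0:
        raise ValueError("the fully connected network must fit better than the unconnected one")
    return (ll_network - ll_unconnected) / span


# Hypothetical mean log-likelihoods, for illustration only.
ctf_example = contingency_table_fit(
    ll_network=-52.0, ll_unconnected=-60.0, ll_fully_connected=-40.0
)
```

With these illustrative inputs, the network under study recovers 8 of the 20 log-likelihood units that separate the worst case from the best case.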
Step 4: Completing the Probabilistic Structural Equation Model
Based on the “final” network, we can proceed to the next step in our network building process. We now introduce Purchase Intent, which had been excluded up to this point. Clicking this node while holding X renders it “un-excluded.” This makes Purchase Intent available for learning. Additionally, we designate Purchase Intent as the Target Node by double-clicking the node while holding T.
An SEM-type network structure requires that Manifest variables be connected exclusively to the Factors and that all connections with Purchase Intent go through the Factors. We have already imposed this constraint by setting the option Forbid New Relations with Manifest Variables in the Multiple Clustering dialog box. This created so-called Forbidden Arcs, which prevent learning algorithms from creating new arcs between the specified nodes. BayesiaLab indicates the presence of Forbidden Arcs with an icon in the lower right-hand corner of the Graph Panel window. Clicking on the icon brings up the Forbidden Arc Editor, which allows us to review the currently set constraints. We see that the nodes belonging to the Class Manifest must not have any new links to any other nodes, i.e., both directions are “forbidden.”
Upon confirming these constraints, we start Unsupervised Learning to generate a network that includes the Factors and the Target Node. In this particular situation, we need to utilize Taboo Learning. It is the only algorithm in BayesiaLab that can learn a new structure on top of an existing network structure and simultaneously guarantee to keep Fixed Arcs unchanged (EQ can also be used for structural learning on top of an existing network, but as it searches in the space of Essential Graphs, there is no guarantee that the Fixed Arcs remain unchanged). This is important as the arcs linking the Factors and their Manifest variables are such Fixed Arcs. To distinguish them visually, Fixed Arcs appear as dotted lines in the network, as opposed to the solid lines of “regular” arcs.
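The combined effect of Forbidden Arcs and Fixed Arcs can be illustrated with a small check outside of BayesiaLab. The following is only a sketch: the function, the data structures, and the assignment of Sweet and Fruity to [Factor_1] are hypothetical and serve purely as an illustration of the constraint.

```python
def forbidden_arc_violations(arcs, node_class, fixed_arcs):
    """List arcs that touch a Manifest node without being a Fixed Arc."""
    violations = []
    for parent, child in arcs:
        touches_manifest = "Manifest" in (node_class[parent], node_class[child])
        if touches_manifest and (parent, child) not in fixed_arcs:
            violations.append((parent, child))
    return violations


# Hypothetical classes and Factor-to-Manifest assignments.
node_class = {
    "Factor_1": "Factor",
    "Factor_2": "Factor",
    "Sweet": "Manifest",
    "Fruity": "Manifest",
    "Purchase Intent": "Target",
}
fixed_arcs = {("Factor_1", "Sweet"), ("Factor_1", "Fruity")}

# A candidate structure; the last arc bypasses the Factors.
candidate_arcs = [
    ("Factor_1", "Sweet"),
    ("Factor_1", "Fruity"),
    ("Factor_1", "Factor_2"),
    ("Factor_2", "Purchase Intent"),
    ("Sweet", "Purchase Intent"),
]
violations = forbidden_arc_violations(candidate_arcs, node_class, fixed_arcs)
```

Only the direct arc from a Manifest node to Purchase Intent is flagged, which is the kind of connection the learning algorithm is prevented from adding.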
We start Taboo Learning from the main menu by selecting Learning > Unsupervised Structural Learning > Taboo and check the option Keep Network Structure in the Taboo Learning dialog box.
Upon completing the learning process, we obtain the network shown below.
As in Step 1, we also try to improve the quality of this network by using the Data Perturbation algorithm.
As it turns out, this algorithm allowed us to escape from a local optimum and returned a final network with a lower MDL Score. By using Automatic Layout (P) and turning on Node Comments, we can quickly transform this network into a much more interpretable format.
Now we see how the manifest variables are “laddering up” to the Factors, and we also see how the Factors are related to each other. Most importantly, we can observe where the Purchase Intent node was attached to the network during the learning process. The structure conveys that Purchase Intent is only connected to [Factor_2], which is labeled with the Node Comment “Pleasure_(4).”

Key Drivers Analysis

Our Probabilistic Structural Equation Model is now complete, and we can use it to perform the analysis part of this exercise, namely to find out what “drives” Purchase Intent. We return to Validation Mode (F5).
In order to understand the relationship between the factors and Purchase Intent, we want to “tune out” all Manifest variables for the time being. We can do so by right-clicking the Classes icon in the bottom right corner of the Graph Panel window. This brings up a list of all Classes. By default, all are checked and thus visible.
For our purposes, we want to un-check All and then only check the class Factor.
In the resulting view, all the Manifest Nodes are transparent, so the relationship between the Factors becomes visually more prominent. By de-selecting the Manifest Nodes in this way, we also exclude them from the following visual analysis.
Target Analysis
In line with our objective of learning about the key drivers in this domain, we proceed to analyze the association of the newly created Factors with Purchase Intent.
In Validation Mode (F5), we can use two approaches to learn about the relationships between Factors and the Target Node: we first perform a visual analysis and then generate a report in table format.
Visual Analysis
We initiate the visual analysis by selecting Analysis > Visual > Target Mean Analysis > Standard:
This brings up a dialog box with the options shown below. Given the context, selecting Mean for both the Target Node and the Variables (Nodes) is appropriate.
Upon clicking Display Sensitivity Chart, the resulting plot shows the response curves of the Target Node as a function of the values of the Factors. This allows an immediate interpretation of the strength of association.
Target Analysis Report
As an alternative to the visual analysis, we now run the Target Analysis Report: Analysis > Report > Target Analysis > Total Effects on Target. Although “effects” carries a causal connotation, we need to emphasize that we are strictly examining associations. This means that we perform observational inference as we generate this report.
A new window opens up to present the report. Under Options > Settings > Reporting, we can check Display the Node Comments in Tables so that Node Comments appear in addition to the Node Names in all reports.
The Total Effect (TE) is estimated as the derivative of the Target Node with respect to the driver node under study.
$$TE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}}$$
where X is the analyzed variable and Y is the Target Node. The Total Effect represents the change in the mean of the Target Node associated with (and not necessarily caused by) a small modification of the mean of a driver node. It is the ratio of these two changes, \(\delta_Y\) and \(\delta_X\).
This way of measuring the effect of the Factors on the Target Node assumes the relationships to be locally linear. Even though this is not always a correct assumption, it can be reasonable for simulating small changes of satisfaction levels.
As per the report, [Factor_2] provides the strongest Total Effect with a value of 0.399. This means that observing an increase of one unit in the level of the concept represented by [Factor_2] predicts a posterior probability distribution of Purchase Intent that has an expected value that is 0.399 higher compared to the marginal value.
The Standardized Total Effect (STE) is also displayed. It represents the Total Effect multiplied by the ratio of the standard deviation of the driver node and the standard deviation of the Target Node.
$$STE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}} \times \frac{{{\sigma _X}}}{{{\sigma _Y}}}$$
This means that Standardized Total Effect takes into account the “potential” of the driver under study.
In the report, the results are sorted by the Standardized Total Effect in descending order. This immediately highlights the order of importance of the Factors relative to the Target Node, Purchase Intent.
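To illustrate the two measures numerically, the following sketch approximates the local slope by the regression slope Cov(X, Y) / Var(X), which is what the ratio of the two mean changes tends to for a very small shift of the driver's distribution. This is an illustrative approximation under the local-linearity assumption stated above and is not a description of how BayesiaLab computes the report. The function name and the joint table are hypothetical.

```python
from math import sqrt


def total_effects(x_values, y_values, joint):
    """Return (TE, STE) from a joint table joint[i][j] = P(X = x_i, Y = y_j)."""
    px = [sum(row) for row in joint]          # marginal of the driver
    py = [sum(col) for col in zip(*joint)]    # marginal of the target
    mean_x = sum(p * x for p, x in zip(px, x_values))
    mean_y = sum(p * y for p, y in zip(py, y_values))
    var_x = sum(p * (x - mean_x) ** 2 for p, x in zip(px, x_values))
    var_y = sum(p * (y - mean_y) ** 2 for p, y in zip(py, y_values))
    cov_xy = sum(
        joint[i][j] * (x - mean_x) * (y - mean_y)
        for i, x in enumerate(x_values)
        for j, y in enumerate(y_values)
    )
    te = cov_xy / var_x                       # local slope of E[Y] in E[X]
    ste = te * sqrt(var_x) / sqrt(var_y)      # rescaled by sigma_X / sigma_Y
    return te, ste


# Hypothetical toy table in which the target rises by exactly 2 per driver unit.
ratings = [1, 2, 3]
intent = [2, 4, 6]
joint = [[1 / 3, 0, 0], [0, 1 / 3, 0], [0, 0, 1 / 3]]
te, ste = total_effects(ratings, intent, joint)
```

In this toy case the Total Effect is the slope of 2, while the Standardized Total Effect is 1 because the driver accounts for all of the variation in the target. With this slope-based approximation, the standardized value coincides with the linear correlation between the two nodes.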
Independence Tests
In the columns further to the right in the report, the results of independence tests between the nodes are reported:
  • Chi-Square (χ2) test or G-test: The independence test is computed on the basis of the network between each driver node and the target variable. It is possible to change the type of independence test from the Chi-Square (χ2) test to the G-test via Options > Settings > Statistical Tools.
  • Degree of Freedom: Indicates the degrees of freedom of the test between each driver node and the Target Node in the network.
  • p-value: the probability, assuming that the two nodes are independent, of observing a test statistic at least as extreme as the one computed.
If a dataset is associated with the network, as is the case here, the independence test, the degrees of freedom, and the p-value are also computed directly from the underlying data.
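For readers who want to see what the two test statistics measure, here is a minimal sketch that computes both from a contingency table of counts, using the standard textbook definitions. The function name and the counts are hypothetical. The p-value is omitted because it requires the chi-square distribution function, which is not part of the Python standard library.

```python
from math import log


def independence_statistics(table):
    """Return (chi_square, g_statistic, degrees_of_freedom) for a table of counts.

    Assumes every row total and every column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    g_statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi_square += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute nothing to G
                g_statistic += 2.0 * observed * log(observed / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, g_statistic, dof


# Hypothetical counts: driver rating (low/high) versus purchase intent (low/high).
chi_square, g_statistic, dof = independence_statistics([[30, 10], [10, 30]])
```

Both statistics are zero when the observed counts match the counts expected under independence, and they grow as the table departs from that pattern.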
Factors versus Manifest Nodes
For overall interpretation purposes, looking at Factor-level drivers can be illuminating. Often, it provides a useful big-picture view of the domain. In order to identify specific product actions, however, we need to consider the Manifest-level drivers. As pointed out earlier, the Factor-level drivers only exist as theoretical constructs, which cannot be directly observed in data. As a result, changing the Factor nodes requires the manipulation of the underlying Manifest nodes. For this reason, we now switch back our view of the network in order to only consider the Manifest nodes in the analysis. We do that by right-clicking the Classes icon in the bottom right corner of the Graph Panel window. This brings up the list of all Classes, of which we only check the Class Manifest. Now all Factors are translucent and excluded from the analysis.
We repeat both the Target Mean Analysis and the Total Effects on Target report.
Not surprisingly, the Manifest Nodes show a similar pattern of association as the Factors. However, there is one important exception: the Manifest Node Intensity shows a nonlinear relationship with Purchase Intent. The curve for Intensity is shown with a gray line. Note that by hovering over a curve or a node name, BayesiaLab highlights the corresponding item in the legend or the plot respectively.
Also, we can see that Intensity was recorded on a 1−5 scale, rather than the 1−10 scale that applies to the other nodes. Intensity is a so-called “JAR” variable, i.e. a variable that has a “just-about-right” value. In the context of perfumes, this characteristic is obvious. A fragrance that is either too strong or too light is undesirable. Rather, there is a value somewhere in-between that would make a fragrance most attractive. The JAR characteristic is prototypical for variables representing sensory dimensions, e.g., saltiness or sweetness.
This emphasizes the importance of the visual analysis, as the nonlinearity goes unnoticed in the Total Effects on Target report. In fact, Intensity drops almost to the bottom of the list in the report.
It turns out to be rather difficult to optimize a JAR-type variable at a population level. For example, increasing Intensity would reduce the number of consumers who find the fragrance too subtle. On the other hand, an increase in Intensity would presumably dismay some consumers who believed the original Intensity level to be appropriate.
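A small numerical example shows why a slope-based measure can rank a JAR variable near the bottom even though it matters a great deal. The numbers below are hypothetical and were chosen to form a symmetric inverted U. The slope is computed as a plain weighted regression slope, which is an illustrative stand-in and not BayesiaLab's report.

```python
def linear_slope(x_values, probabilities, conditional_means):
    """Weighted regression slope of E[Y | X = x] on x."""
    mean_x = sum(p * x for p, x in zip(probabilities, x_values))
    mean_y = sum(p * m for p, m in zip(probabilities, conditional_means))
    cov = sum(
        p * (x - mean_x) * (m - mean_y)
        for p, x, m in zip(probabilities, x_values, conditional_means)
    )
    var_x = sum(p * (x - mean_x) ** 2 for p, x in zip(probabilities, x_values))
    return cov / var_x


# Hypothetical JAR pattern: purchase intent peaks at the middle intensity level.
intensity_levels = [1, 2, 3, 4, 5]
share_of_respondents = [0.1, 0.2, 0.4, 0.2, 0.1]
mean_purchase_intent = [2.5, 3.5, 4.2, 3.5, 2.5]

slope = linear_slope(intensity_levels, share_of_respondents, mean_purchase_intent)
spread = max(mean_purchase_intent) - min(mean_purchase_intent)
```

The slope is essentially zero because the rising and falling halves of the curve cancel out, yet the mean Purchase Intent differs by 1.7 points between the best and the worst Intensity level. A response curve makes this visible immediately, whereas a single slope does not.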
Constraints via Costs
As this key drivers analysis model is intended to be used for product optimization, we need to consider any possible real-world constraints that may limit our ability to optimize any of the drivers in this domain. For instance, a perfumer may know how to change the intensity of a perfume but may not know how to directly affect the perception of “pleasure.” In the original study, a number of such constraints were given.
In BayesiaLab, we can conveniently encode constraints via Costs, which is a Node Property. More specifically, we can declare any node as Not Observable, which, in this context, means that it cannot be considered with regard to optimization. Costs can be set by right-clicking on an individual node and then selecting Properties > Cost.
This brings up the Cost Editor for an individual node. By default, all nodes have a cost of 1.
Unchecking the box Cost, or setting a value ≤0, results in the node becoming Not Observable.
Alternatively, we can bring up the Cost Editor for all nodes by right-clicking on the Graph Panel and then selecting Edit Costs from the contextual menu.
The Cost Editor presents the default values for all nodes.
Again, setting values to zero will make nodes Not Observable. Instead of applying this setting node by node, we can import a Cost Dictionary that defines the values for each node. An excerpt from the text file is shown below. The syntax is straightforward: Not Observable is represented by 0.
From within the Cost Editor, we can use the Import button to associate a Cost Dictionary. Alternatively, we can select Data > Associate Dictionary > Node > Costs from the main menu.
Upon import, the Node Editor reflects the new values, and the presence of non-default values for costs is indicated by the Cost icon in the lower right-hand corner of the Graph Panel.
Furthermore, upon defining Costs, we can see that all Not Observable nodes are marked with a light purple background.
It is important to point out that all Factors are also set to Not Observable in our example. In fact, we do have two options here:
  1. The optimization can be done at the first level of the hierarchical model, i.e., using the Manifest variables;
  2. The optimization can be performed at the second level of the model, i.e., using the Factors.
Most importantly, these two approaches cannot be combined as setting evidence on Factors will block information coming from Manifest variables. Formally declaring the Factors as Not Observable tells BayesiaLab to proceed with option #1. Indeed, our plan is to perform optimization using the Manifest variables only.

Multi-Quadrant Analysis

The network we have analyzed thus far modeled Purchase Intent as a function of perceived perfume characteristics. It is important to point out that this model represents the entire domain of all 11 tested perfumes. It is reasonable to speculate, however, that different perfumes have different drivers of Purchase Intent. Furthermore, for purposes of product optimization, we certainly need to look at the dynamics of each product individually.
BayesiaLab assists us in this task by means of Multi-Quadrant Analysis. This is a function that can generate new networks as a function of a Breakout Node in an existing network. This is the point where the node Product comes into play, which has been excluded all this time. Our objective is to generate a set of networks that model the drivers of Purchase Intent for each perfume individually, as identified by the Product breakout variable.
We start the Multi-Quadrant Analysis by selecting Tools > Multi-Quadrant Analysis.
This brings up the dialog box, in which we need to set a number of options:
Firstly, Breakout Variable must be set to Product to indicate that we want to generate a network for each state of Product. For Analysis, we have several options; we choose Total Effects to be consistent with the earlier analysis. Regarding the Learning Algorithm, we select Parameter Estimation. This choice becomes obvious once the dataset representing the “overall market” is split into 11 product-specific subsets: the number of available observations per product drops to only 120 (1,320 / 11). Given that most of our variables have 5 states, learning a structure with a dataset that small would be challenging.
This also explains why we used the entire dataset to learn the PSEM structure, which will be shared by all the products. However, using Parameter Estimation will ensure that the parameters, i.e., the probability tables of each network, will be estimated based on the subsets of records associated with each state of Product.
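The idea of keeping one shared structure while re-estimating the probability tables per product can be sketched as follows. This is a simplified illustration using plain relative frequencies for a single parent-child pair. The function name, the state labels, and the records are hypothetical, and BayesiaLab's Parameter Estimation may differ in detail, e.g., with regard to smoothing.

```python
from collections import Counter, defaultdict


def estimate_cpt_by_group(records, group_key, parent, child):
    """Return {group: {parent_state: {child_state: probability}}}.

    The arc parent -> child is fixed; only the probabilities are
    re-estimated from the records belonging to each group.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for record in records:
        counts[record[group_key]][record[parent]][record[child]] += 1
    tables = {}
    for group, by_parent in counts.items():
        tables[group] = {
            parent_state: {
                state: n / sum(child_counts.values())
                for state, n in child_counts.items()
            }
            for parent_state, child_counts in by_parent.items()
        }
    return tables


# Hypothetical survey records for two products.
records = [
    {"Product": 1, "Factor_2": "high", "Purchase Intent": 5},
    {"Product": 1, "Factor_2": "high", "Purchase Intent": 5},
    {"Product": 1, "Factor_2": "high", "Purchase Intent": 3},
    {"Product": 1, "Factor_2": "low", "Purchase Intent": 2},
    {"Product": 2, "Factor_2": "high", "Purchase Intent": 3},
    {"Product": 2, "Factor_2": "low", "Purchase Intent": 2},
    {"Product": 2, "Factor_2": "low", "Purchase Intent": 2},
]
tables = estimate_cpt_by_group(records, "Product", "Factor_2", "Purchase Intent")
```

Each product ends up with its own conditional probability table for the same arc, which mirrors the shared structure with product-specific parameters described above.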
Among the Options, we check Regenerate Values. This recomputes, for each new network, the values associated with each state of the discretized nodes based on the respective subset of data.
There is no need to check Rediscretize Continuous Nodes because all discretized nodes share the same variation domain, and we required equal distance binning during the data import. However, we do recommend using this option if the variation domains are different between subsets in a study, e.g., sales volume in California versus Vermont. Without using the Rediscretize Continuous Nodes option, it could happen that all data points for sales in Vermont end up in the first bin, effectively transforming the variable into a constant.
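The California versus Vermont situation can be reproduced with a few lines of code. The sales figures below are hypothetical, and the binning function is a generic equal-distance discretization rather than BayesiaLab's implementation.

```python
def equal_distance_bin(value, lower, upper, bins):
    """Return the index of the equal-width bin that contains value."""
    width = (upper - lower) / bins
    index = int((value - lower) / width)
    return min(max(index, 0), bins - 1)  # the upper bound falls into the last bin


# Hypothetical sales volumes.
california_sales = [120.0, 480.0, 910.0, 1500.0]
vermont_sales = [4.0, 9.0, 15.0, 22.0]

# Bins defined on the pooled data: Vermont collapses into a single bin.
pooled = california_sales + vermont_sales
shared_bins = [equal_distance_bin(v, min(pooled), max(pooled), 5) for v in vermont_sales]

# Bins redefined on the Vermont subset alone: the variation is preserved.
own_bins = [
    equal_distance_bin(v, min(vermont_sales), max(vermont_sales), 5)
    for v in vermont_sales
]
```

With the pooled bins, every Vermont observation falls into the first bin, so the variable carries no information within that subset. Rediscretizing on the subset spreads the same observations across several bins.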
Furthermore, we do not check the option for Linearize Nodes’ Values either. This function reorders a node’s states so that its states’ values have a monotonically positive relationship with the values of the Target Node. Applying this transformation to the node Intensity would artificially increase its impact. It would incorrectly imply that it is possible to change a perfume in a way that simultaneously satisfies those consumers who rated it as too subtle and also those who rated it as too strong. Needless to say, this is impossible.
Finally, computing all Contributions will be helpful for interpreting each product-specific network.
Upon clicking OK, 11 networks are created and saved to the Output Directory defined in the dialog box. Each network is then analyzed with the specified Analysis method to produce the Multi-Quadrant Plot.
The x-value of each point indicates the mean value of the corresponding Manifest Node, as rated by those respondents who have evaluated Product 1; the position on the y-axis reflects the computed Total Effect.
From the contextual menu, we can choose Display Horizontal Scales and Display Vertical Scales, which provides the range of positions of the other products.
Using Horizontal Scales provides a quick overview of how the product under study is rated vis-à-vis other products. The Vertical Scales compare the importance of each dimension with respect to Purchase Intent. Finally, we can select the individual product to be displayed in the Multi-Quadrant Analysis window via the Contextual Menu.
Drawing a rectangle with the cursor zooms in on the specified area of the plot.
The meaning of the Horizontal Scales and Vertical Scales becomes apparent when hovering over any dot as this brings up the position of the other (competitive) products with regard to the currently highlighted attribute.
This means, for instance, that Product 2 and Product 7 are rated lowest and highest respectively on the x-scale with regard to the variable Fresh. In terms of Total Effect on Purchase Intent, Product 12 and Product 2 mark the bottom and top end respectively (y-scale).
From a product management perspective, this suggests that for Product 1, with regard to the attribute Fresh, there is a large gap to the level of the best product, i.e., Product 7. So, one could interpret the variation from the status quo to the best level as “room for improvement” for Product 1.
On the other hand, as we can see below, the variables Personality, Original, and Feminine have a greater Total Effect on Purchase Intent. These relative positions will soon become relevant as we will need to simultaneously consider improvement potential and importance for optimizing Purchase Intent.
BayesiaLab’s Export Variations function allows us to save the variation domain for each driver, i.e., the minimum and maximum mean values observed across all products in the study.
Knowing these variations will be useful for generating realistic scenarios for the subsequent optimization. However, what do we mean by “realistic”? Ultimately, only a subject matter expert can judge how realistic a scenario is. However, a good heuristic is whether or not a certain level is achieved by any product in the market. One could argue that the existence of a certain satisfaction level for some product means that such a level is not impossible to achieve and is, therefore, “realistic.”
Clicking the Export Variations button saves the Absolute Variations to a text file for subsequent use in optimization.
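Conceptually, the variation domain of a driver is simply the range of its mean rating across products. The sketch below computes it from a table of per-product means. The function name and the ratings are hypothetical, and the format of the file written by BayesiaLab is not reproduced here.

```python
def variation_domains(product_means):
    """Map {product: {driver: mean}} to {driver: (minimum mean, maximum mean)}."""
    domains = {}
    for means in product_means.values():
        for driver, mean in means.items():
            low, high = domains.get(driver, (mean, mean))
            domains[driver] = (min(low, mean), max(high, mean))
    return domains


# Hypothetical mean ratings for three of the products.
product_means = {
    "Product 1": {"Fresh": 5.1, "Spiced": 4.0},
    "Product 2": {"Fresh": 4.2, "Spiced": 4.8},
    "Product 7": {"Fresh": 6.3, "Spiced": 3.5},
}
domains = variation_domains(product_means)
```

For each driver, the upper end of the range is the level reached by the best-rated product, which serves as the plausible ceiling in the optimization that follows.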

Product Optimization

In order to perform optimization for a particular product, we need to open the network for that specific product. Networks for all products were automatically generated and saved during the Multi-Quadrant Analysis, so we simply need to open the network for the product of interest. The suffix in the file name reflects the Product.
To demonstrate the optimization process, we open the file that corresponds to Product 1. Structurally, this network is identical to the network learned from the entire dataset. However, the parameters of this network were estimated only on the basis of the observations associated with Product 1.
Now we have all the elements that are necessary for optimizing the Purchase Intent of Product 1:
  • A network that is specific to Product 1;
  • A set of driver variables, selected by excluding the non-driver variables via Costs;
  • Realistic scenarios, as determined by the Variation Domains of each driver variable.
With the above, we are now in a position to search for node values that optimize Purchase Intent.
Target Dynamic Profile
Before we proceed, we need to explain what we mean by optimization. As all observations in this study are consumer perceptions, it is clear that we cannot manipulate them directly. Rather, the purpose of this optimization is to identify in which order these perceptions should be addressed by the perfume maker. Some consumer perceptions may relate to specific odoriferous compounds that a perfumer can modify; other perceptions can perhaps be influenced by marketing and branding initiatives. However, the precise mechanism of influencing consumer perceptions is not the subject of our discussion. From our perspective, the causes that could influence the perception are hidden. Thus, we have here a prototypical application of Soft Evidence, i.e., we assume that the simulated changes in the distribution of consumer perceptions originate in hidden causes (see Numerical Evidence in Chapter 7).
While BayesiaLab offers a number of optimization methods, Target Dynamic Profile is appropriate here. We start it from within Validation Mode (F5) by selecting Analysis > Report > Target Analysis > Target Dynamic Profile.
We need to explain the large number of options that must be set for Target Dynamic Profile. These options will reflect our objective of pursuing realistic sets of evidence:
In Profile Search Criterion we specify that we want to optimize the mean value of the Target Node, as opposed to any particular state or the difference between states.
Joint Probability
Next, we specify under Criterion Optimization that the mean value of the Target Node is to be maximized. Furthermore, we check Take Into Account the Joint Probability. This weights any potential improvement in the mean value of Target Node by the joint probability that corresponds to the set of simulated evidence that generated this improvement. The joint probability of a simulated evidence scenario will be high if its probability distribution is close to the original probability distribution observed in the consumer population: the higher the joint probability, the closer is the simulated scenario to the status quo of customer perception.
In practice, checking this option means that we prefer smaller improvements with a high joint probability over larger ones with a low joint probability: 0.146 × 26.9% = 0.0393 > 0.174 × 21.33% = 0.0371.
If all simulated values were within the constraints set in the Variation Editor, it would be better to increase the driver variable Spiced to a simulated value of 7 rather than 7.5, even though Purchase Intent would be higher for the latter value of Spiced. In other words, the “support” for E(Spiced)=7 is greater than for E(Spiced)=7.5, as more respondents are already in agreement with such a scenario. Once again, this is about pursuing improvements that are achievable rather than proposing pie-in-the-sky scenarios.
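The comparison above can be written out as a short calculation. The gains and joint probabilities are the ones quoted in the text; the dictionary keys and labels are hypothetical names introduced for this sketch.

```python
# The two candidate scenarios quoted in the text.
candidates = [
    {"label": "smaller gain, higher joint probability", "gain": 0.146, "joint_probability": 0.269},
    {"label": "larger gain, lower joint probability", "gain": 0.174, "joint_probability": 0.2133},
]


def weighted_gain(candidate):
    """Improvement in the target mean, weighted by the scenario's joint probability."""
    return candidate["gain"] * candidate["joint_probability"]


best = max(candidates, key=weighted_gain)
```

The scenario with the smaller raw gain wins once each gain is weighted by its joint probability, which is the preference that the Take Into Account the Joint Probability option expresses.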
In this example, so far, we have only used Costs for selecting the subset of driver variables. Additionally, we can utilize Costs in the original sense of the word in the optimization process. For instance, if we had information on the typical cost of improving a specific rating by one unit, we could enter such a value as cost. This could be a dollar value, or we could set the costs in such a way that they reflect the relative effort required for the same amount of change, e.g., one unit, in each of the driver variables. For example, a marketing manager may know that it requires twice as much effort to change the perception of Feminine compared to changing the perception of Sweet. If we want to quantify such efforts by using Costs, we will need to ensure that the costs of all variables share the same scale. For instance, if some drivers are measured in dollars, and others are measured in terms of time spent in hours, we will need to convert hours to dollars.
In our study, we leave all the included driver variables at a Cost of 1, i.e., we assume that it requires the same effort for the same amount of change in any driver variable. Hence, we can leave the Utilize Evidence Cost unchecked (Not Observable nodes still remain excluded as driver variables).
Compute Only Prior Variations needs to remain unchecked as well. This option would be useful if we were interested in only computing the marginal effect of drivers. For that purpose, we would not want any cumulative effects or conditional variations given other drivers.
Associate Evidence Scenario will save the identified sets of evidence for subsequent evaluation.
The setting of Search Methods is critically important for the optimization task. We need to define how to search for sets of evidence. Using Hard Evidence means that we would exclusively try out sets of evidence consisting of nodes with one state set to 100%. This would imply that we simulate a condition in which all consumers perfectly agree with regard to some ratings. Needless to say, this would be utterly unrealistic. Instead, we will explore sets of evidence, consisting of distributions for each node, by modifying their mean values as Soft Evidence. More precisely, we use the MinXEnt method to generate such evidence (see Minimum Cross-Entropy in Chapter 7).
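To show what distinguishes such Soft Evidence from Hard Evidence, the following sketch finds the distribution that is closest to the current one, in the cross-entropy sense, while having a requested mean. For a mean constraint, that solution is an exponential tilting of the original distribution, so it suffices to search for a single tilt parameter. This is the textbook construction and not necessarily BayesiaLab's implementation. The function names and the example distribution are hypothetical.

```python
from math import exp


def tilt(prior, values, lam):
    """Exponentially tilt prior by lam; the pivot keeps exp() from overflowing."""
    pivot = max(values) if lam > 0 else min(values)
    weights = [p * exp(lam * (v - pivot)) for p, v in zip(prior, values)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(distribution, values):
    return sum(p * v for p, v in zip(distribution, values))


def min_cross_entropy_for_mean(prior, values, target_mean, tol=1e-10):
    """Return the tilted distribution whose mean equals target_mean."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly inside the value range")
    low, high = -50.0, 50.0  # search range for the tilt parameter
    while high - low > tol:
        mid = (low + high) / 2
        # The mean increases with the tilt parameter, so bisection applies.
        if expected_value(tilt(prior, values, mid), values) < target_mean:
            low = mid
        else:
            high = mid
    return tilt(prior, values, (low + high) / 2)


# Hypothetical rating distribution on a 1-5 scale with a mean of 3.
scale = [1, 2, 3, 4, 5]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]
soft = min_cross_entropy_for_mean(prior, scale, target_mean=3.5)
```

The resulting distribution shifts probability toward the higher ratings but keeps every state possible, in contrast to Hard Evidence, which would put 100% on a single state.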
In this context, we reintroduce the Variations we saved earlier. We reason that the best-rated product with regard to a particular attribute represents a plausible upper limit for what any product could strive for in terms of improvement. This also means that a driver variable that has already achieved the best level will not be optimized any further in this framework.
Variation Editor
Clicking on Variations brings up the Variation Editor. By default, it shows variations in the amount of ±100% of the current mean.
To load the Variations that we generated earlier through Multi-Quadrant Analysis, we click Import and select Absolute Variations from the pop-up window.
Now we can open the previously saved file.
The Variation Editor now reflects the constraints. Any available expert knowledge can be applied here, either by entering new values for the Minimum Mean or Maximum Mean or by entering percent values for Positive Variations and Negative Variations.
Depending on the setting, the percentages are relative to (a) the Mean, (b) the Domain, or (c) the Progression Margin.
Selecting the Progression Margin is particularly useful as it automatically constrains the positive and negative variations in proportion to the gap from the Current Mean to the Maximum Mean and Minimum Mean values respectively. In other words, it limits the improvement potential of a driver variable as its value approaches the maximum. It is a practical—albeit arbitrary—approach to prevent overly optimistic optimizations.
Next, we select MinXEnt in the Search Method panel as the method for generating Soft Evidence. In terms of Intermediate Points, we set a value of 20. This means that BayesiaLab will simulate 22 values for each node, i.e., the minimum and maximum plus 20 intermediate values, all within the constraints set by the variations. This is particularly useful in the presence of non-linear effects.
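The count of 22 simulated values can be illustrated as follows. This sketch assumes that the candidate means are evenly spaced between the lower and upper limits, which the text does not state; the function name and the limits are hypothetical.

```python
def candidate_means(min_mean, max_mean, intermediate_points):
    """Minimum, maximum, and the intermediate target means between them.

    Assumption: the points are evenly spaced.
    """
    n = intermediate_points + 2  # the two end points plus the intermediate ones
    step = (max_mean - min_mean) / (n - 1)
    return [min_mean + i * step for i in range(n)]

# Hypothetical limits for one driver, with 20 intermediate points.
grid = candidate_means(2.75, 3.75, intermediate_points=20)
```

Each value in such a grid would serve as a target mean for one piece of Soft Evidence, so a finer grid gives the search a better chance of locating the peak of a non-linear response.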
Within the Search Stop Criteria panel, Maximum Size of Evidence specifies the maximum number of driver variables to be recommended as part of the optimization policy. This setting is once again driven by real-world considerations. Although one could wish to bring all variables to their ideal level, a decision-maker may recognize that it is not plausible to pursue anything beyond the top-4 variables.
Alternatively, we can choose to terminate the optimization process once the joint probability of the simulated evidence drops below the specified Minimum Joint Probability.
The final option, Use the Automatic Stop Criterion, leaves it up to BayesiaLab to determine whether adding further evidence provides a worthwhile improvement for the Target Node.
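How the three stop criteria interact can be sketched with a simple greedy search. This is an assumption-laden illustration rather than a description of BayesiaLab's algorithm: the greedy strategy, the `evaluate` callback, the `min_gain` threshold standing in for the automatic criterion, and the toy numbers for the generic drivers A through E are all hypothetical.

```python
def greedy_policy(evaluate, drivers, max_size=4, min_joint_prob=0.0, min_gain=0.0):
    """Add one driver at a time until a stop criterion applies. Illustrative sketch.

    `evaluate(evidence)` must return (target_mean, joint_probability) for a
    tuple of drivers that have been set to their optimized soft evidence.
    """
    chosen = []
    best_mean, _ = evaluate(tuple(chosen))
    while len(chosen) < max_size:  # Maximum Size of Evidence
        options = []
        for d in drivers:
            if d in chosen:
                continue
            mean, joint_prob = evaluate(tuple(chosen + [d]))
            if joint_prob >= min_joint_prob:  # Minimum Joint Probability
                options.append((mean, d))
        if not options:
            break
        mean, d = max(options)
        if mean - best_mean <= min_gain:  # stand-in for an automatic criterion
            break
        chosen.append(d)
        best_mean = mean
    return chosen, best_mean

# Toy model with made-up numbers: additive gains, multiplicative joint probability.
GAINS = {"A": 0.12, "B": 0.08, "C": 0.05, "D": 0.03, "E": 0.01}

def toy_evaluate(evidence):
    mean = 3.0 + sum(GAINS[d] for d in evidence)
    joint_prob = 0.5 ** len(evidence)
    return mean, joint_prob

by_size, _ = greedy_policy(toy_evaluate, list(GAINS), max_size=4)
by_prob, _ = greedy_policy(toy_evaluate, list(GAINS), max_size=5, min_joint_prob=0.2)
by_gain, _ = greedy_policy(toy_evaluate, list(GAINS), max_size=5, min_gain=0.04)
```

In this toy setup, the size limit yields four drivers, the joint probability floor stops after two, and the gain threshold stops after three, which shows that each criterion can be the binding one depending on the settings.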
Optimization Results
Once the optimization process concludes, we obtain a report window that contains a list of priorities: Personality, Fruity, Flowery, and Tenacious.
To explain the items in the report, we present a simplified and annotated version of the report below. Note that this report can be saved in HTML format, for subsequent editing as a spreadsheet.
Most importantly, the Value/Mean column shows the successive improvement upon implementation of each policy. From initially 3.58, the Purchase Intent improves to 3.86, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on Utopian thinking, but rather on modest changes in consumer perception, well within the range of competitive performance.
Evidence Scenarios
As an alternative to interpreting the static report, we can examine each element in the list of priorities. To do so, we bring up all the Monitors of the nodes identified for optimization.
Then, we retrieve the individual steps by right-clicking on the Evidence Scenario icon in the lower right-hand corner of the main window.
Selecting the first row in the table (Index=0) sets the evidence that corresponds to the first priority, i.e., Personality. We can now see that the evidence we have set is a distribution, rather than a single value. The small gray arrows indicate how the distribution of the evidence and the distributions of Purchase Intent, Fruity, Flowery, and Tenacious are all different from their prior, marginal distributions. The changes to Fruity, Flowery, and Tenacious correspond to what is shown in the report in the column Value/Mean at T.
By selecting Index=1, we introduce a second set of evidence, i.e., the optimized distribution for Fruity.
Continuing with Index 2 and 3, we see that the improvements to Purchase Intent become smaller.
Bringing up all the remaining nodes would reveal any “collateral” changes that result from setting multiple pieces of evidence.
The results tell us that for Product 1, a higher consumer rating of Personality would be associated with a higher Purchase Intent. Improving the perception of Personality might be a task for the marketing and advertising team. Similarly, a better consumer rating of Fruity would also be associated with greater Purchase Intent. A product manager could then interpret this and request a change to some ingredients. Our model tells us that, if such changes in consumer ratings were to be brought about in the proposed order, a higher Purchase Intent would potentially be observed.
While we have only presented the results for Product 1, we want to highlight that the priorities are indeed different for each product, even though they all share the same underlying PSEM structure. The recommendations from the Target Dynamic Profile of Product 11 are shown below.
This is an interesting example as it identifies that the JAR-type variable Intensity needs to be lowered to optimize Purchase Intent for Product 11.
It is important to reiterate that the sets of evidence we apply are not direct interventions in this domain. Hence, we are not performing causal inference. Rather, the sets of evidence we found help us prioritize our course of action for product and marketing initiatives.


We presented a complete workflow that generates a Probabilistic Structural Equation Model for key drivers analysis and product optimization. The Bayesian networks paradigm turned out to be a practical platform for the development of the model and its subsequent analysis, all the way through optimization. With all steps contained in BayesiaLab, we have a single, continuous line of reasoning from raw survey data to a final order of priorities for action.