1 of 8

Chapter 8: Probabilistic Structural Equation Models for Key Driver Analysis

Structural Equation Modeling is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of a Structural Equation Model (SEM) was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943), and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000). Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.

What we call Probabilistic Structural Equation Models (PSEMs) in BayesiaLab are conceptually similar to traditional SEMs. However, PSEMs are based on a Bayesian network structure as opposed to a series of equations. More specifically, PSEMs can be distinguished from SEMs in terms of key characteristics:

In general, specifying and estimating a traditional SEM requires a high degree of statistical expertise. Additionally, the multitude of manual steps involved can make the entire SEM workflow extremely time-consuming. The PSEM workflow in BayesiaLab, on the other hand, is accessible to non-statistician subject matter experts. Perhaps more importantly, it can be faster by several orders of magnitude. Finally, once a PSEM is validated, it can be utilized like any other Bayesian network. This means that the full array of analysis, simulation, and optimization tools is available to leverage the knowledge represented in the PSEM.

Example: Consumer Survey

In this chapter, we present a prototypical PSEM application: key drivers analysis and product optimization based on consumer survey data. We examine how consumers perceive product attributes and how these perceptions relate to the consumers’ purchase intent for specific products.

Given the inherent uncertainty of survey data, we also wish to identify higher-level variables, i.e., “latent” variables that represent concepts that are not directly measured in the survey. We do so by analyzing the relationships between the so-called “manifest” variables, i.e., variables that are directly measured in the survey. Including such concepts helps in building more stable and reliable models than what would be possible using manifest variables only.

Our overall objective is to make surveys clearer to interpret by researchers and make them “actionable” for decision-makers. The ultimate goal is to use the generated PSEM for prioritizing marketing and product initiatives to maximize purchase intent.

Dataset

This study is based on a monadic consumer survey about perfumes, which was conducted by a market research agency in France. In this study, each respondent evaluated only one perfume.

In this example, we use survey responses from 1,320 women who have evaluated a total of 11 fragrances (representative of the French market) on a wide range of attributes:

Workflow Overview

A PSEM is a hierarchical Bayesian network that can be generated through a series of machine-learning and analysis tasks:

All relationships in a PSEM are probabilistic—hence the name, as opposed to having deterministic relationships plus error terms in traditional SEMs.
PSEMs are nonparametric, which facilitates the representation of nonlinear relationships plus relationships between categorical variables.
The structure of PSEMs is partially or fully machine-learned from data.
- 27 ratings on fragrance-related attributes, such as Sweet, Flowery, Feminine, etc., measured on a 1–10 scale.
- 12 ratings with regard to imagery about someone who wears the respective fragrance, e.g. Sexy, Modern, measured on a 1–10 scale.
- 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1–5 scale. The variable Intensity is listed separately due to the a priori knowledge of its non-linearity and the existence of a “just-about-right” level.
- 1 variable for Purchase Intent, measured on a 1–6 scale.
- 1 nominal variable, Product, for product identification.
Unsupervised Learning to discover the strongest relationships between the manifest variables.
Variable Clustering, based on the learned Bayesian network, to identify groups of variables that are strongly connected.
Multiple Clustering: we consider the strong intra-cluster connections identified in the Variable Clustering step to be due to a “hidden common cause.” For each cluster of variables, we use Data Clustering—on the variables within the cluster only—to induce a latent variable representing the hidden cause.
Unsupervised Learning to find the interrelations between the newly-created latent variables and their relationships with the Target Node.

Workflow Details

Data Import

For this example, we need to override the default data type for the variable named Product as it is a nominal product identifier rather than a numerical value. We can change this variable’s data type by highlighting the Product column and clicking the Discrete radio button. This changes the color of the Product column to red. We also define Purchase Intent and Intensity as Discrete variables. Their number of states is suitable for our purposes.

The next step is the Discretization and Aggregation screen. Given the number of observations, it is appropriate to reduce the number of states of the ratings from the original 10 states (1–10) to a smaller number. All these variables measure satisfaction on the same scale, i.e., from 1 to 10. Following our earlier recommendations (see Discretization Intervals in Chapter 6), the best choice in this context is the Equal Distance discretization.

By clicking Select All Continuous, we highlight all to-be-discretized variables. Then, we choose the type of discretization to be applied, which is Equal Distance. Furthermore, given the number of observations, we choose 5 bins for the discretization.

Clicking Finish finalizes the import process. Upon completion, we are prompted whether we want to view the Import Report.

As there is no uncertainty with regard to the outcome of the discretization, we decline and automatically obtain a fully unconnected network with 49 nodes.

Step 1: Unsupervised Learning

As a first step, we need to exclude the node Purchase Intent, which will later serve as our Target Node. We do not want this node to become part of the structure that we will subsequently use for discovering hidden concepts. Likewise, we need to exclude the node Product, as it does not contain consumer feedback to be evaluated.

The upcoming series of steps is crucial. We now need to prepare a robust network on which we can later perform the clustering process. Given the importance, we recommend going through the full range of Unsupervised Learning algorithms and comparing the performance of each resulting network structure to select the best structure.

The objective is to increase our chances of finding the optimal network for our purposes. Given that the number of possible networks grows super-exponentially with the number of nodes, this is a significant challenge.

It may not be immediately apparent how such an astronomical number of networks could be possible. The illustration below shows how 3 nodes can be combined in 25 ways to form a network.

Needless to say, generating all 9×10^376 networks—based on 47 nodes—and then selecting the best one is completely intractable (for reference, it is estimated that there are between 1078 to 1082 atoms in the known, observable universe). So, an exhaustive search would only be feasible for a small subset of nodes.

As a result, we have to use heuristic search algorithms to explore a small part of this huge space in order to find a local optimum. However, a heuristic search algorithm does not guarantee to find the global optimum. This is why BayesiaLab offers a range of distinct learning algorithms, which all use different search spaces and search strategies:

Bayesian networks for Maximum Weight Spanning Tree and Taboo.
Essential Graphs for EQ and SopLEQ, i.e., graphs with edges and arcs representing classes of equivalent networks.
Order of nodes for Taboo Order.

This diversity increases the probability of finding a solution close to the global optimum. Given adequate time and resources for learning, we recommend employing the algorithms in the following sequence to find the best solution:

Maximum Weight Spanning Tree + Taboo
Taboo (“from scratch”)
EQ (“from scratch”) + Taboo
SopLEQ + Taboo
Taboo Order + Taboo

However, to keep the presentation compact, we only illustrate the learning steps for the EQ algorithm: Main Menu > Learning > Unsupervised Structural Learning > EQ.

The network generated by the EQ algorithm is shown below.

We now need to assess the quality of this network. Each of BayesiaLab’s Unsupervised Learning algorithms uses the Minimum Description Length (MDL) Score —internally—as the measure to optimize while searching for the best possible network. However, we can also employ the MDL Score for explicitly rating the quality of a network.

Minimum Description Length

The Minimum Description Length (MDL) Score is a two-component score, which has to be minimized to obtain the best solution. It has been used traditionally in the artificial intelligence community for estimating the number of bits required for representing (1) a “model” and (2) “data given this model.”

In our machine-learning application, the “model” is a Bayesian network consisting of a graph and probability tables. The second term is the log-likelihood of the data given the model, which is inversely proportional to the probability of the observations (data) given the Bayesian network (model). More formally, we write this as:

α represents BayesiaLab’s Structural Coefficient (the default value is 1), a parameter that permits changing the weight of the structural part of the MDL Score (the lower the value of α, the greater the complexity of the resulting networks),

MDL Score Comparison

We now use the MDL Score to compare the results of all learning algorithms. We can look up the MDL Score of the current network by pressing W while hovering the cursor over the Graph Panel. This brings up an info box that reports a number of measures, including the MDL Score, which is displayed here as the “Final Score.”

The MDL score can only be compared for networks with precisely the same representation of all variables, i.e., with the same discretization thresholds and the same data.

Alternatively, we can open up the Console via Main Menu > Options > Console > Open Console:

The Console maintains a kind of “log” that keeps track of the learning progress by recording the MDL Score (or “Network Score”) at each step of the learning process. Here, the “Final Score” marks the MDL Score of the current network, which is what we need to select the network with the lowest value.

The EQ algorithm produces a network with an MDL Score of 98,606.572. As it turns out, this performance is on par with all the other algorithms we considered, although we skip presenting the details in this chapter. Given this result, we can proceed with the EQ-learned network to the next step.

Data Perturbation

As a further safeguard against utilizing a sub-optimal network, BayesiaLab offers Data Perturbation, which is an algorithm that adds random noise (from within the interval [-1,1]) to the weight of each observation in the dataset.

In the context of our learning task, Data Perturbation can help us escape from local minima, which we could have encountered during learning. We start this algorithm by selecting Learning > Data Perturbation.

For Data Perturbation, we need to set a number of parameters. The additive noise is always generated from a Normal distribution with a mean of 0, but we must set the Initial Standard Deviation. A Decay Factor defines the exponential attenuation of the standard deviation with each iteration.

Upon completing Data Perturbation, we see the newly learned network in the Graph Panel. Once again, we can retrieve the score by pressing W while hovering with the cursor over the Graph Panel or by looking it up in the Console. The score remains unchanged at 98,606.572. We can now be reasonably confident that we have found the optimal network given the original choice of discretization, i.e., the most compact representation of the joint probability distribution defined by the 47 manifest variables.

Step 2: Variable Clustering

BayesiaLab’s Variable Clustering is a hierarchical agglomerative clustering algorithm that uses Arc Force (i.e., the Kullback-Leibler Divergence) for computing the distance between nodes.

At the start of Variable Clustering, each manifest variable is treated as a distinct cluster. The clustering algorithm proceeds iteratively by merging the “closest” clusters into a new cluster. Two criteria are used for determining the number of clusters:

Stop Threshold: a minimum Arc Force value below which clusters are not merged (a kind of significance threshold).
Maximum Cluster Size: the maximum number of variables per cluster.

These criteria can be set via Main Menu > Options > Settings > Learning > Variable Clustering:

We do not advise changing the Stop Threshold, but Maximum Cluster Size is more subjective. For building PSEMs, we recommend a value between 5 and 7, for reasons that will become clear when we show how latent variables are generated. If, however, the goal of Variable Clustering is dimensionality reduction, we suggest increasing the Maximum Cluster Size to a much higher value, thus effectively eliminating it as a constraint.

The Variable Clustering algorithm can be started via Main Menu > Learning > Clustering > Variable Clustering or by using the shortcut S.

In this example, BayesiaLab identified 15 clusters, and each node is now color-coded according to its cluster membership. The following image shows the standalone graph—outside the BayesiaLab window for better legibility.

BayesiaLab offers several tools for examining and editing the proposed cluster structure. They are accessible from an extended menu bar (highlighted in the screenshot below).

Dendrogram

Also, the Dendrogram can be copied directly as a vector or bitmap graphic by right-clicking on it. Alternatively, it can be exported in various formats via the Save As... button. As such, it can be imported into documents and presentations. This ability to copy and paste graphics applies to most graphs, plots, and charts in BayesiaLab.

Cluster Mapping

By hovering over any cluster “bubbles” with the cursor, BayesiaLab displays a list of all manifest nodes connected to that particular cluster. Each list of manifest variables is sorted according to the intra-cluster Node Force. This also explains the names displayed on the clusters. By default, each cluster takes on the name of the strongest manifest variable.

Number of Clusters

As explained earlier, BayesiaLab uses two criteria to determine the default number of clusters. We can change this number via the selector in the menu bar.

The Dendrogram and the Mapping view respond dynamically to any changes to the number of clusters.

Cluster Validation

The result of the Variable Clustering algorithm is purely descriptive. Once the question regarding the number of clusters is settled, we need to formally confirm our choice by clicking the Validate Clustering button in the toolbar. Only then can we trigger the creation of one Class per Cluster. At that time, all nodes become associated with unique Classes named “[Factor_i]”, with i representing the identifier of the factor. Additionally, we are prompted to confirm that we wish to keep the node colors generated during clustering.

The Clusters are now saved, and the color-coding is formally associated with the nodes. A Clustering Report provides a formal summary of the new Factors and their associated Manifest variables.

Note that we use the following terms interchangeably: “derived concept,” “unobserved latent variable,” “hidden cause,” and “extracted factor.”

Editing Factors

Cluster Cross-Validation

We now examine the robustness of the identified factors, i.e., how these factors respond to changes in sampling. This is particularly important for studies that are regularly repeated with new data, e.g., annual customer satisfaction surveys. Inevitably, survey samples are going to be different between the years. As a result, machine learning will probably discover somewhat different structures each time and, therefore, identify different clusters of nodes. Therefore, it is important to distinguish between a sampling artifact and a substantive change in the joint probability distribution. In the context of our example, the latter would reveal a structural change in consumer behavior.

We start the validation process via Main Menu > Tools > Cross-Validation > Variable Clustering > Data Perturbation.

This brings up the dialogue box shown below.

These settings specify that BayesiaLab will learn 100 networks with EQ and perform Variable Clustering on each of them, all while maintaining the constraint of a maximum of 5 nodes per cluster without any attenuation of the perturbation. Upon completion, we obtain a report panel, from which we initially select Variable Clustering Report.

The Variable Clustering Report consists primarily of two large tables. The first table in the report shows the cluster membership of each node in each network (only the first 12 columns are shown). Here, thanks to the colors, we can easily detect whether nodes remain clustered together between iterations or whether they “break up.”

The second table shows how frequently individual nodes are clustered together.

Clicking the Clustering Frequency Graph provides yet another visualization of the clustering patterns. The thickness of the lines is proportional to the frequency of nodes being in the same Cluster. Equally important for interpretation is the absence of lines between nodes. For instance, the absence of a line between Flowery and Modern says that they have never been clustered together in any of the 100 samples. If they were to cluster together in future iteration with new survey data, it would probably reflect a structural change in the market rather than a data sampling artifact.

Step 3: Multiple Clustering

The Cluster Cross-Validation was merely a review, and it did not change the factors we confirmed when we clicked the Validate Clustering button. Although we have defined these Factors in terms of Classes of Manifest variables, we still need to create the corresponding latent variables via Multiple Clustering. This process creates one discrete Factor for each Cluster of variables by performing Data Clustering on each subset of clustered manifest variables.

In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g., Principal Components Analysis (PCA).

Data Clustering

Before we run this automatically across all factors with the Multiple Clustering algorithm, we will demonstrate the process on a single cluster of nodes, namely the nodes associated with Factor_0: Active, Bold, Character, Fulfilled, and Trust. We simply delete all other nodes and arcs and save this subset of nodes as a new, separate XBL file.

The objective of BayesiaLab’s Data Clustering algorithm is to create a node that compactly represents the joint probability distribution defined by the variables of interest. We start Data Clustering via Main Menu > Learning > Clustering > Data Clustering. Unless we select a subset, Data Clustering will be applied to all nodes.

In the Data Clustering dialog box, we set the options as shown below. Among the settings, we need to point out that we leave the number of states of the to-be-induced factor open; we only set a range, e.g., 2–5. This means we let BayesiaLab determine the optimal number of states for representing the joint probability distribution.

Upon completing the clustering process, we obtain a graph with newly induced [Factor_0] being connected to all its associated manifest variables.

Furthermore, BayesiaLab produces a window that contains a comprehensive Clustering Report.

Given its size, we show it as two separate tables below.

Relative Significance

In the second part of the report, the variables are sorted by Relative Significance with respect to the Target Node, which is [Factor_0].

Mapping

From the window that contains the report, we can also produce a Mapping of the Clusters.

This graph displays three properties of the identified Cluster States (Cluster 1–Cluster 5) within the new Factor node, [Factor_0]:

The saturation of the blue represents the purity of the Cluster States: the higher the purity, the higher the saturation of the color. Here, all purities are in the 90%+ range, which is why they are all deep blue.
The sizes represent the respective marginal probabilities of the Cluster States. We will see this distribution again once we open the Monitor of the new factor node.
The distance between any two clusters is proportional to the neighborhood of the clusters.

Quadrants

Clicking the Quadrants button in the report window brings up the options for graphically displaying the node's relative importance with regard to the induced [Factor_0].

This Quadrant Plot highlights two measures that are relevant for interpretation:

Mutual Information on the y-axis, i.e., the importance of each Manifest variable with regard to [Factor_0].
The mean value of each Manifest variable on the x-axis.

This plot shows that the most important variable is Trust with I(Trust,[Factor_0])=1.26. It is also the variable with the highest expected satisfaction level, i.e., E(Trust)=6.79.

When hovering with the cursor over the plot, the upper panel of the Quadrant Plot window returns the exact coordinates of the respective point, i.e., Mutual Information and Mean Value in this example.

We return to the Graph Panel after closing the Quadrant Plot and the report window. It shows the newly induced [Factor_0] directly connected to all its associated Manifest variables. Applying the Automatic Layout P produces a suitable view of the network produced by the Data Clustering process.

In the Monitor of [Factor_0], we see that the name of each Cluster State carries a value shown in parentheses, e.g., C1 (2.022). This value is the weighted average of the associated Manifest variables given the state C1, where the weight of each variable is its Relative Significance with respect to [Factor_0]. That means that given the state C1 of [Factor_0], the weighted mean value of Trust, Bold, Fulfilled, Active, and Character is 2.022. This becomes more apparent when we actually set [Factor_0] to state C1.

Given that all the associated Manifest variables share the same satisfaction level scale, the values of the created states can also be interpreted as satisfaction levels. Cluster State C1 summarizes the “low” ratings across the Manifest nodes. Conversely, C5 represents mostly the “high” ratings; the other states cover everything else in between.

It is important to understand that each underlying record was assigned a specific state of [Factor_0]. In other words, the hidden variable is no longer hidden. It has been added to the dataset and imputed for all respondents. The imputation is done via Maximum Likelihood: given the satisfaction levels observed for each of the 5 Manifest variables, the state with the highest posterior probability is assigned to the respondent.

We can easily verify this by scrolling through each record in the dataset. To do so, we first set [Factor_0] as Target Node by right-clicking on it and selecting Set as Target Node from the Contextual Menu. Note that the Monitor corresponding to the Target Node turns red.

Then, we select Main Menu > Inference > Interactive Inference.

Network Performance Analysis

While the performance indices shown in the Data Clustering Report have already included some measures of fit, we can further study this point by starting a more formal performance analysis: Main Menu > Analysis > Network Performance > Overall.

The resulting report provides measures of how well this network represents the underlying dataset.

Contingency Table Fit

Of particular interest is BayesiaLab’s Contingency Table Fit (CTF), which measures the quality of the JPD representation. It is defined as:

where:

Accordingly, we can interpret the following key values of the CTF:

CTF is equal to 0 if the network represents a joint probability distribution no different than the one produced by the fully unconnected network, in which all the variables are marginally independent.
CTF is equal to 100 if the network represents the joint probability distribution of the data without any approximation, i.e., it has the same log-likelihood as the fully connected network.

The main benefit of employing CTF as a quality measure is that it has normalized values ranging between 0% and 100%.

This measure can become negative if the parameters of the model are not estimated from the currently associated dataset.

CTF in Practice

Even though this says the higher the CTF, the better the representation of the JPD, we are not aiming for CTF=100%. This would conflict with the objective of finding a compact representation of the JPD.

The Naive structure of the network used for Data Clustering implies that the entire JPD representation relies on the Factor node. Removing this node would produce a fully unconnected network with a CTF=0%. Therefore, BayesiaLab excludes—but does not remove—the Factor node when computing the CTF. This allows measuring the quality of the JPD representation with the induced clusters only.

It is not easy to recommend a threshold value below which the Factor should be “reworked,” as the CTF depends directly on the size of the JPD and the number of states of the Factor. For instance, given a Factor with 4 states and 2 binary Manifest variables, a CTF any lower than 100% would be a poor representation of the JPD, as the JPD only consists of 4 cells. On the other hand, given 10 manifest variables, with 5 states each, and a Factor also consisting of 5 states, a CTF of 50% would be a very compact representation of the JPD. This means that 5 states would represent a JPD of 510 cells with a quality of 50%.

Returning to the context of our PSEM workflow, we have the following 3 conditions:

A maximum number of 5 variables per cluster of variables;
Manifest variables with 5 states;
Factors with a maximum of 5 states.

In this situation, we recommend using 70% as an alert threshold. However, this threshold level would have to be reduced if conditions #1 and #2 increased in their values or if condition #3 decreased.

Multiple Clustering

The previous section on Data Clustering dealt exclusively with the induction of [Factor_0]. In our perfume study, however, we have 15 clusters of Manifest variables, for which 15 Factors need to be induced. This means all steps applicable to Data Clustering must be repeated 15 times. BayesiaLab simplifies this task by offering the Multiple Clustering algorithm, which automates all necessary steps for all Factors.

We now return to the original network. On this basis, we can immediately start Multiple Clustering: Main Menu > Learning > Clustering > Multiple Clustering.

We set the above requirements via Connect Factors to their Manifest Variables and Forbid New Relations with Manifest Variables. Another helpful setting is Compute Manifests’ Contributions to their Factor, which helps to identify the dominant nodes within each Factor.

We can save the report or proceed to the new network in the Graph Panel, which has all nodes arranged in a grid-like arrangement: Manifest variables are on the left; the new Factors are stacked up on the right.

Upon applying Automatic Layout (P), we can identify 15 Factors surrounded by their Manifest nodes, arranged almost like a field of flowers.

The Arc Comments, shown by default, display the Contribution of each Manifest variable towards its Factor. Once we turn off the Arc Comments and turn on the Node Comments, we see that the Node Comments contain the name of the “strongest” associated Manifest variable, along with the number of associated Manifest variables in parentheses. The following graph includes a subset of the nodes with their respective Node Comments.

Also, by going into our previously specified output directory, we can see that 15 new sub-networks—in BayesiaLab’s XBL format for networks—were generated. Any of these files would allow us to study the properties of the sub-network, as we did for the single Factor, [Factor_0] that was generated by Data Clustering.

Additionally, one more file was created in this directory, which is highlighted in the screenshot below. The file marked the suffix “_Final” is the network that consists of both the original Manifest variables and the newly created Factors. As such, it is labeled as the “final” network in BayesiaLab parlance. It is also the network that is currently active.

In this context, BayesiaLab also created two new Classes:

Manifest, which contains all the manifest variables;
Factor, which contains all the latent variables.

Opening the Class Editor confirms their presence (highlighted below).

Step 4: Completing the Probabilistic Structural Equation Model

Based on the “final” network, we can proceed to the next step in our network-building process. We now introduce Purchase Intent, which has been excluded up to this point. Clicking this node while holding X renders it “un-excluded.” This makes Purchase Intent available for learning. Additionally, we designate Purchase Intent as Target Node by double-clicking the node while holding T.

Upon confirming these constraints, we start Unsupervised Learning to generate a network that includes the Factors and the Target Node. In this particular situation, we need to utilize Taboo Learning. It is the only algorithm in BayesiaLab that can learn a new structure on top of an existing network structure and simultaneously guarantee to keep Fixed Arcs unchanged (EQ can also be used for structural learning on top of an existing network, but as it searches in the space of Essential Graphs, there is no guarantee that the Fixed Arcs remain unchanged). This is important as the arcs linking the Factors and their Manifest variables are such Fixed Arcs. To distinguish them visually, Fixed Arcs appear as dotted lines in the network instead of the solid lines of “regular” arcs.

We start Taboo Learning from the main menu by selecting Main Menu > Learning > Unsupervised Structural Learning > Taboo and check the option Keep Network Structure in the Taboo Learning dialog box.

Upon completing the learning process, we obtain the network shown below.

Now we see how the manifest variables are “laddering up” to the Factors and how the Factors are related to each other. Most importantly, we can observe where the Purchase Intent node was attached to the network during the learning process. The structure conveys that Purchase Intent is only connected to [Factor_2], which is labeled with the Node Comment “Pleasure_(4).”

Key Driver Analysis

For our purposes, we want to un-check All and then only check the class Factor.

In the resulting view, all the Manifest Nodes are transparent, so the relationship between the Factors becomes visually more prominent. By de-selecting the Manifest Nodes in this way, we also exclude them from the following visual analysis.

Target Analysis

In line with our objective of learning about the key drivers in this domain, we proceed to analyze the association of the newly created Factors with Purchase Intent.

Visual Analysis

We initiate the visual analysis by selecting Main Menu > Analysis > Visual > Target Mean Analysis > Standard:

This brings up a dialog box with the options shown below. Given the context, selecting Mean for both the Target Node and the Nodes is appropriate.

Upon clicking Display Sensitivity Chart, the resulting plot shows the response curves of the Target Node as a function of the values of the Factors. This allows an immediate interpretation of the strength of the associations.

Target Analysis Report

As an alternative to the visual analysis, we now run the Target Analysis Report: Main Menu > Analysis > Report > Target Analysis > Total Effects on Target.

Although “effect” carries a causal connotation, we must emphasize that we strictly examine associations. This means that we perform observational inference as we generate this report.

A new window opens up to present the report. Under Main Menu > Options > Settings > Reporting, we can check Display the Node Comments in Tables so that Node Comments appear in addition to the Node Names in all reports.

The Total Effect (TE) is estimated as the derivative of the Target Node with respect to the driver node under study.

TE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}}

where $X$ is the analyzed variable and $Y$ is the Target Node. The Total Effect represents the change in the mean of the Target Node associated with—and not necessarily caused by— a small modification of the mean of a driver node. The Total Effect is the ratio of these two values.

This way of measuring the effect of the Factors on the Target Node assumes the relationships to be locally linear. Even though this is not always a correct assumption, it can be reasonable to simulate small changes in satisfaction levels.

As per the report, [Factor_2] provides the strongest Total Effect with a value of 0.399. This means that observing an increase of one unit in the level of the concept represented by [Factor_2] predicts a posterior probability distribution of Purchase Intent that has an expected value that is 0.399 higher compared to the marginal value.

The Standardized Total Effect (STE) is also displayed. It represents the Total Effect multiplied by the ratio of the standard deviation of the driver node and the standard deviation of the Target Node.

TE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}} \times \frac{{{\sigma _X}}}{{{\sigma _Y}}}

This means that Standardized Total Effect takes into account the “potential” of the driver under study.

In the report, the results are sorted by the Standardized Total Effect in descending order. This immediately highlights the order of importance of the Factors relative to the Target Node, Purchase Intent.

Independence Tests

In the columns further to the right in the report, the results of independence tests between the nodes are reported:

Chi-Square ( ${\chi ^2}$ ) test or G-test: The independence test is computed on the basis of the network between each driver node and the target variable. It is possible to change the type of independence from the Chi-Square ( ${\chi ^2}$ ) test to the G-test via Main Menu > Options > Settings > Statistical Tools.
Degree of Freedom: Indicates the degree of freedom between each driver node and the Target Node in the network.
p-value: the p-value is the probability of observing a value as extreme as the test statistic by chance.

If a dataset is associated with the network, as is the case here, the independence test, the degrees of freedom, and the p-value are also computed directly from the underlying data.

Factors versus Manifest Nodes

We repeat both the Target Mean Analysis and the Total Effects on Target report.

Not surprisingly, the Manifest Nodes show a similar pattern of association as the Factors. However, there is one important exception: the Manifest NodeIntensity shows a nonlinear relationship with Purchase Intent. The curve for Intensity is shown with a gray line. Note that by hovering over a curve or a node name, BayesiaLab highlights the corresponding item in the legend or the plot.

Also, we can see that Intensity was recorded on a 1−5 scale rather than the 1−10 scale that applies to the other nodes. Intensity is a so-called “JAR” variable, i.e., a variable that has a “just-about-right” value. In the context of perfumes, this characteristic is obvious. A fragrance that is either too strong or too light is undesirable. Instead, there is a value somewhere in between that would make a fragrance most attractive. The JAR characteristic is prototypical for variables representing sensory dimensions, e.g., saltiness or sweetness.

This emphasizes the importance of visual analysis, as the nonlinearity goes unnoticed in the Total Effects on Target report. In fact, it drops almost to the bottom of the list in the report.

It turns out to be rather difficult to optimize a JAR-type variable at a population level. For example, increasing Intensity would reduce the number of consumers who find the fragrance too subtle. On the other hand, an increase in Intensity would presumably dismay some consumers who believed the original Intensity level to be appropriate.

Constraints via Costs

As this drivers' analysis model is intended for product optimization, we need to consider any possible real-world constraints that may limit our ability to optimize any of the drivers in this domain. For instance, a perfumer may know how to change the intensity of a perfume but may not know how to directly affect the perception of “pleasure.” In the original study, a number of such constraints were given.

In BayesiaLab, we can conveniently encode constraints via Costs, which is a Node Property. More specifically, we can declare any node as Not Observable, which — in this context — means that they cannot be considered with regard to optimization. Costs can be set by right-clicking on an individual node and then selecting Properties > Cost.

This brings up the Cost Editor for an individual node. By default, all nodes have a cost of 1.

Unchecking the box Cost or setting a value ≤0 results in the node becoming Not Observable.

Alternatively, we can bring up the Cost Editor for all nodes by right-clicking on the Graph Panel and then selecting Edit Costs from the contextual menu.

The Cost Editor presents the default values for all nodes.

Again, setting values to zero will make nodes Not Observable. Instead of applying this setting node by node, we can import a Cost Dictionary that defines the values for each node. An excerpt from the text file is shown below. The syntax is straightforward: Not Observable is represented by 0.

From within the Cost Editor, we can use the Import button to associate a Cost Dictionary. Alternatively, we can select Main Menu > Data > Associate Dictionary > Node > Costs from the main menu.

Furthermore, upon defining Costs, we can see that all Not Observable nodes are marked with a light purple background.

It is important to note that all Factors are also set to Not Observable in our example. In fact, we do have two options here:

The optimization can be done at the first level of the hierarchical model, i.e., using the Manifest variables;
The optimization can be performed at the second level of the model, i.e., using the Factors.

Most importantly, these two approaches cannot be combined as setting evidence on Factors will block information coming from Manifest variables. Formally declaring the Factors as Not Observable tells BayesiaLab to proceed with option #1. Indeed, we plan to perform optimization using the Manifest variables only.

Multi-Quadrant Analysis

The network we have analyzed thus far modeled Purchase Intent as a function of perceived perfume characteristics. It is important to point out that this model represents the entire domain of all 11 tested perfumes. However, it is reasonable to speculate that different perfumes have different drivers of Purchase Intent. Furthermore, for purposes of product optimization, we certainly need to look at the dynamics of each product individually.

BayesiaLab assists us in this task by means of Multi-Quadrant Analysis. This function can generate new networks as a function of a Breakout Node in an existing network. This is where the node Product comes into play, which has been excluded all this time. Our objective is to generate a set of networks that model the drivers of Purchase Intent for each perfume individually, as identified by the Product breakout variable.

We start the Multi-Quadrant Analysis by selecting Main Menu > Tools > Multi-Quadrant Analysis.

This brings up the dialog box, in which we need to set a number of options:

Firstly, the Breakout Variable must be set to Product to indicate that we want to generate a network for each state of Product. For Analysis, we have several options: We choose Total Effects to be consistent with the earlier analysis. Regarding the Learning Algorithm, we select Parameter Estimation. This choice becomes evident once the dataset representing the “overall market” is split into 11 product-specific subsets. Now, the number of available observations per product drops to only 120. Given that most of our variables have 5 states, learning a structure with a dataset that small would be challenging.

This also explains why we used the entire dataset to learn the PSEM structure, which all the products will share. However, using Parameter Estimation will ensure that the parameters, i.e., the probability tables of each network, will be estimated based on the subsets of records associated with each state of Product.

Among the Options, we check Regenerate Values. This recomputes, for each new network, the values associated with each state of the discretized nodes based on the respective subset of data.

There is no need to check Rediscretize Continuous Nodes because all discretized nodes share the same variation domain, and we required equal distance binning during the data import. However, we do recommend using this option if the variation domains are different between subsets in a study, e.g., sales volume in California versus Vermont. Without using the Rediscretize Continuous Nodes option, it could happen that all data points for sales in Vermont end up in the first bin, effectively transforming the variable into a constant.

Furthermore, we do not check the option for Linearize Nodes’ Values either. This function reorders a node’s states so that its states’ values have a monotonically positive relationship with the values of the Target Node. Applying this transformation to the node Intensity would artificially increase its impact. It would incorrectly imply that it is possible to change a perfume in a way that simultaneously satisfies consumers who rated it too subtle and those who rated it too strong. Needless to say, this is impossible.

Finally, computing all Contributions will be helpful for interpreting each product-specific network.

Upon clicking OK, 11 networks are created and saved to the Output Directory defined in the dialog box. Each network is then analyzed with the specified Analysis method to produce the Multi-Quadrant Plot.

The x-value of each point indicates the mean value of the corresponding Manifest Node, as rated by those respondents who have evaluated Product 1; the position on the y-axis reflects the computed Total Effect.

From the contextual menu, we can choose Display Horizontal Scales and Display Vertical Scales, which provide the range of positions of the other products.

Using Horizontal Scales provides a quick overview of how the product under study is rated vis-à-vis other products. The Vertical Scales compare the importance of each dimension with respect to Purchase Intent. Finally, we can select the individual product to be displayed in the Multi-Quadrant Analysis window via the Contextual Menu.

Drawing a rectangle with the cursor zooms in on the specified area of the plot.

The meaning of the Horizontal Scales and Vertical Scales becomes apparent when hovering over any dot, as this brings up the position of the other (competitive) products with regard to the currently highlighted attribute.

This means, for instance, that Product 2 and Product 7 are rated lowest and highest, respectively, on the x-scale with regard to the variable Fresh. Regarding the Total Effect on Purchase Intent, Product 12 and Product 2 mark the bottom and top end, respectively (y-scale).

From a product management perspective, this suggests that for Product 1, with regard to the attribute Fresh, there is a large gap to the level of the best product, i.e., Product 7. So, one could interpret the variation from the status quo to the best level as “room for improvement” for Product 1.

On the other hand, as we can see below, the variables Personality, Original, and Feminine and have a greater Total Effect on Purchase Intent. These relative positions will soon become relevant as we must simultaneously consider the potential for improvement and the importance of optimizing Purchase Intent.

BayesiaLab’s Export Variations function allows us to save the variation domain for each driver, i.e., the minimum and maximum mean values observed across all products in the study.

Knowing these variations will be useful for generating realistic scenarios for subsequent optimization. However, what do we mean by “realistic”? Ultimately, only a subject matter expert can judge how realistic a scenario is. However, a good heuristic is whether or not a certain level is achieved by any product in the market. One could argue that the existence of a certain satisfaction level for some product means that such a level is not impossible to achieve and is, therefore, “realistic.”

Clicking the Export Variations button saves the Absolute Variations to a text file for subsequent use in optimization.

Product Optimization

In order to perform optimization for a particular product, we need to open the network for that specific product. Networks for all products were automatically generated and saved during the Multi-Quadrant Analysis, so we need to open the network for the product of interest. The suffix in the file name reflects the Product.

To demonstrate the optimization process, we open the file that corresponds to Product 1. Structurally, this network is identical to the network learned from the entire dataset. However, the parameters of this network were estimated only on the basis of the observations associated with Product 1.

Now we have all the elements that are necessary for optimizing the Purchase Intent of Product 1:

A network that is specific to Product 1;
A set of driver variables, selected by excluding the non-driver variables via Costs;
Realistic scenarios, as determined by the Variation Domains of each driver variable.

With the above, we are now able to search for node values that optimize Purchase Intent.

Target Dynamic Profile

Before we proceed, we need to explain what we mean by optimization. As all observations in this study are consumer perceptions, it is clear that we cannot directly manipulate them directly. Rather, this optimization aims to identify in which order the perfume maker should address these perceptions. Some consumer perceptions may relate to specific odoriferous compounds that a perfumer can modify; other perceptions may be influenced by marketing and branding initiatives. However, the precise mechanism of influencing consumer perceptions is not the subject of our discussion. From our perspective, the causes that could influence the perception are hidden. Thus, we have here a prototypical application of Soft Evidence, i.e., we assume that the simulated changes in the distribution of consumer perceptions originate in hidden causes (see Numerical Evidence in Chapter 7).

We need to explain the many options that must be set for Target Dynamic Profile. These options will reflect our objective of pursuing realistic sets of evidence:

In Profile Search Criterion, we specify that we want to optimize the mean value of the Target Node as opposed to any particular state or the difference between states.

Joint Probability

Next, we specify under Criterion Optimization that the mean value of the Target Node is to be maximized. Furthermore, we check Take Into Account the Joint Probability. This weights any potential improvement in the mean value of the Target Node by the joint probability corresponding to the set of simulated evidence that generated this improvement. The joint probability of a simulated evidence scenario will be high if its probability distribution is close to the original probability distribution observed in the consumer population: the higher the joint probability, the closer the simulated scenario to the status quo of customer perception.

In practice, checking this option means that we prefer smaller improvements with a high joint probability over larger ones with a low joint probability: 0.146 × 26.9% = 0.0393 > 0.174 × 21.33% = 0.0371.

If all simulated values were within the constraints set in the Variation Editor, it would be better to increase the driver variable Spiced to a simulated value of 7 rather than 7.5, even though Purchase Intent would be higher for the latter value of Spiced. In other words, the “support” for E(Spiced)=7 is greater than for E(Spiced)=7.5, as more respondents are already in agreement with such a scenario. Once again, this is about pursuing achievable improvements rather than proposing pie-in-the-sky scenarios.

Costs

In this example, so far, we have only used Costs for selecting the subset of driver variables. Additionally, we can utilize Costs in the original sense of the word in the optimization process. For instance, if we had information on the typical cost of improving a specific rating by one unit, we could enter such a value as cost. This could be a dollar value, or we could set the costs to reflect the relative effort required for the same amount of change, e.g., one unit, in each driver variable. For example, a marketing manager may know that it requires twice as much effort to change the perception of Feminine compared to changing the perception of Sweet. If we want to quantify such efforts by using Costs, we must ensure that all variables' costs share the same scale. For instance, if some drivers are measured in dollars and others are measured in terms of time spent in hours, we will need to convert hours to dollars.

In our study, we leave all the included driver variables at a Cost of 1, i.e., we assume that it requires the same effort for the same amount of change in any driver variable. Hence, we can leave the Utilize Evidence Cost unchecked (Not Observable nodes still remain excluded as driver variables).

Compute Only Prior Variations needs to remain unchecked as well. This option would be useful if we were interested in only computing the marginal effect of drivers. We would not want any cumulative effects or conditional variations given other drivers for that purpose.

Associate Evidence Scenario will save the identified sets of evidence for subsequent evaluation.

In this context, we reintroduce the Variations we saved earlier. We reason that the best-rated product with regard to a particular attribute represents a plausible upper limit for what any product could strive for in terms of improvement. This also means that a driver variable that has already achieved the best level will not be optimized further in this framework.

Variation Editor

Clicking on Variations brings up the Variation Editor. By default, it shows variations in the amount of ±100% of the current mean.

To load the Variations we generated earlier through Multi-Quadrant Analysis, we click Import and select Absolute Variations from the pop-up window.

Now, we can open the previously saved file.

The Variation Editor now reflects the constraints. Any available expert knowledge can be applied here by entering new values for the Minimum Mean or Maximum Mean or by entering percent values for Positive Variations and Negative Variations.

Depending on the setting, the percentages are relative to (a) the Mean, (b) the Domain, or (c) the Progression Margin.

Selecting the Progression Margin is particularly useful as it automatically constrains the positive and negative variations in proportion to the gap from the Current Mean to the Maximum Mean and Minimum Mean values, respectively. In other words, it limits the improvement potential of a driver variable as its value approaches the maximum. It is a practical—albeit arbitrary—approach to prevent overly optimistic optimizations.

Next, we select MinXEnt in the Search Method panel as the method for generating Soft Evidence. In terms of Intermediate Points, we set a value of 20. This means that BayesiaLab will simulate 22 values for each node, i.e., the minimum and maximum plus 20 intermediate values, all within the constraints set by the variations. This is particularly useful in the presence of non-linear effects.

Within the Search Stop Criteria panel, the Maximum Size of Evidence specifies the maximum number of driver variables recommended as part of the optimization policy. Real-world considerations once again drive this setting. Although one could wish to bring all variables to their ideal level, a decision-maker may recognize that it is not plausible to pursue anything beyond the top 4 variables.

Alternatively, we can choose to terminate the optimization process once the joint probability of the simulated evidence drops below the specified Minimum Joint Probability.

The final option, Use the Automatic Stop Criterion, leaves it up BayesiaLab to determine whether adding further evidence provides a worthwhile improvement for the Target Node.

Optimization Results

Once the optimization process concludes, we obtain a report window that contains a list of priorities: Personality, Fruity, Flowery, and Tenacious.

To explain the items in the report, we present a simplified and annotated version of the report below. Note that this report can be saved in HTML format for subsequent editing as a spreadsheet.

Most importantly, the Value/Mean column shows the successive improvement upon implementing each policy. From initially 3.58, the Purchase Intent improves to 3.86, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on Utopian thinking but rather on modest changes in consumer perception, well within the range of competitive performance.

Evidence Scenarios

As an alternative to interpreting the static report, we can examine each element in the list of priorities. To do so, we bring up all the Monitors of the nodes identified for optimization.

Selecting the first row in the table (Index=0) sets the evidence corresponding to the first priority, i.e., Personality. We can now see that the evidence we have set is a distribution rather than a single value. The small gray arrows indicate how the distribution of the evidence and the distributions of Purchase Intent, Fruity, Flowery, and Tenacious, are all different from their prior marginal distributions. The changes to the Fruity, Flowery, and Tenacious correspond to what is shown in the report in the column Value/Mean at T.

By selecting Index=1 we introduce a second set of evidence, i.e., the optimized distribution for Personality.

Continuing with Index 2 and 3, we see that the improvements to Purchase Intent become smaller.

Bringing up all the remaining nodes would bring up any “collateral” changes due to setting multiple pieces of evidence.

The results tell us that for Product 1, a higher consumer rating of Personality would be associated with a higher Purchase Intent. Improving the perception of Personality might be a task for the marketing and advertising team. Similarly, a better consumer rating of Fruity would also be associated with greater Purchase Intent. A product manager could then interpret this and request a change to some ingredients. Our model tells us that if such changes in consumer ratings were brought about in the proposed order, a higher Purchase Intent would potentially be observed.

While we have only presented the results for Product 1, we want to highlight that the priorities are different for each product, even though they all share the same underlying PSEM structure. The recommendations from the Target Dynamic Profile of Product 11 are shown below.

This is an interesting example as it identifies that the JAR-type variable Intensity needs to be lowered to optimize Purchase Intent for Product 11.

It is important to reiterate that the sets of evidence we apply are not direct interventions in this domain. Hence, we are not performing causal inference. Instead, the sets of evidence we found help us prioritize our course of action for product and marketing initiatives.

Summary

We presented a complete workflow that generates a Probabilistic Structural Equation Model for key driver analysis and product optimization. The Bayesian networks paradigm turned out to be a practical platform for developing the model and its subsequent analysis all the way through optimization. With all the steps contained in BayesiaLab, we have a single, continuous line of reasoning from raw survey data to a final order of priorities for action.

Step 3: Multiple Clustering

In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g., Principal Components Analysis (PCA).

Data Clustering

Upon completing the clustering process, we obtain a graph with newly induced [Factor_0] being connected to all its associated manifest variables.

Furthermore, BayesiaLab produces a window that contains a comprehensive Clustering Report.

Given its size, we show it as two separate tables below.

Relative Significance

In the second part of the report, the variables are sorted by Relative Significance with respect to the Target Node, which is [Factor_0].

R{S_i} = {{I(M{}_i,F)} \over {\mathop {\max }\limits_j I({M_j},F)}}

where represents the manifest variable, and represents the factor variable. The function computes the .

Mapping

From the window that contains the report, we can also produce a Mapping of the Clusters.

This graph displays three properties of the identified Cluster States (Cluster 1–Cluster 5) within the new Factor node, [Factor_0]:

The saturation of the blue represents the purity of the Cluster States: the higher the purity, the higher the saturation of the color. Here, all purities are in the 90%+ range, which is why they are all deep blue.
The sizes represent the respective marginal probabilities of the Cluster States. We will see this distribution again once we open the Monitor of the new factor node.
The distance between any two clusters is proportional to the neighborhood of the clusters.

Quadrants

Clicking the Quadrants button in the report window brings up the options for graphically displaying the node's relative importance with regard to the induced [Factor_0].

For our example, we select . Furthermore, we do not need to normalize the means as all values of the Manifest nodes in [Factor_0] are recorded on the same scale.

This Quadrant Plot highlights two measures that are relevant for interpretation:

Mutual Information on the y-axis, i.e., the importance of each Manifest variable with regard to [Factor_0].
The mean value of each Manifest variable on the x-axis.

This plot shows that the most important variable is Trust with I(Trust,[Factor_0])=1.26. It is also the variable with the highest expected satisfaction level, i.e., E(Trust)=6.79.

After switching to the , we open the Monitors for all nodes. We can see five Cluster States for [Factor_0], labeled C1 through C5, as well as their marginal distribution. This distribution was previously represented as the “bubble size” of Clusters 1–5.

Then, we select Main Menu > Inference > Interactive Inference.

Using the Record Selector in the extended toolbar, we can now scroll through each record in the associated dataset. The Monitors of the Manifest nodes show the actual survey observations, while the Monitor of [Factor_0] shows the posterior probability distribution of the states given these observations. The state highlighted in light blue marks the modal value, i.e., the “winning” state, which is the imputed state now recorded in the dataset. Clicking the Stop icon closes this function.

Network Performance Analysis

The resulting report provides measures of how well this network represents the underlying dataset.

Contingency Table Fit

Of particular interest is BayesiaLab’s Contingency Table Fit (CTF), which measures the quality of the JPD representation. It is defined as:

CTF(B) = \frac{{\overline {ll} ({B_u}) - \overline {ll} (B)}}{{\overline {ll} ({B_f}) - \overline {ll} (B)}}

where:

is the mean of the log-likelihood of the data given the network currently under study, and

is the mean of the log-likelihood of the data given the fully unconnected network, i.e., the “worst-case scenario,” and

is the mean of the log-likelihood of the data given the fully connected network, i.e., the “best-case scenario.” The fully connected network is the complete graph in which all nodes have direct links to all other nodes. Therefore, it is the exact representation of the chain rule without any conditional independence assumptions in the representation of the joint probability distribution.

Accordingly, we can interpret the following key values of the CTF:

CTF is equal to 0 if the network represents a joint probability distribution no different than the one produced by the fully unconnected network, in which all the variables are marginally independent.
CTF is equal to 100 if the network represents the joint probability distribution of the data without any approximation, i.e., it has the same log-likelihood as the fully connected network.

The main benefit of employing CTF as a quality measure is that it has normalized values ranging between 0% and 100%.

This measure can become negative if the parameters of the model are not estimated from the currently associated dataset.

CTF in Practice

It must be emphasized that CTF measures only the quality of the network in terms of its data fit. As such, it represents the second term in the definition of the :

Returning to the context of our PSEM workflow, we have the following 3 conditions:

A maximum number of 5 variables per cluster of variables;
Manifest variables with 5 states;
Factors with a maximum of 5 states.

In this situation, we recommend using 70% as an alert threshold. However, this threshold level would have to be reduced if conditions #1 and #2 increased in their values or if condition #3 decreased.

Multiple Clustering

We now return to the original network. On this basis, we can immediately start Multiple Clustering: Main Menu > Learning > Clustering > Multiple Clustering.

Compared to the dialogue box for Data Clustering, the options for Multiple Clustering are much expanded. Firstly, we need to specify an Output Directory for the to-be-learned networks. This will produce a separate network for each Factor, which we can subsequently examine. Furthermore, we want the new Factors to be connected to their Manifest variables, but we do not wish the Manifest variables to be connected among themselves. We have already learned the relationships between the manifest variables during . These relationships will ultimately be encoded via the connections between their associated Factors upon completion of . We consider these new Factor nodes to belong to the second layer of our hierarchical Bayesian network. This also means that, at this point, all structural learning involving the nodes of the first layer, i.e., the Manifest variables, is completed.

The Multiple Clustering process concludes with a report showing details regarding the generated clustering. Among the many available metrics, we can check the minimum value of the , which is reported as 76.16%. Given the recommendations we provided earlier, this suggests that we did not lose too much information by inducing the latent variables.

Upon applying Automatic Layout (P), we can identify 15 Factors surrounded by their Manifest nodes, arranged almost like a field of flowers.

In this context, BayesiaLab also created two new Classes:

Manifest, which contains all the manifest variables;
Factor, which contains all the latent variables.

Opening the Class Editor confirms their presence (highlighted below).

Product Optimization

Now we have all the elements that are necessary for optimizing the Purchase Intent of Product 1:

A network that is specific to Product 1;
A set of driver variables, selected by excluding the non-driver variables via Costs;
Realistic scenarios, as determined by the Variation Domains of each driver variable.

With the above, we are now able to search for node values that optimize Purchase Intent.

Target Dynamic Profile

While BayesiaLab offers a number of optimization methods, Target Dynamic Profile is appropriate here. We start it from within by selecting Main Menu > Analysis > Report > Target Analysis > Target Dynamic Profile.

We need to explain the many options that must be set for Target Dynamic Profile. These options will reflect our objective of pursuing realistic sets of evidence:

In Profile Search Criterion, we specify that we want to optimize the mean value of the Target Node as opposed to any particular state or the difference between states.

Joint Probability

Costs

Associate Evidence Scenario will save the identified sets of evidence for subsequent evaluation.

The setting of Search Methods is critically important for the optimization task. We need to define how to search for sets of evidence. Using Hard Evidence means that we would exclusively try out sets of evidence consisting of nodes with one state set to 100%. This would imply that we simulate a condition in which all consumers perfectly agree with regard to some ratings. Needless to say, this would be utterly unrealistic. Instead, we will explore sets of evidence consisting of distributions for each node by modifying their mean values as Soft Evidence. More precisely, we use the MinXEnt method to generate such evidence (see ).

Variation Editor

Clicking on Variations brings up the Variation Editor. By default, it shows variations in the amount of ±100% of the current mean.

To load the Variations we generated earlier through Multi-Quadrant Analysis, we click Import and select Absolute Variations from the pop-up window.

Now, we can open the previously saved file.

Depending on the setting, the percentages are relative to (a) the Mean, (b) the Domain, or (c) the Progression Margin.

Alternatively, we can choose to terminate the optimization process once the joint probability of the simulated evidence drops below the specified Minimum Joint Probability.

The final option, Use the Automatic Stop Criterion, leaves it up BayesiaLab to determine whether adding further evidence provides a worthwhile improvement for the Target Node.

Optimization Results

Once the optimization process concludes, we obtain a report window that contains a list of priorities: Personality, Fruity, Flowery, and Tenacious.

To explain the items in the report, we present a simplified and annotated version of the report below. Note that this report can be saved in HTML format for subsequent editing as a spreadsheet.

Evidence Scenarios

As an alternative to interpreting the static report, we can examine each element in the list of priorities. To do so, we bring up all the Monitors of the nodes identified for optimization.

Then, we retrieve the individual steps by right-clicking on the Evidence Scenario icon in the lower right-hand corner of the main window.

By selecting Index=1 we introduce a second set of evidence, i.e., the optimized distribution for Personality.

Continuing with Index 2 and 3, we see that the improvements to Purchase Intent become smaller.

Bringing up all the remaining nodes would bring up any “collateral” changes due to setting multiple pieces of evidence.

This is an interesting example as it identifies that the JAR-type variable Intensity needs to be lowered to optimize Purchase Intent for Product 11.

Summary

Key Driver Analysis

Our Probabilistic Structural Equation Model is now complete, and we can use it to perform the analysis part of this exercise, namely to find out what “drives” Purchase Intent. We return to the Validation Mode .

For our purposes, we want to un-check All and then only check the class Factor.

Target Analysis

In line with our objective of learning about the key drivers in this domain, we proceed to analyze the association of the newly created Factors with Purchase Intent.

Visual Analysis

We initiate the visual analysis by selecting Main Menu > Analysis > Visual > Target Mean Analysis > Standard:

This brings up a dialog box with the options shown below. Given the context, selecting Mean for both the Target Node and the Nodes is appropriate.

Target Analysis Report

As an alternative to the visual analysis, we now run the Target Analysis Report: Main Menu > Analysis > Report > Target Analysis > Total Effects on Target.

Although “effect” carries a causal connotation, we must emphasize that we strictly examine associations. This means that we perform observational inference as we generate this report.

The Total Effect (TE) is estimated as the derivative of the Target Node with respect to the driver node under study.

TE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}}

TE(X,Y) = \frac{{{\delta _Y}}}{{{\delta _X}}} \times \frac{{{\sigma _X}}}{{{\sigma _Y}}}

This means that Standardized Total Effect takes into account the “potential” of the driver under study.

Independence Tests

In the columns further to the right in the report, the results of independence tests between the nodes are reported:

Chi-Square ( ${\chi ^2}$ ) test or G-test: The independence test is computed on the basis of the network between each driver node and the target variable. It is possible to change the type of independence from the Chi-Square ( ${\chi ^2}$ ) test to the G-test via Main Menu > Options > Settings > Statistical Tools.
Degree of Freedom: Indicates the degree of freedom between each driver node and the Target Node in the network.
p-value: the p-value is the probability of observing a value as extreme as the test statistic by chance.

If a dataset is associated with the network, as is the case here, the independence test, the degrees of freedom, and the p-value are also computed directly from the underlying data.

Factors versus Manifest Nodes

We repeat both the Target Mean Analysis and the Total Effects on Target report.

This emphasizes the importance of visual analysis, as the nonlinearity goes unnoticed in the Total Effects on Target report. In fact, it drops almost to the bottom of the list in the report.

Constraints via Costs

This brings up the Cost Editor for an individual node. By default, all nodes have a cost of 1.

Unchecking the box Cost or setting a value ≤0 results in the node becoming Not Observable.

Alternatively, we can bring up the Cost Editor for all nodes by right-clicking on the Graph Panel and then selecting Edit Costs from the contextual menu.

The Cost Editor presents the default values for all nodes.

From within the Cost Editor, we can use the Import button to associate a Cost Dictionary. Alternatively, we can select Main Menu > Data > Associate Dictionary > Node > Costs from the main menu.

Furthermore, upon defining Costs, we can see that all Not Observable nodes are marked with a light purple background.

It is important to note that all Factors are also set to Not Observable in our example. In fact, we do have two options here:

The optimization can be done at the first level of the hierarchical model, i.e., using the Manifest variables;
The optimization can be performed at the second level of the model, i.e., using the Factors.