Example: Most Relevant Explanations for Failure Analysis
Example Background & Context
- To illustrate the Most Relevant Explanations function, we present a causal Bayesian network derived from a problem domain described in Yuan, C. et al. (2011), which was originally proposed by Poole, D., & Provan, G. M. (1991).
- This domain was originally described as an electrical circuit consisting of an Input, an Output, plus four switches, A, B, C, and D, which can fail.
Overhead Catenary System
- We took the liberty of embedding this exact problem that Yuan described into a practical technical context, hoping to make it easier to understand.
- So, instead of a fictional and abstract circuit, we are considering an overhead catenary system that supplies power to electric locomotives along railroad tracks.
Conceptual Illustration
- Our system consists of the following elements, as illustrated in the above diagram.
- A high-voltage wire, a so-called overhead catenary, serves as a power source for electric locomotives.
- This high-voltage wire is suspended from a support structure that is attached to steel pylons alongside the railroad track.
- Beyond supporting the wire, this structure must also provide electrical insulation.
- It does so through four insulators, labeled A, B, C, and D, so that there is no path along which electricity could flow into the steel pylon and ultimately into the ground.
- As with all equipment that is exposed to the elements and subject to mechanical and electrical forces, these insulators can fail.
- Long-term operational data has established specific failure probabilities for each of the insulators within a given time period:
- p(A=defective)=1.6%
- p(B=defective)=10%
- p(C=defective)=15%
- p(D=defective)=10%
- In this context, "defective" means the degradation from a perfect insulator (R=∞) to an Ohmic resistor (R>>0Ω), not necessarily a short circuit (R=0Ω).
- As a result, the failure of one or more insulators could create a stray current between the catenary wire ("Input") and the steel pylon anchored in the ground ("Output"), thereby "leaking" electric energy.
- Hence, the overall objective must be to prevent any stray currents. However, our specific objective in this example is to identify the most relevant causes if a stray current were to be observed.
Equivalent Circuit
- We can simplify the technical diagram from above into the following equivalent pseudo circuit, in which we represent the real-world insulators as idealized switches:
- The equivalent of a functioning insulator is an open switch.
- The equivalent of a defective insulator is a closed switch.
- Note that this circuit representation is identical to the one in Yuan's paper.
- Looking at the arrangement of the insulators/switches, we can see that not all failures have the same effect (see the code sketch following this list):
- A failure of A would immediately create a connection between the Input and the Output, leading to a power drain.
- However, if any one of B, C, or D failed by itself, it would not create an immediate problem.
- Beyond nodes that represent actual, technical components in our system, we introduce intermediate output nodes that inform us about the conditions on the output sides of the switches/insulators.
- Think of these intermediate output nodes as embedded sensors that indicate whether the corresponding switch can transmit power to the point where the sensor is attached.
- Here, the equivalent pseudo circuit is shown with the intermediate output nodes in place:
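Complementing the diagram, the switching logic can also be expressed directly in code. The following is a minimal Python sketch based on our reading of the circuit: switch A alone bridges Input and Output, while B only completes a path together with C or D, which sit in parallel behind it. The function names and this series/parallel reading are our own illustration aids, not part of the BayesiaLab model.

```python
# Sketch of the switching logic, under the assumed topology:
# A in parallel with (B in series with (C parallel D)).

def intermediate_outputs(a, b, c, d, input_power=True):
    """Return the 'sensor' values Output A..Output D (True = Power).

    Each of a, b, c, d is True if the corresponding insulator is defective,
    i.e., the equivalent switch is closed."""
    out_a = input_power and a        # power right behind switch A
    out_b = input_power and b        # power right behind switch B, feeding C and D
    out_c = out_b and c              # power behind C ...
    out_d = out_b and d              # ... and behind D, in parallel with C
    return out_a, out_b, out_c, out_d

def stray_current(a, b, c, d, input_power=True):
    """True if power reaches the Output (the grounded pylon)."""
    out_a, _, out_c, out_d = intermediate_outputs(a, b, c, d, input_power)
    return bool(out_a or out_c or out_d)

# Single failures: only a defective A creates a path from Input to Output.
single_failures = {"A": (1, 0, 0, 0), "B": (0, 1, 0, 0), "C": (0, 0, 1, 0), "D": (0, 0, 0, 1)}
for name, state in single_failures.items():
    print(f"only {name} defective -> stray current: {stray_current(*state)}")

# Two failures behind each other do create a path, e.g., B and C:
print(f"B and C defective -> stray current: {stray_current(0, 1, 1, 0)}")
```

Here, the intermediate_outputs helper plays the role of the embedded sensors described above, while stray_current corresponds to the Output node.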
Explanatory Causal Bayesian Network
- Now we have all the elements we need to represent this domain in a causal Bayesian network:
- Input (i.e., the catenary) has the states Power and No Power.
- Output (i.e., the pylon) has the states Current and No Current.
- The node names for the switches (i.e., the insulators) correspond to their designation in the diagram, i.e., A, B, C, and D. They all feature the states OK and defective.
- The node names for the intermediate output nodes are Output A through Output D. Each of them has the states Power and No Power, indicating whether the respective switch can transmit power or not.
- Upon entering the failure probabilities, we have a fully specified causal Bayesian network, which you can download here:
Yuan_Lu_Circuit.xbl
Initial State of Bayesian Network
- Assuming that Input=Power, we can see how the Bayesian network computes the probability of Output=Current, i.e., the presence of a stray current.
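As a cross-check, this prediction can be approximated by brute-force enumeration over the four insulators. The following sketch assumes independent failures with the priors quoted above and the series/parallel reading of the circuit from the earlier sketch; the exact figure in BayesiaLab depends on the CPTs in Yuan_Lu_Circuit.xbl, so treat this only as an approximation.

```python
from itertools import product

PRIORS = {"A": 0.016, "B": 0.10, "C": 0.15, "D": 0.10}   # p(X = defective)

def stray_current(a, b, c, d):
    # Assumed topology: A in parallel with (B in series with (C parallel D)).
    return a or (b and (c or d))

def joint_probability(combo):
    """Probability of one joint configuration (a, b, c, d), True = defective."""
    p = 1.0
    for name, defective in zip("ABCD", combo):
        p *= PRIORS[name] if defective else 1.0 - PRIORS[name]
    return p

# Sum the probabilities of all configurations that produce a stray current.
p_current = sum(joint_probability(combo)
                for combo in product([False, True], repeat=4)
                if stray_current(*combo))
print(f"P(Output = Current | Input = Power) ≈ {p_current:.2%}")
```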
System Failure Observed
- However, we are going to change the viewpoint. Instead of predicting the probability of system failure, we actually do observe a system failure, i.e., we measure a stray current that is flowing all the way through to the Output.
- So, one or more of the components in this system must have failed.
- Unfortunately, we do not have access to the intermediate outputs, which would reveal what the problem is. Note that those nodes are marked as Not Observable.
- So, we must infer from the observed outcome and reason back to the potential causes.
- More specifically, we wish to know the most relevant causes, i.e., what would best explain the outcome we have observed.
- The following network illustrates the status of all nodes after setting Output=Current.
- Naively, we might expect that the node with the highest probability of being defective is the one that prompted the failure.
- However, the question is much more complex than that.
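To see why, we can make the backward reasoning explicit. The following sketch conditions on Output=Current and computes each insulator's posterior probability of being defective by enumeration, again under the assumed priors and circuit reading from the earlier sketches; the numbers in the actual network may differ slightly.

```python
from itertools import product

PRIORS = {"A": 0.016, "B": 0.10, "C": 0.15, "D": 0.10}   # p(X = defective)

def stray_current(a, b, c, d):
    # Assumed topology: A in parallel with (B in series with (C parallel D)).
    return a or (b and (c or d))

def joint_probability(combo):
    p = 1.0
    for name, defective in zip("ABCD", combo):
        p *= PRIORS[name] if defective else 1.0 - PRIORS[name]
    return p

# All joint configurations consistent with the evidence Output = Current.
consistent = [combo for combo in product([False, True], repeat=4) if stray_current(*combo)]
p_evidence = sum(joint_probability(combo) for combo in consistent)

for index, name in enumerate("ABCD"):
    p_defective = sum(joint_probability(combo) for combo in consistent if combo[index])
    print(f"P({name} = defective | Output = Current) ≈ {p_defective / p_evidence:.1%}")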
Most Relevant Explanations
- We need to employ the Most Relevant Explanations feature to identify the problems.
- Select Analysis > Report > Evidence > Most Relevant Explanations.
- This opens an options window, in which we set the Search Space to the Ancestors of the Target Node Output.
- Upon clicking OK, BayesiaLab starts the search and quickly brings up a report showing a list of solutions, i.e., explanations.
- In the list of Best Solutions, the top line shows the most relevant explanation H* for the observed evidence:
- Both B and C are defective.
- Additionally, several measures corresponding to H* are reported in the columns to the right:
- MRE Size refers to the number of individual pieces of evidence that are part of H*, which is 2 for B and C.
- Generalized Bayes Factor (GBF): Given that our network is a causal model, we can interpret the GBF as a likelihood ratio. This means that the likelihood of "B=defective and C=defective" being the cause of E is 42 times greater than the likelihood of "B=OK and C=OK" being the cause of E, i.e., 48.4% versus 1.1%.
- Likelihood P(E|H)
- Posterior Odds O(H|E)
- Posterior Probability P(H|E)
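For readers who want to see where these measures come from, the following sketch computes the Generalized Bayes Factor GBF(H;E) = P(E|H) / P(E|¬H), the likelihood P(E|H), the posterior odds, and the posterior P(H|E) for a candidate explanation by enumeration. It reuses the assumed priors and circuit reading from the earlier sketches, so its figures only approximate those in the BayesiaLab report.

```python
from itertools import product

PRIORS = {"A": 0.016, "B": 0.10, "C": 0.15, "D": 0.10}   # p(X = defective)
NAMES = "ABCD"

def stray_current(a, b, c, d):
    # Assumed topology: A in parallel with (B in series with (C parallel D)).
    return a or (b and (c or d))

def joint_probability(combo):
    p = 1.0
    for name, defective in zip(NAMES, combo):
        p *= PRIORS[name] if defective else 1.0 - PRIORS[name]
    return p

def mre_measures(hypothesis):
    """hypothesis maps node names to True (defective) or False (OK), e.g. {"B": True, "C": True}."""
    index = {name: i for i, name in enumerate(NAMES)}
    combos = list(product([False, True], repeat=4))

    def matches(combo):
        return all(combo[index[n]] == value for n, value in hypothesis.items())

    p_h = sum(joint_probability(c) for c in combos if matches(c))
    p_e = sum(joint_probability(c) for c in combos if stray_current(*c))
    p_e_and_h = sum(joint_probability(c) for c in combos if matches(c) and stray_current(*c))

    p_e_given_h = p_e_and_h / p_h
    p_e_given_not_h = (p_e - p_e_and_h) / (1.0 - p_h)
    p_h_given_e = p_e_and_h / p_e
    return {
        "GBF": p_e_given_h / p_e_given_not_h,
        "P(E|H)": p_e_given_h,
        "O(H|E)": p_h_given_e / (1.0 - p_h_given_e),   # posterior odds, taken as P(H|E)/(1-P(H|E))
        "P(H|E)": p_h_given_e,
    }

# Measures for the reported most relevant explanation H*: B and C are defective.
for measure, value in mre_measures({"B": True, "C": True}).items():
    print(f"{measure}: {value:.3f}")
```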
Filtering Solutions
- In this example, the number of solutions is manageable. In more complex situations, however, the search algorithm could potentially find thousands of solutions.
- To constrain the size of the report, you can select the Filtering Power before producing the report.
Filtering Power=0 (No Filtering)

Filtering Power=1 (Strongly Dominated Solutions are Filtered)

Filtering Power=2 (Strongly and Weakly Dominated Solutions are Filtered)
