Example: Most Relevant Explanations for Failure Analysis

Example Background & Context

  • To illustrate the Most Relevant Explanations function, we present a causal Bayesian network derived from a problem domain explained in Yuan, C. et al. (2011), which had originally been proposed in Poole, D., & Provan, G. M. (1991).
  • This domain was originally described as an electrical circuit consisting of an $\mathit{Input}$, an $\mathit{Output}$, and four switches, $A$, $B$, $C$, and $D$, which can fail.

Overhead Catenary System

  • We took the liberty of embedding the problem Yuan described into a practical technical context, hoping to make it easier to understand.
  • So, instead of a fictional and abstract circuit, we are considering an overhead catenary system that supplies power to electric locomotives along railroad tracks.

Conceptual Illustration

  • Our system consists of the following elements, as illustrated in the above diagram.
    • A high-voltage wire, a so-called overhead catenary, serves as a power source for electric locomotives.
    • This high-voltage wire is suspended from a support structure that is attached to steel pylons alongside the railroad track.
  • In addition to supporting the wire, this structure must also provide electrical insulation.
    • It does so through four insulators, labeled $A$, $B$, $C$, and $D$, so that there is no path along which electricity could flow into the steel pylon and ultimately into the ground.
    • As with all equipment that is exposed to the elements and subject to mechanical and electrical forces, these insulators can fail.
    • Long-term operational data has established specific failure probabilities for each of the insulators within a given time period:
      • $p(A=\mathit{defective})=1.6\%$
      • $p(B=\mathit{defective})=10\%$
      • $p(C=\mathit{defective})=15\%$
      • $p(D=\mathit{defective})=10\%$
    • In this context, “defective” means the degradation from a perfect insulator ($R = \infty$) to an Ohmic resistor ($R \gg 0\,\Omega$), not necessarily a short circuit ($R = 0\,\Omega$).
  • As a result, the failure of one or more insulators could create a stray current between the catenary wire ($\mathit{Input}$) and the steel pylon anchored in the ground ($\mathit{Output}$), thereby “leaking” electric energy.
  • With that, the overall objective must be to prevent any stray currents. However, our specific objective in this example is to find the relevant causes if a stray current were to be observed.
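
As a quick sanity check on the failure probabilities listed above, the following minimal sketch computes how likely it is that at least one insulator is defective within the given time period. It assumes that the four insulators fail independently of one another, which the text does not state explicitly; the variable names are ours.

```python
from math import prod

# Per-insulator failure probabilities quoted in the list above.
p_defective = {"A": 0.016, "B": 0.10, "C": 0.15, "D": 0.10}

# Assumption: the insulators fail independently of one another.
p_all_ok = prod(1.0 - p for p in p_defective.values())
p_any_defective = 1.0 - p_all_ok

print(f"P(all insulators OK)       = {p_all_ok:.4f}")
print(f"P(at least one defective)  = {p_any_defective:.4f}")
```

Note that a defective insulator does not necessarily produce a stray current; whether it does depends on where the insulator sits in the support structure.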

Equivalent Circuit

  • We can simplify the technical diagram from above into the following equivalent pseudo circuit, in which we represent the real-world insulators as idealized switches:

    • The equivalent of a functioning insulator is an open switch.
    • The equivalent of a defective insulator is a closed switch.
  • Note that this circuit representation is identical to the one in Yuan’s paper.

  • Looking at the arrangement of the insulators/switches, we can see that not all failures have the same effect:

    • A failure of $A$ would immediately create a connection between the $\mathit{Input}$ and the $\mathit{Output}$, leading to a power drain.
    • However, if any one of $B$, $C$, or $D$ failed by itself, it would not create an immediate problem.
  • Beyond nodes that represent actual, technical components in our system, we introduce intermediate output nodes that inform us about the conditions on the output sides of the switches/insulators.

  • Think of these intermediate output nodes as embedded sensors that indicate whether the corresponding switch can transmit power to the point where the sensors are attached.

  • With the intermediate output nodes in place, the equivalent pseudo circuit has one such node on the output side of each switch.
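
The effect of the switch arrangement can be sketched as idealized boolean logic. The topology used here is an assumption based on our reading of the circuit in Yuan's paper: $A$ bridges $\mathit{Input}$ and $\mathit{Output}$ directly, while $B$ sits in series with the parallel pair $C$ and $D$. The sketch also treats every closed switch as a perfect conductor, which the actual network's probability tables need not do. The function name `stray_current` is hypothetical.

```python
from itertools import product


def stray_current(a, b, c, d):
    """Return True if a closed path links Input to Output.

    Each argument is True when that insulator is defective (switch closed).
    Assumed topology: A alone bridges the circuit; B is in series with the
    parallel pair C and D.
    """
    return a or (b and (c or d))


names = "ABCD"

# Enumerate all 16 combinations and keep those that cause a stray current.
failing_sets = [
    {n for n, closed in zip(names, states) if closed}
    for states in product([False, True], repeat=4)
    if stray_current(*states)
]

# A failing set is minimal if no proper subset of it also fails.
minimal = [s for s in failing_sets if not any(t < s for t in failing_sets)]
print(sorted(sorted(s) for s in minimal))
```

Under these assumptions, the minimal failure combinations are $A$ alone, $B$ with $C$, and $B$ with $D$, which is consistent with the observation that a single failure of $B$, $C$, or $D$ does not create an immediate problem.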

Explanatory Causal Bayesian Network

  • Now we have all the elements we need to represent this domain in a causal Bayesian network:
    • $\mathit{Input}$ (i.e., the catenary) has the states $\mathit{Voltage}$ and $\mathit{No\ Voltage}$.
    • $\mathit{Output}$ (i.e., the pylon) has the states $\mathit{Current}$ and $\mathit{No\ Current}$.
    • The node names for the switches (i.e., the insulators) correspond to their designation in the diagram, i.e., $A$, $B$, $C$, and $D$. They all feature the states $\mathit{OK}$ and $\mathit{defective}$.
    • The node names for the intermediate output nodes are $\mathit{Output}\ A$ through $\mathit{Output}\ D$. Each of them has the states $\mathit{Power}$ and $\mathit{No\ Power}$, indicating whether the respective switch can transmit power.
  • Upon entering the failure probabilities, we have a fully specified causal Bayesian network, which you can download here:
Yuan_Lu_Circuit.xbl

Initial State of Bayesian Network

  • Assuming that $\mathit{Input}=\mathit{Voltage}$, we can see how the Bayesian network computes the probability of $\mathit{Output}=\mathit{Current}$, i.e., the presence of a stray current.

System Failure Observed

  • However, we now change the viewpoint. Instead of predicting the probability of system failure, we observe a system failure, i.e., we measure a stray current flowing all the way to the $\mathit{Output}$.
  • So, one or more of the components in this system must have failed.
  • Unfortunately, we do not have access to the intermediate outputs, which would reveal what the problem is. Note that those nodes are marked as Not Observable.
  • So, we must reason backward from the observed outcome to the potential causes.
  • More specifically, we wish to know the most relevant causes, i.e., what would best explain the outcome we have observed.
  • The following network illustrates the status of all nodes after setting $\mathit{Output}=\mathit{Current}$.
  • Naively, we might expect that the node with the highest probability of being defective is the one that prompted the failure.
  • However, the question is much more complex than that.

Most Relevant Explanations

  • We need to employ the Most Relevant Explanations feature to identify the problems.

  • Select Analysis > Report > Evidence > Most Relevant Explanations.

  • This opens an options window, in which we set the Search Space to the Ancestors of the Target Node $\mathit{Output}$.

    [Figure: Most Relevant Explanations options window]
  • Upon clicking OK, BayesiaLab starts the search and quickly brings up a report showing a list of solutions, i.e., explanations.

    [Figure: Most Relevant Explanations report]
  • In the list of Best Solutions, the top line shows the most relevant explanation $H^*$ for the observed evidence $E=\{\mathit{Input}=\mathit{Voltage}, \mathit{Output}=\mathit{Current}\}$:

    • Both $B$ and $C$ are defective.
  • Additionally, several measures corresponding to $H^*$ are reported in the columns to the right:

    • MRE Size refers to the number of individual pieces of evidence that are part of $H^*$, which is 2 for $B$ and $C$.
    • Generalized Bayes Factor (GBF): Given that our network is a causal model, we can use the likelihood ratio to interpret GBF. This means that the likelihood of $B=\mathit{defective}$ and $C=\mathit{defective}$ being the cause of $E$ is 42 times greater than the likelihood of $B=\mathit{OK}$ and $C=\mathit{OK}$ being the cause of $E$, i.e., 48.4% versus 1.1%.
    • Likelihood $P(E \mid H)$
    • Posterior Odds $O(H \mid E)$
    • Posterior Probability $P(H \mid E)$
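
For reference, the reported measures are related as follows. The definition of the Generalized Bayes Factor is the one given in Yuan et al. (2011); the symbol $\bar{H}$ for the set of all alternatives to $H$ and the prior odds $O(H)$ are notation we introduce here, and we make no claim about how the report rounds or displays these quantities.

```latex
\mathrm{GBF}(H; E) = \frac{P(E \mid H)}{P(E \mid \bar{H})}

O(H \mid E) = \frac{P(H \mid E)}{P(\bar{H} \mid E)}
            = \frac{P(E \mid H)\, P(H)}{P(E \mid \bar{H})\, P(\bar{H})}
            = \mathrm{GBF}(H; E) \cdot O(H),
\qquad O(H) = \frac{P(H)}{P(\bar{H})}

P(H \mid E) = \frac{O(H \mid E)}{1 + O(H \mid E)}
```

In words, the GBF is the factor by which observing $E$ multiplies the prior odds of $H$, which is why a large GBF marks an explanation as highly relevant to the evidence even when its prior probability is small.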

Filtering Solutions

  • In this example, the number of solutions is manageable. In more complex situations, however, the search algorithm could potentially find thousands of solutions.
  • To limit the size of the report, you can set the Filtering Power before producing the report.

Filtering Power=0 (No Filtering)

[Figure: Most Relevant Explanations report with Filtering Power=0]

Filtering Power=1 (Strongly Dominated Solutions are Filtered)

[Figure: Most Relevant Explanations report with Filtering Power=1]

Filtering Power=2 (Strongly and Weakly Dominated Solutions are Filtered)

[Figure: Most Relevant Explanations report with Filtering Power=2]
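
The three filtering levels can be sketched with the dominance relations described in Yuan et al. (2011), as we understand them: an explanation is strongly dominated if one of its proper subsets has a GBF at least as high, and weakly dominated if one of its proper supersets has a strictly higher GBF. Whether BayesiaLab implements the filter exactly this way is an assumption. The function `filter_solutions` is hypothetical, and the GBF values below are made up for illustration rather than taken from the report.

```python
def filter_solutions(solutions, power):
    """Drop dominated explanations from a dict of candidate solutions.

    solutions: dict mapping a frozenset of "Node=state" strings to its GBF.
    power: 0 keeps everything, 1 drops strongly dominated solutions,
           2 also drops weakly dominated solutions.
    """
    def strongly_dominated(h):
        # A proper subset of h has a GBF at least as high.
        return any(o < h and g >= solutions[h] for o, g in solutions.items())

    def weakly_dominated(h):
        # A proper superset of h has a strictly higher GBF.
        return any(o > h and g > solutions[h] for o, g in solutions.items())

    return {
        h: gbf
        for h, gbf in solutions.items()
        if not (power >= 1 and strongly_dominated(h))
        and not (power >= 2 and weakly_dominated(h))
    }


# Made-up GBF values, for illustration only (not taken from the report).
solutions = {
    frozenset({"B=defective", "C=defective"}): 40.0,
    frozenset({"A=defective"}): 35.0,
    frozenset({"B=defective", "C=defective", "D=defective"}): 30.0,
    frozenset({"B=defective"}): 5.0,
}

for power in (0, 1, 2):
    kept = filter_solutions(solutions, power)
    print(power, sorted(sorted(h) for h in kept))
```

With these made-up values, Filtering Power 1 drops the three-node explanation because its subset of $B$ and $C$ scores higher, and Filtering Power 2 additionally drops $B$ alone because adding $C$ improves the GBF.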

Workflow Animation