Pearson Correlation

Context

In BayesiaLab's approach to learning and analyzing Bayesian networks, statistical concepts play a secondary role compared to concepts from the field of Information Theory.
Nevertheless, statistical measures, such as correlation, can provide certain insights that are unavailable from non-statistical measures.

Definition

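The Pearson Correlation Coefficient $r$ between two nodes $X$ and $Y$ is defined as the covariance of the two corresponding variables divided by the product of their standard deviations:

$$r = \frac{\mathrm{cov}(X,Y)}{\sigma_X \, \sigma_Y}$$

Where the covariance is defined by:

$$\mathrm{cov}(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \, (V_x - v_X)(V_y - v_Y)$$

And the standard deviation:

$$\sigma_X = \sqrt{\sum_{x \in X} p_x \, (V_x - v_X)^2}$$
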
- $V_x$ is the value associated with the state $x$.
- $v_X$ is the Expected Value of the node $X$, i.e., $v_X = \sum_{x} p_x V_x$.
- $p_x$ is the marginal probability of state $x$ returned by the Bayesian network.
- $p(x,y)$ is the joint probability of states $x$ and $y$ returned by the Bayesian network.
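
As an illustration of the definition above, the following Python sketch computes $r$ from a joint probability table and the values assigned to each state. It is not BayesiaLab code: the function name `pearson_from_joint` and its arguments are hypothetical, and the sketch assumes the joint probabilities have already been obtained from the network and sum to 1.

```python
import math


def pearson_from_joint(joint, x_values, y_values):
    """Pearson correlation of two discrete variables (hypothetical helper).

    joint[i][j] is the joint probability p(x_i, y_j);
    x_values[i] and y_values[j] are the values associated with each state.
    """
    # Marginal probabilities p_x and p_y, obtained by summing the joint table.
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    # Expected values of X and Y.
    mean_x = sum(p * v for p, v in zip(p_x, x_values))
    mean_y = sum(p * v for p, v in zip(p_y, y_values))

    # Covariance: probability-weighted product of deviations from the means.
    cov = sum(
        joint[i][j] * (x_values[i] - mean_x) * (y_values[j] - mean_y)
        for i in range(len(x_values))
        for j in range(len(y_values))
    )

    # Standard deviations from the marginal distributions.
    sd_x = math.sqrt(sum(p * (v - mean_x) ** 2 for p, v in zip(p_x, x_values)))
    sd_y = math.sqrt(sum(p * (v - mean_y) ** 2 for p, v in zip(p_y, y_values)))

    if sd_x == 0 or sd_y == 0:
        # The correlation is undefined when either variable is constant.
        raise ValueError("standard deviation is zero; correlation is undefined")

    return cov / (sd_x * sd_y)


# Two binary nodes whose states carry the values 0 and 1.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
r = pearson_from_joint(joint, [0, 1], [0, 1])
```

With this table, both marginals are uniform, the covariance is 0.15, and each standard deviation is 0.5, so `r` is approximately 0.6.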

Special Considerations

To calculate the Pearson Correlation $r$, BayesiaLab must use the values of node states.
In BayesiaLab, there are Discrete Nodes and Continuous Nodes with discretized numerical states. As a result, the value of a node's state may not always be apparent:
- For Discrete Nodes that have states with integer or real values, BayesiaLab uses these numerical values directly.
- For Discrete Nodes that have states without values, e.g., {red, green, blue}, BayesiaLab uses the indices of the states as values, i.e., {red, green, blue} would have the values {0, 1, 2} for the purpose of calculating the correlation. Note that the index of states starts at 0.
- For Continuous Nodes, BayesiaLab uses the mean value of each interval.
Please see Mean, Value, and Standard Deviations for a detailed discussion.
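
The three rules above can be summarized in a small Python sketch. This is only an illustration of the rules as stated here, not BayesiaLab's implementation: the function `state_values` and its `interval_means` parameter are hypothetical, and the per-interval means are assumed to be supplied by the caller.

```python
def state_values(states, interval_means=None):
    """Return the numeric value used for each state (hypothetical helper).

    states: list of state labels or numeric state values.
    interval_means: for a discretized Continuous Node, the mean value of
        each interval (assumed to be known and passed in by the caller).
    """
    # Continuous Node: use the mean value of each interval.
    if interval_means is not None:
        return [float(m) for m in interval_means]

    # Discrete Node with integer or real states: use the values directly.
    if all(isinstance(s, (int, float)) and not isinstance(s, bool) for s in states):
        return [float(s) for s in states]

    # Discrete Node without numeric values: use the state indices, starting at 0.
    return [float(i) for i in range(len(states))]


color_values = state_values(["red", "green", "blue"])
numeric_values = state_values([1, 2.5, 10])
interval_values = state_values(["<=10", ">10"], interval_means=[6.2, 14.8])
```

Note that for symbolic states the result depends on the order of the states, so reordering {red, green, blue} changes the values and therefore the correlation.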
