Pearson Correlation

Context

In BayesiaLab's approach to learning and analyzing Bayesian networks, statistical concepts play a secondary role compared to concepts from the field of Information Theory (see Key Concepts).
Nevertheless, statistical measures, such as correlation, can provide certain insights that are not available from non-statistical measures.

Definition

The Pearson Correlation Coefficient $r$ between two nodes $X$ and $Y$ is defined as the covariance of the two corresponding variables divided by the product of their standard deviations:

$$r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

where the covariance is defined by:

$$\operatorname{cov}(X,Y) = \sum_{x,y} p(x,y) \times (V_x - v_X) \times (V_y - v_Y)$$

and the standard deviation:

$$\sigma_X = \sqrt{\sum_x p_x \times (V_x - v_X)^2}$$

- $V_x$ is the value associated with state $x$.
- $v_X$ is the Expected Value of node $X$.
- $p_x$ is the marginal probability of state $x$ returned by the Bayesian network.
- $p(x,y)$ is the joint probability of states $x$ and $y$ returned by the Bayesian network.
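
To make the definition concrete, the following is a minimal Python sketch that computes $r$ from a joint probability table. It is not BayesiaLab code: the function name `pearson_from_joint` and its arguments are hypothetical, and the sketch assumes the joint probabilities and state values have already been obtained from the network.

```python
import math


def pearson_from_joint(joint, x_values, y_values):
    """Pearson correlation of two discrete variables.

    joint[i][j] -- joint probability p(x_i, y_j)
    x_values[i] -- value V_x associated with state x_i
    y_values[j] -- value V_y associated with state y_j
    (All names are illustrative assumptions, not a BayesiaLab API.)
    """
    # Marginal probabilities p_x and p_y from the joint table.
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    # Expected values v_X and v_Y.
    v_x = sum(p * v for p, v in zip(p_x, x_values))
    v_y = sum(p * v for p, v in zip(p_y, y_values))

    # Covariance: sum over all state pairs (x, y).
    cov = sum(
        joint[i][j] * (x_values[i] - v_x) * (y_values[j] - v_y)
        for i in range(len(x_values))
        for j in range(len(y_values))
    )

    # Standard deviations: the square applies to each deviation term.
    sigma_x = math.sqrt(sum(p * (v - v_x) ** 2 for p, v in zip(p_x, x_values)))
    sigma_y = math.sqrt(sum(p * (v - v_y) ** 2 for p, v in zip(p_y, y_values)))

    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / (sigma_x * sigma_y)


# Two binary nodes with state values {0, 1} that tend to agree.
r = pearson_from_joint([[0.4, 0.1], [0.1, 0.4]], [0, 1], [0, 1])
```

In this example the covariance is 0.15 and both standard deviations are 0.5, so `r` evaluates to 0.6 (up to floating-point rounding).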

Special Considerations

To calculate the Pearson Correlation $r$, BayesiaLab must use the values of node states.
In BayesiaLab, there are Discrete Nodes and Continuous Nodes with discretized numerical states. As a result, the value of a node's state may not always be apparent:
- For Discrete Nodes and Continuous Nodes that have states with integer or real values, BayesiaLab uses these numerical values directly.
- For Discrete Nodes and Continuous Nodes that have states without values, e.g., {red, green, blue}, BayesiaLab uses the indices of the states as values, i.e., {red, green, blue} would have the values {0, 1, 2} for the purpose of calculating $r$. Note that the index of states starts at 0.
- For Continuous Nodes, BayesiaLab uses the mean value of each interval.
Please see Mean, Value, and Standard Deviations for a detailed discussion.
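
The value-assignment rules above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function `state_values` and its `interval_means` parameter are hypothetical names, and the order in which the rules are checked here is a simplification rather than a description of BayesiaLab's internal logic.

```python
def state_values(states, interval_means=None):
    """Return the numeric value used for each state of a node.

    states         -- list of state labels or numeric state values
    interval_means -- optional list with the mean value of each interval
                      (hypothetical parameter for discretized nodes)
    """
    # Discretized continuous node: use the mean value of each interval.
    if interval_means is not None:
        return [float(m) for m in interval_means]

    # States that carry integer or real values: use them directly.
    if all(isinstance(s, (int, float)) and not isinstance(s, bool) for s in states):
        return [float(s) for s in states]

    # States without values: fall back to the zero-based state index.
    return [float(i) for i in range(len(states))]


colors = state_values(["red", "green", "blue"])
numeric = state_values([1, 2.5, 10])
binned = state_values(["<=10", ">10"], interval_means=[6.2, 14.8])
```

Here `colors` receives the indices 0, 1, and 2, `numeric` keeps its own values, and `binned` uses the supplied interval means. The interval means in the last call are made-up numbers for illustration only.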

Usage

To display the Pearson Correlation on the arcs of the network, select Menu > Analysis > Visual > Overall > Arc > Pearson Correlation or press the G key as a shortcut.
The width of each arc in the network is now proportional to the Pearson Correlation.
An additional control panel is available in the Toolbar, which allows you to define the Pearson Correlation threshold for the arcs.
When you move the slider or type in a specific value, BayesiaLab grays out all arcs that fall below that threshold.
Alternatively, you can use the previous and next buttons to step through the specific thresholds at which arcs appear or disappear.
Furthermore, you can specify the following options in the control panel:
- The negative mode displays only those arcs that have a negative correlation whose magnitude reaches the value specified as a threshold. So, in this mode, a threshold of 0.5 means that correlations in the range of $-1 \le r \le -0.5$ will be shown.
- The absolute mode displays only those arcs that have a correlation whose absolute value reaches the value specified as a threshold. So, in this mode, a threshold of 0.5 means that correlations in the range of $-1 \le r \le -0.5$ and $0.5 \le r \le 1$ will be shown.
- The positive mode displays only those arcs that have a positive correlation that reaches the value specified as a threshold. So, in this mode, a threshold of 0.5 means that correlations in the range of $0.5 \le r \le 1$ will be shown.
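
The three filter options can be summarized as a small predicate. The sketch below is illustrative only: the function `arc_is_shown` and the mode strings `"negative"`, `"absolute"`, and `"positive"` are hypothetical names, and the comparisons are taken as inclusive to match the ranges given above.

```python
def arc_is_shown(r, threshold, mode):
    """Decide whether an arc with correlation r passes the filter.

    mode is one of "negative", "absolute", "positive"
    (illustrative labels, not BayesiaLab identifiers).
    """
    if mode == "negative":
        # Only sufficiently strong negative correlations.
        return r <= -threshold
    if mode == "absolute":
        # Strong correlations of either sign.
        return abs(r) >= threshold
    if mode == "positive":
        # Only sufficiently strong positive correlations.
        return r >= threshold
    raise ValueError(f"unknown mode: {mode!r}")


# With a threshold of 0.5, an arc with r = -0.7 passes the negative
# and absolute filters but not the positive one.
shown = {m: arc_is_shown(-0.7, 0.5, m) for m in ("negative", "absolute", "positive")}
```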
Click the Arc Comment icon on the Toolbar to display the Pearson Correlation values as a comment label on each arc. Alternatively, you can select Menu > View > Show Arc Comments.
Positive and negative correlations are marked blue and red respectively. This color assignment reflects the convention followed in BayesiaLab.
By clicking the checkmark icon for validation, you can save all computed Pearson Correlation values as Arc Comments so that they will be retained even after this analysis concludes. This validation also saves the widths of the arcs as a graphical property.

⚠️ Note that once values have been saved as Arc Comments, they are merely static text labels, which will not be updated if the network changes.

Clicking the cancel icon concludes the analysis without saving any information from the analysis.
