Pearson Correlation
- In BayesiaLab's approach to learning and analyzing Bayesian networks, statistical concepts play a secondary role compared to concepts from the field of Information Theory.
- Nevertheless, statistical measures, such as correlation, can provide certain insights that are unavailable from non-statistical measures.
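The Pearson Correlation Coefficient $R_{XY}$ between two nodes $X$ and $Y$ is defined as the covariance of the two corresponding variables divided by the product of their standard deviations:
$$R_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \, \sigma_Y}$$
where the covariance is defined by:
$$\operatorname{Cov}(X,Y) = \sum_{x \in X} \sum_{y \in Y} (v_x - \mu_X)(v_y - \mu_Y) \, p(x,y)$$
and the standard deviation by:
$$\sigma_X = \sqrt{\sum_{x \in X} (v_x - \mu_X)^2 \, p(x)}$$
with:
- $v_x$ is the value associated with state $x$.
- $p(x)$ is the marginal probability of state $x$ returned by the Bayesian network.
- $p(x,y)$ is the joint probability of states $x$ and $y$ returned by the Bayesian network.
- $\mu_X = \sum_{x \in X} v_x \, p(x)$ is the mean of $X$ (and likewise $\mu_Y$, $\sigma_Y$, $v_y$ for $Y$).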
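The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not BayesiaLab's implementation: the function name `pearson_from_joint` is hypothetical, and the joint table $p(x,y)$ is assumed to be supplied by the caller (in BayesiaLab it would come from inference in the Bayesian network). The marginals $p(x)$ and $p(y)$ are recovered by summing the joint table.

```python
import math


def pearson_from_joint(values_x, values_y, joint):
    """Pearson correlation R_XY from a joint distribution p(x, y).

    Hypothetical helper for illustration only.
    values_x[i] -- value v_x assigned to the i-th state of X
    values_y[j] -- value v_y assigned to the j-th state of Y
    joint[i][j] -- joint probability p(x_i, y_j); entries are assumed to sum to 1
    """
    nx, ny = len(values_x), len(values_y)

    # Marginals p(x) and p(y), obtained by summing rows / columns of the joint table
    p_x = [sum(joint[i][j] for j in range(ny)) for i in range(nx)]
    p_y = [sum(joint[i][j] for i in range(nx)) for j in range(ny)]

    # Means mu_X and mu_Y
    mu_x = sum(v * p for v, p in zip(values_x, p_x))
    mu_y = sum(v * p for v, p in zip(values_y, p_y))

    # Covariance: double sum over all pairs of states
    cov = sum(
        (values_x[i] - mu_x) * (values_y[j] - mu_y) * joint[i][j]
        for i in range(nx)
        for j in range(ny)
    )

    # Standard deviations computed from the marginals
    sigma_x = math.sqrt(sum((v - mu_x) ** 2 * p for v, p in zip(values_x, p_x)))
    sigma_y = math.sqrt(sum((v - mu_y) ** 2 * p for v, p in zip(values_y, p_y)))

    # Undefined when either node has zero variance (all mass on one value)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a zero-variance node")

    return cov / (sigma_x * sigma_y)


# Two binary nodes with values {0, 1} that tend to agree
r = pearson_from_joint([0, 1], [0, 1], [[0.4, 0.1], [0.1, 0.4]])
print(round(r, 6))  # 0.6
```

Here $\mu_X = \mu_Y = 0.5$, $\sigma_X = \sigma_Y = 0.5$ and $\operatorname{Cov}(X,Y) = 0.15$, so $R_{XY} = 0.15 / 0.25 = 0.6$. A joint table that factorizes into its marginals (independent nodes) yields a correlation of 0.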
- To calculate the Pearson Correlation, BayesiaLab needs a numerical value for each node state.
- In BayesiaLab, there are Discrete Nodes and Continuous Nodes, the latter with discretized numerical states. As a result, the value of a node's state is not always obvious:
- For Discrete Nodes that have states with integer or real values, BayesiaLab uses these numerical values directly.
- For Discrete Nodes that have states without values, e.g., {red, green, blue}, BayesiaLab uses the indices of the states as values, i.e., {red, green, blue} would have the values {0, 1, 2} for the purpose of calculating the correlation. Note that state indices start at 0.
- For Continuous Nodes, BayesiaLab uses the mean value of each interval.
- Please see Mean, Value, and Standard Deviations for a detailed discussion.
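The value-assignment rules above can be sketched as a small Python helper. The name `state_values` and its signature are hypothetical, not part of BayesiaLab. It assumes states are given as labels; for a Continuous Node it assumes the per-interval mean values are already known and passed in by the caller, since this sketch does not perform any discretization.

```python
def state_values(states, interval_means=None):
    """Return the numerical value used for each state (hypothetical helper).

    states         -- list of state labels, in state order
    interval_means -- for a Continuous Node only: one mean value per interval,
                      assumed to be precomputed and supplied by the caller
    """
    # Continuous Node: use the given mean value of each interval
    if interval_means is not None:
        if len(interval_means) != len(states):
            raise ValueError("need exactly one mean value per interval")
        return [float(m) for m in interval_means]

    # Discrete Node with numerical states: use the numbers directly
    try:
        return [float(s) for s in states]
    except (TypeError, ValueError):
        # Discrete Node with symbolic states: use 0-based state indices
        return list(range(len(states)))


print(state_values(["red", "green", "blue"]))           # [0, 1, 2]
print(state_values(["1", "2.5", "10"]))                 # [1.0, 2.5, 10.0]
print(state_values(["low", "high"], [3.25, 7.5]))       # [3.25, 7.5]
```

In this sketch, a Discrete Node whose labels are only partly numeric falls back to indices for all states, which is one possible design choice rather than documented BayesiaLab behavior.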