Pearson Correlation
Context
- In BayesiaLab's approach to learning and analyzing Bayesian networks, statistical concepts play a secondary role compared to concepts from the field of Information Theory.
- Nevertheless, statistical measures, such as correlation, can provide certain insights that are unavailable from non-statistical measures.
Definition
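The Pearson Correlation Coefficient $\rho_{X,Y}$ between two nodes $X$ and $Y$ is defined as the covariance of the two corresponding variables divided by the product of their standard deviations:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y}$$
Where the covariance is defined by:
$$\operatorname{cov}(X,Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y)\,\bigl(v_x - E[X]\bigr)\bigl(v_y - E[Y]\bigr)$$
And the standard deviation:
$$\sigma_X = \sqrt{\sum_{x \in X} P(x)\,\bigl(v_x - E[X]\bigr)^2}$$
- $v_x$ is the value that is associated with the state $x$.
- $E[X] = \sum_{x \in X} P(x)\, v_x$ is the Expected Value of the node $X$.
- $P(x)$ is the marginal probability of state $x$ returned by the Bayesian network.
- $P(x,y)$ is the joint probability of states $x$ and $y$ returned by the Bayesian network.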
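The definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not BayesiaLab's implementation: the function name `pearson_from_joint` and its parameters are hypothetical, and the joint probability table is assumed to be given as a nested list whose rows correspond to the states of the first node and whose columns correspond to the states of the second.

```python
import math

def pearson_from_joint(joint, values_x, values_y):
    """Pearson correlation of two discrete nodes (hypothetical helper).

    joint[i][j] is the joint probability of state i of the first node
    and state j of the second; values_x / values_y hold the numeric
    value associated with each state.
    """
    # Marginal probabilities: row sums and column sums of the joint table
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    # Expected value of each node
    e_x = sum(p * v for p, v in zip(p_x, values_x))
    e_y = sum(p * v for p, v in zip(p_y, values_y))

    # Covariance: probability-weighted product of deviations
    cov = sum(
        joint[i][j] * (values_x[i] - e_x) * (values_y[j] - e_y)
        for i in range(len(values_x))
        for j in range(len(values_y))
    )

    # Standard deviations from the marginals
    sd_x = math.sqrt(sum(p * (v - e_x) ** 2 for p, v in zip(p_x, values_x)))
    sd_y = math.sqrt(sum(p * (v - e_y) ** 2 for p, v in zip(p_y, values_y)))

    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a node has zero variance")
    return cov / (sd_x * sd_y)

# Two binary nodes with state values {0, 1} that tend to agree
joint = [[0.4, 0.1],
         [0.1, 0.4]]
r = pearson_from_joint(joint, [0, 1], [0, 1])
```

In this example the covariance is 0.15 and both standard deviations are 0.5, so `r` is 0.6 up to floating-point rounding.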
Special Considerations
- For calculating the Pearson Correlation, BayesiaLab must use the values of node states.
- In BayesiaLab, there are Discrete Nodes and Continuous Nodes with discretized numerical states. As a result, the value of a node's state may not always be apparent:
  - For Discrete Nodes that have states with integer or real values, BayesiaLab uses these numerical values directly.
  - For Discrete Nodes that have states without values, e.g., {red, green, blue}, BayesiaLab uses the indices of the states as values, i.e., {red, green, blue} would have the values {0, 1, 2} for the purpose of calculating the Pearson Correlation. Note that the index of states starts at 0.
  - For Continuous Nodes, BayesiaLab uses the mean value of each interval.
- Please see Mean, Value, and Standard Deviations for a detailed discussion.
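The three cases above can be mimicked with a small helper. This is a rough sketch under stated assumptions, not BayesiaLab's API: the function `state_values` is hypothetical, discretized intervals are assumed to be given as `(lower, upper)` tuples, and the interval midpoint is used purely as a stand-in for the interval's mean value.

```python
from numbers import Real

def state_values(states):
    """Assign a numeric value to each state of a node (hypothetical helper)."""
    # Discretized continuous node: each state is a (lower, upper) interval.
    # ASSUMPTION: the midpoint stands in for the mean value of the interval.
    if all(isinstance(s, tuple) and len(s) == 2 for s in states):
        return [(lower + upper) / 2 for lower, upper in states]

    # Discrete node whose states are already integer or real values
    if all(isinstance(s, Real) and not isinstance(s, bool) for s in states):
        return [float(s) for s in states]

    # Discrete node with symbolic states: use the zero-based state index
    return list(range(len(states)))

symbolic = state_values(["red", "green", "blue"])
numeric = state_values([1, 2.5, 10])
intervals = state_values([(0, 10), (10, 30)])
```

Because index values depend on the order of the states, reordering symbolic states such as {red, green, blue} changes the values assigned to them and therefore the resulting correlation.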