Pearson Correlation

Context

  • In BayesiaLab's approach to learning and analyzing Bayesian networks, statistical concepts play a secondary role compared to concepts from the field of Information Theory.
  • Nevertheless, statistical measures, such as correlation, can provide certain insights that are unavailable from non-statistical measures.

Definition

The Pearson Correlation Coefficient $R$ between two nodes $X$ and $Y$ is defined as the covariance of the two corresponding variables divided by the product of their standard deviations:
$$R = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$
Where the covariance is defined by:
$$\operatorname{cov}(X,Y) = \sum_{x,y} p(x,y) \times (V_x - v_X) \times (V_y - v_Y)$$
And the standard deviation:
$$\sigma_X = \sqrt{\sum_x p_x \times (V_x - v_X)^2}$$
  • $V_x$ is the value that is associated with the state $x$.
  • $v_X$ is the Expected Value of the node $X$.
  • $p_x$ is the marginal probability of state $x$ returned by the Bayesian network.
  • $p(x,y)$ is the joint probability of states $x$ and $y$ returned by the Bayesian network.
  • $V_y$, $v_Y$, and $\sigma_Y$ are defined analogously for the node $Y$.
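
The definition above can be sketched in a few lines of Python. This is only an illustration of the formulas, not BayesiaLab's implementation: the function name `pearson_from_joint` and its arguments are hypothetical, and the joint probability table is assumed to be given as a nested list in which `joint[i][j]` is the joint probability of the i-th state of X and the j-th state of Y.

```python
import math


def pearson_from_joint(joint, values_x, values_y):
    """Pearson correlation R from a joint probability table.

    joint[i][j] : joint probability p(x_i, y_j) (hypothetical input layout)
    values_x[i] : value V_x associated with state x_i
    values_y[j] : value V_y associated with state y_j
    """
    n_x, n_y = len(values_x), len(values_y)

    # Marginal probabilities p_x and p_y, obtained by summing the joint table.
    p_x = [sum(joint[i][j] for j in range(n_y)) for i in range(n_x)]
    p_y = [sum(joint[i][j] for i in range(n_x)) for j in range(n_y)]

    # Expected values v_X and v_Y of the two nodes.
    v_x = sum(p * v for p, v in zip(p_x, values_x))
    v_y = sum(p * v for p, v in zip(p_y, values_y))

    # Covariance: sum over all state pairs of p(x, y) * (V_x - v_X) * (V_y - v_Y).
    cov = sum(
        joint[i][j] * (values_x[i] - v_x) * (values_y[j] - v_y)
        for i in range(n_x)
        for j in range(n_y)
    )

    # Standard deviations: square root of the probability-weighted squared deviations.
    sigma_x = math.sqrt(sum(p * (v - v_x) ** 2 for p, v in zip(p_x, values_x)))
    sigma_y = math.sqrt(sum(p * (v - v_y) ** 2 for p, v in zip(p_y, values_y)))

    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("R is undefined when a node has zero standard deviation")

    return cov / (sigma_x * sigma_y)


# Two binary nodes with state values {0, 1} and a made-up joint distribution.
example_joint = [[0.4, 0.1],
                 [0.1, 0.4]]
r = pearson_from_joint(example_joint, [0, 1], [0, 1])
print(round(r, 6))  # 0.6
```

In this made-up example, both expected values are 0.5, the covariance is 0.15, and both standard deviations are 0.5, so R = 0.15 / 0.25 = 0.6.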

Special Considerations

  • For calculating the Pearson Correlation $R$, BayesiaLab must use the values of node states.
  • In BayesiaLab, there are Discrete Nodes and Continuous Nodes with discretized numerical states. As a result, the value of a node's state may not always be apparent:
    • For Discrete Nodes that have states with integer or real values, BayesiaLab uses these numerical values directly.
    • For Discrete Nodes that have states without values, e.g., {red, green, blue}, BayesiaLab uses the indices of the states as values, i.e., {red, green, blue} would have the values {0, 1, 2} for the purpose of calculating $R$. Note that the index of states starts at 0.
    • For Continuous Nodes, BayesiaLab uses the mean value of each interval.
  • Please see Mean, Value, and Standard Deviations for a detailed discussion.
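
The three rules above can be illustrated with a small Python sketch. The helper `state_values` and its parameters are hypothetical and are not part of BayesiaLab; for Continuous Nodes, the per-interval mean values are assumed to be supplied by the caller, since how they are obtained is outside the scope of this page.

```python
def state_values(states=None, interval_means=None):
    """Return the numerical values used for the states of a node.

    states         : state labels of a Discrete Node (numbers or names)
    interval_means : per-interval mean values of a Continuous Node
                     (assumed to be supplied by the caller)
    """
    # Continuous Node: use the mean value of each interval.
    if interval_means is not None:
        return [float(m) for m in interval_means]

    # Discrete Node with integer or real states: use the values directly.
    is_numeric = all(
        isinstance(s, (int, float)) and not isinstance(s, bool) for s in states
    )
    if is_numeric:
        return [float(s) for s in states]

    # Discrete Node without values: use the state indices, starting at 0.
    return [float(i) for i in range(len(states))]


print(state_values(["red", "green", "blue"]))    # [0.0, 1.0, 2.0]
print(state_values([1, 2.5, 10]))                # [1.0, 2.5, 10.0]
print(state_values(interval_means=[1.2, 4.8]))   # [1.2, 4.8]
```

One consequence of the index rule is that, for states without values, the result of the calculation depends on the order in which the states are listed, because that order determines the values that enter the formula.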