Information Gain

Definition

The Information Gain regarding evidence $E$ is the difference between the:
  • Log-Loss $LL_U(E)$, given an unconnected network $U$, i.e., a so-called straw model, in which all nodes are marginally independent;
  • Log-Loss $LL_B(E)$, given a reference network $B$.

$$IG_B(E) = \log_2\left( \frac{P(e_1, \ldots, e_n)}{\prod_{i=1}^{n} P(e_i)} \right) = LL_U(E) - LL_B(E)$$

In earlier versions of BayesiaLab, Information Gain was named Consistency.
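
The definition can be illustrated with a minimal Python sketch. It is not BayesiaLab code: the two-variable joint distribution `JOINT` and all function names are hypothetical, and the joint table simply stands in for the probabilities that a reference network would encode. The straw model is obtained by multiplying the marginal probabilities of the individual observations.

```python
import math

# Hypothetical joint distribution over two variables (Rain, WetGrass).
# It stands in for the probabilities encoded by a reference network B.
JOINT = {
    ("yes", "yes"): 0.27,
    ("yes", "no"): 0.03,
    ("no", "yes"): 0.14,
    ("no", "no"): 0.56,
}


def marginal(index, value):
    """Marginal probability P(e_i) of one variable taking one value."""
    return sum(p for states, p in JOINT.items() if states[index] == value)


def log_loss_reference(evidence):
    """LL_B(E) = -log2 P(e_1, ..., e_n) under the reference model."""
    return -math.log2(JOINT[evidence])


def log_loss_straw(evidence):
    """LL_U(E) = -sum_i log2 P(e_i) under the unconnected (straw) model."""
    return -sum(math.log2(marginal(i, v)) for i, v in enumerate(evidence))


def information_gain(evidence):
    """IG_B(E) = LL_U(E) - LL_B(E), in bits."""
    return log_loss_straw(evidence) - log_loss_reference(evidence)


if __name__ == "__main__":
    for evidence in JOINT:
        print(evidence, round(information_gain(evidence), 3))
```

With these made-up numbers, the evidence ("yes", "yes") yields a positive Information Gain, whereas ("yes", "no") yields a negative one, because wet grass almost always accompanies rain in this toy distribution.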

Interpretation

The Log-Loss reflects the "cost" in bits of applying the network $B$ to evidence $E$, i.e., the number of bits that are needed to encode evidence $E$. The lower the probability of evidence $E$, the higher the Log-Loss.
As a result, a positive value of Information Gain would reflect a "cost-saving" for encoding evidence $E$ by virtue of having network $B$. In other words, encoding $E$ with network $B$ is less "costly" than encoding it with the straw model $U$. Therefore, evidence $E$ would be consistent with network $B$.
Conversely, a negative Information Gain indicates a so-called conflict, i.e., the Log-Loss of evidence $E$ is higher with the reference network $B$ than with the straw model $U$. Note that conflicting evidence does not necessarily mean that the reference network is wrong. Rather, it probably indicates that such a set of evidence belongs to the tail of the distribution that is represented by the reference network $B$.
However, if evidence $E$ is drawn from the data on which the reference network $B$ was originally learned, the probability of observing conflicting evidence should be smaller than the probability of observing consistent evidence.
So, for a network model to be useful, there should generally be more sets of evidence with a positive Information Gain, i.e., consistent observations, than sets of evidence with a negative Information Gain, i.e., conflicting observations.
Therefore, the mean value of the Information Gain of a reference network $B$ compared to a straw model $U$ is a useful performance indicator of the reference network $B$.
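
As a rough illustration of this indicator, the following Python sketch computes the mean Information Gain over a small data set. It is a simplified stand-in rather than BayesiaLab's implementation: the data set `OBSERVATIONS` is invented, and the empirical joint frequencies are assumed to play the role of the reference network, while the product of empirical marginals plays the role of the straw model.

```python
import math
from collections import Counter

# Hypothetical data set of 100 observations of (Rain, WetGrass).
OBSERVATIONS = (
    [("yes", "yes")] * 27
    + [("yes", "no")] * 3
    + [("no", "yes")] * 14
    + [("no", "no")] * 56
)


def information_gains(observations):
    """Return the Information Gain (in bits) of each observation.

    Assumption: the empirical joint frequencies stand in for the
    reference model; the product of marginals is the straw model.
    """
    n = len(observations)
    n_vars = len(observations[0])
    joint = Counter(observations)
    marginals = [Counter(obs[i] for obs in observations) for i in range(n_vars)]

    gains = []
    for obs in observations:
        p_joint = joint[obs] / n
        p_straw = math.prod(marginals[i][v] / n for i, v in enumerate(obs))
        gains.append(math.log2(p_joint / p_straw))
    return gains


if __name__ == "__main__":
    gains = information_gains(OBSERVATIONS)
    mean_gain = sum(gains) / len(gains)
    share_consistent = sum(g > 0 for g in gains) / len(gains)
    print("mean Information Gain (bits):", round(mean_gain, 3))
    print("share of consistent observations:", share_consistent)
```

In this toy data set, most observations have a positive Information Gain and the mean is positive, which matches the expectation described above for evidence drawn from the data the model was learned on.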

Related BayesiaLab Functions

  • Information Gain and Evidence Analysis
  • Network Performance