Information Gain and Evidence Analysis

Context

Log-Loss

The Log-Loss $LL_B(E)$ is the number of bits required to encode an n-dimensional piece of evidence (or observation) $E$ given the current Bayesian network $B$. As shorthand for "the number of bits required to encode," we use the term "cost": the more bits an encoding requires, the more "expensive" it is.

$LL_B(E) = -\log_2 P_B(E)$,

where $P_B(E)$ is the joint probability of the evidence $E$ computed by the network $B$:

$P_B(E) = P_B(e_1, \ldots, e_n)$

In other words, the lower the probability of $E$ given the network $B$, the higher the Log-Loss $LL_B(E)$.

Note that $E$ refers to a single piece of n-dimensional evidence, and not an entire dataset.
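A minimal Python sketch of the Log-Loss formula, assuming a hypothetical two-variable joint probability table in place of a real network $B$ (the table values and the `log_loss` name are illustrative only). It shows that less probable evidence costs more bits.

```python
import math

# Hypothetical joint distribution P_B over two binary variables (A, B),
# standing in for a Bayesian network; values chosen for illustration only.
JOINT = {
    ("a1", "b1"): 0.4,
    ("a1", "b0"): 0.1,
    ("a0", "b1"): 0.1,
    ("a0", "b0"): 0.4,
}


def log_loss(p: float) -> float:
    """Bits needed to encode evidence of probability p: -log2(p)."""
    if p <= 0.0:
        raise ValueError("Log-Loss is infinite for zero-probability evidence")
    return -math.log2(p)


# A single piece of evidence E = (A=a1, B=b1) versus a less probable one.
print(round(log_loss(JOINT[("a1", "b1")]), 3))  # 1.322
print(round(log_loss(JOINT[("a1", "b0")]), 3))  # 3.322
```

Lowering $P_B(E)$ from 0.4 to 0.1 raises the encoding cost from about 1.3 to about 3.3 bits.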

Information Gain

The Information Gain regarding evidence $E$ is the difference between two Log-Losses:

- the Log-Loss $LL_U(E)$ given an unconnected network $U$, i.e., a so-called "straw model," in which all nodes are marginally independent;
- the Log-Loss $LL_B(E)$ given the current network $B$.

$IG_B(E) = LL_U(E) - LL_B(E)$

As a result, a positive value of Information Gain reflects a "cost saving" in encoding the evidence $E$ by virtue of having the network $B$. In other words, encoding $E$ with network $B$ is less "costly" than encoding it without $B$, i.e., with the straw model $U$.
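The difference of Log-Losses described above can be sketched in Python for a toy model. This assumes that the straw model $U$ uses the same marginal distributions as $B$, so that $LL_U(E) = -\sum_i \log_2 P_B(e_i)$; the joint table and the function names are hypothetical, and the evidence here is a full assignment of both variables.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution P_B over variables (A, B), illustration only.
JOINT = {
    ("a1", "b1"): 0.4,
    ("a1", "b0"): 0.1,
    ("a0", "b1"): 0.1,
    ("a0", "b0"): 0.4,
}


def marginals(joint):
    """Marginal distribution of each variable, indexed by its tuple position."""
    n = len(next(iter(joint)))
    result = [defaultdict(float) for _ in range(n)]
    for states, p in joint.items():
        for i, s in enumerate(states):
            result[i][s] += p
    return result


def information_gain(joint, evidence):
    """IG_B(E) = LL_U(E) - LL_B(E) for a full assignment `evidence`."""
    ll_b = -math.log2(joint[evidence])
    # Assumed straw model U: independent variables with the marginals of B.
    margs = marginals(joint)
    ll_u = -sum(math.log2(margs[i][s]) for i, s in enumerate(evidence))
    return ll_u - ll_b


print(round(information_gain(JOINT, ("a1", "b1")), 3))  # 0.678 (consistent)
print(round(information_gain(JOINT, ("a1", "b0")), 3))  # -1.322 (conflict)
```

Both pieces of evidence cost 2 bits under $U$, but $B$ encodes the first more cheaply and the second more expensively, which yields a positive and a negative Information Gain, respectively.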

In previous versions of BayesiaLab, Information Gain has also been referred to as "Overall Conflict in Evidence" or "Global Contradiction Measure."

Usage

- The Information Gain function is available if at least one node is observed, i.e., at least one piece of evidence is set.
- Hard Evidence, Soft Evidence (i.e., probability distributions), or a combination of both can serve as the basis for this analysis.
- You can restrict the analysis to a subset of nodes by selecting them.
- Select Menu > Analysis > Report > Evidence > Information Gain.
- Alternatively, activate this function with the shortcut Shift+R.
- Upon activation, the Information Gain function produces a report in a new window.

Note: This function was previously named Evidence Analysis Report.

Information Gain and Evidence Analysis Report

To illustrate Information Gain and related concepts, we use the following network $B$, which represents wind speed, sunshine, time, and temperature (measured in °C) in El Paso, Texas, along with the city's energy demand.

You can download this Bayesian network here:

ElPaso.xbl

Let's now set evidence $E$ that represents high noon in Texas in July, i.e., Month=7, Hour=12, and Temperature<=35, and run the Information Gain analysis:

Analysis Context Table

The Analysis Context panel shows the evidence $E$ we just set, along with its Joint Probability $P_B(E)$.

Information Gain

The next box shows the value of the Information Gain regarding evidence $E$ given the network $B$.

The positive value of $IG_B(E)$ tells us that network $B$ reduces the encoding cost of evidence $E$. In other words, $B$ and $E$ are "consistent."

A negative value of $IG_B(E)$ would suggest that $B$ and $E$ are "not consistent" or "in conflict."

Evidence Analysis Table

The Evidence Analysis table features columns for Node, State, Local Information Gain, and Hypothetical Information Gain.

Local Information Gain

Local Information Gain reveals how much "information would be gained" by adding one piece of hypothetical evidence $h$ to the existing set of evidence $E$.

The following Evidence Analysis table lists all states that could hypothetically be added as evidence $h$.

Each row in the Evidence Analysis table shows one piece of hypothetical evidence $h$ along with the corresponding Local Information Gain:

- Positive values suggest that the hypothetical evidence is "in line" or "consistent" with the existing evidence $E$, given the network $B$.
- For each node, the state representing the hypothetical evidence with the maximum Local Information Gain is highlighted in yellow.
- For instance, the report says that Shortwave Radiation (W/m2)>800 would provide a Local Information Gain of 3.139. It is not surprising that intense sunshine coincides with great heat at 12 noon on a July day.
- Negative values of Local Information Gain can be interpreted to mean that such hypothetical evidence would be inconsistent or in conflict with the existing evidence $E$.
- According to our network $B$, for example, a hypothetical observation of Shortwave Radiation (W/m2)<=0, i.e., no sunlight at all (top row in the Evidence Analysis table), would be extremely inconsistent with a hot summer day in Texas at 12 noon.

Previously, this measure was also known as Local Consistency.
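The exact formula behind Local Information Gain is not spelled out on this page. A minimal sketch, assuming it is the change in Information Gain when $h$ is added to $E$, which works out to $\log_2(P_B(h \mid E)/P_B(h))$ if the straw model uses the marginals of $B$. The three-variable network, its probabilities, and all function names below are hypothetical and unrelated to the El Paso model.

```python
import math
from itertools import product

# Hypothetical network Noon -> Sunny -> Hot, all binary ("yes"/"no").
VARS = ("Hot", "Noon", "Sunny")
STATES = ("yes", "no")
P_NOON = {"yes": 0.5, "no": 0.5}
P_SUNNY_GIVEN_NOON = {"yes": {"yes": 0.8, "no": 0.2}, "no": {"yes": 0.3, "no": 0.7}}
P_HOT_GIVEN_SUNNY = {"yes": {"yes": 0.7, "no": 0.3}, "no": {"yes": 0.2, "no": 0.8}}


def joint(row):
    """P_B of one full assignment, via the chain-rule factorization."""
    return (P_NOON[row["Noon"]]
            * P_SUNNY_GIVEN_NOON[row["Noon"]][row["Sunny"]]
            * P_HOT_GIVEN_SUNNY[row["Sunny"]][row["Hot"]])


def prob(evidence):
    """P_B of a partial assignment {variable: state}, summing out the rest."""
    total = 0.0
    for states in product(STATES, repeat=len(VARS)):
        row = dict(zip(VARS, states))
        if all(row[v] == s for v, s in evidence.items()):
            total += joint(row)
    return total


def local_information_gain(evidence, var, state):
    """Assumed form: LIG_B(E, h) = log2(P_B(h | E) / P_B(h)), h = (var = state)."""
    p_h_given_e = prob({**evidence, var: state}) / prob(evidence)
    return math.log2(p_h_given_e / prob({var: state}))


# Existing evidence E; every state of "Sunny" is tried as hypothetical evidence h.
E = {"Hot": "yes", "Noon": "yes"}
lig = {s: local_information_gain(E, "Sunny", s) for s in STATES}
best = max(lig, key=lig.get)  # counterpart of the state highlighted in yellow
print({s: round(v, 3) for s, v in lig.items()}, best)  # {'yes': 0.763, 'no': -2.755} yes
```

In this toy model, sunshine is consistent with heat at noon (positive value), while its absence conflicts with it (negative value), qualitatively mirroring the Shortwave Radiation rows of the report.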

Hypothetical Information Gain (Bayes Factor)

The Hypothetical Information Gain is commonly known as the Bayes Factor. This measure sums the Information Gain of the existing evidence $E$ and the Local Information Gain of the hypothetical evidence $h$. As a result, it provides an overall assessment of the "agreement" of all existing evidence plus the hypothetical evidence:

$HIG_B(E,h) = IG_B(E) + LIG_B(E,h)$
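As a worked illustration, plugging in the values reported in this example, $IG_B(E) = 1.973$ and a Local Information Gain of 3.139 for $h$ = Shortwave Radiation (W/m2)>800, gives approximately (inputs rounded to three decimals):

$$HIG_B(E,h) = IG_B(E) + LIG_B(E,h) \approx 1.973 + 3.139 = 5.112$$

Under two assumptions not stated on this page, namely that the straw model uses the marginals $P_B(e_i)$ of $B$ and that $LIG_B(E,h) = \log_2 \left( P_B(h \mid E) / P_B(h) \right)$, the sum collapses to the Information Gain of the enlarged evidence set:

$$
\begin{aligned}
HIG_B(E,h) &= \log_2 \frac{P_B(E)}{\prod_{i=1}^{n} P_B(e_i)} + \log_2 \frac{P_B(h \mid E)}{P_B(h)} \\
&= \log_2 \frac{P_B(E, h)}{P_B(h) \prod_{i=1}^{n} P_B(e_i)} = IG_B(E \cup \{h\})
\end{aligned}
$$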

Here is the complete accounting of values:

Information Gain:

$IG_B(E) = 1.973$

Local Information Gain:

Hypothetical Information Gain:

Reference Evidence

If the Bayesian network includes a Target Node and evidence is set on it, this evidence serves as Reference Evidence. In that case, the report contains a special section titled Evidence Analysis with respect to <Name of Target Node>.

To illustrate this situation, we make the node Energy Demand (MWh) the Target Node in the network of this example.

Furthermore, we set new evidence $E$, consisting of the following individual observations:

- Temperature<=40 (°C)
- Month=7 (i.e., July)
- Hour=20 (i.e., 8 p.m.)
- Energy Demand (MWh)>1750, which is the highest-valued state of that node.

Then, we run the analysis again: Menu > Analysis > Report > Evidence > Information Gain.

All of the above pieces of evidence in $E$ are analyzed in terms of Information Gain, just as before.

Additionally, the other pieces of evidence in $E$ are assessed with regard to the Reference Evidence, Energy Demand (MWh)>1750, i.e., whether they confirm or contradict it.

The report now features an additional panel regarding the Reference Evidence:

This panel reports the nodes and states that confirm and/or contradict the Reference Evidence:

Evidence that confirms the Reference Evidence:
- Temperature<=40
- Month=7
Evidence that contradicts the Reference Evidence:
- Hour=20

Evidence can also be neutral with regard to the Reference Evidence; such evidence would be listed in this panel as well, but none of the evidence in this example is neutral.

Here is a possible interpretation of the Evidence Analysis regarding the Reference Evidence:

- The observation of summer heat (Month=7, Temperature<=40) is consistent with maximum energy demand (Energy Demand (MWh)>1750), e.g., due to the extensive use of air conditioning.
- However, the late hour of 8 p.m. (Hour=20) is past the typical time of peak demand. As a result, it contradicts the Reference Evidence on the basis of our network $B$.
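This page does not spell out the criterion the report uses to decide whether a piece of evidence confirms or contradicts the Reference Evidence. Purely as an illustrative assumption, the sketch below labels each observation $e_i$ by the sign of $\log_2(P_B(r \mid e_i)/P_B(r))$, where $r$ is the Reference Evidence. The probabilities are made-up inputs, not values from the El Paso network, and `classify_against_reference` is a hypothetical name.

```python
import math


def classify_against_reference(p_ref, p_ref_given, tol=1e-9):
    """Label each piece of evidence relative to the Reference Evidence r.

    Assumed criterion (illustrative, not necessarily BayesiaLab's):
    score(e_i) = log2(P_B(r | e_i) / P_B(r)); > 0 confirms, < 0 contradicts,
    approximately 0 is neutral.
    """
    labels = {}
    for name, p in p_ref_given.items():
        score = math.log2(p / p_ref)
        if score > tol:
            verdict = "confirms"
        elif score < -tol:
            verdict = "contradicts"
        else:
            verdict = "neutral"
        labels[name] = (verdict, score)
    return labels


# Made-up inputs: P_B(r) and P_B(r | e_i) for four hypothetical observations.
labels = classify_against_reference(
    p_ref=0.2,
    p_ref_given={"e1": 0.5, "e2": 0.4, "e3": 0.1, "e4": 0.2},
)
for name, (verdict, score) in labels.items():
    print(f"{name}: {verdict} ({score:+.3f} bits)")
```

An observation that makes the Reference Evidence more probable than it is a priori counts as confirming under this assumed criterion; one that leaves it unchanged is neutral.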

Evidence Information Gain