Compression
Context
The fit of a model to a dataset and the efficiency of encoding the dataset with a model are closely related concepts.
In this context, the "compression" achieved with a model can be used as a performance measure.
Under Information Gain and Evidence Analysis, we discussed the Information Gain of a network with regard to a single set of evidence $E$:

The Information Gain regarding evidence $E$ is the difference between the:
- Log-Loss $LL_U(E)$, given an unconnected network $U$, i.e., a so-called straw model, in which all nodes are marginally independent;
- Log-Loss $LL_B(E)$, given the current network $B$:

$$IG(E) = LL_U(E) - LL_B(E)$$

As a result, a positive value of Information Gain reflects a "cost-saving" for encoding the evidence $E$ by virtue of having the network $B$. In other words, encoding $E$ with network $B$ is less "costly" than encoding it without the network.
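To make the "cost" interpretation concrete, here is a minimal sketch, with hypothetical probabilities, of how Log-Loss and Information Gain follow from the probability a model assigns to an evidence set; base-2 logarithms (bits) are assumed:

```python
import math

def log_loss(p_evidence: float) -> float:
    """Log-Loss of an evidence set E: -log2 P(E), i.e., its encoding cost in bits."""
    return -math.log2(p_evidence)

# Hypothetical probabilities of one evidence set E (not taken from BayesiaLab):
p_current = 8.8e-5  # P(E) under the current network B
p_straw = 2.3e-7    # P(E) under the unconnected "straw" model U

ll_current = log_loss(p_current)   # Log-Loss given the current network
ll_straw = log_loss(p_straw)       # Log-Loss given the straw model
info_gain = ll_straw - ll_current  # positive: B "saves" bits when encoding E

print(f"LL_B = {ll_current:.2f} bits, LL_U = {ll_straw:.2f} bits, "
      f"IG = {info_gain:.2f} bits")
```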
In Validation Mode (F5), select Menu > Analysis > Network Performance > Compression.
A new report window opens, featuring a chart plus a range of metrics.
The report window contains two histograms of the Log-Loss values computed from all observations in the dataset given:
- the "current model", i.e., the to-be-evaluated Bayesian network (blue bars).
- the "straw model", i.e., the unconnected network (red bars).
Furthermore, the report window includes several related measures, which aggregate the per-observation values as illustrated in the sketch after this list:
- Entropy, based on the current model.
- Entropy, based on the "straw model."
- Mean Information Gain, i.e., the arithmetic mean of the Information Gain of each observation/evidence in the dataset.
- Mean Compression, i.e., the arithmetic mean of the Compression of each observation/evidence in the dataset.
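A minimal sketch of how these four measures aggregate the per-observation values, assuming the two Log-Loss columns of the report are available as plain Python lists (this is not BayesiaLab's implementation):

```python
def report_metrics(ll_current: list[float], ll_straw: list[float]) -> dict[str, float]:
    """Aggregate per-observation Log-Loss values into the report's summary measures."""
    n = len(ll_current)
    gains = [u - c for c, u in zip(ll_current, ll_straw)]    # Information Gain per row
    compressions = [g / u for g, u in zip(gains, ll_straw)]  # Compression per row
    return {
        "Entropy (current model)": sum(ll_current) / n,
        "Entropy (straw model)": sum(ll_straw) / n,
        "Mean Information Gain": sum(gains) / n,
        "Mean Compression": sum(compressions) / n,
    }
```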
Compression
Compression is a concept that first appears in this context. Its definition is:

$$C(E) = \frac{IG(E)}{LL_U(E)} = \frac{LL_U(E) - LL_B(E)}{LL_U(E)}$$

So, by dividing the Information Gain by the Log-Loss $LL_U(E)$ of the unconnected network, we obtain the Compression measure.
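A one-line check of this ratio against the first row of the table below (a sketch; 13.42 and 22.06 are that row's two Log-Loss values):

```python
def compression(ll_current: float, ll_straw: float) -> float:
    """Compression: Information Gain divided by the straw model's Log-Loss."""
    return (ll_straw - ll_current) / ll_straw

# First table row: Log-Loss 13.42 (Bayesian network) vs. 22.06 (unconnected network)
assert round(compression(13.42, 22.06), 2) == 0.39  # reported as 39%
```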
The following table illustrates the calculation of all measures.
We use the same data and network as in the example in Overall Network Performance.
The first six columns contain the evidence $E$ from the dataset; the last four columns contain the computed measures.

| Month | Hour | Temperature | Shortwave Radiation (W/m²) | Wind Speed (m/s) | Energy Demand (MWh) | Log-Loss (Bayesian Network) | Log-Loss (Unconnected Network) | Information Gain | Compression |
|---|---|---|---|---|---|---|---|---|---|
| 8 | 18 | 36.57 | 213.6 | 2 | 1574 | 13.42 | 22.06 | 8.63 | 39% |
| 8 | 19 | 36.04 | 105.91 | 1.9 | 1574 | 13.55 | 21.68 | 8.13 | 38% |
| 8 | 20 | 34.71 | 42.72 | 2.14 | 1485 | 11.93 | 19.4 | 7.47 | 39% |
| 8 | 21 | 33.94 | 0 | 2.75 | 1470 | 11.92 | 17.73 | 5.81 | 33% |
| 8 | 22 | 33.19 | 0 | 3.55 | 1378 | 11.81 | 17.73 | 5.92 | 33% |
| 8 | 23 | 32.38 | 0 | 4.21 | 1249 | 13.69 | 16.93 | 3.23 | 19% |
| 8 | 0 | 31.56 | 0 | 4.5 | 1110 | 12.91 | 16.93 | 4.02 | 24% |
| 8 | 1 | 30.6 | 0 | 4.8 | 1031 | 13.21 | 16.93 | 3.71 | 22% |
| 8 | 2 | 29.66 | 0 | 4.9 | 975 | 11.16 | 14.7 | 3.54 | 24% |
| 8 | 3 | 29.02 | 0 | 4.6 | 944 | 10.85 | 14.7 | 3.85 | 26% |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |

The summary statistics of the four measure columns follow; the means of the two Log-Loss columns are the Entropy values of the current model and of the "straw model," respectively:

| Statistic | Log-Loss (Bayesian Network) | Log-Loss (Unconnected Network) | Information Gain | Compression |
|---|---|---|---|---|
| Mean | 13.17 (Entropy) | 17.46 (Entropy) | 4.29 (Mean Information Gain) | 24% (Mean Compression) |
| Std. Dev. | 2.08 | 2.17 | 2.33 | |
| Minimum | 9.75 | 14.37 | -12.5 | |
| Maximum | 31.78 | 31.06 | 16.3 | |
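A quick arithmetic check on the summary rows, plus a note on Mean Compression (plain Python; the figures are copied from the table above):

```python
# The mean Information Gain must equal the difference of the two Entropies,
# because each row's gain is the difference of its two Log-Loss values:
entropy_current, entropy_straw = 13.17, 17.46
assert abs((entropy_straw - entropy_current) - 4.29) < 0.01

# Mean Compression (24%) is the mean of the per-row ratios; note it differs
# from the ratio of the means, 4.29 / 17.46, which is roughly 25%.
print(4.29 / 17.46)  # 0.2457...
```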
Updated Terminology
Please note the updated terminology when referring to earlier versions of BayesiaLab.
| Deprecated | Current |
|---|---|
| Consistency | Information Gain |
| Consistency Gain | Mean Compression |