Validation Measure

Context

The Validation Measure can be plotted in the Curve window as part of the Structural Coefficient Analysis.
Formally, the Validation Measure VM is defined as
$VM = \mathop{\rm E}\nolimits (LL_{\text{Test}}) \times \max \left( 1, \frac{\mathop{\rm var} (LL_{\text{Test}})}{\mathop{\rm var} (LL_{\text{Learning}})} \right)$
where LL stands for Log-Loss.
As with all measures provided by the Structural Coefficient Analysis, the Validation Measure supports you in choosing an appropriate value for the Structural Coefficient.
The Validation Measure is based on the Log-Loss statistics of the Learning Set and the Test Set. Hence, the associated dataset must already be split into a Learning Set and a Test Set.
The Confidence Analysis Report also reports the Validation Measure corresponding to each Comparison Structure.
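The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software's implementation: the function name `validation_measure` is hypothetical, the inputs are assumed to be per-record Log-Loss values that have already been computed for each set, and the population variance is used because the definition does not specify which variance estimator applies.

```python
from statistics import fmean, pvariance


def validation_measure(ll_test, ll_learning):
    """Return E(LL_Test) * max(1, var(LL_Test) / var(LL_Learning)).

    Both arguments are sequences of per-record Log-Loss values.
    Assumption: population variance (pvariance); the actual estimator
    used by the software is not stated in the definition.
    """
    var_learning = pvariance(ll_learning)
    if var_learning == 0:
        raise ValueError("Learning Set Log-Loss variance is zero; the ratio is undefined")
    ratio = pvariance(ll_test) / var_learning
    # The penalty factor never drops below 1, so VM >= E(LL_Test).
    return fmean(ll_test) * max(1.0, ratio)


# Made-up per-record Log-Loss values, for illustration only.
ll_learning = [9.0, 10.0, 11.0, 10.0]
ll_test = [8.0, 10.0, 12.0, 10.0]

vm = validation_measure(ll_test, ll_learning)
print(vm)
```

In this made-up case, the Test Set variance is four times the Learning Set variance, so the mean Test Set Log-Loss is multiplied by 4.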

To illustrate the use of the Validation Measure, we use a sample network that represents the joint distribution of symptoms related to COVID-19:

The dataset associated with this model is split into a Learning Set and a Test Set, as indicated by the symbol tagged onto the database icon in the lower right-hand corner of the Graph Window.
With this network, we now perform a Structural Coefficient Analysis: Main Menu > Tools > Multi-Run > Structural Coefficient Analysis.
We follow the overall workflow introduced in Structural Coefficient Analysis.
Given that the Validation Measure is particularly relevant in the context of Unsupervised Learning, we use EQ as the Learning Algorithm and set a Structural Coefficient range of 0.5 to 2.
Upon clicking the Curve button at the bottom of the report, we obtain the following plot.
In the screenshot below, we have 16 x-y pairs (corresponding to 16 iterations) shown on the plot:
The x-axis represents the Structural Coefficient values.
The y-axis shows the Validation Measure computed for each network learned with the corresponding value of the Structural Coefficient.
Note that the y-values are normalized to a 0 to 1 range, i.e., the smallest computed Validation Measure is displayed as 0 and the largest value as 1.
Hover over any point on the plot to display a tooltip showing the normalized value, followed by the unnormalized value in parentheses.
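The normalization described above can be sketched as a min-max rescaling. This assumes that values between the minimum and the maximum are scaled linearly, which the text does not state explicitly. The function name `normalize_min_max` and the sample values are hypothetical.

```python
def normalize_min_max(values):
    """Rescale values so the smallest maps to 0 and the largest to 1.

    Assumption: linear scaling between the two extremes.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up unnormalized Validation Measure values, for illustration only.
raw_vm = [31.2, 30.1, 29.8, 30.4, 32.0]
normalized = normalize_min_max(raw_vm)

# Mimic the tooltip layout: normalized value, then unnormalized in parentheses.
for n, r in zip(normalized, raw_vm):
    print(f"{n:.3f} ({r})")
```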

The Structural Coefficient values that are associated with the minimum values of the Validation Measure are considered ideal.
The U-shaped portion at the bottom of the plotted curve (also referred to as "tub" or "trough") represents the range of minimum values: $0.6 < SC < 0.9$.
With values of the Structural Coefficient within this approximate range, a model would be neither "overfitted" nor "underfitted."
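Reading the trough off the curve can also be approximated programmatically. The sketch below is one possible heuristic, not a procedure defined by the software: it keeps every Structural Coefficient whose Validation Measure lies within a chosen fraction of the observed range above the minimum. The function name `trough`, the `tolerance` parameter, and all data values are hypothetical.

```python
def trough(sc_values, vm_values, tolerance=0.1):
    """Return the SC values whose VM is close to the minimum VM.

    'Close' means within tolerance * (max VM - min VM) of the minimum.
    The tolerance is an arbitrary choice made for this illustration.
    """
    lo, hi = min(vm_values), max(vm_values)
    threshold = lo + tolerance * (hi - lo)
    return [sc for sc, vm in zip(sc_values, vm_values) if vm <= threshold]


# Made-up analysis results, for illustration only.
sc_values = [0.5, 0.7, 0.9, 1.2, 1.5, 2.0]
vm_values = [30.9, 30.0, 30.1, 30.8, 31.5, 32.0]

trough_sc = trough(sc_values, vm_values)
# Single best value: the SC with the smallest Validation Measure.
best_sc = min(zip(vm_values, sc_values))[1]
print(trough_sc, best_sc)
```

A wider tolerance yields a wider candidate range. The final choice of the Structural Coefficient within that range remains a judgment call.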
Recall the definition of the Validation Measure VM:

$VM = \mathop{\rm E}\nolimits (LL_{\text{Test}}) \times \max \left( 1, \frac{\mathop{\rm var} (LL_{\text{Test}})}{\mathop{\rm var} (LL_{\text{Learning}})} \right)$

where LL stands for Log-Loss.

This means that if the Test Set Log-Loss variance exceeds the Learning Set Log-Loss variance, the Validation Measure increases beyond the Log-Loss of the Test Set.
In other words, an optimal Validation Measure can be achieved only if the network's predictive performance with regard to the Test Set is good, i.e., the Test Set Log-Loss is small, and if the Test Set Log-Loss variance does not exceed the Learning Set Log-Loss variance.
Intuitively, this makes sense: If the Test Set Log-Loss variance is greater than the Learning Set Log-Loss variance, the predictive model is not a good generalization of the underlying dataset.
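As a worked illustration with made-up numbers, suppose the mean Test Set Log-Loss is 10 and the Learning Set Log-Loss variance is 4.

If the Test Set Log-Loss variance is 6, the variance ratio exceeds 1 and acts as a penalty:

$VM = 10 \times \max \left( 1, \frac{6}{4} \right) = 10 \times 1.5 = 15$

If the Test Set Log-Loss variance is 3, the ratio falls below 1 and the Validation Measure equals the mean Test Set Log-Loss:

$VM = 10 \times \max \left( 1, \frac{3}{4} \right) = 10 \times 1 = 10$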