bayesia logo

Validation Measure

Context

  • The Validation Measure is available to be plotted within the Curve window as part of the Structural Coefficient Analysis.
  • Formally, the Validation Measure VM is defined as where LL stands for Log-Loss.
  • As with all measures provided by the Structural Coefficient Analysis, the Validation Measure supports you in choosing an appropriate value for the Structural Coefficient.
  • The Validation Measure is based on the Log-Loss statistics of the Learning Set and the Test Set. Hence, the associated dataset must be already be split into a Learning Set and the Test Set.
  • The Confidence Analysis Report also reports the Validation Measure corresponding to each Comparison Structure.

Usage

  • To illustrate the use of the Validation Measure, we use a sample network that represents the joint distribution of symptoms related to COVID-19:

CovidSymptoms.xbl

  • The dataset associated with this model is split into a Learning Set and a Test Set, as indicated by the  symbol tagged onto the database icon  in the lower right-hand corner of the Graph Window.
  • With this network, we now perform a Structural Coefficient Analysis: Main Menu > Tools > Multi-Run > Structural Coefficient Analysis.
  • We follow the overall workflow introduced in Structural Coefficient Analysis.
  • Given that the Validation Measure is particularly relevant in the context of Unsupervised Learning, we use EQ as the Learning Algorithm and set a Structural Coefficient range of 0.5 to 2.
    StructuralCoefficientAnalysisSettingsValidationMeasure
  • Upon clicking the Curve button at the bottom of the report, we obtain the following plot.
  • In the screenshot below, we have 16 x-y pairs (corresponding to 16 iterations) shown on the plot:
    CurveValidationMeasure
  • The x-axis represents the Structural Coefficient values.
  • The y-axis shows the Validation Measure computed for each network learned with the corresponding value of the Structural Coefficient.
  • Note that the y-values are normalized to a 0 to 1 range, i.e., the smallest computed Validation Measure is displayed as 0 and the largest value as 1.
  • You can hover with your pointer over the points on the plot, and a tooltip will show the normalized value plus the unnormalized value in parentheses.

Interpretation

  • The Structural Coefficient values that are associated with the minimum values of the Validation Measure are considered ideal.
  • The u-shaped portion at the bottom of the plotted curve (also referred to as "tub" or "trough") represents the range of minimum values:
  • With values of the Structural Coefficient within this approximate range, a model would be neither "overfitted" nor "underfitted."
  • Recall the definition of the Validation Measure VM:

where LL stands for Log-Loss.

  • This means that if the Test Set Log-Loss variance exceeds the Learning Set Log-Loss variance, the Validation Measure increases beyond the Log-Loss of the Test Set.
  • In other words, an optimal Validation Measure can be achieved only if the network's predictive performance with regard to the Test Set is good, i.e., the Test Set Log-Loss is small, and if the Test Set Log-Loss variance does not exceed the Learning Set Log-Loss variance.
  • Intuitively, this makes sense: If the Test Set Log-Loss variance is greater than the Learning Set Log-Loss variance, the predictive model is not a good generalization of the underlying dataset.

Copyright © 2025 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.