Curve

Context

Usage

  • You can open the Curve window by clicking the Curve button at the bottom of the Structural Coefficient Analysis Report.

Curve Window

  • Depending on what you selected in Analysis Settings at the beginning of the Structural Coefficient Analysis and what type of network you have, a range of metrics is available to plot.
  • All of them are shown simultaneously in the following screenshot. In practice, you would view them individually, and only some might apply to your analysis context.
    CurveWindowAllMeasures

Metrics Interpretation

Structural Coefficient Analysis (7.0)

Context

Tools | Multi-Run | Structural Coefficient Analysis

This tool helps you choose the best Structural Coefficient by running a structural learning algorithm with a range of coefficients, each of which changes the structural complexity of the machine-learned network.

Renamed Menu Item

This feature was previously under Tools | Cross-Validation.

New Feature: Rediscretize Continuous Nodes

This new option allows running automatic discretization before executing the selected structural learning algorithm with the current Structural Coefficient.

The discretization is only run for the continuous variables that have an associated automatic discretization algorithm, i.e., those whose discretization thresholds have not been manually defined or modified. Note that the Target Node is never rediscretized in this context.
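
The eligibility rule above can be sketched as a simple filter. This is only an illustration: the `Node` class, its fields, and the sample node names are hypothetical and are not part of any BayesiaLab API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical description of a node, for illustration only.
    name: str
    continuous: bool
    manual_thresholds: bool  # True if thresholds were manually defined or modified
    is_target: bool = False


def nodes_to_rediscretize(nodes):
    """Return the names of the nodes that would be rediscretized before each trial."""
    return [
        n.name
        for n in nodes
        if n.continuous and not n.manual_thresholds and not n.is_target
    ]


# Hypothetical sample: a continuous target, two continuous predictors
# (one with manually edited thresholds), and one discrete node.
sample = [
    Node("price", True, False, is_target=True),
    Node("sqft_living", True, False),
    Node("grade", True, True),
    Node("waterfront", False, False),
]
print(nodes_to_rediscretize(sample))
```

Only the continuous, automatically discretized, non-target node is kept by the filter.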

The main purpose of this option is to test the impact of the Structural Coefficient on the discretization algorithms. It is thus geared toward supervised learning problems, where the variables are discretized with tree-based approaches: the Structural Coefficient is also used in the MDL score that drives the induction of the trees.

However, if the Seed is not fixed, this can also have an impact on the following discretization algorithms that are stochastic by nature:

Example

Let's use a data set that contains house sale prices for King County, which includes Seattle. It describes homes sold between May 2014 and May 2015. More precisely, we have extracted the 94 houses that are more than 100 years old, that have been renovated, and come with a basement.

All the continuous variables have been discretized into three bins, with R2-GenOpt.

Given the small number of observations in this data set, we set five prior samples for the Smoothed Probability Estimation (5.0.4) in order to use a non-informative prior when estimating the parameters.

Below is the network learned with EQ, with the default Structural Coefficient:

Four nodes remain unconnected.

This means that, from the MDL score perspective and with the default Structural Coefficient, the relationships between these nodes and the others are too weak, so it is "too expensive" to represent them. In other words, the additional bits required to represent the structure, if we were to add a link with one of these nodes, would not be offset by the reduction in the number of bits needed to represent the data.
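
This trade-off can be written out as a rough sketch. It assumes the usual two-part form of the MDL score, with the Structural Coefficient weighting the structure term. The symbols are notation introduced here for illustration: $\alpha$ is the Structural Coefficient, $\mathrm{DL}(B)$ the number of bits needed to describe the network $B$, and $\mathrm{DL}(D \mid B)$ the number of bits needed to describe the data $D$ given the network.

```latex
\mathrm{MDL}(B, D) = \alpha \, \mathrm{DL}(B) + \mathrm{DL}(D \mid B)

% Adding a link lowers the score only if the weighted structural cost
% is smaller than the saving on the data term:
\alpha \, \Delta \mathrm{DL}(B) < -\Delta \mathrm{DL}(D \mid B)
```

Under this assumption, lowering $\alpha$ makes the left-hand side smaller, so weaker relationships can satisfy the inequality.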

One way to try getting these nodes connected would be to decrease the number of bins used for discretization. This would automatically reduce the "price" for adding links with these nodes.

However, if we want to keep the same discretization, we can try to reduce the Structural Coefficient. Instead of manually selecting a value by trial and error, the Structural Coefficient Analysis tool can be used for automatically testing different coefficients.

With this setting, 25 networks are learned with EQ, starting with a Structural Coefficient of 1, then 0.967, then 0.933, and so on, with the last network being learned with a Structural Coefficient of 0.2.
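
As a minimal sketch, the tested coefficients can be reproduced by assuming that the 25 values are evenly spaced between 1 and 0.2. That spacing is an assumption, chosen because it matches the coefficients reported for the structures in this example (0.933, 0.6, 0.567, 0.533). The function name `coefficient_grid` is hypothetical.

```python
def coefficient_grid(start: float, stop: float, count: int) -> list[float]:
    """Return `count` evenly spaced coefficients from `start` down to `stop`, inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (start - stop) / (count - 1)
    # Round to three decimals, the precision used when reporting coefficients here.
    return [round(start - i * step, 3) for i in range(count)]


grid = coefficient_grid(1.0, 0.2, 25)
print(grid[:3], grid[-1])
```

With these inputs, the step is 0.8 / 24, roughly 0.033.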

Prior to each trial, the nodes are discretized into three bins with R2-GenOpt. As the seed of our random number generator is fixed, unchecking Rediscretize Continuous Nodes would return the exact same results.

The three selected metrics are computed for each of these 25 networks.

The normalized values of these metrics are available by clicking the Curve button:

All these curves suggest a coefficient between 0.5 and 0.6.

Updated Feature: Structure Comparison

Comparing the structures of the learned networks usually helps to decide which coefficient to use. The networks are now stored from the largest Structural Coefficient to the smallest.

Example

Let's continue with our house example. The structures of the networks can be compared by clicking Structure Comparison.

The two highlighted arrows let you step through the different structures:

  • Synthesis Structure
  • Reference Structure
  • Max SC=1
  • Max SC=0.933
  • Max SC=0.6
  • Max SC=0.567
  • Max SC=0.533

The Synthesis Structure is not a Bayesian network. It is a graph that contains all the links that have been generated during the different trials.

A link can have three different colors:

  • Black: the link belongs to the initial structure and has been found in at least one generated solution;
  • Red: the link belongs to the initial structure but has never been found in the generated solutions;
  • Blue: the link does not belong to the initial structure but has been found in at least one generated solution.

Furthermore, the thickness of the link is proportional to its frequency in the generated structures.
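
The coloring and thickness rules above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: links are treated as undirected pairs, the frequency is taken as the share of generated structures that contain the link, and the helper names and node names in the toy data are hypothetical.

```python
from collections import Counter


def link(a, b):
    """Represent an undirected link as an order-independent pair."""
    return frozenset((a, b))


def synthesis(reference, solutions):
    """Map each link to (color, frequency) across the generated solutions."""
    counts = Counter(l for solution in solutions for l in solution)
    result = {}
    for l in reference | set(counts):
        found = counts[l] > 0
        if l in reference:
            color = "black" if found else "red"
        else:
            color = "blue"
        # Frequency drives the thickness of the link in the synthesis graph.
        result[l] = (color, counts[l] / len(solutions))
    return result


# Toy data with hypothetical links.
reference = {link("view", "waterfront"), link("price", "grade")}
solutions = [
    {link("view", "waterfront")},
    {link("view", "waterfront"), link("price", "sqft_living")},
]
graph = synthesis(reference, solutions)
for l, (color, frequency) in graph.items():
    print(sorted(l), color, frequency)
```

In this toy case, the link found in both solutions is black with frequency 1.0, the reference link that was never rediscovered is red with frequency 0.0, and the new link found once is blue with frequency 0.5.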

When an arc is added between two links, this indicates a V-Structure. Without an arc, the link can have either orientation within its Equivalence Class.

The Reference Structure is the network that is in the Graph Panel when the analysis is run.

This structure has been found twice, and the maximum Structural Coefficient was 1. These occurrences correspond to the two highlighted points in the graph below:

This structure has been found twice, and the maximum Structural Coefficient was 0.933. These occurrences correspond to the two highlighted points in the graph below:

This structure has been found once, with a Structural Coefficient of 0.6. It corresponds to the highlighted point in the graph below:

This structure has been found once, with a Structural Coefficient of 0.567. It corresponds to the highlighted point in the graph below:

This structure has been found seven times, and the maximum Structural Coefficient was 0.533. These occurrences correspond to the highlighted points in the graph below:

Given these structures, a conservative choice would be to select the solution that has a path between every pair of nodes and the largest coefficient, i.e., the solution with a Structural Coefficient set to 0.567.
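
This selection rule can be sketched as a connectivity check over each learned structure, keeping the largest coefficient whose network is fully connected. The function names and the four-node toy networks below are hypothetical and only mimic the pattern of this example. Links are treated as undirected for the purpose of finding paths.

```python
def is_connected(nodes, links):
    """Return True if every pair of nodes is joined by a path, ignoring link direction."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return len(seen) == len(nodes)


def most_conservative(nodes, networks):
    """Return the largest coefficient whose network is connected, or None if there is none."""
    connected = [c for c, links in networks.items() if is_connected(nodes, links)]
    return max(connected, default=None)


# Toy networks keyed by Structural Coefficient (hypothetical links).
nodes = ["A", "B", "C", "D"]
networks = {
    1.0: [("A", "B")],
    0.6: [("A", "B"), ("B", "C")],
    0.567: [("A", "B"), ("B", "C"), ("C", "D")],
    0.533: [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")],
}
print(most_conservative(nodes, networks))
```

In this toy case, the networks learned at 1.0 and 0.6 leave at least one node isolated, so the rule returns 0.567.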

Clicking the highlighted icon opens the visualized structure directly in a new graph.

When choosing a Structural Coefficient lower than 1, it is highly recommended to double-check that the relationships that have been represented by decreasing the Structural Coefficient are significant. This can be done by running Analysis | Report | Relationship.

The p-value highlighted in green confirms that the relationships are significant with a threshold set to 1%. Note that, even though there is a link between view and waterfront, the p-value computed with the model and the one computed directly on the data (assuming a direct link between the two variables) are not exactly the same. This is because the model has been learned with five prior samples for the Smoothed Probability Estimation, which slightly smooths the relationship.

Doing the same analysis on the network learned with a Structural Coefficient set to 0.533 returns a p-value of 2% for the weakest relationship estimated on the data.


Copyright © 2025 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.