Variable Segmentation - Cross Validation

The cross-validation toolbox now includes two new functions for estimating the stability of the clusters of variables: JackKnife and Data Perturbation.
The stability is estimated by:
- Generating different data sets (with JackKnife or data perturbation),
- Learning the network with the selected unsupervised learning algorithm, and
- Applying the variable clustering algorithm on the obtained networks with the defined parameters.
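The three steps above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: it assumes a JackKnife that leaves out one contiguous block of rows per data set, and `learn_and_cluster` is a hypothetical callable standing in for both the unsupervised learning algorithm and the variable clustering step.

```python
from typing import Callable, Iterator, List, Sequence


def jackknife_datasets(rows: Sequence, n_folds: int) -> Iterator[List]:
    """Yield n_folds data sets, each omitting one contiguous block of rows.

    Assumption: block-wise deletion; the actual resampling scheme may differ.
    """
    rows = list(rows)
    n = len(rows)
    for k in range(n_folds):
        start = k * n // n_folds
        stop = (k + 1) * n // n_folds
        yield rows[:start] + rows[stop:]


def stability_runs(rows: Sequence, n_folds: int,
                   learn_and_cluster: Callable[[List], list]) -> list:
    """Learn a network and cluster its variables on each generated data set.

    learn_and_cluster is a hypothetical placeholder supplied by the caller.
    """
    return [learn_and_cluster(subset)
            for subset in jackknife_datasets(rows, n_folds)]


# Usage with a dummy step that only reports the size of each data set
sizes = stability_runs(range(10), 5, lambda subset: len(subset))
```

Each run returns one clustering (here replaced by a dummy value), and the collection of runs is what the stability measures below are computed from.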


Example


The stability can be assessed:
- Qualitatively, by comparing the color blocks describing the clusters
- Quantitatively, with the FIT Score, which is the percentage of correspondence between each cluster and the one obtained on the initial network
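The exact formula of the FIT Score is not given here. As a rough illustration of a "percentage of correspondence", the sketch below assumes one plausible definition: for each cluster of the initial network, the share of its variables that stay together in the best-matching cluster of a resampled network. The function name `cluster_fit` and this definition are assumptions, not the tool's formula.

```python
from typing import Iterable, List, Set


def cluster_fit(initial: Iterable[Set[str]],
                resampled: Iterable[Set[str]]) -> List[float]:
    """Return one percentage per initial cluster.

    Assumed definition: size of the largest overlap with any resampled
    cluster, divided by the size of the initial cluster.
    """
    resampled = list(resampled)
    scores = []
    for cluster in initial:
        best = max((len(cluster & other) for other in resampled), default=0)
        scores.append(100.0 * best / len(cluster))
    return scores


initial_clusters = [{"A", "B", "C"}, {"D", "E"}]
resampled_clusters = [{"A", "B"}, {"C", "D", "E"}]
fit = cluster_fit(initial_clusters, resampled_clusters)
```

In this toy case the first initial cluster keeps two of its three variables together, and the second is fully recovered.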

The stability can also be assessed graphically:
- A link between two variables indicates that they have been grouped into the same factor at least once
- The thickness of the link is directly proportional to the frequency of the association
- The colors of the nodes are those of the initial factors
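The link weights of such a graph can be derived by counting how often each pair of variables falls in the same factor across the generated networks. The sketch below is a minimal illustration under that assumption; `association_frequencies` is a hypothetical helper name, and the rendering of the graph itself is left out.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def association_frequencies(
        runs: List[List[Set[str]]]) -> Dict[Tuple[str, str], float]:
    """Map each pair of variables to the fraction of runs grouping them.

    A pair appears in the result only if it shares a factor at least once,
    which corresponds to a link in the graph; the value would drive the
    link thickness.
    """
    counts: Counter = Counter()
    for clustering in runs:
        for factor in clustering:
            # sorted() gives each pair a single canonical orientation
            for a, b in combinations(sorted(factor), 2):
                counts[(a, b)] += 1
    return {pair: c / len(runs) for pair, c in counts.items()}


runs = [
    [{"A", "B", "C"}],
    [{"A", "B"}, {"C"}],
]
links = association_frequencies(runs)
```

Here A and B are linked in every run, so their link would be the thickest, while the links from C are drawn at half that weight.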
