Supervised Multivariate

Supervised Multivariate is one of the Automatic Discretization algorithms for Continuous variables in Step 4 — Discretization and Aggregation of the Data Import Wizard.

The Supervised Multivariate discretization algorithm focuses on representing the multivariate probabilistic dependencies involving a Target variable.
It utilizes Random Forests to find the most useful thresholds for predicting the Target variable.
Its function can be summarized as follows:
- Data Perturbation (opens in a new tab) generates a range of datasets.
- For each perturbed dataset, a multivariate tree is learned to predict the Target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.
- Extracting the most frequent thresholds produces the final discretization.
The Supervised Multivariate takes into account the Minimum Interval Weight and can improve the generalization capability of the model.
Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature.
After the conclusion of the Data Import Wizard, the Supervised Multivariate discretization algorithm is also available from Menu > Learning > Discretization.
Not that the Supervised Multivariate discretization algorithm is not available via Node Context Menu > Node Editor > States > Curve > Generate a Discretization.