Unsupervised Multivariate

Context

Algorithm Details & Recommendations

  • The Unsupervised Multivariate discretization algorithm focuses on representing multivariate probabilistic dependencies using Random Forests.

  • It works as follows:

    • A new dataset is created as a clone of the original one.

    • In this new dataset, each variable is independently shuffled to render all the variables independent while keeping the same statistics for each variable.

    • The cloned dataset is concatenated with the original dataset. A target variable is then created to distinguish the shuffled clone (the independent set) from the original data (the dependent set).

    • Various datasets are generated from this concatenated dataset with Data Perturbation.

    • For each perturbed dataset, a multivariate tree is learned to predict the target variable with a subset of variables. If a structure is already defined, it is used to bias the selection of the variables for each dataset.

    • The thresholds that appear most frequently across the learned trees are extracted to produce the discretization.

    • Being based on Random Forests, this algorithm is computationally expensive and stochastic by nature, especially when the number of variables is large.

  • The Unsupervised Multivariate discretization algorithm is also available after the data import via Main Menu > Learning > Discretization.

  • However, it is not available in the Node Editor (Node Context Menu > Edit > Curve > Generate a Discretization).
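The steps listed above can be illustrated with a minimal Python sketch. This is not BayesiaLab's implementation. It makes several assumptions that are not in the original description: bootstrap resampling stands in for Data Perturbation, the trees are small greedy Gini trees, candidate thresholds are restricted to equal-frequency cut points so that frequencies can be counted, and all function and parameter names (`shuffled_clone`, `unsupervised_multivariate_cuts`, `n_trees`, `n_bins`, and so on) are hypothetical.

```python
import random
from collections import Counter


def shuffled_clone(rows, rng):
    """Clone the data and shuffle each column independently (marginals are kept)."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def candidate_cuts(rows, n_cuts):
    """Equal-frequency candidate thresholds per variable (an assumption of this sketch)."""
    cuts = []
    for col in zip(*rows):
        s = sorted(col)
        cuts.append(sorted({s[k * len(s) // (n_cuts + 1)] for k in range(1, n_cuts + 1)}))
    return cuts


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y, features, cuts):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in cuts[f]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def grow(X, y, features, cuts, depth, used):
    """Grow one tree recursively, counting every (feature, threshold) it uses."""
    if depth == 0 or len(y) < 20:
        return
    split = best_split(X, y, features, cuts)
    if split is None:
        return
    f, t = split
    used[split] += 1
    for keep_left in (True, False):
        part = [(row, lab) for row, lab in zip(X, y) if (row[f] <= t) == keep_left]
        grow([r for r, _ in part], [l for _, l in part], features, cuts, depth - 1, used)


def unsupervised_multivariate_cuts(rows, n_trees=30, n_bins=3, depth=3, n_cuts=15, seed=0):
    """Return, per variable index, up to n_bins - 1 of the most frequently used thresholds."""
    rng = random.Random(seed)
    clone = shuffled_clone(rows, rng)
    X = [list(row) for row in rows] + clone
    y = [1] * len(rows) + [0] * len(clone)  # 1 = original (dependent), 0 = shuffled clone
    cuts = candidate_cuts(rows, n_cuts)
    n_vars = len(rows[0])
    used = Counter()
    for _ in range(n_trees):
        # Bootstrap resampling is used here as a stand-in for Data Perturbation.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Each tree only sees a random subset of the variables.
        k = min(n_vars, max(2, round(n_vars ** 0.5)))
        features = rng.sample(range(n_vars), k)
        grow([X[i] for i in idx], [y[i] for i in idx], features, cuts, depth, used)
    result = {}
    for f in range(n_vars):
        ranked = [t for (g, t), _ in used.most_common() if g == f]
        result[f] = sorted(ranked[: n_bins - 1])
    return result
```

Calling `unsupervised_multivariate_cuts(rows)` on a list of numeric rows returns a dictionary that maps each column index to its selected thresholds. Even in this toy version, the cost grows with the number of trees, variables, and candidate thresholds, and the result depends on the random seed, which mirrors the caveat about computational cost and stochasticity noted above.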


Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.