
Perturbed Tree

Context

Algorithm Details & Recommendations

  • The Perturbed Tree algorithm is designed to optimize the representation of the probabilistic dependency between a Target variable and the to-be-discretized variable. It is an extension of the Tree discretization algorithm, and it functions as follows:
    • Data Perturbation generates a set of perturbed datasets.
    • For each perturbed dataset, a univariate tree is learned to predict the Target variable with the to-be-discretized continuous variable.
    • The thresholds that occur most frequently across the learned trees are extracted to produce the final discretization.
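  • The Perturbed Tree algorithm takes the Minimum Interval Weight into account and can reduce the number of bins if necessary. Because its thresholds are aggregated over many perturbed datasets rather than taken from a single tree, it can also be more robust than the simple Tree discretization.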
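
The three steps above can be illustrated with a small, self-contained Python sketch. It is not the product's implementation, and several details are assumptions made for the example: perturbation is simulated by drawing a random weight for each observation, the univariate tree is a greedy Gini-based splitter, and the function names (`perturbed_tree_thresholds`, `tree_thresholds`, `best_split`) are hypothetical. The Minimum Interval Weight constraint is not modelled here.

```python
import random
from collections import Counter


def gini(class_weights):
    """Gini impurity of a {class: weight} mapping."""
    total = sum(class_weights.values())
    if total <= 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in class_weights.values())


def best_split(rows):
    """Best cut for rows [(x, y, w), ...] sorted by x, returned as (gain, threshold)."""
    total = Counter()
    for _, y, w in rows:
        total[y] += w
    parent = sum(total.values()) * gini(total)
    left, best = Counter(), (0.0, None)
    for (x, y, w), (next_x, _, _) in zip(rows, rows[1:]):
        left[y] += w
        if next_x == x:
            continue  # cannot cut between identical values
        right = {c: total[c] - left[c] for c in total}
        children = (sum(left.values()) * gini(left)
                    + sum(right.values()) * gini(right))
        gain = parent - children
        if gain > best[0] + 1e-12:
            best = (gain, (x + next_x) / 2)  # threshold at the midpoint
    return best


def tree_thresholds(rows, n_bins):
    """Greedy univariate tree: split the best segment until n_bins intervals exist."""
    segments, thresholds = [sorted(rows, key=lambda r: r[0])], []
    while len(segments) < n_bins:
        candidates = [(best_split(seg), i) for i, seg in enumerate(segments)]
        (gain, cut), i = max(candidates, key=lambda c: c[0][0])
        if cut is None:
            break  # no split improves purity any further
        seg = segments.pop(i)
        segments.append([r for r in seg if r[0] <= cut])
        segments.append([r for r in seg if r[0] > cut])
        thresholds.append(cut)
    return thresholds


def perturbed_tree_thresholds(xs, ys, n_bins=3, n_datasets=50, seed=0):
    """Learn one tree per perturbed dataset and keep the most frequent thresholds."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_datasets):
        # Assumed perturbation scheme: a random weight around 1 per observation.
        rows = [(x, y, rng.uniform(0.5, 1.5)) for x, y in zip(xs, ys)]
        for cut in tree_thresholds(rows, n_bins):
            votes[round(cut, 6)] += 1
    return sorted(cut for cut, _ in votes.most_common(n_bins - 1))


# Toy data: the Target changes state at x = 10 and x = 20.
xs = list(range(30))
ys = ["low"] * 10 + ["mid"] * 10 + ["high"] * 10
print(perturbed_tree_thresholds(xs, ys))  # [9.5, 19.5]
```

On noisy data, individual trees would disagree on where to cut, and the vote count in `votes` is what lets the stable thresholds win over the ones that appear in only a few perturbed datasets.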