Automatic Discretization
Context
- Automatic Discretization covers numerous discretization algorithms that are part of Step 4 — Discretization and Aggregation of the Data Import Wizard.
- Except for Manual, all items in the Type menu represent Automatic Discretization algorithms.
- Most of these algorithms can also be accessed via the Generate a Discretization function within the Manual Discretization screen.
Usage
- A Discretization algorithm is selected variable by variable, i.e., you can use a different algorithm for each Continuous variable.
- To select a variable, click on the variable header or anywhere inside the column.
- You can select and deselect multiple variables with the keystroke combinations commonly used in spreadsheet editing:
- Ctrl+Click: add a variable to the current selection.
- Shift+Click: add all variables between the currently selected and the clicked variable to the selection.
- Ctrl+A: select all variables in the Data panel. However, selecting all variables is not useful here in Step 4, as there are no actions that can apply to all variable types.
- Shift+End: select all variables from the currently selected variable to the rightmost variable in the table.
- Shift+Home: select all variables from the currently selected variable to the leftmost variable in the table.
- Click the Select All Continuous button to select all Continuous variables.
- Note that this action will also select any variables which you have already discretized manually. As a result, you may override your previous choices.
- Note that Continuous variables already discretized manually are highlighted in soft blue.
- If a variable was not discretized manually and you do not specify an algorithm for it, the default Discretization algorithm with its default settings will be used.
- You can set the default Discretization algorithm under Menu > Window > Preferences > Discretization.
- For the following algorithms, a Log Transformation is available as an option:
- Applying the Log Transformation is useful if you have a high density of values at the bottom end of the variable domain. This "stretches" the scale for small values approaching zero.
- Note that the Log Transformation is only used temporarily for discretization purposes. Thus, the thresholds and the interval values can all be interpreted on the original scale.
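To illustrate the idea behind the Log Transformation, here is a minimal Python sketch. It is not BayesiaLab's implementation: it assumes a plain equal-width binning as the stand-in algorithm, a natural logarithm, and strictly positive values, and the function names are hypothetical. It shows thresholds being computed in log space and then mapped back to the original scale.

```python
import math

def equal_width_thresholds(values, n_bins):
    """Return the n_bins - 1 interior thresholds of equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(1, n_bins)]

def log_equal_width_thresholds(values, n_bins):
    """Bin in log space, then map the thresholds back to the original scale.

    Assumption for this sketch: all values are strictly positive.
    """
    if min(values) <= 0:
        raise ValueError("this sketch assumes strictly positive values")
    log_values = [math.log(v) for v in values]
    log_thresholds = equal_width_thresholds(log_values, n_bins)
    # The log scale is only used temporarily; exp() restores the original scale.
    return [math.exp(t) for t in log_thresholds]

# Right-skewed sample: most values sit near the bottom of the domain.
data = [1, 2, 3, 4, 5, 8, 10, 100, 1000]
plain = equal_width_thresholds(data, 3)
logged = log_equal_width_thresholds(data, 3)
print("without log:", plain)
print("with log:   ", logged)
```

Without the transformation, the two thresholds fall at 334 and 667, so eight of the nine sample values share the first interval. With the transformation, the thresholds fall near 10 and 100, which spreads the small values across more intervals while remaining expressed in the original units.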
- For the following algorithms, the option Isolate Zeros is available:
- R²-GenOpt*
- R²-GenOpt
- K-Means
- Normalized Equal Distance
- Separating 0 into a separate interval can be useful for zero-inflated distributions so as to clearly separate small values from "absolutely nothing."
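The effect of isolating zeros can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BayesiaLab's code: the helper name is hypothetical, equal-width binning stands in for the actual algorithm, and integer state indices (0 for the zero state) are an arbitrary labeling choice.

```python
def discretize_isolating_zeros(values, n_bins):
    """Give exact zeros their own state 0; bin the rest into states 1..n_bins.

    The nonzero values are split into equal-width bins (a stand-in algorithm).
    """
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return [0 for _ in values]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / n_bins
    states = []
    for v in values:
        if v == 0:
            states.append(0)  # the isolated "absolutely nothing" state
        elif step == 0:
            states.append(1)  # all nonzero values are identical
        else:
            # Clamp so the maximum lands in the last bin, not one past it.
            states.append(1 + min(int((v - lo) / step), n_bins - 1))
    return states

# Zero-inflated sample: half of the observations are exactly zero.
data = [0, 0, 0, 0, 0.5, 1.0, 2.0, 4.5]
states = discretize_isolating_zeros(data, 2)
print(states)
```

Because the zeros are removed before the bins are computed, the small positive values 0.5, 1.0, and 2.0 are assigned to a different state than the zeros instead of being merged with them.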
- Click Finish to perform the Discretization.
- A progress bar displays the status of the Discretization process.
- If a Filtered Value is defined for a Continuous variable, a new artificial interval with an infinitesimally small width of 10⁻⁷ will be added after the intervals defined in this step. This newly created state will serve as the Filtered State, and its State Name will be the asterisk character, "*".
- When the Discretization finishes, BayesiaLab opens a Graph Window with all imported variables now represented as nodes.
- Simultaneously, a window pops up that offers you an optional Import Report, which is Step 5 of the Data Import Wizard.
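The Filtered State interval mentioned above (for Continuous variables with a Filtered Value) can be pictured with the following Python sketch. Only the width of 10⁻⁷ and the "*" State Name come from the text. The function name, the tuple layout, the names of the regular states, and the placement of the artificial interval directly at the upper bound of the last regular interval are assumptions made for illustration.

```python
FILTERED_WIDTH = 1e-7   # width stated in the text
FILTERED_NAME = "*"     # State Name stated in the text

def add_filtered_interval(boundaries):
    """Turn sorted boundaries into (name, lower, upper) intervals and
    append the artificial Filtered State interval after the last one.

    Regular state names ("state_1", ...) are hypothetical placeholders.
    """
    intervals = [
        (f"state_{i + 1}", lower, upper)
        for i, (lower, upper) in enumerate(zip(boundaries, boundaries[1:]))
    ]
    last_upper = boundaries[-1]
    # Assumed placement: immediately after the last regular interval.
    intervals.append((FILTERED_NAME, last_upper, last_upper + FILTERED_WIDTH))
    return intervals

result = add_filtered_interval([0.0, 10.0, 50.0, 200.0])
for name, lower, upper in result:
    print(name, lower, upper)
```

The three regular intervals are left unchanged, and the extra "*" interval is narrow enough that it does not overlap any realistic observed value.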
Automatic Discretization Algorithms in Detail