Step 3 — Data Selection, Filtering, and Missing Value Processing

Context

  • Step 3 of the five-step Data Import Wizard deals with Data Selection, Filtering, and Missing Values Processing.

Overview of Elements

Data

We start with the Data panel — although it is at the bottom of the window — as it can help inform decisions about Missing Values Processing.

This Data panel resembles the Data panel from Step 2 — Definition of Variable Types.

However, there are several important additional pieces of information available:

  • A Missing Values icon indicates the presence of at least one Missing Value in the corresponding variable.
  • A triangle icon indicates that variable-specific statistics are available. It appears on all variable headers with the exception of variables of the type Row Identifier and Learn/Test.
  • Clicking on the triangle icon or the associated variable header brings up a table with variable statistics:
  • For Discrete variables, it shows the frequencies of all states, including Missing Values and Filtered Values:
  • The Filter checkboxes allow you to uncheck/deselect specific values.
  • The checked box means that the value is included, which is the default condition.
  • The unchecked box means that the value is excluded and that all rows that contain that value will be filtered, i.e., removed.
  • As you experiment with checking/unchecking, you can see how the Number of Rows in the Information panel changes.

In terms of a data query, the Filter checkbox would be the equivalent of a nominal value row filter.
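For readers who think in terms of data queries, the Filter checkbox can be sketched outside of BayesiaLab as follows. This is only an analogy in pandas, not BayesiaLab's own implementation; the column names, values, and the `included` list are hypothetical.

```python
import pandas as pd

# Hypothetical dataset; names and values are illustrative only.
df = pd.DataFrame({
    "Color": ["red", "blue", "green", "blue", "red"],
    "Size": [1, 2, 3, 4, 5],
})

# Values whose Filter checkbox remains checked (included).
# Here "blue" is assumed to have been unchecked.
included = ["red", "green"]

# Nominal value row filter: keep only rows holding an included value.
filtered = df[df["Color"].isin(included)]

print(len(df), "rows before,", len(filtered), "rows after")
```

The change in row count mirrors what the Number of Rows field in the Information panel reports as you check and uncheck values.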

🚫

Note that the number of Filtered Values does not refer to the number of excluded rows due to an unchecked Filter checkbox.

  • For Continuous variables, it shows the standard statistics, such as Minimum, Maximum, Mean, and Standard Deviation. Additionally, the table displays the frequencies of non-missing values, Missing Values, and Filtered Values.
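As a rough illustration of the statistics listed for Continuous variables, the following pandas sketch computes comparable figures for a hypothetical series. It is an assumption that the sample standard deviation (pandas' default) matches what BayesiaLab reports; the source does not say which estimator is used.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous variable with one Missing Value.
income = pd.Series([2.0, 4.0, np.nan, 6.0], name="Income")

stats = {
    "Minimum": income.min(),              # missing values are skipped
    "Maximum": income.max(),
    "Mean": income.mean(),
    "Standard Deviation": income.std(),   # sample estimator (ddof=1), an assumption
    "Non-missing": int(income.count()),
    "Missing": int(income.isna().sum()),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```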

Select Values

Three actions are available in this panel:

  • You can choose the logic for combining the Filters and Minima/Maxima assigned in the Data panel:

    • OR: a row will be removed if ANY of the selected Filters or specified Minima/Maxima across all variables apply to that row.
    • AND: a row will only be removed if ALL of the selected Filters and specified Minima/Maxima across all variables apply to that row.
  • Click the Show Selections button to review what Filters and Minima/Maxima are currently in place.

  • Note the syntax for Discrete variables: The variable name is followed by "in" (i.e., is an element of) followed by the included values shown as an array in square brackets.

  • Further logical expressions are shown as conjunctions (AND) or disjunctions (OR) in separate lines.

  • Clicking the Delete Selections button removes all Filters and Minima/Maxima currently in place.
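The difference between the OR and AND logic can be sketched in pandas. This is an illustration under stated assumptions, not BayesiaLab's implementation: the data are hypothetical, and a Minimum/Maximum is assumed to define an inclusive range so that values outside it count as filtered.

```python
import pandas as pd

# Hypothetical dataset; names and values are illustrative only.
df = pd.DataFrame({
    "Color": ["red", "blue", "green", "blue"],
    "Age": [25, 70, 80, 30],
})

# A filter "applies" to a row when the row holds an excluded value.
# Color in ["red", "green"] lists the included values, so "blue" is excluded.
color_applies = ~df["Color"].isin(["red", "green"])

# Assumed Minimum of 18 and Maximum of 65: values outside the range are excluded.
age_applies = (df["Age"] < 18) | (df["Age"] > 65)

# OR: remove a row if ANY filter applies to it.
kept_or = df[~(color_applies | age_applies)]

# AND: remove a row only if ALL filters apply to it.
kept_and = df[~(color_applies & age_applies)]

print("Kept under OR: ", list(kept_or.index))
print("Kept under AND:", list(kept_and.index))
```

In this toy example, OR is the stricter setting because a single matching filter is enough to drop a row, whereas AND drops only the row that violates both conditions.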

Missing Values Processing

  • In the Missing Values Processing panel, you can specify which kind of processing to apply to variables with Missing Values, i.e., Filter, Replace By, or Infer.

  • This panel is only active if you select one of the variables that feature a small question mark icon. This icon indicates that the corresponding variable contains at least one Missing Value.

Filter

  • The Filter function allows you to remove rows from the dataset that contain Missing Values. This is equivalent to what is commonly known as casewise deletion.

  • You can apply the Filter individually to any variable that contains Missing Values.

🚫

Despite the similarity in name, this function is not related to Filtered Values.

Usage
  • In the Data panel, click on the header or into the column of the variable with Missing Values.
  • Then, check the Filter checkbox in the Missing Values Processing panel.
  • Next, choose the logical condition to apply when you select multiple variables to be subject to the Filter.
    • OR: a row will be removed if ANY of the selected variables contain a Missing Value in that row.
    • AND: a row will only be removed if ALL of the selected variables contain a Missing Value in that row.
⚠️

Before applying Filter, please consider the implications discussed in the section Filter Listwise/Casewise Deletion.
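The OR and AND conditions for the Missing Values Filter correspond closely to casewise deletion as it is commonly done in pandas. The sketch below is an analogy only; the data and the `selected` list of variables are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with Missing Values in both variables.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": ["x", "y", None, None],
})

# Variables for which the Filter checkbox is assumed to be checked.
selected = ["A", "B"]

# OR: remove a row if ANY selected variable is missing in that row.
kept_or = df.dropna(subset=selected, how="any")

# AND: remove a row only if ALL selected variables are missing in that row.
kept_and = df.dropna(subset=selected, how="all")

print("Kept under OR: ", list(kept_or.index))
print("Kept under AND:", list(kept_and.index))
```

The OR condition can discard a large share of the data when Missing Values are scattered across many variables, which is one reason to weigh the implications before applying Filter.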

Replace By

With the Replace By function, you can specify a value for replacing the Missing Values in the selected variable.

You have several options in this regard:

  • You can set a specific value:
    • For a Discrete variable, you can select among the values observed in the variable from a drop-down list.
    • Alternatively, you can choose the Modal value, i.e., the most frequently occurring value of the variable in the dataset.
    • For a Continuous variable, you can select to use the Mean value computed from the dataset.
    • As an alternative, you can specify any arbitrary value.
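The Replace By options have straightforward counterparts in pandas. The following sketch is illustrative and not BayesiaLab's implementation; the column names, the data, and the arbitrary replacement value of 0.0 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; names and values are illustrative only.
df = pd.DataFrame({
    "Region": ["north", None, "south", "north"],
    "Income": [40.0, np.nan, 60.0, 50.0],
})

# Discrete variable: replace Missing Values with the Modal value,
# i.e., the most frequently observed value.
modal_value = df["Region"].mode().iloc[0]
region_filled = df["Region"].fillna(modal_value)

# Continuous variable: replace Missing Values with the Mean
# computed from the observed (non-missing) values.
mean_value = df["Income"].mean()
income_mean = df["Income"].fillna(mean_value)

# Either type: replace Missing Values with an arbitrary, user-chosen value.
income_fixed = df["Income"].fillna(0.0)

print(region_filled.tolist())
print(income_mean.tolist())
print(income_fixed.tolist())
```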

Infer

The Methods in Detail:
  • Infer — Static Imputation
  • Infer — Dynamic Imputation
  • Infer — Structural EM
  • Infer — Entropy-Based Imputations

Information


Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.