Aggregation of Single Variable

Individual variables can be aggregated manually or automatically in Step 4 of the Data Import Wizard.

To illustrate all related workflows, we use an American auto buyer satisfaction survey containing 42,397 responses. Each record contains attributes of the purchased vehicle, such as make (or brand), model, body style, vehicle segment, number of cylinders, transmission, price paid, self-reported fuel economy, plus hundreds of other variables.

Manual Aggregation

First, we want to manually aggregate all 37 automobile brands that appear in the survey into just two states, i.e., Premium Brands and Non-Premium Brands.

This manual aggregation will be based exclusively on our subjective perception of the auto industry as of 2009, which is when this particular survey was conducted.

  • Click on the Brand variable in the Data panel.

  • From the States list on the left, select the values you wish to aggregate using Shift+Click or Ctrl+Click.

  • Then, click the Aggregate button.

  • The newly-formed, aggregated state appears in the Aggregates list on the right.

  • By default, the original values are concatenated using the "+" symbol as a delimiter. An underscore "_" is added as a prefix.

  • As necessary, you can select more values from the States list and create additional aggregated states.

  • In the list of Aggregates, you can now replace the automatically-generated state names with more meaningful ones.

  • You can now proceed to any other variable or click Finish to conclude the Data Import Wizard.
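The effect of a manual aggregation on the data can be pictured as a simple relabeling. The following is a minimal Python sketch, not BayesiaLab code: the function names and the brand sample are hypothetical, and the order in which values are concatenated is an assumption. It only mirrors the default naming rule described above (an underscore prefix and "+" as the delimiter) and the later renaming to more meaningful state names.

```python
def default_aggregate_name(values):
    """Build a default-style name: "_" prefix, original values joined with "+"."""
    return "_" + "+".join(values)


def aggregate_states(records, groups):
    """Relabel each original value with the name of its aggregate.

    groups maps an aggregate name to the list of original values it absorbs.
    Values that belong to no group are left unchanged.
    """
    lookup = {value: name for name, values in groups.items() for value in values}
    return [lookup.get(value, value) for value in records]


# Hypothetical sample of the Brand variable (not taken from the survey)
brands = ["Porsche", "Ford", "Lexus", "Kia", "Ford"]

# Name generated by default for a freshly aggregated pair of values
premium = ["Lexus", "Porsche"]
auto_name = default_aggregate_name(premium)

# Same aggregation after replacing the generated names with meaningful ones
groups = {"Premium Brands": premium, "Non-Premium Brands": ["Ford", "Kia"]}
aggregated = aggregate_states(brands, groups)

print(auto_name)
print(aggregated)
```

The subjective part of the workflow is entirely contained in the `groups` mapping; everything else is mechanical.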

Correlation-Aided Manual Aggregation

In addition to the Manual Aggregation described above, BayesiaLab can support you in making the aggregation decisions. For this purpose, BayesiaLab can show how the original values of the to-be-aggregated variable correlate with those of other variables.

Continuing with the previous example, we now perform an aggregation of the same variable, Brand. Now, however, we use each brand's correlation with Price as a guide instead of our judgment.

For the purpose of this demonstration, we have already discretized the Price variable manually into three (arbitrary) intervals using two thresholds, i.e., $25,000 and $45,000.

We now want to use the correlation of each brand with the top interval, i.e., $45,000+, as a measure of its "premium appeal" so that we can reduce the 37 brands to three states: Mainstream, Premium, and Luxury.

For reference, 8.65% of all survey responses reported a vehicle purchase price of $45,000 or higher.

Workflow Instructions

  • Click on the Brand variable in the Data panel.

  • Click the Show Correlations box.

  • Select the Target variable and the State of interest. In our example, these are Price and its top interval, $45,000+.

  • Review the values shown in the Correlations column. When you hover your cursor over the Correlation bar in a row, a Tooltip displays the percentage difference of that row versus the marginal value.

  • The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.

  • Select the states to aggregate using Ctrl+Click.

  • Once you have selected the values, click the Aggregate button.

  • The newly aggregated values now appear as a single item in the Aggregates list.

  • Review the newly aggregated states and, if necessary, assign new names to replace the ones that were generated automatically.

  • To reverse the aggregation, select the aggregated items in the Aggregates list and click Delete.
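The quantity behind the colored bars can be illustrated with a short calculation. This is a sketch under stated assumptions rather than BayesiaLab's actual implementation: it treats each bar as the conditional probability of the selected target state given the value, minus the marginal probability of that state, expressed in percentage points. The function name, the interval labels, and the data are hypothetical.

```python
from collections import Counter


def deltas_vs_marginal(rows, target_state):
    """Compare P(target_state | value) with the marginal P(target_state).

    rows is a list of (value, state) pairs. Returns the marginal probability
    in percent and, per value, the difference in percentage points.
    """
    totals = Counter(value for value, _ in rows)
    hits = Counter(value for value, state in rows if state == target_state)
    marginal = 100.0 * sum(hits.values()) / len(rows)
    deltas = {
        value: 100.0 * hits[value] / totals[value] - marginal for value in totals
    }
    return marginal, deltas


# Hypothetical (brand, price interval) observations, not the survey data
rows = (
    [("A", "$45,000+")] * 9 + [("A", "under $45,000")] * 1
    + [("B", "$45,000+")] * 1 + [("B", "under $45,000")] * 9
    + [("C", "under $45,000")] * 20
)

marginal, deltas = deltas_vs_marginal(rows, "$45,000+")

for value, delta in sorted(deltas.items()):
    color = "green" if delta > 0 else "red"  # above vs. below the marginal
    print(f"{value}: {delta:+.2f} points ({color})")
```

Note that a value with no observations in the target state, such as the hypothetical brand C, always sits exactly at minus the marginal probability.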

Correlation-Aided Automatic Aggregation

The Correlation-Aided Automatic Aggregation is very similar to the Correlation-Aided Manual Aggregation.

The principal difference is that you don't select your to-be-aggregated values manually but rather specify thresholds that determine the aggregation.

So, the initial steps are analogous to the Correlation-Aided Manual Aggregation.

  • Click on a Discrete variable in the Data panel.

  • Click the Show Correlations box.

  • Select the Target variable and the State of interest.

  • Review the values shown in the Correlations column. When you hover your cursor over the Correlation bar in a row, a Tooltip displays the percentage difference of that row versus the marginal value.

  • The colored bars show how each value compares to the marginal probability of the selected state of the target. A green-colored bar indicates a probability higher than the marginal probability, and a red bar suggests a lower probability.

  • Now, instead of manually selecting the values you want to aggregate, click the Automatic Aggregation button.

  • The Automatic Aggregation window opens up.

  • The colored bar at the top visualizes the percentage differences versus the marginal probability of the selected state of the target.

  • In our example, there is one brand, Mercury, which had no observations in the $45,000+ interval. As a result, it marks the bottom end of the spectrum, i.e., it is 8.65 percentage points below the marginal probability.

  • On the other end of the spectrum, Porsche is 83.97 percentage points above the marginal probability, which implies that roughly 92.6% of Porsche buyers reported a price of $45,000 or higher.

  • A default threshold is shown at 0, which is marked by the pink-to-red color change in the bar.

  • You can manually add thresholds by right-clicking on the bar.

  • As soon as you add a threshold, a corresponding entry appears in the list below.

  • Right-clicking again on an existing threshold removes that threshold.

  • You can move an existing threshold by clicking on it and then dragging it to the desired value.

  • Also, in the table below the colored bar, you can type in a threshold value.

  • By clicking OK, you confirm the specified thresholds, and all values in the States list will be aggregated accordingly.

  • Alternatively, you can click on Generate Aggregates and specify the desired number of intervals.

  • You obtain a set of aggregation thresholds, which you can further modify or accept by clicking OK.

  • Now you have a new set of states in the list of Aggregates.
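Conceptually, the thresholds cut the range of differences into intervals, and all values falling into the same interval end up in one aggregate. The sketch below illustrates that idea in Python; it is not BayesiaLab's implementation. The function name, the 30-point threshold, and the brands named BrandX, BrandY, and BrandZ are hypothetical, and the rule that a value lying exactly on a threshold joins the upper interval is an assumption. Only the Mercury and Porsche figures come from the example above.

```python
from bisect import bisect_right


def aggregate_by_thresholds(deltas, thresholds):
    """Group values by the interval that contains their delta.

    deltas maps each value to its difference versus the marginal probability
    (percentage points). Returns the groups ordered from lowest to highest
    interval; intervals that contain no value are omitted.
    """
    cuts = sorted(thresholds)
    groups = {}
    for value, delta in deltas.items():
        # Index of the interval; a delta equal to a threshold goes to the
        # upper interval (an assumption made for this sketch).
        index = bisect_right(cuts, delta)
        groups.setdefault(index, []).append(value)
    return [sorted(groups[i]) for i in sorted(groups)]


deltas = {
    "Mercury": -8.65,   # from the example above
    "Porsche": 83.97,   # from the example above
    "BrandX": -3.0,     # hypothetical
    "BrandY": 12.5,     # hypothetical
    "BrandZ": 40.0,     # hypothetical
}

# Default threshold at 0 plus one manually added, hypothetical threshold at 30
result = aggregate_by_thresholds(deltas, [0.0, 30.0])
print(result)
```

With two thresholds, the values fall into three groups, which corresponds to the three-state goal of the example (Mainstream, Premium, and Luxury).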

Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.