Generating the Test Dataset

To begin this exercise, we use BayesiaLab to produce the test data that we will later use for evaluating the Missing Values Processing methods.

We can directly generate data according to the joint probability distribution encoded by the Reference Network: Main Menu > Data > Generate Data.

Next, we must specify whether to generate this data internally or externally. For now, we generate the data internally, which means that we associate data points with all nodes. This includes missing values and Filtered Values according to the reference network.

For the Number of Examples (i.e., cases or records), we set 10,000.

Now that this data exists inside BayesiaLab, we need to export it, so we can truly start โ€œfrom scratchโ€ with the test dataset. Also, regarding realism, we only want to make the observable variables available rather than all. We first select the nodes X1_obs through X5_obs and then select Main Menu > Data > Save Data from the main menu.

Next, we confirm that we only want to save the Selected Nodes, i.e., the observable variables.

Upon specifying a file name and saving the file, the export task is complete.

A quick look at the CSV file confirms that the newly generated data contain missing and Filtered Values, as indicated with question marks (?) and asterisks (*), respectively.

Now that we have produced a test dataset with all types of missingness, we forget our reference model to start โ€œfrom scratch.โ€ We approach this dataset as if this were the first time we see it, without any assumptions and without any background knowledge. This provides us with a suitable test case for BayesiaLabโ€™s range of missing values processing methods.

Last updated

Logo

Bayesia USA

info@bayesia.us

Bayesia S.A.S.

info@bayesia.com

Bayesia Singapore

info@bayesia.com.sg

Copyright ยฉ 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.