Distribution
Context
- With the Distribution comparison function, you can compare two files containing Log-Loss values, each of which represents the fit of a dataset to a Bayesian network model.
- In other words, this function only compares the measures of fit and not the underlying data itself.
- As a result, you can even compare the fit of two entirely unrelated datasets to their respective models, e.g., comparing the data-to-model fit of an image recognition model to the data-to-model fit of a credit risk model.
- More realistically, you can compare the fit of a Bayesian network model with the fit produced by a different modeling technique.
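If the second set of values comes from a different modeling technique, its per-record probabilities first have to be expressed on the same scale. The following is a minimal sketch, assuming that Log-Loss is the negative base-2 logarithm of the probability a model assigns to each record; the function name and the sample probabilities are hypothetical.

```python
import math


def log_loss_values(probabilities):
    """Convert per-record probabilities into Log-Loss values (in bits).

    Assumption: Log-Loss is defined as -log2(p) for each record.
    """
    values = []
    for p in probabilities:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        values.append(-math.log2(p))
    return values


# Hypothetical probabilities assigned to three records by some other model.
example_probabilities = [0.5, 0.25, 0.125]
example_log_loss = log_loss_values(example_probabilities)
```

Less probable records receive larger Log-Loss values, so a lower average indicates a better fit.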
Usage
- In Validation Mode, select Main Menu > Tools > Compare > Distribution.
- Then, a pop-up window prompts you to select two files, each of which must contain the Log-Loss values for every record of a dataset.
- To create these files, open the first network, switch into Validation Mode, and then perform **Inference → Batch Joint Probability → (Text File | Database | Internal Data | Internal Evidence Scenarios)**.
- Here, it is not necessary to save the values of any nodes, so No Node should be selected.
- Upon selecting the name of the file to be saved, BayesiaLab asks whether the logarithm of the joint probability should be saved. For our purposes, No must be selected.
- Repeat the previous steps for the second network, saving the file under a different name.
- The resulting CSV files each contain three columns. Delete the two rightmost columns (highlighted in red) in each file with a spreadsheet editor, so that only the first column remains.
- Save the edited files, maintaining the CSV format.
- Once the files are prepared in this format, you can start the comparison via Main Menu > Tools > Compare > Distribution.
- Select the previously edited files via the dialogue box and click Compare.
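The column-removal step can also be scripted instead of using a spreadsheet editor. The following is a minimal sketch, assuming the exported files are comma-delimited and that the column to keep is the leftmost one; the function name and the file names in the usage comment are hypothetical.

```python
import csv


def keep_first_column(src_path, dst_path, delimiter=","):
    """Copy a CSV file, keeping only the leftmost column of every row.

    Returns the number of rows written (including a header row, if any).
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:
            if not row:
                continue  # skip blank lines
            writer.writerow(row[:1])
            kept += 1
    return kept


# Hypothetical usage, one call per exported file:
# keep_first_column("network1_export.csv", "network1_edited.csv")
# keep_first_column("network2_export.csv", "network2_edited.csv")
```

If the export uses a different delimiter, such as a semicolon or a tab, pass it via the `delimiter` argument.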
Results
The results are presented in three tabs: First Data Set, Second Data Set, and Comparison.
Each panel shows the mean, standard deviation, minimum, maximum, and the number of rows computed.
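To cross-check these figures outside of BayesiaLab, the same summary can be computed from a list of values. This is a minimal sketch; the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the formula BayesiaLab applies is not stated here.

```python
import statistics


def summarize(values):
    """Return the summary statistics listed above for one set of values."""
    return {
        "rows": len(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical Log-Loss values for four records.
summary = summarize([1.0, 2.0, 3.0, 4.0])
```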
First Data Set


Second Data Set


Comparison
The Kolmogorov-Smirnov Test is computed to test for the equality of the probability distributions of both datasets.
More specifically, it compares the distributions of log-likelihoods. The panel also reports the K-S Z and D statistics along with the associated p-value.
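The test can be illustrated with a short, self-contained sketch. It assumes that D is the largest absolute difference between the two empirical cumulative distribution functions, that Z is D scaled by sqrt(n·m / (n + m)), and that the p-value comes from the asymptotic Kolmogorov distribution; whether BayesiaLab uses exactly these conventions is an assumption, and the function names are hypothetical.

```python
import math


def kolmogorov_p(z, terms=100):
    """Asymptotic two-sided p-value: 2 * sum((-1)^(k-1) * exp(-2 k^2 z^2))."""
    if z < 0.2:
        # The series converges slowly here, and the p-value is effectively 1.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    return max(0.0, min(1.0, 2.0 * total))


def ks_two_sample(first, second):
    """Return (D, Z, p) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(first), sorted(second)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    z = d * math.sqrt(n * m / (n + m))
    return d, z, kolmogorov_p(z)


# Hypothetical values: two samples that do not overlap at all.
d_stat, z_stat, p_value = ks_two_sample([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

A small p-value suggests that the two sets of values were not drawn from the same distribution. The asymptotic approximation is least reliable for very small samples.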

