Distribution

Context

With the Distribution comparison function, you can compare two files containing Log-Loss values, each of which represents the fit of a dataset to a Bayesian network model.
In other words, this function only compares the measures of fit and not the underlying data itself.
As a result, you can even compare the fit of two entirely unrelated datasets to their respective models, e.g., comparing the data-to-model fit of an image recognition model to the data-to-model fit of a credit risk model.
More realistically, you can compare the fit of a Bayesian network model with the fit produced by a different modeling technique.

Usage

In Validation Mode, select Main Menu > Tools > Compare > Distribution.
A pop-up window then prompts you to select two files, each of which must contain the Log-Loss values for every record of a dataset.

Open the first network, switch into Validation Mode, and then perform **Inference → Batch Joint Probability → (Text File | Database | Internal Data | Internal Evidence Scenarios)**.
Here, it is not necessary to save the values of any nodes, so No Node should be selected.
After you choose a name for the file to be saved, BayesiaLab asks whether the logarithm of the joint probability should be saved. For our purposes, No must be selected.
Repeat the previous steps for the second network, saving the file under a different name.
The resulting CSV files each contain three columns. Delete the two rightmost columns (highlighted in red) in each file with a spreadsheet editor.
Save the edited files, maintaining the CSV format.
Once the files are prepared in this format, you can start the Comparison of Joint Probabilities via Main Menu > Tools > Compare > Distribution.
Select the previously edited files via the dialogue box and click Compare.
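
If you prefer not to use a spreadsheet editor, the column-trimming step described above can also be scripted. The following is a minimal sketch in Python, not a BayesiaLab feature: the function name `keep_first_column`, the sample header names, and the comma delimiter are assumptions made for illustration. Adjust the delimiter to match your export.

```python
import csv
import io

def keep_first_column(src, dst, delimiter=","):
    """Copy only the leftmost column of each row from src to dst."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if row:  # skip completely empty lines
            writer.writerow(row[:1])

# Hypothetical three-column export; the header names are made up.
sample = "JointProbability,ColumnB,ColumnC\n0.0125,x,y\n0.0031,x,y\n"
trimmed = io.StringIO()
keep_first_column(io.StringIO(sample), trimmed)
result = trimmed.getvalue()
```

To process real files, pass file objects opened with `open(path, newline="")` for reading and `open(path, "w", newline="")` for writing, and repeat the call once for each of the two exported files.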

Results

The results are presented in three tabs: First Data Set, Second Data Set, and Comparison.

Each panel shows the mean, standard deviation, minimum, maximum, and the number of rows computed.
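
As a cross-check, the same summary figures can be reproduced outside of BayesiaLab from one of the edited files. The sketch below assumes the values have already been read into a list of floats. The function name `summarize` and the sample values are hypothetical, and the sample (n − 1) standard deviation is used here as an assumption; the source does not state which variant BayesiaLab reports.

```python
import statistics

def summarize(values):
    """Return the summary figures listed for each dataset."""
    return {
        "rows": len(values),
        "mean": statistics.fmean(values),
        # Assumption: sample standard deviation; use statistics.pstdev
        # if the population variant is what you need to match.
        "std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

# Made-up values standing in for one column of per-record results.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = summarize(values)
```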

First Data Set

Second Data Set

Comparison

The Kolmogorov-Smirnov Test is computed to test for the equality of the probability distributions of both datasets.

More specifically, it compares the distributions of log-likelihoods. The panel also reports the K-S Z and D statistics along with the corresponding p-value.
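
To illustrate how these three figures relate to each other, here is a minimal pure-Python sketch of a two-sample Kolmogorov-Smirnov test. It is not BayesiaLab's implementation. It assumes the common textbook definitions: D is the largest absolute difference between the two empirical distribution functions, Z = D·sqrt(n·m / (n + m)), and the p-value comes from the asymptotic Kolmogorov series. The function name `ks_two_sample`, the small-Z cutoff of 0.27, and the sample values are illustrative choices.

```python
import bisect
import math

def ks_two_sample(a, b):
    """Return (D, Z, asymptotic p-value) for two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)

    # D: largest gap between the two empirical CDFs, checked at every
    # observed value (the only points where the CDFs can change).
    d = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect.bisect_right(a_sorted, x) / n
        cdf_b = bisect.bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    # Z: D scaled by the effective sample size.
    z = d * math.sqrt(n * m / (n + m))

    # Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 Z^2).
    # The series converges poorly near Z = 0, where p is effectively 1.
    if z < 0.27:
        p = 1.0
    else:
        p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                      for k in range(1, 101))
    return d, z, min(max(p, 0.0), 1.0)

# Made-up samples: identical distributions versus fully separated ones.
same = ks_two_sample([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
apart = ks_two_sample([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

A small p-value, as in the fully separated case, suggests that the two sets of values do not follow the same distribution. For small samples the asymptotic p-value is only an approximation.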
