Batch Labeling — Posterior Probabilities
Context
- Batch Labeling processes the set of observations described in a data set or in an associated Evidence Scenario File.
- The multi-dimensional observations are defined by the values of the nodes that are observable and not missing.
- The values of the Target Node and of the Not Observable nodes are not used to define the set of evidence, even when there is a corresponding value in the data set row or the Evidence Scenario.
Each observation is thus iteratively set as evidence in the network, and the posterior probability distributions of the Target Node and/or of the Not Observable variables are updated accordingly.
By default, the state with the highest posterior probability is chosen for the imputation. However, for binary nodes, a dialog box allows setting the probability acceptance threshold. If the posterior distribution is uniform, a dialog box allows defining the imputation policy.
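The evidence construction and the decision rule described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function names, the representation of a missing value (`None` or an empty string), the `uniform_fallback` parameter standing in for the imputation policy dialog, and the choice to report the probability of the selected state are all assumptions made for the example. The posterior itself is taken as given, since computing it requires inference in the network.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_evidence(row: Dict[str, Optional[str]],
                   observable_nodes: Iterable[str]) -> Dict[str, str]:
    """Keep only observable, non-missing values as evidence.

    Target and Not Observable nodes are excluded simply by not being
    listed in `observable_nodes`, even if the row holds a value for them.
    Missing values are assumed to be None or "" (hypothetical encoding).
    """
    observable = set(observable_nodes)
    return {node: value for node, value in row.items()
            if node in observable and value is not None and value != ""}


def choose_label(posterior: Dict[str, float],
                 positive_state: Optional[str] = None,
                 threshold: Optional[float] = None,
                 uniform_fallback: Optional[str] = None,
                 tol: float = 1e-9) -> Tuple[Optional[str], float]:
    """Return (imputed state, probability used for the decision)."""
    probs = list(posterior.values())

    # Uniform posterior: defer to the policy (hypothetical: a fixed fallback
    # state, or None to leave the observation unlabeled).
    if len(probs) > 1 and max(probs) - min(probs) <= tol:
        if uniform_fallback is None:
            return None, probs[0]
        return uniform_fallback, posterior[uniform_fallback]

    # Binary node with an acceptance threshold on the chosen positive state.
    if (threshold is not None and positive_state is not None
            and len(posterior) == 2):
        other = next(s for s in posterior if s != positive_state)
        chosen = positive_state if posterior[positive_state] >= threshold else other
        return chosen, posterior[chosen]

    # Default: the state with the highest posterior probability.
    best = max(posterior, key=posterior.get)
    return best, posterior[best]
```

For example, with a threshold of 0.3 on the state `"yes"`, a posterior of `{"yes": 0.4, "no": 0.6}` is labeled `"yes"`, whereas the default rule would pick `"no"`.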
The imputations are stored in an output file that takes the selected fields of the input file and adds two fields per imputed variable: one for the imputed value and one for the posterior probability used for the decision. If the data source is an external database, the fields of the input file to include in the output file are selected via the wizard illustrated below:
If the data source is the associated database, a dialog allows the user to choose which part of the data set (all, learning, or test) will be processed and which nodes will be saved in the output file. It is also possible to choose whether the states' long names are used and whether the continuous values are saved.
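To make the layout of the output file concrete, the following sketch builds one output record from an input row. It assumes a CSV output and hypothetical column names (`<node>_imputed` and `<node>_probability`); the actual file format and field names produced by the tool may differ.

```python
import csv
from typing import Dict, Iterable, List, Optional, TextIO, Tuple


def build_output_row(row: Dict[str, str],
                     selected_fields: Iterable[str],
                     imputations: Dict[str, Tuple[Optional[str], float]]) -> Dict[str, object]:
    """Copy the selected input fields, then add two fields per imputed node."""
    out: Dict[str, object] = {field: row[field] for field in selected_fields}
    for node, (state, probability) in imputations.items():
        out[f"{node}_imputed"] = state            # hypothetical column name
        out[f"{node}_probability"] = probability  # hypothetical column name
    return out


def write_output(rows: List[Dict[str, object]], stream: TextIO) -> None:
    """Write the records as CSV, using the first record's keys as the header."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Calling `build_output_row(row, ["Age"], {"Churn": ("yes", 0.82)})` yields a record with the columns `Age`, `Churn_imputed`, and `Churn_probability`, in that order.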