|
Data
|
|
Manual imputation of hidden nodes
or nodes with missing values |
When a node (or a set of nodes) has missing
values in the database associated with the network, these missing values can be
imputed directly, i.e. without running the global imputation process. In modeling mode, select
the variables and choose Imputation in the nodes' contextual
menu. The corresponding missing values in the database are then replaced by the
imputed values.
When the network contains hidden variables and has an
associated database, these nodes can be imputed in the same way.
This is the case, for example, for the Cluster node produced by
data clustering.
 |
|
|
Random selection in case of
equiprobability during imputation |
When two or more modalities are equiprobable
during imputation, the final modality is chosen randomly. |
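The tie-breaking rule can be illustrated with a minimal Python sketch. It assumes that imputation keeps the most probable modality, which the note above does not state explicitly; the function name and the example distribution are hypothetical and do not come from BayesiaLab.

```python
import random

def impute_modality(probabilities, rng=random):
    """Return the most probable modality, breaking exact ties at random.

    Assumption: imputation keeps the modality with the highest probability.
    """
    best = max(probabilities.values())
    # Collect every modality that reaches the maximum probability.
    candidates = [m for m, p in probabilities.items() if p == best]
    return rng.choice(candidates)

# Hypothetical distribution with a tie between "low" and "medium".
posterior = {"low": 0.4, "medium": 0.4, "high": 0.2}
choice = impute_modality(posterior, random.Random(0))
```

With this distribution, the result is either "low" or "medium", never "high".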
|
|
Automatic association of
discrete values for continuous nodes |
When a database is associated with a network, a
column containing discrete values can be associated with a continuous
node. If the column name is the same as the node name and the
names of the values are the same as the names of the node's intervals, the
association is performed automatically.
Note that, in this case, no numeric values are
associated with the node in the loaded database. It is then impossible to
use graphics with continuous values. |
|
|
Transfer of aggregates |
During data import, aggregates defined for one
variable can now be transferred to other variables.
Once the aggregates are defined, select the variables that should
receive the same aggregates and perform the transfer. A variable can
receive the aggregates only if it has the same modalities
as the initial variable. The transfer may not be
performed on some variables, when the resulting number of modalities
is less than 2. In that case, a list of the variables on
which the transfer failed is displayed. |
|
|
Using class.modality keys in
dictionaries |
To associate a value or a long name with a
modality, a class.modality key can be used. The value or long name
is then associated with the specified modality of all the nodes
of the specified class, which avoids enumerating all the nodes
concerned. |
|
|
Import/Export of color
dictionaries |
The color dictionary associates colors
with the specified nodes or with the nodes of the specified classes. These
colors are written in hexadecimal RGB coding with 8 bits per channel (web
coding). A color is written with 6 hexadecimal characters: 2 characters
for the red (from 00 to FF, i.e. from 0 to 255), followed by 2 characters for the
green and 2 characters for the blue. Examples of colors:
- red: FF0000
- green: 00FF00
- blue: 0000FF
- gray: 929292
- yellow: FFFF00
- pink: FFC0FF
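The coding described above can be decoded with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical and the layout of the dictionary file itself is not covered here.

```python
def parse_color(code):
    """Convert a 6-character hexadecimal RGB code into (red, green, blue)."""
    if len(code) != 6:
        raise ValueError("expected 6 hexadecimal characters, got %r" % code)
    # Two characters per channel: red, then green, then blue.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

pink = parse_color("FFC0FF")   # (255, 192, 255)
gray = parse_color("929292")   # (146, 146, 146)
```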
|
|
|
Internal database generation |
It is now possible to generate data, in memory, according to
the joint probability distribution described by the active network. These data can then be saved
to disk. |
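Generating data that follows the joint distribution of a network can be sketched with forward sampling: each node is drawn from its conditional table once its parents have been drawn. The two-node network, its probabilities and all names below are hypothetical, and this is not a description of how BayesiaLab implements the generation.

```python
import random

# Hypothetical two-node network: Rain -> WetGrass.
# Each node lists its parents and a table mapping parent values to a distribution.
NETWORK = {
    "Rain": {"parents": [], "cpt": {(): {"yes": 0.2, "no": 0.8}}},
    "WetGrass": {
        "parents": ["Rain"],
        "cpt": {
            ("yes",): {"yes": 0.9, "no": 0.1},
            ("no",): {"yes": 0.1, "no": 0.9},
        },
    },
}
ORDER = ["Rain", "WetGrass"]  # parents are listed before their children

def sample_record(rng):
    """Draw one record by sampling every node given its already-sampled parents."""
    record = {}
    for node in ORDER:
        spec = NETWORK[node]
        key = tuple(record[p] for p in spec["parents"])
        dist = spec["cpt"][key]
        record[node] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return record

def generate(n, seed=0):
    """Generate n records in memory; the seed makes the run repeatable."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]

data = generate(1000)
```

The resulting list of records could then be written to a file, for instance with the csv module.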
|
|
Import/Export of long name
dictionaries for the modalities |
A long name can be associated with each node modality
and used instead of the default name in the different kinds
of display. Like the values, these long names can be imported from a dictionary,
and they can also be exported to
dictionaries. The long names can also be used when saving the database
and during imputation, by checking the corresponding option. |
Analysis
|
|
Variable clustering |
A new analysis tool, variable clustering, has been added to BayesiaLab.
This tool clusters the network variables into groups of variables that are semantically close.
The clusters are built according to the proximity of the nodes in the graph, based on the force of the arcs.
A color is automatically associated with each cluster to highlight the clustering:

The number of clusters is computed automatically from the force of the arcs (4 in the previous example). However, the associated
toolbar contains a slider for choosing the desired number of clusters:

The validation button of this toolbar
confirms the current clustering and associates each cluster with a
class named Cluster_i.
Once the classes are created, multiple clustering can be
performed in modeling mode: for each class named
Cluster_i, it generates a synthetic variable from the nodes belonging to that class.
|
|
|
Performance of the network
over multiple thresholds |
The network performance analysis tool can now
evaluate several probability acceptance
thresholds at the same time. This helps to find the best policy to adopt
according to the results obtained with the different acceptance thresholds.
It is possible either to define a single threshold (0.5 by default, modifiable)
or to use several thresholds for the evaluation. In the latter case, the user indicates the
number of thresholds needed and the algorithm chooses the probability thresholds
by following an equal frequency distribution. The following dialog box
is used to enter the evaluation parameters:

In the result window, a combo box is used to choose the threshold
for which the confusion matrix (occurrences, reliability, precision) is displayed.

A button named Global report generates an HTML report that summarizes
the analysis over the various thresholds.
 |
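The multi-threshold evaluation described above can be sketched in Python. The sketch assumes that "equal frequency" means the thresholds are taken at evenly spaced ranks of the sorted predicted probabilities; the exact rule used by BayesiaLab is not given here, and the function names, scores and labels are hypothetical.

```python
def equal_frequency_thresholds(probabilities, count):
    """Pick `count` thresholds that split the sorted scores into equal-sized bins.

    Assumption: thresholds are the scores found at evenly spaced ranks.
    """
    ordered = sorted(probabilities)
    n = len(ordered)
    return [ordered[(i * n) // (count + 1)] for i in range(1, count + 1)]

def confusion_matrix(probabilities, labels, threshold):
    """Count outcomes when a record is accepted if its probability >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, actual in zip(probabilities, labels):
        predicted = p >= threshold
        if predicted and actual:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Hypothetical predicted probabilities of the target modality and true labels.
scores = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.95]
labels = [False, False, False, True, False, True, True, True]
thresholds = equal_frequency_thresholds(scores, 3)
report = {t: confusion_matrix(scores, labels, t) for t in thresholds}
```

Comparing the matrices in `report` side by side is what makes it possible to choose the acceptance threshold that best fits a given policy.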
Inference
|
|
Batch joint probability
|
A new tool computes the joint
probability of each record of a database, using the active Bayesian network. The batch joint
probability process can be interrupted at any moment without losing the
computed data: the data already generated are saved in the output database.
This tool can be very useful
for detecting outliers, i.e. atypical records with a very low joint probability, by taking all the variables into account. |
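For complete records, the joint probability is the product of the conditional probability of each node given its parents. The following Python sketch applies this to a small batch and flags low-probability records. The network, the records and the 0.05 cut-off are hypothetical, and records with missing values are not handled.

```python
import math

# Hypothetical two-node network: Rain -> WetGrass.
NETWORK = {
    "Rain": {"parents": [], "cpt": {(): {"yes": 0.2, "no": 0.8}}},
    "WetGrass": {
        "parents": ["Rain"],
        "cpt": {
            ("yes",): {"yes": 0.9, "no": 0.1},
            ("no",): {"yes": 0.1, "no": 0.9},
        },
    },
}

def joint_probability(record):
    """Multiply P(node | parents) over all nodes (chain rule), in log space.

    Assumes every probability involved is strictly positive.
    """
    log_p = 0.0
    for node, spec in NETWORK.items():
        key = tuple(record[p] for p in spec["parents"])
        log_p += math.log(spec["cpt"][key][record[node]])
    return math.exp(log_p)

records = [
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "yes", "WetGrass": "yes"},
    {"Rain": "yes", "WetGrass": "no"},
]
scored = [(joint_probability(r), r) for r in records]
# Hypothetical cut-off: records below it are treated as outliers.
outliers = [r for p, r in scored if p < 0.05]
```

Here only the third record (rain without wet grass) falls below the cut-off.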
|
|
Exact inference for the
compact dynamic Bayesian networks |
It is now possible to perform exact
inference with compact dynamic Bayesian networks that have dependent temporal nodes.
These dependencies must be indicated qualitatively (without modifying the original conditional probability tables, in order to keep any marginal independence) by simply adding arcs between the dependent nodes, as illustrated below by the arcs marked in red.
|
Learning
|
|
Taboo learning
optimizations |
The Taboo learning algorithm has been
optimized and its learning time has been reduced by 80% on average. |
|
|
Data Clustering Improvement |
The Data Clustering algorithm has been completely redesigned. A new score and a new search strategy yield much more relevant clusters. These clusters are more stable (differences between the marginal distributions and the actual training records associated with the cluster) and have an improved purity (the mean of the cluster probability computed from each associated training record). As a consequence, the resulting number of clusters can be lower than in the previous release (clusters that turn out to be irrelevant are eliminated).
|
|
|
Graphical representation of
the obtained clusters |
The final cluster analysis report now contains a new button named Mapping, which displays a graphical representation of the obtained clusters.

This graph displays three characteristics of the clusters:
- the color represents the purity of the clusters: the darker the blue, the higher the purity. Cluster 8 is the cluster with the highest purity in the above example;
- the size represents the prior probability of the cluster;
- the distance between two clusters represents the mean
neighborhood of the two clusters.
It is possible to rotate the graph using the two buttons at
the bottom right. |
|
|
Multiple clustering |
This tool carries out data clustering on various subsets of variables of the same Bayesian network.
The subsets are defined by the classes named Cluster_i.
The variables of each class Cluster_i are used to induce a new variable (a latent variable, i.e. one without any corresponding data in the file)
that summarizes them.
Although these classes can be defined manually, they can also be created automatically by the
variable clustering tool.
A Bayesian network is created for each
of these classes. This network contains the variables that belong to the class and the synthetic variable Cluster. At the end of the
last data clustering, a final Bayesian network is created. It contains all the Cluster variables and comes with an internal database in which
all the values of these latent variables have been inferred with respect to their corresponding network. It is also possible to add
the initial variables (the manifest variables) to that final Bayesian network.
The following dialog box is used to enter the parameters for
multiple clustering. It naturally reuses most of the parameters used for
data clustering.
In addition to the parameters shared with data
clustering, the wizard is used to select the directory where the
generated networks will be saved (one network per class Cluster_i and the final network with all the latent variables).
The wizard also offers the choice of adding all the nodes of the initial
network to the final one or not. As in data
clustering, the number of values of the latent variables can be fixed a priori or found by a random walk. It can also be defined
as the average number of values of the variables belonging to Cluster_i. The remainder is strictly identical
to data clustering.
At the end of each clustering, an automatic analysis of
the obtained Bayesian network is carried out and a target analysis report is generated. This report
is identical to the one generated by data clustering.
At the end of the last clustering, a summary report is generated. For each latent variable, this report describes
the distribution of its values on the learning set and lists the nodes sorted according to the amount of information they bring to the knowledge of that variable.
|
|
|
Running EQ from an already
connected network |
It is now possible to start the EQ learning
algorithm from a network that already contains arcs. Note that
fixed arcs are treated as normal arcs and can be removed or reversed
by the algorithm. The following dialog box is used to choose the starting
mode of the algorithm:
 |
Network
|
|
Markov blanket export |
To ease the deployment of Bayesian scoring functions, for example in direct marketing applications, a new export tool
now exports the Markov blanket of the target variable to an external programming language. The export is a text file containing a program that performs exact
inference on the target variable from the values of the Markov blanket's
nodes, in the programming language chosen in the following dialog
box:

In SAS, for example, the output file contains a SAS macro
that can be used directly.
Other languages such as SPSS, Java, C and C++ will be available soon. Bayesia will also study more specific user requests in order to increase the possibilities.
Note: this option is not included in the classic versions of
BayesiaLab. |
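As a reminder, the Markov blanket of a node consists of its parents, its children and the other parents of its children. The Python sketch below computes it from a parent list. The network structure and node names are hypothetical, and the sketch does not reproduce the exported inference program.

```python
def markov_blanket(parents, target):
    """Return the parents, children and children's other parents of `target`.

    `parents` maps each node to the list of its parents.
    """
    children = {n for n, ps in parents.items() if target in ps}
    blanket = set(parents.get(target, ()))
    blanket |= children
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(target)
    return blanket

# Hypothetical structure, given as node -> list of parents.
PARENTS = {
    "Age": [],
    "Income": ["Age"],
    "Response": ["Income"],
    "Channel": [],
    "Purchase": ["Response", "Channel"],
    "Region": [],
}

blanket = markov_blanket(PARENTS, "Response")
```

In this example the blanket of Response is Income, Purchase and Channel. Age and Region are not needed to score the target once these three nodes are known.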
|
|
Class editor |
A class editor has been added to manage all the classes directly. It is possible to add and remove classes and to modify the set of nodes contained in each class.

By selecting one or more classes in the list, it is possible to modify the
color, the image, the temporal index and the cost for all the
nodes of the selected classes. Each corresponding button opens an editor
specific to that property:

The second checkbox automatically generates a distinct color for each selected
class. Several classes must therefore be selected in the list.



The modifications are applied only when the dialog box as a whole
is validated. Note also that modifications
made through the properties apply only to the nodes that already belong to the
class, not to nodes added later. |
|
|
Display dialog of introduced
cycles during arc addition |
In modeling mode, when an added arc
introduces a directed cycle into the graph, a dialog box reports it and
offers to open a window listing the directed cycles
introduced.

In the following case, adding an arc between Dyspnea and Age
introduces 3 directed cycles. The lengths of the cycles are also indicated.
Clicking on a directed cycle in the list highlights the arcs of the cycle (pink arcs).
 |
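Adding an arc from a source node to a target node introduces one directed cycle for each directed path that already leads from the target back to the source. The Python sketch below enumerates these paths by depth-first search. The graph and node names are hypothetical and the code is only an illustration, not BayesiaLab's own procedure.

```python
def introduced_cycles(children, source, target):
    """List the directed cycles created by adding the arc source -> target.

    `children` maps each node to the list of its children in the current graph.
    Each cycle is returned as the existing path target ... source; the new arc
    closes it, so its length in arcs equals the number of nodes in the path.
    """
    cycles = []

    def walk(node, path):
        if node == source:
            cycles.append(path)
            return
        for nxt in children.get(node, ()):
            if nxt not in path:  # keep paths simple
                walk(nxt, path + [nxt])

    walk(target, [target])
    return cycles

# Hypothetical acyclic graph: A -> B, A -> C, B -> D, C -> D.
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

# Adding the arc D -> A would close two cycles of length 3.
cycles = introduced_cycles(CHILDREN, "D", "A")
lengths = [len(c) for c in cycles]
```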
|
|
Long names for modalities |
Each node modality can now have a long
name associated with it. This long name can be used in monitors and reports. It can also be used
when exporting the database (saving, imputation) to replace the (short) modality names.
A long name editor has been added to the node editor.
 |
|
|
Import/export of forbidden
arcs |
Forbidden arcs can now be imported from and
exported to a text file.
 |
|
|
Copy of the classes |
Classes are now handled when copying
nodes that belong to them: the copied nodes are
added to the classes of the initial nodes. |
Monitors
|
|
Monitors toolbar |
To ease the management of the monitors and to
extend their functionality, a monitor management toolbar has been added. It is
displayed in validation mode:

It contains 5 tools:
- removing the evidence
- removing the monitors
- resetting the gray arrows that represent the probability variations
- setting the reference for the probability variations
- highlighting the maximum probability variations
|
|
|
Highlighting the maximum posterior probabilities variations |
The corresponding button of the monitors toolbar
highlights the modalities for which the last evidence entered has caused the greatest positive and negative probability variations. The green arrow represents the maximum positive variation and the red arrow the maximum negative variation. |
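The search for the extreme variations can be sketched as follows. The sketch assumes that the maximum is taken over all the modalities of all the monitored nodes; whether the tool works globally or per monitor is not specified above. Node names and probabilities are hypothetical.

```python
def extreme_variations(before, after):
    """Return the (node, modality) pairs with the largest rise and largest drop.

    `before` and `after` map node -> {modality: probability}.
    Assumption: the search covers every modality of every monitored node.
    """
    deltas = {
        (node, m): after[node][m] - before[node][m]
        for node in after
        for m in after[node]
    }
    up = max(deltas, key=deltas.get)
    down = min(deltas, key=deltas.get)
    return (up, deltas[up]), (down, deltas[down])

# Hypothetical distributions before and after entering the last evidence.
before = {"Node1": {"yes": 0.44, "no": 0.56}, "Node2": {"yes": 0.05, "no": 0.95}}
after = {"Node1": {"yes": 0.80, "no": 0.20}, "Node2": {"yes": 0.10, "no": 0.90}}

(green, rise), (red, drop) = extreme_variations(before, after)
```

Here `green` designates the modality that would get the green arrow and `red` the one that would get the red arrow.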
|
|
Fixing the reference
probability distribution |
The corresponding button of the monitors toolbar
defines the probability distributions used as the reference for computing the probability variations (gray arrows). By default, the probability variation
is always computed from the previous distribution (i.e. the one before entering the last evidence). |
|
|
Direct probability distribution setting |
A new evidence setting tool has been added to directly enter a desired
probability distribution.

The likelihoods are computed so that the final probability distribution
of the node is the one entered by the user. The probability editing mode can be
reached in two ways: by holding down the Ctrl and Shift keys while clicking
on a modality bar, or through the contextual menu of the
monitor. A green button and a red button are then added to the monitor. The
probabilities can be entered:
- by holding down the left mouse button while choosing the desired
probability level, or
- by editing the probability value directly after a double click on
the value.
A click on the name of the modality (on the right) fixes the current
probability value (the probability bar is then green).
Once all the probabilities are entered, the green button
validates the data entry. The likelihoods and the probability
distribution are then updated. The red button cancels the probability
editing. |
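The relation between the entered distribution and the likelihoods can be sketched in Python: taking a likelihood for each modality proportional to the ratio between the desired probability and the current one yields, after normalization, exactly the desired distribution. Scaling the likelihoods so that the largest equals 1 is a convention chosen for this sketch, and the function names and numbers are hypothetical.

```python
def likelihoods_for_target(current, target):
    """Likelihoods proportional to target / current, scaled so the largest is 1."""
    ratios = {}
    for m, q in target.items():
        p = current[m]
        if p == 0.0:
            if q > 0.0:
                raise ValueError("cannot reach a positive probability from zero: %r" % m)
            ratios[m] = 0.0
        else:
            ratios[m] = q / p
    top = max(ratios.values())
    return {m: r / top for m, r in ratios.items()}

def apply_likelihoods(current, likelihoods):
    """Multiply the current distribution by the likelihoods and renormalize."""
    weighted = {m: current[m] * likelihoods[m] for m in current}
    total = sum(weighted.values())
    return {m: w / total for m, w in weighted.items()}

# Hypothetical current distribution of a node and distribution entered by the user.
current = {"low": 0.5, "medium": 0.3, "high": 0.2}
target = {"low": 0.2, "medium": 0.3, "high": 0.5}

likelihoods = likelihoods_for_target(current, target)
posterior = apply_likelihoods(current, likelihoods)
```

With these numbers the likelihoods are 0.16, 0.4 and 1.0, and `posterior` matches `target` up to rounding.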
|
|
Display sorted monitors for
the selected nodes |
The display of the monitors sorted according
to the target information gain, or to the target modality information gain, can be
restricted to the selected nodes of the graph. |
|
|
Modality long names display
option |
The contextual menu of the monitors
switches between displaying the modality labels and displaying the modality long
names. When long names are displayed, a modality that has no
long name is shown with its label instead. |
Interface
|
|
Resizable toolbar
container |
So that all the toolbars can be displayed
whatever the window size, the toolbar container is now resizable
and moves to another line the toolbars that would otherwise be truncated. It can
contain as many lines as there are toolbars.
 |
|
|
Quit button in the error
report |
The error report dialog box contains a
Quit button that exits the program without sending
the error report. |
|
|
Tooltip for the modality
names in the CPTs |
Sometimes the names of the modalities are too
long to be fully displayed in the CPT headers of the node editor. A
tooltip showing the full name is displayed when the cursor hovers over
such a header. |
|
|
Constraints, constants and
classes indicators icons |
In the graph window's status bar, at the bottom
right, three new icons are displayed:
- forbidden arcs indicator: displayed only if some forbidden arcs are
defined. Clicking on it opens the forbidden arc editor. It is enabled only
in modeling mode.
- constants indicator: displayed only if some constants are defined.
Clicking on it opens the constant editor. It is enabled only in modeling
mode.
- classes indicator: displayed only if some classes are defined.
Clicking on it opens the class editor. It is enabled only in modeling mode.
|
|
|
Selective rotation
|
The behavior of the graph rotation has been modified
so that all or part of the graph can be rotated. When no nodes
are selected, or when all the nodes are selected, the whole graph rotates around its barycenter. If only
some nodes are selected, only these nodes rotate around their own
barycenter. |
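The rotation rule can be sketched in Python. The sketch assumes that the barycenter is the unweighted mean of the node positions and that an empty selection means the whole graph; the function name, coordinates and angle are hypothetical.

```python
import math

def rotate_selection(positions, selected, angle_degrees):
    """Rotate the selected nodes around their barycenter; others stay in place.

    `positions` maps node -> (x, y) and must not be empty. An empty selection
    rotates every node. Assumption: the barycenter is the unweighted mean.
    """
    names = list(selected) if selected else list(positions)
    cx = sum(positions[n][0] for n in names) / len(names)
    cy = sum(positions[n][1] for n in names) / len(names)
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    rotated = dict(positions)
    for n in names:
        dx, dy = positions[n][0] - cx, positions[n][1] - cy
        rotated[n] = (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
    return rotated

# Hypothetical layout: only A and B are selected, C must not move.
layout = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (10.0, 10.0)}
result = rotate_selection(layout, ["A", "B"], 90)
```

A and B turn by a quarter turn around the point (1, 0), midway between them, while C keeps its position.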
|
|
Lexicographical order of the
variables in the node editor |
To give easier access to the
different nodes in the node editor, the combo box now lists the nodes in lexicographical order rather than
in creation order. |
|
|
Configurable return to selection mode
when editing networks |
When creating networks, one often switches
between node or arc creation mode and selection mode. This can now be
done in two ways: either with a right click, which returns to
selection mode on request, or by activating the option that automatically returns to
selection mode after each action. This option is in the editing preferences:
 |
|
|
Positioning grid
displayable and modifiable |
The node positioning grid can now be displayed
and modified. Its display is controlled from the corresponding menu.
The line spacing, expressed in pixels, can be modified in the display preferences. |
Security
|
|
Installation, activation and
uninstallation with administrative rights |
Installing, activating and uninstalling the
software are now done with administrative rights. |
|
|
Uninstall key |
During uninstallation, the software displays an
uninstall key that can be used to reinstall BayesiaLab on another computer
while keeping the same license. |
|
|
License server parameters
in the preferences |
In the license server version of BayesiaLab,
the server parameters can be modified in the corresponding preferences; the
changes take effect the next time the software starts. |
|