|
Data
|
|
Importing/Exporting image
dictionary |
Dictionaries of images associated with the nodes can be imported and
exported. A dictionary contains a series of associations between the name of
a node or class and the path to an image, relative to the directory that
contains the dictionary file. When the images are exported, they are saved
in PNG format in the same directory as the dictionary. |
|
|
Automatic association of
modalities with the long names |
When the user associates a database containing
the modalities' long names instead of their names, these long names are
automatically associated with the corresponding modalities. |
|
|
Buttons to extend limits during
association |
During a database association, some values in the database may fall outside
the variation domain of the corresponding continuous node. The domain can be
extended node by node. However, when many nodes are affected, the two
buttons "Database's Minima" and "Database's Maxima" extend the limits for
all the affected nodes at once. The "Network's Limits" button filters out
the lines whose values are outside the domains of the nodes.
 |
|
|
Settings to import integers as
discrete modalities |
The Settings make it possible to define how a column containing only integer
values is interpreted during import. You can specify the number of integer
modalities below which the node is automatically treated as a discrete node
rather than a continuous node. Above this number, the node is treated, a
priori, as a continuous node. The type of the node can always be changed
manually.
 |
|
|
Automatic attribution of values
for the continuous variable modalities |
During the import or the association of databases, the value of each
modality of a continuous node can be set to the mean of the database values
that fall within the corresponding interval. This property can be set
separately for import and for association in the Settings.
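The idea can be sketched in a few lines of Python. This is only an illustration, not BayesiaLab's implementation: the function name `interval_means`, the sample values and the interval limits are hypothetical.

```python
def interval_means(values, bounds):
    """Return the mean of the values falling in each interval.

    bounds is a sorted list of limits; interval i is [bounds[i], bounds[i+1]),
    except the last one, which also includes its upper limit.
    An interval that receives no value gets None.
    """
    interval_count = len(bounds) - 1
    sums = [0.0] * interval_count
    counts = [0] * interval_count
    for v in values:
        for i in range(interval_count):
            is_last = i == interval_count - 1
            if bounds[i] <= v < bounds[i + 1] or (is_last and v == bounds[-1]):
                sums[i] += v
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical column of a continuous node discretized into three intervals.
means = interval_means([1.0, 3.0, 12.0, 18.0, 30.0], [0.0, 10.0, 20.0, 40.0])
```

Each entry of `means` would then be the value attached to the corresponding modality.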
 |
|
|
Multiple selection of the
modalities when filtering during import |
When the user filters the data of one column
during import, it is now possible to select several modalities at the
same time and to apply the filter to the whole selection instead of doing
it separately for each modality. This considerably speeds up filtering
on large databases. |
|
|
Buttons to select all
continuous or discrete columns when filtering during import |
Two new buttons make it possible to select all the continuous
columns or all the discrete columns at the data filtering step during
import. They make it easy to apply a filter to all the continuous nodes or
to all the discrete ones.
 |
|
|
Possibility to save only the
data of the selected nodes |
If the user selects a set of nodes before
saving the database associated with the network, the wizard offers to save
either only the data corresponding to the selected nodes or the whole
database.
 |
|
|
Possibility to save the
continuous nodes' long names |
A long name can be associated with each modality
of a continuous node. By default, the modality's name or the continuous
value, if it exists, is used when saving. It is now possible to use the
modality's long name instead. This also applies to imputation, database
generation, etc.
 |
|
|
Possibility to save or
generate the continuous value of a continuous node |
When a database associated with a network that
contains continuous nodes is saved, it is possible to save the continuous
values contained in the database, if they exist, instead of the names of the
modalities. When the database contains no continuous values but only the
modality names of the continuous nodes, the values are generated
automatically: for each modality in the database, a number is drawn
uniformly at random between the limits of the corresponding interval.
This also applies to imputation, database generation, etc.
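A minimal sketch of this generation step, assuming each modality of a continuous node is described by its lower and upper limits. The modality names, the limits and the function name `generate_continuous_value` are hypothetical.

```python
import random


def generate_continuous_value(lower, upper, rng=random):
    """Draw one value uniformly between the limits of an interval."""
    return rng.uniform(lower, upper)


# Hypothetical discretization of a continuous node: modality -> (lower, upper).
intervals = {"<=10": (0.0, 10.0), "<=25": (10.0, 25.0), ">25": (25.0, 60.0)}

rng = random.Random(42)  # fixed seed so that the sketch is repeatable
row_modalities = ["<=10", ">25", "<=25"]
generated = [generate_continuous_value(*intervals[m], rng) for m in row_modalities]
```

Every generated value lies inside the interval of the modality it replaces, so discretizing the saved database again with the same limits gives back the original modalities.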
 |
|
|
Learning and test database |
It is now possible to define a specific column
in the database that indicates which lines are used for learning and which
are used for testing the network. For example, when using the target
evaluation of a network, this makes it possible to induce the network only
on the learning lines and to evaluate it on the testing lines without
reloading any database. Any column that contains exactly two modalities can
be defined as the data type (learning or test) column.

Learning is assigned to one of the two modalities, and the test is thus
assigned to the other.

When there is no data type, the whole database is used for learning and
analysis. |
|
|
Automatic data typing as
learning/test during import |
During import or association, when there is no
column that types the data as learning data and test data, it is possible
to define a percentage of lines that will be dedicated to the tests.

A dialog box lets you define this percentage.

The lines are chosen randomly in the database after the filtering
step.
In that case, choosing a column for the data type is no longer allowed. |
|
|
Choice of the data type for
database saving, imputation and generation |
If a database contains data for learning and
data for tests, when saving or performing imputation, the user can choose
between the learning data, the test data or both at the same time.

When generating a database, you can also choose to generate data for
learning and data for tests.
 |
|
|
Weights and data type saved in
the database's first columns |
If a database has a weight column and/or a data
type column, these columns are saved in the first positions when the
database is saved. |
|
|
K-Means discretization |
A new discretization algorithm for
continuous variables was introduced in BayesiaLab: the K-Means
algorithm. It discretizes continuous nodes very efficiently,
especially when there is no target node for the decision tree.
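For illustration, a one-dimensional K-Means discretization can be sketched as follows. This is a generic Lloyd's algorithm, not BayesiaLab's code: the initialization on evenly spaced order statistics, the use of midpoints between centers as interval limits, and the function names are all assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on a single variable; returns the sorted centers."""
    data = sorted(values)
    # Start from evenly spaced order statistics (an arbitrary choice).
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def cut_points(centers):
    """Interval limits taken as the midpoints between neighboring centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


centers = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9], k=3)
limits = cut_points(centers)
```

On this small sample the three centers settle near 1, 5 and 9, which gives two interval limits near 3 and 7. No target node is involved, which is why the approach suits unsupervised settings.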
 |
|
|
Use of K-Means if the decision
tree fails |
During the discretization of the continuous
nodes at import, if the decision tree was requested for a variable and
fails to find a correct discretization, the K-Means algorithm is
used instead of the equal frequency discretization. |
Analysis
|
|
Target modality
optimization |
This function searches for the tuples
that either maximize the probability of the target modality (likelihood
maximization) or maximize the a posteriori (the probability of the target
modality given the tuple, weighted by the occurrence probability of that
tuple). The context (i.e. the evidence) is taken into account during the
analysis.
This is an anytime algorithm. In other words, although it is an
exhaustive search over the tuples, it can be interrupted at any moment by
clicking on the red light (at the lower left corner of the graph window)
without losing the results. Furthermore, a heuristic is used to begin the
search with the most promising tuples.
The settings panel makes it possible to restrict the search to the selected
nodes. The observations' size is the size of the tuples to obtain. The
results can be associated as an internal database: simply choose the number
of examples you want to associate. If you stop the search before reaching
the requested number of examples, the associated database will be smaller.
Because some columns are completely empty, the data are typed as test data
so that the database is not used for learning.
You can save the tuples found in a file by specifying its name. The saved
observations can be filtered in 5 different ways in order to keep only the
interesting combinations of observations. The first three possibilities
are:
- The likelihood is higher than the target's initial probability
- The likelihood is higher than the best obtained likelihood
- The likelihood is higher than the defined value (the value is defined
between 0 and 1)
When the posterior probability is taken into account, two other
filters are added:
- The a posteriori is higher than the best obtained a posteriori
- The a posteriori is higher than the defined value (the value is defined
between 0 and 1)
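The difference between the two criteria can be illustrated on a toy example. The joint distribution below, over two binary nodes A and B and a binary target T, is hypothetical, and the exhaustive loop stands in for the heuristic search described above.

```python
from itertools import product

# Hypothetical joint distribution P(A, B, T); keys are (a, b, t).
toy_joint = {
    (0, 0, 0): 0.40, (0, 0, 1): 0.05,
    (0, 1, 0): 0.15, (0, 1, 1): 0.15,
    (1, 0, 0): 0.15, (1, 0, 1): 0.05,
    (1, 1, 0): 0.01, (1, 1, 1): 0.04,
}


def score_tuples(joint, target_value):
    """Return {(a, b): (likelihood, a_posteriori)} for T = target_value."""
    scores = {}
    for a, b in product((0, 1), repeat=2):
        p_tuple = sum(joint[(a, b, t)] for t in (0, 1))  # P(tuple)
        likelihood = joint[(a, b, target_value)] / p_tuple  # P(target | tuple)
        scores[(a, b)] = (likelihood, likelihood * p_tuple)  # weighted by P(tuple)
    return scores


scores = score_tuples(toy_joint, target_value=1)
best_likelihood = max(scores, key=lambda t: scores[t][0])
best_posterior = max(scores, key=lambda t: scores[t][1])
```

Here the tuple (1, 1) gives the highest likelihood (0.8) but is rare, so the a posteriori criterion prefers the more frequent tuple (0, 1).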
 |
|
|
Target node sensitivity
analysis |
This tool graphically displays the
impact of the network's variables on each modality of the target variable.
The variation range of each modality is displayed according to each node's
values. The ranges are sorted from bottom to top, from the strongest to the
weakest. The first thumbnail represents the variations of all the modalities
on one graph. The following thumbnails represent the variations for each
modality.
The analysis is performed over all the nodes or over a subset of
selected nodes. The context of the observations is taken into account and
displayed under the graphs when necessary. A contextual menu makes it
possible to display the comment associated with the nodes instead of their
names, and to copy the graph as an image.


 |
|
|
Parameters sensitivity
analysis |
This tool uses sampling to measure the impact of the
uncertainty associated with the "parameter" nodes on the target nodes. By
default, the parameter nodes are the root nodes (i.e. the nodes without
parents) and the target nodes are the leaf nodes (i.e. the nodes without
children).

The result of the simulation can be saved in a file. The result of the
analysis is presented as a curve representing the cumulative distribution
function of the probabilities of each modality and a bar chart representing
the probability density function. Besides these graphical results, the mean
and the standard deviation of the probabilities of the target modalities are
also given. The mean corresponds to the marginal probability
displayed in the monitors.


The analysis is performed over all the nodes or over a subset of
selected nodes. The context of the evidence is taken into account and
displayed under the graphs. A contextual menu makes it possible to copy the
graph as an image. |
|
|
Variable clustering report
|
An HTML report of the variable clustering is
now displayed when the user validates the current clustering.
 |
|
|
Variable clustering
dendrogram |
A button
was added to the variable clustering toolbar to display the
hierarchical representation of the current clustering as a dendrogram. The
number of clusters can still be modified interactively, and the result is
shown on the dendrogram. A contextual menu makes it possible to display
the comment associated with the node instead of its name. You can also copy
the graph as an image.
 |
|
|
Network global performance
|
This tool computes a global performance index
of the network over the associated database. The computed value corresponds
to the log-likelihood.
If the database contains data for learning and for test, the analysis is
done for each kind of data; otherwise it is done over the whole data.
When a test database is associated with the network, the result obtained
over the learning database can be compared with the result over the test
database.
The results are displayed as graphs in a window. There is a thumbnail for
each kind of database (learning and test). For each database, the results
can be displayed as a density function, in which the number of intervals
can be modified dynamically, or as a distribution function.
Density function

Distribution function

The graphs associated with the learning database and with the
test database use the same scale, so they can be compared very easily by
simply changing the selected thumbnail.
When an example of a database is impossible, i.e. it represents an
impossible combination of evidence, the example is not taken into account in
the final result and is displayed in a table in an HTML report. This report
is displayed by pressing the Skipped Rows button.
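As a rough sketch of the computation, the log-likelihood of a database can be accumulated row by row while setting aside the impossible rows. The natural logarithm, the function name and the lookup table that stands in for the network's joint probability are assumptions made for this example.

```python
import math


def global_log_likelihood(rows, joint_probability):
    """Sum log P(row) over the rows; rows with probability 0 are skipped.

    Returns the total and the indices of the skipped (impossible) rows.
    """
    total, skipped = 0.0, []
    for index, row in enumerate(rows):
        p = joint_probability(row)
        if p == 0.0:
            skipped.append(index)  # would appear in the skipped rows report
            continue
        total += math.log(p)
    return total, skipped


# Hypothetical joint probability of a two-node network, given as a table.
table = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.0}
rows = [(0, 0), (0, 1), (1, 1)]
total, skipped = global_log_likelihood(rows, table.__getitem__)
```

Running the same function once on the learning lines and once on the test lines gives the two values that the tool lets you compare.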
 |
|
|
Most probable explanation
analysis mode |
This tool computes the most probable
explanation, i.e. the case that has the highest joint probability. The
monitors are used to highlight this case. The probability of each modality
is then replaced by the likelihood that the corresponding modality belongs
to the most probable case. The context (i.e. the evidence) is taken into
account during the analysis. Each time evidence is set, the most
probable explanation is computed and the monitors are updated. The joint
probability displayed in the upper part of the monitor window
corresponds to the joint probability of this most probable explanation.
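On a very small network, the most probable explanation can be found by brute force over the joint distribution, as in the sketch below. The three-variable joint table and the function name are hypothetical, and a real inference engine would not enumerate every case.

```python
# Hypothetical joint distribution over three binary nodes (X0, X1, X2).
mpe_joint = {
    (0, 0, 0): 0.40, (0, 0, 1): 0.02,
    (0, 1, 0): 0.03, (0, 1, 1): 0.15,
    (1, 0, 0): 0.02, (1, 0, 1): 0.25,
    (1, 1, 0): 0.01, (1, 1, 1): 0.12,
}


def most_probable_explanation(joint, evidence):
    """Return the case with the highest joint probability given the evidence.

    evidence maps a variable position to its observed value.
    """
    consistent = {case: p for case, p in joint.items()
                  if all(case[i] == v for i, v in evidence.items())}
    best = max(consistent, key=consistent.get)
    return best, consistent[best]


no_evidence = most_probable_explanation(mpe_joint, {})
with_evidence = most_probable_explanation(mpe_joint, {2: 1})
```

Without evidence the best case is (0, 0, 0); once X2 = 1 is observed, the explanation changes to (1, 0, 1), which mirrors the recomputation performed each time evidence is set.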
 |
|
|
Neighborhood graphical
analysis |
This kind of analysis displays, for
the selected nodes, the set of nodes that belong to their neighborhood
according to the mode chosen in the toolbar.

The nodes that do not belong to the neighborhood of the selected node are
made translucent and are no longer selectable. When a visible node is
clicked, the nodes that do not belong to its neighborhood are made
translucent. To make the nodes visible again, simply click anywhere except
on a visible node.
The combo box makes it possible to display:
- The Markov blanket
- The spouses
- The parents. The corresponding field specifies up to which distance the
ancestors are displayed.
- The children. The corresponding field specifies up to which distance the
descendants are displayed.
- The neighbors. The corresponding field specifies up to which distance the
neighbors are displayed.
In the following example, the Markov blanket of the selected node is
displayed, i.e. the nodes that are not concerned are made translucent:
 |
|
|
KL index global
contribution percentage |
In the relationship analysis report, a column
was added that gives, for each arc, its percentage of the total
Kullback-Leibler divergence contribution. |
|
|
Pearson's analysis
improvements |
Associating values with the nodes' modalities
makes it possible to compute R, Pearson's linear correlation coefficient,
between two nodes linked by an arc. If the modalities have no associated
values, default values are defined in order to compute R (from 0 to n-1 for
a node with n modalities). The thickness of an arc is directly proportional
to the absolute value of R, and its color represents the sign of R (blue if
positive and red if not). The exact value of the correlation for each arc is
temporarily displayed in the comment of the arc.
Note: if there is no value associated with the modalities, the index of the
modality, starting from 0, is used. If the node is continuous, the value
used is the mean of each interval. If the node is discrete with integer
values as modalities, the integer represented by the modality is used.
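The computation of R with default values can be sketched as follows. Computing R from the joint probability table of the two nodes is an assumption made for this example, as are the function name and the sample tables.

```python
import math


def pearson_r(joint, x_values=None, y_values=None):
    """Pearson's R between two discrete nodes from their joint table.

    joint[i][j] is P(X = modality i, Y = modality j). When no values are
    given, the modality indices 0..n-1 are used as default values.
    """
    n, m = len(joint), len(joint[0])
    xs = x_values if x_values is not None else list(range(n))
    ys = y_values if y_values is not None else list(range(m))
    cells = [(joint[i][j], xs[i], ys[j]) for i in range(n) for j in range(m)]
    mean_x = sum(p * x for p, x, _ in cells)
    mean_y = sum(p * y for p, _, y in cells)
    cov = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in cells)
    var_x = sum(p * (x - mean_x) ** 2 for p, x, _ in cells)
    var_y = sum(p * (y - mean_y) ** 2 for p, _, y in cells)
    return cov / math.sqrt(var_x * var_y)


r_positive = pearson_r([[0.4, 0.1], [0.1, 0.4]])   # default values 0 and 1
r_negative = pearson_r([[0.0, 0.5], [0.5, 0.0]])
```

The first table gives R = 0.6 (an arc drawn in blue) and the second gives R = -1 (an arc drawn in red, at maximum thickness).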
You can use the slider to change the arc display threshold according to
the selected filter button:
- Displays only arcs having a negative correlation greater than the given
threshold in absolute value
- Displays only arcs having a correlation greater than the given threshold in
absolute value
- Displays only arcs having a positive correlation greater than the given
threshold
If all the arcs of a node become transparent, the node becomes
transparent. |
Inference
|
|
Batch MPE inference
|
For each node declared as not observable,
target or hidden, the most probable explanation is computed for each case
described in the specified database, and all the
likelihoods are saved.
The results are stored in an exploitation file that takes the selected
fields of the input file and associates the computed probability with each
modality of each not observable node. |
|
|
Batch MPE labeling |
This function performs an inference with the
most probable explanation for each case described
in the specified database, and selects the most probable modality of each
target variable. The target variables are the nodes declared as not
observable, target or hidden.
The results are stored in an exploitation file that takes the selected
fields of the input file and creates two new fields: one for the predicted
value, the other for its corresponding probability. |
|
|
Batch inference
optimization |
The batch inference algorithm was optimized in
order to increase the computation speed when all the variables that belong
to the Markov blanket are observed. |
Learning
|
|
Stratification |
When the target value has a very weak
representation (as is usually the case for fraud, for example),
stratification makes it possible to modify the probability distribution of
the target variable (by using the weights). This modification of the
probability distribution can then make it possible to learn a network that
is structurally more complex. Once the structure is learned, the parameters
(i.e. the probability tables) are estimated on the unstratified data. In the
following dialog box, you can indicate the proportion that you want to
obtain for each modality of the specified node. The initial value
corresponds to the proportion in the database. Simply move the slider or
edit the value directly for each modality.
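One simple way to obtain the requested proportions with weights is to give each modality the ratio between its wanted proportion and its observed proportion. This is a sketch of the principle only; the function name and the sample column are hypothetical, and the actual weighting scheme used by the software is not described here.

```python
from collections import Counter


def stratification_weights(target_column, wanted):
    """Weight per modality so that weighted proportions match `wanted`."""
    counts = Counter(target_column)
    total = len(target_column)
    return {m: wanted[m] / (counts[m] / total) for m in counts}


# Hypothetical target column in which fraud represents 2% of the lines.
column = ["ok"] * 98 + ["fraud"] * 2
weights = stratification_weights(column, {"ok": 0.8, "fraud": 0.2})

# Weighted proportion of fraud after stratification.
weighted_total = sum(weights[m] for m in column)
weighted_fraud = sum(weights[m] for m in column if m == "fraud")
fraud_share = weighted_fraud / weighted_total
```

With these weights each fraud line counts about ten times, so the weighted share of fraud rises from 2% to 20% while the unweighted data remain available for estimating the parameters.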

When a stratification is done, a stratification icon
is displayed in the status bar. The stratification can be removed by
right-clicking on this icon to display the contextual menu and choosing
to remove the stratification. |
|
|
Observed decision nodes
taken into account in static policies |
The evidence set on the decision nodes
is now taken into account in the static policy computation. |
|
|
Maximum spanning tree |
This learning algorithm is by far the quickest
unsupervised learning algorithm, because it relies on only two passes. The
first one computes the a priori weight of all the binary
relations between all the variables, and the second one then
constructs the maximum weight spanning tree from those relations. Even if
the resulting network is not optimal, it can be used for a first
imputation of the missing values, as the initial network
before using Taboo or EQ, and for variable
clustering when there are many variables.
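The second pass can be illustrated with Kruskal's algorithm, a standard way of building a maximum weight spanning tree. The source does not say which algorithm BayesiaLab uses, and the weights below are hypothetical results of the first pass.

```python
def maximum_spanning_tree(nodes, weighted_pairs):
    """Kruskal's algorithm, considering the heaviest relations first."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parents up to the root of the component (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, a, b in sorted(weighted_pairs, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the relation does not close a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical weights of the binary relations between four variables.
pairs = [(0.9, "A", "B"), (0.2, "A", "C"), (0.7, "B", "C"),
         (0.4, "C", "D"), (0.1, "B", "D")]
tree = maximum_spanning_tree("ABCD", pairs)
```

The result is an undirected tree with one relation fewer than the number of variables; orienting its arcs is a separate step.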
The user can choose between two different scoring methods for this
learning: the Minimum Description Length and Pearson's correlation.
However, arcs that are fixed (the blue ones) are treated as normal arcs,
whereas the forbidden arcs are taken into account.
At the end of the learning, a tree without oriented arcs is obtained. To
obtain a Bayesian network, the arcs are oriented so as to avoid
introducing V-structures. However, the use of fixed arcs can introduce
V-structures. |
|
|
Clustering speed
improvement |
Several optimizations significantly improve the
speed of the clustering algorithm. |
|
|
Semi-supervised clustering |
A weight editor is now available in the data
clustering wizard. These weights, with a default value of 1, are associated
with the variables and make it possible to guide the clustering. A weight
greater than 1 gives the variable more importance during the
clustering. A zero weight makes the variable purely illustrative.
 |
|
|
Fixed arcs kept in the
multiple clustering |
The arcs fixed by the user in the initial
network, on which a multiple clustering is done, are now copied into each
created sub-network when possible, i.e. when both ends of the arc belong
to the new network. |
|
|
Cluster values |
In order to ease the understanding of the
obtained clusters, and if at least one variable used in the clustering has
numerical values associated to its modalities, the modalities of the node
Cluster will have long names automatically associated. This name will
contain the mean value of all the clustered variables obtained when
observing the modality of the Cluster.
 |
|
|
Network structural
compression rate |
In addition to the data
compression rate, the network structural compression rate is computed and
displayed in the console at the end of any structural learning. This ratio
is the current number of arcs in the network divided by the theoretical
maximum number of arcs.
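As a worked example, assume that the theoretical maximum is the number of arcs of a fully connected directed acyclic graph, n(n-1)/2 for n nodes. This reading of "theoretical maximum" and the function name are assumptions.

```python
def structural_compression_rate(node_count, arc_count):
    """Current number of arcs divided by the maximum number of arcs of a DAG."""
    if node_count < 2:
        raise ValueError("at least two nodes are needed to have an arc")
    max_arcs = node_count * (node_count - 1) // 2
    return arc_count / max_arcs


# A tree over 10 nodes has 9 arcs out of a possible 45.
rate = structural_compression_rate(node_count=10, arc_count=9)
```

Under this assumption, a maximum spanning tree over 10 nodes gives a ratio of 0.2.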
This ratio is also accessible in the descriptive report of the graph. |
Network
|
|
Markov blanket exported in
PHP and JavaScript |
Two export modules for the target's Markov blanket
of a Bayesian network were added to the existing SAS module. It is possible
to export to PHP and JavaScript in order to integrate these scripts into
web content. These modules are available only through Bayesia S.A., which
can export your networks according to your needs and can encapsulate the
script to make it easy to integrate.
Note: this option is not present in the basic versions of
BayesiaLab. |
|
|
Network descriptive report |
The contextual menu of the network makes it possible to
generate a descriptive report of the network in HTML. It contains a summary
of the network structure. If some nodes of the network contain errors, the
list of the errors and warnings is also displayed.
 |
|
|
Displaying the class and
forbidden arc total numbers |
In the class editor, the total number of classes is
now displayed and is dynamically updated when the user adds or removes a
class. In the same way, the forbidden arc editor dynamically displays the
current number of forbidden arcs. |
|
|
Constraint node outgoing arcs
become forbidden |
A constraint node cannot, by definition, have
any children. This is why adding an outgoing arc from a constraint
node is now forbidden. |
|
|
Editable node name font |
It is now possible to modify the font used to
display the network node names. This is done through the
Property menu located in the network contextual
menu.

It is also possible to change the default font used for all networks
in the Settings. This avoids having to modify the font for each network. |
|
|
Property menu in the
network's contextual menu |
A Property menu was added to the contextual
menu of the network. It groups together adding or removing the
background image, editing the node font and editing the comment
associated with the network. |
|
|
Delete unfixed arcs menu |
Unfixed arcs can be removed from the network through
a menu item added to the contextual menu of the network. |
|
|
Connected root nodes
selection |
The Select menu of the node contextual menu now
contains the possibility to select the root nodes connected directly or not
to the current node. |
Monitors
|
|
Evolution of the node
values displayed in the monitors |
As observations are made on the
network, the value of each node, computed from the values
associated with its modalities, changes. The change in this value from one
given evidence to the next is now displayed in the monitors to the right of
the current value. |
|
|
Number of cases
corresponding to each modality displayed in the tooltip |
When a database is associated with a network, it
is possible to display, for each modality of each node, the number of
database cases represented by the probability of the modality given the
current evidence.
 |
|
|
Computation of the network
uncertainty and probability variations |
From the contextual menu of the monitor panel,
it is possible to display, in the information panel of the monitors,
the uncertainty and likelihood variations of the network.
- Uncertainty: this value represents the uncertainty variations over the
unobserved nodes relative to the fully disconnected network. This value
is computed from the entropy (the highest entropy corresponds to the
uniform distributions and the lowest one to a probability of 100% on a
modality).
- Likelihood: this value represents the likelihood variations of the
Bayesian network relative to the fully disconnected network. This
likelihood is computed from the joint probabilities of the evidence set.
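The two extremes of the entropy mentioned above can be checked with a short sketch. Only the entropy of a single distribution is shown; how the variation relative to the fully disconnected network is derived from it is not specified here, and the use of base 2 is an assumption.

```python
import math


def entropy(distribution):
    """Shannon entropy in bits; zero-probability modalities contribute 0."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


highest = entropy([0.25, 0.25, 0.25, 0.25])  # uniform over four modalities
lowest = entropy([1.0, 0.0, 0.0, 0.0])       # 100% on a single modality
```

For a node with four modalities, the uniform distribution reaches the maximum of 2 bits, and a probability of 100% on one modality gives 0.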
These values are computed when the corresponding option is checked in the
contextual menu of the monitor panel. |
|
|
Information panel closing
button |
A button
allows the user to close (and reopen) the information panel if it takes up
too much space in the monitor panel.
 |
Interface
|
|
Edit menu modification
|
The Edit menu contains a new menu that allows
the selection of the discrete, continuous, constraint, decision or utility
nodes according to the chosen sub-menu.
It also contains a menu used to remove all the unfixed arcs, all the
arcs, the disconnected nodes or the virtually disconnected nodes (KL Force). |
|
|
Nodes and arcs selection
improvement |
The select/unselect actions of the nodes and
the arcs were dramatically improved in speed. |
|
|
Nodes and arcs centered
during search |
When the user searches for nodes or arcs
with the corresponding wizard, the results that are selected in the list are
now centered in the network window so that they can be located
more easily. |
|
|
Node editor's size and
position kept |
When the user changes the node to edit with the
combo box located in the node editor, the editor keeps its current position
and dimensions. It is no longer automatically centered on the screen
or resized to its initial size. |
|
|
Translucent node names
|
The node names display follows the graphic
behavior of the node. If a node becomes translucent for various
reasons, its name also becomes translucent. |
|
|
Indicators on the database
icon |
Two new indicators were added to the database
icon: the stratification indicator and the test database indicator. The
four possible indicators are:
- the database stratification indicator
- the weight indicator
- the test database indicator (data typing)
- the missing value indicator
The icon may look like this:
 |
|
|
Contextual menu on the
database icon |
A contextual menu was added to the database
icon. Depending on the case, it makes it possible to remove the
database stratification, to remove the weights or to remove the
data typing. |
|
|
Translucent nodes and arcs
not selectable |
Translucent nodes and arcs are no longer
selectable. They must become visible again before they can be
selected. |
|
|
Management of the layers
associated to the classes |
When classes are defined in the network, the
icon of the classes indicator is displayed in the status bar.
A click on the icon opens the classes editor dialog. A right click on the
icon displays the list of the classes. If a class is selected, it is
displayed; if it is deselected, it is hidden.

The checkbox named All is a shortcut to select or unselect all the
checkboxes at the same time. When a class is not checked, the nodes that are
part of it become transparent and are no longer selectable. If an arc lies
between two transparent nodes, it also becomes transparent. |
|
|
Scrolling of the window
during arc drawing or selection |
When the user draws an arc that is
longer than the window, the graph window scrolls automatically as
soon as the mouse is close to the border. In the same way, when the user
selects a zone that is not entirely displayed, the window scrolls
when the mouse cursor reaches the border of the window. |
|
|
Zoom centered on the
selection |
When one node or all the nodes are selected and
the user zooms, the network remains centered, as much as
possible, on the selection. |
|
|
Best fit of the selection
to the window |
The selected nodes can now be fitted to the size of the
window. If no node is selected, the adjustment
is done on the whole network. |
|
|
Proximity of the nodes of
the same class taken into account in the genetic positioning |
A new factor was added to the evaluation function of the
genetic positioning algorithm. It groups together the nodes that
belong to the same classes.
|
Plugins
|
|
User function plugins |
BayesiaLab can automatically include user-defined
functions in its equation editor, so that they can be used to generate
the conditional probability tables of the nodes.
It is possible to interface with Java, C, C++, FORTRAN, Mathematica, etc.
This can be done directly, through JNI or through specialized libraries.
For further information and advice, contact Bayesia S.A.
To allow this integration, a Java interface is included in the library
BayesiaLab.jar located in BayesiaLab's installation directory. To
create a function, the user must implement this interface in
a Java class.
Once the plugin has been created and BayesiaLab has been restarted, the
plugin is loaded and is available in the equation editor. In the following
example, the user function Sum simply adds two real numbers:
 |
Security
|
|
Encryption of the XBL |
The Bayesian network save files are now encrypted
in order to protect the confidentiality of their contents if the user wants
to distribute networks. |
Java
|
|
Upgrading to Java 6 |
BayesiaLab was upgraded to run under a Java
6 virtual machine. It can benefit from the technical improvements of this
new version of Java. However, compatibility with Java 5 was preserved for
those who do not have Java 6. |
|