
BayesiaLab 4.3: new features


Data


Importing/Exporting image dictionary

 

It is possible to import and export dictionaries of images associated with the nodes. The dictionary contains a series of associations between the name of a node or class and the path to an image, relative to the directory that contains the dictionary file. When the images are exported, they are saved in PNG format in the same directory as the dictionary.

Automatic association of modalities with the long names

 

When the user associates a database that contains the modalities' long names instead of their names, these long names are automatically matched to the corresponding modalities.

Buttons to extend limits during association

 

During a database association, some values in the database may fall outside the variation domain of the corresponding continuous node. The domain can be extended node by node. However, when many nodes are affected, the two buttons "Database's Minima" and "Database's Maxima" extend the limits for all the affected nodes at once. The "Network's Limits" button filters out the lines whose values are outside the domains of the nodes.

Extend limit


Settings to import integers as discrete modalities

 

The Settings define how a column containing only integer values is interpreted during import. We can specify a threshold on the number of integer modalities: below this number, the node is automatically treated as a discrete node rather than a continuous one; above it, the node is treated, a priori, as a continuous node. The type of the node can always be changed manually.
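The rule can be sketched as follows in Python. This is only an illustration: the function name, the parameter `max_discrete_modalities` and the treatment of a column that has exactly the threshold number of values are assumptions, not BayesiaLab setting names.

```python
def guess_node_type(column, max_discrete_modalities):
    """Guess the node type of a column that contains only integers.

    Assumption: a column with at most `max_discrete_modalities` distinct
    values is treated as discrete. The release notes do not say how the
    boundary case is handled.
    """
    distinct_values = set(column)
    if len(distinct_values) <= max_discrete_modalities:
        return "discrete"
    return "continuous"
```

With a threshold of 5, a column holding only the values 1, 2 and 3 would be proposed as discrete, while a column with a hundred distinct integers would be proposed as continuous.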

Discrete import


Automatic attribution of values for the continuous variable modalities

 

During the import or association of a database, each modality of a continuous node can be given, as its value, the mean of the database values that fall within the corresponding interval. This option can be set separately for import and for association in the Settings.
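A minimal sketch of this computation, assuming the intervals of the continuous node are described by a sorted list of cut points (the names `interval_means` and `thresholds` are hypothetical, and the side on which a boundary value falls is an assumption):

```python
from bisect import bisect_right


def interval_means(values, thresholds):
    """Return the mean of the values falling in each interval.

    `thresholds` are the sorted cut points between intervals, so
    k thresholds define k + 1 intervals. Assumption: a value equal to
    a threshold goes to the interval on its right.
    """
    sums = [0.0] * (len(thresholds) + 1)
    counts = [0] * (len(thresholds) + 1)
    for value in values:
        index = bisect_right(thresholds, value)
        sums[index] += value
        counts[index] += 1
    # An interval with no data has no mean; None marks that case.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

An interval that receives no data keeps no mean here; how BayesiaLab handles that case is not stated in these notes.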

Interval value


Multiple selection of the modalities when filtering during import

 

When the user filters the data of a column during import, it is now possible to select several modalities at once and to apply the filter to the whole selection, instead of filtering each modality separately. This greatly speeds up filtering on large databases.

Buttons to select all continuous or discrete columns when filtering during import

 

Two new buttons at the data filtering step of the import select all the continuous columns or all the discrete columns. They make it easy to apply a filter to all the continuous nodes or to all the discrete ones.

Select continuous discrete


Possibility to save only the data of the selected nodes

 

If the user selects a set of nodes before saving the database associated with the network, the wizard offers to save either only the data corresponding to the selected nodes or the whole database.

Save node


Possibility to save the continuous nodes' long names

 

A long name can be associated with each modality of a continuous node. By default, saving uses the modality's name or the continuous value, if it exists. It is now possible to use the modality's long name instead. This also applies to imputation, database generation, etc.

Long names


Possibility to save or generate the continuous value of a continuous node

 

When a database associated with a network that contains continuous nodes is saved, the continuous values contained in the database, if they exist, can be saved instead of the names of the modalities. When the database contains no continuous values but only the modality names of the continuous nodes, the values are generated automatically: for each modality in the database, a number is drawn uniformly at random between the limits of the corresponding interval. This also applies to imputation, database generation, etc.
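The automatic generation can be illustrated with a short Python sketch. The interval limits, modality names and function name below are made up for the example; only the uniform draw between the interval limits comes from the description above.

```python
import random


def generate_column(modalities, intervals, seed=None):
    """Replace each modality name by a value drawn uniformly in its interval.

    `intervals` maps a modality name to its (lower, upper) limits.
    """
    rng = random.Random(seed)
    return [rng.uniform(*intervals[modality]) for modality in modalities]


# Hypothetical intervals of a continuous node with two modalities.
example_intervals = {"low": (0.0, 10.0), "high": (10.0, 50.0)}
example_values = generate_column(["low", "high", "low"], example_intervals, seed=0)
```

Each generated value lies within the limits of the interval of the modality it replaces.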

Continuous values


Learning and test database

 

It is now possible to designate a column of the database that indicates which lines are used for learning and which are used for testing the network. For example, for the target evaluation of a network, this makes it possible to learn the network on the learning lines only and to evaluate it on the test lines without reloading any database. Any column that contains only two modalities can be chosen as the data type (learning or test) column.

Data typing

One of the two modalities is assigned to learning, and the other is thus assigned to test.

Data typing

When there is no data type, the whole database is used for learning and analysis.


Automatic data typing as learning/test during import

 

During import or association, when no column types the data as learning data or test data, it is possible to define a percentage of lines that will be reserved for testing.

Automatic typing

The following dialog box is used to define this percentage:

Automatic typing

The lines are chosen at random in the database after the filtering step.

In that case, choosing a column as the data type is no longer allowed.
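The random typing of lines can be sketched as follows. The function name and the rounding of the number of test lines are assumptions made for the example; the notes only state that a percentage of lines is chosen at random.

```python
import random


def assign_data_types(row_count, test_percentage, seed=None):
    """Label each line as "learning" or "test", picking test lines at random."""
    rng = random.Random(seed)
    # Assumption: the number of test lines is rounded to the nearest integer.
    test_count = round(row_count * test_percentage / 100)
    test_rows = set(rng.sample(range(row_count), test_count))
    return ["test" if index in test_rows else "learning"
            for index in range(row_count)]


example_types = assign_data_types(200, 25, seed=1)
```

Here 50 of the 200 lines are typed as test data and the remaining 150 as learning data.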


Choice of the data type for database saving, imputation and generation

 

If a database contains both learning data and test data, the user can choose, when saving or performing imputation, between the learning data, the test data or both.

Type saving

When generating a database, we can also choose to generate learning data and test data.

Type generating


Weights and data type saved in the database's first columns

 

If a database has a weight column and/or a data type column, these columns are saved in the first positions when the database is saved.

K-Means discretization

 

A new discretization algorithm for continuous variables was introduced in BayesiaLab: the K-Means algorithm. It discretizes continuous nodes very efficiently, especially when there is no target node for the decision tree.
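As an illustration of the general idea, here is a minimal one-dimensional K-Means sketch in Python (Lloyd's iterations). It is not BayesiaLab's implementation: the initialization on evenly spaced quantiles and the choice of interval limits halfway between neighboring centroids are assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster one-dimensional values into k groups; return sorted centroids."""
    data = sorted(values)
    n = len(data)
    # Assumption: start from evenly spaced quantiles of the data.
    centroids = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def thresholds_from_centroids(centroids):
    """Place each interval limit halfway between two neighboring centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


example_data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
example_centroids = kmeans_1d(example_data, 3)
example_thresholds = thresholds_from_centroids(example_centroids)
```

On this small data set the three centroids settle near 1, 5 and 9, which gives interval limits near 3 and 7. No target node is needed, which is the situation the paragraph above highlights.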

K-means


Use of K-Means if the decision tree fails

 

During the discretization of continuous nodes at import, if the decision tree was requested for a variable and fails to find a correct discretization, the K-Means algorithm is used instead of the equal frequency discretization.

Analysis


Target modality optimization

 

This function searches for the tuples that either maximize the probability of the target modality (likelihood maximization) or maximize the a posteriori (the probability of the target modality given the tuple, weighted by the probability of occurrence of that tuple). The context (i.e. the evidence) is taken into account during the analysis.

This is an anytime algorithm. In other words, although the search over the tuples is exhaustive, it can be interrupted at any moment by clicking on the red light (in the lower left corner of the graph window) without losing the results. Furthermore, a heuristic lets the search begin with the most promising tuples.

The settings panel makes it possible to restrict the search to the selected nodes. The observations' size is the size of the tuples we want to obtain. The results can be associated as an internal database: simply choose the number of examples you want to associate. If you stop the search before reaching the requested number of examples, the associated database will be smaller. Because some columns are completely empty, the data are typed as test data so that the database is not used for learning.

You can save the tuples found to a file by specifying its name. The saved observations can be filtered in five different ways in order to keep only the interesting combinations of observations. The first three possibilities are:

  • The likelihood is higher than the target's initial probability
  • The likelihood is higher than the best obtained likelihood
  • The likelihood is higher than the defined value (the value is defined between 0 and 1)

When the posterior probability is taken into account, two other filters are added:

  • The a posteriori is higher than the best obtained a posteriori
  • The a posteriori is higher than the defined value (the value is defined between 0 and 1)

Target modality optimization


Target node sensitivity analysis

 

This tool displays graphically the impact of the network's variables on each modality of the target variable. The variation range of each modality is displayed according to each node's values. The ranges are sorted from bottom to top, from the strongest to the weakest. The first thumbnail shows the variations of all the modalities on one graph. The following thumbnails show the variations for each modality.

The analysis is performed over all the nodes or over a subset of selected nodes. The context of the observations is taken into account and displayed under the graphs when necessary. A contextual menu makes it possible to display the comment associated with the nodes instead of their names, and to copy the graph as an image.

Target sensitivity



Parameters sensitivity analysis

 

This tool measures, by sampling, the impact of the uncertainty associated with the "parameter" nodes on the target nodes. By default, the parameter nodes are the root nodes (i.e. the nodes without parents) and the target nodes are the leaf nodes (i.e. the nodes without children).

Parameter sensitivity

The result of the simulation can be saved to a file. The result of the analysis is presented as a curve representing the cumulative distribution function of the probabilities of each modality, and a bar chart representing the probability density function. Besides these graphical results, the mean and the standard deviation of the probabilities of the target modalities are also given. As expected, the mean corresponds to the marginal probability displayed in the monitors.

Parameter sensitivity

The analysis is performed over all the nodes or over a subset of selected nodes. The context of the evidence is taken into account and displayed under the graphs. A contextual menu makes it possible to copy the graph as an image.


Variable clustering report

 

An HTML report of the variable clustering is now displayed when the user validates the current clustering:

Variable clustering report


Variable clustering dendrogram

 

The Dendrogram button was added to the variable clustering toolbar in order to display the hierarchical representation of the current clustering as a dendrogram. It is still possible to modify the number of clusters interactively and to observe the result on the dendrogram. A contextual menu makes it possible to display the comment associated with the node instead of its name. You can also copy the graph as an image.

Variable clustering dendrogram


Network global performance

 

This tool computes a global performance index of the network over the associated database. The computed value corresponds to the log-likelihood.

If the database contains both learning data and test data, the analysis is done for each kind of data; otherwise it is done over the whole data. When a test database is associated with the network, we can compare the result obtained on the learning database with the result obtained on the test database.

The results are displayed as graphs in a window. There is a thumbnail for each kind of database (learning and test). For each database, the results can be viewed either as a density function, in which the number of intervals can be modified dynamically, or as a distribution function.

Density function

Global performance learning

Distribution function

Global performance learning

The graphs associated with the learning database and with the test database use the same scale, so that they can be compared very easily by simply changing the selected thumbnail.

When an example of the database is impossible, i.e. it represents an impossible combination of evidence, the example is not taken into account in the final result and is displayed in a table in an HTML report. This report is displayed by pressing the Skipped Rows button.
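The computation of a log-likelihood with skipped impossible rows can be sketched in Python as follows. The two-node network, its probability tables and all names are hypothetical, and the use of the natural logarithm is an assumption: the notes do not state which base BayesiaLab uses.

```python
import math


def log_likelihood(rows, joint_probability):
    """Sum the log-probabilities of the rows; also return skipped row indices."""
    total = 0.0
    skipped_rows = []
    for index, row in enumerate(rows):
        probability = joint_probability(row)
        if probability == 0.0:
            # Impossible combination: leave it out and report it.
            skipped_rows.append(index)
            continue
        total += math.log(probability)
    return total, skipped_rows


# Hypothetical two-node network A -> B used only for this example.
P_A = {"a1": 0.5, "a2": 0.5}
P_B_GIVEN_A = {("a1", "b1"): 1.0, ("a1", "b2"): 0.0,
               ("a2", "b1"): 0.5, ("a2", "b2"): 0.5}


def example_joint(row):
    """Joint probability P(A, B) = P(A) * P(B | A) of one database row."""
    a, b = row
    return P_A[a] * P_B_GIVEN_A[(a, b)]


example_rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
example_total, example_skipped = log_likelihood(example_rows, example_joint)
```

The second row has probability zero under this network, so it is listed among the skipped rows instead of contributing to the total.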

Global performance skipped rows


Most probable explanation analysis mode

 

This tool computes the most probable explanation, i.e. the case that has the highest joint probability. The monitors are used to highlight this case. The probability of each modality is then replaced by the likelihood that the corresponding modality belongs to the most probable case. The context (i.e. the evidence) is taken into account during the analysis. Each time an evidence is set, the most probable explanation is computed and the monitors are updated. The joint probability displayed in the upper part of the monitor window corresponds to the joint probability of this most probable explanation.
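For a very small network, the most probable explanation can be found by enumerating every case, as in the Python sketch below. This brute-force search only illustrates the definition; it is not the algorithm used by BayesiaLab, and the Rain/WetGrass network and its probabilities are invented for the example.

```python
from itertools import product


def most_probable_explanation(domains, joint_probability, evidence=None):
    """Return the case with the highest joint probability that matches the evidence."""
    evidence = evidence or {}
    names = list(domains)
    best_case, best_probability = None, -1.0
    for combination in product(*(domains[name] for name in names)):
        case = dict(zip(names, combination))
        if any(case[name] != value for name, value in evidence.items()):
            continue  # the case contradicts the evidence
        probability = joint_probability(case)
        if probability > best_probability:
            best_case, best_probability = case, probability
    return best_case, best_probability


# Hypothetical network Rain -> WetGrass.
DOMAINS = {"Rain": ["yes", "no"], "WetGrass": ["yes", "no"]}
P_RAIN = {"yes": 0.2, "no": 0.8}
P_WET_GIVEN_RAIN = {("yes", "yes"): 0.9, ("yes", "no"): 0.1,
                    ("no", "yes"): 0.1, ("no", "no"): 0.9}


def rain_joint(case):
    """Joint probability P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)."""
    return P_RAIN[case["Rain"]] * P_WET_GIVEN_RAIN[(case["Rain"], case["WetGrass"])]


mpe_without_evidence = most_probable_explanation(DOMAINS, rain_joint)
mpe_with_evidence = most_probable_explanation(DOMAINS, rain_joint, {"WetGrass": "yes"})
```

Without evidence, the most probable case is no rain and dry grass. Once WetGrass is observed as "yes", the most probable explanation switches to rain.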

MPE


Neighborhood graphical analysis

 

This kind of analysis shows, for the selected nodes, the set of nodes that belong to their neighborhood according to the mode chosen in the toolbar:

Neighborhood analysis

The nodes that do not belong to the neighborhood of the selected node are made translucent and can no longer be selected. When we click on a visible node, the nodes that do not belong to its neighborhood are made translucent. To make the nodes visible again, just click anywhere except on a visible node.

It is possible to display, through the combo box:

  • The Markov blanket
  • The spouses
  • The parents. The corresponding field specifies up to which distance the ancestors are displayed.
  • The children. The corresponding field specifies up to which distance the descendants are displayed.
  • The neighbors. The corresponding field specifies up to which distance the neighbors are displayed.
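As a reminder of what these sets contain, the Markov blanket of a node is made of its parents, its children and its spouses (the other parents of its children). A minimal Python sketch, assuming the network is given as a list of (parent, child) arcs with made-up node names:

```python
def parents_of(arcs, node):
    """Nodes with an arc pointing to `node`."""
    return {parent for parent, child in arcs if child == node}


def children_of(arcs, node):
    """Nodes that `node` points to."""
    return {child for parent, child in arcs if parent == node}


def spouses_of(arcs, node):
    """Other parents of the children of `node`."""
    children = children_of(arcs, node)
    return {parent for parent, child in arcs
            if child in children and parent != node}


def markov_blanket(arcs, node):
    """Parents, children and spouses of `node`."""
    return parents_of(arcs, node) | children_of(arcs, node) | spouses_of(arcs, node)


# Hypothetical network: A -> C, B -> C, C -> D, E -> D, D -> F.
example_arcs = [("A", "C"), ("B", "C"), ("C", "D"), ("E", "D"), ("D", "F")]
```

For node C in this example, the parents are A and B, the only child is D and the only spouse is E, so the Markov blanket is {A, B, D, E}; F stays outside it.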

In the following example, the Markov blanket of the selected node is displayed, i.e. the nodes outside it are made translucent:

Neighborhood analysis graph


KL index global contribution percentage

 

In the relationship analysis report, a column was added that contains, for each arc, its percentage of the total Kullback-Leibler divergence contribution.

Pearson's analysis improvements

 

Associating values with the nodes' modalities makes it possible to compute R, Pearson's linear correlation coefficient between two nodes linked by an arc. If the modalities have no associated values, default values are defined in order to compute R (from 0 to n-1 for a node with n modalities). The thickness of an arc is directly proportional to the absolute value of R, and its color represents the sign of R (blue if positive, red otherwise). The exact value of the correlation for each arc is temporarily displayed in the comment of the arc.

Note: if no value is associated with the modalities, the index of the modality, starting from 0, is used. If the node is continuous, the value used is the mean of each interval. If the node is discrete with integer values as modalities, the integer represented by the modality is used.
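A sketch of how R can be obtained from the joint probability table of two nodes and the values associated with their modalities. The function name and the table layout are assumptions for the example; the default values 0 to n-1 follow the note above.

```python
import math


def pearson_from_joint(joint, x_values=None, y_values=None):
    """Pearson's R of two discrete nodes, given joint[i][j] = P(X = i, Y = j).

    When no values are given, the modality indexes 0..n-1 are used.
    The result is undefined (division by zero) if one node has zero variance.
    """
    if x_values is None:
        x_values = list(range(len(joint)))
    if y_values is None:
        y_values = list(range(len(joint[0])))
    mean_x = mean_y = mean_xx = mean_yy = mean_xy = 0.0
    for i, row in enumerate(joint):
        for j, probability in enumerate(row):
            x, y = x_values[i], y_values[j]
            mean_x += probability * x
            mean_y += probability * y
            mean_xx += probability * x * x
            mean_yy += probability * y * y
            mean_xy += probability * x * y
    covariance = mean_xy - mean_x * mean_y
    variance_x = mean_xx - mean_x ** 2
    variance_y = mean_yy - mean_y ** 2
    return covariance / math.sqrt(variance_x * variance_y)
```

A joint table concentrated on the diagonal, such as [[0.5, 0], [0, 0.5]], gives R = 1, while a table in which the two nodes are independent gives R = 0.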

Pearson

You can use the slider to change the arc display threshold according to the selected filter button:

  • Negative correlation: displays only the arcs that have a negative correlation greater than the given threshold in absolute value
  • Absolute correlation: displays only the arcs that have a correlation greater than the given threshold in absolute value
  • Positive correlation: displays only the arcs that have a positive correlation greater than the given threshold

If all the arcs of a node become transparent, the node becomes transparent as well.


Inference


Batch MPE inference

 

For each node declared as not observable, target or hidden, we compute, for each case described in the specified database, the most probable explanation, and all the likelihoods are saved.

The results are stored in an exploitation file that takes the selected fields of the input file and adds, for each modality of each not observable node, the computed probability.


Batch MPE labeling

 

This performs an inference with the most probable explanation for each case described in the specified database, and selects the most probable modality of each target variable. The target variables are the nodes declared as not observable, target or hidden.

The results are stored in an exploitation file that takes the selected fields of the input file and creates two new fields: one for the predicted value, the other one for its corresponding probability.


Batch inference optimization

 

The batch inference algorithm was optimized to increase the computation speed when all the variables that belong to the Markov blanket are observed.

Learning


Stratification

 

When the target value is very weakly represented (as fraud usually is, for example), stratification makes it possible to modify the probability distribution of the target variable (by using weights). This modified distribution can then allow a structurally more complex network to be learned. Once the structure is learned, the parameters (i.e. the probability tables) are estimated on the unstratified data.

In the following dialog box, you can indicate the proportion of each modality of the specified node that you want to obtain. The initial value corresponds to the proportion in the database. Simply move the slider or edit the value directly for each modality.

Stratification

When a stratification is applied, the database icon with the stratification indicator is displayed in the status bar. The stratification can be removed by right-clicking on this icon to display the contextual menu and choosing to remove the stratification.
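One simple way to reach the requested proportions with weights is to give each line the ratio between the desired and the observed proportion of its target modality. The Python sketch below illustrates this idea only; the formula is an assumption and is not taken from BayesiaLab.

```python
from collections import Counter


def stratification_weights(target_column, desired_proportions):
    """Return one weight per modality: desired proportion / observed proportion."""
    counts = Counter(target_column)
    total = len(target_column)
    return {modality: desired_proportions[modality] / (count / total)
            for modality, count in counts.items()}


# Hypothetical database with 5% of fraud cases, rebalanced to 50/50.
example_column = ["ok"] * 95 + ["fraud"] * 5
example_weights = stratification_weights(example_column, {"ok": 0.5, "fraud": 0.5})
```

In this example each fraud line receives a weight of about 10 and each normal line a weight of about 0.53, so that both modalities carry the same total weight.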


Observed decision nodes taken into account in static policies

 

The evidence set on the decision nodes is now taken into account in the static policy computation.

Maximum spanning tree

 

This learning algorithm is by far the quickest unsupervised learning algorithm, since it relies on only two passes. The first one computes the a priori weight of all the binary relations between all the variables, and the second one builds the maximum weight spanning tree from those relations. Even if the resulting network is not optimal, it can be used for a first imputation of the missing values, as the initial network before using Taboo or EQ, and for variable clustering when there are a lot of variables.

The user can choose between two scoring methods for this learning: the Minimum Description Length and Pearson's correlation.

Max spanning tree

Note that arcs that are fixed (the blue ones) are treated as normal arcs, whereas the forbidden arcs are taken into account.

At the end of the learning, a tree with unoriented arcs is obtained. To obtain a Bayesian network, the arcs are oriented so as to avoid introducing V-structures. However, the use of fixed arcs can introduce V-structures.
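The two passes and the final orientation can be sketched in Python as follows. The pairwise weights are supposed to be already computed (first pass) and are invented for the example; the tree is built with Kruskal's algorithm, and orienting every arc away from a chosen root gives each node at most one parent, hence no V-structure. This is an illustration under these assumptions, not BayesiaLab's implementation, and it ignores fixed and forbidden arcs.

```python
from collections import defaultdict, deque


def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm on the pairwise weights, heaviest relations first."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow the links up to the representative of the component.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge does not close a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def orient_from_root(tree, root):
    """Orient the tree edges away from `root` with a breadth-first traversal."""
    neighbors = defaultdict(list)
    for a, b in tree:
        neighbors[a].append(b)
        neighbors[b].append(a)
    arcs, visited, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in visited:
                visited.add(other)
                arcs.append((node, other))
                queue.append(other)
    return arcs


# Hypothetical weights of the binary relations between four variables.
example_weights_mst = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.1,
                       ("C", "D"): 0.5, ("B", "D"): 0.2}
example_tree = maximum_spanning_tree(["A", "B", "C", "D"], example_weights_mst)
example_arcs_mst = orient_from_root(example_tree, "A")
```

With these weights the tree keeps the relations A-B, B-C and C-D, and orienting from A yields the chain A → B → C → D.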


Clustering speed improvement

 

Several optimizations significantly improve the speed of the clustering algorithm.

Semi-supervised clustering

 

A weight editor is now available in the data clustering wizard. These weights, with a default value of 1, are associated with the variables and make it possible to guide the clustering. A weight greater than 1 means that the variable is given more importance during the clustering. A zero weight makes the variable purely illustrative.

Clustering weights


Fixed arcs kept in the multiple clustering

 

The arcs fixed by the user in the initial network on which a multiple clustering is performed are now copied into each created sub-network when possible, i.e. when both ends of the arc belong to the new network.

Cluster values

 

To make the obtained clusters easier to understand, if at least one variable used in the clustering has numerical values associated with its modalities, long names are automatically associated with the modalities of the Cluster node. Each long name contains the mean value of all the clustered variables obtained when observing the corresponding modality of the Cluster node.

Clustering values


Network structural compression rate

 

In addition to the data compression rate, the network structural compression rate is computed and displayed in the console at the end of any structural learning.

This ratio is the current number of arcs in the network divided by the theoretical maximum number of arcs.

This ratio is also available in the descriptive report of the graph.
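A small worked example of this ratio in Python, assuming the theoretical maximum is the number of arcs of a fully connected directed acyclic graph, n(n-1)/2 for n nodes. The notes do not state which maximum BayesiaLab uses, so this is an assumption.

```python
def structural_ratio(node_count, arc_count):
    """Number of arcs divided by the maximum number of arcs of a DAG.

    Assumption: the maximum is n * (n - 1) / 2, reached when every pair
    of nodes is linked. Requires at least two nodes.
    """
    max_arc_count = node_count * (node_count - 1) // 2
    return arc_count / max_arc_count
```

A network of 5 nodes can hold at most 10 arcs under this assumption, so a tree with 4 arcs gives a ratio of 0.4.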


Network


Markov blanket exported in PHP and JavaScript

 

Two modules for exporting the target's Markov blanket (Export) of a Bayesian network were added to the existing SAS module. It is possible to export to PHP and JavaScript in order to integrate these scripts into web content.

These modules are available only through Bayesia S.A., which can export your networks according to your needs and encapsulate the script to make it easy to integrate.

Note: this is an option that is not present in the basic versions of BayesiaLab.


Network descriptive report

 

The contextual menu of the network makes it possible to generate a descriptive report of the network in HTML. It contains a summary of the network structure. If some nodes of the network are in error, the list of errors and warnings is also displayed.

Graph report


Display of the total numbers of classes and forbidden arcs

 

In the class editor, the total number of classes is now displayed and is updated dynamically when the user adds or removes a class.

In the same way, the forbidden arc editor dynamically displays the current number of forbidden arcs.


Outgoing arcs from constraint nodes are now forbidden

 

A constraint node cannot, by definition, have any children. This is why adding an outgoing arc from a constraint node is now forbidden.

Editable node name font

 

It is now possible to modify the font used to display the network node names. This is done through the Property menu located in the network contextual menu.

Font editor

It is also possible to change the default font used for all networks in the Settings. This avoids having to modify the font for each network.


Property menu in the network's contextual menu

 

A Property menu was added to the contextual menu of the network. It gathers the addition or removal of the background image, the edition of the node font and the edition of the comment associated with the network.

Delete unfixed arcs menu

 

Unfixed arcs can be removed from the network through the menu added to the contextual menu of the network.

Connected root nodes selection

 

The Select menu of the node contextual menu now makes it possible to select the root nodes connected, directly or indirectly, to the current node.

Monitors


Evolution of the node values displayed in the monitors

 

As observations are made on the network, the value of each node, calculated from the values associated with its modalities, changes. The change in this value from one given evidence to another is now displayed in the monitors, to the right of the current value.

Number of cases corresponding to each modality displayed in the tooltip

 

When a database is associated with a network, it is possible to display, for each modality of each node, the number of cases of the database represented by the probability of the modality given the evidence.

Modality cases


Computation of the network uncertainty and probability variations

 

From the contextual menu of the monitor panel, it is possible to display the uncertainty and likelihood variations of the network in the information panel of the monitors.
  • Uncertainty: this value represents the uncertainty variation over the unobserved nodes relative to the fully disconnected network. It is computed from the entropy (the highest entropy corresponds to uniform distributions and the lowest to a probability of 100% on one modality).
  • Likelihood: this value represents the likelihood variation of the Bayesian network relative to the fully disconnected network. This likelihood is computed from the joint probabilities of the evidence set.

These values are computed when the corresponding option is checked in the contextual menu of the monitor panel.
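The remark on entropy can be checked with a short Python sketch. The function below computes the Shannon entropy of one probability distribution; the use of base 2 logarithms is an assumption, and the way BayesiaLab combines the entropies of the nodes into its uncertainty value is not described in these notes.

```python
import math


def entropy(distribution):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Modalities with a zero probability contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in distribution if p > 0)
```

For a node with four modalities, the uniform distribution gives the highest entropy (2 bits), while a probability of 100% on one modality gives an entropy of 0.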


Information panel closing button

 

An Up/Down button lets the user close (and reopen) the information panel if it takes up too much space in the monitor panel.

Monitor info


Interface


Edit menu modification

 

The Edit menu contains a new menu that allows the selection of the discrete, continuous, constraint, decision or utility nodes according to the chosen sub-menu.

It also contains a menu used to remove all the unfixed arcs, all the arcs, the disconnected nodes or the virtually disconnected nodes (KL Force).


Nodes and arcs selection improvement

 

The speed of the select/unselect actions on nodes and arcs was dramatically improved.

Nodes and arcs centered during search

 

When the user searches for nodes or arcs with the corresponding wizard, the results selected in the list are now centered in the network window so that they can be located more easily.

Node editor's size and position kept

 

When the user changes the node being edited with the combo box located in the node editor, the editor now keeps its current position and dimensions: it is no longer automatically centered on the screen or resized to its initial size.

Translucent node names

 

The display of node names follows the graphic behavior of the node. If a node becomes translucent for any reason, its name also becomes translucent.

Indicators on the database icon

 

Two new indicators were added to the database icon: the stratification indicator and the test database indicator.

The four possible indicators are:

  • the database stratification indicator
  • the weight indicator
  • the test database indicator (data typing)
  • the missing value indicator

The icon may look like this: Icon


Contextual menu on the database icon

 

A contextual menu was added to the database icon. It makes it possible, according to the different cases, to remove the database stratification, to remove the weights or to remove the data typing.

Translucent nodes and arcs not selectable

 

Translucent nodes and arcs can no longer be selected. They must become visible again before they can be selected.

Management of the layers associated to the classes

 

When classes are defined in the network, the class indicator icon is displayed in the status bar. A click on the icon opens the class editor dialog. A right click on the icon displays the list of classes. If a class is selected, it is displayed; if it is deselected, it is hidden.

Calc

The checkbox named All is a shortcut to select or unselect all the checkboxes at once. When a class is unchecked, the nodes that belong to it become transparent and can no longer be selected. If an arc lies between two transparent nodes, it also becomes transparent.


Scrolling of the window during arc drawing or selection

 

When the user draws an arc that is longer than the window, the graph window scrolls automatically as soon as the mouse gets close to the border.

In the same way, when the user selects a zone that is not entirely displayed, the window scrolls when the mouse cursor reaches the border of the window.


Zoom centered on the selection

 

When one node or all the nodes are selected and the user zooms, the network remains centered, as much as possible, on the selection.

Best fit of the selection to the window

 

It is now possible to fit the selected nodes to the size of the window. If no node is selected, the fit is applied to the whole network.

Proximity of the nodes of a same class taken into account in the genetic positioning

 

A new factor was added to the evaluation function of the genetic positioning algorithm. It groups together the nodes that belong to the same class.

Class weight


Plugins


User function plugins

 

BayesiaLab can automatically include user-defined functions in its equation editor, so that they can be used to generate the conditional probability tables of the nodes.
It is possible to interface with Java, C, C++, FORTRAN, Mathematica, etc., either directly, through JNI, or through specialized libraries.
For further information and advice, you can contact Bayesia S.A.

To allow this integration, a Java interface is included in the library BayesiaLab.jar located in BayesiaLab's installation directory. To create a function, the user must implement this interface in a Java class of their own.

Once the plugin has been created and BayesiaLab restarted, the plugin is loaded and available in the equation editor. In the following example, the user function Sum simply adds two real numbers:

Plugin example


Security


Encryption of the XBL

 

Bayesian network save files are now encrypted in order to keep their contents confidential when the user distributes his networks.

Java


Upgrading to Java 6

 

BayesiaLab was upgraded to run on a Java 6 virtual machine, so it can benefit from the technical improvements of this new version of Java. However, compatibility with Java 5 was preserved for those who don't have Java 6.