
BayesiaLab 4.2 : new features


Data

  • Manual imputation of hidden nodes or with missing values
  • Random selection when equiprobability during imputation
  • Automatic association of discrete values for continuous nodes
  • Transfer of aggregates
  • Using class.modality keys in dictionaries
  • Import/Export of color dictionaries
  • Internal database generation
  • Import/Export of long name dictionaries for the modalities

Analysis

  • Variable clustering
  • Performance of the network over multiple thresholds

Inference

  • Batch joint probability
  • Exact inference for the compact dynamic Bayesian networks

Learning

  • Taboo learning optimizations
  • Data Clustering improvement
  • Graphical representation of the obtained clusters
  • Multiple clustering
  • Running EQ from an already connected network

Network

  • Markov blanket export
  • Class editor
  • Display dialog of introduced cycles during arc addition
  • Long names for modalities
  • Import/export of forbidden arcs
  • Copy of the classes

Monitors

  • Monitors toolbar
  • Highlighting the maximum posterior probabilities variations
  • Fixing the reference probability distribution
  • Direct probability distribution setting
  • Display sorted monitors for the selected nodes
  • Modality long names display option

Interface

  • Resizable toolbar container
  • Quit button in the error report
  • Tooltip for the modality names in the CPTs
  • Constraints, constants and classes indicators icons
  • Selective rotation
  • Lexicographical order of the variables in the node editor
  • Back to selection mode configurable for the networks edition
  • Positioning grid displayable and modifiable

Security

  • Install, activation and uninstall with administrative rights
  • Uninstall key
  • License server parameters in the preferences

 




Data




Manual imputation of hidden nodes or with missing values
When a node (or a set of nodes) has missing values in the database associated with the network, these values can be imputed directly, i.e. without running the global imputation process. In modeling mode, select the variables and choose Imputation in the nodes' contextual menu. The corresponding missing values in the database are then replaced by the imputed values.
When the network contains hidden variables and an associated database, these nodes can be imputed in the same way. This is the case, for example, for the Cluster node produced by data clustering.

Imputation




Random selection when equiprobability during imputation
When two or more modalities are equiprobable during imputation, the final modality is chosen randomly among them.
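As an illustration only, the tie-breaking rule can be sketched in Python as follows. The function name impute_modality and the dictionary representation of the posterior are assumptions made for this example, not BayesiaLab's API, and the sketch treats only exactly equal probabilities as ties.

```python
import random


def impute_modality(posterior, rng=random):
    """Return the most probable modality, breaking exact ties at random.

    `posterior` maps each modality name to its probability (hypothetical
    representation chosen for this sketch).
    """
    best = max(posterior.values())
    # Every modality that reaches the maximum probability is a candidate.
    candidates = [m for m, p in posterior.items() if p == best]
    return rng.choice(candidates)
```

Passing a seeded random.Random instance as rng makes the random choice reproducible.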


Automatic association of discrete values for continuous nodes
When a database is associated with a network, a column containing discrete values can be associated with a continuous node. If the column name is the same as the node name and the names of the values are the same as the names of the node's intervals, the association is made automatically.
Note that in this case no numeric values are associated with the node in the loaded database, so graphics that rely on continuous values cannot be used.


Transfer of aggregates
During data import, aggregates defined for one variable can now be transferred to other variables. Once the aggregates are defined, select the variables that should receive the same aggregates and perform the transfer. A variable can only receive the aggregation if it has the same modalities as the initial variable. The transfer is not performed on a variable when its final number of modalities would be less than 2. In that case, a list of the variables on which the transfer failed is displayed.


Using class.modality keys in dictionaries
To associate a value or a long name with a modality, a class.modality key can be used in the dictionary. The value or long name is then associated with the specified modality of every node of the specified class, which avoids enumerating all the nodes concerned.


Import/Export of color dictionaries
The color dictionary associates colors with the specified nodes or with the nodes of the specified classes. Colors are written in hexadecimal RGB coding with 8 bits per channel (web coding), i.e. with 6 hexadecimal characters: 2 characters for red (from 00 to FF, i.e. from 0 to 255), followed by 2 characters for green and 2 characters for blue.
Examples of colors:
  • red: FF0000
  • green: 00FF00
  • blue: 0000FF
  • gray: 929292
  • yellow: FFFF00
  • pink: FFC0FF
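For readers who want to check a dictionary entry, the following Python sketch decodes such a 6-character code into its red, green and blue components. The function name parse_hex_color is hypothetical and is not part of BayesiaLab.

```python
def parse_hex_color(code):
    """Decode a 6-character hexadecimal RGB code into (red, green, blue)."""
    code = code.strip()
    if len(code) != 6:
        raise ValueError("expected 6 hexadecimal characters, got %r" % code)
    # Two characters per channel: red, then green, then blue.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))
```

For example, parse_hex_color("FFC0FF") gives (255, 192, 255) for the pink listed above.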



Internal database generation
It is now possible to generate data in memory according to the joint probability distribution described by the active network. These data can then be saved to disk.
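The idea of generating records that follow a network's joint distribution can be sketched with forward sampling: each node is drawn from its conditional probability table once its parents have been drawn. This is a minimal sketch under stated assumptions, not BayesiaLab's implementation. The two-node network (Rain, WetGrass), its probabilities and the function names are all hypothetical.

```python
import random

# Hypothetical network, listed in topological order:
# (node name, parent names, CPT indexed by the tuple of parent values).
NETWORK = [
    ("Rain", (), {(): {"yes": 0.2, "no": 0.8}}),
    ("WetGrass", ("Rain",), {
        ("yes",): {"yes": 0.9, "no": 0.1},
        ("no",): {"yes": 0.1, "no": 0.9},
    }),
]


def sample_record(network, rng):
    """Draw one record by sampling each node given its already sampled parents."""
    record = {}
    for name, parents, cpt in network:
        dist = cpt[tuple(record[p] for p in parents)]
        modalities = list(dist)
        weights = [dist[m] for m in modalities]
        record[name] = rng.choices(modalities, weights=weights)[0]
    return record


def generate_database(network, n, seed=0):
    """Generate n records in memory; the seed only makes the run reproducible."""
    rng = random.Random(seed)
    return [sample_record(network, rng) for _ in range(n)]
```

The resulting list of dictionaries could then be written to a file, for instance with the csv module.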


Import/Export of long name dictionaries for the modalities
A long name can be associated with each node modality and used instead of the default name in the different kinds of display. Like the values, these long names can be imported from a dictionary. They can also be exported to dictionaries.
The long names can also be used when saving the database and during imputation by checking the corresponding option.

 

Analysis



Variable clustering
A very powerful analysis tool has been added to BayesiaLab. It clusters the network variables into groups of semantically close variables. The clusters are built from the proximity of the nodes in the graph and from the force of the arcs. A color is automatically associated with each cluster to highlight the clustering:

Segmentation

The number of clusters is computed automatically from the force of the arcs (4 in the previous example). However, the associated toolbar contains a slider for choosing the desired number of clusters:

Toolbar variable partitioning

The Validate button validates the current clustering and associates each cluster with a class named Cluster_i.

Once the classes are created, multiple clustering can be performed in modeling mode: for each class named Cluster_i, it generates a synthetic variable from the nodes belonging to that class.




Performance of the network over multiple thresholds
The network performance analysis tool can now evaluate several probability acceptance thresholds at the same time, which helps to find the best policy to adopt given the results obtained with the different thresholds. It is possible either to define a single threshold (0.5 by default, modifiable) or to use several thresholds for the evaluation. In the latter case, the user indicates the number of thresholds needed and the algorithm chooses the probability thresholds by following an equal frequency distribution. The following dialog box is used to enter the evaluation parameters:

network performance threshold
In the result window, a combo box selects the threshold for which the confusion matrix (occurrences, reliability, precision) is displayed.

network performance panel

A button named Global report generates an HTML report that summarizes the analysis over the various thresholds.


network performance report
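To make the multi-threshold evaluation concrete, here is a rough Python sketch. It assumes that "equal frequency" means taking the thresholds at evenly spaced quantiles of the predicted target probabilities, which is an interpretation and not a documented detail of BayesiaLab. The function names and the raw counts returned are likewise choices made for this example.

```python
def equal_frequency_thresholds(scores, k):
    """Pick k thresholds that split the sorted scores into k + 1 groups
    of roughly equal size (assumed reading of "equal frequency")."""
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(i * n) // (k + 1)] for i in range(1, k + 1)]


def confusion_counts(scores, labels, threshold):
    """Count outcomes when the target is predicted for scores >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, is_target in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_target:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif is_target:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def evaluate_thresholds(scores, labels, k):
    """Return one confusion count dictionary per selected threshold."""
    return {t: confusion_counts(scores, labels, t)
            for t in equal_frequency_thresholds(scores, k)}
```

Comparing the counts across thresholds is what allows a trade-off between missed targets and false alarms to be chosen.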


 Inference


Batch joint probability
A new tool computes the joint probability of each record of a database, using the active Bayesian network. The batch joint probability process can be interrupted at any moment without losing the data already computed, which are saved in the output database.
This tool can be very useful for detecting outliers, i.e. atypical records with a very low joint probability, by taking all the variables into account.
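The principle can be sketched in Python: for a fully observed record, the joint probability is the product of each node's conditional probability given its parents, and records whose product falls below a chosen cutoff are candidate outliers. The small network, the cutoff and the function names below are hypothetical and only illustrate the computation.

```python
# Hypothetical network: (node name, parent names, CPT indexed by parent values).
NETWORK = [
    ("Rain", (), {(): {"yes": 0.2, "no": 0.8}}),
    ("WetGrass", ("Rain",), {
        ("yes",): {"yes": 0.9, "no": 0.1},
        ("no",): {"yes": 0.1, "no": 0.9},
    }),
]


def joint_probability(network, record):
    """Chain rule: multiply P(node | parents) over all nodes of the network."""
    probability = 1.0
    for name, parents, cpt in network:
        parent_values = tuple(record[p] for p in parents)
        probability *= cpt[parent_values][record[name]]
    return probability


def find_outliers(network, records, cutoff):
    """Return the records whose joint probability is below the cutoff."""
    return [r for r in records if joint_probability(network, r) < cutoff]
```

With this toy network, the record Rain = yes, WetGrass = no has the lowest joint probability of the four possible records.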




Exact inference for the compact dynamic Bayesian networks
It is now possible to perform exact inference with compact dynamic Bayesian networks that have dependent temporal nodes. These dependencies must be indicated qualitatively (without modifying the original conditional probability table, so as to keep any marginal independence) by simply adding arcs between the dependent nodes, as illustrated below by the arcs marked in red.

Temporal Xor

 Learning


Taboo learning optimizations
The Taboo learning algorithm has been optimized and its learning time has been reduced by 80% on average.




Data Clustering Improvement
The Data Clustering algorithm has been completely redesigned. A new score and a new search strategy yield much more relevant clusters. These clusters are more stable (differences between the marginal distributions and the training records actually associated with the cluster) and have an improved purity (the mean of the cluster probability computed from each associated training record). As a result, the number of clusters can be reduced with respect to the previous release, since clusters that turn out to be irrelevant are eliminated.


Graphical representation of the obtained clusters
The final cluster analysis report now contains a new button named Mapping, which displays a graphical representation of the obtained clusters.

Cluster mapping
This graph displays three characteristics of the clusters:

  • the color represents the purity of the clusters: the darkness of the blue is directly proportional to the purity. Cluster 8 is the cluster with the highest purity in the above example;
  • the size represents the prior probability of the cluster;
  • the distance between two clusters represents the mean neighborhood of the two clusters.
    It is possible to rotate the graph using the two buttons at the bottom right.



Multiple clustering
This tool carries out data clustering on various subsets of variables of the same Bayesian network. The subsets are defined by the classes named Cluster_i. The variables of each class Cluster_i are then used to induce a new variable (a latent variable, i.e. one without any corresponding data in the file) that summarizes them.

These classes can be defined manually, but they can also be created automatically by the variable clustering tool.

A Bayesian network is created for each of these classes. This network contains the variables that belong to the class and the synthetic variable Cluster. At the end of the last data clustering, a final Bayesian network is created. It contains all the Clusters and comes with an internal database in which all the values of these latent variables have been inferred with respect to their corresponding network. It is also possible to add the initial variables (the manifest variables) to that final Bayesian network.

The following dialog box is used to enter the parameters for Multiple clustering. It reuses most of the parameters of data clustering.

multiple clustering parameters

In addition to the parameters shared with data clustering, the wizard is used to select the directory where the generated networks will be saved (one network per class Cluster_i, plus the final network with all the latent variables). The wizard also offers the choice of adding all the nodes of the initial network to the final one or not. As in data clustering, the number of values of the latent variables can be fixed a priori or found by a random walk. It can also be defined as the average number of values of the variables belonging to Cluster_i. The remainder is strictly identical to data clustering.

At the end of each clustering, an automatic analysis of the obtained Bayesian network is carried out and a target analysis report is generated. This report is identical to the one generated by data clustering.

At the end of the last clustering, a synthetic report is generated. This report describes, for each latent variable, the distribution of its values on the learning set, and the list of the nodes sorted according to the quantity of information brought to its knowledge.



Running EQ from an already connected network
It is now possible to start the EQ learning algorithm from a network that already contains arcs. Note that fixed arcs are treated as normal arcs and can be removed or reversed by the algorithm.
The following dialog box is used to choose the starting mode of the algorithm:


Learning dialog


 

Network


Markov blanket export
To ease the deployment of Bayesian scoring functions, for example in direct marketing applications, a new export tool (Export) can now export the Markov blanket of the target variable to an external programming language. The export is a text file containing a program that performs exact inference on the target variable from the values of the nodes of its Markov blanket. The programming language is chosen in the following dialog box:

Export language
In SAS, for example, the output file contains a SAS macro that can be used directly.

Other languages such as SPSS, Java, C and C++ will be available soon. Bayesia will also study more specific user requests in order to increase the possibilities.

Note: this option is not included in the classic versions of BayesiaLab.
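As a reminder of what the exported program depends on, the Markov blanket of a node consists of its parents, its children and the other parents of its children. The following Python sketch computes it from a list of arcs. The function name and the arc representation are assumptions made for this example and say nothing about the format of the exported file.

```python
def markov_blanket(arcs, target):
    """Return the Markov blanket of `target` in a directed acyclic graph.

    `arcs` is an iterable of (parent, child) pairs.
    """
    arcs = list(arcs)
    parents = {a for a, b in arcs if b == target}
    children = {b for a, b in arcs if a == target}
    # Spouses: the other parents of the target's children.
    spouses = {a for a, b in arcs if b in children and a != target}
    return parents | children | spouses
```

Once the nodes of the blanket are observed, the remaining nodes of the network carry no further information about the target, which is why a scoring program only needs the blanket.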




Class editor
A class editor has been added to directly manage all the classes. It is possible to add and remove classes and to modify the set of nodes contained in each class.

Class editor
By selecting one or more classes in the list, it is possible to modify the color, the image, the temporal index and the cost of all the nodes of the selected classes. Each corresponding button opens an editor specific to that property:

  • Color

Class editor color

The second checkbox automatically generates a distinct color for each selected class. Several classes must therefore be selected in the list.

  • Image

Class editor image

  • Temporal index

Class editor index

  • Cost

Class editor cost


The modifications are only applied when the dialog box itself is validated. Note also that modifications made to these properties only affect the nodes that already belong to the class, not the nodes that are added to it later.




Display dialog of introduced cycles during arc addition
In modeling mode, when adding an arc introduces a directed cycle into the graph, a dialog box reports it and offers to open a window listing the directed cycles introduced.

Loop warning
In the following case, adding an arc between Dyspnea and Age introduces 3 directed cycles. The length of each cycle is also indicated. Clicking on a directed cycle in the list highlights the arcs of that cycle (pink arcs).


Loop dialog
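One way to list the cycles that a new arc would introduce is to enumerate the directed paths that already lead from the arc's head back to its tail: each such path, closed by the new arc, is a directed cycle. The Python sketch below illustrates this on a hypothetical three-node graph. It is not BayesiaLab's algorithm, and the function name is an assumption.

```python
def cycles_introduced(arcs, new_arc):
    """List the directed cycles created by adding `new_arc` = (tail, head).

    Each cycle is returned as the list of nodes on the existing path from
    head to tail; the new arc closes it, so its length in arcs equals the
    number of nodes in the list.
    """
    tail, head = new_arc
    children = {}
    for a, b in arcs:
        children.setdefault(a, []).append(b)

    cycles = []

    def walk(node, path):
        if node == tail:
            cycles.append(path)
            return
        for nxt in children.get(node, []):
            if nxt not in path:  # keep paths simple
                walk(nxt, path + [nxt])

    walk(head, [head])
    return cycles
```

With the arcs A to B, B to C and A to C, adding the arc from C to A introduces two cycles, of lengths 3 and 2.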




Long names for modalities
Each node modality can now have a long name associated with it. This long name can be used in monitors and reports. It can also be used during database export (saving, imputation) to replace the (short) modality names. The long name editor has been added to the node editor.

Editor modality names




Import/export of forbidden arcs
The forbidden arcs can now be exported to and imported from a text file.

Forbidden arcs editor

 



Copy of the classes
Classes are now handled when copying the nodes that belong to them: the copied nodes are added to the classes the initial nodes belong to.

 

Monitors


Monitors toolbar
To ease the management of the monitors and to extend their functionality, a monitor management toolbar has been added. It is displayed in validation mode:

Toolbar editor
It contains 5 tools:

  • Remove observations: removes the evidence
  • Remove monitors: removes the monitors
  • Clear shift: resets the gray arrows representing the probability variations
  • Set the reference: sets the reference for the probability variations
  • Highlight the maximum probability variations




Highlighting the maximum posterior probabilities variations
The corresponding toolbar button highlights the modalities for which the last evidence entered caused the greatest negative and positive probability variations. The green arrow marks the maximum positive variation and the red arrow the maximum negative variation.


Fixing the reference probability distribution
The corresponding toolbar button defines the probability distributions that are used as the reference for computing the probability variations (gray arrows). By default, the probability variation is always computed from the previous distribution (i.e. before entering the last evidence).

Direct probability distribution setting
A new evidence setting tool has been added to directly enter a desired probability distribution.

Monitor
The likelihoods are computed so that the final probability distribution of the node is the one entered by the user. The probability editing mode can be entered in two ways: by holding the Ctrl and Shift keys while clicking on a modality bar, or through the contextual menu associated with the monitor. A green button and a red button are then added to the monitor. The probabilities can be entered:

  • by holding the left mouse button down while choosing the desired probability level, or
  • directly, by double-clicking on the probability value and editing it.

A click on the name of the modality (on the right) fixes the current probability value (the probability bar is then green).

Once all the probabilities are entered, the green button validates the data entry. The likelihoods and the probability distribution are then updated. The red button cancels the probability editing.
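For a single node, likelihoods with this property can be obtained by dividing the desired probability of each modality by its current probability. The Python sketch below shows this standard construction and checks it by reapplying the likelihoods. It is an illustration under stated assumptions, not a description of BayesiaLab's internal computation, and the function names are hypothetical.

```python
def likelihoods_for_target(current, target):
    """Likelihood per modality, proportional to target / current.

    The values are scaled so that the largest likelihood is 1. The sketch
    assumes that every modality with a positive target probability also has
    a positive current probability.
    """
    raw = {m: (target[m] / current[m] if current[m] > 0 else 0.0)
           for m in current}
    top = max(raw.values())
    return {m: value / top for m, value in raw.items()}


def apply_likelihoods(current, likelihoods):
    """Multiply the current distribution by the likelihoods and renormalize."""
    weighted = {m: current[m] * likelihoods[m] for m in current}
    total = sum(weighted.values())
    return {m: value / total for m, value in weighted.items()}
```

Applying the likelihoods returned by likelihoods_for_target to the current distribution reproduces the target distribution, up to rounding.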




Display sorted monitors for the selected nodes
The display of monitors sorted by target information gain or by target modality information gain can be restricted to the selected nodes of the graph.


Modality long names display option
The contextual menu of the monitors switches between displaying the modality labels and the modality long names. When long names are displayed, a modality that has no long name is shown with its label instead.

 

Interface


Resizable toolbar container
So that all the toolbars can be displayed whatever the window size, the toolbar container is resizable and moves to another line the toolbars that would otherwise be truncated. It can contain as many lines as there are toolbars.

Resizable toolbar




Quit button in the error report
The error report dialog box contains a Quit button that exits the program without sending the error report.


Tooltip for the modality names in the CPTs
Sometimes the names of the modalities are too long to be fully displayed in the CPT headers in the node editor. A tooltip showing the full name is displayed when the cursor hovers over such a header.


Constraints, constants and classes indicators icons
In the graph window's status bar, at the bottom right, three new icons are displayed:
  • Constraints: forbidden arcs indicator. It is displayed only if some forbidden arcs are defined. Clicking on it opens the forbidden arc editor. It is only enabled in modeling mode.
  • Constants: constants indicator. It is displayed only if some constants are defined. Clicking on it opens the constant editor. It is only enabled in modeling mode.
  • Classes: classes indicator. It is displayed only if some classes are defined. Clicking on it opens the class editor. It is only enabled in modeling mode.



Selective rotation
The graph rotation has been modified so that all or part of the graph can be rotated. When no nodes are selected, or when all the nodes are selected, the whole graph rotates around its barycenter. If only some nodes are selected, only these nodes rotate around their own barycenter.
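The geometry behind this behavior can be sketched in Python: the barycenter is the mean of the positions of the nodes to rotate, and each of these nodes is rotated around it while the others stay in place. The function name, the dictionary of positions and the use of an angle in degrees are assumptions made for this example.

```python
import math


def rotate_nodes(positions, angle_degrees, selected=None):
    """Rotate the selected nodes (or all nodes) around their barycenter.

    `positions` maps node names to (x, y) coordinates. A new dictionary is
    returned; unselected nodes keep their coordinates.
    """
    names = list(selected) if selected else list(positions)
    rotated = dict(positions)
    if not names:
        return rotated
    cx = sum(positions[n][0] for n in names) / len(names)
    cy = sum(positions[n][1] for n in names) / len(names)
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for n in names:
        dx, dy = positions[n][0] - cx, positions[n][1] - cy
        rotated[n] = (cx + dx * cos_t - dy * sin_t,
                      cy + dx * sin_t + dy * cos_t)
    return rotated
```

Because the rotation is centered on the barycenter of the rotated nodes, that barycenter itself does not move.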

Lexicographical order of the variables in the node editor
To make the nodes easier to reach in the node editor, the combo box lists them in lexicographical order rather than in creation order.


Back to selection mode configurable for the networks edition
When creating networks, we often switch between node or arc creation mode and selection mode. This can now be done in two ways: either on request, with a right click that returns to selection mode, or automatically, by activating the option that returns to selection mode after each action. This option is in the editing preferences:

Preferences edit




Positioning grid displayable and modifiable
The node positioning grid can now be displayed and modified. Its display is controlled by the corresponding menu (Grid).
The line spacing, expressed in pixels, can be modified in the display preferences.

 

Security


Install, activation and uninstall with administrative rights
Installation, activation and uninstallation of the software are performed with administrative rights.




Uninstall key
During uninstallation, the software displays an uninstall key that can be used to reinstall BayesiaLab on another computer while keeping the same license.

 
License server parameters in the preferences

In the license server version of BayesiaLab, the server parameters can be modified in the corresponding preferences. The changes take effect the next time the software starts.