
Bayesian Networks and Data Mining Tools

News Letter: May 2006

BayesiaLab 4.1: what's new




Data


Discretization with Decision Tree

The list of target variables available for decision-tree discretization now includes the continuous variables for which a discretization has already been defined.

Import and associate report

An import and association report is available at the end of the loading process.

Data file encoding

It is now possible to choose the encoding when importing a file that does not use the platform's default encoding (Shift_JIS, UTF-8, UTF-16, UTF-16BE, UTF-16LE, iso, ibm, windows and many others are supported). BOMs (Byte Order Marks) are supported for the UTF-8 and UTF-16 (BE and LE) encodings.
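As an illustration of what BOM support means, here is a minimal Python sketch of how a reader can honour a UTF-8 or UTF-16 byte order mark and otherwise fall back to the encoding chosen by the user. It is not BayesiaLab code; the function name `decode_with_bom` and the default fallback are assumptions made for the example.

```python
import codecs

# The three byte order marks mentioned above, with the encoding each one announces.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_with_bom(raw: bytes, fallback: str = "shift_jis") -> str:
    """Decode bytes, honouring a UTF-8/UTF-16 BOM if one is present.

    `fallback` stands for the encoding selected by the user (hypothetical default).
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Strip the mark itself, then decode the remaining bytes.
            return raw[len(bom):].decode(encoding)
    # No BOM found: trust the user's choice.
    return raw.decode(fallback)
```

A file that starts with a BOM is decoded according to the mark, whatever the selected encoding; a file without a BOM is decoded with the selected encoding.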


Output file encoding

The settings let you choose the encoding of output files, such as saved or generated databases and generated html files (reports and others). The encoding can be changed in the settings panel through the "File encoding" combo box in the option "Database>Save format". An encoding indicator is also added to the meta of the html files.


Data file path

The path of the imported or associated data file is now kept in order to be quickly reused through the Recent database menu.

Dictionaries

Dictionaries of costs, classes and modality values have been added.
  • a cost can be associated with each node by associating a number with the name of the node (or nothing if the node must be not observable)
  • one or more classes can be associated with one or more nodes
  • numeric values can be associated with the modalities of the nodes by associating a number with the name of the node followed by a dot and the name of the modality.

The dictionaries can also be exported.


Handling blanks in dictionaries

If the name of a node or a modality contains a blank, the blank must be replaced by an underscore in the dictionary file. For example, to associate a cost of 10 with a node named "Node 1", the dictionary file must contain the line:
Node_1 10
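
The following Python sketch shows how a cost dictionary in this format could be read outside BayesiaLab. It assumes that the name and the number are separated by whitespace and that every underscore in the file stands for a blank; the function name `parse_cost_dictionary` is hypothetical.

```python
def parse_cost_dictionary(text: str) -> dict:
    """Map node names to costs; None marks a node with no cost (not observable)."""
    costs = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        # Assumption: underscores in the file stand for blanks in the node name.
        name = parts[0].replace("_", " ")
        # A name with no number after it means the node is not observable.
        costs[name] = float(parts[1]) if len(parts) > 1 else None
    return costs
```

With this sketch, the line `Node_1 10` yields the entry `"Node 1"` with cost 10, and a line containing only a node name yields an entry without cost.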

Images

All the images associated with the network, such as the background image and node images, are now included in the network xbl file.

Graphs


Occurrences matrix

The results can be displayed in four modes:
  • absolute,
  • percent,
  • line percent,
  • column percent.
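
To make the four modes concrete, here is a small Python sketch that derives them from a table of counts. It assumes that "percent" is relative to the grand total and that no line or column of the table is empty; the function name `occurrence_views` is chosen for the example.

```python
def occurrence_views(counts):
    """Return the four views of a table of counts given as a list of rows."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    return {
        "absolute": [list(row) for row in counts],
        # Each cell relative to the grand total.
        "percent": [[100 * c / total for c in row] for row in counts],
        # Each cell relative to the total of its line.
        "line percent": [
            [100 * c / row_sums[i] for c in row] for i, row in enumerate(counts)
        ],
        # Each cell relative to the total of its column.
        "column percent": [
            [100 * c / col_sums[j] for j, c in enumerate(row)] for row in counts
        ],
    }
```

For the table `[[30, 10], [20, 40]]`, the first line reads 75% and 25% in line percent, and 60% and 20% in column percent.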


Node editor


New node editor

The node editor has a new interface. All the properties of a node can be edited directly by selecting the appropriate tab.


Classes

A new mechanism for creating classes of nodes has been added. A class is a named subset of nodes. A node can belong to several classes at the same time. Classes are very useful for managing properties shared by several nodes (costs, temporal indices, colors, images,...) and for creating arc constraints between nodes.

The classes are managed with the node editor:


Properties

Color, image, temporal index and cost can be edited in this panel. It is also possible to propagate a property to the nodes that belong to the same class.


Values

When at least one node has associated values, the expected total value of the network and the mean value of the nodes having associated values are displayed below the joint probability.

These values are used much like Utility nodes. Indeed, an expected numerical value can be obtained by associating a Utility node with each node, except that modalities without values cannot be represented with this kind of node. The values are thus used to evaluate the network and to measure the impact of a given lever on the quality of the network. However, unlike Utility nodes, these values are not taken into account when learning action policies.

These are also the values used to compute Pearson's linear correlation coefficient.
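
As a rough illustration of the total and mean values, the Python sketch below computes the expected value of each node from its current probability distribution. It is an assumption about the computation, not a description of BayesiaLab's implementation: modalities without a value are assumed to contribute nothing, the total is taken as the sum of the node expectations, and the mean is that total divided by the number of nodes with values.

```python
def expected_values(distributions, values):
    """Return (total, mean) of the expected node values.

    distributions: node -> {modality: probability}
    values: node -> {modality: number}, only for nodes with associated values
    """
    per_node = {}
    for node, node_values in values.items():
        dist = distributions[node]
        # Assumption: only modalities that carry a value contribute.
        per_node[node] = sum(dist[m] * v for m, v in node_values.items())
    total = sum(per_node.values())
    mean = total / len(per_node) if per_node else None
    return total, mean
```

For example, a node with P(low) = 0.25 and P(high) = 0.75 and the values low = 0 and high = 4 has an expected value of 3.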


Comment

The comment can be edited within the node editor.


Network


Forbidden arcs

A new editor lets you create and manage constraints on the arcs. The learning algorithms take these constraints into account.

You can forbid a probabilistic relation, in one or both directions, between two nodes, between a node and a class of nodes, between two classes of nodes or between a class of nodes and a node.
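
One way to picture these constraints is to expand each of them into the set of node-to-node arcs it forbids. The Python sketch below does this under the assumption that class names and node names never collide; the function name `expand_forbidden` and the tuple layout are hypothetical.

```python
def expand_forbidden(constraints, classes):
    """Expand node/class constraints into forbidden (origin, destination) arcs.

    constraints: list of (source, target, both_directions)
    classes: class name -> set of node names; any other name is a single node
    """
    def members(name):
        # A class stands for all of its nodes, a node stands for itself.
        return classes.get(name, {name})

    forbidden = set()
    for source, target, both_directions in constraints:
        for a in members(source):
            for b in members(target):
                if a == b:
                    continue  # an arc from a node to itself never exists
                forbidden.add((a, b))
                if both_directions:
                    forbidden.add((b, a))
    return forbidden
```

A learning algorithm would then simply skip any candidate arc found in the returned set.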


Constants

A constant editor has been added. The constants are used in the formulas that generate the conditional probability tables. A constant has a type (real, integer, boolean or string) and a value. The constants are managed by this editor:

To create a new constant, you must choose a name that is not already used by a node or another constant, a type and a value. Once the constant is created, its value can be modified; the conditional probability tables are then regenerated according to the formulas that use the modified constant.


Interface


New icons


New search tool

The tool for searching nodes and arcs now takes into account not only the node names but also the classes that have been defined.


New node selection tool

The node selection tool, available from the node contextual menu, selects all the nodes that belong to the same class.


New alignment tool

The alignment tool available from the node contextual menu allows to define a layout where the selected nodes are equally distributed, horizontally or vertically.


Not observable nodes displayed differently

Not observable nodes and their monitors are now displayed in mauve so that they can be identified immediately.


Image on nodes

Now, you can display an image instead of the default node representation.

The chosen image can be propagated to the nodes of the same classes if needed. The images are saved in the network file.

We can switch between standard display and images display (if there are images) with the button:


Image preview

An image preview has been added into the file chooser when the selected file is a valid image format. The dimensions of the image are also displayed.

Node comments and arc comments

The comments of the nodes and arcs can be displayed separately.


Properties edited from node's menu

All of a node's properties can be edited directly from the contextual menu.


Reorganization of the Inference menu

The Inference menu has been split into two menus: the Analysis menu, containing all the graphical and report analyses, and the Inference menu, containing batch labeling, batch inference and the adaptive questionnaire.


Security


Proxy authentication

When the automatic validation process is used, you can configure the use of a proxy for the Internet connection by giving the login and password for authentication.

Monitors


Joint probability of the network and others

The joint probability of the network is displayed at the top of the monitor panel. If a database is associated with the network, the number of cases is also displayed. If the modalities of the nodes have associated values, the total value of the network and its mean are displayed.


Node's color displayed in monitors

If a color is associated to a node, it will be displayed as a border in the corresponding monitor.


Time indicator

The time indicator is no longer represented by a node; it is now an icon at the bottom right of the network frame. Clicking on it removes the use of time in the network.


Learning


Learning optimizations

The EQ and SopLEQ algorithms have been completely rewritten and are much more efficient. In general, the learning time has been reduced by 10%.

Completion optimizations

Switching to a different completion mode is faster.

The completion methods have been improved during learning.


Compression rate

A compression rate is available in the console. This new indicator measures the data compression obtained by the network with respect to the previous network (usually, the unconnected network). The rate thus reflects not only which probabilistic links are in the network, but also the strength of these links.

For example, with a database containing two binary variables that are strictly identical, the corresponding network will link these variables and describe in the conditional probability table that the value of the second variable is deterministically defined by the first variable. The compression rate will then be equal to 50%.
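
The 50% figure can be checked with a short Python sketch. The exact formula used by BayesiaLab is not given here, so the sketch rests on an assumption: the cost of the data under a network is taken as the number of bits needed to encode the cases with the probabilities estimated from the data, and the rate is the relative saving of the linked network over the unconnected one. The function names are chosen for the example.

```python
import math
from collections import Counter


def bits_unconnected(cases):
    """Bits needed to encode (x, y) cases when X and Y are modelled independently."""
    n = len(cases)
    px = Counter(x for x, _ in cases)
    py = Counter(y for _, y in cases)
    return -sum(math.log2(px[x] / n) + math.log2(py[y] / n) for x, y in cases)


def bits_connected(cases):
    """Bits needed when the network contains the arc X -> Y."""
    n = len(cases)
    pxy = Counter(cases)
    # P(x) * P(y | x) = P(x, y), so each case costs -log2 P(x, y) bits.
    return -sum(math.log2(pxy[case] / n) for case in cases)


def compression_rate(cases):
    """Relative saving of the connected network (assumes the data are not constant)."""
    return 1 - bits_connected(cases) / bits_unconnected(cases)
```

With 50 cases (0, 0) and 50 cases (1, 1), the unconnected network needs 2 bits per case and the linked network 1 bit per case, which gives a rate of 50%.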


Inference


Missing values imputation in the associated database

In the Data menu, the new Imputation menu performs imputation on the currently loaded database. Missing values can be chosen either by drawing from the probability distribution or by taking the most probable value. The generated database is saved in a file.
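
The difference between the two choices can be sketched in Python as follows. The sketch assumes that the probability distribution of the missing value, given the other values of the case, has already been computed by the network; the function name `impute` and the mode names are hypothetical.

```python
import random


def impute(distribution, mode="max", rng=random):
    """Choose a value for a missing cell from its probability distribution.

    distribution: modality -> probability given the known values of the case
    mode: "max" takes the most probable modality, "draw" samples from the distribution
    """
    if mode == "max":
        return max(distribution, key=distribution.get)
    modalities = list(distribution)
    weights = [distribution[m] for m in modalities]
    # Draw one modality with a chance proportional to its probability.
    return rng.choices(modalities, weights=weights, k=1)[0]
```

Taking the maximum always fills in the same value for the same case, whereas drawing preserves the variability of the distribution across the imputed database.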


Interactive bayesian updating

Interactive Bayesian updating uses the associated database as a file of observations. This file can then be used to update the probability distributions of the nodes that have been declared as "not observable", with respect to the observations that are read interactively from this file. Although the probability distributions of all the unobserved nodes can be affected by these observations, only those of the "not observable" nodes are updated after each observation. This mode displays a new toolbar for step-by-step updating as well as complete updating over the database:

The toolbar provides the following controls:
  • a button that returns to the first example of the database and resets the probability distributions of the "not observable" nodes,
  • a button that performs the updating from the current index to the last example in the database; this process can be stopped while running by clicking on the red light of the Status bar,
  • a button that steps to the next example,
  • a text field that indicates the index of the current example; entering an index in the field performs the updating from the current index to the new index, and if the new index is lower than the current one, the probability distributions are reset and the updating goes from index 0 to the specified one,
  • a button that validates the updated conditional probability tables,
  • a button that stops the interactive updating, reinitializes the conditional probability tables and removes all the observations.


Interactive inference

Interactive inference uses the associated database as a file of observations. This mode displays a new toolbar for navigating through the cases contained in the database:

The toolbar provides the following controls:
  • a button that returns to the first example of the database,
  • a button that goes to the last example,
  • a button that goes to the previous example, if there is one,
  • a button that goes to the next example,
  • a text field that indicates the index of the current example; entering an index in the field goes to that example directly,
  • a button that stops the interactive inference and removes all the observations.

For each example, the nodes are observed with the corresponding values in the database, except when the value is missing or when the node is declared as not observable or as the target node. The probability distributions of these unobserved nodes are computed and displayed in the monitors. When a node is unobserved but has a corresponding value in the database, this value is indicated in the monitor with a sky-blue bar. The joint probability and the corresponding number of cases are also recomputed.

In the following picture, Cancer is the target node (pink background) and is not observed. The corresponding value in the database is No (sky blue) and matches the value predicted by the network (99.97%). The node TbOrCa is not observed because it is declared as not observable (mauve background), and the corresponding value in the database is False (sky blue). The node Smoking is not observed because the corresponding value is missing in the database:

This mode thus makes it possible to observe the behavior of the network interactively and to check its validity.


Batch inference

Batch inference has been added in order to infer the probability distributions of the nodes declared as "not observable" based on the cases that are described in a database. The batch inference process can be interrupted at any time without losing the computed data. The data already generated are saved in the output database.

Batch labeling

The batch labeling process can be interrupted at any time without losing the computed data. The data already generated are saved in the database. A node with a "not observable" cost is not observed, even if its values are in the file.

New gain analysis tool

The gain curve ("Performance of the network" analysis toolbox) has been extended to analyze automatically the economic gains expected with the evaluated model. These computations rely on the definition of the unit cost of treating each individual (x-axis), the unit gain of each positive answer (y-axis), and the size of the target population. The economic gain is then defined as the difference between the profit obtained by treating x% of the population and the profit obtained by treating the whole population. As the following screen captures show, the result is displayed as a curve (blue curve) and as a color gradient (the closer to yellow, the closer to optimality).
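
The definition above can be written as a short Python sketch. It assumes that the profit is the gain brought by the positive answers reached minus the cost of the individuals treated, and that the gain curve supplies the fraction of all positive answers reached when a given fraction of the population is treated; the function and parameter names are chosen for the example.

```python
def economic_gain(captured, treated_fraction, unit_cost, unit_gain,
                  population, positive_rate):
    """Profit of treating part of the population minus profit of treating everyone.

    captured: fraction of all positive answers reached (read from the gain curve)
    treated_fraction: fraction of the population that is treated (x-axis)
    positive_rate: overall fraction of positive answers in the population
    """
    positives = population * positive_rate

    def profit(fraction_treated, fraction_captured):
        # Assumption: gains from the positives reached minus treatment costs.
        return (unit_gain * positives * fraction_captured
                - unit_cost * population * fraction_treated)

    # Treating the whole population reaches every positive answer.
    return profit(treated_fraction, captured) - profit(1.0, 1.0)
```

For a population of 1000 with 10% of positive answers, a unit cost of 1 and a unit gain of 15, treating the best 20% and reaching 60% of the positives yields a profit of 700 against 500 for the whole population, that is, an economic gain of 200.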

The economic parameters can be modified with the following dialog:


Complexity reduction algorithm

We have developed a new algorithm to reduce the complexity of graphs that are too densely connected to allow the construction of the junction tree, and that therefore prevent exact inference. This algorithm incrementally simplifies the network structure until exact inference can be performed.

After reduction, a report containing all the removed arcs is displayed.


Pearson's correlation

Associating values with the modalities of nodes makes it possible to compute R, Pearson's linear correlation coefficient, between two nodes linked by an arc. If the modalities do not have associated values, default values are used to compute R (from 0 to n-1 for a node with n modalities). The thickness of an arc is directly proportional to the absolute value of R, and its color represents the sign of R (blue if positive and red otherwise). The exact value of the correlation for each arc is temporarily displayed in the comment of the arc. Pearson's correlation has also been added to the relationships analysis report.
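
As an illustration, the following Python sketch computes R from the joint probability table of two nodes, using the default values 0 to n-1 when no values are supplied. It uses the standard definition of Pearson's coefficient; how BayesiaLab obtains the joint table is not covered, and the function name `pearson_r` is chosen for the example.

```python
import math


def pearson_r(joint, values_x=None, values_y=None):
    """Pearson's R between two discrete nodes.

    joint[i][j] is P(X = i-th modality, Y = j-th modality).
    Without explicit values, modalities are valued 0 to n-1.
    Both nodes are assumed to have a non-zero variance.
    """
    n_x, n_y = len(joint), len(joint[0])
    vx = list(range(n_x)) if values_x is None else values_x
    vy = list(range(n_y)) if values_y is None else values_y
    cells = [(joint[i][j], vx[i], vy[j]) for i in range(n_x) for j in range(n_y)]
    mean_x = sum(p * x for p, x, _ in cells)
    mean_y = sum(p * y for p, _, y in cells)
    cov = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in cells)
    var_x = sum(p * (x - mean_x) ** 2 for p, x, _ in cells)
    var_y = sum(p * (y - mean_y) ** 2 for p, _, y in cells)
    return cov / math.sqrt(var_x * var_y)
```

Two binary nodes that always take the same modality give R = 1, two nodes that always disagree give R = -1, and two independent nodes give R = 0.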

Network's skeleton

In Validation mode, the network can be displayed without arrowheads on the arcs, in order to avoid any erroneous causal interpretation of the direction of the arrows. This option is activated by pressing the corresponding button in the toolbar:


Settings


Editing

An option has been added in the Settings to choose the behavior of the software after a node or an arc has been created:
  • the software goes back to selection mode automatically once a node or an arc has been created,
  • the software goes back to selection mode when the user right-clicks.


Database

It is possible to set default values for some options of the importation wizard:
  • completion algorithm for the missing values (static, dynamic or Structural EM),
  • discretization method (equal distances or equal frequencies),
  • number of intervals.
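
For readers unfamiliar with the two discretization methods, here is a minimal Python sketch of how the interval bounds can be obtained. It uses common textbook definitions, which may differ in detail from the rules applied by the importation wizard; the function names are chosen for the example.

```python
def equal_distance_bounds(data, k):
    """Inner bounds of k intervals of equal width between min and max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_bounds(data, k):
    """Inner bounds of k intervals holding roughly the same number of points."""
    ordered = sorted(data)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]
```

On the data 0, 1, ..., 8, 100 with two intervals, equal distances put the bound at 50, so that all points but one fall in the first interval, whereas equal frequencies put it at 5, which splits the points evenly.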


Inference

It is possible to modify the parameter of the network complexity reduction algorithm. Depending on the available RAM, the reduction rate can be increased or decreased.


© 2001-2008 Bayesia SA.
All rights reserved.