
BayesiaLab 4.5: new features


Data


Database saved with graph The data associated with the network can be saved inside the same file as the network. This is now done by default in BayesiaLab; an option in the "settings" menu allows deactivating automatic database saving.



When a loaded network contains a database, the database is automatically loaded too (unless the user explicitly specifies not to do so, see section "optional database loading").

Evidence scenario file It is possible to associate an evidence scenario file with a Bayesian network. This file contains a series of evidence sets that are applied to the nodes of the network. The file can be written by the user and then imported, or it can be generated by saving the evidence entered with the monitors; the generated file can then be exported. A comment can be associated with each evidence set. This comment is displayed in the network's status bar during interactive inference or interactive updating.

There are three possible kinds of evidence, the same as those obtained with the monitors:
  1. Hard evidence on a state of a node
  2. Likelihood distribution over the states of a node (noted l{...} )
  3. Probability distribution over the states of a node (fixed distribution with exact inference, computation of the corresponding likelihoods with approximate inference) (noted p{...} )
When a network is temporal (it uses the Time variable or has at least one temporal node), the time step of an evidence set can be specified by indicating its value as a non-negative integer.

The following example contains hard evidences, likelihoods and probabilities for four time steps:
0;?Valve1?:OK;?Valve2?:OK;?Valve3?:OK //All the valves are working
2;?Valve1 t+1?:l{OK:0.8;RC:0.9;RO:0.9}
20;?Valve2 t+1?:l{OK:0.3;RO:0.3;RC:0.3};?Valve1 t+1?:p{OK:0.2;RO:0.4;RC:0.4}
30;?Valve3 t+1?:OK;?Valve1 t+1?:p{OK:0;RO:0.8;RC:0.2}
When a temporal evidence file is associated with a temporal network, the evidence is taken into account each time the time meter reaches one of the specified time steps. When the file and the network are not temporal, the evidence is taken into account during interactive inference or during interactive updating.
The following example shows non-temporal evidence, including evidence with numerical values:
?Smoker?:Yes;?Age?:25.5;?Bronchitis?:p{Yes:0.8;No:0.2} //Young smoker with a large probability of bronchitis
?Smoker?:No;?Age?:70;Dyspnea:l{Yes:0.8;No:0.5} //Non-smoker senior person 
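The line format shown in the examples above can be read with a few lines of Python. This is a minimal sketch, not BayesiaLab's own reader: the function names are hypothetical, the `?` characters are treated as optional delimiters around node names exactly as they appear in the examples, and hard evidence values are kept as plain strings.

```python
import re

def split_top_level(text, sep=";"):
    """Split on sep, ignoring separators nested inside {...}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]

def parse_evidence_line(line):
    """Return (time_step or None, {node: (kind, value)}, comment)."""
    body, _, comment = line.partition("//")
    fields = split_top_level(body)
    time_step = None
    # Assumption: a leading bare integer is the time step of a temporal file.
    if fields and fields[0].isdigit():
        time_step = int(fields.pop(0))
    evidence = {}
    for field in fields:
        name, _, value = field.partition(":")
        name = name.strip().strip("?")  # assumption: '?' only delimits the name
        value = value.strip()
        match = re.fullmatch(r"([lp])\{(.*)\}", value)
        if match:
            kind = "likelihood" if match.group(1) == "l" else "probability"
            distribution = {}
            for item in split_top_level(match.group(2)):
                state, _, number = item.partition(":")
                distribution[state.strip()] = float(number)
            evidence[name] = (kind, distribution)
        else:
            evidence[name] = ("hard", value)  # state name or numeric value, as text
    return time_step, evidence, comment.strip()
```

For instance, `parse_evidence_line("0;?Valve1?:OK //All the valves are working")` yields time step 0, a hard evidence on Valve1 and the comment text.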
When an evidence scenario file is created or imported, the icon is displayed in the network's status bar. The tooltip associated with this icon indicates the number of evidence sets contained in the file. Clicking this icon removes the association between the network and the file.

Default missing and filtered values lists When importing and associating data, default values for missing and filtered data are defined. The lists are filled from the import/associate section in the settings menu. With this option, any value in the data that matches a value in those lists will automatically be recognized as missing or filtered.

Default missing values and default filtered values lists:


In the settings screen, the two lists can be tuned:
Default missing and filtered values lists

Associated continuous variables as targets for discretization and aggregation When associating data, it is now possible to use continuous associated variables as targets for discretizing or aggregating other added variables. This comes in addition to manually discretized added variables.

Optional database loading If a network has been saved with a database, this database can be loaded or not by checking the Load database option in the network loading dialog box.

Database loading option

Import/Export forbidden arcs in the menus Exporting and importing forbidden arcs are now available from the import/export dictionary menu.

K-Means discretization algorithm upgrade The K-Means discretization algorithm has been upgraded to offer new features:
  1. When the size of one of the created intervals is below a specified threshold, the points contained in the interval are automatically merged with its neighbours.
  2. A filter has been introduced to reduce the impact of outliers on discretization. The filter value is used to compute the width of the sliding window which will balance each value of the database. The size of this window is the given percentage of the number of distinct values. To activate this filter, two conditions must hold: first, the filter value must be greater than or equal to 2; second, the number of distinct numeric values in the database must be greater than (2 + width of window).
These properties can be tuned from the settings:

K-Means options
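The two activation conditions of the outlier filter can be written down as a small check. This is only a sketch of the rule as stated above: the function names are hypothetical, and truncating the window width to an integer is an assumption, since the rounding used by BayesiaLab is not specified here.

```python
def filter_window_width(filter_percent, n_distinct):
    """Window width as a percentage of the number of distinct values.

    Assumption: the width is truncated to an integer.
    """
    return int(n_distinct * filter_percent / 100)

def filter_is_active(filter_percent, n_distinct):
    """Apply the two stated conditions for activating the filter."""
    width = filter_window_width(filter_percent, n_distinct)
    return filter_percent >= 2 and n_distinct > 2 + width
```

With 200 distinct values and a filter value of 10, the window width is 20 and the filter is active; with a filter value of 1 it is not.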


States modification without losing the database association When a state is added or when the order of the states is changed in a node, the database is no longer automatically removed. The system checks whether the states are still compatible with the database and reconfigures it. Note that a state cannot be removed without removing the database association too.

Concerning continuous nodes, the bounds of intervals can be changed, as well as the intervals themselves (deletion or addition), as long as the internal database contains numerical values.

Network


Filtered states For each node, discrete or continuous, it is possible to define one and only one filtered state among all the states of the node. This state allows us to represent the cases where the variable doesn't have a real existence according to the other variables of the network. It is the case when the current variable has a value only if another variable has a specific value. For example, let us take the case of a variable Analysis which is performed only according to the result of a variable Test:

The variable Test has two states: "True", "False".

The variable Analysis has three states: "Positive", "Negative" and "Not applicable" (also noted "*"), the last of which will be defined as a filtered state
  • if the value of the variable Test is "True" then the value of the variable Analysis can be "Positive" or "Negative" according to a certain probability distribution
  • if the value of the variable Test is "False" then the value of the variable Analysis will be the filtered value "Not applicable" (or "*")
This configuration often occurs in the results of questionnaires, surveys, etc., where questions or items are asked or not according to the answer to previous questions. The problem is that when the structure of a network is learned with this type of data, very strong structural relations between the "pivot" variables (like the variable Test) and the "conditional" variables (like the variable Analysis) are systematically extracted to represent this dependence while generally bringing no "interesting" information to the model. The important thing to learn is the behavior of the "conditional" variables with regard to the model only when they exist, i.e. on the data sample in which their value is different from the filtered value. Defining a filtered state allows doing this.
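The idea of learning a "conditional" variable only where it exists can be illustrated on raw data. The sketch below is not BayesiaLab's learning code; it simply estimates the distribution of a variable over the records whose value differs from the filtered value `*`, with a hypothetical helper name.

```python
from collections import Counter

FILTERED = "*"  # marker used for filtered values in the text above

def distribution_without_filtered(values):
    """Relative frequency of each state, ignoring filtered records."""
    kept = [v for v in values if v != FILTERED]
    if not kept:
        return {}
    counts = Counter(kept)
    return {state: n / len(kept) for state, n in counts.items()}

# Analysis is only recorded when Test is "True"; otherwise it is "*".
analysis = ["Positive", "Negative", "*", "*", "Positive", "Positive"]
print(distribution_without_filtered(analysis))
```

Here the two filtered records do not dilute the estimate: the distribution is computed over the four records in which Analysis actually exists.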

Filtered state indicators:

When a network has filtered states, the filtered state indicator  is displayed in the network's status bar.

If a node has a filtered state, its monitor displays the icon in front of the corresponding state, as shown in the following picture:



Importing/Associating data containing filtered values:

It is possible to define a filtered state at data importation or at data association. In this case all the filtered values of the database will be replaced by the character *. At the end of importation, for each node containing a filtered value, a filtered state will be added to the list of the states present in the database. Concerning continuous nodes, a new interval of width 1E-7 will be added after the intervals defined during importation. The name of this interval will be * and it will be considered as a filtered state.
In the case of data association, if the variable is added, the behavior is the same as at importation. If the variable already exists and is continuous, several cases must be distinguished:
  1. No initially filtered state in the continuous node, filtered values in the database: an interval of width 1E-7 will be added after the intervals defined in the node.
  2. Initially filtered state in the continuous node and no filtered value in the database: the imported values which correspond to the interval of the filtered state are treated as filtered values.
  3. Initially filtered state in the continuous node and filtered values in the database: the imported values which correspond to the interval of the filtered state are treated as filtered values and the imported filtered values are also associated with this interval.
Editing filtered states:

The node editor has a specific panel allowing us to manually define one filtered state by node.

Filtered states dictionary:

It is possible to define what are the filtered states of the nodes in the network by using a dictionary. It is also possible to export a dictionary containing the filtered states of the network.

Excluded nodes A node can now be temporarily excluded from the workspace. Those nodes are not taken into account during structural learning. Node exclusion/inclusion is done from the node's contextual menu.
The MDL score is modified accordingly: the impact of the excluded nodes is removed (with or without missing values).

Global editor and indicator of temporal indices A dialog box has been designed for editing the temporal indices of nodes:

Global temporal indices editor

When at least one node has a temporal index, the icon is displayed in the network's status bar. Clicking this icon opens the dialog box.

Cost indicator When at least one node has an associated cost, the icon is displayed in the network's status bar. Clicking this icon opens the cost editor dialog box.

CPT of the selected nodes in the graph report The graph report data is completed with selected nodes CPT. If no node is selected, all CPTs are added to the report.

Excluded nodes list in the graph report The graph report also contains a list of excluded nodes.

Sorting of forbidden arcs in the graph report If the network contains forbidden arcs, then the sorted list of these arcs is added to the graph report.

Structural coefficient for each network The structural complexity influence coefficient is now associated to each network.
By default, this coefficient value is 1. It can be modified from the popup menu with this dialog box:

Structural complexity influence editor

When the coefficient differs from 1, the icon is displayed in the network's status bar. Clicking this icon opens the dialog box.

Learning


Temporal indices taken into account in the spanning tree The maximum spanning tree learning algorithm now takes into account the node temporal indices.

Last used learning kept and saved When a structural learning has been processed on a network, information about the used learning algorithm is saved in the network until the next learning. This information is also included in the saved network file. This information allows reusing the same learning algorithm when performing arc confidence analysis or targeted cross validation.

Saving continuous values in the multiple clustering's databases Each new intermediate database generated during multiple clustering can be saved with the corresponding network, depending on the settings, and can also contain the numerical data extracted from the initial database.

Cluster with sorted numerical states For data clustering and multiple clustering, clusters composed of numerical states can now be generated, thanks to the option in the dialog box:

Clustering settings

If the option is checked, the cluster node will be created with ordered numerical states. These values are calculated from the score average of connected nodes. If two values are strictly identical, an epsilon value is added to one of them in order to distinguish them. Excluded nodes are not taken into account for calculating these values.

Modification of state values computation in the clustering Cluster's state values are calculated on the basis of connected nodes, without excluded nodes.

Inference


Improvement of the junction tree's performance Inference computation performance in the junction tree has been dramatically improved: computation is up to 60 times faster in strongly connected networks.

Improvement of the junction tree's creation As a consequence of the inference computation improvement, junction tree creation is potentially longer. To offset this potential pitfall, the junction tree creation procedure has been multi-threaded, resulting in a strong time gain directly proportional to the number of cores/processors of the computer.

Improvement of the complexity reducer The complexity reducer has been improved to detect faster and more accurately the junction trees that would not fit into the available memory. The discovery of arcs that can be deleted through the complexity reducer has also been improved.

Fixed probability observations A new kind of evidence has been introduced: the fixed probability observation. It comes in addition to the simple setting of probabilities, which already existed, and is only possible with exact inference.

Like setting probabilities, fixing the probabilities allows indicating the probability distribution of a node. However, the likelihoods are recomputed after each new piece of evidence entered on the other nodes (hard, soft, or probability distribution) so that the final probability distribution of the node remains the same as initially entered by the user.
The probability edition mode is available in two ways: by pressing the CTRL and SHIFT keys while clicking on a state bar, or by using the contextual menu associated with the monitor. Light green, mauve and red buttons are then added to the monitor. The probabilities can be entered:
  • by holding the left mouse button down while choosing the desired probability level, or
  • directly by editing the probability value after a double-click on the value.
  • A click on the name of the state (on the right) fixes the current probability value (the probability bar is green).
Once all the probabilities are entered, the light green button allows setting the probabilities and the mauve button allows fixing the probabilities. The probability distribution is then updated. The red button allows cancelling the probability edition.



So, there are two ways to use the probability capture:
  1. Simply setting the probabilities: When the probabilities are validated with the light green button, the likelihoods associated with the states of the node are recomputed so that the marginal probability distribution matches the distribution entered by the user. It is, in fact, an indirect entry of the likelihoods. Note that, at the next observation of another node, the probability distribution of this node will change because the likelihoods are not recomputed.
    The result will be displayed with light green bars as the likelihoods:

    The observed node takes the light green color of the evidence.
  2. Fixing the probabilities: When the probabilities are validated with the mauve button, the likelihoods associated with the states of the node are computed as in the previous case. However, after each new piece of evidence entered on the other nodes, the likelihoods are recomputed so that the marginal probability distribution matches (if a solution exists) the distribution initially entered by the user. Fixing probabilities is also done in the evidence scenario files with the notation p{...}. Note that fixing probabilities is only valid for exact inference. If approximate inference is used, fixing probabilities is treated like simply setting the probabilities.
    The result will be displayed with mauve bars:

    The observed node takes the mauve color of the evidence.
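The difference between setting and fixing probabilities can be seen on a single node. For a node whose current marginal is P and whose desired distribution is Q, likelihoods proportional to Q/P give back Q once applied and normalized. The sketch below works on one node in isolation with hypothetical function names; in a real network the marginal would come from inference, and no solution exists if Q gives a positive probability to a state whose marginal is zero.

```python
def likelihoods_for_target(marginal, target):
    """Likelihoods (scaled so the largest is 1) that turn marginal into target."""
    raw = {s: (target[s] / marginal[s] if marginal[s] > 0 else 0.0)
           for s in marginal}
    top = max(raw.values())
    return {s: v / top for s, v in raw.items()}

def apply_likelihoods(marginal, likelihoods):
    """Posterior distribution after multiplying by likelihoods and normalizing."""
    weighted = {s: marginal[s] * likelihoods[s] for s in marginal}
    total = sum(weighted.values())
    return {s: v / total for s, v in weighted.items()}

target = {"Yes": 0.8, "No": 0.2}
before = {"Yes": 0.5, "No": 0.5}          # marginal when the user enters the target
lik = likelihoods_for_target(before, target)

# New evidence elsewhere changes the node's marginal (values chosen for illustration).
after = {"Yes": 0.2, "No": 0.8}
set_result = apply_likelihoods(after, lik)                                   # likelihoods kept as is
fixed_result = apply_likelihoods(after, likelihoods_for_target(after, target))  # likelihoods recomputed
```

With the likelihoods left unchanged ("setting"), the distribution drifts away from the entered one after the new evidence; recomputing them ("fixing") restores it.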

Evidence scenario file for batch exploitation The evidence scenario file associated with the network can now be used for batch exploitation algorithms:
  • Labeling
  • Inference
  • Most probable explanation labeling
  • Most probable explanation inference
  • Joint probability
Three types of observations are now allowed:
  1. Hard evidence on a state of a node
  2. Likelihood distribution over the states of a node
  3. Probability distribution over the states of a node (fixed distribution with exact inference, computation of the corresponding likelihoods with approximate inference)

Evidence scenario file for interactive inference and updating The evidence scenario file associated with the network can now be used for interactive inference and interactive updating. A dialog box allows choosing data source if needed.

In this configuration all three observation types are allowed.

Any observation comments present in the file are displayed in the network's status bar.

Evidence scenario file for temporal networks The evidence scenario file associated with the network can now be used for temporal simulation. The file must contain temporal markers so that the observations are applied at the required time steps.

In this configuration all three observation types are allowed.

Analysis


Initial probability displayed in the target sensitivity analysis In the chart of the target sensitivity analysis, the initial probability of each state is displayed as caption.

Unification of evidence contexts The observations made on nodes before a graphical analysis or an analysis report are taken into account whatever the observation type is (hard, likelihood, probability).
These evidence contexts are displayed in HTML in the reports or directly in the graphical results with the same rules:
  1. The hard evidence on a node's state is noted:
    <Node>: <State>
  2. The observation of a likelihood distribution on the states of a node is noted:
    <Node>: l{ <State1>: <Likelihood1 %> , ... , <StateN>: <LikelihoodN %>}
  3. The observation of a probability distribution on a node (fixed distribution in exact inference, computation of the corresponding likelihoods in approximate inference) is noted:
    <Node>: p{ <State1>: <Probability1 %> , ... , <StateN>: <ProbabilityN %>}

Comments and long names displayed in the parameter sensitivity analysis In the chart of the parameter sensitivity analysis, the contextual menu allows displaying or not the node comments instead of the names and the state long names.

KL taking into account the filtered values The Kullback-Leibler computation takes into account the filtered states declared in the nodes.

KL and global contribution displayed in the arc comments In the arc force analysis, the arc comments are replaced by the value of the Kullback-Leibler divergence and by the value of the global contribution of each relationship.

Simply press the display arc comment button to obtain them.

Choice of the independence test G or Chi² Two independence tests can be computed: the Chi²-test and the G-test. In the analyses where they are computed, it is possible to display either one or the other. This choice is made in the statistical tools' settings, where a list allows choosing the independence test to use:
Choice of the independence test

Once the test is chosen, it will be used in the relationship analysis, the total effects, the target analysis report and the chart of the occurrence matrix.
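For reference, the two statistics can be computed from a contingency table with the textbook formulas. This is a minimal sketch with a hypothetical function name, not BayesiaLab's implementation; it assumes every row and column total is non-zero, and it returns the statistics and the degree of freedom only (turning them into a p-value requires the chi-square distribution, which is left out here).

```python
import math

def independence_statistics(table):
    """Return (chi2, g, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, g, dof

chi2, g, dof = independence_statistics([[10, 20], [20, 10]])
```

Both statistics are compared against the same chi-square distribution with `dof` degrees of freedom, and they usually give close values on tables with reasonably large counts.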

GKL-test on the network for the relationship analysis In the relationship analysis report, the independence test G computed from the Kullback-Leibler divergence of the relationships was added to the HTML report. This test is noted GKL-test. The degree of freedom and the corresponding p-value are also displayed.
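The link between a G statistic and the Kullback-Leibler divergence can be made explicit: for a joint distribution over two variables and N examples, G equals 2·N times the divergence (in nats) between the joint and the product of its marginals. The sketch below illustrates that identity only; how BayesiaLab derives the divergence of a relationship and which N it uses are not specified here, and the function name is hypothetical.

```python
import math

def g_from_kl(joint, n_examples):
    """G statistic as 2 * N * KL(joint || product of marginals), KL in nats."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    kl = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                kl += p * math.log(p / (px[i] * py[j]))
    return 2.0 * n_examples * kl

# Joint probabilities of two binary variables, for 60 examples.
joint = [[10 / 60, 20 / 60], [20 / 60, 10 / 60]]
g_kl = g_from_kl(joint, 60)
```

If the divergence is expressed in bits rather than nats, it has to be multiplied by ln 2 before applying the formula.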

Independence test on the network or the database for the relationship analysis, total effects and target analysis In the reports of the relationship analysis, the total effects and the target analysis, the independence test between variables is now computed. The degree of freedom of the relationship and the corresponding p-value are also displayed.

If a database is associated with the current network, the independence test is then computed from the data. Otherwise, it will be computed from the network. When it is computed from the data, the used source (Data) is displayed next to it.

This independence test can be either the Chi²-test or the G-test. The choice of this test is made in the settings of the statistical tools.

Example in the total effects, without database and with the Chi²-test:

Computation of the independence test, degree of freedom and p-value

Index of continuous node's states displayed in the reports When a node is continuous, each state is displayed, in the analysis reports, with its index and the total number of states. The format is: State (i/n) where i is the index of the state and n the total number of states.

This indicates the position of the state in the ordered list of all the states.

Example of the node Age in the following report:

Display of the index of continuous node states

Tooltip in the dendrogram of the variable clustering In the dendrogram chart of the variable clustering, when the mouse cursor is moved over the link junctions, a tooltip containing the value of the link computed from the arc force is displayed.

Degree of freedom displayed for the occurrence matrix and the mosaic In the charts of the occurrence matrix and mosaic analysis, the degree of freedom has been added:

Degree of freedom in the occurrence matrix

Degree of freedom in the mosaic

Mosaics copied in HTML format In addition to the copy as image, it is now possible to copy as HTML tables the obtained mosaics.

Mosaics can be computed from network or from database If a database is associated with the network, it is possible to choose the data source used for the standardized Pearson's residual computation:
  • Network: the Structure equivalent example number allows simulating a set of data to compute the standardized Pearson's residual. This number is, by default, the number of examples used for the last learning.
  • Database: the standardized Pearson's residual is directly computed from the population of the associated database.
If a database exists but one of the selected nodes is hidden (there is no corresponding data in the database) then the default selected data source is the network.

If a database exists and it has data for learning and test, a dialog box proposes to choose the data to use (all, learning, test).
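As a reminder of what a mosaic cell represents, the standardized Pearson residual of a cell compares the observed count with the count expected under independence, scaled by its estimated standard deviation. The sketch below uses the usual textbook definition on a table of counts; it is an assumption that BayesiaLab uses exactly this form, and the function name is hypothetical.

```python
import math

def standardized_residuals(table):
    """Standardized Pearson residuals of a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        line = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            line.append((observed - expected) / math.sqrt(variance))
        residuals.append(line)
    return residuals

residuals = standardized_residuals([[10, 20], [20, 10]])
```

With the Database source the counts come straight from the data; with the Network source they would be simulated counts, for example the joint probability of each cell multiplied by the chosen number of examples.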

Monitors


Evidences saved in the evidence scenario file A new button in the monitor's toolbar allows saving the whole set of evidences defined on the monitors in the current evidence scenario file. If no evidence scenario file is associated to the network, a new one will be automatically created and the icon  will be displayed in the network's status bar.

The three types of evidences are available:
  1. Hard evidence on a state of a node
  2. Likelihood distribution over the states of a node
  3. Probability distribution over the states of a node (fixed distribution with exact inference, computation of the corresponding likelihoods with approximate inference)
If the network is temporal, then the set of evidences will be added for the current time step.

When the evidences are added, a dialog box proposes to add a comment corresponding to the added evidences. This comment will be displayed in the status bar during interactive inference or updating.

Color of the nodes and monitor's bars according to the kind of evidence
Now, a color is associated to each kind of evidence:
  • Green: hard evidence on a node
  • Light green: observation of the likelihoods (direct or via the probability setting)
  • Mauve: fixed probability observation
An observed node and the corresponding monitor's bars take the color of the evidence set.

Changing target node and target state Pressing the ALT key while clicking on a state of a monitor quickly changes the target node and/or the target state.

Centering nodes corresponding to the selected monitors
When one or more monitors are selected, pressing the S key selects the corresponding nodes and centers the display on them. If necessary, the selection will be fitted to the window. When one monitor is selected, pressing the C key searches for the corresponding node (it will then blink).

Interface


Optimization of multiple deletion with a database
When a database is associated and nodes are removed from the network, the database must be modified and the indices must be computed again for each removed node. Now, the process is done in only one cycle, whatever the number of simultaneously removed nodes.
The undoing of the deletion was also improved, knowing that the corresponding data are always definitively removed.

Selection preserved when changing the mode
If a selection of nodes and arcs is done on the network and the user changes the mode (Validation/Modeling), this selection is now preserved in the new chosen mode.

Ability to hide the node names A new item in the View menu allows hiding the node names.

Node names displayed:
Node names hidden:
Node names hidden with comments displayed:

Working directories It is now possible to define working directories in BayesiaLab from the Network menu.
The created and selected working directory will be used as default directory to load or save the various kinds of files used. A name is associated to each created directory.

Creating a working directory

A list of the recent working directories is kept in order to choose which one we want to use.

The working directories management is done in the settings.

Displaying client identifier If the user has a version of BayesiaLab without license server, the client identifier is now displayed in the About dialog box.

New node selections In the menu Select Nodes of the Edit menu, it is now possible to select:
  • all nodes,
  • excluded nodes.

New arc selections The Select Arcs menu was added to the Edit menu. Now, it is possible to select:
  • all arcs,
  • fixed arcs,
  • temporal arcs,
  • not-oriented arcs.

Compatibility improvements with Mac OS X Leopard The use of the CTRL key was replaced by the use of the CMD key under Mac OS X Leopard in order to fit the Mac standards. The corresponding shortcuts are then modified. The combination CTRL + left click allows simulating the Windows right click.

New keyboard shortcuts New keyboard shortcuts were added in order to speed up the use of BayesiaLab:
  • Shift + M (validation mode): Mosaic analysis
  • Shift + P (validation mode): Target dynamic profile
  • P (in the structure comparator): Automatic layout
  • Q (validation mode): Adaptive questionnaire
  • F (validation mode): Arc force analysis
  • G (validation mode): Pearson's correlation
  • H (validation mode): Node force analysis
  • S (validation mode): Variable clustering
  • C + click on a monitor (validation mode): Search of the corresponding node
  • S + several selected monitors (validation mode): Selection of the corresponding nodes and fit of the display into the window if necessary
  • Alt + click on a state in a monitor (validation mode): Corresponding node and state considered as target

Tools


New Tools menu A menu Tools was added. It contains a set of tools allowing the comparison of two networks, arc confidence analysis and the targeted cross validation. These tools allow using the structure graphical comparator and the network extractor.

Graphs comparison The menu Tools>Compare gives access to the structure comparison of two Bayesian networks. The networks must have exactly the same nodes.

Parameters:
 
This dialog box allows choosing the networks to be compared:


The left hand side network is the "reference" network for comparison. Clicking the button below the picture allows loading another reference network.

The right hand side network is the "comparison" network. In the same way, clicking the button below the picture allows loading another network.

An option allows choosing whether the Bayesian networks themselves or their equivalence classes shall be compared.

Comparison report:
 
The following report (HTML format) is created by clicking the Compare button:


The first line indicates the names of compared networks.

The rest of the report contains up to ten lists:
  • Common arcs: same arcs in networks
  • Inverted arcs: arcs whose orientation changes in the comparison network
  • Added arcs: arcs that exist only in the comparison network
  • Deleted arcs: arcs that exist only in the reference network
  • Common edges: same edges in networks
  • Added edges: edges that exist only in the comparison network
  • Deleted edges: edges that exist only in the reference network
  • Common V-Structures: same V-Structures in networks
  • Added V-Structures: V-Structures that exist only in the comparison network
  • Deleted V-Structures: V-Structures that exist only in the reference network
The report can be printed and/or saved in an HTML-format file.
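The arc and V-Structure lists of the report amount to set operations on the two structures. The sketch below is an illustration with hypothetical function names, not the comparator's code: arcs are (parent, child) pairs, inverted arcs are reported in their reference orientation (an assumption), and the edge lists of the equivalence-class comparison are left out.

```python
from itertools import combinations

def compare_arcs(reference, comparison):
    """Classify arcs of two networks defined over the same nodes."""
    ref, cmp_ = set(reference), set(comparison)
    inverted = {(a, b) for (a, b) in ref if (b, a) in cmp_}
    return {
        "common": ref & cmp_,
        "inverted": inverted,
        "added": {(a, b) for (a, b) in cmp_ - ref if (b, a) not in ref},
        "deleted": {(a, b) for (a, b) in ref - cmp_ if (b, a) not in cmp_},
    }

def v_structures(arcs):
    """Triples (parent1, child, parent2) whose two parents are not adjacent."""
    arcs = set(arcs)
    adjacent = arcs | {(b, a) for (a, b) in arcs}
    parents = {}
    for a, b in arcs:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, node_parents in parents.items():
        for p1, p2 in combinations(sorted(node_parents), 2):
            if (p1, p2) not in adjacent:
                found.add((p1, child, p2))
    return found

reference = [("A", "C"), ("B", "C"), ("C", "D")]
comparison = [("A", "C"), ("C", "B"), ("A", "D")]
report = compare_arcs(reference, comparison)
```

Common, added and deleted V-Structures follow from the same intersection and difference operations applied to `v_structures(reference)` and `v_structures(comparison)`.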

Graphs:

The Graphs button from the report allows displaying the structure comparison graphic tool. With this tool, the data from the report can be easily viewed and the differences between the reference network and the comparison network can be easily detailed.

Arc confidence analysis The menu item Tools>Cross validation>Arc confidence allows computing a frequency value that measures how robust each arc is. The network needs a database and the validation mode must be activated.
For this purpose, the Jackknife method is used. It consists of splitting the database into k samples of equal size. k networks are then learned, each from k-1 samples, leaving out a different sample each time.
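The Jackknife splitting scheme can be sketched as follows. The function name is hypothetical, and two points are assumptions: rows are assigned to samples in an interleaved way without shuffling, and when the number of rows is not a multiple of k the sample sizes differ by at most one.

```python
def jackknife_training_sets(n_rows, k):
    """Split row indices into k samples and build the k training sets.

    Training set i contains every sample except sample i.
    """
    indices = list(range(n_rows))
    samples = [indices[i::k] for i in range(k)]
    training_sets = []
    for held_out in range(k):
        rows = [row
                for s, sample in enumerate(samples) if s != held_out
                for row in sample]
        training_sets.append(sorted(rows))
    return samples, training_sets

samples, training_sets = jackknife_training_sets(10, 5)
```

Each training set would then be passed to the chosen learning algorithm, giving one network per left-out sample.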

Parameters:

In the following window, the learning algorithm shall be selected, as well as the number of data samples to be generated.

The window displays sample size depending on the size of the database and the number of samples.
All the networks and databases generated during the Jackknife process can be saved in the output directory.
Analysis report:

Once the networks have been learnt on each data sample, the following report is displayed:

It is made of four parts:
  1. The learning context: Reminds the learning method and the number of data samples. It also indicates the structural complexity coefficient used (it is the same as the one used in the initial network).
  2. Arc confidence analysis: Lists the arcs, grouped into three color-coded types:
    • Black: the arcs that exist both in the reference structure and in the networks learnt from the samples.
      • Arc frequency represents how often the arc appeared in the sample networks with the same orientation as in the reference structure.
      • Inverted arc frequency represents how often the arc appeared in the sample networks with the reverse orientation.
      • Edge frequency represents how often the arc appeared in the sample networks without any orientation (the equivalence class of the learnt networks is used).
      • Total frequency is the sum of all the previous ones. It indicates the overall strength of the relationship between the variables.
    • Blue: the arcs that exist in at least one sample network but not in the reference network. In this case, the frequencies are displayed with a negative value. The reference orientation of the arc is arbitrarily given by the first arc found in the first sample network.
      • Arc frequency represents how often this arc appeared with the same orientation as the first.
      • Inverted arc frequency represents how often this arc appeared with the reverse orientation compared with the first.
      • Edge frequency represents how often the arc appeared without any orientation (equivalence class).
      • Total frequency is the sum of all the previous ones. It indicates the overall strength of a relationship that does not exist in the reference network.
    • Red: arcs that exist in the reference structure but that have never been found in any learnt sample.
  3. V-Structures confidence analysis: Lists the V-Structures, grouped into three color-coded types:
    • Black: V-Structures that exist both in the reference structure and in the networks learnt from the samples.
    • Blue: V-Structures that do not exist in the reference structure but that appeared at least once in a sample network. Frequencies are displayed with a negative value.
    • Red: V-Structures that exist in the reference network but that never appeared in any sample network.
  4. Comparison structure table: It summarizes all the networks learnt from the samples. Identical structures are grouped together.
    • The first column is the structure identifier.
    • The second column is the number of identical structures learnt.
    • The third column represents the frequency of the whole structure: the number of times the current structure appeared divided by the total number of structures.
    • The last column indicates whether the reference structure is included or not in the current structure.
The report can be saved as an HTML file and can also be printed. Two other options exist: displaying graphs and extracting the network.
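
The four frequencies reported for an arc of the reference structure can be illustrated with the following Python sketch. It rests on several assumptions: each sample network is represented as a set of directed arcs plus a set of undirected edges, frequencies are expressed as a fraction of the sample networks (the report may use another scale), and the name `arc_confidence` is hypothetical.

```python
def arc_confidence(reference_arc, sample_networks):
    """Return the frequencies of one reference arc over the sample networks.

    Each sample network is a dict with a set of directed "arcs"
    (parent, child) and a set of undirected "edges" (frozensets).
    """
    parent, child = reference_arc
    n = len(sample_networks)
    same = sum((parent, child) in net["arcs"] for net in sample_networks)
    inverted = sum((child, parent) in net["arcs"] for net in sample_networks)
    undirected = sum(frozenset(reference_arc) in net["edges"]
                     for net in sample_networks)
    return {
        "arc": same / n,
        "inverted": inverted / n,
        "edge": undirected / n,
        "total": (same + inverted + undirected) / n,  # sum of the three
    }


samples = [
    {"arcs": {("A", "B")}, "edges": set()},
    {"arcs": {("A", "B")}, "edges": set()},
    {"arcs": {("B", "A")}, "edges": set()},
    {"arcs": set(), "edges": {frozenset(("A", "B"))}},
]
result = arc_confidence(("A", "B"), samples)
print(result)
```

In this example the arc A→B is found with the reference orientation in half of the samples, but the relationship between A and B is present in all of them, which is what the total frequency captures.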

Graphs:

The Graphs button in the report displays the graphical structure comparator. With this tool, the data contained in the reports can be viewed and interpreted easily.

Extracting the network:

The Network extraction button in the report displays the network extraction tool. This tool builds a network from any structure, based on arc frequency thresholds.

Targeted cross validation The menu Tools>Cross Validation>Targeted performs a targeted cross validation based on the current network. The requirements are: (1) the network must have associated data, (2) the validation mode must be activated and (3) the network must have a target variable.

For targeted cross validation, the k-Folds method is used. It consists of splitting the database into k parts (the folds) and learning a set of k networks, each from k - 1 folds. For each network, the remaining fold is used for testing the network and measuring its predictive performance. For each network learnt, the continuous variables are re-discretized according to the variable distribution in the folds used for learning. The discretization method is the same as the one used for the reference network. In contrast, the initial aggregations are kept.

On this basis, the network structures are learnt using the chosen algorithm, and each network's targeted performance is computed.
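
The k-Folds loop can be outlined as follows. This is a minimal sketch under stated assumptions: a majority-class predictor stands in for the learnt network, performance is measured as the share of correct predictions on the test fold, and the name `k_fold_scores` is hypothetical.

```python
from collections import Counter


def k_fold_scores(labels, k):
    """Return one score per fold: learn on k - 1 folds, test on the other."""
    size = len(labels) // k  # assumption: remainder rows are dropped
    folds = [labels[i * size:(i + 1) * size] for i in range(k)]
    scores = []
    for test_index in range(k):
        train = [y
                 for j, fold in enumerate(folds) if j != test_index
                 for y in fold]
        test = folds[test_index]
        # Stand-in "model": always predict the most frequent training value.
        predicted = Counter(train).most_common(1)[0][0]
        scores.append(sum(y == predicted for y in test) / len(test))
    return scores


labels = ["ok"] * 6 + ["fail"] * 2  # stand-in for the target variable
scores = k_fold_scores(labels, 4)
average = sum(scores) / len(scores)
print(scores, average)
```

The average over the folds corresponds to the kind of value shown in the results synthesis, while the individual scores correspond to the per-fold results.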

Parameters:

The learning algorithm and the number of folds are chosen in this dialog box:


The sample size is calculated from the size of the database and the number of folds.
An output directory can be specified, in which all the intermediate networks learnt from the folds are saved with their corresponding databases.

Results:

Network's targeted performance is displayed in this window:


The first panel "results synthesis" displays the averages of the values obtained for each network:
  • global precision
  • relative Gini value
  • relative Lift value
  • confusion matrices: for occurrences, for reliability and for precision
The node frequency array indicates how often a node appears in any network built upon a fold (whether or not it is directly connected to the target node).
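
The three confusion matrices can be derived from the same counts, as the Python sketch below shows. The definitions used here are assumptions, since this page does not define them: reliability normalizes each count by the total of its predicted value, precision normalizes it by the total of its actual value, and global precision is taken as the overall share of correct predictions. The function name `confusion_matrices` is hypothetical.

```python
def confusion_matrices(pairs, states):
    """Build the three matrices from (actual, predicted) pairs.

    Every matrix is a dict keyed by (predicted, actual).
    """
    occurrences = {(p, a): 0 for p in states for a in states}
    for actual, predicted in pairs:
        occurrences[(predicted, actual)] += 1
    predicted_totals = {p: sum(occurrences[(p, a)] for a in states)
                        for p in states}
    actual_totals = {a: sum(occurrences[(p, a)] for p in states)
                     for a in states}
    # Assumed definition: share of each predicted value that is correct.
    reliability = {
        (p, a): (n / predicted_totals[p] if predicted_totals[p] else 0.0)
        for (p, a), n in occurrences.items()
    }
    # Assumed definition: share of each actual value that is recovered.
    precision = {
        (p, a): (n / actual_totals[a] if actual_totals[a] else 0.0)
        for (p, a), n in occurrences.items()
    }
    return occurrences, reliability, precision


pairs = [("yes", "yes")] * 3 + [("no", "yes")] + [("no", "no")] * 4
occurrences, reliability, precision = confusion_matrices(pairs, ["yes", "no"])
global_precision = sum(occurrences[(s, s)] for s in ["yes", "no"]) / len(pairs)
print(global_precision)
```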

The Global report button (in the synthesis panel) displays the cross validation synthesis report.

The tabs contain the targeted performance result of each network learnt on the folds:


Global analysis report:

Once all the networks are learnt, the following report is generated:


The report is built on the same template as the global targeted evaluation report, except that it summarizes all values for each index and each matrix calculated for each fold.

The rest of the report contains the nodes frequencies:


The last part of the report contains a structural comparison of the reference network with the generated networks:
 
The structure of this report is identical to the one generated for the arc confidence analysis.

This report can be saved in a HTML-format file and can also be printed. Two other options exist: displaying graphs and extracting the network.

Graphs:

The Graphs button in the report displays the graphical structure comparator. With this tool, the data contained in the reports can be viewed and interpreted easily.

Extracting the network:

The Network extraction button in the report displays the network extraction tool. This tool builds a network from any structure, based on arc frequency thresholds.

Structure graphical comparator This tool can be accessed from the results of three different analyses. Its purpose is to compare a "reference" network with another one or with a group of other ones (or their equivalence classes). It creates a colored graphical structure (that is not a Bayesian network) that represents all the differences between the networks.

The patterns matched are arcs, edges and V-Structures.

Toolbar:

The window toolbar contains ten buttons: four for navigation and six for modifying the display.
  1. Navigation bar
    • displays synthesis structure
    • displays reference structure
    • displays the previous structure (with regards to current displayed)
    • displays the next structure (with regards to current displayed)
    Navigation through the structures follows this order:
    1. Synthesis structure
    2. Reference structure
    3. 1st comparison structure
    4. 2nd comparison structure
    5. ...
  2. Display bar
    • Zoom in
    • Zoom out
    • Display structure with default zoom
    • Adjust zoom to window size
    • Rotate the structure to the left
    • Rotate the structure to the right
Pressing the P key launches the automatic layout algorithm on the graph.

Popup menu:

The popup menu allows displaying node comments and copying the structure.

Synthesis structure:


This structure summarizes all the differences between the reference Bayesian network and the comparison structures.

The color code is:
  • black: Arc or edge that exists in the reference and in at least one comparison network
  • blue: Arc or edge that does not exist in the reference network but that exists in at least one comparison network
  • red: Arc or edge that exists in the reference network but that does not appear in any comparison network
An arc is displayed with an arrow, an edge as a simple line and a V-Structure as a portion of a circle.

The thickness of arcs, edges and V-Structures is directly proportional to their frequency.

The nodes can be moved in the drawing area in the standard way (click and drag). When a node is moved in any structure, it is moved the same way in all the other structures that are not currently displayed.

Hints appear when the mouse is moved over an arc, an edge or a V-Structure. A hint contains the name of the object and indicates whether it has been added, removed or left unchanged with regard to the reference structure. It also contains the frequencies:

The synthesis structure can be printed or saved in an image file by clicking the buttons.

Reference structure:


The reference structure is the initial Bayesian network (or its equivalence class) that is used as the basis for the comparison. The V-Structures of the reference are also displayed.

Comparison structures:



Each network obtained from the cross validation is displayed and numbered from zero. The V-Structures of the network or of its equivalence class are also displayed.

Since a comparison structure can represent several identical networks, the total number of networks included in this manner is indicated in the picture caption. The frequency equals the number of networks represented by the comparison structure divided by the total number of networks produced through the cross validation.
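
The grouping of identical networks into comparison structures, and the resulting frequency, can be sketched as follows. The representation of a network as a set of arcs and the name `comparison_structures` are assumptions made for the example; the sketch numbers the structures from zero in order of first appearance, which may differ from the order used by the tool.

```python
from collections import Counter


def comparison_structures(networks):
    """Group identical arc sets and return (id, structure, count, frequency)."""
    counts = Counter(frozenset(arcs) for arcs in networks)
    total = len(networks)
    # Counter keeps first-appearance order; identifiers start at zero.
    return [(index, structure, count, count / total)
            for index, (structure, count) in enumerate(counts.items())]


networks = [
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B")},
]
table = comparison_structures(networks)
for index, structure, count, frequency in table:
    print(index, sorted(structure), count, frequency)
```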

Network extractor A network can be extracted directly from the results of the arc confidence analysis or of the targeted cross validation. This option is available from the corresponding report.

Extracting the network:

The network extraction button displays this dialog box:


This tool automatically creates a network containing all the arcs that have a frequency greater than or equal to the indicated threshold. With this operation, only the strongest relationships between variables are kept. The network's conditional probability tables are learnt from the initial data.
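
The thresholding rule itself is simple enough to show in a short Python sketch. The dictionary of arc frequencies and the name `extract_arcs` are hypothetical; learning the conditional probability tables from the data is not covered here.

```python
def extract_arcs(arc_frequencies, threshold):
    """Keep the arcs whose frequency is greater than or equal to the threshold."""
    return sorted(arc for arc, frequency in arc_frequencies.items()
                  if frequency >= threshold)


arc_frequencies = {("A", "B"): 1.0, ("B", "C"): 0.6, ("C", "D"): 0.2}
kept = extract_arcs(arc_frequencies, 0.6)
print(kept)
```

An arc whose frequency equals the threshold is kept, which matches the "greater than or equal to" rule described above.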

Settings


Text fields replaced by spinners In the settings, the numerical parameters that were entered in text fields are now set with spinners. The values are automatically limited to their valid range, and the spinner arrows change each parameter by the most appropriate step.

Reorganization of database settings The database settings panel was reorganized into four panels, as in the following image:

Reorganization of the database settings

Comment font The font used by default to display all the comments is now customizable in the display settings.
Both the font type and its size can be modified.
As comments are in HTML, some font sizes are not correctly displayed.

Choice of the decimal separator for the export of numbers In the database saving format settings, it is possible to choose between the dot (.) and the comma (,) as the decimal separator for the numbers in the output files.
By default, the separator is chosen according to the user's country.

Option to write or not the BOM for UTF In the database saving format settings, it is possible to indicate whether or not to write the Byte Order Mark (BOM) at the beginning of the file when UTF is chosen as the encoding for output files.
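
The effect of these two export options on an output file can be illustrated in Python. This sketch is not BayesiaLab's export code: the function `export_rows`, the semicolon field separator and the choice of UTF-8 are assumptions made for the example.

```python
import codecs


def export_rows(rows, decimal_separator=".", write_bom=False):
    """Encode rows of numbers as semicolon-separated UTF-8 bytes."""
    lines = []
    for row in rows:
        cells = [repr(float(value)).replace(".", decimal_separator)
                 for value in row]
        lines.append(";".join(cells))
    data = "\n".join(lines).encode("utf-8")
    # The UTF-8 Byte Order Mark is the three bytes EF BB BF.
    return codecs.BOM_UTF8 + data if write_bom else data


with_bom = export_rows([[1.5, 0.25]], decimal_separator=",", write_bom=True)
without_bom = export_rows([[1.5, 0.25]])
print(with_bom)
print(without_bom)
```

Some programs rely on the BOM to detect the encoding while others display it as stray characters, which is one reason for making it optional.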

Settings of the various directories It is possible to configure the various user directories:
User directory settings

This wizard allows choosing the directories to be used for the graphs, the databases, the images and other files (reports, etc.). It is possible to use the same directory for all of them by selecting the corresponding option.

The option that fixes the paths prevents them from being updated when a different directory is used while opening or saving files.

Minimum size of the clusters for the clustering In the clustering settings, a field allows entering the minimum size allowed for a cluster. The value is given as a percentage of the database.
If the data clustering algorithm generates smaller clusters, they are removed.
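
The minimum size rule can be sketched as follows. The function `clusters_to_keep` is hypothetical, and the sketch only identifies which clusters pass the threshold; what happens to the rows of the removed clusters is not described here and is not modeled.

```python
from collections import Counter


def clusters_to_keep(assignments, min_percent):
    """Return the clusters that hold at least min_percent % of the rows."""
    counts = Counter(assignments)
    minimum_rows = len(assignments) * min_percent / 100
    return {cluster for cluster, n in counts.items() if n >= minimum_rows}


# Stand-in cluster assignment of 10 database rows.
assignments = ["c1"] * 6 + ["c2"] * 3 + ["c3"] * 1
kept = clusters_to_keep(assignments, 20)
print(sorted(kept))
```

With a minimum size of 20 %, the cluster "c3", which holds only 10 % of the rows, is the one that would be removed.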

Security


BayesiaLicenseServer version 3.0.X It is now possible to define a list of users authenticated by name and password for each installed license in order to restrict the access to these licenses.

A priority management mechanism has also been added. Each user has an associated priority. When a license has no free token left, a user with a high priority can take the token of a user with a lower priority.
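
The priority rule can be pictured with a short Python sketch. It is not the license server's code: the function `request_token` is hypothetical, and the sketch assumes that a larger number means a higher priority and that the token is taken from the holder with the lowest priority.

```python
def request_token(holders, capacity, user, priority):
    """Try to give `user` a token. Returns (granted, preempted_user).

    `holders` maps each current token holder to a priority and is
    updated in place.
    """
    if len(holders) < capacity:
        holders[user] = priority
        return True, None
    lowest = min(holders, key=holders.get)
    if holders[lowest] < priority:
        del holders[lowest]  # the lower-priority user loses the token
        holders[user] = priority
        return True, lowest
    return False, None


holders = {"alice": 1, "bob": 2}
first = request_token(holders, 2, "carol", 3)
second = request_token(holders, 2, "dave", 2)
print(first, second, holders)
```

In this example "carol" takes the token of "alice", while "dave" is refused because no current holder has a strictly lower priority.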

Network locking It is possible to prevent a network from being edited so that it can be used only in Validation mode.

A password-based locking mechanism forbids the editing of a locked network. It allows users to distribute their network to other users so that they can perform inference only, protecting the network from any modification.

When a network is locked, it is no longer possible to edit the nodes and their properties (the modifications are not taken into account), to add or delete arcs and nodes, to associate dictionaries and databases for learning, to modify classes, etc.

However, it is always possible to modify the costs associated with the nodes because they are used in Validation mode (adaptive questionnaire, not observable nodes, etc.).

Adding a lock to the network is done through the menu Network>Lock. When the network doesn't already have a lock, the following dialog box is displayed: 


Simply enter a password and confirm it. An indicator is then displayed in the status bar of the network. The network now has a lock, but it is still modifiable because it is not locked. To prevent editing, just click on the indicator; the icon then shows that the network is no longer editable. To be able to edit it again, simply click on the icon (or use the menu Network>Lock) and a dialog box asking for the password is displayed:


When the network is unlocked, the menu Network>Lock displays the following dialog box:

This dialog box allows the user to:
  • lock the network using the existing password
  • completely remove the lock
  • change the lock password.

Internationalizing


Chinese locale The complete translation of BayesiaLab in Chinese is integrated.

Spanish locale The complete translation of BayesiaLab in Spanish is integrated.