BayesiaLab 4.4 : new features
|
|
Data |
|
| The size of the temporary exchange file used for data import and association has been divided by ten. The CPU time needed for data filtering has also been divided by ten. | |
|
|
|
| When imputing continuous missing values, a value must be computed. If a database is associated with the network, this value is sampled from the distribution function of the values in the interval. |
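A minimal sketch of this kind of sampling, assuming that the imputed interval is known and that "sampling from the distribution function" means drawing uniformly among the values observed in that interval. The function name, the `(low, high]` bound convention and the error raised when the interval is empty are assumptions made for illustration, not BayesiaLab's actual behavior.

```python
import random


def impute_in_interval(observed_values, low, high, rng=random):
    """Draw a replacement value among the values observed inside (low, high].

    Picking uniformly among the observed values is equivalent to sampling
    from their empirical distribution. (Hypothetical helper.)
    """
    candidates = [v for v in observed_values if low < v <= high]
    if not candidates:
        # Assumption: the source does not say what happens in this case.
        raise ValueError("no observed value falls inside the interval")
    return rng.choice(candidates)


rng = random.Random(0)  # seeded generator so the draw is reproducible
data = [0.2, 0.4, 1.1, 1.8, 2.5, 3.0]
value = impute_in_interval(data, 1.0, 2.7, rng)
```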
|
Every discretization referenced in the import or association report is associated with a color. The same principle is applied for node aggregation. ![]() |
|
|
|
|
Several extra nodes coming from a database can be added to the network at the same time.![]() |
|
|
Import / associate columns highlighting when missing values exist |
While importing or associating a database, an icon appears in the column header if missing values exist. This occurs at the missing-value filtering / replacement step. If those missing values are filtered or replaced, the icon disappears from the column concerned. |
|
|
|
| When missing values are filtered or replaced at the data import or association step, the displayed statistics are immediately updated to reflect the current state of the database. | |
|
|
|
| While importing or associating data, missing values can be replaced simultaneously in several columns. The combo box now allows choosing a modality among all those available in each column; the selected modality is used to replace the missing values. | |
| When a multiple aggregation is requested, a progress bar is displayed. The multiple aggregation process can be aborted by clicking the dialog's close button. | |
|
|
|
| Database lines with missing values are ignored for Khi² computation in the occurrence matrix. Any weight present in the database is taken into account for Khi² computation in the occurrence matrix. |
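As an illustration, here is a minimal sketch of how an occurrence matrix could be accumulated under these two rules. The row layout, the use of `None` for a missing value and the function name are assumptions made for the example.

```python
from collections import defaultdict


def weighted_occurrences(rows, weights=None):
    """Accumulate weighted counts for each (a, b) pair of modalities.

    rows    : list of (a, b) pairs; None marks a missing value (assumption)
    weights : optional list of row weights; every row counts 1.0 if omitted
    """
    counts = defaultdict(float)
    for index, (a, b) in enumerate(rows):
        if a is None or b is None:
            continue  # lines with missing values are ignored
        counts[(a, b)] += 1.0 if weights is None else weights[index]
    return dict(counts)


rows = [("x", "y"), ("x", None), ("x", "y")]
matrix = weighted_occurrences(rows, weights=[2.0, 5.0, 0.5])
```

The second row carries the largest weight but contributes nothing, because one of its values is missing.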
|
|
|
|
| Nodes can be renamed by importing a dictionary containing the new name of each node. A dictionary template can be created with the node name export: this template contains the name of each node, and a new name can then be associated with each node in this dictionary. Node renaming is propagated to equations. |
|
|
|
|
| Modality renaming is allowed by using a dictionary. Only some modalities or all of them can be renamed. The modality to be renamed can be referenced either by mentioning its name or by mentioning the node or class name AND the modality. In the first case, all modalities in the network are renamed, whereas in the second, only the concerned node or class modalities are renamed. A dictionary template containing each modality preceded by its corresponding node name can be exported as a file. The new modality name can be associated with each modality. |
|
|
|
|
While manually discretizing a variable during data import, the density graph can be displayed in addition to the distribution function. This graph is computed using the batch-means method. The Switch view button allows switching between the density and distribution graphs. Discretization points can be placed on the graph. Red areas indicate parts of the graph that might not be correct. |
|
|
|
|
| Zooming on discretization graphs is now possible by selecting the corresponding area of the graph. Zooming is performed vertically on the distribution function and horizontally on the density graph. This allows discretization points to be positioned more precisely. Double-clicking on the graph zooms out. |
|
|
|
|
| During import or association, while variables are automatically discretized, the selected method may fail because no result can be found. In this case, a dialog box pops up and allows changing the discretization method. This happens each time a discretization fails for a variable; the choice made the first time can be saved and reused for the following failures. |
|
|
|
|
| A "Recent" menu item is available in the database import and association menus. This allows fast association of a database with a network, which is particularly useful for networks that are used daily. | |
|
|
|
| In the right part of the occurrence matrix graph, the variable independence probability is also displayed as a percentage. | |
|
|
|
| An example of discretized data exported from a continuous node is: <=0,5, <=2,7, >2,7. When the same data was imported, modalities used to be sorted alphabetically: >2,7, <=0,5, <=2,7. Now, the symbol <= sorts before >, and the numerical part is used afterwards to order modalities that share the same symbol. The result is now: <=0,5, <=2,7, >2,7 |
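The ordering rule can be sketched as a sort key: symbol first, numerical part second. This is only an illustration of the rule as stated; the function name is hypothetical and the decimal comma is converted to a point before parsing.

```python
def interval_sort_key(label):
    """Sort key: '<=' labels come before '>' labels, then by numeric bound."""
    if label.startswith("<="):
        rank, number = 0, label[2:]
    elif label.startswith(">"):
        rank, number = 1, label[1:]
    else:
        raise ValueError(f"unexpected interval label: {label!r}")
    # The exported labels use a decimal comma, e.g. "2,7".
    return rank, float(number.replace(",", "."))


labels = [">2,7", "<=0,5", "<=2,7"]
ordered = sorted(labels, key=interval_sort_key)
```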
|
|
|
|
| In validation mode, database generation uses the probability distributions of the nodes. The generation now also takes into account exact observations and soft evidence. | |
|
|
|
| When a variable is discretized or when a node is manually discretized, interval names are created from the interval bounds. Each interval is named after its upper bound preceded by <=. However, two intervals could receive the same name because of rounding, depending on the size of the interval. Rounding is now performed with the precision required to avoid identical interval names. The required precision is computed independently for each interval in order to avoid overly long interval names. |
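One way to picture a per-interval precision is the sketch below. It is an assumption about how such a rule could work, not the algorithm used by BayesiaLab: each upper bound gets the smallest number of decimals that keeps its rounded form different from that of its neighbouring bounds. The function name and the `max_decimals` cap are hypothetical.

```python
def interval_names(upper_bounds, max_decimals=10):
    """Name each interval '<=bound' with the fewest decimals that keep it
    distinguishable from the neighbouring bounds (illustrative rule)."""
    names = []
    for i, bound in enumerate(upper_bounds):
        neighbours = [upper_bounds[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(upper_bounds)]
        decimals = max_decimals
        for d in range(max_decimals + 1):
            text = f"{bound:.{d}f}"
            if all(text != f"{other:.{d}f}" for other in neighbours):
                decimals = d
                break
        names.append(f"<={bound:.{decimals}f}")
    return names


names = interval_names([0.4, 1.04, 1.06])
```

With these bounds, the first interval needs no decimals, while the two close bounds 1.04 and 1.06 each need one decimal to stay distinct.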
|
Analysis |
|
Target node performance evaluation can now be carried out for a single modality or for all modalities, as displayed below:
When all modalities are evaluated, the gain, lift and ROC curves are calculated for each modality and displayed in separate tabs in the result dialog. In the case described below, only two modalities exist, "Yes" and "No". Moreover, the quality of the curves has been improved for better readability. |
|
|
|
|
| When database lines are weighted, the weight values are taken into account for targeted evaluation. Occurrence matrices and lift / Gini / ROC curves are modified as well. | |
|
|
|
|
Gini, |
In the targeted evaluation, new indices are computed for each curve.

Gain curve: The Gini Index and the Relative Gini Index are computed from the curve and displayed at the top of the graph. The Gini Index is the area under the red curve and above the blue curve, divided by the area above the blue curve. However, the area under the optimal policy curve is smaller than the area above the blue line, so the Relative Gini Index is computed as the area under the red curve and above the blue curve, divided by the area under the optimal policy curve and above the blue curve. It is a more representative coefficient.

Lift curve: The Mean Lift and the Relative Lift Index are computed from the curve and displayed at the top of the graph. The Mean Lift is the mean of all the points of the curve. The Relative Lift Index is the area under the lift curve divided by the area under the lift curve of the optimal policy.

ROC curve: The ROC Index is computed from the curve and displayed at the top of the graph. It is the area under the ROC curve divided by the total area. |
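The two Gini indices can be illustrated with a small sketch. It assumes that the blue curve is the diagonal of a random policy on a unit square, that the red curve is the gain curve of the model, and that the optimal policy selects all positive cases first. The function names are hypothetical, ties between scores are not handled specially, and row weights are ignored.

```python
def gain_curve(scores, labels):
    """Gain curve points: share of positives captured vs. share of cases targeted."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    xs, ys, hits = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        hits += labels[i]
        xs.append(rank / len(scores))
        ys.append(hits / total_positives)
    return xs, ys


def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def gini_indices(scores, labels):
    xs, ys = gain_curve(scores, labels)
    model_area = trapezoid_area(xs, ys) - 0.5        # between model and diagonal
    positive_rate = sum(labels) / len(labels)
    optimal_area = (1 - positive_rate / 2) - 0.5     # between optimum and diagonal
    gini = model_area / 0.5                          # area above the diagonal is 0.5
    relative_gini = model_area / optimal_area
    return gini, relative_gini


# A model that ranks both positives first is optimal: relative Gini of 1.
gini, relative_gini = gini_indices([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

Even for this perfect ranking the plain Gini Index stays at 0.5, which is why the relative index is the more representative of the two.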
|
|
|
|
This analysis displays, on a two-dimensional graph, the marginal probabilities of a node for all possible combinations of evidence set on nodes. Pearson's standardized residual is also computed for each combination. These probabilities are displayed as colored rectangles that can be easily identified and compared with each other. The analysis is performed only on the nodes selected in the network.

Depending on the number of selected nodes, the dialog settings may vary slightly. The most complete version is displayed when three nodes are selected. The following version is the simple version: The selected nodes are displayed in the table and their positions in the graph are displayed on the left. Their respective positions can be modified by selecting the desired node and using the Up and Down buttons.

By default, variables are displayed in alternating horizontal and vertical positions. With one variable, the graph represents P(Horizontal0). With two variables, it shows P(Vertical0 | Horizontal0). With three variables, it displays P(Horizontal1 | Vertical0, Horizontal0). With four variables, it displays P(Vertical1 | Horizontal1, Vertical0, Horizontal0). And so forth.

If Horizontal Diagram is checked, the first variable is displayed in vertical position and all the others in horizontal position, with a separate chart for each horizontal variable that represents P(Vertical | Horizontal i). If Display P(Horizontal | Vertical) is checked, each chart represents P(Horizontal i | Vertical). The Structure Equivalent Example Number setting allows simulating a set of data in order to compute Pearson's standardized residuals.

The following image is a chart with three variables. The first variable is the horizontal variable Eyes, the second is the vertical variable Hair and the third is the horizontal variable Sex.
The horizontal and vertical cells represent the marginal probabilities of each variable's states without any evidence set. The central cells represent the conditional probabilities P(Eyes | Hair, Sex). The value of the Khi² test and the associated independence probability are shown at the top of the graph. For each cell, Pearson's standardized residual is computed as: Di = (ni - Ni) / SQRT(Ni). The Khi² test equals the sum of Di².

The result display panel can also be modified: The option Display Pearson's Standardized Residual toggles between the classic display, with colors corresponding to the states of the first horizontal variable, and the display using the color code of Pearson's standardized residual. The color code is as follows:
The option Resizable Graphic allows enlarging or reducing the graphic according to the window's size. If this option is unchecked, the graphic has a predefined constant size and scroll bars are displayed if necessary. There are two possibilities of separation between the cells of the graph:
Here is a part of charts that can be obtained according to the settings:
|
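The residual formula Di = (ni - Ni) / SQRT(Ni) can be sketched on a small two-way table. This example assumes that ni is the count of a cell and Ni the count expected under independence (row total times column total divided by the grand total); the function name and the sample counts are illustrative only.

```python
import math


def pearson_residuals(observed):
    """Return the matrix of residuals Di and their sum of squares.

    observed : list of rows of cell counts (ni)
    Expected counts Ni are computed under independence (assumption).
    """
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    residuals = []
    for row, row_total in zip(observed, row_totals):
        line = []
        for n, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            line.append((n - expected) / math.sqrt(expected))
        residuals.append(line)
    statistic = sum(d * d for line in residuals for d in line)
    return residuals, statistic


residuals, statistic = pearson_residuals([[10, 20], [20, 10]])
```

A negative residual marks a cell with fewer cases than expected and a positive one a cell with more, which is what the color code conveys.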
|
|
|
|
This report establishes the profile of the target node according to the selected criterion. The goal is to maximize or minimize one of the three available criteria by setting evidence sequentially on the other variables. The parameters can be modified in the following dialog box: One of these profile search criteria must be selected:
The search stops when the joint probability of the network reaches zero. This stop criterion can be modified by setting a maximum number of pieces of evidence and by changing the minimum joint probability allowed. Here is the result corresponding to the parameters above: ![]() |
|
|
|
|
|
This tool highlights the importance of each node with respect to the complete structure. Three kinds of node forces are computed:
You can use this tool to make all the nodes whose force is lower than the indicated value translucent.
|
|
|
|
|
| A new table is added to the relationship analysis report. The second table represents the node force analysis. For each node it displays:
|
|
|
|
|
|
Khi² independence probability in relationships analysis report |
If a database is associated with the current network, the Khi² independence probability of each relationship is computed and displayed in the relationships analysis report. The calculation includes weights if some are specified. |
|
|
|
| Mutual information between two nodes is added in the report for each relationship (displayed in picture above). |
|
|
|
|
| This report computes the total effect of each variable on the target node. We consider that the target variable is locally linear, and the total effect is the estimate of the derivative of the target with respect to this variable. The total effect represents the impact of a small modification of the "mean" of a variable on the "mean" of the target; it is the ratio of these two changes. The standardized total effect is also displayed. It corresponds to the total effect multiplied by the ratio of the standard deviation of the current variable to the standard deviation of the target. The mean of each node is computed as follows: if the node has values associated with its states, the mean is computed from them; otherwise, if the node is continuous, its mean is computed from the intervals, and if the node is discrete with integer or real states, the mean is computed from them. If the mean cannot be computed in any of these ways, a default set of values from 0 to the number of states minus one is used. Positive impacts are displayed in blue and negative ones in red. ![]() |
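A minimal sketch of the two quantities, assuming the derivative is estimated by a finite difference between the means before and after a small change. The function names and the numbers are hypothetical; how BayesiaLab actually perturbs the variable is not described here.

```python
def total_effect(var_mean_before, var_mean_after,
                 target_mean_before, target_mean_after):
    """Finite-difference estimate of d(target mean) / d(variable mean)."""
    return ((target_mean_after - target_mean_before)
            / (var_mean_after - var_mean_before))


def standardized_total_effect(effect, var_std, target_std):
    """Total effect multiplied by std(variable) / std(target)."""
    return effect * var_std / target_std


# Illustrative numbers: raising the variable mean by 0.5 raises the target mean by 0.2.
effect = total_effect(10.0, 10.5, 3.0, 3.2)
standardized = standardized_total_effect(effect, var_std=2.0, target_std=4.0)
```

Standardizing makes effects comparable between variables measured on different scales.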
|
|
|
|
| In neighborhood analysis mode, the number of nodes in the neighborhood is displayed in the graph's status bar. | |
|
|
|
|
Target and parameter sensitivity analysis on selected non-translucent nodes |
Target and parameter sensitivity analyses are now performed only on selected non-translucent nodes. |
|
|
|
| In variable clustering, resulting class names have been renamed from Cluster_X to [Factor_X] in order to avoid any confusion with data clustering. | |
|
|
|
| In variable clustering, the length of the lines in the dendrogram is inversely proportional to the strength of the relationship between two variable sets: the shorter the line, the stronger the relationship. | |
|
|
|
|
|
|
|
Graphical comparison between learning and test sets in global performance evaluation |
In network global performance evaluation, a new graph allows comparing learning set results with test set results:![]() ![]() |
|
|
|
Target analysis report items are reordered:![]() |
|
|
|
|
| Target analysis can now be computed even if the target is a soft evidence node. In all other analyses, any node can be a soft evidence node too. | |
|
|
|
| In order to improve computational performance, target-independent nodes are not included in the analysis report. Target-dependent nodes must be linked to the target (directly or indirectly) and unobserved. They can be soft evidence nodes. |
|
Inference |
|
|
Network nodes are sequentially observed according to each database line (except non-observable nodes and missing values). The joint probability is computed, and its likelihood is then compared with the likelihood of the disconnected network. The results are associated with the selected entry fields and stored in a file. |
|
|
|
|
|
Some networks are too complex for exact inference. The junction tree may be too big to be held in memory and the inference time can be extremely long. In this case, when the user asks to switch to inference mode, a dialog box proposes several options:
|
|
|
|
|
| Whenever a network is associated with a database, this database can be used as the data source for each batch command. This comes in addition to the use of a text or JDBC database source. | |
|
|
|
| In batch inference, the expected value is computed for each non-observable node and saved in a file. This expected value is computed from the values associated with the node modalities if they exist, or from the average of each interval for a continuous node, or from the real or integer modalities for a discrete node. If none of these values is available, no expected value is computed. | |
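The fallback order described above can be sketched as follows. The function signature is hypothetical, and the average of an interval is taken here as its arithmetic midpoint, which is an assumption.

```python
def expected_value(probabilities, state_values=None, intervals=None, state_names=None):
    """Expected value of a node, or None when no numeric values are available.

    state_values : values explicitly associated with the modalities, if any
    intervals    : (low, high) bounds of a continuous node, if any
    state_names  : modality names of a discrete node, parsed as numbers if possible
    """
    if state_values is not None:
        values = state_values
    elif intervals is not None:
        # Assumption: the midpoint stands in for the average of each interval.
        values = [(low + high) / 2 for low, high in intervals]
    elif state_names is not None:
        try:
            values = [float(name) for name in state_names]
        except ValueError:
            return None  # non-numeric modalities: no expected value
    else:
        return None
    return sum(p * v for p, v in zip(probabilities, values))


continuous = expected_value([0.5, 0.5], intervals=[(0, 2), (2, 4)])
symbolic = expected_value([0.5, 0.5], state_names=["Yes", "No"])
```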
|
|
|
It is now possible to keep a record of several modalities of the same variable in a dynamic Bayesian network. The modality choice is made in the following dialog box:![]() |
|
Learning |
|
Tree Augmented Naive Bayes is a partially predefined structure that relaxes the strong conditional independence constraint of naive Bayes, which assumes that knowing the value of the target makes each node independent of the others. This architecture consists of a naive architecture on which a maximum spanning tree is learned. Its prediction accuracy is better than that of the naive architecture but not as good as that of Augmented Naive Bayes; however, this algorithm is much faster. |
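The maximum spanning tree step can be illustrated with Kruskal's algorithm on a weighted graph over the non-target nodes. This is a generic sketch, not BayesiaLab's implementation: the edge weights (for instance a measure of dependence between two nodes given the target) and the node names are assumptions.

```python
def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm, taking the heaviest edges first.

    nodes   : iterable of node names
    weights : dict mapping an edge (a, b) to its weight
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components: keep it
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Hypothetical dependence scores between three non-target nodes.
scores = {("Age", "Income"): 0.9, ("Income", "Region"): 0.5, ("Age", "Region"): 0.1}
tree = maximum_spanning_tree(["Age", "Income", "Region"], scores)
```

In a tree-augmented structure, the edges of this tree would be added on top of the arcs linking the target to every node.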
|
|
|
|
In multiple clustering settings dialog box, an option allows displaying the reports at the end of the segmentation of each cluster: ![]() |
|
|
|
|
| In order to remain consistent with variable clustering, which produces classes named [Factor_X], multiple clustering now generates nodes named [Factor_X] instead of Cluster_X. | |
|
|
|
|
Network and database backup at the end of the multiple clustering |
In the output parameter part, the multiple clustering wizard allows selecting the folder where the generated networks will be saved (one network per class [Factor_X] and the final network covering all latent variables) and adding all the nodes of the initial network to the final network. Moreover, the wizard asks whether the user wants to save the long names of modalities and the continuous values in the final database. |
|
|
|
| If the initial database has a test set, this set is transferred into the final database and missing values imputation is made on the new variables [Factor_X]. Finally, the final database is saved in the target folder. | |
Monitors |
|
| The node score computation displayed in monitors has changed. When a node has values associated with its modalities, the resulting value, which is a function of the node's probability distribution, is displayed. If a continuous node has no associated values, the average of each interval is used (computed from the data if a database is associated, otherwise the arithmetic average). If it is a discrete node with integer or real values, these values are used.
|
|
|
|
|
|
Monitor restriction for adaptive questionnaire and display according to the target |
Now, when an adaptive questionnaire is requested, the monitors of translucent nodes are not displayed. Similarly, when the monitors are sorted according to the target or a target modality, the monitors of translucent nodes are not used. |
|
|
|
If the target modality monitor is displayed, then the icon is displayed close to the modality in the monitor: ![]() |
|
|
|
|
| During an adaptive questionnaire, when the user observes a monitor modality, all monitors are recomputed and displayed again. Instead of displaying the last added monitor, the panel scrolls to show the first monitor at the top left. The user should answer this monitor first. Similarly, when the monitors are sorted according to the target or a target modality, the panel scrolls to display the first monitor at the top left. |
|
Interface |
|
| All comments are now in HTML (3.2). A common editor allows creating and editing comments for the nodes, the arcs and the network. The following editor allows creating complex comments in HTML. It can be accessed through the contextual menus of the arcs, the nodes and the network. It is also integrated into the node editor. The File menu allows:
With the buttons of the toolbar, it is possible to change, for the current selection, the font, the text alignment, the bold, italic and underlined attributes and the color of the foreground and background. According to the position of the cursor, the contextual menu, accessible with a right click, allows: |
|
|
|
|
In the node editor, two buttons allow moving the selected modality up or down. The current modality table is automatically rebuilt. The order of the modality long names and associated values changes at the same time. The probability tables of child nodes are recalculated when the change is validated. ![]() |
|
|
|
|
The order of parent nodes can be changed in the Probability Distribution tab of the node editor by clicking on a parent header and dragging it to the desired location. The probability table is automatically rebuilt when the header is released at this destination. ![]() |
|
|
|
|
Now, when a node or an arc is translucent, its associated comment, if shown, is also translucent. ![]() |
|
|
|
|
It is now possible to rename a node directly in the node editor. When a node is renamed, the pending modifications are automatically saved first. ![]() ![]() The new node name must be different from the names of all other nodes. |
|
|
|
|
|
Copy and transfer of exact numerical values from table to table |
When the content of a table in the node editor is copied and pasted into another table, the exact numerical values are kept instead of the rounded values displayed in the cells. |
|
|
|
| The new item Invert All Selection in Edit menu allows inverting all the selection in the network, both nodes and arcs. There are also new items for inverting only node selection and only arc selection as well. |
|
|
|
|
The database weight sum is now in the database tooltip:![]() |
|
|
|
|
| A discrete node can have modalities with integer or real values. Depending upon cases, these modalities can be used as integer or real values in the equations. A modality generator is now in the node editor:
|
|
|
|
|
It is possible to show or hide the arc tags independently of the node tags thanks to the button added in the network tool bar:![]() |
|
|
|
|
| The initial color table now offers softer colors. | |
|
|
|
When nodes and arcs are selected in a network, the corresponding node and arc numbers are displayed in the status bar of the graph window: ![]() ![]() |
|
Formulas |
|
| Discrete variables whose modalities are all real values can be used as real variables in equations. | |
|
|
|
| When the probability table of a continuous node is generated by an equation, some generated values may be out of range. In this case, a dialog box asks the user whether the node bounds should be automatically enlarged to include these values. The choice is proposed each time this happens, unless the user has selected the option to do it automatically for all values. | |
|
|
|
| A new function Switch is introduced in the special function list. It efficiently replaces a sequence of nested If functions:
Switch(s, ki, vi, ..., d)
Description: Branch instruction. According to the value ki that s takes, the corresponding value vi is returned. If no ki matches, the default value d is returned.
Number of Parameters: >= 4
Parameter type: (all, all, all, ..., all), but the parameters s and ki must have the same type or comparable types (integer and real, for example), and the same holds for the parameters vi and d.
Result type: The common type of the parameters vi and d. If one of them is real and the other integer, the result type is real.
Example: The previous probability distributions correspond to:
P(?Opinion? | ?Note?) = where Opinion is a Discrete variable that has 5 states (Very weak, Weak, Fair, Good, Very good) and Note is a Discrete variable with 21 integer states from 0 to 20. |
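The semantics of Switch can be emulated in Python to make the branching rule concrete. This is an illustrative stand-in, not the BayesiaLab formula engine: the function name `switch` and the argument check are assumptions, and the call at the end uses hypothetical keys and values rather than the Opinion / Note mapping of the example.

```python
def switch(s, *args):
    """Emulate Switch(s, k1, v1, ..., d): return the v matching s, else d."""
    # At least one (k, v) pair plus the default, so 4 parameters in total.
    if len(args) < 3 or len(args) % 2 == 0:
        raise TypeError("expected s, one or more (k, v) pairs, and a default")
    *pairs, default = args
    for key, value in zip(pairs[0::2], pairs[1::2]):
        if s == key:
            return value
    return default


# Equivalent nested form: If(s == 1, "a", If(s == 2, "b", "z"))
result = switch(2, 1, "a", 2, "b", "z")
fallback = switch(9, 1, "a", "z")
```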
|
|
|
|
| When a node that has an equation is copy-pasted in the same network, the node name is changed to avoid duplicates. In this case, the old node names that are referenced by the equations are also renamed. It is not necessary to do it manually. | |
|
|
|
| Equations entered in the equation editor now keep the formatting and indentation given by the user. This formatting is stored in the saved file and restored when the network is opened. | |
|
|
|
| User-defined functions that implement the dedicated Java interface can now take a variable number of parameters. The number of parameters is determined when the function is used in an equation. For example, a Sum function can be defined with a variable number of parameters in order to add as many values as needed. |
|
Settings |
|
| In database settings, a new option named Minimum Interval Size in Database Size Percent for KMeans discretization allows indicating the minimal interval size found by KMeans discretization to keep during data importing. | |
|
|
|
| The database settings layout is reorganized and the text fields for parameter definition are replaced by formatted fields whose values can be changed thanks to associated buttons (Spinner). The default interval number for automatic and manual KMeans discretization is 3. The weight normalization option is also integrated in the layout. ![]() |
|
|
|
|
| The learning settings are reorganized with a sub tab for association discovery. The text fields are replaced by Spinners. |
|
|
|
|
|
Clustering settings for the maximum drift and the minimum purity |
It is now possible to change two parameters associated with the data clustering:
|
|
|
|
| The slider that adjusts the influence of structural complexity on networks during learning now has a logarithmic scale over ]0, 150]. | |
Security |
|
|
Automatic uninstalling of BayesiaLab and BayesiaLicenseServer licenses |
It is now possible to automatically uninstall the BayesiaLab and BayesiaLicenseServer licenses from our server in order to reinstall the software on another computer. The machine where the software is installed must have an Internet connection. When the software (or the license, for BayesiaLicenseServer) is uninstalled, a connection validates the uninstallation with our server. If the server validates the uninstallation, the license can be reused on another computer. The number of uninstallations is limited to two per 12-month period. |
|
|
|
| BayesiaLicenseServer can now record all transactions in a log file. This log file describes the transactions between the client applications and BLS. The following information is saved for each transaction: ID, date, time, name and IP address of the host, origin (server or client) and type (open, close, invalid) of the transaction, name, edition and version of the software corresponding to the license, user group, client ID, message associated with the transaction, session length and transaction result. |
|
|
|
|
Thanks to the new BayesiaLicenseServer HCI, the administrator can manage the connections one by one. The administrator can also send messages to the clients connected to BayesiaLicenseServer. ![]() |
|
|
|
|
| The BayesiaLab connection to BayesiaLicenseServer has been enhanced to avoid losing the connection during brief network interruptions. If BayesiaLicenseServer loses the connection, it keeps the token in use, and BayesiaLab tries to reconnect to recover that token or, if that is not possible, to take another one. If this attempt fails, BayesiaLab warns that it is about to close and offers to save the user's work. | |
|
|
|












simulated data are very significantly overrepresented (D > 4)
simulated data are significantly overrepresented (D > 2)
simulated data are not significantly overrepresented (D > 0)
simulated data are not significantly underrepresented (D < 0)
simulated data are significantly underrepresented (D < -2)
simulated data are very significantly underrepresented (D < -4)
absence of simulated data









Go back to the previous threshold according to the selected force
Go to the next threshold according to the selected force
Computes only the entering force of the nodes and displays if greater than the given threshold
Computes the global force of the nodes and displays if greater than the given threshold
Computes only the outgoing force of the nodes and displays if greater than the given threshold























