Associate Dictionary
Context
- This menu item allows you to define the properties of the active Bayesian network using text files.
- These properties concern arcs, nodes, and states:
Dictionary File Structures
| Category | Dictionary | Line syntax |
|---|---|---|
| Arc | Arcs | Name of the arc’s starting node or class; `->`, `<-`, or `--` (both possible orientations); name of the arc’s ending node or class; Equal, Space, or Tab; `true` for an added arc or `false` for a removed arc. The last occurrence is always chosen. |
| Arc | Forbidden Arcs | Name of the arc’s starting node or class; `->`, `<-`, or `--` (both possible orientations); name of the arc’s ending node or class. |
| Arc | Comments | Name of the arc’s starting node or class; `->`, `<-`, or `--` (both possible orientations); name of the arc’s ending node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (in HTML or not). The last occurrence is always chosen. |
| Arc | Colors | Name of the arc’s starting node or class; `->`, `<-`, or `--` (both possible orientations); name of the arc’s ending node or class; Equal, Space, or Tab; color. The color is a Red Green Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green is 00FF00, yellow is FFFF00, blue is 0000FF, and pink is FFC0FF. The last occurrence is always chosen. |
| Arc | Fixed Arcs | Name of the arc’s starting node or class; `->`, `<-`, or `--` (both possible orientations); name of the arc’s ending node or class; Equal, Space, or Tab; `true` for a fixed arc or `false` for a non-fixed arc. The last occurrence is always chosen. |
| Node | Node Renaming | Name of the node; Equal, Space, or Tab; new node name. The new name must be valid (different from `t` or `T`, and without `?`). A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Comments | Name of the node or class; Equal, Space, or Tab; comment. The comment can be any character string without line breaks (in HTML or not). A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Classes | Name of the node; Equal, Space, or Tab; name of the class. The class can be any character string. A node present several times will be associated with different classes. |
| Node | Colors | Name of a node or class; Equal, Space, or Tab; color. The color is a Red Green Blue value with 8 bits per channel, written in hexadecimal (web format). For example, green is 00FF00, yellow is FFFF00, blue is 0000FF, and pink is FFC0FF. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Images | Name of a node or class; Equal, Space, or Tab; path to the image, relative to the directory that contains the dictionary. The image path must be a valid relative path or an empty string. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Costs | Name of the node; Equal, Space, or Tab; value of the cost, or empty to make the node unobservable. The cost is an empty string or a real number greater than or equal to 1. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Temporal Indices | Name of the node; Equal, Space, or Tab; value of the index, or empty to delete an existing index. The index is an integer. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Local Structural Coefficients | Name of the node; Equal, Space, or Tab; value of the local structural coefficient, or empty to reset it to the default value 1. The local structural coefficient is an empty string or a real number greater than 0. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | State Virtual Numbers | Name of the node; Equal, Space, or Tab; virtual number of states, or empty to delete an existing number. The state virtual number is an empty string or an integer greater than or equal to 2. A node can be present only once; otherwise, the last occurrence is chosen. |
| Node | Locations | Name of the node; Equal, Space, or Tab; position. The position consists of two real numbers separated by a Space: the x-coordinate of the node, then the y-coordinate. A node can be present only once; otherwise, the last occurrence is chosen. |
| State | State Renaming | Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; new state name. Alternatively, to rename the state for all nodes: state name; Equal, Space, or Tab; new state name. The new name must be a valid state name. A state can be present only once; otherwise, the last occurrence is chosen. |
| State | State Values | Name of the node or class, dot (.), name of the state; Space or Tab; real value. Alternatively, to associate a value with a state regardless of the node: name of the state; Equal, Space, or Tab; real value. The value is a real number. A state can be present only once; otherwise, the last occurrence is chosen. |
| State | State Long Names | Name of the node or class, dot (.), name of the state; Equal, Space, or Tab; long name. Alternatively, to associate a long name with a state regardless of the node: name of the state; Equal, Space, or Tab; long name. The long name is a string. A state can be present only once; otherwise, the last occurrence is chosen. |
| State | Filtered States | Name of the node or class, dot (.), name of the filtered state. Alternatively, to set the filter property on a state regardless of the node: name of the filtered state alone. A state can be present only once; otherwise, the last occurrence is chosen. |
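The web-format color values used by the Colors dictionaries can be derived from three 8-bit channel values. The following is a minimal sketch in Python; the helper name `rgb_to_hex` is hypothetical and is not part of BayesiaLab.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Format three 8-bit channels as a 6-digit hexadecimal color (web format)."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range 0..255: {channel}")
    # Two uppercase hexadecimal digits per channel, in Red Green Blue order.
    return f"{red:02X}{green:02X}{blue:02X}"


# The examples given in the table:
print(rgb_to_hex(0, 255, 0))      # green
print(rgb_to_hex(255, 255, 0))    # yellow
print(rgb_to_hex(255, 192, 255))  # pink
```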
As indicated by the syntax, the name of the node, class, or state in the text file cannot contain equal, space, or tab characters. If the node names contain such characters in the network, those characters must be preceded by a \ (backslash) in the text file. For example, the node named Visit Asia will be written Visit\ Asia in the file.
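The escaping rule above can be applied automatically when a dictionary is produced by a script. The sketch below is only an illustration: `escape_name` and `dictionary_line` are hypothetical helper names, and the handling of names that already contain a backslash is not described here, so it is left out.

```python
def escape_name(name: str) -> str:
    """Prefix each equal sign, space, and tab in a name with a backslash."""
    escaped = []
    for character in name:
        if character in "= \t":
            escaped.append("\\")
        escaped.append(character)
    return "".join(escaped)


def dictionary_line(name: str, value: str) -> str:
    """Build one 'name=value' line, using Equal as the separator (assumed choice)."""
    return f"{escape_name(name)}={value}"


print(escape_name("Visit Asia"))                # Visit\ Asia
print(dictionary_line("Visit Asia", "00FF00"))  # Visit\ Asia=00FF00
```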
To differentiate a name that is the same for a class, a node, or a state, add the suffix “c” for a class, “n” for a node, and “s” for a state.
If your network contains non-ASCII characters, you must save your dictionaries with UTF-8 (Unicode) encoding. For example, in MS Excel, choose “Save As” and select “Text Unicode (*.txt)” as the file type. In Notepad, choose “Save As” and select “UTF-8” as the encoding. If your file contains only ASCII characters, you can use the platform's default encoding, but UTF-8 is strongly recommended because the resulting dictionary files do not depend on the user’s platform. For example, a dictionary containing Chinese characters can be read by a user in Germany without any problem, regardless of the platform used. If you are not sure how to save a file with UTF-8 encoding, export a dictionary with BayesiaLab, modify and save it with any text editor, and load it back into BayesiaLab.
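When a dictionary is written by a script rather than by hand, the encoding can be set explicitly. The following Python sketch assumes names that are already escaped and uses Equal as the separator; the function name `write_dictionary`, the file name, and the sample node names are hypothetical.

```python
def write_dictionary(path, entries):
    """Write (name, value) pairs as 'name=value' lines in UTF-8.

    Names are assumed to be already escaped for the dictionary syntax.
    """
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for name, value in entries:
            handle.write(f"{name}={value}\n")


# Hypothetical node renaming dictionary with non-ASCII characters.
write_dictionary("node_names.txt", [("Age", "Âge"), ("Smoker", "Fumée")])
```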
Export Dictionary
This menu item exports the different kinds of dictionaries to text files.
The dictionary files are saved with UTF-8 (Unicode) encoding to support any character in any language. The Save Format option in the Import and Associate preferences controls whether the BOM (Byte Order Mark) is written at the beginning of the file. The BOM increases compatibility with Microsoft applications. On other platforms, such as Unix, Linux, or macOS, the BOM is not necessary and, in some cases, is treated as extra characters at the beginning of the file.
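Scripts that read exported dictionaries may need to cope with files saved with or without the BOM. As a sketch, Python's standard `utf-8-sig` codec writes the BOM when encoding and drops a leading BOM when decoding; the function names below are hypothetical.

```python
import codecs


def save_text(path, text, with_bom):
    """Save text as UTF-8, optionally preceded by the byte order mark."""
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="\n") as handle:
        handle.write(text)


def has_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def load_text(path):
    """Read UTF-8 text, ignoring a leading BOM if there is one."""
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        return handle.read()
```

Reading with `utf-8-sig` works for both variants, so the reader does not need to know which Save Format setting was used for the export.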
Associate an Evidence Scenario File
This menu item allows associating an evidence scenario file with the network.
Export an Evidence Scenario File
This menu item exports the evidence scenario file associated with the network to a text file.
Generate Data
This menu item generates a set of n cases according to the probability distribution described by the active Bayesian network. The data can be generated as an internal database or written to a file. You can also specify the missing-value rate and, if the database is written to a file, use the long names of the states. A database with test examples can be generated by specifying the desired percentage.
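To illustrate what generating cases from a network's probability distribution means, here is a minimal Python sketch of forward sampling on a hypothetical two-node network A -> B. The probabilities, the names, and the choice of removing each value independently at the missing-value rate are all assumptions made for the example; they do not describe BayesiaLab's internal implementation.

```python
import random

# Hypothetical network A -> B: marginal of A and conditional table of B given A.
P_A = {"a0": 0.3, "a1": 0.7}
P_B_GIVEN_A = {
    "a0": {"b0": 0.9, "b1": 0.1},
    "a1": {"b0": 0.2, "b1": 0.8},
}


def draw(distribution, rng):
    """Draw one state according to a {state: probability} table."""
    states = list(distribution)
    weights = [distribution[state] for state in states]
    return rng.choices(states, weights=weights, k=1)[0]


def generate_cases(n, missing_rate, rng):
    """Sample n cases parent-first, then blank values at the given rate."""
    cases = []
    for _ in range(n):
        a = draw(P_A, rng)
        b = draw(P_B_GIVEN_A[a], rng)
        row = {"A": a, "B": b}
        # Assumption: each value is removed independently of the others.
        row = {
            node: (None if rng.random() < missing_rate else value)
            for node, value in row.items()
        }
        cases.append(row)
    return cases


cases = generate_cases(5, 0.1, random.Random(0))
```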
State long names can be saved instead of state names. If the user wants to save continuous values, the numerical values are created by randomly generating a value in each corresponding interval. If the data are generated in validation mode, then the evidence is taken into account.
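The creation of continuous values from discretized states can be pictured as follows. This Python sketch assumes a uniform draw within the interval, which is not stated above, and uses a hypothetical discretization of a node named Age.

```python
import random

# Hypothetical discretization: state name -> (lower bound, upper bound).
AGE_INTERVALS = {
    "<=30": (18.0, 30.0),
    "<=50": (30.0, 50.0),
    ">50": (50.0, 90.0),
}


def numeric_value(state, intervals, rng):
    """Draw a numerical value inside the interval of the given state.

    Assumption: the value is drawn uniformly within the interval.
    """
    low, high = intervals[state]
    return rng.uniform(low, high)


rng = random.Random(0)
value = numeric_value("<=50", AGE_INTERVALS, rng)  # some value between 30 and 50
```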
Save Data
This menu item allows saving the database associated with the network, including the results of the various pre-processing steps carried out within the Data Import Wizard. If the imported database still contains missing values and the selected algorithm to process missing values is one of the two imputation algorithms (static or dynamic), this option allows you to complete the imputation tasks by saving a database without any missing values. Each missing value is replaced by taking into account its conditional probability distribution, returned by the Bayesian network, given all known values of the row. If the database contains test and learning data, the user can choose which kind of data to save: only learning data, only test data, or the entire dataset. It is also possible to save only the data corresponding to the selected nodes.
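The idea of replacing a missing value using its conditional distribution given the known values of the row can be sketched on a hypothetical two-node network A -> B. The tables and function names are invented for the example, and drawing a state from the conditional distribution (rather than, say, taking the most probable state) is an assumption.

```python
import random

# Hypothetical network A -> B.
P_A = {"a0": 0.3, "a1": 0.7}
P_B_GIVEN_A = {
    "a0": {"b0": 0.9, "b1": 0.1},
    "a1": {"b0": 0.2, "b1": 0.8},
}


def draw(distribution, rng):
    """Draw one state according to a {state: probability} table."""
    states = list(distribution)
    weights = [distribution[state] for state in states]
    return rng.choices(states, weights=weights, k=1)[0]


def posterior_a_given_b(b):
    """Bayes' rule: P(A | B=b) is proportional to P(A) * P(B=b | A)."""
    joint = {a: P_A[a] * P_B_GIVEN_A[a][b] for a in P_A}
    total = sum(joint.values())
    return {a: p / total for a, p in joint.items()}


def impute_row(row, rng):
    """Return a copy of the row with missing values (None) filled in."""
    a, b = row["A"], row["B"]
    if a is None:
        # With B known, use the posterior; with B also missing, use the prior.
        a = draw(posterior_a_given_b(b) if b is not None else P_A, rng)
    if b is None:
        b = draw(P_B_GIVEN_A[a], rng)
    return {"A": a, "B": b}


completed = impute_row({"A": None, "B": "b1"}, random.Random(0))
```

In this sketch, known values are never modified; only the entries equal to `None` are filled.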
State long names can be saved instead of state names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and the option is checked, the numerical values are created by randomly generating a value in each corresponding interval. If the database contains weights, they are saved as the first column in the output file.
Imputation
This menu item imputes the missing values of the associated database according to the mode selected in the following dialog box:
The data will be saved in the specified file, and state long names will be used as specified. If the database contains test and learning data, the user can choose which kind of data to perform imputation on: only learning data, only test data, or the entire dataset. State long names can be saved instead of state names. The numerical values in the database associated with the continuous nodes can be saved if they exist. If there are no numerical values associated with the database and the option is checked, the numerical values are created by randomly generating a value in each corresponding interval. However, if there are numerical values in the database, the missing numerical values are generated from the distribution function of each interval. If the database contains weights, they are saved as the first column in the output file.