Chapter 1: Creating a Bayesian network (Modeling mode)
BayesiaLab's graphical interface allows for the intuitive, manual development of a Bayesian network. To access this functionality, either create a new graph or open an existing graph for editing.
Once a worksheet is open, new icons appear in the toolbar:
- Create a new graph
- Open a graph (“xbl”, “bif” or “net” format)
- Save a graph
- Print a graph
- Cut
- Copy
- Paste
- Undo the last action
- Redo the last action
- Find a node
- Zoom in
- Zoom out
- Default view
- Adjust position on the page
- Select a node or an arc
- Create a node
- Create a constraint node
- Create a utility node
- Create a decision node
- Create an arc
- Delete an arc or a node
1.1 Creating nodes and arcs
First, we'll consider the case of a lung disease specialist who wants to model his or her (very simplified) knowledge concerning the diagnosis of cancer or tuberculosis.
The development of a Bayesian network starts from the variables, collected beforehand, that are needed to describe the domain under consideration. Each of these variables corresponds to a node in the graph.
Using the "create a node" button (or by holding down the "N" key while performing a left click), our specialist adds three nodes to the worksheet: one representing the age of the patient, one representing whether the patient smokes, and one representing whether he or she has cancer.
A warning icon is displayed at the top left of a node to indicate that the associated probabilities have not been checked or are incorrect. To display the warning message, move the mouse cursor over the node while holding down the "W" key.
The specialist then defines the probabilistic relations between these variables. These relations are shown by arcs, which are created with the "create an arc" button by dragging the mouse, with the left button held down, from one node to the other (it is also possible to switch to the "create an arc" mode by holding down the "L" key while drawing the arc).
- Age has a direct influence on Smoker and Cancer
- Smoker has a direct influence on Cancer
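Outside BayesiaLab, the structure defined so far can be written down as a small directed graph. The following Python sketch is only an illustration: the dictionary layout and the helper names `parents` and `is_acyclic` are assumptions made for this example, not part of BayesiaLab. It also checks the property that every Bayesian network structure must satisfy, namely that the arcs form no cycle.

```python
# Hypothetical plain-Python representation; this is not a BayesiaLab API.
# Each key is a node, each value lists the children its arcs point to.
arcs = {
    "Age": ["Smoker", "Cancer"],
    "Smoker": ["Cancer"],
    "Cancer": [],
}


def parents(node, arcs):
    """Return the sorted list of nodes that have an arc pointing to `node`."""
    return sorted(p for p, children in arcs.items() if node in children)


def is_acyclic(arcs):
    """Check that the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in arcs}
    for children in arcs.values():
        for child in children:
            indegree[child] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in arcs[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Every node is visited exactly when no cycle blocks the ordering.
    return visited == len(arcs)
```

Here `parents("Cancer", arcs)` lists Age and Smoker, which are the variables that condition the probability table of Cancer.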
The screenshot given below shows the contextual menu associated with the nodes in Modeling mode (the gray items are only available in Validation mode, because they relate to inference). The first function is used for editing the properties of each node (cf. chapter 1.2), whereas the definition of a target node is especially useful for automatically learning a Bayesian network from a database (cf. chapter 3). The “Node Tagging” item allows the nodes to be tagged with specific colors (for example, to quickly locate the nodes that represent symptoms, diseases, etc.).
Our specialist then adds another disease node, the variable Tuberculosis, and a variable TbOrCa to handle a "logical or" between tuberculosis and cancer. TbOrCa is not strictly necessary, but it can simplify the graph later on (common symptoms).
Defining this “logical or” means specifying the direct relations between the nodes Tuberculosis and Cancer and the logical node TbOrCa.
The "P" key then positions the nodes automatically, as they appear in the screenshot below.
The contextual menu described below is the one associated with the arcs. With this menu, it is possible to change the direction of an arc, to delete it or to fix it. Fixing an arc introduces expert knowledge into the automatic learning process by indicating to the algorithm that this knowledge is certain (cf. chapter 3). The last option lets the user define Dynamic Bayesian Networks (cf. chapter 4).
In the end, the specialist has developed a Bayesian network that corresponds to his or her knowledge of respiratory illnesses:
It is possible to associate a global comment with the Bayesian network, as well as a comment with each node, through the contextual menus. These comments can contain links to files and Internet addresses. Nodes that have a comment display a bubble after their name. To show these comments, move the mouse cursor over the nodes concerned while holding down the "V" key:
1.2 An overview of the Node Editing interface
Once the structural part of the graph has been created (nodes and arcs), the conditional probability table of each node must be filled in (by double left clicking on the node or by using "edit the node" in the contextual menu associated with the node).
The name of the variable is displayed in the upper left corner. Next comes the type of the variable (Label or Interval), followed by its states (the list of mutually exclusive values that the node can take).
If the type is Label (in other words symbolic), the user can modify the list of default values to meet his or her needs. The list of default values for a symbolic variable is (False, True).
If the type is Interval (which means that the represented variable is initially continuous), the user specifies the interval of continuous values associated with each state, either in the table or directly on the axis of the continuous variable.
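To make the idea of an Interval variable concrete, the sketch below maps a continuous value to one of its states. It is plain Python, not BayesiaLab code: the age boundaries, the state labels and the convention that each interval includes its lower bound are all assumptions chosen for the illustration.

```python
from bisect import bisect_right

# Hypothetical cut points (in years) separating three age states.
boundaries = [30, 60]
# Assumed convention: lower bound included, upper bound excluded.
states = ["[0, 30)", "[30, 60)", "[60, +inf)"]


def state_of(value, boundaries, states):
    """Return the label of the interval that contains `value`."""
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the matching interval.
    return states[bisect_right(boundaries, value)]
```

With these assumed cut points, a patient aged 30 falls into the second state, because the lower bound of each interval is included.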
The default mode for entering the parameters associated with the node is the probability table. If the conditional probability table is only partially filled in, or contains non-standard numeric values, the "Complete", "Normalize" or "Random" buttons can be used to bring the sum of each row to 100.
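The effect of bringing a row to a sum of 100 can be sketched in a few lines of Python. The two functions below are assumptions about what such operations plausibly do (proportional rescaling for a normalization, equal sharing of the missing mass for a completion); they are not a description of BayesiaLab's actual implementation, and empty cells are represented here by `None`.

```python
def normalize(row):
    """Rescale a row of non-negative numbers so that it sums to 100."""
    total = sum(row)
    if total <= 0:
        raise ValueError("row must contain at least one positive value")
    return [100.0 * value / total for value in row]


def complete(row):
    """Fill empty cells (None) by sharing the remaining mass equally."""
    empty = row.count(None)
    if empty == 0:
        return list(row)
    remaining = 100.0 - sum(v for v in row if v is not None)
    if remaining < 0:
        raise ValueError("filled cells already exceed 100")
    return [remaining / empty if v is None else v for v in row]
```

For instance, under these assumptions a row entered as raw weights 1 and 3 is normalized to 25 and 75, and a row with 60 in the first cell and two empty cells is completed with 20 in each of them.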
The two probability tables below quantify the direct probabilistic relations between Age, Smoker and Cancer.
The younger the patient, the higher the probability that he or she smokes.
The older the patient, the higher the probability that he or she has cancer, smoking being an aggravating factor.
Lastly, the variable TbOrCa represents a “logical or” and thus is a deterministic node. Its probability table is:
However, this type of entry can be rather cumbersome for deterministic variables. The Deterministic mode offers a more economical way of entering the relation, since the probability distributions no longer need to be specified: only the resulting state (which has a probability of 100) is given for each combination of parent states. In this mode, the "logical or" becomes:
Lastly, probability distributions can be defined automatically from deterministic or probabilistic equations with the Equation mode. A great number of functions are built in and can be used directly, and you can also write your own equations. In this example, we simply use a "logical or", written "|".
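As an illustration outside BayesiaLab, the following Python sketch expands the "logical or" relation into the full probability table of TbOrCa. The function name `or_table` and the tuple layout (percentages for the states False and True, in that order) are assumptions made for this example; Python happens to use the same "|" operator for a logical or on Boolean values.

```python
from itertools import product


def or_table():
    """Map each (Tuberculosis, Cancer) pair to the distribution of TbOrCa.

    Each distribution is a pair of percentages for the states (False, True).
    """
    table = {}
    for tb, ca in product([False, True], repeat=2):
        result = tb | ca  # logical or of the two parent states
        # A deterministic node puts all the mass (100) on a single state.
        table[(tb, ca)] = (0.0, 100.0) if result else (100.0, 0.0)
    return table
```

Only the combination where both parents are False gives TbOrCa = False with probability 100; the three other combinations give TbOrCa = True with probability 100.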
1.3 Managing the costs
It is possible to associate a cost with a node, or more precisely with the knowledge of the value of this node. This is not necessarily a purely financial cost. The cost can also represent something unpleasant (a doctor who asks an indiscreet question or who performs a painful examination) or a risk (a risky examination). These costs are used by BayesiaLab to produce adaptive questionnaires.
Costs can be edited through the contextual menu associated with the Bayesian network worksheet (right click on the worksheet, then Edit costs). In Validation mode, these costs can be edited through the adaptive questionnaire assistant.
In our specialist's case, Tuberculosis, TbOrCa and Cancer are the variables corresponding to the diagnosis, so their values are considered non-observable (the doctor cannot ask the patient whether he or she has cancer). A variable is set as non-observable by deleting the value in its cell. A high cost, on the other hand, discourages an observation: for example, the doctor will try to avoid having to take an X-ray that has a cost of 1000. Costs are exploited during the automatic development of an adaptive questionnaire (cf. paragraph 2.2.1).
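The cost table described above can be pictured as a simple mapping from nodes to costs. The Python sketch below is only an illustration: the node name `XRay`, the cost of 1 given to Age and Smoker, and the helper `observable_by_cost` are assumptions, and the ordering shown is not BayesiaLab's questionnaire algorithm. An empty cell is represented by `None`.

```python
# Hypothetical cost table; None marks a non-observable node (empty cell).
costs = {
    "Age": 1.0,       # assumed value for this example
    "Smoker": 1.0,    # assumed value for this example
    "XRay": 1000.0,   # cost quoted in the text; node name is hypothetical
    "Tuberculosis": None,
    "Cancer": None,
    "TbOrCa": None,
}


def observable_by_cost(costs):
    """Return the observable nodes, cheapest first (ties broken by name)."""
    priced = [(cost, name) for name, cost in costs.items() if cost is not None]
    return [name for cost, name in sorted(priced)]
```

Under these assumptions, the diagnosis variables never appear among the candidate questions, and the X-ray comes last because of its high cost.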








