GIS Mapping

Context

The GIS Mapping function utilizes Google Maps to visualize records from the dataset that is associated with your Bayesian network.
This requires that your dataset contains variables representing longitude and latitude so that each record features geographic coordinates.
With the coordinates, BayesiaLab can display record-specific values in four dimensions:
- Shape
- Color
- Size
- Opacity
Each of these dimensions can be assigned attributes to visualize:
- Directly observed node values or node states in the dataset, if the node is an Observable Random Node and its value is not missing.
- Inferred values based on the current Bayesian network if the node is
  - an Observable Random Node and its value is missing
  - a Target Node
  - a Not-Observable Random Node
  - a Function Node representing numerical values

Usage

If the data set associated with the network contains latitude and longitude coordinates, it is now possible to display a graphical object per observation/row on a Google Map.

The coordinates need to be loaded as continuous variables and thus have to be discretized. While the choice of discretization can have an impact on the machine-learned model (if these coordinates are useful for the model), it does not have any impact on the mapping. The continuous values are utilized directly.

The graphical objects have four dimensions: Shape, Color, Size and Opacity.

Each of these dimensions can be:
- Fixed, i.e., identical for each object/observation,
- Based on the value of a variable, i.e., specific to each observation:
  - Directly extracted from the observation described in the data set when the variable is:
    - an Observable Random Node and the value is not missing,
  - Inferred with the current Bayesian network when the variable is:
    - an Observable Random Node and the value is missing,
    - the Target Node,
    - a Not-Observable Random Node,
    - a Function Node with numerical values.

For inference, all the non-missing values of the Observable Random Nodes are set as hard evidence.

The value that will be utilized for the mapping depends on the type of the variable:
- Discrete: the state is chosen with the maximum a posteriori (MAP) criterion, i.e., the state with the highest posterior probability,
- Continuous: the mean value is computed from the posterior probability distribution and then normalized to the range [0, 1],
- Function Node: except when used to define the Shape, the value is normalized to the range [0, 1] by using the Minimum and Maximum Values set in the wizard.
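The three rules above can be illustrated with a minimal Python sketch. This is not BayesiaLab code: the function names, the dictionary representation of a posterior distribution, the per-state representative values, and the clamping to [0, 1] are all assumptions made for illustration.

```python
def map_state(posterior):
    """Return the state with the highest posterior probability (MAP)."""
    return max(posterior, key=posterior.get)


def posterior_mean(posterior, state_values):
    """Expected value, given a representative numeric value for each state.

    `state_values` is a hypothetical mapping from state name to a number.
    """
    return sum(prob * state_values[state] for state, prob in posterior.items())


def normalize(value, minimum, maximum):
    """Min-max normalization into [0, 1]; clamping is an assumption."""
    if maximum == minimum:
        return 0.0
    scaled = (value - minimum) / (maximum - minimum)
    return min(1.0, max(0.0, scaled))


# Hypothetical posterior over three price bins (not taken from the example below).
posterior = {"<=400": 0.2, "<=800": 0.5, ">800": 0.3}
state_values = {"<=400": 300.0, "<=800": 600.0, ">800": 1000.0}

best_state = map_state(posterior)
mean_value = posterior_mean(posterior, state_values)
normalized_mean = normalize(mean_value, 300.0, 1000.0)
```

With these made-up numbers, the MAP state is the middle bin, while the posterior mean lies slightly above the midpoint of the value range.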

Shape

Three shapes are defined: Circle (1), Square (2), and Triangle (3).

When not Fixed, the shape is chosen based on the inferred value:
- For Discrete nodes: the shape index is computed from the state's rank with the modulo operation,
- For Function nodes: the value of the Function is first converted into an integer, and the shape index is derived from this integer.
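As a rough illustration of the shape rule, the Python sketch below cycles through the three shapes with a modulo. The exact formula used by BayesiaLab is not reproduced here. The 0-based rank and the expression `rank % 3 + 1` are assumptions, and the function names are hypothetical.

```python
# Shape indices as listed above: Circle (1), Square (2), Triangle (3).
SHAPES = {1: "Circle", 2: "Square", 3: "Triangle"}


def shape_for_rank(rank):
    """Map a state's rank to a shape.

    Assumption: ranks start at 0 and the shape index is rank mod 3, plus 1.
    """
    return SHAPES[rank % len(SHAPES) + 1]


def shape_for_function_value(value):
    """Convert a Function value to an integer, then reuse the rank rule."""
    return shape_for_rank(int(value))
```

Under these assumptions, a node with three states maps to Circle, Square, and Triangle in order, and a fourth state would wrap around to Circle.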

Size

When not Fixed to the user-defined Fixed Value, the size is computed from that Fixed Value and the inferred value:
- For Discrete variables: using the normalized state's rank,
- For Continuous and Function nodes: using the normalized value.
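A minimal Python sketch of one plausible size rule follows. It assumes that the Fixed Value acts as the maximum size and is multiplied by the normalized quantity, and that a rank is normalized by dividing a 0-based rank by the number of states minus one. Both are assumptions, and the function names are hypothetical.

```python
def normalized_rank(rank, state_count):
    """Scale a 0-based rank into [0, 1] (assumed normalization)."""
    if state_count < 2:
        return 1.0
    return rank / (state_count - 1)


def marker_size(fixed_value, normalized):
    """Assumed rule: the Fixed Value is the size reached at a normalized value of 1."""
    return fixed_value * normalized


# A three-state node with a Fixed Value of 25: the middle state gets half the size.
middle_size = marker_size(25, normalized_rank(1, 3))
largest_size = marker_size(25, normalized_rank(2, 3))
```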

Color

When not Fixed to the chosen color, the color is chosen as follows:
- For Discrete variables: the color is chosen based on the state's rank and the Secondary Color Palette,
- For Continuous and Function nodes: the normalized value is directly used to define a color on the user-defined scale Min, Mid (if checked), and Max.
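For the continuous case, the sketch below shows one way a normalized value could be turned into a color on a Min/Mid/Max scale. Linear interpolation in RGB space and placing Mid at 0.5 are assumptions, not documented BayesiaLab behavior, and the function names are hypothetical.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def color_on_scale(normalized, min_color, max_color, mid_color=None):
    """Pick a color for a normalized value on a Min / (optional Mid) / Max scale."""
    t = min(1.0, max(0.0, normalized))
    if mid_color is None:
        return lerp_color(min_color, max_color, t)
    if t <= 0.5:
        # Lower half of the scale: blend from Min to Mid.
        return lerp_color(min_color, mid_color, t / 0.5)
    # Upper half of the scale: blend from Mid to Max.
    return lerp_color(mid_color, max_color, (t - 0.5) / 0.5)


BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)
low = color_on_scale(0.0, BLUE, RED, mid_color=WHITE)
middle = color_on_scale(0.5, BLUE, RED, mid_color=WHITE)
high = color_on_scale(1.0, BLUE, RED, mid_color=WHITE)
```

When the Mid color is not checked, the same function falls back to a direct blend between Min and Max.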

Opacity

When not Fixed to the user-defined Fixed Value, the opacity is computed from that Fixed Value and the inferred value:
- For Discrete variables: using the normalized state's rank,
- For Continuous and Function nodes: using the normalized value.

Example

Let's use a data set that contains house sale prices for King County, which includes Seattle. It describes homes sold between May 2014 and May 2015. More precisely, we have extracted the 94 houses that are more than 100 years old, that have been renovated, and come with a basement.

After having set Price (K$) as a Target Node, we've used the Augmented Markov Blanket algorithm for generating the following network:

The Function Node Certainty is defined as: 1-Entropy(?Price (K$)?, yes)
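To make the Certainty definition more concrete, the following Python sketch computes one minus the normalized entropy of a posterior distribution. It assumes that the second argument of Entropy requests normalization, i.e., division by the logarithm of the number of states. That reading and the function names are assumptions.

```python
import math


def normalized_entropy(probabilities):
    """Shannon entropy divided by its maximum, log2 of the number of states."""
    state_count = len(probabilities)
    if state_count < 2:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(state_count)


def certainty(probabilities):
    """1 for a fully certain posterior, 0 for a uniform one."""
    return 1.0 - normalized_entropy(probabilities)


certain = certainty([1.0, 0.0, 0.0])
uniform = certainty([0.25, 0.25, 0.25, 0.25])
```

Under this reading, houses whose price distribution is sharply peaked receive a Certainty close to 1, and houses with a flat distribution receive a value close to 0.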

The first three parameters of this wizard are the general settings of the mapping:
- Map Type: Roads, Terrain, Satellite, or Hybrid,
- Latitude: the continuous variable to use for the latitude coordinate,
- Longitude: the continuous variable to use for the longitude coordinate.

This setting generates the following map, which takes into account four different variables:
- The Observable variable Overall grade given to the housing unit (discretized into three bins) defines the shape. If not missing, the values are read directly from the data set to determine the corresponding discrete bin:
  - <= 7.5: Circle
  - <= 8.5: Square
  - > 8.5: Triangle
- The Observable variable Living room area in 2015 defines the size, with 25 as the maximum (set in the Fixed field). If not missing, the continuous values are read directly from the data set.
- The Target Node Price (K$) defines the color. The values are the inferred posterior mean values.
- The Function Node Certainty defines the opacity. The values are inferred.
