GIS Mapping

Context

  • The GIS Mapping function utilizes Google Maps to visualize records from the dataset associated with your Bayesian network.
  • This requires that your dataset contains variables representing longitude and latitude so that each record features geographic coordinates.
  • With the coordinates, BayesiaLab can display record-specific values in four dimensions:
    • Shape
    • Color
    • Size
    • Opacity
  • Each of these dimensions can be assigned attributes to visualize:
    • Directly observed node values or node states in the dataset, if the node is an Observable Random Node and its value is not missing.
    • Inferred values based on the current Bayesian network, if the node is:
      • An Observable Random Node and its value is missing
      • A Target Node
      • A Not-Observable Random Node
      • A Function Node representing numerical values

Usage

If the dataset associated with the network contains latitude and longitude coordinates, it is possible to display a graphical object per observation (row) on a Google Map.

The coordinates must be loaded as continuous variables and therefore have to be discretized for modeling. The choice of discretization can affect the machine-learned model (if the coordinates are useful for the model), but it has no impact on the mapping, which uses the continuous values directly.

The graphical objects have four dimensions: Shape, Color, Size, and Opacity.

Each of these dimensions can be:

  • Fixed, i.e., identical for each object/observation.
  • Based on the value of a variable, i.e., specific to each observation:
    • Directly extracted from the observation described in the dataset when the variable is an Observable Random Node and the value is not missing.
    • Inferred with the current Bayesian network when the variable is:
      • an Observable Random Node and the value is missing,
      • the Target Node,
      • a Not-Observable Random Node,
      • a Function Node with numerical values.

For inference, all the non-missing values of the Observable Random Nodes are set as Hard Evidence.
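As a rough illustration of this rule, the following Python sketch collects the non-missing observed values of one record into an evidence set. The function names, the dictionary-based record, the node names, and the treatment of None and NaN as "missing" are all assumptions made for the example; none of this is BayesiaLab's API.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def hard_evidence(record, observable_nodes):
    """Return {node: observed value} for every observable node that has a value."""
    return {
        node: record[node]
        for node in observable_nodes
        if node in record and not is_missing(record[node])
    }

# Hypothetical record: "Living area" is missing, so it would be inferred instead.
record = {"Grade": 8, "Living area": None, "Latitude": 47.6}
evidence = hard_evidence(record, ["Grade", "Living area"])
```

Nodes left out of the evidence set are the ones whose values would then be inferred from the network.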

The value that will be utilized for the mapping depends on the type of the variable:

  • Discrete: the state is chosen with the maximum a posteriori criterion.
  • Continuous: the mean value is computed from the posterior probability distribution and then normalized to bring all values into the range [0,1].
  • Function Node: except when used to define the Shape, the value is normalized to bring all values into the range [0,1], using the Minimum and Maximum values set in the wizard.
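The three rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the posterior is given as a list of state probabilities, each state of a continuous node is represented by one numeric value, and normalized values outside the given bounds are clamped to [0,1]. The function names are hypothetical.

```python
def map_state(posterior):
    """Index of the most probable state (maximum a posteriori).

    Ties are resolved in favor of the first state, an arbitrary choice.
    """
    return max(range(len(posterior)), key=lambda i: posterior[i])

def posterior_mean(posterior, state_values):
    """Expected value, given one representative numeric value per state."""
    return sum(p * v for p, v in zip(posterior, state_values))

def normalize(value, minimum, maximum):
    """Min-max scaling into [0, 1]; clamping is an assumption of this sketch."""
    if maximum == minimum:
        return 0.0
    scaled = (value - minimum) / (maximum - minimum)
    return min(1.0, max(0.0, scaled))

posterior = [0.2, 0.5, 0.3]
best = map_state(posterior)                            # discrete node
mean = posterior_mean(posterior, [100, 200, 300])      # continuous node
unit = normalize(mean, 100, 300)                       # value used for mapping
```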

Three shapes are defined: Circle (1), Square (2), and Triangle (3).

When not fixed, the shape is chosen based on rank and the inferred value:

  • For Discrete nodes: the shape is derived from the rank of the state, using a modulo operation.
  • For Function nodes: the shape is derived from the function value converted to an integer.
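One possible reading of these two rules is sketched below. The exact arithmetic is not specified here, so the sketch assumes a zero-based state rank taken modulo the number of shapes, and a function value truncated to an integer that matches the shape numbers 1, 2, and 3 listed above (wrapping around for other integers). Both function names are hypothetical.

```python
SHAPES = ("Circle", "Square", "Triangle")  # numbered 1, 2, 3 in the text

def shape_for_discrete(state_rank):
    """Assumed rule: zero-based rank modulo the number of shapes."""
    return SHAPES[state_rank % len(SHAPES)]

def shape_for_function(value):
    """Assumed rule: truncate to an integer, then map 1, 2, 3 to the shapes."""
    return SHAPES[(int(value) - 1) % len(SHAPES)]
```

With this reading, a node with more than three states reuses shapes: the fourth state would be drawn as a circle again.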

When not fixed to the user-defined value, the size is chosen based on the inferred value:

  • For Discrete variables: it corresponds to the normalized state’s rank.
  • For Continuous and Function nodes: it corresponds to the normalized value.

When not fixed to the chosen color, the color is chosen as follows:

  • For Discrete variables: the color is chosen from the Secondary Color Palette based on the state’s rank.
  • For Continuous and Function nodes: the normalized value is used directly to define a color on the user-defined scale given by Min, Mid (if checked), and Max.
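To make the color scale concrete, here is a minimal Python sketch of how a normalized value could be turned into a color. It assumes RGB tuples, linear interpolation, and a Mid color placed at 0.5; the actual interpolation used by BayesiaLab is not described here, and the function names are hypothetical.

```python
def lerp_color(start, end, t):
    """Linear interpolation between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))

def color_on_scale(t, c_min, c_max, c_mid=None):
    """Map a normalized value t to a color on a Min / (Mid) / Max scale."""
    t = min(1.0, max(0.0, t))
    if c_mid is None:                      # two-point scale: Min to Max
        return lerp_color(c_min, c_max, t)
    if t <= 0.5:                           # lower half: Min to Mid
        return lerp_color(c_min, c_mid, t / 0.5)
    return lerp_color(c_mid, c_max, (t - 0.5) / 0.5)  # upper half: Mid to Max

BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)
low = color_on_scale(0.0, BLUE, RED, WHITE)
middle = color_on_scale(0.5, BLUE, RED, WHITE)
high = color_on_scale(1.0, BLUE, RED, WHITE)
```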

When not fixed to the user-defined value, the opacity is chosen based on the inferred value:

  • For Discrete variables: it corresponds to the normalized state’s rank.
  • For Continuous and Function nodes: it corresponds to the normalized value.
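The size and opacity rules share the same idea, which the sketch below tries to capture. It assumes that a state's rank is normalized by dividing a zero-based rank by the highest rank, and that the normalized value is then multiplied by the user-defined maximum; both assumptions, and the function names, are illustrative only.

```python
def normalized_rank(state_rank, state_count):
    """Zero-based rank scaled to [0, 1] (assumed normalization)."""
    if state_count < 2:
        return 1.0
    return state_rank / (state_count - 1)

def scaled(normalized_value, maximum):
    """Scale a value in [0, 1] to the user-defined maximum size or opacity."""
    return normalized_value * maximum

# Middle state of a three-state node, with a maximum size of 25.
size = scaled(normalized_rank(1, 3), 25)
```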

Example

Let’s use a dataset that contains house sale prices for King County, which includes Seattle. It describes homes sold between May 2014 and May 2015. More precisely, we have extracted the 94 houses that are more than 100 years old, have been renovated, and have a basement.

After setting Price (K$) as the Target Node, we used the Augmented Markov Blanket algorithm to generate the following network:

The Function Node Certainty is defined as: 1-Entropy(?Price (K$)?, yes)

The first three parameters of the wizard are the general settings of the mapping:

  • Map Type: Roads, Terrain, Satellite, or Hybrid.
  • Latitude: the continuous variable to use for the latitude coordinate.
  • Longitude: the continuous variable to use for the longitude coordinate.

These settings generate the following map, which takes four different variables into account:

The Observable variable Overall grade given to the housing unit (discretized into three bins) defines the shape. If not missing, the values are read directly from the dataset to determine the corresponding discrete bin:

  • <= 7.5: CIRCLE
  • <= 8.5: SQUARE
  • > 8.5: TRIANGLE
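The binning above can be reproduced with a short Python sketch. The thresholds and shape names come from the list; the use of bisect_left and the function name are choices made for this illustration only.

```python
from bisect import bisect_left

THRESHOLDS = (7.5, 8.5)                     # upper bounds of the first two bins
BIN_SHAPES = ("CIRCLE", "SQUARE", "TRIANGLE")

def shape_for_grade(grade):
    """Return the shape of the bin that contains the grade.

    bisect_left keeps a grade equal to a threshold in the lower bin,
    which matches the "<=" bounds of the bins.
    """
    return BIN_SHAPES[bisect_left(THRESHOLDS, grade)]
```

For instance, a grade of 7 falls into the first bin and a grade of 9 into the last one.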

The Observable variable Living room area in 2015 defines the size, with 25 as the maximum (set in the Fixed field). If not missing, the continuous values are read directly from the dataset.

The Target Node Price (K$) defines the color. The values are the inferred posterior mean values.

The Function Node Certainty defines the opacity. The values are inferred.
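To give an intuition for the Certainty values, the sketch below computes one minus the normalized entropy of a posterior distribution. It assumes that the second argument ("yes") in the Entropy formula requests an entropy normalized to [0,1] by dividing by the logarithm of the number of states; that interpretation, and the Python function names, are assumptions rather than a statement of how BayesiaLab evaluates the formula.

```python
import math

def normalized_entropy(posterior):
    """Shannon entropy divided by its maximum, log2(number of states)."""
    state_count = len(posterior)
    if state_count < 2:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    return entropy / math.log2(state_count)

def certainty(posterior):
    """1 for a fully certain posterior, 0 for a uniform one."""
    return 1.0 - normalized_entropy(posterior)

sure = certainty([1.0, 0.0, 0.0])              # price bin known with certainty
unsure = certainty([0.25, 0.25, 0.25, 0.25])   # no information about the price
```

Under this reading, houses whose price is predicted with a concentrated posterior appear nearly opaque, while houses with a flat posterior appear nearly transparent.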