Skip to Content

Calculating Complexity: DL(B)

Description Length of the Bayesian Network — DL(B)

DL(B)DL(B) is the number of bits required to represent a Bayesian network. We can break down this value into the sum of two components:

DL(G)DL(G), which stands for the number of bits required to represent the graph GG of the Bayesian network,

DL(PG)DL(P \mid G) represents the number of bits required to represent the set of probability tables PP.

DL(B)=DL(G)+DL(PG)\displaystyle DL(B) = DL(G) + DL(P \mid G)

Calculating DL(G)

To calculate DL(G)DL(G), we need to determine the number of nodes and the number of their parent nodes.

DL(G)=i=1n(log2(n)+log2(nPai))\displaystyle DL(G) = \sum_{i=1}^n \left( \log_2(n) + \log_2 \binom{n}{\lvert P_{a_i}\rvert} \right)

where

nn is the number of random variables (nodes): X1,...,Xn{X_1},...,{X_n}

PaiP{a_i} is the set of the random variables that are parents of Xi{X_i} in graph GG

and PaiP{a_i} is the number of parents of the random variable Xi{X_i}.

Calculating DL(P|G)

Computing DL(PG)DL(P \mid G) is straightforward as it is proportional to the number of cells in all probability tables.

DL(PG)=i=1n(j=1PaiSj×(Si1)×DL(p))\displaystyle DL(P \mid G) = \sum_{i=1}^n \left( \prod_{j=1}^{\lvert P_{a_i}\rvert} S_j \times (S_i - 1) \times DL(p) \right)

where

Si{S_i} is the number of states of the random variable Xi{X_i}

pp is the probability associated with the cell.

As the probability p cannot be known prior to learning the network, we use the following classical heuristic in BayesiaLab:

DL(p)=log2(N)2\displaystyle DL(p) = \frac{\log_2(N)}{2}