# Bayesian Networks

## Representation of the Joint Probability Distribution

Any complete probabilistic model of a domain must—either explicitly or implicitly—represent the joint probability distribution (JPD), i.e. the probability of every possible event as defined by the combination of the values of all the variables. There are exponentially many such events, yet Bayesian networks achieve compactness by factoring the JPD into local, conditional distributions for each variable given its parents. If *x _{i}* denotes some value of the variable

*X*and

_{i}*pa*denotes some set of values for the parents of

_{i}*X*, then

_{i}*P(x*denotes this conditional probability distribution. For example, in the graph in Figure 2.4,

_{i}|pa_{i})*P(x*is the probability of

_{4}|x_{2},x_{3})*Wetness*given the values of

*Sprinkler*and

*Rain*. The global semantics of Bayesian networks specifies that the full JPD is given by the product rule (or chain rule):

In our example network, we have:

It becomes clear that the number of parameters grows linearly with the size of the network, i.e. the number of variables, whereas the size of the JPD itself grows exponentially. Given a discrete representation of the CPD with a CPT, the size of a local CPD grows exponentially with the number of parents. Savings can be achieved using compact CPD representations—such as noisy-OR models, trees, or neural networks.

The JPD representation with Bayesian networks also translates into a local semantics, which asserts that each variable is independent of non-descendants in the network given its parents. For example, the parents of *X _{4}* in Figure 2.4 are

*X*and

_{2}*X*, and they render

_{3}*X*independent of the remaining non-descendant,

_{4}*X*:

_{1}

The collection of independence assertions formed in this way suffices to derive the global assertion of the product rule (or chain rule) in (2.2), and vice versa. The local semantics is most useful for constructing Bayesian networks because selecting as parents all the direct causes (or direct relationships) of a given variable invariably satisfies the local conditional independence conditions. The global semantics leads directly to a variety of algorithms for reasoning.