Probabilistic Inferences in Bayesian Networks
Jianguo Ding
Interdisciplinary Center for Security, Reliability and Trust, University of Luxembourg, Luxembourg
Abstract.
A Bayesian network is a complete model for the variables and their relationships; it can be used to answer probabilistic queries about them. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems. In the application of Bayesian networks, most of the work is related to probabilistic inference: updating a variable in any node of a Bayesian network may cause evidence to propagate across the whole network. This paper sums up various inference techniques in Bayesian networks and provides guidance for algorithmic calculation in probabilistic inference in Bayesian networks.
Because a Bayesian network is a complete model for the variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to find updated knowledge about the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.

In the application of Bayesian networks, most of the work is related to probabilistic inference. Updating a variable in any node of a Bayesian network may cause evidence to propagate across the whole network. How to examine and execute the various inferences is therefore an important task in the application of Bayesian networks.

This chapter sums up various inference techniques in Bayesian networks and provides guidance for algorithmic calculation in probabilistic inference. Since information systems have discrete-event characteristics, this chapter mainly concerns inference over discrete events in Bayesian networks.
The key feature of Bayesian networks is that they provide a method for decomposing a probability distribution into a set of local distributions. The independence semantics associated with the network topology specifies how to combine these local distributions to obtain the complete joint probability distribution over all the random variables represented by the nodes in the network. This has three important consequences.

Firstly, naively specifying a joint probability distribution with a table requires a number of values exponential in the number of variables. For systems in which interactions among the random variables are sparse, Bayesian networks drastically reduce the number of required values.

Secondly, efficient inference algorithms can be built that work by transmitting information between the local distributions rather than working with the full joint distribution.

Thirdly, the separation of the qualitative representation of the influences between variables from the numeric quantification of the strength of those influences has a significant advantage for knowledge engineering. When building a Bayesian network model, one can focus first on specifying the qualitative structure of the domain and then on quantifying the influences. When the model is built, one is guaranteed to have a complete specification of the joint probability distribution.

The most common computation performed on Bayesian networks is the determination of the posterior probability of some random variables, given the values of other variables in the network. Because of the symmetric nature of conditional probability, this computation can be used to perform both diagnosis and prediction. Other common computations are: the probability of the conjunction of a set of random variables, the most likely combination of values of the random variables in the network, and the piece of evidence that has or will have the most influence on a given hypothesis. A detailed discussion of inference techniques in Bayesian networks can be found in the book by Pearl [Pearl, 2000].

– Probabilistic semantics.
Any complete probabilistic model of a domain must, either explicitly or implicitly, represent the joint distribution, which assigns a probability to every possible event as defined by the values of all the variables. There are exponentially many such events, yet Bayesian networks achieve compactness by factoring the joint distribution into local conditional distributions for each variable given its parents. If $x_i$ denotes some value of the variable $X_i$ and $\pi(x_i)$ denotes some set of values for $X_i$'s parents, then $P(x_i \mid \pi(x_i))$ denotes this conditional distribution. For example, $P(x_4 \mid x_2, x_3)$ is the probability of wetness given the values of sprinkler and rain. Here $P(x_4 \mid x_2, x_3)$ is shorthand for $P(x_4 \mid \{x_2, x_3\})$; the set parentheses are omitted for the sake of readability, and we use the same convention throughout. The global semantics of Bayesian networks specifies that the full joint distribution is given by the product

$$P(x_1, \ldots, x_n) = \prod_i P(x_i \mid \pi(x_i)) \quad (1)$$

Equation 1 is also called the chain rule for Bayesian networks.
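To make the chain rule concrete, here is a minimal Python sketch of Equation 1 for the five-variable sprinkler network used as the running example below (Figure 1: X1 → X2, X1 → X3; X2, X3 → X4; X4 → X5). All variables are treated as binary and every CPT number is a hypothetical placeholder; the actual values would come from the figure.

from itertools import product

def bern(p, value):
    # Probability of a binary value, given P(value = True) = p.
    return p if value else 1.0 - p

# Hypothetical CPTs for the network of Figure 1 (all numbers invented).
def P_x1(x1):         return bern(0.5, x1)                  # season-like root
def P_x2(x2, x1):     return bern(0.1 if x1 else 0.5, x2)   # sprinkler | season
def P_x3(x3, x1):     return bern(0.7 if x1 else 0.2, x3)   # rain | season
def P_x4(x4, x2, x3):                                       # wet | sprinkler, rain
    p = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.01}[(x2, x3)]
    return bern(p, x4)
def P_x5(x5, x4):     return bern(0.95 if x4 else 0.05, x5) # slippery | wet

def joint(x1, x2, x3, x4, x5):
    # Equation 1: product of each variable's CPT given its parents.
    return (P_x1(x1) * P_x2(x2, x1) * P_x3(x3, x1)
            * P_x4(x4, x2, x3) * P_x5(x5, x4))

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
assert abs(sum(joint(*v) for v in product([True, False], repeat=5)) - 1.0) < 1e-9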
In the example Bayesian network in Figure 1, we have

Fig. 1. Causal Influences in A Bayesian Network.

$$P(x_1, x_2, x_3, x_4, x_5) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1)\, P(x_4 \mid x_2, x_3)\, P(x_5 \mid x_4) \quad (2)$$

Provided the number of parents of each node is bounded, it is easy to see that the number of parameters required grows only linearly with the size of the network, whereas the joint distribution itself grows exponentially. Further savings can be achieved using compact parametric representations, such as noisy-OR models, decision trees, or neural networks, for the conditional distributions [Pearl, 2000].

There are also entirely equivalent local semantics, which assert that each variable is independent of its non-descendants in the network given its parents. For example, the parents of $X_4$ in Figure 1 are $X_2$ and $X_3$, and they render $X_4$ independent of the remaining non-descendant, $X_1$. That is,

$$P(x_4 \mid x_1, x_2, x_3) = P(x_4 \mid x_2, x_3) \quad (3)$$

The collection of independence assertions formed in this way suffices to derive the global assertion in Equation 2, and vice versa. The local semantics are most useful in constructing Bayesian networks, because selecting as parents the direct causes of a given variable automatically satisfies the local conditional independence conditions. The global semantics lead directly to a variety of algorithms for reasoning.

– Evidential reasoning. From the product specification in Equation 2, one can express the probability of any desired proposition in terms of the conditional probabilities specified in the network. For example, the probability that the sprinkler was on, given that the pavement is slippery, is

$$P(X_2 = \text{on} \mid X_5 = \text{true}) = \frac{P(X_2 = \text{on}, X_5 = \text{true})}{P(X_5 = \text{true})} \quad (4)$$

$$= \frac{\sum_{x_1, x_3, x_4} P(x_1, X_2 = \text{on}, x_3, x_4, X_5 = \text{true})}{\sum_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4, X_5 = \text{true})}$$

$$= \frac{\sum_{x_1, x_3, x_4} P(x_1)\, P(X_2 = \text{on} \mid x_1)\, P(x_3 \mid x_1)\, P(x_4 \mid x_3, X_2 = \text{on})\, P(X_5 = \text{true} \mid x_4)}{\sum_{x_1, x_2, x_3, x_4} P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1)\, P(x_4 \mid x_2, x_3)\, P(X_5 = \text{true} \mid x_4)}$$

These expressions can often be simplified in ways that reflect the structure of the network itself (a small enumeration sketch appears after this list). It is easy to show that reasoning in Bayesian networks subsumes the satisfiability problem in propositional logic and hence that reasoning is NP-hard [Cooper, 1990]. Monte Carlo simulation methods can be used for approximate inference [Pearl, 1987], with estimates gradually improved as the sampling proceeds. (Unlike join-tree methods, these methods use local message propagation on the original network structure.) Alternatively, variational methods [Jordan et al., 1998] provide bounds on the true probability.

– Functional Bayesian networks. The networks discussed so far are capable of supporting reasoning about evidence and about actions. Additional refinement is necessary in order to process counterfactual information. For example, the probability that "the pavement would not have been slippery had the sprinkler been OFF, given that the sprinkler is in fact ON and that the pavement is in fact slippery" cannot be computed from the information provided in Figure 1 and Equation 2. Such counterfactual probabilities require a specification in the form of functional networks, where each conditional probability $P(x_i \mid \pi(x_i))$ is replaced by a functional relationship $x_i = f_i(\pi(x_i), \epsilon_i)$, where $\epsilon_i$ is a stochastic (unobserved) error term. When the functions $f_i$ and the distributions of $\epsilon_i$ are known, all counterfactual statements can be assigned unique probabilities, using evidence propagation in a structure called a "twin network". When only partial knowledge about the functional form of $f_i$ is available, bounds can be computed on the probabilities of counterfactual sentences [Balke & Pearl, 1995] [Pearl, 2000].

– Causal discovery. One of the most exciting prospects in recent years has been the possibility of using Bayesian networks to discover causal structures in raw statistical data [Pearl & Verma, 1991] [Spirtes et al., 1993] [Pearl, 2000], a task previously considered impossible without controlled experiments. Consider, for example, the following pattern of dependencies among three events: A and B are dependent, B and C are dependent, yet A and C are independent. If you ask a person to supply an example of three such events, the example will invariably portray A and C as two independent causes and B as their common effect, namely, A → B ← C. Fitting this dependence pattern with a scenario in which B is the cause and A and C are the effects is mathematically feasible but very unnatural, because it must entail fine tuning of the probabilities involved; the desired dependence pattern will be destroyed as soon as the probabilities undergo a slight change.

Such thought experiments tell us that certain patterns of dependency, which are totally void of temporal information, are conceptually characteristic of certain causal directionalities and not others. When put together systematically, such patterns can be used to infer causal structures from raw data and to guarantee that any alternative structure compatible with the data must be less stable than the one(s) inferred; namely, slight fluctuations in parameters will render that structure incompatible with the data.

– Plain beliefs. In mundane decision making, beliefs are revised not by adjusting numerical probabilities but by tentatively accepting some sentences as "true for all practical purposes". Such sentences, called plain beliefs, exhibit both logical and probabilistic character. As in classical logic, they are propositional and deductively closed; as in probability, they are subject to retraction and to varying degrees of entrenchment. Bayesian networks can be adopted to model the dynamics of plain beliefs by replacing ordinary probabilities with non-standard probabilities, that is, probabilities that are infinitesimally close to either zero or one [Goldszmidt & Pearl, 1996].

– Models of cognition.
Bayesian networks may be viewed as normative cognitive models of propositional reasoning under uncertainty [Pearl, 2000]. They handle noise and partial information by using local, distributed algorithms for inference and learning. Unlike feed-forward neural networks, they facilitate local representations in which nodes correspond to propositions of interest. Recent experiments [Tenenbaum & Griffiths, 2001] suggest that they capture accurately the causal inferences made by both children and adults. Moreover, they capture patterns of reasoning that are not easily handled by any competing computational model. They appear to have many of the advantages of both the "symbolic" and the "subsymbolic" approaches to cognitive modelling.

Two major questions arise when we postulate Bayesian networks as potential models of actual human cognition.

Firstly, does an architecture resembling that of Bayesian networks exist anywhere in the human brain? No specific work has been done to design neurally plausible models that implement the required functionality, although no obvious obstacles exist.

Secondly, how could Bayesian networks, which are purely propositional in their expressive power, handle the kinds of reasoning about individuals, relations, properties, and universals that pervade human thought? One plausible answer is that Bayesian networks containing propositions relevant to the current context are constantly being assembled, as needed, from a more permanent store of knowledge. For example, the network in Figure 1 may be assembled to help explain why this particular pavement is slippery right now, and to decide whether this can be prevented. The background store of knowledge includes general models of pavements, sprinklers, slipping, rain, and so on; these must be accessed and supplied with instance data to construct the specific Bayesian network structure. The store of background knowledge must utilize some representation that combines the expressive power of first-order logical languages (such as semantic networks) with the ability to handle uncertain information.
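As a companion to the evidential-reasoning example above (Equation 4), the following sketch computes $P(X_2 = \text{on} \mid X_5 = \text{true})$ by brute-force enumeration over the factored joint, assuming the `joint` function from the earlier chain-rule sketch is in scope and identifying "on"/"true" with the boolean True. The numbers it produces reflect only the invented CPTs, not the chapter's figure.

from itertools import product

def posterior_x2_given_x5(joint):
    # Equation 4 by enumeration: P(X2 = on | X5 = true).
    # Numerator sums out x1, x3, x4 with X2 and X5 clamped;
    # denominator additionally sums out x2.
    tf = (True, False)
    num = sum(joint(x1, True, x3, x4, True)
              for x1, x3, x4 in product(tf, repeat=3))
    den = sum(joint(x1, x2, x3, x4, True)
              for x1, x2, x3, x4 in product(tf, repeat=4))
    return num / den

print(posterior_x2_given_x5(joint))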
d-Separation is one important property of Bayesian networks for inference. Before defining d-separation, we first look at the way evidence is transmitted in Bayesian networks. There are two types of evidence:

– Hard evidence (instantiation) for a node A is evidence that the state of A is definitely a particular value.

– Soft evidence for a node A is any evidence that enables us to update the prior probability values for the states of A.

d-Separation (definition): Two distinct variables X and Z in a causal network are d-separated if, for all paths between X and Z, there is an intermediate variable V (distinct from X and Z) such that either

– the connection is serial or diverging and V is instantiated, or

– the connection is converging, and neither V nor any of V's descendants have received evidence.

If X and Z are not d-separated, we call them d-connected.
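The definition above can be checked mechanically. Below is a small, illustrative Python sketch that enumerates the simple paths between two nodes and applies the two blocking rules; it treats `evidence` as the set of instantiated (hard-evidence) nodes, so soft evidence on a converging node's descendants would need separate handling. The graph encoding and function names are our own assumptions.

def descendants(dag, v):
    # All nodes reachable from v along directed edges; dag maps node -> children.
    seen, stack = set(), [v]
    while stack:
        for child in dag.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(dag, x, z):
    # Yield simple undirected paths from x to z.
    nbrs = {}
    for u, children in dag.items():
        for c in children:
            nbrs.setdefault(u, []).append(c)
            nbrs.setdefault(c, []).append(u)
    def walk(path):
        if path[-1] == z:
            yield list(path)
            return
        for n in nbrs.get(path[-1], []):
            if n not in path:
                path.append(n)
                yield from walk(path)
                path.pop()
    yield from walk([x])

def d_separated(dag, x, z, evidence):
    # True iff every path between x and z is blocked, per the definition above.
    for path in simple_paths(dag, x, z):
        blocked = False
        for i in range(1, len(path) - 1):
            a, v, b = path[i - 1], path[i], path[i + 1]
            converging = v in dag.get(a, ()) and v in dag.get(b, ())
            if converging:
                # Blocked unless v or one of its descendants has evidence.
                if v not in evidence and not (descendants(dag, v) & evidence):
                    blocked = True
                    break
            elif v in evidence:
                # Serial or diverging connection with v instantiated.
                blocked = True
                break
        if not blocked:
            return False
    return True

# Figure 1's structure: X1 -> {X2, X3}; {X2, X3} -> X4; X4 -> X5.
fig1 = {"X1": ["X2", "X3"], "X2": ["X4"], "X3": ["X4"], "X4": ["X5"]}
print(d_separated(fig1, "X2", "X3", {"X1"}))        # True: both paths blocked
print(d_separated(fig1, "X2", "X3", {"X1", "X5"}))  # False: X5 opens the collider X4
print(d_separated(fig1, "X1", "X5", {"X4"}))        # True: serial chains cut at X4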
Basic structures of Bayesian networks. Based on the definition of d-separation, the three basic structures in Bayesian networks are as follows:

1. Serial connections
Consider the situation in Figure 2. X has an influence on Y, which in turn has an influence on Z. Obviously, evidence on X will influence the certainty of Y, which then influences the certainty of Z. Similarly, evidence on Z will influence the certainty of X through Y. On the other hand, if the state of Y is known, then the channel is blocked, and X and Z become independent. We say that X and Z are d-separated given Y; when the state of a variable is known, we say that it is instantiated (hard evidence).

We conclude that evidence may be transmitted through a serial connection unless the state of the variable in the connection is known.

Fig. 2. Serial Connection. When Y is instantiated, it blocks the communication between X and Z.

2. Diverging connections
The situation in Figure 3 is called a diverging connection. Influence can pass between all the children of X unless the state of X is known. We say that $Y_1, Y_2, \ldots, Y_n$ are d-separated given X.

Evidence may be transmitted through a diverging connection unless the root variable X is instantiated.

Fig. 3. Diverging Connection. If X is instantiated, it blocks the communication between its children.

3. Converging connections
Fig. 4. Converging Connection. If Y changes certainty, it opens the communication between its parents.

A description of the situation in Figure 4 requires a little more care. If nothing is known about Y except what may be inferred from knowledge of its parents $X_1, \ldots, X_n$, then the parents are independent: evidence on one of the possible causes of an event does not tell us anything about the other possible causes. However, if anything is known about the consequences, then information on one possible cause may tell us something about the other causes.

This is the explaining-away effect illustrated in Figure 1. $X_4$ (pavement is wet) has occurred, and $X_2$ (the sprinkler is on) as well as $X_3$ (it is raining) may cause $X_4$. If we then get the information that $X_3$ has occurred, the certainty of $X_2$ will decrease. Likewise, if we get the information that $X_3$ has not occurred, then the certainty of $X_2$ will increase.
The three preceding cases cover all the ways in which evidence may be transmitted through a variable.
In Bayesian networks, four popular types of inference are identified:

1. Forward inference. Forward inference is also called predictive inference (from causes to effects). The inference reasons from new information about causes to new beliefs about effects, following the directions of the network arcs. For example, in Figure 2, X → Y → Z is a forward inference.

2. Backward inference. Backward inference is also called diagnostic inference (from effects to causes). The inference reasons from symptoms to causes; note that this reasoning occurs in the opposite direction to the network arcs. In Figure 2, Z → Y is a backward inference. In Figure 3, $Y_i \to X$ ($i \in [1, n]$) is a backward inference.

3. Intercausal inference. Intercausal inference is also called explaining away (between parallel variables). The inference reasons about the mutual causes (effects) of a common effect (cause). For example, in Figure 4, if Y is instantiated, $X_i$ and $X_j$ ($i, j \in [1, n]$) are dependent; the reasoning $X_i \leftrightarrow X_j$ is an intercausal inference. In Figure 3, if X is not instantiated, $Y_i$ and $Y_j$ ($i, j \in [1, n]$) are dependent; the reasoning $Y_i \leftrightarrow Y_j$ is an intercausal inference.

4. Mixed inference. Mixed inference is also called combined inference. In complex Bayesian networks, the reasoning does not always fit neatly into one of the types described above; some inferences are a combination of several types of reasoning.

– in Serial Connections

Fig. 5. Inference in Serial Connection.

• The forward inference executes with forward propagation of evidence. (Note: in this chapter, $P(X^+)$ abbreviates $P(X = \text{true})$ and $P(X^-)$ abbreviates $P(X = \text{false})$. For brevity, $P(Y \mid X)$ denotes $P(Y = \text{true} \mid X = \text{true})$ by default; but in an expression such as $P(Y^+ \mid X)$, the uninstantiated $X$ ranges over both $X = \text{true}$ and $X = \text{false}$.) For example, in Figure 5, consider the inference X → Y → Z. If Y is instantiated, X and Z are independent, so

$$P(Z \mid XY) = P(Z \mid Y)$$

With the conditional probabilities $P(Z^\pm \mid Y^\pm)$ given by the tables in Figure 5, the forward propagation computes, for instance,

$$P(Z^+ \mid X^+ Y) = P(Z^+ \mid Y^+) P(Y^+ \mid X^+) + P(Z^+ \mid Y^-) P(Y^- \mid X^+) = P(Z^+ \mid Y^+) \times 0.85 + P(Z^+ \mid Y^-) \times 0.15$$

$$P(Z^- \mid X^- Y) = P(Z^- \mid Y^+) P(Y^+ \mid X^-) + P(Z^- \mid Y^-) P(Y^- \mid X^-) = P(Z^- \mid Y^+) \times 0.03 + P(Z^- \mid Y^-) \times 0.97$$

• The backward inference executes with backward propagation of evidence. For example, in Figure 5, consider the inference Z → Y → X.

1. If Y is instantiated ($P(Y^+) = 1$ or $P(Y^-) = 1$), X and Z are independent, and

$$P(X \mid YZ) = P(X \mid Y) = \frac{P(X)\, P(Y \mid X)}{P(Y)} \quad (5)$$

$$P(X^+ \mid Y^+ Z) = P(X^+ \mid Y^+) = \frac{P(X^+)\, P(Y^+ \mid X^+)}{P(Y^+)}, \qquad P(X^+ \mid Y^- Z) = P(X^+ \mid Y^-) = \frac{P(X^+)\, P(Y^- \mid X^+)}{P(Y^-)}$$

2. If Y is not instantiated, X and Z are dependent (see the dashed lines in Figure 5). Suppose $P(Z^+) = 1$; then

$$P(X^+ \mid Y Z^+) = \frac{P(X^+ Y Z^+)}{P(Y Z^+)} = \frac{P(X^+ Y Z^+)}{\sum_X P(X Y Z^+)}$$

$$P(X^+ Y Z^+) = P(X^+ Y^+ Z^+) + P(X^+ Y^- Z^+)$$

$$\sum_X P(X Y Z^+) = P(X^+ Y^+ Z^+) + P(X^+ Y^- Z^+) + P(X^- Y^+ Z^+) + P(X^- Y^- Z^+)$$

where each term factorizes along the chain, e.g. $P(X^+ Y^+ Z^+) = P(X^+)\, P(Y^+ \mid X^+)\, P(Z^+ \mid Y^+)$, and the numerical values follow from the conditional probability tables in Figure 5.

In serial connections, there is no intercausal inference.
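Since only part of Figure 5's conditional probability table is quoted in the worked example, the following sketch re-creates the serial-connection computations with stand-in numbers: $P(Y^+ \mid X^+) = 0.85$ and $P(Y^+ \mid X^-) = 0.03$ are taken from the example above, while $P(X^+)$ and $P(Z^+ \mid Y^\pm)$ are invented for illustration.

# Serial connection X -> Y -> Z (Figure 5), binary variables.
P_x = 0.4                          # P(X+): invented
P_y = {True: 0.85, False: 0.03}    # P(Y+ | X): from the worked example
P_z = {True: 0.95, False: 0.05}    # P(Z+ | Y): invented

def bern(p, v):
    # P(V = v) for a binary V with P(V = True) = p.
    return p if v else 1.0 - p

def joint(x, y, z):
    # Chain-rule joint: P(x, y, z) = P(x) P(y | x) P(z | y).
    return bern(P_x, x) * bern(P_y[x], y) * bern(P_z[y], z)

tf = (True, False)

# Forward inference X -> Y -> Z: P(Z+ | X+) = sum_y P(Z+ | y) P(y | X+).
p_z_given_x = sum(P_z[y] * bern(P_y[True], y) for y in tf)

# Backward inference Z -> Y -> X: Bayes, P(X+ | Z+) = P(X+, Z+) / P(Z+).
p_x_given_z = (sum(joint(True, y, True) for y in tf)
               / sum(joint(x, y, True) for x in tf for y in tf))

# With Y instantiated (hard evidence Y+), Equation 5: X is independent of Z.
p_yp = sum(bern(P_x, x) * P_y[x] for x in tf)     # P(Y+)
p_x_given_yp = P_x * P_y[True] / p_yp             # P(X+ | Y+, Z) = P(X+ | Y+)

print(p_z_given_x, p_x_given_z, p_x_given_yp)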
– in Diverging Connections

Fig. 6. Inference in Diverging Connection.

• The forward inference executes with forward propagation of evidence. For example, in Figure 6, the inferences Y → X and Y → Z are obtained directly from the conditional probability tables.

• The backward inference executes with backward propagation of evidence (see the dashed line in Figure 6). Consider the inference (XZ) → Y, where X and Z are instantiated by assumption, $P(X^+) = 1$ and $P(Z^+) = 1$. Then

$$P(Y^+ \mid X^+ Z^+) = \frac{P(Y^+ X^+ Z^+)}{P(X^+ Z^+)} = \frac{P(Y^+)\, P(X^+ \mid Y^+)\, P(Z^+ \mid Y^+)}{P(X^+ Z^+)}$$

with the numerical values given by the tables in Figure 6.

• The intercausal inference executes between effects with a common cause. In Figure 6, if Y is not instantiated, there is intercausal inference in the diverging connection. Consider the inference X → Z:

$$P(X^+ \mid Y Z^+) = \frac{P(X^+ Y Z^+)}{P(Y Z^+)} = \frac{P(X^+ Y^+ Z^+) + P(X^+ Y^- Z^+)}{P(Y^+ Z^+) + P(Y^- Z^+)}$$

again with the numerical values given by the tables in Figure 6.
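A corresponding sketch for the diverging connection, again with invented CPT numbers: the backward inference exploits the fact that the children are conditionally independent given Y, and the intercausal computation shows evidence on Z shifting the belief in X when Y is not instantiated.

# Diverging connection X <- Y -> Z (Figure 6), binary variables, invented CPTs.
P_y = 0.3                        # P(Y+)
P_x = {True: 0.9, False: 0.2}    # P(X+ | Y)
P_z = {True: 0.8, False: 0.1}    # P(Z+ | Y)

def bern(p, v):
    return p if v else 1.0 - p

tf = (True, False)

# Backward inference (X, Z) -> Y: children independent given Y, so
# P(Y+ | X+, Z+) is proportional to P(Y+) P(X+ | Y+) P(Z+ | Y+).
unnorm = {y: bern(P_y, y) * P_x[y] * P_z[y] for y in tf}
p_y_given_xz = unnorm[True] / sum(unnorm.values())

# Intercausal transfer when Y is NOT instantiated: evidence on Z moves X.
p_x_prior   = sum(bern(P_y, y) * P_x[y] for y in tf)              # P(X+)
p_x_given_z = (sum(bern(P_y, y) * P_x[y] * P_z[y] for y in tf)
               / sum(bern(P_y, y) * P_z[y] for y in tf))          # P(X+ | Z+)

print(p_y_given_xz, p_x_prior, p_x_given_z)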
– in Converging Connections

Fig. 7. Inference in Converging Connection.

• The forward inference executes with forward propagation of evidence. For example, in Figure 7, consider the inference (XZ) → Y; $P(Y \mid XZ)$ is obtained directly from the definition of the Bayesian network.

• The backward inference executes with backward propagation of evidence. For example, in Figure 7, consider the inference Y → (XZ):

$$P(Y) = \sum_{XZ} P(XYZ) = \sum_{XZ} P(Y \mid XZ)\, P(XZ)$$

$$P(XZ \mid Y) = \frac{P(Y \mid XZ)\, P(XZ)}{P(Y)} = \frac{P(Y \mid XZ)\, P(X)\, P(Z)}{\sum_{XZ} P(Y \mid XZ)\, P(XZ)}$$

Finally, $P(X \mid Y) = \sum_Z P(XZ \mid Y)$ and $P(Z \mid Y) = \sum_X P(XZ \mid Y)$.

• The intercausal inference executes between causes with a common effect when the intermediate node is instantiated, that is, $P(Y^+) = 1$ or $P(Y^-) = 1$. In Figure 7, consider the inference X → Z and suppose $P(Y^+) = 1$:

$$P(Z^+ \mid X^+ Y^+) = \frac{P(Z^+ X^+ Y^+)}{P(X^+ Y^+)} = \frac{P(Z^+ X^+ Y^+)}{\sum_Z P(X^+ Y^+ Z)}$$

$$P(Z^+ X^+ Y^+) = P(X^+)\, P(Z^+)\, P(Y^+ \mid X^+ Z^+), \qquad \sum_Z P(X^+ Y^+ Z) = P(X^+ Y^+ Z^+) + P(X^+ Y^+ Z^-)$$

so that

$$P(Z^+ \mid X^+ Y^+) = \frac{P(X^+)\, P(Z^+)\, P(Y^+ \mid X^+ Z^+)}{P(X^+ Y^+ Z^+) + P(X^+ Y^+ Z^-)}$$
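The explaining-away computation for the converging connection can likewise be checked by enumeration. All CPT values below are invented; the assertion verifies that observing the second cause $Z^+$ lowers the belief in $X^+$ relative to observing the common effect $Y^+$ alone.

# Converging connection X -> Y <- Z (Figure 7), binary variables, invented CPTs.
P_x = 0.1                                       # P(X+), e.g. sprinkler
P_z = 0.2                                       # P(Z+), e.g. rain
P_y = {(True, True): 0.99, (True, False): 0.9,  # P(Y+ | X, Z), e.g. wet
       (False, True): 0.9, (False, False): 0.01}

def bern(p, v):
    return p if v else 1.0 - p

def joint(x, y, z):
    # The parents are marginally independent: P(x) P(z) P(y | x, z).
    return bern(P_x, x) * bern(P_z, z) * bern(P_y[(x, z)], y)

def cond(num_pred, den_pred):
    # Conditional probability by enumerating the full joint.
    tf = (True, False)
    num = sum(joint(x, y, z) for x in tf for y in tf for z in tf
              if num_pred(x, y, z))
    den = sum(joint(x, y, z) for x in tf for y in tf for z in tf
              if den_pred(x, y, z))
    return num / den

p_x_given_y  = cond(lambda x, y, z: x and y,
                    lambda x, y, z: y)           # P(X+ | Y+)
p_x_given_yz = cond(lambda x, y, z: x and y and z,
                    lambda x, y, z: y and z)     # P(X+ | Y+, Z+)

# Explaining away: learning the other cause Z+ lowers the belief in X+.
assert p_x_given_yz < p_x_given_y
print(p_x_given_y, p_x_given_yz)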
Inference in complex models. For complex models of Bayesian networks, there are singly connected networks, multiply connected networks, and even looped networks. It is possible to use methods such as triangulated graphs, clustering, and join trees [Bertele & Brioschi, 1972] [Finn & Thomas, 2007] [Golumbic, 1980] to simplify them into a polytree. Once a polytree is obtained, inference can be executed by the following approach.

Polytrees have at most one path between any pair of nodes; hence they are also referred to as singly connected networks.

Suppose X is the query node and E is some set of evidence nodes, $X \notin E$. The posterior probability (belief) is denoted $B(X) = P(X \mid E)$; see Figure 8. E can be split into two parts, $E^+$ and $E^-$: $E^-$ is the part consisting of assignments to variables in the subtree rooted at X, and $E^+$ is the rest.

$$\pi_X(E^+) = P(X \mid E^+), \qquad \lambda_X(E^-) = P(E^- \mid X)$$

$$B(X) = P(X \mid E) = P(X \mid E^+ E^-) = \frac{P(E^- \mid X E^+)\, P(X \mid E^+)}{P(E^- \mid E^+)} = \frac{P(E^- \mid X)\, P(X \mid E^+)}{P(E^- \mid E^+)} = \alpha\, \pi_X(E^+)\, \lambda_X(E^-) \quad (7)$$

Fig. 8. Evidence Propagation in Polytree.

Here α is a normalizing constant independent of X, and

$$\lambda_X(E^-) = \begin{cases} 1 & \text{if the evidence is } X = x_i \\ 0 & \text{if the evidence is for another value } x_j \end{cases} \quad (8)$$

$$\pi_X(E^+) = \sum_{u_1, \ldots, u_m} P(X \mid u_1, \ldots, u_m) \prod_i \pi_X(u_i) \quad (9)$$

1. Forward inference in a polytree. Node X sends π messages to its children:

$$\pi_X(u) = \begin{cases} 1 & \text{if the evidence } X = x_i \text{ is entered} \\ 0 & \text{if the evidence is for another value } x_j \\ \sum_{u_1, \ldots, u_m} P(X \mid u_1, \ldots, u_m) \prod_i \pi_X(u_i) & \text{otherwise} \end{cases} \quad (10)$$

2. Backward inference in a polytree. Node X sends new λ messages to its parents:

$$\lambda_X(Y) = \prod_{Y_j \in Y} \Big[ \sum_j P(y_j \mid X)\, \lambda_X(y_j) \Big] \quad (11)$$

Various types of inference algorithms exist for Bayesian networks [Lauritzen & Spiegelhalter, 1988] [Pearl, 1988] [Pearl, 2000] [Neal, 1993]. Each class offers different properties and works better on different classes of problems, and it is very unlikely that a single algorithm can solve all possible problem instances effectively; every solution is based on particular requirements. Almost all computational problems of probabilistic inference using general Bayesian networks have been shown to be NP-hard by Cooper [Cooper, 1990].
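As an illustration of Equations 7 through 11, here is a minimal π/λ message-passing sketch for the simplest polytree, the chain X → Y → Z, with hard evidence Z = true; all CPT numbers are invented. A general polytree engine applies the same pattern, with products taken over multiple parents and children.

# Pearl's pi/lambda propagation on the chain X -> Y -> Z, evidence Z = true.
P_x  = {True: 0.4, False: 0.6}              # prior P(X)
P_yx = {True: {True: 0.85, False: 0.15},    # P(Y | X): P_yx[x][y]
        False: {True: 0.03, False: 0.97}}
P_zy = {True: {True: 0.95, False: 0.05},    # P(Z | Y): P_zy[y][z]
        False: {True: 0.05, False: 0.95}}

tf = (True, False)

# Lambda message Z sends to Y: evidence Z = true means lambda(z) = [z is True],
# so lambda_Y(y) = sum_z P(z | y) lambda(z) = P(Z = true | y).        (Eq. 11)
lam_y = {y: P_zy[y][True] for y in tf}

# Pi message X sends to Y: no evidence above Y, so pi(x) = P(x) and
# pi_Y(y) = sum_x P(y | x) pi(x).                                     (Eq. 9/10)
pi_y = {y: sum(P_yx[x][y] * P_x[x] for x in tf) for y in tf}

# Belief: B(Y) = alpha * pi_Y(y) * lambda_Y(y).                       (Eq. 7)
unnorm = {y: pi_y[y] * lam_y[y] for y in tf}
alpha = 1.0 / sum(unnorm.values())
belief_y = {y: alpha * unnorm[y] for y in tf}
print(belief_y)  # P(Y | Z = true)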
In the early 1980s, Pearl published an efficient message propagation inference algorithm for polytrees [Kim & Pearl, 1983] [Pearl, 1986]. The algorithm is exact and has polynomial complexity in the number of nodes, but works only for singly connected networks. Pearl also presented an exact inference algorithm for multiply connected networks called the loop cutset conditioning algorithm [Pearl, 1986]. The loop cutset conditioning algorithm changes the connectivity of a network and renders it singly connected by instantiating a selected subset of nodes referred to as a loop cutset. The resulting singly connected network is solved by the polytree algorithm, and the results of each instantiation are then weighted by their prior probabilities. The complexity of this algorithm results from the number of different instantiations that must be considered: the complexity grows exponentially with the size of the loop cutset, being $O(d^c)$, where d is the number of values that the random variables can take and c is the size of the loop cutset. It is thus important to minimize the size of the loop cutset for a multiply connected network. Unfortunately, the loop cutset minimization problem is NP-hard. A straightforward application of Pearl's algorithm to an acyclic digraph comprising one or more loops invariably leads to insuperable problems [Koch & Westphall, 2001] [Neal, 1993].

Another popular exact Bayesian network inference algorithm is Lauritzen and Spiegelhalter's clique-tree propagation algorithm [Lauritzen & Spiegelhalter, 1988], also called a "clustering" algorithm. It first transforms a multiply connected network into a clique tree by clustering the triangulated moral graph of the underlying undirected graph, and then performs message propagation over the clique tree. The clique propagation algorithm works efficiently for sparse networks, but can still be extremely slow for dense networks; its complexity is exponential in the size of the largest clique of the transformed undirected graph. In general, the existing exact Bayesian network inference algorithms share the property of run time exponential in the size of the largest clique of the triangulated moral graph, which is also called the induced width of the graph [Lauritzen & Spiegelhalter, 1988].

This chapter has summarized the popular inference methods in Bayesian networks. The results demonstrate that evidence can propagate across a Bayesian network along any links, whether in forward, backward, or intercausal style. The belief updating of Bayesian networks can be obtained by various available inference techniques. Theoretically, exact inference in Bayesian networks is feasible and manageable; however, the computation is NP-hard. This means that, in applications involving complex, large Bayesian networks, the computation and inference should be handled strategically to make them tractable. Simplifying the structure of the Bayesian network, pruning unrelated nodes, merging computations, and approximate approaches may be helpful for inference in large-scale Bayesian networks.