
Chapter 14

Probabilistic Reasoning


Outline

• Representing Knowledge in an Uncertain Domain
• The Semantics of Bayesian Networks
• Efficient Representation of Conditional Distributions
• Exact Inference in Bayesian Networks
• Approximate Inference in Bayesian Networks
• Extending Probability to First-Order Representations
• Other Approaches to Uncertain Reasoning


14-1 Representing Knowledge in an Uncertain Domain

• Full joint probability distribution

– can answer any question about the domain, but can become intractably large as the number of variables grows.

– Furthermore, specifying probabilities for atomic events is rather unnatural and can be very difficult unless a large amount of data is available from which to gather statistical estimates.

• We also saw that independence and conditional independence relationships among variables can greatly reduce the number of probabilities that need to be specified in order to define the full joint distribution.

• This section introduces a data structure called a Bayesian network to represent the dependencies among variables and to give a concise specification of any full joint probability distribution.


Definition

A Bayesian network is a directed acyclic graph (DAG) which consists of:

• A set of random variables which makes up the nodes of the network.

• A set of directed links (arrows) connecting pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.

• Each node Xi has a conditional probability distribution P(Xi|Parents(Xi)) that quantifies the effect of the parents on the node.

Intuitions:
• A Bayesian network models our incomplete understanding of the causal relationships in an application domain.
• A node represents some state of affairs or event.
• A link from X to Y means that X has a direct influence on Y.


What are Bayesian Networks?

• Graphical notation for conditional independence assertions

• Compact specification of full joint distributions

• What do they look like?
  – Set of nodes, one per variable
  – Directed, acyclic graph
  – Conditional distribution for each node given its parents: P(Xi | Parents(Xi))


Example (Fig. 14.1)

• Weather is independent of the other variables

• Toothache and Catch are conditionally independent given Cavity

[Figure 14.1: Weather stands alone; Cavity (the cause) points to its effects Toothache and Catch.]


Bayesian Network

Notice that Cavity is the "cause" of both Toothache and PCatch, and the links represent this causality explicitly.
Give the prior probability distribution of Cavity.
Give the conditional probability tables of Toothache and PCatch.

[Network: Cavity → Toothache, Cavity → PCatch]

P(Cavity) = 0.2

P(Toothache | Cavity) = 0.6    P(Toothache | ¬Cavity) = 0.1
P(PCatch | Cavity) = 0.9       P(PCatch | ¬Cavity) = 0.02

5 probabilities, instead of 7

P(c ∧ t ∧ pc) = P(t ∧ pc | c) P(c) = P(t|c) P(pc|c) P(c)
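As a minimal sketch (not part of the original slides), the following Python snippet encodes these three CPTs and computes a joint entry by multiplying them, exactly as the factorization above prescribes:

```python
# Minimal sketch of the Cavity network's CPTs (numbers from the slide).
P_cavity = 0.2                            # P(Cavity = true)
P_toothache = {True: 0.6, False: 0.1}     # P(Toothache = true | Cavity)
P_pcatch = {True: 0.9, False: 0.02}       # P(PCatch = true | Cavity)

def joint(cavity, toothache, pcatch):
    """P(Cavity, Toothache, PCatch) = P(t|c) P(pc|c) P(c)."""
    p = P_cavity if cavity else 1.0 - P_cavity
    p *= P_toothache[cavity] if toothache else 1.0 - P_toothache[cavity]
    p *= P_pcatch[cavity] if pcatch else 1.0 - P_pcatch[cavity]
    return p

print(joint(True, True, True))  # P(c ∧ t ∧ pc) = 0.2 * 0.6 * 0.9 = 0.108
```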


Another Example

Sample Domain:
• You have a burglar alarm installed in your home. It is fairly reliable at detecting a burglary, but it also responds on occasion to minor earthquakes.

• You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm.

• John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.

• Mary, on the other hand, likes rather loud music and sometimes misses the alarm altogether.


Example

• I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?

• What are the variables?
  – Burglary
  – Earthquake
  – Alarm
  – JohnCalls
  – MaryCalls


Another Example (continued)

• Network topology reflects causal knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

• Assumptions:
  – they do not perceive any burglaries directly,
  – they do not notice the minor earthquakes, and
  – they do not confer before calling.


Another example (Fig.14.2)

[Figure 14.2: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]


Conditional Probability Table (CPT)

• Each distribution is shown as a conditional probability table, or CPT.

• CPTs can be used for discrete variables; other representations exist, including ones suitable for continuous variables.

• Each row in a CPT contains the conditional probability of each node value for a conditioning case.

• A conditioning case is just a possible combination of values for the parent nodes—a miniature atomic event, if you like.

• Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable.

• For Boolean variables, once you know that the probability of a true value is p, the probability of false must be 1 – p, so we often omit the second number, as in Figure 14.2.

• In general, a table for a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities.

• A node with no parents has only one row, representing the prior probabilities of each possible value of the variable.


Another example (Fig.14.2)

B E P(A|B,E)

T T .95

T F .94

F T .29

F F .001

P(B)

.001

A P(M|A)

T .70

F .01

A P(J|A)

T .90

F .05

P(E)

.002

[Figure 14.2: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]


Compactness

• Conditional Probability Table (CPT): distribution over Xi for each combination of parent values

• Each row requires one number p for Xi = true (since the false case is just 1 − p)

• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

• Full network requires O(n · 2^k) numbers (instead of O(2^n))

B E P(A|B,E)

T T .95

T F .94

F T .29

F F .001

[Figure: the burglary network, B, E → A → J, M]
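To make the counting concrete, a quick sketch with illustrative sizes (the values of n and k below are hypothetical, not from the slides):

```python
# Hypothetical sizes: n Boolean variables, each with at most k Boolean parents.
n, k = 30, 5
bn_numbers = n * 2**k      # one number per CPT row: 30 * 32 = 960
full_joint = 2**n - 1      # independently specifiable joint entries: ~1.07e9
print(bn_numbers, full_joint)
```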


14-2 Semantics of Bayesian Networks

• Global: representing the full joint distribution
  – helpful in understanding how to construct networks

• Local: representing conditional independence
  – helpful in designing inference procedures


Global Semantics

• Global semantics defines the full joint distribution as the product of the local conditional distributions:

  P(X1 = x1, …, Xn = xn) = P(x1, …, xn) = ∏i=1..n P(xi | parents(Xi))

  – where parents(Xi) denotes the specific values of the variables in Parents(Xi).
  – Thus, each entry in the joint distribution is represented by the product of the appropriate elements of the conditional probability tables (CPTs) in the Bayesian network.
  – The CPTs therefore provide a decomposed representation of the joint distribution.

• Example: What is P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)?
  = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
  = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 = 0.00062
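The arithmetic of the example is just a product of five CPT entries; a one-line check in Python:

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
print(round(p, 5))  # 0.00062
```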


But does a BN represent a belief state?

In other words, can we compute the full joint distribution of the propositions from it?


Calculation of Joint Probability

B E P(A|B,E)
T T 0.95
T F 0.94
F T 0.29
F F 0.001

P(B) = 0.001    P(E) = 0.002

A P(J|A)        A P(M|A)
T 0.90          T 0.70
F 0.05          F 0.01

[Network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = ?


P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J ∧ M | A, ¬B, ¬E) P(A ∧ ¬B ∧ ¬E)
= P(J | A, ¬B, ¬E) P(M | A, ¬B, ¬E) P(A ∧ ¬B ∧ ¬E)   (J and M are independent given A)

P(J | A, ¬B, ¬E) = P(J|A)   (J and ¬B ∧ ¬E are independent given A)
P(M | A, ¬B, ¬E) = P(M|A)

P(A ∧ ¬B ∧ ¬E) = P(A | ¬B, ¬E) P(¬B | ¬E) P(¬E)
             = P(A | ¬B, ¬E) P(¬B) P(¬E)   (B and E are independent)

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)


Calculation of Joint Probability

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
= 0.00062


In general:

P(x1 ∧ x2 ∧ … ∧ xn) = ∏i=1..n P(xi | parents(Xi))

This product gives every entry of the full joint distribution table.


Since a BN defines the full joint distribution of a set of propositions, it represents a belief state.


Chain Rule


Constructing Bayesian Networks

• We need to choose parents for each node such that this property holds. Intuitively, the parents of node Xi should contain all those nodes in X1, …, Xi−1 that directly influence Xi.

• For example, suppose we have completed the network in Figure 14.2 except for the choice of parents for MaryCalls. MaryCalls is certainly influenced by whether there is a Burglary or an Earthquake, but not directly. Intuitively, our knowledge of the domain tells us that these events influence Mary's calling behavior only through their effect on the alarm.

• Also, given the state of the alarm, whether John calls has no influence on Mary's calling. Formally speaking, we believe that the following conditional independence statement holds:

• P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)


Constructing Bayesian Networks (cont.)

• Direct influencers should be added to the network first

• The correct order in which to add nodes is to add “root causes” first, then the variables they influence, and so on.

• We need to choose parents for each node such that this property holds. Intuitively, the parents of node Xi should contain all those nodes in X1, X2, ... , Xi-1 that directly influence Xi.

• If we don’t follow these rules, we can end up with a very complicated network.


Constructing Bayesian Networks

[Figure: nodes added in the chosen order M, J, A, B, E]

P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)


Constructing Bayesian Networks (cont.)

MaryCalls

P(J|M)=P(J)?

Chosen order: M,J,A,B,E


Constructing Bayesian Networks (cont.)

JohnCalls

MaryCalls

P(J|M) = P(J)? No!
P(A|J,M) = P(A|J)?  P(A|J,M) = P(A)?

Chosen order: M,J,A,B,E


Constructing Bayesian Networks (cont.)

Alarm

JohnCalls

MaryCalls

P(J|M) = P(J)? No!
P(A|J,M) = P(A|J)?  P(A|J,M) = P(A)? No.
P(B|A,J,M) = P(B|A)?  P(B|A,J,M) = P(B)?

Chosen order: M,J,A,B,E


Constructing Bayesian Networks (cont.)

Burglary

Alarm

JohnCalls

MaryCalls

P(J|M) = P(J)? No!
P(A|J,M) = P(A|J)?  P(A|J,M) = P(A)? No.
P(B|A,J,M) = P(B|A)? Yes!  P(B|A,J,M) = P(B)? No!
P(E|B,A,J,M) = P(E|A)?  P(E|B,A,J,M) = P(E|A,B)?

Chosen order: M,J,A,B,E


Constructing Bayesian Networks (cont.)

Alarm

JohnCalls

Burglary

Earthquake

MaryCalls

P(J|M) = P(J)? No!
P(A|J,M) = P(A|J)?  P(A|J,M) = P(A)? No.
P(B|A,J,M) = P(B|A)? Yes!  P(B|A,J,M) = P(B)? No!
P(E|B,A,J,M) = P(E|A)? No!  P(E|B,A,J,M) = P(E|A,B)? Yes!

Chosen order: M,J,A,B,E


Bad example


Constructing Bayesian Networks (cont.)

[Figure: the more complicated network that results from a bad node ordering]


Local Semantics

• Local Semantics: Each node is conditionally independent of its nondescendants given its parents


Markov Blanket

• A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents, that is, given its Markov blanket.


14-3 Efficient Representation of Conditional Distributions

• Even if the maximum number of parents k is smallish, filling in the CPT for a node requires up to O(2^k) numbers and perhaps a great deal of experience with all the possible conditioning cases.

• In fact, this is a worst-case scenario in which the relationship between the parents and the child is completely arbitrary.

• Usually, such relationships are describable by a canonical distribution that fits some standard pattern.

• In such cases, the complete table can be specified by naming the pattern and perhaps supplying a few parameters—much easier than supplying an exponential number of parameters.


Deterministic nodes

• A deterministic node has its value specified exactly by the values of its parents, with no uncertainty.
  – The relationship can be a logical one:
    • for example, the relationship between the parent nodes Canadian, US, Mexican and the child node NorthAmerican is simply that the child is the disjunction of the parents.
  – The relationship can also be numerical:
    • for example, if the parent nodes are the prices of a particular model of car at several dealers and the child node is the price that a bargain hunter ends up paying, then the child node is the minimum of the parent values; or
    • if the parent nodes are the inflows (rivers, runoff, precipitation) into a lake and the outflows (rivers, evaporation, seepage) from the lake, and the child is the change in the water level of the lake, then the value of the child is the difference between the inflow parents and the outflow parents.


Efficient representation of PDs



Noisy-OR relation

• The standard example is the noisy-OR relation, which is a generalization of the logical OR.

• In propositional logic, we might say that Fever is true if and only if Cold, Flu, or Malaria is true.

• The noisy-OR model allows for uncertainty about the ability of each parent to cause the child to be true—the causal relationship between parent and child may be inhibited, and so a patient could have a cold, but not exhibit a fever.


Noisy-OR relation

• The model makes two assumptions.
  – First, it assumes that all the possible causes are listed. (This is not as strict as it seems, because we can always add a so-called leak node that covers "miscellaneous causes.")

– Second, it assumes that inhibition of each parent is independent of inhibition of any other parents:

• for example, whatever inhibits Malaria from causing a fever is independent of whatever inhibits Flu from causing a fever.

• Fever is false if and only if all its true parents are inhibited, and the probability of this is the product of the inhibition probabilities for each parent.


Example

• Let us suppose the individual inhibition probabilities (or noise parameters) are as follows:
  – P(¬fever | cold, ¬flu, ¬malaria) = 0.6,  [P(fever|cold) = 0.4]
  – P(¬fever | ¬cold, flu, ¬malaria) = 0.2,  [P(fever|flu) = 0.8]
  – P(¬fever | ¬cold, ¬flu, malaria) = 0.1.  [P(fever|malaria) = 0.9]

• Then, from this information and the noisy-OR assumptions, the entire CPT can be built.

• A noisy-OR node with k parents thus needs only O(k) parameters instead of O(2^k).
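A short sketch of how the full CPT follows from the three noise parameters (the loop below is illustrative, not code from the chapter): P(¬fever | parents) is the product of the noise values of the parents that are true.

```python
from itertools import product

# Noise (inhibition) parameters from the slide.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

print("cold  flu   malaria  P(fever)")
for cold, flu, malaria in product([False, True], repeat=3):
    # Fever is false iff every true parent is inhibited, so P(¬fever)
    # is the product of the noise parameters of the true parents.
    p_no_fever = 1.0
    for name, on in zip(q, (cold, flu, malaria)):
        if on:
            p_no_fever *= q[name]
    print(f"{cold!s:5} {flu!s:5} {malaria!s:8} {1 - p_no_fever:.3f}")
```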


Bayesian nets with continuous variables

• Many real-world problems involve continuous quantities.
• Much of statistics deals with random variables whose domains are continuous. By definition, continuous variables have an infinite number of possible values, so it is impossible to specify conditional probabilities explicitly for each value.
• One way to handle continuous variables is to avoid them by using discretization, dividing up the possible values into a fixed set of intervals.
• Discretization is sometimes an adequate solution, but it often results in a considerable loss of accuracy and very large CPTs.


Continuous variables

• The other solution is to define standard families of probability density functions (see Appendix A) that are specified by a finite number of parameters.
  – For example, a Gaussian (or normal) distribution N(µ, σ²)(x) has the mean µ and the variance σ² as parameters.

• A network with both discrete and continuous variables is called a hybrid Bayesian network. We need:
  – the conditional distribution for a continuous variable given discrete or continuous parents: P(C|C) or P(C|D);
  – the conditional distribution for a discrete variable given continuous parents: P(D|C).


Hybrid (discrete + continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

• How to deal with this?


Probability density functions

• Instead of probability distributions
• For continuous variables
• Example: let X denote tomorrow's maximum temperature in the summer in Eindhoven. Belief that X is distributed uniformly between 18 and 26 degrees Celsius:
  P(X = x) = U[18,26](x)
  P(X = 20.5) = U[18,26](20.5) = 0.125/°C


PDF


CDF


Normal PDF


Hybrid (discrete + continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

• Option 1: discretization – possibly large errors, large CPTs
• Option 2: finitely parameterized canonical families
  a) Continuous variable, discrete + continuous parents (e.g., Cost)
  b) Discrete variable, continuous parents (e.g., Buys?)


a) Continuous child variables

• Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents

• Most common is the linear Gaussian model, e.g.:
  P(Cost = c | Harvest = h, Subsidy = true) = N(a_t h + b_t, σ_t²)(c)

• Mean Cost varies linearly w. Harvest, variance is fixed

• Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow
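A minimal sketch of such a linear-Gaussian conditional density; the slope a, intercept b, and sigma below are illustrative parameters, not values given in the chapter:

```python
import math

def linear_gaussian(c, h, a=-1.0, b=10.0, sigma=1.0):
    """Density N(a*h + b, sigma^2)(c): the mean of Cost moves linearly with Harvest h."""
    mu = a * h + b
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(linear_gaussian(c=5.0, h=5.0))  # density at the mean: ~0.3989
```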


Continuous child variables – example

• An all-continuous network with linear-Gaussian distributions has a full joint that is a multivariate Gaussian.

• A discrete + continuous linear-Gaussian network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.


b) Discrete child, continuous parent

• P(Buys = true | Cost = c) = Φ((−c + µ) / σ), with µ the threshold for buying
• Probit distribution:
  – Φ(x) = ∫ from −∞ to x of N(0,1)(t) dt, the integral (CDF) of the standard normal distribution
• Logit distribution:
  – uses the sigmoid function σ(x) = 1 / (1 + e^(−2x))
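Both soft thresholds are easy to evaluate; a sketch with an illustrative threshold µ and softness σ (hypothetical values, not from the chapter):

```python
import math

def probit(c, mu=10.0, sigma=2.0):
    # Phi((-c + mu) / sigma): standard normal CDF, via the error function.
    x = (-c + mu) / sigma
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logit(c, mu=10.0, sigma=2.0):
    # Sigmoid 1 / (1 + exp(-2x)), which closely tracks the probit curve.
    x = (-c + mu) / sigma
    return 1.0 / (1.0 + math.exp(-2.0 * x))

print(probit(8.0), logit(8.0))  # a cheap item is likely to be bought
```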


Probit distribution


14-4 Exact inference in Bayesian Networks

• Let us use the following notation:
  – X denotes the query variable
  – e denotes the set of evidence variables E1, …, Em
  – y denotes the set of nonevidence (hidden) variables Y1, …, Yl
  – the complete set of variables is X = {X} ∪ E ∪ Y
• A typical query asks for the posterior probability distribution P(X|e).
• Example: P(Burglary | JohnCalls = true, MaryCalls = true) = <0.284, 0.716>


Inference by enumeration

• Any conditional probability can be computed by summing terms from the full joint distribution.
• A query P(X|e) can be answered using Equation (13.6):
  P(X|e) = α P(X, e) = α Σy P(X, e, y)
• A Bayesian network gives a complete representation of the full joint distribution.
• The terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network.
• Therefore, a query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network.
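A compact enumeration sketch for the burglary network (the CPT numbers are those of Figure 14.2; the helper names are mine, not from the book):

```python
from itertools import product

T, F = True, False
P_B = {T: 0.001}                                   # P(Burglary = true)
P_E = {T: 0.002}                                   # P(Earthquake = true)
P_A = {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001}
P_J = {T: 0.90, F: 0.05}                           # P(JohnCalls = true | Alarm)
P_M = {T: 0.70, F: 0.01}                           # P(MaryCalls = true | Alarm)

def pt(table, key, value):
    """Turn a 'probability of true' entry into P(value)."""
    p = table[key]
    return p if value else 1.0 - p

def joint(b, e, a, j, m):
    return (pt(P_B, T, b) * pt(P_E, T, e) * pt(P_A, (b, e), a)
            * pt(P_J, a, j) * pt(P_M, a, m))

# P(Burglary | j, m): sum the joint over the hidden variables E, A; normalize.
scores = {b: sum(joint(b, e, a, T, T) for e, a in product([T, F], repeat=2))
          for b in (T, F)}
z = sum(scores.values())
print(scores[T] / z, scores[F] / z)  # ≈ 0.284, 0.716
```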


13-4 Enumerate-Joint-Ask


[Figure: the enumeration computation, summing over the hidden variables]


Improvement


Variable elimination algorithm

The enumeration algorithm can be improved substantially by eliminating repeated calculations of the kind illustrated in Figure 14.8.

The idea is simple: do the calculation once and save the results for later use. This is a form of dynamic programming. There are several versions of this approach; we present the variable elimination algorithm, which is the simplest.

Variable elimination works by evaluating expressions such as Equation (14.3) in right-to-left order (that is, bottom-up in Figure 14.8). Intermediate results are stored, and summations over each variable are done only for those portions of the expression that depend on the variable.
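A minimal sketch of the two factor operations that variable elimination relies on (the representation and the numbers below are illustrative, not from the chapter): a factor maps assignments of its variables to numbers; pointwise product multiplies matching entries, and summing out a variable adds the entries that differ only in that variable.

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Multiply two factors over Boolean variables; keys are value tuples."""
    all_vars = list(dict.fromkeys(vars1 + vars2))
    out = {}
    for vals in product([True, False], repeat=len(all_vars)):
        asg = dict(zip(all_vars, vals))
        out[vals] = (f1[tuple(asg[v] for v in vars1)]
                     * f2[tuple(asg[v] for v in vars2)])
    return out, all_vars

def sum_out(var, f, vars_):
    """Eliminate var by summing entries that agree on the other variables."""
    i = vars_.index(var)
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, vars_[:i] + vars_[i + 1:]

# Example with hypothetical numbers: f(A) * g(A, B), then sum out A.
fA = {(True,): 0.3, (False,): 0.7}
gAB = {(True, True): 0.9, (True, False): 0.1,
       (False, True): 0.2, (False, False): 0.8}
h, hv = pointwise_product(fA, ["A"], gAB, ["A", "B"])
print(sum_out("A", h, hv)[0])  # {(True,): ≈0.41, (False,): ≈0.59}
```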



Pointwise product


Elimination-ASK


Querying the BN

New evidence E indicates that JohnCalls with some probability p.

We would like to know the posterior probability of the other beliefs, e.g. P(Burglary|E).

P(B|E) = P(B ∧ J | E) + P(B ∧ ¬J | E)
       = P(B | J, E) P(J|E) + P(B | ¬J, E) P(¬J|E)
       = P(B|J) P(J|E) + P(B|¬J) P(¬J|E)
       = p P(B|J) + (1 − p) P(B|¬J)

We need to compute P(B|J) and P(B|¬J).


Querying the BN

The BN gives P(t|c). What about P(c|t)?

P(Cavity | t) = P(Cavity ∧ t) / P(t)
              = P(t | Cavity) P(Cavity) / P(t)   [Bayes' rule]

P(c|t) = α P(t|c) P(c)

Querying a BN is just applying the trivial Bayes' rule on a larger scale.

[Network: Cavity → Toothache]

P(Cavity) = 0.1

C  P(T|C)
T  0.4
F  0.01111


Querying the BN

P(b|J) = α P(b ∧ J)
       = α Σm Σa Σe P(b ∧ J ∧ m ∧ a ∧ e)   [marginalization]
       = α Σm Σa Σe P(b) P(e) P(a|b,e) P(J|a) P(m|a)   [BN]
       = α P(b) Σe P(e) Σa P(a|b,e) P(J|a) Σm P(m|a)   [re-ordering]

Depth-first evaluation of P(b|J) leads to computing each of the four following products twice:
P(J|A) P(M|A), P(J|A) P(¬M|A), P(J|¬A) P(M|¬A), P(J|¬A) P(¬M|¬A)

Bottom-up (right-to-left) computation + caching, e.g. the variable elimination algorithm (see R&N), avoids such repetition.

For singly connected BNs, the computation takes time linear in the total number of CPT entries (time linear in the number of propositions if each CPT's size is bounded).


Singly Connected BN

A BN is singly connected (or polytree) if there is at most one undirected path between any two nodes

[Figure: the burglary network (Burglary, Earthquake → Alarm → JohnCalls, MaryCalls) is singly connected.]

The time and space complexity of exact inference in polytrees is linear in the size of the network (the number of CPT entries).


Multiply Connected BN

A BN is multiply connected if there is more than one undirected path between a pair of nodes

[Figure: a multiply connected variant of the network]

Variable elimination can have exponential time and space complexity in the worst case, even when the number of parents per node is bounded. Because it includes inference in propositional logic as a special case, inference in Bayesian networks is NP-hard.


Querying a multiply connected BN takes time exponential in the total number of CPT entries in the worst case.


Clustering algorithm

• Join tree algorithm, O(n)
• Widely used in commercial Bayesian network tools.
• Join individual nodes of the network to form cluster nodes in such a way that the resulting network is a polytree.
• Once the network is in polytree form, a special-purpose inference algorithm is applied. Essentially, the algorithm is a form of constraint propagation (see Chapter 5) where the constraints ensure that neighboring clusters agree on the posterior probability of any variables that they have in common.


14-5 Approximate inference in Bayesian Networks

• Given the intractability of exact inference in large, multiply connected networks, it is essential to consider approximate inference methods.

• This section describes randomized sampling algorithms, also called Monte Carlo algorithms, that provide approximate answers whose accuracy depends on the number of samples generated.

• In recent years, Monte Carlo algorithms have become widely used in computer science to estimate quantities that are difficult to calculate exactly; simulated annealing is one example.

• We describe two families of algorithms:
  – direct sampling, and
  – Markov chain sampling.
  (Variational methods and loopy propagation are skipped.)


Methods

i. Sampling from an empty network
ii. Rejection sampling: reject samples disagreeing with the evidence
iii. Likelihood weighting: use evidence to weight samples
iv. MCMC: sample from a stochastic process whose stationary distribution is the true posterior


Introduction

• The primitive element in any sampling algorithm is the generation of samples from a known probability distribution.

• For example, an unbiased coin can be thought of as a random variable Coin with values (heads, tails) and a prior distribution P(Coin) = (0.5, 0.5).

• Sampling from this distribution is exactly like flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.

• Given a source of random numbers in the range [0, 1], it is a simple matter to sample any distribution on a single variable.


Sampling on Bayesian Network

• The simplest kind of random sampling process for Bayesian networks generates events from a network that has no evidence associated with it.

• The idea is to sample each variable in turn, in topological order.

• The probability distribution from which the value is sampled is conditioned on the values already assigned to the variable's parents.


Prior-sample

• This algorithm is shown in Figure 14.12. We can illustrate its operation on the network in Figure 14.11(a), assuming an ordering [Cloudy, Sprinkler, Rain, WetGrass] :

• Sample from P(Cloudy) = <0.5, 0.5>; suppose this returns true.

• Sample from P(Sprinkler |Cloudy = true) = <0.1, 0.9>; suppose this returns false.

• Sample from P(Rain | Cloudy = true) = <0.8, 0.2>; suppose this returns true.

• Sample from P( WetGrass| Sprinkler = false, Rain = true) = <0.9, 0.1>; suppose this returns true.

• PRIOR-SAMPLE returns the event [true, false, true, true].
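A direct sketch of PRIOR-SAMPLE for this network (the CPT rows quoted above come from the slide; the remaining rows use the standard Figure 14.11(a) values):

```python
import random

def prior_sample():
    """Sample each variable in topological order, conditioned on its parents."""
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet_grass = random.random() < p_wet
    return cloudy, sprinkler, rain, wet_grass

print(prior_sample())  # e.g., (True, False, True, True)
```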


i. Sampling from an empty network – cont.

• Probability that PRIOR-SAMPLE generates a particular event:
  S_PS(x1, …, xn) = ∏i=1..n P(xi | parents(Xi)) = P(x1, …, xn)

• Let N_PS(Y = y) be the number of samples generated for which Y = y, for any set of variables Y.

• Then the estimate is P̂(Y = y) = N_PS(Y = y) / N, and

  lim N→∞ P̂(Y = y) = Σh S_PS(Y = y, H = h) = Σh P(Y = y, H = h) = P(Y = y)

⇒ estimates derived from PRIOR-SAMPLE are consistent


ii. Rejection sampling

• Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an easy-to-sample distribution.
• In its simplest form, it can be used to compute conditional probabilities, that is, to determine P(X|e).
• The REJECTION-SAMPLING algorithm:
  – First, it generates samples from the prior distribution specified by the network.
  – Then, it rejects all those that do not match the evidence.
  – Finally, the estimate P(X = x | e) is obtained by counting how often X = x occurs in the remaining samples.
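On top of the prior_sample() sketch above, rejection sampling is just a filter plus a count; here for the query P(Rain | Sprinkler = true) used in the example below:

```python
def rejection_sample(n=100_000):
    """Estimate P(Rain = true | Sprinkler = true) by discarding mismatches."""
    kept = [rain for cloudy, sprinkler, rain, wet in
            (prior_sample() for _ in range(n)) if sprinkler]
    return sum(kept) / len(kept)

print(rejection_sample())  # converges to 0.3
```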


Rejection-sampling algorithm


Example

• Assume that we wish to estimate P(Rain | Sprinkler = true), using 100 samples.
• Of the 100 that we generate, suppose that 73 have Sprinkler = false and are rejected, while 27 have Sprinkler = true; of the 27, 8 have Rain = true and 19 have Rain = false.
• P(Rain | Sprinkler = true) ≈ NORMALIZE(<8, 19>) = <0.296, 0.704>.
• The true answer is <0.3, 0.7>.
• As more samples are collected, the estimate will converge to the true answer. The standard deviation of the error in each probability will be proportional to 1/sqrt(n), where n is the number of samples used in the estimate.
• The biggest problem with rejection sampling is that it rejects so many samples! The fraction of samples consistent with the evidence e drops exponentially as the number of evidence variables grows, so the procedure is simply unusable for complex problems.


iii. Likelihood weighting

• avoids the inefficiency of rejection sampling by generating only events that are consistent with the evidence e.

• generates consistent probability estimates.
• fixes the values for the evidence variables E and samples only the remaining variables X and Y. This guarantees that each event generated is consistent with the evidence.

• Before tallying the counts in the distribution for the query variable, each event is weighted by the likelihood that the event accords to the evidence, as measured by the product of the conditional probabilities for each evidence variable, given its parents.

• Intuitively, events in which the actual evidence appears unlikely should be given less weight.


Example

• Query: P(Rain | Sprinkler = true, WetGrass = true).
• First, the weight w is set to 1.0. Then an event is generated:
• Sample from P(Cloudy) = <0.5, 0.5>; suppose this returns true.
• Sprinkler is an evidence variable with value true. Therefore, we set
  w ← w × P(Sprinkler = true | Cloudy = true) = 0.1.
• Sample from P(Rain | Cloudy = true) = <0.8, 0.2>; suppose this returns true.
• WetGrass is an evidence variable with value true. Therefore, we set
  w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099.


iii. Likelihood weighting analysis

• The sampling probability for WEIGHTED-SAMPLE is
  S_WS(y, e) = ∏i=1..l P(yi | parents(Yi))
• Note: it pays attention to evidence only in ancestors, so it samples from a distribution somewhere "in between" the prior and the posterior.
• The weight for a given sample (y, e) is
  w(y, e) = ∏i=1..m P(ei | parents(Ei))
• The weighted sampling probability is
  S_WS(y, e) w(y, e) = ∏i=1..l P(yi | parents(Yi)) ∏i=1..m P(ei | parents(Ei)) = P(y, e)
  (by the standard global semantics of the network)
• Hence, likelihood weighting is consistent.
• But performance still degrades with many evidence variables.


iv. MCMC Example

• Estimate P(Rain | Sprinkler = true, WetGrass = true)
• Sample Cloudy, then Rain; repeat.
  – The Markov blanket of Cloudy is Sprinkler and Rain.
  – The Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass.


iv. MCMC Example – cont.

0. Random initial state: Cloudy = true and Rain = false
1. P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, Rain); sample, say, false
2. P(Rain | MB(Rain)) = P(Rain | Cloudy, Sprinkler, WetGrass); sample, say, true

Visit 100 states: 31 have Rain = true, 69 have Rain = false.
P̂(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(<31, 69>) = <0.31, 0.69>
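A Gibbs-sampling sketch of exactly this loop; the Markov-blanket conditionals are computed from the sprinkler-network CPTs, and the names and structure are mine:

```python
import random

P_S = {True: 0.1, False: 0.5}     # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}     # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}

def sample_cloudy(rain):
    # P(Cloudy | MB) ∝ P(Cloudy) P(Sprinkler = true | Cloudy) P(rain | Cloudy)
    s = {c: 0.5 * P_S[c] * (P_R[c] if rain else 1 - P_R[c]) for c in (True, False)}
    return random.random() < s[True] / (s[True] + s[False])

def sample_rain(cloudy):
    # P(Rain | MB) ∝ P(Rain | cloudy) P(WetGrass = true | Sprinkler = true, Rain)
    s = {r: (P_R[cloudy] if r else 1 - P_R[cloudy]) * P_W[(True, r)]
         for r in (True, False)}
    return random.random() < s[True] / (s[True] + s[False])

cloudy, rain = True, False                      # initial state
counts = {True: 0, False: 0}
for _ in range(100_000):
    cloudy = sample_cloudy(rain)
    rain = sample_rain(cloudy)
    counts[rain] += 1
print(counts[True] / sum(counts.values()))      # ≈ 0.32
```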


Probability of x, given MB(x)

P(x | mb(X)) = α P(x | parents(X)) × ∏_{Yj ∈ Children(X)} P(yj | parents(Yj))


MCMC algorithm


Performance of statistical algorithms

• Polytime approximation
• Stochastic approximation techniques such as likelihood weighting and MCMC
  – can give reasonable estimates of true posterior probabilities in a network, and
  – can cope with much larger networks


14-6 Skip


14-7 Other approaches to uncertain reasoning

• Different generations of expert systems:
  – Strict logical reasoning (ignoring uncertainty)
  – Probabilistic techniques using the full joint distribution
  – Default reasoning: a conclusion is believed until a better reason is found to believe something else
  – Rules with certainty factors
  – Handling ignorance: Dempster-Shafer theory
  – Vagueness: something is "sort of" true (fuzzy logic)
• Probability makes the same ontological commitment as logic: the event is true or false.


Default reasoning

• The conclusion that a car has four wheels is reached by default.

• New evidence can cause the conclusion to be retracted, whereas FOL is strictly monotonic.

• Representatives are default logic, nonmonotonic logic, and circumscription.

• There are problematic issues.


Rule-based methods

• Logical reasoning systems have properties like:
  – Monotonicity
  – Locality
    • In logical systems, whenever we have a rule of the form A ⇒ B, we can conclude B given evidence A, without worrying about any other rules.
  – Detachment
    • Once a logical proof is found for a proposition B, the proposition can be used regardless of how it was derived. That is, it can be detached from its justification.
  – Truth-functionality
    • In logic, the truth of complex sentences can be computed from the truth of the components.


Rule-based method

• These properties are good for their obvious computational advantages;

• but bad because they are inappropriate for uncertain reasoning.


Representing ignorance: Dempster-Shafer theory

• The Dempster-Shafer theory is designed to deal with the distinction between uncertainty and ignorance.
• Rather than computing the probability of a proposition, it computes the probability that the evidence supports the proposition.
• This measure of belief is called a belief function, written Bel(X).


Example

• Coin flipping is a standard example for belief functions.
• Suppose a shady character comes up to you and offers to bet you $10 that his coin will come up heads on the next flip.
• Given that the coin might or might not be fair, what belief should you ascribe to the event that it comes up heads?
• Dempster-Shafer theory says that because you have no evidence either way, you have to say that the belief Bel(Heads) = 0 and also that Bel(¬Heads) = 0.
• Now suppose you have an expert at your disposal who testifies with 90% certainty that the coin is fair (i.e., he is 90% sure that P(Heads) = 0.5).
• Then Dempster-Shafer theory gives Bel(Heads) = 0.9 × 0.5 = 0.45 and likewise Bel(¬Heads) = 0.45.
• There is still a 10 percentage point "gap" that is not accounted for by the evidence.
• "Dempster's rule" (Dempster, 1968) shows how to combine evidence to give new values for Bel, and Shafer's work extends this into a complete computational model.


Fuzzy set & fuzzy logic

• Fuzzy set theory is a means of specifying how well an object satisfies a vague description.
• For example, consider the proposition "Nate is tall." Is this true, if Nate is 5' 10"? Most people would hesitate to answer "true" or "false," preferring to say, "sort of."
• Note that this is not a question of uncertainty about the external world: we are sure of Nate's height. The issue is that the linguistic term "tall" does not refer to a sharp demarcation of objects into two classes; there are degrees of tallness.
• For this reason, fuzzy set theory is not a method for uncertain reasoning at all.
• Rather, fuzzy set theory treats Tall as a fuzzy predicate and says that the truth value of Tall(Nate) is a number between 0 and 1, rather than being just true or false. The name "fuzzy set" derives from the interpretation of the predicate as implicitly defining a set of its members, a set that does not have sharp boundaries.


Fuzzy logic

• Fuzzy logic is a method for reasoning with logical expressions describing membership in fuzzy sets.

• For example, the complex sentence Tall(Nate) ∧ Heavy(Nate) has a fuzzy truth value that is a function of the truth values of its components.

• The standard rules for evaluating the fuzzy truth, T, of a complex sentence are:
  – T(A ∧ B) = min(T(A), T(B))
  – T(A ∨ B) = max(T(A), T(B))
  – T(¬A) = 1 − T(A)
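The three rules are one-liners; a tiny sketch with illustrative membership degrees (the values below are hypothetical):

```python
# Hypothetical fuzzy truth values for Tall(Nate) and Heavy(Nate).
t_tall, t_heavy = 0.6, 0.4

print(min(t_tall, t_heavy))  # T(Tall ∧ Heavy) = 0.4
print(max(t_tall, t_heavy))  # T(Tall ∨ Heavy) = 0.6
print(1 - t_tall)            # T(¬Tall) = 0.4
```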


Summary

• Reasoning properly:
  – In FOL, it means conclusions follow from premises.
  – In probability, it means having beliefs that allow an agent to act rationally.
• Conditional independence information is vital.
• A Bayesian network is a complete representation of the full joint probability distribution, but it is often exponentially smaller in size.
• Bayesian networks can reason causally, diagnostically, intercausally, or by combining two or more of the three.
• For polytrees, the computation time is linear in the network size.