
1

CMSC 671 Fall 2010

Class #18/19 – Wednesday, November 3 / Monday, November 8

Some material borrowed with permission from Lise Getoor

2

Next two classes

• Probability theory (quick review!)
• Bayesian networks
  – Network structure
  – Conditional probability tables
  – Conditional independence
• Bayesian inference
  – From the joint distribution
  – Using independence/factoring
  – From sources of evidence

3

Bayesian Reasoning

Chapter 13

4

Sources of uncertainty

• Uncertain inputs
  – Missing data
  – Noisy data
• Uncertain knowledge
  – Multiple causes lead to multiple effects
  – Incomplete enumeration of conditions or effects
  – Incomplete knowledge of causality in the domain
  – Probabilistic/stochastic effects
• Uncertain outputs
  – Abduction and induction are inherently uncertain
  – Default reasoning, even in deductive fashion, is uncertain
  – Incomplete deductive inference may be uncertain

Probabilistic reasoning only gives probabilistic results (summarizes uncertainty from various sources)

5

Decision making with uncertainty

• Rational behavior:
  – For each possible action, identify the possible outcomes
  – Compute the probability of each outcome
  – Compute the utility of each outcome
  – Compute the probability-weighted (expected) utility over possible outcomes for each action
  – Select the action with the highest expected utility (principle of Maximum Expected Utility); a minimal code sketch of this computation follows below
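A minimal sketch of the MEU computation on a toy decision; the action names, outcome probabilities, and utilities are illustrative assumptions, not values from the lecture:

```python
# Maximum Expected Utility: score each action by its probability-weighted
# utility over outcomes, then pick the best-scoring action.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """actions: dict mapping action name -> list of (probability, utility) pairs."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

if __name__ == "__main__":
    actions = {
        "take_umbrella":  [(0.3, 60), (0.7, 80)],   # rain / no rain
        "leave_umbrella": [(0.3, 0),  (0.7, 100)],
    }
    for name, outcomes in actions.items():
        print(name, expected_utility(outcomes))
    print("MEU choice:", best_action(actions))
```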

6

Why probabilities anyway?

• Kolmogorov showed that three simple axioms lead to the rules of probability theory
  – De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
  1. All probabilities are between 0 and 1:  0 ≤ P(a) ≤ 1
  2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:  P(true) = 1; P(false) = 0
  3. The probability of a disjunction is given by:  P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

(Venn diagram of a, b, and a ∧ b)

7

Probability theory

• Random variables
  – Domain
  – Example: Alarm, Burglary, Earthquake – Boolean (like these), discrete, or continuous
• Atomic event: complete specification of state
  – Example: Alarm=True ∧ Burglary=True ∧ Earthquake=False (alarm ∧ burglary ∧ ¬earthquake)
• Prior probability: degree of belief without any other evidence
  – Example: P(Burglary) = .1
• Joint probability: matrix of combined probabilities of a set of variables
  – Example: P(Alarm, Burglary) =

                  alarm   ¬alarm
    burglary       .09      .01
    ¬burglary      .1       .8

8

Probability theory (cont.)

• Conditional probability: probability of effect given causes
  – Example: P(burglary | alarm) = .47, P(alarm | burglary) = .9
• Computing conditional probabilities:
  – P(a | b) = P(a ∧ b) / P(b), where P(b) is a normalizing constant
  – Example: P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm) = .09 / .19 = .47
• Product rule:
  – P(a ∧ b) = P(a | b) P(b)
  – Example: P(burglary ∧ alarm) = P(burglary | alarm) P(alarm) = .47 * .19 = .09
• Marginalizing:
  – P(B) = Σa P(B, a)
  – P(B) = Σa P(B | a) P(a) (conditioning)
  – Example: P(alarm) = P(alarm ∧ burglary) + P(alarm ∧ ¬burglary) = .09 + .1 = .19
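A minimal sketch of these rules applied to the 2×2 joint P(Alarm, Burglary) from the previous slide (plain Python, no external libraries):

```python
# Marginalization, the product rule, and conditional probability,
# computed directly from the joint table on the previous slide.

joint = {  # (alarm, burglary) -> probability
    (True, True): 0.09, (True, False): 0.10,
    (False, True): 0.01, (False, False): 0.80,
}

p_alarm = sum(p for (a, b), p in joint.items() if a)        # marginalize out Burglary
p_burglary_and_alarm = joint[(True, True)]                  # joint entry
p_burglary_given_alarm = p_burglary_and_alarm / p_alarm     # conditional probability

print(p_alarm)                 # 0.19
print(p_burglary_given_alarm)  # ~0.474
```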

9

Example: Inference from the joint

                      alarm                        ¬alarm
             earthquake   ¬earthquake     earthquake   ¬earthquake
  burglary      .01           .08            .001          .009
  ¬burglary     .01           .09            .01           .79

P(Burglary | alarm) = α P(Burglary, alarm)
  = α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
  = α [(.01, .01) + (.08, .09)]
  = α (.09, .1)

Since P(burglary | alarm) + P(¬burglary | alarm) = 1, α = 1/(.09+.1) = 5.26
(i.e., P(alarm) = 1/α = .19 – quizlet: how can you verify this?)

P(burglary | alarm) = .09 * 5.26 = .474

P(¬burglary | alarm) = .1 * 5.26 = .526
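A minimal sketch of the same normalization computed from the full joint above (the tuple order used for keys is burglary, earthquake, alarm):

```python
# Posterior P(Burglary | alarm) by summing out Earthquake and normalizing.

joint = {  # (burglary, earthquake, alarm) -> probability
    (True,  True,  True): 0.01,   (True,  False, True): 0.08,
    (True,  True,  False): 0.001, (True,  False, False): 0.009,
    (False, True,  True): 0.01,   (False, False, True): 0.09,
    (False, True,  False): 0.01,  (False, False, False): 0.79,
}

# Unnormalized values: sum out Earthquake with Alarm fixed to True.
unnorm = {b: sum(p for (bb, e, a), p in joint.items() if a and bb == b)
          for b in (True, False)}                      # {True: 0.09, False: 0.10}

alpha = 1.0 / sum(unnorm.values())                     # 1 / P(alarm) = 1 / 0.19
posterior = {b: alpha * v for b, v in unnorm.items()}  # {True: ~0.474, False: ~0.526}
print(posterior)
```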

10

Exercise: Inference from the joint

• Queries:
  – What is the prior probability of smart?
  – What is the prior probability of study?
  – What is the conditional probability of prepared, given study and smart?
• Save these answers for next time!

p(smart ∧ study ∧ prep):

                    smart              ¬smart
               study   ¬study     study   ¬study
  prepared      .432     .16       .084     .008
  ¬prepared     .048     .16       .036     .072

11

Independence

• When two sets of propositions do not affect each others' probabilities, we call them independent, and can easily compute their joint and conditional probability:
  – Independent(A, B) → P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
  – Then again, it might not: Burglars might be more likely to burglarize houses when there's a new moon (and hence little light)
  – But if we know the light level, the moon phase doesn't affect whether we are burglarized
  – Once we're burglarized, light level doesn't affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships

12

Exercise: Independence

• Queries:
  – Is smart independent of study?
  – Is prepared independent of study?

p(smart ∧ study ∧ prep):

                    smart              ¬smart
               study   ¬study     study   ¬study
  prepared      .432     .16       .084     .008
  ¬prepared     .048     .16       .036     .072

13

Conditional independence

• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution

14

Exercise: Conditional independence

• Queries:
  – Is smart conditionally independent of prepared, given study?
  – Is study conditionally independent of prepared, given smart?

p(smart ∧ study ∧ prep):

                    smart              ¬smart
               study   ¬study     study   ¬study
  prepared      .432     .16       .084     .008
  ¬prepared     .048     .16       .036     .072

15

Bayes's rule

• Bayes's rule is derived from the product rule:
  – P(Y | X) = P(X | Y) P(Y) / P(X)
• Often useful for diagnosis:
  – If X are (observed) effects and Y are (hidden) causes,
  – We may have a model for how causes lead to effects (P(X | Y))
  – We may also have prior beliefs (based on experience) about the frequency of occurrence of the causes (P(Y))
  – Which allows us to reason abductively from effects to causes (P(Y | X))

16

Bayesian inference

• In the setting of diagnostic/evidential reasoning:
  – We know the prior probability of each hypothesis, P(Hi), and the conditional probability of each piece of evidence given a hypothesis, P(Ej | Hi)
  – We want to compute the posterior probability P(Hi | Ej)
• Bayes' theorem (formula 1):
  – P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
  where H1, …, Hn are the hypotheses and E1, …, Em are the evidence/manifestations

17

Simple Bayesian diagnostic reasoning

• Knowledge base:
  – Evidence / manifestations: E1, …, Em
  – Hypotheses / disorders: H1, …, Hn
    • Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
  – Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m
• Cases (evidence for a particular instance): E1, …, El
• Goal: Find the hypothesis Hi with the highest posterior
  – Maxi P(Hi | E1, …, El)

18

Bayesian diagnostic reasoning II

• Bayes' rule says that
  – P(Hi | E1, …, El) = P(E1, …, El | Hi) P(Hi) / P(E1, …, El)
• Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
  – P(E1, …, El | Hi) = ∏j=1..l P(Ej | Hi)
• If we only care about relative probabilities for the Hi, then we have:
  – P(Hi | E1, …, El) = α P(Hi) ∏j=1..l P(Ej | Hi)
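A minimal sketch of this naive-Bayes style computation; the hypotheses, evidence variables, and all numbers below are illustrative assumptions:

```python
# P(Hi | E1..El) = alpha * P(Hi) * prod_j P(Ej | Hi), then normalize.

priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}           # P(Hi)
likelihood = {                                               # P(Ej = true | Hi)
    "fever": {"flu": 0.9, "cold": 0.3, "healthy": 0.05},
    "cough": {"flu": 0.7, "cold": 0.8, "healthy": 0.10},
}
observed = ["fever", "cough"]                                # E1, ..., El

unnorm = dict(priors)
for e in observed:
    for h in unnorm:
        unnorm[h] *= likelihood[e][h]

alpha = 1.0 / sum(unnorm.values())
posterior = {h: alpha * v for h, v in unnorm.items()}
print(max(posterior, key=posterior.get), posterior)
```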

19

Limitations of simple Bayesian inference

• Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
  – Disease D causes syndrome S, which causes correlated manifestations M1 and M2
• Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
  – P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
    = α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
    = α ∏j=1..l P(Ej | H1 ∧ H2) P(H1) P(H2)
• How do we compute P(Ej | H1 ∧ H2)??

20

Limitations of simple Bayesian inference II

• Assume H1 and H2 are independent, given E1, …, El?
  – P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
• This is a very unreasonable assumption
  – Earthquake and Burglar are independent, but not given Alarm:
    • P(burglar | alarm, earthquake) << P(burglar | alarm)
• Another limitation is that simple application of Bayes's rule doesn't allow us to handle causal chaining:
  – A: this year's weather; B: cotton production; C: next year's cotton price
  – A influences C indirectly: A → B → C
  – P(C | B, A) = P(C | B)
• Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
• Next time: conditional independence and Bayesian networks!

21

Bayesian Networks

Chapter 14.1-14.3

Some material borrowed from Lise Getoor

22

Bayesian Belief Networks (BNs)

• Definition: BN = (DAG, CPD)
  – DAG: directed acyclic graph (BN's structure)
    • Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
    • Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
  – CPD: conditional probability distribution (BN's parameters)
    • Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT): P(xi | πi), where πi is the set of all parent nodes of xi
  – Root nodes are a special case – no parents, so just use priors in the CPD: πi = ∅, so P(xi | πi) = P(xi)

23

Example BN

Structure: a → b, a → c, b → d, c → d, c → e

P(A) = 0.001
P(B|A) = 0.3        P(B|¬A) = 0.001
P(C|A) = 0.2        P(C|¬A) = 0.005
P(D|B,C) = 0.1      P(D|B,¬C) = 0.01
P(D|¬B,C) = 0.01    P(D|¬B,¬C) = 0.00001
P(E|C) = 0.4        P(E|¬C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since they have to add to one
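A minimal sketch of how this network's parameters could be stored in code; the representation (a dict of CPTs keyed by parent-value tuples) is one illustrative choice, not anything prescribed by the lecture:

```python
# Each CPT maps an assignment of the node's parents to P(node = True | parents).

parents = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c"), "e": ("c",)}

cpt = {
    "a": {(): 0.001},
    "b": {(True,): 0.3, (False,): 0.001},
    "c": {(True,): 0.2, (False,): 0.005},
    "d": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "e": {(True,): 0.4, (False,): 0.002},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from `assignment`)."""
    p_true = cpt[node][tuple(assignment[p] for p in parents[node])]
    return p_true if value else 1.0 - p_true
```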

24

Conditional independence and chaining

• Conditional independence assumption:
  – P(xi | πi, q) = P(xi | πi), where πi is the set of parents of xi and q is any set of variables (nodes) other than xi and its successors
  – πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
  – With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) local CPDs by chaining these CPDs:
    P(x1, …, xn) = ∏i=1..n P(xi | πi)

25

Chaining: Example

Computing the joint probability for all variables is easy (network as before: a → b, a → c; b, c → d; c → e):

P(a, b, c, d, e)
  = P(e | a, b, c, d) P(a, b, c, d)    by the product rule
  = P(e | c) P(a, b, c, d)             by the cond. indep. assumption
  = P(e | c) P(d | a, b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(c | a, b) P(a, b)
  = P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
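A minimal numeric sketch of this chaining for the fully true assignment, using the CPT values from the Example BN slide:

```python
# P(a, b, c, d, e) = P(e|c) P(d|b,c) P(c|a) P(b|a) P(a), with all variables True.

p_a = 0.001          # P(a)
p_b_given_a = 0.3    # P(b | a)
p_c_given_a = 0.2    # P(c | a)
p_d_given_bc = 0.1   # P(d | b, c)
p_e_given_c = 0.4    # P(e | c)

p_joint = p_e_given_c * p_d_given_bc * p_c_given_a * p_b_given_a * p_a
print(p_joint)  # 2.4e-06
```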


26

Topological semantics

• A node is conditionally independent of its non-descendants given its parents

• A node is conditionally independent of all other nodes in the network given its parents, children, and children’s parents (also known as its Markov blanket)

• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z

28

Inference in Bayesian Networks

Chapter 14.4-14.5

Some material borrowed from Lise Getoor

29

Inference tasks

• Simple queries: Compute the posterior marginal P(Xi | E=e)
  – E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
  – P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most critical?
• Explanation: Why do I need a new starter motor?

30

Approaches to inference

• Exact inference
  – Enumeration
  – Belief propagation in polytrees
  – Variable elimination
  – Clustering / join tree algorithms
• Approximate inference
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Genetic algorithms
  – Neural networks
  – Simulated annealing
  – Mean field theory

31

Direct inference with BNs

• Instead of computing the joint, suppose we just want the probability for one variable
• Exact methods of computation:
  – Enumeration
  – Variable elimination
  – Join trees: get the probabilities associated with every query variable

32

Inference by enumeration

• Add all of the terms (atomic event probabilities) from the full joint distribution
• If E are the evidence (observed) variables and Y are the other (unobserved) variables, then:
  P(X | e) = α P(X, e) = α Σy P(X, e, y)
• Each P(X, e, y) term can be computed using the chain rule
• Computationally expensive!

33

Example: Enumeration

(Network as before: a → b, a → c; b, c → d; c → e)

• P(xi) = Σπi P(xi | πi) P(πi)
• Suppose we want P(D=true), and only the value of E is given as true
• P(d | e) ∝ Σa Σb Σc P(a, b, c, d, e)
           = Σa Σb Σc P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)
• With simple iteration to compute this expression, there's going to be a lot of repetition (e.g., P(e|c) has to be recomputed every time we iterate over C=true); a code sketch of this enumeration follows below
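A minimal sketch of this enumeration for P(D | e=true), reusing the `parents`, `cpt`, and `prob` helpers from the sketch after the Example BN slide:

```python
from itertools import product

def joint(assign):
    """Full-joint probability via the chain rule over the network."""
    result = 1.0
    for node in parents:                      # "a", "b", "c", "d", "e"
        result *= prob(node, assign[node], assign)
    return result

# Sum the joint over the hidden variables a, b, c for each value of d, with e=True.
unnorm = {}
for d in (True, False):
    unnorm[d] = sum(joint({"a": a, "b": b, "c": c, "d": d, "e": True})
                    for a, b, c in product((True, False), repeat=3))

alpha = 1.0 / sum(unnorm.values())
print({d: round(alpha * v, 4) for d, v in unnorm.items()})  # exact P(D | e=true)
```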

34

Exercise: Enumeration

Network structure: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8     p(study) = .6     p(fair) = .9

p(prep | …):
              smart   ¬smart
  study        .9       .7
  ¬study       .5       .1

p(pass | …):
                   smart            ¬smart
               prep   ¬prep     prep   ¬prep
  fair          .9      .7        .7     .2
  ¬fair         .1      .1        .1     .1

Query: What is the probability that a student studied, given that they pass the exam?

35

Variable elimination

• Basically just enumeration, but with caching of local calculations
• Linear for polytrees (singly connected BNs)
• Potentially exponential for multiply connected BNs
  – Exact inference in Bayesian networks is NP-hard!
• Join tree algorithms are an extension of variable elimination methods that compute posterior probabilities for all nodes in a BN simultaneously

36

Variable elimination

General idea:
• Write the query in the form
  P(X1, e) = Σxk … Σx3 Σx2 ∏i P(xi | pai)
• Iteratively:
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product
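A minimal sketch of the core step (multiply the factors that mention a variable, then sum that variable out); the factor representation here, a `(variables, table)` pair, is an illustrative choice, not the lecture's notation:

```python
from itertools import product

def sum_out(var, factors, domain=(True, False)):
    """Eliminate `var` from a list of factors; each factor is a pair
    (tuple_of_variable_names, dict mapping value-tuples to numbers)."""
    relevant = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]

    # Variables of the new factor: everything mentioned by the relevant
    # factors, minus the variable being eliminated.
    new_vars = tuple(sorted({v for vs, _ in relevant for v in vs} - {var}))

    new_table = {}
    for values in product(domain, repeat=len(new_vars)):
        assign = dict(zip(new_vars, values))
        total = 0.0
        for x in domain:
            assign[var] = x
            term = 1.0
            for vs, table in relevant:
                term *= table[tuple(assign[v] for v in vs)]
            total += term
        new_table[values] = total
    return rest + [(new_vars, new_table)]
```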

37

Variable elimination: Example

Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass

P(w) = Σc,s,r P(w | r,s) P(r | c) P(s | c) P(c)
     = Σs,r P(w | r,s) Σc P(r | c) P(s | c) P(c)
     = Σs,r P(w | r,s) f1(r,s),   where f1(r,s) = Σc P(r | c) P(s | c) P(c)

39

A more complex example

• The "Asia" network:
  – Nodes: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, Dyspnea
  – Structure: Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis, Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest, Bronchitis → Dyspnea

40

(Variables: V = visit to Asia, S = smoking, T = tuberculosis, L = lung cancer, B = bronchitis, A = abnormality in chest, X = x-ray, D = dyspnea)

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

41

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: v
Compute: fv(t) = Σv P(v) P(t|v)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Note: fv(t) = P(t). In general, the result of elimination is not necessarily a probability term.

42

• We want to compute P(d)
• Need to eliminate: s, x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: s
Compute: fs(b,l) = Σs P(s) P(b|s) P(l|s)
  ⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
Summing over s results in a factor with two arguments, fs(b,l). In general, the result of elimination may be a function of several variables.

43

• We want to compute P(d)
• Need to eliminate: x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x
Compute: fx(a) = Σx P(x|a)
  ⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
Note: fx(a) = 1 for all values of a!

44

• We want to compute P(d)
• Need to eliminate: t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)

Eliminate: t
Compute: ft(a,l) = Σt fv(t) P(a|t,l)
  ⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)

45

• We want to compute P(d)
• Need to eliminate: l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
  ⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)

Eliminate: l
Compute: fl(a,b) = Σl fs(b,l) ft(a,l)
  ⇒ fx(a) fl(a,b) P(d|a,b)

46

• We want to compute P(d)
• Need to eliminate: a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
  ⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
  ⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)
  ⇒ fx(a) fl(a,b) P(d|a,b)

Eliminate: a, b
Compute: fa(b,d) = Σa fx(a) fl(a,b) P(d|a,b), then fb(d) = Σb fa(b,d)
  ⇒ fa(b,d) ⇒ fb(d)

47

Dealing with evidence

• How do we deal with evidence?
• Suppose we are given evidence V = t, S = f, D = t
• We want to compute P(L, V = t, S = f, D = t)

48

Dealing with evidence

• We start by writing the initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
• Since we know that V = t, we don't need to eliminate V
• Instead, we can replace the factors P(V) and P(T|V) with
  fP(V) = P(V = t)   and   fP(T|V)(T) = P(T | V = t)
• These "select" the appropriate parts of the original factors given the evidence
• Note that fP(V) is a constant, and thus does not appear in elimination of other variables

49

Dealing with evidence

• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting evidence:
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) P(x|a) fP(d|a,b)(a,b)

50

Dealing with evidence

• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting evidence:
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) P(x|a) fP(d|a,b)(a,b)
• Eliminating x, we get
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) fx(a) fP(d|a,b)(a,b)

51

Dealing with evidence

• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting evidence:
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) P(x|a) fP(d|a,b)(a,b)
• Eliminating x, we get
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) fx(a) fP(d|a,b)(a,b)
• Eliminating t, we get
  fP(v) fP(s) fP(l|s)(l) fP(b|s)(b) ft(a,l) fx(a) fP(d|a,b)(a,b)

52

Dealing with evidence

• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting evidence:
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) P(x|a) fP(d|a,b)(a,b)
• Eliminating x, we get
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) fx(a) fP(d|a,b)(a,b)
• Eliminating t, we get
  fP(v) fP(s) fP(l|s)(l) fP(b|s)(b) ft(a,l) fx(a) fP(d|a,b)(a,b)
• Eliminating a, we get
  fP(v) fP(s) fP(l|s)(l) fP(b|s)(b) fa(b,l)

53

Dealing with evidence

• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting evidence:
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) P(x|a) fP(d|a,b)(a,b)
• Eliminating x, we get
  fP(v) fP(s) fP(t|v)(t) fP(l|s)(l) fP(b|s)(b) P(a|t,l) fx(a) fP(d|a,b)(a,b)
• Eliminating t, we get
  fP(v) fP(s) fP(l|s)(l) fP(b|s)(b) ft(a,l) fx(a) fP(d|a,b)(a,b)
• Eliminating a, we get
  fP(v) fP(s) fP(l|s)(l) fP(b|s)(b) fa(b,l)
• Eliminating b, we get
  fP(v) fP(s) fP(l|s)(l) fb(l)

54

Variable elimination algorithm

• Let X1, …, Xm be an ordering on the non-query variables:
  ΣX1 ΣX2 … ΣXm ∏j P(Xj | Parents(Xj))
• For i = m, …, 1
  – Leave in the summation for Xi only factors mentioning Xi
  – Multiply the factors, getting a factor that contains a number for each value of the variables mentioned, including Xi
  – Sum out Xi, getting a factor f that contains a number for each value of the variables mentioned, not including Xi
  – Replace the multiplied factor in the summation

55

Complexity of variable elimination

Suppose in one elimination step we compute
  fx(y1, …, yk) = Σx f'x(x, y1, …, yk)
where
  f'x(x, y1, …, yk) = ∏i=1..m fi(x, yi,1, …, yi,li)

This requires
  m · |Val(X)| · ∏i |Val(Yi)| multiplications
  (for each value of x, y1, …, yk, we do m multiplications), and
  |Val(X)| · ∏i |Val(Yi)| additions
  (for each value of y1, …, yk, we do |Val(X)| additions)

► Complexity is exponential in the number of variables in the intermediate factors
► Finding an optimal ordering is NP-hard

56

Exercise: Variable elimination

Network structure: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8     p(study) = .6     p(fair) = .9

p(prep | …):
              smart   ¬smart
  study        .9       .7
  ¬study       .5       .1

p(pass | …):
                   smart            ¬smart
               prep   ¬prep     prep   ¬prep
  fair          .9      .7        .7     .2
  ¬fair         .1      .1        .1     .1

Query: What is the probability that a student is smart, given that they pass the exam?

57

Conditioning

• Conditioning: Find the network's smallest cutset S (a set of nodes whose removal renders the network singly connected)
  – In the earlier example network (a → b, a → c; b, c → d; c → e), S = {A} or {B} or {C} or {D}
• For each instantiation of S, compute the belief update with your favorite inference algorithm
• Combine the results from all instantiations of S
• Computationally expensive (finding the smallest cutset is in general NP-hard, and the total number of possible instantiations of S is O(2^|S|))

58

Approximate inference: Direct sampling

• Suppose you are given values for some subset of the variables, E, and want to infer values for unknown variables, Z
• Randomly generate a very large number of instantiations from the BN
  – Generate instantiations for all variables: start at root variables and work your way "forward" in topological order
• Rejection sampling: Only keep those instantiations that are consistent with the values for E
• Use the frequency of values for Z to get estimated probabilities
• Accuracy of the results depends on the size of the sample (asymptotically approaches exact results); a code sketch follows below
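A minimal sketch of direct sampling with rejection on the earlier a–e example network, estimating P(D | e=true) from its CPT values:

```python
import random

parents = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c"), "e": ("c",)}
cpt_true = {  # P(node = True | parent assignment)
    "a": {(): 0.001},
    "b": {(True,): 0.3, (False,): 0.001},
    "c": {(True,): 0.2, (False,): 0.005},
    "d": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "e": {(True,): 0.4, (False,): 0.002},
}
order = ["a", "b", "c", "d", "e"]          # a topological order

def sample_once():
    sample = {}
    for node in order:                     # sample "forward" from the roots
        p = cpt_true[node][tuple(sample[x] for x in parents[node])]
        sample[node] = random.random() < p
    return sample

kept = d_true = 0
for _ in range(200_000):
    s = sample_once()
    if s["e"]:                             # rejection: keep only samples with e = true
        kept += 1
        d_true += s["d"]
print(d_true / kept if kept else None)     # estimate of P(d | e=true)
```

Because P(e) is small in this network, most samples are rejected; that waste is the motivation for likelihood weighting on the next slide.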

59

Exercise: Direct sampling

Network structure: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8     p(study) = .6     p(fair) = .9

p(prep | …):
              smart   ¬smart
  study        .9       .7
  ¬study       .5       .1

p(pass | …):
                   smart            ¬smart
               prep   ¬prep     prep   ¬prep
  fair          .9      .7        .7     .2
  ¬fair         .1      .1        .1     .1

Topological order = …?
Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42

60

Likelihood weighting

• Idea: Don't generate samples that need to be rejected in the first place!
• Sample only from the unknown variables Z
• Weight each sample according to the likelihood that it would occur, given the evidence E
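A minimal sketch of likelihood weighting for the same estimate, P(D | e=true); it reuses the `parents`, `cpt_true`, and `order` definitions from the rejection-sampling sketch above:

```python
import random

def weighted_sample(evidence):
    sample, weight = dict(evidence), 1.0
    for node in order:
        p = cpt_true[node][tuple(sample[x] for x in parents[node])]
        if node in evidence:
            # Evidence variables are held fixed; multiply their likelihood into the weight.
            weight *= p if evidence[node] else 1.0 - p
        else:
            sample[node] = random.random() < p
    return sample, weight

num = den = 0.0
for _ in range(100_000):
    s, w = weighted_sample({"e": True})
    den += w
    if s["d"]:
        num += w
print(num / den)   # weighted estimate of P(d | e=true); no samples are rejected
```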

61

Markov chain Monte Carlo algorithm

• So called because
  – Markov chain: each instance generated in the sample is dependent on the previous instance
  – Monte Carlo: statistical sampling method
• Perform a random walk through variable assignment space, collecting statistics as you go
  – Start with a random instantiation, consistent with evidence variables
  – At each step, for some nonevidence variable, randomly sample its value, consistent with the other current assignments
• Given enough samples, MCMC gives an accurate estimate of the true distribution of values
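A minimal sketch of one Gibbs-style MCMC resampling step on the same a–e network (sampling a nonevidence variable from its distribution given its Markov blanket); it reuses `parents` and `cpt_true` from the direct-sampling sketch, and ignores burn-in for simplicity:

```python
import random

children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}

def p_node(node, value, assign):
    p_true = cpt_true[node][tuple(assign[x] for x in parents[node])]
    return p_true if value else 1.0 - p_true

def resample(node, assign):
    """Sample `node` from P(node | Markov blanket), updating `assign` in place."""
    weights = {}
    for v in (True, False):
        assign[node] = v
        w = p_node(node, v, assign)
        for child in children[node]:
            w *= p_node(child, assign[child], assign)
        weights[v] = w
    assign[node] = random.random() < weights[True] / (weights[True] + weights[False])

# Estimate P(d | e=true): fix the evidence, random-walk over the rest.
state = {"a": False, "b": False, "c": False, "d": False, "e": True}
count, steps = 0, 100_000
for _ in range(steps):
    for node in ("a", "b", "c", "d"):      # nonevidence variables
        resample(node, state)
    count += state["d"]
print(count / steps)
```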

62

Exercise: MCMC sampling

Network structure: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8     p(study) = .6     p(fair) = .9

p(prep | …):
              smart   ¬smart
  study        .9       .7
  ¬study       .5       .1

p(pass | …):
                   smart            ¬smart
               prep   ¬prep     prep   ¬prep
  fair          .9      .7        .7     .2
  ¬fair         .1      .1        .1     .1

Topological order = …?
Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42

63

Summary

• Bayes nets
  – Structure
  – Parameters
  – Conditional independence
  – Chaining
• BN inference
  – Enumeration
  – Variable elimination
  – Sampling methods