1
CMSC 671, Fall 2010
Class #18/19 – Wednesday, November 3 / Monday, November 8
Some material borrowed with permission from Lise Getoor
2
Next two classes
• Probability theory (quick review!)
• Bayesian networks
  – Network structure
  – Conditional probability tables
  – Conditional independence
• Bayesian inference
  – From the joint distribution
  – Using independence/factoring
  – From sources of evidence
3
Bayesian Reasoning
Chapter 13
4
Sources of uncertainty
• Uncertain inputs
  – Missing data
  – Noisy data
• Uncertain knowledge
  – Multiple causes lead to multiple effects
  – Incomplete enumeration of conditions or effects
  – Incomplete knowledge of causality in the domain
  – Probabilistic/stochastic effects
• Uncertain outputs
  – Abduction and induction are inherently uncertain
  – Default reasoning, even in deductive fashion, is uncertain
  – Incomplete deductive inference may be uncertain
Probabilistic reasoning only gives probabilistic results (summarizes uncertainty from various sources)
5
Decision making with uncertainty
• Rational behavior:
  – For each possible action, identify the possible outcomes
  – Compute the probability of each outcome
  – Compute the utility of each outcome
  – Compute the probability-weighted (expected) utility over possible outcomes for each action
  – Select the action with the highest expected utility (principle of Maximum Expected Utility)
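A minimal Python sketch of this expected-utility calculation (the actions, outcome probabilities, and utilities below are made-up placeholders, not from the slides):

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """actions: dict mapping an action name to its (probability, utility) pairs."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical example: decide whether to carry an umbrella.
actions = {
    "take_umbrella":  [(0.3, 60), (0.7, 80)],    # (P(rain), utility), (P(no rain), utility)
    "leave_umbrella": [(0.3, 10), (0.7, 100)],
}
print(best_action(actions))   # the action with the highest expected utility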
6
Why probabilities anyway?
• Kolmogorov showed that three simple axioms lead to the rules of probability theory
  – De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
1. All probabilities are between 0 and 1:
   • 0 ≤ P(a) ≤ 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:
   • P(true) = 1 ; P(false) = 0
3. The probability of a disjunction is given by:
   • P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
7
Probability theory
• Random variables
  – Domain
  – e.g., Alarm, Burglary, Earthquake
  – Boolean (like these), discrete, or continuous
• Atomic event: complete specification of state
  – e.g., Alarm=True ∧ Burglary=True ∧ Earthquake=False
    (alarm ∧ burglary ∧ ¬earthquake)
• Prior probability: degree of belief without any other evidence
  – e.g., P(Burglary) = .1
• Joint probability: matrix of combined probabilities of a set of variables
  – e.g., P(Alarm, Burglary) =
                   alarm    ¬alarm
    burglary        .09       .01
    ¬burglary       .1        .8
8
Probability theory (cont.)
• Conditional probability: probability of effect given causes
  – e.g., P(burglary | alarm) = .47, P(alarm | burglary) = .9
• Computing conditional probs:
  – P(a | b) = P(a ∧ b) / P(b)
  – P(b): normalizing constant
  – e.g., P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm) = .09 / .19 = .47
• Product rule:
  – P(a ∧ b) = P(a | b) P(b)
  – e.g., P(burglary ∧ alarm) = P(burglary | alarm) P(alarm) = .47 * .19 = .09
• Marginalizing:
  – P(B) = Σa P(B, a)
  – P(B) = Σa P(B | a) P(a) (conditioning)
  – e.g., P(alarm) = P(alarm ∧ burglary) + P(alarm ∧ ¬burglary) = .09 + .1 = .19
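A quick Python check of these numbers, treating the joint table P(Alarm, Burglary) above as a dictionary (a sketch, not part of the original slides):

joint = {                     # (alarm, burglary) -> probability, from the table above
    (True,  True):  0.09,
    (True,  False): 0.10,
    (False, True):  0.01,
    (False, False): 0.80,
}

# Marginalizing: P(alarm) = Σb P(alarm, b)
p_alarm = sum(p for (a, b), p in joint.items() if a)

# Conditioning: P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm)
p_burglary_given_alarm = joint[(True, True)] / p_alarm

print(round(p_alarm, 2))                    # 0.19
print(round(p_burglary_given_alarm, 2))     # 0.47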
9
Example: Inference from the joint

                    alarm                          ¬alarm
            earthquake  ¬earthquake        earthquake  ¬earthquake
burglary       .01         .08               .001         .009
¬burglary      .01         .09               .01          .79

P(Burglary | alarm) = α P(Burglary, alarm)
  = α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
  = α [ (.01, .01) + (.08, .09) ]
  = α (.09, .1)

Since P(burglary | alarm) + P(¬burglary | alarm) = 1, α = 1/(.09 + .1) = 5.26
(i.e., P(alarm) = 1/α = .19 – quizlet: how can you verify this?)

P(burglary | alarm) = .09 * 5.26 = .474
P(¬burglary | alarm) = .1 * 5.26 = .526
10
Exercise: Inference from the joint

• Queries:
  – What is the prior probability of smart?
  – What is the prior probability of study?
  – What is the conditional probability of prepared, given study and smart?
• Save these answers for next time!

p(smart ∧ study ∧ prep):
                    smart                ¬smart
               study    ¬study      study    ¬study
prepared        .432      .16        .084      .008
¬prepared       .048      .16        .036      .072
11
Independence
• When two sets of propositions do not affect each other's probabilities, we call them independent, and can easily compute their joint and conditional probability:
  – Independent(A, B) → P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
  – Then again, it might not: Burglars might be more likely to burglarize houses when there's a new moon (and hence little light)
  – But if we know the light level, the moon phase doesn't affect whether we are burglarized
  – Once we're burglarized, light level doesn't affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships
12
Exercise: Independence

• Queries:
  – Is smart independent of study?
  – Is prepared independent of study?

p(smart ∧ study ∧ prep):
                    smart                ¬smart
               study    ¬study      study    ¬study
prepared        .432      .16        .084      .008
¬prepared       .048      .16        .036      .072
13
Conditional independence
• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution
14
Exercise: Conditional independence

• Queries:
  – Is smart conditionally independent of prepared, given study?
  – Is study conditionally independent of prepared, given smart?

p(smart ∧ study ∧ prep):
                    smart                ¬smart
               study    ¬study      study    ¬study
prepared        .432      .16        .084      .008
¬prepared       .048      .16        .036      .072
15
Bayes’s rule
• Bayes’s rule is derived from the product rule:
  – P(Y | X) = P(X | Y) P(Y) / P(X)
• Often useful for diagnosis:
  – If X are (observed) effects and Y are (hidden) causes,
  – We may have a model for how causes lead to effects (P(X | Y))
  – We may also have prior beliefs (based on experience) about the frequency of occurrence of causes (P(Y))
  – Which allows us to reason abductively from effects to causes (P(Y | X))
16
Bayesian inference
• In the setting of diagnostic/evidential reasoning:
  – Hypotheses: H1, …, Hi, …
  – Evidence/manifestations: E1, …, Ej, …, Em
  – Know the prior probability of a hypothesis, P(Hi), and the conditional probability of the evidence given a hypothesis, P(Ej | Hi)
  – Want to compute the posterior probability, P(Hi | Ej)
• Bayes’ theorem (formula 1):
  P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
17
Simple Bayesian diagnostic reasoning
• Knowledge base:
  – Evidence / manifestations: E1, …, Em
  – Hypotheses / disorders: H1, …, Hn
    • Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
  – Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m
• Cases (evidence for a particular instance): E1, …, El
• Goal: Find the hypothesis Hi with the highest posterior
  – maxi P(Hi | E1, …, El)
18
Bayesian diagnostic reasoning II
• Bayes’ rule says that
  – P(Hi | E1, …, El) = P(E1, …, El | Hi) P(Hi) / P(E1, …, El)
• Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
  – P(E1, …, El | Hi) = ∏j=1..l P(Ej | Hi)
• If we only care about relative probabilities for the Hi, then we have:
  – P(Hi | E1, …, El) = α P(Hi) ∏j=1..l P(Ej | Hi)
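A minimal Python sketch of this naive Bayes combination (the hypotheses, priors, and likelihoods below are made up for illustration, not from the slides):

from math import prod

def posterior(priors, likelihoods, evidence):
    """priors: {h: P(h)}; likelihoods: {h: {e: P(e|h)}}; evidence: observed Ej's."""
    unnorm = {h: priors[h] * prod(likelihoods[h][e] for e in evidence) for h in priors}
    z = sum(unnorm.values())                  # 1/alpha, the normalizing constant
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical two-hypothesis example
priors = {"flu": 0.1, "cold": 0.9}
likelihoods = {"flu":  {"fever": 0.9, "cough": 0.8},
               "cold": {"fever": 0.2, "cough": 0.7}}
print(posterior(priors, likelihoods, ["fever", "cough"]))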
19
Limitations of simple Bayesian inference
• Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
  – Disease D causes syndrome S, which causes correlated manifestations M1 and M2
• Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
  – P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
      = α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
      = α ∏j=1..l P(Ej | H1 ∧ H2) P(H1) P(H2)
• How do we compute P(Ej | H1 ∧ H2)??
20
Limitations of simple Bayesian inference II
• Assume H1 and H2 are independent, given E1, …, El?
  – P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
• This is a very unreasonable assumption
  – Earthquake and Burglar are independent, but not given Alarm:
    • P(burglar | alarm, earthquake) << P(burglar | alarm)
• Another limitation is that simple application of Bayes’s rule doesn’t allow us to handle causal chaining:
  – A: this year’s weather; B: cotton production; C: next year’s cotton price
  – A influences C indirectly: A → B → C
  – P(C | B, A) = P(C | B)
• Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
• Next time: conditional independence and Bayesian networks!
21
Bayesian Networks
Chapter 14.1-14.3
Some material borrowed from Lise Getoor
22
Bayesian Belief Networks (BNs)
• Definition: BN = (DAG, CPD)
  – DAG: directed acyclic graph (BN’s structure)
    • Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
    • Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
  – CPD: conditional probability distribution (BN’s parameters)
    • Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT)
  – Each node stores P(xi | πi), where πi is the set of all parent nodes of xi
  – Root nodes are a special case – no parents, so just use priors in the CPD: πi = ∅, so P(xi | πi) = P(xi)
23
Example BN
Structure: a → b, a → c; b → d, c → d; c → e

P(A) = 0.001
P(B|A) = 0.3        P(B|¬A) = 0.001
P(C|A) = 0.2        P(C|¬A) = 0.005
P(D|B,C) = 0.1      P(D|B,¬C) = 0.01
P(D|¬B,C) = 0.01    P(D|¬B,¬C) = 0.00001
P(E|C) = 0.4        P(E|¬C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since they have to add to one
24
Conditional independence and chaining
• Conditional independence assumption:
  – P(xi | q, πi) = P(xi | πi)
    where πi is the set of all parents of node xi, and q is any set of variables (nodes) other than xi and its successors
  – πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
  – With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) local CPDs by chaining these CPDs:
    P(x1, …, xn) = ∏i=1..n P(xi | πi)
25
Chaining: Example
Computing the joint probability for all variables is easy:
P(a, b, c, d, e)
  = P(e | a, b, c, d) P(a, b, c, d)       by the product rule
  = P(e | c) P(a, b, c, d)                by cond. indep. assumption
  = P(e | c) P(d | a, b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(c | a, b) P(a, b)
  = P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)

(Network: a → b, a → c; b → d, c → d; c → e)
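A small Python sketch that evaluates this chained product, using the CPT values from the earlier “Example BN” slide (an illustration, not part of the original slides):

def p_a(a):
    return 0.001 if a else 0.999

def p_b(b, a):
    q = 0.3 if a else 0.001      # P(B=True | A)
    return q if b else 1 - q

def p_c(c, a):
    q = 0.2 if a else 0.005      # P(C=True | A)
    return q if c else 1 - q

def p_d(d, b, c):
    q = {(True, True): 0.1, (True, False): 0.01,        # P(D=True | B, C)
         (False, True): 0.01, (False, False): 0.00001}[(b, c)]
    return q if d else 1 - q

def p_e(e, c):
    q = 0.4 if c else 0.002      # P(E=True | C)
    return q if e else 1 - q

def joint(a, b, c, d, e):
    # P(a, b, c, d, e) = P(e|c) P(d|b,c) P(c|a) P(b|a) P(a)
    return p_e(e, c) * p_d(d, b, c) * p_c(c, a) * p_b(b, a) * p_a(a)

print(joint(True, True, True, True, True))   # 0.4 * 0.1 * 0.2 * 0.3 * 0.001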
26
Topological semantics
• A node is conditionally independent of its non-descendants given its parents
• A node is conditionally independent of all other nodes in the network given its parents, children, and children’s parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z
28
Inference in Bayesian Networks
Chapter 14.4-14.5
Some material borrowed from Lise Getoor
29
Inference tasks
• Simple queries: Compute the posterior marginal P(Xi | E=e)
  – E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
  – P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most critical?
• Explanation: Why do I need a new starter motor?
30
Approaches to inference
• Exact inference
  – Enumeration
  – Belief propagation in polytrees
  – Variable elimination
  – Clustering / join tree algorithms
• Approximate inference
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Genetic algorithms
  – Neural networks
  – Simulated annealing
  – Mean field theory
31
Direct inference with BNs
• Instead of computing the joint, suppose we just want the probability for one variable
• Exact methods of computation:
  – Enumeration
  – Variable elimination
  – Join trees: get the probabilities associated with every query variable
32
Inference by enumeration
• Add all of the terms (atomic event probabilities) from the full joint distribution
• If E are the evidence (observed) variables and Y are the other (unobserved) variables, then:
  P(X | e) = α P(X, e) = α Σy P(X, e, y)
• Each P(X, e, y) term can be computed using the chain rule
• Computationally expensive!
33
Example: Enumeration
• P(xi) = Σπi P(xi | πi) P(πi)
• Suppose we want P(D=true), and only the value of E is given as true
• P(d | e) = α ΣA,B,C P(a, b, c, d, e)
           = α ΣA,B,C P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)
• With simple iteration to compute this expression, there’s going to be a lot of repetition (e.g., P(e|c) has to be recomputed every time we iterate over C=true)
(Network: a → b, a → c; b → d, c → d; c → e)
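A brute-force enumeration sketch in Python for this query, reusing the CPT values from the earlier “Example BN” slide (illustrative only; a real implementation would cache the repeated terms):

from itertools import product

P_A = 0.001
P_B = {True: 0.3,  False: 0.001}                      # P(b=True | a)
P_C = {True: 0.2,  False: 0.005}                      # P(c=True | a)
P_D = {(True, True): 0.1, (True, False): 0.01,        # P(d=True | b, c)
       (False, True): 0.01, (False, False): 0.00001}
P_E = {True: 0.4,  False: 0.002}                      # P(e=True | c)

def bern(p, value):
    return p if value else 1 - p

def joint(a, b, c, d, e):
    return (bern(P_A, a) * bern(P_B[a], b) * bern(P_C[a], c)
            * bern(P_D[(b, c)], d) * bern(P_E[c], e))

def p_d_given_e(d=True, e=True):
    num = sum(joint(a, b, c, d, e) for a, b, c in product([True, False], repeat=3))
    den = sum(joint(a, b, c, dd, e) for a, b, c, dd in product([True, False], repeat=4))
    return num / den          # alpha = 1/den

print(p_d_given_e())          # P(D=true | E=true)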
34
Exercise: Enumeration
Network: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep | …):
               smart    ¬smart
  study          .9       .7
  ¬study         .5       .1

p(pass | …):
                    smart              ¬smart
               prep     ¬prep      prep     ¬prep
  fair          .9        .7        .7        .2
  ¬fair         .1        .1        .1        .1

Query: What is the probability that a student studied, given that they pass the exam?
35
Variable elimination
• Basically just enumeration, but with caching of local calculations
• Linear for polytrees (singly connected BNs)
• Potentially exponential for multiply connected BNs
  Exact inference in Bayesian networks is NP-hard!
• Join tree algorithms are an extension of variable elimination methods that compute posterior probabilities for all nodes in a BN simultaneously
36
Variable elimination
General idea:
• Write the query in the form
  P(Xn, e) = Σxk … Σx3 Σx2 ∏i P(xi | pai)
• Iteratively:
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product
37
Variable elimination: Example

Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → WetGrass, Rain → WetGrass

P(w) = Σc,s,r P(w | r, s) P(r | c) P(s | c) P(c)
     = Σs,r P(w | r, s) Σc P(r | c) P(s | c) P(c)
     = Σs,r P(w | r, s) f1(r, s)

where f1(r, s) = Σc P(r | c) P(s | c) P(c)
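A small Python sketch of this elimination step: build f1(r, s) = Σc P(r|c) P(s|c) P(c), then sum it against P(w | r, s). The CPT numbers below are illustrative placeholders; the slides do not give values for this network.

from itertools import product

P_C = 0.5                                             # P(cloudy) -- placeholder value
P_S = {True: 0.1, False: 0.5}                         # P(sprinkler=True | cloudy)
P_R = {True: 0.8, False: 0.2}                         # P(rain=True | cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,       # P(wet=True | sprinkler, rain)
       (False, True): 0.90, (False, False): 0.0}

def bern(p, v):
    return p if v else 1 - p

# Eliminate C: f1(r, s) = Σc P(r|c) P(s|c) P(c)
f1 = {(r, s): sum(bern(P_R[c], r) * bern(P_S[c], s) * bern(P_C, c) for c in (True, False))
      for r, s in product((True, False), repeat=2)}

# P(w) = Σ(r,s) P(w | r, s) f1(r, s)
p_w = sum(bern(P_W[(s, r)], True) * f1[(r, s)] for r, s in product((True, False), repeat=2))
print(round(p_w, 3))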
39
A more complex example
• “Asia” network:

Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis;
Tuberculosis, Lung Cancer → Abnormality in Chest;
Abnormality in Chest → X-Ray; Abnormality in Chest, Bronchitis → Dyspnea
40
(V = Visit to Asia, S = Smoking, T = Tuberculosis, L = Lung Cancer, B = Bronchitis, A = Abnormality in Chest, X = X-Ray, D = Dyspnea)

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
41
• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: v
Compute: fv(t) = Σv P(v) P(t|v)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Note: fv(t) = P(t). In general, the result of elimination is not necessarily a probability term.
42
• We want to compute P(d)
• Need to eliminate: s, x, t, l, a, b
• Current factors:
  fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: s
Compute: fs(b,l) = Σs P(s) P(b|s) P(l|s)
⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
Note: Summing on s results in a factor with two arguments, fs(b,l). In general, the result of elimination may be a function of several variables.
43
• We want to compute P(d)
• Need to eliminate: x, t, l, a, b
• Current factors:
  fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x
Compute: fx(a) = Σx P(x|a)
⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
Note: fx(a) = 1 for all values of a!!
44
• We want to compute P(d)
• Need to eliminate: t, l, a, b
• Current factors:
  fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)

Eliminate: t
Compute: ft(a,l) = Σt fv(t) P(a|t,l)
⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)
45
• We want to compute P(d)
• Need to eliminate: l, a, b
• Current factors:
  fs(b,l) fx(a) ft(a,l) P(d|a,b)

Eliminate: l
Compute: fl(a,b) = Σl fs(b,l) ft(a,l)
⇒ fx(a) fl(a,b) P(d|a,b)
46
• We want to compute P(d)
• Need to eliminate: a, b
• Current factors:
  fx(a) fl(a,b) P(d|a,b)

Eliminate: a, b
Compute: fa(b,d) = Σa fl(a,b) fx(a) P(d|a,b), then fb(d) = Σb fa(b,d)
⇒ fa(b,d) ⇒ fb(d)
47
Dealing with evidence
• How do we deal with evidence?
• Suppose we are given evidence V = t, S = f, D = t
• We want to compute P(L, V = t, S = f, D = t)
48
Dealing with evidence
• We start by writing the initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
• Since we know that V = t, we don’t need to eliminate V
• Instead, we can replace the factors P(V) and P(T|V) with
  f_P(V) = P(V = t)     f_P(T|V)(T) = P(T | V = t)
• These “select” the appropriate parts of the original factors given the evidence
• Note that f_P(V) is a constant, and thus does not appear in elimination of other variables
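A tiny Python sketch of this “selection” step: restricting a factor to the observed value of an evidence variable. The factor representation (a tuple of variable names plus a table keyed by value tuples) and the numbers are illustrative, not from the slides.

def restrict(factor_vars, table, var, value):
    """Drop `var` from the factor, keeping only the entries where var == value."""
    i = factor_vars.index(var)
    new_vars = factor_vars[:i] + factor_vars[i + 1:]
    new_table = {key[:i] + key[i + 1:]: p
                 for key, p in table.items() if key[i] == value}
    return new_vars, new_table

# Example: a factor for P(T | V) over (V, T); setting evidence V = True
# yields the one-argument factor f_P(T|V)(T).  Numbers are placeholders.
p_t_given_v = {(True, True): 0.05, (True, False): 0.95,
               (False, True): 0.01, (False, False): 0.99}
print(restrict(("V", "T"), p_t_given_v, "V", True))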
49
Dealing with evidence
• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting the evidence:
  f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) P(x|a) f_P(d|a,b)(a,b)
50
Dealing with evidence
• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Initial factors, after setting the evidence:
  f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) P(x|a) f_P(d|a,b)(a,b)
• Eliminating x, we get
  f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) fx(a) f_P(d|a,b)(a,b)
51
Dealing with evidence
• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Eliminating t, we get
  f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) ft(a,l) fx(a) f_P(d|a,b)(a,b)
52
Dealing with evidence
• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Eliminating a, we get
  f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) fa(b,l)
53
Dealing with evidence
• Given evidence V = t, S = f, D = t
• Compute P(L, V = t, S = f, D = t)
• Eliminating b, we get
  f_P(v) f_P(s) f_P(l|s)(l) fb(l)
54
Variable elimination algorithm
• Let X1, …, Xm be an ordering on the non-query variables:
  ΣX1 ΣX2 … ΣXm ∏j P(Xj | Parents(Xj))
• For i = m, …, 1:
  – Leave in the summation for Xi only factors mentioning Xi
  – Multiply the factors, getting a factor that contains a number for each value of the variables mentioned, including Xi
  – Sum out Xi, getting a factor f that contains a number for each value of the variables mentioned, not including Xi
  – Replace the multiplied factor in the summation
55
Complexity of variable elimination
Suppose in one elimination step we compute
  fx(y1, …, yk) = Σx f'x(x, y1, …, yk)
  f'x(x, y1, …, yk) = ∏i=1..m fi(x, yi,1, …, yi,li)
This requires
  m · |Val(X)| · ∏i |Val(Yi)| multiplications
    (for each value of x, y1, …, yk, we do m multiplications), and
  |Val(X)| · ∏i |Val(Yi)| additions
    (for each value of y1, …, yk, we do |Val(X)| additions)
► Complexity is exponential in the number of variables in the intermediate factors
► Finding an optimal ordering is NP-hard
56
Exercise: Variable elimination
(Same student network and CPTs as in the earlier enumeration exercise:
 smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass;
 p(smart) = .8, p(study) = .6, p(fair) = .9, with the p(prep | …) and p(pass | …) tables given there.)

Query: What is the probability that a student is smart, given that they pass the exam?
57
Conditioning
• Conditioning: Find the network’s smallest cutset S (a set of nodes whose removal renders the network singly connected)
  – In this network, S = {A} or {B} or {C} or {D}
• For each instantiation of S, compute the belief update with your favorite inference algorithm
• Combine the results from all instantiations of S
• Computationally expensive (finding the smallest cutset is in general NP-hard, and the total number of possible instantiations of S is O(2^|S|))

(Network: a → b, a → c; b → d, c → d; c → e)
58
Approximate inference: Direct sampling
• Suppose you are given values for some subset of the variables, E, and want to infer values for unknown variables, Z
• Randomly generate a very large number of instantiations from the BN
  – Generate instantiations for all variables – start at root variables and work your way “forward” in topological order
• Rejection sampling: Only keep those instantiations that are consistent with the values for E
• Use the frequency of values for Z to get estimated probabilities
• Accuracy of the results depends on the size of the sample (asymptotically approaches exact results)
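A minimal rejection-sampling sketch in Python, using the student network and CPTs from the exercise slides (p(smart)=.8, p(study)=.6, p(fair)=.9). The estimated quantity, P(prepared | pass=True), is chosen just to illustrate the mechanism.

import random

P_PREP = {(True, True): 0.9, (True, False): 0.5,              # P(prep=True | smart, study)
          (False, True): 0.7, (False, False): 0.1}
P_PASS = {(True, True, True): 0.9, (True, False, True): 0.7,  # P(pass=True | smart, prep, fair)
          (False, True, True): 0.7, (False, False, True): 0.2,
          (True, True, False): 0.1, (True, False, False): 0.1,
          (False, True, False): 0.1, (False, False, False): 0.1}

def sample_once():
    # Sample in topological order: root variables first, then their children.
    smart = random.random() < 0.8
    study = random.random() < 0.6
    fair = random.random() < 0.9
    prep = random.random() < P_PREP[(smart, study)]
    passed = random.random() < P_PASS[(smart, prep, fair)]
    return smart, study, fair, prep, passed

# Rejection sampling: keep only the samples consistent with the evidence pass=True.
kept = [s for s in (sample_once() for _ in range(100_000)) if s[4]]
estimate = sum(1 for s in kept if s[3]) / len(kept)
print(round(estimate, 3))      # estimated P(prepared | pass=True)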
59
Exercise: Direct sampling
(Same student network and CPTs as in the earlier enumeration exercise:
 p(smart) = .8, p(study) = .6, p(fair) = .9, with the p(prep | …) and p(pass | …) tables given there.)

Topological order = …?
Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42
60
Likelihood weighting
• Idea: Don’t generate samples that need to be rejected in the first place!
• Sample only from the unknown variables Z
• Weight each sample according to the likelihood that it would occur, given the evidence E
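The same query estimated with likelihood weighting, again using the exercise slides’ CPTs (a sketch; the query P(prepared | pass=True) is just for illustration):

import random

P_PREP = {(True, True): 0.9, (True, False): 0.5,              # P(prep=True | smart, study)
          (False, True): 0.7, (False, False): 0.1}
P_PASS = {(True, True, True): 0.9, (True, False, True): 0.7,  # P(pass=True | smart, prep, fair)
          (False, True, True): 0.7, (False, False, True): 0.2,
          (True, True, False): 0.1, (True, False, False): 0.1,
          (False, True, False): 0.1, (False, False, False): 0.1}

def weighted_sample():
    # Sample only the non-evidence variables; weight by the likelihood of the evidence.
    smart = random.random() < 0.8
    study = random.random() < 0.6
    fair = random.random() < 0.9
    prep = random.random() < P_PREP[(smart, study)]
    weight = P_PASS[(smart, prep, fair)]        # likelihood of evidence pass=True
    return prep, weight

samples = [weighted_sample() for _ in range(100_000)]
num = sum(w for prep, w in samples if prep)
den = sum(w for _, w in samples)
print(round(num / den, 3))                      # estimated P(prepared | pass=True)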
61
Markov chain Monte Carlo algorithm
• So called because
  – Markov chain – each instance generated in the sample is dependent on the previous instance
  – Monte Carlo – statistical sampling method
• Perform a random walk through variable assignment space, collecting statistics as you go
  – Start with a random instantiation, consistent with the evidence variables
  – At each step, for some nonevidence variable, randomly sample its value, consistent with the other current assignments
• Given enough samples, MCMC gives an accurate estimate of the true distribution of values
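A minimal Gibbs-sampling (MCMC) sketch on the same student network with evidence pass=True. Each step resamples one nonevidence variable from its conditional distribution given the current values of all the others; for a network this small it is simplest (if wasteful) to get that conditional by evaluating the joint. The query shown, P(prepared | pass=True), is again just illustrative.

import random

P_PREP = {(True, True): 0.9, (True, False): 0.5,              # P(prep=True | smart, study)
          (False, True): 0.7, (False, False): 0.1}
P_PASS = {(True, True, True): 0.9, (True, False, True): 0.7,  # P(pass=True | smart, prep, fair)
          (False, True, True): 0.7, (False, False, True): 0.2,
          (True, True, False): 0.1, (True, False, False): 0.1,
          (False, True, False): 0.1, (False, False, False): 0.1}

def bern(p, v):
    return p if v else 1 - p

def joint(s):   # s maps variable name -> bool; the evidence pass=True is fixed
    return (bern(0.8, s["smart"]) * bern(0.6, s["study"]) * bern(0.9, s["fair"])
            * bern(P_PREP[(s["smart"], s["study"])], s["prep"])
            * P_PASS[(s["smart"], s["prep"], s["fair"])])

def gibbs(n_steps=100_000, burn_in=1_000):
    state = {"smart": True, "study": True, "fair": True, "prep": True}   # arbitrary start
    hits, kept = 0, 0
    for step in range(n_steps):
        var = random.choice(list(state))                     # pick a nonevidence variable
        p = {v: joint({**state, var: v}) for v in (True, False)}
        state[var] = random.random() < p[True] / (p[True] + p[False])
        if step >= burn_in:
            kept += 1
            hits += state["prep"]
    return hits / kept

print(round(gibbs(), 3))       # estimated P(prepared | pass=True)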
62
Exercise: MCMC sampling
(Same student network and CPTs as in the earlier enumeration exercise:
 p(smart) = .8, p(study) = .6, p(fair) = .9, with the p(prep | …) and p(pass | …) tables given there.)

Topological order = …?
Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42
63
Summary
• Bayes nets
  – Structure
  – Parameters
  – Conditional independence
  – Chaining
• BN inference
  – Enumeration
  – Variable elimination
  – Sampling methods