15-381/781 Bayesian Nets & Probabilistic Inference Emma Brunskill (this time) Ariel Procaccia With thanks to Dan Klein (Berkeley), Percy Liang (Stanford) and Past 15-381 Instructors for some slide content, and Russell & Norvig


Page 1:

15-381/781 Bayesian Nets & Probabilistic Inference Emma Brunskill (this time) Ariel Procaccia With thanks to Dan Klein (Berkeley), Percy Liang (Stanford) and Past 15-381 Instructors for some slide content, and Russell & Norvig

Page 2:

What You Should Know

•  Define probabilistic inference
•  How to define a Bayes Net given a real example
•  How the joint distribution can be used to answer any query
•  Complexity of exact inference
•  Approximate inference (direct sampling, likelihood weighting, Gibbs)
   o  Be able to implement and run each algorithm
   o  Compare the benefits and limitations of each

Page 3:

Bayesian Network

•  Compact representation of the joint distribution
•  Conditional independence relationships are explicit
   o  Each variable is conditionally independent of its non-descendants in the graph given the values of its parents

Page 4:

Joint Distribution Ex.

•  Variables: Cloudy, Sprinkler, Rain, Wet Grass
•  Each variable has a domain of size 2 (true or false)
•  The joint encodes the probability of every combination of variable values

+c +s +r +w   .01
+c +s +r -w   .01
+c +s -r +w   .05
+c +s -r -w   .1
+c -s +r +w   #
+c -s +r -w   #
+c -s -r +w   #
+c -s -r -w   #
-c +s +r +w   #
-c +s +r -w   #
-c +s -r +w   #
-c +s -r -w   #
-c -s +r +w   #
-c -s +r -w   #
-c -s -r +w   #
-c -s -r -w   #

One entry per row, e.g.:
P(Cloudy=false & Sprinkler=true & Rain=false & WetGrass=true)

Page 5:

Joint as Product of Conditionals (Chain rule)

+c +s +r +w   .01
+c +s +r -w   .01
+c +s -r +w   .05
+c +s -r -w   .1
+c -s +r +w   #
+c -s +r -w   #
+c -s -r +w   #
+c -s -r -w   #
-c +s +r +w   #
-c +s +r -w   #
-c +s -r +w   #
-c +s -r -w   #
-c -s +r +w   #
-c -s +r -w   #
-c -s -r +w   #
-c -s -r -w   #

=

P(WetGrass|Cloudy,Sprinkler,Rain) * P(Rain|Cloudy,Sprinkler) * P(Sprinkler|Cloudy) * P(Cloudy)

Page 6:

Joint as Product of Conditionals

+c +s +r +w   .01
+c +s +r -w   .01
+c +s -r +w   .05
+c +s -r -w   .1
+c -s +r +w   #
+c -s +r -w   #
+c -s -r +w   #
+c -s -r -w   #
-c +s +r +w   #
-c +s +r -w   #
-c +s -r +w   #
-c +s -r -w   #
-c -s +r +w   #
-c -s +r -w   #
-c -s -r +w   #
-c -s -r -w   #

=

P(WetGrass|Cloudy,Sprinkler,Rain) * P(Rain|Cloudy,Sprinkler) * P(Sprinkler|Cloudy) * P(Cloudy)

…but there may be additional conditional independencies

[Bayes Net: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → Wet Grass, Rain → Wet Grass]

Page 7:

What if some variables are conditionally independent?

The graph explicitly shows the conditional independencies.

[Two Bayes Net diagrams over Cloudy, Sprinkler, Rain, and Wet Grass]

Page 8:

Conditional Independencies

[Bayes Net: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → Wet Grass, Rain → Wet Grass]

P(C):
+c   0.5
-c   0.5

P(S|C):
+c +s   .1
+c -s   .9
-c +s   .5
-c -s   .5

P(R|C):
+c +r   .8
+c -r   .2
-c +r   .2
-c -r   .8

P(W|S,R):
+s +r +w   .99
+s +r -w   .01
+s -r +w   .90
+s -r -w   .10
-s +r +w   .90
-s +r -w   .10
-s -r +w   0
-s -r -w   1.0

→

+c +s +r +w   .01
+c +s +r -w   .01
+c +s -r +w   .05
+c +s -r -w   .1
+c -s +r +w   #
+c -s +r -w   #
+c -s -r +w   #
+c -s -r -w   #
-c +s +r +w   #
-c +s +r -w   #
-c +s -r +w   #
-c +s -r -w   #
-c -s +r +w   #
-c -s +r -w   #
-c -s -r +w   #
-c -s -r -w   #
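To make the factorization concrete, here is a minimal Python sketch (not from the slides; the dictionary encoding of the CPTs is mine) that computes one joint entry as the product of the conditionals above:

```python
# CPTs for the sprinkler network (same numbers as the slides), encoded as dicts.
P_C = {True: 0.5, False: 0.5}                      # P(Cloudy)
P_S = {True: {True: 0.1, False: 0.9},              # P(Sprinkler | Cloudy)
       False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2},              # P(Rain | Cloudy)
       False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01},    # P(WetGrass | Sprinkler, Rain)
       (True, False): {True: 0.90, False: 0.10},
       (False, True): {True: 0.90, False: 0.10},
       (False, False): {True: 0.0, False: 1.0}}

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) via the chain rule plus the BN independencies."""
    return P_C[c] * P_S[c][s] * P_R[c][r] * P_W[(s, r)][w]

# P(Cloudy=false, Sprinkler=true, Rain=false, WetGrass=true)
# = 0.5 * 0.5 * 0.8 * 0.9 = 0.18
print(joint(False, True, False, True))
```

Multiplying out the sixteen such products fills in the # entries of the joint table.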

Page 9:

Bayesian Network

•  Compact representation of the joint distribution
•  Conditional independence relationships are explicit
•  Still represents the joint, so it can be used to answer any probabilistic query

Page 10:

Probabilistic Inference

•  Compute the probability of a query variable (or variables) taking on a value (or set of values) given some evidence
•  Pr[Q | E1=e1, ..., Ek=ek]

Page 11:

Using the Joint To Answer Queries

•  The joint distribution is sufficient to answer any probabilistic inference question involving the variables described in the joint
•  Can take a Bayes Net, construct the full joint, and then look up the entries where the evidence variables take on the specified values (a sketch follows below)
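A minimal sketch of this "construct the joint, then look up entries" approach (the CPT dictionaries are repeated from the earlier sketch so the block is self-contained; the helper names are mine):

```python
from itertools import product

# CPTs for the sprinkler network (same numbers as the slides).
P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01}, (True, False): {True: 0.9, False: 0.1},
       (False, True): {True: 0.9, False: 0.1}, (False, False): {True: 0.0, False: 1.0}}

def joint(c, s, r, w):
    return P_C[c] * P_S[c][s] * P_R[c][r] * P_W[(s, r)][w]

def query(target, evidence):
    """P(target | evidence) by enumerating all 16 entries of the full joint.
    target / evidence map variable names in 'CSRW' to booleans."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        values = dict(zip("CSRW", (c, s, r, w)))
        if any(values[k] != v for k, v in evidence.items()):
            continue                      # skip entries inconsistent with evidence
        p = joint(c, s, r, w)
        den += p                          # accumulates P(evidence)
        if all(values[k] == v for k, v in target.items()):
            num += p                      # accumulates P(target, evidence)
    return num / den

# P(Sprinkler=true | Cloudy=true, Rain=true) = 0.1 for these CPTs
print(query({"S": True}, {"C": True, "R": True}))
```

The cost is the problem: the enumeration touches every joint entry, which is exponential in the number of variables.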

Page 12:

But Constructing the Joint Is Expensive & Exact Inference Is NP-Hard

Page 13:

Soln: Approximate Inference

•  Use samples to approximate the posterior distribution Pr[Q | E1=e1, ..., Ek=ek]
•  Last time
   o  Direct sampling
   o  Likelihood weighting
•  Today
   o  Gibbs sampling

Page 14:

Poll: Which Algorithm?

•  Evidence: Cloudy=+c, Rain=+r
•  Query variable: Sprinkler
•  P(Sprinkler | Cloudy=+c, Rain=+r)
•  Samples
   o  +c,+s,+r,-w
   o  +c,-s,-r,-w
   o  +c,+s,-r,+w
   o  +c,-s,+r,-w
•  What algorithm could’ve generated these samples?
   1)  Direct sampling
   2)  Likelihood weighting
   3)  Both
   4)  No clue

[Bayes Net: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → Wet Grass, Rain → Wet Grass]

Page 15:

Direct Sampling Recap

Algorithm:
1. Create a topological order of the variables in the Bayes Net

Page 16:

Topological Order

•  Any ordering of the nodes of a directed acyclic graph in which a node appears only after all of its ancestors in the graph
•  E.g.
   o  Cloudy, Sprinkler, Rain, WetGrass
   o  Cloudy, Rain, Sprinkler, WetGrass

[Bayes Net: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → Wet Grass, Rain → Wet Grass]
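One standard way to produce such an ordering is Kahn's algorithm; a minimal sketch (the edge-list encoding is an assumption of mine, not from the slides):

```python
from collections import deque

# Children of each node in the sprinkler network.
edges = {"Cloudy": ["Sprinkler", "Rain"],
         "Sprinkler": ["WetGrass"],
         "Rain": ["WetGrass"],
         "WetGrass": []}

def topological_order(graph):
    """Kahn's algorithm: repeatedly emit a node with no unprocessed parents."""
    indeg = {v: 0 for v in graph}
    for v in graph:
        for child in graph[v]:
            indeg[child] += 1
    queue = deque(v for v in graph if indeg[v] == 0)   # roots first
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for child in graph[v]:
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return order

print(topological_order(edges))  # e.g. ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']
```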

Page 17:

Direct Sampling Recap

Algorithm:
1. Create a topological order of the variables in the Bayes Net
2. Sample each variable conditioned on the values of its parents
3. Use the samples that match the evidence variable values to estimate the probability of the query variable, e.g.
   P(Sprinkler=+s | Cloudy=+c, Rain=+r) ≈ (# samples with +s,+c,+r) / (# samples with +c,+r)

•  Consistent in the limit of infinite samples
•  Inefficient (why? see the sketch below)
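A minimal sketch of direct sampling with rejection for P(Sprinkler | Cloudy=+c, Rain=+r), again assuming the dictionary CPT encoding from earlier (helper names are mine):

```python
import random

P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01}, (True, False): {True: 0.9, False: 0.1},
       (False, True): {True: 0.9, False: 0.1}, (False, False): {True: 0.0, False: 1.0}}

def bern(p):
    return random.random() < p

def prior_sample():
    """Sample every variable in topological order given its parents."""
    c = bern(P_C[True])
    s = bern(P_S[c][True])
    r = bern(P_R[c][True])
    w = bern(P_W[(s, r)][True])
    return c, s, r, w

def rejection_estimate(n=100_000):
    kept = agree = 0
    for _ in range(n):
        c, s, r, w = prior_sample()
        if c and r:                 # keep only samples matching the evidence
            kept += 1
            agree += s
    return agree / kept             # ≈ P(+s | +c, +r) = 0.1 for these CPTs

print(rejection_estimate())
```

The inefficiency is visible here: every sample failing the `if c and r` check is thrown away, so when the evidence is unlikely, almost all of the work is wasted.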

Page 18:

Consistency

•  In the limit of infinite samples, the estimated Pr[Q | E1=e1, ..., Ek=ek] converges to the true posterior probability
•  A desirable property (otherwise the estimate always has some error)

Page 19:

Likelihood Weighting Recap

1.  Create an array TotalWeights
    1.  Initialize the value of each array element to 0
2.  For j = 1:N
    1.  wtmp = 1
    2.  Set the evidence variables in the sample z = <z1, ..., zn> to their observed values
    3.  For each variable Zi in topological order
        1.  If Zi is an evidence variable with observed value ei
            1.  wtmp = wtmp * P(Zi = ei | Parents(Zi) = z(Parents(Zi)))
        2.  Else
            1.  Sample zi conditioned on the values of its parents
    4.  Update the weight of the resulting sample
        1.  TotalWeights[z] = TotalWeights[z] + wtmp
3.  Use the weights to compute the probability of the query variable:
    P(Sprinkler=+s | Cloudy=+c, Rain=+r) ≈ Sum_{c,r,w} TotalWeights(+s,c,r,w) / Sum_{s,c,r,w} TotalWeights(s,c,r,w)
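A minimal likelihood-weighting sketch for the same query (the CPT encoding and helper names are assumptions of mine):

```python
import random

P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01}, (True, False): {True: 0.9, False: 0.1},
       (False, True): {True: 0.9, False: 0.1}, (False, False): {True: 0.0, False: 1.0}}

def prob_true(var, sample):
    """P(var = true | values of var's parents in `sample`)."""
    if var == "C": return P_C[True]
    if var == "S": return P_S[sample["C"]][True]
    if var == "R": return P_R[sample["C"]][True]
    return P_W[(sample["S"], sample["R"])][True]       # var == "W"

def weighted_sample(evidence):
    """One pass in topological order: fix evidence variables and multiply
    their likelihood into the weight; sample the rest given their parents."""
    w, sample = 1.0, {}
    for var in "CSRW":
        p = prob_true(var, sample)
        if var in evidence:
            sample[var] = evidence[var]
            w *= p if evidence[var] else 1.0 - p
        else:
            sample[var] = random.random() < p
    return sample, w

def lw_estimate(n=100_000):
    evidence = {"C": True, "R": True}
    num = den = 0.0
    for _ in range(n):
        sample, w = weighted_sample(evidence)
        den += w
        if sample["S"]:
            num += w
    return num / den                 # ≈ P(+s | +c, +r) = 0.1 for these CPTs

print(lw_estimate())
```

Unlike rejection sampling, no sample is discarded: evidence enters through the weights instead.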

Page 20:

LW Consistency

•  Probability of getting a sample (z,e), where z is a set of values for the non-evidence variables and e is the values of the evidence variables
•  Sampling distribution for a weighted sample (WS):
   S_WS(z,e) = Prod_i P(zi | Parents(Zi))
•  Is this the true posterior distribution P(z|e)?
   o  No, why?
   o  It doesn’t consider evidence that is not an ancestor…
   o  The weights fix this!

Page 21:

Weighted Probability

•  Samples each non-evidence variable zi according to
   S_WS(z,e) = Prod_i P(zi | Parents(Zi))
•  Weight of a sample is
   w(z,e) = Prod_j P(ej | Parents(Ej))
•  Weighted probability of a sample is
   S_WS(z,e) * w(z,e) = Prod_i P(zi | Parents(Zi)) * Prod_j P(ej | Parents(Ej)) = P(z,e)
   (from the chain rule & conditional independence)

Page 22:

Does Likelihood Weighting Produce Consistent Estimates? Yes

P(X = x | e) = P(X = x, e) / P(e) ∝ P(X = x, e)

where X is the query var(s), E the evidence var(s), Y the non-query, non-evidence vars, and N_WS(x,y,e) is the # of samples where the query variables = x, non-query = y, evidence = e:

P(X = x | e) ∝ Sum_y N_WS(x,y,e) w(x,y,e)
             ≈ n * Sum_y S_WS(x,y,e) w(x,y,e)     (as # samples n → infinity)
             = n * Sum_y P(x,y,e)
             = n * P(x,e)  ∝  P(x,e)

Page 23:

Example

•  When sampling S and R, the evidence W=t is ignored
   o  Get samples with S=f and R=f, although the evidence rules this out
•  The weight makes up for this difference
   o  Such a sample’s weight includes P(W=t | S=f, R=f) = 0, so its weight is 0
•  If we have 100 samples with R=t and total weight 1, and 400 samples with R=f and total weight 2, what is the estimate of R=t?
   o  1 / (1 + 2) = 1/3

Page 24:

Limitations of Likelihood Weighting

•  Poor performance if evidence variables occur late in the ordering
•  Why?
   o  The evidence is not being used to influence the samples!
   o  Yields samples with low weights

Page 25:

Markov Chain Monte Carlo Methods

•  Prior methods generate each new sample from scratch
•  MCMC generates each new sample by making a random change to the preceding sample
•  Can view the algorithm as being in a particular state (an assignment of values to each variable)

Page 26:

Review: Markov Blanket

•  Markov blanket:
   o  Parents
   o  Children
   o  Children’s parents
•  A variable is conditionally independent of all other nodes given its Markov blanket

Page 27:

Gibbs Sampling: Compute P(X|e)

[Gibbs sampling pseudocode from Russell & Norvig; mb(Zi) = Markov blanket of Zi]

Page 28:

Gibbs Sampling Example

•  Want Pr(R | S=t, W=t)
•  Non-evidence variables are C & R
•  Initialize randomly: C=t and R=f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Sample C given the current values of its Markov blanket

[Bayes Net and CPTs as on the earlier slides]

Page 29:

Gibbs Sampling Example

•  Want Pr(R | S=t, W=t)
•  Non-evidence variables are C & R
•  Initialize randomly: C=t and R=f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Sample C given the current values of its Markov blanket
•  The Markov blanket is the parents, children, and children’s parents: for C it is S & R
•  Sample C from P(C | S=t, R=f)
•  First have to compute P(C | S=t, R=f)
•  Use exact inference to do this

[Bayes Net and CPTs as on the earlier slides]

Page 30:

Exercise: compute P(C=t | S=t, R=f)

•  Quick refresher
•  Sum rule: P(a) = Sum_b P(a,b)
•  Product/Chain rule: P(a,b) = P(a | b) P(b)
•  Bayes rule: P(a | b) = P(b | a) P(a) / P(b)

[Bayes Net and CPTs as on the earlier slides]

Page 31:

Exact Inference Exercise

•  P(C | S=t, R=f)
•  What is the probability P(C=t | S=t, R=f)?
   = P(C=t, S=t, R=f) / P(S=t, R=f)
   which is proportional to P(C=t, S=t, R=f)
   Use the normalization trick & compute the above for both C=t and C=f:
   P(C=t, S=t, R=f) = P(C=t) P(S=t | C=t) P(R=f | C=t, S=t)   (product rule)
                    = P(C=t) P(S=t | C=t) P(R=f | C=t)         (BN independencies)
                    = 0.5 * 0.1 * 0.2 = 0.01
   P(C=f, S=t, R=f) = P(C=f) P(S=t | C=f) P(R=f | C=f) = 0.5 * 0.5 * 0.8 = 0.2
   P(S=t, R=f)      (use the sum rule)
                    = P(C=f, S=t, R=f) + P(C=t, S=t, R=f) = 0.21
   P(C=t | S=t, R=f) = 0.01 / 0.21 ≈ 0.048

[Bayes Net and CPTs as on the earlier slides]
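The same computation as a minimal sketch (the dict CPT encoding is assumed as before):

```python
P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}

def p_c_given_sr(s, r):
    """P(C=t | S=s, R=r): compute the unnormalized joint for C=t and C=f,
    then normalize (the 'normalization trick' above)."""
    unnorm = {c: P_C[c] * P_S[c][s] * P_R[c][r] for c in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])

print(p_c_given_sr(True, False))   # 0.01 / 0.21 ≈ 0.048
```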

Page 32:

Gibbs Sampling Example

•  Want Pr(R | S=t, W=t)
•  Non-evidence variables are C & R
•  Initialize randomly: C=t and R=f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Sample C given the current values of its Markov blanket
•  The Markov blanket is the parents, children, and children’s parents: for C it is S & R
•  Exactly compute P(C | S=t, R=f)
•  Sample C from P(C | S=t, R=f)
•  Get C=f
•  New state (f,t,f,t)

[Bayes Net and CPTs as on the earlier slides]

Page 33:

Gibbs Sampling Example

•  Want Pr(R | S=t, W=t)
•  Initialize the non-evidence variables (C and R) randomly, to t and f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Sample C given the current values of its Markov blanket, P(C | S=t, R=f)
•  Suppose the result is C=f
•  New state (f,t,f,t)
•  Sample Rain given its MB
•  What is its Markov blanket?

[Bayes Net and CPTs as on the earlier slides]

Page 34:

Gibbs Sampling Example

•  Want Pr(R | S=t, W=t)
•  Initialize the non-evidence variables (C and R) randomly, to t and f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Sample C given the current values of its Markov blanket, P(C | S=t, R=f)
•  Suppose the result is C=f
•  New state (f,t,f,t)
•  Sample Rain given its MB, P(R | C=f, S=t, W=t)
•  Suppose the result is R=t
•  New state (f,t,t,t)

[Bayes Net and CPTs as on the earlier slides]
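Putting the example together, a minimal Gibbs sketch for Pr(R | S=t, W=t) (the CPT encoding and the omission of a burn-in period are simplifications of mine):

```python
import random

P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01}, (True, False): {True: 0.9, False: 0.1},
       (False, True): {True: 0.9, False: 0.1}, (False, False): {True: 0.0, False: 1.0}}

def sample_c(s, r):
    """Sample C from P(C | mb(C)) = P(C | S=s, R=r) ∝ P(C) P(s|C) P(r|C)."""
    p = {c: P_C[c] * P_S[c][s] * P_R[c][r] for c in (True, False)}
    return random.random() < p[True] / (p[True] + p[False])

def sample_r(c, s, w):
    """Sample R from P(R | mb(R)) = P(R | C=c, S=s, W=w) ∝ P(R|c) P(w|s,R)."""
    p = {r: P_R[c][r] * P_W[(s, r)][w] for r in (True, False)}
    return random.random() < p[True] / (p[True] + p[False])

def gibbs_estimate(n=100_000):
    s, w = True, True           # evidence: S=t, W=t (never resampled)
    c, r = True, False          # initial values of the non-evidence variables
    count_r = 0
    for _ in range(n):          # each step makes a random change to the state
        c = sample_c(s, r)
        r = sample_r(c, s, w)
        count_r += r
    return count_r / n          # ≈ Pr(R=t | S=t, W=t), about 0.32 for these CPTs

print(gibbs_estimate())
```

Note how each step only needs the local CPTs of the resampled variable and its children, not the whole joint.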

Page 35:

Poll: Gibbs Sampling Ex.

•  Want Pr(R | S=t, W=t)
•  Initialize the non-evidence variables (C and R) randomly, to t and f
•  Initial state (C,S,R,W) = (t,t,f,t)
•  Current state (f,t,t,t)
•  What is not a possible next state?
   1. (f,t,t,t)
   2. (t,t,t,t)
   3. (f,t,f,t)
   4. (f,f,t,t)  (inconsistent w/ evidence)
   5. Not sure

[Bayes Net and CPTs as on the earlier slides]

Page 36:

Gibbs Sampling

[Gibbs sampling pseudocode from Russell & Norvig; mb(Zi) = Markov blanket of Zi]

This involves inference!

Page 37:

Poll

Are Gibbs samples independent?
1. Yes
2. No
3. Not sure

[Gibbs sampling pseudocode from Russell & Norvig]

Page 38:

Markov Blanket Sampling

•  Want to show that P(Zi | mb(Zi)) is the same as P(Zi | all other variables)
   o  Implies conditional independence of Zi from the rest of the network given its Markov blanket
•  Derive an equation for computing P(Zi | mb(Zi))

Page 39:

Probability Given Markov Blanket

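The equation itself did not survive extraction; the standard result (it appears in Russell & Norvig) is that Zi's conditional given its Markov blanket is proportional to its own CPT entry times its children's:

```latex
P(z_i \mid \mathrm{mb}(Z_i)) \;\propto\;
  P(z_i \mid \mathrm{parents}(Z_i))
  \prod_{Y_j \in \mathrm{children}(Z_i)} P(y_j \mid \mathrm{parents}(Y_j))
```

This is what makes each Gibbs step cheap: only Zi's own CPT row and its children's rows are needed, exactly as in the sample_c / sample_r helpers sketched earlier.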

Page 40:

Why is Gibbs Consistent?

•  The sampling process settles into a stationary distribution in which the long-term fraction of time spent in each state is exactly equal to its posterior probability
   o  → Implies that if we draw enough samples from this stationary distribution, we get a consistent estimate, because we are sampling from the true posterior

Page 41:

Markov Chain

•  Let P(x → x') be the probability that the sampling process makes a transition from x (some state) to x' (some other state)
   o  E.g. (t,t,f,t) → (t,f,f,t)
•  Run sampling for t steps
•  Pt(x) is the probability the system is in state x at time t
•  Next step: Pt+1(x') = Sum_x Pt(x) P(x → x')

Page 42:

Stationary Distribution

•  Let P(x → x') be the probability the process makes a transition from x to x'
•  Pt(x) is the probability the system is in state x at time t
•  Pt+1(x') = Sum_x Pt(x) P(x → x')
•  We have reached a stationary distribution when Pt+1(x) = Pt(x) for all x
•  Call the stationary distribution π
   o  Must satisfy π(x') = Sum_x π(x) P(x → x') for all x'
•  If P(x → x') is ergodic, there is exactly one such π for any given P(x → x')

Page 43:

Detailed Balance

•  Let P(x → x') be the probability the process makes a transition from x to x'
•  Pt(x) is the probability the system is in state x at time t
•  Stationary distribution π
   o  Satisfies π(x') = Sum_x π(x) P(x → x') for all x'
•  Detailed balance: inflow = outflow
   π(x) P(x → x') = π(x') P(x' → x) for all x, x'
•  Detailed balance implies stationarity:
   Sum_x π(x) P(x → x') = Sum_x π(x') P(x' → x) = π(x') Sum_x P(x' → x) = π(x')
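As a quick numeric check, here is a minimal sketch with an invented two-state chain (the numbers are an assumption of mine, not from the slides):

```python
# A two-state chain whose transition probabilities satisfy detailed balance
# with respect to pi; we then verify that pi is stationary.
pi = {0: 0.25, 1: 0.75}
P = {0: {0: 0.25, 1: 0.75},    # P[x][x2] = P(x -> x2); each row sums to 1
     1: {0: 0.25, 1: 0.75}}

# Detailed balance: pi(x) P(x -> x') == pi(x') P(x' -> x) for all x, x'
for x in P:
    for x2 in P:
        assert abs(pi[x] * P[x][x2] - pi[x2] * P[x2][x]) < 1e-12

# Therefore pi is stationary: pi(x') == Sum_x pi(x) P(x -> x')
for x2 in P:
    assert abs(sum(pi[x] * P[x][x2] for x in P) - pi[x2]) < 1e-12

print("detailed balance holds and pi is stationary")
```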

Page 44:

•  Proof on board


Page 45:

Proving Gibbs Samples from the True Posterior

•  General Gibbs: sample the value of a new variable conditioned on all the other variables
•  Can prove that this version of Gibbs satisfies the detailed balance equation with stationary distribution P(X|e)
•  Then use the prior result that, for Bayes Nets, sampling conditioned on all other variables is equivalent to sampling given the Markov blanket
•  See the text for a recap

Page 46:

Gibbs Sampling

•  Samples are valid once we reach the stationary distribution
•  When do we reach the stationary distribution?
   o  Unclear…

Page 47:

What You Should Know

•  Define probabilistic inference
•  How to define a Bayes Net given a real example
•  How the joint distribution can be used to answer any query
•  Complexity of exact inference
•  Approximate inference (direct sampling, likelihood weighting, Gibbs)
   o  Be able to implement and run each algorithm
   o  Compare the benefits and limitations of each