
Sampling Techniques for Probabilistic and …Markov Chain Monte Carlo: Gibbs Sampling 4. Sampling in presence of Determinism 5. Rao-Blackwellisation 6. AND/OR importance sampling Overview


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Cutset-based Variance Reduction

6. AND/OR importance sampling


Probabilistic Reasoning/Graphical models

• Graphical models:

– Bayesian network, constraint networks, mixed network

• Queries

• Exact algorithm

– using inference,

– search and hybrids

• Graph parameters:

– tree-width, cycle-cutset, w-cutset


Queries

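• Probability of evidence (or partition function):

$$P(e) = \sum_{X \setminus \mathrm{var}(e)} \prod_{i=1}^{n} P(x_i \mid pa_i)\big|_{e}, \qquad Z = \sum_{X} \prod_{i} \psi_i(C_i)$$

• Posterior marginal (beliefs):

$$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{X \setminus \mathrm{var}(e) \setminus X_i} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_{e}}{\sum_{X \setminus \mathrm{var}(e)} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_{e}}$$

• Most Probable Explanation:

$$x^{*} = \arg\max_{x} P(x, e)$$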


Approximation

• Exact inference, search and hybrids are too expensive when the graph is dense (high treewidth), so we approximate:

• Bounding inference:

– mini-bucket and mini-clustering

– belief propagation

• Bounding search:

– sampling

• Goal: an anytime scheme


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Outline

• Definitions and Background on Statistics

• Theory of importance sampling

• Likelihood weighting

• State-of-the-art importance sampling techniques


A sample

• Given a set of variables X = {X1, ..., Xn}, a sample, denoted by S^t, is an instantiation of all variables:

$$S^t = (x_1^t, x_2^t, \ldots, x_n^t)$$


How to draw a sample? Univariate distribution

• Example: Given random variable X having domain {0, 1} and a distribution P(X) = (0.3, 0.7).

• Task: Generate samples of X from P.

• How?

– draw a random number r ∈ [0, 1]

– If (r < 0.3) then set X=0

– Else set X=1
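The threshold rule above generalises to any finite domain by walking the cumulative distribution. The following is a minimal Python sketch under that assumption; the helper name `sample_discrete` and the fixed seed are illustrative choices, not part of the slides.

```python
import random

def sample_discrete(probs, rng):
    """Draw an index from a discrete distribution by inverting its CDF."""
    r = rng.random()          # uniform random number in [0, 1)
    cumulative = 0.0
    for value, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return value
    return len(probs) - 1     # guard against floating-point round-off

rng = random.Random(0)        # fixed seed, chosen only for reproducibility
samples = [sample_discrete([0.3, 0.7], rng) for _ in range(10_000)]
frac_one = sum(samples) / len(samples)   # should be close to 0.7
```

For P(X) = (0.3, 0.7) the loop reduces to the slide's rule: r < 0.3 gives X=0, anything else gives X=1.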


How to draw a sample? Multi-variate distribution

• Let X={X1,..,Xn} be a set of variables

• Express the distribution in product form:

$$P(X) = P(X_1) \times P(X_2 \mid X_1) \times \cdots \times P(X_n \mid X_1, \ldots, X_{n-1})$$

• Sample variables one by one from left to right, along the ordering dictated by the product form.

• Bayesian network literature: Logic sampling


Sampling for Prob. Inference: Outline

• Logic Sampling

• Importance Sampling

– Likelihood Sampling

– Choosing a Proposal Distribution

• Markov Chain Monte Carlo (MCMC)

– Metropolis-Hastings

– Gibbs sampling

• Variance Reduction


Logic Sampling:

No Evidence (Henrion 1988)

Input: Bayesian network

X= {X1,…,XN}, N- #nodes, T - # samples

Output: T samples

Process nodes in topological order: first process the ancestors of a node, then the node itself:

1. For t = 1 to T

2. For i = 1 to N

3. Xi ← sample xi^t from P(xi | pai)
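The loop above can be sketched in Python as follows. The four-node binary network, its CPT numbers, and the names `NETWORK`, `ORDER` and `logic_sample` are hypothetical, chosen only to make the example runnable.

```python
import random

# Hypothetical binary network X1 -> X2, X1 -> X3, (X2, X3) -> X4.
# Each entry: (parents, table mapping parent values to P(node = 1)).
NETWORK = {
    "X1": ((), {(): 0.6}),
    "X2": (("X1",), {(0,): 0.2, (1,): 0.7}),
    "X3": (("X1",), {(0,): 0.5, (1,): 0.1}),
    "X4": (("X2", "X3"), {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}),
}
ORDER = ["X1", "X2", "X3", "X4"]  # a topological order: parents first

def logic_sample(rng):
    """Generate one full instantiation by sampling each node given its parents."""
    x = {}
    for node in ORDER:
        parents, table = NETWORK[node]
        p_one = table[tuple(x[p] for p in parents)]
        x[node] = 1 if rng.random() < p_one else 0
    return x

rng = random.Random(1)
samples = [logic_sample(rng) for _ in range(20_000)]
p_x1 = sum(s["X1"] for s in samples) / len(samples)   # should be close to 0.6
```

Because every parent is sampled before its children, each CPT look-up only needs values that are already in `x`.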


Logic sampling (example)

Network: X1 → X2, X1 → X3, (X2, X3) → X4, with CPTs P(X1), P(X2 | X1), P(X3 | X1), P(X4 | X2, X3).

$$P(X_1, X_2, X_3, X_4) = P(X_1) \times P(X_2 \mid X_1) \times P(X_3 \mid X_1) \times P(X_4 \mid X_2, X_3)$$

No evidence // generate sample k

1. Sample x1 from P(x1)

2. Sample x2 from P(x2 | X1 = x1)

3. Sample x3 from P(x3 | X1 = x1)

4. Sample x4 from P(x4 | X2 = x2, X3 = x3)


Logic Sampling w/ Evidence

Input: Bayesian network

X= {X1,…,XN}, N- #nodes

E – evidence, T - # samples

Output: T samples consistent with E

1. For t=1 to T

2. For i=1 to N

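3. Xi ← sample xi^t from P(xi | pai)

4. If Xi in E and xi^t ≠ ei (the observed value of Xi), reject the sample:

5. Go to Step 2 and regenerate sample t (a rejected sample does not count toward T).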
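A minimal Python sketch of logic sampling with rejection follows. The three-node network X1 → X2, X1 → X3, the CPT values, and the function name are assumptions made for illustration; the evidence is X3 = 0.

```python
import random

# Hypothetical CPTs: P(X1=1), P(X2=1 | X1), P(X3=1 | X1).
P_X1 = 0.6
P_X2 = {0: 0.2, 1: 0.7}
P_X3 = {0: 0.5, 1: 0.1}

def sample_with_rejection(evidence_x3, rng):
    """Repeat forward sampling until the sampled X3 matches the evidence."""
    while True:
        x1 = 1 if rng.random() < P_X1 else 0
        x2 = 1 if rng.random() < P_X2[x1] else 0
        x3 = 1 if rng.random() < P_X3[x1] else 0
        if x3 == evidence_x3:      # consistent with E: accept
            return x1, x2, x3
        # otherwise reject the sample and start over

rng = random.Random(2)
accepted = [sample_with_rejection(0, rng) for _ in range(20_000)]
posterior_x1 = sum(s[0] for s in accepted) / len(accepted)
exact = (0.6 * 0.9) / (0.6 * 0.9 + 0.4 * 0.5)   # P(X1=1 | X3=0) by enumeration
```

The fraction of accepted samples with X1 = 1 approximates P(X1 = 1 | X3 = 0). When the evidence is unlikely, most samples are rejected, which is the weakness that likelihood weighting addresses.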


Logic Sampling (example)

Network: X1 → X2, X1 → X3, (X2, X3) → X4, with CPTs P(x1), P(x2 | x1), P(x3 | x1), P(x4 | x2, x3).

Evidence: X3 = 0 // generate sample k

1. Sample x1 from P(x1)

2. Sample x2 from P(x2 | x1)

3. Sample x3 from P(x3 | x1)

4. If x3 ≠ 0, reject sample and start from 1, otherwise

5. Sample x4 from P(x4 | x2, x3)


Expected value and Variance

Expected value: Given a probability distribution P(X) and a function g(X) defined over a set of variables X = {X1, X2, …, Xn}, the expected value of g w.r.t. P is

$$E_P[g(x)] = \sum_{x} g(x) P(x)$$

Variance: The variance of g w.r.t. P is:

$$Var_P[g(x)] = \sum_{x} \left( g(x) - E_P[g(x)] \right)^2 P(x)$$


Monte Carlo Estimate

• Estimator:

– An estimator is a function of the samples.

– It produces an estimate of the unknown parameter of the sampling distribution.

Given i.i.d. samples S^1, S^2, …, S^T drawn from P, the Monte Carlo estimate of E_P[g(x)] is given by:

$$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$$


Example: Monte Carlo estimate

• Given:

– A distribution P(X) = (0.3, 0.7).

– g(X) = 40 if X equals 0, and g(X) = 50 if X equals 1.

• Estimate E_P[g(x)] = (40 × 0.3 + 50 × 0.7) = 47.

• Generate k = 10 samples from P: 0,1,1,1,0,1,1,0,1,0

$$\hat{g} = \frac{40 \times \#\text{samples}(X=0) + 50 \times \#\text{samples}(X=1)}{\#\text{samples}} = \frac{40 \times 4 + 50 \times 6}{10} = 46$$
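The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the averaging step; the variable names are illustrative.

```python
def g(x):
    """The function from the example: 40 when X = 0, 50 when X = 1."""
    return 40 if x == 0 else 50

samples = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]   # the ten samples listed above
estimate = sum(g(x) for x in samples) / len(samples)   # Monte Carlo average
exact = 40 * 0.3 + 50 * 0.7                            # E_P[g(x)]
```

With four zeros and six ones the estimate is 46, one unit away from the exact expectation of 47; more samples would tighten the gap.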


Outline

• Definitions and Background on Statistics

• Theory of importance sampling

• Likelihood weighting

• State-of-the-art importance sampling techniques


Importance sampling: Main idea

• Express the query as the expected value of a random variable w.r.t. a distribution Q.

• Generate random samples from Q.

• Estimate the expected value from the generated samples using a Monte Carlo estimator (average).


Importance sampling for P(e)

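Let Z = X \ E.

Let Q(Z) be a (proposal) distribution satisfying P(z, e) > 0 ⇒ Q(z) > 0.

Then, we can rewrite P(e) as:

$$P(e) = \sum_{z} P(z, e) = \sum_{z} P(z, e) \frac{Q(z)}{Q(z)} = E_Q\left[ \frac{P(z, e)}{Q(z)} \right] = E_Q[w(z)]$$

Monte Carlo estimate:

$$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t), \quad \text{where } z^t \leftarrow Q(Z)$$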
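As a rough illustration of the estimator for P(e), the sketch below uses a single binary variable Z. The table `P_ZE`, the uniform proposal `Q`, and the function name are hypothetical values picked for the example.

```python
import random

# Hypothetical unnormalised target: P(z, e) for z in {0, 1}; P(e) is their sum.
P_ZE = {0: 0.02, 1: 0.18}
Q = {0: 0.5, 1: 0.5}           # proposal; positive wherever P(z, e) > 0

def importance_estimate(num_samples, rng):
    """Average the weights w(z) = P(z, e) / Q(z) over samples z ~ Q."""
    total = 0.0
    for _ in range(num_samples):
        z = 0 if rng.random() < Q[0] else 1
        total += P_ZE[z] / Q[z]
    return total / num_samples

rng = random.Random(3)
p_e_hat = importance_estimate(50_000, rng)
p_e_exact = sum(P_ZE.values())   # 0.20 for this toy table
```

Here the weights take only two values, 0.04 and 0.36, so the average settles near P(e) = 0.2 quickly; a proposal closer to P(z | e) would make the weights more uniform and the estimate less variable.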


Properties of IS estimate of P(e)

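• Convergence: by law of large numbers

$$\hat{P}(e) = \frac{1}{T} \sum_{i=1}^{T} w(z^i) \xrightarrow{a.s.} P(e) \quad \text{for } T \to \infty$$

• Unbiased.

$$E_Q[\hat{P}(e)] = P(e)$$

• Variance:

$$Var_Q[\hat{P}(e)] = Var_Q\left[ \frac{1}{T} \sum_{i=1}^{T} w(z^i) \right] = \frac{Var_Q[w(z)]}{T}$$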


Properties of IS estimate of P(e)

• Mean Squared Error of the estimator

$$MSE_Q[\hat{P}(e)] = E_Q\left[ \left( \hat{P}(e) - P(e) \right)^2 \right] = \left[ P(e) - E_Q[\hat{P}(e)] \right]^2 + Var_Q[\hat{P}(e)] = Var_Q[\hat{P}(e)] = \frac{Var_Q[w(z)]}{T}$$

The quantity enclosed in the square brackets is zero because the estimator is unbiased: its expected value equals P(e).


Estimating P(Xi|e)

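Let δ_{x_i}(z) be a Dirac delta function, which is 1 if z contains x_i and 0 otherwise.

$$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{z} \delta_{x_i}(z) P(z, e)}{\sum_{z} P(z, e)} = \frac{E_Q\left[ \delta_{x_i}(z) P(z, e) / Q(z) \right]}{E_Q\left[ P(z, e) / Q(z) \right]}$$

Idea: Estimate numerator and denominator by IS.

Ratio estimate:

$$\hat{P}(x_i \mid e) = \frac{\hat{P}(x_i, e)}{\hat{P}(e)} = \frac{\sum_{k=1}^{T} \delta_{x_i}(z^k) \, w(z^k, e)}{\sum_{k=1}^{T} w(z^k, e)}$$

Estimate is biased: $E[\hat{P}(x_i \mid e)] \neq P(x_i \mid e)$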


Properties of the IS estimator for P(Xi|e)

• Convergence: By Weak law of large numbers

$$\hat{P}(x_i \mid e) \to P(x_i \mid e) \quad \text{as } T \to \infty$$

• Asymptotically unbiased

$$\lim_{T \to \infty} E_Q[\hat{P}(x_i \mid e)] = P(x_i \mid e)$$

• Variance

– Harder to analyze

– Liu suggests a measure called “Effective sample size”


Generating samples from Q

• No restrictions on “how to”

• Typically, express Q in product form:

– Q(Z) = Q(Z1) × Q(Z2|Z1) × … × Q(Zn|Z1,..,Zn-1)

• Sample along the order Z1,..,Zn

• Example:

– Z1 ← Q(Z1) = (0.2, 0.8)

– Z2 ← Q(Z2|Z1) = (0.1, 0.9, 0.2, 0.8)

– Z3 ← Q(Z3|Z1,Z2) = Q(Z3) = (0.5, 0.5)
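The example can be sketched in Python as below. It assumes, since the slide does not say, that the four numbers for Q(Z2|Z1) list the row for Z1 = 0 followed by the row for Z1 = 1; the helper names are illustrative.

```python
import random

Q_Z1 = [0.2, 0.8]
Q_Z2_GIVEN_Z1 = {0: [0.1, 0.9], 1: [0.2, 0.8]}   # assumed row layout
Q_Z3 = [0.5, 0.5]

def draw(probs, rng):
    """Sample an index from a short list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def sample_q(rng):
    """Sample (z1, z2, z3) along the order Z1, Z2, Z3 and return Q(z) too."""
    z1 = draw(Q_Z1, rng)
    z2 = draw(Q_Z2_GIVEN_Z1[z1], rng)
    z3 = draw(Q_Z3, rng)
    q = Q_Z1[z1] * Q_Z2_GIVEN_Z1[z1][z2] * Q_Z3[z3]
    return (z1, z2, z3), q

rng = random.Random(4)
draws = [sample_q(rng) for _ in range(20_000)]
frac_z1_one = sum(z[0] for z, _ in draws) / len(draws)   # should be near 0.8
```

Returning Q(z) alongside the sample is convenient because the importance weight w(z) = P(z, e) / Q(z) needs it.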


Outline

• Definitions and Background on Statistics

• Theory of importance sampling

• Likelihood weighting

• State-of-the-art importance sampling techniques


Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)

Works well for likely evidence!

“Clamping” evidence + logic sampling + weighting samples by evidence likelihood

Is an instance of importance sampling!


Likelihood Weighting: Sampling

[Figure: Bayesian network with the evidence nodes marked e]

Sample in topological order over X!

Clamp evidence, sample xi ← P(Xi | pai); P(Xi | pai) is a look-up in the CPT!


Likelihood Weighting: Proposal Distribution

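$$Q(X \setminus E) = \prod_{X_i \in X \setminus E} P(X_i \mid pa_i, e)$$

Example: Given a Bayesian network $P(X_1, X_2, X_3) = P(X_1) \times P(X_2 \mid X_1) \times P(X_3 \mid X_1, X_2)$ and evidence $X_2 = x_2$:

$$Q(X_1, X_3) = P(X_1) \times P(X_3 \mid X_1, X_2 = x_2)$$

Weights: Given a sample $x = (x_1, \ldots, x_n)$:

$$w = \frac{P(x, e)}{Q(x)} = \frac{\prod_{X_i \in X \setminus E} P(x_i \mid pa_i, e) \times \prod_{E_j \in E} P(e_j \mid pa_j)}{\prod_{X_i \in X \setminus E} P(x_i \mid pa_i, e)} = \prod_{E_j \in E} P(e_j \mid pa_j)$$

Notice: Q is another Bayesian network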
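A small Python sketch of likelihood weighting on the three-variable example follows. The CPT numbers and the function name `weighted_sample` are hypothetical; X2 is clamped to 1, and each sample is weighted by P(X2 = 1 | x1).

```python
import random

# Hypothetical CPTs for binary X1 -> X2, (X1, X2) -> X3; values are P(node = 1).
P_X1 = 0.3
P_X2 = {0: 0.9, 1: 0.2}
P_X3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.8}

def weighted_sample(x2_evidence, rng):
    """Clamp X2, sample X1 and X3 from their CPTs, weight by P(x2 | x1)."""
    x1 = 1 if rng.random() < P_X1 else 0
    weight = P_X2[x1] if x2_evidence == 1 else 1.0 - P_X2[x1]
    x3 = 1 if rng.random() < P_X3[(x1, x2_evidence)] else 0
    return (x1, x2_evidence, x3), weight

rng = random.Random(5)
draws = [weighted_sample(1, rng) for _ in range(50_000)]
p_e_hat = sum(w for _, w in draws) / len(draws)                # estimate of P(X2=1)
p_x1_hat = sum(w for x, w in draws if x[0] == 1) / sum(w for _, w in draws)
p_e_exact = 0.7 * 0.9 + 0.3 * 0.2                              # by enumeration
p_x1_exact = 0.3 * 0.2 / p_e_exact                             # P(X1=1 | X2=1)
```

No sample is ever rejected: the evidence variable is never sampled, and its CPT entry becomes the weight. The average weight estimates P(e), and the weight-normalised count estimates the posterior marginal.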


Likelihood Weighting: Estimates

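Estimate P(e):

$$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w^{(t)}$$

Estimate Posterior Marginals:

$$\hat{P}(x_i \mid e) = \frac{\hat{P}(x_i, e)}{\hat{P}(e)} = \frac{\sum_{t=1}^{T} w^{(t)} g_{x_i}(x^{(t)})}{\sum_{t=1}^{T} w^{(t)}}$$

where $g_{x_i}(x^{(t)}) = 1$ if $x_i = x_i^{(t)}$ and equals zero otherwise.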


Likelihood Weighting

• Converges to exact posterior marginals

• Generates Samples Fast

• Sampling distribution is close to the prior (especially if E ⊂ Leaf Nodes)

• Increasing sampling variance:

– Convergence may be slow

– Many samples with P(x(t)) = 0 are rejected


Outline

• Definitions and Background on Statistics

• Theory of importance sampling

• Likelihood weighting

• Error estimation

• State-of-the-art importance sampling techniques


Outline

• Definitions and Background on Statistics

• Theory of importance sampling

• Likelihood weighting

• State-of-the-art importance sampling techniques


Proposal selection

• One should try to select a proposal that is as close as possible to the posterior distribution.

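$$Var_Q[\hat{P}(e)] = \frac{Var_Q[w(z)]}{T} = \frac{1}{T} \sum_{z \in Z} \left( \frac{P(z, e)}{Q(z)} - P(e) \right)^2 Q(z)$$

To have a zero-variance estimator:

$$\frac{P(z, e)}{Q(z)} - P(e) = 0 \;\Rightarrow\; Q(z) = \frac{P(z, e)}{P(e)} \;\Rightarrow\; Q(z) = P(z \mid e)$$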
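The zero-variance claim can be checked numerically by enumerating a tiny state space. The table `P_ZE` and the two proposals in this Python sketch are hypothetical.

```python
# Hypothetical unnormalised target P(z, e) over three states of Z.
P_ZE = {"a": 0.05, "b": 0.10, "c": 0.25}
p_e = sum(P_ZE.values())

def weight_variance(q):
    """Var_Q[w(z)] for w(z) = P(z, e) / Q(z), computed exactly by enumeration."""
    return sum(q[z] * (P_ZE[z] / q[z] - p_e) ** 2 for z in P_ZE)

uniform = {z: 1.0 / len(P_ZE) for z in P_ZE}
posterior = {z: P_ZE[z] / p_e for z in P_ZE}      # Q(z) = P(z | e)

var_uniform = weight_variance(uniform)
var_posterior = weight_variance(posterior)
```

With Q(z) = P(z | e) every weight equals P(e), so the variance vanishes up to floating-point error, while the uniform proposal has strictly positive variance. In practice P(z | e) is the unknown we are after, which is why the schemes that follow only approximate it.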


Perfect sampling using Bucket Elimination

• Algorithm:

– Run Bucket elimination on the problem along an ordering o=(XN,..,X1).

– Sample along the reverse ordering: (X1,..,XN)

– At each variable Xi, recover the probability P(Xi|x1,...,xi-1) by referring to the bucket.


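Bucket Elimination

Query: $P(a \mid e = 0) \propto P(a, e = 0)$. Elimination order: d, e, b, c.

$$P(a, e = 0) = \sum_{c, b, e = 0, d} P(a) P(b \mid a) P(c \mid a) P(d \mid a, b) P(e \mid b, c) = P(a) \sum_{c} P(c \mid a) \sum_{b} P(b \mid a) \sum_{e = 0} P(e \mid b, c) \sum_{d} P(d \mid a, b)$$

Buckets (original functions → messages):

D: $P(d \mid a, b)$ → $f_D(a, b) = \sum_{d} P(d \mid a, b)$

E: $P(e \mid b, c)$ → $f_E(b, c) = P(e = 0 \mid b, c)$

B: $P(b \mid a)$, $f_D(a, b)$, $f_E(b, c)$ → $f_B(a, c) = \sum_{b} P(b \mid a) f_D(a, b) f_E(b, c)$

C: $P(c \mid a)$, $f_B(a, c)$ → $f_C(a) = \sum_{c} P(c \mid a) f_B(a, c)$

A: $P(a)$, $f_C(a)$ → $P(a, e = 0) = P(a) \cdot f_C(a)$

[Figure: bucket tree with clusters (D,A,B) and (E,B,C) sending $f_D(a,b)$ and $f_E(b,c)$ to (B,A,C), which sends $f_B(a,c)$ to (C,A), which sends $f_C(a)$ to (A)]

Time and space exp(w*)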


Bucket elimination (BE) Algorithm elim-bel (Dechter 1996)

Elimination operator: Σ_b

bucket B: P(B|A), P(D|B,A), P(e|B,C) → hB(A,D,C,e)

bucket C: P(C|A), hB(A,D,C,e) → hC(A,D,e)

bucket D: hC(A,D,e) → hD(A,e)

bucket E: hD(A,e) → hE(a)

bucket A: P(a), hE(a) → P(e)


Sampling from the output of BE (Dechter 2002)

bucket B: P(B|A), P(D|B,A), P(e|B,C)

bucket C: P(C|A), hB(A,D,C,e)

bucket D: hC(A,D,e)

bucket E: hD(A,e)

bucket A: P(A), hE(A)

Sample in the reverse of the elimination order:

• Sample A = a ← Q(A) ∝ P(A) × hE(A)

• Evidence bucket: ignore

• Set A = a in the bucket D. Sample D = d ← Q(D | a, e) ∝ hC(a, D, e)

• Set A = a, D = d in the bucket C. Sample C = c ← Q(C | a, e, d) ∝ P(C | a) × hB(a, d, C, e)

• Set A = a, D = d, C = c in the bucket B. Sample B = b ← Q(B | a, e, d, c) ∝ P(B | a) × P(d | B, a) × P(e | B, c)


Mini-buckets: “local inference”

• Computation in a bucket is time and space exponential in the number of variables involved

• Therefore, partition functions in a bucket into “mini-buckets” on smaller number of variables

• Can control the size of each “mini-bucket”, yielding polynomial complexity.


Mini-Bucket Elimination

Space and time constraints: the maximum scope size of any newly generated function should be bounded by 2. BE generates a function having scope size 3, so it cannot be used.

bucket B: mini-buckets {P(e|B,C)} and {P(B|A), P(D|B,A)}, each eliminated with ΣB → hB(C,e), hB(A,D)

bucket C: P(C|A), hB(C,e) → hC(A,e)

bucket D: hB(A,D) → hD(A)

bucket E: hC(A,e) → hE(A)

bucket A: P(A), hE(A), hD(A) → approximation of P(e)


Sampling from the output of MBE

bucket B: {P(e|B,C)}, {P(B|A), P(D|B,A)}

bucket C: P(C|A), hB(C,e)

bucket D: hB(A,D)

bucket E: hC(A,e)

bucket A: hE(A), hD(A)

Sampling is the same as in BE-sampling, except that now we construct Q from a randomly selected “mini-bucket”.


IJGP-Sampling (Gogate and Dechter, 2005)

• Iterative Join Graph Propagation (IJGP)

– A Generalized Belief Propagation scheme (Yedidia et al., 2002)

• IJGP yields better approximations of P(X|E) than MBE

– (Dechter, Kask and Mateescu, 2002)

• Output of IJGP is same as mini-bucket “clusters”

• Currently the best performing IS scheme!


Current Research question

• Given a Bayesian network with evidence or a Markov network representing function P, generate another Bayesian network representing a function Q (from a family of distributions, restricted by structure) such that Q is closest to P.

• Current approaches

– Mini-buckets

– IJGP

– Both

• These have been evaluated experimentally, but still need to be justified theoretically.


Algorithm: Approximate Sampling

1) Run IJGP or MBE

2) At each branch point compute the edge probabilities by consulting output of IJGP or MBE

• Rejection Problem:

– Some assignments generated are non solutions


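Adaptive Importance Sampling

Initial proposal: $Q^1(Z) = Q(Z_1) \times Q(Z_2 \mid pa(Z_2)) \times \cdots \times Q(Z_n \mid pa(Z_n))$

$\hat{P}(E = e) = 0$

For i = 1 to k do

  Generate samples $z^1, \ldots, z^N$ from $Q^i$

  $\hat{P}(E = e) = \hat{P}(E = e) + \frac{1}{N} \sum_{j=1}^{N} w_i(z^j)$

  Update $Q^{i+1} = Q^i + \eta(i) \, (Q' - Q^i)$

End

Return $\hat{P}(E = e) / k$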


Adaptive Importance Sampling

• General case

• Given k proposal distributions

• Take N samples out of each distribution

• Approximate P(e)

$$\hat{P}(e) = \frac{1}{k} \sum_{j=1}^{k} \left( \text{average weight of the } j\text{th proposal} \right)$$


Estimating Q'(z)

$$Q'(Z) = Q'(Z_1) \times Q'(Z_2 \mid pa(Z_2)) \times \cdots \times Q'(Z_n \mid pa(Z_n))$$

where each $Q'(Z_i \mid Z_1, \ldots, Z_{i-1})$ is estimated by importance sampling


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Markov Chain

• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov Property):

x1 → x2 → x3 → x4

$$P(x_t \mid x_1, x_2, \ldots, x_{t-1}) = P(x_t \mid x_{t-1})$$

• If P(Xt|xt-1) does not depend on t (time homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix), with

$$\sum_{x} P(X = x) = 1$$


Example: Drunkard’s Walk

• a random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability

D(X) = {0, 1, 2, ...}

transition matrix P(X):

        P(n−1)   P(n+1)
n       0.5      0.5


Example: Weather Model

D(X) = {rainy, sunny}

transition matrix P(X):

         P(rainy)   P(sunny)
rainy    0.9        0.1
sunny    0.5        0.5

Example trajectory: rain → rain → rain → rain → sun


Multi-Variable System

• state is an assignment of values to all the variables

X = {X1, X2, X3}, D(Xi) discrete and finite

$$x^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$$

[Figure: state (x1^t, x2^t, x3^t) transitioning to (x1^t+1, x2^t+1, x3^t+1)]


Bayesian Network System

• Bayesian Network is a representation of the joint probability distribution over 2 or more variables

X = {X1, X2, X3}

$$x^t = \{x_1^t, x_2^t, x_3^t\}$$

[Figure: Bayesian network over X1, X2, X3; state (x1^t, x2^t, x3^t) transitioning to (x1^t+1, x2^t+1, x3^t+1)]


Stationary Distribution: Existence

• If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka “fixed point”) if its entries sum up to 1 and satisfy:

$$\pi(x_j) = \sum_{x_i \in D(X)} \pi(x_i) P(x_j \mid x_i)$$

• A finite state space Markov chain has a unique stationary distribution if and only if:

– The chain is irreducible

– All of its states are positive recurrent
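The fixed-point condition can be illustrated on the two-state weather model from the earlier example by applying the transition matrix repeatedly. This Python sketch assumes the rows of the matrix are (rainy, sunny) with entries 0.9, 0.1 and 0.5, 0.5; the function name `step` is illustrative.

```python
# Transition matrix of the weather model, rows = current state (rainy, sunny).
# The row assignment is an assumption read off the slide's layout.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(pi, transition):
    """One application of pi(x_j) = sum_i pi(x_i) P(x_j | x_i)."""
    n = len(pi)
    return [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]

pi = [0.0, 1.0]               # start from "sunny"; the start should not matter
for _ in range(200):
    pi = step(pi, P)

# How far pi is from being a fixed point after the iterations.
residual = max(abs(a - b) for a, b in zip(pi, step(pi, P)))
```

Solving π = πP by hand for this matrix gives π(rainy) = 5/6 and π(sunny) = 1/6, which is what the iteration approaches.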


Irreducible

• A state x is irreducible if under the transition rule one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps

• If one state is irreducible, then all the states must be irreducible

(Liu, Ch. 12, pp. 249, Def. 12.1.1)


Recurrent

• A state x is recurrent if the chain returns to x

with probability 1

• Let M(x) be the expected number of steps to return to state x

• State x is positive recurrent if M(x) is finite

The recurrent states in a finite state chain are positive recurrent.


Stationary Distribution Convergence

• Consider infinite Markov chain:

$$P^{(n)} = P(x_n \mid x_0) = P^0 P^n$$

• If the chain is both irreducible and aperiodic, then:

$$\pi = \lim_{n \to \infty} P^{(n)}$$

• Initial state is not important in the limit

“The most useful feature of a “good” Markov chain is its fast forgetfulness of its past…”

(Liu, Ch. 12.1)


Aperiodic

• Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}. Here, g.c.d. means the greatest common divisor of the integers in the set. If d(i) = 1 for all i, then the chain is aperiodic

• Positive recurrent, aperiodic states are ergodic


Markov Chain Monte Carlo

• How do we estimate P(X), e.g., P(X|e)?

• Generate samples that form a Markov Chain with stationary distribution π = P(X|e)

• Estimate π from samples (observed states): visited states x0,…,xn can be viewed as “samples” from distribution π

$$\hat{\pi}_T(x) = \frac{1}{T} \sum_{t=1}^{T} \delta(x, x^t), \qquad \pi(x) = \lim_{T \to \infty} \hat{\pi}_T(x)$$


MCMC Summary

• Convergence is guaranteed in the limit

• Samples are dependent, not i.i.d.

• Convergence (mixing rate) may be slow

• The stronger the correlation between states, the slower the convergence!

• Initial state is not important, but… typically, we throw away first K samples - “burn-in”


Gibbs Sampling (Geman&Geman,1984)

• Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables

• Sample a new value for one variable at a time from the variable’s conditional distribution:

$$P(X_i) = P(X_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^t, \ldots, x_n^t) = P(X_i \mid x^t \setminus x_i)$$

• Samples form a Markov chain with stationary distribution P(X|e)


Gibbs Sampling: Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all instantiations of X=x (remember drunkard’s walk):

In one step we can reach instantiations that differ from current one by value assignment to at most one variable (assume randomized choice of variables Xi).


Ordered Gibbs Sampler

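Generate sample x^{t+1} from x^t (process all variables in some order):

$$X_1 = x_1^{t+1} \leftarrow P(X_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$$

$$X_2 = x_2^{t+1} \leftarrow P(X_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$$

$$\cdots$$

$$X_N = x_N^{t+1} \leftarrow P(X_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$$

In short, for i = 1 to N:

$$X_i = x_i^{t+1} \leftarrow \text{sampled from } P(X_i \mid x^t \setminus x_i, e)$$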


Transition Probabilities in BN

Markov blanket:

$$markov_i = pa_i \cup ch_i \cup \Big( \bigcup_{X_j \in ch_i} pa_j \Big)$$

Given its Markov blanket (parents, children, and their parents), Xi is independent of all other nodes:

$$P(X_i \mid x^t \setminus x_i) = P(X_i \mid markov_i^t)$$

$$P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$$

Computation is linear in the size of the Markov blanket!


Ordered Gibbs Sampling Algorithm (Pearl, 1988)

Input: X, E=e

Output: T samples {x^t}

Fix evidence E=e, initialize x^0 at random

1. For t = 1 to T (compute samples)

2.   For i = 1 to N (loop through variables)

3.     x_i^{t+1} ← P(X_i | markov_i^t)

4.   End For

5. End For
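The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the joint over three binary variables is hypothetical, and each conditional is obtained by brute-force normalization of that joint rather than from a Markov blanket in a real Bayesian network.

```python
import random

def joint(x):
    """Hypothetical positive joint over three binary variables (unnormalized)."""
    x1, x2, x3 = x
    return (2.0 if x1 == x2 else 1.0) * (3.0 if x2 == x3 else 1.0)

def conditional(i, x):
    """P(X_i | all other variables in x), by normalizing the joint over X_i."""
    weights = []
    for v in (0, 1):
        y = list(x)
        y[i] = v
        weights.append(joint(tuple(y)))
    z = sum(weights)
    return [w / z for w in weights]

def ordered_gibbs(T, evidence, n=3, seed=0):
    rng = random.Random(seed)
    # Fix evidence, initialize the remaining variables at random.
    x = [evidence.get(i, rng.randint(0, 1)) for i in range(n)]
    samples = []
    for _ in range(T):                 # compute samples
        for i in range(n):             # loop through variables in a fixed order
            if i in evidence:
                continue               # evidence variables are never resampled
            p1 = conditional(i, x)[1]
            x[i] = 1 if rng.random() < p1 else 0
        samples.append(tuple(x))
    return samples

# Evidence X3 = 1 (index 2); histogram estimate of P(X1 = 1 | X3 = 1).
samples = ordered_gibbs(T=5000, evidence={2: 1})
estimate = sum(s[0] for s in samples) / len(samples)
```

For this toy joint the exact answer is 7/12, so the estimate can be checked by enumeration.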


Gibbs Sampling Example - BN

[Figure: Bayesian network over nodes X1, ..., X9]

$$X = \{X_1, X_2, \ldots, X_9\}, \quad E = \{X_9\}$$

Initial assignment: X1 = x_1^0, X2 = x_2^0, X3 = x_3^0, X4 = x_4^0, X5 = x_5^0, X6 = x_6^0, X7 = x_7^0, X8 = x_8^0


Gibbs Sampling Example - BN

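[Figure: Bayesian network over nodes X1, ..., X9]

$$X = \{X_1, X_2, \ldots, X_9\}, \quad E = \{X_9\}$$

$$x_1^1 \leftarrow P(X_1 \mid x_2^0, \ldots, x_8^0, x_9)$$

$$x_2^1 \leftarrow P(X_2 \mid x_1^1, \ldots, x_8^0, x_9)$$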


Answering Queries P(xi |e) = ?

• Method 1: count # of samples where Xi = xi (histogram estimator):

$$\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} \delta(x_i, x^t)$$

where δ is the Dirac delta function (1 when sample x^t assigns Xi = xi, 0 otherwise).

• Method 2: average probability (mixture estimator):

$$\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov_i^t)$$

• Mixture estimator converges faster (consider estimates for the unobserved values of Xi; prove via Rao-Blackwell theorem)
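The two estimators can be computed side by side from the same chain. The sketch below assumes a hypothetical joint over two binary variables, so the Markov blanket of X1 is just X2; the function names are illustrative only.

```python
import random

# Hypothetical joint over two binary variables (X1, X2); entries sum to 1.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def cond_x1(x2):
    """P(X1 = 1 | X2 = x2)."""
    return P[(1, x2)] / (P[(0, x2)] + P[(1, x2)])

def cond_x2(x1):
    """P(X2 = 1 | X1 = x1)."""
    return P[(x1, 1)] / (P[(x1, 0)] + P[(x1, 1)])

def run(T, seed):
    rng = random.Random(seed)
    x1, x2 = 0, 0
    hist = mix = 0.0
    for _ in range(T):
        p = cond_x1(x2)              # conditional used to draw X1 at this step
        x1 = 1 if rng.random() < p else 0
        x2 = 1 if rng.random() < cond_x2(x1) else 0
        hist += x1                   # Method 1: count samples with X1 = 1
        mix += p                     # Method 2: average the conditional itself
    return hist / T, mix / T

hist_est, mix_est = run(T=20000, seed=1)
```

Both values estimate P(X1 = 1), which is 0.5 for this table. The mixture estimate averages probabilities instead of 0/1 indicators, which is where the variance reduction comes from.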


Rao-Blackwell Theorem

Rao-Blackwell Theorem: Let random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution (R,L) and function g, the following result applies

$$Var[E\{g(R) \mid L\}] \le Var[g(R)]$$

for a function of interest g, e.g., the mean or covariance (Casella & Robert, 1996; Liu et al., 1995).

• the theorem makes a weak promise, but works well in practice!

• the improvement depends on the choice of R and L


Importance vs. Gibbs

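Gibbs:

$$x^t \leftarrow \hat{P}(X \mid e), \qquad \hat{P}(X \mid e) \rightarrow P(X \mid e), \qquad \hat{g}(X) = \frac{1}{T} \sum_{t=1}^{T} g(x^t)$$

Importance:

$$X^t \leftarrow Q(X \mid e), \qquad \hat{g} = \frac{1}{T} \sum_{t=1}^{T} \frac{g(x^t)\, P(x^t)}{Q(x^t)}, \qquad w^t = \frac{P(x^t)}{Q(x^t)}$$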


Gibbs Sampling: Convergence

• Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$

• Converges iff the chain is irreducible and ergodic

• Intuition - must be able to explore all states:
– if Xi and Xj are strongly correlated (Xi=0 ↔ Xj=0), then we cannot explore states with Xi=1 and Xj=1

• All conditions are satisfied when all probabilities are positive

• Convergence rate can be characterized by the second eigenvalue of the transition matrix
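The failure mode in the intuition bullet can be reproduced directly. The sketch below assumes a hypothetical joint in which X1 and X2 are tied by a deterministic constraint; the function name `gibbs_visited` is illustrative.

```python
import random

# Hypothetical joint with a deterministic constraint X1 == X2:
# only (0, 0) and (1, 1) have positive probability.
P = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def gibbs_visited(start, T, seed=0):
    """Run single-variable Gibbs updates and record every state reached."""
    rng = random.Random(seed)
    x = list(start)
    visited = {tuple(x)}
    for _ in range(T):
        for i in (0, 1):
            w = []
            for v in (0, 1):
                y = list(x)
                y[i] = v
                w.append(P[tuple(y)])
            x[i] = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
            visited.add(tuple(x))
    return visited

visited = gibbs_visited(start=(0, 0), T=1000)
```

Starting from (0, 0), every single-variable move toward (1, 1) passes through a zero-probability state, so the chain never leaves its starting state even though (1, 1) carries half of the probability mass.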


Gibbs: Speeding Convergence

Reduce dependence between samples (autocorrelation)

• Skip samples

• Randomize Variable Sampling Order

• Employ blocking (grouping)

• Multiple chains

Reduce variance (covered in the next section)


Blocking Gibbs Sampler

• Sample several variables together, as a block

• Example: Given three variables X,Y,Z, with domains of size 2, group Y and Z together to form a variable W={Y,Z} with domain size 4. Then, given sample (x^t,y^t,z^t), compute next sample:

$$x^{t+1} \leftarrow P(X \mid y^t, z^t) = P(X \mid w^t)$$

$$(y^{t+1}, z^{t+1}) = w^{t+1} \leftarrow P(Y, Z \mid x^{t+1})$$

+ Can improve convergence greatly when two variables are strongly correlated!

- Domain of the block variable grows exponentially with the #variables in a block!
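A minimal sketch of the X, {Y, Z} blocking scheme follows. The joint is hypothetical (Y and Z are made strongly coupled on purpose), and the conditionals are obtained by brute-force normalization.

```python
import random
from itertools import product

def joint(x, y, z):
    """Hypothetical unnormalized joint; Y and Z are strongly coupled."""
    return (50.0 if y == z else 1.0) * (2.0 if x == y else 1.0)

def blocked_gibbs(T, seed=0):
    rng = random.Random(seed)
    x, y, z = 0, 0, 0
    block_values = list(product((0, 1), repeat=2))  # domain of W = {Y, Z}, size 4
    samples = []
    for _ in range(T):
        # x^{t+1} <- P(X | y^t, z^t)
        w0, w1 = joint(0, y, z), joint(1, y, z)
        x = 1 if rng.random() < w1 / (w0 + w1) else 0
        # (y^{t+1}, z^{t+1}) <- P(Y, Z | x^{t+1}), drawn jointly as one block
        weights = [joint(x, yy, zz) for yy, zz in block_values]
        y, z = rng.choices(block_values, weights=weights, k=1)[0]
        samples.append((x, y, z))
    return samples

samples = blocked_gibbs(T=20000)
p_y1 = sum(s[1] for s in samples) / len(samples)  # estimate of P(Y = 1)
```

Because Y and Z move together, the chain can jump between (Y, Z) = (0, 0) and (1, 1) in one step, a move that single-variable updates would make only rarely. The joint is symmetric under flipping all bits, so P(Y = 1) is 0.5.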


Gibbs: Multiple Chains

• Generate M chains of size K

• Each chain produces an independent estimate Pm:

$$P_m(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$$

• Estimate P(xi|e) as the average of Pm(xi|e):

$$\hat{P} = \frac{1}{M} \sum_{m=1}^{M} P_m$$

Treat Pm as independent random variables.
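One practical use of treating the Pm as independent is a rough error bar. The sketch below assumes a hypothetical two-variable joint and uses one seed per chain; `chain_estimate` and the standard-error computation are illustrative additions, not part of the slides.

```python
import random

# Hypothetical joint over two binary variables (X1, X2); entries sum to 1.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def chain_estimate(K, seed):
    """P_m: mixture estimate of P(X1 = 1) from one Gibbs chain of length K."""
    rng = random.Random(seed)
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
    total = 0.0
    for _ in range(K):
        p1 = P[(1, x2)] / (P[(0, x2)] + P[(1, x2)])
        total += p1
        x1 = 1 if rng.random() < p1 else 0
        p2 = P[(x1, 1)] / (P[(x1, 0)] + P[(x1, 1)])
        x2 = 1 if rng.random() < p2 else 0
    return total / K

M, K = 10, 2000
estimates = [chain_estimate(K, seed=m) for m in range(M)]
p_hat = sum(estimates) / M                      # average of the P_m

# Sample variance across chains gives a rough standard error for p_hat.
var = sum((e - p_hat) ** 2 for e in estimates) / (M - 1)
std_err = (var / M) ** 0.5
```

The exact value of P(X1 = 1) for this table is 0.5.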


Gibbs Sampling Summary

• Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)

• Samples are dependent, form Markov Chain

• Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$

• Guaranteed to converge when all P > 0

• Methods to improve convergence:
– Blocking
– Rao-Blackwellised


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Sampling: Performance

• Gibbs sampling

– Reduce dependence between samples

• Importance sampling

– Reduce variance

• Achieve both by sampling a subset of variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation

• Exploit graph structure to manage the extra cost


Smaller Subset State-Space

• Smaller state-space is easier to cover

$$X = \{X_1, X_2, X_3, X_4\},\ D(X) = 64 \qquad\qquad X = \{X_1, X_2\},\ D(X) = 16$$


Smoother Distribution

[Figure: surface plots of P(X1,X2,X3,X4) (left) and of the marginal P(X1,X2) (right); probability bands 0-0.1, 0.1-0.2, 0.2-0.26.]


Speeding Up Convergence

• Mean Squared Error of the estimator:

$$MSE_Q[\hat{P}] = BIAS^2 + Var_Q[\hat{P}]$$

$$MSE_Q[\hat{P}] = \big(E_Q[\hat{P}] - P\big)^2 + Var_Q[\hat{P}]$$

• In case of unbiased estimator, BIAS=0

• Reduce variance ⇒ speed up convergence!


Rao-Blackwellisation

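$$X = R \cup L$$

$$\hat{g}(x) = \frac{1}{T} \{ h(x^1) + \cdots + h(x^T) \}$$

$$\tilde{g}(x) = \frac{1}{T} \{ E[h(x) \mid l^1] + \cdots + E[h(x) \mid l^T] \}$$

$$Var\{g(x)\} = Var\{E[g(x) \mid l]\} + E\{var[g(x) \mid l]\}$$

$$Var\{g(x)\} \ge Var\{E[g(x) \mid l]\}$$

$$Var\{\hat{g}(x)\} = \frac{Var\{h(x)\}}{T} \ge \frac{Var\{E[h(x) \mid l]\}}{T} = Var\{\tilde{g}(x)\}$$

Liu, Ch.2.3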
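The variance decomposition can be verified exactly on a small table. The joint below is hypothetical, and h is taken to be the indicator of R = 1 purely for illustration.

```python
# Hypothetical joint over (R, L), both binary; entries sum to 1.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def h(r, l):
    return float(r)  # function of interest: indicator that R = 1

def expect(f):
    """Expectation of f(R, L) under the joint P."""
    return sum(p * f(r, l) for (r, l), p in P.items())

def cond_mean(l):
    """E[h(R, L) | L = l]."""
    p_l = sum(p for (r, ll), p in P.items() if ll == l)
    return sum(p * h(r, ll) for (r, ll), p in P.items() if ll == l) / p_l

mean_h = expect(h)
var_h = expect(lambda r, l: (h(r, l) - mean_h) ** 2)                # Var{h}
var_cond_mean = expect(lambda r, l: (cond_mean(l) - mean_h) ** 2)   # Var{E[h | l]}
mean_cond_var = expect(lambda r, l: (h(r, l) - cond_mean(l)) ** 2)  # E{var[h | l]}
```

Here `var_h` equals `var_cond_mean + mean_cond_var`, so averaging the conditional expectation instead of h itself removes the `mean_cond_var` term from the per-sample variance.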


Rao-Blackwellisation

• X = R ∪ L

• Importance Sampling:

$$Var_Q\left\{ \frac{P(R, L)}{Q(R, L)} \right\} \ge Var_Q\left\{ \frac{P(R)}{Q(R)} \right\}$$

Liu, Ch.2.5.5

• Gibbs Sampling:
– autocovariances are lower (less correlation between samples)
– if Xi and Xj are strongly correlated (Xi=0 ↔ Xj=0), include only one of them in the sampling set

“Carry out analytical computation as much as possible” - Liu
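The importance-weight inequality can also be checked by exact enumeration. The target P and the uniform proposal Q below are hypothetical tables chosen only to make the arithmetic easy to follow.

```python
# Hypothetical target P and proposal Q over (R, L), both binary.
P = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def marginal_r(table, r):
    """Marginal probability of R = r, with L summed out."""
    return sum(p for (rr, _), p in table.items() if rr == r)

def var_under_q(weight):
    """Variance of weight(r, l) when (r, l) is drawn from Q."""
    mean = sum(q * weight(r, l) for (r, l), q in Q.items())
    return sum(q * (weight(r, l) - mean) ** 2 for (r, l), q in Q.items())

# Weights over the full space versus weights over R only.
var_full = var_under_q(lambda r, l: P[(r, l)] / Q[(r, l)])
var_collapsed = var_under_q(lambda r, l: marginal_r(P, r) / marginal_r(Q, r))
```

For these tables the full-space weights have variance 0.2 and the collapsed weights have variance 0.04, consistent with the inequality.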


Blocking Gibbs Sampler vs. Collapsed

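[Figure: graph over three variables X, Y, Z]

• Standard Gibbs:

$$P(x \mid y, z),\ P(y \mid x, z),\ P(z \mid x, y) \qquad (1)$$

• Blocking:

$$P(x \mid y, z),\ P(y, z \mid x) \qquad (2)$$

• Collapsed:

$$P(x \mid y),\ P(y \mid x) \qquad (3)$$

Faster convergence going from (1) to (3).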


Collapsed Gibbs Sampling

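Generating Samples

Generate sample c^{t+1} from c^t:

$$
\begin{aligned}
C_1 &= c_1^{t+1} \leftarrow P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e) \\
C_2 &= c_2^{t+1} \leftarrow P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e) \\
&\;\;\vdots \\
C_K &= c_K^{t+1} \leftarrow P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)
\end{aligned}
$$

In short, for i=1 to K:

$$C_i = c_i^{t+1} \text{ sampled from } P(c_i \mid c^t \setminus c_i, e)$$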


Collapsed Gibbs Sampler

Input: C ⊆ X, E=e

Output: T samples {c^t}

Fix evidence E=e, initialize c^0 at random

1. For t = 1 to T (compute samples)

2.   For i = 1 to K (loop through the cutset variables C1, ..., CK)

3.     c_i^{t+1} ← P(C_i | c^t \ c_i, e)

4.   End For

5. End For
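A minimal sketch of the collapsed sampler follows. It assumes a hypothetical joint over two cutset variables and one non-cutset variable L, with no evidence; brute-force summation over L stands in for the exact inference step (bucket elimination in the slides).

```python
import random

def joint(c1, c2, l):
    """Hypothetical unnormalized joint over cutset (C1, C2) and non-cutset L."""
    return (4.0 if c1 == l else 1.0) * (4.0 if c2 == l else 1.0)

def cutset_marginal(c1, c2):
    """P(c1, c2) up to a constant, with L summed out exactly.
    Brute-force summation stands in for bucket elimination here."""
    return sum(joint(c1, c2, l) for l in (0, 1))

def collapsed_gibbs(T, seed=0):
    rng = random.Random(seed)
    c = [rng.randint(0, 1), rng.randint(0, 1)]
    samples = []
    for _ in range(T):
        for i in range(len(c)):          # loop over cutset variables only
            w = []
            for v in (0, 1):
                trial = list(c)
                trial[i] = v
                w.append(cutset_marginal(*trial))
            c[i] = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
        samples.append(tuple(c))
    return samples

samples = collapsed_gibbs(T=20000)
agree = sum(1 for c1, c2 in samples if c1 == c2) / len(samples)
```

L never appears in the state of the chain; it only enters through the marginal. For this toy joint P(C1 = C2) is 0.68 by enumeration, which gives a quick sanity check on `agree`.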


Calculation Time

• Computing P(ci| ct\ci,e) is more expensive (requires inference)

• Trading #samples for smaller variance:

– generate more samples with higher covariance

– generate fewer samples with lower covariance

• Must control the time spent computing sampling probabilities in order to be time-effective!


Exploiting Graph Properties

Recall… computation time is exponential in the adjusted induced width of a graph

• a w-cutset is a subset of variables s.t. when they are observed, the induced width of the graph is w

• when the sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)

• cycle-cutset is a special case of w-cutset

Sampling a w-cutset ⇒ w-cutset sampling!


What If C=Cycle-Cutset ?

$$c^0 = \{x_2^0, x_5^0\}, \quad E = \{X_9\}$$

[Figure: the network over X1, ..., X9 (left) and the same network with the cutset nodes X2 and X5 removed (right)]

P(x2,x5,x9) – can compute using Bucket Elimination

P(x2,x5,x9) – computation complexity is O(N)


Computing Transition Probabilities

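[Figure: the network over X1, ..., X9]

Compute joint probabilities:

$$BE: P(x_2 = 0, x_3, x_9)$$

$$BE: P(x_2 = 1, x_3, x_9)$$

Normalize:

$$\alpha = \frac{1}{P(x_2 = 0, x_3, x_9) + P(x_2 = 1, x_3, x_9)}$$

$$P(x_2 = 0 \mid x_3) = \alpha\, P(x_2 = 0, x_3, x_9)$$

$$P(x_2 = 1 \mid x_3) = \alpha\, P(x_2 = 1, x_3, x_9)$$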
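The normalization step is plain arithmetic once the joint values are available. In the sketch below the two joint values are made-up stand-ins for the outputs of two bucket-elimination runs, one per value of x2.

```python
def normalize(joint_values):
    """Turn joint probabilities, one per value of the sampled variable,
    into the conditional distribution used as the Gibbs transition."""
    alpha = 1.0 / sum(joint_values.values())
    return {v: alpha * p for v, p in joint_values.items()}

# Hypothetical results of the two BE calls (value of x2 -> joint probability).
joint_values = {0: 0.012, 1: 0.036}
transition = normalize(joint_values)
```

With these numbers the transition distribution is 0.25 for x2 = 0 and 0.75 for x2 = 1. The cost of one Gibbs step is therefore one exact-inference call per domain value of the sampled variable.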


Cutset Sampling-Answering Queries

• Query: ∀ ci ∈ C, P(ci |e)=? same as Gibbs:

$$\hat{P}(c_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$$

computed while generating sample t using bucket tree elimination

• Query: ∀ xi ∈ X\C, P(xi |e)=?

$$\hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)$$

computed after generating sample t using bucket tree elimination


Cutset Sampling vs. Cutset Conditioning

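• Cutset Conditioning

$$P(x_i \mid e) = \sum_{c \in D(C)} P(x_i \mid c, e)\, P(c \mid e)$$

• Cutset Sampling

$$
\begin{aligned}
\hat{P}(x_i \mid e) &= \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e) \\
&= \sum_{c \in D(C)} P(x_i \mid c, e)\, \frac{count(c)}{T} \\
&= \sum_{c \in D(C)} P(x_i \mid c, e)\, \hat{P}(c \mid e)
\end{aligned}
$$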
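The relationship between the two formulas can be checked numerically. The tables below are hypothetical (a single binary cutset variable with evidence already absorbed), and i.i.d. draws of C stand in for the Gibbs samples to keep the sketch short.

```python
import random
from collections import Counter

# Hypothetical tables: P(c | e) for one binary cutset variable C,
# and P(x = 1 | c, e) for a non-cutset variable X.
p_c = {0: 0.3, 1: 0.7}
p_x1_given_c = {0: 0.9, 1: 0.2}

# Cutset conditioning: enumerate every cutset value with its exact weight.
exact = sum(p_x1_given_c[c] * p_c[c] for c in p_c)

# Cutset sampling: replace P(c | e) by the empirical frequency count(c) / T.
rng = random.Random(0)
T = 10000
draws = [1 if rng.random() < p_c[1] else 0 for _ in range(T)]  # stand-in for Gibbs samples
per_sample = sum(p_x1_given_c[c] for c in draws) / T           # (1/T) sum_t P(x | c^t, e)
counts = Counter(draws)
per_value = sum(p_x1_given_c[c] * counts[c] / T for c in p_c)  # sum_c P(x | c, e) count(c)/T
```

`per_sample` and `per_value` are the same quantity grouped two ways, and both approach `exact` (0.41 here) as the empirical frequencies approach P(c | e).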


Cutset Sampling Example

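[Figure: the network over X1, ..., X9]

Estimating P(x2|e) for sampling node X2 :

Sample 1: $x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9)$

Sample 2: $x_2^2 \leftarrow P(x_2 \mid x_5^1, x_9)$

Sample 3: $x_2^3 \leftarrow P(x_2 \mid x_5^2, x_9)$

$$\hat{P}(x_2 \mid x_9) = \frac{1}{3} \left[ P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9) \right]$$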


Cutset Sampling Example

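Estimating P(x3 |e) for non-sampled node X3 :

[Figure: the network over X1, ..., X9]

$c^1 = \{x_2^1, x_5^1\} \Rightarrow P(x_3 \mid x_2^1, x_5^1, x_9)$

$c^2 = \{x_2^2, x_5^2\} \Rightarrow P(x_3 \mid x_2^2, x_5^2, x_9)$

$c^3 = \{x_2^3, x_5^3\} \Rightarrow P(x_3 \mid x_2^3, x_5^3, x_9)$

$$\hat{P}(x_3 \mid x_9) = \frac{1}{3} \left[ P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9) \right]$$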


CPCS54 Test Results

MSE vs. #samples (left) and time (right)

Ergodic, |X|=54, D(Xi)=2, |C|=15, |E|=3

Exact Time = 30 sec using Cutset Conditioning

[Figure: CPCS54, n=54, |C|=15, |E|=3. Left panel: MSE vs. # samples; right panel: MSE vs. Time (sec). Curves: Cutset, Gibbs.]


CPCS179 Test Results

MSE vs. #samples (left) and time (right)

Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 <= D(Xi) <= 4, |E| = 35

Exact Time = 122 sec using Cutset Conditioning

[Figure: CPCS179, n=179, |C|=8, |E|=35. Left panel: MSE vs. # samples; right panel: MSE vs. Time (sec). Curves: Cutset, Gibbs.]


CPCS360b Test Results

MSE vs. #samples (left) and time (right)

Ergodic, |X| = 360, D(Xi)=2, |C| = 21, |E| = 36

Exact Time > 60 min using Cutset Conditioning

Exact Values obtained via Bucket Elimination

[Figure: CPCS360b, n=360, |C|=21, |E|=36. Left panel: MSE vs. # samples; right panel: MSE vs. Time (sec). Curves: Cutset, Gibbs.]


Random Networks

MSE vs. #samples (left) and time (right)

|X| = 100, D(Xi) =2, |C| = 13, |E| = 15-20

Exact Time = 30 sec using Cutset Conditioning

[Figure: RANDOM, n=100, |C|=13, |E|=15-20. Left panel: MSE vs. # samples; right panel: MSE vs. Time (sec). Curves: Cutset, Gibbs.]


Coding Networks: Cutset Transforms Non-Ergodic Chain to Ergodic

MSE vs. time (right)

Non-Ergodic, |X| = 100, D(Xi)=2, |C| = 13-16, |E| = 50

Sample Ergodic Subspace U={U1, U2,…Uk}

Exact Time = 50 sec using Cutset Conditioning

[Figure (left): coding network with node layers x1..x4, u1..u4, p1..p4, y1..y4]

[Figure (right): Coding Networks, n=100, |C|=12-14. MSE (log scale) vs. Time (sec). Curves: IBP, Gibbs, Cutset.]


Non-Ergodic Hailfinder

MSE vs. #samples (left) and time (right)

Non-Ergodic, |X| = 56, |C| = 5, 2 <= D(Xi) <= 11, |E| = 0

Exact Time = 2 sec using Loop-Cutset Conditioning

[Figure: HailFinder, n=56, |C|=5, |E|=1. Panels: MSE (log scale) vs. Time (sec) and MSE (log scale) vs. # samples. Curves: Cutset, Gibbs.]


CPCS360b - MSE

MSE vs. Time

Ergodic, |X| = 360, |C| = 26, D(Xi)=2

Exact Time = 50 min using BTE

[Figure: cpcs360b, N=360, |E|=[20-34], w*=20. MSE vs. Time (sec). Curves: Gibbs, IBP, |C|=26,fw=3, |C|=48,fw=2.]


Cutset Importance Sampling

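• Apply Importance Sampling over cutset C

$$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} \frac{P(c^t, e)}{Q(c^t)} = \frac{1}{T} \sum_{t=1}^{T} w^t$$

$$\hat{P}(c_i \mid e) = \alpha\, \frac{1}{T} \sum_{t=1}^{T} \delta(c_i, c^t)\, w^t$$

$$\hat{P}(x_i \mid e) = \alpha\, \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)\, w^t$$

where P(c^t,e) is computed using Bucket Elimination and α is a normalization constant

(Gogate & Dechter, 2005) and (Bidyuk & Dechter, 2006)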
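The weighted estimators can be sketched on a one-variable cutset. Everything below is an assumption made for illustration: the values of P(c, e) are given directly instead of being computed by bucket elimination, and the proposal Q is uniform.

```python
import random

# Hypothetical values of P(c, e) for a binary cutset variable C and fixed evidence e.
# In cutset sampling these would come from bucket elimination.
p_c_and_e = {0: 0.06, 1: 0.14}   # so P(e) = 0.20 and P(C = 1 | e) = 0.7
q = {0: 0.5, 1: 0.5}             # proposal Q(C)

rng = random.Random(0)
T = 20000
sum_w = 0.0
sum_w_c1 = 0.0
for _ in range(T):
    c = 1 if rng.random() < q[1] else 0      # c^t drawn from Q
    w = p_c_and_e[c] / q[c]                  # w^t = P(c^t, e) / Q(c^t)
    sum_w += w
    sum_w_c1 += w * (1 if c == 1 else 0)     # delta(c_i, c^t) * w^t

p_e_hat = sum_w / T                          # estimate of P(e)
p_c1_given_e_hat = sum_w_c1 / sum_w          # normalized estimate of P(C = 1 | e)
```

Dividing by the sum of the weights plays the role of the normalization constant: the posterior estimate is a ratio of two weighted averages over the same samples.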


Likelihood Cutset Weighting (LCS)

• Z=Topological Order{C,E}

• Generating sample t+1:

For Z_i ∈ Z do:

  If Z_i ∈ E

    z_i^{t+1} = z_i, z_i ∈ e

  Else

    z_i^{t+1} ← P(Z_i | z_1^{t+1}, ..., z_{i-1}^{t+1})

  End If

End For

• P(Z_i | z_1^{t+1}, ..., z_{i-1}^{t+1}) is computed while generating sample t using bucket tree elimination

• can be memoized for some number of instances K (based on memory available)

$$KL[P(C \mid e), Q(C)] \le KL[P(X \mid e), Q(X)]$$


Pathfinder 1


Pathfinder 2


Link


Summary

Importance Sampling

• i.i.d. samples

• Unbiased estimator

• Generates samples fast

• Samples from Q

• Reject samples with zero-weight

• Improves on cutset

Gibbs Sampling

• Dependent samples

• Biased estimator

• Generates samples slower

• Samples from $\hat{P}(X \mid e)$

• Does not converge in presence of constraints

• Improves on cutset


CPCS360b

[Figure: cpcs360b, N=360, |LC|=26, w*=21, |E|=15. MSE (log scale) vs. Time (sec). Curves: LW, AIS-BN, Gibbs, LCS, IBP.]

LW – likelihood weighting

LCS – likelihood weighting on a cutset


CPCS422b

[Figure: cpcs422b, N=422, |LC|=47, w*=22, |E|=28. MSE (log scale) vs. Time (sec). Curves: LW, AIS-BN, Gibbs, LCS, IBP.]

LW – likelihood weighting

LCS – likelihood weighting on a cutset


Coding Networks

[Figure: coding, N=200, P=3, |LC|=26, w*=21. MSE (log scale) vs. Time (sec). Curves: LW, AIS-BN, Gibbs, LCS, IBP.]

LW – likelihood weighting

LCS – likelihood weighting on a cutset