
Sampling Techniques for Probabilistic and Deterministic Graphical models

ICS 276, Spring 2018

Bozhena Bidyuk

Rina Dechter

Reading: Darwiche chapter 15, related papers


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Markov Chain

• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov property):

x1 → x2 → x3 → x4 → …

P(xt | x1, x2, …, xt−1) = P(xt | xt−1)

• If P(Xt | xt−1) does not depend on t (time-homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix), whose rows sum to one: ∑_x P(Xt = x | xt−1) = 1


Example: Drunkard's Walk

• A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability

D(X) = {0, 1, 2, …}

Transition matrix P(X):  P(n → n−1) = 0.5,   P(n → n+1) = 0.5


Example: Weather Model

D(X) = {rainy, sunny}

Transition matrix P(X):

              P(rainy)   P(sunny)
   rainy        0.9        0.1
   sunny        0.5        0.5

Example state sequence: rain → rain → rain → rain → sun
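To make the transition-matrix view concrete, here is a small illustrative sketch (not part of the slides) that simulates this two-state chain; the matrix is the one above, everything else is an assumption for the example.

```python
import random

# Transition matrix P(next | current) from the weather-model slide; rows sum to 1.
P = {
    "rainy": {"rainy": 0.9, "sunny": 0.1},
    "sunny": {"rainy": 0.5, "sunny": 0.5},
}

def step(state):
    """Sample the next state from the row P[state]."""
    r, acc = random.random(), 0.0
    for nxt, p in P[state].items():
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point round-off

def simulate(start, n):
    """Run the chain for n steps and return the visited states x1, ..., xn."""
    states, s = [], start
    for _ in range(n):
        s = step(s)
        states.append(s)
    return states

chain = simulate("sunny", 100_000)
# The stationary distribution of this matrix puts probability 5/6 ~ 0.83 on "rainy".
print("empirical P(rainy) =", chain.count("rainy") / len(chain))
```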


Multi-Variable System

• state is an assignment of values to all the variables

x^t = {x1^t, x2^t, …, xn^t}

X = {X1, X2, X3},  D(Xi) discrete, finite

(figure: the state at time t, (x1^t, x2^t, x3^t), transitions to the state at time t+1, (x1^{t+1}, x2^{t+1}, x3^{t+1}))


Bayesian Network System

• Bayesian Network is a representation of the joint probability distribution over 2 or more variables

X = {X1, X2, X3},   x^t = {x1^t, x2^t, x3^t}

(figure: a three-variable Bayesian network over X1, X2, X3, unrolled over time steps t and t+1)


Stationary Distribution: Existence

• If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka “fixed point”) if its entries sum up to 1 and satisfy:

π(xj) = ∑_{xi ∈ D(X)} π(xi) P(xj | xi)

• A finite state space Markov chain has a unique stationary distribution if and only if:
  – the chain is irreducible
  – all of its states are positive recurrent
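The defining equation can be checked numerically. Below is an illustrative sketch (not from the slides) that finds the stationary distribution of the two-state weather matrix above by power iteration, i.e., by repeatedly applying π(xj) = ∑_i π(xi) P(xj | xi).

```python
# Power iteration for the stationary distribution of a finite, time-homogeneous chain.
# States: 0 = rainy, 1 = sunny; P[i][j] = P(x_j | x_i), rows sum to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = [0.5, 0.5]  # any initial distribution works for an irreducible, aperiodic chain
for _ in range(1000):
    # pi(x_j) = sum_i pi(x_i) * P(x_j | x_i)
    pi = [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

print("pi =", pi, " sum =", sum(pi))  # approx [0.833, 0.167], entries sum to 1.0
```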


9

Irreducible

• A state x is irreducible if under the transition rule one has nonzero probability of moving from x to

any other state and then coming back in a finite number of steps

• If one state is irreducible, then all the states must be irreducible

(Liu, Ch. 12, pp. 249, Def. 12.1.1)


10

Recurrent

• A state x is recurrent if the chain returns to x

with probability 1

• Let M(x) be the expected number of steps to return to state x

• State x is positive recurrent if M(x) is finite. The recurrent states in a finite-state chain are positive recurrent.


Stationary Distribution Convergence

• Consider the Markov chain in the limit of infinitely many steps: P^{(n)} = P(xn | x0) = P0 P^n

• If the chain is both irreducible and aperiodic, then: π = lim_{n→∞} P^{(n)}

• Initial state is not important in the limit

“The most useful feature of a ‘good’ Markov chain is its fast forgetfulness of its past…” (Liu, Ch. 12.1)


Aperiodic

• Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}. Here, g.c.d. means the greatest common divisor of the integers in the set. If d(i) = 1 for all i, then the chain is aperiodic

• Positive recurrent, aperiodic states are ergodic

12


Markov Chain Monte Carlo

• How do we estimate P(X), e.g., P(X|e) ?

• Generate samples that form a Markov chain with stationary distribution π = P(X|e)

• Estimate π from samples (observed states): the visited states x^0, …, x^n can be viewed as “samples” from distribution π:

π̂(x) = (1/T) ∑_{t=1}^{T} δ(x, x^t),    π(x) = lim_{T→∞} π̂(x)


MCMC Summary

• Convergence is guaranteed in the limit

• Samples are dependent, not i.i.d.

• Convergence (mixing rate) may be slow

• The stronger the correlation between states, the slower the convergence!

• Initial state is not important, but… typically, we throw away first K samples - “burn-in”

14


Gibbs Sampling (Geman&Geman,1984)

• Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables

• Sample new variable value one variable at a time from the variable’s conditional distribution:

• Samples form a Markov chain with stationary distribution P(X|e)

P(Xi) = P(Xi | x1^{t+1}, …, x_{i−1}^{t+1}, x_{i+1}^t, …, xn^t) = P(Xi | x^t \ xi)


Gibbs Sampling: Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all instantiations of X=x (remember drunkard’s walk):

In one step we can reach instantiations that differ from current one by value assignment to at most one variable (assume randomized choice of variables Xi).


Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t:

X1 = x1^{t+1}  sampled from  P(X1 | x2^t, x3^t, …, xN^t, e)
X2 = x2^{t+1}  sampled from  P(X2 | x1^{t+1}, x3^t, …, xN^t, e)
X3 = x3^{t+1}  sampled from  P(X3 | x1^{t+1}, x2^{t+1}, x4^t, …, xN^t, e)
…
XN = xN^{t+1}  sampled from  P(XN | x1^{t+1}, x2^{t+1}, …, x_{N−1}^{t+1}, e)

In short, for i = 1 to N:  Xi = xi^{t+1}  sampled from  P(Xi | x^t \ xi, e)

Process all variables in some order.


Transition Probabilities in BN

Markov blanket:

P(Xi | x^t \ xi) = P(Xi | markov_i^t), where

P(xi | x^t \ xi) ∝ P(xi | pa_i) · ∏_{Xj ∈ ch_i} P(xj | pa_j)

markov_i = pa_i ∪ ch_i ∪ ( ∪_{Xj ∈ ch_i} pa_j )

Given its Markov blanket (parents, children, and the children's parents), Xi is independent of all other nodes.

Computation is linear in the size of the Markov blanket!


Ordered Gibbs Sampling Algorithm (Pearl, 1988)

Input: X, E=e
Output: T samples {x^t}

Fix evidence E=e, initialize x^0 at random
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     xi^{t+1} ← sampled from P(Xi | markov_i^t)
4.   End For
5. End For
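As a concrete illustration of steps 1–3, here is a minimal, self-contained sketch (not from the lecture) of ordered Gibbs sampling on a hypothetical three-variable chain A → B → C with binary variables and evidence C = 1; the network, CPTs, and function names are all illustrative. Each variable is resampled from its Markov-blanket distribution, as defined on the previous slide.

```python
import random

# Minimal sketch of ordered Gibbs sampling on a hypothetical Bayesian network
# A -> B -> C with binary variables and evidence C = 1 (names/CPTs are illustrative).
parents  = {"A": (), "B": ("A",), "C": ("B",)}
children = {"A": ["B"], "B": ["C"], "C": []}
# cpt[X][parent values] = P(X = 1 | parents)
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.7},
    "C": {(0,): 0.1, (1,): 0.8},
}
evidence = {"C": 1}

def prob(var, val, assign):
    """P(var = val | pa(var)) read off the CPT under assignment 'assign'."""
    p1 = cpt[var][tuple(assign[p] for p in parents[var])]
    return p1 if val == 1 else 1.0 - p1

def blanket_weight(var, val, assign):
    """Unnormalized P(var = val | Markov blanket): own CPT times the children's CPTs."""
    a = dict(assign)
    a[var] = val
    w = prob(var, val, a)
    for ch in children[var]:
        w *= prob(ch, a[ch], a)
    return w

def ordered_gibbs(T):
    # Fix evidence, initialize the remaining variables at random (x^0).
    x = {v: evidence.get(v, random.randint(0, 1)) for v in parents}
    samples = []
    for _ in range(T):
        for v in parents:                 # loop through variables in a fixed order
            if v in evidence:
                continue
            w1 = blanket_weight(v, 1, x)
            w0 = blanket_weight(v, 0, x)
            x[v] = 1 if random.random() < w1 / (w0 + w1) else 0
        samples.append(dict(x))
    return samples

samples = ordered_gibbs(20_000)
print("P(A=1 | C=1) ~", sum(s["A"] for s in samples) / len(samples))  # exact value ~0.513
```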


Gibbs Sampling Example - BN

(figure: Bayesian network over X1, …, X9)

X = {X1, X2, …, X9},   E = {X9}

X1 = x1^0    X2 = x2^0    X3 = x3^0    X4 = x4^0
X5 = x5^0    X6 = x6^0    X7 = x7^0    X8 = x8^0


Gibbs Sampling Example - BN

(figure: Bayesian network over X1, …, X9)

X = {X1, X2, …, X9},   E = {X9}

x1^1 ← P(X1 | x2^0, …, x8^0, x9)

x2^1 ← P(X2 | x1^1, x3^0, …, x8^0, x9)


Answering Queries: P(xi | e) = ?

• Method 1: count # of samples where Xi = xi (histogram estimator):

P̂(Xi = xi) = (1/T) ∑_{t=1}^{T} δ(xi, xi^t)        where δ is the Dirac delta f-n

• Method 2: average probability (mixture estimator):

P̂(Xi = xi) = (1/T) ∑_{t=1}^{T} P(Xi = xi | markov_i^t)

• Mixture estimator converges faster (consider estimates for the unobserved values of Xi; prove via Rao-Blackwell theorem)
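The difference between the two estimators is easy to see on a toy model. In the illustrative sketch below (not from the slides), Z is a fair coin and P(X=1 | Z) is 0.2 or 0.8, so P(X=1) = 0.5; the z^t stand in for the Markov blanket. Both estimators are unbiased, but the mixture estimator averages the conditional P(X=1 | z^t) instead of counting indicator values and therefore has noticeably smaller variance, which the Rao-Blackwell theorem on the next slide formalizes.

```python
import random, statistics

P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # toy conditional; marginally, P(X = 1) = 0.5

def draw_samples(T):
    """Return pairs (z^t, x^t); the z^t play the role of the Markov blanket."""
    out = []
    for _ in range(T):
        z = random.randint(0, 1)
        x = 1 if random.random() < P_X1_GIVEN_Z[z] else 0
        out.append((z, x))
    return out

def histogram_estimate(samples):
    return sum(x for _, x in samples) / len(samples)                 # (1/T) sum delta(x, x^t)

def mixture_estimate(samples):
    return sum(P_X1_GIVEN_Z[z] for z, _ in samples) / len(samples)   # (1/T) sum P(x | blanket^t)

runs_h = [histogram_estimate(draw_samples(200)) for _ in range(2000)]
runs_m = [mixture_estimate(draw_samples(200)) for _ in range(2000)]
print("histogram: mean", round(statistics.mean(runs_h), 3), "var", statistics.variance(runs_h))
print("mixture:   mean", round(statistics.mean(runs_m), 3), "var", statistics.variance(runs_m))
```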


Rao-Blackwell Theorem

Rao-Blackwell Theorem: Let random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution (R,L) and function g, the following result applies

Var{ E[g(R) | L] }  ≤  Var{ g(R) }

for a function of interest g, e.g., the mean or covariance (Casella & Robert, 1996; Liu et al., 1995).

• The theorem makes a weak promise, but works well in practice!
• The improvement depends on the choice of R and L


Importance vs. Gibbs

Gibbs:   x^t ← P̂(X | e),    ĝ(X) = (1/T) ∑_{t=1}^{T} g(x^t),    P̂(X | e) → P(X | e)

Importance:   x^t ← Q(X | e),    ĝ = (1/T) ∑_{t=1}^{T} g(x^t) · P(x^t) / Q(x^t),    with weights w^t = P(x^t) / Q(x^t)
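For contrast with Gibbs, here is a tiny self-contained importance-sampling sketch (illustrative, not from the slides): it estimates E_P[g(X)] for a small discrete target P by drawing from a different proposal Q and weighting each sample by w^t = P(x^t)/Q(x^t).

```python
import random

# Target P and proposal Q over a small discrete domain (illustrative numbers).
P = {0: 0.1, 1: 0.2, 2: 0.7}
Q = {0: 1/3, 1: 1/3, 2: 1/3}

def g(x):
    return x * x               # function of interest; exact E_P[g(X)] = 0.2 + 2.8 = 3.0

def sample_from(dist):
    r, acc = random.random(), 0.0
    for x, p in dist.items():
        acc += p
        if r < acc:
            return x
    return x

T = 100_000
total = 0.0
for _ in range(T):
    x = sample_from(Q)          # x^t ~ Q
    w = P[x] / Q[x]             # importance weight w^t
    total += g(x) * w
print("importance estimate of E_P[g(X)]:", total / T)   # ~3.0
```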


Gibbs Sampling: Convergence

25

• Sample from P̂(X|e) → P(X|e)

• Converges iff chain is irreducible and ergodic

• Intuition - must be able to explore all states:
  – if Xi and Xj are strongly correlated, Xi=0 ⇔ Xj=0, then we cannot explore states with Xi=1 and Xj=1

• All conditions are satisfied when all probabilities are positive

• Convergence rate can be characterized by the second eigenvalue of the transition matrix


Gibbs: Speeding Convergence

Reduce dependence between samples (autocorrelation)

• Skip samples

• Randomize Variable Sampling Order

• Employ blocking (grouping)

• Multiple chains

Reduce variance (covered in the next section)

26


Blocking Gibbs Sampler

• Sample several variables together, as a block

• Example: Given three variables X,Y,Z, with domains of size 2, group Y and Z together to form a variable W={Y,Z} with domain size 4. Then, given sample (xt,yt,zt), compute next sample:

x^{t+1} ← P(X | y^t, z^t) = P(X | w^t)

(y^{t+1}, z^{t+1}) = w^{t+1} ← P(Y, Z | x^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!

− Domain of the block variable grows exponentially with the # of variables in a block!


Gibbs: Multiple Chains

• Generate M chains of size K

• Each chain produces an independent estimate Pm:

Pm(xi | e) = (1/K) ∑_{t=1}^{K} P(xi | x^t \ xi)

• Treat the Pm as independent random variables.

• Estimate P(xi | e) as the average of the Pm(xi | e):

P̂(·) = (1/M) ∑_{m=1}^{M} Pm(·)
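A small illustrative sketch of the multiple-chains idea (the per-chain estimator and all numbers are placeholders): run M independent chains, compute each chain's estimate Pm, average them, and use the spread of the Pm values as a rough error bar.

```python
import random, statistics

# Illustrative: each "chain" is a short run of the two-state weather chain from earlier,
# and Pm is that chain's estimate of pi(rainy).
P = {"rainy": {"rainy": 0.9, "sunny": 0.1}, "sunny": {"rainy": 0.5, "sunny": 0.5}}

def run_chain(K):
    s, hits = "sunny", 0
    for _ in range(K):
        s = "rainy" if random.random() < P[s]["rainy"] else "sunny"
        hits += (s == "rainy")
    return hits / K                                     # Pm for this chain

M, K = 20, 5000
estimates = [run_chain(K) for _ in range(M)]            # M independent estimates Pm
print("averaged estimate:", statistics.mean(estimates))                 # ~5/6
print("std error across chains:", statistics.stdev(estimates) / M ** 0.5)
```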


Gibbs Sampling Summary

• Markov Chain Monte Carlo method(Gelfand and Smith, 1990, Smith and Roberts, 1993, Tierney, 1994)

• Samples are dependent, form Markov Chain

• Sample from P̂(X | e), which converges to P(X | e)

• Guaranteed to converge when all P > 0

• Methods to improve convergence:
  – Blocking
  – Rao-Blackwellisation


Overview

1. Probabilistic Reasoning/Graphical models

2. Importance Sampling

3. Markov Chain Monte Carlo: Gibbs Sampling

4. Sampling in presence of Determinism

5. Rao-Blackwellisation

6. AND/OR importance sampling


Sampling: Performance

• Gibbs sampling

– Reduce dependence between samples

• Importance sampling

– Reduce variance

• Achieve both by sampling a subset of variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation

• Exploit graph structure to manage the extra cost

31


Smaller Subset State-Space

• Smaller state-space is easier to cover

X = {X1, X2, X3, X4},  |D(X)| = 64        vs.        X = {X1, X2},  |D(X)| = 16


Smoother Distribution

(figure: bar charts of the joint distribution P(X1, X2, X3, X4) and of the smoother marginal P(X1, X2), with probability values in the range 0–0.26)


Speeding Up Convergence

• Mean Squared Error of the estimator:

MSE_Q[P̂] = BIAS² + Var_Q[P̂]

• In case of an unbiased estimator, BIAS = 0:

MSE_Q[P̂] = Var_Q[P̂] = E_Q[ (P̂ − E_Q[P̂])² ]

• Reduce variance → speed up convergence!
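The decomposition is easy to verify numerically. The illustrative sketch below estimates a Bernoulli mean with a deliberately biased estimator and checks that the measured MSE matches BIAS² + variance.

```python
import random, statistics

# Estimate p = P(X = 1) = 0.3 from n coin flips with a (deliberately) biased estimator
# p_hat = (#ones + 1) / (n + 2), then check MSE = BIAS^2 + Var empirically.
p, n, runs = 0.3, 50, 20_000
estimates = []
for _ in range(runs):
    ones = sum(random.random() < p for _ in range(n))
    estimates.append((ones + 1) / (n + 2))

bias = statistics.mean(estimates) - p
var = statistics.variance(estimates)
mse = statistics.mean([(e - p) ** 2 for e in estimates])
print("MSE          =", mse)
print("BIAS^2 + Var =", bias ** 2 + var)   # the two numbers should (nearly) agree
```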


Rao-Blackwellisation

X = R ∪ L

ĝ(x) = (1/T) [ h(x^1) + … + h(x^T) ]

g̃(x) = (1/T) [ E[h(x) | l^1] + … + E[h(x) | l^T] ]

Var{g(x)} = Var{ E[g(x) | l] } + E{ Var[g(x) | l] }  ≥  Var{ E[g(x) | l] }

Var{ĝ(x)} = (1/T) Var{h(x)}  ≥  (1/T) Var{ E[h(x) | l] } = Var{g̃(x)}

(Liu, Ch. 2.3)


Rao-Blackwellisation

• X = R ∪ L

• Importance Sampling:

Var_Q{ P(R, L) / Q(R, L) }  ≥  Var_Q{ P(R) / Q(R) }

• Gibbs Sampling:
  – autocovariances are lower (less correlation between samples)
  – if Xi and Xj are strongly correlated, Xi=0 ⇔ Xj=0, only include one of them in the sampling set

Liu, Ch.2.5.5

“Carry out analytical computation as much as possible” - Liu


Blocking Gibbs Sampler vs. Collapsed

• Standard Gibbs:
  P(x | y, z),  P(y | x, z),  P(z | x, y)      (1)

• Blocking:
  P(x | y, z),  P(y, z | x)                    (2)

• Collapsed (Z integrated out):
  P(x | y),  P(y | x)                          (3)

Faster convergence from (1) to (3).


Collapsed Gibbs Sampling

Generating Samples

Generate sample ct+1 from ct :

C1 = c1^{t+1}  sampled from  P(C1 | c2^t, c3^t, …, cK^t, e)
C2 = c2^{t+1}  sampled from  P(C2 | c1^{t+1}, c3^t, …, cK^t, e)
C3 = c3^{t+1}  sampled from  P(C3 | c1^{t+1}, c2^{t+1}, c4^t, …, cK^t, e)
…
CK = cK^{t+1}  sampled from  P(CK | c1^{t+1}, c2^{t+1}, …, c_{K−1}^{t+1}, e)

In short, for i = 1 to K:  Ci = ci^{t+1}  sampled from  P(Ci | c^t \ ci, e)


Collapsed Gibbs Sampler

Input: C ⊆ X, E=e
Output: T samples {c^t}

Fix evidence E=e, initialize c^0 at random
1. For t = 1 to T (compute samples)
2.   For i = 1 to K (loop through cutset variables)
3.     ci^{t+1} ← sampled from P(Ci | c^t \ ci, e)
4.   End For
5. End For
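Below is a compact, self-contained sketch of this sampler on a hypothetical four-variable chain A → B → C → D with evidence D = 1, sampling the cutset {A, C} and summing out B; the structure, CPTs, and names are illustrative. The brute-force sum over B stands in for the bucket (tree) elimination call that the algorithm assumes for computing P(Ci | c \ ci, e), exactly as described on the "Computing Transition Probabilities" slide below.

```python
import random
from math import prod
from itertools import product

# Hypothetical chain A -> B -> C -> D (binary), evidence D = 1, cutset = {A, C}.
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}
cpt = {  # cpt[X][parent values] = P(X = 1 | parents)
    "A": {(): 0.4},
    "B": {(0,): 0.3, (1,): 0.8},
    "C": {(0,): 0.2, (1,): 0.9},
    "D": {(0,): 0.1, (1,): 0.7},
}
evidence = {"D": 1}
cutset = ["A", "C"]
hidden = [v for v in parents if v not in cutset and v not in evidence]  # just ["B"]

def prob(var, val, assign):
    p1 = cpt[var][tuple(assign[p] for p in parents[var])]
    return p1 if val == 1 else 1.0 - p1

def marginal(fixed):
    """P(fixed assignment) with the hidden variables summed out (stand-in for BE)."""
    total = 0.0
    for vals in product([0, 1], repeat=len(hidden)):
        full = dict(fixed)
        full.update(zip(hidden, vals))
        total += prod(prob(v, full[v], full) for v in parents)
    return total

def cutset_gibbs(T):
    c = {v: random.randint(0, 1) for v in cutset}        # c^0 at random
    samples = []
    for _ in range(T):
        for v in cutset:                                  # for i = 1 to K
            joint = {}
            for val in (0, 1):
                fixed = dict(evidence)
                fixed.update(c)
                fixed[v] = val
                joint[val] = marginal(fixed)              # P(v = val, c \ v, e)
            c[v] = 1 if random.random() < joint[1] / (joint[0] + joint[1]) else 0
        samples.append(dict(c))
    return samples

samples = cutset_gibbs(20_000)
print("P(A=1 | D=1) ~", sum(s["A"] for s in samples) / len(samples))  # exact: ~0.517
```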


Calculation Time

• Computing P(ci| ct\ci,e) is more expensive (requires inference)

• Trading #samples for smaller variance:

– generate more samples with higher covariance

– generate fewer samples with lower covariance

• Must control the time spent computing sampling probabilities in order to be time-effective!

40


Exploiting Graph Properties

Recall… computation time is exponential in the adjusted induced width of a graph

• A w-cutset is a subset of variables s.t. when they are observed, the induced width of the graph is w

• when sampled variables form a w-cutset , inference is exp(w) (e.g., using Bucket Tree

Elimination)

• cycle-cutset is a special case of w-cutset

41

Sampling over a w-cutset → w-cutset sampling!


What If C=Cycle-Cutset ?

c^0 = {x2^0, x5^0},   E = {X9}

(figure: the network over X1, …, X9, and the same network with the cutset variables X2 and X5 removed)

P(x2, x5, x9) – can compute using Bucket Elimination

P(x2, x5, x9) – computation complexity is O(N)


Computing Transition Probabilities

(figure: network over X1, …, X9 with cutset {X2, X5} and evidence X9)

Compute joint probabilities:

BE:  P(x2 = 0, x5, x9)
BE:  P(x2 = 1, x5, x9)

Normalize:

P(x2 = 0 | x5, x9) = P(x2 = 0, x5, x9) / [ P(x2 = 0, x5, x9) + P(x2 = 1, x5, x9) ]
P(x2 = 1 | x5, x9) = P(x2 = 1, x5, x9) / [ P(x2 = 0, x5, x9) + P(x2 = 1, x5, x9) ]


Cutset Sampling-Answering Queries

• Query: ci ∈ C, P(ci | e) = ?  Same as Gibbs:

P̂(ci | e) = (1/T) ∑_{t=1}^{T} P(ci | c^t \ ci, e)      (computed while generating sample t using bucket tree elimination)

• Query: xi ∈ X \ C, P(xi | e) = ?

P̂(xi | e) = (1/T) ∑_{t=1}^{T} P(xi | c^t, e)           (computed after generating sample t using bucket tree elimination)


Cutset Sampling vs. Cutset Conditioning

• Cutset Sampling:

P̂(xi | e) = (1/T) ∑_{t=1}^{T} P(xi | c^t, e) = ∑_{c ∈ D(C)} P(xi | c, e) · (count(c) / T)

• Cutset Conditioning:

P(xi | e) = ∑_{c ∈ D(C)} P(xi | c, e) P(c | e)


Cutset Sampling Example

Estimating P(x2 | e) for sampling node X2:

(figure: network over X1, …, X9 with cutset {X2, X5} and evidence X9)

Sample 1:  x2^1 ← P(x2 | x5^0, x9)
Sample 2:  x2^2 ← P(x2 | x5^1, x9)
Sample 3:  x2^3 ← P(x2 | x5^2, x9)

P̂(x2 | x9) = (1/3) [ P(x2 | x5^0, x9) + P(x2 | x5^1, x9) + P(x2 | x5^2, x9) ]


Cutset Sampling Example

Estimating P(x3 | e) for non-sampled node X3:

(figure: network over X1, …, X9 with cutset {X2, X5} and evidence X9)

c^1 = {x2^1, x5^1}:  P(x3 | x2^1, x5^1, x9)
c^2 = {x2^2, x5^2}:  P(x3 | x2^2, x5^2, x9)
c^3 = {x2^3, x5^3}:  P(x3 | x2^3, x5^3, x9)

P̂(x3 | x9) = (1/3) [ P(x3 | x2^1, x5^1, x9) + P(x3 | x2^2, x5^2, x9) + P(x3 | x2^3, x5^3, x9) ]


CPCS54 Test Results

48

MSE vs. #samples (left) and time (right)

Ergodic, |X|=54, D(Xi)=2, |C|=15, |E|=3

Exact Time = 30 sec using Cutset Conditioning

(figure: two plots, MSE vs. # samples and MSE vs. time, for cutset sampling and Gibbs on CPCS54, n=54, |C|=15, |E|=3)


CPCS179 Test Results

49

MSE vs. #samples (left) and time (right)

Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ D(Xi) ≤ 4, |E| = 35

Exact Time = 122 sec using Cutset Conditioning

(figure: two plots, MSE vs. # samples and MSE vs. time, for cutset sampling and Gibbs on CPCS179, n=179, |C|=8, |E|=35)


CPCS360b Test Results

50

MSE vs. #samples (left) and time (right)

Ergodic, |X| = 360, D(Xi)=2, |C| = 21, |E| = 36

Exact Time > 60 min using Cutset Conditioning

Exact Values obtained via Bucket Elimination

(figure: two plots, MSE vs. # samples and MSE vs. time, for cutset sampling and Gibbs on CPCS360b, n=360, |C|=21, |E|=36)


Random Networks

51

MSE vs. #samples (left) and time (right)

|X| = 100, D(Xi) =2,|C| = 13, |E| = 15-20

Exact Time = 30 sec using Cutset Conditioning

(figure: two plots, MSE vs. # samples and MSE vs. time, for cutset sampling and Gibbs on RANDOM, n=100, |C|=13, |E|=15-20)


Coding Networks: Cutset Transforms a Non-Ergodic Chain to Ergodic

52

MSE vs. time (right)

Non-Ergodic, |X| = 100, D(Xi)=2, |C| = 13-16, |E| = 50

Sample Ergodic Subspace U={U1, U2,…Uk}

Exact Time = 50 sec using Cutset Conditioning

(figure: coding network fragment over nodes x1–x4, u1–u4, p1–p4, y1–y4)

(figure: MSE vs. time for IBP, Gibbs, and Cutset on coding networks, n=100, |C|=12-14)


Non-Ergodic Hailfinder

53

MSE vs. #samples (left) and time (right)

Non-Ergodic, |X| = 56, |C| = 5, 2 <=D(Xi) <=11, |E| = 0

Exact Time = 2 sec using Loop-Cutset Conditioning

(figure: two plots, MSE vs. time and MSE vs. # samples, for cutset sampling and Gibbs on HailFinder, n=56, |C|=5, |E|=1)


CPCS360b - MSE

54

(figure: MSE vs. time on cpcs360b, N=360, |E|=[20-34], w*=20, comparing Gibbs, IBP, and cutset sampling with |C|=26, fw=3 and |C|=48, fw=2)

MSE vs. Time

Ergodic, |X| = 360, |C| = 26, D(Xi)=2

Exact Time = 50 min using BTE


Cutset Importance Sampling

• Apply Importance Sampling over cutset C

P̂(e) = (1/T) ∑_{t=1}^{T} P(c^t, e) / Q(c^t) = (1/T) ∑_{t=1}^{T} w^t

P̂(ci | e) ∝ (1/T) ∑_{t=1}^{T} δ(ci, c^t) · w^t

P̂(xi | e) ∝ (1/T) ∑_{t=1}^{T} P(xi | c^t, e) · w^t

where P(c^t, e) is computed using Bucket Elimination

(Gogate & Dechter, 2005) and (Bidyuk & Dechter, 2006)
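The estimators above are easy to prototype. The illustrative sketch below reuses the hypothetical chain A → B → C → D with evidence D = 1, samples only the single-variable cutset {A} from a uniform proposal Q, computes P(a^t, e) by brute-force summation (standing in for Bucket Elimination), and forms the weighted estimates of P(e) and P(A=1 | e).

```python
import random
from math import prod
from itertools import product

# Hypothetical chain A -> B -> C -> D, binary, evidence D = 1 (same toy model as above);
# cutset = {A}; B and C are summed out exactly (stand-in for Bucket Elimination).
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}
cpt = {
    "A": {(): 0.4},
    "B": {(0,): 0.3, (1,): 0.8},
    "C": {(0,): 0.2, (1,): 0.9},
    "D": {(0,): 0.1, (1,): 0.7},
}

def prob(var, val, assign):
    p1 = cpt[var][tuple(assign[p] for p in parents[var])]
    return p1 if val == 1 else 1.0 - p1

def p_cutset_and_evidence(a):
    """P(A = a, D = 1): sum out B and C by brute force."""
    total = 0.0
    for b, c in product([0, 1], repeat=2):
        assign = {"A": a, "B": b, "C": c, "D": 1}
        total += prod(prob(v, assign[v], assign) for v in parents)
    return total

Q = {0: 0.5, 1: 0.5}                                # proposal over the cutset (uniform, illustrative)
T, sum_w, sum_w_a1 = 50_000, 0.0, 0.0
for _ in range(T):
    a = 1 if random.random() < Q[1] else 0          # a^t ~ Q
    w = p_cutset_and_evidence(a) / Q[a]             # w^t = P(a^t, e) / Q(a^t)
    sum_w += w
    sum_w_a1 += w * (a == 1)

print("P(e)     ~", sum_w / T)            # exact: 0.43
print("P(A=1|e) ~", sum_w_a1 / sum_w)     # exact: ~0.517
```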


Likelihood Cutset Weighting (LCS)

• Z = {C, E} in topological order

• Generating sample t+1:

For Zi ∈ Z do:
   If Zi ∈ E:
      zi^{t+1} = ei
   Else:
      zi^{t+1} ← P(Zi | z1^{t+1}, …, z_{i−1}^{t+1})
   End If
End For

• P(Zi | z1^{t+1}, …, z_{i−1}^{t+1}) is computed while generating sample t using bucket tree elimination

• can be memoized for some number of instances K (based on memory available)

KL[P(C|e), Q(C)] ≤ KL[P(X|e), Q(X)]
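For reference, here is a minimal sketch of plain likelihood weighting (LW) over the full variable set on the same hypothetical chain; it is not the LCS algorithm itself, but it follows the same topological-order loop. LCS instead restricts sampling to the cutset C and computes each P(Zi | z1, …, z_{i−1}) exactly with bucket tree elimination, which (per the KL inequality above) yields a better-concentrated proposal. All names and numbers below are illustrative.

```python
import random

# Plain likelihood weighting on the hypothetical chain A -> B -> C -> D, evidence D = 1:
# sample non-evidence variables in topological order from their CPTs, and multiply the
# weight by P(e_i | parents) whenever an evidence variable is reached.
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}
cpt = {
    "A": {(): 0.4},
    "B": {(0,): 0.3, (1,): 0.8},
    "C": {(0,): 0.2, (1,): 0.9},
    "D": {(0,): 0.1, (1,): 0.7},
}
evidence = {"D": 1}
order = ["A", "B", "C", "D"]             # a topological order of the network

def prob(var, val, assign):
    p1 = cpt[var][tuple(assign[p] for p in parents[var])]
    return p1 if val == 1 else 1.0 - p1

T, sum_w, sum_w_a1 = 50_000, 0.0, 0.0
for _ in range(T):
    assign, w = {}, 1.0
    for z in order:
        if z in evidence:
            assign[z] = evidence[z]
            w *= prob(z, evidence[z], assign)        # weight by the evidence likelihood
        else:
            assign[z] = 1 if random.random() < prob(z, 1, assign) else 0
    sum_w += w
    sum_w_a1 += w * (assign["A"] == 1)

print("P(e)     ~", sum_w / T)            # exact: 0.43
print("P(A=1|e) ~", sum_w_a1 / sum_w)     # exact: ~0.517
```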


Pathfinder 1

57


Pathfinder 2

58


Link

59


Summary

Importance Sampling:

• i.i.d. samples
• Unbiased estimator
• Generates samples fast
• Samples from Q
• Reject samples with zero weight
• Improves on cutset

Gibbs Sampling:

• Dependent samples
• Biased estimator
• Generates samples slower
• Samples from P̂(X|e)
• Does not converge in presence of constraints
• Improves on cutset


CPCS360b

61

cpcs360b, N=360, |LC|=26, w*=21, |E|=15

(figure: MSE vs. time for LW, AIS-BN, Gibbs, LCS, and IBP)

LW – likelihood weighting; LCS – likelihood weighting on a cutset


CPCS422b

62

cpcs422b, N=422, |LC|=47, w*=22, |E|=28

(figure: MSE vs. time for LW, AIS-BN, Gibbs, LCS, and IBP)

LW – likelihood weighting; LCS – likelihood weighting on a cutset


Coding Networks

63

coding, N=200, P=3, |LC|=26, w*=21

(figure: MSE vs. time for LW, AIS-BN, Gibbs, LCS, and IBP)

LW – likelihood weighting; LCS – likelihood weighting on a cutset