
1

Perfect Sampling: The Basics

Mark Huber
Dept. of Mathematics and Inst. of Statistics and Decision Sciences
Duke University
mhuber@math.duke.edu
www.math.duke.edu/~mhuber

2

The Problem

■ Start with a state space Ω and a finite measure μ

For discrete Ω, know the measure of singletons: μ({x}), ∀ x ∈ Ω

For continuous Ω, know the density f: μ(A) = ∫_A f(x) dx

■ Goal: generate random variates from π(·) = μ(·)/μ(Ω)

3

Usual approach

■ Construct a Markov chain with π as its stationary distribution

■ Can use Metropolis-Hastings or a Gibbs sampler without knowing μ(Ω)

■ Problem: difficult to find the mixing time of the Markov chain

4

Where this arises...

■ Computer Science: approximation algorithms for #P-complete problems, e.g. the permanent of a 0-1 matrix

■ Numerical Integration: acceptance/rejection needs a tight envelope; self-reducibility for somewhat smooth functions

■ Statistics: exact p-values, approximate probability intervals

■ Statistical Physics: Ising, hard-core, and random cluster models

5

Numerical Integration

Monte Carlo Integration

Step 1) Draw N points X_1, …, X_N from the area under f
Step 2) Form the order statistics
Step 3) Let X_(N/4), X_(3N/4) be the new limits (they capture 1/2 of the area)

6

Numerical Integration Part II

Monte Carlo Integration

Step 4) Draw N points from the new area under f
Step 5) Use the median as a new limit
Step 6) Repeat until the interval is small

7

Numerical Integration Part III

Let a, b be the final limits, r1 = initial interval length, r2 = b − a, and r = number of times the area was split.

Final Area estimate = (area under f over [a, b]) · 2^r

Each split keeps half the remaining area, so multiplying the final small area back up by 2^r recovers the total.
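The halving scheme above can be sketched in code. As a stand-in for a perfect sampler, this illustration draws from the area under f(x) = e^(−x) on [0, 10] by inverse transform; the function, interval, and all names are my own, not from the talk:

```python
import math
import random

def sample_under_f(a, b, rng):
    # Draw x with density proportional to f(x) = e^{-x} on [a, b]
    # (inverse-CDF; stands in for a perfect sampler from the area under f).
    u = rng.random()
    return -math.log(math.exp(-a) - u * (math.exp(-a) - math.exp(-b)))

def area_by_halving(a=0.0, b=10.0, n=10000, tol=0.01, seed=1):
    rng = random.Random(seed)
    r = 0                                  # number of times the area was split
    while b - a > tol:
        xs = sorted(sample_under_f(a, b, rng) for _ in range(n))
        a, b = xs[n // 4], xs[3 * n // 4]  # order statistics: new limits, ~1/2 the area
        r += 1
    mid = (a + b) / 2.0
    # The final sliver is nearly a rectangle; each split halved the area,
    # so multiplying by 2^r recovers the total.
    return math.exp(-mid) * (b - a) * 2 ** r

print(area_by_halving())   # the true area is 1 - e^{-10}, about 1.0
```

With n = 10000 points per round the per-round quantile noise is about half a percent, so the final estimate lands within a few percent of the truth.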

8

Numerical Integration Part IV

Best news: work grows linearly with dimension!

With d dimensions, repeat d times

Let δ = probability the algorithm fails

Work ≈ 3 log₂(r1/r2) · N · log(1/δ)

Error from Monte Carlo = O(1/√N)

9

Direct is not perfect

Exact Sampling:
• Direct Sampling: draws exactly from π by computing μ(Ω)
• Perfect Sampling: draws exactly from π without knowing or computing μ(Ω)

10

Acceptance Rejection

Acc/Rej Algorithm:
1) Let X ~ Unif(C) for a set C ⊇ A
2) If X ∈ A, accept; else goto step 1

Features:
• Running time is geometric: P(T > k·E[T]) ≤ e^(−k)
• No need to know μ(A)
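A minimal sketch of the algorithm (my own example: uniform draws from the unit disk A, proposing from the enclosing square C):

```python
import random

def acc_rej(in_A, sample_C, rng):
    """Draw uniformly from A by proposing uniform draws from C, a superset of A."""
    t = 0
    while True:
        t += 1
        x = sample_C(rng)
        if in_A(x):
            return x, t   # t is geometric; E[t] = measure(C) / measure(A)

# Example: uniform point in the unit disk, proposing from the square [-1, 1]^2.
in_disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
unit_square = lambda rng: (rng.uniform(-1, 1), rng.uniform(-1, 1))

rng = random.Random(0)
draws = [acc_rej(in_disk, unit_square, rng) for _ in range(1000)]
mean_t = sum(t for _, t in draws) / 1000
print(mean_t)   # expectation is 4/pi, about 1.27
```

Note that the acceptance test never needs the area of the disk, only membership, which is the point of the slide.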

11

Properties of Perfect Sampling

■ Generates exactly from desired distribution

■ Running time is random (a Las Vegas algorithm): P(T > 2k·E[T]) ≤ 1/2^k

■ No knowledge of the normalizing constant needed

■ Direct sampling, by contrast, uses knowledge of μ(Ω)

12

The Good News

■ Generates exactly from desired distribution

■ Can be used for continuous or discrete

■ True algorithms (Markov chain methods are not algorithms unless the mixing time is known)

■ Useful even if running time unknown

13

The Bad News

■ Not a magic solution to slow Markov chains

■ Requires more effort than Metropolis-Hastings

■ Methods more complex

14

Perfect Sampling Methods

Protocols: general frameworks for creating perfect sampling algorithms

Techniques: specific tricks and methods for turning the protocols into algorithms

15

Protocols

Coupling Markov chains:
• Coupling from the past (Propp, Wilson 1996)
• Fill's algorithm (Fill, Machida, Murdoch, Rosenthal 1999)
• Read-Once CFTP (Wilson 2000)
• High noise CFTP (Häggström, Steif 2000)

Modified Acceptance/Rejection:
• Popping Algorithms (Propp, Wilson 1998)
• Randomness Recycler (Fill, Huber 2001)

16

Techniques

How to build a better coupler:
• Monotonicity (Propp, Wilson 1996)
• Multigamma coupling (Murdoch, Green 1998)
• Bounding chains (Häggström, Nelander 1999), (H. 1999, 2004)
• Multishift coupling (Wilson 2000)

17

Coupling from the Past

How to describe a Markov chain?
• Update function (Propp, Wilson 1996)
• Stochastic Recursive Scheme (Borovkov, Foss 1992)
• Complete coupling (H. 2004)

Given a sequence of independent, identical uniforms U_1, U_2, … ~ Unif[0,1], a deterministic function f, and a starting state x_0:

X_0 = x_0,   X_{t+1} = f(X_t, U_t)
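As a concrete illustration (my own toy example, not from the talk): an update function for a ±1 walk on {0, …, 5}. Feeding the same uniforms to two different starting states is exactly the coupling that CFTP exploits:

```python
import random

N = 5  # the walk lives on {0, 1, ..., N}

def f(x, u):
    """Update function: one deterministic step given the uniform u."""
    y = x + 1 if u < 0.5 else x - 1
    return min(N, max(0, y))        # clamp (reflect with holding) at the ends

def run_chain(x0, uniforms):
    x = x0
    for u in uniforms:
        x = f(x, u)                 # X_{t+1} = f(X_t, U_t)
    return x

rng = random.Random(3)
us = [rng.random() for _ in range(200)]
# Same uniforms, different starts: once two paths meet, they stay together.
print(run_chain(0, us), run_chain(N, us))
```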

18

A simple example: the transposition chain on permutations

The chain just swaps two cards at random. If σ_1 and σ_2 differ by one transposition:

P(X_{t+1} = σ_1 ∣ X_t = σ_2) = 1/n²

Example permutation: 3 4 7 2 1 6 5 (card 4 is in position 2)

19

More than one complete coupling...

Method 1: Let i ~ Unif{1, 2, …, n}, j ~ Unif{1, 2, …, n}; swap cards i and j

Method 2: Let i ~ Unif{1, 2, …, n}, j ~ Unif{1, 2, …, n}; swap the cards at positions i and j

20

The best method

Let i ~ Unif{1, 2, …, n}, j ~ Unif{1, 2, …, n}; swap card i and the card in position j

The Key Fact: this chain can be run without knowing X_0!

21

Bounding chain

Begin with the unknown stationary state:

? ? ? ? ? ? ?

Choose card i = 3 and position j = 4, and swap card i into position j:

? ? ? 3 ? ? ?

22

Continuing: the next steps (card i into position j)

(start)        ? ? ? 3 ? ? ?
i = 2, j = 1:  2 ? ? 3 ? ? ?
i = 5, j = 6:  2 ? ? 3 ? 5 ?
i = 3, j = 3:  2 ? 3 ? ? 5 ?
i = 2, j = 6:  5 ? 3 ? ? 2 ?
⋮
(eventually)   5 3 2 1 6 7 4
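This tracking can be coded directly (a sketch; `None` plays the role of '?'):

```python
import random

def bc_update(perm, i, j):
    """'Swap card i with the card in position j' when some cards are unknown.
    perm[p] is the card in position p, or None if unknown."""
    if i in perm:
        p = perm.index(i)
        perm[p], perm[j] = perm[j], perm[p]   # an ordinary swap
    else:
        # Card i sat at an unknown position; now it is certainly at j, and
        # the old occupant of j has vanished to an unknown position.
        perm[j] = i

def run_until_known(n, seed=0):
    rng = random.Random(seed)
    perm, t = [None] * n, 0
    while None in perm:
        bc_update(perm, rng.randrange(1, n + 1), rng.randrange(n))
        t += 1
    return perm, t

perm, t = run_until_known(7)
print(perm, t)   # a fully determined permutation of 1..7
```

Each step can reveal at most one new card, and a known card can be lost again when an unknown card lands on it, which matches the slide's sequence of states.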

23

Once we have a coupler

Defn: F_t(x) := f( f( ⋯ f(x, U_0) ⋯ , U_{t−1}), U_t)   (so X_0 = x gives X_t = F_t(x))

CFTP(T)
1) Generate U_0, U_1, …, U_T iid Unif[0,1]
2) If F_T(Ω) = {X} for a single state X, then return X
3) Else let X_0 ← CFTP(2T); return F_T(X_0)
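Here is a minimal runnable sketch of the doubling recursion (written iteratively), applied to a monotone walk on {0, …, n} so that tracking only the minimum and maximum states suffices. The walk is my own toy example, not the permutation chain:

```python
import random

def f(x, u, n):
    """Monotone update for a +/-1 walk on {0, ..., n}, clamped at the ends."""
    y = x + 1 if u < 0.5 else x - 1
    return min(n, max(0, y))

def cftp(n=5, seed=0):
    rng = random.Random(seed)
    us = []            # us[k] drives the step from time -(k+1) to -k
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())     # fresh randomness for OLDER times only
        lo, hi = 0, n                   # extreme states at time -T
        for k in range(T - 1, -1, -1):  # run forward from time -T up to 0
            lo, hi = f(lo, us[k], n), f(hi, us[k], n)
        if lo == hi:
            return lo                   # every start coalesced: an exact draw
        T *= 2                          # go further into the past, REUSING us

print([cftp(seed=s) for s in range(10)])
```

The crucial design point, easy to get wrong, is that when T doubles the old uniforms stay attached to the same time steps; only the newly reached, older time steps get fresh randomness.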

24

It works

Thm: As long as F_T is constant with positive probability for some T, CFTP terminates with probability 1 and is a perfect sampling algorithm.

Drawbacks:
• Read-Twice: need to store U_0, U_1, …, U_T
• Noninterruptible: cannot abort the algorithm without biasing the sample
• As slow as the underlying Markov chain

25

Bounding chain: formal definition

Original state space: Ω ⊆ C^V

Bounding chain state space: Ω* ⊆ (2^C)^V

A chain Y on Ω* bounds a chain X on Ω if they can be coupled so that

X_t(v) ∈ Y_t(v) ∀ v ∈ V  ⟹  X_{t+1}(v) ∈ Y_{t+1}(v) ∀ v ∈ V

26

Other coupling methods

Monotonicity: an update function is monotonic if

X_t ≤ Y_t ⟹ f(X_t, U_t) ≤ f(Y_t, U_t)

(for some partial ordering of Ω)

Track only the minimum and maximum states; every other trajectory stays sandwiched between them.

(figure: max, min, and stationary trajectories)

27

More useful facts

■ Techniques exist for continuous state spaces: specially designed multigamma and multishift couplers

■ Bounding chains with Metropolis-Hastings work with continuous or discrete spaces

■ CFTP is always as good as (and usually faster than) acceptance/rejection

28

Multishift coupler

Use Metropolis-Hastings with a uniform proposal centered at the current location.

If the proposal is accepted for the entire interval, then the whole interval couples to a single point.

29

Acceptance/Rejection Revisited

Acc/Rej(n)
1) For i ∈ {1, …, n}
2)   Generate X(i) ~ Unif{1, …, n}
3)   If X(i) = X(j) for some j < i, start the algorithm over again
4) Return X

Example (n = 7): the attempt 3 4 1 4 fails at the repeated 4; the attempt 2 1 5 7 6 3 4 succeeds.
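In code (a sketch; restarting from scratch on every collision):

```python
import random

def acc_rej_perm(n, rng):
    """Draw a uniform permutation by rejection: restart on any repeated value."""
    trials = 0
    while True:
        trials += 1
        x = [rng.randrange(1, n + 1) for _ in range(n)]
        if len(set(x)) == n:
            return x, trials    # success prob is n!/n^n per attempt

x, trials = acc_rej_perm(6, random.Random(0))
print(x, trials)
```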

30

Why no one does this

Running Time

Each attempt succeeds with probability n!/n^n, so the expected number of attempts is n^n/n! ≈ e^n/√(2πn)

Solution: Recycle!

31

Randomness Recycler

Framework (Fill, Huber 2001):
• Build up the variate one coordinate at a time
• If the coordinate is accepted, keep going
• Else recycle what you can, and keep going

Example: permutations. Acc, Acc, Acc, Rej:
what is the distribution of [X_1 X_2 X_3] given that X_4 was rejected?

32

Effect of rejection

The events of interest:

A := ([X_1 X_2 X_3] = [x_1 x_2 x_3]),   B := (X_4 rejected)

The calculation (rejecting means hitting one of the 3 used values out of n):

P(A ∣ B) = P(A, B)/P(B) = [1/(n(n−1)(n−2)) · (3/n)] / (3/n) = 1/(n(n−1)(n−2))

The result: the prefix is still uniform, so we can recycle all but the last element!

33

The RR algorithm

Randomness Recycler for Permutations(n)
1) For i ∈ {1, …, n}
2)   Generate X(i) ~ Unif{1, …, n}
3)   If X(i) = X(j) for some j < i, goto step 2
4) Return X
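In code, the only change from Acc/Rej is that a collision rerolls just the offending coordinate (a sketch):

```python
import random

def rr_perm(n, rng):
    """Randomness Recycler for permutations: keep the accepted prefix."""
    x = []
    while len(x) < n:
        c = rng.randrange(1, n + 1)
        if c not in x:
            x.append(c)       # accept this coordinate and move on
        # else: reject, but the prefix is still uniform, so keep it
        #       and just redraw this one coordinate
    return x

print(rr_perm(7, random.Random(0)))
```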

34

RR for general problems

Bivariate chain: (X*_t, X_t), where X_t ~ π_{X*_t}

At each step, (X*_t, X_t) moves to (X*_acc, X_{t+1}) on acceptance, or to (X*_rej, X_t) on rejection

Quit when X*_t reaches the index of the target distribution π

35

Notes on RR

■ The user gets to choose X*_acc

■ The closer X*_acc is to X*_t, the higher the probability of acceptance

■ Generally, the closer X*_acc is to X*_t, the farther away X*_rej must be

36

Compare and Contrast

Randomness Recycler:
• Faster
• Interruptible
• Read-Once
• Forward direction (no recursion)
• Harder to build
• Related to strong stationary times of Markov chains

Coupling from the past:
• Can utilize existing Markov chains for problems
• Noninterruptible, Read-Twice, slower (can fix exactly one of these with modifications)
• Related to coupling times of Markov chains

37

Running times for permutations

Randomness Recycler:
n/n + n/(n−1) + ⋯ + n/1 ≤ n(ln n + 1)

Coupling from the past:
(n/n)² + (n/(n−1))² + ⋯ + (n/1)² ≤ n²·π²/6

38

Another Example

Hard core gas model
(physics: gases; computer science: network failures)

Assign each node v in a graph a value x(v) ∈ {0, 1}. Given activity constants λ(v) > 0:

π(x) ∝ [∏_{nodes v} λ(v)^{x(v)}] · [∏_{v~w} 1(x(v) + x(w) ≤ 1)]

The first factor rewards activity; the second is the independent set constraint.

39

In pictures

(figure: with λ(v) identically small, a typical configuration has almost all nodes 0; with λ(v) identically large, a typical configuration is a large independent set of 1s)

40

Markov chain

Gibbs Sampler
1) Let v ~ Unif(set of nodes)
2) Let U ~ Unif[0,1]
3) If all neighbors w of v have X(w) = 0 and U ≥ 1/(1 + λ(v)), let X(v) ← 1
4) Else let X(v) ← 0

(figure: an example update with U = 0.8)
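The Gibbs sampler above in code (a sketch; the 3-node path graph, constant λ, and all names are my own):

```python
import random

def gibbs_step(x, nbrs, lam, rng):
    """One hard-core Gibbs update: pick a node, resample its value."""
    v = rng.randrange(len(nbrs))
    u = rng.random()
    if all(x[w] == 0 for w in nbrs[v]) and u >= 1 / (1 + lam):
        x[v] = 1
    else:
        x[v] = 0

# Path graph on 3 nodes: 0 - 1 - 2.
nbrs = [[1], [0, 2], [1]]
x = [0, 0, 0]
rng = random.Random(0)
hits = 0
for t in range(100000):
    gibbs_step(x, nbrs, 1.0, rng)
    hits += x[1]
print(hits / 100000)   # stationary P(center occupied) = 1/5 at lambda = 1
```

Starting from the empty set, every update preserves the independent set constraint, since a 1 is only ever placed when all neighbors are 0.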

41

Bounding chain

A Good Move

When U ≤ 1/(1 + λ(v)), always set X(v) ← 0: no matter what the neighbors hold, the ? at v becomes a known 0.

42

Bounding chain

A Bad Move

If U ≥ 1/(1 + λ(v)) and some neighbor of v is unknown, then we don't know X(v), so v gets a ?.

When λ ≤ 2/(Δ − 2), there are more good moves than bad moves.
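The good-move/bad-move bookkeeping above can be sketched in code ('?' marks an unknown value; the 6-cycle and small λ are my own example). Once no '?' remains, every starting state has been driven to the same configuration, which is exactly the coalescence test used inside CFTP:

```python
import random

def hc_bc_step(y, nbrs, lam, rng):
    """Bounding chain update for the hard-core Gibbs sampler."""
    v = rng.randrange(len(nbrs))
    u = rng.random()
    if u <= 1 / (1 + lam):
        y[v] = 0                            # good move: v becomes a known 0
    elif all(y[w] == 0 for w in nbrs[v]):
        y[v] = 1                            # neighbors certainly 0: a known 1
    else:
        y[v] = '?'                          # some neighbor might be 1: unknown

# Cycle on 6 nodes; small lambda, so good moves dominate.
nbrs = [[5, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
y = ['?'] * 6
rng = random.Random(0)
t = 0
while '?' in y:
    hc_bc_step(y, nbrs, 0.2, rng)
    t += 1
print(y, t)   # all values known: the chains have coalesced
```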

43

Randomness Recycler

RR for the Hard Core Gas model
1) Start with λ'(v) = 0, ∀ v
2) While ∃ v : λ'(v) = 0
3)   Let U ~ Unif[0,1]
4)   Let X(v) ← 1(U ≥ 1/(1 + λ(v)))
5)   If no conflicts, accept and set λ'(v) ← λ(v)
6)   Else reject and recycle

44

Recycle

Suppose U = 0.4 causes a conflict: the proposed 1 at v has a neighbor already set to 1.

Recycle by resetting the neighbors of v: set X(w) ← 0 and λ'(w) ← 0 for each neighbor w.
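The RR for the hard core model, with the recycling rule above, can be sketched as follows (the graph, constant λ, and helper names are mine; on a conflict the proposal is discarded and v's neighbors are reset):

```python
import random

def rr_hardcore(nbrs, lam, rng):
    """Randomness Recycler sketch for the hard-core model."""
    n = len(nbrs)
    x = [0] * n
    raised = [False] * n        # raised[v]: lambda'(v) has been set to lambda(v)
    while not all(raised):
        v = raised.index(False)
        u = rng.random()
        proposal = 1 if u >= 1 / (1 + lam) else 0
        if proposal == 1 and any(x[w] == 1 for w in nbrs[v]):
            for w in nbrs[v]:   # reject: recycle by resetting v's neighbors
                x[w] = 0
                raised[w] = False
        else:
            x[v] = proposal     # accept and set lambda'(v) = lambda(v)
            raised[v] = True
    return x

# Single edge, lambda = 1: the three independent sets are equally likely.
print(rr_hardcore([[1], [0]], 1.0, random.Random(0)))
```

On the single-edge graph this recursion can be solved by hand, and each of the three independent sets comes out with probability exactly 1/3.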

45

RR Type II

Hard core gas model
(physics: gases; computer science: network failures)

π(x) ∝ [∏_{nodes v} λ(v)^{x(v)}] · [∏_{v~w} 1(x(v) + x(w) ≤ 1)]

The first RR relaxed the activity factor (raising λ'(v) node by node); the second RR relaxes the independent set constraint instead.

46

Randomness Recycler II

RR for the Hard Core Gas model using edges
1) Start with no edges in the graph
2) While some edges are not in the graph
3)   Add an edge back to the graph
4)   If no conflicts, accept the edge
5)   Else reject and recycle

47

Recycle Edges Part I

(figure: adding an edge back whose two endpoints are both 1 causes a conflict)

48

Recycle Edges Part II

1) Remove the contaminated edges
2) Reroll the values for the affected nodes

(figure: the conflicted region loses its edges and its nodes are redrawn)

49

Analysis

Method                    Runtime      Valid for
Randomness Recycler       O(n)         λ ≤ 4/(3Δ − 2)
Coupling from the past    O(n ln n)    λ ≤ 2/(Δ − 2)

Δ := maximum degree of the graph

50

Applications of perfect sampling

■ Ising model (random cluster model)
■ Proper colorings of a graph (Potts model)
■ Widom-Rowlinson model
■ Move ahead 1 chain
■ Hard core gas model (discrete and continuous)
■ Soft (penetrating) core gas models
■ Linear extensions of a partial order
■ Regular, dense restricted permutations of a graph
■ Sink free orientations of a graph
■ Bayesian analysis: unknown mixture problems
■ Multivariate normals in the positive orthant
■ Exact p-values for nonparametric regression
■ Orthonormal model selection

51

What we know

■ True algorithms: no need to know the mixing time of anything

■ Several different types: Coupling from the past (and variants), the Randomness Recycler

■ No knowledge of normalizing constant needed

■ Works on continuous and discrete problems

52

What we would like to know

■ Crossover potential: could monotonicity be used with RR? How about bounding chains?

■ Conductance: couplings are related to CFTP, and strong stationary times to RR. The third major method for proving rapid mixing of Markov chains is conductance. Can we design a protocol that uses conductance?

■ Must the running time be random?

53

References

A. A. Borovkov and S. G. Foss. Stochastically recursive sequences and their generalizations. Siberian Advances in Mathematics, 2(1):16–81, 1992.

J. A. Fill and M. L. Huber. The Randomness Recycler: a new approach to perfect sampling. In Proc. 41st Sympos. on Foundations of Comp. Sci., 503–511, 2000.

O. Häggström and J. E. Steif. Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Computing, 9:425–439, 2000.

M. Huber. Perfect sampling using bounding chains. Annals of Applied Probability, 2004, to appear.

M. Huber. Perfect sampling with bounding chains. PhD thesis, Cornell University, 1999.

54

References

D. J. Murdoch and P. J. Green. Exact sampling from a continuous state space. Scand. J. Statist., 25(3):483–502, 1998.

J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures & Algorithms, 9(1–2):223–252, 1996.

D. B. Wilson. How to couple from the past using a read-once source of randomness. Random Structures & Algorithms, 16(1):85–113, 2000.

My website: http://www.math.duke.edu/~mhuber

David Wilson's perfect sampling page: http://dbwilson.com/exact/
