June 2, 2015 1 MARKOV CHAIN MONTE CARLO: A Workhorse for Modern Scientific Computation Xiao-Li Meng Department of Statistics Harvard University

April 18, 2023 1

MARKOV CHAIN MONTE CARLO:A Workhorse for Modern Scientific Computation

Xiao-Li MengDepartment of Statistics

Harvard University

2April 18, 2023

Introduction

3April 18, 2023

Applications of Monte Carlo

Physics

Chemistry

Astronomy

Biology

Environment

Engineering

Traffic

Sociology

Education

Psychology

Arts

Linguistics

History

Medical Science

Economics

Finance

Management

Policy

Military

Government

Business

…

4April 18, 2023

Monte Carlo 之应用A Chinese version of the previous slide

物理

化学

天文

生物

环境

工程

交通

社会

教育

心理

人文

语言

历史

医学

经济

金融

管理

政策

军事

政府

商务…

5April 18, 2023

Monte Carlo Integration

Suppose we want to compute

where f(x) is a probability density. If

we have samples x1,…,xn » f(x), we can estimate I by

6April 18, 2023

Monte Carlo Optimization

We want to maximize p(x) Simulate from

f(x) / p(x).

As ! 1, the simulated

draws will be more and

more concentrated around

the maximizer of p(x)

-5 0 5

0.0

0.1

0.2

0.3

0.4

x

dens

ity

-5 0 5

0.00

0.05

0.10

0.15

x

dens

ity

-5 0 50.0

e+00

8.0

e-09

x

dens

ity

=1

=2

=20

7April 18, 2023

Simulating from a Distribution

What does it mean?Suppose a random variable (随机变量 ) X can only take two values:

Simulating from the distribution of X means that we want a collection of 0’s and 1’s:

such that about 25% of them are 0’s and about 75%of them are 1’s, when n, the simulation size is large.

The {xi, i = 1,…,n} don’t have to be independent

8April 18, 2023

Simulating from a Complex Distribution

Continuous variable X, described by a density function f(x)

Complex: the form of f(x) the dimension of x

9April 18, 2023

Markov Chain Monte Carlo

where {U(t), t=1,2,…} are identically and independently distributed.

Under regularity conditions,

So We can treat {x(t), t= N0, …, N} as an approximate sample from f(x), the stationary/limiting distribution.

10April 18, 2023

Gibbs Sampler

Target density: We know how to simulate form the conditional

distributions

For the previous example,

N(,2)Normal Distribution

“Bell Curve”

11April 18, 2023

12April 18, 2023

Statistical Inference Point Estimator:

Variance Estimator:

Interval Estimator:

13April 18, 2023

Gibbs Sampler (k steps)

Select an initial value (x1(0), x2

(0) ,…, xk(0)).

For t = 0,1,2, …, N Step 1: Draw x1

(t+1) from f(x1|x2(t), x3

(t),…, xk(t))

Step 2: Draw x2(t+1) from f(x2|x1

(t+1), x3(t),…, xk

(t))

…

Step K:Draw xk(t+1) from f(xk|x1

(t+1), x2(t+1),…, xk-1

(t+1))

Output {(x1(t), x2

(t),…, xk(t) ): t= 1,2,…,N}

Discard the first N0 draws

Use {(x1(t), x2

(t),…, xk(t) ): t= N0+1,2,…,N} as (approximate) samples from

f(x1, x2,…, xk).

14April 18, 2023

Data Augmentation

We want to simulate from

But this is just the marginal distribution of

So once we have simulations:

{(x(t), y(t): t= 1,2,…,N)},

we also obtain draws:

{x(t): t= 1,2,…,N)}

0 2 4 6

0.0

0.1

0.2

0.3

0.4

0.5

0.6

x

de

nsi

ty

15April 18, 2023

A More Complicated Example

16April 18, 2023

Metropolis-Hastings algorithm

Simulate from an approximate distribution q(z1|z2), then

Step 0: Select z(0);

Now for t = 1,2,…,N, repeat

Step 1: draw z1 from q(z1|z2=z(t))

Step 2: Calculate

Step 3: set

Discard the first N0 draws

Accept

reject

17April 18, 2023

M-H Algorithm: An Intuitive Explanation

Assume , then

18April 18, 2023

M-H: A Terrible Implementation

We choose q(z|z2)=q(z)=(x-4)(y-4)

Step 1: draw x » N(4,1), y » N(4,1);

Dnote z1=(x,y)

Step 2: Calculate

Step 3: draw u » U[0,1]

Let

19April 18, 2023

Why is it so bad?

20April 18, 2023

M-H: A Better Implementation

Starting from some arbitrary (x(0),y(0))

Step 1: draw x » N(x(t),1), y » N(y(t),1)“random walk”

Step 2: dnote z1=(x,y), calculate

Step 3: draw u » U[0,1]Let

21April 18, 2023

Much Improved!

22April 18, 2023

Further Discussion

How large should N0 and N be?

Not an easy problem! Key difficulty:

multiple modes in unknown area

We would like to know all (major) modes, as well as their surrounding mass.

Not just the global mode

We need “automatic, Hill-climbing” algorithms.

) The Expectation/Maximization (EM) Algorithm, which can be viewed as a deterministic version of Gibbs Sampler.

23April 18, 2023

Drive/Drink Safely,

Don’t become a Statistic;

Go to Graduate School,

Become a Statistician!

Documents

June 2, 2015 1 MARKOV CHAIN MONTE CARLO: A Workhorse for Modern Scientific Computation Xiao-Li Meng Department of Statistics Harvard University