Machine Learning Lecture 23: Statistical Estimation with Sampling

Iain Murray’s MLSS lecture on videolectures.net: http://videolectures.net/mlss09uk_murray_mcmc/

Page 1

Machine Learning

Lecture 23: Statistical Estimation with Sampling

Iain Murray’s MLSS lecture on videolectures.net: http://videolectures.net/mlss09uk_murray_mcmc/

Page 2

Today

• In service of EM and in graphical models

• Sampling
  – A technique to approximate the expected value of a distribution

• Gibbs Sampling
  – Sampling of latent variables in a graphical model


Page 3

What is the average height of professors of CS at Queens College?

• What’s the size of the set C we average over?


Page 4

What is the average height of students at Queens College?

• What’s the size of the set C we average over?


Page 5

What is the average height of people in Queens?


Page 6

So we’re comfortable approximating statistical parameters…

• Why don’t we use this to do inference in complicated graphical models?

• Or in settings where it is difficult to count everything?


Page 7

Statistical sampling

• Make a prediction about a variable, x, based on observed data D (one standard way to write this is sketched below).
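One common way to write such a prediction (a sketch only; the slide gives no formula, and the model parameters θ are my assumption) is as an average over the posterior on parameters:

p(x \mid \mathcal{D}) = \int p(x \mid \theta)\, p(\theta \mid \mathcal{D})\, d\theta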


Page 8

Expected Values

• Want to know the expected value of a distribution.– E[p(t | x)] is a classification

problem• We can calculate p(x), but

integration is difficult.• Given a graphical model

describing the relationship between variables, we’d like to generate E[p(x)] where x is only partially observed.


Page 9

Sampling

• We have a representation of p(x) and f(x), but integration is intractable.

• E[f] is difficult as an integral, but easy as a sum (see the estimator below).

• Randomly select points from the distribution p(x) and use these as representative of the distribution of f(x).

• It turns out that if correctly sampled, only 10-20 points can be sufficient to estimate the mean and variance of a distribution.
  – Samples must be independently drawn
  – The expectation may be dominated by regions of high probability, or high function values
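In the usual Monte Carlo notation (not spelled out on this slide), the intractable integral is replaced by an average over L independent samples x^(l) drawn from p(x):

\mathbb{E}[f] = \int f(x)\, p(x)\, dx \;\approx\; \frac{1}{L} \sum_{l=1}^{L} f\!\left(x^{(l)}\right), \qquad x^{(l)} \sim p(x)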


Page 10

Monte Carlo Example

• Sampling techniques can solve difficult integration problems.

• What is the area of a circle with radius 1?
  – What if you don’t know trigonometry?


Page 11

Monte Carlo Estimation

• How can we approximate the area of a circle if we have no trigonometry?

• Take a random x and a random y between -1 and 1
  – Sample x and y uniformly.

• Determine if x² + y² ≤ 1
• Repeat many times.
• Count the number of times that the inequality is true.
• Divide that count by the number of samples, then multiply by the area of the square (4) to estimate the circle’s area (see the sketch below).
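A minimal Python sketch of this estimate (variable names and the sample count are illustrative, not from the slides):

import random

def estimate_circle_area(num_samples=100_000):
    """Estimate the area of the unit circle by sampling uniformly in the square [-1, 1] x [-1, 1]."""
    hits = 0
    for _ in range(num_samples):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:      # the point lands inside the circle
            hits += 1
    # Fraction of the square covered by the circle, times the square's area (4).
    return 4.0 * hits / num_samples

print(estimate_circle_area())  # roughly 3.14, i.e. pi * r^2 with r = 1

With enough samples the estimate converges to π, the true area of the unit circle.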


Page 12

How is sampling used in EM?

• E-Step
  – What are the responsibilities in a GMM?
  – p(x_hidden | x_observed)

• M-Step
  – Re-estimate parameters based on a convex optimization.
  – Get new parameters


Page 13

Sampling in a Graphical Model

• Sample each variable from its distribution given its already-sampled parents (roots come from their marginals)

• Sample children after their parents (ancestral sampling; a small sketch follows the figure)


[Figure: a graphical model with nodes A, B, C, D, E]
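A hedged Python sketch of this ancestral-sampling idea, using a made-up two-node network (the structure and the probabilities are illustrative, not the A-E model in the figure):

import random

# Toy network: Rain -> WetGrass, both binary. Probabilities are invented for illustration.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

def ancestral_sample():
    """Sample parents before children, each from its (conditional) distribution."""
    rain = random.random() < P_RAIN                 # root node: sample from its marginal
    wet = random.random() < P_WET_GIVEN_RAIN[rain]  # child: condition on the sampled parent
    return {"rain": rain, "wet_grass": wet}

samples = [ancestral_sample() for _ in range(10_000)]
print(sum(s["wet_grass"] for s in samples) / len(samples))  # close to 0.2*0.9 + 0.8*0.1 = 0.26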

Page 14

How do you sample from a distribution???

• There are known algorithms for sampling from many standard distributions

• Use this book: http://luc.devroye.org/rnbookindex.html


Page 15

Basic Algorithm

• Sample uniformly from x.
• The probability mass to the left of x is uniformly distributed.

[Figure: a density p(x) with sample points x1, x2, x3, x4]

Page 16

Basic Algorithm

• y(u) = h⁻¹(u), where h is the cumulative distribution function

• h is not always easy to calculate or invert (see the sketch below for a case where it is)

[Figure: the cumulative distribution h(x), rising from 0 to 1, over sample points x1, x2, x3, x4]
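A small Python sketch of the inverse-CDF recipe for a case where h is easy to invert; the exponential distribution is my choice of example and is not mentioned on the slides:

import math
import random

def sample_exponential(rate=1.0):
    """Inverse-transform sampling: draw u ~ Uniform(0, 1), then return h^{-1}(u).

    For the exponential distribution h(x) = 1 - exp(-rate * x),
    so h^{-1}(u) = -ln(1 - u) / rate.
    """
    u = random.random()
    return -math.log(1.0 - u) / rate

samples = [sample_exponential(rate=2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 0.5, the mean 1/rate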

Page 17

Rejection Sampling

• The distribution p(x) is easy to evaluate
  – As in a graphical model representation
• But it is difficult to integrate.

• Identify a simpler distribution q(x) such that kq(x) bounds p(x), and sample x0 from q.
  – q is called the proposal distribution.

• Generate another sample u from a uniform distribution between 0 and kq(x0).
  – If u ≤ p(x0), accept the sample
    • E.g. use it in the calculation of an expectation of f
  – Otherwise, reject the sample
    • E.g. omit it from the calculation of an expectation of f

(A short Python sketch of this procedure follows.)
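A minimal Python sketch of rejection sampling; the unnormalized target (a Beta(2, 5) shape) and the constant k are my illustrative choices, not from the slides:

import random

def p_tilde(x):
    """Unnormalized target density on [0, 1] (the shape of a Beta(2, 5))."""
    return x * (1.0 - x) ** 4 if 0.0 <= x <= 1.0 else 0.0

K = 0.1  # chosen so K * q(x) >= p_tilde(x) everywhere (the maximum of p_tilde is about 0.082)

def rejection_sample():
    while True:
        x0 = random.uniform(0.0, 1.0)     # sample x0 from the proposal q = Uniform(0, 1)
        u = random.uniform(0.0, K * 1.0)  # sample u uniformly between 0 and k*q(x0)
        if u <= p_tilde(x0):              # under the target curve: accept
            return x0                     # otherwise reject and try again

samples = [rejection_sample() for _ in range(50_000)]
print(sum(samples) / len(samples))  # close to 2/7, the mean of Beta(2, 5)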


Page 18

Rejection Sampling Example


Page 19

Importance Sampling

• One problem with rejection sampling is that you lose information when throwing out samples.

• If we are only looking for the expected value of f(x), we can incorporate unlikely samples of x in the calculation.

• Again use a proposal distribution, q, to approximate the expected value.
  – Weight each sample drawn from q by how likely it would have been under p, i.e. the ratio p(x)/q(x) (see the estimator below).
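In standard notation (not written out on the slide), the weights are the ratios p(x^(l)) / q(x^(l)):

\mathbb{E}[f] = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx \;\approx\; \frac{1}{L} \sum_{l=1}^{L} \frac{p(x^{(l)})}{q(x^{(l)})}\, f\!\left(x^{(l)}\right), \qquad x^{(l)} \sim q(x)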


Page 20

Graphical Example of Importance Sampling


Page 21

Markov Chain Monte Carlo

• Markov Chain:
  – p(x1 | x2, x3, x4, x5, …) = p(x1 | x2)

• For MCMC sampling, start in a state z(0).
• At each step, draw a candidate sample z* based on the previous state z(m).
• Accept this step with some probability based on a proposal distribution.
  – If the step is accepted: z(m+1) = z*
  – Else: z(m+1) = z(m)

• Or only accept if the sample is consistent with an observed value


Page 22

Markov Chain Monte Carlo

• Goal: p(z(m)) = p*(z) as m → ∞
  – MCMCs that have this property are called ergodic.
  – This implies that the sampled distribution converges to the true distribution.

• Need to define a transition function to move from one state to the next.
  – How do we draw a sample at step m+1 given the state at step m?
  – Often, z(m+1) is drawn from a Gaussian with mean z(m) and a constant variance.


Page 23

Markov Chain Monte Carlo

• Goal: p(z(m)) = p*(z) as m → ∞
  – MCMCs that have this property are ergodic.

• Transition probabilities that satisfy detailed balance leave p*(z) invariant (see the condition below); combined with ergodicity, this guarantees convergence to the true distribution.
  – Chains that satisfy detailed balance are also called reversible.
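For reference, the detailed balance condition relating the target p*(z) and the transition distribution T (the standard form, not shown on the slide) is:

p^{*}(z)\, T(z \to z') = p^{*}(z')\, T(z' \to z) \quad \text{for all } z, z'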


Page 24

Metropolis-Hastings Algorithm

• Assume the current state is z(m).

• Draw a sample z* from q(z|z(m))

• Accept z* with probability A(z*, z(m)) = min(1, p̃(z*) q(z(m) | z*) / (p̃(z(m)) q(z* | z(m)))), where p̃ is the (possibly unnormalized) target density

• Often use a normal distribution for q
  – Tradeoff between convergence and acceptance rate, controlled by the proposal variance (a sketch follows)
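A hedged Python sketch of Metropolis-Hastings with a symmetric Gaussian random-walk proposal (so the q terms in the acceptance ratio cancel); the target density and the step size are illustrative assumptions:

import math
import random

def p_tilde(z):
    """Unnormalized target: here a standard normal shape, but any evaluable density works."""
    return math.exp(-0.5 * z * z)

def metropolis_hastings(num_samples=50_000, step=1.0):
    z = 0.0                                    # initial state z(0)
    samples = []
    for _ in range(num_samples):
        z_star = random.gauss(z, step)         # draw z* from q(z | z(m)) = N(z(m), step^2)
        accept = min(1.0, p_tilde(z_star) / p_tilde(z))  # symmetric q cancels in the ratio
        if random.random() < accept:
            z = z_star                         # accepted: z(m+1) = z*
        samples.append(z)                      # rejected: z(m+1) = z(m), still kept as a sample
    return samples

samples = metropolis_hastings()
mean = sum(samples) / len(samples)
print(mean, sum((s - mean) ** 2 for s in samples) / len(samples))  # roughly 0 and 1

A larger step size lowers the acceptance rate but explores faster; a smaller one accepts often but mixes slowly, which is the tradeoff mentioned above.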


Page 25

Gibbs Sampling

• We’ve been treating z as a vector to be sampled as a whole

• However, in high dimensions, the accept probability becomes vanishingly small.

• Gibbs sampling allows us to sample one variable at a time, based on the other variables in z.


Page 26

Gibbs sampling

• Assume a distribution over 3 variables.

• Generate a new sample for each variable conditioned on all of the other variables.
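Written out for three variables (standard notation consistent with the description above), one sweep of Gibbs sampling at step m is:

z_1^{(m+1)} \sim p\!\left(z_1 \mid z_2^{(m)}, z_3^{(m)}\right)
z_2^{(m+1)} \sim p\!\left(z_2 \mid z_1^{(m+1)}, z_3^{(m)}\right)
z_3^{(m+1)} \sim p\!\left(z_3 \mid z_1^{(m+1)}, z_2^{(m+1)}\right)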


Page 27

Gibbs Sampling in a Graphical Model

• The appeal of Gibbs sampling in a graphical model is that the conditional distribution of a variable depends only on its Markov blanket (its parents, its children, and its children’s other parents), not on the entire network.

• Gibbs sampling fixes n-1 variables and generates a sample for the nth.

• If each variable is assumed to have an easily sampled conditional distribution, we can just sample from the conditionals given by the graphical model, starting from some initial states (a generic sketch follows).
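A generic Python sketch of that loop for a two-variable toy model (a correlated Gaussian pair whose conditionals are known in closed form; the model and the correlation value are my illustrative assumptions):

import random

RHO = 0.8  # illustrative correlation of the toy bivariate normal target

def gibbs(num_sweeps=50_000):
    """Gibbs sampling for a 2D standard normal with correlation RHO.

    Each full conditional is itself Gaussian: z1 | z2 ~ N(RHO * z2, 1 - RHO^2), and symmetrically.
    """
    z1, z2 = 0.0, 0.0                     # arbitrary initial state
    sd = (1.0 - RHO ** 2) ** 0.5
    samples = []
    for _ in range(num_sweeps):
        z1 = random.gauss(RHO * z2, sd)   # fix z2, resample z1 from p(z1 | z2)
        z2 = random.gauss(RHO * z1, sd)   # fix z1, resample z2 from p(z2 | z1)
        samples.append((z1, z2))
    return samples

samples = gibbs()
print(sum(a * b for a, b in samples) / len(samples))  # close to RHO, the target correlation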


Page 28

Gibbs Sampling

• Fix 4 of the variables, sample the 5th

• Repeat until convergence


[Figure: the same five-node graphical model with nodes A, B, C, D, E]

Page 29

Next Time

• Perceptrons

• Neural Networks
