
Explaining the Basics of Mean Field Variational Approximation for Statisticians



Page 1: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Explaining “Explaining Variational Approximation”

Based on the paper

“Explaining Variational Approximation”

JT Ormerod, MP Wand (2010)

Presentation by Wayne Tai Lee

Page 2: Explaining the Basics of Mean Field Variational Approximation for Statisticians

My Goal

● Convert the paper into a short presentation

● Not covering the examples (they are really helpful, though!)

● Intuition and motivation only

Page 3: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● In Statistics, Bayesian solutions always involve the posterior:

p(Θ|data) = p(data | Θ) p(Θ) / p(data)

Page 4: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(Θ|data): posterior, belief after updating with data

Page 5: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(Θ|data): posterior, belief after updating with data

p(data | Θ): likelihood, data generation

Page 6: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(Θ|data): posterior, belief after updating with data

p(data | Θ): likelihood, data generation

p(Θ): prior, belief before updating with data

Page 7: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(Θ|data): posterior, belief after updating with data

p(data | Θ): likelihood, data generation

p(Θ): prior, belief before updating with data

p(data): “normalizing constant” to ensure posterior is a density function

Page 8: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(data | Θ): likelihood, specified by you

p(Θ): prior, specified by you

p(data): a nasty integral that cannot be calculated explicitly in general

Page 9: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Why do we want to use variational approximations?

● p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(data | Θ): likelihood, specified by you

p(Θ): prior, specified by you

p(data): a nasty integral that cannot be calculated explicitly in general

● Consequence:

– Posterior often has no analytical expression
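The nasty integral, concretely, is the marginal likelihood obtained by integrating the numerator over Θ:

```latex
p(\text{data}) \;=\; \int p(\text{data} \mid \Theta)\, p(\Theta)\, d\Theta
```

For a high-dimensional Θ this integral rarely has a closed form.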

Page 10: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Most popular alternative

● To obtain the posterior or any related statistic:

– Sample the posterior via MCMC methods

Page 11: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Most popular alternative

● To obtain the posterior or any related statistic:

– Sample the posterior via MCMC methods

● Pros:

– Can get arbitrarily close to the posterior with enough samples (resource/time intensive)

● Cons:

– Lots of tuning necessary

– Time consuming to run

Page 12: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Variational Approximation

● Intuition:

– Approximate the posterior with a class of functions that are easier to deal with mathematically

– Find the function in this class that minimizes the KL divergence to the posterior

Page 13: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Variational Approximation

● Intuition:

– Approximate the posterior with a class of functions that are easier to deal with mathematically

– Find the function in this class that minimizes the KL divergence to the posterior

● Pros:

– Suuuuper fast

● Cons:

– No guarantees on closeness

Page 14: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Big Picture

Method to get posterior | MCMC                                                 | Variational Method
------------------------|------------------------------------------------------|-----------------------------------
Strategy                | Sampling                                             | Optimization
Solution                | Asymptotically exact                                 | Approximation with no bounds
Speed                   | Often slow                                           | Fast
The “catch”             | Tuning and convergence assessment require experience | Need tractable mathematical setup

Page 15: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Explaining Variational Approximation

● Change notation: p(y) = p(data)

● Use q(Θ) to approximate p(Θ|y)

● Will assume a product family of functions for q(Θ):

– q(Θ) = q1(Θ1) q2(Θ2) ... qp(Θp)

– Each qi(Θi) is a density

Page 16: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Max Lower Bound = Min KL-Divergence
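The identity behind this slide, in the paper's notation: the log marginal likelihood splits into a lower bound plus a KL term,

```latex
\log p(y)
  = \underbrace{\int q(\Theta)\,\log\frac{p(y,\Theta)}{q(\Theta)}\,d\Theta}_{\text{lower bound}}
  \;+\;
  \underbrace{\int q(\Theta)\,\log\frac{q(\Theta)}{p(\Theta \mid y)}\,d\Theta}_{\mathrm{KL}\{q \,\|\, p(\cdot \mid y)\} \;\ge\; 0}
```

Since log p(y) is a fixed number, raising the lower bound over q is exactly the same as lowering the KL divergence.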

Page 17: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Sanity Check: Optimal Solution is THE solution

● With no restriction on the family, the optimal q(Θ) is p(Θ|y) itself, since the KL divergence is zero exactly when q(Θ) = p(Θ|y)

● Important: this is a very general solution for arbitrary dependence/distribution of Θ and y

● Product form of q(Θ) allows us to divide and conquer!

Page 18: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Focus on each Θ separately

Page 19: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Focus on each Θ separately


Page 20: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Focus on each Θ separately


Page 21: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Focus on each Θ separately
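The step these slides build up, sketched in the paper's notation: with the product form, the lower bound separates so that only two terms involve q1,

```latex
\int q_1(\Theta_1)\, E_{-\Theta_1}\{\log p(y,\Theta)\}\, d\Theta_1
  \;-\; \int q_1(\Theta_1)\, \log q_1(\Theta_1)\, d\Theta_1
  \;+\; \text{terms not involving } q_1
```

where E_{−Θ1} denotes the expectation over Θ2, ..., Θp under their q densities; the same holds for every other Θi.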

Page 22: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Our assumptions so far

● Product form of q(Θ) allowed us to optimize each term separately

● Each qi(Θi) being a density allows the other factors to integrate out nicely

Page 23: Explaining the Basics of Mean Field Variational Approximation for Statisticians

How to convert into an optimization problem that we can solve?

Page 24: Explaining the Basics of Mean Field Variational Approximation for Statisticians

We've only learned one trick...

Page 25: Explaining the Basics of Mean Field Variational Approximation for Statisticians

We've only learned one trick...

● Optimal q1(Θ1) is then proportional to exp( E_{−Θ1}{ log p(y, Θ) } )

Page 26: Explaining the Basics of Mean Field Variational Approximation for Statisticians

To get a density, just normalize
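Written out, the normalized solution is:

```latex
q_1^*(\Theta_1) \;=\; \frac{\exp\left[ E_{-\Theta_1}\{\log p(y,\Theta)\} \right]}
                           {\int \exp\left[ E_{-\Theta_1}\{\log p(y,\Theta)\} \right] d\Theta_1}
```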

Page 27: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Unfold our definitions

Page 28: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Focus on Θ1

Page 29: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Repeat for Θi

● General Mean Field Variational Approximation solution: the density that is proportional to exp( E_{−Θi}{ log p(y, Θ) } )
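One subtlety: each E_{−Θi} is taken under the other factors' current q densities, so the optimal factors are coupled. The fix is coordinate ascent, cycling through

```latex
q_i(\Theta_i) \;\leftarrow\; \text{density} \;\propto\; \exp\left[ E_{-\Theta_i}\{\log p(y,\Theta)\} \right],
\qquad i = 1, \ldots, p,
```

until the lower bound on log p(y) stops increasing; each update can only raise the bound, so the iteration converges.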

Page 30: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Similarity to the Full Conditional in Gibbs Sampling

● Optimal qi(Θi) is proportional to exp( E_{−Θi}{ log p(Θi | Θ−i, y) } ), the exponentiated expected log full conditional

● Need to do algebra until this is “tractable”

– i.e. something we recognize as a standard distribution that is easily normalized

– This is where the “setup” becomes important

Page 31: Explaining the Basics of Mean Field Variational Approximation for Statisticians

For example

● If exp( E_{−Θi}{ log p(y, Θ) } ) resembles exp(Θi^2 · c) for some c < 0, then we know this must be a Gaussian density!
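For instance (a generic completing-the-square fact, not a specific example from the paper): if the algebra yields a log density that is quadratic in Θi,

```latex
q_i(\Theta_i) \;\propto\; \exp\!\left( c\,\Theta_i^2 + b\,\Theta_i \right),\; c < 0
\quad\Longrightarrow\quad
q_i = N\!\left( -\tfrac{b}{2c},\; -\tfrac{1}{2c} \right)
```

so the normalizing constant comes for free from the known Gaussian form.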

Page 32: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Final Solution

● The product of all qi(Θi) is then our approximation to p(Θ|y)

● Naturally doesn't do well when there's strong dependence between the Θi

● You should try the examples in the paper!

Page 33: Explaining the Basics of Mean Field Variational Approximation for Statisticians

First Example

● Data generated as:

– Y | μ, σ^2 ~ N(μ, σ^2)

● Priors:

– μ ~ N(m, s^2)

– σ^2 ~ InvGamma(a, b)
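For this conjugate model the two optimal factors come out as q(μ) = N(μ_q, s²_q) and q(σ²) = InvGamma(A_q, B_q), and the iteration above reduces to a handful of moment updates. A minimal sketch in Python, assuming illustrative hyperparameter defaults and a fixed iteration count in place of monitoring the lower bound (both my choices, not the paper's):

```python
import numpy as np

def mfvb_normal(y, m=0.0, s2=100.0, a=0.01, b=0.01, n_iter=50):
    """Mean field variational approximation for
    y_i | mu, sig2 ~ N(mu, sig2),  mu ~ N(m, s2),  sig2 ~ InvGamma(a, b).
    Returns parameters of q(mu) = N(mu_q, s2_q) and q(sig2) = InvGamma(A_q, B_q)."""
    n, ybar = len(y), np.mean(y)
    A_q = a + n / 2.0              # shape of q(sig2); never changes across sweeps
    E_inv_sig2 = 1.0               # starting guess for E_q[1/sig2]
    for _ in range(n_iter):
        # q(mu) update: precision-weighted blend of the data and the prior
        s2_q = 1.0 / (n * E_inv_sig2 + 1.0 / s2)
        mu_q = s2_q * (n * ybar * E_inv_sig2 + m / s2)
        # q(sig2) update: E_q[sum_i (y_i - mu)^2] = sum_i (y_i - mu_q)^2 + n * s2_q
        B_q = b + 0.5 * (np.sum((y - mu_q) ** 2) + n * s2_q)
        E_inv_sig2 = A_q / B_q     # mean of 1/sig2 when sig2 ~ InvGamma(A_q, B_q)
    return mu_q, s2_q, A_q, B_q

# Toy run on simulated data
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)
mu_q, s2_q, A_q, B_q = mfvb_normal(y)
print(f"q(mu) = N({mu_q:.3f}, {s2_q:.4f}),  q(sig2) = InvGamma({A_q:.1f}, {B_q:.2f})")
```

Note how each update only needs expectations under the other factor, exactly the coupling described earlier.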

Page 34: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Gibbs Sampling vs Variational Samples

[Figure: posterior densities from Gibbs sampling vs. the variational approximation, shown for N = 100 and N = 20]

Page 35: Explaining the Basics of Mean Field Variational Approximation for Statisticians

Discussion

● Hard to know when the approximation is poor relative to the true posterior...