
Deep Variational Inference

FLARE Reading Group Presentation, Wesley Tansey, 9/28/2016

What is Variational Inference?

● Want to estimate some distribution, p*(x)
● Too expensive to estimate
● Approximate it with a tractable distribution, q(x)

What is Variational Inference?

● Fit q(x) inside of p*(x)
● Centered at a single mode
  ○ q(x) is unimodal here
  ○ VI is essentially a MAP estimate in this case

What is Variational Inference?

● Mathematically, minimize the KL divergence:

  KL(q || p*) = Σ_x q(x) log(q(x) / p*(x))

What is Variational Inference?

● Still hard! p*(x) usually has a tricky normalizing constant
● Use the unnormalized p~ instead

What is Variational Inference?

● Mathematically:

  KL(q || p*) = Σ_x q(x) log(q(x) / p*(x))

● Use the unnormalized p~ instead:

  log(q(x) / p*(x))
  = log(q(x)) - log(p*(x))
  = log(q(x)) - log(p~(x) / Z)
  = log(q(x)) - log(p~(x)) + log(Z)


What is Variational Inference?

● log(Z) is a constant => we can ignore it in our optimization problem (see the numerical check below)
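To make this concrete, a small numerical check (my toy example, not from the slides): the KL sum computed against the unnormalized p~ differs from the true KL(q || p*) by exactly the constant log(Z), so the optimization over q is unaffected.

import numpy as np

# Toy discrete target p*(x) known only up to a constant: p~(x) = Z * p*(x).
p_tilde = np.array([2.0, 1.0, 0.5, 0.25])   # unnormalized target
Z = p_tilde.sum()
p_star = p_tilde / Z                         # true (normalized) target

# Some tractable approximation q(x).
q = np.array([0.4, 0.3, 0.2, 0.1])

kl_true = np.sum(q * (np.log(q) - np.log(p_star)))     # KL(q || p*)
kl_unnorm = np.sum(q * (np.log(q) - np.log(p_tilde)))  # same sum with p~ in place of p*

# The two objectives differ by exactly log Z, a constant w.r.t. q,
# so minimizing either one over q gives the same answer.
print(kl_true, kl_unnorm + np.log(Z))  # equal up to floating point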

Mean Field VI

● Classical method
● Uses a factorized q:

  q(x) = ∏_i q_i(x_i)


Mean Field VI

● Example: Multivariate Gaussian
● Product of independent Gaussians for q
● Spherical covariance underestimates the true covariance (a quick numerical check follows below)
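A quick numerical illustration of the underestimation claim (my own toy check, not from the slides), using the standard result that the reverse-KL mean-field factors for a Gaussian target are Gaussians with variance 1/Λ_ii, the inverse of the precision diagonal:

import numpy as np

# Correlated 2D Gaussian target p*(x) = N(0, Sigma).
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
Lambda = np.linalg.inv(Sigma)  # precision matrix

# Known mean-field (reverse-KL) solution for a Gaussian target:
# each factor q_i is Gaussian with variance 1 / Lambda_ii.
mf_var = 1.0 / np.diag(Lambda)

print("true marginal variances:", np.diag(Sigma))  # [1.0, 1.0]
print("mean-field variances:   ", mf_var)          # [0.19, 0.19] -- badly underestimated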

Variational Bayes

● Vanilla mean field VI assumes you know all the parameters, θ, of the true distribution, p*(x)
● Enter: Variational Bayes (VB)

Variational Bayes

● VB infers both the latent variables, z (via q), and the p*(x) parameters, θ
● VB-EM was popularized for LDA¹
  ○ E-step for z, M-step for θ

[1] Blei, Ng, Jordan, “Latent Dirichlet Allocation”, JMLR, 2003.

Variational Bayes

● VB usually uses a mean field approximation of the form:

  q(x) = q(z_i | θ) ∏_i q_i(x_i | z_i)

Issues with Mean Field VB

● Requires analytical solutions of expectations w.r.t. q_i
  ○ Intractable in general
● Factored form limits the power of the approximation

Solution: Auto-Encoding Variational Bayes (Kingma and Welling, 2014)

Solution: Variational Inference with Normalizing Flows (Rezende and Mohamed, 2015)

Auto-Encoding Variational Bayes¹

High-level idea:

1) Optimize the same lower bound that we get in VB
2) A data augmentation (reparameterization) trick leads to a lower-variance estimator
3) Lots of choices of q(z | x) and p(z) lead to a partial closed form
4) Use a neural network to parameterize q_ϕ(z | x) and p_θ(x | z)
5) SGD to fit everything

[1] Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR, 2014.

1) VB Lower Bound

● Given N iid data points, (x_1, ..., x_N)
● Maximize the marginal likelihood:

  log p_θ(x_1, ..., x_N) = Σ_i log p_θ(x^(i))

● Each term decomposes as

  log p_θ(x^(i)) = KL(q_ϕ(z | x^(i)) || p_θ(z | x^(i))) + L(θ, ϕ; x^(i))

● The KL term is always positive, so L(θ, ϕ; x^(i)) is a lower bound on log p_θ(x^(i))

1) VB Lower Bound

● Write the lower bound
● Rewrite the lower bound (anyone want the derivation? A sketch follows below)
● Monte Carlo gradient estimator of the expectation part
  ○ Too high variance
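For the derivation question above, a sketch reconstructed from the AEVB paper rather than copied from the slides:

% Start from the decomposition of the marginal likelihood:
\log p_\theta(x^{(i)})
  = D_{KL}\!\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\big)
  + \mathcal{L}(\theta, \phi; x^{(i)})

% "Write lower bound": the ELBO in its joint form
\mathcal{L}(\theta, \phi; x^{(i)})
  = \mathbb{E}_{q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)}, z) - \log q_\phi(z \mid x^{(i)})\big]

% "Rewrite lower bound": split the joint into a prior term and a likelihood term
\mathcal{L}(\theta, \phi; x^{(i)})
  = -D_{KL}\!\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\big)
  + \mathbb{E}_{q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)} \mid z)\big]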

2) Reparameterization trick

● Rewrite q_ϕ(z^(l) | x)
● Separate q into a deterministic function of x and an auxiliary noise variable ϵ
● Leads to a lower-variance estimator

2) Reparameterization trick

● Example: univariate Gaussian, z ~ N(μ, σ²)
● Can rewrite a sample as the mean plus a scaled noise variable: z = μ + σϵ, with ϵ ~ N(0, 1) (a toy variance comparison follows below)
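A toy numerical illustration (mine, not from the slides) of why the reparameterized estimator has lower variance: estimate the gradient of E_{z~N(μ,1)}[z²] with respect to μ (true value 2μ) with a score-function estimator versus a reparameterized one.

import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.5, 100_000
true_grad = 2 * mu  # d/dmu E[z^2] for z ~ N(mu, 1) is 2*mu

# Score-function (REINFORCE-style) estimator: f(z) * d/dmu log N(z; mu, 1) = z^2 * (z - mu)
z = rng.normal(mu, 1.0, size=n)
score_est = z**2 * (z - mu)

# Reparameterized estimator: z = mu + eps, so d/dmu f(z) = 2 * (mu + eps)
eps = rng.normal(0.0, 1.0, size=n)
reparam_est = 2 * (mu + eps)

print("true gradient:", true_grad)
print("score-function:  mean %.3f, var %.1f" % (score_est.mean(), score_est.var()))
print("reparameterized: mean %.3f, var %.1f" % (reparam_est.mean(), reparam_est.var()))
# Both estimators are unbiased, but the score-function one has far larger variance.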

2) Reparameterization trick

● Lots of distributions like this. Three classes given (one sampler per class sketched below):
  ○ Tractable inverse CDF: Exponential, Cauchy, Logistic, Rayleigh, Pareto, Weibull, Reciprocal, Gompertz, Gumbel, Erlang
  ○ Location-scale: Laplace, Elliptical, Student’s t, Logistic, Uniform, Triangular, Gaussian
  ○ Composition: Log-Normal (exponentiated normal), Gamma (sum of exponentials), Dirichlet (sum of Gammas), Beta, Chi-Squared, F
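One concrete sampler per class, as a small sketch (my examples, not the slides'): inverse CDF for an Exponential, location-scale for a Gaussian, and composition (a sum of exponentials) for an integer-shape Gamma. In each case the randomness enters only through a parameter-free noise variable.

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)      # parameter-free noise for inverse-CDF sampling
eps = rng.normal(size=10_000)     # parameter-free noise for location-scale sampling

# 1) Tractable inverse CDF: Exponential(rate) via F^-1(u) = -log(1 - u) / rate
rate = 2.0
z_exp = -np.log(1.0 - u) / rate

# 2) Location-scale: Gaussian N(mu, sigma^2) via z = mu + sigma * eps
mu, sigma = 1.0, 0.5
z_gauss = mu + sigma * eps

# 3) Composition: Gamma(k, rate) with integer shape k as a sum of k Exponentials
k = 3
u_k = rng.uniform(size=(10_000, k))
z_gamma = (-np.log(1.0 - u_k) / rate).sum(axis=1)

print(z_exp.mean(), 1 / rate)        # ~0.5
print(z_gauss.std(), sigma)          # ~0.5
print(z_gamma.mean(), k / rate)      # ~1.5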

2) Reparameterization trick

● Yields a new MC estimator
● Plug the estimator into the lower bound equation
● The KL term can often be integrated analytically
  ○ With a careful choice of priors

3) Partial closed form

● The KL term can often be integrated analytically
  ○ With a careful choice of priors
  ○ E.g. both Gaussian
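For the Gaussian/Gaussian case the closed form is the one worked out in the AEVB paper: -KL(N(μ, diag(σ²)) || N(0, I)) = ½ Σ_j (1 + log σ_j² - μ_j² - σ_j²). A quick check of that formula against a Monte Carlo estimate (my own sketch):

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.3, -1.2, 0.8])
log_var = np.array([-0.5, 0.1, -1.0])
sigma = np.exp(0.5 * log_var)

# Closed form: KL( N(mu, diag(sigma^2)) || N(0, I) )
kl_closed = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - log_var)

# Monte Carlo estimate of the same KL, using reparameterized samples.
eps = rng.normal(size=(200_000, 3))
z = mu + sigma * eps
log_q = -0.5 * np.sum(((z - mu) / sigma) ** 2 + np.log(2 * np.pi) + log_var, axis=1)
log_p = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)
kl_mc = np.mean(log_q - log_p)

print(kl_closed, kl_mc)  # should agree to a couple of decimal places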

4) Auto-encoder connection

● The KL term acts as a regularizer; the expected log-likelihood acts as a reconstruction error
● Neural nets
  ○ Encoder: q(z | x)
  ○ Decoder: p(x | z)

4) Auto-encoder connection (alt.)

● q(z | x) encodes
● p(x | z) decodes
● The “information layer(s)” need to compress
  ○ Reals = infinite info
  ○ Reals + random noise = finite info

More info in Karol Gregor’s Deep Mind lecture: https://www.youtube.com/watch?v=P78QYjWh5sM
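A minimal encoder/decoder sketch in PyTorch, as an illustration of the idea rather than the paper's code; the layer sizes, the 784-dimensional binarized inputs, the random placeholder batch, and the single-sample estimate are my assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        # Encoder q_phi(z | x): outputs mean and log-variance of a diagonal Gaussian.
        self.enc = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x | z): outputs Bernoulli probabilities for each pixel.
        self.dec = nn.Linear(z_dim, h_dim)
        self.dec_out = nn.Linear(h_dim, x_dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        x_recon = torch.sigmoid(self.dec_out(F.relu(self.dec(z))))
        return x_recon, mu, logvar

def neg_elbo(x, x_recon, mu, logvar):
    # Reconstruction error (negative Bernoulli log-likelihood)...
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # ...plus the analytic KL(q(z|x) || N(0, I)) regularizer.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Usage sketch: one SGD step on a batch of binarized inputs.
model = VAE()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.rand(32, 784).round()  # placeholder data standing in for binarized images
loss = neg_elbo(x, *model(x))
opt.zero_grad()
loss.backward()
opt.step()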

Where are we with VI now? (2013-ish)

● Deep networks parameterize both q(z | x) and p(x | z)
● Lower-variance estimator of the expected log-likelihood
● Can choose from lots of families of q(z | x) and p(z)

Where are we with VI now? (2013-ish)

● Problem:
  ○ Most parametric families available are simple
  ○ E.g. a product of independent univariate Gaussians
  ○ Most posteriors are complex

Variational Inference with Normalizing Flows¹

High-level idea:

1) VAEs are great, but our posterior q(z | x) needs to be simple
2) Take a simple q_0(z | x) and apply a series of k transformations to z to get q_k(z | x). Metaphor: z “flows” through each transform.
3) Be clever in the choice of transforms (a computational issue)
4) The variational posterior q can now converge to the true posterior p
5) A deep NN now parameterizes q_0 and the flow parameters

[1] Rezende and Mohamed, “Variational Inference with Normalizing Flows”, arXiv:1505.05770, 2015.

What is a normalizing flow?

● A function that transforms a probability density through a sequence of invertible mappings
● Start from a simple q_0(z | x) and end with a complex q_k(z | x)

Key equations (1)

● The chain rule lets us write q_k as a product of q_0 and inverted determinants

Key equations (2)

● The density q_k(z’) is obtained by successively composing k transforms

Key equations (3)

● The log likelihood of q_k(z’) has a nice additive form

Key equations (4)

● An expectation over q_k can be written as an expectation under q_0
● Cute name: the law of the unconscious statistician (LOTUS)
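The equations behind these four slides, reconstructed from the Rezende and Mohamed paper (my transcription, with K denoting the number of transforms):

% (1) Change of variables for one invertible map f, with z' = f(z):
q(z') = q(z)\,\left|\det \frac{\partial f}{\partial z}\right|^{-1}

% (2) Composing K transforms:
z_K = f_K \circ \cdots \circ f_2 \circ f_1(z_0)

% (3) Additive form of the log density:
\ln q_K(z_K) = \ln q_0(z_0) - \sum_{k=1}^{K} \ln \left|\det \frac{\partial f_k}{\partial z_{k-1}}\right|

% (4) LOTUS: expectations under q_K written as expectations under q_0:
\mathbb{E}_{q_K}[h(z)] = \mathbb{E}_{q_0}\!\big[h\big(f_K \circ \cdots \circ f_1(z_0)\big)\big]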

Types of flows

1) Infinitesimal flows
   ○ Can show convergence in the limit
   ○ Skipped here (theoretical; computationally expensive)
2) Invertible linear-time flows
   ○ The log-det term can be calculated efficiently

Planar Flows

● Applies the transform given below
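Reconstructed from Rezende and Mohamed (2015), the planar transform and its log-det term are:

% Planar flow: a contraction/expansion around the hyperplane w^T z + b = 0.
f(z) = z + u\, h(w^{\top} z + b), \qquad u, w \in \mathbb{R}^{D},\; b \in \mathbb{R}

% Its Jacobian determinant is linear-time to compute:
\left| \det \frac{\partial f}{\partial z} \right| = \left| 1 + u^{\top} \psi(z) \right|,
\qquad \psi(z) = h'(w^{\top} z + b)\, w

A quick NumPy check of the linear-time determinant against the brute-force Jacobian (my sketch; h = tanh is an assumed choice of nonlinearity):

import numpy as np

rng = np.random.default_rng(0)
D = 4
u, w, b = rng.normal(size=D), rng.normal(size=D), 0.3
z = rng.normal(size=D)

# Planar flow f(z) = z + u * tanh(w.z + b) and its O(D) log-det term.
a = w @ z + b
f_z = z + u * np.tanh(a)
psi = (1.0 - np.tanh(a) ** 2) * w                 # h'(a) * w
logdet_analytic = np.log(np.abs(1.0 + u @ psi))

# Compare against the brute-force Jacobian determinant.
J = np.eye(D) + np.outer(u, psi)                  # df/dz = I + u psi^T
logdet_brute = np.log(np.abs(np.linalg.det(J)))
print(logdet_analytic, logdet_brute)              # equal up to floating point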

Radial Flows

● Applies the transform given below
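Likewise for the radial flow, again reconstructed from the paper rather than the slides:

% Radial flow: expands/contracts the density radially around a reference point z_0.
f(z) = z + \beta\, h(\alpha, r)\,(z - z_0), \qquad r = \lVert z - z_0 \rVert,\quad h(\alpha, r) = \frac{1}{\alpha + r}

% Its determinant is also linear-time:
\left| \det \frac{\partial f}{\partial z} \right|
  = \left[ 1 + \beta h(\alpha, r) \right]^{D-1} \left[ 1 + \beta h(\alpha, r) + \beta\, h'(\alpha, r)\, r \right]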

Summary

● VI approximates p(x) via a latent variable model
  ○ p(x) = Σ_z p(z) p(x | z)
● VAE introduces an auto-encoder approach
  ○ The reparameterization trick makes it feasible
  ○ Deep NNs parameterize q(z | x) and p(x | z)
● NF takes q(z | x) from simple to complex
  ○ A series of linear-time transforms
  ○ Convergence in the limit
