Introduction · Stochastic Variational Inference
Stochastic Variational Inference in Topic Models · Some Bibliography
Stochastic Variational Inference
Jesus Fernandez Bes
Machine Learning Group
March 27, 2014
Jesus Fernandez Bes Stochastic Variational Inference
1 Introduction
2 Stochastic Variational Inference
    Models with local and global hidden variables
    Mean-field variational inference
    The natural gradient of the ELBO
    Stochastic Variational Inference
3 Stochastic Variational Inference in Topic Models
    Topic Models
    Latent Dirichlet Allocation
    Hierarchical Dirichlet Process
4 Some Bibliography
Motivation · Main Ideas
Challenges of modern data analysis
Massive
Complex
High-dimensional
Probability Models (and Graphical Models) deal with complexity.
Scale is the problem.
“Traditional” Variational Inference
1 Inference ⇒ High-dimensional optimization.
2 Solved using coordinate ascent algorithms:
    Analyze ALL the data.
    Re-estimate hidden structure.
    Analyze ALL the data.
    . . .
DOES NOT SCALE WITH BIG DATA
How to build a general variational method that scales:
Use stochastic optimization. Follow cheap, noisy estimates of the gradient.
Use the natural gradient. With it, the Stochastic Variational Inference update has an attractive form.
Structure of SVI
1 Subsample one or more data points from the data.
2 Analyze the subsample using current variational parameters.
3 Implement a closed-form update of the parameters.
4 Repeat.
Models with local and global hidden variables · Mean-field variational inference · The natural gradient of the ELBO · Stochastic Variational Inference
$$ p(x, z, \beta \mid \alpha) = p(\beta \mid \alpha) \prod_{n=1}^{N} p(x_n, z_n \mid \beta) $$
$N$ observations $x = x_{1:N}$.
Vector of global hidden variables $\beta$.
$N$ local hidden variables $z = z_{1:N}$; each is a collection of $J$ variables $z_n = z_{n,1:J}$.
Vector of fixed parameters $\alpha$.
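As an illustration of this factorization, here is a minimal sketch that samples from a toy model of this kind: a mixture of unit-variance Gaussians. The model choice and every name in it (`sample_mixture`, `prior_std`) are assumptions made for the example, with $J = 1$ local variable per observation and `prior_std` playing the role of the fixed parameters $\alpha$.

```python
import random

def sample_mixture(num_points, num_components, prior_std=5.0, seed=0):
    """Draw (beta, z, x) from a toy model with global and local hidden variables."""
    rng = random.Random(seed)
    # Global hidden variables beta: one mean per component, drawn from p(beta | alpha).
    beta = [rng.gauss(0.0, prior_std) for _ in range(num_components)]
    z, x = [], []
    for _ in range(num_points):
        # Local hidden variable z_n: which component generated observation n.
        z_n = rng.randrange(num_components)
        # Observation x_n depends only on its own z_n and on the global beta.
        x_n = rng.gauss(beta[z_n], 1.0)
        z.append(z_n)
        x.append(x_n)
    return beta, z, x

beta, z, x = sample_mixture(num_points=6, num_components=2)
print(beta, z, x)
```

Given $\beta$, each pair $(x_n, z_n)$ is drawn independently of the others, which is what the product over $n$ expresses.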
Complete Conditional assumption
Complete conditionals are in the exponential family
$$ p(\beta \mid x, z, \alpha) = h(\beta) \exp\{\eta_g(x, z, \alpha)^\top t(\beta) - a_g(\eta_g(x, z, \alpha))\} $$
$$ p(z_{nj} \mid x_n, z_{n,-j}, \beta) = h(z_{nj}) \exp\{\eta_\ell(x_n, z_{n,-j}, \beta)^\top t(z_{nj}) - a_\ell(\eta_\ell(x_n, z_{n,-j}, \beta))\} $$
$h(\cdot)$ is the base measure.
$a(\cdot)$ is the log normalizer.
$\eta(\cdot)$ is the natural parameter vector.
$t(\cdot)$ are the sufficient statistics.
Several distributions are in the exponential family:
Bernoulli, Gaussian, Multinomial, Dirichlet, Gamma, Poisson, Beta, ...
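To make the four ingredients concrete, the sketch below writes the Bernoulli distribution in exponential-family form and compares it with the usual probability mass function. The function names are illustrative only.

```python
import math

def bernoulli_pmf(x, p):
    """Standard form: p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p if x == 1 else 1.0 - p

def bernoulli_expfam(x, p):
    """Same distribution written as h(x) exp{eta * t(x) - a(eta)}."""
    eta = math.log(p / (1.0 - p))        # natural parameter (log-odds)
    a = math.log(1.0 + math.exp(eta))    # log normalizer a(eta)
    t = x                                # sufficient statistic t(x) = x
    h = 1.0                              # base measure h(x) = 1
    return h * math.exp(eta * t - a)

for x in (0, 1):
    print(x, bernoulli_pmf(x, 0.3), bernoulli_expfam(x, 0.3))
```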
Examples of this kind of model
Bayesian Mixture Models.
Latent Dirichlet Allocation.
Hidden Markov Models (+ many variants).
Kalman filters (+ many variants).
Hierarchical linear regression models.
Hierarchical probit classification models.
Probabilistic factor analysis/matrix factorization models.
Certain Bayesian nonparametric mixture models.
GOAL
Approximate the posterior distribution of the hidden variables given the observations.
$$ p(z, \beta \mid x) = \frac{p(x, z, \beta)}{\int p(x, z, \beta)\, dz\, d\beta} $$
The problem is the denominator: it is intractable to compute.
The evidence lower bound (ELBO)
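$$
\begin{aligned}
\log p(x) &= \log \int p(x, z, \beta)\, dz\, d\beta \\
&= \log \int p(x, z, \beta)\, \frac{q(z, \beta)}{q(z, \beta)}\, dz\, d\beta \\
&= \log \left( E_q\!\left[ \frac{p(x, z, \beta)}{q(z, \beta)} \right] \right) \\
&\geq E_q[\log p(x, z, \beta)] - E_q[\log q(z, \beta)] \\
&\triangleq \mathcal{L}(q).
\end{aligned}
$$
The inequality is Jensen's inequality applied to the concave logarithm.
$$ \mathrm{KL}(q(z, \beta) \,\|\, p(z, \beta \mid x)) = -\mathcal{L}(q) + \text{const}. $$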
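The bound and its relation to the KL divergence can be checked numerically. The sketch below uses a made-up model with a single binary hidden variable and no global variable; the joint values in `joint` are arbitrary numbers chosen for illustration.

```python
import math

def elbo(joint, q):
    """E_q[log p(x, z)] - E_q[log q(z)] for a discrete hidden variable z."""
    return sum(qz * (math.log(pz) - math.log(qz)) for pz, qz in zip(joint, q) if qz > 0)

def kl(q, p):
    """KL(q || p) between two discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

joint = [0.1, 0.3]                       # p(x, z=0) and p(x, z=1) for the observed x
evidence = sum(joint)                    # p(x)
posterior = [pz / evidence for pz in joint]
q = [0.5, 0.5]                           # an arbitrary variational distribution

gap = math.log(evidence) - elbo(joint, q)
print(gap, kl(q, posterior))             # the two numbers agree
```

In this example the constant in the KL identity is $\log p(x)$, so maximizing the ELBO over $q$ is the same as minimizing the KL divergence to the posterior.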
Mean-field Approximation
Assumption on q(z, β):
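$$ q(z, \beta) = q(\beta \mid \lambda) \prod_{n=1}^{N} \prod_{j=1}^{J} q(z_{nj} \mid \phi_{nj}) $$
with $q(\beta \mid \lambda)$ and $q(z_{nj} \mid \phi_{nj})$ in the same exponential family as the complete conditionals.
$$ q(\beta \mid \lambda) = h(\beta) \exp\{\lambda^\top t(\beta) - a_g(\lambda)\} $$
$$ q(z_{nj} \mid \phi_{nj}) = h(z_{nj}) \exp\{\phi_{nj}^\top t(z_{nj}) - a_\ell(\phi_{nj})\} $$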
Easy coordinate ascent algorithm.
Gradient of the ELBO and Coordinate Ascent Inference
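$$ \nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\, (E_q[\eta_g(x, z, \alpha)] - \lambda) $$
$$ \nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_\ell(\phi_{nj})\, (E_q[\eta_\ell(x_n, z_{n,-j}, \beta)] - \phi_{nj}) $$
Both of them equal 0 by setting
$$ \lambda = E_q[\eta_g(x, z, \alpha)] $$
$$ \phi_{nj} = E_q[\eta_\ell(x_n, z_{n,-j}, \beta)] $$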
The gradient, if it exists, points in the direction of steepest ascent,
$$ \arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } \|d\lambda\|^2 < \epsilon $$
for small $\epsilon$. The gradient depends on the Euclidean distance metric in the parameter space.
For probability distributions, the Euclidean metric can be a poor measure of dissimilarity.
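The natural gradient accounts for the information geometry of its parameter space.
Symmetrized KL divergence
Natural measure of dissimilarity between probability distributions
$$ D^{\mathrm{sym}}_{\mathrm{KL}}(\lambda, \lambda') = E_\lambda\!\left[ \log \frac{q(\beta \mid \lambda)}{q(\beta \mid \lambda')} \right] + E_{\lambda'}\!\left[ \log \frac{q(\beta \mid \lambda')}{q(\beta \mid \lambda)} \right] $$
Using this distance, the direction of steepest ascent is
$$ \arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } D^{\mathrm{sym}}_{\mathrm{KL}}(\lambda, \lambda + d\lambda) < \epsilon $$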
Natural Gradient
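The natural gradient points in the direction of steepest ascent in the Riemannian space.
$$ \hat{\nabla}_\lambda f(\lambda) = G(\lambda)^{-1} \nabla_\lambda f(\lambda) $$
where $G(\lambda) = E_\lambda\!\left[ (\nabla_\lambda \log q(\beta \mid \lambda)) (\nabla_\lambda \log q(\beta \mid \lambda))^\top \right]$ is the Fisher information matrix of $q(\beta \mid \lambda)$.
For the exponential family: $G(\lambda) = \nabla^2_\lambda a_g(\lambda)$
For our mean-field model:
$$ \hat{\nabla}_\lambda \mathcal{L} = E_\phi[\eta_g(x, z, \alpha)] - \lambda $$
$$ \hat{\nabla}_{\phi_{nj}} \mathcal{L} = E_{\lambda, \phi_{n,-j}}[\eta_\ell(x_n, z_{n,-j}, \beta)] - \phi_{nj} $$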
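The identity $G = \nabla^2 a$ for exponential families can be checked on a one-dimensional case. The sketch below uses a Bernoulli distribution with natural parameter `eta` as a stand-in for $q(\beta \mid \lambda)$; this choice and the function names are assumptions made for illustration.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fisher_information(eta):
    """G(eta) = E[(d/d eta log q(x | eta))^2] for a Bernoulli with natural parameter eta."""
    mean = sigmoid(eta)
    # The score is x - a'(eta) = x - mean; average its square over x in {0, 1}.
    return (1.0 - mean) * (0.0 - mean) ** 2 + mean * (1.0 - mean) ** 2

def log_normalizer_second_derivative(eta, step=1e-4):
    """Central finite difference of a(eta) = log(1 + exp(eta))."""
    a = lambda e: math.log(1.0 + math.exp(e))
    return (a(eta + step) - 2.0 * a(eta) + a(eta - step)) / step ** 2

def natural_gradient(gradient, eta):
    """Premultiply the ordinary gradient by the inverse Fisher information."""
    return gradient / fisher_information(eta)

print(fisher_information(0.7), log_normalizer_second_derivative(0.7))
```

Because the ordinary gradient of the ELBO carries the factor $\nabla^2 a$, multiplying by $G^{-1}$ cancels it, which is why the natural gradients above have such a simple form.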
Why Natural Gradients?
Traditional Gradients
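$$ \nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\, (E_q[\eta_g(x, z, \alpha)] - \lambda) $$
$$ \nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_\ell(\phi_{nj})\, (E_q[\eta_\ell(x_n, z_{n,-j}, \beta)] - \phi_{nj}) $$
Natural Gradients
$$ \hat{\nabla}_\lambda \mathcal{L} = E_\phi[\eta_g(x, z, \alpha)] - \lambda $$
$$ \hat{\nabla}_{\phi_{nj}} \mathcal{L} = E_{\lambda, \phi_{n,-j}}[\eta_\ell(x_n, z_{n,-j}, \beta)] - \phi_{nj} $$
Coordinate ascent is equivalent to taking a natural gradient step of length one.
Natural gradients are easier to compute. Use them to develop scalable variational inference algorithms.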
Stochastic Optimization
We have a random function $B(\lambda)$ with $E_q[B(\lambda)] = \nabla_\lambda f(\lambda)$. We can optimize $f(\lambda)$ iteratively as
$$ \lambda^{(t)} = \lambda^{(t-1)} + \rho_t\, b_t(\lambda^{(t-1)}) $$
where $b_t$ is an independent draw from $B$. The sequence of step sizes $\rho_t$ must satisfy the Robbins-Monro conditions, $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$.
Follow noisy estimates of the gradient with a decreasing step size.
If the gradient can be written as a sum of terms (one per data point), a fast noisy approximation can be computed by subsampling the data.
$\lambda^{(t)}$ will converge to the optimal $\lambda^*$ (if $f$ is convex) or to a local optimum of $f$ (if not convex *).
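A minimal sketch of this recipe, on a deliberately simple objective: maximizing $f(\lambda) = -\frac{1}{2N}\sum_n (\lambda - x_n)^2$, whose gradient is the data mean minus $\lambda$. The step-size schedule $\rho_t = (t + \text{delay})^{-\text{forgetting\_rate}}$ and all names are assumptions for the example; the schedule satisfies the Robbins-Monro conditions when the forgetting rate lies in $(0.5, 1]$.

```python
import random

def robbins_monro_mean(data, num_steps=20000, delay=1.0, forgetting_rate=1.0, seed=0):
    """Maximize f(lam) = -mean((lam - x_n)^2) / 2 from one-point gradient estimates."""
    rng = random.Random(seed)
    lam = 0.0
    for t in range(1, num_steps + 1):
        rho = (t + delay) ** (-forgetting_rate)  # decreasing step size
        x_i = rng.choice(data)                   # subsample one data point
        noisy_gradient = x_i - lam               # unbiased: E[x_i - lam] = mean(data) - lam
        lam = lam + rho * noisy_gradient
    return lam

data = [1.0, 2.0, 3.0, 4.0, 10.0]
estimate = robbins_monro_mean(data)
print(estimate, sum(data) / len(data))
```

Each step touches a single data point, yet the iterate settles near the maximizer of the full objective (the data mean).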
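$$ \mathcal{L}(\lambda) = \underbrace{E_q[\log p(\beta)] - E_q[\log q(\beta)]}_{\text{global}} + \underbrace{\sum_{n=1}^{N} \max_{\phi_n} \left( E_q[\log p(x_n, z_n \mid \beta)] - E_q[\log q(z_n)] \right)}_{\text{sum of local}} $$
We choose $I \sim \mathrm{Unif}(1, \dots, N)$ and define $\mathcal{L}_I(\lambda)$ as the random function
$$ \mathcal{L}_I(\lambda) = E_q[\log p(\beta)] - E_q[\log q(\beta)] + N \max_{\phi_I} \left( E_q[\log p(x_I, z_I \mid \beta)] - E_q[\log q(z_I)] \right) $$
The expectation of $\mathcal{L}_I$ is equal to the objective, and consequently $\hat{\nabla}_\lambda \mathcal{L}_I$ is a noisy but unbiased estimate of the natural gradient of the objective.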
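The unbiasedness claim follows in one line, since each index is drawn with probability $1/N$ and the local term is scaled by $N$:

```latex
\begin{aligned}
E_I[\mathcal{L}_I(\lambda)]
&= E_q[\log p(\beta)] - E_q[\log q(\beta)]
 + \sum_{n=1}^{N} \frac{1}{N} \cdot N \max_{\phi_n}
   \left( E_q[\log p(x_n, z_n \mid \beta)] - E_q[\log q(z_n)] \right) \\
&= \mathcal{L}(\lambda).
\end{aligned}
```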
Stochastic Optimization for global parameters
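$$ \hat{\nabla}_\lambda \mathcal{L}_i = E_q\!\left[ \eta_g(x_i^{(N)}, z_i^{(N)}, \alpha) \right] - \lambda $$
where $x_i^{(N)}$ and $z_i^{(N)}$ denote a data set formed by $N$ replicates of the sampled point $x_i$ and its local variables $z_i$.
$$ \eta_g(x_i^{(N)}, z_i^{(N)}, \alpha) = \alpha + N \cdot (t(x_i, z_i), 1) $$
$$ \hat{\nabla}_\lambda \mathcal{L}_i = \alpha + N \cdot (E_q[t(x_i, z_i)], 1) - \lambda $$
Using stochastic optimization
$$ \hat{\lambda}_t \triangleq \alpha + N\, E_{\phi(\lambda)}[(t(x_i, z_i), 1)] $$
$$ \lambda^{(t)} = \lambda^{(t-1)} + \rho_t \left( \hat{\lambda}_t - \lambda^{(t-1)} \right) = (1 - \rho_t)\, \lambda^{(t-1)} + \rho_t\, \hat{\lambda}_t $$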
Stochastic Variational Inference
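The full loop (sample, local step, intermediate global parameters, weighted average) can be sketched on a toy model. The example below is an illustration under stated assumptions, not the algorithm as printed in any reference: the model is a mixture of unit-variance Gaussians with unknown means and uniform mixing weights, the global variational parameters are stored in natural form (precision times mean, precision), and the names `svi_gaussian_mixture`, `batch_size`, `delay`, and `forgetting_rate` are hypothetical. With `batch_size=1` each step uses a single sampled point; larger values average several points per step.

```python
import math
import random

def svi_gaussian_mixture(data, init_means, prior_precision=0.01, batch_size=10,
                         num_steps=500, delay=1.0, forgetting_rate=0.7, seed=0):
    """SVI sketch for a mixture of unit-variance Gaussians with unknown means."""
    rng = random.Random(seed)
    num_points, num_components = len(data), len(init_means)
    # Global variational parameters in natural form: [precision * mean, precision].
    lam = [[m * 1.0, 1.0] for m in init_means]
    prior = [0.0, prior_precision]  # alpha: natural parameters of a N(0, 1/prior_precision) prior
    for t in range(1, num_steps + 1):
        rho = (t + delay) ** (-forgetting_rate)
        means = [pm / prec for pm, prec in lam]
        variances = [1.0 / prec for _, prec in lam]
        # Expected sufficient statistics accumulated over the sampled points.
        stats = [[0.0, 0.0] for _ in range(num_components)]
        for _ in range(batch_size):
            x = data[rng.randrange(num_points)]
            # Local step: phi_k is proportional to exp(E_q[log p(x | mu_k)]).
            scores = [x * m - 0.5 * (m * m + v) for m, v in zip(means, variances)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            for k in range(num_components):
                phi = weights[k] / total
                stats[k][0] += phi * x
                stats[k][1] += phi
        scale = num_points / batch_size
        for k in range(num_components):
            for j in range(2):
                # Intermediate estimate: as if the sample were replicated up to N points.
                lam_hat = prior[j] + scale * stats[k][j]
                # Global step: weighted average of old and intermediate parameters.
                lam[k][j] = (1.0 - rho) * lam[k][j] + rho * lam_hat
    return [pm / prec for pm, prec in lam]

data_rng = random.Random(1)
data = ([data_rng.gauss(-5.0, 1.0) for _ in range(500)]
        + [data_rng.gauss(5.0, 1.0) for _ in range(500)])
means = sorted(svi_gaussian_mixture(data, init_means=[-1.0, 1.0]))
print(means)
```

No step ever passes over the whole data set; the cost per iteration depends on `batch_size`, not on $N$.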
Extensions
Minibatches
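Pick more than one data point each time (a minibatch of $S$ points),
$$ \lambda^{(t)} = (1 - \rho_t)\, \lambda^{(t-1)} + \frac{\rho_t}{S} \sum_{s} \hat{\lambda}_s. $$
Empirical Bayes estimation of hyperparameters
Get a point estimate of the value of the hyperparameters $\alpha$,
$$ \alpha^{(t)} = \alpha^{(t-1)} + \rho_t \nabla_\alpha \mathcal{L}_t(\lambda^{(t-1)}, \phi, \alpha^{(t-1)}). $$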
Topic Models · Latent Dirichlet Allocation · Hierarchical Dirichlet Process
Topic Models
Observations:
Words: $w_{dn}$ is the $n$th word in the $d$th document, an element of a fixed vocabulary of $V$ terms.
Latent Variables:
A topic $\beta_k$ is a distribution over the vocabulary, a point in the $(V - 1)$-simplex.
Topic proportions $\theta_d$ are associated with each document: a distribution over topics.
Each word in each document comes from a single topic. Topic assignments $z_{dn}$ are topic indices.
We consider two models: Latent Dirichlet Allocation (LDA) has a fixed number $K$ of topics; the Hierarchical Dirichlet Process (HDP) has an infinite number of topics.
Analyzing the documents
Posterior inference of p(β, θ, z|w)
Generative model
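1 Draw topics $\beta_k \sim \mathrm{Dirichlet}(\eta, \dots, \eta)$ for $k \in \{1, \dots, K\}$.
2 For each document $d \in \{1, \dots, D\}$:
    1 Draw topic proportions $\theta_d \sim \mathrm{Dirichlet}(\alpha, \dots, \alpha)$.
    2 For each word $n \in \{1, \dots, N\}$:
        1 Draw topic assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$.
        2 Draw word $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.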
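The generative process translates almost line by line into code. The sketch below uses only the standard library, drawing Dirichlet samples as normalized Gamma variates; the function names and the default hyperparameter values are assumptions for the example.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [g / total for g in draws]

def sample_categorical(rng, probs):
    """Draw one index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_lda_corpus(num_docs, doc_length, num_topics, vocab_size,
                        eta=0.5, alpha=0.5, seed=0):
    rng = random.Random(seed)
    # 1. Draw topics beta_k ~ Dirichlet(eta, ..., eta).
    topics = [sample_dirichlet(rng, eta, vocab_size) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # 2.1 Draw topic proportions theta_d ~ Dirichlet(alpha, ..., alpha).
        theta = sample_dirichlet(rng, alpha, num_topics)
        doc = []
        for _ in range(doc_length):
            z = sample_categorical(rng, theta)               # topic assignment z_dn
            doc.append(sample_categorical(rng, topics[z]))   # word w_dn
        corpus.append(doc)
    return topics, corpus

topics, corpus = generate_lda_corpus(num_docs=3, doc_length=8, num_topics=2, vocab_size=5)
print(corpus)
```

Here the topics are the global hidden variables, while each document's $\theta_d$ and $z_{dn}$ are local.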
Variational Inference in LDA
Mean-field for LDA
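$$ q(z_{dn}) = \mathrm{Multinomial}(\phi_{dn}) $$
$$ q(\theta_d) = \mathrm{Dirichlet}(\gamma_d) $$
$$ q(\beta_k) = \mathrm{Dirichlet}(\lambda_k) $$
1 Update the local variational parameters of each document $d$:
$$ \phi_{dn}^{k} \propto \exp\Big\{ \Psi(\gamma_{dk}) + \Psi(\lambda_{k, w_{dn}}) - \Psi\Big( \sum_{v} \lambda_{kv} \Big) \Big\} \quad \text{for } n \in \{1, \dots, N\} $$
$$ \gamma_d = \alpha + \sum_{n=1}^{N} \phi_{dn} $$
2 Update the global parameters:
$$ \lambda_k = \eta + \sum_{d=1}^{D} \sum_{n=1}^{N} \phi_{dn}^{k}\, w_{dn} $$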
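A minimal sketch of the per-document local step, iterating the $\phi$ and $\gamma$ updates a fixed number of times. Words are represented as vocabulary indices, and the digamma function $\Psi$ is implemented by hand (recurrence plus asymptotic series) because the Python standard library does not provide one. The function names, the initialization of $\gamma$, and the fixed iteration count are assumptions for the example.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def lda_local_step(doc, lam, alpha, num_iters=50):
    """Update phi (per word) and gamma (per document) for one document."""
    num_topics = len(lam)
    # Psi(lam_kv) - Psi(sum_v lam_kv) for every topic k and term v.
    elog_beta = []
    for row in lam:
        row_total = digamma(sum(row))
        elog_beta.append([digamma(v) - row_total for v in row])
    gamma = [alpha + len(doc) / num_topics] * num_topics
    phi = []
    for _ in range(num_iters):
        phi = []
        for w in doc:
            scores = [digamma(gamma[k]) + elog_beta[k][w] for k in range(num_topics)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            phi.append([u / total for u in weights])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(num_topics)]
    return phi, gamma

lam = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]   # topic 0 favors term 0, topic 1 favors term 2
phi, gamma = lda_local_step(doc=[0, 0, 2], lam=lam, alpha=0.5)
print(phi, gamma)
```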
Stochastic Variational Inference in LDA
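One stochastic global step for LDA can be sketched as follows, assuming the local parameters $\phi$ of the sampled document have already been computed: form the intermediate topics as if the document were replicated $D$ times, then take the weighted average with the current topics. The function name `lda_global_step` and its argument names are hypothetical; words are vocabulary indices.

```python
def lda_global_step(lam, doc, phi, eta, num_docs, rho):
    """Return updated topic parameters after seeing one sampled document."""
    vocab_size = len(lam[0])
    new_lam = []
    for k in range(len(lam)):
        # Intermediate topic: lam_hat_k = eta + D * sum_n phi_dn^k w_dn.
        lam_hat = [eta] * vocab_size
        for n, w in enumerate(doc):
            lam_hat[w] += num_docs * phi[n][k]
        # lam_k = (1 - rho) * lam_k + rho * lam_hat_k.
        new_lam.append([(1.0 - rho) * old + rho * hat
                        for old, hat in zip(lam[k], lam_hat)])
    return new_lam

updated = lda_global_step(lam=[[1.0, 1.0], [1.0, 1.0]], doc=[0],
                          phi=[[1.0, 0.0]], eta=0.5, num_docs=10, rho=0.5)
print(updated)
```

Only the sampled document is touched in each step, so the cost of an update does not grow with the number of documents $D$.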
Results LDA
DATA
Nature: 350k docs, 58M words, 4200 terms.
New York Times: 1.8M docs, 461M words, 8000 terms.
Wikipedia: 3.8M docs, 482M words, 7700 terms.
* Batch Variational uses a subset of 100k docs.
Results HDP
Some Bibliography
Main Paper
Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. (2013). “Stochastic variational inference”. The Journal of Machine Learning Research, 14(1), 1303-1347.
Other References
Blei, D. M. “Variational Inference”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf.
Blei, D. M. “Exponential Families”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/exponential-families.pdf.