pSVGD: Projected Stein Variational Gradient Descent
A fast and scalable Bayesian inference method in high dimensions
Peng Chen and Omar Ghattas {peng, omar}@oden.utexas.edu
Abstract

The curse of dimensionality is a longstanding challenge for Bayesian inference in high dimensions; in particular, the Stein variational gradient descent (SVGD) method suffers from this curse. To overcome this challenge, we propose the projected SVGD (pSVGD) method, which exploits the intrinsic low dimensionality of the data-informed subspace stemming from the ill-posedness of inference problems. We adaptively construct the subspace using a gradient information matrix of the log-likelihood and apply SVGD to the much lower-dimensional coefficients of the parameter projection. The method is demonstrated to be more accurate and efficient than SVGD, and scalable with respect to the number of parameters, samples, data points, and processor cores.
Bayesian Inference and SVGD

Bayes' rule for parameter $x \in \mathbb{R}^d$:

$$\underbrace{p(x)}_{\text{posterior}} = \frac{1}{Z}\,\underbrace{f(x)}_{\text{likelihood}}\,\underbrace{p_0(x)}_{\text{prior}}.$$
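Since the normalization constant $Z$ cancels in all gradient-based updates below, code only ever needs the unnormalized log-posterior. A minimal Python sketch; the Gaussian choices for $f$ and $p_0$ are illustrative assumptions, not the experiments' models:

```python
import numpy as np

# Unnormalized log-posterior: log p(x) = log f(x) + log p_0(x) - log Z,
# where the constant log Z drops out of every gradient.
log_f = lambda x, y=1.0: -0.5 * np.sum((y - x)**2)  # illustrative Gaussian likelihood
log_p0 = lambda x: -0.5 * np.sum(x**2)              # illustrative N(0, I) prior
log_p = lambda x: log_f(x) + log_p0(x)              # posterior up to the constant Z
```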
SVGD update for prior samples $x_1^0, \dots, x_N^0$:

$$x_m^{\ell+1} = x_m^\ell + \epsilon_\ell\, \phi_\ell^*(x_m^\ell), \quad m = 1, \dots, N, \quad \ell = 0, 1, \dots,$$

$$\phi_\ell^*(x_m^\ell) = \frac{1}{N} \sum_{n=1}^{N} \left[ \nabla_{x_n^\ell} \log p(x_n^\ell)\, k(x_n^\ell, x_m^\ell) + \nabla_{x_n^\ell} k(x_n^\ell, x_m^\ell) \right],$$

where $\phi_\ell^*(x_m^\ell)$ is the approximate steepest direction and $k(\cdot,\cdot)$ is, e.g., a Gaussian kernel in $\mathbb{R}^d$, which collapses for large $d$.
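A minimal NumPy sketch of this update with a Gaussian kernel follows. The function names, the fixed bandwidth `h` (practical SVGD codes often use a median heuristic instead), and the standard-Gaussian target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(X, h=1.0):
    """k(x_n, x_m) = exp(-||x_n - x_m||^2 / (2h)) and its gradient w.r.t. x_n."""
    diff = X[:, None, :] - X[None, :, :]               # (N, N, d): x_n - x_m
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h))  # (N, N) kernel matrix
    gradK = -diff / h * K[:, :, None]                  # (N, N, d): grad_{x_n} k(x_n, x_m)
    return K, gradK

def svgd_step(X, grad_log_p, eps=1e-2, h=1.0):
    """One update x_m <- x_m + eps * phi*(x_m), vectorized over all samples."""
    N = X.shape[0]
    K, gradK = gaussian_kernel(X, h)
    G = grad_log_p(X)                          # (N, d): grad log p at each sample
    phi = (K.T @ G + gradK.sum(axis=0)) / N    # approximate steepest direction
    return X + eps * phi

X = np.random.randn(100, 5)                    # N = 100 prior samples, d = 5
for _ in range(500):
    X = svgd_step(X, lambda X: -X)             # illustrative target N(0, I): grad log p = -x
```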
Data-informed Intrinsic Low-dimensionality

Gradient information matrix of the log-likelihood:

$$H = \int_{\mathbb{R}^d} (\nabla_x \log f(x))(\nabla_x \log f(x))^T\, p(x)\, dx \approx \frac{1}{N} \sum_{m=1}^{N} \nabla_x \log f(x_m)(\nabla_x \log f(x_m))^T,$$

where the samples $x_1, \dots, x_N$ are adaptively updated at suitable iterations.

Dimension Reduction by Projection

Given the prior covariance $\Gamma$, solve the generalized eigenvalue problem

$$H \psi_i = \lambda_i \Gamma \psi_i,$$

with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge \cdots \ge 0$.
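A sketch of this construction, assuming samples `X`, a log-likelihood gradient `grad_log_f`, and the prior covariance `Gamma` are available (all names illustrative). The generalized eigenproblem is solved as stated above via SciPy; the final QR step re-orthonormalizes the leading eigenvectors so that $P_r = \Psi_r \Psi_r^T$ is the Euclidean projector used below, which is one possible normalization choice rather than the authors' prescription.

```python
import numpy as np
from scipy.linalg import eigh

def build_projection(X, grad_log_f, Gamma, r):
    """Estimate H from the current samples and return the r dominant eigenpairs."""
    G = grad_log_f(X)                   # (N, d) gradients of log f at the samples
    H = G.T @ G / X.shape[0]            # H ~ (1/N) sum_m grad log f (grad log f)^T
    lam, Psi = eigh(H, Gamma)           # generalized problem H psi = lambda Gamma psi
    lam = lam[::-1][:r]                 # eigh returns ascending order; take top r
    Psi_r, _ = np.linalg.qr(Psi[:, ::-1][:, :r])  # orthonormal basis of X_r
    return lam, Psi_r
```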
We project the high-dimensional parameter to the subspace $X_r = \mathrm{span}(\psi_1, \dots, \psi_r)$,

$$P_r x := \sum_{i=1}^{r} \psi_i \psi_i^T x = \Psi_r w, \quad \forall x \in \mathbb{R}^d,$$

and define a projected posterior with profile function $g(P_r x)$:

$$p_r(x) := \frac{1}{Z_r}\, g(P_r x)\, p_0(x).$$
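In coordinates these maps are single matrix products; a small continuation of the sketch above (with `Psi_r` orthonormal, so $w = \Psi_r^T x$):

```python
W = X @ Psi_r          # coefficients w = Psi_r^T x of each sample        (N, r)
X_r = W @ Psi_r.T      # projected parameters P_r x = Psi_r w             (N, d)
X_perp = X - X_r       # complement (I - P_r) x, kept fixed in the sketch below
```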
Optimal profile function in Kullback-Leibler divergence, bounded by the eigenvalues:

$$g^*(P_r x) = \int_{X^\perp} f(P_r x + \xi)\, p_0^\perp(\xi \mid P_r x)\, d\xi \;\Longrightarrow\; D_{\mathrm{KL}}(p \,\|\, p_r) \ge D_{\mathrm{KL}}(p \,\|\, p_r^*) \le \frac{\gamma}{2} \sum_{i=r+1}^{d} \lambda_i,$$

where the conditional/marginal densities in the complement subspace $X^\perp$ are

$$p_0^\perp(\xi \mid P_r x) = p_0(P_r x + \xi) / p_0^r(P_r x), \quad \text{with} \quad p_0^r(P_r x) = \int_{X^\perp} p_0(P_r x + \xi)\, d\xi.$$
SVGD vs pSVGD

[Figure: network schematics contrasting SVGD, which updates all parameters between input and output layers, with pSVGD, which updates far fewer degrees of freedom; pSVGD alternates Projection, SVGD in $X_r^\ell$, and Prolongation, adaptively updating the subspace from $X_r^\ell$ to $X_r^{\ell+1}$.]
Bayes' rule for the projection coefficient $w \in \mathbb{R}^r$ in the data-informed dominant subspace:

$$\underbrace{\pi(w)}_{\text{posterior}} = \frac{1}{Z_w}\,\underbrace{g(\Psi_r w)}_{\text{likelihood}}\,\underbrace{\pi_0(w)}_{\text{prior}}.$$
SVGD update for prior samples $w_1^0, \dots, w_N^0$:

$$w_m^{\ell+1} = w_m^\ell + \epsilon_\ell\, \phi_\ell^*(w_m^\ell), \quad m = 1, \dots, N, \quad \ell = 0, 1, \dots,$$

$$\phi_\ell^*(w_m^\ell) = \frac{1}{N} \sum_{n=1}^{N} \left[ \nabla_{w_n^\ell} \log \pi(w_n^\ell)\, k_r(w_n^\ell, w_m^\ell) + \nabla_{w_n^\ell} k_r(w_n^\ell, w_m^\ell) \right],$$

where $\phi_\ell^*(w_m^\ell)$ is the approximate steepest direction, $k_r(\cdot,\cdot)$ is the projected kernel in $\mathbb{R}^r$, and $\nabla_w \log \pi(w) = \Psi_r^T \nabla_x \log p_r(P_r x)$.
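Putting the pieces together, an end-to-end sketch of the projected iteration, reusing `svgd_step` and `build_projection` from the sketches above. The Gaussian prior, the stand-in likelihood gradient, the iteration counts, and the choice to approximate the profile function by evaluating $f$ at the prolongated sample with its frozen complement are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

d, N, r = 100, 64, 5                       # illustrative sizes
Gamma, Gamma_inv = np.eye(d), np.eye(d)    # illustrative Gaussian prior N(0, I)
grad_log_f = lambda X: -X                  # stand-in log-likelihood gradient
X = np.random.randn(N, d)                  # prior samples x_1^0, ..., x_N^0

def grad_log_pi(W, X_perp, Psi_r):
    """grad_w log pi(w) = Psi_r^T grad_x [log f + log p_0] at x = Psi_r w + xi."""
    Xf = W @ Psi_r.T + X_perp              # prolongate coefficients back to R^d
    return (grad_log_f(Xf) - Xf @ Gamma_inv) @ Psi_r

for ell in range(10):                      # outer loop: adapt the subspace X_r^ell
    lam, Psi_r = build_projection(X, grad_log_f, Gamma, r)  # lam could guide the choice of r
    W = X @ Psi_r                          # projection: coefficients w
    X_perp = X - W @ Psi_r.T               # frozen complement xi
    for _ in range(50):                    # inner loop: SVGD in R^r
        W = svgd_step(W, lambda W: grad_log_pi(W, X_perp, Psi_r))
    X = W @ Psi_r.T + X_perp               # prolongation back to R^d
```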
Experiment I: PDE-constrained Inference

PDE model with a random field parameter:

$$-\nabla \cdot (e^x \nabla u) = 0 \quad \text{in } (0,1)^2, \qquad x \sim \mathcal{N}(0, \mathcal{C}), \quad \mathcal{C} = (-0.1\Delta + I)^{-2}.$$
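Sampling from this prior reduces to one linear solve, since $\mathcal{C}^{1/2} = (-0.1\Delta + I)^{-1}$: draw $z \sim \mathcal{N}(0, I)$ and solve $(-0.1\Delta + I)\,x = z$. A sketch on a 1D grid with homogeneous Dirichlet boundary conditions (the poster's domain is $(0,1)^2$; the grid size and boundary treatment here are illustrative assumptions):

```python
import numpy as np

n = 100; hg = 1.0 / (n + 1)                 # interior grid points and mesh size
lap = (np.diag(-2.0 * np.ones(n)) +         # finite-difference Laplacian
       np.diag(np.ones(n - 1), 1) +
       np.diag(np.ones(n - 1), -1)) / hg**2
M = -0.1 * lap + np.eye(n)                  # M = -0.1*Delta + I, so C = M^{-2}
x = np.linalg.solve(M, np.random.randn(n))  # x = M^{-1} z has covariance M^{-2}
```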
Experiment II: Inference of COVID-19

ODE model with a stochastic process parameter:

[Figure: compartment models for two population groups $i = 1, 2$, with compartments S (susceptible), E (exposed), I (infectious), R (recovered), H (hospitalized), D (deceased), coupled through contacts C with weights $1 - \alpha_i$; transition rates $\beta_i$, $\sigma_i$, $(1-\pi_i)\gamma_{I_i}$, $\pi_i \eta_i$, $\nu_i \mu_i$, $(1-\nu_i)\gamma_{H_i}$.]

$$\alpha_i(t) = \frac{1}{2}\left(\tanh(g_i(t)) + 1\right), \qquad g_i(t) \sim \mathcal{N}(g_i^*, \mathcal{C}_g), \quad \mathcal{C}_g = (-s_g \Delta_t + s_I I)^{-1}.$$

References

P. Chen and O. Ghattas. Projected Stein variational gradient descent. Advances in Neural Information Processing Systems, 2020.
P. Chen, K. Wu, and O. Ghattas. Bayesian inference of heterogeneous epidemic models: Application to COVID-19 spread accounting for long-term care facilities. arXiv:2011.01058, 2020.
P. Chen, K. Wu, J. Chen, T. O'Leary-Roseberry, and O. Ghattas. Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. Advances in Neural Information Processing Systems, 2019.
O. Zahm, T. Cui, K. Law, A. Spantini, and Y. Marzouk. Certified dimension reduction in nonlinear Bayesian inverse problems. arXiv:1807.03712, 2018.