Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi (Tsinghua University), joint work with Jun Zhu and Mohammad Emtiyaz Khan

Page 1

Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi (Tsinghua University)

Joint work with Jun Zhu and Mohammad Emtiyaz Khan

Page 2

Gaussian Process

● A GP prior is specified by a mean function and a covariance function / kernel.

● Posterior inference: O(N³) complexity in general, with closed form only for conjugate likelihoods.

● Sparse variational GP [Titsias, 09; Hensman et al., 13]: approximate the Gaussian field through M inducing points, reducing the cost to O(NM²).
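As a concrete reference point for the O(N³) cost, a minimal NumPy sketch of exact GP regression with an RBF kernel; the function names and toy data here are illustrative, not from the talk:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x')."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise=0.1):
    """Exact GP-regression posterior mean/variance at test points X_star.
    The Cholesky factorization of the N x N Gram matrix is the O(N^3) step."""
    K = rbf_kernel(X, X) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                              # O(N^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v ** 2, axis=0)
    return mean, var

# toy data: noisy sine
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
mu, var = gp_posterior(X, y, np.array([[0.0], [1.5]]))
```

Every data point enters the Gram matrix, which is exactly what the sparse and inference-network approaches avoid.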

Page 3

Inference Networks for GP Models

● Remove the sparsity assumption.

[Diagram: inputs and observations feed a Gaussian field; the inference network maps data to predictions.]


Page 5

Examples of Inference Networks

● Bayesian neural networks: intractable output density [Sun et al., 19].

[Diagram: weight space vs. function space; a network of sin/cos units with frequencies (s) and weights (w).]

● The inference network architecture can be derived from the weight-space posterior [Cutajar et al., 18]:
○ random feature expansions
○ deep neural nets
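One way to make "derived from the weight-space posterior" concrete: a hypothetical random-feature inference network, i.e. a Bayesian linear model on random Fourier features of an RBF kernel, whose output marginals are Gaussian in closed form (unlike a generic BNN). All class and parameter names below are illustrative:

```python
import numpy as np

class RandomFeatureNet:
    """Sketch of a random-feature inference network: a Gaussian posterior
    over the weights of sin/cos features induces closed-form Gaussian
    marginals over function values at any inputs."""

    def __init__(self, dim, n_features=100, lengthscale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # frequencies drawn from the spectral density of an RBF kernel
        self.S = rng.standard_normal((n_features, dim)) / lengthscale
        # Gaussian over the 2M feature weights (the variational parameters)
        self.w_mean = np.zeros(2 * n_features)
        self.w_cov = np.eye(2 * n_features) / n_features

    def features(self, X):
        proj = X @ self.S.T
        return np.hstack([np.cos(proj), np.sin(proj)])   # (N, 2M)

    def marginals(self, X):
        """Closed-form Gaussian marginals of f at inputs X."""
        Phi = self.features(X)
        mean = Phi @ self.w_mean
        var = np.einsum("nd,de,ne->n", Phi, self.w_cov, Phi)
        return mean, var

net = RandomFeatureNet(dim=1)
mu, var = net.marginals(np.linspace(-2, 2, 5)[:, None])
```

Replacing the random-feature layer by a deep net gives the second architecture family mentioned above, at the price of less tractable output densities.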

Page 6

Minibatch Training is Difficult
Functional Variational Bayesian Neural Networks (Sun et al., 19)

● Consider matching the variational and true posterior processes at arbitrary measurement points.

● Full-batch fELBO vs. practical fELBO: the practical objective performs an improper minibatch approximation of the KL divergence term.
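To make the gap concrete, a sketch of the two objectives; the sup-over-measurement-sets form of the functional KL and the minibatch notation (batch B, measurement set X_M) are my paraphrase, not necessarily the paper's exact notation:

\mathcal{L}_{\text{full}}(q) = \sum_{n=1}^{N} \mathbb{E}_{q}\big[\log p(y_n \mid f(x_n))\big] - \sup_{X} \mathrm{KL}\big[q(f_X) \,\big\|\, p(f_X)\big]

\hat{\mathcal{L}}(q) = \frac{N}{|B|} \sum_{n \in B} \mathbb{E}_{q}\big[\log p(y_n \mid f(x_n))\big] - \mathrm{KL}\big[q(f_{X_M}) \,\big\|\, p(f_{X_M})\big]

The likelihood term has an unbiased minibatch estimate, but the KL evaluated at a sampled measurement set X_M is not an unbiased estimate of the functional KL, which is why the practical objective is an improper minibatch approximation.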

Page 7

Stochastic, Functional Mirror Descent

● Work with the functional density directly:
○ natural gradient in the density space
○ minibatch approximation with a stochastic functional gradient

● Closed-form solution as an adaptive Bayesian filter [Dai et al., 16; Cheng & Boots, 16]: seeing the next data point turns the current approximation into an adapted prior.

● Sequentially applying Bayes' rule is the most natural gradient:
○ in conjugate models, equivalent to the natural gradient for exponential families [Raskutti & Mukherjee, 13; Khan & Lin, 17]
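A sketch of the closed-form step, following the standard derivation of stochastic mirror descent in density space; the step size \beta_t and minibatch B are my notation:

q_{t+1}(f) \;\propto\; q_t(f)^{\,1-\beta_t}\, \Big[\, p(f) \prod_{n \in B} p(y_n \mid f)^{N/|B|} \Big]^{\beta_t}

Grouping terms, each step applies Bayes' rule with the adapted prior q_t(f)^{1-\beta_t}\, p(f)^{\beta_t} and a tempered minibatch likelihood, which is the adaptive Bayesian filter view above.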

Page 8

Minibatch Training of Inference Networks

● An idea from filtering: bootstrap. The closed-form filtering update serves as the teacher; the inference network is the student trained to match it.
○ Similar idea: temporal-difference (TD) learning with function approximation.

Page 9

Minibatch Training of Inference Networks (cont.)

● Gaussian likelihood: closed-form marginals of the filtering update at the measurement locations.
○ Equivalent to GP regression.

● Nonconjugate case: no closed form; optimize an upper bound instead.
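A toy sketch of the Gaussian-likelihood teacher step: the student's current marginals over the measurement points act as an adapted prior, and conditioning on a minibatch is plain Gaussian conditioning, i.e. GP regression. All names, sizes, and the squared-error student fit mentioned in the closing comment are illustrative, not the paper's exact objective:

```python
import numpy as np

def teacher_step(m, S, idx, y, noise=0.1):
    """Teacher (Gaussian likelihood): treat the student's current marginals
    (mean m, covariance S) over the measurement points as an adapted GP
    prior and condition on a minibatch observed at indices `idx`."""
    K = S[np.ix_(idx, idx)] + noise ** 2 * np.eye(len(idx))
    G = S[:, idx] @ np.linalg.inv(K)
    m_new = m + G @ (y - m[idx])
    S_new = S - G @ S[idx, :]
    return m_new, S_new

# toy demo: 5 measurement points, student starts from the prior
m = np.zeros(5)
S = np.eye(5)
idx = np.array([1, 3])       # minibatch locations among the measurement points
y = np.array([0.8, -0.5])    # minibatch targets
m_t, S_t = teacher_step(m, S, idx, y)
# the student network would now be fit to (m_t, S_t) at these points,
# e.g. by minimizing a squared or KL distance -- the "bootstrap" of the talk
```

The teacher's mean moves toward the minibatch targets and its variance shrinks only at the observed locations, which is exactly the filtering behavior described above.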

Page 10

Measurement Points vs. Inducing Points

[Figure: GPNet vs. SVGP fits for M = 2, 5, 20.]

● Inducing points control the expressiveness of the variational approximation.

● Measurement points control the variance of training.

Page 11

Effect of Proper Minibatch Training

[Figure: FBNN (M = 20) vs. GPNet (M = 20) on Airline Delay (700K).]

● Fixes the underfitting (N = 100, batch size = 20).

● Better performance with more measurement points.

Page 12

Regression & Classification

● Regression benchmarks.

● GP classification with a prior derived from infinite-width Bayesian ConvNets.

Page 13

Poster #227

Code: https://github.com/thjashin/gp-infer-net