Heterogeneous Multi-output Gaussian Process Prediction
Pablo Moreno-Muñoz1  Antonio Artés-Rodríguez1  Mauricio A. Álvarez2
1Universidad Carlos III de Madrid, Spain   2University of Sheffield, UK
{pmoreno, antonio}@tsc.uc3m.es   [email protected]
Introduction
A novel extension of multi-output Gaussian processes (MOGPs) for handling heterogeneous outputs (binary, real, categorical, ...). Each output has its own likelihood distribution, and we use a MOGP prior to jointly model the parameters of all likelihoods as latent functions. We obtain tractable variational bounds amenable to stochastic variational inference (SVI).
Multi-output GPs
We use a linear model of coregionalisation (LMC) type of covariance function to express correlations between the latent parameter functions (LPFs) f_{d,j}(x).

Each LPF is a linear combination of independent latent functions U = {u_q(x)}_{q=1}^{Q}. Each u_q(x) is assumed to be drawn from a GP prior such that u_q(·) ~ GP(0, k_q(·, ·)), where k_q can be any valid covariance function:
$$f_{d,j}(\mathbf{x}) = \sum_{q=1}^{Q} \sum_{i=1}^{R_q} a^{i}_{d,j,q}\, u^{i}_{q}(\mathbf{x}),$$
We assume that R_q = 1, meaning that the coregionalisation matrices are rank one. In the literature this model is known as the semiparametric latent factor model (Teh et al., 2005).
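As a minimal numerical sketch (NumPy only; the function names `rbf` and `lmc_cov` and the squared-exponential kernel choice are illustrative assumptions, not the paper's code), the rank-one LMC covariance over J stacked LPFs can be assembled as a sum of Kronecker products:

```python
import numpy as np

def rbf(X, X2, lengthscale=1.0):
    """Squared-exponential covariance k_q(x, x')."""
    d2 = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def lmc_cov(X, A, lengthscales):
    """Rank-one LMC covariance over J stacked latent parameter functions:
    Cov[f_j(x), f_j'(x')] = sum_q A[j, q] * A[j', q] * k_q(x, x').
    A is the (J, Q) matrix of coefficients a_{j,q}; returns (J*N, J*N)."""
    N = X.shape[0]
    J, Q = A.shape
    K = np.zeros((J * N, J * N))
    for q in range(Q):
        Kq = rbf(X, X, lengthscales[q])
        # Kronecker structure: B_q (x) K_q with B_q = a_q a_q^T (rank one)
        K += np.kron(np.outer(A[:, q], A[:, q]), Kq)
    return K
```

Because each term is a PSD coregionalisation matrix Kronecker a PSD kernel matrix, the sum is a valid joint covariance.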
Heterogeneous Likelihood Model
Consider a set of output functions Y = {y_d(x)}_{d=1}^{D}, with x in R^p, that we want to jointly model using GPs. Let y(x) = [y_1(x), y_2(x), ..., y_D(x)]^T be a vector-valued function. If the outputs are conditionally independent given the vector of parameters θ(x) = [θ_1(x), θ_2(x), ..., θ_D(x)]^T, we may define
$$p(\mathbf{y}(\mathbf{x})\,|\,\boldsymbol{\theta}(\mathbf{x})) = p(\mathbf{y}(\mathbf{x})\,|\,\mathbf{f}(\mathbf{x})) = \prod_{d=1}^{D} p(y_d(\mathbf{x})\,|\,\theta_d(\mathbf{x})) = \prod_{d=1}^{D} p(y_d(\mathbf{x})\,|\,\tilde{\mathbf{f}}_d(\mathbf{x})),$$
where f̃_d(x) = [f_{d,1}(x), ..., f_{d,J_d}(x)]^T in R^{J_d×1} is the vector of LPFs that specifies the parameters in θ_d(x), for an arbitrary number D of likelihood functions.
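The factorisation above means the joint log-likelihood is simply a sum of per-output terms, each with its own distribution. A toy sketch with one binary and one real output (the helper names are hypothetical, not from the paper's code):

```python
import numpy as np

def bernoulli_loglik(y, f):
    """Binary output: log p(y | rho) with rho = sigmoid(f).
    Written in the numerically stable form y*f - log(1 + e^f)."""
    return y * f - np.log1p(np.exp(f))

def gaussian_loglik(y, f, sigma=1.0):
    """Real output: log N(y | f, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - f) / sigma) ** 2

def heterogeneous_loglik(ys, fs, logliks):
    """Conditional independence across the D outputs: the joint
    log-likelihood is the sum of the per-output log-likelihoods."""
    return sum(ll(y, f) for y, f, ll in zip(ys, fs, logliks))
```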
Variational Bounds
Sparse Approximations in MOGPs: We define a set of M inducing variables per latent function u_q(x) as u_q = [u_q(z_1), ..., u_q(z_M)]^T, evaluated at a set of inducing inputs Z = {z_m}_{m=1}^{M} in R^{M×p}. We also define u = [u_1^T, ..., u_Q^T]^T in R^{QM×1}. We approximate the posterior p(f, u | y, X) as follows:
$$p(\mathbf{f}, \mathbf{u}\,|\,\mathbf{y}, \mathbf{X}) \approx q(\mathbf{f}, \mathbf{u}) = p(\mathbf{f}\,|\,\mathbf{u})\, q(\mathbf{u}) = \prod_{d=1}^{D} \prod_{j=1}^{J_d} p(f_{d,j}\,|\,\mathbf{u}) \prod_{q=1}^{Q} q(\mathbf{u}_q),$$
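Given a Gaussian q(u_q) = N(m, S), the implied marginal at new inputs follows the standard sparse-GP conditional. A hedged NumPy sketch (the RBF kernel and these helper names are my assumptions, not the paper's implementation):

```python
import numpy as np

def rbf(X, X2, ell=1.0):
    """Squared-exponential kernel."""
    d2 = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

def conditional(Xnew, Z, m_u, S_u, ell=1.0, jitter=1e-6):
    """Marginal of q at Xnew implied by q(u) = N(m_u, S_u) at inducing
    inputs Z:
      mean = K_xz K_zz^{-1} m_u
      cov  = K_xx - K_xz K_zz^{-1} (K_zz - S_u) K_zz^{-1} K_zx"""
    Kzz = rbf(Z, Z, ell) + jitter * np.eye(len(Z))
    Kxz = rbf(Xnew, Z, ell)
    A = np.linalg.solve(Kzz, Kxz.T).T          # K_xz K_zz^{-1}
    mean = A @ m_u
    cov = rbf(Xnew, Xnew, ell) - A @ (Kzz - S_u) @ A.T
    return mean, cov
```

Setting m_u = 0 and S_u equal to the prior covariance K_zz recovers the GP prior at Xnew, a standard sanity check for this conditional.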
Variational Inference: Exact posterior inference is intractable in our model due to the presence of an arbitrary number of non-Gaussian likelihoods. We use variational inference to compute a lower bound L on the marginal log-likelihood log p(y) and to approximate the posterior distribution p(f, u | D).
$$\mathcal{L} = \sum_{d=1}^{D} \sum_{n=1}^{N} \mathbb{E}_{q(\tilde{\mathbf{f}}_d)}\big[\log p(y_d(\mathbf{x}_n)\,|\,\tilde{\mathbf{f}}_d)\big] - \sum_{q=1}^{Q} \mathrm{KL}\big(q(\mathbf{u}_q)\,\|\,p(\mathbf{u}_q)\big)$$

Acknowledgements: PMM is supported by a doctoral FPI grant (BES2016-077626) under the project Macro-ADOBE (TEC2015-67719-P), MINECO, Spain. AAR acknowledges the projects ADVENTURE (TEC2015-69868-C2-1-R), AID (TEC2014-62194-EXP) and CASI-CAM-CM (S2013/ICE-2845). MAA has been partially financed by the Engineering and Physical Sciences Research Council (EPSRC) research projects EP/N014162/1 and EP/R034303/1.
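Both terms of the bound L are cheap to evaluate: the KL between Gaussians is closed-form, and the expected log-likelihood can be estimated by sampling from the marginal q(f̃_d) when the likelihood is non-Gaussian. An illustrative sketch under those assumptions (not the paper's implementation):

```python
import numpy as np

def kl_gaussians(m_q, S_q, S_p):
    """KL(N(m_q, S_q) || N(0, S_p)) -- the penalty term for one q(u_q)."""
    M = len(m_q)
    tr = np.trace(np.linalg.solve(S_p, S_q))
    maha = m_q @ np.linalg.solve(S_p, m_q)
    _, ld_q = np.linalg.slogdet(S_q)
    _, ld_p = np.linalg.slogdet(S_p)
    return 0.5 * (tr + maha - M + ld_p - ld_q)

def expected_loglik_mc(y, m_f, v_f, loglik, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_{q(f)}[log p(y | f)], q(f) = N(m_f, v_f)."""
    rng = np.random.default_rng(seed)
    f = m_f + np.sqrt(v_f) * rng.standard_normal(n_samples)
    return np.mean(loglik(y, f))
```

For a Gaussian likelihood the expectation is available in closed form, which gives a direct check on the Monte Carlo estimator.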
Results (code: github.com/pmorenoz/HetMOGP)
Missing Gap Prediction: We predict observations in one output (binary classification) using training information from another one (Gaussian regression). Multi-output test-NLPD: 32.5 ± 0.2 × 10^-2; single-output test-NLPD: 40.51 ± 0.08 × 10^-2.
[Figure: three panels over a real input in [0, 1]. Left: Output 1, Gaussian regression (real output). Centre: Output 2, binary classification under the multi-output model. Right: the same binary classification task modelled as a single output.]
London House Price: Complete register of properties sold in the Greater London County during 2017. All property addresses were translated to latitude-longitude points. For each spatial input, we considered two observations, one binary (property type) and one real (sale price).
[Figure: four maps of Greater London (longitude -0.51 to 0.33, latitude 51.29 to 51.69). Panels: Property Type (flat/other), Sale Price (colour scale 79K£ to 1.5M£), Probability of Flat House (0 to 1), Log-price Variance (0.3 to 2.4).]
TEST-NLPD [London]   Bernoulli     Heteroscedastic   Global
HetMOGP              6.38 ± 0.46   10.05 ± 0.64      16.44 ± 0.01
ChainedGP            6.75 ± 0.25   10.56 ± 1.03      17.31 ± 1.06
Human Behavior Data: We model human behavior in psychiatric patients. Our data comes from a medical study that uses the monitoring app eB2.
[Figure: one week (Monday to Sunday) of predictions for a patient. Output 1 (binary): presence/absence at home. Output 2 (log-distance): distance from home (km). Output 3 (binary): use/non-use of WhatsApp.]
Conclusions
We present a MOGP model for handling heterogeneous observations that is able to work on large-scale datasets. Experimental results show relevant improvements with respect to independent learning.
References
Y. W. Teh et al., Semiparametric latent factor models. AISTATS, 2005.
M. A. Álvarez et al., Sparse convolved Gaussian processes for multi-output regression. NIPS, 2008.
J. D. Hadfield, MCMC methods for multi-response GLMMs. JSS, 2010.
J. Hensman et al., Gaussian processes for big data. UAI, 2013.
A. Saul et al., Chained Gaussian processes. AISTATS, 2016.
Likelihood      Linked parameters
Gaussian        μ(x) = f, σ(x)
Het. Gaussian   μ(x) = f_1, σ(x) = exp(f_2)
Bernoulli       ρ(x) = exp(f) / (1 + exp(f))
Categorical     ρ_k(x) = exp(f_k) / (1 + Σ_{k'=1}^{K-1} exp(f_{k'}))
Poisson         λ(x) = exp(f)
Gamma           a(x) = exp(f_1), b(x) = exp(f_2)
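The link functions in the table above map unconstrained LPF values to valid likelihood parameters. A small illustrative sketch (function names are mine):

```python
import numpy as np

def bernoulli_link(f):
    """rho(x) = exp(f) / (1 + exp(f)) -- the logistic link."""
    return 1.0 / (1.0 + np.exp(-f))

def categorical_link(F):
    """rho_k(x) = exp(f_k) / (1 + sum_{k'=1}^{K-1} exp(f_{k'})).
    F holds the K-1 latent function values; the last class
    probability is fixed by normalisation."""
    e = np.exp(F)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)

def het_gaussian_link(f1, f2):
    """mu(x) = f1, sigma(x) = exp(f2): the exp keeps the scale positive."""
    return f1, np.exp(f2)
```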