Heterogeneous Multi-output Gaussian Process Prediction
Pablo Moreno-Muñoz1  Antonio Artés-Rodríguez1  Mauricio A. Álvarez2
1Universidad Carlos III de Madrid, Spain   2University of Sheffield, UK
{pmoreno, antonio}@tsc.uc3m.es   [email protected]
Introduction
A novel extension of multi-output Gaussian processes (MOGPs) for handling heterogeneous outputs (binary, real, categorical, ...). Each output has its own likelihood distribution, and we use a MOGP prior to jointly model the parameters of all likelihoods as latent functions. We obtain tractable variational bounds amenable to stochastic variational inference (SVI).
Multi-output GPs
We use a linear model of coregionalisation (LMC) type of covariance function to express correlations between the latent parameter functions (LPFs) f_{d,j}(x).

Each LPF is a linear combination of independent latent functions U = {u_q(x)}_{q=1}^{Q}. Each u_q(x) is assumed to be drawn from a GP prior such that u_q(·) ~ GP(0, k_q(·, ·)), where k_q can be any valid covariance function:
$$f_{d,j}(\mathbf{x}) = \sum_{q=1}^{Q} \sum_{i=1}^{R_q} a^{i}_{d,j,q}\, u^{i}_{q}(\mathbf{x}),$$
We assume that R_q = 1, meaning that the coregionalisation matrices are rank one. In the literature this model is known as the semiparametric latent factor model (Teh et al., 2005).
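As a minimal numerical sketch (NumPy only; the function names `rbf` and `lmc_cov` and the squared-exponential kernel choice are illustrative assumptions, not the paper's code), the rank-one LMC covariance over J stacked LPFs can be assembled as a sum of Kronecker products:

```python
import numpy as np

def rbf(X, X2, lengthscale=1.0):
    """Squared-exponential covariance k_q(x, x')."""
    d2 = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def lmc_cov(X, A, lengthscales):
    """Rank-one LMC covariance over J stacked latent parameter functions:
    Cov[f_j(x), f_j'(x')] = sum_q A[j, q] * A[j', q] * k_q(x, x').
    A is the (J, Q) matrix of coefficients a_{j,q}; returns (J*N, J*N)."""
    N = X.shape[0]
    J, Q = A.shape
    K = np.zeros((J * N, J * N))
    for q in range(Q):
        Kq = rbf(X, X, lengthscales[q])
        # Kronecker structure: B_q (x) K_q with B_q = a_q a_q^T (rank one)
        K += np.kron(np.outer(A[:, q], A[:, q]), Kq)
    return K
```

Because each term is a PSD coregionalisation matrix Kronecker a PSD kernel matrix, the sum is a valid joint covariance.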
Heterogeneous Likelihood Model
Consider a set of output functions Y = {y_d(x)}_{d=1}^{D}, with x in R^p, that we want to jointly model using GPs. Let y(x) = [y_1(x), y_2(x), ..., y_D(x)]^T be a vector-valued function. If the outputs are conditionally independent given the vector of parameters θ(x) = [θ_1(x), θ_2(x), ..., θ_D(x)]^T, we may define
$$p(\mathbf{y}(\mathbf{x})\,|\,\boldsymbol{\theta}(\mathbf{x})) = p(\mathbf{y}(\mathbf{x})\,|\,\mathbf{f}(\mathbf{x})) = \prod_{d=1}^{D} p(y_d(\mathbf{x})\,|\,\theta_d(\mathbf{x})) = \prod_{d=1}^{D} p(y_d(\mathbf{x})\,|\,\tilde{\mathbf{f}}_d(\mathbf{x})),$$
where f̃_d(x) = [f_{d,1}(x), ..., f_{d,J_d}(x)]^T in R^{J_d×1} is the vector of LPFs that specifies the parameters in θ_d(x), for an arbitrary number D of likelihood functions.
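The factorisation above means the joint log-likelihood is simply a sum of per-output terms, each with its own distribution. A toy sketch with one binary and one real output (the helper names are hypothetical, not from the paper's code):

```python
import numpy as np

def bernoulli_loglik(y, f):
    """Binary output: log p(y | rho) with rho = sigmoid(f).
    Written in the numerically stable form y*f - log(1 + e^f)."""
    return y * f - np.log1p(np.exp(f))

def gaussian_loglik(y, f, sigma=1.0):
    """Real output: log N(y | f, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - f) / sigma) ** 2

def heterogeneous_loglik(ys, fs, logliks):
    """Conditional independence across the D outputs: the joint
    log-likelihood is the sum of the per-output log-likelihoods."""
    return sum(ll(y, f) for y, f, ll in zip(ys, fs, logliks))
```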
Variational Bounds
Sparse Approximations in MOGPs: We define a set of M inducing variables per latent function u_q(x) as u_q = [u_q(z_1), ..., u_q(z_M)]^T, evaluated at a set of inducing inputs Z = {z_m}_{m=1}^{M} in R^{M×p}. We also define u = [u_1^T, ..., u_Q^T]^T in R^{QM×1}. We approximate the posterior p(f, u | y, X) as follows:
$$p(\mathbf{f}, \mathbf{u}\,|\,\mathbf{y}, \mathbf{X}) \approx q(\mathbf{f}, \mathbf{u}) = p(\mathbf{f}\,|\,\mathbf{u})\, q(\mathbf{u}) = \prod_{d=1}^{D} \prod_{j=1}^{J_d} p(f_{d,j}\,|\,\mathbf{u}) \prod_{q=1}^{Q} q(\mathbf{u}_q),$$
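Given a Gaussian q(u_q) = N(m, S), the implied marginal at new inputs follows the standard sparse-GP conditional. A hedged NumPy sketch (the RBF kernel and these helper names are my assumptions, not the paper's implementation):

```python
import numpy as np

def rbf(X, X2, ell=1.0):
    """Squared-exponential kernel."""
    d2 = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

def conditional(Xnew, Z, m_u, S_u, ell=1.0, jitter=1e-6):
    """Marginal of q at Xnew implied by q(u) = N(m_u, S_u) at inducing
    inputs Z:
      mean = K_xz K_zz^{-1} m_u
      cov  = K_xx - K_xz K_zz^{-1} (K_zz - S_u) K_zz^{-1} K_zx"""
    Kzz = rbf(Z, Z, ell) + jitter * np.eye(len(Z))
    Kxz = rbf(Xnew, Z, ell)
    A = np.linalg.solve(Kzz, Kxz.T).T          # K_xz K_zz^{-1}
    mean = A @ m_u
    cov = rbf(Xnew, Xnew, ell) - A @ (Kzz - S_u) @ A.T
    return mean, cov
```

Setting m_u = 0 and S_u equal to the prior covariance K_zz recovers the GP prior at Xnew, a standard sanity check for this conditional.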
Variational Inference: Exact posterior inference is intractable in our model due to the presence of an arbitrary number of non-Gaussian likelihoods. We use variational inference to compute a lower bound L on the marginal log-likelihood log p(y) and to approximate the posterior distribution p(f, u | D).
$$\mathcal{L} = \sum_{d=1}^{D} \sum_{n=1}^{N} \mathbb{E}_{q(\tilde{\mathbf{f}}_d)}\big[\log p(y_d(\mathbf{x}_n)\,|\,\tilde{\mathbf{f}}_d)\big] - \sum_{q=1}^{Q} \mathrm{KL}\big(q(\mathbf{u}_q)\,\|\,p(\mathbf{u}_q)\big)$$

Acknowledgements: PMM is supported by a doctoral FPI grant (BES2016-077626) under the project Macro-ADOBE (TEC2015-67719-P), MINECO, Spain. AAR acknowledges the projects ADVENTURE (TEC2015-69868-C2-1-R), AID (TEC2014-62194-EXP) and CASI-CAM-CM (S2013/ICE-2845). MAA has been partially financed by the Engineering and Physical Sciences Research Council (EPSRC) research projects EP/N014162/1 and EP/R034303/1.
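Both terms of the bound L are cheap to evaluate: the KL between Gaussians is closed-form, and the expected log-likelihood can be estimated by sampling from the marginal q(f̃_d) when the likelihood is non-Gaussian. An illustrative sketch under those assumptions (not the paper's implementation):

```python
import numpy as np

def kl_gaussians(m_q, S_q, S_p):
    """KL(N(m_q, S_q) || N(0, S_p)) -- the penalty term for one q(u_q)."""
    M = len(m_q)
    tr = np.trace(np.linalg.solve(S_p, S_q))
    maha = m_q @ np.linalg.solve(S_p, m_q)
    _, ld_q = np.linalg.slogdet(S_q)
    _, ld_p = np.linalg.slogdet(S_p)
    return 0.5 * (tr + maha - M + ld_p - ld_q)

def expected_loglik_mc(y, m_f, v_f, loglik, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_{q(f)}[log p(y | f)], q(f) = N(m_f, v_f)."""
    rng = np.random.default_rng(seed)
    f = m_f + np.sqrt(v_f) * rng.standard_normal(n_samples)
    return np.mean(loglik(y, f))
```

For a Gaussian likelihood the expectation is available in closed form, which gives a direct check on the Monte Carlo estimator.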
Results (code: github.com/pmorenoz/HetMOGP)
Missing Gap Prediction: We predict observations in one output (binary classification) using training information from another one (Gaussian regression). Multi-output test-NLPD: 32.5 ± 0.2 × 10^-2; single-output test-NLPD: 40.51 ± 0.08 × 10^-2.
[Figure: three panels over a real input in [0, 1]. Left: Output 1, Gaussian regression (real output). Centre: Output 2, binary classification under the multi-output model. Right: the same binary classification task modelled as a single output.]
London House Price: Complete register of properties sold in the Greater London County during 2017. All property addresses were translated to latitude-longitude points. For each spatial input, we considered two observations, one binary (property type) and one real (sale price).
[Figure: four maps of Greater London (longitude -0.51 to 0.33, latitude 51.29 to 51.69). Panels: Property Type (flat/other), Sale Price (colour scale 79K£ to 1.5M£), Probability of Flat House (0 to 1), Log-price Variance (0.3 to 2.4).]
TEST-NLPD [London]   Bernoulli     Heteroscedastic   Global
HetMOGP              6.38 ± 0.46   10.05 ± 0.64      16.44 ± 0.01
ChainedGP            6.75 ± 0.25   10.56 ± 1.03      17.31 ± 1.06
Human Behavior Data: We model human behavior in psychiatric patients. Our data comes from a medical study that uses the monitoring app eB2.
[Figure: one week (Monday to Sunday) of predictions for a patient. Output 1 (binary): presence/absence at home. Output 2 (log-distance): distance from home (km). Output 3 (binary): use/non-use of WhatsApp.]
Conclusions
We present a MOGP model for handling heterogeneous observations that is able to work on large-scale datasets. Experimental results show relevant improvements with respect to independent learning.
References
Y. W. Teh et al., Semiparametric latent factor models. AISTATS, 2005.
M. A. Álvarez et al., Sparse convolved Gaussian processes for multi-output regression. NIPS, 2008.
J. D. Hadfield, MCMC methods for multi-response GLMMs. JSS, 2010.
J. Hensman et al., Gaussian processes for big data. UAI, 2013.
A. Saul et al., Chained Gaussian processes. AISTATS, 2016.
Likelihood      Linked parameters
Gaussian        μ(x) = f, σ(x)
Het. Gaussian   μ(x) = f_1, σ(x) = exp(f_2)
Bernoulli       ρ(x) = exp(f) / (1 + exp(f))
Categorical     ρ_k(x) = exp(f_k) / (1 + Σ_{k'=1}^{K-1} exp(f_{k'}))
Poisson         λ(x) = exp(f)
Gamma           a(x) = exp(f_1), b(x) = exp(f_2)
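The link functions in the table above map unconstrained LPF values to valid likelihood parameters. A small illustrative sketch (function names are mine):

```python
import numpy as np

def bernoulli_link(f):
    """rho(x) = exp(f) / (1 + exp(f)) -- the logistic link."""
    return 1.0 / (1.0 + np.exp(-f))

def categorical_link(F):
    """rho_k(x) = exp(f_k) / (1 + sum_{k'=1}^{K-1} exp(f_{k'})).
    F holds the K-1 latent function values; the last class
    probability is fixed by normalisation."""
    e = np.exp(F)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)

def het_gaussian_link(f1, f2):
    """mu(x) = f1, sigma(x) = exp(f2): the exp keeps the scale positive."""
    return f1, np.exp(f2)
```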