Latent Force Models using Gaussian Processes


Mauricio A. Alvarez

(joint with David Luengo and Neil D. Lawrence)

Latent Force Models Workshop, Sheffield, June 2013

Contents

Introduction

Motivation: From Latent Variables to Latent Forces

Latent Force Models Basics

Learning Latent Force Models

Applications
  - Second order dynamical systems
  - Partial Differential Equations
  - Stochastic LFM

Extensions
  - Non-linear and cascaded LFMs
  - Switched Latent Force Models

Data-driven paradigm

- Traditionally, the main focus in machine learning has been model generation through a data-driven paradigm.

- Combine a data set with a flexible class of models and, through regularization, make predictions on unseen data.

- Problems:
  - Data is scarce relative to the complexity of the system.
  - The model is forced to extrapolate.

Mechanistic models

- Models inspired by the underlying knowledge of a physical system are common in many areas.

- Description of a well-characterized physical process that underpins the system, typically represented with a set of differential equations.

- Identifying and specifying all the interactions might not be feasible.

- A mechanistic model can enable accurate prediction in regions where there may be no available training data.

Hybrid systems

- We suggest a hybrid approach involving a mechanistic model of the system augmented through machine learning techniques.

- Dynamical systems (e.g. incorporating first-order and second-order differential equations).

- Partial differential equations for systems with multiple inputs.


Latent variable model: definition

- Our approach can be seen as a type of latent variable model,

  Y = U W^T + E,

  where Y ∈ R^{N×D}, U ∈ R^{N×Q}, W ∈ R^{D×Q} (Q < D), and E is matrix-variate white Gaussian noise with columns e_{:,d} ∼ N(0, Σ).

- In PCA and FA the common approach to deal with the unknowns is to integrate out U under a Gaussian prior and optimize with respect to W.

Latent variable model: alternative view

- Data with a temporal nature and a Gaussian (Markov) prior for the rows of U lead to the Kalman filter/smoother.

- Consider a joint distribution for p(U|t), t = [t_1, ..., t_N]^T, with the form of a Gaussian process (GP),

  p(U|t) = ∏_{q=1}^{Q} N(u_{:,q} | 0, K_{u_{:,q},u_{:,q}}).

  The latent variables are random functions {u_q(t)}_{q=1}^{Q} with associated covariance K_{u_{:,q},u_{:,q}}.

- The GP for Y can be readily implemented.
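To make the construction concrete, here is a minimal sketch (my own illustration, not code from the talk) that samples Y = U W^T + E by drawing each latent function u_{:,q} from a GP prior with an assumed RBF covariance:

```python
import numpy as np

# Minimal sketch (assumed illustration): sample Y = U W^T + E, where each
# column u_{:,q} of U is an independent draw from a GP prior over time.

rng = np.random.default_rng(0)
N, D, Q = 100, 5, 2
t = np.linspace(0.0, 10.0, N)

# GP prior covariance over time (RBF kernel assumed for illustration).
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)
L = np.linalg.cholesky(K + 1e-8 * np.eye(N))

U = L @ rng.standard_normal((N, Q))    # u_{:,q} ~ N(0, K), independently per q
W = rng.standard_normal((D, Q))        # mixing weights
E = 0.1 * rng.standard_normal((N, D))  # white Gaussian noise, Sigma = 0.01 I

Y = U @ W.T + E                        # N x D matrix of observed outputs
print(Y.shape)
```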

Latent force model: mechanistic interpretation (1)

- We include a further dynamical system with a mechanistic inspiration.

- Reinterpret the equation Y = U W^T + E as a force balance equation,

  Y B = U S^T + E,

  where S ∈ R^{D×Q} is a matrix of sensitivities, B ∈ R^{D×D} is a diagonal matrix of spring constants, W^T = S^T B^{-1}, and e_{:,d} ∼ N(0, B^T Σ B).

Latent force model: mechanistic interpretation (2)

[Figure: each output y_d(t) is attached to a spring with constant B_d and driven by the latent forces u_1(t), ..., u_Q(t) through sensitivities S_{d1}, ..., S_{dQ}; in matrix form, Y B = U S^T + E.]

Latent force model: extension (1)

- The model can be extended by including dampers and masses.

- We can write

  Ÿ M + Ẏ C + Y B = U S^T + E,

  where
  Ẏ is the first derivative of Y w.r.t. time,
  Ÿ is the second derivative of Y w.r.t. time,
  C is a diagonal matrix of damping coefficients,
  M is a diagonal matrix of masses,
  E is matrix-variate white Gaussian noise.

Latent force model: extension (2)

[Figure: each output y_d(t) is now attached to a spring B_d, a damper C_d and a mass M_d, and is driven by the latent forces u_1(t), ..., u_Q(t) through sensitivities S_{d1}, ..., S_{dQ}; in matrix form, Ÿ M + Ẏ C + Y B = U S^T + E.]

Latent force model: properties

- This model allows us to include behaviours such as inertia and resonance.

- We refer to these systems as latent force models (LFMs).

- One way of thinking of our model is to consider puppetry.


General Dynamical LFM

- Dynamical latent force model of order M:

  ∑_{m=0}^{M} D^m[Y] A_m = U S^T + E,

  where D^m[Y] has elements D^m y_d(t) = d^m y_d(t)/dt^m, and A_m is a diagonal matrix with elements A_{m,d} that weight the contribution of D^m y_d(t).

- Each element in the expression above can be written as

  D_0^M y_d = ∑_{m=0}^{M} A_{m,d} D^m y_d(t) = ∑_{q=1}^{Q} S_{d,q} u_q(t) + e_d(t),

  where we have introduced an operator D_0^M that is equivalent to applying the weighted sum of operators D^m.

Green’s functions

- The operator D_0^M is related to a so-called Green's function G_d(t, s) by

  D_0^M[G_d(t, s)] = δ(t − s),

  with s fixed.

- The solution for y_d(t) can be written in terms of the Green's function as

  y_d(t) = ∑_{q=1}^{Q} S_{d,q} f_d(t, u_q(t)) + w_d(t),

  with

  f_d(t, u_q(t)) = ∫_T G_d(t, τ) u_q(τ) dτ,

  and w_d(t) a general stochastic process.

Covariance for the outputs

- We assume that the latent functions {u_q(t)}_{q=1}^{Q} are independent.

- We also assume that each u_q(t) follows a Gaussian process prior, that is, u_q(t) ∼ GP(0, k_{u_q,u_q}(t, t′)).

- Furthermore, the processes {w_d}_{d=1}^{D} are also assumed independent.

- The covariance cov[y_d(t), y_{d′}(t′)] is then given as

  cov[f_d(t), f_{d′}(t′)] + cov[w_d(t), w_{d′}(t′)] δ_{d,d′},

  with cov[f_d(t), f_{d′}(t′)] equal to

  ∑_{q=1}^{Q} S_{d,q} S_{d′,q} ∫_T ∫_{T′} G_d(t − τ) G_{d′}(t′ − τ′) k_{u_q,u_q}(τ, τ′) dτ′ dτ.
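As an illustration of how this double integral can be evaluated, here is a minimal numerical sketch (my own, under assumed choices: a first-order Green's function G_d(t − τ) = exp(−λ_d(t − τ)) for t ≥ τ, an RBF latent covariance, and hypothetical helpers `green`, `rbf`, `lfm_cov`):

```python
import numpy as np

# Numerical sketch (assumed illustration, not the authors' code): approximate
# cov[f_d(t), f_d'(t')] = S_d S_d' \int\int G_d(t-tau) G_d'(t'-tau') k(tau,tau') dtau' dtau
# by a Riemann sum on a grid over [0, T].

def green(t, tau, lam):
    """Green's function of dy/dt + lam*y = u(t): a causal exponential decay."""
    dt = t - tau
    return np.where(dt >= 0.0, np.exp(-lam * dt), 0.0)

def rbf(tau, taup, ell):
    """RBF covariance k(tau, tau') for the latent force."""
    return np.exp(-((tau - taup) ** 2) / ell**2)

def lfm_cov(t, tp, lam_d, lam_dp, S_d, S_dp, ell, T=10.0, n=500):
    tau = np.linspace(0.0, T, n)
    h = tau[1] - tau[0]
    K = rbf(tau[:, None], tau[None, :], ell)  # k(tau, tau'), shape (n, n)
    integrand = green(t, tau, lam_d)[:, None] * K * green(tp, tau, lam_dp)[None, :]
    return S_d * S_dp * integrand.sum() * h * h

print(lfm_cov(2.0, 3.0, lam_d=0.5, lam_dp=1.0, S_d=1.0, S_dp=0.8, ell=1.0))
```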

Multidimensional inputs

- In dynamical latent force models the input variable is one-dimensional (time).

- For higher-dimensional inputs, x ∈ R^p, partial differential equations are used.

- Once the Green's function associated with the linear partial differential operator has been established, we employ equations similar to the ones shown before to compute covariances.

- The input t is replaced by a high-dimensional vector x.


Hyperparameter Learning

- Let X = {x_n}_{n=1}^{N} represent a set of inputs, and θ the hyperparameters of the covariance function.

- The marginal likelihood for the outputs can be written as

  p(y | X, θ) = N(y | 0, K_{f,f} + Σ),

  where y = vec(Y) and K_{f,f} ∈ R^{ND×ND} has each element given by cov[f_d(x_n), f_{d′}(x_{n′})] (Neil's talk on Tuesday and today).

- The matrix Σ represents the covariance associated with the independent processes w_d(x).

- Hyperparameters are estimated by maximizing the logarithm of the marginal likelihood.
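As a concrete sketch (my own illustration: an RBF stand-in for the LFM covariance, Σ = noise·I, and toy data are all assumptions), hyperparameters can be fit by minimizing the negative log marginal likelihood with a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def kernel(x, theta):
    """Stand-in covariance (plain RBF); the LFM kernel would go here."""
    log_var, log_ell = theta
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(log_var) * np.exp(-0.5 * d2 / np.exp(2.0 * log_ell))

def neg_log_marginal(theta, x, y, noise=0.01):
    """-log N(y | 0, K(theta) + noise*I), evaluated via a Cholesky factor."""
    K = kernel(x, theta) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)
res = minimize(neg_log_marginal, x0=np.zeros(2), args=(x, y))
print("estimated (variance, length-scale):", np.exp(res.x))
```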

Predictive distribution

- Prediction for a set of test inputs X* is done using standard Gaussian process regression techniques.

- The predictive distribution is given by

  p(y* | y, X, θ) = N(y* | μ*, K_{y*,y*}),

  with

  μ* = K_{f*,f}(K_{f,f} + Σ)^{-1} y,
  K_{y*,y*} = K_{f*,f*} − K_{f*,f}(K_{f,f} + Σ)^{-1} K_{f*,f}^T + Σ*.
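These equations translate directly into code; below is a generic GP-regression sketch (my own, assuming an RBF stand-in kernel and Σ = Σ* = noise·I):

```python
import numpy as np

def rbf(a, b, var=1.0, ell=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_predict(x, y, x_star, noise=0.01):
    """mu_* = K_{f*,f}(K_{f,f}+Sigma)^{-1} y and the matching covariance."""
    A = rbf(x, x) + noise * np.eye(len(x))  # K_{f,f} + Sigma
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_sf = rbf(x_star, x)                   # K_{f*,f}
    mu = K_sf @ alpha
    V = np.linalg.solve(L, K_sf.T)          # so that V^T V = K_sf A^{-1} K_sf^T
    cov = rbf(x_star, x_star) - V.T @ V + noise * np.eye(len(x_star))  # + Sigma_*
    return mu, cov

x = np.linspace(0.0, 10.0, 30)
y = np.sin(x)
mu, cov = gp_predict(x, y, np.linspace(0.0, 12.0, 50))
print(mu.shape, cov.shape)
```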

Efficient approximations (I)

- Learning θ through marginal likelihood maximization involves the inversion of the matrix K_{f,f} + Σ.

- The inversion of this matrix scales as O(D^3 N^3).

- Single-output case (D = 1) (James' talk on Tuesday).

- Recently, Alvarez and Lawrence (2009) introduced an efficient approximation for the case D > 1.

Efficient approximations (II)

- If only a small number K < N of values of u(x) are known, then the set of outputs f_d(x, u(x)) is uniquely determined.

- Similar to the Partially Independent Training Conditional (PITC) approximation.

Efficient approximations (III)

[Figure: samples from p(u) are mapped through the convolution

f_d(x) = ∫_X G_d(x − z) u(z) dz;

in the approximation, samples are drawn from the conditional p(u | u) given a set of inducing values u, and

f_d(x) ≈ ∫_X G_d(x − z) E[u(z) | u] dz.]

Efficient approximations (IV)

- Another approximation (Alvarez et al., 2010) establishes a lower bound on the marginal likelihood and reduces the computational complexity to O(DNK^2).

- Maximizing the lower bound with respect to φ(u) gives

  L(Z, θ) = log N(y | 0, K_{f,u} K_{u,u}^{-1} K_{u,f} + Σ) − (1/2) trace[Σ^{-1}(K_{f,f} − K_{f,u} K_{u,u}^{-1} K_{u,f})].

- A Deterministic Training Conditional Variational (DTCVAR) approximation for multiple-output GP regression.
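A sketch of how such a bound can be evaluated (my own illustration with an RBF stand-in for the LFM kernel, Σ = σ²·I, and hypothetical inducing inputs z; not the authors' code):

```python
import numpy as np

def rbf(a, b, var=1.0, ell=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def variational_bound(x, y, z, sigma2=0.01):
    """L = log N(y | 0, Q + sigma2 I) - (1/(2 sigma2)) tr(K_ff - Q),
    with Q = K_fu K_uu^{-1} K_uf. (Naive O(N^3) evaluation here; the
    Woodbury identity brings the cost down to O(N K^2).)"""
    n = len(x)
    K_ff = rbf(x, x)
    K_uu = rbf(z, z) + 1e-6 * np.eye(len(z))  # jitter for stability
    K_fu = rbf(x, z)
    Q = K_fu @ np.linalg.solve(K_uu, K_fu.T)  # Nystrom-style approximation
    A = Q + sigma2 * np.eye(n)
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_lik = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    return log_lik - 0.5 * np.trace(K_ff - Q) / sigma2

x = np.linspace(0.0, 10.0, 100)
y = np.sin(x)
print(variational_bound(x, y, z=np.linspace(0.0, 10.0, 10)))
```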


Second Order Dynamical System

Using the system of second-order differential equations

M_d d²y_d(t)/dt² + C_d dy_d(t)/dt + B_d y_d(t) = ∑_{q=1}^{Q} S_{d,q} u_q(t) + e_d(t),

where
u_q(t)   latent forces,
y_d(t)   displacements over time,
C_d      damper constant for the d-th output,
B_d      spring constant for the d-th output,
M_d      mass constant for the d-th output,
S_{d,q}  sensitivity of the d-th output to the q-th input.

Second Order Dynamical System: solution

Solving for y_d(t), we obtain

y_d(t) = ∑_{q=1}^{Q} S_{d,q} f_d(t, u_q(t)) + w_d(t),

where the linear operator is given by a convolution with the Green's function G_d(t − τ):

f_d(t, u_q(t)) = ∫_0^t (1/ω_d) exp(−α_d(t − τ)) sin(ω_d(t − τ)) u_q(τ) dτ,

with ω_d = √(4B_d − C_d²)/2 and α_d = C_d/2.
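To make this concrete, here is a small sketch (my own illustration, assuming M_d = 1 and the underdamped case 4B_d > C_d²) that draws a latent force from an RBF GP prior and convolves it with the Green's function above:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 400)
h = t[1] - t[0]

# Draw the latent force u(t) ~ GP(0, k) with an RBF covariance (assumed).
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 1.0**2)
u = np.linalg.cholesky(K + 1e-8 * np.eye(len(t))) @ rng.standard_normal(len(t))

def second_order_output(u, t, B_d, C_d, S_d=1.0):
    """Discretized convolution of u with G_d(t - tau), assuming M_d = 1."""
    a_d = C_d / 2.0
    w_d = np.sqrt(4.0 * B_d - C_d**2) / 2.0   # real only if 4 B_d > C_d^2;
    dt = t[:, None] - t[None, :]              # the overdamped case needs sinh.
    G = np.where(dt >= 0.0, np.exp(-a_d * dt) * np.sin(w_d * dt) / w_d, 0.0)
    return S_d * (G @ u) * h

y1 = second_order_output(u, t, B_d=4.0, C_d=0.5)  # zeta = 0.125, underdamped
y2 = second_order_output(u, t, B_d=4.0, C_d=3.0)  # zeta = 0.75, closer to critical
print(y1[:3], y2[:3])
```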

Second Order Dynamical System: covariance matrix

Behaviour of the system is summarized by the damping ratio

ζ_d = (1/2) C_d/√B_d:

ζ_d > 1  overdamped system
ζ_d = 1  critically damped system
ζ_d < 1  underdamped system
ζ_d = 0  undamped system (no friction)

Example covariance matrix, with ζ_1 = 0.125 (underdamped), ζ_2 = 2 (overdamped), and ζ_3 = 1 (critically damped):

[Figure: joint covariance matrix between the latent force f(t) and the outputs y_1(t), y_2(t), y_3(t).]

Second Order Dynamical System: samples from GP

[Figure: joint samples from the ODE covariance; cyan: u(t), red: y_1(t) (underdamped), green: y_2(t) (overdamped), blue: y_3(t) (critically damped).]

Motion Capture Data (1)

- CMU motion capture data, motions 18, 19 and 20 from subject 49.

- Motions 18 and 19 for training and 20 for testing.

Motion Capture Data (2)

- The data were down-sampled by a factor of 32 (from 120 frames per second to 3.75).

- We focused on the subject's left arm.

- For testing, we condition only on the observations of the shoulder's orientation (motion 20) to make predictions for the rest of the arm's angles.

Motion Capture Results

Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms prediction with regression for all angles apart from the radius's.

Angle              Latent Force Error   Regression Error
Radius             4.11                 4.02
Wrist              6.55                 6.65
Hand X rotation    1.82                 3.21
Hand Z rotation    2.76                 6.14
Thumb X rotation   1.77                 3.10
Thumb Z rotation   2.73                 6.09


Diffusion in the Swiss Jura

[Figure: map of the Swiss Jura region with measured concentrations of lead, cadmium and copper.]

Diffusion equation

- A simplified version of the diffusion equation is

  ∂f_d(x, t)/∂t = ∑_{j=1}^{p} κ_d ∂²f_d(x, t)/∂x_j²,

  where f_d(x, t) are the concentrations of each pollutant.

- The solution to the system is then given by

  f_d(x, t) = ∑_{q=1}^{Q} S_{d,q} ∫_{R^p} G_d(x, x′, t) u_q(x′) dx′,

  where u_q(x) represents the concentration of pollutants at time zero and G_d(x, x′, t) is the Green's function, given as

  G_d(x, x′, t) = 1/(2^p π^{p/2} T_d^{p/2}) exp(−∑_{j=1}^{p} (x_j − x′_j)²/(4T_d)),

  with T_d = κ_d t.

Prediction of Metal Concentrations

- Prediction of a primary variable by conditioning on the values of some secondary variables.

  Primary variable   Secondary variables
  Cd                 Ni, Zn
  Cu                 Pb, Ni, Zn
  Pb                 Cu, Ni, Zn
  Co                 Ni, Zn

- Comparison between the diffusion kernel (GPDK), independent GPs (IGPs) and "ordinary co-kriging" (OCK).

  Metal   IGPs               GPDK               OCK
  Cd      0.5823 ± 0.0133    0.4505 ± 0.0126    0.5
  Cu      15.9357 ± 0.0907   7.1677 ± 0.2266    7.8
  Pb      22.9141 ± 0.6076   10.1097 ± 0.2842   10.7
  Co      2.0735 ± 0.1070    1.7546 ± 0.0895    1.5


A dynamic model for financial data (I)

Multivariate financial data set: the dollar prices of the three precious metals and the top ten currencies.

[Figure: time series of the gold price (Gold) and the Australian dollar (AUD).]

A dynamic model for financial data (II)

- Our model: a set of coupled differential equations, driven by either a Gaussian process, a white noise process, or both,

  df_d(t)/dt = λ_d f_d(t) + S_d u(t),

  where λ_d is a decay coefficient and S_d quantifies the influence of the process u(t).

- If u(t) is a white noise process → Langevin equation → a linear stochastic differential equation.

- The solution for f_d(t) has the form of convolutions. For a single output and a white noise process, f_d(t) → an Ornstein-Uhlenbeck (OU) process.
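A sketch of the OU case (my own illustration: the decay is written with an explicit minus sign, df = −λ f dt + S dW with λ > 0, so the process is mean-reverting; all numbers are assumptions), using the exact one-step discretization:

```python
import numpy as np

# Over a step h, the OU process satisfies exactly:
# f_{k+1} = f_k e^{-lam h} + sqrt(S^2 (1 - e^{-2 lam h}) / (2 lam)) * eps_k.

rng = np.random.default_rng(1)

def simulate_ou(lam, S, h=0.01, n_steps=5000, f0=0.0):
    decay = np.exp(-lam * h)
    scale = np.sqrt(S**2 * (1.0 - np.exp(-2.0 * lam * h)) / (2.0 * lam))
    f = np.empty(n_steps)
    f[0] = f0
    for k in range(1, n_steps):
        f[k] = decay * f[k - 1] + scale * rng.standard_normal()
    return f

path = simulate_ou(lam=0.5, S=1.0)
print(path[:5], "stationary std ≈", (1.0 / (2 * 0.5)) ** 0.5)
```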

A dynamic model for financial data (III)

[Figure: dollar prices of platinum (XPT), the Canadian dollar (CAD), the Japanese yen (JPY) and the Australian dollar (AUD).]


Non-linear and Cascaded LFMs

- Non-linear LFMs
  - If the likelihood function is not Gaussian or the differential equation is non-linear, approximations are needed.
  - Approximations used before include Laplace's approximation (Lawrence et al., 2007) or sampling (Titsias et al., 2009).

- Cascaded latent force models
  - The latent forces u_q(t) could be the outputs of another latent force model.
  - For example, in Honkela et al. (2010), the authors use a cascaded system to describe gene expression data.


Recap

- Latent force models encode the interaction between multiple related dynamical systems in the form of a covariance function.

- Each variable to be modeled is represented as the output of a differential equation.

- The differential equations are driven by a weighted sum of latent functions with uncertainty given by Gaussian process priors.

Discontinuous latent forces

- If a single Gaussian process prior is used to represent each latent function, then the models we consider are limited to smooth driving functions.

- However, discontinuities and segmented latent forces are omnipresent in real-world data.

- Examples: impact forces due to contacts in a mechanical dynamical system, or a switch in an electrical circuit.

- Motor primitives: most non-rhythmic natural motor skills consist of a sequence of segmented, discrete movements.

LFM

[Figure: a single LFM block with parameters mass, damper, spring and sensitivity maps a driving force(t) to an output(t), with force(t) ∼ GP(0, k(t, t′)); below, a sampled force(t) over time and the corresponding output(t).]

Switched LFM

[Figure: an LFM block whose mass(i), damper(i), spring(i) and sensitivity(i) depend on an interval index i, driven by force(t, i) ∼ GP(0, k(t, t′, i)); the force is segmented into force(t, 1), force(t, 2), force(t, 3), producing segments output(t, 1), output(t, 2), output(t, 3) that are stitched into a single output(t).]

Continuous in the outputs

[Figure: the output z(t) is built from segments y^1(t − t_0), y^2(t − t_1), y^3(t − t_2), with the initial condition of each segment, e.g. y^2(t_1) = y^1(t_1 − t_0), matched at the switching times t_0, t_1, t_2.]

In interval i,

z_d(t) = c_d^i(t − t_{i−1}) y_d^i(t_{i−1}) + e_d^i(t − t_{i−1}) ẏ_d^i(t_{i−1}) + S_{d,i} f_d^i(t − t_{i−1}, u_{i−1}),

where

f_d^i(t, u_{i−1}) = ∫_0^t (1/ω_d) e^{−α_d(t−τ)} sin[(t − τ)ω_d] u_{i−1}(τ) dτ.
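A rough sketch of the switched construction (my own illustration, with simplifications: only continuity of position is enforced, via a crude exponential stand-in for the homogeneous terms c_d^i and e_d^i; all parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_gp(t, ell):
    """Draw force(t, i) ~ GP(0, k) with an RBF covariance of length-scale ell."""
    K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell**2)
    return np.linalg.cholesky(K + 1e-8 * np.eye(len(t))) @ rng.standard_normal(len(t))

switches = [0.0, 5.0, 10.0, 15.0]   # switching times t_0, t_1, t_2, t_3
ells = [0.3, 2.0, 0.8]              # a different force length-scale per interval
B_d, C_d, S_d = 3.0, 0.5, 1.0
a_d, w_d = C_d / 2.0, np.sqrt(4.0 * B_d - C_d**2) / 2.0

z_segments, z0 = [], 0.0
for (t0, t1), ell in zip(zip(switches[:-1], switches[1:]), ells):
    t = np.linspace(t0, t1, 200)
    h = t[1] - t[0]
    u = sample_gp(t, ell)                        # independent force per interval
    dt = t[:, None] - t[None, :]
    G = np.where(dt >= 0.0, np.exp(-a_d * dt) * np.sin(w_d * dt) / w_d, 0.0)
    seg = z0 * np.exp(-a_d * (t - t0)) + S_d * (G @ u) * h  # decaying IC + convolution
    z_segments.append(seg)
    z0 = seg[-1]                                 # initial condition for next interval

z = np.concatenate(z_segments)                   # continuous stitched output
print(z.shape)
```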

Covariance and Samples

[Figure: covariance matrix of the switched LFM over the interval [0, 5], and joint samples of the outputs drawn from it.]

Toy examples

[Figure: toy example 1 — the latent force (LF) and outputs 1 and 2; toy example 2 — the latent force and outputs 1 and 2.]

Segmentation of human movement (I)

- The task is to segment discrete movements related to motor primitives.

- Data collection was performed using a Barrett WAM robot as a haptic input device, with 7 DOF.

Segmentation of human movement (II)

[Figure: for two trials, the value of the log-likelihood as a function of the number of intervals, the inferred latent force over time, and one output channel over time (HR for trial 1, SFE for trial 2).]

References I

Alvarez, M. A., Lawrence, N. D., 2009. Sparse convolved Gaussian processes for multi-output regression. In: Koller et al. (2009), Vol. 21, pp. 57–64.

Alvarez, M. A., Luengo, D., Titsias, M. K., Lawrence, N. D., 2010. Efficient multioutput Gaussian processes through variational inducing kernels. In: Teh, Y. W., Titterington, M. (Eds.), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR W&CP 9, Chia Laguna, Sardinia, Italy, pp. 25–32.

Honkela, A., Girardot, C., Gustafson, E. H., Liu, Y.-H., Furlong, E. E. M., Lawrence, N. D., Rattray, M., 2010. Model-based method for transcription factor target identification with limited data. Proc. Natl. Acad. Sci. 107 (17), 7793–7798.

Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), 2009. NIPS. Vol. 21. MIT Press, Cambridge, MA.

Lawrence, N. D., Sanguinetti, G., Rattray, M., 2007. Modelling transcriptional regulation using Gaussian processes. In: Schölkopf, B., Platt, J. C., Hofmann, T. (Eds.), NIPS. Vol. 19. MIT Press, Cambridge, MA, pp. 785–792.

Titsias, M., Lawrence, N. D., Rattray, M., 2009. Efficient sampling for Gaussian process inference using control variables. In: Koller et al. (2009), Vol. 21, pp. 1681–1688.

Titsias, M. K., 2009. Variational learning of inducing variables in sparse Gaussian processes. In: van Dyk, D., Welling, M. (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. JMLR W&CP 5, Clearwater Beach, Florida, pp. 567–574.
