Latent Forces Models using Gaussian Processes


Mauricio A. Alvarez

(joint with David Luengo and Neil D. Lawrence)

Latent Force Models Workshop, Sheffield, June 2013

Contents

Introduction

Motivation: From Latent Variables to Latent Forces

Latent Force Models Basics

Learning Latent Force Models

Applications
– Second order dynamical systems
– Partial Differential Equations
– Stochastic LFM

Extensions
– Non-linear and cascaded LFMs
– Switched Latent Force Models

Data driven paradigm

q Traditionally, the main focus in machine learning has been model generation through a data driven paradigm.

q Combine a data set with a flexible class of models and, through regularization, make predictions on unseen data.

q Problems
– Data is scarce relative to the complexity of the system.
– Model is forced to extrapolate.

Mechanistic models

q Models inspired by the underlying knowledge of a physical system are common in many areas.

q Description of a well characterized physical process that underpins the system, typically represented with a set of differential equations.

q Identifying and specifying all the interactions might not be feasible.

q A mechanistic model can enable accurate prediction in regions where there may be no available training data.

Hybrid systems

q We suggest a hybrid approach involving a mechanistic model of the system augmented through machine learning techniques.

q Dynamical systems (e.g. incorporating first order and second order differential equations).

q Partial differential equations for systems with multiple inputs.

Latent variable model: definition

q Our approach can be seen as a type of latent variable model.

$$Y = U W^\top + E,$$

where $Y \in \mathbb{R}^{N \times D}$, $U \in \mathbb{R}^{N \times Q}$, $W \in \mathbb{R}^{D \times Q}$ ($Q < D$) and $E$ is a matrix-variate white Gaussian noise with columns $e_{:,d} \sim \mathcal{N}(0, \Sigma)$.

q In PCA and FA the common approach to deal with the unknowns is to integrate out U under a Gaussian prior and optimize with respect to W.

Latent variable model: alternative view

q Data with temporal nature and a Gaussian (Markov) prior for the rows of U leads to the Kalman filter/smoother.

q Consider a joint distribution for $p(U \mid t)$, $t = [t_1 \ldots t_N]^\top$, with the form of a Gaussian process (GP),

$$p(U \mid t) = \prod_{q=1}^{Q} \mathcal{N}\left(u_{:,q} \mid 0, K_{u_{:,q}, u_{:,q}}\right).$$

The latent variables are random functions, $\{u_q(t)\}_{q=1}^{Q}$, with associated covariance $K_{u_{:,q}, u_{:,q}}$.

q The GP for Y can be readily implemented.
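As a rough illustration of the latent variable model above, the following sketch draws each column of U from a GP prior and forms Y = U W^T + E. The squared-exponential kernel, the lengthscale, the noise level, the jitter and the particular W are all assumptions made for the example; the slides do not specify them.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale=1.0):
    """Squared-exponential covariance between two sets of time points (assumed kernel)."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_latent_gp_model(t, W, noise_std=0.1, lengthscale=1.0, seed=0):
    """Draw Y = U W^T + E, with each column of U sampled from a zero-mean GP."""
    rng = np.random.default_rng(seed)
    N, (D, Q) = len(t), W.shape
    K = rbf_kernel(t, t, lengthscale) + 1e-6 * np.eye(N)  # jitter for numerical stability
    L = np.linalg.cholesky(K)
    U = L @ rng.standard_normal((N, Q))           # columns u_{:,q} ~ N(0, K)
    E = noise_std * rng.standard_normal((N, D))   # white Gaussian noise
    return U @ W.T + E, U

t = np.linspace(0.0, 10.0, 50)
W = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])  # D = 3 outputs, Q = 2 latent functions
Y, U = sample_latent_gp_model(t, W)
```

Here the outputs are instantaneous linear mixtures of the latent functions; the latent force model replaces this mixing with a dynamical system.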

Latent force model: mechanistic interpretation (1)

q We include a further dynamical system with a mechanistic inspiration.

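q Reinterpret the equation $Y = U W^\top + E$ as a force balance equation

$$Y B = U S^\top + E,$$

where $S \in \mathbb{R}^{D \times Q}$ is a matrix of sensitivities, $B \in \mathbb{R}^{D \times D}$ is a diagonal matrix of spring constants, $W^\top = S^\top B^{-1}$ and $e_{:,d} \sim \mathcal{N}(0, B^\top \Sigma B)$.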


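[Figure: output $y_d(t)$ attached to a spring with constant $B_d$ and driven by the latent forces $u_1(t), u_2(t), \ldots, u_Q(t)$ through the sensitivities $S_{d1}, S_{d2}, \ldots, S_{dQ}$, illustrating $Y B = U S^\top + E$.]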

Latent force model: extension (1)

q The model can be extended to include dampers and masses.

q We can write

$$\ddot{Y} M + \dot{Y} C + Y B = U S^\top + E,$$

where
– $\dot{Y}$ is the first derivative of $Y$ w.r.t. time
– $\ddot{Y}$ is the second derivative of $Y$ w.r.t. time
– $C$ is a diagonal matrix of damping coefficients
– $M$ is a diagonal matrix of masses
– $E$ is a matrix-variate white Gaussian noise.


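[Figure: output $y_d(t)$ attached to a spring $B_d$, a damper $C_d$ and a mass $M_d$, driven by the latent forces $u_1(t), u_2(t), \ldots, u_Q(t)$ through the sensitivities $S_{d1}, S_{d2}, \ldots, S_{dQ}$, illustrating $\ddot{Y} M + \dot{Y} C + Y B = U S^\top + E$.]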

Latent force model: properties

q This model can capture behaviors such as inertia and resonance.

q We refer to these systems as latent force models (LFMs).

q One way of thinking of our model is to consider puppetry.

General Dynamical LFM

q Dynamical latent force model of order M

$$\sum_{m=0}^{M} \mathcal{D}^m[Y] A_m = U S^\top + E,$$

where $\mathcal{D}^m[Y]$ has elements $\mathcal{D}^m y_d(t) = \frac{d^m y_d(t)}{dt^m}$, and $A_m$ is a diagonal matrix with elements $A_{m,d}$ that weight the contribution of $\mathcal{D}^m y_d(t)$.

q Each element in the expression above can be written as

$$\mathcal{D}_0^M y_d = \sum_{m=0}^{M} A_{m,d} \mathcal{D}^m y_d(t) = \sum_{q=1}^{Q} S_{d,q} u_q(t) + e_d(t),$$

where we have introduced an operator $\mathcal{D}_0^M$ that is equivalent to applying the weighted sum of operators $\mathcal{D}^m$.

Green’s functions

q The operator $\mathcal{D}_0^M$ is related to a so-called Green's function $G_d(t, s)$ by

$$\mathcal{D}_0^M[G_d(t, s)] = \delta(t - s),$$

with $s$ fixed.

q The solution for $y_d(t)$ can be written in terms of the Green's function as

$$y_d(t) = \sum_{q=1}^{Q} S_{d,q} f_d(t, u_q(t)) + w_d(t),$$

with

$$f_d(t, u_q(t)) = \int_{T} G_d(t, \tau) u_q(\tau)\, d\tau,$$

and $w_d(t)$ is a general stochastic process.

Covariance for the outputs

q We assume that the latent functions $\{u_q(t)\}_{q=1}^{Q}$ are independent.

q We also assume that each $u_q(t)$ follows a Gaussian process prior, that is, $u_q(t) \sim \mathcal{GP}(0, k_{u_q,u_q}(t, t'))$.

q Furthermore, the processes $\{w_d\}_{d=1}^{D}$ are also assumed independent.

q The covariance $\mathrm{cov}[y_d(t), y_{d'}(t')]$ is then given as

$$\mathrm{cov}[f_d(t), f_{d'}(t')] + \mathrm{cov}[w_d(t), w_{d'}(t')]\,\delta_{d,d'},$$

with $\mathrm{cov}[f_d(t), f_{d'}(t')]$ equal to

$$\sum_{q=1}^{Q} S_{d,q} S_{d',q} \int_{T} \int_{T'} G_d(t - \tau) G_{d'}(t' - \tau') k_{u_q,u_q}(\tau, \tau')\, d\tau'\, d\tau.$$
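The double integral above can be approximated numerically when no closed form is at hand. The sketch below does this by quadrature for a single latent force (Q = 1). The first-order Green's function exp(-decay * s), the squared-exponential kernel for u, the integration range [0, max t] and all parameter values are assumptions chosen for illustration, not the choices used in the talk.

```python
import numpy as np

def green_first_order(s, decay):
    """Illustrative causal Green's function: G(s) = exp(-decay * s) for s >= 0, else 0."""
    return np.where(s >= 0.0, np.exp(-decay * np.clip(s, 0.0, None)), 0.0)

def lfm_output_covariance(t, decays, S, lengthscale=1.0, n_quad=400):
    """Approximate cov[f_d(t), f_d'(t')] for one latent force by trapezoid quadrature."""
    tau = np.linspace(0.0, t.max(), n_quad)
    w = np.full(n_quad, tau[1] - tau[0])  # trapezoid weights on the tau grid
    w[[0, -1]] *= 0.5
    # Assumed squared-exponential covariance for the latent force u.
    Ku = np.exp(-0.5 * ((tau[:, None] - tau[None, :]) / lengthscale) ** 2)
    # A_d[i, j] = G_d(t_i - tau_j) * w_j, so A_d @ u approximates the convolution.
    A = [green_first_order(t[:, None] - tau[None, :], a) * w for a in decays]
    D = len(decays)
    blocks = [[S[d] * S[e] * A[d] @ Ku @ A[e].T for e in range(D)] for d in range(D)]
    return np.block(blocks)

t = np.linspace(0.0, 5.0, 20)
K = lfm_output_covariance(t, decays=[0.5, 2.0], S=[1.0, -0.7])
```

The result stacks the D = 2 outputs into one 40 x 40 matrix whose off-diagonal blocks carry the cross-covariances induced by the shared latent force.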

Multidimensional inputs

q In dynamical latent force models the input variable is one-dimensional (time).

q For higher-dimensional inputs, $x \in \mathbb{R}^p$, partial differential equations are used.

q Once the Green's function associated to the linear partial differential operator has been established, we employ similar equations to the ones shown before to compute covariances.

q The input t is replaced by a high-dimensional vector x.

Hyperparameter Learning

q Let $X = \{x_n\}_{n=1}^{N}$ represent a set of inputs, and $\theta$ the hyperparameters of the covariance function.

q The marginal likelihood for the outputs can be written as

$$p(y \mid X, \theta) = \mathcal{N}(y \mid 0, K_{f,f} + \Sigma),$$

where $y = \mathrm{vec}\, Y$ and $K_{f,f} \in \mathbb{R}^{ND \times ND}$ has each element given by $\mathrm{cov}[f_d(x_n), f_{d'}(x_{n'})]$ (Neil's talk on Tuesday and today).

q The matrix $\Sigma$ represents the covariance associated with the independent processes $w_d(x)$.

q Hyperparameters are estimated by maximizing the logarithm of the marginal likelihood.
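For concreteness, the log marginal likelihood log N(y | 0, K_ff + Sigma) can be evaluated through a Cholesky factorization, as in the sketch below. The function name and the toy one-point example are illustrative assumptions; in practice K_ff would be built from the latent force covariance and the value would be passed to a gradient-based optimizer over the hyperparameters.

```python
import numpy as np

def log_marginal_likelihood(y, K_ff, Sigma):
    """Evaluate log N(y | 0, K_ff + Sigma) using a Cholesky factorization."""
    n = y.shape[0]
    L = np.linalg.cholesky(K_ff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K_ff + Sigma)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log |K_ff + Sigma|
    return -0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))

# Toy check with a single observation: y ~ N(0, 1 + 1).
value = log_marginal_likelihood(np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
```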

Predictive distribution

q Prediction for a set of test inputs $X_*$ is done using standard Gaussian process regression techniques.

q The predictive distribution is given by

$$p(y_* \mid y, X, \theta) = \mathcal{N}(y_* \mid \mu_*, K_{y_*,y_*}),$$

with

$$\mu_* = K_{f_*,f} (K_{f,f} + \Sigma)^{-1} y,$$

$$K_{y_*,y_*} = K_{f_*,f_*} - K_{f_*,f} (K_{f,f} + \Sigma)^{-1} K_{f_*,f}^\top + \Sigma_*.$$
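The two predictive equations map directly onto a few lines of linear algebra. The sketch below is a minimal version that takes precomputed covariance blocks as inputs; the function name, argument order and the scalar toy values are assumptions made for the example.

```python
import numpy as np

def gp_predict(y, K_ff, K_sf, K_ss, Sigma, Sigma_star):
    """Return the predictive mean and covariance of standard GP regression.

    K_sf is the test/train covariance K_{f*,f}; K_ss is the test/test covariance K_{f*,f*}.
    """
    L = np.linalg.cholesky(K_ff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K_ff + Sigma)^{-1} y
    mean = K_sf @ alpha
    V = np.linalg.solve(L, K_sf.T)                       # V^T V = K_sf (K_ff + Sigma)^{-1} K_sf^T
    cov = K_ss - V.T @ V + Sigma_star
    return mean, cov

mean, cov = gp_predict(
    y=np.array([2.0]),
    K_ff=np.array([[1.0]]),
    K_sf=np.array([[0.5]]),
    K_ss=np.array([[1.0]]),
    Sigma=np.array([[1.0]]),
    Sigma_star=np.array([[1.0]]),
)
```

For this scalar example the formulas give a predictive mean of 0.5 and a predictive variance of 1.875, which is easy to verify by hand.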

Efficient approximations (I)

q Learning $\theta$ through marginal likelihood maximization involves the inversion of the matrix $K_{f,f} + \Sigma$.

q The inversion of this matrix scales as $O(D^3 N^3)$.

q Single output case (D = 1) (James' talk on Tuesday).

q Recently, Alvarez and Lawrence (2009) introduced an efficient approximation for the case D > 1.

Efficient approximations (II)

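q If $u(x)$ were fully known, the set of outputs $f_d(x, u(x))$ would be uniquely determined. The approximation assumes that a small number $K < N$ of values of $u(x)$ is enough to determine the outputs approximately.

q Similar to the Partially Independent Training Conditional (PITC) approximation.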

Efficient approximations (III)

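Sample from $p(u)$:

$$f_d(x) = \int_{X} G_d(x - z) u(z)\, dz$$

Sample from $p(u \mid \mathbf{u})$, where $\mathbf{u}$ collects the known values of $u$:

$$f_d(x) \approx \int_{X} G_d(x - z)\, \mathrm{E}[u(z) \mid \mathbf{u}]\, dz$$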

Efficient approximations (IV)

q Another approximation (Alvarez et al., 2010) establishes a lower bound on the marginal likelihood and reduces computational complexity to $O(DNK^2)$.

q Maximizing the lower bound with respect to $\phi(u)$ gives

$$\mathcal{L}(Z, \theta) = \log \mathcal{N}\left(y \mid 0, K_{f,u} K_{u,u}^{-1} K_{u,f} + \Sigma\right) - \frac{1}{2} \mathrm{trace}\left[\Sigma^{-1}\left(K_{f,f} - K_{f,u} K_{u,u}^{-1} K_{u,f}\right)\right].$$

q Deterministic Training Conditional Variational (DTCVAR) approximation for multiple-output GP regression.
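The bound can be checked numerically against the exact log marginal likelihood. The sketch below does so for a single-output toy problem; the squared-exponential kernel, the inducing locations z, the noise level and the data are assumptions for illustration. It evaluates the bound naively with dense N x N matrices, so it does not reproduce the O(DNK^2) cost, which requires exploiting the low-rank structure.

```python
import numpy as np

def gauss_logpdf_zero_mean(y, C):
    """log N(y | 0, C) via a Cholesky factorization."""
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, y)
    return -0.5 * (z @ z + 2.0 * np.sum(np.log(np.diag(L))) + len(y) * np.log(2.0 * np.pi))

def variational_lower_bound(y, K_ff, K_fu, K_uu, Sigma):
    """log N(y | 0, Q + Sigma) - 0.5 * trace(Sigma^{-1} (K_ff - Q)), Q = K_fu K_uu^{-1} K_uf."""
    Q_ff = K_fu @ np.linalg.solve(K_uu, K_fu.T)
    fit = gauss_logpdf_zero_mean(y, Q_ff + Sigma)
    penalty = 0.5 * np.trace(np.linalg.solve(Sigma, K_ff - Q_ff))
    return fit - penalty

def rbf(a, b):
    """Assumed squared-exponential kernel with unit lengthscale."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)   # training inputs
z = np.linspace(0.0, 5.0, 6)    # inducing inputs (K = 6 < N = 30)
Sigma = 0.1 * np.eye(30)
y = np.sin(x) + 0.3 * rng.standard_normal(30)

bound = variational_lower_bound(y, rbf(x, x), rbf(x, z), rbf(z, z) + 1e-8 * np.eye(6), Sigma)
exact = gauss_logpdf_zero_mean(y, rbf(x, x) + Sigma)
```

The value of `bound` should never exceed `exact`, and the gap shrinks as the inducing inputs cover the training inputs more densely.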

Second Order Dynamical System

Using the system of second order differential equations

$$M_d \frac{d^2 y_d(t)}{dt^2} + C_d \frac{d y_d(t)}{dt} + B_d y_d(t) = \sum_{q=1}^{Q} S_{d,q} u_q(t) + e_d(t),$$

where
– $u_q(t)$: latent forces
– $y_d(t)$: displacements over time
– $C_d$: damper constant for the d-th output
– $B_d$: spring constant for the d-th output
– $M_d$: mass constant for the d-th output
– $S_{d,q}$: sensitivity of the d-th output to the q-th input.

Second Order Dynamical System: solution

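Solving for $y_d(t)$, we obtain

$$y_d(t) = \sum_{q=1}^{Q} S_{d,q} f_d(t, u_q(t)) + w_d(t),$$

where the linear operator is given by a convolution:

$$f_d(t, u_q(t)) = \int_0^t \underbrace{\frac{1}{\omega_d} \exp(-\alpha_d (t - \tau)) \sin(\omega_d (t - \tau))}_{G_d(t - \tau)}\, u_q(\tau)\, d\tau,$$

with $\omega_d = \sqrt{4 B_d - C_d^2}/2$ and $\alpha_d = C_d/2$. These expressions correspond to unit mass, $M_d = 1$.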
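A quick numerical sanity check of this Green's function is to drive the system with a constant unit force: for unit mass the displacement should settle at 1/B_d. The sketch below evaluates the convolution with the trapezoid rule. It assumes the underdamped case (4B > C^2) so that omega is real, and the grid and parameter values are illustrative only.

```python
import numpy as np

def green_second_order(s, B, C):
    """G(s) = exp(-alpha s) sin(omega s) / omega for s >= 0, else 0 (unit mass, 4B > C^2)."""
    alpha = C / 2.0
    omega = np.sqrt(4.0 * B - C ** 2) / 2.0
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0.0, np.exp(-alpha * s) * np.sin(omega * s) / omega, 0.0)

def convolve_force(t, u, B, C):
    """Approximate f(t_i) = int_0^{t_i} G(t_i - tau) u(tau) dtau on a uniform grid."""
    dt = t[1] - t[0]
    G = green_second_order(t[:, None] - t[None, :], B, C)  # zero for tau > t_i
    w = np.full(len(t), dt)
    w[0] = 0.5 * dt  # trapezoid end weight at tau = 0; the end at tau = t_i has G(0) = 0
    return (G * w) @ u

t = np.linspace(0.0, 30.0, 1501)
u = np.ones_like(t)              # constant unit force
f = convolve_force(t, u, B=2.0, C=1.0)
```

With B = 2 and C = 1 the final value `f[-1]` is close to 1/B = 0.5, after a decaying oscillation.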

Second Order Dynamical System: covariance matrix

Behaviour of the system summarized by the damping ratio:

$$\zeta_d = \frac{1}{2} \frac{C_d}{\sqrt{B_d}}$$

– $\zeta_d > 1$: overdamped system
– $\zeta_d = 1$: critically damped system
– $\zeta_d < 1$: underdamped system
– $\zeta_d = 0$: undamped system (no friction)

Example covariance matrix:

– $\zeta_1 = 0.125$: underdamped
– $\zeta_2 = 2$: overdamped
– $\zeta_3 = 1$: critically damped
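The classification above is simple to encode. The sketch below computes the damping ratio from the spring and damper constants and names the regime; the function names and the tolerance used to detect the critically damped case are assumptions. The spring constant B = 1 in the usage lines is also assumed, chosen so that the three damper constants reproduce the ratios 0.125, 2 and 1 from the example.

```python
import math

def damping_ratio(B, C):
    """zeta = (1/2) * C / sqrt(B), the damping ratio for unit mass."""
    return 0.5 * C / math.sqrt(B)

def damping_regime(zeta, tol=1e-12):
    """Name the regime that corresponds to a damping ratio zeta >= 0."""
    if zeta == 0.0:
        return "undamped"
    if math.isclose(zeta, 1.0, abs_tol=tol):
        return "critically damped"
    return "underdamped" if zeta < 1.0 else "overdamped"

# Assumed spring constant B = 1; the damper constants give zeta = 0.125, 2 and 1.
regimes = [damping_regime(damping_ratio(1.0, C)) for C in (0.25, 4.0, 2.0)]
```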

[Figure: example covariance matrix over $f(t)$, $y_1(t)$, $y_2(t)$ and $y_3(t)$, with a color scale from −0.4 to 0.8.]

Second Order Dynamical System: samples from GP

[Figure: joint samples from the ODE covariance. Cyan: $u(t)$; red: $y_1(t)$ (underdamped); green: $y_2(t)$ (overdamped); blue: $y_3(t)$ (critically damped).]

Motion Capture Data (1)

q CMU motion capture data, motions 18, 19 and 20 from subject 49.

q Motions 18 and 19 for training and 20 for testing.

Motion Capture Data (2)

q The data was down-sampled by a factor of 32 (from 120 frames per second to 3.75).

q We focused on the subject's left arm.

q For testing, we condition only on the observations of the shoulder's orientation (motion 20) to make predictions for the rest of the arm's angles.

Motion Capture Results

Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms the prediction with regression for all apart from the radius's angle.

Angle               Latent Force error   Regression error
Radius              4.11                 4.02
Wrist               6.55                 6.65
Hand X rotation     1.82                 3.21
Hand Z rotation     2.76                 6.14
Thumb X rotation    1.77                 3.10
Thumb Z rotation    2.73                 6.09

Diffusion in the Swiss Jura

[Figure: region of the Swiss Jura, with panels labeled Lead, Cadmium and Copper.]

Diffusion equation

q A simplified version of the diffusion equation is

$$\frac{\partial f_d(x, t)}{\partial t} = \sum_{j=1}^{p} \kappa_d \frac{\partial^2 f_d(x, t)}{\partial x_j^2},$$

where $f_d(x, t)$ are the concentrations of each pollutant.

q The solution to the system is then given by

$$f_d(x, t) = \sum_{q=1}^{Q} S_{d,q} \int_{\mathbb{R}^p} G_d(x, x', t)\, u_q(x')\, dx',$$

where $u_q(x)$ represents the concentration of pollutants at time zero and $G_d(x, x', t)$ is the Green's function given as

$$G_d(x, x', t) = \frac{1}{2^p \pi^{p/2} T_d^{p/2}} \exp\left[-\sum_{j=1}^{p} \frac{(x_j - x'_j)^2}{4 T_d}\right],$$

with $T_d = \kappa_d t$.
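The Green's function above is a Gaussian in x − x' and should integrate to one over x. The sketch below implements it and checks the normalization in one dimension with a Riemann sum. The function name, the grid and the values of t and kappa are assumptions made for illustration.

```python
import numpy as np

def diffusion_green(x, x_prime, t, kappa):
    """Green's function of the diffusion equation in p dimensions, with T = kappa * t.

    x has shape (n, p) and x_prime has shape (p,); returns an array of shape (n,).
    """
    x = np.atleast_2d(x)
    x_prime = np.atleast_1d(x_prime)
    p = x.shape[-1]
    T = kappa * t
    sq_dist = np.sum((x - x_prime) ** 2, axis=-1)
    norm = 2.0 ** p * np.pi ** (p / 2.0) * T ** (p / 2.0)
    return np.exp(-sq_dist / (4.0 * T)) / norm

# One-dimensional check: the kernel should integrate to 1 over x.
grid = np.linspace(-20.0, 20.0, 4001)[:, None]   # shape (n, 1), so p = 1
values = diffusion_green(grid, np.array([0.0]), t=1.0, kappa=0.5)
mass = np.sum(values) * (grid[1, 0] - grid[0, 0])
```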

Prediction of Metal Concentrations

q Prediction of a primary variable by conditioning on the values of some secondary variables.

Primary variable   Secondary variables
Cd                 Ni, Zn
Cu                 Pb, Ni, Zn
Pb                 Cu, Ni, Zn
Co                 Ni, Zn

q Comparison between the diffusion kernel (GPDK), independent GPs (IGPs) and "ordinary co-kriging" (OCK).

Metal   IGPs                GPDK                OCK
Cd      0.5823 ± 0.0133     0.4505 ± 0.0126     0.5
Cu      15.9357 ± 0.0907    7.1677 ± 0.2266     7.8
Pb      22.9141 ± 0.6076    10.1097 ± 0.2842    10.7
Co      2.0735 ± 0.1070     1.7546 ± 0.0895     1.5

A dynamic model for financial data (I)

Multivariate financial data set: the dollar prices of the 3 precious metals and top 10 currencies.

[Figure: two time series panels, Gold and AUD.]

A dynamic model for financial data (II)

q Our model: a set of coupled differential equations, driven by either a Gaussian process, a white noise process or both,

$$\frac{d f_d(t)}{dt} = \lambda_d f_d(t) + S_d u(t),$$

where $\lambda_d$ is a decay coefficient and $S_d$ quantifies the influence of the process $u(t)$.

q If $u(t)$ is a white noise process → Langevin equation → a linear stochastic differential equation.

q Solution for $f_d(t)$ has the form of convolutions. For a single output and white noise process, $f_d(t)$ → Ornstein-Uhlenbeck (OU) process.
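To make the white-noise case concrete, the convolution solution f(t) = S ∫_0^t exp(lam (t − tau)) u(tau) dtau has a closed-form covariance when u is unit-intensity white noise and f(0) = 0, namely S^2 / (−2 lam) * (exp(lam |t − t'|) − exp(lam (t + t'))). The sketch below evaluates it. The initial condition, the unit noise intensity and the negative value of lam used in the example (so that the process settles to a stationary OU process) are assumptions, not details given in the slides.

```python
import numpy as np

def ou_covariance(t1, t2, lam, S):
    """cov[f(t), f(t')] for df/dt = lam * f + S * u, with u unit white noise and f(0) = 0."""
    t1 = np.asarray(t1, dtype=float)[:, None]
    t2 = np.asarray(t2, dtype=float)[None, :]
    scale = S ** 2 / (-2.0 * lam)
    return scale * (np.exp(lam * np.abs(t1 - t2)) - np.exp(lam * (t1 + t2)))

t = np.linspace(0.0, 20.0, 5)
K = ou_covariance(t, t, lam=-1.0, S=1.0)
```

With lam = −1 and S = 1 the variance starts at zero and approaches the stationary value S^2 / (2 |lam|) = 0.5 for large t.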

A dynamic model for financial data (III)

[Figure: four time series panels, XPT, CAD, JPY and AUD.]

Non-linear and Cascaded LFMs

q Non-linear LFMs
– If the likelihood function is not Gaussian or the differential equation is nonlinear, approximations are needed.
– Approximations used before include the Laplace approximation (Lawrence et al., 2007) or sampling (Titsias et al., 2009).

q Cascaded Latent Force Models
– Latent forces $u_q(t)$ could be the outputs of another latent force model.
– For example, in Honkela et al. (2010), the authors use a cascaded system to describe gene expression data.

Recap

q Latent force models encode the interaction between multiple related dynamical systems in the form of a covariance function.

q Each variable to be modeled is represented as the output of a differential equation.

q Differential equations are driven by a weighted sum of latent functions with uncertainty given by Gaussian process priors.

Discontinuous latent forces

q If a single Gaussian process prior is used to represent each latent function then the models we consider are limited to smooth driving functions.

q However, discontinuities and segmented latent forces are omnipresent in real-world data.

q Impact forces due to contacts in a mechanical dynamical system or a switch in an electrical circuit.

q Motor primitives: most non-rhythmic natural motor skills consist of a sequence of segmented, discrete movements.

LFM

[Figure: block diagram of an LFM. The input force(t) ∼ GP(0, k(t, t')) drives a system with mass, damper, spring and sensitivity to produce output(t); side panels plot force(t) and output(t) against time.]

Switched LFM

[Figure: block diagram of a switched LFM. The input force(t, i) ∼ GP(0, k(t, t', i)) drives a system with mass(i), damper(i), spring(i) and sensitivity(i) to produce output(t); side panels plot force(t, 1), force(t, 2), force(t, 3), then output(t, 1), output(t, 2), output(t, 3), and finally the combined output(t), all against t.]

Continuous in the outputs

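[Figure: output $z(t)$ over intervals starting at $t_0$, $t_1$ and $t_2$, with segments $y_1(t - t_0)$, $y_2(t - t_1)$, $y_3(t - t_2)$ and labels $y_1(t_0)$, $y_1(t_1 - t_0)$, $y_2(t_1)$, $y_2(t_2 - t_1)$, $y_3(t_2)$ at the switching points.]

$$z_d(t) = c_d^i(t - t_{i-1})\, y_d^i(t_{i-1}) + e_d^i(t - t_{i-1})\, \dot{y}_d^i(t_{i-1}) + S_{d,i}\, f_d^i(t - t_{i-1}, u_{i-1}),$$

where

$$f_d^i(t, u_{i-1}) = \int_0^t \frac{1}{\omega_d} e^{-\alpha_d (t - \tau)} \sin[(t - \tau)\omega_d]\, u_{i-1}(\tau)\, d\tau.$$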

Covariance and Samples

[Figure: covariance matrix over the interval 0 to 5 on both axes (color scale 0.1 to 1), and sampled outputs plotted against time.]

Toy examples

[Figure: toy example 1, with panels "LF toy example 1." and "Output 1 toy example 1." and one further panel; toy example 2, with panel "LF toy example 2." and two further panels.]

Segmentation of human movement (I)

q The task is to segment discrete movements related to motor primitives.

q Data collection was performed using a Barrett WAM robot as haptic input device, with 7 DOF.

Segmentation of human movement (II)

[Figure: results for Try 1 and Try 2. For each try: value of the log-likelihood against the number of intervals (1 to 12), latent force against time, and output against time (HR for Try 1, SFE for Try 2).]

References

Alvarez, M. A., Lawrence, N. D., 2009. Sparse convolved Gaussian processes for multi-output regression. In: Koller et al. (2009), Vol. 21, pp. 57–64.

Alvarez, M. A., Luengo, D., Titsias, M. K., Lawrence, N. D., 13-15 May 2010. Efficient multioutput Gaussian processes through variational inducing kernels. In: Teh, Y. W., Titterington, M. (Eds.), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR W&CP 9, Chia Laguna, Sardinia, Italy, pp. 25–32.

Honkela, A., Girardot, C., Gustafson, E. H., Liu, Y.-H., Furlong, E. E. M., Lawrence, N. D., Rattray, M., 2010. Model-based method for transcription factor target identification with limited data. Proc. Natl. Acad. Sci. 107 (17), 7793–7798.

Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), 2009. NIPS. Vol. 21. MIT Press, Cambridge, MA.

Lawrence, N. D., Sanguinetti, G., Rattray, M., 2007. Modelling transcriptional regulation using Gaussian processes. In: Schölkopf, B., Platt, J. C., Hofmann, T. (Eds.), NIPS. Vol. 19. MIT Press, Cambridge, MA, pp. 785–792.

Titsias, M., Lawrence, N. D., Rattray, M., 2009. Efficient sampling for Gaussian process inference using control variables. In: Koller et al. (2009), Vol. 21, pp. 1681–1688.

Titsias, M. K., 16-18 April 2009. Variational learning of inducing variables in sparse Gaussian processes. In: van Dyk, D., Welling, M. (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. JMLR W&CP 5, Clearwater Beach, Florida, pp. 567–574.
