
The Gaussian Process Latent Variable Model (GPLVM)


Page 1: The Gaussian Process Latent Variable Model (GPLVM)

Gaussian Process Latent Variable Model (GPLVM)

James McMurray

PhD Student, Department of Empirical Inference

11/02/2014

Page 2: The Gaussian Process Latent Variable Model (GPLVM)

Outline of talk

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 3: The Gaussian Process Latent Variable Model (GPLVM)

Why are latent variable models useful?

- Data has structure.

Page 4: The Gaussian Process Latent Variable Model (GPLVM)

Why are latent variable models useful?

- Observed high-dimensional data often lies on a lower-dimensional manifold.

Example “Swiss Roll” dataset

Page 5: The Gaussian Process Latent Variable Model (GPLVM)

Why are latent variable models useful?

- The structure in the data means that we do not need such high dimensionality to describe it.
- The lower-dimensional space is often easier to work with.
- Allows for interpolation between observed data points.

Page 6: The Gaussian Process Latent Variable Model (GPLVM)

Definition of a latent variable model

- Assumptions:
  - Assume that the observed variables actually result from a smaller set of latent variables.
  - Assume that the observed variables are independent given the latent variables.
- Differs slightly from the dimensionality reduction paradigm, which seeks a lower-dimensional embedding of the high-dimensional data.
- With the latent variable model we specify the functional form of the mapping (a small generative sketch follows below):

  y = g(x) + ε

  where x are the latent variables, y are the observed variables and ε is noise.
- We obtain different latent variable models for different assumptions on g(x) and ε.
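As a concrete illustration of the mapping above, the following sketch draws a one-dimensional latent variable and maps it to two-dimensional observations through an arbitrary non-linear g plus Gaussian noise. The particular g used here is purely illustrative, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D latent variables
x = rng.uniform(-3, 3, size=200)

# A non-linear mapping g: R -> R^2 (chosen purely for illustration)
def g(x):
    return np.column_stack([np.sin(x), x * np.cos(x)])

# Observed data = mapping of the latents plus Gaussian noise epsilon
noise_std = 0.1
Y = g(x) + noise_std * rng.standard_normal((x.size, 2))
```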

Page 7: The Gaussian Process Latent Variable Model (GPLVM)

Graphical model representation

Graphical Model example of a Latent Variable Model. Taken from Neil Lawrence:
http://ml.dcs.shef.ac.uk/gpss/gpws14/gp_gpws14_session3.pdf

Page 8: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 9: The Gaussian Process Latent Variable Model (GPLVM)

Principal Components Analysis (PCA)

- Returns orthogonal dimensions of maximum variance (a minimal sketch follows below).
- Works well if the data lies on a plane in the higher-dimensional space.
- Linear method (although variants allow non-linear application, e.g. kernel PCA).

Example application of PCA. Taken from
http://www.nlpca.org/pca_principal_component_analysis.html
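For reference, a minimal PCA sketch in NumPy: centre the data, eigendecompose the sample covariance, and project onto the top-q directions of maximum variance. Function and variable names are my own.

```python
import numpy as np

def pca(Y, q):
    """Project data Y (N x D) onto its top-q principal components."""
    Yc = Y - Y.mean(axis=0)                  # centre the data
    C = Yc.T @ Yc / Yc.shape[0]              # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:q]    # top-q directions of max variance
    W = eigvecs[:, order]                    # D x q principal directions
    return Yc @ W, W                         # N x q scores, loadings
```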

Page 10: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 11: The Gaussian Process Latent Variable Model (GPLVM)

Probabilistic PCA (PPCA)

- A probabilistic version of PCA.
- The probabilistic formulation is useful for many reasons:
  - Allows comparison with other techniques via a likelihood measure.
  - Facilitates statistical testing.
  - Allows application of Bayesian methods.
  - Provides a principled way of handling missing values, via Expectation Maximisation.

Page 12: The Gaussian Process Latent Variable Model (GPLVM)

PPCA Definition

- Consider a set of centred data with N observations and D dimensions: Y = [y_1, ..., y_N]^T, Y ∈ R^{N×D}.
- We assume this data has a linear relationship with points X ∈ R^{N×q} in an embedded latent space.
- y_n = W x_n + ε_n, where x_n is the q-dimensional latent variable associated with each observation, and W ∈ R^{D×q} is the transformation matrix relating the latent and observed spaces.
- We assume a spherical Gaussian distribution for the noise, with zero mean and covariance β^{-1} I.
- The likelihood for an observation y_n is:

  p(y_n | x_n, W, β) = N(y_n | W x_n, β^{-1} I)
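A small sketch of evaluating this per-observation likelihood with NumPy/SciPy, assuming the same shapes as above (W is D × q, x_n has length q); the function name is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ppca_loglik(y_n, x_n, W, beta):
    """log N(y_n | W x_n, beta^{-1} I) for a single observation."""
    D = y_n.shape[0]
    mean = W @ x_n
    cov = np.eye(D) / beta
    return multivariate_normal.logpdf(y_n, mean=mean, cov=cov)
```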

Page 13: The Gaussian Process Latent Variable Model (GPLVM)

PPCA Derivation

- Marginalise the latent variables x_n (using a Gaussian prior on them) and solve for W by maximum likelihood.

- The prior used for x_n in the integration is a zero-mean, unit-covariance Gaussian:

  p(x_n) = N(x_n | 0, I)

  p(y_n | W, β) = ∫ p(y_n | x_n, W, β) p(x_n) dx_n
                = ∫ N(y_n | W x_n, β^{-1} I) N(x_n | 0, I) dx_n
                = N(y_n | 0, W W^T + β^{-1} I)

- Assuming i.i.d. data, the likelihood of the full set is the product of the individual probabilities:

  p(Y | W, β) = ∏_{n=1}^{N} p(y_n | W, β)
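The marginal likelihood above can be evaluated directly once W and β are given. A minimal sketch (illustrative names, SciPy for the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ppca_marginal_loglik(Y, W, beta):
    """Sum of log N(y_n | 0, W W^T + beta^{-1} I) over the N rows of Y."""
    D = Y.shape[1]
    C = W @ W.T + np.eye(D) / beta          # marginal covariance
    return multivariate_normal.logpdf(Y, mean=np.zeros(D), cov=C).sum()
```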

Page 14: The Gaussian Process Latent Variable Model (GPLVM)

PPCA Derivation

- To calculate that marginalisation step we use the summation and scaling properties of Gaussians.
- The sum of independent Gaussian variables is Gaussian:

  ∑_{i=1}^{n} N(μ_i, σ_i²) ∼ N(∑_{i=1}^{n} μ_i, ∑_{i=1}^{n} σ_i²)

- Scaling a Gaussian leads to a Gaussian:

  w N(μ, σ²) ∼ N(wμ, w²σ²)

- So, with y = W x + ε, x ∼ N(0, I), ε ∼ N(0, σ² I):

  W x ∼ N(0, W W^T)
  W x + ε ∼ N(0, W W^T + σ² I)
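A quick Monte Carlo sanity check of this covariance identity: sample y = W x + ε and compare the empirical covariance with W Wᵀ + σ²I. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, q, sigma = 4, 2, 0.3
W = rng.standard_normal((D, q))

# Sample y = W x + eps with x ~ N(0, I_q) and eps ~ N(0, sigma^2 I_D)
X = rng.standard_normal((100_000, q))
E = sigma * rng.standard_normal((100_000, D))
Y = X @ W.T + E

empirical = np.cov(Y, rowvar=False)            # ~ W W^T + sigma^2 I
analytic = W @ W.T + sigma**2 * np.eye(D)
print(np.max(np.abs(empirical - analytic)))    # small; shrinks with more samples
```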

Page 15: The Gaussian Process Latent Variable Model (GPLVM)

PPCA Derivation

- We can find a solution for W by maximising the likelihood.
- This results in an eigenvalue problem.
- It turns out that the closed-form solution for W is achieved when W spans the principal sub-space of the data¹.
- Same solution as PCA: hence Probabilistic PCA.
- Can it be extended to capture non-linear features?

¹ Michael E. Tipping and Christopher M. Bishop. "Probabilistic principal component analysis." (1997).
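A sketch of that closed-form solution along the lines of Tipping and Bishop: take the top-q eigenvectors of the sample covariance, estimate the noise variance from the discarded eigenvalues, and rescale (the solution is only defined up to a rotation). This is a minimal reading of their result, not their reference implementation.

```python
import numpy as np

def ppca_ml(Y, q):
    """Closed-form maximum-likelihood estimates of W and the noise variance."""
    N, D = Y.shape
    Yc = Y - Y.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(Yc.T @ Yc / N)
    order = np.argsort(eigvals)[::-1]
    lam, U = eigvals[order], eigvecs[:, order]
    sigma2 = lam[q:].mean()                               # average discarded variance
    W = U[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2))     # up to an arbitrary rotation
    return W, sigma2
```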

Page 16: The Gaussian Process Latent Variable Model (GPLVM)

Dual PPCA

- Similar to the previous derivation of PPCA.
- But marginalise W and optimise X.
- Same linear-Gaussian relationship between latent variables and data, now written as a product over the D output dimensions:

  p(Y | X, W, β) = ∏_{d=1}^{D} N(y_{:,d} | X w_{d,:}, β^{-1} I)

- Place a conjugate prior on W:

  p(W) = ∏_{d=1}^{D} N(w_{d,:} | 0, I)

- The resulting marginal likelihood is:

  p(Y | X, β) = ∏_{d=1}^{D} N(y_{:,d} | 0, X X^T + β^{-1} I)
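A minimal sketch of evaluating this marginal likelihood: one N-dimensional Gaussian per output dimension, all sharing the covariance X Xᵀ + β⁻¹I. Names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def dual_ppca_loglik(Y, X, beta):
    """Sum over the D columns of Y of log N(y_{:,d} | 0, X X^T + beta^{-1} I)."""
    N = Y.shape[0]
    K = X @ X.T + np.eye(N) / beta           # N x N linear-kernel covariance
    return sum(multivariate_normal.logpdf(Y[:, d], mean=np.zeros(N), cov=K)
               for d in range(Y.shape[1]))
```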

Page 17: The Gaussian Process Latent Variable Model (GPLVM)

Dual PPCA

- Results in an eigenvalue problem equivalent to that of PPCA.
- So what is the benefit?
- The eigendecomposition is now of an N × N matrix (the inner-product matrix Y Y^T) rather than the D × D data covariance, which is advantageous when N < D.
- Recall the marginal likelihood:

  p(Y | X, β) = ∏_{d=1}^{D} N(y_{:,d} | 0, X X^T + β^{-1} I)

- The covariance matrix is a covariance function:

  K = X X^T + β^{-1} I

- This linear kernel can be replaced by other covariance functions to obtain non-linearity.
- This is the GPLVM.
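A minimal sketch of that kernel swap: build an exponentiated-quadratic covariance over the latent points and minimise the resulting negative log marginal likelihood with respect to X. This toy version uses scipy.optimize.minimize with numerical gradients and fixed kernel hyperparameters, whereas real implementations also optimise the hyperparameters and use analytic gradients (e.g. scaled conjugate gradients); all names are my own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """Exponentiated-quadratic covariance between latent points."""
    sq = cdist(X, X, "sqeuclidean")
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gplvm_neg_loglik(x_flat, Y, q, beta):
    """Negative log marginal likelihood (up to an additive constant)."""
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    K = rbf_kernel(X) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(K)
    Kinv_Y = np.linalg.solve(K, Y)
    return 0.5 * (D * logdet + np.trace(Y.T @ Kinv_Y))

def fit_gplvm(Y, q=2, beta=100.0):
    """Toy fit for small datasets only: numerical gradients, random init."""
    N = Y.shape[0]
    X0 = np.random.default_rng(0).standard_normal((N, q))
    res = minimize(gplvm_neg_loglik, X0.ravel(), args=(Y, q, beta))
    return res.x.reshape(N, q)
```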

Page 18: The Gaussian Process Latent Variable Model (GPLVM)

GPLVM

- Each dimension of the marginal distribution can be interpreted as an independent Gaussian process².
- Dual PPCA is the special case where the output dimensions are assumed to be linear, independent and identically distributed.
- The GPLVM removes the assumption of linearity.
- Gaussian prior over the function space.
- The choice of covariance function changes the family of functions considered.
- Popular kernels (see the sketch after this list):
  - Exponentiated quadratic (RBF) kernel
  - Matérn kernels
  - Periodic kernels
  - Many more...

² Neil Lawrence, "Probabilistic non-linear principal component analysis with Gaussian process latent variable models." JMLR (2005).
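Two of the listed covariance functions written out explicitly (standard textbook forms; the function names are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

def exponentiated_quadratic(X1, X2, variance=1.0, lengthscale=1.0):
    """RBF kernel: k(x, x') = v * exp(-|x - x'|^2 / (2 l^2))."""
    return variance * np.exp(-0.5 * cdist(X1, X2, "sqeuclidean") / lengthscale**2)

def periodic(X1, X2, variance=1.0, lengthscale=1.0, period=1.0):
    """Periodic kernel: k(x, x') = v * exp(-2 sin^2(pi |x - x'| / p) / l^2)."""
    d = cdist(X1, X2, "euclidean")
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period)**2 / lengthscale**2)
```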

Page 19: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 20: The Gaussian Process Latent Variable Model (GPLVM)

Example: Frey Face data

Example in GPMat

Page 21: The Gaussian Process Latent Variable Model (GPLVM)

Example: Motion Capture data

Taken from Keith Grochow, et al. "Style-based inverse kinematics." ACM Transactions on Graphics (TOG). Vol. 23. No. 3. ACM, 2004.

Page 22: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 23: The Gaussian Process Latent Variable Model (GPLVM)

Practical points

- Need to optimise a non-convex objective function.
- Achieved using gradient-based methods (scaled conjugate gradients).
- Several restarts are used to try to avoid local optima (see the sketch after this list).
- A global optimum cannot be guaranteed.
- High computational cost for large datasets.
- May need to optimise over only the most informative subset of the data, the "active set", for sparsification.
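A sketch of the restart strategy mentioned in the list above: optimise from a PCA initialisation plus a few random initialisations and keep the solution with the lowest objective. The `objective` argument is assumed to be a GPLVM-style negative log likelihood over a flattened latent matrix (like the toy one sketched earlier); this is not any particular library's interface.

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_restarts(objective, Y, q, obj_args=(), n_restarts=5, seed=0):
    """Minimise objective(x_flat, Y, q, *obj_args) from several starts, keep the best.

    One initialisation uses PCA (often far better than random); the rest are random.
    """
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    inits = [Yc @ Vt[:q].T]                                 # PCA initialisation
    inits += [rng.standard_normal((N, q)) for _ in range(n_restarts - 1)]

    best = None
    for X0 in inits:
        res = minimize(objective, X0.ravel(), args=(Y, q, *obj_args))
        if best is None or res.fun < best.fun:
            best = res                                      # keep the lowest objective
    return best.x.reshape(N, q)

# Example usage with the toy objective sketched earlier:
# X = fit_with_restarts(gplvm_neg_loglik, Y, q=2, obj_args=(100.0,))
```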

Page 24: The Gaussian Process Latent Variable Model (GPLVM)

Practical points

Initialisation can have a large effect on the final results.

Effect of poor initialisation on the Swiss Roll dataset (PCA left, Isomap right). Taken from "Probabilistic non-linear principal component analysis with Gaussian process latent variable models.", Neil Lawrence, JMLR (2005).

Page 25: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 26: The Gaussian Process Latent Variable Model (GPLVM)

Variants

- There are a number of variants of the GPLVM.
- For example, the standard GPLVM uses the same covariance function for each output dimension.
- This can be changed: for example, the Scaled GPLVM introduces a scaling parameter for each output dimension³.
- The Gaussian Process Dynamical Model (GPDM) adds another Gaussian process for dynamical mappings⁴.
- The Bayesian GPLVM approximately integrates over both the latent variables and the mapping function⁵.

³ Keith Grochow, et al. "Style-based inverse kinematics." ACM Transactions on Graphics (TOG). Vol. 23. No. 3. ACM, 2004.
⁴ Wang, Jack, David J. Fleet, and Aaron Hertzmann. "Gaussian process dynamical models." Advances in Neural Information Processing Systems. 2005.
⁵ Titsias, Michalis, and Neil Lawrence. "Bayesian Gaussian process latent variable model." (2010).

Page 27: The Gaussian Process Latent Variable Model (GPLVM)

Variants

- The Shared GPLVM learns mappings from a shared latent space to two separate observation spaces.
- Used by Disney Research in their paper "Animating Non-Humanoid Characters with Human Motion Data" for generating animations for non-human characters from human motion-capture data.

Shared GPLVM mappings as used by Disney Research

- Video

Page 28: The Gaussian Process Latent Variable Model (GPLVM)

Variants

- Can also put a Gaussian process prior on X to produce Deep Gaussian Processes⁶.
- Zhang et al. developed the Invariant GPLVM⁷, which permits interpretation of causal relations between observed variables by allowing arbitrary noise correlations between the latent variables.
- Currently attempting to implement the IGPLVM in GPy.

⁶ Damianou, Andreas C., and Neil D. Lawrence. "Deep Gaussian Processes." arXiv preprint arXiv:1211.0358 (2012).
⁷ Zhang, K., Schölkopf, B., and Janzing, D. "Invariant Gaussian Process Latent Variable Models and Application in Causal Discovery." UAI 2010.

Page 29: The Gaussian Process Latent Variable Model (GPLVM)

Plan

Introduction
  Why are latent variable models useful?
  Definition of a latent variable model
  Graphical model representation
PCA recap
  Principal Components Analysis (PCA)
Probabilistic PCA based methods
  Probabilistic PCA (PPCA)
  Dual PPCA
  GPLVM
Examples
Practical points
Variants
Conclusion
  Conclusion
  References

Page 30: The Gaussian Process Latent Variable Model (GPLVM)

Conclusion

- Implemented in GPy (Python) and GPMat (MATLAB); a minimal GPy sketch follows below.
- Many practical applications: pose modelling, tweening.
- Especially useful when smooth interpolation is desirable.
- Modelling of confounders.
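For completeness, a minimal sketch of fitting a GPLVM with GPy. The class and argument names (`GPy.models.GPLVM`, `GPy.kern.RBF`, `optimize`) are written from memory and may differ between GPy versions, so treat this as a starting point rather than verified API usage.

```python
import numpy as np
import GPy

Y = np.random.randn(100, 10)                    # stand-in for real observations
kernel = GPy.kern.RBF(input_dim=2)              # exponentiated quadratic kernel
model = GPy.models.GPLVM(Y, input_dim=2, kernel=kernel)
model.optimize(messages=True)                   # gradient-based optimisation
X_latent = model.X                              # learned latent coordinates
```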

Thanks for your time

Questions?

Page 31: The Gaussian Process Latent Variable Model (GPLVM)

References

- Neil Lawrence. "Gaussian process latent variable models for visualisation of high dimensional data." Advances in Neural Information Processing Systems 16 (2004).
- Neil Lawrence. "Probabilistic non-linear principal component analysis with Gaussian process latent variable models." JMLR (2005).
- Gaussian Process Winter School, Sheffield 2013: http://ml.dcs.shef.ac.uk/gpss/gpws14/
- WikiCourseNote: http://wikicoursenote.com/wiki/Probabilistic_PCA_with_GPLVM