

Gaussian Process Kernels for Pattern Discovery and Extrapolation

Andrew Gordon Wilson∗ Ryan Prescott Adams†

∗ Department of Engineering, University of Cambridge, Cambridge, UK
† School of Engineering and Applied Sciences, Harvard University, Cambridge, USA

07 Feb 2014

Proceedings of the 30th International Conference on Machine Learning (2013)

Presented by Kyle Ulrich

A.G. Wilson and R.P. Adams (ICML) Spectral Mixture (SM) Kernels 07 Feb 2014 1 / 18


Outline

1 Brief introduction of Gaussian Processes
    GPs and GP Regression
    Covariance kernels

2 Pattern discovery and extrapolation
    Difficulty in learning hidden representations of data
    Spectral mixture kernel

3 Experiments
    Extrapolating CO2 concentrations
    Recovering general covariance functions
    Handling negative covariances
    Recovering functions
    Predicting airline passenger numbers


Gaussian Processes

A Gaussian process is a distribution over functions,

\[ f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big), \]

where $x \in \mathbb{R}^P$ is an input variable. The mean function $m(x)$ and covariance kernel $k(x, x')$ are defined by

\[ m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \operatorname{cov}\big(f(x), f(x')\big). \]

Any finite collection of function values $[f(x_1), f(x_2), \ldots, f(x_N)]$ follows a multivariate normal distribution,

\[ [f(x_1), f(x_2), \ldots, f(x_N)]^\top \sim \mathcal{N}(\mu, K), \]

where $\mu_i = m(x_i)$ and $K_{ij} = k(x_i, x_j)$.
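The finite-dimensional definition above translates directly into a way of drawing sample functions. The following is a minimal sketch, assuming 1-D inputs, a zero mean function, and a squared exponential kernel; the names `se_kernel` and `sample_gp_prior` and the jitter value are illustrative choices, not part of the slides.

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0):
    """Squared exponential kernel on 1-D inputs (an illustrative choice of k)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * diff**2 / length_scale**2)

def sample_gp_prior(x, mean_fn, kernel_fn, n_samples=3, jitter=1e-6, seed=0):
    """Draw function values [f(x_1), ..., f(x_N)] ~ N(mu, K)."""
    rng = np.random.default_rng(seed)
    mu = mean_fn(x)                       # mu_i = m(x_i)
    K = kernel_fn(x, x)                   # K_ij = k(x_i, x_j)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    return mu[:, None] + L @ z            # shape (N, n_samples)

x = np.linspace(0.0, 5.0, 50)
samples = sample_gp_prior(x, mean_fn=np.zeros_like, kernel_fn=se_kernel)
```

Each column of `samples` is one draw of the function evaluated on the grid `x`; changing `length_scale` changes how quickly the draws vary.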


Gaussian Process Regression

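Consider a noisy model in which the observations $y$ are values of a Gaussian process corrupted by Gaussian noise of variance $\sigma_n^2$. Often, we wish to predict the function values $f_*$ at new test points $X_*$.

The joint distribution of the training observations $y$ and the test values $f_*$ is

\[
\begin{bmatrix} y \\ f_* \end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix},
\begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}
\right),
\]

from which the predictive distribution $p(f_* \mid X, y, X_*)$ follows by Gaussian conditioning.

The marginal likelihood is defined by

\[ p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df. \]

Model parameters (e.g., the length scale of the covariance function) can be optimized by maximizing the marginal likelihood.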
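Conditioning the joint Gaussian on $y$ gives a predictive mean $K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} y$ and covariance $K(X_*, X_*) - K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} K(X, X_*)$ when the prior mean is zero. The sketch below implements these two formulas and the log marginal likelihood. It assumes a zero mean, 1-D inputs, and an SE kernel; the toy data, the grid of length scales, and the function names are hypothetical, and the grid search merely stands in for a proper gradient-based optimizer.

```python
import numpy as np

def se_kernel(x1, x2, length_scale):
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * diff**2 / length_scale**2)

def gp_predict(X, y, X_star, length_scale, noise_var):
    """Posterior mean and covariance of f_* given (X, y), zero prior mean."""
    K = se_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    K_s = se_kernel(X, X_star, length_scale)           # K(X, X_*)
    K_ss = se_kernel(X_star, X_star, length_scale)     # K(X_*, X_*)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise)^{-1} y
    v = np.linalg.solve(L, K_s)
    return K_s.T @ alpha, K_ss - v.T @ v

def log_marginal_likelihood(X, y, length_scale, noise_var):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    K = se_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))               # equals -0.5 * log det K
            - 0.5 * len(X) * np.log(2.0 * np.pi))

# Synthetic toy data, for illustration only.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 6.0, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)
X_star = np.linspace(0.0, 6.0, 5)

# Crude grid search over the length scale (stand-in for an optimizer).
grid = [0.05, 0.2, 1.0, 5.0]
scores = [log_marginal_likelihood(X, y, ell, 0.01) for ell in grid]
best_ell = grid[int(np.argmax(scores))]
mean, cov = gp_predict(X, y, X_star, best_ell, 0.01)
```

The Cholesky factor is reused for both the linear solve and the log determinant, which is the usual way to keep these computations stable.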


Some Common Stationary Kernel Functions

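A stationary kernel is a function of $\tau = x - x'$.

Squared Exponential (SE) Kernel

\[ k_{SE}(x, x') = \exp\left( -\frac{1}{2\ell^2} \lVert x - x' \rVert^2 \right) \]

Common smoothing kernel; the length scale $\ell$ determines how quickly the function varies with $x$.

Matern (ME) Kernel (dof $\nu = 3/2$)

\[ k_{ME}(\tau) = a \left( 1 + \frac{\sqrt{3}\,\lvert\tau\rvert}{\ell} \right) \exp\left( -\frac{\sqrt{3}\,\lvert\tau\rvert}{\ell} \right) \]

Heavy-tailed kernel whose sample functions are $\lceil \nu \rceil - 1$ times mean-square differentiable. As $\nu \to \infty$, the ME kernel approaches the SE kernel.

Rational Quadratic (RQ) Kernel

\[ k_{RQ}(\tau) = \left( 1 + \frac{\tau^2}{2\alpha\ell^2} \right)^{-\alpha} \]

Derived as a scale mixture of SE kernels with different length scales.

Periodic (PE) Kernel

\[ k_{PE}(\tau) = \exp\left( -\frac{2}{\ell^2} \sin^2(\pi \tau \omega) \right) \]

Easily models functions that repeat themselves with frequency $\omega$.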
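The four kernels on this slide can be written as short functions of the lag $\tau$. This is a minimal sketch for scalar (1-D) lags; the function names and default parameter values are illustrative, not taken from the slides.

```python
import numpy as np

def k_se(tau, ell=1.0):
    """Squared exponential kernel."""
    return np.exp(-0.5 * tau**2 / ell**2)

def k_matern32(tau, ell=1.0, a=1.0):
    """Matern kernel with nu = 3/2; depends on the lag only through |tau|."""
    r = np.sqrt(3.0) * np.abs(tau) / ell
    return a * (1.0 + r) * np.exp(-r)

def k_rq(tau, ell=1.0, alpha=1.0):
    """Rational quadratic kernel."""
    return (1.0 + tau**2 / (2.0 * alpha * ell**2)) ** (-alpha)

def k_periodic(tau, ell=1.0, omega=1.0):
    """Periodic kernel; repeats every 1 / omega units of lag."""
    return np.exp(-2.0 * np.sin(np.pi * tau * omega) ** 2 / ell**2)

tau = np.linspace(-3.0, 3.0, 7)
values = {
    "SE": k_se(tau),
    "ME": k_matern32(tau),
    "RQ": k_rq(tau),
    "PE": k_periodic(tau, omega=0.5),
}
```

All four are symmetric in $\tau$ and, with $a = 1$, equal 1 at $\tau = 0$, so they can be compared directly as correlation functions.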


Difficulty in Learning Hidden Representations

Gaussian processes are often used as smoothing interpolators

Such smoothing devices cannot discover hidden features in data

Neural networks with GPs (e.g., [Damianou & Lawrence, 2012])

    Often designed to model specific structure
    Often use GPs with simple interpolating kernels
    Indirectly induce complicated kernels that have no closed form
    Difficult to interpret
    Require sophisticated approximate inference techniques

Composing together a few standard kernel functions

    Either specialized to the application or prone to over-fitting
    Difficult to interpret


Bochner’s Theorem

Theorem (Bochner)

A complex-valued function $k$ on $\mathbb{R}^P$ is the covariance function of a weakly stationary, mean-square continuous, complex-valued random process on $\mathbb{R}^P$ if and only if it can be represented as

\[ k(\tau) = \int_{\mathbb{R}^P} e^{2\pi i s^\top \tau}\, \psi(ds), \]

where $\psi$ is a positive finite measure.

Let $\psi$ have density $S(s)$, which we label the spectral density of $k$.

$k$ and $S$ are therefore Fourier duals:

\[ k(\tau) = \int S(s)\, e^{2\pi i s^\top \tau}\, ds, \qquad S(s) = \int k(\tau)\, e^{-2\pi i s^\top \tau}\, d\tau. \]
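The duality can be checked numerically for a kernel whose spectral density is known in closed form. The sketch below uses the 1-D SE kernel $k(\tau) = \exp(-\tau^2 / (2\ell^2))$, whose spectral density under this Fourier convention is $S(s) = \sqrt{2\pi}\,\ell\, \exp(-2\pi^2 \ell^2 s^2)$. The length scale, the grids, and the use of plain Riemann sums are arbitrary choices made for illustration.

```python
import numpy as np

ell = 0.7                                   # SE length scale (arbitrary)
tau = np.linspace(-20.0, 20.0, 8001)        # lag grid; k is ~0 at both ends
d_tau = tau[1] - tau[0]
k = np.exp(-0.5 * tau**2 / ell**2)          # SE kernel k(tau)

def spectral_density(s):
    """S(s) = int k(tau) exp(-2 pi i s tau) d tau, approximated by a Riemann sum."""
    return np.sum(k * np.exp(-2j * np.pi * s * tau)).real * d_tau

s_grid = np.array([0.0, 0.1, 0.3, 0.5])
S_numeric = np.array([spectral_density(s) for s in s_grid])
S_exact = np.sqrt(2.0 * np.pi) * ell * np.exp(-2.0 * np.pi**2 * ell**2 * s_grid**2)

# Inverse direction: recover k at one lag from the closed-form density.
s = np.linspace(-5.0, 5.0, 4001)
d_s = s[1] - s[0]
S = np.sqrt(2.0 * np.pi) * ell * np.exp(-2.0 * np.pi**2 * ell**2 * s**2)
tau0 = 0.5
k_back = np.sum(S * np.exp(2j * np.pi * s * tau0)).real * d_s
```

`S_numeric` should agree with `S_exact`, and `k_back` with the SE kernel evaluated at `tau0`, up to discretization error.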


Spectral Mixture Kernels

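Consider modeling the spectral density with a single Gaussian, symmetrized about the origin:

\[ \phi(s; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (s - \mu)^2 \right\}, \qquad S(s) = \frac{1}{2} \left[ \phi(s) + \phi(-s) \right]. \]

Taking the inverse Fourier transform of $S(s)$ gives the kernel:

\[ k(\tau) = \exp\{ -2\pi^2 \tau^2 \sigma^2 \} \cos(2\pi \tau \mu). \]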
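The intermediate step can be filled in as follows. This is a sketch that uses the Fourier convention of Bochner's theorem, the identity $\phi(-s; \mu, \sigma^2) = \phi(s; -\mu, \sigma^2)$, and the standard Gaussian characteristic function $\int \phi(s; \mu, \sigma^2)\, e^{i t s}\, ds = e^{i t \mu - \sigma^2 t^2 / 2}$ evaluated at $t = 2\pi\tau$.

```latex
\begin{align*}
k(\tau) &= \int S(s)\, e^{2\pi i s \tau}\, ds
         = \tfrac{1}{2} \int \phi(s; \mu, \sigma^2)\, e^{2\pi i s \tau}\, ds
         + \tfrac{1}{2} \int \phi(s; -\mu, \sigma^2)\, e^{2\pi i s \tau}\, ds \\
        &= \tfrac{1}{2}\, e^{-2\pi^2 \tau^2 \sigma^2}
           \left( e^{2\pi i \mu \tau} + e^{-2\pi i \mu \tau} \right) \\
        &= e^{-2\pi^2 \tau^2 \sigma^2} \cos(2\pi \tau \mu).
\end{align*}
```

Symmetrizing the density is what makes the kernel real-valued: the two complex exponentials combine into a cosine.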

Extending this to a scaled mixture of $Q$ Gaussians, we obtain

\[ k(\tau) = \sum_{q=1}^{Q} w_q \prod_{p=1}^{P} \exp\{ -2\pi^2 \tau_p^2 v_q^{(p)} \} \cos(2\pi \tau_p \mu_q^{(p)}), \]

where the $q$th component has mean $\mu_q = (\mu_q^{(1)}, \ldots, \mu_q^{(P)})$ and covariance matrix $M_q = \mathrm{diag}(v_q^{(1)}, \ldots, v_q^{(P)})$.
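The mixture formula maps onto a few lines of array code. The following is a minimal sketch of the SM kernel as a function of the lag; the function name `sm_kernel`, the array layout, and the example hyperparameters are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np

def sm_kernel(tau, weights, means, variances):
    """Spectral mixture kernel evaluated at a batch of lags.

    tau:       (N, P) array of lags tau = x - x'
    weights:   (Q,)   mixture weights w_q
    means:     (Q, P) spectral means mu_q^(p)
    variances: (Q, P) spectral variances v_q^(p)
    returns:   (N,)   kernel values k(tau)
    """
    tau = np.asarray(tau, dtype=float)
    if tau.ndim != 2:
        raise ValueError("tau must have shape (N, P)")
    t = tau[:, None, :]                                            # (N, 1, P)
    envelope = np.exp(-2.0 * np.pi**2 * t**2 * variances[None])    # (N, Q, P)
    carrier = np.cos(2.0 * np.pi * t * means[None])                # (N, Q, P)
    per_component = np.prod(envelope * carrier, axis=2)            # (N, Q)
    return per_component @ weights                                 # sum over q

# Hypothetical hyperparameters: Q = 2 components, P = 1 input dimension.
weights = np.array([1.0, 0.5])
means = np.array([[0.5], [2.0]])
variances = np.array([[0.01], [0.1]])

tau = np.linspace(0.0, 2.0, 5)[:, None]     # shape (5, 1)
k_vals = sm_kernel(tau, weights, means, variances)
```

At zero lag every exponential and cosine equals 1, so $k(0) = \sum_q w_q$; the weights therefore set the total signal variance.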


Interpreting the SM Kernel

\[ k(\tau) = \sum_{q=1}^{Q} w_q \prod_{p=1}^{P} \exp\{ -2\pi^2 \tau_p^2 v_q^{(p)} \} \cos(2\pi \tau_p \mu_q^{(p)}) \]

The inverse means $1/\mu_q$ represent the component periods.

The inverse standard deviations $1/\sqrt{v_q}$ represent length scales.
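As a small worked example of this reading, the sketch below converts spectral means and variances into periods and length scales. The hyperparameter values are hypothetical. The last line adds one detail not stated on the slide: matching $\exp\{-2\pi^2 \tau^2 v\}$ to the SE form $\exp\{-\tau^2 / (2\ell^2)\}$ gives $\ell = 1 / (2\pi\sqrt{v})$, so $1/\sqrt{v}$ is the length scale up to a constant factor of $2\pi$.

```python
import numpy as np

# Hypothetical 1-D hyperparameters for Q = 2 components.
means = np.array([1.0 / 12.0, 1.0])     # spectral means mu_q (cycles per unit of x)
variances = np.array([1e-4, 1e-2])      # spectral variances v_q

# Component periods 1 / mu_q (a component with mu_q = 0 has no finite period).
periods = 1.0 / means
# Inverse standard deviations 1 / sqrt(v_q), as on the slide.
inverse_std = 1.0 / np.sqrt(variances)
# Equivalent SE length scale: exp(-2 pi^2 tau^2 v) = exp(-tau^2 / (2 l^2)).
se_length_scales = inverse_std / (2.0 * np.pi)
```

With these values, the first component repeats every 12 units of $x$ and decorrelates slowly, while the second repeats every unit and decorrelates ten times faster.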


Mauna Loa CO2 Concentrations

Figure: Mauna Loa CO2 Concentrations. a) Forecasting CO2. Blue - training data; green - testing data; black - SM kernel; cyan - Matern kernel; dashed red - squared exponential kernel; magenta - rational quadratic kernel; orange - periodic kernel. b) Log spectral density comparing SE (red) and SM (black) kernels.


Recovering Covariance Functions

Figure: Recovering correlation functions (normalized kernels). a) Matern kernel, b) sum of RQ and periodic kernels. True correlation function is in green. The SM, SE, and empirical correlation functions are in dashed black, red, and magenta, respectively.


Negative Covariances

400 points are sampled from an AR(1) discrete-time GP:

\[ y(x + 1) = -e^{-0.01}\, y(x) + \sigma \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}(0, 1), \]

with corresponding kernel

\[ k(x, x') = \frac{\sigma^2 \left( -e^{-0.01} \right)^{\lvert x - x' \rvert}}{1 - e^{-0.02}}. \]

Figure: Negative Covariances. a) Observations of the AR series with negative covariances. b) SM kernel in black, while the true kernel is in green. c) Spectral density of the SM kernel.
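The AR(1) recursion and its kernel can be reproduced in a few lines. In this sketch the value $\sigma = 1$, the random seed, and the choice to start the series from its stationary distribution are assumptions; the slide gives none of them.

```python
import numpy as np

A = -np.exp(-0.01)      # AR coefficient from the slide
SIGMA = 1.0             # noise scale sigma (assumed; not given on the slide)

def ar1_kernel(lag, sigma=SIGMA):
    """k(x, x') = sigma^2 (-e^{-0.01})^{|x - x'|} / (1 - e^{-0.02}) for integer lags."""
    lag = np.abs(np.asarray(lag, dtype=int))
    return sigma**2 * A**lag / (1.0 - np.exp(-0.02))

def simulate_ar1(n=400, sigma=SIGMA, seed=0):
    """Simulate y(x + 1) = A y(x) + sigma eps(x), started at stationarity."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    y[0] = np.sqrt(ar1_kernel(0, sigma)) * rng.standard_normal()
    for t in range(n - 1):
        y[t + 1] = A * y[t] + sigma * rng.standard_normal()
    return y

y = simulate_ar1()
lags = np.arange(4)
k_true = ar1_kernel(lags)   # alternates in sign: positive at even lags, negative at odd lags
```

Because the coefficient is negative, the covariance flips sign at every odd lag, which is the behaviour that kernels such as SE, RQ, and Matern, being non-negative everywhere, cannot represent.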


Recovering the Sinc Pattern

A pattern was created according to

\[ y(x) = \mathrm{sinc}(x + 10) + \mathrm{sinc}(x) + \mathrm{sinc}(x - 10), \]

and the data must be reconstructed for $x \in [-4.5, 4.5]$.

Figure: a) Observed data. b) Training data in blue; testing data in green; SM kernel in dashed black; SE, ME, RQ, and PE in red, cyan, magenta, and orange, respectively. c) Correlation function for SM kernel (black) and ME kernel (red). d) Spectral densities of the SM kernel (black) and SE kernel (red).
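The data set for this experiment can be sketched as follows. The slide does not state the sinc convention, the input range, or the number of points, so the normalized sinc $\sin(\pi x) / (\pi x)$ (which is what `np.sinc` computes), the range $[-15, 15]$, and the grid size are all assumptions here.

```python
import numpy as np

def pattern(x):
    """y(x) = sinc(x + 10) + sinc(x) + sinc(x - 10), using the normalized sinc (assumed)."""
    return np.sinc(x + 10.0) + np.sinc(x) + np.sinc(x - 10.0)

x = np.linspace(-15.0, 15.0, 1001)      # hypothetical input grid
y = pattern(x)

held_out = np.abs(x) <= 4.5             # region to be reconstructed
x_train, y_train = x[~held_out], y[~held_out]
x_test, y_test = x[held_out], y[held_out]
```

The training set contains only the two outer sinc bumps; a model must infer from them that a third bump belongs in the held-out middle region.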


Predicting Airline Passenger Numbers

Figure: Monthly airline passenger data. a) Training data in blue; testing data in green; SM kernel in black; SE, ME, RQ, and PE kernels in red, cyan, magenta, and orange, respectively. b) Spectral densities of the SM and SE kernels in black and red, respectively.


Performance Comparison

Figure: Test performance of the proposed spectral mixture (SM) kernel compared with squared exponential (SE), Matern (ME), rational quadratic (RQ), and periodic (PE) kernels. The SM kernel consistently has the lowest MSE and highest log likelihood.


Questions?
