MLSS 2012: Gaussian Processes for Machine Learning


Page 1: MLSS 2012: Gaussian Processes for Machine Learning

John P. Cunningham
Washington University
University of Cambridge

April 18, 2012

Page 2: Outline

Gaussian Process Basics
  Gaussians in words and pictures
  Gaussians in equations
  Using Gaussian Processes

Beyond Basics
  Kernel choices
  Likelihood choices
  Shortcomings of GP
  Connections

Conclusions & References

Page 3: What is a Gaussian (for machine learning)?

◮ A handy tool for Bayesian inference on real-valued variables:

[Figure: probability density over Heart Rate at 0700 (bps)]


Page 6: What is a Gaussian (for machine learning)?

◮ A handy tool for Bayesian inference on real-valued variables:

[Figure: the same density, labeled as the prior p(f), with observations y1 (Monday), y2 (Tuesday), y3 (Wednesday), y4 (Thursday) marked]

Page 7: What is a Gaussian (for machine learning)?

◮ A handy tool for Bayesian inference on real-valued variables:

[Figure: prior p(f), observations y1 through y4 (Monday through Thursday), and the updated posterior p(f|y)]

Page 8: From univariate to multivariate Gaussians

[Figure: univariate probability density over Heart Rate at 0700 (bps)]

Page 9: From univariate to multivariate Gaussians

[Figure: bivariate density over Heart Rate at 0700 vs. Heart Rate at 0800 (bps)]


Page 12: From univariate to multivariate Gaussians

[Figure: bivariate density over Heart Rates at 0700 and 0800 (bps), labeled as the prior p(f), with observations y1 (Monday) through y4 (Thursday)]

Page 13: From univariate to multivariate Gaussians

[Figure: prior p(f), observations y1 through y4, and posterior p(f|y) in the bivariate heart-rate space]

Page 14: From multivariate Gaussians to Gaussian Processes

[Figure: Heart Rate (bps) vs. Time of Day, 0600 to 1200]


Page 18: From multivariate Gaussians to Gaussian Processes

[Figure: Heart Rate (bps) vs. Time of Day, with daily traces y1 (Monday) through y4 (Thursday)]

Page 19: Our representation of a GP distribution

[Figure: Heart Rate (bps) vs. Time of Day, showing the prior p(f) with observations y1 (Monday) through y4 (Thursday)]

Page 20: We can take measurements less rigidly

[Figure: Heart Rate (bps) vs. Time of Day, with measurements at irregular times]


Page 22: Updating the posterior

[Figure: Heart Rate (bps) vs. Time of Day]


Page 25: An intuitive summary

◮ Univariate Gaussians: distributions over single real-valued variables
◮ Multivariate Gaussians: distributions over {pairs, triplets, ...} of real-valued variables
◮ Gaussian Processes: distributions over functions of (infinitely many) real-valued variables → regression

Page 26: Regression: a few reminders

◮ denoising/smoothing

[Figure: Heart Rate (bps) vs. Time of Day]

Page 27: Regression: a few reminders

◮ denoising/smoothing
◮ prediction/forecasting

[Figure: Heart Rate (bps) vs. Time of Day]


Page 29: Regression: a few reminders

◮ denoising/smoothing
◮ prediction/forecasting
◮ dangers of parametric models

[Figure: Heart Rate (bps) vs. Time of Day]

Page 30: Regression: a few reminders

◮ denoising/smoothing
◮ prediction/forecasting
◮ dangers of parametric models
◮ dangers of overfitting/underfitting

[Figure: Heart Rate (bps) vs. Time of Day]



Page 33: Review: multivariate Gaussian

◮ f ∈ R^n is normally distributed if

  p(f) = (2π)^{-n/2} |K|^{-1/2} exp{ -(1/2)(f - m)^T K^{-1} (f - m) }

◮ for mean vector m ∈ R^n and positive semidefinite covariance matrix K ∈ R^{n×n}
◮ shorthand: f ∼ N(m, K)
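
To make the notation concrete, here is a minimal NumPy sketch (function and variable names are illustrative, not from the deck) that evaluates this density and checks it against scipy.stats:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(f, m, K):
    """p(f) = (2*pi)^{-n/2} |K|^{-1/2} exp{-0.5 (f-m)^T K^{-1} (f-m)}."""
    n = len(m)
    diff = f - m
    quad = diff @ np.linalg.solve(K, diff)   # (f-m)^T K^{-1} (f-m)
    _, logdet = np.linalg.slogdet(K)         # log|K|, numerically stable
    return np.exp(-0.5 * (n * np.log(2 * np.pi) + logdet + quad))

m = np.array([60.0, 70.0])
K = np.array([[49.0, 29.7], [29.7, 49.0]])
f = np.array([65.0, 72.0])
print(gaussian_density(f, m, K))
print(multivariate_normal.pdf(f, mean=m, cov=K))  # should agree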

Page 34: Definition: Gaussian Process

◮ Loosely, a multivariate Gaussian of uncountably infinite length... a really long vector ≈ a function
◮ f is a Gaussian process if f(t) = [f(t1), ..., f(tn)]^T has a multivariate normal distribution for all t = [t1, ..., tn]^T:

  f(t) ∼ N(m(t), K(t, t))

◮ (t ∈ R here for familiarity with regression in time, but the domain can be x ∈ R^D)
◮ What are m(t) and K(t, t)?

Page 35: Definition: Gaussian Process

Mean function m(t):
◮ any function m : R → R (or m : R^D → R)
◮ very often m(t) = 0 ∀ t (mean-subtract your data)

Kernel (covariance) function:
◮ any valid Mercer kernel k : R^D × R^D → R
◮ Mercer's theorem: every matrix K(t, t) = {k(ti, tj)}_{i,j=1...n} is a positive semidefinite (covariance) matrix ∀ t:

  v^T K(t, t) v = Σ_{i=1}^n Σ_{j=1}^n K_ij v_i v_j = Σ_{i=1}^n Σ_{j=1}^n k(ti, tj) v_i v_j ≥ 0

Page 36: Definition: Gaussian Process

A GP is fully defined by:
◮ a mean function m(·) and a kernel (covariance) function k(·, ·)
◮ the requirement that every finite subset of the domain t has a multivariate normal f(t) ∼ N(m(t), K(t, t))

Notes
◮ that such an object should exist is not trivial!
◮ most interesting properties are inherited
◮ Kernel function...

Page 37: Kernel Function

Example kernel (squared exponential, or SE):

  k(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }

From kernel to covariance matrix
◮ Choose some hyperparameters: σ_f = 7, ℓ = 100

  t = [0700, 0800, 1029]^T

  K(t, t) = {k(ti, tj)}_{i,j} =
    [ 49.0  29.7  00.2 ]
    [ 29.7  49.0  03.6 ]
    [ 00.2  03.6  49.0 ]

Page 38: Kernel Function

◮ Same SE kernel and t; choose σ_f = 7, ℓ = 500:

  K(t, t) = {k(ti, tj)}_{i,j} =
    [ 49.0  48.0  39.5 ]
    [ 48.0  49.0  44.1 ]
    [ 39.5  44.1  49.0 ]

Page 39: Kernel Function

◮ Same SE kernel and t; choose σ_f = 7, ℓ = 50:

  K(t, t) = {k(ti, tj)}_{i,j} =
    [ 49.0  06.6  00.0 ]
    [ 06.6  49.0  00.0 ]
    [ 00.0  00.0  49.0 ]

Page 40: Kernel Function

◮ Same SE kernel and t; choose σ_f = 14, ℓ = 50:

  K(t, t) = {k(ti, tj)}_{i,j} =
    [ 196   26.5  00.0 ]
    [ 26.5  196   0.01 ]
    [ 00.0  0.01  196  ]
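
These matrices are cheap to reproduce; a minimal NumPy sketch (helper names are mine, not the deck's):

import numpy as np

def se_kernel_matrix(t, sigma_f, ell):
    """K(t, t) with k(ti, tj) = sigma_f^2 exp{-(ti - tj)^2 / (2 ell^2)}."""
    d = t[:, None] - t[None, :]
    return sigma_f**2 * np.exp(-d**2 / (2 * ell**2))

t = np.array([700.0, 800.0, 1029.0])
for sigma_f, ell in [(7, 100), (7, 500), (7, 50), (14, 50)]:
    print(sigma_f, ell)
    print(np.round(se_kernel_matrix(t, sigma_f, ell), 1))  # compare with the matrices above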

Page 41: Intuitive summary of GP so far

◮ GPs offer distributions over functions (infinite collections of jointly Gaussian variables)
◮ For any finite subset vector t, we have a normal distribution:

  f(t) ∼ N(0, K(t, t))

◮ where the covariance matrix K is calculated by plugging t into the kernel k(·, ·)
◮ New notation: f ∼ GP(m(·), k(·, ·)), or f ∼ GP(m, k)

Page 42: Important Gaussian properties (for today's purposes)

◮ additivity (forming a joint)
◮ conditioning (inference)
◮ expectations (posterior and predictive moments)
◮ marginalisation (marginal likelihood / model selection)
◮ ...

Page 43: Additivity (joint)

◮ prior (or latent) f ∼ N(m_f, K_ff)
◮ additive iid noise n ∼ N(0, σ_n² I)
◮ let y = f + n; then:

  p(y, f) = p(y|f) p(f) = N( [f; y] ; [m_f; m_y] , [ K_ff  K_fy ; K_fy^T  K_yy ] )

◮ where (in this case):

  K_fy = E[(f - m_f)(y - m_y)^T] = K_ff        K_yy = K_ff + σ_n² I

◮ latent f and noisy observation y are jointly Gaussian

Page 44: Where did the GP go?

◮ prior (or latent) f ∼ N(m_f, K_ff)
◮ additive iid noise n ∼ N(0, σ_n² I)
◮ let y = f + n; then:

  p(y, f) = p(y|f) p(f) = N( [f; y] ; [m_f; m_y] , [ K_ff  K_fy ; K_fy^T  K_yy ] )

◮ If f and y are indexed by some input points t:

  m_f = [m_f(t1), ..., m_f(tn)]^T        K_ff = {k(ti, tj)}_{i,j=1...n}    ...

Page 45: Where did the GP go?

◮ prior (or latent) f ∼ GP(m_f, k_ff)
◮ additive iid noise n ∼ GP(0, σ_n² δ)
◮ let y = f + n; then:

  p(y(t), f(t)) = p(y|f) p(f) = N( [f; y] ; [m_f; m_y] , [ K_ff  K_fy ; K_fy^T  K_yy ] )

◮ If f and y are indexed by some input points t:

  m_f = [m_f(t1), ..., m_f(tn)]^T        K_ff = {k(ti, tj)}_{i,j=1...n}    ...

◮ warning: overloaded notation - f can be infinite (GP) or finite (MVN) depending on context

Page 46: Conditioning (inference)

◮ If f and y are jointly Gaussian:

  [f; y] ∼ N( [m_f; m_y] , [ K_ff  K_fy ; K_fy^T  K_yy ] )

◮ Then:

  f | y ∼ N( K_fy K_yy^{-1} (y - m_y) + m_f ,  K_ff - K_fy K_yy^{-1} K_fy^T )

◮ inference of the latent given data is simple linear algebra:

  p(f|y) = p(y|f) p(f) / p(y)
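
As a concrete illustration, here is a minimal NumPy sketch of exactly this conditioning step (my own illustrative helper, not code from the deck; the observation inputs 204 and 90 echo the example used later, the y values are made up):

import numpy as np

def se_kernel(a, b, sigma_f, ell):
    # Squared exponential kernel, vectorized over input grids a and b.
    return sigma_f**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * ell**2))

def condition(m_f, K_ff, K_fy, m_y, K_yy, y):
    """f | y ~ N( K_fy K_yy^{-1}(y - m_y) + m_f,  K_ff - K_fy K_yy^{-1} K_fy^T )."""
    A = np.linalg.solve(K_yy, y - m_y)          # K_yy^{-1} (y - m_y)
    mean = K_fy @ A + m_f
    cov = K_ff - K_fy @ np.linalg.solve(K_yy, K_fy.T)
    return mean, cov

# Latent f on a grid; y observed at a few inputs with noise variance sigma_n^2.
t = np.linspace(0, 500, 101)
t_obs = np.array([204.0, 90.0])
y = np.array([5.0, -3.0])
sigma_f, ell, sigma_n = 10.0, 50.0, 1.0
K_ff = se_kernel(t, t, sigma_f, ell)
K_fy = se_kernel(t, t_obs, sigma_f, ell)
K_yy = se_kernel(t_obs, t_obs, sigma_f, ell) + sigma_n**2 * np.eye(len(t_obs))
mean, cov = condition(np.zeros_like(t), K_ff, K_fy, np.zeros_like(y), K_yy, y)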

Page 47: Expectation (posterior and predictive moments)

◮ Conditioning on data gave us:

  f | y ∼ N( K_fy K_yy^{-1} (y - m_y) + m_f ,  K_ff - K_fy K_yy^{-1} K_fy^T )

◮ then E[f|y] = K_fy K_yy^{-1} (y - m_y) + m_f  (MAP, posterior mean, ...)
◮ Predict data observations y∗:

  [y; y∗] ∼ N( [m_y; m_y∗] , [ K_yy  K_y∗y ; K_y∗y^T  K_y∗y∗ ] )

◮ no different:

  y∗ | y ∼ N( K_y∗y K_yy^{-1} (y - m_y) + m_y∗ ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

Page 48: Marginalisation (marginal likelihood and model selection)

◮ Again, if:

  [f; y] ∼ N( [m_f; m_y] , [ K_ff  K_fy ; K_fy^T  K_yy ] )

◮ we can marginalise out the latent:

  p(y) = ∫ p(y|f) p(f) df   ↔   y ∼ N(m_y, K_yy)

◮ the marginal likelihood of the data (or log p(y), the data log-likelihood)
◮ In the GP context, this is actually p(y|θ) = p(y|σ_f, σ_n, ℓ). This can be the basis of model selection.

Page 49: Complaint

◮ I'm bored. All we are doing is messing around with Gaussians.
◮ Correct! (sorry)
◮ This is the whole point.
◮ We can do some remarkable things...


Page 51: Our example model

◮ f ∼ GP(0, k_ff), where k_ff(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }
◮ y | f ∼ GP(f, k_nn), where k_nn(ti, tj) = σ_n² δ(ti - tj)
◮ y ∼ GP(0, k_yy), where k_yy(ti, tj) = k_ff(ti, tj) + k_nn(ti, tj)
◮ We choose σ_f = 10, ℓ = 50, σ_n = 1
◮ The prior on f:

[Figure: output f vs. input t ∈ [0, 500]]

Page 52: Our example model

◮ Same model; a draw from f:

[Figure: one sampled function, output f vs. input t]

Page 53: Drawing from the prior

◮ These steps should be clear:

[Figure: sampled function, output f vs. input t]

◮ Take n (many, but finite!) points ti ∈ [0, 500]
◮ Evaluate K_ff = {k_ff(ti, tj)}
◮ Draw from f ∼ N(0, K_ff)
◮ ( f = chol(K)' * randn(n,1) in MATLAB )
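
The deck's one-liner is MATLAB; here is a minimal NumPy equivalent (a sketch of my own, with a small jitter added to keep the Cholesky factorization numerically stable, which the slide omits):

import numpy as np

def sample_gp_prior(t, sigma_f, ell, n_draws=1, jitter=1e-8):
    """Draw functions from f ~ N(0, K_ff) with an SE kernel on inputs t."""
    K = sigma_f**2 * np.exp(-(t[:, None] - t[None, :])**2 / (2 * ell**2))
    L = np.linalg.cholesky(K + jitter * np.eye(len(t)))  # K = L L^T
    return L @ np.random.randn(len(t), n_draws)          # each column is one draw

t = np.linspace(0, 500, 200)
draws = sample_gp_prior(t, sigma_f=10.0, ell=50.0, n_draws=4)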

Page 54: Draw a few more

◮ four draws from f:

[Figure: four sampled functions, output f vs. input t]

Page 55: Impact of hyperparameters

◮ σ_f = 10, ℓ = 50

[Figure: draws from the prior, output f vs. input t]

Page 56: Impact of hyperparameters

◮ σ_f = 4, ℓ = 50

[Figure: draws from the prior, output f vs. input t]

Page 57: Impact of hyperparameters

◮ σ_f = 4, ℓ = 10

[Figure: draws from the prior, output f vs. input t]

Page 58: Multidimensional input

◮ just make each input x ∈ R^D (here D = 2, e.g. latitude and longitude)
◮ f ∼ GP(0, k_ff), where

  k_ff(x^(i), x^(j)) = σ_f² exp{ -Σ_d (x_d^(i) - x_d^(j))² / (2ℓ_d²) }

◮ (a multidimensional GP in action: Cunningham, Rasmussen, Ghahramani (2012) AISTATS)
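
A sketch of this kernel with per-dimension lengthscales (the function name is mine; this form is often called ARD, automatic relevance determination):

import numpy as np

def se_ard_kernel(X, Z, sigma_f, ells):
    """k(x, z) = sigma_f^2 exp{-sum_d (x_d - z_d)^2 / (2 ell_d^2)} for rows of X, Z."""
    d2 = ((X[:, None, :] - Z[None, :, :])**2 / (2 * np.asarray(ells)**2)).sum(-1)
    return sigma_f**2 * np.exp(-d2)

X = np.random.rand(5, 2)                   # e.g. (lat, long) pairs
K = se_ard_kernel(X, X, sigma_f=10.0, ells=[0.3, 0.1])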

Page 59: Observations

Same model; we will now gather data yi.

◮ f ∼ GP(0, k_ff), where k_ff(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }
◮ y | f ∼ GP(f, k_nn), where k_nn(ti, tj) = σ_n² δ(ti - tj)
◮ y ∼ GP(0, k_yy), where k_yy(ti, tj) = k_ff(ti, tj) + k_nn(ti, tj)
◮ We choose σ_f = 10, ℓ = 50, σ_n = 1

Page 60: Observations

◮ the GP prior p(f)

[Figure: prior, output f vs. input t]

Page 61: Observations

◮ Observe a single point at t = 204:

  y(204) ∼ N(0, k_yy(204, 204)) = N(0, σ_f² + σ_n²)

[Figure: prior with one observation marked, output f vs. input t]

Page 62: Observations

◮ Use conditioning to update the posterior:

  f | y(204) ∼ N( K_fy K_yy^{-1} (y(204) - m_y) ,  K_ff - K_fy K_yy^{-1} K_fy^T )

[Figure: posterior after one observation, output f vs. input t]


Page 64: Observations

◮ ... and the predictive distribution:

  y∗ | y(204) ∼ N( K_y∗y K_yy^{-1} (y(204) - m_y) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

[Figure: predictive distribution, output f vs. input t]

Page 65: Observations

◮ More observations (data vector y):

  y∗ | y([204; 90]) ∼ N( K_y∗y K_yy^{-1} ( y([204; 90]) - m_y ) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

[Figure: predictive distribution after two observations, output f vs. input t]


Page 67: Observations

◮ More observations (data vector y):

  y∗ | y ∼ N( K_y∗y K_yy^{-1} (y - m_y) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

[Figure: predictive distribution with many observations, output f vs. input t]


Page 69: Nonparametric Regression

◮ GPs let the data speak for itself... but all the data must speak:

  y∗ | y ∼ N( K_y∗y K_yy^{-1} (y - m_y) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

◮ "nonparametric models have an infinite number of parameters"

Page 70: Nonparametric Regression

◮ GPs let the data speak for itself... but all the data must speak:

  y∗ | y ∼ N( K_y∗y K_yy^{-1} (y - m_y) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

◮ "nonparametric models have an infinite number of parameters"
◮ "nonparametric models have a finite but unbounded number of parameters that grows with data"

Page 71: Almost through the basics...

(Outline recap: Gaussian Process Basics; Beyond Basics; Conclusions & References.)

Page 72: Model Selection / Hyperparameter Learning

◮ f ∼ GP(0, k_ff), where k_ff(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }

ℓ = 50: just right
[Figure: fit, output f vs. input t]

ℓ = 15: overfitting
[Figure: fit, output f vs. input t]

ℓ = 250: underfitting
[Figure: fit, output f vs. input t]

Page 73: Model Selection (1): Marginal Likelihood

◮ p(y) = N(0, K_yy)  →  p(y | σ_f, σ_n, ℓ) = N(0, K_yy(σ_f, σ_n, ℓ))
◮ not obvious why this should not over- (or under-) fit, but it's in the math...

  log p(y | σ_f, σ_n, ℓ) = -(1/2) y^T K_yy^{-1} y - (1/2) log |K_yy| - (n/2) log(2π)

◮ "Occam's Razor" implemented via regularization / Bayesian model selection
◮ Details to be fleshed out in the practical...
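
A minimal sketch of this objective (helper names are mine, not the deck's); in practice one would differentiate it with respect to (σ_f, σ_n, ℓ) and optimize, e.g. with scipy.optimize:

import numpy as np

def log_marginal_likelihood(t, y, sigma_f, sigma_n, ell):
    """log p(y|sigma_f, sigma_n, ell) = -0.5 y^T Kyy^{-1} y - 0.5 log|Kyy| - (n/2) log 2 pi."""
    n = len(t)
    K_yy = sigma_f**2 * np.exp(-(t[:, None] - t[None, :])**2 / (2 * ell**2)) \
           + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(K_yy)                   # stable solve and log-determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # = Kyy^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()             # 0.5 log|Kyy| = sum_i log L_ii
            - 0.5 * n * np.log(2 * np.pi))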

Page 74: Model Selection (2): Cross Validation

◮ Can also consider the predictive distribution for some held-out data:

  PL(σ_f, σ_n, ℓ) = log p(y_test | y_train, σ_f, σ_n, ℓ)

◮ Again a Gaussian.
◮ Again we can take derivatives and tune model hyperparameters.
◮ Details to be fleshed out in the practical...


Page 76: More details

(chapters in Rasmussen and Williams, Gaussian Processes for Machine Learning)
◮ GP basics: Appendix A, Ch 1
◮ Regression: Ch 2
◮ Kernels: Ch 4
◮ Model Selection: Ch 5

Page 77: What's next?

◮ Revisit the model and see what can be hacked:

  f ∼ GP(0, k_ff), where k_ff(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }
  yi | fi ∼ N(fi, σ_n²)


Page 81: What's next?

◮ Revisit the model and see what can be hacked:

  f ∼ GP(0, k_ff), where k_ff(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }
  yi | fi ∼ N(fi, σ_n²)

◮ Option 1: hyperparameters → model selection.
◮ Option 2: functional form of k_ff → kernel choices.
◮ Option 3: the GP?
◮ Option 4: the data distribution → likelihood choices.



Page 84: What the kernel is doing (SE)

  k(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }

[Figure: draws from an SE-kernel GP, output f vs. input t]

Page 85: Rational Quadratic

  k(ti, tj) = σ_f² ( 1 + (ti - tj)² / (2αℓ²) )^{-α}
            ∝ σ_f² ∫ z^{α-1} exp(-αz/β) exp( -z (ti - tj)² / 2 ) dz

(i.e. a scale mixture of SE kernels)

[Figure: draws from an RQ-kernel GP, output f vs. input t]

Page 86: Periodic

  k(ti, tj) = σ_f² exp{ -(2/ℓ²) sin²( (π/p) |ti - tj| ) }

[Figure: draws from a periodic-kernel GP, output f vs. input t]

Page 87: From Stationary to Nonstationary Kernels

◮ stationary: k(ti, tj) = k(ti - tj) = k(τ)
◮ e.g. k(ti, tj) = σ_f² exp{ -(ti - tj)² / (2ℓ²) }

[Figure: draws from a stationary GP, output f vs. input t]

Page 88: Wiener Process

◮ k(ti, tj) = min(ti, tj)
◮ Still a GP

[Figure: draws, output f vs. input t]

Page 89: Wiener Process

◮ k(ti, tj) = min(ti, tj)
◮ Draws from a nonstationary GP

[Figure: Wiener-process draws, output f vs. input t]

Page 90: Linear Regression...

◮ f(t) = w t with w ∼ N(0, 1)
◮ k(ti, tj) = E[f(ti) f(tj)] = ti tj

[Figure: linear draws through the origin, output f vs. input t]

Page 91: Build your own kernel (1): Operations

(a code sketch of these operations follows after this list)

◮ Linear combinations: k(ti, tj) = α k1(ti, tj) + β k2(ti, tj)  (for α, β ≥ 0)
  or k(x^(i), x^(j)) = k_a(x_1^(i), x_1^(j)) + k_b(x_2^(i), x_2^(j))
◮ Products: k(ti, tj) = k1(ti, tj) k2(ti, tj)
◮ Integration: z(t) = ∫ g(u, t) f(u) du  ↔  k_z(ti, tj) = ∫∫ g(u, ti) k_f(u, v) g(v, tj) du dv
◮ Differentiation: z(t) = ∂f(t)/∂t  ↔  k_z(ti, tj) = ∂² k_f(ti, tj) / ∂ti ∂tj
◮ Warping: z(t) = f(h(t))  ↔  k_z(ti, tj) = k_f(h(ti), h(tj))
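
A minimal sketch of the sum, product, and warping rules (illustrative helpers of my own, not deck code); each construction returns another valid kernel function:

import numpy as np

def se(ti, tj, sigma_f=1.0, ell=50.0):
    return sigma_f**2 * np.exp(-(ti - tj)**2 / (2 * ell**2))

def periodic(ti, tj, sigma_f=1.0, ell=1.0, p=100.0):
    return sigma_f**2 * np.exp(-2 * np.sin(np.pi * np.abs(ti - tj) / p)**2 / ell**2)

def k_sum(k1, k2, alpha=1.0, beta=1.0):
    return lambda ti, tj: alpha * k1(ti, tj) + beta * k2(ti, tj)   # alpha, beta >= 0

def k_prod(k1, k2):
    return lambda ti, tj: k1(ti, tj) * k2(ti, tj)

def k_warp(k, h):
    return lambda ti, tj: k(h(ti), h(tj))                          # z(t) = f(h(t))

# e.g. a locally periodic kernel: an SE envelope times a periodic kernel
k = k_prod(se, periodic)
t = np.linspace(0, 500, 5)
K = np.array([[k(a, b) for b in t] for a in t])  # still a positive semidefinite matrix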

Page 92: Preserves joint Gaussianity (mostly)!

◮ (the same operations as Page 91: linear combinations, products, integration, differentiation, and warping)

Page 93: Build your own kernel (2): frequency domain

◮ a stationary kernel k(ti, tj) = k(ti - tj) = k(τ) is positive semidefinite (satisfies Mercer) iff:

  S(ω) = F{k}(ω) ≥ 0   ∀ ω

◮ S is the power spectral density (Wiener-Khinchin, ...)
◮ Note k(0) = ∫ S(ω) dω

Page 94: Kernel Summary

◮ a GP gives a distribution over functions...
◮ the kernel determines the type of functions
◮ can/should be tailored to the application
◮ toward a GP toolbox


Page 96: Data up to now

◮ continuous regression made sense
◮ data likelihood model: yi | fi ∼ N(fi, σ_n²)

[Figure: observed data y vs. input t]

Page 97: Binary label data

◮ Classification (not regression) setting
◮ yi | fi ∼ N(fi, σ_n²) is inappropriate

[Figure: binary observed data y ∈ {-1, +1} vs. input t]

Page 98: GP Classification

◮ Probit or logistic "regression" model on yi ∈ {-1, +1}:

  p(yi | fi) = φ(yi fi) = 1 / (1 + exp(-yi fi))

◮ Warps f onto the [0, 1] interval

[Figure: warped GP value φ(f) vs. GP value f]

Page 99: GP Classification

◮ Probit or logistic "regression" model on yi ∈ {-1, +1}:

  p(yi | fi) = φ(yi fi) = 1 / (1 + exp(-yi fi))

◮ Warps f onto the [0, 1] interval

[Figures: latent GP value f vs. input t; mapped GP value φ(f) vs. input t]

Page 100: What we want to calculate

◮ predictive distribution:

  p(y∗ | y) = ∫ p(y∗ | f∗) p(f∗ | y) df∗

◮ predictive posterior:

  p(f∗ | y) = ∫ p(f∗ | f) p(f | y) df

◮ data posterior:

  p(f | y) = Π_i p(yi | fi) p(f) / p(y)

◮ None of which is tractable to compute

Page 101: However...

◮ predictive distribution:

  p(y∗ | y) = ∫ p(y∗ | f∗) q(f∗ | y) df∗

◮ predictive posterior:

  q(f∗ | y) = ∫ p(f∗ | f) q(f | y) df

◮ data posterior:

  q(f | y) ≈ p(f | y) = Π_i p(yi | fi) p(f) / p(y)

◮ If q is Gaussian, these are tractable to compute

Page 102: Approximate Inference

◮ Methods for producing a Gaussian q(f | y) ≈ p(f | y)
◮ Laplace Approximation, Expectation Propagation, Variational Inference
◮ Technologies within a GP method
◮ Subject of much research; often work well
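
To make the Laplace approximation concrete, here is a minimal sketch of my own (not deck code), assuming the logistic likelihood of Page 98 with labels yi ∈ {-1, +1}: it finds the posterior mode by Newton's method. Rasmussen and Williams (Ch 3) give a more numerically stable formulation using W^{1/2}.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def laplace_mode(K, y, n_iter=25):
    """Newton's method for the mode of p(f|y) ∝ prod_i sigmoid(y_i f_i) * N(f; 0, K).
    Returns the mode f_hat and W, the negative diagonal Hessian of log p(y|f) there,
    so the Gaussian approximation is q(f|y) = N(f_hat, (K^{-1} + W)^{-1})."""
    n = len(y)
    f = np.zeros(n)
    t = (y + 1) / 2.0                          # map labels {-1,+1} -> {0,1}
    for _ in range(n_iter):
        pi = sigmoid(f)
        grad = t - pi                          # gradient of log p(y|f)
        W = pi * (1 - pi)                      # negative Hessian (diagonal)
        # Newton step: f <- (K^{-1} + W)^{-1}(W f + grad) = K (I + W K)^{-1}(W f + grad)
        b = W * f + grad
        f = K @ np.linalg.solve(np.eye(n) + W[:, None] * K, b)
    return f, W

With the mode f_hat and W in hand, q(f | y) = N(f_hat, (K^{-1} + W)^{-1}) plugs directly into the predictive integrals on the previous page.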

Page 103: Using Approximate Inference

◮ Allows "regression" on the [0, 1] interval

[Figure: mapped GP value φ(f) vs. input t]


Page 105: What about SVM?

◮ illustrates the flexibility of GP
◮ draws an interesting connection
◮ GP joint:

  -log p(y, f) = (1/2) f^T K_ff^{-1} f - Σ_i log p(yi | fi)

◮ SVM loss (for fi = f(xi) = w^T xi):

  ℓ(w) = (1/2) w^T w + C Σ_i (1 - yi fi)_+

Page 106: SVM

◮ illustrates the flexibility of GP
◮ draws an interesting connection
◮ GP joint:

  -log p(y, f) = (1/2) f^T K^{-1} f - Σ_i log p(yi | fi)

◮ SVM loss (for fi = f(xi), with feature map φ(xi) = k(·, xi)):

  ℓ(f) = (1/2) f^T K^{-1} f + C Σ_i (1 - yi fi)_+

◮ (more reading: Seeger (2002), Relationship between GP, SVM, Splines)


Page 108: Our [weakness] grows out of our [strength]

Everything is happily Gaussian
◮ ... except when it isn't.
◮ non-normal likelihood → approximate inference
◮ model selection p(y | {σ_f, σ_n, ℓ})

Nonparametric flexibility
◮ ... but we have to compute on all the data:

  f | y ∼ N( K_fy K_yy^{-1} (y - m_y) + m_f ,  K_ff - K_fy K_yy^{-1} K_fy^T )

◮ O(n³) in runtime, O(n²) in memory
◮ sparsification methods (pseudo points, etc.)
◮ special-structure methods (kernels, input points, etc.)

Page 109: Shortcomings → active research

◮ Applications
◮ Approximate inference
◮ Computational complexity / sparsification


Page 111: Connections

Already we have seen:
◮ Wiener processes
◮ linear regression
◮ SVM
◮ what else?

Page 112: Temporal linear Gaussian models

◮ Wiener process (Brownian motion, random walk, OU process)
◮ Linear dynamical system (state space model, Kalman filter/smoother, etc.):

  f(t) = A f(t - 1) + w(t)        y(t) = f(t) + n(t)

◮ Gauss-Markov processes (ARMA(p, q), etc.):

  f(t) = Σ_{i=1}^p α_i f(t - i) + Σ_{i=1}^q β_i w(t - i)

◮ Intuition of linearity and Gaussianity → GP

Page 113: Other nonparametric models (or parametric limits)

◮ Kernel smoothing (Nadaraya-Watson, locally weighted regression):

  y∗ = Σ_{i=1}^n α_i yi   where α_i = k(ti, t∗)

◮ Compare to:

  y∗ | y ∼ N( K_y∗y K_yy^{-1} (y - m_y) ,  K_y∗y∗ - K_y∗y K_yy^{-1} K_y∗y^T )

◮ Neural network limit (infinitely many basis functions; important to know about, Neal '96):

  f(t) = lim_{m→∞} Σ_{i=1}^m v_i h(t; u_i)


Page 115: Conclusions

◮ Gaussian Processes can be effective tools for regression and classification
◮ Quantified uncertainty can be highly valuable
◮ GPs can be extended in interesting ways (linearity helps)
◮ GPs appear as limits or general cases of a number of ML technologies
◮ GPs are not without problems

Page 116: Some References/Pointers/Credits

◮ Rasmussen and Williams, Gaussian Processes for Machine Learning
◮ Bishop, Pattern Recognition and Machine Learning
◮ www.gaussianprocess.org (better updated/kept than .com)
◮ loads of papers at AISTATS/NIPS/ICML/JMLR over the last 12 years