Gaussian processes for data-driven modelling and uncertainty quantification: a hands-on tutorial

Andreas Damianou

Department of Computer Science, University of Sheffield, UK

Brown University, 16/02/2016


Outline

1. Gaussian processes as infinite dimensional Gaussian distributions

1.1 Gaussian distribution
1.2 Intuition by sampling and plotting
1.3 Mean and covariance functions
1.4 Marginalization and conditioning properties

2. Noise model

3. Covariance functions, aka kernels

4. “Full” GP implementation!

5. Running our GP

5.1 Fitting and overfitting

6. GPy

7. Classification

Gaussian distribution

Probability model: p(f1, f2) ∼ N(0, K)

[Figure: contour plot of the joint density p(f1, f2) over f1 and f2; the slice at f2 = 0.35 and the resulting conditional p(f1 | f2 = 0.35) are highlighted.]

Covariance between f1 and f2:

K = [ 1     0.92 ]
    [ 0.92  1    ]

Gaussian distribution

Probability model: p(f1, f2) ∼ N(0, K)

[Figure: the same contour plot with weaker correlation; the slice at f2 = 0.35 and the conditional p(f1 | f2 = 0.35) are highlighted.]

Covariance between f1 and f2:

K = [ 1    0.5 ]
    [ 0.5  1   ]
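As a minimal numpy sketch of the picture above (the covariance values and the observation f2 = 0.35 come from the slides; everything else is illustrative), one can sample correlated pairs from N(0, K) and compute the conditional p(f1 | f2 = 0.35) in closed form, using the conditioning formula shown later in the deck:

    import numpy as np

    rng = np.random.default_rng(0)

    # Joint covariance of (f1, f2) from the slide.
    K = np.array([[1.0, 0.92],
                  [0.92, 1.0]])

    # Draw joint samples: strongly correlated (f1, f2) pairs.
    samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=K, size=5)
    print(samples)

    # Condition on the observation f2 = 0.35 (zero-mean case):
    # p(f1 | f2) = N( K12 / K22 * f2,  K11 - K12**2 / K22 )
    f2 = 0.35
    cond_mean = K[0, 1] / K[1, 1] * f2            # 0.322
    cond_var = K[0, 0] - K[0, 1] ** 2 / K[1, 1]   # 0.1536
    print(cond_mean, cond_var)

With the weaker covariance K = [1 0.5; 0.5 1] the conditional mean shrinks to 0.175 and the conditional variance grows to 0.75, matching the wider conditional in the second set of plots.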

Introducing Gaussian Processes:

- A Gaussian distribution depends on a mean and a covariance matrix.

- A Gaussian process depends on a mean and a covariance function.

GO TO NOTEBOOK (start)

Infinite model... but we always work with finite sets!

Let’s start with a multivariate Gaussian:

p(f1, f2, ..., fs, fs+1, fs+2, ..., fN) ∼ N(µ, K),

where the first block is fA = (f1, ..., fs) and the second block is fB = (fs+1, ..., fN), with:

µ = [ µA ]    and    K = [ KAA  KAB ]
    [ µB ]               [ KBA  KBB ]

Marginalisation property:

p(fA, fB) ∼ N(µ, K). Then:

p(fA) = ∫ p(fA, fB) dfB = N(µA, KAA)

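A quick numeric illustration of the marginalisation property (the 4×4 covariance below is invented for the example): the marginal over fA is read off directly as N(µA, KAA), and the sample covariance of joint draws restricted to fA agrees with KAA.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy joint covariance over (f1, f2, f3, f4); split as fA = (f1, f2), fB = (f3, f4).
    mu = np.zeros(4)
    K = np.array([[1.0, 0.9, 0.5, 0.2],
                  [0.9, 1.0, 0.6, 0.3],
                  [0.5, 0.6, 1.0, 0.8],
                  [0.2, 0.3, 0.8, 1.0]])

    # Marginalisation property: p(fA) = N(mu_A, K_AA) -- no integral needed,
    # just pick out the corresponding block.
    mu_A, K_AA = mu[:2], K[:2, :2]

    # Sanity check: sample the joint and look at the first two coordinates only.
    F = rng.multivariate_normal(mu, K, size=100_000)
    print(K_AA)
    print(np.cov(F[:, :2], rowvar=False))   # close to K_AA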

Infinite model... but we always work with finite sets!

In the GP context, f = f(x):

µ∞ = [ µf  ]    and    K∞ = [ Kff  ··· ]
     [ ··· ]                [ ···  ··· ]

Covariance function: maps locations xi, xj of the input domain X to an entry in the covariance matrix:

Ki,j = k(xi, xj)

For all available inputs:

K = Kff = k(X, X)

GP: joint Gaussian distribution of the training and the (potentially infinite!) test data:

[ f  ]  ∼  N( 0,  [ K    K∗  ] )
[ f∗ ]            [ K∗ᵀ  K∗∗ ]

K∗ is the (cross-)covariance matrix obtained by evaluating the covariance function at pairs of training inputs X and test inputs X∗, i.e.

K∗ = k(X, X∗).

Similarly:

K∗∗ = k(X∗, X∗).
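As a sketch of how a covariance function fills in these matrices, here is a squared-exponential (RBF) kernel in numpy; the kernel choice, function name, and parameter values are illustrative assumptions, not something fixed by the slides.

    import numpy as np

    def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
        """Squared exponential: k(x, x') = variance * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
        sqdist = (np.sum(X1 ** 2, axis=1)[:, None]
                  + np.sum(X2 ** 2, axis=1)[None, :]
                  - 2.0 * X1 @ X2.T)
        return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

    X = np.linspace(-2, 2, 5)[:, None]        # training inputs
    Xstar = np.linspace(-2, 2, 50)[:, None]   # test inputs

    K = rbf_kernel(X, X)                      # K = Kff = k(X, X)
    Kstar = rbf_kernel(X, Xstar)              # K*  = k(X, X*)
    Kstarstar = rbf_kernel(Xstar, Xstar)      # K** = k(X*, X*)
    print(K.shape, Kstar.shape, Kstarstar.shape)   # (5, 5) (5, 50) (50, 50)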

Posterior is also Gaussian!

p(fA, fB) ∼ N(µ, K). Then:

p(fA | fB) = N( µA + KAB KBB⁻¹ (fB − µB),  KAA − KAB KBB⁻¹ KBA )

In the GP context this can be used for inter/extrapolation:

p(f∗ | f1, ..., fN) = p(f(x∗) | f(x1), ..., f(xN))
                    ∼ N( K∗ᵀ K⁻¹ f,  K∗,∗ − K∗ᵀ K⁻¹ K∗ )

p(f(x∗) | f(x1), ..., f(xN)) is a posterior process!

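A small, self-contained numpy sketch of the noise-free posterior process above; the observations, inputs, and 1-D RBF kernel are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(2)

    def k(A, B, lengthscale=0.7):
        """1-D RBF covariance via broadcasting: A is (n, 1), B is (m, 1)."""
        return np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)

    X = np.array([[-1.5], [-0.5], [0.8], [1.6]])   # observed inputs
    f = np.array([-1.0, 0.3, 0.9, -0.4])           # observed (noise-free) values
    Xs = np.linspace(-2, 2, 100)[:, None]          # test inputs x*

    Kff = k(X, X) + 1e-10 * np.eye(len(X))         # tiny jitter for stability
    Ks = k(X, Xs)                                  # K*
    Kss = k(Xs, Xs)                                # K**

    # Posterior process: N( K*^T K^-1 f,  K** - K*^T K^-1 K* )
    mean = Ks.T @ np.linalg.solve(Kff, f)
    cov = Kss - Ks.T @ np.linalg.solve(Kff, Ks)

    # A few functions drawn from the posterior process (jitter keeps cov numerically PSD).
    post_samples = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(len(Xs)), size=3)
    print(mean.shape, post_samples.shape)          # (100,) (3, 100)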

Noise model

- So far we assumed: f = f(X)

- Now assume we only observe noisy versions y of the true outputs f:

  y = f(X) + ε,   where:

  f ∼ GP(0, k(x, x′))
  ε ∼ N(0, σ²I)

The above construction gives us the following probabilities:

p(y | f) = N(y | f, σ²I)

p(f | x) = N(f | 0, Kff) = (2π)^(−n/2) |Kff|^(−1/2) exp( −½ fᵀ Kff⁻¹ f )

p(y | x) = ∫ p(y | f) p(f | x) df = N(y | 0, Kff + σ²I)

p(y | x) is called the marginal likelihood and is tractable because of our choice of noise ε, which is normally distributed.

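A minimal numpy version of the log marginal likelihood log p(y | x) = log N(y | 0, Kff + σ²I); the kernel, toy data, and noise level are placeholders, and the Cholesky factorisation is just one stable way to evaluate it.

    import numpy as np

    def rbf(A, B, lengthscale=1.0):
        return np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)   # 1-D RBF

    rng = np.random.default_rng(3)
    X = np.linspace(-2, 2, 8)[:, None]
    y = np.sin(2.0 * X).ravel() + 0.1 * rng.standard_normal(8)
    sigma2 = 0.1 ** 2                                  # noise variance

    Ky = rbf(X, X) + sigma2 * np.eye(len(X))           # Kff + sigma^2 I

    # log p(y | x) = -1/2 y^T Ky^-1 y - 1/2 log|Ky| - n/2 log(2 pi)
    L = np.linalg.cholesky(Ky)                         # Ky = L L^T (stable)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # Ky^-1 y
    log_marginal = (-0.5 * y @ alpha
                    - np.sum(np.log(np.diag(L)))       # = -1/2 log|Ky|
                    - 0.5 * len(X) * np.log(2.0 * np.pi))
    print(log_marginal)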

Predictions in the noise model

y∗ | y, x, x∗ ∼ N(µpred, Kpred)

with

µpred = K∗ᵀ [K + σ²I]⁻¹ y

and

Kpred = K∗,∗ − K∗ᵀ [K + σ²I]⁻¹ K∗.
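Putting the prediction equations together, here is a hedged, self-contained sketch of GP regression in numpy (the toy data, RBF kernel, and noise level are invented for the example), in the spirit of the “Full” GP implementation item in the outline.

    import numpy as np

    def rbf(A, B, variance=1.0, lengthscale=0.5):
        return variance * np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)   # 1-D RBF

    # Toy noisy data y = f(X) + eps.
    rng = np.random.default_rng(4)
    X = np.linspace(-2, 2, 10)[:, None]
    y = np.sin(3.0 * X).ravel() + 0.2 * rng.standard_normal(10)
    sigma2 = 0.2 ** 2
    Xs = np.linspace(-2.5, 2.5, 200)[:, None]          # test inputs x*

    Ky = rbf(X, X) + sigma2 * np.eye(len(X))           # K + sigma^2 I
    Ks = rbf(X, Xs)                                    # K*
    Kss = rbf(Xs, Xs)                                  # K**

    # mu_pred = K*^T [K + sigma^2 I]^-1 y
    # K_pred  = K** - K*^T [K + sigma^2 I]^-1 K*
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu_pred = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)                         # L^-1 K*
    K_pred = Kss - V.T @ V
    std = np.sqrt(np.clip(np.diag(K_pred), 0.0, None)) # pointwise predictive std
    print(mu_pred.shape, std.shape)                    # (200,) (200,)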

Fitting the data (shaded area is uncertainty)

[Figure sequence: GP fits to the data over x ∈ [−2, 2], y ∈ [−3, 3]; the shaded area is the predictive uncertainty. Successive panels show prior samples, the posterior fit, the fit with more noise, the fit with no noise, and posterior samples.]
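For reference, prior samples like those in the figure sequence can be drawn directly from N(0, k(X∗, X∗)) before any data are seen; the kernel and grid below are illustrative.

    import numpy as np

    def rbf(A, B, lengthscale=0.5):
        return np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)   # 1-D RBF

    Xs = np.linspace(-2, 2, 200)[:, None]
    Kss = rbf(Xs, Xs) + 1e-8 * np.eye(len(Xs))    # jitter for numerical stability

    # Prior samples: f ~ N(0, K**), before seeing any data.
    rng = np.random.default_rng(5)
    prior_samples = rng.multivariate_normal(np.zeros(len(Xs)), Kss, size=5)
    print(prior_samples.shape)   # (5, 200)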

GO TO NOTEBOOK (Cov. functions)

Curve fitting

[Figure: the same data points fitted by three different curves (three panels).]

- Which curve fits the data better?

- Which curve is more “complex”?

- Which curve is better overall?

Need a good balance between data fit and overfitting!

(Bayesian) Occam’s Razor

“A plurality is not to be posited without necessity”. W. of Ockham

“Everything should be made as simple as possible, but not simpler”. A. Einstein

Evidence is higher for the model that is not “unnecessarily complex” but still “explains” the data D.

How do GPs solve the overfitting problem (i.e. regularize)?

- Answer: Integrate over the function itself!

- This is associated with the Bayesian methodology.

- So, we will average out all possible function forms, under a (GP) prior!

Recap:

ML:        argmax_w  p(y | w, φ(x))              e.g.  y = φ(x)ᵀ w + ε

Bayesian:  argmax_θ  ∫ p(y | f) p(f | x, θ) df   e.g.  y = f(x, θ) + ε
           (the factor p(f | x, θ) is the GP prior)

- θ are hyperparameters.

- The Bayesian approach (GP) automatically balances the data fit against the complexity penalty; a sketch of this marginal-likelihood optimisation follows below.
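The sketch below illustrates the Bayesian recipe in the recap: pick the hyperparameters θ by maximising the log marginal likelihood (equivalently, minimising its negative). The RBF-plus-noise parameterisation, toy data, and optimiser settings are assumptions made for the example.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(6)
    X = np.linspace(-2, 2, 15)[:, None]
    y = np.sin(2.0 * X).ravel() + 0.2 * rng.standard_normal(15)

    def neg_log_marginal(log_theta):
        """Negative log p(y | x, theta) for theta = (lengthscale, signal var, noise var)."""
        l, var, sigma2 = np.exp(log_theta)             # log-space keeps them positive
        K = (var * np.exp(-0.5 * (X - X.T) ** 2 / l ** 2)
             + (sigma2 + 1e-6) * np.eye(len(X)))       # small jitter for stability
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
                + 0.5 * len(X) * np.log(2.0 * np.pi))

    # argmax_theta log p(y | x, theta)  <=>  minimise the negative log marginal likelihood.
    res = minimize(neg_log_marginal, x0=np.log([1.0, 1.0, 0.1]), method="L-BFGS-B")
    print(np.exp(res.x))   # fitted lengthscale, signal variance, noise variance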


GO TO NOTEBOOK
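Section 6 of the outline turns to GPy. A typical regression snippet with that library looks roughly like the following (the toy data and initial hyperparameters are made up here, and exact call signatures may vary between GPy versions):

    import numpy as np
    import GPy

    # Toy 1-D data (GPy expects Y as a column vector).
    rng = np.random.default_rng(7)
    X = np.linspace(-2, 2, 20)[:, None]
    Y = np.sin(2.0 * X) + 0.2 * rng.standard_normal((20, 1))

    # RBF kernel and GP regression model; hyperparameters are then fitted by
    # maximising the log marginal likelihood.
    kernel = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=1.0)
    m = GPy.models.GPRegression(X, Y, kernel)
    m.optimize(messages=False)

    # Predictive mean and variance at new inputs.
    Xnew = np.linspace(-2.5, 2.5, 100)[:, None]
    mean, var = m.predict(Xnew)
    print(m)   # fitted kernel variance, lengthscale, and Gaussian noise variance

Calling m.optimize() maximises the log marginal likelihood over the kernel and noise hyperparameters, which is exactly the data-fit vs complexity trade-off discussed above.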

Teaser for tomorrow

- Unsupervised learning with GPs (Bayesian non-linear dimensionality reduction)

- Deep GPs

- Dynamical systems

- ...and more...

Deep GP: Step function (credits for idea to J. Hensman)

[Figure: step-function data fitted by a standard GP, a deep GP with 1 hidden layer, and a deep GP with 3 hidden layers.]

Thanks

Thanks to Prof. Neil Lawrence, his SheffieldML group, and the GPy developers.

Thanks to George Karniadakis and Paris Perdikaris for hosting me.
