
Deep Gaussian Processes: Theory and Applications



Petar M. Djurić

Bellairs Workshop, Barbados

February 13, 2019


Outline

- Introduction
- Gaussian processes
- Deep Gaussian processes
- Applications
- Conclusions


Introduction

- Probabilistic modeling allows for representing and modifying uncertainty about models and predictions.

- This is done according to well-defined rules.

- Probabilistic modeling has a central role in machine learning, cognitive science, and artificial intelligence.


The Concept of Uncertainty

- Learning and intelligence depend on the amount of uncertainty in the information extracted from data.

- Probability theory is the main framework for handling uncertainty.

- Interestingly, in the recent progress of deep learning with deep neural networks, which are based on learning from huge amounts of data, the concept of uncertainty is somewhat bypassed.

- In the years to come, we will see further advances in artificial intelligence and machine learning within the probabilistic framework.


The Role of a Model

- To make inference from data, one needs models.

- Models can be simple (like linear models) or highly complex (like large and deep neural networks).

- In most settings, the models must be able to make predictions.

- Uncertainty plays a fundamental role in modeling observed data and in interpreting model parameters, the results of models, and the correctness of models.


The Learning

- Probability distributions are used to represent uncertainty.

- Learning from data occurs by transforming prior distributions (defined before seeing the data) into posterior distributions (after seeing the data).

- The optimal transformation, from an information-theoretic point of view, is Bayes' rule.

- The beauty of the approach is the simplicity of the Bayes mechanism.


Gaussian Process Regression

- Essentially, a GP can be seen as a distribution over real-valued functions f(x),

  f(x) ∼ GP(m(x), k_f(x_i, x_j))

- Some assumptions are often made when using GP regression:

  1. the mean function m(x) = 0, for simplicity, and

  2. the observation noise is additive white Gaussian noise, for tractability.
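A minimal sketch (ours, not from the slides) of what a GP prior over functions means operationally: any finite set of function values is jointly Gaussian, so drawing functions from the prior amounts to sampling from N(0, K). The kernel form follows the parameterization used later in these slides; the hyper-parameter values and number of samples are illustrative assumptions.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared-exponential covariance, k(x, x') = sigma_f^2 exp(-(x - x')^2 / l),
    following the slides' parameterization (other texts use exp(-(x - x')^2 / (2 l^2)))."""
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-d**2 / length)

x_star = np.linspace(0.0, 2.0 * np.pi, 200)                  # test inputs
K = se_kernel(x_star, x_star) + 1e-10 * np.eye(len(x_star))  # jitter for numerical stability

# f(x) ~ GP(0, k): sample three functions from the zero-mean prior.
rng = np.random.default_rng(0)
f_samples = rng.multivariate_normal(np.zeros(len(x_star)), K, size=3)
print(f_samples.shape)   # (3, 200)
```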


Gaussian Process Regression (contd.)

Let X = {x_i}_{i=1}^N and y denote the collection of all input vectors and all observations, respectively. With the above assumptions,

  y = f(X) + ε,

where ε ∼ N(0, σ_ε² I). We also have

- Likelihood: p(y|f) = N(y | f, σ_ε² I), and

- Prior: p(f|X, θ) = N(f | 0, K_ff), where K_ff = k_f(X, X) and θ denotes the hyper-parameters of the covariance function.


Gaussian Process Regression (contd.)

The hyper-parameters θ can be learned from the training data {X, y} by maximizing the log marginal likelihood.

- Log marginal likelihood: log p(y|X, θ)

  log p(y|X, θ) = log N(y | 0, K_ff + σ_ε² I)
               = log N(y | 0, K)
               = −(1/2) yᵀ K⁻¹ y − (1/2) log|K| − (N/2) log 2π

- Occam's razor is embedded in the model.
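A hedged NumPy/SciPy sketch of this learning step: it builds K = K_ff + σ_ε² I, evaluates the log marginal likelihood via a Cholesky factorization, and maximizes it over θ = (σ_f, ℓ, σ_ε) with a generic optimizer. The synthetic training data and the choice of L-BFGS-B are assumptions, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize

def se_kernel(xa, xb, sigma_f, length):
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-d**2 / length)        # slides' SE parameterization

def neg_log_marginal_likelihood(log_theta, x, y):
    sigma_f, length, sigma_eps = np.exp(log_theta)    # optimize in log space for positivity
    N = len(x)
    K = se_kernel(x, x, sigma_f, length) + sigma_eps**2 * np.eye(N)
    L = np.linalg.cholesky(K)                         # K = L L^T, the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    # log N(y | 0, K) = -1/2 y^T K^{-1} y - 1/2 log|K| - N/2 log(2 pi)
    log_ml = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * N * np.log(2.0 * np.pi)
    return -log_ml

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 20)                 # toy training inputs
y = np.sin(x) + 0.1 * rng.standard_normal(20)         # toy noisy observations

res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(x, y), method="L-BFGS-B")
print("learned (sigma_f, l, sigma_eps):", np.exp(res.x))
```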


Gaussian Process Regression (contd.)

Let X* and f* denote the collection of test inputs and the corresponding latent function values, respectively. Then we can express the predictive posterior as

- Predictive posterior: p(f*|X*, X, y, θ) = N(f* | E(f*), cov(f*))

  E(f*) = K_f(X*, X) K⁻¹ y

  cov(f*) = K_f(X*, X*) − K_f(X*, X) K⁻¹ K_f(X*, X)ᵀ
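A short sketch that evaluates these two expressions directly; it reuses the SE kernel defined above and assumes the hyper-parameters have already been learned (their default values here are placeholders).

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-d**2 / length)

def gp_predict(x_train, y_train, x_test, sigma_f=1.0, length=1.0, sigma_eps=0.1):
    """Predictive posterior p(f* | X*, X, y, theta) = N(E(f*), cov(f*))."""
    K = se_kernel(x_train, x_train, sigma_f, length) + sigma_eps**2 * np.eye(len(x_train))
    K_star = se_kernel(x_test, x_train, sigma_f, length)     # K_f(X*, X)
    K_ss = se_kernel(x_test, x_test, sigma_f, length)        # K_f(X*, X*)
    mean = K_star @ np.linalg.solve(K, y_train)              # E(f*) = K_f(X*, X) K^{-1} y
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)       # cov(f*)
    return mean, cov
```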


Covariance Function

- For example: the radial basis function (RBF) or squared exponential (SE) covariance.

  One-dimensional form:

  k_rbf(x_i, x_j) = σ_f² exp(−(x_i − x_j)²/ℓ)

- σ_f² measures the strength of the signal; σ_f²/σ_ε² is equivalent to the signal-to-noise ratio (SNR).

- The characteristic length scale ℓ encodes the model complexity in that dimension.

- r = 1/ℓ measures the relevance of that dimension.

- Automatic relevance determination (ARD)
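A sketch of an ARD version of this covariance with one length scale per input dimension; the multi-dimensional form below is an assumption extrapolated from the slide's one-dimensional kernel and its relevance r = 1/ℓ.

```python
import numpy as np

def se_ard_kernel(Xa, Xb, sigma_f, lengths):
    """ARD squared exponential: k(x, x') = sigma_f^2 exp(-sum_d (x_d - x'_d)^2 / l_d).
    A dimension with a very large l_d (relevance 1/l_d near zero) barely affects k."""
    lengths = np.asarray(lengths, dtype=float)       # shape (D,)
    diff = Xa[:, None, :] - Xb[None, :, :]           # shape (Na, Nb, D)
    return sigma_f**2 * np.exp(-np.sum(diff**2 / lengths, axis=-1))

X = np.random.default_rng(0).standard_normal((5, 3))
K = se_ard_kernel(X, X, sigma_f=1.0, lengths=[0.5, 2.0, 1000.0])  # 3rd dimension ~irrelevant
print(K.shape)   # (5, 5)
```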


Toy Example

- Goal: learn f(x) from 5 noisy observations {x_i, y_i}_{i=1}^5.

- Ground truth: y = sin(x) + ε, ε ∼ N(0, σ_ε²).

- Test inputs: x* ∈ R^{300×1}, equally spaced from x = 0 to 2π.

- Test outputs: f* = f(x*)
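One way to reproduce this toy experiment, using scikit-learn (a library choice of ours, not of the slides); the noise level and random seed are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 2.0 * np.pi, 5)[:, None]            # 5 noisy observations
y_train = np.sin(x_train).ravel() + 0.1 * rng.standard_normal(5)
x_test = np.linspace(0.0, 2.0 * np.pi, 300)[:, None]            # 300 test inputs

# RBF + WhiteKernel corresponds to an SE covariance plus additive Gaussian noise;
# the hyper-parameters are fitted by maximizing the log marginal likelihood.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(x_train, y_train)

mean, std = gp.predict(x_test, return_std=True)    # predictive mean and std band
print(gp.kernel_)                                  # learned hyper-parameters
```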


Prior Distribution

[Figure: the GP prior over the test outputs f* — prior mean and confidence interval plotted against the test inputs x*.]


Predictive (Posterior) Distribution

[Figure: the predictive mean and confidence interval plotted against the test inputs x*, together with the noisy observations and the ground truth.]


Another Toy Example: The Function sin x/x


Example: Recovery of Missing Samples in FHR [1]

- Goal: recover missing samples in FHR, using not only the observed FHR samples but also the UA samples.

- Model: y_i = y(x_i) = f(x_i) + ε_i
  - y_i: i-th sample in an FHR segment
  - x_i = [i, u_i]ᵀ, where u_i is the i-th UA sample
  - ε_i: Gaussian white noise
  - f(x_i): i-th latent noise-free FHR sample

[1] Guanchao Feng, J. Gerald Quirk, and Petar M. Djurić, "Recovery of missing samples in fetal heart rate recordings with Gaussian processes," in 25th European Signal Processing Conference (EUSIPCO), IEEE, 2017, pp. 261–265.
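A schematic sketch of this setup on synthetic signals (the CTG data of [1] are not reproduced here; the stand-in signals, kernel choice, and length scales are assumptions): the GP input is the pair x_i = [i, u_i], and the missing FHR samples are recovered as the predictive mean at their time indices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
N = 500
t = np.arange(N)
ua = 50.0 * np.exp(-(((t % 150) - 75.0) ** 2) / 400.0)           # stand-in UA signal
fhr = 130.0 + 10.0 * np.sin(2.0 * np.pi * t / 120.0) - 0.1 * ua  # stand-in noise-free FHR
fhr_noisy = fhr + rng.standard_normal(N)

missing = rng.choice(N, size=120, replace=False)                 # samples to recover
observed = np.setdiff1d(t, missing)

X_obs = np.column_stack([observed, ua[observed]])                # x_i = [i, u_i]
X_mis = np.column_stack([missing, ua[missing]])

kernel = RBF(length_scale=[50.0, 10.0]) + WhiteKernel()          # one length scale per input (ARD)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_obs, fhr_noisy[observed])
fhr_hat = gp.predict(X_mis)                                      # recovered FHR samples

print("MSE on the missing samples:", np.mean((fhr_hat - fhr[missing]) ** 2))
```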


CTG Segment for Experiments

[Figure: the FHR signal (in BPM) and the UA signal of the CTG segment, each over 500 samples.]


CTG Segment for Experiments


Experiment I

- 120 missing samples were randomly selected, and we tried to recover their true values.

[Figure: the recovered FHR segment (in BPM) — ground truth, cubic spline interpolation, and the GP-based method.]


Experiment II

- The percentage of missing samples was increased from 1% to 85% with a step size of 1%.

[Figure: the log of the MSE and the SNR (in dB) of cubic spline interpolation and of the GP-based method as functions of the percentage of missing samples.]


Experiment III

- To demonstrate the contribution of UA, we repeated Experiment I but excluded u_i from the input vector x_i.


Experiment IV (An Extreme Case)

- 10 seconds of consecutive missing samples.

[Figure: recovery of the gap — ground truth; GP-based method, MSE = 1.4337; cubic spline, MSE = 62.1504; linear interpolation, MSE = 2.0730.]


Limitations

- The general framework is computationally expensive, O(N³), because of the inversion of the N×N matrix K.

- Another limitation is the joint Gaussianity that is required by the definition of GPs.


Deep Gaussian Processes

Z → X_{H−1} → · · · → X_2 → X_1 → Y

- Y ∈ R^{N×d_y}: the observations, the output of the network
  - N is the number of observation vectors.
  - d_y is the dimension of the vectors y_n.

- {X_h}_{h=1}^{H−1}: intermediate latent states
  - The dimensions {d_h}_{h=1}^{H−1} are potentially different.

- Z ∈ R^{N×d_z}: the input to the network
  - Z is observed for supervised learning.
  - Z is unobserved for unsupervised learning.
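A generative sketch (ours, not from the slides) of the layer composition Z → X_{H−1} → · · · → Y in one dimension: each layer draws a function from a zero-mean GP and feeds its output to the next layer, which is why the mapping from input to output is in general no longer Gaussian. The layer count and kernel settings are illustrative assumptions.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-d**2 / length)

def sample_gp_layer(x_in, rng, length=0.5):
    """Draw one function from GP(0, k) evaluated at the 1-D inputs x_in."""
    K = se_kernel(x_in, x_in, length=length) + 1e-8 * np.eye(len(x_in))
    return rng.multivariate_normal(np.zeros(len(x_in)), K)

rng = np.random.default_rng(0)
z = np.linspace(-8.0, 8.0, 400)      # input to the network
h = z
for layer in range(4):               # e.g. Z -> X3 -> X2 -> X1 -> Y
    h = sample_gp_layer(h, rng)      # each layer warps the previous layer's output

# h is now one draw from a 4-layer deep composition of GPs at the 400 inputs.
print(h.shape)
```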


Deep Gaussian Processes (contd.)

- The joint Gaussianity limitation is overcome because nonlinear mappings generally will not preserve Gaussianity.

- DGPs immediately introduce intractabilities.

- One way of handling the difficulties is by introducing a set of inducing points; within the variational framework, this yields sparsity and a tractable lower bound on the marginal likelihood.


Example: Functions Sampled From DGP

- The Gaussianity limitation is overcome by nonlinear function composition.

[Figure: the input signal and the functions produced at layers 1–4 of a DGP.]


Example: Learning a Step Function [2]

- Standard GP (top), two- and four-layer DGP (middle, bottom).

- The DGPs achieved much better performance.

[2] James Hensman and Neil D. Lawrence, "Nested variational compression in deep Gaussian processes," arXiv preprint arXiv:1412.1370, 2014.


Deep GPs and Deep Neural Networks (a comparison)

- A single fully connected neural-network layer with an independent and identically distributed (iid) prior over its parameters and with infinite width is equivalent to a GP.

- Therefore, deep GPs are equivalent to neural networks with multiple, infinitely wide hidden layers.

- The mappings of a DGP are governed by its GPs instead of activation functions.

- A DGP allows for propagation and quantification of uncertainty through each layer, as a fully Bayesian probabilistic model.

- There is ARD at each layer.


Generative Process

Z → X → Y, with mappings f^X : Z → X and f^Y : X → Y.

Figure: A two-layer DGP.

- The generative process takes the form:

  x_nl = f_l^X(z_n) + ε_nl^X,  l = 1, ..., d_x,  z_n ∈ R^{d_z}

  y_ni = f_i^Y(x_n) + ε_ni^Y,  i = 1, ..., d_y,  x_n ∈ R^{d_x}

- ε_nl^X and ε_ni^Y are additive white Gaussian noise terms.


Generative Process (contd.)

Z → X → Y, with mappings f^X and f^Y.

- We assume Z is unobserved, with a prior p(Z) = N(Z|0, I).

- If we have specific prior knowledge about Z, we should quantify this knowledge into a prior accordingly.


Inference

Z → X → Y, with mappings f^X and f^Y.

- The inference takes the reverse route, i.e., we observe high-dimensional data Y, and we learn the low-dimensional manifold Z (of dimension d_z, where d_z < d_x < d_y) that is responsible for generating Y.


Inference Challenges

The learning requires maximization of the log marginal likelihood,

  log p(Y) = log ∫∫ p(Y|X) p(X|Z) p(Z) dX dZ,

which is intractable.


Augmentation of Probability Space

Z → X → Y, with mappings f^X and f^Y.

- Original probability space:

  p(Y, F^Y, F^X, X, Z) = p(Y|F^Y) p(F^Y|X) p(X|F^X) p(F^X|Z) p(Z)

- Augmentation using inducing points:
  - U^X = f^X(Z̃), with Z̃ ∈ R^{N_p×d_z} and U^X ∈ R^{N_p×d_x}
  - U^Y = f^Y(X̃), with X̃ ∈ R^{N_p×d_x} and U^Y ∈ R^{N_p×d_y}
  - N_p ≤ N


Augmentation of Probability Space

- Augmented probability space:

  p(Y, F^Y, F^X, X, Z, U^Y, U^X, X̃, Z̃)
    = p(Y|F^Y) p(F^Y|U^Y, X) p(U^Y|X̃) p(X|F^X) p(F^X|U^X, Z) p(U^X|Z̃) p(Z)

- Problematic terms:
  - A = p(F^Y|U^Y, X)
  - B = p(F^X|U^X, Z)


Variational Inference

- A variational distribution: Q = q(U^Y) q(X) q(U^X) q(Z)

- By Jensen's inequality:

  log p(Y) ≥ F_v = ∫ Q · A · B · log G  dF^Y dX dF^X dZ dU^X dU^Y

- The function G is defined as:

  G(Y, F^Y, X, F^X, Z, U^X, U^Y) = p(Y|F^Y) p(U^Y) p(X|F^X) p(U^X) p(Z) / Q

- F_v is tractable for a collection of covariance functions, since A and B cancel out in G.


Studying Complex Systems

Principles used:

- algorithmic compressibility,
- locality, and
- deep probabilistic modeling.


Applications

[Diagram: a deep GP network in which an input t and latent variables z_1, ..., z_Q drive chains of latent states x_{1,1}, ..., x_{L1,1} and x_{1,2}, ..., x_{L2,2} that generate the outputs y_1 and y_2.]


Applications (contd.)

[Diagram: interacting agents P_i, P_j, and P_k connected by links L_ij, L_ji, L_jk, and L_kj, producing observations y_k[t_1], y_j[t_2], and y_i[t_3].]


Applications (contd.) [3]

[3] Figures obtained by Sima Mofakham and Chuck Mikell.


Example: Binary pH-based Classification [4]

- Goal: to have the DGP classify CTG recordings into healthy and unhealthy classes.

- Features:
  - 14 FHR features
  - 6 (categorical) UA features

- Labeling:
  - Positive (unhealthy): pH < 7.1
  - Negative (healthy): pH > 7.2

[4] Guanchao Feng, J. Gerald Quirk, and Petar M. Djurić, "Supervised and Unsupervised Learning of Fetal Heart Rate Tracings with Deep Gaussian Processes," in 2018 14th Symposium on Neural Networks and Applications (NEUREL), IEEE, 2018, pp. 1–6.


- Structure of DGP: our DGP network had two layers, and in each layer we set the initial latent dimension to five.

- Performance metrics (computed as in the sketch after this list):

1. Sensitivity (true positive rate)

2. Specificity (true negative rate)

3. Geometric mean of specificity and sensitivity
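A small sketch of these three metrics from predicted and true labels; the convention 1 = positive (unhealthy), 0 = negative (healthy) is an assumption consistent with the labeling above.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Sensitivity (true positive rate), specificity (true negative rate),
    and their geometric mean; assumes labels 1 = unhealthy, 0 = healthy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, np.sqrt(sensitivity * specificity)

print(classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```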


Features

Table: Features for FHR

Category           Features
Time domain        Mean, standard deviation, STV, STI, LTV, LTI
Non-linear         Poincaré SD1, Poincaré SD2, CCM
Frequency domain   VLF, LF, MF, HF, ratio

Table: Features for UA

Feature           Normal (0)                                   Abnormal (1)
Frequency         ≤ 8 contractions                             > 8 UC (tachysystole)
Duration          < 90 s                                       > 90 s
Increased tonus   With toco                                    Prolonged > 120 s
Interval A        Interval, peak to peak                       < 2 min
Interval B        Interval, offset of UC to onset of next UC   < 1 min
Rest time         > 50%                                        < 50%


Classification Results

- A support vector machine (SVM) was used as the benchmark model.

Table: Classification results

Classifier   Features   Specificity   Sensitivity   Geometric mean
SVM          FHR        0.82          0.73          0.77
SVM          FHR+UA     0.82          0.82          0.82
Deep GP      FHR        0.91          0.73          0.82
Deep GP      FHR+UA     0.82          0.91          0.86


Unsupervised Learning for FHR Recordings

- Goal: to have the DGP learn informative low-dimensional latent spaces that can generate the recordings.

- Labeling:
  - pH-based labeling combined with the obstetrician's evaluation.
  - Labels are only used for evaluation of the learning results.

- Data:
  - The last 30 minutes of 10 FHR recordings, Y ∈ R^{10×7200}.
  - Three of them are abnormal and 7 are normal.


Performance Metric and Network Structure

- Performance metric: the number of errors in the latent space for one nearest neighbor (a small sketch of this count follows below).

- Structure of DGP: a five-layer DGP, with the initial dimensions of the latent spaces in the layers set to d_{x,1:5} = [6, 5, 5, 4, 3]ᵀ.
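A sketch of the nearest-neighbor error count in a latent space (the latent coordinates below are made up; only the counting logic is illustrated): for each recording, find its nearest neighbor and count how often that neighbor carries a different label.

```python
import numpy as np

def one_nn_errors(latent, labels):
    """Count recordings whose nearest neighbor in the latent space
    (Euclidean distance, excluding the point itself) has a different label."""
    latent, labels = np.asarray(latent), np.asarray(labels)
    d = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    nn = np.argmin(d, axis=1)
    return int(np.sum(labels[nn] != labels))

latent = np.random.default_rng(0).standard_normal((10, 2))   # 10 recordings in a 2-D latent layer
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])            # 3 abnormal, 7 normal
print(one_nn_errors(latent, labels))
```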


Automatic Structure Learning

[Figure: bar plots of the weights of the latent dimensions learned in each of the five layers.]


Visualization of the Latent Spaces with 2-D Projection.

- Red: the normal recordings

- Blue: the abnormal recordings

- Pixel intensity: proportional to precision

- The total numbers of errors in layers 1 to 5 are 2, 2, 1, 1, and 0, respectively.

[Figure: 2-D projections of the latent spaces in layers 1–5.]


Example: Deep Gaussian Processes with Convolutional Kernels [5]

- Goal: multi-class image classification

- Database: MNIST (handwritten digits)

- Methods:
  1. SGP: sparse Gaussian processes
  2. DGP: deep Gaussian processes
  3. CGP: convolutional Gaussian processes
  4. CDGP: convolutional deep Gaussian processes

[5] Vinayak Kumar et al., "Deep Gaussian Processes with Convolutional Kernels," arXiv preprint arXiv:1806.01655, 2018.


MNIST


Example: Identification of Atmospheric Variable Using Deep Gaussian Processes [6]

- Goal: modeling temperature using meteorological variables (features).

- Domain of interest: 25 km × 25 km around the nuclear power plant in Krsko, Slovenia.

- Features: relative humidity, atmosphere stability, air pressure, global solar radiation, wind speed.

[6] Mitja Jancic, Jus Kocijan, and Bostjan Grasic, "Identification of Atmospheric Variable Using Deep Gaussian Processes," IFAC-PapersOnLine 51.5 (2018), pp. 43–48.


The Geographical Features of the Surrounding Terrain

- The plant and its measurement station (marked as STOLP – Postaja) are situated in a basin surrounded by hills and valleys, which influence the micro-climate conditions.


One-Step-Ahead Prediction

- Prediction results:


Example: Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data [7]

- Goal: predicting crop yields before harvest

- Model: CNN and LSTM combined with GP

[7] Jiaxuan You et al., "Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data," in AAAI, 2017, pp. 4559–4566.


Comparing County-Level Error Maps

- The color represents the prediction error in bushels per acre.


Conclusions

- A case was made for using probability theory in treating uncertainties in inference from data.

- Deep probabilistic modeling based on deep Gaussian processes was addressed.

- The use of DGPs in studying complex interacting systems was described.

- Applications of DGPs in various fields were presented.

- Although the development of DGPs is still at a relatively early stage, DGPs have shown great potential in many challenging machine learning tasks.
