Machine Learning 1 Lecture 12.5 - Kernel Methods
Gaussian Processes - Bayesian Regression
Erik Bekkers
(Bishop 6.4.2, 6.4.3)
Image credit: Kirillm | Getty Images
Slide credits: Patrick Forré and Rianne van den Berg
Machine Learning 1
‣ We have observed $\{(x_i, f_i)\}_{i=1}^N$, where we assume
$$f_i = f(x_i) = y(x_i) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \beta^{-1})$$
‣ Assume we have a GP for $y(x)$, so for any $x_1, \dots, x_N$:
$$\mathbf{y} = \begin{pmatrix} y(x_1) \\ \vdots \\ y(x_N) \end{pmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{pmatrix} k(x_1, x_1) & \dots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \dots & k(x_N, x_N) \end{pmatrix}\right)$$
‣ Then $f(\cdot)$ is also a GP, since $\mathbf{f} = \mathbf{y} + \boldsymbol{\varepsilon}$ and the sum of two independent Gaussian random variables is also Gaussian distributed:
$$\mathbf{f} \sim \mathcal{N}\left(\mathbf{0},\; K(X, X) + \beta^{-1} I\right)$$
Regression with GP’s
(Board annotation: non-parametric vs. parametric regression)
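As a minimal NumPy sketch of this construction (the RBF kernel choice, the inputs, and the value of β are illustrative assumptions, not from the slides), we can build the covariance $K(X, X) + \beta^{-1} I$ of the noisy targets and draw one realization of $\mathbf{f}$ from the GP prior:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
beta = 25.0                       # noise precision, so the noise variance is 1/beta
X = np.linspace(0.0, 1.0, 6)      # training inputs x_1, ..., x_N (illustrative)

# Covariance of the noisy targets: K(X, X) + beta^{-1} I
C = rbf_kernel(X, X) + np.eye(len(X)) / beta

# One draw of f = y + eps from the GP prior over noisy observations
f = rng.multivariate_normal(np.zeros(len(X)), C)
```

Each draw of `f` is one vector of noisy function values; repeating the last line gives independent samples from the same prior.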
‣ The joint distribution of $\mathbf{f}_*$ (test points, at $X_*$) and $\mathbf{f}$ (train points), according to our GP, is given by
$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{bmatrix} K(X, X) + \beta^{-1} I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) + \beta^{-1} I \end{bmatrix}\right)$$
‣ Then
$$p(\mathbf{f}_* \mid X_*, X, \mathbf{f}) = \mathcal{N}(\boldsymbol{\mu}_*, \Sigma_*)$$
with
$$\boldsymbol{\mu}_* = K(X_*, X) \left(K(X, X) + \beta^{-1} I\right)^{-1} \mathbf{f}$$
$$\Sigma_* = K(X_*, X_*) + \beta^{-1} I - K(X_*, X) \left(K(X, X) + \beta^{-1} I\right)^{-1} K(X, X_*)$$
Predictions with GP’s
(Board note: the conditional distribution follows from the Gaussian conditioning property.)
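The predictive equations above can be sketched directly in NumPy (a minimal illustration; the RBF kernel, the value of β, and the `gp_predict` helper name are assumptions, not lecture code):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X, f, X_star, beta=25.0, length_scale=1.0):
    """Posterior mean mu_* and covariance Sigma_* of f_* at test inputs X_star."""
    K = rbf_kernel(X, X, length_scale)
    K_s = rbf_kernel(X, X_star, length_scale)        # K(X, X*)
    K_ss = rbf_kernel(X_star, X_star, length_scale)  # K(X*, X*)
    C = K + np.eye(len(X)) / beta                    # K(X, X) + beta^{-1} I
    # mu_* = K(X*, X) (K(X, X) + beta^{-1} I)^{-1} f
    mu_star = K_s.T @ np.linalg.solve(C, f)
    # Sigma_* = K(X*, X*) + beta^{-1} I - K(X*, X) C^{-1} K(X, X*)
    Sigma_star = K_ss + np.eye(len(X_star)) / beta - K_s.T @ np.linalg.solve(C, K_s)
    return mu_star, Sigma_star
```

Note the use of `np.linalg.solve` rather than an explicit matrix inverse, which is both cheaper and numerically more stable.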
Predictions with GP’s: active learning
- identify regions with large error bars (high predictive variance), e.g. in between training points
- gather more data in those regions
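A minimal sketch of this idea, assuming an RBF kernel and an illustrative candidate grid (the `next_query_point` helper is hypothetical): the next point to label is the candidate with the largest predictive variance.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / length_scale**2)

def next_query_point(X, candidates, beta=25.0):
    """Active learning: pick the candidate with the largest predictive variance."""
    C = rbf_kernel(X, X) + np.eye(len(X)) / beta
    K_s = rbf_kernel(X, candidates)
    # Diagonal of Sigma_* = K(X*, X*) + beta^{-1} I - K(X*, X) C^{-1} K(X, X*)
    var = 1.0 + 1.0 / beta - np.einsum('ij,ij->j', K_s, np.linalg.solve(C, K_s))
    return candidates[np.argmax(var)]
```

Far from the training inputs the variance approaches the prior variance, so the query point is pushed into unexplored regions.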
Drawing functions from GP posterior
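As an illustrative sketch (the training data, length-scale, and β are assumptions, not the lecture's values): to draw sample functions we compute the posterior of the noise-free function $y(\cdot)$, i.e. $\Sigma_*$ without the $\beta^{-1} I$ observation-noise term, and sample from it on a dense grid.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(X1, X2, length_scale=0.25):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / length_scale**2)

X = np.array([0.0, 0.3, 0.9])        # training inputs (illustrative)
f = np.array([0.5, -0.2, 0.8])       # noisy observations (illustrative)
X_star = np.linspace(0.0, 1.0, 50)   # dense grid of test inputs
beta = 100.0                         # assumed noise precision

C = rbf_kernel(X, X) + np.eye(len(X)) / beta
K_s = rbf_kernel(X, X_star)

# Posterior over the noise-free function y(.): drop the beta^{-1} I term from Sigma_*
mu = K_s.T @ np.linalg.solve(C, f)
Sigma = rbf_kernel(X_star, X_star) - K_s.T @ np.linalg.solve(C, K_s)

# Three sample functions; small jitter keeps Sigma numerically positive semi-definite
samples = rng.multivariate_normal(mu, Sigma + 1e-8 * np.eye(len(X_star)), size=3)
```

Each row of `samples` is one plausible function under the posterior; near the training inputs the samples cluster around the observations, in between they fan out.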
‣ The kernel parameters $\theta_0, \theta_1, \theta_2, \theta_3$ are hyperparameters
‣ Simplest approach: take the training observations, for which we know
$$p(\mathbf{f} \mid X, \theta) = \mathcal{N}(\mathbf{f} \mid \mathbf{0}, C) = \frac{1}{(2\pi)^{N/2} \, |C|^{1/2}} \exp\left(-\tfrac{1}{2} \mathbf{f}^T C^{-1} \mathbf{f}\right)$$
with $C = C(X, X) = K(X, X) + \beta^{-1} I$
‣ Make a maximum likelihood estimate:
$$\max_\theta \ln p(\mathbf{f} \mid X, \theta) = \max_\theta \left[ -\tfrac{1}{2} \ln |C| - \tfrac{1}{2} \mathbf{f}^T C^{-1} \mathbf{f} - \tfrac{N}{2} \ln 2\pi \right]$$
‣ Solve numerically for $\theta$
How to choose kernel parameters?
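The maximization can be sketched numerically; here a toy grid search over a single RBF length-scale θ stands in for a proper gradient-based optimizer (the data, grid, and β are illustrative assumptions):

```python
import numpy as np

def neg_log_likelihood(theta, X, f, beta=25.0):
    """Negative log marginal likelihood -ln p(f | X, theta) for an RBF kernel
    with length-scale theta, where C = K(X, X) + beta^{-1} I."""
    d2 = (X[:, None] - X[None, :]) ** 2
    C = np.exp(-0.5 * d2 / theta**2) + np.eye(len(X)) / beta
    _, logdet = np.linalg.slogdet(C)          # stable evaluation of ln |C|
    return (0.5 * logdet
            + 0.5 * f @ np.linalg.solve(C, f)
            + 0.5 * len(X) * np.log(2 * np.pi))

# "Solve numerically for theta": a simple grid search over the length-scale
X = np.linspace(0.0, 1.0, 8)
f = np.sin(2 * np.pi * X)                     # illustrative noisy-free targets
grid = np.linspace(0.05, 2.0, 200)
theta_ml = min(grid, key=lambda t: neg_log_likelihood(t, X, f))
```

In practice one would optimize all kernel hyperparameters jointly with a gradient-based method, but the grid search makes the "maximize the log marginal likelihood" step explicit.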