
  • Machine Learning 1 Lecture 12.5 - Kernel Methods

    Gaussian Processes - Bayesian Regression

    Erik Bekkers

    (Bishop 6.4.2, 6.4.3)

    Image credit: Kirillm | Getty Images

    Slide credits: Patrick Forré and Rianne van den Berg

  • Machine Learning 1: Regression with GPs

    ‣ We have observed $\{(x_i, f_i)\}_{i=1}^N$, where we assume $f_i = f(x_i) = y(x_i) + \epsilon$, with noise $\epsilon \sim \mathcal{N}(0, \beta^{-1})$

    ‣ Assume we have a GP for $y(x)$, so for any inputs $x_1, \dots, x_N$

      $\mathbf{y} = \begin{pmatrix} y(x_1) \\ \vdots \\ y(x_N) \end{pmatrix} \sim \mathcal{N}\left( \mathbf{0},\, \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{pmatrix} \right)$

    ‣ Then $\mathbf{f} = \mathbf{y} + \boldsymbol{\epsilon}$ is also a GP, since $y(\cdot)$ is a GP and the sum of two independent Gaussian random variables is again Gaussian distributed:

      $\mathbf{f} \sim \mathcal{N}\left( \mathbf{0},\, K(X, X) + \beta^{-1} I \right)$

      (a NumPy sketch of this construction follows after this slide)

    (Board annotation: non-parametric vs. parametric regression; remaining handwriting illegible)
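
    A minimal NumPy sketch of sampling from this noisy GP prior. The exponential-quadratic kernel, its parameter values, the noise precision `beta`, and the input grid are all illustrative assumptions, not values from the slides:

    ```python
    import numpy as np

    def kernel(X1, X2, theta=1.0, ell=1.0):
        """Exponential-quadratic kernel k(x, x') = theta * exp(-||x - x'||^2 / (2 ell^2)).
        The kernel form and its parameter values are illustrative assumptions."""
        sq_dists = (np.sum(X1**2, axis=1)[:, None]
                    + np.sum(X2**2, axis=1)[None, :]
                    - 2.0 * X1 @ X2.T)
        return theta * np.exp(-0.5 * sq_dists / ell**2)

    rng = np.random.default_rng(0)
    beta = 25.0                          # noise precision; noise variance is 1/beta
    X = np.linspace(0, 1, 50)[:, None]   # training inputs (one-dimensional here)

    # f ~ N(0, K(X, X) + beta^{-1} I): the GP prior on y plus independent Gaussian noise.
    C = kernel(X, X) + np.eye(len(X)) / beta
    f = rng.multivariate_normal(np.zeros(len(X)), C)
    ```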

  • Machine Learning 1: Predictions with GPs

    ‣ The joint distribution of $\mathbf{f}_*$ (test points, at $X_*$) and $\mathbf{f}$ (train points), according to our GP, is given by

      $\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \mathbf{0},\, \begin{bmatrix} K(X, X) + \beta^{-1} I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) + \beta^{-1} I \end{bmatrix} \right)$

    ‣ Then, by the Gaussian conditioning property,

      $p(\mathbf{f}_* \,|\, X_*, X, \mathbf{f}) = \mathcal{N}(\mu_*, \Sigma_*)$

      with

      $\mu_* = K(X_*, X) \left( K(X, X) + \beta^{-1} I \right)^{-1} \mathbf{f}$

      $\Sigma_* = K(X_*, X_*) + \beta^{-1} I - K(X_*, X) \left( K(X, X) + \beta^{-1} I \right)^{-1} K(X, X_*)$

      (a conditioning sketch in NumPy follows after this slide)
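
    A sketch of these conditioning formulas, reusing the `kernel` function and the variables from the previous sketch. Solving against the Cholesky factor rather than forming the matrix inverse is a standard numerical choice assumed here, not something the slides prescribe:

    ```python
    def gp_predict(X_train, f_train, X_test, beta, **kern_args):
        """Gaussian conditioning for GP regression:
           mu*    = K(X*, X) (K(X, X) + beta^{-1} I)^{-1} f
           Sigma* = K(X*, X*) + beta^{-1} I
                    - K(X*, X) (K(X, X) + beta^{-1} I)^{-1} K(X, X*)"""
        C = kernel(X_train, X_train, **kern_args) + np.eye(len(X_train)) / beta
        K_s = kernel(X_test, X_train, **kern_args)                 # K(X*, X)
        K_ss = kernel(X_test, X_test, **kern_args) + np.eye(len(X_test)) / beta

        # Solve against C via its Cholesky factor instead of forming C^{-1} explicitly.
        L = np.linalg.cholesky(C)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))  # C^{-1} f
        V = np.linalg.solve(L, K_s.T)                              # L^{-1} K(X, X*)
        mu_star = K_s @ alpha
        Sigma_star = K_ss - V.T @ V
        return mu_star, Sigma_star
    ```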

  • Machine Learning 1: Predictions with GPs

    (Board annotation on active learning: regions with large error bars are where to gather more data; a variance-based query sketch follows below)
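
    One concrete version of this active-learning idea, assuming a hypothetical pool of unlabelled candidate inputs `X_pool` and the `gp_predict` helper above: query wherever the predictive variance is largest.

    ```python
    X_pool = np.linspace(0, 1, 200)[:, None]        # hypothetical unlabelled candidates
    mu_p, Sigma_p = gp_predict(X, f, X_pool, beta)

    # Large predictive variance = wide error bars = a region worth gathering more data in.
    next_x = X_pool[np.argmax(np.diag(Sigma_p))]
    ```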

  • Machine Learning 1: Drawing functions from GP posterior
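
    A sketch of drawing such posterior function samples on a dense grid, again reusing the helpers above. The small diagonal jitter is a common numerical safeguard assumed here, not part of the lecture's formulas:

    ```python
    X_grid = np.linspace(0, 1, 200)[:, None]
    mu_g, Sigma_g = gp_predict(X, f, X_grid, beta)

    # f* = mu* + L z with z ~ N(0, I) turns standard-normal draws into correlated
    # function values; L is the Cholesky factor of the (jittered) posterior covariance.
    L_post = np.linalg.cholesky(Sigma_g + 1e-8 * np.eye(len(X_grid)))
    samples = mu_g[:, None] + L_post @ rng.standard_normal((len(X_grid), 5))  # 5 draws
    ```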

  • Machine Learning 1: How to choose kernel parameters?

    ‣ The kernel parameters $\theta_0, \theta_1, \theta_2, \theta_3$ are hyperparameters

    ‣ Simplest approach: take the training observations $\mathbf{f}$, for which we know

      $p(\mathbf{f} \,|\, X, \theta) = \mathcal{N}(\mathbf{f} \,|\, \mathbf{0}, C) = \frac{1}{(2\pi)^{N/2} |C|^{1/2}} \exp\left( -\tfrac{1}{2} \mathbf{f}^T C^{-1} \mathbf{f} \right)$

      with $C = C(X, X) = K(X, X) + \beta^{-1} I$

    ‣ Make a maximum likelihood estimate

      $\max_\theta \, \ln p(\mathbf{f} \,|\, X, \theta) = \max_\theta \left[ -\tfrac{1}{2} \ln |C| - \tfrac{1}{2} \mathbf{f}^T C^{-1} \mathbf{f} - \tfrac{N}{2} \ln 2\pi \right]$

    ‣ Solve numerically for $\theta$ (an optimization sketch follows below)

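
    A sketch of this maximum-likelihood fit. The lecture's kernel has parameters $\theta_0, \dots, \theta_3$, but for the two-parameter kernel assumed in the earlier sketches we optimize `theta` and `ell`, in log space so they stay positive (a common reparameterization, assumed here):

    ```python
    from scipy.optimize import minimize

    def neg_log_marginal_likelihood(log_params, X_train, f_train, beta):
        """-ln p(f | X, theta) = 1/2 ln|C| + 1/2 f^T C^{-1} f + N/2 ln(2 pi),
        with C = K(X, X) + beta^{-1} I."""
        theta, ell = np.exp(log_params)
        C = kernel(X_train, X_train, theta=theta, ell=ell) + np.eye(len(X_train)) / beta
        L = np.linalg.cholesky(C)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))  # C^{-1} f
        log_det = 2.0 * np.sum(np.log(np.diag(L)))                 # ln|C| from Cholesky
        N = len(X_train)
        return 0.5 * log_det + 0.5 * f_train @ alpha + 0.5 * N * np.log(2.0 * np.pi)

    # Solve numerically for theta: maximize the log likelihood by minimizing its negative.
    res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0]),
                   args=(X, f, beta), method="L-BFGS-B")
    theta_hat, ell_hat = np.exp(res.x)
    ```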