GP CVPR'12, Session 1: GP and Regression
Urtasun and Lawrence


Outline

1 The Gaussian Density
2 Covariance from Basis Functions
3 Basis Function Representations
4 Constructing Covariance
5 GP Limitations
6 Conclusions


The Gaussian Density

Perhaps the most common probability density:

$$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right) = \mathcal{N}\left(y \mid \mu, \sigma^2\right),$$

the Gaussian density.

Gaussian Density

Figure: The Gaussian PDF $p(h \mid \mu, \sigma^2)$ plotted against height $h$/m, with $\mu = 1.7$ and variance $\sigma^2 = 0.0225$. Mean shown as red line. It could represent the heights of a population of students.

Gaussian Density

$$\mathcal{N}\left(y \mid \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right)$$
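As a small added illustration (not part of the original slides), a NumPy sketch of this density; the numerical integral checks the normalization:

```python
import numpy as np

def gaussian_density(y, mu, sigma2):
    """Evaluate N(y | mu, sigma2), the univariate Gaussian density."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# The height example from the slides: mu = 1.7, sigma^2 = 0.0225.
h = np.linspace(1.0, 2.4, 2001)
p = gaussian_density(h, mu=1.7, sigma2=0.0225)
print(np.trapz(p, h))  # ~1.0: the density integrates to one
```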

Two Important Gaussian Properties

1 Sum of (independent) Gaussian variables is also Gaussian:
$$y_i \sim \mathcal{N}\left(\mu_i, \sigma_i^2\right) \quad \Rightarrow \quad \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2\right)$$
(Aside: as the number of terms grows, the sum of non-Gaussian, finite-variance variables also tends to a Gaussian [central limit theorem].)

2 Scaling a Gaussian leads to a Gaussian:
$$y \sim \mathcal{N}\left(\mu, \sigma^2\right) \quad \Rightarrow \quad wy \sim \mathcal{N}\left(w\mu, w^2\sigma^2\right)$$
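A quick Monte Carlo check of both properties (an illustration added here, not in the slides); the empirical mean and variance of the sum and of the scaled variable should match the stated Gaussians:

```python
import numpy as np

rng = np.random.default_rng(0)
mus, sigma2s = np.array([1.0, -2.0, 0.5]), np.array([0.3, 1.2, 2.0])

# Property 1: sum of independent Gaussians.
samples = rng.normal(mus, np.sqrt(sigma2s), size=(100000, 3))
total = samples.sum(axis=1)
print(total.mean(), mus.sum())      # both ~ -0.5
print(total.var(), sigma2s.sum())   # both ~ 3.5

# Property 2: scaling by w.
w = 3.0
y = rng.normal(1.0, np.sqrt(2.0), size=100000)
print((w * y).mean(), w * 1.0)      # both ~ 3.0
print((w * y).var(), w**2 * 2.0)    # both ~ 18.0
```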


Two Simultaneous Equations

A system of two simultaneous equations with two unknowns:

$$y_1 = mx_1 + c$$
$$y_2 = mx_2 + c$$

Subtracting,
$$y_1 - y_2 = m(x_1 - x_2),$$
so
$$\frac{y_1 - y_2}{x_1 - x_2} = m,$$
giving
$$m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = y_1 - mx_1.$$

Figure: the points $(x_1, y_1)$ and $(x_2, y_2)$ on the line $y = mx + c$, showing the intercept $c$ and the slope $m = \frac{y_2 - y_1}{x_2 - x_1}$.

Two Simultaneous Equations

How do we deal with three simultaneous equations with only two unknowns?

$$y_1 = mx_1 + c$$
$$y_2 = mx_2 + c$$
$$y_3 = mx_3 + c$$

Overdetermined System

With two unknowns and two observations:
$$y_1 = mx_1 + c, \qquad y_2 = mx_2 + c$$

An additional observation leads to an overdetermined system:
$$y_3 = mx_3 + c$$

This problem is solved through a noise model $\epsilon \sim \mathcal{N}\left(0, \sigma^2\right)$:
$$y_1 = mx_1 + c + \epsilon_1$$
$$y_2 = mx_2 + c + \epsilon_2$$
$$y_3 = mx_3 + c + \epsilon_3$$


Noise Models

We aren't modeling the entire system.
The noise model gives the mismatch between model and data.
The Gaussian model is justified by appeal to the central limit theorem.
Other models are also possible (Student-$t$ for heavy tails).
Maximum likelihood with Gaussian noise leads to least squares.
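To make that last link concrete, a short sketch (with made-up data values) fitting $m$ and $c$ by least squares, which is the maximum likelihood solution under Gaussian noise:

```python
import numpy as np

# Hypothetical noisy observations of y = m*x + c.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.9, 5.1, 6.9])

# Design matrix with a column of ones for the intercept c.
X = np.column_stack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(m, c)  # least squares = maximum likelihood under Gaussian noise
```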


Underdetermined System

What about two unknowns and one observation?
$$y_1 = mx_1 + c$$

Underdetermined System

Can compute $m$ given $c$:
$$m = \frac{y_1 - c}{x_1}$$

For example (with $x_1 = 1$, $y_1 = 3$):
$c = 1.75 \Rightarrow m = 1.25$; $c = -0.777 \Rightarrow m = 3.78$; $c = -4.01 \Rightarrow m = 7.01$; $c = 2.45 \Rightarrow m = 0.545$; $c = -0.657 \Rightarrow m = 3.66$; $c = -3.13 \Rightarrow m = 6.13$.

Assume
$$c \sim \mathcal{N}(0, 4);$$
then we find a distribution of solutions.
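A sketch of that construction (illustrative, using $x_1 = 1$ and $y_1 = 3$ as in the worked numbers above): sample $c$ from its prior and propagate each sample through $m = (y_1 - c)/x_1$ to get a distribution over slopes:

```python
import numpy as np

rng = np.random.default_rng(1)
x1, y1 = 1.0, 3.0                       # the single observation
c = rng.normal(0.0, 2.0, size=10000)    # c ~ N(0, 4): std dev = 2
m = (y1 - c) / x1                       # each prior sample of c implies a slope m
print(m.mean(), m.var())                # ~3.0 and ~4.0: here m ~ N(3, 4)
```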

Probability for Under- and Overdetermined

To deal with the overdetermined system we introduced a probability distribution for the variable, $\epsilon_i$.
For the underdetermined system we introduced a probability distribution for the parameter, $c$.
This is known as a Bayesian treatment.

For general Bayesian inference we need multivariate priors. E.g. for multivariate linear regression:
$$y_i = \sum_j w_j x_{i,j} + \epsilon_i = \mathbf{w}^\top \mathbf{x}_{i,:} + \epsilon_i$$
(where we've dropped $c$ for convenience), we need a prior over $\mathbf{w}$.
This motivates a multivariate Gaussian density.
We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).

Multivariate Regression Likelihood

Recall the multivariate regression likelihood:
$$p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - \mathbf{w}^\top \mathbf{x}_{i,:}\right)^2\right)$$

Now use a multivariate Gaussian prior:
$$p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{p/2}} \exp\left(-\frac{1}{2\alpha} \mathbf{w}^\top \mathbf{w}\right)$$

Posterior Density

Once again we want to know the posterior:
$$p(\mathbf{w} \mid \mathbf{y}, \mathbf{X}) \propto p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})\, p(\mathbf{w})$$

And we can compute it by completing the square:
$$\log p(\mathbf{w} \mid \mathbf{y}, \mathbf{X}) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} y_i^2 + \frac{1}{\sigma^2} \sum_{i=1}^{n} y_i \mathbf{x}_{i,:}^\top \mathbf{w} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \mathbf{w}^\top \mathbf{x}_{i,:} \mathbf{x}_{i,:}^\top \mathbf{w} - \frac{1}{2\alpha} \mathbf{w}^\top \mathbf{w} + \text{const.}$$

$$p(\mathbf{w} \mid \mathbf{y}, \mathbf{X}) = \mathcal{N}(\mathbf{w} \mid \bar{\mathbf{w}}, \mathbf{C}_w), \qquad \mathbf{C}_w = \left(\sigma^{-2} \mathbf{X}^\top \mathbf{X} + \alpha^{-1} \mathbf{I}\right)^{-1} \text{ and } \bar{\mathbf{w}} = \mathbf{C}_w \sigma^{-2} \mathbf{X}^\top \mathbf{y}$$

Bayesian vs Maximum Likelihood

Note the similarity between the posterior mean
$$\bar{\mathbf{w}} = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1} \sigma^{-2} \mathbf{X}^\top \mathbf{y}$$
and the maximum likelihood solution
$$\hat{\mathbf{w}} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}$$
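A compact sketch (with illustrative synthetic data) computing both estimates; as $\alpha \to \infty$ the prior term vanishes and the posterior mean approaches the maximum likelihood solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2, alpha = 50, 3, 0.1, 1.0
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Posterior: C_w = (X^T X / sigma^2 + I / alpha)^{-1}, w_bar = C_w X^T y / sigma^2
C_w = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
w_bar = C_w @ X.T @ y / sigma2

# Maximum likelihood: w_ml = (X^T X)^{-1} X^T y
w_ml = np.linalg.solve(X.T @ X, X.T @ y)
print(w_bar, w_ml)  # close; the prior shrinks w_bar slightly toward zero
```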

Marginal Likelihood is Computed as Normalizer

$$p(\mathbf{w} \mid \mathbf{y}, \mathbf{X})\, p(\mathbf{y} \mid \mathbf{X}) = p(\mathbf{y} \mid \mathbf{w}, \mathbf{X})\, p(\mathbf{w})$$

Two Dimensional Gaussian

Consider height, $h$/m, and weight, $w$/kg.
Could sample height from a distribution:
$$p(h) = \mathcal{N}(1.7,\, 0.0225)$$
And similarly weight:
$$p(w) = \mathcal{N}(75,\, 36)$$

Height and Weight Models

Figure: Marginal Gaussian distributions $p(h)$ and $p(w)$ for height ($h$/m) and weight ($w$/kg).

Sampling Two Dimensional Variables

Figure: Joint distribution over $w$/kg and $h$/m, alongside the marginal distributions $p(h)$ and $p(w)$. Sample height and weight one after the other and plot against each other.

Figure: Many samples from the joint distribution over $w$/kg and $h$/m, plotted together with the marginal distributions $p(h)$ and $p(w)$.

Independent Gaussians

$$p(w, h) = p(w)\, p(h)$$

$$p(w, h) = \frac{1}{\sqrt{2\pi\sigma_1^2}\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\frac{(w - \mu_1)^2}{\sigma_1^2} + \frac{(h - \mu_2)^2}{\sigma_2^2}\right)\right)$$

$$p(w, h) = \frac{1}{2\pi\sqrt{\sigma_1^2 \sigma_2^2}} \exp\left(-\frac{1}{2} \left(\begin{bmatrix} w \\ h \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}\right)^{\!\top} \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} \left(\begin{bmatrix} w \\ h \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}\right)\right)$$

$$p(\mathbf{y}) = \frac{1}{2\pi |\mathbf{D}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right)$$

Correlated Gaussian

Form a correlated Gaussian from the original by rotating the data space using matrix $\mathbf{R}$:

$$p(\mathbf{y}) = \frac{1}{2\pi |\mathbf{D}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})\right)$$

$$p(\mathbf{y}) = \frac{1}{2\pi |\mathbf{D}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top (\mathbf{y} - \boldsymbol{\mu})\right)$$

This gives a covariance matrix:
$$\mathbf{C}^{-1} = \mathbf{R} \mathbf{D}^{-1} \mathbf{R}^\top$$

$$p(\mathbf{y}) = \frac{1}{2\pi |\mathbf{C}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{C}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right)$$

This gives a covariance matrix:
$$\mathbf{C} = \mathbf{R} \mathbf{D} \mathbf{R}^\top$$
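A minimal sketch of this construction (rotation angle and variances are illustrative): build $\mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top$ from a rotation $\mathbf{R}$ and diagonal $\mathbf{D}$, sample, and check the empirical covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.6                          # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([0.0225, 36.0])          # independent variances (height, weight)
C = R @ D @ R.T                      # covariance of the correlated Gaussian

mu = np.array([1.7, 75.0])
samples = rng.multivariate_normal(mu, C, size=100000)
print(np.cov(samples.T))             # ~C: empirical covariance matches R D R^T
```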

Recall Univariate Gaussian Properties

1 Sum of Gaussian variables is also Gaussian:
$$y_i \sim \mathcal{N}\left(\mu_i, \sigma_i^2\right) \quad \Rightarrow \quad \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2\right)$$

2 Scaling a Gaussian leads to a Gaussian:
$$y \sim \mathcal{N}\left(\mu, \sigma^2\right) \quad \Rightarrow \quad wy \sim \mathcal{N}\left(w\mu, w^2\sigma^2\right)$$

Multivariate Consequence

If
$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$
and
$$\mathbf{y} = \mathbf{W}\mathbf{x},$$
then
$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{W}\boldsymbol{\mu},\; \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top\right)$$

Sampling a Function

Multi-variate Gaussians

We will consider a Gaussian with a particular structure of covariance matrix.
Generate a single sample from this 25-dimensional Gaussian distribution, $\mathbf{f} = [f_1, f_2, \ldots, f_{25}]$.
We will plot these points against their index.

Figure: A sample from a 25 dimensional Gaussian distribution. (a) A 25 dimensional correlated random variable, values plotted against index $i$. (b) Colormap showing correlations between dimensions $i$ and $j$.
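A sketch reproducing this kind of sample (it assumes the exponentiated quadratic covariance introduced later in the session; the input spacing and length-scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)                      # 25 input locations (the index axis)
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.1**2))  # EQ covariance
K += 1e-10 * np.eye(25)                        # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(25), K)   # one 25-dimensional sample
print(np.round(f, 2))                          # plot these against their index
```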


Prediction of f2 from f1

$$\mathbf{K} = \begin{bmatrix} 1 & 0.96587 \\ 0.96587 & 1 \end{bmatrix}$$

Figure: A single contour of the Gaussian density represents the joint distribution, $p(f_1, f_2)$.
We observe that $f_1 = 0.313$. Conditional density: $p(f_2 \mid f_1 = 0.313)$.


Prediction with Correlated Gaussians

Prediction of $f_2$ from $f_1$ requires the conditional density. The conditional density is also Gaussian:
$$p(f_2 \mid f_1) = \mathcal{N}\left(f_2 \,\middle|\, \frac{k_{1,2}}{k_{1,1}} f_1,\; k_{2,2} - \frac{k_{1,2}^2}{k_{1,1}}\right)$$
where the covariance of the joint density is given by
$$\mathbf{K} = \begin{bmatrix} k_{1,1} & k_{1,2} \\ k_{2,1} & k_{2,2} \end{bmatrix}$$
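The formula in code (a sketch using the slides' numbers for $p(f_1, f_2)$):

```python
import numpy as np

k11, k12, k22 = 1.0, 0.96587, 1.0
f1 = 0.313                        # the observed value from the slides
mean = k12 / k11 * f1             # conditional mean of f2
var = k22 - k12**2 / k11          # conditional variance of f2
print(mean, var)                  # strong correlation => small conditional variance
```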

Prediction of f5 from f1

$$\mathbf{K} = \begin{bmatrix} 1 & 0.57375 \\ 0.57375 & 1 \end{bmatrix}$$

Figure: A single contour of the Gaussian density represents the joint distribution, $p(f_1, f_5)$.
We observe that $f_1 = 0.313$. Conditional density: $p(f_5 \mid f_1 = 0.313)$.

Prediction with Correlated Gaussians

Prediction of $\mathbf{f}_*$ from $\mathbf{f}$ requires the multivariate conditional density. The multivariate conditional density is also Gaussian:
$$p(\mathbf{f}_* \mid \mathbf{f}) = \mathcal{N}\left(\mathbf{f}_* \,\middle|\, \mathbf{K}_{*,\mathbf{f}} \mathbf{K}_{\mathbf{f},\mathbf{f}}^{-1} \mathbf{f},\; \mathbf{K}_{*,*} - \mathbf{K}_{*,\mathbf{f}} \mathbf{K}_{\mathbf{f},\mathbf{f}}^{-1} \mathbf{K}_{\mathbf{f},*}\right)$$
Here the covariance of the joint density is given by
$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{\mathbf{f},\mathbf{f}} & \mathbf{K}_{\mathbf{f},*} \\ \mathbf{K}_{*,\mathbf{f}} & \mathbf{K}_{*,*} \end{bmatrix}$$
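A sketch of the multivariate conditional (the observed values are hypothetical, and the exponentiated quadratic covariance from the next slide is assumed):

```python
import numpy as np

def eq_cov(a, b, alpha=1.0, ell=2.0):
    """Exponentiated quadratic covariance between 1-D input vectors a and b."""
    return alpha * np.exp(-(a[:, None] - b[None, :])**2 / (2 * ell**2))

x  = np.array([-3.0, 1.2, 1.4])     # training inputs (as in the worked example)
xs = np.array([0.0, 2.0])           # test inputs
f  = np.array([-0.2, 0.7, 0.75])    # hypothetical observed function values

Kff = eq_cov(x, x) + 1e-10 * np.eye(len(x))
Ksf, Kss = eq_cov(xs, x), eq_cov(xs, xs)
mean = Ksf @ np.linalg.solve(Kff, f)              # K_{*,f} K_{f,f}^{-1} f
cov  = Kss - Ksf @ np.linalg.solve(Kff, Ksf.T)    # posterior covariance
print(mean, np.diag(cov))
```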

Covariance Functions

Where did this covariance matrix come from?

Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian):
$$k(\mathbf{x}, \mathbf{x}') = \alpha \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|_2^2}{2\ell^2}\right)$$

The covariance matrix is built using the inputs to the function, $\mathbf{x}$.
For the example above it was based on Euclidean distance.
The covariance function is also known as a kernel.

A worked example: $x_1 = -3.0$, $x_2 = 1.20$, and $x_3 = 1.40$ with $\ell = 2.00$ and $\alpha = 1.00$. Entry by entry, e.g.
$$k_{2,1} = 1.00 \times \exp\left(-\frac{(1.20 - (-3.0))^2}{2 \times 2.00^2}\right) = 0.110,$$
the full covariance matrix is
$$\mathbf{K} = \begin{bmatrix} 1.00 & 0.110 & 0.0889 \\ 0.110 & 1.00 & 0.995 \\ 0.0889 & 0.995 & 1.00 \end{bmatrix}$$

Adding a fourth input, $x_4 = 2.0$ (still with $\ell = 2.0$ and $\alpha = 1.0$), gives
$$\mathbf{K} = \begin{bmatrix} 1.0 & 0.11 & 0.089 & 0.044 \\ 0.11 & 1.0 & 1.0 & 0.92 \\ 0.089 & 1.0 & 1.0 & 0.96 \\ 0.044 & 0.92 & 0.96 & 1.0 \end{bmatrix}$$

With the same three inputs but $\ell = 5.00$ and $\alpha = 4.00$:
$$\mathbf{K} = \begin{bmatrix} 4.00 & 2.81 & 2.72 \\ 2.81 & 4.00 & 4.00 \\ 2.72 & 4.00 & 4.00 \end{bmatrix}$$
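A sketch reproducing these matrices numerically:

```python
import numpy as np

def eq_kernel(x, alpha, ell):
    """Exponentiated quadratic covariance matrix for 1-D inputs x."""
    sq_dist = (x[:, None] - x[None, :])**2
    return alpha * np.exp(-sq_dist / (2 * ell**2))

x = np.array([-3.0, 1.2, 1.4])
print(np.round(eq_kernel(x, alpha=1.0, ell=2.0), 3))  # [[1, 0.110, 0.089], ...]
print(np.round(eq_kernel(x, alpha=4.0, ell=5.0), 2))  # [[4, 2.81, 2.72], ...]
```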

Outline

1 The Gaussian Density
2 Covariance from Basis Functions
3 Basis Function Representations
4 Constructing Covariance
5 GP Limitations
6 Conclusions

Basis Function Form

Radial basis functions commonly have the form
$$\phi_k(x_i) = \exp\left(-\frac{|x_i - \mu_k|^2}{2\ell^2}\right).$$

The basis function maps data into a feature space in which a linear sum is a nonlinear function.

Figure: A set of radial basis functions $\phi(x)$ with width $\ell = 2$ and location parameters $\boldsymbol{\mu} = [-4\;\, 0\;\, 4]^\top$.

Random Functions

Functions derived using
$$f(x) = \sum_{k=1}^{m} w_k \phi_k(x),$$
where the weights are sampled from a Gaussian density,
$$w_k \sim \mathcal{N}(0, \alpha).$$

Figure: Functions sampled using the basis set from the previous figure. Each line is a separate sample, generated by a weighted sum of the basis set. The weights $w_k$ are sampled from a Gaussian density with variance $\alpha = 1$.
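A sketch of this sampling procedure, using the basis from the previous slide ($\boldsymbol{\mu} = [-4, 0, 4]$, $\ell = 2$, $\alpha = 1$):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, ell, alpha = np.array([-4.0, 0.0, 4.0]), 2.0, 1.0
x = np.linspace(-8, 8, 200)

Phi = np.exp(-(x[:, None] - mu[None, :])**2 / (2 * ell**2))  # n x m basis matrix
W = rng.normal(0.0, np.sqrt(alpha), size=(3, 5))  # five weight draws, w_k ~ N(0, alpha)
F = Phi @ W                            # each column is one sampled function f(x)
print(F.shape)                         # (200, 5): plot each column against x
```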

Direct Construction of Covariance Matrix

Use matrix notation to write the function,
$$f(x_i; \mathbf{w}) = \sum_{k=1}^{m} w_k \phi_k(x_i),$$
which computed at the training data gives a vector
$$\mathbf{f} = \boldsymbol{\Phi} \mathbf{w}.$$

$\mathbf{w}$ and $\mathbf{f}$ are only related by an inner product.
$\boldsymbol{\Phi}$ is fixed and non-stochastic for a given training set.
$\mathbf{f}$ is Gaussian distributed.
It is therefore straightforward to compute the distribution for $\mathbf{f}$.

Expectations

We use $\langle \cdot \rangle$ to denote expectations under prior distributions.
We have
$$\langle \mathbf{f} \rangle = \boldsymbol{\Phi} \langle \mathbf{w} \rangle.$$
The prior mean of $\mathbf{w}$ was zero, giving
$$\langle \mathbf{f} \rangle = \mathbf{0}.$$
The prior covariance of $\mathbf{f}$ is
$$\mathbf{K} = \langle \mathbf{f}\mathbf{f}^\top \rangle - \langle \mathbf{f} \rangle \langle \mathbf{f} \rangle^\top,$$
with
$$\langle \mathbf{f}\mathbf{f}^\top \rangle = \boldsymbol{\Phi} \langle \mathbf{w}\mathbf{w}^\top \rangle \boldsymbol{\Phi}^\top,$$
giving
$$\mathbf{K} = \alpha \boldsymbol{\Phi} \boldsymbol{\Phi}^\top.$$

Covariance between Two Points

The prior covariance between two points $x_i$ and $x_j$ is
$$k(x_i, x_j) = \alpha \sum_{k=1}^{m} \phi_k(x_i)\, \phi_k(x_j),$$
or in vector form,
$$k(x_i, x_j) = \alpha\, \boldsymbol{\phi}_:(x_i)^\top \boldsymbol{\phi}_:(x_j).$$
For the radial basis used this gives
$$k(x_i, x_j) = \alpha \sum_{k=1}^{m} \exp\left(-\frac{|x_i - \mu_k|^2 + |x_j - \mu_k|^2}{2\ell^2}\right).$$

Selecting Number and Location of Basis

Need to choose
1 location of centers
2 number of basis functions

Consider uniform spacing over a region:
$$k(x_i, x_j) = \alpha \sum_{k=1}^{m} \exp\left(-\frac{x_i^2 + x_j^2 - 2\mu_k(x_i + x_j) + 2\mu_k^2}{2\ell^2}\right).$$

Uniform Basis Functions

Set each center location to
$$\mu_k = a + \Delta\mu \cdot (k - 1).$$
Specifying the bases in terms of their indices,
$$k(x_i, x_j) = \alpha \sum_{k=1}^{m} \exp\left(-\frac{x_i^2 + x_j^2}{2\ell^2} + \frac{2(a + \Delta\mu \cdot k)(x_i + x_j) - 2(a + \Delta\mu \cdot k)^2}{2\ell^2}\right).$$

Infinite Basis Functions

Take \mu_1 = a and \mu_m = b, so b = a + \Delta\mu \, (m - 1).

Take the limit as \Delta\mu \to 0, so m \to \infty:

k(x_i, x_j) = \alpha' \int_a^b \exp\left( -\frac{x_i^2 + x_j^2}{2\ell^2} + \frac{\frac{1}{2}(x_i + x_j)^2 - 2\left(\mu - \frac{1}{2}(x_i + x_j)\right)^2}{2\ell^2} \right) \mathrm{d}\mu,

where we have used \mu_k \to \mu and scaled the prior variance as \alpha = \alpha' \Delta\mu so that the sum becomes an integral.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 46 / 74

Result

Performing the integration leads to

k(x_i, x_j) = \alpha' \frac{\sqrt{\pi \ell^2}}{2} \exp\left( -\frac{(x_i - x_j)^2}{4\ell^2} \right) \left[ \mathrm{erf}\left( \frac{b - \frac{1}{2}(x_i + x_j)}{\ell} \right) - \mathrm{erf}\left( \frac{a - \frac{1}{2}(x_i + x_j)}{\ell} \right) \right].

Now take the limit as a \to -\infty and b \to \infty:

k(x_i, x_j) = \alpha \exp\left( -\frac{(x_i - x_j)^2}{4\ell^2} \right),

where \alpha = \alpha' \sqrt{\pi \ell^2}.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 47 / 74
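The limit can be checked numerically; in this illustrative sketch the finite sum with densely packed uniform centres and \alpha = \alpha' \Delta\mu is compared against the limiting exponentiated quadratic:

import numpy as np

ell, alpha_prime = 1.0, 1.0
a, b, m = -20.0, 20.0, 4001          # wide region, densely packed centres (assumed)
mu = np.linspace(a, b, m)
dmu = mu[1] - mu[0]
alpha = alpha_prime * dmu            # scale the prior variance with the spacing

def k_sum(xi, xj):
    # Finite-m covariance: alpha * sum_k phi_k(xi) phi_k(xj).
    return alpha * np.sum(np.exp(-((xi - mu) ** 2 + (xj - mu) ** 2) / (2 * ell ** 2)))

def k_eq(xi, xj):
    # Limiting exponentiated quadratic with alpha = alpha' * sqrt(pi * ell^2).
    return alpha_prime * np.sqrt(np.pi * ell ** 2) * np.exp(-((xi - xj) ** 2) / (4 * ell ** 2))

print(k_sum(0.3, -0.7), k_eq(0.3, -0.7))   # the two values agree closely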

Infinite Feature Space

An RBF model with infinite basis functions is a Gaussian process.

The covariance function is the exponentiated quadratic.

Note: the functional forms of the covariance function and the basis functions are similar.

this is a special case,

in general they are very different.

Similar results can be obtained for multi-dimensional input networks (Neal, 1996; Williams, 1998).

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 48 / 74

The Parametric Bottleneck

Parametric models have a representation that does not respond to increasing training set size.

Bayesian posterior distributions over parameters contain the information about the training data.

Use Bayes' rule from training data, p(\mathbf{w} | \mathbf{y}, \mathbf{X}).

Make predictions on test data,

p(y^* | \mathbf{x}^*, \mathbf{y}, \mathbf{X}) = \int p(y^* | \mathbf{w}, \mathbf{x}^*) \, p(\mathbf{w} | \mathbf{y}, \mathbf{X}) \, \mathrm{d}\mathbf{w}.

\mathbf{w} becomes a bottleneck for information about the training set to pass to the test set.

Solution: increase m so that the bottleneck is so large that it no longer presents a problem.

How big is big enough for m? Non-parametrics says m \to \infty.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 50 / 74
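A compact sketch of this pipeline for the Gaussian case (all names and values illustrative): the posterior over w is computed once from the training set, and every prediction must then pass through those m numbers.

import numpy as np

def rbf_basis(x, mu, ell=1.0):
    return np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(2)
mu = np.linspace(-2, 2, 10)                 # m = 10 basis functions (assumed)
alpha, noise = 1.0, 0.1
x = rng.uniform(-2, 2, 30)
y = np.sin(2 * x) + rng.normal(0, np.sqrt(noise), 30)

Phi = rbf_basis(x, mu)
# Gaussian posterior p(w | y, X): covariance and mean in closed form.
S = np.linalg.inv(Phi.T @ Phi / noise + np.eye(len(mu)) / alpha)
w_mean = S @ Phi.T @ y / noise

# Predictions use only (w_mean, S): the m-dimensional w is the bottleneck.
x_star = np.linspace(-2, 2, 5)
Phi_star = rbf_basis(x_star, mu)
f_mean = Phi_star @ w_mean
f_var = np.sum(Phi_star @ S * Phi_star, axis=1) + noise
print(f_mean, f_var)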

The Parametric Bottleneck

Now no longer possible to manipulate the model through the standard parametric form given in (1).

However, it is possible to express parametric models as GPs:

k(x_i, x_j) = \alpha \, \phi(x_i)^\top \phi(x_j).

These are known as degenerate covariance matrices.

Their rank is at most m; non-parametric models have full rank covariance matrices.

Most well known is the linear kernel, k(x_i, x_j) = x_i^\top x_j.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 51 / 74
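The rank claim is easy to verify numerically (illustrative sketch): a covariance built from m basis functions evaluated at n > m points has rank at most m.

import numpy as np

def rbf_basis(x, mu, ell=1.0):
    return np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * ell ** 2))

x = np.linspace(-2, 2, 50)            # n = 50 inputs
mu = np.array([-1.0, 0.0, 1.0])       # m = 3 basis functions
K = rbf_basis(x, mu) @ rbf_basis(x, mu).T   # 50 x 50, but degenerate
print(np.linalg.matrix_rank(K))       # prints 3: rank is at most m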

Making Predictions

For non-parametrics, prediction at new points \mathbf{f}^* is made by conditioning on \mathbf{f} in the joint distribution.

In GPs this involves combining the training data with the covariance function and the mean function.

Parametric is a special case when conditional prediction can be summarized in a fixed number of parameters.

Complexity of a parametric model remains fixed regardless of the size of our training data set.

For a non-parametric model the required number of parameters grows with the size of the training data.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 52 / 74
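A sketch of the conditioning step, using the standard Gaussian conditional identities (the covariance function and inputs are illustrative):

import numpy as np

def k_eq(a, b, alpha=1.0, ell=0.5):
    # Exponentiated quadratic covariance between two sets of inputs.
    return alpha * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 8)                  # training inputs
K = k_eq(x, x) + 1e-8 * np.eye(8)          # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(8), K)  # stand-in for observed f
x_star = np.array([0.25, 1.5])             # test inputs

K_star = k_eq(x_star, x)
# Conditional of a joint Gaussian: mean K_* K^{-1} f, cov K_** - K_* K^{-1} K_*^T.
mean = K_star @ np.linalg.solve(K, f)
cov = k_eq(x_star, x_star) - K_star @ np.linalg.solve(K, K_star.T)
print(mean, np.diag(cov))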

Covariance Functions

RBF Basis Functions

k(x, x') = \alpha \, \phi(x)^\top \phi(x')

\phi_i(x) = \exp\left( -\frac{(x - \mu_i)^2}{2\ell^2} \right)

\mu = (-1, 0, 1)^\top

[Figure: the resulting covariance matrix and sample functions drawn from the process.]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 53 / 74

Covariance Functions and Mercer Kernels

Mercer kernels and covariance functions are similar.

The kernel perspective does not make a probabilistic interpretation of the covariance function.

Algorithms can be simpler, but the probabilistic interpretation is crucial for kernel parameter optimization.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 54 / 74

Constructing Covariance Functions

Sum of two covariances is also a covariance function.

k(x, x') = k_1(x, x') + k_2(x, x')

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 56 / 74
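One way to check this numerically (illustrative sketch): the sum of two positive semi-definite covariance matrices remains positive semi-definite, matching the fact that sums of independent Gaussian processes are Gaussian processes with added covariances.

import numpy as np

def k_eq(a, b, alpha=1.0, ell=1.0):
    return alpha * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

def k_lin(a, b):
    return a[:, None] * b[None, :]

x = np.linspace(-1, 1, 20)
K = k_eq(x, x) + k_lin(x, x)              # sum of two covariance functions
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)            # still positive semi-definite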

Covariance Functions

MLP Covariance Function

k(x, x') = \alpha \, \mathrm{asin}\left( \frac{w \, x^\top x' + b}{\sqrt{\left(w \, x^\top x + b + 1\right)\left(w \, x'^\top x' + b + 1\right)}} \right)

Based on infinite neural network model.

w = 40, b = 4

[Figure: the covariance matrix and sample functions over inputs in [-1, 1].]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 59 / 74

Covariance Functions

Linear Covariance Function

k(x, x') = \alpha \, x^\top x'

Bayesian linear regression.

\alpha = 1

[Figure: the covariance matrix and sample functions over inputs in [-1, 1].]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 60 / 74
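The three covariances shown in this part can be sketched as follows (parameter values follow the slides; everything else, including function names, is an illustrative assumption), drawing a sample from each process:

import numpy as np

def k_eq(a, b, alpha=1.0, ell=1.0):
    return alpha * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

def k_mlp(a, b, alpha=1.0, w=40.0, bias=4.0):
    num = w * a[:, None] * b[None, :] + bias
    den = np.sqrt((w * a ** 2 + bias + 1)[:, None] * (w * b ** 2 + bias + 1)[None, :])
    return alpha * np.arcsin(num / den)

def k_lin(a, b, alpha=1.0):
    return alpha * a[:, None] * b[None, :]

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 100)
for k in (k_eq, k_mlp, k_lin):
    K = k(x, x) + 1e-8 * np.eye(len(x))   # jitter so the sampler is stable
    sample = rng.multivariate_normal(np.zeros(len(x)), K)
    print(k.__name__, sample[:3])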

Gaussian Process Interpolation

[Figure: GP interpolation, f(x) against x over [-2, 2]; successive frames condition on more observations.]

Figure: Real example: BACCO (see e.g. Oakley and O'Hagan, 2002). Interpolation through outputs from slow computer simulations (e.g. atmospheric carbon levels).

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 61 / 74

Noise Models

Graph of a GP

Relates input variables, X, to vector, y, through f given kernel parameters \theta.

Plate notation indicates independence of y_i | f_i.

Noise model, p(y_i | f_i), can take several forms.

Simplest is Gaussian noise.

Figure: The Gaussian process depicted graphically.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 62 / 74
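With Gaussian noise each observation is the latent function value plus independent noise, which can be folded into the covariance once f is marginalized; a minimal illustrative sketch:

import numpy as np

rng = np.random.default_rng(5)
n, sigma2 = 10, 0.1
f = rng.standard_normal(n)                 # stand-in for latent function values
y = f + rng.normal(0, np.sqrt(sigma2), n)  # y_i | f_i ~ N(f_i, sigma2), independently
# Equivalently, marginally y ~ N(0, K + sigma2 * I) when f ~ N(0, K).
print(y)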

Gaussian Process Regression

[Figure: GP regression with noise, y(x) against x over [-2, 2].]

Figure: Examples include WiFi localization, C14 calibration curve.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 64 / 74
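An end-to-end sketch of noisy GP regression (illustrative data and parameters; the Cholesky factorization handles the O(n^3) solve mentioned in the limitations):

import numpy as np

def k_eq(a, b, alpha=1.0, ell=0.5):
    return alpha * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(6)
sigma2 = 0.05
x = rng.uniform(-2, 2, 40)
y = np.sin(2.0 * x) + rng.normal(0, np.sqrt(sigma2), x.size)

# Marginally y ~ N(0, K + sigma2 I); factor once with Cholesky.
L = np.linalg.cholesky(k_eq(x, x) + sigma2 * np.eye(x.size))
alpha_vec = np.linalg.solve(L.T, np.linalg.solve(L, y))

x_star = np.linspace(-2, 2, 200)
K_star = k_eq(x_star, x)
mean = K_star @ alpha_vec                                  # predictive mean
v = np.linalg.solve(L, K_star.T)
var = np.diag(k_eq(x_star, x_star)) - np.sum(v * v, axis=0) + sigma2  # error bars
print(mean[:3], var[:3])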

Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

\mathcal{N}(\mathbf{y} | \mathbf{0}, \mathbf{K}) = \frac{1}{(2\pi)^{n/2} |\mathbf{K}|^{1/2}} \exp\left( -\frac{\mathbf{y}^\top \mathbf{K}^{-1} \mathbf{y}}{2} \right)

\log \mathcal{N}(\mathbf{y} | \mathbf{0}, \mathbf{K}) = -\frac{n}{2} \log 2\pi - \frac{1}{2} \log |\mathbf{K}| - \frac{\mathbf{y}^\top \mathbf{K}^{-1} \mathbf{y}}{2}

Dropping constants and negating gives the objective

E(\theta) = \frac{1}{2} \log |\mathbf{K}| + \frac{\mathbf{y}^\top \mathbf{K}^{-1} \mathbf{y}}{2}

The parameters are inside the covariance function (matrix):

k_{i,j} = k(x_i, x_j; \theta)

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 65 / 74
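E(\theta) is straightforward to implement and minimize. This sketch optimizes the length scale only, holding the noise fixed (scipy's bounded scalar minimizer; all values illustrative):

import numpy as np
from scipy.optimize import minimize_scalar

def k_eq(a, b, alpha=1.0, ell=1.0):
    return alpha * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 30)
y = np.sin(2 * x) + rng.normal(0, 0.1, 30)

def E(log_ell, sigma2=0.01):
    # Negative log marginal likelihood (up to constants) as a function of ell.
    K = k_eq(x, x, ell=np.exp(log_ell)) + sigma2 * np.eye(x.size)
    L = np.linalg.cholesky(K)
    logdet = 2 * np.sum(np.log(np.diag(L)))
    quad = y @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * logdet + 0.5 * quad

res = minimize_scalar(E, bounds=(np.log(0.01), np.log(10.0)), method="bounded")
print(np.exp(res.x))    # learned length scale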

Capacity control: log |K|

The determinant of the covariance is the product of its eigenvalues,

\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \quad |\Lambda| = \lambda_1 \lambda_2,

and is unchanged by rotation,

\mathbf{R}\Lambda = \begin{pmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{pmatrix}, \quad |\mathbf{R}\Lambda| = \lambda_1 \lambda_2.

The log |K| term therefore measures the volume of the covariance ellipse, penalizing models with too much capacity.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 67 / 74
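A short numerical check of the rotation invariance (illustrative):

import numpy as np

lam1, lam2, theta = 3.0, 0.5, 0.7
Lam = np.diag([lam1, lam2])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.linalg.det(Lam), np.linalg.det(R @ Lam))   # both lam1 * lam2 = 1.5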

Data Fit: \mathbf{y}^\top \mathbf{K}^{-1} \mathbf{y} / 2

[Figure: contours of the Gaussian over (y_1, y_2), axes from -6 to 6, as the length scale varies; the data-fit term measures how well the observations sit within the covariance contours.]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 68 / 74

Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

E(\theta) = \frac{1}{2} \log |\mathbf{K}| + \frac{\mathbf{y}^\top \mathbf{K}^{-1} \mathbf{y}}{2}

[Figure: left, the GP fit against x for different length scales; right, E(\theta) against the length scale on a log axis from 10^{-1} to 10^{1}, with a minimum at the best trade-off between the data-fit and capacity terms.]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 69 / 74

Gene Expression Example

Data from Della Gatta et al. (2008). Figure from Kalaitzis and Lawrence (2011).

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 70 / 74

    Outline

    1 The Gaussian Density

    2 Covariance from Basis Functions

    3 Basis Function Representations

    4 Constructing Covariance

    5 GP Limitations


    6 Conclusions

    Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 71 / 74

Limitations of Gaussian Processes

Inference is O(n^3) due to the matrix inverse (in practice use Cholesky).

Gaussian processes don't deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images).

Widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!).

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 72 / 74

Summary

Broad introduction to Gaussian processes.

Started with the Gaussian distribution.

Motivated Gaussian processes through the multivariate density.

Emphasized the role of the covariance (not the mean).

Performs nonlinear regression with error bars.

Parameters of the covariance function (kernel) are easily optimized with maximum likelihood.

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 73 / 74

References I

G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6):939–948, Jun 2008. [URL]. [DOI].

A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011. [DOI].

R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.

J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [Google Books].

C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203–1216, 1998.


    Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 74 / 74