APPLIED MACHINE LEARNING: Gaussian Mixture Regression

Source: lasa.epfl.ch/teaching/lectures/ML_Msc/Slides/Non...


Page 1: Gaussian Mixture Regression

Page 2: Brief summary of last week's lecture

Page 3: Locally Weighted Regression

The estimate is determined through the local influence of each group of datapoints and generates a smooth function y(x), evaluated at the query point x:

y(x) = \sum_{i=1}^{M} \beta_i(x)\, y^i \Big/ \sum_{j=1}^{M} \beta_j(x)

with weights \beta_i(x) = K(d(x, x^i)), a function of the query point x, e.g. a Gaussian kernel K(d(x, x^i)) = e^{-d(x, x^i)} with distance d(x, x^i) = (x - x^i)^T (x - x^i).

Page 4: Locally Weighted Regression

The estimate is determined through the local influence of each group of datapoints:

y(x) = \sum_{i=1}^{M} \beta_i(x)\, y^i \Big/ \sum_{j=1}^{M} \beta_j(x), \quad \beta_i(x) = K(d(x, x^i))

Model-free regression! There is no longer an explicit model of the form y = w^T x; the regression is computed at each query point and depends on the training points.
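The locally weighted estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the Gaussian kernel, the bandwidth `h`, and the sine dataset are all assumed choices (the slides leave the kernel and data open).

```python
import numpy as np

def lwr_predict(x_query, X, Y, h=0.5):
    # Weights beta_i(x) = K(d(x, x_i)); a Gaussian kernel with
    # bandwidth h is an assumed choice, the slides leave K open.
    w = np.exp(-((X - x_query) ** 2) / (2 * h ** 2))
    # Weighted average of the y_i: sum_i w_i y_i / sum_j w_j
    return np.sum(w * Y) / np.sum(w)

# Hypothetical data: noisy samples of a smooth function
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2 * np.pi, 50)
Y = np.sin(X) + 0.05 * rng.standard_normal(50)

y_hat = lwr_predict(np.pi / 2, X, Y)   # close to sin(pi/2) = 1
```

Note that no parameters are fitted beforehand: each query recomputes the weights over all M training points, which is exactly the "model-free" property discussed above.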

Page 5: Locally Weighted Regression

The estimate is determined through the local influence of each group of datapoints:

y(x) = \sum_{i=1}^{M} \beta_i(x)\, y^i \Big/ \sum_{j=1}^{M} \beta_j(x), \quad \beta_i(x) = K(d(x, x^i))

Which training points? Which kernel?

Page 6: Data-driven Regression

MACHINE LEARNING – 2012

Green: true function; blue: estimated function.

Good prediction depends on the choice of datapoints.

Page 7: Data-driven Regression

Good prediction depends on the choice of datapoints. The more datapoints, the better the fit, but computational costs increase dramatically with the number of datapoints.

Page 8: Data-driven Regression

There are several methods in ML for performing non-linear regression. They differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors).

Page 9: Data-driven Regression

Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors) to build y = f(x).

(For illustrative purposes, the negative Gauss functions are plotted next to the support vectors; they are actually distributed on the negative y axis.)

Page 10: Data-driven Regression

Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors) and builds the estimate

y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x^i) + b

The analytical solution is found by solving a convex optimization problem. The Lagrange multipliers \alpha_i, \alpha_i^* define the importance of each Gaussian function, and the estimate converges to b where the effect of the support vectors vanishes.
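The SVR prediction formula above can be sketched directly once the multipliers are known. A minimal sketch, assuming an RBF kernel and hypothetical support vectors and multiplier values (solving the convex problem itself is not shown):

```python
import numpy as np

def svr_predict(x, sv, coef, b, gamma=1.0):
    # y = sum_i (alpha_i - alpha_i^*) k(x, x_i) + b, with an RBF
    # kernel k(x, x') = exp(-gamma * (x - x')^2) (assumed choice).
    k = np.exp(-gamma * (x - sv) ** 2)
    return coef @ k + b

# Hypothetical support vectors and multiplier differences
sv = np.array([1.0, 2.0, 3.0])
coef = np.array([0.5, -0.3, 0.8])       # alpha_i - alpha_i^*
b = 0.25

near = svr_predict(2.0, sv, coef, b)    # all three SVs contribute
far = svr_predict(50.0, sv, coef, b)    # kernel terms vanish -> b
```

Evaluating far from the data shows the behaviour stated on the slide: the kernel terms decay to zero and the prediction converges to b.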

Page 11: Data-driven Regression

Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters.

Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors).

Gaussian Mixture Regression (GMR) generates a new set of datapoints (the centers of Gaussian functions).

Page 12: Gaussian Mixture Regression

Page 13: Gaussian Mixture Regression (GMR)

1) Estimate the joint density p(x, y) across pairs of datapoints using a GMM:

p(x, y) = \sum_{i=1}^{K} \pi_i\, p(x, y; \mu^i, \Sigma^i), \quad \text{with } p(x, y; \mu^i, \Sigma^i) = \mathcal{N}(\mu^i, \Sigma^i)

where \mu^i, \Sigma^i are the mean and covariance matrix of Gaussian i. (In the figure, the 2D projection of each Gauss function is an ellipse contour at roughly 2 standard deviations.)

Page 14: Gaussian Mixture Regression (GMR)

1) Estimate the joint density p(x, y) using a GMM:

p(x, y) = \sum_{i=1}^{K} \pi_i\, \mathcal{N}(x, y; \mu^i, \Sigma^i)

The parameters are learned through Expectation-Maximization (EM), an iterative procedure that starts from a random initialization.
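Once the parameters are learned, evaluating the mixture density of step 1 is straightforward. A minimal NumPy sketch (the two-component parameters below are hypothetical, and EM itself is not implemented here):

```python
import numpy as np

def gauss_pdf(z, mu, Sigma):
    # Multivariate normal density N(z; mu, Sigma)
    d = len(mu)
    diff = z - mu
    expo = -0.5 * diff @ np.linalg.inv(Sigma) @ diff
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(expo) / norm

def gmm_density(z, priors, mus, Sigmas):
    # p(x, y) = sum_i pi_i N([x, y]; mu^i, Sigma^i)
    return sum(p * gauss_pdf(z, m, S)
               for p, m, S in zip(priors, mus, Sigmas))

# Hypothetical 2-component mixture over (x, y)
priors = [0.6, 0.4]                      # mixing coefficients, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
Sigmas = [np.eye(2), np.eye(2)]

p_center = gmm_density(np.array([0.0, 0.0]), priors, mus, Sigmas)
p_far = gmm_density(np.array([10.0, 10.0]), priors, mus, Sigmas)
```

The density is high near the component means and decays away from them, which is what makes the model's likelihood usable later as a measure of how much to trust a prediction.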

Page 15: Gaussian Mixture Regression (GMR)

1) Estimate the joint density p(x, y) using a GMM:

p(x, y) = \sum_{i=1}^{K} \pi_i\, \mathcal{N}(x, y; \mu^i, \Sigma^i)

Mixing coefficients (the relative importance of each Gaussian i):

\sum_{i=1}^{K} \pi_i = 1, \quad \pi_i = p(i) = \frac{1}{M} \sum_{j=1}^{M} p(i \mid x^j)

Page 16: Gaussian Mixture Regression (GMR)

1) Estimate the joint density p(x, y) using a GMM.
2) Compute the regressive signal by taking p(y|x):

p(y \mid x) = \sum_{i=1}^{K} \beta_i(x)\, p(y \mid x; \mu^i, \Sigma^i), \quad \text{with } \beta_i(x) = \frac{\pi_i\, p(x; \mu_x^i, \Sigma_{xx}^i)}{\sum_{j=1}^{K} \pi_j\, p(x; \mu_x^j, \Sigma_{xx}^j)}

Each conditional p(y \mid x; \mu^i, \Sigma^i) is a Gauss function whose variance changes depending on the query point.

Page 17: Gaussian Mixture Regression (GMR)

2) Compute the regressive signal by taking p(y|x):

p(y \mid x) = \sum_{i=1}^{K} \beta_i(x)\, p(y \mid x; \mu^i, \Sigma^i)

The influence of each marginal, \beta_1(x) and \beta_2(x), is modulated by the query point x.

Page 18: Gaussian Mixture Regression (GMR)

3) The regressive signal is then obtained by computing E{p(y|x)}, a linear combination of K local regressive models:

\hat{y} = E\{p(y \mid x)\} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{y}^i(x), \quad \tilde{y}^i(x) = \mu_y^i + \Sigma_{yx}^i (\Sigma_{xx}^i)^{-1} (x - \mu_x^i)

The covariance matrix of each Gauss function can be decomposed into blocks of matrices:

\Sigma^i = \begin{pmatrix} \Sigma_{xx}^i & \Sigma_{xy}^i \\ \Sigma_{yx}^i & \Sigma_{yy}^i \end{pmatrix}

with \Sigma_{xx}^i and \Sigma_{yy}^i the covariance matrices on x and y, and \Sigma_{xy}^i the cross-covariance matrix.
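Steps 2) and 3) can be sketched in NumPy for scalar x and y, using the block partition above. The single-component parameters at the end are hypothetical; with one Gaussian the formula reduces to the familiar conditional mean of a bivariate normal:

```python
import numpy as np

def gmr_mean(x, priors, mus, Sigmas):
    # E{p(y|x)} for scalar x and y. Each mu^i = (mu_x, mu_y) and each
    # Sigma^i is partitioned into blocks [[Sxx, Sxy], [Syx, Syy]].
    def gauss1d(q, mu, var):
        return np.exp(-0.5 * (q - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    # beta_i(x): responsibility of Gaussian i at the query point
    num = np.array([pi * gauss1d(x, mu[0], S[0, 0])
                    for pi, mu, S in zip(priors, mus, Sigmas)])
    beta = num / num.sum()
    # Local linear models: y_i(x) = mu_y + Syx (Sxx)^-1 (x - mu_x)
    local = np.array([mu[1] + S[1, 0] / S[0, 0] * (x - mu[0])
                      for mu, S in zip(mus, Sigmas)])
    return beta @ local

# Single-component sanity check (hypothetical parameters)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
y_hat = gmr_mean(1.0, [1.0], [mu], [Sigma])   # 0 + 0.8 * 1 = 0.8
```

With several components, beta weights each local linear model according to how well its marginal explains the query point, which is exactly the weighted combination written above.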

Page 19: Gaussian Mixture Regression (GMR)

Computing the variance var{p(y|x)} provides information on the uncertainty of the prediction computed from the conditional distribution. Careful: this is not the uncertainty of the model. Use the likelihood to compute the uncertainty of the model!

E\{p(y \mid x)\} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{y}^i(x)

\mathrm{var}\{p(y \mid x)\} = \sum_{i=1}^{K} \beta_i(x)^2\, \tilde{\Sigma}^i, \quad \text{with } \tilde{\Sigma}^i = \Sigma_{yy}^i - \Sigma_{yx}^i (\Sigma_{xx}^i)^{-1} \Sigma_{xy}^i

The variance of the model is a weighted combination of the variances of the local models around the weighted mean.
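The variance formula can be sketched in the same scalar setting as the conditional mean. The single-component parameters are again hypothetical; with one Gaussian, beta = 1 and the result is the usual conditional variance Syy - Syx^2 / Sxx:

```python
import numpy as np

def gmr_variance(x, priors, mus, Sigmas):
    # var{p(y|x)} = sum_i beta_i(x)^2 * (Syy - Syx (Sxx)^-1 Sxy),
    # written here for scalar x and y.
    def gauss1d(q, mu, var):
        return np.exp(-0.5 * (q - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    num = np.array([pi * gauss1d(x, mu[0], S[0, 0])
                    for pi, mu, S in zip(priors, mus, Sigmas)])
    beta = num / num.sum()
    # Conditional variance of each local Gaussian model
    cond_var = np.array([S[1, 1] - S[1, 0] * S[0, 1] / S[0, 0]
                         for S in Sigmas])
    return np.sum(beta ** 2 * cond_var)

# One component: beta = 1, so var = 1 - 0.8^2 / 1 = 0.36
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
v = gmr_variance(0.5, [1.0], [np.array([0.0, 0.0])], [Sigma])
```

As the slide warns, this number quantifies the spread of p(y|x) under the fitted model, not how much the model itself should be trusted at that query point.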

Page 20: Gaussian Mixture Regression (GMR)

Computing the variance var{p(y|x)} provides information on the uncertainty of the prediction computed from the conditional distribution. In the figure, the curve shows E{p(y|x)} surrounded by a var{p(y|x)} band, and the color shading gives the likelihood of the model (its uncertainty).

Page 21:

Observe the modulation of the variance var{p(y|x)}: from a small variance in the first Gauss function to a large variance in the second Gauss function.

Page 22: GMR: Sensitivity to Choice of K and Initialization

Page 23: GMR: Sensitivity to Choice of K and Initialization

Fit with 4 Gaussians, uniform initialization.

Page 24: GMR: Sensitivity to Choice of K and Initialization

Fit with 4 Gaussians, random initialization.

Page 25: GMR: Sensitivity to Choice of K and Initialization

Fit with 10 Gaussians, random initialization.

Page 26: Gaussian Mixture Regression: Summary

Such a generative model provides more information than models that directly compute p(y|x):

• it allows learning to predict a multi-dimensional output y;
• it allows querying x given y, i.e. computing p(x|y).

Parametrize the density p(x, y) and then estimate solely the parameters. The density is constructed from a mixture of K Gaussians:

p(x, y) = \sum_{i=1}^{K} \pi_i\, p(x, y; \mu^i, \Sigma^i), \quad p(x, y; \mu^i, \Sigma^i) = \mathcal{N}(\mu^i, \Sigma^i)

with \mu^i, \Sigma^i the mean and covariance matrix of Gaussian i.

Page 27: Comparison Across Methods

Generalization – prediction away from datapoints.

SVR predicts y = b away from the datapoints:

y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x^i) + b

Page 28: Comparison Across Methods

Generalization – prediction away from datapoints.

GMR predicts the trend away from the data:

\hat{y} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{y}^i(x), \quad \text{with } \beta_i(x) = \frac{\pi_i\, p(x; \mu_x^i, \Sigma_{xx}^i)}{\sum_{j=1}^{K} \pi_j\, p(x; \mu_x^j, \Sigma_{xx}^j)}

Page 29: Comparison Across Methods

Generalization – prediction away from datapoints.

GMR predicts the trend away from the data, \hat{y} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{y}^i(x), but the prediction depends on the model choice and initialization (which influence the solution found during the GMM training phase).

Page 30: Comparison Across Methods

Generalization – prediction away from datapoints.

The prediction away from the datapoints is affected by all regressive models and may become meaningless! Use the likelihood of the model to determine whether or not it is safe to use the prediction.

Page 31:

The variance of p(y|x) in GMR represents the modelled uncertainty of the value of y; it is not a measure of the uncertainty of the model. The variance in SVR represents the epsilon-tube, i.e. the uncertainty around the predicted value of y; it does not represent the uncertainty of the model either.

Page 32: SVR, GMR: Similarities

• SVR and GMR are based on the same regressive model:

SVR solution: y = f(x)
GMR solution: \hat{y} = E\{p(y \mid x)\}

Assuming white noise \epsilon \sim \mathcal{N}(0, \sigma), i.e. y = f(x) + \epsilon, we have E\{p(y \mid x)\} = f(x), so the two estimates coincide.

Page 33: SVR, GMR: Similarities

• SVR and GMR are based on the same regressive model and compute a weighted combination of local predictors:

SVR solution: y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x^i) + b
GMR solution: \hat{y} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{y}^i(x)

Both separate the input space into regions modeled by Gaussian distributions (true only when using Gaussian/RBF kernels for SVR). The model is computed locally (locally weighted regression)!

Page 34: SVR, GMR: Differences

GMR allows predicting a multi-dimensional output, while SVR can predict only a uni-dimensional output y.

SVR solution: y = f(x); y is uni-dimensional, while x can be multi-dimensional.
GMR solution: \hat{y} = E\{p(y \mid x)\}; GMR starts by computing p(x, y), from which it can also compute p(x \mid y), and x and y can have arbitrary dimensions.

Page 35: SVR, GMR: Differences

SVR, GMR, and GPR are based on the same regressive model, but they do not optimize the same objective function and hence find different solutions.

• SVR:
  • minimizes the reconstruction error through convex optimization, and is thus ensured to find the optimal estimate (though not a unique solution);
  • usually finds a number of models <= the number of datapoints (the support vectors).

• GMR:
  • learns p(x, y) through maximum likelihood, and hence finds a local optimum;
  • computes a generative model p(x, y) from which it derives p(y|x);
  • starts with a low number of models << the number of datapoints.

Page 36: Hyperparameters of SVR, GMR

SVR and GMR depend on hyperparameters that need to be determined beforehand. These are:

• SVR:
  • choice of the error margin \epsilon and the penalty factor C;
  • choice of the kernel and the associated kernel parameters.

• GMR:
  • choice of the number of Gaussians K;
  • choice of the initialization (affects convergence to a local optimum).

The hyperparameters can be optimized separately; e.g. the number of Gaussians in GMR can be estimated using BIC, and the kernel parameters of SVR can be optimized through grid search.
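The BIC criterion mentioned above trades data fit against model size. A minimal sketch of how K could be compared; the log-likelihood values are hypothetical placeholders (in practice they would come from an EM fit for each K), while the parameter count follows from a full-covariance GMM:

```python
import numpy as np

def gmm_n_params(K, N):
    # Free parameters of a K-component full-covariance GMM in N dims:
    # (K - 1) priors + K*N means + K*N*(N+1)/2 covariance entries.
    return (K - 1) + K * N + K * N * (N + 1) // 2

def bic(log_lik, K, N, M):
    # BIC = -2 log L + p log M; lower is better.
    return -2.0 * log_lik + gmm_n_params(K, N) * np.log(M)

# Hypothetical log-likelihoods for K = 2..4 on M = 200 points in 2-D:
# a larger K always fits at least as well, but BIC penalizes it.
scores = {K: bic(ll, K, N=2, M=200)
          for K, ll in [(2, -540.0), (3, -505.0), (4, -503.0)]}
best_K = min(scores, key=scores.get)
```

Here the jump from K = 3 to K = 4 barely improves the fit, so the extra parameters make the penalized score worse and K = 3 is selected.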

Page 37: Conclusion

There is no easy way to determine which regression technique best fits your problem.

         Training                              Testing
  SVR    Convex optimization (SMO solver);     Grows O(number of SVs);
         parameters grow O(M*N)                few SVs, a small fraction of the original data
  GMR    EM, an iterative technique that       Grows O(K)
         needs several runs;
         parameters grow O(K*N^2)

M: number of datapoints; N: dimension of the data; K: number of Gauss functions in the GMM model.