Gaussian Process and Prediction

(C) 2001 SNU CSE Artificial Intelligence Lab (SCAI)



Outline

- Gaussian Process and Bayesian Regression
  - Bayesian regression: weight-space view, function-space view
  - Spline smoothing
  - Neural network
  - Classification problem
- Active Data Selection
  - Maximizing the expected information gain
  - Minimizing the regression error
  - Experimental result
- Mixtures of Gaussian Process


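Gaussian Process and Bayesian Regression (1)

A distribution of y in Bayesian regression:

$$p(y \mid D) = \int p(y \mid H)\, p(H \mid D)\, dH, \qquad H: \text{a hypothesis}, \quad D: \text{data}$$

Generalized linear regression, weight-space view:

$$y(x) = \sum_{i=1}^{m} w_i\, \phi_i(x) = W^T \phi(x) \quad \text{for a fixed set of } m \text{ basis functions } \{\phi_i(x)\}$$

$$D = \{(x_1, t_1), \dots, (x_N, t_N)\}, \qquad t_n = y(x_n) + e_n, \quad e_n \sim N(0, \sigma_\nu^2), \quad W \sim N(0, \Sigma_w)$$

$$p(W \mid D) = \frac{p(D \mid W)\, p(W)}{\int p(D \mid W)\, p(W)\, dW}$$

Choose $W_{MP}$ by minimizing the negative log posterior

$$E(W) = \frac{1}{2\sigma_\nu^2} \sum_{n=1}^{N} \left(t_n - W^T \phi(x_n)\right)^2 + \frac{1}{2} W^T \Sigma_w^{-1} W,$$

and predict $\hat{y}(x_*) = W_{MP}^T \phi(x_*)$.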
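A minimal NumPy sketch of the weight-space computation, assuming Gaussian-bump basis functions, $\sigma_\nu = 0.1$ and $\Sigma_w = I$ (illustrative choices, not from the slides). Setting the gradient of $E(W)$ to zero gives the linear system $(\sigma_\nu^{-2}\Phi^T\Phi + \Sigma_w^{-1})\, W_{MP} = \sigma_\nu^{-2}\Phi^T t$, which the hypothetical helper `w_mp` solves; `grad_E` checks that the gradient vanishes there.

```python
import numpy as np

def design_matrix(x, centers, width=0.3):
    """Phi[n, i] = phi_i(x_n) for Gaussian bumps at `centers` (an illustrative basis)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def w_mp(Phi, t, sigma_nu, Sigma_w):
    """Minimizer of E(W) = sum_n (t_n - W^T phi(x_n))^2 / (2 sigma_nu^2) + W^T Sigma_w^{-1} W / 2."""
    A = Phi.T @ Phi / sigma_nu**2 + np.linalg.inv(Sigma_w)  # Hessian of E(W)
    return np.linalg.solve(A, Phi.T @ t / sigma_nu**2)

def grad_E(W, Phi, t, sigma_nu, Sigma_w):
    """Gradient of E(W); it should vanish at W_MP."""
    return -Phi.T @ (t - Phi @ W) / sigma_nu**2 + np.linalg.solve(Sigma_w, W)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 30)
t = np.sin(3.0 * x) + 0.1 * rng.normal(size=30)        # synthetic data
centers = np.linspace(-1.0, 1.0, 8)
Phi = design_matrix(x, centers)
Sigma_w = np.eye(8)
W = w_mp(Phi, t, 0.1, Sigma_w)
y_star = design_matrix(np.array([0.25]), centers) @ W   # prediction W_MP^T phi(x_*)
```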


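Gaussian Process and Bayesian Regression (2): Function-space view

$Y(x) = W^T \phi(x)$ is a linear combination of the Gaussian random variables $W \sim N(0, \Sigma_w)$, so $\{Y_x\}$ is a Gaussian process with mean and covariance functions

$$E[Y_x] = 0, \qquad E[Y_x Y_{x'}] = \phi(x)^T \Sigma_w\, \phi(x').$$

$Y_* = Y(x_*)$ can be predicted from the conditional distribution $p(Y_* \mid t_1, \dots, t_n)$, where $t_n = y(x_n) + e_n$:

$$E[Y_*] = \phi(x_*)^T \Sigma_w \Phi^T P^{-1} t,$$

$$Var[Y_*] = \phi(x_*)^T \Sigma_w\, \phi(x_*) - \phi(x_*)^T \Sigma_w \Phi^T P^{-1} \Phi\, \Sigma_w\, \phi(x_*),$$

where $\Phi = (\phi(x_1), \dots, \phi(x_n))^T$ is the $n \times m$ design matrix and $P = \Phi\, \Sigma_w \Phi^T + \sigma_\nu^2 I$.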
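The function-space formulas can be checked on a tiny case. The sketch below, with a hypothetical helper `fs_predict`, forms $P$ and the cross-covariance $\Phi\,\Sigma_w\,\phi(x_*)$ directly. With one basis function $\phi \equiv 1$, two observations $t = (1, 1)$, $\Sigma_w = 1$ and $\sigma_\nu = 1$, it gives $E[Y_*] = 2/3$ and $Var[Y_*] = 1/3$, which matches the posterior of a single weight with precision $1 + 2 = 3$.

```python
import numpy as np

def fs_predict(Phi, t, phi_star, Sigma_w, sigma_nu):
    """Predictive mean and variance of Y_* for the GP with covariance phi(x)^T Sigma_w phi(x')."""
    n = len(t)
    P = Phi @ Sigma_w @ Phi.T + sigma_nu**2 * np.eye(n)  # n x n covariance of the targets
    k_star = Phi @ Sigma_w @ phi_star                     # Cov(t_i, Y_*) for each i
    mean = k_star @ np.linalg.solve(P, t)
    var = phi_star @ Sigma_w @ phi_star - k_star @ np.linalg.solve(P, k_star)
    return mean, var

Phi = np.array([[1.0], [1.0]])    # n = 2 observations, m = 1 basis function
t = np.array([1.0, 1.0])
mean, var = fs_predict(Phi, t, np.array([1.0]), np.eye(1), 1.0)
```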


Gaussian Process and Bayesian Regression (3)

The weight-space view and the function-space view give the same results. For a small number of basis functions the weight-space view is preferred, while for a large number of basis functions the function-space (Gaussian process) view is better: the weight-space computation inverts an $m \times m$ matrix, whereas the function-space computation inverts the $n \times n$ matrix $P$.
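The equivalence can be checked numerically. The sketch below (random design matrix and parameters, chosen only for illustration) computes the predictive mean and variance once through the $m \times m$ posterior precision of $W$ and once through the $n \times n$ matrix $P$; the two agree by the matrix inversion lemma.

```python
import numpy as np

def weight_space(Phi, t, phi_star, Sigma_w, s2):
    """Predict via the posterior of W: solves with the m x m precision matrix A."""
    A = Phi.T @ Phi / s2 + np.linalg.inv(Sigma_w)
    w_mean = np.linalg.solve(A, Phi.T @ t) / s2
    return phi_star @ w_mean, phi_star @ np.linalg.solve(A, phi_star)

def function_space(Phi, t, phi_star, Sigma_w, s2):
    """Predict via the Gaussian process: solves with the n x n matrix P."""
    P = Phi @ Sigma_w @ Phi.T + s2 * np.eye(len(t))
    k = Phi @ Sigma_w @ phi_star
    mean = k @ np.linalg.solve(P, t)
    var = phi_star @ Sigma_w @ phi_star - k @ np.linalg.solve(P, k)
    return mean, var

rng = np.random.default_rng(1)
n, m = 40, 5                       # many data points, few basis functions
Phi = rng.normal(size=(n, m))
t = rng.normal(size=n)
phi_star = rng.normal(size=m)
Sigma_w = np.diag(rng.uniform(0.5, 2.0, m))
ws = weight_space(Phi, t, phi_star, Sigma_w, 0.1)
fs = function_space(Phi, t, phi_star, Sigma_w, 0.1)
```

With $n = 40$ and $m = 5$, the weight-space route solves a $5 \times 5$ system while the function-space route solves a $40 \times 40$ one, which is why the weight-space view is preferred when $m$ is small.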

Cf. Nonparametric kernel estimator for a density $p(y)$:

$$\hat{p}(y) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{y - X_i}{h}\right), \qquad h: \text{bandwidth}$$
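A minimal sketch of the kernel density estimator, assuming a standard normal kernel $K$ and a hand-picked bandwidth $h = 0.3$ (the slide fixes neither). A Riemann sum over a wide grid confirms that $\hat{p}$ integrates to about one.

```python
import numpy as np

def kde(y, samples, h):
    """p_hat(y) = (1 / (n h)) sum_i K((y - X_i) / h) with a standard normal kernel K."""
    u = (np.asarray(y, float)[:, None] - np.asarray(samples, float)[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (len(samples) * h)

rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-10.0, 10.0, 4001)
density = kde(grid, samples, h=0.3)
total_mass = density.sum() * (grid[1] - grid[0])   # Riemann sum, should be close to 1
```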


Spline Smoothing (1)

Interpolating spline: a cubic polynomial defined piecewise between adjacent knots, with continuous second derivative (Schoenberg (1964)). It solves

$$\min_{r \in C^2[x_1, x_n]} \int_{x_1}^{x_n} r''(x)^2\, dx \quad \text{s.t.} \quad r(x_i) = y_i, \quad i = 1, \dots, n.$$

Smoothing spline:

$$\hat{r}_\lambda = \arg\min_r \sum_{i=1}^{n} [y_i - r(x_i)]^2 + \lambda \int_{x_1}^{x_n} r''(x)^2\, dx$$

As $\lambda \to 0$ it tends to the interpolating spline; as $\lambda \to \infty$ it tends to the least-squares linear fit. The smoothing spline is also a cubic spline (Reinsch (1967)).
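To stay within NumPy, the sketch below swaps in a discrete analog of the smoothing spline (an assumption, not the cubic spline itself): on an equally spaced grid, $\int r''(x)^2\, dx$ is replaced by a sum of squared second differences, giving $\hat{r} = (I + \lambda D^T D)^{-1} y$. It shows both limits: a tiny $\lambda$ returns the data (interpolation), and a huge $\lambda$ returns the least-squares line, because straight lines have zero second differences.

```python
import numpy as np

def second_difference(n):
    """(n-2) x n matrix D with (D r)_i = r_i - 2 r_{i+1} + r_{i+2}."""
    return np.diff(np.eye(n), n=2, axis=0)

def discrete_smoother(y, lam):
    """argmin_r sum_i (y_i - r_i)^2 + lam * sum_i (D r)_i^2 on an equally spaced grid."""
    y = np.asarray(y, float)
    D = second_difference(len(y))
    return np.linalg.solve(np.eye(len(y)) + lam * D.T @ D, y)

x = np.linspace(0.0, 1.0, 20)
y = np.sin(4.0 * x) + 0.1 * np.cos(17.0 * x)
nearly_interpolating = discrete_smoother(y, 1e-10)   # lambda -> 0
nearly_linear = discrete_smoother(y, 1e8)            # lambda -> infinity
line = np.polyval(np.polyfit(x, y, 1), x)            # least-squares linear fit
```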


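Spline Smoothing (2)

Linear smoothing property of the smoothing spline: the fit is linear in the data, $\hat{r}_{y^{(1)} + y^{(2)}} = \hat{r}_{y^{(1)}} + \hat{r}_{y^{(2)}}$, so

$$\hat{r}(x) = \sum_{i=1}^{n} y_i\, \hat{r}_i(x),$$

where the component smoothing spline $\hat{r}_i$ is the smoothing spline fitted to the data $y_i = 1$, $y_j = 0$ for $j \ne i$.

If the design is equally spaced, all $n$ component smoothing splines are identical in shape (away from the boundaries), and the shape converges to the kernel (Silverman (1984))

$$K(t) = \frac{1}{2}\, e^{-|t|/\sqrt{2}} \sin\!\left(\frac{|t|}{\sqrt{2}} + \frac{\pi}{4}\right).$$

Cf. Nonparametric kernel regression (Nadaraya (1964) and Watson (1964)):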
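Both properties can be seen on the same discrete analog of the smoothing spline (a second-difference penalty on an equally spaced grid, assumed here in place of the exact spline). The fit is $\hat{r} = S y$ for a fixed matrix $S$, so column $i$ of $S$ is the component smoother $\hat{r}_i$; away from the boundaries, neighbouring columns are shifted copies of each other. The sketch also codes Silverman's kernel $K(t) = \frac{1}{2} e^{-|t|/\sqrt{2}} \sin(|t|/\sqrt{2} + \pi/4)$ and checks numerically that it integrates to one.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S with r_hat = S y for the second-difference smoother on n equally spaced points."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def silverman_kernel(t):
    """K(t) = 1/2 exp(-|t|/sqrt(2)) sin(|t|/sqrt(2) + pi/4)."""
    a = np.abs(t) / np.sqrt(2.0)
    return 0.5 * np.exp(-a) * np.sin(a + np.pi / 4.0)

S = smoother_matrix(200, 10.0)
rng = np.random.default_rng(0)
y1, y2 = rng.normal(size=(2, 200))
additive = np.allclose(S @ (y1 + y2), S @ y1 + S @ y2)   # linear smoothing
shifted = np.allclose(S[50:150, 100], S[51:151, 101])    # same shape, shifted by one point
dt = 1e-3
grid = np.arange(-40.0, 40.0 + dt, dt)
kernel_mass = silverman_kernel(grid).sum() * dt          # close to 1
```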

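$$r(x) = E(Y \mid X = x) = \int y\, f(y \mid x)\, dy = \frac{\int y\, f(x, y)\, dy}{\int f(x, y)\, dy}$$

$$\hat{r}_h(x) = \sum_{i=1}^{n} w_h(x, x_i)\, y_i, \qquad w_h(x, x_i) = \frac{K_h(x - x_i)}{\sum_{j=1}^{n} K_h(x - x_j)}, \qquad K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right)$$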
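A minimal Nadaraya-Watson sketch, assuming a Gaussian kernel and bandwidth $h = 0.1$ (neither is fixed by the slide). Because the weights $w_h(x, x_i)$ are normalized, each row of the weight matrix sums to one, so constant data are reproduced exactly.

```python
import numpy as np

def nadaraya_watson(x, x_data, y_data, h):
    """r_hat_h(x) = sum_i w_h(x, x_i) y_i with w_h = K_h(x - x_i) / sum_j K_h(x - x_j)."""
    u = (np.asarray(x, float)[:, None] - np.asarray(x_data, float)[None, :]) / h
    K = np.exp(-0.5 * u**2)                 # Gaussian kernel; constants cancel in the ratio
    W = K / K.sum(axis=1, keepdims=True)    # W[q, i] = w_h(x_q, x_i)
    return W @ np.asarray(y_data, float), W

rng = np.random.default_rng(0)
x_data = np.sort(rng.uniform(0.0, 1.0, 50))
y_data = np.sin(2.0 * np.pi * x_data) + 0.1 * rng.normal(size=50)
x_query = np.linspace(0.0, 1.0, 7)
r_hat, W = nadaraya_watson(x_query, x_data, y_data, h=0.1)
r_const, _ = nadaraya_watson(x_query, x_data, np.full(50, 3.0), h=0.1)
```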


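Spline Smoothing (3)

The spline estimation procedure can be interpreted as a Bayesian MAP estimate. Minimize

$$M(y(x)) = \frac{1}{2}\beta \sum_{n=1}^{N} \left(y(x^{(n)}) - t_n\right)^2 + \frac{1}{2}\alpha \int [y^{(p)}(x)]^2\, dx.$$

When $p = 2$, the resulting $\hat{y}(x)$ is a cubic spline (a piecewise cubic function that has knots at the data points $\{x^{(n)}\}$).

Model: $y(x)$; data: $D = \{x^{(n)}, t_n\}_{n=1}^{N}$.

Prior for $y(x)$: $\log p(y(x) \mid \alpha) = -\frac{1}{2}\alpha \int [y^{(p)}(x)]^2\, dx + \text{const}$

Likelihood: $\log p(t_N \mid y(x), \beta) = -\frac{1}{2}\beta \sum_{n=1}^{N} \left(y(x^{(n)}) - t_n\right)^2 + \text{const}$

Posterior (Bayesian MAP): $\log p(y(x) \mid t_N, \alpha, \beta) = -M(y(x)) + \text{const}$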
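On a grid, the MAP view can be checked with a discrete analog (assumed here: $p = 2$, squared second differences in place of $\int [y''(x)]^2\, dx$, and data at every grid point). Minimizing $M$ gives $(\beta I + \alpha D^T D)\, y = \beta t$, which is the smoothing-spline problem with $\lambda = \alpha / \beta$; the hypothetical helpers below confirm this and that the solution is a minimum of $M$.

```python
import numpy as np

def neg_log_posterior(y, t, alpha, beta, D):
    """Discrete M(y) = beta/2 sum_n (y_n - t_n)^2 + alpha/2 sum_i ((D y)_i)^2."""
    return 0.5 * beta * np.sum((y - t) ** 2) + 0.5 * alpha * np.sum((D @ y) ** 2)

def map_estimate(t, alpha, beta):
    """Solve grad M = beta (y - t) + alpha D^T D y = 0 for the posterior mode."""
    n = len(t)
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(beta * np.eye(n) + alpha * D.T @ D, beta * t)

rng = np.random.default_rng(0)
n = 30
D = np.diff(np.eye(n), n=2, axis=0)
t = np.cos(np.linspace(0.0, 3.0, n)) + 0.2 * rng.normal(size=n)
alpha, beta = 2.0, 8.0
y_map = map_estimate(t, alpha, beta)
y_spline = np.linalg.solve(np.eye(n) + (alpha / beta) * D.T @ D, t)   # lambda = alpha / beta
```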


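Spline Smoothing (4)

Spline priors are Gaussian processes. A Gaussian process prior has the form

$$p(y(x) \mid \mu(x), A) = \frac{1}{Z} \exp\left[-\frac{1}{2} \left(y(x) - \mu(x)\right)^T A \left(y(x) - \mu(x)\right)\right],$$

where $\mu(x)$ is the mean function, $A$ is a linear operator, and the inner product is $y(x)^T z(x) = \int y(x)\, z(x)\, dx$. The spline prior is of this form:

$$\log p(y(x) \mid \alpha) = -\frac{1}{2}\alpha \int [y^{(p)}(x)]^2\, dx + \text{const} = -\frac{1}{2}\, y(x)^T A\, y(x) + \text{const},$$

with $\mu(x) = 0$ and $A = \alpha [D^p]^T D^p$, where $D^p$ is the $p$-th derivative operator. $A$ has a null space: $y(x)^T A\, y(x) = 0$ for every $y(x)$ with $D^p y(x) = 0$ (polynomials of degree below $p$), so the prior is improper.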
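The null space of the spline prior's operator $A = \alpha [D^p]^T D^p$ is easy to see in a discrete analog (assumed: $D^p$ replaced by the $p$-th difference matrix on an equally spaced grid). For $p = 2$, $A$ annihilates straight lines, and for $p = 3$ it annihilates quadratics, so the prior assigns those functions no penalty.

```python
import numpy as np

def prior_precision(n, alpha=1.0, p=2):
    """Discrete analog of A = alpha [D^p]^T D^p on n equally spaced points."""
    Dp = np.diff(np.eye(n), n=p, axis=0)     # (n - p) x n p-th difference matrix
    return alpha * Dp.T @ Dp

x = np.linspace(0.0, 1.0, 12)
A2 = prior_precision(12, p=2)
A3 = prior_precision(12, p=3)
line = 3.0 + 2.0 * x
quadratic = 1.0 - x + 4.0 * x**2
```

The rank deficit equals $p$: the $12 \times 12$ matrix `A2` has rank 10 and `A3` has rank 9.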


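Spline Smoothing (5)

Splines correspond to Gaussian processes with a particular choice of covariance function. Given

$$(Z_1, \dots, Z_n, Z_*) \sim N(0, K), \qquad K = \begin{pmatrix} K_n & k_* \\ k_*^T & k_{**} \end{pmatrix},$$

and observations $Z_1 = z_1, \dots, Z_n = z_n$,

$$E(Z_* \mid z_1, \dots, z_n) = k_*^T K_n^{-1} z \quad \text{or} \quad \hat{Z}_* = \sum_{i=1}^{n} c_i\, C(x_*, x_i), \qquad c = K_n^{-1} z.$$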
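A small sketch of the conditional mean $E(Z_* \mid z) = k_*^T K_n^{-1} z = \sum_i c_i\, C(x_*, x_i)$, using a squared-exponential covariance as a stand-in (the spline covariance itself is not reproduced here). It computes $c = K_n^{-1} z$ once and then evaluates the kernel expansion; with noise-free observations the prediction passes through the observed values.

```python
import numpy as np

def cov(a, b, length=0.3):
    """Squared-exponential covariance C(x, x'), an illustrative choice."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def conditional_mean(x_obs, z_obs, x_star, jitter=1e-10):
    """E(Z_* | z) = k_*^T K_n^{-1} z = sum_i c_i C(x_*, x_i)."""
    K_n = cov(x_obs, x_obs) + jitter * np.eye(len(x_obs))   # tiny jitter for numerical stability
    c = np.linalg.solve(K_n, z_obs)                          # c = K_n^{-1} z
    return cov(x_star, x_obs) @ c

x_obs = np.array([0.0, 0.4, 1.0])
z_obs = np.array([1.0, -0.5, 2.0])
x_star = np.linspace(0.0, 1.0, 5)
z_hat = conditional_mean(x_obs, z_obs, x_star)
```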


Known covariance functions for modeling (e.g.)
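As an illustration (these are standard choices, not necessarily the ones on the original slide), the sketch below codes three stationary covariance functions of the distance $r = |x - x'|$: squared exponential, exponential (Ornstein-Uhlenbeck) and Matérn with $\nu = 3/2$. It checks that each yields a symmetric positive semidefinite Gram matrix, which is what makes it a valid covariance function.

```python
import numpy as np

def squared_exponential(r, ell=1.0):
    return np.exp(-0.5 * (r / ell) ** 2)

def ornstein_uhlenbeck(r, ell=1.0):
    return np.exp(-r / ell)

def matern_32(r, ell=1.0):
    s = np.sqrt(3.0) * r / ell
    return (1.0 + s) * np.exp(-s)

def gram(kernel, x, **params):
    """G[i, j] = kernel(|x_i - x_j|)."""
    r = np.abs(x[:, None] - x[None, :])
    return kernel(r, **params)

x = np.linspace(0.0, 5.0, 30)
grams = {k.__name__: gram(k, x, ell=0.7) for k in (squared_exponential, ornstein_uhlenbeck, matern_32)}
```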


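Covariance function with unknown parameters

For a small number of parameters: choose a parametric family of covariance functions and estimate the parameters $\theta$ by maximizing the log likelihood

$$l = \log p(D \mid \theta) = -\frac{1}{2}\log|K| - \frac{1}{2}\, t^T K^{-1} t - \frac{n}{2}\log 2\pi,$$

$$\frac{\partial l}{\partial \theta_i} = -\frac{1}{2}\,\mathrm{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta_i}\right) + \frac{1}{2}\, t^T K^{-1} \frac{\partial K}{\partial \theta_i} K^{-1} t.$$

For a large number of parameters, or when the likelihood has local maxima: put a prior distribution on the parameters and integrate over them numerically:

$$p(y_* \mid D) = \int p(y_* \mid \theta, D)\, p(\theta \mid D)\, d\theta.$$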
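A sketch of the log likelihood $l$ and its gradient for a single parameter, assuming a squared-exponential covariance with length scale $\ell$ plus a fixed noise variance of 0.1 (hypothetical choices). $\log|K|$ is taken from a Cholesky factor, and the analytic derivative $\partial l / \partial \ell$ is compared against a central finite difference.

```python
import numpy as np

def se_cov(x, ell, noise):
    """K = exp(-d^2 / (2 ell^2)) + noise * I, together with dK/d ell."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K0 = np.exp(-0.5 * d2 / ell**2)
    return K0 + noise * np.eye(len(x)), K0 * d2 / ell**3

def log_likelihood(t, x, ell, noise=0.1):
    """Return l = log p(t | ell) and dl/d ell."""
    K, dK = se_cov(x, ell, noise)
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(K, t)                                # K^{-1} t
    n = len(t)
    l = -np.sum(np.log(np.diag(L))) - 0.5 * t @ a - 0.5 * n * np.log(2.0 * np.pi)
    grad = -0.5 * np.trace(np.linalg.solve(K, dK)) + 0.5 * a @ dK @ a
    return l, grad

x = np.linspace(0.0, 3.0, 15)
t = np.sin(2.0 * x)
ell, h = 0.7, 1e-6
l0, g = log_likelihood(t, x, ell)
fd = (log_likelihood(t, x, ell + h)[0] - log_likelihood(t, x, ell - h)[0]) / (2.0 * h)
```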


Multilayer Neural Networks and Gaussian Process

If standard weight-decay priors are assumed, the properties of a neural network with one hidden layer converge to those of a Gaussian process as the number of hidden units tends to infinity (Neal (1996)). The covariance of this Gaussian process depends on the priors on the weights and on the activation functions of the hidden units.

$$f(x) = b + \sum_{j=1}^{H} v_j\, h(x; u_j), \qquad b \sim N(0, \sigma_b^2), \quad v_j \sim N(0, \sigma_v^2), \quad u_j \text{ i.i.d.}$$

$$E[f(x)] = 0,$$

$$E[f(x) f(x')] = \sigma_b^2 + \sum_{j=1}^{H} \sigma_v^2\, E_u[h(x; u_j)\, h(x'; u_j)] = \sigma_b^2 + H \sigma_v^2\, E_u[h(x; u)\, h(x'; u)].$$

If $\sigma_v = \omega H^{-1/2}$, then by the CLT the joint distribution of $(f(x_1), \dots, f(x_k))$ tends to a zero-mean Gaussian as $H \to \infty$, with covariance

$$E[f(x_i) f(x_j)] = \sigma_b^2 + \omega^2 E_u[h(x_i; u)\, h(x_j; u)] = C(x_i, x_j).$$
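The covariance formula can be checked by Monte Carlo. The sketch assumes tanh hidden units $h(x; u_j) = \tanh(a_j + w_j x)$ with $a_j, w_j \sim N(0, 1)$, $\sigma_b = \omega = 1$ and $H = 200$ (illustrative values). It compares the empirical covariance of many random networks with $\sigma_b^2 + \omega^2 E_u[h(x; u)\, h(x'; u)]$, where the expectation itself is estimated by sampling $u$.

```python
import numpy as np

def sample_networks(xs, n_hidden, n_nets, sigma_b, omega, sigma_u, rng):
    """Evaluate n_nets random networks f(x) = b + sum_j v_j tanh(a_j + w_j x) at xs."""
    sigma_v = omega / np.sqrt(n_hidden)                        # sigma_v = omega H^{-1/2}
    b = rng.normal(0.0, sigma_b, size=(n_nets, 1))
    v = rng.normal(0.0, sigma_v, size=(n_nets, n_hidden))
    a = rng.normal(0.0, sigma_u, size=(n_nets, n_hidden, 1))   # hidden biases
    w = rng.normal(0.0, sigma_u, size=(n_nets, n_hidden, 1))   # hidden input weights
    h = np.tanh(a + w * xs[None, None, :])                     # (n_nets, n_hidden, len(xs))
    return b + np.einsum("nj,njx->nx", v, h)

def limit_covariance(xs, sigma_b, omega, sigma_u, rng, n_mc=200_000):
    """C(x, x') = sigma_b^2 + omega^2 E_u[h(x; u) h(x'; u)], with E_u estimated by sampling."""
    a = rng.normal(0.0, sigma_u, size=(n_mc, 1))
    w = rng.normal(0.0, sigma_u, size=(n_mc, 1))
    h = np.tanh(a + w * xs[None, :])                           # (n_mc, len(xs))
    return sigma_b**2 + omega**2 * (h.T @ h) / n_mc

rng = np.random.default_rng(0)
xs = np.array([-1.0, 0.0, 1.0])
f = sample_networks(xs, n_hidden=200, n_nets=4000, sigma_b=1.0, omega=1.0, sigma_u=1.0, rng=rng)
empirical = f.T @ f / len(f)                   # E[f] = 0, so no mean is subtracted
theory = limit_covariance(xs, 1.0, 1.0, 1.0, rng)
```

The covariance identity holds for every $H$; what the limit $H \to \infty$ adds is that the distribution of $f$ becomes Gaussian.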


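Classification Problems

Estimate the posterior $p(k \mid x)$ for each class $k$, with $0 \le p(k \mid x) \le 1$ for all $k$ and $\sum_k p(k \mid x) = 1$.

Find a distribution of $\pi(x)$ by placing a Gaussian process prior on the activation $y(x)$ and passing it through the logistic function (logistic regression), $\pi(x) = 1 / (1 + e^{-y(x)})$.

Make a prediction for a test input $x_*$ by

$$\hat{\pi}_* = \int \pi_*\, p(\pi_* \mid t)\, d\pi_*.$$

Note:

$$p(y_*, y, t) = p(y_*, y)\, p(t \mid y_*, y) = p(y_*, y)\, p(t \mid y),$$

$$p(y_* \mid t) = \int p(y_*, y \mid t)\, dy = \frac{1}{p(t)} \int p(y_*, y)\, p(t \mid y)\, dy.$$

(Apply the appropriate Jacobian to the above to obtain the distribution of $\pi_*$.) When $p(t \mid y)$ is Gaussian, there is an exact expression. When $p(t \mid y) = \prod_i \pi_i^{t_i} (1 - \pi_i)^{1 - t_i}$, there is no exact expression (use an analytic approximation or MCMC).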
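Once an analytic approximation (or MCMC) has produced a Gaussian approximation $N(\mu, s^2)$ for the activation $y_*$, the prediction $\hat{\pi}_*$ is a one-dimensional integral of the logistic function against it. The sketch below, with hypothetical helper names, evaluates that integral by Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`); how $\mu$ and $s^2$ are obtained is outside its scope.

```python
import numpy as np

def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

def predictive_probability(mu, var, n_nodes=40):
    """pi_hat_* = E[logistic(y_*)] for y_* ~ N(mu, var), by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)  # weight function exp(-x^2 / 2)
    y = mu + np.sqrt(var) * nodes
    return float(np.sum(weights * logistic(y)) / np.sqrt(2.0 * np.pi))

p_uncertain = predictive_probability(2.0, 4.0)
p_confident = predictive_probability(2.0, 1e-6)
```

For $\mu > 0$, averaging over the uncertainty in $y_*$ pulls the prediction toward 0.5 compared with plugging in the mean, so `p_uncertain` lies between 0.5 and $\sigma(2)$, while `p_confident` is essentially $\sigma(2)$.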


Active Data Selection (1)

Maximizing the expected information gain criterion (MacKay (1992)): select the data point with the maximum predictive variance

$$Var[Y_*] = \phi(x_*)^T \Sigma_w\, \phi(x_*) - \phi(x_*)^T \Sigma_w \Phi^T P^{-1} \Phi\, \Sigma_w\, \phi(x_*).$$

Minimizing the error of $\hat{y}(x)$ (Cohn (1996)): select the data point that minimizes the overall (average) predictive variance.
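A sketch of both selection rules for a one-dimensional Gaussian process, assuming a squared-exponential covariance with unit prior variance, length scale 0.5, noise variance 0.01, and a finite candidate set (all hypothetical). Neither rule needs the targets, because the predictive variance depends only on the input locations.

```python
import numpy as np

def se_kernel(a, b, length=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def predictive_variance(x_train, x_query, noise=1e-2, length=0.5):
    """Var[Y(x_q)] given noisy observations at x_train (prior variance C(x, x) = 1)."""
    K = se_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    k = se_kernel(x_train, x_query, length)                  # (n_train, n_query)
    return 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)

def mackay_query(x_train, candidates):
    """Maximum information gain: pick the candidate with the largest predictive variance."""
    return candidates[int(np.argmax(predictive_variance(x_train, candidates)))]

def cohn_query(x_train, candidates, reference):
    """Cohn: pick the candidate that minimizes the mean variance over the reference points."""
    scores = [predictive_variance(np.append(x_train, c), reference).mean() for c in candidates]
    return candidates[int(np.argmin(scores))]

x_train = np.array([0.0, 0.1])
candidates = np.linspace(0.0, 1.0, 11)
reference = np.linspace(0.0, 1.0, 101)
x_mackay = mackay_query(x_train, candidates)
x_cohn = cohn_query(x_train, candidates, reference)
```

With data only near 0, MacKay's rule picks the far end of the interval, while Cohn's rule picks the point whose addition most reduces the average variance over the reference points.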


Active Data Selection (2)

[Figure: (a) target function generated from a covariance function; (b) expected change of the average variance over x, for 100 reference points]


Active Data Selection (3)

Experiments: the first data point is selected at random, and 150 data points are then selected actively. 500 reference points are used for error evaluation, and the optimum query is selected using 300 random reference points.
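The experimental protocol can be sketched as a loop, assuming a one-dimensional input on [0, 1], a squared-exponential Gaussian process with fixed hyperparameters, Cohn's criterion for each query, and a random pool of candidate queries (the pool, the target and all names are hypothetical; the slide gives only the counts). The defaults mirror the stated counts (150 actively selected points, 500 evaluation points, 300 reference points) and take a while to run; the call at the bottom uses smaller counts.

```python
import numpy as np

def se(a, b, length=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_mean(x_tr, y_tr, x_q, noise=1e-2):
    K = se(x_tr, x_tr) + noise * np.eye(len(x_tr))
    return se(x_q, x_tr) @ np.linalg.solve(K, y_tr)

def gp_var(x_tr, x_q, noise=1e-2):
    K = se(x_tr, x_tr) + noise * np.eye(len(x_tr))
    k = se(x_tr, x_q)
    return 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)

def run_experiment(target, n_active=150, n_eval=500, n_ref=300, n_cand=100, seed=0):
    """Return the selected inputs and the test error after each active query."""
    rng = np.random.default_rng(seed)
    x_eval = rng.uniform(0.0, 1.0, n_eval)          # reference points for error evaluation
    x_tr = rng.uniform(0.0, 1.0, 1)                 # first data point selected at random
    errors = []
    for _ in range(n_active):
        x_ref = rng.uniform(0.0, 1.0, n_ref)        # random reference points for this query
        cand = rng.uniform(0.0, 1.0, n_cand)        # hypothetical pool of candidate queries
        scores = [gp_var(np.append(x_tr, c), x_ref).mean() for c in cand]
        x_tr = np.append(x_tr, cand[int(np.argmin(scores))])
        mean = gp_mean(x_tr, target(x_tr), x_eval)
        errors.append(np.mean((mean - target(x_eval)) ** 2))
    return x_tr, np.array(errors)

x_sel, errs = run_experiment(lambda x: np.sin(2.0 * np.pi * x), n_active=10, n_eval=50, n_ref=30, n_cand=20)
```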


Active Data Selection (4)

For real data: pumadyn-8nm (puma560 robot arm), with 250 data points for active selection and 400 reference points.