

Lecture VII: MLSC - Dr. Sethu Vijayakumar 1

Lecture VII – Regression (Nonlinear/Nonparametric Methods)

Contents:
• Locally weighted and lazy learning techniques
• Nonparametric methods


Lecture VII: MLSC - Dr. Sethu Vijayakumar 2

Nonparametric Methods

Working Definition: The name “nonparametric” indicates that the data to be modeled stem from very large families of distributions that cannot be indexed by a finite-dimensional parameter vector in a natural way.

Remarks:
• This does not mean that nonparametric methods have no parameters!
• Nonparametric methods avoid making assumptions about the parametric form of the underlying distributions (except some smoothness properties).
• Nonparametric methods are often memory-based (but not necessarily).
• Sometimes called “lazy learning”.

Can be applied to:
• density estimation
• classification
• regression


Lecture VII: MLSC - Dr. Sethu Vijayakumar 3

Locally Weighted Regression (LWR)

Fit low-order polynomials locally, e.g., first-order (linear) polynomials.

Approximate non-linear functions with a mixture of k piecewise linear models


Lecture VII: MLSC - Dr. Sethu Vijayakumar 4

LWR: Formalization

• Minimize the weighted squared error:

$$J = \sum_{n=1}^{N} w_n \left(t^n - y^n\right)^T \left(t^n - y^n\right), \quad \text{where } y^n = \mathbf{x}^{nT}\boldsymbol\beta$$

• Solution: weighted least squares,

$$\boldsymbol\beta = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{W}\mathbf{Y}, \quad \text{where } \mathbf{W} = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w_N \end{bmatrix}$$

• The weight can be calculated from any weighting kernel, e.g., a Gaussian:

$$w = \exp\left(-\tfrac{1}{2}\left(\mathbf{x}-\mathbf{c}\right)^T \mathbf{D}\left(\mathbf{x}-\mathbf{c}\right)\right)$$
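The steps above can be made concrete with a short sketch. The following Python snippet is not from the lecture; the function name lwr_predict and all variable names are illustrative. It forms the Gaussian weights, solves the weighted least-squares problem, and evaluates the local linear model at a query point:

```python
import numpy as np

def lwr_predict(X, t, x_query, c, D):
    """Locally weighted regression prediction at x_query.

    X: (N, d) inputs, t: (N,) targets, c: (d,) kernel centre,
    D: (d, d) distance metric of the Gaussian weighting kernel.
    """
    # Gaussian kernel weights: w_n = exp(-1/2 (x_n - c)^T D (x_n - c))
    diff = X - c
    w = np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, D, diff))

    # Augment inputs with a bias term: x_tilde = [x^T 1]^T
    Xt = np.hstack([X, np.ones((X.shape[0], 1))])

    # Weighted least squares: beta = (X^T W X)^{-1} X^T W t
    W = np.diag(w)
    beta = np.linalg.solve(Xt.T @ W @ Xt, Xt.T @ W @ t)

    # Prediction of the local linear model at the query point
    return np.append(x_query, 1.0) @ beta
```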


Lecture VII: MLSC - Dr. Sethu Vijayakumar 5

LWR: Examples (cont’d)

[Figure: LWR fits of the same data for distance metric values D = 10, 50, 100, and 200 (x from −3 to 3).]

The fits exhibit the bias-variance tradeoff with respect to the parameter D: a small D gives a broad kernel and a smoother, more biased fit, while a large D gives a narrow kernel and a more variable fit.


Lecture VII: MLSC - Dr. Sethu Vijayakumar 6

How to Optimize the Distance Metric?

Two possible options:
• Global optimization: find the most suitable D for the entire data set.
• Local optimization: find the most suitable D as a function of the kernel location c.


Lecture VII: MLSC - Dr. Sethu Vijayakumar 7

Global Distance Metric Optimization

Leave-one-out cross validation: compute the prediction of every data point in the training set by
• centering the kernel at every data point,
• but excluding the data point at the center from the training set.

Find the distance metric that minimizes the cross-validation error; depending on how many parameters are allowed in the distance metric, this is a multidimensional optimization problem.

NOTE: Leave-one-out Cross Validation is very cheap in nonparametric methods (in comparison to most parametric methods) since there is no iterative training of the learning system

$$J_c = \sum_{n=1}^{N} \left(t^n - y^{n,-n}\right)^T \left(t^n - y^{n,-n}\right)$$
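As a rough illustration only (reusing the hypothetical lwr_predict sketch from above and assuming, for simplicity, a scalar distance metric D = d·I), the global criterion can be evaluated for a small set of candidate values and the best one kept:

```python
import numpy as np

def loo_cost(X, t, d_value):
    """Leave-one-out cross-validation cost for one candidate D = d_value * I."""
    N, dim = X.shape
    D = d_value * np.eye(dim)
    J = 0.0
    for n in range(N):
        mask = np.arange(N) != n            # exclude the point at the kernel centre
        # centre the kernel at x_n and predict x_n from the remaining data
        y_minus_n = lwr_predict(X[mask], t[mask], X[n], X[n], D)
        J += (t[n] - y_minus_n) ** 2
    return J

# e.g. pick the candidate with the smallest cross-validation error:
# candidates = [10.0, 50.0, 100.0, 200.0]
# D_best = min(candidates, key=lambda d: loo_cost(X, t, d))
```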


Lecture VII: MLSC - Dr. Sethu Vijayakumar 8

Why Cross Validation?

[Figure: two panels plotting the data (x, y) together with a kernel weighting function w.]

Without Cross Validation, the kernel could just shrink to zero and focus on one data point only.

*** Avoids degenerate solutions for D.


Lecture VII: MLSC - Dr. Sethu Vijayakumar 9

Global Optimization of Distance Metric: Example

[Figure: left, the locally weighted cost function plotted against D (D from 0 to 300); right, the resultant fit from global optimization, D = 100 (x from −3 to 3).]


Lecture VII: MLSC - Dr. Sethu Vijayakumar 10

Local Distance Metric Optimization

Why not optimize the distance metric as a function of the location of the kernel center?

Local Cross Validation Criterion

Something exceptionally cool: for linear local models, the local leave-one-out cross-validation error can be computed analytically, WITHOUT an n-fold recalculation of the prediction!

$$J_c = \sum_{n=1}^{N} w_n \left(t^n - y^{n,-n}\right)^T \left(t^n - y^{n,-n}\right) = \sum_{n=1}^{N} \frac{w_n \left(t^n - y^n\right)^T \left(t^n - y^n\right)}{\left(1 - w_n\, \mathbf{x}^{nT}\mathbf{P}\,\mathbf{x}^n\right)^2}$$

$$\text{where } \boldsymbol\beta = \left(\mathbf{X}^T\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{W}\mathbf{Y} = \mathbf{P}\,\mathbf{X}^T\mathbf{W}\mathbf{Y}$$

a.k.a. the PRESS residual error.
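A small batch sketch (illustrative names, a single kernel, scalar targets, and not the lecture's incremental implementation) of how the PRESS identity gives the leave-one-out error from one weighted fit:

```python
import numpy as np

def local_press_cost(X, t, c, D):
    """Local leave-one-out cost J_c computed via PRESS residuals, one fit only."""
    diff = X - c
    w = np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, D, diff))
    Xt = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.diag(w)
    P = np.linalg.inv(Xt.T @ W @ Xt)               # P = (X^T W X)^{-1}
    beta = P @ Xt.T @ W @ t
    resid = t - Xt @ beta                          # ordinary residuals t^n - y^n
    h = w * np.einsum('ni,ij,nj->n', Xt, P, Xt)    # w_n x_n^T P x_n
    # J_c = sum_n w_n (t^n - y^n)^2 / (1 - w_n x_n^T P x_n)^2
    return np.sum(w * resid ** 2 / (1.0 - h) ** 2)
```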


Lecture VII: MLSC - Dr. Sethu Vijayakumar 11

A Nonparametric Regression Network

Ideas:
• Create new (Gaussian) kernels as needed, i.e., when no other kernel in the net is activated sufficiently (=> a constructive network).
• Update the linear model in each kernel by weighted recursive least squares.
• Adjust the distance metric by gradient descent in local cross-validation criteria.
• The weighted output of all kernels is the prediction (see the sketch below).

Receptive Field Weighted Regression (RFWR)
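As a minimal sketch of the last idea (illustrative names; it assumes the per-kernel centres, distance metrics, and slopes have already been learned), the network's output is the weight-normalized combination of all local linear models:

```python
import numpy as np

def rfwr_predict(x, centers, metrics, betas):
    """Prediction = normalized, weighted average of all local linear models at x."""
    x_tilde = np.append(x, 1.0)                  # augment with bias term
    num, den = 0.0, 0.0
    for c_k, D_k, beta_k in zip(centers, metrics, betas):
        # receptive field activation of kernel k
        w_k = np.exp(-0.5 * (x - c_k) @ D_k @ (x - c_k))
        num += w_k * (beta_k @ x_tilde)          # local linear model output
        den += w_k
    return num / den if den > 0.0 else 0.0
```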


Lecture VII: MLSC - Dr. Sethu Vijayakumar 12

Nonparametric Regression Network (cont’d)

Elements of each Local Linear Model

• Regression slope (local linear model):

$$y = \boldsymbol\beta_x^T \mathbf{x} + \beta_0 = \boldsymbol\beta^T\tilde{\mathbf{x}}, \quad \text{where } \tilde{\mathbf{x}} = \left[\mathbf{x}^T\ 1\right]^T$$

• Receptive field:

$$w = \exp\left(-\tfrac{1}{2}\left(\mathbf{x}-\mathbf{c}\right)^T\mathbf{D}\left(\mathbf{x}-\mathbf{c}\right)\right), \quad \text{where } \mathbf{D} = \mathbf{M}^T\mathbf{M}$$

• Penalized local cross-validation error:

$$J = \frac{1}{\sum_{i=1}^{p} w_i}\sum_{i=1}^{p} w_i \left\| y^i - \hat{y}^{\,i,-i}\right\|^2 + \gamma\sum_{i=1,\,j=1}^{n} D_{ij}^2$$


Lecture VII: MLSC - Dr. Sethu Vijayakumar 13

Nonparametric Regression Network (cont’d)

Update of the parameters:

• Slope of the local model (weighted recursive least squares):

$$\boldsymbol\beta^{n+1} = \boldsymbol\beta^{n} + w\,\mathbf{P}^{n+1}\tilde{\mathbf{x}}\,\mathbf{e}_{cv}^T$$

$$\text{where } \mathbf{P}^{n+1} = \frac{1}{\lambda}\left(\mathbf{P}^{n} - \frac{\mathbf{P}^{n}\tilde{\mathbf{x}}\tilde{\mathbf{x}}^T\mathbf{P}^{n}}{\dfrac{\lambda}{w} + \tilde{\mathbf{x}}^T\mathbf{P}^{n}\tilde{\mathbf{x}}}\right) \quad \text{and} \quad \mathbf{e}_{cv} = \mathbf{y} - \boldsymbol\beta^{n\,T}\tilde{\mathbf{x}}$$

• Distance metric adaptation (gradient descent):

$$\mathbf{M}^{n+1} = \mathbf{M}^{n} - \alpha\frac{\partial J}{\partial \mathbf{M}}$$

Another very cool thing: for linear systems, leave-one-out cross validation can be approximated INCREMENTALLY! Thus, no data has to be kept in memory!
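A sketch of one incremental update for a single receptive field, following the weighted recursive least-squares form above (variable names are mine; lam is the forgetting factor, and the distance-metric gradient step is omitted):

```python
import numpy as np

def rfwr_update(beta, P, x, t, c, D, lam=0.999):
    """One weighted recursive least-squares update of a local linear model."""
    w = np.exp(-0.5 * (x - c) @ D @ (x - c))   # activation of this receptive field
    x_tilde = np.append(x, 1.0)
    e = t - beta @ x_tilde                      # prediction error of the current model
    # P^{n+1} = (1/lam) * (P^n - (P^n x x^T P^n) / (lam / w + x^T P^n x))
    Px = P @ x_tilde
    P_new = (P - np.outer(Px, Px) / (lam / w + x_tilde @ Px)) / lam
    # beta^{n+1} = beta^n + w * P^{n+1} x * e
    beta_new = beta + w * (P_new @ x_tilde) * e
    return beta_new, P_new
```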


Lecture VII: MLSC - Dr. Sethu Vijayakumar 14

Example of learning

[Figure: 1D learning example. Top: data and fit, y vs. x (x from −2.5 to 2.5). Bottom: receptive field activations, w vs. x (w from 0 to 1).]


Lecture VII: MLSC - Dr. Sethu Vijayakumar 15

A 2D Example

[Figure: left, the actual data-generating function z(x, y); right, sampled data with noise.]

Why is this a tough function to learn?


Lecture VII: MLSC - Dr. Sethu Vijayakumar 16

A 2D Example (cont’d)

[Figure: the learned fit z(x, y) and the placement of receptive fields (⊕ marks kernel centers in the x–y plane) after one epoch of training and after 50 epochs of training.]


Lecture VII: MLSC - Dr. Sethu Vijayakumar 17

Nonparametric Regression Network (Summary)

The LWR scheme we developed (RFWR):
• can incrementally deal with the bias-variance dilemma
• grows with the data (constructive)
• learns very fast
• is similar to a mixture of experts, but does not need a pre-specified number of experts (no competitive learning)
• is similar to committee networks (by averaging the outputs over many independently trained networks)

… but it still has problems with the curse of dimensionality, as do all spatially localized networks.


Lecture VII: MLSC - Dr. Sethu Vijayakumar 18

Handling the Curse of Dimensionality

We developed a method to make local learning methods like RFWR scale to high-dimensional input spaces

…the resultant system is called Locally Weighted Projection Regression (LWPR)

[Figure: LWPR module.]

We will learn more about this and other dimensionality reduction techniques in the next class.