
Page 1:

Regression

UC Berkeley, Fall 2004, E77

http://jagger.me.berkeley.edu/~pack/e77

Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Page 2:

Info

Midterm next Friday (11/5)

1-2 if you are actually enrolled in my section. Check BlackBoard for the room location.

If you have a conflict (as before), bring a letter, schedule printout, etc. to class next Monday so that we can make arrangements.

Review Session Wednesday 11/3 evening. Check BlackBoard

HW and Lab due this Friday (10/29)

Mid-Course evaluation on BlackBoard. Do it by Thursday, 11/4 at noon. We can’t see your answers, but we know if you’ve done it. Get an extra point.

Page 3:

Regression: Curve-fitting with minimum error

Given (x,y) data pairs

(x1,y1), (x2,y2), … , (xN,yN)

From a prespecified collection of "simple" functions – for example, all linear functions –

find one that "explains the data pairs with minimum error."

For a given function f, the mismatch (error) at each data pair is defined:

e1 = f(x1) – y1

e2 = f(x2) – y2

⋮

ek = f(xk) – yk

⋮

eN = f(xN) – yN

Page 4:

Fitting data with a linear function

[Figure: data points and a fitted linear function plotted against X and Y; the vertical mismatches e_i between the line and the points are marked, some positive and some negative.]

Page 5:

Straight-line functions

How does it work if the function f is to be of the form

f(x) = ax + b

for to-be-chosen parameters a and b?

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

For fixed values of a and b, the mismatch (error) is

e1 = ax1+b – y1

e2 = ax2+b – y2

eN = axN+b – yN

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Here the x_k and y_k entries are the "data". Goal: make the error vector on the left small, by choosing a and b.
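To make the matrix expression concrete, here is a minimal MATLAB sketch with made-up data and candidate parameters:

x = [-6; -2; 0; 3; 7];           % example x data (column vector)
y = [-4; -2; -1; 0.5; 2];        % example y data
a = 0.4; b = -1;                 % one candidate slope and intercept
e = [x ones(5,1)]*[a; b] - y;    % the error vector from the equation above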

Page 6:

Measuring the “amount” of mismatch

Several ways to quantify the amount of mismatch. All have the property that if one component of the mismatch is "big", then the measure-of-mismatch is big:

$$|e_1| + |e_2| + \cdots + |e_N|$$

$$\max\left(|e_1|,\, |e_2|,\, \ldots,\, |e_N|\right)$$

$$\left(e_1^2 + e_2^2 + \cdots + e_N^2\right)^{1/2}$$

For convenience, we pick the last one, the square root of the sum of squares. This choice is motivated for a few reasons:

– It leads to least squares problems, to which we have already been exposed. And it makes sense. And…

– Under "reasonable" assumptions about the cause of the mismatch, namely $y_i = f(x_i) + \epsilon_i$ with independent, random, zero-mean, identically distributed, Gaussian additive errors $\epsilon_i$ ("noise" in the measurement of y), it is the best measure of how likely a candidate function is to have led to the data observed.
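For a concrete comparison, the three measures can be computed in MATLAB for any error vector e (the values below are made up):

e  = [1; -3; 2];        % example error vector
m1 = sum(abs(e));       % sum of absolute values
m2 = max(abs(e));       % largest absolute component
m3 = sqrt(sum(e.^2));   % square root of the sum of squares
m3b = norm(e);          % the same value, via the built-in norm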

Page 7:

Euclidean Norms of vectors

If v is an m-by-1 column (or row) vector, the “norm of v” is defined as

$$\|v\| := \left(\sum_{k=1}^{m} v_k^2\right)^{1/2}$$

The symbol $\|v\|$ denotes the "norm of v".

Square-root of the sum-of-squares of components, generalizing the Pythagorean theorem.

The norm of a vector is a measure of its length. Some facts:

||v||=0 if and only if every component of v is zero

||v + w|| ≤ ||v|| + ||w||
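In MATLAB the built-in function norm computes this quantity; a minimal check of the definition:

v   = [3; 4];
nv  = sqrt(sum(v.^2));  % 5, from the definition
nv2 = norm(v);          % same value from the built-in function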

Page 8:

Straight-line functions

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$

The quantity inside the norm is the "e" vector, so the quantity being minimized is ||e||.

This says: “By choice of a and b, minimize the Euclidean norm of the mismatch.”

Page 9:

The “Least Squares” Problem

If A is an n-by-m array, and b is an n-by-1 vector, let c* be the smallest possible (over all choices of m-by-1 vectors x) mismatch between Ax and b (i.e., pick x to make Ax as much like b as possible).

$$c^* := \min_{x \in \mathbb{R}^{m \times 1}} \|Ax - b\|$$

Reading left to right: ":=" is "is defined as"; "min" is "the minimum, over all m-by-1 vectors x", of "the length (i.e., norm) of the difference/mismatch between Ax and b."

Page 10:

Four cases for Least Squares

Recall the least squares formulation

$$c^* := \min_{x \in \mathbb{R}^{m \times 1}} \|Ax - b\|$$

There are 4 scenarios:

c* = 0: the equation Ax=b has at least one solution
– only one x vector achieves this minimum, or
– many different x vectors achieve the minimum

c* > 0: the equation Ax=b has no solutions
– only one x vector achieves this minimum, or
– many different x vectors achieve the minimum

In regression, the case c* > 0 (the equation has no exact solution) is almost always the one that occurs.

Page 11:

The backslash operator

If A is an n-by-m array, and b is an n-by-1 vector, then

>> x = A\b

solves the “least squares” problem. Namely

–If there is an x which solves Ax=b, then this x is computed

–If there is no x which solves Ax=b, then an x which minimizes the mismatch between Ax and b is computed.

In the case where many x satisfy one of the criteria above, a smallest (in terms of vector norm) such x is computed.

So, mismatch is handled first. Among all equally suitable x vectors that minimize the mismatch, choose a smallest one.
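A minimal sketch of the overdetermined case (the numbers are made up for illustration):

A = [1 1; 2 1; 3 1; 4 1];   % 4 equations, 2 unknowns: no exact solution in general
b = [1; 1.9; 3.2; 3.9];
x = A\b;                    % least-squares solution
cstar = norm(A*x - b);      % the smallest achievable mismatch c*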

Page 12:

Straight-line functions

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$

The quantity inside the norm is the "e" vector, so the quantity being minimized is ||e||.

This says: “By choice of a and b, minimize the Euclidean norm of the mismatch.”

Page 13:

Linear Regression Code

function [a,b] = linreg(Xdata,Ydata)
% Fits a linear function Y = aX + b
% to the data given by Xdata, Ydata.
Xdata = Xdata(:);                    % force column vectors
Ydata = Ydata(:);
if length(Xdata) ~= length(Ydata)    % verify same length
    error('Xdata and Ydata must have the same length.');
end
N = length(Xdata);
optpara = [Xdata ones(N,1)]\Ydata;   % least-squares fit via backslash
a = optpara(1);
b = optpara(2);

The backslash call solves

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$
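A hypothetical usage of linreg, fitting a noisy line (data generated here just for illustration):

Xdata = (0:0.5:10)';                            % 21 sample points
Ydata = 2*Xdata - 3 + 0.5*randn(size(Xdata));   % noisy data from the line y = 2x - 3
[a,b] = linreg(Xdata,Ydata);                    % a should be near 2, b near -3
plot(Xdata,Ydata,'o', Xdata,a*Xdata+b,'-')      % data and fitted line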

Page 14:

Quadratic functions

How does it work if the function f is to be of the form

f(x) = ax² + bx + c

for to-be-chosen parameters a, b and c?

For fixed values of a, b and c, the error at (xk,yk) is

$$e_k = a x_k^2 + b x_k + c - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_N^2 & x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Row k of the matrix-vector product is f(x_k).
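Following the pattern of linreg, a quadratic fit reduces to one backslash call; a minimal sketch, assuming Xdata and Ydata are column vectors of the same length:

N = length(Xdata);
abc = [Xdata.^2 Xdata ones(N,1)]\Ydata;  % columns: x^2, x, 1
a = abc(1); b = abc(2); c = abc(3);      % least-squares quadratic coefficients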

Page 15:

Polynomial functions

How does it work if the function f is to be of the form

$$f(x) = a_1 x^n + a_2 x^{n-1} + \cdots + a_n x + a_{n+1}$$

for to-be-chosen parameters a1, a2,…,an+1?

For fixed values of a1, a2,…,an+1, the error at (xk,yk) is

$$e_k = a_1 x_k^n + a_2 x_k^{n-1} + \cdots + a_n x_k + a_{n+1} - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Row k of the matrix-vector product is f(x_k).

Page 16:

Polynomial Regression Pseudo-Code

function p = polyreg(Xdata,Ydata,nOrd)
% Fits an nOrd'th order polynomial
% to the data given by Xdata (a column
% vector) and Ydata.
N = length(Xdata);
RM = zeros(N,nOrd+1);           % regressor matrix, one column per power
RM(:,end) = ones(N,1);          % last column: x^0
for i = 1:nOrd
    RM(:,end-i) = RM(:,end-i+1).*Xdata;   % next column to the left: x^i
end
p = RM\Ydata;                   % least-squares coefficients, descending powers
p = p.';                        % return as a row vector

The regressor matrix RM built by the loop is the matrix of powers, so p = RM\Ydata minimizes the norm of

$$\begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
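A hypothetical usage of polyreg; MATLAB's built-in polyfit solves the same least-squares problem, so its coefficients (also in descending powers) should agree:

Xdata = linspace(-2,2,25)';                  % 25 sample points
Ydata = Xdata.^3 - Xdata + 0.1*randn(25,1);  % noisy cubic data
p  = polyreg(Xdata,Ydata,3);                 % coefficients, descending powers
p2 = polyfit(Xdata,Ydata,3);                 % built-in; p and p2 should agree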

Page 17:

General “basis” functions

How does it work if the function f is to be of the form

$$f(x) = a_1 b_1(x) + a_2 b_2(x) + \cdots + a_n b_n(x)$$

for fixed functions b_i (called "basis" functions), and to-be-chosen parameters a1, a2,…,an?

For fixed values of a1, a2,…,an, the error at (xk,yk) is

$$e_k = a_1 b_1(x_k) + a_2 b_2(x_k) + \cdots + a_n b_n(x_k) - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} b_1(x_1) & b_2(x_1) & \cdots & b_n(x_1) \\ b_1(x_2) & b_2(x_2) & \cdots & b_n(x_2) \\ \vdots & \vdots & & \vdots \\ b_1(x_N) & b_2(x_N) & \cdots & b_n(x_N) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
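A minimal sketch of the general case, with a hypothetical choice of basis functions (constant, sine, cosine) and Xdata, Ydata again assumed to be column vectors:

B = {@(x) ones(size(x)), @sin, @cos};   % basis functions b_1, b_2, b_3
N = length(Xdata);
n = numel(B);
M = zeros(N,n);
for j = 1:n
    M(:,j) = B{j}(Xdata);               % column j holds b_j(x_1),...,b_j(x_N)
end
a = M\Ydata;                            % least-squares coefficients a_1,...,a_n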