
Page 1:

Regression

UC Berkeley, Fall 2004, E77

http://jagger.me.berkeley.edu/~pack/e77

Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Page 2:

Info

Midterm next Friday (11/5)

1-2 if you are actually enrolled in my section. Check BlackBoard for the room location.

If you have a conflict (as before), bring a letter, schedule printout, etc. to class next Monday so that we can make arrangements.

Review Session Wednesday 11/3 evening. Check BlackBoard

HW and Lab due this Friday (10/29)

Mid-Course evaluation on BlackBoard. Do it by Thursday, 11/4 at noon. We can’t see your answers, but we know if you’ve done it. Get an extra point.

Page 3:

Regression: Curve-fitting with minimum error

Given (x,y) data pairs

(x1,y1), (x2,y2), … , (xN,yN)

From a prespecified collection of "simple" functions – for example, all linear functions –

find one that "explains the data pairs with minimum error."

For a given function f, the mismatch (error) at each data pair is defined:

e1 = f(x1) – y1

e2 = f(x2) – y2

⋮

ek = f(xk) – yk

⋮

eN = f(xN) – yN

Page 4:

Fitting data with a linear function

[Figure: data points and a fitted linear function plotted against X and Y; the vertical mismatches e_i between the line and the points are marked, some positive and some negative.]

Page 5:

Straight-line functions

How does it work if the function f is to be of the form

f(x) = ax + b

for to-be-chosen parameters a and b?

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

For fixed values of a and b, the mismatch (error) is

e1 = ax1+b – y1

e2 = ax2+b – y2

eN = axN+b – yN

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Here the x_k and y_k entries are the "data". Goal: make the error vector on the left small, by choosing a and b.
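To make the matrix expression concrete, here is a minimal MATLAB sketch with made-up data and candidate parameters:

x = [-6; -2; 0; 3; 7];           % example x data (column vector)
y = [-4; -2; -1; 0.5; 2];        % example y data
a = 0.4; b = -1;                 % one candidate slope and intercept
e = [x ones(5,1)]*[a; b] - y;    % the error vector from the equation above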

Page 6:

Measuring the “amount” of mismatch

Several ways to quantify the amount of mismatch. All have the property that if one component of the mismatch is "big", then the measure-of-mismatch is big:

$$|e_1| + |e_2| + \cdots + |e_N|$$

$$\max\left(|e_1|,\, |e_2|,\, \ldots,\, |e_N|\right)$$

$$\left(e_1^2 + e_2^2 + \cdots + e_N^2\right)^{1/2}$$

For convenience, we pick the last one, the square root of the sum of squares. This choice is motivated for a few reasons:

– It leads to least squares problems, to which we have already been exposed. And it makes sense. And…

– Under "reasonable" assumptions about the cause of the mismatch, namely $y_i = f(x_i) + \epsilon_i$ with independent, random, zero-mean, identically distributed, Gaussian additive errors $\epsilon_i$ ("noise" in the measurement of y), it is the best measure of how likely a candidate function is to have led to the data observed.
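For a concrete comparison, the three measures can be computed in MATLAB for any error vector e (the values below are made up):

e  = [1; -3; 2];        % example error vector
m1 = sum(abs(e));       % sum of absolute values
m2 = max(abs(e));       % largest absolute component
m3 = sqrt(sum(e.^2));   % square root of the sum of squares
m3b = norm(e);          % the same value, via the built-in norm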

Page 7:

Euclidean Norms of vectors

If v is an m-by-1 column (or row) vector, the “norm of v” is defined as

$$\|v\| := \left(\sum_{k=1}^{m} v_k^2\right)^{1/2}$$

The symbol $\|v\|$ denotes the "norm of v".

Square-root of the sum-of-squares of components, generalizing the Pythagorean theorem.

The norm of a vector is a measure of its length. Some facts:

||v||=0 if and only if every component of v is zero

||v + w|| ≤ ||v|| + ||w||
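In MATLAB the built-in function norm computes this quantity; a minimal check of the definition:

v   = [3; 4];
nv  = sqrt(sum(v.^2));  % 5, from the definition
nv2 = norm(v);          % same value from the built-in function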

Page 8:

Straight-line functions

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$

The quantity inside the norm is the "e" vector, so the quantity being minimized is ||e||.

This says: “By choice of a and b, minimize the Euclidean norm of the mismatch.”

Page 9:

The “Least Squares” Problem

If A is an n-by-m array, and b is an n-by-1 vector, let c* be the smallest possible (over all choices of m-by-1 vectors x) mismatch between Ax and b (i.e., pick x to make Ax as much like b as possible).

$$c^* := \min_{x \in \mathbb{R}^{m \times 1}} \|Ax - b\|$$

Reading left to right: ":=" is "is defined as"; "min" is "the minimum, over all m-by-1 vectors x", of "the length (i.e., norm) of the difference/mismatch between Ax and b."

Page 10:

Four cases for Least Squares

Recall the least squares formulation

$$c^* := \min_{x \in \mathbb{R}^{m \times 1}} \|Ax - b\|$$

There are 4 scenarios:

c* = 0: the equation Ax=b has at least one solution
– only one x vector achieves this minimum, or
– many different x vectors achieve the minimum

c* > 0: the equation Ax=b has no solutions
– only one x vector achieves this minimum, or
– many different x vectors achieve the minimum

In regression, the case c* > 0 (the equation has no exact solution) is almost always the one that occurs.

Page 11:

The backslash operator

If A is an n-by-m array, and b is an n-by-1 vector, then

>> x = A\b

solves the “least squares” problem. Namely

–If there is an x which solves Ax=b, then this x is computed

–If there is no x which solves Ax=b, then an x which minimizes the mismatch between Ax and b is computed.

In the case where many x satisfy one of the criteria above, a smallest (in terms of vector norm) such x is computed.

So, mismatch is handled first. Among all equally suitable x vectors that minimize the mismatch, choose a smallest one.
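A minimal sketch of the overdetermined case (the numbers are made up for illustration):

A = [1 1; 2 1; 3 1; 4 1];   % 4 equations, 2 unknowns: no exact solution in general
b = [1; 1.9; 3.2; 3.9];
x = A\b;                    % least-squares solution
cstar = norm(A*x - b);      % the smallest achievable mismatch c*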

Page 12:

Straight-line functions

Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$

The quantity inside the norm is the "e" vector, so the quantity being minimized is ||e||.

This says: “By choice of a and b, minimize the Euclidean norm of the mismatch.”

Page 13:

Linear Regression Code

function [a,b] = linreg(Xdata,Ydata)
% Fits a linear function Y = aX + b
% to the data given by Xdata, Ydata.
Xdata = Xdata(:);                    % force column vectors
Ydata = Ydata(:);
if length(Xdata) ~= length(Ydata)    % verify same length
    error('Xdata and Ydata must have the same length.');
end
N = length(Xdata);
optpara = [Xdata ones(N,1)]\Ydata;   % least-squares fit via backslash
a = optpara(1);
b = optpara(2);

The backslash call solves

$$\min_{a,b}\; \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$
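A hypothetical usage of linreg, fitting a noisy line (data generated here just for illustration):

Xdata = (0:0.5:10)';                            % 21 sample points
Ydata = 2*Xdata - 3 + 0.5*randn(size(Xdata));   % noisy data from the line y = 2x - 3
[a,b] = linreg(Xdata,Ydata);                    % a should be near 2, b near -3
plot(Xdata,Ydata,'o', Xdata,a*Xdata+b,'-')      % data and fitted line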

Page 14:

Quadratic functions

How does it work if the function f is to be of the form

f(x) = ax² + bx + c

for to-be-chosen parameters a, b and c?

For fixed values of a, b and c, the error at (xk,yk) is

$$e_k = a x_k^2 + b x_k + c - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_N^2 & x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Row k of the matrix-vector product is f(x_k).
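Following the pattern of linreg, a quadratic fit reduces to one backslash call; a minimal sketch, assuming Xdata and Ydata are column vectors of the same length:

N = length(Xdata);
abc = [Xdata.^2 Xdata ones(N,1)]\Ydata;  % columns: x^2, x, 1
a = abc(1); b = abc(2); c = abc(3);      % least-squares quadratic coefficients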

Page 15:

Polynomial functions

How does it work if the function f is to be of the form

$$f(x) = a_1 x^n + a_2 x^{n-1} + \cdots + a_n x + a_{n+1}$$

for to-be-chosen parameters a1, a2,…,an+1?

For fixed values of a1, a2,…,an+1, the error at (xk,yk) is

$$e_k = a_1 x_k^n + a_2 x_k^{n-1} + \cdots + a_n x_k + a_{n+1} - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Row k of the matrix-vector product is f(x_k).

Page 16:

Polynomial Regression Pseudo-Code

function p = polyreg(Xdata,Ydata,nOrd)
% Fits an nOrd'th order polynomial
% to the data given by Xdata (a column
% vector) and Ydata.
N = length(Xdata);
RM = zeros(N,nOrd+1);           % regressor matrix, one column per power
RM(:,end) = ones(N,1);          % last column: x^0
for i = 1:nOrd
    RM(:,end-i) = RM(:,end-i+1).*Xdata;   % next column to the left: x^i
end
p = RM\Ydata;                   % least-squares coefficients, descending powers
p = p.';                        % return as a row vector

The regressor matrix RM built by the loop is the matrix of powers, so p = RM\Ydata minimizes the norm of

$$\begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
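A hypothetical usage of polyreg; MATLAB's built-in polyfit solves the same least-squares problem, so its coefficients (also in descending powers) should agree:

Xdata = linspace(-2,2,25)';                  % 25 sample points
Ydata = Xdata.^3 - Xdata + 0.1*randn(25,1);  % noisy cubic data
p  = polyreg(Xdata,Ydata,3);                 % coefficients, descending powers
p2 = polyfit(Xdata,Ydata,3);                 % built-in; p and p2 should agree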

Page 17:

General “basis” functions

How does it work if the function f is to be of the form

$$f(x) = a_1 b_1(x) + a_2 b_2(x) + \cdots + a_n b_n(x)$$

for fixed functions b_i (called "basis" functions), and to-be-chosen parameters a1, a2,…,an?

For fixed values of a1, a2,…,an, the error at (xk,yk) is

$$e_k = a_1 b_1(x_k) + a_2 b_2(x_k) + \cdots + a_n b_n(x_k) - y_k$$

In matrix form:

$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} b_1(x_1) & b_2(x_1) & \cdots & b_n(x_1) \\ b_1(x_2) & b_2(x_2) & \cdots & b_n(x_2) \\ \vdots & \vdots & & \vdots \\ b_1(x_N) & b_2(x_N) & \cdots & b_n(x_N) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
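A minimal sketch of the general case, with a hypothetical choice of basis functions (constant, sine, cosine) and Xdata, Ydata again assumed to be column vectors:

B = {@(x) ones(size(x)), @sin, @cos};   % basis functions b_1, b_2, b_3
N = length(Xdata);
n = numel(B);
M = zeros(N,n);
for j = 1:n
    M(:,j) = B{j}(Xdata);               % column j holds b_j(x_1),...,b_j(x_N)
end
a = M\Ydata;                            % least-squares coefficients a_1,...,a_n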