Regression
UC Berkeley, Fall 2004, E77
http://jagger.me.berkeley.edu/~pack/e77
Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Info
Midterm next Friday (11/5)
1-2 if you actually enrolled in my section. Check BlackBoard to see the room location.
If you have a conflict (as before), bring a letter, schedule printout, etc. to class next Monday so that we can make arrangements.
Review Session Wednesday 11/3 evening. Check BlackBoard
HW and Lab due this Friday (10/29)
Mid-Course evaluation on BlackBoard. Do it by Thursday, 11/4 at noon. We can’t see your answers, but we know if you’ve done it. Get an extra point.
Regression: Curve-fitting with minimum error
Given (x,y) data pairs
(x1,y1), (x2,y2), … , (xN,yN)
from a prespecified collection of “simple” functions (for example: all linear functions), find one that “explains the data pairs with minimum error.”
For a given function f, the mismatch (error) is defined componentwise:
e1 = f(x1) – y1
e2 = f(x2) – y2
…
ek = f(xk) – yk
…
eN = f(xN) – yN
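As a quick illustration (a sketch only: the data values and the candidate line below are made up, not from the slides), the error vector can be computed in MATLAB componentwise:

% Made-up data and an arbitrary candidate function, for illustration only
xdata = [-2; 0; 1; 3];
ydata = [ 1; 2; 2; 5];
f = @(x) 0.8*x + 1.7;        % one candidate from the "simple" collection
e = f(xdata) - ydata         % ek = f(xk) - yk, one error per data pair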
Fitting data with a linear function
[Figure: data points and a fitted linear function plotted against X and Y; the vertical deviations between the points and the line are the errors ei, some positive and some negative.]
Straight-line functions
How does it work if the function f is to be of the form
f(x) = ax + b
for to-be-chosen parameters a and b?
Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)
For fixed values of a and b, the mismatch (error) is
e1 = ax1+b – y1
e2 = ax2+b – y2
…
eN = axN+b – yN
Stacking the errors gives
$$\underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}}_{\text{Goal: make this small}} = \underbrace{\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}}_{\text{“data”}} \underbrace{\begin{bmatrix} a \\ b \end{bmatrix}}_{\text{by choosing}} - \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{“data”}}$$
Measuring the “amount” of mismatch
There are several ways to quantify the amount of mismatch. All have the property that if one component of the mismatch is “big,” then the measure-of-mismatch is big.
Our eventual choice (the sum-of-squares measure below) is motivated for a few reasons:
– It leads to least squares problems, to which we have already been exposed. And it makes sense. And…
– Under “reasonable” assumptions about the cause of the mismatch (independent, random, zero-mean, identically distributed, Gaussian additive errors in observing y), it is the best measure of how likely a candidate function is to have led to the observed data.
The candidate measures include
$$|e_1| + |e_2| + \cdots + |e_N|, \qquad \max\left\{|e_1|, |e_2|, \ldots, |e_N|\right\}, \qquad \left(e_1^2 + e_2^2 + \cdots + e_N^2\right)^{1/2}$$
For convenience, we pick the last: the square-root of the sum-of-squares. (The underlying model is yi = f(xi) + “noise” in measurement.)
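In MATLAB, the three candidate measures applied to an error vector e would read as follows (a sketch; the vector e is made up):

e = [3; -1; 0.5];        % example error vector, values made up
m1 = sum(abs(e));        % sum of absolute values
m2 = max(abs(e));        % largest absolute component
m3 = sqrt(sum(e.^2));    % square-root of sum-of-squares (the measure we pick)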
Euclidean Norms of vectors
If v is an m-by-1 column (or row) vector, the “norm of v” is defined as
$$\|v\| := \left( \sum_{k=1}^{m} v_k^2 \right)^{1/2}$$
The symbol ||v|| denotes the “norm of v”: the square-root of the sum-of-squares of the components, generalizing the Pythagorean theorem.
The norm of a vector is a measure of its length. Some facts:
||v||=0 if and only if every component of v is zero
||v + w|| ≤ ||v|| + ||w||
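A quick sanity check of the definition in MATLAB (an arbitrary example vector; norm is the built-in Euclidean norm):

>> v = [3; 4; 12];
>> norm(v)           % built-in Euclidean norm: 13
>> sqrt(sum(v.^2))   % same value, straight from the definition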
Straight-line functions
Given (x,y) data pairs (x1,y1), (x2,y2), … , (xN,yN)
$$\min_{a,b} \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$
The quantity inside the norm is the “e” vector, so the objective is ||e||.
This says: “By choice of a and b, minimize the Euclidean norm of the mismatch.”
The “Least Squares” Problem
If A is an n-by-m array, and b is an n-by-1 vector, let c* be the smallest possible (over all choices of m-by-1 vectors x) mismatch between Ax and b (i.e., pick x to make Ax as much like b as possible):
$$c^* := \min_{x\;(m\text{-by-}1)} \|Ax - b\|$$
Read it as: c* “is defined as” “the minimum, over all m-by-1 vectors x,” of “the length (i.e., norm) of the difference/mismatch between Ax and b.”
Four cases for Least Squares
Recall the least squares formulation
$$c^* := \min_{x\;(m\text{-by-}1)} \|Ax - b\|$$
There are 4 scenarios:
c* = 0: the equation Ax=b has at least one solution
– only one x vector achieves this minimum
– many different x vectors achieve the minimum
c* > 0: the equation Ax=b has no solutions
– only one x vector achieves this minimum
– many different x vectors achieve the minimum
In regression, the case “c* > 0, with only one x achieving the minimum” is almost always the one that occurs.
The backslash operator
If A is an n-by-m array, and b is an n-by-1 vector, then
>> x = A\b
solves the “least squares” problem. Namely
–If there is an x which solves Ax=b, then this x is computed
–If there is no x which solves Ax=b, then an x which minimizes the mismatch between Ax and b is computed.
In the case where many x satisfy one of the criteria above, the smallest (in terms of vector norm) such x is computed.
So, mismatch is handled first: among all equally suitable x vectors that minimize the mismatch, the smallest one is chosen.
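A minimal sketch of backslash at work, with a made-up overdetermined system (more equations than unknowns, so typically c* > 0):

A = [1 1; 2 1; 3 1; 4 1];   % 4 equations, 2 unknowns (made-up data)
b = [2; 3; 5; 6];
x = A\b;                    % least squares solution
cstar = norm(A*x - b)       % the minimized mismatch c*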
Linear Regression Code
function [a,b] = linreg(Xdata,Ydata)
% Fits a linear function Y = aX + b
% to the data given by Xdata, Ydata
Xdata = Xdata(:);    % force column vectors
Ydata = Ydata(:);
if length(Xdata) ~= length(Ydata)
    error('Xdata and Ydata must have the same length');
end
N = length(Xdata);
optpara = [Xdata ones(N,1)]\Ydata;   % least squares solve
a = optpara(1);
b = optpara(2);
The backslash call solves
$$\min_{a,b} \left\| \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|$$
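A usage sketch for linreg (the data is made up, generated from a noisy line with slope 2 and intercept -1, so the fit should come out near those values):

Xdata = (0:0.5:5)';                            % made-up sample points
Ydata = 2*Xdata - 1 + 0.1*randn(size(Xdata));  % noisy straight-line data
[a,b] = linreg(Xdata,Ydata)                    % expect a near 2, b near -1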
Quadratic functions
How does it work if the function f is to be of the form
f(x) = ax² + bx + c
for to-be-chosen parameters a, b and c?
For fixed values of a, b and c, the error at (xk,yk) is
ek = axk² + bxk + c – yk
Stacking the errors gives
$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_N^2 & x_N & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
The k’th row of the matrix-vector product is f(xk).
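Following the same recipe as linreg, the quadratic fit needs only one extra column in the matrix. A sketch (not code from the slides; the data is made up):

Xdata = (-3:0.5:3)';                           % made-up data
Ydata = 1.5*Xdata.^2 - Xdata + 2 + 0.2*randn(size(Xdata));
N = length(Xdata);
abc = [Xdata.^2 Xdata ones(N,1)]\Ydata;        % least squares solve for [a; b; c]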
Polynomial functions
How does it work if the function f is to be of the form
f(x) = a1x^n + a2x^(n-1) + … + anx + an+1
for to-be-chosen parameters a1, a2,…,an+1?
For fixed values of a1, a2,…,an+1, the error at (xk,yk) is
$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
$$e_k = a_1 x_k^n + a_2 x_k^{n-1} + \cdots + a_n x_k + a_{n+1} - y_k$$
The k’th row of the matrix-vector product is f(xk).
Polynomial Regression Pseudo-Code
function p = polyreg(Xdata,Ydata,nOrd)
% Fits an nOrd'th order polynomial
% to the data given by Xdata, Ydata
Xdata = Xdata(:);               % force column vectors
Ydata = Ydata(:);
N = length(Xdata);
RM = zeros(N,nOrd+1);           % regression matrix, one column per power
RM(:,end) = ones(N,1);          % rightmost column: x^0
for i=1:nOrd
    RM(:,end-i) = RM(:,end-i+1).*Xdata;   % next higher power of x
end
p = RM\Ydata;                   % least squares solve
p = p.';                        % row vector, descending powers
The loop fills in the regression matrix column by column, and backslash then solves
$$\min_{a_1,\ldots,a_{n+1}} \left\| \begin{bmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_N^n & x_N^{n-1} & \cdots & x_N & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n+1} \end{bmatrix} - \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} \right\|$$
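A usage sketch for polyreg (made-up cubic data). MATLAB’s built-in polyfit solves the same least squares problem, so it should return essentially the same coefficients:

Xdata = (0:0.25:2)';                           % made-up sample points
Ydata = Xdata.^3 - 2*Xdata + 1 + 0.05*randn(size(Xdata));
p  = polyreg(Xdata,Ydata,3)                    % coefficients, descending powers
pf = polyfit(Xdata,Ydata,3)                    % built-in; same answer up to roundoff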
General “basis” functions
How does it work if the function f is to be of the form
f(x) = a1·b1(x) + a2·b2(x) + … + an·bn(x)
for fixed functions b1, …, bn (called “basis” functions) and to-be-chosen parameters a1, a2, …, an?
For fixed values of a1, a2, …, an, the error at (xk,yk) is
ek = a1·b1(xk) + a2·b2(xk) + … + an·bn(xk) – yk
Stacking the errors gives
$$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} b_1(x_1) & b_2(x_1) & \cdots & b_n(x_1) \\ b_1(x_2) & b_2(x_2) & \cdots & b_n(x_2) \\ \vdots & \vdots & & \vdots \\ b_1(x_N) & b_2(x_N) & \cdots & b_n(x_N) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
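A sketch of the general recipe in MATLAB, with the basis {1, sin(x), cos(x)} chosen purely for illustration (nothing in the slides fixes this choice):

% Column k of B holds bk evaluated at x1, ..., xN
Xdata = linspace(0,2*pi,50)';                      % made-up sample points
Ydata = 2 + 3*sin(Xdata) - cos(Xdata) + 0.1*randn(50,1);
B = [ones(size(Xdata)) sin(Xdata) cos(Xdata)];     % basis matrix
a = B\Ydata                                        % least squares coefficients a1, a2, a3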