Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
~ Curve Fitting ~
Least Squares Regression
Chapter 17
Curve Fitting

• Fit the best curve to a discrete data set and obtain estimates for other data points.
• Two general approaches:
– Data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data.
– Data are very precise: pass a curve (or curves) exactly through each of the points.
• Two common applications in engineering:
– Trend analysis: predicting values of the dependent variable, by extrapolation beyond the data points or interpolation between them.
– Hypothesis testing: comparing an existing mathematical model with measured data.
Simple Statistics

In the sciences, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics:

• Arithmetic mean – the sum of the individual data points (y_i) divided by the number of points:

  \bar{y} = \frac{\sum y_i}{n}, \quad i = 1, \ldots, n

• Standard deviation – a common measure of spread for a sample:

  S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}

  or variance:

  S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}

• Coefficient of variation – quantifies the spread of data (similar to relative error):

  c.v. = \frac{S_y}{\bar{y}} \times 100\%
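As a quick check of these definitions, here is a minimal Python sketch using only the standard library (the function name `simple_stats` is illustrative, not from the text):

```python
import math

def simple_stats(y):
    n = len(y)
    mean = sum(y) / n                        # arithmetic mean
    ss = sum((yi - mean) ** 2 for yi in y)   # sum of squares about the mean
    variance = ss / (n - 1)                  # sample variance S_y^2
    std_dev = math.sqrt(variance)            # standard deviation S_y
    cv = std_dev / mean * 100                # coefficient of variation, %
    return mean, variance, std_dev, cv

mean, var, sd, cv = simple_stats([1, 2, 3, 4, 5])
```

Note the n − 1 divisor: these are sample statistics, not population statistics.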
Linear Regression

• Fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), …, (x_n, y_n)

  Line equation: y = a_0 + a_1 x
  a_0 : intercept
  a_1 : slope

• With y_i the measured value and e the residual error:

  y_i = a_0 + a_1 x_i + e
  e = y_i - a_0 - a_1 x_i
Choosing Criteria for a "Best Fit"

• Minimize the sum of the residual errors for all available data?

  \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)

  Inadequate! (positive and negative errors can cancel)

• Minimize the sum of the absolute values of the errors?

  \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

  Inadequate! (the best-fit line is not unique)

• How about minimizing the maximum distance that an individual point falls from the line? This does not work either!

[Figure: regression line through scattered data points]
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\mathrm{measured}} - y_{i,\mathrm{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• Yields a unique line for a given set of data.
• Need to compute a_0 and a_1 such that S_r is minimized!
Least-Squares Fit of a Straight Line

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i)\, x_i \right] = 0

Since \sum a_0 = n a_0, these give the normal equations, which can be solved simultaneously:

  n a_0 + \left( \sum x_i \right) a_1 = \sum y_i            (1)

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i   (2)
Least-Squares Fit of a Straight Line

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Normal equations, which can be solved simultaneously:

  n a_0 + \left( \sum x_i \right) a_1 = \sum y_i            (1)

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i   (2)

or, in matrix form:

  \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} =
  \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

Solving for the slope:

  a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

Using (1), a_0 can be expressed as

  a_0 = \bar{y} - a_1 \bar{x}

where \bar{x} and \bar{y} are the mean values.
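The closed-form expressions for a_1 and a_0 translate directly into code; a minimal sketch (the function name `linear_fit` is illustrative, not from the text):

```python
def linear_fit(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
    sxx = sum(xi * xi for xi in x)               # sum of x_i^2
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a0 = sy / n - a1 * sx / n                        # intercept: ybar - a1*xbar
    return a0, a1

# points lying exactly on y = 1 + 2x should be recovered exactly
a0, a1 = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
```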
"Goodness" of Our Fit

• S_r = sum of the squares of residuals around the regression line:

  S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• S_t = total sum of the squares around the mean:

  S_t = \sum (y_i - \bar{y})^2

• (S_t − S_r) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

  r^2 = \frac{S_t - S_r}{S_t}

  where r is the correlation coefficient.

• For a perfect fit, S_r = 0 and r = r² = 1: the line explains 100 percent of the variability of the data.
• For r = r² = 0, S_r = S_t and the fit represents no improvement.

[Figure: the spread of the data (a) around the mean and (b) around the best-fit line; notice the improvement in the error due to linear regression.]
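A sketch of how S_t, S_r, and r² follow from a fitted line; `linear_fit` below just re-implements the closed-form slope and intercept from the previous slides (both names are illustrative):

```python
def linear_fit(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    return a0, a1

def r_squared(x, y, a0, a1):
    ybar = sum(y) / len(y)
    st = sum((yi - ybar) ** 2 for yi in y)                       # spread about the mean
    sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread about the line
    return (st - sr) / st

a0, a1 = linear_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
r2 = r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], a0, a1)  # near-linear data
```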
Linearization of Nonlinear Relationships

(a) Data that are ill-suited for linear least-squares regression.
(b) Indication that a parabola may be more suitable.

Exponential equation:

  y = \alpha_1 e^{\beta_1 x}

Taking the natural logarithm linearizes it:

  \ln y = \ln \alpha_1 + \beta_1 x

so a plot of ln y versus x gives a straight line with slope \beta_1 and intercept \ln \alpha_1.
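A minimal sketch of this linearization (requires all y_i > 0; the name `fit_exponential` and the sample data are illustrative):

```python
import math

def fit_exponential(x, y):
    ly = [math.log(yi) for yi in y]              # ln y = ln(alpha) + beta*x
    n = len(x)
    sx, sly = sum(x), sum(ly)
    sxy = sum(xi * li for xi, li in zip(x, ly))
    sxx = sum(xi * xi for xi in x)
    beta = (n * sxy - sx * sly) / (n * sxx - sx ** 2)   # slope of ln y vs x
    alpha = math.exp(sly / n - beta * sx / n)           # exp(intercept)
    return alpha, beta

# data generated from y = 2 e^{0.5 x} should be recovered exactly
alpha, beta = fit_exponential([0, 1, 2], [2.0, 2.0 * math.e ** 0.5, 2.0 * math.e])
```

Note this minimizes the squared error of ln y, not of y itself, so it weights the data differently than a direct nonlinear fit would.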
Linearization of Nonlinear Relationships

Saturation-growth-rate equation:

  y = \alpha_3 \frac{x}{\beta_3 + x}  \quad\Rightarrow\quad  \frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \frac{1}{x}

Power equation:

  y = \alpha_2 x^{\beta_2}  \quad\Rightarrow\quad  \log y = \beta_2 \log x + \log \alpha_2
Data to be fit to the power equation y = \alpha_2 x^{\beta_2}:

  x   y
  1   0.5
  2   1.7
  3   3.4
  4   5.7
  5   8.4

Taking the base-10 logarithm linearizes the model:

  \log y = \beta_2 \log x + \log \alpha_2

After the (log y)–(log x) table is obtained,

  log x   0       0.301   0.477   0.602   0.699
  log y   −0.301  0.230   0.531   0.756   0.924

find \beta_2 and \alpha_2 using: slope = \beta_2, intercept = \log \alpha_2.

Linear regression yields the result:

  \log y = 1.75 \log x - 0.300

so \beta_2 = 1.75, \log \alpha_2 = -0.300, \alpha_2 = 0.5, and

  y = 0.5\, x^{1.75}

See Exercises.xls
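The worked example above can be reproduced with a short script, assuming base-10 logarithms as in the slide (the name `fit_power` is illustrative):

```python
import math

def fit_power(x, y):
    lx = [math.log10(xi) for xi in x]
    ly = [math.log10(yi) for yi in y]
    n = len(x)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(a * b for a, b in zip(lx, ly))
    sxx = sum(a * a for a in lx)
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope = beta_2
    alpha = 10 ** (sy / n - beta * sx / n)             # 10^(intercept) = alpha_2
    return alpha, beta

alpha, beta = fit_power([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])
# the slide's result: beta_2 = 1.75, alpha_2 = 0.5, i.e. y = 0.5 x^1.75
```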
Polynomial Regression

• Some engineering data is poorly represented by a straight line; a curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.

• As an example, consider a second-order polynomial fit to the data points:

  y = a_0 + a_1 x + a_2 x^2

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

  \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

gives the normal equations:

  n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i

  \left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i
Polynomial Regression

• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations ((m + 1) equations in (m + 1) unknowns).

Matrix form:

  \begin{bmatrix}
  n & \sum x_i & \cdots & \sum x_i^m \\
  \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\
  \vdots & \vdots & \ddots & \vdots \\
  \sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m}
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
  =
  \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}
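A sketch of building and solving these normal equations for an arbitrary order m, with Gaussian elimination (partial pivoting) standing in for whatever linear solver you prefer; the name `polyfit_normal` is illustrative:

```python
def polyfit_normal(x, y, m):
    n = m + 1
    # normal-equations matrix: A[i][j] = sum of x_k^(i+j)
    A = [[float(sum(xk ** (i + j) for xk in x)) for j in range(n)] for i in range(n)]
    b = [float(sum(yk * xk ** i for xk, yk in zip(x, y))) for i in range(n)]
    # forward elimination with partial pivoting
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    # back substitution
    a = [0.0] * n
    for k in range(n - 1, -1, -1):
        a[k] = (b[k] - sum(A[k][c] * a[c] for c in range(k + 1, n))) / A[k][k]
    return a  # [a0, a1, ..., am]

# points lying exactly on y = 1 + x^2 should be recovered exactly
coeffs = polyfit_normal([0, 1, 2, 3], [1, 2, 5, 10], 2)
```

Note the normal-equations matrix becomes badly ill-conditioned as m grows, so production solvers usually prefer QR-based approaches over solving the normal equations directly.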
Multiple Linear Regression

• A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example:

  y = a_0 + a_1 x_1 + a_2 x_2

• For this two-dimensional case, the regression "line" becomes a plane, as shown in the figure below.
Multiple Linear Regression

Example (two variables). Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

  \frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

gives the normal equations:

  n a_0 + \left( \sum x_{1i} \right) a_1 + \left( \sum x_{2i} \right) a_2 = \sum y_i

  \left( \sum x_{1i} \right) a_0 + \left( \sum x_{1i}^2 \right) a_1 + \left( \sum x_{1i} x_{2i} \right) a_2 = \sum x_{1i} y_i

  \left( \sum x_{2i} \right) a_0 + \left( \sum x_{1i} x_{2i} \right) a_1 + \left( \sum x_{2i}^2 \right) a_2 = \sum x_{2i} y_i

or, in matrix form:

  \begin{bmatrix}
  n & \sum x_{1i} & \sum x_{2i} \\
  \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\
  \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
  =
  \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}

Which method would you use to solve this linear system of equations?
Multiple Linear Regression

Example 17.6

The following data were calculated from the equation y = 5 + 4x_1 − 3x_2:

  x1    x2   y
  0     0    5
  2     1    10
  2.5   2    9
  1     3    0
  4     6    3
  7     2    27

Use multiple linear regression to fit this data.

Solution: substituting the sums into the normal equations gives

  \begin{bmatrix}
  6 & 16.5 & 14 \\
  16.5 & 76.25 & 48 \\
  14 & 48 & 54
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
  =
  \begin{bmatrix} 54 \\ 243.5 \\ 100 \end{bmatrix}

This system can be solved using Gauss elimination. The result is a_0 = 5, a_1 = 4, and a_2 = −3:

  y = 5 + 4x_1 - 3x_2
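Example 17.6 can be verified by assembling the normal equations from the data and solving by Gauss elimination, as suggested above; a sketch (the name `fit_two_vars` is illustrative):

```python
def fit_two_vars(x1, x2, y):
    n = len(y)
    sx1, sx2, sy = sum(x1), sum(x2), sum(y)
    sx1x1 = sum(a * a for a in x1)
    sx2x2 = sum(b * b for b in x2)
    sx1x2 = sum(a * b for a, b in zip(x1, x2))
    sx1y = sum(a * c for a, c in zip(x1, y))
    sx2y = sum(b * c for b, c in zip(x2, y))
    # normal equations in matrix form
    A = [[n, sx1, sx2], [sx1, sx1x1, sx1x2], [sx2, sx1x2, sx2x2]]
    r = [sy, sx1y, sx2y]
    # forward elimination (naive Gauss; fine for this well-behaved system)
    for k in range(3):
        for i in range(k + 1, 3):
            f = A[i][k] / A[k][k]
            A[i] = [u - f * v for u, v in zip(A[i], A[k])]
            r[i] -= f * r[k]
    # back substitution
    a = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        a[k] = (r[k] - sum(A[k][j] * a[j] for j in range(k + 1, 3))) / A[k][k]
    return a

a0, a1, a2 = fit_two_vars([0, 2, 2.5, 1, 4, 7], [0, 1, 2, 3, 6, 2],
                          [5, 10, 9, 0, 3, 27])
# the slide's result: a0 = 5, a1 = 4, a2 = -3
```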