Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
~ Curve Fitting ~
Least Squares Regression
Chapter 17
Curve Fitting

• Fit the best curve to a discrete data set and obtain estimates for other data points.
• Two general approaches:
– Data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data.
– Data are very precise: pass a curve (or curves) exactly through each of the points.
• Two common applications in engineering:
– Trend analysis: predicting values of the dependent variable, by extrapolation beyond the data points or interpolation between them.
– Hypothesis testing: comparing an existing mathematical model with measured data.
Simple Statistics

In the sciences, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics:

• Arithmetic mean – the sum of the individual data points (y_i) divided by the number of points:

  \bar{y} = \frac{\sum y_i}{n}, \quad i = 1, \ldots, n

• Standard deviation – a common measure of spread for a sample:

  S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}

  or variance:

  S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}

• Coefficient of variation – quantifies the spread of data (similar to relative error):

  c.v. = \frac{S_y}{\bar{y}} \times 100\%
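As a quick check of these definitions, here is a minimal Python sketch using only the standard library (the function name `simple_stats` is illustrative, not from the text):

```python
import math

def simple_stats(y):
    n = len(y)
    mean = sum(y) / n                        # arithmetic mean
    ss = sum((yi - mean) ** 2 for yi in y)   # sum of squares about the mean
    variance = ss / (n - 1)                  # sample variance S_y^2
    std_dev = math.sqrt(variance)            # standard deviation S_y
    cv = std_dev / mean * 100                # coefficient of variation, %
    return mean, variance, std_dev, cv

mean, var, sd, cv = simple_stats([1, 2, 3, 4, 5])
```

Note the n − 1 divisor: these are sample statistics, not population statistics.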
Linear Regression

• Fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), …, (x_n, y_n)

  Line equation: y = a_0 + a_1 x
  a_0 : intercept
  a_1 : slope

• With y_i the measured value and e the residual error:

  y_i = a_0 + a_1 x_i + e
  e = y_i - a_0 - a_1 x_i
Choosing Criteria for a "Best Fit"

• Minimize the sum of the residual errors for all available data?

  \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)

  Inadequate! (positive and negative errors can cancel)

• Minimize the sum of the absolute values of the errors?

  \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

  Inadequate! (the best-fit line is not unique)

• How about minimizing the maximum distance that an individual point falls from the line? This does not work either!

[Figure: regression line through scattered data points]
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\mathrm{measured}} - y_{i,\mathrm{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• Yields a unique line for a given set of data.
• Need to compute a_0 and a_1 such that S_r is minimized!
Least-Squares Fit of a Straight Line

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i)\, x_i \right] = 0

Since \sum a_0 = n a_0, these give the normal equations, which can be solved simultaneously:

  n a_0 + \left( \sum x_i \right) a_1 = \sum y_i            (1)

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i   (2)
Least-Squares Fit of a Straight Line

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Normal equations, which can be solved simultaneously:

  n a_0 + \left( \sum x_i \right) a_1 = \sum y_i            (1)

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i   (2)

or, in matrix form:

  \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} =
  \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

Solving for the slope:

  a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

Using (1), a_0 can be expressed as

  a_0 = \bar{y} - a_1 \bar{x}

where \bar{x} and \bar{y} are the mean values.
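The closed-form expressions for a_1 and a_0 translate directly into code; a minimal sketch (the function name `linear_fit` is illustrative, not from the text):

```python
def linear_fit(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
    sxx = sum(xi * xi for xi in x)               # sum of x_i^2
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a0 = sy / n - a1 * sx / n                        # intercept: ybar - a1*xbar
    return a0, a1

# points lying exactly on y = 1 + 2x should be recovered exactly
a0, a1 = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
```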
"Goodness" of Our Fit

• S_r = sum of the squares of residuals around the regression line:

  S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• S_t = total sum of the squares around the mean:

  S_t = \sum (y_i - \bar{y})^2

• (S_t − S_r) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

  r^2 = \frac{S_t - S_r}{S_t}

  where r is the correlation coefficient.

• For a perfect fit, S_r = 0 and r = r² = 1: the line explains 100 percent of the variability of the data.
• For r = r² = 0, S_r = S_t and the fit represents no improvement.

[Figure: the spread of the data (a) around the mean and (b) around the best-fit line; notice the improvement in the error due to linear regression.]
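A sketch of how S_t, S_r, and r² follow from a fitted line; `linear_fit` below just re-implements the closed-form slope and intercept from the previous slides (both names are illustrative):

```python
def linear_fit(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    return a0, a1

def r_squared(x, y, a0, a1):
    ybar = sum(y) / len(y)
    st = sum((yi - ybar) ** 2 for yi in y)                       # spread about the mean
    sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread about the line
    return (st - sr) / st

a0, a1 = linear_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
r2 = r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], a0, a1)  # near-linear data
```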
Linearization of Nonlinear Relationships

(a) Data that are ill-suited for linear least-squares regression.
(b) Indication that a parabola may be more suitable.

Exponential equation:

  y = \alpha_1 e^{\beta_1 x}

Taking the natural logarithm linearizes it:

  \ln y = \ln \alpha_1 + \beta_1 x

so a plot of ln y versus x gives a straight line with slope \beta_1 and intercept \ln \alpha_1.
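A minimal sketch of this linearization (requires all y_i > 0; the name `fit_exponential` and the sample data are illustrative):

```python
import math

def fit_exponential(x, y):
    ly = [math.log(yi) for yi in y]              # ln y = ln(alpha) + beta*x
    n = len(x)
    sx, sly = sum(x), sum(ly)
    sxy = sum(xi * li for xi, li in zip(x, ly))
    sxx = sum(xi * xi for xi in x)
    beta = (n * sxy - sx * sly) / (n * sxx - sx ** 2)   # slope of ln y vs x
    alpha = math.exp(sly / n - beta * sx / n)           # exp(intercept)
    return alpha, beta

# data generated from y = 2 e^{0.5 x} should be recovered exactly
alpha, beta = fit_exponential([0, 1, 2], [2.0, 2.0 * math.e ** 0.5, 2.0 * math.e])
```

Note this minimizes the squared error of ln y, not of y itself, so it weights the data differently than a direct nonlinear fit would.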
Linearization of Nonlinear Relationships

Saturation-growth-rate equation:

  y = \alpha_3 \frac{x}{\beta_3 + x}  \quad\Rightarrow\quad  \frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \frac{1}{x}

Power equation:

  y = \alpha_2 x^{\beta_2}  \quad\Rightarrow\quad  \log y = \beta_2 \log x + \log \alpha_2
Data to be fit to the power equation y = \alpha_2 x^{\beta_2}:

  x   y
  1   0.5
  2   1.7
  3   3.4
  4   5.7
  5   8.4

Taking the base-10 logarithm linearizes the model:

  \log y = \beta_2 \log x + \log \alpha_2

After the (log y)–(log x) table is obtained,

  log x   0       0.301   0.477   0.602   0.699
  log y   −0.301  0.230   0.531   0.756   0.924

find \beta_2 and \alpha_2 using: slope = \beta_2, intercept = \log \alpha_2.

Linear regression yields the result:

  \log y = 1.75 \log x - 0.300

so \beta_2 = 1.75, \log \alpha_2 = -0.300, \alpha_2 = 0.5, and

  y = 0.5\, x^{1.75}

See Exercises.xls
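The worked example above can be reproduced with a short script, assuming base-10 logarithms as in the slide (the name `fit_power` is illustrative):

```python
import math

def fit_power(x, y):
    lx = [math.log10(xi) for xi in x]
    ly = [math.log10(yi) for yi in y]
    n = len(x)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(a * b for a, b in zip(lx, ly))
    sxx = sum(a * a for a in lx)
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope = beta_2
    alpha = 10 ** (sy / n - beta * sx / n)             # 10^(intercept) = alpha_2
    return alpha, beta

alpha, beta = fit_power([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])
# the slide's result: beta_2 = 1.75, alpha_2 = 0.5, i.e. y = 0.5 x^1.75
```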
Polynomial Regression

• Some engineering data is poorly represented by a straight line; a curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.

• As an example, consider a second-order polynomial fit to the data points:

  y = a_0 + a_1 x + a_2 x^2

Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

  \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

gives the normal equations:

  n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i

  \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i

  \left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i
Polynomial Regression

• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations ((m + 1) equations in (m + 1) unknowns).

Matrix form:

  \begin{bmatrix}
  n & \sum x_i & \cdots & \sum x_i^m \\
  \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\
  \vdots & \vdots & \ddots & \vdots \\
  \sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m}
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
  =
  \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}
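A sketch of building and solving these normal equations for an arbitrary order m, with Gaussian elimination (partial pivoting) standing in for whatever linear solver you prefer; the name `polyfit_normal` is illustrative:

```python
def polyfit_normal(x, y, m):
    n = m + 1
    # normal-equations matrix: A[i][j] = sum of x_k^(i+j)
    A = [[float(sum(xk ** (i + j) for xk in x)) for j in range(n)] for i in range(n)]
    b = [float(sum(yk * xk ** i for xk, yk in zip(x, y))) for i in range(n)]
    # forward elimination with partial pivoting
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    # back substitution
    a = [0.0] * n
    for k in range(n - 1, -1, -1):
        a[k] = (b[k] - sum(A[k][c] * a[c] for c in range(k + 1, n))) / A[k][k]
    return a  # [a0, a1, ..., am]

# points lying exactly on y = 1 + x^2 should be recovered exactly
coeffs = polyfit_normal([0, 1, 2, 3], [1, 2, 5, 10], 2)
```

Note the normal-equations matrix becomes badly ill-conditioned as m grows, so production solvers usually prefer QR-based approaches over solving the normal equations directly.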
Multiple Linear Regression

• A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example:

  y = a_0 + a_1 x_1 + a_2 x_2

• For this two-dimensional case, the regression "line" becomes a plane, as shown in the figure below.
Multiple Linear Regression

Example (two variables). Minimize the error:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2

Setting the partial derivatives to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

  \frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) = 0

gives the normal equations:

  n a_0 + \left( \sum x_{1i} \right) a_1 + \left( \sum x_{2i} \right) a_2 = \sum y_i

  \left( \sum x_{1i} \right) a_0 + \left( \sum x_{1i}^2 \right) a_1 + \left( \sum x_{1i} x_{2i} \right) a_2 = \sum x_{1i} y_i

  \left( \sum x_{2i} \right) a_0 + \left( \sum x_{1i} x_{2i} \right) a_1 + \left( \sum x_{2i}^2 \right) a_2 = \sum x_{2i} y_i

or, in matrix form:

  \begin{bmatrix}
  n & \sum x_{1i} & \sum x_{2i} \\
  \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\
  \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
  =
  \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}

Which method would you use to solve this linear system of equations?
Multiple Linear Regression

Example 17.6

The following data were calculated from the equation y = 5 + 4x_1 − 3x_2:

  x1    x2   y
  0     0    5
  2     1    10
  2.5   2    9
  1     3    0
  4     6    3
  7     2    27

Use multiple linear regression to fit this data.

Solution: substituting the sums into the normal equations gives

  \begin{bmatrix}
  6 & 16.5 & 14 \\
  16.5 & 76.25 & 48 \\
  14 & 48 & 54
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
  =
  \begin{bmatrix} 54 \\ 243.5 \\ 100 \end{bmatrix}

This system can be solved using Gauss elimination. The result is a_0 = 5, a_1 = 4, and a_2 = −3:

  y = 5 + 4x_1 - 3x_2
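Example 17.6 can be verified by assembling the normal equations from the data and solving by Gauss elimination, as suggested above; a sketch (the name `fit_two_vars` is illustrative):

```python
def fit_two_vars(x1, x2, y):
    n = len(y)
    sx1, sx2, sy = sum(x1), sum(x2), sum(y)
    sx1x1 = sum(a * a for a in x1)
    sx2x2 = sum(b * b for b in x2)
    sx1x2 = sum(a * b for a, b in zip(x1, x2))
    sx1y = sum(a * c for a, c in zip(x1, y))
    sx2y = sum(b * c for b, c in zip(x2, y))
    # normal equations in matrix form
    A = [[n, sx1, sx2], [sx1, sx1x1, sx1x2], [sx2, sx1x2, sx2x2]]
    r = [sy, sx1y, sx2y]
    # forward elimination (naive Gauss; fine for this well-behaved system)
    for k in range(3):
        for i in range(k + 1, 3):
            f = A[i][k] / A[k][k]
            A[i] = [u - f * v for u, v in zip(A[i], A[k])]
            r[i] -= f * r[k]
    # back substitution
    a = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        a[k] = (r[k] - sum(A[k][j] * a[j] for j in range(k + 1, 3))) / A[k][k]
    return a

a0, a1, a2 = fit_two_vars([0, 2, 2.5, 1, 4, 7], [0, 1, 2, 3, 6, 2],
                          [5, 10, 9, 0, 3, 27])
# the slide's result: a0 = 5, a1 = 4, a2 = -3
```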