
Es272 ch5a


Page 1: Es272 ch5a

Part 5a: LEAST-SQUARES REGRESSION

– Simple Linear Regression

– Polynomial Regression

– Multiple Regression

– Statistical Analysis of L-S Theory

– Non-Linear Regression

Page 2: Es272 ch5a

Introduction:

Consider the falling object in air problem: [Figure: a mass m falling with velocity v; measured data points (t_0, v_0), (t_1, v_1), ..., (t_n, v_n) with a "best fit" curve v(t) through them.]

The (t) values are considered to be error-free, while every measurement of (v) contains some error. Assume the errors in (v) are normally distributed (random error). Find the "best fit" curve to represent v(t).

Page 3: Es272 ch5a

Simple Linear Regression

Consider a set of n scattered data points. Find a line that "best fits" the scattered data:

y = a_0 + a_1 x    (a_0 = intercept, a_1 = slope)

There are a number of ways to define the "best fit" line. However, we want to find one that is unique for a particular set of data. A uniquely defined best-fit line can be found by minimizing the sum of the squares of the residuals from each data point:

S_r = \sum_{i=1}^{n} (y_{i,meas} - y_{i,fit})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

(sum of the squares of the residuals, or spread)

Find a_0 and a_1 that minimize S_r (least squares).

Page 4: Es272 ch5a

To minimize Sr (a0 , a1), differentiate and set to zero:

\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} \left[ (y_i - a_0 - a_1 x_i) x_i \right] = 0

or

n a_0 + \left( \sum x_i \right) a_1 = \sum y_i

\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i

Need to solve these simultaneous equations for the unknowns a_0 and a_1.

Normal equations for simple linear L-S regression.

Page 5: Es272 ch5a

Solution for a_1 and a_0 gives

a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \quad \text{and} \quad a_0 = \bar{y} - a_1 \bar{x}, \qquad \bar{x} = \frac{\sum x_i}{n}, \quad \bar{y} = \frac{\sum y_i}{n}

EX: Find linear fit for the set of measurements:

x  y
1  0.5
2  2.5
3  2.0
4  4.0
5  3.5
6  6.0
7  5.5

n = 7, \quad \sum x_i = 28, \quad \sum y_i = 24, \quad \sum x_i^2 = 140, \quad \sum x_i y_i = 119.5, \quad \bar{x} = 4, \quad \bar{y} = 3.4286

a_1 = \frac{7(119.5) - (28)(24)}{7(140) - (28)^2} = 0.839

a_0 = \bar{y} - a_1 \bar{x} = 3.4286 - 0.839(4) = 0.0714

y = 0.0714 + 0.839 x
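As a quick check, here is a minimal NumPy sketch (not part of the original slides) that reproduces this example using the normal-equation formulas above:

```python
import numpy as np

# Simple linear L-S regression via the normal equations (slide example).
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

n = len(x)
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a0 = y.mean() - a1 * x.mean()
print(a0, a1)  # ~0.0714, ~0.839
```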

Page 6: Es272 ch5a

Quantification of Error:

s_{y/x} = \sqrt{\frac{S_r}{n - 2}} \qquad \text{(standard error of the L-S estimate)}

S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad \text{(sum of the squares of the residuals about the mean)}

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \qquad \text{(sum of the squares of the residuals for the linear regression)}

s_y = \sqrt{\frac{S_t}{n - 1}} \qquad \text{(standard deviation)}

All these approaches are based on the assumptions:
x → error-free
y → normal error

Page 7: Es272 ch5a

"Coefficient of determination" is defined as

r^2 = \frac{S_t - S_r}{S_t}

where r is the "correlation coefficient".

S_r = 0 (r = 1): perfect fit
S_r = S_t (r = 0): no improvement by fitting the line

Alternative formulation for the correlation coefficient:

r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left( \sum x_i \right)^2} \, \sqrt{n \sum y_i^2 - \left( \sum y_i \right)^2}}

Note: r \approx 1 does not always necessarily mean that the fit is "good". You should always plot the data along with the regression curve to see the goodness of the fit. [Figure: four sets of data with the same r = 0.816.]
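A short sketch (added here for illustration) that evaluates the alternative formula on the earlier seven-point example:

```python
import numpy as np

# Correlation coefficient r via the alternative single-pass formula.
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

n = len(x)
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt(n * np.sum(x**2) - np.sum(x)**2) * \
      np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
r = num / den
print(r, r**2)  # r ~0.932, r^2 ~0.868
```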

Page 8: Es272 ch5a

Linearization of non-linear relationships:

Many engineering applications involve non-linear relationships, e.g., exponential, power law, or saturated growth rate.

y = a_1 e^{b_1 x} \quad \text{(exponential)}, \qquad y = a_2 x^{b_2} \quad \text{(power law)}, \qquad y = a_3 \frac{x}{b_3 + x} \quad \text{(saturation growth rate)}

These relationships can be linearized by some mathematical operations:

\ln y = \ln a_1 + b_1 x, \qquad \log y = b_2 \log x + \log a_2, \qquad \frac{1}{y} = \frac{b_3}{a_3} \frac{1}{x} + \frac{1}{a_3}

Linear L-S fit can be applied to find the coefficients.

Page 9: Es272 ch5a

EX: Fit a power law relationship to the following dataset:

Calculate the logarithm of both data columns:

x y

1 0.5

2 1.7

3 3.4

4 5.7

5 8.4

log x log y

0 -0.301

0.301 0.226

0.477 0.534

0.602 0.753

0.699 0.922

Power law model: y = a_2 x^{b_2}  (find a_2 and b_2)

\log y = b_2 \log x + \log a_2

Applying simple linear regression to the log data gives slope = 1.75 and intercept = -0.300:

b_2 = 1.75, \qquad \log a_2 = -0.300 \;\Rightarrow\; a_2 = 0.5

y = 0.5\, x^{1.75}
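The same result can be obtained programmatically; a minimal sketch (assuming base-10 logs, as in the worked example):

```python
import numpy as np

# Power-law fit y = a2 * x**b2 via linear regression in log-log space.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)  # slope, intercept
a2 = 10**log_a2
print(a2, b2)  # ~0.5, ~1.75
```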

Page 10: Es272 ch5a

Polynomial Regression

In some cases, we may want to fit our data to a curve rather than a line. We can then apply polynomial regression (in fact, linear regression is nothing but a first-order polynomial regression).

Data to fit to a second-order polynomial:

y = a_0 + a_1 x + a_2 x^2

Sum of the squares of the residuals (spread):

S_r = \sum_{i=1}^{n} (y_{i,obs} - y_{i,fit})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

To minimize S_r(a_0, a_1, a_2), take derivatives and equate to zero:

\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

Page 11: Es272 ch5a

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

\frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

Three linear equations with three unknowns a_0, a_1, a_2 (all summations are over i = 1..n):

n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i

\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i

\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i

"normal equations"

This set of equations can be solved by any linear solution technique (e.g., Gauss elimination, LU decomposition, Cholesky decomposition, etc.)

Page 12: Es272 ch5a

The approach can be generalized to an order-(m) polynomial in the same way. The fit function becomes

y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m

This will require the solution of an order (m+1) system of linear equations. The standard error becomes

s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}

because (m+1) degrees of freedom were lost from the n data points due to the extraction of the (m+1) coefficients.

EX 17.5: Fit a 2nd-order polynomial to the following data:

x_i  y_i
0  2.1
1  7.7
2  13.6
3  27.2
4  40.9
5  61.1

m = 2, \; n = 6, \; \sum x_i = 15, \; \sum y_i = 152.6, \; \sum x_i^2 = 55, \; \sum x_i^3 = 225, \; \sum x_i^4 = 979, \; \sum x_i y_i = 585.6, \; \sum x_i^2 y_i = 2488.8

Page 13: Es272 ch5a

System of linear equations:

\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}

We get

a_0 = 2.47857, \quad a_1 = 2.35929, \quad a_2 = 1.86071

Then, the fit function:

y = 2.47857 + 2.35929 x + 1.86071 x^2

Standard error:

s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12

where

S_r = \sum_{i=1}^{6} \left( y_i - 2.47857 - 2.35929 x_i - 1.86071 x_i^2 \right)^2 = 3.74657
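A minimal sketch (not part of the original slides) that reproduces EX 17.5 by building the normal equations directly:

```python
import numpy as np

# 2nd-order polynomial L-S fit (EX 17.5) via the normal equations.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

Z = np.vander(x, 3, increasing=True)   # columns: 1, x, x^2
a = np.linalg.solve(Z.T @ Z, Z.T @ y)  # [a0, a1, a2]
Sr = np.sum((y - Z @ a) ** 2)
syx = np.sqrt(Sr / (len(x) - 3))       # n - (m+1)
print(a, Sr, syx)  # ~[2.47857, 2.35929, 1.86071], ~3.74657, ~1.12
```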

Page 14: Es272 ch5a

Multiple Linear Regression

In some cases, data may have two or more independent variables. In this example, for a function of two variables y(x_1, x_2), linear regression gives a planar fit function.

Function to fit:

y = a_0 + a_1 x_1 + a_2 x_2

Sum of the squares of the residuals (spread):

S_r = \sum_{i=1}^{n} (y_{i,obs} - y_{i,fit})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i})^2

Page 15: Es272 ch5a

Minimizing the spread function gives:

\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_{1,i} (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0

\frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} x_{2,i} (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0

The system of equations to be solved (normal equations for multiple linear regression):

\begin{bmatrix} n & \sum x_{1,i} & \sum x_{2,i} \\ \sum x_{1,i} & \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{2,i} & \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1,i} y_i \\ \sum x_{2,i} y_i \end{Bmatrix}

Page 16: Es272 ch5a

EX 17.7: Fit a planar surface to the following data

We first do the following calculations:

x1 x2 y

0 0 5

2 1 10

2.5 2 9

1 3 0

4 6 3

7 2 27

y  x1  x2  x1·x1  x2·x2  x1·x2  x1·y  x2·y
5  0  0  0  0  0  0  0
10  2  1  4  1  2  20  10
9  2.5  2  6.25  4  5  22.5  18
0  1  3  1  9  3  0  0
3  4  6  16  36  24  12  18
27  7  2  49  4  14  189  54
Σ: 54  16.5  14  76.25  54  48  243.5  100

Page 17: Es272 ch5a

The system of equations to calculate the fit coefficients:

\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}

returns

a_0 = 5, \quad a_1 = 4, \quad a_2 = -3

The fit function:

y = 5 + 4 x_1 - 3 x_2

For the general case of a function of m variables, the same strategy can be applied. The fit function in this case:

y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m

Standard error:

s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}
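A minimal sketch reproducing EX 17.7 with the same normal-equations approach:

```python
import numpy as np

# Planar fit y = a0 + a1*x1 + a2*x2 (EX 17.7) via the normal equations.
x1 = np.array([0, 2, 2.5, 1, 4, 7])
x2 = np.array([0, 1, 2, 3, 6, 2], dtype=float)
y = np.array([5, 10, 9, 0, 3, 27], dtype=float)

Z = np.column_stack([np.ones_like(x1), x1, x2])
a = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(a)  # ~[5, 4, -3]
```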

Page 18: Es272 ch5a

A useful application of multiple regression is for fitting a power law equation of multiple variables of the form:

y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}

Linearization of this equation gives

\log y = \log a_0 + a_1 \log x_1 + \dots + a_m \log x_m

The coefficients in the last equation can be calculated using multiple linear regression and then substituted back into the original power law equation.

Page 19: Es272 ch5a

Generalization of L-S Regression:

In the most general form, L-S regression can be stated as

y = a_0 z_0 + a_1 z_1 + \dots + a_m z_m

where z_0, z_1, ..., z_m are functions:

z_0 = 1, \; z_1 = x_1, \; \dots, \; z_m = x_m \quad \Rightarrow \quad \text{multiple regression}

z_0 = x^0, \; z_1 = x^1, \; \dots, \; z_m = x^m \quad \Rightarrow \quad \text{polynomial regression}

In general, this form is called "linear regression", as the fit function depends linearly on the fitting coefficients.

Other functions can be defined for fitting as well, e.g.,

y = a_0 + a_1 \cos t + a_2 \sin t

Page 20: Es272 ch5a

For a particular data point:

y = a_0 z_0 + a_1 z_1 + \dots + a_m z_m + e

For n data points (in matrix form):

\{y\} = [Z]\{a\} + \{e\}

[Z] = \begin{bmatrix} z_{01} & z_{11} & \dots & z_{m1} \\ \vdots & \vdots & & \vdots \\ z_{0n} & z_{1n} & \dots & z_{mn} \end{bmatrix}

[Z] is calculated based on the measured independent variables (m: order of the fit function; n: number of data points). [Z] is generally not a square matrix; its size is n \times (m+1).

\{y\} = \{y_1, y_2, \dots, y_n\}^T \; \text{(data)}, \quad \{a\} = \{a_0, a_1, \dots, a_m\}^T \; \text{(coefficients)}, \quad \{e\} = \{e_1, e_2, \dots, e_n\}^T \; \text{(residuals)}

Page 21: Es272 ch5a

Sum of the squares of the residuals:

S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2

To determine the fit coefficients, minimize S_r(a_0, a_1, \dots, a_m). This is equivalent to the following:

[Z]^T [Z] \{a\} = [Z]^T \{y\}

Normal equations for the general L-S regression.

This is the general representation of the normal equations for L-S regression including simple linear, polynomial, and multiple linear regression methods.
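To illustrate the generality, here is a sketch (with synthetic data, not from the slides) that fits y = a_0 + a_1 cos t + a_2 sin t by assembling [Z] from the basis functions and solving the normal equations:

```python
import numpy as np

# General linear L-S: basis functions z0 = 1, z1 = cos(t), z2 = sin(t).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
y = 1.0 + 2.0 * np.cos(t) - 0.5 * np.sin(t) + 0.1 * rng.standard_normal(t.size)

Z = np.column_stack([np.ones_like(t), np.cos(t), np.sin(t)])
a = np.linalg.solve(Z.T @ Z, Z.T @ y)  # [Z]^T [Z] {a} = [Z]^T {y}
print(a)  # ~[1.0, 2.0, -0.5]
```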

Page 22: Es272 ch5a

Solution approaches:

[Z]^T [Z] \{a\} = [Z]^T \{y\}

[Z]^T[Z] is a symmetric, square matrix of size [(m+1) × (m+1)]. Elimination methods are best suited for the solution of the above linear system:

LU decomposition / Gauss elimination
Cholesky decomposition

In particular, Cholesky decomposition is fast and requires less storage. Furthermore, it is very appropriate when the order of the polynomial fit model (m) is not known beforehand: successive higher-order models can be developed efficiently. Similarly, increasing the number of variables in multiple regression is very efficient using Cholesky decomposition.
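A sketch of the Cholesky route (using SciPy's cho_factor/cho_solve, since [Z]^T[Z] is symmetric positive definite), shown here on the EX 17.5 data:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Solve the normal equations [Z]^T[Z]{a} = [Z]^T{y} by Cholesky decomposition.
def lsq_cholesky(Z, y):
    c, low = cho_factor(Z.T @ Z)
    return cho_solve((c, low), Z.T @ y)

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
Z = np.vander(x, 3, increasing=True)
print(lsq_cholesky(Z, y))  # same coefficients as EX 17.5
```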

Page 23: Es272 ch5a

Statistical Analysis of L-S Theory

Some definitions:

\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad \text{(mean)}

s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} \qquad \text{(standard deviation)}

s_y^2 = \frac{S_t}{n - 1} \qquad \text{(variance)}

For a perfectly normal distribution: mean ± std covers about 68% of the total data; mean ± 2·std covers about 95% of the total data.

If a histogram of the data shows a bell-shaped curve, the data are normally distributed; such a distribution has well-defined statistics (μ: true mean, σ: true standard deviation).

Page 24: Es272 ch5a

Confidence intervals:

A confidence interval estimates the interval within which a parameter is expected to fall, with a certain degree of confidence. Find L and U values such that

P(L \le \mu \le U) = 1 - \alpha \qquad (\mu: \text{true mean}; \; \alpha: \text{significance level})

L = \bar{y} - t_{\alpha/2, n-1} \frac{s_y}{\sqrt{n}}, \qquad U = \bar{y} + t_{\alpha/2, n-1} \frac{s_y}{\sqrt{n}}

For a 95% confidence interval, α = 0.05. The t-distribution is tabulated in books; in EXCEL, tinv(α, n). E.g., for α = 0.05 and n = 20, t_{α/2, n−1} = 2.086.

The t-distribution is used to compromise between a perfect and an imperfect estimate. For example, if the data are few (small n), the t-value becomes larger, hence giving a more conservative confidence interval.

Page 25: Es272 ch5a

EX: Some measurements of the coefficient of thermal expansion of steel (×10⁻⁶ 1/°F):

6.495 6.595 6.615 6.635 6.485 6.555
6.665 6.505 6.435 6.625 6.715 6.655
6.755 6.625 6.715 6.575 6.655 6.605
6.565 6.515 6.555 6.395 6.775 6.685

Find the mean and the corresponding 95% confidence interval for a) the first 8 measurements, b) the first 16 measurements, c) all 24 measurements.

For n = 8:

\bar{y} = 6.59, \qquad s_y = 0.089921, \qquad t_{\alpha/2, n-1} = t_{0.05/2,\, 8-1} = 2.364623

L = \bar{y} - t_{\alpha/2, n-1} \frac{s_y}{\sqrt{n}} = 6.59 - 2.364623 \frac{0.089921}{\sqrt{8}} = 6.5148

U = \bar{y} + t_{\alpha/2, n-1} \frac{s_y}{\sqrt{n}} = 6.59 + 2.364623 \frac{0.089921}{\sqrt{8}} = 6.6652

6.5148 \le \mu \le 6.6652

For the eight measurements, there is a 95% probability that the true mean falls between these values.

Page 26: Es272 ch5a

The cases of n = 16 and n = 24 can be performed in a similar fashion. Hence we obtain:

n  mean (ȳ)  s_y  t_{α/2, n−1}  L  U
8  6.5900  0.089921  2.364623  6.5148  6.6652
16  6.5794  0.095845  2.131451  6.5283  6.6304
24  6.6000  0.097133  2.068655  6.5590  6.6410

The results show that the confidence interval narrows as the number of measurements increases (even though s_y increases with increasing n!). For n = 24, we have 95% confidence that the true mean is between 6.5590 and 6.6410.
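A minimal sketch (assuming the measurements are read column-wise from the table, which reproduces the slide's n = 8 numbers) using scipy.stats for the t-value:

```python
import numpy as np
from scipy.stats import t

# 95% confidence interval of the mean for the first 8 measurements.
y = np.array([6.495, 6.665, 6.755, 6.565, 6.595, 6.505, 6.625, 6.515])

n = len(y)
ybar, sy = y.mean(), y.std(ddof=1)
half = t.ppf(1 - 0.05 / 2, n - 1) * sy / np.sqrt(n)
print(ybar, ybar - half, ybar + half)  # ~6.59, ~6.5148, ~6.6652
```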

Page 27: Es272 ch5a

Confidence Interval for L-S regression:

Using the matrix inverse for the solution of {a} is inefficient:

\{a\} = \left( [Z]^T [Z] \right)^{-1} [Z]^T \{y\}

However, the inverse matrix \left( [Z]^T [Z] \right)^{-1} carries useful statistical information about the goodness of the fit:

Diagonal terms → variances (var) of the fit coefficients
Off-diagonal terms → covariances (cov) of the fit coefficients

\text{var}(a_{i-1}) = u_{ii}\, s_{y/x}^2, \qquad \text{cov}(a_{i-1}, a_{j-1}) = u_{ij}\, s_{y/x}^2

where u_{ij} are the elements of the inverse matrix. These statistics allow calculation of confidence intervals for the fit coefficients.

Page 28: Es272 ch5a

Calculating confidence intervals for simple linear regression, y = a_0 + a_1 x:

For the intercept (a_0): L = a_0 - t_{\alpha/2, n-2}\, s(a_0), \qquad U = a_0 + t_{\alpha/2, n-2}\, s(a_0)

For the slope (a_1): L = a_1 - t_{\alpha/2, n-2}\, s(a_1), \qquad U = a_1 + t_{\alpha/2, n-2}\, s(a_1)

where s(a_i) = \sqrt{\text{var}(a_i)} is the standard error for the coefficient (extracted from the inverse matrix).

Page 29: Es272 ch5a

EX 17.8: Compare the results of measured versus model data shown below. a) Plot the measured versus model values. b) Apply the simple linear regression formula to assess the agreement between the measured and model data. c) Recompute the regression using the matrix approach, estimate the standard error of the estimate and of the fit parameters, and develop confidence intervals.

Measured value  Model value

10 8.953

16.3 16.405

23 22.607

27.5 27.769

31 32.065

35.6 35.641

39 38.617

41.5 41.095

42.9 43.156

45 44.872

46 46.301

45.5 47.49

46 48.479

49 49.303

50 49.988

a) [Figure: plot of model values versus measured values.]

b) Applying the simple linear regression formula gives

y = -0.859 + 1.032 x \qquad (x: \text{measured}, \; y: \text{model})

Page 30: Es272 ch5a

c) For the statistical analysis, first form the following [Z] matrix and {y} vector:

[Z] = \begin{bmatrix} 1 & 10 \\ 1 & 16.3 \\ \vdots & \vdots \\ 1 & 50 \end{bmatrix}, \qquad \{y\} = \begin{Bmatrix} 8.953 \\ 16.405 \\ \vdots \\ 49.988 \end{Bmatrix}

Then, [Z]^T [Z] \{a\} = [Z]^T \{y\} becomes

\begin{bmatrix} 15 & 548.3 \\ 548.3 & 22191.21 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix}

Solution using the matrix inversion, \{a\} = \left( [Z]^T [Z] \right)^{-1} [Z]^T \{y\}:

\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{bmatrix} 0.688414 & -0.01701 \\ -0.01701 & 0.000465 \end{bmatrix} \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix} = \begin{Bmatrix} -0.85872 \\ 1.031592 \end{Bmatrix}

Page 31: Es272 ch5a

Standard error for the fit function:

s_{y/x} = \sqrt{S_r / (n - 2)} = 0.863403

Standard errors for the coefficients (from the inverse matrix):

s(a_0) = \sqrt{u_{11}\, s_{y/x}^2} = \sqrt{0.688414\, (0.863403)^2} = 0.716372

s(a_1) = \sqrt{u_{22}\, s_{y/x}^2} = \sqrt{0.000465\, (0.863403)^2} = 0.018625

For a 95% confidence interval (α = 0.05, n − 2 = 13 degrees of freedom; EXCEL returns tinv(0.05, 13) = 2.160368):

a_0 = -0.85872 \pm t_{\alpha/2, n-2}\, s(a_0) = -0.85872 \pm 2.160368 (0.716372) = -0.85872 \pm 1.547627

a_1 = 1.031592 \pm t_{\alpha/2, n-2}\, s(a_1) = 1.031592 \pm 2.160368 (0.018625) = 1.031592 \pm 0.040237

The desired values of slope = 1 and intercept = 0 fall within these intervals; hence we can conclude that a good fit exists between the measured and model values.
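The whole of part c) can be reproduced with a short script; a sketch (not from the original slides):

```python
import numpy as np
from scipy.stats import t

# EX 17.8c: coefficients, standard errors, and 95% confidence intervals
# from the inverse of [Z]^T[Z].
x = np.array([10, 16.3, 23, 27.5, 31, 35.6, 39, 41.5,
              42.9, 45, 46, 45.5, 46, 49, 50])
y = np.array([8.953, 16.405, 22.607, 27.769, 32.065, 35.641, 38.617,
              41.095, 43.156, 44.872, 46.301, 47.49, 48.479, 49.303, 49.988])

Z = np.column_stack([np.ones_like(x), x])
U = np.linalg.inv(Z.T @ Z)             # carries var/cov information
a = U @ Z.T @ y                        # ~[-0.859, 1.032]
n = len(x)
syx = np.sqrt(np.sum((y - Z @ a) ** 2) / (n - 2))
se = np.sqrt(np.diag(U)) * syx         # s(a0), s(a1)
tval = t.ppf(1 - 0.05 / 2, n - 2)      # ~2.160368
for ai, si in zip(a, se):
    print(f"{ai:.6f} +/- {tval * si:.6f}")
```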

Page 32: Es272 ch5a

Non-linear Regression

In some cases we must fit a non-linear model to the data, e.g.,

y = a_0 (1 - e^{-a_1 x})

where the parameters a_0 and a_1 are not linearly related to y. The generalized L-S formulation cannot be used for such models. The same approach of minimizing the sum of the squares of the residuals is applied, but the solution is sought iteratively.

Gauss-Newton method:

A Taylor series expansion is used to (approximately) linearize the model. Then standard L-S theory can be applied to obtain improved estimates of the fit parameters. In the most general form:

y = f(x; a_0, a_1, \dots, a_m)

Page 33: Es272 ch5a

Taylor series expansion around the fit parameters (i: i-th data point; j: iteration number):

f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1

Then

y_{i,meas} - f(x_i)_j = \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1 + e_i

In matrix form:

\{d\} = [Z_j] \{\Delta a\} + \{e\}

[Z_j] = \begin{bmatrix} \partial f_1 / \partial a_0 & \partial f_1 / \partial a_1 \\ \partial f_2 / \partial a_0 & \partial f_2 / \partial a_1 \\ \vdots & \vdots \\ \partial f_n / \partial a_0 & \partial f_n / \partial a_1 \end{bmatrix}, \qquad \{d\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}, \qquad \{\Delta a\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \end{Bmatrix}

(j: iteration number)

Page 34: Es272 ch5a

Applying the generalized L-S formula:

[Z_j]^T [Z_j] \{\Delta a\} = [Z_j]^T \{d\}

We solve the above system for \{\Delta a\} to obtain improved values of the parameters:

a_{0, j+1} = a_{0, j} + \Delta a_0, \qquad a_{1, j+1} = a_{1, j} + \Delta a_1

The procedure is iterated until an acceptable error is reached:

|\varepsilon_{a_0}| = \left| \frac{a_{0, j+1} - a_{0, j}}{a_{0, j+1}} \right|, \qquad |\varepsilon_{a_1}| = \left| \frac{a_{1, j+1} - a_{1, j}}{a_{1, j+1}} \right|
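To make the iteration concrete, here is a minimal Gauss-Newton sketch for the model y = a_0(1 − e^{−a_1 x}); the data values below are illustrative, not from the slides:

```python
import numpy as np

# Gauss-Newton iteration for y = a0*(1 - exp(-a1*x)).
# The columns of [Z_j] are the partial derivatives df/da0 and df/da1.
def gauss_newton(x, y, a0, a1, tol=1e-6, max_iter=50):
    for _ in range(max_iter):
        f = a0 * (1 - np.exp(-a1 * x))
        Z = np.column_stack([1 - np.exp(-a1 * x),         # df/da0
                             a0 * x * np.exp(-a1 * x)])   # df/da1
        da = np.linalg.solve(Z.T @ Z, Z.T @ (y - f))      # normal equations
        a0, a1 = a0 + da[0], a1 + da[1]
        if np.all(np.abs(da) < tol * np.abs([a0, a1])):   # relative error check
            break
    return a0, a1

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])
print(gauss_newton(x, y, a0=1.0, a1=1.0))  # converges to ~(0.79, 1.68)
```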