Nonlinear Regression Analysis



    CHEE824

    Nonlinear Regression Analysis

    J. McLellan

    Winter 2004


    Module 1:

    Linear Regression


    Outline -

    assessing systematic relationships

    matrix representation for multiple regression

    least squares parameter estimates

    diagnostics

    graphical, quantitative

    further diagnostics

    testing the need for terms; lack of fit test

    precision of parameter estimates, predicted responses

    correlation between parameter estimates


    The Scenario

    We want to describe the systematic relationship

    between a response variable and a number of

    explanatory variables

    multiple regression

    we will consider the case which is linear in the parameters


    Assessing Systematic Relationships

    Is there a systematic relationship?

    Two approaches: graphical

    scatterplots, casement plots

    quantitative - form correlations between response and explanatory variables

    consider forming a correlation matrix - a table of pairwise correlations between the response and the explanatory variables, and between pairs of explanatory variables

    correlation between explanatory variables leads to correlated parameter estimates


    Graphical Methods for Analyzing Data

    Visualizing relationships between variables

    Techniques

    scatterplots

    scatterplot matrices

    also referred to as casement plots

    Time sequence plots


    Scatterplots - Example

    [Scatterplot (teeth 4v*20c): DISCOLOR vs. FLUORIDE]

    tooth discoloration data - discoloration vs. fluoride

    trend - possibly nonlinear?


    Scatterplot - Example

    [Scatterplot (teeth 4v*20c): DISCOLOR vs. BRUSHING]

    tooth discoloration data - discoloration vs. brushing

    significant trend? - doesn't appear to be present


    Scatterplot - Example

    [Scatterplot (teeth 4v*20c): DISCOLOR vs. BRUSHING]

    tooth discoloration data - discoloration vs. brushing

    variance appears to decrease as # of brushings increases


    Scatterplot matrices

    are a table of scatterplots for a set of variables

    Look for -

    systematic trend between independent variable and dependent variables - to be described by estimated model

    systematic trend between supposedly independent variables - indicates that these quantities are correlated

    correlation can negatively influence model estimation results

    not independent information

    scatterplot matrices can be generated automatically with statistical software, or manually using Excel


    Scatterplot Matrices - tooth data

    [Matrix Plot (teeth 4v*20c): pairwise scatterplots of FLUORIDE, AGE, BRUSHING, DISCOLOR]


    Time Sequence Plot - Naphtha 90% Point

    [Time sequence plot: naphtha 90% point (degrees F) vs. observation number]

    Time sequence plot for the naphtha 90% point - indicates amount of heavy hydrocarbons present in gasoline range material

    excursion - sudden shift in operation

    meandering about average operating point - time correlation in data


    What do dynamic data look like?

    [Time series plot of industrial data: var1 and var2 vs. sample number]


    Assessing Systematic Relationships

    Quantitative Methods

    correlation

    formal defn plus sample statistic (Pearson's r)

    covariance

    formal defn plus sample statistic

    provide a quantitative measure of systematic LINEAR relationships


    Covariance

    Formal Definition

    given two random variables X and Y, the covariance is

    Cov(X, Y) = E{(X - μ_X)(Y - μ_Y)}

    E{ } - expected value

    sign of the covariance indicates the sign of the slope of the systematic linear relationship

    positive value --> positive slope
    negative value --> negative slope

    issue - covariance is SCALE DEPENDENT


    Covariance

    motivation for covariance as a measure of systematic

    linear relationship

    look at pairs of departures about the mean of X, Y

    [two scatterplots of Y vs. X, each marked with the mean of X, Y, showing how departures about the mean pair up]


    Correlation

    is the dimensionless covariance

    divide covariance by standard devns of X, Y

    formal definition:

    Corr(X, Y) = ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)

    properties

    dimensionless

    range: -1 ≤ ρ(X, Y) ≤ 1

    ρ(X, Y) = -1 : strong linear relationship with negative slope
    ρ(X, Y) = +1 : strong linear relationship with positive slope

    Note - the correlation gives NO information about the actual numerical value of the slope.


    Estimating Covariance, Correlation

    from process data (with N pairs of observations)

    Sample Covariance:

    R_XY = (1/(N-1)) Σ_{i=1}^{N} (X_i - X̄)(Y_i - Ȳ)

    Sample Correlation:

    r_XY = Σ_{i=1}^{N} (X_i - X̄)(Y_i - Ȳ) / ((N-1) s_X s_Y)


    Making Inferences

    The sample covariance and correlation are STATISTICS, and have their own probability distributions.

    Confidence interval for sample correlation - the following is approximately distributed as a standard normal random variable:

    √(N-3) (tanh⁻¹(r) - tanh⁻¹(ρ))

    derive confidence limits for tanh⁻¹(ρ) and convert to confidence limits for the true correlation using tanh


    Confidence Interval for Correlation

    Procedure

    1. find z_{α/2} for desired confidence level

    2. confidence interval for tanh⁻¹(ρ) is

       tanh⁻¹(r) ± z_{α/2} / √(N-3)

    3. convert the limits to confidence limits for the correlation by taking tanh of the limits in step 2

    A hypothesis test can also be performed using this function of the correlation and comparing to the standard normal distribution
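    As a concrete illustration, here is a minimal Python sketch of this procedure (assuming numpy is available); applied to the solder data a few slides below (r = -0.92, N = 10) it reproduces the worked limits:

        import numpy as np

        def corr_confidence_interval(r, N, z_alpha2=1.96):
            # Fisher z (tanh^-1) confidence interval for a correlation
            z = np.arctanh(r)                        # tanh^-1(r)
            hw = z_alpha2 / np.sqrt(N - 3)           # z_{alpha/2} / sqrt(N-3)
            return np.tanh(z - hw), np.tanh(z + hw)  # convert back with tanh

        print(corr_confidence_interval(-0.92, 10))   # approx (-0.981, -0.690)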


    Example - Solder Thickness

    Objective - study the effect of temperature on solder thickness

    Data - in pairs:

    Solder Temperature (C) Solder Thickness (microns)

    245 171.6

    215 201.1

    218 213.2

    265 153.3

    251 178.9

    213 226.6

    234 190.3

    257 171

    244 197.5

    225 209.8


    Example - Solder Thickness

    [Scatterplot: solder thickness (microns) vs. temperature (C)]

    Correlation matrix (from Excel):

                                  Solder Temperature (C)   Solder Thickness (microns)
    Solder Temperature (C)        1
    Solder Thickness (microns)    -0.920001236             1


    Example - Solder Thickness

    Confidence Interval

    z_{α/2} of 1.96 (95% confidence level)

    limits in tanh⁻¹(ρ):  -2.329837282   -0.848216548

    limits in ρ:          -0.981238575   -0.690136605


    Empirical Modeling - Terminology

    response

    dependent variable - responds to changes in other variables

    the response is the characteristic of interest which we are

    trying to predict

    explanatory variable

    independent variable, regressor variable, input, factor

    these are the quantities that we believe have an

    influence on the response

    parameter

    coefficients in the model that describe how the

    regressors influence the response


    Models

    When we are estimating a model from data, we

    consider the following form:

    Y = f(x, β) + ε

    Y - response
    x - explanatory variables
    β - parameters
    ε - random error


    The Random Error Term

    is included to reflect fact that measured data contain

    variability

    successive measurements under the same conditions (values of the explanatory variables) are likely to be slightly different - this is the stochastic component

    the functional form describes the deterministic component

    random error is not necessarily the result of mistakes in experimental procedures - it reflects inherent variability - "noise"


    Types of Models

    linear/nonlinear in the parameters

    linear/nonlinear in the explanatory variables

    number of response variables

    single response (standard regression)

    multi-response (or multivariate models)

    From the perspective of statistical model-building, the key point is whether the model is linear or nonlinear in the PARAMETERS.


    Linear Regression Models

    linear in the parameters

    can be nonlinear in the regressors, e.g.,

    T95 = β0 + β1 T_LGO + β2 T_mid

    T95 = β0 + β1 T_GO + β2 T_mid


    Nonlinear Regression Models

    nonlinear in the parameters

    e.g., Arrhenius rate expression:

    r = k0 exp(-E/(RT))

    k0 - linear (if E is fixed)
    E  - nonlinear


    Nonlinear Regression Models

    sometimes transformably linear

    start with

    r = k0 exp(-E/(RT))

    and take ln of both sides to produce

    ln(r) = ln(k0) - E (1/(RT))

    which is of the form

    Y = β0 + β1 (1/(RT))

    linear in the parameters


    Transformations

    note that linearizing the nonlinear equation by

    transformation can lead to misleading estimates if the

    proper estimation method is not used

    transforming the data can alter the statistical

    distribution of the random error term


    Ordinary LS vs. Multi-Response

    single response (ordinary least squares):

    T95,LGO = b0 + b1 T_LGO + b2 T_mid

    multi-response (e.g., Partial Least Squares):

    T95,LGO  = b10 + b11 T_LGO  + b12 T_mid
    T95,kero = b20 + b21 T_kero + b22 T_mid

    issue - joint behaviour of responses, noise

    We will be focussing on single response models.


    Linear Multiple Regression

    Model Equation

    Y_i = β1 X_i1 + ... + βp X_ip + ε_i

    Y_i  - i-th observation of response (i-th data point)
    X_i1 - i-th value of explanatory variable X1
    X_ip - i-th value of explanatory variable Xp
    ε_i  - random noise in i-th observation of response

    The intercept can be considered as corresponding to an X which always has the value 1


    Assumptions for Least Squares Estimation

    Values of explanatory variables are known EXACTLY

    random error is strictly in the response variable

    practically - a random component will almost always be

    present in the explanatory variables as well

    we assume that this component has a substantially smaller effect on the response than the random component in the response

    if random fluctuations in the explanatory variables are important, consider an alternative method (Errors in Variables approach)


    Assumptions for Least Squares Estimation

    The form of the equation provides an adequate representation for the data

    can test adequacy of model as a diagnostic

    Variance of random error is CONSTANT over range of data collected

    e.g., variance of random fluctuations in thickness measurements at high temperatures is the same as variance at low temperatures

    data is heteroscedastic if the variance is not constant - a different estimation procedure is required

    thought - percentage error in instruments?


    Assumptions for Least Squares Estimation

    The random fluctuations in each measurement are statistically independent from those of other measurements

    at same experimental conditions

    at other experimental conditions - implies that random component has no memory

    no correlation between measurements

    Random error term is normally distributed

    typical assumption

    not essential for least squares estimation

    important when determining confidence intervals,

    conducting hypothesis tests


    Least Squares Estimation - graphically

    least squares - minimize sum of squared prediction errors

    [diagram: response (solder thickness) vs. T - data points scattered about the deterministic true relationship; the vertical distance from a point to the line is the prediction error (residual)]


    More Notation and Terminology

    Random error is independent, identically distributed

    (I.I.D) -- can say that it is IID Normal

    Capitals - Y - denotes random variable - except in case of explanatory variable, where capital is used to denote formal defn

    Lower case - y, x - denotes measured values of variables

    Model:        Y = β0 + β1 X + ε

    Measurement:  y = β0 + β1 x + ε


    More Notation and Terminology

    Estimate - denoted by "hat"

    examples - estimates of response, parameter:  ŷ, β̂0

    Residual - difference between measured and predicted response:

    e = y - ŷ


    Matrix Representation for Multiple Regression

    We can arrange the observations in tabular form - a vector of observations, and a matrix of explanatory values:

    [ Y_1 ]   [ X_11   X_12   ...   X_1p ] [ β_1 ]   [ ε_1 ]
    [ Y_2 ]   [ X_21   X_22   ...   X_2p ] [ β_2 ]   [ ε_2 ]
    [  ⋮  ] = [   ⋮      ⋮            ⋮  ] [  ⋮  ] + [  ⋮  ]
    [ Y_N ]   [ X_N1   X_N2   ...   X_Np ] [ β_p ]   [ ε_N ]


    Matrix Representation for Multiple Regression

    The model is written as:

    Y = Xβ + ε

    Y - N×1 vector;  X - N×p matrix;  β - p×1 vector;  ε - N×1 vector

    N --> number of data observations
    p --> number of parameters


    Least Squares Parameter Estimates

    We make the same assumptions as in the straight line

    regression case:

    independent random noise components in each observation

    explanatory variables known exactly - no randomness

    variance constant over experimental region (identically distributed noise components)


    Residual Vector

    Given a set of parameter values β̃, the residual vector is formed from the matrix expression:

    [ e_1 ]   [ Y_1 ]   [ X_11   X_12   ...   X_1p ] [ β̃_1 ]
    [ e_2 ]   [ Y_2 ]   [ X_21   X_22   ...   X_2p ] [ β̃_2 ]
    [  ⋮  ] = [  ⋮  ] - [   ⋮      ⋮            ⋮  ] [  ⋮   ]
    [ e_N ]   [ Y_N ]   [ X_N1   X_N2   ...   X_Np ] [ β̃_p ]

    i.e., e = Y - Xβ̃


    Sum of Squares of Residuals

    is the same as before, but can be expressed as the squared length of the residual vector:

    SSE = Σ_{i=1}^{N} e_i² = eᵀe = (Y - Xβ̃)ᵀ(Y - Xβ̃)


    Least Squares Parameter Estimates

    Find the set of parameter values that minimize the sum

    of squares of residuals (SSE)

    apply necessary conditions for an optimum from calculus (stationary point):

    ∂(SSE)/∂β |_{β̂} = 0

    system of N equations in p unknowns, with number of parameters < number of observations: over-determined system of equations

    solution - set of parameter values that comes closest to satisfying all equations (in a least squares sense)


    Least Squares Parameter Estimates

    The solution is:

    β̂ = (XᵀX)⁻¹ XᵀY

    (XᵀX)⁻¹Xᵀ is a generalized matrix inverse of X - a generalization of the standard concept of matrix inverse to the case of a non-square matrix


    Example - Solder Thickness

    Let's analyze the data considered for the straight line case:

    Solder Temperature (C)   Solder Thickness (microns)
    245                      171.6
    215                      201.1
    218                      213.2
    265                      153.3
    251                      178.9
    213                      226.6
    234                      190.3
    257                      171
    244                      197.5
    225                      209.8

    Model:

    Y = β0 + β1 X + ε


    Example - Solder Thickness

    In matrix form:

    [ 171.6 ]   [ 1   245 ]          [ ε_1  ]
    [ 201.1 ]   [ 1   215 ]          [ ε_2  ]
    [ 213.2 ]   [ 1   218 ]          [ ε_3  ]
    [ 153.3 ]   [ 1   265 ]          [ ε_4  ]
    [ 178.9 ] = [ 1   251 ] [ β0 ] + [ ε_5  ]
    [ 226.6 ]   [ 1   213 ] [ β1 ]   [ ε_6  ]
    [ 190.3 ]   [ 1   234 ]          [ ε_7  ]
    [ 171   ]   [ 1   257 ]          [ ε_8  ]
    [ 197.5 ]   [ 1   244 ]          [ ε_9  ]
    [ 209.8 ]   [ 1   225 ]          [ ε_10 ]

    Y = Xβ + ε


    Example - Solder Thickness

    In order to calculate the Least Squares Estimates:

    XᵀX = [ 10     2367   ]        XᵀY = [ 1913.3 ]
          [ 2367   563335 ]              [ 449420 ]


    Example - Solder Thickness

    The least squares parameter estimates are obtained as:

    β̂ = (XᵀX)⁻¹XᵀY = [ 18.373    -0.0772 ] [ 1913.3 ] = [ 458.10 ]
                      [ -0.0772    0.0003 ] [ 449420 ]   [ -1.13  ]


    Example - Wave Solder Defects

    (page 8-31, Course Notes)

    Wave Solder Defects Data

    Run   Conveyor Speed   Pot Temp   Flux Density   No. of Defects
    1     -1               -1         -1             100
    2      1               -1         -1             119
    3     -1                1         -1             118
    4      1                1         -1             217
    5     -1               -1          1              20
    6      1               -1          1              42
    7     -1                1          1              41
    8      1                1          1             113
    9      0                0          0             101
    10     0                0          0              96
    11     0                0          0             115


    Example - Wave Solder Defects

    In matrix form:

    [ 100 ]   [ 1  -1  -1  -1 ]          [ ε_1  ]
    [ 119 ]   [ 1   1  -1  -1 ]          [ ε_2  ]
    [ 118 ]   [ 1  -1   1  -1 ]          [ ε_3  ]
    [ 217 ]   [ 1   1   1  -1 ] [ β0 ]   [ ε_4  ]
    [  20 ]   [ 1  -1  -1   1 ] [ β1 ]   [ ε_5  ]
    [  42 ] = [ 1   1  -1   1 ] [ β2 ] + [ ε_6  ]
    [  41 ]   [ 1  -1   1   1 ] [ β3 ]   [ ε_7  ]
    [ 113 ]   [ 1   1   1   1 ]          [ ε_8  ]
    [ 101 ]   [ 1   0   0   0 ]          [ ε_9  ]
    [  96 ]   [ 1   0   0   0 ]          [ ε_10 ]
    [ 115 ]   [ 1   0   0   0 ]          [ ε_11 ]

    Y = Xβ + ε


    Example - Wave Solder Defects

    To calculate least squares parameter estimates:

    XᵀX = [ 11   0   0   0 ]        XᵀY = [ 1082 ]
          [  0   8   0   0 ]              [  212 ]
          [  0   0   8   0 ]              [  208 ]
          [  0   0   0   8 ]              [ -338 ]


    Example - Wave Solder Defects

    Least squares parameter estimates:

    β̂ = (XᵀX)⁻¹XᵀY = [ 1/11    0     0     0  ] [ 1082 ]   [  98.36 ]
                      [  0     1/8    0     0  ] [  212 ] = [  26.50 ]
                      [  0      0    1/8    0  ] [  208 ]   [  26.00 ]
                      [  0      0     0    1/8 ] [ -338 ]   [ -42.25 ]


    Examples - Comments

    if there are N runs, and the model has p parameters, XᵀX is a p×p matrix (smaller dimension than number of runs)

    element j of XᵀY is Σ_i x_ij y_i, for parameters j = 1, ..., p

    in the Wave Solder Defects example, the values of the explanatory variables for the runs followed very specific patterns of -1 and +1, and XᵀX was a diagonal matrix

    in the Solder Thickness example, the values of the explanatory variable did not follow a specific pattern, and XᵀX was not diagonal


    Graphical Diagnostics

    Residuals vs. Predicted Response Values

    residual e_i vs. predicted value ŷ_i

    [plot: residuals scattered evenly in a horizontal band about zero]

    - even scatter over range of prediction
    - no discernible pattern
    - roughly half the residuals are positive, half negative

    DESIRED RESIDUAL PROFILE


    Graphical Diagnostics

    Residuals vs. Predicted Response Values

    residual e_i vs. predicted value ŷ_i

    [plot: residuals scattered about zero, with one point lying far outside the main band]

    outlier lies outside main body of residuals

    RESIDUAL PROFILE WITH OUTLIERS


    Graphical Diagnostics

    Residuals vs. Predicted Response Values

    residual e_i vs. predicted value ŷ_i

    [plot: residual band fans out as the predicted value increases]

    variance of the residuals appears to increase with higher predictions

    NON-CONSTANT VARIANCE


    Graphical Diagnostics

    Residuals vs. Explanatory Variables

    ideal - no systematic trend present in plot

    inadequate model - evidence of trend present

    residual e_i vs. x

    [plot: residuals follow a curved, bowl-shaped pattern]

    left-over quadratic trend - need quadratic term in model


    Graphical Diagnostics

    Residuals vs. Explanatory Variables Not in Model

    ideal - no systematic trend present in plot

    inadequate model - evidence of trend present

    residual e_i vs. w

    [plot: residuals trend linearly with w]

    systematic trend not accounted for in model - include a linear term in w


    Graphical Diagnostics

    Residuals vs. Order of Data Collection

    residual e_i vs. time order t

    [two plots: residuals drifting steadily with time; residuals meandering in runs of like sign]

    failure to account for time trend in data

    successive random noise components are correlated
    - consider more complex model
    - time series model for random component?


    Quantitative Diagnostics - Ratio Tests

    Residual Variance Test

    is the variance of the residuals significantly larger than the inherent noise variance?

    same test as that for the straight line data

    only distinction - number of degrees of freedom for the Mean Squared Error => N-p, where p is the number of parameters in the model

    compare ratio to F_{N-p, M-1, 0.05} where M is the number of data points used to estimate the inherent variance

    significant? -> model is INADEQUATE


    Quantitative Diagnostics - Ratio Tests

    Residual Variance Ratio:

    s²_residuals / s²_inherent = MSE / s²_inherent

    Mean Squared Error of Residuals (Var. of Residuals):

    s²_residuals = MSE = Σ_{i=1}^{N} e_i² / (N - p)


    Quantitative Diagnostics - Ratio Tests

    Mean Square Regression Ratio

    same as in the straight line case except for degrees of

    freedom

    Variance described by model:

    MSR = Σ_{i=1}^{N} (ŷ_i - ȳ)² / (p - 1)


    Quantitative Diagnostics - Ratio Test

    Test Ratio:

    MSR / MSE

    is compared against F_{p-1, N-p, 0.95}

    Conclusions?

    ratio is statistically significant --> significant trend has been modeled

    NOT statistically significant --> significant trend has NOT been modeled, and model is inadequate in its present form

    For the multiple regression case, this test is a coarse measure of whether some trend has been modeled - it provides no indication of which X's are important


    Analysis of Variance Tables

    The ratio tests involve dissection of the sum of squares:

    TSS = Σ_{i=1}^{N} (y_i - ȳ)²

    SSR = Σ_{i=1}^{N} (ŷ_i - ȳ)²

    SSE = Σ_{i=1}^{N} (y_i - ŷ_i)²


    Analysis of Variance (ANOVA) for Regression

    Source of     Degrees of   Sum of    Mean              F-Value     p-value
    Variation     Freedom      Squares   Square
    ----------    ----------   -------   ---------------   ---------   -------
    Regression    p-1          SSR       MSR = SSR/(p-1)   F=MSR/MSE   p
    Residuals     N-p          SSE       MSE = SSE/(N-p)
    Total         N-1          TSS
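    A sketch of how this table could be assembled in Python (scipy is assumed for the F-distribution tail probability):

        import numpy as np
        from scipy import stats

        def anova_table(y, y_hat, p):
            # dissect the total sum of squares: TSS = SSR + SSE
            N = len(y)
            SSE = np.sum((y - y_hat) ** 2)
            SSR = np.sum((y_hat - np.mean(y)) ** 2)
            TSS = np.sum((y - np.mean(y)) ** 2)
            MSR, MSE = SSR / (p - 1), SSE / (N - p)
            F = MSR / MSE
            p_value = stats.f.sf(F, p - 1, N - p)   # upper-tail probability
            return SSR, SSE, TSS, F, p_value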


    Quantitative Diagnostics - R²

    Coefficient of Determination (R² Coefficient)

    square of correlation between observed and predicted values:

    R² = [corr(y, ŷ)]²

    relationship to sums of squares:

    R² = 1 - SSE/TSS = SSR/TSS

    values typically reported in %, i.e., 100 R²

    ideal - R² near 100%


    Adjusted R²

    Adjust for number of parameters relative to number of observations

    account for degrees of freedom of the sums of squares

    define in terms of Mean Squared quantities:

    R²_adj = 1 - (SSE/(N-p)) / (TSS/(N-1))

    want value close to 1 (or 100%), as before

    if N >> p, adjusted R² is close to R²

    provides measure of agreement, but does not account for magnitude of residual error


    Testing the Need for Groups of Terms

    In words: Does a specific group of terms account for significant trend in the model?

    Test

    compare difference in residual variance between full and reduced model

    benchmark against an estimate of the inherent variation

    if significant, conclude that the group of terms ARE required

    if not significant, conclude that the group of terms can be dropped from the model - not explaining significant trend

    note that remaining parameters should be re-estimated in this case


    Testing the Need for Groups of Terms

    Test:

    A - denotes the full model (with all terms)
    B - denotes the reduced model (group of terms deleted)

    Form:

    [ (SSE_B - SSE_A) / (p_A - p_B) ] / s²

    p_A, p_B are the numbers of parameters in models A, B

    s² is an estimate of the inherent noise variance:

    estimate as SSE_A / (N - p_A)


    Testing the Need for Groups of Terms

    Compare this ratio to F_{p_A - p_B, ν_inherent, 0.95}

    if MSE_A is used as the estimate of inherent variance, then the degrees of freedom of the inherent variance estimate are N - p_A
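    A small sketch of this test, under the stated assumption that SSE_A/(N - p_A) estimates the inherent variance:

        def group_terms_test(SSE_A, p_A, SSE_B, p_B, N):
            # A = full model, B = reduced model (SSE_B >= SSE_A)
            s2 = SSE_A / (N - p_A)                    # inherent variance estimate (MSE_A)
            F = ((SSE_B - SSE_A) / (p_A - p_B)) / s2
            return F                                  # compare to F_{p_A - p_B, N - p_A, 0.95}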


    Lack of Fit Test

    If we have replicate runs in our regression data set, we can break out the noise variance from the residuals, and assess the component of the residuals due to unmodelled trend

    Replicates -

    repeated runs at the SAME experimental conditions

    note that all explanatory variables must be at fixed conditions

    indication of inherent variance because no other factors are changing

    measure of repeatability of experiments


    Using Replicates

    We can estimate the sample variance for each set of replicates, and pool the estimates of the variance

    constancy of variance can be checked using Bartlett's test

    constant variance is assumed for ordinary least squares estimation

    For each replicate set, we have:

    s_i² = Σ_{j=1}^{n_i} (y_ij - ȳ_i)² / (n_i - 1)

    ȳ_i  - average of values in replicate set i
    n_i  - number of values in replicate set i
    y_ij - values in replicate set i


    Using Replicates

    The pooled estimate of variance is:

    s²_pooled = Σ_{i=1}^{m} (n_i - 1) s_i² / ( Σ_{i=1}^{m} n_i - m )

    i.e., convert back to sums of squares, and divide by the total number of degrees of freedom (the sum of the degrees of freedom for each variance estimate)


    The Lack of Fit Test

    Back to the sum of squares block:

    TSS = SSR + SSE

    SSE = SSELOF + SSEP

    SSEP   - pure error sum of squares
    SSELOF - lack of fit sum of squares


    The Lack of Fit Test

    We partition the SSE into two components:

    component due to inherent noise

    component due to unmodeled trend

    Pure error sum of squares (SSEP):

    SSEP = Σ_{i=1}^{m} Σ_{j=1}^{n_i} (y_ij - ȳ_i)²

    i.e., add together sums of squares associated with each replicate group (there are m replicate groups in total)


    The Lack of Fit Test

    The lack of fit sum of squares (SSELOF) is formed by backing out SSEP from SSE:

    SSELOF = SSE - SSEP

    Degrees of Freedom:

    - for SSEP:    Σ_{i=1}^{m} n_i - m

    - for SSELOF:  (N - p) - ( Σ_{i=1}^{m} n_i - m )


    The Lack of Fit Test

    The test ratio:

    MSELOF / MSEP = ( SSELOF / ν_LOF ) / ( SSEP / ν_pure )

    Compare to F_{ν_LOF, ν_pure, 0.95}

    significant? - there is significant unmodeled trend, and model should be modified

    not significant? - there is no significant unmodeled trend, and this supports model adequacy


    Example - Wave Solder Defects

    From earlier regression, SSE = 2694.0 and SSR = 25306.5

    LACK OF FIT TEST

    ANOVA
                 df   SS         MS         F          value from F-table (95% pt)
    Residual     7    2694.045
    LOF          5    2500.045   500.0091   5.154733   19.3 (this is F_{5,2,0.95})
    Pure Error   2    194        97

    Replicate Set (runs 9-11, all at 0, 0, 0): 101, 96, 115

    std. devn    9.848858
    sample var   97
    sum of sq    194  (as (n_i - 1) s²)

    This was done by hand - Excel has no Lack of Fit test
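    The same arithmetic as a short Python check (numbers from the example above):

        import numpy as np

        SSE = 2694.045                             # residual SS, 7 degrees of freedom
        reps = np.array([101., 96, 115])           # replicate runs 9-11 at (0, 0, 0)
        SSEP = np.sum((reps - reps.mean()) ** 2)   # pure error SS = 194, 2 dof
        SSELOF = SSE - SSEP                        # 2500.045, 7 - 2 = 5 dof
        F = (SSELOF / 5) / (SSEP / 2)              # approx 5.15 < F_{5,2,0.95} = 19.3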


    A Comment on the Ratio Tests

    Order of Preference (or value) - from most definitive to least definitive:

    1. Lack of Fit Test -- MSELOF/MSEP
    2. MSE/s²_inherent
    3. MSR/MSE

    If at all possible, try to include replicate runs in your experimental program so that the Lack of Fit test can be conducted

    Many statistical software packages will perform the Lack of Fit test in their Regression modules - Excel does NOT


    The Parameter Estimate Covariance Matrix

    summarizes the variance-covariance structure of the parameter estimates:

              [ Var(β̂1)        Cov(β̂1, β̂2)   ...   Cov(β̂1, β̂p) ]
    Cov(β̂) = [ Cov(β̂1, β̂2)   Var(β̂2)        ...   Cov(β̂2, β̂p) ]
              [      ⋮               ⋮          ⋱         ⋮       ]
              [ Cov(β̂1, β̂p)   Cov(β̂2, β̂p)   ...   Var(β̂p)      ]


    Properties of the Covariance Matrix

    symmetric -- Cov(b1,b2) = Cov(b2,b1)

    diagonal entries are always non-negative

    off-diagonal entries can be +ve or -ve

    matrix is positive definite - for any nonzero vector v,

    vᵀ Cov(β̂) v > 0


    Parameter Estimate Covariance Matrix

    Key point - the covariance structure of the parameter estimates is governed by the experimental run conditions used for the explanatory variables - the Experimental Design

    Example - the Wave Solder Defects data

    XᵀX = [ 11   0   0   0 ]        (XᵀX)⁻¹ = [ 1/11    0     0     0  ]
          [  0   8   0   0 ]                  [  0     1/8    0     0  ]
          [  0   0   8   0 ]                  [  0      0    1/8    0  ]
          [  0   0   0   8 ]                  [  0      0     0    1/8 ]

    Parameter estimates are uncorrelated, and variances of the non-intercept parameters are the same - towards uniform precision of parameter estimates


    Estimating the Parameter Covariance Matrix

    The X matrix is known - set of run conditions - so the only estimated quantity is the inherent noise variance

    from replicates, external estimate, or MSE

    For the wave solder defect data, the residual variance (MSE) is 384.86 with 7 degrees of freedom, and the parameter covariances are:

    Côv(β̂) = s_e² (XᵀX)⁻¹ = 384.86 [ 1/11   0    0    0  ]   [ 34.99    0       0       0     ]
                                    [  0    1/8   0    0  ] = [  0      48.11    0       0     ]
                                    [  0     0   1/8   0  ]   [  0       0      48.11    0     ]
                                    [  0     0    0   1/8 ]   [  0       0       0      48.11  ]
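    In numpy form, this calculation is just a scalar-matrix product; the square roots of the diagonal give the standard errors used on the following slides:

        import numpy as np

        XtX_inv = np.diag([1/11, 1/8, 1/8, 1/8])   # (X'X)^-1 for the coded design
        s2_e = 384.86                              # residual variance (MSE, 7 dof)
        cov_beta = s2_e * XtX_inv                  # diag approx [34.99, 48.11, 48.11, 48.11]
        std_err = np.sqrt(np.diag(cov_beta))       # approx [5.92, 6.94, 6.94, 6.94]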


    Using the Covariance Matrix

    Variances of parameter estimates

    are obtained from the diagonal of the matrix

    square root is the standard devn, or standard error, of the parameter estimates

    use to formulate confidence intervals for the parameters

    use in hypothesis tests for the parameters

    Correlations between the parameter estimates

    can be obtained by taking the covariance from the appropriate off-diagonal element, and dividing by the standard errors of the individual parameter estimates


    Correlation of the Parameter Estimates

    Note that

    β̂0 = ȳ - β̂1 x̄

    i.e., the parameter estimate for the intercept depends linearly on the slope!

    the slope and intercept estimates are correlated

    changing the slope changes the point of intersection with the axis, because the line must go through the centroid of the data


    Getting Rid of the Covariance

    Let's define the explanatory variable as the deviation from its average:

    z_i = x_i - x̄

    Least Squares parameter estimates:

    β̂0 = ȳ

    β̂1 = Σ_{i=1}^{N} z_i Y_i / Σ_{i=1}^{N} z_i²

    - note that now there is no explicit dependence on the slope value in the intercept expression

    - average of z is zero


    Getting Rid of the Covariance

    In this form of the model, the slope and intercept

    parameter estimates are uncorrelated

    Why is lack of correlation useful?

    allows independent decisions about parameter estimates

    decide whether slope is significant, intercept is significant - individually

    unique assignment of trend

    intercept clearly associated with mean of y's

    slope clearly associated with steepness of trend

    correlation can be eliminated by altering form of model, and choice of experimental points


    Confidence Intervals for Parameters

    similar procedure to straight line case:

    given the standard error for a parameter estimate, use the appropriate t-value, and form the interval as:

    β̂_i ± t_{ν, α/2} s_β̂i

    The degrees of freedom for the t-statistic come from the estimate of the inherent noise variance

    the degrees of freedom will be the same for all of the parameter estimates

    If the confidence interval contains zero, the parameter is plausibly zero and consideration should be given to deleting the term.
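    A small Python sketch of this construction (scipy assumed for the t quantile); with the wave solder values (β̂0 = 98.36, s_β̂0 = 5.915, 7 dof) it reproduces the Excel limits shown a few slides later:

        import numpy as np
        from scipy import stats

        def marginal_confidence_intervals(beta_hat, cov_beta, dof, alpha=0.05):
            se = np.sqrt(np.diag(cov_beta))       # standard errors
            t = stats.t.ppf(1 - alpha / 2, dof)   # t_{dof, alpha/2}
            return np.column_stack([beta_hat - t * se, beta_hat + t * se])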


    Hypothesis Tests for Parameters

    represent an alternative approach to testing whether the term should be retained in the model

    Null hypothesis - parameter = 0

    Alternate hypothesis - parameter is not equal to 0

    Test statistic:

    | β̂_i / s_β̂i |

    compare absolute value to t_{ν, α/2}

    if test statistic is greater (outside the fence), parameter is significant -- retain

    inside the fence? - consider deleting the term


    Example - Wave Solder Defects Data

    Test statistic will be compared to

    t_{7, 0.025} = 2.365

    because MSE is used to calculate standard errors of parameters, and has 7 degrees of freedom.

    Test statistic for intercept:

    | β̂0 / s_β̂0 | = 98.36 / √34.99 = 16.63

    Since 16.63 > 2.365, conclude that intercept parameter IS significant and should be retained.


    Example - Wave Solder Defects Data

    For the next term in the model:

    | β̂1 / s_β̂1 | = 26.5 / √48.11 = 3.82 > 2.365

    Therefore this term should be retained in the model.

    Because the parameter estimates are uncorrelated in this model, terms can be dropped without the need to re-estimate the other parameters in the model -- in general, you will have to re-estimate the final model once more to obtain the parameter estimates corresponding to the final model form.


    Example - Wave Solder Defects Data

    From Excel:

                     Coefficients   Standard Error   t Stat     P-value     Lower 95%   Upper 95%
    Intercept        98.36363636    5.915031978      16.62943   6.948E-07   84.376818   112.3505
    Conveyor Speed   26.5           6.935989803      3.820652   0.0065367   10.099002   42.901
    Pot Temp         26             6.935989803      3.748564   0.0071817   9.599002    42.401
    Flux Density     -42.25         6.935989803      -6.09142   0.0004953   -58.651     -25.849

    Standard Error - standard devns. of each parameter estimate
    t Stat - test statistic for each parameter
    P-value - prob. that a value is greater than the computed test ratio - 2-tailed test!
    Lower/Upper 95% - confidence limits


    Precision of the Predicted Responses

    The predicted response from an estimated model has uncertainty, because it is a function of the parameter estimates which have uncertainty:

    e.g., Solder Wave Defect Model - first response, at the point (-1, -1, -1):

    ŷ_1 = β̂0 + β̂1(-1) + β̂2(-1) + β̂3(-1)

    If the parameter estimates were uncorrelated, the variance of the predicted response would be:

    Var(ŷ_1) = Var(β̂0) + Var(β̂1) + Var(β̂2) + Var(β̂3)

    (recall results for variance of sum of random variables)


    Precision of the Predicted Responses

    In general, both the variances and covariances of the parameter estimates must be taken into account.

    For prediction at the k-th data point:

    Var(ŷ_k) = x_kᵀ (XᵀX)⁻¹ x_k σ²

    where x_kᵀ = [ x_k1  x_k2  ...  x_kp ] is the vector of explanatory variable values at the k-th data point


    Example - Wave Solder Defects Model

    In this example, the parameter estimates are uncorrelated

    XᵀX is diagonal

    variance of the predicted response is in fact the sum of the variances of the parameter estimates

    Variance of prediction at run #11 (0, 0, 0):

    Var(ŷ_11) = Var(β̂0) + Var(β̂1)(0) + Var(β̂2)(0) + Var(β̂3)(0) = Var(β̂0)


    Precision of Future Predictions

    Suppose we want to predict the response at conditions other than those of the experimental runs --> future run.

    The value we observe will consist of the deterministic component, plus the noise component.

    In predicting this value, we must consider:

    uncertainty from our prediction of the deterministic component

    noise component

    The variance of this future prediction is

    Var(ŷ_future) + σ²

    where Var(ŷ_future) is computed using the same expression as for the variance of predicted responses at experimental run conditions


    Estimating Precision of Predicted Responses

    Use an estimate of the inherent noise variance:

    s²_ŷk = x_kᵀ (XᵀX)⁻¹ x_k s_e²

    The degrees of freedom for the estimated variance of the predicted response are those of the estimate of the noise variance

    replicates, external estimate, MSE
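    As a sketch, using the wave solder quantities from earlier (run #11 has x_k = (1, 0, 0, 0), so the prediction variance reduces to Var(β̂0)):

        import numpy as np

        def prediction_variance(x_k, XtX_inv, s2_e):
            # estimated variance of the predicted response at conditions x_k
            x_k = np.asarray(x_k, dtype=float)
            return x_k @ XtX_inv @ x_k * s2_e

        print(prediction_variance([1, 0, 0, 0], np.diag([1/11, 1/8, 1/8, 1/8]), 384.86))  # 34.99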


    Confidence Limits for Predicted Responses

    Follow an approach similar to that for parameters - 100(1-α)% confidence limits for the predicted response at the k-th run are:

    ŷ_k ± t_{ν, α/2} s_ŷk

    degrees of freedom are those of the inherent noise variance estimate

    If the prediction is for a response at conditions OTHER than one of the experimental runs, the limits are:

    ŷ_k ± t_{ν, α/2} √( s²_ŷk + s_e² )


    Practical Guidelines for Model Development

    1) Consider CODING your explanatory variables

    Coding - one standard form:

    x̃_i = (x_i - x̄_i) / (range(x_i)/2)

    places designed experiment into +1, -1 form

    if run conditions are from an experimental design, this coding must be used in order to obtain all of the benefits from the design - uncorrelated parameter estimates

    if conditions are not from an experimental design, such a coding improves numerical conditioning of the problem -- similar numerical scales for all variables
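    A one-function sketch of this coding, following the slide's convention of centring at the average:

        import numpy as np

        def code_variable(x):
            # (x - x_bar) / (range/2) -> roughly -1 to +1 for a designed experiment
            x = np.asarray(x, dtype=float)
            return (x - x.mean()) / ((x.max() - x.min()) / 2)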


    Practical Guidelines for Model Development

    2) Types of models -

    linear in the explanatory variables

    linear with two-factor interactions (x_i x_j)

    general polynomials

    3) Watch for collinearity in the X matrix - run condition patterns for two or more explanatory variables are almost the same

    prevents clear assignment of trend to each factor

    shows up as singularity in the XᵀX matrix

    associated with very strong correlation between parameter estimates


    Practical Guidelines for Model Development

    4) Be careful not to extrapolate excessively beyond the range of the data

    5) Maximum number of parameters that can be fit to a data set = number of unique run conditions:

    N - Σ_{i=1}^{m} n_i + m

    N   - number of data points
    m   - number of replicate sets
    n_i - number of points in replicate set i

    as number of parameters increases, precision of predictions decreases - start modeling noise


    Practical Guidelines for Model Development

    6) Model building sequence

    building approach - start with few terms and add as necessary

    pruning approach - start with more terms and remove those which aren't statistically significant

    stepwise regression - terms are added, and retained according to some criterion - frequently R²

    uncorrelated? criterion?

    all subsets regression - consider all subsets of model terms of a certain type, and select the model with the best criterion

    significant computational load


    Polynomial Models

    Order - maximum over the p terms in the model of the sum of the exponents in a given term

    e.g.,

    Y = β0 + β1 x1 + β2 x2² + β3 x1² x2³ + ε

    is a fifth-order model

    Two factor interaction - product term - x1 x2

    implies that the impact of x1 on the response depends on the value of x2


    Polynomial Models

    Comments -

    polynomial models can sometimes suffer from collinearity problems - coding helps this

    polynomials can provide approximations to nonlinear functions - think of Taylor series approximations

    high-order polynomial models can sometimes be replaced by fewer nonlinear function terms

    e.g., ln(x) vs. 3rd order polynomial


    Joint Confidence Region (JCR)

    answers the question "Where do the true values of the parameters lie?"

    Recall that for individual parameters, we gain an understanding of where the true value lies by:

    examining the variability pattern (distribution) for the parameter estimate

    identifying a range in which most of the values of the parameter estimate are likely to lie

    manipulating this range to determine an interval which is likely to contain the true value of the parameter


    Joint Confidence Region

    Confidence interval for individual parameter:

    Step 1) The ratio of the estimate's deviation to its standard deviation,

    (β̂_i - β_i) / s_β̂i  ~  t_ν

    is distributed as a Student's t-distribution, with degrees of freedom equal to those of the standard devn (from the variance estimate)

    Step 2) Find the interval [ -t_{ν, α/2}, t_{ν, α/2} ] which contains 100(1-α)% of values - i.e., the probability of a t-value falling in this interval is (1-α)

    Step 3) Rearrange this interval to obtain the interval

    β̂_i ± t_{ν, α/2} s_β̂i

    which contains the true value of the parameter 100(1-α)% of the time


    Joint Confidence Region

    Comments on Individual Confidence Intervals:

    sometimes referred to as marginal confidence intervals - cf. marginal distributions vs. joint distributions from earlier

    marginal confidence intervals do NOT account for correlations between the parameter estimates

    examining only marginal confidence intervals can sometimes be misleading if there is strong correlation between several parameter estimates

    value of one parameter estimate depends in part on another

    deletion of the other changes the value of the parameter estimate

    decision to retain might be altered


    Joint Confidence Region

    Sequence:

    Step 1) Identify a statistic which is a function of the parameter estimate statistics

    Step 2) Identify a region in which values of this statistic lie a certain fraction of the time (a 100(1-α)% region)

    Step 3) Use this information to determine a region which contains the true value of the parameters 100(1-α)% of the time


    Joint Confidence Region

    The quantity

    (β̂ - β)ᵀ XᵀX (β̂ - β) / (p s²)  ~  F_{p, n-p}

    is the ratio of two sums of squares, and is distributed as an F-distribution with p degrees of freedom in the numerator, and n-p degrees of freedom in the denominator

    s² - estimate of inherent noise variance (if MSE is used, its degrees of freedom are n-p)


    Joint Confidence Region

    We can define a region by thinking of those values of the ratio which have a value less than F_{p, n-p, 1-α}

    i.e.,

    (β̂ - β)ᵀ XᵀX (β̂ - β) / (p s²)  ≤  F_{p, n-p, 1-α}

    Rearranging yields:

    (β̂ - β)ᵀ XᵀX (β̂ - β)  ≤  p s² F_{p, n-p, 1-α}


    Joint Confidence Region - Definition

    The 100(1-α)% joint confidence region for the parameters is defined as those parameter values β satisfying:

    (β̂ - β)ᵀ XᵀX (β̂ - β)  ≤  p s² F_{p, n-p, 1-α}

    Interpretation:

    the region defined by this inequality contains the true values of the parameters 100(1-α)% of the time

    if values of zero for one or more parameters lie in this region, those parameters are plausibly zero, and consideration should be given to dropping the corresponding terms from the model


    Joint Confidence Region - Example with 2 Parameters

    Let's reconsider the solder thickness example:

    XᵀX = [ 10     2367   ]        β̂ = [ 458.10 ]        s² = 135.38
          [ 2367   563335 ]             [ -1.13  ]

    95% Joint Confidence Region (JCR) for slope & intercept:

    [ β̂0 - β0   β̂1 - β1 ] XᵀX [ β̂0 - β0 ]  ≤  2 s² F_{2, 10-2, 0.95}
                               [ β̂1 - β1 ]


    Joint Confidence Region - Example with 2 Parameters

    95% Joint Confidence Region (JCR) for slope & intercept:

    [ 458.10 - β0   -1.13 - β1 ] XᵀX [ 458.10 - β0 ]  ≤  2 s² F_{2, 8, 0.95} = 2 (135.38)(4.46) = 1207.59
                                     [ -1.13 - β1  ]

    The boundary is an ellipse...
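    A short Python check of whether a candidate parameter pair falls inside this ellipse:

        import numpy as np

        XtX = np.array([[10., 2367], [2367, 563335]])
        beta_hat = np.array([458.10, -1.13])
        bound = 2 * 135.38 * 4.46                # p * s^2 * F_{2,8,0.95} = 1207.59

        def in_jcr(beta):
            d = beta_hat - np.asarray(beta, dtype=float)
            return d @ XtX @ d <= bound          # inside the 95% JCR?

        print(in_jcr([458.10, -1.13]))           # True - the centre of the ellipse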


    Joint Confidence Region - Example with 2 Parameters

    [plot: 95% joint confidence region - a rotated ellipse in (intercept, slope) space; intercept axis roughly 320 to 600, slope axis roughly -1.6 to -0.6]

    rotated - implies correlation between estimates of slope and intercept

    centred at least squares parameter estimates

    greater shadow along horizontal axis --> variance of intercept estimate is greater than that of slope


    Interpreting Joint Confidence Regions

    1) Are the axes aligned with the coordinate axes?

    is the ellipse horizontal or vertical? - indicates no correlation between parameter estimates

    2) Which axis has the greatest shadow?

    projection of ellipse along axis

    indicates which parameter estimate has the greatest variance

    3) The elliptical region is, by definition, centred at the least squares parameter estimates

    4) Long, narrow, rotated ellipses indicate significant correlation between parameter estimates

    5) If a value of zero for one or more parameters lies in the region, these parameters are plausibly zero - consider deleting from model


    Joint Confidence Regions

    What is the motivation for the ratio

    (β̂ - β)ᵀ XᵀX (β̂ - β) / (p s²)

    used to define the joint confidence region?

    Consider the joint distribution (density) for the parameter estimates:

    f(β̂) = (2π)^(-p/2) det(Σ_β̂)^(-1/2) exp{ -(1/2) (β̂ - β)ᵀ Σ_β̂⁻¹ (β̂ - β) }

    Substitute in the estimate for the parameter covariance matrix, Σ̂_β̂ = s² (XᵀX)⁻¹:

    (β̂ - β)ᵀ ( s² (XᵀX)⁻¹ )⁻¹ (β̂ - β) = (β̂ - β)ᵀ XᵀX (β̂ - β) / s²


    Confidence Intervals from Densities

    Individual Interval:

    [plot: univariate density f(β̂) with lower and upper limits marked; area between limits = 1-α]

    Joint Region:

    [plot: bivariate density f(β̂0, β̂1) over the (β0, β1) plane; the contour enclosing volume = 1-α defines the Joint Confidence Region]


    Relationship to Marginal Confidence Limits

    [plot: joint confidence region ellipse in (intercept, slope) space, centred at the least squares parameter estimates, with the marginal confidence interval for the intercept shown along the horizontal axis and the marginal confidence interval for the slope along the vertical axis]


    Relationship to Marginal Confidence Limits

    [plot: the 95% confidence region for the parameters considered jointly (the ellipse) compared with the rectangular 95% region implied by considering the parameters individually (the marginal intervals for intercept and slope)]


    Relationship to Marginal Confidence Intervals

    Marginal confidence intervals are contained in the joint confidence region

    potential to miss portions of plausible parameter values at the tails of the ellipsoid

    using individual confidence intervals implies a rectangular region, which includes sets of parameter values that lie outside the joint confidence region

    both situations can lead to

    erroneous acceptance of terms in model

    erroneous rejection of terms in model


    Going Further - Nonlinear Regression Models

    Model:

    Y_i = f(x_i, θ) + ε_i

    x_i - explanatory variables
    θ   - parameters
    ε_i - random noise component

    Estimation Approach:

    linearize model with respect to parameters

    treat linearization as a linear regression problem

    iterate by repeating linearization/estimation/linearization about new estimates, until convergence to parameter values - Gauss-Newton iteration - or solve a numerical optimization problem
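    A minimal sketch of a Gauss-Newton iteration, assuming a user-supplied model function f(x, theta) and a finite-difference Jacobian; a practical implementation would add step-size control and a convergence test:

        import numpy as np

        def gauss_newton(f, x, y, theta0, n_iter=20, h=1e-6):
            # iterate: linearize about current theta, solve the linear LS subproblem
            theta = np.asarray(theta0, dtype=float)
            for _ in range(n_iter):
                r = y - f(x, theta)                           # residual vector
                J = np.column_stack([                         # d f / d theta_j, by differences
                    (f(x, theta + h * np.eye(len(theta))[j]) - f(x, theta)) / h
                    for j in range(len(theta))])
                step, *_ = np.linalg.lstsq(J, r, rcond=None)  # linearized LS problem
                theta = theta + step
            return theta

        # e.g., Arrhenius-type model: f = lambda x, th: th[0] * np.exp(-th[1] / x)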


    Interpretation - Columns of X

    columns of X contain the values of a given variable at the different operating points

    entries in XᵀX are dot products of vectors of regressor variable values

    related to correlation between regressor variables

    form of XᵀX is dictated by experimental design

    e.g., 2^k design - diagonal form


    Parameter Estimation - Graphical View

    [diagram: observation vector y, approximating observation vector ŷ lying in the model plane, and the residual vector joining them]


    Parameter Estimation - Nonlinear Regression Case

    [diagram: observation vector y, approximating observation vector ŷ lying on the curved model surface, and the residual vector joining them]


    Properties of LS Parameter Estimates

    Key Point - parameter estimates are random variables

    because of how stochastic variation in data propagates through estimation calculations

    parameter estimates have a variability pattern - probability distribution and density functions

    Unbiased

    average of repeated data collection / estimation sequences will be the true value of the parameter vector:

    E{β̂} = β


    Properties of Parameter Estimates

    Consistent

    behaviour as number of data points tends to infinity

    with probability 1,

    lim_{N→∞} β̂ = β

    distribution narrows as N becomes large

    Efficient

    variance of least squares estimates is less than that of other types of parameter estimates


    Properties of Parameter Estimates

    Covariance Structure

    summarized by the variance-covariance matrix:

    Cov(β̂) = σ² (XᵀX)⁻¹

    structure dictated by experimental design; σ² - variance of noise

    e.g., for the straight line case:

    Cov(β̂) = [ Var(β̂0)        Cov(β̂0, β̂1) ]
              [ Cov(β̂0, β̂1)   Var(β̂1)      ]


    Prediction Variance

    in matrix form -

    var(ŷ_k) = x_kᵀ (XᵀX)⁻¹ x_k σ²

    where x_k is the vector of conditions at the k-th data point


    Joint Confidence Regions

    Variability in data can affect parameter estimates jointly, depending on structure of data and model

    [diagram: contours of the sum of squares (or likelihood) function in the (θ1, θ2) parameter plane - a section defines the joint confidence region]