Nonlinear Regression
Didier Concordet
NATIONAL VETERINARY SCHOOL Toulouse
An example

[Figure: concentration-time data plotted on a log scale (1 to 1000) over times 0-300]

Time   Conc
0      112.0
5      69.1
10     50.4
20     22.3
30     12.8
60     6.3
90     4.0
120    3.5
150    2.2
180    1.7
210    1.2
300    0.4
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
What does nonlinear mean?

Definition: an operator P is linear if:
• for all objects x, y on which it operates: P(x + y) = P(x) + P(y)
• for all numbers λ and all objects x: P(λ x) = λ P(x)

When an operator is not linear, it is nonlinear.
Examples

Among the operators below, which ones are nonlinear?

• P(t) = a t
• P(t) = a
• P(t) = a + b t
• P(t) = a t + b t²
• P(a, b) = a t + b t²
• P(A, λ) = A exp(−λ t)
• P(A) = A exp(−0.1 t)
• P(t) = A exp(−λ t)
What is a nonlinear kinetics?

For a given dose D, let C(t, D) denote the concentration at time t.

The kinetics is linear when the operator
P: D ↦ C(·, D)
is linear. When P is not linear, the kinetics is nonlinear.
What is a nonlinear kinetics?

Examples:

P(D) = C(t, D) = (D/V) exp(−(Cl/V) t)
Here C(t, D) is proportional to D: the kinetics is linear.

P(D) = C(t, D) obtained from a saturable (Michaelis-Menten type) elimination
Here C(t, D) is not proportional to D: the kinetics is nonlinear.
What is a nonlinear statistical model?

A statistical model:

Y = f(x1, x2, …, xq; θ1, θ2, …, θp) + ε

• Y: observation (dependent variable)
• x1, …, xq: covariates (independent variables)
• θ1, …, θp: parameters
• f: function
• ε: error (residual)
What is a nonlinear statistical model?

A statistical model is linear when the operator
P: (θ1, …, θp) ↦ f(x1, …, xq; θ1, …, θp)
is linear. When P is not linear, the model is nonlinear.
What is a nonlinear statistical model?

Example: the model

Y = θ1 + θ2 t + ε

(Y = concentration, t = time) is linear.
Examples

Among the statistical models below, which ones are nonlinear?

• Y = θ1 + θ2 t + ε
• Y = θ1 + θ2 t + θ3 t² + ε
• Y = θ1 exp(−θ2 t) + ε
• Y = θ1 exp(−θ2 t) + θ3 exp(−θ4 t) + ε
• Y = θ1 x / (θ3 + θ2 x²) + ε
• Y = θ1 exp(−0.1 t) + ε
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
How to fit the data?

Proceed in three main steps:

• Write a (statistical) model
• Choose a criterion
• Minimize the criterion
Write a (statistical) model

• Find a function of the covariate(s) to describe the mean variation of the dependent variable (mean model).

• Find a function of the covariate(s) to describe the dispersion of the dependent variable about the mean (variance model).
Example

[Figure: the concentration-time data on a log scale (0.1 to 1000) over times 0-300]

Y = θ1 exp(−θ2 t) + ε

ε is assumed Gaussian with a constant variance: homoscedastic model.
How to choose the criterion to optimize?

Homoscedasticity: Ordinary Least Squares (OLS).
Under normality, OLS is equivalent to maximum likelihood.

Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).
Homoscedastic models

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Ordinary Least-Squares criterion:

SS(θ1, …, θp) = Σ_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]²

and minimize it: SS(θ1, …, θp) → minimum.
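As a sketch, the OLS fit of the mono-exponential model Y = θ1 exp(−θ2 t) + ε can be written with SciPy; the data, seed, and starting values below are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data from Y = theta1*exp(-theta2*t) + eps with (112, 0.02)
rng = np.random.default_rng(0)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t) + rng.normal(0.0, 1.0, t.size)

def residuals(theta):
    # OLS minimizes SS(theta) = sum_i (Y_i - f(t_i; theta))^2
    return y - theta[0] * np.exp(-theta[1] * t)

fit = least_squares(residuals, x0=[50.0, 0.05])
theta_hat = fit.x
```

`least_squares` squares and sums the residual vector internally, so returning the raw residuals is enough to implement the SS criterion above.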
Heteroscedastic models: Weighted Least-Squares criterion

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Weighted Least-Squares criterion with weights w_i:

WSS(θ1, …, θp) = Σ_i w_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]²

and minimize it: WSS(θ1, …, θp) → minimum.
How to choose the weights?

When the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

is heteroscedastic (i.e. Var(ε_i) is not constant with i), it is possible to rewrite it as

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + g(x_{1,i}, …, x_{q,i}; θ1, …, θp) ε_i

where Var(ε_i) does not depend on i. The weights are then chosen as

w_i = 1 / g(x_{1,i}, …, x_{q,i}; θ1, …, θp)²
Example

Y_i = θ1 exp(−θ2 t_i) + ε_i   with CV(Y_i) = constant,
i.e. Var(ε_i) = σ² θ1² exp(−2 θ2 t_i)

The model can be rewritten as

Y_i = θ1 exp(−θ2 t_i) + θ1 exp(−θ2 t_i) ε_i   with Var(ε_i) = constant

The weights are chosen as

w_i = 1 / [θ1 exp(−θ2 t_i)]²
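A minimal WLS sketch for this constant-CV model, with hypothetical data; scaling each raw residual by √w_i = 1/f_i implements the weights w_i = 1/f_i²:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical constant-CV data: Y_i = theta1*exp(-theta2*t_i)*(1 + eps_i)
rng = np.random.default_rng(1)
t = np.linspace(0, 300, 20)
mean = 112.0 * np.exp(-0.02 * t)
y = mean * (1.0 + rng.normal(0.0, 0.05, t.size))

def weighted_residuals(theta):
    f = theta[0] * np.exp(-theta[1] * t)
    # w_i = 1/f_i^2, so each residual is scaled by sqrt(w_i) = 1/f_i
    return (y - f) / f

fit = least_squares(weighted_residuals, x0=[50.0, 0.05])
```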
Extended (Weighted) Least Squares

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Extended Least-Squares criterion with weights w_i:

EWSS(θ1, …, θp) = Σ_i { w_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]² − ln w_i }

and minimize it: EWSS(θ1, …, θp) → minimum.
Balance sheet

Criterion | When                   | Advantages                                                    | Drawbacks
OLS       | Homoscedastic models   | Easy to use                                                   |
WLS       | Heteroscedastic models | Robust to variance misspecification                           | Estimator with large variance
ELS       | Heteroscedastic models | Unbiased estimate, small variance (efficient under normality) | Not robust to variance misspecification
The criterion properties

• It converges
• It leads to consistent (unbiased) estimates
• It leads to efficient estimates
• It has several minima
It converges

When the sample size increases, the criterion concentrates about a value of the parameter.

Example: consider the homoscedastic model

Y_i = θ1 exp(−θ2 t_i) + ε_i

The criterion to use is the Least-Squares criterion:

SS(θ1, θ2) = Σ_{i=1}^{n} [ Y_i − θ1 exp(−θ2 t_i) ]²
The criterion to use is the Least Squares criterion
25
It converges
, 21 SS
1
2
Small sample size
Large sample size
It leads to consistent estimates

[Figure: isocontours of SS(θ1, θ2) concentrating about the true value (θ1⁰, θ2⁰)]

The criterion concentrates about the true value.
It leads to efficient estimates

For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound): the "precision" of a consistent estimator is bounded.

An estimator is efficient when its variance equals this lower bound.
Geometric interpretation

[Figure: isocontours of the criterion in the (θ1, θ2) plane; this ellipsoid is a confidence region of the parameter]
It leads to efficient estimates

[Figure: isocontours of −2 ln(likelihood) and of another criterion about (θ1⁰, θ2⁰)]

For a given large n, no criterion giving consistent estimates is more "convex" than −2 ln(likelihood).
It has several minima

[Figure: criterion surface over the (θ1, θ2) plane with several local minima]
Minimize the criterion

Suppose that the criterion to optimize has been chosen. We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion.

We need an algorithm to minimize such a criterion.
Example

Consider the homoscedastic model

Y_i = θ1 exp(−θ2 t_i) + ε_i

We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion

SS(θ1, θ2) = Σ_i [ Y_i − θ1 exp(−θ2 t_i) ]²
Isocontours

[Figure: isocontours of SS(θ1, θ2) in the (θ1, θ2) plane]
Different families of algorithms

• Zero order algorithms: computation of the criterion only
• First order algorithms: computation of the first derivative of the criterion
• Second order algorithms: computation of the second derivative of the criterion
Zero order algorithms

• Simplex algorithm
• Grid search and Monte-Carlo methods
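A minimal Monte-Carlo (random search) sketch: a zero order method that only evaluates the criterion. The data and search bounds are hypothetical:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
rng = np.random.default_rng(3)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def ss(theta1, theta2):
    # the OLS criterion, the only quantity a zero order method needs
    return np.sum((y - theta1 * np.exp(-theta2 * t)) ** 2)

best_val, best_theta = np.inf, None
for _ in range(20000):
    cand = (rng.uniform(0.0, 200.0), rng.uniform(0.0, 0.1))
    val = ss(*cand)
    if val < best_val:
        best_val, best_theta = val, cand
```

Such methods are slow but need no derivatives, which is why they are mainly used to find a starting point for faster algorithms.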
Simplex algorithm

[Figure: simplex moves on the isocontours of the criterion in the (θ1, θ2) plane]

Monte-Carlo algorithm

[Figure: random criterion evaluations scattered over the (θ1, θ2) plane]
First order algorithms

• Line search algorithm
• Conjugate gradient
First order algorithms

The derivatives of the criterion cancel at its optima.

Suppose that there is only one parameter θ to estimate: the criterion (e.g. SS) depends only on θ. How to find the value(s) of θ where the derivative of the criterion cancels?
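A one-parameter sketch of this idea (hypothetical, noise-free data): with θ1 fixed, find the value of θ2 where dSS/dθ2 cancels, by bracketing the sign change and bisecting:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def dss(theta2, theta1=112.0):
    # derivative of SS(theta2) = sum (y - theta1*exp(-theta2*t))^2
    f = theta1 * np.exp(-theta2 * t)
    return np.sum(2.0 * (y - f) * f * t)

lo, hi = 0.001, 0.1            # bracket containing a sign change of dss
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if dss(lo) * dss(mid) <= 0.0:
        hi = mid               # the zero crossing is in [lo, mid]
    else:
        lo = mid               # the zero crossing is in [mid, hi]
theta2_hat = 0.5 * (lo + hi)
```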
Line search algorithm

[Figure: derivative of the criterion as a function of θ; successive points θ1, θ2 bracket the zero crossing of the derivative]
Second order algorithms

• Gauss-Newton
• Marquardt (combines Gauss-Newton with steepest-descent-like damping)
Second order algorithms

The derivatives of the criterion cancel at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest descent direction.
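A plain Gauss-Newton iteration for the mono-exponential model might look as follows; the data are hypothetical and noise-free, and the start is deliberately close to the optimum, where undamped Gauss-Newton converges:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

theta = np.array([105.0, 0.022])       # starting point near the optimum
for _ in range(50):
    f = theta[0] * np.exp(-theta[1] * t)
    r = y - f                          # residuals
    J = np.column_stack([
        np.exp(-theta[1] * t),                   # df/dtheta1
        -theta[0] * t * np.exp(-theta[1] * t),   # df/dtheta2
    ])
    # each step solves the linearized least-squares problem J'J step = J'r
    theta = theta + np.linalg.solve(J.T @ J, J.T @ r)
```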
Gauss-Newton (one dimension)

[Figure: derivative of the criterion vs θ; when the criterion is convex, the iterates θ1, θ2, θ3 converge to the zero of the derivative]

[Figure: when the criterion is not convex, the iterates θ1, θ2 can move away from the minimum]

Gauss-Newton

[Figure: Gauss-Newton steps on the isocontours of the criterion in the (θ1, θ2) plane]
Marquardt

[Figure: derivative of the criterion vs θ, with iterates θ1, θ2, θ3]

Allows one to deal with the case where the criterion is not convex: when the second derivative is < 0 (the first derivative decreases), it is set to a positive value.
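A Marquardt-style damping sketch on hypothetical data: when a step fails to decrease the criterion, the damping term λ added to J'J is increased, which keeps the (approximate) second derivative matrix positive definite:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def model(theta):
    return theta[0] * np.exp(-theta[1] * t)

theta = np.array([50.0, 0.05])
lam = 1e-3
for _ in range(200):
    r = y - model(theta)
    J = np.column_stack([np.exp(-theta[1] * t),
                         -theta[0] * t * np.exp(-theta[1] * t)])
    # damped normal equations: (J'J + lam*I) step = J'r
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    if np.sum((y - model(theta + step)) ** 2) < np.sum(r ** 2):
        theta, lam = theta + step, lam * 0.5   # accept step, relax damping
    else:
        lam *= 10.0                            # reject step, increase damping
```

With λ near 0 this behaves like Gauss-Newton; with large λ the step shrinks toward a steepest-descent direction, which is why the method tolerates a rougher starting point.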
Balance sheet

Order | Algorithm          | When                                               | Robustness | Speed
0     | Monte-Carlo        | To start the optimisation                          | +++        | 0
0     | Simplex            | To start the optimisation                          | ++         | +
1     | Conjugate gradient | When the second derivative is difficult to compute | +          | ++
1     | Line search        | When the second derivative is difficult to compute | ++         | +
2     | Gauss-Newton       | To finish the optimisation                         | 0          | +++
2     | Marquardt          | With a reasonable starting point                   | +          | ++
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
Is this model relevant?

• Graphical inspection of the residuals
  – mean model (f)
  – variance model (g)
• Inspection of numerical results
  – variance-correlation matrix of the estimator
  – Akaike criterion (AIC)
Graphical inspection of the residuals

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + g(x_{1,i}, …, x_{q,i}; θ1, …, θp) ε_i

calculate the weighted residuals:

ε̂_i = [ Y_i − f(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p) ] / g(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p)

and draw ε̂_i versus the fitted values Ŷ_i = f(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p).
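As a sketch, the weighted residuals for the constant-CV model can be computed as below; the data are hypothetical and θ̂ is pretended to come from a previous fit:

```python
import numpy as np

# Hypothetical constant-CV data and a pretend estimate theta_hat
rng = np.random.default_rng(4)
t = np.linspace(0, 300, 50)
theta_hat = (112.0, 0.02)
f_hat = theta_hat[0] * np.exp(-theta_hat[1] * t)    # fitted values
y = f_hat * (1.0 + rng.normal(0.0, 0.05, t.size))   # observations

g_hat = f_hat                   # variance model g = f (constant CV)
eps_hat = (y - f_hat) / g_hat   # weighted residuals

# a flat, structureless scatter of eps_hat against f_hat suggests
# that the mean model f and the variance model g are both adequate
```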
Check the mean model

Scatterplot of weighted residuals vs fitted values:

[Figure: two scatterplots of ε̂_i vs Ŷ_i about 0. Left: no structure in the residuals → OK. Right: structure in the residuals → change the mean model (f function)]
Check the variance model: homoscedasticity

Scatterplot of weighted residuals vs fitted values:

[Figure: two scatterplots of ε̂_i vs Ŷ_i about 0. Left: homoscedasticity → OK. Right: no structure in the residuals but heteroscedasticity → change the variance model (g function)]
Example

[Figure: the concentration-time data of the introductory example on a log scale]

Time   Conc
0      112.0
5      69.1
10     50.4
20     22.3
30     12.8
60     6.3
90     4.0
120    3.5
150    2.2
180    1.7
210    1.2
300    0.4

Homoscedastic model: Y_i = θ1 exp(−θ2 t_i) + ε_i
Criterion: OLS
Example

[Figure: weighted residuals ε̂_i vs t_i, ranging from −6 to 6 over times 0-300, showing a clear pattern]

Fitted homoscedastic model: Y_i = θ1 exp(−θ2 t_i) + ε_i

Structure in the residuals → change the mean model.

New model: Y_i = θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i) + ε_i
Example

[Figure: residuals ε̂_i vs t_i, ranging from −3 to 4 over times 0-300; their spread decreases with time]

Fitted model: Y_i = θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i) + ε_i

Heteroscedasticity → change the variance model.

New model: Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)   → needs WLS
Example

Fitted model: Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)

[Figure: weighted residuals vs t_i, ranging from −0.15 to 0.1 over times 0-300, with no visible pattern]

No structure, weighted residuals homoscedastic → OK.
Inspection of numerical results

Correlation matrix of the estimator:

    ( 1            r(θ̂1, θ̂2)   …   r(θ̂1, θ̂p) )
C = ( r(θ̂2, θ̂1)  1             …   r(θ̂2, θ̂p) )
    ( …                                          )
    ( r(θ̂p, θ̂1)  r(θ̂p, θ̂2)   …   1           )

Strong correlations between estimators may mean:
• the model is over-parametrized
• the parametrization is not good
• the model is not identifiable
The model is over-parametrized

Change the mean and/or variance model (f and/or g).

Example: the appropriate model is

Y_i = θ1 exp(−θ2 t_i) + ε_i

and you fitted

Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)

Perform a test or check the AIC.
The parametrization is not good

Change the parametrization of your model.

Example: you fitted

Y_i = θ1 exp(−θ2 t_i) + ε_i

try

Y_i = (D/V) exp(−(Cl/V) t_i) + ε_i

Two useful indices: the parametric curvature and the intrinsic curvature.
The model is not identifiable

The model has too many parameters compared to the number of data: there are lots of solutions to the optimisation.

[Figure: two criterion surfaces over the (θ1, θ2) plane with flat valleys of equivalent solutions]

Look at the eigenvalues λ of the correlation matrix: if λmax/λmin is too large and/or λmin is too small, simplify the model.
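A sketch of this eigenvalue check on a hypothetical correlation matrix in which θ2 and θ3 are almost confounded, so λmin is tiny and λmax/λmin is huge:

```python
import numpy as np

# Hypothetical correlation matrix of the estimator:
# theta2 and theta3 are almost perfectly correlated
C = np.array([[1.0, 0.0,   0.0],
              [0.0, 1.0,   0.999],
              [0.0, 0.999, 1.0]])
lam = np.linalg.eigvalsh(C)      # eigenvalues, sorted ascending
ratio = lam.max() / lam.min()    # condition number of C
```

A large `ratio` (here about 2000) signals a nearly flat direction in the criterion, i.e. near non-identifiability.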
The Akaike criterion (AIC)

The Akaike criterion allows one to select a model among several models in "competition". It is nothing else but a penalized log-likelihood: it chooses the model which is the most likely. The penalty is chosen such that the criterion is convergent: when the sample size increases, it selects the "true" model.

AIC = n ln(SS) + 2 p

n = sample size, SS = (weighted or ordinary) sum of squares, p = number of parameters that have been estimated.

The model with the smaller AIC is the best among the compared models.
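A sketch of this AIC formula, comparing a one- and a two-exponential mean model on hypothetical mono-exponential data:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical mono-exponential data
rng = np.random.default_rng(5)
t = np.linspace(0, 300, 15)
y = 112.0 * np.exp(-0.02 * t) + rng.normal(0.0, 1.0, t.size)

def aic(ss, n, p):
    # AIC = n*ln(SS) + 2p, as on the slide
    return n * np.log(ss) + 2 * p

fit1 = least_squares(lambda th: y - th[0] * np.exp(-th[1] * t),
                     x0=[100.0, 0.03])
fit2 = least_squares(lambda th: y - th[0] * np.exp(-th[1] * t)
                                  - th[2] * np.exp(-th[3] * t),
                     x0=[100.0, 0.03, 10.0, 0.001])
aic1 = aic(np.sum(fit1.fun ** 2), t.size, 2)
aic2 = aic(np.sum(fit2.fun ** 2), t.size, 4)
# the model with the smaller AIC is preferred
```

The extra exponential always lowers SS a little, but the 2p penalty usually more than cancels that gain when the simpler model is the true one.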
Example

Iteration   Loss      θ1        θ2       θ3        θ4
0           0.06979   100.665   0.0976   10.616    0.01019
1           0.04349   101.738   0.1048   12.4846   0.01148
2           0.03713   101.589   0.1037   12.1725   0.01121
3           0.03707   101.596   0.1035   12.1057   0.01117
4           0.03707   101.595   0.1035   12.1051   0.01117
5           0.03707   101.595   0.1035   12.1051   0.01117

Final value of the loss function: 0.037
Example

Parameter   Estimate   A.S.E.     Param/ASE
Theta1      101.595    6.104      16.645
Theta2      0.104      0.006      17.366
Theta3      12.105     0.784      15.431
Theta4      0.011      3.66E-04   30.067

        θ1       θ2      θ3      θ4
R =   ( 1                             )
      ( 0.632    1                    )
      ( 0.006    0.449   1            )
      ( −0.003   0.393   0.916   1    )

The strong correlation between θ̂3 and θ̂4 is essentially intrinsic curvature.
About the ellipsoid

It is linked to the convexity of the criterion and to the variance of the estimator: the convexity of the criterion is linked to the variance of the estimator.
Different degrees of convexity

[Figure: criterion surfaces illustrating a flat, weakly convex criterion; a convex criterion; a locally convex criterion; and a criterion convex in some directions only]
How to measure convexity?

One parameter: when the second derivative is positive, the criterion is convex at the point where the second derivative is evaluated.

Several parameters: calculate the Hessian matrix, the matrix of partial second derivatives:

H(θ1, θ2) = ( ∂²SS/∂θ1²(θ1, θ2)     ∂²SS/∂θ1∂θ2(θ1, θ2) )
            ( ∂²SS/∂θ2∂θ1(θ1, θ2)   ∂²SS/∂θ2²(θ1, θ2)   )
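As a sketch, the Hessian of the OLS criterion can be approximated numerically and its eigenvalues inspected; the data and evaluation point below are hypothetical:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def ss(theta):
    return np.sum((y - theta[0] * np.exp(-theta[1] * t)) ** 2)

def hessian(fun, x, h=1e-4):
    # central finite differences for the matrix of second derivatives
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * h
            ej = np.eye(n)[j] * h
            H[i, j] = (fun(x + ei + ej) - fun(x + ei - ej)
                       - fun(x - ei + ej) + fun(x - ei - ej)) / (4.0 * h * h)
    return H

H = hessian(ss, [112.0, 0.02])
eigvals = np.linalg.eigvalsh(H)   # all > 0: locally convex at this point
```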
How to measure convexity?

It is possible to find a linear transformation of the parameters such that the Hessian matrix is

H(θ1, θ2) = ( λ1(θ1, θ2)   0          )
            ( 0            λ2(θ1, θ2) )

λ1(θ1, θ2) and λ2(θ1, θ2) are the eigenvalues of the Hessian matrix.

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for all (θ1, θ2), the criterion is convex.
How to measure convexity?

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for some (θ1, θ2), the criterion is locally convex. (For which points (θ1, θ2) are λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0?)

When λ1(θ1, θ2) and λ2(θ1, θ2) are low (but > 0), the criterion is flat.
The variance-covariance matrix

The variance-covariance matrix of the estimator (denoted V)

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)      )

is proportional to H⁻¹(θ1⁰, θ2⁰).

It is possible to find a linear transformation of the parameters such that V is

V = ( 1/λ1(θ1⁰, θ2⁰)   0               )
    ( 0                1/λ2(θ1⁰, θ2⁰)  )
The variance-covariance matrix

1/λ1(θ1⁰, θ2⁰) and 1/λ2(θ1⁰, θ2⁰) are the eigenvalues of the variance-covariance matrix V.
The correlation matrix

The correlation matrix of the estimator (denoted C) is obtained from V:

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)      )

C = ( 1   r )   with r = Cov(θ̂1, θ̂2) / √( Var(θ̂1) Var(θ̂2) )
    ( r   1 )
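A sketch of this conversion on a hypothetical variance-covariance matrix V, using C_ij = V_ij / √(V_ii V_jj):

```python
import numpy as np

# Hypothetical variance-covariance matrix of (theta1_hat, theta2_hat)
V = np.array([[4.00, 0.60],
              [0.60, 0.25]])
sd = np.sqrt(np.diag(V))     # standard errors of the estimators
C = V / np.outer(sd, sd)     # correlation matrix: r = 0.60/(2.0*0.5) = 0.6
```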
Geometric interpretation

[Figure: criterion surface and its confidence ellipsoid in the (θ1, θ2) plane; the half-axes correspond to 1/λ1(θ1⁰, θ2⁰) and 1/λ2(θ1⁰, θ2⁰), i.e. to Var(θ̂1) and Var(θ̂2)]

When r = 0, the axes of the ellipsoid are parallel to the parameter axes.