Nonlinear Regression
Didier Concordet
NATIONAL VETERINARY SCHOOL Toulouse
An example

[Figure: concentration-time data plotted on a log scale (1 to 1000) over times 0-300]

Time   Conc
0      112.0
5      69.1
10     50.4
20     22.3
30     12.8
60     6.3
90     4.0
120    3.5
150    2.2
180    1.7
210    1.2
300    0.4
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
What does nonlinear mean?

Definition: an operator P is linear if:
• for all objects x, y on which it operates: P(x + y) = P(x) + P(y)
• for all numbers λ and all objects x: P(λ x) = λ P(x)

When an operator is not linear, it is nonlinear.
Examples

Among the operators below, which ones are nonlinear?

• P(t) = a t
• P(t) = a
• P(t) = a + b t
• P(t) = a t + b t²
• P(a, b) = a t + b t²
• P(A, λ) = A exp(−λ t)
• P(A) = A exp(−0.1 t)
• P(t) = A exp(−λ t)
What is a nonlinear kinetics?

For a given dose D, let C(t, D) denote the concentration at time t.

The kinetics is linear when the operator
P: D ↦ C(·, D)
is linear. When P is not linear, the kinetics is nonlinear.
What is a nonlinear kinetics?

Examples:

P(D) = C(t, D) = (D/V) exp(−(Cl/V) t)
Here C(t, D) is proportional to D: the kinetics is linear.

P(D) = C(t, D) obtained from a saturable (Michaelis-Menten type) elimination
Here C(t, D) is not proportional to D: the kinetics is nonlinear.
What is a nonlinear statistical model?

A statistical model:

Y = f(x1, x2, …, xq; θ1, θ2, …, θp) + ε

• Y: observation (dependent variable)
• x1, …, xq: covariates (independent variables)
• θ1, …, θp: parameters
• f: function
• ε: error (residual)
What is a nonlinear statistical model?

A statistical model is linear when the operator
P: (θ1, …, θp) ↦ f(x1, …, xq; θ1, …, θp)
is linear. When P is not linear, the model is nonlinear.
What is a nonlinear statistical model?

Example: the model

Y = θ1 + θ2 t + ε

(Y = concentration, t = time) is linear.
Examples

Among the statistical models below, which ones are nonlinear?

• Y = θ1 + θ2 t + ε
• Y = θ1 + θ2 t + θ3 t² + ε
• Y = θ1 exp(−θ2 t) + ε
• Y = θ1 exp(−θ2 t) + θ3 exp(−θ4 t) + ε
• Y = θ1 x / (θ3 + θ2 x²) + ε
• Y = θ1 exp(−0.1 t) + ε
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
How to fit the data?

Proceed in three main steps:

• Write a (statistical) model
• Choose a criterion
• Minimize the criterion
Write a (statistical) model

• Find a function of the covariate(s) to describe the mean variation of the dependent variable (mean model).

• Find a function of the covariate(s) to describe the dispersion of the dependent variable about the mean (variance model).
Example

[Figure: the concentration-time data on a log scale (0.1 to 1000) over times 0-300]

Y = θ1 exp(−θ2 t) + ε

ε is assumed Gaussian with a constant variance: homoscedastic model.
How to choose the criterion to optimize?

Homoscedasticity: Ordinary Least Squares (OLS).
Under normality, OLS is equivalent to maximum likelihood.

Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).
Homoscedastic models

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Ordinary Least-Squares criterion:

SS(θ1, …, θp) = Σ_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]²

and minimize it: SS(θ1, …, θp) → minimum.
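As a sketch, the OLS fit of the mono-exponential model Y = θ1 exp(−θ2 t) + ε can be written with SciPy; the data, seed, and starting values below are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data from Y = theta1*exp(-theta2*t) + eps with (112, 0.02)
rng = np.random.default_rng(0)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t) + rng.normal(0.0, 1.0, t.size)

def residuals(theta):
    # OLS minimizes SS(theta) = sum_i (Y_i - f(t_i; theta))^2
    return y - theta[0] * np.exp(-theta[1] * t)

fit = least_squares(residuals, x0=[50.0, 0.05])
theta_hat = fit.x
```

`least_squares` squares and sums the residual vector internally, so returning the raw residuals is enough to implement the SS criterion above.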
Heteroscedastic models: Weighted Least-Squares criterion

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Weighted Least-Squares criterion with weights w_i:

WSS(θ1, …, θp) = Σ_i w_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]²

and minimize it: WSS(θ1, …, θp) → minimum.
How to choose the weights?

When the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

is heteroscedastic (i.e. Var(ε_i) is not constant with i), it is possible to rewrite it as

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + g(x_{1,i}, …, x_{q,i}; θ1, …, θp) ε_i

where Var(ε_i) does not depend on i. The weights are then chosen as

w_i = 1 / g(x_{1,i}, …, x_{q,i}; θ1, …, θp)²
Example

Y_i = θ1 exp(−θ2 t_i) + ε_i   with CV(Y_i) = constant,
i.e. Var(ε_i) = σ² θ1² exp(−2 θ2 t_i)

The model can be rewritten as

Y_i = θ1 exp(−θ2 t_i) + θ1 exp(−θ2 t_i) ε_i   with Var(ε_i) = constant

The weights are chosen as

w_i = 1 / [θ1 exp(−θ2 t_i)]²
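A minimal WLS sketch for this constant-CV model, with hypothetical data; scaling each raw residual by √w_i = 1/f_i implements the weights w_i = 1/f_i²:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical constant-CV data: Y_i = theta1*exp(-theta2*t_i)*(1 + eps_i)
rng = np.random.default_rng(1)
t = np.linspace(0, 300, 20)
mean = 112.0 * np.exp(-0.02 * t)
y = mean * (1.0 + rng.normal(0.0, 0.05, t.size))

def weighted_residuals(theta):
    f = theta[0] * np.exp(-theta[1] * t)
    # w_i = 1/f_i^2, so each residual is scaled by sqrt(w_i) = 1/f_i
    return (y - f) / f

fit = least_squares(weighted_residuals, x0=[50.0, 0.05])
```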
Extended (Weighted) Least Squares

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + ε_i

define the Extended Least-Squares criterion with weights w_i:

EWSS(θ1, …, θp) = Σ_i { w_i [ Y_i − f(x_{1,i}, …, x_{q,i}; θ1, …, θp) ]² − ln w_i }

and minimize it: EWSS(θ1, …, θp) → minimum.
Balance sheet

Criterion | When                   | Advantages                                                    | Drawbacks
OLS       | Homoscedastic models   | Easy to use                                                   |
WLS       | Heteroscedastic models | Robust to variance misspecification                           | Estimator with large variance
ELS       | Heteroscedastic models | Unbiased estimate, small variance (efficient under normality) | Not robust to variance misspecification
The criterion properties

• It converges
• It leads to consistent (unbiased) estimates
• It leads to efficient estimates
• It has several minima
It converges

When the sample size increases, the criterion concentrates about a value of the parameter.

Example: consider the homoscedastic model

Y_i = θ1 exp(−θ2 t_i) + ε_i

The criterion to use is the Least-Squares criterion:

SS(θ1, θ2) = Σ_{i=1}^{n} [ Y_i − θ1 exp(−θ2 t_i) ]²
The criterion to use is the Least Squares criterion
25
It converges
, 21 SS
1
2
Small sample size
Large sample size
It leads to consistent estimates

[Figure: isocontours of SS(θ1, θ2) concentrating about the true value (θ1⁰, θ2⁰)]

The criterion concentrates about the true value.
It leads to efficient estimates

For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound): the "precision" of a consistent estimator is bounded.

An estimator is efficient when its variance equals this lower bound.
Geometric interpretation

[Figure: isocontours of the criterion in the (θ1, θ2) plane; this ellipsoid is a confidence region of the parameter]
It leads to efficient estimates

[Figure: isocontours of −2 ln(likelihood) and of another criterion about (θ1⁰, θ2⁰)]

For a given large n, no criterion giving consistent estimates is more "convex" than −2 ln(likelihood).
It has several minima

[Figure: criterion surface over the (θ1, θ2) plane with several local minima]
Minimize the criterion

Suppose that the criterion to optimize has been chosen. We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion.

We need an algorithm to minimize such a criterion.
Example

Consider the homoscedastic model

Y_i = θ1 exp(−θ2 t_i) + ε_i

We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion

SS(θ1, θ2) = Σ_i [ Y_i − θ1 exp(−θ2 t_i) ]²
Isocontours

[Figure: isocontours of SS(θ1, θ2) in the (θ1, θ2) plane]
Different families of algorithms

• Zero order algorithms: computation of the criterion only
• First order algorithms: computation of the first derivative of the criterion
• Second order algorithms: computation of the second derivative of the criterion
Zero order algorithms

• Simplex algorithm
• Grid search and Monte-Carlo methods
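A minimal Monte-Carlo (random search) sketch: a zero order method that only evaluates the criterion. The data and search bounds are hypothetical:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
rng = np.random.default_rng(3)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def ss(theta1, theta2):
    # the OLS criterion, the only quantity a zero order method needs
    return np.sum((y - theta1 * np.exp(-theta2 * t)) ** 2)

best_val, best_theta = np.inf, None
for _ in range(20000):
    cand = (rng.uniform(0.0, 200.0), rng.uniform(0.0, 0.1))
    val = ss(*cand)
    if val < best_val:
        best_val, best_theta = val, cand
```

Such methods are slow but need no derivatives, which is why they are mainly used to find a starting point for faster algorithms.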
Simplex algorithm

[Figure: simplex moves on the isocontours of the criterion in the (θ1, θ2) plane]

Monte-Carlo algorithm

[Figure: random criterion evaluations scattered over the (θ1, θ2) plane]
First order algorithms

• Line search algorithm
• Conjugate gradient
First order algorithms

The derivatives of the criterion cancel at its optima.

Suppose that there is only one parameter θ to estimate: the criterion (e.g. SS) depends only on θ. How to find the value(s) of θ where the derivative of the criterion cancels?
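A one-parameter sketch of this idea (hypothetical, noise-free data): with θ1 fixed, find the value of θ2 where dSS/dθ2 cancels, by bracketing the sign change and bisecting:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def dss(theta2, theta1=112.0):
    # derivative of SS(theta2) = sum (y - theta1*exp(-theta2*t))^2
    f = theta1 * np.exp(-theta2 * t)
    return np.sum(2.0 * (y - f) * f * t)

lo, hi = 0.001, 0.1            # bracket containing a sign change of dss
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if dss(lo) * dss(mid) <= 0.0:
        hi = mid               # the zero crossing is in [lo, mid]
    else:
        lo = mid               # the zero crossing is in [mid, hi]
theta2_hat = 0.5 * (lo + hi)
```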
Line search algorithm

[Figure: derivative of the criterion as a function of θ; successive points θ1, θ2 bracket the zero crossing of the derivative]
Second order algorithms

• Gauss-Newton
• Marquardt (combines Gauss-Newton with steepest-descent-like damping)
Second order algorithms

The derivatives of the criterion cancel at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest descent direction.
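A plain Gauss-Newton iteration for the mono-exponential model might look as follows; the data are hypothetical and noise-free, and the start is deliberately close to the optimum, where undamped Gauss-Newton converges:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

theta = np.array([105.0, 0.022])       # starting point near the optimum
for _ in range(50):
    f = theta[0] * np.exp(-theta[1] * t)
    r = y - f                          # residuals
    J = np.column_stack([
        np.exp(-theta[1] * t),                   # df/dtheta1
        -theta[0] * t * np.exp(-theta[1] * t),   # df/dtheta2
    ])
    # each step solves the linearized least-squares problem J'J step = J'r
    theta = theta + np.linalg.solve(J.T @ J, J.T @ r)
```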
Gauss-Newton (one dimension)

[Figure: derivative of the criterion vs θ; when the criterion is convex, the iterates θ1, θ2, θ3 converge to the zero of the derivative]

[Figure: when the criterion is not convex, the iterates θ1, θ2 can move away from the minimum]

Gauss-Newton

[Figure: Gauss-Newton steps on the isocontours of the criterion in the (θ1, θ2) plane]
Marquardt

[Figure: derivative of the criterion vs θ, with iterates θ1, θ2, θ3]

Allows one to deal with the case where the criterion is not convex: when the second derivative is < 0 (the first derivative decreases), it is set to a positive value.
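A Marquardt-style damping sketch on hypothetical data: when a step fails to decrease the criterion, the damping term λ added to J'J is increased, which keeps the (approximate) second derivative matrix positive definite:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def model(theta):
    return theta[0] * np.exp(-theta[1] * t)

theta = np.array([50.0, 0.05])
lam = 1e-3
for _ in range(200):
    r = y - model(theta)
    J = np.column_stack([np.exp(-theta[1] * t),
                         -theta[0] * t * np.exp(-theta[1] * t)])
    # damped normal equations: (J'J + lam*I) step = J'r
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    if np.sum((y - model(theta + step)) ** 2) < np.sum(r ** 2):
        theta, lam = theta + step, lam * 0.5   # accept step, relax damping
    else:
        lam *= 10.0                            # reject step, increase damping
```

With λ near 0 this behaves like Gauss-Newton; with large λ the step shrinks toward a steepest-descent direction, which is why the method tolerates a rougher starting point.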
Balance sheet

Order | Algorithm          | When                                               | Robustness | Speed
0     | Monte-Carlo        | To start the optimisation                          | +++        | 0
0     | Simplex            | To start the optimisation                          | ++         | +
1     | Conjugate gradient | When the second derivative is difficult to compute | +          | ++
1     | Line search        | When the second derivative is difficult to compute | ++         | +
2     | Gauss-Newton       | To finish the optimisation                         | 0          | +++
2     | Marquardt          | With a reasonable starting point                   | +          | ++
Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?
• For a given model, how to fit the data?
• Is this model relevant?
Is this model relevant?

• Graphical inspection of the residuals
  – mean model (f)
  – variance model (g)
• Inspection of numerical results
  – variance-correlation matrix of the estimator
  – Akaike criterion (AIC)
Graphical inspection of the residuals

For the model

Y_i = f(x_{1,i}, …, x_{q,i}; θ1, …, θp) + g(x_{1,i}, …, x_{q,i}; θ1, …, θp) ε_i

calculate the weighted residuals:

ε̂_i = [ Y_i − f(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p) ] / g(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p)

and draw ε̂_i versus the fitted values Ŷ_i = f(x_{1,i}, …, x_{q,i}; θ̂1, …, θ̂p).
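As a sketch, the weighted residuals for the constant-CV model can be computed as below; the data are hypothetical and θ̂ is pretended to come from a previous fit:

```python
import numpy as np

# Hypothetical constant-CV data and a pretend estimate theta_hat
rng = np.random.default_rng(4)
t = np.linspace(0, 300, 50)
theta_hat = (112.0, 0.02)
f_hat = theta_hat[0] * np.exp(-theta_hat[1] * t)    # fitted values
y = f_hat * (1.0 + rng.normal(0.0, 0.05, t.size))   # observations

g_hat = f_hat                   # variance model g = f (constant CV)
eps_hat = (y - f_hat) / g_hat   # weighted residuals

# a flat, structureless scatter of eps_hat against f_hat suggests
# that the mean model f and the variance model g are both adequate
```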
Check the mean model

Scatterplot of weighted residuals vs fitted values:

[Figure: two scatterplots of ε̂_i vs Ŷ_i about 0. Left: no structure in the residuals → OK. Right: structure in the residuals → change the mean model (f function)]
Check the variance model: homoscedasticity

Scatterplot of weighted residuals vs fitted values:

[Figure: two scatterplots of ε̂_i vs Ŷ_i about 0. Left: homoscedasticity → OK. Right: no structure in the residuals but heteroscedasticity → change the variance model (g function)]
Example

[Figure: the concentration-time data of the introductory example on a log scale]

Time   Conc
0      112.0
5      69.1
10     50.4
20     22.3
30     12.8
60     6.3
90     4.0
120    3.5
150    2.2
180    1.7
210    1.2
300    0.4

Homoscedastic model: Y_i = θ1 exp(−θ2 t_i) + ε_i
Criterion: OLS
Example

[Figure: weighted residuals ε̂_i vs t_i, ranging from −6 to 6 over times 0-300, showing a clear pattern]

Fitted homoscedastic model: Y_i = θ1 exp(−θ2 t_i) + ε_i

Structure in the residuals → change the mean model.

New model: Y_i = θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i) + ε_i
Example

[Figure: residuals ε̂_i vs t_i, ranging from −3 to 4 over times 0-300; their spread decreases with time]

Fitted model: Y_i = θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i) + ε_i

Heteroscedasticity → change the variance model.

New model: Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)   → needs WLS
Example

Fitted model: Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)

[Figure: weighted residuals vs t_i, ranging from −0.15 to 0.1 over times 0-300, with no visible pattern]

No structure, weighted residuals homoscedastic → OK.
Inspection of numerical results

Correlation matrix of the estimator:

    ( 1            r(θ̂1, θ̂2)   …   r(θ̂1, θ̂p) )
C = ( r(θ̂2, θ̂1)  1             …   r(θ̂2, θ̂p) )
    ( …                                          )
    ( r(θ̂p, θ̂1)  r(θ̂p, θ̂2)   …   1           )

Strong correlations between estimators may mean:
• the model is over-parametrized
• the parametrization is not good
• the model is not identifiable
The model is over-parametrized

Change the mean and/or variance model (f and/or g).

Example: the appropriate model is

Y_i = θ1 exp(−θ2 t_i) + ε_i

and you fitted

Y_i = [θ1 exp(−θ2 t_i) + θ3 exp(−θ4 t_i)] (1 + ε_i)

Perform a test or check the AIC.
The parametrization is not good

Change the parametrization of your model.

Example: you fitted

Y_i = θ1 exp(−θ2 t_i) + ε_i

try

Y_i = (D/V) exp(−(Cl/V) t_i) + ε_i

Two useful indices: the parametric curvature and the intrinsic curvature.
The model is not identifiable

The model has too many parameters compared to the number of data: there are lots of solutions to the optimisation.

[Figure: two criterion surfaces over the (θ1, θ2) plane with flat valleys of equivalent solutions]

Look at the eigenvalues λ of the correlation matrix: if λmax/λmin is too large and/or λmin is too small, simplify the model.
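A sketch of this eigenvalue check on a hypothetical correlation matrix in which θ2 and θ3 are almost confounded, so λmin is tiny and λmax/λmin is huge:

```python
import numpy as np

# Hypothetical correlation matrix of the estimator:
# theta2 and theta3 are almost perfectly correlated
C = np.array([[1.0, 0.0,   0.0],
              [0.0, 1.0,   0.999],
              [0.0, 0.999, 1.0]])
lam = np.linalg.eigvalsh(C)      # eigenvalues, sorted ascending
ratio = lam.max() / lam.min()    # condition number of C
```

A large `ratio` (here about 2000) signals a nearly flat direction in the criterion, i.e. near non-identifiability.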
The Akaike criterion (AIC)

The Akaike criterion allows one to select a model among several models in "competition". It is nothing else but a penalized log-likelihood: it chooses the model which is the most likely. The penalty is chosen such that the criterion is convergent: when the sample size increases, it selects the "true" model.

AIC = n ln(SS) + 2 p

n = sample size, SS = (weighted or ordinary) sum of squares, p = number of parameters that have been estimated.

The model with the smaller AIC is the best among the compared models.
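A sketch of this AIC formula, comparing a one- and a two-exponential mean model on hypothetical mono-exponential data:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical mono-exponential data
rng = np.random.default_rng(5)
t = np.linspace(0, 300, 15)
y = 112.0 * np.exp(-0.02 * t) + rng.normal(0.0, 1.0, t.size)

def aic(ss, n, p):
    # AIC = n*ln(SS) + 2p, as on the slide
    return n * np.log(ss) + 2 * p

fit1 = least_squares(lambda th: y - th[0] * np.exp(-th[1] * t),
                     x0=[100.0, 0.03])
fit2 = least_squares(lambda th: y - th[0] * np.exp(-th[1] * t)
                                  - th[2] * np.exp(-th[3] * t),
                     x0=[100.0, 0.03, 10.0, 0.001])
aic1 = aic(np.sum(fit1.fun ** 2), t.size, 2)
aic2 = aic(np.sum(fit2.fun ** 2), t.size, 4)
# the model with the smaller AIC is preferred
```

The extra exponential always lowers SS a little, but the 2p penalty usually more than cancels that gain when the simpler model is the true one.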
Example

Iteration   Loss      θ1        θ2       θ3        θ4
0           0.06979   100.665   0.0976   10.616    0.01019
1           0.04349   101.738   0.1048   12.4846   0.01148
2           0.03713   101.589   0.1037   12.1725   0.01121
3           0.03707   101.596   0.1035   12.1057   0.01117
4           0.03707   101.595   0.1035   12.1051   0.01117
5           0.03707   101.595   0.1035   12.1051   0.01117

Final value of the loss function: 0.037
Example

Parameter   Estimate   A.S.E.     Param/ASE
Theta1      101.595    6.104      16.645
Theta2      0.104      0.006      17.366
Theta3      12.105     0.784      15.431
Theta4      0.011      3.66E-04   30.067

        θ1       θ2      θ3      θ4
R =   ( 1                             )
      ( 0.632    1                    )
      ( 0.006    0.449   1            )
      ( −0.003   0.393   0.916   1    )

The strong correlation between θ̂3 and θ̂4 is essentially intrinsic curvature.
About the ellipsoid

It is linked to the convexity of the criterion and to the variance of the estimator: the convexity of the criterion is linked to the variance of the estimator.
Different degrees of convexity

[Figure: criterion surfaces illustrating a flat, weakly convex criterion; a convex criterion; a locally convex criterion; and a criterion convex in some directions only]
How to measure convexity?

One parameter: when the second derivative is positive, the criterion is convex at the point where the second derivative is evaluated.

Several parameters: calculate the Hessian matrix, the matrix of partial second derivatives:

H(θ1, θ2) = ( ∂²SS/∂θ1²(θ1, θ2)     ∂²SS/∂θ1∂θ2(θ1, θ2) )
            ( ∂²SS/∂θ2∂θ1(θ1, θ2)   ∂²SS/∂θ2²(θ1, θ2)   )
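As a sketch, the Hessian of the OLS criterion can be approximated numerically and its eigenvalues inspected; the data and evaluation point below are hypothetical:

```python
import numpy as np

# Hypothetical noise-free data from Y = 112*exp(-0.02*t)
t = np.linspace(0, 300, 12)
y = 112.0 * np.exp(-0.02 * t)

def ss(theta):
    return np.sum((y - theta[0] * np.exp(-theta[1] * t)) ** 2)

def hessian(fun, x, h=1e-4):
    # central finite differences for the matrix of second derivatives
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * h
            ej = np.eye(n)[j] * h
            H[i, j] = (fun(x + ei + ej) - fun(x + ei - ej)
                       - fun(x - ei + ej) + fun(x - ei - ej)) / (4.0 * h * h)
    return H

H = hessian(ss, [112.0, 0.02])
eigvals = np.linalg.eigvalsh(H)   # all > 0: locally convex at this point
```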
How to measure convexity?

It is possible to find a linear transformation of the parameters such that the Hessian matrix is

H(θ1, θ2) = ( λ1(θ1, θ2)   0          )
            ( 0            λ2(θ1, θ2) )

λ1(θ1, θ2) and λ2(θ1, θ2) are the eigenvalues of the Hessian matrix.

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for all (θ1, θ2), the criterion is convex.
How to measure convexity?

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for some (θ1, θ2), the criterion is locally convex. (For which points (θ1, θ2) are λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0?)

When λ1(θ1, θ2) and λ2(θ1, θ2) are low (but > 0), the criterion is flat.
The variance-covariance matrix

The variance-covariance matrix of the estimator (denoted V)

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)      )

is proportional to H⁻¹(θ1⁰, θ2⁰).

It is possible to find a linear transformation of the parameters such that V is

V = ( 1/λ1(θ1⁰, θ2⁰)   0               )
    ( 0                1/λ2(θ1⁰, θ2⁰)  )
The variance-covariance matrix

1/λ1(θ1⁰, θ2⁰) and 1/λ2(θ1⁰, θ2⁰) are the eigenvalues of the variance-covariance matrix V.
The correlation matrix

The correlation matrix of the estimator (denoted C) is obtained from V:

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)      )

C = ( 1   r )   with r = Cov(θ̂1, θ̂2) / √( Var(θ̂1) Var(θ̂2) )
    ( r   1 )
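A sketch of this conversion on a hypothetical variance-covariance matrix V, using C_ij = V_ij / √(V_ii V_jj):

```python
import numpy as np

# Hypothetical variance-covariance matrix of (theta1_hat, theta2_hat)
V = np.array([[4.00, 0.60],
              [0.60, 0.25]])
sd = np.sqrt(np.diag(V))     # standard errors of the estimators
C = V / np.outer(sd, sd)     # correlation matrix: r = 0.60/(2.0*0.5) = 0.6
```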
Geometric interpretation

[Figure: criterion surface and its confidence ellipsoid in the (θ1, θ2) plane; the half-axes correspond to 1/λ1(θ1⁰, θ2⁰) and 1/λ2(θ1⁰, θ2⁰), i.e. to Var(θ̂1) and Var(θ̂2)]

When r = 0, the axes of the ellipsoid are parallel to the parameter axes.