Stephane Heritier ASC 2010 Fremantle - 8/10/2010 – 1 / 39
Robust methods in biostatistics
Stephane Heritier
The George Institute for Global Health, The University of Sydney
Outline
- Motivating examples
- Robustness background
- GLM setting
- Robust analysis in GLMs
- Example 1: Breastfeeding data
- Extensions to GEE
- Example 2: GUIDE data
- Conclusions
Book: Robust Methods in Biostatistics by Heritier, Cantoni, Copt and Victoria-Feser (2009), Wiley.
Example 1
Breastfeeding dataset: UK study on the decision of pregnant women to breastfeed. 135 expectant mothers were asked about their feeding choice.
OUTCOME: breast = 1 if breastfeeding (including tried, or mixed breast- and bottle-feeding), = 0 if exclusive bottle-feeding.
COVARIATES: pregnancy advancement (pregnancy, end or beginning), how the mothers were fed as babies (howfed, some breastfeeding or only bottle-feeding), how the mother's friends fed their babies (howfedfriends, some breastfeeding or only bottle-feeding), whether they had a partner (partner, no/yes), age (age), age at which they left full-time education (educat), ethnic group (ethnic, white/non-white), whether they ever smoked (smokebf, no/yes) and whether they stopped smoking (smokenow, no/yes). The reference category is the first level (coded 0).
Example 2
GUIDE dataset (Guidelines for Urinary Incontinence Discussion and Evaluation).
OUTCOME: bothered: 1 for “yes”, 0 for “no”.
COVARIATES: gender (indicator for women: female), age (minus 76, divided by 10: age), average number of leaking accidents per day (dayacc), degree of the leak (severe: 1 = just creates some moisture, 2 = wets their underwear (or pad), 3 = trickles down their thigh, 4 = wets the floor), daily number of visits to the toilet to urinate (toilet).
137 patients divided into 38 clusters (= practices) of variable size (1 to 8), i.e. we have correlated data.
Robustness background
Robustness paradigm: the model is regarded as a working model, i.e. a mathematical abstraction. The data may deviate (slightly) from it!
Procedures (estimators, tests) still have to be stable in a neighbourhood of the assumed model. Classical procedures do not have this stability property in general.
Theoretical development was driven by the discovery of appropriate tools (M-estimators, influence function, breakdown point) and by the increase in computing power.
Theory: Heritier and Ronchetti, 1994 (robust testing in general parametric models), Cantoni and Ronchetti, 2001 (GLMs), Cantoni, 2004 (GEE), etc.
Model: $F_\beta$ (e.g. linear regression, GLM, etc.)
Neighbourhood: $(1 - \epsilon) F_\beta + \epsilon G$
MLE: solves for $\beta$ the estimating equation
\[ \sum_{i=1}^{n} s(y_i, x_i; \beta) = 0, \]
where $s$ is the score function.
M-estimator: solves for $\beta$ the estimating equation
\[ \sum_{i=1}^{n} \psi(y_i, x_i; \beta) = 0, \]
where $\psi$ is a score-like function.
Good properties:
- asymptotically normal, with asymptotic variance given by a sandwich formula (Huber, 1967);
- influence function proportional to $\psi$;
- bias in a neighbourhood is controlled iff $\psi$ is bounded.
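The role of a bounded $\psi$ can be illustrated in the simplest setting, a location model. The sketch below is not taken from the talk: it assumes Huber's $\psi$ with a scale fixed at the MAD, and the helper names `huber_psi` and `m_location` are hypothetical. It compares how far one gross outlier moves the mean (unbounded $\psi$) and the Huber M-estimate (bounded $\psi$).

```r
# Huber psi: identity in the middle, clipped at +/- c (hence bounded)
huber_psi <- function(r, c = 1.345) pmax(-c, pmin(c, r))

# Location M-estimator: solve sum(psi((y - theta) / s)) = 0 in theta.
# Assumption for simplicity: the scale s is fixed at the MAD.
m_location <- function(y, c = 1.345) {
  s <- mad(y)
  f <- function(theta) sum(huber_psi((y - theta) / s, c))
  uniroot(f, interval = range(y), tol = 1e-10)$root
}

set.seed(1)
clean <- rnorm(50)        # data from the assumed model F
dirty <- c(clean, 50)     # plus one gross outlier, i.e. a point from G

shift_mean  <- abs(mean(dirty) - mean(clean))
shift_huber <- abs(m_location(dirty) - m_location(clean))
c(shift_mean = shift_mean, shift_huber = shift_huber)
```

The mean moves by roughly the outlier value divided by the sample size, while the Huber estimate moves only marginally, since the outlier can contribute at most $c$ to the estimating equation.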
Questions: how do we bound ψ?
In the GLM setting?
Extension to GEE?
How about testing?
GLM setting
GLM ingredients
The random component: $y_1, \dots, y_n$ from the exponential family. We denote $\mu_i = E(y_i)$ and $Var(y_i) = \phi\, v(\mu_i)$.
The systematic component: a parameter vector $\beta^T = (\beta_0, \beta_1, \dots, \beta_q)$ and explanatory variables $x_i^T = (1, x_{i1}, \dots, x_{iq})$ to construct the linear predictor $\eta_i = x_i^T \beta$. Linearity is intended w.r.t. the parameters.
The link: a monotone and differentiable link function $g$ which links the random and the systematic components of the model:
\[ g(\mu_i) = \eta_i = x_i^T \beta. \]
Examples
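✔ Logistic regression: $Y_i \sim \text{Bernoulli}(p_i)$, $E(Y_i) = p_i$,
\[ \log\left(\frac{p_i}{1 - p_i}\right) = \text{logit}(p_i) = x_i^T \beta. \]
✔ Poisson regression: $Y_i \sim P(\lambda_i)$, $E(Y_i) = \lambda_i$,
\[ \log(\lambda_i) = x_i^T \beta. \]
✔ Gamma regression: $Y_i \sim \Gamma(\mu_i, \nu)$, $E(Y_i) = \mu_i$,
\[ \log(\mu_i) = x_i^T \beta. \]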
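The three models above can be fitted classically with base R's `glm()`. The following is a minimal sketch on simulated data; the coefficients 0.5 and 1.0, the sample size and the Gamma shape are arbitrary assumptions made for illustration only.

```r
set.seed(42)
n   <- 200
x   <- rnorm(n)
eta <- 0.5 + 1.0 * x                     # linear predictor (hypothetical coefficients)

y_bin  <- rbinom(n, 1, plogis(eta))      # Bernoulli response, logit link
y_pois <- rpois(n, exp(eta))             # Poisson response, log link
y_gam  <- rgamma(n, shape = 2, rate = 2 / exp(eta))  # Gamma response with mean exp(eta)

fit_logit <- glm(y_bin  ~ x, family = binomial(link = "logit"))
fit_pois  <- glm(y_pois ~ x, family = poisson(link = "log"))
fit_gamma <- glm(y_gam  ~ x, family = Gamma(link = "log"))

# Estimated (intercept, slope) for each model
round(rbind(logistic = coef(fit_logit),
            poisson  = coef(fit_pois),
            gamma    = coef(fit_gamma)), 2)
```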
Classical and robust estimation in GLMs
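Assume $\phi = 1$ for simplicity. Classical GLM estimating equations (maximum quasi-likelihood, which equals maximum likelihood if the link is canonical):
\[ \sum_{i=1}^{n} \frac{y_i - \mu_i}{v(\mu_i)}\, \mu_i' = \sum_{i=1}^{n} \frac{r_i}{v^{1/2}(\mu_i)}\, \mu_i' = 0, \]
where $r_i = (y_i - \mu_i)/v^{1/2}(\mu_i)$ and $\mu_i' = \partial\mu_i/\partial\beta$.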
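The robust GLM estimator (Cantoni & Ronchetti, 2001) solves
\[ \sum_{i=1}^{n} \left[ \psi(r_i)\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i' - a(\beta) \right] = 0, \]
where $r_i = (y_i - \mu_i)/v^{1/2}(\mu_i)$, $\mu_i' = \partial\mu_i/\partial\beta$ and
\[ a(\beta) = \frac{1}{n} \sum_{i=1}^{n} E[\psi(r_i)]\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i'. \]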
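The robust estimating equation of Cantoni & Ronchetti (2001) can be rewritten with residual weights $\tilde{w}(r_i) = \psi(r_i)/r_i$:
\[ \sum_{i=1}^{n} \left[ r_i\, \tilde{w}(r_i)\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i' - a(\beta) \right] = 0, \]
where $r_i = (y_i - \mu_i)/v^{1/2}(\mu_i)$, $\mu_i' = \partial\mu_i/\partial\beta$ and
\[ a(\beta) = \frac{1}{n} \sum_{i=1}^{n} E[\psi(r_i)]\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i'. \]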
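To make the robust estimating equation concrete, here is a hedged sketch of its left-hand side for a Bernoulli response with logit link, where $\mu_i' = v(\mu_i)\, x_i$ and $E[\psi(r_i)]$ has a closed form because $y_i$ takes only two values. The function names are hypothetical and the data are simulated; this is not the implementation used in `robustbase`.

```r
huber_psi <- function(r, c = 1.5) pmax(-c, pmin(c, r))

# Left-hand side of the robust estimating equation (Bernoulli, logit link).
# Summing psi(r_i)*w*mu'/v^(1/2) - a(beta) over i equals the sum of
# (psi(r_i) - E[psi(r_i)]) * w(x_i) * v^(1/2)(mu_i) * x_i.
robust_score <- function(beta, X, y, c = 1.5, wx = rep(1, nrow(X))) {
  mu <- plogis(drop(X %*% beta))
  s  <- sqrt(mu * (1 - mu))                 # v^(1/2)(mu_i)
  r  <- (y - mu) / s                        # Pearson residuals
  # Expectation over y = 1 (prob. mu) and y = 0 (prob. 1 - mu)
  e_psi <- mu * huber_psi((1 - mu) / s, c) + (1 - mu) * huber_psi(-mu / s, c)
  colSums((huber_psi(r, c) - e_psi) * wx * s * X)
}

set.seed(7)
n <- 150
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1))))   # hypothetical true coefficients
beta_ml <- coef(glm(y ~ X[, 2], family = binomial))

score_classical <- robust_score(beta_ml, X, y, c = 1e6)  # psi = identity: classical score
score_huber     <- robust_score(beta_ml, X, y, c = 1.5)  # bounded psi
rbind(score_classical, score_huber)
```

With a very large tuning constant, $\psi$ is the identity and the function reduces to the classical score, which is approximately zero at the maximum likelihood estimate. With $c = 1.5$ the value at the MLE is generally non-zero, so the robust estimate solves a different equation.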
Huber’s ψc function and weights
[Figure: (a) Huber's ψc function and (b) the associated weights, both plotted against t from −4 to 4.]
⋆ Huber function $\psi(r; c)$:
\[ \psi(r; c) = \begin{cases} r & |r| < c \\ c \cdot \text{sign}(r) & \text{otherwise} \end{cases} \]
⋆ Weights on the design: either based on the hat matrix (e.g. $w(x_i) = \sqrt{1 - h_{ii}}$) or on the Mahalanobis distance, e.g.
\[ w(x_i) = 1 \Big/ \sqrt{1 + 8 \max\left(0, \frac{d_i^2 - q}{\sqrt{2q}}\right)}, \]
with robust estimates of the center and the scatter in $d_i$ (MCD or MVE).
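The Huber function, the residual weights $\psi(r; c)/r$ and the hat-matrix design weights can be sketched in a few lines of base R. The function names and the toy design matrix are assumptions for illustration; the Mahalanobis-based variant is omitted because it needs a robust scatter estimate (MCD or MVE).

```r
# Huber psi and the corresponding residual weights psi(r; c) / r
huber_psi    <- function(r, c = 1.345) ifelse(abs(r) < c, r, c * sign(r))
huber_weight <- function(r, c = 1.345) ifelse(abs(r) < c, 1, c / abs(r))

huber_psi(c(-3, 0.5, 3), c = 1.5)      # clipped at -1.5 and 1.5
huber_weight(c(-3, 0.5, 3), c = 1.5)   # 0.5, 1, 0.5

# Design weights based on the hat matrix: w(x_i) = sqrt(1 - h_ii)
set.seed(3)
X <- cbind(1, c(rnorm(19), 8))                    # toy design; last row is a leverage point
h <- diag(X %*% solve(crossprod(X)) %*% t(X))     # diagonal of the hat matrix
w_x <- sqrt(1 - h)
round(w_x, 2)
```

The leverage point in the last row gets the smallest design weight, while points near the bulk of the design keep weights close to 1.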
Properties of the robust GLM estimator
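It holds that
\[ \sqrt{n}(\hat{\beta} - \beta) \sim N\left(0, M^{-1}(\psi, F)\, Q(\psi, F)\, M^{-1}(\psi, F)\right) \quad \text{as } n \to \infty, \]
where $M(\psi, F) = \frac{1}{n} X^T B X$ and $Q(\psi, F) = \frac{1}{n} X^T A X - a(\beta) a(\beta)^T$; $A$ and $B$ are diagonal matrices involving tedious but tractable expectations.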
Moreover, the influence function is proportional to the robust score, which implies bounded bias in the neighbourhood (i.e. the corresponding estimator is locally robust).
Testing - robust deviances
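GLM: compare a model $M_p$ ($\beta = (\beta_1, \dots, \beta_p)$) to a nested model $M_{p-q}$ ($\beta = (\beta_1, \dots, \beta_{p-q}, 0, \dots, 0)$) with the difference of deviances
\[ \Delta D = 2\left[ l(\hat{\beta}, y) - l(\dot{\beta}, y) \right] \sim \chi^2_q \quad \text{under } H_0 \text{ as } n \to \infty. \]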
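The classical difference-of-deviances test can be reproduced by hand and checked against `anova()`. The sketch below uses simulated Poisson data in which the second covariate has no effect; all numbers are assumptions for illustration.

```r
set.seed(11)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rpois(n, exp(0.3 + 0.5 * x1))     # x2 plays no role in the simulated model

full    <- glm(y ~ x1 + x2, family = poisson)   # model M_p
reduced <- glm(y ~ x1,      family = poisson)   # nested model M_{p-q}

# Difference of deviances = 2 * (log-likelihood difference) when phi = 1
delta_D <- deviance(reduced) - deviance(full)
q       <- df.residual(reduced) - df.residual(full)
p_value <- pchisq(delta_D, df = q, lower.tail = FALSE)
c(delta_D = delta_D, q = q, p_value = p_value)

anova(reduced, full, test = "Chisq")    # same test via the built-in method
```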
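The robust counterpart is
\[ \Lambda_{QM} = 2\left[ \sum_{i=1}^{n} Q_M(y_i, \hat{\mu}_i) - \sum_{i=1}^{n} Q_M(y_i, \dot{\mu}_i) \right] \sim \sum_{i=1}^{q} d_i\, \chi^2_{1,i} \quad \text{under } H_0 \text{ as } n \to \infty, \]
where $\hat{\mu}_i$ and $\dot{\mu}_i$ are obtained under models $M_p$ and $M_{p-q}$ respectively, and the $d_i$ are positive eigenvalues of a matrix depending only on $Q$ and the (generalized) inverse(s) of $M$ in the full (reduced) models. $Q_M(y_i, \mu_i)$ is a robust version of Nelder's quasi-deviance function
\[ Q(y_i, \mu_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{V(t)}\, dt. \]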
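Because the limiting distribution is a weighted sum of independent $\chi^2_1$ variables rather than a single $\chi^2_q$, p-values are not available from `pchisq()` directly. One simple option, sketched here as an assumption and not as the method used in the talk or in `robustbase`, is Monte Carlo simulation. The eigenvalues `d_hyp` and the statistic 4.2 are hypothetical.

```r
# Monte Carlo p-value for a statistic whose null distribution is
# sum_i d_i * chi^2_1 (independent chi-square terms with 1 df each)
weighted_chisq_pvalue <- function(stat, d, nsim = 1e5) {
  draws <- matrix(rchisq(nsim * length(d), df = 1), ncol = length(d))
  sims  <- drop(draws %*% d)
  mean(sims >= stat)
}

set.seed(2010)
d_hyp <- c(1.3, 0.8)                       # hypothetical eigenvalues d_1, d_2
p_weighted <- weighted_chisq_pvalue(4.2, d_hyp)

# Sanity check: with a single eigenvalue equal to 1 this is a chi^2_1 test
p_mc    <- weighted_chisq_pvalue(3.84, 1)
p_exact <- pchisq(3.84, df = 1, lower.tail = FALSE)
c(p_weighted = p_weighted, p_mc = p_mc, p_exact = p_exact)
```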
Breastfeeding example
Model
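$\text{breast}_i \sim \text{Bernoulli}(\mu_i)$, so that $E(\text{breast}_i) = \mu_i$ and $Var(\text{breast}_i) = \mu_i(1 - \mu_i)$ (Binomial family).
Use the logit link:
\[ \begin{aligned} \text{logit}(E(\text{breast})) = \text{logit}(P(\text{breast})) = {} & \beta_0 + \beta_1\,\text{pregnancy} + \beta_2\,\text{howfed} + \beta_3\,\text{howfedfr} \\ & + \beta_4\,\text{partner} + \beta_5\,\text{age} + \beta_6\,\text{educat} + \beta_7\,\text{ethnic} \\ & + \beta_8\,\text{smokenow} + \beta_9\,\text{smokebf}, \end{aligned} \]
where $\text{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right)$, with $\mu_i/(1 - \mu_i)$ being the odds of a success, and $\mu_i = P(\text{breast})$ is the probability of at least trying to breastfeed.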
Classical fit
Variable              Coeff.     St. error  z-value  p-value
Intercept             -4.50386   2.43357    -1.851   0.06421
pregnancy beginning   -0.98115   0.57740    -1.699   0.08927
howfed breast          0.30804   0.59119     0.521   0.60233
howfedfr breast        1.49555   0.58784     2.544   0.01095
partner yes            1.08438   0.70281     1.543   0.12285
age                    0.02681   0.05151     0.520   0.60279
educat                 0.17400   0.12703     1.370   0.17077
ethnic non-white       1.95507   0.75601     2.586   0.00971
smokenow yes          -3.31232   1.01311    -3.269   0.00108
smokebf yes            1.74417   1.00626     1.733   0.08304
Tab. 1: Parameter estimates and standard errors.
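The z-values and p-values in Tab. 1 follow from the coefficients and standard errors as Wald statistics, and exponentiating a coefficient gives the corresponding odds ratio. The sketch below re-enters the table by hand (the data frame `tab1` and its column names are hypothetical) and recomputes these quantities.

```r
tab1 <- data.frame(
  term  = c("Intercept", "pregnancy beginning", "howfed breast", "howfedfr breast",
            "partner yes", "age", "educat", "ethnic non-white",
            "smokenow yes", "smokebf yes"),
  coef  = c(-4.50386, -0.98115, 0.30804, 1.49555, 1.08438,
            0.02681, 0.17400, 1.95507, -3.31232, 1.74417),
  se    = c(2.43357, 0.57740, 0.59119, 0.58784, 0.70281,
            0.05151, 0.12703, 0.75601, 1.01311, 1.00626),
  z_rep = c(-1.851, -1.699, 0.521, 2.544, 1.543,
            0.520, 1.370, 2.586, -3.269, 1.733),
  p_rep = c(0.06421, 0.08927, 0.60233, 0.01095, 0.12285,
            0.60279, 0.17077, 0.00971, 0.00108, 0.08304)
)

tab1$z <- tab1$coef / tab1$se             # Wald statistic
tab1$p <- 2 * pnorm(-abs(tab1$z))         # two-sided normal p-value
tab1$odds_ratio <- exp(tab1$coef)         # multiplicative effect on the odds

# Recomputed values next to the reported ones
print(transform(tab1[, c("term", "z", "z_rep", "p", "p_rep", "odds_ratio")],
                z = round(z, 3), p = round(p, 5), odds_ratio = round(odds_ratio, 2)))
```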
Robust estimation - R code
library("robustbase")

# Model fitting: Huber's estimator
breast.glmrob <- glmrob(breast ~ howfedfr + ethnic + educat + age + pregnancy +
                          howfed + partner + smokenow + smokebf,
                        family = binomial,
                        control = glmrobMqle.control(tcc = 1.5),
                        data = breastfeeding)
summary(breast.glmrob)

# Tuning constant c of the Huber function: trade-off between efficiency and robustness.
# Default c = 1.345 gives 95% efficiency at the normal model.
# For GLMs, choose a slightly higher value (e.g. c = 1.5).

# Model fitting: Mallows' estimator (adds weights on the design via the hat matrix)
breast.glmrobWx <- glmrob(breast ~ howfedfr + ethnic + educat + age + pregnancy +
                            howfed + partner + smokenow + smokebf,
                          family = binomial,
                          control = glmrobMqle.control(tcc = 1.5),
                          weights.on.x = "hat",
                          data = breastfeeding)
summary(breast.glmrobWx)
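The 95% efficiency quoted for c = 1.345 refers to the Huber location estimator at the normal model, where the asymptotic efficiency relative to the mean is $(E[\psi'(Z)])^2 / E[\psi(Z)^2]$ for $Z \sim N(0, 1)$. The sketch below evaluates this closed form; the function name is hypothetical and the formula assumes a known unit scale.

```r
# Asymptotic efficiency of the Huber location M-estimator at N(0, 1),
# relative to the sample mean, as a function of the tuning constant c
huber_efficiency <- function(c) {
  p_in   <- 2 * pnorm(c) - 1                                  # E[psi'(Z)] = P(|Z| < c)
  e_psi2 <- p_in - 2 * c * dnorm(c) + 2 * c^2 * pnorm(-c)     # E[psi(Z)^2]
  p_in^2 / e_psi2
}

eff <- sapply(c(1.345, 1.5, 2), huber_efficiency)
names(eff) <- c("c=1.345", "c=1.5", "c=2")
round(eff, 3)
```

Larger values of c move the estimator towards the classical one: efficiency at the model increases, at the price of a larger bound on the influence of outlying residuals.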
Output - R code
Call: glmrob(formula = breast ~ howfedfr + ethnic + educat + age + pregnancy + howfed + partner + smokenow + smokebf, family = binomial, data = breast, control = glmrobMqle.control(tcc = 1.5))

Coefficients:
                    Estimate Std. Error z-value Pr(>|z|)
(Intercept)         -7.78236    3.36541  -2.312  0.02075 *
howfedfrBreast       1.47905    0.69010   2.143  0.03209 *
ethnicNon-white      2.71169    1.12482   2.411  0.01592 *
educat               0.37661    0.18551   2.030  0.04234 *
age                  0.03034    0.05959   0.509  0.61066
pregnancyBeginning  -0.81551    0.69536  -1.173  0.24088
howfedBreast         0.54496    0.70962   0.768  0.44251
partnerPartner       0.77184    0.81621   0.946  0.34433
smokenowYes         -3.47631    1.12924  -3.078  0.00208 **
smokebfYes           1.50738    1.10305   1.367  0.17176
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Robustness weights w.r * w.x:
 127 weights are ~= 1; the remaining 8 ones:
      3      11      14      53      63      75      90     115
0.74233 0.22474 0.41645 0.80182 0.51590 0.04944 0.20662 0.25762

Number of observations: 135
Fitted by method Mqle (in 12 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available
Algorithmic parameters:
   acc    tcc
0.0001 1.5000
maxit
   50
test.acc
  "coef"
Robust fit(s)
                      Huber, c = 1.5                 Mallows, c = 1.5
Variable              Coeff.   St. error  z-value    Coeff.   St. error  z-value
Intercept             -7.782   3.365      -2.312     -7.778   3.363      -2.313
pregnancy beginning   -0.816   0.695      -1.173     -0.815   0.694      -1.173
howfed breast          0.545   0.710       0.768      0.540   0.708       0.763
howfedfr breast        1.479   0.690       2.143      1.482   0.689       2.150
partner yes            0.772   0.816       0.946      0.775   0.816       0.950
age                    0.030   0.060       0.509      0.031   0.060       0.513
educat                 0.377   0.186       2.030      0.376   0.185       2.028
ethnic non-white       2.712   1.125       2.411      2.705   1.122       2.411
smokenow yes          -3.476   1.129      -3.078     -3.468   1.127      -3.078
smokebf yes            1.507   1.103       1.367      1.507   1.102       1.368
Tab. 2: Parameter estimates and standard errors.
# plot of the weights w.r
par(mfrow = c(2, 1))
plot(breast.glmrobWx$w.r, ylab = "w(r)")
identify(breast.glmrobWx$w.r)
# left-click on observations to identify them, right-click to end

# plot of the weights w.x
plot(breast.glmrobWx$w.x, ylab = "w(x)")
identify(breast.glmrobWx$w.x)

# plot of various (robust) Pearson residuals
age <- breastfeed[, 8]
plot(breast.glmrobWx$residuals, ylab = "Pearson residuals")
plot(breast.glmrobWx$linear.predictors, breast.glmrobWx$residuals,
     xlab = "Predictors", ylab = "Pearson residuals")
Robustness weights
[Figure: robustness weights plotted against observation index. Top panel: w(x), with observation 18 labelled. Bottom panel: w(r), with observations 3, 11, 14, 53, 63, 75, 90 and 115 labelled.]
Explanation
Observations 75, 11, 115 and 14 had a high fitted probability of breastfeeding (p ≥ 0.90) but bottle-fed.
Observations 63 and 90 had a low fitted probability of breastfeeding (p ≤ 0.11) but breastfed.
These observations have large negative (resp. positive) Pearson residuals in the robust analysis.
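A short calculation shows why such observations are downweighted. Taking the boundary probabilities quoted above (0.90 and 0.11) as illustrative inputs, the sketch computes the Pearson residual of a Bernoulli observation and the Huber residual weight for c = 1.5. The helper names are hypothetical, and the weights reported by `glmrob` also include the design weights, so they need not match these values.

```r
pearson_resid <- function(y, p) (y - p) / sqrt(p * (1 - p))
huber_weight  <- function(r, c = 1.5) pmin(1, c / abs(r))

r_bottle <- pearson_resid(y = 0, p = 0.90)   # predicted to breastfeed, but bottle-fed
r_breast <- pearson_resid(y = 1, p = 0.11)   # predicted to bottle-feed, but breastfed

data.frame(case     = c("p = 0.90, bottle-fed", "p = 0.11, breastfed"),
           residual = c(r_bottle, r_breast),
           weight   = huber_weight(c(r_bottle, r_breast)))
```

Fitted probabilities more extreme than these boundary values give larger absolute residuals and therefore smaller weights.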
Residuals
[Figure: Pearson residuals plotted against observation index, in two panels: robust analysis and classical analysis.]
Manual elimination - R code
Example: omit the variable pregnancy from the full robust fit.

# full model
breast.glmrobWx <- glmrob(breast ~ pregnancy + howfedfr + ethnic + educat + age +
                            howfed + partner + smokenow + smokebf,
                          family = binomial, weights.on.x = "hat",
                          control = glmrobMqle.control(tcc = 1.5), data = breast)
# reduced model without pregnancy
step1pregnancy <- glmrob(breast ~ howfedfr + ethnic + educat + age +
                           howfed + partner + smokenow + smokebf,
                         family = binomial, weights.on.x = "hat",
                         control = glmrobMqle.control(tcc = 1.5), data = breast)
# robust quasi-deviance test
anova(breast.glmrobWx, step1pregnancy, test = "QD")

Robust Quasi-Deviance Table
Model 1: breast ~ howfedfr + ethnic + educat + age + grp + howfed + partner + smokenow + smokebf
Model 2: breast ~ howfedfr + ethnic + educat + age + howfed + partner + smokenow + smokebf
  pseudoDf Test.Stat Df Pr(>chisq)
1      125
2      126   -1.5993 -1      0.206
Extension to GEE
GEE setting
Outcome $Y_{it}$ for subject $i$ at time $t$ ($i = 1, \dots, K$ and $t = 1, \dots, n_i$). Subjects are independent.
Set of covariates $x_{it}$.
Model $E(Y_{it}) = \mu_{it}$ via $g(\mu_{it}) = x_{it}^T \beta$ for a known link function $g$.
Assume $Var(Y_i) = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \text{diag}(v(\mu_{i1}), \dots, v(\mu_{in_i}))$, $v(\mu_{it}) = var(Y_{it})$, and $R_i(\alpha)$ is a working correlation matrix.
If there is one observation per cluster, this reduces to a GLM (no correlation).
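The working covariance $\phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$ is easy to build explicitly. The sketch below assumes an exchangeable working correlation and the binomial variance function $v(\mu) = \mu(1 - \mu)$; the function name and the numerical values are hypothetical.

```r
# Working covariance of one cluster: phi * A^(1/2) %*% R(alpha) %*% A^(1/2)
working_cov <- function(mu, alpha, phi = 1, v = function(m) m * (1 - m)) {
  n_i <- length(mu)
  R <- matrix(alpha, n_i, n_i)           # exchangeable working correlation
  diag(R) <- 1
  A_half <- diag(sqrt(v(mu)), nrow = n_i)
  phi * A_half %*% R %*% A_half
}

mu_i <- c(0.2, 0.5, 0.7)                 # fitted means for a cluster of size 3
V_i  <- working_cov(mu_i, alpha = 0.1)
round(V_i, 4)

V_single <- working_cov(0.3, alpha = 0.1)   # cluster of size 1: variance only
```

For a cluster of size one the matrix collapses to the scalar $\phi\, v(\mu)$, which is the GLM case mentioned above.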
Robust estimation for GEE
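Generalize the GEE equations (Cantoni, 2004, CJS). The classical equations are
\[ \sum_{i=1}^{K} D_i^T V_i^{-1} (Y_i - \mu_i) = 0, \]
where $D_i = \partial\mu_i/\partial\beta$ and $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$.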
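The robust version of the GEE equations (Cantoni, 2004, CJS) is
\[ \sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1} \left( W_i (Y_i - \mu_i) - c_i \right) = 0, \]
where $D_i = \partial\mu_i/\partial\beta$ and $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$. Moreover, $c_i = E(W_i (Y_i - \mu_i))$. Finally, $\Gamma_i = E\left( \partial\{W_i (Y_i - \mu_i)\}/\partial\mu_i - \partial c_i/\partial\mu_i \right)$.
Weights $W_i$:
\[ \tilde{w}(r_{it}) = \begin{cases} c/|r_{it}| & \text{if } |r_{it}| > c \\ 1 & \text{otherwise,} \end{cases} \]
and $w(x_{it})$ as for GLMs.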
The nuisance parameters (α, φ) are estimated robustly as well.
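It holds that
\[ \sqrt{n}(\hat{\beta} - \beta) \sim N(0, M^{-1} Q M^{-1}) \quad \text{as } n \to \infty, \]
where
\[ M = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1} \Gamma_i D_i, \]
\[ Q = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1}\, Var(\psi_i)\, V_i^{-1} \Gamma_i D_i, \]
\[ \psi_i = W_i (Y_i - \mu_i). \]
Again the influence function for a generic observation is proportional to the bounded score.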
Robust quasi-likelihoods for GEE
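Test statistic
\[ \Lambda_{t(s)} = 2\left( \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \hat{\mu}_i) - \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \dot{\mu}_i) \right) \sim \sum_{i=1}^{q} d_i\, \chi^2_{1,i} \quad \text{under } H_0 \text{ as } n \to \infty, \]
where $\hat{\mu}_i$ and $\dot{\mu}_i$ are the estimates under the complete and the restricted model respectively, $d_i$ are the positive eigenvalues of $Q(M^{-1} - \tilde{M}^{+})$, and where
\[ Q_{t_i(s)}(y_i, \mu_i) = \frac{1}{\phi} \int_{t_i(s) = y_i}^{t_i(s) = \mu_i} (y_i - t_i)^T W_i(t_i) V^{-1}(t_i) \Gamma_i(t_i)\, dt_i(s) - \frac{1}{\phi} \int_{t_i(s) = y_i}^{t_i(s) = \mu_i} E\left( (y_i - t_i)^T W_i(t_i) \right) V^{-1}(t_i) \Gamma_i(t_i)\, dt_i(s), \]
with the integrals possibly path-dependent, but path dependence vanishes asymptotically.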
Example 2 - GUIDE data
Classical and robust fit
Estimates on the full logistic model.
             Classical   Robust    Robust
w̃(r_it)     1           Huber     Huber
w(x_it)      1           1         sqrt(1 − h_it)
α̂           0.09        0.11      0.10
Intercept    -3.05       -3.62     -3.63
             (0.96)      (1.30)    (1.28)
female       -0.75       -1.45     -1.41
             (0.60)      (0.80)    (0.78)
age          -0.68       -1.48     -1.39
             (0.56)      (0.71)    (0.69)
dayacc        0.39        0.51      0.52
             (0.09)      (0.13)    (0.13)
severe        0.81        0.71      0.69
             (0.36)      (0.42)    (0.41)
toilet        0.11        0.36      0.35
             (0.10)      (0.13)    (0.13)
R code
source("robGEE.r")

# Classical fit: very large tuning constants, so effectively no downweighting
GUIDEclass <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
                     cluster = practise, data = GUIDE,
                     tuningc.y = 1000, kconst = 1000)
GUIDEclass

# Huber fit: weights on the residuals only
GUIDEHuber <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
                     cluster = practise, data = GUIDE,
                     tuningc.y = 1.5, kconst = 2.4)
GUIDEHuber

# Mallows fit: weights on the residuals and on the design (hat matrix)
GUIDEMallows <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
                       cluster = practise, data = GUIDE,
                       tuningc.y = 1.5, weights.on.x = "hat", kconst = 2.4)
GUIDEMallows

# default correlation structure in robGEE = "exchangeable"
Weights - Huber fit
[Figure: robustness weights w(r) plotted against observation index for the Huber fit.]
Weights per practice
[Figure: weights w.y plotted by practice; labelled observations: 8, 19, 22, 42, 44, 87, 88 and 135.]
Explanation
8 observations out of 137 have a weight lower than 0.60.
Patients 8, 42 and 44 report not being bothered despite frequent visits to the toilet (10 for the first two, 20 for patient 40) and a high number of leaking accidents.
Cases 19 and 38 are bothered although the severity of their symptoms is fairly low compared to the others.
Some of these patients were also identified by other techniques.
Backward stepwise procedure for (R)GEE
            Variable   Step 1     Step 2     Step 3     Step 4
Classical   female     0.224      0.270      -          -
            age        0.249      -          -          -
            dayacc     3·10^-5    3·10^-5    2·10^-5    4·10^-6
            severe     0.089      0.081      0.061      0.011
            toilet     0.224      0.164      0.165      -
Robust      female     0.070      0.095      -          -
            age        0.045      0.041      0.068      -
            dayacc     6·10^-5    2·10^-5    5·10^-6    8·10^-5
            severe     0.092      -          -          -
            toilet     0.006      0.004      0.004      0.002
Conclusions
Robust estimation and testing procedures are available for GLM and GEE type models.
R code is available.
Book website: http://www.unige.ch/ses/metri/cantoni/RobustBiostat/
Goes beyond GLMs and GEE (e.g. mixed linear models, Cox regression).
Suggestion: put them in your statistical toolbox!
References
Cantoni (2004) A Robust Approach to Longitudinal Data Analysis, Canadian Journal of Statistics, 32, 169-180.
Cantoni & Ronchetti (2001) Robust Inference for Generalized Linear Models, Journal of the American Statistical Association, 1022-1030.
Heritier & Ronchetti (1994) Robust bounded-influence tests in general parametric models, Journal of the American Statistical Association, 89(427), 897-904.
Liang & Zeger (1986) Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73, 13-22.
McCullagh & Nelder (1989) Generalized Linear Models, Chapman & Hall.
Moustaki, Victoria-Feser & Hyams (1998) A UK study on the effect of socioeconomic background of pregnant women and hospital practice on the decision to breastfeed and the initiation and duration of breastfeeding, Statistics Research Report LSERR44, London School of Economics, London.
Nelder & Wedderburn (1972) Generalized Linear Models, Journal of the Royal Statistical Society A, 135, 370-384.
Victoria-Feser (2002) Robust inference with binary data, Psychometrika, 67, 21-23.