Robust Methods in Biostatistics

Stephane Heritier ASC 2010 Fremantle - 8/10/2010 – 1 / 39

Robust methods in biostatistics

Stephane Heritier

The George Institute for Global Health, The University of Sydney

sheritier@george.org.au

Outline

• Motivating examples
• Robustness background
• GLM setting
• Robust analysis in GLMs
• Example 1: Breastfeeding data
• Extensions to GEE
• Example 2: GUIDE data
• Conclusions

Book: Robust Methods in Biostatistics by Heritier, Cantoni, Copt and Victoria-Feser (2009), Wiley.

Example 1

Breastfeeding dataset: UK study on the decision of pregnant women to breastfeed. 135 expectant mothers were asked about their feeding choice.

OUTCOME: breast = 1 if breastfeeding (tried, or mixed breast- and bottle-feeding), = 0 if exclusive bottle-feeding.

COVARIATES (reference = 1st level, coded 0):
• pregnancy: pregnancy advancement (end or beginning)
• howfed: how the mothers were fed as babies (some breastfeeding or only bottle-feeding)
• howfedfriends: how the mother's friends fed their babies (some breastfeeding or only bottle-feeding)
• partner: whether the mother had a partner (no/yes)
• age: age
• educat: age at which the mother left full-time education
• ethnic: ethnic group (white/non-white)
• smokebf: whether the mother ever smoked (no/yes)
• smokenow: whether the mother stopped smoking (no/yes)

Example 2

GUIDE dataset (Guidelines for Urinary Incontinence Discussion and Evaluation).

OUTCOME: bothered: 1 for "yes", 0 for "no".

COVARIATES:
• female: gender (indicator for women)
• age: age (minus 76, divided by 10)
• dayacc: average number of leaking accidents per day
• severe: degree of the leak (1 = just create some moisture, 2 = wet their underwear (or pad), 3 = trickle down their thigh, 4 = wet the floor)
• toilet: daily number of visits to the toilet to urinate

137 patients divided into 38 clusters (= practices) of variable size (1 to 8), i.e. the data are correlated.

Robustness background

Robustness paradigm: the model used is seen as a working model, a mathematical abstraction. The data may be (slightly) different!

Procedures (estimators, tests) still have to be stable in a neighbourhood of the assumed model. Classical procedures do not have this stability property in general.

Theoretical development was made possible by the discovery of appropriate tools (M-estimators, influence function, breakdown point) and by the increase in computing power.

Theory: Heritier and Ronchetti, 1994 (robust testing in general parametric models); Cantoni and Ronchetti, 2001 (GLMs); Cantoni, 2004 (GEE), etc.

Model: $F_\beta$ (e.g. linear regression, GLM, etc.)

Neighbourhood: $(1 - \varepsilon) F_\beta + \varepsilon G$

MLE: solves for $\beta$ the estimating equation

$$\sum_{i=1}^{n} s(y_i, x_i; \beta) = 0$$

where $s$ = score function.

M-estimator: solves for $\beta$ the estimating equation

$$\sum_{i=1}^{n} \psi(y_i, x_i; \beta) = 0$$

where $\psi$ = score-like function.

Good properties:
• asymptotically normal, with asymptotic variance given by a sandwich formula (Huber, 1967)
• influence function proportional to $\psi$
• bias in a neighbourhood is controlled iff $\psi$ is bounded
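The effect of a bounded $\psi$ can be seen in the simplest case, a location model. The sketch below is written in Python for illustration only (the talk's own code is R); the data, the function names and the choice of a fixed MAD scale are assumptions of this example, not part of the talk.

```python
from statistics import mean, median

def huber_psi(r, c=1.345):
    """Huber's psi: identity on [-c, c], clipped to +/-c outside."""
    return max(-c, min(c, r))

def huber_location(y, c=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i psi((y_i - mu) / s) = 0 for mu by iterative reweighting.

    The scale s is held fixed at the normalised MAD (an assumption of
    this sketch; other robust scale estimates could be used).
    """
    med = median(y)
    s = 1.4826 * median(abs(v - med) for v in y)
    mu = med
    for _ in range(max_iter):
        weights = []
        for v in y:
            r = (v - mu) / s
            weights.append(1.0 if r == 0 else huber_psi(r, c) / r)
        new_mu = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical data: eight well-behaved values plus one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [100.0]

print("mean  :", mean(contaminated))            # dragged towards the outlier
print("Huber :", huber_location(contaminated))  # stays near the bulk of the data
```

The sample mean corresponds to the unbounded $\psi(r) = r$, so a single observation can move it arbitrarily far; with the clipped $\psi$ the contribution of any observation is capped at $c$.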

Questions:
• How do we bound $\psi$?
• In the GLM setting?
• Extension to GEE?
• How about testing?

GLM setting

GLM ingredients

The random component: $y_1, \ldots, y_n$ from the exponential family. We denote $\mu_i = E(y_i)$ and $\mathrm{Var}(y_i) = \phi\, v(\mu_i)$.

The systematic component: a parameter $\beta^T = (\beta_0, \beta_1, \ldots, \beta_q)$ and explanatory variables $x_i^T = (1, x_{i1}, \ldots, x_{iq})$ to construct the linear predictor $\eta_i = x_i^T \beta$. Linearity is intended w.r.t. the parameters.

The link: a monotone and differentiable link function $g$ which links the random and the systematic components of the model:

$$g(\mu_i) = \eta_i = x_i^T \beta.$$

Examples

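✔ Logistic regression: $Y_i \sim \mathrm{Bernoulli}(p_i)$, $E(Y_i) = p_i$

$$\log\left(\frac{p_i}{1 - p_i}\right) = \mathrm{logit}(p_i) = x_i^T \beta.$$

✔ Poisson regression: $Y_i \sim \mathcal{P}(\lambda_i)$, $E(Y_i) = \lambda_i$

$$\log(\lambda_i) = x_i^T \beta.$$

✔ Gamma regression: $Y_i \sim \Gamma(\mu_i, \nu)$, $E(Y_i) = \mu_i$

$$\log(\mu_i) = x_i^T \beta.$$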
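As a small illustration of how the link connects the linear predictor to the mean, the following Python sketch evaluates $\eta_i = x_i^T\beta$ and maps it back through the inverse logit and inverse log links. The coefficient and covariate values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def linear_predictor(x, beta):
    """eta = x^T beta, with x including the leading 1 for the intercept."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def logit(p):
    """Logit link: log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit link: maps eta to a probability (logistic regression)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    """Inverse log link: maps eta to a positive mean (Poisson or Gamma regression)."""
    return math.exp(eta)

# Hypothetical values: intercept -1.0, slope 0.5, covariate x = 2.
beta = [-1.0, 0.5]
x = [1.0, 2.0]

eta = linear_predictor(x, beta)
print("eta         :", eta)
print("mean (logit):", inv_logit(eta))
print("mean (log)  :", inv_log(eta))
```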

Classical and robust estimation in GLMs

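Assume $\phi = 1$ for simplicity. Classical GLM estimating equations (maximum quasi-likelihood, equal to maximum likelihood if the link is canonical):

$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{v(\mu_i)}\, \mu_i' = \sum_{i=1}^{n} \frac{r_i}{v^{1/2}(\mu_i)}\, \mu_i' = 0$$

where $r_i = (y_i - \mu_i)/v^{1/2}(\mu_i)$ and $\mu_i' = \partial\mu_i/\partial\beta$.

The robust GLM estimator (Cantoni & Ronchetti (2001)) solves

$$\sum_{i=1}^{n} \left[ \psi(r_i)\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i' - a(\beta) \right] = 0, \qquad \psi(r_i) = r_i \underbrace{\frac{\psi(r_i)}{r_i}}_{\tilde{w}(r_i)},$$

where $a(\beta) = \frac{1}{n} \sum_{i=1}^{n} E[\psi(r_i)]\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\, \mu_i'$.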
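The correction term $a(\beta)$ is needed because $E[\psi(r_i)]$ is in general not zero once $\psi$ clips the residuals. For a Bernoulli response the expectation is a two-term sum and can be computed exactly. The Python sketch below does this with the Huber $\psi$; the function names and the example values of $\mu$ are illustrative assumptions.

```python
import math

def huber_psi(r, c=1.5):
    """Huber's psi: identity on [-c, c], clipped to +/-c outside."""
    return max(-c, min(c, r))

def expected_psi_bernoulli(mu, c=1.5):
    """E[psi(r)] for y ~ Bernoulli(mu), with r = (y - mu) / sqrt(mu * (1 - mu))."""
    sd = math.sqrt(mu * (1.0 - mu))
    r_if_one = (1.0 - mu) / sd    # Pearson residual when y = 1
    r_if_zero = (0.0 - mu) / sd   # Pearson residual when y = 0
    return mu * huber_psi(r_if_one, c) + (1.0 - mu) * huber_psi(r_if_zero, c)

for mu in (0.5, 0.9):
    print(mu, expected_psi_bernoulli(mu))
```

At $\mu = 0.5$ both residuals are $\pm 1$, nothing is clipped and the expectation vanishes. At $\mu = 0.9$ the residual for $y = 0$ is $-3$ and is clipped to $-1.5$, so the expectation becomes non-zero; subtracting $a(\beta)$ removes the resulting bias (Fisher consistency).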

Huber’s ψc function and weights

[Figure: two panels plotted against t from −4 to 4. Panel (a): vertical axis from −1.0 to 1.0. Panel (b): vertical axis from 0.4 to 1.0.]

⋆ Huber function $\psi(r; c)$:

$$\psi(r; c) = \begin{cases} r & |r| < c \\ c \cdot \mathrm{sign}(r) & \text{otherwise} \end{cases}$$

⋆ Weights on the design: either based on the hat matrix (e.g. $w(x_i) = \sqrt{1 - h_{ii}}$) or on the Mahalanobis distance, e.g. $w(x_i) = 1/\sqrt{1 + 8\max\left(0, \frac{d_i^2 - q}{\sqrt{2q}}\right)}$, with robust estimates of the center and the scatter in $d_i$ (MCD or MVE).
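The two kinds of weights can be written down directly from the formulas above. This Python sketch is illustrative only: the leverage $h_{ii}$ and the squared distance $d_i^2$ are passed in as plain numbers (in practice they would come from the design matrix and from a robust centre and scatter estimate), and the function names are assumptions of this example.

```python
import math

def huber_weight(r, c=1.5):
    """Residual weight w~(r) = psi(r; c) / r: 1 inside [-c, c], c/|r| outside."""
    a = abs(r)
    return 1.0 if a <= c else c / a

def hat_weight(h_ii):
    """Design weight based on the hat matrix diagonal: sqrt(1 - h_ii)."""
    return math.sqrt(1.0 - h_ii)

def mahalanobis_weight(d2, q):
    """Design weight based on the squared Mahalanobis distance d2 in q dimensions."""
    return 1.0 / math.sqrt(1.0 + 8.0 * max(0.0, (d2 - q) / math.sqrt(2.0 * q)))

print(huber_weight(0.8), huber_weight(3.0))                    # small vs large residual
print(hat_weight(0.19))                                        # moderate leverage
print(mahalanobis_weight(2.0, 2), mahalanobis_weight(4.0, 2))  # typical vs remote point
```

A point whose squared distance does not exceed $q$ keeps full weight; beyond that the weight decreases smoothly with the distance.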

Properties of the robust GLM estimator

It holds that

$$\sqrt{n}\,(\hat\beta - \beta) \overset{n \to \infty}{\sim} N\left(0,\; M^{-1}(\psi, F)\, Q(\psi, F)\, M^{-1}(\psi, F)\right),$$

where $M(\psi, F) = \frac{1}{n} X^T B X$ and $Q(\psi, F) = \frac{1}{n} X^T A X - a(\beta) a(\beta)^T$; $A$ and $B$ are diagonal matrices involving tedious but tractable expectations.

Moreover, the influence function is proportional to the robust score → bounded bias in the neighbourhood (i.e. the corresponding estimator is locally robust).

Testing - robust deviances

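GLM: compare a model $M_p$ ($\beta = (\beta_1, \ldots, \beta_p)$) to a nested model $M_{p-q}$ ($\beta = (\beta_1, \ldots, \beta_{p-q}, 0, \ldots, 0)$) with the difference of deviances

$$\Delta D = 2\left[ l(\hat\beta, y) - l(\dot\beta, y) \right] \overset{H_0,\, n \to \infty}{\sim} \chi^2_q.$$

The robust counterpart is

$$\Lambda_{QM} = 2\left[ \sum_{i=1}^{n} Q_M(y_i, \hat\mu_i) - \sum_{i=1}^{n} Q_M(y_i, \dot\mu_i) \right] \overset{H_0,\, n \to \infty}{\sim} \sum_{i=1}^{q} d_i\, \chi^2_{1,i},$$

where $\hat\mu_i$ and $\dot\mu_i$ are obtained under models $M_p$ and $M_{p-q}$ respectively, and the $d_i$ are the positive eigenvalues of a matrix depending only on $Q$ and the (generalized) inverse(s) of $M$ in the full (reduced) models. $Q_M(y_i, \mu_i)$ is a robust version of $Q(y_i, \mu_i) = \int_{y_i}^{\mu_i} (y_i - t)/V(t)\, dt$, Nelder's quasi-deviance function.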
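The limiting distribution $\sum_i d_i \chi^2_{1,i}$ has no simple closed form when the $d_i$ differ, but a tail probability is easy to approximate by simulation. The Python sketch below is one possible way to do it and is not the method used by any particular package; the eigenvalues and the observed statistic in the usage lines are hypothetical.

```python
import random

def weighted_chisq_pvalue(stat, d, n_sim=100_000, seed=0):
    """Monte Carlo estimate of P(sum_i d_i * Z_i^2 >= stat), Z_i iid N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    exceed = 0
    for _ in range(n_sim):
        draw = sum(di * rng.gauss(0.0, 1.0) ** 2 for di in d)
        if draw >= stat:
            exceed += 1
    return exceed / n_sim

# With a single eigenvalue equal to 1 this is an ordinary chi-square(1) test.
print(weighted_chisq_pvalue(3.841, [1.0]))
# Hypothetical eigenvalues for a test of q = 2 restrictions.
print(weighted_chisq_pvalue(5.0, [1.4, 0.7]))
```

When all $d_i$ equal 1 the statistic reduces to the classical $\chi^2_q$ case, which gives a quick sanity check of the simulation.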

Breastfeeding example

Model

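$\mathrm{breast}_i \sim \mathrm{Bernoulli}(\mu_i)$, so that $E(\mathrm{breast}_i) = \mu_i$ and $\mathrm{Var}(\mathrm{breast}_i) = \mu_i(1 - \mu_i)$ (Binomial family).

Use the logit link:

$$\begin{aligned}
\mathrm{logit}(E(\mathrm{breast})) = \mathrm{logit}(P(\mathrm{breast})) = {} & \beta_0 + \beta_1\,\mathrm{pregnancy} + \beta_2\,\mathrm{howfed} + \beta_3\,\mathrm{howfedfr} \\
& + \beta_4\,\mathrm{partner} + \beta_5\,\mathrm{age} + \beta_6\,\mathrm{educat} + \beta_7\,\mathrm{ethnic} \\
& + \beta_8\,\mathrm{smokenow} + \beta_9\,\mathrm{smokebf},
\end{aligned}$$

where $\mathrm{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right)$, with $\mu_i/(1 - \mu_i)$ being the odds of a success, and $\mu_i = P(\mathrm{breast})$ is the probability of at least trying to breastfeed.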

Classical fit

Variable               Coeff.  St. error  z-value  p-value
Intercept            -4.50386    2.43357   -1.851  0.06421
pregnancy beginning  -0.98115    0.57740   -1.699  0.08927
howfed breast         0.30804    0.59119    0.521  0.60233
howfedfr breast       1.49555    0.58784    2.544  0.01095
partner yes           1.08438    0.70281    1.543  0.12285
age                   0.02681    0.05151    0.520  0.60279
educat                0.17400    0.12703    1.370  0.17077
ethnic non-white      1.95507    0.75601    2.586  0.00971
smokenow yes         -3.31232    1.01311   -3.269  0.00108
smokebf yes           1.74417    1.00626    1.733  0.08304

Tab. 1: Parameter estimates and standard errors.
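The z-values and p-values in Tab. 1 follow from the coefficients and standard errors: $z = \hat\beta / \mathrm{se}$ and a two-sided normal p-value. The Python sketch below recomputes them for two rows of the table and also converts a coefficient to an odds ratio; the function name is an assumption of this example, the numbers are taken from the table.

```python
import math

def wald_test(coef, se):
    """Return (z, two-sided p-value) for H0: coefficient = 0, normal reference."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Rows "smokenow yes" and "howfedfr breast" of Tab. 1.
z_smoke, p_smoke = wald_test(-3.31232, 1.01311)
z_friends, p_friends = wald_test(1.49555, 0.58784)
print(round(z_smoke, 3), round(p_smoke, 5))
print(round(z_friends, 3), round(p_friends, 5))

# On the odds scale, exp(coef) is the multiplicative change in the odds of
# breastfeeding relative to the reference level.
print(math.exp(-3.31232))
```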

Robust estimation - R code

> library("robustbase")

# Model fitting: Huber's estimator
> breast.glmrob = glmrob(breast ~ howfedfr + ethnic + educat + age + pregnancy +
+     howfed + partner + smokenow + smokebf, family = binomial,
+     control = glmrobMqle.control(tcc = 1.5), data = breastfeeding)
> summary(breast.glmrob)

# Tuning constant for the Huber function: trade-off between efficiency and robustness.
# Default c = 1.345 (95% efficiency under the normal model).
# For GLMs, choose a slightly higher value (e.g. c = 1.5).

# Model fitting: Mallows' estimator
> breast.glmrobWx = glmrob(breast ~ howfedfr + ethnic + educat + age +
+     pregnancy + howfed + partner + smokenow + smokebf, family = binomial,
+     control = glmrobMqle.control(tcc = 1.5), weights.on.x = "hat",
+     data = breastfeeding)
> summary(breast.glmrobWx)

Output - R code

Call: glmrob(formula = breast ~ howfedfr + ethnic + educat + age +
    pregnancy + howfed + partner + smokenow + smokebf, family = binomial,
    data = breast, control = glmrobMqle.control(tcc = 1.5))

Coefficients:
                   Estimate Std. Error z-value Pr(>|z|)
(Intercept)        -7.78236    3.36541  -2.312  0.02075 *
howfedfrBreast      1.47905    0.69010   2.143  0.03209 *
ethnicNon-white     2.71169    1.12482   2.411  0.01592 *
educat              0.37661    0.18551   2.030  0.04234 *
age                 0.03034    0.05959   0.509  0.61066
pregnancyBeginning -0.81551    0.69536  -1.173  0.24088
howfedBreast        0.54496    0.70962   0.768  0.44251
partnerPartner      0.77184    0.81621   0.946  0.34433
smokenowYes        -3.47631    1.12924  -3.078  0.00208 **
smokebfYes          1.50738    1.10305   1.367  0.17176

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Robustness weights w.r * w.x:
 127 weights are ~= 1; the remaining 8 ones:
      3      11      14      53      63      75      90     115
0.74233 0.22474 0.41645 0.80182 0.51590 0.04944 0.20662 0.25762

Number of observations: 135
Fitted by method Mqle (in 12 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available
Algorithmic parameters:
   acc    tcc
0.0001 1.5000
maxit
   50
test.acc
  "coef"

Robust fit(s)

                         Huber, c = 1.5               Mallows, c = 1.5
Variable              Coeff.  St. error  z-value   Coeff.  St. error  z-value
Intercept             -7.782      3.365   -2.312   -7.778      3.363   -2.313
pregnancy beginning   -0.816      0.695   -1.173   -0.815      0.694   -1.173
howfed breast          0.545      0.710    0.768    0.540      0.708    0.763
howfedfr breast        1.479      0.690    2.143    1.482      0.689    2.150
partner yes            0.772      0.816    0.946    0.775      0.816    0.950
age                    0.030      0.060    0.509    0.031      0.060    0.513
educat                 0.377      0.186    2.030    0.376      0.185    2.028
ethnic non-white       2.712      1.125    2.411    2.705      1.122    2.411
smokenow yes          -3.476      1.129   -3.078   -3.468      1.127   -3.078
smokebf yes            1.507      1.103    1.367    1.507      1.102    1.368

Tab. 2: Parameter estimates and standard errors.

# plot of the weights w.r
> par(mfrow = c(2, 1))
> plot(breast.glmrobWx$w.r, ylab = "w(r)")
> identify(breast.glmrobWx$w.r)
# left-click on observations to identify them, right-click to end

# plot of the weights w.x
> plot(breast.glmrobWx$w.x, ylab = "w(x)")
> identify(breast.glmrobWx$w.x)

# plot of various (robust) Pearson residuals
> age = breastfeed[, 8]
> plot(breast.glmrobWx$residuals, ylab = "Pearson residuals")
> plot(breast.glmrobWx$linear.predictors, breast.glmrobWx$residuals,
+     xlab = "Predictors", ylab = "Pearson residuals")

Robustness weights

[Figure: two index plots of the robustness weights (Index from 0 to 140). Panel w(x): observation 18 is labelled. Panel w(r): observations 3, 11, 14, 53, 63, 75, 90 and 115 are labelled.]

Explanation

Observations 75, 11, 115 and 14 had a high fitted probability of breastfeeding (p ≥ 0.90) but bottle-fed.

Observations 63 and 90 had a low fitted probability of breastfeeding (p ≤ 0.11) but breastfed.

These observations have large negative (resp. positive) Pearson residuals in the robust analysis.
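To see why such observations are downweighted, the Pearson residual and the Huber weight can be evaluated at the boundary probabilities quoted above. The Python sketch below uses $p = 0.90$ and $p = 0.11$ only as illustrative values (the fitted probabilities of the individual observations are not given here) together with the tuning constant $c = 1.5$ used in the fit.

```python
import math

def pearson_residual(y, p):
    """Pearson residual of a Bernoulli observation y with fitted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def huber_weight(r, c=1.5):
    """Robustness weight psi(r; c) / r: 1 inside [-c, c], c/|r| outside."""
    return 1.0 if abs(r) <= c else c / abs(r)

# Predicted to breastfeed (p = 0.90) but bottle-fed (y = 0).
r_bottle = pearson_residual(0, 0.90)
# Predicted not to breastfeed (p = 0.11) but breastfed (y = 1).
r_breast = pearson_residual(1, 0.11)

print(r_bottle, huber_weight(r_bottle))
print(r_breast, huber_weight(r_breast))
```

The more extreme the fitted probability in the wrong direction, the larger the residual in absolute value and the smaller the weight.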

Residuals

[Figure: Pearson residuals against Index, in two panels titled "Robust analysis" and "Classical analysis".]

Manual elimination - R code

Example: omit the variable pregnancy from the full robust fit

> breast.glmrobWx = glmrob(breast ~ pregnancy + howfedfr + ethnic + educat +
+     age + howfed + partner + smokenow + smokebf, family = binomial,
+     weights.on.x = "hat", control = glmrobMqle.control(tcc = 1.5),
+     data = breast)
> step1pregnancy = glmrob(breast ~ howfedfr + ethnic + educat +
+     age + howfed + partner + smokenow + smokebf, family = binomial,
+     weights.on.x = "hat", control = glmrobMqle.control(tcc = 1.5),
+     data = breast)

> anova(breast.glmrobWx, step1pregnancy, test = "QD")

Robust Quasi-Deviance Table

Model 1: breast ~ howfedfr + ethnic + educat + age + grp + howfed +
    partner + smokenow + smokebf
Model 2: breast ~ howfedfr + ethnic + educat + age + howfed +
    partner + smokenow + smokebf
  pseudoDf Test.Stat Df Pr(>chisq)
1      125
2      126   -1.5993 -1      0.206

Extension to GEE

GEE setting

Outcome $Y_{it}$ for subject $i$ at time $t$ ($i = 1, \ldots, K$ and $t = 1, \ldots, n_i$). Subjects are independent.

Set of covariates $x_{it}$.

Model $E(Y_{it}) = \mu_{it}$ via $g(\mu_{it}) = x_{it}^T \beta$ for a known link function $g$.

Assume $\mathrm{Var}(Y_i) = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{diag}(v(\mu_{i1}), \ldots, v(\mu_{in_i}))$, $\phi\, v(\mu_{it}) = \mathrm{Var}(Y_{it})$, and $R_i(\alpha)$ is a working correlation matrix.

If there is one observation per cluster → GLM (no correlation).
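The working covariance $\phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$ can be assembled element by element. The Python sketch below does so for a binary outcome with an exchangeable working correlation; the cluster means and the value of $\alpha$ are hypothetical, and plain nested lists are used instead of a matrix library to keep the example self-contained.

```python
import math

def exchangeable_corr(n, alpha):
    """n x n working correlation: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if s == t else alpha for t in range(n)] for s in range(n)]

def working_covariance(mu, alpha, phi=1.0):
    """V = phi * A^(1/2) R(alpha) A^(1/2), with A = diag(v(mu_t)), v(mu) = mu * (1 - mu)."""
    n = len(mu)
    root_v = [math.sqrt(m * (1.0 - m)) for m in mu]
    R = exchangeable_corr(n, alpha)
    return [[phi * root_v[s] * R[s][t] * root_v[t] for t in range(n)]
            for s in range(n)]

# Hypothetical cluster of three patients from the same practice.
V = working_covariance([0.5, 0.5, 0.2], alpha=0.10)
for row in V:
    print([round(v, 4) for v in row])

# A cluster of size one reduces to the GLM variance phi * v(mu).
print(working_covariance([0.3], alpha=0.10))
```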

Robust estimation for GEE

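Generalize the GEE equations (Cantoni (2004), CJS):

$$\sum_{i=1}^{K} D_i^T\, \Gamma_i^T\, V_i^{-1} \left( W_i \cdot (Y_i - \mu_i) - c_i \right) = 0,$$

where $D_i = \partial\mu_i/\partial\beta$ and $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$. Moreover, $c_i = E(W_i \cdot (Y_i - \mu_i))$. Finally, $\Gamma_i = E(\partial W_i \cdot (Y_i - \mu_i)/\partial\mu_i - \partial c_i/\partial\mu_i)$.

Weights $W_i$:

$$\tilde{w}(r_{it}) = \begin{cases} c/|r_{it}| & \text{if } |r_{it}| > c \\ 1 & \text{otherwise,} \end{cases}$$

and $w(x_{it})$ as for GLMs.

Robust estimation of the nuisance parameters $(\alpha, \phi)$ as well.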

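It holds that

$$\sqrt{n}\,(\hat\beta - \beta) \overset{n \to \infty}{\sim} N(0,\; M^{-1} Q M^{-1}),$$

where

$$M = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T\, \Gamma_i^T\, V_i^{-1}\, \Gamma_i D_i,$$

$$Q = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T\, \Gamma_i^T\, V_i^{-1}\, \mathrm{Var}(\psi_i)\, V_i^{-1}\, \Gamma_i D_i,$$

$$\psi_i = W_i \cdot (Y_i - \mu_i).$$

Again, the influence function for a generic observation is proportional to the bounded score.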

Robust quasi-likelihoods for GEE

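Test statistic

$$\Lambda_{t(s)} = 2\left( \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \hat\mu_i) - \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \dot\mu_i) \right) \overset{H_0,\, n \to \infty}{\sim} \sum_{i=1}^{q} d_i\, \chi^2_{1,i},$$

where $\hat\mu_i$ and $\dot\mu_i$ are the estimates under the complete and the restricted model respectively, the $d_i$ are the positive eigenvalues of $Q\,(M^{-1} - \tilde{M}^{+})$, and where

$$Q_{t_i(s)}(y_i, \mu_i) = \frac{1}{\phi} \int_{t_i(s) = y_i}^{t_i(s) = \mu_i} (y_i - t_i)^T W_i(t_i)\, V^{-1}(t_i)\, \Gamma_i(t_i)\, dt_i(s) \;-\; \frac{1}{\phi} \int_{t_i(s) = y_i}^{t_i(s) = \mu_i} E\left((y_i - t_i)^T W_i(t_i)\right) V^{-1}(t_i)\, \Gamma_i(t_i)\, dt_i(s),$$

with the integrals possibly path-dependent, but path dependence vanishes asymptotically.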

Example 2 - GUIDE data

Classical and robust fit

Estimates on the full logistic model.

             Classical      Robust         Robust
w̃(r_it)      1              Huber          Huber
w(x_it)      1              1              √(1 − h_it)
α̂            0.09           0.11           0.10
Intercept    -3.05 (0.96)   -3.62 (1.30)   -3.63 (1.28)
female       -0.75 (0.60)   -1.45 (0.80)   -1.41 (0.78)
age          -0.68 (0.56)   -1.48 (0.71)   -1.39 (0.69)
dayacc        0.39 (0.09)    0.51 (0.13)    0.52 (0.13)
severe        0.81 (0.36)    0.71 (0.42)    0.69 (0.41)
toilet        0.11 (0.10)    0.36 (0.13)    0.35 (0.13)

R code

> source("robGEE.r")

> GUIDEclass <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
+     cluster = practise, data = GUIDE, tuningc.y = 1000, kconst = 1000)
> GUIDEclass

> GUIDEHuber <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
+     cluster = practise, data = GUIDE, tuningc.y = 1.5, kconst = 2.4)
> GUIDEHuber

> GUIDEMallows <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
+     cluster = practise, data = GUIDE, tuningc.y = 1.5, weights.on.x = "hat",
+     kconst = 2.4)
> GUIDEMallows

# default correlation structure in robGEE = "exchangeable"

Weights - Huber fit

[Figure: weights w(r) against Index (0 to 140), vertical axis from 0.2 to 1.0.]

Weights per practice

[Figure: weight w.y by practice, vertical axis from 0.2 to 1.0; labelled observations: 8, 19, 22, 42, 44, 87, 88 and 135.]

Explanation

8 observations out of 137 have a weight lower than 0.60.

Patients 8, 42 and 44 claim not to be bothered despite frequent visits to the toilet (10 for the first two, 20 for patient 44) and a high number of leaking accidents.

Cases 19 and 38 are bothered although the severity of their symptoms is fairly low compared to the others.

Some of these patients are also identified by other techniques.

Backward stepwise procedure for (R)GEE

           Variable   Step 1     Step 2     Step 3     Step 4
Classical  female     0.224      0.270      -          -
           age        0.249      -          -          -
           dayacc     3·10^-5    3·10^-5    2·10^-5    4·10^-6
           severe     0.089      0.081      0.061      0.011
           toilet     0.224      0.164      0.165      -
Robust     female     0.070      0.095      -          -
           age        0.045      0.041      0.068      -
           dayacc     6·10^-5    2·10^-5    5·10^-6    8·10^-5
           severe     0.092      -          -          -
           toilet     0.006      0.004      0.004      0.002
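The table can be read as a backward elimination in which, at each step, the variable with the largest value is dropped and the model is refitted. The Python sketch below replays that rule on the first-step columns. It assumes that the entries are p-values and that elimination stops at a 0.05 threshold; neither is stated explicitly on the slide, and the function name is hypothetical.

```python
def variable_to_drop(pvalues, threshold=0.05):
    """Return the variable with the largest p-value, or None if all are <= threshold."""
    name, p = max(pvalues.items(), key=lambda item: item[1])
    return name if p > threshold else None

# Step 1 columns of the table above.
classical_step1 = {"female": 0.224, "age": 0.249, "dayacc": 3e-5,
                   "severe": 0.089, "toilet": 0.224}
robust_step1 = {"female": 0.070, "age": 0.045, "dayacc": 6e-5,
                "severe": 0.092, "toilet": 0.006}

print(variable_to_drop(classical_step1))  # variable removed first, classical fit
print(variable_to_drop(robust_step1))     # variable removed first, robust fit

# Step 4 columns: every remaining variable is below the threshold.
print(variable_to_drop({"dayacc": 4e-6, "severe": 0.011}))
print(variable_to_drop({"dayacc": 8e-5, "toilet": 0.002}))
```

Under this reading the two procedures end with different models: the classical one keeps dayacc and severe, the robust one keeps dayacc and toilet.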

Conclusions

Robust estimation and testing procedures are available for GLM- and GEE-type models.

R code is available.

Book website: http://www.unige.ch/ses/metri/cantoni/RobustBiostat/

The approach goes beyond GLMs and GEE (e.g. mixed linear models, Cox regression).

Suggestion: put them in your statistical toolbox!

References

Cantoni (2004) A Robust Approach to Longitudinal Data Analysis, Canadian Journal of Statistics, 32, 169-180.

Cantoni & Ronchetti (2001) Robust Inference for Generalized Linear Models, Journal of the American Statistical Association, 1022-1030.

Heritier & Ronchetti (1994) Robust bounded-influence tests in general parametric models, Journal of the American Statistical Association, 89(427), 897-904.

Liang & Zeger (1986) Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73, 13-22.

McCullagh & Nelder (1989) Generalized Linear Models, Chapman & Hall.

Moustaki, Victoria-Feser & Hyams (1998) A UK study on the effect of socioeconomic background of pregnant women and hospital practice on the decision to breastfeed and the initiation and duration of breastfeeding, Statistics Research Report LSERR44, London School of Economics, London.

Nelder & Wedderburn (1972) Generalized Linear Models, Journal of the Royal Statistical Society A, 135, 370-384.

Victoria-Feser (2002) Robust inference with binary data, Psychometrika, 67, 21-23.
