Stephane Heritier ASC 2010 Fremantle - 8/10/2010 – 1 / 39 Robust methods in biostatistics Stephane Heritier The George Institute for Global Health The University of Sydney [email protected]

Robust Methods in Biostatistics


DESCRIPTION

robustness, robust analysis GLM, breastfeeding data


Page 1: Robust Methods in Biostatistics

Stephane Heritier ASC 2010 Fremantle - 8/10/2010 – 1 / 39

Robust methods in biostatistics

Stephane Heritier

The George Institute for Global Health, The University of Sydney

[email protected]

Page 2: Robust Methods in Biostatistics

Outline


• Motivating examples

• Robustness background

• GLM setting

• Robust analysis in GLMs

• Example 1: Breastfeeding data

• Extensions to GEE

• Example 2: GUIDE data

• Conclusions

Book: Robust Methods in Biostatistics by Heritier, Cantoni, Copt and Victoria-Feser (2009), Wiley.

Page 4: Robust Methods in Biostatistics

Example 1


Breastfeeding dataset: UK study on the decision of pregnant women to breastfeed. 135 expectant mothers were asked about their feeding choice.

OUTCOME: breast = 1 if breastfeeding (tried, or mixed breast- and bottle-feeding), = 0 if exclusive bottle-feeding.

COVARIATES: pregnancy advancement (pregnancy, end or beginning), how the mothers were fed as babies (howfed, some breastfeeding or only bottle-feeding), how the mother's friends fed their babies (howfedfriends, some breastfeeding or only bottle-feeding), whether she had a partner (partner, no/yes), age (age), age at which she left full-time education (educat), ethnic group (ethnic, white/non-white), whether she ever smoked (smokebf, no/yes) and whether she stopped smoking (smokenow, no/yes). Reference = 1st level (coded 0).

Page 5: Robust Methods in Biostatistics

Example 2


GUIDE dataset (Guidelines for Urinary Incontinence Discussion and Evaluation).

OUTCOME: bothered: 1 for "yes", 0 for "no".

COVARIATES: gender (indicator for women: female), age (minus 76, divided by 10: age), average number of leaking accidents per day (dayacc), degree of the leak (severe: 1 = just creates some moisture, 2 = wets their underwear (or pad), 3 = trickles down their thigh, 4 = wets the floor), daily number of visits to the toilet to urinate (toilet).

137 patients divided into 38 clusters (= practices) of variable size (1 to 8), i.e. we have correlated data.

Page 6: Robust Methods in Biostatistics

Robustness background


Robustness paradigm: the model used is seen as a working model, a mathematical abstraction. The data may be (slightly) different!

Procedures (estimators, tests) still have to be stable in a neighbourhood of the assumed model. Classical procedures do not have this stability property in general.

Theoretical development was driven by the discovery of appropriate tools (M-estimators, influence function, breakdown point) and the increase in computing power.

Theory: Heritier and Ronchetti, 1994 (robust testing in general parametric models); Cantoni and Ronchetti, 2001 (GLMs); Cantoni, 2004 (GEE), etc.

Page 10: Robust Methods in Biostatistics


Model: $F_\beta$ (e.g. linear regression, GLM, etc.)

Neighbourhood: $(1 - \epsilon)F_\beta + \epsilon G$

MLE: solves for $\beta$ the estimating equation

$$\sum_{i=1}^{n} s(y_i, x_i; \beta) = 0,$$

where $s$ is the score function.

Page 11: Robust Methods in Biostatistics


Model: $F_\beta$ (e.g. linear regression, GLM, etc.)

Neighbourhood: $(1 - \epsilon)F_\beta + \epsilon G$

M-estimator: solves for $\beta$ the estimating equation

$$\sum_{i=1}^{n} \psi(y_i, x_i; \beta) = 0,$$

where $\psi$ is a score-like function.

Good properties:

- asymptotically normal, with asymptotic variance given by a sandwich formula (Huber, 1967)

- influence function proportional to $\psi$

- bias in a neighbourhood is controlled iff $\psi$ is bounded
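To make the role of a bounded $\psi$ concrete, here is a minimal base-R sketch for the simplest case, a location parameter. The helper names `huber_psi` and `m_location`, the choice of the MAD as a fixed scale, and the simulated data are all assumptions made for illustration; they do not come from the talk.

```r
# Huber psi: identity on [-c, c], clipped outside (a bounded score-like function)
huber_psi <- function(r, c = 1.345) pmax(-c, pmin(c, r))

# M-estimator of location: solve sum(psi((y - mu) / s)) = 0 for mu.
# The scale s is held fixed at the MAD (a simplifying assumption).
m_location <- function(y, c = 1.345) {
  s <- mad(y)
  f <- function(mu) sum(huber_psi((y - mu) / s, c))
  uniroot(f, interval = range(y), tol = 1e-10)$root
}

set.seed(1)
y_clean  <- rnorm(50)
y_contam <- c(y_clean, 50)          # add one gross outlier

# How far does each estimator move when the outlier is added?
shift_mean  <- mean(y_contam) - mean(y_clean)              # unbounded psi (MLE)
shift_huber <- m_location(y_contam) - m_location(y_clean)  # bounded psi

print(c(mean = shift_mean, huber = shift_huber))
```

The sample mean corresponds to the unbounded score $\psi(r) = r$, so a single observation can move it arbitrarily far; with the clipped $\psi$ the contribution of any one observation to the estimating equation is at most $c$.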

Page 12: Robust Methods in Biostatistics


Questions : how do we bound ψ ?

In the GLM setting ?

Extension to GEE ?

How about testing ?

Page 16: Robust Methods in Biostatistics

GLM setting


Page 17: Robust Methods in Biostatistics

GLM ingredients


The random component: $y_1, \dots, y_n$ from the exponential family. We denote $\mu_i = E(y_i)$ and $\mathrm{Var}(y_i) = \phi\, v(\mu_i)$.

The systematic component: a parameter $\beta^T = (\beta_0, \beta_1, \dots, \beta_q)$ and explanatory variables $x_i^T = (1, x_{i1}, \dots, x_{iq})$ to construct the linear predictor $\eta_i = x_i^T\beta$. Linearity is intended w.r.t. the parameters.

The link: a monotone and differentiable link function $g$ which links the random and the systematic components of the model:

$$g(\mu_i) = \eta_i = x_i^T\beta.$$

Page 18: Robust Methods in Biostatistics

Examples


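✔ Logistic regression: $Y_i \sim \mathrm{Bernoulli}(p_i)$, $E(Y_i) = p_i$

$$\log\left(\frac{p_i}{1 - p_i}\right) = \mathrm{logit}(p_i) = x_i^T\beta.$$

✔ Poisson regression: $Y_i \sim \mathcal{P}(\lambda_i)$, $E(Y_i) = \lambda_i$

$$\log(\lambda_i) = x_i^T\beta.$$

✔ Gamma regression: $Y_i \sim \Gamma(\mu_i, \nu)$, $E(Y_i) = \mu_i$

$$\log(\mu_i) = x_i^T\beta.$$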
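The three examples above map directly onto the `family` argument of base R's `glm()`. The following sketch fits each of them to simulated data; the sample size, the coefficients (0.5 and 1.0) and the Gamma shape parameter are arbitrary choices for illustration only.

```r
set.seed(42)
n   <- 200
x   <- rnorm(n)
eta <- 0.5 + 1.0 * x                 # linear predictor x_i^T beta

# Simulate one response per family, each with mean g^{-1}(eta)
y_bin  <- rbinom(n, size = 1, prob = plogis(eta))        # Bernoulli, logit link
y_pois <- rpois(n, lambda = exp(eta))                    # Poisson, log link
y_gam  <- rgamma(n, shape = 2, rate = 2 / exp(eta))      # Gamma with mean exp(eta)

# Classical (maximum likelihood) fits
fit_bin  <- glm(y_bin  ~ x, family = binomial(link = "logit"))
fit_pois <- glm(y_pois ~ x, family = poisson(link = "log"))
fit_gam  <- glm(y_gam  ~ x, family = Gamma(link = "log"))

print(rbind(logistic = coef(fit_bin),
            poisson  = coef(fit_pois),
            gamma    = coef(fit_gam)))
```

Note that the log link used here for the Gamma model is not R's default for that family (the default is the inverse link), so it has to be requested explicitly.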

Page 19: Robust Methods in Biostatistics

Classical and robust estimation in GLMs


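Assume $\phi = 1$ for simplicity. Classical GLM estimating equations (maximum quasi-likelihood = maximum likelihood if canonical link):

$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{v(\mu_i)}\,\mu_i' = \sum_{i=1}^{n} \frac{r_i}{v^{1/2}(\mu_i)}\,\mu_i' = 0,$$

where $r_i = (y_i - \mu_i)/v^{1/2}(\mu_i)$ and $\mu_i' = \partial\mu_i/\partial\beta$.

The robust GLM estimator (Cantoni & Ronchetti, 2001) solves

$$\sum_{i=1}^{n} \left[ \psi(r_i)\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\,\mu_i' - a(\beta) \right] = 0, \qquad \psi(r_i) = r_i \underbrace{\frac{\psi(r_i)}{r_i}}_{\tilde{w}(r_i)},$$

where $a(\beta) = \frac{1}{n}\sum_{i=1}^{n} E[\psi(r_i)]\, w(x_i)\, \frac{1}{v^{1/2}(\mu_i)}\,\mu_i'$.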
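The correction term $a(\beta)$ is needed because $E[\psi(r_i)]$ is generally not zero once $\psi$ is bounded and the response distribution is skewed. For a Bernoulli response this expectation is a two-point sum that can be written out directly. The sketch below assumes Huber's $\psi$ with $c = 1.5$; the function names are hypothetical.

```r
huber_psi <- function(r, c = 1.5) pmax(-c, pmin(c, r))

# E[psi(r_i)] for a Bernoulli response with mean mu:
# y = 1 with probability mu, y = 0 with probability 1 - mu
expected_psi_bernoulli <- function(mu, c = 1.5) {
  s  <- sqrt(mu * (1 - mu))      # v^{1/2}(mu) for the binomial family
  r1 <- (1 - mu) / s             # Pearson residual when y = 1
  r0 <- (0 - mu) / s             # Pearson residual when y = 0
  mu * huber_psi(r1, c) + (1 - mu) * huber_psi(r0, c)
}

e_half <- expected_psi_bernoulli(0.50)   # symmetric case: no clipping occurs
e_high <- expected_psi_bernoulli(0.95)   # skewed case: the y = 0 residual is clipped

print(c(mu_0.50 = e_half, mu_0.95 = e_high))
```

At $\mu = 0.5$ both residuals are $\pm 1$ and the expectation vanishes. At $\mu = 0.95$ the residual for $y = 0$ is about $-4.36$, gets clipped to $-1.5$, and the expectation becomes positive, which is the bias that subtracting $a(\beta)$ removes.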

Page 22: Robust Methods in Biostatistics

Huber’s ψc function and weights


[Figure: two panels plotted against t from −4 to 4. (a) Huber's ψc function, vertical axis from −1.0 to 1.0; (b) the corresponding weights, vertical axis from 0.4 to 1.0.]

Page 23: Robust Methods in Biostatistics


⋆ Huber function $\psi(r; c)$:

$$\psi(r; c) = \begin{cases} r & |r| < c \\ c \cdot \mathrm{sign}(r) & \text{otherwise} \end{cases}$$

⋆ Weights on the design: either based on the hat matrix (e.g. $w(x_i) = \sqrt{1 - h_{ii}}$) or on the Mahalanobis distance, e.g.

$$w(x_i) = 1 \Big/ \sqrt{1 + 8 \max\left(0, \frac{d_i^2 - q}{\sqrt{2q}}\right)},$$

with robust estimates of the center and the scatter in $d_i$ (MCD or MVE).
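As a small illustration of the two kinds of weights, the base-R sketch below computes the residual weights $\tilde{w}(r) = \psi(r; c)/r$ and the hat-matrix design weights $\sqrt{1 - h_{ii}}$. The simulated design, with one deliberately planted leverage point, and the function names are assumptions for illustration.

```r
# Huber psi and the implied weight w~(r) = psi(r) / r (set to 1 at r = 0)
huber_psi    <- function(r, c = 1.5) ifelse(abs(r) < c, r, c * sign(r))
huber_weight <- function(r, c = 1.5) ifelse(r == 0, 1, huber_psi(r, c) / r)

w_r <- huber_weight(c(-3, -1, 0, 1, 3))

# Weights on the design from the hat matrix: w(x_i) = sqrt(1 - h_ii)
set.seed(3)
X   <- cbind(1, c(rnorm(19), 10))               # row 20 is a leverage point
h   <- diag(X %*% solve(crossprod(X)) %*% t(X)) # diagonal of X (X'X)^{-1} X'
w_x <- sqrt(1 - h)

print(w_r)
print(round(w_x, 3))
```

Residuals inside $[-c, c]$ keep full weight, while larger ones are downweighted in proportion to $c/|r|$; the leverage point in row 20 receives the smallest design weight.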

Page 24: Robust Methods in Biostatistics

Properties of the robust GLM estimator


It holds that

$$\sqrt{n}(\hat{\beta} - \beta) \overset{n\to\infty}{\sim} N\left(0,\; M^{-1}(\psi, F)\, Q(\psi, F)\, M^{-1}(\psi, F)\right),$$

where $M(\psi, F) = \frac{1}{n}X^TBX$ and $Q(\psi, F) = \frac{1}{n}X^TAX - a(\beta)a(\beta)^T$; $A$ and $B$ are diagonal matrices involving tedious but tractable expectations.

Moreover, the influence function is proportional to the robust score → bounded bias in the neighbourhood (i.e. the corresponding estimator is locally robust).

Page 26: Robust Methods in Biostatistics

Testing - robust deviances


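GLM: compare a model $M_p$ ($\beta = (\beta_1, \dots, \beta_p)$) to a nested model $M_{p-q}$ ($\beta = (\beta_1, \dots, \beta_{p-q}, 0, \dots, 0)$) with the difference of deviances

$$\Delta D = 2\left[ l(\hat{\beta}, y) - l(\dot{\beta}, y) \right] \overset{H_0,\, n\to\infty}{\sim} \chi^2_q.$$

The robust counterpart is

$$\Lambda_{QM} = 2\left[ \sum_{i=1}^{n} Q_M(y_i, \hat{\mu}_i) - \sum_{i=1}^{n} Q_M(y_i, \dot{\mu}_i) \right] \overset{H_0,\, n\to\infty}{\sim} \sum_{i=1}^{q} d_i \chi^2_{1i},$$

where $\hat{\mu}_i$ and $\dot{\mu}_i$ are obtained under models $M_p$ and $M_{p-q}$ respectively, and the $d_i$ are the positive eigenvalues of a matrix depending only on $Q$ and the (generalized) inverse(s) of $M$ in the full (reduced) models. $Q_M(y_i, \mu_i)$ is a robust version of $Q(y_i, \mu_i) = \int_{y_i}^{\mu_i} (y_i - t)/V(t)\,dt$, Nelder's quasi-deviance function.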
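For reference, the classical difference-of-deviances test can be reproduced in base R as follows. This sketch covers only the classical statistic $\Delta D$, not the robust version $\Lambda_{QM}$; the simulated data and variable names are illustrative assumptions.

```r
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))   # x2 has no effect by construction

full    <- glm(y ~ x1 + x2, family = binomial)  # model M_p
reduced <- glm(y ~ x1,      family = binomial)  # nested model M_{p-q}, q = 1

# Difference of deviances = 2 * [l(full) - l(reduced)]
delta_D <- deviance(reduced) - deviance(full)
p_value <- pchisq(delta_D, df = 1, lower.tail = FALSE)

# The same test via anova()
tab <- anova(reduced, full, test = "Chisq")
print(tab)
```

For binary responses the deviance equals minus twice the log-likelihood, so `delta_D` matches the formula for $\Delta D$ exactly and is compared against a $\chi^2_1$ distribution here.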

Page 28: Robust Methods in Biostatistics

Breastfeeding example


Page 29: Robust Methods in Biostatistics

Model


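$\text{breast}_i \sim \mathrm{Bernoulli}(\mu_i)$, so that $E(\text{breast}_i) = \mu_i$ and $\mathrm{Var}(\text{breast}_i) = \mu_i(1 - \mu_i)$ (binomial family).

Use the logit link:

$$\begin{aligned}
\mathrm{logit}(E(\text{breast})) = \mathrm{logit}(P(\text{breast})) ={}& \beta_0 + \beta_1\,\text{pregnancy} + \beta_2\,\text{howfed} + \beta_3\,\text{howfedfr} \\
&+ \beta_4\,\text{partner} + \beta_5\,\text{age} + \beta_6\,\text{educat} + \beta_7\,\text{ethnic} \\
&+ \beta_8\,\text{smokenow} + \beta_9\,\text{smokebf},
\end{aligned}$$

where $\mathrm{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right)$, with $\mu_i/(1 - \mu_i)$ being the odds of a success, and $\mu_i = P(\text{breast})$ is the probability of at least trying to breastfeed.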

Page 30: Robust Methods in Biostatistics

Classical fit


Variable               Coeff.     St. error   z-value   p-value
Intercept             -4.50386    2.43357     -1.851    0.06421
pregnancy beginning   -0.98115    0.57740     -1.699    0.08927
howfed breast          0.30804    0.59119      0.521    0.60233
howfedfr breast        1.49555    0.58784      2.544    0.01095
partner yes            1.08438    0.70281      1.543    0.12285
age                    0.02681    0.05151      0.520    0.60279
educat                 0.17400    0.12703      1.370    0.17077
ethnic non-white       1.95507    0.75601      2.586    0.00971
smokenow yes          -3.31232    1.01311     -3.269    0.00108
smokebf yes            1.74417    1.00626      1.733    0.08304

Tab. 1 – Parameter estimates and standard errors.
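The z-values and p-values in Tab. 1 follow from the coefficients and standard errors by the usual Wald arithmetic, and exponentiating a coefficient gives an odds ratio. The short check below uses the smokenow and howfedfr rows; the variable names in the code are only for illustration.

```r
# Values copied from Tab. 1 (classical fit)
coef_smokenow <- -3.31232
se_smokenow   <-  1.01311

z <- coef_smokenow / se_smokenow   # Wald z-value
p <- 2 * pnorm(-abs(z))            # two-sided p-value under N(0, 1)

# Odds ratio for howfedfr (friends breastfed vs. bottle-fed only)
or_howfedfr <- exp(1.49555)

print(round(c(z = z, p = p, odds_ratio = or_howfedfr), 5))
```

The computed z and p agree with the tabulated −3.269 and 0.00108 up to rounding.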

Page 31: Robust Methods in Biostatistics

Robust estimation - R code


> library("robustbase")

# Model fitting: Huber's estimator
> breast.glmrob = glmrob(breast ~ howfedfr + ethnic + educat + age + pregnancy +
    howfed + partner + smokenow + smokebf, family=binomial,
    control=glmrobMqle.control(tcc=1.5), data=breastfeeding)
> summary(breast.glmrob)

# Tuning constant for the Huber function: trade-off between efficiency and robustness.
# Default c = 1.345 (95% efficiency in the normal model).
# GLM: choose a slightly higher value (e.g. c = 1.5).

# Model fitting: Mallows' estimator
> breast.glmrobWx = glmrob(breast ~ howfedfr + ethnic + educat + age + pregnancy +
    howfed + partner + smokenow + smokebf, family=binomial,
    control=glmrobMqle.control(tcc=1.5), weights.on.x="hat", data=breastfeeding)
> summary(breast.glmrobWx)

Page 32: Robust Methods in Biostatistics

Output - R code


Call: glmrob(formula = breast ~ howfedfr + ethnic + educat + age + pregnancy +
    howfed + partner + smokenow + smokebf, family = binomial, data = breast,
    control = glmrobMqle.control(tcc = 1.5))

Coefficients:
                    Estimate Std. Error z-value Pr(>|z|)
(Intercept)         -7.78236    3.36541  -2.312  0.02075 *
howfedfrBreast       1.47905    0.69010   2.143  0.03209 *
ethnicNon-white      2.71169    1.12482   2.411  0.01592 *
educat               0.37661    0.18551   2.030  0.04234 *
age                  0.03034    0.05959   0.509  0.61066
pregnancyBeginning  -0.81551    0.69536  -1.173  0.24088
howfedBreast         0.54496    0.70962   0.768  0.44251
partnerPartner       0.77184    0.81621   0.946  0.34433
smokenowYes         -3.47631    1.12924  -3.078  0.00208 **
smokebfYes           1.50738    1.10305   1.367  0.17176

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Robustness weights w.r * w.x:
 127 weights are ~= 1; the remaining 8 ones:
      3      11      14      53      63      75      90     115
0.74233 0.22474 0.41645 0.80182 0.51590 0.04944 0.20662 0.25762

Number of observations: 135
Fitted by method Mqle (in 12 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available
Algorithmic parameters:
   acc    tcc
0.0001 1.5000
maxit
   50
test.acc
  "coef"

Page 33: Robust Methods in Biostatistics

Robust fit(s)


                       Huber, c = 1.5                   Mallows, c = 1.5
Variable               Coeff.   St. error   z-value     Coeff.   St. error   z-value
Intercept              -7.782   3.365       -2.312      -7.778   3.363       -2.313
pregnancy beginning    -0.816   0.695       -1.173      -0.815   0.694       -1.173
howfed breast           0.545   0.710        0.768       0.540   0.708        0.763
howfedfr breast         1.479   0.690        2.143       1.482   0.689        2.150
partner yes             0.772   0.816        0.946       0.775   0.816        0.950
age                     0.030   0.060        0.509       0.031   0.060        0.513
educat                  0.377   0.186        2.030       0.376   0.185        2.028
ethnic non-white        2.712   1.125        2.411       2.705   1.122        2.411
smokenow yes           -3.476   1.129       -3.078      -3.468   1.127       -3.078
smokebf yes             1.507   1.103        1.367       1.507   1.102        1.368

Tab. 2 – Parameter estimates and standard errors.

Page 34: Robust Methods in Biostatistics


# plot of the weights w.r
> par(mfrow=c(2,1))
> plot(breast.glmrobWx$w.r, ylab="w(r)")
> identify(breast.glmrobWx$w.r)
# left-click on observations to identify them, right-click to end

# plot of the weights w.x
> plot(breast.glmrobWx$w.x, ylab="w(x)")
> identify(breast.glmrobWx$w.x)

# plot of various (robust) Pearson residuals
> age=breastfeed[,8]
> plot(breast.glmrobWx$residuals, ylab="Pearson residuals")
> plot(breast.glmrobWx$linear.predictors, breast.glmrobWx$residuals,
    xlab="Predictors", ylab="Pearson residuals")

Page 35: Robust Methods in Biostatistics

Robustness weights


[Figure: robustness weights plotted against observation index (0 to 140). Top panel: w(x), with observation 18 labelled. Bottom panel: w(r), with observations 3, 11, 14, 53, 63, 75, 90 and 115 labelled.]

Page 36: Robust Methods in Biostatistics

Explanation


Observations 75, 11, 115 and 14 had a large probability of breastfeeding (p ≥ 0.90) but bottle-fed.

Observations 63 and 90 had a low probability of breastfeeding (p ≤ 0.11) but breastfed.

These observations have large negative (resp. positive) Pearson residuals in the robust analysis.
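The link between a surprising outcome, its Pearson residual and its robustness weight can be worked through by hand. The sketch below uses the boundary probabilities quoted above (0.90 and 0.11) with Huber's $c = 1.5$. The function names are hypothetical and the numbers illustrate the mechanism only; they are not the fitted values for any particular observation.

```r
# Pearson residual for a Bernoulli response with fitted probability mu
pearson_resid <- function(y, mu) (y - mu) / sqrt(mu * (1 - mu))

# Huber weight on the residual: 1 inside [-c, c], c / |r| outside
huber_weight <- function(r, c = 1.5) pmin(1, c / abs(r))

r_bottle <- pearson_resid(y = 0, mu = 0.90)  # expected to breastfeed, but bottle-fed
r_breast <- pearson_resid(y = 1, mu = 0.11)  # expected to bottle-feed, but breastfed

w_bottle <- huber_weight(r_bottle)
w_breast <- huber_weight(r_breast)

print(round(c(r_bottle = r_bottle, w_bottle = w_bottle,
              r_breast = r_breast, w_breast = w_breast), 3))
```

A fitted probability of 0.90 combined with bottle-feeding gives a residual of −3 and hence a weight of 0.5; more extreme fitted probabilities give larger residuals and smaller weights.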

Page 37: Robust Methods in Biostatistics

Residuals


[Figure: Pearson residuals plotted against observation index. Two panels: "Robust analysis" and "Classical analysis".]

Page 38: Robust Methods in Biostatistics

Manual elimination - R code


Example: omit the variable pregnancy from the full robust fit

> breast.glmrobWx = glmrob(breast ~ pregnancy + howfedfr + ethnic + educat + age +
    howfed + partner + smokenow + smokebf, family=binomial, weights.on.x="hat",
    control=glmrobMqle.control(tcc=1.5), data=breast)
> step1pregnancy = glmrob(breast ~ howfedfr + ethnic + educat + age +
    howfed + partner + smokenow + smokebf, family=binomial, weights.on.x="hat",
    control=glmrobMqle.control(tcc=1.5), data=breast)

> anova(breast.glmrobWx, step1pregnancy, test="QD")

Robust Quasi-Deviance Table

Model 1: breast ~ howfedfr + ethnic + educat + age + grp + howfed +
    partner + smokenow + smokebf
Model 2: breast ~ howfedfr + ethnic + educat + age + howfed +
    partner + smokenow + smokebf
  pseudoDf Test.Stat Df Pr(>chisq)
1      125
2      126   -1.5993 -1      0.206

Page 39: Robust Methods in Biostatistics

Extension to GEE


Page 40: Robust Methods in Biostatistics

GEE setting


Outcome $Y_{it}$ for subject $i$ at time $t$ ($i = 1, \dots, K$ and $t = 1, \dots, n_i$). Subjects are independent.

Set of covariates $x_{it}$.

Model $E(Y_{it}) = \mu_{it}$ via $g(\mu_{it}) = x_{it}^T\beta$ for a known link function $g$.

Assume $\mathrm{Var}(Y_i) = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{diag}(v(\mu_{i1}), \dots, v(\mu_{in_i}))$, $v(\mu_{it}) = \mathrm{Var}(Y_{it})$ (for $\phi = 1$), and $R_i(\alpha)$ is a working correlation matrix.

If there is one observation per cluster → GLM (no correlation).
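The working covariance $\phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$ is easy to build explicitly for one cluster. The sketch below assumes binary outcomes, so $v(\mu) = \mu(1 - \mu)$, and an exchangeable correlation structure. The function name `working_cov`, the means and the value $\alpha = 0.1$ are illustrative assumptions.

```r
# Working covariance for one cluster with binary outcomes:
# V_i = phi * A_i^{1/2} R_i(alpha) A_i^{1/2}, with exchangeable R_i(alpha)
working_cov <- function(mu, alpha, phi = 1) {
  n_i    <- length(mu)
  A_half <- diag(sqrt(mu * (1 - mu)), nrow = n_i)  # A_i^{1/2}, v(mu) = mu (1 - mu)
  R      <- matrix(alpha, n_i, n_i)                # constant correlation off the diagonal
  diag(R) <- 1
  phi * A_half %*% R %*% A_half
}

V  <- working_cov(mu = c(0.2, 0.5, 0.8), alpha = 0.1)  # cluster of size 3
V1 <- working_cov(mu = 0.3, alpha = 0.1)               # cluster of size 1

print(round(V, 4))
print(V1)
```

For a cluster of size one the matrix collapses to the scalar $\phi\, v(\mu)$, which is the GLM case mentioned above.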

Page 41: Robust Methods in Biostatistics

Robust estimation for GEE


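Generalize the GEE equations (Cantoni (2004), CJS):

$$\sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1} \left( W_i \cdot (Y_i - \mu_i) - c_i \right) = 0,$$

where $D_i = \partial\mu_i/\partial\beta$ and $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$. Moreover, $c_i = E(W_i \cdot (Y_i - \mu_i))$. Finally, $\Gamma_i = E\left(\partial\, W_i \cdot (Y_i - \mu_i)/\partial\mu_i - \partial c_i/\partial\mu_i\right)$.

Weights $W_i$:

$$\tilde{w}(r_{it}) = \begin{cases} c/|r_{it}| & \text{if } |r_{it}| > c \\ 1 & \text{otherwise,} \end{cases}$$

and $w(x_{it})$ as for GLMs.

Robust estimation of the nuisance parameters ($\alpha$, $\phi$) as well.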

Page 44: Robust Methods in Biostatistics


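It holds that

$$\sqrt{n}(\hat{\beta} - \beta) \overset{n\to\infty}{\sim} N(0,\; M^{-1} Q M^{-1}),$$

where

$$M = \lim_{K\to\infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1} \Gamma_i D_i,$$

$$Q = \lim_{K\to\infty} \frac{1}{K} \sum_{i=1}^{K} D_i^T \Gamma_i^T V_i^{-1}\, \mathrm{Var}(\psi_i)\, V_i^{-1} \Gamma_i D_i,$$

$$\psi_i = W_i \cdot (Y_i - \mu_i).$$

Again the influence function for a generic observation is proportional to the bounded score.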

Page 45: Robust Methods in Biostatistics

Robust quasi-likelihoods for GEE


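Test statistic

$$\Lambda_{t(s)} = 2\left( \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \hat{\mu}_i) - \sum_{i=1}^{K} Q_{t_i(s)}(y_i, \dot{\mu}_i) \right) \overset{H_0,\, n\to\infty}{\sim} \sum_{i=1}^{q} d_i \chi^2_{1i},$$

where $\hat{\mu}_i$ and $\dot{\mu}_i$ are the estimates under the complete and the restricted model respectively, $d_i$ are the positive eigenvalues of $Q(M^{-1} - \tilde{M}^{+})$, and where

$$Q_{t_i(s)}(y_i, \mu_i) = \frac{1}{\phi} \int_{t_i(s)=y_i}^{t_i(s)=\mu_i} (y_i - t_i)^T W_i(t_i) V^{-1}(t_i) \Gamma_i(t_i)\, dt_i(s) - \frac{1}{\phi} \int_{t_i(s)=y_i}^{t_i(s)=\mu_i} E\left((y_i - t_i)^T W_i(t_i)\right) V^{-1}(t_i) \Gamma_i(t_i)\, dt_i(s),$$

with the integrals possibly path-dependent, but path dependence vanishes asymptotically.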

Page 46: Robust Methods in Biostatistics

Example 2 - GUIDE data


Page 47: Robust Methods in Biostatistics

Classical and robust fit


Estimates on the full logistic model.

              Classical       Robust          Robust
w̃(r_it)      1               Huber           Huber
w(x_it)       1               1               √(1 − h_it)
α̂             0.09            0.11            0.10
Intercept     -3.05 (0.96)    -3.62 (1.30)    -3.63 (1.28)
female        -0.75 (0.60)    -1.45 (0.80)    -1.41 (0.78)
age           -0.68 (0.56)    -1.48 (0.71)    -1.39 (0.69)
dayacc         0.39 (0.09)     0.51 (0.13)     0.52 (0.13)
severe         0.81 (0.36)     0.71 (0.42)     0.69 (0.41)
toilet         0.11 (0.10)     0.36 (0.13)     0.35 (0.13)

Page 48: Robust Methods in Biostatistics

R code


> source("robGEE.r")

> GUIDEclass <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
    cluster=practise, data=GUIDE, tuningc.y=1000, kconst=1000)
> GUIDEclass

> GUIDEHuber <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
    cluster=practise, data=GUIDE, tuningc.y=1.5, kconst=2.4)
> GUIDEHuber

> GUIDEMallows <- robGEE(bothered ~ female + age + dayacc + severe + toilet,
    cluster=practise, data=GUIDE, tuningc.y=1.5, weights.on.x="hat", kconst=2.4)
> GUIDEMallows

# default correlation structure in robGEE = "exchangeable"

Page 49: Robust Methods in Biostatistics

Weights - Huber fit


[Figure: robustness weights w(r), ranging from 0.2 to 1.0, plotted against observation index, 0 to 140.]

Page 50: Robust Methods in Biostatistics

Weights per practice


[Figure: weights w.y, ranging from 0.2 to 1.0, plotted by practice; axis ticks 0 to 200. Labelled points: 8, 19, 22, 42, 44, 87, 88, 135.]

Page 51: Robust Methods in Biostatistics

Explanation


8 of the 137 observations have a weight lower than 0.60.

Patients 8, 42 and 44 claim not to be bothered despite frequent visits to the toilet (10 for the first two, 20 for patient 44) and a high number of leaking accidents.

Cases 19 and 38 are bothered although the severity of their symptoms is fairly low compared to the others.

Some of these patients are also identified by other techniques.
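Flagging the downweighted cases amounts to thresholding the robustness weights. A minimal sketch, assuming the weights are available as a numeric vector; the vector w below is made up for illustration, and how the weights are stored in a robGEE fit is not shown in the slides.

```r
# Illustrative weights; in practice they would be taken from the robust fit
w <- c(1, 1, 0.45, 1, 0.92, 0.58, 1, 0.31)

# Indices of observations with a weight below 0.60
low <- which(w < 0.60)

# Small summary table of the flagged observations
flagged <- data.frame(obs = low, weight = w[low])
flagged
```

The flagged cases can then be inspected one by one, as done above for the patients with unexpected answers.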

Page 52: Robust Methods in Biostatistics

Backward stepwise procedure for (R)GEE


Fit        Variable  Step 1    Step 2    Step 3    Step 4
Classical  female    0.224     0.270     -         -
           age       0.249     -         -         -
           dayacc    3·10^-5   3·10^-5   2·10^-5   4·10^-6
           severe    0.089     0.081     0.061     0.011
           toilet    0.224     0.164     0.165     -

Robust     female    0.070     0.095     -         -
           age       0.045     0.041     0.068     -
           dayacc    6·10^-5   2·10^-5   5·10^-6   8·10^-5
           severe    0.092     -         -         -
           toilet    0.006     0.004     0.004     0.002
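Reading the entries as p-values (an assumption), the table is consistent with dropping the variable with the largest p-value at each step until all remaining p-values are below 0.05. The sketch below applies that rule to an ordinary logistic glm on simulated data. It uses the classical Wald tests reported by glm, not the robust tests from the talk, and the function backward and the variables x1, x2, x3 are illustrative.

```r
set.seed(42)

# Simulated data: only x1 is related to the binary outcome
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1))

# Backward elimination on Wald p-values (illustrative helper)
backward <- function(data, response, vars, alpha = 0.05) {
  repeat {
    f   <- reformulate(vars, response = response)
    fit <- glm(f, family = binomial, data = data)
    p   <- coef(summary(fit))[vars, "Pr(>|z|)"]
    # Stop when every remaining variable is significant or one is left
    if (max(p) <= alpha || length(vars) == 1) break
    vars <- vars[-which.max(p)]   # drop the least significant variable
  }
  list(fit = fit, kept = vars)
}

res <- backward(d, "y", c("x1", "x2", "x3"))
res$kept
```

Swapping the p-value line for a robust test would give the robust version of the same loop; the elimination rule itself does not change.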

Page 53: Robust Methods in Biostatistics

Conclusions


Robust estimation and testing procedures are available for GLM and GEE types of models.

R code is available.

Book website: http://www.unige.ch/ses/metri/cantoni/RobustBiostat/

The approach goes beyond GLMs and GEE (e.g. mixed linear models, Cox regression).

Suggestion: put them in your statistical toolbox!

Page 58: Robust Methods in Biostatistics

References


Cantoni (2004) A Robust Approach to Longitudinal Data Analysis, Canadian Journal of Statistics, 32, 169-180.

Cantoni & Ronchetti (2001) Robust Inference for Generalized Linear Models, Journal of the American Statistical Association, 1022-1030.

Heritier & Ronchetti (1994) Robust bounded-influence tests in general parametric models, Journal of the American Statistical Association, 89(427), 897-904.

Liang & Zeger (1986) Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73, 13-22.

McCullagh & Nelder (1989) Generalized Linear Models, Chapman & Hall.

Moustaki, Victoria-Feser & Hyams (1998) A UK study on the effect of socioeconomic background of pregnant women and hospital practice on the decision to breastfeed and the initiation and duration of breastfeeding, Statistics Research Report LSERR44, London School of Economics, London.

Nelder & Wedderburn (1972) Generalized Linear Models, Journal of the Royal Statistical Society A, 135, 370-384.

Victoria-Feser (2002) Robust inference with binary data, Psychometrika, 67, 21-23.