Partially Missing At Random and Ignorable Inferences for Parameter Subsets with Missing Data
Roderick Little
Rennes 2015


Outline
• Inference with missing data: Rubin's (1976) paper on conditions for ignoring the missing-data mechanism
• Rubin's standard conditions are sufficient but not necessary: example
• Propose definitions of partially MAR and ignorability for likelihood (and Bayes) inference about subsets of parameters (Little and Zangeneh, 2013)
• Application: subsample ignorable methods for regression with missing covariates (Little and Zhang, 2011)
• Joint work with Nanhua Zhang and Sahar Zangeneh


Rubin (1976, Biometrika)
• Landmark paper (5000+ citations, after being rejected by many journals!)
  – I wrote my first referee's report (11 pages!), and an obscure discussion on ancillarity
• Modeled the missing-data mechanism by treating missingness indicators as random variables and assigning them a distribution
• Sufficient conditions under which the missing-data mechanism can be ignored for likelihood and frequentist inference about parameters
  – Focus here on likelihood and Bayes


Ignoring the mechanism

Notation: $D = (D_{obs}, D_{mis})$ = data with no missing values, split into observed and missing parts; $R$ = response indicator matrix.

Model: $f(D, R \mid \theta, \psi) = f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)$

• Full likelihood:
$L(\theta, \psi \mid D_{obs}, R) = \text{const.} \int f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)\, dD_{mis}$

• Likelihood ignoring mechanism:
$L_{ign}(\theta \mid D_{obs}, R) = \text{const.} \int f_D(D \mid \theta)\, dD_{mis}$

• Missing data mechanism can be ignored for likelihood inference when
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$
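The factorization can be checked numerically in a toy case. The sketch below is a minimal illustration, and its data, probabilities and function names are hypothetical, not taken from the slides. It uses binary $Y_1$, which is always observed, and binary $Y_2$, which is sometimes missing. The data parameter is $\theta$ = (P(Y1 = 1), P(Y2 = 1 | Y1 = 0), P(Y2 = 1 | Y1 = 1)), and the mechanism (i.e. $\psi$) is held fixed.

For each of two values of $\theta$, the sketch computes $\log L - \log L_{ign}$. Suppose missingness depends only on $Y_1$ (MAR). Then this gap does not change with $\theta$, so it is a function of $\psi$ alone. Suppose instead missingness depends on $Y_2$ (MNAR). Then the gap moves with $\theta$.

```python
import math

# Hypothetical binary example (not from the slides): Y1, Y2 in {0, 1}.
# theta = (p1, q0, q1): P(Y1 = 1) = p1, P(Y2 = 1 | Y1 = y1) = q_{y1}.
# Records are (y1, y2); y2 is None when missing (response indicator r = 0).

def f_data(y1, y2, theta):
    """Complete-data probability f_D(y1, y2 | theta)."""
    p1, q0, q1 = theta
    q = q1 if y1 == 1 else q0
    p_y1 = p1 if y1 == 1 else 1.0 - p1
    p_y2 = q if y2 == 1 else 1.0 - q
    return p_y1 * p_y2

def loglik_full(records, theta, mech):
    """log L(theta, psi | D_obs, R); mech(r, y1, y2) = P(R = r | y1, y2), psi held fixed."""
    total = 0.0
    for y1, y2 in records:
        if y2 is not None:
            total += math.log(f_data(y1, y2, theta) * mech(1, y1, y2))
        else:  # sum over the missing y2: the discrete analogue of integrating out D_mis
            total += math.log(sum(f_data(y1, v, theta) * mech(0, y1, v) for v in (0, 1)))
    return total

def loglik_ign(records, theta):
    """log L_ign(theta | D_obs): the same computation with the mechanism factor dropped."""
    total = 0.0
    for y1, y2 in records:
        if y2 is not None:
            total += math.log(f_data(y1, y2, theta))
        else:
            total += math.log(sum(f_data(y1, v, theta) for v in (0, 1)))
    return total

def mar(r, y1, y2):
    p_obs = 0.8 if y1 == 1 else 0.4  # depends only on the always-observed y1
    return p_obs if r == 1 else 1.0 - p_obs

def mnar(r, y1, y2):
    p_obs = 0.8 if y2 == 1 else 0.4  # depends on y2, which may be missing
    return p_obs if r == 1 else 1.0 - p_obs

records = [(1, 1), (1, 0), (0, None), (1, None), (0, 0), (0, 1), (1, None)]
theta_a, theta_b = (0.5, 0.3, 0.6), (0.2, 0.7, 0.1)

def gap(mech, theta):
    """log L - log L_ign for a given mechanism and theta."""
    return loglik_full(records, theta, mech) - loglik_ign(records, theta)

print(gap(mar, theta_a) - gap(mar, theta_b))    # essentially zero: gap is free of theta
print(gap(mnar, theta_a) - gap(mnar, theta_b))  # clearly non-zero: mechanism not ignorable
```

Distinctness of $\theta$ and $\psi$ is not tested here. The sketch checks only the factorization.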


Rubin’s sufficient conditions for ignoring the mechanism

• Missing data mechanism can be ignored for likelihood inference when
  – (a) the missing data are missing at random (MAR):
  $f_{R|D}(R \mid D_{obs}, D_{mis}, \psi) = f_{R|D}(R \mid D_{obs}, D^*_{mis}, \psi)$ for all $D_{mis}, D^*_{mis}$
  – (b) the parameters of the data model and the missing-data mechanism are distinct:
  $(\theta, \psi) \in \Omega_\theta \times \Omega_\psi$; for Bayes, $\theta$ and $\psi$ are a-priori independent
• MAR is the key condition: without (b), inferences are valid but not fully efficient


More on Rubin (1976)
• Seaman et al. (2013) propose a more complex but precise notation
• Distinguish between “direct” likelihood inference and “frequentist” likelihood inference
  – “Realized MAR” is sufficient for direct likelihood inference: R depends only on the realized observed data
  – “Everywhere MAR” is sufficient for frequentist likelihood inference: the MAR condition needs to hold for observed values in future repeated sampling
  – Rubin (1976) uses the term “always MAR”. See also Mealli and Rubin (2015, forthcoming)


“Sufficient for ignorable” is not the same as “ignorable”

• These definitions have come to define ignorability (e.g. Little and Rubin 2002)

• However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data".

• These conditions are not necessary for ignoring the mechanism in all situations.

$\text{MAR} + \text{distinctness} \Rightarrow \text{ignorable}$

$\text{ignorable} \not\Rightarrow \text{MAR} + \text{distinctness}$


Example 1: Nonresponse with auxiliary data

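$D_{obs} = (D_{resp}, D_{aux})$, where $D_{resp} = \{(y_{1i}, y_{2i}),\ i = 1, \dots, m\}$ and $D_{aux} = \{y^*_{1j},\ j = 1, \dots, n\}$

[Figure: data pattern for $Y_1$, $Y_2$ and response indicator $R$; auxiliary $Y_1$ values not linked to the respondents]

$(y_{1i}, y_{2i}) \sim_{ind} f(y_{1i}, y_{2i} \mid \theta)$

$\Pr(r_i = 1 \mid y_{1i}, y_{2i}, \psi) = g(y_{1i}, \psi)$

Not MAR: $y_{1i}$ is missing for nonrespondents.

$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are. ($D_{aux}$ may also cover the whole population of $N$ units.)

But... the mechanism is ignorable and does not need to be modeled:
• Marginal distribution of $Y_1$ estimated from $D_{aux}$
• Conditional distribution of $Y_2$ given $Y_1$ estimated from $D_{resp}$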
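A small simulation illustrates why the two pieces can be combined without modeling the mechanism. The sketch rests on hypothetical assumptions that are not from the slides:

- $Y_1 \sim N(0, 1)$ and $Y_2 = 1 + 2Y_1 + \varepsilon$, so that $E(Y_2) = 1$.
- The response probability is logistic in $Y_1$.
- The sample size and seed are arbitrary.

The sketch estimates $E(Y_2)$ by plugging the mean of the unlinked auxiliary $Y_1$ values into the least-squares regression of $Y_2$ on $Y_1$ fitted to respondents. It then compares this with the plain respondent mean of $Y_2$.

```python
import math
import random

rng = random.Random(2015)
n = 100_000

# Hypothetical population: Y1 ~ N(0, 1), Y2 = 1 + 2*Y1 + N(0, 1), so E(Y2) = 1.
y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y2 = [1.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in y1]

# Response depends on Y1 only: Pr(r_i = 1 | y1, y2) = g(y1) = logistic(y1).
resp = [rng.random() < 1.0 / (1.0 + math.exp(-a)) for a in y1]
d_resp = [(a, b) for a, b, r in zip(y1, y2, resp) if r]

# D_aux: all Y1 values, shuffled so they cannot be linked to respondents.
d_aux = y1[:]
rng.shuffle(d_aux)

# Marginal of Y1 from D_aux.
mean_y1 = sum(d_aux) / len(d_aux)

# Conditional of Y2 given Y1 from D_resp (least-squares line).
m = len(d_resp)
xbar = sum(a for a, _ in d_resp) / m
ybar = sum(b for _, b in d_resp) / m
sxy = sum((a - xbar) * (b - ybar) for a, b in d_resp)
sxx = sum((a - xbar) ** 2 for a, _ in d_resp)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

est_mean_y2 = b0 + b1 * mean_y1  # combines the two pieces
naive_mean_y2 = ybar             # respondent mean, which ignores selection on Y1
print(round(est_mean_y2, 3), round(naive_mean_y2, 3))
```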


MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of the complete set of parameters in the data model for D
• It would be useful to have a definition of MAR that applies to subsets of parameters of substantive interest
• Example: inference about regression parameters might be “partially MAR” even when inference about the parameters of a model for the full data is not


MAR, ignorability for parameter subsets

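$\theta = (\theta_1, \theta_2)$; $\psi$ = parameters of model for mechanism

Define the data to be partially MAR for likelihood inference about $\theta_1$, denoted P-MAR($\theta_1$), if:

$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = L_{ign}(\theta_1 \mid D_{obs}) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R)$ for all $\theta_1, \theta_2, \psi$

Data are IGN($\theta_1$) if P-MAR($\theta_1$) and $\theta_1$ and $(\theta_2, \psi)$ are distinct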


MAR, ignorability for parameter subsets

Special case where $\theta_1 = \theta$, all the parameters. Data are P-MAR($\theta$) if:

$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}) \times L_{rest}(\psi \mid D_{obs}, R)$ for all $\theta, \psi$

• Implied by Rubin's MAR condition, but does not imply it
• IGN($\theta$) if P-MAR($\theta$) and $\theta$ and $\psi$ are distinct


Partial MAR given a function of mechanism

Harel and Schafer (2009) define a different kind of partial MAR. The mechanism is partially MAR given $g(R)$ if:

$P(R \mid Y_{obs}, Y_{mis}, g(R), \theta, \psi) = P(R \mid Y_{obs}, g(R), \theta, \psi)$ for all $R, Y_{mis}, \theta, \psi$

Here “partial” relates to the mechanism; in our definition, “partial” relates to the parameters.

These ideas seem quite distinct.


Example 1: Auxiliary Survey Data

$D = \{(y_{1i}, y_{2i}),\ i = 1, \dots, n\}$; $D_{obs} = (D_{resp}, D_{aux})$, where $D_{resp} = \{(y_{1i}, y_{2i}),\ i = 1, \dots, m\}$ and $D_{aux} = \{y^*_{1j},\ j = 1, \dots, n\}$

[Figure: data pattern for $Y_1$, $Y_2$ and $R$; auxiliary $Y_1$ values not linked]

$(y_{1i}, y_{2i}) \sim_{ind} f(y_{1i}, y_{2i} \mid \theta)$, $\Pr(r_i = 1 \mid y_{1i}, y_{2i}, \psi) = g(y_{1i}, \psi)$

$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are.

Easy to show that the data are P-MAR($\theta$), and IGN($\theta$) if $\theta$, $\psi$ are distinct.


Ex. 2: MNAR Monotone Bivariate Data

$D = \{(y_{1i}, y_{2i}),\ i = 1, \dots, n\}$

$D_{obs} = \{(y_{1i}, y_{2i}),\ i = 1, \dots, m;\ y_{1i},\ i = m+1, \dots, n\}$

$(y_{1i}, y_{2i}) \sim_{ind} f(y_{1i}, y_{2i} \mid \theta) = f(y_{1i} \mid \theta_1)\, f(y_{2i} \mid y_{1i}, \theta_2)$

$\Pr(r_i = 1 \mid y_{1i}, y_{2i}, \psi) = g(y_{1i}, y_{2i}, \psi)$ (MNAR)

[Figure: monotone data pattern for $Y_1$, $Y_2$ and missingness indicator $M$]

COMMENT: Clearly, inference about the parameters $\theta_1$ of the marginal distribution of $Y_1$ can ignore the mechanism, since $Y_1$ has no missing values.

In the proposed definition, this mechanism is P-MAR($\theta_1$), and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi)$ are distinct.

[Note: distinctness is required for IGN($\theta_1$)]
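The P-MAR($\theta_1$) claim can be checked in a binary version of this example. The sketch is hedged: the parameterization and data are illustrative assumptions. $Y_1$ is always observed. $Y_2$ is missing with a probability that depends on $Y_2$ itself, which makes the mechanism MNAR.

Hold $\theta_2$ and $\psi$ fixed. The remainder $\log L - \log L_{ign}(\theta_1)$, with $L_{ign}(\theta_1) = \prod_i f(y_{1i} \mid \theta_1)$, should then be unchanged as $\theta_1$ varies.

```python
import math

# Hypothetical binary version of Ex. 2 (illustrative only):
# theta1 = P(Y1 = 1); theta2[y1] = P(Y2 = 1 | Y1 = y1);
# psi[y2] = P(Y2 observed | Y2 = y2), an MNAR mechanism.
# Records are (y1, y2) with y2 = None when missing.

def f_y1(y1, theta1):
    return theta1 if y1 == 1 else 1.0 - theta1

def f_y2(y2, y1, theta2):
    q = theta2[y1]
    return q if y2 == 1 else 1.0 - q

def loglik_full(records, theta1, theta2, psi):
    """log L(theta1, theta2, psi | D_obs, R)."""
    total = 0.0
    for y1, y2 in records:
        if y2 is not None:
            total += math.log(f_y1(y1, theta1) * f_y2(y2, y1, theta2) * psi[y2])
        else:  # sum over the missing y2, weighted by P(missing | y2)
            inner = sum(f_y2(v, y1, theta2) * (1.0 - psi[v]) for v in (0, 1))
            total += math.log(f_y1(y1, theta1) * inner)
    return total

def loglik_ign_theta1(records, theta1):
    """log L_ign(theta1 | D_obs): likelihood of the fully observed Y1 values."""
    return sum(math.log(f_y1(y1, theta1)) for y1, _ in records)

records = [(1, 1), (0, None), (1, None), (0, 0), (1, 0), (0, None)]
theta2 = {0: 0.3, 1: 0.6}
psi = {0: 0.4, 1: 0.9}

def log_l_rest(theta1):
    """log L - log L_ign(theta1), with theta2 and psi held fixed."""
    return loglik_full(records, theta1, theta2, psi) - loglik_ign_theta1(records, theta1)

print(log_l_rest(0.3) - log_l_rest(0.7))  # essentially zero: the remainder is free of theta1
```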


More generally…

$(Y_1, R^{(1)})$, $(Y_2, R^{(2)})$: blocks of incomplete variables with their response indicators, and

$f(y_{1i}, y_{2i}, r_i^{(1)}, r_i^{(2)}) = f(y_{1i} \mid \theta_1) \Pr(r_i^{(1)} \mid y_{1i}, \psi_1) \times f(y_{2i} \mid y_{1i}, \theta_2) \Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$

Assume: $\Pr(r_i^{(1)} \mid y_{1i}; \psi_1) = g_1(y_{1,obs,i}, \psi_1)$ for all $y_{1,mis,i}, \psi_1$

$\Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}; \psi_2) = g_2(r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$

Mechanism is P-MAR($\theta_1$), and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi_1, \psi_2)$ are distinct


Application: missing data in covariates

[Figure: data pattern for Z, X, Y; X missing for some cases]

Target: regression of Y on X, Z; missing data on X

IL methods include information for the regression in the incomplete cases (particularly the intercept and coefficients of Z) and are valid assuming MAR:

$\Pr(X \text{ missing}) = g(Z, Y)$

BUT: if $\Pr(X \text{ missing}) = g(Z, X)$, CC analysis is consistent, but IL methods (or weighted CC) are inconsistent, since the mechanism is not MAR

Simulations favoring IL often generate data under MAR, and hence are biased against CC
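The CC half of this argument can be checked with a quick simulation. IL is not implemented here. The sketch assumes, hypothetically, $Y = 1 + Z + X + \varepsilon$ with independent standard normal $Z$, $X$ and $\varepsilon$. It fits least squares to the complete cases under two mechanisms:

- $\Pr(X \text{ missing}) = \text{logistic}(Z + 2X)$, a covariate-dependent $g(Z, X)$ mechanism.
- X missing whenever $Y \geq 1$, a deterministic threshold and so an extreme case of $g(Z, Y)$.

The first mechanism should leave the CC coefficient of X near its true value of 1. The second should attenuate it.

```python
import math
import random

def ols(rows):
    """Least squares for y = b0 + b1*z + b2*x, solving the 3x3 normal equations."""
    a = [[0.0] * 3 for _ in range(3)]
    c = [0.0] * 3
    for z, x, y in rows:
        v = (1.0, z, x)
        for i in range(3):
            c[i] += v[i] * y
            for j in range(3):
                a[i][j] += v[i] * v[j]
    # Gaussian elimination with partial pivoting, then back-substitution
    for k in range(3):
        p = max(range(k, 3), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        c[k], c[p] = c[p], c[k]
        for r in range(k + 1, 3):
            m = a[r][k] / a[k][k]
            for j in range(k, 3):
                a[r][j] -= m * a[k][j]
            c[r] -= m * c[k]
    b = [0.0] * 3
    for k in (2, 1, 0):
        b[k] = (c[k] - sum(a[k][j] * b[j] for j in range(k + 1, 3))) / a[k][k]
    return b

rng = random.Random(7)
data = []
for _ in range(20_000):
    z, x = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append((z, x, 1.0 + z + x + rng.gauss(0.0, 1.0)))

# Covariate-dependent (not MAR): keep a case when X is observed, Pr(X missing) = logistic(z + 2x).
cc_cov = [(z, x, y) for z, x, y in data if rng.random() >= 1.0 / (1.0 + math.exp(-(z + 2.0 * x)))]
# MAR, outcome-dependent: X missing whenever y >= 1.
cc_mar = [(z, x, y) for z, x, y in data if y < 1.0]

b_cov = ols(cc_cov)
b_mar = ols(cc_mar)
print([round(v, 3) for v in b_cov])  # slopes close to the true value 1
print([round(v, 3) for v in b_mar])  # slopes attenuated toward 0
```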


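More general missing data in X

Pattern | Observation, i | $z_i$ | $x_i$ | $y_i$ | $R_{x_i}$
P1 | i = 1,…,m | √ | √ | √ | $u_x = (1, \dots, 1)$
P2 | i = m+1,…,n | √ | ? | √ | $\neq u_x$

Key: √ denotes observed, ? denotes observed or missing; $z_i$, $x_i$, $y_i$ could be vectors, and $R_{x_i}$ is the vector of response indicators for $x_i$.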


Ignorable Likelihood methods

All these methods assume MAR:

$p(R_{x_i} \mid z_i, x_i, y_i, \psi) = p(R_{x_i} \mid z_i, x_{obs,i}, y_i, \psi)$ for all $x_{mis,i}$

Ignorable likelihood inferences (IL) for the model $p(x_i, y_i \mid z_i, \theta)$ are based on

$L_{ign}(\theta) = \text{const.} \times \prod_{i=1}^n p(x_{obs,i}, y_i \mid z_i, \theta)$

$\theta_1$ = parameters of regression of $Y$ on $X$, $Z$
• ML: $\hat\theta_1 = \theta_1(\hat\theta)$
• Bayes: draw $\theta_1^{(d)} = \theta_1(\theta^{(d)})$, $\theta^{(d)} \sim P(\theta \mid \text{data})$
• Multiple imputation: draw $X_{mis}^{(d)} \sim P(X_{mis} \mid \text{data})$, and apply multiple imputation combining rules to estimates of $\theta_1$
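The multiple-imputation route ends with Rubin's combining rules. The sketch below applies them to a scalar component of $\theta_1$, and it assumes the $D$ completed-data estimates and their variances are already available. The function name is illustrative.

The rules work as follows:

- The point estimate is the average of the $D$ estimates.
- The total variance adds the average within-imputation variance to $(1 + 1/D)$ times the between-imputation variance.
- The degrees of freedom follow Rubin's usual formula.

```python
import math
from statistics import mean, variance

def combine_mi(estimates, variances):
    """Rubin's rules for a scalar: returns (point estimate, total variance, degrees of freedom)."""
    d = len(estimates)
    if d < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)       # MI point estimate
    w = mean(variances)           # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor d - 1)
    t = w + (1.0 + 1.0 / d) * b   # total variance
    # degrees of freedom for the t reference distribution
    df = math.inf if b == 0 else (d - 1) * (1.0 + w / ((1.0 + 1.0 / d) * b)) ** 2
    return q_bar, t, df

# Usage with three hypothetical completed-data analyses
q, t, df = combine_mi([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(q, t, df)
```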


CC analysis

$\phi = (\theta, \psi)$, $L_{full}(\phi) = \prod_{i=1}^n \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$

$= \prod_{i=1}^m p(R_{x_i} = u_x, x_i, y_i \mid z_i, \phi) \times \prod_{i=m+1}^n \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$

$= \prod_{i=1}^m p(y_i \mid z_i, x_i, R_{x_i} = u_x, \phi)\, p(R_{x_i} = u_x, x_i \mid z_i, \phi) \times \prod_{i=m+1}^n \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$

Assume: missingness of $X$ depends on covariates, not outcomes:

$p(R_{x_i} = u_x \mid z_i, x_i, y_i, \phi) = p(R_{x_i} = u_x \mid z_i, x_i, \phi)$ for all $y_i$ (*)

MNAR mechanism: missingness can depend on missing values of $X$

Hence data are P-MAR($\theta_1$), but not IGN($\theta_1$), since $\phi$ includes $\theta_1$: loss of information except in special cases


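Follows from (*):

$L_{full}(\phi) = \underbrace{\prod_{i=1}^m p(y_i \mid z_i, x_i, \theta_1)}_{L_{cc}(\theta_1)} \times \underbrace{\prod_{i=1}^m p(R_{x_i} = u_x, x_i \mid z_i, \phi) \prod_{i=m+1}^n \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}}_{L_{rest}(\phi)}$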


Extension: missing data on X and Y

Pattern | Observation, i | $z_i$ | $x_i$ | $y_i$ | $R_{x_i}$
P1: covariates complete | i = 1,…,m | √ | √ | ? | $u_x = (1, \dots, 1)$
P2: x incomplete | i = m+1,…,n | √ | ? | ? | $\neq u_x$

Key: √ denotes observed, ? denotes observed or missing; $z_i$, $x_i$, $y_i$ could be vectors.


SSIL analysis, X and Y missing

Assume:

XCOV: missingness of $X$ depends on covariates, not outcomes:

$p(R_{x_i} = u_x \mid z_i, x_i, y_i, \phi) = p(R_{x_i} = u_x \mid z_i, x_i, \phi)$ for all $y_i$ (XCOV)

YMAR: $Y$ is MAR in the subsample with $X$ observed:

$p(R_{y_i} \mid z_i, x_i, y_i, R_{x_i} = u_x, \phi) = p(R_{y_i} \mid z_i, x_i, y_{obs,i}, R_{x_i} = u_x, \phi)$ for all $y_{mis,i}$ (YMAR)


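SSIL likelihood, X and Y missing

Under these conditions, the data are P-MAR($\theta_1$), and the corresponding analysis is to apply an IL method to the subsample of cases with $X$ fully observed. We call this subsample ignorable likelihood (SSIL).

$L_{full}(\phi) = \prod_{i=1}^n \iint p(R_{x_i}, R_{y_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}\, dy_{mis,i}$

$= \prod_{i=1}^m \int p(x_i, R_{x_i} = u_x, y_i, R_{y_i} \mid z_i, \phi)\, dy_{mis,i} \times L_{rest}(\phi)$

$= \prod_{i=1}^m \int p(y_i \mid z_i, x_i, R_{x_i} = u_x, \phi)\, p(R_{y_i} \mid y_i, z_i, x_i, R_{x_i} = u_x, \phi)\, dy_{mis,i} \times \prod_{i=1}^m p(x_i, R_{x_i} = u_x \mid z_i, \phi) \times L_{rest}(\phi)$

$= \prod_{i=1}^m \int p(y_i \mid z_i, x_i, \theta_1)\, dy_{mis,i}\; p(R_{y_i} \mid z_i, x_i, y_{obs,i}, R_{x_i} = u_x, \phi) \times L^*_{rest}(\phi)$ (using XCOV for the first factor and YMAR for the second)

$= \prod_{i=1}^m p(y_{obs,i} \mid z_i, x_i, \theta_1) \times L^{**}_{rest}(\phi)$

Hence the data are P-MAR($\theta_1$), and SSIL is valid for $\theta_1$.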



Two covariates X, W with different mechanisms

Pattern | Observation, i | $z_i$ | $x_i$ | $w_i$ | $y_i$
P1: covariates complete | i = 1,…,m | √ | √ | √ | ?
P2: x obs, w, y may be mis | i = m+1,…,m+r | √ | √ | ? | ?
P3: x mis, w, y may be mis | i = m+r+1,…,n | √ | ? | ? | ?

SSIL: analyze cases in patterns 1 and 2
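In code, selecting the SSIL subsample is just a filter on the response pattern of X. The sketch below is minimal. It assumes scalar covariates stored as None when missing, and the record type and function names are hypothetical. With vector-valued x or w, "observed" would mean that every component is observed.

The sketch does not show the IL fit on the subsample, such as ML that handles incomplete W and Y.

```python
from typing import List, NamedTuple, Optional

class Case(NamedTuple):
    """One observation; None marks a missing value (scalars here for simplicity)."""
    z: float
    x: Optional[float]
    w: Optional[float]
    y: Optional[float]

def pattern(c: Case) -> str:
    """P1: x and w observed; P2: x observed, w missing; P3: x missing."""
    if c.x is None:
        return "P3"
    return "P1" if c.w is not None else "P2"

def ssil_subsample(cases: List[Case]) -> List[Case]:
    """Cases in patterns 1 and 2 (X observed); W and Y may still be incomplete."""
    return [c for c in cases if pattern(c) != "P3"]

cases = [Case(0.1, 1.2, 0.5, 2.0), Case(-0.3, 0.4, None, None), Case(0.7, None, 1.1, 1.5)]
print([pattern(c) for c in cases])  # ['P1', 'P2', 'P3']
print(len(ssil_subsample(cases)))   # 2
```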


Subsample Ignorable Likelihood (SSIL)
• Target: regression of Y on Z, X, and W
• Assume:
  (XCOV) Completeness of $X$ can depend on covariates but not $Y$:
  $p(R_{x_i} = u_x \mid z_i, x_i, w_i, y_i, \phi) = p(R_{x_i} = u_x \mid z_i, x_i, w_i, \phi)$ for all $y_i$
  (WYMAR) Missingness of $(W, Y)$ is MAR within the subsample of cases with $X$ observed:
  $p(R_{(w,y)_i} \mid z_i, x_i, w_i, y_i, R_{x_i} = u_x; \phi) = p(R_{(w,y)_i} \mid z_i, x_i, w_{obs,i}, y_{obs,i}, R_{x_i} = u_x; \phi)$ for all $w_{mis,i}, y_{mis,i}$
• By a similar proof to the previous case, the data are P-MAR($\theta_1$); SSIL applies an IL method (e.g. ML) to the subsample of cases for which X is observed, but W or Y may be incomplete

Simulation Study
• For each of 1000 replications, 5000 observations of Z, W, X and Y generated as:
  $(z_i, w_i, x_i) \sim_{ind} N\left(0, \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}\right)$
  $y_i \mid z_i, w_i, x_i \sim_{ind} N(1 + z_i + w_i + x_i, 1)$
• Missing values of W and X (20–35%) generated by four mechanisms
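One replicate of this design can be generated with a shared-factor construction. The sketch assumes that $\Sigma$ has unit variances and a common correlation $\rho \ge 0$ between $Z$, $W$ and $X$. The shared factor delivers exactly this structure.

Two choices are made here and are not details from the slides: the shared-factor method itself, and Python's `random` module.

```python
import math
import random

def simulate(n, rho, rng):
    """One replicate: (z, w, x) ~ N(0, Sigma) with unit variances and common correlation
    rho in [0, 1); y | z, w, x ~ N(1 + z + w + x, 1)."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)  # shared factor: Var = a^2 + b^2 = 1, Corr = a^2 = rho
        z, w, x = (a * c + b * rng.gauss(0.0, 1.0) for _ in range(3))
        y = 1.0 + z + w + x + rng.gauss(0.0, 1.0)
        rows.append((z, w, x, y))
    return rows

def sample_corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

rows = simulate(5000, 0.8, random.Random(1))
print(round(sample_corr([r[0] for r in rows], [r[1] for r in rows]), 2))  # near 0.8
```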


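Simulation: missing data mechanisms

$\text{logit}\, P(R_{w_i} = 0 \mid z_i, w_i, x_i, y_i) = \alpha_0^{(w)} + \alpha_z^{(w)} z_i + \alpha_w^{(w)} w_i + \alpha_x^{(w)} x_i + \alpha_y^{(w)} y_i$

$\text{logit}\, P(R_{x_i} = 0 \mid R_{w_i} = 1, z_i, w_i, x_i, y_i) = \alpha_0^{(x)} + \alpha_z^{(x)} z_i + \alpha_w^{(x)} w_i + \alpha_x^{(x)} x_i + \alpha_y^{(x)} y_i$

Mechanism | $\alpha_0^{(w)}$ | $\alpha_z^{(w)}$ | $\alpha_w^{(w)}$ | $\alpha_x^{(w)}$ | $\alpha_y^{(w)}$ | $\alpha_0^{(x)}$ | $\alpha_z^{(x)}$ | $\alpha_w^{(x)}$ | $\alpha_x^{(x)}$ | $\alpha_y^{(x)}$
I: All valid | -1 | 1 | 0 | 0 | 0 | -1 | 1 | 0 | 0 | 0
II: CC valid | -1 | 1 | 1 | 1 | 0 | -1 | 1 | 1 | 1 | 0
III: IML valid | -2 | 1 | 0 | 0 | 1 | -2 | 1 | 1 | 0 | 1
IV: SSIML valid | -1 | 1 | 1 | 1 | 0 | -2 | 1 | 1 | 0 | 1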
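The table can be turned into a missingness generator. The sketch is minimal. It uses logistic models for the probability that a value is missing ($R = 0$), with the coefficients from the table above.

Two simplifications are assumptions of this sketch, not statements from the slides:

- The covariates are generated with $\rho = 0$.
- The $X$ model is applied to every case, even though it is stated conditional on $R_w = 1$.

Printing the missingness rates gives a rough check against the 20–35% range quoted for the study.

```python
import math
import random

# (intercept, z, w, x, y) coefficients of logit P(missing), transcribed from the mechanism table.
MECHANISMS = {
    "I":   ((-1, 1, 0, 0, 0), (-1, 1, 0, 0, 0)),
    "II":  ((-1, 1, 1, 1, 0), (-1, 1, 1, 1, 0)),
    "III": ((-2, 1, 0, 0, 1), (-2, 1, 1, 0, 1)),
    "IV":  ((-1, 1, 1, 1, 0), (-2, 1, 1, 0, 1)),
}

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def apply_mechanism(rows, name, rng):
    """Return rows (z, w, x, y) with w and/or x replaced by None when made missing."""
    coef_w, coef_x = MECHANISMS[name]
    out = []
    for z, w, x, y in rows:
        v = (1.0, z, w, x, y)
        w_missing = rng.random() < expit(sum(c * t for c, t in zip(coef_w, v)))
        # Assumption: the X model is applied to all cases, not only those with R_w = 1.
        x_missing = rng.random() < expit(sum(c * t for c, t in zip(coef_x, v)))
        out.append((z, None if w_missing else w, None if x_missing else x, y))
    return out

rng = random.Random(3)
rows = []
for _ in range(20_000):  # rho = 0 covariates; y = 1 + z + w + x + N(0, 1)
    z, w, x = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((z, w, x, 1.0 + z + w + x + rng.gauss(0, 1)))

obs = apply_mechanism(rows, "IV", rng)
rate_w = sum(r[1] is None for r in obs) / len(obs)
w_seen = [r for r in obs if r[1] is not None]
rate_x_given_w = sum(r[2] is None for r in w_seen) / len(w_seen)
print(round(rate_w, 3), round(rate_x_given_w, 3))
```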


RMSEs × 1000 of Estimated Regression Coefficients for Before Deletion (BD), Complete Cases (CC), Ignorable Maximum Likelihood (IML) and Subsample Ignorable Maximum Likelihood (SSIML), under Four Missing Data Mechanisms

Method | I* | II | III | IV | I | II | III | IV
BD | 27 | 28 | 28 | 27 | 50 | 46 | 50 | 46
CC | 45 | 44 | 553 | 322 | 86 | 71 | 426 | 246
IML | 37 | 231 | 36 | 116 | 58 | 96 | 53 | 90
SSIML | 42 | 133 | 360 | 49 | 70 | 80 | 319 | 69
Valid | ALL | CC | IML | SSIML | ALL | CC | IML | SSIML

First four mechanism columns: ρ = 0; last four: ρ = 0.8.


Missing Covariates in Survival Analysis

$\{t_1, \dots, t_k\}$ distinct survival times, $j$ = unit that fails at time $t_j$ (no ties); $R(t_j)$ = risk set at time $t_j$, and $z_j$, $x_j$ are covariates.

Complete data: the contribution of data at time $t_j$ to the partial likelihood is

$L_j = \dfrac{\lambda(y_j \mid t_j, z_j, x_j)}{\sum_{k \in R(t_j)} \lambda(y_k \mid t_j, z_k, x_k)}$, where $\lambda(y \mid t, z, x)$ = hazard

With $z$ fully observed (can be time-varying), and covariate-dependent missingness of $x$, i.e.:

$\Pr(R_{x_j} = u_x \mid y_j, z_j, x_j) = \Pr(R_{x_j} = u_x \mid z_j, x_j)$

then $\lambda(y_j \mid t_j, R_{x_j} = u_x, z_j, x_j) = \lambda(y_j \mid t_j, z_j, x_j)$

Hence SSIL, restricted to cases with $x$ observed in each risk set, gives a valid partial likelihood; see Zhang and Little (2014)
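The SSIL partial likelihood can be sketched under some illustrative assumptions:

- a proportional-hazards form $\lambda \propto \exp(\beta z + \gamma x)$;
- time-fixed covariates;
- one follow-up time per unit, with an event indicator;
- no tied event times.

Units with missing $x$ are removed from every risk set, and then the usual complete-data formula is applied. The function names are hypothetical.

```python
import math

def log_partial_likelihood(times, events, z, x, beta, gamma):
    """Cox log partial likelihood with hazard proportional to exp(beta*z + gamma*x); no ties."""
    eta = [beta * zi + gamma * xi for zi, xi in zip(z, x)]
    total = 0.0
    for j, (tj, dj) in enumerate(zip(times, events)):
        if not dj:
            continue  # censored units contribute only through the risk sets
        risk_set = [k for k, tk in enumerate(times) if tk >= tj]
        total += eta[j] - math.log(sum(math.exp(eta[k]) for k in risk_set))
    return total

def ssil_log_partial_likelihood(times, events, z, x, beta, gamma):
    """Drop units whose x is missing (None), then apply the complete-data formula."""
    keep = [i for i, xi in enumerate(x) if xi is not None]
    return log_partial_likelihood(
        [times[i] for i in keep], [events[i] for i in keep],
        [z[i] for i in keep], [x[i] for i in keep], beta, gamma,
    )

times, events = [1.0, 2.0, 3.0], [1, 1, 1]
z, x = [0.0, 1.0, 0.0], [0.0, None, 1.0]
print(ssil_log_partial_likelihood(times, events, z, x, 0.0, 0.0))  # -log(2), about -0.693
```

At $\beta = \gamma = 0$, each retained event contributes minus the log of its risk-set size. This makes the toy output easy to check by hand.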

How to choose X, W
• Choice requires understanding of the mechanism:
• Variables that are missing based on their underlying values belong in W
• Variables that are MAR belong in X
• Collecting data about why variables are missing is obviously useful to get the model right
• But this applies to all missing data adjustments…


Other questions and points
– How much is lost from SSIL relative to a full likelihood model of the data and missing-data mechanism?

• In some special cases, SSIL is efficient for a pattern-mixture model

• In other cases, trade-off between additional specification of mechanism and loss of efficiency from conditional likelihood

– MAR analysis applied to the subset does not have to be likelihood-based

• E.g. weighted GEE, AIPWEE

– Pattern-mixture models (Little, 1993) can also avoid modeling the mechanism


Conclusions
• Defined partial MAR for a subset of parameters
• Application to regression with missing covariates: sometimes discarding data is useful!
• Subsample ignorable likelihood: apply a likelihood method to the data, selectively discarding cases based on the assumed missing-data mechanism
  – More efficient than CC
  – Valid for P-MAR mechanisms where IL and CC are inconsistent


References

Harel, O. and Schafer, J. L. (2009). Partial and latent ignorability in missing data problems. Biometrika, 1-14.

Little, R. J. A. (1993). Pattern mixture models for multivariate incomplete data. JASA, 88, 125-134.

Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.

Little, R. J. and Zangeneh, S. Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series.

Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSS C, 60(4), 591-605.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013). What is meant by “missing at random”? Statistical Science, 28(2), 257-268.

Zhang, N. and Little, R. J. (2014). Lifetime Data Analysis, published online Aug 2014. doi:10.1007/s10985-014-9304-x.
