Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys


Designs and Models for Aquatic Resource Surveys (DAMARS). R82-9096-01. Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys. Breda Munoz, Virginia Lesser.


1

Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys

Breda Munoz

Virginia Lesser

R82-9096-01

2

This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.

3

Outline

Missing data in environmental surveys

Nonignorable missing data mechanism

Model-based approach for nonignorable missing data

Design-based estimation and nonignorable missing data

Illustration

Summary

4

Missing Data in Environmental Surveys

Researchers in environmental studies must obtain access to selected sites to gather field data

Denial of access is a common problem in environmental surveys; the resulting unit non-response affects the results of data analysis

5

Response Disposition 1995/1996 EMAP North Dakota Prairie Wetlands Studies

(Lesser, 2001)

Result                       1995    1996
Private landowners
  Agreed to access            43%     40%
  Refused access              36%     37%
  Undeliverable                2%      2%
  Not returned/no contact     16%     14%
Public land                    3%      7%
Total                        100%    100%

6

Introduction

The 1995-1997 Maryland Biological Stream Survey results: overall access denial rate of 10% (Boward et al., 1999).

ODFW habitat surveys, overall rate of access denial (Flitcroft et al., 2002): 1998: 10.0%; 1999: 6.0%; 2000: 12.5%

7

Assumptions

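A probability sampling design is used to collect outcomes of a spatial random process $Y$

$\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}$ is a collection of sampling sites selected using the probability sampling design.

At each site $\mathbf{s}$: the outcome $Y(\mathbf{s})$ and auxiliary variables $X(\mathbf{s})$

$$R(\mathbf{s}) = \begin{cases} 1 & \text{if access was granted for site } \mathbf{s} \\ 0 & \text{otherwise} \end{cases}$$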

8

Missing Mechanism: Missing Completely at Random (MCAR)

[Diagram: variables X1, X2, Y and R]

$$P(R(\mathbf{s}_i) \mid Y(\mathbf{s}_i), X(\mathbf{s}_i)) = P(R(\mathbf{s}_i))$$

Smith, Skinner and Clark (1999), Little and Rubin (2002)

9

Missing Mechanism: Missing at Random (MAR)

[Diagram: variables X1, X2, Y and R]

$$P(R(\mathbf{s}_i) \mid Y(\mathbf{s}_i), X(\mathbf{s}_i)) = P(R(\mathbf{s}_i) \mid X(\mathbf{s}_i))$$

Smith, Skinner and Clark (1999), Little and Rubin (2002)

10

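Missing Mechanism: Nonignorable

[Diagram: variables X1, X2, Y and R]

$P(R(\mathbf{s}_i) \mid Y(\mathbf{s}_i), X(\mathbf{s}_i))$ depends on the outcome $Y(\mathbf{s}_i)$, which is unobserved when access is denied.

Smith, Skinner and Clark (1999), Little and Rubin (2002)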
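The three mechanisms can be contrasted with a small simulation. The sketch below is only an illustration: the outcome model, the logistic response probabilities and all coefficients are assumptions chosen for the example, not values from the presentation. It compares the mean of the respondents with the mean of the full population under each mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                   # auxiliary variable X(s) (simulated)
y = 1.0 + 0.8 * x + rng.normal(size=n)   # outcome Y(s), correlated with X

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

# Response probabilities P(R = 1 | ...) under each mechanism.
# All coefficients are illustrative assumptions.
probabilities = {
    "MCAR": np.full(n, 0.7),             # depends on neither X nor Y
    "MAR": expit(1.0 + 1.0 * x),         # depends on X only
    "nonignorable": expit(1.0 + 1.0 * y) # depends on Y itself
}

# Difference between the respondent mean and the full-population mean.
results = {}
for name, p in probabilities.items():
    r = rng.random(n) < p                # R = 1 means access granted
    results[name] = y[r].mean() - y.mean()

for name, bias in results.items():
    print(f"{name:>13}: respondent mean minus true mean = {bias:+.3f}")
```

Under MCAR the difference is close to zero. Under MAR the respondent mean is also shifted here because Y is correlated with X, but that shift can be removed using X alone; under the nonignorable mechanism it cannot, because the response probability depends on the unobserved Y.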

11

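Model-based Approach

Under a nonignorable mechanism, we model the joint probability of the data and the missing mechanism indicator (“response” indicator):

$$f(Y, R \mid \text{covariates}) = \underbrace{f(Y \mid \text{covariates})}_{\text{data model}} \; \underbrace{f(R \mid Y, \text{covariates})}_{\text{missing mechanism model}}$$

Missing mechanism model:

$$R(\mathbf{s}_i) \sim \text{Bernoulli}(p_i), \qquad \operatorname{logit}(p_i) = \beta_0 + \beta_1 Y(\mathbf{s}_i) + X\boldsymbol{\beta}$$

where $X$ denotes the covariates.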

12

Model-assisted estimation and nonignorable missing data

Assume the parameter of interest is the total of the response $Y$ over the domain $\mathcal{R}$:

$$T_y = \int_{\mathcal{R}} y(\mathbf{s})\, d\mathbf{s}$$

13

Model-assisted estimation and nonignorable missing data

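Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993):

$$\hat{T}_y = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)}{\pi(\mathbf{s}_i)}$$

where $\pi(\mathbf{s}_i)$ is the inclusion density at site $\mathbf{s}_i$.

Let $\{Q_1, \ldots, Q_k\}$ be a collection of fixed values. Then

$$\hat{T}_y = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)}$$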
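To make the decomposition concrete, here is a minimal sketch of a Horvitz-Thompson total and its split into classes defined by fixed cut points. The function names, the toy data and the convention that the classes are the intervals (Q_{j-1}, Q_j], with the largest cut point at or above the largest observed value, are assumptions made for this illustration.

```python
import numpy as np

def ht_total(y, pi):
    """Horvitz-Thompson total: sum of y(s_i) / pi(s_i) over sampled sites."""
    y, pi = np.asarray(y, dtype=float), np.asarray(pi, dtype=float)
    return float(np.sum(y / pi))

def ht_total_by_class(y, pi, cuts):
    """Split the HT total into contributions from classes defined by cuts.

    Class 0 holds y <= cuts[0]; class j holds cuts[j-1] < y <= cuts[j].
    Assumes cuts is increasing and cuts[-1] >= max(y).
    """
    y, pi = np.asarray(y, dtype=float), np.asarray(pi, dtype=float)
    cls = np.searchsorted(cuts, y, side="left")
    return [float(np.sum((y / pi)[cls == j])) for j in range(len(cuts))]

# Toy data (hypothetical): four sites with unequal inclusion densities.
y = np.array([0.5, 1.2, 3.0, 2.2])
pi = np.array([0.1, 0.2, 0.25, 0.1])
cuts = np.array([1.0, 2.5, 3.0])

parts = ht_total_by_class(y, pi, cuts)
print(parts, sum(parts), ht_total(y, pi))
```

The class contributions add up to the overall total, which is the property the estimators on the following slides rely on when they reweight cells separately.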

14

Model-assisted estimation (cont.)

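Sample size $n$: $n^*$ observed, $n - n^*$ missing (nonignorable)

$$
\begin{aligned}
\hat{T}_y ={}& \underbrace{\sum_{i=1}^{n^*} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=1}^{n^*} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)}}_{\text{observed}} \\
&+ \underbrace{\sum_{i=n^*+1}^{n} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=n^*+1}^{n} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)}}_{\text{missing}}
\end{aligned}
$$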

15

Model-assisted estimation (cont.)

                 Observed                               Missing
Class   Q*_1    Q*_2    …   Q*_k    Total     Q*_1    Q*_2    …   Q*_k    Total     Total
1       n*_11   n*_12   …   n*_1k   n*_1      m_11    m_12    …   m_1k    m_1       n_1 + m_1
2       n*_21   n*_22   …   n*_2k   n*_2      m_21    m_22    …   m_2k    m_2       n_2 + m_2
…
c       n*_c1   n*_c2   …   n*_ck   n*_c      m_c1    m_c2    …   m_ck    m_c       n_c + m_c

$Q^*_j$ denotes the set $\{y(\mathbf{s}_i) : Q_{j-1} < y(\mathbf{s}_i) \le Q_j,\ i = 1, \ldots, n^*\}$, $j = 2, \ldots, k$

16

Model-assisted estimation (cont.)

Likelihood:

$$L = \prod_{i=1}^{c} \prod_{j=1}^{k} P(R = 1 \mid Q_j, \text{Class } i)^{n^*_{ij}} \; \prod_{i=1}^{c} \prod_{j=1}^{k} P(R = 0 \mid Q_j, \text{Class } i)^{m_{ij}}$$

17

Model-assisted estimation (cont.)

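Reparameterize model parameters (Baker and Laird, 1988) in terms of the expected cell counts $N_{ij}$ and $M_{ij}$:

$$P(Q_j \mid \text{Class } i) = \frac{N_{ij} + M_{ij}}{N_i + M_i}$$

$$P(R(\mathbf{s}) = 0 \mid Q_j, \text{Class } i) = \frac{M_{ij}}{N_{ij} + M_{ij}}$$

where $N_i = \sum_{j} N_{ij}$ and $M_i = \sum_{j} M_{ij}$.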
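As a numerical illustration of this reparameterization, the sketch below turns two tables of expected counts into the two sets of probabilities. The function name and the example counts are hypothetical; rows index classes and columns index the Q intervals.

```python
import numpy as np

def cell_probabilities(N, M):
    """Map expected counts to P(Q_j | Class i) and P(R = 0 | Q_j, Class i).

    N: expected observed counts, shape (c, k)
    M: expected missing counts, shape (c, k)
    """
    N = np.asarray(N, dtype=float)
    M = np.asarray(M, dtype=float)
    total = N + M
    # (N_ij + M_ij) / (N_i + M_i): share of class i falling in interval j
    p_q_given_class = total / total.sum(axis=1, keepdims=True)
    # M_ij / (N_ij + M_ij): probability of being missing within cell (i, j)
    p_missing_given_cell = M / total
    return p_q_given_class, p_missing_given_cell

# Hypothetical expected counts for c = 2 classes and k = 2 intervals.
N = [[30, 10], [20, 20]]
M = [[10, 10], [5, 15]]
p_q, p_miss = cell_probabilities(N, M)
print(p_q)
print(p_miss)
```

Each row of the first array sums to one, which is what makes it a conditional distribution over the intervals within a class.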

18

Model-assisted estimation (cont.)

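Use the EM algorithm to estimate the expected counts of the missing cells, $M_{ij}$.

E-step:

$$E[m_{ij}] = m_i \, \frac{P(R = 0 \mid \text{Class } i, Q_j)\, P(Q_j \mid \text{Class } i)}{\sum_{j'=1}^{k} P(R = 0 \mid \text{Class } i, Q_{j'})\, P(Q_{j'} \mid \text{Class } i)}$$

$$E[n_{ij}] = n_{ij}$$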
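The E-step distributes each class's known number of missing units across the Q intervals in proportion to the current model probabilities. A minimal sketch, assuming the current probability estimates are held in two arrays with classes in rows and intervals in columns (the function name and the example values are hypothetical):

```python
import numpy as np

def e_step(p_q_given_class, p_missing_given_cell, m_row):
    """Expected missing counts E[m_ij] given the current probabilities.

    p_q_given_class:      P(Q_j | Class i), shape (c, k)
    p_missing_given_cell: P(R = 0 | Class i, Q_j), shape (c, k)
    m_row:                number of missing units in each class, shape (c,)
    """
    joint = np.asarray(p_missing_given_cell) * np.asarray(p_q_given_class)
    # P(Q_j | R = 0, Class i): normalise within each class (row)
    cond = joint / joint.sum(axis=1, keepdims=True)
    return np.asarray(m_row, dtype=float)[:, None] * cond

# Hypothetical current estimates for c = 2 classes, k = 2 intervals.
p_q = np.array([[0.5, 0.5], [0.25, 0.75]])
p_miss = np.array([[0.2, 0.6], [0.4, 0.4]])
m_row = np.array([8, 10])

expected_m = e_step(p_q, p_miss, m_row)
print(expected_m)
```

The expected counts in each row add up to the observed number of missing units in that class, so the E-step never changes the row totals.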

19

Model-assisted estimation (cont.)

M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm based on fitting marginal totals.

The EM algorithm always converges to a solution when using IPF in the M-step (Baker and Laird, 1988)
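For readers unfamiliar with IPF, the sketch below shows the generic two-way version: a seed table is rescaled alternately so that its row sums and column sums match target margins. This is a textbook illustration with hypothetical inputs, not the specific model fit used in the presentation's M-step.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a 2-D table to given margins.

    Assumes a strictly positive seed and targets with equal grand totals.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale rows to match the row margins
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale columns to match the column margins
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns are exact after the last step; stop once rows agree too
        if np.allclose(table.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return table

fitted = ipf([[1, 2], [3, 4]], row_targets=[40, 60], col_targets=[30, 70])
print(fitted)
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

The fitted table keeps the association structure of the seed while reproducing both sets of margins.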

20

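Model-assisted estimation (cont.)

Possible estimators for the total of $Y$:

Cell adjustment, $\hat{T}_{y1}$:

$$\hat{T}_{y1} = \sum_{i=1}^{c} \sum_{j=1}^{k} I(R(\mathbf{s}_{ij}) = 1)\, y(\mathbf{s}_{ij})\, w^{(1)}(\mathbf{s}_{ij})$$

with adjustment weight

$$w^{(1)}(\mathbf{s}_{ij}) = N \, \frac{\dfrac{m_{ij} + n_{ij}}{n_{ij}} \, \dfrac{1}{\pi(\mathbf{s}_{ij})}}{\sum_{i'=1}^{c} \sum_{j'=1}^{k} \dfrac{m_{i'j'} + n_{i'j'}}{n_{i'j'}} \, \dfrac{1}{\pi(\mathbf{s}_{i'j'})}}$$

(Little and Rubin, 2002)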
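The core of the cell adjustment is the inflation factor (m_ij + n_ij) / n_ij applied to respondents in cell (i, j). The sketch below illustrates only that step, in its plain weighting-class form: each respondent's design weight 1/pi is multiplied by the factor for its cell, and any further normalisation of the weights is deliberately left out. The function name, the flat cell indexing and the toy numbers are assumptions made for the example.

```python
import numpy as np

def cell_adjusted_total(y, pi, cell, n_obs, m_hat):
    """Weighting-class adjusted total over respondents.

    y, pi:  outcome and inclusion density for each respondent
    cell:   integer cell index of each respondent
    n_obs:  number of respondents per cell
    m_hat:  estimated number of missing units per cell (e.g. from EM)
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    # Inflation factor (m + n) / n for each cell
    adjustment = (m_hat + n_obs) / n_obs
    return float(np.sum(y / pi * adjustment[np.asarray(cell)]))

# Toy example (hypothetical): three respondents in two cells.
total = cell_adjusted_total(
    y=[2.0, 4.0, 6.0],
    pi=[0.5, 0.5, 0.5],
    cell=[0, 0, 1],
    n_obs=[2, 1],
    m_hat=[1.0, 1.0],
)
print(total)
```

The column and row adjustments follow the same pattern, with the factor computed from column totals or row totals instead of individual cells.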

21

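Model-assisted estimation (cont.)

Column adjustment, $\hat{T}_{y2}$, with adjustment weight

$$w^{(2)}(\mathbf{s}_{ij}) = N \, \frac{\dfrac{m_{j} + n_{j}}{n_{j}} \, \dfrac{1}{\pi(\mathbf{s}_{ij})}}{\sum_{i'=1}^{c} \sum_{j'=1}^{k} \dfrac{m_{j'} + n_{j'}}{n_{j'}} \, \dfrac{1}{\pi(\mathbf{s}_{i'j'})}}$$

where $m_j$ and $n_j$ are the column totals of the missing and observed counts.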

22

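Model-assisted estimation (cont.)

Row adjustment, $\hat{T}_{y3}$, with adjustment weight

$$w^{(3)}(\mathbf{s}_{ij}) = N \, \frac{\dfrac{m_{i} + n_{i}}{n_{i}} \, \dfrac{1}{\pi(\mathbf{s}_{ij})}}{\sum_{i'=1}^{c} \sum_{j'=1}^{k} \dfrac{m_{i'} + n_{i'}}{n_{i'}} \, \dfrac{1}{\pi(\mathbf{s}_{i'j'})}}$$

where $m_i$ and $n_i$ are the row (class) totals of the missing and observed counts.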

23

Model-assisted estimation (cont.)

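Variance estimators are obtained using the bootstrap, which produces asymptotically valid variance estimates (Efron, 1994).

$$\widehat{\operatorname{var}}(\hat{T}_y) = \frac{1}{M} \sum_{i=1}^{M} \left( \hat{T}_y^{(i)} - \hat{T}_y^{(\cdot)} \right)^2$$

where $\hat{T}_y^{(i)}$ is the estimate from the $i$-th of $M$ bootstrap samples and $\hat{T}_y^{(\cdot)}$ is the mean of the $M$ bootstrap estimates.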
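The bootstrap variance is simply the spread of the replicate estimates around their mean. A minimal sketch, assuming the divisor is M (the function name and the replicate values are hypothetical):

```python
import numpy as np

def bootstrap_variance(replicates):
    """Variance of bootstrap replicate estimates, with divisor M."""
    replicates = np.asarray(replicates, dtype=float)
    centre = replicates.mean()          # mean of the M replicate estimates
    return float(np.mean((replicates - centre) ** 2))

# Hypothetical replicate estimates of a total from M = 4 bootstrap samples.
replicate_totals = [1.0, 2.0, 3.0, 4.0]
variance = bootstrap_variance(replicate_totals)
print(variance, variance ** 0.5)        # variance and standard error
```

With divisor M this matches `np.var` with its default `ddof=0`; using M - 1 instead makes little difference when M is in the hundreds or thousands.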

24

Illustration

We simulate a continuous multivariate normal spatial random process for y

Population: John Day Middle Fork stream reaches

143 stream reaches divided into survey segments (~1 mile)

6536 survey segments

Area of 785 mi²

25

Illustration

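The population of stream reaches was stratified into 6 strata based on the number of survey segments:

“<10” “10-20” “20-30”

“30-50” “50-100” “>100”

Nonignorable missing data were generated as:

$$R(\mathbf{s}) = \begin{cases} 0 & \text{if } y(\mathbf{s}) \;\square\; z \\ 1 & \text{otherwise} \end{cases}$$

(the comparison between $y(\mathbf{s})$ and $z$, marked $\square$, is not legible in the source)

Missing rates of 15%, 30% and 50% were created.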

26

John Day Middle Fork stream network

27

Population Summary

            Strata1   Strata2   Strata3   Strata4   Strata5   Strata6
Size            246       433       269      1059      1208      3321

Class
Class 1      64.23%    65.13%    64.31%    65.44%    65.48%    61.70%
Class 2      35.77%    34.87%    35.69%    34.56%    34.52%    38.30%

Summary
Minimum       -2.07     -2.99     -3.96     -2.18     -2.37     -5.47
Mean           1.63      1.68      1.66      1.70      1.73      1.80
Max            7.01      7.95      8.04      6.15      8.65      9.87

28

Illustration

Sample size n = 100

Allocation proportional to the number of survey segments in each stratum

Q1 = first sample quantile
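Proportional allocation of n = 100 across the six strata can be worked out from the stratum sizes in the population summary. The sketch below uses largest-remainder rounding to obtain whole numbers; that rounding rule is an assumption, since the presentation does not state how fractional allocations were resolved.

```python
import numpy as np

def proportional_allocation(sizes, n):
    """Allocate n sample units to strata in proportion to stratum size.

    Fractions are resolved with the largest-remainder rule (an assumption).
    """
    sizes = np.asarray(sizes, dtype=float)
    exact = n * sizes / sizes.sum()            # ideal, fractional allocation
    alloc = np.floor(exact).astype(int)        # whole units guaranteed
    shortfall = int(n - alloc.sum())           # units still to hand out
    # Give the remaining units to the strata with the largest remainders
    order = np.argsort(-(exact - alloc))
    alloc[order[:shortfall]] += 1
    return alloc

# Stratum sizes (survey segments) from the population summary.
strata_sizes = [246, 433, 269, 1059, 1208, 3321]
allocation = proportional_allocation(strata_sizes, n=100)
print(allocation, allocation.sum())
```

Because the largest stratum holds about half of the 6536 segments, it receives about half of the sample under this scheme.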

29

John Day Middle Fork stream network and sample points

30

Modified Bootstrap

We draw 1000 random samples of size 100 from the observed sample:

Independently across strata

Maintain proportional allocation

Maintain the row totals by the auxiliary variable

For each of the 1000 samples, we estimate $\hat{T}_{yHT}, \hat{T}_{y1}, \hat{T}_{y2}, \hat{T}_{y3}$

We obtain a standard error and MSE for each estimate

We repeat this process 1000 times
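One way to honour the constraints of the modified bootstrap is to resample with replacement separately within each group, so that every group keeps its original sample size. The sketch below assumes the groups are stratum-by-class combinations; that grouping, the function name and the toy labels are illustrative assumptions, not a description of the authors' code.

```python
import numpy as np

def grouped_bootstrap_indices(groups, rng):
    """Resample indices with replacement independently within each group.

    Every group keeps its original number of units, so per-stratum
    allocation and per-class totals are preserved in each replicate.
    """
    groups = np.asarray(groups)
    index = np.arange(groups.size)
    parts = []
    for g in np.unique(groups):
        members = index[groups == g]
        parts.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(parts)

rng = np.random.default_rng(1)
# Hypothetical group labels: stratum number * 10 + class number.
groups = np.array([11, 11, 11, 12, 12, 21, 21, 21, 21, 22])
resampled = grouped_bootstrap_indices(groups, rng)
print(groups[resampled])
```

Each call yields one bootstrap sample; repeating it 1000 times and recomputing the estimators on each resample gives the replicate estimates used for the standard errors and MSEs.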

31

Summary

MSE is reported divided by 100,000; coverage refers to the 95% CI.

                       15% Missing Rate                30% Missing Rate                50% Missing Rate
Estimator              Estimate    MSE      Coverage   Estimate    MSE      Coverage   Estimate    MSE      Coverage
$\hat{T}_{yHT}$        10,624.20   17.24    73.5%       8,299.13   109.70   20.0%       7,646.15   149.81    0.1%
$\hat{T}_{y1}$         11,266.93   12.78    94.3%      10,929.37    23.12   91.2%      14,788.26   130.94    7.2%
$\hat{T}_{y2}$         11,183.85   13.77    93.5%      10,860.73    23.47   90.6%      14,790.36   131.02    7.1%
$\hat{T}_{y3}$         12,401.27   23.60    80.2%      11,741.42    22.09   94.1%      14,380.28   105.22   14.1%

$T_y$ = 11,445.13
