Designs and Models for Aquatic Resource Surveys (DAMARS)

Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys

Breda Munoz

Virginia Lesser

R82-9096-01

This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.

Outline

Missing data in environmental surveys

Nonignorable missing data mechanism

Model-based approach for nonignorable missing data

Design-based estimation and nonignorable missing data

Illustration

Summary

Missing Data in Environmental Surveys

Researchers in environmental studies must obtain access to selected sites to gather field data.

Denial of access is a common problem in environmental surveys. It produces unit nonresponse, which affects the results of data analysis.

Response Disposition: 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)

Result                       1995    1996
Private landowners
  Agreed to access            43%     40%
  Refused access              36%     37%
  Undeliverable                2%      2%
  Not returned/no contact     16%     14%
Public land                    3%      7%
Total                        100%    100%

Introduction

The 1995-1997 Maryland Biological Stream Survey (Boward et al., 1999): overall access denial rate of 10%.

ODFW habitat surveys (Flitcroft et al., 2002), overall rate of access denial:

1998: 10.0%
1999: 6.0%
2000: 12.5%

Assumptions

A probability sampling design is used to collect outcomes of a spatial random process $Y(\mathbf{s})$.

$\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}$ is the collection of sampling sites selected using the probability sampling design.

At each site $\mathbf{s}$ we consider the pair $(Y(\mathbf{s}), \mathbf{X}(\mathbf{s}))$, where $\mathbf{X}(\mathbf{s})$ denotes auxiliary variables.

Response indicator:

$$R(\mathbf{s}) = \begin{cases} 1 & \text{if access was granted for site } \mathbf{s} \\ 0 & \text{otherwise} \end{cases}$$

Missing Mechanism: Missing Completely at Random (MCAR)

[Diagram: nodes X1, X2, Y and R]

$$P(R(\mathbf{s}_i) \mid \mathbf{X}(\mathbf{s}_i), Y(\mathbf{s}_i)) = P(R(\mathbf{s}_i))$$

Smith, Skinner and Clark (1999), Little and Rubin (2002)

Missing Mechanism: Missing at Random (MAR)

[Diagram: nodes X1, X2, Y and R]

$$P(R(\mathbf{s}_i) \mid \mathbf{X}(\mathbf{s}_i), Y(\mathbf{s}_i)) = P(R(\mathbf{s}_i) \mid \mathbf{X}(\mathbf{s}_i))$$

Smith, Skinner and Clark (1999), Little and Rubin (2002)

Missing Mechanism: Nonignorable

[Diagram: nodes X1, X2, Y and R]

$$P(R(\mathbf{s}_i) \mid \mathbf{X}(\mathbf{s}_i), Y(\mathbf{s}_i))$$

The response probability depends on $Y(\mathbf{s}_i)$ itself, so it does not reduce to either of the simpler forms above.

Smith, Skinner and Clark (1999), Little and Rubin (2002)
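A minimal simulation sketch (not from the original slides) of why the three mechanisms matter. It draws an auxiliary variable X and a correlated outcome Y, deletes values under each mechanism, and compares the respondent mean with the full-sample mean, with and without a class adjustment on X. The response probabilities 0.6, 0.8 and 0.4, the two X classes, and the function name `mean_bias` are illustrative assumptions.

```python
import random


def mean_bias(mechanism, n=20000, seed=1):
    """Return (naive bias, class-adjusted bias) of the respondent mean of Y."""
    rng = random.Random(seed)
    y_sum = 0.0
    cls_n = [0, 0]            # units per X class
    resp_n = [0, 0]           # respondents per X class
    resp_sum = [0.0, 0.0]     # sum of Y over respondents per X class
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # auxiliary variable X(s)
        y = x + rng.gauss(0.0, 1.0)      # outcome Y(s), correlated with X(s)
        c = 0 if x < 0 else 1
        if mechanism == "MCAR":
            p = 0.6                      # response ignores X and Y
        elif mechanism == "MAR":
            p = 0.8 if x < 0 else 0.4    # response depends on X only
        else:
            p = 0.8 if y < 0 else 0.4    # nonignorable: response depends on Y
        y_sum += y
        cls_n[c] += 1
        if rng.random() < p:
            resp_n[c] += 1
            resp_sum[c] += y
    full_mean = y_sum / n
    naive = sum(resp_sum) / sum(resp_n)
    # Weight each class's respondent mean by the class share of the sample.
    adjusted = sum(cls_n[c] / n * resp_sum[c] / resp_n[c] for c in (0, 1))
    return naive - full_mean, adjusted - full_mean


for name in ("MCAR", "MAR", "nonignorable"):
    print(name, [round(b, 3) for b in mean_bias(name)])
```

Under these assumptions the class adjustment on X removes the bias under MAR but not under the nonignorable mechanism, where response still depends on Y within each X class.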

Model-based Approach

Under a nonignorable mechanism, we model the joint distribution of the data and the missing-mechanism indicator (“response” indicator):

$$f(\mathbf{Y}, \mathbf{R} \mid \text{covariates}) = \underbrace{f(\mathbf{Y} \mid \text{covariates})}_{\text{data model}} \; \underbrace{f(\mathbf{R} \mid \mathbf{Y}, \text{covariates})}_{\text{missing mechanism model}}$$

$R(\mathbf{s}_i) \sim \text{Bernoulli}(p_i)$, with

$$\text{logit}(p_i) = \alpha_0 + \alpha_1 Y(\mathbf{s}_i) + \mathbf{X}\boldsymbol{\beta}$$
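A small sketch of the missing-mechanism model, assuming a logistic link with an intercept, one coefficient on the outcome, and a coefficient vector on the covariates. The parameter names `a0`, `a1`, `beta` and the function name are hypothetical and not taken from the slides.

```python
import math


def response_probability(y, x, a0, a1, beta):
    """p_i = expit(a0 + a1 * y + x . beta): probability that access is granted."""
    eta = a0 + a1 * y + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


# With a1 = 0 the response probability no longer depends on y.
print(response_probability(3.0, [1.0], a0=0.0, a1=0.0, beta=[0.5]))
print(response_probability(-3.0, [1.0], a0=0.0, a1=0.0, beta=[0.5]))
# With a1 != 0 the mechanism is nonignorable: larger y changes the probability.
print(response_probability(3.0, [1.0], a0=0.0, a1=-1.0, beta=[0.5]))
```

Setting the coefficient on the outcome to zero reduces this model to a response mechanism that depends on the covariates only.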

Model-assisted estimation and nonignorable missing data

Assume the parameter of interest is the total of the response Y over the domain $\mathcal{R}$:

$$T_y = \int_{\mathcal{R}} y(\mathbf{s})\, d\mathbf{s}$$

Model-assisted estimation and nonignorable missing data

Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993):

$$\hat{T}_y = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)}{\pi(\mathbf{s}_i)}$$

Let $\{Q_1, \ldots, Q_k\}$ be a collection of fixed values. Then

$$\hat{T}_y = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)}$$
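A short sketch of the Horvitz-Thompson total and its split into classes defined by fixed cut points. It assumes the inclusion values are supplied per site as `pi`, and that the last cut point is at least as large as every observed value; the function names and the toy numbers are illustrative only.

```python
def ht_total(y, pi):
    """Horvitz-Thompson total: sum of y(s_i) / pi(s_i) over the sample."""
    return sum(yi / p for yi, p in zip(y, pi))


def ht_total_by_class(y, pi, cuts):
    """Split the HT total by class: y <= Q_1, then Q_{j-1} < y <= Q_j.

    cuts = [Q_1, ..., Q_k] must be ascending with Q_k >= max(y).
    """
    parts = [0.0] * len(cuts)
    for yi, p in zip(y, pi):
        for j, q in enumerate(cuts):
            if yi <= q:            # first cut point at or above y gives its class
                parts[j] += yi / p
                break
    return parts


y = [1.0, 2.0, 3.0, 4.0]
pi = [0.5, 0.5, 0.25, 0.25]
print(ht_total(y, pi))                        # 34.0
print(ht_total_by_class(y, pi, [2.0, 4.0]))   # [6.0, 28.0]
```

The class contributions add back up to the overall total, which is what lets the later slides treat observed and missing sites class by class.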

Model-assisted estimation (cont.)

Sample size n: $n^*$ observed, $n - n^*$ missing (nonignorable)

$$\hat{T}_y = \sum_{i=1}^{n^*} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=1}^{n^*} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)} + \underbrace{\sum_{i=n^*+1}^{n} \frac{y(\mathbf{s}_i)\, I(y(\mathbf{s}_i) \le Q_1)}{\pi(\mathbf{s}_i)} + \sum_{j=2}^{k} \sum_{i=n^*+1}^{n} \frac{y(\mathbf{s}_i)\, I(Q_{j-1} < y(\mathbf{s}_i) \le Q_j)}{\pi(\mathbf{s}_i)}}_{\text{missing}}$$

Model-assisted estimation (cont.)

         Observed                              Missing                             Total
Class    Q*_1    Q*_2    ...  Q*_k    Total    Q*_1   Q*_2   ...  Q*_k   Total
1        n*_11   n*_12   ...  n*_1k   n*_1     m_11   m_12   ...  m_1k   m_1      n_1 + m_1
2        n*_21   n*_22   ...  n*_2k   n*_2     m_21   m_22   ...  m_2k   m_2      n_2 + m_2
c        n*_c1   n*_c2   ...  n*_ck   n*_c     m_c1   m_c2   ...  m_ck   m_c      n_c + m_c

$Q_j^*$ denotes the set $\{y(\mathbf{s}_i) : Q_{j-1} < y(\mathbf{s}_i) \le Q_j,\ i = 1, \ldots, n^*\}$, $j = 2, \ldots, k$.

Model-assisted estimation (cont.)

Likelihood:

$$L = \prod_{i=1}^{c} \prod_{j=1}^{k} P(R = 1 \mid Q_j, \text{Class } i)^{n^*_{ij}} \prod_{i=1}^{c} \prod_{j=1}^{k} P(R = 0 \mid Q_j, \text{Class } i)^{m_{ij}}$$

Model-assisted estimation (cont.)

Reparameterize the model parameters in terms of the expected cell counts $N_{ij}$ and $M_{ij}$ (Baker and Laird, 1988):

$$P(Q_j \mid \text{Class } i) = \frac{N_{ij} + M_{ij}}{N_i + M_i}$$

$$P(R(\mathbf{s}) = 0 \mid Q_j, \text{Class } i) = \frac{M_{ij}}{N_{ij} + M_{ij}}$$
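A brief sketch of the reparameterization, assuming the expected counts are held in two class-by-Q tables: `N` for respondents and `M` for nonrespondents. The function name and the example counts are illustrative.

```python
def cell_probabilities(N, M):
    """Return P(Q_j | Class i) and P(R = 0 | Q_j, Class i) from expected counts."""
    p_q_given_class, p_missing = [], []
    for n_row, m_row in zip(N, M):
        row_total = sum(n_row) + sum(m_row)            # N_i + M_i
        p_q_given_class.append(
            [(n + m) / row_total for n, m in zip(n_row, m_row)]
        )
        p_missing.append([m / (n + m) for n, m in zip(n_row, m_row)])
    return p_q_given_class, p_missing


N = [[30.0, 10.0], [10.0, 30.0]]   # made-up expected respondent counts
M = [[15.0, 5.0], [5.0, 15.0]]     # made-up expected nonrespondent counts
p_q, p_miss = cell_probabilities(N, M)
print(p_q[0], p_miss[0])
```

For the first cell of this toy table, 45 of the 60 expected units in class 1 fall in the first Q class, and 15 of those 45 are nonrespondents.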

Model-assisted estimation (cont.)

Use the EM algorithm to estimate the expected counts of the missing cells, $M_{ij}$.

E-step:

$$E[m_{ij}] = m_i \, \frac{P(R = 0 \mid \text{Class } i, Q_j)\, P(Q_j \mid \text{Class } i)}{\sum_{j=1}^{k} P(R = 0 \mid \text{Class } i, Q_j)\, P(Q_j \mid \text{Class } i)}$$

$$E[n_{ij}] = n_{ij}$$

Model-assisted estimation (cont.)

M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm based on fitting marginal totals.

The EM algorithm always converges to a solution when IPF is used in the M-step (Baker and Laird, 1988).
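A compact sketch of the EM iteration with IPF in the M-step. It assumes one particular log-linear model, in which IPF matches the Class-by-Q and Q-by-R margins (so nonresponse depends on the Q class but not on the auxiliary class). The slides do not state which model was fitted, so this choice, the function names, and the toy counts are all assumptions.

```python
def ipf_fit(full, cycles=5):
    """IPF fit of the (Class x Q) and (Q x R) margins to full[i][j][r]."""
    c, k = len(full), len(full[0])
    fit = [[[1.0, 1.0] for _ in range(k)] for _ in range(c)]
    for _ in range(cycles):
        for i in range(c):                      # match Class x Q totals
            for j in range(k):
                target, current = sum(full[i][j]), sum(fit[i][j])
                if current > 0:
                    fit[i][j] = [v * target / current for v in fit[i][j]]
        for j in range(k):                      # match Q x R totals
            for r in (0, 1):
                target = sum(full[i][j][r] for i in range(c))
                current = sum(fit[i][j][r] for i in range(c))
                if current > 0:
                    for i in range(c):
                        fit[i][j][r] *= target / current
    return fit


def em_missing_cells(n, m_row, iterations=200):
    """n[i][j]: observed counts; m_row[i]: nonrespondents in class i (Q unknown)."""
    c, k = len(n), len(n[0])
    m = [[m_row[i] / k] * k for i in range(c)]  # start: spread evenly over Q classes
    for _ in range(iterations):
        # M-step: fitted expected counts; r = 0 is missing, r = 1 is observed.
        full = [[[m[i][j], n[i][j]] for j in range(k)] for i in range(c)]
        fit = ipf_fit(full)
        # E-step: share m_row[i] over Q classes in proportion to the fitted M_ij.
        for i in range(c):
            row = sum(fit[i][j][0] for j in range(k))
            if row > 0:
                m[i] = [m_row[i] * fit[i][j][0] / row for j in range(k)]
    return m


m_hat = em_missing_cells([[30, 10], [10, 30]], [20, 20])
print([[round(v, 3) for v in row] for row in m_hat])
```

The E-step uses the fact that the product of the two conditional probabilities in the E-step formula is proportional to the fitted missing count in each cell, so each class's nonrespondents are shared out in those proportions while the row totals stay fixed.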

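Model-assisted estimation (cont.)

Possible estimators for the total of Y, for adjustment $l = 1, 2, 3$:

$$\hat{T}_y^{(l)} = \sum_{i=1}^{c} \sum_{j=1}^{k} I(R(\mathbf{s}_{ij}) = 1)\, y(\mathbf{s}_{ij})\, w_{ij}^{(l)}(\mathbf{s})$$

where $w_{ij}^{(l)}(\mathbf{s})$ is the adjustment weight (Little and Rubin, 2002).

Cell adjustment, $\hat{T}_{y1}$:

$$w_{ij}^{(1)}(\mathbf{s}) = \frac{N \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{ij}}{m_{ij} + n_{ij}} \right]^{-1}}{\sum_{i=1}^{c} \sum_{j=1}^{k} \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{ij}}{m_{ij} + n_{ij}} \right]^{-1}}$$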

Model-assisted estimation (cont.)

Column adjustment, $\hat{T}_{y2}$, where $n_j$ and $m_j$ are the column totals:

$$w_{ij}^{(2)}(\mathbf{s}) = \frac{N \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{j}}{m_{j} + n_{j}} \right]^{-1}}{\sum_{i=1}^{c} \sum_{j=1}^{k} \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{j}}{m_{j} + n_{j}} \right]^{-1}}$$

Model-assisted estimation (cont.)

Row adjustment, $\hat{T}_{y3}$, where $n_i$ and $m_i$ are the row totals:

$$w_{ij}^{(3)}(\mathbf{s}) = \frac{N \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{i}}{m_{i} + n_{i}} \right]^{-1}}{\sum_{i=1}^{c} \sum_{j=1}^{k} \left[ \pi(\mathbf{s}_{ij}) \dfrac{n_{i}}{m_{i} + n_{i}} \right]^{-1}}$$
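A hedged sketch of the three adjustments. It reads each weight as N times the inverse of (inclusion value times estimated response rate), normalized by the sum of those inverses over the responding sites. The response rate is taken per cell, per column, or per row. This reading, the choice to normalize over respondents, the function names, and the toy counts are assumptions, not the authors' code.

```python
def adjustment_weights(pi, cell, n, m, N, kind="cell"):
    """Weights for responding units.

    pi[u]: inclusion value of unit u; cell[u] = (i, j): its class and Q class;
    n, m: c x k tables of observed and (estimated) missing counts; N: scaling total.
    """
    c = len(n)

    def response_rate(i, j):
        if kind == "cell":
            obs, mis = n[i][j], m[i][j]
        elif kind == "column":
            obs = sum(n[r][j] for r in range(c))
            mis = sum(m[r][j] for r in range(c))
        else:  # "row"
            obs, mis = sum(n[i]), sum(m[i])
        return obs / (obs + mis)

    raw = [1.0 / (p * response_rate(i, j)) for p, (i, j) in zip(pi, cell)]
    scale = N / sum(raw)
    return [scale * a for a in raw]


def adjusted_total(y, weights):
    """Weighted total over responding units."""
    return sum(yi * w for yi, w in zip(y, weights))


n = [[2, 1], [1, 2]]                 # toy observed counts
m = [[0, 1], [1, 0]]                 # toy missing counts
cell = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
pi = [0.1] * 6
y = [1.0, 2.0, 5.0, 1.5, 6.0, 7.0]
for kind in ("cell", "column", "row"):
    w = adjustment_weights(pi, cell, n, m, N=100.0, kind=kind)
    print(kind, round(adjusted_total(y, w), 2))
```

In this toy table the row response rates are equal, so the row adjustment gives every respondent the same weight, while the cell adjustment doubles the weight of respondents in cells that lost half their units.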

Model-assisted estimation (cont.)

Variance estimators are obtained using the bootstrap, which produces asymptotically valid variance estimates (Efron, 1994):

$$\text{var}(\hat{T}_y) = \frac{1}{M} \sum_{i=1}^{M} \left( \hat{T}_y^{(i)} - \hat{T}_y^{(\cdot)} \right)^2$$

where $\hat{T}_y^{(i)}$ is the estimate from the $i$-th of $M$ bootstrap samples and $\hat{T}_y^{(\cdot)}$ is their mean.
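A minimal sketch of the bootstrap variance calculation, assuming the replicate estimates have already been computed, that the centering value is the mean of the replicates, and that the divisor is the number of replicates. The function name is illustrative.

```python
def bootstrap_variance(replicates):
    """Mean squared deviation of the bootstrap estimates about their mean."""
    count = len(replicates)
    center = sum(replicates) / count
    return sum((t - center) ** 2 for t in replicates) / count


print(bootstrap_variance([1.0, 2.0, 3.0, 4.0]))   # 1.25
```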

Illustration

We simulate a continuous multivariate normal spatial random process for y.

Population: John Day Middle Fork stream reaches

143 stream reaches divided into survey segments (~1 mile)

6536 survey segments

Area of 785 mi²

Illustration

The population of stream reaches was stratified into 6 strata based on the number of survey segments:

“<10” “10-20” “20-30” “30-50” “50-100” “>100”

Nonignorable missing data were generated as:

$$R(\mathbf{s}) = \begin{cases} 0 & \text{if } y(\mathbf{s}) > z(\mathbf{s}) \\ 1 & \text{otherwise} \end{cases}$$

Missing rates of 15%, 30% and 50% were created.
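A sketch of one way to generate outcome-dependent missingness at a target rate. It assumes a single constant cutoff z chosen so that the largest values go missing. The slides do not say how z(s) was chosen, and the simulated values below are stand-ins rather than the John Day data.

```python
import random


def nonignorable_indicator(y, missing_rate):
    """Return (R, z): R[u] = 0 when y[u] > z, else 1, for 0 < missing_rate < 1."""
    n_observed = round(len(y) * (1.0 - missing_rate))
    z = sorted(y)[n_observed - 1]          # largest value that stays observed
    return [0 if value > z else 1 for value in y], z


rng = random.Random(2024)
y = [rng.gauss(1.75, 2.0) for _ in range(1000)]   # stand-in outcomes
for rate in (0.15, 0.30, 0.50):
    R, z = nonignorable_indicator(y, rate)
    observed = [v for v, r in zip(y, R) if r == 1]
    print(rate, R.count(0), round(sum(observed) / len(observed), 2))
```

Because the sites with the largest values are the ones removed, the mean of the observed values falls as the missing rate rises.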

John Day Middle Fork stream network

Population Summary

             Strata1   Strata2   Strata3   Strata4   Strata5   Strata6
Size             246       433       269      1059      1208      3321
Class
  Class 1     64.23%    65.13%    64.31%    65.44%    65.48%    61.70%
  Class 2     35.77%    34.87%    35.69%    34.56%    34.52%    38.30%
Summary
  Minimum      -2.07     -2.99     -3.96     -2.18     -2.37     -5.47
  Mean          1.63      1.68      1.66      1.70      1.73      1.80
  Max           7.01      7.95      8.04      6.15      8.65      9.87

Illustration

Sample size n = 100

Allocation proportional to the number of survey segments in each stratum

$Q_1$ = first sample quantile
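A worked sketch of proportional allocation for n = 100 using the six stratum sizes from the population summary. The slides do not report the resulting stratum sample sizes or a rounding rule, so the largest-remainder rounding used here is an assumption.

```python
def proportional_allocation(sizes, n):
    """Allocate n units proportionally to stratum sizes (largest-remainder rounding)."""
    total = sum(sizes)
    allocation = [n * size // total for size in sizes]      # integer part of each quota
    remainders = [n * size % total for size in sizes]
    leftover = n - sum(allocation)
    by_remainder = sorted(range(len(sizes)), key=lambda h: remainders[h], reverse=True)
    for h in by_remainder[:leftover]:                       # hand out the remaining units
        allocation[h] += 1
    return allocation


sizes = [246, 433, 269, 1059, 1208, 3321]   # survey segments per stratum
print(proportional_allocation(sizes, 100))  # [4, 7, 4, 16, 18, 51]
```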

John Day Middle Fork stream network and sample points

Modified Bootstrap

We draw 1000 random samples of size 100 from the observed sample:

  • Independently across strata
  • Maintain proportional allocation
  • Maintain the row totals by the auxiliary variable

For each of the 1000 samples, we estimate $\hat{T}_{yHT}, \hat{T}_{y1}, \hat{T}_{y2}, \hat{T}_{y3}$.

We obtain a standard error and MSE for each estimate.

We repeat this process 1000 times.
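A sketch of the resampling step, assuming the three constraints can be met by resampling with replacement inside each stratum-by-auxiliary-class group while keeping every group at its original size. The dictionary keys `stratum`, `cls`, `y` and the function names are hypothetical.

```python
import random
from collections import defaultdict


def modified_bootstrap_sample(units, rng):
    """Resample within each (stratum, class) group, preserving group sizes."""
    groups = defaultdict(list)
    for unit in units:
        groups[(unit["stratum"], unit["cls"])].append(unit)
    resample = []
    for key in sorted(groups):
        members = groups[key]
        resample.extend(rng.choice(members) for _ in range(len(members)))
    return resample


def bootstrap_se(units, estimator, replicates=1000, seed=0):
    """Standard error of `estimator` over modified bootstrap samples."""
    rng = random.Random(seed)
    values = [estimator(modified_bootstrap_sample(units, rng)) for _ in range(replicates)]
    mean = sum(values) / replicates
    return (sum((v - mean) ** 2 for v in values) / replicates) ** 0.5


units = [
    {"stratum": h, "cls": c, "y": float(h + c + t)}
    for h in (1, 2) for c in (1, 2) for t in range(3)
]
print(bootstrap_se(units, lambda sample: sum(u["y"] for u in sample), replicates=200))
```

Keeping the group sizes fixed means every bootstrap sample has the same stratum sample sizes and the same auxiliary-class row totals as the observed sample.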

Summary

                  15% Missing Rate                          30% Missing Rate                          50% Missing Rate
Estimator         Estimate   MSE/100,000  Coverage 95% CI   Estimate   MSE/100,000  Coverage 95% CI   Estimate   MSE/100,000  Coverage 95% CI
$\hat{T}_{yHT}$   10,624.20  17.24        73.5%             8,299.13   109.70       20.0%             7,646.15   149.81       0.1%
$\hat{T}_{y1}$    11,266.93  12.78        94.3%             10,929.37  23.12        91.2%             14,788.26  130.94       7.2%
$\hat{T}_{y2}$    11,183.85  13.77        93.5%             10,860.73  23.47        90.6%             14,790.36  131.02       7.1%
$\hat{T}_{y3}$    12,401.27  23.60        80.2%             11,741.42  22.09        94.1%             14,380.28  105.22       14.1%

$T_y$ = 11,445.13 (population total)
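As a reading aid, the sketch below converts the tabulated estimates into relative bias against the total of 11,445.13, taking the reported figures at face value. The short labels `HT`, `T1`, `T2`, `T3` are shorthand for the four estimators, and the relative-bias summary itself is an addition rather than something reported on the slide.

```python
TRUE_TOTAL = 11445.13

# Estimates at 15%, 30% and 50% missing rates, copied from the summary table.
estimates = {
    "HT": (10624.20, 8299.13, 7646.15),
    "T1": (11266.93, 10929.37, 14788.26),
    "T2": (11183.85, 10860.73, 14790.36),
    "T3": (12401.27, 11741.42, 14380.28),
}


def relative_bias(estimate, truth=TRUE_TOTAL):
    """(estimate - truth) / truth."""
    return (estimate - truth) / truth


relative_bias_pct = {
    name: [round(100 * relative_bias(e), 1) for e in row]
    for name, row in estimates.items()
}
for name, row in relative_bias_pct.items():
    print(name, row)
```

On these figures the unadjusted Horvitz-Thompson estimate falls further below the total as the missing rate grows, while the adjusted estimators stay close at 15% and 30% and overshoot at 50%.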