2005/4/12 1

Space-Time Modeling and Application to Emerging Infectious Diseases

李正宇

National Health Research Institutes

Division of Biostatistics and Bioinformatics

July 26th, 2005

Outline

• Introduction
• STARMA Models
• Methods for STARMA Modeling and Software IEAST
• Modeling Emerging Infectious Diseases using STARMA and IEAST
• Conclusion

Introduction

Tobler’s First Law of Geography

“Everything is related to everything else, but near things are more related than distant things.”

• Biological and ecological processes are often organized and correlated in both space and time.

• Why use space-time data and space-time analyses?

• Various space-time models
  – STKF, KKF, VARMA, STARMA, etc.

• Why STARMA models?

• Are emerging infectious diseases the only application?

Scope of the Work

• An efficient and robust STARMA modeling method
  – Space-time extensions of optimization algorithm and model fitness measures
  – Refinement of the space-time modeling procedure

• Software development: IEAST
  – The first general-purpose STARMA modeling and analysis software
  – Integrated Environment for Analyzing STARMA models

• Application to the spread of WNV in an epidemic in Detroit
  – Modeling and analysis of Dead Crow Data
  – Modeling and analysis of Human Case Data
  – Cross analysis of Human Case Data and Dead Crow Data
  – Statistical inferences from these space-time analyses

STARMA Models

Space-Time Variables Evolving over Time

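• $z_{t,\mathbf{x}}$: some ecological variable at spatial coordinate vector $\mathbf{x}$ at time $t$. For a fixed location $\mathbf{x}$, the values $z_{t,\mathbf{x}}$ form a time series.

• These time series are not independent, but influence each other via spatial proximity.

[Figure: a 2D lattice (axes X, Y) shown at time = t-1 and time = t along a time axis; labeled quantities: $z_{t,(0,0)}$, $z_{t,(1,2)}$, $z_{t,(2,1)}$, $z_{t,(2,2)}$, and random noise]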

General STARMA Models

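• The general STARMA model has the stochastic equation:

$$z_{t,\mathbf{x}} = \underbrace{\sum_{k}\sum_{\mathbf{b}} \phi_{k,\mathbf{b}}\, z_{t-k,\mathbf{x}+\mathbf{b}}}_{\text{AR terms}} \;-\; \underbrace{\sum_{k}\sum_{\mathbf{b}} \theta_{k,\mathbf{b}}\, e_{t-k,\mathbf{x}+\mathbf{b}}}_{\text{MA terms}} \;+\; e_{t,\mathbf{x}}$$

The strength of the autoregressive components is measured by $\phi_{k,\mathbf{b}}$, and the strength of the shared moving average stochastic inputs by $\theta_{k,\mathbf{b}}$.

• Model types:
  – STAR model (when $\theta_{k,\mathbf{b}} = 0$)
  – STMA model (when $\phi_{k,\mathbf{b}} = 0$)
  – Mixed model (when $\phi_{k,\mathbf{b}} \neq 0$ and $\theta_{k,\mathbf{b}} \neq 0$)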

A Useful Form for STARMA Modeling

• By introducing the spatial weight matrices $W^{(l)}$, we can express the general STARMA model in the following form:

$$\mathbf{z}_t = \sum_{k}\sum_{l} \phi_{kl}\, W^{(l)} \mathbf{z}_{t-k} \;-\; \sum_{k}\sum_{l} \theta_{kl}\, W^{(l)} \mathbf{e}_{t-k} \;+\; \mathbf{e}_t$$

where
  – $l$: spatial lag; $k$: temporal lag;
  – $\mathbf{z}_t$ is the observation vector at time $t$;
  – $W^{(l)}$ is the weight matrix for the $l$-th order;
  – $\phi_{kl}$ are the parameters of the autoregressive terms;
  – $\theta_{kl}$ are the parameters of the moving average terms;
  – $\mathbf{e}_t$ is the random noise vector at time $t$.

• This is the equation actually used for the implementation of IEAST and applications.
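The weight-matrix form lends itself to a direct simulation loop. The following is a minimal sketch in Python with NumPy, not IEAST code: the function name `simulate_starma`, the dictionary layout of the parameters, the convention that `weights[0]` is the identity, and the ring-shaped example lattice with illustrative coefficients are all assumptions made for illustration.

```python
import numpy as np

def simulate_starma(phi, theta, weights, n_steps, sigma=1.0, seed=0):
    """Simulate z_t = sum phi[k,l] W(l) z_{t-k} - sum theta[k,l] W(l) e_{t-k} + e_t.

    phi, theta : dicts mapping (k, l) -> coefficient (k >= 1 temporal lag, l >= 0 spatial lag)
    weights    : list of (N, N) arrays, weights[l] = W(l); weights[0] is assumed to be the identity
    """
    rng = np.random.default_rng(seed)
    n_sites = weights[0].shape[0]
    max_k = max([k for k, _ in list(phi) + list(theta)], default=0)
    # The first max_k rows are zero start-up values and are dropped on return.
    z = np.zeros((n_steps + max_k, n_sites))
    e = rng.normal(0.0, sigma, size=(n_steps + max_k, n_sites))
    for t in range(max_k, n_steps + max_k):
        z[t] = e[t]
        for (k, l), coef in phi.items():      # autoregressive terms
            z[t] += coef * weights[l] @ z[t - k]
        for (k, l), coef in theta.items():    # moving average terms (minus sign convention)
            z[t] -= coef * weights[l] @ e[t - k]
    return z[max_k:], e[max_k:]

# Hypothetical example: 6 sites on a ring; first-order neighbors are the two adjacent sites.
n = 6
W0 = np.eye(n)
W1 = np.zeros((n, n))
for i in range(n):
    W1[i, (i - 1) % n] = 0.5
    W1[i, (i + 1) % n] = 0.5
phi = {(1, 0): 0.50, (1, 1): 0.30, (2, 0): 0.10, (2, 1): 0.05}
z, e = simulate_starma(phi, {}, [W0, W1], n_steps=200)
print(z.shape)
```

Passing an empty `theta` gives a pure STAR process, and an empty `phi` gives a pure STMA process.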

Spatial Correlation Structure and Weight Matrices

• Spatial weight matrices are used to construct the spatial correlation structure among locations.

• The following ordering is an example of the definition of spatial correlation structure (up to 4th order neighbors) in 2D system.

[Figure panels: 1st order, W(1); 2nd order, W(2); 3rd order, W(3); 4th order, W(4)]
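A rough sketch of how such weight matrices could be built for a rectangular lattice is given below. It assumes the neighbor ordering by squared lattice distance that matches the order map shown on the later "Spatial Correlation Structure and Trends" slide (offsets (0,1), (1,1), (0,2), (1,2) for orders 1 to 4), uniform weighting, and an edge rule that renormalizes over the neighbors inside the lattice. The function name `lattice_weights` and the edge rule are illustrative assumptions, not taken from IEAST.

```python
import numpy as np

# Squared lattice distance -> neighbor order, for orders 1 to 4:
# offset (0,1) -> 1st, (1,1) -> 2nd, (0,2) -> 3rd, (1,2) -> 4th.
ORDER_BY_SQ_DIST = {1: 1, 2: 2, 4: 3, 5: 4}

def lattice_weights(n_rows, n_cols, max_order=4):
    """Return [W(0), W(1), ..., W(max_order)] for an n_rows x n_cols lattice.

    Sites are numbered row by row. Each row of W(l) spreads its weight uniformly
    over the l-th order neighbors that fall inside the lattice (assumed edge rule).
    """
    n = n_rows * n_cols
    weights = [np.eye(n)] + [np.zeros((n, n)) for _ in range(max_order)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in range(-2, 3):
                for dc in range(-2, 3):
                    order = ORDER_BY_SQ_DIST.get(dr * dr + dc * dc)
                    if order is None or order > max_order:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        weights[order][i, rr * n_cols + cc] = 1.0
    for W in weights[1:]:
        # Row-normalize; rows without any neighbor of this order stay all zero.
        row_sums = W.sum(axis=1, keepdims=True)
        np.divide(W, row_sums, out=W, where=row_sums > 0)
    return weights

W = lattice_weights(5, 5)
center = 2 * 5 + 2
print([int(np.count_nonzero(W[l][center])) for l in range(5)])
```

For the center cell of a 5x5 lattice this gives 4, 4, 4, and 8 neighbors at orders 1 to 4, each sharing the unit row weight equally.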

Some Limitations of STARMA Modeling

• Raster based
• Requires massive amount of space-time data
• Models generally may not be fully mechanistic

Assumptions:
• Stationarity
• “Spatial Regularity”
• Effects are “constant”
• Effects are “linearly” correlated

Methods for STARMA Modeling and Software IEAST

Box-Jenkins Modeling Method

[Flowchart: Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? Yes: End. No: Modify Model.]

Model Identification

To determine the model type and orders.

• Conventionally, space-time autocorrelations (i.e. STACF/STPACF) are used (Pfeifer and Deutsch, 1980).

• In this research, space-time extensions of model fitness measures (i.e. AIC, BIC) are used to assist identification when the method above does not work. These measures are more objective and computationally efficient.
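The slides do not give the exact space-time forms of AIC and BIC, so the sketch below falls back on the common Gaussian-likelihood forms, treating every site-time residual as one observation. The function name `fitness_measures` and that pooling choice are assumptions for illustration only.

```python
import numpy as np

def fitness_measures(residuals, n_params):
    """Residual variance, AIC and BIC for a fitted model (textbook Gaussian forms, assumed)."""
    residuals = np.asarray(residuals, dtype=float)
    n_obs = residuals.size                      # all sites and time steps pooled
    sigma2 = np.mean(residuals ** 2)
    aic = n_obs * np.log(sigma2) + 2 * n_params
    bic = n_obs * np.log(sigma2) + n_params * np.log(n_obs)
    return {"variance": sigma2, "AIC": aic, "BIC": bic}

# Hypothetical comparison: the larger model fits slightly better but pays for 6 more parameters.
rng = np.random.default_rng(0)
small = fitness_measures(rng.normal(0, 1.00, size=(50, 100)), n_params=4)
large = fitness_measures(rng.normal(0, 0.99, size=(50, 100)), n_params=10)
print(small["BIC"], large["BIC"])
```

Candidate models would be ranked by choosing the smallest AIC or BIC; the residual variance alone carries no penalty for the number of parameters.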

Model Identification—using Space-Time Autocorrelation Functions

Example 1: STAR (MaxT=2, MaxS=1)
  – STACF tails off
  – STPACF cuts off at T-lag=2 and S-lag=1

Example 2: STMA (MaxT=1, MaxS=1)
  – STACF cuts off at T-lag=1 and S-lag=1
  – STPACF tails off

[Figures: STACF and STPACF plots for each example]

| STACF | STPACF | Suggested Model Type |
|---|---|---|
| Tail-off | Cut-off | STAR model |
| Cut-off | Tail-off | STMA model |
| Tail-off | Tail-off | Mixed model |
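To read tail-off and cut-off patterns, the sample STACF has to be computed first. Below is a hedged sketch of the sample space-time autocorrelation between l-th order neighbors at time lag s, following the Pfeifer and Deutsch style of definition as understood here; the function name `stacf`, the array layout, and the ring-shaped test lattice are illustrative assumptions, and this is not the IEAST implementation.

```python
import numpy as np

def stacf(z, weights, max_t_lag):
    """Sample space-time autocorrelation rho[l, s].

    z       : (T, N) array, one row per time step (assumed mean-removed)
    weights : list of (N, N) arrays, weights[l] = W(l), weights[0] = identity
    """
    T = z.shape[0]
    lagged = [z @ W.T for W in weights]        # row t holds W(l) z_t
    var0 = np.sum(lagged[0] ** 2)
    rho = np.zeros((len(weights), max_t_lag + 1))
    for l, wz in enumerate(lagged):
        scale = np.sqrt(np.sum(wz ** 2) * var0)
        for s in range(max_t_lag + 1):
            # Pair W(l) z_t with z_{t+s} over all t where both exist.
            rho[l, s] = np.sum(wz[: T - s] * z[s:]) / scale
    return rho

# Hypothetical check on white noise over a 6-site ring: only rho[0, 0] should be large.
rng = np.random.default_rng(0)
n = 6
W1 = np.zeros((n, n))
for i in range(n):
    W1[i, (i - 1) % n] = W1[i, (i + 1) % n] = 0.5
z = rng.normal(size=(200, n))
rho = stacf(z, [np.eye(n), W1], max_t_lag=3)
print(rho.round(2))
```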

Model Identification— using Space-Time Autocorrelation Functions

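Simulation Data 1, based on a STAR process:

$$\mathbf{z}_t = 0.50\,\mathbf{z}_{t-1} + 0.30\,W^{(1)}\mathbf{z}_{t-1} + 0.10\,\mathbf{z}_{t-2} + 0.05\,W^{(1)}\mathbf{z}_{t-2} + \mathbf{e}_t$$

[Figure: STACF (tail-off) and STPACF (cut-off)]

Simulation Data 2, based on a STMA process:

$$\mathbf{z}_t = \mathbf{e}_t - (-0.6)\,\mathbf{e}_{t-1} - (-0.4)\,W^{(1)}\mathbf{e}_{t-1}$$

[Figure: STACF (cut-off) and STPACF (tail-off)]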

Model Identification— using Model Fitness Measures

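Accuracies of model type selection using (1) variance of residuals, (2) AIC, (3) BIC, and (4) –AIC*BIC, based on 150 Monte Carlo simulated datasets. Rows give the model the datasets are based on; columns give the model type identified. The accuracies (shown in red on the slide) are the diagonal entries.

Using Variance of Residuals

| Datasets based on | STAR | STMA | Mixed |
|---|---|---|---|
| STAR | 4% | 0% | 96% |
| STMA | 4% | 6% | 90% |
| Mixed | 8% | 6% | 90% |

Using AIC

| Datasets based on | STAR | STMA | Mixed |
|---|---|---|---|
| STAR | 16% | 0% | 84% |
| STMA | 4% | 6% | 90% |
| Mixed | 8% | 2% | 90% |

Using BIC

| Datasets based on | STAR | STMA | Mixed |
|---|---|---|---|
| STAR | 100% | 0% | 0% |
| STMA | 4% | 86% | 10% |
| Mixed | 18% | 16% | 66% |

Using –AIC*BIC

| Datasets based on | STAR | STMA | Mixed |
|---|---|---|---|
| STAR | 100% | 0% | 0% |
| STMA | 4% | 78% | 18% |
| Mixed | 16% | 4% | 80% |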

Parameter Estimation

To calculate coefficients of a candidate model for given model type and orders.

• Two methods are needed for the two kinds of models:
  – Linear models (i.e. STAR): linear ML estimator.
  – Non-linear models (i.e. STMA and Mixed): multivariate nonlinear optimization.

• The multivariate and non-linear nature of the problem causes difficulties during optimization:
  – Convergence to local optima
  – Very time-consuming

• A good starting point is crucial for optimization
  – Extra step: ‘Pre-estimation’
  – The space-time extended Hannan-Rissanen algorithm is used.
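For the linear (STAR) case, the estimation step can be sketched as an ordinary least-squares regression of each observation on its space-time lagged values; under Gaussian noise this coincides with the conditional maximum likelihood estimate. The function name `fit_star`, the array layout, and the simulated ring lattice below are assumptions for illustration, not the IEAST estimator itself.

```python
import numpy as np

def fit_star(z, weights, max_t, max_s):
    """Least-squares estimate of phi[k-1, l] in z_t = sum_k sum_l phi_kl W(l) z_{t-k} + e_t.

    z : (T, N) array; weights[l] = W(l) with weights[0] the identity.
    Returns (phi_hat with shape (max_t, max_s + 1), residuals with shape (T - max_t, N)).
    """
    T, N = z.shape
    columns = []
    for k in range(1, max_t + 1):
        for l in range(max_s + 1):
            lagged = z[max_t - k : T - k] @ weights[l].T   # W(l) z_{t-k} for t = max_t .. T-1
            columns.append(lagged.ravel())
    X = np.column_stack(columns)
    y = z[max_t:].ravel()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = (y - X @ coef).reshape(T - max_t, N)
    return coef.reshape(max_t, max_s + 1), resid

# Hypothetical check: simulate a STAR(1,1) process on a 10-site ring and recover its parameters.
rng = np.random.default_rng(1)
n = 10
W0, W1 = np.eye(n), np.zeros((n, n))
for i in range(n):
    W1[i, (i - 1) % n] = W1[i, (i + 1) % n] = 0.5
z = np.zeros((5000, n))
for t in range(1, 5000):
    z[t] = 0.5 * z[t - 1] + 0.3 * W1 @ z[t - 1] + rng.normal(size=n)
phi_hat, resid = fit_star(z, [W0, W1], max_t=1, max_s=1)
print(phi_hat.round(2))
```

The returned residuals can feed the diagnostic check, and for STMA or mixed models an estimate of this kind would only serve as a starting point for the nonlinear optimization.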

Diagnostic Check

• To decide the adequacy of a candidate model for representing the given data.

• Methods:
  – Variance of residuals
  – Space-time autocorrelations of residuals
  – Significance testing of parameters
  – Space-time extension of AIC/BIC

Modeling Procedures

[Flowchart: Box-Jenkins method. Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? Yes: End. No: Modify Model.]

Software for STARMA Modeling -- IEAST

• Developed using GNU Octave v2.1.40; runs on various popular operating systems, e.g. MS Windows, Mac OS, Unix.
• Two interfaces: menu-driven mode and programming mode.
• Features:
  – True spatio-temporal analysis software
  – Analyzing 2D lattice space-time datasets
  – Full configurability
  – Programming environment
  – Improved estimation algorithms
  – Improved diagnostic measures
  – Estimation of spatial correlation structure
  – Cross correlation analysis
  – 2D/3D plotting abilities

IEAST —Menu-Driven Mode vs Programming Mode

[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] ===============
 [ 1] Setup
 [ 2] Data Preprocessing
 [ 3] Correlation Analyses
 [ 4] Model Identification
 [ 5] Parameter Estimation
 [ 6] Diagnostic Analysis
 [ 7] ------
 [ 8] Preference
 [ 9] Interpreter
 [10] Exit
=============================================

============== [ Setup ] ==============
 [ 1] > Space-time dataset
 [ 2] > Spatial correlation structure
 [ 3] > Information of datasets
 [ 4] > Return
=======================================

========= [ Data Preprocessing ] =========
 [ 1] > Remove Mean
 [ 2] > De-seasonalize: (1-B^dd)Z(t)
 [ 3] > Diference by one: (1-B)Z(t)
 [ 4] > De-trend
 [ 5] > ------
 [ 6] > Subsequencing/Resampling
 [ 7] > Smoothing
 [ 8] > Missing Data
 [ 9] > Filter with a given STARMA model
 [10] > Undo previous action
 [11] > Return
==========================================

========== [ Correlation Analyses ] ==========
 [ 1] > AutoCorrelation (STACF)
 [ 2] > Partial AutoCorrelation (STPACF)
 [ 3] > Cross Correlation (STXCF)
 [ 4] > Partial Cross Correlation (STPXCF)
 [ 5] > Extended Cross Correlation (ExtSTXCF)
 [ 6] > Plot Correlations versus T-Lag/S-Lag
 [ 7] > Return
==============================================

==============================================
[ Model Identification ]
 [ 1] Automatic Identification (Type,Orders)
 [ 2] Artificial Identification (Type,Orders)
 [ 3] Parameter Masking
 [ 4] ------
 [ 5] Return
==============================================

=================== [ Parameter Estimation ] ===================
 [ 1] > Pre-estimate Model Param -- Linear (STAR)
 [ 2] > Pre-estimate Model Param -- Non-linear (STMA,STARMA)
 [ 3] > Pre-estimate Model Param -- From STACF/STPACF
 [ 4] > Pre-estimate Model Param -- Specified by users
 [ 5] > Estimate Model Param -- Fixed SRM
 [ 6] > Estimate SRM -- Fixed Model Param
 [ 7] > Estimate SRM & Model Param -- Alternatively
 [ 8] > Return
================================================================

==== [ Diagnostic Analysis ] ====
 [ 1] > Statistical Significance
 [ 2] > AICC/BIC Analysis
 [ 3] > STACF of Residuals
 [ 4] > STPACF of Residuals
 [ 5] > ------
 [ 6] > Return
=================================

# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf ST_ACF Z 16 3
40 plotacf ST_ACF 16 3 "ACF"
 : : :

In menu-driven mode, users can conduct the modeling procedure by selecting a series of commands/options from the menu hierarchy.

IEAST —Menu-Driven Mode vs Programming Mode

[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] =============
 [ 1] Setup
 : :
 [ 8] Preference
 [ 9] Interpreter
 [10] Exit
=============================================

=============================================
|| Welcome to STARMA analyzing interpreter ||
=============================================

# load program demo.pgm
# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
………
100 end
# run

IEAST Program ‘demo.pgm’:

10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
…….

Space-time Dataset ‘demo.dat’:

# name: DatafileZ
# type: matrix
# rows: 100
# columns: 100
 -0.0350001 0.00197952 -0.00635348....
 -0.0886448 0.0504684 -0.00369402....
 0.025101 0.00844576 -0.00743455....
…………………..

Spatial Weighting Matrices ‘uniform.wet’:

# name: SOD
# type: global matrix
# rows: 21
# columns: 21
 0 0 0 0 0 0 0 0….
 0 0 0 0 0 0 0 0….
……………….

In programming mode, a set of sophisticated instructions can be used to compose programs to control the modeling flow and to conduct statistical analyses.

Modeling Emerging Infectious Diseases using STARMA and IEAST

State of the Art for Statistical Analyses of Emerging Infectious Diseases

As far as we know, no true spatial-temporal statistical models and methods have been used.

• Space-time cluster analysis available (Theophilides et al, 2003; Mostashari et al, 2003; Hoebe et al, 2004)

• Spatial models available (Watson et al, 2004).

• Temporal models available.

Limitations of Simply Observing How a Spatial Distribution Changes over Time

• For example, expansion of the leading edge of a disease range.

• Is the disease spreading directly over long distances but infrequently, or over short distances frequently?

• This is important for projecting the future spread.

STARMA Has Potential for the Early Characterization of Infectious Diseases.

• STARMA acts as a “prism”: it can filter the spatial-temporal correlations into direct effects with known magnitude and known spatial and temporal lags.

• Not generally a complete, mechanistic model, but puts critical constraints on models.

West Nile Virus

The West Nile Virus (WNV) was first detected in a woman with a mild fever in the West Nile District of Uganda in 1937. Since then WNV has been spreading to North Africa, Europe, West and Central Asia, and the Middle East.

West Nile Virus in the United States

(A figure from the CDC web site)

• Outbreak in NYC in Sep 1999. The vector is Culex mosquitoes.
• Wild birds (89% are American crows) are the principal hosts. Humans, horses, etc. are incidental hosts.
• The incidence rate among crows is high. Infected crows almost always die (68%).

• Surveillance of dead crows has been used as an indicator of WNV epidemics.

Dead Crow Data (DCD) & Human Case Datasets (HCD) in 2002

Time: Summer of 2002 (April to October)

Place: Detroit metro area (Oakland, Macomb, and Wayne counties)

• DCD were collected systematically before and during an outbreak among humans. Data mainly consisted of locations and dates of reported public sightings.

• HCD were obtained from clinicians in Michigan. Data on address of residence and date of onset of disease were obtained from the case-patient or attending physician through telephone interviews.

Two Datasets Collected in 2002

[Diagram: data collection flow. Dead Crows* (reported via WWW pages and a toll-free number) and Human Cases (interview) → Data Cleaning & Geocoding (GIS - ArcMap) → Longitude/Latitude]

* From www.rci.rutgers.edu/~insects/crowid.htm

Space-Time Analysis for Dead Crow Data

The Dead Crow Data

• In total, 1817 dead crow sightings scattered within the three counties (red lines), spanning 28 weeks.

• Covered area (after truncation): a rectangular area of 31.6 x 25.8 mi
• The covered area is divided into 10x10 cells. Cell size: 3.16 x 2.58 mi
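Turning point sightings into a lattice dataset amounts to a three-dimensional histogram over week, row, and column. The sketch below shows one way to do that with NumPy, assuming coordinates already expressed in miles relative to the corner of the study rectangle; the function name `grid_counts` and the sample points are hypothetical.

```python
import numpy as np

def grid_counts(x, y, week, extent, n_cells=10, n_weeks=28):
    """Count events per (week, row, column) cell.

    x, y   : coordinates of each sighting, in the same units as extent
    week   : integer week index of each sighting, 0 .. n_weeks - 1
    extent : (x_min, x_max, y_min, y_max) of the rectangular study area;
             sightings outside it are dropped (truncation)
    """
    x_min, x_max, y_min, y_max = extent
    sample = np.column_stack([week, y, x])
    bins = [np.arange(n_weeks + 1) - 0.5,              # one bin centered on each week index
            np.linspace(y_min, y_max, n_cells + 1),
            np.linspace(x_min, x_max, n_cells + 1)]
    counts, _ = np.histogramdd(sample, bins=bins)
    return counts

# Hypothetical sightings in a 31.6 x 25.8 mi rectangle, so each cell is 3.16 x 2.58 mi.
x = np.array([1.0, 30.0, 30.0])
y = np.array([1.0, 25.0, 25.0])
week = np.array([0, 27, 27])
counts = grid_counts(x, y, week, extent=(0.0, 31.6, 0.0, 25.8))
print(counts.shape, counts.sum())
```

Reshaping `counts` to `(n_weeks, n_cells * n_cells)` gives one observation vector per week, which is the layout the weight-matrix form of the model works on.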

Spatial Correlation Structure and Trends

Spatial correlation structure (uniform weighting):

*65556*
6543456
5421245
5310135
5421245
6543456
*65556*

Preprocessing
  – Remove spatio-temporal trend
    • Spatial trend: 4th order polynomial regression trend surface
    • Temporal trend: averaging over space.
  – Remove mean
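The preprocessing steps can be sketched as follows. The slides do not say in which order the trends are removed or what exactly the polynomial surface is fitted to, so this sketch assumes: subtract the spatial average at each time step, fit a polynomial surface of total degree 4 to the time-averaged map by least squares, subtract it, then remove the overall mean. The function name `detrend` is hypothetical.

```python
import numpy as np

def detrend(counts, degree=4):
    """Remove temporal and spatial trends from a (T, n_rows, n_cols) array (assumed procedure)."""
    T, n_rows, n_cols = counts.shape
    # Temporal trend: the spatial average at each time step.
    data = counts - counts.mean(axis=(1, 2), keepdims=True)
    # Spatial trend: polynomial surface of the given total degree, fitted to the time-averaged map.
    rows, cols = np.meshgrid(np.linspace(-1, 1, n_rows),
                             np.linspace(-1, 1, n_cols), indexing="ij")
    terms = [rows.ravel() ** i * cols.ravel() ** j
             for i in range(degree + 1) for j in range(degree + 1 - i)]
    X = np.column_stack(terms)
    mean_map = data.mean(axis=0).ravel()
    coef, *_ = np.linalg.lstsq(X, mean_map, rcond=None)
    data = data - (X @ coef).reshape(1, n_rows, n_cols)
    # Finally remove whatever overall mean is left.
    return data - data.mean()

# Hypothetical check: a field that is exactly "time trend + polynomial surface" detrends to zero.
r = np.linspace(-1, 1, 10)[:, None]
c = np.linspace(-1, 1, 10)[None, :]
field = np.arange(5)[:, None, None] + r ** 2 + c
out = detrend(field)
print(np.abs(out).max())
```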

Model Identification — STACF

[Figure: STACF of the dead crow data]

The STACF tails off.

Model Identification — STPACF

The STACF/STPACF suggest the model STAR(maxT=3, maxS=4).

[Figure: STPACF of the dead crow data, annotated “Temporally cut-off after this lag” and “Spatially cut-off after this lag”]

Parameter Estimation

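The parameters ($\phi_{ts}$) of this STAR model can be estimated in IEAST by the linear maximum likelihood estimator.

[Table: estimated $\phi_{ts}$ (magnitudes only; signs and color highlighting are lost in this copy)]

| | s=0 | s=1 | s=2 | s=3 | s=4 |
|---|---|---|---|---|---|
| t=1 | 0.26 | 0.36 | 0.10 | 0.09 | 0.04 |
| t=2 | 0.04 | 0.18 | 0.07 | 0.04 | 0.11 |
| t=3 | 0.02 | 0.11 | 0.02 | 0.02 | 0.04 |

• Values in dark blue are nominally significant at the 0.001 level.
• Values in light blue are nominally significant at the 0.01 level.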

Diagnostic Check

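• Statistical significance of parameters
  – The probabilities P that $\phi_{ts}$ are not significant are:

| P | s=0 | s=1 | s=2 | s=3 | s=4 |
|---|---|---|---|---|---|
| t=1 | 0.001 | 0.001 | 0.01 | 0.1 | 0.4 |
| t=2 | 0.04 | 0.001 | 0.04 | 0.3 | 0.01 |
| t=3 | 0.4 | 0.01 | 0.25 | 0.9 | 0.6 |

• Residual autocorrelations

[Figures: STACF and STPACF of the residuals]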

Interpretations for the DCD Analysis

• The STAR(3,4) model is the best-fitting one.
• The largest spatial and temporal lags that are important are smaller still: S=2 (or 6.4 km) and T=2 weeks.
• Compare S=1 to S=2: the value for S=1 is much larger (cell boundary length effects).
• The virus is not spreading very far very fast. Crows are not spreading the virus much spatially, though they are probably amplifying it locally.

• Negative autoregressive effect at S=1 and T=2,3.
  – Appears to be a real effect.
  – May be due to crow population depletion.
  – Suggests there is a mixture of two STAR processes, the dominant one reflecting probability of infection, the other an echo effect from depletion.

Additional Analyses and Results

Additional Analyses:
• Using 20x20 and other cell configurations
• Using different lag structures: “Pfeifer’s” vs. “Ring structure”
• Using various polynomials for spatial de-trending
• Using a sub-sample of the data

Results:
• Consistent over various methods of spatial de-trending, except that high-order polynomials resulted in smaller AR values.
• Consistent AR values using different lag structures and cell sizes.
• Consistent implied spatial and temporal scales over which there are significant or substantial AR effects.

Distances for Which There Is Significant Spatial Correlation

• Based on different cell configurations: 10x10, 16x16, and 20x20
  – The effective correlation distance in the modeling result is consistently about 10.75 km regardless of cell size.

| Configuration | Cell size | Max S order of the estimated model | Equivalent distance |
|---|---|---|---|
| 10 x 10 | 5.08 x 4.15 km | 4 | 10.99 km |
| 16 x 16 | 3.19 x 2.59 km | 6 | 10.88 km |
| 20 x 20 | 2.54 x 2.08 km | 7 | 10.38 km |

Alternative Spatial Correlation Structures

3333333

3222223

3211123

3210123

3211123

3222223

3333333

Ring structure

*65556*

6543456

5421245

5310135

5421245

6543456

*65556*

Pfeifer’s

Space-Time Analysis for Human Case Data

Human Case Data

• Over 500 human cases spanning 13 weeks

• Date of onset: converted to week

• Home addresses (names stripped): converted to “cell,” same as for DCD.

• Used same arrays of cell sizes and spatial correlation structures as for DCD.

• Same spatial and temporal de-trending method

Model Identification — STACF

Model Identification — STPACF

Parameter Estimation

Rows are temporal lags (weeks); columns are spatial lags.

| | s=0 | s=1 | s=2 | s=3 | s=4 | s=5 | s=6 |
|---|---|---|---|---|---|---|---|
| t=1 | 0.26 | 0.06 | -0.10 | -0.29 | -0.05 | -0.30 | -0.60 |
| t=2 | 0.12 | 0.27 | 0.13 | -0.12 | -0.11 | -0.22 | -0.11 |
| t=3 | 0.07 | 0.10 | -0.15 | 0.05 | 0.00 | 0.06 | -0.01 |
| t=4 | 0.04 | -0.17 | -0.07 | -0.02 | 0.16 | 0.25 | 0.11 |
| t=5 | -0.01 | -0.10 | -0.04 | 0.10 | -0.06 | 0.11 | 0.06 |
| t=6 | -0.04 | 0.08 | 0.09 | 0.03 | -0.03 | -0.19 | -0.09 |

• Values in dark blue are nominally significant at the 0.001 level.
• Values in light blue are nominally significant at the 0.01 level.

Diagnostic Check

• STACF and STPACF of the residuals

[Figures: STACF and STPACF of the residuals]

Interpretations for the HCD Analysis

• Most people are getting infected at or near their homes.
• The incidences are highly autocorrelated in space and time.
• The distribution or probability of infection is highly “localized”.
• The WNV “load” and probability of human infection are “spreading” slowly, in the sense of not spreading very far very fast.
• Suggests localized spraying could reduce cases.
• Without a depletion effect, the human case data show AR values that are positive and significantly above zero at T-lag=2 for S-lag>=1, especially at S-lag=1.

| | s=0 | s=1 | s=2 |
|---|---|---|---|
| t=1 | 0.26 | 0.06 | -0.10 |
| t=2 | 0.12 | 0.27 | 0.13 |
| t=3 | 0.07 | 0.10 | -0.15 |

Space-Time Cross Analysis for HCD and DCD

Space-Time Data HCD and DCD

• The areas for cross analysis are the same for both datasets.
• The configuration is again 10x10, spanning 28 weeks.
• Cell size is 6.31 x 6.31 km.

Both Temporal Epidemic Curves

Dead crow reports lead human cases in time.

Space-Time Cross Correlations

[Figure: space-time cross correlations between DCD and HCD, with the temporal lag of –3 weeks marked]

Interpretations for Space-Time Cross Correlations

• Drop smoothly to zero spatially and temporally.
• Very large (as high as 0.7).
• Across all spatial lags, the maximum cross correlations are aligned at –3 weeks.
• The cross correlation at spatial lag 1 is slightly greater than at spatial lag 0.
• When the temporal lag decreases to –8 or below, the correlations between the two datasets are negligible (<0.1).
• When the spatial lag increases up to 10, the cross correlations are reduced to as low as 0.2.

Are the Cross Correlations Spurious?

The autocorrelation of the DCD can spuriously contribute to cross correlations. To eliminate this effect, both datasets were pre-whitened before calculating cross correlations.

• The result shows that the ‘real’ cross correlations are much larger than the ‘spurious’ components.

[Figure: cross correlation with pre-whitening]
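A rough sketch of pre-whitening followed by a lagged space-time cross correlation is given below. It assumes that pre-whitening means filtering a series with a fitted STAR model and keeping the residuals, and that a positive lag pairs the first series at time t with the second at time t + lag; the slides do not state either convention, and the function names `prewhiten` and `st_cross_corr` are hypothetical.

```python
import numpy as np

def prewhiten(z, phi, weights):
    """Filter z with a STAR model: e_t = z_t - sum_{k,l} phi[k-1, l] W(l) z_{t-k}."""
    max_t = phi.shape[0]
    e = z[max_t:].copy()
    for k in range(1, max_t + 1):
        for l in range(phi.shape[1]):
            e -= phi[k - 1, l] * (z[max_t - k : len(z) - k] @ weights[l].T)
    return e

def st_cross_corr(x, y, W, t_lag):
    """Correlation between W x_t and y_{t + t_lag}; x and y are (T, N), assumed mean-removed."""
    T = x.shape[0]
    wx = x @ W.T
    if t_lag >= 0:
        a, b = wx[: T - t_lag], y[t_lag:]
    else:
        a, b = wx[-t_lag:], y[: T + t_lag]
    return np.sum(a * b) / np.sqrt(np.sum(wx ** 2) * np.sum(y ** 2))

# Hypothetical check: y repeats x three steps later, so the peak sits at t_lag = 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = np.zeros_like(x)
y[3:] = x[:-3]
I = np.eye(4)
print([round(st_cross_corr(x, y, I, s), 2) for s in range(-4, 5)])
```

In an analysis like the one described, both gridded series would be passed through `prewhiten` before `st_cross_corr` is evaluated over a range of spatial orders and temporal lags.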

Summary for Modeling the Spread of WNV

• Crows are not spreading the disease spatially very far very fast.

• Spread is very localized; perhaps other animals or the mosquitoes themselves are spreading it spatially.

• Humans are being infected largely at or near their homes.
• Both crows and humans appear to be responding to local viral loads.
• Dead crow findings precede human cases by two to three weeks. Dead crows can be a good indicator of human epidemics.

Conclusion

• It appears that STARMA modeling could be an important tool for the early characterization of many emerging and re-emerging infectious disease epidemics.

• During the course of an epidemic, it could be used (in principle) for forecasting, under existing conditions or under potential courses of action.

• While not generally a mechanistic model, STARMA does inform spatial and temporal scales of spread, hence places constraints on mechanistic models (which otherwise may have too many parameters).

Funding Acknowledgements

• Michigan Agricultural Experiment Station, Michigan State University.

• Center for Emerging Infectious Diseases, Michigan State University.

• Centers for Disease Control and Prevention, USA.

Thanks for your attention!

& Questions?

References

• C.J.P.A. Hoebe, H. de Melker, L. Spanjaard, J. Dankert, and N. Nagelkerke. Space-time cluster analysis of invasive meningococcal disease. Emerging Infectious Diseases, 10(9):1621-1626, 2004.

• C.N. Theophilides, S.C. Ahearn, S. Grady, and M. Merlino. Identifying West Nile virus risk areas: The dynamic continuous-area space-time system. American Journal of Epidemiology, 157:843-854, 2003.

• J. Watson, R. Jones, K. Gibbs, and W. Paul. Dead crow reports and location of human West Nile virus cases, Chicago, 2002. Emerging Infectious Diseases, 10(5):938-940, 2004.

• F. Mostashari, M. Kulldorff, J.J. Hartman, J.R. Miller, V. Kulasekera. Dead bird clustering: A potential early warning system for West Nile virus activity. Emerging Infectious Diseases, 9:641-646, 2003.
