40
Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical modeling for prospective surveillance: paradigm, approach, and methods Al Ozonoff * , Paola Sebastiani Boston University School of Public Health Department of Biostatistics [email protected] 3/20/06

Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40

Statistical modeling for prospective surveillance:paradigm, approach, and methods

Al Ozonoff∗, Paola Sebastiani

Boston University School of Public HealthDepartment of Biostatistics

[email protected]

3/20/06

Page 2: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Prospective surveillance

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 2 / 40

Page 3: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Surveillance defined

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 3 / 40

“Surveillance is the cornerstone of public health practice.”(Thacker, 2004)

• Surveillance : “The systematic collection, consolidation, analysisand dissemination of data in public health practice.” (Langmuir,1963)

• “The ongoing systematic collection, analysis, and interpretationof outcome-specific data for use in the planning, implementation,and evaluation of public health practice.” (Thacker, 2000)

• Broad definition supports a wide variety of surveillance practices.

Page 4: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Traditional surveillance

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 4 / 40

Traditional practice of surveillance has nearly 400 years of history.

• Focus on retrospective examination of data.

• Infectious disease: basis for outbreak investigation.

• Other health outcomes: allows study of trends and evaluation ofpolicy changes; control measures; public health practice.

• Hypothesis-generating activity.

Page 5: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Prospective surveillance

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 5 / 40

New (and evolving) paradigm for public health surveillance.

• More timely collection of data.

• Wider range of “outcome-specific data”.

• Hypothesis-testing activity.

• Prototype: “syndromic surveillance”.

• Principles embodied in newly-formed International Society forDisease Surveillance (ISDS).

Page 6: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Prospective surveillance

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 6 / 40

Some challenges currently facing prospective surveillance:

• Informatics: speedy (electronic) acquisition of data.

• Anomaly detection: near-real-time identification of outbreaks.

• False alarms: potential hypothesis testing on daily basis requiresstrict control of Type I error.

• Forecasting: modeling of underlying process for projection offuture patterns of disease.

Page 7: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Forecasting

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 7 / 40

Begin with some model that will yield one-step-ahead prediction.

• Accuracy of forecast will depend on model chosen.

• Fundamental paradigm: first, establish what is “normal”. Then,be vigilant for deviations from normal behavior. Focus onbehavior of one-step-ahead (or many-step-ahead) residuals.

• For prospective surveillance, measure of forecasting capability ispredictive accuracy (e.g. RMSE).

Page 8: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Forecasting

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 8 / 40

Anomaly detection:

• Relies on one-step-ahead residuals.

• Small residual ⇒ “normal” behavior.

• Large residual ⇒ deviation from normalcy.

• Performance of baseline model (reduction of residuals) isparamount.

• Relentless pursuit of forecasting ability may lead to models thatobscure underlying processes.

• Are such models robust to changing conditions?

Page 9: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Forecasting

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 9 / 40

Aside from anomaly detection: consider study of disease process,epidemiology/transmission of disease, and long-range forecasting.

• Careful selection of model should yield representation of someaspects of disease process.

• Residuals consist of effects not explained by model.

• “Random variability” simply an admission that model does notaccount for all observed variation.

• Must reach a balance between parsimony/interpretability andperformance. Not a new idea!

Page 10: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Forecasting

Prospective surveillance

• Surveillance defined

• Trad’l surv

• Prosp surv

• Forecasting

Influenza surveillance

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 10 / 40

Problem: life is complicated.

• Bench sciences: make clever choice of experimental design ormeasurement device.

• Surveillance: constrained by limitations of data. Must be evenmore clever.

• Influenza demonstrates rich, complex dynamics.

• Further confounded by human behavior, environmental factors.

Page 11: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Influenza surveillance

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 11 / 40

Page 12: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Models for influenza

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 12 / 40

Serfling’s method for influenza.

• Traditional approach: model respiratory illness as sinusoid(Serfling’s method).

• Problem: sinusoid fits data poorly during epidemic periods (i.e.winter-time increase in flu activity).

• Implication for prospective surveillance: decreased performance(i.e. lower power for detection of outbreaks) during winter months.

Page 13: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

National P+I mortality

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 13 / 40

Weekly P&I mortality 1990−1996

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 14: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

National P+I mortality

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 14 / 40

Weekly P&I mortality 1990−1996

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 15: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

National P+I mortality

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 15 / 40

Residuals from sinusoidal model

Year (starting Sep 1)

Res

idua

l

−20

0−

100

010

020

030

040

0

1990 1991 1992 1993 1994 1995 1996

Page 16: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Further motivation

Prospective surveillance

Influenza surveillance

• Models for influenza

• National P+I mortality

• Further motivation

Hidden Markov Models

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 16 / 40

Other reasons for developing more sophisticated models forinfluenza surveillance data:

• Prospects of novel strain (e.g. “avian flu” H5N1) emerging tocause pandemic illness. Could see new dynamics oftransmission, epidemiology.

• Preparedness: understand past pandemics to learn lessons forfuture events. Focus shifts back to disease process.

• Challenging problem: model spread of disease across space andtime. Current univariate models don’t seem to generalize well tospatio-temporal models.

• Seasonality of influenza not completely understood.

• Data sources beyond traditional influenza surveillance data areincreasingly becoming available.

Page 17: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Hidden Markov Models

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 17 / 40

Page 18: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Classical approach

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 18 / 40

• Serfling’s model: underlying seasonal baseline is roughlysinusoidal. May be driven by temp; annual patterns (e.g. schoolyear); dynamics of disease.

Yt = α0 + α1t + β1 sin (2πt

52) + β2 cos (

2πt

52) + ǫt

• Large deviations above this baseline indicate epidemic state.Integrating residuals allows calculation of “excess mortality” i.e.mortality attributed to influenza above what would be expected,accounting for seasonal variation.

• Performs well for what it is asked to do. Not good atone-step-ahead predictions, since model fit is poor duringepidemic state.

Page 19: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Other approaches

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 19 / 40

• Periodic regression with auto-regressive component (PARMA).Used in syndromic surveillance settings. Better model fit thanksto AR component.

• “Method of analogues”: non-parametric forecasting. Outperformsmany methods in one-step-ahead prediction (Viboud 2003).Non-parametric ⇒ ignores and obscures knowledge aboutmechanism of disease.

• Nuno and Pagano developing mixed models approach usingGaussian with phase shift as random effect. Also incorporatebimodal Gaussian for occasional dual-wave behavior.

Page 20: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Hidden Markov Models (HMMs)

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 20 / 40

Our approach: HMMs.

• ‘Hidden” (latent, unobserved) discrete random variable,representing some aspect of disease process.

• Observed variables are modeled, conditional upon the hiddenstate. Know which state ⇒ know distribution of observed randomvariable.

• Markov property: conditional probability of state change(transition probability) depends only on the value of latent state atprevious time point. Thus specify the Markov model for k stateswith a k× k matrix of transition probabilities, and the distributionsof the observed data conditional on the hidden state.

• Parameter estimation using Bayesian inference Using GibbsSampling (BUGS). Freeware available, e.g. WinBUGS, BRUGS.

Page 21: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

WinBUGS screen shot

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 21 / 40

Page 22: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Hidden Markov Models (HMMs)

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 22 / 40

Model fitting in WinBUGS:

• Sequence of hidden states is treated as a free parameter and fitsimultaneously with other model coefficients.

• Computational demanding for long time series; parameter spacehas order kn.

• Convergence via Gibbs sampling may be an issue, esp formisspecified models.

• Latent variable provides information about mechanism ofdisease. Epidemic and non-epidemic behavior can be modeledseparately.

Page 23: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Hidden Markov Models (HMMs)

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 23 / 40

Yt−1 Yt Yt+1x

x

x

−→ Ht−1 −→ Ht −→ Ht+1 −→

Yt are observed data i.e. weekly P&I counts.Ht are the hidden states (for us, 2-state model).Arrows indicate conditional dependencies.

Yt ∼ α0 + α1t + β1 sin (2πt

52) + β2 cos (

2πt

52)∣

∣Ht = 0

Yt ∼

(

α0 + αe

)

+ α1t + β1 sin (2πt

52) + β2 cos (

2πt

52)∣

∣Ht = 1

Page 24: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Evaluation

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 24 / 40

Evaluation scheme for outbreak detection:

• Systematically investigate various HMMs and evaluate (withother approaches) using RMSE on one-step-ahead predictions.

• Use fixed period (e.g. 1990-1994) to fit all models, andsubsequent year (1995) for predictions. Repeat on other timeperiods so evaluation is not dependent on time period chosen.“Virtual prospective surveillance” (Seigrist).

• Compare several HMMs; Serfling’s method; PARMA; working onimplementation of other methods.

Page 25: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Preliminary results

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 25 / 40

• Research supported by pilot funds from the Blood Center ofWisconsin. Fourth month of a 10 month funding period; resultsare preliminary.

• Presenting goodness-of-fit evaluation only; prospectiveevaluation in progress.

• First step: evaluate HMMs on 122 Cities data.

• Eventually, follow similar approach with influenza-like illness (ILI)data. Goal: predictive spatio-temporal models of influenzamorbidity.

Page 26: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Data

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 26 / 40

122 Cities influenza surveillance system:

• CDC operated program running continuously since 1962.

• Weekly counts attributed to pneumonia and influenza (P&I).Reporting lag of 2-3 weeks.

• Approx 25% coverage of U.S. pop’n. Used by CDC fordetermining epidemic influenza (Serfling).

• Age-specific counts available. 122 cities divided into 9administrative regions, roughly 14 cities per region.

• Limitations: difficult to accurately attribute deaths to influenza;mortality known to lag morbidity (e.g. ILI activity); dynamics maydiffer from morbidity (depending on circulating viral strains).

Page 27: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Models

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 27 / 40

1. Traditional cyclic model (Serfling). OLS regression with terms forintercept, linear trend, two periodic terms for sinusoid with phaseshift.

2. Periodic auto-regression (PARMA) with fixed order (1,0) fitscyclic model plus additional ARMA terms.

3. Naive 2-state HMM. Non-epidemic state follows Serfling.Epidemic state modeled with simple mean shift.

4. 2-state AR-HMM. Non-epidemic state, data follow PARMA.Epidemic state auto-regresses deviation from cyclic baseline.

Page 28: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Serfling

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 28 / 40

Serfling’s model

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 29: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Simple HMM

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 29 / 40

Simple HMM

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 30: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

PARMA

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 30 / 40

PARMA model

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 31: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

AR-HMM

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 31 / 40

AR−HMM

Year (starting Sep 1)

Cou

nt

600

800

1000

1200

1990 1991 1992 1993 1994 1995 1996

Page 32: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Residuals

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 32 / 40

Residuals − Serfling/HMM

Year (starting Sep 1)

Res

idua

l

−20

020

0

1990 1991 1992 1993 1994 1995 1996

Residuals − PARMA/AR−HMM

Year (starting Sep 1)

Res

idua

l

−20

020

0

1990 1991 1992 1993 1994 1995 1996

Page 33: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Residuals

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 33 / 40

Serfling PARMA HMM AR−HMM

−20

0−

100

010

020

030

040

0

Model residuals

Page 34: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Residuals

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 34 / 40

Both HMMs provide a roughly 25% reduction in RMSE from Serfling,roughly 10% reduction for PARMA.

Model RMSE

Serfling 83.3PARMA 72.0Simple HMM 63.7AR-HMM 60.4

Page 35: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

ACF of residuals

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 35 / 40

0 5 10 15 20 25−

0.2

0.2

0.6

1.0

Lag

AC

F

Serfling

0 5 10 15 20 25

0.0

0.4

0.8

Lag

AC

F

PARMA

0 5 10 15 20 25

0.0

0.4

0.8

Lag

AC

F

HMM

0 5 10 15 20 250.

00.

40.

8Lag

AC

F

AR−HMM

Page 36: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Conclusions

Prospective surveillance

Influenza surveillance

Hidden Markov Models

• Classical approach

• Other approaches

• HMMs

• Evaluation

• Preliminary results

• Data

• Models

• Model fits

• Residuals

• Conclusions

Future work

Al Ozonoff 3/20/06 DIMACS epi wg – 36 / 40

Preliminary conclusions from model-fitting:

• HMMs offer superior model fit during epidemic periods. ARcomponents do not offer much improvement during non-epidemicperiod.

• Both models with AR component eliminate auto-correlation ofresiduals. Important for use of control chart detection methods(e.g. Shewhart, CUSUM).

• Interpretability of latent variable (for two-state models) providesimmediate benefit beyond better fit.

• Many-state models (k > 2) prove difficult to fit for convergencereasons.

Page 37: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Future work

Prospective surveillance

Influenza surveillance

Hidden Markov Models

Future work

• Integration

• Diffusion of influenza

Al Ozonoff 3/20/06 DIMACS epi wg – 37 / 40

Page 38: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Integration

Prospective surveillance

Influenza surveillance

Hidden Markov Models

Future work

• Integration

• Diffusion of influenza

Al Ozonoff 3/20/06 DIMACS epi wg – 38 / 40

Bayesian methodology for integration of multiple time series:

• Developed for gene expression data.

• Bottom-up heuristic search to aggregate time series data;likelihood criterion using model specification to identify “clusters”of time series.

• Hypothesis: cluster assignments will vary over time; possiblydependent on circulating strain, point of origin; less evidence ofdiffusion in recent years.

Page 39: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Diffusion of influenza

Prospective surveillance

Influenza surveillance

Hidden Markov Models

Future work

• Integration

• Diffusion of influenza

Al Ozonoff 3/20/06 DIMACS epi wg – 39 / 40

Evidence for diffusion dynamics?

• Standardization of multiple time series to allow for directcomparison across geographic regions.

• Comparison of standardized counts across distances to quantifydiffusion over course of surveillance period. Use L2 norm,cross-correlation, for dissimilarity measure between series.

• Eventual goal: development of true spatio-temporal model forinfluenza activity.

Page 40: Statistical modeling for prospective surveillance ...dimacs.rutgers.edu/archive/SpecialYears/2002_Epid/EpidSeminarSlides/...Al Ozonoff 3/20/06 DIMACS epi wg – 1 / 40 Statistical

Acknowledgements

Prospective surveillance

Influenza surveillance

Hidden Markov Models

Future work

• Integration

• Diffusion of influenza

Al Ozonoff 3/20/06 DIMACS epi wg – 40 / 40

• Many thanks to Nina Fefferman, DIMACS, and working groupparticipants.

• The author gratefully acknowledges the contributions of hisco-author Dr. Sebastiani and research assistant SupornSukpraprut.

• Thanks also to collaborators at the Harvard School of PublicHealth: Marcello Pagano, Laura Forsberg, Caroline Jeffery, andMiriam Nuno.

• Willie Anderson of CDC (NCHS) graciously offered assistance inacquiring historical data in electronic format.

• Research partially supported by a pilot grant originating from theBlood Center of Wisconsin, via NIAID grant U19 AI62627-02.