Statistical Methods for Alerting Algorithms in Biosurveillance Howard S. Burkom The Johns Hopkins University Applied Physics Laboratory National Security Technology Department Washington Statistical Society Seminar February 3, 2006 National Center for Health Statistics Hyattsville, MD

Page 1

Statistical Methods for Alerting Algorithms in Biosurveillance

Howard S. Burkom
The Johns Hopkins University Applied Physics Laboratory
National Security Technology Department

Washington Statistical Society Seminar
February 3, 2006

National Center for Health Statistics
Hyattsville, MD

Page 2

ESSENCE Biosurveillance Systems

• ESSENCE: An Electronic Surveillance System for the Early Notification of Community-based Epidemics

• Monitoring health care data from ~800 military treatment facilities since Sept. 2001

• Evaluating data sources
  – Civilian physician visits
  – OTC pharmacy sales
  – Prescription sales
  – Nurse hotline/EMS data
  – Absentee rate data

• Developing & implementing alerting algorithms

Page 3

Outline of Talk

• Prospective Syndromic Surveillance: introduction, challenges
• Algorithm Evaluation Approaches
• Statistical Quality Control in Health Surveillance
• Data Modeling and Process Control
• Regression Modeling Approach
• Generalized Exponential Smoothing
• Comparison Study
• Summary & Research Directions

Page 4

Required Disciplines: Medical/Epi

Medical/Epidemiological
• filtering/classifying clinical records => syndromes
• interpretation/response to system output
• coding/chief complaint interpretation

Page 5

Required Disciplines: Informatics

Information Technology
• surveillance system architecture
• data ingestion/cleaning
• interface between health monitors and system

Page 6

Required Disciplines: Analytics

Analytical
• Statistical hypothesis tests
• Data mining/automated learning
• Adaptation of methodology to background data behavior

Page 7

Essential Task Interaction in Volatile Data Background

Medical/Epidemiological
• filtering/classifying clinical records => syndromes
• interpretation/response to system output
• coding/chief complaint interpretation

Information Technology
• surveillance system architecture
• data ingestion/cleaning
• interface between health monitors and system

Analytical
• Statistical hypothesis tests
• Data mining/automated learning
• Adaptation of methodology to background data behavior

Page 8

The Multivariate Temporal Surveillance Problem

Multivariate Nature of Problem:

• Many locations

• Multiple syndromes

• Stratification by age, gender, other covariates

Surveillance Challenges:

• Defining anomalous behavior(s)

– Hypothesis tests--both appropriate and timely

• Avoiding excessive alerting due to multiple testing

– Correlation among data streams

– Varying noise backgrounds

• Communication with/among users at different levels

• Data reduction and visualization

Varying Nature of the Data:

• Scale, trend, day-of-week, seasonal behavior depending on grouping

Page 9

Data issues affecting monitoring

– Statistical properties
  • Scale and random dispersion
– Periodic effects
  • Day-of-week effects, seasonality
– Delayed (often variably) availability in monitoring system
– Trends: long/short term: many causes, incl. changes in:
  • Population distribution or demographic composition
  • Data provider participation
  • Consumer health care behavior
  • Coding or billing practices
– Prolonged data drop-outs, sometimes with catch-ups
– Outliers unrelated to infectious disease levels
  • Often due to problems in data chain
  • Inclement weather
  • Media reports (example: the “Clinton effect”)

Most suitable for modeling without data-specific information

Page 10

Forming the Outcome Variable: Binning by Diagnosis Code

Page 11

Rash Syndrome Grouping of Diagnosis Codes

Rash ICD-9-CM Code List

ICD9CM   ICD9DESCR                                 Consensus
050.0    SMALL POX, VARIOLA MAJOR                  1
050.1    SMALL POX, ALASTRIM                       1
050.2    SMALL POX, MODIFIED                       1
050.9    SMALLPOX NOS                              1
051.0    COWPOX                                    1
051.1    PSEUDOCOWPOX                              1
052.7    VARICELLA COMPLICAT NEC                   1
052.8    VARICELLA W/UNSPECIFIED C                 1
052.9    VARICELLA NOS                             1
057.8    EXANTHEMATA VIRAL OTHER S                 1
057.9    EXANTHEM VIRAL, UNSPECIFI                 1
695.0    ERYTHEMA TOXIC                            1
695.1    ERYTHEMA MULTIFORME                       1
695.2    ERYTHEMA NODOSUM                          1
695.89   ERYTHEMATOUS CONDITIONS O                 1
695.9    ERYTHEMATOUS CONDITION N                  1
692.9    DERMATITIS UNSPECIFIED CA                 2
782.1    RASH/OTHER NONSPEC SKIN E                 2
026.0    SPIRILLARY FEVER                          3
026.1    STREPTOBACILLARY FEVER                    3
026.9    RAT-BITE FEVER UNSPECIFIED                3
051.2    DERMATITIS PUSTULAR, CONT                 3
051.9    PARAVACCINIA NOS                          3
053.20   HERPES ZOSTER DERMATITIS E                3
053.79   HERPES ZOSTER WITH OTHER SPECIF COMPLIC   3
053.8    H.Z. W/ UNSPEC. COMPLICATION              3
053.9    HERPES ZOSTER NOS W/O COM                 3
054.0    ECZEMA HERPETICUM                         3
054.79   HERPES SIMPLEX W/OTH.SPEC                 3
054.8    HERPES SIMPLEX, W/UNS.COM                 3
054.9    HERPES SIMPLEX NOS                        3
055.79   MEASLES COMPLICATION NEC                  3
055.8    MEASLES COMPLICATION NOS                  3
055.9    MEASLES UNCOMPLICATED                     3
056.79   RUBELLA COMPLICATION NEC                  3
056.8    RUBELLA COMPLICATION NOS                  3
056.9    RUBELLA UNCOMPLICATED                     3
057.0    ERYTHEMIA INFECT.(5TH DIS                 3
074.3    HAND/FOOT AND MOUTH DISEA                 3
078.0    MOLLUSCUM CONTAGIOSUM                     3
082.0    ROCKY MOUNTAIN SPOTTED FE                 3
083.2    RICKETTSIALPOX                            3
695.3    ROSACEA                                   3
695.4    LUPUS ERYTHEMATOSUS                       3

www.bt.cdc.gov/surveillance/syndromedef/word/syndromedefinitions.doc

Page 12

Chief Complaint Query

Simulated Data

Page 13

Dynamic Detection

Simulated Data

Page 14

Example with Detection Statistic Plot

[Figure: detection statistic plot with threshold line; annotation: injected cases presumed attributable to outbreak event.]

Page 15

Comparing Alerting Algorithms

Criteria:

• Sensitivity
  – Probability of detecting an outbreak signal
  – Depends on effect of outbreak in data

• Specificity (1 – false alert rate)
  – Probability(no alert | no outbreak)
  – May be difficult to prove no outbreak exists

• Timeliness
  – Once the effects of an outbreak appear in the data, how soon is an alert expected?
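The three criteria can be illustrated with a small scoring routine. This is only a sketch under stated assumptions: the function name `score_alerts` and its input layout (one list of daily alert decisions per simulated outbreak, plus one list for outbreak-free days) are hypothetical and not taken from the talk.

```python
from statistics import median

def score_alerts(outbreak_runs, background_alerts):
    """Summarize alerting performance (hypothetical helper).

    outbreak_runs: list of lists of booleans, one inner list per simulated
        outbreak, giving the alert decision on each day of the outbreak.
    background_alerts: list of booleans, the alert decision on each
        outbreak-free day.
    """
    # Delay in days from outbreak start to the first alert; None if missed.
    delays = [run.index(True) if True in run else None for run in outbreak_runs]
    detected = [d for d in delays if d is not None]

    sensitivity = len(detected) / len(outbreak_runs)
    false_alert_rate = sum(background_alerts) / len(background_alerts)
    return {
        "sensitivity": sensitivity,
        "specificity": 1.0 - false_alert_rate,
        # Timeliness is summarized over detected outbreaks only.
        "median_delay": median(detected) if detected else None,
    }
```

Timeliness here is the median delay among detected outbreaks; other summaries (mean delay, or a penalty for missed outbreaks) are equally plausible choices.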

Page 16

Modeling the Signal as Epicurve of Primary Cases

• Need “data epicurve”: time series of attributable counts above background

• Plausible to assume proportional to epidemic curve of infected

• Sartwell lognormal model gives idealized shape for a given disease type

[Figure: Observed vs Modeled Incubation Period Distribution: Sverdlovsk 1979 Outbreak. Number of cases (0 to 12) vs. days after exposure (0 to 50); series: observed, modeled.]

Sartwell, PE. The distribution of incubation periods of infectious disease. Am J Hyg 1950; 51:310-318

Page 17

Signal Modeling: Realizations of Smallpox Epicurve

“maximum likelihood” epicurve

Each symptomatic case a random draw
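A minimal sketch of how one epicurve realization might be drawn, treating each symptomatic case as an independent draw from a lognormal incubation-period distribution. The function name, the parameterization by median and dispersion factor, and any parameter values passed to it are illustrative assumptions, not figures from the talk.

```python
import numpy as np

def simulate_epicurve(n_cases, median_days, dispersion, n_days, seed=None):
    """Draw one epicurve realization: daily counts of symptomatic cases.

    Each case's incubation period is an independent lognormal draw with the
    given median (days) and dispersion factor (exp of the log-scale sd).
    """
    rng = np.random.default_rng(seed)
    incubation = rng.lognormal(mean=np.log(median_days),
                               sigma=np.log(dispersion), size=n_cases)
    onset_day = np.floor(incubation).astype(int)
    # Cases with onset beyond the observation window are dropped.
    onset_day = onset_day[onset_day < n_days]
    return np.bincount(onset_day, minlength=n_days)
```

Repeated calls with different seeds give different realizations of the same idealized curve; such counts could then be added to background data as an injected signal.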

Page 18

Assessing Algorithm Performance

Sensitivity/Specificity as a function of threshold: Receiver Operating Characteristic (ROC)
[Figure: detection probability (sensitivity) vs. false alert rate (1 – specificity), traced out by varying the threshold.]

Timeliness/Specificity as a function of threshold: Activity Monitor Operating Characteristic (AMOC)
[Figure: timeliness score (e.g. mean or median time to alert) vs. false alert rate (1 – specificity), traced out by varying the threshold.]

Summary processing: measure dependence of sensitivity or timeliness on false alert rate (ROC or AMOC curves or key sample values at practical rates)
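The threshold sweep behind ROC and AMOC curves can be sketched as follows. The function `roc_amoc` and its array layout are assumptions made for illustration: one array of detection statistics from outbreak-free days, and one row of statistics per simulated outbreak.

```python
import numpy as np

def roc_amoc(background_stats, outbreak_stats, thresholds):
    """Return (false_alert_rate, detection_prob, mean_delay) per threshold.

    background_stats: 1-D array of daily detection statistics, no outbreak.
    outbreak_stats: 2-D array, one row per simulated outbreak, one column
        per day since the outbreak signal began.
    """
    background_stats = np.asarray(background_stats, dtype=float)
    outbreak_stats = np.asarray(outbreak_stats, dtype=float)
    rows = []
    for h in thresholds:
        far = np.mean(background_stats > h)
        alerts = outbreak_stats > h
        hit = alerts.any(axis=1)
        # argmax gives the first alerting day in each detected run.
        delays = alerts.argmax(axis=1)[hit]
        # Mean delay is taken over detected outbreaks only.
        mean_delay = delays.mean() if hit.any() else np.nan
        rows.append((far, hit.mean(), mean_delay))
    return rows
```

Plotting detection probability against false alert rate gives ROC points; plotting mean delay against false alert rate gives AMOC points.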

Page 19

Detection Performance Comparison

[Figure: Fever_Labbaji, lognormal signal. Detection probability (0.00 to 1.00) vs. background recurrence (0 to 90 days); series: EWMA, EARS C2, EARS C3 (CUSUM).]

Page 20

Quality Control Charts and Health Surveillance

Benneyan JC, Statistical Quality Control Methods in Infection Control and Hospital Epidemiology, Infection Control and Hospital Epidemiology, Vol. 19, (3) 194-214
  Part I: Introduction and Basic Theory
  Part II: Chart use, statistical properties, and research issues

• 1998 survey article gives 135 references
• Many applications: monitoring surgical wound infections, treatment effectiveness, general nosocomial infection rate, …

Monitoring process for “special causes” of variation
• Organize data into fixed-size groups of observations
• Look for out-of-control conditions by monitoring mean, standard deviation, …
• General 2-phase procedure:
  Phase I: Determine mean μ, standard deviation σ of process from historical “in-control” data; control limits often set to 3σ
  Phase II: Apply control limits prospectively to monitor process graphically
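The two-phase procedure can be sketched as a simple Shewhart-style chart on individual observations. The function names and the default multiplier `k=3.0` are illustrative assumptions; the slide describes the procedure only in general terms.

```python
import numpy as np

def phase1_limits(in_control, k=3.0):
    """Phase I: estimate mean and standard deviation from in-control history
    and return (lower, upper) control limits at k standard deviations."""
    x = np.asarray(in_control, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    return mu - k * sigma, mu + k * sigma

def phase2_signals(new_values, limits):
    """Phase II: flag prospective observations outside the control limits."""
    lower, upper = limits
    x = np.asarray(new_values, dtype=float)
    return (x < lower) | (x > upper)
```

In this sketch the limits stay fixed after Phase I, which is exactly the assumption that health surveillance data tend to violate.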

Page 21

Adaptation of Traditional Process Control to Early Outbreak Detection

On adapting statistical quality control to biosurveillance:

Woodall, W.H. (2000). “Controversies and Communications in Statistical Process Control”, Journal of Quality Technology 32, pp. 341-378.

• “Researchers rarely…put their narrow contributions into the context of an overall SPC strategy. There is a role for theory, but theory is not the primary ingredient in most successful applications.”

Woodall, W.H. (2006, in press). “The Use of Control Charts in Health Care Monitoring and Public Health Surveillance”

• “In industrial quality control it has been beneficial to carefully distinguish between the Phase I analysis of historical data and the Phase II monitoring stage”

• “It is recommended that a clearer distinction be made in health-related SPC between Phase I and Phase II…”

Does infectious disease surveillance require an “ongoing Phase I” strategy to maintain robust performance?

Page 22

Statistical Process Control in Advanced Disease Surveillance

Key application issues:

• Background data characteristics change over time
  – Hospital/clinic visits, consumer purchases not governed by physical science, engineering
  – But monitoring requires robust performance: algorithms must be adaptive

• Target signal: effect of infectious disease outbreak
  – Transient signal, not a mean shift
  – May be sudden or gradual

Page 23

The Challenge of Data Modeling for Daily Health Surveillance

• Conventional scientific application of regression
  – Do covariates such as age, gender affect treatment? Does treatment success differ among sites if we control for covariates?
  – Studies use static data sets with exploratory analysis

• In surveillance, we model to predict data levels in the absence of the signal of interest
  – Need reliable estimates of expected levels to recognize abnormal levels
  – Data sets are dynamic: covariate relationships change

Page 24

The Challenge of Data Modeling for Daily Health Surveillance, cont’d

Modeling to generate expected data levels
  – Predictive accuracy matters, not just strength of association or overall goodness-of-fit
  – For a gradual outbreak, recent data can “train” model to predict abnormal levels

Alerting decisions based on model residuals
  Residual = observed value – modeled value

Conventional approach:
  – assume residuals fit a known distribution (normal, Poisson, …)
  – hypothesis test for membership in that distribution

For surveillance, can also apply control-chart methods to residuals

Page 25

Monitoring Data Series with Systematic Features

• Problem: How to account for short-term trends, cyclic data features in alerting decisions?

• Approaches
  – Data Modeling
    • Regression: GLM, ARIMA, others & combinations
  – Signal Processing
    • LMS filters and wavelets
  – Exponential Smoothing: generalizes EWMA

Page 26

Example: OTC Purchasing Behavior Influenced by Many Factors

Example: Tracking Daily Sales of Flu Remedies

Loglinear Regression

$\log(Y) = \beta_0 + \beta_{1\text{-}6}\,d + \beta_7\,t + \beta_{8\text{-}9}\,h + \beta_{10}\,w + \beta_{11}\,p + \varepsilon$

where:
Y = daily count of anti-flu sales
d = day of week (6 indicators)
t = linear trend
h = harmonic (seasonal)
w = weather (temp.)
p = sales promotion (indicator)
ε = deviation (Poisson dist.)

Page 27

Recent Surveillance Method Based on Loglinear Regression

Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance
Judith C Brillman, Tom Burr, David Forslund, Edward Joyce, Rick Picard and Edith Umland
BMC Medical Informatics and Decision Making 2005, 5:4, pp 1-14
http://www.biomedcentral.com/content/pdf/1472-6947-5-4.pdf

Modeling visit counts on day d:

Let S(d) = log(visits(day d) + 1), the “started log”

S(d) = [Σi ci × Ii(d)] + [c8 + c9 × d] + [c10 × cos(kd) + c11 × sin(kd)],

where k = 2π / 365.25 and
c1-c7: day-of-week effects
c9: long-term trend
c10-c11: seasonal harmonic terms

Training period: 3036 days ~ 8.33 years
Test period: 1 year
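A sketch of how the started-log model above might be fit and used for prediction. Several things here are assumptions rather than details from the cited paper: ordinary least squares as the fitting method, day of week taken as the day index modulo 7, a naive back-transform of the fitted log, and all function names.

```python
import numpy as np

K = 2 * np.pi / 365.25  # annual harmonic frequency

def design_matrix(days):
    """Columns: 7 day-of-week indicators, constant, trend, cos, sin."""
    idx = np.asarray(days, dtype=int)
    d = idx.astype(float)
    dow = np.eye(7)[idx % 7]              # I_1(d) .. I_7(d)
    return np.column_stack([dow, np.ones_like(d), d,
                            np.cos(K * d), np.sin(K * d)])

def fit_started_log(days, visits):
    """Least-squares fit of S(d) = log(visits + 1); returns c1..c11."""
    s = np.log(np.asarray(visits, dtype=float) + 1.0)
    # The 7 indicators plus a constant are collinear, so lstsq returns the
    # minimum-norm solution; fitted values are unaffected by that choice.
    coef, *_ = np.linalg.lstsq(design_matrix(days), s, rcond=None)
    return coef

def predict_visits(coef, days):
    """Back-transform the fitted started log to an expected visit count."""
    return np.exp(design_matrix(days) @ coef) - 1.0
```

Residuals for alerting would then be the observed counts minus `predict_visits` on the test days.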

Page 28

Brillman et al., Figure 1

Page 29

EWMA Monitoring

• Exponential Weighted Moving Average

• Average with most weight on recent $X_k$:

$S_k = \omega X_k + (1-\omega)\,S_{k-1}$, where $0 < \omega < 1$

• Test statistic: $S_k$ compared to expectation from sliding baseline

Basic idea: monitor $(S_k - \mu_k) / \sigma_k$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation estimated from the sliding baseline

[Figure: Exponential Weighted Moving Average. Daily count and smoothed series, 02/25/94 to 04/01/94; vertical axis 0 to 60.]

• Added sensitivity for gradual events
• Larger ω means less smoothing
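A minimal sketch of an EWMA test statistic against a sliding baseline. Here `omega` is taken to be the weight on the newest observation, so that a larger value means less smoothing; that notation, the baseline length, the guard gap, and the use of the asymptotic EWMA variance to scale the baseline standard deviation are all assumptions made for illustration.

```python
import numpy as np

def ewma_statistic(x, omega=0.4, baseline=28, guard=2):
    """Return z_k = (S_k - mu_k) / sigma_k for each day with a full baseline.

    omega: weight on the newest observation (larger -> less smoothing).
    mu_k: mean of the `baseline` days ending `guard` days before day k.
    sigma_k: sample sd of those days, rescaled to the EWMA's asymptotic sd.
    """
    x = np.asarray(x, dtype=float)
    z = np.full(len(x), np.nan)
    s = x[0]
    scale = np.sqrt(omega / (2.0 - omega))   # asymptotic sd ratio of EWMA
    for k in range(1, len(x)):
        s = omega * x[k] + (1.0 - omega) * s
        start = k - guard - baseline
        if start < 0:
            continue                          # not enough history yet
        window = x[start:k - guard]
        sd = window.std(ddof=1)
        if sd > 0:
            z[k] = (s - window.mean()) / (scale * sd)
    return z
```

An alert would be raised when the statistic exceeds a chosen threshold; the threshold itself is a tuning decision tied to the acceptable false alert rate.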

Page 30

EWMA Concept & Smoothing Constant

Brown, R.G. and Meyer, R.F. (1961), "The Fundamental Theorem of Exponential Smoothing," Operations Research, 9, 673-685.

• Exponential smoothing represents “an elementary model of how a person learns”:

$\hat{x}_k = \hat{x}_{k-1} + \omega\,(x_k - \hat{x}_{k-1})$, where $0 < \omega < 1$

• For the smoothed value $S_k$,

$S_k = \omega X_k + (1-\omega)\,S_{k-1}$,

the variance of $S_k$ approaches $\frac{\omega}{2-\omega}\,\sigma_X^2$ for large k

• So a smaller ω is preferred because it gives a more stable $S_k$; values between 0.1 and 0.3 often used

• But Chatfield: changes in global behavior will result in a larger optimal ω

Page 31

Generalized Exponential Smoothing

Holt-Winters Method: modeling level, trend, and seasonality

Forecast Function:

$\hat{y}_{n+k|n} = (m_n + k\,b_n)\,c_{n+k-s}$

where:
m_j = level at time j
b_j = trend at time j
c_j = periodic multiplier at time j
s = periodic interval
k = number of steps ahead

and m_j, b_j, c_j are updated by exponential smoothing

http://www.statistics.gov.uk/iosmethodology/downloads/Annex_B_The_Holt-Winters_forecasting_method.pdf

Page 32

Holt-Winters Updating Equations

Updating Equations, multiplicative method:

Level at time t:
$m_t = \alpha\,\frac{y_t}{c_{t-s}} + (1-\alpha)(m_{t-1} + b_{t-1}), \quad 0 < \alpha < 1$

Slope at time t:
$b_t = \beta\,(m_t - m_{t-1}) + (1-\beta)\,b_{t-1}, \quad 0 < \beta < 1$

Periodic multiplier at time t:
$c_t = \gamma\,\frac{y_t}{m_t} + (1-\gamma)\,c_{t-s}, \quad 0 < \gamma < 1$

The initial values m_0, b_0, c_0, …, c_{s-1} should be calculated from available data
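The multiplicative Holt-Winters recursion can be sketched in a few lines. The smoothing-parameter names `alpha`, `beta`, `gamma` (level, slope, periodic multiplier) and the initialization from the first period are assumptions chosen for illustration; the slide leaves the initialization open, and the series is assumed strictly positive.

```python
import numpy as np

def holt_winters_multiplicative(y, s, alpha, beta, gamma, k=1):
    """Return an array f where f[t + k] is the k-step forecast made at time t.

    Initialization (an assumption): level = mean of the first period,
    slope = 0, multipliers = first-period values divided by that mean.
    """
    if not 1 <= k <= s:
        raise ValueError("k must satisfy 1 <= k <= s")
    y = np.asarray(y, dtype=float)
    n = len(y)
    m = y[:s].mean()
    b = 0.0
    c = list(y[:s] / m)          # c[j] holds the multiplier for time j
    forecasts = np.full(n + k, np.nan)
    for t in range(s, n):
        m_prev = m
        m = alpha * y[t] / c[t - s] + (1 - alpha) * (m_prev + b)   # level
        b = beta * (m - m_prev) + (1 - beta) * b                   # slope
        c.append(gamma * y[t] / m + (1 - gamma) * c[t - s])        # multiplier
        # Forecast with the latest multiplier for the target season.
        forecasts[t + k] = (m + k * b) * c[t + k - s]
    return forecasts
```

For daily surveillance counts with a weekly pattern one would take `s = 7`; residuals `y[t] - f[t]` are then available for alerting from day `s + k` onward.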

Page 33

Forecasting Local Linearity: Automatic vs Nonautomatic Methods

Chatfield, C. (1978), "The Holt-Winters Forecasting Procedure," Applied Statistics, 27, 264-279.

Chatfield, C. and Yar, M. (1988), "Holt-Winters Forecasting: Some Practical Issues," The Statistician, 37, 129-140.

• “Modern thinking favors local linearity rather than global linear regression in time…”

• “Local linearity is also implicit in ARIMA modelling…”
  – Simple EWMA ~ ARIMA(0,1,1)
  – EWMA + trend ~ ARIMA(0,2,2)
  – Multiplicative Holt-Winters has no ARIMA equivalent

• “Practical considerations rule out [Box-Jenkins] if there are insufficient observations or …expertise available”
  – “Box-Jenkins… requires the user to identify an appropriate… [ARIMA] model”

For “fair” comparison of H-W to B-J, have both automatic or nonautomatic.

Assertion: The simplicity of H-W permits easier classification, requiring less historic data.

Can an automatic B-J give robust forecasting over a range of input series types?

Page 34

Regression vs Holt-Winters

[Figure, top panel: Results for Data Set 1, with DOW and seasonal variation; HW RMSE = 57.401, regression RMSE = 61.1454. Counts (0 to 600) vs. days (0 to 350); series: Raw Data, Holt Winters, Regression.]

[Figure, bottom panel: Residuals. Counts (-400 to 400) vs. days (0 to 350); series: Holt Winters, Regression.]

Ongoing study with Galit Shmueli, U. of MD, and Sean Murphy, JHU/APL

30 time series, 700 days’ data
• 5 cities
• 3 data types
• 2 syndromes
  – Respiratory: seasonal & day-of-week behavior
  – Gastrointestinal: day-of-week effects

Page 35

Temporal Aggregation for Adaptive Alerting

Data stream(s) to monitor in time:

baseline interval
Used to get some estimate of normal data behavior
• Mean, variance
• Regression coefficients
• Expected covariate distrib.
  -- spatial
  -- age category
  -- % of claims/syndrome

guardband
Avoids contamination of baseline with outbreak signal

test interval
• Counts to be tested for anomaly
• Nominally 1 day
• Longer to reduce noise, test for epicurve shape
• Will shorten as data acquisition improves
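The baseline / guardband / test-interval layout can be sketched as a sliding standardized comparison. The function name and the default interval lengths are illustrative assumptions; the statistic here uses only the baseline mean and variance, the simplest of the baseline summaries listed above.

```python
import numpy as np

def sliding_z_scores(counts, baseline=28, guardband=2, test_len=1):
    """Standardized difference of each test interval from its own baseline.

    For a test interval ending on day `end`, the baseline is the `baseline`
    days that stop `guardband` days before the test interval starts.
    The result is not rescaled for test_len > 1.
    """
    x = np.asarray(counts, dtype=float)
    z = np.full(len(x), np.nan)
    first = baseline + guardband + test_len - 1
    for end in range(first, len(x)):
        test_start = end - test_len + 1
        test = x[test_start:end + 1]
        base_stop = test_start - guardband
        base = x[base_stop - baseline:base_stop]
        sd = base.std(ddof=1)
        if sd > 0:
            z[end] = (test.mean() - base.mean()) / sd
    return z
```

Because the guardband days are excluded from the baseline, the early days of a gradual outbreak do not inflate the baseline mean that later days are compared against.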

Page 36

Candidate Methods

1. Global loglinear regression of Brillman et al.

2. Holt-Winters exponential smoothing
   Fixed sets of smoothing parameters for data:
   • with both day-of-week & seasonal behavior
   • with only day-of-week behavior

3. Adaptive Regression
   $\log(Y) = \beta_0 + \beta_{1\text{-}6}\,d + \beta_7\,t + \beta_8\,\mathrm{hol} + \beta_9\,\mathrm{posthol} + \varepsilon$
   56-day baseline, 2-day guardband
   β1-6 = day-of-week indicator coefficients
   β7 = centered ramp coefficient
   β8 = coefficient for holiday indicator
   β9 = coefficient for post-holiday indicator

1-day-ahead and 7-day-ahead predictions
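A rough sketch of the adaptive regression idea: refit the loglinear model on a sliding 56-day baseline and forecast a target day beyond a 2-day guardband. Assumptions not in the slide: least-squares fitting on log(y + 1) to guard against zero counts, day of week taken as the day index modulo 7, the scaling of the centered ramp, and the function name.

```python
import numpy as np

def adaptive_forecast(counts, holiday, post_holiday, target,
                      baseline=56, guardband=2):
    """Forecast counts[target] from a regression refit on a sliding baseline.

    counts, holiday, post_holiday: equal-length daily arrays; the two
    indicator arrays hold 0/1 values.
    """
    y = np.asarray(counts, dtype=float)
    hol = np.asarray(holiday, dtype=float)
    post = np.asarray(post_holiday, dtype=float)
    stop = target - guardband               # baseline ends before guardband
    if stop - baseline < 0:
        raise ValueError("not enough history for the baseline")
    idx = np.arange(stop - baseline, stop)

    def rows(days):
        days = np.atleast_1d(days)
        dow = np.eye(7)[days % 7][:, 1:]                 # 6 indicators
        ramp = (days - idx.mean()) / baseline            # centered ramp
        return np.column_stack([np.ones(len(days)), dow, ramp,
                                hol[days], post[days]])

    # lstsq tolerates all-zero holiday columns (minimum-norm solution).
    beta, *_ = np.linalg.lstsq(rows(idx), np.log(y[idx] + 1.0), rcond=None)
    return float(np.exp(rows(target) @ beta)[0] - 1.0)
```

Calling this once per day with an advancing `target` gives a series of forecasts whose residuals can be compared with those of the global regression and Holt-Winters.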

Page 37

Respiratory Visit Count Data

[Figure legend: Data, Holt-Winters, Regression, Adaptive Regr.]

All series display this autocorrelation; good test for published regression model

Page 38

GI Visit Count Data

[Figure legend: Data, Holt-Winters, Regression, Adaptive Regr.]

Page 39

Stratified Residual Comparisons

[Figure legend: Data, Holt-Winters, Regression, Adaptive Regr.]

Page 40

Mean Residual Comparison

• When mean residuals favor regression, difference is small, and this difference results from largest residuals

• If the holiday terms in adaptive regression are removed, H-W means uniformly smaller

Page 41

Median Residual Comparison

Page 42

Residual Autocorrelation Comparison

[Figure legend: Data, Holt-Winters, Regression, Adaptive Regr.]

Page 43

Residual Autocorrelation Comparison 1-Day Ahead Predictions

Page 44

Residual Autocorrelation Comparison 7-Day Ahead Predictions

Page 45

Summary

• Data-adaptive methods are required for robust prospective surveillance

• Appropriate algorithm selection requires an automated data classification methodology, often with little data history

• Statistical expertise is required to manage practical issues to maintain required detection performance as datasets evolve:
  – stationarity (causes rooted in population behavior, evolving informatics, others)
  – late reporting
  – data dropouts

Page 46

Research Directions

• Classification of time series for automatic forecasting
  – Easier for Holt-Winters than for Box-Jenkins?
  – Determining reliable discriminants:
    • Autocorrelation coefficients
    • Simple means/medians
    • Goodness-of-fit measures
  – How little startup data history required?

• Most effective alerting algorithm using residuals, given signal of interest
  – Apply control chart to residuals?
  – Need to detect both sudden, gradual signals
  – Detection performance constraints:
    • Minimum detection sensitivity
    • Maximum background alert rate