Practical Aspects of Alerting Algorithms in Biosurveillance

Howard S. Burkom
The Johns Hopkins University Applied Physics Laboratory
National Security Technology Department

Biosurveillance Information Exchange Working Group
DIMACS Program/Rutgers University, Piscataway, NJ
February 22, 2006




Outline

What information do temporal alerting algorithms give the health monitor?

How can typical data issues introduce bias or other misinformation?

How do spatial scan statistics and other spatiotemporal methods give the monitor a different look at the data?

What data issues are important for the quality of this information?


Conceptual approaches to Aberration Detection

What does ‘aberration’ mean? Different approaches for a single data source:

• Process control-based: “The underlying data distribution has changed” – many measures

• Model-based: “The data do not fit an analytical model based on a historical baseline” – many models

• Can combine these approaches

• Spatiotemporal Approach: “The relationship of local data to neighboring data differs from expectations based on model or recent history”


Comparing Alerting Algorithms

Criteria:

• Sensitivity
  – Probability of detecting an outbreak signal
  – Depends on the effect of the outbreak in the data

• Specificity (1 – false alert rate)
  – Probability(no alert | no outbreak)
  – It may be difficult to prove that no outbreak exists

• Timeliness
  – Once the effects of an outbreak appear in the data, how soon is an alert expected?
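When outbreak days are known, as with injected cases, the three criteria can be scored directly from a daily alert series. The sketch below is a hypothetical helper (`evaluate_alerts` is not from the talk) and uses one common set of day-level definitions: an outbreak counts as detected if any alert falls on a signal day, specificity is the fraction of outbreak-free days without an alert, and timeliness is the delay from the first signal day to the first alert. Averaging `detected` over many simulated outbreaks gives an estimate of sensitivity.

```python
from typing import Optional, Sequence


def evaluate_alerts(alerts: Sequence[bool], outbreak: Sequence[bool]) -> dict:
    """Score one daily alert series against a known (e.g. injected) outbreak.

    alerts[t]:   True if the algorithm alerted on day t
    outbreak[t]: True if outbreak cases are present in the data on day t
    """
    if len(alerts) != len(outbreak):
        raise ValueError("alerts and outbreak must have the same length")

    outbreak_days = [t for t, o in enumerate(outbreak) if o]
    quiet_days = [t for t, o in enumerate(outbreak) if not o]

    # Detection: was there any alert while the outbreak signal was in the data?
    detected = any(alerts[t] for t in outbreak_days)

    # Specificity: fraction of outbreak-free days with no alert.
    specificity = (
        sum(not alerts[t] for t in quiet_days) / len(quiet_days)
        if quiet_days
        else float("nan")
    )

    # Timeliness: days from the first signal day to the first alert on a signal day.
    delay: Optional[int] = None
    hits = [t for t in outbreak_days if alerts[t]]
    if hits:
        delay = hits[0] - outbreak_days[0]

    return {"detected": detected, "specificity": specificity, "delay_days": delay}


# Usage: one false alert on day 1, outbreak on days 4-6, first alert on day 5.
print(evaluate_alerts(
    [False, True, False, False, False, True, True, False],
    [False, False, False, False, True, True, True, False],
))
```

Other definitions are possible (for example, counting only alerts within the first few outbreak days as timely); the choice should match the operational question being asked.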


Aggregating Data in Time

Data stream(s) to monitor in time are split into three intervals:

• Baseline interval: used to get some estimate of normal data behavior
  – Mean, variance
  – Regression coefficients
  – Expected covariate distribution: spatial, age category, % of claims/syndrome

• Guardband: avoids contamination of the baseline with the outbreak signal

• Test interval: counts to be tested for anomaly
  – Nominally 1 day
  – Longer to reduce noise or to test for epicurve shape
  – Will shorten as data acquisition improves
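A minimal sketch of the interval layout, assuming a list of daily counts indexed by day. The function name `split_windows` and the default lengths (28-day baseline, 2-day guardband, 1-day test interval) are illustrative assumptions, not values prescribed by the slide.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_windows(
    counts: Sequence[int],
    t: int,
    baseline_len: int = 28,
    guard_len: int = 2,
    test_len: int = 1,
) -> Tuple[List[int], List[int]]:
    """Return (baseline, test) counts for a test interval that ends on day t.

    Layout, oldest to newest: baseline | guardband | test interval.
    """
    test_start = t - test_len + 1
    base_end = test_start - guard_len  # exclusive end of baseline = first guardband day
    base_start = base_end - baseline_len
    if base_start < 0:
        raise ValueError("not enough history for the requested baseline")
    return list(counts[base_start:base_end]), list(counts[test_start:t + 1])


counts = list(range(40))  # toy series: count on day d equals d
baseline, test = split_windows(counts, t=39)
print(len(baseline), baseline[0], baseline[-1], test)  # 28 9 36 [39]
# Baseline summaries used to estimate "normal" behavior:
print(mean(baseline), variance(baseline))
```

Days 37 and 38 fall in the guardband and are used for neither estimation nor testing; lengthening `test_len` sums or shapes several days together at the cost of timeliness.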


Elements of an Alerting Algorithm

– Values to be tested: raw data, or residuals from a model?

– Baseline period
  • Historical data used to determine expected data behavior
  • Fixed or sliding window?
  • Outlier removal: to avoid training on unrepresentative data
  • What does the algorithm do when there are all-zero or no baseline data?
  • Is a warmup period of data history required?

– Buffer period (or guardband)
  • Separation between the baseline period and the interval to be tested

– Test period
  • Interval of current data to be tested

– Reset criterion
  • Prevents flooding by persistent alerts caused by extreme values

– Test statistic: value computed to make alerting decisions

– Threshold: alert issued if the test statistic exceeds this value
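The elements above can be wired together in a short detector. This is a hypothetical sketch (`sliding_z_alerts` and every default are assumptions): it tests raw counts, uses a sliding baseline separated from the test day by a guardband, requires a warmup of at least `min_baseline` baseline days, floors the standard deviation so all-zero baselines do not divide by zero, and drops previously alerting days from later baselines as a simple form of outlier removal. A z-score is not cumulative, so no reset criterion is needed here; resets matter for cumulative statistics such as CUSUM.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple


def sliding_z_alerts(
    counts: Sequence[float],
    baseline_len: int = 28,
    guard_len: int = 2,
    min_baseline: int = 7,
    min_sd: float = 1.0,
    threshold: float = 3.0,
) -> List[Tuple[Optional[float], bool]]:
    """Test each day's raw count against a sliding, guarded baseline.

    Returns one (statistic, alert) pair per day; the statistic is None
    during the warmup period, when the baseline is too short.
    """
    alerted = set()  # days that alerted; kept out of later baselines
    results: List[Tuple[Optional[float], bool]] = []
    for t, x in enumerate(counts):
        base_end = t - guard_len  # guardband covers days base_end .. t-1
        base_start = max(0, base_end - baseline_len)
        baseline = [counts[i] for i in range(base_start, max(base_end, 0)) if i not in alerted]
        if len(baseline) < max(min_baseline, 2):  # warmup: too little history
            results.append((None, False))
            continue
        mu = mean(baseline)
        sd = max(stdev(baseline), min_sd)  # floor guards against all-zero baselines
        z = (x - mu) / sd  # test statistic
        alert = z > threshold  # threshold rule
        if alert:
            alerted.add(t)  # outlier removal for future baselines
        results.append((z, alert))
    return results


# Usage: a flat series with one spike on day 20.
res = sliding_z_alerts([5] * 20 + [15] + [5] * 5)
print(res[20])  # (10.0, True)
```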


Rash Syndrome Grouping of Diagnosis Codes

www.bt.cdc.gov/surveillance/syndromedef/word/syndromedefinitions.doc

Rash ICD-9-CM Code List

ICD9CM  Consensus  ICD9DESCR
050.0   1          SMALL POX, VARIOLA MAJOR
050.1   1          SMALL POX, ALASTRIM
050.2   1          SMALL POX, MODIFIED
050.9   1          SMALLPOX NOS
051.0   1          COWPOX
051.1   1          PSEUDOCOWPOX
052.7   1          VARICELLA COMPLICAT NEC
052.8   1          VARICELLA W/UNSPECIFIED C
052.9   1          VARICELLA NOS
057.8   1          EXANTHEMATA VIRAL OTHER S
057.9   1          EXANTHEM VIRAL, UNSPECIFI
695.0   1          ERYTHEMA TOXIC
695.1   1          ERYTHEMA MULTIFORME
695.2   1          ERYTHEMA NODOSUM
695.89  1          ERYTHEMATOUS CONDITIONS O
695.9   1          ERYTHEMATOUS CONDITION N
692.9   2          DERMATITIS UNSPECIFIED CA
782.1   2          RASH/OTHER NONSPEC SKIN E
026.0   3          SPIRILLARY FEVER
026.1   3          STREPTOBACILLARY FEVER
026.9   3          RAT-BITE FEVER UNSPECIFIED
051.2   3          DERMATITIS PUSTULAR, CONT
051.9   3          PARAVACCINIA NOS
053.20  3          HERPES ZOSTER DERMATITIS E
053.79  3          HERPES ZOSTER WITH OTHER SPECIF COMPLIC
053.8   3          H.Z. W/ UNSPEC. COMPLICATION
053.9   3          HERPES ZOSTER NOS W/O COM
054.0   3          ECZEMA HERPETICUM
054.79  3          HERPES SIMPLEX W/OTH.SPEC
054.8   3          HERPES SIMPLEX, W/UNS.COM
054.9   3          HERPES SIMPLEX NOS
055.79  3          MEASLES COMPLICATION NEC
055.8   3          MEASLES COMPLICATION NOS
055.9   3          MEASLES UNCOMPLICATED
056.79  3          RUBELLA COMPLICATION NEC
056.8   3          RUBELLA COMPLICATION NOS
056.9   3          RUBELLA UNCOMPLICATED
057.0   3          ERYTHEMIA INFECT.(5TH DIS
074.3   3          HAND/FOOT AND MOUTH DISEA
078.0   3          MOLLUSCUM CONTAGIOSUM
082.0   3          ROCKY MOUNTAIN SPOTTED FE
083.2   3          RICKETTSIALPOX
695.3   3          ROSACEA
695.4   3          LUPUS ERYTHEMATOSUS
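A syndrome grouping like this one turns coded visit records into the daily counts that temporal algorithms monitor. The sketch below is hypothetical: it copies only a few rows from the list above, and since the slide does not explain the consensus levels, it simply uses them as a filter (`max_level`) so that a monitor can choose which levels to include.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A few rows from the Rash list above: ICD-9-CM code -> consensus level.
RASH_CODES: Dict[str, int] = {
    "050.9": 1,  # SMALLPOX NOS
    "052.9": 1,  # VARICELLA NOS
    "692.9": 2,  # DERMATITIS UNSPECIFIED CA
    "782.1": 2,  # RASH/OTHER NONSPEC SKIN E
    "055.9": 3,  # MEASLES UNCOMPLICATED
    "695.3": 3,  # ROSACEA
}


def daily_rash_counts(visits: Iterable[Tuple[str, str]], max_level: int = 3) -> Dict[str, int]:
    """Count (date, code) visits whose code is listed with consensus level <= max_level."""
    counts: Counter = Counter()
    for date, code in visits:
        level = RASH_CODES.get(code)
        if level is not None and level <= max_level:
            counts[date] += 1
    return dict(counts)


visits = [
    ("1996-10-01", "782.1"),
    ("1996-10-01", "695.3"),
    ("1996-10-02", "052.9"),
    ("1996-10-02", "486"),  # a code not on the list: ignored
]
print(daily_rash_counts(visits))               # {'1996-10-01': 2, '1996-10-02': 1}
print(daily_rash_counts(visits, max_level=2))  # {'1996-10-01': 1, '1996-10-02': 1}
```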


Example: Daily Counts with Injected Cases

[Figure: daily Rash_1 syndrome counts by encounter date, 9/22/96 to 12/21/96, showing expected and event-attributable counts; annotation: "Injected Cases Presumed Attributable to Outbreak Event".]
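Injecting simulated cases into real background counts gives a series in which the outbreak days are known, so sensitivity and timeliness can be measured. The slide does not describe how its cases were generated; this hypothetical sketch (`inject_cases`) spreads a fixed number of cases over a few days with a triangular rise-and-fall profile, used only as an illustrative stand-in for an epicurve.

```python
import random
from typing import List, Sequence, Tuple


def inject_cases(
    background: Sequence[int], start: int, total_cases: int, duration: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Add a simulated outbreak to a copy of a daily count series.

    Returns (observed, injected): observed = background + injected.
    Cases that would land past the end of the series are dropped.
    """
    rng = random.Random(seed)
    # Triangular profile, e.g. duration=5 -> weights 1, 2, 3, 2, 1.
    weights = [min(i + 1, duration - i) for i in range(duration)]
    injected = [0] * len(background)
    for _ in range(total_cases):
        # Assign each case to an outbreak day with probability proportional to its weight.
        day = rng.choices(range(duration), weights=weights)[0]
        if start + day < len(background):
            injected[start + day] += 1
    observed = [b + i for b, i in zip(background, injected)]
    return observed, injected


observed, injected = inject_cases([3] * 30, start=10, total_cases=20, duration=5)
print(injected[8:17])
```

Keeping the injected counts separate from the background is what allows the "event-attributable" part of the plot to be drawn and scored.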


Example: Algorithm Alerts Indicated

[Figure: the same Rash_1 syndrome counts by encounter date, 9/22/96 to 12/21/96, with expected counts, event-attributable counts, and alerts marked; annotation: "Test Statistic Exceeds Chosen Threshold".]


EWMA Monitoring

• Exponential Weighted Moving Average: an average with the most weight on recent $X_k$:

  $S_k = \omega S_{k-1} + (1 - \omega) X_k$, where $0 < \omega < 1$

• Test statistic: $S_k$ compared to the expectation from a sliding baseline

Basic idea: monitor

  $(S_k - \mu_k) / \sigma_k$

where $\mu_k$ and $\sigma_k$ are the mean and standard deviation expected from the sliding baseline.

[Figure: Exponential Weighted Moving Average; daily count and smoothed series, 02/25/94 to 04/01/94.]

• Added sensitivity for gradual events
• Smaller $\omega$ means less smoothing (more weight on the current count)
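A minimal EWMA monitor, assuming the recursion $S_k = \omega S_{k-1} + (1 - \omega) X_k$ with $0 < \omega < 1$. The slide does not say how $\sigma_k$ is obtained; this sketch takes the standard deviation of the raw baseline counts and scales it by $\sqrt{(1-\omega)/(1+\omega)}$, the long-run standard deviation of an EWMA of independent observations, which is one common choice. The function name, the initialization $S_0 = X_0$, and all defaults are assumptions.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple


def ewma_statistics(
    counts: Sequence[float],
    omega: float = 0.6,
    baseline_len: int = 28,
    guard_len: int = 2,
    min_sd: float = 1.0,
) -> Tuple[List[float], List[Optional[float]]]:
    """Return the EWMA series S_k and the statistic (S_k - mu_k) / sigma_k per day."""
    # Long-run sd of an EWMA of independent values, relative to the raw sd.
    shrink = math.sqrt((1 - omega) / (1 + omega))
    s_vals: List[float] = []
    stats: List[Optional[float]] = []
    s: Optional[float] = None
    for k, x in enumerate(counts):
        s = x if s is None else omega * s + (1 - omega) * x  # EWMA update
        s_vals.append(s)
        base_end = k - guard_len  # sliding baseline stops before the guardband
        base = counts[max(0, base_end - baseline_len):max(base_end, 0)]
        if len(base) < 2:  # warmup: cannot estimate a standard deviation yet
            stats.append(None)
            continue
        mu = mean(base)
        sigma = max(stdev(base), min_sd) * shrink
        stats.append((s - mu) / sigma)
    return s_vals, stats


# Usage: a step increase from 10 to 30 on day 30.
s, z = ewma_statistics([10] * 30 + [30] * 5)
print(round(s[30], 6), round(z[30], 6))  # 18.0 16.0
```

Because $S_k$ carries memory of earlier days, several moderately elevated days can push the statistic over threshold even when no single day would, which is the "added sensitivity for gradual events" noted above.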


Example with Detection Statistic Plot

[Figure: detection statistic plotted with its threshold; days where the statistic exceeds the threshold are marked.]


Example: EWMA applied to Rash Data


Effects of Data Problems

[Figure annotations: missed event; additional flags.]


Importance of spatial data for biosurveillance

– Purely temporal methods can find anomalies, IF you know which case counts to monitor
  • Location of outbreak?
  • Extent?

– Advantages of spatial clustering
  • Tracking progression of outbreak
  • Identifying population at risk


Evaluating Candidate Clusters

[Figure: case locations (x) in a surveillance region, with a candidate cluster outlined.]

For a candidate cluster, the scan statistic gives a measure of “how unlikely is the number of cases inside relative to the number outside, given the expected spatial distribution of cases.”

(Thus, a populous region won’t necessarily flag.)
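One widely used way to quantify "how unlikely inside versus outside" is Kulldorff's Poisson log-likelihood ratio. The talk does not state which scan model it used, so treat this as an assumed example. Expected counts are scaled so that the expected total equals the observed total, and only excesses inside the candidate are scored. The sketch shows why a populous region does not automatically flag: a large region with exactly its expected count scores zero.

```python
import math


def poisson_llr(c_in: float, e_in: float, c_total: float) -> float:
    """Kulldorff-style Poisson log-likelihood ratio for one candidate cluster.

    c_in:    observed cases inside the candidate cluster
    e_in:    expected cases inside (expectations scaled to sum to c_total)
    c_total: observed cases in the whole surveillance region
    """
    if c_in <= e_in:
        return 0.0  # only an excess inside the cluster is of interest
    c_out = c_total - c_in
    e_out = c_total - e_in
    llr = c_in * math.log(c_in / e_in)
    if c_out > 0:  # 0 * log(0) is taken as 0
        llr += c_out * math.log(c_out / e_out)
    return llr


print(poisson_llr(100, 100, 1000))        # 0.0: populous, but as expected
print(round(poisson_llr(20, 10, 200), 2))  # small region with twice its expected count
```

The ratio alone does not give a significance level, because the maximum over many candidate clusters is being taken; the Monte Carlo step described for the cluster search addresses that.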


Selecting Candidate Clusters

[Figure: case locations (x) in the surveillance region, illustrating how candidate clusters are selected.]


Searching for Spatial Clustering

• form cylinders: bases are circles about each centroid in region A, height is time

• calculate statistic for event count in each cylinder relative to entire region, within space & time limits

• most significant clusters: regions whose centroids form base of cylinder with maximum statistic

• but how unusual is it? Repeat the procedure on Monte Carlo replicates of the data, and compare the maximum statistic with the maximum from each replicate

[Figure: centroids (x) of data collection regions, with region A marked.]
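The search-and-test procedure above can be sketched in a few dozen lines. To stay short, this hypothetical sketch drops the time dimension (circles instead of space-time cylinders), assumes Kulldorff's Poisson log-likelihood ratio as the statistic, caps circles at a hypothetical 50% of the expected cases, and builds each null replicate by redistributing the observed total among regions in proportion to their expected counts. Ranking the observed maximum among the replicate maxima yields a p-value that accounts for the many circles examined.

```python
import math
import random
from typing import List, Sequence, Tuple


def _llr(c: float, e: float, total: float) -> float:
    """Poisson log-likelihood ratio for c observed vs. e expected inside a circle."""
    if c <= e:
        return 0.0
    out = c * math.log(c / e)
    if total - c > 0:
        out += (total - c) * math.log((total - c) / (total - e))
    return out


def best_circle(xy, cases, expected, max_frac=0.5) -> Tuple[float, List[int]]:
    """Scan circles centred on each centroid; return (max LLR, member indices)."""
    total = sum(cases)
    scale = total / sum(expected)  # make expected total equal observed total
    best: Tuple[float, List[int]] = (0.0, [])
    for cx, cy in xy:
        # Grow the circle by adding regions in order of distance from the centre.
        order = sorted(range(len(xy)), key=lambda i: math.dist((cx, cy), xy[i]))
        c = e = 0.0
        members: List[int] = []
        for i in order:
            c += cases[i]
            e += expected[i] * scale
            members.append(i)
            if e > max_frac * total:  # size limit on candidate circles
                break
            llr = _llr(c, e, total)
            if llr > best[0]:
                best = (llr, list(members))
    return best


def monte_carlo_p(xy, cases, expected, n_rep=199, seed=1):
    """Compare the observed maximum LLR with maxima from null replicates."""
    rng = random.Random(seed)
    observed, members = best_circle(xy, cases, expected)
    total = sum(cases)
    exceed = 0
    for _ in range(n_rep):
        sim = [0] * len(xy)
        for i in rng.choices(range(len(xy)), weights=expected, k=total):
            sim[i] += 1
        if best_circle(xy, sim, expected)[0] >= observed:
            exceed += 1
    return observed, members, (exceed + 1) / (n_rep + 1)


# Usage: 5 x 5 grid of centroids, uniform expectation, excess in one corner.
xy = [(i % 5, i // 5) for i in range(25)]
cases = [10] * 25
cases[0] = cases[1] = cases[5] = 40
llr, members, p = monte_carlo_p(xy, cases, [10] * 25)
print(sorted(members), round(p, 3))
```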


Scan Statistic Demo


Scan Statistics: Advantages

• Gives monitor guidance for cluster size, location, significance

• Avoids preselection bias regarding cluster size or location

• Significance testing controls for multiple testing

• Can tailor problem design by data and objective:
  – Location (zipcode, hospital/provider site, patient/customer residence, school/store address)
  – Time windows used (cases, history, guardband)
  – Background estimation method: model, history, population, eligible customers


Surveillance Application: OTC Anti-flu Sales, Dates: 15-24 Apr 2002

Total sales as of 25 Apr: 1804

Potential cluster: center at 22311; 63 sales, 39 expected from recent data; relative risk = 1.6; p = 0.041
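The reported relative risk can be checked with simple arithmetic. The slide does not say which definition it used, so this sketch computes two: the plain observed-to-expected ratio inside the cluster, and the inside-versus-outside ratio used with Poisson scan statistics, assuming the expected counts were scaled so that expected and observed totals are both 1804.

```python
observed_in, expected_in, total = 63, 39, 1804

# Simple ratio of observed to expected sales inside the cluster.
rr_simple = observed_in / expected_in

# Inside vs. outside: (obs/exp inside) / (obs/exp outside), assuming expected totals 1804.
rr_in_out = (observed_in / expected_in) / ((total - observed_in) / (total - expected_in))

print(f"{rr_simple:.2f} {rr_in_out:.2f}")  # 1.62 1.64
```

Either definition rounds to the 1.6 shown on the slide.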


Distribution of Nonsyndromic Visits: 4 San Diego Hospitals


Effect of Data Discontinuities on OTC Cough/Cold Clusters

• Before removing problem zips, cluster groups are dominated by zips that “turn on” after sustained periods of zero or abnormally low counts.
• After editing, more interesting cluster groups emerge.

[Figure axes: Days; Zip (S to N).]
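Zips that "turn on" can be screened before running a cluster search. The slide does not say how problem zips were identified; this hypothetical check (`find_turn_ons`) flags any series whose counts rise above `low_value` after at least `min_low_run` consecutive days at or below it. Both thresholds are illustrative assumptions to be tuned per data source.

```python
from typing import Dict, Sequence


def find_turn_ons(
    series_by_zip: Dict[str, Sequence[int]], min_low_run: int = 14, low_value: int = 0
) -> Dict[str, int]:
    """Return {zip: first day after a long low run} for series that 'turn on'."""
    flagged: Dict[str, int] = {}
    for zip_code, counts in series_by_zip.items():
        run = 0  # length of the current run of days at or below low_value
        for day, value in enumerate(counts):
            if value <= low_value:
                run += 1
            elif run >= min_low_run:
                flagged[zip_code] = day  # reporting resumes after a long gap
                break
            else:
                run = 0  # a short gap: reset and keep scanning
    return flagged


series = {
    "A": [3, 4, 2, 5] * 10,       # steady reporting
    "B": [0] * 20 + [8, 9, 7],    # silent for 20 days, then turns on
    "C": [5, 0, 0, 5] * 5,        # brief gaps only
}
print(find_turn_ons(series))  # {'B': 20}
```

Flagged zips can then be excluded, or have their baselines restarted, before clusters are computed.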


School Nurse Data: All Visits

[Figure annotation: unreported.]


Cluster Investigation by Record Inspection

Records Corresponding to a Respiratory Cluster


Backups


Cumulative Summation Approach (CUSUM)

• Widely adapted to disease surveillance

• Devised for prompt detection of small shifts

• Look for changes of 2k standard deviations from the mean (often k = 0.5)

• Take the normalized deviation, often $Z_t = (x_t - \mu) / \sigma$

• Compare the upper and lower sums to threshold h:

  $S_{H,t} = \max(0,\ Z_t - k + S_{H,t-1})$

  $S_{L,t} = \max(0,\ -Z_t - k + S_{L,t-1})$

• Phase I sets h, k

[Figure: ER respiratory claim data, dates 12/30 to 2/28 (2000-2001); number of cases (data and smoothed), with days marked where $S_H > 1$ and $S_L > 1$.]

Upper sum: keep adding the differences between today’s count and the level k standard deviations above the mean, never letting the sum fall below zero.

Alert when the sum exceeds threshold h.
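A minimal two-sided CUSUM following the recursions $S_{H,t} = \max(0, Z_t - k + S_{H,t-1})$ and $S_{L,t} = \max(0, -Z_t - k + S_{L,t-1})$, with $Z_t = (x_t - \mu)/\sigma$ and $\mu$, $\sigma$ fixed from a Phase I period. Only the upper sum triggers alerts, matching the focus on increases. Resetting the upper sum to zero after an alert is an assumed reset criterion, and h = 4 is a hypothetical default; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def standardize(x: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Z_t = (x_t - mu) / sigma with mu, sigma fixed from a Phase I period."""
    return [(xt - mu) / sigma for xt in x]


def cusum(
    z: Sequence[float], k: float = 0.5, h: float = 4.0, reset: bool = True
) -> Tuple[List[float], List[float], List[bool]]:
    """Two-sided CUSUM on standardized values; returns (S_H, S_L, alerts)."""
    s_hi = s_lo = 0.0
    hi_vals: List[float] = []
    lo_vals: List[float] = []
    alerts: List[bool] = []
    for zt in z:
        s_hi = max(0.0, zt - k + s_hi)   # accumulates excesses above k
        s_lo = max(0.0, -zt - k + s_lo)  # accumulates deficits below -k
        hi_vals.append(s_hi)
        lo_vals.append(s_lo)
        alert = s_hi > h
        alerts.append(alert)
        if alert and reset:
            s_hi = 0.0  # restart so one event does not flood the monitor
    return hi_vals, lo_vals, alerts


# Usage: a sustained shift of 1.5 standard deviations starting on day 2.
hi, lo, alerts = cusum([0, 0, 1.5, 1.5, 1.5, 1.5, 1.5, 0])
print(hi)      # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0]
print(alerts)  # alert only on day 6
```

No single day here is more than 1.5 standard deviations high, yet the accumulated sum crosses h after five days, which is the "prompt detection of small shifts" the method was devised for.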


CuSum Example: CDC EARS Methods C1-C3

Three adaptive methods chosen by the National Center for Infectious Diseases after 9/1/2001 as most consistent

• Look for aberrations representing increases, not decreases
• Fixed mean and variance replaced by values from a sliding baseline (usually 7 days)

• Baseline for C1-MILD: days -1 to -7
• Baseline for C2-MEDIUM: days -3 to -9
• Baseline for C3-ULTRA: days -3 to -9
• Current count: day 0


Calculation for C1-C3:

Individual day statistic for day j with lag n:

  $S_{j,n} = \max\left\{0,\ \frac{\mathrm{Count}_j - (\mu_n + \sigma_n)}{\sigma_n}\right\}$, where

  $\mu_n$ is the mean of the 7 daily counts from day j-n-6 through day j-n (so $\mu_3$ is the mean of counts in [j-9, j-3]), and

  $\sigma_n$ is the standard deviation of the same 7-day window

C1 statistic for day k is $S_{k,1}$ (no lag)

C2 statistic for day k is $S_{k,3}$ (2-day lag)

C3 statistic for day k is $S_{k,3} + S_{k-1,3} + S_{k-2,3}$, where $S_{k-1,3}$ and $S_{k-2,3}$ are added if they do not exceed the threshold

Upper bound threshold of 2: equivalent to 3 standard deviations above the mean
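The C1-C3 definitions translate almost line for line into code. In this sketch, $S_{j,n}$ is computed from the 7 counts ending n days before day j, C1 uses n = 1, C2 uses n = 3, and C3 adds the two previous C2-style values only when they do not exceed the threshold of 2. The slide does not say whether σ is a sample or population standard deviation, or how a zero σ is handled; this sketch assumes the sample standard deviation and a hypothetical floor `min_sd`. It is an illustration, not the CDC EARS code.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def s_stat(counts: Sequence[float], j: int, n: int, min_sd: float = 0.2) -> Optional[float]:
    """S_{j,n}: day j compared with the 7-day window j-n-6 .. j-n (inclusive)."""
    lo, hi = j - n - 6, j - n
    if lo < 0:
        return None  # not enough history
    window = counts[lo:hi + 1]
    mu = mean(window)
    sd = max(stdev(window), min_sd)  # assumed floor for flat windows
    return max(0.0, (counts[j] - (mu + sd)) / sd)


def ears_c1_c2_c3(
    counts: Sequence[float], k: int, threshold: float = 2.0
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (C1, C2, C3) for day k; an alert is a value above the threshold."""
    c1 = s_stat(counts, k, 1)  # baseline days k-7 .. k-1
    c2 = s_stat(counts, k, 3)  # baseline days k-9 .. k-3
    prev = [s_stat(counts, k - d, 3) for d in (1, 2)]
    if c2 is None or any(p is None for p in prev):
        c3 = None
    else:
        # Earlier days contribute only if they did not themselves exceed the threshold.
        c3 = c2 + sum(p for p in prev if p <= threshold)
    return c1, c2, c3


# Usage: counts alternating 4, 6 for 12 days, then 20 on day 12.
counts = [4, 6] * 6 + [20]
c1, c2, c3 = ears_c1_c2_c3(counts, 12)
print([round(v, 2) for v in (c1, c2, c3)])
```

Subtracting one σ inside $S_{j,n}$ is why a threshold of 2 on this statistic corresponds to a count 3 standard deviations above the baseline mean.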


Detailed Example, I

Fewer alerts AND more sensitive: why?


Detailed Example, II

Signal Detected only with 28-day baseline


Detailed Example, III: “the rest of the story”