Population-Wide Anomaly Detection
Weng-Keen Wong (1), Gregory Cooper (2), Denver Dash (3), John Levander (2), John Dowling (2), Bill Hogan (2), Michael Wagner (2)
(1) School of Electrical Engineering and Computer Science, Oregon State University; (2) Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh; (3) Intel Research, Santa Clara
Motivation
• Suppose you monitor Emergency Department (ED) data that arrives in real time
• Can you specifically detect a large scale anthrax attack?
Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
:                      :       :        :          :
Model non-outbreak conditions and notice deviations
[Figure: Number of ED Respiratory Cases vs. Time. x-axis: Date, 12/30/2000 to 2/3/2001; y-axis: Number of ED Respiratory Cases, 0 to 70]
• Traditional univariate methods, e.g., control chart, CUSUM, EWMA, time series models
• Spatial methods, e.g., Spatial Scan Statistic
• Multivariate methods, e.g., WSARE
2. Sat 2001-03-13: SCORE = -0.00000464 PVALUE = 0.00000000
   12.42% ( 58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True
    6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True
These are non-specific methods: they look for anything unusual in the data, but not specifically for the onset of an anthrax attack.
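To illustrate the univariate methods named above, the sketch below runs a one-sided CUSUM over daily respiratory counts. The counts, the baseline window, and the parameters `k` and `h` are made-up values chosen for illustration; they are not taken from the ED data in the plot.

```python
from statistics import mean, stdev

def cusum_alarms(counts, baseline_days=7, k=0.5, h=4.0):
    """One-sided CUSUM on standardized daily counts.

    The first `baseline_days` values estimate the non-outbreak mean and
    standard deviation. k (slack) and h (alarm threshold) are expressed in
    standard-deviation units. Returns the indices of days whose CUSUM
    statistic exceeds h.
    """
    baseline = counts[:baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)
    s, alarms = 0.0, []
    for day in range(baseline_days, len(counts)):
        z = (counts[day] - mu) / sigma
        s = max(0.0, s + z - k)  # accumulate only upward deviations
        if s > h:
            alarms.append(day)
    return alarms

# Hypothetical daily respiratory counts: a stable week, then a rise.
daily_counts = [30, 32, 29, 31, 33, 30, 31, 32, 30, 38, 44, 51]
print(cusum_alarms(daily_counts))
```

Such a detector flags any sustained rise in the count, whatever its cause, which is the sense in which it is non-specific.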
Population-wide ANomaly Detection and Assessment (PANDA)
• A detector specifically for a large-scale outdoor release of inhalational anthrax
• Uses a massive causal Bayesian network
• Population-wide approach: each person in the population is represented as a subnetwork in the overall model
Population-Wide Approach
• Note the conditional independence assumptions
• Anthrax is infectious but non-contagious
[Figure: causal Bayesian network with the nodes Anthrax Release, Time of Release, and Location of Release (labeled as global nodes and interface nodes) linked to a Person Model for each person in the population]
Population-Wide Approach
• Structure designed by expert judgment
• Parameters obtained from census data, training data, and expert assessments informed by literature and experience
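The conditional independence assumptions in the population-wide model mean that, once the release state is fixed, each person's evidence contributes an independent likelihood factor. The sketch below shows the resulting posterior computation under a strong simplification: a single binary release variable stands in for the release, location, and time nodes, and the prior and per-person likelihoods are invented numbers.

```python
import math

def posterior_release(prior_release, person_likelihoods):
    """Posterior P(release | evidence) for a binary release variable.

    person_likelihoods holds one pair per person:
    (P(e_i | release), P(e_i | no release)).
    Person models are assumed conditionally independent given the
    release state, so the per-person likelihoods multiply.
    """
    log_yes = math.log(prior_release)
    log_no = math.log(1.0 - prior_release)
    for p_yes, p_no in person_likelihoods:
        log_yes += math.log(p_yes)
        log_no += math.log(p_no)
    # Normalize in log space to avoid underflow with many people.
    m = max(log_yes, log_no)
    w_yes, w_no = math.exp(log_yes - m), math.exp(log_no - m)
    return w_yes / (w_yes + w_no)

# Hypothetical evidence: two people with a respiratory ED visit
# (twice as likely under a release) and one person with no visit.
evidence = [(0.02, 0.01), (0.02, 0.01), (0.98, 0.99)]
print(posterior_release(0.01, evidence))
```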
Person Model (Initial Prototype)
[Figure: two person-model subnetworks (of many) beneath the shared nodes Anthrax Release, Location of Release, and Time Of Release. Each person model contains the nodes Anthrax Infection, Home Zip, Respiratory from Anthrax, Other ED Disease, Gender, Age Decile, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission.]
[Figure: the same two person models with example evidence. One person: Home Zip 15213, Age Decile 20-30, Gender Female. The other: Home Zip 15146, Age Decile 50-60, Gender Male. Other node values shown: Yesterday, never, False, Unknown.]
Prototype is Computationally Feasible
Aside from caching tricks, there are two main optimizations:
• Incremental Updating
• Equivalence Classes
Performance:
On a 3.0 GHz P4 machine with 2 GB RAM: 45 seconds of initialization time, then 3 seconds for each hour’s worth of ED data
See Cooper G.F., Dash D.H., Levander J.D., Wong W-K, Hogan W.R., Wagner M.M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on UAI. Banff, Canada: AUAI Press; 2004. pp. 94-103.
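The two optimizations can be illustrated with a small sketch. It assumes, as a simplification, that people with identical attribute values share the same likelihood, so the model needs only a count per equivalence class, and that a new ED record moves one person between classes instead of triggering a full recomputation. The attribute tuple, the `class_loglik` function, and its numbers are hypothetical; the cited paper describes the actual method.

```python
import math
from collections import Counter

def class_loglik(attrs):
    """Hypothetical log-likelihood of one person's evidence given a release."""
    age_decile, gender, home_zip, admitted_respiratory = attrs
    p = 0.02 if admitted_respiratory else 0.98
    return math.log(p)

class PopulationScore:
    """Total log-likelihood kept as a sum over equivalence classes."""

    def __init__(self, people):
        self.counts = Counter(people)  # class -> number of people in it
        # One likelihood evaluation per class, not per person.
        self.total = sum(n * class_loglik(c) for c, n in self.counts.items())

    def move(self, old_attrs, new_attrs):
        """Incremental update: one person changes class (e.g. is admitted)."""
        self.counts[old_attrs] -= 1
        self.counts[new_attrs] += 1
        self.total += class_loglik(new_attrs) - class_loglik(old_attrs)

# Hypothetical population of 1500 people in two equivalence classes.
people = ([("20-30", "F", "15213", False)] * 1000
          + [("50-60", "M", "15146", False)] * 500)
score = PopulationScore(people)
score.move(("20-30", "F", "15213", False), ("20-30", "F", "15213", True))
print(len(score.counts), score.total)
```

With this layout, the cost of an update depends on the number of records that change, not on the size of the population.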
What do you gain with a population-wide approach?
Coherent framework for:
1. Incorporating background knowledge
2. Incorporating different types of evidence
3. Data fusion
4. Explanation
1. Incorporating Background Knowledge
• Limited data from actual anthrax attacks available:
  – Postal attacks 2001 (only 11 people affected, not representative of a large-scale attack)
  – Sverdlovsk 1979
• But literature contains studies on the characteristics of inhalational anthrax
1. Incorporating Background Knowledge
Can coherently incorporate different types of background knowledge, e.g., for inhalational anthrax:
• Progression of symptoms (at an individual level)
• Incubation period (at an individual level)
• Spatial dispersion pattern (can represent this by the effects over individuals)
2. Incorporating Evidence
• Easily incorporate different types of evidence, e.g., spatial, temporal, demographic, symptomatic
• Easily incorporate new evidence that distinguishes an individual (or individuals) from others
  – Modify the appropriate person model
3. Data Fusion
ED data:
Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
:                      :       :        :          :
[Figure: OTC data]
• No data available during an actual anthrax attack that captures the correlation between these two data sources.
• By modeling the actions of individuals, and incorporating background knowledge, we can come up with a plausible model of the effects of an attack on these two data sources.
• OTC data – aggregated over zip code and available daily
• ED data – individual patient records, usually available in real time
By representing the population at the finest granularity (i.e., each individual), we can easily handle data sources with different spatial and temporal granularity in data fusion.
See Wong, W-K, Cooper G.F., Dash D.H., Dowling, J.N., Levander J.D., Hogan W. R., Wagner M. M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.
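One way to picture this: if each person model yields a probability of an ED visit and of an OTC purchase, individual ED records can be scored directly, while OTC evidence is compared against per-zip sums of the individual probabilities. The sketch below shows only that aggregation step. The `Person` fields and all probabilities are invented for illustration and are not PANDA's parameters.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    home_zip: str
    p_ed_visit: float      # hypothetical P(ED visit today)
    p_otc_purchase: float  # hypothetical P(OTC purchase today)

def ed_likelihood(person, visited):
    """Likelihood of one individual-level ED observation."""
    return person.p_ed_visit if visited else 1.0 - person.p_ed_visit

def expected_otc_by_zip(population):
    """Sum individual purchase probabilities into zip-level expected counts."""
    totals = defaultdict(float)
    for person in population:
        totals[person.home_zip] += person.p_otc_purchase
    return dict(totals)

# Hypothetical population in two zip codes.
population = ([Person("15213", 0.001, 0.02)] * 100
              + [Person("15146", 0.001, 0.05)] * 40)
print(expected_otc_by_zip(population))
```

The same person-level quantities feed both data sources; only the final aggregation differs.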
4. Explanation
• Important to know why the model believes an anthrax attack is occurring
• Can find the subset of evidence E* that most influences such a belief
• In PANDA, E* would correspond to a group of individuals
• Identify the individuals that most contribute to the hypothesis of an attack
4. Explanation
Can also use the Bayesian network to calculate the most likely location of release and time of release
Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring
Gender   Age Decile   Home Zip   Respiratory Symptoms   Date Admitted
M        20-30        15213      True                   2 days ago
F        20-30        15213      True                   2 days ago
M        30-40        15213      True                   2 days ago
F        40-50        15213      True                   2 days ago
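A minimal sketch of such a ranking follows. It assumes that a class's contribution is its size multiplied by the log-likelihood ratio of its evidence with versus without an attack. That measure, the class labels, and all probabilities are assumptions made for illustration; the talk does not state how contribution is computed.

```python
import math

def rank_classes(classes, top_n=4):
    """Rank equivalence classes by count * log-likelihood ratio.

    classes: list of
    (label, count, P(evidence | attack), P(evidence | no attack)).
    Returns the top_n (label, contribution) pairs, largest first.
    """
    scored = [(label, n * math.log(p_attack / p_none))
              for label, n, p_attack, p_none in classes]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Hypothetical classes: (gender, age decile, home zip, respiratory symptoms).
classes = [
    (("M", "20-30", "15213", True), 12, 0.020, 0.005),
    (("F", "20-30", "15213", True), 9, 0.020, 0.005),
    (("M", "50-60", "15146", False), 400, 0.980, 0.995),
]
for label, contribution in rank_classes(classes):
    print(label, round(contribution, 2))
```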
Future Work
• More sophisticated person models
• Improved explanation capabilities
• Validation of data fusion model
• More disease models apart from anthrax
• Contagious disease models
• Combining outputs from multiple Bayesian detectors
Thank You!
RODS Laboratory: http://rods.health.pitt.edu
Bayesian Biosurveillance: http://www.cbmi.pitt.edu/panda/
This research was supported by grants IIS-0325581 from the National Science Foundation, F30602-01-2-0550 from the Department of Homeland Security, and ME-01-737 from the Pennsylvania Department of Health.