A Review of Natural Language Processing
for Biosurveillance
Wendy W. Chapman, PhD
University of Pittsburgh
Dept of Biomedical Informatics
Biomedical Language Understanding
Current Surveillance
New strain of H5N1 avian influenza
Cough (咳嗽), Respiratory
Recent travel
Exposure to others
Positive CXR
Patient  + CXR  Expos  Travel  Severe
Pat 1  X  X  X  X
Pat 2
Pat 3  X  X
Pat 4  X
[Chart: Respiratory Patients; labels: cough, SOB]
Leverage More Data for Surveillance
Biosurveillance
• Bioterrorist threats: detect attacks
• Natural disease outbreaks: detect outbreaks
• Disaster management: understand situation
Much of the useful data in textual format
Need natural language processing
Textual Data Sources
Non-clinical
Clinical
Internet Mapping of Outbreaks
• HealthMap
• Global Health Monitor
• Global Public Health Intelligence Network (GPHIN)—Public Health Agency of Canada
Clinical Data for Biosurveillance
Textual clinical data → Text processor
Pneumonia  Yes  History
Cough      Yes  3 days
Fever      Yes  3 days
What types of data are available?
How do we transform the data?
How well can we process the data?
Clinical Data for Biosurveillance
Trade-off
What types of data are available?
Chief Complaints
Content
• Patient's reason for seeking care
• 1-2 symptoms
Examples: “Cough/headache”, “n/v/d”, “Motor vehicle accident”
Timeliness
Registration → Physician Exam → Discharge → Home
Useful for early detection of larger outbreaks
Ambulatory & Inpatient Notes
Content
Clinical
• Symptoms
• Findings
• Medications
• Allergies
• Diagnoses
• Chronic conditions
Epidemiological
• Risk factors
• Travel history
• Homelessness
• Duration of illness
• Exposure to contacts
Timeliness
Registration → Physician Exam → Discharge → Home
Useful for targeted case detection, disease surveillance, and situational awareness
Discharge Reports
Content
Clinical
• Reason for hospitalization
• Summary of care
• Findings
• Procedures performed
• Plan for follow-up
Death
• Cause
• Time of death
Timeliness
Registration → Physician Exam → Discharge → Home
Most detailed but least timely; potentially useful for situational awareness
Text Processing
How do we transform the data?
Text processor
• Chief complaints → classify → syndrome category
– “cough/sob” → Respiratory
• Textual notes → extract → clinical conditions
– “No past history of pneumonia; presents with two day history of cough.”
– Pneumonia: historical, absent
– Cough: recent, present
Three Methods for Interpreting Text
• Keyword-based
– NYC Syndromic Macros
– If “cough*” or “wheez*” → Respiratory
• Symbolic
– Semantics, syntax, discourse
– stomach cramp is a type of abdominal pain
• Statistical
– P(localized infiltrate | anatomic location = lower lobe, finding = hazy opacity) = 0.96
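The keyword-based method above can be sketched in a few lines of Python. This is a minimal illustration, not the NYC Syndromic Macros themselves: only the “cough*” / “wheez*” → Respiratory rule comes from the slide, while the Gastrointestinal row and the names `SYNDROME_PATTERNS` and `classify_chief_complaint` are assumptions made for the example.

```python
import re

# Hypothetical keyword table. Only the Respiratory row mirrors the slide;
# the Gastrointestinal row is invented to show a second category.
SYNDROME_PATTERNS = {
    "Respiratory": [r"\bcough\w*", r"\bwheez\w*"],
    "Gastrointestinal": [r"\bvomit\w*", r"\bdiarrhea\b"],
}


def classify_chief_complaint(text):
    """Return the sorted list of syndromes whose keyword patterns match."""
    lowered = text.lower()
    return sorted(
        syndrome
        for syndrome, patterns in SYNDROME_PATTERNS.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    )


print(classify_chief_complaint("Coughing/headache"))
print(classify_chief_complaint("Motor vehicle accident"))
```

A complaint can match more than one syndrome, so the function returns a list rather than a single label.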
Processing Chief Complaints: Challenges
• Synonyms
– Short of breath → dyspnea
– Coughing → cough
– Coughs → cough
• Abbreviations
– ha → headache
– abd → abdominal
– gx → ground transportation
• Acronyms
– n/v → nausea/vomiting
– sob → shortness of breath
• Truncations
– diar → diarrhea
– poss → possible
• Concatenations
– blurredvision → blurred vision
– flus sxs → flu symptoms
• Misspellings & typographic errors
– nausa → nausea
– diahrea → diarrhea
Substantial word variation
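One simple way to handle this variation is a lookup table applied before classification. The sketch below is an assumption about how such a step could look, not a description of any system named in the talk; the table entries are taken from the examples above, and `REPLACEMENTS`, `expand_token`, and `normalize_chief_complaint` are hypothetical names.

```python
# Lookup table built from the slide's examples. A real system would need a
# much larger, locally curated table.
REPLACEMENTS = {
    "coughing": "cough",
    "coughs": "cough",
    "ha": "headache",
    "abd": "abdominal",
    "n/v": "nausea/vomiting",
    "sob": "shortness of breath",
    "diar": "diarrhea",
    "poss": "possible",
    "blurredvision": "blurred vision",
    "flus": "flu",
    "sxs": "symptoms",
    "nausa": "nausea",
    "diahrea": "diarrhea",
}


def expand_token(token):
    """Expand a whole token if known, otherwise expand its slash-separated parts."""
    if token in REPLACEMENTS:
        return REPLACEMENTS[token]
    return "/".join(REPLACEMENTS.get(part, part) for part in token.split("/"))


def normalize_chief_complaint(text):
    """Lowercase the complaint and expand each whitespace-separated token."""
    return " ".join(expand_token(token) for token in text.lower().split())


print(normalize_chief_complaint("cough/sob"))
print(normalize_chief_complaint("flus sxs"))
```

Exact-match lookup only covers variants that are already in the table; unseen misspellings would need approximate matching, which is not shown here.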
Processing Notes: Challenges
Contain linguistically complex narrations
• Linguistic variation
• Polysemy
• Negation
• Contextual information
• Implication
• Coreference
Negation
Approximately half of all clinical concepts in dictated reports are negated
• Explicit absence: “The mediastinum is not widened”
– Mediastinal widening: absent
• Implied absence: “Lungs are clear upon auscultation”
– Rales/crackles: absent
– Rhonchi: absent
– Wheezing: absent
• Uncertainty
Contextual Information
• Temporality
– Three-day history of cough
– Past history of pneumonia
• Finding Validation
– She received her influenza vaccine
– His temperature was taken in the ED
• Hypothetical conditions
– He should return for fever
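Explicit negation, temporality, and hypothetical conditions can be approximated with trigger phrases that apply to concepts appearing after them in the sentence. The following is a rough sketch in the spirit of trigger-based approaches such as NegEx, not a reproduction of it: the trigger lists, the scope terminators, and the function `annotate` are assumptions chosen to fit the slide's examples. It does not handle implied absence, uncertainty, or finding validation.

```python
import re

# Hypothetical trigger lists, chosen only to cover the slide's examples.
NEGATION_TRIGGERS = ("no", "not", "denies", "without")
HISTORICAL_TRIGGERS = ("past history of",)
HYPOTHETICAL_TRIGGERS = ("return for", "return if")
# Words assumed to end the scope of any earlier trigger.
SCOPE_TERMINATORS = ("presents", "but", "however")


def _has_trigger(context, triggers):
    """True if any trigger phrase occurs as whole words in the context."""
    return any(
        re.search(r"\b" + re.escape(trigger) + r"\b", context)
        for trigger in triggers
    )


def annotate(sentence, concept):
    """Label a concept using triggers that precede it in the sentence."""
    text = sentence.lower()
    position = text.find(concept.lower())
    if position < 0:
        return None
    left = text[:position]
    # Keep only the text after the last scope terminator, if there is one.
    terminator_pattern = r"\b(?:" + "|".join(SCOPE_TERMINATORS) + r")\b"
    terminators = list(re.finditer(terminator_pattern, left))
    if terminators:
        left = left[terminators[-1].end():]
    return {
        "existence": "absent" if _has_trigger(left, NEGATION_TRIGGERS) else "present",
        "temporality": "historical" if _has_trigger(left, HISTORICAL_TRIGGERS) else "recent",
        "hypothetical": _has_trigger(left, HYPOTHETICAL_TRIGGERS),
    }


note = "No past history of pneumonia, presents with two day history of cough."
print(annotate(note, "pneumonia"))
print(annotate(note, "cough"))
```

The scope terminator is what keeps the leading "No" from negating "cough" in the second call; without it, every concept after the trigger would be marked absent.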
Chief Complaints
Identifying Syndromic Cases
Performance using this data
• Seven studies
– One on pediatric population
– Beitel, Chapman, Espino, Gesteland, Ivanov
• Reference standards
– ICD-9 discharge diagnoses
– Physician review of ED reports
• Eight syndromic definitions
– Five febrile syndromic definitions
[Bar chart: Percent of Cases Identified (axis 0 to 100). Bar values as extracted: 34, 77, 22, 74, 31, 72, 31, 60, 39, 75, 10, 30, 27, 46]
Febrile Syndromes
• Syndrome only: 23%
• Fever only: 19%
• Neither: 53%
• Both: 5%
Sensitivity: 0% to 12%
Chapman and Dowling, J ISDS, 2007
Ambulatory Notes
Triage Notes
• NC-Detect
– EMT-P + NegEx
– Performs well at identifying clinical conditions
ED Reports
• Better case detection than chief complaints
– Topaz (Chapman)
– MCVS (Elkin)
– MedLEE (Friedman, South)
Inpatient Notes
Chest radiograph reports
• Pneumonia: > 90% sensitivity and specificity
– SymText (Fiszman and Chapman)
– MedLEE (Friedman and Hripcsak)
– MCVS (Elkin)
• Widened mediastinum
– IPS System (Chapman)
• Tuberculosis (Hripcsak)
– MedLEE (Hripcsak)
Identifying Syndromic Cases from Textual Notes
CC vs full text record for Influenza-like Illness
Data source      Sensitivity  Positive Predictive Value
Chief Complaint  13%          47%
ED Notes         51%          37%
All notes        88%          23%
South et al.
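For reference, the two measures in this comparison are computed from case counts as follows. The counts in the example are hypothetical and are not taken from South et al.; they only show how adding more text can raise sensitivity while lowering positive predictive value.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true cases that the method identified."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives, false_positives):
    """Fraction of flagged cases that were true cases."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts for 100 true cases (not data from the study).
narrow = {"tp": 13, "fn": 87, "fp": 15}    # few cases flagged, fewer false alarms
broad = {"tp": 88, "fn": 12, "fp": 290}    # most cases flagged, many false alarms

for name, counts in (("narrow", narrow), ("broad", broad)):
    sens = sensitivity(counts["tp"], counts["fn"])
    ppv = positive_predictive_value(counts["tp"], counts["fp"])
    print(f"{name}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Both functions assume a nonzero denominator, that is, at least one true case and at least one flagged case.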
Identifying Epidemiological Factors from Clinical Notes
Gundlapalli et al.
Factor              Structured  MedLEE clinical notes
ETOH abuse          2.9         3.7
Drug abuse          4.1         29.1
Smoking             1.3         45
Homelessness        0           10.5
Illness duration    0           2.7
History of illness  0           22.3
Chief complaints
• Moderate performance at identifying syndromic cases
• Poor performance at identifying specific syndromes
Textual Notes
• Good performance at identifying syndromic cases
• Ability to identify specific conditions
• Ability to identify epidemiological factors
Where do we go from here?
Identifying cases
• Most work on chief complaints
• Current emphasis on reports
– Need better algorithms and more research
• Temporality and other contextual information
Conveying information
• Little if any applied work on characterizing outbreaks and conveying information to public health
Conclusion
• Data in clinical texts are useful for biosurveillance
• Chief complaints most frequently used data source
– Poor to moderate performance
• Clinical notes promise better performance
– More complicated text
– Timeliness dependent on institution
– Early stages of development and evaluation
• Need to develop more applications applying NLP to characterization
Thank You
Wendy W. Chapman: wec6@pitt.edu
Biomedical Language Understanding Lab
www.dbmi.pitt.edu/blulab
Chapter on NLP for Biosurveillance to appear in Infectious Disease Informatics and Biosurveillance: Research, Systems, and Case Studies