A Review of Natural Language Processing for Biosurveillance Wendy W. Chapman, PhD University of Pittsburgh Dept of Biomedical Informatics Biomedical Language Understanding


Page 1: A Review of  Natural Language Processing for  Biosurveillance

A Review of Natural Language Processing

for Biosurveillance

Wendy W. Chapman, PhD

University of Pittsburgh

Dept of Biomedical Informatics

Biomedical Language Understanding

Page 2: A Review of  Natural Language Processing for  Biosurveillance

Current Surveillance

New strain of H5N1

Avian influenza

Cough / 咳嗽 (Chinese: "cough") → Respiratory

Page 3: A Review of  Natural Language Processing for  Biosurveillance

Recent travel

Exposure to others

Positive CXR

Patient: + CXR, Expos, Travel, Severe
Pat 1: X X X X
Pat 2:
Pat 3: X X
Pat 4: X

[Chart: Respiratory Patients, by chief complaint (cough, SOB); axis 0 to 40]

Leverage More Data for Surveillance

Page 4: A Review of  Natural Language Processing for  Biosurveillance

Biosurveillance

• Bioterrorist threats: detect attacks
• Natural disease outbreaks: detect outbreaks
• Disaster management: understand situation

Much of the useful data in textual format
Need natural language processing

Page 5: A Review of  Natural Language Processing for  Biosurveillance

Textual Data Sources

Non-clinical

Clinical

Page 6: A Review of  Natural Language Processing for  Biosurveillance

Internet Mapping of Outbreaks

• HealthMap

• Global Health Monitor

• Global Public Health Intelligence Network (GPHIN)—Public Health Agency of Canada

Page 7: A Review of  Natural Language Processing for  Biosurveillance
Page 8: A Review of  Natural Language Processing for  Biosurveillance

Clinical Data for Biosurveillance

Textual clinical data → Text processor →

• Pneumonia: Yes, History
• Cough: Yes, 3 days
• Fever: Yes, 3 days

Page 9: A Review of  Natural Language Processing for  Biosurveillance

What types of data are available?

How do we transform the data?

How well can we process the data?

Clinical Data for Biosurveillance

Page 10: A Review of  Natural Language Processing for  Biosurveillance

Trade-off

What types of data are available?

Page 11: A Review of  Natural Language Processing for  Biosurveillance

Chief Complaints

Content
• Patient's reason for seeking care
• 1-2 symptoms

Examples: “Cough/headache”, “n/v/d”, “Motor vehicle accident”

Timeliness

Registration → Physician Exam → Discharge → Home (X)

Useful for early detection of larger outbreaks

Page 12: A Review of  Natural Language Processing for  Biosurveillance

Ambulatory & Inpatient Notes

Content

Clinical
• Symptoms
• Findings
• Medications
• Allergies
• Diagnoses
• Chronic conditions

Epidemiological
• Risk factors
• Travel history
• Homelessness
• Duration of illness
• Exposure to contacts

Timeliness

Registration → Physician Exam → Discharge → Home (X)

Useful for targeted case detection, disease surveillance, and situational awareness

Page 13: A Review of  Natural Language Processing for  Biosurveillance

Discharge Reports

Content

Clinical
• Reason for hospitalization
• Summary of care
• Findings
• Procedures performed
• Plan for follow-up

Death
• Cause
• Time of death

Timeliness

Registration → Physician Exam → Discharge → Home (X)

Most detailed but least timely; potentially useful for situational awareness

Page 14: A Review of  Natural Language Processing for  Biosurveillance

Text Processing

How do we transform the data?

Chief Complaints → Text processor (classify) → Syndrome Category
• “cough/sob” → Respiratory

Textual Notes → Text processor (extract) → Clinical Conditions
• “No past history of pneumonia - presents with two day history of cough.”
  – Pneumonia: historical, absent
  – Cough: recent, present

Page 15: A Review of  Natural Language Processing for  Biosurveillance

Three Methods for Interpreting Text

• Keyword-based
  – NYC Syndromic Macros
  – If “cough*” or “wheez*” → Respiratory
• Symbolic
  – Semantics, syntax, discourse
  – stomach cramp is a type of abdominal pain
• Statistical
  – P ( localized infiltrate | anatomic location = lower lobe, finding = hazy opacity ) = 0.96
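The keyword-based method can be illustrated with a minimal Python sketch. It is not the NYC Syndromic Macros themselves: the table name `SYNDROME_KEYWORDS`, the helper `_compile`, and the function `classify` are hypothetical, and only the two respiratory stems come from the slide.

```python
import re

# Hypothetical keyword table; only "cough*" and "wheez*" come from the slide.
SYNDROME_KEYWORDS = {
    "Respiratory": ["cough*", "wheez*"],
}

def _compile(pattern):
    """Turn a trailing-wildcard keyword such as "cough*" into a regex."""
    stem = re.escape(pattern.rstrip("*"))
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)

COMPILED = {
    syndrome: [_compile(p) for p in patterns]
    for syndrome, patterns in SYNDROME_KEYWORDS.items()
}

def classify(chief_complaint):
    """Return the sorted list of syndromes whose keywords match the text."""
    return sorted(
        syndrome
        for syndrome, patterns in COMPILED.items()
        if any(p.search(chief_complaint) for p in patterns)
    )

print(classify("Coughing x3 days"))        # matches the "cough*" stem
print(classify("Motor vehicle accident"))  # no keyword matches
```

A keyword table like this is easy to audit and extend, but it only matches surface forms, so misspellings or abbreviations such as "sob" are missed unless they are listed explicitly.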

Page 16: A Review of  Natural Language Processing for  Biosurveillance

Processing Chief Complaints: Challenges

• Synonyms
  – Short of breath → dyspnea
  – Coughing → cough
  – Coughs → cough
• Abbreviations
  – ha → headache
  – abd → abdominal
  – gx → ground transportation
• Acronyms
  – n/v → nausea/vomiting
  – sob → shortness of breath
• Truncations
  – diar → diarrhea
  – poss → possible
• Concatenations
  – blurredvision → blurred vision
  – flus sxs → flu symptoms
• Misspellings & typographic errors
  – nausa → nausea
  – diahrea → diarrhea

Substantial word variation
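One simple way to handle this word variation is a lookup table applied before classification. The sketch below is a minimal illustration, assuming one-to-one token replacements are enough; the names `REPLACEMENTS`, `normalize_token`, and `normalize` are hypothetical, and the entries are taken from the slide's examples (with "flus sxs" split into two tokens). Multi-word synonyms such as "short of breath" → "dyspnea" are not handled here.

```python
# Lookup table built from the slide's examples. A real system would need a much
# larger, locally maintained vocabulary; this table is an illustrative assumption.
REPLACEMENTS = {
    "coughing": "cough", "coughs": "cough",              # synonyms
    "ha": "headache", "abd": "abdominal",                # abbreviations
    "gx": "ground transportation",
    "n/v": "nausea/vomiting",                            # acronyms
    "sob": "shortness of breath",
    "diar": "diarrhea", "poss": "possible",              # truncations
    "blurredvision": "blurred vision",                   # concatenations
    "flus": "flu", "sxs": "symptoms",
    "nausa": "nausea", "diahrea": "diarrhea",            # misspellings
}

def normalize_token(token):
    """Replace a whole token if known, else each slash-separated part."""
    if token in REPLACEMENTS:
        return REPLACEMENTS[token]
    return "/".join(REPLACEMENTS.get(part, part) for part in token.split("/"))

def normalize(chief_complaint):
    """Lowercase the complaint and normalize it token by token."""
    return " ".join(normalize_token(t) for t in chief_complaint.lower().split())

print(normalize("cough/ha"))
print(normalize("poss diahrea"))
```

A fixed table cannot anticipate new misspellings, which is one reason statistical or edit-distance approaches are often layered on top of it.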

Page 17: A Review of  Natural Language Processing for  Biosurveillance

Processing Notes: Challenges

Contain linguistically complex narrations
• Linguistic variation
• Polysemy
• Negation
• Contextual information
• Implication
• Coreference

Page 18: A Review of  Natural Language Processing for  Biosurveillance

Negation

Approximately half of all clinical concepts in dictated reports are negated

• Explicit absence
  “The mediastinum is not widened”
  – Mediastinal widening: absent
• Implied absence
  “Lungs are clear upon auscultation”
  – Rales/crackles: absent
  – Rhonchi: absent
  – Wheezing: absent
• Uncertainty
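Explicit absence can be approximated with trigger words that precede a concept. The sketch below is loosely in the spirit of trigger-based approaches such as NegEx but is not the NegEx algorithm: the trigger set, the five-token window, and the function name `is_negated` are assumptions chosen for illustration. Implied absence ("Lungs are clear") and uncertainty are not handled, since they require domain knowledge rather than trigger matching.

```python
import re

# Illustrative subset of negation triggers (assumption, not a published lexicon).
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
WINDOW = 5  # number of tokens before the concept to scan (assumed value)

def is_negated(sentence, concept):
    """Return True if a trigger occurs within WINDOW tokens before the concept."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    if n == 0:
        return False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            if any(t in NEGATION_TRIGGERS for t in preceding):
                return True
    return False

print(is_negated("The mediastinum is not widened", "widened"))
print(is_negated("Presents with two day history of cough", "cough"))
```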

Page 19: A Review of  Natural Language Processing for  Biosurveillance

Contextual Information

• Temporality
  – Three-day history of cough
  – Past history of pneumonia
• Finding Validation
  – She received her influenza vaccine
  – His temperature was taken in the ED
• Hypothetical conditions
  – He should return for fever
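The temporality and hypothetical examples above can be separated with a few sentence-level rules. This is a rough sketch under stated assumptions: the two patterns and the function name `temporal_status` are hypothetical, the label applies to the whole sentence rather than to each concept, and finding validation is not addressed.

```python
import re

# Assumed cue phrases; a usable system would need a far larger, validated list.
HYPOTHETICAL = re.compile(r"\b(should|if|return for)\b", re.IGNORECASE)
HISTORICAL = re.compile(r"\b(past|previous) (medical )?history\b", re.IGNORECASE)

def temporal_status(sentence):
    """Label a sentence as 'hypothetical', 'historical', or 'recent'."""
    if HYPOTHETICAL.search(sentence):
        return "hypothetical"
    if HISTORICAL.search(sentence):
        return "historical"
    return "recent"

for text in ("Three-day history of cough",
             "Past history of pneumonia",
             "He should return for fever"):
    print(text, "->", temporal_status(text))
```

Because the label is per sentence, a sentence that mixes a past condition with a current one would need to be split into clauses first.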

Page 20: A Review of  Natural Language Processing for  Biosurveillance

Chief Complaints: Identifying Syndromic Cases

Performance using this data

• Seven studies
  – One on pediatric population
  – Beitel, Chapman, Espino, Gesteland, Ivanov
• Reference standards
  – ICD-9 discharge diagnoses
  – Physician review of ED reports
• Eight syndromic definitions
  – Five febrile syndromic definitions

Page 21: A Review of  Natural Language Processing for  Biosurveillance

[Figure: bar chart; y-axis "Percent of Cases Identified", 0 to 100. Bar labels in extraction order: 34, 77, 22, 74, 31, 72, 31, 60, 39, 75, 10, 30, 27, 46]

Page 22: A Review of  Natural Language Processing for  Biosurveillance

Febrile Syndromes

• Syndrome only: 23%
• Fever only: 19%
• Neither: 53%
• Both: 5%

Sensitivity 0% – 12%

Chapman and Dowling, J ISDS, 2007

Page 23: A Review of  Natural Language Processing for  Biosurveillance

Ambulatory Notes

Triage Notes
NC-Detect
– EMT-P + NegEx
– Performs well at identifying clinical conditions

ED Reports
Better case detection than chief complaints
– Topaz (Chapman)
– MCVS (Elkin)
– MedLEE (Friedman, South)

Page 24: A Review of  Natural Language Processing for  Biosurveillance

Inpatient Notes

Chest radiograph reports

Pneumonia: > 90% sensitivity and specificity
– SymText (Fiszman and Chapman)
– MedLEE (Friedman and Hripcsak)
– MCVS (Elkin)

Widened mediastinum
– IPS System (Chapman)

Tuberculosis (Hripcsak)
– MedLEE (Hripcsak)

Page 25: A Review of  Natural Language Processing for  Biosurveillance

Identifying Syndromic Cases from Textual Notes

CC vs full text record for Influenza-like Illness

Data source       Sensitivity   Positive Predictive Value
Chief Complaint   13%           47%
ED Notes          51%           37%
All notes         88%           23%

South et al.
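The two measures in the table follow the standard definitions: sensitivity = TP / (TP + FN) and positive predictive value = TP / (TP + FP), where TP, FN, and FP are true positives, false negatives, and false positives against the reference standard. The sketch below computes both; the counts are hypothetical and are not taken from South et al.

```python
def sensitivity(tp, fn):
    """Fraction of true cases that the classifier identified."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of flagged cases that are true cases."""
    return tp / (tp + fp)

# Hypothetical counts for illustration only (not from the cited study).
tp, fn, fp = 40, 10, 60
print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"PPV         = {ppv(tp, fp):.0%}")
```

The table shows the usual trade-off between the two: adding more notes per patient raises sensitivity (13% to 88%) while lowering positive predictive value (47% to 23%).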

Page 26: A Review of  Natural Language Processing for  Biosurveillance

Identifying Epidemiological Factors from Clinical Notes

Gundlapalli et al.

Factor               Structured   MedLEE clinical notes
ETOH abuse           2.9          3.7
Drug abuse           4.1          29.1
Smoking              1.3          45
Homelessness         0            10.5
Illness duration     0            2.7
History of illness   0            22.3

Page 27: A Review of  Natural Language Processing for  Biosurveillance

Chief complaints
• Moderate performance at identifying syndromic cases
• Poor performance at identifying specific syndromes

Textual Notes
• Good performance at identifying syndromic cases
• Ability to identify specific conditions
• Ability to identify epidemiological factors

Page 28: A Review of  Natural Language Processing for  Biosurveillance

Where do we go from here?

Identifying cases
• Most work on chief complaints
• Current emphasis on reports
  – Need better algorithms and more research
• Temporality and other contextual information

Conveying information
• Little if any applied work on characterizing outbreaks and conveying information to public health

Page 29: A Review of  Natural Language Processing for  Biosurveillance

Conclusion

• Data in clinical texts are useful for biosurveillance
• Chief complaints most frequently used data source
  – Poor to moderate performance
• Clinical notes promise better performance
  – More complicated text
  – Timeliness dependent on institution
  – Early stages of development and evaluation
• Need to develop more applications applying NLP to characterization

Page 30: A Review of  Natural Language Processing for  Biosurveillance

Thank You

Wendy W. Chapman: [email protected]
Biomedical Language Understanding Lab

www.dbmi.pitt.edu/blulab

Chapter on NLP for Biosurveillance to appear in
Infectious Disease Informatics and Biosurveillance: Research, Systems, and Case Studies