Syndromic Surveillance from Emergency Department Triage Notes

Karin M. Verspoor, The University of Melbourne

Antonio Jimeno Yepes, The University of Melbourne

Bahadorreza Ofoghi, The University of Melbourne

Geoffrey White, DSTO

26 September 2014 - MQClinicalNLP workshop

SynSurv

• SynSurv
– Victorian Department of Health pilot syndromic surveillance program
– Detection of outbreaks based on ICD-10 diagnostic codes and presenting complaints as captured in free text notes

Our focus: Extracting information from unstructured free text to enable “early warning” monitoring

Objectives of our project

• Exploration of the application of natural language processing techniques to triage notes for syndromic surveillance
– To enable surveillance directly from notes; integration into natural workflow of ED
– To support higher sensitivity and higher precision than keyword-based methods

Emergency Department triage notes

• Free text notes
– written by triage nurse upon assessment in the Emergency Department
– captures presenting symptoms and complaints of a patient

CENTRAL CHEST DISCOMFORT WHILE EATING, RADIATING TO ARMS. PPM INSERTED 2/52 AGO. PAIN FREE O/A. HR72, BP160

FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA

L BASAL GANGLIAN BLEED POST COLLAPSE, NON VERBAL, EYES SPON OPENED, HYPERTENSIVE, P 70REG, PEARL, PMX CEREBRAL BLEED

SynSurv data characteristics

• 918,330 records
• 730,054 records with ICD-10 diagnosis
• 456,213 records with note text
• 316,362 records with ICD-10 diagnosis and note text
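The counts above imply some coverage figures that matter later in the talk, in particular how many records carry note text but no ICD-10 diagnosis. A small sketch of that arithmetic, using only the four numbers on the slide (the variable and function names are illustrative):

```python
# Record counts as stated on the slide.
total = 918_330
with_icd = 730_054
with_note = 456_213
with_both = 316_362

# Records that carry free text but no ICD-10 diagnosis.
note_only = with_note - with_both


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = {
    "ICD-10 diagnosis": share(with_icd, total),
    "note text": share(with_note, total),
    "ICD-10 diagnosis and note text": share(with_both, total),
    "note text only": share(note_only, total),
}

for name, pct in coverage.items():
    print(f"{name}: {pct}% of all records")
```

The "note text only" records are the ones that a text classifier can make usable for surveillance even though no diagnostic code was recorded.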

Two sets of Experiments

• Given a free text note,
– Predict the ICD-10 code(s) for the note
– Predict a syndromic group, based on pre-defined sets of ICD-10 codes of interest

Machine learning for text analysis

[Diagram: training]
• Training set: notes + labels for classes of interest (e.g. ICD-10 codes)
• Language processing, drawing on biomedical knowledge sources: UMLS (SNOMED CT, ICD)
• Features: words, phrases, linguistic categories; names of entities; domain concepts; document features
• Machine learning algorithm
• Model: relating features of the text to classes of interest
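To make the feature types in the training diagram concrete, here is a minimal sketch that covers only the simplest of them: words, two-word phrases, and one document feature. The function name, the feature naming scheme and the tokenisation rule are assumptions for illustration; the slides do not describe the actual feature extraction.

```python
import re
from collections import Counter


def extract_features(note):
    """Turn a triage note into a bag of simple features (illustrative only)."""
    # Lower-case and keep alphanumeric runs; "/" is kept so that
    # abbreviations such as "o/a" survive as one token.
    tokens = re.findall(r"[a-z0-9/]+", note.lower())
    features = Counter()
    for tok in tokens:
        features[f"word={tok}"] += 1
    # Adjacent word pairs stand in for "phrases".
    for left, right in zip(tokens, tokens[1:]):
        features[f"phrase={left}_{right}"] += 1
    # One document-level feature: the note length in tokens.
    features["doc_length"] = len(tokens)
    return features


feats = extract_features("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA")
for name in sorted(feats):
    print(name, feats[name])
```

Linguistic categories, entity names and domain concepts would need a language processing pipeline and a terminology resource, which this sketch leaves out.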

Machine learning for text analysis

[Diagram: classification]
• New notes to be classified
• Language processing, drawing on biomedical knowledge sources: UMLS (SNOMED CT, ICD)
• Features: words, phrases, linguistic categories; names of entities; domain concepts; document features
• Model
• Predicted classification (label)
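The train-then-classify loop in the two diagrams can be sketched end to end in a few lines. The slides do not name the learning algorithm, so multinomial Naive Bayes over single words is swapped in here purely for illustration. The labels attached to the example notes are assumptions, and one training note is invented and marked as such.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(note):
    """Lower-case the note and keep alphabetic words only."""
    return re.findall(r"[a-z]+", note.lower())


def train(examples):
    """Fit a multinomial Naive Bayes model from (note, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for note, label in examples:
        label_counts[label] += 1
        for tok in tokenize(note):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_proba(model, note):
    """Return {label: probability} using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokenize(note):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalise in log space to avoid underflow on long notes.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


# Toy training set; labels are assumed for illustration.
training = [
    ("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA", "flu-like"),
    ("FEVER COUGH ACHES FLU LIKE SYMPTOMS", "flu-like"),  # invented note
    ("CENTRAL CHEST DISCOMFORT WHILE EATING, RADIATING TO ARMS", "other"),
    ("? FISH BONE IN THROAT", "other"),
]

model = train(training)
probs = predict_proba(model, "FLU LIKE SYMPTOMS AND FEVER")
print(max(probs, key=probs.get), probs)
```

A real system would be trained on the full set of labelled records and would use the richer features listed in the diagrams rather than single words.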

Abstracting linguistic variation

• Terminology mapping tools generalise language variation

• e.g. UMLS Concept C0027497
– nausea
– nauseated
– feels sick
– feeling sick
– queasy
– felt sick
– nauseous
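The idea of collapsing surface variants onto one concept identifier can be shown with a plain lookup. This is a minimal sketch that uses only the variants listed above; the names `LEXICON` and `map_concepts` are hypothetical, and a real terminology mapping tool handles far more variation (inflection, word order, abbreviations) than exact phrase matching.

```python
import re

# Surface forms listed on the slide for UMLS concept C0027497.
NAUSEA_VARIANTS = [
    "nausea", "nauseated", "feels sick", "feeling sick",
    "queasy", "felt sick", "nauseous",
]
LEXICON = {variant: "C0027497" for variant in NAUSEA_VARIANTS}

# Longest variants first, so multi-word phrases are preferred;
# word boundaries stop "nausea" from matching inside "nauseated".
_alternatives = sorted(LEXICON, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _alternatives) + r")\b"
)


def map_concepts(note):
    """Return the concept identifier for each variant found in the note."""
    return [LEXICON[m.group(1)] for m in PATTERN.finditer(note.lower())]


print(map_concepts("PT FEELS SICK, QUEASY SINCE AM"))
print(map_concepts("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA"))
```

Once every variant maps to the same identifier, a classifier sees one "domain concept" feature instead of seven unrelated strings.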

Predicting ICD-10 codes (Results)

• Direct term matching strategy outperformed by machine learning
– Performance difference between micro-average and macro-average indicates that some ICD-10 codes are underrepresented in the data, and cannot be modeled well
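Why a gap between the two averages points at rare codes can be seen in a toy calculation. The sketch below assumes F1 as the metric (the slides do not say which measure was averaged) and uses invented labels: one frequent code that is always predicted correctly and one rare code that is always missed.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro-averaged F1, macro-averaged F1) for single-label data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = set(gold) | set(pred)
    # Macro: average the per-label scores, so every code counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts first, so frequent codes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Invented toy data: 8 notes with a frequent code, 2 with a rare one.
gold = ["J11"] * 8 + ["T18"] * 2
pred = ["J11"] * 10  # the rare code is never predicted

micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

The pooled score stays high because most notes carry the frequent code, while the per-code average is pulled down by the code the model never gets right.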

Predicting Syndromic Groups

• Task
– Syndromic groups are defined by sets of ICD-10 codes, e.g. Flu like group
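A syndromic group defined as a set of ICD-10 codes reduces to a membership test. In the sketch below the group definition is hypothetical: J10 and J11 are ICD-10 influenza categories, but the actual code sets used for the SynSurv groups are not given in the slides. Matching on the three-character category is also an assumption.

```python
# Hypothetical group definition, for illustration only.
SYNDROMIC_GROUPS = {
    "flu-like": {"J10", "J11"},
}


def groups_for(icd_codes):
    """Return the syndromic groups matched by a record's ICD-10 codes."""
    # Compare on the three-character category, so "J11.1" matches "J11".
    categories = {code.split(".")[0].upper() for code in icd_codes}
    return sorted(
        group
        for group, members in SYNDROMIC_GROUPS.items()
        if categories & members
    )


print(groups_for(["J11.1"]))
print(groups_for(["T18"]))
```

These group labels are what the second set of experiments tries to predict directly from the note text.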

Predicting Syndromic Groups (Detailed Results)

Issues for low performance

• Inconsistency in ICD-10 annotation
– ? FISH BONE IN THROAT J03
– ? FISH BONE IN THROAT T18
– ? FISH BONE IN THROAT T18
– ? FISH BONE IN THROAT S10.9
– ? FISH BONE IN THROAT J02.0

• Notes not related to the patient's visit
– DIRECT ADMISSION FROM BAIRNSDALE TO 3S BED 25

• Typos in the notes text
– ? FIH BONE IN THROAT
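The annotation inconsistency above can be surfaced automatically by grouping identical notes and listing the distinct codes assigned to them. A minimal sketch, using the five note and code pairs from the slide (the function name and the whitespace normalisation are assumptions):

```python
from collections import defaultdict

# (note text, ICD-10 code) pairs as listed on the slide.
records = [
    ("? FISH BONE IN THROAT", "J03"),
    ("? FISH BONE IN THROAT", "T18"),
    ("? FISH BONE IN THROAT", "T18"),
    ("? FISH BONE IN THROAT", "S10.9"),
    ("? FISH BONE IN THROAT", "J02.0"),
]


def conflicting_notes(pairs):
    """Map each note text that has more than one distinct code to its codes."""
    codes_by_note = defaultdict(set)
    for note, code in pairs:
        # Upper-case and collapse whitespace so trivial differences
        # do not hide identical notes.
        key = " ".join(note.upper().split())
        codes_by_note[key].add(code)
    return {
        note: sorted(codes)
        for note, codes in codes_by_note.items()
        if len(codes) > 1
    }


conflicts = conflicting_notes(records)
for note, codes in conflicts.items():
    print(f"{note!r} -> {codes}")
```

Exact-text grouping will not catch the typo case ("FIH" for "FISH"); those notes end up as separate keys.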

Integrating with DSTO’s BioSurv system

• Input to the DSTO BioSurv system
– Trained machine learning models used as input to BioSurv (e.g., C2 algorithm)
– Prediction probability > 0.5

[Diagram: feeding predictions to BioSurv]
• Model → predicted classification (label)
• Flu-like illness?
– Yes: BioSurv count +1
– No: count unchanged
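The decision step in the diagram amounts to thresholding the model's probability and incrementing a daily count. The sketch below assumes the input is a list of (date, probability) pairs for one syndromic group; the dates and probabilities are invented, and how BioSurv actually ingests counts is not described in the slides.

```python
from collections import Counter
from datetime import date

THRESHOLD = 0.5  # "Prediction probability > 0.5" from the slide


def daily_counts(predictions, threshold=THRESHOLD):
    """Count, per day, the notes whose probability exceeds the threshold."""
    counts = Counter()
    for day, prob in predictions:
        if prob > threshold:  # strictly greater, as stated on the slide
            counts[day] += 1
    return counts


# Invented predictions for the flu-like group.
predictions = [
    (date(2014, 7, 1), 0.91),
    (date(2014, 7, 1), 0.12),
    (date(2014, 7, 1), 0.50),  # exactly at the threshold: not counted
    (date(2014, 7, 2), 0.77),
]

counts = daily_counts(predictions)
for day in sorted(counts):
    print(day.isoformat(), counts[day])
```

The resulting daily series is the kind of input an aberration detector can monitor.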

Example: Flu like syndrome NLP notes annotation

• Records with no ICD-10 codes in the database are now available to BioSurv

• 730,054 out of 918,330 records with ICD-10 codes

C2 algorithm: ICD-10 vs NLP

• Earlier alert time using NLP methods

[Figure: C2 algorithm alerts, ICD-10 panel and NLP panel]
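For readers unfamiliar with C2, here is a minimal sketch, assuming "C2" refers to the detector from the CDC Early Aberration Reporting System as it is commonly described: a 7-day baseline, a 2-day gap before the current day, and an alert when the count is more than 3 standard deviations above the baseline mean. BioSurv's implementation may differ. The floor on the standard deviation (`min_sd`) is an added assumption to avoid division by zero, and the daily counts are invented.

```python
from statistics import mean, stdev


def c2_alerts(counts, baseline=7, lag=2, threshold=3.0, min_sd=0.5):
    """Return the indices of days whose count triggers a C2-style alert."""
    alerts = []
    for t in range(baseline + lag, len(counts)):
        # Baseline window ends `lag` days before day t, so an outbreak
        # that is just starting does not inflate its own baseline.
        window = counts[t - lag - baseline : t - lag]
        sd = max(stdev(window), min_sd)
        if (counts[t] - mean(window)) / sd > threshold:
            alerts.append(t)
    return alerts


# Invented daily counts for one syndromic group; the last day spikes.
daily = [2, 3, 2, 4, 3, 2, 3, 3, 2, 12]
print(c2_alerts(daily))
```

Running the same detector over counts derived from ICD-10 codes and over counts derived from NLP predictions gives the two alert series that the comparison refers to.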

Conclusions

• NLP methods can be used to support the BioSurv tool

• Machine learning methods perform better than dictionary-based methods

• Expansion of original syndromic groups improves machine learning performance

• Evaluation is a challenge
– Noisy training data
– What’s a “gold standard” alert?

Acknowledgements

• Victorian Department of Health (for SynSurv data)

• Defence Science and Technology Organisation (DSTO) (BioSurv system)

(funding and collaboration)

© Copyright The University of Melbourne 2011
