Syndromic Surveillance from Emergency Department Triage Notes

Karin M. Verspoor, The University of Melbourne

Antonio Jimeno Yepes, The University of Melbourne

Bahadorreza Ofoghi, The University of Melbourne

Geoffrey White, DSTO

26 September 2014 - MQClinicalNLP workshop

Background

Syndromic surveillance refers to the reporting and tracking of reportable and unusual diseases to public health officials. Conventional surveillance strategies are often manual, or depend on confirmatory laboratory testing after a disease diagnosis. These traditional strategies often result in relatively late detection of an outbreak or public health emergency, and strategies for reliably accelerating surveillance are under active research. The aim of our work is the detection of specific syndromes in individual patient triage records in the hospital Emergency Department (ED). We focus on analysing the free text clinical notes written by a triage nurse during a brief pre-diagnostic assessment of a patient upon arrival in the ED. The system can detect patients who appear to have a disease of interest.

Methods

We work with a set of over 310,000 records collected in two Victorian EDs over a period of several years. Each patient triage record in our data includes (1) a free text note and (2) a diagnostic code from the International Classification of Diseases (ICD-10) that was assigned after the fact. These data were used for training and testing various classifiers in a cross-validation setting. We experimented with a range of set-ups, including direct prediction of ICD-10 codes for a given triage note, as well as prediction of “syndromes” defined by specific sets of ICD-10 codes. We also experimented with several feature representations and machine learning models.

Results

In general, the models performed better on syndromes than on direct ICD-10 category classification, suggesting that the syndrome definitions are clinically coherent. Performance varied substantially across syndromes; several syndromes had too few examples in the dataset to build an effective classifier. The best performance on these tasks was achieved by a machine learning model that incorporates pre-processing of the texts to identify direct mentions of ICD-10 and SNOMED CT terms.

Conclusion

We have demonstrated that it is possible to build an effective syndrome detection tool for ED triage notes where adequate and reliable training data are available for a given syndrome of interest. We have shown that semantic abstraction of the text into “medical concept space” benefits this task.

SynSurv

• SynSurv
  – Victorian Department of Health pilot syndromic surveillance program
  – Detection of outbreaks based on ICD-10 diagnostic codes and presenting complaints as captured in free text notes

Our focus: extracting information from unstructured free text to enable “early warning” monitoring

Objectives of our project

• Exploration of the application of natural language processing techniques to triage notes for syndromic surveillance
  – To enable surveillance directly from notes, with integration into the natural workflow of the ED
  – To support higher sensitivity and higher precision than keyword-based methods

Emergency Department triage notes

• Free text notes
  – written by the triage nurse upon assessment in the Emergency Department
  – capture the presenting symptoms and complaints of a patient

CENTRAL CHEST DISCOMFORT WHILE EATING, RADIATING TO ARMS. PPM INSERTED 2/52 AGO. PAIN FREE O/A. HR72, BP160

FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA

L BASAL GANGLIAN BLEED POST COLLAPSE, NON VERBAL, EYES SPON OPENED, HYPERTENSIVE, P 70REG, PEARL, PMX CEREBRAL BLEED

SynSurv data characteristics

• 918,330 records
• 730,054 records with ICD-10 diagnosis
• 456,213 records with note text
• 316,362 records with ICD-10 diagnosis and note text
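The four counts describe overlapping subsets of the same collection, so the size of each disjoint group follows by inclusion-exclusion. The short Python sketch below assumes exactly that nesting (the 316,362 records with both fields are included in each of the two middle figures); the group names are hypothetical labels.

```python
def partition(total, with_icd, with_note, with_both):
    """Split records into four disjoint groups using inclusion-exclusion."""
    note_only = with_note - with_both  # note text, but no ICD-10 diagnosis
    icd_only = with_icd - with_both    # ICD-10 diagnosis, but no note text
    neither = total - (with_icd + with_note - with_both)
    return {"both": with_both, "note_only": note_only,
            "icd_only": icd_only, "neither": neither}


# Counts as listed on the slide.
groups = partition(total=918_330, with_icd=730_054,
                   with_note=456_213, with_both=316_362)
for name, count in groups.items():
    print(f"{name:>9}: {count:>7,}")
```

Under this assumption, 139,851 records have note text but no ICD-10 diagnosis; these are the records that a note-based classifier could add to ICD-10-based counts.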

Two sets of experiments

• Given a free text note,
  – predict the ICD-10 code(s) for the note
  – predict a syndromic group, based on pre-defined sets of ICD-10 codes of interest

Machine learning for text analysis

Training (diagram):

• Training set: notes + labels for classes of interest (e.g. ICD-10 codes)
• Language processing, drawing on biomedical knowledge sources: UMLS (SNOMED CT, ICD)
• Features: words, phrases, linguistic categories; names of entities; domain concepts; document features
• Machine learning algorithm
• Model relating features of the text to classes of interest
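The feature types listed in the diagram can be made concrete with a minimal Python sketch that covers only three of them: words, two-word phrases (bigrams) and one document feature. The `w=`/`bi=`/`len_bucket=` naming scheme and the length bucket are hypothetical choices for illustration; linguistic categories, entity names and domain concepts would need additional tools.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_features(note: str) -> Counter:
    """Sparse feature counts: words, two-word phrases and one document feature."""
    tokens = TOKEN_RE.findall(note.lower())
    features = Counter(f"w={t}" for t in tokens)                         # words
    features.update(f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:]))  # phrases
    # Hypothetical document feature: note length in buckets of ten tokens.
    features[f"len_bucket={len(tokens) // 10}"] += 1
    return features


print(extract_features("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA"))
```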

Machine learning for text analysis

Prediction (diagram):

• New notes to be classified
• Language processing, drawing on biomedical knowledge sources: UMLS (SNOMED CT, ICD)
• Features: words, phrases, linguistic categories; names of entities; domain concepts; document features
• Model
• Predicted classification (label)
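Training and prediction can be illustrated end to end with a small stand-in model. The slides do not say which learning algorithm the best system used, so the sketch below swaps in multinomial naive Bayes over word counts, written with the Python standard library only; the toy notes and labels are hypothetical, and no concept features are included.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(note):
    return re.findall(r"[a-z0-9]+", note.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, notes, labels):
        # Training: count classes, and words within each class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for note, label in zip(notes, labels):
            self.word_counts[label].update(tokenize(note))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_proba(self, note):
        # Prediction: log prior plus smoothed log likelihood of each known word.
        n, v = sum(self.class_counts.values()), len(self.vocab)
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n)
            for w in tokenize(note):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalise, subtracting the maximum first to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, note):
        probs = self.predict_proba(note)
        return max(probs, key=probs.get)


# Toy, hypothetical training data (not from the SynSurv collection).
notes = ["FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA", "FEVER COUGH SORE THROAT",
         "CENTRAL CHEST DISCOMFORT RADIATING TO ARMS", "CHEST PAIN SHORTNESS OF BREATH"]
labels = ["flu-like", "flu-like", "other", "other"]
model = NaiveBayes().fit(notes, labels)
print(model.predict("FLU LIKE SYMPTOMS FEVER"), model.predict_proba("CHEST PAIN"))
```

`predict_proba` returns normalised class probabilities, which is the kind of score a probability threshold can act on.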

Abstracting linguistic variation

• Terminology mapping tools generalise language variation

• e.g. UMLS Concept C0027497:
  – nausea
  – nauseated
  – feels sick
  – feeling sick
  – queasy
  – felt sick
  – nauseous
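Concept mapping can be approximated, very roughly, by a dictionary lookup. The Python sketch below assumes a hand-built lexicon containing only the variants listed above, all mapped to C0027497; real terminology mapping tools (e.g. those built on the UMLS) handle far more variation, so treat this as an illustration of the idea, not of those tools.

```python
import re

# Hypothetical mini-lexicon built only from the variants listed on the slide.
LEXICON = {
    "nausea": "C0027497",
    "nauseated": "C0027497",
    "nauseous": "C0027497",
    "queasy": "C0027497",
    "feels sick": "C0027497",
    "feeling sick": "C0027497",
    "felt sick": "C0027497",
}
MAX_TERM_LEN = max(len(term.split()) for term in LEXICON)


def map_concepts(note):
    """Greedy longest-match lookup; returns concept IDs in order of mention."""
    tokens = re.findall(r"[a-z]+", note.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            cui = LEXICON.get(" ".join(tokens[i:i + n]))
            if cui:
                found.append(cui)
                i += n
                break
        else:  # no lexicon term starts at token i
            i += 1
    return found


print(map_concepts("FEELING SICK, VOMITING X2"))
print(map_concepts("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA"))
```

Both example notes map to the same concept despite different wording, which is the generalisation the slide describes.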

Predicting ICD-10 codes (Results)

• The direct term-matching strategy was outperformed by machine learning
  – The performance difference between micro-average and macro-average indicates that some ICD-10 codes are underrepresented in the data and cannot be modelled well
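The gap between the two averages is easiest to see with numbers. Assuming F1 as the metric (the slide does not name one), the sketch below compares micro-averaging, which pools counts over all codes, with macro-averaging, which averages per-code scores; the counts are made up.

```python
def f1(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def micro_macro_f1(per_class):
    """per_class maps each class to a (tp, fp, fn) tuple."""
    macro = sum(f1(*c) for c in per_class.values()) / len(per_class)
    tp, fp, fn = (sum(c[i] for c in per_class.values()) for i in range(3))
    return f1(tp, fp, fn), macro


# Hypothetical counts: one well-represented code and one rare code.
counts = {"frequent_code": (900, 50, 50), "rare_code": (2, 5, 8)}
micro, macro = micro_macro_f1(counts)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

With these made-up counts the sketch prints micro-F1 = 0.941 and macro-F1 = 0.591: the rare, poorly modelled code barely moves the pooled score but pulls the per-class average down sharply.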

Predicting Syndromic Groups

• Task
  – Syndromic groups are defined by sets of ICD-10 codes, e.g. the flu-like group
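Deriving syndromic labels from ICD-10 codes amounts to a set-membership test. In the sketch below the `SYNDROMES` table is hypothetical: the slides do not list the SynSurv code sets, and the influenza prefixes J09-J11 only stand in for the real flu-like definition. The pairing of the febrile note with J11.1 is likewise invented for illustration.

```python
# Hypothetical syndrome definitions. The actual SynSurv code sets are not
# listed in these slides; the influenza prefixes below are placeholders.
SYNDROMES = {
    "flu-like": ("J09", "J10", "J11"),
}


def syndromes_for(icd10_code):
    """Return the syndromic groups whose code prefixes match an ICD-10 code."""
    code = icd10_code.upper().replace(".", "")
    return sorted(name for name, prefixes in SYNDROMES.items()
                  if code.startswith(prefixes))


# Turn (note, ICD-10 code) pairs into binary training labels for one syndrome.
records = [("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA", "J11.1"),
           ("? FISH BONE IN THROAT", "T18")]
labels = [int("flu-like" in syndromes_for(code)) for _, code in records]
print(labels)
```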

Predicting Syndromic Groups (Detailed Results)

Issues causing low performance

• Inconsistency in ICD-10 annotation (the same note assigned different codes)
  – ? FISH BONE IN THROAT: J03
  – ? FISH BONE IN THROAT: T18
  – ? FISH BONE IN THROAT: T18
  – ? FISH BONE IN THROAT: S10.9
  – ? FISH BONE IN THROAT: J02.0

• Notes not related to the patient’s visit
  – DIRECT ADMISSION FROM BAIRNSDALE TO 3S BED 25

• Typos in the note text
  – ? FIH BONE IN THROAT
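Two of these issues can at least be surfaced automatically. The sketch below, which is not the authors' method, groups identical note texts to flag inconsistent codes and uses `difflib.get_close_matches` from the Python standard library to correct out-of-vocabulary tokens; the vocabulary and the 0.75 similarity cutoff are hypothetical choices.

```python
import difflib
from collections import defaultdict


def inconsistent_notes(records):
    """Map normalised note text to its codes, keeping notes with >1 distinct code."""
    codes = defaultdict(list)
    for note, code in records:
        codes[" ".join(note.upper().split())].append(code)
    return {note: cs for note, cs in codes.items() if len(set(cs)) > 1}


def correct_tokens(note, vocabulary, cutoff=0.75):
    """Replace each out-of-vocabulary token with its closest vocabulary word, if any."""
    fixed = []
    for tok in note.upper().split():
        if tok in vocabulary:
            fixed.append(tok)
        else:
            match = difflib.get_close_matches(tok, vocabulary, n=1, cutoff=cutoff)
            fixed.append(match[0] if match else tok)
    return " ".join(fixed)


# The annotation examples from the slide.
records = [("? FISH BONE IN THROAT", c) for c in ("J03", "T18", "T18", "S10.9", "J02.0")]
print(inconsistent_notes(records))

vocabulary = ["FISH", "BONE", "IN", "THROAT"]  # hypothetical tiny vocabulary
print(correct_tokens("? FIH BONE IN THROAT", vocabulary))
```

Notes unrelated to the visit, such as the direct-admission example, are not addressed by either function.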

Integrating with DSTO’s BioSurv system

• Input to the DSTO BioSurv system
  – Trained machine learning models used as input to BioSurv (e.g., C2 algorithm)
  – Prediction probability > 0.5

[Diagram: Model → predicted classification (label): flu-like illness? Yes → BioSurv count +1; No → not counted]
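The decision rule in the diagram reduces to a threshold and a counter. A minimal sketch, assuming each classified note carries a visit date and a predicted probability for flu-like illness (the dates and probabilities below are hypothetical): notes scoring strictly above 0.5 add one to that day's count.

```python
from collections import Counter
from datetime import date

THRESHOLD = 0.5  # count a note when its predicted probability exceeds 0.5


def daily_counts(predictions, threshold=THRESHOLD):
    """Count notes per day whose flu-like probability is strictly above threshold.

    predictions: iterable of (visit_date, probability) pairs.
    Days with no flagged notes are absent; a Counter returns 0 for them.
    """
    counts = Counter()
    for visit_date, prob in predictions:
        if prob > threshold:
            counts[visit_date] += 1
    return counts


# Hypothetical model outputs for four notes over two days.
preds = [(date(2014, 9, 1), 0.90), (date(2014, 9, 1), 0.40),
         (date(2014, 9, 1), 0.51), (date(2014, 9, 2), 0.50)]
counts = daily_counts(preds)
print(counts[date(2014, 9, 1)], counts[date(2014, 9, 2)])
```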

Example: NLP annotation of notes for the flu-like syndrome

• Records with no ICD-10 codes in the database are now available to BioSurv

• 730,054 out of 918,330 records have ICD-10 codes

C2 algorithm: ICD-10 vs NLP

• Earlier alert time using NLP methods

[Figure: C2 algorithm output for ICD-10-based and NLP-based counts]
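The slides name the C2 algorithm without giving its details. The sketch below follows the commonly published description of the CDC EARS C2 method (a seven-day baseline separated from the current day by a two-day guard band, with an alert when the current count exceeds the baseline mean by more than three standard deviations); BioSurv's own implementation may differ, and the `min_sd` floor is a hypothetical safeguard against a zero standard deviation. The daily counts are invented.

```python
import statistics


def ears_c2(counts, baseline=7, guard=2, threshold=3.0, min_sd=0.5):
    """Sketch of an EARS C2-style detector over a list of daily counts.

    For day t the baseline is the `baseline` days ending `guard` days before t
    (days t-9 .. t-3 with the defaults). Returns (day, score, alert) tuples.
    """
    results = []
    for t in range(baseline + guard, len(counts)):
        window = counts[t - guard - baseline : t - guard]
        mean = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)  # floor value is an assumption
        score = (counts[t] - mean) / sd
        results.append((t, score, score > threshold))
    return results


def first_alert(counts):
    """Index of the first day that raises an alert, or None."""
    return next((t for t, _, alert in ears_c2(counts) if alert), None)


# Hypothetical daily counts: the last day shows a jump.
series = [2, 3, 2, 4, 3, 2, 3, 3, 2, 10]
print(ears_c2(series), first_alert(series))
```

Running the detector separately on ICD-10-based and NLP-based daily counts and comparing `first_alert` is one way to express an alert-time comparison between the two sources.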

Conclusions

• NLP methods can be used to support the BioSurv tool

• Machine learning methods perform better than dictionary-based methods

• Expansion of original syndromic groups improves machine learning performance

• Evaluation is a challenge
  – Noisy training data
  – What’s a “gold standard” alert?

Acknowledgements

• Victorian Department of Health (for SynSurv data)

• Defence Science and Technology Organisation (DSTO) (BioSurv system; funding and collaboration)

© Copyright The University of Melbourne 2011