
From free text to clinical data

Language and Computing

Davide Zaccagnini, MD
Karen Doyle, RN
October 23, 2007

Outline

• Reality of Applying NLP to AHLTA documents

• Use Cases

• Ontology-Based NLP

Use Cases

• PRIMARY Use Case for Health Care Documentation, compared with documentation produced for Biomedical Research
  – Collect information to determine diagnosis(es), execute a plan of treatment, and communicate with the healthcare team.
• By-products of Electronic Documentation
  – Coding for Billing
  – Problem Lists
  – Past Medical History
  – Social History; 14 elements, e.g. tobacco use, ETOH, toxin exposure, marital status
  – Family History
  – Medications
  – Allergies
  – Bio-surveillance
  – Quality Metrics; Pay for Performance, Joint Commission, HEDIS
  – Research

AHLTA offers Structured Documentation Tool

Medcin Terms in Blue

Structured and Unstructured Text DoD HA Policy Guidance

Ref ASAD Health Affairs August 7, 2007

Blue is the original code, calculated from the structured documentation. Pink shows how the doctor can change the subscores, but the document itself does not change.

Background of TATRC HPI Free Text DUMMY

• Lost Data in S/O sections: What is the value?
• Patient History
  – Patient’s “story”, reflects signs and symptoms
  – History of Present Illness
  – Review of Systems
  – Past Family, Social and Medical History
  – Used to calculate Evaluation and Management (E&M) Billing Codes
• HPI: History of Present Illness
  – Definition: A chronological description of the present illness from the first sign or symptom, or from the last encounter
  – Comprises 8 elements used in the calculation of the E&M code: location, quality, severity, duration, timing, context, modifying factors, associated signs and symptoms

(HPI Dummy # 1) Free text section extracted manually for analysis

100 Texts for Processing

Free Text to Data: What is desirable?

• HPI 1 45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. Denies any pain. Not taking any pain meds. Staples removed on 9May. Appetite good. No N/V. Normal bowel/bladder function. She is very happy with the outcome of surgery. Only concern is incision -very small area that has not healed completely. has been keeping the incision clean and dry.

• Expand Abbreviations
• Codify Terms to Vocabularies: ICD-9, SNOMED, MEDCIN
• Negation
• Modality
• Applying Rules
  – Financial Billing
  – Obtain: age, height, weight, blood pressure, dates
  – Quality Metrics
  – Surveillance
  – History: Family, Past Medical, Current Problems?
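The first and third steps above (expanding abbreviations, flagging negation) can be sketched in a few lines. This is a minimal illustration, not the system described in these slides: the abbreviation table, the cue list, and the function names are all assumptions, and negation is only detected when a cue word opens the sentence.

```python
import re

# Hypothetical abbreviation table; a real system would use a curated clinical lexicon.
ABBREVIATIONS = {
    "f/c": "fever/chills",
    "n/v": "nausea/vomiting",
    "tah": "total abdominal hysterectomy",
    "s/p": "status post",
}

# Hypothetical negation cues, only checked at the start of a sentence.
NEGATION_CUES = ("denies", "no", "not")


def expand(sentence):
    """Replace each known abbreviation token with its expansion."""
    out = []
    for token in sentence.split():
        core = token.strip(".,;")  # keep trailing punctuation out of the lookup
        expansion = ABBREVIATIONS.get(core.lower())
        out.append(token.replace(core, expansion) if expansion else token)
    return " ".join(out)


def is_negated(sentence):
    """True when the sentence opens with one of the negation cues."""
    words = sentence.lower().split()
    return bool(words) and words[0] in NEGATION_CUES


def process(text):
    """Split on sentence-final periods, then expand and flag each sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]
    return [(expand(s), is_negated(s)) for s in sentences]


for sentence, negated in process("45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. No N/V."):
    print(negated, sentence)
```

Coding to ICD-9, SNOMED or MEDCIN would then run on the expanded text, with negated sentences kept apart so that "Denies fever/chills" is not coded as a positive finding.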

Free Text Example

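Expand Abbreviations

Code to Vocabularies

Evaluate for Negation

Apply Rules

[Figure: the HPI text annotated at each step, highlighting the terms “appetite good”, “good”, “very”, “f/c”, “n/v”, “TAH”, “pain”, “happy” and “taking pain meds”, together with the label “negation”.]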

Ontology-based NLP

Natural Language Processing and Understanding

“… natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

Wikipedia

DATA MODELS vs. ONTOLOGY

ONTOLOGY: FORMALLY DEFINED CONCEPTS
• NO PREDEF. USE
• REALITY DRIVEN
• NO PREDEF. CONTEXT
• INFERRED MODEL

DATA MODELS: AGREED-UPON TERMS
• PREDEF. USE
• DATA DRIVEN
• PREDEF. CONTEXT
• SPECIALIZED MODEL

Representations (formal or otherwise)

What is fever?

All definitions are accurate within their model, but what is fever?

does the patient have fever?

ID#      ZIP code   BP
001123   02139      80/120
001223   24425      65/130

The world according to a database:
Patients {ID#, ZIP code, BP}

The world according to an ontology:
patient has (identifier (is_a (ID#))) ∩ lives_in (geographic_area) ∩ has (blood_pressure (is_measured_by (blood pressure measurement (…))))

[Figure: ontology graph with the nodes patient, identifier, ID#, geographical area, ZIP code, blood pressure, blood pressure measurement and value (80/120, 65/130), linked by the relations is_a, is_identified_by, has, lives in, is_measured_by and generates.]

Formal representations

Ontologies: the meaning of data

An ontology:
• Explicitly specifies meaning
• Represents reality, not data
• Is a formal schema
• Its consistency can be automatically enforced and checked
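The contrast between a database row and an ontology can be made concrete with a small sketch. The triples below are one plausible reading of the patient example (the exact edges of the original diagram are not recoverable), and the node names `patient_1`, `blood_pressure_1` and `bp_measurement_1`, as well as the helper functions, are invented for illustration.

```python
# Database view: values under agreed-upon column names, with no stated meaning.
row = {"ID#": "001123", "ZIP code": "02139", "BP": "80/120"}

# Ontology view: the same facts as explicit (subject, relation, object) triples.
# Instance names such as "patient_1" are hypothetical.
triples = [
    ("patient_1", "is_a", "patient"),
    ("patient_1", "is_identified_by", "001123"),
    ("001123", "is_a", "ID#"),
    ("ID#", "is_a", "identifier"),
    ("patient_1", "lives_in", "02139"),
    ("02139", "is_a", "ZIP code"),
    ("ZIP code", "is_a", "geographical area"),
    ("patient_1", "has", "blood_pressure_1"),
    ("blood_pressure_1", "is_a", "blood pressure"),
    ("blood_pressure_1", "is_measured_by", "bp_measurement_1"),
    ("bp_measurement_1", "is_a", "blood pressure measurement"),
    ("bp_measurement_1", "generates", "80/120"),
]


def objects(subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def is_a(node, cls):
    """Follow is_a links transitively from `node` looking for `cls`."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(objects(current, "is_a"))
    return False


def lives_in_is_consistent():
    """A toy consistency rule: every lives_in target must be a geographical area."""
    return all(is_a(o, "geographical area") for _, r, o in triples if r == "lives_in")


print(is_a("02139", "geographical area"))   # the ZIP code is known to be a place
print(lives_in_is_consistent())
```

Because the meaning of each value is stated rather than implied by a column name, a rule such as `lives_in_is_consistent` can be checked mechanically, which is the sense of "consistency can be automatically enforced and checked".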

NLP Workflow

• Example Pipeline

Input handler -> Fetches document and passes it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Maps tokens and multi-words to ontology. Rewriting to enhance mapping
Section labeler -> Assigns section labels to paragraphs
Syntactic parser -> Performs syntactic parsing, validating against grammar
Semantic tagger -> Further deduces concepts based on syntax, rewriting, full definitions and so on
Fragment labeler -> Assigns fragment labels to pieces of text within sections
Lexeme filter -> Filters out function words (e.g. determiners) to reduce false-positive mappings
Negation/modality -> Identifies negation, modality and future
Vital signs extractor -> Extracts vital signs
Labs extractor -> Extracts lab results
FreePharma -> Extracts medications
Disambiguator -> Disambiguates concepts
Coder -> Codes to standard classification systems like SNOMED-CT, ICD-9,…
Concept filters -> Marks concepts that belong to different filters (e.g. diagnoses, procedures)
Relevance ranker -> Calculates relevance of concepts
Output handler -> Creates XML/HTML/… output
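The pipeline idea itself, a chain of components that each enrich one shared document, can be sketched as follows. Only two toy stages are shown, and their internals (blank-line paragraph splitting, a four-word function-word list) are assumptions made for illustration, not the behaviour of the actual components.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Component = Callable[[Document], Document]

# Tiny stand-in list of function words; a real lexeme filter would use a full lexicon.
FUNCTION_WORDS = {"the", "a", "an", "of"}


def paragrapher(doc: Document) -> Document:
    """Split the raw text into paragraphs on blank lines."""
    doc["paragraphs"] = [p.strip() for p in doc["text"].split("\n\n") if p.strip()]
    return doc


def lexeme_filter(doc: Document) -> Document:
    """Tokenise each paragraph and drop function words."""
    doc["tokens"] = [
        [w for w in p.lower().split() if w not in FUNCTION_WORDS]
        for p in doc["paragraphs"]
    ]
    return doc


def run_pipeline(text: str, components: List[Component]) -> Document:
    """Pass one shared document through each component in order."""
    doc: Document = {"text": text}
    for component in components:
        doc = component(doc)
    return doc


result = run_pipeline("HPI\n\nThe patient denies a fever", [paragrapher, lexeme_filter])
print(result["tokens"])
```

Keeping every stage behind the same signature is what lets components be added, dropped or reordered for a given application.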

Semantic Tagging

Concept: SNOMED CT : 29074008 : POLYP OF ANTRUM (DISORDER)

Sample: “Demonstrated benign small polyps in the antrum”

Morphological Variations: antrum > antral; polyp < polyps

Word Clustering: antral polyp; polyp antral

Known Synonyms: maxillary sinus polyp, antral polyp
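A rough sketch of how these three kinds of variation can lead from the sample sentence to the concept. The lookup tables are hypothetical and hold only the entries from this slide; matching on unordered word sets is a crude stand-in for word clustering, since it ignores how far apart the words are.

```python
# Hypothetical tables containing only the example from the slide.
MORPHOLOGICAL_VARIANTS = {"polyps": "polyp", "antrum": "antral"}

CONCEPT = ("29074008", "POLYP OF ANTRUM (DISORDER)")

# Each known term is stored as an unordered word set, so "antral polyp"
# and "polyp antral" are the same key.
TERMS = {
    frozenset({"antral", "polyp"}): CONCEPT,
    frozenset({"maxillary", "sinus", "polyp"}): CONCEPT,
}


def normalise(text):
    """Lower-case, strip punctuation and map morphological variants to a base form."""
    words = (w.strip(".,;:").lower() for w in text.split())
    return {MORPHOLOGICAL_VARIANTS.get(w, w) for w in words if w}


def tag(text):
    """Return the concepts whose term words all occur in the text."""
    words = normalise(text)
    return sorted({concept for term, concept in TERMS.items() if term <= words})


print(tag("Demonstrated benign small polyps in the antrum"))
```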

Types of Disambiguation

by STRING: lexical match between a term (or its inflections) and a concept in the ontology.

Ex.: “Patient presents fever”

[Figure: ontology fragment with the concepts symptom, fever and cough; “fever” in the text matches the concept fever.]

by DEFINITION: match between terms and concepts in the ontology, where these concepts meet necessary and sufficient conditions (logic-based reasoning)

Ex.: “Patient underwent a liver biopsy”

liver biopsy = has_location (liver) ∧ is_a (biopsy)

[Figure: “liver” is matched as an organ and “biopsy” as a procedure; both conditions evaluate to true.]
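Matching by definition can be sketched as a subset test: a concept applies when every condition in its definition is supplied by some term in the text. The two tables hold only the slide's example and their layout is an assumption; a real ontology would derive the conditions by logic-based reasoning rather than a lookup.

```python
# Full definitions: the conditions that are necessary and sufficient for each concept.
DEFINITIONS = {
    "liver biopsy": frozenset({("is_a", "biopsy"), ("has_location", "liver")}),
}

# Hypothetical mapping from a term to the condition it contributes:
# procedure terms fill is_a, organ terms fill has_location.
TERM_CONDITIONS = {
    "biopsy": ("is_a", "biopsy"),
    "liver": ("has_location", "liver"),
}


def match_by_definition(text):
    """Return the concepts whose whole definition is satisfied by terms in the text."""
    words = [w.strip(".,").lower() for w in text.split()]
    found = {TERM_CONDITIONS[w] for w in words if w in TERM_CONDITIONS}
    return [name for name, conditions in DEFINITIONS.items() if conditions <= found]


print(match_by_definition("Patient underwent a liver biopsy"))
print(match_by_definition("Patient underwent a biopsy"))
```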

Types of Disambiguation

by RELATIONSHIPS: match between SOME of the term(s), assigned to different concepts in the ontology, where these concepts compose the full definition of the concept using a ‘suggested parent’.

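Ex.: “CT of thyroid”

“CT of thyroid” = is_a (CT scan) ∧ has_location (thyroid) (both conditions true)

CT of Neck = is_a (CT scan) ∧ has_location (neck)

[Figure: “CT” and “thyroid” are matched to concepts in the ontology; thyroid is linked to neck by has_location, and CT of Neck is proposed through an is_a link as the suggested parent.]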
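One way to picture the "suggested parent" step: when no concept has exactly the conditions found in the text, generalise the location by one has_location step and try again. This is a simplified reading of the slide, assuming a single hypothetical link from thyroid to neck; the function and table names are invented.

```python
from typing import FrozenSet, Optional, Tuple

Condition = Tuple[str, str]

# Fully defined concepts available in the (toy) ontology.
DEFINITIONS = {
    "CT of Neck": frozenset({("is_a", "CT scan"), ("has_location", "neck")}),
}

# Hypothetical anatomical link: thyroid has_location neck.
HAS_LOCATION = {"thyroid": "neck"}


def suggest_parent(conditions: FrozenSet[Condition]) -> Optional[Tuple[str, str]]:
    """Return (concept, kind of match), trying an exact match first."""
    for name, definition in DEFINITIONS.items():
        if definition == conditions:
            return name, "exact"
    # No exact concept: move each location one step up and look again.
    generalised = frozenset(
        (rel, HAS_LOCATION.get(val, val)) if rel == "has_location" else (rel, val)
        for rel, val in conditions
    )
    for name, definition in DEFINITIONS.items():
        if definition == generalised:
            return name, "suggested parent"
    return None


ct_of_thyroid = frozenset({("is_a", "CT scan"), ("has_location", "thyroid")})
print(suggest_parent(ct_of_thyroid))
```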

Types of Disambiguation

Examples of disambiguation

Ontology and NLP

LinKBase® Medical Ontology
• Concepts are mapped to terms in multiple languages: English, Spanish
• Lexicon, Grammar (proprietary)
• Cross-mapped to multiple coding systems: ICD-9, MEDCIN, SNOMED CT, CPT, Radlex (partial)
• Natural language processing; terminologies and data integration

Conclusion

• Ontologies are powerful NLP tools for:
  – Segmentation
  – Disambiguation
  – Higher level inference
  – Interoperability of extracted data
• They require human resources for maintenance, but reduce the need for annotated data
• They are “white boxes”
  – Models that can be expanded and changed
• Combined with stochastic algorithms, they provide both formality and scalability

Thank you

“Patients in the North East have higher blood pressure than the average population”

[Figure: ontology graph with the nodes patient, identifier, ID#, geographical area, ZIP code, blood pressure, blood pressure measurement and value (80/120, 65/130), linked by the relations is_a, is_identified_by, has, lives in, is_measured_by and generates.]

NLP/U, formal representations

Disambiguation

• Words in document are mapped to concepts in the ontology

• When more than one candidate exists in the ontology, the system builds a graph of concept relations using:
  1. Nearness in sentence
  2. IS_A relationships
  3. Horizontal relationships

Syntactic Parsing

«A very young patient was given a double dose by his mother.»

The subject.

The predicate

Note passive construction

Negation via Syntax

Modality via Syntax

Reference Resolution

“TeSSI” understands indirect reference to patient

The system is able to disambiguate between two different meanings of “depressed” in one and the same sentence. While it defines the “depressed” in “depressed patient” as a state of mind, it recognizes “depressed” as a part of “depressed fracture” and tags this noun phrase with the corresponding SNOMED code.
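The “depressed” example can be imitated with two of the criteria listed on the earlier Disambiguation slide: nearness in the sentence and horizontal relationships. The sketch below is a deliberate simplification, not the system's graph-based method; it ignores IS_A relationships, and the candidate names and the `RELATED` table are hypothetical.

```python
# Candidate concepts for an ambiguous word (hypothetical names).
CANDIDATES = {
    "depressed": ["depressed mood", "depressed fracture morphology"],
}

# Hypothetical horizontal relationships between a candidate and a neighbouring word.
RELATED = {
    ("depressed mood", "patient"),
    ("depressed fracture morphology", "fracture"),
}


def disambiguate(tokens, index):
    """Pick the candidate whose related word sits closest to tokens[index]."""
    best, best_distance = None, None
    for candidate in CANDIDATES.get(tokens[index], []):
        for j, other in enumerate(tokens):
            if j != index and (candidate, other) in RELATED:
                distance = abs(j - index)
                if best_distance is None or distance < best_distance:
                    best, best_distance = candidate, distance
    return best


tokens = "the depressed patient has a depressed fracture".split()
print(disambiguate(tokens, 1))
print(disambiguate(tokens, 5))
```

Both occurrences see the same two related words; only the distance differs, which is why the first resolves to the state of mind and the second to the fracture.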

Disambiguation

Fragment Labeling

• Sentences and phrases are labeled
  – History, exam, impression, etc.

• Independent of superficial formatting

• One label – one type of information

“HPI: The patient whose mother had breast cancer presents with loss of hearing”

Family History: “whose mother had breast cancer”

Chief Complaint: “presents with loss of hearing”

Fragment Labeling

FreePharma

• Medication Extraction
• Example

Semantic Indexing

TeSSI: Terminology Supported Semantic Indexing

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Disambiguator -> Disambiguate concepts
Relevance ranker -> Calculate relevance of concepts
Indexer -> Write information to index for quick access

Information Extraction

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Output handler -> Create XML/HTML/… output

Knowledge Discovery

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Ontology writer -> Add discovered knowledge to ontology

Automatic coding

Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against grammar
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Code Calculator -> Calculate codes: E&M, ICD-9, CPT
Output handler -> Create XML/HTML/… output

NLP-based applications and products


Quality

Projects: CPR Technologies, JCAHO, Eclipsys

• Extraction of CMS Core Measures
• National Patient Safety Network
• Data warehousing


Coding

Projects: Kaiser Permanente, Convergent Solutions

• E&M Coding
• SNOMED Coding
• ICD-9 Coding
• CPT in development


Medication Extraction

Projects: The Marshfield Clinic, Medquist, UAB

• Medication Reconciliation
• Personalized Medication Project
• Validation of therapies from literature


Interoperability

Projects: Integic/DoD, Revolution Health

• Semantic Integration of the military health systems

• Tie together free text content and portal applications


Web Search and Retrieval

Projects: Revolution Health, Merck

• Ontology-enhanced search
• Concept-based indexing


Radiology

Projects: FUJIFILM MEDICAL SYSTEMS

• Findings and pertinent negatives extracted from radiology reports

Radiology

• Observation Types
  – Findings
  – Pertinent Negatives
  – Quality Assurance
  – Unclassified
• Observation Components
  – Fundamentals
  – Modifiers
  – Qualifiers
• Observation Status
  – (Present) / Historical
  – Changed / Not Changed / (not stated)

Observation Types

• Findings
  – E.g. “bilateral infiltrates”
• Pertinent Negatives
  – E.g. “the lungs are clear”
• Quality Assurance
  – E.g. “poor inspiration”
• Unclassified
  – E.g. “the lungs are unchanged”

Observation Components

• Fundamentals
  – Pathologic entities
  – Physiologic entities
  – Devices
  – Procedure
• Modifiers
  – Location
  – Qualitative
  – Quantitative

• Qualifiers
  – Uncertainty (modal)
  – Negation

Observation Status

• Historical
• (non-Historical)
• Change Stated
• No Change Stated
• (Change not stated)
• Grouped
• Contains Uncertain (modal) Element
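The observation model from the last three slides (types, components, status) maps naturally onto a small record type. The sketch below is one possible encoding; the class and field names are assumptions, and the split of “bilateral infiltrates” into a fundamental plus a location modifier is an illustrative reading of the slide's example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ObservationType(Enum):
    FINDING = "Finding"
    PERTINENT_NEGATIVE = "Pertinent Negative"
    QUALITY_ASSURANCE = "Quality Assurance"
    UNCLASSIFIED = "Unclassified"


@dataclass
class Observation:
    text: str
    type: ObservationType
    fundamental: str = ""                                # pathologic/physiologic entity, device or procedure
    modifiers: List[str] = field(default_factory=list)   # location, qualitative, quantitative
    historical: bool = False                             # status: Historical vs. (non-Historical)
    uncertain: bool = False                              # contains an uncertain (modal) element
    negated: bool = False


def group_by_type(observations: List[Observation]) -> Dict[ObservationType, List[str]]:
    """Collect the observation texts under each observation type."""
    grouped: Dict[ObservationType, List[str]] = {}
    for obs in observations:
        grouped.setdefault(obs.type, []).append(obs.text)
    return grouped


report = [
    Observation("bilateral infiltrates", ObservationType.FINDING,
                fundamental="infiltrates", modifiers=["bilateral"]),
    Observation("the lungs are clear", ObservationType.PERTINENT_NEGATIVE),
    Observation("poor inspiration", ObservationType.QUALITY_ASSURANCE),
    Observation("the lungs are unchanged", ObservationType.UNCLASSIFIED),
]

for obs_type, texts in group_by_type(report).items():
    print(obs_type.value, texts)
```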

Example PN and F (Modal)

Example Hx and Grouped

Example CS and NCS

Example Quality Assurance

Modifier in long distance dependency

Finding of PE in historical context

Finding of devices

Findings

A knowledge that lungs should be clear

negation of abnormalities

statement of normality

Pertinent Negatives