Assisted Chart Abstraction: a technique to help while we wait for Nirvana

Using NLP and curation to make clinical data available for research

The Driver

Many entities within Northwestern Medicine (NM) want to capture data about cancer patients treated at NM.

•  Research

•  Education

•  Operational/Outcomes Analysis

•  QA/QC and Process Improvement

•  Marketing/Branding/Outreach Assessment

Challenges

NM has multiple EHR systems: Epic (NMFF), PowerChart (NMH) and Mosaiq (Radiation Oncology).

Not all clinical systems flow into one of the EHRs

Not all relevant data is discretely captured during the course of clinical care. For example, a pathology diagnosis is recorded in a textual document.

Northwestern Medicine Enterprise Data Warehouse (NMEDW)

The NMEDW is:

A one-stop shop for finding data from 40+ clinical systems, 10 years of data, and 2.2 million patients (4 billion events!)

Optimized cross-system data marts representing major biomedical entities and events: patients, providers, encounters, labs, medications...and more.

Intelligent structures, data representations, and the ability to identify and correlate data across patients, events and data types

Who is requesting change?

The Northwestern Brain Tumor Institute*

SPORE in Prostate Cancer

Lynn Sage Breast Cancer Center

Gastrointestinal Oncology Group

Many others - typically disease-focused

* We will focus mainly on use cases and workflows from BTI

What data do they need?

Demographics

Diagnosis

Treatment

Disease Progression

Survival

Old Solution

Data coordinator opens up EHR(s) and manually copies data into a clinical database.

Newer solution: Data coordinator pulls data from reports run against the NMEDW and copies/extracts/annotates them into the clinical database

Command + Tab Model

A manually curated database disconnected from EHR data.

Depends on a data coordinator finding and manually copying data from the EHR to a clinical database.

EHR

command + tab

Clinical Database

Command + Tab Model: Pros

Depends on humans:

Humans are great at interpreting narrative documentation -- where a significant portion of cancer clinical data (unfortunately) resides.

Command + Tab Model: Cons

Depends on humans:

Difficult for a human to be aware of every relevant medical event of every patient within a cohort.

Ignores the flux that occurs within EHRs: patient medical histories merging and splitting.

Humans get bored with rote copying of discrete data.

Humans quit and get new jobs.

New Solution: NBTI Data Capture Tool

The NBTI Data Capture Tool automatically pulls (via the NMEDW) relevant EHR data for each patient.

Data points discretely captured in the EHR need no further review.

Data points captured non-discretely in textual documents are abstracted via natural language processing (NLP) and presented to a data coordinator for review/revision.

Why not use reports?

Lots of valuable clinical data still resides in narrative documents.

Not all discrete data contained within the EHR(s) has been normalized into easily queryable structures in the NMEDW.

Today an investigator cannot put a question like this to an NMEDW analyst and get a quick result:

How many IDH1 negative glioma patients survived longer than 5 years?

Waiting for Nirvana

NMEDW reports will not make research clinical databases obsolete until:

•  Clinical IT optimizes the EHRs to discretely capture all relevant data points (ain’t happenin’)

•  The NMEDW normalizes all EHR data into easily digestible formats and to reference terminologies (limited by above step!)

Sources for the first iteration

Epic: support discrete data capture of fundamental treatment/diagnosis data points.

Epic/MyChart: embed intake form.

Cerner: support discrete data capture of pathology data points.

Cerner: support explicit association between pathology cases and surgeries.

MOSAIQ: support discrete data capture of site and laterality for radiation therapies.

Building the Foundation

Analyze the Data

Started with a list of data elements and sample data from a neuro-oncologist and a neurosurgeon

Determined obtainability of each data element:

•  Discrete in the EHR and in the EDW.

•  Discrete in the EHR but not in the EDW.

•  Narrative document in the EHR and in the EDW.

•  Narrative document in the EHR but not in the EDW.

Build an EDW Data Mart

Engaged the NMEDW team to build an NBTI-dedicated data mart and extract-transform-load (ETL) script covering:

•  patients

•  encounters

•  medications

•  surgeries

•  surgery notes

•  pathology cases

•  gamma knifes

•  radiation therapies

•  imaging exams

•  progress notes

•  labs

Build a Clinical Database

Build a clinical database on a PostgreSQL server, mirroring the structure of the EDW data mart.

Add database structures to allow for the layering of curated data on top of data imported from the EDW.
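
One way to picture the layering is an overlay table keyed to the imported row, so that a re-import never overwrites a coordinator's revision. The sketch below is only an illustration, not the NBTI schema: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table, view and column names (`imported_surgeries`, `curated_surgeries`, `surgeries`, `site`, `laterality`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL server
conn.executescript("""
    -- Rows pulled from the EDW data mart; refreshed on every import.
    CREATE TABLE imported_surgeries (
        surgery_id INTEGER PRIMARY KEY,
        site       TEXT,
        laterality TEXT
    );
    -- Coordinator revisions, kept apart so imports never clobber them.
    CREATE TABLE curated_surgeries (
        surgery_id INTEGER PRIMARY KEY REFERENCES imported_surgeries(surgery_id),
        site       TEXT,
        laterality TEXT
    );
    -- Effective value: the curated one if present, otherwise the imported one.
    CREATE VIEW surgeries AS
    SELECT i.surgery_id,
           COALESCE(c.site, i.site)             AS site,
           COALESCE(c.laterality, i.laterality) AS laterality
    FROM imported_surgeries i
    LEFT JOIN curated_surgeries c ON c.surgery_id = i.surgery_id;
""")

# Hypothetical sample rows: surgery 1 arrives without laterality.
conn.execute("INSERT INTO imported_surgeries VALUES (1, 'frontal lobe', NULL)")
conn.execute("INSERT INTO imported_surgeries VALUES (2, 'temporal lobe', 'left')")
# The coordinator supplies the missing laterality for surgery 1 only.
conn.execute("INSERT INTO curated_surgeries VALUES (1, NULL, 'right')")

effective = conn.execute(
    "SELECT surgery_id, site, laterality FROM surgeries ORDER BY surgery_id"
).fetchall()
print(effective)
```

Reading through the view, surgery 1 keeps its imported site and picks up the curated laterality, while surgery 2 is returned exactly as imported.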

Import Data

Expose the data in the EDW data mart as web services via SQL Reporting Service reports.

Automate via cron jobs the pull of data into the clinical database from the EDW with shared EDW web service adapter code.
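
A scheduled pull works best when it is idempotent, so that running it twice against unchanged EDW data changes nothing. The following is a minimal sketch of that idea under stated assumptions: `pull_into_store` and `fake_edw_labs` are hypothetical names, the in-memory dict stands in for the clinical database, and the fake fetch function stands in for a call to an EDW web service report.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def pull_into_store(fetch: Callable[[], Iterable[Record]],
                    store: Dict[object, Record],
                    key: str = "id") -> List[object]:
    """Upsert fetched records into store; return the keys added or changed."""
    touched = []
    for record in fetch():
        record_key = record[key]
        if store.get(record_key) != record:
            store[record_key] = dict(record)  # copy so callers cannot mutate it
            touched.append(record_key)
    return touched


def fake_edw_labs() -> List[Record]:
    # Hypothetical payload; a real adapter would call the EDW web service here.
    return [{"id": 1, "test": "WBC", "value": 5.2},
            {"id": 2, "test": "HGB", "value": 13.1}]


store: Dict[object, Record] = {}
first_run = pull_into_store(fake_edw_labs, store)   # both records are new
second_run = pull_into_store(fake_edw_labs, store)  # nothing changed upstream
print(first_run, second_run)
```

Sharing one function like this across entity types is one reading of "shared EDW web service adapter code"; a cron entry would then invoke it once per entity on a schedule.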

Patients

The criteria for inclusion within the NBTI system are determined by a list of ICD diagnosis codes. Criteria could alternatively be determined by consent to a protocol.

Pull from the NMEDW patient name, birth date, MRN(s), gender, ethnicity, race, death date and last seen date (across Northwestern Medicine - NM).
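
The code-list inclusion rule can be sketched as a simple category match. This is an illustration only: the source does not give the actual NBTI code list, so the two categories below (ICD-9 191 and ICD-10 C71, both malignant neoplasm of brain) are assumed examples, and `is_included` is a hypothetical helper.

```python
from typing import FrozenSet, Iterable

# Assumed example categories, not the real NBTI inclusion list.
INCLUSION_CATEGORIES: FrozenSet[str] = frozenset({"191", "C71"})


def category(code: str) -> str:
    """Return the part of an ICD code before the decimal point, upper-cased."""
    return code.strip().upper().split(".")[0]


def is_included(patient_codes: Iterable[str],
                categories: FrozenSet[str] = INCLUSION_CATEGORIES) -> bool:
    """True if any of the patient's diagnosis codes falls in a listed category."""
    return any(category(code) in categories for code in patient_codes)


print(is_included(["401.9", "c71.1"]))  # has a brain tumor code
print(is_included(["401.9", "I10"]))    # hypertension codes only
```

Matching on the category rather than a raw string prefix avoids accidental hits on unrelated codes that merely share leading characters.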

Integrate with Specimen Inventory Data

Prepare data for migration into PathCore's specimen inventory system BSI2.

Allow for ad hoc query exploration of specimen inventory based on clinical data points.

Standardizing the structure of clinical data captures across sites makes this possible.

NLP

Build an NLP pipeline to abstract discrete data points from the flow of narrative documents and textual fragments.

Use the Stanford NLP library for chunking and sentence splitting.

Use the lingscope library to parse the negation scope of sentences.

Use the NCI Metathesaurus for synonym lookups and attaching codes.
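
The three stages above (sentence splitting, negation scope, synonym lookup) can be sketched end to end in a few lines. Note that this toy version does not call the Stanford NLP library, lingscope or the NCI Metathesaurus: a regular expression stands in for the sentence splitter, a short cue list stands in for negation-scope parsing, and a hand-written dictionary with made-up concept labels stands in for the terminology lookup.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym table standing in for a Metathesaurus lookup.
SYNONYMS: Dict[str, str] = {
    "glioblastoma": "GLIOBLASTOMA",
    "gbm": "GLIOBLASTOMA",
    "oligodendroglioma": "OLIGODENDROGLIOMA",
}

# Naive negation cues; a real scope parser is far more careful.
NEGATION = re.compile(r"\b(no evidence of|negative for|without|no)\b")


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def abstract(text: str) -> List[Tuple[str, bool]]:
    """Return (concept, negated) pairs, one per term found in each sentence."""
    findings = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for term, concept in SYNONYMS.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
            if not match:
                continue
            # Treat a term as negated if a cue appears earlier in the sentence.
            negated = bool(NEGATION.search(lowered[:match.start()]))
            findings.append((concept, negated))
    return findings


report = "Findings consistent with GBM. No evidence of oligodendroglioma."
results = abstract(report)
print(results)
```

Each finding keeps its negation flag rather than being dropped, since a coordinator reviewing the abstraction may need to see that a diagnosis was explicitly ruled out.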

Electronic Intake Form

Deploy an electronic intake form that can be filled out by a patient before or at their first clinic visit.

An email is sent to the patient; the form can be filled out in a web browser, on a tablet or (painfully) on a smartphone.

Navbar Sections

The Results

Biopsy, Surgery and Pathology Diagnosis

Pull from the NMEDW NM biopsies, surgeries, surgical procedure reports and pathology cases (inside and out).

Abstract and allow for the confirmation/revision of surgery type, site, laterality, pathology diagnosis, grade, recurrence, anatomical location of primary, cancer staging and pathology test results.

Present the NLP Abstractions

Present NLP-derived abstractions as queues of work that the data coordinator needs to confirm or revise.
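
A work queue of this kind can be modeled as a list of suggestions that each carry a review status. The sketch below is a guess at the shape of such a structure, not the tool's actual data model; the `Abstraction` class, its field names and the three status values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Abstraction:
    document_id: int
    field: str
    suggested: str               # value proposed by the NLP pipeline
    status: str = "pending"      # pending -> confirmed | revised
    final: Optional[str] = None  # value accepted into the clinical database

    def confirm(self) -> None:
        """Coordinator accepts the NLP suggestion as-is."""
        self.status, self.final = "confirmed", self.suggested

    def revise(self, value: str) -> None:
        """Coordinator overrides the NLP suggestion."""
        self.status, self.final = "revised", value


def work_queue(items: List[Abstraction]) -> List[Abstraction]:
    """Return only the abstractions still awaiting review."""
    return [a for a in items if a.status == "pending"]


items = [
    Abstraction(101, "laterality", "left"),
    Abstraction(101, "grade", "III"),
    Abstraction(102, "site", "frontal lobe"),
]
items[0].confirm()
items[1].revise("IV")
remaining = work_queue(items)
print([(a.document_id, a.field) for a in remaining])
```

Keeping `suggested` alongside `final` preserves what the NLP proposed, which makes it possible to measure later how often coordinators had to revise it.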

Surgeries and Pathology Cases

Surgery Detail

Pathology Case Detail

With Context

Gamma Knife & Radiation Therapy

Pull from the NMEDW NM radiation therapies.

Abstract and allow for the confirmation/revision of site and laterality.

Intravenous Chemotherapy

Pull from the NMEDW NM intravenous chemotherapy treatments (from Intellidose and Epic Beacon).

Labs and Other Medications

Labs

•  Pull from the NMEDW NM labs.

Other Medications

•  Pull from the NMEDW NM non-intravenous, prescribed/ordered medications.

•  Allow for confirmation/revision of drug, route, duration, amount, patient parameter and administered.

Imaging Exams and Clinic Visits

Imaging Exams

•  Pull from the NMEDW NM imaging exams.

•  Abstract and allow for confirmation/revision of response/progression declarations and lesion measurements.

Clinic Visits

•  Pull from the NMEDW NM clinic visit notes. Abstract and allow for confirmation/revision of performance status declarations and tracking of outside treatments.

Reporting

Ad-hoc query exploration of data.

Integrate NMH quality metrics.

Generate Kaplan-Meier survival curves against SEER data on demand.
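
For readers unfamiliar with the estimator, the Kaplan-Meier survival probability at time t is the product, over each event time up to t, of (1 - deaths / number at risk). The sketch below is a plain-Python illustration of that formula, not the tool's implementation; the sample durations are invented, and the comparison against SEER reference data is not shown.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float],
                 observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    observed[i] is True if the death was seen at durations[i], and False
    if the patient was censored (for example, alive at last follow-up).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve


# Invented follow-up times in months; False marks a censored patient.
months = [6, 12, 12, 18, 24]
died = [True, True, False, True, False]
curve = kaplan_meier(months, died)
for time, probability in curve:
    print(time, round(probability, 3))
```

With this sample the curve steps down at months 6, 12 and 18 to roughly 0.8, 0.6 and 0.3; the censored patients lower the number at risk without producing a step of their own.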

Export

Export into Word, Excel and CSV for analysis and visualization in SAS, SPSS, R, etc.