Improving Healthcare Outcomes by Leveraging Unstructured Data for Secondary Use: How Organizations Can Anonymize Unstructured Data in Electronic Health Records

www.privacyanalytics.ca | 855.686.4781 | [email protected]

Improving Healthcare Outcomes By Leveraging Data for Secondary Use

DESCRIPTION

Healthcare organizations are responding to meaningful use and accountable care initiatives that focus on increasing the quality of care, improving patient safety and reducing costs. Analyzing patient-level data in electronic health records for secondary use is critical to driving these initiatives successfully. Privacy, compliance and data analytics professionals will learn how their organizations can:

- Comply with HIPAA-based requirements for anonymizing unstructured data for secondary use;
- Discover personal information in text-based formats and apply risk-based rules to its de-identification;
- De-identify and mask unstructured data in text and XML formats; and
- Determine risk metrics associated with anonymization and its quality for analysis.

To listen to this presentation, please visit https://vimeo.com/80921698/settings

Page 1: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Improving Healthcare Outcomes by Leveraging Unstructured Data for Secondary Use

How Organizations Can Anonymize Unstructured Data in Electronic Health Records

www.privacyanalytics.ca | [email protected]

Page 2: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Our Presenters

Dr. Khaled El Emam is the founder and CEO of Privacy Analytics Inc.

Chris Wright is Privacy Analytics’ VP Marketing, and will be your moderator today.

Page 3: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Agenda

• Privacy Analytics – Overview

• The Proliferation of Unstructured Data in Healthcare

• The Role of Unstructured Data in Improving Healthcare Outcomes

• Key Steps to Anonymize Unstructured Data

– How to Mitigate the Risk of Re-identification

– What Are Your Organization’s Compliance Considerations

– Anonymization Test Case

• DEMO

• Summary

• Question and Answer

Page 4: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Privacy Analytics - Overview

For organizations that want to safeguard and enable their personal information for secondary use …

• Purpose-built software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information in compliance with HIPAA requirements

• Integrated capabilities to anonymize structured and unstructured data from multiple sources

• Peer-reviewed methodologies and value-added services that certify data for secondary use

Page 5: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Secondary Use for Healthcare Data

Definition

Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing, and other business applications, including strictly commercial activities. 1

1. Definition sourced from white paper, “Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper”, J Am Med Inform Assoc 2007;14:1-9 doi:10.1197/jamia.M2273

Page 6: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Our Customers

State of Louisiana

Department of Preventative Medicine

Page 7: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

The Changing Healthcare Data Landscape

Richer levels of aggregated data, but increasingly granular with the view of capturing the totality of patient information, experiences and interactions

McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013

Page 8: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

The Proliferation of Unstructured Data

According to IBM, Ovum and other researchers, 80-90 percent of all medical data today is unstructured ... and that volume is doubling every five years. 1

• Electronic health records, where personal information resides in XML as free-form text and needs to be anonymized for analysis

• Medical devices, where unstructured data or free-form text from machine “dumps” (e.g. x-ray machines or CAT scans) is sent to one or more databases for analysis

• Online forums, where patients or providers discuss their conditions or cases, requiring anonymization to facilitate sentiment analysis and other forms of information analysis

1. http://ovum.com/2012/05/11/unlocking-the-potential-of-unstructured-medical-data/

Page 9: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Growth of Unstructured Data in EHRs

• 3X increase for EHR systems with basic clinician notes since 2008

• 11X increase in the adoption of comprehensive EHR systems since 2008

1. The Office of the National Coordinator for Health Information Technology, ONC Data Brief, March 2013

Page 10: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Creating the Conditions for an Analytic Pipeline

Creating an Analytic Pipeline for Unstructured Data for Secondary Use

EHR unstructured data is rich with insight, but requires another step to optimize its utility for secondary use and derive actionable insight.

Unstructured data’s utility for action is shaped by the relative degree of compliance, risk and anonymization applied.

[Diagram: Unstructured Data → Anonymized Data → Reporting → Advanced Analytics]

Sources of EHR unstructured data shown: discharge summaries, text fields, XML code, physician notes, comments, scanned docs

Page 11: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Balancing Privacy and Utility for Secondary Use

1. Data Quality: Ensuring de-identified data has analytic usefulness by determining the relative risk associated with its disclosure, sharing and re-sale

2. Analytic Granularity: Allowing users to configure de-identification for aggregated and micro-level analysis of patient-level data without compromising privacy or incurring costly breaches

3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types

Page 12: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Agenda

• Privacy Analytics – Overview

• The Proliferation of Unstructured Data in Healthcare

• The Role of Unstructured Data in Improving Healthcare Outcomes

• Key Steps to Anonymize Unstructured Data

– How to Mitigate the Risk of Re-identification

– What Are Your Organization’s Compliance Considerations

– Anonymization Test Case

• DEMO

• Summary

• Question and Answer

Page 13: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Types of Text

• Files (text files, XML files, other formats that can be converted to text such as Word and PDF)

• Text fields in a database (e.g., notes and comments fields)
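As a rough illustration of handling free text held inside XML files, the sketch below walks every text node of a document and passes it through a masking function. It is only a sketch under stated assumptions: the element names, the sample address and the `mask_xml_text` helper are hypothetical and are not taken from the presentation or from any product.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: a deliberately simple email matcher for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_xml_text(xml_string, mask):
    """Apply `mask` to every text node (element text and tail) in the XML.

    Attributes are left untouched in this sketch; a real tool would need
    to inspect them as well.
    """
    root = ET.fromstring(xml_string)
    for element in root.iter():
        if element.text:
            element.text = mask(element.text)
        if element.tail:
            element.tail = mask(element.tail)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; the address is a placeholder, not real data.
RECORD = "<record><note>Contact: patient@example.com</note></record>"

masked = mask_xml_text(RECORD, lambda text: EMAIL.sub("*****", text))
print(masked)
```

The same `mask` callable could equally be applied to a notes or comments column read from a database, since both cases reduce to plain strings.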

Page 14: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Example Text Information

Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.

Patient address: [email protected]

Page 15: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Detecting Personal Information

Direct Identifiers

• First name

• Middle name

• Last name

• Street

• PO Box

• Email address

• IP address

• Phone number

• ID (e.g., SSN and CC)

Indirect Identifiers

• City

• State

• Country

• ZIP Code

• Postal Code

• Organization (facility) name

• Age

• Date

Page 16: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Detecting Personal Information

Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.

Patient address: [email protected]
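One way to picture the detection step on the example above is a small rule-based scanner. The sketch below is an assumption-laden illustration, not the detection method used by the presenters: the patterns, the `Hit` type and the `detect` function are all hypothetical, and a practical detector would also rely on dictionaries of names and many more formats.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    kind: str   # identifier type, e.g. "Date"
    start: int  # character offset where the value starts
    end: int    # character offset just past the value
    text: str   # the matched value


MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical patterns; each exposes the value in a group named "v".
PATTERNS = [
    ("Lastname", re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+(?P<v>[A-Z][a-z]+)")),
    ("Date", re.compile(r"(?P<v>\b\d{2}/\d{2}/\d{4}\b)")),
    ("Date", re.compile(r"(?P<v>\b(?:" + MONTHS + r") of \d{4}\b)")),
    ("Age", re.compile(r"(?P<v>\b\d{1,3}-year-old\b)")),
    ("Email", re.compile(r"(?P<v>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)")),
]


def detect(text: str) -> List[Hit]:
    """Return all pattern matches, ordered by position in the text."""
    hits = [
        Hit(kind, m.start("v"), m.end("v"), m.group("v"))
        for kind, pattern in PATTERNS
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda h: h.start)


SAMPLE = (
    "Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a "
    "history of diverticulitis who was found to have colon cancer on "
    "colonoscopy, which was performed in July of 2002."
)

for hit in detect(SAMPLE):
    print(hit.kind, repr(hit.text), hit.start, hit.end)
```

Note that the clinical measurement "80 cm" in the full example is not an identifier and is deliberately not covered by any pattern.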

Page 17: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Anonymizing the Information

• Redact: “*****”

• Redact & Tag: <Firstname index=1/>

• Randomize and Replace: replaces the value with a randomly generated value

• Special generalization rules for dates and ZIP Codes
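The four options listed above can be sketched as small functions that each take one detected value. This is an illustrative sketch only: the function names, the surname pool and the sample ZIP Code are hypothetical, and the two generalization rules (keep only the year of a date, keep only the first three digits of a ZIP Code) are assumptions chosen for illustration, since the slide does not spell out its rules.

```python
import random
import re
import string


def redact(value: str) -> str:
    """Redact: replace the value with a fixed mask."""
    return "*****"


def redact_and_tag(kind: str, index: int) -> str:
    """Redact & Tag: replace the value with a typed, indexed tag."""
    return f"<{kind} index={index}/>"


# Hypothetical replacement pool; a real tool would use a large dictionary.
SURNAMES = ["Alder", "Brook", "Carver", "Dunne"]


def randomize(kind: str, value: str, rng: random.Random) -> str:
    """Randomize and Replace: swap in a random value of the same kind."""
    if kind == "Lastname":
        return rng.choice(SURNAMES)
    # Fallback: keep length and punctuation, randomize letters and digits.
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def generalize_date(value: str) -> str:
    """Assumed rule: keep only the year of an MM/DD/YYYY date."""
    match = re.fullmatch(r"\d{2}/\d{2}/(\d{4})", value)
    return match.group(1) if match else redact(value)


def generalize_zip(value: str) -> str:
    """Assumed rule: keep the first three digits of a 5-digit ZIP Code."""
    return value[:3] + "**" if re.fullmatch(r"\d{5}", value) else redact(value)


rng = random.Random(0)  # seeded so runs are repeatable
print(redact("Semenza"))
print(redact_and_tag("Firstname", 1))
print(randomize("Lastname", "Semenza", rng))
print(generalize_date("07/19/2002"))
print(generalize_zip("90210"))  # hypothetical ZIP Code
```

Falling back to `redact` when a value does not match the expected format is a conservative design choice: an unparseable date or ZIP Code is masked entirely rather than passed through.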

Page 18: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Redact and Tag

Ms. <Lastname index=1/>, admitted <Date index=1/>, is an <Age index=1/> woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in <Date index=2/>. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.

Patient address: <Email index=1/>
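The tagged output above can be reproduced with a short routine that replaces already-detected spans with indexed tags. The sketch assumes the spans are supplied by some earlier detection step and that indexes are assigned per identifier type in order of first appearance, with a repeated value reusing its index; the slide shows the output format but does not state the numbering rule, so that rule and the helper names `span_of` and `tag_text` are assumptions.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[str, int, int]  # (kind, start offset, end offset)


def span_of(text: str, kind: str, value: str) -> Span:
    """Locate the first occurrence of `value` and describe it as a span."""
    start = text.index(value)
    return (kind, start, start + len(value))


def tag_text(text: str, spans: Iterable[Span]) -> str:
    """Replace each non-overlapping span with <Kind index=N/>."""
    index_of: Dict[Tuple[str, str], int] = {}  # (kind, value) -> index
    last_index: Dict[str, int] = {}            # kind -> highest index issued
    pieces = []
    cursor = 0
    for kind, start, end in sorted(spans, key=lambda s: s[1]):
        key = (kind, text[start:end].lower())
        if key not in index_of:
            last_index[kind] = last_index.get(kind, 0) + 1
            index_of[key] = last_index[kind]
        pieces.append(text[cursor:start])
        pieces.append(f"<{kind} index={index_of[key]}/>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


SAMPLE = (
    "Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a "
    "history of diverticulitis who was found to have colon cancer on "
    "colonoscopy, which was performed in July of 2002."
)

spans = [
    span_of(SAMPLE, "Lastname", "Semenza"),
    span_of(SAMPLE, "Date", "07/19/2002"),
    span_of(SAMPLE, "Age", "84-year-old"),
    span_of(SAMPLE, "Date", "July of 2002"),
]

tagged = tag_text(SAMPLE, spans)
print(tagged)
```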

Page 19: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

De-identification Standards

Page 20: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Performance

Standards

             Direct Identifiers    Indirect Identifiers
Recall       > 95%                 Risk-Based Threshold
Precision    > 80%                 > 70%

Example on i2b2 Data Set

                           Direct Identifiers    Indirect Identifiers
Recall (all-or-nothing)    0.95 – 1.0            0.78 – 1.0
Precision                  0.93 – 1.0            0.8 – 1.0
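For readers less familiar with the metrics in these tables, the sketch below shows how precision and recall could be computed from detected spans versus annotated ("gold") spans. The span values are made up for illustration. The slide does not define "all-or-nothing" recall; the sketch assumes it means that a document counts as a success only when every annotated identifier in it is detected, which should be read as an assumption rather than the presenters' definition.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start offset, end offset) of one identifier


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Precision = TP / predicted, recall = TP / gold, on exact span matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


def all_or_nothing_recall(docs: List[Tuple[Set[Span], Set[Span]]]) -> float:
    """Share of documents in which every gold identifier was detected.

    `docs` holds (predicted, gold) pairs; documents with no gold
    identifiers are skipped. This definition is an assumption.
    """
    relevant = [(p, g) for p, g in docs if g]
    if not relevant:
        return 1.0
    fully_caught = sum(1 for p, g in relevant if g <= p)
    return fully_caught / len(relevant)


# Made-up spans for two documents.
gold_a = {(4, 11), (22, 32), (40, 51)}
pred_a = {(4, 11), (22, 32), (60, 65)}  # one miss, one false alarm
gold_b = {(0, 5)}
pred_b = {(0, 5)}

precision, recall = precision_recall(pred_a, gold_a)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("all-or-nothing recall:",
      all_or_nothing_recall([(pred_a, gold_a), (pred_b, gold_b)]))
```

Under this reading, all-or-nothing recall is stricter than per-identifier recall, because a single missed identifier fails the whole document.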

Page 21: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Referential Integrity

Page 22: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

PARAT Software

Providing organizations with a robust, scalable set of capabilities to anonymize structured and unstructured data

• Use a standard and configurable dictionary (gazetteer) to enable faster and more accurate discovery

• Automate integration with different data sources and applications

• Match masked personal information to corresponding anonymized unstructured text data

• Tag-based indexing to ensure personal information (e.g. the name Chris) is replaced consistently throughout the database

• Modular architecture for optimal extensibility

Stronger Safeguards. Richer Analysis. Integrated.
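The idea of replacing a value such as the name Chris consistently across structured fields and free text can be sketched with a lookup table that remembers the replacement chosen the first time a value is seen. This is a generic illustration, not a description of how PARAT works: the `ConsistentReplacer` class, the name pool and the sample record are hypothetical.

```python
import random
from typing import Dict, List, Tuple


class ConsistentReplacer:
    """Map each (kind, value) pair to one stable replacement."""

    def __init__(self, pools: Dict[str, List[str]], seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._pools = {kind: list(values) for kind, values in pools.items()}
        self._mapping: Dict[Tuple[str, str], str] = {}

    def replace(self, kind: str, value: str) -> str:
        # Normalize so "Chris" and "chris " share one replacement.
        key = (kind, value.strip().lower())
        if key not in self._mapping:
            used = {v for (k, _), v in self._mapping.items() if k == kind}
            free = [c for c in self._pools[kind] if c not in used]
            if not free:
                raise ValueError(f"replacement pool for {kind!r} is exhausted")
            self._mapping[key] = self._rng.choice(free)
        return self._mapping[key]


# Hypothetical pool of replacement first names.
replacer = ConsistentReplacer({"Firstname": ["Alex", "Sam", "Jordan"]})

# The same person appears in a structured column and in a free-text note.
structured_value = replacer.replace("Firstname", "Chris")
note = "Spoke with Chris about follow-up."
masked_note = note.replace("Chris", replacer.replace("Firstname", "Chris"))

print(structured_value)
print(masked_note)
```

Drawing each new replacement from the unused part of the pool keeps two different people from collapsing onto the same substitute name, which helps preserve referential integrity between tables and notes.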

Page 23: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

PARAT 5.3

Page 24: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Summary

• EHRs represent a rich and growing source of unstructured data for secondary use

• Anonymization needs to be understood as a critical step in creating a reporting and analytic pipeline that optimizes data utility and is compliant with legal requirements

• Defensible, compliant anonymization of free-form text is possible

• Anonymization can be completed across unstructured and structured data to attain consistency

• This can scale to large volumes of data and flat files

Page 25: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Learn More …

• Let us know if you’d like to learn more. We have experts available for either a demo or a 30-minute workshop to better understand your anonymization needs for structured and unstructured data. You can reach us at [email protected]

• We also have several events upcoming:

– December 5: Privacy by Design User Forum @ Fairmont Royal York, Toronto, Ontario

– December 6: Twin Cities Health Privacy Summit @ Mayo Clinic, Minneapolis, Minnesota

– December 11-12: Health Data Summit, NAHDO 28th Annual Conference, Denver, Colorado

Page 26: Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Question and Answer
