Healthcare organizations are responding to meaningful use and accountable care initiatives that focus on increasing the quality of care, improving patient safety and reducing costs. Analyzing patient-level data in electronic health records for secondary use is critical to driving these initiatives successfully. Privacy, compliance and data analytics professionals will learn how their organizations can:
• Comply with HIPAA-based requirements for anonymizing unstructured data for secondary use
• Discover personal information in text-based formats and apply risk-based rules to its de-identification
• De-identify and mask unstructured data in text and XML formats
• Determine risk metrics associated with anonymization and its quality for analysis
To listen to this presentation, please visit https://vimeo.com/80921698/settings
Improving Healthcare Outcomes by Leveraging
Unstructured Data for Secondary Use
How Organizations Can Anonymize Unstructured Data
in Electronic Health Records
www.privacyanalytics.ca | [email protected]
Our Presenters
Dr. Khaled El Emam is the founder and
CEO of Privacy Analytics Inc.
Chris Wright is Privacy Analytics’ VP
Marketing, and will be your moderator
today.
Agenda
• Privacy Analytics – Overview
• The Proliferation of Unstructured Data in Healthcare
• The Role of Unstructured Data in Improving Healthcare Outcomes
• Key Steps to Anonymize Unstructured Data
– How to Mitigate the Risk of Re-identification
– What Are Your Organization’s Compliance Considerations
– Anonymization Test Case
• DEMO
• Summary
• Question and Answer
Privacy Analytics - Overview
For organizations that want to safeguard and enable their personal information for secondary use …
• Purpose-built software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information in compliance with HIPAA requirements
• Integrated capabilities to anonymize structured and unstructured data from multiple sources
• Peer-reviewed methodologies and value-added services that certify data for secondary use
Secondary Use for Healthcare Data
Definition
Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing, and other business applications, including strictly commercial activities. 1
1. Definition sourced from white paper, “Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper”, J Am Med Inform Assoc 2007;14:1-9 doi:10.1197/jamia.M2273
Our Customers
State of
Louisiana
Department of Preventative Medicine
The Changing Healthcare Data Landscape
Richer levels of aggregated data, but increasingly granular, with a view to capturing the totality of patient information, experiences and interactions
McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013
The Proliferation of Unstructured Data
According to IBM, Ovum and other researchers, 80-90 percent of all medical data today is unstructured ... and that volume is doubling every five years. 1
• Electronic health records, where personal information resides in XML as free-form text and needs to be anonymized for analysis
• Medical devices, where unstructured data or free-form text from machine “dumps” (e.g., x-ray machines or CAT scans) is sent to one or more databases for analysis
• Online forums, where patients or providers discuss their conditions or cases, requiring anonymization to facilitate sentiment analysis and other forms of information analysis
1. http://ovum.com/2012/05/11/unlocking-the-potential-of-unstructured-medical-data/
Growth of Unstructured Data in EHRs
• 3X increase for EHR systems with basic clinician notes since 2008
• 11X increase in the adoption of comprehensive EHR systems since 2008
1. The Office of the National Coordinator for Health Information Technology, ONC Data Brief, March 2013
Creating the Conditions for an Analytic Pipeline
Creating an Analytic Pipeline for Unstructured Data for Secondary Use
EHR unstructured data is rich with insight, but requires another step to optimize its utility for secondary use and derive actionable insight
Unstructured data’s utility for action is shaped by the relative degree of compliance, risk and anonymization applied.
EHR unstructured data sources: discharge summaries, text fields, XML code, physician notes, comments, scanned docs
Pipeline: Unstructured Data → Anonymized Data → Reporting → Advanced Analytics
Balancing Privacy and Utility for Secondary Use
1. Data Quality: Ensuring de-identified data has analytic usefulness by determining the relative risk associated with its disclosure, sharing and re-sale
2. Analytic Granularity: Allowing users to configure de-identification for aggregated and micro-level analysis of patient-level data without compromising privacy or risking costly breaches
3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types
Types of Text
• Files (text files, XML files, other formats that can be converted to text such as Word and PDF)
• Text fields in a database (e.g., notes and comments fields)
Example Text Information
Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.
Patient address: [email protected]
Detecting Personal Information
Direct Identifiers
• First name
• Middle name
• Last name
• Street
• PO Box
• Email address
• IP address
• Phone number
• ID (e.g., SSN and CC)
Indirect Identifiers
• City
• State
• Country
• ZIP Code
• Postal Code
• Organization (facility) name
• Age
• Date
Detecting Personal Information
Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.
Patient address: [email protected]
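As a rough illustration of the detection step, the Python sketch below finds a few indirect identifiers (numeric dates, month-and-year dates, and ages) in the example text using regular expressions. The patterns and the function name `find_identifiers` are assumptions made for this illustration; they are not the method used by the presenters' software, and a real detector would need far broader coverage (names, addresses, dictionaries, context rules).

```python
import re

# Hypothetical patterns for illustration only; real detectors need many more rules.
PATTERNS = {
    "Date": re.compile(
        r"\b\d{2}/\d{2}/\d{4}\b"
        r"|\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)(?: of)? \d{4}\b"
    ),
    "Age": re.compile(r"\b\d{1,3}-year-old\b"),
}


def find_identifiers(text):
    """Return (start, end, label, value) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), label, m.group()))
    return sorted(hits)


note = ("Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a "
        "history of diverticulitis who was found to have colon cancer on "
        "colonoscopy, which was performed in July of 2002.")

for start, end, label, value in find_identifiers(note):
    print(label, repr(value))
```

The measurement "80 cm" in the full note is deliberately not matched by the age pattern, which shows why pattern design affects precision as well as recall.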
Anonymizing the Information
• Redact: “*****”
• Redact & Tag: <Firstname index=1/>
• Randomize and Replace: replaces the value with a randomly generated value
• Special generalization rules for dates and ZIP Codes
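The options listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the slide only says that special generalization rules exist for dates and ZIP Codes, so the specific rules here (keep only the year; keep the first three ZIP digits) are illustrative guesses, and all function names and the sample ZIP Code are hypothetical.

```python
import random
import re


def redact(value):
    """Redact: replace the value with a fixed mask."""
    return "*****"


def randomize_date(value, rng):
    """Randomize and Replace: swap an MM/DD/YYYY date for a random one.

    Assumption: any valid-looking date in the same year is acceptable.
    """
    year = value[-4:]
    return f"{rng.randint(1, 12):02d}/{rng.randint(1, 28):02d}/{year}"


def generalize_date(value):
    """Generalization (assumed rule): keep only the year of MM/DD/YYYY."""
    return value[-4:]


def generalize_zip(value):
    """Generalization (assumed rule): keep the first three ZIP digits."""
    return value[:3] + "**"


DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

text = "admitted 07/19/2002"
rng = random.Random(0)  # seeded so the sketch is repeatable

print(DATE.sub(lambda m: redact(m.group()), text))
print(DATE.sub(lambda m: randomize_date(m.group(), rng), text))
print(DATE.sub(lambda m: generalize_date(m.group()), text))
print(generalize_zip("90210"))  # made-up ZIP Code for illustration
```

Redaction removes the most information, while generalization keeps some analytic value (for example, the year of admission) at the cost of a higher residual risk.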
Redact and Tag
Ms. <Lastname index=1/>, admitted <Date index=1/>, is an <Age index=1/> woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in <Date index=2/>. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm.
Patient address: <Email index=1/>
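A minimal Python sketch of how tagged output like the example above could be produced is shown here. It assumes the identifier spans have already been detected, and it assumes an indexing rule that the slides do not spell out: a repeated value of the same type reuses its index, and each new value gets the next index. The function name `redact_and_tag` is hypothetical.

```python
def redact_and_tag(text, spans):
    """Replace each (start, end, label) span with <Label index=N/>.

    Assumption: a repeated value of the same label reuses its index,
    and each new value gets the next index for that label.
    """
    indices = {}   # (label, value) -> index
    counters = {}  # label -> last index handed out
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        value = text[start:end]
        key = (label, value)
        if key not in indices:
            counters[label] = counters.get(label, 0) + 1
            indices[key] = counters[label]
        out.append(text[cursor:start])
        out.append(f"<{label} index={indices[key]}/>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


note = "Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman."

# Spans are located with str.find here only to keep the sketch short.
spans = []
for label, value in [("Lastname", "Semenza"), ("Date", "07/19/2002"),
                     ("Age", "84-year-old")]:
    start = note.find(value)
    spans.append((start, start + len(value), label))

print(redact_and_tag(note, spans))
```

Keeping the index tied to the underlying value means an analyst can still tell that two mentions refer to the same date or person without seeing the original text.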
Performance
Standards
            Direct Identifiers    Indirect Identifiers
Recall      > 95%                 Risk-Based Threshold
Precision   > 80%                 > 70%
Example on i2b2 Data Set
                          Direct Identifiers    Indirect Identifiers
Recall (all-or-nothing)   0.95 – 1.0            0.78 – 1.0
Precision                 0.93 – 1.0            0.8 – 1.0
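For readers less familiar with the recall and precision figures above, the Python sketch below shows how the two metrics are commonly computed from detected spans and human-annotated ("gold") spans. Exact-span matching is an assumption of this sketch; the slides do not define how matches were counted or what "all-or-nothing" means precisely, and the spans used here are made up.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (start, end, label) spans.

    Assumption: a prediction counts as correct only on an exact match.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans purely for illustration.
gold = {(4, 11, "Lastname"), (22, 32, "Date"), (40, 51, "Age")}
predicted = {(4, 11, "Lastname"), (22, 32, "Date"),
             (60, 65, "Date"), (70, 72, "Age")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Recall measures how many true identifiers were caught (a privacy concern when low), while precision measures how much non-identifying text was removed unnecessarily (a data-utility concern when low).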
PARAT Software
Providing organizations with a robust, scalable set of capabilities to anonymize structured and unstructured data
• Use standard and configurable dictionary (Gazetteer) to enable faster and more accurate discovery
• Automate integration with different data sources and applications
• Match masked personal information to corresponding anonymized unstructured text data
• Tag-based indexing to ensure personal information (e.g., the name Chris) is replaced consistently throughout the database
• Modular architecture for optimal extensibility
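To make the dictionary-based discovery idea concrete, here is a small Python sketch of a gazetteer lookup. It is only an illustration under assumptions: the dictionary contents, the `GAZETTEER` structure and the function `gazetteer_matches` are hypothetical and do not describe PARAT's actual dictionary format or API.

```python
import re

# Hypothetical gazetteer: label -> known terms. Real dictionaries are far larger.
GAZETTEER = {
    "Firstname": {"chris"},
    "Lastname": {"semenza"},
}


def gazetteer_matches(text, gazetteer=GAZETTEER):
    """Return (start, end, label) for each word found in the gazetteer."""
    matches = []
    for m in re.finditer(r"[A-Za-z]+", text):
        word = m.group().lower()
        for label, terms in gazetteer.items():
            if word in terms:
                matches.append((m.start(), m.end(), label))
    return matches


text = "Chris reviewed the chart for Ms. Semenza."
for start, end, label in gazetteer_matches(text):
    print(label, text[start:end])
```

Making the dictionary configurable lets an organization add local terms, such as facility names, without changing the matching code.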
Stronger Safeguards. Richer Analysis. Integrated.
PARAT 5.3
Summary
• EHRs represent a rich and growing source of unstructured data for secondary use
• Anonymization needs to be understood as a critical step in creating a reporting and analytic pipeline that optimizes data utility and is compliant with legal requirements
• Defensible, compliant anonymization of free-form text is possible
• Anonymization can be completed across unstructured and structured data to attain consistency
• This can scale to large volumes of data and flat files
Learn More …
• Let us know if you’d like to learn more. We have experts available for either a demo, or a 30-minute workshop to better understand your anonymization needs for structured and unstructured data. You can reach us at [email protected]
• We also have several events upcoming:
– December 5: Privacy by Design User Forum @ Fairmont Royal York, Toronto, Ontario
– December 6: Twin Cities Health Privacy Summit @ Mayo Clinic, Minneapolis, Minnesota
– December 11-12: Health Data Summit, NAHDO 28th Annual Conference, Denver, Colorado