Big Data – Medical Databases and the Privacy Rule: 2015 and Beyond
Lucy Doyle, VP Data Protection, Information Security and Risk Management, McKesson
Karen Smith, Sr. Director, Privacy and Data Protection, McKesson
Anand N. Vidyashankar, Associate Professor, George Mason University
Need to Share Data
The healthcare industry in which we all operate is very complex and
highly regulated. There is heightened awareness across the industry with
regard to privacy and security of sensitive data and information. Data
breaches are frequently in the headlines and highlighted in healthcare periodicals.
We need to leverage data and information to build better products and
services, to share data across organizations, to improve healthcare
operations, to support newer technologies and operations, to support patient access to information, and to support analytic studies,
benchmarking, trends, and outcome analyses to improve healthcare.
We have challenges in sharing data in multiple scenarios.
Challenges in Sharing Data
Disparate Systems
Interoperability
Standards and Non-Standards
Technologies
Rapidly Changing Environments
Lots of Data
Need for Quality Data
Data Protections
Supporting Access to Support Providers, Payers, and Patients
Understanding how to apply de-identification of data to further data protections
Commercializing Data
As privacy and security professionals, we receive a host of requests to
access and use data for studies, benchmarking, and marketing, to
understand outcomes, and to provide quality analytics for all players in the
industry.
Many uses require de-identification of data.
Two methods per HIPAA to de-identify data:
1) Safe Harbor Method
2) Expert Method
Outside the US, the Expert Method is generally applied.
Skill sets are needed to embrace the de-identification requirements.
Data rights are necessary to use personal data to create de-identified data.
REGULATORY IMPACT
HIPAA was intended to call all to task to protect the data privacy of the
patient/individual.
PIPEDA – Canadian Requirements to de-identify data
Other Countries
Host of other types of laws that affect what can and cannot be done
with healthcare data
USE CASE OVERVIEW
Methods of De-identification
Expert Determination: Uses statistical and scientific principles and methods to render
information not individually identifiable.
Safe Harbor: Requires the removal of 18 specified identifiers, which are
elements related to the individual or to relatives, employers, or
household members of the individual, from a dataset.
METHODS OF DE-IDENTIFICATION
Safe Harbor Method Identifiers
1. Names;
2. All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
4. Telephone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code; and
(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
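The zip code, date, and age rules in identifiers 2 and 3 are the ones that call for transforming a value rather than simply dropping it. The following is a minimal sketch of those three rules only, not a complete Safe Harbor implementation. The function names, the record layout, and the population figures in `ZIP3_POPULATION` are hypothetical; real population counts would have to come from current Census Bureau data. Birth years that would reveal an age over 89 are not handled here.

```python
from datetime import date

# Hypothetical population counts per three-digit zip prefix (illustration only).
ZIP3_POPULATION = {"303": 1_200_000, "036": 15_000}


def generalize_zip(zip_code, zip3_population):
    """Keep the first three digits only if that area has more than 20,000 people."""
    prefix = zip_code[:3]
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    # Areas with 20,000 or fewer people (or unknown areas) become 000.
    return "000"


def generalize_date(d):
    """Keep only the year of a date directly related to an individual."""
    return d.year


def generalize_age(age):
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


# Hypothetical input record.
record = {"zip": "30301", "admitted": date(2014, 3, 7), "age": 93}
safe = {
    "zip3": generalize_zip(record["zip"], ZIP3_POPULATION),
    "admit_year": generalize_date(record["admitted"]),
    "age": generalize_age(record["age"]),
}
print(safe)
```

Treating an unknown prefix as 000 is a conservative choice made for this sketch; the rule itself only speaks to prefixes whose population is known.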
SAFE HARBOR 18TH ELEMENT
18. Any other unique identifying
number, characteristic, or code; and (ii) The covered entity does not have actual knowledge that the information
could be used alone or in combination with other information to identify an
individual who is a subject of the information.
DIRECT IDENTIFIERS V INDIRECT
IDENTIFIERS
• Direct identifiers are identifiers that relate specifically to
an individual or to relatives, employers, or household members of
the individual. Examples of direct identifiers are name,
address, Social Security number, MRN, telephone number,
e-mail address, and biometric record.
• Indirect identifiers are identifiers that, alone, cannot
immediately identify individuals but, when linked with other
identifiers, increase the risk of individual re-identification
beyond “very small.” Examples of indirect identifiers include
race, ethnicity, occupation, salary, and rare medical
condition.
USE CASE ONE – CLAIMS DATA
Proposed plan to use data set with Safe Harbor data elements removed. Remaining data elements were, in part:
– Claim Amount
– Claim Paid
– Number of days between claim submission and remittance.
– ICD-9-CM
– Insurance Company
– Claims Adjustment Codes
USE CASE ONE – CLAIMS DATA
CONTINUED
Insurance
Company
USE CASE ONE – CLAIMS DATA
CONTINUED
• Small Self-managed Insurance Plan
• Link to Facebook
• Link to Photographs of a group with fewer than 30
individuals
• Risk of Privacy Disclosure
USE CASE ONE – CLAIMS DATA
CONTINUED
• Links to Website
• Full Face Photographs Available
• Ability to view Additional Personal Information
USE CASE ONE – CLAIMS DATA
CONTINUED
Indirect Identifiers to Consider
– Insurance Company Name
– Name of Employer
– Diagnosis and Procedure Codes
• Pregnancy
• Conditions that are predominant in particular ethnic groups.
• Death
• Rare Conditions
• Gender Bias
• Age specific or ages in general
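One way to screen for the small-plan problem in this use case is to count how many records share each value of an indirect identifier and flag the values that cover only a few individuals. The sketch below does this for the insurance company field. The function name, the field names, the sample records, and the default threshold of 30 (borrowed from the group size mentioned in the use case) are assumptions for illustration, not a regulatory cutoff.

```python
from collections import Counter


def small_groups(records, field, min_size=30):
    """Return each value of `field` that appears in fewer than `min_size` records."""
    counts = Counter(r[field] for r in records)
    return {value: n for value, n in counts.items() if n < min_size}


# Hypothetical claims data with Safe Harbor elements already removed.
claims = (
    [{"insurer": "Large National Plan", "icd9": "250.00"}] * 500
    + [{"insurer": "Small Self-Managed Plan", "icd9": "344.1"}] * 12
)

flagged = small_groups(claims, "insurer")
print(flagged)
```

A flagged value is a candidate for removal, masking, or grouping with other values before the data set is released.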
USE CASE TWO – SOCIAL MEDIA
A biotech company released information to its investors
announcing the enrollment of a patient in a potentially groundbreaking procedure that could reverse paralysis.
News Release Reported:
– Patient Enrollment in Pilot Program
– That the patient had surgery within 48 hours of injury.
– The hospital where the surgery was performed.
USE CASE TWO – SOCIAL MEDIA
Indirect Identifiers to Consider:
– Paralysis – rare condition
– Dramatic accident – dirt bike mishap
– Young patient – newsworthy story
SOCIAL MEDIA IMPLICATIONS
• Patients are people and people are extremely active on social media. Patients are comfortable with using social media to obtain health information, find support through discussion groups, raise funds to aid with recovery, and to memorialize their health journeys.
• When considering a data set that has been de-identified using the Safe Harbor or Expert Method, the intended use (e.g. internal v external) along with the impact of social media upon the risk of re-identification must be considered.
IDENTIFYING AND MANAGING
RISK: SEAMLESS INTEGRATION
TOPICS
• Defining Risk
• What is Special about Big Data?
• Risk Profile
• Describing a Risk Profile
• Data Collection: Security Assessment
• Data Collection: Privacy Assessment
• Managing Risk
• Conclusions
DEFINING RISK
• Intuitive notions of risk
• Precise definition of risk is mathematical
and involves calculation of probabilities
• Definition: probability of re-identifying an
individual from released information from a
database
• Key Issue: Calculating this probability
depends on the sources contributing to risk.
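A common simplification of this probability, offered here as an illustration and not as the presenters' method, groups records that share the same values on a set of indirect identifiers and assigns each record a risk of one divided by the size of its group. This assumes the adversary knows the target is in the data set and cannot distinguish records within a group. The function name, field names, and sample records are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return, for each record, 1 / (number of records sharing its quasi-identifier values)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1 / class_sizes[k] for k in keys]


# Hypothetical released records.
records = [
    {"age_band": "30-39", "sex": "F", "zip3": "303"},
    {"age_band": "30-39", "sex": "F", "zip3": "303"},
    {"age_band": "30-39", "sex": "M", "zip3": "303"},
    {"age_band": "60-69", "sex": "M", "zip3": "000"},
]

risks = reidentification_risk(records, ["age_band", "sex", "zip3"])
max_risk = max(risks)                # worst-case record
avg_risk = sum(risks) / len(risks)   # average over the release
print(risks, max_risk, avg_risk)
```

The result changes with the choice of identifiers, which mirrors the key issue above: the calculation depends on which sources are assumed to contribute to risk.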
WHAT IS SPECIAL ABOUT BIG DATA?
• The five Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value
• Combinations of multiple databases
• Streaming of data from multiple sources
• President’s Council of Advisors on Science and Technology (PCAST) on Big Data: “privacy challenges from data fusion do not lie in the individual data streams, each of whose collection, real time processing, and retention may be wholly necessary and appropriate for its overt, immediate purpose. Rather, the privacy challenges are emergent properties of our increasing ability to bring into analytical juxtaposition large, diverse data sets and to process them with new kinds of mathematical algorithms.”
RISK PROFILE
• Risk profile is the overall risk due to
multiple information releases to multiple
sources
• The risk profile includes both privacy and
security risk
• The risk profile is longitudinal in nature
ASSESSING RISK - I
• Collecting data across time from database/vulnerability scans
• Access control violations
• Frequency of changes to processes and procedures (long term)
• Synchronization/Violations between established policies, procedures, and practices
• Auditing: Frequency and Variation – Statistical Designs
ASSESSING RISK - II
• Collecting data for privacy risk:
Statistical designs
• Statistical distribution of sources
contributing to data
• Availability of public information
• Spatial risk
• Changes to distribution across time.
OVERALL RISK
• Combining security and privacy
assessments
• Need to be unit free
• General Strategy: Worst case scenario
• The approach assumes the adversary has
the best algorithm to re-identify individuals from the
released information
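The slides do not give a formula for combining the two assessments. As a rough sketch under stated assumptions, suppose each security and privacy measurement has already been expressed as a probability between 0 and 1 (which makes it unit free), and take the worst case to be the largest value observed. The function name and the sample figures are hypothetical.

```python
def overall_risk(security_risks, privacy_risks):
    """Worst-case combination of unit-free risks, each a probability in [0, 1]."""
    for p in (*security_risks, *privacy_risks):
        if not 0.0 <= p <= 1.0:
            raise ValueError("risks must be probabilities in [0, 1]")
    # Assumption: the worst case is the single largest risk across both assessments.
    return max(max(security_risks), max(privacy_risks))


# Hypothetical assessment results collected over time.
security = [0.02, 0.10]
privacy = [0.05, 0.25]
print(overall_risk(security, privacy))
```

Taking the maximum is only one reading of "worst case"; other combinations, such as the probability that at least one risk materializes, need assumptions about how the risks depend on each other.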
MANAGING RISK
• De-identification Strategies: Cell
Suppression, Anonymization, Encryption,
Perturbation
• De-identified Databases
• Policies and Procedures depending on
Maximum Tolerated Risk (MTR)
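As an illustration of cell suppression driven by a Maximum Tolerated Risk, the sketch below withholds any count in a frequency table whose one-over-count risk exceeds the MTR. Using one divided by the cell count as the risk measure, the function name, the table contents, and the MTR value of 0.05 are all assumptions made for this example, not figures from the presentation.

```python
def suppress_small_cells(table, mtr):
    """Replace a count with None when 1/count exceeds the maximum tolerated risk."""
    released = {}
    for cell, count in table.items():
        # Empty cells are left as they are in this sketch.
        too_risky = count > 0 and 1 / count > mtr
        released[cell] = None if too_risky else count
    return released


# Hypothetical frequency table: (condition, sex) -> number of patients.
counts = {
    ("pregnancy", "F"): 240,
    ("rare condition", "M"): 3,
    ("diabetes", "M"): 57,
}

released = suppress_small_cells(counts, mtr=0.05)
print(released)
```

A lower MTR suppresses more cells, which is the privacy side of the trade-off against the usefulness of the released table.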
CASE STUDY – DE-IDENTIFIED DATA
CAN BE USEFUL
• Plot of Privacy vs Utility shows the trade
off between them
• This trade-off is related to the trade-off
between commercial value and MTR
• This is depicted in the next two graphs
[Graphs not reproduced: plots of privacy vs utility]
CONCLUSION
• De-identification and commerce can co-exist
• Both are feasible under a Big Data
framework
• There is no one-size-fits-all approach
• Algorithms for these implementations are
available from the presenter, Dr.
Vidyashankar
SUMMARY
• Lots of Data in Healthcare
• All data needs protection
– Safe Data Handling Practices are Required
– Integrated Privacy and Security Controls are Required
• Quality data is needed
• De-identification of data can be complex
– Experts are needed
– Skill sets are needed
– De-identification can enable quality data that is valuable and highly usable
• Methods to de-identify vary by Intended Uses
• Embrace and apply the methods
– Supports Big Data Use Cases
QUESTIONS
SPEAKER INFORMATION
Lucy Doyle, Ph.D.
VP Data Protection-Secure Information Management – SIM
Karen Smith, J.D., CHC
Sr. Director, Privacy & Data Protection, Compliance & Ethics
Anand N. Vidyashankar, Ph.D.
Associate Professor, George Mason University