
Big Data – Medical Data Bases and the Privacy Rule: 2015 and Beyond

Lucy Doyle, VP Data Protection, Information Security and Risk Management, McKesson

Karen Smith, Sr. Director, Privacy and Data Protection, McKesson

Anand N. Vidyashankar, Associate Professor, George Mason University

Need to Share Data

The healthcare industry in which we all operate is very complex and highly regulated. There is heightened awareness across the industry regarding the privacy and security of sensitive data and information. Data breaches are frequently in the headlines and highlighted in healthcare periodicals.

We need to leverage data and information to build better products and services, to share data across organizations, to improve healthcare operations, to support newer technologies and operations, to support patient access to information, and to support analytic studies, benchmarking, trends, and outcome analyses to improve healthcare.

We have challenges in sharing data in multiple scenarios.


Challenges in Sharing Data


Disparate Systems

Interoperability

Standards and Non-Standards

Technologies

Rapidly Changing Environments

Lots of Data

Need for Quality Data

Data Protections

Supporting Access for Providers, Payers, and Patients

Understanding How to Apply De-identification of Data to Further Data Protections


Commercializing Data

As privacy and security professionals, we receive a host of requests for access to data and to use it to create studies, benchmarking, and marketing, to understand outcomes, and to provide quality analytics for all players in the industry.

Many uses require de-identification of data.

Two methods per HIPAA to de-identify data:

1) Safe Harbor Method

2) Expert Method

Outside the US, the Expert Method is generally applied.

Skill sets are needed to embrace the de-identification requirements.

Data rights are necessary to use personal data to create de-identified data.

REGULATORY IMPACT

HIPAA was intended to call all to task to protect the data privacy of the patient/individual.

PIPEDA – Canadian Requirements to de-identify data

Other Countries

A host of other types of laws affect what can and cannot be done with healthcare data.


USE CASE OVERVIEW


Methods of De-identification

Expert Determination: Uses statistical and scientific principles and methods to render information not individually identifiable.

Safe Harbor: Requires the removal from a dataset of 18 specified identifiers, which are elements related to the individual or to relatives, employers, or household members of the individual.


METHODS OF DE-IDENTIFICATION

Safe Harbor Method Identifiers

1. Names;
2. All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
   (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
   (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
4. Telephone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code; and

(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
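As an illustration only, the mechanical parts of the Safe Harbor list (dropping direct identifiers, truncating zip codes, keeping only the year of dates, and collapsing ages over 89) can be sketched in Python. The field names in DIRECT_IDENTIFIER_FIELDS and the restricted_zip3 set are hypothetical; in practice the three-digit zip areas with 20,000 or fewer people must come from current Census data, and item 18 and the actual-knowledge condition in (ii) cannot be checked by code like this.

```python
from datetime import date

# Hypothetical field names standing in for direct identifiers from the list
# (names, street address, phone, e-mail, SSN, MRN, and so on).
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}


def safe_harbor(record: dict, restricted_zip3: set) -> dict:
    """Sketch of selected Safe Harbor transformations applied to one record."""
    # Drop fields that hold direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Item 2: keep the first three zip digits, or "000" when that three-digit
    # area has 20,000 or fewer people (hypothetical lookup set).
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3

    # Item 3: keep only the year of any date value.
    for key, value in list(out.items()):
        if isinstance(value, date):
            out[key] = value.year

    # Item 3: ages over 89 become a single "90+" category, and the birth
    # year is removed as well because it would reveal such an age.
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        out["age"] = "90+"
        out.pop("birth_date", None)
    return out


record = {"name": "Jane Doe", "zip": "12345", "age": 92,
          "birth_date": date(1923, 5, 1), "admit_date": date(2015, 3, 4),
          "diagnosis": "asthma"}
print(safe_harbor(record, restricted_zip3={"123"}))
# {'zip': '000', 'age': '90+', 'admit_date': 2015, 'diagnosis': 'asthma'}
```

Removing the fields is the easy part; the harder judgment, as the following slides show, is what the remaining fields reveal.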

SAFE HARBOR 18TH ELEMENT

18. Any other unique identifying number, characteristic, or code; and (ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.

DIRECT IDENTIFIERS V. INDIRECT IDENTIFIERS

• Direct identifiers are identifiers that relate specifically to an individual, relatives, employers, or household members of the individual. Examples of direct identifiers are name, address, Social Security Number, MRN, telephone number, e-mail address, or biometric record.

• Indirect identifiers are identifiers that, alone, cannot immediately identify individuals but, when linked with other identifiers, increase the risk of individual re-identification beyond “very small.” Examples of indirect identifiers include race, ethnicity, occupation, salary, or rare medical condition.
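To make the linking concern concrete, a minimal sketch (a standard k-anonymity count, not a method taken from the presentation) groups records by their combination of indirect identifiers after direct identifiers are removed; the size of the smallest group is the data set's k. The records and field names are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, indirect_fields):
    """Count records sharing each combination of indirect identifier values."""
    return Counter(tuple(r[f] for f in indirect_fields) for r in records)


def k_anonymity(records, indirect_fields):
    """Size of the smallest class: each record matches at least k - 1 others."""
    return min(equivalence_classes(records, indirect_fields).values())


# Hypothetical records with direct identifiers already removed.
records = [
    {"race": "A", "occupation": "nurse", "condition": "asthma"},
    {"race": "A", "occupation": "nurse", "condition": "asthma"},
    {"race": "A", "occupation": "pilot", "condition": "asthma"},
    {"race": "A", "occupation": "pilot", "condition": "rare disorder"},
]

for fields in (["race"], ["race", "occupation"], ["race", "occupation", "condition"]):
    print(fields, k_anonymity(records, fields))
```

Each added indirect identifier can only split groups further: k falls from 4 to 2 to 1, and with the rare condition included two records become unique.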


USE CASE ONE – CLAIMS DATA

Proposed plan to use data set with Safe Harbor data elements removed. Remaining data elements were, in part:

– Claim Amount

– Claim Paid

– Number of days between claim submission and remittance.

– ICD-9-CM

– Insurance Company

– Claims Adjustment Codes



CONTINUED

Insurance Company


CONTINUED

• Small Self-managed Insurance Plan

• Link to Facebook

• Link to Photographs of a group with fewer than 30 individuals

• Risk of Privacy Disclosure



CONTINUED

• Links to Website

• Full Face Photographs Available

• Ability to view Additional Personal Information



CONTINUED

Indirect Identifiers to Consider

– Insurance Company Name – Name of Employer

– Diagnosis and Procedure Codes

• Pregnancy

• Conditions that are ethnically predominant

• Death

• Rare Conditions

• Gender Bias

• Age specific or ages in general

USE CASE TWO – SOCIAL MEDIA

A biotech company released information to its investors announcing the enrollment of a patient in a potentially groundbreaking procedure that could reverse paralysis.

News Release Reported:

– Patient Enrollment in Pilot Program

– That the patient had surgery within 48 hours of injury.

– The hospital where the surgery was performed.

Indirect Identifiers to Consider:

– Paralysis – rare condition

– Dramatic accident – dirt bike mishap

– Young patient – newsworthy story
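The social media use case can be sketched as a simple linkage check: filter a de-identified extract using only facts an outsider could learn from the news release and social media. The records, hospital names, and the hours_to_surgery field are hypothetical.

```python
def link(records, public_facts):
    """Return the records consistent with every fact an outsider learned publicly."""
    return [r for r in records if all(fact(r) for fact in public_facts)]


# Hypothetical de-identified records (no Safe Harbor identifiers present).
records = [
    {"hospital": "Hospital A", "diagnosis": "paralysis", "hours_to_surgery": 30, "age_band": "18-25"},
    {"hospital": "Hospital A", "diagnosis": "fracture", "hours_to_surgery": 20, "age_band": "18-25"},
    {"hospital": "Hospital B", "diagnosis": "paralysis", "hours_to_surgery": 72, "age_band": "40-50"},
]

# Facts of the kind reported in the news release.
public_facts = [
    lambda r: r["hospital"] == "Hospital A",
    lambda r: r["diagnosis"] == "paralysis",
    lambda r: r["hours_to_surgery"] <= 48,
]

matches = link(records, public_facts)
print(len(matches))  # 1
```

A single match means the public story has re-identified the record, even though none of the 18 Safe Harbor identifiers appears in the extract.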

SOCIAL MEDIA IMPLICATIONS

• Patients are people and people are extremely active on social media. Patients are comfortable with using social media to obtain health information, find support through discussion groups, raise funds to aid with recovery, and to memorialize their health journeys.

• When considering a data set that has been de-identified using the Safe Harbor or Expert Method, both the intended use (e.g., internal vs. external) and the impact of social media on the risk of re-identification must be considered.

IDENTIFYING AND MANAGING RISK: SEAMLESS INTEGRATION

TOPICS

• Defining Risk

• What is Special about Big Data?

• Risk Profile

• Describing a Risk Profile

• Data Collection: Security Assessment

• Data Collection: Privacy Assessment

• Managing Risk

• Conclusions


DEFINING RISK

• Intuitive notions of risk

• Precise definition of risk is mathematical and involves calculation of probabilities

• Definition: probability of re-identifying an individual from information released from a database

• Key Issue: Calculating this probability depends on the sources contributing to risk.
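One common way to turn this definition into a number, assumed here rather than taken from the presentation, treats an adversary who knows the target is in the release and picks at random among the records that match the target's indirect identifiers. The probability for a record is then 1/f, where f is the size of its matching group; the maximum and the average over all records summarize the release. Field names and data are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, indirect_fields):
    """Return (maximum, average) of the per-record risk 1/f over a release."""
    keys = [tuple(r[f] for f in indirect_fields) for r in records]
    class_size = Counter(keys)
    per_record = [1 / class_size[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


release = [
    {"zip3": "100", "age_band": "30-39"},
    {"zip3": "100", "age_band": "30-39"},
    {"zip3": "101", "age_band": "40-49"},
]
max_risk, avg_risk = reidentification_risk(release, ["zip3", "age_band"])
print(max_risk, round(avg_risk, 3))  # 1.0 0.667
```

The third record is unique on these fields, so its risk is 1 and the release's worst-case risk is 1, even though the average is lower.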


WHAT IS SPECIAL ABOUT BIG DATA?

• The five Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value

• Combinations of multiple databases

• Streaming of data from multiple sources

• President’s Council of Advisors on Science and Technology, on Big Data: “privacy challenges from data fusion do not lie in the individual data streams, each of whose collection, real time processing, and retention may be wholly necessary and appropriate for its overt, immediate purpose. Rather, the privacy challenges are emergent properties of our increasing ability to bring into analytical juxtaposition large, diverse data sets and to process them with new kinds of mathematical algorithms.”
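The quoted point about data fusion can be illustrated with a small, hypothetical example: one source holds three-digit zip code and birth year, another holds three-digit zip code and sex, and neither contains a unique record on its own. Assuming, for illustration, that the two sources can be linked record by record, the fused view makes every record unique.

```python
from collections import Counter


def fraction_unique(rows, fields):
    """Share of rows whose combination of values in `fields` occurs only once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical fused view of four people drawn from two linked sources.
fused = [
    {"zip3": "100", "birth_year": 1980, "sex": "F"},
    {"zip3": "100", "birth_year": 1980, "sex": "M"},
    {"zip3": "100", "birth_year": 1975, "sex": "F"},
    {"zip3": "100", "birth_year": 1975, "sex": "M"},
]

print(fraction_unique(fused, ["zip3", "birth_year"]))         # source 1 alone: 0.0
print(fraction_unique(fused, ["zip3", "sex"]))                # source 2 alone: 0.0
print(fraction_unique(fused, ["zip3", "birth_year", "sex"]))  # fused: 1.0
```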


RISK PROFILE

• Risk profile is the overall risk due to multiple information releases to multiple sources

• The risk profile includes both privacy and security risk

• The risk profile is longitudinal in nature
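As a hedged sketch of the longitudinal idea, suppose each release i has an estimated re-identification probability p_i and, as a simplifying assumption, releases contribute independent chances of re-identification. The cumulative risk after n releases is then 1 - (1 - p_1)(1 - p_2)...(1 - p_n). This assumption does not capture risk that emerges only when releases are combined, so it should be read as an illustration, not a complete model.

```python
def risk_profile(release_risks):
    """Cumulative re-identification risk after each release, assuming independence."""
    profile = []
    no_reid = 1.0  # probability that no release so far has re-identified the person
    for p in release_risks:
        no_reid *= 1.0 - p
        profile.append(1.0 - no_reid)
    return profile


# Hypothetical per-release risks for the same population over four quarters.
print([round(r, 4) for r in risk_profile([0.05, 0.02, 0.10, 0.01])])
```

The profile never decreases: each new release can only add to the accumulated risk under this model.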


ASSESSING RISK - I

• Collecting data across time from database/vulnerability scans

• Access control violations

• Frequency of changes to processes and procedures (long term)

• Synchronization/Violations between established policies, procedures, and practices

• Auditing: Frequency and Variation – Statistical Designs
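Counts collected over time, such as monthly access-control violations, can be screened with a c-chart, a standard statistical process control tool chosen here for illustration (the presentation does not name a specific design). The center line is the mean count and the upper control limit is the mean plus three times its square root, assuming counts are roughly Poisson. The monthly figures are hypothetical.

```python
import math


def c_chart(counts):
    """Return the upper control limit and the indices of periods above it."""
    center = sum(counts) / len(counts)
    ucl = center + 3 * math.sqrt(center)
    return ucl, [i for i, c in enumerate(counts) if c > ucl]


# Hypothetical monthly counts of access-control violations.
monthly_violations = [4, 5, 3, 6, 4, 20]
ucl, flagged = c_chart(monthly_violations)
print(round(ucl, 2), flagged)  # 14.94 [5]
```

In practice the limits are usually computed from a stable baseline period; here the outlying month is included in the mean, which widens the limit.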


ASSESSING RISK - II

• Collecting data for privacy risk: Statistical designs

• Statistical distribution of sources contributing to data

• Availability of public information

• Spatial risk

• Changes to distribution across time
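Changes in the distribution of contributing sources across time can be tracked with a distance between the source mix in two periods. Total variation distance, used here as one possible choice rather than the presenters' method, is half the sum of absolute differences in proportions: 0 means identical mixes and 1 means no overlap. Source labels and counts are hypothetical.

```python
from collections import Counter


def total_variation(period_a, period_b):
    """Half the sum of absolute differences between two periods' source proportions."""
    count_a, count_b = Counter(period_a), Counter(period_b)
    n_a, n_b = len(period_a), len(period_b)
    sources = set(count_a) | set(count_b)
    return 0.5 * sum(abs(count_a[s] / n_a - count_b[s] / n_b) for s in sources)


# Hypothetical source of each record contributed in two quarters.
q1 = ["claims"] * 6 + ["ehr"] * 4
q2 = ["claims"] * 3 + ["ehr"] * 4 + ["social_media"] * 3
print(round(total_variation(q1, q2), 2))  # 0.3
```

A new source appearing (here social media) shifts the distribution and is a signal to revisit the privacy assessment.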


OVERALL RISK

• Combining security and privacy assessments

• Assessments need to be unit-free

• General Strategy: Worst-case scenario

• The approach assumes the adversary has the best algorithm to re-identify individuals from the released information
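A minimal sketch of combining the two assessments under these principles: the security score is mapped to a unit-free 0 to 1 scale, the privacy risk is already a probability, and the worst case (the larger value) is taken as the overall risk. The security score, its bounds, and the linear normalization are hypothetical choices for illustration.

```python
def to_unit_scale(value, lower, upper):
    """Map a score onto [0, 1] using stated bounds, clipping values outside them."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return min(1.0, max(0.0, (value - lower) / (upper - lower)))


def overall_risk(security_score, security_bounds, privacy_risk):
    """Worst-case combination of a normalized security score and a privacy probability."""
    security_risk = to_unit_scale(security_score, *security_bounds)
    return max(security_risk, privacy_risk)


# Hypothetical inputs: 12 open high-severity findings on a 0-40 scale, and a
# re-identification probability of 0.2.
print(overall_risk(12, (0, 40), 0.2))  # 0.3
```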


MANAGING RISK

• De-identification Strategies: Cell Suppression, Anonymization, Encryption, Perturbation

• De-identified Databases

• Policies and Procedures depending on Maximum Tolerated Risk (MTR)
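Two of these ideas can be sketched together: primary cell suppression (blanking counts below a minimum cell size) and a release rule keyed to the Maximum Tolerated Risk. The minimum cell size of 11 and the MTR value are hypothetical policy parameters; a real table would also need complementary suppression so that suppressed cells cannot be recovered from row or column totals.

```python
def suppress_small_cells(table, min_cell_size):
    """Primary suppression: replace counts below min_cell_size with None."""
    return {cell: (n if n >= min_cell_size else None) for cell, n in table.items()}


def release_decision(estimated_risk, mtr):
    """Release only when the estimated risk does not exceed the Maximum Tolerated Risk."""
    return "release" if estimated_risk <= mtr else "apply further de-identification"


# Hypothetical counts of claims by (insurance plan, condition).
table = {("Plan A", "asthma"): 120, ("Plan B", "asthma"): 4, ("Plan B", "diabetes"): 37}
print(suppress_small_cells(table, min_cell_size=11))
print(release_decision(estimated_risk=0.05, mtr=0.09))
```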


CASE STUDY – DE-IDENTIFIED DATA CAN BE USEFUL

• A plot of privacy vs. utility shows the trade-off between them

• This trade-off is related to the trade-off between commercial value and MTR

• This is depicted in the next two graphs

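The privacy-versus-utility trade-off can be illustrated with a single generalization setting, assumed here for illustration: grouping ages into bands. Narrower bands keep more analytic detail (utility) but create smaller groups and therefore a higher maximum re-identification risk (1/f for the smallest group). The sketch picks the narrowest band width whose maximum risk stays within the MTR. Ages, widths, and the MTR value are hypothetical.

```python
from collections import Counter


def max_risk(ages, width):
    """Maximum per-record risk 1/f after grouping ages into bands of `width` years."""
    band_sizes = Counter(age // width for age in ages)
    return 1 / min(band_sizes.values())


def narrowest_acceptable_width(ages, widths, mtr):
    """Most detailed band width whose maximum risk is within the MTR, or None."""
    for width in sorted(widths):
        if max_risk(ages, width) <= mtr:
            return width
    return None


ages = list(range(20, 36))  # hypothetical ages 20 through 35
for width in (1, 5, 10, 20):
    print(width, round(max_risk(ages, width), 3))
print(narrowest_acceptable_width(ages, (1, 5, 10, 20), mtr=0.34))  # 10
```

Wider bands lower the risk but blur age-specific analyses; choosing a point on this curve is the policy decision the MTR encodes.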

CONCLUSION

• De-identification and commerce can co-exist

• Both are feasible under a Big Data framework

• There is no one-size-fits-all approach

• Algorithms for these implementations are available from the presenter, Dr. Vidyashankar

SUMMARY

• Lots of Data in Healthcare

• All data needs protection

– Safe Data Handling Practices are Required

– Integrated Privacy and Security Controls are Required

• Quality data is needed

• De-identification of data can be complex

– Experts are needed

– Skill sets are needed

– De-identification can enable quality data that is valuable and highly usable

• Methods to de-identify vary by Intended Uses

• Embrace and apply the methods

– Supports Big Data Use Cases


QUESTIONS


SPEAKER INFORMATION

Lucy Doyle, Ph.D.

VP Data Protection-Secure Information Management – SIM

[email protected]

Karen Smith, J.D., CHC

Sr. Director, Privacy & Data Protection, Compliance & Ethics

[email protected]

Anand N. Vidyashankar, Ph.D.

Associate Professor, George Mason University

[email protected]
