Risk-based De-identification Khaled El Emam, CHEO RI & uOttawa

Risk Based De-identification for Sharing Health Data


DESCRIPTION

This presentation describes a methodology, tools, and experiences for the de-identification of health information. The objective is to support data sharing for the purpose of research and public health.


Page 1: Risk Based De-identification for Sharing Health Data

Risk-based De-identification
Khaled El Emam, CHEO RI & uOttawa

Page 2: Risk Based De-identification for Sharing Health Data

Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca


Background

• Re-identification risk assessment, re-identification attacks, de-identification:
  – Birth registry / newborn screening program
  – Tumor bank
  – Hospital data (discharge abstracts and pharmacy databases) – local, provincial/state, national
  – EMR data

Page 3: Risk Based De-identification for Sharing Health Data


Issues

• De-identification works well in practice if you adopt a risk-based approach
• Re-identification attacks are hard
• It is possible to de-identify data sets and still retain sufficient utility
• De-identification can be made simple

Page 4: Risk Based De-identification for Sharing Health Data


Re-identification Risk Spectrum

Page 5: Risk Based De-identification for Sharing Health Data


Page 6: Risk Based De-identification for Sharing Health Data


Managing Re-identification Risk

[Figure: diagram with the elements Risk Exposure, Amount of De-identification, Mitigating Controls, Motives & Capacity, and Invasion-of-Privacy]

Page 7: Risk Based De-identification for Sharing Health Data


Determining the Probability (Pr) of Re-identification Attempts

Page 8: Risk Based De-identification for Sharing Health Data


Determining Risk Threshold to Use

Page 9: Risk Based De-identification for Sharing Health Data


Tradeoffs Made

• Adjust threshold
• Adjust amount of suppression that is acceptable
• Adjust precision of variables
• Sub-sample
• Adjust variable weights
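The threshold and precision tradeoffs can be illustrated with a minimal sketch. The slides do not state a risk metric, so this assumes one common choice: the risk of a record is 1 divided by the size of its equivalence class on the quasi-identifiers, and a data set passes when its maximum risk is at or below the threshold. The records, the threshold value of 0.5, and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (age, postal prefix, sex) treated as quasi-identifiers.
RECORDS = [
    (34, "K1H", "F"), (36, "K1H", "F"), (35, "K1H", "F"),
    (52, "K2P", "M"), (57, "K2P", "M"), (58, "K2P", "M"),
]

def generalize_age(record, width):
    """Reduce the precision of age by mapping it to the start of a width-year band."""
    age, postal, sex = record
    return (age // width * width, postal, sex)

def max_risk(records):
    """Assumed metric: 1 / size of the smallest equivalence class."""
    sizes = Counter(records)
    return 1 / min(sizes.values())

def meets_threshold(records, threshold):
    """True when no record's assumed risk exceeds the threshold."""
    return max_risk(records) <= threshold

five_year = [generalize_age(r, 5) for r in RECORDS]
ten_year = [generalize_age(r, 10) for r in RECORDS]

print(meets_threshold(five_year, 0.5))  # finer age bands leave some records unique
print(meets_threshold(ten_year, 0.5))   # coarser age bands give larger classes
```

In this toy data, five-year age bands still leave unique records, while ten-year bands put every record in a class of three. The cost is precision in the age variable, which is the kind of tradeoff the slide lists.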

Page 10: Risk Based De-identification for Sharing Health Data


Advantages

• Passage through research ethics is significantly faster for “secondary use” protocols that are certified as low risk
• Provides an incentive for data recipients to improve their security and privacy practices
• Provides an incentive for funders to cover the costs of infrastructure for handling data
• Amount of de-identification is proportionate to the actual risk

Page 11: Risk Based De-identification for Sharing Health Data


Risk Assessment

Page 12: Risk Based De-identification for Sharing Health Data


De-identification

Page 13: Risk Based De-identification for Sharing Health Data

Risk Assessment for REB

Page 14: Risk Based De-identification for Sharing Health Data


Risk Assessment for REB

Page 15: Risk Based De-identification for Sharing Health Data


Risk Assessment for REB

Page 16: Risk Based De-identification for Sharing Health Data


Example – Tumor Bank

• ‘Rogue researcher’ adversary
• Search queries considered high risk
• Combination of sub-sampling and generalization for each tumor site data
• Moving towards researcher self-assessments to decide appropriate level of de-identification
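A minimal sketch of combining sub-sampling with generalization per tumor site, under stated assumptions: the site names, sampling fractions, age band widths, and the `deidentify_site` helper are all hypothetical and are not taken from the tumor bank's actual procedure.

```python
import random

# Hypothetical per-site settings: sampling fraction and age band width in years.
SITE_SETTINGS = {
    "breast": {"fraction": 0.5, "age_band": 10},
    "thyroid": {"fraction": 0.25, "age_band": 20},
}

def deidentify_site(records, site, seed=0):
    """Sub-sample one site's records, then generalize age into bands."""
    cfg = SITE_SETTINGS[site]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    keep = round(cfg["fraction"] * len(records))
    sample = rng.sample(records, keep)
    band = cfg["age_band"]
    return [{"site": site, "age_band": r["age"] // band * band} for r in sample]

# 20 hypothetical donors aged 40 to 59 for one site.
breast = [{"age": age} for age in range(40, 60)]
released = deidentify_site(breast, "breast")
print(len(released), sorted({r["age_band"] for r in released}))
```

Keeping the settings in a per-site table reflects the slide's point that each tumor site gets its own combination, since sites with fewer donors may need coarser bands or smaller samples.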

Page 17: Risk Based De-identification for Sharing Health Data


Example – Discharge Abstracts

• ‘Nosey neighbor’ adversary
• Creation of a public data file
• Diagnosis and intervention codes presented difficulties
• High level of suppression for a public file, but acceptable utility with stronger access controls (higher threshold)
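The link between threshold and suppression can be sketched as follows. This is an illustration only: it assumes whole-record suppression of any record whose equivalence class is too small for the threshold, and the abstracts, the two threshold values, and the `suppression_rate` helper are hypothetical rather than the method used for the actual file.

```python
from collections import Counter

# Hypothetical discharge abstracts: (age band, diagnosis code) per record.
ABSTRACTS = (
    [("60-69", "I21")] * 12
    + [("60-69", "J18")] * 4
    + [("20-29", "O80")] * 2
    + [("20-29", "C91")] * 1
    + [("70-79", "E84")] * 1
)

def suppression_rate(records, threshold):
    """Fraction of records whose assumed risk (1 / class size) exceeds the threshold."""
    sizes = Counter(records)
    suppressed = sum(1 for r in records if 1 / sizes[r] > threshold)
    return suppressed / len(records)

public_rate = suppression_rate(ABSTRACTS, 0.1)      # strict threshold, public file
controlled_rate = suppression_rate(ABSTRACTS, 0.3)  # higher threshold, access controls
print(public_rate, controlled_rate)
```

With these toy numbers the strict threshold suppresses twice as many records as the relaxed one, because rare diagnosis codes produce small classes.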

Page 18: Risk Based De-identification for Sharing Health Data


Practical Considerations

• An audit program is required to ensure compliance with ‘mitigating controls’
• What if a breach happens?
  – A risk management approach ensures that the data is highly de-identified in situations where breaches are most likely
  – Can demonstrate due diligence

Page 19: Risk Based De-identification for Sharing Health Data


Lessons Learned

• Geospatial data and longitudinal data always represent challenges because they increase the risk of re-identification
• Thus far we have never had to decline a data request because of identifiability, nor have we been unable to provide data with sufficient utility for a study