IBM Research
2007 AMIA Spring Congress | May 22-24, 2007

Security and Privacy Technology Enablers for Electronic Healthcare

Tyrone Grandison PhD
IBM Healthcare Center of Excellence
Almaden Research Center
San Jose, California
[email protected]


S27 Panel: Electronic Healthcare - Addressing the Information Security and Privacy Issues

Introduction

• Caveat
  – As medical information moves to electronic platforms, policy and social education programs must be augmented by appropriate, corresponding technology [1].

• Objectives
  – Define the addressable.
  – Define the current major problems.
  – Outline technological solutions to each of these problems.

[1] Christopher Johnson, Rakesh Agrawal, "Intersections of Law and Technology in Balancing Privacy Rights with Free Information Flow", Proceedings of the Fourth IASTED International Conference on Law and Technology, Cambridge, Massachusetts, USA, October 2006.


Scope of Current Technical Enablers

• The Problem Space
  – Tightly coupled complex systems
  – Each siloed system has its own protection mechanisms
  – Conflicting priorities and policies
  – New (and changing) technology

• Solution Requirements
  – Reduce the complexity and workload of integrating and deploying systems, i.e. allow systems to focus on their core function and leverage security and privacy controls in the data system.
  – Do not impact the performance or efficiency of the currently running system.
  – Enable the current (clinical) workflow and do not require it to change.


What the Future Holds for Healthcare?

[Figure: computing paradigms and their healthcare counterparts. Pervasive Computing: location-independent service provision; telematics, telemedicine. Mobile Computing: accessibility; tele-consultation. Autonomic Computing: self-organisation; health information systems. Ubiquitous Computing.]

– Bernd Blobel, Head, German National eHealth Competence Center, University of Regensburg Medical Center, Regensburg, Germany


System Requirements – Current & Future

• Appropriate security and privacy services
• Openness
• Flexibility
• Scalability
• Portability
• User acceptance
• Service orientation
• Distribution at Internet level
• Lawfulness
• Based on standards
• Service-oriented interoperability

– Bernd Blobel, Head, German National eHealth Competence Center, University of Regensburg Medical Center, Regensburg, Germany


Current Major Problems

• Policy-based Private Data Management
  – How does one enforce data disclosure policies and patient preferences?
  – How does one enable privacy-preserving data mining?

• Secure Information Exchange
  – How does one selectively share the minimum amount of data necessary for a task?
  – How does one de-identify data for information exchange?

• Efficient Data Access Tracking
  – How does one efficiently track access and disclosure?
  – How does one protect data sent to outsourced agents?


Technology Solutions

• Policy-based Private Data Management
  – Active Enforcement
  – Privacy-Preserving Data Mining

• Secure Information Exchange
  – Sovereign Information Sharing
  – Optimal k-anonymization (de-identification)

• Efficient Data Access Tracking
  – Compliance Auditing
  – Database Watermarking


Hippocratic Database Active Enforcement

[Figure: Hippocratic Database (HDB) Active Enforcement architecture. Labels: Policy Creation, Policy Parser, Installation, Installed Policy; Negotiation, Subject Preferences & Policy Matching; Data Collection, Subject Preferences, Personal Data; Data Retrieval, Application, Enforcement, HDB Driver; Database.]

• Privacy Policy: Organizations define a set of policies describing who may access data (users or roles), for what purposes data may be accessed (purposes), and to whom data may be disclosed (recipients).

• Consent: Data subjects are given control, through opt-in and opt-out choices, over who may see their data and under what circumstances.

• Active Enforcement: Intercepts and rewrites incoming queries to comply with policies, subject choices, and context.

• Efficiency: Rewritten queries benefit from all of the optimizations and performance enhancements provided by the underlying engine (e.g. parallelism).

• Advantages:
  – Cell-level access and disclosure control.
  – Application modification not required.
  – Database agnostic; does not require changes to the database engine.

#   Name    Age  Phone
1   Adam    25   (111) 111-1111
3   Bob     -    (333) 333-3333
4   Daniel  40   -
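The query-rewriting idea can be illustrated with a minimal sketch. It is not the HDB driver: the schema, the `choices` opt-in table, the `rewrite_query` helper, the patient with id 2, and the values hidden in the slide's example table (Bob's age, Daniel's phone) are all hypothetical, chosen only so the output resembles that table. SQLite stands in for the database.

```python
import sqlite3

KNOWN_COLUMNS = {"name", "age", "phone"}  # hypothetical schema


def rewrite_query(columns):
    """Rewrite SELECT <columns> FROM patients into a policy-aware query.

    A cell is returned only if the data subject opted in to disclosing that
    column for the stated purpose; otherwise it becomes NULL. Rows whose
    subject opted in to nothing for the purpose are dropped entirely.
    """
    for col in columns:
        if col not in KNOWN_COLUMNS:
            raise ValueError(f"unknown column: {col}")
    guarded = ", ".join(
        "CASE WHEN EXISTS (SELECT 1 FROM choices c"
        " WHERE c.subject_id = p.id"
        f" AND c.col = '{col}' AND c.purpose = :purpose)"
        f" THEN p.{col} END AS {col}"
        for col in columns
    )
    return (
        f"SELECT p.id, {guarded} FROM patients p"
        " WHERE EXISTS (SELECT 1 FROM choices c"
        " WHERE c.subject_id = p.id AND c.purpose = :purpose)"
        " ORDER BY p.id"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, phone TEXT);
    CREATE TABLE choices (subject_id INTEGER, col TEXT, purpose TEXT);
""")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", [
    (1, "Adam", 25, "(111) 111-1111"),
    (2, "Carol", 31, "(222) 222-2222"),   # hypothetical: opted out of everything
    (3, "Bob", 52, "(333) 333-3333"),     # hypothetical age, not disclosed
    (4, "Daniel", 40, "(444) 444-4444"),  # hypothetical phone, not disclosed
])
conn.executemany("INSERT INTO choices VALUES (?, ?, 'treatment')", [
    (1, "name"), (1, "age"), (1, "phone"),
    (3, "name"), (3, "phone"),
    (4, "name"), (4, "age"),
])

# The application asks for all three columns; the rewritten query enforces choices.
rows = conn.execute(
    rewrite_query(["name", "age", "phone"]), {"purpose": "treatment"}
).fetchall()
for row in rows:
    print(row)
```

Because enforcement lives entirely in the rewritten SQL, the application code and the database engine stay unchanged, which is the property the slide lists under Advantages.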


Privacy-Preserving Data Mining

[Figure: two charts comparing the Original, Randomized, and Reconstructed distributions; the second chart is plotted against Randomization Level.]

[Figure: randomization pipeline. Individual records (e.g. 30 | 70K | ..., 50 | 40K | ...) pass through a Randomizer, which perturbs each value (e.g. age 30 + 35 = 65) to give randomized records (65 | 20K | ..., 25 | 60K | ...). The distributions of age and income are reconstructed and passed to the Data Mining Algorithms, which produce the Data Mining Model. Labels: Alice's age, Alice's income, Bob's age.]

• Preserves privacy at the individual level, but allows accurate data mining models to be constructed at the aggregate level.

• Adds random noise to individual values to protect data subject privacy.

• EM algorithm estimates original distribution of values given randomized values + randomization function.

• Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only small loss of accuracy.
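The randomize-then-reconstruct idea can be sketched as follows. This is a simplified illustration, not the algorithm from the talk: it assumes Gaussian noise with a known standard deviation, discretizes ages into ten-year bins, and applies an EM-style iterative update to the bin probabilities. The synthetic age data, bin layout, and all function names are assumptions.

```python
import math
import random

BIN_WIDTH = 10
MIDPOINTS = [b + BIN_WIDTH / 2 for b in range(0, 100, BIN_WIDTH)]  # 5, 15, ..., 95


def histogram(values):
    """Fraction of values per bin; out-of-range values go to the edge bins."""
    counts = [0] * len(MIDPOINTS)
    for v in values:
        index = min(max(int(v // BIN_WIDTH), 0), len(MIDPOINTS) - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


def reconstruct(randomized, sigma, iterations=100):
    """Estimate the original bin distribution from noisy values.

    Each pass computes, for every noisy value, the posterior probability that
    it came from each bin (given the current estimate and the known noise
    density), then averages those posteriors to form the new estimate.
    """
    likelihood = [
        [math.exp(-((w - m) ** 2) / (2 * sigma ** 2)) for m in MIDPOINTS]
        for w in randomized
    ]
    estimate = [1 / len(MIDPOINTS)] * len(MIDPOINTS)
    for _ in range(iterations):
        updated = [0.0] * len(MIDPOINTS)
        for row in likelihood:
            joint = [l * p for l, p in zip(row, estimate)]
            total = sum(joint)
            for j, value in enumerate(joint):
                updated[j] += value / total
        estimate = [u / len(randomized) for u in updated]
    return estimate


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)
SIGMA = 10  # noise level, known to the miner
# Synthetic two-peaked age data (an assumption for the demo).
ages = [rng.gauss(25 if i % 2 else 65, 3) for i in range(1000)]
noisy = [age + rng.gauss(0, SIGMA) for age in ages]  # what each subject submits

original = histogram(ages)
randomized_hist = histogram(noisy)
reconstructed = reconstruct(noisy, SIGMA)

print("error of randomized histogram:   ", round(l1_distance(randomized_hist, original), 3))
print("error of reconstructed histogram:", round(l1_distance(reconstructed, original), 3))
```

Only the noisy values and the noise distribution are used in `reconstruct`; the individual original ages never reach the miner, yet the aggregate shape is largely recovered.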


Sovereign Information Integration

[Figure: a Medical Research Institution with separate DNA Sequences and Drug Reactions databases.]

• Databases remain autonomous for competitive, statutory, or security reasons.
  – Provides selective, minimal sharing on a need-to-know basis.

• Example: Which DNA expressions correlate with reactions to certain drugs?

• Algorithms for computing secure joins and join counts without revealing any additional information among the databases.

Minimal Necessary Sharing

[Figure: R = {a, u, v, x} and S = {b, u, v, y}.]

Intersection, R ∩ S = {u, v}:
• R must not know that S has b & y
• S must not know that R has a & x

Intersection size, Count(R ∩ S) = 2:
• R & S do not learn anything except that the result is 2.
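One well-known way to compute such an intersection count is commutative encryption, where each party encrypts hashed values with its own secret exponent and the doubly encrypted values are compared. The sketch below simulates both parties in one process under that assumption; whether it matches the exact algorithms behind the slide is not confirmed by the source. The modulus is toy-sized and the function names are hypothetical, so this is illustrative only and not secure.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus, far too small for real use


def make_key():
    """Pick a secret exponent coprime to P - 1 so encryption is a bijection."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def hash_to_group(value):
    """Map a string to an element of [1, P - 1]."""
    digest = hashlib.sha256(value.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(element, key):
    # Commutative: encrypt(encrypt(x, a), b) == encrypt(encrypt(x, b), a)
    return pow(element, key, P)


def intersection_size(r_values, s_values):
    key_r, key_s = make_key(), make_key()  # each key stays with its owner
    # Each party encrypts its own hashed values and sends them to the other
    # (in a real exchange the lists would be reordered before sending).
    r_once = [encrypt(hash_to_group(v), key_r) for v in r_values]
    s_once = [encrypt(hash_to_group(v), key_s) for v in s_values]
    # Each party applies its key to the other's ciphertexts.
    r_twice = {encrypt(c, key_s) for c in r_once}
    s_twice = {encrypt(c, key_r) for c in s_once}
    # Equal plaintexts give equal double encryptions; nothing else matches.
    return len(r_twice & s_twice)


count = intersection_size({"a", "u", "v", "x"}, {"b", "u", "v", "y"})
print(count)
```

Neither side ever sees the other's plaintext values or key; only ciphertexts cross the boundary, and the comparison reveals just how many of them coincide.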


Optimal k-Anonymization

[Figure: an original table and its 2-anonymized version (k=2, on name, address, age). Original table columns: Name, Address, City, Age, Income. Names: Erica, Paul, Henry, Mark. Addresses: 130 Harry Road, 210 Almaden Pkwy, 19 Main Street, 4800 17th Street. City: San Jose in every row. Ages: 26, 28, 42, 47. Incomes: $42,000, $50,000, $88,000, $120,000. In the anonymized table, names are suppressed (*), addresses are generalized to the values 95120 and 95131, ages are generalized to the ranges 20-29 and 40-49, and city and income are unchanged.]

• Optimal k-Anonymization (Bayardo, Agrawal, 2005)
  – Algorithm finds optimal k-anonymizations under two representative cost measures and variations of k.

• Advantages of optimal k-anonymization:
  – Truthful: Unlike other disclosure protection techniques that use data scrambling, swapping, or adding noise, all information within a k-anonymized dataset is truthful.
  – Secure: More secure than other de-identification methods, which may inadvertently reveal confidential information.
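The k-anonymity property itself is easy to check in code. The sketch below is not the optimal search of Bayardo and Agrawal; it only applies one fixed generalization (suppress the name, bucket the age by decade) and verifies that every combination of quasi-identifier values occurs at least k times. The pairing of names, ages, and incomes into rows, and the use of a `zip` field in place of a street address, are assumptions made for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("name", "zip", "age")

# Hypothetical rows; the pairing of values is assumed, not taken from the slide.
records = [
    {"name": "Erica", "zip": "95131", "age": 26, "income": 42000},
    {"name": "Henry", "zip": "95131", "age": 28, "income": 50000},
    {"name": "Paul", "zip": "95120", "age": 42, "income": 88000},
    {"name": "Mark", "zip": "95120", "age": 47, "income": 120000},
]


def generalize(record):
    """Suppress the name and replace the exact age with its decade range."""
    decade = record["age"] // 10 * 10
    return {
        "name": "*",
        "zip": record["zip"],
        "age": f"{decade}-{decade + 9}",
        "income": record["income"],  # sensitive value is left truthful
    }


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


anonymized = [generalize(r) for r in records]

print(is_k_anonymous(records, QUASI_IDENTIFIERS, 2))
print(is_k_anonymous(anonymized, QUASI_IDENTIFIERS, 2))
```

Every value in `anonymized` is a coarser but still true statement about the person, which is the "truthful" property the slide contrasts with noise-based methods. An optimal algorithm would additionally search over possible generalizations to minimize information loss.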


Compliance Auditing

[Figure: auditing architecture. In the database layer, each query arrives with its purpose and recipient, and an audit record is generated for it in the Query Audit Log. Updates, inserts, and deletes on the data tables are captured in a backlog through database triggers or replication. At audit time, an audit expression is submitted to the database layer, which returns the IDs of logged queries that accessed the data specified by the audit query.]

Query Audit Log

ID  Query     User        Purpose          Recipient    Timestamp
1   Select …  B. Jones    Marketing        MortgageCo.  2004-02…
2   Select …  S. Roberts  Account service  S. Roberts   2004-02…

• Audits: Determine whether particular data has been accessed in violation of privacy policies or choices.

• Audit expression: The auditor specifies the information disclosures that he or she would like to track.

• Suspicious Queries: The audit system identifies logged queries that accessed the specified data.

• Audit Results: Returns the queries that accessed the specified information and the circumstances of access.

• Advantages:
  – Cell-level disclosure auditing.
  – Low storage overhead; reuses existing database infrastructure.
  – Low performance impact; defers computation until audit time.
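A much-simplified sketch of audit-time evaluation follows. It assumes each log entry stores the columns a query selected and its row predicate, and it flags an entry when the query read the audited column for at least one audited row. The schema, the predicates (the slide elides the actual query text), and the `audit` helper are hypothetical, and the sketch evaluates against current data rather than a backlog of past states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, phone TEXT);
    INSERT INTO patients VALUES
        (1, 'Adam', 25, '(111) 111-1111'),
        (3, 'Bob', 52, '(333) 333-3333'),
        (4, 'Daniel', 40, '(444) 444-4444');
""")

# Logging is cheap: only the query shape and its context are recorded.
query_log = [
    {"id": 1, "user": "B. Jones", "purpose": "Marketing",
     "recipient": "MortgageCo.",
     "columns": {"name", "phone"}, "where": "age > 30"},        # hypothetical query
    {"id": 2, "user": "S. Roberts", "purpose": "Account service",
     "recipient": "S. Roberts",
     "columns": {"name", "age"}, "where": "name = 'Adam'"},     # hypothetical query
]


def audit(conn, log, audited_column, audited_where):
    """Return IDs of logged queries that read audited_column for audited rows."""
    suspicious = []
    for entry in log:
        if audited_column not in entry["columns"]:
            continue  # the query never returned this column
        # Did the query's rows overlap the rows named by the audit expression?
        overlap = conn.execute(
            f"SELECT 1 FROM patients WHERE ({entry['where']}) AND ({audited_where}) LIMIT 1"
        ).fetchone()
        if overlap is not None:
            suspicious.append(entry["id"])
    return suspicious


# Audit expression: who was shown Bob's phone number?
suspicious_ids = audit(conn, query_log, "phone", "name = 'Bob'")
print(suspicious_ids)
```

The expensive overlap test runs only when an auditor asks a question, which mirrors the slide's point about deferring computation until audit time.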


Watermarking Databases

Watermark Insertion (on the database):
1. Choose secret key
2. Specify table/attributes to be marked
3. Pseudo-randomly select a subset of the rows for marking
   – Function of secret key and attribute values

Watermark Detection (on the suspicious database):
1. Specify secret key
2. Specify table/attributes which should contain marks
3. Identify marked rows/attributes, compare marks with expected mark values
   – Requires neither original unmarked data nor the watermark
4. Confirm presence or absence of the watermark

• Deters data theft and asserts ownership of pirated copies through an intentionally introduced pattern in the data.
  – Very unlikely to occur by chance.
  – Hard to find => hard to destroy (robust against malicious attacks).

• Existing watermarking techniques developed for multimedia are not applicable to database tables.
  – Rows in a table are unordered.
  – Rows can be inserted, updated, deleted.
  – Attributes can be added, dropped.

• New algorithm for watermarking database tables.
  – Watermark can be detected using only a subset of the rows and attributes of a table.
  – Robust against updates, incrementally updatable.
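The insertion and detection steps can be sketched as follows. This is a simplified illustration in the spirit of the slide, not the published algorithm: it assumes integer attributes, keys each decision on an HMAC of the row's primary key, and marks roughly one row in `gamma` by setting one low-order bit. The parameter names, the use of HMAC-SHA256, and the synthetic table are assumptions.

```python
import hashlib
import hmac


def keyed_hash(key, primary_key):
    digest = hmac.new(key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def mark_plan(key, primary_key, gamma, n_attrs, n_bits):
    """Decide from the secret key alone whether and how a row is marked."""
    h = keyed_hash(key, primary_key)
    if h % gamma != 0:
        return None  # row is not selected for marking
    attr = (h // gamma) % n_attrs                      # which attribute
    bit = (h // (gamma * n_attrs)) % n_bits            # which low-order bit
    value = (h // (gamma * n_attrs * n_bits)) % 2      # what the bit should be
    return attr, bit, value


def insert_watermark(rows, key, gamma=2, n_bits=2):
    marked = {}
    for pk, values in rows.items():
        values = list(values)
        plan = mark_plan(key, pk, gamma, len(values), n_bits)
        if plan is not None:
            attr, bit, value = plan
            values[attr] = (values[attr] & ~(1 << bit)) | (value << bit)
        marked[pk] = values
    return marked


def detect_watermark(rows, key, gamma=2, n_bits=2):
    """Count how many selected rows carry the expected bit; needs only the key."""
    matches = total = 0
    for pk, values in rows.items():
        plan = mark_plan(key, pk, gamma, len(values), n_bits)
        if plan is not None:
            attr, bit, value = plan
            total += 1
            matches += ((values[attr] >> bit) & 1) == value
    return matches, total


SECRET = b"owner secret key"
table = {pk: [(pk * 37) % 1000, (pk * 91) % 1000] for pk in range(1, 1001)}
marked = insert_watermark(table, SECRET)

print(detect_watermark(marked, SECRET))        # every selected row matches
subset = {pk: v for pk, v in marked.items() if pk % 3 == 0}
print(detect_watermark(subset, SECRET))        # still detectable in a subset
print(detect_watermark(marked, b"wrong key"))  # about half match by chance
```

Detection recomputes the mark positions from the key and the rows at hand, so it needs neither the original table nor a stored copy of the watermark, and a match rate far above one half on a suspicious table is strong evidence of ownership.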


Conclusion

• Technology controls for security and privacy must be used in conjunction with legal policy, organizational requirements, and social awareness programs in order to address the current and future problems in medical informatics systems.

• Controls must be moved to the data level in order to:
  – Reduce the complexity of current systems.
  – Provide a unified protection framework.
  – Allow the resolution of conflicts at the data level.
  – Scale to future technology without infrastructure modification.

• There is a current set of enablers that would avert breaches and integrate seamlessly into current systems.


Thank You


Selected References

• Rakesh Agrawal, Tyrone Grandison, Christopher Johnson, Jerry Kiernan, "Enabling the 21st Century Healthcare Information Technology Revolution," Communications of the ACM, Vol. 50, No. 2, February 2007.

• Tyrone Grandison, Ranjit Ganta, Uri Braun, Jamie Kaufman, "Protecting Privacy while Sharing Medical Data Between Regional Healthcare Entities," to appear in Medinfo 2007 Congress, August 2007, Brisbane, Australia.

• Rakesh Agrawal, Christopher Johnson, "Securing Electronic Health Records without Impeding the Flow of Information," International Journal of Medical Informatics, January 2007, doi:10.1016/j.ijmedinf.2006.09.015.

http://www.almaden.ibm.com/cs/projects/iis/hdb/publications.shtml

Slides available at http://www.almaden.ibm.com/cs/people/tgrandison/AMIA_Spring2007.pdf