Prepared for: New Directions in the Science of Differential Privacy March 2013 A Lifecycle Approach to Information Privacy Micah Altman <[email protected]> Director of Research, MIT Libraries Non-Resident Senior Fellow, Brookings Institution A Lifecycle Approach to Information Privacy

A Lifecycle Approach to Information Privacy


Presented at the Simons Foundation, March 2013


Page 1: A Lifecycle Approach to Information Privacy

A Lifecycle Approach to Information Privacy

Prepared for:

New Directions in the Science of Differential Privacy

March 2013

Micah Altman <[email protected]>

Director of Research, MIT Libraries

Non-Resident Senior Fellow, Brookings Institution

Page 2: A Lifecycle Approach to Information Privacy

Collaborators*

• Privacy Tools for Sharing Research Data Project: Edo Airoldi, Stephen Chong, Merce Crosas, Cynthia Dwork, Gary King, Phil Malone, Latanya Sweeney, Salil Vadhan
• Research Support: Thanks to the National Science Foundation (award 1237235), the Sloan Foundation, the Massachusetts Institute of Technology, and Harvard University.

* And co-conspirators

Page 3: A Lifecycle Approach to Information Privacy

Related Work

Reprints available from: micahaltman.com

• Comments on ANPRM: Human Subjects Protection, http://dataprivacylab.org/projects/irb/Vadhan.pdf
• Privacy tools project proposal: http://privacytools.seas.harvard.edu/full-project-description

Page 4: A Lifecycle Approach to Information Privacy

Overarching challenges
• Law is evolving
– specification of technical requirements
– new legal concepts, e.g. “Right to be forgotten”
• Research is changing
– evidence base is shifting: reliant on big data, transactional data, new forms of data
– conduct of research is distributed, collaborative, multi-institutional, multi-national
– infrastructure is changing: cloud & distributed third-party computation & storage
• Privacy analysis is advancing
– new computational privacy solution concepts
– new findings from reidentification experiments
– new methods for estimating utility/privacy tradeoffs

Page 5: A Lifecycle Approach to Information Privacy

Shifting social science evidence base

How to deidentify without destroying utility?
• The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [see Narayanan and Shmatikov 2008]
• The “GIS Problem”: fine geo-spatial-temporal data are very difficult to mask when correlated with external data [see Zimmerman & Pavlik 2008; Zan et al. 2013; Srivatsa & Hicks 2012]
• The “Facebook Problem”: masked network data can be identified if only a few nodes are controlled [see Backstrom et al. 2007]
• The “Blog Problem”: pseudonymous communication can be linked through textual analysis [see Novak, Raghavan, and Tomkins 2004]

Source: [Calberese 2008; Real Time Rome Project 2007]
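
To make the “Netflix Problem” concrete, here is a minimal sketch of probabilistic linkage on sparse records. It is a deliberately simplified illustration, not the algorithm from the cited paper: the toy data, the `similarity` score, and the `margin` threshold are all assumptions made for this example.

```python
from typing import Dict, Optional

# Hypothetical toy data: each record maps an item to a rating.
Record = Dict[str, int]


def similarity(aux: Record, candidate: Record) -> float:
    """Fraction of the attacker's (item, rating) pairs found in the candidate."""
    if not aux:
        return 0.0
    hits = sum(1 for item, rating in aux.items() if candidate.get(item) == rating)
    return hits / len(aux)


def link(aux: Record, released: Dict[str, Record], margin: float = 0.3) -> Optional[str]:
    """Return the best-matching pseudonym, but only if it clearly beats the runner-up."""
    scored = sorted(
        ((similarity(aux, rec), pid) for pid, rec in released.items()),
        reverse=True,
    )
    if not scored:
        return None
    best_score, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return best_id if best_score - runner_up >= margin else None


# "Deidentified" release: names replaced by pseudonyms, ratings kept.
released = {
    "u1": {"film_a": 5, "film_b": 1, "film_c": 4},
    "u2": {"film_a": 5, "film_d": 2},
    "u3": {"film_e": 3, "film_f": 4},
}

# What the attacker learned about one person from an overlapping public source.
aux = {"film_a": 5, "film_c": 4}
match = link(aux, released)
```

Because the records are sparse, two known ratings are enough to single out one pseudonym here, while a single popular rating (`{"film_a": 5}`) is ambiguous and returns no match.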

Page 6: A Lifecycle Approach to Information Privacy

CUSP aims for the Leading Edge

• Urban Informatics: high-velocity localized social science
• Leading edge data: sensors, crowd-sourcing
• Leading edge privacy needs: privacy policy, privacy-aware information management, privacy ethics

Page 7: A Lifecycle Approach to Information Privacy

Data Input/Output Approach

Published Outputs

* Jones * * 1961 021*

* Jones * * 1961 021*

* Jones * * 1972 9404*

* Jones * * 1972 9404*

* Jones * * 1972 9404*

Modal Practice: “The correlation between X and Y was large and statistically significant”

Summary statistics

Contingency table

Public use sample microdata

Information Visualization

Page 8: A Lifecycle Approach to Information Privacy

Questions Generated from Data I/O Model

Solution Concepts
• Comparison of risks across concepts
• Extension of solution concepts range

Processing Stage
• How to apply DP to new analytic methods?
– Bayesian methods
– Data mining methods
– Text analysis methods
• How to apply DP to different types of “Microdata”?
– Network data
– Text
– Geospatial traces
– Relations

Disclosure | Deterministic | Probabilistic
Individual Record Linkage | K-anonymity | Reidentification probability
Group attributes | K-anonymity + heterogeneity (e.g. l-diversity) | Threat analysis; SDC on skewed magnitude tables
Individual Attributes | Attribute disclosure | Differential privacy; Distributional privacy; Bayesian-optimal privacy

specified columns/rows

Private Multiparty Computation

Questions about transformation
– Imputation methods
– Computation efficiency
– Informational utility*

See for example: Dwork & Smith 2009

* “My, what a large ε you have, grandma!”
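
As a rough illustration of two of the deterministic solution concepts named on this slide, the sketch below computes the k of k-anonymity and a simple "distinct values" form of l-diversity for a small table. The rows echo the masked “Jones” records shown earlier; the column names and the `diagnosis` attribute are hypothetical additions for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def equivalence_classes(rows: List[Row], quasi_ids: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group rows that share the same values on all quasi-identifiers."""
    groups: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups


def k_anonymity(rows: List[Row], quasi_ids: Sequence[str]) -> int:
    """Size of the smallest equivalence class (0 for an empty table)."""
    groups = equivalence_classes(rows, quasi_ids)
    return min(len(g) for g in groups.values()) if groups else 0


def l_diversity(rows: List[Row], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values within any equivalence class."""
    groups = equivalence_classes(rows, quasi_ids)
    return min(len({r[sensitive] for r in g}) for g in groups.values()) if groups else 0


# Hypothetical generalized microdata; "diagnosis" is an invented sensitive column.
rows = [
    {"surname": "Jones", "birth_year": "1961", "zip": "021*", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": "1961", "zip": "021*", "diagnosis": "asthma"},
    {"surname": "Jones", "birth_year": "1972", "zip": "9404*", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": "1972", "zip": "9404*", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": "1972", "zip": "9404*", "diagnosis": "flu"},
]
quasi_ids = ["surname", "birth_year", "zip"]

k = k_anonymity(rows, quasi_ids)               # smallest class has 2 rows
l = l_diversity(rows, quasi_ids, "diagnosis")  # one class has a single diagnosis
```

The table is 2-anonymous, yet the 1972 class is homogeneous: anyone known to be in it is revealed to have the same diagnosis. That gap is what the "K-anonymity + heterogeneity" cell addresses.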

Page 9: A Lifecycle Approach to Information Privacy


Information Life Cycle Model

Creation/Collection

Storage/Ingest

Processing

Internal Sharing

Analysis

External dissemination/publication

Re-use

Long-term access

Page 10: A Lifecycle Approach to Information Privacy

Legal/Policy Frameworks

Contract
Intellectual Property
Access Rights
Confidentiality
Copyright
Fair Use
DMCA
Database Rights
Moral Rights
Intellectual Attribution
Trade Secret
Patent
Trademark
Common Rule (45 CFR 46)
HIPAA
FERPA
EU Privacy Directive
Privacy Torts (Invasion, Defamation)
Rights of Publicity
Sensitive but Unclassified
Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …)
Classified
FOIA
CIPSEA
State Privacy Laws
EAR
State FOI Laws
Journal Replication Requirements
Funder Open Access
Contract
License
Click-Wrap
TOU
ITAR
Export Restrictions

Page 11: A Lifecycle Approach to Information Privacy

Questions Generated by Lifecycle Model

• Which laws apply to each stage?
– Are legal requirements consistent across stages?
• How to align legal instruments:
– consent forms, SLAs, DUAs
• Optimizing privacy risk/utility/cost across the research stages: when is it more efficient to…
– apply disclosure limitation at the data collection stage?
– use particular solution concepts at particular stages?
– harmonize concepts/treatments across stages?
• Policy design
– Policies to internalize future / public stakeholder needs
– Policy equilibrium under different privacy solution concepts
• Information reuse
– Bayesian priors
– Scientific verification and replication
• Infrastructure needs
– Data acquisition, storage, dissemination
– Identification, authorization, authentication
– Metadata, protocols

[Figure: information lifecycle stages (Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access) shown with four framework labels: Research methods; Data Management Systems; Legal / Policy Frameworks; Statistical / Computational Frameworks]
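
One way to picture "disclosure limitation at the data collection stage" is randomized response, a classic survey technique in which each respondent adds noise before the answer is ever recorded. The sketch below is a minimal illustration using a coin-flip variant; the function names, the `p_truth` parameter, and the simulated population are assumptions for this example.

```python
import random
from typing import Sequence


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """With probability p_truth report the truth; otherwise report a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_proportion(reports: Sequence[bool], p_truth: float) -> float:
    """Invert E[report] = p_truth * pi + (1 - p_truth) / 2 to estimate the true rate pi."""
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth) / 2.0) / p_truth


rng = random.Random(7)
# Hypothetical population: exactly 30% would truthfully answer "yes".
truths = [i < 6000 for i in range(20000)]
reports = [randomized_response(t, 0.5, rng) for t in truths]
estimate = estimate_proportion(reports, 0.5)  # close to 0.3, though no single report is reliable
```

Under this variant a respondent's report matches the truth with probability (1 + p_truth) / 2, so the collector never holds exact individual answers, while the aggregate rate remains estimable at the cost of extra variance.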

Page 12: A Lifecycle Approach to Information Privacy

Questions on Differential Privacy from Information Lifecycle Analysis: Legal

• Legal requirements: when does law …
– require exact answers? (DP does not give exact answers.)
– give safe harbor if linkages are ‘only’ probabilistic? (DP provides safe harbor in this case.)
– require action based on “actual knowledge”? (How do we include strongly informative priors in DP? When is DP not actually “worst case”?)
– require analysis of a specific unit of observation? (DP does not give answers for individual units.)
– require balance of privacy and utility? (DP does not inherently balance, but uses minimax: it maximizes utility subject to a given privacy constraint. What is an appropriate choice of privacy constraint?)
• Legal instruments: how to describe DP protections in a legally coherent way for …
– service level agreements
– consent/deposit terms
– data usage agreements
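
The remark that DP "does not give exact answers" can be seen in a minimal sketch of the Laplace mechanism applied to a counting query. This is an illustration under stated assumptions, not a hardened implementation: the function names and toy data are invented here, and the sampler uses ordinary floating-point arithmetic, which a production system would need to treat more carefully.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives each draw mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: Iterable[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Noisy count; adding or removing one person changes a count by at most 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2013)
ages = [23, 35, 41, 52, 67, 70]  # hypothetical confidential data
release = private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
```

The released value is the true count (2 here) plus noise whose scale is 1/ε. A smaller ε means stronger privacy and noisier answers, which is why the choice of privacy constraint becomes a legal and policy question as well as a technical one.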

Page 13: A Lifecycle Approach to Information Privacy

Questions on Differential Privacy from Information Lifecycle Analysis: System Design

• System design: potential increased implementation cost of DP
– Information security: hardening
– Information security: certification & auditing
– Model server development, provisioning, maintenance, reliability, availability
• System design: information security tradeoffs of DP; interactive systems are more vulnerable
– Availability risks: denial of service attack
– Availability/integrity risks: privacy budget exhaustion attacks
– Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
– Secrecy/privacy: breach of authentication/authorization layer
• System design: optimizing privacy & utility across lifecycle
– When does limiting disclosive data collection (e.g. using randomized response, group aggregated methods) dominate applying DP at the data analysis stage?
– When do restricted virtual data enclaves + public synthetic data dominate public DP queries (of the same type)?
• System design: information reuse
– How do you incorporate informative priors in the DP privacy solution concept? (When does the “Terry Gross” problem apply?)
– What’s required for ensuring scientific replication/verification of results produced by differentially private model servers?
– How to do a DP query on confidential data linked with externally provided microdata?
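
The "privacy budget exhaustion" risk listed above follows from how an interactive DP server has to account for queries: under basic sequential composition the ε values of answered queries add up, so the server must refuse once a cap is reached. The sketch below is a hypothetical accountant written for illustration; the class and method names are not from any particular system.

```python
class BudgetExhausted(Exception):
    """Raised when a query would push cumulative privacy loss past the cap."""


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return max(0.0, self.total - self.spent)

    def charge(self, epsilon: float) -> None:
        """Record the cost of one query, or refuse it if the cap would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:  # small tolerance for float rounding
            raise BudgetExhausted("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)

# A single client issues four queries at epsilon = 0.25 each and drains the budget.
for _ in range(4):
    budget.charge(0.25)

try:
    budget.charge(0.25)  # the next query, from any user, must be refused
    refused = False
except BudgetExhausted:
    refused = True
```

With one shared budget, a client who floods the server with queries denies service to everyone else, which is why this is listed as an availability risk. Any mitigation has to preserve the overall privacy guarantee, so simply resetting the budget is not an option.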

Page 14: A Lifecycle Approach to Information Privacy

Questions on Differential Privacy from Information Lifecycle Analysis: Policy Design

• Policy design: “market failures” for privacy goods
– Is there a market failure? How do we know?
– What is the nature of the market failure?
  • Conditions on market structure/market power: barriers to entry? Natural monopoly/network effect? First-mover advantage, path dependency?
  • Conditions on goods: excludability, rivalry, externality
  • Conditions on exchange: transaction costs, agency problems, bounded rationality, or informational asymmetry
• Policy design: policy equilibria
– When does enforcing a specific privacy concept yield a socially optimal solution?
– When is DP a prisoner’s dilemma? (E.g. I contribute to a database for a small payment, since my unilateral entry does not affect the result, but the equilibrium is that the database is large and you learn substantially more about me than if the database were small.)

Page 15: A Lifecycle Approach to Information Privacy

Urban Instrumentation and Confidentiality

Specific data source
• Administrative records
• Transactions
• Traffic
• Health
• Mobile phones
• Microenvironment
• Crowdsource

Possible nosy questions…
• Were you fined?
• What did you buy?
• Where were you?
• Are you sick?
• How rich are you?
• Do you have a meth lab?

Categories
• Infrastructure
• Environment
• People
• Community: self-identified neighborhood, school district, voting precinct, election district, police beat, crime locations, grocery prices, produce availability

Privacy implications
• Business confidentiality
• Security & safety: infrastructure chokepoints; police coverage; endangered species; animal testing labs; environmental hazards
• Personal privacy

Page 16: A Lifecycle Approach to Information Privacy

Interdisciplinary Research Required

[Figure labels: Law; Social Science; Public Policy; Data Collection Methods (Research Methodology); Data Management (Information Science); Statistics; Computer Science]
• Privacy-aware data-management systems
• Methods for confidential data collection and management

[Figure labels: Law; Social Science; Public Policy; Research Methodology; Information Science; Statistics; Computer Science]
• Creative-Commons-like modular license plugins for privacy data use; consent; terms of service
• Model legislation for modern privacy concepts
• Privacy requirements taxonomy and classification
• Game theoretic/social-choice models of social privacy equilibria under different privacy policies

Page 17: A Lifecycle Approach to Information Privacy

References

• Backstrom, Lars, Cynthia Dwork, and Jon Kleinberg. "Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography." Proceedings of the 16th international conference on World Wide Web. ACM, 2007.
• Dwork, C., and A. Smith. "Differential Privacy for Statistics: What we Know and What we Want to Learn." Journal of Privacy and Confidentiality 1(2), 135–154, 2009.
• Narayanan, Arvind, and Vitaly Shmatikov. "Robust de-anonymization of large sparse datasets." Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.
• Novak, Jasmine, Prabhakar Raghavan, and Andrew Tomkins. "Anti-aliasing on the web." Proceedings of the 13th international conference on World Wide Web. ACM, 2004.
• Srivatsa, M., and M. Hicks. 2012. Deanonymizing mobility traces: using social network as a side-channel. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS '12). ACM, New York, NY, USA, 628-637. DOI=10.1145/2382196.2382262 http://doi.acm.org/10.1145/2382196.2382262
• Zan, Bin, Zhanbo Sun, Marco Gruteser, and Xuegang Ban. 2013. Linking anonymous location traces through driving characteristics. In Proceedings of the third ACM conference on Data and application security and privacy (CODASPY '13). ACM, New York, NY, USA, 293-300. DOI=10.1145/2435349.2435391 http://doi.acm.org/10.1145/2435349.2435391
• Zimmerman, D. L., and C. Pavlik. 2008. Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data. Geographical Analysis 40.1, 52 (25).

Page 18: A Lifecycle Approach to Information Privacy

Discussion

Personal Web: micahaltman.com

Privacy Tools for Sharing Research Data: privacytools.seas.harvard.edu/

E-mail: [email protected]

Twitter: @drmaltman