Successes of Differential Privacy · 2020-01-03

Successes of Differential Privacy

Cynthia Dwork, Harvard University

Pre-Modern Cryptography

Propose

Break

Modern Cryptography

[Diagram: Propose Definition → Break Definition → Propose STRONGER Definition; algorithms (Algs) satisfying the definition → Propose STRONGER algorithms]

No Algorithm?

[Diagram: Propose Definition → ? Why? Provably No Algorithm? → Bad Definition → Propose WEAKER/DIFF Definition → Alg / ?]

Scientific Launch

1. Methodology

2. Engaging with negative results

Dinur-Nissim; impossibility of semantic security (Terry Gross)

Fundamental Law of Info Recovery: “Overly accurate” estimates of “too many” statistics destroy privacy.

3. Algorithmic Approach

Privacy-preserving programming from a few primitives

RR, symmetric noise, EM: the ORs and ANDs of DP

The astonishing Blum-Ligett-Roth result

Composition

Analytical insights: sparse vector and PMW; geometric view

4. Complexity
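
Of the primitives named under the algorithmic approach, randomized response (RR) is the simplest to write down. The following is a minimal sketch, not code from the talk: each person reports their true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the analyst debiases the aggregate. The function names and the choice of ε are illustrative assumptions.

```python
import math
import random


def randomized_response(truth: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), else its flip."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else (not truth)


def estimate_fraction(reports, epsilon: float) -> float:
    """Debias the observed fraction of True reports.

    If f is the true fraction, the expected observed fraction is
    f * p + (1 - f) * (1 - p), which is inverted here.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    true_bits = [i < 3000 for i in range(10000)]  # 30% of people hold the sensitive bit
    reports = [randomized_response(b, 1.0, rng) for b in true_bits]
    print("estimated fraction:", estimate_fraction(reports, 1.0))
```

Any single report is plausibly deniable, since the probabilities of either answer differ by a factor of at most e^ε, yet the population estimate concentrates around the true fraction as the number of reports grows.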

Fruitful Interplay with Other Fields

Learning theory, discrepancy theory, cryptography, geometry, complexity theory, mechanism design, pseudorandomness, communication complexity, machine learning, (robust) statistics, fingerprinting codes, coding theory

Rich Algorithmic Literature

Counts, linear queries, histograms, contingency tables (marginals)

Location and spread (e.g., median, interquartile range)

Dimension reduction (PCA, SVD), clustering

Support Vector Machines

Sparse regression/LASSO, logistic and linear regression

Gradient descent

Boosting, Multiplicative Weights

Combinatorial optimization, mechanism design

Privacy Under Continual Observation, Pan-Privacy

Kalman filtering

Statistical Queries learning model, PAC learning

False Discovery Rate control

Pan-Privacy, privacy under continual observation …
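
For the first entry in the list above (counts and histograms), a hedged sketch of the standard symmetric-noise approach: add Laplace noise with scale 1/ε to each bin count. The assumptions here are mine, not the slides': neighboring databases differ by adding or removing one record (so the histogram has L1 sensitivity 1), the bin set is fixed in advance and does not depend on the data, and floating-point subtleties of noise sampling are ignored.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(records, bins, epsilon: float, rng: random.Random) -> dict:
    """Return a noisy count for every bin in the fixed, data-independent bin list.

    Assumes add/remove-one-record neighbors, so one record changes exactly
    one bin count by 1 and Laplace noise of scale 1/epsilon suffices.
    """
    counts = Counter(records)
    scale = 1.0 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only to make the demo repeatable
    data = ["A"] * 120 + ["B"] * 45 + ["O"] * 80
    print(private_histogram(data, bins=["A", "B", "AB", "O"], epsilon=0.5, rng=rng))
```

Empty bins such as "AB" also receive noise; releasing only the non-empty bins would leak which values occur in the data.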

Outreach

Formative engagement with statistics

Led to earliest public deployment

Social Science Research

Law, Economics, Medicine,…

PLSC, Berkman, Brussels, Simons Foundation, EC, iDASH,…

Omics: Stanford (past); IPAM (upcoming); Society for Epidemiologic Research

Policy


CPUC hearings on Energy Data Center, the ruling, the Southern CA power company

Podesta report, PCAST report

Commission on Evidence-Based Policymaking

Consumer Financial Protection Bureau

…

Deployment

RAPPOR, Google more generally, Apple, …

A couple of startups (LeapYear, Privitar(?))

Census – OnTheMap and upcoming

Help wanted!



DP when Privacy is not a Concern

Markets, Economics, Game Theory

Hartline, McSherry, Talwar; Roth; Pai and Roth; Lykouris, Syrgkanis, and Tardos

Fairness in Algorithmic Classification

Generalizability under adaptive analysis

Fairness Through Awareness

Dwork, Hardt, Pitassi, Reingold, Zemel 2012

Individual Fairness

People who are similar with respect to a specific classification task should be treated similarly

S + math ∼ Sc + finance

“Fairness Through Awareness”

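V: individuals

O: classification outcomes

Classifier $M : V \to O$, mapping an individual $x$ to $M(x)$

[Figure: classifier $M$ from V to O, labeled Lipschitz, with two individuals $x$ and $y$ at tiny distance $d$]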

Individual Fairness

$M : V \to \Delta(O)$

$\| M(x) - M(y) \| \le d(x, y)$
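
The Lipschitz condition above can be checked directly on a finite set of individuals. The sketch below is illustrative only: it assumes total variation distance as the norm on output distributions (the slide does not say which distance is meant), represents each distribution as a dict from outcome to probability, and uses hypothetical names throughout.

```python
def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two distributions given as outcome -> probability."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


def lipschitz_violations(M, d, individuals, tol: float = 1e-12):
    """Return every pair (x, y) with distance(M(x), M(y)) > d(x, y).

    M maps an individual to a distribution over outcomes; d is the
    task-specific similarity metric on individuals.
    """
    violations = []
    for i, x in enumerate(individuals):
        for y in individuals[i + 1:]:
            if total_variation(M(x), M(y)) > d(x, y) + tol:
                violations.append((x, y))
    return violations


if __name__ == "__main__":
    people = [0.1, 0.49, 0.5, 0.9]  # hypothetical one-dimensional "scores"
    metric = lambda x, y: abs(x - y)

    # Randomized classifier: accept with probability equal to the score.
    smooth = lambda x: {"accept": x, "reject": 1.0 - x}
    # Deterministic threshold classifier.
    hard = lambda x: {"accept": 1.0} if x >= 0.5 else {"reject": 1.0}

    print("smooth violations:", lipschitz_violations(smooth, metric, people))
    print("hard violations:", lipschitz_violations(hard, metric, people))
```

The threshold classifier treats the individuals at 0.49 and 0.5 completely differently even though they are at distance 0.01, which is the kind of behavior the Lipschitz condition rules out.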

Lipschitz Mappings

                Differential Privacy              Individual Fairness

Objects         Databases                         Individuals

Outcomes        Output of statistical analysis    Classification outcome

Similarity      General purpose metric            Task-specific metric

Can use DP techniques for fairness

Theorem: Exponential mechanism of [MT07] yields individual fairness

and small loss when the metric has bounded doubling dimension.
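
For readers unfamiliar with the exponential mechanism cited in the theorem, here is a minimal sketch of its usual privacy form: sample an outcome with probability proportional to exp(ε · utility / (2 · sensitivity)). This is the generic mechanism, not the fairness construction from the theorem, and the utility table, parameter names, and values are assumptions made for illustration.

```python
import math
import random


def exponential_mechanism_probs(utilities: dict, epsilon: float, sensitivity: float) -> dict:
    """Return the sampling distribution: P(o) proportional to exp(eps * u(o) / (2 * sensitivity))."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {
        o: math.exp(epsilon * (u - top) / (2.0 * sensitivity))
        for o, u in utilities.items()
    }
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


def exponential_mechanism(utilities: dict, epsilon: float, sensitivity: float,
                          rng: random.Random):
    """Draw one outcome from the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(2)  # fixed seed only to make the demo repeatable
    scores = {"a": 10.0, "b": 8.0, "c": 1.0}  # hypothetical utility of each outcome
    print(exponential_mechanism_probs(scores, epsilon=1.0, sensitivity=1.0))
    print(exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0, rng=rng))
```

Higher-utility outcomes are exponentially more likely, but every outcome keeps nonzero probability, so changing the utilities slightly changes the output distribution only slightly.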

Which is “Right”?

Statistical Validity in Adaptive Data Analysis

Dwork, Feldman, Hardt, Pitassi, Reingold, Roth

$q_i$ depends on $a_1, a_2, \ldots, a_{i-1}$

Differential privacy neutralizes risks incurred by adaptivity

Hard to find a query for which the data set is not representative

[Figure: the data analyst sends queries $q_1, q_2, q_3$ to the mechanism $M$ (database curator), which returns answers $a_1, a_2, a_3$]

The Re-Usable Holdout

“Training”

“Holdout”

Learn on the training set

Check against holdout via a differentially private mechanism

Future exploration does not significantly depend on H

H stays fresh
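
The holdout check described above can be sketched as a thresholded, noisy comparison: answer from the training set unless it disagrees noticeably with the holdout, and only then release a noised holdout value. This is a simplified illustration in the spirit of that idea, not the calibrated algorithm from the paper; the noise scales, the threshold, and the absence of a query budget are all assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def reusable_holdout_answer(train_vals, holdout_vals, threshold: float,
                            sigma: float, rng: random.Random) -> float:
    """Answer one query given its per-example values on the training and holdout sets.

    If the two empirical means agree up to a noisy threshold, return the
    training mean, which reveals nothing new about the holdout H. Otherwise
    return the holdout mean plus Laplace noise.
    """
    train_mean = sum(train_vals) / len(train_vals)
    holdout_mean = sum(holdout_vals) / len(holdout_vals)
    if abs(train_mean - holdout_mean) > threshold + laplace_noise(sigma, rng):
        return holdout_mean + laplace_noise(sigma, rng)
    return train_mean


if __name__ == "__main__":
    rng = random.Random(3)  # fixed seed only to make the demo repeatable
    # Hypothetical query: accuracy of a classifier, one 0/1 value per example.
    train_correct = [1] * 95 + [0] * 5      # looks great on the training set
    holdout_correct = [1] * 60 + [0] * 40   # overfit: much worse on the holdout
    print(reusable_holdout_answer(train_correct, holdout_correct,
                                  threshold=0.05, sigma=0.01, rng=rng))
```

Because the analyst mostly sees training-set answers, and sees the holdout only through noise when overfitting is detected, later queries depend only weakly on H.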

3 Sides of the Same Coin

Fairness, Privacy, Generalizability

“Keep Up the Good Work” – Moni Naor (by channeling)

Let your research be fruitful and multiply

Build the 𝜖 registry, formally or informally

Build libraries, continue outreach efforts

Confront Implications of the Fundamental Law

Prioritization? Who decides? Which fields have the tools?

Public Understanding

Generalization beyond the sample distribution / transfer learning?

Strong relation to fairness

Thank You
