Successes of Differential Privacy
Cynthia Dwork, Harvard University
Pre-Modern Cryptography
[Diagram: a cycle of Propose and Break]
Modern Cryptography
[Diagram: a cycle of Propose Definition, Break Definition, Propose STRONGER Definition; Algs: algorithms satisfying definition]
No Algorithm?
[Diagram: Propose Definition, followed by "?"; Why?]
Provably No Algorithm?
[Diagram: Bad Definition; Propose Definition, followed by "?"; Propose WEAKER/DIFF Definition; Alg / ?]
Scientific Launch
1. Methodology
2. Engaging with negative results
Dinur-Nissim
Fundamental Law of Info Recovery
“Overly accurate” estimates of “too many” statistics destroy privacy.
Dinur-Nissim; impossibility of semantic security (Terry Gross)
3. Algorithmic Approach
Privacy-preserving programming from a few primitives
RR, symmetric noise, EM: the ORs and ANDs of DP
The astonishing Blum-Ligett-Roth result
Composition
Analytical insights: sparse vector and PMW; geometric view
4. Complexity
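The primitives named under the algorithmic approach can be sketched in a few lines. This is a minimal illustration, not the talk's own code: it shows randomized response (RR) with a debiasing step, and a counting query with Laplace noise as one common choice of symmetric noise. The function names, the toy data, and the choice of Laplace noise are assumptions made for this example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def debias_fraction(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from randomized reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(bits, epsilon, rng):
    """Counting query (sensitivity 1) plus Laplace noise of scale 1/epsilon."""
    return sum(bits) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
true_bits = [1 if i % 10 < 3 else 0 for i in range(20000)]  # hypothetical data, 30% ones
eps = math.log(3)  # the truth is reported with probability 3/4

reports = [randomized_response(b, eps, rng) for b in true_bits]
rr_estimate = debias_fraction(reports, eps)      # estimate of the true fraction
central_count = noisy_count(true_bits, eps, rng)  # noisy version of the true count
```

Under basic composition, releasing both outputs about the same individuals costs at most the sum of the two epsilons, here 2 ln 3.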
Fruitful Interplay with Other Fields
Learning theory, discrepancy theory, cryptography, geometry, complexity theory, mechanism design, pseudorandomness, communication complexity, machine learning, (robust) statistics, fingerprinting codes, coding theory
Rich Algorithmic Literature
Counts, linear queries, histograms, contingency tables (marginals)
Location and spread (e.g., median, interquartile range)
Dimension reduction (PCA, SVD), clustering
Support Vector Machines
Sparse regression/LASSO, logistic and linear regression
Gradient descent
Boosting, Multiplicative Weights
Combinatorial optimization, mechanism design
Privacy Under Continual Observation, Pan-Privacy
Kalman filtering
Statistical Queries learning model, PAC learning
False Discovery Rate control
…
Outreach
Formative engagement with statistics
Led to earliest public deployment
Social Science Research
Law, Economics, Medicine,…
PLSC, Berkman, Brussels, Simons Foundation, EC, iDASH,…
Omics: Stanford (past); IPAM (upcoming); Society for Epidemiologic Research
Policy
CPUC hearings on Energy Data Center, the ruling, the Southern CA power company
Podesta report, PCAST report
Commission on Evidence-Based Policymaking
Consumer Financial Protection Bureau
…
Deployment
RAPPOR, Google more generally, Apple,…
A couple of startups (Leapyear, Privatar(?))
Census – OnTheMap and upcoming
Help wanted!
DP when Privacy is not a Concern
Markets, Economics, Game Theory
Hartline, McSherry, Talwar; Roth; Pai and Roth; Lykouris, Syrgkanis, and Tardos
Fairness in Algorithmic Classification
Generalizability under adaptive analysis
Fairness Through Awareness
Dwork, Hardt, Pitassi, Reingold, Zemel 2012
Individual Fairness
People who are similar with respect to a specific classification task should be treated similarly
S + math ∼ Sc + finance
“Fairness Through Awareness”
V: individuals
O: classification outcomes
Classifier M: V → O, mapping x to M(x)
[Figure: Lipschitz classifier M from V to O; individuals x and y at tiny distance d]
Individual Fairness
$M : V \to \Delta(O)$
$\|M(x) - M(y)\| \le d(x, y)$
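The Lipschitz condition above can be checked directly on a small example. The sketch below assumes total variation distance as the norm on output distributions, which the slide does not specify; the individuals, the metric d, and the classifier M are made-up toy values.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in p)


def lipschitz_violations(M, d, individuals):
    """Return the pairs (x, y) whose output distributions differ by more than d(x, y)."""
    return [
        (x, y)
        for x, y in combinations(individuals, 2)
        if total_variation(M[x], M[y]) > d(x, y) + 1e-12
    ]


# Hypothetical task-specific metric on three individuals.
distances = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.8}


def d(x, y):
    return distances.get((x, y), distances.get((y, x)))


# Hypothetical randomized classifier: each individual maps to a distribution over outcomes.
M = {
    "a": {"accept": 0.70, "reject": 0.30},
    "b": {"accept": 0.65, "reject": 0.35},
    "c": {"accept": 0.10, "reject": 0.90},
}
violations = lipschitz_violations(M, d, ["a", "b", "c"])  # empty list: M is Lipschitz here
```

Individuals a and b are close under d, so they must receive nearly the same distribution over outcomes; c is far from both and may be treated very differently.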
Lipschitz Mappings
             Differential Privacy             Individual Fairness
Objects      Databases                        Individuals
Outcomes     Output of statistical analysis   Classification outcome
Similarity   General purpose metric           Task-specific metric
Can use dp techniques for fairness
Theorem: Exponential mechanism of [MT07] yields individual fairness
and small loss when the metric has bounded doubling dimension.
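For readers unfamiliar with the exponential mechanism, here is a minimal sketch of its generic form: a candidate is sampled with probability proportional to exp(ε · score / (2 · sensitivity)). It is not the fairness-specific construction behind the theorem; the candidate names, scores, and parameter values are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Probabilities proportional to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {
        r: math.exp(epsilon * (s - top) / (2.0 * sensitivity))
        for r, s in scores.items()
    }
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def exponential_mechanism(scores, epsilon, rng, sensitivity=1.0):
    """Sample one candidate according to the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Hypothetical utility scores, e.g. how many records support each candidate.
scores = {"red": 30, "green": 25, "blue": 5}
probs = exponential_mechanism_probs(scores, epsilon=0.5)
choice = exponential_mechanism(scores, epsilon=0.5, rng=random.Random(1))
```

Higher-scoring candidates are exponentially more likely, but every candidate keeps nonzero probability, which is what bounds the influence of any single record on the output.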
Which is “Right”?
Statistical Validity in Adaptive Data Analysis
Dwork, Feldman, Hardt, Pitassi, Reingold, Roth
$q_i$ depends on $a_1, a_2, \ldots, a_{i-1}$
Differential privacy neutralizes risks incurred by adaptivity
Hard to find a query for which the data set is not representative
[Diagram: the data analyst sends queries q1, q2, q3 to the database curator M, which returns answers a1, a2, a3]
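The interaction between analyst and curator can be sketched as follows. This is an illustrative toy, assuming Laplace noise, 0/1 predicate queries, a replace-one-record notion of neighboring data sets, and basic composition for budget tracking; the `Curator` class and the data are hypothetical.

```python
import random


class Curator:
    """Answers mean queries with Laplace noise and tracks the epsilon spent."""

    def __init__(self, data, rng):
        self.data = data
        self.rng = rng
        self.epsilon_spent = 0.0

    def answer(self, predicate, epsilon):
        # The mean of a 0/1 predicate over n records has sensitivity 1/n
        # when neighboring data sets differ by replacing one record.
        n = len(self.data)
        true_mean = sum(1 for row in self.data if predicate(row)) / n
        scale = 1.0 / (n * epsilon)
        noise = self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)
        self.epsilon_spent += epsilon  # basic composition: epsilons add up
        return true_mean + noise


rng = random.Random(0)
data = [{"x": rng.random(), "y": rng.random()} for _ in range(5000)]
curator = Curator(data, rng)

# The analyst chooses q2 only after seeing a1, so q2 depends on a1.
a1 = curator.answer(lambda r: r["x"] > 0.5, epsilon=0.1)
threshold = a1  # adaptively chosen from the previous answer
a2 = curator.answer(lambda r: r["y"] > threshold, epsilon=0.1)
```

Because the analyst only ever sees noisy answers, the choice of the next query carries limited information about any individual record.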
The Re-Usable Holdout
“Training”
“Holdout”
Learn on the training set
Check against holdout via a differentially private mechanism
Future exploration does not significantly depend on H
H stays fresh
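One way to picture the check against the holdout is a simplified threshold test in the spirit of the Thresholdout algorithm from the reusable-holdout work. The slide does not name a mechanism, so this sketch is an assumption: it omits the overfitting budget of the full algorithm, and the `threshold` and `sigma` values are placeholders.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def thresholdout(train_vals, holdout_vals, rng, threshold=0.04, sigma=0.01):
    """Answer one query given its per-record values on both sets.

    Returns the training mean when it agrees with the holdout mean up to a
    noisy threshold; otherwise returns a noised holdout mean.
    """
    train_mean = sum(train_vals) / len(train_vals)
    holdout_mean = sum(holdout_vals) / len(holdout_vals)
    if abs(train_mean - holdout_mean) > threshold + laplace(sigma, rng):
        return holdout_mean + laplace(sigma, rng)
    return train_mean


rng = random.Random(0)
# A query that generalizes: the same mean (0.75) on both sets.
ok = thresholdout([1, 1, 1, 0] * 250, [1, 1, 1, 0] * 250, rng)
# A query overfit to the training set: perfect there, chance level on the holdout.
overfit = thresholdout([1] * 1000, [1, 0] * 500, rng)
```

When the training and holdout means agree, the analyst learns nothing new about H; only on disagreement is a noised holdout value released, which is how H stays fresh across many queries.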
3 Sides of the Same Coin
Fairness, Privacy, Generalizability
“Keep Up the Good Work” – Moni Naor (by channeling)
Let your research be fruitful and multiply
Build the 𝜖 registry, formally or informally
Build libraries, continue outreach efforts
Confront Implications of the Fundamental Law
Prioritization? Who decides? Which fields have the tools?
Public Understanding
Generalization beyond the sample distribution / transfer learning?
Strong relation to fairness
Thank You