Revealing Information while Preserving Privacy
Kobbi Nissim
NEC Labs, DIMACS
Based on work with: Irit Dinur, Cynthia Dwork and Joe Kilian

The Hospital Story
[Figure: a user sends a query q to the medical database (MedicalDB), which holds patient data, and receives an answer a]

Easy Tempting Solution (A Bad Solution)
Idea:
a. Remove identifying information (name, SSN, …)
b. Publish data
• Observation: 'harmless' attributes uniquely identify many patients (gender, approx. age, approx. weight, ethnicity, marital status, …)
• Worse: a 'rare' attribute (CF, 1/3000)
[Figure: database d with rows for Mr. Smith, Ms. John, Mr. Doe]

Our Model: Statistical Database (SDB)
• Database: $d \in \{0,1\}^n$
• Query: $q \subseteq [n]$
• Answer: $a_q = \sum_{i \in q} d_i$
[Figure: database d with rows for Mr. Smith, Ms. John, Mr. Doe]

The Privacy Game: Information-Privacy Tradeoff
• Private functions:
– want to hide $\pi_i(d_1, \dots, d_n) = d_i$
• Information functions:
– want to reveal $f_q(d_1, \dots, d_n) = \sum_{i \in q} d_i$
• Explicit definition of private functions
• Crypto: secure function evaluation
– want to reveal $f()$
– want to hide all functions $\pi()$ not computable from $f()$
– Implicit definition of private functions

Approaches to SDB Privacy [AW 89]
• Query Restriction
– Require queries to obey some structure
• Perturbation (this talk)
– Give 'noisy' or 'approximate' answers

Perturbation
• Database: $d = d_1, \dots, d_n$
• Query: $q \subseteq [n]$
• Exact answer: $a_q = \sum_{i \in q} d_i$
• Perturbed answer: $\hat{a}_q$
Perturbation $E$: for all $q$: $|\hat{a}_q - a_q| \le E$
General perturbation: $\Pr_q[\,|\hat{a}_q - a_q| \le E\,] = 1 - \mathrm{neg}(n)$ (alternatively: $= 99\%$, $= 51\%$)
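The definitions above can be made concrete with a short sketch. The function names, the list representation of the database, and the choice of uniform integer noise are illustrative assumptions; the slides only require that the answer stays within $E$ of the exact sum.

```python
import random

def exact_answer(d, q):
    """Exact answer a_q: the sum of the database bits indexed by the query q."""
    return sum(d[i] for i in q)

def perturbed_answer(d, q, E, rng):
    """Some answer within E of a_q. Uniform integer noise is an assumption
    made for illustration; any mechanism with |a_hat - a_q| <= E qualifies."""
    return exact_answer(d, q) + rng.randint(-E, E)

rng = random.Random(0)
n, E = 16, 2
d = [rng.randint(0, 1) for _ in range(n)]   # database d in {0,1}^n
q = [0, 3, 5, 9]                            # a query: a subset of [n]
a_q = exact_answer(d, q)
a_hat = perturbed_answer(d, q, E, rng)
print(a_q, a_hat)
```

The general (probabilistic) variant would allow the bound to fail on a small fraction of queries; this sketch covers only the worst-case bound.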

Perturbation Techniques [AW89]
Data perturbation:
– Swapping [Reiss 84] [Liew, Choi, Liew 85]
– Fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]
• Additive perturbation $d'_i = d_i + E_i$
Output perturbation:
– Random sample queries [Denning 80]
• Sample drawn from query set
– Varying perturbations [Beck 80]
• Perturbation variance grows with number of queries
– Rounding [Achugbue, Chin 79], Randomized [Fellegi, Phillips 74], …

Main Question: How much perturbation is needed to achieve privacy?

Privacy from $\sqrt{n}$ Perturbation
(an example of a useless database)
• Database: $d \in_R \{0,1\}^n$
• On query $q$:
1. Let $a_q = \sum_{i \in q} d_i$
2. If $|a_q - |q|/2| > E$, return $\hat{a}_q = a_q$
3. Otherwise return $\hat{a}_q = |q|/2$
• Privacy is preserved
– If $E \approx \sqrt{n}\,(\lg n)^2$, whp always use rule 3
• No information about $d$ is given!
• No usability!
Can we do better?
• Smaller $E$?
• Usability???
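A minimal sketch of this useless database, assuming a bitmask representation (bit $i$ of an integer stands for entry $i$) and taking $E = \sqrt{n}\,(\lg n)^2$ exactly. The helper names are hypothetical. With $n = 2^{20}$ this $E$ is below $n/2$, so the threshold is not trivially unreachable, yet random queries still never trigger rule 2.

```python
import math
import random

def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def useless_answer(d, q, E):
    """The three rules from the slide; d and q are bitmasks over [n]."""
    a_q = popcount(d & q)        # rule 1: exact subset sum
    half = popcount(q) / 2
    if abs(a_q - half) > E:      # rule 2: far from |q|/2, answer exactly
        return a_q
    return half                  # rule 3: answer |q|/2, independent of d

n = 2 ** 20
rng = random.Random(1)
d = rng.getrandbits(n)                    # random database
E = math.sqrt(n) * math.log2(n) ** 2      # 1024 * 400 = 409600
queries = [rng.getrandbits(n) for _ in range(20)]
rule3_used = [useless_answer(d, q, E) == popcount(q) / 2 for q in queries]
print(all(rule3_used))
```

Every answer here is $|q|/2$, which the user could have computed without the database. That is the sense in which privacy is preserved and usability is lost.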

(Not) Defining Privacy
• Elusive definition
– Application dependent
– Partial vs. exact compromise
– Prior knowledge: how to model it?
– Other issues …
• Instead of defining privacy: what is surely non-private…
– Strong breaking of privacy

The Useless Database Achieves Best Possible Perturbation: Perturbation $\ll \sqrt{n}$ Implies No Privacy!
• Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ s.t. $\mathrm{dist}(d, d') < o(n)$.
(Strong breaking of privacy)
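
The Adversary as a Decoding Algorithm
[Figure: the database $d$ ($n$ bits) is encoded as the perturbed answers $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$ to the $2^n$ subsets of $[n]$; the adversary decodes them to $d'$]
(Recall $\hat{a}_q = \sum_{i \in q} d_i + \mathrm{pert}_q$)
Decoding Problem: Given access to $\hat{a}_{q_1}, \dots, \hat{a}_{q_{2^n}}$, reconstruct $d$ in time poly($n$).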

Goldreich-Levin Hardcore Bit (side remark)
[Figure: $d$ ($n$ bits) is encoded as $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$ over the $2^n$ subsets of $[n]$]
Where $\hat{a}_q = \sum_{i \in q} d_i \bmod 2$ on 51% of the subsets.
The GL algorithm finds, in time poly($n$), a small list of candidates containing $d$.

Comparing the Tasks (side remark)
• Encoding:
– Goldreich-Levin: $a_q = \sum_{i \in q} d_i \pmod 2$
– SDB reconstruction: $a_q = \sum_{i \in q} d_i$
• Noise:
– Goldreich-Levin: corrupt $\tfrac12 - \varepsilon$ of the queries
– SDB reconstruction: additive perturbation; an $\varepsilon$ fraction of the queries deviate from perturbation
• Queries:
– Goldreich-Levin: dependent
– SDB reconstruction: random
• Decoding:
– Goldreich-Levin: list decoding
– SDB reconstruction: $d'$ s.t. $\mathrm{dist}(d, d') < \varepsilon n$ (list decoding impossible)

Recall Our Goal: Perturbation $\ll \sqrt{n}$ Implies No Privacy!
• Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ s.t. $\mathrm{dist}(d, d') < o(n)$.

Proof of Main Theorem: The Adversary Reconstruction Algorithm
• Query phase: Get $\hat{a}_{q_j}$ for $t$ random subsets $q_1, \dots, q_t$ of $[n]$
• Weeding phase: Solve the Linear Program:
$0 \le x_i \le 1$
$\bigl|\sum_{i \in q_j} x_i - \hat{a}_{q_j}\bigr| \le E$
• Rounding: Let $c_i = \mathrm{round}(x_i)$, output $c$
Observation: An LP solution always exists, e.g. $x = d$.

Proof of Main Theorem: Correctness of the Algorithm
Consider $x = (0.5, \dots, 0.5)$ as a solution for the LP.
[Figure: $d$ compared with $x$]
Observation: A random $q$ often shows a $\sqrt{n}$ advantage either to 0's or to 1's.
– Such a $q$ disqualifies $x$ as a solution for the LP
– We prove that if $\mathrm{dist}(x, d) > \varepsilon n$, then whp there will be a $q$ among $q_1, \dots, q_t$ that disqualifies $x$
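One way to see why such a $q$ disqualifies $x = (0.5, \dots, 0.5)$ is the following sketch. It is an informal reading of the observation, not the proof from the talk; the constant-probability step is asserted rather than derived.

```latex
% Any LP-feasible x satisfies, for every asked query q (triangle inequality,
% using the LP constraint and the perturbation bound on the true database d):
\Bigl|\sum_{i \in q} (x_i - d_i)\Bigr|
  \le \Bigl|\sum_{i \in q} x_i - \hat{a}_q\Bigr|
    + \Bigl|\hat{a}_q - \sum_{i \in q} d_i\Bigr|
  \le 2E .

% For x = (0.5, ..., 0.5) each x_i - d_i equals +1/2 or -1/2, and a uniformly
% random q contains each i independently with probability 1/2, so
\operatorname{Var}_q\Bigl[\sum_{i \in q} (x_i - d_i)\Bigr]
  = n \cdot \tfrac{1}{4} \cdot \tfrac{1}{4}
  = \frac{n}{16},
\qquad\text{i.e. standard deviation } \frac{\sqrt{n}}{4}.
```

The sum therefore fluctuates on the scale of $\sqrt{n}$. When $E \ll \sqrt{n}$, a random $q$ pushes it outside $[-2E, 2E]$ with constant probability, which violates the first inequality and removes $x$ from the feasible set.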

Extensions of the Main Theorem
• 'Imperfect' perturbation:
– Can approximate the original bit string even if the database answer is within perturbation for only 99% of the queries
• Other information functions:
– Given access to a "noisy majority" of subsets, we can approximate the original bit string

Notes on Impossibility Results
• Exponential Adversary:
– Strong breaking of privacy if $E \ll n$
• Polynomial Adversary:
– Non-adaptive queries
– Oblivious of perturbation method and database distribution
– Tight threshold $E \approx \sqrt{n}$
• What if adversary is more restricted?
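The exponential adversary can be sketched by brute force for tiny $n$. This is an illustration under assumptions, not the construction from the talk: the bitmask representation, the helper names, and the uniform noise are all hypothetical choices. The adversary asks all $2^n$ subset queries and outputs any candidate consistent with every answer to within $E$. Such a candidate differs from $d$ in at most $4E$ positions: the positions where the candidate has 1 and $d$ has 0 form a query on which both sums are within $E$ of the same answer, so there are at most $2E$ of them, and the same holds in the other direction.

```python
import random

def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def exhaustive_reconstruct(answers, n, E):
    """Return the first candidate c in {0,1}^n (as a bitmask) whose subset sums
    agree with every perturbed answer to within E. answers[q] is the answer
    to the query whose bitmask is q."""
    for c in range(2 ** n):
        if all(abs(popcount(c & q) - answers[q]) <= E for q in range(2 ** n)):
            return c
    return None  # cannot happen when all answers are within E of the truth

n, E = 8, 1
rng = random.Random(0)
d = rng.getrandbits(n)                       # hidden database
# Perturbed answers to all 2^n queries (uniform noise is an assumption).
answers = [popcount(d & q) + rng.randint(-E, E) for q in range(2 ** n)]
c = exhaustive_reconstruct(answers, n, E)
distance = popcount(c ^ d)                   # Hamming distance dist(c, d)
print(distance)
```

The running time is exponential in $n$ in both the number of queries and the number of candidates, which is why the polynomial adversary needs the LP-based algorithm instead.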

Bounded Adversary Model
• Database: $d \in_R \{0,1\}^n$
• Theorem: If the number of queries is bounded by $T$, then there is a DB response algorithm with perturbation of $\sim\sqrt{T}$ that maintains privacy (with a reasonable definition of privacy).

Summary and Open Questions
• Very high perturbation is needed for privacy
– Threshold phenomenon: above $\sqrt{n}$, total privacy; below $\sqrt{n}$, none (poly-time adversary)
– Rules out many currently proposed solutions for SDB privacy
– Q: what's on the threshold? Usability?
• Main tool: a reconstruction algorithm
– Reconstructing an $n$-bit string from perturbed partial sums/thresholds
• Privacy for a $T$-bounded adversary with a random database
– $\sqrt{T}$ perturbation
– Q: other database distributions
• Q: Crypto and SDB privacy?
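
Our Privacy Definition (bounded adversary model)
• Database: $d \in_R \{0,1\}^n$
[Figure: the adversary interacts with the database $d$ and outputs (transcript, $i$); then, given $d^{-i}$ and $i$, it tries to guess $d_i$]
• The adversary fails w.p. $> \tfrac12 - \varepsilon$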