Revealing Information while Preserving Privacy

Kobbi Nissim, NEC Labs, DIMACS

Based on work with: Irit Dinur, Cynthia Dwork and Joe Kilian


Page 1: Revealing Information while Preserving Privacy

Kobbi Nissim

NEC Labs, DIMACS

Based on work with: Irit Dinur, Cynthia Dwork and Joe Kilian

Page 2: The Hospital Story

[Figure: a user sends a query q to MedicalDB, which holds the patient data, and receives an answer a.]

Page 3: Easy Tempting Solution (A Bad Solution)

Idea:
a. Remove identifying information (name, SSN, …)
b. Publish data

• Observation: 'harmless' attributes uniquely identify many patients (gender, approx. age, approx. weight, ethnicity, marital status, …)
• Worse: 'rare' attribute (CF, 1/3000)

[Figure: database d with rows for Mr. Smith, Ms. John and Mr. Doe.]

Page 4: Our Model: Statistical Database (SDB)

• Database: $d \in \{0,1\}^n$
• Query: $q \subseteq [n]$
• Answer: $a_q = \sum_{i \in q} d_i$

[Figure: database d with rows for Mr. Smith, Ms. John and Mr. Doe.]
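The model on this slide can be illustrated with a minimal Python sketch. The function name `exact_answer`, the 0-based indexing and the sample database are assumptions made for illustration; they do not come from the talk.

```python
from typing import Iterable, Sequence


def exact_answer(d: Sequence[int], q: Iterable[int]) -> int:
    """Return a_q: the number of 1-bits of d at the positions listed in q.

    d is the database (a sequence of 0/1 bits) and q a subset of the
    indices 0..n-1 (0-based here, whereas the slide writes [n]).
    """
    return sum(d[i] for i in set(q))


# Hypothetical 6-row database: d[i] = 1 if row i has the sensitive attribute.
d = [1, 0, 1, 1, 0, 0]
print(exact_answer(d, [0, 2, 4]))  # rows 0 and 2 are set, row 4 is not
```

A query never names an individual bit directly, yet a singleton query such as `[2]` would reveal one, which is why the answers have to be restricted or perturbed.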

Page 5: The Privacy Game: Information-Privacy Tradeoff

• Private functions:
  – want to hide the functions $(d_1, \dots, d_n) \mapsto d_i$
• Information functions:
  – want to reveal $f_q(d_1, \dots, d_n) = \sum_{i \in q} d_i$
• Explicit definition of private functions

• Crypto: secure function evaluation
  – want to reveal $f()$
  – want to hide all functions not computable from $f()$
  – Implicit definition of private functions

Page 6: Approaches to SDB Privacy [AW 89]

• Query Restriction
  – Require queries to obey some structure
• Perturbation (this talk)
  – Give 'noisy' or 'approximate' answers

Page 7: Perturbation

• Database: $d = d_1, \dots, d_n$
• Query: $q \subseteq [n]$
• Exact answer: $a_q = \sum_{i \in q} d_i$
• Perturbed answer: $\hat{a}_q$

Perturbation $E$: for all $q$: $|\hat{a}_q - a_q| \le E$

General perturbation: $\Pr_q[\,|\hat{a}_q - a_q| \le E\,] = 1 - \mathrm{neg}(n)$ (or $= 99\%$, $51\%$)
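The two notions of perturbation on this slide differ only in how many queries must stay within $E$. A small sketch, assuming a hypothetical helper `fraction_within` and made-up answers, measures that fraction over a set of answered queries: a value of 1.0 corresponds to "for all $q$", while the general notion only asks for a large fraction.

```python
from typing import Dict, FrozenSet, Sequence


def exact_answer(d: Sequence[int], q: FrozenSet[int]) -> int:
    """a_q = sum of d_i over i in q (0-based indices)."""
    return sum(d[i] for i in q)


def fraction_within(d: Sequence[int],
                    answers: Dict[FrozenSet[int], float],
                    E: float) -> float:
    """Fraction of the answered queries q with |a_hat_q - a_q| <= E."""
    if not answers:
        raise ValueError("at least one answered query is required")
    ok = sum(1 for q, a_hat in answers.items()
             if abs(a_hat - exact_answer(d, q)) <= E)
    return ok / len(answers)


# Hypothetical database and perturbed answers (illustrative values only).
d = [1, 0, 1, 1]
answers = {
    frozenset({0, 1}): 1.5,        # exact answer 1, off by 0.5
    frozenset({2, 3}): 2.0,        # exact answer 2, off by 0
    frozenset({0, 1, 2, 3}): 5.0,  # exact answer 3, off by 2
    frozenset({1}): 0.0,           # exact answer 0, off by 0
}
print(fraction_within(d, answers, E=1))  # three of the four answers are within 1
print(fraction_within(d, answers, E=2))  # all four answers are within 2
```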

Page 8: Perturbation Techniques [AW89]

Data perturbation:
  – Swapping [Reiss 84] [Liew, Choi, Liew 85]
  – Fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]
    • Additive perturbation $d'_i = d_i + E_i$

Output perturbation:
  – Random sample queries [Denning 80]
    • Sample drawn from query set
  – Varying perturbations [Beck 80]
    • Perturbation variance grows with number of queries
  – Rounding [Achugbue, Chin 79], Randomized [Fellegi, Phillips 74] …

Page 9: Main Question

How much perturbation is needed to achieve privacy?

Page 10: Privacy from $\approx \sqrt{n}$ Perturbation (an example of a useless database)

• Database: $d \in_R \{0,1\}^n$
• On query $q$:
  1. Let $a_q = \sum_{i \in q} d_i$
  2. If $|a_q - |q|/2| > E$, return $\hat{a}_q = a_q$
  3. Otherwise return $\hat{a}_q = |q|/2$
• Privacy is preserved
  – If $E \approx \sqrt{n}\,(\lg n)^2$, whp rule 3 is always used
• No information about $d$ is given!
• No usability!

Can we do better?
• Smaller $E$?
• Usability?
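The three-step response rule of the useless database translates almost line for line into code. This is a minimal sketch; the function name `useless_db_answer` and the sample values are assumptions for illustration.

```python
from typing import Iterable, Sequence


def useless_db_answer(d: Sequence[int], q: Iterable[int], E: float) -> float:
    """Answer query q following the three rules on the slide."""
    q = set(q)
    a_q = sum(d[i] for i in q)   # rule 1: exact answer
    half = len(q) / 2            # the answer expected for a random database
    if abs(a_q - half) > E:      # rule 2: the deviation is too large to hide
        return a_q
    return half                  # rule 3: an answer that does not depend on d


# With a large enough E, two different databases give the same answer.
q = [0, 1, 2]
print(useless_db_answer([1, 0, 1, 0], q, E=2))  # 1.5
print(useless_db_answer([0, 0, 0, 0], q, E=2))  # 1.5
# With a small E, rule 2 fires and the exact count leaks.
print(useless_db_answer([1, 0, 1, 0], q, E=0.25))  # 2
```

Whenever rule 3 applies, the answer is a function of $|q|$ alone, which is the sense in which the database is private but useless.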

Page 11: (not) Defining Privacy

• Elusive definition
  – Application dependent
  – Partial vs. exact compromise
  – Prior knowledge, how to model it?
  – Other issues …
• Instead of defining privacy: what is surely non-private…
  – Strong breaking of privacy

Page 12: Strong Breaking of Privacy

The Useless Database Achieves Best Possible Perturbation: Perturbation $\ll \sqrt{n}$ Implies no Privacy!

• Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ s.t. $\mathrm{dist}(d, d') = o(n)$.

Page 13: The Adversary as a Decoding Algorithm

[Figure: the $n$ bits of $d$ are encoded as the perturbed answers $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$, one for each of the $2^n$ subsets of $[n]$; the adversary decodes them into $d'$.]

(Recall $\hat{a}_q = \sum_{i \in q} d_i + \mathrm{pert}_q$)

Decoding Problem: Given access to $\hat{a}_{q_1}, \dots, \hat{a}_{q_{2^n}}$, reconstruct $d$ in time poly($n$).

Page 14: Goldreich-Levin Hardcore Bit (side remark)

[Figure: the $n$ bits of $d$ are encoded as $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$, one for each of the $2^n$ subsets of $[n]$.]

Where $\hat{a}_q = \sum_{i \in q} d_i \bmod 2$ on 51% of the subsets

The GL algorithm finds in time poly($n$) a small list of candidates, containing $d$

Page 15: Comparing the Tasks (side remark)

Each row lists the Goldreich-Levin setting first and the reconstruction task of this talk second.

• Encoding: $a_q = \sum_{i \in q} d_i \pmod 2$ | $a_q = \sum_{i \in q} d_i$
• Noise: corrupt $\tfrac12 - \varepsilon$ of the queries | additive perturbation; an $\varepsilon$ fraction of the queries deviate from the perturbation
• Queries: dependent | random
• Decoding: list decoding | $d'$ s.t. $\mathrm{dist}(d, d') < \varepsilon n$ (list decoding impossible)

Page 16: Recall Our Goal: Perturbation $\ll \sqrt{n}$ Implies no Privacy!

• Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ s.t. $\mathrm{dist}(d, d') = o(n)$.

Page 17: Proof of Main Theorem: The Adversary Reconstruction Algorithm

• Query phase: get $\hat{a}_{q_j}$ for $t$ random subsets $q_1, \dots, q_t$ of $[n]$
• Weeding phase: solve the linear program:
    $0 \le x_i \le 1$
    $|\sum_{i \in q_j} x_i - \hat{a}_{q_j}| \le E$
• Rounding: let $c_i = \mathrm{round}(x_i)$, output $c$

Observation: An LP solution always exists, e.g. $x = d$.
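The three phases can be tried out with the standard library alone. The sketch below is not the authors' implementation: instead of calling an LP solver, it searches for a feasible point by cyclic projections onto the constraints, a swapped-in technique that only approximates feasibility after finitely many sweeps. The name `reconstruct`, the `sweeps` parameter and the sample sizes are assumptions.

```python
import random
from typing import List, Sequence


def reconstruct(n: int, queries: Sequence[Sequence[int]],
                answers: Sequence[float], E: float, sweeps: int = 200) -> List[int]:
    """Weeding and rounding phases of the reconstruction algorithm.

    Looks for x in [0,1]^n with |sum_{i in q_j} x_i - answers[j]| <= E for
    all j by cyclic projections (a stand-in for the LP solver), then rounds.
    Each query must list distinct 0-based indices.
    """
    x = [0.5] * n
    for _ in range(sweeps):
        moved = False
        for q, a_hat in zip(queries, answers):
            if not q:
                continue
            s = sum(x[i] for i in q)
            # Signed distance of s from the interval [a_hat - E, a_hat + E].
            excess = max(s - (a_hat + E), 0.0) + min(s - (a_hat - E), 0.0)
            if excess != 0.0:
                shift = excess / len(q)
                for i in q:
                    # Move toward the constraint, then clip back into [0, 1].
                    x[i] = min(1.0, max(0.0, x[i] - shift))
                moved = True
        if not moved:  # every constraint already holds
            break
    return [1 if xi > 0.5 else 0 for xi in x]


# Query phase on a hypothetical random database with bounded additive noise.
rng = random.Random(0)
n, t, E = 16, 200, 1.0
d = [rng.randrange(2) for _ in range(n)]
queries = [[i for i in range(n) if rng.random() < 0.5] for _ in range(t)]
answers = [sum(d[i] for i in q) + rng.uniform(-E, E) for q in queries]
c = reconstruct(n, queries, answers, E)
print(sum(ci != di for ci, di in zip(c, d)), "of", n, "bits differ")
```

Since $x = d$ always satisfies the constraints, the feasible region is never empty; how close the rounded output is to $d$ depends on $E$, $t$ and the perturbation actually used.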

Page 18: Proof of Main Theorem: Correctness of the Algorithm

Consider $x = (0.5, \dots, 0.5)$ as a solution for the LP.

[Figure: $d$ and $x$ compared on a query $q$.]

Observation: A random $q$ often shows a $\sqrt{n}$ advantage either to 0's or to 1's.
  – Such a $q$ disqualifies $x$ as a solution for the LP
  – We prove that if $\mathrm{dist}(x, d) > \varepsilon n$, then whp there will be a $q$ among $q_1, \dots, q_t$ that disqualifies $x$
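A rough calculation suggests where the $\sqrt{n}$ advantage comes from for the candidate $x = (0.5, \dots, 0.5)$. It assumes that each index joins $q$ independently with probability $\tfrac12$ (the slide only says "random"); the symbols $B_i$ and $\Delta_q$ are introduced here for illustration.

```latex
\[
  \Delta_q \;=\; a_q - \sum_{i \in q} x_i
           \;=\; \sum_{i=1}^{n} B_i \left(d_i - \tfrac12\right),
  \qquad B_i \sim \mathrm{Bernoulli}(\tfrac12) \text{ independent},
\]
\[
  \operatorname{Var}\!\left[B_i \left(d_i - \tfrac12\right)\right]
  = \tfrac18 - \tfrac1{16} = \tfrac1{16}
  \quad\Longrightarrow\quad
  \operatorname{Var}[\Delta_q] = \tfrac{n}{16},
  \qquad
  \sigma(\Delta_q) = \tfrac{\sqrt{n}}{4},
\]
\[
  |\Delta_q| > 2E
  \;\Longrightarrow\;
  \Bigl|\sum_{i \in q} x_i - \hat{a}_q\Bigr|
  \;\ge\; |\Delta_q| - |\hat{a}_q - a_q|
  \;>\; 2E - E \;=\; E .
\]
```

The fluctuations of $\Delta_q$ are therefore of order $\sqrt{n}$, so when $E \ll \sqrt{n}$ a constant fraction of random queries can be expected to violate the LP constraint for this $x$.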

Page 19: Extensions of the Main Theorem

• 'Imperfect' perturbation:
  – Can approximate the original bit string even if the database answer is within perturbation only for 99% of the queries
• Other information functions:
  – Given access to "noisy majority" of subsets, we can approximate the original bit string

Page 20: Notes on Impossibility Results

• Exponential adversary:
  – Strong breaking of privacy if $E \ll n$
• Polynomial adversary:
  – Non-adaptive queries
  – Oblivious of perturbation method and database distribution
  – Tight threshold $E \approx \sqrt{n}$
• What if the adversary is more restricted?
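The exponential adversary mentioned above can be sketched for a tiny database: ask every subset query, then return any candidate that agrees with all answers to within $E$. Such a candidate differs from $d$ in at most $4E$ positions, because each of the two subsets on which it disagrees with $d$ (ones it misses, ones it adds) can hold at most $2E$ indices. The function names and the deterministic noise rule below are assumptions for illustration.

```python
from itertools import combinations, product
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def all_subsets(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every subset of {0, ..., n-1} as a tuple of indices."""
    for k in range(n + 1):
        yield from combinations(range(n), k)


def exhaustive_reconstruct(n: int, answer: Callable[[Sequence[int]], float],
                           E: float) -> Optional[List[int]]:
    """Return the first c in {0,1}^n consistent with every answer within E."""
    queries = list(all_subsets(n))
    answers = [answer(q) for q in queries]       # 2^n queries
    for c in product((0, 1), repeat=n):          # 2^n candidates
        if all(abs(sum(c[i] for i in q) - a) <= E
               for q, a in zip(queries, answers)):
            return list(c)
    return None  # not reached if the perturbation really is within E


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(u, v))


# Hypothetical database and a deterministic response rule with perturbation E = 1.
d = [1, 0, 1, 1, 0, 0, 1, 0]
E = 1


def noisy(q: Sequence[int]) -> float:
    return sum(d[i] for i in q) + (1 if len(q) % 2 else -1)


c = exhaustive_reconstruct(len(d), noisy, E)
print(hamming(c, d), "<=", 4 * E)
```

Both loops range over $2^n$ items, so this is only feasible for very small $n$.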

Page 21: Bounded Adversary Model

• Database: $d \in_R \{0,1\}^n$
• Theorem: If the number of queries is bounded by $T$, then there is a DB response algorithm with perturbation of $\sim\sqrt{T}$ that maintains privacy (with a reasonable definition of privacy).

Page 22: Summary and Open Questions

• Very high perturbation is needed for privacy
  – Threshold phenomenon: above $\sqrt{n}$, total privacy; below $\sqrt{n}$, none (poly-time adversary)
  – Rules out many currently proposed solutions for SDB privacy
  – Q: what's on the threshold? Usability?
• Main tool: a reconstruction algorithm
  – Reconstructing an $n$-bit string from perturbed partial sums/thresholds
• Privacy for a $T$-bounded adversary with a random database
  – $\sqrt{T}$ perturbation
  – Q: other database distributions
• Q: Crypto and SDB privacy?

Page 23: Our Privacy Definition (bounded adversary model)

[Figure: $d \in_R \{0,1\}^n$ is drawn at random; given (transcript, $i$) and $d_{-i}$, the adversary tries to output $d_i$ and fails w.p. > ½ − …]