Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies

Ashwin Machanavajjhala (ashwin @ cs.duke.edu)
Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke)

Summer @ Census, 8/15/2013


Overview of the talk
• An inherent trade-off exists between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from individuals.
• Differential privacy has revolutionized how we reason about privacy
– A nice tuning knob ε for trading off privacy and utility


Overview of the talk

• However, differential privacy only captures a small part of the privacy-utility trade-off space

– No Free Lunch Theorem

– Differentially private mechanisms may not ensure sufficient utility

– Differentially private mechanisms may not ensure sufficient privacy


Overview of the talk

• I will present a new privacy framework that allows data publishers to more effectively trade off privacy for utility

– Better control on what to keep secret and who the adversaries are

– Can ensure more utility than differential privacy in many cases

– Can ensure privacy where differential privacy fails


Outline
• Background
– Differential privacy
• No Free Lunch [Kifer-M SIGMOD ’11]
– No “one privacy notion to rule them all”
• Pufferfish Privacy Framework [Kifer-M PODS ’12]
– Navigating the space of privacy definitions
• Blowfish: Practical privacy using policies [ongoing work]


Data Privacy Problem


Individuals 1, 2, …, N contribute records r1, r2, …, rN to a server, which stores them in a database DB.

Privacy: no breach about any individual
Utility: statistical analyses over the collected data


Data Privacy in the real world

Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician/Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on the social network


Many definitions & several attacks

Definitions: K-Anonymity [Sweeney et al., IJUFKS ’02], L-diversity [Machanavajjhala et al., TKDD ’07], T-closeness [Li et al., ICDE ’07], E-Privacy [Machanavajjhala et al., VLDB ’09], Differential Privacy [Dwork et al., ICALP ’06]

Attacks: Linkage attack, Background-knowledge attack, Minimality/Reconstruction attack, de Finetti attack, Composition attack


Differential Privacy

For every pair of inputs D1 and D2 that differ in one value, and for every output O, an adversary should not be able to distinguish between D1 and D2 based on O:

| log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | < ε   (ε > 0)


Algorithms• No deterministic algorithm guarantees differential privacy.

• Random sampling does not guarantee differential privacy.

• Randomized response satisfies differential privacy.
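The last point can be made concrete with a minimal sketch of binary randomized response (the function names below are illustrative, not from the talk): report the true bit with probability e^ε/(1+e^ε), otherwise flip it. The likelihood ratio of any output under the two possible inputs is then exactly e^ε, which is the ε-differential-privacy condition.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def privacy_ratio(epsilon):
    """Pr[output = o | bit = b] / Pr[output = o | bit = 1-b]: equals e^eps,
    so randomized response satisfies eps-differential privacy."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p / (1.0 - p)
```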


Laplace Mechanism

[Figure: Laplace distribution Lap(λ), with density h(η) ∝ exp(−|η| / λ); mean 0, variance 2λ²]

A researcher poses query q to database D; instead of the true answer q(D), the server returns q(D) + η, where η is drawn from Lap(λ).

Privacy depends on the parameter λ.


Laplace Mechanism

Thm: If the sensitivity of the query is S(q), then the Laplace mechanism with λ = S(q)/ε guarantees ε-differential privacy.

Sensitivity: the smallest number S(q) such that for any D, D’ differing in one entry, || q(D) – q(D’) ||₁ ≤ S(q)

[Dwork et al., TCC 2006]
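As a minimal sketch of the theorem above (the helper name is my own): add Laplace noise with scale λ = S/ε to the true answer.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return true_answer + Lap(S/eps) noise; for a query with L1
    sensitivity `sensitivity`, this guarantees eps-differential privacy."""
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon  # lambda = S / eps
    return true_answer + rng.laplace(0.0, scale)
```

For example, a counting query has sensitivity 1, so a noisy count is `laplace_mechanism(count, 1.0, epsilon)`.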


Contingency tables

Each tuple takes one of k = 4 different values; D is the table of counts Count( , ):

| 2 | 2 |
| 2 | 8 |


Laplace Mechanism for Contingency Tables

Sensitivity = 2, so each cell gets Lap(2/ε) noise:

| 2 + Lap(2/ε) | 2 + Lap(2/ε) |
| 2 + Lap(2/ε) | 8 + Lap(2/ε) |

Each noisy cell is unbiased (e.g., mean 8 for the last cell) with variance 8/ε².


Composition Property

If algorithms A1, A2, …, Ak use independent randomness and each Ai satisfies εi-differential privacy, then outputting all the answers together satisfies differential privacy with

ε = ε1 + ε2 + … + εk

Privacy Budget
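The composition property is what lets a data publisher think of ε as a budget to be spent across queries. A small illustrative sketch (the class is my own, not from the talk):

```python
class PrivacyBudget:
    """Track cumulative epsilon across independently-randomized algorithms,
    using sequential composition: eps = eps_1 + eps_2 + ... + eps_k."""

    def __init__(self, total):
        self.total = total   # total eps the publisher is willing to spend
        self.spent = 0.0

    def charge(self, eps):
        """Record that a query consumed eps; refuse if the budget runs out."""
        if self.spent + eps > self.total:
            raise ValueError("privacy budget exceeded")
        self.spent += eps
```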


Differential Privacy
• A privacy definition that is independent of the attacker’s prior knowledge.
• Resists many attacks that other definitions are susceptible to.
– Avoids composition attacks
– Claimed to be tolerant against adversaries with arbitrary background knowledge.
• Allows simple, efficient and useful privacy mechanisms
– Used in LEHD’s OnTheMap [M et al ICDE ’08]


Outline
• Background
– Differential privacy
• No Free Lunch [Kifer-M SIGMOD ’11]
– No “one privacy notion to rule them all”
• Pufferfish Privacy Framework [Kifer-M PODS ’12]
– Navigating the space of privacy definitions
• Blowfish: Practical privacy using policies [ongoing work]


Differential Privacy & Utility
• Differentially private mechanisms may not ensure sufficient utility for many applications.
• Sparse data: the integrated mean squared error of the Laplace mechanism can be worse than that of returning a random contingency table for typical values of ε (around 1).
• Social networks [M et al PVLDB 2011]


Differential Privacy & Privacy
• Differentially private algorithms may not limit the ability of an adversary to learn sensitive information about individuals when records in the data are correlated.
• Correlations across individuals arise in many ways:
– Social networks
– Data with pre-released constraints
– Functional dependencies


Laplace Mechanism and Correlations

The noisy table is released, but the exact row and column marginals (4 and 10) are public:

| 2 + Lap(2/ε) | 2 + Lap(2/ε) | 4  |
| 2 + Lap(2/ε) | 8 + Lap(2/ε) | 10 |
| 4            | 10           |    |

Does the Laplace mechanism still guarantee privacy?

Auxiliary marginals are published for the following reasons:
1. Legal: the 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities


Laplace Mechanism and Correlations

Combining each noisy cell with the exact marginals yields several independent estimates of the same cell, e.g. for the count of 8:

Count( , ) = 8 + Lap(2/ε)
Count( , ) = 8 – Lap(2/ε)
Count( , ) = 8 – Lap(2/ε)
Count( , ) = 8 + Lap(2/ε)


Laplace Mechanism and Correlations

Averaging these independent estimates gives an estimator with mean 8 and variance 8/(kε²). With public marginals, an adversary can therefore reconstruct the table with high precision for large k.
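The reconstruction effect can be checked numerically. This sketch (the function name is my own) simulates k independent Lap(2/ε)-noisy estimates of the same count and averages them; the variance of the average, 8/(kε²), vanishes as k grows:

```python
import numpy as np

def averaged_estimate(true_count, k, epsilon, seed=0):
    """Each marginal-based reconstruction gives true_count + Lap(2/eps);
    averaging k independent reconstructions has variance
    2 * (2/eps)**2 / k = 8 / (k * eps**2), vanishing for large k."""
    rng = np.random.default_rng(seed)
    noisy = true_count + rng.laplace(0.0, 2.0 / epsilon, size=k)
    return noisy.mean()
```

With k = 100000 and ε = 1 the standard deviation of the average is below 0.01, so the hidden count of 8 is recovered almost exactly.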


No Free Lunch Theorem

It is not possible to guarantee any utility in addition to privacy without making assumptions about
• the data-generating distribution
• the background knowledge available to an adversary

[Kifer-M SIGMOD ’11] [Dwork-Naor JPC ’10]


To sum up …

• Differential privacy only captures a small part of the privacy-utility trade-off space

– No Free Lunch Theorem

– Differentially private mechanisms may not ensure sufficient privacy

– Differentially private mechanisms may not ensure sufficient utility


Outline
• Background
– Differential privacy
• No Free Lunch [Kifer-M SIGMOD ’11]
– No “one privacy notion to rule them all”
• Pufferfish Privacy Framework [Kifer-M PODS ’12]
– Navigating the space of privacy definitions
• Blowfish: Practical privacy using policies [ongoing work]


Pufferfish Framework


Pufferfish Semantics
• What is being kept secret?
• Who are the adversaries?
• How is information disclosure bounded?
– (similar to ε in differential privacy)


Sensitive Information
• Secrets: let S be a set of potentially sensitive statements
– “individual j’s record is in the data, and j has cancer”
– “individual j’s record is not in the data”
• Discriminative pairs: mutually exclusive pairs of secrets
– (“Bob is in the table”, “Bob is not in the table”)
– (“Bob has cancer”, “Bob has diabetes”)


Adversaries
• We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data
– We do not assume computational limits
• Data evolution scenarios: the set of all probability distributions that could have generated the data (… think adversary’s prior)
– No assumptions: all probability distributions over data instances are possible
– I.I.D.: the set of all f such that P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk)


Information Disclosure
• Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every output ω, every discriminative pair (s, s’) ∈ Spairs, and every data evolution scenario θ ∈ D under which both s and s’ have nonzero probability,

P( M(Data) = ω | s, θ ) ≤ e^ε · P( M(Data) = ω | s’, θ )


Pufferfish Semantic Guarantee

Seeing any output ω changes the adversary’s odds of s versus s’ by a factor of at most e^ε:

e^(−ε) ≤ [ P(s | ω, θ) / P(s’ | ω, θ) ] ÷ [ P(s | θ) / P(s’ | θ) ] ≤ e^ε

(prior odds of s vs s’ in the denominator, posterior odds of s vs s’ in the numerator)


Pufferfish & Differential Privacy
• Spairs:
– s_i^x: record i takes the value x; the discriminative pairs are (s_i^x, s_i^y) for all values x, y in the domain
• Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.


Pufferfish & Differential Privacy
• Data evolution:
– all θ = [ f1, f2, f3, …, fk ], where record i is drawn independently from fi
• The adversary’s prior may be any distribution that makes records independent.


Pufferfish & Differential Privacy
• Spairs:
– s_i^x: record i takes the value x
• Data evolution:
– all θ = [ f1, f2, f3, …, fk ]

A mechanism M satisfies ε-differential privacy if and only if it satisfies ε-Pufferfish instantiated with this Spairs and this set of data evolution scenarios {θ}.


Summary of Pufferfish
• A semantic approach to defining privacy
– Enumerates the information that is secret and the set of adversaries
– Bounds the odds ratio of pairs of mutually exclusive secrets
• Helps understand the assumptions under which privacy is guaranteed
– Differential privacy is one specific choice of secret pairs and adversaries
• How should a data publisher use this framework? Algorithms?


Outline
• Background
– Differential privacy
• No Free Lunch [Kifer-M SIGMOD ’11]
– No “one privacy notion to rule them all”
• Pufferfish Privacy Framework [Kifer-M PODS ’12]
– Navigating the space of privacy definitions
• Blowfish: Practical privacy using policies [ongoing work]


Blowfish Privacy
• A special class of Pufferfish instantiations

(Both pufferfish and blowfish are marine fish of the family Tetraodontidae.)


Blowfish Privacy
• A special class of Pufferfish instantiations
• Extends differential privacy using policies
– Specification of sensitive information: allows more utility
– Specification of publicly known constraints in the data: ensures privacy in correlated data
• Satisfies the composition property



Sensitive Information
• Secrets: let S be a set of potentially sensitive statements
– “individual j’s record is in the data, and j has cancer”
– “individual j’s record is not in the data”
• Discriminative pairs: mutually exclusive pairs of secrets
– (“Bob is in the table”, “Bob is not in the table”)
– (“Bob has cancer”, “Bob has diabetes”)


Sensitive Information in Differential Privacy
• Spairs:
– s_i^x: record i takes the value x
• Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.


Other Notions of Sensitive Information
• Medical data
– OK to infer whether an individual is healthy or not
– E.g., (“Bob is healthy”, “Bob has diabetes”) is not a discriminative pair of secrets for any individual
• Partitioned sensitive information


Other Notions of Sensitive Information
• Geospatial data
– Do not want the attacker to distinguish between “close-by” points in the space
– May distinguish between “far-away” points
• Distance-based sensitive information


Generalization as a graph
• Consider a graph G = (V, E), where V is the set of values that an individual’s record can take.
• E encodes the set of discriminative pairs
– Same for all records.
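A minimal sketch of working with such a policy graph (names are illustrative): the guarantee between two values x and y scales with their shortest-path distance in G, and values in different connected components get no protection from each other, so a BFS distance is the quantity of interest.

```python
from collections import deque

def shortest_distance(edges, x, y):
    """BFS distance d_G(x, y) in an undirected policy graph given as a list
    of discriminative-pair edges; infinity if x and y are disconnected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if x == y:
        return 0
    seen, frontier = {x}, deque([(x, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == y:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")  # disconnected: adversary may distinguish x and y
```

For a line graph over ordered values, `shortest_distance` between the endpoints is d − 1, while adjacent values are at distance 1.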


Blowfish Privacy + “Policy of Secrets”
• A mechanism M satisfies Blowfish privacy with respect to policy G if
– for every set of outputs S of the mechanism, and
– for every pair of datasets D1, D2 that differ in one record, with values x and y such that (x, y) ∈ E,

Pr[ M(D1) ∈ S ] ≤ e^ε · Pr[ M(D2) ∈ S ]

• For any x and y in the domain, the disclosure is bounded by ε times the shortest distance between x and y in G.
• The adversary is allowed to distinguish between values x and y that appear in different disconnected components of G.


Algorithms for Blowfish
• Consider an ordered 1-D attribute
– Dom = {x1, x2, x3, …, xd}
– E.g., ranges of Age, Salary, etc.
• Suppose our policy is the line graph x1 – x2 – x3 – … – xd:
the adversary should not distinguish whether an individual’s value is xj or xj+1.


Algorithms for Blowfish
• Suppose we want to release a histogram privately
– the number of individuals in each age range: C(x1), C(x2), …, C(xd)
• Any differentially private algorithm also satisfies Blowfish
– Can use the Laplace mechanism (with sensitivity 2)


Ordered Mechanism
• We can answer a different set of queries to get a different private estimator for the histogram: the cumulative counts S1, S2, S3, …, Sd, where Si = C(x1) + C(x2) + … + C(xi).


Ordered Mechanism
• We can answer each Si using the Laplace mechanism …
• … but the sensitivity of all the queries together is only 1:
– Changing one tuple from x2 to x3 decrements C(x2) and increments C(x3), and so only changes S2.


Ordered Mechanism
• Answering the cumulative counts with sensitivity 1 instead of 2 gives a factor of 2 improvement.
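The ordered mechanism can be sketched as follows (the function name is my own): release noisy cumulative counts with Lap(1/ε) noise, then difference them to recover a histogram estimate.

```python
import numpy as np

def ordered_mechanism(counts, epsilon, seed=0):
    """Release noisy cumulative counts S_i = C(x_1) + ... + C(x_i).
    Under the line-graph policy, moving one tuple from x_j to x_{j+1}
    changes only S_j (by 1), so the sensitivity of the whole vector is 1
    and Lap(1/eps) noise suffices, versus Lap(2/eps) per histogram cell."""
    rng = np.random.default_rng(seed)
    S = np.cumsum(counts).astype(float)
    noisy_S = S + rng.laplace(0.0, 1.0 / epsilon, size=len(S))
    # Difference the cumulative sums to get back a histogram estimate.
    return np.diff(noisy_S, prepend=0.0)
```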


Ordered Mechanism
• In addition, the cumulative counts satisfy the constraint S1 ≤ S2 ≤ … ≤ Sd.
• However, the noisy counts may not satisfy this constraint.
• We can post-process the noisy counts to enforce it.
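One simple way to enforce the monotonicity constraint in post-processing is a running maximum (a minimal sketch; a least-squares projection onto the monotone cone is another option, and the talk does not specify which is used):

```python
import numpy as np

def enforce_monotone(noisy_S):
    """Project noisy cumulative counts onto S_1 <= S_2 <= ... <= S_d by
    taking a running maximum. Post-processing a private release never
    degrades its privacy guarantee."""
    return np.maximum.accumulate(np.asarray(noisy_S, dtype=float))
```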


Ordered Mechanism
• Post-processing the noisy counts to satisfy S1 ≤ S2 ≤ … ≤ Sd yields an order-of-magnitude improvement for large d.


Ordered Mechanism
• By leveraging the weaker sensitive information in the policy, we can provide significantly better utility.
• Extends to more general policy specifications.
• Ordered mechanisms and other Blowfish algorithms are being tested on the synthetic data generator for the LODES data product.


Blowfish Privacy & Correlations
• Differentially private mechanisms may not ensure privacy when correlations exist in the data.
• Blowfish can handle publicly known constraints:
– well-known marginal counts in the data
– other dependencies
• The privacy definition is similar to differential privacy with a modified notion of neighboring tables.


Other Instantiations of Pufferfish
• All Blowfish instantiations are extensions of differential privacy
– using weaker notions of sensitive information
– allowing knowledge of constraints about the data
– all Blowfish mechanisms satisfy the composition property
• We can instantiate Pufferfish with other “realistic” adversary notions
– only prior distributions that are similar to the expected data distribution
– Open question: which definitions satisfy the composition property?


Summary
• Differential privacy (and the tuning knob ε) is insufficient for trading off privacy for utility in many applications
– sparse data, social networks, …
• The Pufferfish framework allows more expressive privacy definitions
– Can vary sensitive information, adversary priors, and ε
• Blowfish shows one way to create more expressive definitions
– Can provide useful, composable mechanisms
• There is an opportunity to correctly tune privacy by using these expressive privacy frameworks


Thank you

[M et al PVLDB ’11] A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations – Accurate or Private?”, PVLDB 4(7), 2011

[Kifer-M SIGMOD ’11] D. Kifer, A. Machanavajjhala, “No Free Lunch in Data Privacy”, SIGMOD 2011

[Kifer-M PODS ’12] D. Kifer, A. Machanavajjhala, “A Rigorous and Customizable Framework for Privacy”, PODS 2012

[ongoing work] A. Machanavajjhala, B. Ding, X. He, “Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies”, in preparation