Foundations of Privacy Lecture 6 Lecturer: Moni Naor

Page 1

Foundations of Privacy

Lecture 6

Lecturer: Moni Naor

Page 2

Recap of last week’s lecture
• Counting Queries
  – The BLR Algorithm
  – Efficient Algorithm
  – Hardness Results

Page 3

Synthetic DB: Output is a DB

[Diagram: Database → Sanitizer; the user sends query 1, query 2, … and receives answer 1, answer 2, answer 3; the output is a synthetic DB]

Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating each query on the output DB.
• Software and people compatible
• Consistent answers

Page 4

Counting Queries
• Queries with low sensitivity

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants of D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is anyway inherent in statistical analysis.

Assume all queries are given in advance (non-interactive).

[Diagram: universe U, database D of size n, query c]
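To make the definition concrete, here is a minimal Python sketch (names and toy data are mine, not the lecture's): a counting query is a 0/1 predicate summed over the database, and relaxed accuracy just asks for the answer to be within α of that sum. Evaluating the same function on a synthetic DB is exactly how a user reconstructs answers in the non-interactive model.

```python
from typing import Callable, Iterable

def counting_query(c: Callable, db: Iterable) -> int:
    """How many participants of the database satisfy the predicate c?"""
    return sum(c(x) for x in db)

def is_accurate(answer: float, true_answer: int, alpha: float) -> bool:
    """Relaxed accuracy: within additive error alpha."""
    return abs(answer - true_answer) <= alpha

# Toy universe U = (age, smoker) pairs; query: how many smokers over 40?
D = [(35, 1), (52, 1), (47, 0), (61, 1)]
c = lambda x: int(x[0] > 40 and x[1] == 1)
print(counting_query(c, D))  # 2
```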

Page 5

And Now… Bad News

For large C and U, one can’t get efficient sanitizers!
Runtime cannot be subpoly in |C| or |U|:
• when the output is a synthetic DB (as in the positive result)
• even for general output

The Exponential Mechanism cannot be implemented efficiently.

Want hardness… Got Crypto?


Page 7

Showing (Cryptographic) Hardness

• Have to come up with a universe U and a concept class C
• A distribution on
  – databases
  – concepts
  that is hard to sanitize
• The distribution may use cryptographic primitives

Page 8

Digital Signatures

Digital signatures: a key pair (sk, vk). Can build one from a one-way function [NaYu, Ro].

(m1, sig(m1)), (m2, sig(m2)), …, (mn, sig(mn)): valid signatures under vk.

Hard to forge a new valid pair (m’, sig(m’)).
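The slides treat the signature scheme abstractly; as a concrete stand-in, here is a hedged sketch using Ed25519 from the third-party `cryptography` package (my choice of scheme; all names are illustrative). It builds exactly the object the next slide hands to a sanitizer: a database of n valid (message, signature) pairs under a single verification key vk, together with the validity predicate.

```python
# pip install cryptography
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

sk = Ed25519PrivateKey.generate()   # signing key
vk = sk.public_key()                # verification key

# Database of n valid (message, signature) pairs under vk.
n = 4
D = [(f"m{i}".encode(), sk.sign(f"m{i}".encode())) for i in range(n)]

def c_vk(row) -> int:
    """The query c_vk(m, s): 1 iff s is a valid signature of m under vk."""
    m, s = row
    try:
        vk.verify(s, m)
        return 1
    except InvalidSignature:
        return 0

assert all(c_vk(row) == 1 for row in D)  # every input row satisfies c_vk
```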

Page 9

Signatures ⇒ No Synthetic DB

Universe: (m, s) message-signature pairs.
Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk.

Input: (m1, sig(m1)), …, (mn, sig(mn)), all valid signatures under vk.
Sanitizer outputs (m’1, s1), …, (m’k, sk).

Utility: most output pairs are valid signatures under the same vk.
But forging a new signature is hard, so the valid output pairs must come from the input: inputs appear in the output, no privacy!

Page 10

Can We Output a Synthetic DB Efficiently?

[Table: the regimes of |C| (subpoly vs. poly) and |U| (subpoly vs. poly); three cells are marked “?”]

Page 11

Where is the Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Page 12

Hardness on Average

Universe: (vk, m, s) key, message, signature triples.
Queries: c_i(vk, m, s): the i-th bit of ECC(vk), where ECC is an error-correcting code;
c_v(vk, m, s): 1 iff s is a valid signature under vk.

Input: (m1, sig(m1), vk), (m2, sig(m2), vk), …, (mn, sig(mn), vk): valid signatures under vk.
Sanitizer outputs (m’1, s1, vk’1), …, (m’k, sk, vk’k).

Are these keys related to vk? Yes! At least one is vk!

Page 13

Hardness on Average

Samples: (vk, m, s) key, message, signature triples.
Queries: c_i(vk, m, s): the i-th bit of ECC(vk);
c_v(vk, m, s): 1 iff s is a valid signature under vk.

Sanitizer outputs (m’1, s1, vk’1), …, (m’k, sk, vk’k).
Are these keys related to vk? Yes! At least one is vk:
• ∀i: 3/4 of the vk’_j agree with ECC(vk)[i]
• ⇒ ∃ vk’_j s.t. ECC(vk’_j) and ECC(vk) are 3/4-close
• ⇒ vk’_j = vk (by the error-correcting code)
• ⇒ m’_j appears in the input. No privacy!
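A hedged sketch of the decoding step, with a simple repetition code standing in for the ECC (a real construction would use a constant-rate code with large distance; all parameters here are illustrative). Taking a per-position majority over the keys in the sanitizer's output yields a word close to ECC(vk), which decodes back to vk.

```python
def ecc(bits, r=3):
    """Toy error-correcting code: repeat each bit r times (stand-in for a real ECC)."""
    return [b for b in bits for _ in range(r)]

def decode(word, r=3):
    """Majority-decode each block of r repetitions."""
    return [int(sum(word[i:i + r]) > r // 2) for i in range(0, len(word), r)]

def recover_vk(output_keys, r=3):
    """Per-position majority over the sanitizer's output keys.

    If, for every position i, most rows agree with ECC(vk)[i], the majority
    word is close to ECC(vk) and decodes back to vk.
    """
    codewords = [ecc(vk_j, r) for vk_j in output_keys]
    majority = [int(sum(col) * 2 > len(col)) for col in zip(*codewords)]
    return decode(majority, r)

vk = [1, 0, 1, 1]
# 3 of 4 output rows carry the true vk, one is junk:
outputs = [vk, vk, vk, [0, 1, 0, 0]]
assert recover_vk(outputs) == vk
```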

Page 14

Where is the Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Ullman-Vadhan: hard even for 2-way marginals.

Page 15

Can We Output a Synthetic DB Efficiently?

[The same |C| × |U| table, with the hard cells now attributed: “Signatures, hard on average” and “using PRFs”]

Page 16

Hardness with PRFs
• Let F = {f_s | s seed} be a family of pseudorandom functions, f_s: [ℓ] → [ℓ], with seed length k.
• Pseudorandom functions: a family of efficiently computable functions, such that a random function from the family is indistinguishable (via black-box access) from a truly random function.
• Data universe U = {(a, b) : a, b ∈ [ℓ]} (polynomial size).
• Concepts C = {c_s | s seed} (polynomial size), where c_s((a, b)) = 1 iff f_s(a) = b.

Page 17

The Hard-to-Sanitize Distribution

The distribution D on samples:
• Generate a key s ∈ {0,1}^k.
• Generate n distinct elements a1, …, an ∈ [ℓ].
• The i-th entry in the database X is x_i = (a_i, f_s(a_i)).

Claim: any differentially private sanitizer A cannot be better than 1/3 correct.
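A hedged sketch of this distribution, with HMAC-SHA256 truncated mod ℓ standing in for the pseudorandom function f_s (my substitution; the seed and domain sizes are toy values, not cryptographic ones).

```python
import hmac, hashlib, secrets, random

ELL = 1024        # the domain/range size l (illustrative)
SEED_BYTES = 2    # toy seed length; a real seed would be 16+ bytes

def f(s: bytes, a: int) -> int:
    """PRF stand-in: f_s(a) in [l], via HMAC-SHA256 truncated mod l."""
    mac = hmac.new(s, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big") % ELL

def c(s: bytes, row: tuple) -> int:
    """Concept c_s((a, b)) = 1 iff f_s(a) = b."""
    a, b = row
    return int(f(s, a) == b)

# The hard-to-sanitize distribution:
s = secrets.token_bytes(SEED_BYTES)       # generate a seed s
n = 100
a_vals = random.sample(range(ELL), n)     # n distinct elements a_1..a_n
X = [(a, f(s, a)) for a in a_vals]        # x_i = (a_i, f_s(a_i))

assert all(c(s, x) == 1 for x in X)       # all input rows satisfy c_s
```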

Page 18

• The function f_s is pseudorandom: with overwhelming probability over the choice of seed s, for any a ∈ [ℓ] that does not appear among a1, …, an, a sanitizer A cannot predict f_s(a) any better than it could predict a random function.

• Expect: no more than a (1/ℓ + negl())-fraction of the a’s in A(X) that are not in X appear with the correct b.

• Suppose this event does not occur. Then A predicts f_s on a fresh point with probability noticeably greater than 1/ℓ, contradicting pseudorandomness. Since all of the items in the input X satisfy the concept c_s, a useful sanitizer’s output must consist mostly of pairs from X itself.

Page 19

General Output Sanitizers

Theorem: traitor-tracing schemes exist if and only if sanitizing is hard.

Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.

The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.

Page 20

Traitor Tracing: The Problem
• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: an unauthorized decryption device
• Given a pirate box, want to find who leaked the keys

[Diagram: the center broadcasts E(Content); a pirate box built from leaked keys K1, K3, K8 decrypts it to Content]

The traitors’ “privacy” is violated!

Page 21

Traitor Tracing ⇒ Hard Sanitizing

A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace.
• Setup: generates a key bk for the broadcaster and N subscriber keys k1, …, kN.
• Encrypt: given a bit b, generates a ciphertext using the broadcaster’s key bk. (Need semantic security!)
• Decrypt: given a ciphertext and any one of the subscriber keys, retrieves the original bit.
• Trace: gets bk and oracle access to a pirate decryption box; outputs an index i ∈ {1, …, N} of a key ki used to create the pirate box.

Page 22

Simple Example of Tracing Traitors
• Let E_K(m) be a good shared-key encryption scheme
• Key generation: generate independent keys for E: bk = (k1, …, kN)
• Encrypt: for bit b generate independent ciphertexts E_{k1}(b), E_{k2}(b), …, E_{kN}(b)
• Decrypt using ki: decrypt the i-th ciphertext
• Tracing algorithm: a hybrid argument

Properties: ciphertext length N, key length 1.
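A hedged, toy implementation of this scheme: a one-bit PRF-pad cipher stands in for “a good shared-key encryption scheme” (my substitution), and tracing runs the standard hybrid argument, feeding the pirate box vectors whose first i components encrypt the flipped bit and accusing the index where the output flips. A real, randomized pirate would need majority estimates at each hybrid; the deterministic box here keeps the sketch short.

```python
import hmac, hashlib, secrets

def enc(k: bytes, b: int) -> tuple:
    """Toy one-bit encryption: (nonce, PRF(k, nonce) XOR b)."""
    nonce = secrets.token_bytes(16)
    pad = hmac.new(k, nonce, hashlib.sha256).digest()[0] & 1
    return (nonce, pad ^ b)

def dec(k: bytes, ct: tuple) -> int:
    nonce, c = ct
    pad = hmac.new(k, nonce, hashlib.sha256).digest()[0] & 1
    return pad ^ c

N = 8
keys = [secrets.token_bytes(16) for _ in range(N)]   # bk = (k_1, ..., k_N)

def encrypt(b: int) -> list:
    """Ciphertext = N independent encryptions of b, one per subscriber key."""
    return [enc(k, b) for k in keys]

# A pirate box built from one leaked key:
traitor = 5
pirate = lambda cts: dec(keys[traitor], cts[traitor])

def trace(pirate) -> int:
    """Hybrid argument: hybrid i encrypts 1 in the first i slots, 0 in the rest.
    The pirate's output must flip from 0 to 1 at some i; key k_i was leaked."""
    for i in range(1, N + 1):
        hybrid = [enc(k, 1) for k in keys[:i]] + [enc(k, 0) for k in keys[i:]]
        if pirate(hybrid) == 1:
            return i - 1
    raise RuntimeError("pirate box did not decrypt")

assert trace(pirate) == traitor
```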

Page 23

Equivalence of TT and Hardness of Sanitizing

Traitor tracing | Hard sanitizing
----------------|-------------------------------------
Ciphertext      | Query (collection of)
Key             | Database entry (collection of)
TT pirate       | Sanitizer for a distribution of DBs

Page 24

Traitor Tracing ⇒ Hard Sanitizing

Theorem: if there exists a TT scheme with
– ciphertext length c(n),
– key length k(n),
then one can construct:
1. a query set C of size ≈ 2^c(n),
2. a data universe U of size ≈ 2^k(n),
3. a distribution D on n-user databases with entries from U.
D is “hard to sanitize”: there exists a tracer that can extract an entry of D from any sanitizer’s output, violating its privacy!

The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.


Page 26

Collusion

An important parameter of a traitor-tracing scheme is its collusion resistance:
• A scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder.
• When t = N the scheme is said to be fully resilient.
• Other parameters: the ciphertext and private-key lengths c(n) and k(n).

One-time t-resilient TT scheme: semantic security is only guaranteed against adversaries given a single ciphertext. This weaker notion is what we need.

Page 27

• Data universe: all possible keys, U = {0,1}^k(n).
• Concept class C: a concept for every possible ciphertext, i.e., for every m ∈ {0,1}^c(n):
  – the concept c_m, on input a key string K, outputs the decryption of m using the key K.
• Hard-to-sanitize distribution: run Setup to generate the n users’ decryption keys; they form the database X.

Page 28

• Can view any sanitizer that maintains utility as an adversary that outputs an “object” that decrypts encryptions of 0 or 1 correctly.
• We can run the traitor-tracing algorithm on such a sanitizer to trace one of the keys in the sanitizer’s input.

Page 29

From Hard to Sanitize to Tracing Traitors

Given hard-to-sanitize distributions, can create a weak TT scheme:
• Generate a database of individuals; each key is a separate subset of it.
• Ciphertexts correspond to queries: knowing the individuals allows approximating the query on the database.
• Need coordination between the different parts, since the approximations may differ.

Page 30

Interactive Model

[Diagram: the Sanitizer sits between the user and the Data; multiple queries, chosen adaptively: query 1, query 2, …]

Page 31

Counting Queries: Answering Queries Interactively

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants of D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is anyway inherent in statistical analysis.

• Queries are given one by one and should be answered as they arrive (interactive).

[Diagram: universe U, database D of size n, query c]

Page 32

Can we answer queries when they are not known in advance?

• Can always answer with independent noise,
  – but this is limited to a number of queries smaller than the database size.
• We do not know the future, but we do know the past!
  – Can answer based on past answers.
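A hedged sketch of the baseline (parameters illustrative): split the privacy budget ε evenly over k counting queries, so each answer carries independent Laplace noise of scale k/ε under simple composition. The noise scale growing linearly in k is exactly why this approach breaks down once k approaches the database size.

```python
import random

def laplace_answers(db, queries, eps: float):
    """Answer each counting query with independent Laplace(k/eps) noise.

    Each counting query has sensitivity 1, so dividing the budget eps
    across k queries (simple composition) means noise of scale k/eps
    per query: useless once k is on the order of the database size n.
    """
    k = len(queries)
    scale = k / eps
    out = []
    for c in queries:
        true = sum(c(x) for x in db)
        # Laplace sample as a difference of two exponentials
        noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
        out.append(true + noise)
    return out

db = list(range(100))
queries = [lambda x, t=t: int(x >= t) for t in (10, 50, 90)]
print(laplace_answers(db, queries, eps=1.0))
```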

Page 33

Idea: Maintain a List of Possible Databases

• Start with DD_0 = the list of all databases of size m.
• Each round j:
  – if the list DD_{j-1} is representative: answer according to the average database in the list;
  – otherwise: prune the list to maintain consistency, obtaining DD_j from DD_{j-1}.

Page 34

• Initialize DD_0 = {all databases of size m over U}.
• In each round, DD_{j-1} = {x_1, x_2, …} where each x_i has size m.

For each query c_1, c_2, …, c_k in turn:
• Let A_j ← Average_{x_i ∈ DD_{j-1}} min{d(x*, x_i), √n}, where x* is the true database. (Low sensitivity!)
• If A_j is small (noisy threshold): answer according to the median DB in DD_{j-1}, and set DD_j ← DD_{j-1}.
• If A_j is large: give the true answer plus noise, and remove all DBs that are far away to get DD_j.
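A heavily simplified, hedged sketch of this loop (the candidate set is a small explicit list rather than all size-m databases, and the noise and thresholds are illustrative, so this is bookkeeping, not a privacy proof):

```python
import random, statistics

def run_mechanism(D, candidates, queries, threshold=5.0, noise=1.0):
    """Sketch of the interactive mechanism from the slides.

    D          -- the true database x* (a list of rows)
    candidates -- DD_0: candidate databases, each of size m
    queries    -- counting queries c_1, c_2, ..., arriving one by one
    """
    n, DD = len(D), list(candidates)
    answers = []
    for c in queries:
        true = sum(c(x) for x in D)
        # distance of each candidate's (rescaled) answer from the truth
        dists = [abs(sum(c(x) for x in cand) * n / len(cand) - true)
                 for cand in DD]
        A = statistics.mean(min(d, n ** 0.5) for d in dists)  # low sensitivity
        if A <= threshold + random.uniform(-noise, noise):    # noisy threshold
            # small round: answer with the median candidate answer
            answers.append(statistics.median(
                sum(c(x) for x in cand) * n / len(cand) for cand in DD))
        else:
            # large round: true answer plus noise, then prune far candidates
            answers.append(true + random.uniform(-noise, noise))
            DD = [cand for cand, d in zip(DD, dists) if d <= threshold]
            assert DD, "DD_j must never become empty"
    return answers
```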

Page 35

Need to show:

Accuracy and functionality:
• the result is accurate;
• if A_j is large, many of the x_i ∈ DD_{j-1} are removed;
• DD_j is never empty.

Privacy:
• not many rounds have a large A_j;
• can release the large rounds;
• can release the noisy answers.

Page 36

Why can we release when large rounds occur?

• Do not expect more than O(m) large rounds.
• Make the threshold noisy.

For every pair of neighboring databases D and D’:
• consider the vector of thresholds;
• if A_j is far from the threshold, the outcome can be the same in both;
• if A_j is close to the threshold, we can correct at a (privacy) cost;
  – this cannot occur too frequently.
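This noisy-threshold trick is essentially what is now called the sparse vector technique; here is a hedged sketch of the core comparison (the noise scales are illustrative). One noisy threshold is shared across the small rounds and refreshed only after a large round, which is why the many small rounds are nearly free and only the O(m) large rounds pay.

```python
import random

def lap(scale: float) -> float:
    # Laplace sample as a difference of two exponentials
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def large_rounds(stats, threshold: float, eps: float):
    """Return the indices j where A_j crosses the (noisy) threshold."""
    noisy_T = threshold + lap(2 / eps)
    out = []
    for j, A in enumerate(stats):
        if A + lap(4 / eps) > noisy_T:
            out.append(j)
            noisy_T = threshold + lap(2 / eps)  # fresh noise after a large round
    return out
```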

Page 37

Why is There a Good x_i?

• Queries with low sensitivity.

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants of D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is anyway inherent in statistical analysis.

[Setting: universe U, database D of size n, query c]

A sample F of size m approximates D on all the given c’s.

Page 38

There exists F_good of size m = Õ((n/α)²·log|C|) s.t. max_{c_j} dist(F_good, D) ≤ α.

For α = Õ(n^{2/3} log|C|), m is Õ(n^{2/3} log|C|).
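A hedged simulation of the sampling claim (toy sizes, not the Õ bounds above): draw a uniform subsample F of size m from D and check the worst additive error over a family of counting queries.

```python
import random

n, m = 10_000, 400
D = [random.random() for _ in range(n)]                       # toy database
queries = [lambda x, t=t / 20: int(x <= t) for t in range(1, 20)]

F = random.sample(D, m)                                       # the small DB
worst = max(abs(sum(c(x) for x in D) - sum(c(x) for x in F) * n / m)
            for c in queries)
print(f"max additive error of the size-{m} sample: {worst:.1f} (out of n={n})")
```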