Foundations of Privacy Lecture 6 Lecturer: Moni Naor


Recap of last week’s lecture

• Counting Queries
  – The BLR Algorithm
  – Efficient Algorithm
  – Hardness Results

Synthetic DB: Output is a DB

[Figure: a user sends query 1, query 2, . . . to the Sanitizer, which holds the Database and returns answer 1, answer 2, answer 3.]

Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating the query on the output DB.

• Software and people compatible
• Consistent answers

Counting Queries

• Queries with low sensitivity

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants in D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: error is inherent in statistical analysis anyway.

Non-interactive: assume all queries are given in advance.

[Figure: universe U, database D of size n, query c.]
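A minimal sketch of a counting query and of the relaxed accuracy requirement, in Python. The toy rows, the predicate names and the helper `within_alpha` are illustrative assumptions, not part of the lecture.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[int, bool]  # hypothetical universe U: (age, smoker) pairs


def counting_query(db: Sequence[Row], c: Callable[[Row], int]) -> int:
    """How many participants of the database satisfy the predicate c?"""
    return sum(c(row) for row in db)


def within_alpha(answer: float, db: Sequence[Row],
                 c: Callable[[Row], int], alpha: float) -> bool:
    """Relaxed accuracy: the released answer is within alpha of the true count."""
    return abs(answer - counting_query(db, c)) <= alpha


# Toy database D of size n = 4 and two predicates c: U -> {0, 1}.
db = [(34, True), (51, False), (29, True), (62, True)]
is_smoker = lambda row: int(row[1])
over_50 = lambda row: int(row[0] > 50)
```

Changing one row moves any count by at most 1, which is what "low sensitivity" refers to.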

And Now… Bad News

Runtime cannot be subpoly in |C| or |U|
• Output is synthetic DB (as in positive result)
• General output

Exponential Mechanism cannot be implemented

Want hardness… Got Crypto?

The Bad News

For large C and U can’t get efficient sanitizers!
• Output is synthetic DB (as in positive result)
• General output

Exponential Mechanism cannot be implemented

Want hardness… Got Crypto?

Showing (Cryptographic) Hardness

• Have to come up with a universe U and a concept class C
• A distribution on
  – databases
  – concepts
  that is hard to sanitize
• The distribution may use cryptographic primitives

Digital Signatures

Digital signatures with key pair (sk, vk)

Can build from one-way function [NaYu,Ro]

[Figure: the pairs m_1 sig(m_1), m_2 sig(m_2), …, m_n sig(m_n) are valid signatures under vk; it is hard to forge a new signature m’ sig(m’).]

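Signatures ⇒ No Synthetic DB

Universe: (m, s) message, signature pairs
Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk

[Figure: the input rows m_1 sig(m_1), m_2 sig(m_2), …, m_n sig(m_n) are valid signatures under vk. The sanitizer outputs rows m’_1 s_1, …, m’_k s_k; to be useful, most must be valid signatures under the same vk.]

Since new signatures are hard to forge, the valid output rows are copies of input rows. Inputs appear in output: no privacy!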
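The argument can be illustrated with a small Python sketch. As an assumption for illustration only, HMAC-SHA256 from the standard library stands in for the signature scheme; it is a MAC, so the "verification key" here equals the signing key, unlike the (sk, vk) pair on the slide. The names `sign`, `c_vk` and `valid_count` are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, msg: bytes) -> bytes:
    """Stand-in for sig(m): an HMAC tag (assumption, not a real signature)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def c_vk(key: bytes, row) -> int:
    """The query c_vk(m, s): 1 iff s is a valid tag of m under the key."""
    m, s = row
    return int(hmac.compare_digest(sign(key, m), s))


def valid_count(key: bytes, rows) -> int:
    """Counting query: how many rows carry a valid signature."""
    return sum(c_vk(key, row) for row in rows)


key = secrets.token_bytes(32)
db = [(f"record-{i}".encode(), sign(key, f"record-{i}".encode()))
      for i in range(5)]

# A keyless "sanitizer" has two options for its synthetic rows:
copied = db[:3]                                    # useful, but leaks input rows
fresh = [(f"fake-{i}".encode(), secrets.token_bytes(32))
         for i in range(5)]                        # private, but useless for c_vk
```

On the true database every row satisfies c_vk, so a synthetic database with a high count must contain valid pairs; without the key, the only valid pairs available are the input rows themselves.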

Can We Output Synthetic DB Efficiently?

[Table: |C| (subpoly, poly) against |U| (subpoly, poly); entries: ? ? ?]

Where is the Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Hardness on Average

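Universe: (vk, m, s) key, message, signature triples
Queries: c_i(vk, m, s) - i-th bit of ECC(vk), where ECC is an error-correcting code
         c_v(vk, m, s) - 1 iff s is a valid signature of m under vk

[Figure: the input rows (vk, m_1, sig(m_1)), (vk, m_2, sig(m_2)), …, (vk, m_n, sig(m_n)) are valid signatures under vk; the sanitizer outputs rows (vk’_1, m’_1, s_1), …, (vk’_k, m’_k, s_k).]

Are these keys related to vk? Yes! At least one is vk!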

Hardness on Average

Samples: (vk, m, s) key, message, signature triples
Queries: c_i(vk, m, s) - i-th bit of ECC(vk)
         c_v(vk, m, s) - 1 iff s is a valid signature of m under vk

Sanitizer output: rows (vk’_1, m’_1, s_1), …, (vk’_k, m’_k, s_k)

∀ i, 3/4 of the vk’_j have ECC(vk’_j)[i] = ECC(vk)[i]
⇒ ∃ vk’_j s.t. ECC(vk’_j) and ECC(vk) are 3/4-close (by averaging over the positions i)
⇒ vk’_j = vk (error-correcting code)
⇒ m’_j appears in input. No privacy!

Are these keys related to vk? Yes! At least one is vk!

Where is Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Ullman-Vadhan: even marginals on 2 variables are hard

Can We Output Synthetic DB Efficiently?

[Table: |C| (subpoly, poly) against |U| (subpoly, poly); entries shown: ? ? ?, Signatures, Hard on Avg., Using PRFs]

Hardness with PRFs

• Let F = {f_s | s seed} be a family of pseudo-random functions. Length of seed = k.
• Pseudo-random functions: a family of efficiently computable functions such that
  – a random function from the family is indistinguishable (via black-box access) from a truly random function.

f_s: [ℓ] → [ℓ]

• Data universe U = {(a, b) : a, b ∈ [ℓ]}. (Polynomial size)
• Concepts C = {c_s | s seed}. (Polynomial size)

c_s((a, b)) = 1 iff f_s(a) = b

The Hard-to-sanitize Distribution

The distribution D on samples:
• Generate a key s ∈ {0, 1}^k
• Generate n distinct elements a_1, ..., a_n ∈ [ℓ].
• The i-th entry in the database X is x_i = (a_i, f_s(a_i)).
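The sampling procedure can be sketched in Python as follows. This is an illustration under stated assumptions: HMAC-SHA256 reduced mod ℓ stands in for the pseudo-random function f_s, [ℓ] is taken to be {0, …, ℓ-1}, and the seed length is given in bytes (`k_bytes`) rather than bits. All function names are hypothetical.

```python
import hashlib
import hmac
import random


def f(seed: bytes, a: int, ell: int) -> int:
    """Stand-in PRF f_s: [ell] -> [ell] (assumption: HMAC-SHA256 mod ell)."""
    digest = hmac.new(seed, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % ell


def concept(seed: bytes, ell: int):
    """c_s((a, b)) = 1 iff f_s(a) = b."""
    return lambda row: int(f(seed, row[0], ell) == row[1])


def sample_database(n: int, ell: int, k_bytes: int, rng: random.Random):
    """Draw a seed s and a database X of n rows (a_i, f_s(a_i)), a_i distinct."""
    seed = rng.getrandbits(8 * k_bytes).to_bytes(k_bytes, "big")
    a_values = rng.sample(range(ell), n)  # n distinct elements of [ell]
    return seed, [(a, f(seed, a, ell)) for a in a_values]


def guess_hits(seed: bytes, ell: int, seen, trials: int, rng: random.Random) -> int:
    """Count lucky guesses (a, b) with a outside the database and random b."""
    c_s = concept(seed, ell)
    unseen = [a for a in range(ell) if a not in seen]
    return sum(c_s((rng.choice(unseen), rng.randrange(ell))) for _ in range(trials))
```

Every row of X satisfies c_s, while a row built from a fresh a and a blind guess for b satisfies it with probability about 1/ℓ.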

Claim: any differentially private sanitizer A cannot be better than 1/3 correct

• The function f_s is a pseudorandom function
  – with overwhelming probability over the choice of seed s, for any a ∈ [ℓ] that does not appear in a_1, ..., a_n, a sanitizer A cannot predict f_s(a) any better than it could predict a random function

Expect: no more than a (1/ℓ + neg())-fraction of the a’s in A(X) that are not in X appear most frequently with the correct b.

Suppose this event does not occur. Since all of the items in the input X satisfy the concept c_s, … i.e. with probability noticeably greater than 1/ℓ.

General output sanitizers

Theorem

Traitor tracing schemes exist if and only if sanitizing is hard

Tight connection between the |U|, |C| that are hard to sanitize and the key, ciphertext sizes in traitor tracing

Separation between efficient/non-efficient sanitizers uses the [BoSaWa] scheme

Traitor Tracing: The Problem

• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: unauthorized decryption devices
• Given a Pirate Box, want to find who leaked the keys

[Figure: E(Content) is fed to a Pirate Box built from keys K1, K3, K8, which outputs Content.]

Traitors’ “privacy” is violated!

Need semantic security!

Traitor Tracing ⇒ Hard Sanitizing

A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace.

Setup: generates a key bk for the broadcaster and N subscriber keys k_1, . . . , k_N.

Encrypt: given a bit b, generates a ciphertext using the broadcaster’s key bk.

Decrypt: takes a given ciphertext and, using any of the subscriber keys, retrieves the original bit.

Trace: gets bk and oracle access to a pirate decryption box. Outputs the index i ∈ {1, . . . , N} of a key k_i used to create the pirate box.

Simple Example of Tracing Traitor

• Let E_K(m) be a good shared-key encryption scheme
• Key generation: generate independent keys for E: bk = k_1, . . . , k_N
• Encrypt: for bit b generate independent ciphertexts E_K1(b), E_K2(b), …, E_KN(b)
• Decrypt: using k_i, decrypt the i-th ciphertext
• Tracing algorithm: using a hybrid argument

Properties: ciphertext length N, key length 1.

Equivalence of TT and Hardness of Sanitizing

Traitor Tracing   ↔   Sanitizing hard
Key               ↔   (collection of) Database entry
Ciphertext        ↔   (collection of) Query
TT Pirate         ↔   Sanitizer for distribution of DBs

Traitor Tracing ⇒ Hard Sanitizing

Theorem: If there exists a TT scheme with
– cipher length c(n),
– key length k(n),
can construct:
1. Query set C of size ≈ 2^c(n)
2. Data universe U of size ≈ 2^k(n)
3. Distribution D on n-user databases with entries from U

D is “hard to sanitize”: there exists a tracer that can extract an entry in D from any sanitizer’s output. Violate its privacy!

Separation between efficient/non-efficient sanitizers uses the [BoSaWa06] scheme

Need semantic security!


Collusion

Important parameter of a traitor-tracing scheme:
• its collusion-resistance
• A scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder.
• When t = N the scheme is said to be fully resilient.
• Other parameters: ciphertext and private key lengths c(n) and k(n).

One-time t-resilient TT scheme: semantic security is only guaranteed against adversaries given a single ciphertext

Need it

• Data universe: all possible keys, U = {0,1}^k(n).
• Concept class C: a concept for every possible ciphertext, i.e. for every m ∈ {0,1}^c(n)
  – The concept c_m on input a key-string K outputs the decryption of m using the key K
• Hard-to-sanitize distribution:
  – run Setup to generate n decryption keys for the users; these form the database X.
• Can view any sanitizer that maintains utility as
  – an adversary that outputs an “object” that decrypts encryptions of 0 or 1 correctly.
• We can use the traitor-tracing algorithm on such a sanitizer to trace one of the keys in the input of the sanitizer.

From Hard to Sanitize to Tracing Traitors

Given hard-to-sanitize distributions, can create a weak TT scheme:

Ciphertext: generate database of individuals.
• Each key is a separate subset.
• Ciphertext corresponds to queries: knowing individuals allows approximating the query on the database
• Need coordination between the different parts, since the approximations may differ.

Interactive Model

[Figure: a user sends query 1, query 2, … to the Sanitizer, which holds the Data. Multiple queries, chosen adaptively.]

Counting Queries: answering queries interactively

Relaxed accuracy: answer each query within α additive error w.h.p.

• Queries given one by one and should be answered.

[Figure: universe U, query c. Interactive.]

Can we answer queries when not known in advance?

• Can always answer with independent noise
  – Limited to a number of queries that is smaller than the database size.
• We do not know the future but we do know the past!
  – Can answer based on past answers
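The independent-noise baseline can be sketched in Python. This assumes Laplace noise and an even split of a total privacy budget ε across the queries (basic composition); the slides do not fix these choices, and the function names are hypothetical.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noise_scale(num_queries: int, epsilon: float) -> float:
    """Per-query noise scale for sensitivity-1 counts when epsilon is split evenly."""
    return num_queries / epsilon


def answer_independently(db, queries, epsilon: float, rng: random.Random):
    """Answer each counting query with fresh, independent noise."""
    scale = noise_scale(len(queries), epsilon)
    return [sum(c(x) for x in db) + laplace(scale, rng) for c in queries]
```

The noise scale grows linearly with the number of queries, so once that number approaches the database size the noise is comparable to the counts themselves.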

Idea: Maintain list of Possible Databases

• Start with DD_0 = list of all databases of size m
• Each round j:
  – if list DD_{j-1} is representative: answer according to average database in list
  – otherwise: prune the list to maintain consistency (DD_{j-1} → DD_j)

Low sensitivity!

• Initialize DD_0 = {all databases of size m over U}.
• Each round DD_{j-1} = {x_1, x_2, …} where each x_i is of size m

For each query c_1, c_2, …, c_k in turn:
• Let A_j ← Average over x_i ∈ DD_{j-1} of min{d(x*, x_i), √n}
• If A_j is small (noisy threshold): answer according to median db in DD_{j-1}
  – DD_j ← DD_{j-1}
• If A_j is large (noisy threshold): remove all db’s that are far away to get DD_j
  – Give true answer plus noise
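A toy Python sketch of the rounds above, under simplifying assumptions: answers are fractions rather than counts (so the cap min{·, √n} is dropped), the distance d is the difference of answers on the current query, the threshold is α/2, "far away" means more than α from the noisy true answer, and noise is Laplace. None of these constants come from the slides, and the list is enumerated explicitly, which is feasible only for tiny U and m.

```python
import itertools
import random
import statistics


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample; scale == 0 switches the noise off."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def fraction(db, c) -> float:
    """Fraction of rows of db satisfying predicate c."""
    return sum(c(u) for u in db) / len(db)


def answer_interactively(true_db, queries, universe, m, alpha, noise, rng):
    # DD_0: all databases (multisets) of size m over the universe.
    candidates = list(itertools.combinations_with_replacement(universe, m))
    answers, large_rounds = [], 0
    for c in queries:
        truth = fraction(true_db, c)
        guesses = [fraction(x, c) for x in candidates]
        # A_j: average distance between the true answer and the candidates'.
        a_j = sum(abs(truth - g) for g in guesses) / len(guesses)
        if a_j + laplace(noise, rng) < alpha / 2:
            # Small round: answer from the list, which stays unchanged.
            answers.append(statistics.median(guesses))
        else:
            # Large round: release a noisy true answer and prune far databases.
            large_rounds += 1
            noisy_truth = truth + laplace(noise, rng)
            candidates = [x for x, g in zip(candidates, guesses)
                          if abs(g - noisy_truth) <= alpha]
            answers.append(noisy_truth)
    return answers, large_rounds, candidates
```

Only large rounds touch the true database beyond the threshold test, and each one shrinks the list, which is why the number of large rounds is bounded.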

Need to show

Accuracy and functionality:
• The result is accurate
• If A_j is large: many of the x_i ∈ DD_{j-1} are removed
• DD_j is never empty

Privacy:
• Not many large A_j
• Can release large rounds
• Can release noisy answers.

Why can we release when large rounds occur?

• Do not expect more than O(m) large rounds
• Make the threshold noisy

For every pair of neighboring databases D and D’:
• Consider vector of thresholds
• If far away from threshold: can be the same in both
• If close to threshold: can correct at cost
  – Cannot occur too frequently

Why is there a good x_i?

• Queries with low sensitivity
• A sample F of size m approximates D on all given c

There exists x = F_good of size m = Õ((n/α)^2 · log|C|) s.t. max over c_j of dist(F_good, D) ≤ α.

For α = Õ(n^(2/3) log|C|), m is Õ(n^(2/3) log|C|).

[Figure: universe U, query c.]
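A small Python experiment illustrating why a good small database exists: a uniform random sample of the rows approximates the true database on every query of a fixed class at once. The universe, the threshold queries and the sizes are illustrative assumptions; errors are reported as fractions, so multiply by n to compare with the additive error α on counts.

```python
import random


def fraction(db, c) -> float:
    """Fraction of rows of db satisfying predicate c."""
    return sum(c(x) for x in db) / len(db)


def max_error(db, sample, queries) -> float:
    """Worst disagreement between sample and database over the query class."""
    return max(abs(fraction(db, c) - fraction(sample, c)) for c in queries)


rng = random.Random(0)
db = [rng.randrange(100) for _ in range(10_000)]                 # n = 10,000
queries = [lambda x, t=t: int(x < t) for t in range(5, 100, 5)]  # |C| = 19
sample = [rng.choice(db) for _ in range(2_000)]                  # m = 2,000

worst = max_error(db, sample, queries)
```

A Chernoff bound per query plus a union bound over C is what makes the required sample size scale with log|C|.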
