Upload
clyde-rose
View
215
Download
0
Embed Size (px)
Citation preview
Foundations of Privacy
Lecture 6
Lecturer: Moni Naor
Recap of last week’s lecture• Counting Queries
– The BLR Algorithm– Efficient Algorithm– Hardness Results
query 1,query 2,. . .
Synthetic DB: Output is a DB
Database
answer 1answer 3
answer 2
?
Sanitizer
Synthetic DB: output also a DB (of entries from same universe X), user reconstructs answers by evaluating query on output DB
Software and people compatibleConsistent answers
Counting Queries• Queries with low sensitivity
Counting-queriesC is a set of predicates c: U {0,1}Query: how many D participants satisfy c ?
Relaxed accuracy:
answer query within α additive error w.h.pNot so bad: error anyway inherent in statistical analysis
Assume all queries given in advance
U
Database D of size n
Query c
Non-interactive
And Now… Bad News
Runtime cannot be subpoly in |C| or |U|• Output is synthetic DB (as in positive result)• General output
Exponential Mechanism cannot be implemented
Want hardness… Got Crypto?
The Bad News
For large C and U can’t get efficient sanitizers!• Output is synthetic DB (as in positive result)• General output
Exponential Mechanism cannot be implemented
Want hardness… Got Crypto?
Showing (Cryptographic) Hardness
• Have to come with universe U and concept class C• A distribution on
– databases – Conceptsthat is hard to sanitize
• The distribution may use cryptographic primitives
Digital Signatures
Digital Signatures (sk,vk)
Can build from one-way function [NaYu,Ro]
m1 sig(m1)
m2 sig(m2)
mn sig(mn)
m’ sig(m’)
valid signatures under vk
Hard to forge new signature
Signatures ! No Synthetic DB
Universe: (m,s) msg,sig pairQueries: cvk(m,s) output 1 iff s valid sig of m under vk
m1 sig(m1)
m2 sig(m2)
mn sig(mn)
sanitizerm’1 s1
m’k sk
most are valid signatures under vkinputs appear in output, no
privacy!valid signatures under same vk
Can We output Synthetic DB Efficiently?
|C|
|U|subpol
ypoly
subpoly
poly
? ?
?
Where is the Hardness Coming From?
Signature example:
Hard to satisfy a given queryEasy to maintain utility for all queries but one
More natural:
Easy to satisfy each individual queryHard to maintain utility for most queries
Hardness on Average
Universe: (vk,m,s) key,msg,sigQueries: ci(vk,m,s) - i-th bit of ECC(vk)
cv(vk,m,s) - 1 iff valid sig under vk
sanitizer
valid signatures under vk
m’1 s1vk’1m1 sig(m1)vk
m2 sig(m2)vk
mn sig(mn)vk
m’k skvk’k
are these keys related to vk?Yes! At least one is vk!
Error correcting code
Hardness on Average
Samples: (vk,m,s) key,msg,sigQueries: ci(vk,m,s) - i-th bit of ECC(vk)
cv(vk,m,s) - 1 iff valid sig under vk
m’1 s1
m’k sk
vk’1
vk’k
8 i 3/4 of vk’j agree w. ECC(vk)[i] 9 vk’j s.t. ECC(vk’j), ECC(vk) are
3/4-closevk’j = vk (error-correcting code)m’j appears in input. No privacy!
are these keys related to vk?Yes! At least one is vk!
Where is Hardness Coming From?
Signature example:
Hard to satisfy a given queryEasy to maintain utility for all queries but one
More natural:
Easy to satisfy each individual queryHard to maintain utility for most queries
Ullman-Vadhan: even marginals on 2 variables hard
Can We output Synthetic DB Efficiently?
|C|
|U|subpol
ypoly
subpoly
poly
? ?
?
Signatures Hard on Avg.Using PRFs
Hardness with PRFs• Let F={fs|s seed} be a family of Pseudo-random
functions. Length of seed = k • Pseudo-random functions: a family of efficiently computable
functions, such that– a random function from the family is indistinguishable (via black-box
access) from truly random functions.
fs: [ℓ] [ℓ]
• Data Universe U = {(a, b) : a, b 2 [ℓ]}.• Concepts = {cs|s seed}.
cs((a, b) ) = 1 iff fs(a)=b
Polynomial size
Polynomial size
The Hard-to-sanitize Distribution
The distribution D on samples • Generate a key s 2 {0, 1}k
• Generate n distinct elements a1, ... , an 2 [ℓ]. • The i-th entry in the database X is
xi = (ai, fs(ai)).
Claim: any differentially private sanitizer A cannot be better than 1/3 correct
• The function fs is a pseudorandom function– with overwhelming probability over the choice of seed s, for any a
2 [ℓ] that does not appear in a1, ... , an
A sanitizer A cannot predict fs(a) any better than it could a random function
Expect: no more than a (1/ℓ + neg())-fraction of the a’s in A(X) that are not in X to appear most frequently with the correct b.
Suppose this event does not occur. Since all of the items in the input X satisfy the concept cs
i.e. with probability noticeably greater than 1/ ℓ.
General output sanitizers
Theorem
Traitor tracing schemes exist if and only if sanitizing is hard
Tight connection between |U|,|C| hard to sanitizeand key, ciphertext sizes in traitor tracing
Separation between efficient/non-efficient sanitizersuses [BoSaWa] scheme
Traitor Tracing: The Problem• Center transmits a message to a large group • Some Users leak their keys to pirates• Pirates construct a clone: unauthorized decryption
devices
• Given a Pirate Box want to find who leaked the keys
E(Content)
K1 K3 K8
ContentPirate Box
Traitors ``privacy” is violated!
Need semantic security!
Traitor Tracing ! Hard Sanitizing A (private-key) traitor-tracing scheme consists of algorithms Setup,
Encrypt, Decrypt and Trace.Setup: generates a key bk for the broadcaster and N subscriber keys
k1, . . . , kN.
Encrypt: given a bit b generates ciphertext using the broadcaster’s key bk.
Decrypt: takes a given ciphertext and using any of the subscriber keys retrieves the original bit
Tracing algorithm: gets bk and oracle access to a pirate decryption box. Outputs an i 2 {1, . . . ,N} of a key ki used to create the pirate box
Simple Example of Tracing Traitor• Let EK(m) be a good shared key encryption sche• Key generation: generate independent keys for E
bk = k1, . . . , kN
• Encrypt: for bit b generate independent ciphertexts EK1(b),
EK2(b), … EKN
(b)
• Decrypt: using ki: decrypt ith ciphertext • Tracing algorithm: using hybrid argument
Properties: ciphertext length N, key length 1.
Equivalence of TT and Hardness of Sanitizing
Ciphertext
Key
Traitor Tracing
Database entry
Query
Sanitizing hard
TT Pirate Sanitizer
for distribution of DBs(collection of)
(collection of)
Traitor Tracing ! Hard Sanitizing TheoremIf exists TT scheme
– cipher length c(n), – key length k(n),
can construct:1. Query set C of size ≈2c(n) 2. Data universe U of size ≈2k(n) 3. Distribution D on n-user databases w\ entries from UD is “hard to sanitize”: exists tracer that can extract an entry in
D from any sanitizer’s output
Separation between efficient/non-efficient sanitizersuses [BoSaWa06] scheme
Violate its privacy!
Need semantic security!
Traitor Tracing ! Hard Sanitizing A (private-key) traitor-tracing scheme consists of algorithms Setup,
Encrypt, Decrypt and Trace.Setup: generates a key bk for the broadcaster and N subscriber keys
k1, . . . , kN.
Encrypt: given a bit b generates ciphertext using the broadcaster’s key bk.
Decrypt: takes a given ciphertext and using any of the subscriber keys retrieves the original bit
Tracing algorithm: gets bk and oracle access to a pirate decryption box. Outputs an i 2 {1, . . . ,N} of a key ki used to create the pirate box
CollusionImportant parameter of a traitor-tracing scheme• its collusion-resistance• A scheme is t-resilient if tracing is guaranteed to
work as long as no more than t keys were used to create the pirate decoder.
• When t = N scheme is said to be fully resilient. • Other parameters ciphertext and private key
lengths c(n) and k(n). One-time t-resilient TT scheme: semantic security is only guaranteed against adversaries given a single ciphertext
Need it
• Data universe: all possible keys U ={0,1}k(n).
• Concept class C: a concept for every possible ciphertext - for every m 2 {0,1}c(n) – The concept cm on input a key-string K outputs the decryption
of m using the key K
• Hard-to-sanitize distribution:– Setup to generate n decryption keys for the users, database X.
• Can view any sanitizer that maintains utility as – adversary that outputs an “object” that decrypts
encryptions of 0 or 1 correctly.
• We can use the traitor-tracing algorithm on such a sanitizer to trace one of the keys in the input of the sanitizer.
From Hard to Sanitize to Tracing Traitors
Given hard to sanitize distributions, can create a weak TT scheme:
Ciphertext: generate database of individuals.• Each key is a separate subset.• Ciphertext corresponds to queries: knowing
individuals allows approximating the query on the database
• Need coordination between the different part, since the approximations may differ.
Interactive Model
Data
Multiple queries, chosen adaptively
?
query 1query 2Sanitizer
Counting Queries: answering queries interactively
Counting-queriesC is a set of predicates c: U {0,1}Query: how many D participants satisfy c ?
Relaxed accuracy:
answer query within α additive error w.h.pNot so bad: error anyway inherent in statistical analysis
• Queries given one by one and should be answered.
U
Database D of size n
Query c
Interactive
Can we answer queries when not known in advance?
• Can always answer with independent noise– Limited to number of queries that is smaller than
database size.
• We do not know the future but we do know the past!– Can answer based on past answers
Idea: Maintain list of Possible Databases
• Start with DD0 = list of all databases of size m• Each round j:
– if list DDj-1 is representative: answer according to average database in list
– Otherwise: prune the list to maintain consistency
DDj-1 DDj
Low sensitivity!
• Initialize DD0 = {all databases of size m over U}.
• Each round DDj-1 = {x1, x2, …} where xi of size m
For each query c1, c2, …, ck in turn:
• Let Aj à Averagei 2 DDj-1 min{d(x*,xi), √n}
• If Aj is small: answer according to median db in DDj-1
– DDj à DDj-1
• If Aj is large: remove all db’s that are far away to get DDj
– Give true answer
Noisy threshold
Plus noise
Need to showAccuracy and functionality:• The result is accurate • If Aj is large: many of xi 2 DDj-1 are removed
• DDj is never empty
Privacy• Not many large Aj
• Can release large rounds• Can release noisy answers.
Why can we release when large rounds occur?
• Do not expect more than O(m) large rounds• Make the threshold noisy
For every pair of neighboring databases: D and D’• Consider vector of thresholds • If far away from threshold – can be the same in both• If close to threshold: can correct at cost
– Cannot occur too frequently
Why is there a good xi
• Queries with low sensitivity
Counting-queriesC is a set of predicates c: U {0,1}Query: how many D participants satisfy c ?
Relaxed accuracy:
answer query within α additive error w.h.pNot so bad: error anyway inherent in statistical analysis
U
Database D of size n
Query c
Sample F of size m approximates D on all
given c
m is Õ(n2/3 log|C|)
There exists x of size m =Õ((n\α)2·log|C|) s.t. maxcj dist(Fgood,D) ≤ α
For α=Õ(n2/3log|C|),