Private Inference Control
David Woodruff (MIT)
Joint work with Jessica Staddon (PARC)
Contents
1. Background
   1. Access Control and Inference Control
   2. Our contribution: Private Inference Control (PIC)
   3. Related Work
2. PIC model & definitions
3. Our Results
4. Conclusions
Access Control
Server: DB of n records
• User queries a database. Some info in the DB is sensitive.
• Access control prevents the user from learning individual sensitive relations/attributes.
• Does access control prevent the user from learning sensitive info?
User: What’s Bob’s salary?
Server: Sensitive. Access denied.
Inference Control
Name              Job                Salary
Alyssa P. Hacker  Software Engineer  $90,000
Paul E. Nomial    Mathematician      $31,415
…                 …                  …
• Combining non-sensitive info may yield something sensitive
• Inference Channel: {(name, job), (job, salary)}
• Inference Control: block all inference channels
Query 1: How much does Alyssa make? Answer: Sensitive.
Query 2: What is Alyssa’s job? Answer: Software Engineer
Query 3: How much do software engineers make? Answer: $90,000
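The inference channel above can be illustrated with a minimal Python sketch. The two dictionaries and the function name `infer_salary` are illustrative assumptions, not part of the talk: each lookup is non-sensitive on its own, yet chaining them reveals the sensitive salary.

```python
# Two individually non-sensitive relations (hypothetical toy data from the slide).
name_to_job = {
    "Alyssa P. Hacker": "Software Engineer",
    "Paul E. Nomial": "Mathematician",
}
job_to_salary = {
    "Software Engineer": 90000,
    "Mathematician": 31415,
}


def infer_salary(name):
    """Chain (name, job) and (job, salary) to recover the sensitive (name, salary)."""
    job = name_to_job[name]    # Query 2: allowed
    return job_to_salary[job]  # Query 3: allowed, but completes the inference


if __name__ == "__main__":
    print(infer_salary("Alyssa P. Hacker"))
```

Blocking the direct query (Query 1) is therefore not enough; the pair of relations itself has to be treated as a channel.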
Inference Control
• Database $x \in (\{0,1\}^m)^n$
• DB of n records, m attributes 1, …, m per record; n tending to infinity, m = O(1)
• Inference engine: generates a collection C of subsets of [m] denoting all the inference channels
    • We assume we have such an engine [QSKLG93] (exhaustive search)
    • $F \in C$ means that for all i, the user shouldn’t learn $x_{i,j}$ for all $j \in F$
• Assume C is monotone.
• Assume C is input to both user and server
    • User learns C anyway when his queries are blocked
    • C is data-independent, reveals info only about attributes
Our contribution: Private Inference Control
Existing inference control schemes require server to learn user queries to check if they form an inference
Our goal: user Privacy + Inference Control = PIC
This talk: arbitrary malicious users U*, semi-honest S
Privacy: polytime S learns nothing about honest user’s queries except # made so far
# queries made so far enables S to do inference control
Private and symmetrically-private information retrieval
    Not sufficient since they are stateless
    User’s permissions change over time
Generic secure function evaluation
    Not efficient: our communication is exponentially smaller
Application
Government analysts inspect repositories for terrorist patterns
1. Inference Control: prevent analysts from learning sensitive info about non-terrorists.
2. User Privacy: prevent server from learning what analysts are tracking. If discovered, this info could go to terrorists!
Related Work
Data perturbation [AS00, B80, TYW84]
    So much noise is required that the data is not as useful [DN03]
Adaptive Oblivious Transfer [NP99]
    One record can be queried adaptively at most k times
Priced Oblivious Transfer [AIR01]
    One record, supports more inference channels than the threshold version considered in [NP99]
We generalize [NP99] and [AIR01]
    Arbitrary inference channels and multiple records
    More efficient/private than parallelizing [NP99] and [AIR01] on each record
The Model
Offline Stage: S is given x, C, $1^k$, and can preprocess x
Online Stage: at time t, honest U generates query $(i_t, j_t)$
    $(i_t, j_t)$ can depend on all prior info/transactions with S
Let T denote all queries U makes: $(i_1, j_1), \ldots, (i_{|T|}, j_{|T|})$
    T is a random variable: it depends on U’s code, x, and randomness
    T is permissible if there is no i s.t. $(i, j) \in T$ for all $j \in F$, for some $F \in C$
    We require honest U to generate permissible T.
U and S interact in a multiround protocol, then U outputs $out_t$
$View_U$ consists of C, n, m, $1^k$, all messages from S, randomness
$View_S$ consists of C, n, m, $1^k$, x, all messages from U, randomness
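A minimal sketch of the permissibility condition on T, assuming queries are given as (i, j) pairs and C as a list of attribute sets. The function name `is_permissible` and the example channel are hypothetical, chosen only for illustration.

```python
def is_permissible(queries, channels):
    """Return True if no record i has every attribute of some channel F queried.

    queries:  iterable of (i, j) pairs (record index, attribute index)
    channels: iterable of frozensets of attribute indices (the collection C)
    """
    attrs_by_record = {}
    for i, j in queries:
        attrs_by_record.setdefault(i, set()).add(j)
    # T is not permissible as soon as one record covers a whole channel F.
    return not any(
        channel <= attrs
        for attrs in attrs_by_record.values()
        for channel in channels
    )


if __name__ == "__main__":
    C = [frozenset({1, 2})]  # assumed channel: attributes 1 and 2 together
    print(is_permissible([(0, 1), (1, 2)], C))  # attributes spread over two records
    print(is_permissible([(0, 1), (0, 2)], C))  # both attributes of record 0
```

Because the check is a subset test, listing only the minimal channels of a monotone C gives the same answer as listing all of C.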
Security Definitions
Correctness: For all x, C, for all honest users U, for all $t \in [|T(U, x)|]$, $out_t = x_{i_t, j_t}$
User Privacy: For all x, C, for all honest U, for any two sequences $T_1, T_2$ with $|T_1| = |T_2|$, for all semi-honest servers S* and random coin tosses of S*:
    $(View_{S^*} \mid T(U, x) = T_1) \approx (View_{S^*} \mid T(U, x) = T_2)$
Inference Control: Comparison with the ideal model. For every U*, every x, any random coins of U*, and every C, there exists a simulator U’ interacting with trusted party Ch for which $View_{U^*} \approx View_{\langle U', Ch \rangle}$, where U’ just asks Ch for tuples $(i_t, j_t)$ that are permissible
Efficiency
Efficiency measures are per query
Minimize communication & round complexity
    Ideally O(polylog(n)) bits and 1 round
Minimize server’s time-complexity
    Ideally O(n) without preprocessing
    W/preprocessing, potentially better, but O(n) optimal w.r.t. known single-server PIR schemes
Our Results
For any PIR scheme, let C(n), W(n) denote communication and server work for DB size n
PIC scheme #1
    Communication: $O(k \log n \, C(n^2))$, 1-round
    Work: $O(k \log n \, W(n^2))$
PIC scheme #2
    Communication: $O(k(n + C(n)))$, O(1)-round
    Work: $O(k(n + W(n)))$
Plugging in best PIR parameters:
    Scheme #1: comm. O(polylog(n)), work $O(n^2)$
    Scheme #2: comm. & work: O(n polylog(n))
A Generic Reduction
A protocol is a threshold PIC (TPIC) if it satisfies the definitions of a PIC scheme assuming C = {[m]}.
Theorem (roughly speaking): If there exists a TPIC with communication C(n), work W(n), and round complexity R(n), then there exists a PIC with communication O(C(n)), work O(W(n)), and round complexity O(R(n)).
PIC ideas:
[Figure: table of encryptions]
User/server do SPIR on table of encryptions
Idea: Encryptions of both data and keys that will help user decrypt encryptions on future queries
User can only decrypt if he has the appropriate keys, which is only possible if he is not in danger of making an inference
Stateless PIC
Minimizing communication is a data structures problem
What type of keys require least communication for user to:
1. Update as user makes new queries?
2. Prove user not in danger of making an inference on current/future queries?
Keys must prevent replay attacks: user can’t use “old” keys to pretend to have made fewer queries to records than he actually has
PIC Scheme #1 – Stage 1
Let E be a homomorphic semantically secure encryption scheme (e.g., Paillier)
Suppose we allow accessing each record at most once
User holds PK, SK; server holds PK
User’s query: $(i_3, j_3)$
User sends: $E(i_3)$, $E(j_3)$, ZKPOK
Server maps: $E(i_1) \to E(r_1(i_1 - i_3))$, $E(i_2) \to E(r_2(i_2 - i_3))$
User recovers $r_1, r_2$ iff he hasn’t previously accessed $i_3$
From $r_1$ and $r_2$ the user can reconstruct a secret $S_3$
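The masking step of Stage 1 can be sketched with a toy Paillier implementation. This is a hedged illustration, not the talk's protocol: the primes are tiny and insecure, the ZKPOK and the secret-sharing step are omitted, and the names `encrypt`, `decrypt`, `server_mask`, and `user_unmask` are hypothetical. It assumes the user holds the decryption key and that record indices are smaller than both primes, so that a nonzero index difference is invertible mod N.

```python
import math
import secrets

# Toy parameters: small primes for illustration only (NOT secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+)


def encrypt(m):
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption; only the key holder (here: the user) can run this."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def server_mask(enc_old, enc_new, r):
    """Homomorphically compute E(r * (old - new)) from E(old) and E(new)."""
    enc_diff = enc_old * pow(enc_new, -1, N2) % N2  # E(old - new)
    return pow(enc_diff, r, N2)                     # E(r * (old - new))


def user_unmask(cipher, old, new):
    """Recover r if old != new; if old == new the plaintext is 0 and r is hidden."""
    d = (old - new) % N
    if d == 0:
        return None
    return decrypt(cipher) * pow(d, -1, N) % N


if __name__ == "__main__":
    r1 = 4242
    fresh = server_mask(encrypt(3), encrypt(7), r1)   # new record 7, old record 3
    repeat = server_mask(encrypt(7), encrypt(7), r1)  # record 7 queried again
    print(user_unmask(fresh, 3, 7), decrypt(repeat))
```

When the new index repeats an old one, the ciphertext decrypts to 0 regardless of r, which is why the user learns nothing about that share.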
PIC Scheme #1 – Stage 2
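User’s query: $(i_3, j_3)$
User holds PK, SK; server holds PK
User sends: $E(i_3)$, $E(j_3)$, ZKPOK
User recovers $S_3$
Server’s table of encryptions:
    $E(r_{1,1}(j - j_3) + r'_{1,1}(i - i_3) + S_3 + x_{1,1})$
    $E(r_{1,2}(j - j_3) + r'_{1,2}(i - i_3) + S_3 + x_{1,2})$
    $E(r_{2,1}(j - j_3) + r'_{2,1}(i - i_3) + S_3 + x_{2,1})$
    …
User does “SPIR on records” on the table of encryptions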
PIC Scheme #1 - Wrapup
To extend to querying a record < m times: on the t-th query, let $r_1, \ldots, r_{t-1}$ be a (t-m+1)-out-of-(t-1) secret sharing of $S_t$
This scheme can be proven to be a TPIC; use the generic reduction to get a PIC
User Privacy: semantic security of E, ZK of proof, privacy of SPIR
Inference Control: user can recover at most t-m of the $r_i$ if he has already queried the record m-1 times; can build a simulator using SPIR w/knowledge extractor [NP99]
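The threshold step can be sketched with Shamir secret sharing. The talk does not name the sharing scheme, so Shamir's scheme, the prime modulus, and the function names `share` and `reconstruct` are assumptions made for illustration. With t = 6 and m = 3, the secret is split into t-1 = 5 shares with threshold t-m+1 = 4, so a user who already queried the record m-1 = 2 times holds at most t-m = 3 shares, one short of the threshold.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime, used as the field modulus (assumption)


def share(secret, threshold, count):
    """Split secret into `count` shares; any `threshold` of them reconstruct it."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, count + 1)]


def reconstruct(shares):
    """Lagrange interpolation at 0 over the given (x, y) shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    t, m = 6, 3
    s_t = 123456789
    shares = share(s_t, t - m + 1, t - 1)
    print(reconstruct(shares[: t - m + 1]) == s_t)  # enough shares
```

With fewer than t-m+1 shares, interpolation yields a value that is independent of the secret, which is the property the inference-control argument relies on.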
PIC Scheme #2 - Glimpse
polylog(n)-communication PIC
Balanced binary tree B
    Leaves are attributes
    Parents of leaves are records
Internal node n accessed when record r queried and n on path from r to root
Keys encode # times nodes in B have been accessed.
[Figure: tree B; root labeled t; keys $K_{u,a}$, $K_{v,b}$, $K_{w,c}$, $K_{x,d}$, $K_{y,e}$, $K_{z,f}$; nodes 1, 2, 3, 4; a + b = t]
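The counting invariant behind the tree can be sketched as follows. This is a simplified illustration under stated assumptions: it keeps plain access counters rather than cryptographic keys, it treats records as the bottom level (the attribute leaves are omitted), and the class name `AccessTree` and its methods are hypothetical.

```python
class AccessTree:
    """Balanced binary tree of access counters in heap layout (node 1 is the root)."""

    def __init__(self, num_records):
        size = 1
        while size < num_records:
            size *= 2
        self.size = size
        self.count = [0] * (2 * size)

    def access(self, record):
        """Increment every node on the path from the record's node to the root."""
        node = self.size + record
        while node >= 1:
            self.count[node] += 1
            node //= 2

    def children_sum_ok(self):
        """Check that each internal node's count equals the sum of its children's."""
        return all(
            self.count[v] == self.count[2 * v] + self.count[2 * v + 1]
            for v in range(1, self.size)
        )


if __name__ == "__main__":
    tree = AccessTree(4)
    for record in (0, 2, 2, 3):
        tree.access(record)
    # Root count is the total number of queries t; its children hold a and b with a + b = t.
    print(tree.count[1], tree.count[2], tree.count[3], tree.children_sum_ok())
```

Each query touches only the nodes on one root-to-record path, so the number of counters updated per query is logarithmic in the number of records.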
Conclusions
Extensions not in this talk
    Multiple users (pseudonyms)
    Collusion resistance: c-resistance => m-channel becomes collection of (m-1)/c channels.
Summary
    New Primitive: PIC
    (Almost) Communication-optimal implementations