View
216
Download
1
Embed Size (px)
Citation preview
Practical Private Computation and Zero-Knowledge Tools for Privacy-PreservingDistributed Data Mining
Yitao Duan and John Cannyhttp://www.cs.berkeley.edu/~duan
Berkeley Institute of DesignComputer Science Division
University of California, Berkeley
Goal To provide practical solutions with
provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
Goal To provide practical solutions with
provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
The Scenario Two data miners mine data from n users The data miners are semi-honest: follow
the protocol but try to get more info Some fraction of users can be malicious:
they may input bogus data to disrupt the computation
A more realistic adversary model than most existing privacy-preserving data mining schemes
Model
……u1
d1
u2
d2
un-1
dn-1
un
dn
fChallenge: standard cryptographic tools not feasible at large scale
Must be obfuscated
di in Zφm
φ: < 32 or 64-bit
A Practical Solution Provable privacy: Cryptography Efficiency:
VSS over small field. Minimize the number of expensive primitives
and rely on probabilistic guarantee Realistic adversary model: An extremely
efficient zero-knowledge proof to bound the L2-norm of a user’s vector. An effective way to limit the influence malicious users could have on the computation
Basic Approach
……u1
d1
u2
d2
un-1
dn-1
un
dn
Σf =No leakage beyond final result for many algorithmsor differential privacy [Dwork06]
Cryptographic privacy
The Power of Addition A large number of popular algorithms can be
run with addition-only steps
Linear algorithms: voting and summation, nonlinear algorithm: regression, SVD, PCA, k-means, ID3, EM etc
All algorithms in the statistical query model [Kearns 93]
Many other gradient-based numerical algorithms A trick used a lot for parallelization in distributed
computing [Chu 06, Das 07]
Addition-only framework has very efficient private implementation in cryptography and admits efficient ZKPs
Private Addition
The computation: secret sharing over small field
Malicious users: efficient zero-knowledge proof to bound the L2-norm of the user vector
Big Integers vs. Small Ones Most applications work with “regular-sized”
integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10-9 sec)
Public-key operations (e.g. used in encryption and verification) must use keys with sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10-3 sec)
A 6 orders of magnitude difference!
viui
ui + vi = di
Private Addition
di: user i’s private vector. ui,,vi and di are all in a small integer field
μ = Σui ν = Σvi
ui + vi = di
Private Addition
ui + vi = di
μ = Σui ν = Σvi
μ
ν
Private Addition
μ + ν
Private Addition
Provable privacy Computation on each server is
over small field: same cost as non-private implementation – O(m) small field operations
So the cost for privacy is only due to verification For that we have a solution that
involves only O(log m) large field operations
Private Addition
The Need for Verification Private computation obfuscates
user data. A malicious user could input anything.
Think of a voting scheme: “Please place your vote 0 or 1 in the envelope”
Bush 100,000Gore -100,000
I can prove that I know X without disclosing what X is.
I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed)
I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)
Zero Knowledge Proofs
Bounding the L2-Norm A natural and effective way to restrict a
cheating user’s malicious influence You must have a big vector to produce
large influence on the sum Perturbation theory bounds system
change with norms:|σi(A) - σi(B)| ≤ ||A-B||2 [Weyl]
Can be the basis for other checks Setting L = 1 forces each user to have only 1
vote
An Efficient ZKP of Boundedness
Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector is small.
The server asks for some random projections of the user’s vector, and expects the user to prove that the square sum of them is small.
• O(log m) public key crypto operations (instead of O(m)) to prove that the L-2 norm of an m-dim vector is smaller than L.
• Running time reduced from hours to seconds.
Random Projection-basedL2-Norm ZKP Server generates N random m-vectors in {-1, 0, +1}m with i.i.d. probability {¼, ½,
¼} User projects his data to the N directions.
provides ZKP that the square sum of the projections < NL2/2
Expensive public key operations are only on the projections and the square sum
Effectiveness
23/4/19
Acceptance/rejection Probabilities
(a) Linear and (b) log plots of probability of user input acceptance as a function of |d|/L for N = 50. (b) also includes probability of rejection. In each case, the steepest (jagged curve) is the single-value vector (case 3), the middle curve is Zipf vector (case 2) and the shallow curve is uniform vector (case 1)
Performance Evaluation
(a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.
• Standard technique takes 6 to 10 hours at m = 106
Current Status The protocols (the L2-norm ZKP and the
private vector addition) have been implemented
Adding more mid-tier components In Java using native code for big integer Runs on Linux platform Made an open-source toolkit for building
privacy-preserving real-world applications