Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

Yitao Duan and John Canny
http://www.cs.berkeley.edu/~duan

Berkeley Institute of Design
Computer Science Division
University of California, Berkeley



Goal

To provide practical solutions with provable privacy and adequate efficiency, in a realistic adversary model, at reasonably large scale.


The Scenario

• Two data miners mine data from n users.
• The data miners are semi-honest: they follow the protocol but try to learn more information.
• Some fraction of the users can be malicious: they may input bogus data to disrupt the computation.
• This is a more realistic adversary model than that of most existing privacy-preserving data mining schemes.

Model

[Diagram: users u_1, u_2, ..., u_n-1, u_n hold private data d_1, d_2, ..., d_n-1, d_n, which feed a computation f. The data must be obfuscated.]

• Each d_i is in Z_φ^m, where φ is small (under 32 or 64 bits).
• Challenge: standard cryptographic tools are not feasible at large scale.

A Practical Solution

• Provable privacy: cryptography.
• Efficiency: verifiable secret sharing (VSS) over a small field. Minimize the number of expensive primitives and rely on probabilistic guarantees.
• Realistic adversary model: an extremely efficient zero-knowledge proof bounds the L2-norm of a user's vector. This is an effective way to limit the influence malicious users can have on the computation.

Basic Approach

[Diagram: users u_1, u_2, ..., u_n-1, u_n hold private data d_1, d_2, ..., d_n-1, d_n; the data enter a sum (Σ), from which f is computed.]

• The sum: cryptographic privacy.
• f: no leakage beyond the final result for many algorithms, or differential privacy [Dwork06].

The Power of Addition

• A large number of popular algorithms can be run with addition-only steps:
    • Linear algorithms: voting and summation.
    • Nonlinear algorithms: regression, SVD, PCA, k-means, ID3, EM, etc.
    • All algorithms in the statistical query model [Kearns 93].
    • Many other gradient-based numerical algorithms.
• The same trick is used a lot for parallelization in distributed computing [Chu 06, Das 07].
• The addition-only framework has a very efficient private implementation in cryptography and admits efficient ZKPs.
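
To make "addition-only steps" concrete, here is a minimal sketch of one k-means iteration in which the aggregator only ever needs the sum of per-user vectors. The function names (`local_contribution`, `kmeans_step`) and the vector layout are hypothetical, chosen for illustration; the sketch works on plain floats in the clear, whereas a private version would presumably encode values as integers in the small field.

```python
def local_contribution(point, centroids):
    """One user's vector: the point's coordinates and a count of 1 in the
    slot of its nearest centroid, zeros in every other slot."""
    dim = len(point)
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    nearest = dists.index(min(dists))
    contrib = [0.0] * (len(centroids) * (dim + 1))
    base = nearest * (dim + 1)
    contrib[base:base + dim] = point
    contrib[base + dim] = 1.0
    return contrib


def kmeans_step(points, centroids):
    """Update the centroids using only the coordinate-wise SUM of the
    users' contribution vectors (the addition-only step)."""
    dim = len(centroids[0])
    contributions = [local_contribution(p, centroids) for p in points]
    total = [sum(column) for column in zip(*contributions)]
    new_centroids = []
    for k, old in enumerate(centroids):
        base = k * (dim + 1)
        count = total[base + dim]
        if count:
            new_centroids.append([s / count for s in total[base:base + dim]])
        else:
            new_centroids.append(list(old))  # empty cluster: keep old centroid
    return new_centroids
```

Only `total` depends on the users' data, so that single vector sum is the step a private addition protocol would need to protect.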

Private Addition

• The computation: secret sharing over a small field.
• Malicious users: an efficient zero-knowledge proof bounds the L2-norm of each user's vector.

Big Integers vs. Small Ones

• Most applications work with "regular-sized" integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10^-9 sec).
• Public-key operations (e.g. those used in encryption and verification) must use keys of sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10^-3 sec).
• A difference of 6 orders of magnitude!

Private Addition

[Diagram sequence, consolidated from four build slides]

• d_i is user i's private vector. User i splits it into two shares with u_i + v_i = d_i, one share for each server. u_i, v_i and d_i are all in a small integer field.
• One server computes μ = Σ u_i; the other computes ν = Σ v_i.
• The servers combine the two sums to obtain μ + ν.
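
The sharing and summation steps can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: the modulus `PHI = 2**31 - 1` is a hypothetical choice (the slides only say φ is small), the function names are invented, and the sketch omits the verification that guards against malicious users.

```python
import secrets

# Hypothetical small prime modulus; the slides do not fix a specific value.
PHI = 2**31 - 1


def share(d, phi=PHI):
    """Split vector d into two additive shares u, v with u + v = d (mod phi).
    u is uniformly random, so neither share alone reveals d."""
    u = [secrets.randbelow(phi) for _ in d]
    v = [(x - r) % phi for x, r in zip(d, u)]
    return u, v


def add_vectors(vectors, phi=PHI):
    """Coordinate-wise sum of vectors, reduced mod phi."""
    return [sum(column) % phi for column in zip(*vectors)]


def private_sum(user_vectors, phi=PHI):
    """Simulate both servers: each sums the shares it holds, then the
    two partial sums mu and nu are added to give the sum of all d_i."""
    shares = [share(d, phi) for d in user_vectors]
    mu = add_vectors([u for u, _ in shares], phi)  # first server
    nu = add_vectors([v for _, v in shares], phi)  # second server
    return [(a + b) % phi for a, b in zip(mu, nu)]
```

For example, `private_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `[12, 15, 18]`, and every operation involved is ordinary machine-word arithmetic.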

Private Addition

• Provable privacy.
• Computation on each server is over a small field: the same cost as a non-private implementation, O(m) small-field operations.
• So the only cost of privacy is verification, and for that we have a solution that involves only O(log m) large-field operations.

The Need for Verification

• Private computation obfuscates user data, so a malicious user could input anything.
• Think of a voting scheme: "Please place your vote, 0 or 1, in the envelope."
• A malicious ballot: Bush 100,000, Gore -100,000.

Zero-Knowledge Proofs

• I can prove that I know X without disclosing what X is.
• I can prove that an encrypted number is a zero or a one, i.e. a bit (6 extra numbers needed).
• I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!).

Bounding the L2-Norm

• A natural and effective way to restrict a cheating user's malicious influence.
• You must have a big vector to produce a large influence on the sum.
• Perturbation theory bounds system change with norms: |σ_i(A) - σ_i(B)| ≤ ||A - B||_2 [Weyl].
• Can be the basis for other checks.
• Setting L = 1 forces each user to have only 1 vote.
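
A short worked bound shows why a big vector is needed for a large influence. The notation here is illustrative and not from the slides: C is a set of k cheating users, d'_i is the vector user i actually submits (d'_i = d_i for honest users), and every accepted vector is assumed to satisfy the norm bound L. By the triangle inequality:

```latex
\Bigl\| \sum_{i} d'_i - \sum_{i} d_i \Bigr\|_2
  = \Bigl\| \sum_{i \in C} (d'_i - d_i) \Bigr\|_2
  \le \sum_{i \in C} \bigl( \|d'_i\|_2 + \|d_i\|_2 \bigr)
  \le 2kL
```

So k cheaters can move the sum by at most 2kL in L2-norm, however they choose their inputs.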

An Efficient ZKP of Boundedness

• Luckily, we don't need to prove that every number in a user's vector is small, only that the vector as a whole is small.
• The server asks for some random projections of the user's vector and expects the user to prove that the sum of their squares is small.
• O(log m) public-key crypto operations (instead of O(m)) suffice to prove that the L2-norm of an m-dimensional vector is smaller than L.
• Running time is reduced from hours to seconds.

Random Projection-based L2-Norm ZKP

• The server generates N random m-vectors in {-1, 0, +1}^m, with entries drawn i.i.d. with probabilities {¼, ½, ¼}.
• The user projects his data onto the N directions and provides a ZKP that the square sum of the projections is < N L^2 / 2.
• Expensive public-key operations are needed only on the projections and the square sum.
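
The statistical test behind this protocol can be sketched in the clear. With entries drawn as above, each coordinate c_j has mean 0 and E[c_j^2] = 1/2, so a single projection satisfies E[(c·d)^2] = ||d||^2 / 2 and the square sum of N projections has expectation N ||d||^2 / 2, which is why the threshold N L^2 / 2 corresponds to ||d|| = L. The sketch below deliberately omits all cryptography (no commitments, no zero-knowledge proof), and its function names are hypothetical.

```python
import random


def random_directions(n_proj, dim, rng):
    """Draw n_proj vectors in {-1, 0, +1}^dim, each entry i.i.d. with
    probabilities 1/4, 1/2, 1/4."""
    return [rng.choices((-1, 0, 1), weights=(1, 2, 1), k=dim)
            for _ in range(n_proj)]


def square_sum_of_projections(d, directions):
    """Sum over all directions c of (c . d)^2."""
    return sum(sum(c * x for c, x in zip(direction, d)) ** 2
               for direction in directions)


def passes_norm_check(d, bound, n_proj=50, rng=None):
    """Plaintext version of the check: accept d if the square sum of its
    N random projections is below N * L^2 / 2. The real protocol proves
    this inequality in zero knowledge; here it is evaluated directly."""
    rng = rng or random.Random()
    directions = random_directions(n_proj, len(d), rng)
    return square_sum_of_projections(d, directions) < n_proj * bound ** 2 / 2
```

Because the test is probabilistic, a vector with norm close to L may be accepted or rejected on any given run; only vectors well below or well above the bound get a near-certain outcome.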


Effectiveness

Acceptance/Rejection Probabilities

[Figure: (a) linear and (b) log plots of the probability that a user's input is accepted, as a function of |d|/L, for N = 50. Plot (b) also includes the probability of rejection. In each plot, the steepest (jagged) curve is the single-value vector (case 3), the middle curve is the Zipf vector (case 2), and the shallowest curve is the uniform vector (case 1).]

Performance Evaluation

[Figure: (a) verifier and (b) prover times in seconds for the validation protocol, where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.]

• The standard technique takes 6 to 10 hours at m = 10^6.

Current Status

• The protocols (the L2-norm ZKP and the private vector addition) have been implemented.
• More mid-tier components are being added.
• Written in Java, using native code for big-integer arithmetic.
• Runs on the Linux platform.
• Made into an open-source toolkit for building privacy-preserving real-world applications.

More info

• [email protected]
• http://www.cs.berkeley.edu/~duan/research/p4p.html

Thank You!