Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

Yitao Duan and John Canny
http://www.cs.berkeley.edu/~duan

Berkeley Institute of Design
Computer Science Division
University of California, Berkeley

Goal

To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale


The Scenario

• Two data miners mine data from n users
• The data miners are semi-honest: they follow the protocol but try to learn more information
• Some fraction of users can be malicious: they may input bogus data to disrupt the computation
• A more realistic adversary model than that of most existing privacy-preserving data mining schemes

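Model

[Diagram: users u_1, u_2, ..., u_(n-1), u_n hold private data d_1, d_2, ..., d_(n-1), d_n, which feed the computation f. The user data must be obfuscated.]

• d_i in Z_φ^m
• φ: < 32 or 64-bit
• Challenge: standard cryptographic tools are not feasible at large scale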

A Practical Solution

• Provable privacy: cryptography
• Efficiency: verifiable secret sharing (VSS) over a small field. Minimize the number of expensive primitives and rely on probabilistic guarantees
• Realistic adversary model: an extremely efficient zero-knowledge proof to bound the L2-norm of a user's vector, an effective way to limit the influence malicious users could have on the computation

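Basic Approach

[Diagram: users u_1, u_2, ..., u_(n-1), u_n hold private data d_1, d_2, ..., d_(n-1), d_n; the computation f is expressed through sums (Σ) over the users' data.]

• Cryptographic privacy
• No leakage beyond the final result for many algorithms, or differential privacy [Dwork06]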

The Power of Addition

• A large number of popular algorithms can be run with addition-only steps
• Linear algorithms: voting and summation
• Nonlinear algorithms: regression, SVD, PCA, k-means, ID3, EM, etc.
• All algorithms in the statistical query model [Kearns 93]
• Many other gradient-based numerical algorithms
• A trick used a lot for parallelization in distributed computing [Chu 06, Das 07]
• The addition-only framework has very efficient private implementations in cryptography and admits efficient ZKPs
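As an illustration of an addition-only step, here is a minimal Python sketch of one k-means iteration in which each user contributes a single local vector and the aggregator does nothing across users except add. The function names (`local_contribution`, `kmeans_step`) and the flat layout of the contribution vector are assumptions made for this example, not part of the original toolkit. In a private setting, the plain sum below would be replaced by a private addition protocol.

```python
from typing import List, Sequence


def local_contribution(point: Sequence[float],
                       centroids: Sequence[Sequence[float]]) -> List[float]:
    """User side: place the point (and a count of 1) in the slot of its
    nearest centroid, and zeros everywhere else."""
    dim = len(point)
    # Index of the nearest centroid by squared Euclidean distance.
    nearest = min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )
    contribution = [0.0] * (len(centroids) * (dim + 1))
    base = nearest * (dim + 1)
    contribution[base:base + dim] = list(point)
    contribution[base + dim] = 1.0  # membership count
    return contribution


def kmeans_step(points: Sequence[Sequence[float]],
                centroids: Sequence[Sequence[float]]) -> List[List[float]]:
    """Aggregator side: the only cross-user operation is vector addition."""
    dim = len(centroids[0])
    contributions = [local_contribution(p, centroids) for p in points]
    totals = [sum(column) for column in zip(*contributions)]  # addition only
    new_centroids = []
    for c, old in enumerate(centroids):
        base = c * (dim + 1)
        count = totals[base + dim]
        if count == 0:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
        else:
            new_centroids.append([s / count for s in totals[base:base + dim]])
    return new_centroids
```

For example, `kmeans_step([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 1), (9, 9)])` returns `[[0.0, 1.0], [10.0, 11.0]]`.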

Private Addition

• The computation: secret sharing over a small field
• Malicious users: an efficient zero-knowledge proof to bound the L2-norm of the user vector

Big Integers vs. Small Ones

• Most applications work with "regular-sized" integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10^-9 sec)
• Public-key operations (e.g. those used in encryption and verification) must use keys of sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10^-3 sec)
• A difference of 6 orders of magnitude!
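The gap between small-field and large-integer arithmetic can be observed with a rough timing sketch such as the one below. This is only an illustration under stated assumptions: the modulus used for the large operation is a random odd 1024-bit number, not a real cryptographic key, and the function names are hypothetical. Python's interpreter overhead makes the small operation far slower than a native machine instruction, so the ratio measured here will be smaller than the six orders of magnitude quoted above.

```python
import random
import timeit


def time_small_field_mul(trials: int = 100_000) -> float:
    """Average seconds per multiplication modulo a 31-bit prime."""
    p = 2_147_483_647  # 2^31 - 1, a prime that fits in a machine word
    a, b = 123_456_789, 987_654_321
    total = timeit.timeit(lambda: (a * b) % p, number=trials)
    return total / trials


def time_big_modexp(trials: int = 20, bits: int = 1024) -> float:
    """Average seconds per modular exponentiation with `bits`-bit operands."""
    rng = random.Random(0)
    # Random odd modulus of full length; for timing only, not a real key.
    modulus = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
    base = rng.getrandbits(bits) % modulus
    exponent = rng.getrandbits(bits) | (1 << (bits - 1))
    total = timeit.timeit(lambda: pow(base, exponent, modulus), number=trials)
    return total / trials


if __name__ == "__main__":
    small = time_small_field_mul()
    big = time_big_modexp()
    print(f"small-field multiply: {small:.2e} s")
    print(f"1024-bit modexp:      {big:.2e} s")
    print(f"ratio:                {big / small:.0f}x")
```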

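Private Addition

• d_i: user i's private vector. u_i, v_i and d_i are all in a small integer field
• Each user i splits d_i into two shares with u_i + v_i = d_i, and sends u_i to one server and v_i to the other
• Each server adds up the shares it receives: μ = Σ u_i, ν = Σ v_i
• The two totals are combined to obtain the result: μ + ν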

Private Addition

• Provable privacy
• Computation on each server is over a small field: same cost as the non-private implementation, O(m) small field operations
• So the cost for privacy is only due to verification
• For that we have a solution that involves only O(log m) large field operations
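The private addition steps (u_i + v_i = d_i, μ = Σ u_i, ν = Σ v_i, result μ + ν) can be sketched in Python as follows. This is a minimal illustration, not the toolkit's implementation: the modulus `PHI` and all function names are assumptions chosen for the example, both "servers" run in one process, and the verification step for malicious users is omitted.

```python
import secrets
from typing import List, Sequence, Tuple

PHI = 2**31 - 1  # hypothetical small prime modulus; fits in a 32-bit word


def share(d: Sequence[int], phi: int = PHI) -> Tuple[List[int], List[int]]:
    """User side: split d into shares u, v with u + v = d (mod phi).
    u is uniformly random, so neither share alone reveals d."""
    u = [secrets.randbelow(phi) for _ in d]
    v = [(x - r) % phi for x, r in zip(d, u)]
    return u, v


def add_vectors(vectors: Sequence[Sequence[int]], phi: int = PHI) -> List[int]:
    """Server side: component-wise sum of the received shares, mod phi."""
    return [sum(column) % phi for column in zip(*vectors)]


def to_signed(x: int, phi: int = PHI) -> int:
    """Map a residue back to the signed range around zero."""
    return x if x <= phi // 2 else x - phi


def private_sum(user_vectors: Sequence[Sequence[int]],
                phi: int = PHI) -> List[int]:
    """Run the whole exchange and return the sum of all user vectors."""
    shares = [share(d, phi) for d in user_vectors]
    mu = add_vectors([u for u, _ in shares], phi)  # first server's total
    nu = add_vectors([v for _, v in shares], phi)  # second server's total
    return [to_signed((a + b) % phi, phi) for a, b in zip(mu, nu)]
```

For example, `private_sum([[1, 2, 3], [4, 5, 6]])` returns `[5, 7, 9]`. Every operation here is a single small-integer addition or reduction, which is why the per-server cost matches the non-private computation.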


The Need for Verification

• Private computation obfuscates user data. A malicious user could input anything.
• Think of a voting scheme: "Please place your vote 0 or 1 in the envelope"
• A cheating voter could instead submit: Bush 100,000, Gore -100,000

Zero Knowledge Proofs

• I can prove that I know X without disclosing what X is.
• I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed)
• I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)

Bounding the L2-Norm

• A natural and effective way to restrict a cheating user's malicious influence
• You must have a big vector to produce a large influence on the sum
• Perturbation theory bounds system change with norms: |σ_i(A) - σ_i(B)| ≤ ||A - B||_2 [Weyl]
• Can be the basis for other checks
• Setting L = 1 forces each user to have only 1 vote
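To make the effect of the bound concrete, the following Python sketch checks the L2-norm condition in the clear. It is only an illustration of the arithmetic: in the actual protocol the user's vector stays hidden and the bound is established with a zero-knowledge proof. The function names and the `max_abs` search range are assumptions made for this example.

```python
from itertools import product
from typing import List, Sequence, Tuple


def within_bound(d: Sequence[int], L: int) -> bool:
    """True if the L2-norm of d is at most L (compared via squares)."""
    return sum(x * x for x in d) <= L * L


def integer_vectors_within_bound(dim: int, L: int,
                                 max_abs: int = 2) -> List[Tuple[int, ...]]:
    """Enumerate all integer vectors with entries in [-max_abs, max_abs]
    whose L2-norm is at most L."""
    candidates = product(range(-max_abs, max_abs + 1), repeat=dim)
    return [v for v in candidates if within_bound(v, L)]
```

With `L = 1`, the bogus ballot `within_bound([100000, -100000], 1)` is rejected (`False`), while `within_bound([1, 0], 1)` is accepted. `integer_vectors_within_bound(3, 1)` returns 7 vectors: the zero vector and the six vectors with a single entry of +1 or -1, so an integer ballot that passes the check can carry at most one unit of weight.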

An Efficient ZKP of Boundedness

Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector is small.

The server asks for some random projections of the user's vector and expects the user to prove that the sum of their squares is small.

• O(log m) public-key crypto operations (instead of O(m)) to prove that the L2-norm of an m-dimensional vector is smaller than L.

• Running time reduced from hours to seconds.

Random Projection-based L2-Norm ZKP

• Server generates N random m-vectors in {-1, 0, +1}^m, with entries drawn i.i.d. with probabilities {¼, ½, ¼}
• User projects his data onto the N directions and provides a ZKP that the sum of the squared projections is < N L^2 / 2
• Expensive public-key operations are only on the projections and the square sum
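The arithmetic behind the test can be sketched in Python as below. Because the challenge entries are independent with mean zero and E[c_j^2] = ½, each squared projection has expectation |d|^2 / 2, so the sum over N projections has expectation N |d|^2 / 2, which equals the threshold N L^2 / 2 exactly when |d| = L. The sketch computes everything in the clear and contains no cryptography: in the protocol the user would prove the inequality in zero knowledge instead of revealing the projections. Function names and the `seed` parameter are assumptions for this example.

```python
import random
from typing import List, Optional, Sequence


def random_challenges(n_proj: int, m: int,
                      rng: random.Random) -> List[List[int]]:
    """Server side: N random m-vectors with entries -1, 0, +1 drawn
    i.i.d. with probabilities 1/4, 1/2, 1/4."""
    return [rng.choices((-1, 0, 1), weights=(1, 2, 1), k=m)
            for _ in range(n_proj)]


def projections(d: Sequence[int],
                challenges: Sequence[Sequence[int]]) -> List[int]:
    """User side: inner product of d with each challenge vector."""
    return [sum(c * x for c, x in zip(row, d)) for row in challenges]


def passes_projection_test(d: Sequence[int], L: int, n_proj: int = 50,
                           seed: Optional[int] = None) -> bool:
    """Accept iff the sum of squared projections is below N * L^2 / 2."""
    rng = random.Random(seed)
    challenges = random_challenges(n_proj, len(d), rng)
    square_sum = sum(p * p for p in projections(d, challenges))
    return 2 * square_sum < n_proj * L * L  # same test, kept in integers
```

Only the N projections and their square sum would need expensive public-key operations; the projections themselves are cheap small-integer inner products.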

Effectiveness


Acceptance/rejection Probabilities

(a) Linear and (b) log plots of the probability of user input acceptance as a function of |d|/L for N = 50. Plot (b) also includes the probability of rejection. In each case, the steepest (jagged) curve is the single-value vector (case 3), the middle curve is the Zipf vector (case 2), and the shallowest curve is the uniform vector (case 1).

Performance Evaluation

(a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.

• The standard technique takes 6 to 10 hours at m = 10^6

Current Status

• The protocols (the L2-norm ZKP and the private vector addition) have been implemented
• Adding more mid-tier components
• In Java, using native code for big integers
• Runs on the Linux platform
• Made an open-source toolkit for building privacy-preserving real-world applications

More info

[email protected]
http://www.cs.berkeley.edu/~duan/research/p4p.html

Thank You!
