Data-Mining on GBytes of Encrypted Data Valeria Nikolaenko In collaboration with Dan Boneh (Stanford); Udi Weinsberg, Stratis Ioannidis, Marc Joye, Nina Taft (Technicolor).

Page 1: Data-Mining on GBytes of Encrypted Data

Valeria Nikolaenko

In collaboration with Dan Boneh (Stanford); Udi Weinsberg, Stratis Ioannidis, Marc Joye, Nina Taft (Technicolor).

Page 2: Outline

• Motivation
• Background on cryptographic tools
• Linear regression
• Our solution
• Experiments and performance

Page 3: Motivation

[Diagram: Users send their Data to a Data mining engine, which outputs a Data model.]

Privacy concern!

Engine learns nothing more than the model!

Page 4: Data Mining

• Classification
• Regression: linear regression
• Clustering
• Summarization
• Dependency modeling

Algorithms considered: linear regression, matrix factorization.

Main challenge: make these algorithms privacy preserving and efficient.

Page 5: Contribution

• Design of a practical system for privacy-preserving linear regression
• Implementation
• Experiments on real datasets

Comparison to state of the art (prior work vs. this system):
• Hall et al. ’11: 2 days vs 3 min
• Graepel et al. ’12: 10 min vs 2 sec

Page 7: Computations on Encrypted Data

• 2009, C. Gentry – Fully homomorphic encryption (FHE) (slow for our problems)
• 1979, A. Shamir; 1988, BGW – Secret sharing (huge communication overhead)
• 1982, A.C. Yao – Garbled circuits
• Our approach: hybrid of Yao and homomorphic encryption

Page 8: Yao’s Garbled Circuits

[Diagram: a circuit C is converted into a “garbled circuit”. Each input wire i = 1, …, n carries two labels, $G_i^0$ for bit 0 and $G_i^1$ for bit 1. The input x = 010...1 selects one label per wire, and evaluating the garbled circuit on the selected labels yields C(x).]

Nothing is leaked about x, other than C(x)!
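The label mechanism on this slide can be illustrated with a single garbled AND gate. This is a minimal sketch, not the construction used in the talk: the hash-based row encryption, the all-zero tag used to recognize the correct row, and the names `garble_and_gate` and `evaluate` are all assumptions made for illustration.

```python
import hashlib
import secrets

LABEL_LEN = 16                 # bytes per wire label (illustrative choice)
TAG = b"\x00" * 16             # assumed marker to recognize the correct table row


def _pad(ka: bytes, kb: bytes) -> bytes:
    """Derive a 32-byte one-time pad from the two input labels."""
    return hashlib.sha256(ka + kb).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def garble_and_gate():
    """Garbler side: pick two random labels per wire and encrypt the truth table."""
    labels = {
        w: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for w in ("a", "b", "out")
    }
    table = [
        _xor(_pad(labels["a"][x], labels["b"][y]), labels["out"][x & y] + TAG)
        for x in (0, 1)
        for y in (0, 1)
    ]
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return labels, table


def evaluate(table, ka: bytes, kb: bytes) -> bytes:
    """Evaluator side: holds one label per input wire, learns one output label."""
    pad = _pad(ka, kb)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no table row decrypted correctly")


labels, table = garble_and_gate()
out = evaluate(table, labels["a"][1], labels["b"][1])
print(out == labels["out"][1])  # the evaluator obtained the label for 1 AND 1
```

The evaluator only ever sees random-looking labels, so it learns the gate output label without learning the input bits. Real garbling schemes add optimizations and a mapping from output labels to bits that this sketch leaves out.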

Page 9: Data Mining System Architecture

Parties: User i (i = 1, …, n), Evaluator, Crypto Service Provider (CSP).

• CSP: creates the “garbled circuit” Ĉ, which goes to the Evaluator; holds both labels for every input wire, $\{G_1^0, G_2^0, \dots, G_n^0\}$ and $\{G_1^1, G_2^1, \dots, G_n^1\}$
• User i: holds input $x_i$
• Evaluator: receives Ĉ and the garbled inputs $\{G_1^{x_1}, G_2^{x_2}, \dots, G_n^{x_n}\}$, then computes $C(x_1, \dots, x_n)$

Page 10: System Properties

• Evaluator learns the model, not the inputs

Problems:
• Not scalable with the number of users
• Users need to be online

Page 12: Linear Regression I

• (f, r) – from users
• model: r = <f, β>
• solve for β: $A\beta = b$
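The step from the model r = <f, β> to the linear system Aβ = b is not spelled out on the slide. A short derivation, assuming the usual least-squares objective (an assumption, since the slide states only the model) and treating each $f_i$ as a row vector:

```latex
\begin{align*}
  F(\beta) &= \sum_{i=1}^{n} \bigl(r_i - f_i \beta\bigr)^2
  && \text{(squared error over all users)} \\
  \nabla F(\beta) &= -2 \sum_{i=1}^{n} f_i^T \bigl(r_i - f_i \beta\bigr) = 0
  && \text{(first-order condition)} \\
  \Bigl(\sum_{i=1}^{n} f_i^T f_i\Bigr)\, \beta &= \sum_{i=1}^{n} f_i^T r_i
  && \text{(rearranged)} \\
  A \beta &= b,
  \qquad A = \sum_{i=1}^{n} f_i^T f_i,
  \quad b = \sum_{i=1}^{n} f_i^T r_i .
\end{align*}
```

Both A and b are sums of per-user terms, which is what lets each user contribute independently.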

Page 13: Linear Regression II

[Diagram: Users send (f1, r1), …, (fn, rn) to the Training engine, which outputs β.]

Training engine learns nothing about f’s and r’s, other than β!

Page 15: Construct Matrices

Phase 1: compute $A = \sum_i f_i^T f_i$ and $b = \sum_i f_i^T r_i$

• Users: provide $f_1^T f_1, \dots, f_n^T f_n$
• Evaluator: computes $A = \sum_i f_i^T f_i$ in the circuit

For additions can use homomorphic encryption:

$[HE(A_1), HE(A_2)] \to HE(A_1 + A_2)$
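Phase 1 can be sketched in the clear, without any encryption, to show what each user contributes and what the sums look like. The function names and the three-user toy dataset below are illustrative assumptions; the data was chosen so that r = <f, β> holds exactly for β = (1, 2).

```python
def contribution(f, r):
    """One user's share: the d x d matrix f^T f and the length-d vector f^T r."""
    outer = [[a * b for b in f] for a in f]
    return outer, [a * r for a in f]


def aggregate(contribs, d):
    """Sum the per-user shares into A and b."""
    A = [[0] * d for _ in range(d)]
    b = [0] * d
    for A_i, b_i in contribs:
        for j in range(d):
            b[j] += b_i[j]
            for k in range(d):
                A[j][k] += A_i[j][k]
    return A, b


# Hypothetical users (f_i, r_i) with d = 2 features.
users = [([1, 2], 5), ([3, 1], 5), ([2, 2], 6)]
A, b = aggregate([contribution(f, r) for f, r in users], d=2)
print(A, b)
```

Because only additions are needed to combine the shares, this is the part of the computation that an additively homomorphic scheme can take over.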

Page 16: Construct Matrices

Phase 1: compute $A = \sum_i f_i^T f_i$ and $b = \sum_i f_i^T r_i$

• Users: send $HE(f_1^T f_1), \dots, HE(f_n^T f_n)$
• Evaluator:
  – computes $HE(A) = HE(\sum_i f_i^T f_i)$
  – decrypts in the circuit

For additions can use homomorphic encryption:

$[HE(A_1), HE(A_2)] \to HE(A_1 + A_2)$

Circuit – independent of the number of users
Users – offline
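The rule HE(A1), HE(A2) → HE(A1 + A2) can be tried out with a toy additively homomorphic scheme. The slides do not name the scheme; the sketch below assumes Paillier encryption, uses deliberately tiny hard-coded primes that offer no security, and works on integer entries only. All function names are illustrative.

```python
import math
import secrets

# Toy parameters (assumption): two Mersenne primes, far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public key n, secret key (n, lam, mu)) for Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # modular inverse, Python 3.8+
    return n, (n, lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a fresh random r."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(sk, c):
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


pk, sk = keygen()

# Each hypothetical user encrypts the entries of its own f_i^T f_i.
user_mats = [[[1, 2], [2, 4]], [[9, 3], [3, 1]], [[4, 4], [4, 4]]]
enc_mats = [[[encrypt(pk, v) for v in row] for row in M] for M in user_mats]

# The Evaluator adds ciphertexts entry by entry without seeing any plaintext.
enc_A = enc_mats[0]
for M in enc_mats[1:]:
    enc_A = [[add(pk, x, y) for x, y in zip(ra, rb)] for ra, rb in zip(enc_A, M)]

# Decrypted here only to check the result; in the deck, decryption happens in the circuit.
A = [[decrypt(sk, c) for c in row] for row in enc_A]
print(A)
```

The Evaluator's work grows with the number of users, but the ciphertexts of A and b that are fed into the circuit stay the same size, which matches the slide's point that the circuit is independent of the number of users.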

Page 17: Solve Linear System

Phase 2: solve $A\beta = b$ for β

Input: HE(A), HE(b)

• Decrypt A and b
• Cholesky: compute L, s.t. $A = LL^T$
• Solve for β: $L(L^T \beta) = b$

Output: β
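The Cholesky step and the two triangular solves can be sketched in plain Python. This is an illustration in floating point on plaintext values; a version that runs inside a garbled circuit would have to be expressed as a Boolean circuit over fixed-width numbers, which this sketch does not attempt. The example matrix and function names are assumptions.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(A, b):
    """Solve A beta = b via L y = b (forward), then L^T beta = y (backward)."""
    n = len(A)
    L = cholesky(A)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (y[i] - sum(L[k][i] * beta[k] for k in range(i + 1, n))) / L[i][i]
    return beta


# Example system whose exact solution is beta = (1, 2).
A = [[14, 9], [9, 9]]
b = [32, 27]
beta = solve_cholesky(A, b)
print(beta)
```

Cholesky applies here because A is a sum of terms of the form f^T f, so it is symmetric and positive semidefinite. The sketch additionally assumes A is strictly positive definite, otherwise the square root or the division fails.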

Page 18: Privacy Preserving Regression System

Parties: Users, Evaluator, Crypto Service Provider (CSP).

• CSP: $(pk, sk) \leftarrow HE.KeyGen$; gives pk to the Users and the Evaluator
• CSP: Create “garbled circuit” Ĉ; Send Ĉ to the Evaluator; holds the input labels $\{G_1^0, G_2^0, \dots, G_n^0\}$ and $\{G_1^1, G_2^1, \dots, G_n^1\}$
• User i: sends $HE_{pk}(f_i^T f_i)$ and $HE_{pk}(f_i^T r_i)$ to the Evaluator
• Evaluator, Phase 1: Compute $HE_{pk}(A = \sum_i f_i^T f_i)$, $HE_{pk}(b = \sum_i f_i^T r_i)$
• Evaluator, Phase 2: with the garbled inputs, Compute $C(HE_{pk}(A), HE_{pk}(b))$ to obtain β

Page 19: System Properties

• Evaluator learns the model, not the inputs
• Scalable with the number of users
• Users can be offline

Extensions:
• Masking instead of decryption in circuit
• Protection against malicious Evaluator and CSP

Page 21: Performance

• Hybrid vs. Pure-Yao: 100 times improvement in time!
• For up to 20 features, 1000’s of users:
  – time < 3 min
  – communication < 1 GB
• For 100 million users, 20 features: 8.75 hours
• Tested on real datasets

Page 22: Conclusion

• Privacy-preserving data-mining is efficient
• Our approach can be used in practice
• Current work: matrix factorization
• Future work:
  – implement other data mining algorithms
  – improve the implementation to support high parallelization

Page 23

Thank you!

Questions? [email protected]