A Machine Learning Approach to Privacy-Preserving Data Mining Using Homomorphic Encryption
Seiichi Ozawa
Center for Mathematical Data Science, Graduate School of Engineering, Kobe University

A Machine Learning Approach to Privacy-Preserving Data Mining …besk.kr/UploadData/Editor/Conference/201809/9BCF3F42AA9C... · 2018-09-04 · 2. What is PPDM? Big data consists of



What is PPDM?

Big data contains a great deal of sensitive private information, such as names, addresses, and phone numbers.

Such sensitive data should be properly masked before analysis, but masking can erase valuable information from a database.

How can we analyze big data to extract useful rules in a legitimate way?

Privacy-Preserving Data Mining (PPDM)
Privacy-Preserving Machine Learning (PPML)

Approaches to PPDM (1)

1. Homomorphic Encryption: a form of encryption that allows computation on ciphertexts.
   - Additive HE: Paillier
   - Multiplicative HE: Unpadded RSA, ElGamal
   - Fully HE: addition + multiplication

2. Garbled Circuits: a cryptographic protocol that enables two-party secure computation in which two mistrusting parties can jointly evaluate a function over their private inputs without the presence of a trusted third party. (Wikipedia)
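To make the additive case in item 1 concrete, the following is a minimal sketch of a toy Paillier cryptosystem. It is not taken from the slides: the function names, the tiny primes, and the choice of generator g = n + 1 are assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy key pair from two small primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt 0 <= m < n as c = (1 + n)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover m from c with the secret key (lam, mu)."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

n, sk = keygen()
c_sum = add_encrypted(n, encrypt(n, 21), encrypt(n, 10))
total = decrypt(n, sk, c_sum)  # the party adding never sees 21 or 10
```

The party that calls `add_encrypted` needs only the public value `n`, which is the property an outsourced server relies on when it sums encrypted data.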

Approaches to PPDM (2)

3. Secret Sharing: a family of methods for distributing a secret amongst a group of participants, each of whom is allocated a share of the secret. The secret can be reconstructed only when a sufficient number, of possibly different types, of shares are combined together; individual shares are of no use on their own. (Wikipedia)

4. Perturbation Approaches: adding random noise to prevent information leakage, using a mechanism that satisfies Differential Privacy.
   - Input Perturbation
   - Algorithm Perturbation
   - Output Perturbation
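As an illustration of output perturbation, here is a minimal sketch of the Laplace mechanism applied to a counting query. The slides do not name a specific mechanism, so the choice of the Laplace mechanism, the function names, and the parameter `epsilon` are assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count; a counting query has sensitivity 1,
    so noise with scale 1/epsilon gives epsilon-differential privacy."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: 30 of 100 records satisfy the predicate.
records = [1] * 30 + [0] * 70
noisy = private_count(records, lambda r: r == 1, epsilon=1.0)
```

A smaller `epsilon` means more noise and stronger privacy. A real deployment would also need a carefully chosen noise source, which this sketch does not address.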

A New Direction Using PPDM - Fintech

[Figure: Current Approach vs. New Approach. Current approach: individual analysis, in which each bank (Bank A, Bank B, Bank C) has an analyst working on its own transaction data, ATM data, and Internet banking data. New approach: a Privacy-Preserving Data Mining Engine integrates and automates the analysis of data from Bank A, Bank B, Bank C, and other sources, using machine learning over encrypted data.]

- Detection of illegal money transfers
- Calculation of proper interest rates

Privacy-Preserving Platform on Cloud Computing


Additively Homomorphic Encryption

Privacy Preserving Extreme Learning Machine

Sharing Roles in Computation

Data Contributor
• Nonlinear calculation with an activation function
• Multiplication and inner products

Outsourced Server
• Summation of N data with Additive HE

Data Analyst
• Calculating an inverse matrix and weights
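The division of labor above can be sketched in plaintext as follows. This is an assumption-laden illustration, not the protocol from the talk: it uses the standard ELM ridge solution β = (HᵀH + cI)⁻¹HᵀT, assumes that contributors upload the per-sample products h hᵀ and h t, and omits encryption entirely (the server's additions stand in for additions on ciphertexts). All function names and the `ridge` parameter are hypothetical.

```python
import math
import random

def hidden_output(x, W, b):
    """Sigmoid hidden-layer output h(x) for one sample."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
            for w, bj in zip(W, b)]

def contributor_terms(x, t, W, b):
    """Data contributor: nonlinear map, then local products h h^T and h t.
    In a real protocol these would be encrypted before upload."""
    h = hidden_output(x, W, b)
    return [[hi * hj for hj in h] for hi in h], [hi * t for hi in h]

def server_sum(terms, L):
    """Outsourced server: element-wise summation only."""
    U = [[0.0] * L for _ in range(L)]
    V = [0.0] * L
    for hh, ht in terms:
        for i in range(L):
            V[i] += ht[i]
            for j in range(L):
                U[i][j] += hh[i][j]
    return U, V

def analyst_weights(U, V, ridge=1e-3):
    """Data analyst: solve (U + ridge*I) beta = V by Gauss-Jordan elimination."""
    L = len(V)
    A = [[U[i][j] + (ridge if i == j else 0.0) for j in range(L)] + [V[i]]
         for i in range(L)]
    for col in range(L):
        piv = max(range(col, L), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(L):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][L] / A[i][i] for i in range(L)]

# Hypothetical toy data: 20 samples, 2 inputs, L = 3 hidden units.
rng = random.Random(0)
L = 3
W = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(L)]
b = [rng.uniform(-1, 1) for _ in range(L)]
data = [([rng.random(), rng.random()], rng.random()) for _ in range(20)]
U, V = server_sum([contributor_terms(x, t, W, b) for x, t in data], L)
beta = analyst_weights(U, V)
```

The point of the split is that the only operation left to the server is addition, which an additively homomorphic scheme supports directly.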

Privacy-Preserving Extreme Learning Machine

Performance Evaluation

(L: #Hidden Units) +0.04〜0.12

Data Sets: 4 benchmark datasets in Machine Learning Repository

Encryption: LWE-based Homomorphic Encryption

Privacy-Preserving Naïve Bayes Classification

Classification Using Posterior Probability

Naïve Bayes Classification

Probability Estimation
m: #training samples, m_i: #class i samples
m_it: #occurrences of x_t in class i samples
x: input, λ: #classes

Assume independence among the features of x
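The formulas on this slide are not present in the text. Under the symbol definitions above, the standard frequency-based Naïve Bayes estimates would read as follows; this is a reconstruction from those definitions (with d, the dimensionality, taken from the next slide), not a copy of the original equations.

```latex
\begin{align}
  \hat{P}(y = i) &= \frac{m_i}{m}, &
  \hat{P}(x_t \mid y = i) &= \frac{m_{it}}{m_i}, \\
  P(y = i \mid x) &\propto \hat{P}(y = i) \prod_{t=1}^{d} \hat{P}(x_t \mid y = i)
    = \frac{m_i}{m} \prod_{t=1}^{d} \frac{m_{it}}{m_i}
    = \frac{1}{m\, m_i^{\,d-1}} \prod_{t=1}^{d} m_{it}.
\end{align}
```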

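Privacy-Preserving Naïve Bayes Classification

Calculation of Posterior Probability

Obtain the class i* with the largest posterior probability, i.e., the class whose posterior is at least as large as that of every other label j.

Multiplying both sides of this comparison by the common positive denominators removes the divisions, so the comparison can be securely computed using homomorphic encryption.

d: dimensionality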
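A plaintext sketch of the division-free comparison may help. It assumes the frequency estimates m_i/m and m_it/m_i, under which class i beats class j exactly when m_j^(d-1) Π_t m_it ≥ m_i^(d-1) Π_t m_jt; this form is inferred from the later slide titles Epk(m_i^(d-1) Π m_jt). The function names are hypothetical, and a reference version with divisions is included only for cross-checking.

```python
from math import prod

def nb_predict_division_free(class_counts, feature_counts):
    """class_counts[i] = m_i; feature_counts[i][t] = m_it for the observed x_t.
    Uses only integer additions and multiplications, as HE requires."""
    d = len(feature_counts[0])
    best = 0
    for j in range(1, len(class_counts)):
        lhs = class_counts[j] ** (d - 1) * prod(feature_counts[best])
        rhs = class_counts[best] ** (d - 1) * prod(feature_counts[j])
        if rhs > lhs:  # class j has the strictly larger posterior
            best = j
    return best

def nb_predict_reference(class_counts, feature_counts):
    """Same decision computed with explicit probabilities (plaintext only)."""
    m = sum(class_counts)
    scores = [(mi / m) * prod(c / mi for c in counts)
              for mi, counts in zip(class_counts, feature_counts)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical counts: 2 classes, d = 2 features.
pred = nb_predict_division_free([50, 10], [[10, 10], [5, 5]])
```

Here class 1 wins because 50 · 25 = 1250 exceeds 10 · 100 = 1000, matching the posterior scores 2.5/60 versus 2/60.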

System Configuration - Multi-Party Computation

- CS1 and CS2 do not collude.
- All participants are assumed “honest-but-curious” (they follow the protocols but may want to learn information about the data).
- CS1: no access to Alice’s secret key.
- CS2: knows Alice’s secret key.

[Figure label: Alice’s public key]

Computation of Epk(m_i^(d-1))

[Figure: Alice provides her class labels as encrypted one-hot vectors (homomorphic encryption). CS1 adds the encrypted vectors to obtain the encrypted class counts (example: 21 10 8 12), then multiplies copies of this count vector element-wise (three copies in the example, giving 9261 1000 512 1728) to obtain Epk(m_i^(d-1)).]
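The two steps in this figure can be reproduced in plaintext as a sanity check. The sketch below works on unencrypted integers, so the additions and multiplications stand in for the corresponding operations on ciphertexts; the function names are hypothetical, and the label data are constructed only to match the example counts 21, 10, 8, 12.

```python
def class_counts(one_hot_labels):
    """Sum one-hot label vectors column-wise to get m_i for each class."""
    return [sum(col) for col in zip(*one_hot_labels)]

def elementwise_power(counts, exponent):
    """Multiply `exponent` copies of the count vector element-wise."""
    result = [1] * len(counts)
    for _ in range(exponent):
        result = [r * c for r, c in zip(result, counts)]
    return result

# Build one-hot labels whose class counts match the example on the slide.
labels = []
for cls, n_samples in enumerate([21, 10, 8, 12]):
    labels += [[1 if k == cls else 0 for k in range(4)]] * n_samples

m = class_counts(labels)           # [21, 10, 8, 12]
m_pow = elementwise_power(m, 3)    # d - 1 = 3 in the example
```

With d - 1 = 3 this yields 9261, 1000, 512, and 1728, the values shown in the figure.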

Computation of Epk(m_i^(d-1) Π m_jt)

[Figure: key point of the calculation of m_jt. For each sample, the encrypted one-hot label vector (example: 1 0 0 0) is multiplied with the encrypted one-hot encoding of feature t (example: 0 1 0 0), giving a matrix with a single 1:

0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0

These matrices are accumulated over all samples.]

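Computation of Epk(m_i^(d-1) Π m_jt)

[Figure: key point of the calculation of m_jt, second example. The encrypted one-hot label vector 0 0 0 1 and the encrypted one-hot encoding of feature t, 1 0 0 0, give the matrix

0 0 0 0
0 0 0 0
0 0 0 0
1 0 0 0

Accumulating over all samples gives a count matrix such as

1 14 2 1
21 1 13 11
5 3 31 2
25 2 1 7]

CS1 still cannot observe the actual values.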
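The accumulation of m_jt can be sketched in plaintext as an outer product of the two one-hot vectors, summed over samples. This is an interpretation of the figure rather than the authors' code: the function names are hypothetical, the values are unencrypted, and the three samples are made up, with the first two mirroring the examples on these slides.

```python
def outer(label_one_hot, feature_one_hot):
    """Outer product: a matrix with a single 1 at (class, feature bin)."""
    return [[l * f for f in feature_one_hot] for l in label_one_hot]

def accumulate_counts(samples, n_classes, n_bins):
    """Sum the per-sample matrices; entry [i][t] is the count m_it."""
    counts = [[0] * n_bins for _ in range(n_classes)]
    for label, feature in samples:
        contrib = outer(label, feature)
        for i in range(n_classes):
            for t in range(n_bins):
                counts[i][t] += contrib[i][t]
    return counts

samples = [
    ([1, 0, 0, 0], [0, 1, 0, 0]),  # class 0, feature bin 1
    ([0, 0, 0, 1], [1, 0, 0, 0]),  # class 3, feature bin 0
    ([1, 0, 0, 0], [0, 1, 0, 0]),  # class 0, feature bin 1 again
]
counts = accumulate_counts(samples, n_classes=4, n_bins=4)
```

Because each step is a multiplication or an addition, the same computation can in principle be carried out on ciphertexts by a scheme that supports both operations.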

Maximization of Posterior Probability

[Figure: exchange between CS1 and CS2, where CS2 holds Alice’s secret key; step labeled “Construct Y”.]

Random rotation prevents CS2 from observing the actual index (the classification result). For enhanced security, we are considering adopting the garbled circuit method here.

Sending Classification Result

[Figure: participants CS2, CS1, and Bob. Rotation parameter k (example: 2). One-hot vector z, which has 1 at the position of the maximum column of the rotated matrix. Example classification result: 2.]

The classification result is computed using information from both sides.
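The role of the rotation can be illustrated with a small plaintext sketch. Several details are assumptions: the slides do not preserve which party applies or undoes the rotation or in which direction, and plain numeric scores stand in here for the matrix Y. The function names are hypothetical.

```python
def rotate(values, k):
    """Cyclic shift: the entry at original index i moves to (i + k) % n."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]

def argmax_one_hot(rotated_scores):
    """Party that sees only rotated data: return one-hot z at the maximum."""
    best = max(range(len(rotated_scores)), key=rotated_scores.__getitem__)
    return [1 if i == best else 0 for i in range(len(rotated_scores))]

def recover_class(z, k):
    """Party that knows k: undo the rotation to get the true class index."""
    return (z.index(1) - k) % len(z)

scores = [3, 1, 9, 4]           # hypothetical per-class scores; class 2 is largest
k = 2                           # rotation parameter, as in the slide's example
z = argmax_one_hot(rotate(scores, k))
result = recover_class(z, k)
```

The party computing `z` learns only a rotated position, while the party holding `k` learns only `z`, so neither piece alone gives the class before the two are combined.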

Performance Evaluation

Data Sets: Iris dataset from UCI ML Repository (training:test = 80:20)

Preprocessing: Each real-valued feature is encoded as a one-hot vector (dimension: 5)

Encryption: HElib (implementation of the Brakerski-Gentry-Vaikuntanathan scheme)

Prediction Accuracy (Smoothing: allocation of a small value to zero frequencies)

Execution Time
Protocol 1: computation of Epk(m_i^(d-1))
Protocol 2: computation of Epk(H)
Protocol 3: obtaining classification result
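The preprocessing step can be sketched as follows. The slides state only that each real-valued feature becomes a one-hot vector of dimension 5; the equal-width binning used here, the function name, and the range parameters `lo` and `hi` are assumptions for illustration.

```python
def one_hot_bin(value, lo, hi, n_bins=5):
    """Encode a real value as a one-hot vector using equal-width bins on [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    idx = int((value - lo) / (hi - lo) * n_bins)
    idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values and value == hi
    return [1 if i == idx else 0 for i in range(n_bins)]

encoded = one_hot_bin(5.0, lo=0.0, hi=10.0)
```

Other binning rules, such as quantile-based bins, would fit the same description; the choice affects the counts m_it but not the protocol structure.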

Concluding Remarks

1. Privacy-Preserving Data Mining (PPDM) suggests a new direction of AI applications for Big Data.

2. Aggregation of big data provided by multiple organizations could bring a new impact in Big Data analysis.

3. Two machine learning approaches (i.e., PP-ELM and PP-NBC) are introduced.

4. The number of papers on PPDM/PPML is rapidly increasing at top conferences (ICML, USENIX, ACM CCS, etc.).