Source: craven/cs760/lectures/Privacy.pdf · 2018-04-30

Privacy-Preserving Machine Learning

CS 760: Machine Learning, Spring 2018

Mark Craven and David Page

www.biostat.wisc.edu/~craven/cs760

Goals for the Lecture

• You should understand the following concepts:
  • public key cryptography
  • linearly homomorphic encryption
  • fully homomorphic encryption
  • differential privacy
  • global sensitivity
  • Laplace mechanism

• Thanks Eric Lantz and Irene Giacomelli!


Need for Privacy

• Large databases of patient information
• Regulations and expectations of privacy
• Large potential gains from data mining
• How to balance utility and privacy?

• Privacy approaches
  • k-anonymity (Sweeney, 2002), l-diversity (Machanavajjhala, 2007), t-closeness (Li, 2007)
  • Homomorphic encryption
  • Differential privacy (Dwork, 2006)

Recall: IWPC Warfarin dosing algorithm

• Over a dozen real-valued prediction techniques were used
• Linear regression and support vector regression were the best performers

Square root of final dose =
    5.6044
  − 0.2614 × Age in decades
  + 0.0087 × Height in cm
  + 0.0128 × Weight in kg
  − 0.8677 × VKORC1 A/G
  − 1.6974 × VKORC1 A/A
  − 0.4854 × VKORC1 genotype unknown
  − 0.5211 × CYP2C9 *1/*2
  − 0.9357 × CYP2C9 *1/*3
  − 1.0616 × CYP2C9 *2/*2
  − 1.9206 × CYP2C9 *2/*3
  − 2.3312 × CYP2C9 *3/*3
  − 0.2188 × CYP2C9 genotype unknown
  − 0.1092 × Asian race
  − 0.2760 × Black or African American
  − 0.1032 × Missing or Mixed race
  + 1.1816 × Enzyme inducer status
  − 0.5503 × Amiodarone status

[Figure: prediction error (mg/wk) for PGx, Clinical, and Fixed; axis ticks from 0 to 15]
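The dosing formula above is an ordinary linear model, so it can be evaluated with a dictionary of coefficients. The following is a minimal sketch: the coefficients are copied from the slide, but the feature names (`age_decades`, `vkorc1_AG`, and so on) and the 0/1 encoding of the categorical terms are assumptions made for illustration.

```python
# Coefficients copied from the slide; the dictionary keys are hypothetical names.
INTERCEPT = 5.6044
COEFFS = {
    "age_decades": -0.2614,
    "height_cm": 0.0087,
    "weight_kg": 0.0128,
    "vkorc1_AG": -0.8677,
    "vkorc1_AA": -1.6974,
    "vkorc1_unknown": -0.4854,
    "cyp2c9_1_2": -0.5211,
    "cyp2c9_1_3": -0.9357,
    "cyp2c9_2_2": -1.0616,
    "cyp2c9_2_3": -1.9206,
    "cyp2c9_3_3": -2.3312,
    "cyp2c9_unknown": -0.2188,
    "race_asian": -0.1092,
    "race_black": -0.2760,
    "race_missing_or_mixed": -0.1032,
    "enzyme_inducer": 1.1816,
    "amiodarone": -0.5503,
}


def predict_sqrt_dose(features):
    """Linear model output: the square root of the final dose.

    `features` maps feature names to values; omitted features count as 0.
    An unknown feature name raises KeyError rather than being ignored.
    """
    return INTERCEPT + sum(COEFFS[name] * value for name, value in features.items())


def predict_dose(features):
    """Square the linear output to recover the dose itself."""
    return predict_sqrt_dose(features) ** 2


# Assumed example patient: 60-69 years old, 170 cm, 70 kg, no other flags set.
patient = {"age_decades": 6, "height_cm": 170, "weight_kg": 70}
print(predict_sqrt_dose(patient), predict_dose(patient))
```

Because every genotype term appears as a plain coefficient, anyone holding the model and the other features can ask which genotype best explains an observed dose, which is why releasing the model alone can matter for privacy.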

Recall: Ridge Regression

Data point: (x, y), x ∈ R^d and y ∈ R

Model: w ∈ R^d, vector of weights
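As a reminder of what fitting w involves, here is a minimal sketch of ridge regression in its standard closed form, w = (XᵀX + λI)⁻¹ Xᵀy. The slide text does not spell out the objective, so the closed form, the regularization parameter name `lam`, and the helper `solve` are assumptions for illustration. Exact rational arithmetic keeps the example dependency-free.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and invertible)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def ridge_fit(X, y, lam):
    """Return w solving (X^T X + lam * I) w = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


# Two data points in R^2 with lam = 1.
w = ridge_fit([[1, 0], [0, 1]], [2, 4], lam=1)
print(w)
```

Note that training reduces to building the matrix XᵀX + λI and the vector Xᵀy and then solving one linear system, which is the structure the encrypted protocol later in the lecture relies on.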

Public-Key Encryption

sk → secret key
pk → public key

Encryption: c = Enc_pk(m)
c → hides m from everyone who does NOT have sk

Decryption: m = Dec_sk(c)
c → reveals m to everyone who has sk

Example: m = hello! → Enc (with pk) → c = 6a7#87t → Dec (with sk) → hello!
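To make the Enc/Dec roles concrete, here is a toy sketch using textbook RSA with tiny primes. RSA is chosen here only as a familiar example of a public-key scheme; the slides do not name one. The primes, the exponent, and the byte-by-byte encoding are illustrative assumptions, and nothing about this sketch is secure. It needs Python 3.8+ for the modular inverse via `pow(e, -1, phi)`.

```python
def toy_keygen(p=61, q=53, e=17):
    """Return (pk, sk) for textbook RSA with tiny, insecure primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi
    return (n, e), (n, d)


def encrypt(pk, m):
    """c = Enc_pk(m) for an integer message 0 <= m < n."""
    n, e = pk
    return pow(m, e, n)


def decrypt(sk, c):
    """m = Dec_sk(c)."""
    n, d = sk
    return pow(c, d, n)


pk, sk = toy_keygen()
message = b"hello!"
# Anyone holding pk can encrypt (here one byte at a time, purely for illustration).
ciphertext = [encrypt(pk, byte) for byte in message]
# Only the holder of sk can recover the message.
recovered = bytes(decrypt(sk, c) for c in ciphertext)
print(ciphertext, recovered)
```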

Linearly-Homomorphic Encryption

Addition of ciphertexts

    Enc_pk(m1) ⊞ Enc_pk(m2) = Enc_pk(m1 + m2)

Multiplication of a ciphertext by a plaintext (m1 is public!)

    m1 ⊡ Enc_pk(m2) = Enc_pk(m1 × m2)

Fully homomorphic encryption requires a multiplication analog of ⊞ (multiplying two ciphertexts) and is currently much slower.

Database (DB): 10^5 × 10^2 real numbers in [−2000, 2000] with 3 digits in the fractional part. Times using linearly-homomorphic encryption:
- encrypt the DB: 40 minutes
- sum of two DBs: 3 seconds
- mult. by a constant: 25 minutes
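The two homomorphic operations can be tried out with a toy version of the Paillier cryptosystem, a well-known linearly-homomorphic scheme. The slides do not say which scheme was used, so Paillier, the tiny primes, and all function names below are assumptions for illustration; this sketch is not secure. In Paillier, adding plaintexts corresponds to multiplying ciphertexts, and multiplying by a public constant corresponds to exponentiating the ciphertext.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier keys with generator g = n + 1 (tiny primes, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # decryption constant, valid because g = n + 1
    return n, (n, lam, mu)  # (pk, sk)


def encrypt(pk, m, rng=random):
    """c = (n + 1)^m * r^n mod n^2 for a fresh random r coprime to n."""
    n = pk
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add(pk, c1, c2):
    """Enc(m1) [+] Enc(m2): multiply the ciphertexts."""
    return (c1 * c2) % (pk * pk)


def scalar_mul(pk, k, c):
    """k [.] Enc(m) for a public constant k: exponentiate the ciphertext."""
    return pow(c, k, pk * pk)


pk, sk = keygen()
c1, c2 = encrypt(pk, 1234), encrypt(pk, 4321)
print(decrypt(sk, add(pk, c1, c2)))       # decrypts to 1234 + 4321
print(decrypt(sk, scalar_mul(pk, 3, c2))) # decrypts to 3 * 4321
```

Results are only correct modulo n, so real numbers such as the database entries above would first need a fixed-point integer encoding that stays below n.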

Illustration

[Figure: data owners D1, D2, ..., Dt, an ML Engine, and a Crypto Provider. The Crypto Provider generates (sk, pk) and distributes pk. Each data owner sends its encrypted data Enc_pk(D1), Enc_pk(D2), ..., Enc_pk(Dt) to the ML Engine. The ML Engine and the Crypto Provider then run an interactive protocol whose output is w trained on D1 ∪ ··· ∪ Dt.]

Interactive protocol:

1. the ML engine “masks inside the encryption”: Enc_pk(D) → Enc_pk(D̃)

2. the crypto provider decrypts, gets D̃ and computes a “masked model”, w̃

3. the ML engine computes the real model w from the masked one
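The three protocol steps can be sketched in the clear to show why masking and unmasking cancel out. The slides do not give the masking formula, so the following is one possible choice, stated as an assumption: for a linear system A w = b (for ridge regression, A = XᵀX + λI and b = Xᵀy), the ML engine picks a random invertible matrix R and a random vector r, sends C = A R and d = b + A r, and recovers w = R w̃ − r. Encryption is omitted entirely here; in the real protocol step 1 would be carried out on ciphertexts using only the linear operations.

```python
import random
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def random_invertible(d, rng):
    """Unit lower-triangular times unit upper-triangular, so determinant is 1."""
    low = [[1 if i == j else (rng.randint(-5, 5) if j < i else 0)
            for j in range(d)] for i in range(d)]
    up = [[1 if i == j else (rng.randint(-5, 5) if j > i else 0)
           for j in range(d)] for i in range(d)]
    return matmul(low, up)


def mask(A, b, rng):
    """Step 1 (ML engine): hide (A, b) behind the secret mask (R, r)."""
    R = random_invertible(len(A), rng)
    r = [rng.randint(-50, 50) for _ in A]
    C = matmul(A, R)
    d = [bi + ari for bi, ari in zip(b, matvec(A, r))]
    return C, d, (R, r)


def solve_masked(C, d):
    """Step 2 (crypto provider): sees only the masked system, returns w_tilde."""
    return solve(C, d)


def unmask(w_tilde, secret):
    """Step 3 (ML engine): w = R w_tilde - r."""
    R, r = secret
    return [x - ri for x, ri in zip(matvec(R, w_tilde), r)]


A, b = [[2, 1], [1, 3]], [3, 5]
C, d, secret = mask(A, b, random.Random(0))
w = unmask(solve_masked(C, d), secret)
print(w, solve(A, b))
```

The check behind the sketch: C w̃ = d means A R w̃ = b + A r, so R w̃ = A⁻¹b + r, and subtracting r leaves the true solution.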

Results for seven UCI datasets (time in seconds):
(phase 1 = encryption, phase 2 = interactive protocol)

n = number of training data points
d = number of features

Comments on Homomorphic Encryption

• Benefits
  • High utility, because no noise!
  • No one sees data “in the clear”

• Disadvantages
  • Models (or even just predictions) may still give away more information about training examples (e.g., patients) than about other examples (patients)
  • Very high (as of now, completely impractical) runtimes for some methods (fully homomorphic encryption)
  • Feasible approaches (e.g., linearly homomorphic encryption) require re-developing each learning algorithm (e.g., ridge regression) from scratch with limited operations
  • Protections may be lost if/when quantum computers become available

Just Releasing a Learned Model Can Violate Privacy

• IWPC Warfarin Model
  • Can we predict the genotype of the training set better than others?

Privacy Blueprint

Differential Privacy (Dwork, 2006)

• Goal
  • Small added risk of an adversary learning (private) information about an individual if his/her data is in the private database versus not in the database

• Informally
  • Query output does not change much between neighboring databases
  • E.g.: what is the fraction of people in the clinic with diabetes?

Differential Privacy Definition

• Given
  • Input database D
  • Randomized algorithm f: D -> Range(f)
• f is (ε, δ)-differentially private iff

    Pr[f(D) ∈ S] ≤ e^ε · Pr[f(D') ∈ S] + δ

  • For any S ⊆ Range(f) and D' where d(D, D') = 1
• ε and δ are the privacy budget
  • Smaller means more private
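For a mechanism with a small, finite output space, the (ε, δ) condition Pr[f(D) ∈ S] ≤ e^ε · Pr[f(D') ∈ S] + δ can be checked directly by enumerating every output set S. The sketch below does this for randomized response on a single bit (report the true bit with probability `p_truth`, otherwise the flipped bit). Randomized response is not discussed in the slides; it is used here only as an assumed, easy-to-verify example, and the function names are hypothetical.

```python
import math
from itertools import chain, combinations


def randomized_response(true_bit, p_truth):
    """Exact output distribution: true bit w.p. p_truth, flipped bit otherwise."""
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}


def all_subsets(outcomes):
    items = list(outcomes)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def is_dp(dist_d, dist_d2, eps, delta=0.0, tol=1e-12):
    """Check the (eps, delta) inequality in both directions for every set S."""
    outcomes = set(dist_d) | set(dist_d2)
    for S in all_subsets(outcomes):
        p = sum(dist_d.get(o, 0.0) for o in S)
        q = sum(dist_d2.get(o, 0.0) for o in S)
        bound = math.exp(eps)
        if p > bound * q + delta + tol or q > bound * p + delta + tol:
            return False
    return True


# Neighboring "databases": the single record is 0 versus 1.
d0 = randomized_response(0, p_truth=0.75)
d1 = randomized_response(1, p_truth=0.75)
print(is_dp(d0, d1, eps=math.log(3)))        # the tight budget for p_truth = 0.75
print(is_dp(d0, d1, eps=1.0))                # a smaller eps is not satisfied
print(is_dp(d0, d1, eps=1.0, delta=0.1))     # unless some delta slack is allowed
```

The tight ε for this mechanism is ln(p_truth / (1 − p_truth)), which illustrates the "smaller means more private" point: pushing `p_truth` toward 0.5 shrinks ε but makes each answer less informative.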

Obtaining Differential Privacy

• Note: Definition requires stochastic output… how to achieve?

• Perturbation {Laplace Mechanism} (Dwork, 2006)
  • Calculate correct answer f(D)
  • Add noise: f(D) + η

• Soft-max {Exponential Mechanism} (McSherry and Talwar, 2007)
  • Quality function q(D, s)
  • Exponential weighting exp(ε q(D, s))

• In both cases, noise is proportional to the sensitivity of the function
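The exponential mechanism can be sketched as weighted sampling over candidate outputs s. The slide writes the weight as exp(ε q(D, s)); the sketch below uses the commonly stated normalized form exp(ε q(D, s) / (2Δq)), where Δq is the sensitivity of the quality function, which is an assumption on top of the slide. The `scores` dictionary stands in for q(D, s) already evaluated on a fixed database, and all names are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(scores, eps, sensitivity=1.0):
    """Selection probabilities proportional to exp(eps * q / (2 * sensitivity))."""
    best = max(scores.values())
    # Subtracting the best score avoids overflow and does not change the ratios.
    weights = {s: math.exp(eps * (q - best) / (2.0 * sensitivity))
               for s, q in scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def exponential_mechanism(scores, eps, sensitivity=1.0, rng=random):
    """Sample one candidate according to the exponential weights."""
    probs = exponential_mechanism_probs(scores, eps, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Assumed example: pick the most common diagnosis; quality = count in the database.
scores = {"diabetes": 30, "asthma": 25, "flu": 5}
print(exponential_mechanism_probs(scores, eps=0.5))
print(exponential_mechanism(scores, eps=0.5, rng=random.Random(0)))
```

With ε = 0 every candidate is equally likely (maximum privacy, no utility); as ε grows, the distribution concentrates on the highest-quality candidate.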

Global Sensitivity

• Given f: D -> R, global sensitivity of f is

    GS(f) = max over all D, D' with d(D, D') = 1 of |f(D) − f(D')|

• Worst case
• Once f and the domain of D are chosen, global sensitivity is fixed
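For a tiny domain, global sensitivity can be computed by brute force, which makes the "worst case over all neighboring pairs" reading concrete. The sketch assumes that d(D, D') = 1 means the two databases have the same size and differ in exactly one record; other neighbor definitions (adding or removing a record) give different values. The function name and the binary domain are illustrative choices.

```python
from itertools import product


def global_sensitivity(f, domain, n):
    """Max |f(D) - f(D')| over all size-n databases differing in one record."""
    worst = 0
    for D in product(domain, repeat=n):
        for i in range(n):
            for v in domain:
                if v == D[i]:
                    continue
                D2 = D[:i] + (v,) + D[i + 1:]  # replace record i only
                worst = max(worst, abs(f(D) - f(D2)))
    return worst


def count(D):
    """Number of records equal to 1, e.g. patients with diabetes."""
    return sum(D)


def fraction(D):
    """Fraction of records equal to 1."""
    return sum(D) / len(D)


print(global_sensitivity(count, domain=(0, 1), n=4))
print(global_sensitivity(fraction, domain=(0, 1), n=4))
```

The result depends only on f and the domain, not on any particular database: changing one binary record moves a count by at most 1 and a fraction over n records by at most 1/n.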

Add Laplace noise with μ = 0 and b a function of the sensitivity and ε
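A minimal sketch of the Laplace mechanism follows, using the standard choice of scale b = (global sensitivity) / ε; the slide only says that b is a function of the two, so that formula is stated here as an assumption. The noise is drawn as the difference of two exponential variates, which has a Laplace distribution with mean 0. The query is the clinic example: the fraction of people with diabetes, whose sensitivity is taken to be 1/n under the replace-one-record notion of neighbors.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two Exp(1/scale) variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, eps, rng=random):
    """Return f(D) + eta with eta ~ Laplace(0, b), b = sensitivity / eps."""
    b = sensitivity / eps
    return true_answer + laplace_noise(b, rng)


# Assumed toy clinic database: 1 = has diabetes, 0 = does not.
database = [1] * 12 + [0] * 88
true_fraction = sum(database) / len(database)
sensitivity = 1.0 / len(database)  # one changed record moves the fraction by 1/n

rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_fraction, sensitivity, eps, rng)
    print(eps, round(noisy, 4))
```

Smaller ε means a larger b and noisier answers, and for a fixed ε the noise shrinks as the database grows, since the sensitivity of the fraction is 1/n.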

Privacy-Utility Tradeoff for Private Warfarin Model

Comments on Differential Privacy

• Provable guarantees, regardless of the side information the adversary has

• Elegant formulation that leads to many attractive algorithms

• Has insights for other areas such as fairness

• Poor intuition for how to select ε

• Can kill utility (e.g., accuracy, AUC) unless we have very many examples… so a good fit for the age of Big Data but not for medium data

• How to set the privacy budget? If we release a DP dataset, we cannot update it with a new release without adding to the previous ε, so we must plan far ahead
