HAL Id: hal-01956155
https://hal.archives-ouvertes.fr/hal-01956155
Submitted on 14 Dec 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Towards Scalable, Efficient and Privacy Preserving Machine Learning
Rania Talbi, Sara Bouchenak

To cite this version: Rania Talbi, Sara Bouchenak. Towards Scalable, Efficient and Privacy Preserving Machine Learning. Middleware ’18 Doctoral Symposium, Dec 2018, Rennes, France. hal-01956155



Towards Scalable, Efficient and Privacy Preserving Machine Learning

Rania Talbi, Sara Bouchenak
INSA Lyon, France
{firstname.lastname}@insa-lyon.fr

Poster sections: Context and Motivation, Related work, Objectives, Design principles, Preliminary results, References

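Context and Motivation

[Figure: companies $C_1$ to $C_4$ and the expression $M(\bigcup_i B_i)$.]

- $B_i$: Local bank transactions of $C_i$
- $C_F$: Fraudulent company
- $C_i$: Company i
- $A$: Central Supervision Authority
- $M$: Data mining for fraud detection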

December 10th, Middleware 2018's doctoral symposium - Rennes, France.

DynAmic Privacy Preserving machine Learning Framework (DAPPLE)

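[Figure: DAPPLE architecture. Data owners $DO_1$, $DO_2$, ..., $DO_n$ send $[S_{k1}]_{pk_1}$, $[S_{k2}]_{pk_2}$, ..., $[S_{kn}]_{pk_n}$ to the CSP, which holds $[w_k]_{pk_w}$; a querier $Q_j$ sends $[X_j]_{pk_j}$ and receives $[C_j]_{pk_j}$. Panel labels: Privacy Preserving Classifier Learning, Privacy Preserving Class Prediction, Incremental update of the data model.]

- $DO_i$: Data Owner i
- $Q_j$: Classification Querier j
- $[w_k]_{pk_w}$: Encrypted data model
- CSP: Classification Service Provider
- $[X_j]_{pk_j}$: Encrypted classification query
- $[C_j]_{pk_j}$: Encrypted classification response
- $[S_{ki}]_{pk_i}$: Encrypted local training data chunk from data owner $DO_i$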

Objectives

- Minimize the computational costs incurred by privacy preservation.
- Provide an end-to-end privacy preserving outsourced data classification service.
- Enable a set of mutually untrusted data owners to have a global vision on the union of their data without breaching the privacy of each one of them.
- Enable dynamic data model updates when new training data samples are available.

Preliminary results

- We have used a synthetic dataset for fraud detection in a B2B network.
- This dataset contains 1000 bank transactions with 9 attributes each.
- We compare our work to the Ciphermed framework [8].

Related work

PPML approaches differ along several dimensions:

- Different ML algorithms: Clustering [1], Classification [2], Association Rule Mining [3]
- Different privacy-preservation objectives: ML output protection, original data protection
- Different architectures: Distributed [4], Outsourced [5]
- Privacy preservation techniques: cryptographic techniques (SMC/HE, GC, OT), non-cryptographic techniques (PP-Data Publishing techniques)

[Figure: three diagrams, each with the axes Privacy, Runtime and Utility.]

Design principles

- Cryptographic based protection (data model, training data, classification queries and responses)
- Decent privacy and utility levels
- Partially homomorphic encryption (PHE) based building blocks
- Efficient runtime
- Entirely outsourced ML computations over encrypted data
- Combine PHE with cryptographic blinding (DTPKC cryptosystem [6])

ex: $[x]_{pk} \otimes [r]_{pk} = [x \oplus r]_{pk}$
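The blinding identity above can be tried out with any additively homomorphic scheme. The following is a minimal sketch that uses textbook Paillier as a stand-in for the DTPKC cryptosystem [6]; it is not the poster's implementation. It assumes that $\otimes$ is ciphertext multiplication modulo $N^2$ and that $\oplus$ is plaintext addition modulo $N$. The names `encrypt`, `decrypt` and `add_encrypted` and the toy key size are illustrative only.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes.
# Assumption: a stand-in for DTPKC; far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # valid for generator g = N + 1

def encrypt(m):
    """Return [m]_pk = (1 + N)^m * r^N mod N^2 for a fresh random r."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Recover m from [m]_pk using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add_encrypted(c1, c2):
    """Multiply ciphertexts, which adds the underlying plaintexts mod N."""
    return c1 * c2 % N2

x = 42                          # value to protect
r = secrets.randbelow(N)        # random blinding factor
blinded = add_encrypted(encrypt(x), encrypt(r))

# Whoever decrypts `blinded` only learns x + r mod N, not x itself.
recovered = (decrypt(blinded) - r) % N
```

Because `r` is drawn uniformly from the plaintext space, the decrypted value `x + r mod N` is itself uniformly distributed, which is what lets a blinded value be handed to another party for decryption.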

[Figure: exchange between $U_1$ and $U_2$; the arrows between them are labelled (2) and (4).]

- (1) Blind inputs
- (2) Partially decrypt blinded values
- (3) Decrypt blinded values
- (4) Run operation over blinded values
- (5) Remove blinding from the result
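As an illustration of the blind, decrypt, compute, unblind pattern listed above, here is a hedged sketch of a two-party multiplication over encrypted values. Several things are assumptions that the poster does not state: the operation (multiplication), the use of textbook Paillier instead of DTPKC [6], and the collapse of the partial and final decryption steps into a single `decrypt` call by $U_2$. All function names are hypothetical.

```python
import math
import secrets

# Toy Paillier parameters (two known Mersenne primes); illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1, c2):
    """[a], [b] -> [a + b]"""
    return c1 * c2 % N2

def scale(c, k):
    """[a], k -> [k * a]; a negative k is reduced mod N first."""
    return pow(c, k % N, N2)

def u1_blind(cx, cy):
    """Blind inputs: U1 adds random masks under encryption."""
    rx, ry = secrets.randbelow(N), secrets.randbelow(N)
    return add(cx, encrypt(rx)), add(cy, encrypt(ry)), (rx, ry)

def u2_multiply(bx, by):
    """U2 decrypts the blinded values (simplified: one full decryption),
    runs the operation on them, and re-encrypts the result."""
    return encrypt(decrypt(bx) * decrypt(by) % N)

def u1_unblind(ch, cx, cy, masks):
    """Remove blinding: xy = (x+rx)(y+ry) - ry*x - rx*y - rx*ry."""
    rx, ry = masks
    out = add(ch, scale(cx, -ry))
    out = add(out, scale(cy, -rx))
    return add(out, encrypt(-rx * ry))

cx, cy = encrypt(6), encrypt(7)
bx, by, masks = u1_blind(cx, cy)
ch = u2_multiply(bx, by)
cxy = u1_unblind(ch, cx, cy, masks)   # encryption of 6 * 7
```

In this sketch $U_2$ only ever sees `x + rx` and `y + ry`, while $U_1$ only ever handles ciphertexts; in a DTPKC-style setting the decryption key would additionally be split so that neither party can decrypt alone.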

- We implemented the VFDT incremental decision tree learning algorithm [7].

- Naive approach: a combination of low-level PP building blocks
- 1st optimization: use inline building blocks
- 2nd optimization: parallel computing

[Figure panels: A, B]
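For readers unfamiliar with VFDT [7], its core is a split test based on the Hoeffding bound $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$: a leaf is split once the gap between the two best attributes exceeds $\epsilon$. The sketch below shows that test on plaintext values only; how DAPPLE evaluates it over encrypted data is not described on the poster and is not modelled here. The default values of `delta` and `tie_threshold` are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean
    observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, n_classes=2,
                 delta=1e-7, tie_threshold=0.05):
    """Decide whether a leaf that has seen n samples should be split.

    best_gain and second_gain are the information gains of the two best
    candidate attributes; the range of information gain is log2(n_classes).
    """
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    clear_winner = best_gain - second_gain > eps
    tie = eps < tie_threshold   # candidates too close to ever separate
    return clear_winner or tie

# Example: after 1000 samples, a gap of 0.25 is decisive, 0.02 is not.
decisive = should_split(0.30, 0.05, n=1000)
undecided = should_split(0.30, 0.28, n=1000)
```

Since the bound shrinks as $1/\sqrt{n}$, each leaf only needs to keep sufficient statistics rather than the samples themselves, which is what makes the incremental model updates listed in the objectives possible.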

References

- [1] X. Hu et al.: Privacy-Preserving K-Means Clustering Upon Negative Databases. ICONIP (4) 2018.
- [2] S. Kim et al.: Privacy-Preserving Naive Bayes Classification Using Fully Homomorphic Encryption. ICONIP (4) 2018: 349-358.
- [3] L. Liu et al.: Privacy-Preserving Mining of Association Rule on Outsourced Cloud Data from Multiple Parties. ACISP 2018: 431-451.
- [4] H. Yu et al.: Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD 2006: 647-656.
- [5] T. Li et al.: Outsourced privacy-preserving classification service over encrypted data. J. Network and Computer Applications 106: 100-110 (2018).
- [6] X. Liu et al.: An Efficient Privacy-Preserving Outsourced Calculation Toolkit With Multiple Keys. IEEE Trans. Information Forensics and Security 11(11): 2401-2414 (2016).
- [7] P. Domingos et al.: Mining high-speed data streams. KDD 2000: 71-80.
- [8] R. Bost et al.: Machine Learning Classification over Encrypted Data. NDSS 2015.

2018 ACM/IFIP International Middleware Conference, Doctoral Symposium, December 10-14th 2018, Rennes, France