Controlling Leakage and Disclosure Risk in Semantic Big Data Pipelines
Ernesto Damiani (joint work with Paolo Ceravolo)





Outline

•  Introduction
•  Prerequisites and Vision
•  New Big Data Threats
•  Some ideas for a KNOW, PREVENT, DETECT, COUNTER paradigm to counter them


BIG DATA INITIATIVE

Drive open research & innovation collaboration with UAE and international institutes and organisations to carry out world-leading research and deliver tangible value, training, knowledge transfer and skills development in line with the UAE strategic priorities in the areas of: smart enterprise, smart infrastructure & smart society

Security Research Center

•  SECURITY OF THE GLOBAL ICT INFRASTRUCTURE
   –  Network and Communications Security
   –  Business Process Security and Privacy
   –  Security and Privacy of Big Data Platforms
•  SECURITY ASSURANCE
   –  Security Risk Assessment and Metrics
   –  Continuous Security Monitoring and Testing
•  DATA PROTECTION AND ENCRYPTION
   –  High Performance Homomorphic Encryption
   –  Lightweight Cryptography and Mutual Authentication


SESAR LAB

•  Secure Software Architectures and Knowledge-based systems lab (SESAR), http://sesar.dti.unimi.it
•  Located on the new campus in Crema, 40 km south-east of Milan
•  Industry collaborations: SAP, British Telecom, Nokia Siemens, Cisco, Telecom Italia
•  Part of the BigData Community


Some activities


Vision

•  Big Data is not just a technological advance but represents a paradigm shift in extracting value from complex multi-party processes


From classic data warehouse to Big Data


Internal vs. External data sources


Processing Models

•  Batch vs streaming
•  Hash vs sketch


Data Models

•  DATA MODELS:
   –  Non-relational (attribute-value)
   –  Extended relational (column or row-partitioned)
   –  Neo-relational (hybrid)
•  Large Data Sharing Infrastructure to feed MR computations


Designing Data Representations for Big Data Applications

•  Design as they teach you at school
•  Scale up -> Denormalize Instance (drop indexes, triggers)
•  Solve problems with read/write precedence -> Create write-to and read-from data replicas (keep consistency periodically)
•  Memcache the Denormalized Instance -> (loose ACID)


Relational denormalization refresher

•  Simple concept: flatten a repeating group in a single table
•  Instead of EMP (E#, D#, Ename) - DEPT (D#, DEPT, Address)
•  Use EMP (E#, Ename, DEPT, Address)
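
The flattening above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: rows are plain string arrays in the column order given on the slide, and the class and method names (`DenormalizeSketch`, `denormalize`) are hypothetical, not part of any tool mentioned in the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flatten EMP(E#, D#, Ename) and DEPT(D#, DEPT, Address)
// into a single EMP(E#, Ename, DEPT, Address) table.
final class DenormalizeSketch {

    /** emp rows are {E#, D#, Ename}; dept rows are {D#, DEPT, Address}. */
    static List<String[]> denormalize(List<String[]> emp, List<String[]> dept) {
        // Index departments by D# so each employee row needs a single lookup.
        Map<String, String[]> deptById = new HashMap<>();
        for (String[] d : dept) {
            deptById.put(d[0], d);
        }
        List<String[]> flat = new ArrayList<>();
        for (String[] e : emp) {
            String[] d = deptById.get(e[1]);
            if (d == null) {
                continue; // unknown department: skipped in this sketch
            }
            // Department name and address are copied into every employee row.
            flat.add(new String[] {e[0], e[2], d[1], d[2]});
        }
        return flat;
    }

    public static void main(String[] args) {
        List<String[]> emp = List.of(
            new String[] {"E1", "D1", "Alice"},
            new String[] {"E2", "D1", "Bob"});
        List<String[]> dept = List.of(
            new String[] {"D1", "Sales", "Crema"},
            new String[] {"D2", "Research", "Milan"});
        for (String[] row : denormalize(emp, dept)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Note that the department name and address now appear once per employee: reads need no join, but each copy is an extra place where the value can be observed.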


Denormalization drawbacks

•  Makes rows longer -> longer data transfers

•  Needs more RAM for in-memory processing

•  Redundant relationships improve performance at the expense of update overhead


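Memcache

Typical usage:

    public Data readData(String query) {
        // Try the cache first.
        Data answer = memcache.execute(query);
        if (answer == null) {
            // Cache miss: read from the database, then cache the result under the query.
            answer = database.read(query);
            memcache.write(query, answer);
        }
        return answer;
    }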


Low-level representation

•  Key-value data-stores
•  Persistent, distributed (key, value) maps
•  Organized in regions held by different servers
•  Every entity is a set of key-value pairs


Key-value reminder

•  A key has multiple components, specified as an ordered list.
   –  The major key identifies the entity and consists of the leading components of the key.
   –  The subsequent components are called minor keys. This organization is similar to a directory path specification in a file system (/Major/minor1/minor2/).
•  The "value" part of the key-value pair is an uninterpreted string of bytes of arbitrary length
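
A minimal in-memory sketch of this key organization, assuming keys are rendered as path strings and that a "-" component separates major from minor keys (that separator, and the names `KeyValueSketch`, `key`, `put` and `entity`, are assumptions made for illustration, not the API of any particular store):

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a (major key, minor key) -> byte[] map.
final class KeyValueSketch {
    // A sorted map keeps all keys sharing a major key adjacent,
    // like files under one directory.
    private final TreeMap<String, byte[]> store = new TreeMap<>();

    /** Builds "/major1/major2/-/minor1/minor2". */
    static String key(List<String> major, List<String> minor) {
        return "/" + String.join("/", major) + "/-/" + String.join("/", minor);
    }

    void put(List<String> major, List<String> minor, byte[] value) {
        // The value is stored as an uninterpreted byte array.
        store.put(key(major, minor), value);
    }

    /** All key-value pairs of one entity, found by a prefix scan on the major key. */
    SortedMap<String, byte[]> entity(List<String> major) {
        String prefix = "/" + String.join("/", major) + "/-/";
        return store.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch();
        kv.put(List.of("Employee", "42"), List.of("Data", "Photo"), new byte[] {1, 2, 3});
        kv.put(List.of("Employee", "42"), List.of("Data", "DeptID"), new byte[] {7});
        System.out.println(kv.entity(List.of("Employee", "42")).keySet());
    }
}
```

The prefix scan is what makes "every entity is a set of key-value pairs" cheap to read: one range lookup returns the whole entity.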


Example

    "Employee" : {
        "Data" : {
            "EmpID"  : "anyByteArray"
            "Photo"  : "anyByteArray"
            "DeptID" : "anyByteArray"
        }
    }                                           REGION 1
    "Department" : {
        "DeptID"          : "anyByteArray"
        "DeptDescription" : "anyByteArray"
    }                                           REGION 2

This is a key, not a column name!


Denormalized Example

    "Employee" : {
        "EmpData" : {
            "Photo" : "anyByteArray"
            "EmpID" : "anyByteArray"
        }
        "DeptData" : {
            "Description"  : "anyByteArray"
            "DeptID"       : "anyByteArray"
            "DeptLocation" : "anyByteArray"
        }
    }                                           REGION 1


Consensus

•  Since data items are replicated, operations can be attempted concurrently on replicas
•  Synchronization using leader election (Paxos)
•  Features
   –  Reliability and availability
   –  Easy-to-understand semantics
   –  Performance, throughput, acceptable latency
•  http://labs.google.com/papers/chubby-osdi06.pdf


Data Batch processing: Map/Reduce

•  Map/Reduce is a programming model for efficient distributed computing
•  It works like a Unix pipeline:
   –  cat input | grep | sort | uniq -c | cat > output
   –  Input | Map | Shuffle & Sort | Reduce | Output
•  Efficiency from
   –  Data routing based on keys, reducing seeks
   –  Pipelining
•  A good fit for a lot of applications
   –  Log processing
   –  Web index building
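
The Input | Map | Shuffle & Sort | Reduce | Output pipeline can be mimicked in memory with a word count, the usual toy example. This is a single-process sketch with hypothetical names (`MapReduceSketch`, `wordCount`); it does not use Hadoop and only illustrates the three phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the Map / Shuffle & Sort / Reduce phases.
final class MapReduceSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Map: emit one (word, 1) pair per word occurrence.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        // Shuffle & Sort: route pairs by key; the TreeMap keeps keys sorted.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce: sum the values collected for each key.
        Map<String, Integer> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int one : group.getValue()) {
                sum += one;
            }
            reduced.put(group.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data leak", "big data breach")));
    }
}
```

In a real deployment each phase runs on many nodes and the shuffle moves data between them, which is where key-based routing matters.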


Practical MapReduce = HDFS + Hadoop

Locality optimizations
•  Map-Reduce queries HDFS for locations of input data
•  Map tasks are scheduled close to the inputs when possible


Risk and Threats


Risk Components


Big Data Threats: Breach

•  In terms of the ISO 15408 model, a data breach occurs when "a digital information asset is stolen by attackers by breaking into the ICT systems or networks where it is held/transported"
•  Big Data Breach: theft of a Big Data asset executed by breaking into the ICT infrastructure of a collector, transformer, processor or user who holds it.
   –  Many attacks documented in the field can be classified as Big Data Breaches involving Data Source assets
   –  The Target data breach (late 2013, widely reported in 2014) involved 40 million debit and credit card numbers.
•  A Big Data Breach requires pro-active hostile behavior (the break-in)


Big Data Threats: Leak

•  A Big Data Leak can be defined as the (total or partial) disclosure of a Big Data Asset at a certain stage of its lifecycle.
   –  A Big Data Leak can happen when Big Data are (unwillingly) disclosed by the owner to the provider of an outsourced process, e.g. computing data analytics.
•  In terms of the attacker model, a Big Data Leak can be exploited even by an honest-but-curious attacker.


Big Data Threats: Degradation

•  Big Data Degradation can be defined as the injection of a doctored version of a Big Data Asset at a certain stage of its lifecycle.
   –  Big Data Degradation can happen when Big Data are poisoned by the provider of an outsourced process, e.g. computing data analytics.
•  In terms of the attacker model, Big Data Degradation requires pro-active hostile behavior (the injection).


Big Data Threats as APTs

•  An advanced persistent threat (APT) is a set of stealthy and continuous processes, often orchestrated by human(s) targeting a specific entity.
   –  "Advanced": signifies sophisticated techniques using malware to exploit vulnerabilities in systems.
   –  "Persistent": continuously monitoring and extracting data from a specific target.


The Silos problem

•  Different data are held by different departments

•  Representation and processing choices were made independently and may conflict

•  Regulatory differences in collection and usage may make merging a challenge

•  Early merge, late merge or never merge?


Data representation

•  The implications for data modelling and semantics have been masterfully discussed in several works...
•  However, less attention has been devoted to aspects that are usually secondary in centralized approaches
•  One of these aspects is the implication of pre-injection of Join for Data Loss Prevention...

ESWC 2016


Breaking the Silos


Tradeoffs

•  At ingestion time two tradeoffs must be made:
   –  I/O per request vs Total Data Volume: denormalization, if done well, brings more locality to data, and the amount of I/O per request decreases.
      •  A normalized relational store has to query multiple tables to fulfill each request, leading to non-localized fetches
      •  Non-localized fetches lead to more I/O, as each fetch requires a read and each read has a "block size" minimum.
   –  Processing Complexity vs Total Data Volume:
      •  Non-localized fetches are followed by assembling operations that require CPU time.
      •  Denormalized data processing is simpler, but at the cost of increased total data volume in the store.


Transparent De-normalization

•  Big Data tools support transparent denormalising at data ingestion time.
•  The user of a Big Data computation may well ignore
   1.  The number of replicas at run time
   2.  The Region border crossings generated at ingestion time for efficiency reasons.


De-normalization gray area

[Figure labels: Analytics Algorithms; Available Observation Space; Context; Gray Area]


Degradation via faulty values

Source: [13], with thanks


Need for a de-normalization index

•  The "gray area" is a critical issue for Big Data Leak prevention, especially for Big-Data-as-a-Service
•  The global likelihood of exposure of data in the gray area can be estimated via a Big Data storage's degree of de-normalization, or D-index [11]
   –  (Normalized) median of the number of replicas per data item held in the Big Data storage during a reference time interval Δ
•  Measurable via trusted probes [12], more later
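
As a rough illustration of the definition, the sketch below computes a normalized median of per-item replica counts. The slide does not say how the median is normalized; dividing by the largest replica count observed in the interval is an assumption of this sketch, as are the names `DIndexSketch` and `dIndex`. The definition in [11] may differ.

```java
import java.util.Arrays;

// Hypothetical sketch of a D-index computation.
// ASSUMPTION: the median is normalized by the maximum replica count observed.
final class DIndexSketch {

    /** replicasPerItem[i] = number of replicas of data item i observed during the interval. */
    static double dIndex(int[] replicasPerItem) {
        if (replicasPerItem.length == 0) {
            return 0.0;
        }
        int[] sorted = replicasPerItem.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Median: middle element, or the mean of the two middle elements.
        double median = (n % 2 == 1)
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        int max = sorted[n - 1];
        // Normalization into [0, 1] (assumed, see above).
        return max == 0 ? 0.0 : median / max;
    }

    public static void main(String[] args) {
        System.out.println(dIndex(new int[] {1, 1, 3, 5, 5}));
    }
}
```

In practice the replica counts would come from the trusted probes mentioned above rather than from a hand-written array.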


From D-index to disclosure probability (1)

•  The D-index is seen as a "propensity factor" to disclosure
•  Intuitively, it measures the overall "unrequested trips" that data item values have made to the neighborhoods of other related data items just because the "fuel price", i.e. the storage cost in the Big Data system, is low.


From D-index to disclosure probability (2)

•  The D-index itself cannot be directly identified as a probability
•  Although, being normalized, its value falls in the [0,1] interval, it lacks some formal properties we would expect from a likelihood (for instance, there is no relation linking (1 - D-index) and the integrity of the data space).
•  A formal mapping procedure can be devised to turn the D-index into a rigorous probability or possibility measure [6], [7].


Need for an accrual consensus index

•  For each data item i, Φi is the normalized number of updates that originated each value of i currently held in the Big Data storage
•  Measures the basis for the consensus that originated each value
   –  Small consensus basis -> higher likelihood of degradation of the data item
•  Inspired by the Cassandra failure index [14]
•  Measurable via a trusted detector that outputs a value, Φi, associated with each item.
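
A toy version of such a detector is sketched below. How the update counts are normalized is not stated on the slide; dividing by the largest count across items is an assumption, and the names (`ConsensusIndexSketch`, `phi`, `lowConsensus`) and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an accrual consensus index.
// ASSUMPTION: update counts are normalized by the largest count across items.
final class ConsensusIndexSketch {

    /** updatesPerItem: item id -> number of updates that produced its current value. */
    static Map<String, Double> phi(Map<String, Integer> updatesPerItem) {
        int max = 0;
        for (int updates : updatesPerItem.values()) {
            max = Math.max(max, updates);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : updatesPerItem.entrySet()) {
            result.put(e.getKey(), max == 0 ? 0.0 : (double) e.getValue() / max);
        }
        return result;
    }

    /** Items whose consensus basis falls below the threshold (sorted by id). */
    static List<String> lowConsensus(Map<String, Double> phi, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Double> e : phi.entrySet()) {
            if (e.getValue() < threshold) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged);
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Double> phi = phi(Map.of("a", 10, "b", 1, "c", 5));
        System.out.println(lowConsensus(phi, 0.2));
    }
}
```

Items flagged by `lowConsensus` are the ones whose current value rests on few updates, i.e. the ones the slide marks as more exposed to degradation.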


Independent interpretations of Φ

Source: [14]


Leak vs Breach vs Degradation revisited

•  Big Data Breach: the adversary breaks into the system and sees (a) all available data sources and (b) the internal state of the Big Data system.
   –  No silos boundaries: full playground!
•  Big Data Leak: the adversary collaborates in the computation of analytics and takes advantage of de-normalization to attract information in regions
•  Big Data Degradation: honest-but-curious adversaries will just peek, but a malicious attacker could doctor her own or other people's data, leading to wrong decisions which may cause permanent damage.


Some ideas

•  Systematic study of Big Data Security practices is still in its infancy.
•  Organize best practices around the work on top-level cybersecurity functions ongoing at NIST (available at http://www.nist.gov/itl/upload/draft_framework_core.pdf)
   –  Closely based on functions suggested by public comments.
•  These functions are Know, Prevent, Detect, Respond, and Recover.


A practical example


Data structure

From https://en.wikipedia.org/wiki/K-anonymity

•  This data has 2-anonymity with respect to the attributes 'Age', 'Gender' and 'State of domicile' since for any combination of these attributes there are always at least 2 rows with those exact attributes.
•  The attributes available to an adversary are called "quasi-identifiers". Each "quasi-identifier" tuple occurs in at least k records for a dataset with k-anonymity.
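
The definition translates directly into a check: group the rows by their quasi-identifier tuple and take the size of the smallest group. The sketch below does this over rows held as attribute maps; the class and method names (`KAnonymitySketch`, `kAnonymity`) and the sample rows are made up for illustration and are not the dataset shown on the slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compute the k for which a table is k-anonymous.
final class KAnonymitySketch {

    /** Returns the size of the smallest group of rows sharing the same quasi-identifier values. */
    static int kAnonymity(List<Map<String, String>> rows, List<String> quasiIdentifiers) {
        if (rows.isEmpty()) {
            return 0;
        }
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (Map<String, String> row : rows) {
            // The tuple of quasi-identifier values is the grouping key.
            List<String> tuple = new ArrayList<>();
            for (String attribute : quasiIdentifiers) {
                tuple.add(row.get(attribute));
            }
            groupSizes.merge(tuple, 1, Integer::sum);
        }
        return Collections.min(groupSizes.values());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("Age", "<=30", "Gender", "Female", "Disease", "Flu"),
            Map.of("Age", "<=30", "Gender", "Female", "Disease", "Cancer"),
            Map.of("Age", ">30", "Gender", "Male", "Disease", "Flu"),
            Map.of("Age", ">30", "Gender", "Male", "Disease", "TB"));
        System.out.println(kAnonymity(rows, List.of("Age", "Gender")));
    }
}
```

Running the same check on the rows returned by a query, rather than on the whole table, shows how a selective query can lower k.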


Data structure


From https://en.wikipedia.org/wiki/K-anonymity

Our dataset in Neo4j


Achieving the desired K-anonymity

•  There are two common methods for achieving k-anonymity for some value of k:
   –  Suppression: in our example we remove name
   –  Generalisation: in our example age values can be substituted with a range
•  But the k-anonymity level of a given subset of data selected by a query depends on two factors:
   –  the Obfuscation created by Suppression and Generalisation of some attributes in the original dataset
   –  the Segmentation of the query result


Segmentation

•  Suppose we submit a query which specifies the gender and a range for the age:

    MATCH (s:User), (d:Domicile), (r:Religion), (e:Disease),
          (s)-[q2:REL]->(r), (s)-[q1:REL]->(d), (s)-[q3:REL]->(e)
    WHERE toInt(s.age) < 25 AND s.gender = "Female"
    RETURN (s)-[]-();

•  The result has 2-anonymity w.r.t. Domicile and 1-anonymity w.r.t. Religion and Disease: guessing the value of the latter attributes will identify the patient.


Problem

•  In Big Data storage, a malicious user extracting/inspecting a region = selection of a subset of data

•  Segmentation of Big Data regions is difficult to control

•  Inferences are possible


A possible countermeasure: Redundant Relations

•  By adding redundant relations we can limit the effect of Segmentation

    MATCH (s:User), (e:Disease)
    WITH COLLECT(e) AS Disease, s
    FOREACH (e2 IN Disease |
      CREATE (s)-[q3:REL {context: "4321"}]->(e2))
    RETURN (s)-[]-();


Secret

•  The secret is a contextualization index that marks the relationships that originate from the true data source rather than from redundancy. In our example:

    MATCH (s:User), (d:Domicile), (r:Religion), (e:Disease),
          (s)-[q2:REL]->(r), (s)-[q1:REL]->(d),
          (s)-[q3:REL {context: "1234"}]->(e)
    WHERE toInt(s.age) < 25 AND s.gender = "Female"
    RETURN (s)-[q3]-(e), (s)-[q1]-(d), (s)-[q2]-(r);


Not a panacea w.r.t. distribution checks

•  An attacker can study the distributions among the contextual relationships

    MATCH (s:User)-[q:REL {context: "1234"}]->(e:Disease)
    RETURN id(s), Count(e) AS Relationships;


Hashing

•  All relationships are marked with the same context
•  A hash index is created over the triple: (s)-[REL]-(e)
•  The hash function is the secret
   –  Given (s) and (e) nodes we know if the relation is in the original dataset
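
One way to realize a secret hash function is a keyed hash. The sketch below swaps in HMAC-SHA256 from the standard `javax.crypto` API, with the key playing the role of the secret; this substitution, the "|" separator between node identifiers, and the names (`RelationHashSketch`, `registerGenuine`, `isGenuine`) are assumptions of the example, not the construction used in the talk.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: a keyed hash index over (s, e) pairs.
// ASSUMPTION: HMAC-SHA256 with a secret key stands in for the "secret hash function".
final class RelationHashSketch {
    private final SecretKeySpec key;
    // Tags of the relationships that come from the original dataset.
    private final Set<String> genuineTags = new HashSet<>();

    RelationHashSketch(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private String tag(String s, String e) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            // "|" separates the two node ids; ids are assumed not to contain it.
            byte[] digest = mac.doFinal((s + "|" + e).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    void registerGenuine(String s, String e) {
        genuineTags.add(tag(s, e));
    }

    /** True only if (s, e) was registered; recomputing the tag requires the key. */
    boolean isGenuine(String s, String e) {
        return genuineTags.contains(tag(s, e));
    }

    public static void main(String[] args) {
        RelationHashSketch index = new RelationHashSketch("demo-secret".getBytes(StandardCharsets.UTF_8));
        index.registerGenuine("user-1", "flu");
        System.out.println(index.isGenuine("user-1", "flu"));
        System.out.println(index.isGenuine("user-1", "cancer"));
    }
}
```

Because every stored relationship carries the same context value, an observer without the key cannot tell genuine from redundant edges by inspecting the graph, even if the set of tags is stored alongside the data.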


Technology cannot do it alone (1)

•  The opportunistic one-shot attacks typical of the early days of Big Data have been supplemented by leakages that are more persistent and, in some cases, more worrisome.
•  We need to start designing Big Data systems not just to prevent attacks and recover from them, but also to detect successful attackers quickly and contain them so that any data leakage can be identified and countered.


References

[1] Hesman Saey, T., "Big Data, Big Challenges", Science News, February 7, 2015.
[2] Chi, Guangqing, Jeremy R. Porter, Arthur G. Cosby, and David Levinson. 2013. "The Impact of Gasoline Price Changes on Traffic Safety: A Time Geography Explanation." Journal of Transport Geography 28(1): 1–11.
[3] Bellandi V., Cimato S., Damiani E., Gianini G. and Zilli, A., "Towards Economics-Aware Risk Assessment on The Cloud", IEEE Security and Privacy, to appear in November 2015.
[4] Demirkan, H., & Delen, D. (2013). Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decision Support Systems, 55(1), 412-421.
[5] Damiani, E., Oliboni, B., & Tanca, L. (2001). Fuzzy techniques for XML data smushing. In Computational Intelligence. Theory and Applications (pp. 637-652). Springer Berlin Heidelberg.
[6] Damiani, E., Cimato, S., & Gianini, G. (2014). "A risk model for cloud processes". The ISC International Journal of Information Security, 6(2), 99-123.
[7] Bellandi, V., Cimato, S., Damiani, E., & Gianini, G. (2015). "Possibilistic assessment of process-related disclosure risks in the cloud". In W. Pedrycz et al., eds., Computational Intelligence and Quantitative Software Engineering. Springer-Verlag, 2014.
[8] Chen, M., Mao, S., Zhang, Y., & Leung, V. C. (2014). Big data storage. In Big Data (pp. 33-49). Springer International Publishing.
[9] Forbes, "Big Data Breaches of 2014", available at http://www.forbes.com/sites/moneybuilder/2015/01/13/the-big-data-breaches-of-2014/, 2015.
[10] B. Biggio, B. Nelson, P. Laskov, "Poisoning Attacks against Support Vector Machines", Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012.
[11] E. Damiani, Toward Big Data Leak Analysis, Proceedings of IEEE PSBD 2015, San Jose, CA, 2015.
[12] Claudio Agostino Ardagna, Rasool Asal, Ernesto Damiani, Quang Hieu Vu: On the Management of Cloud Non-Functional Properties: The Cloud Transparency Toolkit. NTMS 2014: 1-4.
[13] Santosh Aditham, Nagarajan Ranganathan, A Novel Framework for Mitigating Insider Attacks in Big Data Systems, Proceedings of IEEE PSBD 2015, San Jose, CA, 2015.
[14] Naohiro Hayashibara, Xavier Défago, Rami Yared, and Takuya Katayama, The ϕ Accrual Failure Detector, JST IS-RR-2004-010.