SHARING SENSITIVE DATA WITH CONFIDENCE: THE DATATAGS SYSTEM

Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer, IQSS, Harvard University

Michael Bar-Sinai
PhD candidate in Computer Science at the Ben-Gurion University of the Negev, Israel
Fellow at the Institute for Quantitative Social Science at Harvard University

Latanya Sweeney
Professor of Government and Technology in Residence
Director of Data Privacy Lab, Harvard University


Data sharing: good for you and good for the world

• Researchers: Get credit for their data
• Publishers and Journals: Verify published work
• Federal funding agencies: Make public assets accessible
• Science: Validate, reuse and extend previous work

dataverse.org

• Open-source software developed at Harvard’s IQSS since 2006
• Used to share, publish, cite and archive research data
• Installed in 12 sites worldwide
• Serving 100s of universities and organizations

Harvard Dataverse: dataverse.harvard.edu

• Started as a community repository for Social Science
• Now open to all research fields and all researchers
• More than 1300 dataverses
• More than 59,000 datasets
• More than 1,500,000 downloads

Data Repositories vs Repository Software

But, existing community repositories do not support sensitive data

“User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”

“Submitter represents and warrants that the Content does not contain any information (i) which identifies, or which can be used in conjunction with other publicly available information to personally identify, any individual;”

“If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. It is our assumption that you have received any necessary informed consent authorizations that your organizations require prior to submitting your sequences.”

GenBank

HOW CAN WE MAXIMIZE SHARING SENSITIVE DATA WHILE BEING MINDFUL OF PRIVACY?

Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601

A datatag is a set of security features and access requirements for file handling.

A datatags repository is one that stores and shares data files in accordance with standardized and ordered levels of security and access requirements.

A DataTags Repository must satisfy the following conditions:

1. Supports more than one datatag
2. Each file in the repository must have one and only one datatag
   a. additional requirements cannot weaken the file security
   b. and cannot require the same or more security than a more restrictive datatag
3. A recipient of a file from the repository must:
   a. satisfy the file’s access requirements,
   b. produce sufficient credentials as requested,
   c. and agree to any terms of use required to acquire the file.
4. Provides technological guarantees for requirements 1, 2 and 3.
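The recipient conditions in item 3 can be read as one access check made before a file is released. The sketch below is a minimal illustration under stated assumptions, not the Dataverse implementation: the names `FileRecord`, `Recipient` and `may_release`, and the example requirement strings, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """A stored file carrying exactly one datatag (condition 2)."""
    name: str
    datatag: str
    access_requirements: frozenset = frozenset()
    required_credentials: frozenset = frozenset()
    terms_of_use: frozenset = frozenset()


@dataclass
class Recipient:
    """What a requesting user has satisfied, presented and agreed to."""
    satisfied_requirements: set = field(default_factory=set)
    credentials: set = field(default_factory=set)
    agreed_terms: set = field(default_factory=set)


def may_release(file: FileRecord, recipient: Recipient) -> bool:
    """Return True only if conditions 3a, 3b and 3c all hold."""
    return (
        file.access_requirements <= recipient.satisfied_requirements  # 3a
        and file.required_credentials <= recipient.credentials        # 3b
        and file.terms_of_use <= recipient.agreed_terms               # 3c
    )


# Hypothetical file tagged Yellow, with illustrative requirement labels.
survey = FileRecord(
    "survey.tab",
    "Yellow",
    access_requirements=frozenset({"approval"}),
    required_credentials=frozenset({"password"}),
    terms_of_use=frozenset({"click DUA"}),
)
```

Using subset checks means a recipient who has done more than is required still passes, while a single missing item blocks the release.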

Datatags Levels

Tag Type | Description          | Security Features                       | Access Requirements
Blue     | Public               | Clear storage, Clear transmission       | Open
Green    | Controlled public    | Clear storage, Clear transmission       | Email, OAuth verified registration
Yellow   | Accountable          | Clear storage, Encrypted transmit       | Password, Registered, Approval, Click DUA
Orange   | More accountable     | Encrypted storage, Encrypted transmit   | Password, Registered, Approval, Signed DUA
Red      | Fully accountable    | Encrypted storage, Encrypted transmit   | Two-factor authentication, Approval, Signed DUA
Crimson  | Maximally restricted | MultiEncrypt store, Encrypted transmit  | Two-factor authentication, Approval, Signed DUA
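The table above can be encoded as an ordered enumeration plus a lookup of handling policies. This is a sketch only: the integer values are an assumption used to express the Blue-to-Crimson ordering, and `DataTag`, `Policy`, `POLICIES` and `strictest` are hypothetical names, not part of any DataTags library.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataTag(IntEnum):
    """Tags ordered from least to most restrictive (values are illustrative)."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass(frozen=True)
class Policy:
    storage: str
    transmission: str
    access: tuple


# Entries transcribed from the levels table.
POLICIES = {
    DataTag.BLUE: Policy("clear", "clear", ("open",)),
    DataTag.GREEN: Policy("clear", "clear",
                          ("email or OAuth verified registration",)),
    DataTag.YELLOW: Policy("clear", "encrypted",
                           ("password", "registered", "approval", "click DUA")),
    DataTag.ORANGE: Policy("encrypted", "encrypted",
                           ("password", "registered", "approval", "signed DUA")),
    DataTag.RED: Policy("encrypted", "encrypted",
                        ("two-factor authentication", "approval", "signed DUA")),
    DataTag.CRIMSON: Policy("multi-encrypted", "encrypted",
                            ("two-factor authentication", "approval", "signed DUA")),
}


def strictest(tags):
    """Return the most restrictive tag in a collection."""
    return max(tags)
```

Because the enumeration is ordered, comparisons such as `DataTag.RED > DataTag.YELLOW` express "more restrictive than" directly.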

DATATAGS WITH HARVARD DATAVERSE

DataTags vs Harvard Security Levels

Level 1: No sensitive data; open data
Level 1: De-identified data
Level 2: Confidential information by University standards; no material harm
Level 3: Confidential information that could cause material harm (non-level 4 FERPA)
Level 4: High risk confidential information (SSN)
Level 5: Information that would cause severe harm

Dataverses, Datasets, Data Files and DataTags

A Datatag is assigned to each Data File (not to the Dataset)

DataTags Workflow with Dataverse

[Workflow diagram; labels: Data File Ingestion; Automatic Interview; Review Board Approval; Sensitive Dataset; Direct Access; Authorized Signed DUA; Privacy Preserving Access]

http://datatags.org
http://privacytools.seas.harvard.edu

A Curator Model for Privacy-Preserving Analysis

Acknowledgement: Honaker, J. and Nissim, K., Data Privacy Tools Project

Differentially Private statistics (summaries, causal inference, regression, interactive queries)

Credentials and Retrieval in Dataverse

• Data File not restricted; Guestbook – Email to access
• Data File restricted; Dataverse/InCommon account; Request access; Click DUA
• Data File restricted; Dataverse/InCommon account; Request access; Sign DUA
• Data File restricted; InCommon account; Request access; Two-Factor authentication; Sign DUA

OTHER TYPES OF DATATAGS REPOSITORIES

Betty: Sole Researcher

• Received consent from participants
• Repository for sharing highly sensitive data (not necessarily Harvard Dataverse)

Betty: Global Research Repository

• Ingestion and Decision-making Knowledge: IRB determination or an interview system.
• Codification and Infrastructure: Blue, Green, Yellow, Orange, Red, Crimson.
• Credentials and Retrieval: Different files may additionally require specific terms of use based on legal or regulatory requirements or adopted best practices.

(Same use case as Dataverse)

Adam: Large Medical Research Group

• Repository for sharing local data
• Repository for published data
• Repository for sharing with collaborators

Diane: Multinational Corporation

• Cloud contains data from all over the world, collected under a variety of terms, subject to different laws
• Repository that enforces requirements on employee access

Charles: Institutional Review Board

• Document committee decisions
• Recommend handling based on prior decisions

techscience.org

How technology impacts humans.

DATA

Direct Deposit; Direct Tagging

Khanna A. Facebook's Privacy Incident Response: a study of geolocation sharing on Facebook Messenger. Technology Science. 2015081101. August 11, 2015. http://techscience.org/a/2015081101

Published 2015-09-29

Sweeney L, Yoo J. De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data. Technology Science. 2015092901. September 29, 2015. http://techscience.org/a/2015092901

DATA TAGGING TOOLS

A Data tagging tool needs:

• Formal description of a Datatag
  – Capture the data handling policy of the tag
  – Capture the “stricter-than” ordering
• Interview creation tool
  – Support user-friendly interviews
  – Decide on the datatag based on the answers only

Formal Description of a Datatag

• Model data handling policies as a set of orthogonal aspects
  – Storage encryption, access requirements…
• Describe implementation options for each aspect; order implementations from lenient to strict
  – Clear < Encrypted < MultiEncrypt
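The idea of orthogonal aspects, each with its options ordered from lenient to strict, can be sketched as follows. This is an illustration under assumptions: the `Transit` aspect and the helpers `at_least_as_strict` and `compose` are hypothetical, and combining two policies by taking the stricter option per aspect is one plausible reading of the “stricter-than” ordering, not something the slides spell out.

```python
from enum import IntEnum


class Storage(IntEnum):
    """Storage options, ordered Clear < Encrypted < MultiEncrypt."""
    CLEAR = 0
    ENCRYPTED = 1
    MULTI_ENCRYPT = 2


class Transit(IntEnum):
    """Transmission options, ordered lenient to strict (assumed aspect)."""
    CLEAR = 0
    ENCRYPTED = 1


def at_least_as_strict(a: dict, b: dict) -> bool:
    """True if policy `a` is at least as strict as `b` in every aspect."""
    return all(a[aspect] >= b[aspect] for aspect in b)


def compose(a: dict, b: dict) -> dict:
    """Combine two policies by keeping the stricter option per aspect."""
    return {aspect: max(a[aspect], b[aspect]) for aspect in a}


# Two policies that are incomparable: each is stricter in one aspect only.
p = {"storage": Storage.ENCRYPTED, "transit": Transit.CLEAR}
q = {"storage": Storage.CLEAR, "transit": Transit.ENCRYPTED}
combined = compose(p, q)
```

Because the aspects are independent, the ordering over whole policies is partial: `p` and `q` above cannot be ranked against each other, yet their combination is at least as strict as both.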

Data Handling Policy Space

Tags: Tags Space file (.ts)

• Describe a tag space
• Convenience features: hierarchy, “slots” of different types, top-down design support, comments…

Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU

Comprehension Aid: Visualization

Finding the Right Tag – Decision Graph

• Directed, Acyclic Graph
• Node Types:
  – Ask
  – Set
  – Convenience: Call, End, Reject, Todo

Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
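A decision graph built from Ask and Set nodes can be walked with a short loop. The sketch below covers only those two node types and leaves out the convenience nodes; the class names `AskNode` and `SetNode`, the function `run_interview`, and the two sample questions are hypothetical and are not the syntax of the actual decision graph language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetNode:
    """Assign values to tag slots, then continue to the next node."""
    values: dict
    next: Optional[object] = None


@dataclass
class AskNode:
    """Ask a question; each possible answer leads to another node."""
    question: str
    answers: dict


def run_interview(node, replies: dict) -> dict:
    """Walk the graph from `node`, looking up each answer in `replies`."""
    tag = {}
    while node is not None:
        if isinstance(node, SetNode):
            tag.update(node.values)
            node = node.next
        else:  # AskNode: follow the edge chosen by the user's answer
            node = node.answers[replies[node.question]]
    return tag


# Hypothetical two-question interview.
Q1 = "Does the data contain personal information?"
Q2 = "Is it medical data?"
graph = AskNode(Q1, {
    "no": SetNode({"storage": "clear", "transit": "clear"}),
    "yes": AskNode(Q2, {
        "no": SetNode({"storage": "clear", "transit": "encrypted"}),
        "yes": SetNode({"storage": "encrypted", "transit": "encrypted"}),
    }),
})
```

The resulting tag depends only on the answers given, which matches the requirement that the tool decide on the datatag based on the answers only.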

Interview Visualization

Interview credit: The Data Privacy Lab @ Harvard (Latanya Sweeney, Sean Hooley), Berkman Ctr. for Internet and Society (Alexandra Wood, David O'Brien, clinical students), IQSS (Mercè Crosas, Michael Bar-Sinai). Part of the Privacy tools for sharing research data project

Interview on the Web

Interview available at datatags.org

Decision Graph Points

• Familiar “interview with a specialist” metaphor
• Implicitly describe logic inference
• Analysis: Detection of independent parts
• Queries, such as “what series of answers will create a datatag that allows clear storage?”
• Optimizations

Example created by Eyal Ben-Simon, BGU
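The query bullet above can be answered by enumerating every path through the graph and keeping the ones whose resulting tag allows clear storage. The sketch below is a simplified illustration: it represents nodes as plain tuples, treats every "set" node as terminal, and uses a hypothetical function name `answer_paths` and made-up questions.

```python
def answer_paths(node, path=()):
    """Yield (answers, tag) for every root-to-leaf path in the graph.

    A node is either ("set", tag_dict), which ends the interview here,
    or ("ask", question, {answer: next_node}).
    """
    if node[0] == "set":
        yield path, node[1]
    else:
        _, question, answers = node
        for answer, next_node in answers.items():
            yield from answer_paths(next_node, path + ((question, answer),))


# Hypothetical two-question graph.
GRAPH = ("ask", "Personal information?", {
    "no": ("set", {"storage": "clear"}),
    "yes": ("ask", "Medical data?", {
        "no": ("set", {"storage": "clear"}),
        "yes": ("set", {"storage": "encrypted"}),
    }),
})

# Every series of answers that ends in a tag allowing clear storage.
clear_storage = [answers for answers, tag in answer_paths(GRAPH)
                 if tag["storage"] == "clear"]
```

Exhaustive enumeration terminates because the graph is acyclic; for large graphs the number of paths can grow quickly, so a real tool would likely need something smarter.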

State of the Tags Tool

• Open-source project at GitHub
• Language getting more tools and features
  – Project with BGU students
• Language Tools in progress
  – Inspectors, Visualizers, CLI development tool
• Tutorials and reference: datatagginglibrary.readthedocs.org
• Collaboration via, e.g., GitHub

Future of the Tags Tool

• Update web interview application
  – Include upload and inspection features
• On-line collaboration environment
  – A la Google Docs?

THANKS

Mercè Crosas, Michael Bar-Sinai, Latanya Sweeney