SHARING SENSITIVE DATA WITH CONFIDENCE: THE DATATAGS SYSTEM
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University
Michael Bar-Sinai, PhD candidate in Computer Science at the Ben-Gurion University of the Negev, Israel; Fellow at the Institute for Quantitative Social Science at Harvard University
Latanya Sweeney, Professor of Government and Technology in Residence; Director of Data Privacy Lab, Harvard University
Data sharing: good for you and good for the world
• Researchers: Get credit for their data
• Publishers and Journals: Verify published work
• Federal funding agencies: Make public assets accessible
• Science: Validate, reuse and extend previous work
dataverse.org
• Open-source software developed at Harvard's IQSS since 2006
• Used to share, publish, cite and archive research data
• Installed in 12 sites worldwide
• Serving 100s of universities and organizations
Harvard Dataverse: dataverse.harvard.edu
• Started as a community repository for Social Science
• Now open to all research fields and all researchers
• More than 1300 dataverses
• More than 59,000 datasets
• More than 1,500,000 downloads
“User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”
“Submitter represents and warrants that the Content does not contain any information (i) which identifies, or which can be used in conjunction with other publicly available information to personally identify, any individual;”
“If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. It is our assumption that you have received any necessary informed consent authorizations that your organizations require prior to submitting your sequences.”
GenBank
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The Datatags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
A datatag is a set of security features and access requirements for file handling.
A datatags repository is one that stores and shares data files in accordance with standardized and ordered levels of security and access requirements.
A DataTags Repository must satisfy the following conditions:
1. Supports more than one datatag
2. Each file in the repository must have one and only one datatag
   a. additional requirements cannot weaken the file security
   b. and cannot require the same or more security than a more restrictive datatag
3. A recipient of a file from the repository must:
   a. satisfy the file's access requirements,
   b. produce sufficient credentials as requested,
   c. and agree to any terms of use required to acquire the file.
4. Provides technological guarantees for requirements 1, 2 and 3.
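
The repository conditions above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the DataTags or Dataverse implementation: the class names, the `may_retrieve` method and the credential strings are all hypothetical, conditions 3a and 3b are folded into one credential check, and sub-conditions 2a and 2b are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    """Hypothetical recipient: what they can present and what they agreed to."""
    credentials: frozenset      # e.g. {"password", "approval"}
    accepted_terms: frozenset   # terms of use the recipient has agreed to


class DataTagsRepository:
    """Toy repository that checks conditions 1 to 3 in code."""

    def __init__(self, requirements):
        # requirements: tag name -> (required credentials, required terms of use)
        if len(requirements) < 2:
            raise ValueError("condition 1: more than one datatag must be supported")
        self._requirements = dict(requirements)
        self._tag_of = {}  # file name -> its single datatag

    def deposit(self, file_name, tag):
        if tag not in self._requirements:
            raise ValueError(f"unsupported datatag: {tag}")
        if file_name in self._tag_of:
            raise ValueError("condition 2: a file has one and only one datatag")
        self._tag_of[file_name] = tag

    def may_retrieve(self, file_name, recipient):
        # Condition 3: credentials (3a, 3b) and agreed terms of use (3c).
        credentials, terms = self._requirements[self._tag_of[file_name]]
        return (credentials <= recipient.credentials
                and terms <= recipient.accepted_terms)


# Example with two assumed tags and made-up requirement strings.
repo = DataTagsRepository({
    "blue": (frozenset(), frozenset()),
    "yellow": (frozenset({"password", "approval"}), frozenset({"click DUA"})),
})
repo.deposit("survey.csv", "yellow")
guest = Recipient(frozenset(), frozenset())
member = Recipient(frozenset({"password", "approval"}), frozenset({"click DUA"}))
```

Here `repo.may_retrieve("survey.csv", guest)` is false and `repo.may_retrieve("survey.csv", member)` is true. Condition 4 asks for technological guarantees, which a real system would have to provide through its storage and transport layers rather than through a check like this.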
Datatags Levels
Tag Type   Description            Security Features                         Access Requirements
Blue       Public                 Clear storage, Clear transmission         Open
Green      Controlled public      Clear storage, Clear transmission         Email, OAuth verified registration
Yellow     Accountable            Clear storage, Encrypted transmit         Password, Registered, Approval, Click DUA
Orange     More accountable       Encrypted storage, Encrypted transmit     Password, Registered, Approval, Signed DUA
Red        Fully accountable      Encrypted storage, Encrypted transmit     Two-factor authentication, Approval, Signed DUA
Crimson    Maximally restricted   MultiEncrypt storage, Encrypted transmit  Two-factor authentication, Approval, Signed DUA
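
Because the levels are ordered, the table can be held as data and queried. The sketch below copies the six rows into an ordered enumeration. The names `Tag`, `Policy`, `LEVELS` and the two helper functions are assumptions made for illustration and do not come from the Tags tool.

```python
from enum import IntEnum
from typing import NamedTuple


class Tag(IntEnum):
    """Datatags from least to most restrictive, in table order."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


class Policy(NamedTuple):
    description: str
    storage: str
    transmission: str
    access: tuple


TWO_FACTOR = ("Two-factor authentication", "Approval", "Signed DUA")

LEVELS = {
    Tag.BLUE: Policy("Public", "clear", "clear", ("Open",)),
    Tag.GREEN: Policy("Controlled public", "clear", "clear",
                      ("Email, OAuth verified registration",)),
    Tag.YELLOW: Policy("Accountable", "clear", "encrypted",
                       ("Password", "Registered", "Approval", "Click DUA")),
    Tag.ORANGE: Policy("More accountable", "encrypted", "encrypted",
                       ("Password", "Registered", "Approval", "Signed DUA")),
    Tag.RED: Policy("Fully accountable", "encrypted", "encrypted", TWO_FACTOR),
    Tag.CRIMSON: Policy("Maximally restricted", "multi-encrypted", "encrypted",
                        TWO_FACTOR),
}


def tags_allowing_clear_storage():
    """Tags whose security features permit unencrypted storage."""
    return [tag for tag in Tag if LEVELS[tag].storage == "clear"]


def strictest(tags):
    """The most restrictive tag in a collection, using the table order."""
    return max(tags)
```

With this encoding, `tags_allowing_clear_storage()` returns Blue, Green and Yellow, and `strictest([Tag.GREEN, Tag.RED])` returns `Tag.RED`.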
DataTags vs Harvard Security Levels
Level 1: No sensitive data; open data
Level 1: De-identified data
Level 2: Confidential information by University standards; no material harm
Level 3: Confidential information that could cause material harm (non-level 4 FERPA)
Level 4: High risk confidential information (SSN)
Level 5: Information that would cause severe harm
DataTags Workflow with Dataverse
[Diagram labels: Data File Ingestion; Automatic Interview; Review Board Approval; Sensitive Dataset; Direct Access; Authorized, Signed DUA; Privacy Preserving Access]
http://datatags.org
http://privacytools.seas.harvard.edu
A Curator Model for Privacy-Preserving Analysis
Acknowledgement: Honaker, J. and Nissim, K., Data Privacy Tools Project
Differentially Private statistics (summaries, causal inference, regression, interactive queries)
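
The slide names differentially private statistics without giving a mechanism. As a hedged illustration only, the sketch below shows one standard technique, the Laplace mechanism applied to a counting query. It is not the Data Privacy Tools Project's code, and `private_count` and its parameters are hypothetical names.

```python
import random


def private_count(values, predicate, epsilon, rng=None):
    """Return an epsilon-differentially private count of matching values.

    A count changes by at most 1 when one record is added or removed,
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    rng = rng or random.Random()
    true_count = sum(1 for value in values if predicate(value))
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

A curator would answer a query such as `private_count(ages, lambda a: a >= 65, epsilon=0.5)` instead of releasing the raw records. Smaller values of `epsilon` add more noise and give stronger privacy.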
Credentials and Retrieval in Dataverse
• Data File not restricted; Guestbook – Email to access
• Data File restricted; Dataverse/InCommon account; Request access; Click DUA
• Data File restricted; Dataverse/InCommon account; Request access; Sign DUA
• Data File restricted; InCommon account; Request access; Two-Factor authentication; Sign DUA
Betty: Sole Researcher
• Received consent from participants
• Repository for sharing highly sensitive data (not necessarily Harvard Dataverse)
Betty: Global Research Repository (Same use case as Dataverse)
• Ingestion and Decision-making Knowledge: IRB determination or an interview system.
• Codification and Infrastructure: Blue, Green, Yellow, Orange, Red, Crimson.
• Credentials and Retrieval: Different files may additionally require specific terms of use based on legal or regulatory requirements or adopted best practices.
Adam: Large Medical Research Group
• Repository for sharing local data
• Repository for published data
• Repository for sharing with collaborators
Diane: Multinational Corporation
• Cloud contains data from all over the world, collected under a variety of terms, subject to different laws
• Repository that enforces requirements on employee access
Charles: Institutional Review Board
• Document committee decisions
• Recommend handling based on prior decisions
techscience.org
Khanna A. Facebook's Privacy Incident Response: a study of geolocation sharing on Facebook Messenger. Technology Science. 2015081101. August 11, 2015. http://techscience.org/a/2015081101
Published 2015-09-29
Sweeney L, Yoo J. De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data. Technology Science. 2015092901. September 29, 2015. http://techscience.org/a/2015092901
A Datatagging tool needs:
• Formal description of a Datatag
  – Capture the data handling policy of the tag
  – Capture the “stricter-than” ordering
• Interview creation tool
  – Support user-friendly interviews
  – Decide on the datatag based on the answers only
Formal Description of a Datatag
• Model data handling policies as a set of orthogonal aspects
  – Storage encryption, access requirements…
• Describe implementation options for each aspect; order implementations from lenient to strict
  – Clear < Encrypted < MultiEncrypt
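
One way to read this formal description is that a datatag is a tuple with one value per aspect, and “stricter-than” is the component-wise comparison of those values. The sketch below assumes that reading. The `Storage` ordering is taken from the slide, the `Transit` aspect follows the security features listed for the tag levels, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Storage(IntEnum):
    """Storage aspect, ordered from lenient to strict."""
    CLEAR = 0
    ENCRYPTED = 1
    MULTI_ENCRYPT = 2


class Transit(IntEnum):
    """Transmission aspect, ordered from lenient to strict."""
    CLEAR = 0
    ENCRYPTED = 1


@dataclass(frozen=True)
class DataTag:
    storage: Storage
    transit: Transit

    def at_least_as_strict_as(self, other):
        # Every aspect must be at least as strict as the other tag's aspect.
        return self.storage >= other.storage and self.transit >= other.transit

    def stricter_than(self, other):
        return self != other and self.at_least_as_strict_as(other)

    def combine(self, other):
        # The most lenient tag that satisfies both policies.
        return DataTag(max(self.storage, other.storage),
                       max(self.transit, other.transit))


a = DataTag(Storage.ENCRYPTED, Transit.CLEAR)
b = DataTag(Storage.CLEAR, Transit.ENCRYPTED)
both = a.combine(b)
```

Under this model the ordering is partial: neither `a` nor `b` is stricter than the other, while `both` (encrypted storage and encrypted transit) is stricter than each of them.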
Tags: TagsSpace file (.ts)
• Describe a tag space
• Convenience features: hierarchy, “slots” of different types, top-down design support, comments…
Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
Finding the Right Tag – Decision Graph
• Directed, Acyclic Graph
• Node Types:
  – Ask
  – Set
  – Convenience: Call, End, Reject, Todo
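
To make the node types concrete, here is a minimal interpreter for a decision graph built from Ask, Set, End and Reject nodes. It is a sketch under assumptions, not the Tags language: the Python classes, the `run_interview` function and the example questions are invented for illustration, Call and Todo nodes are omitted, and a Set node simply overwrites the aspect values it names.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    question: str
    answers: dict      # answer text -> next node


@dataclass
class Set:
    assignments: dict  # aspect name -> value
    next: object       # node that follows this one


@dataclass
class End:
    pass


@dataclass
class Reject:
    reason: str


def run_interview(node, answers):
    """Walk the graph from `node`, consuming one answer per Ask node."""
    tag = {}
    remaining = iter(answers)
    while True:
        if isinstance(node, Ask):
            node = node.answers[next(remaining)]
        elif isinstance(node, Set):
            tag.update(node.assignments)
            node = node.next
        elif isinstance(node, End):
            return tag
        elif isinstance(node, Reject):
            raise ValueError(node.reason)
        else:
            raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical two-question interview.
graph = Ask("Does the data contain personal information?", {
    "no": Set({"storage": "Clear"}, End()),
    "yes": Ask("Did the subjects consent to sharing?", {
        "yes": Set({"storage": "Encrypted"}, End()),
        "no": Reject("data cannot be shared without consent"),
    }),
})
```

Calling `run_interview(graph, ["yes", "yes"])` returns `{"storage": "Encrypted"}`. The tag depends only on the answers given, which matches the requirement that the tool decide on the datatag based on the answers only.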
Interview Visualization
Interview credit: The Data Privacy Lab @ Harvard (Latanya Sweeney, Sean Hooley), Berkman Ctr. for Internet and Society (Alexandra Wood, David O'Brien, clinical students), IQSS (Mercè Crosas, Michael Bar-Sinai). Part of the Privacy tools for sharing research data project
Decision Graph Points
• Analysis: Detection of independent parts
• Queries, such as “what series of answers will create a datatag that allows clear storage?”
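
Because the decision graph is acyclic, a query like the one above can be answered by enumerating every path from the first question to an end node. The sketch below assumes a simple dictionary encoding of the graph. The node identifiers, the questions and the `answer_paths` function are hypothetical and are not part of the Tags tool.

```python
# Each node is a tuple whose first element names its type.
GRAPH = {
    "q1": ("ask", "Does the data contain personal information?",
           {"no": "open", "yes": "q2"}),
    "open": ("set", {"storage": "Clear"}, "end"),
    "q2": ("ask", "Is it medical data?", {"no": "low", "yes": "high"}),
    "low": ("set", {"storage": "Clear"}, "end"),
    "high": ("set", {"storage": "Encrypted"}, "end"),
    "end": ("end",),
}


def answer_paths(graph, start, wanted):
    """Return every answer sequence whose final tag contains `wanted`."""
    results = []

    def walk(node_id, answers, tag):
        node = graph[node_id]
        kind = node[0]
        if kind == "ask":
            # Branch on every possible answer to this question.
            for answer, target in node[2].items():
                walk(target, answers + [answer], tag)
        elif kind == "set":
            # Record the assignments and continue to the next node.
            walk(node[2], answers, {**tag, **node[1]})
        elif kind == "end":
            if all(tag.get(key) == value for key, value in wanted.items()):
                results.append(answers)

    walk(start, [], {})
    return results


clear_paths = answer_paths(GRAPH, "q1", {"storage": "Clear"})
```

For this example graph, `clear_paths` is `[["no"], ["yes", "no"]]`: answering "no" to the first question, or "yes" and then "no", ends with clear storage.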
State of the Tags Tool
• Open-source project at GitHub
• Language getting more tools and features
  – Project with BGU students
• Language Tools in progress
  – Inspectors, Visualizers, CLI development tool
• Tutorials and reference: datatagginglibrary.readthedocs.org
• Collaboration via, e.g. GitHub
Future of the Tags Tool
• Update web interview application
  – Include upload and inspection features
• On-line collaboration environment
  – À la Google Docs?