

Email Archiving Systems Interoperability


The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Simpson, Joel. 2016. Email Archiving Systems Interoperability. Harvard Library Report.

Accessed: May 13, 2018 2:54:24 PM EDT

Citable Link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:28682572

Terms of Use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA


Harvard Library Report
July 2016

Prepared by Joel Simpson

Email Archiving Systems Interoperability


The Harvard Library Report Email Archiving Stewardship Tools Workshop is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/>

Prepared by Joel Simpson, Artefactual Systems, Inc.

Reviewed by Wendy Marcus Gogel, Harvard Library and Grainne Reilly, Library Technology Services, Harvard University

Citation: Simpson, Joel. 2016. Email Archiving Systems Interoperability. Harvard Library Report. http://nrs.harvard.edu/urn-3:HUL.InstRepos:28682572.


Table of Contents

Executive Summary

Background and Context

Project Objectives

Project Approach

Project Results

1. Assessment of the Email Tools Data Sharing Framework

2. Analysis Framework: Requirements for Interoperability

3. Analysis of Tools using the Requirements for Interoperability Framework

4. Key Findings: Analysis of Tools and Email Tools Data Sharing Framework

5. Opportunities to Improve the Interoperability of Email Tools

Acknowledgements


Executive Summary

Earlier this year, Harvard Library convened the Harvard EAST (Email Archiving Stewardship Tools) workshop to foster the expanding email archiving community, share best practices and identify directions for future work.

One of the main conclusions of the workshop was that there is no standard workflow that can be uniformly applied in every situation, but that all archives have similar functional needs for email archiving, and that given the need for flexibility, current processes could be improved by using the unique strengths of different tools together.

Harvard Library engaged Artefactual Systems Inc. to better understand how the tools can exchange data today and carry out analysis to identify opportunities for the community to further support comprehensive preservation workflows for email.

Community members have been invited to contribute to an Email Tools Data Sharing Framework. The intention is to provide a high level view of how email content or metadata can be input or output to each of the different tools, using a common framework to support comparison and analysis. This work is ongoing, but enough detail has been collected to enable analysis and identification of some clear opportunities for improving the interoperability of these tools.

A set of “requirements for interoperability” was identified to set out the different aspects or concerns involved in using multiple tools in an email archiving, processing or preservation workflow. Analysis was carried out to understand how each of the tools supports these different requirements. Key findings were then identified in each of these areas.

Finally, a set of 7 draft recommendations has been proposed for the wider community to consider. These are high level recommendations without detailed next steps or any suggestion for priority. We feel they are useful in decomposing this complex problem space into discrete and well-defined opportunities that will be easier to tackle in a fast changing environment.


Background and Context

Earlier this year, Harvard Library convened the Harvard EAST (Email Archiving Stewardship Tools) workshop to foster the expanding email archiving community, share best practices and identify directions for future work. The workshop involved stakeholders from different institutions, including subject matter experts, users and developers of several email archiving or preservation tools.

The workshop concluded that the community is very interested in working together to solve shared problems. Several directions for future work were identified, including “the need for an exchange standard that enables interoperable ways to extract, package and transfer data between tools”. This conclusion was based on the consensus that there is no one uniform workflow for email archiving, but that current processes could be improved if archives were able to harness the unique strengths of each tool selectively (using only the functionality needed in whatever order is needed).

Harvard Library Preservation Services engaged Artefactual Systems Inc. to carry out a short consulting project to build on these findings and identify opportunities for the community to further support comprehensive preservation workflows for email.

Project Objectives

The goals of this consulting project are to:

1. identify gaps or opportunities to improve the interoperability of the numerous email tools by showing the type, format and structure of data which can be input or output from each tool

2. inform email stewards about the options and considerations involved in defining email archiving workflows using multiple tools

This project has not attempted to provide a functional description or comparison of the various tools under consideration. A very brief overview of the tools, with links for further detailed information available from the providers, is provided below in section 3. A useful comparison of Email Archiving tools (including many not considered in this project) can be found at the Lifecycle Tools for Archival Email Chart: https://docs.google.com/spreadsheets/d/1V1N22xnr5e0EbDlZWx58bjYO6rkrMrYH9wGX9-CK8c4/edit#gid=986222267.

Project Approach

This project is producing two deliverables to meet the objectives defined above.

The first deliverable is an Email Tools Data Sharing Framework that sets out the content objects (i.e. email) and metadata that each email or preservation tool can input or output. Representatives from each tool provider were asked to complete the descriptions of these inputs and outputs using a generic framework (with associated glossary) to enable common understanding of terms and make comparison between tools easier.

A more detailed description and assessment of the framework is provided below in section 1.


The second deliverable of this project is this Consulting Report which

1. assesses the completion and usefulness of the Email Tools Data Sharing Framework

2. proposes a generic set of requirements for interoperability to use as an analysis framework

3. analyzes/summarizes how each tool satisfies those requirements for interoperability

4. sets out several recommendations for improving interoperability of the tools and further establishing best practices for the community

Please note that throughout this report when we refer to ‘digital objects’ we mean any type of digital objects, including emails themselves, related content like attachments, or any associated metadata. We use ‘data’ interchangeably with ‘digital objects’ simply because it is shorter. (We have not seen the need to distinguish these concepts with more precise definitions.)

Project Results

1. Assessment of the Email Tools Data Sharing Framework

1.1. About the Email Tools Data Sharing Framework

The email tools data sharing framework includes information on 6 different email or preservation tools. The intention is to provide a high level view of how email content or metadata can be input or output to each of the different tools.

The framework is set out in a spreadsheet, with one sheet to describe inputs and another to describe outputs. Each sheet is organized to first describe the actual (or "physical") data objects (or input/output mechanisms, as in some cases they are programmatic), followed by a description of the kinds of data or metadata found in those objects.

Separate rows distinguish between the level of obligation demanded to be able to use each tool:

● mandatory content or data (system will not accept or work properly without this)

● useful content or data (is optional, but enables functionality within the system - e.g. a sensitivity flag that can be used when filtering)

● additional content or data (can be consumed, but is not used in any way by consuming system -- e.g. attachments are included in MBOX, but the particular system may not allow users to do anything with them)

The goal is to describe in each of these columns:

● the type or extent of data provided (e.g. specific fields used as reference IDs, or a more general description such as 'preservation events')

● format of data (is a 'local' schema defined, or is a standard schema used, such as PREMIS)

● location/structure of data (where in the input/output is this information -- e.g. PREMIS events are recorded in METS.xml file; folder information stored in pathname in MBOX etc.)

In some cases this information needs to be broken down into different levels of granularity, for instance to indicate information stored at individual email level vs. collection level.
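To make the dimensions of the framework concrete, the following is a minimal sketch of how one spreadsheet entry could be represented as a record, with a simple check of whether one tool's outputs cover another tool's mandatory inputs. It is an illustration only: the class, field and function names are assumptions made for this example, the tools are hypothetical ("Tool A", "Tool B"), and matching on data type and format alone is a simplification that ignores location and granularity.

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    MANDATORY = "mandatory"    # system will not accept or work properly without it
    USEFUL = "useful"          # optional, but enables functionality in the system
    ADDITIONAL = "additional"  # can be consumed, but is not used by the system


@dataclass(frozen=True)
class DataItem:
    """One hypothetical framework entry (names are illustrative, not from the spreadsheet)."""
    tool: str
    direction: str          # "input" or "output"
    obligation: Obligation
    data_type: str          # type or extent of data, e.g. "preservation events"
    data_format: str        # local or standard schema, e.g. "PREMIS"
    location: str           # where in the input/output the data is found
    granularity: str        # e.g. "email", "account" or "collection"


def unmet_mandatory_inputs(outputs, inputs):
    """Return the mandatory inputs that no output matches on data type and format."""
    offered = {
        (item.data_type, item.data_format)
        for item in outputs
        if item.direction == "output"
    }
    return [
        item
        for item in inputs
        if item.direction == "input"
        and item.obligation is Obligation.MANDATORY
        and (item.data_type, item.data_format) not in offered
    ]
```

A check of this kind supports the decision-support use of the framework: an empty result suggests that, at this coarse level, the first tool produces everything the second tool requires.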


1.2. Assessment of the Email Tools Data Sharing Framework

At the time of this writing, completion of the spreadsheet is in progress. We invite comments or thoughts from all participants on:

● ability to complete the spreadsheet consistently (or key differences in interpretation)

● anything learned while filling it in

● whether it is complete enough, or needs further work; wish list additions/amendments (e.g. suggestions for adding more detail)

● initial views on value of the exercise

● intent to use the tool moving forward

Data gathering work is ongoing and will be refined as needed by the community to support their collaborative efforts to improve these tools and establish best practices for email archiving and preservation.

Initial feedback and observations from Artefactual:

● It is interesting to see this particular perspective from the different tools, and it enables interesting analysis of similarities and differences (which will be explored further in the rest of this report).

● The spreadsheet emphasizes two dimensions (data types in columns and systems in rows), but there are in fact numerous dimensions of interest (including granularity of grouping of data, levels of obligation, type of data vs. formats or standards employed, etc.). This makes fitting in all of the relevant information a challenge.

● Given the space, it does not seem possible to include enough detailed information for this to be a very hands on ‘how to’ tool -- but it may well be a useful analytic or decision support tool, to determine if there is enough compatibility between a particular selection of tools for a desired workflow.

2. Analysis Framework: Requirements for Interoperability

The data sharing framework is primarily focused on the inputs and outputs of each of the tools under consideration. Given the broader intent to enable email stewards to determine whether and how they might craft workflows using multiple tools, this report proposes a set of generic ‘requirements for interoperability’. This provides a more holistic view of the different aspects of using multiple tools that operate together to enable a comprehensive workflow for email processing or preservation.

These requirements are more an analytical framework than a concrete set of requirements. They are focused on the level of business processes and workflows, and do not represent a particular effort to elicit requirements from end users.

The requirements and their rationale are described below. In the following section, each of the 6 tools is assessed against each requirement. This allows us to compare similarities and differences in specific areas of concern and use this as the basis for recommendations for future work later in the report.


2.1. Support for data transmission

The most basic requirement for a workflow that uses multiple tools working on a common set of data is to enable those tools to access that data.

This functionality can be provided in many forms: user interfaces for selection of data for ingest from a particular location; automated jobs that ingest data; direct system to system connectivity; or published APIs. The goal here is to simply articulate how each system supports this, rather than to judge one method over another. This will allow us to see which tools can share data (and how), at a physical level, with other tools.

2.2. Support for standard data formats

Once we have determined a particular tool can access a set of data physically, we need to ensure it can interpret and process that data. At a minimum, the data format must be ‘standard’ between the tools being considered.

It is well established in the preservation community that open, non-proprietary and widely used standards are preferable for preservation formats. While not all data to be exchanged needs to be (or even can be) in a preservation format, the same principles will improve the odds that any particular tool will be interoperable with others.

Support for standard data formats applies to email content, metadata and the packaging of both email and metadata.

2.3. Support for appropriate scope of exchangeable data

Email content and metadata can exist or be grouped at various levels of granularity. Different processing tools may accept data with an entirely arbitrary definition of scope (using a generic term such as a ‘transfer’ or ‘packet’), or they may require data or metadata to conform to a specific definition (such as clearly grouping data by ‘account’).

Scope of data also refers to the type and extent of data in any particular data set. For example, Archivematica has functionality to verify hashes/checksums; if checksums have been created in another tool (e.g. BitCurator), then ideally Archivematica should allow checksums to be imported so that verification can occur on those checksums, not just on checksums created by Archivematica. This concept is clearly tied closely with the level of granularity - a checksum may be made for a folder or collection of emails, or it may be created at the individual email level.
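The idea of verifying checksums that were created by another tool can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of how Archivematica or BitCurator work: it assumes the upstream tool supplied a plain-text manifest with one "<SHA-256 hex digest> <relative path>" entry per line, and it works at file level only (one digest per file, not per email). The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse '<hex digest> <relative path>' lines into {path: digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, rel_path = line.split(maxsplit=1)
        expected[rel_path.strip()] = digest.lower()
    return expected


def verify(root, manifest_text):
    """Return {relative path: 'ok' | 'mismatch' | 'missing'} for each manifest entry."""
    results = {}
    for rel_path, digest in parse_manifest(manifest_text).items():
        target = Path(root) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) == digest:
            results[rel_path] = "ok"
        else:
            results[rel_path] = "mismatch"
    return results
```

Because the expected digests come from the manifest rather than from the verifying tool, a mismatch points to a change that happened somewhere between the two tools, which is the case the paragraph above is concerned with.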

Email stewards will need to understand what scope of data is required or possible using any particular tool. Similarly any decision to use a particular data standard needs to consider the scope of data that format allows for or requires.


2.4. Ability to track processing history and provenance

The ability to establish and maintain the provenance (including processing history) of content is a well understood requirement in the archival and preservation communities. While this may not be a requirement for everyone looking to process emails, it is a fundamental requirement for the core user groups of many of the 6 tools we are evaluating.

Email stewards who do need to record and capture provenance will generally need a mechanism to do this whenever they are processing, creating or changing data. This means that either the tools they use for processing need to capture processing history directly, or they need some ability to track processing history manually and store it appropriately.
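Where a tool does not capture processing history itself, a manually maintained event log is one possible fallback. The sketch below appends one record per processing step to a JSON Lines file. The key names are borrowed from PREMIS event semantic units purely as labels; this is not a PREMIS serialization, and the file layout, function names and example identifiers are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def record_event(log_path, object_id, event_type, agent, outcome, detail=""):
    """Append one processing event to a JSON Lines file and return the record."""
    event = {
        "eventIdentifier": str(uuid.uuid4()),
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": detail,
        "eventOutcome": outcome,
        "linkingAgentIdentifier": agent,      # the tool or person that acted
        "linkingObjectIdentifier": object_id,  # the data set that was acted on
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event


def history_for(log_path, object_id):
    """Return all recorded events for one object, in the order they were written."""
    with open(log_path, encoding="utf-8") as log:
        events = [json.loads(line) for line in log if line.strip()]
    return [e for e in events if e["linkingObjectIdentifier"] == object_id]
```

An append-only log of this kind can travel with the data set between tools, although consolidating it with history recorded inside each tool would still be a separate, manual step.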

2.5. Support for maintaining the identity and integrity of data

As data is moved, migrated or processed by different tools, email stewards need to be able to ensure that the identity and integrity of the data they are processing is not compromised.

Maintaining the identity of the data set depends in large part upon using identifiers to link it to its descriptive and administrative metadata, and ensuring that this link cannot be broken. Most tools generate unique identifiers, but these are usually local (assigned, stored and maintained within the tool itself). External identifiers may be supported, either informally (e.g. by recording an accession number as part of a directory structure or filename) or more formally (as in having a field with a declared data type that aligns to the identifier used by another system). Some systems also support identifiers that refer explicitly to external resources or authorities (a concept underpinning linked data).

Maintaining the integrity of digital objects is often achieved using hashes or checksums, with regular verification, to ensure that the content of the ingested data has not been altered over time. The hashes or checksums can be assigned to both the original ingested content and to any normalized or otherwise modified versions that may be generated from that content. Hashes or checksums may also be assigned to associated metadata.

Another common practice to safeguard the integrity of data is to package content and metadata ‘together’ for transfer, reducing the risk of corruption or loss (i.e. links between the two breaking at some point).
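Packaging content and metadata together can be illustrated with a small sketch loosely modelled on the BagIt layout (a `data/` payload directory, a `manifest-sha256.txt` file and a `bagit.txt` declaration). It is a simplified illustration and not a complete or validated BagIt implementation: placing the metadata in a `metadata.json` file inside the payload, and the function name itself, are assumptions made for this example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def build_package(content_files, metadata, package_dir):
    """Copy content and its metadata into one directory and write a checksum manifest."""
    package = Path(package_dir)
    payload = package / "data"
    payload.mkdir(parents=True)  # fails if the package already exists

    for source in content_files:
        shutil.copy2(source, payload / Path(source).name)

    # Metadata is stored inside the payload so the manifest covers it as well.
    (payload / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )

    # One "<digest>  <path>" line per payload file, in a stable order.
    lines = []
    for item in sorted(payload.iterdir()):
        digest = hashlib.sha256(item.read_bytes()).hexdigest()
        lines.append(f"{digest}  data/{item.name}")
    (package / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )

    (package / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    return package
```

Because the manifest lists both the email content and the metadata file, a receiving tool can detect if either has been altered or lost in transit, which addresses the risk of the link between the two breaking.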

2.6. System access and documentation to support interoperability

A basic requirement is the ability to access and use the software, both technically and with appropriate permissions or licensing.

All of the capabilities mentioned above are less useful in practice if knowledge to use them is not captured well. Technical and user documentation, training materials and training resources (i.e. trainers for hire) all add to the ability to use the tool as part of an integrated workflow. The starting minimum is documentation on how to use the tool at all. Ideally a knowledge base would address the exchange of data, interoperability with other systems and any license requirements.

3. Analysis of Tools using the Requirements for Interoperability Framework

3.1. Archivematica

Archivematica is an integrated suite of open-source software tools that allows users to process digital objects from ingest to access and to implement preservation plans. Users monitor and control ingest and preservation micro-services via a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core, the Library of Congress BagIt specification and other recognized standards to generate Archival Information Packages (AIPs) for storage in external repositories.

Requirement: Support for data transmission
Supporting functionality: Digital objects need to reside in a locally accessible file system for ingest. Archivematica is provided with an accompanying application called Storage Services that can be used to configure access to sources of data for ingest. There is an API to assign accession numbers, but no direct support for moving data across hardware, networks etc.
Observations: There are numerous external tools available for moving data.

Requirement: Support for standard formats
Supporting functionality: Any digital object can be ingested, so any email format can be processed with core functionality. Email input in MBOX format can be processed using additional functionality (extracting attachments and metadata). Email input in maildir can be normalized and output as MBOX. The BagIt file packaging standard is supported for input and output. Metadata input in csv or json formats can be processed. Additional metadata (in other formats) can be included but not processed. Metadata outputs are well supported by widely adopted standards (METS, Dublin Core, PREMIS, BagIt).
Observations: No support to normalize to EML format (widely used email format).

Requirement: Support for appropriate scope of data
Supporting functionality: Transfer, Submission, Archival and Dissemination packages can be structured and described using any definition the user chooses. For example, an email account or accounts can be ingested as one or more SIPs, and multiple SIPs can be combined into one or more AIPs. Some key metadata, such as rights metadata, can only be input or assigned during processing at the package level.
Observations: Provides complete flexibility but no native support for common email groupings (e.g. account, folder etc.). Rights metadata can’t be assigned to individual emails, so users would have to manually structure inputs and outputs to reflect different rights (e.g. create one AIP or DIP for restricted emails, and one for non-restricted emails).

Requirement: Ability to track processing history and provenance
Supporting functionality: Provides extensive functionality to track processing history and record it using PREMIS. Processing history from external sources could “travel with” any data sets, but there is currently no ability to merge or consolidate processing history from multiple systems.
Observations: Email stewards could create manual processes to maintain multiple processing history files.

Requirement: Support for maintaining the identity and integrity of data
Supporting functionality: Archivematica assigns UUIDs to all ingested objects and uses the UUIDs and ID attributes in the METS files to maintain links between digital objects and their metadata. Archivematica also supports a wide range of external metadata, so there are several ways external identifiers (i.e. from other tools) can be maintained. However there is no direct support for typed/declared external identifiers (e.g. automatically adding identifiers when importing from an external system). Fixity verification is supported using both internally or externally created hashes.
Observations: Email stewards could create manual processes for aligning and maintaining referential integrity across systems (but may need to plan this - e.g. aligning package structure to external identification systems).

Requirement: System Access and Documentation
Supporting functionality: Documentation available, community support website/groups, as well as for hire services for consultancy, training etc. Source code and technical info available on GitHub. Documentation can be quite technical.

3.2. ArchivesSpace

ArchivesSpace is an open source, web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and born-digital content; management of authorities (agents and subjects) and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. The application also functions as a metadata authoring tool, enabling the generation of EAD, MARCXML, MODS, Dublin Core, and METS formatted data.

(summary taken from: https://archivesspace.atlassian.net/wiki/display/ADC/ArchivesSpace)

ArchivesSpace is not a digital asset or document management system and cannot manage digital files or digitization workflows. The digital objects module can be used to describe digital objects and link to digital files stored elsewhere. The metadata created can be exported to other systems as MODS, METS, or Dublin Core or made publicly accessible through the built-in public interface, though the viewers in the public interface are more limited in their functionality than those of a digital asset management system or digital repository.

(detail on digital objects taken from FAQ: http://www.archivesspace.org/faq)

Requirement: Support for data transmission
Supporting functionality: ArchivesSpace does not provide a means of moving or storing email content. Metadata can be exchanged as files or through a set of APIs.

Requirement: Support for standard formats
Supporting functionality: ArchivesSpace supports a range of well established standards for describing archival records - EAD, MARCXML, MODS, Dublin Core, and METS formatted data. ArchivesSpace does not support functionality or processing of email content (i.e. normalisation, search or identification of authorities etc.)

Requirement: Support for appropriate scope of data
Supporting functionality: ArchivesSpace provides functionality for describing the arrangement and relationships of digital objects. It does not support email specific concepts directly (e.g. the notion of an email account).
Observations: It could be useful to establish conventions or best practices for describing email accounts and their potential relationships to collections, agents etc.

Requirement: Ability to track processing history and provenance

Requirement: Support for maintaining the identity and integrity of data
Supporting functionality: Support for identifiers and integrity internally within a repository. The system supports structured capture of agents and subjects which will improve consistency and accuracy of description.

Requirement: System Access and Documentation
Supporting functionality: ArchivesSpace is an open source project with considerable documentation available. It is supported by the Lyrasis organisation with full time staff who are developers and subject matter experts.

3.3. BitCurator

The BitCurator Environment is built on a stack of free and open source digital forensics tools and associated software libraries, modified and packaged for increased accessibility and functionality for collecting institutions. The BitCurator software is freely distributed under an open source license. It can be installed as a Linux environment; run as a virtual machine on top of most contemporary operating systems; or run as individual software tools, packages, support scripts, and documentation.

Key features of BitCurator include:

● Pre-imaging data triage

● Forensic disk imaging

● File system analysis and reporting

● Identification of private and individually identifying information

● Export of technical and other metadata

(summary taken from: http://www.bitcurator.net/bitcurator/)

Requirement: Support for data transmission
Supporting functionality: BitCurator does provide support for migrating data without altering it in any way, starting with the concept of creating forensic images before further transmitting or processing data. Uniquely among the tools considered here, BitCurator provides software write-blocking functionality to ensure the integrity of source objects.
Observations: As this is an area not well supported by other tools, it could use some elaboration/detail.

Requirement: Support for standard formats
Supporting functionality: Supports DFXML (Digital Forensics XML) that enables the exchange of structured forensic information. BitCurator generates PREMIS metadata when the user runs several of its core data forensics tools, providing a record of key processing events. Provides some processing support for email - e.g. using readpst to convert PST email objects into MBOX. Also supports BagIt format for output.

Requirement: Support for appropriate scope of data
Supporting functionality: The BitCurator environment includes numerous applications to be used for different purposes, to be run against individual items or collections of items. One of the most commonly used tools is bulk_extractor, which can be used to identify potentially sensitive information on disks, disk images or directories. Other core tools, including fiwalk and other specialized reporting tools, are designed to be run against entire disk images. When run against a disk or disk image, bulk_extractor reports on the location of patterns based on a byte offset into the disk. Other reporting tools, including fiwalk, generate metadata based on the file system (files and folders). In the case of email, the files would likely be in formats such as .pst or mbox. Those wishing to generate metadata associated with specific messages within those container files could use readpst and pipe its output to other command-line tools. BitCurator is primarily concerned with identification and description of digital objects rather than arrangement.

Requirement: Ability to track processing history and provenance
Supporting functionality: BitCurator generates PREMIS metadata when the user runs several of its core data forensics tools, providing a record of key processing events.
Observations: Email stewards could create manual processes to maintain multiple processing history files.

Requirement: Support for maintaining the identity and integrity of data
Supporting functionality: BitCurator provides support for indexing, characterizing and uniquely identifying all content on a disk or disk image. BitCurator supports creation and validation of hashes/checksums.

Requirement: System Access and Documentation
Supporting functionality: BitCurator is an open source project with considerable documentation available.

3.4. DArcMail

DArcMail (for Digital Archive Mail System) was created by the Smithsonian Institution Archives. DArcMail provides normalization, item level and bulk processing, intellectual arrangement, search capability, packaging and access functionality for email.

Requirement: Support for data transmission
Supporting functionality: Digital objects need to reside in an accessible file system for ingest.

Requirement: Support for standard formats
Supporting functionality: Email input requires MBOX as the original format or as an interim normalization format. Email input in MBOX format can be processed with all core functionality including exporting preserved emails, email collections or email accounts in the EMail Account XML (EMA) format. EMA is a comprehensive XML schema designed for RFC 5322 compliant preservation purposes applied to the full range of email objects, i.e., single message to whole email account. All elements of the original email are retained in the preservation EMA XML output. User-defined subsets of email messages can be created and exported in MBOX or EMA XML formats.
Observations: No support to normalize to EML. The EMA XML schema is not widely adopted. It is fully implemented in two other email archiving tools, and in limited fashion in a couple of other applications.

Requirement: Support for appropriate scope of data
Supporting functionality: DArcMail allows users to interact with emails on an individual, group or account basis. It supports complex searching, filtering and message thread tracking. Attachments can be searched, viewed and separated from email.

Requirement: Ability to track processing history and provenance
Supporting functionality: The DArcMail tool is designed to be used for initial appraisal and then for preservation (AIP) and access (DIP). It natively retains the logical arrangement of the original account in both the AIP and DIP packages. Its flexibility allows for creation of custom subsets of email for creation of specialized AIPs and DIPs.
Observations: Transfer and accessioning of email digital objects occur outside of the DArcMail workflow. Non-technical metadata such as rights metadata must be captured and maintained in a separate system or manually.

Requirement: Support for maintaining the identity and integrity of data
Supporting functionality: DArcMail maintains all UIDs present in the original emails. It generates SHA-1 checksums for each message and for email accounts as a whole which are embedded in the EMA preservation format. DArcMail also produces external metadata including the checksum for each message preserved.
Observations: The internal message and account checksums are retained even if the preserved email account is moved from one repository to another.

Requirement: System Access and Documentation
Supporting functionality: DArcMail is not currently available outside of the Smithsonian. Limited documentation is publicly available. The Smithsonian intends to release it as open source when time/effort allows.
Observations: Making the tool publicly available is a precondition for any other community users.
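Message-level checksums of the kind described for DArcMail above can be illustrated with Python's standard `mailbox` and `hashlib` modules. This is a sketch under stated assumptions and does not reproduce DArcMail's implementation: the report does not say exactly which bytes DArcMail hashes, so the example simply hashes each message as returned by the `mailbox` module (without the MBOX "From " separator line), and the function name is hypothetical.

```python
import hashlib
import mailbox


def message_checksums(mbox_path):
    """Return a list of (Message-ID, SHA-1 hex digest) pairs, one per message."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        results = []
        for key in box.iterkeys():
            # Raw message bytes, excluding the "From " separator line.
            raw = box.get_bytes(key)
            # Message-ID header as stored in the email, or "" if absent.
            message_id = box.get_message(key).get("Message-ID", "")
            results.append((message_id, hashlib.sha1(raw).hexdigest()))
        return results
    finally:
        box.close()
```

Pairing each digest with the identifier already present in the email keeps identity and integrity information together at the individual email level, the finest level of granularity discussed in section 2.3.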

3.5. Electronic Archiving System (EAS)

Harvard developed the EAS tool to enable archival processing of email messages and attachments and automate the process of making deposits to Harvard's preservation repository. Key features include:

● Normalization to EML -- an open standard for preservation (an extension of IMF RFC 5322) -- for long term preservation.

● Summary views of the metadata associated with email or attachments within a result set.

● Batch and item level processing options for archivists.

● Long term preservation of email and attachments in a secure environment approved for sensitive data is supported by automated packaging and transfer to the preservation repository, the Digital Repository Service (DRS).

● Capture of essential rights management information using PREMIS.


● Capture of significant events tracking to document deletions of email and attachments and format transformations such as the conversion of the native mail format to EML.

(feature list taken from: http://hul.harvard.edu/ois/systems/eas/)

Requirement    Supporting Functionality    Observations

Support for data transmission

Data need to be moved to a ‘dropbox’ (directory space in Harvard systems). EAS documentation describes how to use a secure FTP client to move the data but this is not part of the EAS solution.

There are numerous external tools available for moving data.

Support for standard formats

Email content can be input in MBOX or PST format (which covers the majority of email client standards for output of email). Attachment objects of any type (e.g. .ppt, .doc) can be embedded in the emails or provided separately. It is not possible to input metadata (beyond that contained directly in MBOX/PST or attachment formats). Email is output to EML format, with attachments extracted. Overall metadata is captured and output using well established standard formats (e.g. METS and MODS) and both rights and processing history are captured in PREMIS. Some reference metadata is in local format defined by Harvard (for packets, collections etc.), as is metadata relating to security (access) and sensitivity (using locally defined ‘flags’).

Email content formats are well supported. While EML format for output is a well established standard it is not accepted by all other tools for input. Security and sensitivity metadata could potentially be captured using a more widely used standard. Referencing metadata is geared towards Harvard integration with the DRS system. There may not be any need to standardize this, but support for external IDs would enable better interoperability with other tools.

Support for appropriate scope of data

Submission packets can be structured and described using any definition the user chooses. It is not possible to input additional metadata or content beyond email/attachments. Processing work can be completed at individual item level (email or attachment) or at various levels of grouping (folder, collection etc.). Additional groupings can be added (collections or series). Outputs will always contain the same packet structure as the associated input. Output contains normalized/processed content; it does NOT contain original input files (i.e. in MBOX or PST format).

Provides support for grouping (in collections etc.). Inability to input additional metadata or content suggests this tool may work best at the ‘start’ of a workflow. Stewards will need to think through manual processes for managing metadata created using other tools.

Ability to track processing history and provenance

Provides functionality to track processing history and record it using PREMIS. No ability to merge processing history with that from other tools.

Email stewards could create manual processes to maintain multiple processing history files.


Support for maintaining the identity and integrity of data

Identifiers are internal (e.g. EAS message ID) or local to Harvard (e.g. DRS codes are for Harvard repository). Integration with ‘Wordshack’ application ensures some descriptive or identification information is based on controlled vocabularies used in Harvard (i.e. also integrated with Harvard DRS repository). This improves consistency in use of admin categories and topics, and improves identification quality for persons or organisations.

Support for external referencing systems would better enable multi-tool workflows. Use of controlled vocabularies is currently limited to Harvard. There could be several approaches to extend this, e.g. publishing those vocabularies as open data, or enabling use/integration of other (e.g. linked open data) vocabularies as alternatives.

System Access and Documentation

User documentation and support are available for Harvard users. System is not currently available beyond Harvard users.

A project has been proposed to release the system as an open source project, but some technical work is required to make it ready for more generic use.

3.6. ePADD

ePADD is a software package developed by Stanford University's Special Collections and University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives. The user guide (https://docs.google.com/document/d/1joUmI8yZEOnFzuWaVN1A5gAEA8UawC-UnKycdcuG5Xc/edit#) provides the following description of the major modules in the system:

Appraisal: Allows donors, dealers, and curators to easily gather and review email archives prior to transferring those files to an archival repository.

Processing: Provides archivists with the means to arrange and describe email archives.

Discovery: Provides the tools for repositories to remotely share a redacted view of email archives with users through a web server discovery environment.

Delivery: Enables archival repositories to provide moderated full-text access to unrestricted email archives within a reading room environment.

Requirement    Supporting Functionality    Observations

Support for data transmission

The appraisal module will accept email files directly (from a local file system) and also has the ability to connect directly to email servers to download email using IMAP. Other modules rely on outputs (files/directories) from other ePADD modules (i.e. appraisal output is input to processing module, processing module output is input to discovery module etc.).

There are numerous external tools available for moving data. The ability to connect directly to email server is unique and simple if only transporting email content (i.e. no additional content/metadata).

Support for standard formats

Email content can be input in MBOX or by directly connecting to email server (therefore excellent support if only interested in ingesting email content). It is not possible to input other content (attachments) or metadata (beyond that contained directly in MBOX format). Email is output to MBOX format. Attachments are NOT extracted separately. Metadata that links correspondents, people, organisations or locations to external authorities (e.g. LC Subject Headings) can be output with URIs that represent the entity by the external authority.

While the format for wrapping metadata appears to be non-standard, the process for assigning the metadata for many descriptive elements (correspondent, location etc.) uses external authorities (linked data) which are well established standards for those specific vocabularies.

Support for appropriate scope of data

ePADD ingests material structured around a particular person who may have more than one email account. It does not appear to offer the wider flexibility of allowing users to enter their own arbitrarily defined ‘packets’. It is not possible to input additional metadata or content beyond email/attachments. Processing work can be completed at individual item level (email or attachment) or at various levels of grouping (folder, collection etc.). Additional groupings, such as collections or series, can be added. Scope of outputs can vary as users can select individual emails to include or exclude. Only descriptive metadata can be output (but nothing for rights, sensitivity, processing history etc.). ePADD allows for the re-use or sharing of lexicon files for entity analysis. Lexicon files enable full text searching on a range of different terms, enabling stewards to conduct complex tiered searches.

Metadata can’t be input with email content. Metadata can’t be output explicitly, but is used in processing so stewards could define workflows that enable them to align to these manually. For example, the cart functionality can be used to select only emails with a certain rights value for output; then repeat for other values, creating an MBOX output file for each metadata value.

Ability to track processing history and provenance

Not available currently.

As noted above, there could be some scope for manually outputting data that is grouped around a particular processing ‘event’, but there is no direct support for maintaining, much less merging, processing history.


Support for maintaining the identity and integrity of data

Identifiers are internal (e.g. ePADD message ID). Integration with external authorities such as LC Subject Headings (FAST) ensures consistency and improves accuracy in applying descriptive metadata.

Support for external referencing systems would better enable multi-tool workflows. Linked open data approach for descriptive metadata is unique to ePADD but could be helpful if adopted by other tools.

System Access and Documentation

User documentation available; technical documentation and code available on GitHub.

4. Key Findings: Analysis of Tools and Email Tools Data Sharing Framework

This section sets out analysis and findings for each of the ‘requirements for interoperability’ based on our understanding of the capabilities available across all of the tools today. With the exception of some specific integrations (e.g. Archivematica and ArchivesSpace), these tools were not designed to interoperate with each other, and so there are naturally a number of challenges or risks in trying to do that as the tools stand today.

4.1. Current state of data transmission

● Data transmission is, in general, considered out of scope by these tools.

● There is a risk to the chain of custody inherent in any attempt to chain tools together. The primary risk is to metadata that is part of the digital object itself (e.g. created on, created by, modified on, modified by etc.) which can easily be changed or lost as part of ‘moving’ data from one file system to another.

● Many of these tools attempt to minimize this risk internally, e.g., Archivematica, BitCurator, DArcMail and EAS all bundle several tools internally and manage data transmission between processing steps.
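
The risk described above, that file-level metadata and content can change when data is moved, can be checked for explicitly. The Python sketch below copies a file, then reports whether the content hash and the modification time survived the move. The function names `sha256_of` and `transfer_with_checks` and the shape of the returned report are hypothetical and chosen for this example; none of the tools reviewed here is known to work this way.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large mailboxes need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checks(source, destination):
    """Copy a file and report what survived the move (hypothetical helper)."""
    before = os.stat(source)
    expected = sha256_of(source)
    # copy2 attempts to preserve timestamps and permission bits; it cannot
    # preserve everything (on POSIX, owner and group are not carried over).
    copied_path = shutil.copy2(source, destination)
    after = os.stat(copied_path)
    return {
        "sha256": expected,
        "content_intact": sha256_of(copied_path) == expected,
        "mtime_preserved": int(before.st_mtime) == int(after.st_mtime),
    }
```

A report of this kind could be stored alongside the transferred files so that a later tool in the workflow has some evidence of what the original timestamps and hashes were.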

4.2. Use of standard formats

● Email content for most systems is based on well-established formats, particularly MBOX and EML. So far all systems can input MBOX.

○ EAS outputs only EML and not all tools support this as an input.

● Some systems support only very limited email-specific processing (e.g. Archivematica) and some do not at all (ArchivesSpace), but as these systems are designed to take in virtually any digital objects this is not a barrier for their more generic processing capabilities.

● Identification or referencing metadata is often expected in a ‘format’ that is nonstandard in several cases. Message IDs, repository ID, collection ID are often tied to specific external systems (EAS with DRS, DArcMail with CMS).

● PREMIS is the standard used to capture provenance or processing history metadata and rights metadata (for those systems that record this metadata).


● The Library of Congress BagIt standard is a file packaging format used by at least two of the tools (Archivematica and BitCurator).

4.3. Scope of email data or metadata exchange

● There are no significant barriers to exchanging any particular scope of email content, with the exception that some systems (e.g. ePADD) assume that email is dealt with or managed on an account basis, where an account is the email associated with only one individual. In other words, the user could not input all emails for an entire organisation and process them together at once (while maintaining all individual account level metadata).

● Several tools have limitations on the scope of metadata that can be input or accepted:

○ EAS, ePADD and DArcMail do not accept any metadata as an input.

● Several tools have limitations on the scope of metadata that can be output:

○ ePADD does not allow for many types of metadata to be output.

4.4. Capabilities for recording provenance and/or processing history

● If maintaining a full processing history is necessary, then it may not be feasible to use systems that don’t support this (ePADD, DArcMail).

4.5. Capabilities for maintaining identity and integrity of data

Use of unique identifiers:

● Most tools generate unique identifiers for data at various levels of granularity (some for individual email, virtually all for aggregations of some type such as folder, account, collection etc.).

● Most tools do not accept or store ‘external’ identifiers (i.e. unique IDs created by other systems). This may present challenges when using multiple tools because there are limited ways of ensuring that a particular data item or group of data is correctly identified (for instance, if looking at a particular email in one tool, is there a way of confidently finding and processing the same exact email in another tool).

● Some tools do provide some means of capturing external identifiers (e.g. in Archivematica by providing IDs within a metadata csv file at the point of transfer). However none of the tools appear to support this at the level of individual emails.

Definition of key elements and aggregations:

● Many of the tools allow users to define the elements or aggregations that suit them best. This flexibility is a strength but could lead to some confusion if elements or aggregations are not defined consistently between systems.

● The definition of an Email Account is probably the most significant concern as it appears to be defined differently in different systems. An email account in one tool may appear to be the same email account when viewed or processed in another tool, but the risk is that it isn’t because the definitions are not consistent. There is also the risk that the data models are not compatible, for instance if one system only allows one email address per account where another allows multiple addresses.

4.6. System access and documentation

● All of the open source systems have publicly available documentation or knowledge resources; however, access to developers or subject matter experts may not be publicly available.

● Neither EAS nor DArcMail is currently available beyond its home institution. Both project teams intend to release them with open source licenses, but work is required to do this and make the software available to the community.

5. Opportunities to Improve the Interoperability of Email Tools

Several draft recommendations are suggested below for discussion. At this stage no effort has been made to prioritize these or set out concrete next steps. We have kept the scope of these to areas that we feel address the interoperability of the specific tools assessed in this report.

We have not made any specific recommendations regarding the challenges of transmitting data between systems. While there are some clear risks, as described in the first part of section 4.1 (such as chain of custody and file integrity), we feel that a) these are very broad and apply to all forms of preservation using multiple tools and b) the extent of the problem is not well defined or agreed on; for example, some institutions may not see any problems with data transmission protocols that happen before formal accession. While we feel this area warrants further consideration, that may be outside the scope of concern for this report.

5.1. Enhance tools to support external reference identifiers

At the very least, tools need to be able to accept and maintain external identifiers so that email stewards can keep track (at multiple levels of granularity) of what data is being processed throughout a workflow.

In general, email stewards should be able to use the identifiers for individual items, folders or other groupings from one system when exporting data and carrying out further processing in another system.

Ideally external identifiers would also be captured when capturing processing history so that it is possible to clearly track the chain of custody (for example by associating the identifier with the PREMIS agent involved in processing).
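
One way to picture what "accepting and maintaining external identifiers" could mean in practice is a small crosswalk that records, for each internal item at a given level of granularity, the identifiers other systems have assigned to it. The Python sketch below is purely illustrative: the class name `IdentifierCrosswalk`, its methods, and the level and system labels are hypothetical and are not taken from any of the tools assessed in this report.

```python
from collections import defaultdict


class IdentifierCrosswalk:
    """Hypothetical lookup table linking internal IDs to external ones."""

    def __init__(self):
        # (level, internal_id) -> {system_name: external_id}
        self._external = defaultdict(dict)
        # (system_name, external_id) -> (level, internal_id)
        self._reverse = {}

    def link(self, level, internal_id, system, external_id):
        """Record that `system` knows this item as `external_id`."""
        owner = self._reverse.get((system, external_id))
        if owner is not None and owner != (level, internal_id):
            # An external ID must not point at two different internal items.
            raise ValueError(f"{system}:{external_id} already linked to {owner}")
        self._external[(level, internal_id)][system] = external_id
        self._reverse[(system, external_id)] = (level, internal_id)

    def external_ids(self, level, internal_id):
        """Return a copy of all external IDs known for one internal item."""
        return dict(self._external.get((level, internal_id), {}))

    def resolve(self, system, external_id):
        """Find the internal item for an external ID, or None if unknown."""
        return self._reverse.get((system, external_id))
```

For example, after `crosswalk.link("message", "local-0001", "repository", "ext-42")` (both identifiers are made-up placeholders), a steward could call `crosswalk.resolve("repository", "ext-42")` to find the same message again when it comes back from another tool.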


5.2. Adopt standard approaches to capturing and respecting rights and sensitivity metadata

Given that email collections often contain content with a variety of different rights, and that there is a wide spectrum of privacy and confidentiality issues that can be involved, email archiving tools should support standard ways of capturing rights or sensitivity metadata.

Many systems already use standards for rights (for instance using PREMIS rights entities); however, there doesn’t appear to be an equivalent approach for recording sensitivity or privacy information.

5.3. Establish MBOX as minimum standard for input and output of email content

MBOX is the most widely used standard amongst the tools considered here. EML is also a widely used standard and supported by a majority of email clients. The EAXS standard used in DArcMail may be more comprehensive but has so far not been widely adopted and there are no tools for discovery and access in that format.

We therefore recommend that tool providers consider adding MBOX -- complying with RFC 4155 (Application MBOX Media Type) and RFC 5322 (Internet Message Format) -- as a standard for both input and output (where that doesn’t already exist). This doesn’t necessarily mean obsoleting use of EML or EAXS, but simply providing additional support to enable maximum interoperability between tools.
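
As a rough indication of how small the bridge between EML and MBOX can be, the following Python sketch gathers a directory of `.eml` files into a single MBOX file using the standard library `mailbox` module. The function name `eml_directory_to_mbox` and the assumption that each `.eml` file holds exactly one RFC 5322 message are choices made for this example; it does not describe how EAS or any other tool performs conversion.

```python
import mailbox
from pathlib import Path


def eml_directory_to_mbox(eml_dir, mbox_path):
    """Append every *.eml file in eml_dir to an MBOX file; return the count.

    Hypothetical helper. No file locking is done here, so it assumes
    nothing else is writing to the MBOX file at the same time.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist yet
    added = 0
    try:
        for eml_file in sorted(Path(eml_dir).glob("*.eml")):
            # Raw bytes are passed through; mailbox adds the "From " separator
            # line and escapes body lines that begin with "From ".
            box.add(eml_file.read_bytes())
            added += 1
        box.flush()
    finally:
        box.close()
    return added
```

Because of the "From " escaping noted in the comment, a message whose body contains such lines will not come back byte-for-byte identical, so checksums computed on the EML files should not be expected to match checksums computed on messages read back from the MBOX.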

5.4. Establish a common exchange standard for packaging email with metadata

A standard for packaging digital content, describing the contents of the package and ensuring integrity of the package using hashes will greatly improve the ability to transfer data safely between systems. The Library of Congress BagIt standard is well-established and is already used by at least two of the tools here (Archivematica and BitCurator).

The BagIt standard may not be enough in itself however. While recommendation 5.3 would ensure that email content can be transferred using the MBOX standard, additional structural and metadata standards may be needed to define minimum expectations for what content or metadata is required, optional or acceptable. For example, such standards could clarify whether it is acceptable to package multiple email accounts together.
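
To show how little structure a basic bag requires, the Python sketch below writes a minimal bag (a `data/` payload directory, a `bagit.txt` declaration and a `manifest-sha256.txt`) and then re-checks the payload against the manifest. It follows the layout of BagIt version 1.0 as published in RFC 8493, which postdates the tools' use of BagIt discussed here. The function names `make_minimal_bag` and `validate_bag` are hypothetical, the sketch omits optional tag files such as `bag-info.txt`, and a production workflow would more likely rely on an established BagIt library.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(payload_files, bag_dir):
    """Copy payload files into bag_dir/data and write the required tag files.

    Hypothetical helper: files are stored flat under data/, so two payload
    files with the same name would collide.
    """
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for source in map(Path, payload_files):
        target = data_dir / source.name
        shutil.copy2(source, target)
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{source.name}\n")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag


def validate_bag(bag_dir):
    """Return True if every payload file still matches its manifest digest."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, relative_path = line.split(maxsplit=1)
        payload = bag / relative_path
        if not payload.is_file():
            return False
        if hashlib.sha256(payload.read_bytes()).hexdigest() != digest:
            return False
    return True
```

Nothing in this layout says what the payload means, which is the gap noted above: two tools could exchange valid bags and still disagree about whether a bag holds one email account or several.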

5.5. Support capture of processing history

Several tools record processing history using the well established PREMIS standard.

Ideally all tools would provide this capability so that comprehensive processing history can be captured throughout a workflow using multiple tools.


Further consideration should be given to the consolidation of processing history files from different systems, or the ability to manually add processing history (to fill any gaps where a tool does not yet record it automatically).
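
The consolidation idea can be sketched in a few lines of Python. The example below merges event lists from several sources into one chronological history and drops exact duplicates. The `ProcessingEvent` record is a deliberately simplified stand-in for a PREMIS event, not the PREMIS XML serialization, and its field names, the `merge_histories` function and the event type labels used with it are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProcessingEvent:
    """Simplified, hypothetical event record (not PREMIS XML)."""
    timestamp: datetime
    tool: str          # which system, or "manual", recorded the event
    event_type: str    # free-text label such as "normalization"
    object_id: str     # identifier of the item the event applies to
    detail: str = ""


def merge_histories(*histories):
    """Combine event lists into one chronological list without exact duplicates.

    All timestamps must be comparable, i.e. all naive or all timezone-aware.
    """
    unique_events = {event for history in histories for event in history}
    return sorted(
        unique_events,
        key=lambda event: (event.timestamp, event.tool, event.object_id),
    )
```

A manually recorded step is handled the same way as a tool-generated one: the steward adds an event with `tool="manual"` to a list and passes that list to `merge_histories` along with the others.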

5.6. Establish standard definition and description of email collections

It isn’t clear that the definition of what constitutes an email account (including the relationship with email addresses, or people) is consistent between tools. Establishing a common definition will enable alignment of different data models used and reduce the risk of confusion or mis-identification of email collections at this fundamental level.

With a consistent and standard definition, it will then be possible to develop a common standard for describing email accounts. This would help improve the precision of search and discovery and better enable the exchange of descriptive metadata between tools.

5.7. Make local tools publicly available with an open source license

Tools that are only usable by one institution are not useful to the wider email archiving community. While there are clearly costs to making a tool more widely available and trying to create and maintain an active community around it, we feel there are many benefits that can offset those costs in the long run, including opening up the project to a wider base of developers, testers and potential funders.

Acknowledgements

This project built on the great work started at the Harvard Email Archiving Stewardship Tools (EAST) workshop in March 2016. We would like to thank the original participants and acknowledge the many contributions received since.

In particular we would like to thank the contributors to the Email Data Sharing Framework: Glynn Edwards, Josh Schneider and Peter Chan (Stanford University), Andrea Goethals, Grainne Reilly and Skip Kendall (Harvard University), Sarah Romkey and Justin Simpson (Artefactual Systems Inc.) and Cal Lee (University of North Carolina Chapel Hill).

Numerous reviewers provided helpful contributions and suggestions for this report. We would like to thank Evelyn McLellan, Justin Simpson and Sarah Romkey (Artefactual Systems Inc.), Anthony Moulen, Andrea Goethals and Grainne Reilly (Harvard University), Chris Prom (University of Illinois at Urbana-Champaign), Cal Lee (University of North Carolina Chapel Hill) and Riccardo Ferrante (Smithsonian Institution Archives).

We would like to thank Harvard Library for the opportunity to engage in this work and providing support and direction throughout.

Finally thanks to Wendy Gogel (Harvard University) for contributions on many fronts and providing leadership for the project.
