Policy Template Workbook – iRODS 4.2
DataNet Federation Consortium
Sheau-Yen Chen, Mike Conway, Jon Crabtree, Cal Lee, Sunitha Misra, Reagan W. Moore, Arcot Rajasekar, Terrell Russell, Isaac Simmons, Lisa Stillwell, Helen Tibbo, Hao Xu
August 25, 2015

Policy Template Workbook iRODS 4datafed.org/dev/wp-content/uploads/2016/05/DFC-policy-template.pdf · Scalable Data Management Infrastructure in a Data Grid‐Enabled Digital Library




Policy Workbook - iRODS 4.2 by Sheau-Yen Chen, Mike Conway, Jon Crabtree, Cal Lee, Sunitha Misra, Reagan W. Moore, Arcot Rajasekar, Terrell Russell, Isaac Simmons, Lisa Stillwell, Helen Tibbo, Hao Xu

Copyright 2015 by the iRODS Consortium. All rights reserved. Printed in the United States of America.

Published by the iRODS Consortium, 100 Europa Drive, Suite 540, Chapel Hill, North Carolina, 27517 USA.

September 2015

Acknowledgements

This research was supported by:

  • NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004-2007).
  • NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital Library System (2005-2006).
  • NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Research Prototype Persistent Archive Extension (2006-2007).
  • NSF SDCI 0910431, SDCI Data Improvement: Data Grids for Community Driven Applications (2007-2010).
  • NSF/NARA OCI 0848296, NARA Transcontinental Persistent Archive Prototype (2008-2010).
  • NSF OCI 1032732, SDCI Data Improvement: Improvement and Sustainability of iRODS Data Grid Software for Multi-Disciplinary Community Driven Applications (2010-2012).
  • NSF OCI 0940841, DataNet Federation Consortium (2011-2015).

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Archives and Records Administration (NARA), the National Science Foundation (NSF), or the U.S. Government.


Abstract

Policy-based data management systems, such as the integrated Rule Oriented Data System (iRODS), automate the enforcement of management policies, the execution of administrative tasks, and the validation of assessment criteria. This book presents policy sets applied in six types of data management applications: 1) data sharing; 2) student digital library; 3) production data centers; 4) preservation; 5) protected data management; and 6) NSF Data Management Plans.


Table of Contents

1 Introduction
  1.1 Policy Library
  1.2 Summary

2 Data Sharing Policy Set
  2.1 Manage user creation (Policy 1)
  2.2 Manage user deletion (Policy 2)
  2.3 Manage renaming of a data grid (Policy 3)
  2.4 Set the maximum number of I/O streams (Policy 4)
  2.5 Bypass permission checks for registering a file (Policy 5)
  2.6 Set policy for defining physical path name for a file (Policy 6)
  2.7 Set number of execution threads used to process rules (Policy 7)
  2.8 Set policy for processing files in bulk (Policy 8)
  2.9 Manage indexing of the system state catalog (Policy 9)
  2.10 Set storage quota policy (Policy 10)
  2.11 Manage selection of storage resource (Policy 11)

3 Data Management Policy Set (SILS LifeTime Library)
  3.1 Turn on storage quota enforcement (Policy 10)
    3.1.1 Check for missing quotas
    3.1.2 Calculate total storage usage
    3.1.3 Identify persons who exceeded their quota
    3.1.4 Periodically update quota check
  3.2 Manage selection of storage resource (Policy 11)
  3.3 Manage selection of storage resource for replication (Policy 12)
  3.4 Enforce replication of each new file (Policy 13)
  3.5 Manage access control policy (Policy 14)

4 Data Administration Policy Set (RDA Practical Policy working group)
  4.1 Data access control policies (Policy 14)
    4.1.1 Find the User_ID associated with a User_name:
    4.1.2 Find the File_ID associated with a file name:
    4.1.3 Set write access control for a user:
    4.1.4 Set operations that are allowable for the user "public"
    4.1.5 Check the access controls on a file:
  4.2 Data format control policies (Policy 15)
    4.2.1 Set format conversion flag
    4.2.2 Invoke format conversion
    4.2.3 Identify and archive specific file formats from a staging area
  4.3 Notification Policies (Policy 16)
    4.3.1 Notify on collection deletion
    4.3.2 Notification of events
  4.4 Use agreement policies (Policy 17)
    4.4.1 Set receipt of signed use agreement
    4.4.2 Identify users without signed use agreement
  4.5 Integrity policy (Policy 18)
    4.5.1 Verify access controls on files
    4.5.2 Check integrity and number of replicas of files in a collection


  4.6 Metadata extraction (Policy 19)
    4.6.1 Load metadata from an XML file
    4.6.2 Load metadata from a pipe-delimited file
    4.6.3 Contextual metadata extraction through pattern recognition
    4.6.4 Stripping metadata from a file
  4.7 Data backup policies (Policy 20)
    4.7.1 Data versioning policy
    4.7.2 Data backup staging policy
    4.7.3 Copy files to a federated staging area
  4.8 Data retention policies (Policy 21)
    4.8.1 Purge policy to free storage space
    4.8.2 Data expiration policy
  4.9 Disposition policy for expired files (Policy 22)
  4.10 Restricted searching policy (Policy 23)
    4.10.1 Strict access control
    4.10.2 Controlled queries
  4.11 Storage cost reports (Policy 24)
    4.11.1 Usage report by user name and storage system
    4.11.2 Cost report by user name and storage system

5 Odum Data Preservation Policy set
  5.1 Automate access restrictions (Policy 14)
    5.1.1 Set inheritance of access controls on a collection
    5.1.2 Check whether a specific person has access to a collection
    5.1.3 Identify all persons with access to files in a collection
    5.1.4 Identify files that can be accessed by an account
    5.1.5 Delete access to files for a specified account
    5.1.6 Copy files, access control lists, and AVUs to a federated data grid
  5.2 Normalize data to non-proprietary formats (Policy 15)
    5.2.1 Detection of format type
    5.2.2 Automate format type detection
    5.2.3 Identify file format extensions in a collection
  5.3 Creation of PREMIS event data (Policy 16)
    5.3.1 Creating PREMIS event information
    5.3.2 Sending messages over AMQP
  5.4 Automation of user submission agreements (Policy 17)
    5.4.1 Staging of files with a user submission agreement
  5.5 Automatic Checksums (Policy 18)
    5.5.1 Creating a BagIt file
  5.6 Automated capture of Provenance / contextual metadata (Policy 19)
    5.6.1 Provenance for administrative policies
  5.7 Federation - periodically copy data (Policy 20)
  5.8 De-identification of Data (Policy 25)
    5.8.1 BitCurator based processing
  5.9 Unique Identifiers for Data Sets (Policy 26)
    5.9.1 Assigning a Handle to a File
    5.9.2 Registering files in DataONE registry
  5.10 Authentication identity management (Policy 27)
    5.10.1 Verify access controls on each file
  5.11 Automated Data Reviews (Policy 28)


    5.11.1 Metadata Review
  5.12 Mapping metadata across systems (Policy 29)
    5.12.1 Validate HIVE vocabularies
  5.13 Export Datasets in Multiple Formats (Policy 30)
    5.13.1 Polyglot Format Conversion
  5.14 Check for viruses (Policy 31)
    5.14.1 Scan files and flag infected objects
  5.15 Rule set management (Policy 32)
    5.15.1 Deploy rule sets
  5.16 Parse event trail for all persons accessing a collection (Policy 33)

6 Protected Data Policy Sets
  6.1 Check for presence of PII on ingestion (Policy 34)
  6.2 Check for viruses on ingestion (Policy 31)
    6.2.1 Scan files and flag infected objects
    6.2.2 Migrate files that pass the virus check
  6.3 Check passwords for required attributes (Policy 35)
  6.4 Encrypt data on ingestion (Policy 36)
  6.5 Encrypt data transfers (Policy 37)
  6.6 Federation - control data copies (Policy 38)
  6.7 Federation - manage remote data grid interactions (Policy 32)
    6.7.1 Updating rule base across servers
  6.8 Federation - Copy Data from staging area (Policy 20)
  6.9 Federation - manage data retrieval (Policy 39)
  6.10 Generate checksum on ingestion (Policy 40)
  6.11 Generate report of corrections to data sets or access controls (Policy 41)
  6.12 Generate report for cost (time) required to audit events (Policy 42)
  6.13 Generate report of types of protected assets (Policy 43)
  6.14 Generate report of all security and corruption events (Policy 44)
  6.15 Generate report of the policies applied to collections (Policy 45)
    6.15.1 Deploy rule sets
    6.15.2 Update rule sets
    6.15.3 Print rule sets
  6.16 List all storage systems being used (Policy 46)
  6.17 List persons who can access a collection (Policy 47)
  6.18 List staff by position and required training courses (Policy 48)
    6.18.1 Set position and training
    6.18.2 List staff by position and training
  6.19 List versions of technology that are being used (Policy 49)
  6.20 Maintain document on independent assessment of software (Policy 50)
  6.21 Maintain log of all software changes, OS upgrades (Policy 51)
    6.21.1 Version log files
  6.22 Maintain log of disclosures (Policy 52)
  6.23 Maintain password history on user name (Policy 53)
  6.24 Parse event trail for all accessed systems (Policy 54)
  6.25 Parse event trail for all persons accessing collection (Policy 33)
  6.26 Parse event trail for all unsuccessful attempts to access data (Policy 55)
  6.27 Parse event trail for changes to policies (Policy 56)
  6.28 Parse event trail for inactivity (Policy 57)
  6.29 Parse event trail for updates to rule bases (Policy 58)


  6.30 Parse event trail to correlate data accesses with client actions (Policy 59)
  6.31 Provide test environment to verify policies on new systems (Policy 60)
  6.32 Provide test system for evaluating a recovery procedure (Policy 61)
  6.33 Provide training courses for users (Policy 62)
  6.34 Replicate data sets on ingestion (Policy 13)
  6.35 Replicate iCAT periodically (Policy 63)
  6.36 Set access approval flag (Policy 64)
    6.36.1 Restrict access for "Protected" data
  6.37 Set access controls (Policy 14)
    6.37.1 Set access controls after proprietary period
  6.38 Set access restriction until approval flag is set (Policy 65)
  6.39 Set approval flag per collection for enabling bulk download (Policy 66)
  6.40 Set asset protection classifier for data sets based on type of PII (Policy 67)
  6.41 Set flag for whether tickets can be used on files in a collection (Policy 68)
    6.41.1 Remove public and anonymous access
  6.42 Set lockout flag and period on user name - counting number of tries (Policy 69)
    6.42.1 Set lockout period on user name
  6.43 Set password update flag on user name (Policy 70)
  6.44 Set retention period for data reviews (Policy 71)
  6.45 Set retention period on ingestion (Policy 21)
  6.46 Track systems by type (server, laptop, router, ...) (Policy 72)
  6.47 Verify approval flags within a collection (Policy 73)
  6.48 Verify files have not been corrupted (Policy 18)
  6.49 Verify presence of required replicas (Policy 74)
  6.50 Verify that no controlled data have public or anonymous access (Policy 75)
    6.50.1 Restrict access to "Protected" data
  6.51 Verify that protected assets have been encrypted (Policy 76)
    6.51.1 Check that files with ACCESS_APPROVAL = 0 are encrypted

7 Data Management Plan Example Rules
  7.1 Staffing policies (Policy 48)
  7.2 Cost reporting (Policy 24)
  7.3 Collection creation planning (Policy 45)
  7.4 Instrument control (Policy 77)
  7.5 Event log for collection formation (Policy 54)
  7.6 Collection reports (Policy 41)
  7.7 Product formation (Policy 17)
  7.8 Data category management (Policy 78)
  7.9 Re-using existing data (Policy 79)
  7.10 Quality control (Policy 80)
  7.11 Analysis procedures (Policy 81)
  7.12 Analysis collaborations (Policy 82)
  7.13 Data dictionary (Policy 29)
  7.14 Naming control (Policy 83)
  7.15 Data format control (Policy 16)
  7.16 Unique identifiers (Policy 27)
  7.17 Metadata standard (Policy 29)
  7.18 Metadata export (Policy 84)
  7.19 Collection creation system (Policy 85)
  7.20 Collection size (Policy 86)


  7.21 Publication of original data (Policy 87)
  7.22 Publication of data products (Policy 88)
  7.23 Re-use policies (Policy 89)
  7.24 Distribution policies (Policy 90)
  7.25 Privacy access restrictions (Policy 14)
  7.26 IPR restrictions (Policy 91)
  7.27 Web access policies (Policy 92)
  7.28 Data sharing system (Policy 93)
  7.29 Code distribution system (Policy 94)
  7.30 Retention period (Policy 21)
  7.31 Curation plans (Policy 95)
  7.32 Archive system (Policy 96)
  7.33 Replication policy (Policy 13)
  7.34 Backup policy (Policy 97)
  7.35 Integrity verification (Policy 18)
  7.36 Technology management policies (Policy 49)
  7.37 Metadata catalog management (Policy 9)
  7.38 Transformative migration (Policy 15)

8 Verifying Policy Sets:
  8.1 Analysis of the integrated Rule Oriented Data System
  8.2 Policy-enforcement points
  8.3 Client invocation of policy-enforcement points
  8.4 Procedures executed at each policy enforcement point

9 Summary:

10 Acknowledgements:

11 References:

Appendix A: Policy-enforcement Points

Appendix B: Client Invocation of Policy Enforcement Points

Appendix C: Micro-services

Appendix D: Persistent State Variables

Appendix E: Protected Data Requirements

Appendix F: Mauna Loa Sensor Data DMP

 


1 Introduction

The DataNet Federation Consortium (DFC) infrastructure enables communities to implement their preferred data management application. Partners within the DFC have implemented data sharing environments, data publication systems (digital libraries), data preservation systems (archives), data distribution systems, and data processing systems (processing pipelines). The DFC supports each type of data management application by specifying a set of policies that enforce the desired purpose for the collection.

A data sharing environment focuses on:
    Unified name spaces for users, files, collections, metadata
    Access controls
    Hierarchical arrangement
    Integrity

A digital library focuses on:
    Controlled name spaces for files, collections, metadata
    Descriptive metadata standards
    Standard data format
    PREMIS event data

An archive focuses on:
    Authenticity
    Integrity
    Chain of Custody
    Original arrangement

A data distribution system focuses on:
    Caching
    Replication
    Synchronization
    Access controls

A processing pipeline focuses on:
    Controlled name spaces for users, files, collections, metadata, and procedures
    Sharing of procedures, files
    Access controls
    Provenance of workflows

Each of these types of data management applications can build upon common data grid infrastructure by choosing an appropriate set of policies and procedures. The policies determine when and where the procedures are executed. Within the integrated Rule Oriented Data System (iRODS) data grid, policies can be automatically enforced at policy enforcement points, or policies can be executed interactively by a user or grid administrator, or policies can be scheduled for deferred and periodic execution. The policy enforcement points typically control management policies. Deferred and periodic execution are used for administrative tasks. Interactive execution may be used to validate assessment criteria.


This book lists policy sets that have been implemented in an iRODS data grid, generated in academic classes on digital library, and provided by user communities. Figure 1 lists the basic concepts underlying policy-based data management. Given a specific data management purpose, a collection can be assembled that has desired properties such as integrity, authenticity, and access controls. The properties themselves may have associated requirements such as completeness (all files in the collection have each property), correctness (incorrect values for metadata have been identified and eliminated), consensus (the properties represent the combined desire of the group assembling the collection), and consistency (the same metadata and data format standards have been applied to all files in the collection).

Figure 1. Policy-based data management concept graph

Each desired property is enforced by a set of policies that determine when and where associated procedures are executed. Thus an integrity property may require policies for generating checksums and replicating files. The associated procedures are workflows composed by chaining together basic tasks or functions (also called micro-services). The functions apply basic operations such as generate a checksum, or replicate a file, or set the data type. The results of applying the functions are saved as persistent state information or metadata attributes on the files, users, storage systems, policies, and micro-services.
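The relationship between a procedure, its chained functions, and the persistent state they record can be modeled in a few lines of Python. This is only a sketch of the idea, not iRODS code: the `catalog` dictionary stands in for the metadata catalog, and the function names, file path, and resource names (`demoResc`, `replResc`) are hypothetical.

```python
import hashlib

# Hypothetical in-memory stand-in for the metadata catalog (persistent state).
catalog = {}

def generate_checksum(path, data, state):
    """Basic function: compute a checksum and save it as persistent state."""
    state.setdefault(path, {})["checksum"] = hashlib.sha256(data).hexdigest()

def replicate_file(path, data, state, resource="replResc"):
    """Basic function: record a replica on a second (hypothetical) resource."""
    replicas = state.setdefault(path, {}).setdefault("replicas", ["demoResc"])
    if resource not in replicas:
        replicas.append(resource)

def integrity_procedure(path, data, state):
    """Procedure: a workflow that chains the basic functions in order."""
    for micro_service in (generate_checksum, replicate_file):
        micro_service(path, data, state)

# Apply the integrity procedure to one (hypothetical) file.
integrity_procedure("/zone/home/alice/sample.txt", b"hello world", catalog)
print(catalog)
```

An assessment policy could later read the recorded checksum and replica list back from the same state to verify that the integrity property still holds.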


Clients interact with the system by requesting actions that are trapped at policy enforcement points (PEP). At each PEP, a rule base is examined to determine which policy to apply, and the associated procedure is executed. To implement assessment criteria, policies can be executed periodically to verify collection properties. We consider policy sets for the following purposes:

    Data sharing, implemented in the standard integrated Rule Oriented Data System (iRODS) release [1].

    Digital library management, implemented in the School of Information and Library Science LifeTime Library [2].

    Distributed data management, implemented in the Research Data Alliance Practical Policy working group [3].

    Data preservation, implemented in the DataNet Federation Consortium.

    Protected data management, defined in the UNC administrator manual, https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf

    Data Management Plans, defined at the Data Management Planning tool site, https://dmptool.org

For each policy set, we define a set of iRODS rules that can be used to enforce management policies, automate administrative functions, and validate assessment criteria. The rules are written in the iRODS rule language [4-5]. Each rule that is run interactively has a rule name, a rule body enclosed in braces that is written in the iRODS rule language, INPUT variables, and OUTPUT variables. An example rule to say “hello world” is:

Mytestrule {
# rule to write hello world
  writeLine("stdout", "$userNameClient says hello world");
}
INPUT null
OUTPUT ruleExecOut

Note that “ruleExecOut” on an OUTPUT line will copy the output information written to "stdout" to the user’s screen. This enables retrieval of information generated through interactive execution of a rule. If the rule is executed at a policy enforcement point or executed periodically, the output should be written to a log file and saved within the data grid. The session variable, “$userNameClient”, contains the name of the person who executed the command. The result printed to the screen by running this rule from account rwmoore with the irule command is:

rwmoore says hello world

The following examples include rules that can be run interactively by a user, rules that are run by a data grid administrator, rules that are enforced at Policy-Enforcement Points, and rules that run periodically under rule engine control.


Rules that are applied at Policy-Enforcement Points have a standard rule name related to the specific action that is being controlled. The INPUT variables are typically replaced with session variables that track who is executing an external action. The INPUT variables may also be set through queries on the metadata catalog. Rules can query a metadata catalog to retrieve information about the collection, the users, the storage systems, and user-defined metadata. In many of the following examples, a query is made to the metadata catalog, a “foreach” loop is then used to process the rows returned from the query, parameters are extracted from the row structure using a “.” operator, and information is output to a log file using a writeLine micro-service. More information on the iRODS rule language can be found at http://irods.org and in the “iRODS Primer” [4].

Policies from all six policy sets are included in this document. There is substantial overlap between policies from the Practical Policy working group, the DFC preservation policy set, and the Data Management Plan set. The policies unique to the DFC preservation policy set require interaction with external systems, which are listed in Table 1. While many of the policies are supported within the iRODS data grid, policies may require the use of external technologies, such as the InCommon authentication system, the HIVE (Helping Interdisciplinary Vocabulary Engineering) system, the Polyglot format translation service, the Bitcurator data analysis system, and the Handle file identifier system.

The policy sets are identified by the number in the leftmost column. When policies overlap across the six example areas, the policy number can be used to identify related policies. A total of 97 policy sets have been defined.

Table 1. Comparison of policy sets for data sharing, LifeTime Library, RDA data management, DFC preservation, Protected Data and Data Management Plans.

Policies | iRODS default policies for data sharing | sils LifeTime Library policies | rda Practical Policy WG policies for administration | odum policy set for preservation | hipaa Protected Data | dmp Data Management Plans | Supporting Technology

1 User creation X iRODS

2 User deletion X iRODS

3 Rename data grid X iRODS

4 Set number of I/O streams X iRODS

5 Server Permission checks X iRODS

6 Physical path name X iRODS

7 Execution threads X iRODS

8 Bulk processing X iRODS

9 Catalog indexing X X iRODS

10 Storage quota X X iRODS

11 Select storage X X iRODS

12 Select replication resource X iRODS

13 Replicate files X X X iRODS

14 Access controls X X X X X iRODS

15 Data format control policies X X X iRODS, Polyglot

16 Notification policies X X iRODS, message bus

17 Use agreement policies X X X iRODS

18 Verify files have not been corrupted X X X X iRODS, SHA-128

19 Contextual metadata extraction policies X X iRODS

20 Federation - periodically copy data X X X iRODS

21 Data retention policies X X X iRODS

22 Data disposition policies X iRODS

23 Restricted searching policies X iRODS

24 Storage cost reports X X iRODS

25 De-identification of data. X Bitcurator, iRODS

26 Applying unique identifiers to data sets. X X Handle, iRODS

27 Authentication protocols for repository users. X InCommon, iRODS

28 Automated metadata review X X iRODS

29 Mapping metadata across systems. X HIVE, iRODS

30 Ability to export datasets in multiple formats X Polyglot, iRODS

31 Check for viruses on ingestion X X ClamScan, iRODS

32 Federation - manage remote data grid interactions X X iRODS

33 Parse event trail for all persons accessing collection X X iRODS, operations

34 Check for presence of PII on ingestion X Bitcurator, iRODS

35 Check passwords for required attributes X iRODS

36 Encrypt data on ingestion X iRODS

37 Encrypt data transfers X iRODS

38 Federation - control data copies X iRODS

39 Federation - manage data retrieval X iRODS

40 Generate checksum on ingestion X iRODS

41 Generate report by collection of corrections to data sets or access controls X iRODS

42 Generate report for cost (time) required to audit events X iRODS

43 Generate report of types of protected assets present within a collection X iRODS

44 Generate report of all security and corruption events X iRODS

45 Generate report of the policies that are applied to the collections X iRODS

46 List all storage systems being used X iRODS

47 List persons who can access a collection X iRODS

48 List staff by position and required training courses X X iRODS

49 List versions of technology that are being used X X iRODS, operations

50 Maintain document on independent assessment of software X iRODS, operations

51 Maintain log of all software changes, OS upgrades X iRODS, operations

52 Maintain log of disclosures X iRODS, operations

53 Maintain password history on user name X iRODS

54 Parse event trail for all accessed systems X X iRODS, operations

55 Parse event trail for all unsuccessful attempts to access data X Databook, iRODS

56 Parse event trail for changes to policies X Databook, iRODS

57 Parse event trail for inactivity X Databook, iRODS

58 Parse event trail for updates to rule bases X Databook, iRODS

59 Parse event trail to correlate data accesses with client actions X Databook, iRODS

60 Provide test environment to verify policies on new systems X iRODS, operations

61 Provide test system for evaluating a recovery procedure X iRODS, operations

62 Provide training courses for users X Operations

63 Replicate iCAT periodically X iRODS

64 Set access approval flag X iRODS

65 Set access restriction until approval flag is set X iRODS

66 Set approval flag per collection for enabling bulk download X iRODS

67 Set asset protection classifier for data sets based on type of PII X iRODS

68 Set flag for whether tickets can be used on files in a collection X iRODS

69 Set lockout flag and period on user name - counting number of tries X iRODS


70 Set password update flag on user name X iRODS

71 Set retention period for data reviews X iRODS

72 Track systems by type (server, laptop, router, ...) X iRODS, operations

73 Verify approval flags within a collection X iRODS

74 Verify presence of required replicas X iRODS

75 Verify that no controlled data collections have public or anonymous access X iRODS

76 Verify that protected assets have been encrypted X iRODS

77 Instrument Type                 X iRODS

78 Data category                 X iRODS

79 Use of existing data                 X iRODS

80 Quality control                 X iRODS

81 Analysis                 X iRODS

82 Data sharing during analysis                 X iRODS

83 Naming attributes                 X iRODS

84 Metadata export                 X iRODS

85 Collection location                 X iRODS

86 Size                 X iRODS

87 Make original data public                 X iRODS

88 Make data products public                 X iRODS

89 Re‐use                 X iRODS

90 Re‐distribution                 X iRODS

91 IPR                 X iRODS

92 Web access                 X iRODS

93 Data sharing system                 X iRODS

94 Code distribution system                 X iRODS

95 Curation                 X iRODS

96 Archive                 X iRODS

97 Backup frequency                 X iRODS

Typically, there is more than one way to provide the functions needed for a specific policy, and more than one way to implement a policy. In practice, policies are needed to initialize environmental variables, to enforce management decisions, and to validate assessment criteria. Thus each policy area may require the implementation of a set of policies for each user group or collection.


1.1 Policy Library

To simplify writing the policies, a library of standard policy functions has been developed, called dfc-functions.re. The operations that are supported are:

1. addAVUMetadata(*Path, *Attname, *Attvalue, *Aunit, *Status)
   Add AVU metadata to a file.
   *Path       The iRODS path to a file.
   *Attname    The attribute name to be added.
   *Attvalue   The attribute value to be added.
   *Aunit      The attribute unit to be added.
   *Status     The return status (“0” if successful).

2. addAVUMetadataToColl(*Coll, *Attname, *Attvalue, *Attunit, *Status)
   Add AVU metadata to a collection.
   *Coll       The iRODS collection name.
   *Attname    The attribute name to be added.
   *Attvalue   The attribute value to be added.
   *Attunit    The attribute unit to be added.
   *Status     The return status (“0” if successful).

3. addToList(*Name, *Usage, *Listnam, *Listuse, *Min, *Num)
   Add usage and name to a list in sorted order.
   *Name       A name to be added to a list which is sorted by usage.
   *Usage      The usage associated with the name.
   *Listnam    The return list of names that is sorted.
   *Listuse    The return list of usage values associated with the names.
   *Min        Set to the minimum usage value currently in the list.
   *Num        The size of the list (fixed input value).

4. checkCollInput(*Coll)
   This checks whether the input variable is a collection.
   *Coll       The name of the collection to check. Fails if collection does not exist.

5. checkFileInput(*File)
   This checks whether the input variable is a file.
   *File       The name of the file to check. Fails if file does not exist.

6. checkMetaExistsColl(*Attname, *Coll, *Lfile, *Value)
   This checks whether a metadata attribute exists on a collection.
   *Attname    The name of a metadata attribute that should be present for the collection. Created if missing with value “0”.
   *Coll       The name of the collection that is being checked.
   *Lfile      The name of the output buffer for error messages.
   *Value      The value of the metadata attribute, set to zero if the attribute was missing.

7. checkPathInput(*Path)
   This checks whether a valid path name exists.
   *Path       The iRODS path name to be verified (collection/file).

8. checkRescInput(*Res, *Zone)
   This checks whether the input variable is a storage resource in zone *Zone.
   *Res        The name of a storage resource to be checked.
   *Zone       The name of the iRODS zone which has the resource.

9. checkUserInput(*User, *Zone)
   This checks whether the input variable is a user in zone *Zone.
   *User       The USER_NAME of a user.
   *Zone       The USER_ZONE of a user.

10. checkZoneInput(*Zone)
    This checks whether the designated zone is accessible through federation.
    *Zone      The federated zone to be checked. Routine fails if the zone is not federated correctly.

11. contains(*list, *elem)
    Returns true if list contains the element.
    *list      The list that is checked.
    *elem      The element string which is tested for presence in the list.

12. createCollections(*coll, *cs)
    Create a sub-collection for each entry in list *cs under *coll.
    *coll      The full path to the parent collection.
    *cs        A list of subdirectories that are added to the parent collection.

13. createList(*Lista, *Num, *Val)
    Create a list of length *Num with default *Val.
    *Lista     The list that is being created.
    *Num       The number of default values to put in the list.
    *Val       The default value for each list item.

14. createLogFile(*Coll, *Sub, *Name, *Res, *LPath, *Lfile, *L_FD)
    This creates a log collection and a log file.
    *Coll      The full path to a collection.
    *Sub       The subdirectory that is created if necessary to hold the log file.
    *Name      The name of the log file to which a time stamp is appended.
    *Res       The storage resource where the log file is stored.
    *LPath     Returns the full path to the log collection (*Coll/*Sub).
    *Lfile     Returns the name of the log file.
    *L_FD      Returns the file descriptor for the log file.

15. createReplicas(*N, *Numrepl, *Lfile, *Ulist, *Rlist, *Jround, *Resource, *Coll, *File, *NumRepCreated)
    This creates *N replicas on a list of resources.
    *N         The number of replicas to create of a file.
    *Numrepl   The number of storage resources included in the list of resources.
    *Lfile     The output buffer name for writing error messages.
    *Ulist     A list that is set to “1” when a replica exists on a storage resource.
    *Rlist     The corresponding list of storage replicas.
    *Jround    An index into the list of storage resources for the starting resource to use for replication.
    *Resource  The resource used as the source for the replica.
    *Coll      The collection name of the file being replicated.
    *File      The name of the file that is replicated.
    *NumRepCreated  A counter that is incremented as replicas are created.

16. deleteAVUMetadata(*Path, *Attname, *Attvalue, *AUnit, *Status)
    This deletes a metadata attribute and value from a file.
    *Path      The iRODS full path to a file.
    *Attname   The attribute name that will be deleted.
    *Attvalue  The attribute value that will be deleted.
    *AUnit     The attribute units that will be deleted.
    *Status    The return status result (“0” if successful).

17. ext(*p)
    Extracts extension by parsing string for letters after a dot.
    *p         The string that is being parsed.

18. findZoneHostName(*Zone, *Host, *Port)
    This returns the Host name and Port for a federated zone.
    *Zone      The name of the iRODS zone which is being accessed.
    *Host      Returns the host name extracted from ZONE_CONNECTION.
    *Port      Returns the port extracted from ZONE_CONNECTION.

19. getCollections(*filePaths)
    Returns list of collections by deleting the file name.
    *filePaths  Converts a list of paths into a list of collections.

20. getFiles(*localRoot, *localPaths)
    Returns list of files by stripping *localRoot from list *localPaths.
    *localRoot   The collection name that is stripped from the input paths.
    *localPaths  Returns the list of files.

21. getNumSizeColl(*Coll, *colldataID, *Size, *Num)
    This counts the number of files and total size in a collection.
    *Coll        The full path to a collection.
    *colldataID  The number and size is calculated for all files in the collection with DATA_ID > *colldataID.
    *Size        Returns the total size of files in the collection.
    *Num         Returns the number of files in the collection.

22. getRescColl(*Coll, *Rlist, *Ulist, *Lfile, *Num)
    This creates a list of storage resources used by files in a collection.
    *Coll      The full path to a collection that is analyzed.
    *Rlist     Returns a list of resources on which files were stored.
    *Ulist     Returns a usage list initialized to “0”.
    *Lfile     The output buffer to which information is written.
    *Num       Returns the number of resources that were found.

23. isColl(*LPath, *Lfile, *Status)
    Check if collection exists and create if necessary.
    *LPath     The full path name for an iRODS collection.
    *Lfile     The output buffer to which information is written.
    *Status    Returns “0” if the collection does not exist.

24. isData(*Coll, *File, *Status)
    This checks whether a file already exists.
    *Coll      The full path name for an iRODS collection.
    *File      The name of a file that is tested for presence in the collection.
    *Status    Returns “0” if the file does not exist.

25. modAVUMetadata(*Path, *Attname, *Attvalue, *Aunit, *Status)
    This modifies an existing AVU attribute on a data file.
    *Path      The full path to a file in iRODS.
    *Attname   The attribute name that is being modified with a new value or unit.
    *Attvalue  The new value that is being inserted.
    *Aunit     The new unit that is being inserted.
    *Status    Returns the status of the operation.

26. selectRescUpdate(*Rlist, *Ulist, *Num, *Resource)
    This selects a resource to use from a list of storage resources.
    *Rlist     A list of storage resources.
    *Ulist     Corresponding list of usage with value “1” if the storage resource has a replica.
    *Num       The number of storage resources in the list.
    *Resource  Returns a resource that does not store a replica.

27. sendAccess(*AccessType, *UserName, *DataId, *DataType, *Time, *Description, *eventOutcome, *host, *queue)
    Generates an access event message and sends it using AMQP.
    *AccessType    Input type of access event.
    *UserName      Input name of user who caused the event.
    *DataId        Input DATA_ID of a file that was manipulated.
    *Time          Input date when the event occurred.
    *Description   Input description of the event.
    *eventOutcome  Input event outcome.
    *host          Input address of host where the event information is sent.
    *queue         Input queue where the message is sent.

28. sendLinkingEvent(*DataId, *AccessId, *host, *queue)
    Generate a JSON document describing a link between objects.
    *DataId    Input DATA_ID of file that was manipulated.
    *AccessId  Input event identifier value.
    *host      Input address of host where the information is sent.
    *queue     Input queue where the message is sent.

29. sendRelatedEvent(*relationshipType, *relationshipSubType, *DataIds, *AccessIds, *host, *queue)
    Creates a JSON document describing a related event between objects.
    *relationshipType     Input type of relationship.
    *relationshipSubType  Input subtype for relationship.
    *DataIds              List of DATA_IDs for files that are related.
    *AccessIds            List of access IDs for the files.
    *host                 Input address of host for sending a message.
    *queue                Input queue where message is sent.

30. updateCollMeta(*Coll, *Attr, *OldValue, *NewValue, *Lfile)
    This updates a metadata attribute on a collection.
    *Coll      Path to a collection whose metadata is modified.
    *Attr      Collection attribute name whose value is modified.
    *OldValue  Original value for attribute.
    *NewValue  New value for attribute.
    *Lfile     Name of buffer where information is written.

31. uploadFiles(*localRoot, *localPaths, *coll)
    Moves files in *localPaths to the collection *coll.
    *localRoot   The collection name that is stripped from the input paths.
    *localPaths  List of file path names.
    *coll        Name of collection where files are copied.

32. verifyReplicaChksum(*Coll, *File, *Lfile, *Num, *Rlist, *Ulist0, *Ulist, *Numr, *NumBad)
    This verifies checksums on the replicas for a file.
    *Coll      Collection whose files will be checked for integrity.
    *File      The file in the collection checked for replicas.
    *Lfile     Name of output buffer where information is written.
    *Num       Number of storage resources in the storage resource list.
    *Rlist     List of storage resources used by the collection.
    *Ulist0    A list that has been initialized to “0”.
    *Ulist     Returns list of resources that were used to store a replica.
    *Numr      Returns the number of replicas that exist on the storage resources.
    *NumBad    Returns the number of files that have a bad checksum.
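Several of the library functions cooperate through a pair of parallel lists: a resource list (*Rlist) and a usage list (*Ulist) whose entries are “1” where a replica already exists. The Python sketch below models how selectRescUpdate and createReplicas could use those lists, based only on the parameter descriptions above. It is an assumption-laden illustration: the actual logic in dfc-functions.re may differ (for example, it also takes a starting index *Jround), and the resource names are hypothetical.

```python
def select_resc_update(rlist, ulist):
    """Return the first resource whose usage flag is "0" and mark it as used.

    rlist: list of storage resource names
    ulist: parallel list of flags, "1" if the resource already holds a replica
    Returns None when every resource already stores a replica.
    """
    for i, (resource, used) in enumerate(zip(rlist, ulist)):
        if used == "0":
            ulist[i] = "1"
            return resource
    return None

def create_replicas(n, rlist, ulist):
    """Choose resources until n replicas exist or the resources run out."""
    created = []
    while ulist.count("1") < n:
        resource = select_resc_update(rlist, ulist)
        if resource is None:
            break
        created.append(resource)
    return created

rlist = ["rescA", "rescB", "rescC"]
ulist = ["1", "0", "0"]          # a replica already exists on rescA
new_replicas = create_replicas(2, rlist, ulist)
print(new_replicas, ulist)
```

Keeping the usage flags in a separate list lets the same resource list be reused for every file in a collection, with only the flags reset between files.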

The rule examples assume that the library of policy functions has been entered into the configuration file, /etc/irods/server_config.json, by addition to the re_rulebase_set:

"re_rulebase_set":[{"filename":"core,dfc-functions"}]


The library of policy functions is called dfc-functions.re and is available for download at http://github.com/DICE-UNC/policy-workbook/dfc-functions.re. A policy function for encoding a string into JSON is available from the policy function file json-encode.re at http://github.com/DICE-UNC/policy-workbook.

1. jsonEncode(*str)
   This escapes all special characters in a string.
   *str        A string that is processed for special characters.
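To make the purpose of such an encoding function concrete, the following Python sketch escapes the characters that cannot appear unescaped inside a JSON string and then checks the result with a standard JSON parser. It illustrates the general technique only; exactly which characters the jsonEncode function in json-encode.re escapes is not stated here, so the escape table is an assumption, and `json_encode` is a hypothetical name.

```python
import json

# Characters that must be escaped inside a JSON string literal.
_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}

def json_encode(text):
    """Escape special characters so text can be embedded in a JSON string."""
    out = []
    for ch in text:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif ord(ch) < 0x20:
            out.append("\\u%04x" % ord(ch))   # remaining control characters
        else:
            out.append(ch)
    return "".join(out)

sample = 'path "C:\\data"\tline1\nline2'
encoded = json_encode(sample)
# Round trip through a standard JSON parser to confirm the escaping.
assert json.loads('"' + encoded + '"') == sample
print(encoded)
```

Escaping before assembling the message matters for event functions that build JSON documents from user-supplied values such as descriptions or path names.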

Each policy implements a workflow that relies upon input variables, session variables, and persistent state information to manage the workflow operations. Each policy is defined by the set of operations and variables that are applied. A copy of each policy written in the iRODS rule language is available at http://github.com/DICE-UNC/policy-workbook. Definitions of the workflow operations are given in Appendix C. Definitions of the persistent state variables are given in Appendix D.

1.2 Summary

This book presents templates for 130 policies. The resulting rules were analyzed to determine the tasks that were automated, the session variables that were used, the persistent state information that was used, and the operations that were performed. This presents a characterization of a “minimal” policy-based data management system that is capable of supporting:

    Data sharing
    Digital libraries
    Production data centers
    Preservation
    Protected data
    NSF Data Management Plans

The task list in Table 1 has been sorted to group similar tasks together.

Table 2a: Sorted task list

Ability to export datasets in multiple formats Encrypt data transfers

Access controls Execution threads

Analysis  Federation ‐ control data copies 

Applying unique identifiers to data sets. Federation ‐ manage remote data grid interactions 

Archive  Federation ‐ periodically copy data 

Authentication protocols for repository users. Federation‐ manage data retrieval 

Automated metadata review Generate checksum on ingestion 


Backup frequency  Generate report by collection of corrections to data sets or access controls 

Bulk processing Generate report for cost (time) required to audit events 

Catalog indexing Generate report of  types of protected assets present within a collection 

Check for presence of PII on ingestion  Generate report of all security and corruption events 

Check for viruses on ingestion  Generate report of the policies that are applied to the collections 

Check passwords for required attributes  Instrument Type 

Code distribution system  IPR 

Collection location  List all storage systems being used 

Contextual metadata extraction policies List persons who can access a collection 

Curation  List staff by position and required training courses 

Data category  List versions of technology that are being used 

Data disposition policies Maintain document on independent assessment of software 

Data format control policies Maintain log of all software changes, OS upgrades 

Data retention policies Maintain log of disclosures 

Data sharing during analysis  Maintain password history on user name 

Data sharing system  Make data products public 

De-identification of data. Make original data public 

Encrypt data on ingestion  Mapping metadata across systems.

Table 2b: Sorted task list

Metadata export Encrypt data transfers

Naming attributes  Execution threads

Notification policies Federation ‐ control data copies 

Parse event trail for all accessed systems  Federation ‐ manage remote data grid interactions 

Parse event trail for all persons accessing collection  Federation ‐ periodically copy data 

Parse event trail for all unsuccessful attempts to access data Federation - manage data retrieval

Parse event trail for changes to policies  Generate checksum on ingestion 

Parse event trail for inactivity Generate report by collection of corrections to data sets or access controls 

Parse event trail for updates to rule bases  Generate report for cost (time) required to audit events 

Parse event trail to correlate data accesses with client actions Generate report of types of protected assets present within a collection

Physical path name Generate report of all security and corruption events 

Provide test environment to verify policies on new systems Generate report of the policies that are applied to the collections

Provide test system for evaluating a recovery procedure  Instrument Type 

Provide training courses for users  IPR 

Quality control  List all storage systems being used 

Re‐distribution  List persons who can access a collection 

Re‐use  List staff by position and required training courses 

Rename data grid List versions of technology that are being used 

Replicate files Maintain document on independent assessment of software 


Replicate iCAT periodically  Maintain log of all software changes, OS upgrades 

Restricted searching policies Maintain log of disclosures 

Select replication resource Maintain password history on user name 

Select storage Make data products public 

Server Permission checks Make original data public 

Set access approval flag  Mapping metadata across systems.

Persistent state information for nine types of objects was used:

    Collections
    Data
    Metadata
    Quotas
    Resources
    Tickets
    Tokens
    Users
    Zones

A total of 50 persistent state information variables were accessed.

Table 3. Persistent State Information Variables Used in Policies

COLL_ACCESS_COLL_ID DATA_SIZE RESC_LOC

COLL_ACCESS_TYPE DATA_TYPE_NAME RESC_NAME

COLL_ACCESS_USER_ID META_COLL_ATTR_NAME TICKET_DATA_COLL_NAME

COLL_ID META_COLL_ATTR_VALUE TICKET_EXPIRY

COLL_NAME META_DATA_ATTR_ID TICKET_ID

DATA_ACCESS_DATA_ID META_DATA_ATTR_NAME TOKEN_ID

DATA_ACCESS_TYPE META_DATA_ATTR_UNITS TOKEN_NAME

DATA_ACCESS_USER_ID META_DATA_ATTR_VALUE TOKEN_NAMESPACE

DATA_CHECKSUM META_RESC_ATTR_NAME USER_GROUP_ID

DATA_CREATE_TIME META_RESC_ATTR_VALUE USER_ID

DATA_EXPIRY META_USER_ATTR_NAME USER_INFO

DATA_ID META_USER_ATTR_VALUE USER_NAME

DATA_MODIFY_TIME QUOTA_OVER USER_TYPE

DATA_NAME QUOTA_USAGE USER_ZONE

DATA_PATH QUOTA_USAGE_USER_ID ZONE_CONNECTION

DATA_REPL_NUM QUOTA_USER_ID ZONE_NAME

DATA_RESC_NAME RESC_ID   

Only five session variables were used to track attributes about clients:

    $objPath
    $otherUserName
    $rodsZoneClient
    $rodsZoneProxy
    $userNameClient

A total of 123 operations were applied in automating the tasks. Almost a fifth of the operations were related to initializing default environment variables such as number of parallel I/O streams, number of processing threads, default storage resource, default replication resource, operations permitted by public users, etc.

Table 4a. Operations Needed to Automate Tasks

. ‐ dot operator  msiCurlGetStr

break msiCurlUrlEncodeString

cons msiDataObjChksum

delay msiDataObjClose

elem msiDataObjCopy

errorcode msiDataObjCreate 

errormsg msiDataObjGet

execCmdArg msiDataObjLseek

fail msiDataObjOpen

failmsg msiDataObjPut

for msiDataObjRead

foreach msiDataObjRename 

if msiDataObjRepl

irods_curl‐get msiDataObjTrim

list msiDataObjUnlink

msiAclPolicy msiDataObjWrite

msiAddKeyVal msiDeleteCollByAdmin

msiAddUserToGroup msiDeleteDisallowed

msiAdmInsertRulesFromStructIntoDB msiDeleteUser

msiAdmReadRulesFromFileIntoStruct msiEncrypt

msiAdmRetrieveRulesFromDBIntoStruct msiExecCmd

msiAdmShowIRB msiExecGenQuery

msiAdmWriteRulesFromStructIntoFile msiExecStrCondQuery

msiAssociateKeyValuePairsToObj msiExtractTemplateMDFromBuf

msiChksumRuleSet msiFreeBuffer

msiCollCreate  msiGetContInxFromGenQueryOut

msiCollRsync msiGetFormattedSystemTime

msiCommit msiGetIcatTime

msiCreateUserAccountsFromDataObj msiGetMoreRows

msiCreateCollByAdmin msiGetObjType

msiCreateUser msiGetStderrInExecCmdOut

Table 4b. Operations Needed to Automate Tasks

msiGetStdoutInExecCmdOut         msiSetDefaultResc
msiGetSystemTime                 msiSetGraftPathScheme
msiGetValByKey                   msiSetNumThreads
msiLoadMetadataFromDataObj       msiSetPublicUserOpr
msiLoadMetadataFromXml           msiSetRescQuotaPolicy
msiLoadUserModsFromDataObj       msiSetReServerNumProc
msiMakeGenQuery                  msiSleep
msiMakeQuery                     msiSplitPath
msiMvRuleSet                     msiSplitPathByKey
msiNoChkFilePathPerm             msiStoreVersionWithTS
msiOrbClose                      msiString2KeyValPair
msiOrbDecodePkt                  msiStripAVUs
msiOrbOpen                       msiSysChksumDataObj
msiOrbReap                       msiSysMetaModify
msiOrbSelect                     msiSysReplDataObj
msiQuota                         msiTarFileCreate
msiReadMDTemplateIntoTagStruct   msiVacuum
msiReadRuleSet                   msiWriteRodsLog
msiRemoveKeyValuePairsFromObj    remote
msiRenameCollection              select
msiRenameLocalZone               setelem
msiRollback                      split
msiRmRuleSet                     strlen
msiRuleSetExists                 substr
msiSendMail                      succeed
msiSetACL                        time
msiSetAVU                        while
msiSetBulkGetPostProcPolicy      writeKeyValPairs
msiSetBulkPutPostProcPolicy      writeLine
msiSetDataType                   writeString
msiSetDataTypeFromExt

2 Data Sharing Policy Set

The iRODS data grid distribution comes with 11 default policies that implement a data sharing environment. These policies are provided in a rule base, and are invoked automatically at policy-enforcement points within the data grid middleware. Actions initiated by clients are trapped at the policy-enforcement points, the rule base is accessed to determine the appropriate policy to apply, and an associated procedure is executed to enforce the policy. The policies invoked at these enforcement points in the standard iRODS release are given a name that corresponds to the policy-enforcement point (typically starting with “ac”). In iRODS version 4.0.3 there are 70 standard policy enforcement points. Additional policy enforcement points can be plugged into the architecture to control new actions. The default rule base is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.1 Manage user creation (Policy 1)

This policy is invoked when a new user is created. The rule creates a home directory and a trash directory for each new user account, and adds the account to the user group “public”. If the account is “anonymous”, the home directory and trash directories are not created. The rule uses session variables to identify the data grid zone name ($rodsZoneProxy) and the account name ($otherUserName). Note that there are two versions of the acCreateUserF1 rules. If the condition for the first rule is not satisfied, the second version of the rule is executed. If a task fails, the micro-service listed after the “:::” separator is executed. Thus interactions with the metadata catalog are “rolled back” if the registration attempt fails. The policy includes invocation of pre-processing and post-processing rules for user creation.

The policy implements a constraint:

Applied at the acCreateUser policy enforcement point
Test on User-name = anonymous

The policy uses session variables:

$otherUserName
$rodsZoneProxy

The operations that are performed are:

msiAddUserToGroup
msiCommit
msiCreateCollByAdmin
msiCreateUser
msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
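The action and recovery pairing described for user creation can be illustrated with a small model. The following is a minimal sketch in Python, not the iRODS rule itself: the `Catalog` class, the `create_user` function, the `fail_at` parameter, and the collection path layout are all hypothetical stand-ins, and the sketch only models the behavior stated above (the recovery listed for a failed task is run, so staged catalog changes are rolled back).

```python
class Catalog:
    """Toy metadata catalog with staged changes (hypothetical, not the iCAT)."""

    def __init__(self):
        self.committed = set()
        self.pending = set()

    def stage(self, entry):
        self.pending.add(entry)

    def commit(self):
        self.committed |= self.pending
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def create_user(catalog, zone, user, fail_at=None):
    """Run (action, recovery) pairs in order; on failure run the recovery."""

    def stage(entry):
        # fail_at lets a caller simulate a failing registration step.
        if entry == fail_at:
            raise RuntimeError(f"cannot register {entry}")
        catalog.stage(entry)

    steps = [(lambda: stage(f"user:{user}"), catalog.rollback)]
    if user != "anonymous":
        # Assumed path layout for the home and trash collections.
        steps.append((lambda: stage(f"/{zone}/home/{user}"), catalog.rollback))
        steps.append((lambda: stage(f"/{zone}/trash/home/{user}"), catalog.rollback))
    steps.append((lambda: stage(f"group:public:{user}"), catalog.rollback))
    steps.append((catalog.commit, catalog.rollback))

    for action, recovery in steps:
        try:
            action()
        except RuntimeError:
            recovery()  # plays the role of the micro-service after ":::"
            return False
    return True
```

In this model a failure at any step leaves the committed catalog unchanged, which is the effect the rollback recovery is meant to achieve.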

2.2 Manage user deletion (Policy 2)

This policy is invoked when a user account is deleted. The rule deletes the home and trash collections associated with a user account. The rule uses session variables to identify the data grid zone name ($rodsZoneProxy) and the account name ($otherUserName). Note that preprocessing policies (acPreProcForDeleteUser) and postprocessing policies (acPostProcForDeleteUser) can also be defined. These might be used to migrate files to an archive, or send e-mail to the user about the disposition of the files.

The policy implements a constraint:

Applied at the acDeleteUser policy enforcement point

The policy uses session variables:

$otherUserName
$rodsZoneProxy

The operations that are performed are:

msiCommit
msiDeleteCollByAdmin
msiDeleteUser
msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.3 Manage renaming of a data grid (Policy 3)

This policy is invoked when an administrative command is executed to rename a data grid. The rule renames all of the collections within the original data grid. The rule uses two input parameters to identify the original zone name (*oldZone) and the new zone name (*newZone). Both the name of the collection representing the zone and the zone name are reset. The string concatenation operator “++” is used to create the home data grid collection from the home data grid name.

The policy implements a constraint:

Applied at the acRenameLocalZone policy enforcement point

The policy uses input variables:

*oldZone
*newZone

The operations that are performed are:

msiCommit
msiRenameCollection
msiRenameLocalZone
msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.4 Set the maximum number of I/O streams (Policy 4)

This policy is invoked when file transport is done from a storage resource. The policy controls the number of I/O streams that are used to move files across a network. The rule supports conditions based on the session variable $rescName so that different policies can be set for different resources. Only one function can be used for this rule:

msiSetNumThreads(sizePerThrInMb, maxNumThr, windowSize)

This sets the number of threads and the TCP window size. The number of threads is based on the input parameter sizePerThrInMb (size per thread in MBytes). The number of threads is computed using:

numThreads = fileSizeInMb / sizePerThrInMb + 1

where sizePerThrInMb is an integer value in MBytes. It also accepts the word "default", which sets sizePerThrInMb to a default value of 32.

maxNumThr - The maximum number of threads to use. It accepts integer values up to 16. It also accepts the word "default", which sets maxNumThr to a default value of 4. A value of 0 means no parallel I/O. This can be helpful to get around firewall issues.

windowSize - The TCP window size in Bytes for the parallel transfer. A value of 0 or "default" means a default size of 1,048,576 Bytes.

The msiSetNumThreads function must be present or no parallel threads will be used for all transfers.

The policy implements a constraint:

Applied at the acSetNumThreads policy enforcement point

The operations that are performed are:

msiSetNumThreads

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
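The thread-count formula for msiSetNumThreads can be worked through with a short calculation. The sketch below is a Python illustration of the arithmetic only, not the server code. The function name `num_threads` is hypothetical, and two points are assumptions rather than statements from the text: the computed value is capped at maxNumThr, and "no parallel I/O" is represented as a single stream.

```python
def num_threads(file_size_mb, size_per_thr_mb="default", max_num_thr="default"):
    """Estimate the number of I/O streams for a transfer of file_size_mb."""
    size_per_thr = 32 if size_per_thr_mb == "default" else int(size_per_thr_mb)
    max_thr = 4 if max_num_thr == "default" else int(max_num_thr)
    if size_per_thr <= 0:
        raise ValueError("sizePerThrInMb must be a positive integer")
    if not 0 <= max_thr <= 16:
        raise ValueError("maxNumThr must be in the range 0-16")
    if max_thr == 0:
        return 1  # assumption: no parallel I/O modeled as one stream
    # numThreads = fileSizeInMb / sizePerThrInMb + 1 (integer division)
    computed = file_size_mb // size_per_thr + 1
    return min(computed, max_thr)  # assumption: cap at maxNumThr
```

For example, with the defaults a 100 MByte file gives 100 / 32 + 1 = 4 threads, while a 10 MByte file gives a single thread.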

2.5 Bypass permission checks for registering a file (Policy 5)

This policy is invoked when files are registered into the data grid. The rule determines whether file path permissions are checked when registering a physical file path using commands such as ireg. The rule also sets the policy for checking the file path when unregistering a data object without deleting the physical file. Normally, a rodsuser account cannot unregister a data object if the physical file is located in a resource vault. The msiNoChkFilePathPerm allows this check to be bypassed. Only one function can be called:

msiNoChkFilePathPerm() - Do not check file path permission when registering a file. WARNING - This function can create a security problem if used.

The policy implements a constraint:

Applied at the acNoChkFilePathPerm policy enforcement point

The operations that are performed are:

msiNoChkFilePathPerm

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.6 Set policy for defining physical path name for a file (Policy 6)

This policy is invoked before a file is stored in a file system. The rule defines the physical path that will be used within the iRODS resource vault. Two functions can be called:

msiSetGraftPathScheme(addUserName, trimDirCnt) - Set the VaultPath scheme to GRAFT_PATH - graft (add) the logical path to the vault path of the resource when generating the physical path for a data object. The first argument (addUserName) specifies whether the userName should be added to the physical path, e.g., $vaultPath/$userName/$logicalPath. "addUserName" can have two values - yes or no. The second argument (trimDirCnt) specifies the number of leading directory elements of the logical path to trim. A value of 0 or 1 is allowable. The default value is 1.

msiSetRandomScheme() - Set the VaultPath scheme to RANDOM, meaning a randomly generated path is appended to the vaultPath when generating the physical path, e.g., $vaultPath/$userName/$randomPath. The advantage with the RANDOM scheme is renaming operations (imv, irm) are much faster because there is no need to rename the corresponding physical path.

The default is the GRAFT_PATH scheme with addUserName == no and trimDirCnt == 1. Note: if trimDirCnt is greater than 1, the home or trash directory name will be taken out.

The policy implements a constraint:

Applied at the acSetVaultPathPolicy policy enforcement point

The operations that are performed are:

msiSetGraftPathScheme

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
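The two vault path schemes in this section can be compared with a small path-building example. The following Python sketch is illustrative only: the function names `graft_path` and `random_path`, the example vault path, and the use of a UUID for the random component are assumptions, not the iRODS implementation.

```python
import posixpath
import uuid


def graft_path(vault_path, logical_path, user_name,
               add_user_name=False, trim_dir_cnt=1):
    """GRAFT_PATH: graft the (trimmed) logical path onto the vault path."""
    parts = [p for p in logical_path.split("/") if p]
    kept = parts[trim_dir_cnt:]  # drop leading directory elements
    prefix = [vault_path] + ([user_name] if add_user_name else [])
    return posixpath.join(*prefix, *kept)


def random_path(vault_path, user_name):
    """RANDOM: append a generated component (a UUID here, as an assumption)."""
    return posixpath.join(vault_path, user_name, uuid.uuid4().hex)
```

With the default settings (addUserName == no, trimDirCnt == 1), `graft_path("/var/lib/irods/Vault", "/tempZone/home/alice/data/f.txt", "alice")` yields `/var/lib/irods/Vault/home/alice/data/f.txt`: the zone name is trimmed and the rest of the logical path is preserved. Under the RANDOM sketch the physical path carries no part of the logical name, which is why a logical rename needs no physical rename.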

2.7 Set number of execution threads used to process rules (Policy 7)

This policy specifies the number of processes to use when running jobs in the irodsReServer. The irodsReServer can multi-task such that one or two long running jobs cannot block the execution of other jobs. One function can be called:

msiSetReServerNumProc(numProc) - numProc can be "default" or a number in the range 0-4. A value of 0 means no forking. The value of numProc will be set to 1 if "default" is input.

The policy implements a constraint:

Applied at the acSetReServerNumProc policy enforcement point

The operations that are performed are:

msiSetReServerNumProc

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.8 Set policy for processing files in bulk (Policy 8)

This rule sets the policy for executing the post processing put rule (acPostProcForPut) for bulk put operations. Since the bulk put option is intended to improve the upload speed, executing the acPostProcForPut for every file will slow down the upload. This rule provides an option to turn the post processing off. Only one function can be called:

msiSetBulkPutPostProcPolicy(flag) - This micro-service sets whether the acPostProcForPut rule will be run on bulk put. Valid values for the flag are:

"on" - enable execution of acPostProcForPut.
"off" - disable execution of acPostProcForPut (default).

The policy implements a constraint:

Applied at the acBulkPutPostProcPolicy policy enforcement point

The operations that are performed are:

msiSetBulkPutPostProcPolicy

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.9 Manage indexing of the system state catalog (Policy 9)

This rule controls the automated indexing of the metadata catalog. In the rule example, the indexing is delayed until a future time specified by the variable *arg1. Valid delay examples for *arg1 are:

"<PLUSET>1s</PLUSET>" - delay execution for one second
"<PLUSET>1m</PLUSET>" - delay execution for one minute
"<PLUSET>1h</PLUSET>" - delay execution for one hour
"<PLUSET>1d</PLUSET>" - delay execution for one day
"<PLUSET>1y</PLUSET>" - delay execution for one year
"<EA>ils.renci.org</EA>" - host address where execution is performed

This policy was provided in iRODS version 3.3, but has been deprecated in iRODS version 4.x. The policy implemented a constraint:

Applied at the acVacuum policy enforcement point

The operations that were performed are:

delay
msiVacuum
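The delay hints listed for *arg1 can be decoded mechanically. The sketch below is a Python illustration of how such a hint string might be interpreted; it is not the parser used by the rule engine. The function names are hypothetical, only the five unit letters shown in the examples are handled, and treating a year as 365 days is an assumption.

```python
import re

# Seconds per unit; the 365-day year is an assumption for illustration.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def pluset_seconds(hint):
    """Return the delay in seconds encoded by a <PLUSET> tag."""
    match = re.search(r"<PLUSET>(\d+)([smhdy])</PLUSET>", hint)
    if match is None:
        raise ValueError(f"no PLUSET tag found in {hint!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]


def execution_host(hint):
    """Return the host named in an <EA> tag, or None if absent."""
    match = re.search(r"<EA>([^<]+)</EA>", hint)
    return match.group(1) if match else None
```

A hint that combines both tags, such as `"<EA>ils.renci.org</EA><PLUSET>1h</PLUSET>"`, decodes in this sketch to a 3600 second delay on host ils.renci.org.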

2.10 Set storage quota policy (Policy 10)

This rule can be used to turn on resource quota enforcement. The maximum storage space for each user can be set using the administrator command, iadmin. Quotas can be set for users and for groups of users, for either the total allowed storage or for the storage on a specific storage system. Only one function can be called:

msiSetRescQuotaPolicy(flag) - This micro-service sets whether the Resource Quota should be enforced. Valid values for the flag are:

"on" - enable Resource Quota enforcement.
"off" - disable Resource Quota enforcement (default).

The policy implements a constraint:

Applied at the acRescQuotaPolicy policy enforcement point

The operations that are performed are:

msiSetRescQuotaPolicy

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.11 Manage selection of storage resource (Policy 11)

This policy is invoked when creating a data object. The rule defines how resources are selected for storing files. This is a preprocessing rule that is executed before the object is created. It can be used to set the resource selection scheme when processing the put, copy and replicate operations. Currently, three preprocessing functions can be used by this rule:

msiSetNoDirectRescInp(rescList) - sets a list of resources that cannot be used by a normal user directly. More than one resource can be input using the character "%" as separator, e.g., resc1%resc2%resc3. This function is optional, but if used, should be the first function to execute because it screens the resource input.

msiSetDefaultResc(defaultRescList, optionStr) - sets the default resource. This function is no longer mandatory, but if it is used, it should be executed right after the screening function msiSetNoDirectRescInp.

defaultResc - the resource to use if no resource is input. A "null" means there is no defaultResc. More than one resource can be input using the character "%" as separator.

optionStr - Value can be "forced", "preferred" or "null". A "forced" input means the defaultResc will be used regardless of the user input. The forced action only applies to users with normal privilege, “rodsuser”.

msiSetRescSortScheme(sortScheme) - set the scheme for selecting the best resource to use when creating a data object.

sortScheme - The sorting scheme. Valid schemes are "default", "random", "byLoad" and "byRescClass". The "byRescClass" scheme will put the cache class of resource on the top of the list. The "byLoad" scheme will put the least loaded resource on the top of the list. In order to work properly, the Resource Monitoring system must be switched on in order to pick up the load information for each server in the resource group list. The schemes "random" and "byRescClass" can be applied in sequence, e.g.,

msiSetRescSortScheme(random)
msiSetRescSortScheme(byRescClass)

will select randomly a cache class resource and put it on the top of the list.

The policy implements a constraint:

Applied at the acSetRescSchemeForCreate policy enforcement point

The operations that are performed are:

msiSetDefaultResc

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
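The "%"-separated resource lists and the sequential use of the "random" and "byRescClass" schemes can be modeled in a few lines. This Python sketch is an illustration under stated assumptions, not the server logic: the function names, the `(name, class)` tuple representation, and the class label "cache" as a plain string are hypothetical, and the "byLoad" scheme is omitted because it depends on monitoring data.

```python
import random


def parse_resc_list(resc_list):
    """Split a '%'-separated resource list; 'null' means no resources."""
    if resc_list == "null":
        return []
    return [name for name in resc_list.split("%") if name]


def sort_resources(resources, schemes, rng=None):
    """Apply sort schemes in sequence to a list of (name, resc_class)."""
    rng = rng or random.Random()
    ordered = list(resources)
    for scheme in schemes:
        if scheme == "random":
            rng.shuffle(ordered)
        elif scheme == "byRescClass":
            # Stable sort: cache-class resources move to the top and keep
            # the relative order produced by any earlier scheme.
            ordered.sort(key=lambda resc: resc[1] != "cache")
        elif scheme != "default":
            raise ValueError(f"unsupported scheme in this sketch: {scheme}")
    return ordered
```

Because the class sort is stable, applying "random" first and "byRescClass" second leaves a randomly chosen cache-class resource at the top of the list, which matches the behavior described for the two calls in sequence.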

3 Data Management Policy Set (SILS LifeTime Library)

The LifeTime Library uses five additional policies to control creation of personal digital libraries for students. One of these policies modifies the option for selecting the default storage resource. A second policy turns on quota enforcement. Thus only three policies represent new rules. The policies are:

3.1 Turn on storage quota enforcement (Policy 10)

This rule implements restrictions on the total amount of storage space that can be used by a student. When the quota is exceeded, a student will be able to read files, but will not be able to write new files. The quota values are set by running the iadmin command.

iadmin suq UserName ResourceName   - to set a quota on a storage resource
iadmin suq UserName total          - to set a total storage quota

The policy implements a constraint:

Applied at the acRescQuotaPolicy policy enforcement point

The operations that are performed are:

msiSetRescQuotaPolicy

The rule is available at https://github.com/DICE-UNC/policyworkbook/blob/master/acRescQuotaPolicy.re

3.1.1 Check for missing quotas

This policy identifies all accounts (user names) for which a quota has not been set.

The policy uses persistent state information:

USER_ID
USER_NAME
QUOTA_USER_ID

The operations that are performed are:

foreach
if
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-missing-quota.r

3.1.2 Calculate total storage usage

This policy calculates the total amount of storage used by each person and identifies the person who has stored the most data.

The policy uses persistent state information:

USER_ID
USER_NAME
QUOTA_USAGE
QUOTA_USAGE_USER_ID

The operations that are performed are:

foreach
if
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-storageReport.r

3.1.3 Identify persons who exceeded their quota

This rule identifies the individuals who have exceeded their quota and lists the top 10 users of storage. This uses two policy functions:

createList
addToList

The policy uses persistent state information:

USER_ID
USER_NAME
USER_ZONE
QUOTA_OVER
QUOTA_USER_ID
QUOTA_USAGE
QUOTA_USAGE_USER_ID

The operations that are performed are:

break
select
foreach
if
writeLine
strlen
elem

The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-checkQuota.r
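The three quota reports in sections 3.1.1 to 3.1.3 reduce to simple set and sort operations over the catalog attributes. The Python sketch below shows that logic on in-memory dictionaries; it does not query a catalog. The function names and the data layout are hypothetical, and the sketch assumes that a positive QUOTA_OVER value means the quota has been exceeded.

```python
def missing_quotas(users, quota_user_ids):
    """Names of users (USER_ID -> USER_NAME) with no QUOTA_USER_ID entry."""
    return sorted(name for uid, name in users.items()
                  if uid not in quota_user_ids)


def over_quota(users, quota_over):
    """Names of users whose QUOTA_OVER value is positive (assumed meaning)."""
    return sorted(users[uid] for uid, over in quota_over.items() if over > 0)


def top_users(users, usage, limit=10):
    """The `limit` largest storage users as (USER_NAME, QUOTA_USAGE) pairs."""
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    return [(users[uid], used) for uid, used in ranked[:limit]]
```

The first entry returned by `top_users` corresponds to the person who has stored the most data, which is the value reported by the storage usage policy.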

3.1.4 Periodically update quota check

The storage usage is updated when the msiQuota micro-service is run. The usage can also be updated by running the administrative command:

iadmin cu

This rule updates the usage every day. The policy uses no persistent state information.

The operations that are performed are:

delay
msiQuota
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-missing-quota.r

3.2 Manage selection of storage resource (Policy 11)

This rule changes the name of the default storage system that is used for storing files within the LifeTime Library.

The policy implements a constraint:

Applied at the acSetRescSchemeForCreate policy enforcement point

The operations that are performed are:

msiSetDefaultResc

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetRescSchemeForCreate.re

3.3 Manage selection of storage resource for replication (Policy 12)

This rule changes the default storage system name for replication of files within the LifeTime Library.

The policy implements a constraint:

Applied at the acSetRescSchemeForRepl policy enforcement point

The operations that are performed are:

msiSetDefaultResc

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetRescSchemeForRepl.re

3.4 Enforce replication of each new file (Policy 13)

This rule implements an integrity requirement, ensuring that each file added to the LifeTime Library is replicated to a second storage system. The replication is queued for execution to minimize wait time on the original put action. Currently, three post processing functions can be used individually or in sequence in the acPostProcForPut rule:

msiSysChksumDataObj - create a checksum on the file and store the checksum in the metadata catalog under the persistent state variable name “DATA_CHECKSUM”.

msiExtractNaraMetadata - extract and register metadata from the just uploaded NARA files.

msiSysReplDataObj(replResc, flag) - can be used to replicate the just uploaded or copied data object to the specified replica resource (replResc). Valid values for the "flag" input are "all", "updateRepl" and "rbudpTransfer". More than one flag value can be set using the "%" character as separator, e.g., "all%updateRepl". "updateRepl" means update an existing stale copy to the latest copy. The "all" flag means replicate to all resources in a resource group or update all stale copies if the "updateRepl" flag is also set. "rbudpTransfer" means the RBUDP protocol will be used for the transfer. A "null" input means a single replica will be made in one of the resources in the resource group. It may be desirable to do replication only if the dataObject is stored in a resource group.

The policy implements a constraint:

Applied at the acPostProcForPut policy enforcement point
Checks for specific object path, like "/lifelibZone/home/*"

The session variables are:

$objPath

The operations that are performed are:

delay
msiSysReplDataObj

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-ReplSILS.re
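The combination of a path condition, a "%"-separated flag string, and queued execution can be sketched as follows. This Python model is illustrative and makes several assumptions: the class `ReplicationQueue`, its method names, and the `replicate` callback are hypothetical, and shell-style wildcard matching stands in for the rule condition on $objPath.

```python
from collections import deque
from fnmatch import fnmatchcase

VALID_FLAGS = {"all", "updateRepl", "rbudpTransfer"}


def parse_flags(flag):
    """Split a '%'-separated flag string; 'null' means no flags."""
    if flag == "null":
        return set()
    flags = set(flag.split("%"))
    unknown = flags - VALID_FLAGS
    if unknown:
        raise ValueError(f"unknown flag values: {sorted(unknown)}")
    return flags


class ReplicationQueue:
    """Queue replication requests for uploads that match a path pattern."""

    def __init__(self, pattern, repl_resc, flag="null"):
        self.pattern = pattern
        self.repl_resc = repl_resc
        self.flags = parse_flags(flag)
        self.pending = deque()

    def post_proc_for_put(self, obj_path):
        """Called after a put; returns True if replication was queued."""
        if fnmatchcase(obj_path, self.pattern):
            self.pending.append(obj_path)
            return True
        return False

    def run_pending(self, replicate):
        """Run queued work later, so the put itself is not delayed."""
        done = 0
        while self.pending:
            replicate(self.pending.popleft(), self.repl_resc, self.flags)
            done += 1
        return done
```

Separating `post_proc_for_put` from `run_pending` mirrors the design choice in the policy: the put returns immediately and the replication happens afterward.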

3.5 Manage access control policy (Policy 14)

This rule keeps users from seeing the names of other users' files, and is needed to ensure that each student collection is private to that student. The rule sets the Access Control List policy. If the rule is not called or called with an argument other than STRICT, the STANDARD setting is in effect, which is fine for many sites. By default, users are allowed to see certain metadata, for example the data-object and sub-collection names in each other's collections. When access controls are made STRICT by calling msiAclPolicy(STRICT), the General Query Access Control is applied on collections and data object metadata, which means that the list command, ils, will need 'read' access or better to the collection to return the collection contents (name of data-objects, sub-collections, etc.). The default is the normal, non-strict level, allowing users to see names of other collections. In all cases, access control to the data-objects is enforced. Even if a person can see file names in a collection, “read” access is required on a file to be able to read the file. Even with STRICT access control, however, the admin user is not restricted so various microservices and queries will still be able to evaluate system-wide information. The session variable, “$userNameClient” can be used to limit actions to individual users. However, this is only secure in an irods-password environment (not GSI), but you can then have rules for specific users:

acAclPolicy { ON($userNameClient == "quickshare") { } }
acAclPolicy { msiAclPolicy("STRICT"); }

which was requested by ARCS (Sean Fleming). See rsGenQuery.c for more information on $userNameClient. The typical use is to just set it strict or not for all users.

The policy implements a constraint:

Applied at the acAclPolicy policy enforcement point

The operations that are performed are:

msiAclPolicy

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re
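The effect of the two acAclPolicy rules and of the STRICT setting on listing can be summarized in a small decision function. The Python sketch below is a simplified model of the behavior described in this section, not the catalog query logic; the function names are hypothetical.

```python
def acl_policy_for(user_name_client):
    """Mimic the two rules: the first rule whose condition matches applies."""
    if user_name_client == "quickshare":
        return "STANDARD"  # empty rule body: msiAclPolicy is never called
    return "STRICT"


def can_list_collection(policy, has_read_access, is_admin=False):
    """Whether a listing of a collection returns the names it contains."""
    if is_admin or policy != "STRICT":
        return True  # admin users and the non-strict default see names
    return has_read_access
```

In this model only the visibility of names changes between the two settings. Reading the contents of a file still requires "read" access on that file in both cases.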

4 Data Administration Policy Set (RDA Practical Policy working group)

The Research Data Alliance Practical Policy working group conducted a survey of 41 sites that were managing data collections. A set of 11 policy categories that were applied across most of the sites was identified. The policies include automation of administrative functions, enforcement of management decisions, and validation of assessment criteria. The policies are listed in Table 1 and have minimal overlap with the policy sets for data sharing and student digital libraries, except for policies to manage access controls. For each policy category, multiple policies may be defined.

4.1 Data access control policies (Policy 14)

Automated application of access restrictions based on metadata simplifies administration of a data grid. Every repository needs to be able to easily restrict various data sets to specific audiences (e.g., campus members are granted read access due to licensing, while write access is granted to creators of a collection). This information is stored as system metadata and is checked on all accesses. Access controls require the ability to assign a unique identifier to each person, validate the identity of each user, and then authorize each operation. Within the iRODS data grid, unique identifiers are assigned to users and files. The identifiers are used to associate access controls with a user name.

4.1.1 Find the User_ID associated with a User_name:

Since identifiers for users may be set as either strings (USER_NAME) or integers (USER_ID), a policy that allows a person to find the USER_ID for their USER_NAME is useful. This policy queries a metadata catalog, and retrieves the USER_ID for the person who is running the rule. The policy can be applied interactively to files within a collection, or can be automated as part of a file ingestion process. For the interactive version of the policy, the output is written to the screen.

The policy uses persistent state information:

USER_ID
USER_NAME

The operations that are performed are:

foreach
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-userID.r

4.1.2 Find the File_ID associated with a file name:

Since identifiers for files may also be set as either strings (DATA_NAME) or integers (DATA_ID), a policy that finds the DATA_ID for a file is useful. This policy queries a metadata catalog, and retrieves the DATA_ID for a specified file name that is input to the rule. The result is written to the screen. The rule uses the policy functions:

checkCollInput
checkFileInput

The input variables are:

*File                    a file name
*RelativeCollectionName  a relative collection name

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_ID
DATA_NAME

The operations that are performed are:

fail
foreach
if
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-fileID.r

4.1.3 Set write access control for a user:

A person can set an access control on a file that they own by specifying the file name, the desired access control, and the user name that will be given access. This policy reads as input the user name, the collection and file on which the access control is set, and the desired access control. The metadata catalog is updated to record the change in access control. This is similar to the ichmod command. This rule uses the policy functions:

checkCollInput
checkFileInput
checkPathInput
checkUserInput
findZoneHostName

The input variables are:

*Acl                 an access permission
*File                a file name
*RelativeCollection  a relative collection name
*User                a user name

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses the persistent state information:

COLL_ID
COLL_NAME
DATA_ID
DATA_NAME
USER_ID
USER_NAME
USER_ZONE
ZONE_CONNECTION
ZONE_NAME

The operations that are performed are:

fail
foreach
if
msiSetACL
msiSplitPath
msiSplitPathByKey
remote
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-setACL.r

4.1.4 Set operations that are allowable for the user "public"

This policy controls the operations that “public” users are allowed to execute. Only two operations are allowed: "read" (read files) and "query" (browse some system level metadata). Both operations can be specified by using the separator “%”. The rule uses the micro-service “msiSetPublicUserOpr” to specify what types of public access operations are allowed. The micro-service is called from a policy enforcement point associated with setting PublicUserPolicy.

The policy implements a constraint:

Applied at the acSetPublicUserPolicy policy enforcement point

The operations that are performed are:

msiSetPublicUserOpr

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetPublicUserPolicy.re

4.1.5 Check the access controls on a file:

This policy checks each file in a collection for whether a specific user has access. This rule has input parameters for the names of a collection and user for which access controls will be checked. The desired access permission is compared with the access permissions set on the file. If the access control is not found, an error message is written. In practice, access control checks on files are enforced automatically by the iRODS framework. This rule uses policy functions:

checkCollInput
checkUserInput
findZoneHostName

The input variables are:

*Coll  a relative collection name
*User  a user name

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_ACCESS_DATA_ID
DATA_ACCESS_TYPE
DATA_ACCESS_USER_ID
DATA_ID
DATA_NAME
TOKEN_ID
TOKEN_NAME
TOKEN_NAMESPACE
USER_ID
USER_NAME
USER_ZONE
ZONE_CONNECTION
ZONE_NAME

The operations that are performed are:

fail
foreach
if
msiSplitPathByKey
remote
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-acl.r

4.2 Data format control policies (Policy 15)

Formats such as SPSS, SAS, and Stata will not be around forever so we need to move data out of such formats into open and more durable formats. Policies are needed to identify the data formats that are present in a collection, and transform obsolete data formats.

4.2.1 Set format conversion flag

A policy is needed to specify when format conversion is required. This policy sets a conversion flag when the data type is a specified format. The data type is normally defined for a file when it is loaded into the data grid. See the command

iput -D "datatype" file-name

The rule uses the policy function:

checkCollInput

The input variables are:

*Collrel  a relative collection name
*Type     a data type

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_NAME
DATA_TYPE_NAME

The operations that are performed are:

fail
foreach
if
msiAddKeyVal
msiAssociateKeyValuePairsToObj
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-setconv.r

4.2.2 Invoke format conversion

This policy invokes the NCSA Polyglot service to transform a data format. This external service is invoked by sending http requests to a server at Drexel University. Note that the file that is being converted will also be moved to Drexel, with the converted file returned over the network. The rule uses the policy functions:

addAVUMetadata
deleteAVUMetadata

The rule has a constraint:

*Aname must equal “ConvertMe”

The input variables are:

*Aname     - flag with value "ConvertMe"
*ItemName  - path of the file being converted

Output from the conversion program is:

*out  - name of the converted file

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses no persistent state information.

The operations that are performed are:

if
irods_curl-get
msiRemoveKeyValuePairsFromObj
msiSetAVU
msiString2KeyValPair

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-convertfile.r
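The life cycle of the conversion flag across sections 4.2.1 and 4.2.2 (set the flag on files of a given data type, convert flagged files, then remove the flag) can be sketched in Python. This is a minimal model on in-memory dictionaries: the function names, the data layout, and the `convert` callback that stands in for the external conversion service are all hypothetical.

```python
FLAG = "ConvertMe"  # flag value named in the text


def set_conversion_flags(data_types, metadata, wanted_type):
    """Flag every file whose data type equals wanted_type."""
    flagged = []
    for name, data_type in sorted(data_types.items()):
        if data_type == wanted_type:
            metadata.setdefault(name, set()).add(FLAG)
            flagged.append(name)
    return flagged


def convert_flagged(metadata, convert):
    """Convert each flagged file, then clear its flag."""
    outputs = {}
    for name in sorted(metadata):
        if FLAG in metadata[name]:
            outputs[name] = convert(name)   # stand-in for the remote call
            metadata[name].discard(FLAG)    # avoid converting twice
    return outputs
```

Clearing the flag after a conversion means a second pass over the same collection finds nothing left to convert.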

4.2.3 Identify and archive specific file formats from a staging area

File format type is stored in a state information variable called DATA_TYPE_NAME. Queries can be issued against the metadata catalog to retrieve files with a given format type. Operations are also supported for extracting the file format type of a file, based on the file extension. This policy examines a staging area for files with a specific format type. The file format is determined from the file extension. Files that have a desired extension, in this case an extension “.r”, are moved into a specified collection. This makes it possible to sort files by file format type. The collection that corresponds to the staging area and the collection that corresponds to the destination archive are read from input. Note that when a file is moved, the access controls must be reset. This rule uses the policy functions:

checkCollInput
createLogFile
isColl

The input variables are:

*Coll   a relative collection name
*Res    a storage resource
*Stage  a relative collection name

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_NAME

The operations that are performed are:

delay
fail
foreach
if
msiCollCreate
msiDataObjCreate
msiDataObjRename
msiGetSystemTime
msiSetACL
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-stageformat.r

4.3 Notification Policies (Policy 16)

Events that occur within the data management system can be logged in an audit trail. The audit trail can be parsed to analyze what has happened. Events can also be monitored, with appropriate E-mail sent to an administrator. Events can also be tracked through notifications that are sent to an indexing server each time a specified action occurs.

Automated creation of event metadata is needed as data sets and data collections are being processed. Currently this is being done manually for most collections at great cost and effort.

4.3.1 Notify on collection deletion

Notification policies are implemented at Policy Enforcement Points, either before an action occurs or after the action is completed. A rule can be created that specifies the type of notification that will be used. This policy sends E-mail to an administrator on deletion of a collection. A session variable, $collName, is used to identify which collection is being deleted.

The policy implements a constraint:
    Applied at the acPreprocForRmColl policy enforcement point

The operations that are performed are:
    msiSendMail

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPreProcForRmColl.re

4.3.2 Notification of events

Events can be detected at all policy enforcement points through use of a C++ version of the pluggable rule engine. The C++ version is fast enough to track all operations performed within the data management system. The detected events are documented in messages that are sent to a message queue for processing by an external indexing system. This capability will be available in version 4.2 of iRODS. Policies can then be associated with each micro-service plugin to automate event detection and auditing.

One application is the correlation of each change to the persistent state information with the event that caused the change. This requires mapping from client actions, to the policy enforcement points that are invoked, to the policies that are then enforced, to the micro-services that are executed, to the persistent state information attributes that are modified or changed. An example of how this can be done by hand is given in Chapter 8. A similar approach can be used to audit all actions performed upon the data management system. Computer actionable policies for monitoring events are listed in Chapter 5.6.

The “rule_exists” function tells the rule engine plugin system which rules this plugin listens to. In this case it listens to any rule under the "audit_" namespace. The “exec_rule” function actually handles the auditing. It logs name, arguments, and the condInputData field of the REI in-memory structure of an operation, etc. to the server log. The full code will be available on GitHub with the 4.2 release.


4.4 Use agreement policies (Policy 17)

The creation of a use agreement requires an interaction with each user, independently of the data grid. The resulting information can be captured as metadata that is associated with each file in a collection. It is then possible to track whether a use agreement has been received, and write policies that restrict access when files have no official use agreement.

4.4.1 Set receipt of signed use agreement

A metadata attribute can be defined for each user to designate receipt of a signed use agreement. This is an example of a user-defined metadata attribute that can be associated with each user name. The policy sets the use agreement for a specified user. This policy uses the metadata attribute “Use_Agreement” to store a value of “RECEIVED” when a use agreement is confirmed.

The rule uses the policy functions:
    checkUserInput
    findZoneHostName

The input variables are:
    *User    a user name

The session variables are:
    $rodsZoneClient

The policy uses the persistent state information:
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiAddKeyVal
    msiAssociateKeyValuePairsToObj
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-useSet.r


4.4.2 Identify users without signed use agreement

This policy queries all user names to find users who either do not have a “Use_Agreement” metadata attribute name, or have a value that is not “RECEIVED”. If either case is found, a message is written to the screen.

There are no input variables.

There are no session variables.

The policy uses persistent state information:
    META_USER_ATTR_NAME
    META_USER_ATTR_VALUE
    USER_NAME

The operations that are performed are:
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-useVerify.r
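The selection logic of this policy can be illustrated outside the data grid. The following is a minimal sketch in Python, not the iRODS rule itself; the function name and the tuple layout standing in for the catalog query results are assumptions made for illustration.

```python
def users_without_agreement(user_names, user_avus):
    """Return users that lack a Use_Agreement attribute with value RECEIVED.

    user_avus is a hypothetical stand-in for the catalog query results:
    a list of (user_name, attribute_name, attribute_value) tuples.
    """
    confirmed = {
        user
        for user, attr, value in user_avus
        if attr == "Use_Agreement" and value == "RECEIVED"
    }
    # A user is reported when the attribute is missing or has another value.
    return [user for user in user_names if user not in confirmed]


users = ["alice", "bob", "carol"]
avus = [
    ("alice", "Use_Agreement", "RECEIVED"),
    ("bob", "Use_Agreement", "PENDING"),
]
for name in users_without_agreement(users, avus):
    print(f"No signed use agreement on file for {name}")
```

Both cases named in the policy (attribute absent, or present with a value other than “RECEIVED”) fall out of the same set difference.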

4.5 Integrity policy (Policy 18)

Policies are typically created to verify the integrity of files by comparing the current checksum with a saved value of the checksum. However, integrity policies can also be created to verify access controls on a collection, verify the presence of required metadata, verify file distribution, etc.

4.5.1 Verify access controls on files

This rule analyses the files in a collection to verify that a required access control is present on each file. The input includes the name of the collection that will be verified, the type of access control that is required, and the name of a person for which the access control is set. The rule verifies the collection name, retrieves a USER_ID for the named person, and retrieves a DATA_ACCESS_DATA_ID number for the type of access control. A loop is made over the files in the collection, with a sub-loop that verifies the access control on each file. The results are printed to the screen.

The rule uses the policy functions:
    checkCollInput
    checkUserInput
    findZoneHostName

The input variables are:
    *Acl     an access control
    *Coll    a relative collection name
    *User    a user name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-integrityACL.r

4.5.2 Check integrity and number of replicas of files in a collection

This policy implements 17 basic operations needed for a production quality rule for verifying the integrity of a collection. The basic operations include:

1. Verifying all input parameters for consistency
2. Retrieving state information from the metadata catalog on each execution
3. Verifying integrity of each file by comparing the saved checksum with the computed checksum
4. Updating all replicas to the most recent version
5. Minimizing the load on the production services through a deadline scheduler
6. Differentiating between the logical name for the file and the physical location of the replicas
7. Identifying missing replicas and documenting their absence
8. Creating new replicas to replace missing files
9. Implementing load leveling to distribute files across available storage systems
10. Creating a log file to record all repair operations and storing the log file in the data grid
11. Tracking progress of the policy execution
12. Initializing the rule for the first execution, including setting variables to track progress
13. Enabling restart from the last checked file
14. Manipulating files in batches of 256 files at a time to handle arbitrarily large collections
15. Minimizing the number of sleep periods required by the deadline scheduler
16. Checking new files that have been added on a restart
17. Generating statistics about the execution rate and properties of the files that were checked

Implementing all 17 operations increases the size of the production policy substantially. However, it is possible to show that the average time spent per file is still less than a disk rotation period, implying that the production rule is suitable for verifying integrity across arbitrarily large collections.

The policy to periodically check integrity uses the policy functions:
    addAVUMetadataToColl
    checkCollInput
    checkMetaExistsColl
    checkRescInput
    createLogFile
    createReplicas
    findZoneHostName
    getNumSizeColl
    getRescColl
    isColl
    selectRescUpdate
    updateCollMeta
    verifyReplicaChksum

The input variables are:
    *Coll          a collection path name
    *Delt          a length of time to run in seconds
    *NumReplicas   number of replicas
    *Res           a storage resource

The session variables are:
    $rodsZoneClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_ID
    DATA_NAME
    DATA_REPL_NUM
    DATA_RESC_NAME
    DATA_SIZE
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    break
    cons
    delay
    elem
    fail
    for
    foreach
    if
    list
    msiAssociateKeyValuePairsToObj
    msiCollCreate
    msiDataObjChksum
    msiDataObjCreate
    msiDataObjRepl
    msiGetSystemTime
    msiRemoveKeyValuePairsFromObj
    msiSetAVU
    msiSleep
    msiSplitPathByKey
    msiString2KeyValPair
    remote
    select
    setelem
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-integrityACL.r
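Three of the basic operations for this policy (comparing saved and computed checksums, processing files in batches of 256, and restarting from the last checked file) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the production rule: the tuple layout, the function names, and the choice of SHA-256 as the checksum are all hypothetical.

```python
import hashlib
from itertools import islice

BATCH_SIZE = 256  # batch size named in the policy description


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def find_corrupt(replicas, start_after=None):
    """Return names whose computed checksum differs from the saved one.

    replicas: hypothetical list of (name, content_bytes, saved_digest),
    sorted by name. start_after: last name checked by a previous run,
    so that a restart skips files that were already verified.
    """
    pending = [r for r in replicas if start_after is None or r[0] > start_after]
    corrupt = []
    for batch in batches(pending):
        for name, content, saved_digest in batch:
            if hashlib.sha256(content).hexdigest() != saved_digest:
                corrupt.append(name)
    return corrupt


good = hashlib.sha256(b"data").hexdigest()
replicas = [
    ("a.dat", b"data", good),
    ("b.dat", b"dat4", good),  # content no longer matches the saved digest
    ("c.dat", b"data", good),
]
print(find_corrupt(replicas))
```

Working in fixed-size batches keeps memory use bounded regardless of collection size, which is the point of operation 14 in the list above.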


4.6 Metadata extraction (Policy 19)

A necessary task in building a digital library is the creation of provenance and descriptive metadata. This typically requires interactive creation of the descriptive metadata. For collections that have more than a thousand digital objects, this becomes a laborious task. If the metadata attributes can be aggregated into a standard format, then bulk loading of metadata may be appropriate. Examples include bulk loading from an XML file or a pipe-delimited file.

An alternate approach is “feature-based” indexing, in which the digital object is examined for the presence of desired features. Information about a feature is extracted and registered as metadata on the digital object. An example is pattern-based recognition of descriptive metadata within a text file.

4.6.1 Load metadata from an XML file

Metadata can be loaded into a data grid directly from an XML file. This policy assumes a specific structure for the XML file of the form:

    <?xml version="1.0" encoding="UTF-8"?>
    <metadata>
      <AVU>
        <Target>/$rodsZoneClient/home/$userNameClient/XML/sample.xml</Target>
        <Attribute>OrderID</Attribute>
        <Value>889923</Value>
        <Unit/>
      </AVU>
      <AVU>
        <Target>/$rodsZoneClient/home/$userNameClient/XML/sample.xml</Target>
        <Attribute>OrderPerson</Attribute>
        <Value>John Smith</Value>
        <Unit/>
      </AVU>
    </metadata>

Note that this specifies the target file to which the metadata is added. Each metadata attribute, value, and unit is formed into an AVU that is attached as metadata to the file.

The rule uses the policy function:
    checkPathInput

The input variables are:
    *targetObj   a relative collection name
    *xmlObj      a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiLoadMetadataFromXml
    msiSplitPath
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-loadMetadataFromXml.r
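To make the expected XML layout concrete, the following Python sketch parses a document of the form shown for this policy into (target, attribute, value, unit) tuples. It uses only the standard library and does not call iRODS; the zone name "tempZone" and user name "alice" in the sample are placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <AVU>
    <Target>/tempZone/home/alice/XML/sample.xml</Target>
    <Attribute>OrderID</Attribute>
    <Value>889923</Value>
    <Unit/>
  </AVU>
  <AVU>
    <Target>/tempZone/home/alice/XML/sample.xml</Target>
    <Attribute>OrderPerson</Attribute>
    <Value>John Smith</Value>
    <Unit/>
  </AVU>
</metadata>"""


def parse_avus(xml_bytes):
    """Return one (target, attribute, value, unit) tuple per <AVU> element."""
    root = ET.fromstring(xml_bytes)
    return [
        (
            avu.findtext("Target"),
            avu.findtext("Attribute"),
            avu.findtext("Value"),
            avu.findtext("Unit") or "",  # an empty <Unit/> becomes ""
        )
        for avu in root.findall("AVU")
    ]


for avu in parse_avus(SAMPLE.encode("utf-8")):
    print(avu)
```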

4.6.2 Load metadata from a pipe-delimited file

Metadata can be loaded into a data grid directly from a pipe-delimited file. This policy assumes a specific structure for the pipe-delimited file of the form:

    File-name|attribute-name|attribute-value
    File-name|attribute-name|attribute-value|units
    C-collection-name|attribute-name|attribute-value
    C-collection-name|attribute-name|attribute-value|units

For the specified File-name or collection-name, the pipe-delimited values for the attribute name, the attribute value, and the attribute units or comments can be bulk loaded.

This rule uses the policy function:
    checkPathInput

The input variables are:
    *Coll    a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiLoadMetadataFromDataObj
    msiSplitPath
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metaloadpipe.r
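The four line shapes in the pipe-delimited format can be distinguished with a short parser. The sketch below is Python for illustration only; it reads the "C-" prefix as the marker for a collection target, which is an interpretation of the template above, and the function name and return layout are hypothetical.

```python
def parse_line(line):
    """Split one pipe-delimited line into (kind, name, attribute, value, units)."""
    fields = line.rstrip("\n").split("|")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
    target, attribute, value = fields[:3]
    units = fields[3] if len(fields) == 4 else ""
    # Assumption: a leading "C-" marks a collection; anything else is a file.
    if target.startswith("C-"):
        kind, name = "collection", target[2:]
    else:
        kind, name = "file", target
    return kind, name, attribute, value, units


print(parse_line("/tempZone/home/alice/f.txt|Length|12|cm"))
print(parse_line("C-/tempZone/home/alice|Project|DFC"))
```

Rejecting lines with the wrong number of fields before loading avoids attaching a partial AVU to the wrong target.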

4.6.3 Contextual metadata extraction through pattern recognition

Pattern matching operations can be applied to text to extract contextual metadata. A template for pattern matching can be created that defines triplets:

    <pre-string-regexp, keyword, post-string-regexp>.

The triplets are read into memory, and then used to search a data buffer. For each set of pre and post regular expressions, the string between them is associated with the specified keyword and can be stored as a metadata attribute on the file. In the example, the template file has the format:

    <PRETAG>X-Mailer:</PRETAG>MailerUser<POSTTAG></POSTTAG>
    <PRETAG>Date:</PRETAG>SentDate<POSTTAG></POSTTAG>
    <PRETAG>From:</PRETAG>Sender<POSTTAG></POSTTAG>
    <PRETAG>To:</PRETAG>PrimaryRecipient<POSTTAG></POSTTAG>
    <PRETAG>Cc:</PRETAG>OtherRecipient<POSTTAG></POSTTAG>
    <PRETAG>Subject:</PRETAG>Subject<POSTTAG></POSTTAG>
    <PRETAG>Content-Type:</PRETAG>ContentType<POSTTAG></POSTTAG>

The end tag is actually a "return" for Unix systems, or a "carriage-return/line feed" for Windows systems.

The example rule reads a text file into a buffer in memory, reads in the template file that defines the regular expressions, and then parses the text in the buffer to identify presence of a desired metadata attribute.

The rule uses the policy function:
    checkPathInput

The input variables are:
    *Len        number of bytes
    *Outfile    a relative path for a file
    *Pathfile   a relative path for a file
    *Tag        a relative path for a file

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiDataObjClose
    msiDataObjOpen
    msiDataObjRead
    msiExtractTemplateMDFromBuf
    msiGetObjType
    msiLoadMetadataFromDataObj
    msiReadMDTemplateIntoTagStruct
    msiSplitPath
    select
    writeKeyValPairs
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metaload.r
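The triplet matching described for this policy can be approximated with ordinary regular expressions. The following Python sketch is an illustration, not the behavior of msiExtractTemplateMDFromBuf: the triplets are given directly as tuples rather than read from a template file, the end tag is written as an explicit line-ending pattern, and all names are hypothetical.

```python
import re

# Each triplet is (pre-string regexp, keyword, post-string regexp).
TEMPLATE = [
    (r"Date:", "SentDate", r"\r?\n"),
    (r"From:", "Sender", r"\r?\n"),
    (r"Subject:", "Subject", r"\r?\n"),
]

MESSAGE = (
    "From: alice@example.org\n"
    "Date: Tue, 25 Aug 2015 10:00:00\n"
    "Subject: Policy review\n"
    "\n"
    "Body text\n"
)


def extract(template, text):
    """Return (keyword, value) pairs for every pre/post match in the text."""
    found = []
    for pre, keyword, post in template:
        # Non-greedy capture of whatever lies between the two expressions.
        pattern = re.compile(pre + r"(.*?)" + post, re.DOTALL)
        for match in pattern.finditer(text):
            found.append((keyword, match.group(1).strip()))
    return found


for keyword, value in extract(TEMPLATE, MESSAGE):
    print(f"{keyword} = {value}")
```

The `\r?\n` post expression covers both the Unix and the Windows line endings mentioned above.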

4.6.4 Stripping metadata from a file

It may be necessary to strip metadata from a file before adding the required metadata. The following rule takes as input the path to the file, and removes descriptive metadata.

The rule uses the policy function:
    checkPathInput

The input variables are:
    *Path    a relative path to a file

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSplitPath
    msiStripAVUs
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metastrip.r

4.7 Data backup policies (Policy 20)

Data backup can take multiple forms:

    • Time-stamped copies of digital objects that are saved in a separate collection
    • Replicas of digital objects that can be accessed when the original is unavailable
    • Copies of digital objects that are put into separate collections or data grids

The choice depends upon whether a time history of the evolution of the file is needed or whether recovery is needed when files are corrupted.

4.7.1 Data versioning policy

A version of a file can be created by adding a time stamp, and moving the version to an archive directory. This rule processes files in a collection, creating a version of each file that is stored in a destination directory called “SaveVersions”.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Dest         a relative collection name
    *SourceFile   a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiDataObjCopy
    msiSetACL
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-version.r

The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the time stamp before the extension when the version name is created.

The rule uses the policy function:
    checkPathInput

The input variables are:
    *Fil    a file name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME

The operations that are performed are:
    break
    fail
    foreach
    if
    msiDataObjCopy
    msiGetSystemTime
    msiSetACL
    msiSplitPath
    strlen
    substr
    select
    while
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-versionfile.r
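Inserting a time stamp before the file extension is a small string operation. The Python sketch below illustrates it; the "." separator before the time stamp, the use of a Unix time value, and the function name are assumptions, since the text does not specify the exact version name format used by the rule.

```python
import os
import time


def version_name(file_name, timestamp=None):
    """Insert a time stamp between the file name stem and its extension."""
    if timestamp is None:
        timestamp = int(time.time())
    stem, extension = os.path.splitext(file_name)
    # Files without an extension simply get the time stamp appended.
    return f"{stem}.{timestamp}{extension}"


print(version_name("report.txt", 1440460800))
print(version_name("notes", 1440460800))
```

Only the last extension is treated as the extension, so a name such as "archive.tar.gz" keeps ".gz" at the end.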

4.7.2 Data backup staging policy

Within the iRODS data grid, backups, copies, and replicas can be supported. The difference is the set of state information that is needed for each type of entity. A backup is a time-stamped copy of a file. A replica is an additional copy of a file that is stored on a separate storage system. The replica number is tracked along with whether the original has been changed. Generic state information includes a creation time for the data object, the location where the data object is stored, the owner of the data object, modification time stamps, and access controls. An outcome of this approach is that it is possible to use the same client to access backups, copies, and replicas.

This rule creates a time-stamped backup directory, and copies all of the files from the source directory to the backup directory. The rule reads from input the collection for which the backup will be done, the storage location where the backups will be stored, and the destination collection that will hold the backup. Within the destination collection, a time-stamped sub-directory is created to hold each backup set. The rule checks the input, checks that each operation completes correctly, and writes information to a server log.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Collrel    a relative collection name
    *Destrel    a relative collection name
    *Resource   a storage resource

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME

The operations that are performed are:
    delay
    fail
    foreach
    if
    msiCollCreate
    msiCollRsync
    msiGetSystemTime
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-backup.r


4.7.3 Copy files to a federated staging area

This rule takes all files in a “stage” directory on the first data grid, copies them to an “Archive” directory on the second data grid, and deletes the file from the first data grid. The rule also logs all of the actions and writes the log to a directory in the second data grid.

The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:
    *Coll       a relative collection name
    *DestZone   a zone name
    *Res        a storage resource
    *Stage      a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCopy
    msiDataObjCreate
    msiDataObjUnlink
    msiGetSystemTime
    msiSetACL
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-stage.r

4.8 Data retention policies (Policy 21)

Each file in a collection may have a different retention period, or all files in a collection may have the same retention period. The iRODS data grid specifies a data expiration date in the metadata attribute “DATA_EXPIRY”. The expiration date is stored as a Unix time variable. Information about the creation time of each file is stored in the metadata attribute DATA_CREATE_TIME.

4.8.1 Purge policy to free storage space

This policy manages a cache to ensure that a minimum amount of free space is available for deposition of new files. The policy runs periodically, every 24 hours. An information catalog is queried to find the total amount of storage space that is being used. This is compared to an input parameter that specifies the maximum allowed space. Additional input parameters specify the collection and the storage resource names. A second query retrieves information about the file names, file sizes, and creation time. The result set is ordered by the creation date, making it possible to loop over the files, deleting the oldest files until the required free space is available.

This policy was developed by Jean-Yves Nief of the French National Institute for Nuclear Physics and Particle Physics Computer Center. This rule could be modified to purge old backup directories.

The rule uses the policy functions:
    checkCollInput
    checkRescInput
    findZoneHostName

The input variables are:
    *CacheRescName   a storage resource
    *Collection      a relative collection name
    *MaxSpAlwdTBs    size in terabytes

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CREATE_TIME
    DATA_NAME
    DATA_RESC_NAME
    DATA_SIZE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    break
    delay
    fail
    foreach
    if
    msiDataObjTrim
    msiGetIcatTime
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-purge.r
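The oldest-first selection used by the purge policy can be shown in isolation. This Python sketch assumes the catalog query results are available as (name, size, creation time) tuples and works in bytes rather than terabytes; the function name and data layout are hypothetical, and no files are actually deleted.

```python
def select_for_purge(files, max_bytes):
    """Pick the oldest files to delete until usage is at or below max_bytes.

    files: hypothetical list of (name, size_bytes, create_time) tuples.
    Returns (names_to_delete, bytes_used_after_purge).
    """
    used = sum(size for _, size, _ in files)
    to_delete = []
    # Order by creation time so the oldest files are considered first.
    for name, size, _ in sorted(files, key=lambda f: f[2]):
        if used <= max_bytes:
            break
        to_delete.append(name)
        used -= size
    return to_delete, used


cache = [("new.dat", 40, 300), ("old.dat", 50, 100), ("mid.dat", 30, 200)]
print(select_for_purge(cache, 60))
```

Stopping as soon as usage falls under the limit keeps the newest files in the cache for as long as possible.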

4.8.2 Data expiration policy

This policy checks the date specified by an expiration metadata attribute that has been assigned to the file, and creates a list of all files that have expired. Input parameters are used to specify the collection that is being checked and whether expired files should be found. A query is made to the information catalog to get a list of the DATA_EXPIRY date for each file. This is compared to the current Unix time. Files that have expired are listed and the total number is counted.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Coll    a relative collection name
    *Flag    a metadata flag

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_EXPIRY
    DATA_ID
    DATA_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiGetIcatTime
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-expiry.r
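The comparison of DATA_EXPIRY against the current Unix time can be sketched as follows. This is a Python illustration under assumptions: expiry values are taken to be zero-padded decimal strings, a value of zero is treated as "no expiration set", and the function name and dictionary layout are hypothetical.

```python
import time


def expired_files(expiry_by_file, now=None):
    """Return the sorted names of files whose expiry time has passed.

    expiry_by_file: hypothetical mapping of file name -> expiry value
    (Unix time as a string or integer). Zero is assumed to mean that
    no expiration date has been set.
    """
    if now is None:
        now = int(time.time())
    return sorted(
        name
        for name, expiry in expiry_by_file.items()
        if 0 < int(expiry) <= now
    )


catalog = {
    "a.dat": "01440000000",
    "b.dat": "00000000000",
    "c.dat": "01500000000",
}
expired = expired_files(catalog, now=1450000000)
print(f"{len(expired)} expired file(s): {expired}")
```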

4.9 Disposition policy for expired files (Policy 22)

Files in the iRODS data grid can be tagged with additional metadata attributes. For example, a metadata attribute with the name “Retention_Flag” can be added to each file, along with a metadata attribute value such as “EXPIRED” or “NOT_EXPIRED”. By using metadata to track the status of each file, it is possible to separate the retention policy from the disposition policy. The retention policy can set the metadata attribute, and the disposition policy can read the metadata attribute.

This rule migrates files to an archive that have a metadata attribute with the name “Retention_Flag” that has the value “EXPIRED”. The rule reads as input the name of the collection that will be checked and the name of the destination collection. The collection names are verified. A query is then issued to the information catalog to retrieve the names of the files in the collection that have the “EXPIRED” value for the “Retention_Flag”. All of the returned files in the list are moved to the destination collection. Note that the access controls on the file will need to be reset after the move.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Archiverel   a relative collection name
    *Collrel      a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE

The operations that are performed are:
    fail
    foreach
    if
    msiDataObjRename
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-disposition.r

4.10 Restricted searching policy (Policy 23)

Search policies may be applied to the names of files, or to the descriptive metadata, or to system state information. A data grid administrator may be able to examine all of the metadata and see all file names, but an individual user may only be able to see the content that they own. A new genquery interface is being developed for iRODS version 4.2 which will support access controls on metadata.

4.10.1 Strict access control

The most commonly requested restriction is to limit the ability of users to see any other user’s files. This can be applied to all users, or applied to a specific user. A strict access control is implemented through the Policy Enforcement Point called acAclPolicy. The micro-service msiAclPolicy implements the restriction.

The policy implements a constraint:
    Applied at the acAclPolicy policy enforcement point

The operations that are performed are:
    msiAclPolicy

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re

4.10.2 Controlled queries

A query to an external database can be created and registered as a database object. Clicking on the registered query will cause the query to be executed with the results returned as a file. This makes it possible to control interactions with search engines.

4.11 Storage cost reports (Policy 24)

Reports can be generated that summarize the use of any aspect of the data grid. The most common reports detail usage by user by storage system.

4.11.1 Usage report by user name and storage system

The basic approach is to calculate the amount of storage used on each storage device and then to generate a cost by multiplying usage by the charge per storage for the device type. This can be refined to implement a separate cost per storage device. The cost information can be stored as a metadata attribute that is associated with each storage resource.

This rule sums the amount of storage used for each device by each user. A query is issued to the information catalog that sums the storage for each home directory in the data grid. The result is written to the screen.

There are no input variables.

The session variables are:
    $rodsZoneClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ID
    DATA_RESC_NAME
    DATA_SIZE
    USER_NAME

The operations that are performed are:
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-storage.r

4.11.2 Cost report by user name and storage system

A cost algorithm is implemented by storing a “cost per byte” metadata attribute on each storage resource. The “cost per byte” attribute is stored as the metadata attribute called “Storage_Cost”, with the attribute value equal to the storage cost per byte. A query is issued to the information catalog to get a list of the users. Then for each user, a query is issued to sum the storage for each user for each storage device. The storage cost per byte is retrieved by a query, and the storage cost is calculated.

There are no input variables.

The session variables are:
    $rodsZoneClient

The policy uses persistent state information:
    DATA_RESC_NAME
    DATA_SIZE
    COLL_NAME
    META_RESC_ATTR_NAME
    META_RESC_ATTR_VALUE
    USER_NAME

The operations that are performed are:
    foreach
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-storageCost.r
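The cost calculation amounts to summing bytes per (user, storage resource) pair and multiplying by that resource's “Storage_Cost” value. The Python sketch below illustrates the arithmetic; the tuple layout, the function name, and the use of integer cost units per byte (chosen to avoid rounding in the example) are assumptions.

```python
from collections import defaultdict


def cost_report(files, cost_per_byte):
    """Return {(user, resource): (bytes_used, cost)}.

    files: hypothetical list of (user_name, resource_name, size_bytes).
    cost_per_byte: mapping of resource name -> Storage_Cost value,
    expressed here in arbitrary integer cost units per byte.
    """
    usage = defaultdict(int)
    for user, resource, size in files:
        usage[(user, resource)] += size
    return {
        (user, resource): (total, total * cost_per_byte[resource])
        for (user, resource), total in usage.items()
    }


holdings = [
    ("alice", "disk", 100),
    ("alice", "disk", 50),
    ("alice", "tape", 1000),
    ("bob", "disk", 10),
]
storage_cost = {"disk": 2, "tape": 1}
for (user, resource), (used, cost) in sorted(cost_report(holdings, storage_cost).items()):
    print(f"{user} {resource}: {used} bytes, cost {cost}")
```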


5 Odum Data Preservation Policy set

The preservation policies overlap with the RDA data management policies. Table 1 shows how the policy sets are related.

The Odum data preservation policies typically required integration with additional software systems for implementation. Thus:

    De-identification of data                         Uses Bitcurator
    Applying unique data identifiers                  Uses Handle system
    Data normalization to non-proprietary formats     Uses Polyglot
    Authentication identity management                Uses InCommon
    Creation of PREMIS event data                     Uses message bus
    Assessment criteria validation                    Uses indexing technology
    Mapping metadata across systems                   Uses HIVE
    Automatic checksums                               Uses SHA-128
    Tracking use                                      Uses DataBook

5.1 Automate access restrictions (Policy 14)

One approach is to associate access restrictions with a collection, and then have all files within the collection inherit the access controls. When a file is put into the collection, the required access controls are automatically applied.

5.1.1 Set inheritance of access controls on a collection

Access controls on a file can be inherited from the collection into which the file is organized. This rule reads as input the collection name and then sets an “inherit” flag on the collection. Files that are deposited into the collection will “inherit” the access controls that were set on the collection.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Acl                  an access control
    *RelativeCollection   a relative collection name
    *User                 a user name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-inherit.r

5.1.2 Check whether a specific person has access to a collection

The rule shown in section 4.1.5 checks each file in a collection to determine whether a specified person has access. The type of access control is displayed. The rule finds the person’s USER_ID and the DATA_ID for each file in the collection.

5.1.3 Identify all persons with access to files in a collection

This rule creates a list of all of the persons who have access to any file within a collection. The number of files that can be accessed and the total size of the accessible files are calculated.

The rule uses the policy function:
    contains

There are no input variables.

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_SIZE
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME

The operations that are performed are:
    fail
    foreach
    if
    select
    strlen
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-ACL.r
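The per-person totals produced by this rule (number of accessible files and their combined size) can be sketched as a simple aggregation. The Python below is illustrative only; the row layout standing in for the access control query results, the size lookup table, and the function name are assumptions.

```python
def access_summary(acl_rows, sizes):
    """Return {user_name: (file_count, total_bytes)}.

    acl_rows: hypothetical list of (user_name, data_id) pairs, one per
    access control entry. sizes: mapping of data_id -> size in bytes.
    """
    seen = set()
    summary = {}
    for user, data_id in acl_rows:
        # Count each file once per user, even if several entries exist.
        if (user, data_id) in seen:
            continue
        seen.add((user, data_id))
        count, total = summary.get(user, (0, 0))
        summary[user] = (count + 1, total + sizes[data_id])
    return summary


acl = [("alice", 101), ("alice", 102), ("bob", 101), ("alice", 101)]
file_sizes = {101: 2048, 102: 512}
for user, (count, total) in sorted(access_summary(acl, file_sizes).items()):
    print(f"{user}: {count} file(s), {total} bytes")
```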

5.1.4 Identify files that can be accessed by an account

Once a collection has been analyzed to determine which accounts have access, the list of account names can be examined to determine which account access should be deleted. The following rule lists all of the files that can be accessed by a specified account.

The rule uses the policy functions:
    checkUserInput
    findZoneHostName

The input variables are:
    *Usern    a user name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-ACL-files.r

5.1.5 Delete access to files for a specified account

The following rule sets the access for a specified account to “null” for all files within a collection. Only files that originally had access permissions set for the account are processed.

The rule uses the policy functions:
    checkUserInput
    findZoneHostName

The input variables are:
    *Usern    a user name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-delete-access.r

5.1.6 Copy files, access control lists, and AVUs to a federated data grid

One way to create an archive of a collection is to copy the files to an independent data grid, along with the access controls and descriptive metadata. This policy assumes that two data grids are federated, that the path naming for files in the second data grid is the same as the path name in the primary data grid, and that user accounts from the primary data grid have been established in the second data grid. The policy copies each file from the specified collection in the primary data grid into an equivalent directory in the second data grid, copies the access controls, and copies the metadata. If an account has not been set up in the federated data grid, the ACL is not set. Currently, the AVU copy does not work and units need to be copied.

The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:
    *Coll        a relative collection name
    *DestZone    a zone name
    *Res         a storage resource
    *Stage       a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_UNITS
    META_DATA_ATTR_VALUE
    RESC_ID
    RESC_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjCopy
    msiDataObjCreate
    msiDataObjUnlink
    msiGetSystemTime
    msiSetACL
    msiSetAVU
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-copy-ACL-AVU.r

5.2 Normalize data to non-proprietary formats (Policy 15)

A preservation environment must ensure that the deposited records will be viewable in the future. Viable data formats will have non-proprietary or open source applications for parsing the data formats. Examples of open source formats include text files and PDF files. The archive will typically maintain a list of allowed data formats, check each file that is archived for the data format type, and create a version of the file in a sustainable format. Archives that manage persistent objects will still preserve the original data format, enabling migration to alternate data formats in the future.

5.2.1 Detection of format type

Files that have the format type included as an extension in the file name can be automatically analyzed to set the DATA_TYPE_NAME persistent state attribute. It is then possible to query DATA_TYPE_NAME to detect whether files are present with a defined data type. This policy guesses the data type based on the file extension, and then sets the DATA_TYPE_NAME persistent state variable for each file in a collection.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Collrel    a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiSetDataType
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-set-data-type.r

5.2.2 Automate format type detection

The DATA_TYPE_NAME can be automatically set on every put of a file into the data grid. The rule uses the $objPath session variable to get the file name.

The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point

The operations that are performed are:
    msiSetDataTypeFromExt

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-datatype.re

5.2.3 Identify file format extensions in a collection

This policy generates a list of the format extensions that are used in a collection, counts the number of files with each extension, and sums the sizes of the files with each extension.

The rule uses the policy functions:
    contains
    ext

There are no input variables.

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ID
    DATA_NAME
    DATA_SIZE

The operations that are performed are:
    foreach
    if
    select
    strlen
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-extensions.r
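The counting and summing described for this policy can be sketched in Python. The sketch assumes a list of (file name, size) pairs has already been retrieved from the catalog. The function name summarize_extensions and the "(none)" label for files without an extension are hypothetical choices, not part of odum-list-extensions.r.

```python
import os
from collections import Counter

def summarize_extensions(files):
    """Return {extension: (file_count, total_bytes)} for (name, size) pairs."""
    counts, sizes = Counter(), Counter()
    for name, size in files:
        # splitext returns ("name", ".ext"); lower-case so .TXT and .txt group together
        ext = os.path.splitext(name)[1].lower() or "(none)"
        counts[ext] += 1
        sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}

# Hypothetical (DATA_NAME, DATA_SIZE) pairs
files = [("a.txt", 10), ("b.TXT", 5), ("c.pdf", 7), ("README", 1)]

for ext, (count, size) in sorted(summarize_extensions(files).items()):
    print(f"{ext}: {count} files, {size} bytes")
```

Folding extensions to lower case is a design choice made for this sketch; a repository that treats case as significant would omit that step.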

5.3 Creation of PREMIS event data (Policy 16)

The PREMIS schema identifies events that are applied to records in an archive. The types of events include modifications to the record, usage of the record, and actions taken by the archive administrator. The pluggable architecture of iRODS version 4.1 allows each operation to be annotated with pre- and post-policy enforcement points. Information about the execution of the operation can be trapped and written to a log file. The log file can be processed to add PREMIS-style event metadata to each record. A scalable approach uses an external index to manage the PREMIS event metadata.

PREMIS metadata includes information about:
    [1] Data record composition, location, creating application, creation date, dependencies, format, type, size, software dependencies
    [2] Environment, hardware, storage medium
    [3] Links to permission statements, intellectual entities
    [4] Messages
    [5] Related objects, relationship type
    [6] Signatures, signers
    [7] Event types, values, sequence

The events that occur within the data management environment can be mapped to PREMIS event information:
    relatedEventIdentifierType
    relatedEventIdentifierValue
    relatedEventSequence

This information can be kept in an external indexing system to enable analysis, identification of the types of events that occur within the data management system, and timelines of the events applied to a specific data record. Communication with the external indexing system is done through a message queue.

5.3.1 Creating PREMIS event information

The following rules are based on the Databook system for tracking event information about usage, data sets, and users. The rule creates a JSON document representing an access event encoded as PREMIS metadata and sends it via the Advanced Message Queuing Protocol (AMQP) to an external indexing system.

The PREMIS event information is created using the policy functions:
    genAccessId         generates a URI representing this particular event.
    jsonEncode          encodes the data so that they can be concatenated with JSON strings.
    sendAccess          generates a message and sends it using AMQP.
    sendRelatedEvent    creates a JSON document describing a related event between objects.
    sendLinkingEvent    creates a JSON document describing a link between two objects.
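As an illustration of the kind of JSON document such a function might produce, the following Python sketch builds a related-event record from the three PREMIS fields named in section 5.3. The document layout, the function name related_event, and the example URN are assumptions made for this sketch; they are not the Databook schema or the output of sendRelatedEvent.

```python
import json

def related_event(id_type, id_value, sequence):
    """Build a dictionary holding the three PREMIS related-event fields."""
    return {
        "relatedEventIdentifierType": id_type,
        "relatedEventIdentifierValue": id_value,
        "relatedEventSequence": sequence,
    }

# Hypothetical identifier; a real system would use the URI from genAccessId
event = related_event("URI", "urn:example:event:1", 1)

# json.dumps handles the string escaping that jsonEncode performs in the rule library
doc = json.dumps(event, sort_keys=True)
print(doc)
```

The serialized string is what would be handed to the message queue; the receiving index can recover the fields with any JSON parser.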

5.3.2 Sending messages over AMQP

Many indexing systems respond to messages using the Advanced Message Queuing Protocol (AMQP). A library of policy functions, called dfc-amqp.re, has been implemented to support messages. The functions include:

1. amqpSend(*Host, *Queue, *Msg)
   Sends a message.
       *Host     Host address for message queue
       *Queue    Queue for receiving message
       *Msg      Message

2. amqpRecv(*Host, *Queue, *Emp, *Msg)
   Receives a message.
       *Host     Host address of the message queue
       *Queue    Queue that is queried for message
       *Emp      Flag for trimming end of line from message
       *Msg      Message that is received

3. startXmsgAmqpBridge(*Tic, *Log)
   Messages are of the format "Host:Queue:Msg", assuming that there is no ":" in Host or Queue. Messages are transferred every 30 seconds.
       *Tic      Ticket of message within Xmsg system
       *Log      Flag set to “true” to log message event on server log

4. XmsgAmqpBridge(*Tic, *Log)
   Transfers messages from Xmsg to AMQP.
       *Tic      Ticket of message within Xmsg system
       *Log      Flag set to “true” to log message event on server log

5. startAmqpXmsgBridge(*Host, *Queue, *Tic, *Log)
   AMQP to Xmsg bridge. Messages are read from *Queue on *Host, and written to the stream with ticket *Tic, every 30 seconds.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log

6. AmqpXmsgBridge(*Host, *Queue, *Tic, *Log)
   Bridge from AMQP message queue to Xmsg queue.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log

7. startXmsgAmqpBridgeOneQueue(*Tic, *Host, *Queue, *Log)
   Xmsg to AMQP bridge which sends all Xmsgs from a channel to a queue every 30 seconds.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log

8. XmsgAmqpBridgeOneQueue(*Tic, *Host, *Queue, *Log)
   Xmsg to AMQP bridge which sends all Xmsgs from a channel to a queue.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log

The library is available at: https://github.com/DICE-UNC/policy-workbook/blob/master/dfc-amqp.re
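The "Host:Queue:Msg" convention used by the Xmsg to AMQP bridge can be illustrated with a short Python sketch. It relies only on the stated assumption that Host and Queue contain no ":" characters, so that the message body itself may contain them. The function name split_bridge_message and the example host are hypothetical; this is not the parsing code in dfc-amqp.re.

```python
def split_bridge_message(raw):
    """Split "Host:Queue:Msg" into its three parts.

    Only the first two colons are treated as separators, so the
    message body may itself contain ":" characters.
    """
    parts = raw.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected Host:Queue:Msg, got %r" % raw)
    host, queue, msg = parts
    return host, queue, msg

# Hypothetical host and queue names
host, queue, msg = split_bridge_message("mq.example.org:events:access at 11:28")
print(host, queue, msg, sep=" | ")
```

Limiting the split to two separators is what makes the "no colon in Host or Queue" restriction sufficient; a colon in either of those fields would shift the boundaries silently.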

5.4 Automation of user submission agreements (Policy 17)

When files are loaded into a staging area, processing steps can be applied before the file is moved to the archival location. An example is the acquisition of a signed user submission agreement. A user submission agreement typically specifies that the user owns the copyright to the file, has the authority to submit the file to an archive, and agrees to a set of access permissions for the file. This can be automated through use of e-mail, web forms, or formal hard copy submission agreements.

5.4.1 Staging of files with a user submission agreement

Files can be moved from a staging area into an archive once the presence of a user submission agreement has been checked. This policy assumes that a separate collection is formed within the staging area, and that the user submission agreement has been associated as an attribute on the collection name. As in the previous policy, the variable name “Use_Agreement” is checked to see if the value is “RECEIVED”. In this case, the collection name is checked instead of the USER_NAME.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *Coll     a relative collection name
    *Stage    a relative collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE

The operations that are performed are:
    fail
    foreach
    if
    msiDataObjRename
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-stage-ag.r
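The gating condition of this policy can be sketched in Python. The sketch assumes the collection attributes have already been read into a dictionary per staging collection. The collection paths, the "PENDING" value, and the function name ready_to_stage are hypothetical; only the attribute name Use_Agreement and the value RECEIVED come from the policy description.

```python
def ready_to_stage(collections):
    """Return the staging collections whose Use_Agreement attribute is RECEIVED."""
    return [
        name
        for name, avus in collections.items()
        if avus.get("Use_Agreement") == "RECEIVED"
    ]

# Hypothetical staging collections and their collection-level attributes
staging = {
    "/zone/home/stage/deposit1": {"Use_Agreement": "RECEIVED"},
    "/zone/home/stage/deposit2": {"Use_Agreement": "PENDING"},
    "/zone/home/stage/deposit3": {},  # no agreement attribute recorded
}

print(ready_to_stage(staging))
```

Collections with a missing attribute are treated the same as collections with any other value, so files are moved only when the agreement has been explicitly recorded.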

5.5 Automatic Checksums (Policy 18)

The BagIt technology encapsulates data in a container before transport over the network. Within the container, a manifest file is added that provides a checksum for each enclosed file. The checksum can be extracted, compared to a new checksum generated upon receiving the file, and verified to ensure that the data were not corrupted on transport. The checksum can be recorded as a metadata attribute on the file, DATA_CHECKSUM, and used in the future to verify file integrity.
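The comparison of manifest checksums against freshly computed checksums can be sketched in Python. The sketch assumes an MD5 manifest with one "checksum path" entry per line and keeps the file contents in memory so that the example is self-contained. The function name verify_manifest and the sample payloads are hypothetical.

```python
import hashlib

def verify_manifest(manifest_text, payloads):
    """Return the paths whose recomputed MD5 differs from the manifest entry."""
    failures = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, path = line.split(None, 1)
        path = path.strip()
        actual = hashlib.md5(payloads[path]).hexdigest()
        if actual != expected.lower():
            failures.append(path)
    return failures

# Hypothetical bag contents: path -> file bytes
payloads = {"data/a.txt": b"hello\n", "data/b.txt": b"world\n"}

# Build a manifest the way a sender would, one "checksum  path" line per file
manifest = "\n".join(
    "%s  %s" % (hashlib.md5(body).hexdigest(), path)
    for path, body in payloads.items()
)

print(verify_manifest(manifest, payloads))
```

An empty result means every file arrived intact; any path in the result identifies a file that should be transferred again before its checksum is recorded.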

5.5.1 Creating a BagIt file

This rule generates a bag (tar file) containing a manifest, a list of checksums, and the files contained within a specified collection. The generateBagIt rule creates the equivalent of a Submission Information Package. Extensions would be the inclusion of descriptive metadata, provenance metadata, and structural metadata.

The rule uses the policy function:
    checkCollInput

The input variables are:
    *BAGITDATA       a collection name
    *NEWBAGITROOT    a collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME

The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiCollRsync
    msiDataObjChksum
    msiDataObjClose
    msiDataObjCreate
    msiDataObjWrite
    msiFreeBuffer
    msiExecGenQuery
    msiExecStrCondQuery
    msiGetContInxFromGenQueryOut
    msiGetValByKey
    msiMakeGenQuery
    msiMakeQuery
    msiSplitPath
    msiTarFileCreate
    msiWriteRodsLog
    select
    strlen
    substr
    while
    writeLine
    writeString

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bagit.r

5.6 Automated capture of Provenance/contextual metadata (Policy 19)

Provenance and contextual metadata can be associated with files as metadata attributes. The source of the metadata may be an XML file, or a text file, or a structure within each data file. An automated process to acquire the metadata would parse the metadata source file, and load the metadata as attributes on each archived file. Examples of this approach are provided in Chapter 4.6.

5.6.1 Provenance for administrative policies

Provenance can also be tracked for execution of administrative policies. Workflow structured objects implement automated capture of provenance information for each execution of a workflow. The workflow file is of data type 'msso' and uses the dot-extension '.mss'. The workflow file is registered into iRODS and can be shared, executed, and re-executed. The workflow language is the same as that of the '.r' file used by the irule command, but need not have the INPUT and OUTPUT statements. Policies can be stored as workflows, with each execution of the workflow tracked by the data grid.

For each workflow file, one associates a structured object that implements an iRODS collection-type environment for tracking executions of the workflow. All files associated with a workflow execution are stored under this structured object, called the Workflow Structured Object (WSO). One can view the WSO as akin to an iRODS collection with a hierarchical structure. At the top level of this structure, one stores all the parameter files needed to run the workflow, as well as any input files and manifest files that are needed for the workflow execution. Beneath this level, a set of run directories is created which houses the results of each execution. Hence, one can view the WSO as a complete structure that captures all aspects of a workflow execution.

In iRODS the WSO is created as a mount point in the iRODS logical collection hierarchy. This is similar to a mounted collection but of type "msso". One uses the imcoll command to create this mount point. We use WSO and MSSO (micro-service structured object) synonymously for historic reasons, since the need and idea for WSO/MSSO came from the usage experience for Micro-Service Objects (MSO).

Apart from the workflow file there is one other important file, called the parameter file (with dot-extension '.mpf'), which contains information needed for executing the workflow. We separated the parameter file from the workflow file so that one can associate multiple parameter files with a workflow and use them for executing with different input values. The parameter file contains values for workflow *variables that are used in the workflow execution. It also contains information about files that need to be staged in before the execution and staged out for archiving after the execution. It also contains directives for the workflow execution engine.

The parameter files as well as any input files can be ingested into the WSO using normal icommands such as iput. When a parameter file is ingested into a WSO, a run file is automatically created which can be used to run the parameter file with the associated workflow. When a workflow execution occurs, a run directory is created for storing the results of this run. Depending upon the directives in the parameter file, older results are versioned out or discarded after a successful workflow execution. These version directories can be listed and accessed using the normal icommands such as ils and iget.

Workflows can be called from within other workflows. This feature allows one to chain workflows. This can be done in two ways. One is by opening another workflow parameter file inside a workflow and using the data returned from this as normally done for accessing files in iRODS. A second way of running a workflow inside another is to call it through a special policy called "acRunWorkFlow". The first way is useful if the output file from a workflow is very large and must be processed with multiple buffer read calls. The second way is useful when the returned data is less than 32 MB in size. Samples of both versions are shown below.

Sample Workflow file: eCWkflow.mss is available at http://github.com/DICE-UNC/policy-workbook/odum-eCWkflow.mss

    # Input parameters:
    #   Name of *File1 - first output file written by the workflow
    #   Name of *File2 - second output file written by the workflow
    # Output parameter is:
    #   None
    # Output from running the example is:
    #   message about completion written to stdout
    #
    # This workflow executes the file called myWorkFlow twice with two different input values
    # This is an executable file that is located in bin/cmd directory of the iRODS server.
    # It creates an output file using the value given in the second argument.
    # The workflow also prints to stdout the statement about when the execution occurred.
    testWorkflow {
      # odum-eCWkflow.mss
      msiExecCmd("myWorkFlow", *File1, "null", "null", "null", *Result1);
      msiExecCmd("myWorkFlow", *File2, "null", "null", "null", *Result2);
      msiGetFormattedSystemTime(*myTime, "human", "%d-%d-%d %ldh:%ldm:%lds");
      writeLine("stdout", "Workflow Executed Successfully at *myTime");
    }

Sample Parameter file used with eCWkflow.mss: eCWkflow.mpf

    # Comments
    #
    # FileName should be StarVariableName occurring
    # either in INPUT of the msso file or in INPARAM of this file.
    # Please identify all file names as they will be helpful for later metadata extraction
    # FILEPARAM fileStarVariableName
    # DIRPARAM collStarVariableName
    #
    # INPARAM paramName=paramValue
    # INPARAMINFO paramName, paramType=type, paramUnit=unit, valueSize=size, Comments=comments
    # parameters used by the workflow
    # In this case there are two files and another string value parameter.
    INPARAM *File1="OutFile3"
    INPARAM *File2="OutFile4"
    INPARAM *Aval="test"
    #
    # Identify files that are used in input params - needed to stage back outputs.
    FILEPARAM *File1
    FILEPARAM *File2
    #
    # Identify the stage area where the workflow execution is performed
    # by default it is performed at the "bin" directory of the iRODS server.
    # This is needed if one is using msiExecCmd micro-service as part of the workflow.
    #STAGEAREA bin
    #
    # Stage in files from anywhere in iRODS to the "stage area"
    # myData is a file located in the WSO and photo.JPG is a file somewhere else in iRODS.
    STAGEIN myData
    STAGEIN /raja8/home/rods/photo.JPG
    #
    # Stage back additional files created as part of run
    # COPYOUT - will leave a copy in the "stage area" and make a copy in iRODS WSO
    #         - helpful if it is needed by subsequent workflow execution
    # STAGEOUT - will move file from "stage area" to iRODS WSO
    # In this case we are archiving the two files myData and photo.JPG as well as the
    # "myWorkFlow" file used by the workflow execution.
    COPYOUT myWorkFlow
    STAGEOUT myData
    STAGEOUT photo.JPG
    #
    # The next set of statements provide directives to the workflow system.
    # CHECKFORCHANGE is used for testing where the file being checked has changed since
    # the previous execution of the workflow. If the file is modified/touched then the workflow
    # is executed. If none of the files are changed, then the workflow is not executed. If
    # directed, the file from previous execution is "sent back" to the client.
    # NOVERSION is used when versioning of old results is not needed.
    # CLEANOUT is used to clear the stage area after execution.
    #CHECKFORCHANGE /raja8/home/rods/photo.JPG
    CHECKFORCHANGE myData
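The structure of the parameter file, comment lines beginning with "#" and directive lines consisting of a keyword followed by an argument, can be illustrated with a small Python reader. This is a minimal sketch that assumes whitespace separates the keyword from its argument; it is not the parser used by the iRODS workflow engine, and the function name parse_mpf is hypothetical.

```python
def parse_mpf(text):
    """Group the directive lines of a parameter file by keyword.

    Lines that are blank or start with "#" are skipped. Every other
    line is split into a keyword and the remainder of the line.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(keyword, []).append(value)
    return directives

# A shortened excerpt of the sample parameter file
sample = """\
# parameters used by the workflow
INPARAM *File1="OutFile3"
INPARAM *File2="OutFile4"
FILEPARAM *File1
#STAGEAREA bin
STAGEIN myData
COPYOUT myWorkFlow
STAGEOUT myData
CHECKFORCHANGE myData
"""

for keyword, values in parse_mpf(sample).items():
    print(keyword, values)
```

Keeping a list per keyword preserves the order of repeated directives such as INPARAM and STAGEOUT, which matters when several files are staged.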

For full disclosure, the executable for myWorkFlow is also provided below.

    #!/bin/sh
    # Just a test to copy an existing file
    # one may look at this as taking a file and creating a new one possibly after conversion
    # mycp is a file that takes tt as input and creates a new output file
    cmd/mycp cmd/tt "$1"

Calling a workflow from another workflow is possible. The following example shows a workflow call embedded as an object open in the sample workflow shown above. This is available at http://github.com/DICE-UNC/policy-workbook/odum-testWorkflowCall1.mss

The next example shows the same action using a rule and is useful when reading small files. This is available at http://github.com/DICE-UNC/policy-workbook/odum-testWorkflowCall2.mss

The steps for using a workflow object are outlined below. First create a new collection and ingest the workflow file.

    imkdir /dfctest/home/rodsAdmin/workflow
    iput -D "msso file" ./dfcDemoWkFlow.mss /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow.mss

Create a new collection and mount that collection as a Workflow Structured Object associated with the workflow file. The collection that is mounted as an MSO for a workflow can be anywhere in iRODS. As can be seen, one can have more than one such structure mounted for a workflow file. The name of the collection need not be related to the name of the workflow file.

    imkdir /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    imcoll -m msso /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow.mss /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow

Ingest a parameter file in the WSO collection. One can also ingest more than one parameter file in the same WSO collection. A run file for each parameter file is automatically created.

    iput dfcDemoWkFlow.mpf /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    iput dfcDemoWkFlow2.mpf /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow

One can ingest other files (such as input files) that are needed for workflow execution.

    iput myData /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/myData

One can perform ils on the WSO collection. It will show the two parameter files as well as the run files that are automatically created for each of them. Note that the name of the run file is based on the file name of the parameter file.

    ils /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
      dfcDemoWkFlow.run
      dfcDemoWkFlow.mpf
      dfcDemoWkFlow2.run
      dfcDemoWkFlow2.mpf
      myData

One can also perform other icommands on the WSO collection. The iget command will show the contents of the file.

    icd /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    ils -l
    iget ../dfcDemoWkFlow.mss -
    iget dfcDemoWkFlow.mpf -
    iget dfcDemoWkFlow2.mpf -
    iget myData -

To execute the workflow using a parameter file, perform an access on the associated run file. Instead of showing what is in the "run" file, this iget action executes the workflow using the associated parameter file and stores the results. The iget returns a file back to the client. By default the stdout from execution of the workflow is returned. If one needs a different file to be returned, one can set that up as part of the workflow file or the parameter file using the directive "SHOW".

    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:28m

The execution of the workflow also creates a new directory as part of the WSO structure and stores the results of the execution (as per the directives in the .mpf parameter file). This can be seen by performing a listing of the directory, which will be named after the parameter file.

    ils
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow:
      dfcDemoWkFlow.run
      dfcDemoWkFlow.mpf
      dfcDemoWkFlow2.run
      dfcDemoWkFlow2.mpf
      myData
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir

Listing the runDir will show the results of the run. Compare this with the directive in the parameter file above.

    ils -l dfcDemoWkFlow.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir:
      rodsAdmin  mssoSt  demoResc       11  2012-09-20.11:28 & myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:28 & myWorkFlow
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28 & OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28 & OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:28 & photo.JPG
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:28 & stdout

Any of the files in the runDir directory can be accessed using the iget command. Also, one can have whole directories stored under the runDir. If you run the workflow again without changing the input, the workflow is not actually executed. Instead the contents of the old stdout are sent back to the client. Also, no new files will be created.

    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:30m

This is because neither the input files nor the workflow system have changed and, as per the directive, it will not re-execute the workflow. If we overwrite one of the input files, the workflow will be executed. Since the NOVERSION directive is not in the parameter file, the older files will be versioned and the new files created in the runDir directory.

    iput -f myData2 /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/myData
    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:30m
    ils -l dfcDemoWkFlow.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir:
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:30 & OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:30 & OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:30 & photo.JPG
      rodsAdmin  mssoSt  demoResc       21  2012-09-20.11:30 & myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:30 & myWorkFlow
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:30 & stdout
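The decision to re-execute or to return the previous stdout can be sketched as a simple comparison. The following Python sketch assumes that change detection is based on modification times of the files named in CHECKFORCHANGE directives. The workbook says only that a file must be "modified/touched", so the timestamp comparison, the function name needs_rerun, and the numeric times are assumptions made for illustration.

```python
def needs_rerun(last_run_time, watched_mtimes):
    """Decide whether the workflow should be executed again.

    last_run_time   time of the previous execution, or None if never run
    watched_mtimes  {file name: modification time} for the watched files
    """
    if last_run_time is None:
        return True  # nothing to reuse, so the workflow must run
    return any(mtime > last_run_time for mtime in watched_mtimes.values())

# Hypothetical times expressed as seconds since some reference point
print(needs_rerun(None, {"myData": 90}))   # first execution
print(needs_rerun(100, {"myData": 90}))    # input unchanged since last run
print(needs_rerun(100, {"myData": 120}))   # input overwritten after last run
```

When the function returns False, the stored stdout from the earlier run directory would be returned to the client instead of creating new output files.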

As can be seen below, the older execution files are stored under dfcDemoWkFlow.runDir0.

    ils -l dfcDemoWkFlow.runDir0
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir0:
      rodsAdmin  mssoSt  demoResc       11  2012-09-20.11:28 & myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:28 & myWorkFlow
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28 & OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28 & OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:28 & photo.JPG
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:28 & stdout

One can run the workflow with another parameter file, and the results will be placed in a new directory.

    iget dfcDemoWkFlow2.run -
    Workflow Executed Successfully at 2012-9-20 11h:31m
    ils -l
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow:
      rodsAdmin  mssoSt  demoResc  33554412  2012-09-20.11:26 & dfcDemoWkFlow.run
      rodsAdmin  mssoSt  demoResc       643  2012-09-20.11:26 & dfcDemoWkFlow.mpf
      rodsAdmin  mssoSt  demoResc  33554412  2012-09-20.11:27 & dfcDemoWkFlow2.run
      rodsAdmin  mssoSt  demoResc       647  2012-09-20.11:27 & dfcDemoWkFlow2.mpf
      rodsAdmin  mssoSt  demoResc        21  2012-09-20.11:29 & myData
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir  mssoStructFile
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir0  mssoStructFile
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow2.runDir  mssoStructFile
    ils -l dfcDemoWkFlow2.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow2.runDir:
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:31 & myOutFile3
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:31 & myOutFile4
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:31 & photo.JPG
      rodsAdmin  mssoSt  demoResc       21  2012-09-20.11:31 & myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:31 & myWorkFlow
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:31 & stdout

Note that the names of the output files are different in the second run, as the names were changed in dfcDemoWkFlow2.mpf.

5.7 Federation – periodically copy data (Policy 20)

A policy for copying data between two federated data grids was provided in section 4.7.3. The policy can be turned into a periodically executed rule by adding a delay command that executes the policy every week. This rule takes all files in a “stage” directory on the first data grid, copies them to an “Archive” directory on the second data grid, and deletes the file from the first data grid. The rule also logs all of the actions and writes the log to a directory in the second data grid.

The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:
    *Dest        a collection name
    *DestZone    the destination zone
    *Res         a storage resource
    *Src         a collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    delay
    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCopy
    msiDataObjCreate
    msiGetSystemTime
    msiSetACL
    msiSplitPathByKey
    remote
    select
    strlen
    substr
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-stage-ag.r

5.8 De-identification of Data (Policy 25)

De-identification is crucial for all repositories in all fields when human subjects data are involved. Information related to addresses, social security numbers, and credit cards has to be identified and removed. The identification of personally identifiable data within submitted digital objects may be part of a user submission agreement. The ability to automate the detection is essential when researchers submit material.

5.8.1 BitCurator based processing

The BitCurator project brings a series of open source digital forensics tools and techniques to collecting institutions, to preserve their born-digital collections [6]. iRODS (Integrated Rule-Oriented Data System) is a data-grid software system, where users can build sharable collections from data distributed across file systems and tape archives [9]. This project integrates the two technologies, allowing a user of iRODS to run the BitCurator tools in an iRODS environment and copy the resulting reports into the iRODS grid. This document lists the BitCurator tools that are integrated into iRODS and gives an overview of each tool along with a description of how to use it. The tools are run on an iRODS server, requiring an installation by the data grid administrator.

The prerequisite for running the BitCurator tools on media or any set of files is to use the tool “Guymager” (http://guymager.sourceforge.net/) and generate an image in the .aff or .E01 format.

5.8.1.1 Generate Digital Forensics XML file

This utility uses the BitCurator Fiwalk tool, takes an image in the .aff or .E01 format, and generates an XML file. As per [7], “Digital Forensics XML (or DFXML) is a metadata schema designed to facilitate the sharing of structured information produced by forensic tools. DFXML is an attempt to standardize abstractions by providing a formalized language for describing forensic processes”. Refer to [7] for more details. The command to be executed is located in the directory irods/server/bin/cmd/fiwalk. This rule invokes the Fiwalk tool to generate the XML output of the given disk image.

Command Structure:
    irule -F odum-bcGenerateXml.r "*outXmlFile='/Path/to/xmlfile'" "*image='/path/to/image.aff'"

The input variables are:
    *image         a file path name
    *outXmlFile    a file path name

The session variables are:
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:
    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    msiSplitPath
    remote
    select
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateFiwalkRule.r

Command examples:

1. irule -F odum-bcGenerateFiwalkRule.r
   Default parameters can be modified by changing the following line with appropriate values:
       INPUT *outXmlFile="/AstroZone/home/pixel/bcfiles/xmlfile", *image="/AstroZone/home/pixel/bcfiles/charlie-work-usb-2009-12-11.aff"

2. irule -F odum-bcGenerateFiwalkRule.r "*outXmlFile='/home/xmlfile'" "*image='/home/test.aff'"

Files:
    • Local File System:
      The following file resides on the Local File System:
          $iRODS/server/bin/cmd/fiwalk
    • iRODS Grid:
      Executing this rule creates the following file on the grid:
          $iRODS_grid/<xmlfile>

Implementation notes:
The fiwalk tool, an executable file, is copied to the iRODS/server/bin/cmd directory:

    cp /usr/local/bin/fiwalk iRODS/server/bin/cmd/fiwalk

5.8.1.2 Bulk Extractor

“bulk_extractor is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools.” [8] This tool takes the disk image (the .aff file) as an input and generates an output directory in the specified location, containing a text file for each of the features located in the input image. For more information on Bulk Extractor scanners, refer to the following URLs:

    http://www.forensicswiki.org/wiki/Bulk_extractor
    http://wiki.bitcurator.net/index.php?title=Bulk_Extractor_Scanners

The command to be executed is located in the directory
    irods/server/bin/cmd/bulk_extractor

The execution command is
    bulk_extractor <image.aff> -o <output directory>

Input Parameter is: Image File path
Output Parameter is: File Path for Feature Files

Command Structure:
    irule -F odum-bcExtractFeatureFilesRule.r "*image='/path/to/image.aff'" "*outFeatDir='/path/to/outdir'"

The input variables are:

    *image        a file path name
    *outFeatDir   a collection name

The session variables are: $userNameClient

The policy uses persistent state information: COLL_NAME, DATA_ID, DATA_NAME, DATA_PATH, DATA_RESC_NAME, RESC_LOC

The operations that are performed are: errorcode, errormsg, execCmdArg, fail, foreach, if, msiDataObjPut, msiExecCmd, msiGetStderrInExecCmdOut, msiGetStdoutInExecCmdOut, remote, select, time, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateXml.r

Command examples:

1. irule -F odum-bcExtractFeatureFiles.r

   Default parameters can be modified by changing the following line:

   INPUT *image="/AstroZone/home/pixel/bcfiles/charlie-work-usb-2009-12-11.aff", *outFeatDir="/AstroZone/home/pixel/bcfiles/BeOutFeatDir"

2. irule -F odum-bcExtractFeatureFiles.r "*image='<image>.aff'" "*outFeatDir='/home/be_feature_dir'"

Files:

• Local File System: The following file resides on the local file system:

    $iRODS/server/bin/cmd/bulk_extractor

• iRODS Grid: Executing this rule creates the following directory on the grid:

    $iRODS_grid/be_feature_dir

  The actual list of files within this directory depends on the features identified within the image file. Examples:

    $iRODS_grid/be_feature_dir/domain.txt
    $iRODS_grid/be_feature_dir/telephone.txt

Implementation notes: The following file is copied to the iRODS/server/bin/cmd directory:

    cp /usr/local/bin/bulk_extractor iRODS/server/bin/cmd/bulk_extractor

5.8.1.3 Generate Annotated Files (identify_filenames)

This tool takes the output files generated by bulk_extractor and the disk image file (.aff or E01 format) as the inputs and creates annotated versions of each of the feature files generated by bulk_extractor.

Input parameters are: the image file path and the bulk_extractor directory.

Output parameter is: the output directory annotatedFilesDir in which to store the annotated files.

Tool:

    identify_filenames --all --imagefile "path/to/imagefile.aff" "Path/to/beFeatDir" "Path/to/outAnnDir"

Command structure:

    irule -F odum-bcAnnotateBeFiles.r "*image='/path/to/image.aff'" \
        "*beFeatDir='/path/to/beDir'" "*outAnnDir='/path/to/newdir'"

The input variables are:

    *beFeatDir   a collection name
    *image       a file path name
    *outAnnDir   a collection name

The session variables are: $userNameClient

The policy uses persistent state information: COLL_NAME, DATA_NAME, DATA_PATH, DATA_RESC_NAME, RESC_LOC

The operations that are performed are: break, errorcode, errormsg, execCmdArg, fail, foreach, if, msiDataObjPut, msiExecCmd, msiGetStderrInExecCmdOut, msiGetStdoutInExecCmdOut, msiSplitPath, remote, select, split, time, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcAnnotateBeFiles.r

Command examples:

1. irule -F odum-bcAnnotateBeFiles.r

   The default parameters can be modified by changing the following line appropriately:

   INPUT *image="/AstroZone/home/pixel/bcfiles/charlie-work-usb-2009-12-11.aff", *beFeatDir="/AstroZone/home/pixel/bcfiles/beFeatDir", *outAnnDir="/AstroZone/home/pixel/bcfiles/outAnnDir"

2. irule -F odum-bcAnnotateBeFiles.r "*image='/home/test.aff'" "*beFeatDir='/home/beDir'" "*outAnnDir='/home/annotated_dir'"

Files:

• Local File System: The following file resides on the local file system:

    $iRODS/server/bin/cmd/identify_filenames

• iRODS Grid: Executing this rule creates the following directory on the grid:

    $iRODS_grid/annotated_dir

  The actual list of files within this directory depends on the features identified within the image file. Examples:

    $iRODS_grid/annotated_dir/annotated_domain.txt
    $iRODS_grid/annotated_dir/annotated_telephone.txt

Implementation notes: The following files are copied to the iRODS/server/bin/cmd directory:

    ~/Research/Tools/bulk_extractor/python/fiwalk.py
    ~/Research/Tools/bulk_extractor/python/dfxml.py
    ~/Research/Tools/bulk_extractor/python/bulk_extractor_reader.py
    ~/Research/Tools/bulk_extractor/python/identify_filenames.py as identify_filenames


5.8.1.4 Generate BitCurator Reports

This tool takes the XML output of the Fiwalk tool and the annotated files created by identify_filenames as the inputs and produces various reports in Excel and PDF formats in the specified output directory. The Python script is located in irods/server/bin/cmd/bc_generate_reports.

Input parameters are:

    Annotated files directory (generated by the rule rulemsiBcAnnotateBeFiles.r)
    XML file generated by the fiwalk tool (using the rule rulemsiBcGenerateXml.r)
    Configuration file

Output parameter is: the output directory newBcReportsDir where the reports are generated.

Tool:

    bc_generate_reports --fiwalk_xmlfile </path/to/xmlfile/> --annotated_dir </path/to/annotatedDir/> --outdir </path/to/outdir/> --conf </path/to/configfile/>

Command structure:

    irule -F odum-bcGenerateReportsRule.r "*fiwalkXmlFile='/Path/To/Xmlfile'" "*annotatedDir='/Path/To/annotated_directory'" "*outReportsDir='/Path/To/output_Reports_directory'" "*conf='/Path/To/Config_file'"

The input variables are:

    *annotatedDir    a collection name
    *conf            a file path name
    *fiwalkXmlFile   a file path name
    *outReportsDir   a collection name

The session variables are: $userNameClient

The policy uses persistent state information: COLL_NAME, DATA_NAME, DATA_PATH, DATA_RESC_NAME, RESC_LOC

The operations that are performed are: break, errorcode, errormsg, execCmdArg, fail, foreach, if, msiDataObjPut, msiExecCmd, msiGetStderrInExecCmdOut, msiGetStdoutInExecCmdOut, msiSplitPath, remote, select, split, time, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateReportsRule.r

Command examples:

1. irule -F odum-bcGenerateReportsRule.r

   The default parameters can be modified by changing the following line with appropriate parameters:

   INPUT *fiwalkXmlFile="/AstroZone/home/pixel/bcfiles/bcTestFiwalkXmlfile.xml", *annotatedDir="/AstroZone/home/pixel/bcfiles/bcTestBeAnnDir", *outReportsDir="/AstroZone/home/pixel/bcfiles/outReportsDir", *conf="/AstroZone/home/pixel/bcfiles/bcTestConfigFile"

2. irule -F odum-bcGenerateReportsRule.r "*fiwalkXmlFile='/home/xmlfile'" "*annotatedDir='/home/annotated_directory'" "*outReportsDir='/grid/output_directory'" "*conf='/home/config_file'"

Files:

• Local File System: The following file resides on the local file system:

    $iRODS/server/bin/cmd/bc_generate_reports

• iRODS Grid: Executing this rule creates the following directories and files on the grid:

    $iRODS_grid/outReportsDir:
    $iRODS_grid/outReportsDir/BeReport.pdf
    $iRODS_grid/outReportsDir/FiwalkDeletedFiles.pdf
    $iRODS_grid/outReportsDir/FiwalkReport.pdf
    $iRODS_grid/outReportsDir/bcTestFiwalkXmlfile.xml.xlsx
    $iRODS_grid/outReportsDir/bc_format_bargraph.pdf
    $iRODS_grid/outReportsDir/format_table.pdf
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features

  The files under the features directory depend on the image. Examples are:

    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/domain.xlsx
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/telephone.xlsx
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/domain.pdf
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/telephone.pdf

Implementation notes: The following files are copied to the iRODS/server/bin/cmd directory:

    $BitCurator/python/bc_reports_tab.py as bc_reports_tab
    $BitCurator/python/generate_report.py as bc_generate_reports
    $BitCurator/python/bc_utils.py
    $BitCurator/python/bc_config.py
    $BitCurator/python/bc_pdf.py
    $BitCurator/python/bc_graph.py
    $BitCurator/python/bc_regress.py
    $BitCurator/python/bc_genrep_dfxml.py
    $BitCurator/python/bc_genrep_text.py
    $BitCurator/python/bc_genrep_xls.py
    $BitCurator/python/bc_gen_feature_rep_xls.py
    $BitCurator/python/bc_config_file
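
The four BitCurator rules depend on one another: the annotation step needs the bulk_extractor feature directory, and the report step needs both the Fiwalk XML file and the annotated directory. The sketch below, in Python, checks that a proposed execution order satisfies those dependencies before anything is run. The artifact labels (image, fiwalk_xml, and so on) and the function check_order are hypothetical names introduced for illustration; the rule file names are taken from the command structures in sections 5.8.1.1 to 5.8.1.4.

```python
PIPELINE = [
    # (rule file, artifacts it reads, artifacts it produces)
    ("odum-bcGenerateFiwalkRule.r", {"image"}, {"fiwalk_xml"}),
    ("odum-bcExtractFeatureFilesRule.r", {"image"}, {"feature_dir"}),
    ("odum-bcAnnotateBeFiles.r", {"image", "feature_dir"}, {"annotated_dir"}),
    ("odum-bcGenerateReportsRule.r",
     {"fiwalk_xml", "annotated_dir", "config"}, {"reports_dir"}),
]


def check_order(pipeline, initial_artifacts):
    """Return the set of artifacts available after running the pipeline.

    Raises ValueError if a step would run before its inputs exist.
    """
    available = set(initial_artifacts)
    for rule_file, inputs, outputs in pipeline:
        missing = inputs - available
        if missing:
            raise ValueError("{} is missing inputs: {}".format(
                rule_file, ", ".join(sorted(missing))))
        available |= outputs
    return available


produced = check_order(PIPELINE, {"image", "config"})
print(sorted(produced))
```

Only the disk image and the configuration file need to exist beforehand; every other input is produced by an earlier rule.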

5.8.1.5 BitCurator GUI

BitCurator supports a graphical user interface from which users can launch the tools explained above. A rule has been written to launch this GUI. However, more work is needed to make the GUI appear on the client screen rather than on the server.

No input variables are used. No session variables are used. The policy uses no persistent state information.

The operations that are performed are: errorcode, errormsg, if, msiExecCmd, msiGetStderrInExecCmdOut, msiGetStdoutInExecCmdOut, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateReportsGuiRule.r

Command example:

    irule -F odum-bcGenerateReportsGuiRule.r

5.9 Unique Identifiers for Data Sets (Policy 26)

Multiple external repositories require the generation of a unique data ID. An example is DataONE, which uses the Handle system to assign a unique identifier to a data set. Not all repositories use the same type of identifier. For instance, the California Digital Library uses an ARC identifier.

5.9.1 Assigning a Handle to a File

The Handle system can use a local handle registry for assigning identifiers to files. The local handle registry, in turn, is assigned a unique identifier in a global handle system. The following rule creates a handle and registers it in the DFC handle server. (The registration of the handle in our handle server indicates that it is available for access from DataONE.)

The policy implements a constraint:

    Applied at the acPostProcForPut policy enforcement point
    Restricted to collections like “nexrad”

The policy uses session variables: $userNameClient

The operations that are performed are: msiExecCmd, msiGetStdoutInExecCmdOut, msiWriteRodsLog

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-handle-nexrad.re.

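The rule executes a shell script:

    #!/bin/bash
    # Create a handle for an iRODS data object and print it on stdout.
    if [ "$#" -ne 2 ]; then
        echo "Usage: create_handle <data object id> <data object url>"
        exit 1
    fi
    OID="$1"
    URL="$2"
    # CreateHandle receives ./admpriv.bin, the object URL, and the object id.
    HANDLE=$(java -classpath ./irods-hs-tools.jar org.irods.dfc.CreateHandle ./admpriv.bin "$URL" "$OID")
    echo "$HANDLE"
    exit 0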

5.9.2 Registering files in DataONE registry

DataONE web services are used to automate registry of an iRODS collection in the DataONE registry. When the DataONE web service asks for a list of DataONE registered iRODS data objects, the member node web service responds by retrieving the list of objects that have been registered in the handle server. The harvesting is done periodically, with the result that an iRODS data collection can be discovered and accessed through the DataONE services.

5.10 Authentication identity management (Policy 27)

The iRODS data grid provides support for pluggable authentication environments. Each plug-in can also support pre- and post-policy enforcement points. A standard example is the use of an external certificate authority for recognizing users. Any certificate from that certificate authority is honored, and a corresponding user account is set up in the data grid. Policies control what the new users are allowed to do. This capability was implemented for the Australian Research Collaboration Service.

The iRODS command line tools (icommands) and GridFTP interface can use GSI (Grid Security Infrastructure) authentication, which relies on limited lifetime proxy certificates. In addition, your GSI certificate must be mapped to your ARCS Data Fabric account. This is done automatically for ARCS SLCS certificates, and you can add additional mappings for other GSI certificates. A certificate can also be acquired from CILogon through the InCommon infrastructure. An iRODS data grid account can be set up with authentication based on the GSI certificate.

5.10.1 Verify access controls on each file

The data grid manages access control lists for each file. It is possible to query the iCAT catalog to check whether access permission has been given to individuals who should no longer have access. This typically happens when an administrator retires, or when the access control policies for a collection have changed. The rule listed in section 4.1.5 identifies access controls on a file in a collection for a specific person.

5.11 Automated Data Reviews (Policy 28)

It is possible to review any of the state information that is stored for a file. A report can be generated which lists all of the non-compliant files within a collection.

5.11.1 Metadata Review

This policy compares the metadata schema that is assigned to a collection with the metadata attributes set on each file within the collection. The collection metadata schema is defined by setting a metadata attribute on the collection with an attribute value of “null”.

No input variables are used.

The session variables are: $rodsZoneClient, $userNameClient

The policy uses persistent state information: COLL_NAME, DATA_NAME, META_COLL_ATTR_NAME, META_COLL_ATTR_VALUE, META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS

The operations that are performed are: break, foreach, if, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-listmetadata.r
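
The comparison performed by this policy can be illustrated outside iRODS. The following is a minimal sketch in Python, assuming the schema attribute names and the per-file attribute names have already been retrieved (for example from queries on META_COLL_ATTR_NAME and META_DATA_ATTR_NAME). The function name find_noncompliant_files and the sample data are hypothetical.

```python
def find_noncompliant_files(schema_attributes, file_attributes):
    """Map each non-compliant file name to its sorted list of missing attributes.

    schema_attributes: attribute names required by the collection schema
                       (collection attributes whose value is "null").
    file_attributes:   dict of file name -> set of attribute names on the file.
    """
    required = set(schema_attributes)
    report = {}
    for name, present in sorted(file_attributes.items()):
        missing = required - set(present)
        if missing:
            report[name] = sorted(missing)
    return report


# Hypothetical sample data standing in for iCAT query results.
schema = ["Title", "Creator", "Date"]
files = {
    "survey1.csv": {"Title", "Creator", "Date"},
    "survey2.csv": {"Title"},
}
report = find_noncompliant_files(schema, files)
for name, missing in report.items():
    print(name, "is missing", ", ".join(missing))
```

A file is reported only when at least one schema attribute is absent; extra attributes on a file are ignored in this sketch.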

5.12 Mapping metadata across systems (Policy 29)

The HIVE (Helping Interdisciplinary Vocabulary Engineering) technology is used to integrate vocabularies encoded with the Simple Knowledge Organization System (SKOS), a World Wide Web Consortium (W3C) standard. HIVE is a Linked Open Data (LOD) technology aligning with Linked Open Vocabularies (LOV) activities. The HIVE approach and technologies promote interoperability among data repositories, libraries, and archives, allowing scholarly works to be easily and quickly indexed across multiple disciplines.

The HIVE system can be accessed from the iRODS data grid using an updated Curl micro-service. A REST service is available that can query for http:// URIs representing concepts in a SKOS vocabulary that is stored in the HIVE system. An example XML representation of a 'concept' in the UAT vocabulary for a given URI is:

    <hiveConcept uri="http://purl.org/astronomy/uat#T100">
      <label>Astroparticle physics</label>
      <altLabel>Particle astrophysics</altLabel>
      <broader uri="http://purl.org/astronomy/uat#T828">
        <label>"Interdisciplinary astronomy"</label>
      </broader>
      <narrower uri="http://purl.org/astronomy/uat#T635">
        <label>"Gamma rays"</label>
      </narrower>
      <narrower uri="http://purl.org/astronomy/uat#T351">
        <label>"Cosmological neutrinos"</label>
      </narrower>
      <narrower uri="http://purl.org/astronomy/uat#T689">
        <label>"Gravitational waves"</label>
      </narrower>
      <related uri="http://purl.org/astronomy/uat#T372">
        <label>"Dark matter"</label>
      </related>
      <vocabName>uat</vocabName>
    </hiveConcept>

These URIs may be applied to iRODS data objects using the AVU mechanism, where the AVU attribute is the vocabulary URI, and the AVU unit is a special marker of the form 'iRODSUserTagging:HIVE:VocabularyTerm' that indicates that the AVU is a resolvable URI.
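
The mapping from a HIVE concept to an AVU can be sketched as follows in Python, using the standard library XML parser on an abbreviated, well-formed version of the UAT concept. The function names are hypothetical, and using the preferred label as the AVU value is an assumption made for this sketch; the text specifies only the attribute (the URI) and the unit (the marker).

```python
import xml.etree.ElementTree as ET

CONCEPT_XML = """
<hiveConcept uri="http://purl.org/astronomy/uat#T100">
  <label>Astroparticle physics</label>
  <altLabel>Particle astrophysics</altLabel>
  <broader uri="http://purl.org/astronomy/uat#T828">
    <label>Interdisciplinary astronomy</label>
  </broader>
  <narrower uri="http://purl.org/astronomy/uat#T635">
    <label>Gamma rays</label>
  </narrower>
  <vocabName>uat</vocabName>
</hiveConcept>
"""

HIVE_UNIT = "iRODSUserTagging:HIVE:VocabularyTerm"


def concept_to_avu(xml_text):
    """Return an (attribute, value, unit) triple for a HIVE concept.

    The attribute is the concept URI and the unit is the HIVE marker.
    The value (the preferred label) is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    return (root.get("uri"), root.findtext("label"), HIVE_UNIT)


def narrower_uris(xml_text):
    """List the URIs of the narrower concepts of a HIVE concept."""
    root = ET.fromstring(xml_text)
    return [node.get("uri") for node in root.findall("narrower")]


avu = concept_to_avu(CONCEPT_XML)
print(avu)
print(narrower_uris(CONCEPT_XML))
```

Because findtext("label") looks only at direct children of the root element, the labels nested inside broader and narrower elements are not confused with the concept's own label.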

5.12.1 Validate HIVE vocabularies

An example validation rule utilizes the REST service to iterate over iRODS collections, validating the terms as being valid SKOS references, and generating a report on invalid terms.

No input variables are used.

The session variables are: $rodsZoneClient, $userNameClient

The policy uses persistent state information: COLL_NAME

The operations that are performed are: foreach, if, msiCurlGetStr, msiCurlUrlEncodeString, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-validateOntologies.r

Here is an example output when two data objects are annotated, one with an invalid term:

    test1@ubuntu:~/workspace/rule_workbench$ irule -F validate_data_object_ontologies.r
    Metadata validation report
    /fedZone1/home/rods/hive/libmsiCurlGetObj.cpp has uri http://purl.org/astronomy/uat#TT888 that is not in a valid ontology
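
The validation loop can be sketched in Python as follows. The REST lookup is replaced by a caller-supplied function, since the service endpoint is not given in the text; the names invalid_terms and encode_concept_uri, the set of known URIs, and the first object path are hypothetical. The encoding helper shows why a step such as msiCurlUrlEncodeString is needed: concept URIs contain characters like '#' and '/' that must be escaped before they are placed in a query string.

```python
from urllib.parse import quote


def encode_concept_uri(uri):
    """Percent-encode a concept URI so it can be embedded in a query string."""
    return quote(uri, safe="")


def invalid_terms(annotations, is_known_concept):
    """Return (object path, uri) pairs whose URI fails the lookup.

    annotations:      iterable of (object path, concept URI) pairs.
    is_known_concept: callable standing in for the REST lookup.
    """
    return [(path, uri) for path, uri in annotations
            if not is_known_concept(uri)]


known = {"http://purl.org/astronomy/uat#T100"}
annotations = [
    ("/fedZone1/home/rods/hive/a.txt",
     "http://purl.org/astronomy/uat#T100"),
    ("/fedZone1/home/rods/hive/libmsiCurlGetObj.cpp",
     "http://purl.org/astronomy/uat#TT888"),
]
bad = invalid_terms(annotations, lambda uri: uri in known)
for path, uri in bad:
    print(path, "has uri", uri, "that is not in a valid ontology")
print(encode_concept_uri("http://purl.org/astronomy/uat#T100"))
```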

5.13 Export Datasets in Multiple Formats (Policy 30)

The motivation for changing the format of a file may be to create a standard representation for preservation, or to create a preferred format for display. The ability to export datasets, or make them available for download, in multiple formats such as Excel, CSV, SPSS, or Stata addresses both future and immediate user needs. Other sciences would use other formats, but the issue is the same: being able to move into and out of open and proprietary formats to aid preservation.

5.13.1 Polyglot Format Conversion

This policy invokes the NCSA Polyglot service to transform a data format. The original file is replaced with the modified file, and metadata attributes are updated. If an attribute named “ConvertMe” is present on the file, the file is converted. The name of the original file is then added as metadata.

The policy implements a constraint:

    Applied at the acPostProcForModifyAVUMetadata policy enforcement point
    Checks that the attribute name is “ConvertMe”

The input variables are:

    *Option     not used
    *ItemType   not used
    *ItemName   File or collection name
    *Aname      Attribute name
    *Avalue     Attribute value
    *Aunit      Attribute units

The policy functions are: deleteAVUMetadata, modAVUMetadata

The operations that are performed are: if, irods_curl_get

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForModifyAVUMetadata.re.

5.14 Check for viruses (Policy 31)

All files in a staging area can be checked for the presence of a virus. When the check is complete, the files can then be moved into a collection. This uses the clamscan virus check routine, which is run as an external executable. The clamscan program must be installed in the /usr/bin directory on the iRODS server where the staging area is located.

5.14.1 Scan files and flag infected objects

This rule runs the clamscan script on an external resource, which checks for the presence of viruses. Each file is flagged with a metadata attribute to record the status of the virus check. The clamscan Python script is:

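    #!/usr/bin/python
    # Run clamscan with the given arguments and relay its output.
    import subprocess
    import sys

    proc = subprocess.Popen(['/usr/local/bin/clamscan'] + sys.argv[1:],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)  # read output as text
    sys.stdout.write(proc.communicate()[0])
    sys.stdout.flush()
    sys.exit(abs(proc.returncode))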

The controlling policy can be invoked interactively, or added to the rule base and invoked after each file load.

The policy implements a constraint:

    Applied at the acScanFileAndFlagObject policy enforcement point

The input parameters are:

    *Objpath    iRODS file that is scanned
    *FilePath   Physical location of iRODS file
    *Resource   Resource holding physical copy of iRODS file

The operations that are performed are: if, msiAddKeyVal, msiAssociateKeyValuePairsToObj, msiExecCmd, msiGetStdoutInExecCmdOut, msiGetSystemTime

The rule is available at http://github.com/DICE-UNC/policy-workbook/acScanFileAndFlagObject.re.
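
The flagging step can be sketched in Python as a mapping from the clamscan exit status to a metadata pair. The exit codes (0 for no virus found, 1 for virus found, 2 for an error) are those documented in the clamscan manual page. The attribute naming convention VIRUS_SCAN_<status> and the function name scan_flag are hypothetical choices for this sketch and may differ from what the rule actually records.

```python
import time

# Exit codes documented in the clamscan manual page.
CLAMSCAN_STATUS = {0: "PASSED", 1: "FAILED", 2: "ERROR"}


def scan_flag(return_code, scan_time=None):
    """Build an (attribute, value) pair recording the outcome of a scan.

    The attribute name VIRUS_SCAN_<status> is a hypothetical convention;
    the value is the scan time in seconds since the epoch.
    Unknown exit codes are treated as errors.
    """
    status = CLAMSCAN_STATUS.get(return_code, "ERROR")
    if scan_time is None:
        scan_time = int(time.time())
    return ("VIRUS_SCAN_" + status, str(scan_time))


print(scan_flag(0, scan_time=1440460800))
print(scan_flag(1, scan_time=1440460800))
```

Recording the scan time alongside the status makes it possible to find files whose last check is older than a chosen review interval.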

5.15 Rule set management (Policy 32)

The iRODS data grid relies upon a distributed rule engine and distributed rule bases to implement policies. If a policy is changed, the revised rule base needs to be installed at each server location for consistency.

5.15.1 Deploy rule sets

This rule identifies the servers, and uploads a new version of the rule base to each server. The micro-services used by this rule are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices

The input variables are:

    *ruleBaseName   list(“core”)
    *targets        list(“localhost”)

No session variables are used. The policy uses no persistent state information.

The operations that are performed are: break, errorcode, failmsg, foreach, if, msiChksumRuleSet, msiMvRuleSet, msiReadRuleSet, msiRmRuleSet, msiRuleSetExists, remote, while, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-copyRule.r
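
The decision about which servers need a new rule base can be sketched in Python with a checksum comparison. This is an illustration only: the choice of SHA-256, the function names, the server names, and the in-memory dictionary standing in for remote reads are all assumptions, and the actual micro-services may compute and compare checksums differently.

```python
import hashlib


def rule_set_checksum(rule_text):
    """Return the SHA-256 hex digest of a rule base's text."""
    return hashlib.sha256(rule_text.encode("utf-8")).hexdigest()


def servers_needing_update(new_rule_text, deployed):
    """List servers whose deployed rule base differs from the new version.

    deployed: dict of server name -> text of the rule base installed there
              (a stand-in for reading the rule base from each server).
    """
    target = rule_set_checksum(new_rule_text)
    return sorted(
        server for server, text in deployed.items()
        if rule_set_checksum(text) != target
    )


new_core = "acPostProcForPut { }\n"
deployed = {
    "localhost": "acPostProcForPut { }\n",
    "resource1.example.org":
        "acPostProcForPut { writeLine(\"serverLog\", \"put\"); }\n",
}
stale = servers_needing_update(new_core, deployed)
print(stale)
```

The same comparison can be repeated after the upload to confirm that every server now holds the new version.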

5.16 Parse event trail for all persons accessing a collection (Policy 33)

The DFC DataBook system provides a way to record information about events that occur on files within the data grid. This policy is implemented in the rule base, such that events are automatically tracked across all clients. The policies are available in the file iRODS/server/config/reConfigs/databook.re. The policy set modifies each of the policy enforcement point rules to add event tracking. The attributes that are tracked are:

    ATTR_ID
    ATTR_HAS_VERSION
    ATTR_PREVIEW
    ATTR_THUMB_PREVIEW
    ATTR_CONTRIBUTOR
    ATTR_RELATED
    ATTR_REPLACED_BY
    ATTR_REPLACES
    ATTR_TITLE
    ATTR_DESCRIPTION

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/databook.re


6 Protected Data Policy Sets

The UNC requirements for management of protected data sets have been analyzed for development of computer actionable policies that can automate management tasks. The data management requirements are abstracted from the document,

https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf

The requirements are listed in Appendix E. Each requirement has been evaluated for the feasibility of creating a computer actionable policy that automates enforcement. Policies are also defined to verify that each requirement has been enforced.

A deep archive is proposed for managing data that contains “Protected” information at UNC. No access is permitted from the external world to the deep archive. Instead, processes running within the deep archive pull data records from a staging area. On the staging area, the data sets are checked for “Protected” information, encrypted, and stored into the deep archive, as shown in Figure 1.

The “Protected” records may also be archived at an “off-site” location such as the Texas Advanced Computing Center to minimize risk of data loss. The iRODS data grid authenticates every user, authorizes every operation, manages interactions with the storage systems, and creates an event database detailing every interaction. Policies can parse the event database to verify compliance with policies over time, track unauthorized access attempts, and track data corruption events. Group permissions are defined for access to the data to simplify user management. The tasks for protected data are listed in Table 2.

Figure 1. Federated data grids for a deep archive

Table 2. Protected data tasks requiring policy control

     1  Check for presence of PII on ingestion
     2  Check for viruses on ingestion
     3  Check passwords for required attributes
     4  Encrypt data on ingestion
     5  Encrypt data transfers
     6  Federation - control data copies (access control)
     7  Federation - manage remote data grid interactions (update rule base)
     8  Federation - periodically copy data
     9  Federation - manage data retrieval (update access controls)
    10  Generate checksum on ingestion
    11  Generate report of corrections to data sets or access controls
    12  Generate report for cost (time) required to audit events
    13  Generate report of types of protected assets present within a collection
    14  Generate report of all security and corruption events
    15  Generate report of the policies that are applied to the collections
    16  List all storage systems being used
    17  List persons who can access a collection
    18  List staff by position and required training courses
    19  List versions of technology that are being used
    20  Maintain document on independent assessment of software
    21  Maintain log of all software changes, OS upgrades
    22  Maintain log of disclosures
    23  Maintain password history on user name
    24  Parse event trail for all accessed systems
    25  Parse event trail for all persons accessing collection
    26  Parse event trail for all unsuccessful attempts to access data
    27  Parse event trail for changes to policies
    28  Parse event trail for inactivity
    29  Parse event trail for updates to rule bases
    30  Parse event trail to correlate data accesses with client actions
    31  Provide test environment to verify policies on new systems

Table 2 continued. Protected data tasks requiring policy control

    32  Provide test system for evaluating a recovery procedure
    33  Provide training courses for users
    34  Replicate data sets on ingestion
    35  Replicate iCAT periodically
    36  Set access approval flag
    37  Set access controls
    38  Set access restriction until approval flag is set
    39  Set approval flag per collection for enabling bulk download
    40  Set asset protection classifier for data sets based on type of PII
    41  Set flag for whether tickets can be used on files in a collection
    42  Set lockout flag and period on user name - counting number of tries
    43  Set password update flag on user name
    44  Set retention period for data reviews
    45  Set retention period on ingestion
    46  Track systems by type (server, laptop, router, ...)
    47  Verify approval flags within a collection
    48  Verify files have not been corrupted
    49  Verify presence of required replicas
    50  Verify that no controlled data collections have public or anonymous access
    51  Verify that protected assets have been encrypted

For each listed task, we demonstrate an iRODS policy that implements the associated data management functions.

6.1 Check for presence of PII on ingestion (Policy 34)

The BitCurator technology is able to parse binary images for personally identifiable information such as credit card numbers and social security numbers. The current implementation runs the BitCurator executable on the storage system holding the data. The BitCurator technology is described in section 5.8.

6.2 Check for viruses on ingestion (Policy 31)

All files in a staging area can be checked for the presence of a virus. When the check is complete, the files can then be moved into a collection. This uses the clamscan virus check routine, which is run as an external executable. The clamscan program must be installed in the /usr/bin directory on the iRODS server where the staging area is located.


6.2.1 Scan files and flag infected objects

The rule for invoking virus detection is listed in Section 5.14.1. The rule runs the clamscan script on an external resource, which checks for the presence of viruses. Each file is flagged with a metadata attribute to record the status of the virus check.

6.2.2 Migrate files that pass the virus check

A query can be made to the catalog to identify files that have passed the virus check. The good files are migrated to the archive, and the virus flag is reset.

No input variables are used. No session variables are used.

The policy uses persistent state information: COLL_NAME, DATA_NAME, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE

The operations that are performed are: foreach, if, msiAssociateKeyValuePairsToObj, msiDataObjRename, msiRemoveKeyValuePairsFromObj, msiString2KeyValPair, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-migrate-files.r

6.3 Check passwords for required attributes (Policy 35)

The policy enforcement point acCheckPasswordStrength checks password strength (added after iRODS 3.2), and is called when the admin or user sets a password. By default, this is a no-op, but the simple rule example below can be used to enforce a minimal password length. The password may also be required to contain at least one number. This check may be done by an external authentication manager instead of within iRODS.

The policy implements a constraint:

    Applied at the acCheckPasswordStrength policy enforcement point

The input parameters are:

    *password   Password

The operations that are performed are: fail, if, strlen, msiSplitPathByKey, succeed, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acCheckPasswordStrength.re.
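
The two checks described in this section, a minimum length and at least one digit, can be sketched in Python as follows. The function name and the default minimum of 8 characters are assumptions made for illustration; the text does not state a specific threshold, and the iRODS rule itself is written in the rule language rather than Python.

```python
def check_password_strength(password, min_length=8):
    """Return True if the password meets the two example requirements.

    min_length=8 is an assumed threshold for this sketch.
    """
    long_enough = len(password) >= min_length
    has_digit = any(ch.isdigit() for ch in password)
    return long_enough and has_digit


for candidate in ("correcthorse1", "short1", "nodigitshere"):
    verdict = "accepted" if check_password_strength(candidate) else "rejected"
    print(candidate, verdict)
```

In the rule, a rejected password would correspond to calling fail at the policy enforcement point, and an accepted one to succeed.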

6.4 Encrypt data on ingestion (Policy 36)

The iRODS data grid supports SSL encryption on data transfers. The same encryption can be accessed through a micro-service to encrypt data on storage. The example rule automates encryption on files submitted to the collection:

    /UNC-CH/home/HIPAA/Archive

The goal is to maintain data as an encrypted file during transport, as well as within storage. The rule is implemented as a policy that is enforced at the acPostProcForPut policy enforcement point. A flag is set on the file to denote that encryption has been done: the metadata attribute DATA_ENCRYPT is set to the value 1.

The policy implements a constraint:

    Applied at the acPostProcForPut policy enforcement point
    Checks that the collection is /UNC-CH/home/HIPAA/Archive

The session variables are: $objPath

The operations that are performed are: fail, if, msiAssociateKeyValuePairsToObj, msiEncrypt, msiSplitPath, msiString2KeyValPair

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-encrypt.re.
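
The constraint and the flag can be sketched in Python as follows. The path is split into a collection and a file name, as msiSplitPath does, and the collection is compared with the archive collection. The function names are hypothetical, and matching only the collection itself (not its sub-collections) is an assumption of this sketch.

```python
import posixpath

ARCHIVE_COLLECTION = "/UNC-CH/home/HIPAA/Archive"


def needs_encryption(obj_path, collection=ARCHIVE_COLLECTION):
    """Return True if obj_path is a file directly inside the collection."""
    parent = posixpath.dirname(posixpath.normpath(obj_path))
    return parent == collection


def encryption_flag():
    """Metadata recorded on the file once encryption has been done."""
    return {"DATA_ENCRYPT": "1"}


for path in ("/UNC-CH/home/HIPAA/Archive/scan.dat",
             "/UNC-CH/home/HIPAA/ArchiveOld/scan.dat"):
    print(path, needs_encryption(path))
```

Comparing the whole parent collection, rather than testing a string prefix, avoids matching a sibling collection whose name merely begins with the same characters.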

6.5 Encrypt data transfers (Policy 37)

The iRODS data grid can be set up to use SSL, and automatically encrypt data transfers. This is a configuration setting that is controlled by environment variables:

• irodsSSLCertificateChainFile (server) - the file containing the server's certificate chain. The certificates must be in PEM format and must be sorted starting with the subject's certificate (actual client or server certificate), followed by intermediate CA certificates if applicable, and ending at the highest level (root) CA.

• irodsSSLCertificateKeyFile (server) - private key corresponding to the server's certificate in the certificate chain file.

• irodsSSLDHParamsFile (server) - the Diffie-Hellman parameter file location.

• irodsSSLVerifyServer (client) - what level of server certificate based authentication to perform. 'none' means not to perform any authentication at all. 'cert' means to verify the certificate validity (i.e. that it was signed by a trusted CA). 'hostname' means to validate the certificate and to verify that the irodsHost's FQDN matches either the common name or one of the subjectAltNames of the certificate. 'hostname' is the default setting.

• irodsSSLCACertificateFile (client) - location of a file of trusted CA certificates in PEM format. Note that the certificates in this file are used in conjunction with the system default trusted certificates.

• irodsSSLCACertificatePath (client) - location of a directory containing CA certificates in PEM format. The files each contain one CA certificate. The files are looked up by the CA subject name hash value, which must hence be available. If more than one CA certificate with the same name hash value exists, the extension must be different (e.g. 9d66eef0.0, 9d66eef0.1 etc). The search is performed in the ordering of the extension number, regardless of other properties of the certificates. Use the 'c_rehash' utility to create the necessary links.
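
The client-side settings can be assembled and validated before launching an iRODS client, as in the following Python sketch. The variable names and the three verification levels come from the list above; the function name client_ssl_environment and the example file path are hypothetical.

```python
VERIFY_LEVELS = ("none", "cert", "hostname")


def client_ssl_environment(ca_file=None, ca_path=None, verify="hostname"):
    """Return a dict of client-side SSL environment variables.

    verify must be one of 'none', 'cert', or 'hostname' (the default).
    ca_file and ca_path are optional locations of trusted CA certificates.
    """
    if verify not in VERIFY_LEVELS:
        raise ValueError("irodsSSLVerifyServer must be one of: "
                         + ", ".join(VERIFY_LEVELS))
    env = {"irodsSSLVerifyServer": verify}
    if ca_file:
        env["irodsSSLCACertificateFile"] = ca_file
    if ca_path:
        env["irodsSSLCACertificatePath"] = ca_path
    return env


# The certificate path below is a placeholder, not a required location.
settings = client_ssl_environment(ca_file="/etc/irods/ssl/ca-bundle.pem")
for name, value in sorted(settings.items()):
    print("export {}={}".format(name, value))
```

The resulting dictionary can be merged into os.environ, or printed as export statements as shown, depending on how the client is started.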

6.6 Federation - control data copies (Policy 38)

A primary concern is that protected files in a federation retain appropriate access controls. One way to achieve this is to copy the metadata attributes for each file along with the data, and then run the same ACCESS_APPROVAL policies in the federated data grid. This rule copies access controls and metadata attributes for a file. It assumes that equivalent accounts exist in both data grids, and it requires upgrades to msiCopyAVUMetadata and msiLoadACLFromDataObj so that they support a federated data grid.

The rule uses the policy functions: checkCollInput, isData

The input variables are:

    *Coll   a relative collection name
    *Zone   a zone name

The session variables are: $rodsZoneClient, $userNameClient

The policy uses persistent state information: COLL_ID, COLL_NAME, DATA_ACCESS_DATA_ID, DATA_ACCESS_TYPE, DATA_ACCESS_USER_ID, DATA_ID, DATA_NAME, META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS, META_DATA_ATTR_VALUE, TOKEN_ID, TOKEN_NAME, TOKEN_NAMESPACE, USER_NAME, USER_ZONE

The operations that are performed are: fail, foreach, if, msiDataObjCopy, msiDataObjUnlink, msiSetACL, msiSetAVU, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateFiwalkRule.r

6.7 Federation - manage remote data grid interactions (Policy 32)

When two data grids are federated, decisions have to be made about compatibility of the data management policies. If the desire is to have both data grids implement the same policies, then the policies from the UNC grid will need to be loaded into the federated data grid. This is of particular importance for ensuring:

• Access controls
• Retention flags
• Protected information
• Encryption
• Approval flags

6.7.1 Updating rule base across servers

The rule engine in iRODS reads a local copy of the rule base to improve performance. Coordination of the multiple rule bases is needed when policies are updated. This rule set, developed by Chris Smith, stores the rules in the iCAT metadata catalog, extracts rules from the catalog into a file, and then updates each of the server rule bases.

6.7.1.1 Storing rules in the DB from a source file

This rule is run on the master iCAT. It reads a file to load rules into the iCAT catalog. Once rules are loaded, they can be versioned but not deleted.

The input variables are:

  *inFileName    an input file
  *ruleBase      a rule base

No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

  msiAdmInsertRulesFromStructIntoDB, msiAdmReadRulesFromFileIntoStruct

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsStore.r

6.7.1.2 Prime the iCAT's rule base

This rule is run on the master catalog. Rules are retrieved from the iCAT catalog and written into a file for distribution.

The input variables are:

  *outFileName   a file name
  *rloc          a host name
  *ruleBase      a rule base

No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

  if, msiAdmRetrieveRulesFromDBIntoStruct, msiAdmWriteRulesFromStructIntoFile, remote

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsApply.r

6.7.1.3 Push rules to resource servers

This rule pushes the rules to all the resource servers. For servers that don't host resources, a separate rule will need to be run at each server to prime the local rule base from the iCAT catalog.

The input variables are:

  *outFileName   a file name
  *ruleBase      a rule base

No session variables are used. The policy uses persistent state information:

  RESC_LOC

The operations that are performed are:

  foreach, if, msiAdmRetrieveRulesFromDBIntoStruct, msiAdmWriteRulesFromStructIntoFile, msiGetContInxFromGenQueryOut, msiGetMoreRows, msiExecGenQuery, msiGetValByKey, msiMakeGenQuery, remote, while, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsPush.r

A second approach is to allow the federated data grid to implement a separate set of policies, but restrict file exchange between the data grids to data that does not require protection. This can be controlled by forcing all data exchanges to be done with data that have anonymous access. This restriction is implemented by not allowing any member of the federated data grid to have an account in the UNC data grid. This minimizes the opportunity to give inappropriate access to data within the UNC data grid.

6.8 Federation - Copy Data from staging area (Policy 20)

Files can be staged between two data grids. This rule recursively copies files from a staging area into a second data grid, checks that the files do not already exist in the second data grid, verifies checksums after the copy, and sets access permissions.

The rule uses the policy functions:

  checkCollInput, checkRescInput, createLogFile, findZoneHostName, isColl

The input variables are:

  *Dest       a collection name
  *DestZone   a zone name
  *Owner      a user name
  *Res        a storage resource
  *Src        a collection name

The session variables are:

  $rodsZoneClient, $userNameClient

The policy uses persistent state information:

  COLL_ID, COLL_NAME, DATA_CHECKSUM, DATA_MODIFY_TIME, DATA_NAME, RESC_ID, RESC_NAME, ZONE_CONNECTION, ZONE_NAME

The operations that are performed are:

  fail, foreach, if, msiCollCreate, msiDataObjChksum, msiDataObjCopy, msiDataObjCreate, msiGetSystemTime, msiSetACL, msiSplitPathByKey, remote, select, strlen, substr, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-stageFederation.r
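The checksum verification step of the staging copy can be sketched outside iRODS as follows. This is only an illustration in Python, not the rule itself: the two dictionaries stand in for the staging area and the second data grid, the function names are hypothetical, and SHA-256 is an assumed digest that may differ from the checksum scheme configured in a given grid.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest used to compare a source file with its copy (assumed SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(source, destination):
    """Return the paths whose copy is missing or has a different checksum.

    source and destination map a relative path to the file content.
    """
    failed = []
    for path, data in source.items():
        copy = destination.get(path)
        if copy is None or checksum(copy) != checksum(data):
            failed.append(path)
    return sorted(failed)
```

An empty result means every staged file arrived intact; any listed path would need to be copied again before access permissions are set.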

6.9 Federation - manage data retrieval (Policy 39)

Inappropriate data retrieval from a federation can be controlled by applying the same access controls and policies across the federated data grid. This is necessary because the federated data grid can be accessed directly, independently of the original data grid. If access is done through the original data grid, accounts can be established in the federated data grid to control data retrieval. The accounts reference the original data grid:

  Account name
  UNC-HIPAA#HIPAA
  UNC-HIPAA#public
  UNC-HIPAA#gridAdmin

Access controls can then be applied in the federated data grid for each account in the original data grid.

This rule generates a pipe-delimited file of user accounts in the data grid. The rule uses the policy functions:

  checkRescInput, createLogFile, findZoneHostName, isColl

The input variables are:

  *Accounts   a user name
  *Res        a storage resource

The session variables are:

  $rodsZoneClient, $userNameClient

The policy uses persistent state information:

  COLL_ID, COLL_NAME, RESC_ID, RESC_NAME, USER_NAME, USER_TYPE, ZONE_CONNECTION, ZONE_NAME

The operations that are performed are:

  fail, foreach, if, msiCollCreate, msiDataObjClose, msiDataObjCreate, msiGetSystemTime, msiSplitPathByKey, remote, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-create-accounts.r

This rule reads an Account file to generate new accounts. Note that the account file needs to be copied into the federated data grid. The command must also be run in the federated data grid. The account names are created in the form

  User_name#zone_name

Note that the micro-service msiCreateUserAccountsFromDataObj is used to load the accounts. This micro-service is not yet ported to iRODS version 4.2.

The rule uses the policy function:

  checkPathInput

The input variables are:

  *Path   a file path name

The session variables are:

  $rodsZoneClient

The policy uses persistent state information:

  COLL_NAME, DATA_ID, DATA_NAME

The operations that are performed are:

  fail, foreach, if, msiCreateUserAccountsFromDataObj, msiSplitPath, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-accountImport.r
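The User_name#zone_name form can be illustrated with a small Python sketch. The helper names qualified_account and split_account are hypothetical and are not part of the rule or of iRODS; the sketch only shows how the two parts are joined and separated.

```python
def qualified_account(user_name, zone_name):
    """Build an account name in the form User_name#zone_name."""
    if not user_name or not zone_name:
        raise ValueError("user name and zone name must be non-empty")
    if "#" in user_name or "#" in zone_name:
        raise ValueError("'#' is reserved as the separator")
    return f"{user_name}#{zone_name}"


def split_account(account):
    """Split User_name#zone_name into (user_name, zone_name)."""
    user_name, separator, zone_name = account.partition("#")
    if not separator or not user_name or not zone_name:
        raise ValueError(f"not of the form User_name#zone_name: {account!r}")
    return user_name, zone_name
```

Rejecting names without a zone part keeps local accounts from being mistaken for accounts that reference the original data grid.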

6.10 Generate checksum on ingestion (Policy 40)

A checksum is generated for every file that is put into the data grid. The policy implements a constraint:

  Applied at the acPostProcForPut policy enforcement point

The operations that are performed are:

  msiSysChksumDataObj

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-checksum.re

6.11 Generate report of corrections to data sets or access controls (Policy 41)

The audit log can be parsed to identify all changes to data sets or access controls. We assume that any file for which a new version has been created constitutes a correction to a data set.

The auditing capability depends on a set of external services and rules. The following services are used: Elasticsearch, OSGi, and AMQP. ServiceMix provides both OSGi and AMQP. On the iRODS server, auditing requires a list of iRODS rules, and client libraries for sending messages to the AMQP service. In addition, networking on the servers running these services must be configured to allow these services to communicate.

The rules that need to be installed include: databook_pep.re, databook.re, and amqp.re. The rule set databook_pep.re overrides the default iRODS PEPs so that messages are sent for auditing. This has the limitation that if you already have customized PEPs, you have to manually edit them. Alternatively, starting from iRODS 4.2, you can install the auditing plugin, which will allow you to avoid changing your customized PEPs. The rule set databook.re provides the main functionality for auditing. The rule set amqp.re provides rules for interacting with AMQP. In addition, Python libraries are used to send messages to AMQP. These can be set up using an automated setup script from the source repository, although customizing the script is usually necessary in order to achieve a particular setup.

Once the auditing services are installed, all system access information is stored in an Elasticsearch index. The index can be queried. An administrator can retrieve events based on the following parameters:

  fromDate   from which date
  toDate     to which date
  event      the event
  pid        URI filter
  start      starting index
  count      how many results to return

A Java program is used to interact with Elasticsearch. The following example generates the number of access events per file for reporting to DataONE. The results can be limited to a date range. The EventsEnum defines which type of event to monitor. The types of events that are monitored are listed in org.dataone.service.types.v1.Event.

  put             data object put
  get             data object get
  overwrite       data object overwrite
  delete          data object delete
  replicate       data object replicate
  synch_failure   data object synch_failure

The program is available at https://github.com/DICE-UNC/policy-workbook/blob/master/dfc-elasticsearch.java
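The per-file counting performed by that program can be sketched in Python under some assumptions. The sketch works on event records that have already been retrieved from the index; the record keys event and pid follow the query parameters listed above, while the date key, the function name, and the sample paths are hypothetical. It does not contact Elasticsearch.

```python
from collections import Counter
from datetime import date


def count_events(events, event_type, from_date, to_date):
    """Count events of one type per file (pid) within an inclusive date range."""
    counts = Counter()
    for record in events:
        in_range = from_date <= record["date"] <= to_date
        if record["event"] == event_type and in_range:
            counts[record["pid"]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"event": "get", "pid": "/zone/home/a", "date": date(2015, 8, 1)},
        {"event": "put", "pid": "/zone/home/a", "date": date(2015, 8, 2)},
    ]
    print(count_events(sample, "get", date(2015, 8, 1), date(2015, 8, 31)))
```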

6.12 Generate report for cost (time) required to audit events (Policy 42)

This rule queries the event index to identify the amount of time needed to run an audit. The execution time of the Java program for accessing Elasticsearch is saved to create the cost report.

6.13 Generate report of types of protected assets (Policy 43)

A summary report can be generated that counts the number of files within a collection for each type of asset classifier:

  1 - Protected Health Information - PHI
  2 - Personally Identifiable Information - PII, such as social security numbers
  3 - Payment Card Information - PCI, such as account numbers, cardholder name, expiration date, service code, CID, PINs
  4 - Legally restricted data - classified
  5 - Proprietary information

The rule uses the policy function:

  checkCollInput

The input variables are:

  *Coll   a collection name

No session variables are used. The policy uses persistent state information:

  COLL_ID, COLL_NAME, DATA_ID, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE

The operations that are performed are:

  fail, foreach, if, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-asset-report.r

6.14 Generate report of all security and corruption events (Policy 44)

The audit log can be parsed to identify all access events, and correlate the access with an authentication event. If an access event cannot be correlated to an authentication event, a possible security event can be logged. For corruption events, use the policy in Section 14. This identifies and lists all files that have been corrupted.

6.15 Generate report of the policies applied to collections (Policy 45)

Within the iRODS data grid, policies are stored in the iCAT metadata catalog. The policies are versioned, such that each policy change creates a new version. The policies can be extracted from the catalog, distributed to each site where data are stored, and instantiated as a distributed rule base that controls operations within the data grid. The iRODS data grid relies upon a distributed rule engine and distributed rule bases to implement policies. If a policy is changed, for consistency the revised rule base needs to be installed at each server location.

6.15.1 Deploy rule sets

This rule identifies the servers, and uploads a new version of the rule base to each server. The micro-services used by this rule are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices

The input variables are:

  *ruleBaseName   a list of rule bases
  *targets        a list of hosts

The policy functions include:

  writeRuleSet

No session variables are used. The policy does not use persistent state information.

The operations that are performed are:

  errorcode, foreach, if, msiChksumRuleSet, msiReadRuleSet, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-deploy-rules.r

6.15.2 Update rule sets

This policy function reads and writes rule sets that have been deposited into the iCAT catalog. The micro-services used by this policy function are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices. The policy functions include:

1. writeRuleSet
   This includes functions to write and checksum rule sets.

  *rbs     a list of rule bases
  *addrs   a list of host addresses

2. backupRuleSet
   This creates a rule set backup.

  *rb     a rule base
  *rbak   a rule base

The policy functions are available at
http://github.com/DICE-UNC/policy-workbook/hipaa-write-rules.r
http://github.com/DICE-UNC/policy-workbook/hipaa-backup-rules.r

6.15.3 Print rule sets

This rule prints the rule set used by iRODS by listing the core.re file.

No input variables are used. No session variables are used. The policy does not use persistent state information.

The operations that are performed are:

  msiAdmShowIRB

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-print-rules.r

6.16 List all storage systems being used (Policy 46)

This rule lists the storage systems that are attached to the data grid.

No input variables are used. No session variables are used. The policy uses persistent state information:

  RESC_NAME

The operations that are performed are:

  foreach, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-storage.r

6.17 List persons who can access a collection (Policy 47)

For the specified collection, a list is generated of all persons who have access to files in the collection.

The rule uses the policy function:

  checkCollInput

The input variables are:

  *Coll   a collection name

No session variables are used. The policy uses persistent state information:

  COLL_ID, COLL_NAME, DATA_ACCESS_DATA_ID, DATA_ACCESS_ID, DATA_ACCESS_USER_ID, DATA_ID, DATA_NAME, USER_NAME, USER_ID

The operations that are performed are:

  fail, foreach, if, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-access.r

6.18 List staff by position and required training courses (Policy 48)

A list of all persons with accounts in the data grid can be generated. The USER_INFO field can be used to annotate the staff position and the last training course through XML tags:

USER_INFO="<Position>staff</Position><Training>course</Training>"
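A report generator would need to pull the position and training values back out of such a USER_INFO string. The Python sketch below shows one way to do that, assuming the value is a flat sequence of XML elements as in the example above; the function name and the temporary wrapper element are hypothetical and exist only to make the fragment parseable.

```python
import xml.etree.ElementTree as ET


def parse_user_info(user_info):
    """Return a dict of tag -> text for a flat USER_INFO fragment."""
    # The fragment has no single root element, so wrap it in a temporary one.
    root = ET.fromstring(f"<info>{user_info}</info>")
    return {child.tag: (child.text or "") for child in root}


if __name__ == "__main__":
    info = parse_user_info("<Position>staff</Position><Training>course</Training>")
    print(info["Position"], info["Training"])
```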

6.18.1 Set position and training

This policy modifies existing user accounts according to information in an iRODS object. The format of the account file is:

  user-name|field|new-value

where valid fields include:

  type, zone, comment, info, password

A file containing the desired updates is loaded into the Reports directory.

The rule uses the policy function:

  checkPathInput

The input variables are:

  *Path   a file path name

No session variables are used. The policy uses persistent state information:

  COLL_NAME, DATA_ID, DATA_NAME

The operations that are performed are:

  fail, foreach, if, msiLoadUserModsFromDataObj, msiSplitPath, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-update-user-info.r
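Before an account file is loaded, it can be useful to check it against the user-name|field|new-value format. The following Python sketch does this outside iRODS. The function name is hypothetical, and the set of fields is limited to those named above, which the text presents as examples rather than a complete list.

```python
# Fields named in the workbook; the real micro-service may accept others.
VALID_FIELDS = {"type", "zone", "comment", "info", "password"}


def parse_user_mods(text):
    """Parse lines of the form user-name|field|new-value into tuples."""
    mods = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two pipes only, so the new value may contain '|'.
        parts = line.split("|", 2)
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected user-name|field|new-value")
        user_name, field, new_value = parts
        if field not in VALID_FIELDS:
            raise ValueError(f"line {number}: unknown field {field!r}")
        mods.append((user_name, field, new_value))
    return mods
```

Validating the file first means a malformed line is reported with its line number instead of failing part-way through the account updates.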

6.18.2 List staff by position and training

A report of all staff positions and the latest training can be generated.

No input variables are used. No session variables are used. The policy uses persistent state information:

  USER_INFO, USER_NAME, USER_TYPE

The operations that are performed are:

  foreach, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-training.r

6.19 List versions of technology that are being used (Policy 49)

A report can be kept in the data grid that identifies the current versions of the hardware and software technologies used in the preservation environment. This policy defines the collection location and file name used for the report.

  Technology report name   TechVersionReport
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports

No input variables are used. No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

  msiDataObjGet

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-tech-report.r

In iRODS version 4.x, technologies are plugged into the iRODS framework. By listing all plug-ins, the versions of all hardware and software systems can be automatically tracked. The izonereport command generates a JSON file that lists the entire iRODS Zone configuration information. The command izonereport validates the information against the schemata found at https://schemas.irods.org.

6.20 Maintain document on independent assessment of software (Policy 50)

The report on software assessment can be managed within the data grid. This policy retrieves the specified document from the Reports directory.

  Software assessment report name   softwareAssessment
  Collection name                   Reports
  Location                          /UNC-CH/home/HIPAA/Reports

No input variables are used. No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

  msiDataObjGet

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-assessment-report.r

6.21 Maintain log of all software changes, OS upgrades (Policy 51)

The log of software changes is maintained by the data grid operators. This policy defines the collection location and file name used for the report.

  Technology report name   LogSoftwareChanges
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports

No input variables are used. No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

  msiDataObjGet

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-store-log.r

6.21.1 Version log files

Each version of a log file can be tracked. When a file is added to the system, a version labeled by the current time stamp is saved, ensuring that a history of changes can be maintained. The version is moved to an archive directory.

The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the time stamp before the extension when the version name is created. The ownership of the file is set to the hipaaAdmin account. The rule is listed in section 4.7.1.
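The file name handling described above can be sketched in Python. This is an illustration only and not the rule from section 4.7.1: the function name is hypothetical, and both the time stamp format and the dot used to join it to the name are assumptions, since the workbook does not specify them here.

```python
import os
from datetime import datetime


def versioned_name(file_name, when):
    """Insert a time stamp before the extension of file_name."""
    stem, extension = os.path.splitext(file_name)
    stamp = when.strftime("%Y%m%d%H%M%S")  # assumed time stamp format
    # A name without an extension simply gets the stamp appended.
    return f"{stem}.{stamp}{extension}"


if __name__ == "__main__":
    print(versioned_name("LogSoftwareChanges.txt", datetime.now()))
```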

6.22 Maintain log of disclosures (Policy 52)

A disclosure log identifies all events associated with unauthorized access to files. The ways this may happen include:

  Incorrect setting of access controls on the files in a collection. One way to detect this is to log all files in a collection that do not have ACCESS_APPROVAL set to 1, but have anonymous or public access.

  Direct reading of the file on disk without going through the data grid. This may happen when a security vulnerability is present within the operating system that has not been patched. Detection of this type of access requires parsing the system log for the computer.

  Unauthorized use of an account. This requires that the unauthorized user learn the password associated with the account. This may happen when a password is shared or stolen. Detection of this type of access requires interaction with the account owner to determine whether they made the access.

In all three cases, a report can be generated that is updated externally to the data grid. The report can be stored in the data grid with versioning enabled, and deletion turned off. The version is stored in Reports/Backup. This policy defines the collection location and file name used for the report.

  Technology report name   DisclosureReport
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports
  Version                  /UNC-CH/home/HIPAA/Reports/Backup

A rule to store the report uses the policy functions:

  checkRescInput, findZoneHostName

The input variables are:

  *destRescName   a storage resource

The session variables are:

  $rodsZoneClient

The policy uses persistent state information:

  COLL_NAME, DATA_ID, DATA_NAME, RESC_ID, RESC_NAME, ZONE_CONNECTION, ZONE_NAME

The operations that are performed are:

  fail, foreach, if, msiDataObjPut, msiSplitPathByKey, msiStoreVersionWithTS, remote, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-version-report.r

To turn off deletion on collection /UNC-CH/home/HIPAA/Reports, set the policy enforcement point acDataDeletePolicy. The policy implements a constraint:

  Applied at the acDataDeletePolicy policy enforcement point
  A check is made that the object path is like "/UNC-CH/home/HIPAA/Reports/*"

The operations that are performed are:

  msiDeleteDisallowed

The rule is available at http://github.com/DICE-UNC/policy-workbook/acDataDeletePolicy.re.
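The path check in the deletion constraint can be modeled in Python as a wildcard match. The sketch below is an approximation rather than the iRODS rule: it uses the standard library's shell-style matching as a stand-in for the rule language's "like" operator, and the function name and sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the constraint described in the text.
PROTECTED_PATTERN = "/UNC-CH/home/HIPAA/Reports/*"


def deletion_allowed(object_path):
    """Return False for objects under the protected Reports collection."""
    # fnmatchcase is case-sensitive and its '*' also matches '/',
    # so objects in sub-collections such as Reports/Backup are protected too.
    return not fnmatchcase(object_path, PROTECTED_PATTERN)
```

With this pattern, versions stored in Reports/Backup are covered by the same restriction as the report itself.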

6.23 Maintain password history on user name (Policy 53)

A history of prior passwords can be kept as events in an external index. The challenge is that the current design does not generate an identity for the user until after acCheckPasswordStrength has been executed. One approach is to check the password history after the user name is defined, within the acSetPublicUserPolicy enforcement point. Metadata attributes for the prior passwords can then be checked. If a similar prior password is found, a request to change the password can be made and the rule can fail. The metadata attributes are:

  META_USER_ATTR_NAME    PasswordHist
  META_USER_ATTR_VALUE   prior password
  META_USER_ATTR_UNITS   Set to 0 for current password

This policy loads passwords as attributes on the USER_NAME. The policy implements a constraint:

  Applied at the acSetPublicUserPolicy policy enforcement point

The session variables that are used are:

  $userNameClient

The operations that are performed are:

  foreach, if, msiAssociateKeyValuePairsToObj, msiRemoveKeyValuePairsFromObj, msiString2KeyValPair, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy.re.

6.24 Parse event trail for all accessed systems (Policy 54)

The audit log can be queried to identify all accesses to the repository. For each access, the storage resource can be specified. The results can be summarized to identify all of the storage resources that were accessed.

6.25 Parse event trail for all persons accessing collection (Policy 33)

The audit log can be queried to identify all accesses to files in a collection. For each access, the identity of the account making the request is known. The results can be summarized to identify all persons who accessed the collection. See section 5.16.

6.26 Parse event trail for all unsuccessful attempts to access data (Policy 55)

Each access of the data grid is authenticated. If the authentication fails, an event can be generated if the requested operation was a read attempt. The audit log can then be queried to identify all unsuccessful access attempts to files in a collection. The results can be summarized to identify the accounts that had unsuccessful access attempts.

6.27 Parse event trail for changes to policies (Policy 56)

The iRODS data grid can maintain an event database that lists all events associated with managing or accessing the data system. The policies that record events generate messages that are sent to an external indexing system. By searching in the external index, events associated with the policy enforcement points can be identified:

  pep_PLUGINOPERATION_pre
  pep_PLUGINOPERATION_post

Changes to policies should be saved in the iCAT catalog as rule versions using the micro-services

  msiAdmReadRulesFromFileIntoStruct
  msiAdmInsertRulesFromStructIntoDB

The corresponding events in the event database are:

  pep_msiAdmReadRulesFromFileIntoStruct_pre
  pep_msiAdmReadRulesFromFileIntoStruct_post
  pep_msiAdmInsertRulesFromStructIntoDB_pre
  pep_msiAdmInsertRulesFromStructIntoDB_post
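The naming pattern above lends itself to a small filter over event names returned from the index. The Python sketch below derives the four event names from the two micro-service names and keeps only matching events; the function names and the idea of events arriving as plain strings are assumptions made for illustration.

```python
# Micro-services that the text names for saving rule versions.
RULE_VERSION_MICROSERVICES = (
    "msiAdmReadRulesFromFileIntoStruct",
    "msiAdmInsertRulesFromStructIntoDB",
)


def pep_names(operation):
    """Return the pre and post event names for one operation."""
    return (f"pep_{operation}_pre", f"pep_{operation}_post")


def policy_change_events(event_names):
    """Keep only the events that correspond to policy changes, in order."""
    wanted = {name for op in RULE_VERSION_MICROSERVICES for name in pep_names(op)}
    return [name for name in event_names if name in wanted]
```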

A query is issued against the event index through a libcurl call. The operations that are performed are:

  msiCurlGetStr, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-issue-url.r.

6.28 Parse event trail for inactivity (Policy 57)

Each access of the data grid is treated as a separate session. The user is authenticated and the operation is authorized. When the requested operation completes, the session is terminated. Thus users are only "logged" into the data grid while they are applying operations on their data, and cannot remain logged in while idle. There is the possibility of long-running operations, such as validating checksums for all files in a collection. However, these are expected uses of the system.

6.29 Parse event trail for updates to rule bases (Policy 58)

The audit log can be queried to identify all updates made to the policies. Events can be generated that correspond to execution of the micro-service that creates new versions of rules that are registered into the iCAT catalog. The results can be written to a file or printed.

6.30 Parse event trail to correlate data accesses with client actions (Policy 59)

Events can be generated for accesses that include the type of client API that was used. Each client API interacts through a plug-in that can track usage events. Events that are tracked include:

  data obj read
  data object update
  data object overwrite
  data object put
  data object get
  data obj write
  data obj create
  data obj remove

6.31 Provide test environment to verify policies on new systems (Policy 60)

The test environment should be an independent iRODS data grid with a separate iCAT catalog, separate storage servers, and disjoint user accounts. The directory structure should be similar to the production environment. This policy downloads the rules from the test environment, and stores them in a file. We assume the following:

  Test zone is called             uncTestZone
  Admin account is called         uncTestAdmin
  Test zone rule base is called   TestBase
  Rule file is called             NewRules

The input variables are:

  *FileName   a file name in 'server/config/reConfigs/' directory with an .re extension
  *RuleBase   a rule base name

No session variables are used. No persistent state information is used.

The operations that are performed are:

  msiGetRulesFromDBIntoStruct, msiAdmShowIRB, msiAdmWriteRulesFromStructIntoFile

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-export-policies.r

A second rule reads the rules from the file NewRules and loads them into the production iCAT catalog. The input variables are:

  *FileName   a file name in 'server/config/reConfigs/' directory with an .re extension
  *RuleBase   a rule base name

No session variables are used. No persistent state information is used.

The operations that are performed are:

  msiAdmInsertRulesFromStructIntoDB, msiAdmReadRulesFromFileIntoStruct, msiAdmShowIRB

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-import-policies.r

6.32 Provide test system for evaluating a recovery procedure (Policy 61)

A test system would ideally contain a complete set of records from the original data grid, including an up-to-date copy of the metadata catalog. A recovery procedure would then need to do the following steps:

  Recreate the iCAT catalog from the test system. This would set accounts, define storage resources, define file names, and define collections.
  A checksum on the files would then be run to detect any corrupted files.
  Corrupted files would be replaced from the test system.

A replication rule could be run to detect problems. If one of the replicas in the original data grid is still good, this should be sufficient. However, if no good replicas exist, then the file will need to be replaced from the test system. A replication rule is listed in section 4.5.2.

6.33 Provide training courses for users (Policy 62)

Information about training courses can be kept in a separate database. For each staff position, a set of required training courses can be defined. The list of required courses can be compared with the courses that were taken, and stored as USER_INFO.

6.34 Replicate data sets on ingestion (Policy 13)

When a file is put into the collection /UNC-CH/home/HIPAA/Archive, it will be replicated to a second storage system. The rule is enforced at the acPostProcForPut policy enforcement point. The policy implements a constraint:

  Applied at the acPostProcForPut policy enforcement point
  Checks that the collection is like "/UNC-ARCHIVE/home/Archive/*"

The session variables that are used are:

  $objPath

The operations that are performed are:

  msiSysReplDataObj

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-replicate.re.

6.35 Replicate iCAT periodically (Policy 63)

A typical approach to ensuring that the metadata attributes are appropriately backed up is to set up a mirror catalog, and use dynamic updates to the mirror catalog to maintain an active copy. This approach works as long as there are no errors in the original catalog. To enable recovery from propagated errors, an independent snapshot of the catalog can be periodically created. This provides a second recovery mechanism in case both catalogs are compromised. In addition to replication, the catalog indices need to be periodically optimized. This improves performance.

6.36 Set access approval flag (Policy 64)

This rule sets the ACCESS_APPROVAL flag to 1, and enables access by public and anonymous users.

The rule uses the policy functions:

  addAVUMetadata, checkCollInput, deleteAVUMetadata

The input variables are:

  *Coll   a collection name

No session variables are used. The policy uses persistent state information:

  COLL_ID, COLL_NAME, DATA_ACCESS_DATA_ID, DATA_ACCESS_USER_ID, DATA_ID, DATA_NAME, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, USER_ID, USER_NAME

The operations that are performed are:

  fail, foreach, if, msiRemoveKeyValuePairsFromObj, msiSetACL, msiString2KeyValPair, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-access-set.r

6.36.1 Restrict access for "Protected" data

Each collection that contains "Protected" information will have an Approval flag, called

  ACCESS_APPROVAL

When the value of this attribute is set to "0", no public or anonymous access is allowed to files within the collection. This rule sets the ACCESS_APPROVAL flag to 0 for every file in a collection, and restricts access by public and anonymous accounts.

The rule uses the policy functions:

  addAVUMetadata, checkCollInput, deleteAVUMetadata

The input variables are:

  *Coll   a collection name

No session variables are used. The policy uses persistent state information:

  COLL_ID, COLL_NAME, DATA_ACCESS_DATA_ID, DATA_ACCESS_USER_ID, DATA_ID, DATA_NAME, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, USER_ID, USER_NAME

The operations that are performed are:

  fail, foreach, if, msiRemoveKeyValuePairsFromObj, msiSetACL, msiString2KeyValPair, select, writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-restrict-access.r

6.37 Set access controls (Policy 14)

This rule keeps users from seeing the names of other users’ files. The rule sets the Access Control List policy. If the rule is not called, or is called with an argument other than STRICT, the STANDARD setting is in effect, which is fine for many sites. By default, users are allowed to see certain metadata, for example the data-object and sub-collection names in each other's collections. When access controls are made STRICT by calling msiAclPolicy(STRICT), the General Query Access Control is applied to collection and data-object metadata, which means that the list command, ils, will need 'read' access or better to the collection to return the collection contents (names of data-objects, sub-collections, etc.). The default is the normal, non-strict level, allowing users to see names in other collections. In all cases, access control to the data-objects is enforced. Even if a person can see file names in a collection, “read” access is required on a file to be able to read the file. Even with STRICT access control, however, the admin user is not restricted, so various microservices and queries will still be able to evaluate system-wide information. The session variable “$userNameClient” can be used to limit actions to individual users. This is only secure in an irods-password environment (not GSI), but you can then have rules for specific users:

acAclPolicy { ON($userNameClient == "quickshare") { } }
acAclPolicy { msiAclPolicy("STRICT"); }

which was requested by ARCS (Sean Fleming). See rsGenQuery.c for more information on $userNameClient. The typical use is to just set it strict or not for all users. The policy can be updated in the iRODS core.re file. The policy implements a constraint:

Applied at the acAclPolicy policy enforcement point

The operations that are performed are:

msiAclPolicy

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re
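The difference between the STANDARD and STRICT settings can be illustrated outside iRODS. The following is a minimal Python sketch of the listing decision only; it is not iRODS code. The function name can_list and the numeric ordering of access levels are assumptions made for illustration.

```python
# Hypothetical model of the listing check described above; not iRODS code.
# The numeric ordering of access levels is an assumption for illustration.
ACCESS_LEVELS = {"null": 0, "read": 1, "write": 2, "own": 3}


def can_list(policy, user_access, is_admin=False):
    """Return True if a user may list the names in a collection."""
    if is_admin:
        return True  # the admin user is not restricted, even under STRICT
    if policy != "STRICT":
        return True  # STANDARD: names are visible to every user
    # STRICT: 'read' access or better on the collection is required
    return ACCESS_LEVELS.get(user_access, 0) >= ACCESS_LEVELS["read"]
```

In both settings, reading the contents of a file still requires “read” access on the file itself; the sketch covers only the visibility of names.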

6.37.1 Set access controls after proprietary period

This rule checks a flag for whether a proprietary period has elapsed, and then provides public access to the file. The flag ACL_EXPIRY defines the date and time after which the file becomes public. The rule uses the policy function:

checkCollInput

The input variables are:

*Coll    a relative collection name

The session variables are:

$rodsZoneClient
$userNameClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_ID
DATA_NAME
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE

The operations that are performed are:

fail
foreach
if
msiSetACL
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-set-ACL.r
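The selection step of this rule can be sketched in Python. The sketch assumes that ACL_EXPIRY holds a Unix time in seconds stored as a string; the workbook does not state the format, so this encoding and the function name files_to_make_public are hypothetical.

```python
def files_to_make_public(expiry_by_file, now):
    """Return the sorted names of files whose proprietary period has elapsed.

    expiry_by_file maps a file name to its ACL_EXPIRY value, assumed here to
    be a Unix time in seconds stored as a string, or None if the flag is unset.
    """
    public = []
    for name, expiry in expiry_by_file.items():
        if expiry is None:
            continue  # no ACL_EXPIRY flag: leave the access controls alone
        if int(expiry) <= now:
            public.append(name)  # the rule would grant public read access here
    return sorted(public)
```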


6.38 Set access restriction until approval flag is set (Policy 65)

When a file is added to a collection, it normally can only be accessed by the owner, the person uploading the file. The file can inherit access controls from its collection if the sticky bit is enabled. This applies the access controls from the collection as the access controls on the file. A standard sequence is to:

- Turn off the inherit flag on the collection
- Load a file into the collection. The file can only be accessed by the owner of the file.
- Explicitly add access controls for a group
  o Members of the group can then access the file

When the approval flag is set to one, then public access can be enabled. Public access allows access by all accounts within the data grid. For access by persons without an account in the data grid, Anonymous access must also be enabled.

6.39 Set approval flag per collection for enabling bulk download (Policy 66)

Bulk downloads are initiated by a client, which manages a loop either over a specified file set or over the files in a collection. Restriction of bulk download requires a policy enforcement point, acBulkGetPreProcPolicy. Bulk download could be turned off in general. The policy implements a constraint:

Applied at the acBulkGetPreProcPolicy policy enforcement point

The operations that are performed are:

msiSetBulkGetPostProcPolicy

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-off.re

Bulk processing can be turned off for a collection. The policy implements a constraint:

Applied at the acBulkGetPreProcPolicy policy enforcement point
A check is made for a specific collection "/UNC-CH/home/HIPAA"

The session variables are:

$objPath

The operations that are performed are:

if
msiSetBulkGetPostProcPolicy
msiSplitPath


The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-on.re

Bulk processing can be controlled for a collection that has a flag “BulkDownLoad” with a value “off”. The policy implements a constraint:

Applied at the acBulkGetPreProcPolicy policy enforcement point

The session variables are:

$objPath

The operations that are performed are:

if
foreach
msiSetBulkGetPostProcPolicy
msiSplitPath
select

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-flag.re

These policies can be updated in the iRODS core.re file.
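The three variants differ only in the test that decides whether a bulk download may proceed. The Python sketch below models that decision; it is not iRODS code. The variant labels ("off", "collection", "flag"), the function name bulk_get_allowed, and the use of an exact match on the parent collection are assumptions for illustration.

```python
import posixpath

PROTECTED_COLL = "/UNC-CH/home/HIPAA"  # collection named in the second variant


def bulk_get_allowed(obj_path, variant, coll_flags=None):
    """Decide whether a bulk download of obj_path may proceed.

    variant is a hypothetical label for the three rules described above:
      "off"        - bulk download is turned off in general
      "collection" - turned off for one specific collection
      "flag"       - turned off where the collection has BulkDownLoad = "off"
    coll_flags maps a collection name to its attribute/value pairs.
    """
    coll = posixpath.dirname(obj_path)  # plays the role of msiSplitPath
    if variant == "off":
        return False
    if variant == "collection":
        return coll != PROTECTED_COLL  # assumption: exact match on the parent
    if variant == "flag":
        flags = (coll_flags or {}).get(coll, {})
        return flags.get("BulkDownLoad") != "off"
    raise ValueError("unknown variant: " + variant)
```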

6.40 Set asset protection classifier for data sets based on type of PII (Policy 67)

Each data set should be assigned a protection classifier that defines whether the file contains:

1 - Protected Health Information – PHI
2 - Personally Identifiable Information – PII such as social security numbers
3 - Payment Card Information – PCI such as account numbers, cardholder name, expiration date, service code, CID, PINs
4 - Legally restricted data – classified
5 - Proprietary information

The classifier is stored in a metadata attribute for each file:

META_DATA_ATTR_NAME = AssetProtectionClassifier
META_DATA_ATTR_VALUE = “protection classifier value 1-5”
META_DATA_ATTR_UNIT = “”

An approach is to use a BitCurator rule to assign the asset classifier for PII, PHI, PCI.

6.41 Set flag for whether tickets can be used on files in a collection (Policy 68)

The iRODS data grid supports the creation of tickets that enable access to specific data sets by persons who do not have an account. The tickets control the number of allowed accesses and the time period during which the access can be made. For collections that have the ACCESS_APPROVAL flag set to 0, ticket-based access is prohibited.


The policy implements a constraint:

Applied at the acTicketPolicy policy enforcement point

The session variables are:

$objPath

The operations that are performed are:

if
foreach
msiSplitPath
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acTicketPolicy.re.

6.41.1 Remove public and anonymous access

Ticket access requires that anonymous access permission be set. When the ACCESS_APPROVAL flag is set to 0, anonymous access is turned off. Thus ticket access can be controlled by setting the ACCESS_APPROVAL flag to 0. The rule listed in section 6.36.1 can be used to set the ACCESS_APPROVAL flag to 0.

6.42 Set lockout flag and period on user name - counting number of tries (Policy 69)

When a user exceeds the number of allowed attempts when trying to log on without success, a lockout flag will be set for a specified period of time. Ideally this is done by the authentication system.

6.42.1 Set lockout period on user name

The code that checks the user name will need to be augmented with a policy enforcement point (acChkUserLogon) that implements three metadata attributes for a user:

META_USER_ATTR_NAME    NumberAttempts
META_USER_ATTR_NAME    LockoutPeriod
META_USER_ATTR_NAME    ResetPassword

The control point acChkUserLogon will need to be called for every controlled iCommand. Note that the NumberAttempts counter will need to be set back to “0” on a successful login. This rule increments the attempt counter, and sets an expiration time when the allowed number of attempts is exceeded. The policy implements a constraint:

Applied at the acChkUserLogon policy enforcement point


The session variables are:

$userNameClient

The operations that are performed are:

foreach
if
msiAssociateKeyValuePairsToObj
msiGetSystemTime
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acChkUserLogon.re.

A second rule tests the expiration time to release the lockout flag. This rule could be added to the acSetPublicUserPolicy. The policy implements a constraint:

Applied at the acSetPublicUserPolicy policy enforcement point

The session variables are:

$userNameClient

The operations that are performed are:

foreach
if
msiAssociateKeyValuePairsToObj
msiGetSystemTime
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy-lockout.re.
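The bookkeeping performed by the two lockout rules can be sketched in Python. The sketch keeps the user attributes in a dictionary and is not iRODS code. The limits MAX_ATTEMPTS and LOCKOUT_SECONDS, the function names, and the storage of LockoutPeriod as a Unix time in seconds are assumptions; the workbook does not specify them.

```python
MAX_ATTEMPTS = 3       # assumed limit; not specified in the workbook
LOCKOUT_SECONDS = 900  # assumed lockout length; not specified in the workbook


def record_failed_logon(user_avus, now):
    """First rule: increment NumberAttempts and set LockoutPeriod if exceeded."""
    attempts = int(user_avus.get("NumberAttempts", "0")) + 1
    user_avus["NumberAttempts"] = str(attempts)
    if attempts > MAX_ATTEMPTS:
        # LockoutPeriod holds the expiration time (assumed Unix seconds)
        user_avus["LockoutPeriod"] = str(now + LOCKOUT_SECONDS)
    return user_avus


def record_successful_logon(user_avus):
    """Reset the counter to "0" on a successful login."""
    user_avus["NumberAttempts"] = "0"
    return user_avus


def is_locked_out(user_avus, now):
    """Second rule: test the expiration time and release an expired lockout."""
    expiry = user_avus.get("LockoutPeriod")
    if expiry is None:
        return False
    if now >= int(expiry):
        del user_avus["LockoutPeriod"]  # lockout has expired: release it
        user_avus["NumberAttempts"] = "0"
        return False
    return True
```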

6.43 Set password update flag on user name (Policy 70)

A flag is associated with each user name to specify whether the user needs to update the password. This uses the attribute:

META_USER_ATTR_NAME    ResetPassword

The value can be set to ‘1’ for all users by the administrator. No input variables are used. No session variables are used. The policy uses persistent state information:

META_USER_ATTR_NAME
META_USER_ATTR_VALUE
USER_NAME

The operations that are performed are:

foreach
if
msiAssociateKeyValuePairsToObj
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-passwordUpdate.r

Each time the acSetPublicUserPolicy policy enforcement point is executed, the ResetPassword flag can be checked and a message can be written to stdout. The policy implements a constraint:

Applied at the acSetPublicUserPolicy policy enforcement point

The session variables are:

$userNameClient

The operations that are performed are:

foreach
if
msiAssociateKeyValuePairsToObj
msiGetSystemTime
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy-reset.re.

6.44 Set retention period for data reviews (Policy 71)

The iRODS data grid provides a metadata attribute, DATA_EXPIRY, for a retention period. The choice of what to do when the retention period is over is governed by a disposition policy. One approach is to set DATA_EXPIRY for a data review. A query can then be issued to identify files that need to be reviewed. The rule uses the policy function:

checkCollInput

The input variables are:

*Coll    a collection name

No session variables are used. The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_EXPIRY
DATA_NAME

The operations that are performed are:

foreach
if
msiGetSystemTime
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-retention-review.r

6.45 Set retention period on ingestion (Policy 21)

A system attribute, DATA_EXPIRY, is used to define an expiration date for a digital object. This rule sets an expiration date a specified number of seconds greater than the ingestion time for a specified collection. The policy implements a constraint:

Applied at the acPostProcForPut policy enforcement point
Checks for collection equal to “/UNC-ARCHIVE/home/Archive”

The session variables are:

$objPath

The operations that are performed are:

if
msiGetSystemTime
msiSplitPath
msiSysMetaModify

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-expiry.re.
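The arithmetic behind this rule is small: split the object path, test the collection, and add a fixed number of seconds to the ingestion time. A minimal Python sketch follows; it is not iRODS code. The retention length RETENTION_SECONDS and the function name expiry_for_new_object are hypothetical, and times are assumed to be Unix seconds.

```python
import posixpath

ARCHIVE_COLL = "/UNC-ARCHIVE/home/Archive"  # collection checked by the rule
RETENTION_SECONDS = 365 * 24 * 3600         # hypothetical retention length


def expiry_for_new_object(obj_path, ingest_time):
    """Return the expiration time for a newly ingested file, or None.

    ingest_time is assumed to be a Unix time in seconds. Only files placed
    directly in ARCHIVE_COLL receive an expiration date.
    """
    coll = posixpath.dirname(obj_path)  # plays the role of msiSplitPath
    if coll != ARCHIVE_COLL:
        return None
    return ingest_time + RETENTION_SECONDS
```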


6.46 Track systems by type (server, laptop, router, ...) (Policy 72)

Each system used within the repository can be labeled by its type. The information can be kept in a file that is stored in the Reports folder. This policy defines the collection location and file name used for the report.

Technology report name    LogSystemType
Collection name           Reports
Location                  /UNC-CH/home/HIPAA/Reports

The input variables are:

*destRescName    a storage resource

No session variables are used. No persistent state information is used.

The operations that are performed are:

msiDataObjPut

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-store-system-log.r

6.47 Verify approval flags within a collection (Policy 73)

This rule examines a collection to determine whether any of the files have not been approved for access, and lists all such files. The rule uses the policy function:

checkCollInput

The input variables are:

*Coll    a collection name

No session variables are used. The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_ID
DATA_NAME
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE

The operations that are performed are:

fail
foreach
if
select
writeLine


The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-check-access-approval.r

6.48 Verify files have not been corrupted (Policy 18)

The rule for verifying that files have not been corrupted can be combined with the rule to check existence of replicas. A version of the rule is listed in section 4.5.2.

6.49 Verify presence of required replicas (Policy 74)

A rule can be run periodically to verify that every file has a replica. This rule checks the existence of the required replica, validates the checksums, and replaces missing or corrupted files. A version of the rule is listed in section 4.5.2.

6.50 Verify that no controlled data have public or anonymous access (Policy 75)

Each collection that contains “Protected” information will have an Approval flag, called

ACCESS_APPROVAL

When the value of this attribute is set to “0”, no public or anonymous access is allowed to files within the collection. When the flag is set to “1”, anonymous access is allowed.

6.50.1 Restrict access to “Protected” data

This rule checks the ACCESS_APPROVAL flag, and restricts access by public and anonymous accounts. No input variables are used. No session variables are used. The policy uses persistent state information:

COLL_NAME
DATA_ACCESS_DATA_ID
DATA_ACCESS_USER_ID
DATA_ID
DATA_NAME
META_COLL_ATTR_NAME
META_COLL_ATTR_VALUE
USER_ID
USER_NAME

The operations that are performed are:

foreach
if
msiSetACL
select
writeLine


The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-verify-access-approval.r
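The check performed for a protected collection can be modeled in Python as a filter over access control lists. The sketch below is not iRODS code. The function name restrict_protected and the dictionary layout of the access controls are assumptions for illustration.

```python
RESTRICTED_ACCOUNTS = {"public", "anonymous"}


def restrict_protected(coll_flag, file_acls):
    """Remove public and anonymous access when ACCESS_APPROVAL is "0".

    coll_flag is the ACCESS_APPROVAL value of the collection. file_acls maps
    a file name to a dict of account name -> access level. Returns the cleaned
    access controls and a list of the (file, account) entries that were removed.
    """
    if coll_flag != "0":
        return file_acls, []  # access is approved: nothing to restrict
    cleaned = {}
    removed = []
    for name, acl in file_acls.items():
        cleaned[name] = {u: lvl for u, lvl in acl.items()
                         if u not in RESTRICTED_ACCOUNTS}
        for account in sorted(set(acl) & RESTRICTED_ACCOUNTS):
            removed.append((name, account))  # the rule would write a line here
    return cleaned, removed
```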

6.51 Verify that protected assets have been encrypted (Policy 76)

Check that all files in the collection

/UNC-CH/home/HIPAA/Archive

have the DATA_ENCRYPT flag set to 1. If the flag is missing or the value is not 1, write an output line and encrypt the file.

6.51.1 Check that files with ACCESS_APPROVAL = 0 are encrypted

This version of the rule looks for the ACCESS_APPROVAL flag. If the value is set to 0, then the file encryption is checked. If the file is not encrypted, an output line is written and the file is encrypted. No input variables are used. No session variables are used. The policy uses persistent state information:

COLL_NAME
DATA_NAME
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE

The operations that are performed are:

foreach
if
msiAssociateKeyValuePairsToObj
msiEncrypt
msiRemoveKeyValuePairsFromObj
msiString2KeyValPair
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-encrypt-check.r


7 Data Management Plan Example Rules

Data management plans (DMPs) are required by the National Science Foundation and other federal agencies for every submitted proposal. The DMPs specify tasks related to formation of the digital collection, analysis, storage, publication, and archives. The expectation is that the tasks can be automated through policies that are either applied at policy enforcement points, or that are periodically executed. An analysis of NSF requirements for DMPs is shown in Table 7.1. A total of 38 tasks were identified, along with the type of environment variable needed as input for each task.

Table 7.1. Data Management Plan Tasks

Task               DMP tasks                     Variable    Policy
1     Collection   Managers & staff              Roles       48
2                  Costs                         Budget      24
3                  Collection plans              How, what   45
4                  Instrument types              Type        77
5                  Event log                     Event       54
6                  Collection report             Event       41
7                  Required data policies        Products    17
8                  Data category                 Type        78
9                  Use of existing data          Source      79
10    Analysis     Quality control               Plans       80
11                 Analysis plans                Plans       81
12                 Data sharing during analysis  Who         82
13                 Data dictionary / glossary    Type        29
14                 Naming includes               Attributes  83
15                 Data format type              Type        16
16                 DOI for data sets             Type        27
17                 Metadata standard             Type        29
18                 Metadata export as            Type        84
19    Storage      Collection                    Location    85
20                 Size                          Size        86
21    Publication  Make original data public     When        87
22                 Make Data products public     When        88
23                 Re-use                        Policies    89
24                 Re-distribution               Community   90
25                 Access restrictions           Privacy     14
26                 IPR                           Type        91
27                 Web access through            How         92
28                 Data sharing system           Type        93
29                 Code distribution system      Type        94
30    Archive      Retention period              Period      21
31                 Curation                      Plans       95
32                 Archive                       Location    96
33                 Number of replicas            #           13
34                 Backup frequency              Policies    97
35                 Integrity check frequency     Policies    18
36                 Technology evolution          Plans       49
37                 Catalog                       Metadata    9
38                 Transformative migration      Formats     15

Each directorate and division at NSF has selected different aspects to emphasize. These preferred tasks are indicated in Table 7.2.


Table 7.2. DMP tasks by NSF Directorate/Division

Task  AGS     AST     CHE     CISE    DMR     EAR     EHR     ENG     GEN     OCE     PHY     SBE
1                             X                               X               X               X
2                             X               X               X                               X
3     X       X       X       X       X       X       X       X       X       X       X
4                                                                             X
5                                                                             X
6                                                                             X
7     X       X       X       X               X       X                       X               X
8     X       X       X       X       X       X       X       X       X       X       X       X
9     X       X       X       X       X       X       X       X       X       X       X       X
10                            X       X       X               X       X       X       X
11                            X       X       X               X       X               X
12                    X                                                       X
13                                                                            X
14                                                                            X
15    X       X       X       X       X               X       X       X       X       X       X
16                                            Cite                    URL
17    X       X       X       X       X       X       X       X       X       X       X       X
18            X                       X       X       X       X       X       X       X       X
19    X       X       X                                                       X
20    X       X       X       X       X       X       X       X       X       X       X       X
21    X       X       X       X       X       < 2 yrs X       X       X       X       X       X
22    X       X       X       X       X       X       X       X       X       X       X       X
23    X       X       X       X       X       X       X               X       X       X       X
24    X       X       X       X       X       X       X               X       X       X       X
25    X       X       X       X       X       X       X       X       X       X       X       X
26    X       X       X       X       X       X       X       X       X       X               X
27    X       X       X               X       X       X       X       X       X       X       X
28    X       X       X       X       X       X       X       X       X       X       X
29                                                                            X
30    X       X       X       X       X       X       X       >3 yrs  X       X       X       X
31    X       X       X       X       X       X       X       X       X       X       X       X
32    X       X       X       X       X       EAR     X       X       X       X       X       X
33    X       X       X       X       X       X       X       X       X               X
34    X       X       X       X       X       X       X       X       X               X
35    X       X       X       X       X       X       X       X       X               X
36    X       X       X
37    X       X       X               X       X       X       X       X               X       X
38    X       X       X               X       X       X       X       X               X

To understand how actual DMPs were created, 18 Data Management Plans (DMPs) were compared to determine whether a common set of policies could be implemented for automating management tasks. The DMPs were acquired from the DataONE web site (example DMPs) and from the Data Management Planning tool (public DMPs from the California Digital Library). Each DMP was compared with the tasks determined from the NSF requirements. The expectation is that each task can be automated by creating a set of data management policies for setting environment variables (such as retention period), enforcing the policy, and verifying the policy. The tasks from the DMPs are listed in Tables 7.3A and 7.3B. The tasks specified in the DMPs varied dramatically. For the tasks that depended upon an environmental variable, the value of the variable was specified for each task for each plan.


Table 7.3A – Published data management plans

Task  Environment Variables  Cultural Objects  Mauna Loa CO2 Sensor  Surface weather data  Multimedia Text Annotation  Parietal Cortex  Biosignature Suites  Peer Power  Anthropod Responses  Andvari

      NEH  AGS  AGS  NEH  BIO  BIO  CISE  GEN  NEH

1  Roles  X     X    X

2  Budget          

3  How, what  X  X  X X X X 

4  Type          

5  Event     X    

6  Event     X    

7  Products          

8  Type  X  X  X X X X X 

9  Source        GHCN‐D    Institutions

10  Plans     X  X   

11  Plans  X  X Generate netCDF                   

12  Who          

13  Type          

14  Attributes     timestamp                time stamp    

15  Type  .txt  CSV, text  CSV, netCDF     .plx, .dvt, .avt  .pdf, .tif, .csv  .txt  .xsl, .csv

16  Type  URL  X  X   

17  Type  METS  X  X  X       Dublin Core  EML    

18  Type     XML  XML images   

19  Location          

20  Size          

21  When  X  X  X    X

22  When     6 months  review project end  2 yrs        publication  review 

23  Policies          

24  Community          

25  Privacy  none       creator, copyright        CCL       

26  Type  none       

27  How  URLs  URLs  URLs URLs FTP website URL  website

28  Type           Dspace  iRODS google docs  Dspace       

29  Type  UCSC     GitHub   

30  Period     forever  forever 5 yr 10 yr forever long‐term  project

31  Plans          

32  Location  UC3  ORNL  ORNL       IDEC, CCNP  UNM-Dspace  mySQL

33  #  1     3   

34  Policies  Daily    Daily, monthly        periodic  periodic  daily

35  Policies          

36  Plans          

37  Metadata          

38  Formats          


Table 7.3B – Data Management Plans

Task  Making Data Count  Collaboration as a means of retention  Meteorological measurements East Antarctica  Project 1 data management  Agent-based model of population  Engineered Bioactive Interfaces  Inquiring into Engineering  Certain Stem  HydroShare

   NSF  IES  AGS  SBE  SBE  GEN  ENG  EHR  AGS

1     X     X X X X   

2                

3  X  X  X     X X X   

4                

5                

6                

7                

8     X        X X X X   

9  DataONE           Yes        Yes  Time series, Geospatial - NASA, USGS NWI

10              Yes   

11                          Web Map, WaterOneFlow, Web Feature, Web Coverage

12                

13                

14                    metadata source, date    

15       .txt, .csv  .html, .txt, .csv, .xml  ArcGIS  .xsl, .tif, .txt  audio, .txt, .xsl

16  EZID           X   

17  COUNTER     WMO     FGDC     X  education  WaterML, OGC, ISO, INSPIRE

18        .txt     CUAHSI HIS

19                

20                

21  Apache 2  X  X     X NSF Collaboration driven

22           project project   

23              Use agreement

24                

25  IRB  IRB        proprietary IRB IRB Research group driven

26                

27        website  website website website HTTP, FTP, DataONE

28  Merritt                      CUAHSI HIS, HUBzero, iRODS

29  X           gForge

30        10 yrs     3 yrs 3 yrs 10 yrs   

31                

32  GitHub  OSF.io  US ADC  EPA    Uknowledge  DataCommons     HydroShare

33        3     2 1 2   

34                

35       3 months

36              iRODS

37              DataONE

38                


Task 27 specified the type of access client that could be used to interact with the data collection. Most DMPs planned to publish data through a local web site or to provide persistent URLs to enable remote access to the data sets. Task 30 specified the retention period. Some sites planned to keep the data forever, or as long as the designated repository was functional. Task 32 specified the repository where the data sets would be managed. The DMPs identified a wide variety of data management systems, from local disk caches, to local databases, to institutional repositories, to federal repositories. Most of the DMPs did not specify the resources where the collection would be assembled, and instead specified the final archive.

The most comprehensive DMP that was examined was the DataONE example Data Management Plan for “Atmospheric Concentrations, Mauna Loa Observatory, Hawaii, 2011-2013”. This plan included 16 of the policies. The Mauna Loa DMP is listed in Appendix F. We analyzed the plan to identify the data management requirements and extracted the following tasks:

3. Plans for assembling the collection
5. Maintenance of an event log recording changes to sensors
6. Maintenance of a collection report
8. Categorization as observational data
10. Quality assessment
11. Analysis plans
14. Time stamp included in file name
15. Data types are .csv, .txt
16. DOI created for each file
17. Metadata standard based on discipline
18. Metadata exported as XML
21. All original data is made public
22. Data products are made public after 6 months and review
27. Web access provided through URLs
30. Data retained forever
32. Data archived at ORNL
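The task numbers extracted from a plan can be turned into the list of policies to implement by a lookup against Table 7.1. The Python sketch below carries only the 16 Mauna Loa entries, with policy numbers copied from Table 7.1; the names TASK_TO_POLICY and policies_for_plan are hypothetical.

```python
# Policy numbers copied from Table 7.1 for the 16 Mauna Loa tasks only.
TASK_TO_POLICY = {
    3: 45, 5: 54, 6: 41, 8: 78, 10: 80, 11: 81, 14: 83, 15: 16,
    16: 27, 17: 29, 18: 84, 21: 87, 22: 88, 27: 92, 30: 21, 32: 96,
}


def policies_for_plan(task_numbers):
    """Return the sorted, de-duplicated policy numbers needed by a plan."""
    return sorted({TASK_TO_POLICY[task] for task in task_numbers})
```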

A similar analysis was done for administration of protected data at UNC, including PII, PHI, and PCI data types. A total of 48 tasks were identified, including password strength assessments, detection of the presence of protected data, characterization of the type of protected data, logging of access events, and analysis of audit trails. This indicated that the task list for data management plans is expected to expand as additional types of data are managed.

For each task, we create a computer actionable rule that can be used to automate execution. We use the integrated Rule Oriented Data System rule language to write the rules. The resulting rules are listed below for each task.


7.1 Staffing policies (Policy 48)

The roles needed to implement a data management plan include:

1. administrator – person making the financial commitment for maintaining the repository
2. collection manager – person maintaining the properties of the data collection (required metadata and data format standards, collection quality)
3. data grid administrator – person maintaining the properties of the repository (repository software upgrades, drivers for storage systems, clients)
4. information technology administrator – person maintaining the storage systems, network, authentication systems.

Typically, at least two persons are needed for each of the data grid and information technology administrator positions. This provides the redundancy needed to ensure access across vacations. The following policy counts the number of data grid administrators for a collection. The policy checks the number of users who can access a specified collection and lists their account names. There are no input variables. No session variables are used. The policy uses persistent state information:

USER_NAME
USER_TYPE

The operations that are performed are:

foreach
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-list-admin.r
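The staffing check reduces to a filter over (USER_NAME, USER_TYPE) pairs followed by a count. The Python sketch below assumes that data grid administrators carry the user type "rodsadmin"; the function name list_grid_admins and the minimum of two are taken as illustration of the redundancy guideline above, not from the rule itself.

```python
def list_grid_admins(users, minimum=2):
    """List data grid administrator accounts and test the redundancy guideline.

    users is an iterable of (USER_NAME, USER_TYPE) pairs. The user type
    "rodsadmin" is assumed to mark a data grid administrator.
    Returns the sorted administrator names and whether at least `minimum` exist.
    """
    admins = sorted(name for name, user_type in users if user_type == "rodsadmin")
    return admins, len(admins) >= minimum
```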

7.2 Cost reporting (Policy 24)

The cost of managing a data collection includes:

1. Facility costs for floor space and power
2. Equipment costs for storage systems, networks, and computer servers
3. Media costs for tape
4. Labor costs for operations
5. Network costs for loading the collection and for collection access

The costs can be distributed across the files in the collection. However the costs may be proportional to:

- The number of files
- The size of the files
- The amount of metadata

A policy that aggregates costs across these three metrics is listed below. The rule uses the policy functions:

checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl

The input variables are:

*FacCount    Cost factor per million files
*FacMeta     Cost factor per million attributes
*FacSize     Cost factor per Gigabyte
*Rep         a collection name
*Res         a storage resource
*Src         a collection name

The policy uses session variables:

$rodsZoneClient

The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_ID
DATA_SIZE
META_DATA_ATTR_ID
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME

The operations that are performed are:

fail
foreach
if
msiCollCreate
msiDataObjCreate
msiGetSystemTime
msiSplitPathByKey
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-cost-report.r
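The aggregation across the three metrics can be written as a single weighted sum. The Python sketch below reuses the input variable names as parameter names; treating a Gigabyte as 10^9 bytes and summing the three terms without further weighting are assumptions, since the workbook does not give the formula.

```python
def collection_cost(num_files, total_bytes, num_attributes,
                    fac_count, fac_size, fac_meta):
    """Aggregate the cost of a collection across three metrics.

    fac_count: cost factor per million files       (*FacCount)
    fac_size:  cost factor per Gigabyte            (*FacSize)
    fac_meta:  cost factor per million attributes  (*FacMeta)
    A Gigabyte is assumed here to be 10**9 bytes.
    """
    file_cost = fac_count * num_files / 1e6
    size_cost = fac_size * total_bytes / 1e9
    meta_cost = fac_meta * num_attributes / 1e6
    return file_cost + size_cost + meta_cost
```

For example, 2 million files holding 3 Gigabytes with 500,000 attributes, at factors of 5.0, 2.0, and 4.0, give 10.0 + 6.0 + 2.0 = 18.0 cost units.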


7.3 Collection creation planning (Policy 45)

Collection creation planning identifies the properties that will be associated with a collection. The properties are driven by assertions that the collection creators will claim about the digital entities, such as provenance, authenticity, quality, completeness. Collection planning also requires the identification of:

- Mechanisms for ingesting sensor data into a collection
- Naming conventions assigned to the files
- Arrangement of files into collection
- Identification of appropriate provenance metadata
- Identification of appropriate description metadata
- Assignment of access controls
- Identification of procedures for generating derived data products
- Quality control

The specific policies that automate these tasks depend upon the specific details of the collection formation process and the type of data that are being organized (observational, experimental, simulation, survey). Example policies for collection arrangement might be:

- Organize by time period. Each month a new sub-collection is started.
- Organize by data type. Separate collections are made for sensor data, simulation data, documents.
- Organize by contributor.
- Organize by experiment.

The example policy listed below organizes data files by a time extension. Files are copied from a staging area into sub-collections for each year. The rule uses the policy functions:

checkCollInput
isColl

The input variables are:

*Destcoll    a collection name
*Srccoll     a collection name

No session variables are used. The policy uses persistent state information:

COLL_ID
COLL_NAME
DATA_NAME

The operations that are performed are:

fail
foreach
if
msiCollCreate
msiDataObjRename
msiSplitPathByKey
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-stage-time.r
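The arrangement step can be sketched in Python as a plan of source and destination paths. The sketch assumes that each file name carries a four-digit year (for example co2.2012.csv); the actual naming convention used by the rule is not given here, so the pattern and the function name plan_moves are hypothetical.

```python
import re

# Assumption: a four-digit year (19xx or 20xx) appears in the file name.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def plan_moves(file_names, src_coll, dest_coll):
    """Return (source, destination) path pairs, one sub-collection per year.

    Files without a recognizable year stay in the staging area.
    """
    moves = []
    for name in sorted(file_names):
        match = YEAR_PATTERN.search(name)
        if match is None:
            continue
        year = match.group(0)
        moves.append((f"{src_coll}/{name}", f"{dest_coll}/{year}/{name}"))
    return moves
```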

7.4 Instrument control (Policy 77)

The control of the data streams from sensors requires identification of how frequently to harvest observational data, how to aggregate the sensor data into files, and how to archive the data streams. As an example, we illustrate the harvesting of sensor data from an external Antelope Real Time System. The planning requires identifying how frequently to harvest, the format to be used to store the data, and how to name the files. The rule harvests 100,000 packets from a specific sensor. The input variables are:

*Coll        a collection name
*Loc         a seek address within a file
*modeln      a model number
*Offset      a file offset
*OrbHost     a host address
*OrbParam    a parameter for a sensor
*PKTNum      number of packets
*Resc        flag for file create
*Sensor      type of sensor

No session variables are used. The policy uses no persistent state information.

The operations that are performed are:

for
msiCollCreate
msiDataObjClose
msiDataObjCreate
msiDataObjLseek
msiDataObjOpen
msiDataObjWrite
msiFreeBuffer
msiOrbClose
msiOrbDecodePkt
msiOrbOpen
msiOrbReap
msiOrbSelect
select
writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-sensor-harvest.r

7.5 Event log for collection formation (Policy 54)

Errors may occur in the sensor data as they are being generated (missing values or bad calibration), when the sensor data are archived (transmission error), and after storage (data corruption). Detection of errors on generation requires analysis of the data stream, tests for values out of range, and tests for missing values. Detection of transmission errors can be handled with network protocols. Detection of errors after storage requires periodic validation of checksums. The following rule verifies the checksums of all files in the account /Mauna/home/atmos. Since the size of the collection is small, the rule does not need to monitor the load on the system. A log file is created that contains a time stamp for when the check was run, and that lists all corrupted files. The rule uses the policy functions:

checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl

The input variables are:

*Coll    a collection name
*Res     a storage resource

The session variables are:

$rodsZoneClient
$userNameClient

The persistent state information is:

COLL_ACCESS_COLL_ID
COLL_ACCESS_USER_ID
COLL_ID
COLL_NAME
DATA_CHECKSUM
DATA_ID
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME

Theoperationsthatareperformedare:failforeach

Page 147: Policy Template Workbook iRODS 4datafed.org/dev/wp-content/uploads/2016/05/DFC-policy-template.pdf · Scalable Data Management Infrastructure in a Data Grid‐Enabled Digital Library

139

ifmsiCollCreatemsiDataObjChksummsiDataObjCreatemsiGetSystemTimemsiSplitPathByKeyremoteselectwriteLine

Theruleisavailableathttp://github.com/DICE‐UNC/policy‐workbook/dmp‐validate‐chksum.r
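
The logic of periodic checksum validation can be illustrated outside iRODS with a minimal Python sketch. This is not the dmp-validate-chksum.r rule: the function name validate_checksums, the in-memory catalog, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from datetime import datetime, timezone


def validate_checksums(catalog, read_bytes):
    """Return log lines: a time stamp followed by one line per corrupted file.

    catalog maps a logical path to its recorded SHA-256 hex digest.
    read_bytes(path) returns the current content of the file as bytes.
    Both are hypothetical stand-ins for the catalog and the storage system.
    """
    log = ["Checksum validation run at " + datetime.now(timezone.utc).isoformat()]
    for path, recorded in sorted(catalog.items()):
        current = hashlib.sha256(read_bytes(path)).hexdigest()
        if current != recorded:
            log.append("Corrupted file: " + path)
    return log


# Example: one intact file and one whose content changed after registration.
store = {"/Mauna/home/atmos/a.csv": b"1,2,3\n", "/Mauna/home/atmos/b.csv": b"4,5,6\n"}
catalog = {p: hashlib.sha256(d).hexdigest() for p, d in store.items()}
store["/Mauna/home/atmos/b.csv"] = b"4,5,7\n"  # simulate corruption after storage
report = validate_checksums(catalog, store.__getitem__)
```

In this sketch the log always starts with the time stamp line, so an empty list of corrupted files still records that the check was run.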

7.6 Collection reports (Policy 41)

Information about the collection may include the number of files, the size of the data, the number of metadata values, the usage, when integrity checks were done, the uniformity of metadata across the files, the size distribution, etc. The information may be organized by each sub-collection, or by file type, or by year. Reports are generated by issuing queries to the iCAT catalog and formatting the results. This example policy lists the size of each collection and the number of files that are publicly accessible. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:

    *PathColl   a collection name
    *Res        a storage resource

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ACCESS_COLL_ID
    COLL_ACCESS_USER_ID
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_SIZE
    RESC_ID
    RESC_NAME
    USER_ID
    USER_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiCollCreate
    msiDataObjCreate
    msiGetSystemTime
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report.r

7.7 Product formation (Policy 17)

When processing observational data, communities generate three additional classes of data: 1) calibrated data, 2) physical variables, 3) gridded data. The processing steps can be aggregated into a processing pipeline that automatically generates each successive data class. The processing can be applied each time a file is deposited into a known directory, or applied in a batch mode at a remote compute server, or applied at the storage resource. The processing steps can also be captured in a workflow that is registered into the data grid. Each execution of the workflow can be tracked, associating the workflow input with the workflow output. The following rule illustrates processing that is automatically applied each time a file is deposited into a specified collection. In this case a report is amended to add information about each file that is deposited. The policy implements a constraint:

    Applied at the acPostProcForPut policy enforcement point

The session variables are:

    $objPath

The operations that are performed are:

    foreach
    if
    msiDataObjChksum
    msiDataObjOpen
    msiDataObjLseek
    msiGetSystemTime
    msiSplitPath
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-report.re.

7.8 Data category management (Policy 78)

The categories of data include observational, experimental, simulation, survey, and publications. Different assertions can be made about each type of data. Thus observational data need to be calibrated, converted to physical variables, and mapped to a coordinate system. Experimental data may require additional provenance information that records the details of each experiment. Simulation data need close tracking of simulation version and input files. Publication data may have a release date that depends upon acceptance by a journal. In each case, a set of assertions is made about the data collection and applied uniformly to all deposited files. Similarly to the Product Generation task, data category management can be expressed as a set of processing steps that enforce the assertions. An example policy is the automated application of a processing step on the storage system holding the data. This rule executes an application (called app) stored in the irods/server/bin/cmd directory. Two input arguments are set up for the app, and the temporary files are deleted. The rule uses the policy function:

    checkPathInput

The input variables are:

    *Cmd          an application command
    *outXmlFile   a file path name
    *Pathf        a file path name

No session variables are used. The persistent state information is:

    COLL_NAME
    DATA_ID
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:

    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    msiSplitPath
    remote
    select
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-external-process.r

7.9 Re-using existing data (Policy 79)

A data grid can access files from external repositories. A local copy can be made and used in processing steps. Most repositories provide web services for accessing files. This example rule retrieves a file from a specified URL and stores a copy of the file in the data grid. The input variables are:

    *destObj    a file path name
    *url        a URL

No session variables are used. No persistent state information is used.

The operations that are performed are:

    msiCurlGetObj
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-get-object-url.r

7.10 Quality control (Policy 80)

Assertions about properties of a collection can be verified by periodically evaluating assessment criteria. The types of properties that can be verified include required metadata, required file type, integrity, distribution, etc.

The example rule reads the metadata attributes defined on a collection and checks that each file in the collection has the same metadata attributes defined. The rule uses the policy function:

    checkCollInput

The input variables are:

    *Coll       a collection name

No session variables are used. The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    META_COLL_ATTR_NAME
    META_DATA_ATTR_ID
    META_DATA_ATTR_NAME

The operations that are performed are:

    fail
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-check-coll.r
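
The comparison between collection-level and file-level attribute names reduces to a set difference. The following Python sketch shows that step under stated assumptions: the function missing_attributes and its inputs are hypothetical and stand in for the results of catalog queries on META_COLL_ATTR_NAME and META_DATA_ATTR_NAME.

```python
def missing_attributes(collection_attrs, file_attrs):
    """Return {file name: sorted list of collection attributes the file lacks}.

    collection_attrs is the set of attribute names defined on the collection;
    file_attrs maps each file name to the set of attribute names defined on it.
    Files that carry every required attribute are omitted from the result.
    Both inputs are hypothetical stand-ins for catalog query results.
    """
    required = set(collection_attrs)
    report = {}
    for name, attrs in file_attrs.items():
        missing = required - set(attrs)
        if missing:
            report[name] = sorted(missing)
    return report


result = missing_attributes(
    {"sensor_id", "units"},
    {"day1.csv": {"sensor_id", "units"}, "day2.csv": {"sensor_id"}},
)
```

Only attribute names are compared here; a check on attribute values would need a separate criterion.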

7.11 Analysis procedures (Policy 81)

Each time a file is added to the system, a new file version is created. A version of a file can be created by adding a time stamp, and moving the version to an archive directory. This rule processes files in a collection, creating a version of each file that is stored in a destination directory called “SaveVersions”. The rule is called ruleversion.r and is listed in section 4.7.1.

The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the time stamp before the extension when the version name is created. The rule is automatically executed within the acPostProcForPut policy enforcement point. Note that access controls have to be set on the versioned file. The rule is called ruleversionfile.r and is listed in section 4.7.1.

The rule “ruleversionfile.r” can be modified to enforce versioning at a Policy Enforcement Point. The following rule is applied every time a file is loaded into the data grid. The policy implements a constraint:

    Applied at the acPostProcForPut policy enforcement point
    Files are versioned to a specific collection

The session variables are:

    $objPath
    $rodsZoneClient
    $userNameClient

The operations that are performed are:

    msiDataObjCopy
    msiGetSystemTime
    msiSetACL
    msiSplitPath
    msiSplitPathByKey

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-version.re.
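
Inserting a time stamp before the file extension can be sketched in a few lines of Python. The function versioned_name, the dot separator, and the YYYYMMDDHHMMSS stamp format are assumptions for illustration; the format used by the workbook rules is not shown in this section.

```python
import os
from datetime import datetime


def versioned_name(file_name, stamp):
    """Insert a time stamp before the extension, or append it if there is none.

    The dot separator and the stamp format are illustrative assumptions.
    """
    base, ext = os.path.splitext(file_name)
    return f"{base}.{stamp:%Y%m%d%H%M%S}{ext}"


when = datetime(2015, 8, 25, 14, 30, 0)
with_ext = versioned_name("co2.csv", when)    # "co2.20150825143000.csv"
without_ext = versioned_name("README", when)  # "README.20150825143000"
```

A fixed-width stamp keeps versions of the same file in chronological order when the destination collection is listed alphabetically.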

7.12 Analysis collaborations (Policy 82)

When collaborations result in multiple persons updating a collection, a change log will be needed to determine when updates have been made to a collection. Two approaches are to analyze audit trails, or to periodically summarize the contents of the collection. A change log summarizes all changes made to the sensor data. The change log can be created by listing all of the files that are in the “/Mauna/home/atmos/version” directory. The rule uses the policy functions:

    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:

    *Res        a storage resource

The session variables are:

    $rodsZoneClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiCollCreate
    msiDataObjCreate
    msiGetSystemTime
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report-changes.r

7.13 Data dictionary (Policy 29)

A reserved vocabulary can be implemented for a collection using the HIVE (Helping Interdisciplinary Vocabulary Engineering) system. HIVE maintains an ontology for a discipline, defining relationships between words as well as a standard vocabulary. The descriptive metadata registered on files within a collection can be checked for compliance with the reserved vocabulary. This ensures that well-known terms can be used to query the collection and identify relevant material. An example validation rule utilizes a REST service to iterate over iRODS collections, validating the terms as being valid SKOS references, and generating a report on invalid terms. The rule is called validate-ontologies.r and is listed in section 5.12.1. An example output for when two data objects are annotated, one with an invalid term, is listed below.

    test1@ubuntu:~/workspace/rule_workbench$ irule -F validate_data_object_ontologies.r
    Metadata validation report
    /fedZone1/home/rods/hive/libmsiCurlGetObj.cpp has uri http://purl.org/astronomy/uat#TT888 that is not in a valid ontology

7.14 Naming control (Policy 83)

The ingestion of data into the collection is governed by processes outside of iRODS. If an Antelope Real Time System is being used to manage the sensor data, then micro-services exist to automate the periodic ingestion of sensor records from ARTS into an iRODS collection. The update can be done periodically. Note that the attribute DATA_CREATE_TIME is automatically set each time a file is created, and DATA_MODIFY_TIME is automatically set each time a file is modified. The rule is called dmp-sensor-harvest.r and is listed in section 7.4.

7.15 Data format control (Policy 16)

A check can be made that the data type associated with each sensor data file is .csv. The rule uses the policy function:

    checkCollInput

The input variables are:

    *Coll       a collection name

No session variables are used. The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_NAME
    DATA_TYPE_NAME

The operations that are performed are:

    fail
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-checkDataType.r

7.16 Unique identifiers (Policy 27)

A Digital Object Identifier can be generated automatically through an extension to the acPostProcForPut rule. The Handle system can use a local handle registry for assigning identifiers to files. The local handle registry, in turn, is assigned a unique identifier in a global handle system. The following rule creates a handle and registers it in the DFC handle server. (The registration of the handle in our handle server indicates it is available for access from DataONE.) The policy implements a constraint:

    Applied at the acPostProcForPut policy enforcement point

The session variables are:

    $objPath

The operations that are performed are:

    msiGetStdoutInExecCmdOut
    msiExecCmd

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-handle.re. The rule executes a shell script:

    #!/bin/bash
    if [ "$#" -ne 2 ]; then
        echo "Usage: create_handle <data object id> <data object url>"
        exit 1
    fi
    OID="$1"
    URL="$2"
    HANDLE=$(java -classpath ./irods-hs-tools.jar org.irods.dfc.CreateHandle ./admpriv.bin "$URL" "$OID")
    echo "$HANDLE"
    exit 0;

7.17 Metadata standard (Policy 29)

The metadata attributes that will be created can be specified in a template. Depending upon the sensor data format, the attributes can be parsed from each sensor file and added as metadata on the file. Examples exist for parsing metadata from text files, netCDF files, XML files, etc. Pattern matching operations can be applied to text to extract contextual metadata. A template for pattern matching can be created that defines triplets:

    <pre-string-regexp, keyword, post-string-regexp>.

The triplets are read into memory, and then used to search a data buffer. For each set of pre and post regular expressions, the string between them is associated with the specified keyword and can be stored as a metadata attribute on the file. In the example, the template file has the format:

    <PRETAG>X-Mailer:</PRETAG>MailerUser<POSTTAG></POSTTAG>
    <PRETAG>Date:</PRETAG>SentDate<POSTTAG></POSTTAG>
    <PRETAG>From:</PRETAG>Sender<POSTTAG></POSTTAG>
    <PRETAG>To:</PRETAG>PrimaryRecipient<POSTTAG></POSTTAG>
    <PRETAG>Cc:</PRETAG>OtherRecipient<POSTTAG></POSTTAG>
    <PRETAG>Subject:</PRETAG>Subject<POSTTAG></POSTTAG>
    <PRETAG>Content-Type:</PRETAG>ContentType<POSTTAG></POSTTAG>

The end tag is actually a "return" for unix systems, or a "carriage-return/line feed" for Windows systems. The example rule reads a text file into a buffer in memory, reads in the template file that defines the regular expressions, and then parses the text in the buffer to identify presence of a desired metadata attribute. The rule is called rulemetaload.r and is listed in section 4.6.3.
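
The triplet mechanism can be sketched in Python with the standard re module. This is an illustration of the idea, not the rulemetaload.r rule: the functions load_triplets and extract_metadata, the two-entry template, and the sample message are all assumptions, and the post tag is taken to be a unix "return" as described above.

```python
import re

# Hypothetical two-entry template; the post tag holds a newline ("return").
TEMPLATE = (
    "<PRETAG>Date:</PRETAG>SentDate<POSTTAG>\n</POSTTAG>\n"
    "<PRETAG>From:</PRETAG>Sender<POSTTAG>\n</POSTTAG>\n"
)


def load_triplets(template):
    """Parse the template into (pre regexp, keyword, post regexp) triplets."""
    pattern = re.compile(
        r"<PRETAG>(.*?)</PRETAG>(.*?)<POSTTAG>(.*?)</POSTTAG>", re.DOTALL
    )
    return [(pre, key.strip(), post) for pre, key, post in pattern.findall(template)]


def extract_metadata(buffer, triplets):
    """Return {keyword: text found between the pre and post expressions}."""
    found = {}
    for pre, key, post in triplets:
        # The pre and post strings are used as regular expressions, unescaped.
        match = re.search(pre + r"(.*?)" + post, buffer, re.DOTALL)
        if match:
            found[key] = match.group(1).strip()
    return found


message = "Date: Tue, 25 Aug 2015\nFrom: alice@example.org\nSubject: test\n"
metadata = extract_metadata(message, load_triplets(TEMPLATE))
```

Each extracted keyword and value pair could then be registered as a metadata attribute on the file; that registration step is outside the scope of this sketch.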

7.18 Metadata export (Policy 84)

The descriptive metadata that are registered on each file can be extracted and written as an XML file. This rule creates an XML metadata file for each file in the /Mauna/home/atmos/sensor directory. The following structure is used:

    <?xml version="1.0"?>
    <catalog>
      <File path="COLL_NAME/DATA_NAME">
        <META_DATA_ATTR_NAME>META_DATA_ATTR_VALUE</META_DATA_ATTR_NAME>
      </File>
    </catalog>

The name of the metadata file is created by appending .xml to the name of the sensor data file. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    findZoneHostName

The input variables are:

    *Relcoll    a relative collection name
    *Res        a storage resource

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiDataObjClose
    msiDataObjCreate
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-createXML.r
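
The XML structure above can be produced with Python's standard xml.etree.ElementTree module. The sketch below is an illustration under assumptions, not the dmp-createXML.r rule: the function metadata_to_xml and the sample attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(coll_name, data_name, attributes):
    """Build the <catalog><File path=...> document for one file.

    attributes is a list of (name, value) pairs; each name must be usable
    as an XML element name, which this sketch does not validate.
    """
    catalog = ET.Element("catalog")
    file_elem = ET.SubElement(catalog, "File", path=f"{coll_name}/{data_name}")
    for name, value in attributes:
        ET.SubElement(file_elem, name).text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(catalog, encoding="unicode")


xml_text = metadata_to_xml(
    "/Mauna/home/atmos/sensor", "co2.csv", [("units", "ppm"), ("station", "MLO")]
)
# The metadata file name is the data file name with .xml appended.
xml_file_name = "co2.csv" + ".xml"
```

Because attribute names become element names, an attribute name containing spaces or starting with a digit would yield a document that is not well formed, so a production rule would need to sanitize names first.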

7.19 Collection creation system (Policy 85)

The data management plan should include information about the system that will be used to assemble the collection. This may be different from the system used to archive the collection. A collaboration environment facilitates collection creation. Each collaborating person is given an account, and permissions are set to allow deposition of files into the shared collection. This requires:

- Creating a shared collection name. This may be a separate account in the data grid.
- Setting write access controls on the shared collection. This may be done by creating a user group that is allowed to update the collection.
- Defining the desired naming convention for the files. This may require renaming each file as it is deposited.
- Defining the required provenance and descriptive metadata needed for each file. This may require extraction of header information from each file.

The following policy lists the names of the persons in each group that can update the shared collection. The rule uses the policy function:

    checkCollInput

The input variables are:

    *Coll       a collection name

No session variables are used. The persistent state information is:

    COLL_ACCESS_COLL_ID
    COLL_ACCESS_TYPE
    COLL_ACCESS_USER_ID
    COLL_ID
    COLL_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_GROUP_ID
    USER_ID
    USER_NAME

The operations that are performed are:

    fail
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-check-group.r

7.20 Collection size (Policy 86)

The total size of the collection can be found by querying the iCAT catalog. The total size should include the storage space for replicas, the storage space for intermediate products, and the storage space for published results. The example policy takes as input a collection name. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:

    *Coll       a collection name
    *PathColl   a collection name
    *Res        a storage resource

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_SIZE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiCollCreate
    msiDataObjCreate
    msiGetSystemTime
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report-size.r

7.21 Publication of original data (Policy 87)

A standard approach is to place the restricted access data in a collection, create user groups for allowed users, and restrict access to just the allowed user groups. There are three types of data managed by the Mauna Loa project: sensor data, derived data products, and research data. These can be handled by creating three collections:

    /Mauna/home/atmos/sensor
    /Mauna/home/atmos/derived
    /Mauna/home/atmos/research

We will turn on inheritance in each collection, and set the access controls at the collection level. Public access is specified for all sensor data for the Mauna Loa data. In the iRODS data grid, public access is through the “anonymous” account. We turn on inheritance on the “sensor” data collection and give access to the “anonymous” account. The rule uses the policy function:

    checkCollInput

The input variables are:

    *RelativeCollection   a relative collection name

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiSetACL
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-set-public.r

7.22 Publication of data products (Policy 88)

The time periods for holding data proprietary varied across the DMPs, and examples included 6 months, 2 years, until project end, until project review, and until research publication. For the Mauna Loa data, all derived data will be held private until a six month period has elapsed. At the end of this period we change the read access to public. The rule uses the policy function:

    checkCollInput

The input variables are:

    *RelativeCollection   a relative collection name
    *Acl                  an access control

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_CREATE_TIME
    DATA_ID
    DATA_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiGetSystemTime
    msiSetACL
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-proprietary-change.r
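
The decision of which files have completed the proprietary period is a comparison of the creation time against the current time. A minimal Python sketch follows; the function files_to_release and the approximation of six months as 183 days are assumptions, and the change of access control itself is not shown.

```python
from datetime import datetime, timedelta

# Assumption: the six month hold is approximated as 183 days.
HOLD_PERIOD = timedelta(days=183)


def files_to_release(create_times, now):
    """Return the names of files whose proprietary period has elapsed.

    create_times maps a file name to its creation time (a hypothetical
    stand-in for DATA_CREATE_TIME); now is the time the check is run.
    """
    return sorted(
        name for name, created in create_times.items()
        if now - created >= HOLD_PERIOD
    )


release = files_to_release(
    {"old.csv": datetime(2015, 1, 1), "new.csv": datetime(2015, 7, 1)},
    now=datetime(2015, 8, 25),
)
```

Passing the current time as an argument, rather than reading the clock inside the function, keeps the check repeatable when it is tested.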

7.23 Re-use policies (Policy 89)

Collection re-use occurs when the collection is subsumed into another digital library, or processed through a new data processing pipeline, or archived at another site. Depending upon the type of data, re-use may entail multiple requirements:

- Access permission. All proprietary or confidential data require negotiation of access agreements. This may require anonymization of data files, or encryption of data files, or creation of access controls.
- Descriptive metadata. The context associated with each file is represented by a standard metadata schema. Re-use may require mapping from the chosen standard to another metadata schema. The HIVE technology provides the ability to map between ontologies to simplify this process.
- Integrity checks. Integrity should be verified on each shared data object. This implies the community that is re-using the data can verify checksums on each file.
- Policy-encoded objects. The policies that govern access and processing of a digital object can be encapsulated with the digital object. If these policies are automatically loaded into a controlling rule engine when the digital object is used, control can be maintained even when the digital object is re-used. The implementation will require:
    o Encryption of the digital object.
    o Negotiation between the institution that is re-using the digital object and the original repository for the encryption key.
    o Verification that the re-use institution is capable of enforcing the policies.
    o Extraction of the associated policies and their reloading into a re-use rule engine.
- Preservation of Digital Object Identifiers. The metadata used to identify the digital objects should be preserved by the re-use institution.
- Provenance trail. Digital objects that are derived from the original data should include metadata that denotes the source and the transformations that were applied to the original data. The transformations can be encapsulated in workflows that can be registered into the repository along with identifiers for the input files and the output files.

The implementation of these policies depends upon the technology used by the re-use institution. If data grid technology is used, many of these requirements may be implemented through federation of the original data grid and the re-use data grid.

7.24 Distribution policies (Policy 90)

Researchers prefer to have a local copy of the data sets they are analyzing. This minimizes latency in processing pipelines, ensures access, and enables tracking of versions of the data without disrupting the original collection. Distribution policies may be defined to:

- Cache data on a resource at a remote institution.
- Control which data sets may be re-used.
- Automate generation of copies at the remote site when files are added to a collection.
- Distribute files across institutions depending upon the type of data. An example is the distribution of sensor data to the institution that is working with a particular sensor.
- Apply transformative migration as the data sets are distributed to ensure the appropriate format is provided.
- Distribute workflows that can be used to process the data sets.
- Distribute applications within Docker virtual environment images that can be used to analyze the data sets.
- Distribute the descriptive metadata either as an XML file, or a CSV file, or a JSON file.

The following policy generates a JSON file containing the descriptive metadata for the data files in a collection. For each file, a JSON file is put into a sub-directory called “Metadata”. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    findZoneHostName
    isColl

The input variables are:

    *Coll       a collection name
    *Res        a storage resource

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_NAME
    META_DATA_ATTR_UNITS
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiCollCreate
    msiDataObjClose
    msiDataObjCreate
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-json.r
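
Writing the descriptive metadata of one file as JSON can be sketched with Python's json module. The key names in the output, the ".json" suffix, and the helper functions metadata_json and metadata_path are assumptions for illustration; the layout written by dmp-json.r is not shown in this section.

```python
import json


def metadata_json(coll_name, data_name, avus):
    """Serialize the metadata of one file as a JSON string.

    avus is a list of (attribute name, value, units) triples. The key names
    used in the output are illustrative and not taken from dmp-json.r.
    """
    document = {
        "path": f"{coll_name}/{data_name}",
        "metadata": [
            {"name": name, "value": value, "units": units}
            for name, value, units in avus
        ],
    }
    return json.dumps(document, indent=2)


def metadata_path(coll_name, data_name):
    """Place the JSON file in the "Metadata" sub-directory of the collection.

    Appending ".json" to the data file name is an assumed naming convention.
    """
    return f"{coll_name}/Metadata/{data_name}.json"


text = metadata_json("/Mauna/home/atmos/sensor", "co2.csv", [("co2", "401.3", "ppm")])
target = metadata_path("/Mauna/home/atmos/sensor", "co2.csv")
```

Keeping the units alongside each name and value preserves all three parts of an attribute, which a flat name-to-value mapping would lose.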

7.25 Privacy access restrictions (Policy 14)

There are no restrictions on access for the Mauna Loa sensor data. Typical access restrictions for other DMPs include Institutional Review Board, proprietary data, and copyright. As before, the restrictions can be enforced by placing restricted data in a collection, creating user groups for the allowed users, and only permitting allowed groups to access the data. A standard task is to verify that the access controls have been set correctly. The rule uses the policy functions:

    checkUserInput
    contains
    findZoneHostName

The input variables are:

    *Group      a group name

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_SIZE
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    strlen
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-group-access.r

7.26 IPR restrictions (Policy 91)

We assume that files deposited into the research directory have been published. To ensure public access, we only need to set inheritance on the directory for the “anonymous” account. This can be done as shown for Task 1. This rule uses the policy function:

    checkCollInput

The input variables are:

    *Acl                  an access control
    *RelativeCollection   a relative collection name
    *User                 a user name

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiSetACL
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/odum-inherit.r

A more sophisticated rule would check for a metadata flag that specifies that publication has been done. This rule checks whether the value of a “PUBLICATION” flag is set to 1, and then provides public access. The rule uses the policy functions:

    addAVUMetadata
    checkCollInput
    deleteAVUMetadata

The input variables are:

    *Coll       a collection name

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
    USER_ID
    USER_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiRemoveKeyValuePairsFromObj
    msiSetACL
    msiSetAVU
    msiString2KeyValPair
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/dmp-publication.r

7.27 Web access policies (Policy 92)

A standard approach across the DMPs is to provide a persistent URL for accessing data sets. Within the iRODS data grid, either a URL can be created for public access, or a ticket can be created that defines a persistent URL, defines access controls, and also defines the time period over which the ticket is valid. Any person holding the ticket is allowed access to the data set. Tickets can be created by a web client, or can be created by running the iticket iCommand. A rule can be created to list tickets used within a collection. The rule uses the policy function:

    checkCollInput

The input variables are:

    *Coll       a collection name

The session variables are:

    $rodsZoneClient
    $userNameClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    TICKET_DATA_COLL_NAME
    TICKET_EXPIRY
    TICKET_ID

The operations that are performed are:

    fail
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/dmp-list-tickets.r

7.28 Data sharing system (Policy 93)

The choice of the data management system for sharing or publishing the data products depends on the type of data product. Most DMPs use GitHub to publish code, a database to publish information, and a data repository to publish data. In each of these cases, the data sets are typically publicly accessed. For finer grain access control, a digital repository or data grid is chosen. The data sharing system should provide the following capabilities:

- Collection hierarchy. This is needed to separate the generation of data from the publication of data.
- Access controls. Usually intermediate data products are not released to the public. Derived data products are usually held proprietary until they are verified for quality.
- Support for distributed data. Data products may be located at multiple sites and should be managed by the data sharing system.

7.29 Code distribution system (Policy 94)

The distribution of code may be done through an open source code repository such as GitHub, or through a web site, or even through a data repository. The major challenges are the management of versions, the development of documentation, and unit testing to verify all updates.

7.30 Retention period (Policy 21)

The retention period for the data products is usually measured in years. A challenge, then, is how to show that the data products were retained for the required length of time. One approach is to turn off deletion on the data collection. The policy implements a constraint:

    Applied at the acDataDeletePolicy policy enforcement point

The operations that are performed are:

    msiDeleteDisallowed

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acDataDeletePolicy-collection.re

This prohibits deletion even by an administrator. The files in the collection can then be checked for whether their retention period has been passed. The rule to check retention period uses the policy function:

    checkCollInput

The input variables are:

    *Coll       a collection name

No session variables are used. The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_EXPIRY
    DATA_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiGetSystemTime
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/dmp-check-retention.r
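
Checking whether a retention period has passed amounts to comparing each file's expiration time with the current time. The Python sketch below illustrates this under assumptions: the function expired_files is hypothetical, expiration times are taken to be seconds since the epoch stored as strings, and a missing or zero value is taken to mean that no expiration has been set.

```python
def expired_files(expiry_by_file, now_epoch):
    """Return the names of files whose retention period has passed.

    expiry_by_file maps a file name to its expiration time in seconds since
    the epoch (a hypothetical stand-in for DATA_EXPIRY). Files with no
    expiration set (None, "" or 0) are treated as still under retention.
    """
    expired = []
    for name, expiry in expiry_by_file.items():
        seconds = int(expiry or 0)
        if 0 < seconds <= now_epoch:
            expired.append(name)
    return sorted(expired)


done = expired_files(
    {"a.csv": "1420070400", "b.csv": "1893456000", "c.csv": None, "d.csv": "0"},
    now_epoch=1440460800,
)
```

Treating an unset expiration as "retain" is the conservative choice, since it prevents a file from being reported as releasable only because its retention metadata is missing.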

7.31 Curation plans (Policy 95)

Curation activities include:

- Validation of descriptive metadata
- Validation of provenance metadata
- Setting of access controls
- Verification of data formats

The curation policies can be registered into the iCAT catalog. The policies can then be retrieved from the catalog and published as a report. The example policy lists all of the policies that are being enforced at policy-enforcement points within the iRODS data grid. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:

    *Coll       a collection name
    *Res        a storage resource

The session variables are:

    $rodsZoneClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiAdmShowIRB
    msiCollCreate
    msiDataObjClose
    msiDataObjCreate
    msiDataObjWrite
    msiGetSystemTime
    msiSplitPathByKey
    remote
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/dmp-pepRules.r

7.32 Archive system (Policy 96)

For long term storage, a deposition will be required into the remote archive. If two data grids are federated, then a rule can be run to archive all files from a selected collection into the remote storage location. The rule uses the policy functions:

    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:

    *Acct       a user name
    *Dest       a collection name in *DestZone
    *DestZone   a zone name
    *Res        a storage resource
    *Src        a collection name

The session variables are:

    $rodsZoneClient

The persistent state information is:

    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:

    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCopy
    msiDataObjCreate
    msiGetSystemTime
    msiSetACL
    msiSplitPathByKey
    remote
    select
    strlen
    substr
    writeLine

The rule is available at http://github.com/DICE-UNC/dmp-archive.r

7.33 Replication policy (Policy 13)

The number of replicas can be verified for each file in a collection. This rule lists all files for which the required number of replicas is not available. The rule uses the policy functions:

checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl

The input variables are:

*Coll      a collection name
*Numrep    number of replications
*Res       a storage resource

The session variables are:

$rodsZoneClient
$userNameClient

The persistent state information is:

COLL_ID
COLL_NAME
DATA_ID
DATA_NAME

The operations that are performed are:

fail
foreach
if
msiSetACL
select
writeLine

The rule is available at http://github.com/DICE-UNC/odum-check-replicas.r
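The replica check described in this section can be illustrated outside of the rule language. The following Python sketch is not the odum-check-replicas.r rule itself. It assumes the catalog query has already returned one row per replica, and the tuple layout (collection, data name, replica number) and the function name files_missing_replicas are hypothetical choices made for illustration.

```python
from collections import Counter


def files_missing_replicas(rows, required):
    """Return logical paths that have fewer than `required` replicas.

    `rows` is an iterable of (collection, data_name, replica_number) tuples,
    a hypothetical stand-in for a catalog query on COLL_NAME, DATA_NAME,
    and DATA_REPL_NUM that yields one row per replica.
    """
    # Count how many replica rows exist for each logical file path.
    counts = Counter(f"{coll}/{name}" for coll, name, _repl in rows)
    return sorted(path for path, n in counts.items() if n < required)


# Hypothetical query result: a.dat has two replicas, b.dat has one.
rows = [
    ("/zone/home/alice", "a.dat", 0),
    ("/zone/home/alice", "a.dat", 1),
    ("/zone/home/alice", "b.dat", 0),
]
print(files_missing_replicas(rows, 2))
```

A file with no replica rows at all would not appear in the query result, so a complete check would also need the list of expected file names.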

7.34 Backup policy (Policy 97)

The time period between backups can be set by specifying a periodic rule execution for archiving data. We can turn the rule specified for Task 18 into a periodic rule that is executed every 7 days. The rule uses the policy functions:

checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
isData

The input variables are:

*Acct        a user name
*Dest        a collection name
*DestZone    a zone name
*Res         a storage resource
*Src         a collection name

The session variables are:

$rodsZoneClient
$userNameClient

The persistent state information is:

COLL_ID
COLL_NAME
DATA_CHECKSUM
DATA_ID
DATA_NAME
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME

The operations that are performed are:

delay
fail
foreach
if
msiCollCreate
msiDataObjChksum
msiDataObjCopy
msiDataObjCreate
msiGetSystemTime
msiSetACL
msiSplitPathByKey
remote
select
strlen
substr
writeLine

The rule is available at http://github.com/DICE-UNC/dmp-periodic-backup.r

7.35 Integrity verification (Policy 18)

Integrity checks should be performed periodically to catch failure modes such as media failure, storage system failure, data overwrites, operator error, etc. Even if both the hardware and software perform flawlessly, it is still possible for an operator error to delete or overwrite a file. The replication rule is turned into a rule that is executed every year. A production capable version of the rule is shown that is restartable, monitors the execution rate, checks the input variables, maintains a log file of all actions, repairs corrupted files, and replaces missing replicas. In the “delay” command, the execution frequency for repeating the rule needs to be set. The time units accepted in the EF field are s (seconds), m (minutes), h (hours), d (days), and y (years), so an example for a test roughly every 6 months would be:

delay("<PLUSET>1s</PLUSET><EF>180d</EF>") {

The rule is named “rda-replication-rule.r” and is listed in section 4.5.2. This rule uses the policy functions:

checkCollInput
checkRescInput
createLogFile
checkMetaExistsColl
findZoneHostName
getNumSizeColl
getRescColl
isColl
selectRescUpdate
createReplicas
updateCollMeta


7.36 Technology management policies (Policy 49)

The izonereport command lists the properties of the data grid, including both the iCAT catalog and storage servers. Updates about software versions and hardware versions can be tracked by periodically running the izonereport. The report includes information about micro-service plugins, policies, and storage systems.

7.37 Metadata catalog management (Policy 9)

The metadata catalog, iCAT, contains all of the state information for the data grid. To minimize risk, the metadata catalog should be replicated. Periodic backup dumps of the catalog should be saved outside of the data grid. The data grid uses schema indirection to store descriptive and provenance metadata attributes. Once a standard schema is chosen, the schema can be installed as a HIVE ontology. A rule can then be run to compare the descriptive metadata for each file with the standard schema. An example rule is called validate-ontologies.r and is listed in section 5.12.1.

7.38 Transformative migration (Policy 15)

The migration of data formats to new technology is supported through invocation of external transformation systems, such as NCSA Polyglot and Brown Dog. Access to these systems is invoked through a micro-service that issues HTTP POST and GET commands. Examples for invoking external services are listed in sections 5.13.1 (acPostProcForModifyAVUMetadata.r), 6.27 (hipaa-issue-url.r), and 7.9 (dmp-get-object-url.r).


8 Verifying Policy Sets:

To verify a theory of policy-based data management, a generic characterization of data management systems is needed. To base the discussion on well-known concepts, consider the characterization of file systems shown in Figure 2. The file system comprises an environment that is defined by the state information maintained about each file. Interactions with the file system consist of events that specify an operation. Each operation manipulates a file and changes the associated state information. Operations may require access to state information such as file location, or file size, or file owner. If the state information is consistently updated on each operation applied to files within the file system, the environment can have properties such as completeness, consistency, correctness, and closure. These properties describe four essential elements of data management:

1) What are the basic building blocks for composing procedures?
2) What are the constraints for procedure authoring and deployment?
3) How are procedures implemented?
4) How is the output of procedures handled?

Completeness means that all operations for each managed file type are supported. Consistency means that there are no conflicting procedures. Correctness means that a given operation performs without error. Closure means that operations on files will generate files that are members of the system.

We can evaluate the properties of completeness, consistency, correctness, and closure by analyzing changes to the state information. Typical file system state information is listed in Table 2. The operations performed upon the file system may consist of create, open, close, read, write, update, seek, stat, chown, link, and unlink. An operation may be applied to a file or to a group of files. Interactions with the files are done through interactive execution of clients, which invoke the desired operation through a system call. This approach makes it possible to implement a standard data management approach on different types of hardware systems, which in turn enables the migration of files across storage systems.

Figure 2. File System Characterization

Table 2. File System State Information

File Name
File Location on disk
Creation time
Modification time
File size
Access control
Locks
Soft Link
Directory

We can generalize this model of data management by introducing policies that control the operations performed within the system. In Figure 3, we introduce three significant changes:

• Operations are replaced by policies.
• Files are replaced by objects.
• Updates on objects and on state information are implemented as procedures.

Figure 3. Policy-based Data Management

A given event may invoke multiple policies. Each policy controls the execution of a procedure that chains together multiple operations expressed as micro-services. The objects manipulated by the policies can include resources, users, digital objects, micro-services, rules, metadata, and the properties of the environment itself. For example, consider the addition of a file to the system. Even though the explicit event is a simple file addition, the response of the system may require the execution of multiple policies, with each policy potentially executing procedures that manipulate multiple types of objects. Policies that are executed may include:

1. Authentication of the person adding the file
2. Authorization for the addition of a file
3. Evaluation of a storage quota for the storage resource
4. Creation of a logical name for the file
5. Logical arrangement of the file as a member of a collection
6. Physical aggregation of the file into a container
7. Selection of a storage resource for the physical copy of the file
8. Creation of a physical file name on the storage resource
9. Inheritance of access controls from the collection access controls
10. Creation of a checksum
11. Replication of the file to a second storage location
12. Assignment of a retention period for the file
13. Assignment of a data type to the file based on the file extension
14. Storage of system level metadata (owner name, access controls, checksum, file size, replica location, retention period, file type)
15. Extraction and storage of descriptive metadata
16. Creation of an archival information package (aggregating metadata with the file)
17. Storage of the file

The response of the system is controlled by the policies that are enforced within the environment. A notable challenge is that policy-based data management systems have the ability to change the controlling policies, and therefore change the response of the system to external events. A process for validating the properties of the environment is needed to verify that either the new policies are compatible with prior policies and that the properties of the environment have not changed, or that the impact of the new policies can be defined as approved changes to state information.

We can characterize interactions with the data management system in terms of the allowed events. Events may be initiated interactively by external users, or by time-based procedures, or by changes of state information. In policy-based data management, events are detected at policy-enforcement points, which control the selection of policies that should be applied. The policies in turn control the execution of procedures that read/create/update state information and modify the objects in the system.

Policies invoked at policy-enforcement points control how the environment responds to events. A mapping between events, the policy-enforcement points, the policies, the procedures, and associated changes to state information is necessary to describe the environment. If all changes to state information can be identified for all events, then the properties of the environment can be verified. We can build a characterization of a data management system in terms of the following concepts:

1. Events invoked by users of the system
   a. Create, modify, delete, access
2. Entities that are managed by the system
   a. Users, digital objects, resources (storage, compute), metadata, rules, micro-services, environment framework
3. Policies that control assertions about the environment
   a. Properties associated with each type of entity (provenance metadata, access control, audit trail, aggregation, retention period)
   b. Properties controlling environment operations (number of processing threads, number of I/O streams, choice of physical path name)

We can verify a theory of policy-based data management by analyzing the consistency, completeness, correctness, and closure of the state information after application of every supported event. To do this we will need to define the set of policies that are invoked by each event. For each policy we will need to define the procedures that are invoked, and the set of state information variables that are modified by each procedure. Note that procedures are composed by chaining together micro-services. We can then identify the sets of state information generated or modified by each micro-service. A verification policy can be defined that validates that the revised state information is consistent with the desired collection properties.

This approach can be applied for each data management domain (data sharing, digital library, preservation, processing pipeline) by analyzing the controlling policies and procedures. The results are domain dependent. An analysis needs to be done for each domain and for each change to the set of policies. However, the approach is generic, and the underlying infrastructure that is used to implement the policy-based data management is generic.

In a distributed environment that encompasses multiple storage locations, multiple network paths, and multiple administrative domains, correctness cannot be guaranteed. A storage system may have a media failure and corrupt the data bits. A network may become unavailable and a transfer may not complete. A remote administrator may choose to perform maintenance and take an entire system offline. This implies that the environment needs to be able to detect inconsistencies, and use periodic policies to correct the problems.

A simple example is the management of integrity. A standard approach is to generate a checksum for each file, and replicate the file across multiple storage systems. A policy can be executed periodically that verifies the integrity of each file by comparing the current checksum with the stored value. When a corrupted file is found, the system can delete the corrupted file, create a new replica from an uncorrupted copy, update the system metadata, and log the event.

A goal in a policy-based data management system is to implement policies that verify the desired properties of the environment, and that implement recovery procedures as needed to ensure compliance. An extended goal is to implement policies that ensure that desired properties are maintained as the environment evolves.
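The integrity example above (compare the current checksum with the stored value, then replace a corrupted copy from an uncorrupted replica) can be sketched in Python. This is an illustration under stated assumptions, not the workbook's rule: replicas are modeled as an in-memory dictionary from a resource name to file bytes, SHA-256 is assumed as the checksum algorithm, and the names sha256_hex and verify_and_repair are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum used for the comparison (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def verify_and_repair(replicas, stored_checksum):
    """Verify each replica and overwrite corrupted ones from a good copy.

    `replicas` maps a resource name to the bytes held on that resource
    (a hypothetical in-memory stand-in for physical replicas).
    Returns the sorted names of the resources that were repaired.
    """
    good = [r for r, data in replicas.items()
            if sha256_hex(data) == stored_checksum]
    bad = [r for r in replicas if r not in good]
    if not good:
        # No uncorrupted copy exists, so automatic repair is impossible.
        raise ValueError("no uncorrupted replica available")
    source = replicas[good[0]]
    for resc in bad:
        replicas[resc] = source  # replace the corrupted copy
    return sorted(bad)


# One replica has a flipped character relative to the registered content.
stored = sha256_hex(b"payload")
replicas = {"resc1": b"payload", "resc2": b"paXload"}
repaired = verify_and_repair(replicas, stored)
print(repaired)
```

In a real data grid the repair step would also update the system metadata and write a log entry, as the text describes. Those steps are omitted from the sketch.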

8.1 Analysis of the integrated Rule Oriented Data System

The generality of the approach can be illustrated using the iRODS integrated Rule Oriented Data System [5, 10]. The iRODS software implements virtualization mechanisms that enable the federation of existing data management systems, and the enforcement of desired environment properties across the federated systems. The iRODS data grid manages multiple types of entities independently of the choice of authentication environment, storage system, database, and administrative domain:

• Users (logical user name space)
• Digital objects (files, workflow structured objects, soft links)
• Resources (storage systems, repositories, compute systems)
• Metadata (system state information, provenance information, descriptive information)
• Rules (computer actionable policies that control the execution of procedures)
• Micro-services (computer executable functions that can be chained into procedures)
• Environment framework (the data grid itself).

Standard properties can be generated for each type of entity:

• Logical name (persistent identifier defined by the data grid)
• Access controls
• Aggregation (formation of groups)
• Descriptive metadata
• Audit trail of events and actions

The standard properties are reified as system state information that is stored in a relational database (the iCAT catalog). The impact of each event that accesses the system can be tracked through the corresponding changes to the state information. In iRODS, many of the state information attributes are updated by the iRODS server middleware to guarantee consistency. However, the data grid administrator can customize changes to the system by modifying the policies that are stored in the rule base. Since these policies reflect decisions by the data grid administrator, a procedure is needed that verifies the consistency of the data grid. We can generate a comprehensive assessment of the consistent update of state information by analyzing the mapping of:

• Events (client actions) to multiple policy-enforcement points
• Policies invoked at policy-enforcement points
• Procedures controlled by each policy
• Chain of micro-services invoked by a procedure
• Updates to state information generated by each micro-service
• Verification policy that monitors the state of the system

8.2 Policy-enforcement points

In Appendix A, we list the policy-enforcement points in iRODS. They can be loosely grouped into control points for manipulating files, users, resources, system state information, and environment parameters. While the iRODS data grid provides 71 policy-enforcement points, the standard data grid uses policies at only 11 points, which are listed in section 2.

In practice, sites add rules to enforce specific properties within the data grid. For example, in the SILS LifeTime Library [11] five additional/modified rules are used, listed in section 3. To verify that the LifeTime Library rule set enforces the required properties, we will need to examine which events invoke the policies, and then analyze changes to the state information for consistency.

8.3 Client invocation of policy-enforcement points

In Appendix B, we list events generated by the execution of the unix shell commands provided with the iRODS data grid (icommands). The unix shell commands are the most comprehensive interface for iRODS in terms of the policy-enforcement points that can be triggered. Each command invocation may cause policies at multiple policy-enforcement points to be executed. For the case of loading a file into the data grid, the following ten policy-enforcement points are triggered:

1. acChkHostAccessControl
2. acSetPublicUserPolicy
3. acAclPolicy
4. acSetRescSchemeForCreate
5. acRescQuotaPolicy
6. acSetVaultPathPolicy
7. acPreProcForModifyDataObjMeta
8. acPostProcForModifyDataObjMeta
9. acPostProcForCreate
10. acPostProcForPut

We immediately can see that four of the policies added for the SILS LifeTime Library will need to be verified for their impact on policy-enforcement points 3, 4, 5, and 10 in the above list. The additional policy for the LifeTime Library controls the preferred storage location for replications. An assertion about the properties of the LifeTime Library requires verifying that the new policies have not changed the data grid properties. We do this by checking whether changes to the state information for each of these rules maintain the desired completeness, correctness, closure, and consistency.

A total of 80 different client interactions are listed in Appendix B, along with the policy enforcement points that are triggered. For other events, a different set of policy enforcement points may be triggered. However, all clients (web browsers, load libraries, I/O libraries) will trigger the same policy enforcement points for the same events.
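The selection of enforcement points 3, 4, 5, and 10 can be reproduced with a small Python sketch. The list of triggered points is the one given above for loading a file, and the set of modified points is taken from the five LifeTime Library policies analyzed in section 8.4. The function name peps_to_verify and the data layout are hypothetical.

```python
# Policy-enforcement points triggered when a file is loaded (from the list above).
PUT_PEPS = [
    "acChkHostAccessControl",
    "acSetPublicUserPolicy",
    "acAclPolicy",
    "acSetRescSchemeForCreate",
    "acRescQuotaPolicy",
    "acSetVaultPathPolicy",
    "acPreProcForModifyDataObjMeta",
    "acPostProcForModifyDataObjMeta",
    "acPostProcForCreate",
    "acPostProcForPut",
]

# Enforcement points changed by the SILS LifeTime Library rule set.
LIFETIME_LIBRARY_PEPS = {
    "acAclPolicy",
    "acSetRescSchemeForCreate",
    "acSetRescSchemeForRepl",
    "acRescQuotaPolicy",
    "acPostProcForPut",
}


def peps_to_verify(triggered, modified):
    """Return (position, name) for each triggered point that a site modified."""
    return [(i, name) for i, name in enumerate(triggered, start=1)
            if name in modified]


to_check = peps_to_verify(PUT_PEPS, LIFETIME_LIBRARY_PEPS)
print([position for position, _name in to_check])
```

The fifth modified point, acSetRescSchemeForRepl, does not appear in the result because it is triggered by replication rather than by loading a file.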

8.4 Procedures executed at each policy enforcement point

The procedures executed within the iRODS data grid are composed by chaining together micro-services. Appendix C lists the available micro-services, organized alphabetically. Most of the micro-services do not affect the system state information, and instead are used to manage the workflow, or interact with external systems, or support string manipulation, or support arithmetic operations, or support administrative functions. There are currently 348 micro-services available for use in rules. For each micro-service the set of system attributes that are read, modified, or written is identified. A list of queriable persistent state information attributes is given in Appendix D. If a persistent state information attribute is not included in Appendix C, then it is not read or modified by a micro-service. There are a total of 67 different sets of state information that may be modified. The sets are listed in tables C:2, C:3, and C:4.


Of the list of 348 micro-services, only 103 modify state information. Out of a total of 338 system state attributes, 151 attributes are modified by the micro-services. The mapping challenge is therefore:

• 80 separate client events represented by icommand actions
• 71 policy enforcement points
• 103 micro-services that manipulate state information
• 151 persistent state attributes

The number of combinations that should be checked is

Number of client events * Number of policy enforcement points accessed by the event * Number of micro-services invoked at a policy enforcement point * Number of persistent state attributes modified by a micro-service.
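The count described above can be sketched in Python as a walk over three mappings: events to enforcement points, enforcement points to micro-services, and micro-services to modified attributes. The mappings below are a toy subset with truncated attribute lists, chosen only for illustration, and the event label "put" and the function name count_checks are hypothetical. The last lines compute the product of the four totals quoted in the text, which is a naive upper bound on the work rather than a figure stated in the workbook.

```python
def count_checks(event_to_peps, pep_to_services, service_to_attrs):
    """Count the (event, point, micro-service, attribute) combinations to check."""
    total = 0
    for peps in event_to_peps.values():
        for pep in peps:
            for service in pep_to_services.get(pep, []):
                total += len(service_to_attrs.get(service, []))
    return total


# Toy subset of the mappings (attribute lists are truncated for brevity).
event_to_peps = {"put": ["acSetRescSchemeForCreate", "acPostProcForPut"]}
pep_to_services = {
    "acSetRescSchemeForCreate": ["msiSetDefaultResc"],
    "acPostProcForPut": ["delay", "msiSysReplDataObj"],
}
service_to_attrs = {
    "msiSetDefaultResc": [],  # changes only an in-memory structure
    "delay": ["RULE_EXEC_ID", "RULE_EXEC_TIME"],
    "msiSysReplDataObj": ["DATA_REPL_NUM", "DATA_RESC_NAME", "DATA_CHECKSUM"],
}

toy_total = count_checks(event_to_peps, pep_to_services, service_to_attrs)
print(toy_total)

# Naive upper bound if every total were combined with every other total.
naive_bound = 80 * 71 * 103 * 151
print(naive_bound)
```

Because each event reaches only a few enforcement points, and most micro-services modify no persistent state, the number of combinations that actually need checking is far below the naive bound.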

In the following analysis, we ignore the policy-enforcement points that have not been modified, and the micro-services that are not invoked at a policy-enforcement point. We examine the impact of each policy for the SILS LifeTime Library:

• acAclPolicy enforcement point is used by 37 of the client actions.
  o This policy calls the msiAclPolicy("STRICT") micro-service.
  o The msiAclPolicy sets “STRICT” access in a structure in memory. The persistent state information is not changed directly.
  o To check enforcement of this policy, a listing of files in a non-public user account can be tried to verify that the files cannot be seen.
• acSetRescSchemeForCreate enforcement point is used by 7 of the client actions, basically each time a file is created.
  o This policy calls the msiSetDefaultResc("lifelibResc1","null") micro-service.
  o The msiSetDefaultResc defines the storage system to use for creating a file in a structure in memory. The persistent state information is not changed directly.
  o The impact of the policy can be monitored by running a rule that verifies that each file has a copy residing on “lifelibResc1”:

    ruleverifyFiles {
      # Verify each file has a copy on a specified storage resource
      *Path = "/$rodsZoneClient/home/$userNameClient/%";
      *Q = select DATA_NAME, COLL_NAME where COLL_NAME like '*Path';
      *Count = 0;
      foreach (*R in *Q) {
        *F = *R.DATA_NAME;
        *C = *R.COLL_NAME;
        *Q2 = select count(DATA_ID) where COLL_NAME = '*C' and DATA_NAME = '*F' and DATA_RESC_NAME = '*Resc';
        foreach (*R2 in *Q2) {
          if (*R2.DATA_ID == "0") {
            *Count = *Count + 1;
          }
        }
      }
      writeLine("stdout", "A total of *Count files are not present on *Resc");
    }
    INPUT *Resc="lifelibResc1"
    OUTPUT ruleExecOut

• acSetRescSchemeForRepl enforcement point is used by 1 client action for creating a replica.
  o This policy also calls the msiSetDefaultResc("renci-unix1","null") micro-service.
  o The msiSetDefaultResc defines the storage system to use for replicating a file in a structure in memory. The persistent state information is not changed directly.
  o Enforcement of the policy can be monitored by running a rule that verifies that each file has a replica on “renci-unix1”.

    ruleverifyFiles {
      # Verify each file has a copy on a specified storage resource
      *Path = "/$rodsZoneClient/home/$userNameClient/%";
      *Q = select DATA_NAME, COLL_NAME where COLL_NAME like '*Path';
      *Count = 0;
      foreach (*R in *Q) {
        *F = *R.DATA_NAME;
        *C = *R.COLL_NAME;
        *Q2 = select count(DATA_ID) where COLL_NAME = '*C' and DATA_NAME = '*F' and DATA_RESC_NAME = '*Resc';
        foreach (*R2 in *Q2) {
          if (*R2.DATA_ID == "0") {
            *Count = *Count + 1;
          }
        }
      }
      writeLine("stdout", "A total of *Count files are not present on *Resc");
    }
    INPUT *Resc="renci-unix1"
    OUTPUT ruleExecOut

• acRescQuotaPolicy enforcement point is not called by an icommand.
  o This policy calls the msiSetRescQuotaPolicy("on") micro-service.
  o The msiSetRescQuotaPolicy turns on the storage quota in a structure in memory. The persistent state information is not changed directly.
  o Enforcement of the policy can be checked by running a rule that checks the QUOTA_USAGE.

    ruleQuota {
      # Count number of users that exceed the quota
      *Q = select QUOTA_USER_NAME, QUOTA_OVER;
      *Count = 0;
      foreach (*R in *Q) {
        *Over = double(*R.QUOTA_OVER);
        if (*Over > 0.) {
          *Count = *Count + 1;
          *User = *R.QUOTA_USER_NAME;
          writeLine("stdout", "User *User exceeded quota");
        }
      }
      # Report the total once, after all users have been examined
      writeLine("stdout", "*Count persons exceed quota");
    }
    INPUT null
    OUTPUT ruleExecOut

• acPostProcForPut enforcement point is used by 5 client actions.
  o The policy calls two micro-services:
    ▪ delay("<PLUSET>1s</PLUSET>")
      This uses persistent state variable set #60 to modify state information:
        o RULE_EVENT
        o RULE_EXEC_ADDRESS
        o RULE_EXEC_ESTIMATED_EXE_TIME
        o RULE_EXEC_FREQUENCY
        o RULE_EXEC_ID
        o RULE_EXEC_NAME
        o RULE_EXEC_NOTIFICATION_ADDR
        o RULE_EXEC_PRIORITY
        o RULE_EXEC_REI_FILE_PATH
        o RULE_EXEC_TIME
        o RULE_EXEC_USER_NAME
        o RULE_ID
    ▪ msiSysReplDataObj('renci-unix1','null')
      This reads the persistent state variables in set #18 to collect state information:
        o COLL_CREATE_TIME
        o COLL_ID
        o COLL_MODIFY_TIME
        o COLL_NAME
        o COLL_OWNER_NAME
        o COLL_OWNER_ZONE
        o DATA_ACCESS_DATA_ID
        o DATA_ACCESS_TYPE
        o DATA_ACCESS_USER_ID
        o TOKEN_ID
        o TOKEN_NAME
        o TOKEN_NAMESPACE
        o USER_GROUP_ID
        o USER_ID
        o USER_NAME
        o USER_TYPE
        o USER_ZONE
      This updates persistent state variables for the replica:
        o DATA_CHECKSUM
        o DATA_COLL_ID
        o DATA_COMMENTS
        o DATA_CREATE_TIME
        o DATA_EXPIRY
        o DATA_ID
        o DATA_MAP_ID
        o DATA_MODIFY_TIME
        o DATA_NAME
        o DATA_OWNER_NAME
        o DATA_OWNER_ZONE
        o DATA_PATH
        o DATA_REPL_NUM
        o DATA_RESC_GROUP_NAME
        o DATA_RESC_NAME
        o DATA_SIZE
        o DATA_STATUS
        o DATA_TYPE_NAME
        o DATA_VERSION

The creation of a replica can be verified by running a periodic rule that checks that a replica for each file exists, and that the integrity of the replica has not been compromised.


9 Summary:

The impact of modifications to the policies used in a policy-based data management system can be assessed by analyzing changes to persistent state information. The process requires identifying the events (actions) executed by use of the system, and the responses made to the actions under policy-based control. The responses are mapped from the client events, through policy-enforcement points, to the policies that are enforced, to the micro-services that are executed, and finally to the persistent state information that is modified. Rules that analyze the consistency of the changed state information can then be periodically applied to verify system state. This approach requires an analysis rule for each policy that is changed. An example based on the SILS LifeTime Library policy set is presented.

10 Acknowledgements:

The development of the iRODS data grid and the research results in this paper were funded by the NSF OCI-1032732 grant, "SDCI Data Improvement: Improvement and Sustainability of iRODS Data Grid Software for Multi-Disciplinary Community Driven Application," (2010-2013), and the NSF Cooperative Agreement OCI-0940841, “DataNet Federation Consortium”, (2011-2015). We thank Shane Pusz, University of North Carolina at Chapel Hill, for generating the micro-service usage information for the iRODS state information attributes.

11 References:

1. http://irods.org/download/
2. Moore, R., A. Rajasekar, Michael Conway, Gary Marchionini, M. Nutt, K. Street, M. Sullivan, S. Trujillo, B. Wolfe, “Life Time Library”, JCDL Digital Libraries-Beyond the Desktop workshop, June 16-17, 2011, Ottawa, Canada.
3. Research Data Alliance File Depot, “Implementations: Practical Policy Working Group, September 2014”.
4. Rajasekar, R., M. Wan, R. Moore, W. Schroeder, S.-Y. Chen, L. Gilbert, C.-Y. Hou, C. Lee, R. Marciano, P. Tooby, A. de Torcy, B. Zhu, “iRODS Primer: Integrated Rule-Oriented Data System”, Morgan & Claypool, 2010.
5. Ward, J., M. Wan, W. Schroeder, A. Rajasekar, A. de Torcy, T. Russell, H. Xu, R. Moore, “The integrated Rule-Oriented Data System (iRODS 3.0) Micro-service Workbook”, DICE Foundation, November 2011, ISBN: 9781466469129, Amazon.com.
6. BitCurator: http://www.bitcurator.net/
7. DFXML: http://wiki.bitcurator.net/index.php?title=Fiwalk_and_DFXML
8. Bulk Extractor: http://www.forensicswiki.org/wiki/Bulk_extractor
9. iRODS: https://www.irods.org


10. Rajasekar, R., Wan, M., Moore, R., Schroeder, W., Chen, S.-Y., Gilbert, L., Hou, C.-Y., Lee, C., Marciano, R., Tooby, P., de Torcy, A., and Zhu, B. 2010. iRODS Primer: Integrated Rule-Oriented Data System, Morgan & Claypool. DOI=10.2200/S00233ED1V01Y200912ICR012.

11. Moore, R., A. Rajasekar, Michael Conway, Gary Marchionini, M. Nutt, K. Street, M. Sullivan, S. Trujillo, B. Wolfe, “Life Time Library”, JCDL Digital Libraries-Beyond the Desktop workshop, June 16-17, 2011, Ottawa, Canada.


Appendix A: Policy-enforcement Points

Each policy-enforcement point is named. A policy can be added to the rule base (core.re file) using the name of a policy-enforcement point to invoke a controlling procedure. Thus to set access control to strict (meaning that no one can see the names of anyone else's files), we add the policy:

acAclPolicy { msiAclPolicy("STRICT"); }

The policy invokes the execution of the micro-service msiAclPolicy using the input parameter “STRICT”. Three types of policy-enforcement points are used:

1. Provide control of the execution of a system function.
2. Provide pre-process control for defining input to the system function (acPreProc).
3. Provide post-process control for manipulating the output from the system function (acPostProc).

Table A.1 Policy Enforcement Points

Policy Enforcement Point          Policy
acAclPolicy                       This rule sets Access Control List policy.
acBulkPutPostProcPolicy           This rule sets the policy for executing the post processing put rule (acPostProcForPut) for bulk put.
acCheckPasswordStrength           This is a policy point for checking password strength, called when the admin or user is setting a password.
acChkHostAccessControl            This rule checks the access control by host and user based on the policy given in the HostAccessControl file.
acCreateDefaultCollections        This rule controls creation of standard collections for a new user.
acCreateUser                      This rule enables pre-process and post-process for creation of a user.
acDataDeletePolicy                This rule sets the policy for deleting data objects. This is the PreProcessing rule for delete.
acDeleteUser                      This rule enables pre-process and post-process for user deletion.
acDeleteUserZoneCollections       This rule deletes standard user collections within a zone.
acGetUserByDN                     This rule can be configured to do some special handling of GSI DNs.
acPostProcForCollCreate           This rule sets the post-processing policy for creating a collection.
acPostProcForCopy                 Rule for post processing the copy operation.
acPostProcForCreate               Rule for post processing of data object create.
acPostProcForCreateResource       This rule sets the post-processing policy for creating a new resource.
acPostProcForCreateToken          This rule sets the post-processing policy for creating a new token.
acPostProcForCreateUser           This rule sets the post-processing policy for creating a new user.
acPostProcForDataObjRead          Rule for post processing the read buffer.
acPostProcForDataObjWrite         Rule for pre processing the write buffer.
acPostProcForDelete               This rule sets the post-processing policy for deleting data objects.
acPostProcForDeleteResource       This rule sets the post-processing policy for deleting an old resource.
acPostProcForDeleteToken          This rule sets the post-processing policy for deleting an old token.
acPostProcForDeleteUser           This rule sets the post-processing policy for deleting an old user.
acPostProcForFilePathReg          Rule for post processing the registration of a file path.
acPostProcForGenQuery             This rule sets the post-processing policy for general query.
acPostProcForModifyAccessControl  This rule sets the post-processing policy for access control modification.
acPostProcForModifyAVUmetadata    This rule sets the post-processing policy for adding/deleting and copying the AVU metadata for data, collection, resources, and user.
acPostProcForModifyCollMeta       This rule sets the post-processing policy for modifying system metadata of a collection.
acPostProcForModifyDataObjMeta    This rule sets the post-processing policy for modifying system metadata of a data object.
acPostProcForModifyResource       This rule sets the post-processing policy for modifying the properties of a resource.
acPostProcForModifyResourceGroup  This rule sets the post-processing policy for modifying membership of a resource group.
acPostProcForModifyUser           This rule sets the post-processing policy for modifying the properties of a user.
acPostProcForModifyUserGroup      This rule sets the post-processing policy for modifying membership of a user group.
acPostProcForObjRename            This rule sets the post-processing policy for renaming (logically moving) data and collections.
acPostProcForOpen                 Rule for post processing of data object open.
acPostProcForPhymv                Rule for post processing of data object move of a physical file path (e.g. ireg command).
acPostProcForPut                  Rule for post processing the put operation.
acPostProcForRepl                 Rule for post processing of data object replication.
acPostProcForRmColl               This rule sets the post-processing policy for removing a collection.
acPostProcForTarFileReg           Rule for post processing the registration of the extracted tar file (from ibun -x).
acPreprocForCollCreate            This is the PreProcessing rule for creating a collection.
acPreProcForCreateResource        This rule sets the pre-processing policy for creating a new resource.
acPreProcForCreateToken           This rule sets the pre-processing policy for creating a new token.
acPreProcForCreateUser            This rule sets the pre-processing policy for creating a new user.
acPreprocForDataObjOpen           Preprocess rule for opening an existing data object which is used by the get, copy and replicate operations.
acPreProcForDeleteResource        This rule sets the pre-processing policy for deleting an old resource.
acPreProcForDeleteToken           This rule sets the pre-processing policy for deleting an old token.
acPreProcForDeleteUser            This rule sets the pre-processing policy for deleting an old user.
acPreProcForExecCmd               Rule for pre processing when remotely executing a command.
acPreProcForGenQuery              This rule sets the pre-processing policy for general query.
acPreProcForModifyAccessControl   This rule sets the pre-processing policy for access control modification.
acPreProcForModifyAVUmetadata     This rule sets the pre-processing policy for adding/deleting and copying the AVU metadata for data, collection, resources, and user.
acPreProcForModifyCollMeta        This rule sets the pre-processing policy for modifying system metadata of a collection.
acPreProcForModifyDataObjMeta     This rule sets the pre-processing policy for modifying system metadata of a data object.
acPreProcForModifyResource        This rule sets the pre-processing policy for modifying the properties of a resource.
acPreProcForModifyResourceGroup   This rule sets the pre-processing policy for modifying membership of a resource group.
acPreProcForModifyUser            This rule sets the pre-processing policy for modifying the properties of a user.
acPreProcForModifyUserGroup       This rule sets the pre-processing policy for modifying membership of a user group.
acPreProcForObjRename             This rule sets the pre-processing policy for renaming (logically moving) data and collections.
acPreprocForRmColl                This is the PreProcessing rule for removing a collection. Currently there is no function written specifically for this rule.
acRenameLocalZone                 This rule renames the zone and all collections within the zone.
acRescQuotaPolicy                 This rule sets the policy for a resource quota.
acSetChkFilePathPerm              This rule manages mounting of collections.
acSetMultiReplPerResc             Preprocess rule for replicating an existing data object.
acSetNumThreads                   Rule to set the number of threads for a data transfer.
acSetPublicUserPolicy             This rule sets the policy for the set of operations that are allowable for the user "public".
acSetRescSchemeForCreate          This is the preprocessing rule for creating a data object.
acSetRescSchemeForRepl            This is the preprocessing rule for replicating a data object.
acSetReServerNumProc              This rule sets the policy for the number of processes to use when running jobs in the irodsReServer.
acSetVaultPathPolicy              This rule sets the policy for creating the physical path in the iRODS resource vault.
acTicketPolicy                    This is a policy point for ticket-based access control.
acTrashPolicy                     This rule sets the policy for whether the trash can should be used.

Appendix B:  Client Invocation of Policy Enforcement Points

Each policy enforcement point may be invoked by multiple client events. For events that manipulate files, up to 12 policy enforcement points are accessed for each interaction. In the following tables, the columns list the policy enforcement points. Client actions that invoke a policy enforcement point are listed in separate rows. Note that each table defines events that invoke different policy enforcement points.
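As a rough illustration of how the rows in these tables can be read, the following Python sketch counts the marks recorded for a few client actions in Table B.1 and checks them against the stated limit of 12 policy enforcement points per file interaction. The mark strings are copied from the table rows; the names `TABLE_B1_MARKS`, `count_marks`, `pep_counts`, and `busiest` are hypothetical helpers introduced only for this example, and the sketch deliberately ignores which column each mark belongs to.

```python
# Mark strings copied from five rows of Table B.1. Only the number of marks is
# used here; which column each mark belongs to is not encoded in this sketch.
TABLE_B1_MARKS = {
    "icp": "x  x  x x x x x x x x x x",
    "irepl": "x  x  x x x x x x x x",
    "imv": "x  x  x x x x x x x",
    "ichksum": "x  x  x x x",
    "imkdir": "x  x  x          x  x",
}


def count_marks(marks: str) -> int:
    """Return how many policy enforcement points are marked in one row."""
    return sum(1 for token in marks.split() if token == "x")


# Number of policy enforcement points invoked by each client action.
pep_counts = {cmd: count_marks(marks) for cmd, marks in TABLE_B1_MARKS.items()}

# The client action in this sample that touches the most enforcement points.
busiest = max(pep_counts, key=pep_counts.get)

if __name__ == "__main__":
    for cmd, n in sorted(pep_counts.items(), key=lambda item: -item[1]):
        print(f"{cmd:8s} {n}")
```

In this sample, the copy operation (icp) is the action that reaches the 12-point maximum, while simpler actions such as ichksum and imkdir touch far fewer enforcement points.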

Table B.1 File manipulation events

icommands

Policy enforcement points (table columns, left to right): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acSetRescSchemeForCreate, acRescQuotaPolicy, acSetVaultPathPolicy, acPreProcForModifyDataObjMeta, acPostProcForModifyDataObjMeta, acPreprocForDataObjOpen, acPostProcForOpen, acSetRescSchemeForRepl, acSetMultiReplPerResc, acPostProcForCreate, acPostProcForPut, acPostProcForCopy, acPostProcForRep, acPostProcForPhym, acPreProcForObjRename, acPostProcForObjRename, acPreProcForRmColl, acTrashPolicy, acDataDeletePolicy, acPreProcForCollCreate, acPostProcForCollCreate, acPostProcForFilePathReg, acPostProcForRmColl, acPostProcForDelete

icp  Copy a file  x  x  x x x x x x x x x x               

icp ‐N 2  Copy a file using 2 I/O threads  x  x  x x x x x x x x    x    x

iphybun  Physically bundle a collection  x  x  x x x x x x x    x

irepl  Replicate a file  x  x  x x x x x x x x

ibun ‐c D  Upload/download tar files  x  x  x x x x x x          x x

iput  Put a file into the data grid  x  x  x x x x x x          x x

iphymv  Physically move a file  x  x  x x x x x x       x             x

imv  Move a file  x  x  x x x x x x x

irm  Remove a file  x  x  x x x x x x x     x  x

irm ‐r collection  Recursively remove a collection  x  x  x       x x x       x                x x  x  x  x

ichksum  Checksum a file  x  x  x x x

iput ‐f  Overwrite an existing file  x  x  x          x x x x       x

irsync  Synchronize two collections  x  x  x          x x x x       x

irule ‐ msiDataObjWrite  Write a file  x  x  x          x x x x       x

irule ‐ msiDataObjRead  Read a file  x  x  x                x x

idbo exec  Execute a database resource  x  x  x                x x

iget  Get a file from the data grid  x  x  x                x x

igetwild.sh  Get multiple files  x  x  x x x               

imkdir  Make a directory  x  x  x          x  x 

ireg  Register a file  x  x  x          x  x  x

irmtrash  Empty trash  x  x  x     x        x x

Table B.2 Events that manipulate users and resources

icommands

Policy enforcement points (table columns, left to right): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acCreateUser, acPreProcForCreateUser, acCreateUserF1, acCreateDefaultCollections, acCreateUserZoneCollections, acCreateCollByAdmin, acCreateUserZoneCollections, acCreateDefaultCollections, acPostProcForCreateUser, acPreProcForModifyUser, acPostProcForModifyUser, acDeleteUser, acPreProcForDeleteUser, acDeleteUserF1, acDeleteDefaultCollections, acDeleteUserZoneCollections, acDeleteCollByAdmin, acPostProcForDeleteUser, acPreProcForCreateResource, acPostProcForCreateResource, acPreProcForDeleteResource, acPostProcForDeleteResource

iadmin mkuser  Make a user x  x     x x x x x x x x x              

iadmin mkgroup  Make a user group  x  x     x  x  x  x  x  x  x  x  x

iadmin moduser  Modify a user  x  x                                x  x

ipasswd  Create password  x  x     x x              

iadmin rmuser  Remove user  x  x     x x x x  x  x  x      

iadmin mkresc  Make a resource  x  x              x  x

iadmin rmresc  Remove a resource  x  x                                                                 x  x

Table B.3 Administrative Operations

icommands

Policy enforcement points (table columns, left to right): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acPreProcForModifyResource, acPostProcForModifyResource, acPreProcForModifyUserGroup, acPostProcForModifyUserGroup, acPreProcForModifyResourceGroup, acPostProcForModifyResourceGroup, acPreProcForCreateToken, acPostProcForCreateToken, acPreProcForDeleteToken, acPostProcForDeleteToken, acVacuum, acPreProcForModifyAVUMetadata, acPostProcForModifyAVUMetadata, acPreProcForModifyAccessControl, acPostProcForModifyAccessControl, acPreProcForModifyCollMeta, acPostProcForModifyCollMeta, acRenameLocalZone, acGetIcatResults, acPurgeFiles

iadmin modresc Modify a resource  x x x x               

iadmin atg  Add user to group  x x x x               

iadmin rfg  Remove user from group  x x x x

iadmin atrg  Add resource to resource group x x x x

iadmin rfrg  Remove resource from resource group  x x                x x

iadmin at  Add token  x x x x               

iadmin rt  Remove token  x x x x               

iadmin pv  Initiate database vacuum  x x x               

imeta  List metadata  x x x x             

ichmod  Change access  x x x    x  x       

imcoll ‐m l  Mount a collection  x x x          x  x 

iadmin modzone  Modify a zone  x x                                                       x

irule ‐ acPurgeFiles  Purge deleted files  x x x                                                       x x

Table B.4 Operations on metadata, rules, and remote execution

icommands

Policy enforcement points (table columns, left to right): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acConvertToInt, acGetUserByD, acSetNumThreads, acSetChkFilePathPerm, acSetReServerNumProc, acPreProcForGenQuery, acPostProcForGenQuery, acPostProcForDataObjWrite, acPostProcForDataObjRead

irule ‐ acConvertToInt  Execute a rule to convert to integer x x x x        

gsi authentication  Authenticate using GSI x        

irule ‐ acSetNumThreads  Set number of threads for data transfer x x x         

irule ‐ msiNoChkFilePathPerm.r  Set permissions for registration x x x       

irule ‐ acSetReServerNumProc  Set number of execution threads x x   x     

PrePostProcForGenQueryFlag = 1  Execute general query     x  x 

ReadWriteRuleState = ON_STATE  Modify a data object         x x

irule ‐ rulemsiExecGenQuery  Execute a general query x x x        

iinit  Initialize access to the data grid x x        

iadmin  Administration interface x x        

iadmin mkdir  Make a directory x x        

icd  Change directory x x x        

iexecmd  Execute a remote command x x        

ifsck  Check consistency of data in vault x x x        

ilocate  Search for a file x x x        

ils  List files  x x x        

ilsresc  List resources x x x        

imiscsvrinfo  List server information x x        

ips  Display connections for running agents x x        

iqdel  Delete rule from queue x x x        

iqmod  Modify rule in queue x x        

iqstat  List rules in queue x x x        

iquest  Query metadata catalog x x x        

iquota  Show information on iRODS quotas x x x        

irule  Execute a rule x x        

iscan i:  Check registration of local files x x x        

isysmeta  List system metadata x x        

itrim  Delete replicas x x x        

iuserinfo  List user information x x x        

ixmsg  Send a message        

ienv  List environment variables        

ihelp  List icommands        

iadmin mkzone  Make a data grid        

iadmin rmzone  Remove a data grid        

iadmin asq  Set an alias        

iadmin rsq  Remove an alias        

ierror  List error message        

iexit  Exit from the data grid        

ipwd  Change password        

Appendix C:  Micro‐services

The micro‐services encapsulate basic operations that may be useful when implementing a policy. The types of operations include manipulation of:

1. Collections
2. Data objects
3. Output files and strings
4. Rule base
5. Workflow
6. Messaging system
7. Environment
8. Metadata
9. External services
10. Remote database access
11. Soft links
12. HDF
13. Property lists
14. URLs
15. Web services
16. XML

For each micro‐service, an identifier is provided that defines the set of persistent state variables read or modified by execution of the micro‐service. The persistent state variable sets are listed in Table C.2. Note that micro‐services that do not modify state information are listed with persistent state set “0”.
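One way to work with the persistent state set identifiers is to treat the micro‐service table as a simple lookup. The Python sketch below uses a handful of (micro‐service, persistent state set) pairs copied from Table C.1 to separate micro‐services listed with set 0 from those that share a non‐zero set. The names `STATE_SET`, `stateless`, and `group_by_state_set` are hypothetical and exist only for this illustration; the sketch does not model the contents of the sets themselves.

```python
from collections import defaultdict

# A few (micro-service, persistent state set) pairs copied from Table C.1.
STATE_SET = {
    "msiCollCreate": 24,
    "msiCollectionSpider": 15,
    "msiDataObjChksum": 15,
    "msiDataObjRsync": 15,
    "msiSetACL": 4,
    "msiSleep": 0,
    "msiStrlen": 0,
}


def stateless(services: dict[str, int]) -> list[str]:
    """Micro-services listed with persistent state set 0."""
    return sorted(name for name, state_set in services.items() if state_set == 0)


def group_by_state_set(services: dict[str, int]) -> dict[int, list[str]]:
    """Group the remaining micro-services by their persistent state set ID."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name, state_set in sorted(services.items()):
        if state_set != 0:
            groups[state_set].append(name)
    return dict(groups)


if __name__ == "__main__":
    print("set 0:", stateless(STATE_SET))
    for state_set, names in sorted(group_by_state_set(STATE_SET).items()):
        print(f"set {state_set}:", names)
```

Grouping by identifier makes it easy to see which micro‐services touch the same persistent state variables, for example the three entries in this sample that share set 15.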

Table C.1 List of micro‐services available in iRODS version 4.0

Micro‐service    Persistent State Set 

‐  Negation operator for arithmetic 0

!  Negation operator for boolean variables 0

!=  Negation operation for conditional test 0

.  Structure operator for extracting variables from structure 0

*  Workflow variable 0

/  Division operator for arithmetic 0

&&  And operator for query 0

%  Modulo operator for arithmetic 0

%%  Or operator for query 0

^  Exponentiation operator for arithmetic 0

^^  Calculate nth root for arithmetic 0

+  Addition operator for arithmetic 0

++  Addition operator for strings 0

<   Less than operator for conditional tests 0

<=  less than or equal operator for conditional tests 0

=  Assignment operator for variables 0

==  Equal operator for conditional tests 0

>   Greater than operator for conditional tests 0

>=  Greater than or equal operator for conditional tests 0

||  Or operator for query 0

abs  Absolute value operator for arithmetic 0

applyAllRules  Apply all rules 0

average  Average operator for arithmetic 0

bool  Boolean type operator 0

break  Break loop execution operator for workflow 0

ceiling  Calculate closest larger integer for arithmetic 0

cons  List definition operator 0

cut  No retry operator on failure for workflow 0

datetime  Date‐time converter for workflow 0

datetimef  Date‐time formatted converter for workflow 0

delay  Delay execution of a rule 60

double  Double type operator 0

elem  List element operator 0

errorcode  Trap error code operator for workflow 0

errormsg  Trap error message operator for workflow 0

eval  Evaluate code 0

execCmdArg  Execute remote command with an argument 0

exp  Exponentiation operator for arithmetic 0

fail  Fail operator for workflow 0

floor  Calculate closest lower integer for arithmetic 0

for  For loop operator for workflow 0

foreach  For each loop operator for workflow list 0

hd  Calculate the head of a list 0

if  Conditional test for workflow 0

int  Integer type operator 0

let  Define function variables in an expression 0

like  Similarity operator for query 0

like regex  Similarity operator for query 0

list  List structure type 0

log  Logarithm operator for arithmetic 0

match  Matches a string against a regular expression 0

max  Maximum operator for arithmetic 0

min  Minimum operator for arithmetic 0

msiAclPolicy  Set access control policy 0

msiAddConditionToGenQuery  Add condition to a general query 0

msiAddKeyVal  Add key‐value pair to an in‐memory structure 0

msiAddKeyValToMspStr Add key‐value pair to an in‐memory structure for concatenating command arguments  0 

msiAddSelectFieldToGenQuery  Add select field to a general query 0

msiAddToNcArray  Modify an array in a netCDF file 0

msiAddUserToGroup  Admin ‐ add a user to a group 66

msiAdmAddAppRuleStruct  Admin ‐ add rules to an in‐memory structure 0

msiAdmAppendToTopOfCoreRE  Admin ‐ append rules to the top of the rule base (core.re file)  0

msiAdmChangeCoreRE  Admin ‐ change the rule base (core.re file) 0

msiAdmClearAppRuleStruct  Admin ‐ clear rules from the in‐memory structure 0

msiAdmInsertDVMapsFromStructIntoDB Admin ‐ Insert persistent state name maps from memory structure into database  48 

msiAdmInsertFNMapsFromStructIntoDB Admin ‐ Insert function name maps from memory structure into database  51

msiAdmInsertMSrvcsFromStructIntoDB Admin ‐ insert micro‐service names from in‐memory structure into database  54 

msiAdmInsertRulesFromStructIntoDB  Admin ‐ Insert rules from memory structure into database 58

msiAdmReadDVMapsFromFileIntoStruct  Admin ‐ load persistent state name maps from file into memory structure  0

msiAdmReadFNMapsFromFileIntoStruct  Admin ‐ Load function name maps from file into memory structure  0

msiAdmReadMSrvcsFromFileIntoStruct  Admin ‐ Read micro‐service name maps from file into memory structure  0

msiAdmReadRulesFromFileIntoStruct  Admin ‐ Read rules from file into memory structure 0

msiAdmRetrieveRulesFromDBIntoStruct  Admin ‐ Load rules from database into a memory structure 59

msiAdmShowCoreRE  Admin ‐ list rules from rule base (core.re file) 0

msiAdmShowDVM  Admin ‐ list persistent state names 0

msiAdmShowFNM  Admin ‐ list function names (micro‐services) 0

msiAdmWriteDVMapsFromStructIntoFile  Admin ‐ write persistent state name maps from memory into a file  0

msiAdmWriteFNMapsFromStructIntoFile  Admin ‐ write function name maps from memory into a file 0

msiAdmWriteMSrvcsFromStructIntoFile  Admin ‐ write micro‐service names from memory into a file 0

msiAdmWriteRulesFromStructIntoFile  Admin ‐ write rules from memory into a file 0

msiApplyDCMetadataTemplate Apply the Dublin Core template to set attribute‐value‐unit triplets on a digital object  27 

msiAssociateKeyValuePairsToObj  Add attribute‐value‐units to a digital object, specified as key‐value pairs  7

msiAutoReplicateService  Verify integrity and repair corrupted digital objects 26

msiBytesBufToStr  Format a buffer into a string 0

msiCheckAccess  Check access control 28

msiCheckHostAccessControl  Check host access control 65

msiCheckOwner  Check owner of a digital object 0

msiCheckPermission  Check access permissions 0

msiCloseGenQuery  Close the memory structure for a general query 0

msiCollCreate  Create a collection 24

msiCollectionSpider  Apply workflow to digital objects in a collection 15

msiCollRepl  Replicate a collection 18

msiCollRsync  Recursively synchronize a source collection with a target collection  14

msiCommit  Commit a change to the metadata catalog 0

msiConvertCurrency  Get conversion rates for currencies from a web service 0

msiCopyAVUMetadata  Copy attribute‐value‐units between digital objects 27

msiCreateCollByAdmin  Admin ‐ create a collection 2

msiCreateUser  Admin ‐ create a user 63

msiCreateUserAccountsFromDataObj  Create user accounts specified in a list in a digital object 20

msiCreateXmsgInp  Create an Xmsg packet from input parameters (messaging system)  0

msiCutBufferInHalf  Decrease size of an in‐memory buffer 0

msiDataObjAutoMove  Move a file into a destination collection 13

msiDataObjChksum  Checksum a digital object 15

msiDataObjClose  Close a digital object 47

msiDataObjCopy  Copy a digital object 16

msiDataObjCreate  Create a digital object 13

msiDataObjGet  Get a digital object 13

msiDataObjLseek  Seek to a location in a digital object 0

msiDataObjOpen  Open a digital object 20

msiDataObjPhymv  Physically move a digital object 22

msiDataObjPut  Put a digital object into the data grid 0

msiDataObjRead  Read a digital object 0

msiDataObjRename  Rename a digital object 13

msiDataObjRepl  Replicate a digital object 13

msiDataObjRsync  Synchronize a digital object with an iRODS collection 15

msiDataObjTrim  Delete selected replicas of a digital object 13

msiDataObjUnlink  Delete a digital object 20

msiDataObjWrite  Write a digital object 0

msiDboExec  Execute a database resource object 56

msiDbrCommit  Execute a database resource commit 56

msiDbrRollback  Rollback a database resource object 56

msiDeleteCollByAdmin  Admin‐ delete a collection 36

msiDeleteDisallowed  Turn off deletion for a digital object 0

msiDeleteUnusedAVUs  Delete unused attribute‐value‐unit triplets 52

msiDeleteUser  Delete a user 67

msiDeleteUsersFromDataObj  Delete users specified in a list in a digital object 20

msiDigestMonStat  Generate and store load factors for monitoring resources 61

msiDoSomething  Template for constructing a new micro‐service 0

msiExecCmd  Execute a remote command 0

msiExecGenQuery  Execute general query  user defined

msiExecStrCondQuery  Convert a string to a query and execute  user defined

msiExit  Add a user explanation to the error stack 0

msiExportRecursiveCollMeta Recursively export collection metadata into a buffer using pipe‐delimited format  33 

msiExtractTemplateMDFromBuf Use a template to apply pattern matching to a buffer and extract key‐value pairs  0 

msiFlagDataObjwithAVU  Add an attribute‐value‐unit to a digital object 27

msiFlagInfectedObjs  Parse the output from clamscan and flag infected objects 20

msiFloatToString  Convert a binary variable to a string 0

msiFlushMonStat  Delete old usage monitoring statistics 0

msiFreeBuffer  Free space allocated to an in‐memory buffer 0

msiFreeNcStruct  Free an in‐memory structure used to process netCDF files 0

msiFtpGet  Get a file from an FTP site 9

msiGetAuditTrailInfoByActionID  Get audit trail information based on ActionID 1

msiGetAuditTrailInfoByKeywords  Get audit trail information based on use of keywords 1

msiGetAuditTrailInfoByObjectID  Get audit trail information based on ObjectIDs 1

msiGetAuditTrailInfoByTimeStamp  Get audit trail information based on time stamps 1

msiGetAuditTrailInfoByUserID  Get audit trail information based on userID 1

msiGetCollectionACL  Get access controls for a collection 6

msiGetCollectionContentsReport  Generate a report of collection contents 34

msiGetCollectionPSmeta  Get attribute‐value‐units from a collection in pipe‐delimited format  38

msiGetCollectionSize  Get the size of a collection 35

msiGetContInxFromGenQueryOut Get continuation index for whether additional rows are available for a query result  0 

msiGetDataObjACL  Get access control list for a digital object 19

msiGetDataObjAIP  Create XML file containing system and descriptive metadata  12

msiGetDataObjAVUs  Get attribute‐value‐units from a digital object 32

msiGetDataObjPSmeta  Get attribute‐value‐units from a digital object in pipe‐delimited format  32

msiGetDiffTime  Get the difference between two system times 0

msiGetDVMapsFromDBIntoStruct  Load persistent state name maps from database into memory structure  49

msiGetFNMapsFromDBIntoStruct  Load function name maps from database into memory structure  50

msiGetIcatTime  Get the system time from the metadata catalog 0

msiGetMoreRows  Get more query results 0

msiGetMSrvcsFromDBIntoStruct   Load micro‐service names from database into memory structure  53

msiGetObjectPath  Convert from in‐memory structure to string for printing 0

msiGetObjType  Get the type of digital object (file, collection, user, resource)  31

msiGetQuote  Get stock quotation by accessing external web service 0

msiGetRescAddr  Get the IP address of a storage resource 0

msiGetRulesFromDBIntoStruct  Load rules from database into a memory structure 59

msiGetSessionVarValue  Get value of a session variable from in‐memory structure 0

msiGetStderrInExecCmdOut  Retrieve standard error from remote command execution 0

msiGetStdoutInExecCmdOut  Retrieve standard out from remote command execution 0

msiGetSystemTime  Get the system time from the iRODS server 0

msiGetTaggedValueFromString  Use pattern‐based extraction to retrieve a value for a tag from a string  0

msiGetUserACL  Get access control list for a user 30

msiGetUserInfo  Get information about a user 64

msiGetValByKey  Extract a value from in‐memory structure that holds result of a query  0

msiGoodFailure  Force failure in a workflow without initiating recovery procedures  0

msiGuessDataType  Guess the data type based on the file extension 62

msiH5Dataset_read  Read an HDF5 file 0

msiH5Dataset_read_attribute  Get attributes from an HDF5 file 0

msiH5File_close  Close an HDF5 file 44

msiH5File_open  Open an HDF5 file 25

msiH5Group_read_attribute  Get group attributes from an HDF5 file 0

msiHumanToSystemTime  Convert human time format to system time format 0

msiImageConvert  Convert image format 0

msiImageGetProperties Get image properties from an image (Colors, ColorSpace, Depth, Format, Gamma, …)  0 

msiIp2location  Convert an IP address to a location using an external web service  0

msiIsColl  Verify digital object is a collection 37

msiIsData  Check if digital object is a file 31

msiListEnabledMS  List enabled micro‐services 0

msiLoadACLFromDataObj  Load access controls from a list in a digital object 20

msiLoadMetadataFromDataObj  Load attribute‐value‐units from a list in a digital object 20

msiLoadMetadataFromXml  Load metadata for digital objects from an XML file 11

msiLoadUserModsFromDataObj  Load user information from a list in a digital object 20

msiMakeGenQuery  Make a general query 0

msiMakeQuery  Construct a query 0

msiMergeDataCopies  Merge multiple collections to create an authoritative version  17

msiNccfGetVara  Get variables from a netCDF file 0

msiNcClose  Close a netCDF file 0

msiNcCreate  Create a netCDF file 10

msiNcGetArrayLen  Get array length from a netCDF file 0

msiNcGetAttNameInInqOut  Get attribute names from a netCDF file 0

msiNcGetAttValStrInInqOut  Get attribute values from a netCDF file 0

msiNcGetDataType  Get data type from a netCDF file 0

msiNcGetDimLenInInqOut  Get dimension length from a netCDF file 0

msiNcGetDimNameInInqOut  Get dimension name from a netCDF file 0

msiNcGetElementInArray  Get an element from an array in a netCDF file 0

msiNcGetFormatInInqOut  Get the format of a netCDF file 0

msiNcGetGrpInInqOut  Get group information from a netCDF file 0

msiNcGetNattsInInqOut  Get the number of attributes in a netCDF file 0

msiNcGetNdimsInInqOut  Get the number of dimensions in a netCDF file 0

msiNcGetNGrpsInInqOut  Get the number of groups in a netCDF file 0

msiNcGetNumDim  Get a dimension from a netCDF file 0

msiNcGetNvarsInInqOut  Get the number of variables in a netCDF file 0

msiNcGetVarIdInInqOut  Get a variable ID from a netCDF file 0

msiNcGetVarNameInInqOut  Get a variable name from a netCDF file 0

msiNcGetVarsByType  General variable sub‐setting function for a netCDF file 0

msiNcGetVarTypeInInqOut  Get a variable type from a netCDF file 0

msiNcInq  Query a netCDF file 0

msiNcInqGrps  Get group paths for a given netCDF ID 0

msiNcInqId  Get netCDF ID 0

msiNcInqWithId  Query a netCDF file with a netCDF ID 0

msiNcIntDataTypeToStr  Convert netCDF data type to a string 0

msiNcOpen  Open a netCDF file 13

msiNcOpenGroup  Open a group within a netCDF file 0

msiNcRegGlobalAttr  Register a global attribute in a netCDF file 0

msiNcSubsetVar  Subset a variable in a netCDF file 0

msiNcVarStat  List variable information in a netCDF file 0

msiNoChkFilePathPerm Set policy for checking the file path permission when registering a physical file path  0 

msiNoTrashCan  Set policy for use of trash can 0

msiObjByName  Retrieve astronomy images by name using web services 0

msiobjget_dbo  Get a database object from a registered database resource 0

msiobjget_http  Get an http page from a registered web site 0

msiobjget_irods  Get a file from a registered iRODS path name 0

msiobjget_slink  Get a digital object referenced by a soft link to an iRODS data grid  20

msiobjget_srb  Get a file from a registered Storage Resource Broker path name  0

msiobjget_test  Test the micro‐service object framework 0

msiobjget_z3950  Get an object from a registered Z39.50 site 0

msiobjput_dbo  Write a registered database object resource 0

msiobjput_http  Write a registered http page 0

msiobjput_irods  Write a registered iRODS digital object 0

msiobjput_slink  Write a registered iRODS digital object in a remote iRODS data grid  0

msiobjput_srb  Write a registered Storage Resource Broker digital object 0

msiobjput_test  Test the micro‐service object framework 0

msiobjput_z3950  Write a registered Z 39.50 digital object 0

msiObjStat  Get status of digital object for workflow 21

msiOprDisallowed  Disallow an operation 0

msiPhyBundleColl  Physically bundle a collection 23

msiPhyPathReg  Register a physical path 0

msiPrintGenQueryInp  Print a general query 0

msiPrintGenQueryOutToBuffer  Write contents of output results from a general query into a buffer  0

msiPrintKeyValPair  Print a key value pair returned from a query 0

msiPropertiesAdd  Add properties to a list 0

msiPropertiesClear  Clear properties from a list 0

msiPropertiesClone  Clone a properties list 0

msiPropertiesExists  Verify existence of properties in a list 0

msiPropertiesFromString  Create a properties list from a string 0

msiPropertiesGet  Get a property from a list 0

msiPropertiesNew  Create a new property list 0

msiPropertiesRemove  Remove properties from a list 0

msiPropertiesSet  Set the value of a property in a list 0

msiPropertiesToString  Convert a property list into a string buffer 0

msiQuota  Admin ‐ calculate storage usage and check storage quotas 46

msiRcvXmsg  Receive an Xmsg packet (messaging system) 0

msiReadMDTemplateIntoTagStruct  Parse a buffer holding a tag template and store the tags in an in‐memory tag structure  0

msiRecursiveCollCopy  Recursively copy a collection 5

msiRemoveKeyValuePairsFromObj Remove attribute‐value‐unit from digital object, specified as key‐value pair  28 

msiRenameCollection  Rename a collection 8

msiRenameLocalZone  Admin ‐ Rename the local zone (data grid) 40

msiRmColl  Remove a collection 39

msiRollback  Roll back a database transaction 0

msiSdssImgCutout_GetJpeg  Get an astronomy image cutout using a web service 0

msiSendMail  Send e‐mail message 0

msiSendStdoutAsEmail  Send standard output as an e‐mail message 0

msiSendXmsg  Send an Xmsg packet (messaging system) 0

msiServerBackup  Backup an iRODS server to a local vault 3

msiServerMonPerf  Monitor server performance 57

msiSetACL  Set an access control 4

msiSetBulkPutPostProcPolicy Control use of the acPostProcForPut policy when using a bulk put operation  0 

msiSetChkFilePathPerm  Disallow non‐admin user from registering files 0

msiSetDataObjAvoidResc  Disallow use of a storage resource 0

msiSetDataObjPreferredResc  Set the preferred storage resource 0

msiSetDataType  Set the type of digital object (file, collection, user, resource) 41

msiSetDataTypeFromExt  Set a recognized data type for a digital object based on its extension  42

msiSetDefaultResc  Set the default storage resource 0

msiSetGraftPathScheme  Define the physical path name for storing files 0

msiSetMultiReplPerResc  Allow multiple replicas to exist on the same storage resource  0

msiSetNoDirectRescInp  Define a list of resources that cannot be used by a normal user  0

msiSetNumThreads  Set the number of threads used for parallel I/O 0

msiSetPublicUserOpr  Set a list of operations that can be performed by the user "public"  0

msiSetQuota  Set resource usage quota 55

msiSetRandomScheme  Set the physical path name based on a randomly generated path  0

msiSetReplComment  Set data object comment field 29

msiSetRescQuotaPolicy  Turn resource quotas on or off 0

msiSetRescSortScheme  Set the scheme used for selecting a storage resource 0

msiSetReServerNumProc  Set the number of execution threads for processing rules 0

msiSetResource  Set the resource to use within a workflow 0

msiSleep  Sleep for a specified interval 0

msiSortDataObj Sort the order in which resources will be accessed to retrieve a replicated digital object  0 

msiSplitPath  Split a path into a collection and file name 0

msiSplitPathByKey  Split a path based on a key (separate a file name from an extension)  0

msiStageDataObj  Stage a digital object to a specified resource 0

msiStoreVersionWithTS  Create a time‐stamped version of a digital object 20

msiStrArray2String  Convert an array of strings to a list of strings separated by "%"  0

msiStrCat  Concatenate a string to a target string 0

msiStrchop  Remove the last character of a string 0

msiString2KeyValPair  Convert a string to a key‐value pair in memory structure 0

msiString2StrArray  Convert a list of strings separated by "%" to an in‐memory array of strings  0

msiStripAVUs  Remove attribute‐value‐units from a digital object 28

msiStrlen  Get the length of a string 0

msiStrToBytesBuf  Load a string into an in‐memory buffer 0

msiStructFileBundle  Create a bundle of files in a collection for export as a tar file 13

msiSysChksumDataObj  Checksum a digital object 45

msiSysMetaModify  Modify system metadata attributes 43

msiSysReplDataObj  Admin ‐ replicate a digital object 18

msiTarFileCreate  Create a tar file 47

msiTarFileExtract  Extract files from a tar file 20

msiVacuum  Optimize indices in the metadata catalog 0

msiWriteRodsLog  Write a string into iRODS/server/log/rodsLog 0

msiXmlDocSchemaValidate  Validate an XML document schema for adding attribute‐value‐unit triplets  13

msiXmsgCreateStream  Create a message stream (messaging system) 0

msiXmsgServerConnect  Connect to a message stream (messaging system) 0

msiXmsgServerDisConnect  Disconnect from a message stream (messaging system) 0

msiXsltApply  Apply an XSLT transformation to an XML document 13

msiz3950Submit  Retrieve a record from a Z39.50 server 0

nop  Null operation 0

not like  Not like operator for query 0

not like regex  Not like operator for query using regular expression 0

readXMsg  Read a message stream (messaging system) 0

remote  Execute rule at a remote site 0

setelem  Set an element in a list 0

size  Return the number of elements in a list 0

split  Split a string 0

str  Convert a variable to a string 0

strlen  Return the length of a string 0

substr  Create a specified sub‐string 0

succeed  Cause a workflow to immediately succeed (workflow operator)  0

time  Get the current time 0

timestr  Convert a datetime variable to a string 0

timestrf  Convert a datetime variable to a string using a format 0

tl  Calculate the tail of a list 0

triml  Trim a prefix of a string 0

trimr  Trim a suffix of a string 0

while  While loop (workflow operator) 0

writeBytesBuf  Write a buffer to standard output or standard error 0

writeKeyValPairs Write key‐value pairs to standard output or standard error from an in‐memory structure  0 

writeLine  Write a line to standard output or standard error 0

writePosInt  Write a positive integer to standard output or standard error  0

writeString  Write a string to standard output or standard error 0

writeXMsg  Write a message packet (messaging system) 0

The sets of persistent state information are listed in Table C:2. Each persistent state information set identifies whether a persistent state attribute is:

1 – read
2 – modified
3 – both read and modified.
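The read/modify codes used in Tables C:2 through C:4 can be decoded mechanically. The following Python sketch is illustrative only; the names `ACCESS_CODES`, `describe_access`, and `is_modified` are hypothetical helpers and are not part of iRODS.

```python
# Codes used in the persistent state tables of this appendix.
ACCESS_CODES = {
    1: "read",
    2: "modified",
    3: "read and modified",
}


def describe_access(attribute: str, code: int) -> str:
    """Describe how a set of micro-services touches a persistent state attribute."""
    if code not in ACCESS_CODES:
        raise ValueError(f"unknown access code: {code}")
    return f"{attribute} is {ACCESS_CODES[code]}"


def is_modified(code: int) -> bool:
    """Return True when the attribute is written (codes 2 and 3)."""
    return code in (2, 3)


if __name__ == "__main__":
    print(describe_access("DATA_CHECKSUM", 3))
```

A check such as `is_modified(code)` is one way to pick out the attribute sets whose micro‐services change the metadata catalog, as opposed to those that only read it.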

Table C:2 Persistent state attributes modified by micro‐services for files & collections

Persistent State Variable Sets  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23

Number of micro‐services  1  1  1  1  1  1  1  1  1  1  1  10  1  3  1  1  2  1  11  1  1  1

COLL_ACCESS_COLL_ID  2  3  3  1 1 1 1 1 1          

COLL_ACCESS_TYPE  2  3  3  1 1 1 1 1 1          

COLL_ACCESS_USER_ID  2  3  3  1 1 1 1 1 1          

COLL_CREATE_TIME  2  3  1  1 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1

COLL_ID  3  3  1  1 1 1 1 3 1 1 1 1 1 1 1 1 1  1  1  1  1  1

COLL_INHERITANCE    1      1 1          

COLL_MODIFY_TIME  2  3  1  1 2 1 1 1 1 1 1 1 1 3 1  1  1  1  1  1

COLL_NAME  3  3  1  1 1 1 3 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1

COLL_OWNER_NAME  2  3  1  1 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1

COLL_OWNER_ZONE  2  3  1  1 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1

COLL_PARENT_NAME  2  2      3 1          

DATA_ACCESS_DATA_ID      3  1 1 1 1 1 1 1 1 1 1 1 1 1  1       

DATA_ACCESS_TYPE      3  1 1 1 1 1 1 1 1 1 1 1 1 1  1       

DATA_ACCESS_USER_ID      3  1 1 1 1 1 1 1 1 1 1 1 1 1  1       

DATA_CHECKSUM        1 3 2 1 1 1 3 3 3 1 3  1  1  1  2 

DATA_COLL_ID    1  1  1 1 1 1 2 1 1 1 1 1 1 3 3  1  1  1   

DATA_COMMENTS        1 1 1 1 1 1 1 1 3    1     

DATA_CREATE_TIME        1 3 2 1 1 1 1 1 1 1 3  1  1  1   

DATA_EXPIRY        1 1 1 1 1 1 1 1 3    1     

DATA_ID    1  1  1 1 1 1 3 2 1 1 1 1 1 1 1 3  1  1  1   

DATA_MAP_ID        1 1 1 1 1 1 1 1 3    1     

DATA_MODIFY_TIME        1 3 2 1 1 1 1 1 3 3 3  1  1  1   

DATA_NAME    1  1  1 1 1 1 3 2 1 1 1 1 1 1 1 3  1  1  1   

DATA_OWNER_NAME        1 3 2 1 1 1 1 1 1 1 3  1  1  1   

DATA_OWNER_ZONE        1 3 2 1 1 1 1 1 1 1 3  1  1  1   

DATA_PATH        1 3 2 1 1 1 1 1 1 1 3    1    2 

DATA_REPL_NUM        1 3 2 1 1 1 1 1 1 1 3    1     

DATA_RESC_GROUP_NAME        1 2 2 1 1 1 1 1 1 1 3    1    2 

DATA_RESC_NAME        1 3 2 1 1 1 1 1 1 1 3    1    2 

DATA_SIZE        1 3 2 1 1 1 1 1 1 1 3  1  1  1    2

DATA_STATUS        1 1 1 1 1 1 1 1 3    1     

DATA_TYPE_NAME        1 2 2 1 1 1 1 1 1 1 3    1     

DATA_VERSION        1 2 2 1 1 1 1 1 1 1 3    1     

META_COLL_ATTR_ID        2 3          

META_COLL_ATTR_NAME        2 3          

META_COLL_ATTR_UNITS        2 3          

META_COLL_ATTR_VALUE        2 3          

META_COLL_CREATE_TIME        2 3          

META_COLL_MODIFY_TIME        2 3          

META_DATA_ATTR_ID        2 3 3          

META_DATA_ATTR_NAME        2 3 1 1          

META_DATA_ATTR_UNITS        2 3 1 1          

META_DATA_ATTR_VALUE        2 3 1 1          

META_DATA_CREATE_TIME        2 3 2          

META_DATA_MODIFY_TIME        2 3 2          

TOKEN_ID  1  1  1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1       

TOKEN_NAME  1  1  1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1       

TOKEN_NAMESPACE  1  1  1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1       

USER_GROUP_ID    1  1  1 1 1 1 1 1 1 1 1 1 1 1         

USER_ID  1  1  1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1       

USER_NAME  1  1  1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1       

USER_TYPE    1  1  1 1 1 1 1 1 1 1 1 1 1 1         

USER_ZONE  1    1  1 1 1 1 1 1 1 1 1 1 1 1 1         

ZONE_NAME      1             

ZONE_TYPE      1             

Table C:3 Additional persistent state attribute sets for operations on files and collections.

Persistent State Variable Sets  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47

Number of micro‐services  1  1  1  3  3 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1  1  1  2 

COLL_CREATE_TIME  1                 

COLL_ID  1  1  1  1  1 1 1 1 1 1 1 1 1 1 1        

COLL_MODIFY_TIME  1          2 2        

COLL_NAME  1  1  1  1  1 1 1 1 1 1 1 1 1 1 1 2        

COLL_OWNER_NAME  1                 

COLL_OWNER_ZONE  1          2        

COLL_PARENT_NAME            1 2        

DATA_ACCESS_DATA_ID    1  1  1  1 1 1 1 1 1 1  1     

DATA_ACCESS_TYPE    1  1  1  1 1 1 1 1 1 1       

DATA_ACCESS_USER_ID    1  1  1  1 1 1 1 1 1 1       

DATA_CHECKSUM    1  3        2     

DATA_COLL_ID    1  1  1  1 1 1 1 1 1 1 1 1        

DATA_COMMENTS    1  1      2 2        

DATA_CREATE_TIME    1  1             

DATA_EXPIRY    1  1      2        

DATA_ID    1  1  1  1 1 1 1 1 1 1 1 1 1 1 1  1     

DATA_MAP_ID    1  1             

DATA_MODIFY_TIME    1  1      2 2       

DATA_NAME    1  1  1  1 1 1 1 1 1        

DATA_OWNER_NAME    1  1          1   

DATA_OWNER_ZONE    1  1      2     1   

DATA_PATH    1  1      1        

DATA_REPL_NUM    1  1      1 1 1 1  1     

DATA_RESC_GROUP_NAME    1  1          1   

DATA_RESC_NAME    1  1             

DATA_SIZE    1  1      1     1  2 

DATA_STATUS    1  1             

DATA_TYPE_NAME    1  1      1 2 2 2        

DATA_VERSION    1  1             

META_COLL_ATTR_ID            1 1        

META_COLL_ATTR_NAME            1 1        

META_COLL_ATTR_UNITS            1 1        

META_COLL_ATTR_VALUE            1 1        

META_DATA_ATTR_ID        2    1 1        

META_DATA_ATTR_NAME        2    1 1        

META_DATA_ATTR_UNITS        2    1 1        

META_DATA_ATTR_VALUE        2    1 1        

META_DATA_CREATE_TIME        2           

META_DATA_MODIFY_TIME        2           

QUOTA_LIMIT                1   

QUOTA_MODIFY_TIME                2   

QUOTA_OVER                2   

QUOTA_RESC_ID                3   

QUOTA_USAGE                3   

QUOTA_USAGE_RESC_ID                1   

QUOTA_USAGE_USER_ID                1   

QUOTA_USER_ID                3   

RESC_ID                1   

RESC_MODIFY_TIME            2        

RESC_NAME      1          1   

RESC_ZONE_NAME            2        

RESC_VAULT_PATH      1             

RULE_MODIFY_TIME            2        

RULE_OWNER_ZONE            2        

TOKEN_ID    1  1  1  1 1 1 1 1 1       

TOKEN_NAME    1  1  1  1 1 1 1 1 1       

TOKEN_NAMESPACE    1  1  1  1 1 1 1 1 1       

USER_GROUP_ID    1  1  1  1 1 1 1 1 1  1  1   

USER_ID    1  1  1  1 1 1 1 1 1 1  1  1   

USER_MODIFY_TIME            2        

USER_NAME    1  1  1  1 1 1 1 1 1 1  1  1   

USER_TYPE    1  1  1  1 1 1 1 1 1    1   

USER_ZONE    1  1  1  1 1 2 1 1 1 1    1   

ZONE_ID            1        

ZONE_MODIFY_TIME            2        

ZONE_NAME            3        

Table C:4 Persistent state attributes modified by micro‐services for audit trails, rules, and users

Persistent State Variable Sets  1  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67

Number of micro‐services  5  1 1 1 1 1 1 1 1 3 1 1 2 1 1 1 1 1  1  1  1 

AUDIT_ACTION_ID  1             

AUDIT_COMMENT  1             

AUDIT_CREATE_TIME  1             

AUDIT_MODIFY_TIME  1             

AUDIT_OBJ_ID  1             

AUDIT_USER_ID  1             

DVM_BASE_MAP_BASE_NAME     3 1            

DVM_BASE_MAP_CREATE_TIME     2            

DVM_BASE_MAP_MODIFY_TIME     2            

DVM_BASE_MAP_OWNER_NAME     2            

DVM_BASE_MAP_OWNER_ZONE     2            

DVM_BASE_MAP_VERSION     3 1            

DVM_BASE_NAME     3            

DVM_CONDITION     3 1            

DVM_CREATE_TIME     2            

DVM_EXT_VAR_NAME     3 1            

DVM_ID     3 1            

DVM_INT_MAP_PATH     3 1            

DVM_MODIFY_TIME     2            

DVM_OWNER_NAME     2            

DVM_OWNER_ZONE     2            

DVM_VERSION     2            

FNM_BASE_MAP_BASE_NAME     1 2            

FNM_BASE_MAP_CREATE_TIME     2            

FNM_BASE_MAP_MODIFY_TIME     2            

FNM_BASE_MAP_OWNER_NAME     2            

FNM_BASE_MAP_OWNER_ZONE     2            

FNM_BASE_MAP_VERSION     1 2            

FNM_BASE_NAME     3            

FNM_CREATE_TIME     2            

FNM_EXT_FUNC_NAME     1 3            

FNM_ID     1 3            

FNM_INT_FUNC_NAME     1 3            

FNM_MODIFY_TIME     2            

FNM_OWNER_NAME     2            

FNM_OWNER_ZONE     2            

META_COLL_ATTR_ID     2            

META_DATA_ATTR_ID     2            

MSRVC_MODULE_NAME     1 2            

MSRVC_NAME     1 2            

MSRVC_SIGNATURE     1 2            

MSRVC_VERSION     1 2            

MSRVC_HOST     1 2

MSRVC_ID     1 2

MSRVC_LANGUAGE     1 2

MSRVC_LOCATION     1 2

MSRVC_STATUS     1 2

MSRVC_TYPE_NAME     1 2

QUOTA_LIMIT     3            

QUOTA_MODIFY_TIME     2            

QUOTA_OVER     2            

QUOTA_RESC_ID     3            

QUOTA_USAGE     1            

QUOTA_USAGE_RESC_ID     1            

QUOTA_USAGE_USER_ID     1            

QUOTA_USER_ID     3            

RESC_GROUP_RESC_ID     1            

RESC_GROUP_NAME     1            

RESC_ID     1 1            

RESC_NAME     1 1 1            

RESC_ZONE_NAME     1            

RESC_VAULT_PATH     1            

RULE_BASE_MAP_BASE_NAME     3 1            

RULE_BASE_MAP_CREATE_TIME     2            

RULE_BASE_MAP_MODIFY_TIME     2            

RULE_BASE_MAP_OWNER_NAME     2            

RULE_BASE_MAP_OWNER_ZONE     2            

RULE_BASE_MAP_PRIORITY     2 1            

RULE_BASE_MAP_VERSION     3 1            

RULE_BASE_NAME     1            

RULE_BODY     1 1            

RULE_CONDITION     1 1            

RULE_EVENT     1 1            

RULE_EXEC_ADDRESS     2            

RULE_EXEC_ESTIMATED_EXE_TIME     2            

RULE_EXEC_FREQUENCY     2            

RULE_EXEC_ID     2            

RULE_EXEC_NAME     2            

RULE_EXEC_NOTIFICATION_ADDR     2            

RULE_EXEC_PRIORITY     2            

RULE_EXEC_REI_FILE_PATH     2            

RULE_EXEC_TIME     2            

RULE_EXEC_USER_NAME     2            

RULE_ID     3 1 1            

RULE_NAME     1 1            

RULE_RECOVERY     1 1            

SLD_RESC_NAME     1            

SLD_CREATE_TIME     1            

TOKEN_ID     1            

TOKEN_NAME     1 1            

TOKEN_NAMESPACE     1            

TOKEN_VALUE2     1            

USER_COMMENT     1          

USER_CREATE_TIME     2 1          

USER_GROUP_ID     1 2    1  2    

USER_ID     1 2 1  1  1  1 

USER_INFO     1          

USER_MODIFY_TIME     2 1          

USER_NAME     1 2 1  1  1  1 

USER_TYPE     1 2 1     1    

USER_ZONE     1 2 1     1  1 

ZONE_NAME     1 1            

ZONE_TYPE     1 1            

Appendix D:  Persistent State Variables

The persistent state variables that can be queried are listed below. Note that many of the attributes are maintained and set by the iRODS servers, independently of the micro‐services and the policy‐enforcement points.
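Queries over these variables take the form shown under DATA_ACCESS_NAME in Table D:1, which passes `SELECT TOKEN_NAME WHERE TOKEN_NAMESPACE = 'access_type'` to the `iquest` command. The Python sketch below only assembles such a query string. `build_query` is a hypothetical helper, it does not contact an iRODS server, and its handling of quoting is a simplifying assumption rather than a statement of the query syntax rules.

```python
def build_query(select, where=None):
    """Assemble a SELECT ... WHERE ... string over persistent state variables.

    select: list of attribute names, e.g. ["COLL_NAME", "DATA_NAME"].
    where:  optional dict mapping an attribute name to an exact-match value.
    """
    if not select:
        raise ValueError("at least one attribute must be selected")
    query = "SELECT " + ", ".join(select)
    if where:
        conditions = []
        for name, value in where.items():
            if "'" in str(value):
                # Assumption: reject embedded quotes rather than guess at escaping rules.
                raise ValueError("single quotes are not supported in this sketch")
            conditions.append(f"{name} = '{value}'")
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    # Reproduces the query string given under DATA_ACCESS_NAME.
    print(build_query(["TOKEN_NAME"], {"TOKEN_NAMESPACE": "access_type"}))
```

The resulting string would then be handed to `iquest` as a single quoted argument. Keeping the attribute names in a list makes it easy to check them against the table before a query is issued.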

Table D:1 Persistent State Variables

Persistent State Attribute  Explanation
AUDIT_ACTION_ID  Internal identifier for type of action that is audited
AUDIT_COMMENT  Comment on audit action for this instance
AUDIT_CREATE_TIME  Creation timestamp for audit action
AUDIT_MODIFY_TIME  Modification timestamp for audit action
AUDIT_OBJ_ID  Internal identifier of the object (data, collection, user, etc.) on which the audit action was performed
AUDIT_USER_ID  Internal identity of user whose action was audited
COLL_ACCESS_COLL_ID  Aliased collection identifier used for access control
COLL_ACCESS_NAME  Access string for collection (cf. DATA_ACCESS_NAME)
COLL_ACCESS_TYPE  Internal identifier for access name
COLL_ACCESS_USER_ID  Internal identifier of the user whose action is audited.
COLL_COMMENTS  Comments about the collection
COLL_CREATE_TIME  Collection creation timestamp
COLL_FILEMETA_CREATE_TIME  When a Unix directory is imported into iRODS from client‐side, the directory metadata in the file system is captured in the iCAT under COLL_FILEMETA. This is useful when getting the directory back into the client as the “original” metadata can be re‐created. The COLL_FILEMETA_CREATE_TIME variable holds the value when the directory metadata was inserted into iCAT
COLL_FILEMETA_CTIME  Original Unix directory create time at the client‐side.
COLL_FILEMETA_GID  Original Unix Group‐id for the directory (used for ACLs) at the client‐side.
COLL_FILEMETA_GROUP  Original Unix Group name for the directory (used for ACLs) at the client‐side.
COLL_FILEMETA_MODE  Original Unix ACL for the directory at the client‐side.
COLL_FILEMETA_MODIFY_TIME  Value when the directory metadata was modified in iCAT
COLL_FILEMETA_MTIME  Original Unix timestamp for last modification at the client‐side
COLL_FILEMETA_OBJ_ID  Original Unix object_id for the directory at the client‐side.
COLL_FILEMETA_OWNER  Original Unix owner for the directory at the client‐side.
COLL_FILEMETA_SOURCE_PATH  Original Unix path for the directory at the client‐side.
COLL_FILEMETA_UID  Original Unix user‐id of owner for the directory at the client‐side.
COLL_ID  Collection internal identifier
COLL_INHERITANCE  Attributes inherited by sub‐collections from parent‐collection: ACL, metadata, pins, locks
COLL_MAP_ID  Internal identifier denoting the type of collection.
COLL_MODIFY_TIME  Last modification timestamp for collection
COLL_NAME  Logical collection name
COLL_OWNER_NAME  Collection owner
COLL_OWNER_ZONE  Home zone of the collection owner
COLL_PARENT_NAME  Parent collection name
COLL_TOKEN_NAMESPACE  See TOKEN_NAMESPACE (also DATA_TOKEN_NAMESPACE), not used
DATA_ACCESS_DATA_ID  Internal identifier of the digital object for which access is defined
DATA_ACCESS_NAME  Access string in iCAT used for data, collections, etc. (e.g. read object) iquest "SELECT TOKEN_NAME WHERE TOKEN_NAMESPACE = 'access_type'"
DATA_ACCESS_TYPE  Internal ICAT identifier
DATA_ACCESS_USER_ID  User or group (name) for which the access is defined on digital object
DATA_CHECKSUM  Checksum stored as tagged list: <BINHEX>12344</BINHEX><MD5>22234422</MD5>
DATA_COLL_ID  Collection internal identifier
DATA_COMMENTS  Comments about the digital object
DATA_CREATE_TIME  Creation timestamp for the digital object
DATA_EXPIRY  Expiration date for the digital object

DATA_FILEMETA_CREATE_TIME  When a Unix file is imported into iRODS from client‐side, the file metadata in the file system is captured in the iCAT under DATA_FILEMETA. This is useful when getting the file back into the client as the “original” metadata can be re‐created. The DATA_FILEMETA_CREATE_TIME variable holds the value when the file metadata was inserted into iCAT
DATA_FILEMETA_CTIME  Original Unix file create time at the client‐side.
DATA_FILEMETA_GID  Original Unix Group‐id for the file (used for ACLs) at the client‐side.
DATA_FILEMETA_GROUP  Original Unix Group name for the file (used for ACLs) at the client‐side.
DATA_FILEMETA_MODE  Original Unix ACL for the file at the client‐side.
DATA_FILEMETA_MODIFY_TIME  Value when the file metadata was modified in iCAT
DATA_FILEMETA_MTIME  Original Unix timestamp for last modification at the client‐side
DATA_FILEMETA_OBJ_ID  Original Unix object_id for the file at the client‐side.
DATA_FILEMETA_OWNER  Original Unix owner for the file at the client‐side.
DATA_FILEMETA_SOURCE_PATH  Original Unix path for the file at the client‐side.
DATA_FILEMETA_UID  Original Unix user‐id of owner for the file at the client‐side.
DATA_ID  Unique Data internal identifier. A digital object is identified by (zone, collection, data name, replica, version). The identifier is the same across replicas and versions.
DATA_MAP_ID  Internal identifier denoting the type of data
DATA_MODIFY_TIME  Last modification timestamp for the digital object
DATA_NAME  Logical name of the digital object
DATA_OWNER_NAME  User who created the object
DATA_OWNER_ZONE  Home zone of the user who created the object
DATA_PATH  Physical path name for digital object in resource
DATA_REPL_NUM  Replica number starting with “1”
DATA_REPL_STATUS  Replica status: locked, is‐deleted, pinned, hide
DATA_RESC_GROUP_NAME  Name of resource group in which data is stored
DATA_RESC_NAME  Logical name of storage resource
DATA_SIZE  Size of the digital object in bytes
DATA_STATUS  Digital object status: locked, is‐deleted, pinned, hide
DATA_TOKEN_NAMESPACE  Namespace of the data token: e.g. data type, not used
DATA_TYPE_NAME  Type of data: jpeg image, PDF document
DATA_VERSION  Version string assigned to the digital object. Older versions of replicas have a negative replica number
DVM_BASE_MAP_BASE_NAME  Name for the Data Base of Data Variable Set of Maps (e.g. “core” in core.dvm)
DVM_BASE_MAP_COMMENT  Comments for DVM_BASE_MAP
DVM_BASE_MAP_CREATE_TIME  Creation time for DVM_BASE_MAP
DVM_BASE_MAP_MODIFY_TIME  Last Modification time for DVM_BASE_MAP
DVM_BASE_MAP_OWNER_NAME  Owner’s name of the DVM_BASE_MAP
DVM_BASE_MAP_OWNER_ZONE  Owner’s zone name of the DVM_BASE_MAP
DVM_BASE_MAP_VERSION  Version of the DVM_BASE_MAP (empty or 0 means current)
DVM_BASE_NAME  Foreign key reference to DVM_BASE_MAP_BASE_NAME
DVM_COMMENT  Comment for the DVM
DVM_CONDITION  Condition for applying the DVM Mapping corresponding to DVM_EXT_VAR_NAME
DVM_CREATE_TIME  Creation time of the DVM Mapping
DVM_EXT_VAR_NAME  External name for the Map (the actual $‐variable)
DVM_ID  An internal identifier for DVM Mapping
DVM_INT_MAP_PATH  Internal Structure path in REI corresponding to DVM_EXT_VAR_NAME
DVM_MODIFY_TIME  Last modification time for the DVM Mapping
DVM_OWNER_NAME  Owner’s name of the DVM Mapping
DVM_OWNER_ZONE  Owner’s zone name of the DVM Mapping
DVM_STATUS  Status of the DVM Mapping (empty is valid)
DVM_VERSION  Version for the DVM Mapping (empty or 0 means current)
FNM_BASE_MAP_BASE_NAME  Name for the Data Base of Function Name Set of Maps (e.g. “core” in core.fnm). This can be used for giving virtual names for micro‐services and rules and for versioning names for the same.
FNM_BASE_MAP_COMMENT  Comments for FNM_BASE_MAP
FNM_BASE_MAP_CREATE_TIME  Creation time for FNM_BASE_MAP
FNM_BASE_MAP_MODIFY_TIME  Last Modification time for FNM_BASE_MAP
FNM_BASE_MAP_OWNER_NAME  Owner’s name of the FNM_BASE_MAP
FNM_BASE_MAP_OWNER_ZONE  Owner’s zone name of the FNM_BASE_MAP
FNM_BASE_MAP_VERSION  Version of the FNM_BASE_MAP (empty or 0 means current)
FNM_BASE_NAME  Foreign key reference to FNM_BASE_MAP_BASE_NAME
FNM_COMMENT  Comment for the FNM Mapping
FNM_CREATE_TIME  Creation time of the FNM Mapping
FNM_EXT_FUNC_NAME  External name for the FNM Mapping

FNM_ID  An internal identifier for FNM Mapping
FNM_INT_FUNC_NAME  Internal Structure path in REI corresponding to FNM_EXT_FUNC_NAME
FNM_MODIFY_TIME  Last modification time for the FNM Mapping
FNM_OWNER_NAME  Owner’s name of the FNM Mapping
FNM_OWNER_ZONE  Owner’s zone name of the FNM Mapping
FNM_STATUS  Status of the FNM Mapping (empty is valid)
FNM_VERSION  Version for the FNM Mapping (empty or 0 means current)
META_ACCESS_META_ID  Internal identifier of the (AVU) metadata for which access is defined
META_ACCESS_NAME  See DATA_ACCESS_NAME
META_ACCESS_TYPE  Internal ICAT identifier
META_ACCESS_USER_ID  User or group (name) for which the access is defined on metadata
META_COLL_ATTR_ID  Internal identifier for metadata attribute for collection
META_COLL_ATTR_NAME  Metadata attribute name for collection
META_COLL_ATTR_UNITS  Metadata attribute units for collection
META_COLL_ATTR_VALUE  Metadata attribute value for collection
META_COLL_CREATE_TIME  Creation time for the metadata for collections
META_COLL_MODIFY_TIME  Last modification time for the metadata for collections
META_DATA_ATTR_ID  Internal identifier for metadata attribute for digital object
META_DATA_ATTR_NAME  Metadata attribute name for digital object
META_DATA_ATTR_UNITS  Metadata attribute units for digital object
META_DATA_ATTR_VALUE  Metadata attribute value for digital object
META_DATA_CREATE_TIME  Time stamp when metadata was created
META_DATA_MODIFY_TIME  Time stamp when metadata was modified
META_MET2_ATTR_ID  Internal identifier for metadata attribute for metadata
META_MET2_ATTR_NAME  Metadata attribute name for metadata
META_MET2_ATTR_UNITS  Metadata attribute units for metadata
META_MET2_ATTR_VALUE  Metadata attribute value for metadata
META_MET2_CREATE_TIME  Creation time for the metadata for metadata
META_MET2_MODIFY_TIME  Last modification time for the metadata for metadata
META_MSRVC_ATTR_ID  Internal identifier for metadata attribute for micro‐service
META_MSRVC_ATTR_NAME  Metadata attribute name for micro‐service
META_MSRVC_ATTR_UNITS  Metadata attribute units for micro‐service
META_MSRVC_ATTR_VALUE  Metadata attribute value for micro‐service
META_MSRVC_CREATE_TIME  Creation time for the metadata for micro‐service
META_MSRVC_MODIFY_TIME  Last modification time for the metadata for micro‐service
META_NAMESPACE_COLL  Namespace of collection AVU‐triplet attribute
META_NAMESPACE_DATA  Namespace of digital object AVU‐triplet attribute
META_NAMESPACE_MET2  Namespace of metadata AVU‐triplet attribute
META_NAMESPACE_MSRVC  Namespace of micro‐service AVU‐triplet attribute
META_NAMESPACE_RESC  Namespace of resource AVU‐triplet attribute
META_NAMESPACE_RESC_GROUP  Namespace of resource‐group AVU‐triplet attribute
META_NAMESPACE_RULE  Namespace of rule AVU‐triplet attribute
META_NAMESPACE_USER  Namespace of user AVU‐triplet attribute
META_RESC_ATTR_ID  Internal identifier for metadata attribute for resource
META_RESC_ATTR_NAME  Metadata attribute name for resource
META_RESC_ATTR_UNITS  Metadata attribute units for resource
META_RESC_ATTR_VALUE  Metadata attribute value for resource
META_RESC_CREATE_TIME  Creation time for the metadata for resource
META_RESC_MODIFY_TIME  Last modification time for the metadata for resource
META_RESC_GROUP_ATTR_ID  Internal identifier for metadata attribute for resource group
META_RESC_GROUP_ATTR_NAME  Metadata attribute name for resource group
META_RESC_GROUP_ATTR_UNITS  Metadata attribute units for resource group
META_RESC_GROUP_ATTR_VALUE  Metadata attribute value for resource group
META_RESC_GROUP_CREATE_TIME  Creation time for the metadata for resource group
META_RESC_GROUP_MODIFY_TIME  Last modification time for the metadata for resource group
META_RULE_ATTR_ID  Internal identifier for metadata attribute for a rule
META_RULE_ATTR_NAME  Metadata attribute name for a rule
META_RULE_ATTR_UNITS  Metadata attribute units for a rule
META_RULE_ATTR_VALUE  Metadata attribute value for a rule
META_RULE_CREATE_TIME  Creation time for the metadata entry for a rule
META_RULE_MODIFY_TIME  Last modification time for the metadata for a rule
META_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
META_USER_ATTR_ID  Internal identifier for metadata attribute for user
META_USER_ATTR_NAME  Metadata attribute name for user

META_USER_ATTR_UNITS  Metadata attribute units for user
META_USER_ATTR_VALUE  Metadata attribute value for user
META_USER_CREATE_TIME  Creation time for the metadata for user
META_USER_MODIFY_TIME  Last modification time for the metadata for user
MSRVC_ACCESS_MSRVC_ID  Internal identifier of the micro‐service for which access is defined
MSRVC_ACCESS_NAME  See DATA_ACCESS_NAME
MSRVC_ACCESS_TYPE  Internal ICAT identifier
MSRVC_ACCESS_USER_ID  User or group (name) for which the access is defined on the micro‐service
MSRVC_COMMENT  Comments for micro‐service
MSRVC_CREATE_TIME  Creation time for the micro‐service
MSRVC_DOXYGEN  Doxygen documentation for the micro‐service
MSRVC_HOST  Host types at which the micro‐service can be executed
MSRVC_ID  Internal Id for the micro‐service
MSRVC_LANGUAGE  Language in which the micro‐service is written
MSRVC_LOCATION  The Location of the micro‐service executable
MSRVC_MODIFY_TIME  Last Modification time for the micro‐service
MSRVC_MODULE_NAME  Module name for the micro‐service
MSRVC_NAME  Name of the micro‐service
MSRVC_OWNER_NAME  Owner name of the micro‐service
MSRVC_OWNER_ZONE  Owner’s zone name of the micro‐service
MSRVC_SIGNATURE  Digital signature (checksum) for the micro‐service
MSRVC_STATUS  Status of the micro‐service
MSRVC_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
MSRVC_TYPE_NAME  Type of the micro‐service
MSRVC_VARIATIONS  Variations (or forms) of the micro‐service
MSRVC_VER_COMMENT  Comments on the micro‐service
MSRVC_VER_CREATE_TIME  Creation time of version of the micro‐service
MSRVC_VER_MODIFY_TIME  Last modification time of version of the micro‐service
MSRVC_VER_OWNER_NAME  Owner name of the version of the micro‐service
MSRVC_VER_OWNER_ZONE  Owner zone name of the version of the micro‐service
MSRVC_VERSION  Version of the micro‐service
QUOTA_LIMIT  High limit for quota for resource in QUOTA_RESC_ID for QUOTA_USER_ID
QUOTA_MODIFY_TIME  Last modification time of quota
QUOTA_OVER  Flag if quota is exceeded
QUOTA_RESC_ID  Internal Resource ID for quota
QUOTA_RESC_NAME  Resource Name for quota
QUOTA_USAGE  Name of Usage for quota (normally write)
QUOTA_USAGE_MODIFY_TIME  Last modification time of quota usage
QUOTA_USAGE_RESC_ID  Internal Resource ID for quota usage
QUOTA_USAGE_USER_ID  Internal User ID for quota usage
QUOTA_USER_ID  Internal User ID for quota
QUOTA_USER_NAME  User Name for Quota
QUOTA_USER_TYPE  User type name for quota
QUOTA_USER_ZONE  User zone name for quota
RESC_ACCESS_NAME  See DATA_ACCESS_NAME
RESC_ACCESS_RESC_ID  Internal identifier of the resource for which access is defined
RESC_ACCESS_TYPE  Internal ICAT identifier
RESC_ACCESS_USER_ID  User or group (name) for which the access is defined on resource
RESC_CLASS_NAME  Resource class: primary, secondary, archival
RESC_COMMENT  Comment about resource
RESC_CREATE_TIME  Creation timestamp of resource
RESC_FREE_SPACE  Free space available on resource
RESC_FREE_SPACE_TIME  Time at which free space was computed
RESC_GROUP_ID  Internal Id for resource group
RESC_GROUP_NAME  Logical name of the resource group
RESC_GROUP_RESC_ID  Internal identifier for the resource group
RESC_ID  Internal resource identifier for resource in the group
RESC_INFO  Tagged information list: <MAX_OBJ_SIZE>2GB</MAX_OBJ_SIZE><MIN_LATENCY>1msec</MIN_LATENCY>
RESC_LOC  Resource IP address
RESC_MODIFY_TIME  Last modification timestamp for resource
RESC_NAME  Logical name of the resource
RESC_STATUS  Operational status of resource
RESC_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
RESC_TYPE_NAME  Resource type: HPSS, SamFS, database, orb

RESC_VAULT_PATH  Resource path for storing files
RESC_ZONE_NAME  Name of the iCAT, unique globally
RULE_ACCESS_NAME  See DATA_ACCESS_NAME
RULE_ACCESS_RULE_ID  Internal identifier of the iRODS rule for which access is defined
RULE_ACCESS_TYPE  Internal ICAT identifier
RULE_ACCESS_USER_ID  User or group (name) for which the access is defined on iRODS rule
RULE_BASE_MAP_BASE_NAME  Name for the Data Base of Rule Set of Maps (e.g. “core” in core.re).
RULE_BASE_MAP_COMMENT  Comments for RULE_BASE_MAP
RULE_BASE_MAP_CREATE_TIME  Creation time for RULE_BASE_MAP
RULE_BASE_MAP_MODIFY_TIME  Last Modification time for RULE_BASE_MAP
RULE_BASE_MAP_OWNER_NAME  Owner’s name of the RULE_BASE_MAP
RULE_BASE_MAP_OWNER_ZONE  Owner’s zone name of the RULE_BASE_MAP
RULE_BASE_MAP_PRIORITY  Prioritization of the RULE_BASE_MAP (empty or 0 means current). This tells which map has priority over other maps. This can define a tree/forest.
RULE_BASE_MAP_VERSION  Version of the RULE_BASE_MAP (empty or 0 means current)
RULE_BASE_NAME  Rule base to which the rule is a member
RULE_BODY  Body of the rule
RULE_COMMENT  Comments on the rule
RULE_CONDITION  Condition of the rule
RULE_CREATE_TIME  Creation time of the rule
RULE_DESCR_1  Description of rule (1)
RULE_DESCR_2  Description of rule (2)
RULE_DOLLAR_VARS  Session variables used in the rule
RULE_EVENT  Event name of the rule (can be viewed as rule name)
RULE_EXEC_ADDRESS  Host name where the delayed Rule will be executed
RULE_EXEC_ESTIMATED_EXE_TIME  Estimated execution time for the delayed Rule
RULE_EXEC_FREQUENCY  Delayed Rule execution frequency
RULE_EXEC_ID  Internal identifier for a delayed Rule execution request
RULE_EXEC_LAST_EXE_TIME  Previous execution time for the delayed Rule
RULE_EXEC_NAME  Logical name for a delayed Rule execution request
RULE_EXEC_NOTIFICATION_ADDR  Notification address for delayed Rule completion
RULE_EXEC_PRIORITY  Delayed Rule execution priority
RULE_EXEC_REI_FILE_PATH  Path of the file where the context (REI) of the delayed Rule is stored
RULE_EXEC_STATUS  Current status of the delayed Rule
RULE_EXEC_TIME  Time when the delayed Rule will be executed
RULE_EXEC_USER_NAME  User requesting a delayed Rule execution
RULE_ICAT_ELEMENTS  Permanent (#‐variables) affected by the rule
RULE_ID  Internal identifier for the rule
RULE_INPUT_PARAMS  Parameters used as input when invoking the rule
RULE_MODIFY_TIME  Last modification time of the rule
RULE_NAME  Name of the rule (can be different from RULE_EVENT)
RULE_OUTPUT_PARAMS  Output parameters set by the rule invocation
RULE_OWNER_NAME  Owner name of the rule
RULE_OWNER_ZONE  Owner’s zone name of the rule
RULE_RECOVERY  Recovery part of the rule
RULE_SIDEEFFECTS  Side effects (%‐variables) – used as a semantic of what the rule does
RULE_STATUS  Status of the rule (valid/active or otherwise)
RULE_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
RULE_VERSION  Version of the rule
SL_CPU_USED  Server load information: cpu used. Server load information is computed periodically for all servers in the grid, if enabled by the administrator.
SL_CREATE_TIME  Server load information: creation time of the entry
SL_DISK_SPACE  Server load information: disk space used
SL_HOST_NAME  Server load information: host name of the server
SL_MEM_USED  Server load information: memory used
SL_NET_INPUT  Server load information: network input load
SL_NET_OUTPUT  Server load information: network output load
SL_RESC_NAME  Server load information: resource for which disk space is provided
SL_RUNQ_LOAD  Server load information: run queue load
SL_SWAP_USED  Server load information: swap space used
SLD_CREATE_TIME  Server load digest information: digest creation time
SLD_LOAD_FACTOR  Server load information: load factor computed from server load information
SLD_RESC_NAME  Server load information: resource name for which the load factor is computed
TICKET_ALLOWED_GROUP_NAME  User group to which the ticket (TICKET_ALLOWED_GROUP_TICKET_ID) is valid
TICKET_ALLOWED_GROUP_TICKET_ID  Identifier for the ticket

TICKET_ALLOWED_HOST  Host for which the ticket (TICKET_ALLOWED_HOST_TICKET_ID) is valid. Allows invocation of the ticket-based access only from this host. Useful for scheduled jobs.
TICKET_ALLOWED_HOST_TICKET_ID  Identifier for the ticket
TICKET_ALLOWED_USER_NAME  User for which the ticket (TICKET_ALLOWED_USER_TICKET_ID) is valid
TICKET_ALLOWED_USER_TICKET_ID  Identifier for the ticket
TICKET_COLL_NAME  Collection name on which the ticket is issued
TICKET_CREATE_TIME  Ticket creation time
TICKET_DATA_COLL_NAME  Collection name of the object on which the ticket is issued
TICKET_DATA_NAME  Data name of the object on which the ticket is issued
TICKET_EXPIRY  Expiration date for a ticket
TICKET_ID  Identifier for the ticket
TICKET_MODIFY_TIME  Last modification time for the ticket
TICKET_OBJECT_ID  (Internal) Object Id for the object on which the ticket is issued
TICKET_OBJECT_TYPE  Ticket may be for data, resource, user, rule, metadata, zone, collection, token
TICKET_OWNER_NAME  Name of the person who created the ticket
TICKET_OWNER_ZONE  Home zone of the person who created the ticket
TICKET_STRING  Human readable name for the ticket
TICKET_TYPE  Type of ticket, either “read” or “write”
TICKET_USER_ID  Identifier of the person who is using the ticket
TICKET_USES_COUNT  Number of times a ticket has been used
TICKET_USES_LIMIT  Maximum number of times a ticket may be used
TICKET_WRITE_BYTE_COUNT  Number of bytes written for accesses through a given ticket
TICKET_WRITE_BYTE_LIMIT  Maximum number of bytes that may be written using a given ticket
TICKET_WRITE_FILE_COUNT  Number of files written for accesses through a given ticket
TICKET_WRITE_FILE_LIMIT  Maximum number of files that can be written using a given ticket
TOKEN_COMMENT  Comment on token
TOKEN_ID  Internal identifier for token name
TOKEN_NAME  A value in the token namespace; e.g. “jpg image”
TOKEN_NAMESPACE  Namespace for tokens; e.g. data type, resource_type, rule_type, …
TOKEN_VALUE  Additional token information string (e.g. dot extensions for jpg: jpg, .jpg2, jg)
TOKEN_VALUE2  Additional token information string
TOKEN_VALUE3  Additional token information string
USER_COMMENT  Comment about the user
USER_CREATE_TIME  Creation timestamp
USER_DN  Distinguished name in tagged list: <authType>distinguishedName</authType>
USER_GROUP_ID  Internal identifier for the user group
USER_GROUP_NAME  Logical name for the user group
USER_ID  User internal identifier
USER_INFO  Tagged information: <EMAIL>[email protected]</EMAIL><PHONE>5555555555</PHONE>
USER_MODIFY_TIME  Last modification timestamp
USER_NAME  User name
USER_TYPE  User role (rodsgroup, rodsadmin, rodsuser, domainadmin, groupadmin, storageadmin, rodscurators)
USER_ZONE  Home Data Grid of user
ZONE_COMMENT  Comment about the zone
ZONE_CONNECTION  Connection information in tagged list; <PASSWORD>RPS1</PASSWORD><GSI>DISTNAME</GSI>
ZONE_CREATE_TIME  Date and timestamp for creation of a data grid
ZONE_ID  Data Grid or zone identifier
ZONE_MODIFY_TIME  Date and timestamp for modification of a data grid
ZONE_NAME  Data Grid or zone name, name of the iCAT
ZONE_TYPE  Type of zone: local/remote/other
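
The ticket attributes listed above describe limits on use count, bytes written, expiration, and permitted hosts. The following is a minimal sketch of how such limits might be combined into a single access decision. It is not the iRODS implementation: the `Ticket` class, the `ticket_allows` function, and the conventions that 0 means "no limit" and an empty host list means "any host" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical in-memory view of a few TICKET_* attributes."""
    ticket_type: str            # TICKET_TYPE: "read" or "write"
    uses_count: int             # TICKET_USES_COUNT
    uses_limit: int             # TICKET_USES_LIMIT (assumed: 0 means no limit)
    write_byte_count: int       # TICKET_WRITE_BYTE_COUNT
    write_byte_limit: int       # TICKET_WRITE_BYTE_LIMIT (assumed: 0 means no limit)
    expiry: Optional[datetime]  # TICKET_EXPIRY (assumed: None means no expiration)
    allowed_hosts: tuple = ()   # TICKET_ALLOWED_HOST values (assumed: empty means any host)


def ticket_allows(ticket, operation, host, now, bytes_to_write=0):
    """Return True if this illustrative policy would admit the request."""
    # A "read" ticket never authorizes a write.
    if operation == "write" and ticket.ticket_type != "write":
        return False
    # Reject expired tickets.
    if ticket.expiry is not None and now >= ticket.expiry:
        return False
    # Reject tickets that have reached their use limit.
    if ticket.uses_limit and ticket.uses_count >= ticket.uses_limit:
        return False
    # Restrict invocation to the listed hosts, if any are listed.
    if ticket.allowed_hosts and host not in ticket.allowed_hosts:
        return False
    # Enforce the cumulative byte limit on writes.
    if operation == "write" and ticket.write_byte_limit:
        if ticket.write_byte_count + bytes_to_write > ticket.write_byte_limit:
            return False
    return True
```

For example, a read ticket restricted to one host and expiring on a given date would be rejected for any write request, for any request from another host, and for any request after the expiration date.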

Appendix E:  Protected Data Requirements 

The data management requirements are abstracted from the document, https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf

Each requirement has been evaluated for the feasibility of creating a computer actionable policy that automates enforcement.
• Document the policies
• Protect the confidentiality, integrity, and availability of information from accidental or intentional unauthorized modification, destruction or disclosure
• Periodic risk assessment to document types of threats and vulnerabilities, and evaluate information assets and technology for data collection, storage, dissemination, and protection
• Protected assets include:
  o Payment card account numbers, cardholder name, expiration date, service code, and CID/PINs
  o Legally covered entities
  o Social Security Numbers and personal information
  o Protected Health Information – demographic, physical or mental health, provision of health care, health care payment that identifies the individual
• Protection tasks
  o Data available on demand by an authorized person
  o Data not accessible by unauthorized person or process
  o Encryption
  o Integrity
  o Identify involved persons
  o Identify involved computer systems
• Security Office
  o Monitor policy distribution to resources
  o Basic security support (accounts, access controls, OS upgrades)
  o Classification of computer resources
  o System design for security controls
  o Vulnerability detection, notification
  o Detection of unauthorized access (audit trails)
  o Training
  o Security audits
  o Reports
• Collection owners
  o Presence of HIPAA information
  o Data retention period
  o Application of policies and procedures for data protection
  o Authorizing access
  o Specifying controls, setting control policies
  o Reporting loss or misuse
  o Correcting problems
  o Training
  o Tracking approval processes for systems
• Data grid administrator – custodian
  o Provide physical safeguards – one-time passwords to access iCAT
  o Provide procedures for security
  o Control access to information
  o Release information through privacy procedures
  o Evaluate cost effectiveness of controls
  o Maintain policies and procedures
  o Promote education
  o Report loss or misuse
  o Respond to security incidents
• User management – projects
  o Review and approve requests for access
  o Update employees’ security records with position and job function changes
  o Update access on employee termination or transfer
  o Revoke physical access to terminated employees
  o Promote training
  o Report loss or misuse
  o Initiate corrective actions
  o Follow recommendations for purchase and implementation of systems
• User
  o Only access information for authorized job responsibilities
  o Comply with access controls
  o Report disclosures of PHI other than for treatment, payment, or health care
  o Keep personal authentication information confidential
  o Report loss or misuse
  o Initiate corrective actions
• Classify information
  o Protected health information
  o Confidential information – PCI, PI
  o Internal information – all information not PHI, Confidential, or Public
  o Public information
• Computer and information control
  o Ownership, licensing of software
  o Inventory of software and computers, users, managers
  o Virus protection, scan all files
  o Access controls
    ▪ authorization by supervisor
    ▪ context based – ticket
    ▪ authorization role based
    ▪ authorization user based
    ▪ authentication – unique user ID
    ▪ Controlled passwords
    ▪ Biometric
    ▪ Tokens in conjunction with a PIN
    ▪ Password security
    ▪ No re-use or multiple use
    ▪ Minimum length, expiration, encryption during transmission, storage
    ▪ Log unsuccessful attempts
    ▪ Procedures for validating users who request password reset
    ▪ Automatic timeout after period of inactivity
    ▪ Log-off
  o Data integrity
    ▪ Transaction audit
    ▪ Replication
    ▪ Checksums
    ▪ Encryption in storage
    ▪ Digital signatures
    ▪ Data validation on entry
  o Transmission security
    ▪ Integrity – checksums
    ▪ Encryption in messaging systems
  o Remote access
    ▪ Only approved methods and pathways
  o Physical access
    ▪ Access controlled areas, HVAC
    ▪ Authentication to data grid and access controls
    ▪ Authentication to workstation, automatic screen savers
  o Facility access controls
    ▪ Contingency for emergency operations after disaster
    ▪ Facility security plan – policies and procedures
    ▪ Documented procedures to validate access
    ▪ Documented maintenance of facility
  o Emergency access
    ▪ Procedures for authorization, implementation, revocation
• Equipment and media controls
  o Media disposal
  o Track custody of media
  o Data backup
• Other media controls
  o Encryption for storage on removable media
  o Encryption, power-on passwords, auto logoff for mobile devices
  o Ownership of media for assigning responsibility
• Data transfer/printing
  o Approval for bulk download
  o De-identification of data – BitCurator
  o Encrypt data transfers
• Social Media
  o No PHI, confidential, or proprietary information
  o No patient identification information
  o No patient photographs
• Audit controls
  o Record activity by users and system administrators
  o Review activity logs
  o Preserve reviews for 6 years
• Evaluation
  o Verify procedures after each operational or environmental change
• Contingency plan
  o Enable recovery of data
    ▪ Document data backup plan
    ▪ Backup data offsite
    ▪ Manage access controls on replicas
  o Disaster recovery plan – procedure for restoring data
  o Emergency operation plan – for natural disasters
  o Procedures for testing contingency plans on revision
  o Identify critical components
• Password controls
  o No sharing of passwords
  o Single sign-on system for passwords
  o No passwords on PC
  o No dictionary words
  o Encrypt passwords
  o Maximum of 5 invalid passwords causes lockout for 30 minutes
  o Contain 1 upper case, 1 lower case, 1 number
  o Minimum length of 10 characters
  o Passwords changed annually
  o Maintain history of prior 6 passwords, prevent re-use
• Peer-to-peer
  o P2P file-sharing programs are prohibited
  o Internet storage may not be used for PHI and confidential information
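
Several of the password controls listed above are directly computer actionable. The sketch below shows one way the length, character class, dictionary, history, and lockout rules might be checked. The function names, the message strings, and the assumption that the caller clears the list of failed attempts after a successful login are illustrative only. A production system would compare salted hashes rather than plain-text prior passwords, in keeping with the "Encrypt passwords" requirement.

```python
from datetime import datetime, timedelta

MIN_LENGTH = 10                  # minimum length of 10 characters
MAX_FAILED_ATTEMPTS = 5          # 5 invalid passwords cause a lockout
LOCKOUT = timedelta(minutes=30)  # lockout lasts 30 minutes
HISTORY_DEPTH = 6                # prior 6 passwords may not be re-used


def password_problems(candidate, previous_passwords=(), dictionary=frozenset()):
    """Return a list of rule violations; an empty list means the candidate passes."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append("shorter than 10 characters")
    if not any(c.isupper() for c in candidate):
        problems.append("no upper-case letter")
    if not any(c.islower() for c in candidate):
        problems.append("no lower-case letter")
    if not any(c.isdigit() for c in candidate):
        problems.append("no number")
    if candidate.lower() in dictionary:
        problems.append("dictionary word")
    # Plain-text comparison is for illustration only (see the note above).
    if candidate in previous_passwords[-HISTORY_DEPTH:]:
        problems.append("re-uses one of the prior 6 passwords")
    return problems


def is_locked_out(failed_attempt_times, now):
    """True while the account is inside the 30-minute lockout window.

    Assumes failed_attempt_times holds only failures since the last
    successful login.
    """
    if len(failed_attempt_times) < MAX_FAILED_ATTEMPTS:
        return False
    return now - max(failed_attempt_times) < LOCKOUT
```

Returning a list of violations rather than a single boolean lets a policy report every failed rule to the user in one pass.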

Appendix F: Mauna Loa Sensor Data DMP

Types of Data Produced
Air samples at Mauna Loa Observatory will be collected continuously from air intakes located at five towers – a central tower and four towers located at compass quadrants. Raw data files will contain continuously measured CO2 concentrations, calibration standards, reference standards, daily check standards, and blanks. The sample lines located at compass quadrants were used to examine the influence of source effects associated with wind directions [3, 4]. In addition to the CO2 data, we will record weather data (wind speed and direction, temperature, humidity, precipitation, and cloud cover). Site conditions at Mauna Loa Observatory will also be noted and retained.
The final data product will consist of 5-minute, 15-minute, hourly, daily, and monthly average atmospheric concentration of CO2, in mole fraction in water-vapor-free air measured at the Mauna Loa Observatory, Hawaii. Data are reported as a dry mole fraction defined as the number of molecules of CO2 divided by the number of molecules of dry air multiplied by one million (ppm). The final data product has been thoroughly documented in the open literature [2] and in Scripps Institution of Oceanography Internal Reports [1].
The data generated (raw CO2 measurements, meteorological data, calibration and reference standards) will be placed in comma-separated-values files in plain ASCII format, which are readable over long time periods. The final data file will contain dates for each observation (time, day, month and year) and the average CO2 concentration. The final data product distributed to most users will occupy less than 500 KB; raw and ancillary data, which will be distributed on request, comprise less than 10 MB.

Data and Metadata Standards
Metadata will be comprised of two formats – contextual information about the data in a text based document and ISO 19115 standard metadata in an xml file. These two formats for metadata were chosen to provide a full explanation of the data (text format) and to ensure compatibility with international standards (xml format). The standard XML file will be more complete; the document file will be a human-readable summary of the XML file.

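The dry mole fraction defined under Types of Data Produced, and the averaged products derived from it, can be illustrated with a short sketch. The function names are hypothetical, the sample values are invented and are not Mauna Loa measurements, and the use of a simple arithmetic mean per clock hour is an assumption; the plan does not state how the averages are computed.

```python
from collections import defaultdict
from datetime import datetime


def dry_mole_fraction_ppm(co2_molecules, dry_air_molecules):
    """Molecules of CO2 per molecule of dry air, multiplied by one million (ppm)."""
    return co2_molecules / dry_air_molecules * 1_000_000


def hourly_means(readings):
    """Average (timestamp, ppm) readings within each clock hour.

    A plain arithmetic mean is assumed here for illustration.
    """
    buckets = defaultdict(list)
    for timestamp, ppm in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(ppm)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


# Invented sample readings, for illustration only.
sample = [
    (datetime(2015, 8, 25, 10, 5), 400.0),
    (datetime(2015, 8, 25, 10, 35), 402.0),
    (datetime(2015, 8, 25, 11, 5), 401.0),
]
averages = hourly_means(sample)
```

The same bucketing approach extends to the 5-minute, 15-minute, daily, and monthly products by changing how each timestamp is truncated.
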
Policies for Access and Sharing
The final data product will be released to the public as soon as the recalibration of standard gases has been completed and the data have been prepared, typically within six months of collection. There is no period of exclusive use by the data collectors. Users can access documentation and final monthly CO2 data files via the Scripps CO2 Program website (http://scrippsco2.ucsd.edu). The data will be made available via ftp download from the Scripps Institution of Oceanography Computer Center. Raw data (continuous concentration measurements, weather data, etc.) will be maintained on an internally accessible server and made available on request at no charge to the user.

Policies for Re-use, Distribution
Access to databases and associated software tools generated under the project will be available for educational, research and non-profit purposes. Such access will be provided using web-based applications, as appropriate.

Materials generated under the project will be disseminated in accordance with University/Participating institutional and NSF policies. Depending on such policies, materials may be transferred to others under the terms of a material transfer agreement. Publication of data shall occur during the project, if appropriate, or at the end of the project, consistent with normal scientific practices. Research data which documents, supports and validates research findings will be made available after the main findings from the final research data set have been accepted for publication.

Plans for Archiving and Preservation
Short Term: The data product will be updated monthly reflecting updates to the record, revisions due to recalibration of standard gases, and identification and flagging of any errors. The date of the update will be included in the data file and will be part of the data file name. Versions of the data product that have been revised due to errors/updates (other than new data) will be retained in an archive system. A revision history document will describe the revisions made. Daily and monthly backups of the data files will be retained at the Keeling Group Lab (http://scrippsco2.ucsd.edu, accessed 05/2011), at the Scripps Institution of Oceanography Computer Center, and at the Woods Hole Oceanographic Institution’s Computer Center.
Long Term: Our intent is that the long-term, high-quality final data product generated by this project will be available for use by the research and policy communities in perpetuity. The raw supporting data will be available in perpetuity as well, for use by researchers to confirm the quality of the Mauna Loa Record. The investigators have made arrangements for long-term stewardship and curation at the Carbon Dioxide Information Analysis Center (CDIAC), Oak Ridge National Laboratory (see letter of support). The standardized metadata records for the Mauna Loa CO2 data will be added to the metadata record database at CDIAC, so that interested users can discover the Mauna Loa CO2 record along with other related Earth science data. CDIAC has a standardized data product citation [5], including DOI, that indicates the version of the Mauna Loa Data Product and how to obtain a copy of that product.