Criteria and evaluation of research data repository platforms @ the University of Pretoria, South Africa

Presented by Mr. Johann van Wyk & Mr. Isak van der Walt, Library Services, University of Pretoria

Project team
UP IT: Karin, Yzelle, Herman
UP Library: Isak, Johann, Heila



Agenda

• Project Scope & project team
• Research data lifecycle
• E-Research Framework
• Product Investigation
• Criteria & evaluation
• Recommendations
• Next Steps
• Documents produced

Project Scope

The scope of the project was to evaluate products (commercial and open source) which could be utilised as a Research Data Repository Platform as part of a total Research Data Management (RDM) solution at UP.

A total RDM solution includes all phases of the research data lifecycle, but for the repository solution the focus was on identifying a potential solution for the “Dissemination” phase of the research data lifecycle.

RDM Repository Project Team
Business Sponsor – Prof Stephanie Burton (VP: Research)
ITS Sponsor – Andre Kleynhans (Deputy Director: ITS)

Project Team members:
ITS Project Manager and Business Analyst – Karin Meyer
ITS Infrastructure Architect – Dr Yzelle Roets
ITS eResearch Support Manager – Herman Jacobs
Library Services: Senior IT Consultant – Isak van der Walt
Library Services: Assistant Director: RDM – Johann van Wyk
Library Services: Deputy Director: Strategic Innovation – Dr Heila Pienaar

Data flow within the research data lifecycle

Processes within the research data lifecycle

[Figure: Research Data Life Cycle, with the stages Creating Data, Processing Data, Analysing Data, Preserving Data, Giving Access to Data and Re-using Data]

Product Investigation Methodology

Finalisation of product evaluation criteria
• Consulted with various stakeholders:
  • Library and ITS staff
  • External stakeholders at the NEDICC workshop held at the CSIR
  • Peer universities
• Utilised various selection criteria from other institutions, e.g. Leeds University, Texas Digital Library and the RDA RPRD IG Matrix (http://tinyurl.com/RPRD-matrix), as a basis and adapted them according to UP-specific requirements.

Product Shortlisting
Products were shortlisted based on the following:
• A product scan of products being used internationally, and
• The most commonly used products at universities similar to UP (in size and research activity).

Product Evaluation
• UP’s formal Request For Information (RFI) process was followed
• A product evaluation criteria list was compiled and sent to shortlisted vendors together with standard RFI documentation
• The requested information was received from the vendors and prepared for scoring, and
• Products were scored and evaluated.

Evaluation Criteria

• Functional/Business criteria: Deposit and Upload; Re-Usability; Identity and Access Management; Reporting; Discovery; Preservation
• Non-Functional: Repository Architecture; Data Management; Data Governance
• Technical aspects: Back-end Management; Integration; Infrastructure
• Vendor specific: Support, Training, Usage of Product
• Performance requirements
• Integration requirements

Unique ID | Requirement Description | Priority
DU-1 | Offer customisable metadata schema as per research area or discipline (including mandatory fields). | H
DU-2 | Offer the indexing of metadata. | H
DU-3 | Offer sufficient support for geospatial and journal article metadata. Support association of single or multiple files with one metadata record. | H
DU-4 | Upload and store metadata at a data object level, where a data object is a folder that contains one or more files. | M
DU-5 | Support multiple file types and formats of data, e.g. MS Excel 2007, MySQL database, raw data file from a Campbell CR10 datalogger, any multimedia, etc. | H
DU-6 | The system should have a simple process for uploading large (multi-TB) datasets, potentially consisting of thousands of files. Must have the ability to upload large datasets (e.g. 2 MB, 2 GB, 1 TB). | H
DU-7 | Support controlled lists against some metadata fields, either held locally or drawn from an external source, e.g. subject vocabularies. | H
DU-8 | Support customisation of out-of-the-box help text and provide context-sensitive feedback for the depositor, e.g. highlight missing metadata fields, file upload failure alert. | M
DU-9 | Accommodate a workflow where data needs to be destroyed, with an approval process and audit trail. | L
DU-10 | Researchers must be able to submit data to the repository themselves. | H
DU-11 | Process of submitting data to a repository from other systems/instruments. | H
DU-12 | Ability to batch upload data into a repository. | H
DU-13 | A third party must be able to upload a dataset on behalf of a researcher. | H
DU-14 | Support generation/labelling of persistent unique identifiers for datasets, including DOIs. | H
DU-15 | Ability to support the submission of data at any research stage (i.e. Initial Data, Working Data, Final Data stages) to the repository. | M
DU-16 | Explain how user interface customisation is achieved. | H
DU-17 | Out-of-the-box user interface is intuitive (easy to use) for users. | M
DU-18 | Out-of-the-box user interface meets accessibility requirements, e.g. W3C WCAG 1. | H
DU-19 | Assignment of Intellectual Property (IP) rights and multiple content licensing options, such as copyright and Creative Commons (CC), is possible, with terms and conditions exposed clearly to human and machine re-users. | H

Table 1: Deposit and Upload functional criteria
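Several of the deposit criteria in Table 1 (mandatory fields in DU-1, controlled lists in DU-7, feedback on missing metadata in DU-8) amount to validating a metadata record against a schema. The following is a minimal Python sketch of that idea only. The `FieldRule` class, the `validate_record` function, the field names and the vocabulary values are all hypothetical and do not describe the behaviour of any of the evaluated products.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """One field in a hypothetical discipline-specific metadata schema."""
    name: str
    mandatory: bool = False
    # Allowed values for a controlled list; None means free text.
    vocabulary: Optional[frozenset] = None


def validate_record(record, schema):
    """Return a list of readable problems the depositor would need to fix."""
    problems = []
    for rule in schema:
        value = record.get(rule.name, "").strip()
        if not value:
            # Empty or absent: only a problem when the field is mandatory.
            if rule.mandatory:
                problems.append(f"missing mandatory field: {rule.name}")
            continue
        if rule.vocabulary is not None and value not in rule.vocabulary:
            problems.append(
                f"value {value!r} not in controlled list for {rule.name}"
            )
    return problems


# Hypothetical schema and record, for illustration only.
EXAMPLE_SCHEMA = [
    FieldRule("title", mandatory=True),
    FieldRule("creator", mandatory=True),
    FieldRule("subject", mandatory=True,
              vocabulary=frozenset({"Hydrology", "Geology"})),
    FieldRule("licence", vocabulary=frozenset({"CC BY 4.0", "CC0 1.0"})),
]

record = {"title": "Borehole logger readings", "subject": "Astrology"}
for problem in validate_record(record, EXAMPLE_SCHEMA):
    print(problem)
```

Returning a list of problems rather than raising on the first one matches the intent of DU-8, where the depositor sees every missing or invalid field at once.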

Shortlisted Products & RFI Feedback

Product | Vendor/Implementation Partner | RFI Feedback
DSpace | Atmire | Received information on criteria list, proposed implementation options and associated cost.
Figshare | Digital Science | Received information on criteria list, proposed implementation options and associated cost.
Islandora | Discoverygarden | Received information on criteria list, proposed implementation options and associated cost.
Dataverse | Harvard University | Received insufficient information on criteria list, implementation options and cost.
PURR | Purdue University | Failed to respond to RFI.
Redbox | Queensland Cyber Infrastructure Foundation (QCIF) | Received information on criteria list, but Redbox is only a metadata repository and not a data repository.

Implementation options with most important advantages/disadvantages – Option 1

Option 1 – Locally hosted (both application and storage are locally hosted at UP)

Advantages:
• UP not dependent on internet for access to application
• UP able to manage own data
• Compliance with legal issues regarding data, i.e. POPI Act
• Security risk is lower (control own storage)

Disadvantages:
• Resources to be provided (includes infrastructure and human resources for application and storage), which increases cost
• Required skills set (e.g. web skills) is limited or not currently available in ITS
• UP bandwidth will cause restrictions, i.e. indexing of site
• Open source product: no legal entity/responsible company for assistance, support, enhancements, new releases, etc.

Implementation options with most important advantages/disadvantages – Option 2

Option 2 – Hybrid (application is cloud hosted, while the storage is locally hosted)

Advantages:
• Collaboration with other institutions in future is easier
• No additional resources (HR or infrastructure) are required for the application
• A legal entity exists, i.e. for the application
• Geographic redundancy
• High availability on the UP front end – no bandwidth constraints
• Metadata as well as data will always be available, searchable and able to be indexed
• UP will be in control of their IP (control own storage)
• Security risk will be lower (control own storage)

Disadvantages:
• Resources to be provided, which include infrastructure and human resources for storage as well as RD, backups, access control, cooling, etc.
• Required skills set (e.g. web skills) is limited or not currently available in ITS
• Indexing of site dependent on UP’s bandwidth

Implementation options with most important advantages/disadvantages – Option 3

Option 3 – Fully cloud-based (both the application and storage are cloud hosted through the vendor)

Advantages:
• Collaboration with other institutions in future is easier
• No additional resources (HR or infrastructure) are required for the application
• A legal entity exists, i.e. for the application
• Geographic redundancy
• High availability on the UP front end – no bandwidth constraints
• Metadata as well as data will always be available, searchable and able to be indexed
• UP will be in control of their IP (control own storage)
• Security risk will be lower (control own storage)

Disadvantages:
• UP does not have control of IP (governance and accessibility to UP’s data is in the hands of the vendor)
• Possible future sanctions against some countries may result in some users from other parts of the world not being able to reach UP’s repository
• Growing running cost, as UP will have to pay for uploading and downloading as well as storage of data

Product Evaluation Results

Criteria | Figshare | Islandora | DSpace
B-BBEE | All products and associated vendors/implementation partners are internationally based, therefore no weight was assigned in the scoring exercise.
Requirements criteria (incl. functional, non-functional, vendor) | 85% fit | 96% fit | 65% fit
Pricing | CONFIDENTIAL
Preferential criteria: Hybrid option (Option 2) | 100% fit | 10% fit – only available through huge custom development, which poses huge risks to UP | 0% fit
Preferential criteria: Consortial pricing | 100% fit | 0% fit | 0% fit
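The presentation does not state how the "% fit" figures were calculated. One plausible approach, shown here purely as an assumption, is to weight each requirement by its priority (H, M or L, as in Table 1) and express the weighted score as a percentage of the maximum. The weights, the function name `fit_percentage` and the example scores below are invented for illustration and are not the project team's actual method.

```python
# Assumed weight per priority level; the source does not give the real weighting.
PRIORITY_WEIGHTS = {"H": 3, "M": 2, "L": 1}


def fit_percentage(requirements, scores):
    """Weighted share of requirements a product meets, as a percentage.

    requirements: mapping of requirement ID -> priority ("H", "M" or "L")
    scores: mapping of requirement ID -> score between 0 and 1;
            requirement IDs missing from scores count as 0
    """
    total = sum(PRIORITY_WEIGHTS[p] for p in requirements.values())
    if total == 0:
        return 0.0
    earned = sum(
        PRIORITY_WEIGHTS[priority] * scores.get(req_id, 0.0)
        for req_id, priority in requirements.items()
    )
    return 100.0 * earned / total


# Priorities are taken from Table 1; the scores are hypothetical.
requirements = {"DU-2": "H", "DU-4": "M", "DU-9": "L"}
hypothetical_scores = {"DU-2": 1.0, "DU-4": 0.5}
print(round(fit_percentage(requirements, hypothetical_scores), 1))
```

Under this scheme, a requirement on which a product "obtained 0" contributes nothing to the numerator but still counts in the denominator, so unmet high-priority requirements lower the fit more than unmet low-priority ones.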

Recommendations

The following is recommended for implementing a Research Data Repository platform solution at UP:
• Figshare should be considered as the product of choice
• Implement the Hybrid implementation option, with the application being cloud hosted and local storage of 20 TB to start with
• Local storage can be supplemented in future with Cloud storage
• Storage should be investigated in line with the total eResearch initiative and framework of UP
• A business owner needs to be identified to be responsible for a total RDM implementation
• Implementation of a Research Data Repository platform will require a significant increase in Human and Infrastructure Resource components, and
• Consortial pricing can be kept in mind for the future and was not used as a determining selection criterion.

Next Steps

• Appoint a Business owner(s) for a total RDM solution
• Investigate tools that can support the Research-in-Process phase, e.g. myTardis
• Finalise storage solution (e.g. African Research Cloud)
• Business Case to secure resources (financial and human)
• Implementation of repository solution
• Training of researchers & library staff

Gap analysis: Figshare (obtained 0 on these criteria)

Functional criteria:
• Must be able to change data formats, although most formats are agnostic.
• Auto-generate preservation metadata, e.g. PREMIS.
• Ability to migrate files in datasets to new/other formats over time.
• Be compliant with the OAIS (Open Archival Information System) reference model.

Non-functional criteria:
• Offer de-duplication of data, metadata

Disadvantages:
• The annual subscription fee for Figshare is relatively high
• Customisation is not possible as it is a proprietary product
• The proprietary product aspect also limits the look and feel customisation of the product to reflect more of UP’s footprint, and
• No local support exists within South Africa.

Context Diagram: Research Data Management

Documents
• UP Research Data Repository Evaluation
• UP Research Data Management Business Requirements Specification
• Executive summary
• RDM Project Progress Feedback
• Context Diagram for RDM
• Islandora, Figshare, Redbox, DSpace, Dataverse, PURR requirements criteria feedback documents

Still a lot of ground to cover

Thank You