Repository and preservation systems Chair: Chris Keene, Jisc 06/07/2016 1

Page 1: Repository and preservation systems

Repository and preservation systems
Chair: Chris Keene, Jisc

06/07/2016

Page 2: Repository and preservation systems

01/05/2023

Introduction
Chair: Chris Keene, Jisc

Page 3: Repository and preservation systems

Research data management shared service for the UK
John Kaye, Jisc

Page 4: Repository and preservation systems

Jisc-CNI Conference
Jisc Research Data Shared Service
06/07/2016

Page 5: Repository and preservation systems

Contents

» Background and Policy Context
» Sector Requirements
» Shared Service
» Timescales
» Engagement

01/05/2023Jisc-CNI Research Data Shared Service 5

Page 6: Repository and preservation systems

Research at Risk

Page 7: Repository and preservation systems

Research Funder Policies

» Public good: Publicly funded research data are produced in the public interest and should be made openly available with as few restrictions as possible

» Planning for preservation: Institutional and project specific data management policies and plans needed to ensure valued data remains usable

» Discovery: Metadata should be available and discoverable; Published results should indicate how to access supporting data

» Confidentiality: Research organisation policies and practices to ensure legal, ethical and commercial constraints are assessed; research process should not be damaged by inappropriate release

» First use: Provision for a period of exclusive use, to enable research teams to publish results

» Recognition: Data users should acknowledge data sources and terms & conditions of access

» Public funding: Use of public funds for RDM infrastructure is appropriate and must be efficient and cost-effective

RCUK Common Principles on Data Policy

Page 8: Repository and preservation systems

EPSRC Policy

» Retained EPSRC-funded research data is preserved for a minimum of ten years
» Effective data curation is provided throughout the full data lifecycle
» Knowledge of publicly-funded research data holdings
» Discoverability; recording of third party access requests
» Notice and justification of access restrictions, for example ‘commercially confidential’
» Awareness and use of relevant law, for example FOI
» Awareness of and compliance with research data policies
» Adequate RDM resource allocation, for example from quality-related research (QR) funding or research grants

Page 9: Repository and preservation systems

Strategic guidance from…

» UCISA research IT systems group
  › Procure a shared national RDM service

» UUK research policy network discussion
  › Concern over multiple solutions

Page 10: Repository and preservation systems

What would you like Jisc to Provide?

2015 Research Systems Survey:

» “Currently the UK is running a very inefficient model requiring individual institutions to establish their own repositories. Influencing future central/research council provision would be useful”

» “A national data repository”

» “Increasing use of CRISes to fulfill traditional repository functions does not seem to be prioritised as an issue by JISC……”

» “If not able to provide e.g. data repositories, influence funder or sector/community provision to support the needs of that funder/community.”

» “Data access and user tracking tools and statistics on shared archive services”

» “Development of the national research data registry. This will have implications for institutional research data registry development.”

Page 11: Repository and preservation systems

A Key Requirement - Preservation

Page 12: Repository and preservation systems

A Key Requirement - Interoperability

Page 13: Repository and preservation systems

Vision

» Researchers shouldn’t need to think (too much!) about Research Data Management

» “Visible data, invisible infrastructure”
  › Provide researchers intuitive, easy functionality to publish, archive and preserve their research outputs.
  › Provide interoperable systems to allow researchers and institutions to fulfil and go beyond policy requirements and adhere to best practice throughout the RDM lifecycle.

Page 14: Repository and preservation systems

Why a Shared Service?

» There is no single, easily available “solution” that meets universities’ requirements for enabling Research Data Management

» More effective Research Data Management must happen to comply with Funder Mandates, ensure data is not lost, and to realise a whole range of positive benefits

» A shared service (provided by Jisc) seems to offer a number of benefits:
  » Cost savings and efficiencies
  » Common approaches and practice
  » Research system standardisation and interoperability
  » Others…

Page 15: Repository and preservation systems

Pilot Institutions

» Pilot institutions selected to create a balanced portfolio of types of institution, specialisms and research systems already in place

Institution Name
Cardiff University
CREST - Consortium for Research Excellence, Support and Training (Buckinghamshire New University, Harper Adams, St Mary’s - Twickenham, UCA & Winchester)
Imperial College of Science, Technology and Medicine
Middlesex University
Plymouth University
Royal College of Music
St George's Hospital Medical School
University of Cambridge
University of Lancaster
University of Lincoln
University of St Andrews
University of Surrey
University of York

Page 16: Repository and preservation systems

Pilots’ MVPs

» “Easy to use and cost effective archiving, ingest, preservation, repository, reporting and discovery supported that can handle sensitive data”

» “Robust data storage that has growth ability for active and archive data”

» “Standard metadata profile - international for interoperability”

» “Integration with all main CRIS systems”

» “Meets REF and funder deposit requirements (supports deposit of REF data output types)”

» …..........

Page 17: Repository and preservation systems

What we need

Page 18: Repository and preservation systems

Where are we now?

Page 19: Repository and preservation systems

Research at Risk Portfolio

Page 20: Repository and preservation systems

Project Support

Page 21: Repository and preservation systems

Project Support

Milestones 2015-18

Apr 2015 - Dec 2015
Jan 2016 - July 2016
Aug 2016 - June 2017
Jul 2017 - Sept 2017
Oct 2017 - Apr 2018

- Requirements - HEI Pilots Selected
- Procurement commences
- Support consultancy work begins
- Supplier Framework selected
- Alpha Development
- Alpha service tested and reviewed
- Beta Development
- Feedback on Beta Service
- Detailed HEI requirements and technical architecture
- Contracting commences
- Development Phase
- Contact additional early adopter HEIs and promote Beta Service
- Business planning and Begin Business Case
- Market Research and Consultation
- Promote service to institutions
- Start on next phases (service enhancement/modular)
- Institutional survey
- HEI and supplier workshops
- Pilot HEI selection process
- Business case decision
- If go then begin transition to production service

Page 22: Repository and preservation systems

researchdata.network

Page 23: Repository and preservation systems

Thank you!

Email: [email protected]

Twitter: @JohnPKaye

Blog: http://researchdata.jiscinvolve.org

Except where otherwise noted, this work is licensed under CC-BY-NC-ND

Page 24: Repository and preservation systems

Hydra
Tom Cramer, Stanford University – Chris Awre, University of Hull

Page 25: Repository and preservation systems

get a head on your repository

Tom Cramer
Stanford University
@tcramer

Chris Awre
University of Hull
@clawre

Page 26: Repository and preservation systems

get a head on your repository

Why?

Page 27: Repository and preservation systems

Why use a particular repository technology?

Page 28: Repository and preservation systems

Why use a particular repository technology?

Wrong question

How can we implement sustainable repository infrastructure to serve our digital content management needs?

Page 29: Repository and preservation systems

Answers to questions

• How do I manage my various collections of different digital content?

• How can I deal with the different file types I’m having to archive?

• How do I ensure I can cope with the increasing amount of digital content I need to manage?

• How can I manage my digital content in a way that is meaningful?

• How can I ensure that I can sustain the technology choice I make?

Page 30: Repository and preservation systems

Building the digital library

Page 31: Repository and preservation systems

Creating a sustainable open source project

Technology Community

Page 33: Repository and preservation systems

One Body, Many Heads

Page 35: Repository and preservation systems

CRUD in Repositories

Page 37: Repository and preservation systems

A Word About…

• Flexible, Extensible, Durable, Object Repository Architecture

• Open source digital repository

• middleware for relating your objects and hooking them to services & storage

• Particularly powerful for data & other “non-simple” content types

• More than 300 adopters worldwide

• 4 major software releases since 2000

Page 38: Repository and preservation systems

Used By...

• Large Universities
• Small Universities
• Colleges
• Public Broadcasting
• Government Ministry
• National Libraries
• National Lab
• Small Research Labs
• National Digital Repository
• Statewide Digital Libraries
• Chemical Heritage Foundation
• Museum of Performing Arts
• A Shakespeare Festival

Used For...

• Self-deposit System
• Digital Collections System
• Sheet Music
• Architectural Resources
• Electronic Theses & Dissertations
• Digital Image System
• Media Management
• Media Preservation System
• Research Data Management
• Digitization Workflow System
• Digital Preservation System
• Digital Archives System
• And more!

Page 39: Repository and preservation systems

Solutions and Solution Bundles

Sufia

RYO (roll your own)

Hydra in a Box

Page 40: Repository and preservation systems

Trend 1: Move to Linked Data

PCDM (Portland Common Data Model), for data and code interoperability

Page 41: Repository and preservation systems

Trend 2: Architecting Layers & Gems for Code Reuse

Hydra App Layers

• Active Fedora
• Hydra::PCDM
• Hydra::Works
• Curation Concerns
• Sufia
• Local customization

Hydra Gems (kinda like sprinkles)

• browse-everything
• hydra-editor
• hydra-derivatives
• hydra-role-management
• hydra-shibboleth
• Geomash
• iiif_manifest
• orcid
• questioning_authority
• etc.

Page 42: Repository and preservation systems

Trend 3: Hydra-in-a-Box

● Directed project to produce a turnkey solution
  ○ ...and a hosted service
  ○ ...and metadata enrichment engine

● 2.5 years (May 2015 - November 2017)

● $2M grant from IMLS

● Core partners = DPLA, DuraSpace & Stanford
  ○ Plus significant & growing community contributions

Page 43: Repository and preservation systems

What is Hydra? Community

Hydra Connect

Mailing lists, Slack, Skype/Hangouts

Meetings – manager and technical focus

Page 44: Repository and preservation systems

Hydra Partners & Adopters

Page 45: Repository and preservation systems

Hydra Partners & Known Users

Page 47: Repository and preservation systems

Communication Channels

Page 48: Repository and preservation systems

Hydra Interest & Working Groups

Page 49: Repository and preservation systems

Hydra – getting localised

• Hydra New England (NE) regional group

• Hydra West Coast regional group

• Developer congresses
  • Stanford and Michigan this year so far

• Fostering face-to-face exchange of ideas and putting them into practice

Page 50: Repository and preservation systems

Hydra UK

Durham
Lancaster
York
Hull
Oxford
LSE

Research data catalogue
Research output management
Digitised archives
Marketing images
Institutional Repository
Born digital archive
Digital library

Page 51: Repository and preservation systems

Hydra in (other parts of) Europe

Ireland

• Digital Repository of Ireland (based at Trinity College, Dublin)

• University College Dublin

• Maynooth University

Denmark

• Royal Library of Copenhagen

• Danish Technical University

Theatre Museum of Barcelona

Hydra Europe Symposia

• Dublin 2014
• London 2015
• ?

Page 52: Repository and preservation systems

Hydra Support

Page 53: Repository and preservation systems

Partnership

Hydra would not work without partnership

Hydra would not work if we tried to do the same by ourselves

Partnership has brought together many different types of institution who would not have worked together otherwise

Partnership has been stimulated by recognising a common need and finding a way to address this together

Partnership has helped us find answers to our questions

Page 54: Repository and preservation systems

Answers to questions

• How do I manage my various collections of different digital content?

• How can I deal with the different file types I’m having to archive?

• How do I ensure I can cope with the increasing amount of digital content I need to manage?

• How can I manage my digital content in a way that is meaningful?

• How can I ensure that I can sustain the technology choice I make?

Page 56: Repository and preservation systems

Addressing the preservation gap at the University of York
Jenny Mitcham, University of York

Page 57: Repository and preservation systems

Addressing the preservation gap at the University of York

Jenny Mitcham - University of York

Jisc and CNI conference - 7 July 2016

Page 58: Repository and preservation systems

Why do we need digital preservation?

Page 59: Repository and preservation systems

Why is this relevant for research data?

• Funder requirements around retention:

– NERC - data should be retained for a minimum of 10 years but for projects of major importance this may need to be 20 years or longer

– STFC - expect data to be retained for a minimum of 10 years and data that cannot be re-measured should be retained indefinitely

– Wellcome Trust – expect data to be kept for a minimum of 10 years but suggest longer periods for certain types of data

– EPSRC – expect research data to be securely preserved for a minimum of 10 years from the date of last access
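The minimum retention periods above boil down to simple date arithmetic. The following is a minimal sketch, not any funder's official rule set: the lookup table, the function names and the idea of a single "anchor" date are assumptions made for illustration. The slide only states the anchor (date of last access) for EPSRC, so the caller has to choose an appropriate anchor for the other funders.

```python
from datetime import date

# Minimum retention periods in years, as summarised on the slide.
# The table and the function names below are illustrative assumptions.
MIN_RETENTION_YEARS = {
    "NERC": 10,
    "STFC": 10,
    "Wellcome Trust": 10,
    "EPSRC": 10,
}


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


def earliest_review_date(funder: str, anchor: date) -> date:
    """Earliest date at which the minimum retention period has elapsed.

    For EPSRC the anchor is the date of last access, so every new access
    pushes the result further into the future.
    """
    return add_years(anchor, MIN_RETENTION_YEARS[funder])


if __name__ == "__main__":
    print(earliest_review_date("EPSRC", date(2016, 7, 7)))
```

Because these are minimum periods, the result is best read as the earliest point at which retention could be reviewed, not as a disposal date.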

Page 60: Repository and preservation systems

University of York RDM questionnaire 2013

• Which data management issues have you come across in your research over the last five years?

– “Inability to read files in old software formats on old media or because of expired software licences”

– 24% of 181 researchers who answered this question admitted this had been a problem for them

…and researchers already encounter barriers to reusing data

Page 61: Repository and preservation systems

Most universities have a place to store data

The researcher

The researcher gives data to the repository

Access to the research data via the repository interface

But what about this bit?

The Open Archival Information System

The repository ingests the data

Data reuse will happen here

Page 62: Repository and preservation systems

Visible v. invisible

Visible

Invisible

Page 63: Repository and preservation systems
Page 64: Repository and preservation systems

Filling the digital preservation gap: Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

Page 65: Repository and preservation systems

The team

University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist

University of York:
• Julie Allinson – Manager, Digital York
• Jen Mitcham – Digital Archivist

Artefactual Systems

Page 66: Repository and preservation systems

What have we been doing?

• Phase 1 – explore: test Archivematica, research, do some thinking (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proof of concepts at York and Hull, investigation of the file format problem (6 months)

Page 67: Repository and preservation systems

York

Page 68: Repository and preservation systems

Hull

Page 69: Repository and preservation systems

A quick look at file formats

Research data file formats are:
• Numerous
• Sometimes a bit obscure
• Sometimes very big
• Ever-changing
• Often very new

This means they can be hard to preserve... because we can’t identify them. If we can’t identify them how can we carry out preservation activities?
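To make "identifying a file" concrete, here is a toy sketch of the two simplest identification strategies: matching a byte signature at the start of the file, and falling back to the file extension. It is not how DROID or PRONOM are implemented; the lookup tables and the `identify` function are hypothetical and deliberately tiny.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical, hand-picked lookup tables for illustration only.
# Real registries hold far more entries and far richer signature patterns.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP-based container",
}
EXTENSIONS = {
    ".csv": "Comma-separated values",
    ".txt": "Plain text",
}


def identify(name: str, head: bytes) -> Tuple[Optional[str], str]:
    """Return (format, method) for a file name and its first few bytes.

    method is 'signature', 'extension' or 'unidentified'.
    """
    # A signature match is stronger evidence, so try it first.
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt, "signature"
    # Fall back to the (case-insensitive) file extension.
    ext = Path(name).suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "extension"
    return None, "unidentified"


if __name__ == "__main__":
    print(identify("report.pdf", b"%PDF-1.4"))
    print(identify("results.CSV", b"a,b,c\n"))
    print(identify("run42.xyz", b"\x00\x01\x02"))
```

A format with no entry in either table comes back as unidentified, which is the situation described above for new or obscure research data formats.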

Page 70: Repository and preservation systems

Top research data applications at York

Page 71: Repository and preservation systems

The NDSA Levels of Digital Preservation:

Level 2 requires you to know what you’ve got ...and levels 3 and 4 build on this

Page 72: Repository and preservation systems

Can we identify our research data?

We ran Droid over the research data deposited with us over the past year. Out of 3752 individual files:

• for 1382 (37%) of the files a file format was identified
  – 668 (48%) by signature
  – 648 (47%) by extension
  – 65 (5%) by container

• 34 different file formats were identified automatically
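The percentages can be re-derived from the counts on the slide. The short sketch below uses only the figures quoted above; the variable and function names are illustrative.

```python
# Counts as quoted on the slide.
total_files = 3752
identified = 1382
by_method = {"signature": 668, "extension": 648, "container": 65}


def pct(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Share of all files for which a format was identified.
identified_pct = pct(identified, total_files)

# Share of the identified files found by each method.
method_pct = {method: pct(count, identified) for method, count in by_method.items()}

# Files for which no format was identified.
unidentified = total_files - identified

# Note: the three per-method counts quoted on the slide sum to 1381.
if __name__ == "__main__":
    print(identified_pct, method_pct, unidentified)
```

On these figures 2370 of the 3752 files, roughly 63%, were left without an identified format.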

Page 73: Repository and preservation systems

Identified research data files

• Files identified by Droid (listed by file type)

Page 74: Repository and preservation systems

Unidentified research data files

• Files not identified by Droid (listed by file ext)
• 107 different file extensions not identified
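A listing of unidentified files by extension can be produced with a simple tally. This is a minimal sketch assuming the unidentified file names are already available as a list of strings; the function name and the sample names are hypothetical, and this is not DROID's own reporting.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple


def tally_unidentified_extensions(filenames: Iterable[str]) -> List[Tuple[str, int]]:
    """Count unidentified files per lower-cased extension, most common first."""
    counts = Counter(
        Path(name).suffix.lower() or "(no extension)" for name in filenames
    )
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample of file names that a tool failed to identify.
    sample = ["scan01.dat", "scan02.DAT", "model.xyz", "README"]
    for ext, count in tally_unidentified_extensions(sample):
        print(ext, count)
```

Sorting by frequency shows which extensions would benefit most from signature research.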

Page 75: Repository and preservation systems

Every little helps

Page 76: Repository and preservation systems

How do we improve this result?

• More file signature research required
  – institutions can submit sample files to TNA
  – or they can create their own file format signatures
  – digital preservation tools (e.g. Archivematica) can help us with better reporting on unidentified files

We can improve the tools if we work together

Page 77: Repository and preservation systems

Where to find out more

Page 78: Repository and preservation systems

Do talk to me (or Chris) if you are interested in finding out more about our preservation work

Useful links:
Project website: http://www.york.ac.uk/borthwick/archivematica
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
PRONOM: http://www.nationalarchives.gov.uk/PRONOM/
Phase 1 report: http://dx.doi.org/10.6084/m9.figshare.1481170
Phase 2 report: https://dx.doi.org/10.6084/m9.figshare.2073220

Page 79: Repository and preservation systems

Emulation developments
David Rosenthal, Stanford University

Page 80: Repository and preservation systems

»AWAITING CONTENT

Page 81: Repository and preservation systems

Repository and preservation systems

06/07/2016
