Digital | Curation | Centre
Digital Curation Centre
www.dcc.ac.uk
Peter Burnhill, Michael Day, David Giaretta, Liz Lyon, Robin Rice, Bridget Robinson and Seamus Ross
Funded by:
Session Overview
1. Introduction & Briefing
2. Towards a Technical Model of Digital Curation: our R&D
3. Planning Delivery of Services & the Associates Network
1. Introduction & Briefing
- Background story on the DCC: so who's that new kid on the block?
- What is digital curation anyway? Adding value & ensuring longevity
- Aims & objectives for the DCC: improving the quality of what is done
- Our planning & our progress: timelines & deliverables
- How does this relate to the JISC Programme?
Background to the DCC (1)
Two parallel policy concerns:
1. Neglect of digital heritage, especially given investment in digitisation programmes
- JISC Continuing Access and Digital Preservation Strategy, 2002-2005
- eLib Programme, eLib3, Circular 5/97: Digital Preservation
- Digital Preservation Coalition formed in 2002
2. Differing data-sharing practices in e-Science, especially given huge data volumes
- Links between the e-Science Programme and JISC
- Report commissioned by the JISC Committee for Support of Research (Lord & Macdonald, May 2003): twin drivers of Digital Preservation & Continuing Access (e-Science); identified the need for a national digital curation centre
Interpretation of JISC policy
JISC plays three roles:
1. promotes, supports & develops the management & preservation of institutional and community digital materials for UK benefit
2. partner to the Research Councils/AHRB & other national/international bodies
3. as an organisation: appropriate grant conditions for JISC-funded creation of digital resources; good practice for JISC-created/managed materials
The escalating scale and complexity of digital resources to be curated, and the subsequent urgency of developing a critical mass of expertise, shared services and tools for long-term digital preservation, require a step change in investment and approaches.
Over the next three years, a greater emphasis on the development of production services and tools is needed to build on previous research studies and projects.
Digital preservation remains a challenging area in which techniques, costs and skills are still in development: advocacy, dissemination and training are needed to embed preservation needs, as appropriate, in JISC funding programmes.
Interpreting the implementation plan
- Risk assessment studies, e.g. ePrints
- Calls to implement the studies' recommendations for services, and integration of preservation activity & standards into repositories funded by JISC
- A series of community calls to support records management and digital preservation in institutions (cf. FOI compliance)
- Establish a Digital Curation Centre to:
  - provide a central focus of skilled staff & research links to a wider network of development activity, researchers & services
  - develop a set of central services, standards and tools for a range of distributed digital data centres & preservation services, across the Information Environment & Research Grid
- JISC partnership funding, e.g. web-archiving study: jointly funded by JCIE and the Wellcome Trust
- Digital Preservation Coalition as an independent entity, with JISC membership and sector activity supported by JISC
- National preservation of e-journals, through RLN/RSLG
Back to the DCC: Background (2)
- JISC Circular 6/03, initially issued June 2003
- Call postponed, revised & re-issued with a more significant research component
- Joint funding: JISC and e-Science Core Programme
  - 750K p.a. (outreach, services & development)
  - 250K p.a. (research)
- Unlikely that any single organisation could do what's expected
- Expressions of Interest & Full Proposals from consortia
- Final selection made in December 2003
- Negotiations & clarification in January 2004
Designation of the DCC
- Task entrusted to a consortium of four institutional partners: the Universities of Edinburgh (lead), Glasgow & Bath, together with CCLRC (Rutherford Appleton and Daresbury Laboratories)
- Brought together through the National eScience Centre, jointly managed by the Universities of Edinburgh & Glasgow
- Two 3-year awards made:
  - JISC funding started on 1st March 2004
  - EPSRC grant funding starts on 1st September 2004
- Phase One set-up:
  - some early deliverables: website & helpdesk
  - preparation for full operation & launch of services in October
  - planning formal opening for early November 2004
Responsibilities across the DCC: names & titles
- Peter Burnhill, Director (Phase One), with Robin Rice, Phase One Project Co-ordinator (from EDINA & Data Library, University of Edinburgh)
- Peter Buneman, Research Director (& PI on EPSRC grant); Professor of Informatics, University of Edinburgh
- Liz Lyon, Associate Director (Community Support & Outreach); Director of UKOLN, University of Bath
- Seamus Ross, Associate Director (Service Definition & Delivery); Director of HATII [ERPANET], University of Glasgow
- David Giaretta, Associate Director (Development); Head of Astronomical Software & Services, CCLRC
Two significant & well-known ex-portfolio names:
- Malcolm Atkinson, Director, NeSC
- Chris Rusbridge, Director, Information Services, University of Glasgow
[Figure: DCC functional management & collaboration. DCC functions (community support & outreach; research; development co-ordination; service definition & delivery; management & admin support) linked to industry research collaborators, standards bodies, testbeds & tools, communities of practice and users, curation organisations (e.g. DPC), and the Collaborative Associates Network of Data Organisations.]
What is this digital curation anyway?
The term Digital Curation is a new invention. The Digital Data Curation Task Force's Report of the Strategy Discussion Day (2002) cites Tony Hey, who cites its use by Dr John Taylor, Director General of the Research Councils, to distinguish the actions involved in caring for digital data beyond its original use from digital preservation. The concept's reach extends beyond libraries.
The e-Science Curation Report (2003) proposed the following distinctions:
- Curation: managing & promoting the use of data from the point of creation, to ensure it is fit for contemporary purpose and available for discovery & re-use. For dynamic datasets this may mean continuous enrichment or updating to keep them fit for purpose. Higher levels of curation will involve maintaining links with annotation & with other published materials.
- Archiving: curation activity which ensures that data are properly selected, stored and accessible, and that their logical and physical integrity is maintained over time, including security and authenticity.
- Preservation: activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.
Digital curation redefined...
Digital curation: ... digital objects and data, over their life-cycle, for current & future generations of use ...
digital curation = f(data curation & digital preservation)
- data curation [when current/ongoing interest is high]: actions needed to maintain and utilise digital data & research results over the entire life-cycle; data creation & management; adding value; generating new sources of information & knowledge, for use
- digital preservation [for longevity, as interest falls off]: long-run technological/legal accessibility & usability; storage, maintenance & accessibility of information content in digital material over the long term, for use
- OAIS concept of Designated Community
Data curation in action
- Astronomy
  - integrating and analysing distributed data (AstroGrid)
  - publishing multi-TB sky surveys (SuperCOSMOS & WFCAM)
  - interoperability standards (IVO Alliance)
- Bioinformatics
  - data publishing: generic tools for XML export (EBI BioMart)
  - annotation tools for massive data sets (PubMed, VOTable)
  - archiving tools for dynamic data sets (biological DBs)
- Environmental sciences
  - spatio-temporal annotation (OS MasterMap / Mouse Atlas)
- Document management
  - repository certification (RLG Task Force)
Digital preservation approaches
- Migration & Refreshment
- Emulation & Encapsulation
- Digital Archaeology & Rescue
- Document Format Specification
Robin Rice & Najla Semple, http://www.lib.ed.ac.uk/sites/digpres/
Communities of Practice: Social Sciences (IASSIST)
- History of sharing: economical in terms of both data collector and respondent
- Data about humans: problems of confidentiality confronted early on
- Mixed blessing of agreed proprietary formats (OSIRIS, SPSS, etc.): allows migration
- Future-proofing: 30 years of data advocacy!
- Tradition of data archiving & data citation
- Building new data standards out of common experience
- Data archivists & data librarians: the new digital curators?
www.iassistdata.org
Unifying themes for the DCC
- data as evidence for one or more designated communities
- archival responsibility at one or more institutional levels, with institutional policies & individuals' competence
- engage/discover communities of practice, to invoke/provoke good practices
- appraisal & retention/disposal
- logical & physical integrity: authenticity/security
- research problems in productive research domains, e.g. Informatics, Law School
Aims & Objectives for the DCC
- quality improvement in data curation & digital preservation
- initial focus: data as evidence for scholarly conclusions
- wider remit: worlds of scholarly communication & eLearning
- twin aims: excellence in research & excellence in service
- need to bridge across communities:
  - universities & research institutes
  - scientific data tradition & document tradition
  - multi-sectoral, international
We are all curators now...
The term curation builds on our understanding of the word curator: one who keeps something for the public good, the value of which often needs to be brought out by the curator.
1. This open context implies more support for explicit policies with regard to data sharing, and it has major implications for structuring and tools.
2. The digital curator as store-keeper is closely linked to promoting new science, looking forward to identifying new ways to serve present and future researchers.
The digital curator should take an active role in promoting and adding value to holdings:
- managing the value of the collection
- adding links and annotation to provide context
- recording provenance of changes made
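The last duty above, recording provenance of changes made, can be illustrated with an append-only log. This is a minimal sketch, not a DCC design: the `ProvenanceLog` class, its entry fields, and the hash chaining (each entry stores the SHA-256 digest of the previous one, so silently editing the history becomes detectable) are all assumptions chosen for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _entry_digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical append-only record of changes made to a curated object."""
    entries: list = field(default_factory=list)

    def record(self, agent: str, action: str, detail: str) -> dict:
        # Chain each entry to the digest of the previous entry ("" for the first).
        prev = self.entries[-1]["digest"] if self.entries else ""
        body = {
            "agent": agent,
            "action": action,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev": prev,
        }
        entry = dict(body, digest=_entry_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every digest; an edited or reordered entry breaks the chain.
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "digest"}
            if body["prev"] != prev or _entry_digest(body) != entry["digest"]:
                return False
            prev = entry["digest"]
        return True
```

For example, after `log.record("curator", "annotate", "added link to published paper")`, `log.verify()` returns `True`; changing the `detail` of that entry afterwards makes it return `False`. Hash chaining is only one way to make a provenance trail tamper-evident; a signed or externally anchored log would be the stronger choice where authenticity matters.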
Planning & Progress
- We must plan for the long term, with our 2020 Vision (15 years)
- We have a large territory, and large expectations:
  - multi-disciplinary, multi-data-type, multi-tradition/profession
  - national and international, but also local and hidden from view
  - a lot is going on
- How do we ensure that we do something sensible with the £s and the trust we have been given?
- Who/what should we plan to affect/effect? Policy-makers; responsible curators; (researchers?)
- How do we wish to be judged, and when?
- Collaboration & win-win-win scenarios
Foci of attention in the set-up phase
- Users: client, peer and policy communities
  - outreach & community support; service definition/delivery; development co-ordination; research agenda
  - user requirements analysis: Leona Carpenter (focus groups)
- Consortium: organisation from partner participation
  - roles; commitment; norming/performing; operational communication; consortium agreement (IPR)
- Employers: institutional settings
  - re-deployment/appointments; accommodation; commitment/reporting
-> Project Plan, as a living document
Digital | Curation | Centre 21 weekly AccessGrid/telecon; two face2face meetings defining programme of deliverables; re-deploying & recruiting staff; planning appointment of full time director in time for Launch early deliverables: www.dcc.ac.uk with links, presentations & progress updates email@example.com for contacts & offers of collaboration project plan submitted to JISC, late May 2004 defining R & D programme & services for delivery eg curation architecture; repository of tools & technical information engaging curators in existing community of practice Phase One Progress, March -
Towards a Technical Model of Digital Curation: our R&D
David Giaretta
What can we rely on in the long term?
- The bits: BIT PRESERVATION
- Paper documents that people can read
- ISO standards
- The information we collect, held either by the far-future DCC or by its successor
- Some kind of remote access
- Some kind of computers
- People?
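Bit preservation is usually checked with stored fixity values: record a digest when the bits arrive, then periodically recompute it. A minimal sketch, assuming SHA-256 digests held in an in-memory manifest keyed by hypothetical object identifiers; a real archive would persist the manifest separately from the data and run the audit on a schedule.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 fixity value for a bit stream."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects: dict[str, bytes]) -> dict[str, str]:
    """Record a fixity value for every object at ingest time."""
    return {obj_id: digest(data) for obj_id, data in objects.items()}


def audit(objects: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return identifiers whose bits are missing or no longer match the manifest."""
    failures = []
    for obj_id, expected in manifest.items():
        data = objects.get(obj_id)
        if data is None or digest(data) != expected:
            failures.append(obj_id)
    return failures
```

For instance, if `store = {"survey-001": ..., "survey-002": ...}` is manifested at ingest and `survey-002` is later corrupted, `audit(store, manifest)` returns `["survey-002"]`. A fixity check only says the bits are unchanged; whether they can still be understood is the Representation Information question taken up in the following slides.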
Preservation vs Current Use
- There are already very many architectures to support immediate use of information, including the JISC architecture; we aim to support these
- We have therefore chosen to be guided by long-term preservation aspects
- To promote this, we should emphasise interoperability and automated use as far as possible
- Based initially on the OAIS Reference Model, but adding other ideas later
- Bear e-Science in mind
OAIS Reference Model: Functional Model
OAIS Preservation Planning: key aspects
- Representation Net
- Designated Communities & Knowledge Base
Representation Net
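In OAIS terms, Representation Information may itself need further Representation Information, forming a net that can stop only where the Designated Community's Knowledge Base already covers what is needed. A minimal sketch of walking such a net, assuming the net is a dictionary from a hypothetical Representation Information identifier to the identifiers it depends on, and the Knowledge Base is a set of identifiers the community already understands:

```python
def missing_representation_info(
    start: str,
    net: dict[str, list[str]],
    knowledge_base: set[str],
) -> set[str]:
    """Return identifiers that are neither described in the net nor in the Knowledge Base.

    Traversal stops at anything the Designated Community already understands;
    a visited set guards against cycles in the net.
    """
    missing: set[str] = set()
    visited: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited or node in knowledge_base:
            continue
        visited.add(node)
        if node not in net:
            # No Representation Information recorded and not already understood: a gap.
            missing.add(node)
            continue
        stack.extend(net[node])
    return missing
```

With `net = {"fits-file": ["fits-standard"], "fits-standard": ["english-text", "ieee-754"]}` and a Knowledge Base of `{"english-text"}`, the walk from `"fits-file"` reports `{"ieee-754"}` as the gap; adding `"ieee-754"` to the Knowledge Base closes the net. The identifiers are illustrative only.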
Preservation Issues
Given a file or a stream of bits, how does one know what Representation Information is needed? (This question applies to Representation Information itself as well as to the digital objects we are primarily interested in preserving and using.) How does one know, for example, whether this thing is in FITS format?
- Someone may simply know what it is and how to deal with it, i.e. the bits are within the Knowledge Base
- One may be able to recognise the format by looking for various types of patterns
- One may feed the bits into all available interpreters to see which accept the data as valid
- Other means
The only safe way: have an associated label which points to the appropriate Representation Information.
Note that this does not exclude the other methods, e.g. for data rescue.
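The contrast between the safe route (a label) and pattern recognition can be sketched for the FITS case. This is a minimal illustration under stated assumptions: the registry, its `urn:example:` identifier and the function name are hypothetical; the only format fact used is that a FITS primary header begins with the card `SIMPLE  =` (keyword padded to 8 characters, value indicator in column 9).

```python
from typing import Optional

# Hypothetical registry from labels to Representation Information locations;
# the URN is an illustrative placeholder, not a real DCC identifier.
REPINFO_REGISTRY: dict[str, str] = {
    "fits": "urn:example:repinfo:fits",
}

# A FITS primary header opens with the card "SIMPLE  =": SIMPLE padded to
# 8 characters, then the value indicator "=" in column 9.
FITS_SIGNATURE = b"SIMPLE  ="


def find_representation_info(
    bits: bytes, label: Optional[str] = None
) -> tuple[Optional[str], str]:
    """Return (Representation Information pointer, how it was found)."""
    if label is not None:
        # The safe route: an explicit label pointing to Representation Information.
        if label not in REPINFO_REGISTRY:
            raise KeyError(f"label {label!r} points to no known Representation Information")
        return REPINFO_REGISTRY[label], "label"
    # Fallback, e.g. for data rescue: recognise the format by a byte pattern.
    # A match is only a guess, since other data could start with the same bytes.
    if bits.startswith(FITS_SIGNATURE):
        return REPINFO_REGISTRY["fits"], "pattern (unconfirmed guess)"
    return None, "unknown"
```

An unlabelled 80-byte header card beginning `SIMPLE  =` comes back as a FITS guess, while any bytes passed with `label="fits"` resolve through the label. The pattern branch deliberately reports its result as unconfirmed, mirroring the point that only the label is safe.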