a centre of expertise in data curation and preservation

EAOLUG :: RSC :: Cambridge 23 May 2006

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Introduction to Digital Archives

Maureen Pennock

EAOLUG Spring/Summer Meeting 2006

Today’s talk

• The DCC

  • Background & Context

  • What We Do

• Digital Archives & Archiving

  • Definitions

  • Main Issues

  • OAIS

  • Systems

UK Digital Curation Centre

• JISC Circular 6/03 called for bids in digital curation

• Funded by JISC and the e-Science Core Programme

  • for development, services and outreach in digital curation

  • for a research programme

• Impetus to action

  • Growth in e-Science activity and data creation

  • Recognition that continuing access to digital information is needed

Partners

• University of Edinburgh (lead site)

  • Chris Rusbridge, Prof Peter Buneman

• University of Glasgow - HATII

  • Prof Seamus Ross, Director of HATII and ERPANET

• University of Bath - UKOLN

  • Dr Liz Lyon, Director of UKOLN

• Council for the Central Laboratory of the Research Councils (CCLRC)

  • Dr David Giaretta

Objectives

• Lead a vibrant international research programme to improve quality in data curation and digital preservation

• Deliver effective, efficient and high-demand services

  • Undertake evaluation of tools, methods, standards and policies

  • Work with the community to establish registries of tools and technical information

• Create an active, innovative and collaborative Associates Network

• Connect communities

  • Universities and research institutions

  • Scientific data and documents

  • International & cross-sector

Research

• Annotation in databases

• Data archiving

• Socio-economic and legal issues

• Metadata extraction and curation

• Provenance and databases

• Data transformation, integration and publishing

• Security

• Supporting technologies

• Organisational and cultural challenges to digital curation

Development

• The DCC Approach to Digital Curation white paper sets out the path for development activities:

  • Monitoring international standards

  • Development of a Representation Information Registry/Repository (DCC RIR)

  • Development of recommendations for tools and methods for generating Representation Information

  • Creating testbeds for digital curation tools

  • Creating auditing and certification processes for trusted repositories
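In OAIS terms, Representation Information (RI) may itself need further RI before it can be understood, so a registry has to record these dependencies and follow them. The sketch below is a minimal in-memory illustration of that idea, assuming each entry simply lists the identifiers it depends on; the class names, fields and identifiers are hypothetical and do not describe the actual design of the DCC RIR.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """One hypothetical registry entry describing how to interpret some data."""
    identifier: str                                       # illustrative identifier, not a real registry key
    description: str                                      # human-readable explanation
    depends_on: list[str] = field(default_factory=list)   # RI needed to understand this RI


class RepInfoRegistry:
    """In-memory sketch of a Representation Information registry."""

    def __init__(self) -> None:
        self._entries: dict[str, RepInfo] = {}

    def register(self, info: RepInfo) -> None:
        self._entries[info.identifier] = info

    def resolve(self, identifier: str) -> list[str]:
        """Return the identifier followed by everything it transitively depends on,
        depth-first, visiting each entry once (so cycles do not loop forever)."""
        seen: list[str] = []

        def visit(ident: str) -> None:
            if ident in seen:
                return
            if ident not in self._entries:
                raise KeyError(f"no representation information registered for {ident!r}")
            seen.append(ident)
            for dep in self._entries[ident].depends_on:
                visit(dep)

        visit(identifier)
        return seen


registry = RepInfoRegistry()
registry.register(RepInfo("ascii", "ASCII character encoding"))
registry.register(RepInfo("csv", "Comma-separated values", ["ascii"]))
registry.register(RepInfo("survey-columns", "Column definitions for a survey table", ["csv"]))
print(registry.resolve("survey-columns"))  # ['survey-columns', 'csv', 'ascii']
```

Resolving depth-first and stopping at entries already visited keeps the lookup finite even if two pieces of RI happen to refer to each other.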

Services

• Information Services

  • Community-developed Digital Curation Manual

  • Briefing Papers & FAQs

  • Technology Watch

  • Case Studies

  • Best Practice Checklists

• Advisory Services

  • Events: information days, workshops, training, conferences

  • Helpdesk

• Audit and Certification Services

Summary

• Support and promote continuing improvement in the quality of data curation and preservation activity

• Nurture strong community relationships between practitioners, researchers, and curators

• Address digital curation across every stage of the records life-cycle

• Develop and promote curation knowledge, tools and techniques

• Identify and research new organisational, technical, and supporting curation challenges

Digital Curation

• Digital curation is about maintaining and adding value to a trusted body of digital information for current and future use; specifically, we mean the active management and appraisal of data over the life-cycle of scholarly and scientific materials.

• Digital curation brings a whole host of challenges

  • The range of stakeholders who affect the survival of digital material cuts across the whole life-cycle

  • Everyone plays an important role

Digital Archiving

• Digital archiving is a curation activity

• Ensures that:

  • Data is properly selected

  • Data is properly stored

  • Data can be accessed

  • The logical and physical integrity of the data is maintained over time

  • Data is secure and authentic *

* Lord & MacDonald, e-Science Data Curation Report, 2003
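One common way to check that stored data has not silently changed (its bit-level integrity) is fixity checking: record a checksum when data enters the archive and recompute it later. The sketch below assumes files on a local filesystem and SHA-256 as the checksum algorithm; the function names are hypothetical, and this is an illustration of the general technique rather than a method prescribed by the report cited above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(files: list[Path]) -> dict[str, str]:
    """Record a checksum for each file when it is first stored."""
    return {str(p): sha256_of(p) for p in files}


def verify_manifest(manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose checksum has changed."""
    failures = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.is_file() or sha256_of(p) != expected:
            failures.append(name)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "record.txt"
    record.write_text("original content")
    manifest = make_manifest([record])
    print(verify_manifest(manifest))   # prints [] because nothing has changed yet
    record.write_text("altered content")
    print(verify_manifest(manifest))   # now reports the altered file's path
```

A checksum only shows that the bits are unchanged; logical integrity and authenticity also depend on metadata and provenance being kept alongside the data.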

Digital Preservation

• Digital preservation is an archiving activity

• Ensures that specific items of data are maintained over time so that they can still be accessed and understood through changes in technology *

• Includes content files and associated metadata

• Combats digital obsolescence

• Keeps data authentic despite technological change

• Has technical, organisational, and cultural challenges

* Lord & MacDonald, e-Science Data Curation Report, 2003
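Migration is one widely used strategy for combating obsolescence (emulation is another). The sketch below assumes a hypothetical policy table that marks some formats as at risk and supplies a converter for each; it leaves the original object untouched and appends a provenance event to the migrated copy, one way of supporting claims of authenticity after a technology change. Re-encoding Latin-1 text as UTF-8 merely stands in for a real format migration, and all format labels are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

Converter = Callable[[bytes], bytes]


@dataclass
class DigitalObject:
    identifier: str
    format_id: str                                     # hypothetical format label
    content: bytes
    history: list[str] = field(default_factory=list)   # provenance events


def migrate(obj: DigitalObject, policy: dict[str, tuple[str, Converter]]) -> DigitalObject:
    """If obj's format is listed as at risk, return a migrated copy; otherwise return obj.
    The original is never modified, so it can be retained alongside the new version."""
    if obj.format_id not in policy:
        return obj
    target_format, convert = policy[obj.format_id]
    when = datetime.now(timezone.utc).isoformat()
    event = f"{when} migrated from {obj.format_id} to {target_format}"
    return DigitalObject(
        identifier=obj.identifier,
        format_id=target_format,
        content=convert(obj.content),
        history=obj.history + [event],
    )


# Hypothetical policy: Latin-1 text is treated as at risk and re-encoded as UTF-8.
policy = {"text-latin1": ("text-utf8", lambda b: b.decode("latin-1").encode("utf-8"))}
original = DigitalObject("rec-001", "text-latin1", "café".encode("latin-1"))
migrated = migrate(original, policy)
print(migrated.format_id, migrated.content.decode("utf-8"))  # text-utf8 café
print(original.format_id)                                     # text-latin1
```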

What is a Digital Archive?

• The terms digital archive, digital repository, and digital library are used inconsistently

• The Task Force on Archiving of Digital Information (1996) “defines digital archives strictly in functional terms as repositories of digital information that are collectively responsible for ensuring, through the exercise of various migration strategies, the integrity and long-term accessibility of the nation’s social, economic, cultural and intellectual heritage instantiated in digital form.”

• Digital archives provide reliable solutions for life-cycle and long-term management of digital archival materials

• The system driver is Preservation, leading to Access

What is a Digital Repository?

• Collections of digital objects: content + metadata

• Cross-domain implementation

• Offers a minimum set of basic services: Get, Search, Access control

• Sustainable & trusted; well-supported and well-managed

  • Policies, processes, services, people

• Overall commitment to stewardship of digital materials

• Enables quick & remote access to digital materials
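The minimum set of basic services listed above (Get, Search, Access control) can be illustrated with a toy in-memory repository. This is a sketch under stated assumptions: a single process, access control modelled as a per-item set of permitted readers, and search as a case-insensitive substring match over metadata. The class and method names are hypothetical and are not the API of DSpace, EPrints, Fedora or any other system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    identifier: str
    content: bytes
    metadata: dict[str, str]
    readers: set[str] = field(default_factory=set)   # empty set means open access


class Repository:
    """Toy in-memory repository with Get, Search and simple access control."""

    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def deposit(self, item: Item) -> None:
        self._items[item.identifier] = item

    @staticmethod
    def _can_read(item: Item, user: Optional[str]) -> bool:
        return not item.readers or user in item.readers

    def get(self, identifier: str, user: Optional[str] = None) -> Item:
        item = self._items[identifier]                  # KeyError if unknown
        if not self._can_read(item, user):
            raise PermissionError(f"{user!r} may not access {identifier!r}")
        return item

    def search(self, term: str, user: Optional[str] = None) -> list[str]:
        """Case-insensitive substring match on metadata values, filtered by access."""
        term = term.lower()
        return [
            item.identifier
            for item in self._items.values()
            if self._can_read(item, user)
            and any(term in value.lower() for value in item.metadata.values())
        ]


repo = Repository()
repo.deposit(Item("t1", b"...", {"title": "Thesis on peatland carbon"}))
repo.deposit(Item("d1", b"...", {"title": "Peatland survey data"}, readers={"alice"}))
print(repo.search("peatland"))           # ['t1']
print(repo.search("peatland", "alice"))  # ['t1', 'd1']
```

Filtering search results by the same access rule used for Get avoids revealing restricted items through their metadata.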

Main Issues for Digital Archives

• User Requirements

• Transfer & Ingest

• Metadata

• Standards

• Digital preservation strategies

• Linkage

• Audit and Certification

• Legal Issues

• Access restrictions

OAIS

• Open Archival Information System Reference Model

• ISO 14721:2003

• "An archive, consisting of an organisation of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community"

• Establishes a common framework of terms and concepts

• Defines an Information Model

• Identifies basic Functions of an OAIS
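The Information Model can be pictured as nested records. The sketch below loosely follows the OAIS terms, with Content Information (the data object plus its Representation Information) and Preservation Description Information (Reference, Provenance, Context and Fixity), using plain Python dataclasses. The field types, the packaging and description fields, and the example values are illustrative simplifications, not a schema defined by ISO 14721.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: bytes               # the bits to be preserved
    representation_info: str         # what a user needs to interpret those bits


@dataclass
class PreservationDescriptionInformation:
    reference: str                                        # identifier for the content
    provenance: list[str] = field(default_factory=list)   # history of the content
    context: str = ""                                     # relationships to other information
    fixity: str = ""                                      # e.g. a checksum guarding against undocumented change


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging: str = ""                                          # how the parts are bound together
    description: dict[str, str] = field(default_factory=dict)   # descriptive info used to find the package


data = b"Survey results, 2005"
package = InformationPackage(
    content=ContentInformation(data, "Plain text, UTF-8"),
    pdi=PreservationDescriptionInformation(
        reference="example:0001",                 # hypothetical identifier
        provenance=["Deposited by producer"],
        fixity="sha256:" + hashlib.sha256(data).hexdigest(),
    ),
    packaging="directory",
    description={"title": "Survey results"},
)
```

In OAIS, submission, archival and dissemination packages (SIP, AIP, DIP) are all variants of this general Information Package.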

OAIS Functional Model

• The functional model has six entities:

  • Ingest

  • Archival Storage

  • Data Management

  • Administration

  • Preservation Planning

  • Access

• Described using UML diagrams

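OAIS Functional Entities

[Figure: OAIS Functional Entities (Figure 4-1). Inside the archive: Preservation Planning, Administration, Ingest, Archival Storage, Data Management, Access. Outside: PRODUCER, CONSUMER, MANAGEMENT. Labelled flows: SIP (Producer to Ingest); AIP (Ingest to Archival Storage, Archival Storage to Access); Descriptive info. (Ingest to Data Management, Data Management to Access); queries and result sets (between Access and Data Management); orders (Consumer to Access); DIP (Access to Consumer).]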
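The main flow in the OAIS functional entities figure (a SIP arrives from the Producer, an AIP is held in Archival Storage, a DIP goes out to the Consumer) can be traced in a deliberately tiny sketch. It assumes a single in-memory process, reduces Access to a query method and an order method, records a SHA-256 checksum as the only preservation information, and leaves out Administration and Preservation Planning entirely; all class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    identifier: str
    data: bytes
    descriptive_info: dict[str, str]


@dataclass
class AIP:
    identifier: str
    data: bytes
    fixity: str


@dataclass
class DIP:
    identifier: str
    data: bytes


class ToyArchive:
    """Very simplified walk through Ingest, Archival Storage, Data Management and Access."""

    def __init__(self) -> None:
        self.storage: dict[str, AIP] = {}                  # Archival Storage
        self.catalogue: dict[str, dict[str, str]] = {}     # Data Management

    def ingest(self, sip: SIP) -> AIP:
        """Ingest: turn a Producer's SIP into an AIP and pass descriptive info on."""
        aip = AIP(sip.identifier, sip.data, hashlib.sha256(sip.data).hexdigest())
        self.storage[aip.identifier] = aip
        self.catalogue[aip.identifier] = sip.descriptive_info
        return aip

    def query(self, field_name: str, value: str) -> list[str]:
        """Access sends a query to Data Management and receives a result set."""
        return [i for i, d in self.catalogue.items() if d.get(field_name) == value]

    def order(self, identifier: str) -> DIP:
        """Access fulfils a Consumer's order: fetch the AIP, check fixity, return a DIP."""
        aip = self.storage[identifier]
        if hashlib.sha256(aip.data).hexdigest() != aip.fixity:
            raise ValueError(f"fixity check failed for {identifier!r}")
        return DIP(aip.identifier, aip.data)


archive = ToyArchive()
archive.ingest(SIP("obj-1", b"dataset bytes", {"creator": "Lab A"}))
hits = archive.query("creator", "Lab A")
print(hits)                          # ['obj-1']
print(archive.order(hits[0]).data)   # b'dataset bytes'
```

Keeping the catalogue separate from storage mirrors the split between Data Management and Archival Storage: queries never touch the preserved bits, and only an order retrieves them.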

DSpace

• DSpace: “DSpace is a groundbreaking digital repository system that captures, stores, indexes, preserves, and redistributes an organization's research data [...] the DSpace software platform serves a variety of digital archiving needs.”

• Open source software

• Example uses:

  • American Museum of Natural History Research Library

  • University of North Carolina at Chapel Hill, School of Information and Library Science (SILS): theses & dissertations

  • University of Cambridge: academic & related content

  • Edinburgh Research Archive (ERA)

EPrints

• EPrints: “GNU EPrints is generic archive software under development by the University of Southampton. It is intended to create a highly configurable web-based archive.”

• Open source software

• Example uses:

  • Southampton Crystal Structure Report Archive

  • Central Connecticut State University Digital Archive

  • Central European University: Preprint Archive

  • Curtin Institute of Technology Institutional Repository

  • DLIST: Digital Library of Information Science & Technology

Fedora

• Fedora: “Open source software that gives organisations a flexible service-oriented architecture for managing and delivering their digital content.”

• Open source software

• Example uses:

  • Digital Case, Case Western Reserve University's electronic repository and archive: stores, disseminates, and preserves faculty research in digital formats (both born-digital and digitised)

  • University of Queensland eSpace: research digital repository with published articles and conference papers, book chapters, theses and other forms of written research

Others

• Other systems, such as the Digital Commons institutional repository service

• Other, custom-built systems

  • NARA Electronic Records Archives (ERA) project

  • UK National Archives

  • Public Record Office Victoria

  • KB e-Depot, Netherlands

  • Several other large bodies whose archives pre-date the development of the repository software described above

• Commercial systems

In conclusion

• There is much in common between digital archives, libraries, and repositories

• Intention, and the functionality that follows from it, is the key to defining digital storage systems

• Digital archives offer a framework for maintaining and preserving the authenticity and integrity of records over time

• Several software solutions are available

  • Development is ongoing

  • Technical know-how is needed to implement them

• There is still a lot of work to do...

Thank you.

Questions?

Maureen Pennock: [email protected]

Join the DCC Associates Network at http://www.dcc.ac.uk