a centre of expertise in data curation and preservation

CILIPs Branch/Group Day :: 27 September 2006 :: Dundee

Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

From Digital Creation to Digital Curation

Managing Digital Cultural Heritage Resources

Maureen Pennock

Digital Curation Centre, UKOLN, University of Bath

Today’s Talk
• Introductions
• The UK Digital Curation Centre
• Curation and the digital life-cycle
• Issues in developing and managing digital collections
• Helpful projects and initiatives
• Discussion

The UK Digital Curation Centre

Digital Curation

• Digital Curation, broadly interpreted, is about maintaining and adding value to a trusted body of digital information for current and future use

• The active management and appraisal of data over the entire life-cycle

The DCC
• Launched in 2004
• Established to help solve the extensive challenges of digital preservation and curation, and to provide research, advice and support services to UK institutions
• Consortium project with 4 main partners
• 4 main teams distributed across the 4 UK locations
• Funded by JISC & the e-Science Core Programme

Organisation to Engage & Collaborate

[Diagram labels: Industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; community support & outreach; research; development co-ordination; service definition & delivery; management & admin support; Collaborative Associates Network of Data Organisations; curation organisations, eg DPC]

DCC Outreach
• Raising Awareness and Dissemination
  • Website (http://www.dcc.ac.uk)
  • International Journal of Digital Curation
  • Annual International Conference
• Understanding Users and their Needs
  • Requirements gathering
  • Associates Network
  • DCC Forum

DCC Services
• Information Services
  • Community-developed Digital Curation Manual
  • Briefing Papers & FAQs
  • Technology Watch, Standards Watch, Legal Watch
  • Case Studies
  • Best Practice Checklists
• Advisory Services
  • Events: information days, workshops, training
  • Helpdesk
• Audit and Certification Services

DCC Research
• Annotation in Databases
• Data archiving
• Socio-economic and legal issues
• Metadata extraction and curation
• Ontologies and data dictionaries
• Provenance and databases
• Data transformation, integration and publishing
• Supporting technologies
• Networks of trusted digital repositories
• Organisational and cultural challenges to digital curation

DCC Development
• DCC Approach to Digital Curation (white paper) – sets out the path for development activities:
  • Monitoring international standards
  • Creating testbeds for digital curation tools
  • Development of recommendations for tools and methods for generating Representation Information
  • Development of a Representation Information Registry (DCC RIR)

Digital Curation and the Life-Cycle

Why a life-cycle approach?
• Curation is a life-cycle approach to management and preservation of digital objects, necessary because:
  • Digital materials are fragile & susceptible to change from technological advances throughout their life-cycle
  • Each stage can impact on subsequent stages
  • Traditional management processes can need adapting for digital materials with different requirements.

• The life-cycle approach enables continuity and provenance despite technological and organisational contextual change

• Maximises investments and potential

Life-Cycle model

[Diagram labels: Creation; Access & Re-use; Selection; Active Use; Acquisition; Storage & Preservation; Digital Object]

• Life-cycle model differs slightly depending on the context (e.g. libraries/archives/museums)

• This generic model addresses libraries

From Creation to Curation
• Life-cycle approach facilitates continuity and control over the different stages
• Each stage can impact on the following one:
  • Creation impacts on many stages, as the way a resource is created affects the way it can be curated and its sustainability
  • Creation problematic in a digital heritage context as you may not have control over the way resources are created

Issues in Developing and Managing Digital Collections

The Digital Library: Discuss
• What exactly is a digital library?
  • A library accessible over the internet? (but to what extent?)
  • A library with (only?) digital holdings?
  • A cutting-edge institution that maximises IT potential? (can be achieved multifariously)
  • An added-value service?

• Professional disparity over the definition (especially the difference between this and a digital archive)

• More than just a search engine and an access mechanism – more than just the Internet!

Potential digital library resources

Digitised
• Maps and Posters
• Photographs
• Original texts – books, manuscripts, newspapers, journals
• Audio-visual material
• Microfilm

Born Digital
• Maps and Posters
• Photographs
• E-Publications
• Audio-visual material
• Websites (which will invariably contain multi-media objects)
• Cataloguing data?

Issues

• Range across the life-cycle
• Involves different stakeholders in each
• Communication essential

[Diagram labels: Technical; Preservation; Organisational; Legal; Financial; Cultural]

Technical issues (1)
• Harvesting & Accession
• Storage – which model to implement?
• Metadata – what metadata are needed?
• Security – protection from unauthorised or malicious access
• User access – what tools are needed?

Technical issues (2)
• Preservation
  • Objects highly environmentally dependent
  • Software/hardware changes many times during the lifetime of the records – every five years?
  • Content may be altered if action is undertaken
  • Content will become inaccessible if action is not taken
• Preservation strategies & tools
• Fragility of storage media
• Media obsolescence
• File deterioration
• Hardware & software obsolescence
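
One common way to detect file deterioration is a fixity check: record a checksum for each file when it enters the collection, then recompute it at intervals and compare. The sketch below is illustrative only and is not taken from the talk; the function names, the manifest layout (file name mapped to a recorded digest) and the choice of SHA-256 are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(manifest: dict[str, str], root: Path) -> list[str]:
    """List manifest entries that are missing or whose digest no longer matches.

    `manifest` is a hypothetical structure: file name -> digest recorded at ingest.
    """
    changed = []
    for name, recorded in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != recorded:
            changed.append(name)
    return changed
```

A mismatch only shows that the bytes have changed since ingest; deciding whether to restore from another copy is a separate preservation action.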

Organisational and Cultural issues
• Organisational and cultural infrastructure not usually geared towards digital longevity
• Digital cultural heritage resources are often primarily recognised as resources for the ‘here and now’
• ‘Here and now’ access practices ≠ longevity!
• Preservation issues not recognised/regarded
• Staffing – expansion of duties or new staff?
• Need for senior managerial support, e.g. policy, finances…

Financial issues
• Financial:
  • Not just a one-off ‘digitising’ or ‘collecting’ cost
  • Preservation activity can require ongoing financial commitment
  • Who will pay – now and in the future?
  • What are the cost benefits?
  • Where’s the business model?
  • Will access be payment-restricted?

Legal issues
• Legal:
  • Meeting legal obligations: data protection, copyright, database right…
  • Who is responsible?
• Copyright particularly relevant, as copying can be a vital act in preservation and access
  • Impact of DRM on copying abilities
  • A new definition of copying needed?

Addressing the issues
• Follow progress in national initiatives
• Collaborate & communicate
• Engage the consumer
• Success requires commitment:
  • At a policy level (integrated)
  • At a managerial level (support/backing)
  • At a staffing level (actions/activities)

Strategy (1)
• A written policy and strategy to support activities and help secure resources
• Take a life-cycle approach to support curation and preservation planning
• If creating resources, provide good practice guidance for sustainability (eg when digitising or accepting digitised resources)

• Assess collection/selection criteria – are they still valid? Do they need expanding? Identify possible resources

• Digital resources can complement & enhance physical ones

• Be aware of externally produced digital resources (eg websites); check other heritage collections before gathering!

Strategy (2)

• Identify legal restraints in collection/management/access
• Can value be added to resources during acquisition?
• Store objects in a secure environment
• Plan for preservation activities to maintain access to authentic resources over time and avoid incurring extra costs
• Determine access and user requirements
• Implement integrated approach to collection accessibility

• Adapt and learn from national and other leading activities

Helpful projects and initiatives for preservation and accessibility

National Library of Scotland
• Developed several digital and web-accessible themed collections:
  • Propaganda: A weapon of war (posters/images)
  • Maps
  • First Scottish books
  • Robert Louis Stevenson (letters, sketches, photos)
  • Muriel Spark – the story
  • Churchill: The evidence (contains school resources)
• Trusted Digital Repository
• Part of the UK Web Archiving Consortium (UKWAC)
  • Selection and collection criteria for Scottish web sites
  • ‘Archiving the UK General Election 2005’

UK WAC
• UK Web Archiving Consortium (6 members)
  • British Library, National Library of Scotland, National Library of Wales, The National Archives, Wellcome Library, JISC
• Collects Web content selectively
• Uses modified PANDAS collection/harvesting software developed by the National Library of Australia
• Underlying harvesting program is currently HTTrack
• Permission is sought from site owners in advance
• Persistent Identifier URLs
• Single partner assumes responsibility for each site
• Central repository of metadata
• The collections are publicly accessible

• Website: http://www.webarchive.org.uk/

Internet Archive
• Non-profit organisation, based in U.S.
• Wants to offer permanent access to digital online materials of all types
• Founded in 1996, has been collecting since then … much content donated by Alexa Internet
• Collects sites by crawling and harvesting web sites

• Sites can 'opt out' by way of robots.txt file on the web server

• Most content is freely available to the public, e.g. through the Wayback Machine

• Interface issues: only the URL indicates that the page is archived

• Website: http://www.archive.org/
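
The robots.txt opt-out mentioned above can be illustrated with Python's standard `urllib.robotparser` module. This is a minimal sketch, not the Internet Archive's own code: the sample robots.txt, the helper name `may_harvest` and the crawler name `ia_archiver` are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole site,
# every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler is permitted to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A well-behaved harvester would run a check like this before fetching each page and skip anything the site owner has disallowed.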

IIPC (1)
• International Internet Preservation Consortium
• Builds co-operation between the Internet Archive and national and research libraries
• Co-ordinated by the Bibliothèque nationale de France
• The British Library is the only current UK member; other national library partners include the Library of Congress, Library and Archives Canada and the national libraries of Australia, Denmark, Finland, Iceland, Italy, Norway and Sweden
• Reflects those with current experience of Web archiving
• Both working-groups and tool development
• Phase II will enable new partners to join the consortium

• Website: http://netpreserve.org/

IIPC (2)*
• Phase I - developing the IIPC toolkit
• Standards and tools for supporting:
  • Acquisition - archival quality crawler (Heritrix); portable database extraction and migration tool for database-driven deep web sites (DeepARC)
  • Managing collections - analytical and prioritization tools for automatically focusing harvesting; curation tools to provide a non-technical interface for selecting, monitoring and verifying archived web sites
  • Collection storage and maintenance - tools for manipulating formats; a standardised storage format (WARC), standards for metadata
  • Access and finding aids - browse interfaces (WERA) and search facilities (NutchWAX)

* Michael Day, IWMW 2006

LOCKSS (1)
• Lots of Copies Keeps Stuff Safe (LOCKSS)
• An ‘easy and inexpensive way to collect, store, preserve, and provide access to their own, local copy of authorised content they purchase’ (LOCKSS website)
• E-Journal collection and preservation system
• Open Source Software
• Runs on standard desktop hardware
• Requires very little technical administration

LOCKSS (2)
• Trial and pilot projects underway
  • DCC support available through helpdesk and dedicated Advisory post
  • Current trial suitable only for certain titles (due to licensing arrangements with publishers)
• Private networks can be developed:
  • Requires technical development
  • Minimum of six machines necessary to achieve desired redundancy
  • Suitable for, eg, online course material
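
The value of keeping several copies can be shown with a much-simplified comparison: hash each machine's copy and treat the digest held by most machines as the reference. This is only a sketch of the general idea and is not the actual LOCKSS polling protocol; the function names, the machine names and the use of SHA-256 are assumptions.

```python
import hashlib
from collections import Counter


def majority_digest(copies: dict[str, bytes]) -> str:
    """Return the SHA-256 digest held by the largest number of copies."""
    counts = Counter(hashlib.sha256(data).hexdigest() for data in copies.values())
    digest, _count = counts.most_common(1)[0]
    return digest


def damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Name the machines whose copy disagrees with the majority digest."""
    agreed = majority_digest(copies)
    return sorted(
        name
        for name, data in copies.items()
        if hashlib.sha256(data).hexdigest() != agreed
    )


# Hypothetical private network of six machines, one holding a corrupted copy.
copies = {f"box{i}": b"course notes, week 1" for i in range(6)}
copies["box3"] = b"course notes, week l"
print(damaged_copies(copies))
```

With more machines in the network, a single damaged copy is easier to identify and repair from the others, which is the intuition behind the redundancy requirement.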

Further resources
• National Library of Scotland http://www.nls.uk
• National Library of Wales http://www.llgc.org.uk/
• British Library http://www.bl.uk
• DCC website http://www.dcc.ac.uk
• UKOLN website http://www.ukoln.ac.uk
• SLAINTE website http://www.slainte.org.uk/

• Digital Archives Regional Pilot (DARP) project http://www.data-archive.ac.uk/randd/darp.asp

• ‘Building and Sustaining Digital Collections’, Abbey Smith http://www.clir.org/

Thank You & Discussion

Maureen Pennock

[email protected]

Join the DCC Associates Network (it’s free!)

http://www.dcc.ac.uk/associates/