1
Archiving Workflow between a Local Repository and the National Library Archive
Experiences from the DiVA Project
Eva Müller, Peter Hansson, Uwe Klosa, Stefan Andersson
Electronic Publishing Centre, Uppsala University Library, Sweden
2
Focus on
• Using URN:NBN as a persistent identifier for referencing and as an identifier in an archiving workflow
• Workflow between a local repository and the national library archive
3
Outline
• DiVA project and its objectives
• DiVA publishing system
• DiVA Archive and archiving workflow
• URN:NBN and its role in the workflow
• Conclusions and next steps
4
DiVA Project
Digitala Vetenskapliga Arkivet (Digital Scientific Archive)
• Since September 2000
• Objectives:
– Technical solutions & well functioning workflow supporting full-text publication of doctoral theses, working papers…
– Explore ways to ensure the future use and understanding of the digital objects in the archive
5
DiVA Publishing System
• Focus on workflow
– reuse and enhance the data directly from the source document, originally created by authors, for metadata and a digital master for an electronic & printed version
– store & checksum the files
– assign a persistent identifier
– send a copy to the National Library Archive
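The "store & checksum" step can be illustrated with a minimal Python sketch. The slides do not say which checksum algorithm DiVA uses; SHA-256 and the function names `checksum_file` and `verify_file` are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def checksum_file(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks.

    Assumption: SHA-256 is only a default chosen for this sketch.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return checksum_file(path, algorithm) == expected
```

Storing the digest next to the file at submission time lets any later copy, local or at the national archive, be checked against the original.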
6
[Diagram: Author; Word Processor; Word Processing Format (Template); DiVA Manager; DiVA Document Format; Local Repository; Local Long-term Storage; Long-term storage packages (DiVA Document Format)]
7
Implementation
• Java – XML technologies
• Currently an Oracle database is used for indexing and searching
• Architecture: component-based design
– Modularity and reusability of the components
– Possibility to seamlessly replace modules with improved implementations of the component
8
DiVA system
• Used by 5 universities in Sweden
– Stockholm, Umeå, Uppsala and Örebro universities + Södertörns högskola
• Soon 1 university in Denmark
– Århus University (Statsbiblioteket i Århus)
9
Issues
• How can we ensure the future access and understanding of documents we produce locally?
• What factors increase potential for success?
• Can these factors be integrated into an automated and low-cost workflow?
10
How can we ensure accessibility in the future?
• A stable point of reference (persistent identifier)
• Use human-readable, non-proprietary storage format
• Storage in several locations
11
How can we minimize risks for data loss?
• Multiple copies in different locations
• Mechanism to keep track of copies
12
Analysis of interest
– Authors
  • Dissemination of their intellectual output
– Universities
  • Track research output
  • Reduce publishing cost
  • Increase impact
– National libraries
  • Legal deposit
13
Strategies
• Decision to use XML as a primary storage format
• Decision to use URN:NBN
• Decision to cooperate with the Royal Library
• Decision to fit all needs into an automated workflow
14
DiVA Document Format
• Internal format
• Version 1.0 (described in XML Schema)
– 99 elements
– Component based
– Extensible
– Administrative elements are combined with descriptive elements
– DocBook DTD is used for the content part of the document
15
Web Services
[Diagram: Author; Word Processor; Word Processing Format (Template); DiVA Document Format; Local Repository]
• Metadata Dissemination Services, export formats: Dublin Core, MARC 21, TEI Header, Endnote, Reference Manager, URN:NBN Register Format
• Content Dissemination Services, export formats: DocBook, TEI, HTML
16
Implementation of the Archiving Workflow
• Assignment of the URN:NBN to the resources
• Implementation of URN:NBN Resolution Service
• URN:NBN as a unique identifier within the archive
• URN:NBN as a naming convention for files, directories and archival packages
• URN:NBN as a part of disseminated metadata
17
Assignment of the URN:NBN
• Sub-domain: managed locally
• Structure: URN:NBN:se:X:diva, e.g. URN:NBN:se:uu:diva + locally managed serial number
• URN:NBN is used as the identifier for each item; an item is a single publication without consideration of format
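A minimal Python sketch of how such identifiers might be built and parsed. The hyphen between "diva" and the serial number, the lowercase rendering, and the function names `make_urn` and `parse_urn` are assumptions for illustration; the slide only states that a locally managed serial number is appended to the sub-domain.

```python
import re
from typing import Tuple

# Assumption: the serial number follows "diva" after a hyphen.
URN_PATTERN = re.compile(r"^urn:nbn:se:([a-z0-9]+):diva-(\d+)$", re.IGNORECASE)


def make_urn(institution: str, serial: int) -> str:
    """Build an identifier such as urn:nbn:se:uu:diva-1234."""
    if not re.fullmatch(r"[a-z0-9]+", institution):
        raise ValueError(f"invalid institution code: {institution!r}")
    if serial < 1:
        raise ValueError("serial number must be positive")
    return f"urn:nbn:se:{institution}:diva-{serial}"


def parse_urn(urn: str) -> Tuple[str, int]:
    """Split an identifier into (institution code, serial number)."""
    match = URN_PATTERN.match(urn.strip())
    if match is None:
        raise ValueError(f"not a DiVA-style URN:NBN: {urn!r}")
    return match.group(1).lower(), int(match.group(2))
```

Because the serial number is managed locally, each institution can assign identifiers at submission time without contacting a central service.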
18
Implementation of URN:NBN Resolution Service
• Only basic functionality so far; more development is planned
• Implemented as a Java servlet; includes a harvester that can collect URN-to-URL bindings from many different repositories
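The harvest-then-resolve idea can be sketched in a few lines of Python. This is not the servlet described above: the class `UrnRegistry`, its methods, and the example URLs in the usage note are hypothetical, and lowercasing the whole URN for lookup is a simplifying assumption.

```python
from typing import Dict, Iterable, Optional, Tuple


class UrnRegistry:
    """In-memory table of URN-to-URL bindings harvested from repositories."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    @staticmethod
    def _normalize(urn: str) -> str:
        # Assumption: treat the whole URN as case-insensitive for lookup.
        return urn.strip().lower()

    def harvest(self, bindings: Iterable[Tuple[str, str]]) -> int:
        """Merge (urn, url) pairs from one repository; later pairs overwrite earlier ones."""
        count = 0
        for urn, url in bindings:
            self._bindings[self._normalize(urn)] = url
            count += 1
        return count

    def resolve(self, urn: str) -> Optional[str]:
        """Return the URL to redirect to, or None if the URN is unknown."""
        return self._bindings.get(self._normalize(urn))
```

For example, after `registry.harvest([("urn:nbn:se:uu:diva-1", "http://example.org/1")])`, a call to `registry.resolve("URN:NBN:se:uu:diva-1")` returns the harvested URL, which a web front end could use as a redirect target.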
19
Resolution Service
[Diagram: URN:NBN resolution service at the Royal Library, with a Configuration File and URN:NBN:se to URL mappings; repositories (DiVA, Other) each providing the URN:NBN Register Format, with request/response exchanges; User sends a request, e.g. http://urn.kb.se/resolve?urn=, and in response is redirected to a URL]
20
URN:NBN as a naming convention for files, directories and archival packages
21
Workflow to the National Library Archive
• Archiving packages
– Administrative and descriptive metadata stored in XML
– Today: content in PDF and, where possible, also in XML
– Each manifestation is bundled into an archiving package (AP)
  • General data
  • Format-specific data
22
[Diagram: Archiving Package, name: URN:NBN:se:[specific part]; contains metadata, content, stylesheets, schemas and checksums; a checksum for the package]
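A minimal Python sketch of a package laid out along these lines: a directory named after the URN:NBN that holds the files plus a checksum manifest. The slides do not specify the layout, so replacing colons in the directory name, the `checksums.json` manifest, SHA-256, and the function names are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def package_dir_name(urn: str) -> str:
    # Assumption: colons are replaced because they are not portable in file names.
    return urn.replace(":", "_")


def build_package(root: Path, urn: str, files: Dict[str, bytes]) -> Path:
    """Write the files into a directory named after the URN, plus a checksum manifest."""
    package = root / package_dir_name(urn)
    package.mkdir(parents=True, exist_ok=False)
    checksums = {}
    for relative_name, data in sorted(files.items()):
        target = package / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        checksums[relative_name] = hashlib.sha256(data).hexdigest()
    # Hypothetical manifest file; the real packages may record checksums differently.
    manifest = package / "checksums.json"
    manifest.write_text(json.dumps(checksums, indent=2, sort_keys=True), encoding="utf-8")
    return package


def verify_package(package: Path) -> bool:
    """Check that every file listed in the manifest exists and matches its checksum."""
    manifest = json.loads((package / "checksums.json").read_text(encoding="utf-8"))
    return all(
        (package / name).is_file()
        and hashlib.sha256((package / name).read_bytes()).hexdigest() == digest
        for name, digest in manifest.items()
    )
```

The receiving archive can run the same verification after transfer, so a damaged or incomplete package is detected before it enters long-term storage.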
23
[Diagram: Central: URN:NBN Resolution Service, Long-term Storage, Library Catalogue. Local: Repository (Metadata & Content), Long-term Storage. Exchanged between local and central: list of URN:NBN to URL mappings (urn:nbn:se:.. -> http://www...), long-term storage packages, MARC 21 XML metadata]
24
Conclusions
• A low-cost system that supports a fully automated workflow from the point of submission works well
• Using a harvesting model for updates to the mapping registry makes the management of URN:NBN simple
• Automatic creation of MARC 21 records makes cataloguing faster and less expensive
• A push model for delivering archival packages makes this process more reliable and easier to manage
25
…
• Use of XML for all metadata associated with each archival package increases the likelihood of future understanding of the digital objects in the archive and offers the potential to easily extract document metadata, if necessary
• Modularity of technical solution offers advantage of component reusability and is a solid basis for further local and cooperative development
• Long-term access to institutional research publications can be assured with cooperation from national libraries
26
Next steps
New project funded by the Royal Library's Department for National Co-ordination and Development (BIBSAM)
– Examine and evaluate current solutions
– Develop and implement a generalized archiving workflow between a local repository and a national archive, focusing on the variety of publishing platforms and systems
27
More information
• http://publications.uu.se/
• http://publications.uu.se/epcentre/
• http://publications.uu.se/conferences/ecdl2003/archiving_ECDL_2003.pdf