Implementation of a Digital Media Archive in a University Library
Ivar A. L. Hoel
EUUG Annual Conference, Amsterdam, September 2nd, 2004
What I will talk about
Our original vision
(= who we are, and why we chose Sirsi)
The environment we are working in
(= trends in European research libraries concerning archives and repositories)
The solution we are aiming at
(= our concept of a HYBRID institutional repository)
How we are implementing Hyperion Digital Archive
(= solutions chosen, unsolved problems, proposals for improvement)
Organizational background
This is not a factory
It is the largest library school in the world
You will find it in Copenhagen, Denmark
The Royal School of Librarianship and Information Science (RSLIS)
1000+ students
70+ faculty members
5000+ seminar and course participants annually
… and a well-equipped library to serve it
The library specializes in LIS documents and resources
Why we chose SIRSI
The answer to that question tells quite a lot about what a digital archive means to us
Year 2000: We were in need of a new library system
We were sure that outsourcing – to a vendor, or preferably to another library – was the future
Therefore choice of a good partner was more important than the choice of a library system – we thought
BUT we also needed a digital archive
AND we did not trust any other library to run that
SO we made a U-turn : we bought our own systems – from SIRSI
The trends we see around us in Europe
Because of the soaring price of scientific information, the idea of alternative publishing methods is gaining momentum, not only in libraries but also in universities and other institutions
E-publishing in research libraries is almost exclusively interpreted as self-publishing of universities’ research output
Open Archives Initiative (OAI) is a central protagonist in this context
The SPARC idea of an Institutional Repository is increasingly talked about
Two recent examples
In mid-February 2004, when an earlier version of this presentation was being prepared for the 2004 SIRSI Super Conference in St. Louis, two recent events were used as examples:
The CERN Workshop on Innovations in Scholarly Communication (OAI3), 12-14 February, in Geneva (http://info.web.cern.ch/info/OAIP/)
The opening on 18 February of a global preprint and eprint service, comparable to OAIster, at the Technical Knowledge Center for Denmark at the Technical University of Denmark. At present it covers 26 eprint archives (service providers) and 1 million records (http://preprints.cvt.dk)
Open access as a universal principle for scholarly activities
From The Berlin Declaration on Open Access to Knowledge in Science and Humanities, 22 October 2003:
Major national and international organisations of science and culture consider their mission only half complete if the information they produce is not made freely available to society.
[Source: Theresa Velden, Heinz Nixdorf Center for Information Management in the Max Planck Society, at the OAI3 Workshop, CERN, Geneva, Switzerland, 12-14 Feb 2004]
From The Berlin Declaration
“The Internet has fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage. For the first time ever, the Internet now offers the chance to constitute a global and interactive representation of human knowledge, including cultural heritage and the guarantee of worldwide access.”
“In order to realize the vision of a global and accessible representation of knowledge, the future Web has to be sustainable, interactive, and transparent. Content and software tools must be openly accessible and compatible.”
“Our organizations are interested in the further promotion of the new open access paradigm to gain the most benefit for science and society.”
Conclusion on Open Access in Europe
The Berlin Declaration aims at acceptance of the open access paradigm as a universal principle for scholarly activities. Governments, universities, research institutions, funding agencies, foundations, libraries, museums, archives, learned societies and professional associations are invited to join the present signatories.
Please contact:
Prof. Dr. Peter Gruss
President of the Max Planck Society
Munich, Germany
URL: www.zim.mpg.de/openaccess-berlin/
e-mail: [email protected]
Realization requires a sustainable service infrastructure, long-term commitment from the players, and far-reaching organizational and socio-economic changes (copyright, role of information professionals, business models)
Signatories of Berlin Declaration
Max Planck Society
German Research Foundation (DFG)
Fraunhofer Society
Leibniz Association
Helmholtz Association
Deutscher Wissenschaftsrat
Association of Universities and other Higher Education Institutions in Germany
Berlin-Brandenburg Academy of Sciences and Humanities
Staatliche Kunstsammlungen Dresden
Deutscher Bibliotheksverband
Deutsche Initiative für Netzwerkinformation (DINI)
Centre National de la Recherche Scientifique (CNRS)
Institut National de la Santé et de la Recherche Médicale
National Hellenic Research Foundation
Fund for Scientific Research - Flanders
Minister of Education Cultura y Deportes Gobierno de Canarias
FWF Austrian Science Fund
Norwegian Institute of Palaeography and Historical Philology
Istituto e Museo di Storia della Scienza Florence
Central European University Budapest
Academia Europaea
Open Society Institute (OSI)
Chinese Academy of Science
Institutional repositories should contain
Search system
Self-service for upload of documents
Handling of administrative workflows
Rights management
Structuring of collections
Support of diverse formats
Metadata allocation to the documents
Handling of permanent URLs
Bitmap preservation
OAI compatibility
[Compiled by a Danish colleague]
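To make the last item on the list concrete: OAI compatibility in practice means that the repository answers OAI-PMH requests such as ListRecords with Dublin Core records that a harvester can parse. The following is a minimal sketch of the harvester side only. The base URL, the identifier and the sample response are hypothetical, and nothing here describes how Hyperion itself exposes its records.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, [titles]) for every record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        ident = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        titles = [t.text for t in rec.iter("{%s}title" % DC_NS)]
        records.append((ident, titles))
    return records


# Hypothetical response, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also have to follow resumption tokens and handle error responses; both are left out to keep the sketch short.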
Choices made by Danish neighbour libraries
The colleague who drew up the list (previous slide) is migrating from a Danish system, and will soon be using DSpace from MIT
Other Danish research libraries will try out other possibilities, among them the DiVA system from Uppsala University Libraries, Digital Publication Unit, in Sweden
DiVA has strong support in other Swedish universities, and will probably be used by Norwegian university libraries as well
So where does this leave us as Hyperion users?
Are we left alone, with a system that is out of touch with what is happening?
Or can Hyperion cope with the challenges?
Let us have a look at the meaning of a digital archive in a modern library
When we started thinking about a digital archive, words like
OAI
and
Institutional Repository
were unknown to us. In fact, they were not even invented yet.
We had quite different uses for a digital archive in mind.
Example: What do we do with this report in a modern, hybrid library?
An important 107 page report, commissioned by a Swedish organization, written and published in Denmark in pdf-format on a webserver.
Earlier it would have been printed, mimeographed or xeroxed in a number of copies.
Today it will be on a webserver for some time.
What do we do with the pdf-publication?
Print it out and send it to the bookbinder, then treat it as a printed book? (Weight: 1 lb 3 oz. + binding)
Catalogue it with a link in the 856-field to the appropriate URL?
Put it in the digital archive?
What do we do with the pdf-publication?
Print it out and send it to the bookbinder, then treat it as a printed book? Yes, that is what we did pre-1995.
Catalogue it with a link in the 856-field to the appropriate URL? Yes, that is what we did pre-2003.
Put it in the digital archive? Yes, that is what we do now (+ possibly the URL link as well)
When e-print takes over from print
The digital archive takes over the function that the shelves have in the traditional library
There will be a mixture of documents kept on a webserver and documents kept on a digital document server. This mixture will lead to some confusion
There will be a mixture of documents originating from outside sources (cf. the pdf-example), and documents produced within the organization (e.g. the research output from a university)
What about the webservers? - Q
We have been asked: Why spend money on a Digital Document server, when a webserver is “free”?
We have been asked: Why don’t we just rely on webservers for documents that are already out there? Do we intend to catalog the Internet?
What about the webservers? - A
The webservers simply do not live up to the requirements for a repository, even if their contents are indexed
We have seen too many examples of documents that vanish, and webservers that close down without a trace, for us to believe that we can rely on the webservers if we want to serve our users in the way we did with print documents
We catalogue carefully selected web documents in our Unicorn catalogue
We do not intend to rely on legal deposit for web documents
So, what can we do at the RSLIS library?
The library at RSLIS is not a large research library, and the research output of the Library School is small compared to the large universities
To them, e-publishing of the research output is a huge task by itself
In fact, research registration as such is a challenge
To us, e-publishing is only part of the task
Further, our library already is in charge of the university’s research registration, which gives us a great advantage
We are aiming at a HYBRID DIGITAL ARCHIVE
Sticking to our original vision, we want to establish both an Institutional Repository (for research publication) and a Digital Media Archive (for documents acquired from outside sources) at the same time
We want to use Hyperion for both
Can Hyperion – which was sold as a Digital Media Archive - live up to that?
RSLIS – Main areas for digital archive collecting
Research from RSLIS
PhD theses
Students’ theses
LIS documents
Danish library history
RSLIS internal documents
= Institutional Repository
= Media Archive
RSLIS – Main 6-part Digital Archive hierarchy
Let us look at the 10-item list again
Institutional repositories should contain:
Search system
Self-service for upload of documents
Handling of administrative workflows
Rights management
Structuring of collections
Support of diverse formats
Metadata allocation to the documents
Handling of permanent URLs
Bitmap preservation
OAI compatibility
Hyperion strengths and weaknesses
All ten are supported, but some better than others
Search system
Self-service for upload of documents
Handling of administrative workflows
Rights management
Structuring of collections
Support of diverse formats
Metadata allocation to the documents
Handling of permanent URLs
Bitmap preservation
OAI compatibility
Practical problem areas
(Post configuration problems – the pre-configuration problems are forgotten)
Choosing the documents
Acquiring the documents
Copyright handling
Metadata creation
Choosing the documents
A collection development plan has to be drawn up, just as for print material.
We have seven plans, where we describe in detail what we intend to do and to achieve for each of our seven main collecting areas.
What we intend to achieve will probably take years and years to be fulfilled
Acquiring the documents
Getting permission for the files and getting the files (which is not the same!)
Handling the files in the interim period
Quality control routines
Naming rules for files and file AltIDs
When to display document only, and when to open the Hyperion hierarchy
Are all text documents to be indexed without exception?
Can multi-part text documents sensibly be treated as multi-image documents?
Do we prefer some file formats to others, e.g. PDF to Word?
Copyright handling
Copyright legislation is a morass
Nevertheless, we intend to keep the rules
Therefore, it is time consuming, as we will have to acquire written consent from all copyright owners
For students we have implemented a written agreement form, giving the library the right to put the thesis in the Digital Archive, and giving the student the right to withdraw the electronic document with 6 months’ notice. (But a paper copy will still be there)
For researchers, we will (together with the RSLIS) try to persuade them not to give the commercial publishers exclusive rights
Metadata creation
Apart from copyright handling, metadata creation is the most time consuming part of Hyperion work
We have not found any easy way out, because we (still) have so many requirements to fulfil
A main question is what to catalogue with a MARC record, when to use Dublin Core, and when to do both
Conversion between MARC and DC (or vice versa) would save much time
IF we decide to take preservation problems more seriously, other and more developed metadata sets than the DC set preconfigured in Hyperion would be a great benefit
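The kind of MARC-to-DC conversion wished for above can be sketched as a simple crosswalk table. This is an illustration under stated assumptions, not an existing Hyperion or Unicorn function: the tags are common MARC 21 tags (danMARC2 differs in places), the target labels are taken from the Hyperion field list in this presentation, the pairing of tags with labels is our own guess, and the sample record is invented.

```python
# Crosswalk: (MARC tag, subfield code) -> metadata field label.
# Assumption: MARC 21 tags; the pairing with Hyperion labels is illustrative.
CROSSWALK = [
    ("100", "a", "CREATOR"),
    ("245", "a", "TITLE"),
    ("260", "b", "PUBLISHER"),
    ("260", "c", "DATE"),
    ("650", "a", "SUBJECT"),
    ("700", "a", "CONTRIBUTR"),
    ("856", "u", "RESOURCEID"),
]


def marc_to_dc(record):
    """Convert a simplified MARC record to a dict of DC-style fields.

    `record` is a list of (tag, {subfield_code: value}) tuples, so that
    repeated tags (e.g. several 650 fields) are preserved. Repeated
    subfields within one field are not modelled in this sketch.
    """
    dc = {}
    for tag, subfields in record:
        for marc_tag, code, label in CROSSWALK:
            if tag == marc_tag and code in subfields:
                dc.setdefault(label, []).append(subfields[code])
    return dc


# Hypothetical record for a PDF report with a link in the 856 field.
SAMPLE_RECORD = [
    ("100", {"a": "Hypothetical, Author"}),
    ("245", {"a": "An example report"}),
    ("260", {"b": "Example Publisher", "c": "2004"}),
    ("650", {"a": "Digital archives"}),
    ("650", {"a": "Libraries"}),
    ("856", {"u": "http://www.example.org/report.pdf"}),
]

if __name__ == "__main__":
    for label, values in marc_to_dc(SAMPLE_RECORD).items():
        print(label, values)
```

Keeping the mapping in a table rather than in code means that a cataloguer could review or adjust the crosswalk without touching the conversion routine.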
MARC cataloguing and/or DC metadata
Main group                                        danMARC2    Dublin Core
Research                                          X           X
PhD and students’ theses                          X           X
LIS documents                                     X           X
Danish library history, incl. photo collection    very few    X
RSLIS internal documents                          -           X
Metadata fields supplied with (our) Hyperion
CONTRIBUTR
COVERAGE
CREATOR
DATE
DESCRIPT
FORMAT
LANGUAGE
OTHER
PUBLISHER
RELATION
RELAPART
RELAVERS
RESOURCE
RESOURCEID
RIGHTS
SOURCE
SUBJECT
TITLE
TITLEREL
TYPE
ATID
DESC
ID
Metadata fields used by RSLIS (red)
[Same field list as on the previous slide; the red marking of the fields used by RSLIS is not preserved in this text]
Frequency of metadata fields used
Source: Library Hi Tech 21 (2003) p. 164
CONTRIBUTR (20%)
COVERAGE (55%)
CREATOR (64%)
DATE (59%)
DESCRIPT (51%)
FORMAT (32%)
LANGUAGE (41%)
OTHER
PUBLISHER (70%)
RELATION (39%)
RELAPART
RELAVERS
RESOURCE
RESOURCEID (100%)
RIGHTS (63%)
SOURCE (11%)
SUBJECT (60%)
TITLE (77%)
TITLEREL
TYPE (76%)
Example of RSLIS metadata
RSLIS – Use of attribute icons 1
RSLIS – Use of attribute icons 2
A Configuration inadequacy
More deficiencies that we have discovered
- The order of the DC metadata fields is nonsensical, and cannot be changed
- A template for filling in DC metadata would be very welcome while we wait for MARC conversion and/or import functionality
- There is a need to distinguish between searchable and non-searchable metadata fields. Otherwise, it will not be possible to search for a word that is used to describe a folder: the complete contents of the folder will be returned
- When several images are connected to one metadata set, the images that pop up cover the button that is used for forward and backward viewing of the range of images
- It is not possible to display a (historical) web page consisting of a set of files (e.g. gif, css): they cannot be interpreted properly and shown the way a web browser shows the page
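The proposal to distinguish searchable from non-searchable metadata fields can be illustrated with a small sketch. It is a generic illustration and says nothing about how Hyperion builds its index. In particular, the FOLDER_DESC field, which stands for a folder description carried by each document in the folder, is a hypothetical name, as are the sample items.

```python
def build_index(items, searchable=None):
    """Build a word -> set of item ids index.

    `items` maps an item id to a dict of metadata fields. If `searchable`
    is given, only fields named in it are indexed; otherwise all are.
    """
    index = {}
    for item_id, metadata in items.items():
        for field, value in metadata.items():
            if searchable is not None and field not in searchable:
                continue
            for word in value.lower().split():
                index.setdefault(word, set()).add(item_id)
    return index


def search(index, word):
    """Return the ids of all items indexed under `word`, sorted."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical folder with two documents that carry the folder description.
ITEMS = {
    "folder-1": {"TITLE": "Library buildings", "DESC": "photographs"},
    "doc-1": {"TITLE": "Reading room 1956", "FOLDER_DESC": "photographs"},
    "doc-2": {"TITLE": "Main entrance", "FOLDER_DESC": "photographs"},
}

if __name__ == "__main__":
    # Every field indexed: the folder word returns the whole folder contents.
    print(search(build_index(ITEMS), "photographs"))
    # Only selected fields indexed: the folder itself is returned.
    print(search(build_index(ITEMS, {"TITLE", "DESC"}), "photographs"))
```

With a per-field flag of this kind, the choice of what is searchable becomes a configuration decision for the library rather than a fixed property of the system.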