Implementation of a Digital Media Archive in a University Library

Ivar A. L. Hoel

EUUG Annual Conference, Amsterdam, September 2nd, 2004

What I will talk about

Our original vision

(= who we are, and why we chose Sirsi)

The environment we are working in

(= trends in European research libraries concerning archives and repositories)

The solution we are aiming at

(= our concept of a HYBRID institutional repository)

How we are implementing Hyperion Digital Archive

(= solutions chosen, unsolved problems, proposals for improvement)

Organizational background

This is not a factory

It is the largest library school in the world

You will find it in Copenhagen, Denmark

The Royal School of Librarianship and Information Science (RSLIS)

1000+ students

70+ faculty members

5000+ seminar and course participants annually

… and a well-equipped library to serve it

The library specializes in LIS documents and resources

Why we chose SIRSI

The answer to that question tells quite a lot about what a digital archive means to us

Year 2000: We were in need of a new library system

We were sure that outsourcing – to a vendor, or preferably to another library – was the future

Therefore the choice of a good partner was more important than the choice of a library system, or so we thought

BUT we also needed a digital archive

AND we did not trust any other library to run that

SO we made a U-turn: we bought our own systems from SIRSI

The trends we see around us in Europe

Because of the soaring price of scientific information, the idea of alternative publishing methods is gaining momentum, not only in libraries but also in universities and elsewhere

E-publishing in research libraries is almost exclusively interpreted as self-publishing of universities’ research output

The Open Archives Initiative (OAI) is a central player in this context

The SPARC idea of an Institutional Repository is increasingly talked about

Two recent examples

In mid-February 2004, when an earlier version of this presentation was being prepared for the 2004 SIRSI Super Conference in St. Louis, two recent events were used as examples:

The CERN Workshop on Innovations in Scholarly Communication (OAI3), 12-14 February in Geneva (http://info.web.cern.ch/info/OAIP/)

The opening on 18 February of a global preprint and eprint service, comparable to OAIster, at the Technical Knowledge Center for Denmark at the Technical University of Denmark. Currently 26 eprint archives (service providers) and 1 million records (http://preprints.cvt.dk)


Open access as a universal principle for scholarly activities

From The Berlin Declaration on Open Access to Knowledge in Science and Humanities, 22 October 2003:

Major national and international organisations of science and culture consider their mission only half complete if the information they produce is not made freely available to society.

[Source: Theresa Velden, Heinz Nixdorf Center for Information Management in the Max Planck Society, at the OAI3 Workshop, CERN, Geneva, Switzerland, 12-14 Feb 2004]

From The Berlin Declaration

“The Internet has fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage. For the first time ever, the Internet now offers the chance to constitute a global and interactive representation of human knowledge, including cultural heritage and the guarantee of worldwide access.”

“In order to realize the vision of a global and accessible representation of knowledge, the future Web has to be sustainable, interactive, and transparent. Content and software tools must be openly accessible and compatible.”

“Our organizations are interested in the further promotion of the new open access paradigm to gain the most benefit for science and society.”

Conclusion on Open Access in Europe

The Berlin Declaration aims at acceptance of the paradigm of open access as a universal principle for scholarly activities. Governments, universities, research institutions, funding agencies, foundations, libraries, museums, archives, learned societies and professional associations are invited to join the present signatories.

Please contact:
Prof. Dr. Peter Gruss
President of the Max Planck Society
Munich, Germany
URL: www.zim.mpg.de/openaccess-berlin/
e-mail: [email protected]

Realization requires a sustainable service infrastructure, long-term commitment from the players, and far-reaching organizational and socio-economic changes (copyright, role of information professionals, business models)

Signatories of Berlin Declaration

Max Planck Society
German Research Foundation (DFG)
Fraunhofer Society
Leibniz Association
Helmholtz Association
Deutscher Wissenschaftsrat
Association of Universities and other Higher Education Institutions in Germany
Berlin-Brandenburg Academy of Sciences and Humanities
Staatliche Kunstsammlungen Dresden
Deutscher Bibliotheksverband
Deutsche Initiative für Netzwerkinformation (DINI)
Centre National de la Recherche Scientifique (CNRS)
Institut National de la Santé et de la Recherche Médicale
National Hellenic Research Foundation
Fund for Scientific Research - Flanders
Minister of Education Cultura y Deportes Gobierno de Canarias
FWF Austrian Science Fund
Norwegian Institute of Palaeography and Historical Philology
Istituto e Museo di Storia della Scienza Florence
Central European University Budapest
Academia Europaea
Open Society Institute (OSI)
Chinese Academy of Science

Institutional repositories should contain

Search system

Self-service for upload of documents

Handling of administrative workflows

Rights management

Structuring of collections

Support of diverse formats

Metadata allocation to the documents

Handling of permanent URLs

Bitmap preservation

OAI compatibility

[Compiled by a Danish colleague]
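
To make the last item, OAI compatibility, a little more concrete: an OAI-compatible repository answers harvesting requests that are plain HTTP URLs carrying a verb and its arguments. The following is a minimal sketch of how such request URLs are put together. The verbs Identify and ListRecords and the metadata prefix oai_dc come from the OAI-PMH protocol; the base URL is a made-up placeholder and not the address of any real archive.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real repository publishes its own OAI-PMH endpoint.
BASE_URL = "https://archive.example.org/oai"


def oai_request(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return BASE_URL + "?" + urlencode(query)


# Ask the repository to describe itself.
identify_url = oai_request("Identify")

# Ask for all records, expressed as unqualified Dublin Core.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

A harvester would fetch these URLs and parse the XML that comes back; that part is left out of the sketch.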

Choices made by Danish neighbour libraries

The colleague who drew up the list (previous slide) is migrating from a Danish system, and will soon be using DSpace from MIT

Other Danish research libraries will try out other possibilities, among them the DiVA system from Uppsala University Libraries, Digital Publication Unit, in Sweden

DiVA has strong support in other Swedish universities, and will probably be used by Norwegian university libraries as well

So where does this leave us as Hyperion users?

Are we left alone, with a system that is out of touch with what is happening?

Or can Hyperion cope with the challenges?

Let us have a look at the meaning of a digital archive in a modern library

When we started thinking about a digital archive, terms like OAI and Institutional Repository were unknown to us. In fact, they had not even been invented yet.

We had quite different uses for a digital archive in mind.

Example: What do we do with this report in a modern, hybrid library?

An important 107-page report, commissioned by a Swedish organization, written and published in Denmark in PDF format on a webserver.

Earlier it would have been printed, mimeographed or xeroxed in a number of copies.

Today it will be on a webserver for some time.

What do we do with the pdf-publication?

Print it out and send it to the bookbinder, then treat it as a printed book? (Weight: 1 lb 3 oz. + binding)

Catalogue it with a link in the 856-field to the appropriate URL?

Put it in the digital archive?

What do we do with the pdf-publication?

Print it out and send it to the bookbinder, then treat it as a printed book? Yes, that is what we did pre-1995.

Catalogue it with a link in the 856-field to the appropriate URL? Yes, that is what we did pre-2003.

Put it in the digital archive? Yes, that is what we do now (+ possibly the URL link as well)
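
As an illustration of the second option, here is a small sketch of what a link in the 856 field can look like. It follows MARC 21 conventions as an assumption (first indicator 4 for HTTP access, second indicator 0 for the resource itself, subfield $u for the URI, subfield $z for a public note); danMARC2 practice may differ in detail. The URL is a placeholder, not the address of the report.

```python
def marc_856(url, note=None):
    """Return a display string for an 856 field pointing at an HTTP resource.

    Indicators "40": first indicator 4 = HTTP access, second indicator 0 =
    the link is to the resource itself. Subfield $u holds the URI and
    subfield $z an optional public note.
    """
    field = "856 40 $u " + url
    if note:
        field += " $z " + note
    return field


# Placeholder URL standing in for the report's address on the webserver.
print(marc_856("http://www.example.org/reports/report.pdf", note="Full text (PDF)"))
```

The weakness of this option is visible in the sketch: the record is only as good as the URL it points to.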

When e-print takes over from print

The digital archive takes over the function that the shelves have in the traditional library

There will be a mixture of documents kept on a webserver and documents kept on a digital document server. This mixture will lead to some confusion

There will be a mixture of documents originating from outside sources (cf. the pdf-example), and documents produced within the organization (e.g. the research output from a university)

What about the webservers? - Q

We have been asked: Why spend money on a digital document server when a webserver is “free”?

We have been asked: Why don’t we just rely on webservers for documents that are already out there? Do we intend to catalog the Internet?

What about the webservers? - A

The webservers simply do not live up to the requirements for a repository, even if their contents are indexed

We have seen too many examples of documents that vanish, and webservers that close down without a trace, for us to believe that we can rely on the webservers if we want to serve our users in the way we did with print documents

We catalogue carefully selected web documents in our Unicorn catalogue

We do not intend to rely on legal deposit for web documents

So, what can we do at the RSLIS library?

The library at RSLIS is not a large research library, and the research output of the Library School is small compared to that of the large universities

To them, e-publishing of the research output is a huge task by itself

In fact, research registration as such is a challenge

To us, e-publishing is only part of the task

Further, our library is already in charge of the university’s research registration, which gives us a great advantage

We are aiming at a HYBRID DIGITAL ARCHIVE

Sticking to our original vision, we want to establish both an Institutional Repository (for research publication) and a Digital Media Archive (for documents acquired from outside sources) at the same time

We want to use Hyperion for both

Can Hyperion, which was sold as a Digital Media Archive, live up to that?

RSLIS – Main areas for digital archive collecting

Research from RSLIS

PhD theses

Students’ theses

LIS documents

Danish library history

RSLIS internal documents

= Institutional Repository

= Media Archive

RSLIS – Main 6-part Digital Archive hierarchy

Let us look at the 10-item list again

Institutional repositories should contain:

Search system



Rights management





Bitmap preservation

OAI compatibility

Hyperion strengths and weaknesses

All ten are supported, but some better than others

Search system



Rights management





Bitmap preservation

OAI compatibility

Practical problem areas

(Post-configuration problems; the pre-configuration problems are forgotten)

Choosing the documents

Acquiring the documents

Copyright handling

Metadata creation

Choosing the documents

A collection development plan has to be drawn up, just as for print material.

We have seven plans, where we describe in detail what we intend to do and to achieve for each of our seven main collecting areas.

What we intend to achieve will probably take years and years to be fulfilled

Acquiring the documents

Getting permission for the files and getting the files (which is not the same!)

Handling the files in the interim period

Quality control routines

Naming rules for files and file AltIDs

When to display document only, and when to open the Hyperion hierarchy

Are all text documents to be indexed without exception?

Can multi-part text documents sensibly be treated as multi-image documents?

Do we prefer some file formats to others, e.g. PDF to Word?
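
A naming rule for files is easiest to keep when it is written down as a small routine. The sketch below is purely illustrative: the rule it applies (lowercase ASCII, hyphens instead of spaces, Danish letters transliterated) is an assumption made up for the example and is not the rule set used at RSLIS.

```python
import re
import unicodedata

# Hypothetical transliterations for letters that do not decompose to ASCII.
TRANSLITERATE = {
    "\u00e6": "ae", "\u00f8": "oe", "\u00e5": "aa",   # ae-ligature, o-slash, a-ring
    "\u00c6": "Ae", "\u00d8": "Oe", "\u00c5": "Aa",
}


def normalise_filename(name):
    """Return a lowercase ASCII file name with hyphens instead of spaces."""
    stem, dot, extension = name.rpartition(".")
    if not dot:  # no extension present
        stem, extension = name, ""
    for letter, replacement in TRANSLITERATE.items():
        stem = stem.replace(letter, replacement)
    # Strip remaining accents, then collapse anything non-alphanumeric to "-".
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-").lower()
    return stem + ("." + extension.lower() if extension else "")


print(normalise_filename("Rapport om biblioteker.PDF"))
```

Whatever the actual rule is, applying it by program rather than by hand removes one source of inconsistency during the interim handling of the files.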

Copyright handling

Copyright legislation is a morass

Nevertheless, we intend to follow the rules

This is time-consuming, as we have to obtain written consent from all copyright owners

For students we have implemented a written agreement form, giving the library the right to put the thesis in the Digital Archive, and giving the student the right to withdraw the electronic document with six months’ notice. (But a paper copy will still be there)

For researchers, we will (together with the RSLIS) try to persuade them not to give the commercial publishers exclusive rights
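
The six months’ notice in the student agreement translates into a simple date calculation. The sketch below assumes that the notice period is counted in calendar months from the day the notice is given; the agreement form itself may define the period differently.

```python
import calendar
from datetime import date

NOTICE_MONTHS = 6  # notice period stated in the student agreement


def earliest_withdrawal(notice_given, months=NOTICE_MONTHS):
    """Return the date `months` calendar months after `notice_given`."""
    month_index = notice_given.month - 1 + months
    year = notice_given.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for short months, e.g. 31 August -> 28 or 29 February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(notice_given.day, last_day))


print(earliest_withdrawal(date(2004, 9, 2)))  # 2005-03-02
```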

Metadata creation

Apart from copyright handling, metadata creation is the most time-consuming part of Hyperion work

We have not found any easy way out, because we (still) have so many requirements to fulfil

A main question is what to catalogue with a MARC record, when to use Dublin Core, and when to do both

Conversion from MARC to DC (or vice versa) would save much time

IF we decide to take preservation problems more seriously, other and more developed metadata sets than the DC set preconfigured in Hyperion would be a great benefit
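
The kind of MARC to DC conversion we are asking for can be pictured as a lookup table, a so-called crosswalk. The sketch below is an illustration only: the tags are MARC 21-style and the mapping is a hypothetical one chosen for the example, so a danMARC2 crosswalk would need its own table. The target names are the DC field names supplied with our Hyperion. It is not a description of any existing Hyperion or Unicorn function.

```python
# Hypothetical crosswalk: (MARC tag, subfield) -> Hyperion DC field name.
CROSSWALK = {
    ("245", "a"): "TITLE",
    ("100", "a"): "CREATOR",
    ("700", "a"): "CONTRIBUTR",
    ("260", "b"): "PUBLISHER",
    ("260", "c"): "DATE",
    ("650", "a"): "SUBJECT",
}


def marc_to_dc(marc_fields):
    """Convert a list of (tag, subfield, value) triples to a DC dictionary."""
    dc = {}
    for tag, subfield, value in marc_fields:
        target = CROSSWALK.get((tag, subfield))
        if target is not None:
            # DC elements are repeatable, so values are collected in lists.
            dc.setdefault(target, []).append(value)
    return dc


# Invented sample record, for illustration only.
record = [
    ("245", "a", "Example report title"),
    ("100", "a", "Hansen, Anna"),
    ("650", "a", "Digital archives"),
    ("650", "a", "Libraries"),
    ("300", "a", "107 pages"),  # not in the crosswalk, so it is skipped
]
print(marc_to_dc(record))
```

Going the other way (DC to MARC) is harder, because the DC elements carry less structure than the MARC fields they would have to fill.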

MARC cataloguing and/or DC metadata

Main group                                      danMARC2    Dublin Core
Research                                        X           X
PhD and students’ theses                        X           X
LIS documents                                   X           X
Danish library history, incl. photo collection  very few    X
RSLIS internal documents                                    X

Metadata fields supplied with (our) Hyperion

CONTRIBUTR

COVERAGE

CREATOR

DATE

DESCRIPT

FORMAT

LANGUAGE

OTHER

PUBLISHER

RELATION

RELAPART

RELAVERS

RESOURCE

RESOURCEID

RIGHTS

SOURCE

SUBJECT

TITLE

TITLEREL

TYPE

ATID

DESC

ID
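
Most of these field names are abbreviated forms of the Dublin Core elements. As a sketch of what that means in practice, the code below writes a record keyed by the Hyperion field names out as Dublin Core XML. The correspondence table is our reading of the abbreviations and should be taken as an assumption; the namespace is the standard one for the Dublin Core element set, and the wrapping element name is arbitrary.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed correspondence between Hyperion field names and DC elements.
FIELD_TO_ELEMENT = {
    "TITLE": "title", "CREATOR": "creator", "CONTRIBUTR": "contributor",
    "DESCRIPT": "description", "SUBJECT": "subject", "DATE": "date",
    "PUBLISHER": "publisher", "LANGUAGE": "language", "RIGHTS": "rights",
    "FORMAT": "format", "TYPE": "type", "SOURCE": "source",
    "RELATION": "relation", "COVERAGE": "coverage",
}


def to_dc_xml(record):
    """Serialise a {field name: value} record as Dublin Core XML text."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for field, value in record.items():
        element_name = FIELD_TO_ELEMENT.get(field)
        if element_name is None:
            continue  # fields without an assumed DC counterpart are skipped
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, element_name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample record, for illustration only.
sample = {"TITLE": "Example thesis", "LANGUAGE": "da", "OTHER": "local note"}
print(to_dc_xml(sample))
```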

Metadata fields used by RSLIS (red)

[Slide repeats the field list above, with the fields used by RSLIS marked in red; the colour marking is not preserved here.]

Frequency of metadata fields used

Source: Library Hi Tech 21 (2003) p. 164

CONTRIBUTR (20%)

COVERAGE (55%)

CREATOR (64%)

DATE (59%)

DESCRIPT (51%)

FORMAT (32%)

LANGUAGE (41%)

OTHER

PUBLISHER (70%)

RELATION (39%)

RELAPART

RELAVERS

RESOURCE

RESOURCEID (100%)

RIGHTS (63%)

SOURCE (11%)

SUBJECT (60%)

TITLE (77%)

TITLEREL

TYPE (76%)
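
The figures are easier to read when sorted. The short sketch below takes the percentages exactly as listed above (fields without a figure are left out) and prints those used at or above a chosen threshold; the threshold of 60% is an arbitrary choice for the example.

```python
# Percentages as listed above; fields without a figure are omitted.
USAGE_PERCENT = {
    "CONTRIBUTR": 20, "COVERAGE": 55, "CREATOR": 64, "DATE": 59,
    "DESCRIPT": 51, "FORMAT": 32, "LANGUAGE": 41, "PUBLISHER": 70,
    "RELATION": 39, "RESOURCEID": 100, "RIGHTS": 63, "SOURCE": 11,
    "SUBJECT": 60, "TITLE": 77, "TYPE": 76,
}


def fields_used_at_least(threshold):
    """Return (field, percent) pairs at or above threshold, highest first."""
    selected = [(name, pct) for name, pct in USAGE_PERCENT.items() if pct >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


for name, pct in fields_used_at_least(60):
    print(f"{name:<11}{pct:>4}%")
```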

Example of RSLIS metadata

RSLIS – Use of attribute icons 1

RSLIS – Use of attribute icons 2

A Configuration inadequacy

More deficiencies that we have discovered

- The order of the DC metadata fields is nonsensical, and cannot be changed

- A template for filling in DC metadata would be very welcome while we wait for MARC conversion and/or import functionality

- There is a need to distinguish between searchable and non-searchable metadata fields. Otherwise, it will not be possible to search for a word that is used to describe a folder: the complete contents of the folder will be returned

- When several images are connected to one metadata set, the images popping up cover the button that is to be used for forward and backward viewing of the range of images

- It is not possible to display a (historical) web page consisting of a set of files (e.g. gif, css): they cannot be interpreted properly and shown the way a web browser shows the page
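
The distinction between searchable and non-searchable metadata fields can be sketched as a flag per field. The code below is a toy model made up for this purpose: the flags, the item structure and the sample data are all hypothetical, and it does not describe how Hyperion indexes or searches.

```python
# Hypothetical per-field flag: which metadata fields take part in searching.
SEARCHABLE = {"TITLE": True, "SUBJECT": True, "DESCRIPT": False}


def search(items, word):
    """Return names of items whose searchable fields contain `word`."""
    word = word.lower()
    hits = []
    for item in items:
        for field, value in item["metadata"].items():
            if SEARCHABLE.get(field, False) and word in value.lower():
                hits.append(item["name"])
                break  # one matching field is enough for this item
    return hits


# Invented sample: one folder description and two photos.
items = [
    {"name": "folder-photos", "metadata": {"DESCRIPT": "Folder of library photos"}},
    {"name": "photo-001", "metadata": {"TITLE": "Reading room", "SUBJECT": "library interiors"}},
    {"name": "photo-002", "metadata": {"TITLE": "Bookmobile", "SUBJECT": "outreach"}},
]
print(search(items, "library"))
```

With the folder's description flagged as non-searchable, a word used only to describe the folder no longer produces a hit, while the individual documents are still found through their own fields.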
