GBV – Adding Digital Content to Library Records: A Library Union's Use Case

eSciDoc Days 2008

Konstantin Rekk, Marc-J. Tegethoff

Head Office (VZG) of the Common Library Network (GBV), www.gbv.de



Overview

• What's GBV/VZG?

• Digital Content – a New Challenge to Libraries

• Three Steps to Mate Both Worlds at VZG

Common Library Network: GBV

Members: Federal States of

• Bremen

• Hamburg

• Mecklenburg-Western Pomerania

• Lower Saxony

• Saxony-Anhalt

• Schleswig-Holstein

• Thuringia

and

• Prussian Cultural Heritage Foundation (SPK)

VZG, Göttingen

GBV Goals

• introduction, maintenance and support of a common, homogeneous library system infrastructure

• cataloging and service-oriented network of >800 scientific and public libraries

• Partners: the Library Network of Baden-Württemberg (BSZ), the Library Network of Hesse (HeBIS), the German National Library (DNB), OCLC PICA in Leiden, Netherlands, and the Agence bibliographique de l'enseignement supérieur, France (ABES)

VZG - Head Office of the GBV

• customers are libraries, not end users

• offers services that libraries can use to serve their own users

• library automation

• development of innovative, library-specific services

• fundamental task: shared cataloging service

VZG Services Overview

• External Systems: Union Catalogs, WorldCat, ...

• Common Union Catalogue: GVK [CBS]

• GBV Search & Order: GSO [PSI]

• Digital Content [eSciDoc/Fedora] [CONTENTdm]

• Local Library Systems [LBS 3/4/Sunrise]

• Diagram labels: Master-Slave; Export/Import; new: Catalogue Enrichment, Hosting

CBS - „Central Library System“

• heart of the library network's IT infrastructure

• Union Catalogue (GVK) of bibliographic records - virtual library of and public access to the combined resources of all participating libraries

• Pica/Pica+ (Pica Cataloguing Rules - 512 pp.)

• further services:

  • ILL - Online Interlibrary Loan

  • document delivery service subito

  • additional library-specific services to support library business processes

DMS Goals - Catalog Enrichment and Hosting of Digital Content

• application hosting

• data backup and storage facility for projects without own infrastructure

• improve searchability and availability

• link different kinds of content

• cooperate in standardisation of interchange formats, thesauri, classification schemas

• software support and development

DMS - Some Constraints

• all business processes are built around the catalog/Pica format:

  • primary source for search and retrieval

  • primary storage for metadata

  • primary reference for content models

  • primary field of competence of VZG staff

• existing infrastructure to integrate with repository and middleware

• no publication workflow, rather cataloging workflow

ZVDD - Central Directory of Digitised Prints

Different Project Categories

• ToC for books, abstracts, ...

• digitized print publications

• integration of full text

• born-digital content:

  • National Licences: e-journals, e-books, ...

• archives

• museums

• archeological collections

• closed projects versus in process

• catalog-bound or not

Archive - Digitales Stadtarchiv Duderstadt

Archeology - www.viamus.de

• Views .jpg

• 360° Panorama .mov

• Spoken text .mp3

• Descriptive text .xml

• 3D-Scan .mts

Viamus

Museum - Digicult

• .tif

• .xml

Projects > Issues

• old projects whose funding has ended – save the data!

• local legacy databases, applications and formats, island solutions

• logical structure of objects not always visible from the metadata (MD)

• needs analysis of specific domain

• heterogeneous objects

• MD from different sources and in different (non-standard) formats

Projects > Issues > Content Models

• expertise required for domain-specific object management – data object modeling

• cataloging of digital objects (DOs) still fragmented by business domain, no established guidelines

Projects > Issues > Amount of Data

• Storage: 50 TB expected in 2008, about 400 TB over the next years

• 315,000 documents (ToC), 2,000,000 images

• next year >10,000,000 images

Solution?

Three Phases:

1. add content to records

2. get SOA-ready

3. add records to content

Phase One – Adding Digital Content to Library Records

• essentially adding a link to a suitable category of the record

• inject digital content into the library system

• catalogue as a means for storing and organizing structural and semantic metadata

• Paradigm: one bibliographic record (Aufnahme) – one digital object (DO)

• hope – will be good enough for most purposes
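
The "one record – one digital object" paradigm can be illustrated with a minimal sketch. The `Record` class, the category name `DO-LINK` and the repository URL are hypothetical stand-ins for illustration only; they are not real Pica categories or VZG interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical category tag for the link to a digital object (not a real Pica category).
LINK_CATEGORY = "DO-LINK"


@dataclass
class Record:
    """Simplified stand-in for a bibliographic record: category -> list of values."""
    record_id: str
    categories: dict[str, list[str]] = field(default_factory=dict)


def attach_digital_object(record: Record, url: str) -> Record:
    """Add the link to the record, enforcing one record - one digital object."""
    if record.categories.get(LINK_CATEGORY):
        raise ValueError(f"record {record.record_id} already has a digital object")
    record.categories[LINK_CATEGORY] = [url]
    return record


record = Record("rec-001", {"TITLE": ["Example title"]})
attach_digital_object(record, "https://repository.example.org/objects/42")
print(record.categories[LINK_CATEGORY])
```

A second call on the same record raises an error, which is one simple way to keep the one-to-one paradigm from being violated silently.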

Library Catalog as MD Store for DOs

• catalog-bound objects

• use existing standards as reference for an object modeling approach

• Pica metadata format from the library world – force a mapping, squeeze structural information from the non-library world into the Pica format – not 100% faithful

• concordance Pica <-> non-library formats

Pica Concordance Example
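
The example shown on this slide is not preserved in the transcript. As a rough illustration of what a concordance does, here is a minimal sketch, assuming a simple element-to-category lookup table. The source element names are taken from EAD; the target category names (`TITLE`, `DATE`, `CREATOR`) are hypothetical placeholders, not real Pica categories.

```python
# Hypothetical concordance: source element -> catalogue category.
# The target names are placeholders, not real Pica categories.
CONCORDANCE = {
    "unittitle": "TITLE",
    "unitdate": "DATE",
    "origination": "CREATOR",
}


def map_to_catalogue(source: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split a source record into mapped categories and elements without a mapping."""
    mapped: dict[str, str] = {}
    unmapped: dict[str, str] = {}
    for element, value in source.items():
        category = CONCORDANCE.get(element)
        if category is None:
            unmapped[element] = value  # information the catalogue format cannot hold
        else:
            mapped[category] = value
    return mapped, unmapped


mapped, unmapped = map_to_catalogue(
    {"unittitle": "Council minutes", "unitdate": "1750", "physdesc": "3 folders"}
)
print(mapped)
print(unmapped)
```

Returning the unmapped elements separately makes the "not 100% faithful" part of the mapping explicit instead of dropping it silently.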

Exchange Formats as Guidelines for Creating Domain Specific Content Models

• EAD - Encoded Archival Description (EAD Working Group of the Society of American Archivists, SAA)

• EAC - Encoded Archival Context (LEAF project: Linking and Exploring Authority Files)

• museumdat (Documentation Special Interest Group of the German Museums Association (DMB)) - generalisation of CDWA Lite, compatible with CIDOC-CRM (ISO 21127)

• see also FRBR – Functional Requirements for Bibliographic Records

Staged Hierarchical Storage

[Diagram labels: Copy 3: Tape Robot; LTO3; WORM; LTO1; Active; Files Archiving; Firewall; 40 TB; Copy 1; Fedora master instance; Sun X4600 (SUN SAM-FS); Disc-Cache, 900 GB, 900 GB; Quick-File-System; Copy 2; Virtual Tapelibrary; Coopan; DMS-Storage (Failover + Test)]

Using eSciDoc in Phase One

• feeling that our needs are taken into consideration

• structured storage

• minimal metadata set and pluggable transformation

• automatic indexing

• SRU interface

• REST and SOAP APIs for repository access ... built for SOA

• predefined straightforward one-size-fits-all container/collection model

Using eSciDoc in Phase One

• semi-structured free-form table of contents (toc) for objects (entry page)

• Shibboleth integration

• complex queries for selection and aggregation of objects:

  • find all objects of type ToC that have been changed in the last 24 h (to update the CBS indexer)

  • find all objects of type ToC for export to another library organization
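
A query such as "all ToC objects changed in the last 24 h" could be sent through an SRU interface roughly as sketched below. The `operation`, `version`, `query` and `maximumRecords` parameters come from the SRU standard; the base URL and the index names `object.type` and `object.modified` are assumptions made for illustration and are not taken from eSciDoc.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

SRU_BASE = "https://repository.example.org/sru"  # hypothetical endpoint


def changed_toc_query(now: datetime, hours: int = 24) -> str:
    """Build a CQL query for ToC objects modified within the last `hours` hours."""
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Index names are hypothetical; a real repository defines its own.
    return f'object.type = "ToC" and object.modified >= "{since}"'


def sru_url(cql: str, maximum_records: int = 100) -> str:
    """Wrap a CQL query in an SRU searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": maximum_records,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


now = datetime(2008, 6, 10, 12, 0, tzinfo=timezone.utc)
cql = changed_toc_query(now)
print(cql)
print(sru_url(cql))
```

A nightly job could run such a query and pass the result list on to the catalogue indexer.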

Phase Two –

SOA-based System Integration

• integrate catalogue, repository and Web 2.0 functions

• provide stable and consistent interfaces to cataloging clients

• ensure reliable processing and linking of data

• extract common functions from scripts into web services (format transformations, validation, ...)

• for specific purposes use scripts for flexible piping of webservices
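
The idea of a script that pipes a record through several services can be sketched as follows. This is only an illustration: `transform_format` and `validate` are hypothetical local functions standing in for remote web service calls.

```python
from functools import reduce
from typing import Callable

Step = Callable[[dict], dict]


def pipe(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    return lambda record: reduce(lambda acc, step: step(acc), steps, record)


def transform_format(record: dict) -> dict:
    # Stand-in for a format transformation service: normalise field names.
    return {key.lower(): value for key, value in record.items()}


def validate(record: dict) -> dict:
    # Stand-in for a validation service: reject records without a title.
    if "title" not in record:
        raise ValueError("record has no title")
    return record


ingest_toc = pipe(transform_format, validate)
print(ingest_toc({"TITLE": "Example"}))
```

Keeping each service behind the same `dict -> dict` shape lets a purpose-specific script reorder or swap steps without touching the services themselves.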

System Integration Plan

[Diagram components: Broker; Apache Proxy; DO Cataloging; eSciDoc; Fedora; CBS; Handle System; GSO (Search) (Retrieve); Common Lib Services; DO Ingest; WinIBW]

Issue: Transaction Safety, Data Integrity

Phase Three

• „... long, long ahead in a library far, far ...“ - a full-fledged repository, once the library and digital repository worlds have become one: adding library records to digital objects?

• extended modeling and usage of object relations, connections

• extended semantic indexing of heterogeneous MD formats, ontologies and other Semantic Web technologies

• complete models for museums, archives, specific domains

Using eSciDoc in Phase Three

• hope to reuse or contribute content models according to German and international standards from the community

• cataloging workflow support might become interesting

• reusing clients and workflow components

What do we need? - Summary

• fast ingest (eSciDoc v254: ToC 51 days, VD17 121 days, Gale 4.5 years)

• clustering, replication

• „reduced batch versions“ of interfaces

• Directory of reusable DO Models, Modeling recommendations, best practices (Community)

• Metadata Mapping Recommendations (Community)

• Shibboleth (already there)

• in our case: more put than get
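
To put the ingest durations in the summary into perspective, a duration can be converted into a per-object time. The sketch below assumes that the 51-day ToC figure refers to the 315,000 ToC documents mentioned earlier; the slides do not state this explicitly, so the result is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_object(objects: int, days: float) -> float:
    """Average ingest time per object for a batch that takes `days` days."""
    return days * SECONDS_PER_DAY / objects


def objects_per_day(objects: int, days: float) -> float:
    """Average ingest throughput in objects per day."""
    return objects / days


# Assumption: the 51-day figure covers the 315,000 ToC documents.
toc_seconds = seconds_per_object(315_000, 51)
toc_rate = objects_per_day(315_000, 51)
print(f"{toc_seconds:.1f} s per object, {toc_rate:.0f} objects per day")
```

Under that assumption the ingest takes roughly 14 seconds per ToC document.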

Questions?