GBV – Adding Digital Content to Library Records: A Library Union's Use Case
eSciDoc Days 2008
Konstantin Rekk, Marc-J. Tegethoff
Head Office (VZG) of the Common Library Network (GBV) www.gbv.de
Overview
• What's GBV/VZG?
• Digital Content – a New Challenge to Libraries
• Three Steps to Mate Both Worlds at VZG
Common Library Network: GBV
Members: Federal States of
• Bremen
• Hamburg
• Mecklenburg-Western Pomerania
• Lower Saxony
• Saxony-Anhalt
• Schleswig-Holstein
• Thuringia
and
• Foundation of Prussian Cultural Heritage (SPK)
VZG, Göttingen
GBV Goals
• introduction, maintenance and support of a common, homogeneous library system infrastructure
• cataloging and service-oriented network of >800 scientific and public libraries
• Partners: the Library Network of Baden-Württemberg (BSZ), the library network of Hesse (HeBIS), the German National Library (DNB), OCLC PICA in Leiden, the Netherlands, and the Agence bibliographique de l'enseignement supérieur (ABES), France
VZG - Head Office of the GBV
• customers are libraries, not end users
• offer services to libs which they can use to serve their users
• library automation
• development of innovative library specific services for libraries
• fundamental task: shared cataloging service
VZG Services Overview
[Diagram; components and underlying systems as labeled]
• External Systems: Union Catalogs, WorldCat, ...
• Common Union Catalogue: GVK [CBS]
• GBV Search&Order: GSO [PSI]
• Digital Content [eSciDoc/Fedora] [CONTENTdm]
• Local Library Systems [LBS 3/4/Sunrise]
• new: Catalogue Enrichment, Hosting
• Connection labels: Master-Slave, Export/Import
CBS - „Central Library System“
• heart of the library network's IT infrastructure
• Union Catalogue (GVK) of bibliographic records - virtual library of and public access to combined resources of all participating libs
• Pica/Pica+ (Pica Cataloguing Rules - 512 pp. )
• further services:
• ILL - Online Interlibrary Loan
• document delivery service subito
• additional library specific services to support library business processes
DMS Goals - Catalog Enrichment and Hosting of Digital Content
• application hosting
• data backup and storage facility for projects without own infrastructure
• improve searchability and availability
• link different kinds of content
• cooperate in standardisation of interchange formats, thesauri, classification schemas
• software support and development
DMS - Some Constraints
• all business processes are built around the catalog/Pica format:
• primary source for search and retrieval
• primary storage for metadata
• primary reference for content models
• primary field of competence of VZG staff
• existing infrastructure to integrate with repository and middleware
• no publication workflow, rather cataloging workflow
Different Project Categories
• ToC for books, abstracts, ...
• digitized print publications
• integration of full text
• born-digital content:
• National Licences: eJournals, eBooks, ...
• archives
• museums
• archeological collections
• closed projects versus in process
• catalog-bound or not
Archeology - www.viamus.de
• Views .jpg
• 360° Panorama .mov
• Text spoken .mp3
• Descr. text .xml
• 3D-Scan .mts
Projects > Issues
• old projects whose funding has ended – save the data!
• local legacy databases, applications and formats, island solutions
• logical structure of objects not always visible from the MD
• needs analysis of specific domain
• heterogeneous objects
• MD from different sources and in different (non-standard) formats
Projects > Issues > Content Models
• expertise required for domain specific object management – data objects modeling
• Cataloging of DOs still fragmented by business domain, no established guidelines
Projects > Issues > Amount of Data
• Storage: 50 TB expected in 2008, about 400 TB over the next years
• 315,000 documents (ToC), 2,000,000 images
• next year >10,000,000 images
Phase One – Adding Digital Content to Library Records
• essentially adding a link to a suitable category of the record
• inject digital content into the library system
• catalogue as a means for storing and organizing structural and semantic metadata
• Paradigm: one bibliographic record (German: Aufnahme) – one digital object (DO)
• hope – will be good enough for most purposes
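The "one record – one DO" paradigm above can be sketched as follows. This is a minimal illustration, assuming a record is held as a plain mapping from category tag to values; the tag `LINK`, the function name `attach_digital_object` and the example URL are hypothetical placeholders, not actual Pica categories or VZG code.

```python
# Hypothetical category tag for the link; a real catalog would use
# whichever category its cataloguing rules designate for such links.
LINK_CATEGORY = "LINK"


def attach_digital_object(record: dict, do_url: str) -> dict:
    """Return a copy of `record` that links to exactly one digital object.

    Refuses a second link, mirroring the 'one record, one DO' paradigm.
    """
    if record.get(LINK_CATEGORY):
        raise ValueError("record already points to a digital object")
    updated = dict(record)  # leave the caller's record untouched
    updated[LINK_CATEGORY] = [do_url]
    return updated


# Usage with made-up data.
record = {"TITLE": ["Example title"]}
linked = attach_digital_object(record, "https://repository.example.org/do/1")
```

Rejecting a second link is only one possible reading of the paradigm; a production system might instead version or replace the existing link.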
Library Catalog as MD Store for DOs
• catalog-bound objects
• use existing standards as reference for an object modeling approach
• Pica metadata format from the library world: force a mapping, squeezing structural information from the non-library world into the Pica format; not 100% faithful
• concordance Pica <-> non-library formats
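A concordance of this kind can be pictured as a lookup table plus a record of whatever does not fit. The sketch below is illustrative only: the EAD-like source field names and the Pica+-style target tags are assumptions chosen for the example, not a verified mapping.

```python
from typing import Dict, Tuple

# Illustrative concordance from EAD-like field names to Pica+-style
# category tags. Both sides are assumptions made for this sketch.
CONCORDANCE: Dict[str, str] = {
    "unittitle": "021A",    # assumed: title
    "origination": "028A",  # assumed: creator
    "unitdate": "011@",     # assumed: year
}


def to_pica(source: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map known fields; return (mapped, leftover).

    `leftover` holds everything the concordance cannot express, which is
    where the mapping stops being faithful to the source description.
    """
    mapped: Dict[str, str] = {}
    leftover: Dict[str, str] = {}
    for field, value in source.items():
        tag = CONCORDANCE.get(field)
        if tag is None:
            leftover[field] = value
        else:
            mapped[tag] = value
    return mapped, leftover


# Usage with made-up data.
mapped, leftover = to_pica(
    {"unittitle": "Letters", "unitdate": "1890", "physdesc": "3 boxes"}
)
```

Returning the unmapped fields explicitly keeps the loss visible, so it can be stored elsewhere (for example in the repository) rather than silently dropped.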
Exchange Formats as Guidelines for Creating Domain Specific Content Models
• EAD - Encoded Archival Description (EAD Working Group of the Society of American Archivists SAA)
• EAC - Encoded Archival Context (LEAF project: Linking and Exploring Authority Files)
• museumdat (Special Interest Group Documentation of the German Museums Association, DMB) - generalisation of CDWA Lite, compatible with CIDOC-CRM (ISO 21127)
• see also FRBR – Functional Requirements for Bibliographic Records
Staged Hierarchical Storage
[Diagram; labels as extracted]
• Copy 1: Fedora master instance, Sun X4600 (SUN SAM-FS), disc cache 900 GB + 900 GB, Quick-File-System
• Copy 2: virtual tape library, Coopan, DMS storage (failover + test)
• Copy 3: tape robot, LTO3, WORM, LTO1
• Further labels: active files archiving, firewall, 40 TB
Using eSciDoc in Phase One
• feeling that our needs are taken into consideration
• structured storage
• minimal metadata set and pluggable transformation
• automatic indexing
• SRU - interface
• REST and SOAP APIs for repository access ... built for SOA
• predefined straightforward one-size-fits-all container/collection model
Using eSciDoc in Phase One
• semi-structured free-form table of contents (toc) for objects (entry page)
• Shibboleth integration
• complex queries for selection and aggregation of objects:
• find all objects of type ToC that have been changed in the last 24 h (to update the CBS indexer)
• find all objects of type ToC for export to another library organization
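The first query above could be expressed as a CQL query sent over SRU. The following is a minimal sketch that only builds the request URL; the endpoint and the index names `do.type` and `do.modified` are hypothetical and would depend on how the repository's search index is configured.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical endpoint and CQL index names (assumptions for this sketch).
SRU_BASE = "https://repository.example.org/sru"
TYPE_INDEX = "do.type"
MODIFIED_INDEX = "do.modified"


def changed_toc_query(now: datetime, hours: int = 24) -> str:
    """Build an SRU searchRetrieve URL for ToC objects changed recently."""
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    cql = f'{TYPE_INDEX}="ToC" and {MODIFIED_INDEX}>="{since}"'
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.1",
        "query": cql,
        "maximumRecords": "100",
    }
    return f"{SRU_BASE}?{urlencode(params)}"


# Usage with a fixed timestamp so the result is reproducible.
url = changed_toc_query(datetime(2008, 6, 10, 12, 0, tzinfo=timezone.utc))
```

A nightly job could fetch this URL and pass the returned identifiers on to the catalog indexer; paging through more than `maximumRecords` results is left out here.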
Phase Two – SOA-based System Integration
• integrate catalogue, repository and Web 2.0 functions
• provide stable and consistent interfaces to cataloging clients
• ensure reliable processing and linking of data
• extract common functions from scripts into web services (format transformations, validation, ...)
• for specific purposes use scripts for flexible piping of webservices
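The idea of piping small services together from a script can be sketched with plain function composition. In this illustration the two steps are local stand-ins for remote web services; their names and behaviour are assumptions, not existing VZG services.

```python
from functools import reduce
from typing import Callable, Dict

Record = Dict[str, str]
Step = Callable[[Record], Record]


def normalize(record: Record) -> Record:
    """Stand-in for a format transformation service: trim whitespace."""
    return {key: value.strip() for key, value in record.items()}


def validate(record: Record) -> Record:
    """Stand-in for a validation service: require a non-empty title."""
    if not record.get("title"):
        raise ValueError("missing title")
    return record


def pipe(*steps: Step) -> Step:
    """Compose steps left to right, like piping one service into the next."""
    return lambda record: reduce(lambda acc, step: step(acc), steps, record)


# Usage: the script only decides the order of the steps.
process = pipe(normalize, validate)
result = process({"title": "  Table of contents  "})
```

Keeping each step behind the same record-in, record-out signature is what makes the order easy to change per project.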
System Integration Plan
[Diagram; components as labeled]
• Broker, Apache Proxy
• DO Cataloging
• DO Ingest
• eSciDoc, Fedora
• CBS
• Handle System
• GSO (Search) (Retrieve)
• Common Lib Services
• WinIBW
Issue: Transaction Safety, Data Integrity
Phase Three
• „... long, long ahead in a library far, far ...“ - a full-fledged repository, once the library and digital repository worlds have become one: adding library records to digital objects?
• extended modeling and usage of object relations, connections
• extended semantic indexing of heterogeneous MD formats, ontologies and other Semantic Web technologies
• complete models for museums, archives, specific domains
Using eSciDoc in Phase Three
• hope to reuse or contribute content models according to German and international standards from the community
• cataloging workflow support might become interesting
• reusing clients and workflow components
What do we need? - Summary
• fast ingest (eSciDoc v254: ToC 51 days, VD17 121 days, Gale 4.5 years)
• clustering, replication
• „reduced batch versions“ of interfaces
• Directory of reusable DO Models, Modeling recommendations, best practices (Community)
• Metadata Mapping Recommendations (Community)
• Shibboleth (already there)
• in our case: more put than get
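To put the ingest figures in perspective, a rough back-of-the-envelope calculation follows. It assumes that the 51 days quoted for ToC refer to the 315,000 ToC documents mentioned earlier; the slides do not state this pairing explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the slides; pairing them is an assumption.
toc_documents = 315_000
toc_ingest_days = 51

seconds_per_document = toc_ingest_days * SECONDS_PER_DAY / toc_documents
documents_per_day = toc_documents / toc_ingest_days

print(f"{seconds_per_document:.1f} s per document")
print(f"{documents_per_day:.0f} documents per day")
```

Under that assumption a single document takes roughly 14 seconds to ingest, which illustrates why faster ingest and batch-oriented interfaces are on the list.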
Credits and Contact
created by Konstantin Rekk