Digital Libraries with Greenstone: an open source solution
Tod Olson - University of Chicago
Fred Miller - Illinois Wesleyan University
Curtis Kelch - Illinois Wesleyan University
Copyright Tod Olson, Fred Miller, and Curtis Kelch 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Digital Libraries with Greenstone
• Introduction
• About digital libraries
• Greenstone overview
• Examples
• Future
• Live demos
• Q & A
The World of Digital Libraries
• Access to digital collections
– Text, images, audio, video
– Searching and metadata
• Digital libraries versus repositories
– Access and preservation
• Digital Preservation Tutorial: http://www.library.cornell.edu/iris/tutorial/dpm/
Sorting Out the Ingredients
• Raw materials
• User interface
• Elements of organization
• Building the collection
Greenstone
New Zealand Digital Library Project at the University of Waikato
• with UNESCO, Human Info NGO
International, every continent
Examples:
• Academic
– Digitization projects
– Classes on digital libraries
• Non-academic
– UNESCO humanitarian documentation
Greenstone features
• Works with existing documents
– Imports several formats
• Searching: full text and metadata
– Dublin Core, custom metadata
• Browse
• Structured documents
– Indexing, access
• Extensible & customizable
• Open source software (GPL)
Greenstone Architecture
[Diagram: Receptionists connect through the Protocol to Collection Servers; each Collection Server uses DB & Indexes produced by Import from a Collection.]
Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356
Greenstone Architecture
Receptionist
• Provides user interface
• Accept user input
• Send to appropriate collection server
• Accept results
• Dynamic page generation
Collection Server
• Handle collection content
• Search and filter information
• Return results
• Multiple collections
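The division of labor between the two components can be sketched in a few lines of Python. This is only a toy, in-process model of the roles listed above: the class names, the `handle` and `search` methods, and the sample data are all hypothetical, and the substring match stands in for a real index. Greenstone's own protocol and page generation are far richer.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class CollectionServer:
    """Holds one collection's content and answers search requests."""
    name: str
    documents: dict = field(default_factory=dict)  # title -> text

    def search(self, term):
        # Naive case-insensitive substring match stands in for a real index.
        needle = term.lower()
        return sorted(title for title, text in self.documents.items()
                      if needle in text.lower())


class Receptionist:
    """Accepts user input, forwards it to a collection server, renders a page."""

    def __init__(self):
        self.servers = {}

    def register(self, server):
        self.servers[server.name] = server

    def handle(self, collection, term):
        server = self.servers[collection]   # send to the appropriate server
        hits = server.search(term)          # accept results
        items = "".join(f"<li>{escape(title)}</li>" for title in hits)
        # Dynamic page generation: the HTML is assembled per request.
        return f"<h1>{escape(collection)}</h1><ul>{items}</ul>"


# Hypothetical sample collections.
chopin = CollectionServer("chopin", {"Nocturne": "score in E-flat major",
                                     "Ballade": "score in G minor"})
argus = CollectionServer("argus", {"Issue 1": "student newspaper, 1894"})

desk = Receptionist()
desk.register(chopin)
desk.register(argus)
page = desk.handle("chopin", "minor")
print(page)
```

One receptionist serving several registered servers mirrors the "multiple collections" point: the user interface stays in one place while each collection keeps its own content and search logic.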
Building Collections
[Diagram: source documents (HTML, PDF, ???) pass through Import to GSAF, then through Build to DB & Indexes.]
Building collections
• Create a collection framework
– or work with an old collection
• Select documents
• Import documents
– Converts to internal XML format (GSAF)
• Build collection
– Creates search indexes and browse listings
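As a rough illustration of the import and build steps, the Python sketch below normalizes a few source documents into common records and then derives a full-text index and a title browse list from them. It is a conceptual toy, not Greenstone code: the function names, the record layout, and the sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def import_documents(raw_docs):
    """Import step: normalize each source document into a common record."""
    archive = []
    for doc_id, (title, text) in enumerate(raw_docs, start=1):
        archive.append({"id": doc_id, "title": title.strip(), "text": text})
    return archive


def build_collection(archive):
    """Build step: create a full-text index and a title browse listing."""
    index = defaultdict(set)
    for record in archive:
        for word in re.findall(r"[a-z0-9]+", record["text"].lower()):
            index[word].add(record["id"])
    browse = sorted((record["title"], record["id"]) for record in archive)
    return dict(index), browse


def search(index, term):
    """Return the sorted ids of documents whose text contains the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical source documents as (title, text) pairs.
raw = [("Waltz in A minor", "A short waltz for piano"),
       ("Argus, 1894", "First issue of the student newspaper"),
       ("Mazurka", "A dance for piano in triple time")]

archive = import_documents(raw)
index, browse = build_collection(archive)
print(search(index, "piano"))
print([title for title, _ in browse])
```

Keeping the two steps separate means the indexes can be rebuilt with different settings without converting the source documents again.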
GSAF: internal XML format
Section:
• Description
– Metadata fields
• Content
– Text, internal markup, images
• Section
– No limit in number or depth
Hierarchical documents
Sections nest, tree structure
<Section>
<Description>
<Metadata name="Title" value="…">
<Content>
[Text, images, links, etc.]
<Section>
<Description>
<Metadata name="Title" …>
<Content>…
<Section>…
<Section>…
<Section>…
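The nesting can be made concrete with a short Python sketch that builds a small Section tree and walks it. The element and attribute names follow the schematic above rather than a verified GSAF schema, and the helper names and sample titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_section(title, text="", children=()):
    """Build a <Section> holding a Description, Content, and nested Sections."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    # Attribute form mirrors the slide's schematic, not a verified schema.
    ET.SubElement(description, "Metadata", name="Title", value=title)
    ET.SubElement(section, "Content").text = text
    section.extend(children)
    return section


def outline(section, depth=0):
    """Walk the tree and return (depth, title) pairs in document order."""
    title = section.find("Description/Metadata[@name='Title']").get("value")
    rows = [(depth, title)]
    for child in section.findall("Section"):
        rows.extend(outline(child, depth + 1))
    return rows


# Hypothetical document: two top-level parts, one with a nested subsection.
doc = make_section("Score", children=[
    make_section("Part 1", "opening text", children=[
        make_section("Part 1.1", "nested text"),
    ]),
    make_section("Part 2", "closing text"),
])

rows = outline(doc)
for depth, title in rows:
    print("  " * depth + title)
```

Because every Section carries its own Description and Content, the same recursive walk works at any depth.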
Config file: collect.cfg
Collection-specific configuration file, collect.cfg, specifies:
• File types to import
• Indexes and browse lists
– Document or section level
– Paragraph (text index only)
• Display of results and browse listings
• Document displays
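To give a feel for what such a file holds, the Python sketch below reads a small configuration text and groups its settings by directive. The sample is illustrative only: its directive names (`plugin`, `indexes`, `classify`) and values are modeled on Greenstone 2 conventions as an assumption and should be checked against the Greenstone documentation. The parser itself is hypothetical and is not how Greenstone reads the file.

```python
import shlex
from collections import defaultdict

# Illustrative sample only: directive names and values are assumptions
# modeled on Greenstone 2 conventions, not taken from a real collection.
SAMPLE_CFG = """\
# importers to run (file types to import)
plugin HTMLPlug
plugin PDFPlug
# search indexes, written as level:field
indexes document:text section:text section:Title
# browse list built from a metadata field
classify AZList -metadata Title
"""


def parse_config(text):
    """Group whitespace-separated arguments under each directive name."""
    config = defaultdict(list)
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # drops '#' comments
        if tokens:
            config[tokens[0]].append(tokens[1:])
    return dict(config)


config = parse_config(SAMPLE_CFG)
# Split each index spec into its (level, field) parts.
index_specs = [tuple(spec.split(":", 1)) for spec in config["indexes"][0]]
print(config["plugin"])
print(index_specs)
```

The `level:field` pairs correspond to the choice between document-level and section-level indexes.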
Chopin Early Editions
Over 400 early-edition Chopin scores, 1830s to 1880s
Target audience: music scholars & musicians.
On web, page-turnable JPEG images. Online in March 2003
Currently 374 scores in online collection
Usage: nearly 100 hits per day; more than 30% of use is international.
Build overview
[Diagram: catalog records, scanned images, and structural metadata (human processing) are combined into METS; XSLT converts METS to Greenstone Archive Format, which is loaded into the Greenstone Digital Library Software (XML-based automated processing).]
METS to GSAF
METS:
• dmdSec
– MODS: Title, …
• fileSec
– page1.jpg
– page2.jpg
• structMap
– div: Score
– div: Page 1
– div: Page 2
GSAF:
• Section
– Description
– Metadata: Title, …
– Content: Title, …
• Section
– Content: Page 1, page1.jpg
• Section
– Content: Page 2, page2.jpg
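The mapping can be sketched in Python, with the caveat that the project itself used XSLT and this is only an illustration of the shape of the transformation. The METS record below is a hypothetical, heavily reduced example. The output element names follow the schematic GSAF form shown earlier, not a verified schema, and `mets_to_gsaf` is an invented helper.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/",
      "mods": "http://www.loc.gov/mods/v3",
      "xlink": "http://www.w3.org/1999/xlink"}

# Hypothetical, heavily reduced METS record for one two-page score.
METS_XML = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="dmd1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods><mods:titleInfo>
      <mods:title>Example Score</mods:title>
    </mods:titleInfo></mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="f1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></mets:file>
    <mets:file ID="f2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div TYPE="score">
    <mets:div LABEL="Page 1"><mets:fptr FILEID="f1"/></mets:div>
    <mets:div LABEL="Page 2"><mets:fptr FILEID="f2"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""


def mets_to_gsaf(mets_xml):
    """Map dmdSec title, fileSec files, and structMap divs to nested Sections."""
    root = ET.fromstring(mets_xml)
    title = root.findtext(".//mods:title", namespaces=NS)
    href = "{%s}href" % NS["xlink"]
    # fileSec: file ID -> image location
    files = {f.get("ID"): f.find("mets:FLocat", NS).get(href)
             for f in root.iterfind(".//mets:file", NS)}

    top = ET.Element("Section")
    description = ET.SubElement(top, "Description")
    ET.SubElement(description, "Metadata", name="Title", value=title)
    # structMap: each page div becomes a child Section with its image.
    for div in root.iterfind("mets:structMap/mets:div/mets:div", NS):
        content = ET.SubElement(ET.SubElement(top, "Section"), "Content")
        image = files[div.find("mets:fptr", NS).get("FILEID")]
        content.text = f"{div.get('LABEL')} {image}"
    return top


gsaf = mets_to_gsaf(METS_XML)
pages = [section.findtext("Content") for section in gsaf.findall("Section")]
print(gsaf.find("Description/Metadata").get("value"))
print(pages)
```

The structMap drives the output: each page div resolves its file pointer against the fileSec and becomes one nested Section, while the descriptive metadata lands on the top-level Section.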
Greenstone benefits for Chopin
• Robust, mature system
• Recovered time in project
– Fast to bring up
– UI out of the box
– Dynamic page generation
– Incremental customization
• XML compliant
– Natural mapping from METS to GSAF
The Argus Digital Collection
• Illinois Wesleyan student newspaper
– 1894 to 2000
• Preservation and Access
• Image PDF versus full text
• Web interface for building metadata
• Customized searches
Argus Metadata Maintenance
Argus Search
Argus Issue “front door”
Ongoing work: Greenstone
• Greenstone Librarian Interface (GLI)
• Greenstone 3
Greenstone Librarian Interface (GLI)
• Collection management
– Informed by work at GS sites
– Assist collection designer
– Support all phases of collection build process
– Do not specify workflow
• Java-based GUI tool
– Formerly called the “Gatherer”
• 2 yrs in development
– Beta sites: Bangalore and elsewhere
• Training sessions
– UNESCO sessions in Asia, Africa
– JCDL 2004 tutorial
Greenstone 3
GS2: mature, 5+ yrs., wide deployment
– Constraints: support legacy systems
– Other technologies have matured: Java, XML
GS3: rewrite in Java, XML, XSLT
• Distributed architecture, SOAP
• METS as internal format
– Group assembled for Greenstone METS profile(s)
• OAI support planned
• 1 year in dev; alpha testing in lab
Links & Further Information
Greenstone: http://www.greenstone.org/
Chopin Early Editions: http://chopin.lib.uchicago.edu/
Argus Digital Collection: http://www.iwu.edu/library/services/argus1.htm
Argus Greenstone Documentation: http://www.iwu.edu/~ckelch/ArgusProjectDoc12.pdf
Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.
More about Greenstone…