Kurt Maly
Department of Computer Science
Old Dominion University
Norfolk, Virginia 23529, USA
Digital Libraries, OAI and Free Software for Education and Science
5th National Conference
Computer Application Federation of China Instrument & Control Society
Yinchuan, Ningxia Province, PRC
September 22-24, 2003
Sept 24, 2003 5th National CACIS Conference
Outline
• Digital Libraries
• The Open Archives Initiative
• Free Software Systems
  • Arc
  • DP9
  • Kepler
  • RVOT
• Conclusions
• Important URLs
Digital Libraries
• A DL is a library whose content is stored digitally and can be accessed over the Internet
• The key difference between DLs and the general Web is that DL content is structured and has metadata associated with it, allowing for more precise results to queries
Digital Libraries
• Development of software to support DLs has proceeded along proprietary software lines
• It is extremely difficult for the average user to find information that is spread across different DLs
• Need for interoperability between DLs
Digital Libraries
• DL interoperability can be achieved at three levels
  • technical: protocols, formats, etc. should be consistent so that messages can be exchanged
  • content: agreements cover the data and metadata, and the interpretation of messages
  • organizational: includes rules for access, for changing collections and services, payment, and authentication
• Need to federate, filter and provide value-added services on remote content
Open Archives Initiative
• Addresses technical interoperability among distributed archives
• Facilitates the discovery of content in distributed archives
• The OAI framework defines two functional roles: data providers (archives) and service providers
Open Archives Initiative
• Data providers: expose the metadata of their objects for harvesting
• Service providers: extract metadata from data providers via the OAI metadata harvesting protocol
• Service providers develop value-added services based on the metadata collected from data providers, such as cross-archive search engines, linking systems, and peer-review systems
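The exchange between a service provider and a data provider can be sketched roughly as follows. This is a minimal illustration, assuming OAI-PMH 2.0 request parameters (`verb`, `metadataPrefix`, `from`) and Dublin Core (`oai_dc`) records. The base URL, the function names, and the sample response are made up for the example, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the URL a service provider would fetch to harvest records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvest: only newer records
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records

# Hypothetical response from a data provider, trimmed to the essentials.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and handle large result sets and error replies, which this sketch leaves out.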
OAI origin
"The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance." Paul Ginsparg, Rick Luce & Herbert Van de Sompel
Herbert Van de Sompel
Core concepts of Santa Fe convention
• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• shared metadata format and parallel, community-specific metadata formats
• acceptable use
Slide callouts: Dienst subset; OAMS; XML reply; HTTP based; Gentlemen's agreement
Herbert Van de Sompel
Core concepts in OAI 1.0
• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• shared metadata format and parallel, community-specific metadata formats
• acceptable use
• flexibility
Slide callouts: OAI 1.0 protocol; Dublin Core; HTTP based; community specific; reply (XML Schema, self contained)
Herbert Van de Sompel
New OAI mission statement
The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.
Herbert Van de Sompel
New OAI mission statement
The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.
Herbert Van de Sompel
New OAI mission statement
The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials.
[...]
Herbert Van de Sompel
Free Software - Arc
• Arc currently harvests metadata from about 150 OAI-compliant archives, normalizes it, and stores it in a search service based on a relational database (MySQL or Oracle)
• Over 6 million metadata records from various subject domains
• Arc also provides an OAI layer, thus making hierarchical harvesting possible
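The harvest, normalize, and store steps can be sketched roughly as follows. This is only an illustration and not Arc's actual code: SQLite stands in for MySQL or Oracle, and the table layout, the `normalize` rules, and the sample records are all assumptions made for the example.

```python
import sqlite3

def normalize(record):
    """Bring a harvested record into one consistent shape (illustrative rules)."""
    return {
        "identifier": record["identifier"].strip(),
        "archive": record["archive"].strip().lower(),
        "title": " ".join(record.get("title", "").split()),  # collapse whitespace
        "date": record.get("date", "")[:10],  # keep only the YYYY-MM-DD part
    }

def store(conn, records):
    """Insert normalized records; re-harvested identifiers overwrite old rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "identifier TEXT PRIMARY KEY, archive TEXT, title TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO metadata "
        "VALUES (:identifier, :archive, :title, :date)",
        [normalize(r) for r in records],
    )
    conn.commit()

def search(conn, keyword):
    """Cross-archive keyword search over titles."""
    rows = conn.execute(
        "SELECT identifier, archive, title FROM metadata "
        "WHERE title LIKE ? ORDER BY identifier",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()

# Hypothetical records as they might arrive from two different archives.
HARVESTED = [
    {"identifier": "oai:a.example:1", "archive": "Archive-A ",
     "title": "Digital   Libraries\nand OAI", "date": "2003-09-24T10:00:00Z"},
    {"identifier": "oai:b.example:7", "archive": "archive-b",
     "title": "Quantum optics", "date": "2002-01-15"},
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, HARVESTED)
    print(search(conn, "libraries"))
```

Keying the table on the OAI identifier means a repeated harvest updates existing rows instead of duplicating them.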
Free Software - DP9
• The "deep web" or "invisible web" is a vast repository of content, such as documents in online databases, that general-purpose web crawlers cannot reach
• It is estimated to be 500 times the size of the surface web
• Internet search engines cannot index OAI collections, as they are not aware of the OAI protocol
Free Software - DP9
• A Web crawler indexes a Web site by starting with a base HTML page and following the links on this page to go deeper and retrieve other pages on the Web site
• DP9 computes an HTML page and presents it to a Web crawler as the result of an OAI request; the links on that page lead to other OAI requests
Free Software - DP9
• DP9 provides an entry page; if a web crawler finds this entry page, it may follow the links on the page and send requests to DP9
• DP9 then forwards each request to the corresponding OAI data provider and processes the returned XML records
• Depending on how deep a crawler follows the links, it can index all records in an OAI data provider
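The gateway idea can be sketched roughly as follows: a request from a crawler is translated into an OAI request, and the XML reply is turned into an HTML page whose links point back to the gateway. This is a hypothetical illustration and not DP9's implementation. The URL layout (`/dp9/getrecord/`), the function names, and the sample XML are assumptions, and plain Python string handling stands in for an XSLT transformation.

```python
from html import escape
from urllib.parse import quote, urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def to_oai_request(base_url, verb, **args):
    """Translate a gateway request into the OAI request sent to the data provider."""
    return base_url + "?" + urlencode({"verb": verb, **args})

def identifiers_to_html(xml_text, gateway="/dp9/getrecord/"):
    """Render a ListIdentifiers reply as an HTML page of crawlable links."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iterfind(".//oai:header/oai:identifier", OAI_NS):
        ident = node.text.strip()
        # Each link leads the crawler to another gateway page (one per record).
        href = gateway + quote(ident, safe="")
        items.append('<li><a href="%s">%s</a></li>' % (href, escape(ident)))
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"

# Hypothetical reply from an OAI data provider.
SAMPLE_REPLY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(to_oai_request("http://example.org/oai", "ListIdentifiers",
                         metadataPrefix="oai_dc"))
    print(identifiers_to_html(SAMPLE_REPLY))
```

Because every record gets its own plain link, a crawler that knows nothing about OAI can still reach each record by following ordinary hyperlinks.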
Free Software - DP9
[Figure: DP9 architecture. Components: Web Crawler, URLWrapper, JSP/Servlet, XSLT Processor, OAIHandler, and two OAI Repositories. Arrow labels: DP9 Static URL, Translate, Call, Send OAI request/Get XML reply, Return HTML.]
Free Software - Kepler
• The objective of the Kepler framework is to satisfy the need of average researchers at an average university to publish results and disseminate them to a wide audience quickly and conveniently
• The Kepler framework is based on OAI and supports what are called "personal data providers" or "archivelets"
Free Software - Kepler
• Kepler framework: a digital library of many 'little' publishers
  • an easy-to-use archivelet that is downloadable and self-installing
  • an automated registration service to support tens of thousands of publishers
  • a simple service provider to harvest metadata from archivelets
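How the registration service and the service provider might fit together can be sketched roughly as follows. This is a hypothetical illustration and not Kepler's code: the class and function names are invented, the harvest itself is replaced by a stand-in function, and skipping unreachable archivelets is an assumption about how a harvester could cope with personal machines that are offline.

```python
from dataclasses import dataclass, field

@dataclass
class RegistrationService:
    """Keeps track of which archivelets exist and where to reach them."""
    archivelets: dict = field(default_factory=dict)  # archive id -> base URL

    def register(self, archive_id, base_url):
        self.archivelets[archive_id] = base_url

    def list_archivelets(self):
        return sorted(self.archivelets.items())

def harvest_all(registry, fetch):
    """Service provider: harvest every registered archivelet.

    `fetch` is any callable that takes a base URL and returns a list of
    metadata records. Archivelets that cannot be reached are skipped.
    """
    collected = {}
    for archive_id, base_url in registry.list_archivelets():
        try:
            collected[archive_id] = fetch(base_url)
        except ConnectionError:
            continue  # try again on the next harvesting round
    return collected

def fake_fetch(base_url):
    """Stand-in for a real OAI harvest, so the sketch runs without a network."""
    if "offline" in base_url:
        raise ConnectionError(base_url)
    return [{"title": "Paper hosted at " + base_url}]

if __name__ == "__main__":
    registry = RegistrationService()
    registry.register("alice", "http://alice.example/oai")
    registry.register("bob", "http://offline.example/oai")
    print(harvest_all(registry, fake_fetch))
```

Passing the fetch function in as a parameter keeps the registry logic testable without contacting any archivelet.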
[Figure: Kepler architecture. Three Publishing Tools, each paired with an OAI Compliant Repository; a Registration Service; three Service Providers.]
Free Software - RVOT
• Rapid Visual OAI Tool (RVOT) is a tool that can help small organizations make their collections OAI-PMH compliant
• Constructs an OAI-PMH repository from a collection of files
• Metadata translation tool
  • records in the original collection can be in any of the supported formats, including RFC1807, MARC subset, and COSATI formats
• Lightweight HTTP server including an OAI-PMH request handler
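A metadata translation step of this kind can be sketched roughly as follows, using RFC 1807 records ("FIELD:: value" lines) as input and Dublin Core elements as output. This is an illustration only and not RVOT's code: the field mapping is an assumption chosen for the example, and the sample record is made up.

```python
import re

# Assumed mapping from RFC 1807 field names to Dublin Core elements.
RFC1807_TO_DC = {
    "ID": "identifier",
    "TITLE": "title",
    "AUTHOR": "creator",
    "DATE": "date",
    "ABSTRACT": "description",
}

# A field line looks like "TITLE:: some text"; other lines continue a value.
FIELD_LINE = re.compile(r"^([A-Z][A-Z-]*)::\s*(.*)$")

def rfc1807_to_dc(text):
    """Translate one RFC 1807 record into Dublin Core.

    Returns a dict mapping each Dublin Core element to a list of values,
    because elements such as creator may repeat. Unmapped fields are dropped.
    """
    dc = {}
    current = None  # value list that continuation lines are appended to
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            name, value = match.groups()
            element = RFC1807_TO_DC.get(name)
            current = None
            if element:
                current = dc.setdefault(element, [])
                current.append(value.strip())
        elif current is not None and line.strip():
            # Continuation line: extend the most recent value.
            current[-1] = (current[-1] + " " + line.strip()).strip()
    return dc

# Hypothetical record in RFC 1807 style.
SAMPLE = """BIB-VERSION:: CS-TR-v2.1
ID:: EXAMPLE//CS-TR-03-01
ENTRY:: September 24, 2003
TITLE:: Digital Libraries and
  Free Software
AUTHOR:: Doe, Jane
AUTHOR:: Roe, Richard
DATE:: September 2003
END:: EXAMPLE//CS-TR-03-01"""

if __name__ == "__main__":
    print(rfc1807_to_dc(SAMPLE))
```

The resulting dictionary could then be serialized as an `oai_dc` record when the repository answers a harvesting request.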
Free Software - RVOT
Table 1. OAI-PMH Related Tools
Category                                     Tools
Publishing software                          DSpace, eprints.org, CDSWare, Kepler
Data provider programming framework          UIUC OAI Implementation, OCLC OAICat, VTOAI package, oaiperl
Server software integrated with harvester    Arc, Celestial
Harvester programming framework              OCLC OAIHarvester, oaiperl, my.OAI
Other tools                                  DP9, Repository Explorer
Conclusions
• OAI lets the many digital libraries available today interoperate, so that users can discover information across a wide variety of domains without having to be aware of the many different user interfaces of the individual libraries
• OAI was founded by researchers who were interested not only in free distribution of information but also in free distribution of software
Conclusions
• All the software systems described in this paper are freely available, either as open source or directly from the research groups that created them
• One caveat: free software does not necessarily mean that services run at no cost. One still has to account for the technical support and hardware needed to set up services
Important URLs
• http://dlib.cs.odu.edu - ODU digital library research group
• http://www.openarchives.org
• http://arc.cs.odu.edu
• http://sourceforge.net/projects/oaiarc/
• http://dlib.cs.odu.edu/dp9
• http://kepler.cs.odu.edu