U.S. Government Use of the OAI-PMH
Michael L. Nelson
Old Dominion University
Norfolk, Virginia, USA
[email protected] http://www.cs.odu.edu/~mln/
Indo-US Workshop on Open Digital Libraries and Interoperability
Arlington, VA - June 23-25, 2003
Acknowledgements
• ODU: K. Maly, M. Zubair, J. Bollen, X. Liu
• LANL: R. Luce, X. Liu
• NASA: G. Roncaglia, J. Rocker
• MAGiC (UK): P. Needham
Outline
• Review:
– OAI-PMH
– data provider / service provider model
• including “aggregators”
• Role of registration for repositories
• NASA projects
• OSTI demo project
• Technical Report Interchange (TRI)
– NASA, DOE, DOD
Disclaimer: Scientific and Technical Information (STI)
• This talk will cover US Government focused / sponsored STI only
• This talk will not cover American Memory
– a cultural history project from the Library of Congress (LoC)
• http://memory.loc.gov/
– the LoC played a significant role in the definition and early adoption of the OAI-PMH
Acronym Review
• NASA: CASI (Center for AeroSpace Information), http://www.sti.nasa.gov/
• Department of Energy: OSTI (Office of Scientific and Technical Information), http://www.osti.gov/
• Department of Defense: DTIC (Defense Technical Information Center), http://www.dtic.mil/
LaRC = Langley Research Center
LANL = Los Alamos National Laboratory
Sandia = Sandia National Laboratory
AFRL = Air Force Research Laboratory
The Rise and Fall of Distributed Searching
• wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice
– Davis & Lagoze, JASIS 51(3), pp. 273-80
– Powell & French, Proc 5th ACM DL, pp. 264-265
• distributed searching of N nodes still viable, but only for small values of N
• NCSTRL: N > 100; bad
• NTRS/NIX: N <= 20; ok (but could be better)
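Resource – Item – Record
[Figure: a resource is represented in the repository by an item, which holds all available metadata about David; the item is disseminated as records in individual formats: Dublin Core metadata, MARC metadata, SPECTRUM metadata]
• item = identifier
• record = identifier + metadata format + datestamp
• set-membership is item-level property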
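The item / record distinction above can be sketched as a small data model. This is only an illustrative sketch: the class names, the identifier `oai:example.org:david`, the set name, and the metadata strings are all hypothetical and not taken from any repository described in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp (plus the metadata itself)."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str


@dataclass
class Item:
    """item = identifier; set membership is attached here, not to records."""
    identifier: str
    sets: FrozenSet[str] = frozenset()
    # metadata_prefix -> (datestamp, metadata)
    _formats: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add_format(self, metadata_prefix: str, datestamp: str, metadata: str) -> None:
        self._formats[metadata_prefix] = (datestamp, metadata)

    def get_record(self, metadata_prefix: str) -> Optional[Record]:
        """Return the record for one format, or None if the item lacks that format."""
        if metadata_prefix not in self._formats:
            return None
        datestamp, metadata = self._formats[metadata_prefix]
        return Record(self.identifier, metadata_prefix, datestamp, metadata)


# Hypothetical item with two of the three formats from the figure.
item = Item("oai:example.org:david", sets=frozenset({"sculpture"}))
item.add_format("oai_dc", "2003-06-23", "<dc>...</dc>")
item.add_format("marc", "2003-06-20", "<marc>...</marc>")
```

One item can therefore yield several records that share an identifier but differ in metadata format and datestamp.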
Overview of OAI-PMH Verbs
Verb                   Function
Metadata about the repository:
Identify               description of repository
ListMetadataFormats    metadata formats supported by repository
ListSets               sets defined by repository
Harvesting verbs:
ListIdentifiers        OAI unique ids contained in repository
ListRecords            listing of N records
GetRecord              listing of a single record
Most verbs take arguments: dates, sets, ids, metadata formats, and resumption token (for flow control)
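As a rough illustration of how the verbs and their arguments combine into a request, the sketch below builds OAI-PMH request URLs. The helper name `build_request` and the `from_` spelling (needed because `from` is a Python keyword) are assumptions made for this example; the argument names themselves (`from`, `until`, `set`, `identifier`, `metadataPrefix`, `resumptionToken`) are those of OAI-PMH 2.0. The sketch does not enforce every rule of the protocol, for example that `resumptionToken` must be used on its own.

```python
from urllib.parse import urlencode

# Arguments each verb accepts in OAI-PMH 2.0.
ALLOWED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"from", "until", "metadataPrefix", "set", "resumptionToken"},
    "ListRecords": {"from", "until", "metadataPrefix", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Return the request URL for one verb, rejecting arguments the verb does not take."""
    if verb not in ALLOWED_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        name = key.rstrip("_")  # from_ -> from
        if name not in ALLOWED_ARGS[verb]:
            raise ValueError(f"{verb} does not take argument {name!r}")
        params[name] = value
    return f"{base_url}?{urlencode(params)}"


LTRS = "http://techreports.larc.nasa.gov/ltrs/oai2.0/"
identify_url = build_request(LTRS, "Identify")
list_url = build_request(LTRS, "ListRecords", metadataPrefix="oai_dc", from_="2003-01-01")
```

The first call reproduces the Identify request format used elsewhere in this talk; the second asks for Dublin Core records changed since a given date.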
Data Providers / Service Providers
[Figure: data providers (repositories) are harvested by service providers (harvesters)]
Aggregators
[Figure: an aggregator sits between data providers (repositories) and service providers (harvesters)]
aggregators allow for:
• scalability for OAI-PMH
• load balancing
• community building
• discovery
Aggregators
• Frequently interchangeable terms:
– aggregators: likely to be community / institutionally focused
– caches: store a copy, less likely to be community-oriented
– proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
• Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
• To learn more about aggregators, caches & proxies:
– http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
– http://www.cs.odu.edu/~mln/jcdl03/
Example Aggregators
• Arc - http://arc.cs.odu.edu/
– first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001
• http://www.dlib.org/dlib/april01/liu/04liu.html
• Celestial - http://celestial.eprints.org/
– among other services, it provides a history of harvests (successful vs. errors)
• http://celestial.eprints.org/cgi-bin/status
OAI-PMH 2.0 Registration
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
75 repositories registered
??? unregistered repositories
unregistered because:
• testing / development
• not for public harvesting
• public, but “low-profile”
• never got around to it…
• ???
DP:SP ~= 5:1
Registration is Nice… But Not Required
• OAI-PMH is (becoming) the “http” for digital libraries
– there is no central registry of http servers
• remember the NCSA “What’s New” page? (ca. 1994)
• There will never be “registration support” in OAI-PMH
– registries are a type of service provider, built on top of OAI-PMH
– registration will be an integral part of community building
– friends…
<friends>
• A lightweight, optional, DP-centric method to communicate the existence of “others”
http://techreports.larc.nasa.gov/ltrs/oai2.0/?verb=Identify
..
<description>
  <friends ..namespace stuff..>
    <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
    <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
    <baseURL>http://horus.riacs.edu/perl/oai/</baseURL>
    <baseURL>http://ston.jsc.nasa.gov/collections/TRS/oai/</baseURL>
  </friends>
</description>
..
NASA <friends> example
[Figure: a harvester sends Identify to one repository and learns of the others through <friends>…</friends>. Repositories shown:]
• http://techreports.larc.nasa.gov/ltrs/oai2.0/
• http://naca.larc.nasa.gov/oai2.0/
• http://ntrs.nasa.gov/oai2.0/
• http://ston.jsc.nasa.gov/collections/TRS/oai/
• http://horus.riacs.edu/perl/oai/
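A harvester that follows <friends> might work roughly as sketched below. This is an assumption-laden sketch, not the implementation of any harvester named in this talk: the function names are hypothetical, elements are matched by local name so that no particular namespace URI has to be assumed, the sample response is a simplified fragment rather than a complete Identify response, and `fetch_identify` stands in for whatever HTTP client a real harvester would use.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_friends(identify_xml: str) -> List[str]:
    """Return the baseURLs listed inside any <friends> container."""
    urls = []
    for element in ET.fromstring(identify_xml).iter():
        if local_name(element.tag) == "friends":
            for child in element:
                if local_name(child.tag) == "baseURL" and child.text:
                    urls.append(child.text.strip())
    return urls


def discover(start_url: str, fetch_identify: Callable[[str], Optional[str]]) -> List[str]:
    """Breadth-first walk over <friends> links; returns baseURLs in discovery order."""
    seen: List[str] = []
    queue = [start_url]
    while queue:
        key = queue.pop(0).rstrip("/")  # treat .../oai2.0 and .../oai2.0/ as the same
        if key in seen:
            continue
        seen.append(key)
        response = fetch_identify(key)  # None means the repository did not answer
        if response:
            queue.extend(extract_friends(response))
    return seen


# Simplified, namespace-free stand-in for an Identify response.
SAMPLE = """<Identify>
  <baseURL>http://techreports.larc.nasa.gov/ltrs/oai2.0/</baseURL>
  <description>
    <friends>
      <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
      <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
    </friends>
  </description>
</Identify>"""

fake_network = {"http://techreports.larc.nasa.gov/ltrs/oai2.0": SAMPLE}
found = discover("http://techreports.larc.nasa.gov/ltrs/oai2.0/", fake_network.get)
```

Note that the repository's own baseURL sits outside the <friends> container and is deliberately not treated as a friend.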
Use of <friends>
Slide from S. Warner, Cornell University
Langley Technical Report Server
• publicly available
– began as an anonymous ftp server in 1992; http access in 1993
– model for other technical report servers at other NASA centers
• details in NASA TM-109162
• mostly LaTeX, MS Word, other systems
– some scanned reports
http://techreports.larc.nasa.gov/ltrs/
http://techreports.larc.nasa.gov/ltrs/oai2.0/
NACA Technical Report Server
• publicly available
– began in 1996
– details in NASA TM-1999-209127
• scanned reports from 1917-1958
– NACA = predecessor to NASA
• contents mirrored with the MAGiC project
– a UK-based grey-literature preservation project
– OAI-PMH used to mirror contents
http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
NACA Report 1345
[Screenshots: the same report as seen through five services]
• its native DL: http://naca.larc.nasa.gov/
• MAGiC: http://www.magic.ac.uk/
• Scirus (Elsevier): http://www.scirus.com/
• OAIster: http://oaister.umdl.umich.edu/
• my.OAI (FS Consulting): http://www.myoai.com/
NTRS OAI Architecture
[Figure: a user sends a search for “cfd applications” to NTRS, which holds a local copy of metadata from the nodes LTRS, ATRS, GTRS, CASITRS, . . .]
• metadata harvested offline, through OAI interface
• all searching, browsing, etc. performed on the metadata at NTRS
• content (reports) remain archived at the local sites
• each node independently maintained
• individual nodes can still support direct user interaction
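The harvest-then-search arrangement can be sketched as follows. Everything here is a simplified assumption for illustration: the function names, the record identifiers and titles, and the `fetch_page` callable (standing in for a ListRecords request that may return a resumption token) are hypothetical, and the in-memory dictionary merely stands in for the real NTRS metadata store.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# fetch_page(node, token) -> (records on this page, next resumption token or None)
FetchPage = Callable[[str, Optional[str]], Tuple[List[Record], Optional[str]]]


def harvest(node: str, fetch_page: FetchPage) -> List[Record]:
    """Pull every page from one node, following resumption tokens until none is left."""
    records: List[Record] = []
    token: Optional[str] = None
    while True:
        page, token = fetch_page(node, token)
        records.extend(page)
        if not token:
            return records


def build_local_copy(nodes: List[str], fetch_page: FetchPage) -> Dict[str, Record]:
    """Offline step: copy metadata from each node into one local store."""
    store: Dict[str, Record] = {}
    for node in nodes:
        for record in harvest(node, fetch_page):
            store[record["identifier"]] = {**record, "node": node}
    return store


def search(store: Dict[str, Record], query: str) -> List[str]:
    """Interactive step: match all query terms against local titles only."""
    terms = query.lower().split()
    return sorted(
        identifier
        for identifier, record in store.items()
        if all(term in record["title"].lower() for term in terms)
    )


# Hypothetical harvest responses; LTRS needs two pages.
FAKE = {
    ("LTRS", None): ([{"identifier": "oai:ltrs:1", "title": "CFD Applications in Wing Design"}], "page2"),
    ("LTRS", "page2"): ([{"identifier": "oai:ltrs:2", "title": "Wind Tunnel Calibration"}], None),
    ("ATRS", None): ([{"identifier": "oai:atrs:1", "title": "Applications of CFD to Rotorcraft"}], None),
}
store = build_local_copy(["LTRS", "ATRS"], lambda node, token: FAKE[(node, token)])
hits = search(store, "cfd applications")
```

Because `search` touches only the local store, a slow or unavailable node affects the freshness of results but not the response to the user.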
NASA Technical Report Server
• publicly available
• replacement for the former distributed searching version of NTRS
– MySQL
– Va Tech harvester
– modified “bucket”
– details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (July 2003)
• a service provider & aggregator
– same OAI-PMH baseURL as used for interactive searching
http://ntrs.nasa.gov/
NASA Technical Report Server
• advanced, fielded search
• explicit query routing
– 12 NASA repositories
– 4 non-NASA repositories
• turned “off” by default
• > 0.5M records
NASA DLs in the Larger STI Realm
[Figure: NTRS, above its NASA nodes (LTRS, ATRS, CASITRS, …), linked to other DLs: DOE, DOD, Universities, Publishers, International, . . .]
• NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata.
• NTRS could also harvest metadata from other DLs, and provide access to non-NASA content.
• We hope to influence the direction of the science.gov effort to use OAI-PMH
• this could be a fully connected graph
OSTI Energy Citations Database
• OAI-PMH support just recently added (Feb 2003)
– not yet officially announced or registered
– 20k records, 8k full-text
• other OSTI collections planned
http://www.osti.gov/energycitations/
Technical Report Interchange
• Goal: share technical reports between 4 US government labs without creating new digital libraries for users to learn!
– NASA Langley Research Center
– Air Force Research Laboratory
– Los Alamos National Laboratory (DOE)
– Sandia National Laboratory (DOE)
• Solution: use cooperating OAI-PMH caches at each site to
– export local contents
– ingest remote contents
TRI Production System - Status
[Figure: TRI systems at LaRC, LANL, Sandia, and AFRL, plus an ODU TRI system (listener), exchanging records. Arrows show records coming in from other TRI systems and records going out to other TRI systems. Legend: Proposed / In Production]
Slide from M. Zubair, ODU
Mappings in TRI
Laboratory   Native Metadata Format   Native Source Commercial DL System   Native Destination Commercial DL System
LaRC         MARC                     BASIS+                               (TBD)
LANL         MARC + local fields      Geac ADVANCE                         Science Server
AFRL         COSATI                   Sirsi STILAS                         Sirsi STILAS
Sandia       MARC                     Horizon                              Verity
Details in Liu, et al. ECDL 2002; the above table also taken from the same paper
A Single TRI Module
[Figure: block diagram of a single TRI module. Labels:]
• Local DB
• Scheduler
• Read new data from remote DL
• Write new data published in local DL
• Input Directory
• Local DL Manager
• Remote data in DC format
• Local data in DC format
• Write remote data to local format
• Output Directory
• Read local data and convert to DC format
• Connect to remote DL by OAI protocol
• OAI Harvester Control
Legend: common modules in all three DLs; specific module for each DL
Slide from M. Zubair, ODU
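The two conversion steps in a TRI module (local format to DC on export, DC to local format on ingest) might look roughly like the sketch below. This is not the TRI code: the four-entry mapping is a deliberately tiny assumption based on common MARC field tags (245 title, 100 main author, 650 subject, 520 summary), the function names and sample record are hypothetical, and real crosswalks also have to handle subfields, indicators, and site-specific local fields.

```python
from typing import Dict, List

FieldMap = Dict[str, List[str]]

# Assumed minimal crosswalk: MARC tag -> unqualified Dublin Core element.
MARC_TO_DC = {"245": "title", "100": "creator", "650": "subject", "520": "description"}
DC_TO_MARC = {element: tag for tag, element in MARC_TO_DC.items()}


def to_dc(native: FieldMap) -> FieldMap:
    """Export step: read local data and convert to DC; unmapped fields are dropped."""
    dc: FieldMap = {}
    for tag, values in native.items():
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).extend(values)
    return dc


def from_dc(dc: FieldMap) -> FieldMap:
    """Ingest step: write remote DC data into the local format."""
    native: FieldMap = {}
    for element, values in dc.items():
        tag = DC_TO_MARC.get(element)
        if tag is not None:
            native.setdefault(tag, []).extend(values)
    return native


# Hypothetical local record; tag 999 plays the role of a site-specific local field.
local_record = {
    "245": ["A Hypothetical Technical Report"],
    "100": ["Doe, J."],
    "999": ["local shelf code"],
}
dc_record = to_dc(local_record)
round_trip = from_dc(dc_record)
```

The round trip loses the local field, which illustrates why exchanging records through DC is simple to implement but lossy.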
The Future: Community Building
• Ultimately, protocols and metadata formats are not what makes a difference
• Rather, the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)
• The best current example: The Open Language Archives Community – http://www.language-archives.org/
• OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
STI Communities
• Government produced/sponsored STI
– http://ntrs.nasa.gov/
– http://www.osti.gov/energycitations/
– http://dlib.cs.odu.edu/tri/
• Academia
– self-archiving vs. institutional archives
– http://www.soros.org/openaccess/
– http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm
• Commercial publishers
– e.g. BioMed Central
– http://www.biomedcentral.com/