
ECDL Workshop “Extending Interoperability of Digital Libraries: Building on the Open Archives Initiative”

Lisbon – September 21, 2000

Edward A. Fox, [email protected], http://fox.cs.vt.edu

CS, DLRL, Internet TIC

Virginia Tech, Blacksburg, VA, USA

Acknowledgements (Selected)

Sponsors: CNI, DLF, Dept. of Energy, DFG, NASA, NSF, …

VT Faculty/Staff: Anthony Atkins, …

VT Students: Fernando Das Neves, George Fillipini, Robert France, Marcos Goncalves, Hussein Suleman, …

Open Archives Initiative

OAI: www.openarchives.org

[email protected]

Program

9-10 Session 1 – Introduction
10:30-11 Break
11-12:30 Session 2 – Technical Details
12:30-2 Lunch
2-3:30 Session 3 – Discussion
3:30-3:50 Break
3:50-4:20 Session 4 – Presentations
4:20-5 Session 5 – Moving Forward

Program

9-10 Session 1 – Introduction
– Introductory Remarks (Fox, Lagoze) – 15 min
– Introductions from Participants (Fox, Lagoze) – 30 min
– Historical Overview (Fox) – 45 min

Introductory Remarks - Fox

– Welcome!
– Thanks to conference organizers
– Program/logistics
– Latest in a series of meetings that have shaped OAI during its first year

Introductory Remarks - Lagoze

Introductions from Participants - 1

“Straw Polls”

Training: CS / LIS / Sciences / Humanities / ?

Work Now: CS / LIS / Sciences / Humanities / ?

Location: University / Industry / Gov. / Assn. / ?

OAI Connection: Run an “archive” or DL or collection / Manage data / Develop software / Standards / ?

Introductions from Participants - 2

OAI Meeting Involvement: Santa Fe mtg / San Antonio mtg / Technical Committee / Cornell mtg / Steering Committee

OAI Trials: Opened an archive / Developed software for OAI / ?

OAI Project: Wrote proposal / Plan to write a proposal / Have internally funded project / Have externally funded project

Introductions from Participants

Short statements (20 seconds per person):
– Name (pronounced slowly, clearly)
– Country
– Affiliation (institution/organization)

Historical Overview - Fox

– Meetings
  – Santa Fe: “archives of the world unite”
– Philosophy
– Repositories / Building on Black Boxes
– Approaches to building repositories
– VT view
– Some proposals for funding
– Development efforts

Open Archives Initiative (OAI)
– xxx@LANL, high-energy physics (Ginsparg, 1991)
– CSTR + WATERS = NCSTRL (Lagoze, 1994)
– xxx + NCSTRL = CoRR collaboration (1998)
– Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe
  – led by LANL, CNI, DLF, Mellon --> OAI Santa Fe Convention (see Feb. D-Lib Magazine article)
– Follow-on mtgs: 6/3@San Antonio, 9/21@Lisbon (ECDL)
– Archives -> Open Archives
  – Support unique archive identifiers
  – Implement Open Archives metadata set (DC, using XML)
  – Implement OA harvesting protocol (derived from Dienst protocol)
  – Register the archive
– Build tools, layer other services: linking, searching, …
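The steps for turning an archive into an Open Archive (unique identifiers, Dublin Core metadata in XML, a harvesting protocol) can be illustrated from the harvester's side. This is a minimal sketch, not the Santa Fe protocol itself: it borrows the request syntax of the later OAI-PMH 2.0 (the `ListRecords` verb and `oai_dc` metadata prefix), which postdates this workshop, while the Santa Fe protocol used Dienst-derived requests. The base URL `http://archive.example.org/oai` and the sample response are hypothetical. The code builds a harvesting request and extracts identifiers and Dublin Core titles from a response, without any network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces of OAI-PMH 2.0 responses and of Dublin Core elements (assumed here).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical data provider endpoint, not a real archive.
BASE_URL = "http://archive.example.org/oai"


def build_request(verb, **params):
    """Return a harvesting request URL for one protocol verb."""
    query = {"verb": verb, **params}
    return BASE_URL + "?" + urlencode(query)


def parse_records(xml_text):
    """Extract (identifier, [titles]) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in record.iterfind(".//dc:title", NS)]
        results.append((identifier, titles))
    return results


# Hypothetical, abbreviated response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:etd-0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_request("ListRecords", metadataPrefix="oai_dc"))
    for identifier, titles in parse_records(SAMPLE_RESPONSE):
        print(identifier, titles)
```

Keeping request construction and response parsing separate mirrors the data provider / service provider split: the archive only has to answer simple requests, and everything else happens on the harvester's side.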

Open Archives (protoproto)

– arXiv & Los Alamos National Lab
– CogPrints & U. Southampton
– NACA & NASA (reports)
– NCSTRL & Cornell U.
– NDLTD & Virginia Tech
– RePEc & U. Surrey
– Total of around 200K records

Original Open Archives Members

– American Physical Society
– California Digital Library
– Caltech
– Coalition for Networked Info.
– Cornell University
– Harvard University
– Library of Congress
– Los Alamos Nat’l Lab
– Mellon Foundation
– NASA Langley Research Center
– Old Dominion University
– Stanford University
– U. of Ghent
– U. of Surrey
– U. of Southampton
– Vanderbilt University
– Virginia Tech
– Washington University

Open Archives Future – 1st View

– EconWPA (U. Washington)
– e-biomed -> PubMed Central (NIH)
– PubScience (DOE)
– Clinical Medicine Netprints (+ other HighWire Press holdings)
– University ePub (California Digital Library)
– All public e-prints (MIT)
– Scholar’s Forum (Caltech)
– Int’l: CERN, Germany, India, Mexico, …
– Goal: millions of books/articles/reports / yr

OAI Philosophy

– Self-archiving = submission mechanism
– Long-term storage system = archive
– Open interface = harvesting mechanism
– Data provider + service provider
– Start with “gray literature”
  – e-prints/pre-prints, reports, dissertations, …

Tiered Model of Interoperability

– Mediator services
– Metadata harvesting
– Document models
– Repository of digital objects
  – Repository Access Protocol
  – Digital object: handle, terms and conditions

OAI – Repository Perspective

Required: Protocol

[Figure: digital objects (DO), each with several associated metadata objects (MDO)]
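The repository perspective (digital objects carrying metadata records, reachable only through the required protocol) can be sketched as a small data model. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the error codes `idDoesNotExist` and `cannotDisseminateFormat` are borrowed from the later OAI-PMH 2.0 specification rather than from this talk. The repository disseminates only the metadata record in the requested format; the object's content stays behind the interface.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A repository item: opaque content plus metadata records keyed by format."""
    identifier: str
    content: bytes
    # Hypothetical mapping: metadata format name -> serialized metadata record.
    metadata: dict = field(default_factory=dict)


class ProtocolError(Exception):
    """Raised when a request cannot be satisfied; carries a protocol error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


class Repository:
    def __init__(self):
        self._objects = {}

    def add(self, obj):
        self._objects[obj.identifier] = obj

    def _lookup(self, identifier):
        obj = self._objects.get(identifier)
        if obj is None:
            raise ProtocolError("idDoesNotExist")
        return obj

    def list_formats(self, identifier):
        """Formats in which this object's metadata can be disseminated."""
        return sorted(self._lookup(identifier).metadata)

    def get_record(self, identifier, metadata_prefix):
        """Return only the metadata record in the requested format."""
        obj = self._lookup(identifier)
        if metadata_prefix not in obj.metadata:
            raise ProtocolError("cannotDisseminateFormat")
        return obj.metadata[metadata_prefix]


if __name__ == "__main__":
    repo = Repository()
    repo.add(DigitalObject(
        "oai:example:1",
        b"%PDF ...",  # placeholder content; never returned by get_record
        {"oai_dc": "<oai_dc:dc>...</oai_dc:dc>", "marc": "<record>...</record>"},
    ))
    print(repo.list_formats("oai:example:1"))
    try:
        repo.get_record("oai:example:1", "rfc1807")
    except ProtocolError as err:
        print("error:", err.code)
```

Keying metadata by format lets one digital object carry several metadata records (for example a Dublin Core record and a richer community format) without changing the protocol.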

OAI – Black Box Perspective

[Figure: Open Archives OA 1 through OA 7, each treated as a black box]

Black Box OAI-ETD Perspective

[Figure: ETD archives treated as black boxes: ISTEC (Ibero-America), PhysDis, NSYSU (Taiwan), ADT (Australia), BN.PT (Portugal), www.theses.org, CyberTheses (Francophone), VT, Dissert.Online (Germany), MIT, OhioLINK, CBUC (Catalunya), NDC (Greece), CIC, U. Bergen (Norway)]


Approaches to Open Archives

– Build by discipline
– Build by institution
– Author / Category / Interdisciplinary / Year / Language / Query …

Mechanisms

Sharing
– Join initiative, run software
– Make metadata and archive available

Aggregating
– By discipline
– By institution
– By genre

Automating
– Workflow
– Harvesting and providing services
– Federated searching
– Dynamic linking (e.g., with SFX)
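Aggregating is the service provider's job: records harvested from several archives are merged and regrouped by discipline, institution, or genre. A minimal sketch, assuming the harvested records have already been reduced to plain dictionaries; the function name, archive names, field names (`discipline`, `institution`, `genre`), and sample records are all hypothetical.

```python
from collections import defaultdict


def aggregate(harvests, facet):
    """Group records from several archives by the value of one facet.

    harvests: mapping of archive name -> list of record dicts.
    Records that lack the facet are grouped under "unknown".
    Each grouped entry remembers which archive it came from.
    """
    groups = defaultdict(list)
    for archive, records in harvests.items():
        for record in records:
            key = record.get(facet, "unknown")
            groups[key].append((archive, record["id"]))
    return dict(groups)


# Hypothetical harvested records from three archives.
HARVESTS = {
    "archive-a": [
        {"id": "a1", "discipline": "physics", "institution": "univ-x", "genre": "report"},
        {"id": "a2", "discipline": "computer science", "institution": "univ-x", "genre": "ETD"},
    ],
    "archive-b": [
        {"id": "b1", "discipline": "physics", "institution": "univ-y", "genre": "e-print"},
    ],
    "archive-c": [
        {"id": "c1", "institution": "univ-z", "genre": "ETD"},
    ],
}

if __name__ == "__main__":
    for facet in ("discipline", "institution", "genre"):
        print(facet, aggregate(HARVESTS, facet))
```

The same harvested pool supports every grouping, so building by discipline and building by institution need not be competing designs.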

VT View of the Open Archives Initiative (OAI)

Enable sharing of publication metadata and full-text by digital libraries

Standardize low-level mechanisms to share contents of libraries

Build higher-level user-centric and administrative services in meta-libraries

Install organizational mechanisms to support the technical processes

Virginia Tech Projects

– MARC XML-DTD
– Computer Science Teaching Center (CSTC)
– W3C Web Characterization Repository
– OAI Repository Explorer
– Networked Digital Library of Theses and Dissertations (NDLTD)
– OAI-Campus (esp. multimedia)

MARC XML-DTD

XML Transport format for US-MARC records

Standardized metadata exchange format for traditional library services joining OAI
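An XML transport for MARC records has to keep each field's tag, indicators, and subfield codes so that the record survives the round trip. The VT DTD's element names are not given here, so this minimal sketch borrows the element names and namespace of the Library of Congress MARCXML schema (`record`, `controlfield`, `datafield`, `subfield`), which was published later; the leader is omitted, and the function name and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the later LoC MARCXML schema; an assumption for this sketch.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARC elements without a prefix


def q(name):
    """Qualify an element name with the MARC namespace."""
    return f"{{{MARC_NS}}}{name}"


def marc_to_xml(control_fields, data_fields):
    """Serialize a simplified MARC record (leader omitted) to XML.

    control_fields: list of (tag, value), e.g. ("001", "12345").
    data_fields: list of (tag, ind1, ind2, [(code, value), ...]).
    """
    record = ET.Element(q("record"))
    for tag, value in control_fields:
        control = ET.SubElement(record, q("controlfield"), {"tag": tag})
        control.text = value
    for tag, ind1, ind2, subfields in data_fields:
        data = ET.SubElement(record, q("datafield"),
                             {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            sub = ET.SubElement(data, q("subfield"), {"code": code})
            sub.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record: a control number plus a title field (245 $a).
    print(marc_to_xml(
        [("001", "12345")],
        [("245", "1", "0", [("a", "A sample title")])],
    ))
```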

CS Teaching Center (CSTC)

Collection of reviewed online resources used to aid in teaching of Computer Science

Supports author submission and peer-review process for new ACM Journal of Educational Resources in Computing (JERIC)

Connected with NSDL (NSF 00-44)

http://www.cstc.org

W3C Web Characterization Repository

Online database of metadata related to publications, tools and data sets dealing with Web characterization

Project of the Web Characterization Activity working group of the World-Wide-Web Consortium (www.w3c.org/WCA)

http://purl.org/net/repository

OAI Repository Explorer

– Serves as a compliance test
– Allows browsing of open archives using only the OAI protocol
– Sends requests on behalf of the user, parses and checks responses, and displays a browsable interface
– Will detect most discrepancies in protocol

http://purl.org/net/explorer
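The Explorer's core loop (send a request, parse the response, check it against the protocol) can be approximated by a function that inspects one response. This minimal sketch assumes OAI-PMH 2.0 element names (`OAI-PMH`, `responseDate`, `request`, `error`), which postdate the Explorer version described here; the function name and its handful of checks are hypothetical and far less thorough than a real validator, which would also validate against the protocol's XML Schema.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
KNOWN_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
               "ListIdentifiers", "ListRecords", "GetRecord"}


def check_response(xml_text, expected_verb):
    """Return a list of problems found in one response (empty if none)."""
    if expected_verb not in KNOWN_VERBS:
        return [f"unknown verb {expected_verb}"]
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return [f"unexpected root element {root.tag}"]

    problems = []
    for required in ("responseDate", "request"):
        if root.find(f"{{{OAI_NS}}}{required}") is None:
            problems.append(f"missing <{required}>")

    # The <request> element should echo the verb that was actually sent.
    request = root.find(f"{{{OAI_NS}}}request")
    if request is not None and request.get("verb") not in (None, expected_verb):
        problems.append(f"<request> echoes verb {request.get('verb')}")

    # A response carries either protocol errors or the verb's payload, not both.
    errors = root.findall(f"{{{OAI_NS}}}error")
    body = root.find(f"{{{OAI_NS}}}{expected_verb}")
    if not errors and body is None:
        problems.append(f"neither <error> nor <{expected_verb}> present")
    if errors and body is not None:
        problems.append("both <error> and a verb element present")
    return problems
```

Returning a list of problems rather than stopping at the first one matches the Explorer's goal of reporting most discrepancies in a single pass.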

OAI-Campus

– Undergrad term project for Honors course on digital libraries
– Aim is to have many OAs on campus
– Emphasis will be on multimedia collections
– Survey developed for campus: http://intercom.virginia.edu/SurveySuite/Surveys/OAiVT

Funding Success

– NSF-DFG / VT-Oldenburg: OAI research for next 3 years
  – 2 countries
  – 2 domains
    – Physics
    – Electronic theses and dissertations (ETDs)
– Evolution of existing efforts to use OAI
– Refinement of services as ontologies develop

Funding Failures

– NSF ITR – Large
– US Dept. of Education (FIPSE) – 5 sites
  – Training
  – Graduate students

Figure 1. Layers Related to Open Archives Initiative

– Services: Search/Browse; Authoring; Citation Checking; Submission; Metadata Creation; Editorial: Reviewing, Certification
– Services: Copy-Edit / Add Value; Citation DB Updating; Authority Control; Preservation; Conversion; Text/MM Editing; Gazetteer; Cataloging; Collaboration; Annotation; Summarization; Citation / Linking (SFX, CiteSeer)
– Registry
  – Archives: Name, ID, Description, Terms and Conditions, …
  – Metadata Formats: Name, XML DTD, …
  – Archive Formats: Name, Standard, Preservation Process, …
– Protocols, Tools
– Repositories: Repository; NCSTRL Repository; EconWPA Repository; RePEc Repository; Repository for NDLTD
  – Open Archives Harvesting Protocol
  – Metadata Formats: OA Metadata Set, NDLTD Standard (DC-based) Set
  – Transaction Log
  – Training Resources
  – VT Partition: Record (Metadata), Record (Full Content), …
  – UVA Partition: Metadata, Content
  – Caltech Partition: Metadata, Content
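The registry layer in Figure 1 records, for each archive, its name, ID, description, and terms and conditions, along with the metadata formats (name, XML DTD) it supports. A minimal in-memory sketch of such a registry; the class and method names, the `dtd_url` and `formats` fields, and the sample entries are hypothetical. It also enforces the "unique archive identifiers" requirement by rejecting a duplicate ID.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataFormat:
    name: str
    dtd_url: str  # where the format's XML DTD can be found


@dataclass
class ArchiveEntry:
    archive_id: str  # must be unique across the registry
    name: str
    description: str
    terms_and_conditions: str
    formats: list = field(default_factory=list)  # names of supported formats


class Registry:
    """In-memory registry of archives and the metadata formats they support."""

    def __init__(self):
        self.archives = {}
        self.formats = {}

    def register_format(self, fmt):
        self.formats[fmt.name] = fmt

    def register_archive(self, entry):
        # Reject duplicate identifiers and formats the registry does not know.
        if entry.archive_id in self.archives:
            raise ValueError(f"archive id already registered: {entry.archive_id}")
        unknown = [name for name in entry.formats if name not in self.formats]
        if unknown:
            raise ValueError(f"unregistered metadata formats: {unknown}")
        self.archives[entry.archive_id] = entry

    def archives_supporting(self, format_name):
        """IDs of archives that offer metadata in the given format."""
        return sorted(a.archive_id for a in self.archives.values()
                      if format_name in a.formats)


if __name__ == "__main__":
    reg = Registry()
    reg.register_format(MetadataFormat("oai_dc", "http://example.org/oai_dc.dtd"))
    reg.register_archive(ArchiveEntry("arch-1", "Archive One", "Sample archive",
                                      "Free access", ["oai_dc"]))
    print(reg.archives_supporting("oai_dc"))
```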

Other Development Efforts

– Cornell Software
– Los Alamos Software
– Southampton Software
– ODU Software
– Other Software
– Registered Archives


Program

11-12:30 Session 2 – Technical Details
– Expanding the Scope and New Technical Agreements (Lagoze) – 60 min
– Framing the Discussion for the Afternoon (Fox) – 30 min

Expanding the Scope and New Technical Agreements - Lagoze

Framing the Discussion for the Afternoon – Fox - 1

– Divide into groups soon for lunch
– Sit and discuss in groups during lunch
– Groups report back in afternoon
  – Present comments orally
  – Lead discussion of those comments
– Groups submit report later through email

Framing the Discussion for the Afternoon – Fox - 2

Possible groups:
– Political agendas and their unfolding
  – “Gray literature”, Courseware/NSDL, …
– Guiding principles for technical agenda
  – What is an archive?
  – What is the best terminology?
– Implementation plans for OAI core

Framing the Discussion for the Afternoon – Fox - 3

Possible groups (cont’d):
– Requirements for OAI-related services / design of component-based DL
– Implementation plans for OAI-related services
– Linking OAI with other initiatives: science data, …


Program

2-3:30 Session 3 – Discussion
– Funding Agencies/Sponsors – 30 min
– General Discussion (Fox, Lagoze) – 60 min

Funding Agencies / Sponsors

General Discussion

– Reactions to OAI agreements
– Applications of OAI to communities represented by attendees


Program

3:50-4:20 Session 4 – Presentations
– Constantino Thanos (IEI-CNR, Italy)
– Robert Tansley (U. Southampton, UK)
– Eberhard Hilf (U. Oldenburg, Germany)


Program

4:20-5 Session 5 – Moving Forward (Fox, Lagoze) – 40 min
– Plans for implementation
– Future research agendas
– Community building: listservs, …

VT - 1

General-purpose tools:
– Hussein’s Perl implementation
– Marcos’ Java implementation
– OAI Repository Explorer – Version 2

http://purl.org/net/explorer

VT - 2

– NSDL / XXDL?
– Bill Graves, Collegis/Eduprise
– IMS
– UNC Wilmington
– Chemistry, CS, Math, …
– CSTC already involved in OAI

VT - 3

MARIAN:
– Evolved from CODER (~1987)
– C/C++ version: SIGIR’93
– Research and production DL system
– Harvest/Gateway: Dienst, “Harvest”, OAI, Z39.50; + OAI to Greenstone, Phronesis

www.theses.org

OAI Protocol

[Figure: digital objects (DO) with their metadata objects (MDO), organized into sets by subject and sets by origin; sources shown: MARIAN, Dienst, VTLS, Harvest, Z39.50, OAI - 1, OAI - 2, …]
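Grouping records into sets by subject and sets by origin lets a harvester request just one slice of an archive. A minimal sketch, assuming the hierarchical set convention of the later OAI-PMH 2.0 specification, in which a `setSpec` such as `subject:physics` is a colon-separated path and requesting a set also returns records in its subsets; the function names, identifiers, and set names are hypothetical.

```python
def in_set(record_sets, requested):
    """True if any of the record's setSpecs equals or lies below `requested`."""
    return any(spec == requested or spec.startswith(requested + ":")
               for spec in record_sets)


def selective_harvest(records, requested_set):
    """Return identifiers of records that belong to the requested set."""
    return [identifier for identifier, sets in records
            if in_set(sets, requested_set)]


# Hypothetical records, each placed in one subject set and one origin set.
RECORDS = [
    ("oai:example:1", ["subject:physics", "origin:univ-a"]),
    ("oai:example:2", ["subject:physics:hep", "origin:univ-b"]),
    ("oai:example:3", ["subject:cs", "origin:univ-a"]),
]

if __name__ == "__main__":
    print(selective_harvest(RECORDS, "subject:physics"))
    print(selective_harvest(RECORDS, "origin:univ-a"))
```

Because a record may belong to several sets, the subject and origin hierarchies can coexist without duplicating records; the `":"` check keeps `subject:phys` from matching `subject:physics`.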


Program

4:20-5 Session 5 – Moving Forward (Fox, Lagoze) – 40 min
– Plans for implementation
– Future research agendas
– Community building

Closing
– Thank you!
– Future meetings: JCDL’2001, ECDL’2001
– Online discussions