
Digital Libraries: From Theory to Applications in Education and Business

ICADL 2000, Seoul, Korea, December 7, 2000

Edward A. Fox, [email protected], http://fox.cs.vt.edu

CS, DLRL, Internet TIC, Virginia Tech, Blacksburg, VA, USA

Outline

• Introduction (5S)
• Education (CSTC, NDLTD)
• OAI
• MARIAN
• Conclusions

Acknowledgements (Selected)
• Conference Organizers and Sponsors
• Mentors: JCR Licklider, Michael Kessler, Gerard Salton
• Sponsors: Advance Auto Parts, CNI, DLF, IBM, NLM, NSF, OCLC, UNESCO, US Dept. of Ed. (FIPSE), …
• VT Faculty/Staff: Tony Atkins, Debra Dudley, John Eaton, Jim Hicks, Lance Matheson, Gail McMillan, James Powell, …
• VT Students: Fernando Das Neves, Robert France, Marcos Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Ohm Sornil, Hussein Suleman, Omar Vasnaik, Marc Vass, …
• Visitors: Mann-Ho Lee (Korea), Byongsun Kim (Korea), Shalini Urs (India), Akira Maeda (Japan)

Internet Technology Innovation Center

Supported by Virginia’s Center for Innovative Technology

Statewide University Partners (Governing Board):
• Christopher Newport University: William Winter, William Muir; Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet)
• George Mason University: Scott Martin, Internet Multimedia Center (ICM); Steven Ruth, International Center for Applied Studies in IT (ICASIT)
• University of Virginia: Alf Weaver, Internet Commerce Group (InterCom); Jim French, Internet Digital Library
• Virginia Tech: Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS; Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE

JCDL 2001: First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)

http://www.jcdl.org
June 24-28, 2001 in Roanoke, VA
Conference Committee:
• General Chair: Edward A. Fox, Virginia Tech
• Program Chair: Christine Borgman, UCLA
• Treasurer: Neil Rowe, Naval Postgraduate School
• Posters Chair: Craig Nevill-Manning, Rutgers U.

URLs

• http://fox.cs.vt.edu
• http://www.dlib.vt.edu (DLRL)
• http://ei.cs.vt.edu/~dlib (Courseware)
• www.ndltd.org & www.theses.org
• www.cstc.org (CSTC and JERIC)
• www.openarchives.org (OAI)
• www.jcdl.org (JCDL 2001, June 24-28)

Collaboration!

U.S. – Korea Joint Workshop on Digital Libraries

San Diego Supercomputer Center, August 10 & 11, 2000

Sponsored by:
• National Science Foundation, USA
• Ministry of Information & Communication, Korea
• Institute of Information Tech. Assessment, Korea

San Diego Supercomputer Center
University of Maryland
Virginia Tech

Workshop Participants (1 of 3) 

Robert Allen University of Maryland [email protected]

Dookwon Baik Korea University [email protected]

Ching-Chih Chen Simmons College, Boston [email protected]

Su-Shing Chen University of Missouri - Columbia [email protected]

Jonghoon Chun Myongji University [email protected]

Gregory Crane Tufts University [email protected]

Lois Delcambre Oregon Graduate Institute [email protected]

Edward Fox Virginia Tech [email protected]

Michael Gertz University of California, Davis [email protected]

Stephen Helmreich New Mexico State University [email protected]

Workshop Participants (2 of 3) 

Ulf Hermjakob USC Information Sciences Institute [email protected]

Soon Joo Hyun Information & Communications University (ICU) [email protected]

Hyeon Kim Korea Research & Development Information Center [email protected]

Sung-Hyuk Kim Sookmyung Women’s University [email protected]

Yongchae Kim Ministry of Information & Communication [email protected]

Ron Larsen University of Maryland [email protected]

Sang-goo Lee Seoul National University [email protected]

Sang Ho Lee Soongsil University [email protected]

Young-Suk Lee MIT, Lincoln Laboratory [email protected]

Karl Lo University of California, San Diego [email protected] 

Workshop Participants (3 of 3) 

Bruce Miller University of California, San Diego [email protected]

Sung Been Moon Yonsei University [email protected]

Reagan Moore San Diego Supercomputer Center [email protected]

Sung Hyon Myaeng Chungnam National University [email protected]

Gang-Tak Oh National Computerization Agency, Seoul [email protected]

Sam-Gyun Oh SungKyunKwan University [email protected]

Hae-Chang Rim Korea University [email protected]

Shalini Urs University of Mysore [email protected]

Lee Zia National Science Foundation [email protected] 

Some Observations

• So many conferences! Lots of R&D!
• Exhibits: a DL industry is emerging.
• But: we don’t cite each other’s works; nobody is asking “Why?”; we are not connecting theory and projects; nobody is talking about OAI.

So, I’ve redone my talk, since you can see:
– the paper in the proceedings
– the demo tomorrow (p. 327) and online
– the tutorial notes (in the book) and online

DL = Users Direct (Organized Artifact Mediated Communication)

[Diagram: the Digital Library at the center, linking Author, Reader, Editor, Reviewer, Teacher, Learner, Librarian, Sponsor, and Publisher]

DL = Users Direct (Organized Artifact Mediated Communication)

[Diagram, business setting: the Digital Library at the center, linking Sales Agent, Inventory, Sales Partners, Parts Supplier, Training, Home, Garages, Shopper, Store, Repair Manuals, and Staff, across B2B and B2C]

CS 6604: Digital Libraries (Fall 2000), http://scholar.lib.vt.edu/imagebase/

DL of Images of Birds for Virginia Tech Museum of Natural History

Student Team: Ameya Datey, Aniket Sule, Supriya Angle, Balaprasuna Chennupati, and the Eagle Scouts

Under the guidance of Dr. Edward Fox, Ms. Llyn Sharp (VT Museum of Natural History), and Mr. Anthony Atkins (Digital Library and Archives)

Plus, 3-D VTMNH minerals in UH3004

Libraries of the Future, JCR Licklider, 1965, MIT Press:

• Unified Theory? Not ready in 1960s
• Analog: unified field theory in physics
• “Mess” today: segmented field, specialities
  – Database <-> Knowledge <-> Content Mgmt
  – Multimedia, Hypermedia, Hypertext
  – Logic, Algebra, Artificial Intelligence, …
• Expensive, annoying for users
  – Don’t know where to look
  – Don’t know how to use services

5S Layers

Societies

Scenarios

Spaces

Structures

Streams

Definition: Digital Libraries are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)

Definition: 5S Framework
• Societies: interacting people (and computers)
• Scenarios: services, functions, operations, methods
• Spaces: domains + constraints (e.g., distance, adjacency): 2D, vector, probability
• Structures: relations, trees, nodes and arcs
• Streams: sequences of items (text, audio, video, network traffic)

(5 Element System: Fire, Wood, Earth, Metal, Water)

5S: Combinations

• Societies + Scenarios = user model
• Societies + Scenarios + Spaces = user interface
• Streams + Structures = markup
• Streams + Structures + Scenarios = object
• Structures + Scenarios = DBMS
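As a minimal sketch of “Streams + Structures = markup”, assume a stream is just a character string and a structure is a flat list of non-overlapping labeled spans. The `Span` type and `to_markup` function are hypothetical illustrations, not part of 5S or MARIAN:

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Span:
    """Hypothetical structural annotation: `label` covers stream[start:end]."""
    label: str
    start: int
    end: int


def to_markup(stream: str, structure: list[Span]) -> str:
    """Combine a stream with a flat, non-overlapping structure into XML-style markup."""
    parts = []
    pos = 0
    for span in sorted(structure, key=lambda s: s.start):
        parts.append(escape(stream[pos:span.start]))  # untagged text before the span
        inner = escape(stream[span.start:span.end])
        parts.append(f"<{span.label}>{inner}</{span.label}>")
        pos = span.end
    parts.append(escape(stream[pos:]))  # trailing untagged text
    return "".join(parts)


print(to_markup("Digital Libraries by Edward Fox",
                [Span("title", 0, 17), Span("author", 21, 31)]))
# <title>Digital Libraries</title> by <author>Edward Fox</author>
```

Nested structures (trees) would need a recursive version; the flat case is enough to show that the same stream yields different markup under different structures.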

Outline

• Introduction (5S)
• Education (CSTC, NDLTD)
• OAI
• MARIAN
• Conclusions

NSDL Spine

[Diagram: NSDL Collections (full-service collections; referenced items & collections) and NSDL Services (plus other NSDL services). Services shown: CI Services (discussion, personalization, topic-map registry, query transform); Core Collection-Usage Services (annotation); Core Collection-Building Services (protocol mediation, persistence, harvesting); and Portals & Clients.]

(Slide from Dave Fulker, Bill Arms – 11/2/2000)

ARIADNE Screens (E. Duval)

CS Teaching Center (CSTC)

Instead of building large, expensive multimedia packages that become obsolete and are difficult to reuse, concentrate on small knowledge units.

Learners benefit from having well-crafted modules that have been reviewed and tested.

Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.

ACM Education Board and SIG support; new NSF grant with UNCW, Eduprise, TCNJ, …: the iLumina Project

ACM J. of Educational Resources in Computing (JERIC)

Browsing (1)

Browsing (2)

A Digital Library Case Study

Domain: graduate education, research

Genre: ETDs = electronic theses & dissertations

Submission: http://etd.vt.edu

Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations, http://www.ndltd.org

(NDLTD; remember: ND LTD / NDL TD) (also the newer NUDL: Networked University Digital Library, with e-courseware, etc.)

ETD Initiative (and UMI)

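[Diagram: the ETD Initiative connects Students, Universities, UMI, Library Catalogs, the WWW, and NDLTD. Outcomes shown: students learn about DLs and electronic publishing; TDs become more expressive; North American (T)Ds are accessible and archived; global TDs become more accessible and archived; through ETDs, access is opened to the new research.]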

What are the long term goals?

Attract all TDs per year: 50K dissertations (US), 25K dissertations (Germany), 10K theses and dissertations (Canada), …

>200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)

Dramatic increase in knowledge sharing: literature reviews, bibliographies, …

Services providing lifelong access for students: browse, search, prior searches, citation links

Hundreds or thousands of downloads per year per work

The Networked Digital Library of Theses and Dissertations

www.NDLTD.org

Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative

• Training Authors
• Expanding Access
• Preserving Knowledge
• Improving Graduate Education
• Enhancing Scholarly Communication
• Empowering Students & Universities

Outline

• Introduction (5S)
• Education (CSTC, NDLTD)
• OAI
• MARIAN
• Conclusions

Why do we need the Open Archives Initiative?

• Current standards are too complicated
• Information wants to be free!
• We can decouple
  – running an archive (DL content collection)
  – running a service (DL system / operation)

So we can have more and better archives, that build on each other

So we can have better services, that work on multiple collections

OAI: Archives of Digital Objects

[Diagram: an archive, reached through an access protocol, holds digital objects; each object has a Handle (ID) and terms and conditions]

The Open Archives Initiative
www.openarchives.org

a technical introduction

Hussein Suleman ([email protected])

Virginia Tech DLRL

December 2000

History

Santa Fe Convention (October 1999): electronic pre-print community

San Antonio (July 2000), Lisbon (Sept. 2000): broader interest from other parties

Ithaca Meeting (September 2000): formulation of general-purpose protocol

OAI Open Meetings (January–Feb. 2001): public release of specifications

Federation vs. OAI Harvesting

• Federation: sending out queries to remote sites and combining results
• Harvesting: gathering all metadata from remote sites into a central search system
  – Lightweight protocol
  – Robust
  – Less network traffic
  – Redundant servers
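A toy comparison of the two approaches, assuming each remote site can either answer a query or hand over all of its (title-only) metadata. `RemoteSite` and the functions below are hypothetical stand-ins, not an OAI API; the `calls` counter makes the network-traffic difference visible:

```python
class RemoteSite:
    """Stand-in for a remote archive of title metadata; counts how often it is contacted."""

    def __init__(self, titles: list[str]):
        self.titles = titles
        self.calls = 0

    def search(self, query: str) -> list[str]:
        self.calls += 1
        return [t for t in self.titles if query.lower() in t.lower()]

    def list_all(self) -> list[str]:
        self.calls += 1
        return list(self.titles)


def federated_search(sites: list[RemoteSite], query: str) -> list[str]:
    """Federation: forward each query to every remote site and combine the results."""
    results: list[str] = []
    for site in sites:
        results.extend(site.search(query))
    return results


def harvest(sites: list[RemoteSite]) -> list[str]:
    """Harvesting: gather all metadata from every site into one central index, once."""
    index: list[str] = []
    for site in sites:
        index.extend(site.list_all())
    return index


def central_search(index: list[str], query: str) -> list[str]:
    """Search the central copy; no remote site is contacted."""
    return [t for t in index if query.lower() in t.lower()]
```

With N sites and Q queries, federation contacts remote sites N × Q times, while harvesting contacts each site once per harvest cycle and answers every query locally.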

Black Box OAI-ETD Perspective

[Diagram of OAI-ETD participants: ISTEC (Ibero-America), PhysDis, NSYSU (Taiwan), ADT (Australia), BN.PT (Portugal), www.theses.org, CyberTheses (Francophone), VT, Dissert.Online (Germany), MIT, OhioLINK, CBUC (Catalunya), NDC (Greece), SEALS (S. Africa), CIC, U. Bergen (Norway)]

Splitting Data & Services

• Data Provider: implements the OAI protocol on an archive to allow external access to its data
• Service Provider: uses the OAI protocol to access external archives and provides services (such as searching or linking) on their metadata

The Big Picture

[Diagram: one DL service drawing on Repository 1, Repository 2, Repository 3, and Repository 4]

Requirements for OAI Protocol

Unique identifiers (URNs) for each record

Date-stamp for each record when last modified/created/deleted

HTTP server with scripting ability
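A minimal in-memory sketch of a data provider meeting the first two requirements, assuming day-granularity dates. `Repository`, `Record`, and the sample identifiers are hypothetical; deletions are kept as dated tombstones so that a harvester asking for changes since its last visit also learns what was removed:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    identifier: str       # unique identifier (e.g., a URN) for the record
    datestamp: date       # when the record was last created, modified, or deleted
    title: str = ""
    deleted: bool = False


class Repository:
    """Toy data provider meeting the identifier and datestamp requirements."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def upsert(self, identifier: str, title: str, when: date) -> None:
        # Creating or modifying a record refreshes its datestamp.
        self._records[identifier] = Record(identifier, when, title)

    def delete(self, identifier: str, when: date) -> None:
        # Keep a dated tombstone so harvesters can learn about the deletion.
        old = self._records[identifier]
        self._records[identifier] = Record(identifier, when, old.title, deleted=True)

    def changed(self, from_: Optional[date] = None,
                until: Optional[date] = None) -> list[Record]:
        """Records whose datestamp falls in [from_, until]; either bound may be open."""
        return [r for r in self._records.values()
                if (from_ is None or r.datestamp >= from_)
                and (until is None or r.datestamp <= until)]
```

The trailing underscore in `from_` is only because `from` is a Python keyword.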

OAI Harvesting Protocol v1

• Operates over HTTP
• HTTP requests and XML responses
• HTTP error codes
• 6 service requests (verbs):
  – Identify, ListMetadataFormats, ListSets
  – ListIdentifiers, GetRecord, ListRecords
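Because a request is just an HTTP GET carrying a `verb` argument, building one takes only the standard library. The base URL and identifier below are made up, and `oai_request_url` is a hypothetical helper, not part of any OAI toolkit:

```python
from urllib.parse import urlencode

# The six verbs of the OAI harvesting protocol, as listed on the slide.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "GetRecord", "ListRecords"}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an HTTP GET request URL carrying an OAI verb and its arguments."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


print(oai_request_url("http://archive.example.org/oai", "GetRecord",
                      identifier="oai:example:1", metadataPrefix="oai_dc"))
# http://archive.example.org/oai?verb=GetRecord&identifier=oai%3Aexample%3A1&metadataPrefix=oai_dc
```

Since `from` is a Python keyword, that argument has to be passed as `**{"from": "2000-11-01"}`.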

Identify - Response

ListMetadataFormats - Response

GetRecord - Response

Verb: ListRecords

Retrieves metadata for multiple records

Parameters:
– from: start date (O)
– until: end date (O)
– set: set to harvest from (O)
– resumptionToken: flow control mechanism (X)
– metadataPrefix: metadata format (R)
(R = required, O = optional, X = exclusive)
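The (R)/(O)/(X) markings translate directly into an argument check a data provider could run before answering. This is a sketch of the rules as listed on the slide, with a hypothetical function name; the arguments are taken as a dict because `from` is a Python keyword:

```python
REQUIRED = {"metadataPrefix"}
OPTIONAL = {"from", "until", "set"}
EXCLUSIVE = "resumptionToken"


def check_list_records_args(args: dict[str, str]) -> None:
    """Raise ValueError unless `args` follow the (R)/(O)/(X) rules for ListRecords."""
    if EXCLUSIVE in args:
        # An exclusive argument must be the only argument besides the verb.
        if len(args) != 1:
            raise ValueError(f"{EXCLUSIVE} must be the only argument")
        return
    illegal = set(args) - REQUIRED - OPTIONAL
    if illegal:
        raise ValueError(f"illegal arguments: {sorted(illegal)}")
    missing = REQUIRED - set(args)
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
```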

ListRecords - Response

Feature: Different Metadata

Feature: Date Ranges

Feature: Resumption Token
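The resumption-token feature implies a simple harvesting loop: send the first request with `metadataPrefix`, then keep sending only the returned token until none comes back. In this sketch `fetch` is an assumed callable that performs one ListRecords request and returns the records plus the token (or `None`); XML parsing and HTTP are left out:

```python
from typing import Callable, Iterator, Optional

# One response page: the records it carries plus the resumptionToken, if any.
Page = tuple[list[str], Optional[str]]


def harvest_all(fetch: Callable[[dict[str, str]], Page],
                metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Issue ListRecords repeatedly, following resumption tokens until none is returned."""
    args = {"metadataPrefix": metadata_prefix}
    while True:
        records, token = fetch(args)
        yield from records
        if not token:                      # no (or empty) token: the list is complete
            break
        args = {"resumptionToken": token}  # exclusive argument: sent alone
```

Because it is a generator, a caller can start storing records while later pages are still being requested.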

Repository Explorer

ODU Search Service

What Next?

• In general:
  – Cross-archive searching
  – Cross-archive linking, de-duping, threading
  – Selective filtering
  – Open-DL in a Box?
• VT:
  – The VT Digital Library
  – NDLTD Union Catalog

[acknowledgements] Carl Lagoze

the Open Archives Initiative

Herbert Van de Sompel, Cornell University, Computer Science

DLF Fall Forum 2000, Chicago, November 18th 2000

Actions

herbert van de sompel

• establish organizational stability for the OAI:

• institutional backing from CNI & DLF

• steering committee: policy guidance

• technical committee: technical specifications

• executive group: day to day coordination

• workshops: public dissemination, feedback

• revise specifications to allow adoption beyond preprints

low-barrier interop umbrella

[Diagram: a metadata umbrella spanning OPAC, image, FTXT, A&I, and e-print collections]

low-barrier interop umbrella

[Diagram: the metadata umbrella over OPAC, image, FTXT, A&I, and e-print collections, with shared elements Author, Title, Abstract, Identifier]

OAI harvesting tools

[Diagram: a service provider (harvester) requests records from a data provider (repository); each record carries a Datestamp, an Identifier, and a Set]

revision of specifications

• publication of specifications:
  • January 2001
  • US Open Day, January 23rd, Washington DC
  • EC Open Day, February 2001, Berlin
• freeze specifications for 1 year:
  • stable for experimentation; not definitive
  • minimize risk for early adopters
  • maximize chances for future interoperability across communities

alpha test of specs (11/2000-01/2001)

• data providers:
  • arXiv -- Los Alamos
  • NACA -- NASA
  • CogPrints -- U Southampton
  • ETD -- Virginia Tech
  • Thesis & Dissertations from WorldCat -- OCLC
  • HeinOnline law journals -- Cornell U
  • TEI-lite collection -- U Tennessee
  • STM publisher metadata -- U Illinois
  • Resource Discovery Network -- UKOLN
  • Open Language Archives -- U Pennsylvania
  • Open Video Project -- U North Carolina
  • Museum info. -- CIMI
• software:
  • OAI harvesting interface to Ex Libris Aleph 500 Integrated Library System -- Ex Libris
  • OAI harvester -- Cornell U
  • OAI harvester -- Virginia Tech
  • Open-source software capable of creating a merged catalog of metadata harvested from OAI servers -- OCLC
• service providers:
  • Repository explorer -- Virginia Tech
  • MARIAN DL -- Virginia Tech
  • ARC service -- Old Dominion U

New OAI mission statement

The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.

New OAI mission statement

The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.

The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials.

[...]

Harvesting Document Metadata for Federated Search

CS6604 Fall 2000 Project

Presented By

Avnish Kumar Chhabra

Benefits of Harvesting

• Limited storage requirement
• Fast search
• Consistently ranked results
• Improved reliability
• Distributed collections are transparent to the user
• Efficient use of network resources

Design of the Solution

[Diagram components: Update Scheduling; OAI wrapper; Z39.50 wrapper; Query Generation; Digital Library collection; Parser/Updater; MARIAN Metadata Database. Labeled flows: Queries, Replies, and New Metadata (into the MARIAN Metadata Database). A boundary marks the system developed.]
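One way to get the encapsulation of protocol-specific details that this design aims for is a common wrapper interface that the Parser/Updater can call without knowing the protocol. The `Wrapper` class, `InMemoryOAIWrapper`, and `update` below are hypothetical; a Z39.50 wrapper would be another subclass implementing the same `fetch_new` method:

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical common interface hiding protocol-specific details."""

    @abstractmethod
    def fetch_new(self, since: str) -> list[dict[str, str]]:
        """Return metadata records whose date is on or after `since` (YYYY-MM-DD)."""


class InMemoryOAIWrapper(Wrapper):
    """Stand-in for an OAI wrapper; a real one would issue ListRecords with from=since."""

    def __init__(self, records: list[dict[str, str]]):
        self._records = records

    def fetch_new(self, since: str) -> list[dict[str, str]]:
        # ISO dates (YYYY-MM-DD) compare correctly as strings.
        return [r for r in self._records if r["date"] >= since]


def update(db: dict[str, dict[str, str]], wrappers: list[Wrapper], since: str) -> int:
    """Parser/Updater step: merge new metadata into the local database, keyed by id."""
    count = 0
    for wrapper in wrappers:
        for record in wrapper.fetch_new(since):
            db[record["id"]] = record  # a newer copy replaces an older one
            count += 1
    return count
```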

Implementation

• Main scheduler thread, driven by the SiteInfo schedule file (server, protocol, update frequency)
• OAI harvester class: OAIInterface, instantiated with the URL of the OAI site and the scheduling frequency
• HarvestorMonitor: monitor for arbitrating access to network resources
• OAIHandler: XML document event handler class (diagram labels: DL Collection; Auth, Sub, Abs)
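The thread-per-collection scheduler with a shared monitor can be sketched with Python's `threading` module (the original system's code is not shown here). `HarvesterMonitor` loosely mirrors the HarvestorMonitor role by capping concurrent network use with a semaphore; `CollectionHarvester` and the injected `harvest` callable are hypothetical:

```python
import threading
from typing import Callable


class HarvesterMonitor:
    """Allows at most `max_concurrent` harvesters to use the network at once."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def __enter__(self) -> "HarvesterMonitor":
        self._slots.acquire()  # block until a network slot is free
        return self

    def __exit__(self, *exc_info) -> bool:
        self._slots.release()
        return False  # do not swallow exceptions


class CollectionHarvester(threading.Thread):
    """One execution thread per collection, re-harvesting every `interval` seconds."""

    def __init__(self, url: str, interval: float, monitor: HarvesterMonitor,
                 harvest: Callable[[str], None]):
        super().__init__(daemon=True)
        self.url = url
        self.interval = interval
        self.monitor = monitor
        self.harvest = harvest
        self.stop_requested = threading.Event()

    def run(self) -> None:
        while not self.stop_requested.is_set():
            try:
                with self.monitor:
                    self.harvest(self.url)
            except OSError:
                # Unreachable server or timed-out connection (TimeoutError is a
                # subclass of OSError): skip this cycle and retry on the next one.
                pass
            self.stop_requested.wait(self.interval)  # sleep, but wake early on stop()

    def stop(self) -> None:
        self.stop_requested.set()
```

Failures are caught per cycle, so one unreachable collection does not stop its own thread or any of the others.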

Features of the system developed

• Per-collection execution thread
• Schedules updates
• Encapsulation of protocol-specific details
• Extensibility
• Control over active execution threads
• Fault tolerance:
  – Server unreachable
  – Failure / timeout of individual connections
• Time zones and date ambiguity considered
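The slide does not say how time zones were handled. One possible approach, assuming the repository's datestamps are UTC dates at day granularity, is to compute the next `from` argument from a timezone-aware timestamp of the last harvest. `next_from_date` is hypothetical:

```python
from datetime import datetime, timezone


def next_from_date(last_harvest: datetime) -> str:
    """Day-granularity `from` argument (YYYY-MM-DD, UTC) for the next incremental harvest."""
    if last_harvest.tzinfo is None:
        # A naive local time is ambiguous: we cannot tell which day it is in UTC.
        raise ValueError("last_harvest must be timezone-aware")
    utc_day = last_harvest.astimezone(timezone.utc).date()
    # Reuse the same UTC day rather than the next one: `from` is inclusive, so
    # records changed later that day are fetched again instead of being missed.
    # Duplicates can then be merged by record identifier.
    return utc_day.isoformat()
```

For example, a harvest at 08:30 on December 7, 2000 in Seoul (UTC+9) falls on December 6 in UTC, so the next request would start from 2000-12-06.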

Outline

• Introduction (5S)
• Education (CSTC, NDLTD)
• OAI
• MARIAN
• Conclusions

MARIAN Layers

Database Layer

Search Engine Layer

User Information Layer

User Interface Layer

User User User User

[Diagram: MARIAN Mediation Middleware, configured by a 5SL Source Description and a Wrapper Generator, connects NDLTD/NUDL/Digital Library users (queries + results) to heterogeneous collections: German PhysDis, VT OAI, MIT ETD, …, and Greek Hellenic Dissertations. Each collection is reached through a wrapper over one of the Harvest, Open Archives, Dienst, or Z39.50 protocols, with metadata in SOIF, Dublin Core, RFC 1807, or MARC. Middleware components: Local Data Store; Analysis, Indexing, Linking; Search Services, Recommendation Services, etc.]

Part of Hierarchy of MARIAN Classes

Digital Information Object

Structured Document

Text

English Text

Non-English

European Language Text

Korean Text

Controlled String

Person’s Name

Relevant Document Structure
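A Python sketch of how such a hierarchy could look. The slide's indentation was lost, so the parent/child links here (e.g., Korean Text under Non-English, Person's Name under Controlled String) are assumptions read from the class names, and the name-matching rule in `PersonsName` is purely illustrative:

```python
class DigitalInformationObject:
    """Root of the (assumed) hierarchy."""

    def __init__(self, value: str = ""):
        self.value = value


class StructuredDocument(DigitalInformationObject):
    pass


class RelevantDocumentStructure(DigitalInformationObject):
    pass


class Text(DigitalInformationObject):
    pass


class EnglishText(Text):
    pass


class NonEnglishText(Text):
    pass


class EuropeanLanguageText(NonEnglishText):
    pass


class KoreanText(NonEnglishText):
    pass


class ControlledString(DigitalInformationObject):
    """A value from a controlled vocabulary, compared in normalized form."""

    def normalized(self) -> str:
        return " ".join(self.value.split()).lower()

    def matches(self, other: "ControlledString") -> bool:
        return self.normalized() == other.normalized()


class PersonsName(ControlledString):
    def normalized(self) -> str:
        # Illustrative rule: treat "Last, First" and "First Last" as the same name.
        value = self.value
        if "," in value:
            last, first = value.split(",", 1)
            value = f"{first} {last}"
        return " ".join(value.split()).lower()
```

Placing language-specific text classes under a common `Text` lets code that only needs generic text handling accept any of them, while specialized subclasses can override behavior such as normalization.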

MARIAN-Phronesis Interoperability

CS6604 Fall 2000 Project

Tracy Lewis

Ryan Richardson

Kim Woods

MARIAN-Phronesis V1 Architectural Diagram

[Diagram components: Search Page; Phron Query; CGI Script; Create object instance; MARIAN; PHRONESIS; Marian Query; CGI Script; Phron Results; Display to user]

MARIAN-Phronesis Login Page

Query in Spanish

Outline

• Introduction (5S)
• Education (CSTC, NDLTD)
• OAI
• MARIAN
• Conclusions

Conclusions

• Education is an important application of DLs
• Having a framework and theory may lead to better (more effective) systems and broader applicability
  – 5S
  – MARIAN
• Interoperability is part of the DL grand challenge
  – OAI