Digital Libraries: From Theory to Applications in Education and Business
ICADL 2000 – Seoul, Korea, December 7, 2000
Edward A. Fox, http://fox.cs.vt.edu
CS, DLRL, Internet TIC, Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
Conference Organizers and Sponsors
Mentors: JCR Licklider, Michael Kessler, Gerard Salton
Sponsors: Advance Auto Parts, CNI, DLF, IBM, NLM, NSF, OCLC, UNESCO, US Dept. of Ed. (FIPSE), …
VT Faculty/Staff: Tony Atkins, Debra Dudley, John Eaton, Jim Hicks, Lance Matheson, Gail McMillan, James Powell, …
VT Students: Fernando Das Neves, Robert France, Marcos Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Ohm Sornil, Hussein Suleman, Omar Vasnaik, Marc Vass, …
Visitors: Mann-Ho Lee (Korea), Byongsun Kim (Korea), Shalini Urs (India), Akira Maeda (Japan)
Internet Technology Innovation Center
Supported by Virginia’s Center for Innovative Technology
Statewide University Partners - Governing Board:
Christopher Newport University
– William Winter, William Muir, Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet)
George Mason University
– Scott Martin, Internet Multimedia Center (ICM)
– Steven Ruth, International Center for Applied Studies in IT (ICASIT)
University of Virginia
– Alf Weaver, Internet Commerce Group (InterCom)
– Jim French, Internet Digital Library
Virginia Tech
– Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS
– Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE
JCDL 2001: First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
http://www.jcdl.org
June 24-28, 2001 in Roanoke, VA
Conference Committee:
General Chair: Edward A. Fox, Virginia Tech
Program Chair: Christine Borgman, UCLA
Treasurer: Neil Rowe, Naval Postgraduate School
Posters Chair: Craig Nevill-Manning, Rutgers U.
URLs
http://fox.cs.vt.edu
http://www.dlib.vt.edu (DLRL)
http://ei.cs.vt.edu/~dlib (Courseware)
www.ndltd.org & www.theses.org
www.cstc.org (CSTC and JERIC)
www.openarchives.org (OAI)
www.jcdl.org (JCDL’2001 – June 24-28)
Collaboration!
U.S. – Korea Joint Workshop on Digital Libraries
San Diego Supercomputer Center, August 10 & 11, 2000
Sponsored by National Science Foundation, USA
Ministry of Information & Communication, Korea
Institute of Information Tech. Assessment, Korea
San Diego Supercomputer Center
University of Maryland
Virginia Tech
Workshop Participants (1 of 3)
Robert Allen, University of Maryland
Dookwon Baik, Korea University
Ching-Chih Chen, Simmons College, Boston
Su-Shing Chen, University of Missouri - Columbia
Jonghoon Chun, Myongji University
Gregory Crane, Tufts University
Lois Delcambre, Oregon Graduate Institute
Edward Fox, Virginia Tech
Michael Gertz, University of California, Davis
Stephen Helmreich, New Mexico State University
Workshop Participants (2 of 3)
Ulf Hermjakob, USC Information Sciences Institute
Soon Joo Hyun, Information & Communications University (ICU)
Hyeon Kim, Korea Research & Development Information Center
Sung-Hyuk Kim, Sookmyung Women’s University
Yongchae Kim, Ministry of Information & Communication
Ron Larsen, University of Maryland
Sang-goo Lee, Seoul National University
Sang Ho Lee, Soongsil University
Young-Suk Lee, MIT, Lincoln Laboratory
Karl Lo, University of California, San Diego
Workshop Participants (3 of 3)
Bruce Miller, University of California, San Diego
Sung Been Moon, Yonsei University
Reagan Moore, San Diego Supercomputer Center
Sung Hyon Myaeng, Chungnam National University
Gang-Tak Oh, National Computerization Agency, Seoul
Sam-Gyun Oh, SungKyunKwan University
Hae-Chang Rim, Korea University
Shalini Urs, University of Mysore
Lee Zia, National Science Foundation
Some Observations
So many conferences! Lots of R&D!
Exhibits: a DL industry is emerging.
But:
– we don’t cite each other’s work
– nobody is asking “Why?”
– we are not connecting theory and projects
– nobody is talking about OAI
So I’ve redone my talk, since you can see:
– the paper in the proceedings
– the demo tomorrow (p. 327) and online
– the tutorial notes (in book) and online
DL = Users Direct (Organized Artifact Mediated Communication)
[Figure: roles connected through the Digital Library: Author, Reader, Editor, Reviewer, Teacher, Learner, Librarian, Sponsor, Publisher]
DL = Users Direct (Organized Artifact Mediated Communication)
[Figure: business roles connected through the Digital Library (B2B, B2C): Sales Agent, Inventory, Sales Partners, Parts Supplier, Training, Home, Garages, Shopper, Store, Repair Manuals, Staff]
CS 6604: Digital Libraries (Fall 2000) http://scholar.lib.vt.edu/imagebase/
DL of Images of Birds for Virginia Tech Museum of Natural History
Student Team: Ameya Datey, Aniket Sule, Supriya Angle, Balaprasuna Chennupati, and the Eagle Scouts
Under the guidance of Dr. Edward Fox
Ms. Llyn Sharp (VT Museum of Natural History)
Mr. Anthony Atkins (Digital Library and Archives)
Plus, 3-D VTMNH minerals in UH3004
Libraries of the Future, JCR Licklider, 1965, MIT Press:
Unified Theory?
Not ready in 1960s
Analog – unified field theory in physics
“Mess” today – segmented field, specialities
– Database <-> Knowledge <-> Content Mgmt
– Multimedia, Hypermedia, Hypertext
– Logic, Algebra, Artificial Intelligence, …
Expensive, annoying for users
– Don’t know where to look
– Don’t know how to use services
Definition: Digital Libraries are complex systems that
– help satisfy info needs of users (societies)
– provide info services (scenarios)
– organize info in usable ways (structures)
– present info in usable ways (spaces)
– communicate info with users (streams)
Definition: 5S Framework
– Societies: interacting people (and computers)
– Scenarios: services, functions, operations, methods
– Spaces: domains + constraints (e.g., distance, adjacency): 2D, vector, probability
– Structures: relations, trees, nodes and arcs
– Streams: sequences of items (text, audio, video, network traffic)
(5 Element System: Fire, Wood, Earth, Metal, Water)
5S: Combinations
– Societies + Scenarios = user model
– Societies + Scenarios + Spaces = user interface
– Streams + Structures = markup
– Streams + Structures + Scenarios = object
– Structures + Scenarios = DBMS
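To make one of these combinations concrete, the sketch below reads "Streams + Structures = markup" as a flat stream of characters plus a tree of labeled spans over it, serialized together as XML-style tags. The `Node` type and `to_markup` function are hypothetical illustrations, not constructs defined by the 5S framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A labeled span [start, end) over a stream, with nested child spans (a structure)."""
    label: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def to_markup(stream: str, node: Node) -> str:
    """Serialize a stream plus a structure over it as XML-style markup."""
    parts = [f"<{node.label}>"]
    pos = node.start
    for child in sorted(node.children, key=lambda c: c.start):
        parts.append(stream[pos:child.start])   # text before the child span
        parts.append(to_markup(stream, child))  # the child span, recursively
        pos = child.end
    parts.append(stream[pos:node.end])          # trailing text inside this span
    parts.append(f"</{node.label}>")
    return "".join(parts)


stream = "Libraries of the Future by JCR Licklider"
structure = Node("record", 0, len(stream), [
    Node("title", 0, 23),
    Node("author", 27, len(stream)),
])
print(to_markup(stream, structure))
# <record><title>Libraries of the Future</title> by <author>JCR Licklider</author></record>
```

The same stream could carry a different structure (for example, word-level spans) and yield different markup, which is the point of treating the two as separate S's.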
NSDL Spine
[Figure: NSDL collections (full-service collections; referenced items & collections) and NSDL services (CI services: discussion, personalization, topic-map registry, query transform; core collection-usage services: annotation; core collection-building services: protocol mediation, persistence, harvesting; other NSDL services), accessed through portals & clients]
(Slide from Dave Fulker, Bill Arms – 11/2/2000)
CS Teaching Center (CSTC)
Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units.
Learners benefit from having well-crafted modules that have been reviewed and tested.
Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
ACM Education Board and SIG support, new NSF grant with UNCW, Eduprise, TCNJ, … - iLumina Project
ACM J. of Educational Resources in Computing (JERIC)
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations http://www.ndltd.org
(NDLTD – remember: ND LTD / NDL TD) (also, newer NUDL: Networked University Digital Library, with e-courseware, etc.)
ETD Initiative (and UMI)
– Students learn about DL, EPub
– TDs become more expressive
– N. Amer. (T)Ds are accessible, archived
– Global TDs become more accessible, archived
[Figure actors: UMI, Universities]
What are the long term goals?
Attract all TDs/yr: 50K D-US, 25K D-Germany, 10K TD-Canada, …
>200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)
Dramatic increase in knowledge sharing: literature reviews, bibliographies, …
Services providing lifelong access for students: browse, search, prior searches, citation links
Hundreds/thousands of downloads / year / work
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Why do we need the Open Archives Initiative?
Current standards are too complicated
Information wants to be free!
We can decouple
– Running an archive (DL content collection)
– Running a service (DL system / operation)
So we can have more and better archives, that build on each other
So we can have better services, that work on multiple collections
OAI: Archives of Digital Objects
[Figure: an archive of digital objects, each with a handle (ID) and terms and conditions, reached through an access protocol]
The Open Archives Initiative, www.openarchives.org
a technical introduction
Hussein Suleman
Virginia Tech DLRL
December 2000
History
Santa Fe Convention (October 1999)
– Electronic pre-print community
San Antonio (July 2000), Lisbon (Sept. 2000)
– Broader interest from other parties
Ithaca Meeting (September 2000)
– Formulation of general-purpose protocol
OAI Open Meetings (January–February 2001)
– Public release of specifications
Federation vs. OAI Harvesting
Federation
– Sending out queries to remote sites and combining results
Harvesting
– Gathering all metadata from remote sites into a central search system
– Lightweight protocol
– Robust
– Less network traffic
– Redundant servers
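The difference can be sketched in a few lines of Python. The in-memory `REMOTE_SITES` dictionary is a hypothetical stand-in for remote archives: federation touches every site on every query, while harvesting copies the metadata once and then answers queries from the central store.

```python
from typing import Dict, List

# Hypothetical remote sites: site name -> {record id -> title metadata}.
REMOTE_SITES: Dict[str, Dict[str, str]] = {
    "siteA": {"a1": "Digital libraries and theses", "a2": "Protein folding"},
    "siteB": {"b1": "Theses on open archives"},
}


def federated_search(term: str) -> List[str]:
    """Federation: send the query to every remote site at search time and combine results."""
    hits: List[str] = []
    for records in REMOTE_SITES.values():  # stands in for one remote round trip per site
        hits += [rid for rid, title in records.items() if term in title.lower()]
    return sorted(hits)


def harvest() -> Dict[str, str]:
    """Harvesting: copy all metadata into one central store ahead of time.

    Assumes record ids are unique across sites (real systems prefix them by archive).
    """
    central: Dict[str, str] = {}
    for records in REMOTE_SITES.values():
        central.update(records)
    return central


def central_search(index: Dict[str, str], term: str) -> List[str]:
    """Search the local copy only; no remote site is contacted at query time."""
    return sorted(rid for rid, title in index.items() if term in title.lower())


index = harvest()
print(federated_search("theses"), central_search(index, "theses"))
# ['a1', 'b1'] ['a1', 'b1']
```

Both return the same hits here; the difference lies in when the network is used, which is where harvesting's lower traffic and robustness to unreachable servers come from.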
Black Box OAI-ETD Perspective
[Figure: www.theses.org at the center, linked to ETD sources: ISTEC (Ibero-America), PhysDis, NSYSU (Taiwan), ADT (Australia), BN.PT (Portugal), CyberTheses (Francophone), VT, Dissert.Online (Germany), MIT, OhioLINK, CBUC (Catalunya), NDC (Greece), SEALS (S. Africa), CIC, U. Bergen (Norway), …]
Splitting Data & Services
Data Provider
– Implements the OAI protocol on an archive to allow external access to its data
Service Provider
– Uses the OAI protocol to access external archives and provide services (such as searching or linking) on their metadata
Requirements for OAI Protocol
Unique identifiers (URNs) for each record
Date-stamp for each record when last modified/created/deleted
HTTP server with scripting ability
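The identifier and datestamp requirements are what make incremental harvesting possible: a data provider can return just the records changed in a date range, and deleted records still carry a datestamp, so harvesters learn about deletions too. A minimal sketch, assuming a hypothetical in-memory `Record` store and inclusive, day-granularity `from`/`until` bounds:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    identifier: str        # unique identifier (e.g., a URN) for the record
    datestamp: date        # when the record was last created, modified, or deleted
    deleted: bool = False  # deletions keep their datestamp so harvesters can see them


def select_records(records: List[Record], from_: Optional[date] = None,
                   until: Optional[date] = None) -> List[Record]:
    """Return records whose datestamp lies in [from_, until]; either bound may be omitted."""
    return [r for r in records
            if (from_ is None or r.datestamp >= from_)
            and (until is None or r.datestamp <= until)]


repo = [
    Record("oai:example:1", date(2000, 11, 1)),
    Record("oai:example:2", date(2000, 12, 5)),
    Record("oai:example:3", date(2000, 12, 6), deleted=True),
]
print([r.identifier for r in select_records(repo, from_=date(2000, 12, 1))])
# ['oai:example:2', 'oai:example:3']
```

The `oai:example:` identifiers are placeholders; any scheme works as long as each record's identifier is unique and stable.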
OAI Harvesting Protocol v1
Operates over HTTP
– HTTP requests and XML responses
– HTTP error codes
6 service requests (verbs):
– Identify, ListMetadataFormats, ListSets
– ListIdentifiers, GetRecord, ListRecords
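Because requests are plain HTTP, each request is the repository's base URL plus a `verb` and its arguments in the query string. The sketch below assumes a hypothetical base URL (`http://example.org/oai`) and uses `oai_dc` as the metadata prefix; it only builds URLs and does not contact a server.

```python
from urllib.parse import urlencode

# The six OAI v1 verbs listed on the slide.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "GetRecord", "ListRecords"}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an HTTP GET URL for an OAI request: the verb plus its arguments as query parameters."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    # 'from' is a Python keyword, so callers pass from_ and the trailing underscore is stripped.
    params = {"verb": verb, **{k.rstrip("_"): v for k, v in args.items()}}
    return f"{base_url}?{urlencode(params)}"


print(oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc", from_="2000-12-01"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2000-12-01
```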
Verb: ListRecords
Retrieves metadata for multiple records
Parameters ((O) optional, (R) required, (X) exclusive)
– from – start date (O)
– until – end date (O)
– set – set to harvest from (O)
– resumptionToken – flow control mechanism (X)
– metadataPrefix – metadata format (R)
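One reading of the (O)/(R)/(X) markings, sketched as a hypothetical validation step a data provider might run before answering a ListRecords request: unknown arguments are rejected, `resumptionToken` must appear alone, and otherwise `metadataPrefix` is required.

```python
from typing import Dict

ALLOWED = {"from", "until", "set", "resumptionToken", "metadataPrefix"}


def check_list_records_args(args: Dict[str, str]) -> None:
    """Raise ValueError if the ListRecords arguments break the (O)/(R)/(X) rules."""
    unknown = set(args) - ALLOWED
    if unknown:
        raise ValueError(f"illegal arguments: {sorted(unknown)}")
    if "resumptionToken" in args:
        # Exclusive (X): a follow-up request carries the token and nothing else.
        if len(args) > 1:
            raise ValueError("resumptionToken must be the only argument")
    elif "metadataPrefix" not in args:
        # Required (R) on every request that does not resume an earlier one.
        raise ValueError("metadataPrefix is required")


check_list_records_args({"metadataPrefix": "oai_dc", "from": "2000-12-01"})  # accepted
check_list_records_args({"resumptionToken": "abc"})                          # accepted
```

A data provider could map the `ValueError` onto one of the HTTP error codes that the protocol uses for bad requests.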
What Next?
In General
– Cross-archive searching
– Cross-archive linking, de-duping, threading
– Selective filtering
– Open-DL in a Box?
VT
– The VT Digital Library
– NDLTD Union Catalog
the Open Archives Initiative
Herbert Van de Sompel, Cornell University -- Computer Science
[acknowledgements] Carl Lagoze
DLF FALL FORUM 2000 – Chicago – November 18th 2000
Actions
herbert van de sompel
• establish organizational stability for the OAI:
• institutional backing from CNI & DLF
• steering committee: policy guidance
• technical committee: technical specifications
• executive group: day to day coordination
• workshops: public dissemination, feedback
• revise specifications to allow adoption beyond preprints
low-barrier interop umbrella
[Figure: a shared metadata core (Author, Title, Abstract, Identifier) spanning metadata from OPAC, image, FTXT, A&I, and e-print collections]
OAI harvesting tools
[Figure: a service provider (harvester) retrieves records from a data provider (repository); each record carries a Datestamp, an Identifier, and a Set]
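A harvester usually needs several ListRecords calls for a large repository, following the `resumptionToken` flow control until the data provider stops returning a token. The sketch below assumes a hypothetical in-memory provider that encodes an offset in the token; real tokens are opaque strings that the harvester simply echoes back.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One page of results: (record identifiers, resumptionToken or None when the list is complete).
Page = Tuple[List[str], Optional[str]]


def harvest_all(list_records: Callable[[Dict[str, str]], Page],
                metadata_prefix: str = "oai_dc") -> List[str]:
    """Service-provider side: issue ListRecords, then follow resumptionTokens until none is returned."""
    records, token = list_records({"metadataPrefix": metadata_prefix})
    harvested = list(records)
    while token:
        # Flow control: each follow-up request carries only the token.
        records, token = list_records({"resumptionToken": token})
        harvested.extend(records)
    return harvested


def make_fake_provider(ids: List[str], page_size: int) -> Callable[[Dict[str, str]], Page]:
    """Hypothetical in-memory data provider that returns ids in pages of page_size."""
    def list_records(args: Dict[str, str]) -> Page:
        start = int(args.get("resumptionToken", "0"))
        end = start + page_size
        next_token = str(end) if end < len(ids) else None
        return ids[start:end], next_token
    return list_records


provider = make_fake_provider([f"oai:example:{i}" for i in range(5)], page_size=2)
print(harvest_all(provider))
# ['oai:example:0', 'oai:example:1', 'oai:example:2', 'oai:example:3', 'oai:example:4']
```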
revision of specifications
• publication of specifications:
  • January 2001
  • US Open Day, January 23rd, Washington DC
  • EC Open Day, February 2001, Berlin
• freeze specifications for 1 year:
  • stable for experimentation; not definitive
  • minimize risk for early adopters
  • maximize chances for future interoperability across communities
alpha test of specs (11/2000-01/2001)
• data providers:
  • arXiv -- Los Alamos
  • NACA -- NASA
  • CogPrints -- U Southampton
  • ETD -- Virginia Tech
  • Theses & Dissertations from WorldCat -- OCLC
  • HeinOnline law journals -- Cornell U
  • TEI-lite collection -- U Tennessee
  • STM publisher metadata -- U Illinois
  • Resource Discovery Network -- UKOLN
  • Open Language Archives -- U Pennsylvania
  • Open Video Project -- U North Carolina
  • Museum info. -- CIMI
• software:
  • OAI harvesting interface to Ex Libris Aleph 500 Integrated Library System -- Ex Libris
  • OAI harvester -- Cornell U
  • OAI harvester -- Virginia Tech
  • Open-source software capable of creating a merged catalog of metadata harvested from OAI servers -- OCLC
• service providers:
  • Repository explorer -- Virginia Tech
  • MARIAN DL -- Virginia Tech
  • ARC service -- Old Dominion U
New OAI mission statement
The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.
The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.
The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials.
[...]
Harvesting Document Metadata for Federated Search
CS 6604 Fall 2000 Project
Presented by Avnish Kumar Chhabra
Benefits of Harvesting
– Limited storage requirement
– Fast search
– Consistently ranked results
– Improved reliability
– Distributed collections are transparent to the user
– Efficient use of network resources
Design of the Solution
[Figure: update scheduling and query generation drive an OAI wrapper and a Z39.50 wrapper, which send queries to a digital library collection and receive replies; a parser/updater stores the new metadata in the MARIAN metadata database. A boundary marks the system developed.]
Implementation
– Main scheduler thread, with a SiteInfo schedule file giving Server, Protocol, and Update Frequency for each site
– OAI harvester class: OAIInterface, instantiated with the URL of an OAI site and a scheduling frequency
– HarvestorMonitor: monitor for arbitrating access to network resources
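The job described for HarvestorMonitor, arbitrating network access among per-site harvester threads, can be sketched with a counting semaphore. The `NetworkMonitor` class, the `harvest_site` function, and the site URLs are hypothetical; the sleep stands in for an actual OAI request, and this is not the project's implementation.

```python
import threading
import time
from typing import List


class NetworkMonitor:
    """Hypothetical monitor: at most max_active threads may hold a network slot at once."""

    def __init__(self, max_active: int) -> None:
        self._slots = threading.Semaphore(max_active)
        self._lock = threading.Lock()  # guards the counters below
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "NetworkMonitor":
        self._slots.acquire()  # block until a slot is free
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> None:
        with self._lock:
            self.active -= 1
        self._slots.release()


def harvest_site(url: str, monitor: NetworkMonitor, log: List[str]) -> None:
    """Body of one per-collection thread; the sleep stands in for an OAI request to url."""
    with monitor:
        time.sleep(0.01)
        log.append(url)


monitor = NetworkMonitor(max_active=2)
log: List[str] = []
sites = [f"http://example.org/oai/{i}" for i in range(5)]
threads = [threading.Thread(target=harvest_site, args=(u, monitor, log)) for u in sites]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(monitor.peak <= 2, sorted(log) == sorted(sites))  # True True
```

Keeping the limit in one shared object lets each collection run in its own thread while still capping the load placed on the network.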
– OAIHandler: XML document event handler class that extracts Auth, Sub, and Abs fields from the records of a DL collection
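An event-driven (SAX) handler of the kind OAIHandler names receives start-element, text, and end-element callbacks instead of a full document tree. A minimal sketch with Python's `xml.sax`, assuming records arrive as unqualified Dublin Core and that Auth, Sub, and Abs map to `dc:creator`, `dc:subject`, and `dc:description`; the mapping and the sample record are assumptions.

```python
import io
import xml.sax
import xml.sax.handler
from typing import Dict, List, Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumed mapping from Dublin Core elements to the slide's Auth / Sub / Abs fields.
FIELDS = {"creator": "auth", "subject": "sub", "description": "abs"}


class DCHandler(xml.sax.handler.ContentHandler):
    """SAX handler collecting author, subject, and abstract text from Dublin Core elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {"auth": [], "sub": [], "abs": []}
        self._current: Optional[str] = None  # field being read, or None
        self._buf: List[str] = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri == DC_NS and local in FIELDS:
            self._current = FIELDS[local]
            self._buf = []

    def characters(self, content):
        if self._current is not None:
            self._buf.append(content)  # text may arrive in several chunks

    def endElementNS(self, name, qname):
        uri, local = name
        if self._current is not None and uri == DC_NS and local in FIELDS:
            self.fields[self._current].append("".join(self._buf).strip())
            self._current = None


def parse_dc(xml_bytes: bytes) -> Dict[str, List[str]]:
    """Parse one record with namespace processing on, so dc:* elements arrive as (uri, name)."""
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    handler = DCHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.fields


record = b"""<dc xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Harvesting Document Metadata</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Digital libraries</dc:subject>
  <dc:description>An example abstract.</dc:description>
</dc>"""
print(parse_dc(record))
# {'auth': ['Doe, Jane'], 'sub': ['Digital libraries'], 'abs': ['An example abstract.']}
```

Because SAX never builds the whole tree, memory use stays flat even for large ListRecords responses.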
Features of the system developed
– Per-collection execution thread
– Scheduled updates
– Encapsulation of protocol-specific details
– Extensibility
– Control over active execution threads
– Fault tolerance
  – Server unreachable
  – Failure / timeout of individual connections
– Time zones and date ambiguity considered
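Date ambiguity arises because harvests request records by day, while the harvester and the repository may sit in different time zones. One conservative approach, sketched here as an assumption rather than the project's method, converts the last harvest time to UTC, truncates it to a date, and backs off by one day, accepting some duplicates (resolved by identifier) in exchange for not missing records. The function name `next_from_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_from_date(last_harvest: datetime, overlap_days: int = 1) -> str:
    """Compute a day-granularity 'from' argument (YYYY-MM-DD) for the next incremental harvest.

    last_harvest must be timezone-aware. It is converted to UTC (assumed here to be the
    repository's datestamp zone), truncated to a date, and moved back overlap_days.
    """
    if last_harvest.tzinfo is None:
        raise ValueError("last_harvest must be timezone-aware")
    utc_day = last_harvest.astimezone(timezone.utc).date()
    return (utc_day - timedelta(days=overlap_days)).isoformat()


# 20:30 on 7 Dec 2000 in Seoul (UTC+9) is 11:30 UTC the same day.
seoul = timezone(timedelta(hours=9))
print(next_from_date(datetime(2000, 12, 7, 20, 30, tzinfo=seoul)))    # 2000-12-06
# 20:30 on 7 Dec 2000 in Virginia (UTC-5) is already 8 Dec in UTC.
eastern = timezone(timedelta(hours=-5))
print(next_from_date(datetime(2000, 12, 7, 20, 30, tzinfo=eastern)))  # 2000-12-07
```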
MARIAN Layers
Database Layer
Search Engine Layer
User Information Layer
User Interface Layer
Users
[Figure: MARIAN Mediation Middleware connecting NDLTD/NUDL/Digital Library users (queries + results) to collections such as German PhysDis, VT OAI, MIT ETD, and Greek Hellenic Dissertations through wrappers for the Open Archives harvest protocol, Dienst, and Z39.50; metadata formats shown include SOIF, Dublin Core, RFC1807, and MARC; a wrapper generator works from 5SL source descriptions; a local data store supports analysis, indexing, linking, search services, recommendation services, etc.]
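The wrapper idea in the diagram can be sketched as an abstract interface that each collection-specific wrapper implements, mapping its native metadata into one shared schema that the mediator queries. The classes below are hypothetical; the MARC side is simplified to `{tag: text}` using tags 100 (personal-name main entry) and 245 (title statement), and no real protocol (OAI, Z39.50, Dienst) is spoken.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Common = Dict[str, str]  # the mediator's shared schema: author, title, source


class Wrapper(ABC):
    """Hides one collection's protocol and metadata format behind a common interface."""

    @abstractmethod
    def records(self) -> List[Common]: ...


class DublinCoreWrapper(Wrapper):
    def __init__(self, name: str, dc_records: List[Dict[str, str]]) -> None:
        self.name, self.dc_records = name, dc_records

    def records(self) -> List[Common]:
        return [{"author": r.get("creator", ""), "title": r.get("title", ""), "source": self.name}
                for r in self.dc_records]


class MarcWrapper(Wrapper):
    # Simplification: one MARC record as {tag: text}; 100 = personal-name main entry, 245 = title.
    def __init__(self, name: str, marc_records: List[Dict[str, str]]) -> None:
        self.name, self.marc_records = name, marc_records

    def records(self) -> List[Common]:
        return [{"author": r.get("100", ""), "title": r.get("245", ""), "source": self.name}
                for r in self.marc_records]


def mediate(wrappers: List[Wrapper], term: str) -> List[Common]:
    """Answer one query over every wrapped collection using only the common schema."""
    term = term.lower()
    return [rec for w in wrappers for rec in w.records() if term in rec["title"].lower()]


wrappers = [
    DublinCoreWrapper("oai-collection", [{"creator": "Doe, Jane", "title": "Metadata harvesting"}]),
    MarcWrapper("z3950-collection", [{"100": "Roe, Rich", "245": "Harvesting and mediation"}]),
]
for rec in mediate(wrappers, "harvesting"):
    print(rec["source"], "|", rec["title"])
# oai-collection | Metadata harvesting
# z3950-collection | Harvesting and mediation
```

Adding a new collection then means writing one more wrapper; the search, linking, and recommendation layers above see only the common schema.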
Part of Hierarchy of MARIAN Classes
[Figure: class hierarchy including Digital Information Object, Structured Document, Text, English Text, Non-English Text, European Language Text, Korean Text, Controlled String, Person’s Name]
MARIAN-Phronesis V1 Architectural Diagram
[Figure: architecture elements: Search Page, Phron Query, CGI Script, Create object instance, MARIAN Query, MARIAN, PHRONESIS, Phron Results, Display to user]