CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de [email protected]
Lecture 2: A Library Automation perspective on Digital Libraries?
Based on lectures 1997-1999
• Primary Sources:
  • the content itself; an intellectual work
  • a journal article, a journal, a book, a CD, a Web page
Types of Resources
• Secondary Sources:
  • contain data about content; metadata
  • abstracting & indexing databases:
    • per discipline for journal articles, conference proceedings, (books): Inspec, Medline, MLA, BIOSIS, ...
    • for books: Books in Print, ...
  • citation databases
  • global coverage
Types of Resources
• Catalogue:
  • also data about data; metadata
  • descriptions at the level of a book or a journal, not the article level
  • but: only about the holdings of a certain library
  • contains location information
  • local coverage: a catalogue per library/institution
  • OPAC system
Types of Resources
Web?
• Primary Source: a web page
• Secondary Sources: databases behind search engines
• Catalogue: URLs associated with records in the databases behind search engines
Search engines are a merger of secondary sources and a catalogue
Core goal of library automation
Optimize the consultation chain
Keywords => secondary sources => References
References => catalogue => Locations
Locations => document delivery => Papers in primary sources
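The three-step chain can be pictured as three lookups applied in sequence. The sketch below is only an illustration: the dictionaries, reference identifiers, shelf locations and function names are all hypothetical stand-ins, not data or interfaces from any real library system.

```python
# Hypothetical in-memory stand-ins for the three resource types.
SECONDARY_SOURCE = {  # keyword -> references (metadata about papers)
    "metadata": ["ref-001", "ref-002"],
    "z39.50": ["ref-002"],
}
CATALOGUE = {  # reference -> location in the holdings of one library
    "ref-001": "Main Library, shelf QA76",
    "ref-002": "Engineering Library, shelf Z699",
}
PRIMARY_SOURCES = {  # location -> the paper itself
    "Main Library, shelf QA76": "Paper A (full text)",
    "Engineering Library, shelf Z699": "Paper B (full text)",
}


def find_references(keyword):
    """Keywords => secondary sources => References."""
    return SECONDARY_SOURCE.get(keyword.lower(), [])


def find_locations(references):
    """References => catalogue => Locations (only items the library holds)."""
    return [CATALOGUE[ref] for ref in references if ref in CATALOGUE]


def deliver_documents(locations):
    """Locations => document delivery => Papers in primary sources."""
    return [PRIMARY_SOURCES[loc] for loc in locations if loc in PRIMARY_SOURCES]


def consult(keyword):
    """Run the whole consultation chain for one keyword."""
    return deliver_documents(find_locations(find_references(keyword)))


print(consult("metadata"))
```

Each function boundary is a hand-over between systems; optimizing the chain means removing the manual work at those hand-overs.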
early library automation
• Automation of in-house procedures (acquisitions, serials control, loan, …)
• Technology:
  • mainframe/minicomputers
  • dedicated terminals
• Digital content:
  • OPAC system (end-user)
  • A&I databases on online hosts (happy few)
• conceptual: in-house procedures => information & information delivery
• technological: LAN, CD-ROM
  libraries: A&I databases, local storage
• 1985 -
two reorientations in library automation
FIRST WAVE
from automated housekeeping of libraries & archives to automated information
[Figure: first wave. Labels: Catalogue, Secondary sources, Primary sources; ILS, OPACs; CD-ROM LAN, CD-ROM software]
two reorientations in library automation
• conceptual: database networking => integrated information environment
• technological: open systems, client-server, inter-application tools, global networking
  library: WWW, C/S CD-ROM, Z39.50, local and remote storage, full-content
• 1993 -
SECOND WAVE
from database networking to the digital library
THE HYBRID INFORMATION ENVIRONMENT
THE INTERLINKED INFORMATION ENVIRONMENT
THE ACCESSIBILITY
second wave : characteristics
THE HYBRID INFORMATION ENVIRONMENT
information    formal                 non-formal
paper based    traditional library    +/-
digital        formal/digital         informal/digital
[Figure: the stages DISCOVERY, SELECTION, LOCATION, REQUEST, USE in three environments; axes PAPER / DIGITAL and FORMAL / NON-FORMAL]
• AUTOMATED TRADITIONAL LIBRARY: secondary tools; catalogue (bibliographic, holding); doc del req (loan, ILL, on-site); show library card, pay invoice, read
• INFORMAL DIGITAL: search engines; URL; access
• FORMAL DIGITAL: secondary services, headers, TOC; URL; authentication, charging, access
FORMAL versus NON-FORMAL information
– non-formal information is over-valued
– formal information is neglected
ACTUAL INFORMATION ENVIRONMENT:
– non-formal information is more easily reachable than formal information
“Some users are naive enough to believe that information found only on the internet is adequate for a literature search” (Saunders & Mitchell, The evolving virtual library)
[Figure, shown again: the stages DISCOVERY, SELECTION, LOCATION, REQUEST, USE in the AUTOMATED TRADITIONAL LIBRARY, INFORMAL DIGITAL and FORMAL DIGITAL environments]
2nd wave : characteristics
THE ACCESSIBILITY
• location independent solutions
• platform independent solutions
• access via standard user-interfaces
• access control and accounting mechanisms
go for the Web
information systems : Z39.50
• NISO Z39.50 standard, client-server protocol for information retrieval
• user interface:
  – simultaneous access to multiple databases
  – simultaneous access to multiple servers
  – single user-interface for multiple resources
  – http, Z39.50 clients
• built around services such as: initialisation, search, retrieval, sort, browse, …
• many OPACs, A&I databases
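The idea of one interface driving the same sequence of services (initialisation, search, sort, retrieval) against several servers can be sketched without any network code. Everything below is a mock: `MockServer`, its method names and the sample records are hypothetical and do not reproduce the Z39.50 wire protocol or any real client library.

```python
class MockServer:
    """Hypothetical stand-in for one remote database server (no network I/O)."""

    def __init__(self, name, records):
        self.name = name
        self._records = records
        self._initialised = False
        self._result_set = []

    def initialise(self):
        # Initialisation service: open a session before any other request.
        self._initialised = True

    def search(self, term):
        # Search service: build a result set on the server, return its size.
        if not self._initialised:
            raise RuntimeError("initialise() must be called before search()")
        self._result_set = [r for r in self._records if term.lower() in r.lower()]
        return len(self._result_set)

    def sort(self):
        # Sort service: order the current result set on the server side.
        self._result_set.sort()

    def retrieve(self, start, count):
        # Retrieval service: fetch a slice of the current result set.
        return self._result_set[start:start + count]


def search_all(servers, term, per_server=10):
    """Single entry point that queries every server with the same term."""
    hits = []
    for server in servers:
        server.initialise()
        if server.search(term) > 0:
            server.sort()
            for record in server.retrieve(0, per_server):
                hits.append((server.name, record))
    return hits


servers = [
    MockServer("opac", ["Digital Libraries", "Library Automation"]),
    MockServer("a&i", ["Metadata for digital objects", "Protein folding"]),
]
for source, title in search_all(servers, "digital"):
    print(source, "-", title)
```

Because the result set stays on the server between search and retrieval, the client only pulls the records it intends to display; that stateful design is an assumption of this sketch.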
information systems : http
• protocol of the Web
  – OPACs
  – full text at publishers’ sites
  – “aggregators”: ISI WoS, OCLC, Dialog, UMI
  – preprint archives
  – discipline oriented Internet portals
  – Internet search engines
  – information not typical to the library environment
• user-interface:
  – within the environment of an aggregator: single interface for multiple resources
  – some projects about searching multiple resources from a single interface: Virginia Tech, Old Dominion, Stanford
information systems : SilverPlatter ERL
• client-server solution for access to databases delivered on CD-ROM & tape & ftp
• 290+ scientific databases (Current Contents, Inspec, PsycLit, ERIC, MLA, Medline, …)
• local databases via partner publishing (Belgian Scientific Union Catalogue, Flemish Catalogue of Public Libraries, Belgian National Bibliography)
• user interface:
  – simultaneous access to multiple databases & servers
  – http & ERL clients & Z39.50 clients
  – single user interface for multiple databases
information systems: thin client technology
• traditional Windows CD-ROMs: no web-interface
• provide web-access via thin client technology
information systems
face the reality: the ideal world does not exist
live with multiple user interfaces
authentication
• Coalition for Networked Information:
  – authentication: the right to use a name
  – authorization: the right to use a service
• why authentication?
  – personalized services
  – as a basis for authorization in distributed systems with licensed information
• provide single sign-on for the environment (cf. IT: file-server, e-mail, UNIX account, OPAC, …)
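The split between the two notions (the right to use a name versus the right to use a service) can be made concrete with two separate checks. This is a minimal sketch under stated assumptions: the user table, the licence table, the fixed salt and the function names are all hypothetical.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # fixed only to keep the sketch short; use a random salt per user


def _hash(password, salt):
    # Derive a password hash with PBKDF2 from the standard library.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user database: name -> password hash.
USERS = {"alice": _hash("correct horse", SALT)}

# Hypothetical licence table: service -> names allowed to use it.
LICENCES = {"inspec": {"alice"}, "medline": set()}


def authenticate(name, password):
    """Authentication: does the caller have the right to use this name?"""
    stored = USERS.get(name)
    if stored is None:
        return False
    return hmac.compare_digest(stored, _hash(password, SALT))


def authorize(name, service):
    """Authorization: may this (already authenticated) name use the service?"""
    return name in LICENCES.get(service, set())


if authenticate("alice", "correct horse"):
    print("inspec:", authorize("alice", "inspec"))
    print("medline:", authorize("alice", "medline"))
```

Keeping the two functions apart lets one authentication serve many services, which is the precondition for single sign-on.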
• Athens e-Lib
  – http://www.athens.ac.uk/index.html
  – central user-database / decentralized administration / central authentication / central authorization
• the Digital Library Federation Authentication and Authorization initiative
  – https://www1.columbia.edu/sec/acis/rad/xsamd/
  – institutional user-database / institutional administration / institutional authentication / institutional authorization
  – based on standards: LDAP(S), HTTP(S), certificates (X.509)
  – split between authentication and authorization
auth : solutions/initiatives
session management
• positive authentication => session-ID
• store identity & session-ID
  – in the browser (cookie)
  – at server side
• store other info in browser or server:
  • from authentication: username, e-mail address, ...
  • from session: interface language, status of menu-system, session-ID in information systems, ...
• keep track of user actions in the environment
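A minimal sketch of this bookkeeping, assuming the session-ID travels in a cookie and everything else is kept at server side. The cookie name `session_id`, the in-memory `SESSIONS` store and the function names are hypothetical; only `secrets` and `http.cookies` from the Python standard library are used.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side store: session-ID -> session data


def start_session(username, email, language="en"):
    """Call after a positive authentication; returns (session-ID, Set-Cookie value)."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {
        "username": username,   # from authentication
        "email": email,         # from authentication
        "language": language,   # from the session
        "actions": [],          # track of user actions in the environment
    }
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True
    return session_id, cookie["session_id"].OutputString()


def lookup_session(cookie_header):
    """Map the Cookie header sent back by the browser to the stored session."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)


def record_action(cookie_header, action):
    """Append one user action to the session; returns False for unknown sessions."""
    session = lookup_session(cookie_header)
    if session is None:
        return False
    session["actions"].append(action)
    return True


sid, set_cookie = start_session("alice", "alice@example.org")
browser_cookie = "session_id=" + sid  # what the browser would send back
record_action(browser_cookie, "search: inspec")
print(lookup_session(browser_cookie)["actions"])
```

Storing only the opaque identifier in the browser keeps the username and e-mail address out of the client; the slide's alternative of storing that information in the browser itself would trade this for less server state.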
menu system
• provide overall presentation of the environment
• search/browse the database of databases
authorization
• status:
  • user is authenticated in the environment
  • user has a rough indication of authorization for resources
• next: use a resource
  • requires extra identification
  • authentication & authorization in the resource
  • opposed to the single sign-on concept, privacy issues, …
  • IP address based (!! off location, dynamic IP addressing !!)
• no real solutions: it is a mess!
• via an authorization module (proxy), in the background
• provide seamless access to the environment
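Why IP address based authorization breaks for off-location users, and how a background authorization module might compensate, can be sketched as follows. The network ranges are reserved documentation ranges, and the `entitlements` field, the table and the function names are assumptions made for illustration, not a description of any actual proxy product.

```python
import ipaddress

# Hypothetical licence table: resource -> campus networks allowed by the vendor.
LICENSED_NETWORKS = {
    "inspec": [ipaddress.ip_network("192.0.2.0/24")],
}


def ip_authorized(resource, client_ip):
    """IP address based check: passes only for addresses inside a licensed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in net for net in LICENSED_NETWORKS.get(resource, []))


def proxy_authorized(resource, session, client_ip):
    """Authorization module in the background (assumed design).

    Accept an authenticated session that carries an entitlement for the
    resource; otherwise fall back to the IP address based check.
    """
    if session is not None and resource in session.get("entitlements", ()):
        return True
    return ip_authorized(resource, client_ip)


on_campus = "192.0.2.17"
off_campus = "198.51.100.5"  # e.g. a dial-up user with a dynamic address
alice = {"username": "alice", "entitlements": ["inspec"]}

print(ip_authorized("inspec", on_campus))
print(ip_authorized("inspec", off_campus))
print(proxy_authorized("inspec", alice, off_campus))
```

The off-campus request fails the plain IP check even though the user is entitled, which is the "off location, dynamic IP addressing" problem; the session-based path is one possible way for a proxy to restore seamless access.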