CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel [email protected] Lecture 2: A Library Automation perspective on Digital Libraries? Based on lectures 1997-1999

Types of Resources

• Primary Sources:
  – the content itself; an intellectual work
  – a journal article, a journal, a book, a CD, a Web page

Types of Resources

• Secondary Sources:
  – contain data about content; metadata
  – abstracting & indexing databases:
    – per discipline for journal articles, conference proceedings, (books): Inspec, Medline, MLA, BIOSIS, ...
    – for books: Books in Print, ...
  – citation databases
  – global coverage

Types of Resources

• Catalogue:
  – also data about data; metadata
  – descriptions at the level of a book or a journal, not at the article level
  – but: only about the holdings of a certain library
  – contains location information
  – local coverage: a catalogue per library/institution
  – OPAC system

Web?

• Primary Source: a web page
• Secondary Sources: databases behind search engines
• Catalogue: URLs associated with records in the databases behind search engines

Search engines are a merger of secondary sources and a catalogue

Core goal of library automation

Optimize the consultation chain:

Keywords => secondary sources => References
References => catalogue => Locations
Locations => document delivery => Papers in primary sources
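
The consultation chain can be read as three lookups applied in sequence. The sketch below is only an illustration of that reading: the in-memory `SECONDARY_SOURCE` and `CATALOGUE` tables, the `Reference` type, and all function names are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    journal: str
    year: int
    title: str


# Hypothetical secondary source: keyword -> references (global coverage).
SECONDARY_SOURCE = {
    "metadata": [
        Reference("Journal A", 1998, "Metadata for digital libraries"),
        Reference("Journal B", 1999, "Indexing the Web"),
    ],
}

# Hypothetical catalogue: journal -> location (local holdings only).
CATALOGUE = {"Journal A": "Main library, floor 2"}


def discover(keyword):
    """Keywords => secondary sources => References."""
    return SECONDARY_SOURCE.get(keyword, [])


def locate(reference):
    """References => catalogue => Locations (None if not held locally)."""
    return CATALOGUE.get(reference.journal)


def deliver(reference, location):
    """Locations => document delivery => Papers in primary sources."""
    if location is None:
        return f"'{reference.title}': not held locally, request via interlibrary loan"
    return f"'{reference.title}': available at {location}"


def consult(keyword):
    """Run the whole chain for one keyword."""
    return [deliver(ref, locate(ref)) for ref in discover(keyword)]


for line in consult("metadata"):
    print(line)
```

The split into three functions mirrors the split between secondary sources (global coverage) and the catalogue (local coverage): a reference can be discovered without being locally held.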

early library automation

• Automation of in-house procedures (acquisitions, serials control, loan, …)
• Technology:
  – mainframe/minicomputers
  – dedicated terminals
• Digital content:
  – OPAC system (end-user)
  – A&I databases on online hosts (happy few)

two reorientations in library automation

FIRST WAVE (1985 - ): from automated housekeeping of libraries & archives to automated information

• conceptual: in-house procedures => information & information delivery
• technological: LAN, CD-ROM; libraries: A&I databases, local storage

SECOND WAVE: from database networking to the digital library

first wave

[Diagram: three columns (Catalogue, Secondary sources, Primary sources), labelled with ILS / OPACs and CD-ROM LAN / CD-ROM software]

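second wave

[Diagram: information resources info A, info B, ..., info Z, info T, each with its own software (soft A, soft B, ..., soft Z, soft T); further labels: meta, cata, prim]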

two reorientations in library automation

SECOND WAVE (1993 - ): from database networking to the digital library

• conceptual: database networking => integrated information environment
• technological: open systems, client-server, inter-application tools, global networking; library: WWW, C/S CD-ROM, Z39.50, local and remote storage, full-content

second wave : characteristics

• THE HYBRID INFORMATION ENVIRONMENT
• THE INTERLINKED INFORMATION ENVIRONMENT
• THE ACCESSIBILITY

second wave : characteristics

THE HYBRID INFORMATION ENVIRONMENT

information    formal                 non-formal
paper based    traditional library    +/-
digital        formal/digital         informal/digital

[Diagram: the steps DISCOVERY, SELECTION, LOCATION, REQUEST, USE in three environments:
  – automated traditional library (paper, formal): secondary tools; catalogue (bibliographic, holding); document delivery request (loan, ILL, on-site); show library card, pay invoice, read
  – informal digital (digital, non-formal): search engines; URL; access
  – formal digital (digital, formal): secondary services; headers, TOC; URL; authentication, charging, access]

FORMAL versus NON-FORMAL information
– non-formal information is over-valued
– formal information is neglected

ACTUAL INFORMATION ENVIRONMENT:
– non-formal information is easier to reach than formal information

"Some users are naive enough to believe that information found only on the internet is adequate for a literature search" (Saunders & Mitchell, The evolving virtual library)

second wave : characteristics

THE INTERLINKED INFORMATION ENVIRONMENT

[Diagram: the steps DISCOVERY, SELECTION, LOCATION, REQUEST, USE, shown across the three environments:
  – automated traditional library (paper, formal): secondary tools; catalogue (bibliographic, holding); document delivery request (loan, ILL, on-site); show library card, pay invoice, read
  – informal digital (digital, non-formal): search engines; URL; access
  – formal digital (digital, formal): secondary services; headers, TOC; URL; authentication, charging, access]

2nd wave : characteristics

THE ACCESSIBILITY

• location independent solutions
• platform independent solutions
• access via standard user-interfaces
• access control and accounting mechanisms

go for the Web

[Architecture diagram: modules for authentication, session management, menu system, authorization and interlinking; information systems: ERL, Z39.50, URL, THIN]

information systems : Z39.50

• NISO Z39.50 standard, client-server protocol for information retrieval
• user interface:
  – simultaneous access to multiple databases
  – simultaneous access to multiple servers
  – single user-interface for multiple resources
  – http, Z39.50 clients
• built around services such as: initialisation, search, retrieval, sort, browse, …
• many OPACs, A&I databases
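
To make the "single user-interface for multiple resources" idea concrete, here is a minimal sketch of one query being sent to several servers, each offering initialisation, search and retrieval steps. It is not a Z39.50 implementation and uses no Z39.50 library: `ToyServer`, `search_all` and the record strings are hypothetical in-memory stand-ins, and the servers are queried one after another rather than truly simultaneously.

```python
class ToyServer:
    """In-memory stand-in for one database server (not a real Z39.50 target)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records
        self.initialised = False
        self.result_set = []

    def init(self):
        # Initialisation: the session must be opened before any search.
        self.initialised = True

    def search(self, term):
        # Search: build a result set on the server and report the hit count.
        if not self.initialised:
            raise RuntimeError("init must precede search")
        self.result_set = [r for r in self.records if term.lower() in r.lower()]
        return len(self.result_set)

    def present(self, start, count):
        # Retrieval: fetch a slice of the result set built by the last search.
        return self.result_set[start:start + count]


def search_all(servers, term, count=10):
    """One user-facing call that queries every server and merges the records."""
    merged = []
    for server in servers:
        server.init()
        if server.search(term) > 0:
            merged.extend((server.name, rec) for rec in server.present(0, count))
    return merged


servers = [
    ToyServer("opac", ["Digital libraries", "Cataloguing rules"]),
    ToyServer("a&i", ["Library automation", "Protein folding"]),
]
print(search_all(servers, "librar"))
```

Keeping search and retrieval as separate steps reflects the service list on the slide: a search produces a result set on the server, and records are fetched from it afterwards.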

information systems : http

• protocol of the Web
  – OPACs
  – full text at publishers' sites
  – "aggregators": ISI WoS, OCLC, Dialog, UMI
  – preprint archives
  – discipline oriented Internet portals
  – Internet search engines
  – information not typical to the library environment
• user-interface:
  – within the environment of an aggregator: single interface for multiple resources
  – some projects about searching multiple resources from a single interface: Virginia Tech, Old Dominion, Stanford

information systems : SilverPlatter ERL

• client-server solution for access to databases delivered on CD-ROM & tape & ftp
• 290+ scientific databases (Current Contents, Inspec, PsycLit, ERIC, MLA, Medline, …)
• local databases via partner publishing (Belgian Scientific Union Catalogue, Flemish Catalogue of Public Libraries, Belgian National Bibliography)
• user interface:
  – simultaneous access to multiple databases & servers
  – http & ERL clients & Z39.50 clients
  – single user interface for multiple databases

information systems: thin client technology

• traditional Windows CD-ROMs: no web-interface

• provide web-access via thin client technology

information systems

face the reality: the ideal world does not exist
live with multiple user interfaces

authentication

• Coalition for Networked Information:
  – authentication: the right to use a name
  – authorization: the right to use a service
• why authentication?
  – personalized services
  – as a basis for authorization in distributed systems with licensed information
• provide single sign-on for the environment (cf. IT: file-server, e-mail, UNIX account, OPAC, …)
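
The distinction between the two definitions can be sketched as two separate checks. This is only an illustration under stated assumptions: the `USERS` and `ENTITLEMENTS` tables, the user name and the service names are hypothetical, and passwords are held in plain text purely to keep the example short.

```python
import hmac

# Hypothetical user database (plain-text passwords for illustration only).
USERS = {"alice": "s3cret"}

# Hypothetical licence table: which services each name may use.
ENTITLEMENTS = {"alice": {"opac", "inspec"}}


def authenticate(username, password):
    """Authentication: does the caller have the right to use this name?"""
    expected = USERS.get(username)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), password.encode())


def authorize(username, service):
    """Authorization: does this name have the right to use this service?"""
    return service in ENTITLEMENTS.get(username, set())


def access(username, password, service):
    if not authenticate(username, password):
        return "denied: identity not established"
    if not authorize(username, service):
        return "denied: no licence for this service"
    return "granted"


print(access("alice", "s3cret", "inspec"))   # identity and licence both check out
print(access("alice", "s3cret", "medline"))  # known user, unlicensed service
```

Because `authorize` only needs a name, a single successful `authenticate` call can serve as the basis for decisions about many licensed services, which is the single sign-on idea on the slide.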

auth : solutions/initiatives

• Athens e-Lib
  – http://www.athens.ac.uk/index.html
  – central user-database / decentral administration / central authentication / central authorization
• the Digital Library Federation Authentication and Authorization initiative
  – https://www1.columbia.edu/sec/acis/rad/xsamd/
  – institutional user-database / institutional administration / institutional authentication / institutional authorization
  – based on standards: LDAP(S), HTTP(S), certificates (X.509)
  – split between authentication and authorization

session management

• positive authentication => session-ID
• store identity & session-ID
  – in the browser (cookie)
  – at server side
• store other info in browser or server:
  – from authentication: username, e-mail address, ...
  – from session: interface language, status of menu-system, session-ID in information systems, ...
• keep track of user actions in the environment
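
A minimal sketch of the server-side variant: a successful authentication yields a random session-ID, the server keeps the associated information, and the browser only carries the ID in a cookie. The `SESSIONS` store, the field names and the cookie name `session-id` are assumptions made for this example.

```python
import secrets

# Hypothetical server-side session store: session-ID -> session data.
SESSIONS = {}


def open_session(username, email, language="en"):
    """Called after a positive authentication; returns the new session-ID."""
    session_id = secrets.token_urlsafe(16)  # unguessable random identifier
    SESSIONS[session_id] = {
        "username": username,   # from authentication
        "email": email,         # from authentication
        "language": language,   # from the session: interface language
        "actions": [],          # track of user actions in the environment
    }
    return session_id


def record_action(session_id, action):
    """Append one user action to the session's history."""
    session = SESSIONS.get(session_id)
    if session is None:
        raise KeyError("unknown or expired session")
    session["actions"].append(action)


def cookie_header(session_id):
    """HTTP response header that stores the session-ID in the browser."""
    return f"Set-Cookie: session-id={session_id}; HttpOnly; Path=/"


sid = open_session("alice", "alice@example.org", language="nl")
record_action(sid, "search:inspec")
print(cookie_header(sid))
```

Storing only the ID in the browser keeps identity data on the server; the alternative on the slide (storing the information itself in the cookie) trades that for a stateless server.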

menu system

• provide overall presentation of the environment
• search/browse the database of databases

authorization

• status:
  – user is authenticated in the environment
  – user has a rough indication of authorization for resources
• next: use a resource
  – requires extra identification
  – authentication & authorization in the resource
  – opposed to the single sign-on concept, privacy issues, …
  – IP address based (!! off location, dynamic IP addressing !!)
• no real solutions: it is a mess!
  – via an authorization module (proxy), in the background
  – provide seamless access to the environment
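
The weakness of IP-address-based authorization, and the proxy workaround, can be sketched as follows. This is an assumption-laden illustration: the network range is a reserved documentation range standing in for a campus network, and `proxy_request` only describes what a real proxy would do instead of forwarding anything.

```python
import ipaddress

# Hypothetical campus range (192.0.2.0/24 is reserved for documentation).
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def ip_authorized(client_ip):
    """IP-based check as a resource might apply it: on campus or not."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def proxy_request(client_ip, session_authenticated, resource):
    """Authorization module acting as a proxy in the background.

    An off-location user fails the plain IP check, but an authenticated
    session lets the proxy contact the resource on the user's behalf.
    """
    if ip_authorized(client_ip) or session_authenticated:
        return f"forwarded to {resource} via the proxy"
    return "refused: sign in to the environment first"


print(ip_authorized("192.0.2.17"))     # on campus
print(ip_authorized("198.51.100.5"))   # off location: plain IP check fails
print(proxy_request("198.51.100.5", True, "inspec"))
```

The resource still sees a request from an address it trusts, so the user does not have to identify again inside the resource.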

interlinking module

• tradition: fragmented and non-context-sensitive approach
• now: SFX/OpenURL
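
The idea behind an OpenURL is that a source passes citation metadata, rather than a fixed target link, to a resolver that decides which services fit the user's context. A minimal sketch, assuming a hypothetical resolver address and citation; the key names (`genre`, `issn`, `volume`, `issue`, `spage`, `date`) follow the early OpenURL convention as best I recall and should be checked against the specification before use.

```python
from urllib.parse import urlencode

# Hypothetical address of an institution's link resolver.
RESOLVER = "http://resolver.example.org/sfx"


def make_openurl(metadata, resolver=RESOLVER):
    """Build a link that hands citation metadata to the resolver."""
    return f"{resolver}?{urlencode(metadata)}"


# Hypothetical citation found in a secondary source.
citation = {
    "genre": "article",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "3",
    "spage": "1",
    "date": "1998",
}

print(make_openurl(citation))
```

Changing only `resolver` yields a link for a different institution from the same metadata, which is what makes the approach context-sensitive.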