Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo [email protected] Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign 2002 International Conference on Digital Archive Technologies (ICDAT2002) December 19, 2002


Page 1: Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search

Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search

William H. Mischo
[email protected]
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign

2002 International Conference on Digital Archive Technologies (ICDAT2002)
December 19, 2002

Page 2

Outline
• Digital Libraries and the Distributed Information Environment.
• Document Representation and Full-Text
• Digital Library Tools
• Illinois Projects.
• XML Technologies.
• Metadata Technologies.
• DOIs, Linking, Local Resolver
• Portals, Simultaneous Search, Linking
• Grainger Search Aid
• Issues & Trends.

Page 3

The Digital Library
• ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time.
• Tendency to apply term to collections and resources.
• Digital Collections vs. Digital Library.
• Emphasis on the integration of collections and services (e.g. NSDL grant).
• Application of standards and protocols is important.

Page 4

Scholarly Communication Overview
• E-Resources are Web-based and publisher-centric.
• Growth of Heterogeneous Distributed Repositories.
• Value-added services and ‘branding’ of journals.
• Prestige of Journals and Publishers.
• Reciprocal linking relationships between publishers.
• Cooperation on linking standards (DOI, CrossRef).
• Alternative publishing models - Academia, Preprint Servers, disintermediation.

Page 5

Distributed Information Environment
• We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems.
  – OPACs – local, regional, national shared bibliographic databases.
  – Local and remote A & I Services.
  – Discrete publisher and vendor repositories (full-text).
  – Web search engines, vertical portals, custom portals (NSDL, ARL Portal).
  – Local metadata, digital objects, GIS, finding aids.
  – Preprint servers and institutional repositories (DSpace).
  – Instructional (course) management systems (WebCT, Blackboard).
  – Harvestable (OAI) sites and services.

Page 7

Distributed Repository - Issues
• Integration of discrete, heterogeneous information resources.
• Role of federated and broadcast searching of distributed resources.
• Integration of collections with reference, instructional and navigation services - TOC, remote reference assistance.
• Integration of Library, institutional, vendor, publisher, and government portals and information services.
• Linking technologies.
• Metadata harvesting, archiving.

Page 8

Distributed Environment Action Plan
• Pressing need for document representation, retrieval, transmission, and linking middleware tools and standards.
• Metadata standards, DOIs, OpenURL.
• Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries.
• Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.

Page 9

[Diagram: portal architecture. Labeled components: Web Client; Portal Presentation Level (Local Link Server, Local Value-Added); OPAC; A & I Services (Local and Remote); Full-Text Resources; Local Databases and OAI Resources via DBMS; Web Resources & Knowledge Environments; E-Resource Registry; Aggregator (Ebsco, OCLC); Publisher Portal (Elsevier); CrossRef Metadata; DOI Server.]

Portal Functions:
  – Authorization
  – Linking mechanisms between resources and among resources.
  – Simultaneous search.
  – Navigation

Linking:
  – Between full-text using DOI, CrossRef, Appropriate Copy.
  – Between A&I and full-text.
  – Between OPAC and full-text.

Page 10

Document Representation

• Continuum of Web-Enabled technologies -- all presently being utilized.

• Evolving technologies and standards.

• Role and history of markup.

• XML: its role and importance.

• The Smart Document.

Page 12

Digital Library Tools
• We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate:
  – Standard retrieval environment (Web) and interface/client (Web Browser);
  – Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP);
  – Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS);
  – Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases);
  – Standard linking tools and infrastructure (DOI, OpenURL, CrossRef).
• Candidate set of ‘best practices’ for IR.

Page 13

Work by Illinois DLI Group
• We are attempting to address many of these issues within the Digital Library Initiatives group.
• Headquartered at Grainger Engineering Library Information Center at UIUC.
• Grant Work:
  – Digital Library Initiative I (NSF, others), 1994-1998.
  – Corporation for National Research Initiatives (CNRI) D-Lib Test Suite, 1998-2001.
  – Collaborating Partners Program, 1998--.
  – Andrew Mellon Foundation OAI Harvesting grant, 2001-2002.
  – NSF NSDL (National Science, Engineering, Technology, and Mathematics Digital Library) Program, 2002-2004.
  – Institute of Museum and Library Services (IMLS) Registry and Integration grant, 2002-2005.

Page 14

Illinois Testbed Project
• Funded under DLI-I by NSF, DARPA, and NASA, 1994-1998. Awards made to 6 universities.
• Large-scale Testbed, Distributed Repository models, evaluation, Web software.
• Funded under CNRI D-Lib Test Suite Program, 1998-2001.
• Collaborating Partners Program. AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier.
• All XML Journal -- AIP, APS, ACM.

Page 15

Illinois Full-Text Testbed
• American Institute of Physics--APL, JAP, RSI
  – 19,000+ articles, 1995--.
• American Physical Society--PRL
  – 15,000+ articles, 1995--, weekly updates.
• ASCE Journals (25 titles)
  – 11,000+ articles, 1995--.
• IEE Proceedings and Electronics Letters
  – 9,500+ articles, 1993--.
• IEEE Computer Society.
• ASM (American Society for Materials) Handbook.
• ACM (Association for Computing Machinery) Transactions.
• Elsevier Science.

Page 16

Accomplishments
• Process & retrieve from multiple publishers & heterogeneous DTDs.
• SGML to XML Conversion.
• Development of a metadata specification that uses RDF, Dublin Core (DCQ and XML), XML Schemas, local Namespace.
• Cross-repository searching (Testbed & D-LIB Test Suite). Full-Text and Metadata.
• XSLT, CSS, for transformation & rendering, including Mathematics.

Page 17

Accomplishments (2)
• Introduction of numerous technologies now deployed within publisher repositories:
  – Forward and Backward links in bibliographies -- within Testbed/Repository, from/to A & I Services.
  – Use of XSLT for transforming XML to HTML.
  – Rich extended abstracts.
• Conversion of ISO 12083 math markup to MathML. CSS/DHTML mathematics rendering. Use of plug-ins.
• Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices.
• Local Link Server for DOIs, Context-Sensitive linking.

Page 18

XML (eXtensible Markup Language)
• Like SGML, a Data Description Metalanguage.
• XML is a subset/version of SGML.
• Document representation and interchange Standard.
• Allows fine-granularity markup of content and structure. Authors can create their own elements (extensible).
• Tags define the structure of the document, not the presentation format.
• Validated vs. “well-formed” - separation of authoring process from representation & presentation.
• Either validated against a DTD/Schema or well-formed.
• Integrated with relational DBs.
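
The "well-formed" half of that distinction can be sketched with Python's standard library. This is only an illustration: the sample documents and the helper name `is_well_formed` are assumptions, and `xml.etree.ElementTree` checks well-formedness only, so validation against a DTD or Schema would need a separate validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Author-defined elements: none of these tags come from a fixed tag set.
good = "<article><title>Quantum-well lasers</title><author>N. Tansu</author></article>"
# The <title> element is never closed, so this document is not well-formed.
bad = "<article><title>Quantum-well lasers</article>"

print(is_well_formed(good))
print(is_well_formed(bad))
```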

Page 19

XML Features
• The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability.
• DTD not required with XML. Needed if internal entities are used.
• Use of Document Object Model (DOM).
• Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.

Page 20

XML in Information Technologies
• Used in Open Archives Initiative (OAI), NSDL.
• Compatible with MS SQL Server, Tamino (Software AG), Oracle, DLXS/XPAT (University of Michigan/OpenText), others.
• Integral to Web Services (WSDL) and SOAP – Google Web Service.
• Used in Library of Congress MODS and METS metadata technologies.
• Baked into XyVision and publishing packages.

Page 21

XML, XSLT, and CSS
• Use XML full-text articles as ordered hierarchy of content objects.
• Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics.
• XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser.
• Mathematics rendering using MathML tools (conversion from ISO 12083 to MathML).
• Real-time transformation between XML and HTML using XSLT.
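
As a rough sketch of item-level metadata in XML using RDF and Dublin Core, the snippet below builds one record with the standard library. The RDF and Dublin Core namespace URIs are the published ones; the function name `build_record` and the field values are placeholders, and the actual Illinois metadata specification (with DCQ and a local namespace) is not reproduced here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_record(about: str, fields: dict) -> ET.Element:
    """Wrap simple Dublin Core fields in an rdf:Description for one item."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return rdf


# Placeholder values; the URL is the example address used on the DOI slides.
record = build_record(
    "http://www.pubsite.org/apr99/artl1.pdf",
    {"title": "Sample article title", "creator": "Sample Author", "date": "1999-04"},
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```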

Page 22

Schemas vs. DTDs

• Both are systems of representing a data model that defines the data’s elements and attributes, and the relationship among elements.

• Schema addresses limitations of DTDs and the increasingly data-oriented role of XML.

• W3C XML Schema Working Group: two documents: XML structures and datatypes.

Page 23

Schema Justification
• Description of document type’s structure should be in an XML document instead of written in special syntax (DTD).
• Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools.
• DTD notation doesn’t allow schema designers the power to impose strong data typing -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.
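
Two of those points can be illustrated together: a small W3C XML Schema fragment that imposes a positive-integer type and a list of allowed choices, read with an ordinary XML parser because the schema is itself XML. The element names `volume` and `genre` and the enumeration values are assumptions chosen for illustration; the snippet only inspects the schema and does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A schema is an XML document, so it can be held and parsed like any other.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="volume" type="xs:positiveInteger"/>
  <xs:element name="genre">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="article"/>
        <xs:enumeration value="proceeding"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)
declared = {}
for el in root.findall(f"{{{XS}}}element"):
    # Either a built-in type name or the list of enumerated choices.
    choices = [e.get("value") for e in el.iter(f"{{{XS}}}enumeration")]
    declared[el.get("name")] = el.get("type") or choices

print(declared)
```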

Page 24

Metadata and Linking Standards

• Digital Object Identifier (DOI) and Persistent Object Identifiers.

• OpenURL and Value-Added Service Components (SFX).

• Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF.

• Local Resolver Servers.

Page 25

Open Archives Initiative (OAI)
• Released version 1.0 of metadata harvesting protocols. Frozen through second quarter 2001.
• Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories.
• Roots in e-print archives.
• Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.
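
Because harvesting requests travel as plain HTTP, a request can be sketched as nothing more than a URL with a query string. The base URL below is hypothetical, and the `verb` and `metadataPrefix` argument names follow the later OAI-PMH 2.0 specification; details of the version 1.0 protocol named on the slide may differ.

```python
from urllib.parse import urlencode

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def oai_request(verb: str, **args: str) -> str:
    """Build a harvesting request URL: base URL plus verb and arguments."""
    params = {"verb": verb, **args}
    return f"{BASE_URL}?{urlencode(params)}"


# Ask the data provider for all records in unqualified Dublin Core.
url = oai_request("ListRecords", metadataPrefix="oai_dc")
print(url)
```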

Page 26

Ongoing Investigations
• Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive.
• OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site.
• Role of HTTP harvesting, Spider technology.
• Reference Linking integration built on OpenURL and DOI.
• Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.

Page 27

Portals and Gateways
• Role is to bring together and integrate disparate e-resources.
• Provide a systematic ‘view’ of the information landscape, particularly full-text.
• Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text.
• Central to this implementation are federated and simultaneous search and reference linking technologies.

Page 28

Digital Object Identifier (DOI)
• DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier.
• ‘The ISBN for the 21st Century’ -- Norman Paskin.
• DOI system has two main parts (the identifier and a directory system) and a third logical component, a database.
• Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.

Page 29

DOI Construction
• First real open standard for content identification.
• DOI is a number that identifies a digital object:
  – 10.1063/S000369519903216
• 10 Registration Agency Prefix
• 1063 Publisher Prefix
• S000369519903216 Suffix (Publisher-assigned ID)
• Suffix can be SICI or PII.
• The DOI, together with the URL pointing to the digital object, is registered with the International DOI Foundation, e.g.:
  – 10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf
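
The three-part structure above can be pulled apart with a few lines of string handling. This is a minimal sketch: the names `DoiParts` and `split_doi` are made up for illustration, and the field names follow the slide's wording (the DOI Handbook calls the two prefix parts the directory indicator and the registrant code). The split is on the first slash only, since a publisher-assigned suffix may itself contain slashes.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    directory: str  # "10" in the slide's example
    publisher: str  # "1063" in the slide's example
    suffix: str     # publisher-assigned ID


def split_doi(doi: str) -> DoiParts:
    """Split a DOI into the two prefix parts and the suffix."""
    prefix, slash, suffix = doi.partition("/")
    directory, dot, publisher = prefix.partition(".")
    if not (slash and dot and directory and publisher and suffix):
        raise ValueError(f"not a DOI of the form NN.NNNN/suffix: {doi!r}")
    return DoiParts(directory, publisher, suffix)


parts = split_doi("10.1063/S000369519903216")
print(parts)
```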

Page 30

Using a DOI
• DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives).
• Retrieval of object is a two-step process: link is sent to central directory where current Web address is stored, location is sent back to browser with special message to redirect to address, e.g.:
  – dx.doi.org/10.1063/333 redirects to www.pubsite.org/apr99/artl1.pdf
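
The two-step retrieval can be mimicked without any network access. In this toy sketch an in-memory dictionary stands in for the central directory, and the function answers the way a redirecting proxy would. The 302 status code and the 404 fallback are assumptions, since the slide only says that a "special message to redirect" is returned; the real Handle System is not involved.

```python
from typing import Dict, Tuple

# Toy stand-in for the central directory: DOI -> current Web address.
DIRECTORY: Dict[str, str] = {
    "10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf",
}


def resolve(proxy_url: str) -> Tuple[int, Dict[str, str]]:
    """Step 1: look the DOI up. Step 2: answer with a redirect to its URL."""
    _host, _, doi = proxy_url.partition("/")
    location = DIRECTORY.get(doi)
    if location is None:
        return 404, {}
    return 302, {"Location": location}


status, headers = resolve("dx.doi.org/10.1063/333")
print(status, headers["Location"])
```

Because only the directory entry has to change when an article moves, the DOI printed in a citation keeps working.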

Page 31

Reference Linking
• CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers.
• System design calls for one URL for each DOI; underlying technology can handle multiple URLs however.
• Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator). Appropriate Copy problem.
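
One way to picture the Appropriate Copy problem is as a lookup that prefers a locally held copy when the user's institution has one. The sketch below is a deliberately simplified illustration, not the Illinois local link server: both tables, the local URL, and the `prefer_local` flag are hypothetical.

```python
from typing import Dict

# Publisher copy, as registered for the DOI (example address from the slides).
PUBLISHER_COPY: Dict[str, str] = {
    "10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf",
}
# Hypothetical locally loaded copy of the same article.
LOCAL_COPY: Dict[str, str] = {
    "10.1063/333": "http://local.example.edu/aip/333.pdf",
}


def appropriate_copy(doi: str, prefer_local: bool) -> str:
    """Return the local copy when allowed and available, else the publisher's."""
    if prefer_local and doi in LOCAL_COPY:
        return LOCAL_COPY[doi]
    return PUBLISHER_COPY[doi]


print(appropriate_copy("10.1063/333", prefer_local=True))
print(appropriate_copy("10.1063/333", prefer_local=False))
```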

Page 32

[Diagram: DOI resolution with a local link server. Labeled components: Client (Web Browser) with cookie on client; DOI Proxy (dx.doi.org/10.1063/1234); Handle Server; CrossRef Metadata Database; Illinois Local Link Server (OpenURL Aware); Local AIP, IEE; Local Value Added; UIUC Metadata Registry; publisher sites (AIP, IEE, Elsevier). Labels on the connections: DOI, Metadata, OpenURL, Nosfx=y.]

Page 33

Simultaneous Search Implementations
• DialIndex from Dialog.
• Ex Libris MetaLib service.
• Endeavor EnCompass.
• Innovative Interfaces MetaFind.
• Ovid Multiple Search and reference De-Duping.
• ISI Web of Knowledge.
• Gale Corporation InfoTrac Total Access.
• WebFeat.
• California Digital Library SearchLight system.
• Los Alamos FlashPoint system.
• Fretwell-Downing partnering with ARL Portal and Monash University.

Page 34

Grainger Search Aid
• Assist users in the selection of appropriate databases.
• Normalize user search arguments and display search results from candidate databases.
• Cross-database asynchronous concurrent searching.
• Article level and e-journal Web site access to publisher full-text repositories.
• Utilize OpenURL, CrossRef metadata database and DOI for reference linking at the article level.
• Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.

Page 35

Grainger Search Aid

Page 39

Reference Assistant Project
• Utilize Search Aid simultaneous search and link capabilities.
• Opportunity to explore interface and navigation issues.
• Mimics the behavior of reference librarian.
• Allows the application of ‘best match’ and ‘quorum searching’ algorithms.
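
The slides do not say how these algorithms are implemented in the Reference Assistant. As a generic illustration only, a quorum search can be read as: keep any document that matches at least a chosen number of the query terms, then list the best matches first. The function name, the tiny document set, and the whitespace tokenization below are all assumptions.

```python
from typing import Dict, List, Tuple


def quorum_search(
    query: str, docs: Dict[str, str], quorum: int
) -> List[Tuple[str, int]]:
    """Return (doc_id, matched_term_count) for docs meeting the quorum."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in docs.items():
        matched = len(terms & set(text.lower().split()))
        if matched >= quorum:
            hits.append((doc_id, matched))
    # Best match first; ties broken by id so the order is stable.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


DOCS = {
    "a": "temperature analysis of quantum-well lasers",
    "b": "lasers in materials processing",
    "c": "digital library linking",
}
results = quorum_search("temperature quantum-well lasers", DOCS, quorum=1)
print(results)
```

Raising `quorum` toward the number of query terms moves the behavior toward a strict AND search, while lowering it toward 1 moves it toward OR.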

Page 40

Reference Assistant Top Menu

Page 43

Simultaneous Search Implementations

• Shared Blackboard approach employing Independent Searchbots dedicated to searching information resources and passing results to Web clients.

• Event-Driven, Asynchronous HTTP Queries from within a Single Script returning results to Web browser.

Page 44

Event-Driven, Asynchronous Queries
• Single, event-driven web server process, asynchronously querying multiple resources.
• Uses WinHTTP from ASP and VBScript.
• Simpler, not as flexible. Search algorithms and processing coded in scripts.
• This is the approach we currently use for our service.
• Implementation of multi-step login and session variable passthru being investigated.
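
The service described above is built on WinHTTP from ASP and VBScript. The sketch below swaps in Python's `asyncio` purely to show the same idea of one process issuing several queries concurrently and collecting the results. The resource names, their canned contents, and the simulated delay are all stand-ins, and no HTTP requests are made.

```python
import asyncio
from typing import Dict, List, Tuple

# Hypothetical resources with canned results in place of real HTTP queries.
FAKE_RESOURCES: Dict[str, List[str]] = {
    "OPAC": ["Laser physics (book)"],
    "A&I service": ["Quantum-well lasers (citation)", "Laser diodes (citation)"],
    "Full-text repository": [],
}


async def search_one(name: str, query: str) -> Tuple[str, List[str]]:
    """Pretend to query one resource; the sleep stands in for network delay."""
    await asyncio.sleep(0.01)
    hits = [r for r in FAKE_RESOURCES[name] if query.lower() in r.lower()]
    return name, hits


async def search_all(query: str) -> Dict[str, List[str]]:
    """Start every query at once and gather results as they complete."""
    tasks = [search_one(name, query) for name in FAKE_RESOURCES]
    return dict(await asyncio.gather(*tasks))


results = asyncio.run(search_all("laser"))
for name, hits in results.items():
    print(name, len(hits))
```

The total wait is roughly that of the slowest resource rather than the sum of all of them, which is the practical benefit of querying asynchronously.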

Page 45

OpenURL-Based Services

• Standard for expressing and transmitting metadata.

• Promise of standardized, normalized search results.

• Provides value-added links to the Ovid search results.

• Using CrossRef metadata database to look up DOIs.

Page 46

CiteParse.dll
• An ActiveX DLL which can parse various Ovid citations and turn them into OpenURLs:
• Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. Temperature analysis … quantum-well lasers. [Article] IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun.
• http://…/resolver.asp?genre=article&aulast=Tansu&auinit1=N&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&volume=38&issue=6&spage=640&epage=651&pages=640-651&date=2002-06
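
The internals of CiteParse.dll are not shown on the slide. As a rough illustration of the same transformation, the sketch below parses that one citation layout with regular expressions and emits the OpenURL query keys seen in the example link. The patterns and the name `parse_citation` are assumptions that cover only this layout, and the resolver address is left out because the slide elides it.

```python
import re
from urllib.parse import urlencode

MONTHS = {name: f"{number:02d}" for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# "Surname INITIALS. " repeated, then the article title and a final period.
AUTHORS_TITLE = re.compile(r"^((?:[A-Z][\w'-]+ [A-Z]{1,3}\. )+)(.+)\.$")
# "Journal. volume(issue):spage-epage, year Mon."
SOURCE = re.compile(
    r"^(?P<title>.+)\. (?P<volume>\d+)\((?P<issue>\d+)\):"
    r"(?P<spage>\d+)-(?P<epage>\d+), (?P<year>\d{4}) (?P<month>[A-Z][a-z]{2})\.$")


def parse_citation(citation: str) -> dict:
    """Turn one Ovid-style article citation into OpenURL key/value pairs."""
    head, marker, tail = citation.partition(" [Article] ")
    head_match = AUTHORS_TITLE.match(head)
    tail_match = SOURCE.match(tail)
    if not (marker and head_match and tail_match):
        raise ValueError("citation does not match the assumed layout")
    aulast, initials = head_match.group(1).split(". ")[0].split(" ")
    src = tail_match.groupdict()
    return {
        "genre": "article",
        "aulast": aulast,
        "auinit1": initials[0],
        "atitle": head_match.group(2),
        "title": src["title"],
        "volume": src["volume"],
        "issue": src["issue"],
        "spage": src["spage"],
        "epage": src["epage"],
        "pages": f'{src["spage"]}-{src["epage"]}',
        "date": f'{src["year"]}-{MONTHS[src["month"]]}',
    }


CITATION = ("Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. "
            "Mawst LJ. Temperature analysis … quantum-well lasers. [Article] "
            "IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun.")
fields = parse_citation(CITATION)
query = urlencode(fields)  # query string to append to a resolver address
print(query)
```

Note that `urlencode` percent-encodes the ellipsis character in the title, so the printed query differs from the slide's link at that one point.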

Page 47

Conclusions
• User reactions very positive.
• The one-stop-shopping approach has been successful.
• Users consider ability to link to full-text from citations in A & I Services and from references on publisher portals very helpful.
• Technically, best approach appears to be a hybrid of asynchronous client interface with Web Services querying databases. Moves database middleware to Web Services and eliminates extensive custom script code for search and database query.

Page 48

Publishing Trends
• Publishers will continue to add value to online journal articles.
• Digital version will become version of record.
• Virtual journals (both publisher-based and cross-publisher) will become common.
• Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.

Page 49

Publishing Trends (Continued)
• Personalized services will be available -- agent technology, alerting services.
• Different economic and subscription models will be introduced.
• Deconstruction of Journal (Bob Kelly, APS); article at a time publishing.
• Journal branding or perhaps publisher branding.
• Academia issues: publishing, tenure.

Page 50

Continuing Issues
• Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services.
• Disintermediation may affect both Libraries and Publishers.
• Information as Function not Place.
• Provide a ‘Digital Library’ out of digital collections.
• Role of XML technology.
• Service mechanisms: processing & archiving, search and discovery, presentation, linking.