Reference Linking in Project Euclid

…with some thoughts on the preservation of digital collections.

A presentation at the Workshop on Linking and searching in distributed digital libraries
University of Michigan, Ann Arbor, University Library
March 19, 2002

William R. Kehoe [email protected]
Digital Library and Information Technologies
Cornell University Library

March 19, 2002 William R. Kehoe, Digital Library and Information Technology, Cornell University Library

Overview

• Context – what is Project Euclid?
• Requirements – the constraints for the reference linking system
• Implementation – some design views
• Next Steps – our plans for the future
• Preservation – thinking long-term about digital collections

What is Project Euclid?

A partnership of independent publishers of mathematics and statistics journals

Publishers provide born-digital versions of their print journals.

http://projecteuclid.org

Reference Linking: two viewpoints

The publisher’s point of view
• Links to multiple resources add value to the electronic version.
• MR numbers, CrossRef DOIs, and web links are included in the reference when we find them.

The library’s point of view
• The appropriate copy problem: does a link lead to a copy for which the library has viewing/distribution rights?
• Is the copy an authentic representation of the original?

Project Euclid represents publishers

Purpose

References in article files are made available as links on HTML abstract pages

[Diagram: <<PDF>> → <<HTML>>]

Requirements

• Automatic processing
• Extensibility to multiple reference styles
• Extensibility to multiple input formats
• Low-cost maintenance
• High accuracy

Implementation

Conversion → Extraction → Parsing → Look-up → Creating Links → Storing

[Diagram: <<PDF>> and <<XML>> documents, each containing Title, Author and affiliation, Abstract, Body, References]

Conversion

The converter is Derek Noonburg’s “pdftotext” utility. http://www.foolabs.com/xpdf/home.html

[Diagram: <<PDF>> → Converter → <<Text>>; each document contains Title, Author and affiliation, Abstract, Body, References]
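A minimal sketch of how the conversion step might be driven from a script, assuming the `pdftotext` command-line tool is installed and on the PATH. The slides do not show how the converter is invoked, so the function names and the `.pdf` to `.txt` naming convention below are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, text_path, keep_layout=False):
    """Return the argument list for: pdftotext [-layout] PDF-file text-file."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout.
        command.append("-layout")
    command += [str(pdf_path), str(text_path)]
    return command


def convert_pdf_to_text(pdf_path, text_path=None):
    """Run pdftotext and return the path of the text file it wrote."""
    pdf_path = Path(pdf_path)
    if text_path is None:
        # Hypothetical convention: article.pdf -> article.txt
        text_path = pdf_path.with_suffix(".txt")
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(build_pdftotext_command(pdf_path, text_path), check=True)
    return Path(text_path)
```

Keeping the command construction separate from the call makes the invocation easy to inspect without running the converter.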

Conversion/Extraction activity diagram

Extraction

A fragment of the Perl module that extracts the references from the text version of an article
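The Perl fragment itself is not reproduced in this text. As an illustration only, here is a Python sketch of one way such an extraction could work. It assumes the reference list follows a line reading "References" or "Bibliography" and that each entry starts with a bracketed label such as [Ar] or [12]; both conventions and all names are assumptions, not the module's actual logic.

```python
import re

# Assumed section headings; real journals may use others.
HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)
# Assumed entry label: a bracketed key at the start of a line.
LABEL = re.compile(r"^\s*\[[^\]]+\]")


def extract_references(article_text):
    """Return the reference entries found after the last matching heading."""
    lines = article_text.splitlines()
    start = None
    for index, line in enumerate(lines):
        if HEADING.match(line):
            start = index + 1  # keep the last match in the file
    if start is None:
        return []

    entries = []
    for line in lines[start:]:
        text = line.strip()
        if not text:
            continue
        if LABEL.match(text) or not entries:
            entries.append(text)       # a new entry begins
        else:
            entries[-1] += " " + text  # continuation of a wrapped entry
    return entries
```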

Object view

Reference
    MRNum
    Year
    Title
    DOI
    Journal
    String
    LinkedString

Parsing Method Factory
    getMRNum()
    getYear()
    getDOI()
    getTitle()
    getJournal()
    … more …

Parsing

Each element of a Reference is extracted by a subroutine customized for how the element appears in a particular journal style.
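The idea of per-style subroutines handed out by a factory could be sketched as follows. This is an illustration in Python, not the project's code: the style name "volume-year-style", the two regular expressions, and every function name are hypothetical, and only two elements are parsed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Reference:
    string: str
    year: Optional[str] = None
    journal: Optional[str] = None


# Subroutines for one hypothetical style: "..., Journal Vol (Year), pages."
def year_in_parentheses(text):
    match = re.search(r"\((\d{4})\)", text)
    return match.group(1) if match else None


def journal_before_volume(text):
    # Journal = the comma-free text just before "Vol (Year)".
    match = re.search(r",\s*([^,]+?)\s+\d+\s+\(\d{4}\)", text)
    return match.group(1) if match else None


# Hypothetical registry: style name -> {element name: parsing subroutine}
PARSERS: Dict[str, Dict[str, Callable[[str], Optional[str]]]] = {
    "volume-year-style": {
        "year": year_in_parentheses,
        "journal": journal_before_volume,
    },
}


def get_parser(style, element):
    """Factory: return the subroutine that extracts `element` for `style`."""
    return PARSERS[style][element]


def parse_reference(text, style):
    """Fill a Reference by running every subroutine registered for the style."""
    reference = Reference(string=text)
    for element in PARSERS[style]:
        setattr(reference, element, get_parser(style, element)(text))
    return reference
```

Supporting another journal style would then mean adding one more entry to the registry rather than changing the calling code.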

Look-up

Query

|IEEE Trans. Automat. Control|chang||||1994||||Stability, queue length and delay of deterministic and stochastic queue
|SIAM J. Control Optim.|Dupuis||||1989||||

Result set

0018-9286|IEEE Trans. Automat. Control|Chang|39|5|913|1994|||95b:90029|Stability, queue length, and delay of deterministic and stochastic queueing networks.
|SIAM J. Control Optim.|Dupuis||||1989||||
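The query and result lines are pipe-delimited records with eleven positions. The sketch below builds and parses such lines in Python. The field names are guesses inferred from the one populated sample line (ISSN, journal, author, volume, issue, first page, year, two unidentified positions, MR number, title); they are assumptions, not documented names, and the code does not contact any lookup service.

```python
# Field order inferred from the sample result line. Positions 7 and 8 are
# empty in every sample, so their meaning is unknown and they stay unnamed.
FIELDS = ["issn", "journal", "author", "volume", "issue", "first_page",
          "year", "unknown_7", "unknown_8", "mr_number", "title"]


def build_query(**known):
    """Build one pipe-delimited query line; fields not supplied stay empty."""
    unexpected = set(known) - set(FIELDS)
    if unexpected:
        raise ValueError(f"unknown field(s): {sorted(unexpected)}")
    return "|".join(known.get(name, "") for name in FIELDS)


def parse_result(line):
    """Split one result line into a dict keyed by the assumed field names."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))
```

A record whose `mr_number` comes back empty, like the second sample line, is one for which no match was found, so no link would be created.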

Link Creation

<string>[Ar] V. ARNOLD , A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993 1000.</string>

<linkedString>[Ar] V. ARNOLD , A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993 1000. <a href="http://www.ams.org/mathscinet-getitem?mr=90h:32025" target="_blank">MR 90h:32025</a></linkedString>

An HTML anchor tag is inserted into the reference string and saved to an XML file. The User Interface module later uses the linkedString element when creating an Article Abstract page on the fly. It doesn’t have to know how to create the link.
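The transformation from the string to the linkedString shown above can be sketched in a few lines of Python. The URL pattern, the `target` attribute, and the "MR …" link text are taken from the example; the function name is hypothetical and the project's own link-building code is not shown in the slides.

```python
# URL pattern copied from the example linkedString.
MATHSCINET_URL = "http://www.ams.org/mathscinet-getitem?mr="


def make_linked_string(reference_string, mr_number):
    """Append a MathSciNet anchor to the reference, or return it unchanged."""
    if not mr_number:
        # No MR number was found during look-up: nothing to link.
        return reference_string
    anchor = (f'<a href="{MATHSCINET_URL}{mr_number}" target="_blank">'
              f"MR {mr_number}</a>")
    return f"{reference_string} {anchor}"
```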

Storing

<referenceList>
    <reference>
        <refString></refString>
        <linkedString></linkedString>
        <title></title>
        <journal></journal>
        … more elements …
    </reference>
    <reference>
        … elements …
    </reference>
</referenceList>

Stored as an XML file
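A minimal sketch of writing this structure with Python's standard `xml.etree.ElementTree`. Only the four element names visible on the slide are used, and the function names are hypothetical. Note one assumption: ElementTree stores the anchor inside linkedString as escaped text, and the slides do not say whether the original files hold it escaped or as nested markup.

```python
import xml.etree.ElementTree as ET

# The element names shown on the slide; the elided ones are not guessed at.
ELEMENT_NAMES = ("refString", "linkedString", "title", "journal")


def build_reference_list(references):
    """Build a <referenceList> element from a list of dicts."""
    root = ET.Element("referenceList")
    for fields in references:
        reference = ET.SubElement(root, "reference")
        for name in ELEMENT_NAMES:
            child = ET.SubElement(reference, name)
            # Markup characters in the value are escaped on serialization.
            child.text = fields.get(name, "")
    return root


def save_reference_list(root, path):
    """Write the tree to `path` as a UTF-8 XML file."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```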

Display

An element in an XML file provides…

…an HTML link on the article’s abstract page …

… which links to a MathSciNet page
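As a sketch of the display step, the Python below reads a reference list and emits one HTML list item per reference, using linkedString when it is present. It assumes the anchor markup is stored as escaped text inside linkedString and that the element names are those shown on the Storing slide; the function name and the `<li>` wrapping are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_reference_items(xml_text):
    """Return one <li> per reference, preferring linkedString over refString."""
    root = ET.fromstring(xml_text)
    items = []
    for reference in root.findall("reference"):
        linked = reference.findtext("linkedString")
        plain = reference.findtext("refString", default="")
        # linkedString is emitted as stored because it already carries its
        # anchor tag; the plain string is escaped for safe display.
        items.append(f"<li>{linked if linked else escape(plain)}</li>")
    return items
```

The rendering code only copies the stored string, which matches the point that the user interface does not need to know how links are built.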

Next Steps

• More journals
• Adding DOIs to the abstract page
• Conversion from LaTeX files
• Digitized back issues

Addendum on Digital Preservation

Libraries and others are considering ways to preserve our digital resources for the long term.

One possible solution is the LOCKSS system (Lots of Copies Keep Stuff Safe).

Another solution is to preserve the metadata needed to describe and reconstruct a collection while preserving and providing access to the data files. The Consultative Committee for Space Data Systems (CCSDS) has published a Reference Model for an Open Archival Information System (OAIS). Many people working with digital collections in libraries and archives are using this model to plan for long-term preservation.

From the Reference Model for an Open Archival Information System (OAIS)

Archival Information Package
    Content Information
        Data Object (<< file >> Digital Object)
        Representation Information
    Preservation Description Information
        Reference Information
        Provenance Information
        Context Information
        Fixity Information

Most digital collections contain some form of the objects in blue.

OAIS-compliant systems also contain the metadata objects in yellow
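One loose way to picture the package structure is as nested records. The Python sketch below is an illustration, not part of the OAIS model: the class and field names are paraphrased from the diagram labels, the metadata values are plain strings for brevity, and using a SHA-256 digest as the fixity information is an assumption (the model does not prescribe a particular mechanism).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ContentInformation:
    data_object: bytes                     # the digital object (file contents)
    representation_information: List[str]  # e.g. notes on the file format


@dataclass
class PreservationDescriptionInformation:
    reference: str   # identifier for the content
    provenance: str  # origin and custody history
    context: str     # relationship to other content
    fixity: str      # assumed here: SHA-256 hex digest of the data object


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation


def make_package(data, representation, reference, provenance, context):
    """Assemble a package and record a digest of the data as fixity."""
    digest = hashlib.sha256(data).hexdigest()
    return ArchivalInformationPackage(
        ContentInformation(data, list(representation)),
        PreservationDescriptionInformation(reference, provenance, context, digest),
    )


def verify_fixity(package):
    """Return True if the data object still matches its recorded digest."""
    current = hashlib.sha256(package.content.data_object).hexdigest()
    return current == package.pdi.fixity
```

Re-running `verify_fixity` over time is one simple way to detect that a stored file has changed.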

OAIS Functional Model

From the Reference Model for an Open Archival Information System (OAIS)

For More Information…

• Project Euclid: http://projecteuclid.org
• MR Batch Lookup: http://www.ams.org/mrlookup-support/technical_help.html#http
• Consultative Committee for Space Data Systems: http://www.ccsds.org
• Reference Model for an Open Archival Information System (OAIS): http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
• LOCKSS: http://lockss.stanford.edu