Reference Linking in Project Euclid
…with some thoughts on the preservation of digital collections.
A presentation at the Workshop on
Linking and searching in distributed digital libraries
University of Michigan, Ann Arbor, University Library
March 19, 2002
William R. Kehoe
Digital Library and Information Technologies
Cornell University Library
March 19, 2002 William R. Kehoe, Digital Library and Information Technology, Cornell University Library
Overview
• Context: what is Project Euclid?
• Requirements: the constraints for the reference linking system
• Implementation: some design views
• Next Steps: our plans for the future
• Preservation: thinking long-term about digital collections
What is Project Euclid?
A partnership of independent publishers of mathematics and statistics journals
Publishers provide born-digital versions of their print journals.
http://projecteuclid.org
Reference Linking: two viewpoints
• The publisher's point of view
  • Links to multiple resources add value to the electronic version.
  • MR numbers, CrossRef DOIs, and web links are included in the reference when we find them.
• The library's point of view
  • The appropriate copy problem: does a link lead to a copy for which the library has viewing/distribution rights?
  • Is the copy an authentic representation of the original?
• Project Euclid represents publishers.
Purpose
References in article files are made available as links on HTML abstract pages.
[Figure: a <<PDF>> article file and its <<HTML>> abstract page]
Requirements
• Automatic processing
• Extensibility to multiple reference styles
• Extensibility to multiple input formats
• Low-cost maintenance
• High accuracy
Implementation
• Conversion
• Extraction
• Parsing
• Look-up
• Creating Links
• Storing
[Figure: a <<PDF>> file and an <<XML>> file, each made up of Title, Author and affiliation, Abstract, Body, References]
Conversion
The converter is Derek Noonburg's “pdftotext” utility: http://www.foolabs.com/xpdf/home.html
[Figure: the Converter takes <<PDF>> files and produces <<Text>> files with the same parts: Title, Author and affiliation, Abstract, Body, References]
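A minimal sketch of how the conversion step could be driven from a script, assuming pdftotext is installed and on the PATH. It is written in Python for illustration only; the function names are hypothetical and are not taken from the Project Euclid code. The only pdftotext features used are the basic `pdftotext PDF-file text-file` invocation and the `-layout` option.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, text_path, keep_layout=False):
    """Return the argument list for one pdftotext run (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    command += [str(pdf_path), str(text_path)]
    return command


def convert_article(pdf_path, text_path=None):
    """Convert one article PDF to plain text and return the text file path."""
    pdf_path = Path(pdf_path)
    # Default: write the text next to the PDF with a .txt suffix (an assumption).
    text_path = Path(text_path) if text_path else pdf_path.with_suffix(".txt")
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    subprocess.run(build_pdftotext_command(pdf_path, text_path), check=True)
    return text_path
```

Keeping the command construction separate from the call makes the step easy to check without a PDF at hand, which suits the "automatic processing" requirement.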
Conversion/Extraction activity diagram
Extraction
A fragment of the Perl module that extracts the references from the text version of an article.
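The Perl fragment itself is not reproduced in this text. As a rough illustration of the idea, here is a Python sketch, not the author's module. It assumes the reference list follows a line reading "References" or "Bibliography", and that each entry starts with a bracketed label such as [Ar]. Both conventions are assumptions and vary by journal.

```python
import re

# Assumed conventions; real journal styles differ.
HEADING = re.compile(r"^\s*(references|bibliography)\s*$",
                     re.IGNORECASE | re.MULTILINE)
LABEL = re.compile(r"^\s*\[[^\]]+\]\s+", re.MULTILINE)


def extract_references(article_text):
    """Return the reference strings found after the last references heading."""
    headings = list(HEADING.finditer(article_text))
    if not headings:
        return []
    # Everything after the last heading is treated as the reference list.
    tail = article_text[headings[-1].end():]
    starts = [match.start() for match in LABEL.finditer(tail)]
    references = []
    for begin, end in zip(starts, starts[1:] + [len(tail)]):
        # Join the wrapped lines of one entry into a single string.
        references.append(" ".join(tail[begin:end].split()))
    return references
```

Each returned string is one candidate reference, ready to be handed to the parsing step.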
Object view
[Figure: class diagram]
• Reference: MRNum, Year, Title, DOI, Journal, String, LinkedString
• Parsing Method Factory: getMRNum(), getYear(), getDOI(), getTitle(), getJournal(), … more …
Parsing
Each element of a Reference is extracted by a subroutine customized for how the element appears in a particular journal style.
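One way to picture the per-style subroutines is a lookup table that maps a journal style to its element extractors. The Python sketch below is illustrative only. The style names "style-a" and "style-b", the two year rules, and all function names are hypothetical; only the idea of a factory handing out style-specific methods comes from the slides.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    string: str
    year: Optional[str] = None


def year_in_parentheses(ref_string):
    """Assumed style A: the year appears as '(1989)'."""
    match = re.search(r"\((\d{4})\)", ref_string)
    return match.group(1) if match else None


def year_trailing(ref_string):
    """Assumed style B: the year is the last four-digit number."""
    years = re.findall(r"\b(\d{4})\b", ref_string)
    return years[-1] if years else None


# Hypothetical style names mapped to their element extractors.
PARSERS = {
    "style-a": {"year": year_in_parentheses},
    "style-b": {"year": year_trailing},
}


def parsing_method_factory(style):
    """Return the extractors registered for one journal style."""
    try:
        return PARSERS[style]
    except KeyError:
        raise ValueError(f"no parsing methods registered for {style!r}")


def parse_reference(ref_string, style):
    """Fill a Reference using the methods for the given style."""
    reference = Reference(string=ref_string)
    for element, extract in parsing_method_factory(style).items():
        setattr(reference, element, extract(ref_string))
    return reference
```

Supporting a new reference style then means registering another set of extractors rather than changing the caller. The per-style rule matters: applied to a reference whose page range ends in 1000, the trailing-year rule would return the page number instead of the year.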
Look-up
Query
|IEEE Trans. Automat. Control|chang||||1994||||Stability, queue length and delay of deterministic and stochastic queue
|SIAM J. Control Optim.|Dupuis||||1989||||
Result set
0018-9286|IEEE Trans. Automat. Control|Chang|39|5|913|1994|||95b:90029|Stability, queue length, and delay of deterministic and stochastic queueing networks.
|SIAM J. Control Optim.|Dupuis||||1989||||
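The records above are pipe-delimited with eleven fields. The Python sketch below builds a query line and reads a result line. The field names are guesses inferred from the example, not taken from the MR Lookup documentation, and the two fields that are empty in every example are left as placeholders.

```python
# Field names are assumptions read off the example records.
FIELDS = [
    "issn", "journal", "author", "volume", "issue", "first_page",
    "year", "field_8", "field_9", "mr_number", "title",
]


def build_query(journal="", author="", year="", title=""):
    """Build one query line; fields we do not know are left empty."""
    record = dict.fromkeys(FIELDS, "")
    record.update(journal=journal, author=author, year=year, title=title)
    return "|".join(record[name] for name in FIELDS)


def parse_result(line):
    """Split one result line into a dict keyed by the assumed field names."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_match(record):
    """Treat a record as matched when an MR number came back."""
    return bool(record["mr_number"])
```

Under this reading, the first result is a match because it comes back with an MR number and the other missing fields filled in, while the second is returned unchanged and counts as unmatched.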
Link Creation
<string>[Ar] V. ARNOLD , A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993 1000.</string>
<linkedString>[Ar] V. ARNOLD , A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993 1000. <a href="http://www.ams.org/mathscinet-getitem?mr=90h:32025" target="_blank">MR 90h:32025</a></linkedString>
An HTML anchor tag is inserted into the reference string and saved to an XML file. The User Interface module later uses the linkedString element when creating an Article Abstract page on the fly. It doesn’t have to know how to create the link.
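The link-creation step can be sketched in a few lines of Python. The URL prefix is copied from the example link above; the function name and the escaping policy are assumptions made for this illustration.

```python
from html import escape

# URL prefix taken from the example link in the slide.
MATHSCINET_URL = "http://www.ams.org/mathscinet-getitem?mr="


def make_linked_string(ref_string, mr_number=None):
    """Return the reference text, followed by a MathSciNet anchor if known."""
    text = escape(ref_string, quote=False)
    if not mr_number:
        # No look-up hit: the linked string is just the reference text.
        return text
    href = escape(MATHSCINET_URL + mr_number, quote=True)
    label = escape(mr_number)
    return f'{text} <a href="{href}" target="_blank">MR {label}</a>'
```

Because the anchor is built once and stored, the page-rendering code only has to print the stored string.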
Storing
Stored as an XML file:
<referenceList>
  <reference>
    <refString></refString>
    <linkedString></linkedString>
    <title></title>
    <journal></journal>
    … more elements …
  </reference>
  <reference>
    … elements …
  </reference>
</referenceList>
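A sketch of writing such a file with Python's standard xml.etree.ElementTree module. The element names follow the slide. The dictionary keys ("string", "mr_url", "mr_label", and so on) are hypothetical, and only a few of the elements are shown.

```python
import xml.etree.ElementTree as ET


def build_reference_list(references):
    """Build a <referenceList> tree from a list of dicts (assumed keys)."""
    root = ET.Element("referenceList")
    for ref in references:
        node = ET.SubElement(root, "reference")
        ET.SubElement(node, "refString").text = ref["string"]
        linked = ET.SubElement(node, "linkedString")
        linked.text = ref["string"]
        if ref.get("mr_url"):
            # The anchor is stored as a child element of linkedString.
            linked.text += " "
            anchor = ET.SubElement(linked, "a",
                                   href=ref["mr_url"], target="_blank")
            anchor.text = ref["mr_label"]
        ET.SubElement(node, "title").text = ref.get("title", "")
        ET.SubElement(node, "journal").text = ref.get("journal", "")
    return root


def to_xml(root):
    """Serialize the tree to a string."""
    return ET.tostring(root, encoding="unicode")
```

Reading the file back with `ET.fromstring` gives the user interface the stored `linkedString` without any knowledge of how the link was made.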
Display
An element in an XML file provides…
…an HTML link on the article's abstract page…
…which links to a MathSciNet page.
Next Steps
• More journals
• Adding DOIs to the abstract page
• Conversion from LaTeX files
• Digitized back issues
Addendum on Digital Preservation
Libraries and others are considering ways to preserve our digital resources for the long term.
One possible solution is the LOCKSS system (Lots of Copies Keep Stuff Safe).
Another solution is to preserve the metadata needed to describe and reconstruct a collection while preserving and providing access to the data files. The Consultative Committee for Space Data Systems has published a Reference Model for an Open Archival Information System (OAIS). Many of the persons working with digital collections in the library and archive world are using this model to plan for long-term preservation.
From the Reference Model for an Open Archival Information System (OAIS)
[Figure: class diagram of the Archival Information Package, showing Archival Information Package, Content Information, Data Object (a << file >> Digital Object), Representation Information, Preservation Description Information, Reference Information, Provenance Information, Context Information, Fixity Information]
Most digital collections contain some form of the objects in blue.
OAIS-compliant systems also contain the metadata objects in yellow.
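To make the package structure concrete, here is a small Python sketch of an Archival Information Package as plain data classes. The class and field names paraphrase the OAIS terms. Using a SHA-256 digest as the Fixity Information is an assumption for this example; the reference model does not prescribe a particular checksum.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes        # the digital object itself
    representation_info: str  # how to interpret the bits, e.g. a format name


@dataclass
class PreservationDescriptionInformation:
    reference: str            # identifier for the content
    provenance: str           # origin and custody history
    context: str              # relationship to other content
    fixity: str = ""          # checksum over the data object


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation

    def _digest(self):
        return hashlib.sha256(self.content.data_object).hexdigest()

    def seal(self):
        """Record the current checksum as the fixity information."""
        self.pdi.fixity = self._digest()

    def verify(self):
        """Check that the data object still matches the recorded checksum."""
        return self.pdi.fixity == self._digest()
```

A periodic `verify()` pass over a collection is one simple way the fixity metadata can be put to work in detecting silent corruption.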
OAIS Functional Model
From the Reference Model for an Open Archival Information System (OAIS)
For More Information…
• Project Euclid: http://projecteuclid.org
• MR Batch Lookup: http://www.ams.org/mrlookup-support/technical_help.html#http
• Consultative Committee for Space Data Systems: http://www.ccsds.org
• Reference Model for an Open Archival Information System (OAIS): http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
• LOCKSS: http://lockss.stanford.edu