Technology Choices for the JSTOR Online Archive

Presented by Chang Feng
Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO 65211

Reference

S. W. Thomas, K. Alexander, and K. Guthrie, "Technology Choices for the JSTOR Online Archive," Computer, February 1999, pp. 60-65.

JSTOR Overview

Goals: To increase access to older scholarly materials by converting them to digital media and providing a full-text search capability.

Benefits: Preserving the original documents and conserving library shelf space.

Development phases:
– Phase I (scheduled for completion by the end of 1999): a minimum of 100 journal titles, primarily in the humanities and social sciences.
– As of December 1998: 67 journal titles, totaling 450,000 articles and 2.7 million pages.

Implementing JSTOR

Principles:
– Let the mission guide technical choices.
– Put the user first.

Issues to be addressed when building the digital library:
– Formats (e.g., image vs. formatted text)
– Storage, display, and distribution technologies (e.g., CD-ROM vs. Internet)

Implementing JSTOR

Mission: A reliable and faithful electronic archive.
Choice of technology: A 600 dpi scanned image of each page.

Mission: Full-text searchability.
Choice of technology: OCR software creates text files that let users search the journals' full text.

Mission: Reduce long-term library costs.
Choice of technology: Centralized database storage, with distribution over the Internet.
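A rough calculation shows why the 600 dpi masters are converted before they reach a browser. This sketch assumes an 8.5 x 11 inch page scanned bitonally (1 bit per pixel); the slides give neither the page size nor the bit depth, and journal pages are often smaller, so treat the result as an order-of-magnitude estimate. The ~30 Kbyte figure is the delivered GIF size quoted for page delivery.

```python
# Rough size of one uncompressed 600 dpi page image.
# ASSUMPTIONS (not stated on the slides): 8.5 x 11 inch page, bitonal scan (1 bit per pixel).

def uncompressed_page_bytes(dpi: int = 600,
                            width_in: float = 8.5,
                            height_in: float = 11.0,
                            bits_per_pixel: int = 1) -> int:
    """Return the raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)    # 5,100 pixels at 600 dpi
    height_px = round(height_in * dpi)  # 6,600 pixels at 600 dpi
    return width_px * height_px * bits_per_pixel // 8


GIF_PAGE_BYTES = 30 * 1024  # ~30 Kbytes per delivered GIF page (taking 1 Kbyte = 1,024 bytes)

raw = uncompressed_page_bytes()
print(f"raw 600 dpi page: {raw:,} bytes (~{raw / 1e6:.1f} MB)")
print(f"roughly {raw / GIF_PAGE_BYTES:.0f}x the size of a delivered GIF page")
```

Under these assumptions one raw page is about 4.2 MB, roughly 137 times the size of a delivered GIF page. This explains both the compressed online storage and the on-demand conversion to screen resolution.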

Delivering JSTOR Pages

Deliver pages in GIF format (~30 Kbytes per page).
– The system converts each page to screen resolution as needed and caches converted pages for 3-4 days.
– Pages are delivered one at a time, with the next page preloaded.

Print an entire article (at 600 or 150 dpi resolution) using:
– JPrint, a separate application (faster)
– Adobe Acrobat files
– PostScript files
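The delivery strategy above has three parts: convert on demand, cache converted pages for a few days, and preload the next page. A small time-limited cache can illustrate it. This is a sketch, not JSTOR's code. `convert_to_gif` is a hypothetical stand-in for the real 600 dpi to screen-resolution conversion, and the 3.5-day lifetime is an assumed value inside the 3-4 day range on the slide.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# ASSUMED lifetime: 3.5 days, a value within the 3-4 day range on the slide.
CACHE_TTL_SECONDS = 3.5 * 24 * 60 * 60

# Hypothetical cache key: (article id, page number).
PageKey = Tuple[str, int]


def convert_to_gif(article_id: str, page: int) -> bytes:
    """HYPOTHETICAL stand-in for the 600 dpi -> screen-resolution GIF conversion."""
    return f"GIF:{article_id}:{page}".encode()


class ConvertedPageCache:
    """Caches converted page images for a fixed lifetime and preloads the next page."""

    def __init__(self, convert: Callable[[str, int], bytes],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._convert = convert
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[PageKey, Tuple[float, bytes]] = {}
        self.conversions = 0  # number of times the converter actually ran

    def _lookup(self, key: PageKey) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        created, data = entry
        if self._clock() - created > self._ttl:
            del self._entries[key]  # expired: drop it so it is reconverted
            return None
        return data

    def _load(self, key: PageKey) -> bytes:
        data = self._lookup(key)
        if data is None:
            data = self._convert(*key)
            self.conversions += 1
            self._entries[key] = (self._clock(), data)
        return data

    def get_page(self, article_id: str, page: int, last_page: int) -> bytes:
        """Return one page and warm the cache with the following page."""
        data = self._load((article_id, page))
        if page < last_page:
            # A real server would preload asynchronously; done inline here for simplicity.
            self._load((article_id, page + 1))
        return data
```

For example, `ConvertedPageCache(convert_to_gif).get_page("a1", 1, last_page=12)` converts page 1 and preloads page 2, so the reader's next request is served from the cache.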

Searching JSTOR

Graphical search interface.

Stores the full text in one file per page; each article also has a citation file.

Text files have embedded tags that specify which parts of the text belong to which article.

Separate index for each journal title.

Articles are indexed using Full-Text Lexicographer (University of Michigan):
– Allows dynamic updating (no index downtime).
– Periodically optimizes the index, also with no downtime.
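A small inverted index can illustrate this layout: one text file per page, embedded tags marking article boundaries, and a separate index per journal. Everything here is hypothetical, including the tag syntax (`<article id="...">`), the tokenizer, and all names. The slides do not describe Full-Text Lexicographer's internals, so this sketch shows only the data layout, not that engine.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# HYPOTHETICAL tag syntax marking where each article's text begins on a page.
ARTICLE_TAG = re.compile(r'<article id="([^"]+)">')
WORD = re.compile(r"[a-z0-9]+")


def split_page(page_text: str, open_article: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split one page's text into (article_id, text) runs.

    Text before the first tag belongs to `open_article`, the article still
    running at the end of the previous page. Also returns the article that is
    still open at the end of this page.
    """
    runs: List[Tuple[str, str]] = []
    current, pos = open_article, 0
    for match in ARTICLE_TAG.finditer(page_text):
        runs.append((current, page_text[pos:match.start()]))
        current, pos = match.group(1), match.end()
    runs.append((current, page_text[pos:]))
    # Drop empty runs and any text that precedes the first known article.
    return [(art, text) for art, text in runs if art and text.strip()], current


class JournalIndex:
    """Inverted index for ONE journal title: word -> {(article_id, page)}."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[Tuple[str, int]]] = defaultdict(set)
        self._open_article = ""

    def add_page(self, page_number: int, page_text: str) -> None:
        """Index one page file; pages must be added in reading order.

        Each call updates the postings in place, so new pages become searchable
        without rebuilding the whole index.
        """
        runs, self._open_article = split_page(page_text, self._open_article)
        for article_id, text in runs:
            for word in WORD.findall(text.lower()):
                self.postings[word].add((article_id, page_number))

    def search(self, word: str) -> Set[str]:
        """Return the ids of articles in this journal that contain `word`."""
        return {art for art, _ in self.postings.get(word.lower(), set())}


# One separate index per journal title, keyed by a hypothetical journal code.
indexes: Dict[str, JournalIndex] = defaultdict(JournalIndex)
```

Keeping one index per journal keeps each index small. It also means that adding a newly released journal run touches only that journal's index.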

Browser Interoperability

Major issue: backward compatibility.
– Support the HTML 3.2 standard.
– The JSTOR interface uses frames, but automatically adjusts itself to an unframed interface when needed.
– Use new technology to enhance functionality, but not to provide basic functionality.
– Plug-ins are not encouraged.

JSTOR Server Infrastructure

Storage:
– Online: 600 dpi TIFF page images compressed with Cartesian Perceptual Compression (1:4, CPI Inc.).
– Offline: multiple copies of the original TIFF images for archival purposes.

Performance:
– Replacing CGI programs with FastCGI or Java servlets.
– Server mirroring.

Issues of Server Mirroring

Mirror server load balancing: currently uses a round-robin method.

Mirror server synchronization: currently, new releases (more than 1 GB per month) are shipped overnight on magnetic tape to the mirror sites.

User state synchronization: currently, the current server either
– regenerates the data locally if possible, or
– requests the information from the server that originally created it and caches that copy for future use.
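Both mirroring policies on this slide fit in a few lines. `itertools.cycle` gives a simple round-robin rotation over the mirrors. The state lookup tries local regeneration first, then fetches from the originating server and caches that copy. The mirror names, the `regenerate` and `fetch_from_origin` callbacks, and the state-key format are hypothetical placeholders, not JSTOR interfaces.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional


class RoundRobinBalancer:
    """Hands out mirror servers in strict rotation (round-robin)."""

    def __init__(self, mirrors: Iterable[str]) -> None:
        servers = list(mirrors)
        if not servers:
            raise ValueError("at least one mirror is required")
        self._cycle = itertools.cycle(servers)

    def next_mirror(self) -> str:
        return next(self._cycle)


class UserStateStore:
    """User state as seen by one server, e.g. a saved search result set."""

    def __init__(self,
                 regenerate: Callable[[str], Optional[str]],
                 fetch_from_origin: Callable[[str, str], str]) -> None:
        self._regenerate = regenerate      # HYPOTHETICAL: rebuild state locally, or None
        self._fetch = fetch_from_origin    # HYPOTHETICAL: (origin_server, key) -> state
        self._remote_cache: Dict[str, str] = {}
        self.remote_fetches = 0

    def get(self, state_key: str, origin_server: str) -> str:
        # 1. Regenerate the data at the current server if possible.
        data = self._regenerate(state_key)
        if data is not None:
            return data
        # 2. Otherwise reuse a copy fetched earlier, or request it from the
        #    server that originally created it and cache it for future use.
        if state_key not in self._remote_cache:
            self._remote_cache[state_key] = self._fetch(origin_server, state_key)
            self.remote_fetches += 1
        return self._remote_cache[state_key]
```

Round-robin spreads requests evenly but ignores actual server load. A user's consecutive requests can therefore land on different mirrors, which is why user state synchronization is needed at all.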

Authentication

Cross-organization access management: JSTOR currently relies on participating institutions to supply authenticated IP addresses.

Under evaluation:
– digital certificates issued by the participating institutions;
– password-based access control.
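IP-address authorization of this kind means checking each request's source address against the ranges each participating institution has registered. The sketch below uses Python's standard `ipaddress` module. The institution names and address ranges are made-up examples drawn from the RFC 5737 documentation blocks, not real subscriber data, and the registry format is an assumption.

```python
import ipaddress
from typing import Dict, Iterable, List, Optional

Network = ipaddress.IPv4Network


def load_ranges(registry: Dict[str, Iterable[str]]) -> Dict[str, List[Network]]:
    """Parse each institution's registered CIDR ranges."""
    return {inst: [ipaddress.IPv4Network(cidr) for cidr in cidrs]
            for inst, cidrs in registry.items()}


def institution_for(client_ip: str, ranges: Dict[str, List[Network]]) -> Optional[str]:
    """Return the institution whose ranges contain client_ip, or None if unauthorized."""
    addr = ipaddress.IPv4Address(client_ip)
    for inst, networks in ranges.items():
        if any(addr in net for net in networks):
            return inst
    return None


# HYPOTHETICAL registry using RFC 5737 documentation address blocks.
REGISTRY = {
    "Example University": ["192.0.2.0/24"],
    "Example College": ["198.51.100.0/25", "203.0.113.17/32"],
}
RANGES = load_ranges(REGISTRY)
```

A request from `192.0.2.55` would be attributed to "Example University". A request from `198.51.100.200` falls outside every registered range and would be refused.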

Conclusions

The choice of technology is driven by the project's mission and by user feedback.

The technology choices must remain flexible.