Electronic Archiving & JSTOR
Kevin Guthrie
e-icolc, Thessaloniki, Greece
October 2002
www.jstor.org

Overview

• E-Archiving – Defining the Problem

• The Mellon Foundation Grant Program
– Explanation
– Lessons Learned

• E-Archiving Economics

• JSTOR E-Archiving Approach

E-Archiving – Defining the Problem

The Print Archive

• It occurs as a by-product of access. Journals or books are purchased because of need and then retained.

• The content must be held locally.

• It is a system of countless local decisions and there is no system-wide planning effort.

• The buildings housing the libraries lend themselves nicely to fund raising.

• Library volume counts impact competitive standing.

• There is a significant but relatively stable and predictable cost stream for maintenance.

How is an E-Archive Different?

• Challenges
– The dynamic nature of the formats
– We cannot predict the course of future software developments; there can be no black-box technological solution
– We must establish new relationships between preservation and use so that usage leads to preservation. Benign neglect is not effective.

• Opportunities
– Freedom from time and space
– Economies of scale in distribution

E-Archiving is a Growing Problem

• We are in a time of transition

• Users find the electronic version more convenient, and it is becoming the “copy of record”

• Many libraries are bearing double costs (print and electronic), but increasingly they are cancelling print subscriptions and taking only the electronic version

• There is no systematic archiving solution in place

E-Archiving: Assumptions and Basic Premise

• The academic community needs a system of trusted archives of “born digital” journal content

• The trusted archives must have a sustainable economic model and be able to preserve the content over the very long term

E-Archiving – The Mellon Foundation Grant Program

E-Archiving: Mellon Foundation Program

• There were seven grant recipients:
– Cornell
– Harvard
– MIT
– NYPL
– Stanford (LOCKSS)
– University of Pennsylvania
– Yale

• The goal was to find a workable and sustainable model for an ongoing e-archiving effort.

• The program also set out to explore “presentation file” and “source file” options.

Working Assumptions of the Source File Archives

• Archive should be independent of publishers
– the responsibility of institutions for which archiving is a core mission

• Archiving requires active publisher partnership

• Address long timeframes

• Archive design based on the Open Archival Information System (OAIS) model

Working Assumptions of the Source File Archives

• Archive negotiates relationship with publisher

• Publisher deposits content regularly

• Content accompanied by metadata to support discovery and preservation

• Archived content only accessible under specific conditions

• Archive assumes responsibility for long-term preservation

Questions that Arose

• What is archived?
• In what format?
• When is the archive accessible?
• Who can access archived content?
• What does the archive “preserve”?
• Who does the archiving?
• How is the archive paid for?
• How is the archive governed?

Content of e-journals is not just full-length articles

• Journal description
• Editorial board
• Instructions to authors
• Rights and usage terms
• Copyright statement
• Ordering information
• Reprint information
• Indexes, membership lists, errata, etc.

Challenging Content

• Masthead, “front matter” stored as web pages, not in content management systems

• No control over the format of supplementary materials (datasets, images, tables, etc.)

• Advertising very complex
– dynamic, frequently from third parties, can involve country-specific complexities

• Links frequently separate from articles
– regularly updated, sometimes dynamic

File Formats?

• PDF? SGML/XML? HTML? All or none?

• PDF is ubiquitous, but there are concerns
– Proprietary
– Emphasizes presentation, not meaning
– Is it preservable?

• Sometimes it is the only choice

File Formats?

• PDF? SGML/XML? HTML? All or none?

• XML increasingly common

• Migration path seems clearer; XML is more flexible

• Many different DTDs. Can we develop a standard archival exchange DTD?

• NLM/Mulberry/Inera/Harvard effort

Interchange DTD

• How low is the common denominator?

• What gets lost?
– inevitably sacrifices some functionality and original appearance

• Transformation from publisher’s “native” DTD involves risks

• Some technically difficult areas
– extended character sets, mathematical and chemical formulae, tables, “generated text”

Access Terms

• Publishers prefer “dark” archives
– a dark archive does not compete with the publisher’s service

• If “dark”, what “trigger events” make it accessible?
– after a given period of time (“moving wall”)?
– when content is not otherwise accessible (“failsafe”)?
– only when content enters the public domain?
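The three candidate trigger events amount to a small access policy that an archive and a publisher would negotiate. The following is a minimal sketch in Python, assuming a dark archive that records each issue’s publication year and whether the publisher still serves it. The `ArchivedIssue` record, the trigger names, and the rule that the wall is measured in calendar years are hypothetical illustrations, not JSTOR’s or any Mellon participant’s actual policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class ArchivedIssue:
    """Hypothetical record for one journal issue held in a dark archive."""
    publication_year: int
    available_from_publisher: bool  # is the publisher still serving this issue?
    in_public_domain: bool


def moving_wall_open(issue: ArchivedIssue, today: date, wall_years: int) -> bool:
    # "Moving wall": an issue opens once its publication year is more than
    # `wall_years` years before the current year, so the boundary advances yearly.
    return today.year - issue.publication_year > wall_years


def failsafe_open(issue: ArchivedIssue) -> bool:
    # "Failsafe": open only when the content is not otherwise accessible.
    return not issue.available_from_publisher


def public_domain_open(issue: ArchivedIssue) -> bool:
    # Most restrictive option: open only once the content is in the public domain.
    return issue.in_public_domain


def is_accessible(issue: ArchivedIssue, today: date, wall_years: int,
                  triggers: Iterable[str]) -> bool:
    """Return True if any of the negotiated trigger events has occurred."""
    checks = {
        "moving_wall": lambda: moving_wall_open(issue, today, wall_years),
        "failsafe": lambda: failsafe_open(issue),
        "public_domain": lambda: public_domain_open(issue),
    }
    return any(checks[name]() for name in triggers)


if __name__ == "__main__":
    as_of = date(2002, 10, 1)
    old = ArchivedIssue(1996, available_from_publisher=True, in_public_domain=False)
    recent = ArchivedIssue(2001, available_from_publisher=True, in_public_domain=False)
    print(is_accessible(old, as_of, 5, ["moving_wall"]))     # True
    print(is_accessible(recent, as_of, 5, ["moving_wall"]))  # False
```

Keeping the triggers as named, combinable checks mirrors how the slide frames them: as alternatives to be agreed with each publisher rather than a single fixed rule.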

Why is Access Important?

• How do you verify that what you are preserving is accessible? Users are good auditors

• Where do you find the resources to underwrite the costs associated with archiving?

• Archiving is a public good, like clean air or a public park. What is the mechanism for payment?

Who Should Pay for the Archive?

• Who benefits?
– Publishers, libraries, authors, scholarly societies…
– Is there a way to share costs?

• Cost categories include
– Preparation of “archivable” objects
– Ingestion and quality control
– Long-term storage
– Preservation activity

Mellon Foundation Program Findings

• Archiving seems technically feasible

• Publishers indicate that archiving is important to them

• Progress on developing a common archival exchange DTD

• Shared understanding that archives are necessary to establish e-versions as the publications of record and to make it possible to let go of paper subscriptions

E-Archiving: Mellon Foundation Program Findings

• Challenges in the Organizational Model
– Follow-on grant proposals required substantial funding but were potentially duplicative and in some ways overlapping. There appear to be economies of scale.
– It was difficult to effect coordinated activity by distinct universities. Individual universities found it difficult to develop a business model that would fairly distribute the costs, benefits, and incentives associated with e-journal archiving.
– It was difficult to organize and justify a process for any one university to take on the archival responsibility for others at the scale required.

E-Archiving: Mellon Foundation Program Findings

• Mellon concluded that an organizational entity, separate from the universities and dedicated to the task of e-archiving journal literature, is needed.

The largest issue is: how can a sustainable economic model be created to support an e-archive?

E-Archiving Economics

Archiving Economics: What About an E-Archive?

• Presently, publishers are offering access to the content.

• What we are truly talking about is the long-term preservation of this content, unbundled from the access.

• The content and the archive are valued, but is there a willingness to pay?

• Are institutions willing to pay insurance premiums for archival protection?

• The lack of an economy associated with electronic archiving is a huge challenge facing the community, because we have no model in place.

JSTOR’s Mission

• To help the scholarly community take advantage of advances in information technologies.

• To develop a trusted archive of core scholarly journal literature, emphasizing conversion of entire journal backfiles and preservation of future e-versions.

• To enhance the accessibility of older journal literature

In pursuing its mission, JSTOR takes a system-wide perspective, seeking benefits for libraries, publishers and scholars & students.

Why Is JSTOR an Archive?

An archive must consider things such as:
– Technological Choices
– Data Backup and Redundancy
– Publisher Relationships: Perpetual Rights to the Source Content
– Financial Strategies and Economics
– “Moving Wall” to Preserve Future e-Versions

Mission is critical.

Archiving Economics: The JSTOR Example

• A&S I (Arts & Sciences I collection): approximately 7,700 volumes

• Building, storage & maintenance
– Prime space: $125,000
– Remote storage: $31,000

• Circulation
– Prime space: $1 per use
– Remote storage: $3 per use

• JSTOR fees
– Archive Capital Fee: $10,000–$45,000
– Annual Access Fee: $2,000–$5,000

Archiving Economics: JSTOR purchase as an example

• Even research librarians do not focus on JSTOR’s archival mission; they often see JSTOR simply as a useful database. As a result, JSTOR is typically purchased from the acquisitions budget.

• This approach recognizes neither the overall value nor the overall savings to the institution.

• The capital part of the value is not fungible and not recognized, but it exists.

• Who is the archiving czar? Is there an archive budget?

Archiving Economics: How To Pay For a Complex E-Archive

• There are no organizational and accounting systems set up to underwrite the “archiving” function.

• No one is used to paying someone else for central archiving.

• There is no building to name and no volume count to promote.

• There is no budgetary line item.

• Despite the rhetoric, will institutions be willing to underwrite a centrally held archive to preserve little-used materials?

Article in Educause Review: http://www.educause.edu/ir/library/pdf/erm00164.pdf

Archiving Economics: Conclusion

• E-archiving requires some level of central planning and coordination.

• Institutions will have to establish a mechanism to provide funds to support such an effort.

• Governments may need to subsidize archives in order to build them on a massive scale.

• The financial systems must be consistent with the principles being pursued.

• Some form of access may need to be bundled with the archiving/preservation function

JSTOR’s Approach to E-Archiving

JSTOR E-Archive

• Focused on planning for the receipt of electronic data in accordance with moving walls

• First lessons in data ingest connected to the Current Issues Linking effort

• Internal organizational approach has been to use existing staff working as part of an e-archiving working group

JSTOR E-Archive

• Establishing a new unit dedicated to e-archiving.

• Have been granted $1.3M in start-up funding from the Mellon Foundation.

• Additional funding for the unit will come from JSTOR, a “paying customer” of the new unit.

• We will have the same principles as with print, but a new business model is needed.

Why JSTOR as Organizational Home

• Not-for-profit status

• Mission

• Non-competitive with publishers

• Dedicated to long-term preservation

• Relationships with over 1,400 libraries in 70 countries

• Relationships with nearly 180 publishers

Why JSTOR as Organizational Home

• Appreciation for IP issues and evolving law

• Experience converting over 10M journal pages and providing continual access to the archive

• Experience developing sustainable access and business models

• Positive and strong relationships with various granting agencies

JSTOR E-Archive Anticipated Activities

Mellon Grant: 18-month grant period.

The Goal: To establish a credible and sustainable operation for e-archiving that includes all the key components required for an ongoing archiving enterprise.

JSTOR E-Archive Anticipated Activities

• Establish the parameters of content.
– Determine what content will be preserved.

• Establish an access model which balances the needs of publishers, librarians & scholars.
– Address the “public good” problem.

• Secure agreements with publishers.
– Begin with current JSTOR publishers.
– Explore other publisher relationships.

JSTOR E-Archive Anticipated Activities

• Establish a production operation.
– Apply quality control lessons gained through experience digitizing print.
– Build on progress made by Mellon program participants.

• Build a technical infrastructure
– Compatible with the OAIS Reference Model

Where we’re starting

Exciting Challenges:
– Continue with the print.
– Begin archiving the electronic version of the titles currently within JSTOR.

Kevin Guthrie
President, JSTOR

www.jstor.org