Electronic Archiving & JSTOR Kevin Guthrie e-icolc, Thessaloniki, Greece October 2002

Electronic Archiving & JSTOR

Kevin Guthrie

e-icolc, Thessaloniki, Greece

October 2002

www.jstor.org

Overview

• E-Archiving – Defining the Problem

• The Mellon Foundation Grant Program

– Explanation

– Lessons Learned

• E-Archiving Economics

• JSTOR E-Archiving Approach

E-Archiving – Defining the Problem

The Print Archive

• It occurs as a by-product of access. Journals or books are purchased because of need and then retained.

• The content must be held locally.

• It is a system of countless local decisions and there is no system-wide planning effort.

• The buildings housing the libraries lend themselves nicely to fund raising.

• Library volume counts impact competitive standing.

• There is a significant but relatively stable and predictable cost stream for maintenance.

How is an E-Archive Different?

• Challenges

– The dynamic nature of the formats

– We cannot predict the course of future software developments; there can be no black box technological solution

– We must establish new relationships between preservation and use so that usage leads to preservation. Benign neglect is not effective

• Opportunities

– Freedom from time and space

– Economies of scale in distribution

E-Archiving is a Growing Problem

• We are in a time of transition

• Users find the electronic version more convenient – “copy of record”

• Many libraries are bearing double costs, but increasingly they are cancelling print subscriptions and taking only electronic

• There is no systematic archiving solution in place

E-Archiving: Assumptions and Basic Premise

• The academic community needs a system of trusted archives of “born digital” journal content

• The trusted archives must have a sustainable economic model and be able to preserve the content for the very long-term

E-Archiving – The Mellon Foundation Grant Program

E-Archiving: Mellon Foundation Program

• There were seven grant recipients

• The goal was to find a workable and sustainable model for an ongoing e-archiving effort.

• To explore “presentation file” and “source file” options.

• Cornell

• Harvard

• MIT

• NYPL

• Stanford (LOCKSS)

• University of Pennsylvania

• Yale

Working Assumptions of the Source File Archives

• Archive should be independent of publishers

– responsibility of institutions for whom archiving is a core mission

• Archiving requires active publisher partnership

• Address long timeframes

• Archive design based on Open Archival Information System (OAIS) model

Working Assumptions of the Source File Archives

• Archive negotiates relationship with publisher

• Publisher deposits content regularly

• Content accompanied by metadata to support discovery and preservation

• Archived content only accessible under specific conditions

• Archive assumes responsibility for long-term preservation

Questions that Arose

• What is archived?

• In what format?

• When is archive accessible?

• Who can access archived content?

• What does the archive “preserve”?

• Who does archiving?

• How is the archive paid for?

• How is the archive governed?

Content of e-journals not just full-length articles

• Journal description

• Editorial board

• Instructions to authors

• Rights and usage terms

• Copyright statement

• Ordering information

• Reprint information

• Indexes, membership lists, errata, etc.

Challenging Content

• Masthead, “front matter” stored as web pages, not in content management systems

• No control over the format of supplementary materials (datasets, images, tables, etc.)

• Advertising very complex

– dynamic, frequently from third party, can involve country-specific complexities

• Links frequently separate from articles

– regularly updated, sometimes dynamic

File Formats?

• PDF? SGML/XML? HTML? All or none?

• PDF ubiquitous but there are concerns

– Proprietary

– Emphasizes presentation, not meaning

– Is it preservable?

• Sometimes only choice

File Formats?

• PDF? SGML/XML? HTML? All or none?

• XML increasingly common

• Migration path seems clearer – flexible

• Many different DTDs. Can we develop a standard archival exchange DTD?

• NLM/Mulberry/Inera/Harvard effort

Interchange DTD

• How low is the common denominator?

• What gets lost?

– inevitably sacrifices some functionality and original appearance

• Transformation from publisher’s “native” DTD involves risks

• Some technically difficult areas

– extended character sets, mathematical and chemical formulae, tables, “generated text”

Access Terms

• Publishers prefer “dark” archives

– does not compete with publisher’s service

• If “dark”, what “trigger events” make it accessible?

– after a given period of time (“moving wall”)?

– when content is not otherwise accessible (“failsafe”)?

– only when content enters the public domain?
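
The trigger events above can be read as simple rules. The following is a minimal sketch in Python, not a description of any Mellon project's system. The names `ArchivedIssue`, `is_open`, and `years_since` are hypothetical, and it assumes that any enabled trigger that fires is enough to open the content.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArchivedIssue:
    """Hypothetical record for one archived journal issue."""
    published: date                         # publication date of the issue
    available_from_publisher: bool = True   # still served by the publisher?
    in_public_domain: bool = False          # has the content entered the public domain?


def years_since(start: date, today: date) -> int:
    """Whole years elapsed between start and today."""
    before_anniversary = (today.month, today.day) < (start.month, start.day)
    return today.year - start.year - int(before_anniversary)


def is_open(issue: ArchivedIssue, today: date,
            moving_wall_years: Optional[int] = None,
            failsafe: bool = False,
            public_domain: bool = False) -> bool:
    """Return True if any enabled trigger event opens the dark archive."""
    # Moving wall: open once the issue is at least N years old.
    if moving_wall_years is not None:
        if years_since(issue.published, today) >= moving_wall_years:
            return True
    # Failsafe: open when the content is not otherwise accessible.
    if failsafe and not issue.available_from_publisher:
        return True
    # Public domain: open when the content enters the public domain.
    if public_domain and issue.in_public_domain:
        return True
    return False


# Illustrative dates only.
issue = ArchivedIssue(published=date(1997, 10, 1))
print(is_open(issue, date(2002, 10, 15), moving_wall_years=5))
```

A policy that opens content only when it enters the public domain would enable `public_domain` alone and leave the other two triggers off.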

Why is Access Important?

• How do you verify that what you are preserving is accessible? Users are good auditors

• Where do you find the resources to underwrite the costs associated with archiving?

• Archiving is a public good. Clean air, public park. A mechanism for payment?

Who Should Pay for the Archive?

• Who benefits?

– Publishers, libraries, authors, scholarly societies…

– Is there a way to share costs?

• Cost categories include

– Preparation of “archivable” objects

– Ingestion and quality control

– Long-term storage

– Preservation activity

Mellon Foundation Program Findings

• Archiving seems technically feasible

• Publishers indicate that archiving is important to them

• Progress on developing a common archival exchange DTD

• Shared understanding that archives are necessary to establish e-versions as the publications of record and for it to be possible to let go of paper subscriptions

• Challenges in the Organizational Model

– Follow-on grant proposals required substantial $ but were potentially duplicative and in some ways overlapping. There appear to be economies of scale.

– Difficult to effect coordinated activity by distinct universities.

– Individual universities found it difficult to develop a business model that would distribute fairly the costs, benefits, and incentives associated with e-journal archiving.

– It was difficult to organize and justify a process for any one university to take on the archival responsibility for others at the scale required.

E-Archiving: Mellon Foundation Program Findings

• Mellon concluded that an organizational entity which is separate from the universities and which is dedicated to the task of e-archiving journal literature is needed.

The largest issue is: How to create a sustainable economic model in support of an e-archive?

E-Archiving Economics

Archiving Economics: What About an E-Archive?

• Presently, publishers are offering access to the content.

• What we are truly talking about is the long-term preservation of this content, unbundled from the access.

• The content and the archive are valued, but is there a willingness to pay?

• Are institutions willing to pay insurance premiums for archival protection?

• The lack of an economy associated with electronic archiving is a huge challenge facing the community, because we have no model in place.

JSTOR’s Mission

• To help the scholarly community take advantage of advances in information technologies.

• To develop a trusted archive of core scholarly journal literature, emphasizing conversion of entire journal backfiles and preservation of future e-versions.

• To enhance the accessibility of older journal literature

In pursuing its mission, JSTOR takes a system-wide perspective, seeking benefits for libraries, publishers and scholars & students.

Why Is JSTOR an Archive?

An archive must consider things such as:

– Technological Choices

– Data Backup and Redundancy

– Publisher Relationships

– Perpetual Rights to the Source Content

– Financial Strategies and Economics

– “Moving Wall” to Preserve Future e-Versions

Mission is critical.

Archiving Economics: The JSTOR Example

• A&S I – approximately 7,700 volumes

• Building, storage & maintenance:

– Prime space: $125,000

– Remote storage: $31,000

• Circulation

– Prime space: $1 per use

– Remote storage: $3 per use

• JSTOR Fees

– Archive Capital Fee: $10,000 - $45,000

– Annual Access Fee: $2,000 - $5,000
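
The figures above can be put on a per-volume basis with a little arithmetic. This is a sketch only: the slide does not say whether the housing figures are annual or one-time, so the code makes no assumption about the time basis. The function names and the use count in the example are hypothetical.

```python
# Approximate size of A&S I, from the slide.
VOLUMES = 7_700

# Building, storage & maintenance figures from the slide, in dollars.
# The slide does not state the time basis, so none is assumed here.
HOUSING = {"prime": 125_000, "remote": 31_000}

# Circulation cost per use from the slide, in dollars.
PER_USE = {"prime": 1, "remote": 3}


def housing_per_volume(space: str) -> float:
    """Housing cost per volume for the given kind of space."""
    return HOUSING[space] / VOLUMES


def circulation_cost(space: str, uses: int) -> int:
    """Total circulation cost for a given number of uses."""
    return PER_USE[space] * uses


for space in HOUSING:
    print(f"{space}: ${housing_per_volume(space):.2f} per volume")

# Hypothetical use count, for illustration only.
print(circulation_cost("remote", 1_000))
```

On these figures, prime space works out to about $16.23 per volume and remote storage to about $4.03 per volume, while each use from remote storage costs three times as much as a use from prime space.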

Archiving Economics: JSTOR purchase as an example

• Even research librarians do not focus on the JSTOR archival mission; they often just see JSTOR as a useful database. Therefore, JSTOR is typically purchased from the acquisitions budget.

• That budgeting does not recognize the overall value, nor the overall savings, to the institution.

• The capital part of the value is not fungible and not recognized, but it exists.

• Who is the archiving czar? Is there an archive budget?

Archiving Economics: How To Pay For Complex E-Archive

• There are no organizational and accounting systems set up to underwrite the “archiving” function.

• No one is used to paying someone else for central archiving.

• There is no building to name and no volume count to promote.

• There is no budgetary line item.

• Despite the rhetoric, will institutions be willing to underwrite a centrally held archive to preserve little-used materials?

Article in Educause Review: http://www.educause.edu/ir/library/pdf/erm00164.pdf

Archiving Economics: Conclusion

• E-archiving requires some level of central planning and coordination.

• Institutions will have to establish a mechanism to provide funds to support such an effort.

• Governments may need to subsidize archives in order to build them on a massive scale.

• The financial systems must be consistent with the principles being pursued.

• Some form of access may need to be bundled with the archiving/preservation function

JSTOR’s Approach to E-Archiving

JSTOR E-Archive

• Focused on planning for the receipt of electronic data in accordance with moving walls

• First lessons in data ingest connected to Current Issues Linking effort

• Internal organizational approach has been to use existing staff working as part of an e-archiving working group

JSTOR E-Archive

• Establishing new unit dedicated to e-archiving.

• Have been granted $1.3M in start-up funding from the Mellon Foundation.

• Additional funding for the unit will come from JSTOR, a “paying customer” of the new unit.

• We will have the same principles as with the print, but a new business model is needed.

Why JSTOR as Organizational Home

• Not-for-profit status

• Mission

• Non-competitive with publishers

• Dedicated to long-term preservation

• Relationships with over 1,400 libraries in 70 countries

• Relationships with nearly 180 publishers

Why JSTOR as Organizational Home

• Appreciation for IP issues and evolving law

• Experience converting over 10M journal pages and providing continual access to the archive

• Experience developing sustainable access and business models

• Positive and strong relationships with various granting agencies

JSTOR E-Archive Anticipated Activities

Mellon Grant: 18-month grant period.

The Goal: To establish a credible and sustainable operation for e-archiving that includes all the key components required for an ongoing archiving enterprise.

JSTOR E-Archive Anticipated Activities

• Establish the parameters of content.

– Determine what content will be preserved.

• Establish an access model which balances the needs of publishers, librarians & scholars.

– Address the “public good” problem.

• Secure agreements with publishers.

– Begin with current JSTOR publishers.

– Explore other publisher relationships.

JSTOR E-Archive Anticipated Activities

• Establish a production operation.

– Apply quality control lessons gained through experience digitizing print.

– Build on progress made by Mellon program participants.

• Build a technical infrastructure

– Compatible with the OAIS Reference Model

Where we’re starting

Exciting Challenges:

– Continue with the print.

– Begin archiving the electronic version of the titles currently within JSTOR.

Kevin Guthrie

President, JSTOR

www.jstor.org
