The Digital Preservation Problem
Preservation of digital content is an enormous issue, too big for individual institutions to handle alone.
Working together cooperatively, we'll develop effective models for how to do this.
Preservation system requirement: no single point of failure.
The LOCKSS (Lots of Copies Keeps Stuff Safe) initiative was developed at Stanford to provide libraries with low-cost online storage of content they license from publishers. This enables libraries operating in the online environment to again own the material they purchased, as they do in the print environment. Libraries can return to the necessary role of custodians of scholarly digital materials. Because materials are stored at several sites, the risk of losing content in a disaster is minimised: other copies survive elsewhere. LOCKSS claims to be the only system available that addresses these problems.
LOCKSS is not expensive, so it can work with small as well as large publishers. It works well for any digital format delivered via the web. The LOCKSS system administration can be shared among those with technical expertise and those without. You decide whether to store content on a small number of large machines or on a smaller machine.
LOCKSS is expanding its applications beyond ejournals. I'm involved in:
MetaArchive of Southern Digital Culture
ASERL ETDs
Institutional Repositories: libraries are not generally expecting IRs to solve the e-journal preservation problem. They are turning to solutions such as LOCKSS to do that.
Library uses an inexpensive PC and free LOCKSS software downloaded from the web
LOCKSS clusters six independent servers that audit each other, keeping ejournals complete.
It periodically collects content from the publisher, if there is a permission statement on the publisher's journal (called a manifest page)
Any format: text, sound, video, images, etc.
Publishers grant permission for libraries to collect materials in chunks, called LOCKSS archival units--typically a volume and all its component issues
Preserves content among LOCKSS machines at other institutions
Periodically audits content among LOCKSS machines and repairs as needed
Disseminates content to Library users
Host library's readers see the content at the original URL, unless it isn't available from there, in which case it is delivered from the reader's library's LOCKSS-preserved content.
It looks the same from either source, with the exception of dynamic content such as advertisements or rotating gifs.
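The collect-and-serve behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LOCKSS software itself: the marker text, the function names, and the dictionary standing in for the preserved content are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical marker text: these notes do not give the exact wording of
# the permission statement a publisher places on its manifest page.
PERMISSION_MARKER = "LOCKSS system has permission"


def may_collect(manifest_page: str) -> bool:
    """Collect only if the manifest page carries a permission statement."""
    return PERMISSION_MARKER in manifest_page


def serve(url: str,
          fetch_from_publisher: Callable[[str], Optional[bytes]],
          preserved: Dict[str, bytes]) -> Optional[bytes]:
    """Prefer the publisher's copy at the original URL; fall back to the
    library's own preserved copy only when the publisher has nothing."""
    content = fetch_from_publisher(url)  # None means "not available"
    if content is not None:
        return content
    return preserved.get(url)  # None if this library never collected it
```

The reader asks for the same URL in both cases, which is why the content looks the same from either source.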
70 publishers, >2,000 titles, endorse LOCKSS
Library (consortium) negotiates with publishers
Permission to collect and preserve content is an addition to the existing agreement, not a separate agreement.
Publishers trust LOCKSS
Collections begin with subscriptions, not retrospectively--publishers like this
Publishers continue to control who collects their e-content and how it is used.
Libraries have access to these collections in perpetuity--after subscriptions end
Outside the appropriate user community, access only to audit and repair files
Content is not shared among machines: readers at institutions that did not collect it from the publisher do not get access via an institution that did. Access to LOCKSS-preserved content is triggered when someone in your authorized community of library users can't access an article, for example, from the publisher's database.
Storage costs low: 2003: $0.70 = one year, one journal, ~0.5GB
Costs associated with LOCKSS are all low
Low system administration costs
Storage costs low: 2003: $0.70 = one year, one journal (0.5GB)
Public access: preservation and access
Dark archive: preservation only
MetaArchive of Southern Digital Culture: I'll tell you more about this project
ETDs: Electronic Theses and Dissertations
ASERL: Association of Southeastern Research Libraries
9/11 web sites -- NYPL
Newspapers -- University of Utah: state newspaper project with others
Government Documents
Half-life of a federal web resource is 4 months
12 library partners looking for support -- international, federal, state, and local gov docs
In a free society, citizens must be able to access the information published by their governments. A decade of experience with Web-published government information has demonstrated that leaving materials only in the agencies' custody can result in loss of important publications.
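To make the 4-month half-life figure concrete, here is a small worked calculation. It assumes the usual exponential-decay reading of "half-life" (half of the surviving resources disappear every 4 months); the function name is made up for this example.

```python
def fraction_remaining(months: float, half_life_months: float = 4.0) -> float:
    """Share of web resources still reachable after `months`,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (months / half_life_months)


print(fraction_remaining(4))   # 0.5
print(fraction_remaining(12))  # 0.125
```

Under that assumption, only about one resource in eight would still be reachable a year after publication.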
One of the issues we have to deal with in some of these projects is defining the unit to be captured and preserved periodically, since we often don't have nice logical units like a volume that is a collection of issues. I've learned that collections in my Digital Library and Archives are far less dead than you'd expect to find in Special Collections. So we've renamed our version of LOCKSS to CLOCKSS.
Primary outcomes for partnerships:
Identify and preserve significant content
Leverage resources, experience via collaboration
Promote standards and best practices
These 6 collaborating institutions received an award of $1.3M to develop, over 3 years, a preservation cooperative for digital content with a particular focus on Southern culture and history.
Some of the MetaArchive partners will also be part of the ASERL ETD project.
Those partners, at this time (6/10/05), are:
University of Kentucky--Beth Kraemer
VT--me
FSU--Robert McDonald
LOCKSS--Vicky Reich and Tom Robertson (MetaArchive partner too)
Georgia Tech--Tyler Walters
Vanderbilt--project initiated by Paul Gherman
ASERL (John Burger, executive director, coordinating)
[Pettersens renovated Alford-Nixon House]
Our philosophy is that effective digital preservation efforts succeed through a strategy of dispersing multiple copies of content in secure, distributed locations over time and validating the integrity of those copies periodically. We anticipate that if a file at one institution is lost or damaged, it can be replaced with the same file from one of the other five institutions.
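The idea of validating copies and replacing a lost or damaged file from another institution can be illustrated with a simple majority vote over file hashes. This is a simplified sketch, not the actual LOCKSS polling protocol, which is considerably more elaborate; the function names and site labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(data: bytes) -> str:
    """Fingerprint of one copy of a file."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(
    copies: Dict[str, Optional[bytes]],
) -> Tuple[Dict[str, bytes], List[str]]:
    """`copies` maps each site to its copy of a file (None = lost).

    Returns the repaired copies and the sorted list of sites that needed
    a repair. Raises ValueError when no version is held by a strict
    majority of sites, because then we cannot tell which copy is good.
    """
    present = {site: data for site, data in copies.items() if data is not None}
    votes = Counter(digest(data) for data in present.values())
    if not votes:
        raise ValueError("every copy is lost")
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority; cannot decide which copy is good")
    good = next(data for data in present.values() if digest(data) == winner)
    repaired = sorted(
        site for site, data in copies.items()
        if data is None or digest(data) != winner
    )
    return {site: good for site in copies}, repaired
```

With six sites, one damaged copy and one lost copy still leave four agreeing copies, so both can be repaired from the majority version.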
Later this summer we will run our first test of the distributed, but closed, preservation network. It is a dark archive, accessible only to these partners, and some of it will be open to the partners only at very specific and brief periods, e.g., embargoed ETDs.
The purpose of a dark archive is to function as a repository for information that can be used as a failsafe during disaster recovery. http://www.webopedia.com/TERM/D/dark_archive.html
EJOURNALS ARE IN A LIGHT ARCHIVE
That is, access is open to all the appropriate members of the community.
[Anisfield house]
Content Selection
Preservation efforts are likely to be most coherent around a shared focus
Subject domain: Southern culture and history
Selection of collections to be preserved is made by teams of subject specialists and archivists at partner institutions
These teams are creating a conspectus of collections for consideration and prioritization
Using the collection framework of the Encyclopedia of Southern Culture
The selection of specific materials is left to the cooperating institutions
Developing a Conspectus: Content Selection
What collections will be preserved?
Metadata schema adapted from many sources: Dublin Core, UK/RSLP, Western States DC Best Practices, IMLS/DCC-UIUC, OCLC/RLG PREMIS (Preservation Metadata)
Metadata accompanies and makes reference to each collection and provides associated descriptive, structural, administrative, and other kinds of information. Clifford Lynch, D-Lib Magazine, 1999
We will publish online the adapted way in which we use metadata, showing any unique or qualified tags that are used (Storage & Use MD and metadata that is adapted for LOCKSS are of particular interest).
Ensuring that we have appropriate rights to harvest and distribute, whether among partners in a closed preservation network or to our sponsor, the Library of Congress--our national library for the public
Can the digital collections be made available for harvesting?
Small preservation caches vs. megacaches like MetaArchive:
Disk storage arrays attached to each vault server in the network can store 2 TB. Most of this capacity (90%) will be dedicated to the shared content harvest that the cooperative will jointly identify and assemble. The remainder of the network's capacity will be allocated for preservation of critical content identified solely by the individual partners. By allocating a quota of 40 GB of replicated, secure storage to each partner for preservation of locally determined content, we offer a clear incentive for members to continue in the cooperative in the future. By creating a mechanism for cooperative members to contribute both to the common good and to individual interests, we strike an effective balance that will be sustainable over time.
Will develop a simple and flexible cooperative agreement as a model for other institutions seeking to cooperate for purposes of digital preservation
The cooperative seeks not only to create an effective preservation network for one body of digital content, but also to enable the creation of many others for this important purpose
The LOCKSS approach tries to prevent content being lost through budget cuts by dispersing all costs and responsibilities across many institutions. The system's robustness depends upon redundancy of hardware, software, content and administration.