Defining Digital Preservation
Service, policy and technology issues: O-Space & the Ontario Scholars Portal
Alan Darnell & Rea Devakos
Agenda
• Digital preservation lite
• DSpace
• O-Space
  • Information Literacy
  • Cooperative Repository
  • Government Documents
  • Individual and federated instances
• Ontario Scholars Portal
  • Preservation of e-journals
• Questions
Context
The implications for preserving continued access to important digital materials is already being felt by libraries and archives, many of which have begun to consider and take initial steps to meet their responsibility effectively. www.dpconline.org/graphics/digpresstratoverview.html
The needs, as it turns out are great, and the approaches not yet clear.
New Model Scholarship: Will it Survive http://www.clir.org/pubs/reports/pub114/contents.html
Popular attention…
..“Born digital” materials utilize the technology to provide a level of convenience and functionality. For example, dynamic databases which are constantly updated to produce large scale mapping or on demand publications.. These utilize the technology very effectively for current access but pose considerable challenges in terms of the ability to maintain access to them over time and also the ability to compare data at different points of time.
www.dpconline.org/graphics/digpresstratoverview.html
Digital Preservation Defined
The planning, resource allocation, and application of preservation methods and technologies necessary to ensure that digital information of continuing value remains accessible and usable
http://www.uky.edu/~kiernan/DL/hedstrom.html
Another definition
..ensuring that records which are created electronically using today’s computer systems and applications, will remain available, usable, and authentic in ten to one hundred years time, when the applications and systems which were used to create and interpret the record will, more likely than not, no longer be available. Digital preservation consists of preserving more than just the record’s bit stream. We must also be able to interpret the bit stream in order for the record to survive. Without interpretation, the bit stream is nothing more than a meaningless series of 0’s and 1’s.
http://www.digitaleduurzaamheid.nl/bibliotheek/docs/white_paper_emulatie_EN.pdf
Accepted practice
Preservation of analog materials has traditionally been achieved through something akin to a “crisis management” strategy. Once conditions have reached a point where there is significant risk that valuable materials will be lost, efforts are organized to stave off impending disaster. These efforts are often financed through sporadic infusions of “soft”, unbudgeted funding – government grants, philanthropic donations..
It is likely that one-off funding mechanisms will also play a prominent role in supplying resources for digital preservation. But the characteristics of digital materials are such that primary reliance on an ad hoc approach to the economics of digital preservation is almost certain to prove inadequate.
http://www.oclc.org/research/projects/digipres/incentives-dp.pdf
“Retrospective” preservation?
• Amount of documents generated
• Machine dependency
• Speed of technological change
• Multiple “versions” of hardware and software
• Media fragility
• Ease of change: preserve integrity, authenticity and history of item
• Life cycle management
• Rendering so as not to lose attributes (unresolved)
• Evolving standards
• In addition to traditional preservation: Digital Rights Management
More than the technology
• Organizational infrastructure: rationale and mandate; policies, procedures, plans
• Technological infrastructure: ongoing support for a robust, flexible & cost-effective platform
• Resources: staffing, technology, operations
www.library.cornell.edu/iris/tutorial/dpm/challenges/index.html
Potential for failure
can fail …for many reasons: policy (for example, the institution chooses to stop funding it), management failure or incompetence, or technical problems.
Redundancy
I worry a great deal about what the various impacts and implications of the first few major failures of institutional repositories--for whatever reasons--will be; I fear, for example, that they may greatly set back scholarly acceptance of authorship of digital works; they may have a corrosive effect on the trust that underpins campus communities; they may undermine broad social support for higher education. Sadly, I have little doubt that we will see such failures within the next decade or so. I hope I am wrong.
Lynch http://www.arl.org/newsltr/226/ir.html
From the library’s perspective
• Scope and scale: number & size of collections & files
• Complexity of collections: homogeneous or heterogeneous; simple or complex digital objects
• Value: centrality to organization’s mission
• Control: long-term access to materials
Questions..
• What is “usable and interpretable”? Do we keep content readable? Maintain look and feel?
• What do we select for long-term preservation? How long is long term?
• What technologies can be employed? What standards should be followed? How will we pay for everything?
Peter Brittle, Cornell
No good cost estimates..
Digital preservation is essentially about preserving access over time. This makes it virtually impossible neatly to segregate costs which are only for digital preservation from costs which are only about access
www.dpconline.org/graphics/digpresstratoverview.html
Beyond printing to paper
http://dspace.dial.pipex.com/stewartg/metpres.html
Multiple meanings…
• Long-term maintenance of a bitstream: the zeros and ones
• Viability, in addition to maintenance of the bitstream: information must be intact and readable from the storage media
• Continued accessibility of its contents:
  • Renderability: viewable by humans and processible by computers
  • Understandability: interpretable by humans
..It is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality.
www.library.cornell.edu/iris/tutorial/dpm/dpmtutorial.pdf
Jargon..
Authenticity: reliable or trustworthy
Fixity: unchanged
Compression: reduce file size for storage, transmission or processing.
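Fixity is usually checked by recording a cryptographic digest of the bitstream at ingest and recomputing it later. If the two digests match, the bits are unchanged. The sketch below uses Python's standard `hashlib`. It assumes SHA-256 as the digest algorithm, because the slides do not name one, and the function names are hypothetical.

```python
import hashlib


def compute_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a bitstream using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """True if the bitstream still matches the digest recorded at ingest."""
    return compute_digest(data, algorithm) == recorded_digest


# Record a digest at ingest time...
original = b"Thesis, chapter 1: Introduction"
recorded = compute_digest(original)

# ...and check it later: changing a single byte changes the digest.
print(fixity_ok(original, recorded))                      # True
print(fixity_ok(original.replace(b"1", b"2"), recorded))  # False
```

A fixity check confirms only that the bits are unchanged. It says nothing about whether they can still be rendered or understood.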
Migration
set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another or from one generation of computer technology to a subsequent one. The purpose.. is to retain the ability for clients to retrieve, display and otherwise use them.
Emulation
Combines software and hardware to reproduce in all essential characteristics the performance of another computer of a different design, allowing programs or media designed for a particular environment to operate in a different, usually new environment.
Universal Virtual Computer: development of a computer program independent of any existing hardware or software that could simulate the basic architecture of every computer since the beginning..
Canonicalization
Technique designed to allow the determination of whether the essential characteristics of a document have remained intact through a conversion from one format to another.
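To show the idea, the sketch below reduces two encodings of the "same" plain-text document to one canonical form and compares digests of that form rather than of the raw bitstreams. What counts as an "essential characteristic" is an assumption made for this example. Only the characters and the line/word structure are treated as essential. Encoding, line-ending convention, Unicode composition and runs of spaces are treated as non-essential. The function names are hypothetical.

```python
import hashlib
import unicodedata


def canonical_text(data: bytes, encoding: str) -> str:
    """Reduce a plain-text bitstream to a hypothetical canonical form.

    Assumption: encoding, line endings, Unicode composition and runs of
    spaces are non-essential; characters and line structure are essential.
    """
    text = data.decode(encoding)
    text = unicodedata.normalize("NFC", text)               # one form for accents
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    lines = [" ".join(line.split()) for line in text.split("\n")]  # collapse spaces
    return "\n".join(lines).strip("\n")


def canonical_digest(data: bytes, encoding: str) -> str:
    """Digest of the canonical form, not of the raw bitstream."""
    return hashlib.sha256(canonical_text(data, encoding).encode("utf-8")).hexdigest()


# The "same" document before and after a format conversion:
before = "Caf\u00e9 menu\r\nSoup  of the day\r\n".encode("utf-16")
after = "Cafe\u0301 menu\nSoup of the day\n".encode("utf-8")

print(before == after)  # False: the bitstreams differ
print(canonical_digest(before, "utf-16") == canonical_digest(after, "utf-8"))  # True
```

Real documents need a much richer canonical form, for example one that includes layout or structure. The comparison step works the same way.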
Preservation metadata
Information necessary to maintain the viability, renderability and understandability of digital resources over the long term
http://www.oclc.org/research/projects/pmwg/presmeta_wp.pdf
I have an object..
• Identification – what format is it?
• Validation – purportedly of format F; is it?
• Transformation – format F, but need G; how can I produce it?
• Characterization – what are its significant properties?
• Risk assessment – is it at risk of obsolescence?
• Delivery – how can I render it?
http://www.ifla.org/IV/ifla69/papers/128e-Abrams_Seaman.pdf
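The first step, identification, is often done by matching a file's leading "signature" bytes. The sketch below illustrates this with a small, hypothetical signature table. Tools such as JHOVE use far richer registries and also perform validation and characterization. When no signature matches, the sketch falls back to the generic type "application/octet-stream".

```python
# Hypothetical, deliberately small signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]

UNKNOWN = "application/octet-stream"


def identify(data: bytes) -> str:
    """Identification step: guess a MIME type from the file's leading bytes."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return UNKNOWN  # format not recognized


print(identify(b"%PDF-1.4\n..."))     # application/pdf
print(identify(b"\x00\x01 mystery"))  # application/octet-stream
```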
OAIS
Open archival information system: a type of archive consisting of an organization of people and systems that has accepted the responsibility to preserve information for one or more designated communities
http://www.library.cornell.edu/iris/tutorial/dpm/dpmtutorial.pdf
Users will..
seek documents that are easily retrieved and manipulated, transmittable, and transportable from a repository to the sites of research, presentation, and teaching. .. Preserving digital materials in formats that are reliable and usable, however, will require long-term maintenance of structural characteristics, descriptive metadata, and display, computational, and analytical capabilities that are very demanding of both mass storage and software for retrieval and interpretation
http://www.uky.edu/~kiernan/DL/hedstrom.html
IRs defined
An institutional repository consists of formally organized and managed collections of digital content generated by faculty, staff, and students at an institution
http://sitemaker.umich.edu/dams/files/etcom-2003-repositories.pdf
Lynch’s definition
A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.
Most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution
an effective institutional repository of necessity represents a collaboration among librarians, information technologists, archives and records managers, faculty, and university administrators and policymakers
http://www.arl.org/newsltr/226/ir.html
IRs..
• Institutional
• Capture
• Access
Some:
• Preserve
• Cumulative
• Perpetual
• Technically open
• Interoperable
Repositories: Librarians’ Usage..
• Subject: arXiv, EconPapers
• Type: Merlot, Eprints
• Publishers
• Institutional: CalTech
Educational technologist use
1. Collection-based digital repositories managed by library professionals, either stand-alone or aggregated;
2. Course management systems and associated file stores;
3. Collections of research data and reports managed by academic departments;
4. Student academic portfolio systems;
5. Institutional file storage systems;
6. Digital asset management workflow systems; or
7. Web content management systems used by institutions or departments to store and stage Web content.
http://sitemaker.umich.edu/dams/files/etcom-2003-repositories.pdf
Why Institutional Repositories
• Theory: access to digital materials on the web; shift in scholarly communication; preservation
• Practical: branding; proliferation of personal or unit websites
Why libraries?
• Expertise
  • Large-scale collection management
  • Assessment / collection policies
  • Preservation
  • Metadata
  • Business practices
• Commitment
  • Long time frames
  • Mission / scope
Why IRs now?
• Online storage costs dropped
• Development of (some) standards: metadata, preservation
• Pre-prints & e-prints
• Innovative, complex digital scholarly “objects”
Lynch http://www.arl.org/newsltr/226/ir.html
Non-technical challenges..
As with most change programs, the most significant challenge facing institutional repositories is the “administrative attention span” and long-term commitment to insure preservation and maintenance of the repository over time, providing the necessary confidence to enable faculty members to contribute their works to the repository
http://sitemaker.umich.edu/dams/files/etcom-2003-repositories.pdf
…
For those organizations within the university concerned with stewardship--we think immediately of libraries, archives, and museums but should recognize there are also huge numbers of academic units that curate collections of information--it should be clear that institutional repositories raise complex and nuanced questions about organizational roles, responsibilities resources, and strategies
Lynch http://www.arl.org/newsltr/226/ir.html
DSpace
Captures, describes, preserves and distributes digital intellectual products
• Any format
• Preservation archive
• Open source system
• Federated system
• Both a service model and code..
Digital preservation “philosophy”
• Lots of digital material is already lost
• Most digital material is at risk
• Better to have it and do a bit of preservation work than lose it completely
• Need to capture as much information as possible to support functional preservation
• Cost-benefit ratios
Redundancy: the “Federation”
Originally developed by: MIT, H-P
Federation: Toronto, Cambridge, Columbia, Cornell, Ohio State, Rochester, Washington
Technical underpinnings
• Based on MIT’s DSpace
• Open source
  • Java
• Standards based
  • OAIS compliant (Open Archival Information System)
  • Qualified Dublin Core metadata
  • Persistent identifier: CNRI Handle (Corporation for National Research Initiatives)
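The two standards named above can be illustrated with a minimal qualified Dublin Core record whose identifier is a Handle, expressed as a resolver URL. This sketch uses Python's standard `xml.etree.ElementTree` with the published `dc` and `dcterms` namespaces. The wrapper element `record`, the field selection and the Handle "99999/123" are hypothetical. This is not DSpace's own metadata serialization.

```python
import xml.etree.ElementTree as ET
from typing import List

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

HANDLE_RESOLVER = "https://hdl.handle.net/"


def handle_url(handle: str) -> str:
    """Turn a Handle of the form 'prefix/suffix' into a resolvable URL."""
    return HANDLE_RESOLVER + handle


def dc_record(title: str, creators: List[str], issued: str,
              abstract: str, handle: str) -> ET.Element:
    """Build a small qualified Dublin Core record (hypothetical wrapper element)."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    for name in creators:
        ET.SubElement(record, f"{{{DC}}}creator").text = name
    ET.SubElement(record, f"{{{DCTERMS}}}issued").text = issued      # refines dc:date
    ET.SubElement(record, f"{{{DCTERMS}}}abstract").text = abstract  # refines dc:description
    ET.SubElement(record, f"{{{DC}}}identifier").text = handle_url(handle)
    return record


# Hypothetical item; "99999/123" is a made-up Handle, not a registered one.
rec = dc_record("Field notes, 2003", ["Doe, Jane"], "2003-10-01",
                "Scanned notebook pages.", "99999/123")
print(ET.tostring(rec, encoding="unicode"))
```

Citing the Handle URL rather than a server path is what keeps the identifier stable if the repository moves.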
Archives and libraries must now contend with entirely new forms of electronically-enabled discourse and new forms of artistic and cultural expression that do not have predecessors in the analog world. No current preservation method is adequate for preserving dynamic data objects from complex systems. There are no established conceptual models or technical processes for preserving multi-media works, interactive hyper-media, on-line dialogues, or many of the new electronic forms being created today. The archival requirements to preserve content, context and structure and to maintain the capability to display, link and manipulate digital objects only heighten their software dependency.
www.uky.edu/~kiernan/DL/hedstrom.html
Four starting steps..
The primary objective is that processes for digital objects become “business as usual” at the Library.
Searle and Thompson
http://www.dlib.org/dlib/april03/thompson/04thompson.html
Minimum preservation metadata
The preservation community has at its disposal a variety of tactics for digital preservation that appear to work effectively for certain types of materials in certain restricted environments, but we have not yet developed solutions that are scalable to the general problem. ..This is not to suggest that there is or should be a single solution…The methods used will vary depending on the complexity of the original data objects, the extent to which the functionality for computation, display, indexing and authentication must be maintained, and the requirements of current or anticipated users.
www.uky.edu/~kiernan/DL/hedstrom.html
Affordability
Regardless of how the responsibility for digital preservation is distributed, societies only allocate a small and finite amount of resources to preserving scholarly and cultural resources. And in the digital environment it seems likely that more preservation responsibilities will be distributed to individual creators, rights holders, distributors, small institutions, and other players in the production and dissemination process. Therefore, it seems imperative that digital preservation technologies become affordable and accessible…
www.uky.edu/~kiernan/DL/hedstrom.html
Formats
In the past decade, digital librarians have worked hard to define the parameters of “materials in preservable form.” They have tried to specify which formats encoding schemes will hold up best through one or more cycle of data migration. Because of their often-prescriptive nature, these efforts have met with mixed success in the academic community.
New Model Scholarship: Will it Survive http://www.clir.org/pubs/reports/pub114/contents.html
http://ospace.scholarsportal.info
The potential
The development of institutional repositories emerged as a new strategy that allows universities to apply serious, systematic leverage to accelerate changes taking place in scholarship and scholarly communication, both moving beyond their historic relatively passive role of supporting established publishers in modernizing scholarly publishing through the licensing of digital content, and also scaling up beyond ad-hoc alliances, partnerships, and support arrangements with a few select faculty pioneers exploring more transformative new uses of the digital medium.
http://www.arl.org/newsltr/226/ir.html
Community portal
Who does what
“Library”
• Server management
• Storage management
• Technical and user support
Communities = administrative units
• Supply content and metadata
• Set policy:
  • Content
  • Who may contribute, approve and access
  • Identity
Creators
• Select, capture & describe using Qualified Dublin Core metadata
• Retain copyright
• License for distribution & preservation
Versioning
Can be:
• All instances of the work in different formats, e.g. PDF, XML
• All editions of the work over time
  • Official changes
  • Periodic snapshots (e.g. websites)
Metadata lists all available versions
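The versioning model above can be sketched as a small data structure. A work holds versions over time, and each version holds one or more format instances. The item's metadata then lists every version and its formats. The class and field names are hypothetical, chosen for illustration. They are not DSpace's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    label: str              # e.g. "2nd edition" or "snapshot"
    date: str               # ISO 8601 date, so string order is date order
    formats: Dict[str, str]  # MIME type -> bitstream file name (instances)


@dataclass
class Work:
    title: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, version: Version) -> None:
        self.versions.append(version)

    def version_listing(self) -> List[str]:
        """What the item's metadata would list: every version and its formats."""
        return [
            f"{v.date} {v.label}: {', '.join(sorted(v.formats))}"
            for v in sorted(self.versions, key=lambda v: v.date)
        ]


work = Work("Annual report")
work.add_version(Version("2nd edition", "2003-11-20",
                         {"application/pdf": "report-v2.pdf"}))
work.add_version(Version("1st edition", "2003-06-15",
                         {"application/pdf": "report-v1.pdf",
                          "text/xml": "report-v1.xml"}))
for line in work.version_listing():
    print(line)
```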
Preservation
Everything will be retrievable but not necessarily “functional”
“Support” as many formats as possible:
• Supported = functional preservation
• Known: recognize, cannot guarantee full support
• Unsupported = unknown: cannot recognize a format (“application/octet-stream”)
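The three support levels above amount to a lookup from format to promise. The sketch below assumes a hypothetical format registry, since every repository would maintain its own list. Anything not in the registry, including the generic "application/octet-stream", falls to "unsupported". Such files are still stored and retrievable, but only their bits are preserved.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "supported"      # functional preservation promised
    KNOWN = "known"              # recognized, full support not guaranteed
    UNSUPPORTED = "unsupported"  # not recognized; bits preserved only


# Hypothetical format registry; a real repository would maintain its own.
FORMAT_REGISTRY = {
    "application/pdf": SupportLevel.SUPPORTED,
    "text/xml": SupportLevel.SUPPORTED,
    "image/tiff": SupportLevel.SUPPORTED,
    "application/msword": SupportLevel.KNOWN,
}


def support_level(mime_type: str) -> SupportLevel:
    """Classify a bitstream by the repository's support level for its format.

    Formats missing from the registry (including "application/octet-stream")
    are unsupported: still retrievable, but not necessarily functional.
    """
    return FORMAT_REGISTRY.get(mime_type, SupportLevel.UNSUPPORTED)


for mime in ("application/pdf", "application/msword", "application/octet-stream"):
    print(mime, "->", support_level(mime).value)
```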
Preservation platform
• Capture: variety of formats accepted; bitstream, hence format independent
• Refresh: “keep it safe”
• Authenticity: checksums
• Metadata stored with objects
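Storing checksums and metadata with the objects can be sketched as follows. Each bitstream is written next to a small JSON sidecar that records its type, size and checksum. A periodic audit then recomputes every checksum. The sidecar naming (`<name>.meta.json`), the field names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List


def ingest(store: Path, name: str, data: bytes, mime_type: str) -> Path:
    """Store a bitstream with a JSON metadata sidecar kept next to it."""
    obj = store / name
    obj.write_bytes(data)
    meta = {
        "name": name,
        "mime_type": mime_type,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    (store / (name + ".meta.json")).write_text(json.dumps(meta, indent=2))
    return obj


def audit(store: Path) -> List[str]:
    """Recompute every checksum; return names whose bits no longer match."""
    damaged = []
    for sidecar in sorted(store.glob("*.meta.json")):
        meta = json.loads(sidecar.read_text())
        data = (store / meta["name"]).read_bytes()
        if hashlib.sha256(data).hexdigest() != meta["sha256"]:
            damaged.append(meta["name"])
    return damaged


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    ingest(store, "paper.pdf", b"%PDF-1.4 ...", "application/pdf")
    ingest(store, "data.csv", b"year,count\n2003,42\n", "text/csv")
    print(audit(store))  # []
    (store / "data.csv").write_bytes(b"year,count\n2003,41\n")  # simulate damage
    print(audit(store))  # ['data.csv']
```

Keeping the metadata beside the bitstream, rather than only in a separate database, means a copy of the store carries its own evidence of integrity.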
Stewardship is easy and inexpensive to claim; it is expensive and difficult to honor, and perhaps it will prove to be all too easy to later abdicate. Institutions need to think seriously before launching institutional repository programs.
Lynch http://www.arl.org/newsltr/226/ir.html
MIT preservation policies
• Supported: migration for texts, images, audio; emulation for software, multimedia
• Unsupported: bit preservation; batch migration where possible
  • Commercial conversion
There are two ways to examine digital preservation requirements: from the perspective of users of digital materials and from the view of libraries, archives, and other custodians who assume responsibility for their maintenance, preservation, and distribution. Libraries and archives will not accomplish their preservation missions if they do not satisfy the requirements of their users by preserving materials in formats that enable the types of analyses that users wish to perform. At the same time, libraries and archives are unlikely to be able to satisfy all requirements of all potential users primarily due to resource constraints.
http://www.uky.edu/~kiernan/DL/hedstrom.html
DSpace@MIT Research Areas
• Digital preservation: digital files (audio, video, image, text); web sites; software
• Personal archiving: Laptop DSpace; proactive collaboration with content creators
Movers and shakers
• Open Archives Initiative: interoperability standards to facilitate dissemination of content
  • Metadata Harvesting Protocol
• Harvard Digital Repository Service
• FEDORA: Flexible Extensible Digital Object and Repository Architecture
• CARL IR Pilot
• EPrints
• SHERPA: Securing a Hybrid Environment for Research Preservation and Access
• RLG & OCLC
• Australian National University
• CLIR
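The OAI Metadata Harvesting Protocol works over plain HTTP. A harvester sends a `ListRecords` request naming a metadata format (`oai_dc` is the Dublin Core format every OAI-PMH repository must offer). It then pages through the results by sending back each `resumptionToken` on its own. The sketch below builds the request URLs and parses a simplified, hand-written response, so no network access is needed. The base URL is hypothetical, and real responses also carry record headers and other elements omitted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request.

    The first request names the metadata format; follow-up requests send
    only the resumptionToken, which OAI-PMH treats as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (titles, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iter(DC + "title")]
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    # An absent or empty resumptionToken means this is the last page.
    next_token = token_el.text if token_el is not None and token_el.text else None
    return titles, next_token


# Simplified sample response (real ones include <header> elements, etc.).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://repository.example.edu/oai/request"  # hypothetical endpoint
print(list_records_url(BASE))
titles, token = parse_page(SAMPLE)
print(titles, token)
print(list_records_url(BASE, token))
```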
Promises to keep
In establishing institutional repositories, institutions are both accepting risks and making promises; they are creating new expectations. In a budget crunch, the institutional repository may be one of the last things that can be cut, given the way that digital preservation demands steady and consistent attention and hence funding. Faculty who choose to rely on institutional repositories to disseminate and preserve their work are placing a great deal of trust in their institution and in the integrity, wisdom, and competence of the people who manage it. We need to ensure that our institutional repositories are worthy of this trust.
Lynch http://www.arl.org/newsltr/226/ir.html
Follow-up
www.tspace.library.utoronto.ca
www.dspace.org
Rea Devakos, rea.devakos@utoronto.ca, 416 978-0533
Preserving Electronic Journals
Alan Darnell
Ontario Scholars Portal Project
What’s so special about e-journals?
• Similar technical issues and solutions as with other digital content, but a different economic context
• Difference between access and ownership, and the “model license”
• The role of the publisher, the role of the library: who bears the costs?
• How does legal deposit fit in, and what role do national agencies have to play?
• Changing nature of “born-digital” e-journals: what is the copy of record?
Model License
• Addresses the need for librarians to have long-term access to material they have paid for under license
• JISC/NESLI Model License
  • Perpetual access from publisher server, third-party, electronic copy
  • Relationship with publisher that extends beyond the subscription term
• CNSLP Model License
• Is the model license enough of a guarantee?
Role of the publisher and library
• The library as preservation agent
  • Traditional view from the print world
  • An important responsibility
  • An unbearable cost in the electronic environment? -> LOCKSS (Digital Library Federation / Mellon)
• The publisher as preservation agent
  • Commercial value in maintaining backfiles
  • Act now as archives by default
• Trusted third parties
  • OCLC / JSTOR (Mellon Grant) / National Libraries (Dutch National Library and Elsevier) / CrossRef
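The idea behind library-run approaches such as LOCKSS is that many independent copies let a damaged copy be detected and repaired from the others. The sketch below shows that idea as a simple strict-majority vote over checksums of each library's copy. It is a deliberately simplified illustration, **not** the LOCKSS polling protocol, which is considerably more elaborate. The site names and content are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Detect and repair damaged copies by strict-majority agreement.

    Simplified illustration only; NOT the actual LOCKSS polling protocol.
    """
    votes = Counter(digest(data) for data in replicas.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority; manual intervention needed")
    good_copy = next(d for d in replicas.values() if digest(d) == winner)
    # Replace any copy whose checksum disagrees with the majority.
    return {site: (data if digest(data) == winner else good_copy)
            for site, data in replicas.items()}


copies = {
    "library-a": b"Journal of X, vol. 12, issue 3",
    "library-b": b"Journal of X, vol. 12, issue 3",
    "library-c": b"Journal of X, vol. 12, issue ?",  # silently corrupted
}
repaired = audit_replicas(copies)
print(repaired["library-c"] == copies["library-a"])  # True
```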
Legal Deposit
Print material is still considered the preferred format for legal deposit
Legal deposit is nationally based and doesn’t address all the research needs of universities
LC’s National Digital Information Infrastructure and Preservation Program
“Born digital” e-journals
Print representation, if any, is less complete than the electronic version
Includes supporting material in multiple formats (e.g. databases, GIS data, 3-D chemical structures)
How will we migrate all of these data types into the future?
Importance of experience with D-Space and institutional repositories
Some international projects
• OCLC’s Digital Archive
• LOCKSS (Stanford)
• JSTOR Electronic Archiving Initiative
• JISC/NESLI – Model License
• LC National Digital Information Infrastructure and Preservation Program
Ontario Scholars Portal
• 4000+ e-journals and 5 M articles from different publishers
• Some metadata only, but mostly full text
• Acts as an “archive” for content licensed under CNSLP and OCUL consortial agreements
• Similar to about 20 other projects worldwide based on ScienceServer software