Digital Preservation and Curation of Information

TEAM 9: NELL BUTLER, BRANDON HOWARD, CHARI SANDERS, EMILEE WHITEHILL

A Wealth of Information

"According to a recent study by market research company, IDC...the size of the information universe is currently 800,000 petabytes...but it's just a down payment on next year's total, which will reach 1.2 million petabytes or 1.2 zettabytes" (Harvey, 2012, p. 9).

Harvey, D. (2012). Preserving Digital Materials. Berlin ; Boston: De Gruyter Saur, p9.

Preservation in a Digital Age

The rapid growth of technology, and the innovation that accompanies it, has given rise to an explosion of information that requires preservation able to withstand the test of time.

However, the knowledge needed to create effective, reliable practices has not yet been developed.

Library and recordkeeping practices are transitioning from collection-based models, where preservation principles have been cultivated over hundreds of years, to environments in which collections are becoming secondary to information resources.

Hybrid Libraries

It matters little whether information resources are managed at local or remote locations.

The idea of non-custodial collections has been examined, and in some cases, implemented, simply because the prodigious increase in digital records demanded new library services which would provide users with access to a broad array of options.

Thus, librarians are managing composite resources that include physical collections, digital information, and digital libraries.

Harvey, D. (2012). Preserving Digital Materials. Berlin ; Boston: De Gruyter Saur, p7.

The Need for a New Preservation Model

Library, archive, and recordkeeping conventions are moving from a preservation model, where the priority has been on safeguarding physical objects (books, manuscripts, CDs) to one in which no such articles exist.

This raises the question of how preservation is to be understood in digital settings.

Harvey, D. (2012). Preserving Digital Materials. Berlin ; Boston: De Gruyter Saur, p7.

Preservation Practices for a Digital Age

Fundamental elements of preservation programs in digital environments should incorporate these considerations:

Although many archival items benefit from minimal handling, digital information must be aggressively maintained from the moment it is created.

Without consistent attention to the technology that houses it, a collection may disappear.

In addition to technical issues, political and social concerns may also pose challenges.

Harvey, D. (2012). Preserving Digital Materials. Berlin ; Boston: De Gruyter Saur, p12.

Digital Curation

In its simplest form, digital curation is the collection and preservation of digital resources for future users.

But why?

To understand digital curation, one can first review the importance of physical curation.

Why Physical Curation?

Because some things are readily recognized as vital, important, or representative of a generation. For example:

Journals of world leaders

Works of popular and influential writers

Paintings and sketches of artists and inspirational people

Items that represent an important event

Digital Data

Information in the present age is stored on the Internet in various forms. Social media, journals, artist pages, and vlogs all represent a generation.

This information is fleeting and is often stored on servers controlled by a single corporation.

What happens if the information is not viewed as important?

What happens if the corporation has a server failure?

What happens if the information is deleted?

Principles of Digital Curation

Digital curation is a new and growing field of study for librarians and archivists, one that takes advantage of new technology.

Because of this, the field has broad objectives.

Despite this breadth, some objectives overlap.

According to Elizabeth Yakel’s article “Digital Curation,” published in OCLC Systems & Services, there are five shared or important concepts.

Principles of Digital Curation

Lifecycle/continuum management of the materials, perhaps even reaching back to the creation of the recordkeeping system

Active involvement over time of both record creators and potentially digital curators

Appraisal and selection of materials

Development and provision of access

Ensuring preservation and usability and accessibility of the objects

Yakel, E. (2007). Digital Curation. OCLC Systems & Services: International Digital Library Perspectives, 23(4), 335-340.

Orphan Technology

Technology that is outdated, potentially unusable, and/or the last of its kind.

An example would be discovering an old computer with distinct file formats that would require a specialized team to carefully extract the data. This may sound like a James Bond film, but it is much closer to reality than you might think. . .

What Will Digital Preservation Do?

This very incident happened not long ago and was reported on by Wired.com.

An old computer that no one had chosen to investigate was found in the Andy Warhol Museum. The file formats were old, but the files contained unique art the world had never seen.

A team came together and carefully extracted the information, and new works were displayed to the world.

If the team had rushed and simply accessed the data, the world might never have seen those digital paintings.

How Can You Help?

There are many digital curation projects to take part in. Check with your local museum or university club to be a part of something exciting.

ALA members may also join the Digital Curation Interest Group through the ALA website.

Educate patrons about donating potential orphan technology and being cautious about deleting digital treasures.

Hold library programs that excite and educate patrons about digital archiving and curation.

Short Term Preservation Technologies

Backup

Redundancy Configuration in Content Delivery Systems

Byte Replication

Backup

Often the content can be retrieved only with the software that was originally used to back it up

Redundancy Configuration in Content Delivery Systems

The entire system runs on two or more computers in two or more data centers

Either all systems are online at the same time, or one is held in reserve to be brought online quickly if the other system fails

Byte Replication

Creation of identical copies of files, file systems, or websites

Copies are held in different locations so that if one becomes unavailable, another is likely to remain accessible

No file format updates

Discoverability can be extremely difficult
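Byte replication can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function names `sha256_of` and `replicate`, the file `report.txt`, and the folders `site_a` and `site_b` (standing in for separate locations) are hypothetical, and a SHA-256 checksum is assumed as the test that each copy is identical to the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy the source byte for byte into each destination and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, directory / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"replica at {copy} does not match the source")
        copies.append(copy)
    return copies


# Hypothetical demonstration: two folders stand in for two separate locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "original" / "report.txt"
    original.parent.mkdir()
    original.write_bytes(b"digital object to preserve")
    replicas = replicate(original, [root / "site_a", root / "site_b"])
    replica_digests = {sha256_of(path) for path in replicas}
    all_identical = replica_digests == {sha256_of(original)}
```

Note that the copies keep the original file format unchanged, which is why replication alone does not protect against format obsolescence.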

Technical Strategies

Migration

The process of transforming digital content from its existing format to a different format that is usable and accessible on the technology in current use

Emulation

Involves developing software that imitates earlier hardware and software that can be used to read older file formats
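As a small illustration of migration, the Python sketch below converts a text file from a legacy character encoding to UTF-8. This is only one simple kind of format change, chosen as an assumption for the example; the function name `migrate_text`, the file names, and the choice of Latin-1 as the legacy encoding are all hypothetical.

```python
import tempfile
from pathlib import Path


def migrate_text(source: Path, target: Path, legacy_encoding: str = "latin-1") -> None:
    """Write a UTF-8 copy of a legacy-encoded text file, leaving the source untouched."""
    text = source.read_bytes().decode(legacy_encoding)
    target.write_bytes(text.encode("utf-8"))


# Hypothetical demonstration with a one-line catalogue record.
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "catalogue_latin1.txt"
    migrated = Path(tmp) / "catalogue_utf8.txt"
    legacy.write_bytes("Bibliothèque nationale".encode("latin-1"))
    migrate_text(legacy, migrated)
    # The text is the same, but the stored bytes (the format) have changed.
    same_content = migrated.read_bytes().decode("utf-8") == legacy.read_bytes().decode("latin-1")
    bytes_changed = migrated.read_bytes() != legacy.read_bytes()
```

The sketch keeps the original file alongside the migrated one, so the migration can be checked or repeated later.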

Three Organizational Models

Government Funded National Libraries

Community-Supported Independent Preservation Libraries

e.g. Portico

Networked Library Efforts

Groups of libraries that have pooled their resources to share the responsibility and costs of preservation

e.g. LOCKSS and CLOCKSS

Portico

Focus on preserving e-journals, e-books, digitized newspapers, and libraries’ locally created or digitized content

Publishers provide digital files

Both libraries and publishers give annual financial contributions

Libraries audit the archive and make sure content is being added to the archive for preservation

Uses the migration-based preservation strategy

Portico Content Availability

Accessible by faculty, staff, and students at participating libraries when a publisher

Ceases operations

Stops publishing a title

No longer offers back issues

Suffers catastrophic and sustained failure of its delivery platform

Or in the case of a post-cancellation access request by the publisher

Portico Services

Preservation planning

Content is analyzed and given a plan of action

Receipt and inventory management

Content is supplied to Portico via

Portable media

Standard transfer protocol

Software developed by Portico

Processing and archival deposit

Content is given multiple formats and kept in many geographical locations

Monitoring and management

Performs regular fixity and completeness checks
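A fixity and completeness check can be sketched in Python as follows. This is not Portico's actual software; it is a minimal sketch that assumes a SHA-256 digest is recorded for each file at deposit time. The names `build_manifest` and `audit` and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a digest for every file at deposit time."""
    return {p.name: file_digest(p) for p in sorted(folder.iterdir()) if p.is_file()}


def audit(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing (completeness) or changed (fixity)."""
    report = {"missing": [], "altered": []}
    for name, expected in manifest.items():
        path = folder / name
        if not path.is_file():
            report["missing"].append(name)
        elif file_digest(path) != expected:
            report["altered"].append(name)
    return report


# Hypothetical demonstration: audit once when intact, once after damage.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "article.xml").write_bytes(b"<article>v1</article>")
    (archive / "figure.png").write_bytes(b"\x89PNG placeholder bytes")
    manifest = build_manifest(archive)
    clean_report = audit(archive, manifest)
    (archive / "article.xml").write_bytes(b"<article>v1 corrupted</article>")
    (archive / "figure.png").unlink()
    damaged_report = audit(archive, manifest)
```

Running the audit on a schedule is what makes the check "regular": the manifest is written once, and every later audit is compared against it.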

LOCKSS and CLOCKSS

Lots of Copies Keeps Stuff Safe (LOCKSS)

Digital archiving system in which content is collected in the system as it gets published

Content is continuously compared between all the different member libraries, and differences are corrected

If for any reason the content a user is looking for is not retrieved from the publisher, the LOCKSS copy is provided

Transparent format migration: involves a change of format to match the needs of the user as the content is viewed
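The idea of comparing copies between member libraries and correcting differences can be sketched in Python. The real LOCKSS polling protocol is considerably more elaborate; this sketch simply assumes a majority vote over SHA-256 digests, and the function `poll_and_repair` and the library names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one library's copy."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Overwrite any copy that disagrees with the majority; return the repaired names."""
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        raise ValueError("no majority; cannot decide which copy is correct")
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for library, content in list(copies.items()):
        if digest(content) != winner:
            copies[library] = good  # repair from a copy that won the vote
            repaired.append(library)
    return repaired


# Hypothetical demonstration: one of three member libraries holds a damaged copy.
copies = {
    "library_a": b"Volume 12, Issue 3",
    "library_b": b"Volume 12, Issue 3",
    "library_c": b"Volume 12, Issu\x00 3",
}
repaired = poll_and_repair(copies)
```

The more independent copies take part in the comparison, the harder it is for a single damaged copy to go unnoticed, which is the point of "lots of copies."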

Controlled LOCKSS (CLOCKSS)

An offshoot of LOCKSS

Content is only provided in the event of a trigger event

Content is preserved in the publisher’s original format, not an archival format

Roles and Responsibilities

What are the roles and responsibilities of curators and repositories?

Roles: Repositories and Curators

Repositories: archives, special collections libraries, museums, research centers, etc.

They maintain “stewardship of digital materials.”

Curators: keepers and custodians of collections

Two groups: resource creators and resource managers

Resource creators “create well formed and sustainable resources using open and standard file formats wherever possible.”

Resource managers provide information or resources, manage them correctly, and make them accessible to users

Responsibilities

Sustainability

How long will the resource last?

What infrastructure and policies must be established to provide continuous development and care?

Proper care and maintenance to ensure resources' long-term viability

Protect against obsolescence

Appraisal and Identification

What information or resources should be chosen to preserve?

What identifier (unique label) should be used for cataloging and indexing?

How many digital records should be retained?

Responsibilities

Selection

Complementary to Appraisal

What records are most important to preserve? Which records to discard?

Which records provide the most comprehensive view of modern society?

Create a wide range of criteria to select these resources

Authenticity

Allows digital resources to be reliably reused

Is the resource free from corruption, alteration or manipulation?

Keeping resources as close to their original form as possible and retaining the most vital elements

Responsibilities

Accessibility & Use

Who can use or access this information?

Ensure all users have access in accordance with the repository's access policies

Do not deny access or bestow privileged access to users

Enable continued access to digital resources

Make certain any restrictions are appropriate

Security & Protection

How should the information and resources be stored?

Must ensure safety from damage, vandalism, theft and disasters

Create and implement policies that protect resources

Work with colleagues, IT staff and law enforcement to protect against threats and dangers, both digital and physical

Copyrights

As stated by Hirtle (2003), “Digital preservation and access is all about copying.”

The exclusive rights of copyright holders are in conflict with the needs of curators and repositories

Copyright holders control (1) the ability to reproduce, (2) the ability to publicly display information, (3) the right to adapt information

Digital rights management software embedded in resources controls how they are used and for how long

What rights do curators and repositories have to preserve digital information and resources?

Rights

What are the rights of curators and repositories?

Copyright Act: Section 108(b) & 108(c)

Section 108(b): allows libraries and archives to reproduce unpublished resources as long as they own them

May make a maximum of three copies for “preservation, security and deposit.”

Section 108(c): allows libraries and archives narrower reproduction rights for published resources

Have the authority to create a maximum of three copies of published resources if damaged, deteriorating or lost. Cannot make a copy unless this occurs

The Fair Use Provision

Gives repositories and curators the right to copy and preserve resources that they may not own and digital resources that they legally own

Must fulfill the four factors (PNMA) as stated by Mary Minow (2006)

1) Purpose of use: socially beneficial? Non-commercial?

2) Nature of work: what is being copied?

3) Amount and substantiality used: how much is being copied?

4) Market impact: monetary compensation for the copyright owner?

US Digital Millennium Copyright Act (DMCA)

Libraries and archives are able to make a maximum of three copies of a digital resource for preservation.

Many formats can be copied.

Copies cannot be accessed outside of the repository

Copies cannot be digitally distributed

Digital Preservation and the Three-Legged Stool

The frameworks associated with digital preservation have been compared to a three-legged stool.

Nancy McGovern, who began working with the preservation of digital information at the U.S. National Archives thirty years ago, describes the three-legged stool as consisting of “organizational infrastructure (the "what"), technological infrastructure (the "how") and a resources framework (the "how much") of building an organization's digital preservation program.”

Nancy McGovern, Digital Preservation Pioneer. Library of Congress: Digital Preservation. Retrieved April 2015, from: http://www.digitalpreservation.gov/series/pioneers/mcgovern.html

Information as Power

From a philosophical perspective, power is central to the infrastructure of the organization, and is the means through which resources are generated.

Knowledge has traditionally been wielded by elites who recognized the power of intelligence.

In fact, “Problems of government secrecy and the dangers of political influence on recordkeeping have ancient origins” (Jimerson, 2007).

Jimerson, R. (2007). Archives for All: Professional Responsibility and Social Justice. The American Archivist, Vol. 70, p261.

Archives as Power

“Written texts entrenched theocratic tyranny over vast reaches of monotheistic time and space,” according to David Lowenthal. “Most archives originated as instruments of landowners’ and lawgivers’ control. . . Archives confirmed and certified rights to land, labor, rents, and produce. Entry to archives was confined to princely, and scribal elites” (2006).

Lowenthal, D. (2006). Archives, Heritage, and History. Archives, Documentation, and Institutions of Social Memory: Essays from the Sawyer Seminar. Editors: Francis X. Blouin, Jr. and William G. Rosenberg. Ann Arbor: University of Michigan Press, p194.

Archives and Advocacy

Archivists, recordkeepers, and information professionals hold the positions once reserved for princely and scribal elites, for they control access to the material for which they are responsible.

Randall Jimerson, former President of the Society of American Archivists, refers to archives as “sites of power.”

He believes archivists should embrace their power in ascertaining “what records will be preserved ... for the benefit of all members of society (and that) archivists can use the power of archives to promote accountability, open government, diversity, and social justice” (2007).

Jimerson, R. (2007). Archives for All: Professional Responsibility and Social Justice. The American Archivist, Vol. 70, p252.

TED Talk

To summarize the potential and value of digital information, here is a TED Talk by Adam Ostrow about the possibilities and values of our digital lives