1.2 Content Management Catherine M. Jannik Georgia Institute of Technology MetaArchive Distributed Digital Preservation Workshop Emory University – Atlanta, Georgia May 30, 2007


1.2 Content Management

Catherine M. Jannik
Georgia Institute of Technology

MetaArchive Distributed Digital Preservation Workshop
Emory University – Atlanta, Georgia

May 30, 2007


Session Learning Objectives

Scoping Content: Determining the scope of your network and what you will harvest.

Describing Content: Determining how you will describe content and why you need a conspectus.

Inventorying Collections: Completing the conspectus based on your scope and schema.

Harvesting: Prioritizing and preparing for harvest.


Scoping Content

Identify criteria for what is in and out of scope; what will be harvested

Scoping decisions

1. Subject

2. Media formats

3. Risk

4. MetaArchive case study


Scoping Content

1. Subject

a. Is there a subject area the members share in common that might provide a starting point?

b. How to define that subject area and its boundaries.

c. Establish or adopt a controlled vocabulary.


The partner institutions of this project are engaged in a three-year process to develop a cooperative for the preservation of at-risk digital content with a particular content focus: the culture and history of the American South.


A discussion of Southern culture and history must always begin with clarification of the terms. Southern is a term that, to most, brings to mind a particular region. However, upon closer inspection, the South and its boundaries are not so easily mapped. One could begin and end with the eleven former Confederate states, though that excludes the four other slave states that remained part of the Union. One could consider the “census South”: the Confederacy with the addition of Delaware, Maryland, West Virginia, Oklahoma, and the District of Columbia. There is also the Gallup organization’s South that includes the Confederate eleven plus Oklahoma and Kentucky. Then there are the areas of the country that serve as home to former Southerners who retain much of their culture and infuse their new locales with vestiges of their former homes. As the Encyclopedia’s editors and authors did, we will rely on a cultural definition of the South more inclusive than not, focusing largely on the former states of the Confederacy but without excluding the margins of the region where the culture of the South is evident. After careful contemplation of the meaning of “culture,” the editors of the Encyclopedia planned their work “to carry out [T.S.] Eliot’s belief that ‘culture is not merely the sum of several activities, but a way of life.’” History is the most easily defined of the terms and is evident in most of the collections in the MetaArchive project; however, an historical component was not required for consideration.

10.13.2004


The definition of Southern culture and history used in this project is constructed with broad strokes. The Content Committee responsible for this definition owes a debt of gratitude to the editors of the Encyclopedia of Southern Culture on whose introduction we relied heavily.

A discussion of Southern culture and history must always begin with clarification of the terms. Southern is a term that, to most, brings to mind a particular region. However, upon closer inspection, the South and its boundaries are not so easily mapped. One could begin and end with the eleven former Confederate states, though that excludes the four other slave states that remained part of the Union. One could consider the “census South”: the Confederacy with the addition of Delaware, Maryland, West Virginia, Oklahoma, and the District of Columbia. There is also the Gallup organization’s South that includes the Confederate eleven plus Oklahoma and Kentucky. The National Endowment for the Humanities includes Puerto Rico and the Virgin Islands in its South Atlantic Humanities Center.

The South is also an identity. Southerners who move outside of the region, however defined, retain much of their culture and infuse their new locales with vestiges of their former homes. Conversely, people born outside of the South who come to live within the region find that their work and lives are influenced by their adopted home and themselves become a part of the evolving South.

As the Encyclopedia’s editors and authors did...

04.19.2005


MetaArchive

Scope document https://www.metaarchive.org/metawiki/index.php?title=Main_Page


Scoping Content

2. Media formats

a. What can the repository support? (LOCKSS is agnostic)

b. Which of those formats does each member have?

c. Of that list, which formats do we want to include?

d. Stance on master vs. derivative or compressed files (jpg vs. tiff)


MetaArchive and Formats

Formats and Media

The digital formats of the material considered do not affect the harvest because LOCKSS is format-agnostic. The LOCKSS system provides redundant replication of files in any format. Hence, formats were not a major consideration for risk ranking. Most of the candidate collections incorporate content in several formats and each institution will handle its own format migration outside the scope of this project. Collections stored only on off-line media should be considered at high risk and, therefore, become part of this preservation cache.

The extent or size of the collection and the Internet Media [MIME] Types included will be noted. NOTE: This follows the Western States Dublin Core Metadata Best Practices, Version 2.0 Draft, August 2004: http://www.cdpheritage.org/resource/metadata/documents/WSDCMBP_v2-0.pdf

Example

Format [Extent] 3,000,000 bytes
Format [Medium] DVD
Format [IMT] image/jpeg
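
The three Format values in the example can be gathered partly by script. The following is a minimal Python sketch, not part of the MetaArchive specification: the `FormatDescription` class and the `describe_file` helper are hypothetical names, the file name and size are made up to match the example, and the medium still has to be supplied by hand because it cannot be read from the file itself.

```python
import mimetypes
from dataclasses import dataclass


@dataclass
class FormatDescription:
    """Hypothetical holder for the three Format qualifiers in the example."""
    extent_bytes: int   # Format [Extent]
    medium: str         # Format [Medium], e.g. "DVD"
    imt: str            # Format [IMT], an Internet Media (MIME) Type

    def lines(self) -> list[str]:
        # Render the record in the same layout as the slide's example.
        return [
            f"Format [Extent] {self.extent_bytes:,} bytes",
            f"Format [Medium] {self.medium}",
            f"Format [IMT] {self.imt}",
        ]


def describe_file(name: str, size_bytes: int, medium: str) -> FormatDescription:
    # Guess the media type from the file extension; fall back to a generic type.
    imt, _encoding = mimetypes.guess_type(name)
    return FormatDescription(size_bytes, medium, imt or "application/octet-stream")


record = describe_file("photo.jpg", 3_000_000, "DVD")
print("\n".join(record.lines()))
```

Guessing from the extension is only a starting point; a collection with unusual or missing extensions would need a manual check of the IMT value.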


Master vs. derivative

The MetaArchive of Southern Digital Culture understands that in the digital realm there are often several versions of a digital object which together comprise the complete copy of the object as a whole. In this area we encourage our partners to select the version or versions of the digital object which best represent the original content of the object. We recognize that organizations such as DLF have created standards for elements of a digital master registry (http://www.diglib.org/collections/reg/DigRegGuide.htm). In such documents we note the attempt to distinguish two categories of digital objects which we will refer to as (i) preservation or digital masters and (ii) access, digital use or surrogate copies. Member institutions will decide whether they want to preserve their digital masters, their access copies, or all versions of a digital object.


http://www.diglib.org/collections/reg/DigRegGuide.pdf


Scoping Content

3. Risk

a. Born digital vs. digitized

b. Particular formats

c. Items in use vs. dark items
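
One way to make these risk factors comparable across collections is to record them as flags and derive a coarse ranking. The sketch below is illustrative only: the `CollectionRisk` fields, the `risk_level` function, and the flag-counting rule are assumptions, not MetaArchive policy. The one rule taken from the MetaArchive format statement is that collections stored only on off-line media are treated as high risk. Format is left out because that statement says formats were not a major consideration for risk ranking.

```python
from dataclasses import dataclass


@dataclass
class CollectionRisk:
    name: str
    born_digital: bool   # assumption: no analog original to re-digitize
    dark: bool           # assumption: unused items, so damage may go unnoticed
    offline_only: bool   # stored only on off-line media such as DVD


def risk_level(c: CollectionRisk) -> str:
    # From the format statement: off-line-only collections are high risk.
    if c.offline_only:
        return "high"
    # Hypothetical rule: count the remaining flags (0 = low, 1 = medium, 2 = high).
    flags = int(c.born_digital) + int(c.dark)
    return {0: "low", 1: "medium", 2: "high"}[flags]


# Made-up collection used only to show the call.
sample = CollectionRisk("Digitized photographs", born_digital=False,
                        dark=True, offline_only=False)
print(sample.name, "->", risk_level(sample))
```

A network would replace the counting rule with whatever weighting its members agree on; the point is only that the factors are recorded per collection and ranked the same way everywhere.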


Scoping Content

3. Risk


Scoping Content

4. MetaArchive Case Study


Describing Content

How will the content be described?

Cataloging decisions

1. Schemas

2. Conspectus database

3. MetaArchive case study


Describing Content

1. Schemas

a. Adopt, adapt, or create a schema?

b. Content, context, and format must be captured.


Western States Dublin Core Metadata Best Practices

http://www.westernwater.org/pdf/Western%20waters%20Dublin%20Core%20Metadata_v2-0.pdf


Collaborative Digitization Program Dublin Core Metadata Best Practices

http://www.cdpheritage.org/cdp/documents/CDPDCMBP.pdf


Dublin Core Collections Application Profile

http://dublincore.org/groups/collections/collection-application-profile/


UKOLN Research Support Libraries Programme (RSLP) Collection Description Schema

http://www.ukoln.ac.uk/metadata/rslp/schema/


IMLS DCC Collection Description Metadata Schema

http://imlsdcc.grainger.uiuc.edu/CDschema_elements.asp


PREMIS Preservation Metadata: Implementation Strategies

http://www.oclc.org/research/projects/pmwg/premis-final.pdf


MetaArchive Collection-Level Conspectus Metadata Specification

http://metaarchive.org/pdfs/conspectus_md_2005.html


Describing Content

2. Conspectus database

a. Collection-management tool

b. How it’s constructed

c. MetaArchive’s framework


Describing Content


Describing Content

3. MetaArchive Case Study


Inventorying Collections

Gather information based on descriptive schema

Describing your collection

1. Inventory

a. What to include

b. Timeframe

c. Preparation


Inventorying Collections

a. What to include – network and local decision

b. Timeframe – agree on one

c. Preparation – get all the information together first!


Harvesting

What to save/harvest first, later, eventually and how

Preparing for harvest

1. Prioritization

a. Institution level

b. Network level
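
The two levels of prioritization can be combined into a single harvest queue. The following Python sketch shows one possible approach, under stated assumptions: the `Candidate` fields, the `harvest_order` function, and the rule "network ranks by risk, institution priority breaks ties" are hypothetical, and the institutions and collections listed are invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    institution: str
    collection: str
    local_priority: int  # set by the institution; 1 = harvest first
    risk: str            # "high", "medium", or "low"


# Lower number sorts earlier in the queue.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def harvest_order(candidates: list[Candidate]) -> list[Candidate]:
    # Network level: higher risk first. Institution level: local priority breaks ties.
    return sorted(candidates, key=lambda c: (RISK_ORDER[c.risk], c.local_priority))


queue = harvest_order([
    Candidate("Institution A", "Oral histories", 2, "low"),
    Candidate("Institution B", "Theses on DVD", 1, "high"),
    Candidate("Institution A", "Campus photographs", 1, "low"),
])
for c in queue:
    print(c.institution, "-", c.collection)
```

Keeping the institution's own ranking as a separate field means the network can change its rule later without asking members to re-rank their collections.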


Harvesting

2. Preparation

a. Inventory

b. “Data wrangling”

c. Plug-ins and manifest pages
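
A manifest page is, roughly, a small web page that grants LOCKSS permission to collect a collection and links to where the crawl should start. The sketch below generates one in Python as an illustration only: the `manifest_page` function, the title, and the example.org URL are hypothetical, and the permission wording is an assumption based on the commonly used LOCKSS statement, so it should be checked against current LOCKSS documentation before use.

```python
from html import escape

# Assumed wording of the LOCKSS permission statement; verify before use.
PERMISSION = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


def manifest_page(title: str, start_urls: list[str]) -> str:
    # One list item per starting URL for the crawl.
    links = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in start_urls
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <ul>\n{links}\n  </ul>\n"
        f"  <p>{PERMISSION}</p>\n"
        "</body>\n"
        "</html>\n"
    )


page = manifest_page(
    "Example Collection Manifest",
    ["http://example.org/collection/index.html"],
)
print(page)
```

The plug-in that tells LOCKSS how to crawl the collection is a separate piece and is not covered by this sketch.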


HARVEST!!


Discussion

[email protected]

Thanks!