Down and Dirty Digitization: Everything you need to know about putting content online
Roy Tennant
California Digital Library


Outline

  • Project Planning
  • Selecting Material to Digitize
  • Digitization Purpose
  • Basic Imaging Principles
  • Capturing Images
  • Editing Images
  • Best Practices
  • Conversion to Text
  • Metadata
  • Access Systems
  • Skills Required of Staff
  • Preservation

Project Planning

  • Who will do the work?
  • What systems will be required?
  • What are the specifications for images and metadata?
  • How much will the project cost?
  • Who will own and manage the digital products that will be produced?

Steve Chapman, from Handbook for Digital Projects, NEDCC

Selecting Material to Digitize

  • Publishing rights
  • Available support/funding opportunity
  • Critical mass
  • Uniqueness
  • Reputation
  • Audience and potential use
  • Diversity of material type
  • Ability to stand on its own and fit in with other collections

What Do We Preserve?

The body or the soul?

  • The artifact
  • The intellectual content

How do we decide that the artifact has preservation value?

Who decides?

The Artifact

  • The “look and feel”
  • The experience of interacting with a specific object
  • Consequences:
      • Choices for providing access are limited
      • Time and money spent on recreating the artifact may be better spent on increasing access
      • In some cases, preserving the look and feel actually harms other uses


Written Material

  • Handwritten texts (diaries, etc.), or those with handwritten notations (manuscript drafts, etc.), can easily be considered to have artifactual value
  • But how much artifactual value do printed texts have?
  • And born-digital texts?
  • What’s it worth to you?


“If the goal of preservation is persistent utility, then functionality rather than aesthetics should drive system design.”

— Stephen Chapman, “Content Follows Form: Preservation via Systems Design,” Microform & Imaging Review

Persistent Utility

  • Form must be allowed to be altered or destroyed to retain or enhance function
  • If function cannot be retained or enhanced, then form should be preserved

Considerations for Retaining Items in Original Format

  • Age
  • Evidential value
  • Aesthetic value
  • Scarcity
  • Associational value
  • Market value
  • Exhibition value


“The issue is not to evaluate the artifact per se to determine what survives and what does not…The issue is the need to agree on a method for interrogating the individual artifact, that would, in a climate of finite resources, help make a good decision about whether and how to preserve it.”

— Council on Library and Information Resources, The Evidence in Hand: the Report of the Task Force on the Artifact in Library Collections

How Do We Preserve It?

[Chart: preservation costs by method, on a scale from $0 to $2,000, for Bind/Box, Deacidify, Microfilm, Digitize Simple Book, Digitize Complex Book, and Conserve. Costs calculated by the Library of Congress Preservation Directorate.]

Types of Materials

  • Printed text/simple line art
  • Manuscripts
  • Halftones
  • Continuous tone
  • Mixed

From Anne Kenney et al., Moving Theory into Practice

Benchmarking

The process whereby you determine your digitization requirements using the material you will digitize

Resolution

The number of pixels in a given area defines the resolution of an image

[Figure labels: one pixel; 1”; 500 x 1,000 pixels]
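
The relationship between physical size, scanning resolution, and pixel dimensions can be sketched in a few lines. This is only an illustration: the function name and the sample page size are assumptions, not taken from the slides.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for an original scanned at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

# An 8.5" x 11" page scanned at 300 dpi (illustrative values)
print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)
```

Doubling the dpi doubles each dimension, so the pixel count (and the uncompressed file size) grows fourfold.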

Dynamic Range (bit-depth)

[Sample images: 1 bit; 8 bit grayscale; 8 bit color; 24 bit color. Format labels: (GIF), (GIF), (JPEG)]

  • 1 bit = black or white
  • 8 bits = 256 shades
  • 16 bits = thousands
  • 24 bits = millions
  • 36 bits = billions
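
The figures above follow from the fact that n bits can encode 2 to the power n distinct values. A minimal sketch (the function name is an assumption for illustration):

```python
def tonal_values(bits_per_pixel):
    """Number of distinct shades or colors a pixel of this bit depth can hold."""
    return 2 ** bits_per_pixel

for bits in (1, 8, 16, 24, 36):
    print(f"{bits:>2} bits = {tonal_values(bits):,} values")
```

For example, 16 bits gives 65,536 values and 24 bits gives 16,777,216, which is where "thousands" and "millions" come from.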

RGB Color Space

Color channels: Red, Green, Blue

  • 8 bits per channel = 24 bit color image
  • 12 bits per channel = 36 bit color image

Image Compression

  • Lossless: the image is unchanged after compression (no image data is lost)
      • Typical file size: 50% of original
      • Example: LZW compression
  • Lossy: the image is altered after compression (image data is lost)
      • Example: JPEG
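
The difference between the two can be demonstrated with a toy example. Note the substitutions: Python's standard library has no LZW codec, so Deflate (via zlib) stands in as the lossless method, and simple bit truncation stands in for a lossy method. It is not JPEG, and the sample "scan line" is made up.

```python
import zlib

# A flat 8-bit grayscale "scan line": mostly white paper with a short dark stroke
pixels = bytes([255] * 900 + [40] * 100)

# Lossless: Deflate (zlib) stands in for LZW here; the round trip is exact
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_ok = restored == pixels

# Lossy (toy example, not JPEG): keep only the top 4 bits of each pixel
quantized = bytes(p & 0xF0 for p in pixels)
lossy_changed = quantized != pixels

print(len(pixels), len(packed), lossless_ok, lossy_changed)
```

This artificial line compresses far better than a real photograph would; actual ratios depend heavily on image content.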

TIFF

  • Tagged Image File Format
  • Most often used to save “master versions” of images (unedited)
  • Can be compressed or uncompressed

CompuServe GIF

  • Graphics Interchange Format (GIF)
  • Maximum 8 bits/pixel: 256 colors (shades)
  • Good for:
      • Text and line art
      • Thumbnails
  • Not good for:
      • Full-color pictures
      • Anything that requires more than 256 colors

JPEG

  • Joint Photographic Experts Group
  • JPEG is actually a compression scheme; the image file format is JFIF (JPEG File Interchange Format)
  • Good for:
      • Full-color pictures
      • Anything that requires more than 256 colors
  • Not good for:
      • Text or line art

New Image Formats

  • Portable Network Graphics (PNG) - from the W3C, to replace the CompuServe GIF format and provide more capabilities
  • JPEG2000 - an upgrade of the JPEG format
  • Flashpix - from a consortium of commercial companies, to provide much higher-resolution images in a way that allows speedy network delivery
  • MrSID - from LizardTech, good for large-format materials (maps, panoramic photos, etc.)

Capturing Images

  • Technologies
      • Digital Cameras
      • Flatbed Scanners
      • Film Scanners
      • Kodak PhotoCD
  • Outsourcing
  • Standards and Best Practices

Digital Cameras

  • BetterLight Super6K: 6,000 x 8,000 pixels, 136MB (24 bit RGB), $16,990
  • Phase One PowerPhase FX: 10,500 x 12,600 pixels, 760MB (48 bit RGB)
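
File sizes like these can be estimated from pixel dimensions and bit depth. The sketch below assumes uncompressed data, no file-format overhead, and 1 MB = 1,048,576 bytes; the function name is illustrative.

```python
def uncompressed_size_mb(width_px, height_px, bits_per_pixel):
    """Uncompressed image size in megabytes (1 MB = 1,048,576 bytes)."""
    total_bytes = width_px * height_px * bits_per_pixel // 8
    return total_bytes / (1024 * 1024)

print(round(uncompressed_size_mb(6000, 8000, 24)))    # 137
print(round(uncompressed_size_mb(10500, 12600, 48)))  # 757
```

Both estimates land close to the quoted 136MB and 760MB; the small differences presumably come from rounding in the quoted figures.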

Flatbed Scanners

  • Minimum requirements:
      • 600 x 1200 dpi optical resolution
      • 36-bit color
  • Not for slides or transparencies; best for 8 1/2” x 11” or 8 1/2” x 14” originals
  • Sheet feeder (often optional) helpful for digitizing text

Film Scanners

  • For 35mm slides and negatives; others available for larger formats
  • $600 - $3,000
  • Most around 2700-4000 dpi, 30-36 bit color

Kodak PhotoCD

  • Take pictures with a normal camera, but have your pictures “developed” onto a PhotoCD
  • A proprietary image format (ImagePAC), but very high resolution (4 different resolutions)

Outsourcing: Pros and Cons

  • Benefits:
      • No ramp-up costs (both time and money)
      • Probably higher quality, at least to begin with
      • High volume capability
  • Drawbacks:
      • May be more costly if you have underutilized staff time
      • No internal capability or experience developed (that is, when the money runs out, so does your chance to do anything more)
      • Rare items may require in-house digitization

Outsourcing: How

  • Write an RFQ (Request for Quote) outlining:
      • Type and amount of material being digitized
      • Quality requirements
      • Volume per unit of time requirements
  • For RFQ guidance and samples, see RLG Tools for Digital Imaging: www.rlg.org/preserv/RLGtools.html

Digital Image Work Flow

[Diagram: Original TIFF or PCD (10-100+MB) -> JPEG (100K) -> GIF (10K). Labels: RGB color space; indexed color space; rotate, crop, retouch, brightness/contrast; resize, sharpen; stored offline; stored online.]

Editing Images

  • Rotating
  • Cropping
  • Retouching
  • Adjusting
  • Resizing
  • Sharpening
  • Saving


Image Editing Demonstration

Conversion to Text

  • Optical Character Recognition (OCR) software is required (Caere OmniPage Pro, Xerox TextBridge, etc.)
  • Quality and typography of originals are key
  • If OCR accuracy is below 99.5%, it is less expensive to have the text re-keyed offshore
  • For some applications, uncorrected text is sufficient
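
To get a feel for what an accuracy percentage means per page, and how the 99.5% rule of thumb might be applied, here is a small sketch. The page length of 2,000 characters and both function names are assumptions for illustration.

```python
def expected_errors(char_count, accuracy):
    """Estimated number of wrong characters at a given OCR accuracy (0-1)."""
    return round(char_count * (1 - accuracy))

def suggest_route(accuracy, threshold=0.995):
    """Apply the slide's rule of thumb: below the threshold, re-keying is cheaper."""
    return "re-key" if accuracy < threshold else "correct OCR output"

# Assumed page length of 2,000 characters (illustrative, not from the slides)
print(expected_errors(2000, 0.995))  # 10
print(suggest_route(0.98))           # re-key
```

Even at 99.5% accuracy, a page of this length would still carry about ten errors to find and fix by hand.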

Imaging Best Practices

General guidelines for archival versions:

  • Photos, illustrations, maps, etc.: 300-600 dpi, 24-36 bit color
  • B/W text document: 300-600 dpi, 8 bit grayscale
  • Negatives and slides: 2000-4000 pixels in longest dimension; 24-36 bit color for color, 8 bit grayscale for B/W
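
For film, the pixel target translates into a scanner resolution once the size of the original is known. The sketch below works this out for a 35mm frame, which measures 36mm on its long side; the function name is an assumption.

```python
MM_PER_INCH = 25.4

def required_dpi(target_pixels, long_side_mm):
    """Scanning resolution needed to reach a pixel count along the long side."""
    return target_pixels / (long_side_mm / MM_PER_INCH)

# A 35mm frame is 36mm on its long side
for target in (2000, 4000):
    print(target, round(required_dpi(target, 36)))
```

This prints roughly 1411 dpi for 2000 pixels and 2822 dpi for 4000 pixels, so the upper target sits within the 2700-4000 dpi range quoted for film scanners.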


Imaging Best Practices

“The key to image quality is not to capture at the highest resolution or bit depth possible, but to match the conversion process to the informational content of the original, and to scan at that level--no more, no less.” — Moving Theory Into Practice

Metadata: Types

Structured description of an object or collection of objects

Three basic types:

  • descriptive - e.g., title, creator, subject - used for discovery
  • administrative - e.g., resolution, bit depth - used for managing the collection
  • structural - e.g., table of contents page, page 34, etc. - used for navigation

Metadata: Appropriate Level

  • Collection-level access:
      • Discovery metadata describes the collection
      • Example: Archival finding aid encoded in SGML; see http://www.oac.cdlib.org/
  • Item-level access:
      • Discovery metadata describes the item
      • Example: individual metadata records for each item; see http://jarda.cdlib.org/cgi-bin/imagesearch.pl

Collection Level Access

[Diagram labels: Search Interface (library catalog or dedicated); Individual Finding Aid (shown twice); Images]

Item Level Access

[Diagram labels: Search Interface (dedicated); Images; Finding Aids]


jarda.cdlib.org/search.html

Metadata: Granularity

    <name>William Randolph Hearst</name>

    <name>
      <first>William</first>
      <middle>Randolph</middle>
      <last>Hearst</last>
    </name>

  • Consider all uses for the metadata
  • Design for the most granular use
  • Store it in a machine-parseable format

Metadata: Qualification

    <name role="creator">William Randolph Hearst</name>
    <subject scheme="LCSH">Builder -- Castles -- Southern California</subject>
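
Qualifiers such as role and scheme are what let software tell one name or subject from another. A minimal sketch using Python's standard XML parser follows; the wrapping <record> element is hypothetical, added only so the two elements form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The two elements from the slide, wrapped in a hypothetical <record> root
record = ET.fromstring(
    '<record>'
    '<name role="creator">William Randolph Hearst</name>'
    '<subject scheme="LCSH">Builder -- Castles -- Southern California</subject>'
    '</record>'
)

# Select only names qualified as creators
creators = [n.text for n in record.findall("name") if n.get("role") == "creator"]

# Read the subject scheme and split the heading into its subdivisions
subject = record.find("subject")
subject_parts = [part.strip() for part in subject.text.split("--")]

print(creators)
print(subject.get("scheme"), subject_parts)
```

Without the role attribute, a program could not distinguish a creator from, say, a person depicted in the item.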

Metadata: Machine Parseability

The ability to pull apart and reconstruct metadata via software

For example, this:

    <name>
      <first>William</first>
      <middle>Randolph</middle>
      <last>Hearst</last>
    </name>

Can easily become this:

    <DC.creator>Hearst, William Randolph</DC.creator>
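
One way this transformation might be done in software is sketched below with Python's standard XML library. The function name is an assumption, and a production system would more likely use a stylesheet or a mapping table than hard-coded element names.

```python
import xml.etree.ElementTree as ET

granular = ET.fromstring(
    "<name><first>William</first><middle>Randolph</middle><last>Hearst</last></name>"
)

def to_dc_creator(name):
    """Rebuild a granular <name> element as an inverted-order DC.creator element."""
    first, middle, last = (name.findtext(tag) for tag in ("first", "middle", "last"))
    creator = ET.Element("DC.creator")
    creator.text = f"{last}, {first} {middle}"
    return ET.tostring(creator, encoding="unicode")

print(to_dc_creator(granular))  # <DC.creator>Hearst, William Randolph</DC.creator>
```

Going the other way, from the single string to separate first, middle, and last elements, cannot be done reliably, which is the argument for storing the most granular form.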

Metadata: Standards

  • Collection Level:
      • Encoded Archival Description (EAD) - lcweb.loc.gov/ead/
  • Item Level:
      • MARC
      • Dublin Core - purl.org/DC/
      • MODS - www.loc.gov/standards/mods/
  • Harvesting:
      • Open Archives Initiative - www.openarchives.org
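
Harvesting under the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) works through plain HTTP requests. The sketch below only builds a ListRecords request URL for unqualified Dublin Core (the oai_dc metadata prefix); the repository address is a placeholder, not a real endpoint, and the function name is an assumption.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"

# Placeholder base URL for illustration only
print(list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token to retrieve further batches.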

Access Systems

  • Exhibit
  • Browse
  • Search

Access Systems: Exhibit

  • Goals:
      • Inviting
      • Easy to navigate
      • Highlight selected parts of a collection
      • Teach
  • Requirements:
      • Great graphic design
      • Informative and succinct commentary
      • Interesting subject matter

Access Systems: Browse

  • Goals:
      • Provide intriguing and interesting paths into and throughout a collection
      • Give a broad sense of a collection, but not necessarily show everything
  • Requirements:
      • Logical browse paths
      • May have multiple paths to the same items (e.g., time, geography, subject)
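
Multiple browse paths to the same items can be derived from item-level metadata by grouping on each facet. In this sketch the item records, facet names, and function name are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical item records; titles and facet values are made up for illustration
items = [
    {"title": "Harbor view", "decade": "1900s", "place": "San Francisco", "subject": "Ships"},
    {"title": "Mission bells", "decade": "1900s", "place": "Santa Barbara", "subject": "Missions"},
    {"title": "Ferry landing", "decade": "1910s", "place": "San Francisco", "subject": "Ships"},
]

def browse_paths(records, facets):
    """Group item titles under each value of each facet (time, geography, subject)."""
    paths = {facet: defaultdict(list) for facet in facets}
    for record in records:
        for facet in facets:
            paths[facet][record[facet]].append(record["title"])
    return paths

paths = browse_paths(items, ("decade", "place", "subject"))
print(sorted(paths["place"]))
print(paths["decade"]["1900s"])
```

The same item ("Harbor view") is reachable by decade, by place, and by subject, which is the point of offering several paths.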

Access Systems: Search

  • Goals:
      • To provide post-coordinate access to all items in a collection relevant to a particular query
      • To provide good methods to create a search as well as refine or alter the display as required
  • Requirements:
      • Good search software (database or indexing software)
      • Good metadata (minimum is probably a title or caption for each item)
      • Good interface (options for navigation, search refinement, etc.)
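
Post-coordinate access means terms are combined at search time rather than when items are cataloged. A toy inverted index over captions shows the idea; the item IDs, captions, and function names are hypothetical, and real search software does far more (stemming, ranking, fielded search).

```python
from collections import defaultdict

captions = {  # hypothetical item IDs and captions
    1: "Hearst Castle under construction",
    2: "William Randolph Hearst at his desk",
    3: "Castle gardens, San Simeon",
}

def build_index(docs):
    """Map each lowercase word to the set of item IDs whose caption contains it."""
    index = defaultdict(set)
    for item_id, text in docs.items():
        for word in text.lower().replace(",", " ").split():
            index[word].add(item_id)
    return index

def search(index, query):
    """Return IDs of items whose captions contain every query term (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)

index = build_index(captions)
print(sorted(search(index, "hearst castle")))  # [1]
print(sorted(search(index, "castle")))         # [1, 3]
```

Adding a term narrows the result set, which is one simple form of search refinement.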

Skills Required of Staff

  • Imaging
  • OCR
  • Markup languages (HTML, XML)
  • Cataloging & metadata
  • Indexing and database technology
  • User interface design
  • Programming
  • Web technology
  • Project management

How Does Digital Data Die?

Let me count the ways…

  • New replaces old
  • Death of a sponsor
  • Sponsor loses interest
  • Lost functionality
  • Format rot
  • Media format obsolescence
  • Content format obsolescence
  • Disaster

Preserving Digital Content

  • No preservation format
  • Digital preservation techniques:
      • Print (on acid free paper!)
      • Store
      • Refresh
      • Encapsulate
      • Emulate
      • Proliferate (Lots Of Copies Keep Stuff Safe, or LOCKSS)
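
Keeping many copies only helps if damaged copies can be detected. The sketch below compares SHA-256 checksums across copies and flags any that disagree with the majority. It illustrates the general idea only; it is not the LOCKSS protocol, and the site names and data are invented.

```python
import hashlib
from collections import Counter

def digest(data):
    """SHA-256 checksum of a copy's bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def find_damaged_copies(copies):
    """Return names of copies whose checksum differs from the majority checksum."""
    digests = {name: digest(data) for name, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)

copies = {
    "site_a": b"master image bytes",
    "site_b": b"master image bytes",
    "site_c": b"master image bytez",  # simulated bit rot
}
print(find_damaged_copies(copies))  # ['site_c']
```

A damaged copy found this way can be replaced from one that still matches, which is also the moment to refresh it onto new media.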

Preserving Digital Content

  • Institutional commitment
  • Consortial agreements
  • Cooperatively funded central repositories
  • Preservation Open Market

The Best Defense

  • What will ensure that material will not be preserved?
      • Ignorance of its existence
      • Ignorance of its worth
      • Inability or unwillingness to pay for its preservation
  • Access helps with all of these problems