Erin Kinney, Wyoming State Library


Motivation

• #1 priority that came out of 2004 statewide digitization meeting

• WSL received many reference questions, obituary and ILL requests

Project

Digitize all newspapers published in Wyoming 1849-1922* and make them easily accessible over the internet.

*Preserved on microfilm at the Wyoming State Archives

• 1,436 microfilm reels

• 850,000 full pages

• 8,000,000 clippings

Funding

• Applied for a 2006 NDNP grant, which was not funded.

• The Wyoming State Legislature appropriated $940,000 to the State Library in FY07-08.

• Requested additional funding in FY09-10 which was later denied.

The Partners

• Wyoming State Library

• Wyoming State Archives

• University of Wyoming American Heritage Center

• Wyoming Press Association

• Wyoming State Historical Society

Partners

• Wyoming State Archives provided copies of master microfilm reels

• Wyoming State Historical Society provided metadata workers

• The company picked to do the work was PTFS from Bethesda, MD

Why PTFS?

• Expertise: people, process, software, hardware

• More than ten years imaging experience

• All media types, qualities, formats

• Many hardware and software configurations

• R&D: development of an archiving system has helped PTFS perfect imaging capabilities

Technical Requirements

• All text searchable

• Content management system (CMS) with a web interface and a customizable thesaurus

• Very powerful search engine

Technical Standards

• Project followed 2007 NDNP best practices

• High Accuracy OCR & Auto-Zoning

• 400 dpi grayscale

• Enhanced metadata

• Articles, legal/land notices, and advertisements clipped

Digitization Processes

Receive newspaper microfilm reels; inventory control

Categorize, sort, prepare

Scan Microfilm at 400 dpi

Export to USB External Drive

Enhance metadata

Post Image Processing

Data formatting for system

QC/QA

OCR images

ArchivalWare Approval Server

Zone, crop & de-skew full images

Create image/text PDFs: full page & clippings

Image Sizes

File                                 Approx. Size
1 reel strip image (~1000 pages)     50 GB
2-page-up TIFF images                50 MB
Archive TIFF images                  30 MB
Uncompressed page level PDFs         5 MB
Compressed page level PDFs           500-900 KB
Clipping PDFs (uncompressed)         100-800 KB
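For a rough sense of scale, the approximate per-file sizes above can be multiplied by the project totals (1,436 reels, 850,000 pages, 8,000,000 clippings). The sketch below is illustrative only. It assumes decimal units (1 TB = 10^12 bytes), treats the approximate sizes as exact, and leaves out the 2-page-up TIFFs because the slides give no count for them. The helper name `total_tb` is made up for this example.

```python
# Project totals from the slides.
REELS = 1_436
PAGES = 850_000
CLIPPINGS = 8_000_000

# Assumption: decimal units (1 KB = 1000 bytes, and so on).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def total_tb(count, low_bytes, high_bytes=None):
    """Return (low, high) total size in terabytes for `count` files."""
    if high_bytes is None:
        high_bytes = low_bytes
    return (count * low_bytes / TB, count * high_bytes / TB)


estimates = {
    "reel strip images": total_tb(REELS, 50 * GB),
    "archive TIFFs": total_tb(PAGES, 30 * MB),
    "uncompressed page PDFs": total_tb(PAGES, 5 * MB),
    "compressed page PDFs": total_tb(PAGES, 500 * KB, 900 * KB),
    "clipping PDFs": total_tb(CLIPPINGS, 100 * KB, 800 * KB),
}

for name, (low, high) in estimates.items():
    if low == high:
        print(f"{name}: about {low:.2f} TB")
    else:
        print(f"{name}: about {low:.2f} to {high:.2f} TB")
```

Under these assumptions the archive TIFFs alone come to roughly 25.5 TB and the reel strip images to roughly 71.8 TB.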

PTFS Confidential

Challenges

• Image Quality, OCR Accuracy

• Difficult to achieve high OCR accuracy

• Original text quality varies: yellowed paper, bleed through, faded, bound page curvature

• Microfilm quality varies

• Dark borders, washed out sections, out of focus

• Misc: Scratches, Thumbs, Tape, Staples!!

• Grayscale best, but results in large file sizes


Challenges

• Rules for zoning (for clipping) are complicated to design and execute

• Newspaper formats vary widely from title to title & year to year

• Determine zoning rules and follow them consistently


Challenges

• “NDNP Ready”: imaging & metadata standards, XML packets

• Massive storage requirements: many, many terabytes of storage

• File types: TIFF and PDF

• Browse hierarchy: determines organization of collection, supports logical presentation

• Page & clipping relationships


Solutions

• Browse hierarchy: organized by county/city, then newspaper title, year, month, date. Pages and clippings will be presented together.

• Page & clipping relationship: clippings linked to pages

• Archive quality image location: archive quality images transported via USB external drive and backed up to tape (twice!)
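One way to picture the browse hierarchy and the page/clipping link is as a small data model. The sketch below is a hypothetical illustration, not the ArchivalWare schema: the class names, field names, and the example county and newspaper title are all invented for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Clipping:
    clipping_id: str
    kind: str      # e.g. "article", "legal/land notice", "advertisement"
    page_id: str   # link back to the page the clipping was cut from


@dataclass
class Page:
    page_id: str
    place: str          # county/city level of the browse hierarchy
    title: str          # newspaper title
    issue_date: date
    page_number: int
    clippings: list = field(default_factory=list)

    def browse_path(self):
        """County/city, then newspaper title, year, month, date."""
        d = self.issue_date
        return (self.place, self.title, d.year, d.month, d.day)

    def add_clipping(self, clipping_id, kind):
        """Create a clipping that points back to this page."""
        clipping = Clipping(clipping_id, kind, self.page_id)
        self.clippings.append(clipping)
        return clipping


def build_browse_tree(pages):
    """Nest pages into dicts keyed by each level of the browse path."""
    tree = {}
    for page in pages:
        node = tree
        for key in page.browse_path():
            node = node.setdefault(key, {})
        node.setdefault("pages", []).append(page)
    return tree


# Placeholder values, not real collection data.
page = Page("p1", "Example County", "Example Gazette", date(1900, 1, 5), 1)
page.add_clipping("c1", "article")
tree = build_browse_tree([page])
found = tree["Example County"]["Example Gazette"][1900][1][5]["pages"][0]
print(found.page_id, [c.clipping_id for c in found.clippings])
```

Keeping the clippings on the page record, with each clipping carrying its page's identifier, lets a browse view present a page and its clippings together, as the slide describes.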

Lessons Learned

• Lots of open communication between all partners, the contractor, and sub-contractors

• Start looking for money early, but make sure you have your ducks in a row. Don’t get discouraged.

Lessons Learned

• Think of the long term implications of decisions made at the beginning of the project

• Decisions made at the beginning of the project can have unforeseen, and often huge, implications.

Opportunities

• Fill in gaps in coverage

• Orphan papers: publishers and even towns that no longer exist

• “New to us” newspaper titles that haven’t been located and microfilmed yet

Wyoming Newspaper Project Contact

Erin Kinney, Digital Initiatives [email protected]

http://wyonewspapers.org