Port Townsend Leader Historical Newspaper Archive Keith Darrock

Port Townsend Leader Historical Newspaper Archive

Keith Darrock

HISTORY

PORT TOWNSEND LEADER

The original paper began publication in 1889. Indexing for the digital repository has been completed for 1903-1913.

Schema: Dublin Core

Issue & headlines: April 1, 1910 Page three
May Roberts company to open engagement tonight
Cubs and soldiers to play practice game
Schooner Inca is coming to Puget Sound

Keywords: drama; baseball
Named individuals: Roberts, May; Rassmussen, Captain
Vessels: schooner Inca
Publisher: The Leader Company
Place of publication: United States--Washington (State)--Port Townsend
Type of publication: Newspaper
Frequency: Daily except Monday
Title notes: The Port Townsend daily leader (1904-1916); Continues: Morning leader. Continued by: Port Townsend leader (1916).
Image format: GIF image
Scanning data: Scanned from 35 mm silver negative microfilm by OCLC Preservation Resources; GIF images are 1600 pixels wide, type 89A with 2 added colors derived from 600 dpi bitonal TIFF images.
Source of other formats: Microfilm: Port Townsend Public Library, 1220 Lawrence St., Port Townsend, WA 98368, 360-385-3181. Microfilm and bound originals: Jefferson County Historical Society, 210 Madison St., Port Townsend, WA 98368, 360-385-1003.

Rights: Use of this image is restricted to non-commercial, public access and does not include the right to create text versions.

Example taken from: http://content.lib.washington.edu/cgi-bin/viewer.exe?CISOROOT=/ptleader&CISOPTR=3978
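
To make the schema concrete, the example record can be sketched as simple Dublin Core XML. This is only an illustration: the crosswalk from the archive's local field names to Dublin Core elements (`FIELD_TO_DC`) and the `record` wrapper element are assumptions, not the collection's documented mapping.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk from local field names to Dublin Core elements.
FIELD_TO_DC = {
    "Issue & headlines": "title",
    "Keywords": "subject",
    "Named individuals": "subject",
    "Vessels": "subject",
    "Publisher": "publisher",
    "Type of publication": "type",
    "Image format": "format",
}

# Values copied from the April 1, 1910 example record.
record = {
    "Issue & headlines": ["April 1, 1910 Page three"],
    "Keywords": ["drama", "baseball"],
    "Named individuals": ["Roberts, May", "Rassmussen, Captain"],
    "Vessels": ["schooner Inca"],
    "Publisher": ["The Leader Company"],
    "Type of publication": ["Newspaper"],
    "Image format": ["GIF image"],
}


def to_dublin_core(rec):
    """Build one XML element per field value, named by the crosswalk."""
    root = ET.Element("record")
    for field, values in rec.items():
        element = FIELD_TO_DC[field]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return root


xml_text = ET.tostring(to_dublin_core(record), encoding="unicode")
print(xml_text)
```

Mapping Keywords, Named individuals and Vessels all to `dc:subject` loses the distinction between them, which is one reason a collection might keep local field labels on top of the Dublin Core mapping.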

Content Standards

• MIG: Metadata Implementation Group: http://www.lib.washington.edu/msd/mig/default.html

• Provides guidelines for creating a collection

• Dublin Core Field Properties Table

http://www.lib.washington.edu/msd/mig/advice/default.html

• Date field mm/dd/yyyy

• Issue & headlines

• Vessels

• Named individuals

• Keywords

• Page notes

• The Leader historical archive does not follow strict content standards in terms of controlled vocabulary
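
As a small illustration of the mm/dd/yyyy date convention, the sketch below converts a written-out date into that form and checks whether an existing value already follows it. The function names are hypothetical, and requiring zero-padded months and days is an assumption about how strictly the convention is applied.

```python
from datetime import datetime

ARCHIVE_FORMAT = "%m/%d/%Y"  # mm/dd/yyyy


def to_archive_date(text, source_format="%B %d, %Y"):
    """Convert a date such as 'April 1, 1910' to mm/dd/yyyy."""
    return datetime.strptime(text, source_format).strftime(ARCHIVE_FORMAT)


def is_archive_date(text):
    """Return True only for valid, zero-padded mm/dd/yyyy strings."""
    try:
        parsed = datetime.strptime(text, ARCHIVE_FORMAT)
    except ValueError:
        return False
    # strptime accepts '4/1/1910', so round-trip to enforce zero padding.
    return parsed.strftime(ARCHIVE_FORMAT) == text


print(to_archive_date("April 1, 1910"))
print(is_archive_date("04/01/1910"), is_archive_date("4/1/1910"))
```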

Digitization Standards

• Put onto microfilm as a Washington State Library project

• Scanned by OCLC into images

• Images originated in 600 dpi TIFF format

• Finalized as 1600-pixel-wide GIF format

• Uploaded to UW servers via CONTENTdm clients
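
The step from a 600 dpi TIFF to a 1600-pixel-wide GIF is a fixed-width downscale. The arithmetic can be sketched as follows; the source dimensions in the usage lines are hypothetical (a 16 x 22 inch page at 600 dpi), since the slides do not give the actual scan sizes.

```python
def scaled_size(src_width, src_height, target_width=1600):
    """Return (width, height) after scaling to target_width, keeping aspect ratio."""
    return target_width, round(src_height * target_width / src_width)


def effective_dpi(src_width, src_dpi=600, target_width=1600):
    """Resolution of the downscaled image relative to the original page."""
    return src_dpi * target_width / src_width


# Hypothetical scan: 9600 x 13200 pixels.
print(scaled_size(9600, 13200))
print(effective_dpi(9600))
```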

Harvested into a Federated Search Tool?

• The Port Townsend Leader archive has not been harvested by OAIster yet…

• However, many other collections within the UW digital collections have been harvested

• The Port Townsend Leader archive can be found in OCLC’s CONTENTdm Collection of Collections http://collections.contentdm.oclc.org/
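
OAIster harvests through OAI-PMH, so harvesting a collection comes down to issuing requests like the one built below. This is a sketch under assumptions: the base URL is a placeholder because the slides do not give the server's OAI-PMH endpoint, and using the collection alias `ptleader` (taken from the example record's URL) as the set name is a guess.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not the real server address.
BASE_URL = "http://example.org/oai/oai.php"


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for one set in simple Dublin Core."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": set_spec,
        }
    )
    return f"{base_url}?{query}"


url = list_records_url(BASE_URL, "ptleader")
print(url)
```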

Software Used to House Records & Digitized Works

• CONTENTdm

• Originally developed by the University of Washington

• 2001: Digital Media Management, Inc. was created and the system was made available to outside entities

• OCLC purchased it in 2006 and now owns and manages it

Who’s Responsible for Indexing?

The Port Townsend Public Library manages the volunteer(s) who hand-index certain fields, including:

• Issue and headlines

• Keywords

• Named individuals

• Vessels

Using a controlled vocabulary? Sometimes, including the first three years (1903-06) and sporadically thereafter. Actual LCSH headings? Probably not.

*Automation will not remove the need for human indexing of the date and subject fields

Automation – Can We Do It?

• In over ten years, human indexers have completed only ten years of content. Even so, that is a lot of work: over 7,000 images so far!

• Need a more efficient solution?

• Use OCR software (ABBYY FineReader) to extract each image's text in batches

• Add a new field, Text, that contains the full text (searchable: yes)

• Two files, image and OCR text, are combined via CONTENTdm, then everything is uploaded to the UW main server
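
The pairing step can be sketched as follows, assuming the OCR software has already written one plain-text file per page image with a matching base name. The directory layout, the column names `File name` and `Text`, and the use of a tab-delimited file as the import format are assumptions for illustration, not the project's actual workflow.

```python
import csv
import io
from pathlib import Path


def build_rows(image_dir, ocr_dir):
    """Pair each GIF page image with its OCR text file by base name."""
    rows = []
    for image in sorted(Path(image_dir).glob("*.gif")):
        ocr_file = Path(ocr_dir) / (image.stem + ".txt")
        text = ocr_file.read_text(encoding="utf-8") if ocr_file.exists() else ""
        # Collapse line breaks and runs of spaces so each record stays on one line.
        rows.append({"File name": image.name, "Text": " ".join(text.split())})
    return rows


def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["File name", "Text"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Pages with no OCR file get an empty Text value rather than being dropped, so the image is still uploaded and the gap is easy to spot later.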

Challenges to Automation

• Working with volunteers requires library staff involvement

• Making compound or complex objects

• Subject terms and dates still need to be applied by a human indexer

• Having volunteers use an actual controlled vocabulary

• Time to do it all?
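
One low-cost way to support volunteers with a controlled vocabulary is to check each Keywords value against an approved list before upload. The sketch below assumes semicolon-separated terms, as in the example record ("drama; baseball"); the term list itself is hypothetical.

```python
# Hypothetical approved terms; a real list would come from library staff.
CONTROLLED_TERMS = {"baseball", "drama", "shipping"}


def split_terms(field_value):
    """Split a semicolon-separated field into trimmed, non-empty terms."""
    return [term.strip() for term in field_value.split(";") if term.strip()]


def unapproved_terms(field_value, vocabulary=CONTROLLED_TERMS):
    """Return the terms that are not in the controlled vocabulary."""
    return [
        term
        for term in split_terms(field_value)
        if term.lower() not in vocabulary
    ]


print(unapproved_terms("drama; baseball"))
print(unapproved_terms("drama; base ball; Theater"))
```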
