Port Townsend Leader Historical Newspaper Archive
Keith Darrock
History
The original paper began in 1889; indexing for the digital repository has been completed for 1903-1913 >>
Schema >> Dublin Core
Issue & headlines: April 1, 1910, Page three; May Roberts company to open engagement tonight; Cubs and soldiers to play practice game; Schooner Inca is coming to Puget Sound
Keywords: drama; baseball
Named individuals: Roberts, May; Rassmussen, Captain
Vessels: schooner Inca
Publisher: The Leader Company
Place of publication: United States--Washington (State)--Port Townsend
Type of publication: Newspaper
Frequency: Daily except Monday
Title notes: The Port Townsend daily leader (1904-1916); Continues: Morning leader. Continued by: Port Townsend leader (1916).
Image format: GIF image
Scanning data: Scanned from 35 mm silver negative microfilm by OCLC Preservation Resources; GIF images are 1600 pixels wide, type 89A with 2 added colors, derived from 600 dpi bitonal TIFF images.
Source of other formats: Microfilm: Port Townsend Public Library, 1220 Lawrence St., Port Townsend, WA 98368, 360-385-3181. Microfilm and bound originals: Jefferson County Historical Society, 210 Madison St., Port Townsend, WA 98368, 360-385-1003.
Rights: Use of this image is restricted to non-commercial, public access and does not include the right to create text versions.
Example taken from: http://content.lib.washington.edu/cgi-bin/viewer.exe?CISOROOT=/ptleader&CISOPTR=3978
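Because the schema is Dublin Core, a record like the one above can be serialized as the oai_dc XML that harvesters read. A minimal standard-library sketch follows; the mapping in `FIELD_TO_DC` from the archive's local field labels to Dublin Core elements is a hypothetical illustration, not the collection's documented mapping.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces of the oai_dc container format and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping: local field label -> Dublin Core element name.
FIELD_TO_DC = {
    "Issue & headlines": "description",
    "Keywords": "subject",
    "Named individuals": "subject",
    "Vessels": "subject",
    "Publisher": "publisher",
    "Place of publication": "coverage",
    "Type of publication": "type",
    "Date": "date",
    "Rights": "rights",
}

def to_oai_dc(record: Dict[str, List[str]]) -> ET.Element:
    """Build an oai_dc element; each local field holds a list of values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, values in record.items():
        element = FIELD_TO_DC.get(field)
        if element is None:
            continue  # unmapped fields are skipped in this sketch
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return root

record = {
    "Date": ["1910-04-01"],
    "Keywords": ["drama", "baseball"],
    "Named individuals": ["Roberts, May", "Rassmussen, Captain"],
    "Vessels": ["schooner Inca"],
    "Publisher": ["The Leader Company"],
}
xml_text = ET.tostring(to_oai_dc(record), encoding="unicode")
print(xml_text)
```

Several local fields collapse into `dc:subject`, which is why the local labels (Vessels, Named individuals) carry more meaning inside CONTENTdm than they do after harvesting.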
Content Standards
• MIG: Metadata Implementation Group >> http://www.lib.washington.edu/msd/mig/default.html
• Provides guidelines for creating a collection >>
• Dublin Core Field Properties Table >> http://www.lib.washington.edu/msd/mig/advice/default.html
• Date field (mm/dd/yyyy)
• Issue & headlines
• Vessels
• Named individuals
• Keywords
• Page notes
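The date field is entered as mm/dd/yyyy, while Dublin Core practice commonly favors ISO 8601 style dates (YYYY-MM-DD). A small sketch, assuming a conversion step at export or entry time, that also rejects impossible dates an indexer might type:

```python
from datetime import datetime

def mmddyyyy_to_iso(value: str) -> str:
    """Convert a mm/dd/yyyy date string to ISO 8601 (YYYY-MM-DD).

    Raises ValueError for malformed or impossible dates (e.g. 02/30/1910),
    so the same function can serve as a simple entry check.
    """
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()

print(mmddyyyy_to_iso("04/01/1910"))  # 1910-04-01
```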
• The Leader historical archive does not strictly follow content standards for controlled vocabulary >>
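One lightweight way to move free-text keywords toward a controlled vocabulary is to map known variants onto preferred terms and flag everything else for review. In the sketch below, the vocabulary in `VOCABULARY` is hypothetical (it is not LCSH or the archive's own list), and keywords are assumed to be semicolon-separated, as in the sample record.

```python
from typing import List, Tuple

# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "Baseball": {"baseball", "base ball", "ball game"},
    "Theater": {"drama", "theatre", "theater", "stage"},
    "Ships": {"ships", "vessels", "schooners"},
}

# Invert once: lowercase variant -> preferred term.
VARIANT_TO_TERM = {
    variant: term for term, variants in VOCABULARY.items() for variant in variants
}

def normalize_keywords(raw: str) -> Tuple[List[str], List[str]]:
    """Split a keyword field into (preferred terms, unmatched keywords)."""
    matched: List[str] = []
    unmatched: List[str] = []
    for keyword in (k.strip() for k in raw.split(";")):
        if not keyword:
            continue  # tolerate trailing or doubled semicolons
        term = VARIANT_TO_TERM.get(keyword.lower())
        if term is None:
            unmatched.append(keyword)  # left for a human indexer to review
        elif term not in matched:
            matched.append(term)
    return matched, unmatched

print(normalize_keywords("drama; baseball; Base Ball; picnic"))
# (['Theater', 'Baseball'], ['picnic'])
```

This does not replace human judgment; it only makes inconsistent entries visible.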
Digitization Standards
• Put onto microfilm as a Washington State Library project >>
• Scanned by OCLC into images >>
• Images originated as 600 dpi TIFF files >>
• Finalized as 1600-pixel-wide GIF images >>
• Uploaded to UW servers via CONTENTdm clients
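The step from 600 dpi TIFF masters to 1600-pixel-wide GIFs is a fixed-width downscale. The arithmetic is sketched below; the page dimensions are hypothetical, and the sketch assumes the 600 dpi figure is relative to the original page size. The actual resampling would be done by imaging software and is not shown.

```python
from typing import Tuple

def gif_target_size(tiff_width: int, tiff_height: int,
                    target_width: int = 1600) -> Tuple[int, int]:
    """Scale a page image to a fixed width, preserving the aspect ratio."""
    scale = target_width / tiff_width
    return target_width, round(tiff_height * scale)

def effective_dpi(source_dpi: int, tiff_width: int, target_width: int = 1600) -> float:
    """Resolution of the derivative relative to the original page."""
    return source_dpi * target_width / tiff_width

# Hypothetical page: 17 x 23 inches at 600 dpi -> 10200 x 13800 pixels.
print(gif_target_size(10200, 13800))        # (1600, 2165)
print(round(effective_dpi(600, 10200), 1))  # 94.1
```

Under these assumptions the access GIFs carry far less detail than the TIFF masters, which matters if OCR is later run on the derivatives rather than the masters.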
Harvested into a Federated Search Tool?
• The Port Townsend Leader archive has not been harvested by OAIster yet…
• However, many other collections within the UW digital collections have been
• The Port Townsend Leader archive can be found in OCLC’s CONTENTdm Collection of Collections http://collections.contentdm.oclc.org/
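Aggregators such as OAIster harvest collections over OAI-PMH, paging through `ListRecords` responses with a `resumptionToken`. A minimal sketch of building a request and reading one response page; the base URL, set name, and identifier below are hypothetical placeholders, not the collection's real endpoint.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(response_xml: str) -> Tuple[List[str], Optional[str]]:
    """Return record identifiers and the resumptionToken (None on the last page)."""
    root = ET.fromstring(response_xml)
    ids = [h.text for h in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)

# Hypothetical endpoint, set and response, for illustration only.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record><header><identifier>oai:example.org:ptleader/3978</identifier></header></record>"
    "<resumptionToken>abc123</resumptionToken></ListRecords></OAI-PMH>"
)
print(list_records_url("https://example.org/oai", set_spec="ptleader"))
print(parse_list_records(sample))  # (['oai:example.org:ptleader/3978'], 'abc123')
```

A harvester repeats the request with `verb=ListRecords&resumptionToken=...` until the token comes back empty.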
Software Used to House Records & Digitized Works
• CONTENTdm >>
• Originally developed by the University of Washington >>
• In 2001, Digital Media Management, Inc. was created and the system was made available to outside institutions >>
• OCLC purchased it in 2006 and now owns and manages it.
Who’s Responsible for Indexing?
The Port Townsend Public Library manages volunteers who hand-index certain fields, including:
• Issue and headlines
• Keywords
• Named individuals
• Vessels
Using a controlled vocabulary? Sometimes: for the first three years (1903-06) and sporadically thereafter. Actual LCSH headings? Probably not.
*Automation will not remove the need for human indexing of the date and subject fields
Automation – Can We Do It?
• In over ten years, human indexers have completed only ten years of content. Still, that is a lot of work: over 7,000 images so far! >>
• Need a more efficient solution? >>
• Use OCR (ABBYY FineReader) software to extract the image’s text in batches >>
• Add a new field >> Text, containing all extracted text (searchable: yes) >>
• Two files (image & OCR text) combined via CONTENTdm >> upload all to the UW main server >>
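Once OCR has produced text for each page image, the text has to travel with the image into CONTENTdm as the new Text field. The sketch below assumes a tab-delimited metadata import; the column names, file names, and sample OCR text are hypothetical, and the Project Client's own import requirements should be checked. It does not call ABBYY FineReader; it only packages OCR output that already exists.

```python
import csv
import io
import re
from pathlib import PurePosixPath
from typing import List, Tuple

def clean_ocr(text: str) -> str:
    """Collapse tabs, line breaks and repeated spaces so text fits in one cell."""
    return re.sub(r"\s+", " ", text).strip()

def build_tsv(pages: List[Tuple[str, str, str]]) -> str:
    """pages: (image file name, ISO date, raw OCR text) for each page image."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Title", "Date", "Text", "File name"])  # hypothetical columns
    for image_name, date, ocr_text in pages:
        title = PurePosixPath(image_name).stem  # placeholder title from file name
        writer.writerow([title, date, clean_ocr(ocr_text), image_name])
    return buffer.getvalue()

# Hypothetical file name and OCR output for one page.
pages = [("ptl_1910-04-01_p3.gif", "1910-04-01",
          "MAY ROBERTS COMPANY\nTO OPEN\tENGAGEMENT TONIGHT")]
print(build_tsv(pages))
```

Cleaning whitespace first matters because raw OCR output is full of line breaks and tabs, which would otherwise split one page's text across several rows or columns.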
Challenges to Automation >>
• Working with volunteers requires library staff involvement >>
• Making compound or complex objects >>
• Still need subject terms & date applied by a human indexer >>
• Getting volunteers to use an actual controlled vocabulary >>
• Time to do it all?
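A compound object here would be one issue made of page items, with OCR text at the page level and the human-assigned date and subjects at the issue level. A hypothetical data-structure sketch (class and field names are illustrative, not CONTENTdm's internal model) that also shows which part still waits on an indexer:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Page:
    label: str       # e.g. "Page three"
    image_file: str  # hypothetical page image file name
    ocr_text: str    # machine-extracted text for this page

@dataclass
class Issue:
    """One newspaper issue treated as a compound object of pages."""
    date: str = ""                                      # assigned by a human indexer
    subjects: List[str] = field(default_factory=list)   # also human-assigned
    pages: List[Page] = field(default_factory=list)

    def full_text(self) -> str:
        """Issue-level searchable text: page OCR joined in page order."""
        return "\n".join(p.ocr_text for p in self.pages)

    def needs_indexing(self) -> bool:
        """True while the human-supplied fields are still empty."""
        return not self.date or not self.subjects

issue = Issue(date="1910-04-01", pages=[
    Page("Page one", "p1.gif", "(OCR text of page one)"),
    Page("Page three", "p3.gif", "SCHOONER INCA IS COMING TO PUGET SOUND"),
])
print(issue.needs_indexing())  # True: no subject terms yet
```

OCR can fill `full_text` automatically, but `needs_indexing` stays true until a person supplies the subject terms.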
I ♥ Automation >>