Digital Preservation: Digitization Basics for Archives and Special Collections - Part 2: Store and Share

WiLSWorld 2015


UW Digital Collections Center

Steven Dast, Digital Asset Librarian

Jesse Henderson, Digital Services Librarian

Cat Phan, Digital Services Librarian

“We'll examine the issue of digital preservation . . . ”

“. . . including practical steps you can take to preserve your digital content with limited resources.”

Characteristics of digital information

Strengths
● Easy to make and transmit perfect copies
● Machine-readable content and metadata facilitate automation
● Storage relatively inexpensive and becoming more so

Challenges
● Fragile, easily malleable
● Storage media not durable
● High density of storage
● Requires technology to render into human-readable form
  o Obsolescence
  o Early signs of loss may not be apparent
  o Loss generally extensive

Two primary stages in digital lifecycle

Creation stage
● Intense, focused action
● Maximize value of digital material
● Risk of errors

Preservation stage
● Long-term, sporadic action
● Minimize cost of maintenance
● Risk of failures

Strategies for digital preservation

Take advantage of our strengths:
● Make lots of copies in different places
● Automate file handling and management

Take steps to minimize challenges...

Strategies for both phases

Use broadly supported standard file formats that store uncompressed data (TIFF for images, WAV for audio)
● Mitigates obsolescence and data fragility
● Facilitates future bulk processing

Strategies for both phases

Work as consistently as possible; keep good records; document special cases
● Reduces cost of future preservation actions

Strategies for both phases

Use a file naming system that is simple and consistent, but flexible.

Remember: whatever system you choose is (almost) entirely for your convenience; to the computer, names are all just strings of characters.

Nevertheless, tool requirements (if and when they exist) override any other factors.

Strategies — File naming

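Avoid spaces and special characters (/ \ : * ? ")
Use letters, numbers, underscore ( _ ), and hyphen ( - ).
Dot ( . ) is okay, but has a special function: it separates the name from the file extension.
For broadest compatibility, use the 8.3 convention (up to eight characters, a dot, and a three-character extension).
Don't use capitalization for meaningful differences (some file systems ignore case).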
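The naming rules above are easy to check mechanically before files enter the workflow. The Python sketch below is one possible checker, not a UWDCC tool; its rule set (letters, digits, underscore, hyphen, and only one dot, before the extension) is an assumption distilled from this slide, so adjust it to your own tool requirements.

```python
import re

# Assumed rule set, distilled from the slide: letters, digits, underscore
# and hyphen only, with a single dot separating name from extension.
SAFE_PART = re.compile(r"[A-Za-z0-9_-]+")
# Strict 8.3 form: 1-8 name characters, a dot, 1-3 extension characters.
EIGHT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}\.[A-Za-z0-9]{1,3}")


def check_filename(name: str, strict_8_3: bool = False) -> list[str]:
    """Return a list of problems with `name`; an empty list means it passes."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if name.count(".") > 1:
        problems.append("more than one dot")
    stem, dot, ext = name.partition(".")
    parts = (stem, ext) if dot else (stem,)
    for part in parts:
        if not SAFE_PART.fullmatch(part):
            problems.append(f"unsafe or empty part: {part!r}")
    if strict_8_3 and not EIGHT_THREE.fullmatch(name):
        problems.append("not in 8.3 form")
    return problems


def case_collisions(names: list[str]) -> list[tuple[str, str]]:
    """Find names that differ only by capitalization; they would
    collide on a case-insensitive file system."""
    seen = {}
    clashes = []
    for name in names:
        key = name.lower()
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes
```

A name like uwar02345.tif passes the basic check but fails the strict 8.3 test, because its nine-character stem is one character too long: a small example of the trade-off between compatibility and meaningful names.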

Strategies — File naming

Effects and side effects of file names
● Identity
● Order / sequencing (0122.tif 0123.tif 0124.tif)
● Collocation / grouping (ncb01.tif ncb02.tif ncb03.tif mca01.tif mca02.tif)
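A quick way to see why zero-padded numbers and shared prefixes matter: a plain character-by-character sort, which many scripts and command-line tools use, puts 10 before 2 unless the numbers are padded, while a common prefix keeps related files together. This Python sketch uses made-up file lists.

```python
import re
from itertools import groupby

# Order: without zero-padding, text sorting puts 10 before 2.
unpadded = ["1.tif", "2.tif", "10.tif", "11.tif", "9.tif"]
padded = [f"{int(name.split('.')[0]):04d}.tif" for name in unpadded]

print(sorted(unpadded))  # ['1.tif', '10.tif', '11.tif', '2.tif', '9.tif']
print(sorted(padded))    # ['0001.tif', '0002.tif', '0009.tif', '0010.tif', '0011.tif']

# Collocation: a shared alphabetic prefix groups related files together.
names = ["ncb01.tif", "mca02.tif", "ncb03.tif", "mca01.tif", "ncb02.tif"]


def prefix(name: str) -> str:
    """Leading lowercase letters of a file name, e.g. 'ncb' for 'ncb01.tif'."""
    return re.match(r"[a-z]*", name).group()


groups = {key: list(items) for key, items in groupby(sorted(names), key=prefix)}
print(groups)
# {'mca': ['mca01.tif', 'mca02.tif'], 'ncb': ['ncb01.tif', 'ncb02.tif', 'ncb03.tif']}
```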

Strategies — File naming

Using meaningful file names can
● Facilitate error detection and recovery
  o Missing or misplaced files
● Aid 'manual' handling and checking of files
  o Name order = natural order
  o Name reflects content of file in some way
● Increase maintenance and correction costs
  o Insertion or deletion of files in a sequence
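Sequential names make missing or misplaced files detectable by script. A minimal Python sketch, assuming names end in digits before .tif as in 0001.tif; find_gaps is a hypothetical helper, not part of any existing tool.

```python
import re


def find_gaps(filenames, start=1, pattern=r"(\d+)\.tif$"):
    """Return (missing, duplicates) sequence numbers for sequentially named files.

    Assumes names end in a run of digits before '.tif', such as 0001.tif;
    names that do not match `pattern` are ignored.
    """
    numbers = []
    for name in filenames:
        match = re.search(pattern, name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return [], []
    present = set(numbers)
    # Every number from `start` up to the highest one seen should exist.
    missing = [n for n in range(start, max(present) + 1) if n not in present]
    # The same number appearing twice suggests a misplaced or copied file.
    duplicates = sorted(n for n in present if numbers.count(n) > 1)
    return missing, duplicates
```

For example, find_gaps(["0001.tif", "0002.tif", "0004.tif"]) reports 3 as missing.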

Strategies — File naming

Also use directories to help organize files
● Same naming conventions apply (avoid . )
● Same naming benefits and cautions
● Nesting directories allows for richer hierarchical relationships, but may foil some automation options
● Limit to 500-1000 files per directory when feasible
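To spot directories that have grown past the suggested 500-1000 files, a short walk over the tree is enough. A sketch using Python's standard library; the limit is a parameter, and the function name is illustrative.

```python
import os


def crowded_directories(root, limit=1000):
    """Yield (directory, file_count) for every directory under `root`
    that directly contains more than `limit` files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > limit:
            yield dirpath, len(filenames)
```

For example, list(crowded_directories("UWMad", limit=500)) would list every directory under a (hypothetical) local UWMad folder that holds more than 500 files.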

Strategies — File naming

UWDCC naming system for books:
One directory per volume, with flexible four-digit sequential filenames. Directories may be grouped for multi-volume monographs, by series, by project, or several of the above.
UWMad/Yearbooks/Yrbk1972/0001.tif
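The book convention can be generated rather than typed. This hypothetical Python helper assumes one directory per volume and strictly sequential four-digit page numbers starting at 0001; it does not model the "flexible" insertions the slide alludes to.

```python
from pathlib import PurePosixPath


def volume_paths(base: str, volume_dir: str, page_count: int, ext: str = "tif") -> list[str]:
    """Build one path per page, numbered 0001, 0002, ... within a volume directory."""
    if not 1 <= page_count <= 9999:
        raise ValueError("four-digit names allow 1-9999 pages per volume")
    return [
        str(PurePosixPath(base, volume_dir, f"{page:04d}.{ext}"))
        for page in range(1, page_count + 1)
    ]


print(volume_paths("UWMad/Yearbooks", "Yrbk1972", 3))
# ['UWMad/Yearbooks/Yrbk1972/0001.tif', 'UWMad/Yearbooks/Yrbk1972/0002.tif', 'UWMad/Yearbooks/Yrbk1972/0003.tif']
```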

Strategies — File naming

UWDCC naming system for photographs:
Short alphabetic prefix with a flexible serial number; an ad hoc system of separation into directories, usually based on serial number.
UWArchives/uwar02/uwar02345.tif
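The photograph example (uwar02/uwar02345.tif) suggests directories keyed to part of the serial number. The Python sketch below assumes, purely for illustration, a five-digit serial and one directory per thousand serials. That rule reproduces the example path and keeps each directory at or below 1,000 files, but UWDCC describes its own split as ad hoc, so photo_path is a hypothetical helper.

```python
def photo_path(collection: str, prefix: str, serial: int) -> str:
    """Return collection/prefixNN/prefixNNNNN.tif for a five-digit serial.

    Hypothetical rule: the directory is the prefix plus the serial's
    thousands (serial // 1000), so each directory holds at most 1000 files.
    """
    if not 0 <= serial <= 99999:
        raise ValueError("this sketch assumes five-digit serial numbers")
    filename = f"{prefix}{serial:05d}.tif"
    directory = f"{prefix}{serial // 1000:02d}"
    return f"{collection}/{directory}/{filename}"


print(photo_path("UWArchives", "uwar", 2345))  # UWArchives/uwar02/uwar02345.tif
```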

Strategies — File naming

Bottom line:
If you have technical requirements for file names, follow them.
Beyond that, choose a system that maximizes human utility, keeping in mind the balance between encoded meaning and requirements for maintenance.

Strategies for creation phase

Create high-quality digital surrogates sufficient to meet current and anticipated needs
● Encourages future investment in the material

Strategies for creation phase

Create backups of current work and maintain fall-back positions in case corrections are needed
● Reduces cost of errors
● Mitigates fragility and malleability of data

Strategies for creation phase

Check your work at major transitions, not just for quality issues, but also for completeness and accuracy
● Increases value of the collection
● Facilitates future processing
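Checking for completeness can be as simple as comparing the file names your metadata expects against the files actually on disk. A minimal Python sketch, assuming the metadata spreadsheet (Excel, FileMaker, etc.) is exported to CSV with a column of file names; the column name filename and both function names are placeholders.

```python
import csv


def filenames_from_metadata(csv_path: str, column: str = "filename") -> list[str]:
    """Read expected file names from a metadata sheet exported as CSV.

    'filename' is a placeholder column name; utf-8-sig also accepts the
    byte-order mark that Excel may add when saving CSV as UTF-8.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        names = (row[column].strip() for row in csv.DictReader(handle))
        return [name for name in names if name]


def compare_to_metadata(expected, found) -> dict[str, list[str]]:
    """Files the metadata lists but are absent, and files present but unlisted."""
    expected, found = set(expected), set(found)
    return {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }
```

A typical call might be compare_to_metadata(filenames_from_metadata("project.csv"), os.listdir("final")), with both paths standing in for your own.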

Strategies for preservation phase

Choose storage media that best match your resources and requirements.
● Make multiple copies so that you can react to failure
● If possible, mitigate technological risk by storing files on different types of media
● Mitigate risk of physical disasters by storing media in multiple locations

Strategies — Storage media

Technology | Size | Stability | Cost
Flash storage | 4–256 GB | 5–20 years or less | $0.50/GB
Hard drive (magnetic disk) | 1 TB – ? | 25–30 years, prone to mechanical failure | $0.05/GB +++
Magnetic tape | 400 GB – 2.5 TB | 25–30 years | $0.01–0.50/GB
CD-R | 630–700 MB | 100–200 years for high-quality media (MAM-A) | $2.50/disc ≈ $3.50/GB
DVD-R/+R | 4.7 GB | 100–200 years (?) for high-quality media | $2.50–4.00/disc = $0.50–0.85/GB
The Cloud | 1–30 TB | ? | $0.002–0.10/GB per month!
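The per-unit figures in the table translate into rough totals with simple arithmetic. The Python sketch below uses the 2015 numbers from the table (they will be out of date) and assumes discs can be filled completely, which real file sizes rarely allow; the constants and function names are illustrative.

```python
import math

# Figures from the 2015 table above; current prices will differ.
DVD_GB = 4.7
DVD_PER_DISC = 2.50
HARD_DRIVE_PER_GB = 0.05
CLOUD_LOW_PER_GB_MONTH = 0.002


def discs_needed(total_gb: float, disc_gb: float, copies: int = 2) -> int:
    """Discs for `copies` full copies, assuming every disc can be filled completely."""
    return math.ceil(total_gb / disc_gb) * copies


def cloud_cost(total_gb: float, per_gb_month: float, months: int) -> float:
    """Storage charge only; transfer and retrieval fees are ignored."""
    return total_gb * per_gb_month * months


discs = discs_needed(1000, DVD_GB)                    # 1 TB, two copies
print(discs, discs * DVD_PER_DISC)                    # 426 1065.0
print(1000 * HARD_DRIVE_PER_GB * 2)                   # two 1 TB copies on hard drives, about $100
print(cloud_cost(1000, CLOUD_LOW_PER_GB_MONTH, 12))   # one copy for a year at the low rate, about $24
```

By the same arithmetic, the ~18 TB UWDCC reports on the next slide would need about 3,830 DVDs per copy, which illustrates why disc-based archives become hard to manage at scale.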

Strategies — Storage media

Over its history, UWDCC has used
● Jaz disks
● Duplicate CD-Rs
● Duplicate data tapes
● Hard drives with duplicate data tapes

We currently have ~18 TB of archived data.

Strategies — Storage media

Recommended options for getting started
● CD-R or DVD-R/+R
  o Use the good stuff: MAM-A Gold Archive media
  o Always make duplicates
  o Consider supplementing with cloud storage
● Graduate to hard drives
  o Active RAID-enabled disks are much safer than standalone hardware sitting on a shelf
● Add tape when technology staff can support it

Strategies — Storage media

Avoid
● Flash drives: too unstable
● Reliance on the Cloud as your only archive

Strategies for preservation phase

Anytime you move data to a new medium or a new physical device, verify!
(Now that you’re no longer actively working with the files, it’s easy for a bad transfer to go unnoticed.)
If the new media/device can be write-protected, do so.
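One way to make "verify!" routine is to hash both the source and the new copy after every transfer. A Python sketch using only the standard library (hashlib, shutil); SHA-256 is chosen here as an example, and the function names are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large TIFF or WAV masters need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: str, dst: str) -> str:
    """Copy `src` to `dst`, then re-read both files and compare SHA-256 digests.

    Returns the digest on success; raises OSError if the copies differ.
    """
    shutil.copy2(src, dst)
    src_digest = sha256_of(src)
    dst_digest = sha256_of(dst)
    if src_digest != dst_digest:
        raise OSError(f"verification failed: {dst} does not match {src}")
    return dst_digest
```

Re-reading a file immediately after copying it may be served from the operating system's cache rather than from the new medium, so a second check after ejecting and remounting the media is a stronger test.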

Strategies for preservation phase

Create checksums for each file that you archive
● Use now to verify files on transfer
● Use later to detect data degradation
● Also useful to determine whether files are actually the same or not
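A checksum manifest covers all three uses listed above: verifying transfers, detecting later degradation, and spotting identical files. The Python sketch below writes lines in the "digest, two spaces, relative path" layout that GNU sha256sum also uses, so the same file should be checkable with sha256sum -c from the archive's top directory on Unix systems. The manifest name and the helper functions are assumptions, not a UWDCC convention.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

MANIFEST = "manifest-sha256.txt"  # hypothetical file name


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archived_files(root: Path):
    """All files under root except the manifest itself, in a stable order."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name != MANIFEST)


def write_manifest(root) -> int:
    """Write 'digest  relative/path' lines; returns the number of files listed."""
    root = Path(root)
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in archived_files(root)]
    (root / MANIFEST).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def verify_manifest(root) -> list[tuple[str, str]]:
    """Re-hash every listed file; report ones that are missing or changed."""
    root = Path(root)
    problems = []
    for line in (root / MANIFEST).read_text(encoding="utf-8").splitlines():
        digest, rel_path = line.split("  ", 1)
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != digest:
            problems.append((rel_path, "changed"))
    return problems


def duplicate_groups(root) -> list[list[str]]:
    """Group files whose SHA-256 digests match (in practice, identical contents)."""
    root = Path(root)
    by_digest = defaultdict(list)
    for p in archived_files(root):
        by_digest[sha256_of(p)].append(p.relative_to(root).as_posix())
    return [group for group in by_digest.values() if len(group) > 1]
```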

Strategies for preservation phase

Keep track (metadata!) of where your files are archived
● Material that can’t be located has not been preserved
● Will help to prioritize future preservation actions
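Location metadata can be a plain spreadsheet with one row per archived copy. Assuming hypothetical columns file, medium and location, this Python sketch answers "where is it?" and flags files with too few copies or with copies in only one place (the thresholds of two are illustrative).

```python
import csv
from collections import defaultdict


def load_inventory(csv_path: str) -> list[dict]:
    """Read a hypothetical inventory CSV with columns file, medium, location."""
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def where_is(rows: list[dict], filename: str) -> list[tuple[str, str]]:
    """All (medium, location) pairs recorded for one file."""
    return sorted((row["medium"], row["location"]) for row in rows if row["file"] == filename)


def under_protected(rows, min_copies=2, min_locations=2) -> dict[str, list[str]]:
    """Files with too few recorded copies, or copies in too few places."""
    copies = defaultdict(list)
    for row in rows:
        copies[row["file"]].append(row)
    report = {}
    for name, entries in copies.items():
        locations = sorted({entry["location"] for entry in entries})
        if len(entries) < min_copies or len(locations) < min_locations:
            report[name] = locations
    return report
```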

UWDCC workflow

1. Metadata first: checklist for subsequent work
2–5. Working files organized under three directories: original, inprocess, final
  o Initial scan to ‘original’: never edited
  o Copy to ‘inprocess’: cleaned up for access
  o Finished version to ‘final’: metadata check
6. Distribution files created from ‘final’ masters
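The original/inprocess/final layout is straightforward to script. This Python sketch uses the directory names from the slide, but the function names are hypothetical. It creates the three folders, removes write permission from raw scans, and copies files forward without overwriting work already in the next stage.

```python
import shutil
import stat
from pathlib import Path

STAGES = ("original", "inprocess", "final")


def make_workspace(project_dir) -> Path:
    """Create original/, inprocess/ and final/ under the project directory."""
    project = Path(project_dir)
    for stage in STAGES:
        (project / stage).mkdir(parents=True, exist_ok=True)
    return project


def protect_originals(project_dir) -> None:
    """Clear write permission on raw scans (discourages, but cannot prevent, edits)."""
    for path in (Path(project_dir) / "original").iterdir():
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def advance(project_dir, filename: str, from_stage: str, to_stage: str) -> Path:
    """Copy a file forward one stage, refusing to overwrite work already there."""
    project = Path(project_dir)
    src = project / from_stage / filename
    dst = project / to_stage / filename
    if dst.exists():
        raise FileExistsError(dst)
    # copyfile copies contents only, so the read-only mode of originals
    # is not carried over to the working copy.
    shutil.copyfile(src, dst)
    return dst
```

shutil.copyfile is used instead of shutil.copy2 so the read-only permission on originals is not copied onto the working file; the trade-off is that timestamps are not carried over either.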

UWDCC workflow

7. ‘Click-through’ all images in test mode
7a. Once all is correct: public release!
8a. Recheck files against metadata
8b. Create checksums for local files
8c. Transfer files to archival media
8d. Verify checksums for transferred files
9. Now safe to delete working copies
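Step 9 is only safe once steps 8b through 8d agree. A Python sketch of that gate, assuming checksums are gathered as a mapping from relative path to SHA-256 digest for both the working copies and the archival media; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def digests_for(root) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    result = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        result[path.relative_to(root).as_posix()] = digest.hexdigest()
    return result


def deletion_blockers(local: dict[str, str], archived: dict[str, str]) -> list[str]:
    """Reasons the working copies are NOT yet safe to delete (empty list = safe)."""
    blockers = []
    for name, digest in sorted(local.items()):
        if name not in archived:
            blockers.append(f"{name}: not found on archival media")
        elif archived[name] != digest:
            blockers.append(f"{name}: checksum mismatch")
    return blockers
```

In use, something like deletion_blockers(digests_for("work/final"), digests_for("/media/archive01/project")), with placeholder paths, should return an empty list before any working copy is removed.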

UWDCC Tools

● Microsoft Excel or FileMaker Pro for metadata entry (sometimes Access)
● Variety of scanners chosen to maximize flexibility
● Manufacturer’s software / VueScan
● GoldenThread (ISA) for evaluating scanner quality
● Adobe Photoshop for image editing
● AppleScript for custom automation of various workflow tasks
● Built-in Unix utilities for checksums and file handling

Other tool options

Image editing: GIMP (Windows, Mac, Linux), Paint.NET (Windows)

Automation: VBScript, JScript, VBA (Windows); Python (Windows, Mac, Linux)

Checksum and verification: Fastsum, Checksum (corz.org) (Windows)

Summary

Both phases
★ Use broadly supported standard file formats (TIFF, WAV)
★ Develop a consistent workflow; document special cases
★ File naming: follow technical rules; design it for humans
  ○ Balance using the filename for meaning against keeping it easy to maintain

Creation phase
★ Start with high-quality scans of source documents
★ Make backups of current work; maintain fall-back positions
★ Check work at major transitions

Preservation phase
★ Storage media
  ○ Start: CD-R or DVD-R/+R, maybe supplemented with Cloud
  ○ Step up: hard drives
  ○ Add tape if you can support it
  (Avoid flash drives, and avoid the Cloud as sole archive)
★ Verify anytime you move things
★ Write-protect if you can
★ Create checksums
★ Metadata: know what you have, where it is, and what you can do with it

Selected references and reading

General DP
http://digitalpowrr.niu.edu/wp-content/uploads/2014/05/Overwhelmed-to-action.rinehart_prudhomme_huot_2014.pdf
http://commons.lib.niu.edu/handle/10843/13610
http://files.eric.ed.gov/fulltext/ED426715.pdf
https://en.wikipedia.org/wiki/Digital_preservation

File naming
http://www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name

Storage media
http://www.nps.gov/museum/publications/conserveogram/22-05.pdf

Selected tools and resources

Scanning
http://www.hamrick.com (VueScan)
http://www.imagescienceassociates.com/ (GoldenThread)

Image editing
http://www.gimp.org
http://www.getpaint.net/index.html

Archival CDs and DVDs
http://www.mam-a-store.com

Scripting

http://www.pctools.com/guides/article/id/2/page/1/

https://www.python.org

http://macosxautomation.com/applescript/firsttutorial/index.html

Checksum tools

http://www.fastsum.com

http://corz.org/windows/software/checksum/

Questions?

UWDC & Digital as Preservation

The UWDCC recently launched a pilot project in collaboration with our Preservation Department to develop standards and guidelines for using digitization as a preservation medium at UW-Madison.

This presentation focuses primarily on workflow, and only on changes we can implement (and have implemented) in our current environment for preservation-level projects.

Detail from page 2 of ‘The modern priscilla’ Vol. XXXVI, No. V (July, 1922). The Dovie Horvitz Collection.

UWDC & Digital as Preservation

Equipment

Type | Hardware | Software
High-speed scanning | Panasonic KV-S3065C High Speed Color Scanner | Reliable Throughput Image Viewer (RTIV)
Flatbed scanning | Epson Expression 10000XL (includes one with Epson A3 Transparency adapter); Epson Expression 11000XL | Epson Scan Utility
Overhead reprographic scanning | BetterLight Super 6K-HS Digital Scanning Back | ViewFinder camera control software
Slide scanning | Nikon Super COOLSCAN 5000 ED film scanner | VueScan scanner software
Digital photography | – | –

UWDC & Digital as Preservation

The basics:
● What is preservation? Extending the useful life of our stuff.
● Why do we do it? Protect, Represent, Transcend.

Do something with those berries before they spoil! Pickle something! In essence, preservation is extending the useful life of our stuff.

Don’t let those veggies just turn into compost. Protect! Secure the value and usefulness of our resources.

Taste the summer sunshine in your veggies when you eat them out of season. Represent! We want our digital formats to be an authentic representation of the original.

Pickles and jam exist only when cucumbers and berries are transformed into something new. Transcend! Preserve originals to take advantage of and/or discover new uses.

UWDC & Digital as Preservation

Prep:

1. Identify
What do we have that needs preserving? Where did it come from?

2. Evaluate & Assess
Make sure our equipment and ingredients are up to the preservation process. Figure out how much we can handle at one time.

3. Select
Condition: Does one thing spoil faster than another?
High use: Which items circulate the most?
Scarcity: What are others not preserving?

4. Review your recipe
Consult the cookbooks (in our case FADGI, the Federal Agencies Digital Guidelines Initiative) and make sure you’ve read through your recipe. Have everything you need before you start.

Steps 1 & 3 are handled by our Preservation Department.
Steps 2 & 4 are done by UWDCC.

UWDC & Digital as Preservation

What did this look like at UWDC?
● Researched current literature, focusing on FADGI
● Established baseline, optimum performance data for hardware using GoldenThread

UWDC & Digital as Preservation

FADGI = whoa… Lots to digest! Our takeaways:
Evaluate and assess our digitization environment and tweak our recipe
● Quantifying scanner performance
● Targets and software to use for this: GoldenThread
● Color management

Appendix A: Digitizing for Preservation Reformatting of Photographs
Compare characteristics of preservation vs. production master files.

UWDC & Digital as Preservation

Using GoldenThread
● Flatbeds and Epson Scan software: customizing the color balance settings per scanner
● BetterLights and ViewFinder software: custom tone curves per set-up, per scanner

UWDC & Digital as Preservation

Using targets and software to determine performance
3s: +/- 6 aim points
4s: +/- 3 aim points
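The aim-point tolerances reduce to a simple comparison. The Python sketch below assumes "3s" and "4s" denote three- and four-star performance levels, and that each target patch yields one measured value to compare with its aim value. Real GoldenThread and FADGI analyses cover many more measurements, so treat this as an illustration of the tolerance logic only; patch names and values are made up.

```python
# Allowed deviation from aim values by star level, as given on the slide.
TOLERANCE = {3: 6, 4: 3}


def out_of_tolerance(measurements: dict[str, tuple[float, float]], stars: int) -> dict[str, float]:
    """Return patches whose measured value misses its aim by more than allowed.

    `measurements` maps a (hypothetical) patch name to (aim, measured).
    """
    limit = TOLERANCE[stars]
    return {
        patch: measured - aim
        for patch, (aim, measured) in measurements.items()
        if abs(measured - aim) > limit
    }


def best_rating(measurements):
    """Highest star level whose tolerance every patch meets, or None."""
    for stars in sorted(TOLERANCE, reverse=True):
        if not out_of_tolerance(measurements, stars):
            return stars
    return None


readings = {"patch1": (245, 249), "patch2": (128, 126)}
print(out_of_tolerance(readings, 4))  # {'patch1': 4}
print(best_rating(readings))          # 3
```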

UWDC & Digital as Preservation

Established baseline, optimum performance.
Establish maintenance schedule.

UWDC & Digital as Preservation

Monthly:
● Check BetterLight and flatbed performance against baseline performance with GoldenThread

Quarterly:
● Calibrate monitors on reformatting supervisors’ computers
● Zig-Align BetterLights (or more frequently if needed)

Biannually:
● Calibrate scanning station monitors
● Calibrate and characterize BetterLights (create new baseline tone curves in the software)
● Calibrate and characterize flatbeds (update histogram settings)

UWDC & Digital as Preservation

Access recipe:
● 300 dpi
● 24-bit color (or grayscale on our high-speed scanner)
● Flatbed, BetterLight, or high-speed scanner
● Custom tone curves in BetterLight software per set-up
● Custom histograms on flatbeds
● Borders cropped based on project
● “Cooked” masters archived

The original object itself is the preservation master (you intend to hold onto it), and digital surrogates are for access.

UWDC & Digital as Preservation

Preservation recipe:
● 400 dpi
● 24-bit color
● BetterLight only (for now)
● Custom tone curves per project/issue
● Object target captured per page/scan
● Device target per project/issue/day
● Always crop outside the pages
● “Raw” and “cooked” masters archived

The digital version is expected to serve as the preservation master in the absence of the original object; therefore, the highest possible fidelity is desired.
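The move from 300 to 400 dpi has a storage cost that is easy to estimate: pixel count grows with the square of the resolution, so 400 dpi needs about (400/300)² ≈ 1.78 times the data. The Python sketch below computes uncompressed 24-bit sizes for a hypothetical 8.5 × 11 inch page.

```python
def scan_size(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 24):
    """Pixel dimensions and uncompressed image-data size (bytes) for a scan.

    Counts pixel data only; TIFF headers and metadata add a little more.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


for dpi in (300, 400):  # access recipe vs. preservation recipe
    w, h, size = scan_size(8.5, 11, dpi)  # hypothetical letter-size page
    print(f"{dpi} dpi: {w} x {h} px, {size / 1_000_000:.1f} MB")
# 300 dpi: 2550 x 3300 px, 25.2 MB
# 400 dpi: 3400 x 4400 px, 44.9 MB
```

Because the preservation recipe also crops outside the page and archives both raw and cooked masters, actual storage per page will be noticeably higher than the 400 dpi figure.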