Building Digital Preservation Workflows without a “Keystone”
Image: http://falloutfancentral.wikia.com/wiki/The_Keystone?file=Under-the-keystone-logo-web.png with https://openclipart.org/detail/204962/the-rosetta-stone
Background
• Digital Preservation Framework document
• RFP for digital preservation system
• Hire Digital Preservation Analyst
• Electronic Records Task Force
• Strategic Plan
Image: https://openclipart.org/detail/228096/hourglass
Electronic Records Task Force
• Duration: 1 year
• Focus: Archives and Special Collections materials
• Human Resources: 8 full-time members + 3 resource personnel
• Financial Resources: $10,000 budget
• Tasks
  • Phase 1: Develop Initial Capacity for Ingest and Processing
  • Phase 2: Define Tasks and Workflow for Staff
  • Phase 3: Develop Ingest and Processing Workflows
• Future Steps
  • Preservation and Access
Staff Commitment
• 8 full-time members + 3 resource personnel
  • 1-hour meeting once a month
• Ingest Working Group (5 people)
  • Met as needed
  • Three people had the most time to offer
  • Majority of testing done by these three
  • Suggestions from these few were reviewed, then brought to the larger task force to discuss
Image: https://openclipart.org/detail/217931/a-demonstration-in-grayscale
Budget
• Hardware (~$4,000)
  • Workstation + more
    • Windows computer, two monitors
    • Floppy disk controller with power supply
    • 4TB hard drive
  • Two 2TB external hard drives (Mac, Windows)
  • Two write blockers (Tableau T35es-R2 and T8-R2)
• Software (~$130)
  • Quick View Plus
  • TeraCopy
  • Data Discovery (Identity Finder)
• Physical Space (~$2,000)
  • Cube setup costs
Image: https://openclipart.org/detail/167791/money-16
Develop Initial Capacity
• Identify, procure, and implement initial hardware and software needed for secure ingest and processing
  • Built on earlier work reviewing others' workstation setups
Phase 1
Images courtesy of Lara Friedman-Shedlov
Develop Initial Capacity
• Identify and secure access to initial file / storage space for working with incoming records and storage of processed records
Phase 1
Phases 2 & 3
• Set Priorities
• Define Tasks and Workflow for Staff
• Develop Ingest and Processing Workflows
We asked:
• What materials were we going to focus on?
• How do we transfer it? (software/equipment)
• What procedures and workflows do we use?
• Who is responsible for tasks?
Phases 2 & 3
Simplified Workflow
• Get the “Stuff”
  • Gather info from donor
  • File transfer
• Gather Info on the “Stuff” (understand what was received)
  • File manifest/metadata
  • Sensitive/Private info
  • Virus/malware
• Decide what to do with it
• Move to storage
Phases 2 & 3
Main Tools
• Data Accessioner
• TeraCopy
• DROID
• HashMyFiles
Phases 2 & 3
Workflow / Procedures
Phases 2 & 3
Prepare for File Transfer
• Preliminary review:
  • Media
  • Format (Mac?)
  • Size (2TB vs 16GB)
  • File Types (.cda, .zip)
  • Logical transfer vs. Disk Image
• Set up file location
• Choose best tool:
  • Data Accessioner
  • TeraCopy + DROID
  • BitCurator
Phases 2 & 3
Digital Archives [drive]
  - Unit / Department Name [folder]
    - Accession Number [folder]
      - _AccessionInfo [folder]
      - file [file]
      - file [file]
      - file etc [file]
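The folder layout above could be set up with a short script. This is only a sketch: the function name, the unit name, and the accession number format are hypothetical, and the task force may well have created these folders by hand.

```python
from pathlib import Path


def create_accession_folders(root: Path, unit: str, accession_number: str) -> Path:
    """Create <root>/<unit>/<accession_number>/_AccessionInfo and return the accession folder.

    All argument values are illustrative; the slides do not specify naming rules.
    """
    accession_dir = root / unit / accession_number
    # _AccessionInfo holds reports and documentation; transferred files sit beside it.
    (accession_dir / "_AccessionInfo").mkdir(parents=True, exist_ok=True)
    return accession_dir
```

For example, `create_accession_folders(Path("D:/"), "University Archives", "2015-001")` would create the unit and accession folders if they do not already exist; calling it again is harmless.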
Learn More* About Files
• File Types/Formats
  • Data Accessioner
  • DROID
• Checksums
  • Data Accessioner
  • DROID
  • HashMyFiles
• Duplicate Files
  • HashMyFiles
• Folder Structure / File Relationships
  • Data Accessioner
  • DROID
  • HashMyFiles
Phases 2 & 3
*Collect Metadata
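To make the checksum and duplicate-file steps concrete, here is a minimal sketch using only the Python standard library. It is a stand-in for illustration, not how Data Accessioner, DROID, or HashMyFiles work internally; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = file_checksum(path, algorithm)
    return manifest


def find_duplicates(manifest: dict[str, str]) -> list[list[str]]:
    """Group paths that share a checksum; each group has two or more paths."""
    by_checksum = defaultdict(list)
    for relative_path, checksum in manifest.items():
        by_checksum[checksum].append(relative_path)
    return [sorted(paths) for paths in by_checksum.values() if len(paths) > 1]
```

Because the manifest keeps relative paths, it also records the folder structure as received, which is useful if files are later reorganized.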
Icon: https://openclipart.org/detail/184627/learn-icon-by-ousia-184627
Specific Screenings
• Virus / Malware check
  • Microsoft’s Endpoint Protection
• Private / Personally Identifiable Information (SSNs, cc#, etc…)
  • Identity Finder
• Document findings / decisions
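As a rough illustration of what a screening for SSN-like strings involves, the sketch below flags text in the common 3-2-4 hyphenated form. It is not how Identity Finder works; the pattern and function name are assumptions, and a real screening tool covers far more formats and produces fewer false positives.

```python
import re

# Assumed pattern: nine digits written as 3-2-4 with hyphens, e.g. 123-45-6789.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched string) for every SSN-like string in the text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SSN_LIKE.findall(line):
            hits.append((line_number, match))
    return hits
```

Hits like these would still need human review before any decision is documented.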
Images: https://openclipart.org/detail/22678/icon_monitoring-by-jean_victor_balin and https://openclipart.org/detail/172071/scanned-business-stamp-2-by-merlin2525-172071
Phases 2 & 3
The “Packages”: SIPs and AIPs
• Submission Information Package (SIP)
  • Transferred files
  • Metadata
    • File formats
    • Checksums
    • Sensitive information
    • Original order
• Archival Information Package (AIP)
  • _AccessionInfo
    • Need to rerun reports (new organization)
    • Record of actions (Google templates)
  • Files (appraised)
Phases 2 & 3
Image: https://openclipart.org/detail/87799/download-package-by-kuba
Preparation of the Final Package
• Possible Actions and Decisions
  • Deleting duplicates
    • Duplicate File Finder
  • Renaming files/folders
  • Reorganizing files
    • Remove Empty Directories
  • Removing/redacting sensitive information
  • Not accepting and/or transforming certain file formats
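One of the cleanup actions above, removing empty directories after files have been reorganized, can be sketched as follows. This is an illustrative stand-in, not the tool named on the slide; the function name and the dry-run default are assumptions chosen so that nothing is deleted until the list has been reviewed.

```python
import os
from pathlib import Path


def remove_empty_directories(root: Path, dry_run: bool = True) -> list[Path]:
    """Find (and optionally delete) empty folders below root; root itself is kept."""
    removed: list[Path] = []
    removed_set: set[Path] = set()
    # Walk bottom-up so a folder that only contains empty folders is caught too.
    for current, _dirnames, _filenames in os.walk(root, topdown=False):
        current_path = Path(current)
        if current_path == root:
            continue
        # Ignore children already marked for removal, so dry runs match real runs.
        remaining = [entry for entry in current_path.iterdir() if entry not in removed_set]
        if not remaining:
            removed.append(current_path)
            removed_set.add(current_path)
            if not dry_run:
                current_path.rmdir()
    return removed
```

Running it first with `dry_run=True` gives a list that can be recorded with the other actions taken, before running again with `dry_run=False`.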
Image: https://openclipart.org/detail/118645/list--liste-by-lmproulx
Phases 2 & 3
Other “Products”
• Guide for Electronic Records (Staff/Donors)
• E-Records Transfer Sheet
• Accession Log
• Workflow Charts
• Deed of Gift Addendum
Time Tracking…
Initial Ingest
• # of Collections: 13 collections
• Amt. of GB: 1122 GB
• Hours to Process: 92.75 hours to ingest
• Time Spent: 10 hours/month spent ingesting content

Averages for Initial Ingests
• 12 GB/hour
• 7 hours/collection

Future Estimates (remaining on accession log)
• 37 collections, 5450 GB
• 259 hours to ingest based on # of hours/collection: ~2 years to process at the current rate
• 454 hours to ingest based on # of GB/hour: ~4 years to process at the current rate
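The estimates above follow from simple arithmetic on the initial-ingest figures. The sketch below reproduces them; the function and key names are hypothetical, and it assumes, as the figures suggest, that the averages are rounded to whole numbers before projecting.

```python
def estimate_backlog(collections_done: int, gb_done: float, hours_spent: float,
                     collections_left: int, gb_left: float,
                     hours_per_month: float) -> dict[str, float]:
    """Project remaining ingest time two ways: per collection and per GB."""
    # Averages rounded to whole numbers (assumption that matches the slide figures).
    hours_per_collection = round(hours_spent / collections_done)  # 92.75 / 13
    gb_per_hour = round(gb_done / hours_spent)                    # 1122 / 92.75
    hours_by_collection = collections_left * hours_per_collection
    hours_by_size = gb_left / gb_per_hour
    return {
        "hours_per_collection": hours_per_collection,
        "gb_per_hour": gb_per_hour,
        "hours_by_collection": hours_by_collection,
        "hours_by_size": hours_by_size,
        # Years at the current pace of hours_per_month of ingest work.
        "years_by_collection": hours_by_collection / hours_per_month / 12,
        "years_by_size": hours_by_size / hours_per_month / 12,
    }


estimate = estimate_backlog(13, 1122, 92.75, 37, 5450, 10)
```

The two methods differ by almost a factor of two because the remaining collections are, on average, larger than those already ingested (about 147 GB versus about 86 GB per collection).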
Summary
• Challenges
  • Working out the details takes time
  • Learning and remembering the ‘gotchas’
  • Having time to work on this in addition to other responsibilities
• Conclusions
  • We need to keep at it (ERTF2)
  • We need dedicated staffing
  • Consistency is the Keystone
Image: http://falloutfancentral.wikia.com/wiki/The_Keystone?file=Under-the-keystone-logo-web.png
Thank You
• U of M Libraries Digital Preservation Website
  www.lib.umn.edu/dp/guides
  https://www.lib.umn.edu/dp/digital-preservation-framework
• Electronic Records Task Force Final Report
  http://conservancy.umn.edu/handle/11299/174097
Carol Kussmann
Digital Preservation Analyst [email protected]