IASSIST 2008 presentation about our move from off-line to on-line archival storage
1. Moving an Archive from Tape to Disk: A Case Study at ICPSR.
IASSIST 2008, Stanford University. Bryan Beecher, IT Director,
ICPSR.
2. Overview of today's talk. Where we were: background info;
Digital Preservation @ ICPSR in 2006. Where we went: digital objects;
physical objects. Where we want to go: Fedora.
3. What is ICPSR? Collect digital objects, primarily social
science data. Add value to the objects. Preserve and disseminate.
Other programs too: Summer Program in Quantitative Methods; Digital
Preservation workshop. Clients: higher education; data producers who
don't want to preserve or disseminate.
4. A peek inside ICPSR. Computer & Network Services: ICPSR's
technology shop; system and network management; software, service,
and database development. Data Library: manage off-line storage of
digital objects; manage off-site collection of paper records; service
staff requests for digital and physical objects. Historically the two
had little interaction.
5. DigiPres at ICPSR in 2006. The Good: two copies of each digital
object, one off-site; metadata stored in a relational database;
stable processes; large collection of old stuff (paper records and
media). The Bad: using low-density tape for archival storage;
metadata not stored with the objects; manual processes; large
collection of old stuff (paper records and media).
6. DigiPres at ICPSR in 2006. September 2006: ICPSR hires its
first Digital Preservation Officer, Nancy McGovern. Data Library team
joins Computer & Network Services. DPO sets policies. The newly
expanded CNS implements those policies and operates the technology.
7. Policy changes. Do NOT need to preserve original media:
preservation commitment is to the intellectual content; media is
only a container holding that content. Do NOT need to preserve paper
records except where there is value. Do need a digital copy outside
of Ann Arbor. Do need to collect key metadata about deposits:
provenance; digital fingerprints.
8. The Plan. Track service requests via help desk software: Who's
asking for materials? How many requests for digital materials v.
paper v. both? How many requests each month? Wherever possible,
automate digital preservation operations: completeness and
correctness increase; staff become available for retrospective
projects. Also automate ICPSR staff access to materials.
9. The Plan (more). Transition ALL digital content from tape to
disk. A copy on tape too is OK, but not primary copies: expensive to
access; difficult to tell if copy A and copy B are in sync. Discard
extraneous administrative documents: just the low-hanging fruit. Turn
over remaining documents to records management professionals.
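The "difficult to tell if copy A and copy B are in sync" problem becomes tractable once both copies are on disk. The following is a minimal sketch, not ICPSR's actual tooling: it assumes each copy is a directory tree, uses SHA-256 as the digital fingerprint (the slides do not name an algorithm), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root):
    """Map each file's path (relative to root) to its fingerprint."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): fingerprint(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(copy_a, copy_b):
    """Report files missing from either copy and files whose content differs."""
    a, b = inventory(copy_a), inventory(copy_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty report (all three lists empty) means the two trees hold the same files with the same content. On tape the same check requires mounting and reading both copies end to end, which is the expense the slide refers to.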
10. Interlude - Comcast. An Internet connection at the Warehouse
would be very helpful: access to databases, Intranet. Thought we
might purchase a broadband connection. We started with Comcast.
Comcast: "We'll need to include an installation surcharge to cover a
few extra installation costs." ICPSR: "How much?"
11. Our reaction. Comcast: "Thirty-two thousand dollars." ICPSR:
"Uh, no." The Warehouse now has an AT&T DSL connection.
12. Execution - moving to disk. DLT tape holds the bulk of our
content: approx 275 unique tapes. Two copies of each tape: ICPSR HQ;
the Warehouse. Each tape holds up to 20-40 GB. During Feb-Jun 2007
ICPSR moved the content of these tapes to spinning disk. Starting in
Jan 2007 ICPSR stopped using DLT tape for archival storage.
13. Execution - moving to disk. Approx 5 TB of unique content
across all tapes. How many copies? (1) ICPSR on-line; (1) ICPSR
off-line; (1-3) Chronopolis (SDSC, NCAR, UMd); (2) IU HPSS; (0-5)
LOCKSS-based, NDIIPP-funded syndicated storage. More? Intending to
destroy the DLT media at end of 2008.
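Adding up the copy counts on this slide gives a sense of the total storage footprint. This is a back-of-the-envelope sketch: the per-location ranges and the 5 TB figure come from the slide, while the names and the assumption that every copy is a full copy of the unique content are mine.

```python
UNIQUE_TB = 5  # approx unique content, from the slide

# (minimum, maximum) number of copies per location, as listed on the slide
COPIES = {
    "ICPSR on-line": (1, 1),
    "ICPSR off-line": (1, 1),
    "Chronopolis (SDSC, NCAR, UMd)": (1, 3),
    "IU HPSS": (2, 2),
    "LOCKSS-based syndicated storage": (0, 5),
}


def copy_range(copies):
    """Return the (fewest, most) total copies across all locations."""
    low = sum(lo for lo, _ in copies.values())
    high = sum(hi for _, hi in copies.values())
    return low, high


def total_tb_range(copies, unique_tb):
    """Return the (smallest, largest) total storage, assuming full copies."""
    low, high = copy_range(copies)
    return low * unique_tb, high * unique_tb


if __name__ == "__main__":
    print(copy_range(COPIES))                 # (5, 12)
    print(total_tb_range(COPIES, UNIQUE_TB))  # (25, 60)
```

Under those assumptions the plan implies between 5 and 12 copies, or roughly 25 to 60 TB in total across all partners.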
14. Execution - moving to disk. Also have 2000 cartridge (3480)
and 9-track tapes. Have been reading 50/week for many months now;
will finish these before the end of 2008. High success rate for
reading (> 80%). Also had a stash of over 10k tapes that had
already been migrated, but not discarded. For this we used extra
special, extra gentle treatment.
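The reading schedule on this slide is simple to check. A small sketch using only the slide's figures (2000 tapes, 50 per week, better than 80% success); the function names are hypothetical.

```python
import math


def weeks_to_read(total_tapes, tapes_per_week):
    """Whole weeks needed to read every tape at a steady weekly rate."""
    return math.ceil(total_tapes / tapes_per_week)


def minimum_readable(total_tapes, success_percent):
    """Lower bound on readable tapes for a given success rate in percent."""
    return total_tapes * success_percent // 100


if __name__ == "__main__":
    print(weeks_to_read(2000, 50))     # 40
    print(minimum_readable(2000, 80))  # 1600
```

At 50 tapes a week the full set takes about 40 weeks, and a success rate above 80% means more than 1600 of the 2000 tapes yield readable content.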
15. Carefully removing the tapes
16. Who ya gonna call?
17. Before the harvest
18. After the harvest
19. Costs - media. [Chart, numbers in thousands of dollars: "Were"
v. "Now" for master copy per TB, backup copy per TB, and media mgmt.]
20. Costs - media (notes). Were spending: approx $2000/TB/copy on
DLT tape; $65k/year staff to read, write, migrate and manage tapes.
Now spending: approx $2000/TB/copy for expensive SATA disk in our
EMC; $100/TB/copy for LTO-3 tape; $0/TB/copy for off-site, on-line
copies with our friends. Staff cost for plain old file and tape
management can live on the margins.
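The per-TB rates above can be turned into a rough media bill. This sketch rests on assumptions the slide does not state: it uses the approx 5 TB of unique content from slide 13, two DLT copies "before" (per slide 12), and one disk copy plus one LTO-3 copy plus free off-site copies "now". Staff costs are left out because the "now" figure is not quantified, and the slide does not say whether the rates are annual.

```python
def media_cost(unique_tb, rates_per_tb):
    """Total media cost in dollars: one full copy per rate in rates_per_tb."""
    return sum(unique_tb * rate for rate in rates_per_tb)


UNIQUE_TB = 5                # assumption: approx unique content from slide 13
BEFORE_RATES = [2000, 2000]  # assumption: two DLT copies at $2000/TB/copy
NOW_RATES = [2000, 100, 0]   # SATA disk, LTO-3 tape, off-site on-line copies

if __name__ == "__main__":
    print(media_cost(UNIQUE_TB, BEFORE_RATES))  # 20000
    print(media_cost(UNIQUE_TB, NOW_RATES))     # 10500
```

Under these assumptions media spending falls from about $20k to about $10.5k, with the larger saving coming from the $65k/year of tape-handling staff time that no longer needs a dedicated budget.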
21. Execution - paper documents. Stored at the Warehouse: 3200 sq
ft facility located near Ann Arbor airport; 2500 sq ft manufacturing
space; 600 sq ft of office space (the three Front Rooms); 100 sq ft
of kitchenette, rest room. $35k/year for rent; $5k for utilities.
22. Bird's eye - 1 of 3
23. Bird's eye - panning right
24. Bird's eye - panning right
25. Execution - paper documents. Phase I (clean up): identify,
gather and recycle paper with no archival value: file listings;
Census 2000. Completed in 2007; recycled 40 cubic yards. Phase II
(clean out): consolidate Administrative and Archival materials into
an acid-free folder stored in an archival quality box. In progress;
expect to complete by the end of August 2008.
26. Costs - paper documents. [Chart, numbers in thousands of
dollars: Current v. Planned for Storage & Management, Retrieval &
Returns, and Supplies & Misc.]
27. Execution - automation. Digital Object Database: database of
metadata about every identified file in the archives: digital
fingerprint; location; source. Plugged into our ingest system and our
dissemination system. Powers some really useful tools.
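To make the Digital Object Database idea concrete, here is a minimal sketch of a table holding the three fields named on the slide. Everything else is an assumption for illustration: SQLite stands in for whatever relational database ICPSR uses, SHA-256 stands in for the unnamed fingerprint algorithm, and the table, column, and function names are hypothetical.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) metadata table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS digital_object (
               location    TEXT PRIMARY KEY,  -- where the file lives
               fingerprint TEXT NOT NULL,     -- SHA-256 of its content
               source      TEXT NOT NULL      -- where it came from
           )"""
    )
    return conn


def register(conn, location, content, source):
    """Record a file's fingerprint at ingest time and return it."""
    fp = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO digital_object (location, fingerprint, source) "
        "VALUES (?, ?, ?)",
        (location, fp, source),
    )
    conn.commit()
    return fp


def verify(conn, location, content):
    """True if the content still matches the fingerprint on record."""
    row = conn.execute(
        "SELECT fingerprint FROM digital_object WHERE location = ?",
        (location,),
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(content).hexdigest()
```

An ingest system would call something like `register` for each deposited file, and a later audit or a dissemination request could call `verify` to confirm the bytes on disk are still the bytes that were deposited.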
28. Execution - automation. Goodies for ICPSR staff. Download page
has extra knob to view ALL files. Intranet tools that link: Internal
Study Tracking System; public-facing study download system;
private-facing digital preservation system. Immediate and direct
access to all digital objects.
29. Looking forward. Lots of good progress so far: better access
for ICPSR staff; more robust preservation; reduced costs (but does the
IT guy ever give up $ once he gets it?). But not done yet: still need
a proper digital preservation system (Fedora).
30. Looking forward (continued). Long-term, off-site, on-line
copies: heavily subsidized today; what about the future costs? What
if we start preserving and disseminating much larger digital
objects? Restricted-access materials: balancing good preservation v.
securing sensitive data.