Taming the Wild LISTSERV; or, How to Preserve Specialized E-Mail Lists

Lisa M. Schmidt
lisa.schmidt@matrix.msu.edu
http://www.h-net.org/archive/

MATRIX: The Center for Humane Arts, Letters & Social Sciences Online
Michigan State University
May 23, 2007

H-Net: Humanities and Social Sciences Online

• International consortium of scholars and teachers

• Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet

• Valuable scholarly resource
– More than 180 networks, or e-mail lists
– More than 230 “private” lists

• More than 1 million e-mail messages

MATRIX

• Digital humanities research center

• Devoted to the application of new technologies in humanities and social science teaching and research

• Uses Internet technologies to improve education and increase the flow of information

NHPRC Grant

• Conduct assessment of existing H-Net preservation policies and practices

• Develop an improved long-term preservation plan

• Apply NARA/OCLC TRAC checklist

• Useful to those managing large collections of electronic records

• Research semantic clustering search techniques

Preserving E-Mail Lists as Scholarly Resources

• How H-Net Works
• Current Preservation Practices
• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
• Other E-Mail Preservation Projects
• Preservation Improvement Plan

How H-Net Works: Network Configuration

How H-Net Works: Backup & Security

• Daily incremental backups, weekly full backups
– Tapes cycle through system every 6 weeks
– Swapped tapes kept in locked cabinet in secured room
– Tapes replaced as needed

• Monthly full, permanent tape backups
– Tapes kept in secured room
– Plans to keep log and move to offsite storage

• Server rack kept in climate controlled, physically secured room

How H-Net Works: Posting Messages

• H-Net runs on LISTSERV software
• Users must be list subscribers to post
• Messages written in plain text
• No attachments allowed on public lists

How H-Net Works: Posting Messages

Message Posting Process

How H-Net Works: Archiving of Lists

• Messages kept in flat text files called “notebooks”

• Messages appear in the notebook from a few seconds up to several days after approval

• Each notebook holds the messages posted during one weekly time period

How H-Net Works: Archiving of Lists

Time Period   Day of Month
a             1-7
b             8-14
c             15-21
d             22-28
e             29-31

Ex. “h-africa.log0802a”
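The weekly naming scheme in the table can be sketched in a few lines of Python. This is an illustration only, not H-Net's actual code: the function names are hypothetical, and reading the four digits in the example as two-digit year followed by two-digit month is an assumption drawn from the sample filename.

```python
from datetime import date


def week_letter(day_of_month: int) -> str:
    """Map a day of the month to the notebook period letter a-e."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    # Days 1-7 -> a, 8-14 -> b, 15-21 -> c, 22-28 -> d, 29-31 -> e
    return "abcde"[(day_of_month - 1) // 7]


def notebook_name(list_name: str, posted: date) -> str:
    """Build a notebook filename such as 'h-africa.log0802a'.

    Assumption: the digits after 'log' are yymm (two-digit year, then month).
    """
    return f"{list_name.lower()}.log{posted:%y%m}{week_letter(posted.day)}"


if __name__ == "__main__":
    print(notebook_name("H-Africa", date(2008, 2, 5)))
```

Under that assumption, a message posted to H-Africa on February 5, 2008 would land in the notebook named in the example above.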

How H-Net Works: Archiving of Lists

• BRS Database
– Newest notebook messages parsed and copied every 24 hours
– MD5 hashes created for each message
– Available for full-text search

• MySQL Database Cache
– Log browse cache extracts key metadata, creates MD5 hashes
– Cache builder script writes metadata to MySQL database cache
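The parse-and-hash step might look roughly like the following Python sketch. It rests on two assumptions that the slides do not state: that messages in a notebook are divided by a separator line of 73 "=" characters, and that the message identifier is the MD5 digest in unpadded base64 (the 22-character `msg` value in the persistent URL later in the deck is consistent with that, but this is an inference). The function names are hypothetical.

```python
import base64
import hashlib

# Assumption: each message in a notebook is preceded by a line of 73 "=" characters.
SEPARATOR = "=" * 73


def split_notebook(text: str) -> list:
    """Split the text of one notebook file into individual messages."""
    messages, current = [], []
    for line in text.splitlines():
        if line.startswith(SEPARATOR):
            if any(part.strip() for part in current):
                messages.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        messages.append("\n".join(current).strip("\n"))
    return messages


def message_id(message: str) -> str:
    """Return the MD5 digest of a message as unpadded base64 (assumed encoding)."""
    digest = hashlib.md5(message.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    sample = (
        SEPARATOR + "\nFrom: a@example.org\nSubject: One\n\nFirst post\n"
        + SEPARATOR + "\nFrom: b@example.org\nSubject: Two\n\nSecond post\n"
    )
    for msg in split_notebook(sample):
        print(message_id(msg), msg.splitlines()[1])
```

Hashing the exact stored text means that any later change to a message, however small, produces a different identifier.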

How H-Net Works: Archiving of Lists

Message Metadata Stored in MySQL Database

How H-Net Works: Message Retrieval

Constructed Persistent URL

http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion&month=0805&week=c&msg=jeSTCR0QAxq28hhgJPZ%2beQ&user=&pw=
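A URL of this shape can be assembled from the list name, the notebook period, and the message identifier. The Python sketch below copies the parameter names and the `trx=vx` value from the example URL; what each parameter means to the server is not documented in the slides, so the function is an illustration under that assumption, and its name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://h-net.msu.edu/cgi-bin/logbrowse.pl"


def persistent_url(list_name: str, month: str, week: str, msg: str) -> str:
    """Assemble a log-browse URL; parameter order follows the example URL."""
    params = [
        ("trx", "vx"),        # value copied from the example; meaning not documented
        ("list", list_name),
        ("month", month),     # e.g. "0805"
        ("week", week),       # notebook period letter, a-e
        ("msg", msg),         # message identifier; "+" is percent-encoded
        ("user", ""),
        ("pw", ""),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(persistent_url("H-Albion", "0805", "c", "jeSTCR0QAxq28hhgJPZ+eQ"))
```

`urlencode` writes the escaped plus sign as `%2B` rather than `%2b`; percent-escapes are case-insensitive, so the two forms are equivalent.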

Current Preservation Practices

Message Ingest, Storage, and Retrieval Processes

Current Preservation Practices

• Backup and storage

• Significant properties: message content, stored in plain text formats

• Authenticity
– Informal check by author and/or editor on posting
– Broken URL on message retrieval attempt

• Cached metadata fulfills PDI requirement

Current Preservation Practices

OAIS PDI Term            H-Net Cached Metadata
Reference Information    filename + messageid
Context Information      filename, from, subject, date (dpb)
Provenance Information   filename, from, subject, date (dpb)
Fixity Information       messageid

Cached Metadata

TRAC

• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) published by NARA and OCLC, 02/07

• For certification by third party or self assessment

• Three sections
– Organizational Infrastructure
– Digital Object Management
– Technologies, Technical Infrastructure, & Security

Other E-Mail Preservation Projects

Preservation of Electronic Mail Collaboration Initiative
North Carolina State Archives, Kentucky Department for Libraries and Archives, Pennsylvania State Archives
http://www.ah.dcr.state.nc.us/records/EmailPreservation/default.htm

Collaborative Electronic Records Project
Smithsonian Institution/Rockefeller Archive Center
http://siarchives.si.edu/cerp/index.htm

Collection-Based Long-Term Preservation
San Diego Supercomputer Center
http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA365661&Location=U2&doc=GetTRDoc.pdf

All Used XML Encoding

Preservation Improvement Plan: Backup & Storage

• Media refreshment schedule for all tapes
• Systematic sampling, remounting, reading, retensioning permanent tapes
• More than one set of backup tapes, or a server mirror
• Secure storage systems
• Backup log
• Participation in distributed storage system, such as LOCKSS or iRODS

Preservation Improvement Plan: Authenticity

• Shorten and standardize ingest time window to seconds rather than weeks
• Define and document access permissions
• Maintain audit log that tracks all activities associated with records
• Perform regular authenticity checks using message digests
• Consider using SHA-2 for integrity checks
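A regular authenticity check of the kind proposed above could be as simple as recomputing each message's digest and comparing it with the value recorded at ingest. The Python sketch below uses SHA-256, one member of the SHA-2 family. The in-memory dictionaries stand in for whatever storage the repository actually uses, and all names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_fixity(recorded: dict, current: dict) -> dict:
    """Compare digests recorded at ingest with digests of the stored messages.

    recorded maps message id -> digest recorded at ingest
    current  maps message id -> message text as stored now
    Returns message id -> "ok", "changed", or "missing".
    """
    report = {}
    for msg_id, expected in recorded.items():
        text = current.get(msg_id)
        if text is None:
            report[msg_id] = "missing"
        elif sha256_hex(text) == expected:
            report[msg_id] = "ok"
        else:
            report[msg_id] = "changed"
    return report


if __name__ == "__main__":
    at_ingest = {"m1": sha256_hex("First post"), "m2": sha256_hex("Second post")}
    stored_now = {"m1": "First post", "m2": "Second post, edited"}
    print(check_fixity(at_ingest, stored_now))
```

Writing each report to the audit log would give a dated record that the check was performed and what it found.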

Preservation Improvement Plan

• Continue to use MD5 to calculate name
• Generate shorter persistent URL for use as citation
• Awkward metadata handling
• Editor data should be added to what’s there, not replace it

Preservation Improvement Plan: Migration

• Messages and Notebooks
– No migration strategy needed
– Plain text ASCII and UTF-8 stable, open formats

• Attachments
– Make private lists browsable by providing constructed URL
– Display attachment link in browse window
– Detach attachments from notebook files, store separately, link to original message
– Provide conversion on demand to current formats
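The detach-and-link step for attachments might be sketched with Python's standard `email` package, as below. This assumes each message is available as a MIME-formatted string, which the slides do not confirm for private-list notebooks. Keying each stored attachment by its SHA-256 digest is an illustrative choice rather than H-Net's plan, and the function name is hypothetical.

```python
import email
import hashlib
from email import policy
from email.message import EmailMessage


def detach_attachments(raw_message: str):
    """Separate attachments from a MIME message.

    Returns (body_text, stored, links):
      stored maps a SHA-256 key -> attachment bytes (to be kept separately)
      links  records which original message each attachment came from
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    stored, links = {}, []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        key = hashlib.sha256(payload).hexdigest()
        stored[key] = payload
        links.append({
            "message_id": str(msg.get("Message-ID", "")),
            "filename": part.get_filename(),
            "sha256": key,
        })
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return text, stored, links


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Message-ID"] = "<1@example.org>"
    demo["Subject"] = "Paper draft"
    demo.set_content("Hello list")
    demo.add_attachment(b"PDFDATA", maintype="application",
                        subtype="pdf", filename="paper.pdf")
    text, stored, links = detach_attachments(demo.as_string())
    print(text.strip(), links)
```

Keeping the message identifier alongside each stored attachment preserves the link back to the original message, and the digest doubles as fixity information for the detached file.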

Preservation Improvement Plan: From TRAC Checklist

• Succession plan
• Periodic review or trigger event definition
• Technology watch
• Document, document, document!
– Technology history
– Change management system
– Staff roles, responsibilities, and authorizations
– Written recovery plan

References

• H-Net Archives, Documentation, http://www.hnet.org/archive/doc.php
• H-Net: Humanities and Social Sciences Online, http://www.h-net.org
• InterPARES, http://www.interpares.org
• MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online, http://www.matrix.msu.edu
• OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf
• Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://www.crl.edu/PDF/trac.pdf