Taming the Wild LISTSERV; or, How to Preserve Specialized E-Mail Lists
Lisa M. Schmidt, http://www.h-net.org/archive/
MATRIX: The Center for Humane Arts, Letters & Social Sciences Online, Michigan State University
May 23, 2007



Page 2

H-Net: Humanities and Social Sciences Online

• International consortium of scholars and teachers

• Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet

• Valuable scholarly resource
  – More than 180 networks, or e-mail lists
  – More than 230 “private” lists

• More than 1 million e-mail messages

Page 3

MATRIX

• Digital humanities research center
• Devoted to the application of new technologies in humanities and social science teaching and research

• Uses Internet technologies to improve education and increase the flow of information

Page 4

NHPRC Grant

• Conduct assessment of existing H-Net preservation policies and practices

• Develop an improved long-term preservation plan

• Apply NARA/OCLC TRAC checklist
• Useful to those managing large collections of electronic records
• Research semantic clustering search techniques

Page 5

Preserving E-Mail Lists as Scholarly Resources

• How H-Net Works
• Current Preservation Practices
• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
• Other E-Mail Preservation Projects
• Preservation Improvement Plan

Page 6

How H-Net Works: Network Configuration

Page 7

How H-Net Works: Backup & Security

• Daily incremental backups, weekly full backups
  – Tapes cycle through system every 6 weeks
  – Swapped tapes kept in locked cabinet in secured room
  – Tapes replaced as needed

• Monthly full, permanent tape backups
  – Tapes kept in secured room
  – Plans to keep log and move to offsite storage

• Server rack kept in climate-controlled, physically secured room
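The rotation described above can be expressed as a small scheduling function. This is a sketch under stated assumptions, not H-Net's configuration: only the six-week tape cycle comes from the slide, while the Sunday full-backup day, the epoch date used to count weeks, and the rule that the first Sunday of each month also writes the permanent monthly tape are hypothetical.

```python
from datetime import date

# Illustrative values only; apart from the six-week cycle, none of these
# settings come from the slides.
FULL_BACKUP_WEEKDAY = 6    # Sunday (date.weekday() counts Monday as 0)
ROTATION_WEEKS = 6         # tapes cycle through the system every 6 weeks
EPOCH = date(2007, 1, 7)   # hypothetical Sunday when tape set 0 was first used


def backup_jobs(day: date) -> list[str]:
    """List the backup jobs that run on a given day under the assumed schedule."""
    # Whole weeks since the epoch pick the rotating tape set (Sunday to Saturday).
    tape_set = ((day - EPOCH).days // 7) % ROTATION_WEEKS
    is_full_day = day.weekday() == FULL_BACKUP_WEEKDAY
    kind = "full" if is_full_day else "incremental"
    jobs = [f"{kind} -> tape set {tape_set}"]
    # Hypothetical rule: the first Sunday of each month also writes a full
    # backup to a new permanent tape that never re-enters the rotation.
    if is_full_day and day.day <= 7:
        jobs.append("monthly full -> new permanent tape")
    return jobs
```

For example, `backup_jobs(date(2007, 2, 18))` returns `["full -> tape set 0"]`: six weeks after the epoch, the rotation is back at the first tape set.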

Page 8

How H-Net Works: Posting Messages

• H-Net runs on LISTSERV software
• Users must be list subscribers to post
• Messages written in plain text
• No attachments allowed on public lists
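The posting rules on this slide amount to a few checks on each incoming message. The sketch below, using only Python's standard email package, shows them as a validation function. The function name, the subscriber set, and the treatment of every multipart message as carrying attachments are simplifying assumptions; this is not how LISTSERV itself implements the rules.

```python
from email import message_from_string
from email.utils import parseaddr


def check_post(raw: str, subscribers: set[str], public: bool = True) -> list[str]:
    """Return the posting-policy problems found in a message (empty list = acceptable).

    subscribers: lower-cased subscriber addresses (hypothetical input format).
    """
    msg = message_from_string(raw)
    problems = []
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() not in subscribers:
        problems.append("sender is not a list subscriber")
    if msg.is_multipart():
        # Simplification: any multipart message is treated as carrying attachments.
        if public:
            problems.append("attachments are not allowed on public lists")
    elif msg.get_content_type() != "text/plain":
        # A message with no Content-Type header defaults to text/plain.
        problems.append("message body is not plain text")
    return problems
```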

Page 9

How H-Net Works: Posting Messages

Message Posting Process

Page 10

How H-Net Works: Archiving of Lists

• Messages kept in flat text files called “notebooks”

• Messages post from a few seconds up to several days after approval

• Each notebook includes messages posted during a weekly time period
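Since a notebook is one flat file holding a week of messages, any per-message processing (parsing, hashing, caching) starts by splitting it. A minimal sketch, assuming that messages are separated by a line consisting only of '=' characters; the delimiter is an assumption, not taken from the slides, the minimum length of 20 is an arbitrary threshold to skip short underlines, and a message body containing such a line would be split incorrectly.

```python
def split_notebook(text: str) -> list[str]:
    """Split a notebook's text into individual raw messages.

    Assumption: messages are delimited by a line made up solely of '='
    characters (at least 20 of them).
    """
    messages: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Keep a chunk only if it has some non-blank content.
        if any(ln.strip() for ln in current):
            messages.append("\n".join(current).strip("\n") + "\n")
        current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) >= 20 and set(stripped) == {"="}:
            flush()
        else:
            current.append(line)
    flush()
    return messages
```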

Page 11

How H-Net Works: Archiving of Lists

Time Period   Day of Month
a             1-7
b             8-14
c             15-21
d             22-28
e             29-31

Example: “h-africa.log0802a”
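The naming scheme in the table can be computed directly from a list name and a posting date. A sketch, assuming the four digits are a two-digit year followed by a two-digit month (consistent with month=0805 in the persistent URL on Page 18) and that list names are lower-cased in filenames, as in the example; the function name is hypothetical.

```python
from datetime import date

WEEK_LETTERS = "abcde"  # a: days 1-7, b: 8-14, c: 15-21, d: 22-28, e: 29-31


def notebook_name(list_name: str, day: date) -> str:
    """Build a notebook filename such as 'h-africa.log0802a'.

    Assumptions: digits are YYMM; list names are lower-cased.
    """
    letter = WEEK_LETTERS[(day.day - 1) // 7]
    return f"{list_name.lower()}.log{day:%y%m}{letter}"
```

For instance, `notebook_name("H-Albion", date(2008, 5, 20))` returns `"h-albion.log0805c"`, matching the list, month, and week in that URL.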

Page 12

How H-Net Works: Archiving of Lists

• BRS Database
  – Newest notebook messages parsed and copied every 24 hours
  – MD5 hashes created for each message
  – Available for full-text search

• MySQL Database Cache
  – Log browse cache extracts key metadata, creates MD5 hashes
  – Cache builder script writes metadata to MySQL database cache
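The cache-building step can be sketched in a few lines. This is an illustration under assumptions, not H-Net's cache builder script: it uses Python's built-in sqlite3 as a stand-in for MySQL, stores the fields listed in the cached-metadata table on Page 21 (filename, messageid, from, subject, date), and assumes the message ID is the MD5 digest of the raw message text in unpadded Base64, which matches the 22-character msg value in the Page 18 URL but is an inference. Table and column names are hypothetical.

```python
import base64
import hashlib
import sqlite3
from email import message_from_string


def message_id(raw: str) -> str:
    """MD5 of the raw message, Base64-encoded without '=' padding (22 chars)."""
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def cache_notebook(conn: sqlite3.Connection, filename: str, raw_messages: list[str]) -> int:
    """Write key metadata for each message to the cache; return rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS message_cache (
               filename  TEXT NOT NULL,
               messageid TEXT NOT NULL,
               sender    TEXT,
               subject   TEXT,
               posted    TEXT,
               PRIMARY KEY (filename, messageid))"""
    )
    rows = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        rows.append((filename, message_id(raw), msg["From"], msg["Subject"], msg["Date"]))
    # INSERT OR REPLACE keeps re-runs over the same notebook idempotent.
    conn.executemany("INSERT OR REPLACE INTO message_cache VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```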

Page 13

How H-Net Works: Archiving of Lists

Message Metadata Stored in MySQL Database

Page 14

How H-Net Works: Message Retrieval

Page 15

How H-Net Works: Message Retrieval

Page 16

How H-Net Works: Message Retrieval

Page 17

How H-Net Works: Message Retrieval

Page 18

How H-Net Works: Message Retrieval

Constructed persistent URL:
http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion&month=0805&week=c&msg=jeSTCR0QAxq28hhgJPZ%2beQ&user=&pw=
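The URL packs the list name, month (YYMM), week letter, and message ID into query parameters. A sketch that rebuilds it from those parts, with the base URL and parameter names copied from the example; whether logbrowse.pl actually needs the empty user and pw parameters is not known, so they are simply reproduced. The function name is hypothetical.

```python
from urllib.parse import urlencode

LOGBROWSE = "http://h-net.msu.edu/cgi-bin/logbrowse.pl"


def persistent_url(list_name: str, month: str, week: str, msg_id: str) -> str:
    """Rebuild a logbrowse URL in the shape shown on the slide."""
    # Dict order is preserved, so parameters appear in the same order as the example.
    params = {"trx": "vx", "list": list_name, "month": month, "week": week,
              "msg": msg_id, "user": "", "pw": ""}
    # urlencode() percent-encodes '+' in the Base64 ID as %2B.
    return f"{LOGBROWSE}?{urlencode(params)}"
```

Called with `("H-Albion", "0805", "c", "jeSTCR0QAxq28hhgJPZ+eQ")`, it reproduces the URL above, except that the escaped '+' is written as %2B rather than %2b; percent-encoding hex digits are case-insensitive, so both decode identically.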

Page 19

Current Preservation Practices

Message Ingest, Storage, and Retrieval Processes

Page 20

Current Preservation Practices

• Backup and storage
• Significant properties: message content, stored in plain text formats
• Authenticity
  – Informal check by author and/or editor on posting
  – Broken URL on message retrieval attempt

• Cached metadata fulfills the OAIS Preservation Description Information (PDI) requirement
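One way to read the "broken URL" check: because the msg value in the persistent URL is derived from message content, a message whose stored text has changed no longer hashes to the cited ID, so the URL fails to resolve. A minimal sketch of that lookup, assuming the ID is the MD5 digest of the raw message text in unpadded Base64 (an inference from the 22-character msg value on Page 18, not something the slides state); the function names are hypothetical.

```python
import base64
import hashlib
from typing import Optional


def message_id(raw: str) -> str:
    """Assumed ID scheme: MD5 of the message text, Base64 without '=' padding."""
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def resolve(notebook_messages: list[str], msg_id: str) -> Optional[str]:
    """Return the message whose current text hashes to msg_id, else None.

    None corresponds to the "broken URL" a reader would see: the cited ID
    no longer matches any message as it is stored now.
    """
    for raw in notebook_messages:
        if message_id(raw) == msg_id:
            return raw
    return None
```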

Page 21

Current Preservation Practices

OAIS PDI Term            H-Net Cached Metadata
Reference Information    filename + messageid
Context Information      filename, from, subject, date (dpb)
Provenance Information   filename, from, subject, date (dpb)
Fixity Information       messageid

Cached Metadata

Page 22

TRAC

• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), published by NARA and OCLC in February 2007

• For certification by a third party or for self-assessment

• Three sections
  – Organizational Infrastructure
  – Digital Object Management
  – Technologies, Technical Infrastructure, & Security

Page 23

TRAC

Page 24

Other E-Mail Preservation Projects

Preservation of Electronic Mail Collaboration Initiative
North Carolina State Archives, Kentucky Department for Libraries and Archives, Pennsylvania State Archives
http://www.ah.dcr.state.nc.us/records/EmailPreservation/default.htm

Collaborative Electronic Records Project
Smithsonian Institution/Rockefeller Archive Center
http://siarchives.si.edu/cerp/index.htm

Collection-Based Long-Term Preservation
San Diego Supercomputer Center
http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA365661&Location=U2&doc=GetTRDoc.pdf

All three projects used XML encoding.

Page 25

Preservation Improvement Plan: Backup & Storage

• Media refreshment schedule for all tapes
• Systematic sampling, remounting, reading, and retensioning of permanent tapes
• More than one set of backup tapes, or a server mirror
• Secure storage systems
• Backup log
• Participation in a distributed storage system, such as LOCKSS or iRODS

Page 26

Preservation Improvement Plan: Authenticity

• Shorten and standardize ingest time window to seconds rather than weeks

• Define and document access permissions
• Maintain audit log that tracks all activities associated with records
• Perform regular authenticity checks using message digests
• Consider using SHA-2 for integrity checks
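The audit-log and digest recommendations fit together in a periodic fixity check. A minimal sketch, assuming a manifest that maps each stored file (for example, a notebook) to a previously recorded SHA-256 digest; SHA-256 is one member of the SHA-2 family. The JSON-lines log format and all names are hypothetical.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def fixity_check(manifest: dict[str, str], root: Path, log_path: Path, actor: str) -> list[str]:
    """Compare stored files with their recorded digests; log every check; return failures."""
    failures = []
    with log_path.open("a", encoding="utf-8") as log:
        for name, expected in sorted(manifest.items()):
            path = root / name
            actual = sha256_of(path) if path.exists() else None
            ok = actual == expected
            if not ok:
                failures.append(name)
            # One JSON object per line: an append-only record of who checked what, and when.
            log.write(json.dumps({
                "time": datetime.now(timezone.utc).isoformat(),
                "actor": actor,
                "action": "fixity-check",
                "record": name,
                "result": "ok" if ok else ("missing" if actual is None else "mismatch"),
            }) + "\n")
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        notebook = root / "h-albion.log0805c"
        notebook.write_text("notebook text\n", encoding="utf-8")
        manifest = {notebook.name: sha256_of(notebook)}
        notebook.write_text("altered text\n", encoding="utf-8")  # simulate corruption
        print(fixity_check(manifest, root, root / "audit.log", "weekly-job"))
```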

Page 27

Preservation Improvement Plan

• Continue to use MD5 to calculate message names
• Generate a shorter persistent URL for use in citations

• Awkward metadata handling
• Editor data should be added to what’s there, not replace it
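A shorter citation URL could keep the MD5-based name while dropping the CGI query string. The sketch below is purely hypothetical: the base URL is a placeholder, the /list/period/id path layout is invented for illustration, and it switches to the URL-safe Base64 alphabet so the ID never needs percent-encoding.

```python
import base64
import hashlib

# Placeholder base URL; the slides do not specify a short-URL scheme.
BASE = "https://archive.example.org"


def short_id(raw: str) -> str:
    """MD5 of the message in URL-safe Base64 ('-' and '_' replace '+' and '/'), unpadded."""
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def short_url(list_name: str, notebook_period: str, raw: str) -> str:
    """Hypothetical citation URL: /<list>/<YYMM + week letter>/<message ID>."""
    return f"{BASE}/{list_name.lower()}/{notebook_period}/{short_id(raw)}"
```

Because the URL-safe alphabet differs from standard Base64 only in using '-' and '_' for '+' and '/', existing IDs such as jeSTCR0QAxq28hhgJPZ+eQ map one-to-one onto the new form (jeSTCR0QAxq28hhgJPZ-eQ).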

Page 28

Preservation Improvement Plan: Migration

• Messages and Notebooks
  – No migration strategy needed
  – Plain-text ASCII and UTF-8 are stable, open formats

• Attachments
  – Make private lists browsable by providing constructed URL
  – Display attachment link in browse window
  – Detach attachments from notebook files, store separately, and link to original message
  – Provide conversion on demand to current formats
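The detach-and-link step for attachments can be sketched with Python's standard email package. The function copies each attachment out of a MIME message into separate storage under a content-derived key and returns link records pointing back to the original message ID. The dict used as storage, the SHA-256 key, and the link fields are assumptions for illustration; rewriting the notebook file itself and conversion on demand are not shown.

```python
import hashlib
from email import message_from_bytes, policy


def detach_attachments(raw: bytes, msg_id: str, store: dict[str, bytes]) -> list[dict]:
    """Copy each attachment out of a message into separate storage.

    store: dict standing in for a separate attachment repository (hypothetical).
    Returns link records that tie each stored file back to its original message.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    links = []
    # iter_attachments() skips the first plain-text/HTML body part.
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""
        key = hashlib.sha256(data).hexdigest()  # content-derived storage name
        store[key] = data
        links.append({
            "message": msg_id,
            "original_name": part.get_filename(),
            "content_type": part.get_content_type(),
            "stored_as": key,
        })
    return links
```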

Page 29

Preservation Improvement Plan: From TRAC Checklist

• Succession plan
• Periodic review or trigger event definition
• Technology watch
• Document, document, document!
  – Technology history
  – Change management system
  – Staff roles, responsibilities, and authorizations
  – Written recovery plan

Page 30

References

• H-Net Archives, Documentation, http://www.hnet.org/archive/doc.php

• H-Net: Humanities and Social Sciences Online, http://www.h-net.org

• InterPARES, http://www.interpares.org

• MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online, http://www.matrix.msu.edu

• OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf

• Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://www.crl.edu/PDF/trac.pdf