Preservation of Electronic Mail Druscie Simpson NC State Archives November 19, 2004

Preservation of Electronic Mail

Druscie Simpson, NC State Archives

November 19, 2004

E-mail: The Digital Divide Also Multiplies

E-mail as a Burden

The Radicati Group and Merrill Lynch estimate that email is growing at a rate of 300% annually. (The Age, July 8, 2003)

The real problem: not more email, but “larger and larger attachments, generating an average of 5MB of email content” daily. (The Age, July 8, 2003)

Email generates about 400,000 terabytes of new information each year worldwide

About 31 billion emails are sent daily, on the Internet and elsewhere, a figure which is expected to double by 2006 (source: International Data Corporation (IDC)). The average email is about 59 kilobytes in size, thus the annual flow of emails worldwide is 667,585 terabytes. (How Much Information 2003, UC Berkeley)
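The 667,585-terabyte figure can be reproduced from the two numbers quoted above. The short check below is only a sketch: it assumes a 365-day year and decimal units (1 terabyte = 10^9 kilobytes), which is the reading under which the arithmetic matches the slide.

```python
# Figures quoted on the slide (IDC / How Much Information 2003).
EMAILS_PER_DAY = 31_000_000_000   # about 31 billion messages per day
AVG_SIZE_KB = 59                  # average message size in kilobytes

# Assumptions made for this check, not stated on the slide.
DAYS_PER_YEAR = 365
KB_PER_TB = 1_000_000_000         # decimal units: 1 TB = 10^9 KB


def annual_email_terabytes(per_day=EMAILS_PER_DAY, size_kb=AVG_SIZE_KB):
    """Return the yearly e-mail volume in terabytes."""
    daily_kb = per_day * size_kb
    return daily_kb * DAYS_PER_YEAR / KB_PER_TB


print(annual_email_terabytes())
```

With binary units (1 TB = 2^30 KB) the result would be noticeably smaller, so the choice of unit matters when comparing this estimate with other published figures.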

What do I do with ALL that e‑mail?!

Why are we so interested in E‑Mail and Digital Records?

Email’s far reaching effects

Loss of Corporate Knowledge

Imagine you’re new in the office. All of the information to do your job was on your computer. Your predecessor deleted the information before leaving or it was password protected. You don’t have the password.

Legal Implications

If it is in an e-mail and it is sent from, received by, or stored on a government computer, it is a legal record.

Never put anything in an e-mail you don’t want on the front page of the local paper.

Always CYO (cover your office).

Users have several options for keeping their saved e-mails:

They may leave them on the mail provider’s server

They may leave them on a web-based mail server such as Hotmail or Yahoo

They may store them in their e-mail client, such as Outlook, Eudora, or Netscape

They may store them on the file system of their PC as individual .eml files (MS Outlook Express Electronic Mail)

In each of these circumstances the actual byte stream used to represent the e-mail message is slightly different.

While an e-mail server and an e-mail client are obliged to communicate with each other using standards (SMTP, POP3, and IMAP), they are not required to store the e-mail using any sort of standard.
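A small illustration of the point above, using only Python's standard library. It stores one made-up message two ways, as the raw bytes of a standalone .eml file and inside an mbox file (one of several storage formats clients have used), and compares the results. The addresses and subject are invented for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

# Build one message (addresses and subject are made up for illustration).
msg = EmailMessage()
msg["From"] = "clerk@example.gov"
msg["To"] = "archives@example.gov"
msg["Subject"] = "Board minutes"
msg.set_content("The minutes are attached.")

# Storage 1: a standalone .eml file holds just the raw message bytes.
raw_eml = msg.as_bytes()

# Storage 2: the same message kept inside an mbox file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.mbox")
    box = mailbox.mbox(path)
    box.add(msg)
    box.flush()
    box.close()

    with open(path, "rb") as fh:
        raw_mbox = fh.read()

    # Reading the mbox back recovers the same logical message.
    box = mailbox.mbox(path)
    stored = next(iter(box))
    stored_subject = stored["Subject"]
    box.close()

print(raw_eml == raw_mbox)               # the stored bytes differ
print(stored_subject == msg["Subject"])  # the message content does not
```

The mbox copy carries an extra "From " separator line that the .eml copy lacks, which is the kind of storage-level difference a preservation format has to smooth over.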

We will be looking for a solution that will have the widest possible use

Start with an IMAP server

Enhance the server with the ability to take the contents of its message store and create the desired standard XML files, called XMTP

Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing.

XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML

In the testing phase, but not launched yet http://sourceforge.net/projects/smtp/
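The general idea of turning a message into XML can be sketched with Python's standard library. This is not the XMTP schema: the element names (`message`, `headers`, `header`, `body`) and the sample message are hypothetical, chosen only to show the shape of such a conversion.

```python
import xml.etree.ElementTree as ET
from email import message_from_string, policy

# A made-up message used only for illustration.
RAW = (
    "From: clerk@example.gov\n"
    "To: archives@example.gov\n"
    "Subject: Board minutes\n"
    "Date: Fri, 19 Nov 2004 09:00:00 -0500\n"
    "\n"
    "The minutes are attached.\n"
)


def message_to_xml(raw):
    """Convert a raw message into an XML element (hypothetical layout)."""
    msg = message_from_string(raw, policy=policy.default)
    root = ET.Element("message")

    # One <header name="..."> element per header, in original order.
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        ET.SubElement(headers, "header", name=name).text = str(value)

    # Keep the plain-text body if there is one; attachments are ignored here.
    body = msg.get_body(preferencelist=("plain",))
    ET.SubElement(root, "body").text = body.get_content() if body else ""
    return root


root = message_to_xml(RAW)
print(ET.tostring(root, encoding="unicode"))
```

A real preservation format would also need to carry attachments, character sets, and nested MIME parts, which this sketch leaves out.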

IMAP seems to be the only protocol that supports moving and copying e-mail messages from place to place while preserving the e-mail message’s native format.

This means that no matter where the e-mail message ends up, almost any IMAP compliant e-mail client can send it to an “archives” server.
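As a rough sketch of what such a transfer could look like from a script rather than a mail client, the function below copies every message in one IMAP folder to a folder on another server using Python's `imaplib`. The function name, folder names, and host names are assumptions for illustration; the messages are passed along as the raw bytes the source server returns.

```python
import imaplib


def copy_folder(src: imaplib.IMAP4, dst: imaplib.IMAP4,
                src_folder: str, dst_folder: str) -> int:
    """Copy all messages from src_folder to dst_folder; return the count."""
    copied = 0
    src.select(src_folder, readonly=True)   # do not alter the source mailbox
    dst.create(dst_folder)                   # server answers "NO" if it exists
    typ, data = src.search(None, "ALL")
    for num in data[0].split():
        typ, parts = src.fetch(num, "(RFC822)")
        raw = parts[0][1]                    # full message as raw bytes
        dst.append(dst_folder, None, None, raw)
        copied += 1
    return copied


# Hypothetical usage; the host names and credentials are placeholders.
# src = imaplib.IMAP4_SSL("mail.agency.example")
# src.login("user", "password")
# dst = imaplib.IMAP4_SSL("archives.example")
# dst.login("user", "password")
# copy_folder(src, dst, "INBOX", "Archives")
```

Because the message is never parsed or rebuilt along the way, the copy on the destination server keeps the message's native format.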

How?

Have the user send e-mail directly to a server hosted by the NC State Archives

Have the user send e-mail to an enhanced IMAP server maintained by their agency

This would enable the agency to locally access the archived e-mail messages

The IMAP server could then send snapshots to us or send us the XMTP files on electronic media via USPS

Have the user collect and send .pst files to the NC State Archives

Archives will open them with Outlook and move them to the enhanced IMAP server (process would be automated)

Archives should also be able to access packages of e-mail in other formats since Outlook can convert from Eudora, Netscape, etc.

Once loaded into Outlook, the e-mail packages would then be sent to the IMAP server.

Any strategy based on the interception of the data stream is out, since we want to collect the e-mail messages only after the user has been given a chance to cull and organize them.

Our proposal is to use hMailServer (a SourceForge open source project), which is an IMAP server that uses MySQL or Microsoft SQL Server as its message store.

http://www.hmailserver.com

The hMailServer installation contains a minimal MySQL-installation, so if you don't already have a database server in your network, MySQL is installed automatically when you install hMailServer.

The XML creation utility could interface directly with the message store instead of the IMAP protocol.

hMailServer comes with an attendant COM component that can be used to access the data store

Life of an e-mail message

E-mail message is sent to the user’s mail server

User downloads the message to his/her mailbox

User optionally places the message into a folder on his/her local system

User creates a folder on the “Archives” IMAP server

User moves the mail from his/her inbox or specified folder to the folder on the “Archives” IMAP server

An administrator requests that the IMAP server create one or more XML files containing the user’s e-mail

XML files are saved as a preservation copy

Access to Email #1

Load the XML into ENCompass

Utilize the IMAP server by enhancing it to provide web access to its native store similar to the user interface provided by Lurker http://sourceforge.net/projects/lurker

Access to Email #2

Utilize Documentum by enhancing it to ingest the XML produced by the IMAP server.

Documentum server would be used purely as an e-mail repository, not as a document management application.

Utilize Documentum as a document management application to interfile e-mail messages into named record series

Access to Email #3

Move e-mail messages into a SharePoint Portal server

Use Outlook to collect the messages from the IMAP server and send them to SPP (SharePoint Portal). This is not the similarly abbreviated Switch-to-Switch Protocol (SSP), the protocol specified in the DLSw standard and used by routers to establish DLSw connections, locate resources, forward data, and handle flow control and error recovery.

XML files would serve purely as a preservation copy.

This Particular Project

Take 6 gigabytes of e-mail from Governor Jim Hunt’s administration (1993-2001; bulk dates 1997-2001) and make it accessible and preservable.

E-mail has been appraised and culled to create the core for preservation

E-mail is in Microsoft Outlook .pst files and can be accessed only by using the correct version of Outlook

Create/utilize programs to move the e-mails out of Microsoft’s proprietary .pst format into a non-proprietary and stable XML format

Also want to write software that is more universal in scope and can be used with most electronic records.

Hire a programmer to write code to convert the .pst files from their format to XML format

Take the converted XML files and load them onto our server and make them available to the public via the web and searchable through our online catalog system (ENCompass/MARS)

Wish us luck!

We are very excited to have this opportunity to explore this potential solution

We hope to take what we learn and apply it to the collection of other electronic government resources that are archival

We’ll keep you posted!