Developing a Framework for File Format Migrations iPRES 2015 Chapel Hill, NC 3 November 2015 Joey...

Preview:

Citation preview

Developing aFramework for

File Format Migrations

iPRES 2015 Chapel Hill, NC 3 November 2015Joey Heinen and Andrea Goethals

Outline

1. Project Drivers2. Migration Framework3. Key Lessons Learned4. Q&A

Researcher to Librarian A: I somehow can't open the file extension .smi (see attachment) working on a mac. Any ideas? I'm so sorry to bother you with this. 

Librarian A to Audio Engineer: This is now a recurring problem. Is there another audio option for the website? Right now, the files on my computer seem to try to download into VLC perpetually, but never play. They used to open in Itunes. Any advice?

IT Support: Unfortunately it is not possible to play RealPlayer SMI playlist files on a Mac. The user should try playing the files on a Windows PC instead. The latest version of RealPlayer does not work with Real Audio files. The files will not play, and the application itself may crash on some Windows 7 machines. The user should download and install an older version of Real Audio Player

Librarian A: This is why we need a new audio platform… I can't even play the files anymore… ?? 

Obsolete Formats in the DRS

• Audio– RealAudio files & SMIL Playlists (use copies)

• Images– Kodak PhotoCD images (archival and production

masters)• Obsolescence revealed through use

Ongoing Need

• Self-assessments revealed our inexperience with format migrations as a gap area in our repository practices

• Need a format migration framework– Cover whole workflow– Flexible structure• Format-agnostic• Scalable up or down• Plug-n-play roles and responsibilities

Opportunity

• IMLS-funded NDSR Boston project• 9-month resident to work on strategic projects

designed by the host institution• Uninterrupted time to focus on developing a

format migration framework by working through 2 real use cases

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Format Migration Framework

• Five phases:

Each with activities and deliverables

Phase One: Plan for Test

• Define stakeholders and their migration roles• Identify concurrent projects that can affect

migration• Acquire specifications• Research formats and identify key properties• Identify the content “buckets” that can be

migrated together• Analyze content, metadata,

migration tools Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Deliverable Ex. for Kodak PCD: Content Groupings

Different cropping, roles, and format of Production Master

Deliverable Ex. for Kodak PCD: Content Groupings

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Phase Two: Test

• Select sample content to test with• Design migration tests to exercise tools and

different processes• Perform migration tests• QA results using authoritative metrics• Decide on the migration tools and process

based on the test

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Ex. for Kodak PCD: Testing Metrics

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Phase Three: Refine Plan

• Design how content created by the migration will interact with existing content and the repository– How will it be ingested into the repository?– What metadata will be added about the content

and migration?– What existing metadata will change?– What will be retained / de-accessioned?

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

The pathways differ in the settings (“film term”) used with the migration tool

The pathways differ in which source files will be used to create the new archival masters and deliverables

Kodak PCD Ex.: One of ThreeMigration Pathways

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

For the new JP2 archival masters and deliverables different color spaces are used

Intermediate TIFF files are generated but not kept

Phase Four: Execute Plan

• Set up migration environment• If necessary, custom development– create scripts to automate migration process– modify existing ingest tools

• Schedule migration• Perform migration• Ingest content into the repository

Plan for Test Test Refine

PlanExecute

Plan Wrap-up

Phase Five: Wrap-Up

• Verify and document results• Schedule any clean-up needed, e.g.:– de-accessioning of replaced content– changes to identifiers for new use copies

• Decide on final disposition of migration documentation and “file” it (e.g. delete, save, preserve, expose)

• Adjust migration framework ifneeded Plan for

Test Test Refine Plan

Execute Plan Wrap-up

Key Lessons Learned

• Users are the experts on obsolescence– Repository managers need to solicit use problems

from end users, support staff, etc.• Migrations are more complex/nuanced than

commonly depicted in the literature• This generic framework is useful in practice &

can help us institutionalize format migrations

Thank You

Joey HeinenDigital Production Coordinator

Northeastern University Librariesj.heinen@neu.edu

@joey_heinen

Andrea GoethalsManager of Digital Preservation and Repository

ServicesHarvard Library

andrea_goethals@harvard.edu

Recommended