Building an Audio Preservation System at Indiana University Using Standards and Best Practices Mike Casey, Archives of Traditional Music Jon Dunn, Digital Library Program Jenn Riley, Digital Library Program April 14, 2008


Building an Audio Preservation System at

Indiana University Using Standards

and Best Practices

Mike Casey, Archives of Traditional Music

Jon Dunn, Digital Library Program

Jenn Riley, Digital Library Program

April 14, 2008

The Problem

• Numbers
• Degradation
• Obsolescence


Plus many more!

Audio/Video at IUB

• AAAMC
• Music Library
• University Archives
• ATM
• HPER
• Radio/TV
• Center for the Study of History and Memory
• Kinsey Institute
• Athletics
• Emeriti House
• Office of University Marketing
• Wells Library
• AISRI
• Office of Dean of the Faculties
• Lilly Library
• Alumni Relations
• School of Journalism
• School of Music
• School of Law
• Traditional Arts Indiana
• Department of History
• Department of Anthropology
• Department of Folklore/Ethnomusicology
• Black Film Center/Archive

By the Numbers

• ATM: 110,000 (mostly) audio recordings
• Wells Library, Kent Cooper Room: 20,000 videos
• Music Library: 137,000 audio recordings

*2,000 lacquer discs
*8,000 DATs
*50,000 open reel tapes

• CSHM: 3,200 audio recordings
• AAAMC: 5,900 audio recordings

Obsolescence

• Audio formats
• Equipment (playback machines, test devices)
• Repair parts
• Playback expertise
• Repair expertise
• Tools
• Supplies

Preservation in the Analog Domain

• Life expectancy critically important
• Predicting when a recording will fail
• Quest for the eternal carrier
• Target preservation format: mastering-quality open reel tape
• Standards set in the mid-1980s (ARSC/AAA)

The New Paradigm

• Eternal sound carriers never available
• Maintaining equipment long-term unmanageable

Therefore, classical preservation strategy is hopeless

The New Paradigm

• Preserve the content, not the carrier
• The eternal file, not the eternal carrier
• Use digital mass storage systems

Longevity of carriers in mass storage systems of minor importance

Standards and Best Practices

• Ensure Quality
• Provide Philosophical/Ethical Foundation
• Encourage Sustainability
• Foster Interoperability
• Provide a Migration Path

Preserving Digital Information

• Advantage: Digital information may be copied without degradation

• Disadvantage: Digital information requires active management in order to remain accessible

Risks of Digital Information: Bit Loss

• Degradation of physical media
  – Optical, magnetic
• Damage or theft of physical media
• Media obsolescence
  – Ability to read physical media
  – Ability to read logical media format

Risks of Digital Information: Semantic Loss

• Even if the bits are intact, can a file still be understood?
• File format obsolescence
• Loss of context
  – Insufficient metadata

Risks of Digital Information: Integrity

• How do we know whether or not information has been altered, whether intentionally or unintentionally?

Methods of Mitigating Risks

• Migration
  – Migration of data to new physical media
  – Migration of data to new file formats
• Replication
  – Multiple copies of data in multiple locations
• Validation
  – Retain checksums for files, routinely retrieve files and compare against checksums
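The validation step above can be sketched in Python as a routine audit. This is a minimal illustration, not the tooling used at IU: the function names `md5_of_file` and `audit` and the in-memory dictionary of stored checksums are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def audit(stored_checksums, root):
    """Compare each file under root against its retained checksum.

    stored_checksums maps a relative filename to its expected MD5 digest
    (a hypothetical structure; a real system would likely use a database).
    Returns a dict mapping filename to "ok", "changed", or "missing".
    """
    report = {}
    for name, expected in stored_checksums.items():
        path = Path(root) / name
        if not path.is_file():
            report[name] = "missing"
        elif md5_of_file(path) == expected.upper():
            report[name] = "ok"
        else:
            report[name] = "changed"
    return report
```

Run on a schedule, a report like this shows which copies need to be restored from a replica.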

Scaling Digital Preservation

• Migration, replication, and validation require:
  – Automated processes
  – Ongoing monitoring, management, and planning
  – Ongoing funding for technology refresh

Digital Repositories

• Centrally-managed systems for storage (and delivery) of digital information

• Leverage economies of scale for storage and management costs

• Support preservation integrity functions (migration, replication, validation)

• Much easier to manage than many little pockets of digital information

OAIS: Open Archival Information System

• ISO Standard 14721:2003
• Origins in space science community
• Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term

• Basis for much work on digital preservation within the library and archive community

OAIS Reference Model

Preservation Packages in OAIS

• Preservation package
  – Digital content plus metadata
• SIP: Submission Information Package
• AIP: Archival Information Package
• DIP: Dissemination Information Package

From OAIS to Trusted Digital Repositories

• 2002 OCLC-RLG task force report:
  – Trusted Digital Repositories: Attributes and Responsibilities
• What are the attributes of a trusted repository?
  – OAIS compliance
  – Administrative responsibility
  – Organizational viability
  – Financial sustainability
  – System security
  – Procedural accountability

Trusted Digital Repositories: Auditing and Certification

• Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
  – http://www.repositoryaudit.edu/
• Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist
  – OCLC/NARA/CRL report
  – http://www.crl.edu/PDF/trac.pdf

Archives of Traditional Music

• Established 1948
• 110,000 recordings
• 1890s to present
• Field: 30%
• World music traditions
• Endangered/extinct world languages

Sound Directions: Digital Preservation and Access for Global Audio Heritage

• Collaboration between Harvard University and Indiana University
• Phase 1 an R&D project funded by NEH
• Focus on preservation


Project Partners

• Archives of Traditional Music, Indiana University
• Archive of World Music, Harvard University
• Harvard College Library Audio Preservation Services
• Digital Library Program, Indiana University
• Office for Information Systems, Harvard University


Objectives

• Research best practices in areas without standards or best practices

• Develop best practices to meet existing and emerging standards

• Test existing and emerging standards/best practices with a real world project


Results

• Publication: “Sound Directions: Best Practices for Audio Preservation”
• Development of audio preservation system
• Software tools
• Preservation of field collections


Project Future

• “Preservation” Phase funded by NEH
• Increase throughput
• Simultaneous transfer
• Indiana automation
• Release ATMC
• Develop new access system for field collections

Migration decision

Workflow management

Workflow management / scheduling

Cleaning or physical restoration as needed

System / Project: Planning & Development; Funding; Personnel / Vendor; Equipment

Software Tools: Creation / maintenance of software and scripts

Selection for Preservation: Assess research value; Evaluate condition; Consider political, technical, and other issues; Establish priorities

Digitization: Analog playback; A/D conversion; Creation of Preservation Master Files; Local filenames

Digitization: Technical metadata; Structural metadata; Checksums; Quality control; Local storage solution

Post-Transfer Processing: Quality control; Generation of derivatives; Marking areas of interest in files; Signal processing (if appropriate)

Preliminary Work / Pilot Project: Exploratory transfers and metadata collection; Quality control; Reassessment of digitization plan

Collection Setup: Gather and assess documentation; Evaluate collection needs / condition; Assess cataloging / descriptive metadata issues; Develop digitization plan; Assess and calibrate equipment

Ingestion into / Copy to Long-Term Storage Solution: Preservation packages

Periodic Evaluation: Data integrity checking; Format obsolescence analysis

Migration: New carrier; New format

Common sense definition of a system:

• Set of interacting units or elements
• Forms an integrated whole
• Performs a function

A few basic principles…

• Each element/part affects the whole
• Whole is greater than sum of parts
• Inputs and outputs
• Equifinality

What should we preserve?

Selection for Preservation

• Analysis of research value
• Evaluation of preservation condition and risk

Data Collection/Analysis → Research Value Score + Condition/Risk (FACET) Score → Combined Selection Score → Collection Ranking → Curatorial Review → Selection for Preservation
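The ranking step in this flow can be illustrated with a short Python sketch. The slides do not say how the two scores are combined, so the weighted sum, the score scales, and the names `Collection` and `rank` are all assumptions for illustration; curatorial review still follows the ranking.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    research_value: float  # hypothetical scale; higher means more valuable
    risk: float            # hypothetical scale; higher means more at risk


def rank(collections, value_weight=0.5):
    """Order collections by a weighted sum of research value and risk.

    The weighted sum is an assumption for this example only; it is not
    the formula used by FACET or by the ATM.
    """
    def combined(collection):
        return (value_weight * collection.research_value
                + (1 - value_weight) * collection.risk)

    return sorted(collections, key=combined, reverse=True)
```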

FACET

• Software tool: point-based, collection level
• Analyzes data on condition of field formats
• Returns a risk assessment score

FACET Package

• Software
• Formats document
• Procedures manual
• FACET worksheets

Where should preservation work be done?

• In-house or outsource?
• Issues: studio space, technical expertise, amount of work, future location of expertise
• Critical listening spaces
• Development of preservation studio

Who should do preservation transfer work?

• Audio engineer
• Importance of analog playback stage
• Audio examples

Who and Where Best Practices

• Use audio engineers in the workflow where their skill is required
• Critical listening environment
• Use cleanest, most direct signal path to converter
• Instant comparison from playback machine and post A/D converter
• Test/calibration chain

What is the target preservation format?

• Digital file
• Broadcast Wave Format (BWF or BWAV)

Preservation involves a long-term responsibility to the digital file

What do we look for in a file format?

• Disclosure
• Adoption
• Transparency
• Self-documentation
• External dependencies
• Impact of patents
• Technical protection mechanisms

http://www.digitalpreservation.gov/formats/sustain/sustain.shtml

Broadcast Wave Format

• Audio file format based on .wav files
• EBU 1996 for the exchange of files
• Non-proprietary
• Recommended by IASA, AES, NARAS, Sound Directions for preservation
• “Chunk” for metadata residing with the file
• Time stamp

Broadcast Wave Format

Metadata elements include:

• Description of the sound sequence
• Name of the originator
• Date/time
• Coding history (signal chain components)
• Format independent, sample accurate time stamp
• “Catastrophic” metadata
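The elements listed above live in the file's "bext" chunk. As a rough sketch of how they might be read, the Python below parses the leading fixed-size fields using the layout published in EBU Tech 3285 (field widths as commented). The helper names `parse_bext` and `find_chunk` are made up for this example, and the version, UMID, and loudness fields are skipped rather than decoded.

```python
import struct

# Fixed-size fields at the start of a "bext" chunk (EBU Tech 3285):
# Description(256) Originator(32) OriginatorReference(32)
# OriginationDate(10) OriginationTime(8) TimeReference (low, high uint32)
BEXT_HEAD = struct.Struct("<256s32s32s10s8sII")
BEXT_FIXED_SIZE = 602  # bytes before the variable-length CodingHistory


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def parse_bext(payload):
    """Parse the payload of a bext chunk (without the 8-byte chunk header)."""
    desc, orig, ref, date, time, low, high = BEXT_HEAD.unpack_from(payload)
    return {
        "description": _text(desc),
        "originator": _text(orig),
        "originator_reference": _text(ref),
        "origination_date": _text(date),
        "origination_time": _text(time),
        # Sample count since midnight: the sample-accurate time stamp.
        "time_reference": (high << 32) | low,
        "coding_history": _text(payload[BEXT_FIXED_SIZE:]),
    }


def find_chunk(wav_bytes, chunk_id):
    """Return the payload of the first RIFF chunk with the given 4-byte id."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(wav_bytes):
        cid, size = struct.unpack_from("<4sI", wav_bytes, pos)
        if cid == chunk_id:
            return wav_bytes[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to even length
    return None
```

A caller would read the file's bytes, call `find_chunk(data, b"bext")`, and pass the result to `parse_bext` when the chunk is present.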

How do we define the files we create?

• What is in them?
• How are they created?
• What do they represent?

Preservation (Archival) Master Files

Best Practice Documents

• Unmodified
• No subjective alterations or improvements
• Preserve history, not re-write it
• As true to the original source as possible

Preservation (Archival) Master Files

• Complete, unaltered stream from playback machine
• Carrier of raw material from transfer
• No editing, signal processing, data reduction, gain manipulation, announcements (slates)
• 24 bit, 96 kHz
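To give a sense of what 24 bit, 96 kHz means for storage planning, the arithmetic can be sketched in Python. The function name `pcm_bytes` and the stereo default are assumptions for the example, and file headers and metadata chunks are not counted.

```python
def pcm_bytes(seconds, sample_rate=96_000, bit_depth=24, channels=2):
    """Size in bytes of uncompressed PCM audio data (headers excluded)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


one_hour = pcm_bytes(3600)
print(one_hour)          # 2073600000
print(one_hour / 10**9)  # 2.0736
```

Under these assumptions, one hour of stereo audio takes roughly 2 GB before any derivatives or redundant copies are made.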

Preservation (Archival) Master Files

Best Practices

• Define purpose of every digital file
• Written guidelines on characteristics of files
• Written guidelines on “technical” and content edits
• Maintain common reference timeline

Data Integrity

Data integrity checking “Checksums” MD5 hash or algorithm

A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav

Data Integrity

• All files with enduring value
• As soon as possible
• Critical metadata stored in database and in preservation package
• Verify before trusting

A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
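The line above follows the layout written by md5sum-style tools: a 32-character hexadecimal digest, a space, a `*` marking binary mode, then the filename. A minimal Python sketch of "verify before trusting" could parse such a line and check data against it; the function names `parse_checksum_line` and `matches` are assumptions for this example.

```python
import hashlib


def parse_checksum_line(line):
    """Split a line like 'DIGEST *name.wav' into (digest, filename).

    The '*' before the filename is the binary-mode marker used by
    md5sum-style tools; a plain space marks text mode.
    """
    digest, _, rest = line.strip().partition(" ")
    filename = rest.lstrip("* ")
    if len(digest) != 32 or not filename:
        raise ValueError(f"not a checksum line: {line!r}")
    int(digest, 16)  # raises ValueError if the digest is not hexadecimal
    return digest.upper(), filename


def matches(data, expected_digest):
    """Return True if the MD5 of data (bytes) equals expected_digest."""
    return hashlib.md5(data).hexdigest().upper() == expected_digest.upper()
```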

How do we make the preserved content understandable and manageable?

• Descriptive Metadata
• Administrative: Technical Metadata
• Administrative: Digital Provenance
• Administrative: Rights Management
• Structural Metadata

Audio Technical Metadata Collector (ATMC)

• Enter/edit technical and structural metadata
• Audio object and process history metadata
• Enter/edit audio object evaluations
• Parse files to collect metadata

Quality Control and Assurance

• Quality control vs. quality assurance
• QC at ATM: aural, visual, software tools
• Collection setup: preliminary transfers
• Role of permanent staff
• QA at ATM

How do we store the data immediately after capture?

• Local, interim storage
• Backup copies at each stage
• ATM NAS
• Additional redundant copy

Director: Project Development; Selection for Preservation

Archivist: Selection; Preview Collections; QC; Documentation

Librarian: Cataloging Issues

Associate Director: Project Management; Selection (Format Issues); Scheduling; Coordination; QC; R&D

Audio Engineer: Preservation Transfer; Preservation Master Files; Technical MD Collection; Checksums; BWAV MD; ADLs; Signal Processing

Project Assistant: Content Division; Production Masters; QC; ADLs; Workflow Management; Collection Setup; Ingestion Process

Programmer: Software/Script Development

Digital Library Program: Preservation Repository Services; Deliverables; Access System

The Role of Metadata in Digital Preservation

What is metadata?

• “The stuff we need to know in order to discover and manage data over the long term”

• Here’s a better definition:

“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”

NISO. “Understanding Metadata.” 2004. <http://www.niso.org/standards/resources/UnderstandingMetadata.pdf>

Metadata standards

• Standards define mutually agreed-upon:
  – Definitions of key terms
  – “Fields” of data to record
  – Rules for structuring data in these fields
• In this area, generally expressed in XML
• Allow us to benefit from community experience
• Promote preservation by providing for more predictable data

Evaluating metadata standards

• Good fit for the type of material I have?
• Supports my access/management/preservation needs?
• Are there existing tools to help me create it?
• Has it been used before in similar situations?
• Who maintains it?
• How quickly are the standards in this environment changing?

Creating metadata

• Generally not done by humans encoding data directly in the storage format

• Instead:
  – Humans use tools designed for specific purposes
  – Derived computationally from the digital resource itself

Technical metadata

• Tracks properties of a digital file necessary for its rendering and processing

• Can also include data about the circumstances of creation of a digital file

• Often format- or media-specific
• Much can be generated automatically from digital file
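As a small illustration of generating technical metadata from the file itself, the sketch below uses Python's standard `wave` module to read basic properties from a PCM WAV header. The function name `technical_metadata` and the dictionary keys are assumptions for the example; this is not the ATMC, and the fields a real schema requires go well beyond these.

```python
import wave


def technical_metadata(path):
    """Read basic technical metadata from a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_seconds": frames / rate,
        }
```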

Digital provenance metadata

• Tracks the history of a set of related digital files
  – Can include the methodology by which the “master” file was created from an analog source (overlap with technical metadata)
  – What transformative processes have been applied to the file
  – Relationship of “derivative” files to the “master”

Structural metadata

• Documents relationships within and between digital files
  – Locating the same intellectual content on multiple representations
  – Noting points of interest within a single resource
  – Grouping and sequencing multiple files that make up a logical whole

Rights metadata

• Covers legal, moral/ethical, financial rights over resources
  – Rights holders
  – Copyright status
  – Conditions on access
  – Usage fees/royalty payments

• Can be in human- or machine-readable format

Descriptive metadata

• Like “cataloging”
• Allows users and collection managers to find and identify resources of interest
• Factual information such as creator, date created, running time (overlap with technical metadata)
• Constructed information such as title
• Subjective information such as topic, genre

Preservation metadata

• Some overlap with technical and process history metadata

• Catch-all for all the metadata we need to support the preservation process that’s not recorded elsewhere

• Most important feature: tracking events that occur during the preservation process
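Event tracking of this kind can be sketched as an append-only log. The Python below is only an illustration: the class and field names are hypothetical and are not taken from any metadata schema, and a production system would persist events rather than hold them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationEvent:
    object_id: str   # identifier of the file the event applies to
    event_type: str  # e.g. "fixity check", "migration", "ingestion"
    outcome: str     # e.g. "success" or "failure"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Append-only list of events, queryable per object."""

    def __init__(self):
        self._events = []

    def record(self, object_id, event_type, outcome):
        event = PreservationEvent(object_id, event_type, outcome)
        self._events.append(event)
        return event

    def history(self, object_id):
        """Return all events recorded for one object, oldest first."""
        return [e for e in self._events if e.object_id == object_id]
```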

Preservation Packages

Types of preservation packages

• According to OAIS:
  – Submission information package (SIP)
  – Archival information package (AIP)
  – Dissemination information package (DIP)

• The AIP is what is stored (potentially broken up into pieces) in the IU repository

• Metadata Encoding and Transmission Standard (METS) used to wrap various pieces together
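To show roughly what wrapping pieces together with METS looks like, here is a minimal Python sketch that builds a METS document with a file section and a structural map. The element and attribute names come from the METS schema; the function name `build_mets`, the `USE` and `TYPE` values, and the `FILE0001`-style identifiers are assumptions, and this is not the package profile used in the IU repository.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(label, filenames):
    """Build a minimal METS document listing files in playback order."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"LABEL": label})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(
        file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "preservation master"}
    )
    struct_map = ET.SubElement(
        mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"}
    )
    top = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"LABEL": label})
    for order, name in enumerate(filenames, start=1):
        file_id = f"FILE{order:04d}"
        # Declare the file once in the file section...
        entry = ET.SubElement(group, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            entry,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        # ...then point to it from its place in the structural map.
        part = ET.SubElement(top, f"{{{METS_NS}}}div", {"ORDER": str(order)})
        ET.SubElement(part, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_mets("Example collection", ["side_a.wav", "side_b.wav"]), encoding="unicode")` yields the serialized XML; the label and filenames here are placeholders.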

Information representation

• Repository needs two simultaneous views of the content it manages
  – Physical files
  – Functions the repository needs to support

Technical metadata

Audio Engineering Society, Core Audio Schema Draft. AES X098-B/SC-03-06.

Also record for analog source object!

Digital provenance metadata

Audio Engineering Society, Audio Processing History Draft. AES X098-C/SC-03-06.

Structural metadata (1)

Audio Engineering Society, Audio Decision List. AES 31-3 and Metadata Encoding and Transmission Standard (METS), <structMap> section

Structural metadata (2)

Audio Engineering Society, Audio Decision List. AES 31-3 and Metadata Encoding and Transmission Standard (METS), <structMap> section

Rights metadata

• For field audio collections, the ATM knows:
  – Collector
  – Terms of deposit governing access
• This area is still under development for the IU repository

• No decision yet on metadata format; need more thorough analysis of the functions this metadata needs to support

Descriptive metadata (1)

MARCXML

Descriptive metadata (2)

METS reference to external Word document

Preservation metadata

• Still under investigation for IU repository, for all formats of material

• Will need to implement before any preservation events occur

• Will likely use PReservation Metadata Implementation Strategies (PREMIS) data dictionaries and schema

Need to share

• Copies in multiple repositories can help ensure preservation

• Sound Directions did a test exchange of content between IU and Harvard
  – Different repository architectures
  – Different preservation package structures

• ...demonstrated how different levels of preservation are possible

Two Repositories Supported by the Digital Library Program

• IUScholarWorks Repository
  – “Institutional Repository”
    • For preserving and providing access to IU’s research output: articles, papers, etc.
  – Based on DSpace software
• IU Digital Library Repository
  – General-purpose digital content repository
  – Based on Fedora software

Fedora

• Flexible Extensible Digital Object Repository Architecture

• Open source digital repository software developed by Cornell and the University of Virginia

• Supported by new organization: Fedora Commons

• Basis for IU Digital Library Repository

Moving Content to a Digital Repository – Idealized Workflow

ATMC/Audio Workstation → Upload preservation package → Temporary Server Disk Storage → Validate and ingest → Fedora Repository

• Master audio files in MDSS
• Delivery audio files on streaming server
• Metadata records on disk

IU Massive Data Storage System (MDSS)

• Hierarchical storage management
  – Some storage on hard disks
  – Much more storage on automated tape
• Managed by UITS Research Technologies
• Servers in Bloomington and Indianapolis connected via I-Light high-speed fiber link
• Total capacity: 2+ petabytes
• Need to build Fedora-MDSS connection

Repository Status

• Fedora is running in production
  – Supporting access to image and text collections
  – Experiments with loading audio and video

• Need to improve tools for ingest and retrieval to support audio projects

• Not yet a true preservation repository

Toward a Preservation Repository

• Need to add:
  – File integrity validation
  – Integration with MDSS – replication of data
  – Eventually, file format obsolescence monitoring and migration
• Self-audit and/or external certification as Trusted Digital Repository
  – DRAMBORA, TRAC

Access Systems

• Variations2
  – variations2.indiana.edu
  – Provides access to cataloged commercial recordings from the Music Library and ATM

• Need access system to provide discovery and delivery of field collections and other types of archival audio collections

Questions?


• www.dlib.indiana.edu/projects/sounddirections/