Building an Audio Preservation System at
Indiana University Using Standards
and Best Practices
Mike Casey, Archives of Traditional Music
Jon Dunn, Digital Library Program
Jenn Riley, Digital Library Program
April 14, 2008
Audio/Video at IUB
• AAAMC
• Music Library
• University Archives
• ATM
• HPER
• Radio/TV
• Center for the Study of History and Memory
• Kinsey Institute
• Athletics
• Emeriti House
• Office of University Marketing
• Wells Library
• AISRI
• Office of Dean of the Faculties
• Lilly Library
• Alumni Relations
• School of Journalism
• School of Music
• School of Law
• Traditional Arts Indiana
• Department of History
• Department of Anthropology
• Department of Folklore/Ethnomusicology
• Black Film Center/Archive
Plus many more!
By the Numbers
• ATM: 110,000 (mostly) audio recordings
• Wells Library, Kent Cooper Room: 20,000 videos
• Music Library: 137,000 audio recordings
*2,000 lacquer discs
*8,000 DATs
*50,000 open reel tapes
• CSHM: 3,200 audio recordings
• AAAMC: 5,900 audio recordings
Obsolescence
• Audio formats
• Equipment (playback machines, test devices)
• Repair parts
• Playback expertise
• Repair expertise
• Tools
• Supplies
Preservation in the Analog Domain
• Life expectancy critically important
• Predicting when a recording will fail
• Quest for the eternal carrier
• Target preservation format: mastering-quality open reel tape
• Standards set in the mid-1980s: ARSC/AAA
The New Paradigm
• Eternal sound carriers never available
• Maintaining equipment long-term unmanageable
Therefore, classical preservation strategy is hopeless
The New Paradigm
• Preserve the content, not the carrier
• The eternal file, not the eternal carrier
• Use digital mass storage systems
Longevity of carriers in mass storage systems of minor importance
Standards and Best Practices
• Ensure Quality
• Provide Philosophical/Ethical Foundation
• Encourage Sustainability
• Foster Interoperability
• Provide a Migration Path
Preserving Digital Information
• Advantage: Digital information may be copied without degradation
• Disadvantage: Digital information requires active management in order to remain accessible
Risks of Digital Information: Bit Loss
• Degradation of physical media
– Optical, magnetic
• Damage or theft of physical media
• Media obsolescence
– Ability to read physical media
– Ability to read logical media format
Risks of Digital Information: Semantic Loss
• Even if the bits are intact, can a file still be understood?
• File format obsolescence
• Loss of context
– Insufficient metadata
Risks of Digital Information: Integrity
• How do we know whether or not information has been altered, whether intentionally or unintentionally?
Methods of Mitigating Risks
• Migration
– Migration of data to new physical media
– Migration of data to new file formats
• Replication
– Multiple copies of data in multiple locations
• Validation
– Retain checksums for files, routinely retrieve files and compare against checksums
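The validation step above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in these slides: the function names `md5_of_file` and `verify_files` are hypothetical, and MD5 is used only because it is the algorithm named later in this presentation.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the uppercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in blocks so large audio files never sit fully in memory.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest().upper()


def verify_files(expected):
    """Compare retained checksums against freshly computed ones.

    `expected` maps a file path to the checksum stored when the file
    was created. Returns a dict mapping each path to "ok", "mismatch",
    or "missing".
    """
    report = {}
    for path, stored in expected.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif md5_of_file(path) == stored.upper():
            report[path] = "ok"
        else:
            report[path] = "mismatch"
    return report
```

Run on a schedule against every stored copy, a report like this is what turns "retain checksums" into routine validation.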
Scaling Digital Preservation
• Migration, replication, and validation require:
– Automated processes
– Ongoing monitoring, management, and planning
– Ongoing funding for technology refresh
Digital Repositories
• Centrally-managed systems for storage (and delivery) of digital information
• Leverage economies of scale for storage and management costs
• Support preservation integrity functions (migration, replication, validation)
• Much easier to manage than many little pockets of digital information
OAIS: Open Archival Information System
• ISO Standard 14721:2003
• Origins in space science community
• Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term
• Basis for much work on digital preservation within the library and archive community
Preservation Packages in OAIS
• Preservation package
– Digital content plus metadata
• SIP: Submission Information Package
• AIP: Archival Information Package
• DIP: Dissemination Information Package
From OAIS to Trusted Digital Repositories
• 2002 OCLC-RLG task force report:
– Trusted Digital Repositories: Attributes and Responsibilities
• What are the attributes of a trusted repository?
– OAIS compliance
– Administrative responsibility
– Organizational viability
– Financial sustainability
– System security
– Procedural accountability
Trusted Digital Repositories: Auditing and Certification
• Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
– http://www.repositoryaudit.edu/
• Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist
– OCLC/NARA/CRL report
– http://www.crl.edu/PDF/trac.pdf
Archives of Traditional Music
• Established 1948
• 110,000 recordings
• 1890s to present
• Field: 30%
• World music traditions
• Endangered/extinct world languages
Sound Directions: Digital Preservation and Access for Global Audio Heritage
• Collaboration between Harvard University and Indiana University
• Phase 1: an R&D project funded by NEH
• Focus on preservation
Sound Directions: Digital Preservation and Access for Global Audio Heritage
Project Partners
• Archives of Traditional Music, Indiana University
• Archive of World Music, Harvard University
• Harvard College Library Audio Preservation Services
• Digital Library Program, Indiana University
• Office for Information Systems, Harvard University
Sound Directions: Digital Preservation and Access for Global Audio Heritage
Objectives
• Research best practices in areas without standards or best practices
• Develop best practices to meet existing and emerging standards
• Test existing and emerging standards/best practices with a real world project
Sound Directions: Digital Preservation and Access for Global Audio Heritage
Results
• Publication: Sound Directions: Best Practices for Audio Preservation
• Development of audio preservation system
• Software tools
• Preservation of field collections
Sound Directions: Digital Preservation and Access for Global Audio Heritage
Project Future
• “Preservation” Phase funded by NEH
• Increase throughput
• Simultaneous transfer
• Indiana automation
• Release ATMC
• Develop new access system for field collections
Migration decision
Workflow management
Workflow management / scheduling
Cleaning or physical restoration as needed
System / Project: Planning & Development; Funding; Personnel / Vendor; Equipment
Software Tools: Creation / maintenance of software and scripts
Selection for Preservation: Assess research value; Evaluate condition; Consider political, technical, and other issues; Establish priorities
Digitization: Analog playback; A/D conversion; Creation of Preservation Master Files; Local filenames
Digitization: Technical metadata; Structural metadata; Checksums; Quality control; Local storage solution
Post-Transfer Processing: Quality control; Generation of derivatives; Marking areas of interest in files; Signal processing (if appropriate)
Preliminary Work / Pilot Project: Exploratory transfers and metadata collection; Quality control; Reassessment of digitization plan
Collection Setup: Gather and assess documentation; Evaluate collection needs / condition; Assess cataloging / descriptive metadata issues; Develop digitization plan; Assess and calibrate equipment
Ingestion into / Copy to Long-Term Storage Solution: Preservation packages
Periodic Evaluation: Data integrity checking; Format obsolescence analysis
Migration: New carrier; New format
Common sense definition of a system:
• Set of interacting units or elements
• Forms an integrated whole
• Performs a function
A few basic principles…
• Each element/part affects the whole
• Whole is greater than sum of parts
• Inputs and outputs
• Equifinality
What should we preserve?
Selection for Preservation
• Analysis of research value
• Evaluation of preservation condition and risk
Data Collection/Analysis
Research Value Score
Condition/Risk (FACET) Score
Combined Selection Score
Collection Ranking
Curatorial Review
Selection for Preservation
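One way to picture the scoring-and-ranking steps in this diagram is the short Python sketch below. Everything specific in it is an assumption made for illustration: the slides do not say how the research value and condition/risk scores are combined, so the simple sum, the score scales, and the names `Collection` and `rank_collections` are all hypothetical. Curatorial review still follows the ranking; the code only orders the candidates.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    research_value: float  # hypothetical scale, assigned by curators
    condition_risk: float  # hypothetical scale, e.g. a FACET-style risk score

    @property
    def combined(self):
        # Assumption: the two scores are simply added. The actual
        # combination rule is not stated in the slides.
        return self.research_value + self.condition_risk


def rank_collections(collections):
    """Return collections ordered from highest to lowest combined score."""
    return sorted(collections, key=lambda c: c.combined, reverse=True)
```

The ranked list is an input to curatorial review, not a replacement for it.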
FACET
• Software tool: point-based, collection level
• Analyzes data on condition of field formats
• Returns a risk assessment score
Where should preservation work be done?
• In-house or outsource?
• Issues: studio space, technical expertise, amount of work, future location of expertise
• Critical listening spaces
• Development of preservation studio
Who should do preservation transfer work?
• Audio engineer
• Importance of analog playback stage
• Audio examples
Who and Where Best Practices
• Use audio engineers in the workflow where their skill is required
• Critical listening environment
• Use cleanest, most direct signal path to converter
• Instant comparison from playback machine and post A/D converter
• Test/calibration chain
What is the target preservation format?
• Digital file
• Broadcast Wave Format (BWF or BWAV)
Preservation involves a long-term responsibility to the digital file
What do we look for in a file format?
• Disclosure
• Adoption
• Transparency
• Self-documentation
• External dependencies
• Impact of patents
• Technical protection mechanisms
http://www.digitalpreservation.gov/formats/sustain/sustain.shtml
Broadcast Wave Format
• Audio file format based on .wav files
• EBU 1996 for the exchange of files
• Non-proprietary
• Recommended by IASA, AES, NARAS, Sound Directions for preservation
• “Chunk” for metadata residing with the file
• Time stamp
Broadcast Wave Format
Metadata elements include:
• Description of the sound sequence
• Name of the originator
• Date/time
• Coding history (signal chain components)
• Format independent, sample accurate time stamp
• “Catastrophic” metadata
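The metadata elements listed above live in the file's "bext" chunk. The sketch below shows one way to read them with only the Python standard library. It assumes the chunk layout published in the EBU's BWF specification (a 602-byte fixed part followed by free-text coding history); the function name `read_bext` and the dictionary keys are hypothetical, and the code is an illustration rather than the tooling used in the project.

```python
import struct

# Start of the fixed part of the bext chunk: Description[256], Originator[32],
# OriginatorReference[32], OriginationDate[10], OriginationTime[8],
# TimeReference (low and high 32-bit words), Version (16-bit).
BEXT_FIXED = struct.Struct("<256s32s32s10s8sIIH")
BEXT_FIXED_SIZE = 602  # the fields above plus UMID, loudness, and reserved bytes


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_bext(path):
    """Return the bext metadata of a WAV file as a dict, or None if absent."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                return None  # reached end of file without a bext chunk
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"bext":
                data = f.read(chunk_size)
                break
            # Skip this chunk; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)
    if len(data) < BEXT_FIXED.size:
        raise ValueError("bext chunk is too short")
    desc, orig, ref, date, time, low, high, version = BEXT_FIXED.unpack_from(data)
    return {
        "description": _text(desc),
        "originator": _text(orig),
        "originator_reference": _text(ref),
        "origination_date": _text(date),
        "origination_time": _text(time),
        "time_reference": (high << 32) | low,  # sample count since midnight
        "version": version,
        "coding_history": _text(data[BEXT_FIXED_SIZE:]),
    }
```

Because the chunk travels inside the file, a reader like this can recover basic identifying information even when the external metadata records are lost.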
How do we define the files we create?
• What is in them?
• How are they created?
• What do they represent?
Preservation (Archival) Master Files
Best Practice Documents
• Unmodified
• No subjective alterations or improvements
• Preserve history, not re-write it
• As true to the original source as possible
Preservation (Archival) Master Files
• Complete, unaltered stream from playback machine
• Carrier of raw material from transfer
• No editing, signal processing, data reduction, gain manipulation, announcements (slates)
• 24 bit, 96 kHz
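The 24 bit, 96 kHz target has direct storage consequences, which a little arithmetic makes concrete. The helper below is a sketch: the name `pcm_bytes` is hypothetical, and the two-channel default is an assumption, since the slides do not state a channel count.

```python
def pcm_bytes(seconds, sample_rate=96_000, bit_depth=24, channels=2):
    """Size in bytes of uncompressed linear PCM audio data (headers excluded)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


one_hour = pcm_bytes(3600)
print(f"{one_hour:,} bytes, about {one_hour / 1e9:.2f} GB per hour of stereo audio")
```

At roughly 2 GB per hour of stereo audio under these assumptions, a collection of tens of thousands of recordings quickly reaches the scale where mass storage systems become necessary.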
Preservation (Archival) Master Files
Best Practices
• Define purpose of every digital file
• Written guidelines on characteristics of files
• Written guidelines on “technical” and content edits
• Maintain common reference timeline
Data Integrity
• Data integrity checking
• “Checksums”
• MD5 hash or algorithm
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
Data Integrity
• All files with enduring value
• As soon as possible
• Critical metadata stored in database and in preservation package
• Verify before trusting
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
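The line above has the shape of a checksum manifest entry: a 32-character hexadecimal digest, a space, an asterisk, and the filename. A small parser makes it easy to load such entries into a database or compare them during verification. This is a sketch assuming that layout; the function name `parse_checksum_line` is hypothetical.

```python
import string


def parse_checksum_line(line):
    """Split a 'DIGEST *filename' manifest line into (digest, filename)."""
    digest, _, rest = line.strip().partition(" ")
    if len(digest) != 32 or any(c not in string.hexdigits for c in digest):
        raise ValueError(f"not an MD5 digest: {digest!r}")
    name = rest.lstrip(" ")
    if name.startswith("*"):
        name = name[1:]  # drop the single marker that precedes the filename
    if not name:
        raise ValueError("missing filename")
    return digest.upper(), name
```

Parsing the stored entry and recomputing the digest from the file itself is the practical form of "verify before trusting".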
How do we make the preserved content understandable and manageable?
• Descriptive Metadata
• Administrative: Technical Metadata
• Administrative: Digital Provenance
• Administrative: Rights Management
• Structural Metadata
Audio Technical Metadata Collector (ATMC)
• Enter/edit technical and structural metadata
• Audio object and process history metadata
• Enter/edit audio object evaluations
• Parse files to collect metadata
Quality Control and Assurance
• Quality control vs. quality assurance
• QC at ATM: aural, visual, software tools
• Collection setup: preliminary transfers
• Role of permanent staff
• QA at ATM
How do we store the data immediately after capture?
• Local, interim storage
• Backup copies at each stage
• ATM NAS
• Additional redundant copy
Director
Project Development; Selection for Preservation
Archivist
Selection; Preview Collections; QC Documentation
Librarian
Cataloging Issues
Associate Director
Project Management; Selection (Format Issues); Scheduling Coordination
QC; R&D
Audio Engineer
Preservation Transfer; Preservation Master Files; Technical MD Collection
Checksums; BWAV MD
ADLs; Signal Processing
Project Assistant
Content Division; Production Masters
QC; ADLs
Workflow Management; Collection Setup; Ingestion Process
Programmer
Software/Script Development
Digital Library Program
Preservation Repository Services; Deliverables
Access System
What is metadata?
• “The stuff we need to know in order to discover and manage data over the long term”
• Here’s a better definition:
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”
NISO. “Understanding Metadata.” 2004. <http://www.niso.org/standards/resources/UnderstandingMetadata.pdf>
Metadata standards
• Standards define mutually agreed-upon:
– Definitions of key terms
– “Fields” of data to record
– Rules for structuring data in these fields
• In this area, generally expressed in XML
• Allow us to benefit from community experience
• Promote preservation by providing for more predictable data
Evaluating metadata standards
• Good fit for the type of material I have?
• Supports my access/management/preservation needs?
• Are there existing tools to help me create it?
• Has it been used before in similar situations?
• Who maintains it?
• How quickly are the standards in this environment changing?
Creating metadata
• Generally not done by humans encoding data directly in the storage format
• Instead:
– Humans use tools designed for specific purposes
– Metadata is derived computationally from the digital resource itself
Technical metadata
• Tracks properties of a digital file necessary for its rendering and processing
• Can also include data about the circumstances of creation of a digital file
• Often format- or media-specific
• Much can be generated automatically from digital file
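As an illustration of the last point, the basic rendering properties of an uncompressed WAV file can be read straight from its header with Python's standard `wave` module. This is a sketch only: the function name `extract_technical_metadata` and the dictionary keys are hypothetical and do not follow any particular metadata schema, and the `wave` module handles plain PCM files rather than every WAV variant.

```python
import wave


def extract_technical_metadata(path):
    """Read basic technical properties from the header of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate_hz": rate,
            "frame_count": frames,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Values gathered this way can be checked against the intended target (for example 24 bit, 96 kHz) as part of quality control, leaving human effort for the properties a file cannot describe about itself.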
Digital provenance metadata
• Tracks the history of a set of related digital files
– Can include the methodology by which the “master” file was created from an analog source (overlap with technical metadata)
– What transformative processes have been applied to the file
– Relationship of “derivative” files to the “master”
Structural metadata
• Documents relationships within and between digital files
– Locating the same intellectual content on multiple representations
– Noting points of interest within a single resource
– Grouping and sequencing multiple files that make up a logical whole
Rights metadata
• Covers legal, moral/ethical, financial rights over resources
– Rights holders
– Copyright status
– Conditions on access
– Usage fees/royalty payments
• Can be in human- or machine-readable format
Descriptive metadata
• Like “cataloging”
• Allows users and collection managers to find and identify resources of interest
• Factual information such as creator, date created, running time (overlap with technical metadata)
• Constructed information such as title
• Subjective information such as topic, genre
Preservation metadata
• Some overlap with technical and process history metadata
• Catch-all for all the metadata we need to support the preservation process that’s not recorded elsewhere
• Most important feature: tracking events that occur during the preservation process
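Event tracking can be pictured as an append-only log keyed by object identifier. The sketch below is deliberately generic: the class names `PreservationEvent` and `EventLog` and all field names are hypothetical, chosen for illustration, and are not taken from any metadata standard or from the IU repository.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationEvent:
    object_id: str    # identifier of the file or package acted on
    event_type: str   # e.g. "fixity check", "migration", "ingest"
    outcome: str      # e.g. "success", "failure"
    detail: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Append-only record of what happened to each preserved object."""

    def __init__(self):
        self._events = []

    def record(self, object_id, event_type, outcome, detail=""):
        event = PreservationEvent(object_id, event_type, outcome, detail)
        self._events.append(event)
        return event

    def history(self, object_id):
        """Return the events for one object in the order they occurred."""
        return [e for e in self._events if e.object_id == object_id]
```

Events are immutable once recorded, so the history of an object can be read back later without worrying that earlier entries were edited.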
Types of preservation packages
• According to OAIS:
– Submission information package (SIP)
– Archival information package (AIP)
– Dissemination information package (DIP)
• The AIP is what is stored (potentially broken up into pieces) in the IU repository
• Metadata Encoding and Transmission Standard (METS) used to wrap various pieces together
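To give a feel for how METS wraps pieces together, the sketch below builds a very small METS document with Python's standard XML library: a file section listing each file with its checksum, and a structural map pointing at those files. It is an assumption-laden illustration, not the package structure used at IU: the function name `build_mets`, the input dictionary keys, the MIME type, and the choice of one file group per file are all hypothetical, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _tag(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(object_id, files):
    """Return a minimal METS document as a string.

    `files` is a list of dicts with hypothetical keys:
    'id', 'href', 'md5', and 'use' (e.g. "preservation master").
    """
    mets = ET.Element(_tag("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(mets, _tag("fileSec"))
    struct_map = ET.SubElement(mets, _tag("structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(struct_map, _tag("div"), {"LABEL": object_id})
    for item in files:
        group = ET.SubElement(file_sec, _tag("fileGrp"), {"USE": item["use"]})
        file_el = ET.SubElement(group, _tag("file"), {
            "ID": item["id"],
            "MIMETYPE": "audio/x-wav",  # assumption for this example
            "CHECKSUM": item["md5"],
            "CHECKSUMTYPE": "MD5",
        })
        ET.SubElement(file_el, _tag("FLocat"), {
            "LOCTYPE": "URL",
            f"{{{XLINK_NS}}}href": item["href"],
        })
        # The structural map refers back to the file by its ID.
        ET.SubElement(root_div, _tag("fptr"), {"FILEID": item["id"]})
    return ET.tostring(mets, encoding="unicode")
```

A real package would also carry the descriptive, technical, and provenance metadata sections discussed above; this example shows only the wrapping mechanism.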
Information representation
• Repository needs two simultaneous views of the content it manages
– Physical files
– Functions the repository needs to support
Technical metadata
Audio Engineering Society, Core Audio Schema Draft. AES X098-B/SC-03-06.
Also record for analog source object!
Digital provenance metadata
Audio Engineering Society, Audio Processing History Draft. AES X098-C/SC-03-06.
Structural metadata (1)
Audio Engineering Society, Audio Decision List. AES 31-3
and
Metadata Encoding and Transmission Standard (METS), <structMap> section
Structural metadata (2)
Audio Engineering Society, Audio Decision List. AES 31-3
and
Metadata Encoding and Transmission Standard (METS), <structMap> section
Rights metadata
• For field audio collections, the ATM knows:
– Collector
– Terms of deposit governing access
• This area is still under development for the IU repository
• No decision yet on metadata format; need more thorough analysis of the functions this metadata needs to support
Preservation metadata
• Still under investigation for IU repository, for all formats of material
• Will need to implement before any preservation events occur
• Will likely use PReservation Metadata Implementation Strategies (PREMIS) data dictionaries and schema
Need to share
• Copies in multiple repositories can help ensure preservation
• Sound Directions did a test exchange of content between IU and Harvard
– Different repository architectures
– Different preservation package structures
• ...demonstrated how different levels of preservation are possible
Two Repositories Supported by the Digital Library Program
• IUScholarWorks Repository
– “Institutional Repository”
– For preserving and providing access to IU’s research output: articles, papers, etc.
– Based on DSpace software
• IU Digital Library Repository
– General-purpose digital content repository
– Based on Fedora software
Fedora
• Flexible Extensible Digital Object Repository Architecture
• Open source digital repository software developed by Cornell and the University of Virginia
• Supported by new organization: Fedora Commons
• Basis for IU Digital Library Repository
Moving Content to a Digital Repository – Idealized Workflow
Master audio files in MDSS
Delivery audio files on streaming server
Metadata records on disk
ATMC/Audio Workstation
Upload preservation package
Temporary Server Disk Storage
Fedora Repository
Validate and ingest
IU Massive Data Storage System (MDSS)
• Hierarchical storage management
– Some storage on hard disks
– Much more storage on automated tape
• Managed by UITS Research Technologies
• Servers in Bloomington and Indianapolis connected via I-Light high-speed fiber link
• Total capacity: 2+ petabytes
• Need to build Fedora-MDSS connection
Repository Status
• Fedora is running in production
– Supporting access to image and text collections
– Experiments with loading audio and video
• Need to improve tools for ingest and retrieval to support audio projects
• Not yet a true preservation repository
Toward a Preservation Repository
• Need to add:
– File integrity validation
– Integration with MDSS (replication of data)
– Eventually, file format obsolescence monitoring and migration
• Self-audit and/or external certification as Trusted Digital Repository
– DRAMBORA, TRAC
Access Systems
• Variations2
– variations2.indiana.edu
– Provides access to cataloged commercial recordings from the Music Library and ATM
• Need access system to provide discovery and delivery of field collections and other types of archival audio collections
Questions?
• www.dlib.indiana.edu/projects/sounddirections/