NISO Two-Part Webinar Sustainable Information, Part 2:
Digital Preservation for Audio-Visual Content
Wednesday, December 17, 2014
Speakers:
Andrea Goethals, Manager of Digital Preservation and Repository Services, Harvard University Library
David Ackerman, Head of Media Preservation, Harvard University Library
Brian Campanotti, Chief Technical Officer, Front Porch Digital
Tom Cramer, Chief Technology Strategist & Associate Director, Digital Library Systems & Services, Stanford University Libraries
http://www.niso.org/news/events/2014/webinars/text_preservation/
Planning for Video Preservation Services at Harvard
NISO Webinar, Dec. 17, 2014
David Ackerman & Andrea Goethals, Harvard Library
Agenda
• Preserving video – analysis and decisions (Andrea)
• Reformatting video – workflows and challenges (David)
PRESERVING VIDEO – ANALYSIS & DECISIONS
Andrea Goethals
DRS Format Requests (2004 – Present)
• Video: 23%
• Vector Graphics: 16%
• Office Documents: 14%
• DNG: 6%
• Software: 6%
• Other Still Images: 6%
• Databases: 6%
• 3D Models: 5%
• Ebooks: 5%
• Datasets: 5%
• Web Sites: 3%
• Newspaper: 2%
• GIS: 2%
• Other OCR Text: 1%
Chart last updated 12/8/2014
Building Blocks Across Harvard
• Service providers
– Preservation Services (Digital Preservation, Media Preservation)
– IT (HUIT / Library Technology Services / Digital Video Services)
• Infrastructure
– Digital Repository Service
– Media Preservation’s digitization studio
• Users, collectors, creators
– Harvard repositories and schools
– HarvardX, DCE and other RTL video generators
– Current & future researchers, teachers, learners
Building Blocks – Beyond
• Kaltura (video management and delivery)
• MediaSite (lecture capture)
• 3Play Video (video captioning)
• AVPreserve
Format Analysis: Criteria
• Availability online
• Backward/Forward Compatibility
• Community/3rd Party Support
• Complexity
• Compression
• Cost
• Developer/Corporate Support
• Domain Specificity
• Ease of Identification
• Ease of Validation
• Error-tolerance
• Expertise Available
• Geographic Spread
• Institutional Policies
• Legal Restrictions
• Lifetime
• Metadata Support
• Rendering Software Available
• Revision Rate
• Specifications Available
• Specification Quality
• Standardization
• Storage Space
• Technical Dependencies
• Technical Protection Mechanism
• Ubiquity
• Value
• Viruses
Format Analysis: Criteria (general criterion: refined for video)
• Availability online: Browser Support
• Backward/Forward Compatibility
• Community/3rd Party Support
• Complexity: Level of Format Complexity
• Compression: Degree to which Compression is Understood
• Cost: Cost to Maintain Environment for Access and Processing
• Developer/Corporate Support
• Domain Specificity
• Ease of Identification
• Ease of Validation: Accurate Validation
• Error-tolerance
• Expertise Available
• Geographic Spread
• Institutional Policies
• Legal Restrictions: Legal Restrictions Affecting Use Now or Long-Term
• Lifetime
• Metadata Support: Descriptive Metadata Support; Technical Metadata Support
• Rendering Software Available: Quantity and Availability of Rendering Software
• Revision Rate
• Specifications Available
• Specification Quality: Degree to Which Specification is Complete and Understandable
• Standardization: Standardized
• Storage Space: Storage Requirements Relative to Other Similar Formats
• Technical Dependencies: Dependence on Particular HW/SW
• Technical Protection Mechanism: Support for Technical Protection Mechanisms
• Ubiquity: Widespread Use by Consumers; Widespread Use by Professionals
• Value
• Viruses: Malware
• Dependency on a Single Organization or Company
• Archival Use
• Ability to Encode in True Lossless Compression
• Ability to Encode in Visually Lossless Compression
• Max Chroma Subsampling
• Max Resolution
• Highest Bit Resolution
• Highest Supported Bitrate
• Compression Ratio
Legend: High importance, Medium importance, Low importance
Format Analysis: Criteria
• Cost to Maintain Environment for Access and Processing
• Expertise Available
• Legal Restrictions Affecting Use Now or Long-Term
• Quantity and Availability of Rendering Software
• Specifications Available
• Dependence on Particular HW/SW
• Widespread Use by Consumers
• Widespread Use by Professionals
• Dependency on a Single Organization or Company
• Ability to Encode in True Lossless Compression
• Ability to Encode in Visually Lossless Compression
• Max Chroma Subsampling
• Max Resolution
• Highest Bit Resolution
• Highest Supported Bitrate
• Compression Ratio
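One way to apply importance-ranked criteria is a weighted score per candidate format. The sketch below is illustrative only: the 3/2/1 weights, the 0 to 5 rating scale, the importance level given to each criterion, and the sample ratings are all assumptions, not values from the Harvard analysis.

```python
# Hypothetical weights for the three importance levels (not from the slides).
IMPORTANCE_WEIGHT = {"high": 3, "medium": 2, "low": 1}


def weighted_score(ratings, importance):
    """Return the importance-weighted average of per-criterion ratings.

    ratings:    {criterion name: rating from 0 (poor) to 5 (excellent)}
    importance: {criterion name: "high" | "medium" | "low"}
    """
    total = 0.0
    weight_sum = 0
    for criterion, rating in ratings.items():
        weight = IMPORTANCE_WEIGHT[importance[criterion]]
        total += weight * rating
        weight_sum += weight
    # An empty rating set scores 0.0 rather than dividing by zero.
    return total / weight_sum if weight_sum else 0.0


# Made-up importance levels and ratings for a hypothetical candidate format.
importance = {
    "Specifications Available": "high",
    "Expertise Available": "medium",
    "Compression Ratio": "low",
}
candidate_a = {
    "Specifications Available": 5,
    "Expertise Available": 3,
    "Compression Ratio": 1,
}

score_a = weighted_score(candidate_a, importance)
print(round(score_a, 2))
```

Scoring every candidate with the same `importance` table keeps the comparison consistent; changing a weight changes all scores at once.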
Preferred Formats
• Archival formats
– Uncompressed in QT, 8 or 10 bit
– JPEG 2000 in MXF or QT, recommend lossless
– DV in QT (only if from DV tape), many variations
– MPEG-2 in MPEG-2 or QT
• Delivery formats
– H.264 in QT, many profiles
Accepted Formats
• Archival formats
– DNxHD in MXF or QT
– ProRes in QT
• Delivery formats
– Theora in QT or Matroska
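The preferred and accepted lists can be treated as a lookup from codec and wrapper to a policy level. The sketch below transcribes the pairs from the two lists; the dictionary layout, the function name `classify`, and the label strings are illustrative assumptions, not part of the DRS.

```python
# (codec, wrapper) -> (policy level, use), transcribed from the two format lists.
# The structure and names here are hypothetical, chosen only for illustration.
FORMAT_POLICY = {
    ("Uncompressed", "QT"): ("preferred", "archival"),
    ("JPEG 2000", "MXF"): ("preferred", "archival"),
    ("JPEG 2000", "QT"): ("preferred", "archival"),
    ("DV", "QT"): ("preferred", "archival"),      # only if sourced from DV tape
    ("MPEG-2", "MPEG-2"): ("preferred", "archival"),
    ("MPEG-2", "QT"): ("preferred", "archival"),
    ("H.264", "QT"): ("preferred", "delivery"),
    ("DNxHD", "MXF"): ("accepted", "archival"),
    ("DNxHD", "QT"): ("accepted", "archival"),
    ("ProRes", "QT"): ("accepted", "archival"),
    ("Theora", "QT"): ("accepted", "delivery"),
    ("Theora", "Matroska"): ("accepted", "delivery"),
}


def classify(codec, wrapper):
    """Return (policy level, use) for a codec/wrapper pair, or None if unlisted."""
    return FORMAT_POLICY.get((codec, wrapper))


print(classify("JPEG 2000", "MXF"))
print(classify("H.264", "Matroska"))
```

A pair that returns `None` is simply absent from both lists; what a repository does with such a file is a policy decision outside this sketch.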
Metadata Analysis
• Technical metadata
– Chose EBU Core 1.5 (aligns well with AES-60, structure mirrors MediaInfo’s output)
– Considered PBCore
• Source metadata
– Chose a revised UTVideoSrc (native suitability to physical media, right amount of detail)
– Considered EBU Core
• Process history
– Chose a revised reVTMD (specific, simple, sufficient)
Tool Analysis
• Chose: MediaInfo
– Raw output could map to metadata schemas
– Currently supported
– Widely adopted
• Others considered:
– ExifTool
– FFProbe
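As a rough illustration of how MediaInfo's raw output could feed a metadata mapping, the sketch below filters the video tracks out of a JSON report. It assumes a recent MediaInfo CLI on the PATH that supports `--Output=JSON`, and it assumes the report nests tracks under `media.track` with an `@type` key; the sample report is hand-written and its values are made up. Nothing here reflects how the DRS or FITS actually calls MediaInfo.

```python
import json
import subprocess


def run_mediainfo(path):
    """Run the MediaInfo CLI on a file and return its JSON report as a dict.

    Assumption: a `mediainfo` binary that accepts --Output=JSON is installed.
    """
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def video_tracks(report):
    """Return the tracks whose @type is "Video" from a MediaInfo-style report."""
    tracks = report.get("media", {}).get("track", [])
    return [track for track in tracks if track.get("@type") == "Video"]


# Hand-written sample shaped like a MediaInfo JSON report (values are made up).
sample_report = {
    "media": {
        "@ref": "example.mov",
        "track": [
            {"@type": "General", "Format": "MPEG-4"},
            {"@type": "Video", "Format": "JPEG 2000", "Width": "720", "Height": "486"},
            {"@type": "Audio", "Format": "PCM"},
        ],
    }
}

for track in video_tracks(sample_report):
    print(track["Format"], track["Width"], track["Height"])
```

With MediaInfo installed, `video_tracks(run_mediainfo("some_file.mov"))` would apply the same filter to a real file; the file name is a placeholder.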
Video Content Model
OBJECT = 1 Object Descriptor, 1..n Video Files, 0..n Video Files, 0..n Video Files
Relationships between file groups: HAS_SOURCE (twice)
1 metadata file and 1 or more derivative video files
Video Object & Auxiliary Objects
• OBJECT, content model = VIDEO
– 1 or more derivative video files: FILE ... FILE ... FILE
• OBJECT, content model = TEXT, object-level role = VIDEO EDIT DECISION LIST
– 1 text file
• OBJECT, content model = AUDIO, object-level role = DOUBLE SYSTEM AUDIO
– 1 or more derivative audio files
• OBJECT, content model = TEXT, object-level role = CLOSED CAPTION DATA
– 1 text file
• OBJECT, content model = TEXT, object-level role = SUBTITLE DATA
– 1 text file
• OBJECT, content model = STILL IMAGE, object-level role = POSTER FRAME
– 1 or more derivative image files
• OBJECT, content model = DISK IMAGE, object-level role = (TBD)
– TBD files
Relationships shown: HAS_DOCUMENTATION, HAS_LARGER_CONTEXT, HAS_SUPPLEMENT, HAS_SUPPLEMENT
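The video object and its auxiliary objects can be pictured as typed records joined by named relationships. The sketch below is a minimal model under stated assumptions: the class name, field names, and file names are hypothetical, and the slide text does not say which relationship links which pair of objects or in which direction, so the `HAS_SUPPLEMENT` link shown is only an example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RepositoryObject:
    """One object in the content model (names here are illustrative)."""
    content_model: str                      # e.g. "VIDEO", "TEXT", "AUDIO"
    files: List[str]                        # file names held by the object
    role: Optional[str] = None              # object-level role, if any
    relationships: List[Tuple[str, "RepositoryObject"]] = field(default_factory=list)

    def relate(self, relationship, target):
        """Record a named relationship from this object to another object."""
        self.relationships.append((relationship, target))


# Hypothetical file names; the pairing and direction of the link are assumptions.
video = RepositoryObject(content_model="VIDEO", files=["lecture_master.mov"])
captions = RepositoryObject(
    content_model="TEXT",
    role="CLOSED CAPTION DATA",
    files=["lecture_captions.txt"],
)
captions.relate("HAS_SUPPLEMENT", video)

for name, target in captions.relationships:
    print(captions.role, name, target.content_model)
```

Keeping relationships as (name, target) pairs lets the same record type cover every auxiliary object in the list without a subclass per role.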
REFORMATTING WORKFLOWS AND CHALLENGES
Dave Ackerman
Video Format Breakdown from Preliminary Spot Surveys
• Film: 35.72%
• U-matic: 27.55%
• 1/2" open-reel tape: 19.46%
• Beta: 5.73%
• DVD-R: 4.26%
• DV: 3.28%
• 1" open-reel tape: 3.28%
• Video8: 0.57%
• D1: 0.14%
On-site Video Reformatting Capability
• 1” open-reel Type C (NTSC)
• Digital Betacam (NTSC)
• Betacam SP (NTSC)
• Beta-1
• ¾” U-matic (NTSC)
• Mini-DV (NTSC)
• DVCPRO (NTSC)
• DVCAM (NTSC)
• Video-8/Video Hi-8
• SVHS (NTSC)
• VHS (NTSC)
• Laserdisc
• DVD
Video Storage Cost Comparison: DRS Fee / Video Hour / Storage Year
(Bar chart. Cost axis: $0.00 to $7,000.00. Video formats: NTSC, PAL, 720p 29.97, 1080p 29.97, 2K 29.97, 4K 29.97. Encodings: Uncompressed 8-bit (RGBA), Uncompressed 8-bit (YUV), JPEG 2000, ProRes (HQ), ProRes (422), DNxHD 220, Cineform RAW.)
Video Storage Cost Comparison: DRS Fee / Video Hour / Storage Year
Looking at 10 bit...
(Bar chart. Cost axis: $0.00 to $8,000.00. Video formats: NTSC, PAL, 720p 29.97, 1080p 29.97, 2K 29.97, 4K 29.97. Encodings: Uncompressed 10-bit (RGB), Uncompressed 10-bit (YUV), JPEG 2000.)
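The cost differences between encodings follow from data rate. As a back-of-the-envelope sketch, uncompressed size per video hour is frame width × height × bits per pixel × frame rate × 3600 seconds. The 720×486 NTSC frame size, the 4:2:2 subsampling for YUV (16 bits per pixel at 8-bit depth), and the fee of 2.00 per GB per year are assumptions for illustration; the actual DRS fee schedule is not given in the slides.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # the exact rate usually written as 29.97


def bytes_per_hour(width, height, bits_per_pixel, fps):
    """Uncompressed picture data for one hour (no audio, no wrapper overhead)."""
    bytes_per_frame = Fraction(width * height * bits_per_pixel, 8)
    return float(bytes_per_frame * fps * 3600)


def annual_fee_per_video_hour(size_bytes, fee_per_gb_year):
    """Storage fee for one video hour kept for one year (decimal gigabytes)."""
    return size_bytes / 1e9 * fee_per_gb_year


# Assumed frame size and pixel layouts for NTSC standard definition.
yuv_422_8bit = bytes_per_hour(720, 486, 16, NTSC_FPS)  # 8-bit 4:2:2 averages 16 bits/pixel
rgba_8bit = bytes_per_hour(720, 486, 32, NTSC_FPS)     # 8-bit RGBA uses 32 bits/pixel

print(f"NTSC 8-bit YUV 4:2:2: {yuv_422_8bit / 1e9:.1f} GB per hour")
print(f"NTSC 8-bit RGBA:      {rgba_8bit / 1e9:.1f} GB per hour")

# Hypothetical fee of 2.00 per GB per year, purely to show the calculation.
print(f"{annual_fee_per_video_hour(yuv_422_8bit, 2.00):.2f} per video hour per year")
```

Under these assumptions RGBA costs exactly twice what 4:2:2 YUV does at the same resolution, which is the kind of gap the chart compares; compressed encodings need a measured or nominal bitrate instead of this pixel arithmetic.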
Staging Storage
• 3-10 x archival master size
– Digitization
– Content processing
– Transcoding
– Metadata collection
– SIP creation and deposit staging
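The 3 to 10 times multiplier turns directly into a planning range for staging space. The function name and the 100 GB example below are hypothetical.

```python
def staging_range_gb(archival_master_gb, low_factor=3, high_factor=10):
    """Return (minimum, maximum) staging space for a given archival master size."""
    return archival_master_gb * low_factor, archival_master_gb * high_factor


# Hypothetical 100 GB archival master.
low, high = staging_range_gb(100)
print(f"Plan for {low} to {high} GB of staging storage")
```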
Media Preservation Prioritization Drivers
• Item Condition
• Media Age
• Media Lifespan
• Media Failure Modes
• Collection Level Storage Conditions
• Reproducer Availability
• Format Knowledge Available
• Reformatting Specialist
• User Community
• Replaceability
• Reissue Availability
• Number of Copies Held
• Teaching & Learning
• ILL Requests
• Project Requests
• Legal Obligations
• Grant Projects
• Projects from Oversight Committee
• Research Request
• Course Work
Reformatting Workflow (Tape Based)
Reformatting Workflow (File Based)
Phase 1
• Video Reformatting Service
• Enhanced DRS to support:
– Ingest of Video
• Enhance FITS to identify formats, extract metadata
– Metadata editing
– Video storage & preservation
– Basic video delivery service
Phase 2
• Citations
• User annotations
• Closed captioning
• Multi-lingual audio
• Descriptive audio
• Playlists created by faculty, students, librarians
• Other deposit streams (e.g. from Kaltura to the DRS)
Thank you. Questions?
Hydra-Blacklight: An Open Source Stack for AV
Preservation (and More)
December 2014
Tom Cramer
Chief Technology Strategist
Stanford University Libraries
@tcramer
What Is Hydra?
• A robust repository fronted by feature-rich, tailored applications and workflows (“heads”)
➭ One body, many heads
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs.
• A community of developers and adopters extending and enhancing the core
➭ If you want to go fast, go alone. If you want to go far, go together.
Fundamental Assumption #1
No single system can provide the full range
of repository-based solutions for a given
institution’s needs,
…yet sustainable solutions require a
common repository infrastructure.
Books
Articles
Theses
Images
Maps
Data (Raster)
Data (Comp.)
Data (Observ.)
Audio
Video
Documents
Point Solution Approach …Welcome to Siloville
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
Management Access Preservation(?)
Effective? Sustainable?
Repository-Powered Approach
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
Scalable, Robust, Shared Management and
Preservation Services
One Body, Many Heads…
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
Scalable, Robust, Shared Management and
Preservation Services
Hydra Technical Framework
CRUD in Repositories
Repository/Persistent Storage
Create/Submit/Edit (CUD)
Search/View (R)
Major Hydra Components
• Fedora
• Solr
• Solrizer
• Blacklight (R)
• hydra-head Rails Plugin (CUD)
• Blacklight (Read Only)
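The split between a write path (CUD) and a read path (R) can be shown with a toy model. The sketch below is in Python rather than Ruby and is not Hydra, Fedora, Solr, or Blacklight code: every class and function name is hypothetical, and each only stands in for the role its comment names.

```python
class Repository:
    """Stands in for persistent object storage (the role Fedora plays)."""

    def __init__(self):
        self._objects = {}

    def save(self, object_id, metadata):
        self._objects[object_id] = dict(metadata)

    def delete(self, object_id):
        self._objects.pop(object_id, None)

    def get(self, object_id):
        return self._objects.get(object_id)


class SearchIndex:
    """Stands in for the search index (the role Solr plays)."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc["id"]] = doc

    def remove(self, object_id):
        self._docs.pop(object_id, None)

    def search(self, term):
        term = term.lower()
        return sorted(i for i, doc in self._docs.items() if term in doc["text"])


def to_index_doc(object_id, metadata):
    """Flatten repository metadata into an index document (the Solrizer role)."""
    text = " ".join(str(value) for value in metadata.values()).lower()
    return {"id": object_id, "text": text}


def create_or_update(repo, index, object_id, metadata):
    """Write path (CUD): store the object, then refresh its index document."""
    repo.save(object_id, metadata)
    index.add(to_index_doc(object_id, metadata))


def delete(repo, index, object_id):
    """Write path (CUD): remove the object and its index document."""
    repo.delete(object_id)
    index.remove(object_id)


repo, index = Repository(), SearchIndex()
create_or_update(repo, index, "video:1", {"title": "Recorded Lecture", "format": "JPEG 2000"})
create_or_update(repo, index, "image:1", {"title": "Campus Map", "format": "TIFF"})

# Read path (R): searches touch only the index, never the repository.
print(index.search("lecture"))
```

Because reads go only through the index, a read-only front end can be swapped or added without touching the write path, which is the point of the "one body, many heads" arrangement.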
A Note on Ruby on Rails
• Rapid application development for web applications: “Convention over configuration”
– 10x productivity
• Supportable: MVC (Model-View-Controller) and Rails framework make code well-structured, predictable
• Testable: RSpec and Cucumber give powerful, automatable testing tools
• Learnable: Stanford went from 1 to 8 Ruby-savvy developers in one year (no new hires)
– 1 week learning curve to basic proficiency
A Note on Fedora
• Flexible, Extensible, Durable Object Repository Architecture
– Flexible: model and store any content types
– Extensible: easy to augment with apps and services
– Durable: foundation of preservation repository
• Proven, sustained and successful digital repository
– 100s of adopters; 13 years of development, 4 releases
– Vibrant community & funding under DuraSpace
• Fedora 4.0 released this month; co-evolving with Hydra
Fedora 4 Preservation-Friendly Feature Set
• Auditing, versioning & fixity services
• Clustering & scalability
• Event-driven architecture
• Advanced storage capabilities
– Including support for very large files
• “Projection” over remote file stores
• Native RDF support
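A fixity service, in general terms, recomputes a file's checksum and compares it with the value recorded at ingest. The sketch below shows that generic check with Python's standard `hashlib`; it is not Fedora's API, and the function names and the SHA-256 choice are assumptions.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest in chunks, so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex):
    """Return True if the file's current digest matches the recorded digest."""
    return sha256_of(path) == expected_hex.lower()


# Demonstration on a temporary file standing in for a stored video file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example video payload")
    demo_path = tmp.name

recorded = sha256_of(demo_path)          # digest recorded at "ingest"
unchanged = fixity_ok(demo_path, recorded)

with open(demo_path, "ab") as handle:    # simulate corruption by appending a byte
    handle.write(b"\x00")
after_change = fixity_ok(demo_path, recorded)

os.remove(demo_path)
print(unchanged, after_change)
```

Reading in chunks matters for video, where a single file can be far larger than available memory.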
A Note on Blacklight
• Repository-agnostic, feature-rich, content-aware, turnkey access interface
• Vibrant, multi-institutional, open source community on its own
• Can be used independently, or as the first component of Hydra
• 100s of adopters worldwide; ~450 members of the blacklight-development list
Rock & Roll Hall of Fame: Blacklight for Catalog, EAD and Media
OpenVault: Blacklight for Video at WGBH
Digital Commonwealth at BPL: Blacklight for statewide repository
Spotlight: Blacklight for exhibits
Hydra Community Framework
Fundamental Assumption #2
No single institution can resource the
development of a full range of solutions on
its own,
…yet each needs the flexibility to tailor
solutions to local demands and workflows.
Hydra Philosophy -- Community
• An open architecture, with many contributors to a common core
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs
• A community of developers and adopters extending and enhancing the core
• “If you want to go fast, go alone. If you want to go far, go together.”
One body, many heads
Community
• Conceived & executed as a distributed, collaborative, open source effort from the start
• Initially a joint development project between Stanford, Univ of Virginia, and Univ of Hull
• Hydra Partners are the backbone of the project
– Coalition of the willing
– No fees or dues
– Apache-style consensus and governance
• Steering Group provides administration, continuity, and serves as backstop when needed
• But no central planning, no Project Director, no “one” architect
Hydra Partners…
…are individuals, institutions, corporations or
other groups that have committed to contributing
to the Hydra community; they not only use the
Hydra technical framework, but also add to it in
at least one of many ways: code, analysis,
design, support, funding, or other resources.
Hydra Partners collectively advance the project
and the community for the benefit of all
participants.
https://wiki.duraspace.org/display/hydra/Hydra+Community+Framework
Code Licensing
• All Hydra code is available under Apache License, Version 2.0
• All code commitments are being managed through Contributor License Agreements
• Individual – so each developer is clear about what they are contributing
• Corporate – so each institution is clear about what it is contributing
• Code contributors maintain ownership of their IP
• …and grant a non-exclusive license to the project and its users
Hydra Current State
Hydra Partners
(Chart: number of Hydra Partners, axis 0 to 25, at OR09, OR10, OR11, OR12, OR13, OR14)
OR = Open Repositories Conference
Hydra Partners and Known Users
(Chart: number of Hydra Partners and known users, axis 0 to 50, at OR09, OR10, OR11, OR12, OR13, OR14, and now)
OR = Open Repositories Conference
A Worldwide Presence
Second Worldwide Hydra Connect. Cleveland, OH, Sept 2014.
Trending
Hydra Heads of Note
Avalon & HydraDAM for Media
Sufia
BPL Digital Commonwealth
UCSD DAMS
Northwestern Digital Image Lib.
Avalon
HydraDAM
HydraDAM2
• NEH just funded Indiana & WGBH for 2nd
round of HydraDAM development
• Exercise HydraDAM on Fedora 4
• RDF-based data models
• Flexible storage
• Integrate HydraDAM (back-end) with Avalon and OpenVault (front-ends)
• Integrate with mass digitization workflows
• 2 year effort
http://projecthydra.org
NISO Webinar • December 17, 2014
Questions?
All questions will be posted with presenter answers on the NISO website following the webinar:
http://www.niso.org/news/events/2014/webinars/av_preservation/
Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU