Workshop: Preservation and Access for Audio and Video
Richard Wright
BBC R&D – PrestoCentre
Goportis Digital Preservation Summit
Hamburg – 19 Oct 2011
We are on a journey
Which began here:
And here:
And here:
phonautograph, developed in 1857 by Parisian inventor Édouard-Léon Scott de Martinville
And here:
And here:
"The cinema is an invention without a future"
- Louis Lumière
And here:
All in the 19th Century
In the 20th century came broadcasting
And the format wars:
List formats and dates
Leaving us with this:
The BBC Archive, 1995
• About 1 million hours BBC radio and television
• 1.5 million items of film and videotape
• 750,000 radio recordings
• 3 million photographs
• 1.2 million commercial recordings (“grams”)
• 4 million items of sheet music
• 22 million newspaper cuttings
• 550,000 document files; 20,000 rolls of microfilm
• 500,000 phonetic pronunciations
TV Archive Holdings, 1999
Film 30%
D3 6%
Digibeta 1%
Betacam 11%
VHS 14%
U-Matic 4.5%
1” C Format 12%
2” Quad 1%
Ekta, Reversal 12%
TV Formats and Holdings
• 2” Quad, 60s-80s: Dr Who, Dad’s Army, Steptoe & Son, Forsyte Saga, Fawlty Towers, Secret Army
• 1” C, 70s-90s: Yes Minister, EastEnders, Angels, Wogan, All Our Working Lives
• Film: 50 years, 40s-90s: Man Alive, 1984, Ascent of Man, British Empire, Omnibus, Pride and Prejudice (1995)
• Ektachrome: 67-82 news: Vietnam War, Yom Kippur War, all major domestic stories
• U-Matic: 82-90s: Lockerbie, General Elections in 1983 & 1987, the Gulf War
Radio Archive, 1999
• Radio holdings: 750,000 recordings, 300,000 hours
• Fewer technical problems with audio recordings
1/4” tapes in regions 37%
1/4” tapes in London 32%
CD Sound effects 0.02%
LP Sound effects 0.4%
DAT NCA sequence programmes, bulletins 1%
1/4” News bulletins 3%
Cassette NCA sequence programmes 4%
CD Programme extract compilation 0.1%
1/4” complete programmes 15%
1/4” film unit tapes 1%
DAT, London & Region 2%
LP & 78RPM Programme Extract, 5% of sound holdings
Radio Formats and Holdings
• Radio One sessions: 40k recordings, all BBC copyright; mainly ¼” tape
– Rolling Stones, Beatles, Who, Jimi Hendrix, Led Zeppelin, The Fall ... and so many more
• News off-air on DAT 1990-2001, then CD to 2006
• Files from tapeless production onto CD (6 yrs)
• Shellac and vinyl pressings from 20s-60s
– Special problems with lacquer, acetate, aluminium master discs: 16”, fragile, deteriorate
The Problem: Analogue Media
Decaying, Obsolete, Fragile
Presto Survey, 2001
5 million hours of holdings (10 European broadcasters)
• Obsolescence: at least 2/3 of the material
• Deterioration: approximately 1/3 of the material
• Fragile media: roughly 1/4 of the material
Overall: 70% of holdings have problems (2001)
The Solution: digitisation
Obsolescence
• Videotape
– 2”; 1”; U-Matic: no playback equipment
• Film
– Disappearing in post production
• Audio formats
– Grams: no playback equipment (in BBC!!)
– ¼” no longer accepted in BBC radio production and playout systems
Deterioration
• Videotape – decay of adhesive
– 2”; 1”; U-Matic (30% read failures at BBC)
• Audio – decay of adhesive: ¼” tape
– Lacquer discs
• Magnetic sound tracks: vinegar syndrome
• Other acetate
• Decay of film splices
• General decay of polymer materials
– Even the sleeves on vinyl LPs
Fragile Media
• Vinyl (45s, LPs)
– and shellac (78s)
• Film
– 10 plays per print (videotape: 50)
• Video or audiotape
– physical damage
– magnetic fields
Size of the Problem – in Europe
• Presto: found 5 million hours 2001
– Mainly broadcast archives
• Prestospace: found 10 million hours 2004
– Broadcast and large national collections
• TAPE: found additional 20 million hours
– In collections not covered previously
• UNESCO estimate: 200 million hours worldwide (100 million in Europe)
Where is the material?
• Broadcast archives 30% (roughly)
• National collections 15%
• Other major collections 15%
• Small and specialist collections 40%
NB: all these figures refer to archived material ONLY (TAPE survey)
What to do about it? Presto preservation factory
• Efficient workflow for digitisation
– Staff specialisation
– Cartography and Triage
Adam Smith: “the division of labour in pin manufacturing – and the great increase in the quantity of work that results” (UK £20 note)
wiki.prestospace.org preservation guide
Problems with the solution: digitisation not fully accepted
“You’re not preserving anything; you’re only making more proxies and adding to the problem”
• Not accepted as a solution for film
• Not easy to implement for video (in full quality); problems: encodings, compression, formats, file size, bandwidth ...
• But – very much accepted for audio: BWF
Problems with the solution: needs Digital Preservation
The approach in 2006:
• Media
• Multiple copies
• Maintenance
• Migration
Media
• Datatape: cheaper than hard drives
– Needs an expensive tape drive
– And has reliability issues
• Optical is cheapest of all
– But isn’t really mass storage (DVD = 4.7 GB)
• New DVD format(s) promise 20 to 100 GB
– And have reliability issues
• Hard drive prices have dropped sharply
– Easiest to automate management
– And have reliability issues
Multiple copies
• Two copies
– Two technologies
• In two places
• But fastest recovery is by mirroring
– Which means identical technologies
• Big arguments about RAID vs simpler options vs more complex options
Maintenance
• Life cycle management
• Should be every archive’s built-in process
• Begins with blank media
– Then the writing
– Then the initial checking
– Then the periodic checking: aerobics, scrubbing
• Ends with migration to the next format
Migration
• A fact of life
• Every five years
• Can involve a lot of manual handling (of datatapes or optical media)
• Or can be nearly transparent (disc upgrades) – but: every three years!
• Best practice: uncompressed file formats
Digital Preservation 2009: formal management model
[Flowchart, steps numbered (1) to (5c). Boxes: START HERE; “Is the format a problem?” (YES / NO); “Archive for a few years”; “What cost/quality/risk option can you afford?”; “Compress lossy”; “Compress lossless”; “Uncompress”; END HERE]
... with emulation
[Flowchart. Boxes: START HERE; “Is the format at risk?” (YES / NO); “Archive for a few years”; “What cost/quality/risk can you afford?”; “Compress lossy”; “Compress lossless”; “Uncompress”; “Multivalent”; END HERE]
Stepping back: the real problem with storage (1)
Medium   Bits/cm²   Life (years)
Stone    10         10 000
Paper    10^4       1000
Film     10^7       100
Disc     10^10      10
Each change 1000 times cheaper, but lasts 1/10th as long
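The pattern in the table can be written compactly. As a sketch, number the media k = 0 (stone) to k = 3 (disc); the symbols d_k (density) and l_k (life) are introduced here for illustration only and do not appear on the slides.

```latex
d_k = 10 \cdot 10^{3k}\ \text{bits/cm}^2,
\qquad
l_k = 10^{4-k}\ \text{years},
\qquad k = 0, 1, 2, 3
```

Each step in k multiplies the density by 1000 and divides the life by 10, which reproduces the four rows of the table.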
The problem (2)
• Current storage media are unreliable
– Discs fail
– Data tapes fail
– Optical media fail (and are easily damaged)
– Companies fail
– Exceptions? Glass discs; holographic media; going back to film; digital film
The problem (3)
• Storage isn’t just about media
– Encoding and obsolescence
– File formats and obsolescence
– File management systems and obsolescence
– Physical interfaces and obsolescence
– Operating systems and obsolescence
– System complexity and associated risks
– Human errors
– Cost: continuous maintenance
What is the cost of continuous maintenance?
• You need a model of storage operating and replacement costs, into the future
• What storage? So you need a storage strategy:
– Allocation of storage: primary, backup, cloud ...
– Operation of storage: cycles for copying, checking
– Some idea of relating costs to risks !!!
• NOT available from storage vendors
Simple Preservation Model
And now: one PrestoPRIME tool
• A model for storage systems, to calculate
– Cost
– Risk
– Loss
– And compare what-if scenarios
• http://prestoprime.it-innovation.soton.ac.uk/
Storage Systems
HDD in servers
• Migration required every 4 years
• Running costs: access €0.1 per GB; storage €1 per GB per year
• Corruption rates: access avg. 1 in 500 files; latent avg. 1 in 750 files per year
HDD on shelves
• Migration required every 4 years
• Running costs: access €1 per GB; storage €0.25 per GB per year
• Corruption rates: access avg. 1 in 100 files; latent avg. 1 in 500 files per year
More Storage Systems
Data tapes in a robot
• Migration required every 6 years
• Running costs: access €0.2 per GB; storage €0.4 per GB per year
• Corruption rates: access avg. 1 in 1×10^4 files; latent avg. 1 in 1×10^5 files per year
Data tapes on shelves
• Migration required every 6 years
• Running costs: access €1 per GB; storage €0.1 per GB per year
• Corruption rates: access avg. 1 in 1×10^4 files; latent avg. 1 in 1×10^5 files per year
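To make the latent corruption rates above concrete, here is a minimal sketch in Python. It assumes (the slides do not say this) that latent corruption hits files independently at a constant yearly rate, and it reads the tape figure as 1 in 1×10^5. The function name is hypothetical and is not part of the PrestoPRIME tool.

```python
def expected_latent_corruptions(n_files: int, yearly_rate: float, years: float) -> float:
    """Expected number of files hit by latent corruption at least once in `years`.

    Assumes independent files and a constant per-file yearly rate.
    """
    p_untouched = (1.0 - yearly_rate) ** years
    return n_files * (1.0 - p_untouched)


# Latent corruption rates quoted on the slides (per file, per year)
LATENT_RATES = {
    "HDD in servers": 1 / 750,
    "HDD on shelves": 1 / 500,
    "Data tapes in a robot": 1e-5,
    "Data tapes on shelves": 1e-5,
}

if __name__ == "__main__":
    # Files expected to be silently damaged after one year without checking,
    # for a collection of 100,000 files.
    for name, rate in LATENT_RATES.items():
        damaged = expected_latent_corruptions(100_000, rate, years=1)
        print(f"{name}: {damaged:.1f} files")
```

Setting `years` to the interval between integrity checks gives a rough idea of how much damage accumulates unseen before each check.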
Storage Configuration
Found 3 storage configurations. Add...
Disk with Tape
System 1: HDD in servers
Files accessed avg of 0.25 times per year, staying constant
Scrubbing every 1 year(s)
System 2: Data tapes in a robot
Files accessed avg of 0 times per year, staying constant
Scrubbing every 3 year(s)
File Collections
• Found 1 file collection
• Default File Collection (read-only)
• Length of cost/loss projection: 25 year(s)
• Files: 100 thousand initially, staying constant
• Average file size: 25 GB
Plans
Found 3 plans. Add...
Disk and Tape
File Collection: Default File Collection
25 year lifetime. 100 files, avg. 25 GB in size.
Storage Configuration: Disk with Tape
Uses HDD in servers and Data tapes in a robot systems.
http://prestoprime.it-innovation.soton.ac.uk/
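The plan above can be approximated with simple arithmetic. The following Python sketch is not the PrestoPRIME tool's model: it assumes cost is just storage plus access, both linear in volume, and it leaves out migration and scrubbing costs because the slides give no figures for them. The slides state the collection size both as "100 thousand" and as "100" files, so the file count is a parameter; the example uses 100. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSystem:
    name: str
    access_eur_per_gb: float        # cost of reading one GB
    storage_eur_per_gb_year: float  # cost of keeping one GB for a year
    migration_interval_years: int   # "Migration required every N years"
    accesses_per_file_year: float   # average accesses per file per year


def projected_cost_eur(system: StorageSystem, n_files: int,
                       avg_file_gb: float, years: int) -> float:
    """Storage plus access cost over the projection period.

    Migration and scrubbing costs are deliberately excluded (not given on the slides).
    """
    total_gb = n_files * avg_file_gb
    storage = total_gb * system.storage_eur_per_gb_year * years
    access = (total_gb * system.accesses_per_file_year
              * system.access_eur_per_gb * years)
    return storage + access


def migrations_in_period(system: StorageSystem, years: int) -> int:
    """Number of full migration intervals that fit in the projection period."""
    return years // system.migration_interval_years


# Figures from the "Disk with Tape" configuration on the slides
HDD_SERVERS = StorageSystem("HDD in servers", 0.1, 1.0, 4, 0.25)
TAPE_ROBOT = StorageSystem("Data tapes in a robot", 0.2, 0.4, 6, 0.0)

if __name__ == "__main__":
    for system in (HDD_SERVERS, TAPE_ROBOT):
        cost = projected_cost_eur(system, n_files=100, avg_file_gb=25, years=25)
        print(f"{system.name}: EUR {cost:,.2f}, "
              f"{migrations_in_period(system, 25)} migrations")
```

Under these assumptions the disk copy dominates: 100 files of 25 GB cost roughly €64,000 on servers against €25,000 on robot tape over 25 years, before any migration work is counted.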
Now: Three Areas of Digital Preservation, and PrestoPRIME tools
1) Digitisation – going digital
2) Digital Workflows – working digitally
3) Digital Preservation (proper)
1. Digitisation – Key Ideas
• Cartography and Triage
– Make a map of your holdings
– Decide on priorities
• Make a preservation plan
– Digitisation: in-house or a service provider?
• Better – Faster – Cheaper
– Division of labour: Adam Smith, industrial process
– Lower prices by contractors for archive work
Joanneum: Quality Analysis Tool
2. Working with digital content (lots of files)
It’s all about management
– DAM/MAM and Trusted Repositories – what do they do, what don’t they do – White Papers
– Storage – ITI online free tools
– Metadata – Joanneum mapping and validation; “tag gardening” Univ Amsterdam; fingerprinting INA
– Digital library technology – RAI, BBC MXF support
– Access – Joanneum time-based navigation, annotation
– Rights – RAI ontology, Eurix implementation
3. Preserving the digital content
• Preservation Platform: P4 = PrestoPRIME Preservation Platform, Eurix; Rosetta, Ex Libris
• Standards: OAIS; formal control; formal preservation actions, e.g. migration; P4
• Emulation – Multivalent, Univ of Liverpool
• Formats, carriers, storage: planning and strategy: PrestoPRIME white papers
• Managing and maintaining storage into the future – SLAs for outsourced service; white papers, software for real-time SLA monitoring; modelling and simulation tools
Access: Audiovisual Content and Digital Libraries
• Digitisation makes audiovisual content available to web access, including digital libraries
• Broadcast archive projects: Birth of TV, Video Active, EUscreen (link to Europeana)
• BUT – what are digital libraries doing to provide access to audiovisual content?
Four requirements for sensible access
• Granularity
• Navigation
• Reference and Citation
• Annotation
Granularity - division into meaningful units
• Keyframes
• Other methods to represent video
• and audio:
Navigation
• "Click and play" on visual representation of the meaningful units
Reference and Citation
• the core requirement for scholarly discourse
– along with a major change in attitude!
• Needs a permanent place for “things to be”
– Hence the need for stable audiovisual collections
“Hamlet, for example, is comparable to Saxo Grammaticus' Gesta Danorum.[citation needed] King Lear is based on King Leir in Historia Regum Britanniae by Geoffrey of Monmouth, retold in 1587 by Raphael Holinshed.[citation needed]”
– Wikipedia
Annotation
• the core requirement for social web = interactivity
• individual interacts with content
• individuals interact with other individuals
Thank You
Preservation Guide
preservationguide.co.uk
PrestoCentre prestocentre.eu
Richard Wright