Research & Development
Analysing media in the cloud
An experiment and a marketplace
Tristan Ferne
Executive Producer
BBC Research & Development
An experiment in using the cloud to process a radio archive
A prototype for the World Service archive
A marketplace for analysing media in the cloud
ABC-IP
Automatic Broadcast Content Interlinking Project
Unlocking media archives by making better use of metadata
TSB competition for “Metadata: increasing the value of digital content”
BBC R&D and Metabroadcast
May 2011 - May 2013
The BBC World Service archive
A 3-year digitisation project
50,000 radio programmes from the past 45 years
3 years of continuous audio
500TB of high quality audio
The missing metadata
Missing fields
Incorrect data
Spelling mistakes
Listening machines
Noisy transcripts
to be raised in a crisp and easy gait collar tradition and mystique and net bottle westphal mia ballroom with a fifth will one of your very well that p. c. set a caustic wet plate is sprint says it twice to purposes again who's addicted across stick is a podium which stopped at a slow start to the masses of setting up a world and on top was a big nineteen ninety three after a renewed spirit of the big dig ,comma off trillo .period when you are unable to compose and see what it's stole to working for a while at the guys when i started the eighth that we teach eighteen hamper and a timeless dave they'd each code for my list tinged yellow and io i had no east p. n. c. and i was a big epic tina afoot o'mara i. q. from kodiak and there was so they become kosher shopko misfit and i was a david to compose his team's end and at haas tied to districts in the indian head of i. a. moved to beijing
Extracting topics
Extract keywords from noisy transcripts
Match to Linked Data topics from DBpedia
Disambiguate using distance within the “semantic” space
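The disambiguation step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the candidate topics are hand-written rather than looked up in DBpedia, the category sets are invented, and Jaccard distance over those sets stands in for whatever distance measure the project actually used in the "semantic" space.

```python
from itertools import product

# Hypothetical candidate topics per extracted keyword (not real DBpedia lookups).
CANDIDATES = {
    "java": ["Java_(island)", "Java_(programming_language)"],
    "python": ["Python_(genus)", "Python_(programming_language)"],
    "compiler": ["Compiler"],
}

# Hypothetical category sets standing in for each topic's position in the space.
FEATURES = {
    "Java_(island)": {"Indonesia", "Islands"},
    "Java_(programming_language)": {"Programming_languages", "Software"},
    "Python_(genus)": {"Snakes", "Reptiles"},
    "Python_(programming_language)": {"Programming_languages", "Software"},
    "Compiler": {"Software", "Programming_tools"},
}


def distance(a, b):
    """Jaccard distance between the category sets of two topics."""
    fa, fb = FEATURES[a], FEATURES[b]
    return 1.0 - len(fa & fb) / len(fa | fb)


def disambiguate(keywords):
    """Choose one candidate per keyword so the chosen topics sit closest together."""
    options = [CANDIDATES[k] for k in keywords]

    def total_distance(choice):
        # Sum of distances over every unordered pair of chosen topics.
        return sum(
            distance(a, b)
            for i, a in enumerate(choice)
            for b in choice[i + 1:]
        )

    best = min(product(*options), key=total_distance)
    return dict(zip(keywords, best))


result = disambiguate(["java", "python", "compiler"])
print(result)
```

With these toy inputs both ambiguous keywords resolve to their programming-language sense, because those candidates share categories with each other and with "Compiler". The exhaustive search grows exponentially with the number of ambiguous keywords, so a real system would need a cheaper strategy.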
Processing in the cloud
26,280 hours of audio processed
36,729 compute hours on “small” cloud machines
Processed whole archive in 2 weeks at a cost of ~$3,000
Built an API for managing the process
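A few derived figures follow from the numbers above. The sketch below is back-of-envelope arithmetic only, and it assumes that "2 weeks" means 14 full days of wall-clock time and that the ~$3,000 figure covers compute alone; neither assumption is stated on the slide.

```python
# Figures taken from the slide.
audio_hours = 26_280        # hours of audio processed
compute_hours = 36_729      # compute hours on "small" cloud machines
total_cost_usd = 3_000      # approximate total cost

# Assumption: "2 weeks" is 14 full days of elapsed time.
wall_clock_hours = 14 * 24

# Derived quantities.
years_of_audio = audio_hours / (365 * 24)
compute_per_audio_hour = compute_hours / audio_hours
cost_per_compute_hour = total_cost_usd / compute_hours
cost_per_audio_hour = total_cost_usd / audio_hours
avg_parallel_machines = compute_hours / wall_clock_hours

print(f"Years of continuous audio:        {years_of_audio:.1f}")
print(f"Compute hours per hour of audio:  {compute_per_audio_hour:.2f}")
print(f"Cost per compute hour (USD):      {cost_per_compute_hour:.3f}")
print(f"Cost per hour of audio (USD):     {cost_per_audio_hour:.3f}")
print(f"Average machines running at once: {avg_parallel_machines:.0f}")
```

Under those assumptions the job averaged roughly 109 machines running in parallel, at about 1.4 compute hours and about 11 cents per hour of audio.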
Machines + People
http://worldservice.prototyping.bbc.co.uk
comma – Cloud marketplace for media analysis
TSB competition for “Innovating in the Cloud”
BBC R&D, Somethin’Else and Kite
May 2013 - May 2015
Media analysis
Topic generation from text
Summarising text
Sentiment analysis
Speaker identification and diarisation
Music identification
Mood classification of audio and video
Face recognition
Segmentation of audio and video
Object and place recognition
Scene detection in video
Subtitle creation
Problems with media analysis
Computationally intensive
Hard to integrate with other systems
Hard to evaluate and compare
Hard to know what's possible and what's available
Making media analysis easy
Algorithm providers upload algorithms
Media owners upload content and choose what they want to analyse
The platform manages:
Computation and scaling
Storing the data
Monitoring
Billing
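The division of responsibilities above can be sketched as a toy, in-memory model. This is not the comma platform's API: the `Platform` class, its method names and the flat per-item price are all hypothetical, and scaling is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Platform:
    """Toy stand-in for the marketplace; every name here is hypothetical."""
    algorithms: Dict[str, Callable[[bytes], dict]] = field(default_factory=dict)
    prices: Dict[str, float] = field(default_factory=dict)
    media: Dict[str, bytes] = field(default_factory=dict)
    results: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)
    bills: Dict[str, float] = field(default_factory=dict)

    def upload_algorithm(self, name, fn, price_per_item):
        """Algorithm provider registers an analysis function and its price."""
        self.algorithms[name] = fn
        self.prices[name] = price_per_item

    def upload_media(self, media_id, content):
        """Media owner uploads a piece of content."""
        self.media[media_id] = content

    def analyse(self, owner, media_id, algorithm):
        """Run the chosen algorithm, then store, log and bill the job."""
        result = self.algorithms[algorithm](self.media[media_id])  # computation
        self.results[(media_id, algorithm)] = result               # storing the data
        self.log.append(f"{algorithm} ran on {media_id}")          # monitoring
        self.bills[owner] = self.bills.get(owner, 0.0) + self.prices[algorithm]  # billing
        return result


platform = Platform()
platform.upload_algorithm(
    "word_count",
    lambda content: {"words": len(content.split())},
    price_per_item=0.5,
)
platform.upload_media("clip-1", b"hello from the archive")
result = platform.analyse("archive-owner", "clip-1", "word_count")
print(result, platform.bills)
```

The point of the sketch is that neither party handles infrastructure: the provider supplies only a function, the owner supplies only content and a choice of analysis, and everything else happens inside `analyse`.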
The comma marketplace
Algorithm developers; e.g. research departments at universities and SMEs
Media owners; e.g. broadcasters, museums, archives, even individuals
Analysing media in the cloud
Tristan Ferne, BBC R&D
@tristanf
http://www.bbc.co.uk/rd
http://worldservice.prototyping.bbc.co.uk