Automatic transcription of video files sig media


Automatic transcription of video files

Carlos Turró

Universitat Politècnica de València

Agenda
• Why automatic transcription
• State of the art: The transLectures project
• Automatic transcription of Lecture Recordings: The Opencast Project
• Notes & the near future

Why automatic transcription of video files?
• Accessibility
• Searching within a video file
• Searching across a video repository
• Topic identification
• …and much more

The transLectures project
• Development of an engine for Automated Speech Recognition (ASR) for lectures & educational content
• Development of translation tools for that content
• Implementation
• Case studies: Videolectures.NET & Polimedia (UPV video repository)
• Real-life evaluation
• Integration into Opencast

http://www.translectures.eu

transLectures partners

12 Nov 2013

No.  Name                                         Country
1    Universitat Politècnica de València (MLLP)   Spain
2    Xerox SAS                                    France
3    Institut Jožef Stefan                        Slovenia
3+   Knowledge for All Foundation                 UK
4    RWTH Aachen University                       Germany
5    EML – European Media Laboratory              Germany
6    DDS – Deluxe Digital Studios                 UK

36 Months

November 2014

Statistical transcription (and translation)

Recognition: Sound -> ASR Engine (Acoustic Model + Language Model) -> Transcription

Training: Manually transcribed voice -> Modeling Engine -> Acoustic Model + Language Model
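As a rough illustration of how an acoustic model and a language model can work together, the usual statistical formulation picks the word sequence that maximizes the acoustic score times the language score. The sketch below is a toy, not the transLectures engine: the two score tables, the candidate phrases and the `lm_weight` parameter are all invented for the example.

```python
import math

# Hypothetical scores for one audio segment (all values invented):
# log P(sound | words) from an acoustic model, log P(words) from a language model.
ACOUSTIC_LOGPROB = {
    "recognize speech": math.log(0.40),
    "wreck a nice beach": math.log(0.45),
}
LANGUAGE_LOGPROB = {
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}

def decode(candidates, lm_weight=1.0):
    """Return the candidate with the highest combined log score."""
    def score(words):
        return ACOUSTIC_LOGPROB[words] + lm_weight * LANGUAGE_LOGPROB[words]
    return max(candidates, key=score)

best = decode(list(ACOUSTIC_LOGPROB))
print(best)  # recognize speech
```

With `lm_weight=0.0` the language model is ignored and the acoustically closer but implausible phrase wins, which is why both models are needed.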

Architecture of transLectures

[Diagram: Lecture, Language Model, Slides, Extra content, Result, Intelligent interaction, Transcription, Translation]

Languages

• Transcription (ASR)
  • EN
  • SL
  • ES

• Translation (MT)
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE

Transcription and Translation Platform
• API
• Post-editing web interface (in HTML5)

Example video
• https://media.upv.es/?id=b444d12e-db23-9a4f-9b3b-d1d9275d4cb4

Scientific evaluations
• WER = Word Error Rate
• The lower the better
• Usually, a human transcriber has a WER of around 12%
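WER is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the automatic one, divided by the number of reference words. A minimal sketch follows, assuming whitespace tokenization and a result expressed in percent; the function name and the sample sentences are illustrative only and are not taken from the transLectures evaluation.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)

wer = word_error_rate("the lecture starts at nine today",
                      "the lecture start at nine")
print(f"{wer:.1f}")  # 33.3 (one substitution and one deletion over six words)
```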

Beyond transLectures

WER

Language     M10    M17
Dutch        25.7   24.5
Italian      21.2   17.7
Portuguese   45.9   43.0
Spanish      15.9   14.4
Estonian     N/A    27.1
French       N/A    22.7

The Opencast Community is…
Universities, companies and people:
• concerned with academic video
• attracted to the Opencast values of openly exchanging ideas, experience, knowledge and code
• committed to building and maintaining a robust, flexible, high-quality open source lecture capture and academic video management solution.

Now also part of

Full-featured Lecture Recording ecosystem

Who uses Opencast?
Around the world, with strong adoption in Europe especially.

43 Adopters with public information (May 2014)

30+ commercial partner clients

http://opencast.org/matterhorn-adopters

Yesterday’s tweet

Indexing in Opencast
• Opencast has built-in OCR indexing capabilities

Video (slides) -> OCR (hunspell) -> Word list filter -> Apache Lucene search server

• New operations can be added

Video (slides) -> transcription (tL) -> Apache Lucene search server

or

Video (slides) -> OCR (hunspell) -> transcription (tL) -> Word list filter -> Apache Lucene search server
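The chains above can be read as a sequence of operations, each taking the output of the previous one. The sketch below only illustrates that idea in plain Python: `ocr`, `transcription`, `word_list_filter` and `run_pipeline` are hypothetical stand-ins, not Opencast's workflow operations or the transLectures API, and the sample words are invented.

```python
from functools import reduce

# Hypothetical stand-ins for the operations; none of these are Opencast APIs.
def ocr(media):
    # Pretend OCR: copy the words found on the slides into the text to index.
    return {**media, "text": media["text"] + media["slide_words"]}

def transcription(media):
    # Pretend ASR: append the spoken words returned by a transcription service.
    return {**media, "text": media["text"] + media["spoken_words"]}

def word_list_filter(media, stop_words=frozenset({"the", "of", "a"})):
    # Drop very common words before indexing.
    return {**media, "text": [w for w in media["text"] if w not in stop_words]}

def run_pipeline(media, operations):
    """Apply each operation in order, feeding the result to the next one."""
    return reduce(lambda state, op: op(state), operations, media)

video = {
    "slide_words": ["fourier", "transform"],
    "spoken_words": ["the", "spectrum", "of", "a", "signal"],
    "text": [],
}
indexed = run_pipeline(video, [ocr, transcription, word_list_filter])
print(indexed["text"])  # ['fourier', 'transform', 'spectrum', 'signal']
```

Adding or removing a step only changes the list passed to `run_pipeline`, which mirrors how a new transcription operation slots into the existing chain.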

Why do I need an indexing server?
• Powerful, Accurate and Efficient Search Algorithms
  • ranked searching -- best results returned first
  • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  • fielded searching (e.g. title, author, contents)
  • sorting by any field
  • multiple-index searching with merged results
  • allows simultaneous update and searching
  • flexible faceting, highlighting, joins and result grouping
  • fast, memory-efficient and typo-tolerant suggesters
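To make "ranked searching" concrete, here is a toy inverted index over timed transcript segments. It is a simplified sketch and not how Lucene is implemented: the `TranscriptIndex` class, its term-count scoring and the sample segments are assumptions made for the example.

```python
from collections import defaultdict

class TranscriptIndex:
    """Toy inverted index over timed transcript segments (not Lucene)."""

    def __init__(self):
        # word -> {segment id: number of occurrences}
        self.postings = defaultdict(lambda: defaultdict(int))
        self.segments = []  # list of (start_seconds, text)

    def add(self, start_seconds, text):
        seg_id = len(self.segments)
        self.segments.append((start_seconds, text))
        for word in text.lower().split():
            self.postings[word][seg_id] += 1

    def search(self, query):
        """Return segments ranked by how many query-word hits they contain."""
        scores = defaultdict(int)
        for word in query.lower().split():
            for seg_id, count in self.postings.get(word, {}).items():
                scores[seg_id] += count
        # Highest score first; ties keep the original segment order.
        ranked = sorted(scores, key=lambda s: (-scores[s], s))
        return [self.segments[s] for s in ranked]

index = TranscriptIndex()
index.add(0, "welcome to the signal processing lecture")
index.add(95, "the fourier transform of a signal")
index.add(240, "sampling a signal and the sampling theorem")

hits = index.search("sampling signal")
print([start for start, _ in hits])  # [240, 0, 95]
```

Because each hit carries its start time, a player could jump straight to the matching point in the video, which is the "searching within a video file" use case.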

Demo on searching
• https://media.upv.es

Notes & the near future
• ASR technology is good enough for automated transcription of videos
  … provided the sound is good enough
• There are lecture recording systems that allow transcriptions to be plugged in for searching
  … like Opencast
• There are still things to solve
  • Transcription speed (making good progress)
  • Topic identification
  • Adding more languages

Thanks!
Questions?

Learning more…

transLectures
http://translectures.eu

Video in a multilingual context (EMMA)
http://association.media-and-learning.eu/portal/resource/ml-webinar-video-multilingual-context

Opencast State of the Project
http://lanyrd.com/2015/apereo/sdmpry/
