Automatic transcription of video files sig media

Automatic transcription of video files

Carlos Turró

Universitat Politècnica de València

Agenda
• Why automatic transcription
• State of the art: The transLectures project
• Automatic transcription of Lecture Recordings: The Opencast Project
• Notes & the near future

Why automatic transcription of video files?
• Accessibility
• Searching into a video file
• Searching into a video repository
• Topic identification
• …and much more

The transLectures project
• Development of an engine for Automated Speech Recognition (ASR) for lectures & educational content
• Development of translation tools for that content
• Implementation
• Case studies: Videolectures.NET & Polimedia (UPV video repository)
• Real-life evaluation
• Integration into Opencast

http://www.translectures.eu

transLectures partners

12 Nov 2013

No.  Name                                          Country
1    Universitat Politècnica de València (MLLP)    Spain
2    Xerox SAS                                     France
3    Institut Jožef Stefan                         Slovenia
3+   Knowledge for All Foundation                  UK
4    RWTH Aachen University                        Germany
5    EML – European Media Laboratory               Germany
6    DDS – Deluxe Digital Studios                  UK

36 Months

November 2014

Statistical transcription (and translation)

[Diagram: Sound -> ASR Engine (Acoustic Model + Language Model) -> Transcription]

Statistical transcription (and translation)

[Diagram: Manually transcribed voice -> Modeling Engine -> Acoustic Model + Language Model]
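As general background (not a description of the transLectures engine's internals), the usual statistical formulation of speech recognition combines the acoustic model and the language model in a single decision rule. The symbols are assumptions introduced here for illustration: x is the sequence of acoustic features extracted from the sound, and w is a candidate word sequence.

```latex
\hat{w} \;=\; \operatorname*{arg\,max}_{w} \; p(w \mid x)
        \;=\; \operatorname*{arg\,max}_{w} \;
              \underbrace{p(x \mid w)}_{\text{acoustic model}} \;
              \underbrace{p(w)}_{\text{language model}}
```

The second equality follows from Bayes' rule, since p(x) does not depend on w. Both models are estimated from data, which is where manually transcribed speech comes in.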

Architecture of transLectures

[Diagram elements: Lecture, Slides, Extra content, Language Model, Transcription, Translation, Intelligent interaction, Result]

Languages

• Transcription (ASR)
  • EN
  • SL
  • ES

• Translation (MT)
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE

Transcription and Translation Platform

Transcription and Translation Platform API

Transcription and Translation Platform
• Post-editing web interface (in HTML5)

Example video
• https://media.upv.es/?id=b444d12e-db23-9a4f-9b3b-d1d9275d4cb4

Scientific evaluations
• WER = Word Error Rate
• The lower the better
• Usually, a human transcriber has a WER of around 12%
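WER is conventionally computed as (S + D + I) / N: the minimum number of word substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that definition, assuming plain whitespace tokenization and no text normalization; the function name and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the lecture starts at nine",
                      "the lecture start at nine"))  # 20.0
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.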

Beyond transLectures

WER

Language     M10    M17
Dutch        25.7   24.5
Italian      21.2   17.7
Portuguese   45.9   43.0
Spanish      15.9   14.4
Estonian     N/A    27.1
French       N/A    22.7

The Opencast Community is…
Universities, companies and people:
• concerned with academic video
• attracted to the Opencast values of openly exchanging ideas, experience, knowledge and code
• committed to building and maintaining a robust, flexible, high-quality open source lecture capture and academic video management solution.

Now also part of

Full-featured Lecture Recording ecosystem

Who uses Opencast?
Around the world, with strong adoption in Europe especially.

43 Adopters with public information (May 2014)

30+ commercial partner clients

http://opencast.org/matterhorn-adopters

Yesterday’s tweet

Indexing in Opencast
• Opencast has built-in OCR indexing capabilities

Video (slides) -> OCR (hunspell) -> Word list filter -> Apache Lucene search server

• New operations can be added

Video (slides) -> transcription (tL) -> Apache Lucene search server

or

Video (slides) -> OCR (hunspell) -> transcription (tL) -> Word list filter -> Apache Lucene search server
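To illustrate why a transcription step helps search: once each transcript segment carries a start time, an index can map a word to the moments in a video where it is spoken. The Python below is a toy in-memory stand-in for that idea, not Opencast or Lucene code. The segment format, the stop-word list (standing in for the word list filter) and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stop-word list standing in for the "word list filter" step.
STOP_WORDS = {"the", "a", "of", "is", "and", "to", "in"}


def build_index(segments):
    """Map each kept word to a sorted list of (video_id, start_seconds)."""
    index = defaultdict(set)
    for video_id, start_seconds, text in segments:
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word and word not in STOP_WORDS:
                index[word].add((video_id, start_seconds))
    return {word: sorted(hits) for word, hits in index.items()}


def search(index, word):
    """Return every (video_id, start_seconds) where the word was spoken."""
    return index.get(word.lower(), [])


# Made-up transcript segments: (video id, start time in seconds, text).
segments = [
    ("lecture-1", 0.0, "Welcome to the course."),
    ("lecture-1", 12.5, "The Fourier transform is linear."),
    ("lecture-2", 3.0, "Recall the Fourier series."),
]
index = build_index(segments)
print(search(index, "Fourier"))  # [('lecture-1', 12.5), ('lecture-2', 3.0)]
```

Keeping the start time with each hit is what lets a player jump to the matching point inside a video rather than only listing the video.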

Why do I need an indexing server?
• Powerful, Accurate and Efficient Search Algorithms
  • ranked searching -- best results returned first
  • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  • fielded searching (e.g. title, author, contents)
  • sorting by any field
  • multiple-index searching with merged results
  • allows simultaneous update and searching
  • flexible faceting, highlighting, joins and result grouping
  • fast, memory-efficient and typo-tolerant suggesters
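"Ranked searching" means each document gets a relevance score for the query and results are returned best first. The sketch below shows the idea with a simplified TF-IDF score in plain Python. It is a stand-in for illustration only: Lucene's actual scoring is more elaborate, and the function names and sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip common punctuation."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w]


def rank(documents, query):
    """Return (doc_id, score) pairs with a positive score, best first."""
    counts = {doc_id: Counter(tokenize(text))
              for doc_id, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in counts.values() if term in c)
            if df == 0:
                continue
            idf = math.log(1 + n_docs / df)  # rarer terms weigh more
            score += doc_counts[term] * idf  # term frequency times idf
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "v1": "Speech recognition for lectures",
    "v2": "Speech recognition and speech translation for lectures",
    "v3": "Video management",
}
for doc_id, score in rank(docs, "speech translation"):
    print(doc_id, round(score, 3))
```

Here "v2" ranks above "v1" because it contains "speech" twice and is the only document containing "translation"; "v3" matches neither term and is left out.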

Demo on searching
• https://media.upv.es

Notes & the near future
• ASR technology is good enough for automated transcription of videos

… given good enough sound

• There are lecture recording systems that allow transcriptions to be plugged in for searching

…like Opencast

• There are still things to solve
  • Transcription speed (making good progress)
  • Topic identification
  • Adding more languages

Thanks!
Questions?

Learning more…

transLectures
http://translectures.eu

Video in a multilingual context (EMMA)
http://association.media-and-learning.eu/portal/resource/ml-webinar-video-multilingual-context

Opencast State of the Project
http://lanyrd.com/2015/apereo/sdmpry/
