Advanced language technologies enhance audiovisual translation production
Tiina Lindh-Knuutila

Audiovisual translation - New Trends in Translation Technology
17 May 2017, Tallinn
Case: Subtitling services for YLE
● Lingsoft has provided subtitling for the hard-of-hearing for the Finnish broadcasting company Yle since 2011, in both Finnish and Swedish
Methodologies in use
● Respeaking
● Automatic speech recognition
● Post-processing
● Realignment and automatic subtitling
Respeaking?
● Repeating what is being said on a program
  ○ clean audio for automatic speech recognition (ASR)
● Leave out speaker hesitations
● Possibly summarize what is being said
● Within a language or between languages
● Live, or for background ASR
  ○ if not live, possibility to slow down the program video to keep up
ASR + post-process
“Nice to see all of you here!”
© Lingsoft Oy
Subtitling process 0.0
Program broadcast
Subtitling
Program: 1 h
Subtitling: 13 h?
(timeline from 00:00:00 to 13:00:00)
Program broadcast
Respeaking
Automatic speech recognition
Automatic post-processing and realignment
Finishing and proofreading
Broadcast: 1 h
Language technology assisted subtitling process: ~9 h
Subtitling process 1.0
Otto huomasi , että uudessa kotipaikassa oltiin avuliaita . Korjaamokarhu auttoi isää korjaamaan jäätelöauton . . . Kauppiaskarhu auttoi äitiä pakkaamaan ostokset . . . Ja poliisikarhu auttoi väkeä pääsemään kadun yli. Jopa naapurissa asui avuliaita karhuja.

(In English: "Otto noticed that people in the new home town were helpful. The repair-shop bear helped dad fix the ice cream van... The shopkeeper bear helped mum pack the shopping... And the police bear helped people get across the street. Even next door lived helpful bears.")
4
00:00:47,360 --> 00:00:52,840
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.

5
00:00:55,400 --> 00:01:01,800
Korjaamokarhu auttoi isää korjaamaan
jäätelöauton... Kauppiaskarhu auttoi -

6
00:01:01,880 --> 00:01:09,760
äitiä pakkaamaan ostokset... Ja poliisikarhu
auttoi väkeä pääsemään kadun yli.

7
00:01:12,240 --> 00:01:15,040
Jopa naapurissa asui avuliaita karhuja.
15
10:00:48,10 --> 10:00:53,05
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.

16
10:00:55,15 --> 10:00:59,10
Korjaamokarhu auttoi isää
korjaamaan jäätelöauton.

17
10:01:00,10 --> 10:01:04,05
Kauppiaskarhu auttoi äitiä
pakkaamaan ostokset.

18
10:01:06,10 --> 10:01:10,05
Ja poliisikarhu auttoi väkeä
pääsemään kadun yli.

19
10:01:12,10 --> 10:01:16,03
Jopa naapurissa
asui avuliaita karhuja.
Raw text = respoken text after automatic speech recognition
Pre-subtitling = raw text after automatic post-processing and re-alignment
Finished subtitling
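The finished subtitles above follow the standard SubRip (.srt) cue layout: a running index, a start/end timestamp pair, and one or two lines of text. A minimal sketch of reading such cues in Python (a hypothetical helper for illustration, not part of Lingsoft's pipeline):

```python
import re

# One SubRip cue: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text
# lines up to a blank line or end of input.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)

def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT data."""
    return [
        (int(i), start, end, body.strip())
        for i, start, end, body in CUE_RE.findall(text)
    ]

sample = """4
00:00:47,360 --> 00:00:52,840
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.
"""

for index, start, end, body in parse_srt(sample):
    print(index, start, "->", end)
```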
From speech to subtitles
The ASR output can be analyzed and post-processed with language analysis tools:
● Sub-word units (morphs) produced by the ASR are joined into words
● Numbers are contracted into digit form according to rules
● Punctuation and capitalization are added with the help of a parser
● Sentence borders are recognized to identify changes of speaker
● Phrases are recognized within sentences to find places where a sentence can be split into lines or several blocks
● Collocations that need to be kept together are identified: 'President Wilson', 'in the White House'
● Punctuation not automatically provided by the ASR, or lacking from the respoken audio, is added and fixed
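The first step, joining sub-word units back into words, can be sketched as follows. The "+" boundary marker is an assumption made for illustration; morph-based ASR systems mark word-internal units in various ways:

```python
def join_morphs(tokens, boundary="+"):
    """Join sub-word units back into words.

    Assumes the ASR marks a morph that continues the previous one with a
    leading boundary marker, e.g. ["jäätelö", "+auto", "+n"] becomes
    "jäätelöauton". The marker convention is an assumption; real systems
    differ.
    """
    words = []
    for tok in tokens:
        if tok.startswith(boundary) and words:
            # Continuation morph: glue onto the previous word.
            words[-1] += tok[len(boundary):]
        else:
            # Word-initial morph: start a new word.
            words.append(tok)
    return words

print(join_morphs(["jäätelö", "+auto", "+n", "korjattiin"]))
# -> ['jäätelöauton', 'korjattiin']
```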
Linguistic postprocessing
Case SVT
Live subtitling is challenging
Velotype keyboard proficiency takes a long time to acquire
⇒ Teach the subtitlers to respeak instead of typing
Goals
• Meet SVT's obligation to subtitle 80 % of live TV shows by 2019 (65 % by 2016)
• Create a speech recognition prototype for subtitling of live weather forecasts
• High-quality speech recognition in a limited domain
• Implement speech recognition in the subtitling workflow to facilitate and speed up subtitlers' work

Benefits
• Improved process, better productivity: "more with less"
• Readiness to implement ASR for new domains and purposes
• The 80 % subtitling obligation within reach
SVT Goals and benefits
1. Cleaned language resources for the Swedish National Language Resource Bank
• Text corpora
• Audio corpora
• Pronunciation resources
• Additional lexical resources, e.g. word lists

2. Lingsoft Speech Recognition Platform (LSRP)
• Speaker-dependent speech recognition for the weather forecast domain in Swedish
• "Current events" language model adaptation with topical word lists
• Access to speech recognition resources (audio, text)
• Statistics of speech recognition tasks
• Services through customer APIs

3. Testing
• Lingsoft integration test plan and results
• Test user interface as reference implementation for the APIs
Project outcomes
Quality requirements
• Latency of 2.5 seconds → Yes
• 96 % accuracy (WER max. 4 %) → achieved 2.7 %, with noise 3.3 %
• Voice commands → Yes
• Background noise ≤ 10 dB allowed → as recorded by SVT
• Word confidence → Yes
• n-best lists → Yes
• Baseline word count of > 75 000 words in base forms → Yes
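The 96 % accuracy target corresponds to a word error rate (WER) of at most 4 %. WER is conventionally the word-level Levenshtein distance between the ASR hypothesis and a reference transcript, divided by the reference length. A minimal sketch (illustrative only, not SVT's or Lingsoft's evaluation code), using the earlier "Nice to see all of you here" example:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution / match
            )
    return dp[-1][-1] / len(ref)

# Hypothesis drops the word "of": 1 edit over 7 reference words.
wer = word_error_rate("nice to see all of you here",
                      "nice to see all you here")
print(round(wer, 3))  # -> 0.143
```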
Effort and schedule
• ~1000 work days
• All requirements met on time and within budget
Project metrics
Speaker diarization and identification
Machine translation of subtitles
Automatic sentence compression for subtitling
Semantic enrichment of subtitles
New possibilities
MeMAD Project
@memadproject
MeMAD project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 780069. This presentation has been produced by the MeMAD project. The content in this presentation represents the views of the authors, and the European Commission has no liability in respect of the content.
MeMAD Methods for Managing Audiovisual Data
MeMAD consortium
Aalto University, Finland (Coordinator)
University of Helsinki, Finland
EURECOM, France
University of Surrey, UK
Yle, Finnish broadcasting company, Finland
Limecraft, Belgium
Lingsoft, Finland
Institut national de l'audiovisuel (INA), France
Revolutionizing digital storytelling
WHY?
Audiovisual media content is an essential resource of modern history
The amount of audiovisual content is huge and growing
To fully benefit from multilingual audiovisual content, we need efficient tools to make visual content accessible
Revolutionizing digital storytelling
WHAT?
New methods that help to translate moving images and sounds into words
Methods developed in MeMAD will help us to manage large amounts of audiovisual data cost efficiently
Revolutionizing digital storytelling
FOR WHOM?
For anyone using audiovisual content!
Professionals who work in the Creative Industries will receive new methods for video management and digital storytelling.
Visually and hearing-impaired people, among others, will have better access to video content.
Revolutionizing digital storytelling
HOW?
Advanced methodologies and novel approaches:
Machine learning
Automatic Speech Recognition
Machine Translation
Audio and Video event detection
Semantic Description
Human Audio Description
Use case examples
The user can
1. discover media content about a specific theme, person, or place
2. get the right parts from the program
The user “Kalle” is studying furniture design and is interested in seeing video content about furniture designed in the 1970s.
The user “Eeva” watches the evening news, but would like to skip topics that she is not interested in.
Use case examples
The production team can

1. ingest, organize and edit new footage
2. discover archived content
3. manage material and footage between multiple production parties
A nature documentary production team returns with a large collection of raw footage, which they ingest into the production system. The system indexes the files so that the production team can move on with scripting and editing their program. Typically the amount of media is quite large, but the production schedule is not as tight as in day-to-day news production.
Use case examples
The user can get enriched content such as
1. relevant media content
2. more details and background information
3. fact checking of what is being said
4. relevant advertising
The end-user "Rosa" is watching a program about animals in the Sahara. The program shows all kinds of exotic animals that Rosa has not seen before and would like to know more about. Conveniently, the on-demand service displays, next to the video picture, information about the currently visible objects, such as ants, birds and plants.
Use case examples
Professionals working on live or semi-live subtitling or audio description benefit from improved work processes
Different audiences benefit from improved subtitles, audio description and (machine) translated versions of the subtitles
Yle staff member "Teppo" is subtitling a live broadcast/webcast on election night.
“Laura” lives in Belgium but would like to watch content created in other parts of Europe, even content produced in other languages. Thanks to automatically translated subtitles or audio descriptions, Laura can experience content otherwise inaccessible to her.
Tiina Lindh-Knuutila, [email protected]
www.lingsoft.fi
Eteläranta 10, 00130 Helsinki, Finland
Kauppiaskatu 5 A, 20100 Turku, Finland
tel. 02 2793300