Advanced language technologies enhance audiovisual translation production
Tiina Lindh-Knuutila

Audiovisual translation - New Trends in Translation Technology
17 May 2017, Tallinn
Case: Subtitling services for YLE
● Lingsoft has provided subtitling for the hard-of-hearing for the Finnish broadcasting company Yle since 2011, in both Finnish and Swedish
Methodologies in use
● Respeaking
● Automatic speech recognition
● Post-processing
● Realignment and automatic subtitling
Respeaking?
● Repeating what is being said on a program
  ○ clean audio for automatic speech recognition (ASR)
● Leave out speaker hesitations
● Possibly summarize what is being said
● Within a language or between languages
● Live, or for background ASR
  ○ if not live, possibility to slow down the program video to keep up
ASR + post-process
“Nice to see all of you here!”
© Lingsoft Oy
Subtitling process 0.0
Program broadcast
Subtitling
Program: 1 h
Subtitling: 13 h?
(timeline from 00:00:00 to 13:00:00)
Program broadcast
Respeaking
Automatic speech recognition
Automatic post-processing and realignment
Finishing and proofreading
Broadcast: 1 h
Language technology assisted subtitling process: ~9 h
Subtitling process 1.0
Otto huomasi , että uudessa kotipaikassa oltiin avuliaita . Korjaamokarhu auttoi isää korjaamaan jäätelöauton . . . Kauppiaskarhu auttoi äitiä pakkaamaan ostokset . . . Ja poliisikarhu auttoi väkeä pääsemään kadun yli. Jopa naapurissa asui avuliaita karhuja.

(In English: "Otto noticed that people in the new home town were helpful. The repair-shop bear helped dad fix the ice cream van... The shopkeeper bear helped mum pack the shopping... And the police bear helped people get across the street. Even next door lived helpful bears.")
4
00:00:47,360 --> 00:00:52,840
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.

5
00:00:55,400 --> 00:01:01,800
Korjaamokarhu auttoi isää korjaamaan
jäätelöauton... Kauppiaskarhu auttoi -

6
00:01:01,880 --> 00:01:09,760
äitiä pakkaamaan ostokset... Ja poliisikarhu
auttoi väkeä pääsemään kadun yli.

7
00:01:12,240 --> 00:01:15,040
Jopa naapurissa asui avuliaita karhuja.
15
10:00:48,10 --> 10:00:53,05
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.

16
10:00:55,15 --> 10:00:59,10
Korjaamokarhu auttoi isää
korjaamaan jäätelöauton.

17
10:01:00,10 --> 10:01:04,05
Kauppiaskarhu auttoi äitiä
pakkaamaan ostokset.

18
10:01:06,10 --> 10:01:10,05
Ja poliisikarhu auttoi väkeä
pääsemään kadun yli.

19
10:01:12,10 --> 10:01:16,03
Jopa naapurissa
asui avuliaita karhuja.
Raw text = respoken text after automatic speech recognition
Pre-subtitling = raw text after automatic post-processing and re-alignment
Finished subtitling
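The finished subtitles above follow the standard SubRip (.srt) cue layout: a running index, a start/end timestamp pair, and one or two lines of text. A minimal sketch of reading such cues in Python (a hypothetical helper for illustration, not part of Lingsoft's pipeline):

```python
import re

# One SubRip cue: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text
# lines up to a blank line or end of input.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)

def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT data."""
    return [
        (int(i), start, end, body.strip())
        for i, start, end, body in CUE_RE.findall(text)
    ]

sample = """4
00:00:47,360 --> 00:00:52,840
Otto huomasi, että uudessa
kotipaikassa oltiin avuliaita.
"""

for index, start, end, body in parse_srt(sample):
    print(index, start, "->", end)
```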
From speech to subtitles
The ASR output can be analyzed and post-processed with language analysis tools:
● Sub-word units (morphs) produced by the ASR are joined into words
● Numbers are contracted into digit form according to rules
● Punctuation and capitalization are added with the help of a parser
● Sentence borders are recognized to identify changes of speaker
● Phrases are recognized within sentences to find places where a sentence can be split into lines or several blocks
● Collocations that need to be kept together are identified: 'President Wilson', 'in the White House'
● Punctuation not automatically provided by the ASR, or lacking from the respoken audio, is added and fixed
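The first step, joining sub-word units back into words, can be sketched as follows. The "+" boundary marker is an assumption made for illustration; morph-based ASR systems mark word-internal units in various ways:

```python
def join_morphs(tokens, boundary="+"):
    """Join sub-word units back into words.

    Assumes the ASR marks a morph that continues the previous one with a
    leading boundary marker, e.g. ["jäätelö", "+auto", "+n"] becomes
    "jäätelöauton". The marker convention is an assumption; real systems
    differ.
    """
    words = []
    for tok in tokens:
        if tok.startswith(boundary) and words:
            # Continuation morph: glue onto the previous word.
            words[-1] += tok[len(boundary):]
        else:
            # Word-initial morph: start a new word.
            words.append(tok)
    return words

print(join_morphs(["jäätelö", "+auto", "+n", "korjattiin"]))
# -> ['jäätelöauton', 'korjattiin']
```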
Linguistic postprocessing
Case SVT
Live subtitling is challenging
Velotype keyboard proficiency takes a long time to acquire
⇒ Teach the subtitlers to respeak instead of typing
Goals
• Meet SVT's obligation to subtitle 80 % of live TV shows by 2019 (65 % by 2016)
• Create a speech recognition prototype for subtitling of live weather forecasts
• High-quality speech recognition in a limited domain
• Implement speech recognition in the subtitling workflow to facilitate and speed up subtitlers' work

Benefits
• Improved process, better productivity: "more with less"
• Readiness to implement ASR for new domains and purposes
• The 80 % subtitling obligation within reach
SVT Goals and benefits
1. Cleaned language resources for the Swedish National Language Resource Bank
• Text corpora
• Audio corpora
• Pronunciation resources
• Additional lexical resources, e.g. word lists

2. Lingsoft Speech Recognition Platform (LSRP)
• Speaker-dependent speech recognition for the weather forecast domain in Swedish
• "Current events" language model adaptation with topical word lists
• Access to speech recognition resources (audio, text)
• Statistics of speech recognition tasks
• Services through customer APIs

3. Testing
• Lingsoft integration test plan and results
• Test user interface as reference implementation for the APIs
Project outcomes
Quality requirements
• Latency of 2.5 seconds → Yes
• 96 % accuracy (WER max. 4 %) → achieved 2.7 %, with noise 3.3 %
• Voice commands → Yes
• Background noise ≤ 10 dB allowed → as recorded by SVT
• Word confidence → Yes
• n-best lists → Yes
• Baseline word count of > 75 000 words in base forms → Yes
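The 96 % accuracy target corresponds to a word error rate (WER) of at most 4 %. WER is conventionally the word-level Levenshtein distance between the ASR hypothesis and a reference transcript, divided by the reference length. A minimal sketch (illustrative only, not SVT's or Lingsoft's evaluation code), using the earlier "Nice to see all of you here" example:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution / match
            )
    return dp[-1][-1] / len(ref)

# Hypothesis drops the word "of": 1 edit over 7 reference words.
wer = word_error_rate("nice to see all of you here",
                      "nice to see all you here")
print(round(wer, 3))  # -> 0.143
```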
Effort and schedule
• ~1000 work days
• All requirements met on time and within budget
Project metrics
Speaker diarization and identification
Machine translation of subtitles
Automatic sentence compression for subtitling
Semantic enrichment of subtitles
New possibilities
MeMAD Project
@memadproject
MeMAD project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 780069. This presentation has been produced by the MeMAD project. The content in this presentation represents the views of the authors, and the European Commission has no liability in respect of the content.
MeMAD Methods for Managing Audiovisual Data
MeMAD consortium
Aalto University, Finland (Coordinator)
University of Helsinki, Finland
EURECOM, France
University of Surrey, UK
Yle, Finnish broadcasting company, Finland
Limecraft, Belgium
Lingsoft, Finland
Institut national de l'audiovisuel (INA), France
Revolutionizing digital storytelling
WHY?
Audiovisual media content is an essential resource of modern history
The amount of audiovisual content is huge and growing
To fully benefit from multilingual audiovisual content, we need efficient tools to make visual content accessible
Revolutionizing digital storytelling
WHAT?
New methods that help to translate moving images and sounds into words
Methods developed in MeMAD will help us to manage large amounts of audiovisual data cost efficiently
Revolutionizing digital storytelling
FOR WHOM?
For anyone using audiovisual content!
Professionals who work in the Creative Industries will receive new methods for video management and digital storytelling.
Visually and hearing-impaired people, among others, will have better access to video content.
Revolutionizing digital storytelling
HOW?
Advanced methodologies and novel approaches:
Machine learning
Automatic Speech Recognition
Machine Translation
Audio and Video event detection
Semantic Description
Human Audio Description
Use case examples
The user can
1. discover media content about a specific theme, person, or place
2. get the right parts from the program
The user “Kalle” is studying furniture design and is interested in seeing video content about furniture designed in the 1970s.
The user “Eeva” watches the evening news, but would like to skip topics that she is not interested in.
Use case examples
The production team can

1. ingest, organize and edit new footage
2. discover archived content
3. manage material and footage between multiple production parties
A nature documentary production team returns with a large collection of raw footage, which they ingest into the production system. The system indexes the files so that the production team can move on with scripting and editing their program. Typically the amount of media is quite large, but the production schedule is not as tight as in day-to-day news production.
Use case examples
The user can get enriched content such as
1. relevant media content
2. more details and background information
3. fact checking of what is being said
4. relevant advertising
The end-user "Rosa" is watching a program about animals in the Sahara. The program shows all kinds of exotic animals that Rosa has not seen before and would like to know more about. Conveniently, the on-demand service displays, next to the video picture, information about the currently visible objects, such as ants, birds and plants.
Use case examples
Professionals working on live or semi-live subtitling or audio description benefit from improved work processes
Different audiences benefit from improved subtitles, audio description and (machine) translated versions of the subtitles
Yle staff member "Teppo" is subtitling a live broadcast/webcast on election night.
“Laura” lives in Belgium but would like to watch content created in other parts of Europe, even content produced in other languages. Thanks to automatically translated subtitles or audio descriptions, Laura can experience content otherwise inaccessible to her.
Tiina Lindh-Knuutila, [email protected]
www.lingsoft.fi
Eteläranta 10, 00130 Helsinki, Finland
Kauppiaskatu 5 A, 20100 Turku, Finland
tel. 02 2793300