2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

DESCRIPTION

Presentation by Diego Bartolomé, CEO of tauyou, at the TAUS Executive Forum held in Barcelona in 2011. It offers insight into handling the data and linguistic issues that arise when building machine translation solutions for language service providers.

Page 1: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

© 2011 #1

language technology for optimum localization

Diego Bartolomé, CEO

Page 2: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

optimum workflow

gather in-domain data

train the translation solution

enrich solution with related text

terminology prioritization

update the translation solution

add rules to enhance quality

weekly updates

Page 3: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

data issues 1

<large volume of heterogeneous data>

training with all the data

semantic classification for domain selection

fine tuning for each client

glossary prioritization

continuous machine learning
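
Glossary prioritization can be read as a lookup order: when several glossaries translate the same term, the most client-specific one wins. The slide does not say how this is implemented; what follows is a minimal sketch with hypothetical glossaries and a hypothetical `lookup` helper.

```python
from collections import ChainMap

# Hypothetical glossaries, ordered from most to least specific.
client_glossary = {"driver": "controlador"}
domain_glossary = {"driver": "conductor", "engine": "motor"}
generic_glossary = {"engine": "máquina", "wheel": "rueda"}

# ChainMap searches its maps left to right, so earlier glossaries
# take priority whenever the same term appears in more than one.
glossary = ChainMap(client_glossary, domain_glossary, generic_glossary)


def lookup(term):
    """Return the highest-priority translation for a term, or None."""
    return glossary.get(term.lower())
```

Here `lookup("driver")` resolves to the client entry, while `lookup("wheel")` falls through to the generic glossary.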

Page 4: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

data issues 2

<scarce data>

add dictionaries into corpora

complementary segments from memories

balance client data with generic texts

in-domain adaptation of generic system

increase the number of sentences with rules
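
One possible way to balance scarce client data with generic texts is to repeat the client segments until they reach a chosen share of the training mix. This is an assumption about the technique, not the method described in the talk; `balance_corpus` and `client_share` are illustrative names.

```python
import math


def balance_corpus(client_pairs, generic_pairs, client_share=0.5):
    """Oversample client pairs so they form at least client_share of the mix."""
    if not client_pairs:
        raise ValueError("client_pairs must not be empty")
    if not 0 < client_share < 1:
        raise ValueError("client_share must be strictly between 0 and 1")
    c, g = len(client_pairs), len(generic_pairs)
    # Smallest integer k >= 1 such that c*k / (c*k + g) >= client_share.
    k = max(1, math.ceil(client_share * g / ((1 - client_share) * c)))
    return list(client_pairs) * k + list(generic_pairs)
```

With 2 client pairs and 10 generic pairs at a 0.5 share, each client pair is repeated 5 times, giving a 20-pair corpus split evenly.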

Page 5: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

data issues 3

<dirty data>

remove multiple translations

eliminate text in other languages

correct spelling

select sentences with correct grammar

automatic alignment with client terminology

filter out other undesired segments
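
Some of the cleaning steps above can be sketched for a list of (source, target) segment pairs. This sketch assumes that "remove multiple translations" means keeping only the most frequent target per source, and that "undesired segments" include empty ones and pairs with an extreme length ratio. Language identification, spelling and grammar checks need external tools and are left out. `clean_pairs` and `max_ratio` are hypothetical names.

```python
from collections import Counter, defaultdict


def clean_pairs(pairs, max_ratio=3.0):
    """Drop empty or badly length-mismatched pairs, then keep one target per source."""
    by_source = defaultdict(Counter)
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # empty segment on either side
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths differ too much to be a plausible translation
        by_source[src][tgt] += 1
    # Keep the most frequent translation of each source segment.
    return [(src, targets.most_common(1)[0][0])
            for src, targets in by_source.items()]
```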

Page 6: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

data issues 4

<data creation and enhancement>

final client defined

unaligned translated documents

generic translations

optimum corpus/memories creation

rule-based extension/filtering

Page 7: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

linguistic issues 1

<untranslated words>

dictionary creation

<grammatical errors>

post-processing rules

<blind quality filtering>

do not translate sentences below threshold
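
Blind quality filtering could look like the sketch below: each machine translation comes with a confidence score computed without a reference translation, and low-scoring segments are left untranslated for the human translator. How the score is obtained is not stated in the slides, so it is taken as a given input here; `filter_by_confidence` and the 0.6 default are assumptions.

```python
def filter_by_confidence(scored_segments, threshold=0.6):
    """Keep machine output only when its confidence reaches the threshold.

    scored_segments: iterable of (source, machine_translation, score).
    Returns a list of (source, translation_or_None); None means the segment
    is handed over untranslated.
    """
    return [
        (source, translation if score >= threshold else None)
        for source, translation, score in scored_segments
    ]
```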

Page 8: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

linguistic issues 2

<source text cleaning>

spelling and grammar

sentence simplification

terminology homogenization

<special words detection>

people, places, organizations

alphanumeric codes
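
A common way to protect alphanumeric codes is to replace them with placeholders before translation and restore them afterwards. The sketch below assumes a "code" is a token mixing letters and digits, optionally with hyphens (such as `AB-1234`); the pattern and the `__CODEn__` placeholder format are illustrative choices, not taken from the talk. Detecting people, places and organizations would need a named-entity recognizer and is not covered.

```python
import re

# Assumption: a code contains at least one letter and at least one digit.
CODE_RE = re.compile(
    r"\b(?=[A-Za-z0-9-]*[A-Za-z])(?=[A-Za-z0-9-]*\d)"
    r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\b"
)


def mask_codes(text):
    """Replace each code with a numbered placeholder; return (masked_text, codes)."""
    codes = []

    def replace(match):
        codes.append(match.group(0))
        return f"__CODE{len(codes) - 1}__"

    return CODE_RE.sub(replace, text), codes


def unmask_codes(text, codes):
    """Put the original codes back in place of their placeholders."""
    for index, code in enumerate(codes):
        text = text.replace(f"__CODE{index}__", code)
    return text
```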

Page 9: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

use case

<recurrent small volumes>

frequent translations

clients from different domains

<workflow>

gather as much data as possible

receive a new file for translation

create an ad hoc domain for that file

train the translation solution + basic rules

<output>

optimum adaptation for a file in around 4 hours
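
Creating an ad hoc domain for an incoming file might, under one reading, mean selecting the stored segments that most resemble that file before training. The sketch below ranks segment pairs by the share of their source words that also occur in the new file. This simple vocabulary-overlap measure is an assumption, as are the names `select_in_domain` and `top_n`.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def select_in_domain(new_file_text, corpus_pairs, top_n=1000):
    """Return the top_n (source, target) pairs closest to the new file's vocabulary."""
    vocabulary = tokens(new_file_text)

    def overlap(pair):
        source_tokens = tokens(pair[0])
        if not source_tokens:
            return 0.0
        return len(source_tokens & vocabulary) / len(source_tokens)

    # sorted() is stable, so equally scored pairs keep their original order.
    return sorted(corpus_pairs, key=overlap, reverse=True)[:top_n]
```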

Page 10: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

<tauyou_text> snapshot

<fully customizable>

look and feel + functionality

Page 11: 2011 TAUS Executive Forum Barcelona: Language Technology for optimum localization

Thanks!

// Diego Bartolomé, PhD

<address> C/ Les Planes 39 – 08201 Sabadell – Spain

<phone> +34 93 711 29 96

<cell> +34 670 331 225

<email> [email protected]

<www> tauyou.com