DESCRIPTION
Presentation by CPSL and tauyou at the tekom annual conference. It presents a case of successful implementation of machine translation at a mid-size language service provider.
Speaker: Belén García-Ochoa (CPSL)
Co-speaker: Diego Bartolomé (tauyou <language technology>)
Implementation of a Machine Translation Engine at CPSL
The speaker
Localization Director at CPSL
CPSL has been a multilingual service provider since 1963
Headquarters in Barcelona, Spain
Other offices in:
Madrid, Spain
Germany
UK
CPSL staff includes over 50 people
Belén García-Ochoa
The co-speaker
CEO tauyou <language technology>
tauyou has provided language technologies for the localization industry since 2006
Main clients: medium-sized LSPs
Headquarters in Barcelona
Diego Bartolomé
CPSL and Machine Translation
Post-editing services provided to a software company for a huge project
Lots of translated words in a tight timeframe
Main difficulties found
Lots of clients
Different subject matters
Different language combinations
Workaround
Lots of clients:
A list of the most appropriate clients for using the engine was created
Based on this list, we established the different subject matters and the different language combinations
Human post-editing vs. human translation
A translator typically translates 2,500 words per day.
A reviewer of human translation typically reviews 12,000 words per day.
On average, 8,000 words can be post-edited per day.
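To make the comparison concrete, the sketch below turns the three daily figures from the slide into person-days for a project. It assumes that human translation is followed by a review pass, that post-edited output receives no separate review, and that machine translation time is negligible; none of these assumptions is stated in the talk. The function names and the 120,000-word project size are hypothetical.

```python
# Daily throughput figures quoted on the slide (words per day).
TRANSLATION_WPD = 2_500
REVIEW_WPD = 12_000
POST_EDITING_WPD = 8_000


def person_days(words: int, words_per_day: int) -> float:
    """Person-days needed to process `words` at a given daily throughput."""
    return words / words_per_day


def compare(words: int) -> dict[str, float]:
    """Effort for human translation plus review versus post-editing only.

    Assumption: post-edited text is not reviewed again and MT time is ignored.
    """
    human = person_days(words, TRANSLATION_WPD) + person_days(words, REVIEW_WPD)
    post_edit = person_days(words, POST_EDITING_WPD)
    return {"human_translation_and_review": human, "post_editing": post_edit}


if __name__ == "__main__":
    # 120,000 words is a hypothetical project size, not a figure from the talk.
    for name, days in compare(120_000).items():
        print(f"{name}: {days:.1f} person-days")
```

Under these assumptions a 120,000-word project takes 58 person-days with translation plus review and 15 person-days with post-editing.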
The tauyou solution
Dedicated hybrid machine translation engine that is continuously customized
Corpus-based with rules for pre- and post-processing
Data confidentiality is guaranteed
Translation speed
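The hybrid design named on this slide, a corpus-based engine wrapped in pre- and post-processing rules, could look roughly like the sketch below. It is not tauyou's implementation: the rules, the `toy_engine` word lookup, and all function names are hypothetical stand-ins chosen only to show where the rules sit relative to the engine.

```python
import re
from typing import Callable

Rule = tuple[re.Pattern[str], str]

# Hypothetical rules; real ones would depend on the language pair and client.
PRE_RULES: list[Rule] = [
    (re.compile(r"\s+"), " "),  # collapse runs of whitespace in the source
]
POST_RULES: list[Rule] = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),  # remove space before punctuation
]


def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply each regex rule in order and trim the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()


def translate(source: str, engine: Callable[[str], str]) -> str:
    """Pre-process the source, call the engine, then post-process its output."""
    prepared = apply_rules(source, PRE_RULES)
    raw = engine(prepared)
    return apply_rules(raw, POST_RULES)


def toy_engine(text: str) -> str:
    """Stand-in for the corpus-based engine: a word-for-word lookup."""
    lexicon = {"hello": "hola", "world": "mundo"}
    return " ".join(lexicon.get(tok.lower(), tok) for tok in text.split(" "))


if __name__ == "__main__":
    print(translate("Hello   world !", toy_engine))
```

Keeping the rules outside the engine means they can be added or changed without retraining, which fits the "continuously customized" claim, although the talk does not say how the rules are actually stored or applied.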
Main characteristics
Any type of document
Glossary prioritization
Fast domain creation/update
Fully customizable
Quality metrics computation
Terminology extraction
Optimum domain creation
gather in-domain data
train the translation solution
enrich solution with related text
terminology prioritization
update the translation solution
add rules to enhance quality
weekly updates
CPSL workflow 1
Optimize translation quality for a client
gather client data
train the translation solution
add rules to enhance quality
continuous improvement
CPSL workflow 2
General-purpose translator
gather clients' data
add generic texts to provide a good sample
train the translation solution
add rules to enhance quality
periodic improvement
Other use cases
Data creation and enhancement
user defined
unaligned translated documents
generic translations
optimum corpus/memories creation
rule-based extension/filtering
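As an illustration of what rule-based filtering of a corpus or translation memory might involve, the sketch below drops segment pairs that are empty, duplicated, or badly mismatched in length. The specific rules, the `max_ratio` threshold, and the function name are assumptions made for this example; the talk does not list the filters tauyou uses.

```python
def filter_segments(
    pairs: list[tuple[str, str]], max_ratio: float = 3.0
) -> list[tuple[str, str]]:
    """Keep (source, target) pairs that pass three simple, hypothetical rules."""
    seen: set[tuple[str, str]] = set()
    kept: list[tuple[str, str]] = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Rule 1: both sides must contain text.
        if not source or not target:
            continue
        # Rule 2: character lengths must not differ by more than max_ratio.
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_ratio:
            continue
        # Rule 3: skip exact duplicates of a pair already kept.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        kept.append((source, target))
    return kept


if __name__ == "__main__":
    memory = [
        ("Hello", "Hola"),
        ("Hello", "Hola"),
        ("", "x"),
        ("Hi", "This is far too long to match"),
    ]
    print(filter_segments(memory))
```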
tauyou interface
Tabs can be customized
Quality metrics
Detailed analysis of translated documents
Several customized parameters, including word error rate, number of word edits, tag differences, etc.
Useful in machine translation but also in the regular quality process
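Two of the parameters named on this slide, number of word edits and word error rate, have a standard definition based on word-level Levenshtein distance. The sketch below uses that definition, treating the post-edited text as the reference and the raw machine translation as the hypothesis; whether tauyou computes the metrics exactly this way is an assumption, and the function names are hypothetical.

```python
def word_edits(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word edits divided by the number of words in the reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edits(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    post_edited = "the cat sat on the mat"
    machine_output = "the cat sit on mat"
    print(word_edits(post_edited.split(), machine_output.split()))
    print(round(word_error_rate(post_edited, machine_output), 3))
```

In the example, one substitution ("sit" for "sat") and one missing word ("the") give 2 edits over a 6-word reference, so the word error rate is about 0.333.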
Terminology extraction
Unilingual and bilingual terminology lists
Customized according to position in the sentence, word type, number of words, etc.
Feed the MT engine or tool for human translator
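A very simple form of unilingual terminology extraction counts recurring word sequences and filters them by length and word type. The sketch below does that with a frequency threshold, a maximum number of words, and a small stopword list applied at the edges of each candidate. It is only an illustration under those assumptions: the stopword list, parameters, and function name are hypothetical, and the talk does not describe tauyou's actual method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stopword list.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "for"})


def extract_terms(
    text: str, max_words: int = 2, min_count: int = 2
) -> list[tuple[str, int]]:
    """Return (candidate term, frequency) pairs, most frequent first."""
    counts: Counter[str] = Counter()
    for sentence in re.split(r"[.!?\n]+", text.lower()):
        tokens = re.findall(r"[^\W\d_]+", sentence)  # runs of letters only
        for size in range(1, max_words + 1):
            for start in range(len(tokens) - size + 1):
                gram = tokens[start:start + size]
                # Word-type filter: candidates may not start or end
                # with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = (
        "The translation memory is updated. "
        "The translation memory feeds the engine."
    )
    print(extract_terms(sample))
```

The resulting list could then be reviewed by a terminologist before it is fed to the MT engine or handed to human translators.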
The future
Increase usage of translation memories
Automatic domain classification
Source text enhancement: spelling, grammar, structure, terminology ...
Special words detection
New domains/language pairs creation
Questions?
[email protected]@cpsl.com
www.cpsl.com
[email protected]@tauyou.com
www.tauyou.com