www.systransoft.com
Translating Subtitles using Machine Translation: Practices, Problems, Methodology
Elsa Sklavounou, Ph. D.
Linguist, Co-funded Projects Technical Coordinator
SYSTRAN
SYSTRAN MT Customization Methodology: Overview
A customization project involves three customization levels, each providing incrementally higher translation quality:
Basic Terminology
Complex Terminology
Linguistic Rules
SYSTRAN MT Customization Methodology: Overview
Basic Terminology: The first step entails creating a User Dictionary that covers most of the noun terminology in the corpus, as well as various simple adjective and verb terms.
Complex Terminology: The second level concerns the coding of complex terminological entries, such as complex verbs coded with their complements (subject, object…) and their translations.
Linguistic Rules: The third level involves language-specific code modifications in the SYSTRAN linguistic modules.
SYSTRAN MT Customization Methodology: Level 1 & Level 2
Customization levels 1 and 2 focus on implementing the specialized terminology of the corpus in the system. Level 1 and 2 tasks include:
Extraction of simple and complex terms;
Translation of simple and complex terms;
Coding of simple and complex terms;
Review of simple and complex terms.
SYSTRAN MT Customization Methodology: Level 1 & Level 2
Step 1: Corpus installation and analysis
Prerequisite 1: a formatted corpus
Step 2: Term extraction
Simple terms (nouns and noun expressions)
Complex terms (verb patterns)
DNT (Do Not Translate) integration
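Purely as an illustration of what Step 2 produces, candidate terms can be approximated by counting recurring word n-grams in the corpus and keeping the frequent ones for a linguist to review. This is not SYSTRAN's extraction method: the stopword list, the `max_n` and `min_freq` thresholds, and the function names are hypothetical, and a real noun-term extractor would rely on part-of-speech tagging rather than a stopword filter.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stopword list; a real extractor would use POS tags to keep nouns.
STOPWORDS = {
    "the", "a", "an", "of", "to", "and", "or", "in", "on", "for", "is", "are",
    "was", "be", "can", "but", "they", "their", "them", "which", "from", "at",
    "with", "by", "that", "this", "these", "it", "we", "you", "do", "have",
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, keeping internal apostrophes (e.g. don't)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def candidate_terms(
    corpus: list[str], max_n: int = 2, min_freq: int = 2
) -> list[tuple[str, int]]:
    """Return (n-gram, count) pairs seen at least min_freq times, most frequent first."""
    counts: Counter[str] = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams that start or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]
```

The output is only a candidate list; each entry would still be translated, coded and reviewed as described in the Level 1 and 2 tasks.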
SYSTRAN MT Customization Methodology: Level 3
Customization level 3 focuses on implementing linguistic rules specifically adapted to the language-specific syntactic and semantic issues found in translations of corpus sentences. Level 3 tasks include detailed linguistic evaluations and the development of a comprehensive customization plan, covering:
Implementation of customized rules
Regression tests
Correction of linguistic translation errors
Acceptance testing before release
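The regression-test task above can be pictured as re-translating a fixed set of source sentences after each rule change and flagging every output that differs from its approved translation. The sketch below assumes the approved outputs are kept in a plain dictionary and that the engine is any callable from source text to target text; `regression_report` is a hypothetical name, not part of SYSTRAN's tooling.

```python
from __future__ import annotations

from typing import Callable


def regression_report(
    references: dict[str, str], translate: Callable[[str], str]
) -> list[tuple[str, str, str]]:
    """Return (source, approved, current) for every sentence whose output changed."""
    changed: list[tuple[str, str, str]] = []
    for source, approved in references.items():
        current = translate(source)
        if current != approved:
            changed.append((source, approved, current))
    return changed
```

An empty report means the new rules left the reference set untouched; any entry is either a regression to fix or an improvement whose approved translation should be updated.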
SYSTRAN MT Customization Methodology: Quality Levels
Estimate of the quality levels that may be achieved for each customization level.
SYSTRAN MT Customization Methodology: Software Tools
The coding of simple and complex terms, and the related dictionary maintenance, is managed by the SYSTRAN Linguistics Platform, which integrates the following two tools required to complete customization levels 1 and 2.
SYSTRAN MT Customization Methodology: Software Tools
SYSTRAN Dictionary Manager
The SYSTRAN Dictionary Manager (SDM) enables translators to build and manage multilingual dictionaries. SDM includes preparation steps for dictionary coding tasks, an online dictionary lookup (via an HTML interface), and a compiler for runtime machine translation dictionaries. It is composed of three main components: a database, an HTML query form (dictionary lookup, reports, logs, import and export), and a Windows client (interactive coding tool).
SYSTRAN Customization Methodology: Software Tools
The SYSTRAN Review Manager (SRM) is a productivity tool used for the review, quality assessment and maintenance of linguistic resources used in combination with a SYSTRAN system.
SYSTRAN Customization Methodology: Prerequisite 1
A formatted grammatical corpus
Grammar Writing Rules:
  Using Articles
  Avoiding Speech Ambiguity
  Using Enumeration
  Ensuring Subject-Verb Agreement
  Using Prepositions
  Using Infinitives at the Beginning of Sentences
  Using Imperatives
  Observing Punctuation Rules
  Using Main Clauses
  Using Subordinate Clauses
  Using Relative Clauses
  Avoiding Multiple Stacking
  Using Compound Words
  Using Capitalization
  Using Spelling Variations
Lexical Ambiguities:
  Disambiguation of Product Names and Menus
  Avoiding Lexical Ambiguities
  Using Compounds
Format and Typographical Issues:
  Segmentation
SYSTRAN Customization Methodology for MUSA
Corpus generated fully automatically by a two-step process: speech recognition (KU Leuven) and automatic sentence compression (CNTS)
First priority: subtitle constraints
Second priority: the least ambiguous content possible
Lesson learned: no prerequisite
SYSTRAN MT Customization Methodology
Upgraded Software Tools (Client Tools v5)
SYSTRAN Translation Project Manager: Terminology Review
Not Found Words Extraction
Reviewing Terminology and Sentences
The Terminology Review tab in the Review window lets you identify expressions such as Not Found Words or Terminology extracted by the software.
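As a rough illustration of what Not Found Words extraction yields, the sketch below lists source tokens that appear in neither a main dictionary nor a User Dictionary. The plain Python sets standing in for the SYSTRAN and User Dictionaries, the lowercase-key convention and the function name are all assumptions; the actual software works from its own source analysis, not from regex tokens.

```python
from __future__ import annotations

import re


def not_found_words(sentence: str, main_dict: set[str], user_dict: set[str]) -> list[str]:
    """Tokens absent from both dictionaries, in order of first occurrence."""
    found: list[str] = []
    for token in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence):
        key = token.lower()  # both dictionaries are assumed to store lowercase keys
        if key not in main_dict and key not in user_dict and token not in found:
            found.append(token)
    return found
```

Once a flagged word such as an acronym is added to a User Dictionary (or a Do Not Translate list), it no longer shows up in the extraction.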
SYSTRAN Translation Project Manager: Terminology Review
Not Found Words Extraction
Examples
SRC_ID: these parents know measles can be dangerous, but they don't want their child to have MMR, the triple vaccine which protects them from measles, mumps and rubella.
Raw MT: ces parents savent la rougeole peut être dangereuse, mais ils ne veulent pas que leur enfant a MMR, le vaccin triple qui les protège contre la rougeole, les oreillons et la rubéole.
SYSTRAN Translation Project Manager: Alternative Meanings
Alternative Meanings
This feature shows alternative translations based on the different meanings of a source word or expression.
The Alternative Meanings tab in the Review window shows alternative meanings for expressions in the SYSTRAN or User Dictionaries.
SYSTRAN Translation Project Manager: Alternative Meanings
Examples
SRC_Id
they'd rather pay for single vaccines at 60 pounds a shot, even though the government insists MMR is safe.
Raw MT
ils payeraient plutôt les vaccins uniques à 60 livres un coup de feu, quoique le gouvernement exige que MMR est sûr.
Customized MT
ils payeraient plutôt les vaccins uniques à 60 livres une injection, quoique le gouvernement exige que MMR est sûr.
SYSTRAN Dictionary Manager: User Dictionaries (UDs)
User Dictionaries (UDs) let you improve the quality of source-language analysis, which in turn improves the translation output for all associated target languages. UDs can be used for a number of functions, including:
Automatically translating words not found in the SYSTRAN dictionaries (Not Found Words).
Overriding the target-language meaning of a word or expression in the SYSTRAN dictionaries, which lets you tailor translation output to specific needs.
Ensuring that an expression is always treated as a unit by the SYSTRAN analysis programs.
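The "treated as a unit" and "overriding" functions can be pictured as greedy longest-match segmentation: at each position, the longest source expression found in the User Dictionary wins, and its user-defined translation replaces the default one. This is a simplified, hypothetical model (the dictionary format, `max_len` and the function name are assumptions), not a description of how the SYSTRAN analysis programs consult UDs internally. The sample entries echo the autism and bowel disease example in this presentation.

```python
from __future__ import annotations

from typing import Optional


def segment_with_ud(
    tokens: list[str], ud: dict[str, str], max_len: int = 4
) -> list[tuple[str, Optional[str]]]:
    """Split tokens into units; UD expressions become single units with their override.

    Returns (source_unit, ud_translation) pairs; ud_translation is None when the
    unit is left to the default dictionaries.
    """
    units: list[tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible expression first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in ud:
                units.append((phrase, ud[phrase.lower()]))
                i += n
                break
        else:
            units.append((tokens[i], None))
            i += 1
    return units


if __name__ == "__main__":
    ud = {"autism": "autisme", "bowel disease": "maladie d'entrailles"}
    print(segment_with_ud("a group with autism and bowel disease".split(), ud))
```

Because "bowel disease" is matched as one unit, "bowel" can no longer be analyzed on its own, which is the behaviour the third bullet above describes.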
SYSTRAN Dictionary Manager: User Dictionaries (UDs)
Metrics
Type of Dictionary        EN-FR                 EN-EL
Do Not Translate Words    3532 entries (enxx, shared by both pairs)
Proper Nouns              1495 entries          1495 entries
MUSA Terminology          1443 entries          5228 entries
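Within SYSTRAN, Do Not Translate words are handled by a dictionary like the one counted in the table above. A generic way to get a similar effect around any MT engine is to mask DNT terms with placeholders before translation and restore them afterwards. The sketch below shows that swapped-in technique; the placeholder format and function names are hypothetical, and it assumes the engine passes placeholder tokens through unchanged.

```python
from __future__ import annotations

import re


def mask_dnt(text: str, dnt_terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each Do Not Translate term with a numbered placeholder."""
    placeholders: dict[str, str] = {}
    # Longest terms first, so a multiword entry wins over a shorter term inside it.
    for idx, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        token = f"__DNT{idx}__"
        # \b anchors assume terms begin and end with word characters.
        pattern = r"\b" + re.escape(term) + r"\b"
        text, count = re.subn(pattern, token, text)
        if count:
            placeholders[token] = term
    return text, placeholders


def unmask_dnt(translated: str, placeholders: dict[str, str]) -> str:
    """Put the original DNT terms back into the translated text."""
    for token, term in placeholders.items():
        translated = translated.replace(token, term)
    return translated
```

Only terms that actually occur in the sentence get a placeholder, so the mapping stays small and the restoration step touches nothing else.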
SYSTRAN Dictionary Manager: User Dictionaries (UDs)
Examples
SRC_ID: Andrew Wakefield ignited the debate over MMR by announcing the findings of research into a group with autism and bowel disease.
Raw MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec la maladie d'autism et d'entrailles.
Customized MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec autisme et maladie d'entrailles.
SYSTRAN Translation Project Manager: Source Analysis
Interactive Disambiguation
The Source Analysis tab in the Review window shows how the software handled source ambiguities and allows you to override the software selections.
SYSTRAN Translation Project Manager: Source Analysis
Interactive Disambiguation
Examples
ID 523: At first we thought it was parts of the building but it was people, literally people falling all around us.
Raw MT: D'abord nous avons pensé que ce faisait partie du bâtiment mais c'était les gens, peuplent littéralement la chute tout autour de nous.
Customized MT: D’abord nous avons pensé que c’etait des fragments du bâtiment, mais c’était des gens, littéralement des gens qui tombaient autour de nous.
SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
Normalization Dictionaries (NDs): There are two types of Normalization Dictionaries, source normalization and target normalization. Source normalization normalizes the source document before translation. Target normalization adapts the translation output to user needs in terms of terminology consistency; it can also replace expressions chosen by the translation engine with user-defined expressions.
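A source normalization dictionary can be thought of as an ordered list of rewrite rules applied before translation. The sketch below uses hypothetical regex rules to rejoin split English contractions such as `did n't` and `ca n't`, the kind of tokenization artifact that garbles the raw MT output in the example that follows. The rule format and names are assumptions, not the ND file format; note that rule order matters, since the irregular forms must be rewritten before the generic `n't` rule.

```python
from __future__ import annotations

import re

# Hypothetical source normalization entries: (regex pattern, replacement).
# Irregular contractions come first so the generic rule never sees them.
SOURCE_ND: list[tuple[str, str]] = [
    (r"\bca n't\b", "cannot"),
    (r"\bwo n't\b", "will not"),
    (r"\b(\w+) n't\b", r"\1 not"),
]


def normalize_source(text: str, rules: list[tuple[str, str]] = SOURCE_ND) -> str:
    """Apply each normalization rule in order, case-insensitively."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

Running the normalized sentence through the engine, rather than the split tokens, is what lets it recognize the negations at all.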
SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
Examples
SRC_ID: we did n't know she had measles but we do. I mean I ca n't help...
Raw MT: nous avons fait le n't savons qu'il a eu la rougeole mais nous faisons. Je veux dire l'aide de n't d'I ca…
Customized MT via SRC Normalization: nous n'avons pas su qu'il a eu la rougeole mais nous faisons. Je veux dire que je ne peux pas aider
SYSTRAN Translation Project Manager: Sentence Review for Translation Memory Construction
The Sentence Review tab in the Review window compares sentences in the source and target. You can then check the sentences you want to send to User Dictionaries, where you can work with them further in order to post-edit them and construct Translation Memories.
SYSTRAN Dictionary Manager: Translation Memories (TMs)
Translation Memory (TM)
A set of translated and validated sentences that can be integrated into the translation process. Translation Memories (TMs) are databases of aligned pre-translated sentences. Unlike dictionary entries, TM entries can be formatted (for example, italic or bold) and are used by the translation engine to perform matches on full sentences in the source document. TMs are not usually created manually; they are built using SYSTRAN’s Translation Project Export or from TMX files.
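The full-sentence matching described above can be sketched as a lookup that runs before machine translation: an exact hit returns the validated translation, and anything else falls back to the engine. This sketch covers exact matches only (no fuzzy matching or formatting), treats the TM as a plain dictionary and the engine as any callable; all names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def normalize_key(sentence: str) -> str:
    """Collapse runs of whitespace so trivial spacing differences still match."""
    return " ".join(sentence.split())


def build_tm(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Index validated (source, target) pairs by normalized source sentence."""
    return {normalize_key(src): tgt for src, tgt in pairs}


def translate_with_tm(
    sentences: list[str], tm: dict[str, str], mt: Callable[[str], str]
) -> list[tuple[str, str]]:
    """Return (translation, origin) pairs, where origin is "TM" or "MT"."""
    results: list[tuple[str, str]] = []
    for sentence in sentences:
        key = normalize_key(sentence)
        if key in tm:
            results.append((tm[key], "TM"))
        else:
            results.append((mt(sentence), "MT"))
    return results
```

Tagging each result with its origin makes it easy to see, during review, which sentences still depend on raw MT.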
SYSTRAN Dictionary Manager: Translation Memories (TMs)
Examples
ID 370: Now people kind of started panicking and said we've got to leave no matter what.
Raw MT: Maintenant sorte de personnes de panique commencée et dite nous avons pour laisser n'importe ce que.
Customized MT: Les gens maintenant avaient l’air de paniquer disant qu’ils devaient à tout prix partir.
SYSTRAN Dictionary Manager: Translation Memories (TMs)
Translation Memory Import/Export
Existing TMX (Translation Memory eXchange) files can be imported into and exported from SYSTRAN Dictionary Manager.
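TMX is an XML format: a `<tmx>` root holds a `<header>` and a `<body>` of translation units (`<tu>`), each with one `<tuv>` per language whose text sits in a `<seg>`. The sketch below writes and reads simple sentence pairs with Python's standard `xml.etree.ElementTree`. It ignores inline markup, properties and notes, and the function names and default language codes are assumptions; it does not describe how SYSTRAN Dictionary Manager's importer works.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _primary(lang: str) -> str:
    """Compare on the primary language subtag only ('EN-GB' -> 'en')."""
    return lang.lower().split("-")[0]


def write_tmx(pairs: list[tuple[str, str]], src_lang: str = "en", tgt_lang: str = "fr") -> str:
    """Serialize (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(xml_text: str, src_lang: str = "en", tgt_lang: str = "fr") -> list[tuple[str, str]]:
    """Extract (source, target) pairs for one language pair from TMX text."""
    root = ET.fromstring(xml_text)
    src_key, tgt_key = _primary(src_lang), _primary(tgt_lang)
    pairs: list[tuple[str, str]] = []
    for tu in root.iter("tu"):
        segs: dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")  # older TMX used plain "lang"
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Simplification: text inside inline elements is concatenated.
                segs[_primary(lang)] = "".join(seg.itertext())
        if src_key in segs and tgt_key in segs:
            pairs.append((segs[src_key], segs[tgt_key]))
    return pairs
```

Units lacking either language are skipped, so a multilingual TMX file can be read one language pair at a time.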