Training Guide PE Certification


Training_Guide_PE_Certification

Revision Date: 30/10/2013

SDL Certification

Post-editing Certification



Table of contents

1 Introduction
  1.1 About this training workbook

2 A brief history of post-editing and MT
  2.1 What is MT?
  2.2 MT development in the last century
  2.3 A short history of MT at SDL

3 Post-editing versus Translation
  3.1 Global developments and the localisation industry
  3.2 Why post-edit?
  3.3 Why translate?

4 MT Technologies
  4.1 The challenges of MT
  4.2 Rules-based Machine Translation (RBMT)
  4.3 Statistical Machine Translation (SMT)
  4.4 Hybrid Systems

5 How the MT output is created
  5.1 Baselines
  5.2 Verticals
  5.3 Customisations
  5.4 Engine training process

6 From the MT output onwards: the basics of post-editing
  6.1 Introduction to post-editing
  6.2 Degrees of post-editing
  6.3 The quality check process

7 How to get the most out of MT
  7.1 What makes an effective post-editor?
  7.2 Post-editing quality expectations
  7.3 Under-editing
  7.4 Over-editing
  7.5 Help improve MT for the future

8 Expected Statistical MT behavior
  8.1 Common patterns to watch for when post-editing
  8.2 How to provide feedback to improve the MT output

9 Using BeGlobal baselines in SDL Trados Studio
  9.1 BeGlobal baselines
  9.2 How to add SDL BeGlobal Community as a translation provider in SDL Trados Studio

10 Summary
  10.1 Conclusion to training workbook

11 Further references
  11.1 More information on MT and post-editing

12 Appendix
  12.1 Post-editing examples



1 Introduction

1.1 About this training workbook

The purpose of this training workbook is to introduce the reader to the techniques and

skills involved in post-editing machine translation (MT) output. It provides practical

examples of best-practice post-editing and recurrent issues such as over-editing and

under-editing. Moreover, it aims to familiarise translators with MT technology in order to

enable their involvement in the entire process from training engines to post-editing

content to publishable quality.

The document covers the following areas:

• The history and development of MT

• The various MT technologies currently used and the effects they have on the quality and post-editability of the MT output

• The post-editing and quality check processes and their relation to conventional human translation

• A guide to effectively post-editing MT output to understandable and publishable quality

• Common patterns to watch for when post-editing MT output

• Using BeGlobal baselines in Studio

• Where to find further information on MT and post-editing processes

In addition, the document aims to address some of the common misconceptions about

MT:

• MT is taking away my job

• MT output is always low quality

• MT material is only useful when it can be easily edited

• MT does not leave any room for creativity

• MT does not fit with my translation style

• MT technology is too complicated

• Post-editing is less skilled than translation



2 A brief history of post-editing and MT

2.1 What is MT?

Machine Translation (MT) is automated translation that uses software to translate text

from one natural language to another. It is one of the oldest applications of Artificial

Intelligence and both facilitates and accelerates the creation of high quality translations.

Post-editing MT output can increase productivity in comparison with conventional

translation. It allows companies to deliver a high quality translation at greater speed,

and consequently at lower cost, and as such can be considered a new industry “trend”.

However, it is important to remember that MT does not replace human translators. MT

is a tool rather than an end solution and a stage of human correction will always be

necessary when post-editing to a publishable quality. Nonetheless, it is an effective tool

when understood and used correctly.

Uses of Machine Translation

Fully Automated Useful Translation (FAUT)

• MT is generated by baseline engines or customised engines and the output is used directly, with no human intervention.

• This solution is used mostly for content such as emails, support content or instant messages, where the user wants to have an idea of the content, without the need for high quality.

Post-editing

• MT output from customised MT engines is post-edited by linguists to a quality level equivalent to conventional translation.

• Post-editing MT content is the preferred solution for publishable documents. It is used as part of a high quality translation process.

2.2 MT development in the last century

Following on from the efforts of war-time cryptography, MT is generally considered to have started in the 1950s. In 1954, the successful execution of the Georgetown Experiment - the fully automated translation of approximately sixty Russian sentences into English - ushered in an era of significant funding for MT research in the USA. Researchers believed they could produce a fully automated MT system within three to five years. This endeavour proved more difficult than expected, however, and ten years later funding was cut when it became clear that the development of MT had not progressed as far as originally hoped.

Early attempts at MT typically failed because of a lack of coverage. The models

functioned by encoding a limited selection of transformational rules that simply did not

provide for the diversity of natural language translation. Consequently, the first attempts

in the 1970s and 1980s to commercialise MT operated by drastically increasing the

number of encoded transformational rules. This produced Rules-Based Machine

Translation (RBMT), which functioned relatively successfully with targeted human

feedback over a particular domain. However, this led to the further problem of how to

make the abundance of transformational rules needed to encode language pairs co-operate with one another. The answer was a statistical approach to MT.

In the late 1980s, computational power increased and became less expensive and as a

result interest picked up in Statistical Machine Translation (SMT). From the 1990s,

statistical learning approaches came to the fore, led by cutting-edge work from the IBM

research team. SMT systems no longer required the same human effort to encode

transformational rules and update lexicons and terminology lists, but rather exploited

the wealth of existing translations, covering numerous language pairs, to extract rules

based on statistical probability.

Since the 1990s, SMT has been pushed forward through intensive research and

training as well as support from industry, the Defense Advanced Research Projects Agency

(DARPA) and EC FP7. Statistical MT has been deployed in real-world, commercial

contexts by Language Weaver (now part of SDL), Google, Microsoft and IBM, and

there is on-going research into hybrid phrase-based and syntax-based MT. In 2011,

SMT was boosted with Google's announcement that it would charge for access to the

Google Translate API. Shortly afterwards, Microsoft also announced that it would start

charging for use of the Microsoft Translator API. These two events can be viewed as a key milestone for the Machine Translation Industry and the Localisation Industry as a



whole. The progression to a paid API model for machine translation is a clear sign that

both the use and the quality of MT have matured to a level where enterprises and

developers see sufficient value in MT to invest in it.

After many decades, it appears that the models used in MT are more in line with our

understanding of how human language cognition and processing operates. This does

not mean that MT output is of an equal standard to output produced by the human

brain. However, we now understand more about what MT can contribute to the

Localisation Industry and have an invaluable tool for translation that is becoming ever

more prominent in the field.

MT accuracy is improving every year and many new techniques are being developed

and deployed as the field becomes more and more interdisciplinary, drawing from

computer science, linguistics, probability theory, algorithm design, automata theory and

engineering.



Some facts about MT today

• Of the top 50 global companies, 53% publicly acknowledge that they use an MT solution

• 54% of non-Anglophones use MT when visiting English language websites

• 75% of people use free MT tools

• It is estimated that at least three-quarters of web users take advantage of free translation tools due to the greater accessibility and integration of MT solutions.

2.3 A short history of MT at SDL

SDL first adopted MT into Translation Services in the year 2000 after acquiring a Rules-Based Machine Translation (RBMT) engine from Transparent Language, which became SDL Enterprise Translation Server (ETS). In 2004, the Knowledge-based Translation System (KbTS) Group was set up to use ETS in a high quality translation process.

In 2009, Statistical Machine Translation (SMT) was beginning to establish itself firmly in the localisation industry following rapid development. SDL forged a strategic partnership with leading SMT developer, Language Weaver, allowing SDL to extend the languages supported by MT.

In 2010, SDL acquired Language Weaver and is continuing to invest heavily in the development of SMT technology. SDL rolled out this capability to its Production Offices, which resulted in a huge increase in scalability and allowed the process to grow rapidly. KbTS was re-branded in 2011 to iMT (intelligent Machine Translation) and the first post-editing projects were rolled out using SDL Language Weaver SMT.

Today the SDL iMT department consists of an in-house team of language specialists,

MT scientists and project managers, supplemented by trained teams in the Production

Offices plus a large fully-trained freelance post-editing team. The iMT team are

responsible for the maintenance of MT engines and for all MT evaluations and

customisations within SDL Global Solutions. The Project Management team manages

the set-up of projects, and plans and schedules the customisations. The linguists are

responsible for evaluating the project data for MT suitability based on the content to be

translated. Once the project is approved for MT, the linguists prepare the data, test the

results and organise training for the linguistic team in the Production Office as well as

the freelancers who will work on the project. This approach of preparation, testing and

training helps guarantee a high quality MT engine and therefore a high quality final

translation.

And as for the future, developments within MT are made through improved models and

algorithms as well as by adding more high quality training data. SDL is constantly

working on improvements to the machine translation technology so that even better MT

engines can be created going forwards. The future for MT at SDL is full of possibilities

and iMT will be on-hand to offer its many years of expertise as the range of MT

solutions increases.



Brief Timeline of MT at SDL

2000  Acquisition of Rules-Based Machine Translation (RBMT) engine from Transparent Language: SDL Enterprise Translation Server (ETS)

2004  Knowledge-based Translation System (KbTS) Group set up to use ETS in a high quality translation process

2009  Partnership with Language Weaver (LW)

2009  Training from LW on how to customise SMT engines

2010  Rollout of post-editing process to Production Offices

2010  Due diligence and acquisition of Language Weaver

2011  Re-branding of KbTS to iMT

2011  First iMT projects using SMT

2013  Continued development of SMT within SDL



3 Post-editing versus Translation

3.1 Global developments and the localisation industry

An increasing number of companies are entering the international market and are

publishing localised materials in a bid to reach more customers and realise greater

sales opportunities. This is based on the finding that 85% of consumers feel that having

pre-purchase information in their own language is a critical factor in buying services.

IBM estimates that 2.5 quintillion (10^18) bytes of data are created every day and that

90% of corporate data originated in just the last three years. On average, companies

translate this content into 11 languages. At the same time, strong competition and the

need for faster turnaround times mean that there is an immediate need to lower costs

and achieve savings through efficient and streamlined technology processes.

Key trends impacting on global businesses

• Business globalisation

• Internet use of multiple devices

• Explosive growth in digital content

• Effective targeting and revenue capture

• Growth of translated content

• Multimedia and video

• Extreme brand management across all channels

• Social media and community

Many of the recent trends affecting global business and information management will have important consequences for the field of translation in the coming years. By the end of 2014 there will be 2 billion users of computers and most of the forecast growth is in emerging markets. This means that there will be more customers for software and appliances and consequently a larger need for translations of user interfaces and manuals.

In addition, by 2014, there will also be 2.5 billion users of the internet, which is 36% of

the world's population, compared with 22% in 2010. Information equivalent to 10 billion

DVDs will be sent over the internet each month. Not everyone will be able to access the

information in the language of origin and consequently there will be a larger demand for

translations in order to make information as widely accessible as possible.

Furthermore, Cloud Computing has also begun to make an impact in the technologies

industry. The use of the cloud is growing, and more and more users will need translations of the materials and content. The user interfaces will also require

translation as the number of end-users with different language requirements grows.

Thus, the demand for translation of both content and the interface itself is steadily

increasing.

Finally, social networking tools are rapidly increasing in popularity. The content lacks

specific structure and often involves interaction between users in various languages.

Companies are increasingly adopting social networking and professional use will

ultimately mean that more translations are needed and in a shorter time – in fact, often

in real-time, as and when content is created. Again, this will result in a greater need for

translation.

In all of the above, the importance of English as a global lingua franca is slowly

decreasing. Between 2000 and 2010, the two languages with the greatest growth on

the internet were Arabic and Mandarin Chinese – both of which grew twentyfold. In

contrast, content in English 'only' tripled. Proportionally, then, English is declining in

importance relatively quickly. It is estimated that by 2020 English will have lost its status

as a lingua franca altogether. However, rather than being replaced with another natural

language, linguistic diversity will be the new status quo and translation will be key to

communication. In summary, then, there will be an increasing demand for more content

at greater speed and in an increasing number of languages.

So the question is, how can MT and post-editing help respond to these trends?



3.2 Why post-edit?

In the last few years, there have been significant developments in MT technology. SDL has kept pace with these developments and uses MT mainly to increase efficiency whilst still delivering quality. This is achieved through integration of the MT engines with SDL's translation environments – SDL Trados Studio, TMS and

WorldServer – which results in a streamlined process, leading to faster turnarounds

and higher cost-effectiveness.

A growing number of SDL's customers and freelance translators now rely on MT for a high-quality, integrated translation process. Customised machine translation engines deliver output of such good quality that post-editing is faster than translating from

scratch. Indeed, MT solutions can reduce production times by as much as 50% in some

cases. As such, many clients consider MT the only viable way to process the enormous

volume of content they need to localise. Moreover, in certain cases, it allows the client

to consider translating content that they would not otherwise have tackled as the cost

would have been prohibitive.

However, post-editing is not only of value to the client but also has many advantages

for the translator. SDL's intelligent Machine Translation will help freelance translators to remain competitive and save time. We combine our SMT technology with project-specific Translation Memories to produce translations of post-editable quality that can

help to increase productivity. Post-editing is not inferior to conventional translation but

requires all the usual translation skills – such as domain knowledge, excellent

command of the source and target language, proficiency with CAT tools – plus a

willingness to embrace new technological advances.

The demand for MT solutions is growing quickly and post-editing is rapidly becoming a

basic skill for translators. Learning how to post-edit will give linguists a foothold in an

evolving market and open up new freelance possibilities. We have seen a real swing in

attitudes in the last few years with many clients looking to MT as the default option to

help deliver translation faster and cheaper – without sacrificing quality.

In summary, the following client and translator benefits apply:

Client benefits

• Lower cost

• Faster time to market

• Publishable quality

• Higher volumes for translation

• Ability to handle digital content explosion

Translator benefits

• A valuable new skill that opens more opportunities

• Competitive edge in an evolving market

• Greater speed and efficiency

• Higher volumes compensate for lower post-editing rates

3.3 Why translate?

Whilst post-editing can provide a number of benefits for clients and translators alike, not all projects will be suitable for post-editing. Because MT typically reproduces the material used to train the engine, previously unseen material can present difficulties. This is particularly common in text types with highly complex sentence structures or very specific terminology, and in texts with a high degree of ambiguity that require translations to move away from the source.

At SDL, all content is evaluated carefully before a project or part of a project is considered for MT. Machine Translation technology is improving all the time and content types that were not suitable two years ago are now handled very productively using Machine Translation. In some cases, however, conventional translation will still be the recommended solution for the foreseeable future.



4 MT Technologies

4.1 The challenges of MT

MT shares many of the challenges of human language translation. These include the

ambiguity and polysemy of natural human language as well as the high levels of

linguistic diversity between languages. Particularly, where there is variation in the

morphological or syntactic characteristics of a language it becomes much harder for MT

to match the source and target phrases. Given that no linguistic information is encoded

into the statistical model this often presents problems.

Some of the main issues and active research problems for MT (as well as conventional

translation) are summarised below:



The challenges of MT

• Domain and genre: vocabulary, style (including active vs. passive) and sentence length will vary accordingly.

• Ambiguity: human language is ambiguous on both lexical and syntactic levels
    • E.g. "bank" can be the financial institution or the edge of a river
    • E.g. "I saw the man with the telescope" - Is it the man or the speaker who is holding the telescope?

• Variation in morphology and word order
    • E.g. case and definiteness endings in Hungarian and Swedish
    • E.g. Verb - Subject - Object order in Arabic and Hebrew

• No one-to-one translation: a word that covers many social, cultural and linguistic meanings in one language may require finer distinctions in another language and vice versa
    • E.g. politeness levels in Japanese
    • E.g. German "Tasse" = English "mug" or English "cup"

• Idioms: difficult to translate like any other form of formulaic language
    • E.g. French "Avoir les dents longues" = English "To be ambitious" (Lit: "To have long teeth")

• Language specific characteristics
    • E.g. Arabic tokenisation, Chinese word segmentation, etc.



4.2 Rules-based Machine Translation (RBMT)

Chronologically speaking, Rules-Based Machine Translation (RBMT) was the first

approach to automated translation. It involves parsing a source sentence, analysing the

structure, converting this to a machine-readable code and then transforming it into the

target.

The core system is based on a set of grammatical rules for each of the languages,

combined with a dictionary. The dictionary contains source words and phrases, their

translations and detailed grammatical information, such as part of speech and

inflection. It provides the modules with the linguistic knowledge they need.

The rules are the “linguistic processor” of the system, responsible for analysis and

generation. They use linguistic information stored in the dictionary. These rules are

intended to represent the grammatical knowledge of speakers and specify inherent

agreement and relational information.

At the translation stage, the MT engine analyses each source sentence and tags the

words and phrases with their part of speech to identify the grammatical components, for

example, the subject, object and verb. The MT system then looks up the translations of

these grammatically tagged words and phrases in the machine dictionary and

combines them using the coded language rules for the target language. This builds the

translated sentence.

A large core dictionary provides the translations for everyday words and phrases. For

translations that use special terminology, an RBMT system can use custom dictionaries in conjunction with the baseline to improve translation accuracy.

Example

• Determiner and noun need to agree in number and gender

• Subject and finite verb need to agree in number
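The tag, look-up and combine steps described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the four-entry dictionary, the feature names and the single agreement rule are invented here and do not reflect how SDL ETS or any other real RBMT engine is implemented.

```python
# Toy rules-based translation of an English "the + noun" phrase into French.
# The dictionary entries and the agreement rule are illustrative assumptions.

# Machine dictionary: source word -> part of speech, translation, grammatical features.
DICTIONARY = {
    "the": {"pos": "DET", "forms": {("m", "sg"): "le", ("f", "sg"): "la",
                                     ("m", "pl"): "les", ("f", "pl"): "les"}},
    "filter": {"pos": "NOUN", "target": "filtre", "gender": "m", "number": "sg"},
    "filters": {"pos": "NOUN", "target": "filtres", "gender": "m", "number": "pl"},
    "pump": {"pos": "NOUN", "target": "pompe", "gender": "f", "number": "sg"},
}


def translate_noun_phrase(source: str) -> str:
    """Translate 'the <noun>' by tagging, looking up and applying agreement."""
    words = source.lower().split()
    entries = [DICTIONARY[w] for w in words]  # analysis: tag each word via the dictionary
    if [e["pos"] for e in entries] != ["DET", "NOUN"]:
        raise ValueError("this sketch only handles determiner + noun")
    det, noun = entries
    # Agreement rule: the determiner takes the gender and number of the noun.
    det_form = det["forms"][(noun["gender"], noun["number"])]
    return f"{det_form} {noun['target']}"


print(translate_noun_phrase("the filter"))   # le filtre
print(translate_noun_phrase("the pump"))     # la pompe
print(translate_noun_phrase("the filters"))  # les filtres
```

Even at this scale the sketch shows why RBMT output is systematic: the same rule always fires in the same way, so a missing or wrong dictionary feature produces the same error every time.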



How to recognise RBMT output

The RBMT output is based on 3 factors:

• Rules for the language pair

• General settings that can be customised (such as quotation marks, verb tense, accents, decimal point)

• The project dictionary, where the specific terminology is entered and which is key to improving the MT quality.

Some common issues can be identified when post-editing rules-based machine

translation. Here we include some examples from English into French, Italian, Spanish,

Portuguese, Dutch, German, Swedish, and Finnish, which are the most common

languages for RBMT.

In order to recognise MT error patterns, post-editors should look out for the following

potential issues when post-editing.

Use of superfluous articles

Superfluous articles are commonly added in most languages; these can also occur before proper nouns.

EN Source: Free High Speed Internet Access!

IT MT output: l'Accesso gratuito a internet ad alta velocità!

IT Post-edited: Accesso gratuito a internet ad alta velocità!

EN Source: Oil filter unit: Removal - Refitting

FR MT output: Bloc filtre à huile : La dépose - la Repose

FR Post-edited: Bloc filtre à huile : Dépose - Repose

Use of simple prepositions



When a term has not been entered in the Customised Dictionary, simple prepositions are used; these should be corrected where needed.

EN Source: Reconnect ECT sensor electrical connector.

FR MT output: Reconnecter le connecteur électrique de capteur ECT

FR Post-edited: Reconnecter le connecteur électrique du capteur ECT

Acronyms automatically translated into terms

When a specific acronym has not been entered in the Customised Dictionary it is

automatically and consistently translated into a common term which exists in the Core

Dictionary.

EN Source: MR

IT MT output: Sig.

DE MT output: Herr

FR MT output: M.

Proper nouns translated literally

EN Source: Thanks to Peter Ferry for reporting the VBScript/Jscript Buffer Overrun Vulnerability.

IT MT output: Grazie al Traghetto di peter per segnalare la Vulnerabilità legata al sovraccarico del buffer di VBScript JScript.

IT Post-edited: Grazie a Peter Ferry per aver segnalato la vulnerabilità legata al sovraccarico del buffer di VBScript JScript.

EN Source: He lives in Palm Springs.

FR MT output: Il habite à Printemps de Paume.

FR Post-edited: Il habite à Palm Springs.



Capitalisation issues

The MT follows the source capitalisation unless specific terms have been entered in the Customised Dictionary with the required capitalisation. This is a particular problem in IT texts, e.g. for UI options.

EN Source: Click Add Custom Phone Tune.

FR MT output: Cliquez sur Ajoutez l'Air Personnalisé de Téléphone.

FR Post-edited: Cliquez sur Ajouter une mélodie de téléphone personnalisée.

EN Source: Select the appropriate option in the Automatic Synchronization section

PT-BR MT output: Selecione a opção apropriada na seção Sincronização Automática

PT-BR Post-edited: Selecione a opção apropriada na seção Sincronização automática

Disambiguation of homographs

Post-editors may encounter what we call “homograph resolution” issues. A homograph is a source term that can be translated as a noun or as a verb (or an adjective, etc.), for example NETWORK (a network, to network/networking).

When the engine resolves a homograph incorrectly, the syntax of the entire sentence is misanalysed.

In the following examples the nouns are interpreted as verbs:

EN Source: Check box D6 on the blue label

DE MT output: Kasten D6 auf dem blauen Aufkleber prüfen

DE Post-edited: Kontrollkästchen D6 auf dem blauen Aufkleber

EN Source: The water reservoir does not contain enough water.

PT MT output: O reservatório de água não contém suficiente aguar .



PT Post-edited: O reservatório de água não contém água suficiente.

Compound formation and hyphenation issues

For some languages, such as German and Finnish, compounding rules may work. If they do not, the post-editor must amend the output accordingly and the term should be encoded.
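Several of the issues above (acronyms expanded into common terms, proper nouns translated literally) arise when a term is missing from the Customised Dictionary and the engine falls back on the Core Dictionary. The following minimal sketch shows one plausible look-up order. The dictionary contents, the function name and the separate do-not-translate list are all assumptions introduced here for illustration, not a description of the actual engine.

```python
# Sketch of dictionary precedence in a rules-based system (hypothetical data and names).
# Assumed look-up order: do-not-translate list, then project dictionary, then core dictionary.

CORE_DICTIONARY = {"mr": "M.", "palm": "paume", "springs": "printemps"}
PROJECT_DICTIONARY = {"mr": "MR"}      # project-specific acronym that must stay unchanged
DO_NOT_TRANSLATE = {"palm springs"}    # proper nouns to copy from the source


def lookup(term: str) -> str:
    """Return the translation for a term, preferring project-specific entries."""
    key = term.lower()
    if key in DO_NOT_TRANSLATE:
        return term                    # copy the source term verbatim
    if key in PROJECT_DICTIONARY:
        return PROJECT_DICTIONARY[key]
    if key in CORE_DICTIONARY:
        return CORE_DICTIONARY[key]
    return term                        # unknown terms pass through untranslated


print(lookup("MR"))            # MR (the project entry overrides the core entry "M.")
print(lookup("Palm Springs"))  # Palm Springs
print(lookup("palm"))          # paume
```

Under these assumptions, encoding a term once in the project dictionary fixes it consistently throughout the project, which is why feedback on recurring terminology errors is so valuable in an RBMT workflow.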

RBMT – Pros and Cons

RBMT allows for excellent terminology control. There is no need for pre-existing TMs, as project dictionaries can be created from scratch, and the output is systematic, rightly or wrongly, meaning that experienced post-editors learn to post-edit quickly and reliably over time. However, it can take a number of years to develop a new language pair and the source must be well-written to generate good output. Moreover, project dictionaries are time-consuming to create and therefore expensive to maintain, and output is often not very fluent and not sensitive to context, providing a single translation per term.

Pros

• A lot of control of rules and terminology

• Once the grammar is established, new projects can be created from scratch relatively quickly

• Once set up, projects are easy to maintain

• Consistent use of terminology

Cons

• The grammar is very time-consuming to develop

• Rather literal translations

• Not context-sensitive

4.3 Statistical Machine Translation (SMT)

A Statistical Machine Translation (SMT) system learns to translate by analysing large volumes of previously translated content. The starting point for training an engine is an aligned corpus of source and translated sentences of hundreds of millions of words.

The training process subdivides each of the source sentences into words and series of words (n-grams) and analyses the associated translated sentences. In this way the training process determines for each n-gram in the source the most likely set of translations. By analysing just the translated content, the training process learns the order in which the translated words are most likely to occur. The more training data and the more consistency there is in this data, the more accurate the process becomes.
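The two kinds of statistics described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the three-sentence corpus is invented, the function names are hypothetical, and raw co-occurrence counting stands in for the word-alignment models that production SMT systems typically use.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=3):
    """Yield every word sequence of length 1..max_n from a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def train(parallel_corpus, max_n=2):
    """Collect two tables from aligned (source, target) sentence pairs:
    co-occurrence counts of source and target n-grams, and counts of which
    target word follows which (a bigram word-order model)."""
    cooccurrence = defaultdict(Counter)
    bigrams = defaultdict(Counter)
    for source, target in parallel_corpus:
        src_tokens, tgt_tokens = source.lower().split(), target.lower().split()
        tgt_ngrams = list(ngrams(tgt_tokens, max_n))
        for src_ngram in ngrams(src_tokens, max_n):
            cooccurrence[src_ngram].update(tgt_ngrams)
        # Word order is learned from the translated side only.
        for prev, nxt in zip(tgt_tokens, tgt_tokens[1:]):
            bigrams[prev][nxt] += 1
    return cooccurrence, bigrams


# Invented toy corpus; a real engine is trained on millions of words.
corpus = [
    ("oil filter", "filtre à huile"),
    ("oil pump", "pompe à huile"),
    ("air filter", "filtre à air"),
]
cooccurrence, bigrams = train(corpus)

print(cooccurrence[("filter",)][("filtre",)])  # 2: "filter" and "filtre" appear together twice
print(cooccurrence[("filter",)][("huile",)])   # 1
print(bigrams["à"]["huile"])                   # 2: "huile" follows "à" twice
```

With only three sentences the counts are too sparse to separate "filtre" from "à", which also co-occurs with "filter" twice. This is the small-scale version of the point made above: more data, and more consistent data, makes the statistics more reliable.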

In the next stage of the process, the system compiles all of the learned data into the

runtime MT engine. The runtime MT engine subdivides each sentence into smaller

chunks and looks up the possible translations in the compiled database. For a given

source sentence this process results in many possible translated sentences. The MT

engine uses the statistical data on the probability of a translation and the word order to

determine the best candidate for the MT output.
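As a rough sketch of this selection step, the toy example below combines a translation probability for each candidate sentence with a simple word-order model based on word pairs. All probabilities, the `floor` value for unseen word pairs and the function name `best_candidate` are invented for illustration; a real engine works with far larger models and more features.

```python
import math


def best_candidate(candidates, bigram_prob, floor=1e-6):
    """Pick the candidate with the highest combined log-probability.

    candidates:  dict mapping candidate sentence -> translation probability
    bigram_prob: dict mapping (word, next_word) -> word-order probability
    floor:       assumed probability for word pairs never seen in training
    """
    def total(item):
        sentence, translation_prob = item
        words = sentence.split()
        word_order = sum(
            math.log(bigram_prob.get(pair, floor))
            for pair in zip(words, words[1:])
        )
        return math.log(translation_prob) + word_order

    return max(candidates.items(), key=total)[0]


# Two candidates with the same translation probability but different word order.
candidates = {"the red button": 0.4, "the button red": 0.4}
bigram_prob = {("the", "red"): 0.2, ("red", "button"): 0.3, ("the", "button"): 0.2}
best = best_candidate(candidates, bigram_prob)
print(best)
```

Here the word-order data decides the outcome: the pair "button red" was never seen, so the second candidate receives a much lower score.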

For general purpose translations, the system uses a baseline language engine that is

trained with a large corpus of broad spectrum content – hundreds of millions of words.

To enhance performance for applications that use specific terminology, an SMT system
can be trained with a corpus that contains only or mostly content that is close to the
data that is to be translated. This is known as customisation or training. An ideal
corpus for this is a large Translation Memory (TM) that contains the previous
translations of a project. The recommended volume of data is 1 to 5 million words,
although it is possible to work with fewer than 1 million.

The quality of the MT output depends on both the linguistic and technical quality of the
material included. However, compared to RBMT, SMT provides a more fluent translation
with some context-sensitivity and better reflects the style of the training material.

SMT – Pros and Cons

Pros
• Customisation times are quicker than with RBMT
• Output reads more fluently and is stylistically better than the output from a rules-based system
• Able to select the correct translation in certain contexts, e.g. “device” in the IT domain
• Generally shorter setup times

Cons
• Need for large bilingual corpora (millions of words)
• Difficult to maintain (retraining requires a large amount of content, which takes time to gather)
• Need for processing time: file processing times are higher, with an impact on hardware costs

Compared with RBMT, Statistical Machine Translation can offer a larger number of
languages for post-editing, as engines are lower cost and faster to train, as well as
easier to maintain. Moreover, because SMT is trained with “real” sentences and
phrases, the direct output can be more fluent than with RBMT, which is good for raw
output requirements and additionally helps the post-editor. In addition, there is a high
level of research activity surrounding SMT and performance improvement is predicted
for the future. For these reasons, SMT is the technology of choice at SDL.

However, it should be noted that SMT requires large amounts of memory space and
processing capacity, though this becomes less of a problem with technological
developments. Moreover, the output is dependent on the quality and volume of data
used for the customisation, and therefore the post-editor must be aware of the range of
common trends in order to post-edit accurately. Similarly, it is harder to implement
changes in terminology made by the client than with RBMT, and a project-specific
engine can only be created if there is sufficient data as a starting point.

Syntax-based SMT – pros and cons

Syntax-based translation is based on the idea of translating syntactic units, rather than
single words or strings of words. A syntax-based statistical engine can improve
grammatical accuracy and ensure that verbs are realised in the correct position.

Pros
• Better modelling of target language structure
• Ensures there is always a verb present
• Realises the verb in the correct position
• Better handling of function words, such as prepositions
• Has a more powerful decoding algorithm

Cons
• Early stages of development
• Sometimes less accurate terminology as no link to baseline

The following table summarises the key differences between SMT and RBMT:

Attribute SMT RBMT
Does not need a large volume of aligned data for training/customisation +
Number of languages supported +
Setup time for new language +
Terminology control +
Software UI term handling +
Raw fluency +
Raw accuracy +
Level of research activity and performance improvement predicted +

4.4 Hybrid Systems

Contemporary research into MT technology is exploring the possibility of creating a
hybrid engine, in which dictionaries, rules and statistical features are combined so as to
obtain the best of both worlds. This can be done in many different ways; examples are
the use of a dictionary to enforce certain translations in SMT and the use of statistical
techniques to determine the best translation for a homograph such as “bank” or “get”,
where the translation is different depending on the context.

However, current solutions are fairly pragmatic and leave room for further development
in future. In some cases, hybrid systems do not back up to a baseline and this can
exacerbate common MT issues, such as terminology inconsistencies and/or content left
untranslated.



5 How the MT output is created

Statistical MT is now the technology of choice at SDL, so this course will now
concentrate on SMT technology.

SDL takes a three-pronged approach to SMT and uses the following different engine
types, matching the solution to the particular use case:

• Baselines: generic engines containing diverse data
• Verticals: domain-specific engines
• Customised engines: content trained for specific client corpus

5.1 Baselines

The core MT engines developed by SDL are known as baselines. These baseline
systems are bilingual corpora used as general databases for each language pair. They
are based on a large translated corpus of hundreds of millions of words, taken from
reliable sources available in the public domain, such as news, IT documentation,
technical manuals and publicly available government material, and distributed across
various domains, including IT, automotive, news, sports, electronics, etc.

Baselines are under constant development and new releases are launched frequently.

This solution produces good results for clients who require immediate access to MT,

who do not have sufficient volumes of data and/or wish to translate general content

across several domains.

Client-specific customisations and domain-specific verticals normally use baseline

engines as a backup; so if a certain word, phrase, or even grammatical structure is not

present in the training data, the engine may still be able to produce a translation.
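The backup behaviour can be pictured as a two-stage lookup. The sketch below is a deliberately simplified assumption: real engines combine statistical models rather than looking up phrases in dictionaries, and the names `translate_phrase`, `custom_table` and `baseline_table`, as well as the sample entries, are hypothetical.

```python
def translate_phrase(phrase, custom_table, baseline_table):
    """Return (translation, origin), preferring client- or domain-specific data."""
    if phrase in custom_table:
        return custom_table[phrase], "custom"
    if phrase in baseline_table:
        return baseline_table[phrase], "baseline"
    # Nothing known in either engine: the phrase is left untranslated.
    return phrase, "untranslated"


# Hypothetical English-German data for illustration only.
custom_table = {"device": "Gerät"}
baseline_table = {"device": "Vorrichtung", "button": "Taste"}

for phrase in ("device", "button", "flux capacitor"):
    print(phrase, "->", translate_phrase(phrase, custom_table, baseline_table))
```

The first phrase is taken from the specific data, the second falls back to the baseline, and the third shows why an engine without a backup is more likely to leave content untranslated.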

Baselines – Pros and Cons

5.2 Verticals

A vertical is a trained statistical engine that exclusively contains data related to a
specific subject area, or domain, such as IT, Automotive, Electronics etc. When a client
does not have enough translated data to be used for a client-specific training, a vertical
solution can be used instead of a customisation on top of the baseline corpus.

These domain-specific engines therefore provide a point of entry for projects that have

small TMs. They also prove useful in those cases where there is not enough time to

create a project-specific engine before the first jobs start to flow in. Because the vertical
is a ready-to-use solution, it does not have the development effort involved in creating
client-specific engines.

Because a Vertical uses a higher volume of data than a customisation, the engine is
less likely to take translations from the baseline and therefore less likely to produce a
general instead of a more specific technical translation. However, as the data for the
Vertical comes from different sources within a domain, inconsistencies in style and
terminology are also more likely, and these will need to be checked during the
post-editing and quality-checking stages.

SDL Verticals are available for the following domains in a wide range of languages:

• Automotive Vertical
• Consumer Electronics (CE) Vertical
• HiTech (IT Hardware) Vertical
• Travel Vertical

These engines are always under development and, whenever there is a considerable
amount of new data and/or new technical features that can enhance the overall
performance of the engine, they are retrained to improve the overall quality of the MT
output.

The vertical retraining process is designed to increase productivity when working with
vertical output. However, if a client prefers a specific translation for a certain term which
was correct in the original vertical, a retraining might mean that this term could be
changed to a more widely used translation. This will need to be corrected during
post-editing and we recommend adding terms like this to your QA check.

Verticals – Pros and Cons

5.3 Customisations

A customisation is a trained statistical engine that only (or mainly) contains
client-specific corpora. It involves preparing client-specific TMs in order to get the best
MT output for production. The recommended requirement for a successful customisation is
an aligned corpus of 1 million words of relevant customer data, although this may vary
per project and language pair, and it is possible to create a customisation with lower
volumes of customer data.

Using this type of material guarantees adherence to client-specific terminology and

style.


As the machine translation output is fully based on the bilingual corpus, with no
syntactical or lexical data added, the quality of the output can only be as good as the
quality of the corpus. If the corpus data has inconsistent terminology and/or style, the
resulting MT may also be inconsistent. That is why it is important that the linguist
responsible for the customisation chooses suitable data to be added to the SMT engine
training.

Customisation – Pros and Cons

5.4 Engine training process

When a project is sent to iMT, all the necessary data is collated – including project

TMs, sample files, project information, etc. The next step in the process is to evaluate

the source text and establish if it is suitable for machine translation. A source evaluation

will also allow the linguist to identify any possible issues with the use of MT on the

project, so that action can be taken during engine creation to try to minimise those

issues. If the data is suitable, then the TMs are prepared for training the engine. SMT

engine training is an iterative process, and involves the following steps:


TM cleaning

Data cleaning is a process applied to the training corpus in order to make it compatible
with the platform where the SMT engines are created. This process improves the
quality of the data by removing content which could adversely affect the MT output,
such as tags, entities, misaligned segments, and corruptions. Such content could
otherwise appear in the output and cause a drop in productivity. Some parts are also
harmonised so that the MT output will be faster to post-edit, as fewer changes will be required.
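A minimal sketch of what such a cleaning pass might look like is given below. It is an assumption for illustration, not the actual cleaning procedure: the tag pattern, the length-ratio test for spotting misaligned segments and the `max_ratio` threshold of 3.0 are all invented, and the function names are hypothetical.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # assumed: inline tags look like <b>, </b>, <ph id="1"/>


def clean_segment(text: str) -> str:
    """Strip tags, decode entities such as &amp; and normalise whitespace."""
    text = TAG.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


def clean_tm(pairs, max_ratio=3.0):
    """Clean (source, target) pairs and drop empty or suspiciously unequal ones."""
    cleaned = []
    for source, target in pairs:
        source, target = clean_segment(source), clean_segment(target)
        if not source or not target:
            continue  # one side is empty after cleaning
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_ratio:
            continue  # very different lengths: likely a misaligned segment
        cleaned.append((source, target))
    return cleaned


tm = [
    ("<b>Save</b> &amp; close", "<b>Enregistrer</b> et fermer"),
    ("OK", "Ceci est une phrase beaucoup trop longue pour cette source"),
    ("<br/>", "vide"),
]
print(clean_tm(tm))
```

Only the first pair survives: the second is rejected by the length check and the third is empty once its tag has been removed.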

Creation of training

During a customisation, several trainings with different combinations of data may be

uploaded to the system and then evaluated so the iMT team can select the one that

delivers the best results. A second trial is based on the results of the first one – the

problems found in the output are traced back to the TM data, which is then manipulated

further to try to solve the issues. The training with the best results is then deployed for

production.

Selection of test sentences

For MT testing purposes, the linguist selects a set of sentences which do not appear in
the corpus that will be uploaded to the SMT system. Ideally, the sentences should be
taken from new untranslated project files, as this is the best way to reproduce a real
translation scenario and test the engine thoroughly.

1. TM cleaning
2. Selection of test sentences
3. Testing

Testing

One of the biggest challenges within the MT industry at this point in time is to find an

automatic measure that will be able to forecast if a particular MT output will be able to

reach the particular user's goal. Achieving this objective is particularly difficult as there

are no unique solutions in translation. Many translations may be right for one sentence

and even more translations can be wrong. Since an automatic assessment of MT

output quality is generally based on comparing the MT to reference translations, finding

an automatic procedure to determine the MT output quality is a challenging task where

a lot of work is currently being concentrated.

Nowadays, many MT providers choose between human and automatic evaluations (or

a combination of both).

Human evaluation is normally centred on Likert-based scales. With this method,

resources are asked to score aspects of the MT output by following a list of parameters

associated with a numerical scale. For example, ‘score 5 if the output is entirely correct,
score 4 if the output is understandable but has grammatical errors, …’. This kind of

assessment mainly focuses on understandability, although some vendors have started

looking into Likert-based scales that could help assess the post-editing effort. Human

evaluation can also be used to compare two or more MT engines or systems, and is

based on the evaluator stating their preference between two or more MT outputs

generated for the same source sentences.

Some of the disadvantages inherent with human evaluation are:

• Performing this kind of test is relatively expensive and time-consuming, as several resources are required for assessing each and every engine.
• Human evaluations are prone to subjectivity and final assessments may not be consistent after all.
• Resources need to be familiar with the scales and follow them to the letter in order to obtain valid results.



However, when done well, a human evaluation is still often considered to be more

reliable than automated measures, and has the added advantage of a human translator

being able to provide useful comments on the issues found on the MT output.

The productivity increase though is still a difficult factor to predict for all cases, as

productivity may vary per job and also per resource (it varies with post-editing

experience, for instance). Most productivity tests in the industry are based on a

combination of measuring post-editing speed, and post-editing effort, or comparing

post-editing speed with conventional translation speed.

In recent decades, many measures for automated evaluation have been proposed.
Most automated measures assess the quality of the machine translation by comparing it
to a reference translation which is deemed to be of high quality. Some of the most widely
used ones are detailed below.

BLEU (Bilingual Evaluation Understudy) score: this algorithm is meant to evaluate the

quality of text which has been machine-translated. The central idea behind BLEU is

“the closer a machine translation is to a professional human translation, the better it is”. For that, scores are calculated for individual translated segments – generally sentences

– by comparing them with a set of good quality reference translations. Those scores

are then averaged over the whole corpus to reach an estimate of the translation's

overall quality. Intelligibility or grammatical correctness are not taken into account

explicitly, they are supposed to be included in the correct reference translations.
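To make the idea of n-gram matching concrete, here is a heavily simplified, sentence-level sketch in the spirit of BLEU. It is not the official metric: it assumes a single reference, counts only unigrams and bigrams (the published metric normally uses up to 4-grams and pools counts over the whole test set), and the function name `simple_bleu` is hypothetical.

```python
import math
from collections import Counter


def ngram_counts(words, n):
    """Count the n-grams of a given length in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of clipped n-gram precisions, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # An n-gram only counts as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Candidates shorter than the reference are penalised.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the valve is closed", "the valve is closed"))
print(simple_bleu("the valve is shut", "the valve is closed"))
```

The second call scores lower even though "shut" is a perfectly good translation, which illustrates why intelligibility and correctness are only captured indirectly through the reference.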

NIST: the name of this metric comes from the US National Institute of Standards and

Technology. This measure is based on the BLEU score, but it differs from this algorithm

in several points.

Whilst BLEU simply calculates how many n-grams match both in the reference

translation and in the MT output and gives these n-grams the same weight, NIST also

calculates how “informative” a particular n-gram is. When a correct n-gram is found, the

algorithm measures if that combination is a common sequence in the corpus material or

if, on the other hand, that fragment is not that common in the data. Depending on the

result, an n-gram will be given more or less weight. To give an example, if the bigram
“on the” is correctly matched, it will receive lower weight than the correct matching of
the bigram “interesting calculations”, as the latter is less likely to occur.

NIST also differs from BLEU in how some penalties are calculated. For example, small

variations in translation length do not impact the overall NIST score as much as in

BLEU.
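The notion of an “informative” n-gram can be sketched as follows, assuming the commonly cited definition in which the weight of an n-gram is the base-2 logarithm of how often its first n-1 words occur divided by how often the full n-gram occurs in the reference material. The tiny reference text and the function names are invented for the example.

```python
import math
from collections import Counter


def count_ngrams(words, max_n=2):
    """Count all n-grams of length 1..max_n in a list of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def information_weight(ngram, counts, total_words):
    """Higher values mean the n-gram is less predictable, hence more informative."""
    # For a single word, the "prefix" is taken to be the whole reference text.
    prefix_count = counts[ngram[:-1]] if len(ngram) > 1 else total_words
    return math.log2(prefix_count / counts[ngram])


reference_words = (
    "on the table on the chair on the desk "
    "interesting calculations interesting results interesting ideas"
).split()
counts = count_ngrams(reference_words)

common = information_weight(("on", "the"), counts, len(reference_words))
rare = information_weight(("interesting", "calculations"), counts, len(reference_words))
print(common, rare)
```

In this toy data "the" always follows "on", so matching "on the" carries no extra information, whereas "calculations" is only one of several words that follow "interesting".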

METEOR (Metric for Evaluation of Translation with Explicit ORdering): this metric was

designed to address some of the problems found in the more popular BLEU metric, and

also produce good correlation with human judgment at the sentence or segment level

(this differs from the BLEU metric in that BLEU seeks correlation at the corpus level).

For that, several features that had not been part of any other metrics at the time were

introduced. Matches in METEOR are made by following the parameters below, among

others:

Exact words: as with other metrics, a match is made if two words are identical in the

machine translation output and the reference translation.

Stem: words are reduced to their stem form. If two words have the same stem, a match

is also made.

Synonymy: words are matched if they are synonyms of one another. Words are

considered synonymous if they share any synonym sets according to an external

database.

TER (Translation Edit Rate): this metric measures the number of edits required to

change a machine translation output into one of the human references.
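A simplified sketch of this calculation is shown below. It counts word-level insertions, deletions and substitutions and divides by the length of the reference; it assumes a single reference and deliberately leaves out the shifts of whole word sequences that the full metric also treats as edits. The function names are hypothetical.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            current.append(min(
                previous[j] + 1,                           # delete a word
                current[j - 1] + 1,                        # insert a word
                previous[j - 1] + (hyp_word != ref_word),  # substitute a word
            ))
        previous = current
    return previous[-1]


def simple_ter(mt_output: str, reference: str) -> float:
    """Edits needed per reference word; 0.0 means no edits are required."""
    hyp, ref = mt_output.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


print(simple_ter("the cat sat on mat", "the cat sat on the mat"))
```

One missing word against a six-word reference gives a rate of one sixth; the lower the value, the less editing the output needs to match that reference.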

Levenshtein distance: this metric measures the similarity or the dissimilarity (“distance”)
between two text strings by calculating the minimum number of single-character edits
(insertion, deletion, substitution) required to change one string into the other. In the field
of machine translation, this can be done by comparing the raw MT output to the human
translation.

Let's look at a couple of examples:

The Levenshtein distance between "sport" and "short" is 1, because 1 edit is required

to convert one word into the other (replace “p” with “h”).

The Levenshtein distance between “dog” and “frog” is 2, as it is not possible to convert

the first word into the second with fewer edits (replace “d” with “f” and add “r”).

The Levenshtein distance can never exceed the length of the longer of the two input
strings. If two words have nothing in common, the minimum number of edits equals
the number of characters in the longer string.

Example: for “computer” and “alibi”, the Levenshtein distance is 8 and can be no
higher than 8:

replace “c” with “a”
replace “o” with “l”
replace “m” with “i”
replace “p” with “b”
replace “u” with “i”
delete “t”
delete “e”
delete “r”
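The examples above can be checked with a short implementation. This is a standard dynamic-programming sketch written for illustration; the function name `levenshtein` is an assumption and it does not reproduce the code of any particular tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute if different
            ))
        previous = current
    return previous[-1]


for first, second in [("sport", "short"), ("dog", "frog"), ("computer", "alibi")]:
    print(first, second, levenshtein(first, second))  # distances 1, 2 and 8
```

The same function works on whole sentences, in which case every character, including spaces and punctuation, counts towards the distance.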

As with other automated measures, the results of the Levenshtein distance are not set

in stone. As mentioned before, there can be many correct translations for a single

source; however, the Levenshtein distance will not be able to measure quality on its

own. Results will vary, for example, if clauses are positioned differently in the MT output

and in the human reference translation.

Example:



MT: “If I go home after 10pm, I will let you know”.

Reference human translation: “I will let you know if I go home after 10 pm”.

In this case, the MT output is correct and no changes would be necessary during a

post-editing stage. However, the Levenshtein distance will be quite high, as many

changes would be required to turn the first sentence into the second one.

That suggests once more the importance of selecting large test beds to run any of

these automated evaluations on, as that will allow us to get more reliable results.

Automatic measures also have their limitations: the reference translation is not always

available, and those measures do not give an indication of post-editing productivity

expected. Therefore, they are useful for engine training development and comparison,

but not necessarily practical for a production scenario.

In January 2011, TAUS began working with a group of its enterprise members with a
clear objective in mind: to tackle the general problem of evaluating translation quality.
The idea of the Dynamic Quality Evaluation Framework (DQF) was born as a result.

The framework is still in development, and will allow users to profile their content and

receive guidance on best-fit evaluation techniques. A knowledge base documenting

best practices provides detailed practical information on how to carry out seven specific

types of quality evaluation. By establishing best practices, metrics and benchmarks

within a dynamic framework, the project team sought to apply best-fit evaluation

approaches depending on content type and usage, moving away from the dated, static

– one size fits all – approach used by most vendors.



6 Using the MT Output: the basics of post-editing

6.1 Introduction to post-editing

Post-editing is a new phase that replaces conventional translation for MT projects. It is

a change in the process, but the working environment remains the same. The same

applications and the same reference materials used in a conventional translation

project are also used when post-editing. Machine-translation is a new component in the

process that provides human translators more leverage along with the use of TMs.

Post-editors work on CAT tools editing fuzzy matches from the TM and machine-

translated segments to a publishable quality.

Post-editing is a skill which translators develop with time. Post-editors will not be fully

productive from day one as they need to learn their trade. Industry research has shown

that experience is the single most important factor in translation productivity and becomes even more influential in post-editing. Over time, translators can adapt their

working practices to use the MT output to their advantage.



Integrating post-editing into a production environment

On a file for post-editing, the Translation Memory is applied as usual, to create the

100% matches and fuzzy matches. Machine translation is applied to any untranslated

text left after the TM is applied.
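The order in which the TM and the MT are applied can be sketched as follows. This is an illustration under stated assumptions rather than a description of any CAT tool: similarity is measured here with Python's `difflib.SequenceMatcher`, the 75% fuzzy threshold is invented, and `machine_translate` stands in for whatever MT engine is connected.

```python
from difflib import SequenceMatcher


def pretranslate(segment, tm, machine_translate, fuzzy_threshold=0.75):
    """Return (target text, origin) for one source segment."""
    if segment in tm:
        return tm[segment], "100% match"
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score >= fuzzy_threshold:
        return tm[best_source], "fuzzy match"
    # Nothing usable in the TM: fall back to machine translation.
    return machine_translate(segment), "MT"


tm = {"Press the button.": "Appuyez sur le bouton."}


def fake_mt(text):
    """Placeholder for a real MT engine call."""
    return "[MT] " + text


for seg in ("Press the button.", "Press the red button.", "Remove the ignition key first."):
    print(seg, "->", pretranslate(seg, tm, fake_mt))
```

Fuzzy matches and MT segments both reach the post-editor as suggestions to be edited; only the 100% matches are taken over as they stand.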

The post-editing phase itself involves a number of key stages. Since the post-editor is

attempting to be as efficient and productive as possible, preparation is key. Do not rush

ahead without taking time to consider the source and MT output. Determine the

useable parts and then build around these. Focus on accuracy, without under- or over-

editing, and finally check over the grammar and the terminology. Post-editors are

generally advised that if the text scans well, it will flow well.

6.2 Degrees of post-editing

The market makes a distinction between post-editing to publishable quality and post-

editing to an understandable level. Post-editing to publishable level is the highest

quality standard. This is in line with the expectations of the majority of SDL's clients.

After post-editing, files undergo a quality check to ensure that the translation is correct

and fluent. The final quality should be comparable to conventional translation.

Post-editing to understandable quality, or light post-editing, is normally required for low

visibility text, or texts that would not otherwise be translated for a client as it would be

too expensive and time-consuming. A client might decide to opt for understandable

quality texts in order to reduce the number of support requests for a product or to

provide an extra service to the user, for example. Typical purposes of understandable

quality texts include offering users a quick answer on how to fix an issue or providing a

translation solution for low visibility content, such as FAQs, blogs, and knowledge

bases.

When post-editing to an understandable level alone, it is less important to correct style

and grammar so long as the meaning of a translation is clear. Most important, however,

is to follow the clear project requirements that should always be provided by the client

in advance.

Examples of light post-editing

LP: IT-EN
Source: Attrezzo di compressione per misurare la sporgenza delle canne dei cilindri (da utilizzare con 380000364 e piastre specifiche)
MT: Tools for compression to measure cylinder liner protrusion ( use with 380000364 and specific plates)
PE: Tool for compression to measure cylinder liner protrusion ( use with 380000364 and specific plates)
Comments: The plural needs to be edited because "attrezzo" is singular in the Italian source, but there is no need to remove the space after the bracket.

LP: IT-EN
Source: Prima di iniziare qualsiasi lavoro in quest'area, spegnere il motore ed estrarre la chiave di accensione.
MT: Always stop the engine and remove the Key before working in this area.
PE: Always stop the engine and remove the Key before working in this area.
Comments: There is no need to change the uppercase to lower case.

LP: FR-EN
Source: Si la valeur souhaitée n’est pas obtenue, répéter les instructions 3 à 5.
MT: If the desired pressure has not been reached, repeat instructions 3 to 5.
PE: If the desired pressure has not been reached, repeat instructions 3 to 5.
Comments: "Required" would be better than "desired", but since this is perfectly understandable there is no need to change it.

LP: EN-DE
Source: To remove the 3D diffuser:
MT: Zum Entfernen des 3D Refraktionstechnik:
PE: Zum Entfernen des 3D Refraktionstechnik:
Comments: The MT has the wrong article form, “des” instead of “der”. But the MT sentence is perfectly understandable as it is.

LP: EN-FR
Source: The pressure is reduced to pilot pressure.
MT: La pression est réduit à la pression pilote.
PE: La pression est réduit à la pression pilote.
Comments: The gender agreement is wrong; it should be “réduite” instead of “réduit”, but the sentence is understandable as it is and that does not need to be corrected.

Publishable quality vs. Understandable level

Publishable Quality
• Most frequent form of post-editing
• Generally used for higher visibility texts
• Comparable to conventional translation
• High quality expectations
• Follows standard client expectations

Understandable Level
• Less frequent form of post-editing
• Generally used for lower visibility texts
• Focus on meaning not on style and grammar
• Expectations based on specific client requirements
• Clear requirements are needed

Post-editing to publishable quality is covered in more detail in the next chapters. When
post-editing to publishable quality, the following rules apply:

1. Read the source segment first and then the MT output.
2. Determine the usable elements (single words and phrases) and make them the basis for your translation.
3. Build from the MT output and use every part of the MT output that can speed up your work.
4. Take care not to over-edit (unnecessary rephrasing) or under-edit (wrong prepositions, inflections, compounds, etc.) the MT output. The adjustment of style (such as “may” versus “might”) can be optional, but grammatical correctness in the target is not.
5. Correct any grammatical errors and make sure that the terminology of the MT output is compliant with glossaries and termbases. This will always need to be checked, as any inconsistencies in the training material will be reproduced in the output.
6. Run the compulsory checks (spelling, grammar, terminology check).
7. Finally, after post-editing each segment, reread your translation and make sure that no details are missing and you have not left any words that are not needed.

6.3 The quality check process

It is recommended that the post-editing process is followed by a quality check, which is
the equivalent of conventional review.

As part of SDL's workflow, the quality check is performed as a separate step by a
reviewer and guarantees that the translation is fully publishable. To achieve this, quality
at source is key: the post-edited file should already be of publishable quality. To
facilitate this, ensure that the post-editor receives clear instructions and has access to
all the most up-to-date reference materials. The required QA checks need to be run and
can be used as an indication of the post-editing quality.

When quality-checking, always bear the MT in mind and understand the initial MT

output. Identify known problems in advance (see section 8) and make sure to include

them in your checks (e.g. wrong prepositions, terminology, known issues with MT). It is

important to learn to distinguish between what needs to be changed and what can

remain untouched. Note that there are some items which always need to be amended

by the post-editor. Examples include date formats, spacing, wrong prepositions or

terminology issues caused by several possible translations of the same word.

When quality-checking machine-translated material, focus on over-editing and under-

editing (depending on style and client requirements). Over-editing will lead to lower

productivity and needs to be avoided during both the PE and the QA check phase.

Under-editing may result in quality issues and will impact negatively on the time needed

for quality check.

Before starting a quality check, make sure that all the content has been translated.
Then check that the post-edited text reads well from a user's point of view. The
post-edited text must match the source. Be careful to look for mistranslations, words left
out from the translation or additional words which are not in the source text. Check that

there are no typos. Scrolling down the file will enable you to spot spelling mistakes and

inconsistencies. Terminology should be consistent with the master glossary, especially

product names. It is vital that terminology is consistent. Sometimes terminology is not

consistent in the TMs and there are additional lists and guidelines for terminology.

Finally, check that style is overall consistent with the rest of the files and complies with

the style guide from the client.



7 How to get the most out of MT

7.1 What makes an effective post-editor?

In order to post-edit effectively, it is essential to use the machine translation output as

much as possible. Do not ignore the machine translation output and do not translate

segments from scratch. In almost all cases some parts of the automatic translation

output can be used and help to speed up work.

The following guidelines will help you to identify usable parts and achieve the maximum

post-editing productivity. The translator needs to achieve publishable quality at the

post-editing stage without sacrificing translation speed. Once you have learnt to identify

usable parts and to use them, you will find post-editing easier and faster than

translating from scratch. Like any other new skill, however, there is a learning curve

with MT post-editing: the more you practice, the faster and easier it gets.

Post-editing tips

• Do not ignore or erase the MT output
• Maximise the usage of the MT output
• Use the appropriate style and terminology
• Follow the project/client style guidelines
• If the MT meets the project requirements, do not modify it
• Do not spend time researching terminology unless the MT is clearly wrong
• Do not replace words with synonyms
• Do not make alterations for the sake of variation alone
• If formatting is an issue, restore the original source format and paste the useful MT parts instead
• An alternative if there are many tags is to delete them, edit the text, then insert the tags again
• At the end, re-read the segment and compare it to the source for accuracy

However, the MT is not only useful when it is easy to edit. You can also use the MT as
a source of inspiration when looking for the correct translation and pick out bits of the
sentence to reuse rather than trying to keep as much of the sentence as possible. This
is particularly relevant for longer sentences. Even sentences that are largely incorrect
can be useful so long as deleting the incorrect material is not time-consuming.

Account knowledge is also important for post-editing. As with all translation projects, conventional as well as MT, a solid knowledge of the project requirements with regard to style guidelines, terminology, TM and client expectations will help you achieve good post-editing productivity.

So what makes a good post-editor?

• Excellent linguistic skills
• Domain and subject knowledge
• Proficiency with CAT tools and automated text-checking
• Positive attitude towards MT
• Practice!

7.2 Post-editing quality expectations

The quality expectations will vary according to the degree of post-editing and the client requirements. However, certain general principles apply. The aim is to deliver a high quality translation faster than a conventional translation. Translation speed is a key factor when post-editing. Therefore, the machine translation needs to be corrected with a view to maintaining efficiency.

There should be no difference in quality between a human translation and a post-edited

translation when post-editing to publishable quality. However, there may be a slight

shift in style. Style should be correct and appropriate to the project, but may need to be

less refined in order to allow for a more efficient use of the MT output. Where a client

specifically asks for MT to be used on their project, the client needs to be made aware

of this and expectations need to be managed accordingly.

There will of course be a certain amount of variation, but this is a feature of conventional translation as well. So long as the quality criteria are adhered to, a post-edited text will be considered to have met the quality expectations.



Post-editing quality criteria

• The translation must be a correct reflection of the source.
• Spelling and punctuation must be correct.
• The translation must be grammatically and syntactically correct and reflect the conventions of the target language.
• The correct terminology must be applied and used consistently (including preferred translations for frequently occurring terms).
• Cultural references (date and time formats, units of measurement, number formats, currency, etc.) must be correctly adapted.
• The style and register of the target must be appropriate for the document type.
• The original formatting must be reproduced.
• Project guidelines must be followed.
• The translation must read well and be suitable for the end user.

There are two main issues that post-editors often face when attempting to fulfil the highest possible quality criteria in the shortest amount of time. These are under-editing and over-editing, which are discussed in more detail in the following sections.

7.3 Under-editing

If a post-editor has under-edited the MT output, they may have missed important errors that needed to be corrected, which may reflect badly on the quality of the translation. Under-editing is generally characterised by the following features:

• Errors (spelling, typos)
• Mistranslations (target does not match source)
• Inconsistent terminology
• Inaccuracy
• Inconsistency in figures, units of measurement, etc.
• Incorrect formatting
• Not following project-specific instructions

Below are some examples of under-editing:

Language pair: EN-ES
Source: On its walls you'll discover the figures of a puma and a snake.
MT: En sus murallas, descubrirá la cifras de un puma y una serpiente.
PE: En sus murallas descubrirá la figuras de un puma y una serpiente.
Reviewer: En sus murallas descubrirá las figuras de un puma y una serpiente.
Comment: The term “cifras” has been correctly post-edited and replaced with “figuras”, but the article “la” has not been changed to the plural form.

Language pair: EN-ES
Source: Inside you can see a sacrificial altar made of a huge stone.
MT: En su interior se puede ver una altar de sacrificios de una enorme piedra.
PE: En su interior se puede ver una altar de sacrificios hecho con una enorme piedra.
Reviewer: En su interior se puede ver un altar de sacrificios hecho con una enorme piedra.
Comment: The preposition “de” has been correctly post-edited, but the article “una” does not correspond to the gender of the noun “altar” (“una” is feminine whilst “altar” is masculine).

Language pair: EN-FR
Source: How long will the battery last using interactive features (such as games) on my phone?
MT: Combien de temps dure l'autonomie à partir d'interactive fonctions (comme les jeux) sur mon téléphone ?
PE: Combien de temps dure l'autonomie de la batterie lorsque j'utilise les fonctions interactives (comme les jeux) sur mon telephone?
Reviewer: Quelle est l'autonomie de la batterie lorsque j'utilise les fonctions interactives (comme les jeux) de mon telephone ?
Comment: "Combien de temps dure" should not be combined with the word "autonomie". The literal translation of "How long does XXX last" is not appropriate in this context. The correct version is "Quelle est l'autonomie". The preposition "sur" is not appropriate in this context.

7.4 Over-editing

If a post-editor has over-edited the MT output, they may be taking extra time, which may affect their overall productivity and reduce the benefits of post-editing. Over-editing is typically characterised by preferential rather than necessary changes.


Over-editing

• Do not rewrite the translation unless unavoidable
• Do not change correct and understandable translations, even if they could be phrased more naturally or fluently
• If the MT output style meets the project requirements, do not change it
• Reduce changes to a minimum and focus on actual mistakes

There is always room to allow stylistic changes and creativity with post-editing, and certainly stylistic features that do not comply with the client style guide should be amended. The important thing to remember is not to let preferential changes distract from necessary amendments and not to let these changes have a negative impact on the overall productivity.

Below are some examples of over-editing:

Language pair: DE-EN
Source: Die Kühlung erfolgt durch das massive Aluminium-Gehäuse und die seitlich angebrachten Kühlrippen und kommt gänzlich ohne Lüfter aus.
MT: The cooling takes place through the solid aluminum case and the side-mounted cooling fins and comes completely without fans.
PE with overediting: The cooling fins fitted on the side of the solid aluminium casing ensure that the computer is cooled, as it comes completely without fans.
PE without overediting: Cooling takes place through the solid aluminium casing and the side-mounted cooling fins - there is no need whatsoever for fans.
Comment on overedited version: Unnecessary re-ordering and re-translating of segments.

Language pair: DE-EN
Source: Aber nicht nur Äußerlich hat dieses Festplattengehäuse einiges zu bieten.
MT: But not only on the outside, this hard drive enclosure has something to offer.
PE with overediting: This hard drive casing has more than just a great design.
PE without overediting: But it's not only on the outside where this hard drive casing has something to offer.
Comment on overedited version: Overedited version is stylistically more pleasing, but requires a major rewrite, while version without overediting is equally correct.

Language pair: DE-EN
Source: Fotos mit 1,3 Megapixeln
MT: Photos with 1.3 megapixels
PE with overediting: 1.3 megapixel photos
PE without overediting: Photos with 1.3 megapixels
Comment on overedited version: Unnecessary re-ordering of segments.

Language pair: DE-EN
Source: Zudem stehen verschieden SATA-Typen zur Auswahl, wie z.B. Micro SATA oder Slimline-SATA.
MT: In addition there are different SATA-types are available, such as micro SATA or Slimline SATA.
PE with overediting: There are various types of SATA available for this, such as micro SATA or slimline SATA.
PE without overediting: In addition, there are different SATA types available, such as micro SATA or slimline SATA.
Comment on overedited version: Unnecessary re-phrasing and change of syntax.

Language pair: DE-EN
Source: Mit der 1 Meter langen Tischantenne können Sie Ihren WLAN-Empfang deutlich optimieren.
MT: With the 1 meter long Tischantenne you can significantly optimize your WLAN-reception.
PE with overediting: You can optimise your WLAN reception significantly using the 1-m table-top antenna.
PE without overediting: With the 1-m table-top antenna you can significantly optimise your WLAN reception.
Comment on overedited version: Unnecessary re-ordering of segments; more of the MT can be left unchanged if syntax is kept as is.

Language pair: EN-DE
Source: Make sure that the brake pedal is depressed while you perform this procedure.
MT: Sicherstellen, dass das Bremspedal niedergedrückt wird während Sie dieses Verfahren durchführen.
PE with overediting: Währenddessen muss das Bremspedal weiterhin gedrückt werden!
PE without overediting: Das Bremspedal muss niedergedrückt sein, während Sie dieses Verfahren durchführen.
Comment on overedited version: Unnecessary re-write; usable parts of the MT were ignored in overedited version.

Language pair: EN-DE
Source: Install the Bluetooth printer on your computer and set it as the default printer.
MT: Installieren Sie die Bluetooth Drucker auf Ihrem Computer, und richten Sie ihn als Standarddrucker.
PE with overediting: Installieren Sie den Bluetooth-Drucker auf Ihrem Computer, und legen Sie ihn als Standarddrucker fest.
PE without overediting: Installieren Sie den Bluetooth-Drucker auf Ihrem Computer, und richten Sie ihn als Standarddrucker ein.
Comment on overedited version: Unnecessary use of synonyms; verb "einrichten" was unnecessarily replaced by "festlegen".

Language pair: EN-DE
Source: Allow the computer to lock automatically after 10 seconds.
MT: Warten Sie, bis der Computer die Sperre automatisch nach 10 Sekunden.
PE with overediting: Gestatten Sie, dass der Computer nach 10 Sekunden automatisch gesperrt wird.
PE without overediting: Warten Sie, bis der Computer die Sperre nach 10 Sekunden automatisch aktiviert.
Comment on overedited version: Unnecessary use of synonyms; verb "warten" was unnecessarily replaced by "gestatten"; "warten" conveyed the same meaning in this context.

Language pair: EN-DE
Source: When the proximity feature is enabled but inactive, the following message displays in the Bluetooth Device Control window for the phone:
MT: Wenn der Nähe Funktion aktiviert, aber nicht aktiv ist, wird die folgende Meldung in der Bluetooth Device Control Fenster für das Telefon:
PE with overediting: Wenn die Näherungsfunktion eingeschaltet aber inaktiv ist, wird im Fenster "Bluetooth-Gerätesteuerung" für das Telefon die folgende Meldung angezeigt:
PE without overediting: Wenn die Näherungsfunktion aktiviert aber nicht aktiv ist, wird die folgende Meldung im Fenster "Bluetooth-Gerätesteuerung" für das Telefon angezeigt:
Comment on overedited version: Unnecessary use of synonyms; "eingeschaltet" is a synonym of "aktiviert" and "inaktiv" is a synonym of "nicht aktiv" in this context.

Language pair: EN-DE
Source: This feature provides a quick way to transfer files without requiring you to browse the file system on the other device.
MT: Diese Funktion bietet eine schnelle Möglichkeit, Dateien, ohne die Datei zu durchsuchen auf der anderen Gerät zu übertragen.
PE with overediting: Mithilfe dieser Funktion lassen sich Dateien schnell übertragen, ohne das Dateisystem des anderen Geräts durchsuchen zu müssen.
PE without overediting: Diese Funktion bietet eine Möglichkeit, Dateien schnell ohne Durchsuchen des Dateisystems des anderen Geräts zu übertragen.
Comment on overedited version: Unnecessary re-ordering of segments; more of the MT can be left unchanged if the syntax is kept as is.

Language pair: EN-FR
Source: After disconnecting the high voltage terminals, busbars, etc., insulate the parts with insulating tape.
MT: Après avoir débranché les bornes haute tension, jeux, etc., isoler les pièces avec de la bande adhésive isolante.
PE with overediting: Après le débranchement des bornes, barres collectrices, etc. haute tension, isoler les pièces avec du ruban isolant.
PE without overediting: Après avoir débranché les bornes, barres collectrices, etc. haute tension, isoler les pièces avec du ruban isolant.
Comment on overedited version: Unnecessary change of syntax.

Language pair: EN-FR
Source: For further information on the Table View, see the tutorial "Table View Productivity Features"
MT: Pour plus d'informations sur l'affichage en tableau, voir les sections du tutoriel "Fonctions de productivité - Affichage en tableau"
PE with overediting: Pour obtenir de plus amples renseignements sur l’affichage en tableau, voir le tutoriel « Fonctions de productivité - Affichage en tableau »
PE without overediting: Pour plus d'informations sur l’affichage en tableau, voir le tutoriel « Fonctions de productivité - Affichage en tableau »
Comment on overedited version: Correct expression in MT; not needing any editing.

Language pair: EN-FR
Source: Alternator is found to be noisy
MT: L'alternateur est bruyant
PE with overediting: Le client trouve que l’alternateur est bruyant
PE without overediting: L'alternateur est bruyant
Comment on overedited version: Correct expression in MT; not needing any editing.

Language pair: EN-FR
Source: The oil in these passages is trapped and the blade does not move.
MT: L'huile dans ces passages est piégée et la lame ne bouge pas.
PE with overediting: La lame ne bouge pas car l'huile de ces conduits est piégée.
PE without overediting: L'huile dans ces passages est piégée et la lame ne bouge pas.
Comment on overedited version: Unnecessary rephrasing.

Language pair: EN-IT
Source: Be sure that the hydraulic hose is free of abrasion.
MT: Accertarsi che il flessibile idraulico sia privo di abrasioni.
PE with overediting: Assicurarsi che il flessibile idraulico sia privo di abrasioni.
PE without overediting: Accertarsi che il flessibile idraulico sia privo di abrasioni.
Comment on overedited version: Unnecessary use of a synonym.

Language pair: EN-IT
Source: Adjust the angle by raising the rear of the vehicle to ensure water covers the joints.
MT: Regolare l'angolo sollevando la parte posteriore del veicolo per assicurarsi che l'acqua copre i giunti.
PE with overediting: Sollevando la parte posteriore del veicolo, regolare l'angolo per assicurarsi che l'acqua copra i giunti.
PE without overediting: Regolare l'angolo sollevando la parte posteriore del veicolo per assicurarsi che l'acqua copra i giunti.
Comment on overedited version: Unnecessary re-ordering of phrases.

Language pair: EN-IT
Source: The only way to allow the device to validate a self-signed certificate is to install the certificate on the device.
MT: L'unico modo per consentire il dispositivo per convalidare un certificato autofirmato per installare il certificato sul dispositivo.
PE with overediting: Per permettere al dispositivo di convalidare un certificato autofirmato, l'unico modo è quello di installare il certificato sul dispositivo.
PE without overediting: L'unico modo per consentire al dispositivo di convalidare un certificato autofirmato è quello di installare il certificato sul dispositivo.
Comment on overedited version: Unnecessary use of synonyms and reordering of phrases.
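
As a rough illustration only, the difference between a heavily and a lightly post-edited segment can be made visible by measuring how much of the MT output survives in the post-edited text. The sketch below is not part of this workbook's process: the function name `mt_retention` is hypothetical, and the word-level similarity from Python's standard `difflib` module is an assumed stand-in for whatever effort metric a project might actually use. A low score does not prove over-editing, since poor MT legitimately needs heavy correction; it only flags segments worth a second look.

```python
from difflib import SequenceMatcher


def mt_retention(mt: str, post_edited: str) -> float:
    """Return a 0..1 word-level similarity between MT and post-edited text.

    Hypothetical helper: both strings are split on whitespace, so the
    score reflects how many MT words were kept in the same order.
    """
    mt_words = mt.split()
    pe_words = post_edited.split()
    return SequenceMatcher(None, mt_words, pe_words).ratio()


# Segment taken from the DE-EN example "Fotos mit 1,3 Megapixeln".
mt = "Photos with 1.3 megapixels"
pe_overedited = "1.3 megapixel photos"
pe_minimal = "Photos with 1.3 megapixels"

print(round(mt_retention(mt, pe_overedited), 2))  # 0.29
print(round(mt_retention(mt, pe_minimal), 2))     # 1.0
```

Here the overedited version keeps only one word of the MT in place, while the version without overediting keeps all of it, which matches the reviewer comment for that example.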

7.5 Help improve MT for the future

To make post-editing easier in the future, post-edit and translate in an MT-friendly way: use simple sentence structures, do not add information or rephrase the source, and do not complicate the word order in the target unnecessarily. This will improve the training material with which engines are retrained.

For some language combinations, the word order is considerably different between

source and target and this will always pose problems for MT. However, keeping closer

to the source is generally the best way forward:



In this instance, the second translation has the advantage that the word order in the

target is closer to the word order in the source. This can help the MT engine to match

up the words “error” (German: “Fehler”) and “dash” (German: “Armaturenbrett”) more

easily with their correct translations.

If the verb is usually found at the beginning of the sentence in the source and at the

end of the sentence in the target, adding a lot of additional information in the middle

can also make it harder for the MT to match up source and target segments correctly.

As a rule, the MT engine can handle shorter phrases better than long convoluted

sentences.

A more MT-friendly style is also achieved by keeping trans
