15. Alessandro Cattelan (Translated): Natural Language Processing for Translation



Natural Language Processing for Translation

Alessandro Cattelan, Translated srl

Extremely fragmented market, both in terms of language service providers and customers.

Language industry size

Language service industry: $33.5 billion in 2012



Large customers spend millions of dollars a year on translation. However, it is the smaller customers with limited budgets that make up most of the market.

Language industry customers

Specific characteristics

Larger customers:

Large budgets

Use technology (MT, TM, termbases, etc.)

Efficient processes (translation is part of the development cycle)

Smaller customers:

Tight budgets

No technology and no processes

Smaller Customers

Even though they are on a tight budget and use no technology for translation, we can still give them something better than this…

Common requirements

Both smaller and larger customers are interested in:

Getting high quality translations

Receiving the translation as soon as possible

Saving as much as possible

Challenge → Opportunity

Challenge: no technology and no processes to improve efficiency in translation

Opportunity: develop technology and processes to win customers

Content reuse

Large public translation memories make it possible to leverage previously translated content and to reduce weighted word count.

Collecting data

Aligning bilingual content

Making data available in CAT tools
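To make "weighted word count" concrete, here is a minimal sketch of how reused content lowers the billable volume of a job. The match bands and the percentages in `ASSUMED_WEIGHTS` are illustrative assumptions, not figures from the talk; real discount grids vary by vendor.

```python
# Hypothetical discount grid: percentage of the full rate charged per match band.
# These numbers are assumptions for illustration only.
ASSUMED_WEIGHTS = {"exact": 30, "fuzzy": 60, "new": 100}

def weighted_word_count(segments, weights=ASSUMED_WEIGHTS):
    """Return the weighted word count for (word_count, match_band) pairs."""
    # Sum in integer percent first, then scale, to avoid rounding noise.
    return sum(words * weights[band] for words, band in segments) / 100

# A made-up job: 350 raw words, part of which is already in the memory.
job = [(100, "exact"), (50, "fuzzy"), (200, "new")]
total = weighted_word_count(job)
```

With these assumed weights, the 350 raw words count as 260 weighted words; the more content the memory covers, the lower that number gets.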



Translation Memory

Never translate the same sentence twice… nor part of it!

Improving matching algorithm for translation memories


To open a file, select File from the menu and click on Open

Per aprire un file, selezionare File dal menu e fare clic su Apri

Select File from the menu […]
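The idea of matching part of a sentence can be sketched with Python's standard `difflib`. This is only an illustration of sub-segment matching under simple assumptions (lowercased word tokens, longest contiguous run); it is not the matching algorithm used by any particular CAT tool, and `tokens` and `shared_span` are hypothetical helper names.

```python
import re
from difflib import SequenceMatcher

def tokens(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())

def shared_span(tm_source, new_segment):
    """Return the longest run of words shared by a TM source and a new segment."""
    a, b = tokens(tm_source), tokens(new_segment)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]

span = shared_span(
    "To open a file, select File from the menu and click on Open",
    "Select File from the menu",
)
```

Here `span` covers the whole new segment, so the memory entry can be reused even though the full sentences differ. Locating the matching words on the target side is a separate alignment problem that this sketch does not address.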


Using MT to complete fuzzy matches


Select File from the menu

Selezionare File dal menu

Select File from the menu and click on New document

Selezionare File dal menu […]
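A rough sketch of completing a fuzzy match with MT: the part of the segment found in the memory is taken from the memory, and only the remainder is sent to an MT engine. The function `complete_with_mt` and the `placeholder_mt` stub are hypothetical; the stub only marks the text that would go to MT, and simple concatenation is an assumption that ignores word order and agreement.

```python
def complete_with_mt(segment, tm, mt):
    """Translate the longest TM-covered prefix from memory, the rest with MT."""
    words = segment.split()
    # Try the longest prefix first, then shorter ones.
    for cut in range(len(words), 0, -1):
        prefix = " ".join(words[:cut])
        if prefix in tm:
            rest = " ".join(words[cut:])
            return tm[prefix] if not rest else f"{tm[prefix]} {mt(rest)}"
    # Nothing reusable in the memory: fall back to MT for the whole segment.
    return mt(segment)

tm = {"Select File from the menu": "Selezionare File dal menu"}

def placeholder_mt(text):
    """Stand-in for a real MT call; it only tags the untranslated remainder."""
    return f"[MT: {text}]"

result = complete_with_mt(
    "Select File from the menu and click on New document", tm, placeholder_mt
)
```

In this example the first five words come from the memory and only "and click on New document" is left for the engine, which is the part the translator then needs to review most carefully.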

Machine Translation

Most of the time, customers have neither custom MT engines nor the data to create one.

Use existing domain-specific engines, even though they are not adapted to the customer

Adapt generic engines to specific domains (needs to be fast!)

Adapt the engine in real-time with the user translations

Using generic engines

Post-processing of MT output from generic engines:

Correcting terminology issues

Adapting output to previous translations

Managing mark-up…
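As one possible illustration of the terminology step, the sketch below replaces unapproved target terms in MT output with the approved ones from a glossary. The function name, the glossary format, and the sample MT output are all assumptions made for the example; a real system would also need to handle inflection and casing.

```python
import re

def enforce_terminology(mt_output, corrections):
    """Replace whole-word occurrences of unapproved terms with approved ones."""
    for wrong, approved in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps any backslashes in `approved` literal.
        mt_output = re.sub(pattern, lambda _m, term=approved: term, mt_output)
    return mt_output

# Made-up MT output that uses "Aprire" where the UI label is "Apri".
fixed = enforce_terminology("Fare clic su Aprire", {"Aprire": "Apri"})
```

The word-boundary match keeps the replacement from touching longer words that merely contain the unapproved term.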

“If I have seen further it is by standing on the shoulders of giants.” [I. Newton]

MT quality evaluation

Establishing the right weight for words translated by MT systems.


What is a fair rate for editing machine translation output?

Confidence scores for MT

Matching metrics for TM


MT quality perceived by the
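One way to reason about a fair rate is to measure how much of the MT output the translator actually had to change. The sketch below computes a word-level edit distance between the MT output and the post-edited text, normalised by the length of the final text. Using this ratio as a proxy for editing effort is an assumption of the example, not a method stated in the talk, and the sample sentences are made up.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]

def edit_effort(mt_output, post_edited):
    """Share of words edited, relative to the post-edited length."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    if not pe_words:
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)

effort = edit_effort("Selezionare File da menu", "Selezionare File dal menu")
```

Here one word out of four was changed, so the effort ratio is 0.25. Collected over many segments, such ratios could be compared with MT confidence scores or TM match bands when setting a weight for MT words.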


Terminology Management

Terminology management can have a great impact on quality and productivity.

Automatic extraction of terminology

Finding target language equivalents for source terms

Adding context to the terms
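A very simple take on automatic term extraction is to count word pairs that recur across documents and do not start or end with a function word. The stopword list, the thresholds, and the sample sentences below are assumptions for illustration; production extractors typically add part-of-speech filters and statistical scoring.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list for the example.
ASSUMED_STOPWORDS = {"the", "a", "of", "and", "to", "in", "on", "for", "is"}

def candidate_terms(texts, n=2, min_count=2):
    """Return n-grams seen at least min_count times, most frequent first."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip n-grams that begin or end with a function word.
            if gram[0] in ASSUMED_STOPWORDS or gram[-1] in ASSUMED_STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [term for term, c in counts.most_common() if c >= min_count]

docs = [
    "Open the translation memory.",
    "The translation memory stores segments.",
    "Export the translation memory to a file.",
]
terms = candidate_terms(docs)
```

On these three sentences the only candidate is "translation memory". Finding its target-language equivalent and attaching context are the follow-up steps listed above.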

Any questions?