5. Manuel Arcedillo & Juanjo Arevalillo (Hermes): Translation memories

Translation memories

Hermes Traducciones y Servicios Lingüísticos

A brief history…

Processes have changed…

…but not the ultimate goal.

Productivity

Found in Translation, Nataly Kelly & Jost Zetzsche (2012)

Diagram (LAN): project managers, translators, revisers and engineering around a LAN server holding the translation memory; a translator, reviser, DTPer and project manager connected over the Internet.

Diagram (WAN): the same roles and LAN server with the translation memory, connected over a WAN.

Diagram: MT, CAT, TEnTs, SaaS, crowdsourcing and clouding, alongside project managers, the translation memory, translators, revisers, engineering and the LAN server.

Internals of a translation memory

Translation Memory Exchange

• OSCAR (Open Standards for Container/Content Allowing Re-use)
• TMX (Translation Memory eXchange) standard
• Leveraging translation memories regardless of the tool or platform
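To make the TMX point concrete, here is a minimal Python sketch that reads source/target pairs from a TMX document using only the standard library. It assumes TMX 1.4, where each `<tu>` (translation unit) holds one `<tuv>` per language identified by `xml:lang`, with the text in `<seg>`. The sample content and the `read_tmx_pairs` helper are hypothetical, and inline markup is deliberately flattened.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX 1.4 document with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(tmx_text)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Fall back to the plain 'lang' attribute used by older TMX versions.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Simplification: concatenate all text inside <seg>, including
                # the content of any inline-code elements.
                segs[lang.lower()] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


print(read_tmx_pairs(SAMPLE_TMX, "en-US", "es-ES"))
```

Because the file only relies on the standard's element names, the same reader works whatever tool exported the TMX, which is the point of the exchange format.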

The ancestors of CAT Tools…

XL8 DOS tool in a workflow known as XLN

IBM TranslationManager

Screenshot callouts: exact match, proposed terms in dictionary, source text, translation proposal.

Trados Workbench

Déjà Vu

Star Transit (no memory!)

Wordfast

SDLx

memoQ

OmegaT (free!)

Workflow tools: Across

SDL Idiom WorldServer

Specialised tools: Catalyst

Specialised tools: Passolo

Basic TM features in CAT tools

Leverage of previous translations.

Analysis for quoting, planning and keeping track of progress.

Concordance for sub-segment searches.

Maintenance to perform global changes, import/export content, etc.
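A concordance search looks up a word or phrase inside TM segments instead of matching whole segments. Below is a minimal Python sketch, assuming a small in-memory TM of hypothetical (source, target) pairs and a whole-word, case-insensitive match; a linear scan like this is only practical for small TMs.

```python
import re

# Hypothetical TM content as (source, target) pairs.
TM = [
    ("Save the file", "Guarde el archivo"),
    ("Save as profile", "Guardar como perfil"),
    ("Close the file", "Cierre el archivo"),
]


def concordance(tm, phrase):
    """Return TM pairs whose source contains `phrase` as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in tm if pattern.search(src)]


for src, tgt in concordance(TM, "file"):
    print(src, "->", tgt)
```

The word boundaries keep "profile" out of the results for "file", so the translator sees only the sub-segment that was actually searched for.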

Leveraging TMs

CAT tools provide answers to these questions:

What is the fuzzy match percentage of the segment?

What parts of the text are different?

Where is the match coming from?
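The three questions above can be answered with a small amount of code. This Python sketch is one possible approach, not any particular tool's algorithm: it scores similarity with `difflib.SequenceMatcher` over words, lists the differing word spans, and keeps an origin field with each TM entry. The TM entries, file names and helper names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical TM entries: (source, target, origin); origin records where
# the translation unit came from ("where is the match coming from?").
TM = [
    ("Click Save to store the file.",
     "Haga clic en Guardar para almacenar el archivo.", "project_A/ui.xml"),
    ("Click Open to load the file.",
     "Haga clic en Abrir para cargar el archivo.", "project_B/help.html"),
]


def fuzzy_score(new, old):
    """Word-level similarity as an integer percentage (one possible metric)."""
    return round(100 * SequenceMatcher(None, new.split(), old.split()).ratio())


def differences(new, old):
    """Word spans that differ between the new segment and the TM source."""
    a, b = new.split(), old.split()
    ops = SequenceMatcher(None, a, b).get_opcodes()
    return [(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def best_match(new, tm):
    """Return (score, source, target, origin) for the highest-scoring TM entry."""
    return max(((fuzzy_score(new, src), src, tgt, origin) for src, tgt, origin in tm),
               key=lambda entry: entry[0])


segment = "Click Save to store the document."
score, src, tgt, origin = best_match(segment, TM)
print(score, origin)               # 83 project_A/ui.xml
print(differences(segment, src))   # [('replace', 'document.', 'file.')]
```

Scoring over words rather than characters is a design choice; a character-based metric would give a different percentage for the same pair.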

Fuzzy match display

Fuzzy match display (II)

Fuzzy match display (III)

Fuzzy match display (IV)

Analysis feature

The words of each segment are counted in a single match band, according to the segment's best match:

101%

100%

99-95%

94-85%

84-75%

New words

Repetitions
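A minimal Python sketch of such an analysis, assuming each segment has already been matched against the TM and given an integer score plus a flag for in-context matches (101% as it is commonly used: a 100% match whose surrounding context also matches). The repetition rule is a hypothetical convention chosen for illustration; tools differ in how they combine repetitions with TM matches.

```python
from collections import Counter

# (band name, lowest score, highest score) for the fuzzy bands listed above.
FUZZY_BANDS = [("100%", 100, 100), ("99-95%", 95, 99),
               ("94-85%", 85, 94), ("84-75%", 75, 84)]


def band_for(score, context_match):
    """Map a segment's best (integer) TM score to a band."""
    if score == 100 and context_match:
        return "101%"
    for name, low, high in FUZZY_BANDS:
        if low <= score <= high:
            return name
    return "New words"


def analyse(segments):
    """segments: (text, best_tm_score, context_match) tuples -> words per band.

    Assumed convention: a segment already seen in this job counts as a
    repetition unless it has a 100% or 101% TM match.
    """
    counts, seen = Counter(), set()
    for text, score, context_match in segments:
        band = band_for(score, context_match)
        if text in seen and band not in ("101%", "100%"):
            band = "Repetitions"
        seen.add(text)
        counts[band] += len(text.split())
    return counts


job = [("Open the file.", 100, True), ("Save the file now.", 100, False),
       ("Close the window.", 90, False), ("Print it.", 40, False),
       ("Print it.", 40, False)]
print(dict(analyse(job)))
```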

Analysis results

Different tools, different word counts

Band          CAT Tool 1   CAT Tool 2
101%              41,352       29,782
100%               4,194       16,002
99-95%             3,698        6,038
94-85%             2,077        2,633
84-75%             5,270        1,369
New words          5,241        6,150
Repetitions        2,068        5,451
Total             63,900       58,425

Different word counts

There is no standard fuzzy matching algorithm.

CAT tools may have different auto-substitution elements: numbers, dates, acronyms, variables, etc.

Different approaches to 101% matches.

Cross-file repetitions and internal fuzzy leverage.

Different file format filters.

Different segmentation rules.

SRX (Segmentation Rules eXchange) is the standard for exchanging segmentation rules.
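Segmentation alone can change the counts. The following Python sketch mimics the idea behind SRX rules: an ordered list of break and no-break rules, each with a regex for the text before and after a candidate break position, where the first matching rule decides. It is not an SRX parser, and the two rules are hypothetical examples.

```python
import re

# Hypothetical rules: (break?, regex before the break, regex after the break).
RULES = [
    (False, r"\b(?:e\.g|i\.e|Mr|Dr)\.", r"\s"),  # exception: no break after these abbreviations
    (True, r"[.?!]", r"\s+[A-Z]"),               # break after . ? ! followed by a capital
]


def segment(text, rules):
    """Split text at positions where the first matching rule is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if re.search(before + r"$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides; stop checking rules
    segments.append(text[start:].strip())
    return [s for s in segments if s]


text = "Ask Dr. Smith. He knows."
print(segment(text, RULES))      # ['Ask Dr. Smith.', 'He knows.']
print(segment(text, RULES[1:]))  # ['Ask Dr.', 'Smith.', 'He knows.']
```

With the abbreviation exception removed, the same text yields three segments instead of two, so tools with different rule sets report different segments, and therefore different matches, for the same file.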

Weighted word count

Each band is assigned a percentage of the full word rate according to a weighting scheme (negotiable per client). For example:

Band          Weight
101%              0%
100%             20%
99-95%           30%
94-85%           40%
84-75%           50%
New words       100%
Repetitions      20%

Different tools, different word counts (II)

CAT Tool 1

Band             Words   Weight   Weighted words
101%            41,352 x     0%                0
100%             4,194 x    20%              839
99-95%           3,698 x    30%            1,109
94-85%           2,077 x    40%              831
84-75%           5,270 x    50%            2,635
New words        5,241 x   100%            5,241
Repetitions      2,068 x    20%              414
Total           63,900                    11,069

CAT Tool 2

Band             Words   Weight   Weighted words
101%            29,782 x     0%                0
100%            16,002 x    20%            3,200
99-95%           6,038 x    30%            1,811
94-85%           2,633 x    40%            1,053
84-75%           1,369 x    50%              684
New words        6,150 x   100%            6,150
Repetitions      5,451 x    20%            1,090
Total           58,425                    14,989
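The weighted word count is a weighted sum over the bands. A minimal Python sketch, using the example weighting scheme above and assuming each band is rounded to a whole word with Python's `round` (an assumption that happens to match the per-band values shown):

```python
# Example weighting scheme (percent of the full word rate).
WEIGHTS = {"101%": 0, "100%": 20, "99-95%": 30, "94-85%": 40,
           "84-75%": 50, "New words": 100, "Repetitions": 20}


def weighted_word_count(words_per_band, weights):
    """Sum of words x weight per band, each band rounded to a whole word."""
    return sum(round(words * weights[band] / 100)
               for band, words in words_per_band.items())


tool_1 = {"101%": 41352, "100%": 4194, "99-95%": 3698, "94-85%": 2077,
          "84-75%": 5270, "New words": 5241, "Repetitions": 2068}
print(sum(tool_1.values()), weighted_word_count(tool_1, WEIGHTS))  # 63900 11069
```

Applied to the CAT Tool 1 counts, this reproduces the 63,900 words and 11,069 weighted words shown. Summed the same way, the CAT Tool 2 band figures give 67,425 words and 13,988 weighted words rather than the listed totals of 58,425 and 14,989, so at least one CAT Tool 2 figure appears to have been mistranscribed.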

Weighted word count tools

TMs and statistical analysis

If big enough, TMs provide the bilingual corpus needed to build SMT (statistical machine translation) engines.

Some CAT tools can scan the TM for correlations between words in the source and the target.
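One simple way to measure such correlations is the Dice coefficient over segment co-occurrence: a source word and a target word score high when they tend to appear in the same translation units. The Python sketch below illustrates that statistic on a tiny hypothetical TM; it is not the method of any particular CAT tool.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Dice coefficient 2*c(s,t) / (c(s) + c(t)) for every co-occurring word pair,
    where c counts the translation units a word (or word pair) appears in."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.lower().split()), set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in joint.items()}


tm = [("open the file", "abra el archivo"),
      ("close the file", "cierre el archivo"),
      ("open the window", "abra la ventana")]
scores = dice_scores(tm)
print(scores[("open", "abra")], scores[("open", "archivo")])  # 1.0 0.5
```

Frequent function words such as "the" and "el" also score high, which is why a raw co-occurrence statistic like this is only a starting point.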
