MT Evaluation: Seeing the Wood for the Trees. John Tinsley, CEO and Co-founder. TAUS QE Summit, Dublin, 28th May 2015


Page 1: MT Evaluation: Seeing the Wood for the Trees

MT Evaluation Seeing the Wood for the Trees

John Tinsley, CEO and Co-founder

TAUS QE Summit. Dublin. 28th May 2015

Page 2: MT Evaluation: Seeing the Wood for the Trees

We need to marry the data we know from operations with the data we produce during MT evaluations to create intelligence

Let’s look at how we can find that out and what it means…

Making the business case for MT

KNOWNS

•  Revenue from translation

•  Costs (internal, outsourced)

•  Variations of this information across content and languages

UNKNOWNS

•  MT performance

•  Cost of MT

•  Variations of this information across content and languages

Page 3: MT Evaluation: Seeing the Wood for the Trees

Calculating potential ROI

Parameters
• Per word rate (LSP): €0.10
• Vendor rate: €0.08
• Productivity gain: ???
• Project word count: 5,000,000
• MT cost: ???

MT weighted word count: ???

No Machine Translation:
• LSP revenue: €500,000
• Vendor cost: €400,000
• MT cost: €0
• Gross profit: €100,000
• Gross profit margin: 20.0%

With Machine Translation:
• LSP revenue: €500,000
• Vendor cost: ???
• MT cost: ???
• Gross profit: ???
• Gross profit margin: ???

Gross profit increase when using MT: ???%

**These numbers are for illustrative purposes only and not related to the case study

Page 4: MT Evaluation: Seeing the Wood for the Trees

Problem: Large Chinese-to-English patent translation project. Challenging content and language.

Question: What efficiencies, if any, can machine translation add to the workflow of RWS translators?

How we applied different types of MT evaluation at different stages in the process, at various go/no-go decision points, to help RWS assess whether MT is viable for this project.

Client Case Study – RWS

- UK-headquartered public company
- Founded 1958
- 9th largest LSP (CSA 2013 report)
- Leader in specialist IP translations

Page 5: MT Evaluation: Seeing the Wood for the Trees

Lots of different ways to do evaluation**
– automatic scores
  • BLEU, METEOR, GTM, TER
– fluency, adequacy, comparative ranking
– task-based evaluation
  • error analysis, post-edit productivity

Different metrics, different intelligence
– what does each type of metric tell us?
– which ones are usable at which stage of evaluation?

e.g. can we really use automatic scores to assess productivity?

e.g. does productivity delta really tell us how good the output is?

MT Evaluation – where do we start!?
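Since TER recurs throughout the rest of this evaluation, a minimal sketch of what the score measures may help. This is a simplified, word-level version, assuming plain whitespace tokenisation and omitting the block-shift edits of full TER; it is illustrative, not the metric implementation used in the case study.

```python
# Simplified TER: word-level edit distance divided by reference length.
# (Full TER also allows block-shift edits; this sketch omits them.)
def ter(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

print(ter("the cats sat on the mat", "the cat sat on the mat"))  # 1 edit / 6 words
```

Lower is better: 0 means the post-editor would change nothing, 1 means roughly every reference word needs an edit.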

Page 6: MT Evaluation: Seeing the Wood for the Trees

Step 1: Baseline and Customisation

Can we improve our baseline engines through customisation?

[Chart: BLEU and TER scores (scale 0–0.8), Iconic Baseline vs Iconic Customised]

What next?

How good is the output relative to the task, i.e. post-editing?
- fluency/adequacy is not going to tell us
- let's start with segment-level TER

-  Huge improvement

-  Intuitively, scores reflect well but don’t really say anything

-  Let’s dig deeper

Page 7: MT Evaluation: Seeing the Wood for the Trees

Translation Edit Rate: correlates well with practical evaluations

If we look deeper, what can we learn?

INTELLIGENCE

• Proportion of full matches (i.e. big savings)

• Proportion of close matches (i.e. faster than fuzzy matches)

• Proportion of poor matches

ACTIONABLE INFORMATION

• Type of sentence with high/low matches

• Weaknesses and gaps

• Segments to compare and analyse in translation memory

Page 8: MT Evaluation: Seeing the Wood for the Trees

Step 2: Segment-level automatic analysis

[Chart: distribution of segment-level TER scores by segment length]

This represents a 24% potential productivity gain**
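The kind of intelligence described above can be pulled out of segment-level TER scores with a short script: bucket segments into match bands, then weight each band by assumed post-editing effort to get an MT-weighted word count. The scores, thresholds (0.1/0.4), and effort weights below are illustrative assumptions, not figures from the case study.

```python
# Bucket segment-level TER scores into match bands and estimate an
# MT-weighted word count. All numbers here are illustrative.
segments = [  # (TER score, segment word count)
    (0.00, 12), (0.05, 20), (0.25, 15), (0.35, 30), (0.80, 25),
]

def band(ter):
    if ter <= 0.1:
        return "full"   # near-exact match: big savings
    if ter <= 0.4:
        return "close"  # faster than fuzzy matches
    return "poor"       # likely cheaper to retranslate

words_by_band = {"full": 0, "close": 0, "poor": 0}
for ter, n_words in segments:
    words_by_band[band(ter)] += n_words

# Assumed post-editing effort per band, relative to from-scratch.
effort = {"full": 0.3, "close": 0.7, "poor": 1.0}
total = sum(n for _, n in segments)
weighted = sum(words_by_band[b] * effort[b] for b in effort)
print(f"potential productivity gain: {1 - weighted / total:.0%}")
```

The same bucketing also surfaces the actionable items from the previous slide: the "poor" band is where to look for weaknesses and gaps.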

Page 9: MT Evaluation: Seeing the Wood for the Trees

With MT experience and previous MT integration, productivity testing can be run in the production environment. In this case, we used the Dynamic Quality Framework.

Beware the variables**!
• Translators: different experience, speed, perceptions of MT

–  24 translators: senior, staff, and interns

• Test sets: not representative; particularly difficult
– 2 test sets, comprising 5 documents, and cross-fold validation

• Environment and task: inexperience and unfamiliarity
– Training materials, videos, and "dummy" segments

Step 3: Productivity testing
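Productivity gain in such a test is typically throughput (words per hour) when post-editing MT relative to translating from scratch. A minimal sketch, with made-up throughput figures rather than the case-study measurements:

```python
# Throughput-based productivity gain: words per hour post-editing MT
# vs translating from scratch. Figures are illustrative only.
def productivity_gain(words_scratch, hours_scratch,
                      words_postedit, hours_postedit):
    wph_scratch = words_scratch / hours_scratch      # baseline throughput
    wph_postedit = words_postedit / hours_postedit   # post-editing throughput
    return wph_postedit / wph_scratch - 1

gain = productivity_gain(4000, 8, 5000, 8)
print(f"{gain:.0%}")  # prints 25%
```

Computing this per translator and per test set, as on the next slide, is what exposes the variables listed above.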

Page 10: MT Evaluation: Seeing the Wood for the Trees

Overall average

Findings and Learnings

25% productivity gain

Experienced: 22%
Staff: 23%

Interns: 30%

Test set 1.1: 25%
Test set 1.2: 35%
Test set 2.1: 6%
Test set 2.2: 35%

Correlates with TER

Rollout with junior staff for more immediate impact on bottom line?

Don't be overly concerned by outliers.
Use data to facilitate source content profiling?

What it tells us

By Translator Profile

By Test Set

Page 11: MT Evaluation: Seeing the Wood for the Trees

Look out for anomalies**
– segments with long timings (above-average ratio of words/minute)
– sentences that don't change much from MT to post-edit*
– segments with unusually short timings

In this case, the next step is production roll-out to validate these in the actual translator workflow over an extended period.

Warnings, Tips, and Next Steps

Now would be the right time to do fluency/adequacy evaluation if you need to verify that post-editing is producing output of at least similar quality.

Page 12: MT Evaluation: Seeing the Wood for the Trees

Calculating the ROI - revisited

Parameters
• Per word rate (LSP): €0.10
• Vendor rate: €0.08
• Productivity gain: ???
• Project word count: 5,000,000
• MT cost: ???

MT weighted word count: ???

No Machine Translation:
• LSP revenue: €500,000
• Vendor cost: €400,000
• MT cost: €0
• Gross profit: €100,000
• Gross profit margin: 20.0%

With Machine Translation:
• LSP revenue: €500,000
• Vendor cost: ???
• MT cost: ???
• Gross profit: ???
• Gross profit margin: ???

Gross profit increase when using MT: ???%

**These numbers are for illustrative purposes only and not related to the case study

Page 13: MT Evaluation: Seeing the Wood for the Trees

Calculating the ROI – plugging in the numbers

Parameters
• Per word rate (LSP): €0.10
• Vendor rate: €0.08
• Productivity gain: 25%
• Project word count: 5,000,000
• MT cost: €0.008

MT weighted word count: 3,750,000

No Machine Translation:
• LSP revenue: €500,000
• Vendor cost: €400,000
• MT cost: €0
• Gross profit: €100,000
• Gross profit margin: 20.0%

With Machine Translation:
• LSP revenue: €500,000
• Vendor cost: €300,000
• MT cost: €40,000
• Gross profit: €160,000
• Gross profit margin: 32%

Gross profit increase when using MT: 60%

**These numbers are for illustrative purposes only and not related to the case study
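The table above reduces to a few lines of arithmetic. This sketch uses the slide's illustrative figures:

```python
# Recomputing the slide's illustrative ROI figures.
rate_lsp = 0.10       # € per word charged by the LSP
rate_vendor = 0.08    # € per word paid to the vendor
gain = 0.25           # measured productivity gain
words = 5_000_000     # project word count
rate_mt = 0.008       # € per word for MT

weighted_words = words * (1 - gain)            # MT-weighted word count
revenue = words * rate_lsp                     # LSP revenue
vendor_cost = weighted_words * rate_vendor     # vendor cost with MT
mt_cost = words * rate_mt                      # MT cost
profit_mt = revenue - vendor_cost - mt_cost    # gross profit with MT
profit_no_mt = revenue - words * rate_vendor   # gross profit without MT
margin = profit_mt / revenue
increase = profit_mt / profit_no_mt - 1

print(f"weighted words: {weighted_words:,.0f}")           # 3,750,000
print(f"gross profit with MT: €{profit_mt:,.0f}")         # €160,000
print(f"margin: {margin:.0%}, increase: {increase:.0%}")  # 32%, 60%
```

The productivity gain discounts the vendor-paid word count, which is where the entire profit improvement comes from; MT cost only partially offsets it.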

Page 14: MT Evaluation: Seeing the Wood for the Trees

3 take-home messages
• Identify the gaps in your data
• Understand the process to collect the right information
• Continuous assessment

Page 15: MT Evaluation: Seeing the Wood for the Trees

Thank You! [email protected]

@IconicTrans

Page 16: MT Evaluation: Seeing the Wood for the Trees

Iconic Translation Machines
• Machine Translation with Subject Matter Expertise
• Headquartered here in Dublin
• Strong tradition of MT research and development underpinning the company and its technologies

This presentation
• MT evaluation: what, how, when, why?

– What ways can we evaluate MT?
– How do we carry out the evaluation?
– When in the process do we carry out certain types of evaluation?
– Why do we do certain evaluations and what do they tell us?

By way of introduction…

Page 17: MT Evaluation: Seeing the Wood for the Trees

Step 2: Segment-level automatic analysis

Productivity threshold

[Plot of TER scores by segment length]

Page 18: MT Evaluation: Seeing the Wood for the Trees

Step 2: Segment-level automatic analysis

Distribution of segment-level TER scores