Automatic Translation Error Analysismtmarathon2011.fbk.eu/sites/mtmarathon2011.fbk.eu/files/...Outline Got to know the tools; cross-evaluation (all) Hjerson++ (Maja) Addicter's friendlier

Automatic Translation Error Analysis
Project results & conclusions

MT Bondone (side project)

Outline
● Got to know the tools; cross-evaluation (all)
● Hjerson++ (Maja)
● Addicter's friendlier and richer interface (Dan)
● A non-paranoid alignment for Addicter (Martin)
● Both tools on WMT11 En-De (Sabine)
● Both tools on dataset X (Arianna, Suhel)

Addicter vs Hjerson
● Addicter (HMM alignment):
  ● Moses the decoder uses beam search
  ● The Moses program employs suboptimal pruning
● Hjerson & Addicter (greedy alignment):
  ● Moses the decoder uses beam search
  ● The Moses program employs suboptimal pruning
● A failed translation attempt, or a missing+extra pair? A hard call.
● It turns out the second strategy is better (ranking, error precision/recall).
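To make the "missing+extra pair" reading concrete, here is a minimal sketch of a greedy exact-match alignment on the slide's example pair. The function names are hypothetical and the matcher (link each hypothesis word to the first unused reference word with the same lowercased form) is a deliberate simplification, not Hjerson's or Addicter's actual aligner. Unaligned hypothesis words become extra words and unaligned reference words become missing words, rather than pairing "beam search" with "suboptimal pruning" as lexical errors.

```python
def greedy_exact_align(hyp, ref):
    """Link each hypothesis token to the first unused reference token with the
    same lowercased form (hypothetical, deliberately simple matcher)."""
    used, links = set(), {}
    for i, h in enumerate(hyp):
        for j, r in enumerate(ref):
            if j not in used and h.lower() == r.lower():
                links[i] = j
                used.add(j)
                break
    return links


def missing_extra(hyp, ref, links):
    """Unaligned hypothesis tokens are tagged 'extra',
    unaligned reference tokens 'missing'."""
    aligned_ref = set(links.values())
    extra = [h for i, h in enumerate(hyp) if i not in links]
    missing = [r for j, r in enumerate(ref) if j not in aligned_ref]
    return extra, missing


hyp = "Moses the decoder uses beam search".split()
ref = "The Moses program employs suboptimal pruning".split()
links = greedy_exact_align(hyp, ref)
extra, missing = missing_extra(hyp, ref, links)
print(links)    # {0: 1, 1: 0}
print(extra)    # ['decoder', 'uses', 'beam', 'search']
print(missing)  # ['program', 'employs', 'suboptimal', 'pruning']
```

The two linked words also swap positions (Moses and the), which is the kind of pair a reordering check would then inspect.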

Hjerson on Czech data
● WMT'09, En-Cs
● Very rich morphology, including inflections and derivations
● Very free word order
● Flexible human error analysis (related to the given reference, but only loosely)

Ranking evaluation
● Correlations over error categories: between 0.4 and 0.7
● Correlations over translation systems:
  ● strongest for missing words and lexical errors (0.7 to 1)
  ● weaker for reordering and morphological errors (0.2 to 0.8)
  ● reason: the above-mentioned characteristics of the Czech language
  ● (again) weak for extra words (-0.2 to 0.4)
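The slides do not say which correlation coefficient was used. The sketch below assumes Spearman's rank correlation, a common choice for comparing rankings, and correlates, for one error category, the per-system counts from an automatic tool with the counts from human annotation. The counts are made-up toy numbers and the helper names are hypothetical; tied values receive their average rank.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending from 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up counts of one error category for four hypothetical systems.
auto_counts = [120, 95, 140, 100]  # from an automatic tool
human_counts = [30, 28, 35, 22]    # from human annotation
print(round(spearman(auto_counts, human_counts), 3))  # 0.8
```

Correlating "over error categories" instead would use the same function with one system's counts per category as the two input lists.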

Precision, Recall, Confusions
● General problem:
  ● (again) extra words confused with lexical errors
● Problems related to the Czech language:
  ● morphological errors confused with lexical errors
  ● many more reordering errors
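A minimal sketch of per-category precision, recall and confusion counts, assuming the automatic and human tags have been flattened into two parallel per-token label lists. This is a simplification (the tools tag hypothesis-side and reference-side errors separately), and the label names are hypothetical. The toy data reproduces both patterns listed above: an extra word and an inflection error each tagged as lexical.

```python
from collections import Counter


def per_class_scores(gold, auto):
    """gold, auto: parallel per-token labels. Returns {label: (precision,
    recall)} and a Counter of (gold, auto) confusion pairs."""
    confusion = Counter(zip(gold, auto))
    scores = {}
    for label in set(gold) | set(auto):
        tp = confusion[(label, label)]
        n_auto = sum(v for (g, a), v in confusion.items() if a == label)
        n_gold = sum(v for (g, a), v in confusion.items() if g == label)
        precision = tp / n_auto if n_auto else 0.0
        recall = tp / n_gold if n_gold else 0.0
        scores[label] = (precision, recall)
    return scores, confusion


# Toy labels with hypothetical tag names: "ok", "lex", "extra", "infl".
gold = ["lex", "extra", "extra", "ok", "infl", "lex"]
auto = ["lex", "lex", "extra", "ok", "lex", "lex"]
scores, confusion = per_class_scores(gold, auto)
print(scores["lex"])                # (0.5, 1.0)
print(scores["extra"])              # (1.0, 0.5)
print(confusion[("extra", "lex")])  # 1: an extra word tagged as lexical
```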

WER alignment on base forms

Improves some aspects:
● better correlations over error classes
● better recall of extra words (+ less confusion with lexical errors)
  ● price: lower lexical recall + more lexical errors confused with extra words
  ● however, the gain is significantly larger
● better precision of extra words
  ● price: more correct words are tagged as extra
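One way to read "WER alignment on base forms": run a Levenshtein (WER-style) alignment over lemmas instead of surface forms, then report an aligned pair as an inflection error when the base forms agree but the full forms differ. The sketch assumes a toy lemma table standing in for a real lemmatiser and uses hypothetical function names; it is not Hjerson's code. On surface forms the example costs one substitution; on base forms it aligns cleanly and the difference is isolated as an inflection error.

```python
def levenshtein_align(hyp, ref):
    """Minimum-edit-distance (WER-style) alignment.
    Returns (pairs, distance); a pair (i, None) is a hypothesis-only token,
    (None, j) a reference-only token, (i, j) a match or substitution."""
    n, m = len(hyp), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    # Trace one optimal path back from the bottom-right cell.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if hyp[i - 1] == ref[j - 1] else 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1], d[n][m]


# Hypothetical toy lemma table standing in for a real lemmatiser.
LEMMAS = {"neuen": "neu", "neue": "neu", "Kollegen": "Kollege"}


def lemmatize(tokens):
    return [LEMMAS.get(t, t) for t in tokens]


def inflection_errors(hyp, ref):
    """Align on base forms; aligned pairs with equal lemmas but different
    surface forms are reported as inflection errors."""
    hyp_lem, ref_lem = lemmatize(hyp), lemmatize(ref)
    pairs, _ = levenshtein_align(hyp_lem, ref_lem)
    return [(hyp[i], ref[j]) for i, j in pairs
            if i is not None and j is not None
            and hyp_lem[i] == ref_lem[j] and hyp[i] != ref[j]]


hyp = ["die", "neue", "Kollegen"]
ref = ["die", "neuen", "Kollegen"]
print(levenshtein_align(hyp, ref)[1])                        # 1 (surface forms)
print(levenshtein_align(lemmatize(hyp), lemmatize(ref))[1])  # 0 (base forms)
print(inflection_errors(hyp, ref))                           # [('neue', 'neuen')]
```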

Addicter / Visualizer
● Easy install (now uses an internal web server)
● Improved interface
● Reference-hypothesis alignment
  ● Multiple alignments of the same sentence
  ● Color highlighting of automatically found errors
● … DEMO

Addicter on English
● Automatic testing system for all tools and datasets
● Greedy alignment for Addicter
  ● fast (linear search)
  ● based on context, lemma and PoS similarity
  ● suffers from lexical-error overkill (still better than not detecting them)
  ● evaluated on manually annotated WMT09 De-En: similar to Hjerson
  ● Addicter's best built-in aligner
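A sketch of a greedy aligner that scores candidate links by lemma, PoS and context similarity, in the spirit of the slide. The weights, the threshold and all names are illustrative assumptions, not Addicter's actual parameters. With the threshold low enough that PoS and neighboring-word agreement alone can link two different lemmas, such links come out as lexical errors rather than a missing+extra pair, which is where lexical-error overkill comes from. The toy words are taken from the Addicter example later in these slides.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    form: str
    lemma: str
    pos: str


def similarity(hyp, i, ref, j):
    """Illustrative score: lemma match, PoS match, and agreement of the
    lemmas of the left/right neighbors (weights are assumptions)."""
    s = 0.0
    if hyp[i].lemma == ref[j].lemma:
        s += 2.0
    if hyp[i].pos == ref[j].pos:
        s += 0.5
    for d in (-1, 1):  # left and right context
        a, b = i + d, j + d
        if 0 <= a < len(hyp) and 0 <= b < len(ref) and hyp[a].lemma == ref[b].lemma:
            s += 0.25
    return s


def greedy_align_scored(hyp, ref, threshold=1.0):
    """Left to right, link each hypothesis token to the best-scoring unused
    reference token if its score reaches the (hypothetical) threshold."""
    links, used = {}, set()
    for i in range(len(hyp)):
        candidates = [(similarity(hyp, i, ref, j), j) for j in range(len(ref)) if j not in used]
        if not candidates:
            continue
        # highest score wins; on ties prefer the leftmost reference position
        score, j = max(candidates, key=lambda c: (c[0], -c[1]))
        if score >= threshold:
            links[i] = j
            used.add(j)
    return links


def tag(hyp, ref, links):
    """Aligned tokens: ok / infl / lex; unaligned hypothesis tokens: extra;
    unaligned reference tokens are returned as missing."""
    tags = []
    for i, h in enumerate(hyp):
        if i not in links:
            tags.append("extra")
            continue
        r = ref[links[i]]
        if h.form == r.form:
            tags.append("ok")
        elif h.lemma == r.lemma:
            tags.append("infl")
        else:
            tags.append("lex")
    aligned = set(links.values())
    missing = [r.form for j, r in enumerate(ref) if j not in aligned]
    return tags, missing


hyp = [Tok("neue", "neu", "ADJ"), Tok("Ratsmitglieder", "Ratsmitglied", "NN"), Tok("werden", "werden", "VAFIN")]
ref = [Tok("neuen", "neu", "ADJ"), Tok("Ratsherren", "Ratsherr", "NN"), Tok("werden", "werden", "VAFIN")]
links = greedy_align_scored(hyp, ref)
tags, missing = tag(hyp, ref, links)
print(links)    # {0: 0, 1: 1, 2: 2}
print(tags)     # ['infl', 'lex', 'ok']
print(missing)  # []
```

Raising the threshold to 2.0 in this sketch would require a lemma match for every link, turning the Ratsmitglieder/Ratsherren pair into one extra and one missing word instead.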

Test on WMT11 EN-DE Data
● 22 MT systems/outputs
● No manually annotated gold standard
● Ranking according to manual judgments
● Application of both Addicter & Hjerson to all the systems' output

Number of Errors
● Addicter tags between 81k and 90k of 150k tokens with errors; Hjerson tags between 84k and 95k.
● The systems with the fewest errors:
  ● online-B: rank #2 of 22
  ● illc-uva: rank #21 of 22
● RBMT systems are tagged with more errors
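Expressed as a share of the 150k tokens, the ranges above work out as follows (simple arithmetic on the slide's rounded figures, so the shares are approximate):

```latex
\begin{aligned}
\text{Addicter:}\quad & \tfrac{81\,000}{150\,000} = 0.54, & \tfrac{90\,000}{150\,000} = 0.60\\
\text{Hjerson:}\quad  & \tfrac{84\,000}{150\,000} = 0.56, & \tfrac{95\,000}{150\,000} \approx 0.63
\end{aligned}
```

In other words, both tools tag well over half of all tokens as erroneous for every system.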

Fun with Correlations

Error category       Addicter   Hjerson
Total errors            0.003    -0.109
Inflection errors       0.113     0.432
Extra words            -0.283    -0.351
Missing words           0.268     0.427
Lexical errors          0.086    -0.275
Reordering              0.189     0.579
Infl+ext+reord            n/a     0.654

Error Analysis of the Error Analysis
● Addicter tags very conservatively with respect to reordering/inflection; Hjerson is greedy.
● The lack of alignment in Hjerson leads to many errors: the German determiner is often wrongly tagged with inflection or reordering errors.
● Addicter overuses the extra/missing categories (can be fixed by creating a better alignment).

Example - Hjerson
● Source: Aktuálně.cz "tested" the Social Democrat members of the new Council in terms of the well-established slang that originated in the town hall during the few last years, when Prague was ruled by the current coalition partners.
● Reference: Die Zeitung Aktuálně.cz hat Mitglieder des neuen Rates aus der ČSSD mal ein wenig "abgeklopft", wie sie den notorischen Slang beherrschen, der sich in den letzten Jahren eingebürgert hat, in denen die heutigen Koalitionspartner in Prag am Ruder waren.
● MT output: Aktuáln.cz "testete" die Sozialdemokratin-Mitglieder vom neuen Rat in Bezug auf die feste Umgangssprache von den gegenwärtigen Koalitionspartnern, die während der paar letzten Jahre im Rathaus entstand, als Prag regiert wurde.

Example - Addicter
● Source: New Councilors of CSSD will most probably have to overcome certain language barriers to understand their old-new colleagues from ODS in Prague Council and municipal council.
● Reference: Die neuen Ratsherren der Hauptstadt aus den Reihen der ČSSD werden offensichtlich gewisse Sprachbarrieren überwinden müssen, um ihre alt-neuen Kollegen aus der ODS im Prager Rat und in der Stadtvertretung überhaupt verstehen zu können.
● MT output: Neue Ratsmitglieder von CSSD werden am wahrscheinlichsten Sprachbarrieren überwinden müssen, um ihre altneuen Kollegen von ODS in Prag-Rat und Magistrat zu verstehen.

Test on IWSLT'11 Ar-En Data
● In progress
● "The system is of good quality and far too many errors are marked"

Conclusions
● Hjerson updated; evaluates better; usable for error/system ranking and rough error tagging
● Addicter updated; now also usable for error/system ranking and rough error tagging
● Both tools tested on En->De, En->Cs, Ar->En
