Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement


Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell

Language Technologies Institute

Carnegie Mellon University

AMTA 2004, October 1

Outline

• Automatic Rule Refinement (RR)
• AVENUE and resource-poor scenarios
• Experiment
  – Data (English to Spanish)
  – Two types of grammar
  – Evaluation results
  – Error analysis
  – RR required for each type of grammar
• Conclusions and Future Work



Motivation for Automatic RR

General
– MT output still requires post-editing
– Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data

Within AVENUE
– Resource-poor scenarios: no manual grammar, or only a very small initial grammar
– Need to validate the elicitation corpus and the automatically learned translation rules


AVENUE and resource-poor scenarios

• No electronic data available (often an oral tradition), so SMT and EBMT are not an option
• Lack of computational linguists to write a grammar

So how can we even start to think about MT?
– That’s what AVENUE is all about

Elicitation Corpus + Automatic Rule Learning + Rule Refinement

What do we usually have available in resource-poor scenarios? Bilingual users


AVENUE overview

[Figure: AVENUE overview. Stages: Elicitation, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphological analyzer, Learning Module, Transfer Rules, Handcrafted rules, Lexical Resources, Run-Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module.]


Automatic and Interactive RLR

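[Figure: two-step rule learning and refinement]

1st step: a rule R is learned automatically from sentence pairs (SLSentence1–TLSentence1, SLSentence2–TLSentence2). Applied to a new sentence SLS3, R produces TLS3.

2nd step: TLS3 is corrected to TLS3’; the RR module uses SLS3 and TLS3’ to refine R into R’ (R refined), which translates SLS3 as TLS3’.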


Interactive Elicitation of MT Errors

Assumptions:
• Non-expert bilingual users can reliably detect and minimally correct MT errors, given:
  – the SL sentence (I saw you)
  – up to 5 TL sentences (Yo vi tú, ...)
  – word-to-word alignments (I-yo, saw-vi, you-tú)
  – (context)
• using an online GUI: the Translation Correction Tool (TCTool)

Goal: simplify the MT correction task as much as possible

User studies: 90% error detection accuracy and 73% error classification accuracy [LREC 2004]
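One way to picture what the TCTool presents to a user is as a small record per source sentence. The following is a minimal Python sketch under stated assumptions, not the TCTool's actual data model: `CorrectionTask`, `aligned_pairs`, and the encoding of alignments as 0-based (SL index, TL index) pairs are hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TL_CANDIDATES = 5  # "up to 5 TL sentences" per source sentence


@dataclass
class CorrectionTask:
    """Hypothetical record of what a bilingual user sees for one SL sentence."""
    sl: list[str]                            # source-language sentence, tokenized
    tl_candidates: list[list[str]]           # MT outputs, tokenized
    alignments: list[list[tuple[int, int]]]  # per candidate: (SL index, TL index) pairs
    context: Optional[str] = None            # optional surrounding context

    def __post_init__(self) -> None:
        if not 1 <= len(self.tl_candidates) <= MAX_TL_CANDIDATES:
            raise ValueError("expected between 1 and 5 TL candidates")
        if len(self.alignments) != len(self.tl_candidates):
            raise ValueError("need one alignment list per TL candidate")

    def aligned_pairs(self, k: int) -> list[tuple[str, str]]:
        """Return the word-to-word alignments of candidate k as word pairs."""
        tl = self.tl_candidates[k]
        return [(self.sl[i], tl[j]) for i, j in self.alignments[k]]


task = CorrectionTask(
    sl=["I", "saw", "you"],
    tl_candidates=[["Yo", "vi", "tú"]],
    alignments=[[(0, 0), (1, 1), (2, 2)]],
)
print(task.aligned_pairs(0))  # [('I', 'Yo'), ('saw', 'vi'), ('you', 'tú')]
```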


TCTool v0.1

Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order
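The four actions can be modeled as edits on the tokenized TL sentence. This is a minimal sketch assuming 0-based word positions; the function names are hypothetical, not the TCTool's interface. Applied to the Yo vi tú example, one modification plus one change of word order yields Yo te vi.

```python
def add_word(words: list[str], pos: int, word: str) -> list[str]:
    """Insert `word` so that it ends up at index `pos`."""
    return words[:pos] + [word] + words[pos:]


def delete_word(words: list[str], pos: int) -> list[str]:
    """Remove the word at index `pos`."""
    return words[:pos] + words[pos + 1:]


def modify_word(words: list[str], pos: int, new: str) -> list[str]:
    """Replace the word at index `pos` with `new`."""
    return words[:pos] + [new] + words[pos + 1:]


def change_word_order(words: list[str], src: int, dst: int) -> list[str]:
    """Move the word at index `src` so that it ends up at index `dst`."""
    word = words[src]
    rest = words[:src] + words[src + 1:]
    return rest[:dst] + [word] + rest[dst:]


tl = ["Yo", "vi", "tú"]
tl = modify_word(tl, 2, "te")     # wrong form (case): tú -> te
tl = change_word_order(tl, 2, 1)  # object clitic goes before the verb
print(" ".join(tl))  # Yo te vi
```

In this sketch each action returns a new list rather than mutating its input, so every intermediate version of the sentence stays available alongside the sequence of actions.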


RR Framework
• Find the best RR operations given a:
  – grammar (G),
  – lexicon (L),
  – (set of) source language sentence(s) (SL),
  – (set of) target language sentence(s) (TL),
  – its parse tree (P), and
  – minimal correction of TL (TL’)
  such that TQ2 > TQ1, where TQ1 and TQ2 are the translation quality before and after refinement
• This can also be expressed as:

  max TQ(TL | TL’, P, SL, RR(G, L))
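Read operationally, the framework is a search over candidate refinements that keeps one only if it brings the output closer to the user's correction TL’. The sketch below uses a toy setting that is an assumption, not the paper's setup: a dict of boolean switches stands in for both G and L, `toy_translate` stands in for the transfer system, candidate RR operations are plain functions, and TQ is approximated by position-wise word overlap with TL’.

```python
from typing import Callable, Optional

Grammar = dict[str, bool]
Operation = Callable[[Grammar], Grammar]


def tq(hyp: list[str], ref: list[str]) -> float:
    """Toy TQ proxy (an assumption): share of positions where hyp and ref agree."""
    if not hyp and not ref:
        return 1.0
    matches = sum(h == r for h, r in zip(hyp, ref))
    return matches / max(len(hyp), len(ref))


def best_refinement(
    grammar: Grammar,
    ops: list[Operation],
    translate: Callable[[Grammar, list[str]], list[str]],
    sl: list[str],
    tl_corrected: list[str],
) -> Optional[Grammar]:
    """Return the refined grammar with the highest TQ against TL',
    or None if no operation achieves TQ2 > TQ1."""
    tq1 = tq(translate(grammar, sl), tl_corrected)
    best, best_tq = None, tq1
    for op in ops:
        refined = op(grammar)
        tq2 = tq(translate(refined, sl), tl_corrected)
        if tq2 > best_tq:
            best, best_tq = refined, tq2
    return best


# --- Toy usage (hypothetical lexicon and grammar switches) ---
LEX = {"I": "Yo", "saw": "vi", "you": "tú"}


def toy_translate(g: Grammar, sl: list[str]) -> list[str]:
    words = [LEX[w] for w in sl]
    if g["clitic_before_verb"]:  # only handles this 3-word SUBJ V OBJ example
        words[1], words[2] = words[2], words[1]
    if g["accusative_pronoun"]:
        words = ["te" if w == "tú" else w for w in words]
    return words


g0 = {"clitic_before_verb": False, "accusative_pronoun": False}
ops = [
    lambda g: {**g, "clitic_before_verb": True},
    lambda g: {**g, "accusative_pronoun": True},
    lambda g: {**g, "clitic_before_verb": True, "accusative_pronoun": True},
]
print(best_refinement(g0, ops, toy_translate, ["I", "saw", "you"], ["Yo", "te", "vi"]))
# {'clitic_before_verb': True, 'accusative_pronoun': True}
```

Returning None when nothing beats TQ1 encodes the "TQ2 > TQ1" condition directly: a refinement that does not improve the output is rejected.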


Types of RR operations
• Grammar:
  – R0 → R0 + R1 [= R0’ + constr]    Cov[R0] → Cov[R0, R1]    (bifurcate)
  – R0 → R1 [= R0 + constr]    Cov[R0] → Cov[R1]    (refine)
  – R0 → R1 [= R0 + constr = -] + R2 [= R0’ + constr =c +]    Cov[R0] → Cov[R1, R2]    (bifurcate)
• Lexicon:
  – Lex0 → Lex0 + Lex1 [= Lex0 + constr]
  – Lex0 → Lex1 [= Lex0 + constr]
  – Lex0 → Lex1 [= Lex0 + TLword]; Lex1 (adding a lexical item)
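To make the grammar operations concrete, here is a toy Python model of them with coverage computed over a few inputs. Assumptions, not AVENUE's formalism: a rule is reduced to a name, a string standing in for its TL side, and a tuple of feature constraints; `=` is modeled as succeeding when the feature is absent (as unification would) and `=c` as requiring the feature to be present; Cov[...] is the set of toy inputs the rules apply to. The names `refine`, `bifurcate`, `bifurcate_with_blocking`, and the feature `pron` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Constraint:
    feature: str
    value: str
    check: bool = False  # True models "=c" (feature must be present); False models "="

    def ok(self, feats: dict[str, str]) -> bool:
        if self.feature in feats:
            return feats[self.feature] == self.value
        return not self.check  # "=" succeeds on an absent feature, "=c" fails


@dataclass(frozen=True)
class Rule:
    name: str
    tl_side: str  # stand-in for the rule's TL side
    constraints: tuple[Constraint, ...] = ()

    def applies(self, feats: dict[str, str]) -> bool:
        return all(c.ok(feats) for c in self.constraints)


def refine(r0: Rule, c: Constraint, name: str) -> list[Rule]:
    """R0 -> R1 [= R0 + constr]; R1 replaces R0."""
    return [replace(r0, name=name, constraints=r0.constraints + (c,))]


def bifurcate(r0: Rule, c: Constraint, name: str, tl_side: str) -> list[Rule]:
    """R0 -> R0 + R1 [= R0' + constr]; R0 is kept."""
    return [r0, replace(r0, name=name, tl_side=tl_side, constraints=r0.constraints + (c,))]


def bifurcate_with_blocking(r0: Rule, feat: str, names: tuple[str, str], tl_side: str) -> list[Rule]:
    """R0 -> R1 [= R0 + feat = -] + R2 [= R0' + feat =c +]."""
    r1 = replace(r0, name=names[0], constraints=r0.constraints + (Constraint(feat, "-"),))
    r2 = replace(r0, name=names[1], tl_side=tl_side,
                 constraints=r0.constraints + (Constraint(feat, "+", check=True),))
    return [r1, r2]


def coverage(rules: list[Rule], inputs: list[dict[str, str]]) -> dict[int, list[str]]:
    """For each input index, list the rules that apply to it."""
    return {i: [r.name for r in rules if r.applies(f)] for i, f in enumerate(inputs)}


inputs = [{"pron": "+"}, {"pron": "-"}, {}]  # object is a pronoun / not a pronoun / unmarked
r0 = Rule("R0", tl_side="V NP")

print(coverage([r0], inputs))  # {0: ['R0'], 1: ['R0'], 2: ['R0']}
print(coverage(refine(r0, Constraint("pron", "-"), "R1"), inputs))
# {0: [], 1: ['R1'], 2: ['R1']}
print(coverage(bifurcate(r0, Constraint("pron", "+", check=True), "R1", "NP V"), inputs))
# {0: ['R0', 'R1'], 1: ['R0'], 2: ['R0']}
print(coverage(bifurcate_with_blocking(r0, "pron", ("R1", "R2"), "NP V"), inputs))
# {0: ['R2'], 1: ['R1'], 2: ['R1']}
```

In this toy run, plain bifurcation leaves R0 applying to the pronoun input alongside R1, whereas the blocking constraint on R1 hands that input to R2 alone while the pair still covers all three inputs, matching Cov[R0] → Cov[R1, R2].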


Data: English - Spanish

Training
• First 200 sentences from the AVENUE Elicitation Corpus
• Lexicon: extracted semi-automatically from the first 400 sentences (442 entries)

Test
• 32 sentences manually selected from the next 200 sentences in the EC to showcase a variety of MT errors


Manual grammar

• 12 rules (2 S, 7 NP, 3 VP)

• Produces 1.6 different translations on average


Learned Grammar + feature constraints

• 316 rules (194 S, 43 NP, 78 VP, 1 PP)
• Decoder emulated by reordering 3 rules

• Produces 18.6 different translations on average


Comparing Grammar Output: Results

• Manually:

• Automatic MT evaluation:

                     NIST   BLEU   METEOR
  Manual grammar     4.3    0.16   0.6
  Learned grammar    3.7    0.14   0.55


Error Analysis
• Most of the errors produced by the manual grammar fall into these classes:
  – lack of subject-predicate agreement
  – wrong word order of object pronouns (clitics)
  – wrong preposition
  – wrong form (case)
  – out-of-vocabulary (OOV) words
• On top of these, the learned grammar output exhibited errors of the following types:
  – lack of agreement constraints
  – missing preposition
  – over-generalization


Examples
• Same (both good)
• Manual grammar better
• Learned grammar better
• Different (both bad)


Types of RR required for each grammar

Manual Grammar
• Bifurcate a rule to code an exception:
  – R0 → R0 + R1 [= R0’ + constr]    Cov[R0] → Cov[R0, R1]
  – R0 → R1 [= R0 + constr = -] + R2 [= R0’ + constr =c +]    Cov[R0] → Cov[R1, R2]

Learned Grammar
• Adjust feature constraints, such as agreement:
  – R0 → R1 [= R0 +|- constr]    Cov[R0] → Cov[R1]
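For the learned grammar, `+|-` means adding or removing a constraint. The sketch below is a toy illustration, not an AVENUE rule: a hypothetical S → NP VP rule is modeled as a set of feature-equality constraints applied as a simple filter over candidate outputs (a simplification of unification), and the Spanish candidates are made-up instances of the subject-predicate agreement errors listed in the error analysis.

```python
Constraint = tuple[str, str]  # two feature names whose values must be equal


def apply_s_rule(constraints: frozenset[Constraint], candidates: list[dict[str, str]]) -> list[str]:
    """Hypothetical S -> NP VP rule: keep candidates that satisfy every constraint."""
    return [
        f"{c['np']} {c['vp']}"
        for c in candidates
        if all(c[a] == c[b] for a, b in constraints)
    ]


# Two outputs a learned rule without agreement might produce (toy data).
candidates = [
    {"np": "los perros", "np_num": "pl", "vp": "corre", "vp_num": "sg"},
    {"np": "los perros", "np_num": "pl", "vp": "corren", "vp_num": "pl"},
]

AGR = ("np_num", "vp_num")
r0: frozenset[Constraint] = frozenset()
r1 = r0 | {AGR}        # R0 -> R1 [= R0 + constr]
r0_again = r1 - {AGR}  # the "-" direction: R1 [= R0 - constr]

print(apply_s_rule(r0, candidates))  # ['los perros corre', 'los perros corren']
print(apply_s_rule(r1, candidates))  # ['los perros corren']
```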


Conclusions

• TCTool + RR can improve both hand-crafted and automatically learned grammars.

• In the current experiment, the MT errors produced by the two types of grammar differ almost 50% of the time.

• The manual grammar will need to be refined to encode exceptions, whereas the learned grammar will need to be refined to achieve the right level of generalization.

• We expect RR to give the most leverage when combined with the learned grammar.


Future Work

• Experiment where user corrections are used both as new training examples for Rule Learning (RL) and to refine the existing grammar with the RR module.

• Investigate using reference translations to refine MT grammars automatically; this is much harder, since reference translations are not minimal post-edits.


Questions???

Thank you!


RR Framework
• Types of operations: bifurcate, make more specific/general, add blocking constraints, etc.
• Formalizing error information (clue word)
• Finding triggering features
