Automatic Rule Learning for Resource-Limited Machine Translation

Alon Lavie, Katharina Probst, Erik Peterson, Jaime Carbonell, Lori Levin, Ralf Brown

Language Technologies Institute, Carnegie Mellon University

October 11, 2002 AMTA 2002 2

Why Machine Translation for Minority and Indigenous Languages?

• Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
• Is there hope for MT for languages with limited resources?
• Benefits include:
  – Better government access to indigenous communities (epidemics, crop failures, etc.)
  – Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
  – Language preservation
  – Civilian and military applications (disaster relief)

MT for Minority and Indigenous Languages: Challenges

• Minimal amount of parallel text
• Possibly competing standards for orthography/spelling
• Often relatively few trained linguists
• Access to native informants possible
• Need to minimize development time and cost

AVENUE Partners

Language (status)        Country       Institutions
Mapudungun (in place)    Chile         Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (discussion)     Peru          Ministry of Education
Iñupiaq (discussion)     US (Alaska)   Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion)       Colombia      OAS-CICAD, Plante, Department of the Interior

AVENUE: Two Technical Approaches

• Generalized EBMT
  – Parallel text, 50K-2MB (uncontrolled corpus)
  – Rapid implementation
  – Proven for major languages with reduced data
• Transfer-rule learning
  – Elicitation (controlled) corpus to extract grammatical properties
  – Seeded version-space learning

AVENUE Architecture

[Figure: AVENUE architecture diagram. Box labels: User; Learning Module (Elicitation Process, SVS Learning Process, Transfer Rules); Run-Time Module (SL Input, SL Parser, Transfer Engine, TL Generator, EBMT Engine, Unifier Module, TL Output).]

Learning Transfer-Rules for Languages with Limited Resources

• Rationale:
  – Large bilingual corpora not available
  – Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool
  – Elicitation corpus designed to be typologically comprehensive and compositional
  – Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data

Overview of Learning Approach

1. Elicitation Corpus: Bilingual data is acquired from a specifically engineered corpus

2. Feature Detection: Gather information about features and their values in the minority language

3. Rule Learning: Infer syntactic transfer rules by first guessing and then iteratively refining

The Elicitation Corpus

• Translated, aligned by bilingual informant
• Corpus consists of linguistically diverse constructions
• Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
• Organized compositionally: elicit simple structures first, then use them as building blocks
• Goal: minimize size, maximize coverage

The Transfer Engine

Analysis: Source text is parsed into its grammatical structure. Determines transfer application ordering.

Example: 他看书。 (he read book)

  [S [NP [N 他]] [VP [V 看] [NP 书]]]

Transfer: A target language tree is created by reordering, insertion, and deletion.

  [S [NP [N he]] [VP [V read] [NP [DET a] [N book]]]]

Article “a” is inserted into object NP. Source words translated with transfer lexicon.

Generation: Target language constraints are checked and final translation produced. E.g. “reads” is chosen over “read” to agree with “he”.

Final translation: “He reads a book”
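
The three stages above can be sketched as a toy pipeline. This is only an illustration of the analysis / transfer / generation split for the slide's example: the lexicon, the hard-coded subject-verb-object parse, the article-insertion step and the agreement table are all assumptions made for this sketch, not the AVENUE transfer engine.

```python
# Toy three-stage pipeline for the example sentence. Everything here
# (lexicon, SVO-only parser, agreement table) is a hypothetical stand-in.
LEXICON = {"他": "he", "看": "read", "书": "book"}
THIRD_SING = {"read": "reads"}


def analyze(words):
    """Analysis: parse a subject-verb-object sentence into a (label, children) tree."""
    subj, verb, obj = words
    return ("S", [("NP", [("N", subj)]),
                  ("VP", [("V", verb), ("NP", [("N", obj)])])])


def transfer(tree):
    """Transfer: translate leaves with the lexicon, insert an article into the object NP."""
    label, children = tree
    if isinstance(children, str):          # leaf: (POS, word)
        return (label, LEXICON[children])
    new_children = [transfer(child) for child in children]
    if label == "VP":
        verb, (np_label, np_children) = new_children
        new_children = [verb, (np_label, [("DET", "a")] + np_children)]
    return (label, new_children)


def leaves(tree):
    """Return the (POS, word) leaves of a tree in left-to-right order."""
    label, children = tree
    if isinstance(children, str):
        return [(label, children)]
    return [leaf for child in children for leaf in leaves(child)]


def generate(tree):
    """Generation: enforce subject-verb agreement and produce the surface string."""
    tokens = leaves(tree)
    subject = tokens[0][1]
    words = [THIRD_SING.get(word, word) if pos == "V" and subject == "he" else word
             for pos, word in tokens]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]


print(generate(transfer(analyze(["他", "看", "书"]))))  # He reads a book
```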

Transfer Rule Formalism

• Type information: NP::NP
• Part-of-speech/constituent information: [DET N] -> [DET N]
• Alignments: (X1::Y1), (X2::Y2)
• x-side constraints: constraints on X1, X2
• y-side constraints: constraints on Y1, Y2
• xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the man, TL: der Mann
NP::NP [DET N] -> [DET N]
(
 (X1::Y1)
 (X2::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X2 AGR) = *3-SING)
 ((X2 COUNT) = +)
 ((Y1 AGR) = *3-SING)
 ((Y1 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y1 GENDER))
)

Transfer Rule Formalism (II)

• Value constraints, e.g. ((X1 AGR) = *3-SING)
• Agreement constraints, e.g. ((Y2 GENDER) = (Y1 GENDER))

;SL: the man, TL: der Mann
NP::NP [DET N] -> [DET N]
(
 (X1::Y1)
 (X2::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X2 AGR) = *3-SING)
 ((X2 COUNT) = +)
 ((Y1 AGR) = *3-SING)
 ((Y1 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y1 GENDER))
)
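
To make the constraint categories concrete, the sketch below reads the constraints of the example rule and labels each one as a value or agreement constraint, and as x-side, y-side or xy. The regular expression and the `classify` helper are hypothetical and only cover the two constraint shapes shown on the slide; they are not part of the actual rule formalism tooling.

```python
import re

# Constraints of the example rule, written with balanced parentheses.
RULE_CONSTRAINTS = """
((X1 AGR) = *3-SING) ((X1 DEF) = *DEF) ((X2 AGR) = *3-SING) ((X2 COUNT) = +)
((Y1 AGR) = *3-SING) ((Y1 DEF) = *DEF) ((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y1 GENDER))
"""

# Matches ((X1 AGR) = value) or ((Y2 GENDER) = (Y1 GENDER)).
PATTERN = re.compile(r"\(\((\w+) (\w+)\) = (?:\((\w+) (\w+)\)|([^\s()]+))\)")


def classify(text):
    """Return (constraint, kind, side) triples for every constraint found in text."""
    rows = []
    for match in PATTERN.finditer(text):
        lhs_index, _lhs_feature, rhs_index, _rhs_feature, _value = match.groups()
        # A right-hand side that is itself a feature path makes it an agreement constraint.
        kind = "agreement" if rhs_index else "value"
        sides = {lhs_index[0]}
        if rhs_index:
            sides.add(rhs_index[0])
        side = "xy" if len(sides) == 2 else sides.pop().lower() + "-side"
        rows.append((match.group(0), kind, side))
    return rows


for constraint, kind, side in classify(RULE_CONSTRAINTS):
    print(f"{constraint:32} {kind:10} {side}")
```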

Rule Learning - Overview

• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source side (grammatical structure)
• Three steps:
  1. Flat Seed Generation: first guesses at transfer rules; no syntactic structure
  2. Compositionality: use previously learned rules to add structure
  3. Seeded Version Space Learning: refine rules by generalizing with validation

Flat Seed Generation

Create a transfer rule that is specific to the sentence pair, but abstracted to the POS level. No syntactic structure.

Element               Source
SL POS sequence       f-structure
TL POS sequence       TL dictionary, aligned SL words
Type information      corpus, same on SL and TL
Alignments            informant
x-side constraints    f-structure
y-side constraints    TL dictionary, aligned SL words (list of projecting features)

Flat Seed Generation - Example

SL: The highly qualified applicant did not accept the offer.
TL: Der äußerst qualifizierte Bewerber nahm das Angebot nicht an.

Alignment: ((1,1),(2,2),(3,3),(4,4),(6,8),(7,5),(7,9),(8,6),(9,7))

S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) ….
)
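
The structural part of such a seed rule can be assembled mechanically from the two POS sequences and the informant's alignment. The sketch below does only that; the constraint section is left out because it would need the f-structure and the TL dictionary. `flat_seed` and `unaligned` are hypothetical helper names introduced for this illustration.

```python
def flat_seed(sl_pos, tl_pos, alignment, rule_type="S"):
    """Build the structural part of a flat seed rule (constraints omitted).

    sl_pos / tl_pos are POS sequences; alignment holds 1-based
    (SL index, TL index) pairs as given by the informant.
    """
    for i, j in alignment:
        if not (1 <= i <= len(sl_pos) and 1 <= j <= len(tl_pos)):
            raise ValueError(f"alignment ({i},{j}) is out of range")
    head = f"{rule_type}::{rule_type} [{' '.join(sl_pos)}] -> [{' '.join(tl_pos)}]"
    aligns = "".join(f"(x{i}::y{j})" for i, j in alignment)
    return f"{head}\n(\n ;;alignments:\n {aligns}\n)"


def unaligned(length, alignment, side):
    """1-based positions with no alignment link; side 0 = SL, side 1 = TL."""
    aligned = {pair[side] for pair in alignment}
    return [k for k in range(1, length + 1) if k not in aligned]


SL = "det adv adj n aux neg v det n".split()
TL = "det adv adj n v det n neg vpart".split()
ALIGN = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]

print(flat_seed(SL, TL, ALIGN))
print("unaligned SL positions:", unaligned(len(SL), ALIGN, 0))
```

In the example only SL position 5 (the auxiliary "did") has no alignment link, while the verb at position 7 links to both the TL verb and its particle.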

Compositionality - Overview

• Traverse the c-structure of the English sentence, add compositional structure for translatable chunks
• Adjust constituent sequences, alignments
• Remove unnecessary constraints, i.e. those that are contained in the lower-level rule
• Adjust constraints: use f-structure of correct translation vs. f-structure of incorrect translations to introduce context constraints

Compositionality - Example

Flat seed rule:

S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) ….
)

Compositional rule:

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 ;;constraints:
 ((x2 tense) = *past) ….
 ((y1 def) = *+) ((y1 case) = *nom) ….
)

Previously learned lower-level rule:

NP::NP [det ADJP n] -> [det ADJP n]
(
 (x1::y1) …
 ((y3 agr) = *3-sing)
 ((x3 agr) = *3-sing)
 ….
)
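
The adjustment of constituent sequences and alignments in this example can be reproduced with a few lines of index arithmetic. This is a sketch under a simplifying assumption: the chunk is replaced by one constituent label on both sides, and no alignment link crosses the chunk boundary. `compose` is a hypothetical name, and constraint adjustment is not modeled.

```python
def compose(sl_seq, tl_seq, alignment, sl_span, tl_span, label):
    """Replace a 1-based inclusive span on each side with one constituent label
    and renumber the alignment pairs accordingly."""

    def collapse(seq, span):
        start, end = span
        return seq[:start - 1] + [label] + seq[end:]

    def renumber(index, span):
        start, end = span
        if index < start:
            return index
        if index <= end:
            return start                  # everything inside the chunk maps to the new node
        return index - (end - start)      # later positions shift left

    new_alignment = []
    for i, j in alignment:
        pair = (renumber(i, sl_span), renumber(j, tl_span))
        if pair not in new_alignment:     # links inside the chunk collapse to one link
            new_alignment.append(pair)
    return collapse(sl_seq, sl_span), collapse(tl_seq, tl_span), new_alignment


SL = "det adv adj n aux neg v det n".split()
TL = "det adv adj n v det n neg vpart".split()
ALIGN = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]

sl, tl, align = compose(SL, TL, ALIGN, (1, 4), (1, 4), "NP")
print(sl, "->", tl)
print(align)  # [(1, 1), (3, 5), (4, 2), (4, 6), (5, 3), (6, 4)]
```

The result matches the constituent sequences and alignments of the compositional S::S rule in the example.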

Seeded Version Space Learning: Overview

• Goal: further generalize the acquired rules
• Methodology:
  – Preserve general structural transfer
  – Consider relaxing specific feature constraints
• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of a version space
• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
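
The clustering step can be sketched as grouping by a structural key. The dictionary layout of a rule below (keys `type`, `sl`, `tl`, `alignments`, `constraints`) is an assumption made for this illustration; the slides only require that rules in one cluster share "similar" structure, which this sketch narrows to identical type, constituent sequences and alignments.

```python
from collections import defaultdict


def group_into_version_spaces(seed_rules):
    """Cluster seed rules that share type, constituent sequences and alignments.

    Each cluster's rules act as the specific boundary; the general boundary
    is the same structure with an empty constraint set.
    """
    spaces = defaultdict(list)
    for rule in seed_rules:
        key = (rule["type"], tuple(rule["sl"]), tuple(rule["tl"]),
               frozenset(rule["alignments"]))
        spaces[key].append(rule)
    return dict(spaces)


# Hypothetical seed rules: the first two differ only in feature constraints.
seeds = [
    {"type": "NP", "sl": ["det", "n"], "tl": ["det", "n"],
     "alignments": [(1, 1), (2, 2)], "constraints": {"((y2 agr) = *3-sing)"}},
    {"type": "NP", "sl": ["det", "n"], "tl": ["det", "n"],
     "alignments": [(1, 1), (2, 2)], "constraints": {"((y2 agr) = *3-plu)"}},
    {"type": "NP", "sl": ["det", "adj", "n"], "tl": ["det", "adj", "n"],
     "alignments": [(1, 1), (2, 2), (3, 3)], "constraints": set()},
]

spaces = group_into_version_spaces(seeds)
print(len(spaces))  # 2
```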

Seeded Version Space Learning

[Figure: version-space diagram; node labels include “NP v det n” and “NP VP”.]

1. Group seed rules into version spaces as above.
2. Make use of partial order of rules in version space. Partial order is defined via the f-structures satisfying the constraints.
3. Generalize in the space by repeated merging of rules:
   1. Deletion of constraint
   2. Moving value constraints to agreement constraints, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
4. Check translation power of generalized rules against sentence pairs

Seeded Version Space Learning: Example

Seed rule 1:

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 ;;constraints:
 ((x2 tense) = *past) ….
 ((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-sing) …
 ((y3 agr) = *3-sing) ((y4 agr) = *3-sing) …
)

Seed rule 2:

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 ;;constraints:
 ((x2 tense) = *past) …
 ((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-plu) …
 ((y3 agr) = *3-plu) ((y4 agr) = *3-plu) …
)

Merged rule:

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(
 ;;alignments:
 (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 ;;constraints:
 ((x2 tense) = *past) …
 ((y1 def) = *+) ((y1 case) = *nom)
 ((y4 agr) = (y3 agr)) …
)

Preliminary Evaluation

• English to German
• Corpus of 141 ADJPs, simple NPs and sentences
• 10-fold cross-validation experiment
• Goals:
  – Do we learn useful transfer rules?
  – Does Compositionality improve generalization?
  – Does VS-learning improve generalization?

Summary of Results

• Average translation accuracy on cross-validation test set was 62%
• Without VS-learning: 43%
• Without Compositionality: 57%
• Average number of VSs: 24
• Average number of sentences per VS: 3.8
• Average number of merges per VS: 1.6
• Percent of compositional rules: 34%

Conclusions

• New paradigm for learning transfer rules from pre-designed elicitation corpus

• Geared toward languages with very limited resources

• Preliminary experiments validate approach: compositionality and VS-learning improve generalization

Future Work

1. Larger, more diverse elicitation corpus
2. Additional languages (Mapudungun…)
3. Less information on TL side
4. Reverse translation direction
5. Refine the various algorithms:
   • Operators for VS generalization
   • Generalization VS search
   • Layers for compositionality
6. User interactive verification

Seeded Version Space Learning: Generalization

• The partial order of the version space:
  Definition: A transfer rule tr1 is strictly more general than another transfer rule tr2 if all f-structures that satisfy tr2 also satisfy tr1.
• Generalize rules by merging them:
  – Deletion of constraint
  – Raising two value constraints to an agreement constraint, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
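
The partial order can be illustrated with a small satisfaction check. The encoding is an assumption made for this sketch: an f-structure is a flat dictionary from (index, feature) to a value, a value constraint is a pair of such a key and a string, and an agreement constraint is a pair of two keys. The generality test is only checked against a finite sample of f-structures, so it is an empirical approximation of the definition above.

```python
def satisfies(f_structure, constraints):
    """True if the f-structure meets every value and agreement constraint.

    Simplification: an agreement constraint requires both features to be
    present and equal.
    """
    for lhs, rhs in constraints:
        if isinstance(rhs, tuple):                      # agreement constraint
            if lhs not in f_structure or rhs not in f_structure:
                return False
            if f_structure[lhs] != f_structure[rhs]:
                return False
        elif f_structure.get(lhs) != rhs:               # value constraint
            return False
    return True


def at_least_as_general(tr1, tr2, f_structures):
    """tr1 accepts every sampled f-structure that tr2 accepts."""
    return all(satisfies(fs, tr1) for fs in f_structures if satisfies(fs, tr2))


specific = {(("x1", "num"), "*pl"), (("x3", "num"), "*pl")}
raised = {(("x1", "num"), ("x3", "num"))}

samples = [
    {("x1", "num"): "*pl", ("x3", "num"): "*pl"},
    {("x1", "num"): "*sg", ("x3", "num"): "*sg"},
    {("x1", "num"): "*sg", ("x3", "num"): "*pl"},
]

print(at_least_as_general(raised, specific, samples))   # True
print(at_least_as_general(specific, raised, samples))   # False
```

The raised agreement constraint accepts the all-singular f-structure that the two value constraints reject, which is why raising counts as a generalization.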

Seeded Version Space Learning: Merging Two Rules

Merging algorithm proceeds in three steps. To merge tr1 and tr2 into trmerged:

1. Copy all constraints that are both in tr1 and tr2 into trmerged

2. Consider tr1 and tr2 separately. For the remaining constraints in tr1 and tr2, perform all possible instances of raising value constraints to agreement constraints.

3. Repeat step 1.
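
One possible reading of the three steps, sketched on constraint sets. The encoding is assumed for this illustration: a value constraint is `((index, feature), value)` and an agreement constraint is `((index, feature), (index, feature))`. Raising is restricted here to pairs of value constraints with the same feature and the same value; whether the original system raises under exactly this condition is not stated on the slides.

```python
from itertools import combinations


def raise_to_agreement(constraints):
    """All agreement constraints obtainable from pairs of value constraints
    that share the same feature name and the same value."""
    values = sorted(c for c in constraints if isinstance(c[1], str))
    raised = set()
    for (lhs_a, val_a), (lhs_b, val_b) in combinations(values, 2):
        if val_a == val_b and lhs_a[1] == lhs_b[1]:
            raised.add((lhs_a, lhs_b))
    return raised


def merge(tr1, tr2):
    """Merge two constraint sets following the three steps above."""
    merged = tr1 & tr2                                   # step 1: shared constraints
    rest1, rest2 = tr1 - merged, tr2 - merged
    raised1 = raise_to_agreement(rest1)                  # step 2: raise separately
    raised2 = raise_to_agreement(rest2)
    return merged | (raised1 & raised2)                  # step 3: keep what both now share


def seed(agr):
    """Constraints of the example seed rules, parameterized by agreement value."""
    return {
        (("x2", "tense"), "*past"),
        (("y1", "def"), "*+"),
        (("y1", "case"), "*nom"),
        (("y1", "agr"), agr),
        (("y3", "agr"), agr),
        (("y4", "agr"), agr),
    }


for constraint in sorted(merge(seed("*3-sing"), seed("*3-plu")), key=str):
    print(constraint)
```

On the two seed rules of the earlier example this keeps the tense, definiteness and case constraints and replaces the differing agreement values with agreement constraints, including the one between y3 and y4 (written there as ((y4 agr) = (y3 agr)); the order of the two sides does not matter).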

Seeded Version Space Learning: The Search

• The Seeded Version Space algorithm itself is the repeated generalization of rules by merging

• A merge is successful if the set of sentences that can correctly be translated with the merged rule is a superset of the union of sets that can be translated with the unmerged rules, i.e. check power of rule

• Merge until no more successful merges
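
The search loop can be sketched as greedy pairwise merging with the superset test as the acceptance criterion. Everything concrete below is a stand-in chosen for this illustration: rules are sets of "feature=value" strings, `merge` is passed in as a parameter (here plain intersection, i.e. constraint deletion only), and `translates` is a toy check rather than a run of a real transfer engine.

```python
def svs_search(rules, merge, translates, pairs):
    """Repeatedly merge two rules while the merged rule translates a superset
    of the sentence pairs that the two unmerged rules translate."""
    rules = [frozenset(rule) for rule in rules]

    def covered(rule):
        return {k for k, pair in enumerate(pairs) if translates(rule, pair)}

    changed = True
    while changed:
        changed = False
        for a in range(len(rules)):
            for b in range(a + 1, len(rules)):
                merged = merge(rules[a], rules[b])
                if covered(merged) >= covered(rules[a]) | covered(rules[b]):
                    rules = [r for k, r in enumerate(rules) if k not in (a, b)]
                    rules.append(merged)
                    changed = True
                    break
            if changed:
                break
    return rules


def toy_translates(rule, pair):
    """Toy criterion: the rule applies to the pair's features and still carries
    the constraints that pair needs for a correct translation."""
    return rule <= pair["features"] and pair["needs"] <= rule


pairs = [
    {"features": {"num=sg", "case=nom", "def=+"}, "needs": {"case=nom"}},
    {"features": {"num=pl", "case=nom", "def=+"}, "needs": {"case=nom"}},
    {"features": {"num=sg", "case=acc", "def=+"}, "needs": {"case=acc"}},
]
seeds = [pair["features"] for pair in pairs]

result = svs_search(seeds, lambda a, b: a & b, toy_translates, pairs)
for rule in result:
    print(sorted(rule))
```

In this toy run the two nominative seeds merge (number is dropped), while merging the result with the accusative seed is rejected because the merged rule would no longer translate any of the pairs.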
