Session #2
• Introductions
  – Syllabus, requirements, etc.
• Language in 10 minutes
• Lecture
  – MT Approaches
  – MT Evaluation
• MT Lab
Road Map
• Multilingual Challenges for MT
• MT Approaches
• MT Evaluation
MT Approaches: MT Pyramid

[Figure: the MT Pyramid. Analysis climbs the source side from source word through source syntax to source meaning; generation descends the target side from target meaning through target syntax to target word. Gisting maps directly between source and target words at the base.]
MT Approaches: Gisting Example

Source: Sobre la base de dichas experiencias se estableció en 1988 una metodología.
Gisting output: Envelope her basis out speak experiences them settle at 1988 one methodology.
Human translation: On the basis of these experiences, a methodology was arrived at in 1988.
MT Approaches: MT Pyramid

[Figure: the MT Pyramid again, now marking two levels: Gisting at the word level and Transfer at the syntax level.]
MT Approaches: Transfer Example

• Transfer Lexicon: map SL structure to TL structure

[Figure: a transfer rule pairing the Spanish dependency tree headed by "poner" (subj X, obj "mantequilla", modifier "en Y") with the English tree headed by "butter" (subj X, obj Y), i.e. "X puso mantequilla en Y" maps to "X buttered Y".]
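Conceptually, such a rule pairs a source structure containing fixed lexical material with a target structure over the same variables. Here is a minimal sketch in Python; the rule format and the apply_rule() helper are invented for illustration, not taken from the lecture:

```python
# Toy structural transfer rule: Spanish "X puso mantequilla en Y"
# maps to English "X buttered Y". The dictionary format and the
# apply_rule() helper are hypothetical, purely for illustration.

RULE = {
    "src": {"head": "poner", "obj": "mantequilla", "prep": "en"},
    "tgt_head": "butter",
}

def apply_rule(subj, prep_obj):
    """Build the English dependency structure for a matched Spanish tree."""
    return {"head": RULE["tgt_head"], "subj": subj, "obj": prep_obj}

print(apply_rule("Maria", "the bread"))
# {'head': 'butter', 'subj': 'Maria', 'obj': 'the bread'}
```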
MT Approaches: MT Pyramid

[Figure: the MT Pyramid with three levels marked: Gisting (words), Transfer (syntax), and Interlingua (meaning) at the apex.]
MT Approaches: Interlingua Example
Lexical Conceptual Structure (Dorr, 1993)
MT Approaches: MT Pyramid

[Figure: the MT Pyramid annotated with the resources each level requires: dictionaries and parallel corpora at the word level, transfer lexicons at the syntax level, interlingual lexicons at the meaning level.]
MT Approaches: Statistical vs. Rule-based

[Figure: the MT Pyramid, contrasting where statistical and rule-based approaches operate.]
Statistical MT: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
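For reference, the decision rule of the noisy channel model can be written out as follows (this is the standard formulation; the slide itself shows only a diagram):

```latex
% Noisy channel decision rule: pick the English sentence e that is
% most probable given the observed foreign sentence f. P(f) is
% constant over e and drops out of the argmax.
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        = \arg\max_{e} \underbrace{P(e)}_{\text{language model}}\;
          \underbrace{P(f \mid e)}_{\text{translation model}}
```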
Statistical MT: Automatic Word Alignment

• GIZA++
  – A statistical machine translation toolkit used to train word alignments
  – Uses Expectation-Maximization with various constraints to bootstrap alignments

Slide based on Kevin Knight's http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
[Figure: word-alignment grid pairing the English words "Mary did not slap the green witch" with the Spanish sentence "Maria no dio una bofetada a la bruja verde".]
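To make the EM bootstrapping concrete, here is a minimal IBM Model 1 trainer in Python. This is a toy sketch of the idea behind GIZA++, not its actual implementation (the real toolkit layers the HMM model and IBM Models 2 through 5 on top of this), and the two-sentence corpus is invented for illustration:

```python
from collections import defaultdict

# Minimal IBM Model 1 EM sketch (uniform initialization, no NULL word).
corpus = [
    ("maria no dio una bofetada".split(), "mary did not slap".split()),
    ("la bruja verde".split(), "the green witch".split()),
]

# t[f][e] approximates P(f | e), started uniform over co-occurring words
t = defaultdict(lambda: defaultdict(lambda: 1.0))

for _ in range(10):                            # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:
            z = sum(t[f][e] for e in e_sent)   # E-step: normalize over e
            for e in e_sent:
                c = t[f][e] / z                # expected fractional count
                count[e][f] += c
                total[e] += c
    for e in count:                            # M-step: re-estimate t
        for f in count[e]:
            t[f][e] = count[e][f] / total[e]

# Most likely English word generating each Spanish word
for f_sent, e_sent in corpus:
    for f in f_sent:
        print(f, "->", max(e_sent, key=lambda e: t[f][e]))
```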
Giza Plateau and MT Pyramid

• EGYPT toolkit
• Giza, Giza++
• Cairo visualizer
• Pharaoh
• Hiero
• Moses
Statistical MT: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
Phrase-Based Statistical MT

• Foreign input segmented into phrases
  – A "phrase" is any sequence of words
• Each phrase is probabilistically translated into English
  – P(to the conference | zur Konferenz)
  – P(into the meeting | zur Konferenz)
• Phrase distortion (reordering)

This is state-of-the-art!

German: Morgen fliege ich nach Kanada zur Konferenz
English: Tomorrow I will fly to the conference in Canada
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
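The generative story behind such a system is commonly written as below. This is the standard phrase-based formulation (as in Koehn et al. 2003), not reproduced from the slide itself:

```latex
% Phrase-based translation model: the foreign sentence is segmented
% into I phrases, each phrase is translated with probability phi,
% and reordering is scored by a distortion model d based on the gap
% between consecutively translated phrases.
P(\bar{f}_1^I \mid \bar{e}_1^I)
  = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\;
    d(\mathrm{start}_i - \mathrm{end}_{i-1} - 1)
```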
Word Alignment Induced Phrases

[Figure: the word-alignment grid pairing "Maria no dio una bofetada a la bruja verde" with "Mary did not slap the green witch", repeated on successive slides as the extracted phrase set grows.]

Phrase pairs consistent with the alignment, from smallest to largest:

(Maria, Mary) (no, did not) (dio una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dio una bofetada a, slap the)
(Maria no, Mary did not) (no dio una bofetada, did not slap) (dio una bofetada a la, slap the)
(bruja verde, green witch) (Maria no dio una bofetada, Mary did not slap)
(a la bruja verde, the green witch) …
(Maria no dio una bofetada a la bruja verde, Mary did not slap the green witch)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
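A minimal sketch of the consistency-based extraction that produces such phrase sets (this is the standard algorithm, not Knight's code, and the alignment links below are an assumed hand alignment for this sentence pair, so the exact set extracted depends on them):

```python
# Consistency-based phrase extraction: a phrase pair is kept iff at
# least one alignment link falls inside its box and no link connects
# a word inside the box to a word outside it. Unaligned words (here
# "a") let boxes grow, which is why both (la, the) and (a la, the)
# come out.

src = "Maria no dio una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()
A = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),  # Maria, no, dio/una/bofetada
     (6, 4), (7, 6), (8, 5)}                          # la, bruja, verde

def consistent(i1, i2, j1, j2):
    touching = [(i, j) for (i, j) in A if i1 <= i <= i2 or j1 <= j <= j2]
    return bool(touching) and all(
        i1 <= i <= i2 and j1 <= j <= j2 for (i, j) in touching)

pairs = set()
for i1 in range(len(src)):
    for i2 in range(i1, len(src)):
        for j1 in range(len(tgt)):
            for j2 in range(j1, len(tgt)):
                if consistent(i1, i2, j1, j2):
                    pairs.add((" ".join(src[i1:i2 + 1]),
                               " ".join(tgt[j1:j2 + 1])))

for f, e in sorted(pairs, key=lambda p: len(p[0])):
    print(f, "||", e)
```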
Phrase-Based Models

• Sentence $f$ is decomposed into $J$ phrases: $f_1^J = f_1, \ldots, f_j, \ldots, f_J$
• Sentence $e$ is decomposed into $I$ phrases: $e_1^I = e_1, \ldots, e_i, \ldots, e_I$
• We choose the sentence with the highest probability:
  $\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J)$
Phrase-Based Models

• Model the posterior probability using a log-linear combination of feature functions.
• We have a set of $M$ feature functions $h_m(e_1^I, f_1^J)$, $m = 1, \ldots, M$. For each feature function, there exists a model parameter $\lambda_m$, $m = 1, \ldots, M$.
• The decision rule is
  $\hat{e}_1^I = \arg\max_{e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)$
• Features cover the main components:
  – Phrase-translation model
  – Reordering model
  – Language model
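A toy scoring sketch of this decision rule in Python; the candidates, feature values, and weights are all invented for illustration (in a real system they come from the phrase table, the language model, the reordering model, and tuning on held-out data):

```python
import math

# Toy log-linear scoring: score(e) = sum_m lambda_m * h_m(e, f).
weights = {"phrase_tm": 0.6, "lm": 0.3, "reordering": 0.1}

candidates = {
    "tomorrow I will fly to the conference in Canada":
        {"phrase_tm": math.log(0.6), "lm": math.log(0.4), "reordering": -1.0},
    "tomorrow fly I to the conference in Canada":
        {"phrase_tm": math.log(0.6), "lm": math.log(0.05), "reordering": -0.5},
}

def score(feats):
    return sum(weights[m] * h for m, h in feats.items())

best = max(candidates, key=lambda e: score(candidates[e]))
print(best)  # the fluent candidate wins on the language-model feature
```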
Advantages of Phrase-Based SMT
• Many-to-many mappings can handle non-compositional phrases
• Local context is very useful for disambiguating
  – "interest rate" …
  – "interest in" …
• The more data, the longer the learned phrases
  – Sometimes whole sentences
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
MT Approaches: Statistical vs. Rule-based vs. Hybrid

[Figure: the MT Pyramid once more, as a backdrop for comparing statistical, rule-based, and hybrid approaches.]
[Figure: a two-axis map of MT systems, slide courtesy of Laurie Gerber. Horizontal axis: knowledge acquisition strategy, ranging from all manual (hand-built by experts, hand-built by non-experts, electronic dictionaries) through learning from annotated data to fully automated learning from un-annotated data. Vertical axis: knowledge representation strategy, from shallow/simple (word-based only, phrase tables) through syntactic constituent structure and semantic analysis to deep/complex (interlingua). Plotted systems: the original direct approach, original statistical MT, example-based MT, typical transfer system, classic interlingual system. The label "New Research Goes Here!" marks the deep-representation, automated-acquisition corner.]
MT Approaches: Practical Considerations

• Resource availability
  – Parsers and generators
    • Input/output compatibility
  – Translation lexicons
    • Word-based vs. transfer/interlingua
  – Parallel corpora
    • Domain of interest
    • Bigger is better
• Time availability
  – Statistical training, resource building
Road Map
• Multilingual Challenges for MT
• MT Approaches
• MT Evaluation
MT Evaluation
• More art than science
• Wide range of metrics/techniques
  – interface, …, scalability, …, faithfulness, …, space/time complexity, etc.
• Automatic vs. human-based
  – Dumb machines vs. slow humans
Human-based Evaluation Example: Accuracy Criteria

Human-based Evaluation Example: Fluency Criteria
Fluency vs. Accuracy

[Figure: MT applications plotted along accuracy and fluency axes: FAHQMT (fully automatic high-quality MT), Prof. MT, Info. MT, Con. MT.]
Automatic Evaluation Example: Bleu Metric
(Papineni et al., 2001)

• Bleu: BiLingual Evaluation Understudy
  – Modified n-gram precision with length penalty
  – Quick, inexpensive and language-independent
  – Correlates highly with human evaluation
  – Bias against synonyms and inflectional variations
Test Sentence:
colorless green ideas sleep furiously

Gold Standard References:
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily

Unigram precision = 4/5 = 0.8
Bigram precision = 2/4 = 0.5
Bleu Score = $(a_1 \cdot a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} \approx 0.6325$, i.e. 63.25
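The computation above can be reproduced in a few lines of Python. This sketch implements only modified n-gram precision and the geometric mean, omitting the brevity (length) penalty and smoothing that full BLEU implementations apply:

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count at its max count in any reference."""
    cand = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref, n).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / max(1, sum(cand.values()))

cand = "colorless green ideas sleep furiously".split()
refs = ["all dull jade ideas sleep irately".split(),
        "drab emerald concepts sleep furiously".split(),
        "colorless immature thoughts nap angrily".split()]

p1 = modified_precision(cand, refs, 1)   # 4/5 = 0.8
p2 = modified_precision(cand, refs, 2)   # 2/4 = 0.5
print((p1 * p2) ** 0.5)                  # ~0.6325, matching the slide
```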
Automatic Evaluation Example: METEOR
(Lavie and Agarwal, 2007)

• Metric for Evaluation of Translation with Explicit word Ordering
• Extended matching between translation and reference
  – Porter stems, WordNet synsets
• Unigram precision, recall, parameterized F-measure
• Reordering penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against "non-statistical" MT systems
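The parameterized score combines these pieces roughly as follows (per Lavie and Agarwal 2007; α, β, γ are the tunable parameters mentioned above):

```latex
% METEOR: weighted harmonic mean of unigram precision P and recall R,
% discounted by a fragmentation penalty based on how many contiguous
% chunks the matched unigrams form.
F_{\mathrm{mean}} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R}, \qquad
\mathit{Pen} = \gamma \left(\frac{\#\mathrm{chunks}}{\#\mathrm{matches}}\right)^{\beta}, \qquad
\mathrm{score} = F_{\mathrm{mean}} \cdot (1 - \mathit{Pen})
```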
Automatic Evaluation Example: SEPIA
(Habash and ElKholy, 2008)

• A syntactically-aware evaluation metric
  – (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)
• Uses dependency representation
  – MICA parser (Nasr and Rambow, 2006)
  – 77% of all structural bigrams are surface n-grams of size 2, 3, 4
• Includes dependency surface span as a factor in the score
  – Long-distance dependencies should receive a greater weight than short-distance dependencies
• Higher degree of grammaticality?

[Figure: histogram of structural bigrams by surface span length (1, 2, 3, 4+), with the y-axis running from 0% to 50%.]
Metrics MATR Workshop
• Workshop at the AMTA conference 2008
  – Association for Machine Translation in the Americas
• Evaluating evaluation metrics
• Compared 39 metrics
  – 7 baselines and 32 new metrics
  – Various measures of correlation with human judgment
  – Different conditions: text genre, source language, number of references, etc.
Building an SMT System

• Training
  – Input
    • Parallel training data
    • Monolingual data of the target language
  – Output
    • Phrase table
    • Language model
    • Initial translation model parameters
• Tuning
  – Input
    • Parallel tune data set
    • Decoder
    • Phrase table + language model + initial translation model parameters
  – Output
    • Improved translation model parameters
• Development & Testing
  – Input
    • Parallel test data set
    • Decoder
    • Phrase table + language model + final translation model parameters
  – Output
    • BLEU score
Training

[Figure: training pipeline. Parallel source and target text is preprocessed, word-aligned, and fed to phrase extraction, producing the phrase table (PT); monolingual target text is preprocessed and used to create the language model (LM).]
Tuning

[Figure: tuning loop. The tune set is preprocessed and decoded with the LM, phrase table, and current weights; the output is scored. If the metric score is not yet optimal, the weights are adjusted and decoding repeats; once it is, tuning is done.]
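A toy rendering of this loop in Python; the synthetic metric() stands in for "decode the tune set and score the output with BLEU", and the random weight perturbation stands in for a real optimizer such as MERT:

```python
import random

random.seed(0)

# Stand-in for "decode + score": a synthetic metric that peaks at
# tm = 0.7, lm = 0.3. A real pipeline would run the decoder with the
# candidate weights and compute BLEU on the tune set instead.
def metric(w):
    return -((w["tm"] - 0.7) ** 2 + (w["lm"] - 0.3) ** 2)

weights = {"tm": 0.5, "lm": 0.5}           # initial model weights
best = metric(weights)

for _ in range(200):                       # "Adjust Weights" loop
    cand = {k: v + random.gauss(0, 0.05) for k, v in weights.items()}
    s = metric(cand)
    if s > best:                           # "Metric Score Optimal?" check
        weights, best = cand, s            # keep the improved weights

print(weights)                             # ends near the optimum (0.7, 0.3)
```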
Testing

[Figure: testing pipeline. The test set is preprocessed and decoded with the LM, phrase table, and tuned weights to produce the final output.]