Methods for Machine Translation

Prasanth Kolachina

Statistical methods for NLP

March 13th 2014

P. Kolachina (Sprakbanken) MT 13th Mar, 2014 1 / 34

Outline

1 Introduction to Machine Translation

2 Statistical Machine Translation

3 IBM Word Based Models

4 Current approaches in SMT

What is M. Translation?

Machine Translation

Translation is the task of transforming text in one language into another language

    interpretation of meaning
    preservation of meaning and structure in the original text

Importance of context in interpretation and translation

There is nothing outside the text. – Jacques Derrida, “Of Grammatology” (1967)

This transformation process: can it be automated?

    Machine Translation
    If not completely, to what extent?

Origins of Mechanical Translation

First ideas from information theory

“Translation memorandum” Weaver [1955]

Essentially, decoding the meaning in one language and re-encoding the same in the target language

Early attempts to translate using a bilingual dictionary

Information encoding in text is more complex than simple word meanings

    Encoded at different levels of “linguistic analysis”
    Morphology, Syntax, Semantics, Discourse and Pragmatics

ALPAC report led to the creation of Computational Linguistics

    Advanced research in both Linguistics and Computer Science
    E.g. Quick sort

Was originally called Mechanical Translation!

Formalizing approaches to MT

Post ALPAC report of 1966

    Formal grammars and algorithms for NLU, NLG and MT
    Vauquois [1968]

Corpus-based MT

Hand-crafted translation grammars are difficult to develop

    Require many man-hours from linguistic experts
    Limitations on coverage of grammars

Can these grammars be learnt from data?

    Parallel corpora
    Naturally “occurring” for different languages
    Translation fragments are aligned at some level
    Typically, sentence aligned
    Translation memories!

Example-based, Statistical MT

Corpus-based MT

[Figure: example from Petrov [2012]]

Statistical MT

Noisy-Channel model

Warren Weaver’s “memorandum”

    When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode”.
    – Weaver [1955]

Original message in English (S) can be “reconstructed” using a source model and a channel model for a signal in Russian (R)

Statistical MT

Translation is a search problem

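$$\hat{E} = \arg\max_{E} P(E \mid F)$$

By application of Bayes rule, $P(E \mid F) = P(F \mid E)\,P(E) / P(F)$; the denominator $P(F)$ does not depend on $E$ and can be dropped from the maximization

$$\hat{E} = \arg\max_{E} P_{TM}(F \mid E) \cdot P_{LM}(E)$$

Two primary components in the model

    Translation model $P_{TM}$ ≈ channel model
    Language model $P_{LM}$ ≈ source model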
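
As a minimal sketch of the search formulated above, the following Python snippet picks the candidate translation that maximizes the product of the translation-model and language-model scores. The candidate strings and all probability values are made up for illustration, and `best_translation` is a hypothetical helper name; a real system cannot enumerate every candidate like this.

```python
import math

def best_translation(candidates, tm_prob, lm_prob):
    """Return the candidate E maximizing P_TM(F|E) * P_LM(E), in log space."""
    return max(candidates, key=lambda e: math.log(tm_prob[e]) + math.log(lm_prob[e]))

# Hypothetical scores for three candidate translations of one source sentence F.
tm_prob = {"the house": 0.30, "house the": 0.30, "the home": 0.10}     # P_TM(F|E)
lm_prob = {"the house": 0.020, "house the": 0.001, "the home": 0.015}  # P_LM(E)

best = best_translation(list(tm_prob), tm_prob, lm_prob)
print(best)
```

The translation model alone cannot separate the first two candidates here; the language model is what prefers the fluent word order.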

Formalizing approaches to SMT

[Figure: example from Petrov [2012]]

Word level SMT

Model to learn word translations from corpora

Proposed by Brown et al. [1993] at IBM

Hence the name, IBM word models

Notion of word alignment

What do these alignments tell us?

    First approximations to extract “richer” translation models
    “richer” i.e. linguistic fragments higher in the pyramid

Useful not only in MT, but many other applications

cross-lingual *

Word Alignment

[Figure: example from Petrov [2012]]

IBM Models

Different models to capture regular variations across languages

    morphology
    word order

Models 1-4 for $P_{TM}$

How to

    estimate parameters, say p(new|nouvelles) or p(collecting|perception)
    decode new sentences using these parameters

We will look at Models 1 and 2 in today’s lecture!

Nuts and Bolts of the IBM Models

IBM Models

Modeling $P_{TM}$

Different parameters are defined to explain the translation process

    lexical translation t(f|e) – model 1
    distortion q – model 2
    fertility n – model 3
    relative distortion q′ – model 4

t(f |e) and q for the current discussion

IBM Model 1

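Given sentence pairs with word-alignments

    can we compute t(f|e)?
    maximum likelihood based on counts

$$t(\text{haus} \mid \text{house}) = \frac{C(\text{haus}, \text{house})}{C(\text{house})}$$

or

$$t(\text{das} \mid \text{the}) = \frac{C(\text{das}, \text{the})}{C(\text{the})}$$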
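
When alignments are observed, the count-based estimate above is a simple relative frequency. The sketch below assumes the aligned data is given as a flat list of (e word, f word) links; that input format, the toy links and the function name `estimate_t` are illustrative choices, not part of the lecture.

```python
from collections import Counter

def estimate_t(links):
    """Maximum-likelihood t(f|e) from observed alignment links (e_word, f_word)."""
    pair_counts = Counter()  # C(f, e): how often f is aligned to e
    e_counts = Counter()     # C(e): how often e takes part in any link
    for e, f in links:
        pair_counts[(e, f)] += 1
        e_counts[e] += 1
    # Keys of the result are (f, e), mirroring the notation t(f|e).
    return {(f, e): c / e_counts[e] for (e, f), c in pair_counts.items()}

# Toy alignment links, invented for illustration.
links = [("the", "das"), ("house", "haus"), ("the", "das"),
         ("book", "buch"), ("the", "der"), ("house", "haus")]
t = estimate_t(links)
print(t[("das", "the")], t[("haus", "house")])
```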

IBM Model 2

For the same example,

    can we compute q(j|i, l, m)?
    maximum likelihood based on counts

$$q(j \mid i, l, m) = \frac{C(j \mid i, l, m)}{C(i, l, m)}$$

    i: word position in source sentence
    j: word position in translation
    m: length of source sentence
    l: length of translation
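
The same relative-frequency idea applies to the distortion parameters. In this sketch each aligned sentence pair is assumed to be encoded as `(l, m, alignment)`, where `alignment[i-1]` holds the translation position j chosen for source position i; this encoding, the toy data and the name `estimate_q` are assumptions made for the example.

```python
from collections import Counter

def estimate_q(aligned_corpus):
    """Maximum-likelihood q(j|i,l,m) from observed alignments."""
    num = Counter()  # C(j | i, l, m)
    den = Counter()  # C(i, l, m)
    for l, m, alignment in aligned_corpus:
        for i, j in enumerate(alignment, start=1):
            num[(j, i, l, m)] += 1
            den[(i, l, m)] += 1
    return {key: c / den[key[1:]] for key, c in num.items()}

# Three toy sentence pairs with l = m = 2: two monotone alignments, one swapped.
aligned_corpus = [(2, 2, [1, 2]), (2, 2, [2, 1]), (2, 2, [1, 2])]
q = estimate_q(aligned_corpus)
print(q[(1, 1, 2, 2)], q[(2, 1, 2, 2)])
```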

Parameter estimation in IBM Models

So, what is missing in the previous examples?

Assumption of alignments being given is unlikely

    Alignments are hidden or latent variables
    Unobserved

Recall the Expectation-Maximization algorithm, Dempster et al. [1977]

    estimates a statistical model when hidden variables are present

In standard EM terms, the E-step computes expected counts over the hidden alignments under the current parameter values, and the M-step re-estimates the parameters to maximize the likelihood of the translations

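Estimation step

This step re-estimates the parameters from counts; in standard EM terminology it is the M-step

Given the table of counts from previous iteration

    estimate distributions t and q defined previously

$$t(f_i \mid e_j) = \frac{c(e_j, f_i)}{c(e_j)}$$

$$q(j \mid i, l, m) = \frac{c(j \mid i, l, m)}{c(i, l, m)}$$

for all possible values of $(e_j, f_i)$ and $(j \mid i, l, m)$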

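Maximization step

This step collects expected counts under the current t and q; in standard EM terminology it is the E-step

Given distributions t and q

    modify counts to reflect probability of translations

how to estimate probability of translation

$$\delta(i, j, l, m) = \frac{q(j \mid i, l, m)\, t(f_i \mid e_j)}{\sum_{j'=0}^{l} q(j' \mid i, l, m)\, t(f_i \mid e_{j'})}$$

how to modify counts

$$c(e_j, f_i) \leftarrow c(e_j, f_i) + \delta(i, j, l, m)$$

$$c(e_j) \leftarrow c(e_j) + \delta(i, j, l, m)$$

$$c(j \mid i, l, m) \leftarrow c(j \mid i, l, m) + \delta(i, j, l, m)$$

$$c(i, l, m) \leftarrow c(i, l, m) + \delta(i, j, l, m)$$

for all possible values of $(e_j, f_i)$ and $(j \mid i, l, m)$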
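
To make the δ computation concrete, here is a small sketch for a single sentence pair. It assumes that position 0 of the translation is a special NULL word (suggested by the sum starting at j′ = 0, but not stated on the slide), and that `t` and `q` are plain dictionaries keyed by `(f, e)` and `(j, i, l, m)`. The function name `posteriors` and all numeric values are invented for illustration.

```python
def posteriors(e_words, f_words, t, q):
    """Return delta[(i, j)]: posterior that source word i aligns to position j."""
    e = ["NULL"] + list(e_words)          # position 0 is the assumed NULL word
    l, m = len(e_words), len(f_words)
    delta = {}
    for i, f in enumerate(f_words, start=1):
        scores = [q[(j, i, l, m)] * t[(f, e[j])] for j in range(l + 1)]
        z = sum(scores)                   # the denominator of delta
        for j, s in enumerate(scores):
            delta[(i, j)] = s / z
    return delta

e_words, f_words = ["the", "house"], ["das", "haus"]
t = {("das", "NULL"): 0.1, ("das", "the"): 0.7, ("das", "house"): 0.2,
     ("haus", "NULL"): 0.1, ("haus", "the"): 0.1, ("haus", "house"): 0.8}
q = {(j, i, 2, 2): 1 / 3 for i in (1, 2) for j in (0, 1, 2)}  # uniform distortion

delta = posteriors(e_words, f_words, t, q)
print(delta[(1, 1)], delta[(2, 2)])
```

With a uniform q the posterior reduces to the normalized t values, which is the Model 1 special case.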

Practical issues

How to implement this EM for IBM models?

    initialize parameter distributions t and q to random values
    initialize all count tables c to 0

Maximize first: accumulate counts over the entire corpus using the initial t and q values

Estimate new parameter distributions using new count tables

Reset the count tables and iterate over these two steps until EM reaches convergence

EM will converge for model 2 (Collins [2012])

The result can be a local optimum rather than the “real” (global) solution
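
The loop described above might be sketched as follows for IBM Model 2. This is an illustrative implementation under several assumptions that are not in the slides: parameters start uniform rather than random, position 0 of the translation is a NULL word, a fixed number of iterations stands in for a convergence test, and the three-sentence corpus and the name `train_ibm2` are made up.

```python
from collections import defaultdict

def train_ibm2(corpus, iterations=10):
    """corpus: list of (e_words, f_words) pairs. Returns dicts t[(f, e)], q[(j, i, l, m)]."""
    f_vocab = {f for _, f_words in corpus for f in f_words}
    uniform_t = 1.0 / len(f_vocab)        # assumed uniform start for t
    t, q = {}, {}
    for _ in range(iterations):
        # Count tables are reset at the start of every iteration.
        c_ef, c_e = defaultdict(float), defaultdict(float)
        c_jilm, c_ilm = defaultdict(float), defaultdict(float)
        for e_words, f_words in corpus:
            e = ["NULL"] + list(e_words)
            l, m = len(e_words), len(f_words)
            for i, f in enumerate(f_words, start=1):
                scores = [q.get((j, i, l, m), 1.0 / (l + 1)) * t.get((f, e[j]), uniform_t)
                          for j in range(l + 1)]
                z = sum(scores)
                for j, s in enumerate(scores):
                    d = s / z             # delta(i, j, l, m)
                    c_ef[(e[j], f)] += d
                    c_e[e[j]] += d
                    c_jilm[(j, i, l, m)] += d
                    c_ilm[(i, l, m)] += d
        # Re-estimate the parameters from the accumulated counts.
        t = {(f, e_w): c / c_e[e_w] for (e_w, f), c in c_ef.items()}
        q = {key: c / c_ilm[key[1:]] for key, c in c_jilm.items()}
    return t, q

corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"]),
          (["a", "book"], ["ein", "buch"])]
t, q = train_ibm2(corpus)
print(sorted(((p, f) for (f, e), p in t.items() if e == "the"), reverse=True))
```

Because the result can depend on the starting point, a random initialization as suggested on the slide may reach a different local optimum than this uniform one.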

Decoder

Given a translation model $P_{TM}$ and a language model for the target language $P_{LM}$

    find the most “likely” translation for a source sentence

An intractable problem: no efficient exact solution

    Maximize over all possible translations
    Each translation can be generated by many underlying alignments
    Sum over all such plausible alignments

Number of plausible permutations and alignments is exponential in sentence length

Inexact search instead of exact search

    approximations make decoding tractable

Greedy decoder

Start by assigning each word its most probable translation

    this is the initial hypothesis

Compute the probability of the hypothesis

    scores from both $P_{TM}$ and $P_{LM}$

Make mutations to the hypothesis until the probability score no longer improves (Turitzin [2005])

What are plausible mutations?

    Change translation options for each word
    Add new words to hypothesis or remove existing words
    Moving words around inside the hypothesis
    swap non-overlapping segments
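
The hill-climbing idea above can be sketched in a few lines. This is a simplified illustration, not the decoder of Turitzin [2005]: it implements only two of the listed mutations (changing one word's translation and swapping adjacent words), and the toy translation table `tm`, bigram table `lm`, fallback probability 0.01 and function names are all hypothetical.

```python
import math

def neighbours(hyp, options):
    """Yield hypotheses one mutation away; hyp is a list of (source, target) pairs."""
    for k, (src, tgt) in enumerate(hyp):
        for alt in options[src]:              # change one word's translation
            if alt != tgt:
                yield hyp[:k] + [(src, alt)] + hyp[k + 1:]
    for k in range(len(hyp) - 1):             # swap two adjacent words
        yield hyp[:k] + [hyp[k + 1], hyp[k]] + hyp[k + 2:]

def greedy_decode(source, options, score):
    """Start from the word-by-word best translation and climb while the score improves."""
    hyp = [(w, options[w][0]) for w in source]
    best = score(hyp)
    while True:
        candidates = list(neighbours(hyp, options))
        if not candidates:
            return hyp, best
        top = max(candidates, key=score)
        if score(top) <= best:
            return hyp, best
        hyp, best = top, score(top)

# Toy models, invented for illustration.
tm = {("maison", "house"): 0.8, ("maison", "home"): 0.2, ("bleue", "blue"): 1.0}
lm = {("<s>", "blue"): 0.4, ("blue", "house"): 0.5,
      ("<s>", "house"): 0.3, ("house", "blue"): 0.05}

def score(hyp):
    logp = sum(math.log(tm[pair]) for pair in hyp)             # translation model
    words = ["<s>"] + [tgt for _, tgt in hyp]
    logp += sum(math.log(lm.get(bigram, 0.01))                 # bigram language model
                for bigram in zip(words, words[1:]))
    return logp

options = {"maison": ["house", "home"], "bleue": ["blue"]}
hyp, logp = greedy_decode(["maison", "bleue"], options, score)
print([tgt for _, tgt in hyp])
```

The search stops at the first hypothesis with no better neighbour, so like EM it can end in a local optimum.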

Decoding example

Beyond Word models in SMT

Shortcomings of IBM Models

Simplifying assumptions in model formulation (Brown et al. [1993])

Lack of context in predicting likely translation of a word

1. The ball went past the bat and hit the stumps in the last ball of the innings.

2. The bat flew out of the cave with wings as black as night itself.

3. They danced to the music all night at the ball.

Not very different from dictionary lookup to translate

Discarding linguistic information encoded in a sentence

    Morphological variants
    Syntactic structure like part-of-speech tags

Multi-word concepts

break the ice

liven up

Extending words to phrases

Phrasal translations rather than word translations (Koehn et al. [2003])

Simple way to incorporate local context into translation model

Phrase pairs are extracted using alignment template

    Word alignments are used to extract “good” phrase pairs
    Reordering at phrase level instead of word reorderings

Notion of phrase is not defined linguistically

    any n-gram in the language is a phrase

State-of-the-art models

Reiterating ..

[Figure: example from Petrov [2012]]

Encoding Linguistic Information

Various levels of linguistic information

Morphology: gender, number or tense

Factored phrase models

Syntax: syntactic reordering between language pairs

    regular patterns for a language pair
    e.g. adjectives in English and French, or
    clause reordering between English and German
    Syntax-based SMT

Other information

    Semantics, Discourse, Pragmatics

All of these are open research problems!

Evaluating MT

Evaluation criteria

    fluency of translations
    adequacy i.e. translations preserving meaning

Human judgements are most reliable

    Nießen et al. [2000]
    Very expensive and time-consuming
    Variation in judgements

Automatic evaluation metrics

    Compute similarity of translations to reference translations
    BLEU (Papineni et al. [2002]), NIST and many more
    Choice of metric varies depending on application requirements

How to interpret evaluation scores?
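
As a rough illustration of how a similarity-based metric works, the sketch below computes a simplified sentence-level BLEU score: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It assumes a single reference and no smoothing, and returns 0 as soon as any n-gram order has no match; BLEU as defined by Papineni et al. [2002] is computed over a whole test corpus and supports multiple references.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[k:k + n]) for k in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (lists of tokens)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat".split()
print(bleu(reference, reference))
print(bleu("the the the".split(), "the cat".split(), max_n=1))
```

The second call shows the effect of clipping: "the" is credited only once because it occurs once in the reference.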

Next?

VG assignment (optional)

Implement IBM Model 2

Help session next week

Interested further in MT?

Feel free to contact Richard or me :-)

References

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics 19:263–311. URL http://aclweb.org/anthology-new/J/J93/J93-2003.

Collins, Michael. 2012. Statistical Machine Translation: IBM Models 1 and 2.

Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39:1–38. URL http://www.jstor.org/stable/2984875.

Koehn, Philipp, Franz J. Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of Human Language Technologies: The 2003 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 48–54. Edmonton, Canada: Association for Computational Linguistics. URL http://aclweb.org/anthology-new/N/N03/N03-1017.

Nießen, Sonja, Franz Josef Och, Gregor Leusch, and Hermann Ney. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. In Proceedings of the Second Conference on International Language Resources and Evaluation (LREC’00), 39–45. Athens, Greece: European Language Resources Association (ELRA).

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, 311–318. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P02-1040.

Petrov, Slav. 2012. Statistical NLP.

Turitzin, Michael. 2005. SMT of French and German into English Using IBM Model 2 Greedy Decoding.

Vauquois, Bernard. 1968. A Survey of Formal Grammars and Algorithms for Recognition and Transformation in Mechanical Translation. In Proceedings of IFIP Congress, 1114–1122. Edinburgh.

Weaver, Warren. 1955. Translation. Technical report, Cambridge, Massachusetts.
