Bayesian Word Alignment for Statistical Machine Translation
Authors: Coskun Mermer, Murat Saraclar
Presented by Jun Lang, 2011-10-13, I2R SMT Reading Group
Paper info
• Bayesian Word Alignment for Statistical Machine Translation
• ACL 2011 Short Paper
• With source code in Perl (379 lines)
• Authors: Coskun Mermer, Murat Saraclar
Core Idea
• Propose a Gibbs sampler for fully Bayesian inference in IBM Model 1
• Results
– Outperforms classical EM by up to 2.99 BLEU
– Effectively addresses the rare-word problem
– Produces a much smaller phrase table than EM
Mathematics
• (E, F): parallel corpus
• e_i, f_j: the i-th (j-th) word of source (target) sentence e (f), which contains I (J) words, in corpus E (F)
• e_0: a "null" word added to each source sentence in E
• V_E (V_F): size of the source (target) vocabulary
• a (A): alignment of a sentence (of the corpus)
• a_j: f_j is aligned to source word e_{a_j}
• T: parameter table of size V_E × V_F
• t_{e,f} = P(f|e): word translation probability
IBM Model 1
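The slide's formula did not survive extraction; in the notation above, the standard IBM Model 1 likelihood of a target sentence f with alignment a given source sentence e is:

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T)
  = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j}, f_j}
```

where ε is a constant sentence-length factor and the (I+1)^J term reflects uniform alignment probability over the source positions 0..I (including the null word).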
T as a random variable
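Treating T as a random variable rather than a point parameter changes the inference target from a maximum-likelihood estimate (as in EM) to the joint posterior:

```latex
P(A, T \mid E, F) \;\propto\; P(F, A \mid E, T)\, P(T)
```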
Dirichlet Distribution
• T = {t_{e,f}} belongs to an exponential family
• Specifically, each row is a multinomial distribution
• We choose the conjugate prior, the Dirichlet distribution, for computational convenience
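Assuming a symmetric hyperparameter θ (a simplifying assumption for this sketch), conjugacy gives a closed-form posterior for each row t_e:

```latex
t_e \sim \mathrm{Dir}(\theta, \dots, \theta), \qquad
t_e \mid A, E, F \sim \mathrm{Dir}(\theta + n_{e,1}, \dots, \theta + n_{e,V_F})
```

where n_{e,f} is the number of times target word f is aligned to source word e under A.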
Dirichlet Distribution
Each source word type e has a translation distribution t_e over the target vocabulary, which is given a Dirichlet prior
A sparse prior keeps rare words from acting as "garbage collectors"
Dirichlet Distribution
Sample the unknowns A and T in turn
¬j denotes the exclusion of the current value of a_j
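With the counts n^{¬j} excluding the current link, the per-link conditional takes the standard collapsed-Gibbs form (sketched here with a symmetric θ, an assumption, since the slide omits the formula):

```latex
P(a_j = i \mid A^{\neg j}, E, F) \;\propto\;
  \frac{n^{\neg j}_{e_i, f_j} + \theta}{\sum_{f} n^{\neg j}_{e_i, f} + V_F\, \theta}
```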
Algorithm
The initial alignment A can be arbitrary, but initializing from standard EM output works better
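A minimal Python sketch of this kind of collapsed Gibbs word-alignment sampler (the function name, uniform random initialization, and symmetric θ are illustrative choices, not the paper's bayesalign.pl):

```python
import random
from collections import defaultdict

def gibbs_align(corpus, v_f_size, theta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 alignments.

    corpus: list of (src, tgt) pairs; src/tgt are lists of words.
    A "NULL" word is prepended to every source sentence.
    Returns one alignment sample per sentence: a[j] indexes into src (0 = NULL).
    """
    rng = random.Random(seed)
    counts = defaultdict(int)   # n[e, f]: links from source word e to target word f
    totals = defaultdict(int)   # sum over f of n[e, f]

    # Initialize alignments uniformly at random (EM output would work better).
    align = []
    for src, tgt in corpus:
        src = ["NULL"] + src
        a = [rng.randrange(len(src)) for _ in tgt]
        for j, f in enumerate(tgt):
            counts[src[a[j]], f] += 1
            totals[src[a[j]]] += 1
        align.append(a)

    for _ in range(iters):
        for s, (src, tgt) in enumerate(corpus):
            src = ["NULL"] + src
            a = align[s]
            for j, f in enumerate(tgt):
                # Remove the current link: these are the "not j" counts.
                counts[src[a[j]], f] -= 1
                totals[src[a[j]]] -= 1
                # P(a_j = i) proportional to (n[e_i,f] + theta) / (totals[e_i] + V_F*theta)
                weights = [(counts[e, f] + theta) / (totals[e] + v_f_size * theta)
                           for e in src]
                a[j] = rng.choices(range(len(src)), weights=weights)[0]
                counts[src[a[j]], f] += 1
                totals[src[a[j]]] += 1
    return align
```

On a toy parallel corpus, repeated co-occurrence concentrates the counts, so sampled links tend toward the lexically consistent alignment; a sparse θ < 1 plays the role of the sparse Dirichlet prior discussed above.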
Results
Code View
bayesalign.pl
Conclusions
• Outperforms classical EM by up to 2.99 BLEU
• Effectively addresses the rare-word problem
• Produces a much smaller phrase table than EM
• Shortcomings
– Slow: 100 sentence pairs take 18 minutes
– Might be sped up with parallel computing