Bayesian Word Alignment for Statistical Machine Translation
Authors: Coskun Mermer, Murat Saraclar
Presented by Jun Lang, 2011-10-13, I2R SMT Reading Group
Paper info
• Bayesian Word Alignment for Statistical Machine Translation
• ACL 2011 Short Paper
• With source code in Perl (379 lines)
• Authors: Coskun Mermer, Murat Saraclar
Core Idea
• Propose a Gibbs sampler for fully Bayesian inference in IBM Model 1
• Results
– Outperforms classical EM by up to 2.99 BLEU
– Effectively addresses the rare-word problem
– Produces a much smaller phrase table than EM
Mathematics
• (E, F): parallel corpus
• e_i, f_j: the i-th (j-th) word of source (target) sentence e (f), which contains I (J) words, in corpus E (F)
• e_0: a "null" word added to each source sentence in E
• V_E (V_F): size of the source (target) vocabulary
• a (A): alignment of a sentence (of the corpus)
• a_j: f_j is aligned to source word e_{a_j}
• T: parameter table of size V_E × V_F
• t_{e,f} = P(f|e): word translation probability
IBM Model 1
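The slide's formula did not survive extraction; in the notation above, the standard IBM Model 1 likelihood of a target sentence f with alignment a given source sentence e is:

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T)
  = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j}, f_j}
```

where ε is a constant sentence-length factor and the (I+1)^J term reflects uniform alignment probability over the source positions 0..I (including the null word).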
T as a random variable
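Treating T as a random variable rather than a point parameter changes the inference target from a maximum-likelihood estimate (as in EM) to the joint posterior:

```latex
P(A, T \mid E, F) \;\propto\; P(F, A \mid E, T)\, P(T)
```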
Dirichlet Distribution
• T = {t_{e,f}} belongs to an exponential family
• Specifically, each row is a multinomial distribution
• We choose the conjugate prior, the Dirichlet distribution, for computational convenience
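Assuming a symmetric hyperparameter θ (a simplifying assumption for this sketch), conjugacy gives a closed-form posterior for each row t_e:

```latex
t_e \sim \mathrm{Dir}(\theta, \dots, \theta), \qquad
t_e \mid A, E, F \sim \mathrm{Dir}(\theta + n_{e,1}, \dots, \theta + n_{e,V_F})
```

where n_{e,f} is the number of times target word f is aligned to source word e under A.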
Dirichlet Distribution
Each source word type e has a translation distribution t_e over the target vocabulary, which is given a Dirichlet prior
A sparse prior keeps rare words from acting as "garbage collectors"
Dirichlet Distribution
Sample the unknowns A and T in turn
¬j denotes the exclusion of the current value of a_j
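With the counts n^{¬j} excluding the current link, the per-link conditional takes the standard collapsed-Gibbs form (sketched here with a symmetric θ, an assumption, since the slide omits the formula):

```latex
P(a_j = i \mid A^{\neg j}, E, F) \;\propto\;
  \frac{n^{\neg j}_{e_i, f_j} + \theta}{\sum_{f} n^{\neg j}_{e_i, f} + V_F\, \theta}
```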
Algorithm
The initial alignment A can be arbitrary, but initializing from standard EM output works better
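A minimal Python sketch of this kind of collapsed Gibbs word-alignment sampler (the function name, uniform random initialization, and symmetric θ are illustrative choices, not the paper's bayesalign.pl):

```python
import random
from collections import defaultdict

def gibbs_align(corpus, v_f_size, theta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 alignments.

    corpus: list of (src, tgt) pairs; src/tgt are lists of words.
    A "NULL" word is prepended to every source sentence.
    Returns one alignment sample per sentence: a[j] indexes into src (0 = NULL).
    """
    rng = random.Random(seed)
    counts = defaultdict(int)   # n[e, f]: links from source word e to target word f
    totals = defaultdict(int)   # sum over f of n[e, f]

    # Initialize alignments uniformly at random (EM output would work better).
    align = []
    for src, tgt in corpus:
        src = ["NULL"] + src
        a = [rng.randrange(len(src)) for _ in tgt]
        for j, f in enumerate(tgt):
            counts[src[a[j]], f] += 1
            totals[src[a[j]]] += 1
        align.append(a)

    for _ in range(iters):
        for s, (src, tgt) in enumerate(corpus):
            src = ["NULL"] + src
            a = align[s]
            for j, f in enumerate(tgt):
                # Remove the current link: these are the "not j" counts.
                counts[src[a[j]], f] -= 1
                totals[src[a[j]]] -= 1
                # P(a_j = i) proportional to (n[e_i,f] + theta) / (totals[e_i] + V_F*theta)
                weights = [(counts[e, f] + theta) / (totals[e] + v_f_size * theta)
                           for e in src]
                a[j] = rng.choices(range(len(src)), weights=weights)[0]
                counts[src[a[j]], f] += 1
                totals[src[a[j]]] += 1
    return align
```

On a toy parallel corpus, repeated co-occurrence concentrates the counts, so sampled links tend toward the lexically consistent alignment; a sparse θ < 1 plays the role of the sparse Dirichlet prior discussed above.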
Results
Code View
bayesalign.pl
Conclusions
• Outperforms classical EM by up to 2.99 BLEU
• Effectively addresses the rare-word problem
• Produces a much smaller phrase table than EM
• Shortcomings
– Slow: 100 sentence pairs take 18 minutes
– Might be sped up with parallel computing