Word Sense Disambiguation for Machine Translation

Han-Bin Chen, 2010.11.24

Reference Papers

• Cabezas and Resnik. 2005. Using WSD Techniques for Lexical Selection. (Technical report)
• Carpuat and Wu. 2005. Word Sense Disambiguation vs. Statistical Machine Translation. (ACL 2005)
• Carpuat and Wu. 2007. Improving Statistical Machine Translation using Word Sense Disambiguation. (EMNLP 2007)
• Chan et al. 2007. Word Sense Disambiguation Improves Statistical Machine Translation. (ACL 2007)
• Apidianaki. 2009. Data-driven semantic analysis for multilingual WSD. (EACL 2009)

SMT Workflow

[Diagram: the decoder takes the source-language input and produces the target-language output, using a translation model and a reordering model built from a bilingual corpus, and a language model built from a monolingual corpus.]

MT Research Areas

[Diagram: the same SMT workflow (input, decoder, translation model, reordering model, language model, bilingual and monolingual corpora, output), with two further research areas marked: word alignment and evaluation metric.]

Translation Model (TM)

• Research in TM
– Phrase extraction
– Phrase filtering
– Phrase augmentation
– Word Sense Disambiguation (WSD)

Traditional WSD

• Target word is a single content word
– Noun, verb, adjective

• Classification task with predefined senses
– WordNet, HowNet

• Modern WSD systems
– Not limited to local context
– Linguistic information
– Position-sensitive
– Syntactic
– Collocation

• An intuitive application of WSD is SMT
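The feature types listed on this slide can be made concrete with a small sketch. The function below is only an illustration under stated assumptions: the name `context_features`, the feature-string format, and the window size are all invented here and do not come from any of the cited systems. Syntactic features are left out because they would need a parser.

```python
from typing import Dict, List


def context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, int]:
    """Binary context features for the target word tokens[i] (illustrative names)."""
    feats: Dict[str, int] = {}
    # Position-sensitive features: the word found at each relative offset.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats[f"pos[{off:+d}]={word}"] = 1
    # Collocation features: the bigrams formed with the left and right neighbour.
    if i > 0:
        feats[f"colloc_left={tokens[i - 1]}_{tokens[i]}"] = 1
    if i + 1 < len(tokens):
        feats[f"colloc_right={tokens[i]}_{tokens[i + 1]}"] = 1
    # Sentence-level bag of words: context beyond the local window.
    for j, w in enumerate(tokens):
        if j != i:
            feats[f"bow={w}"] = 1
    return feats


feats = context_features("the plane will briefly stop over".split(), 3)
```

A classifier trained on such feature dictionaries would then predict one of the predefined senses for each occurrence of the target word.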

WSD in MT

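• Wrong translations from Google Translate
– what is today's special ? → 什麼是今天的特色 ? (roughly "what is today's characteristic?")
– I would like to reserve a table for three → 我想保留一表三 (roughly "I want to keep one chart three")
– the plane will briefly stop over in the airport → 這架飛機將簡要地停留在機場 (roughly "the plane will concisely stay at the airport")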

WSD in MT: Early Stage

• Can a WSD model help SMT?
– Energetically debated question over the past years

• Implicit WSD in SMT
– Local context: phrase table & language model

• Dedicated WSD system
– Wider variety of context features
– Position, sentence-level, document-level features

• WSD should play a role in MT

• Publicly available SMT system
– Pharaoh by Philipp Koehn (2003~2004)

Small Scale Experiment (1)

• Marine Carpuat and Dekai Wu, 2005
• Chinese-to-English translation task
• Chinese lexical sample task includes 20 target words
• Trained with state-of-the-art WSD
– 37 training instances per target word (manual annotation)

Small Scale Experiment (2)

• Hard decision
– Force the decoder to choose translations from glosses
– Decided by language model

• Surprising and frustrating result
– Small data, out-of-domain material, hard decision
– Language model effect

Translation Disambiguation (1)

• Clara Cabezas and Philip Resnik, 2005
– Address 3 problems of the previous work

• Use aligned target word directly as "sense"
– 4 senses for "briefly": { 短暫地 , 短時間地 , 簡潔地 , 簡要地 }
– Trained with state-of-the-art WSD
– Handles the "small data" and "out-of-domain" problems

• Soft decision
– Pharaoh XML markup
– The decoder chooses among the specified translations and the translation model's own options together
– Handles the "hard decision" problem
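Using the aligned target word as the "sense" means training data can be read off a word-aligned bilingual corpus instead of being annotated by hand. The sketch below shows one way this could look; the function name `extract_instances`, the data layout, and the two toy sentence pairs are assumptions made for illustration, not the procedure or data of the cited report.

```python
from collections import defaultdict
from typing import List, Tuple

# One corpus entry: (source tokens, target tokens, alignment links as (src_idx, tgt_idx)).
AlignedPair = Tuple[List[str], List[str], List[Tuple[int, int]]]


def extract_instances(corpus: List[AlignedPair], target_word: str):
    """Collect (source sentence, position, sense label) training instances.

    The sense label is simply the target-side word(s) aligned to the source word.
    """
    instances = []
    for src, tgt, align in corpus:
        links = defaultdict(list)
        for s, t in align:
            links[s].append(t)
        for i, w in enumerate(src):
            # Skip occurrences that have no alignment link.
            if w == target_word and links[i]:
                label = " ".join(tgt[t] for t in sorted(links[i]))
                instances.append((src, i, label))
    return instances


# Toy, invented corpus for the "briefly" example.
corpus = [
    ("he spoke briefly".split(), ["他", "簡要地", "發言"], [(0, 0), (1, 2), (2, 1)]),
    ("we stopped briefly".split(), ["我們", "短暫地", "停留"], [(0, 0), (1, 2), (2, 1)]),
]
instances = extract_instances(corpus, "briefly")
labels = [label for _, _, label in instances]
```

Each instance can then be turned into context features and passed to any standard WSD classifier, with the translations playing the role of sense labels.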

Translation Disambiguation (2)

• Pharaoh XML markup

• Experiment & Result
– Spanish-to-English test on Europarl test data
– WSD: 0.2382, Baseline: 0.2356
– Not statistically significant
– But at least it is not a decrease
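The soft-decision idea can be sketched as wrapping an ambiguous source word in XML that lists the WSD system's candidate translations with their probabilities, so the decoder can weigh them alongside its phrase table. The tag name `wsd` and the attribute names `english` and `prob` below are assumptions for illustration and should be checked against the Pharaoh manual before use; the Spanish word and the probabilities are invented.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape, quoteattr


def markup_word(word: str, candidates: List[Tuple[str, float]], tag: str = "wsd") -> str:
    """Wrap a source word in XML listing candidate translations and probabilities.

    Tag and attribute names are hypothetical, not confirmed decoder syntax.
    """
    translations = "|".join(t for t, _ in candidates)
    probs = "|".join(f"{p:.2f}" for _, p in candidates)
    return (
        f"<{tag} english={quoteattr(translations)} prob={quoteattr(probs)}>"
        f"{escape(word)}</{tag}>"
    )


marked = markup_word("brevemente", [("briefly", 0.7), ("shortly", 0.3)])
```

Because the candidates carry probabilities rather than replacing the decoder's choice outright, the language model and translation model can still overrule the WSD suggestion.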

Toward Better Integration into SMT

• How to better integrate WSD into SMT?
• Phrase-based sense disambiguation (PSD)
• Key points
– Phrase, not word
– Integration into log-linear model: weight tuning
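Integration into the log-linear model means the WSD probability becomes one more feature whose weight is tuned with the others. The sketch below scores two candidate translations with and without such a feature. All feature names, probabilities, and weights are invented for illustration; in a real system the weights would come from tuning on held-out data.

```python
import math
from typing import Dict


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: sum over features of weight * log(feature value)."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# Hypothetical weights; "wsd" is the newly added feature.
weights = {"tm": 1.0, "lm": 0.5, "wsd": 0.8}

# Hypothetical feature values for two translations of "briefly" in a "stop over" context.
candidates = {
    "短暫地": {"tm": 0.30, "lm": 0.20, "wsd": 0.70},
    "簡要地": {"tm": 0.40, "lm": 0.20, "wsd": 0.10},
}

best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))

# Setting the WSD weight to zero recovers the baseline decision.
baseline_weights = {**weights, "wsd": 0.0}
best_without_wsd = max(
    candidates, key=lambda c: loglinear_score(candidates[c], baseline_weights)
)
```

In this toy setting the baseline prefers 簡要地 on translation-model probability alone, while the added WSD feature shifts the choice to 短暫地.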

Successful Integration (1)

• Chan et al., 2007
• Chinese-to-English translation
• Sense disambiguation on Chinese phrases
– 1 or 2 consecutive Chinese words
– Extract training examples from word-aligned corpus

• Add WSD features
– Contextual probability of WSD
– Reward probability of WSD

Successful Integration (2)

• Statistically significant improvement

• 將無法取得更多援助或其他讓步
– Hiero: will be more aid and other concessions
– Hiero+WSD: will be unable to obtain more aid and other concessions

PSD System (1)

• Marine Carpuat and Dekai Wu, 2007
• WSD model for every phrase
– Extract training data from phrase extraction
– WSD probability as new feature

• Comments
– Not every phrase needs WSD
– Technical problem (Pharaoh)

PSD System (2)

• Result: better translation on all test sets

[Result tables: IWSLT 2006 dataset; NIST 2004 test set]

PSD System (3)

Recent Issue

• Different translations may have the same sense
– 2 senses for "briefly", rather than 4
– Sense 1: { 短暫地 , 短時間地 }
– Sense 2: { 簡潔地 , 簡要地 }

• Automatic sense clustering

Sense Clustering (1)

• Marianna Apidianaki, 2009
• Two translations are semantically related
– If they occur in similar contexts

• Translation unit (TU) as context
– Bilingual sentence pair

• Source word "briefly"
• Translations
– { 短暫地 , 短時間地 , 簡潔地 , 簡要地 }
– {t1, t2, t3, t4}


• "briefly-t1" occurs in context {TU1, TU4, TU25, TU88…}
• "briefly-t2" occurs in context {TU5, TU18, TU92, TU126…}
• Clustering based on pairwise context similarity
– Apidianaki, 2008
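Clustering by pairwise context similarity can be sketched as follows. This is a deliberately simplified stand-in, not the method of Apidianaki (2008): it assumes each translation's context has been reduced to a set of words drawn from its translation units, uses Jaccard overlap as the similarity, and merges any pair above a hand-picked threshold. The context words are invented.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two context word sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_translations(contexts: Dict[str, Set[str]], threshold: float) -> List[Set[str]]:
    """Group translations whose contexts are similar (single-link, union-find)."""
    parent = {t: t for t in contexts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(contexts, 2):
        if jaccard(contexts[a], contexts[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for t in contexts:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


# Invented context words for the four translations of "briefly".
contexts = {
    "短暫地": {"stop", "visit", "stay", "airport"},
    "短時間地": {"stop", "stay", "visit", "pause"},
    "簡潔地": {"explain", "summarize", "describe", "state"},
    "簡要地": {"explain", "summarize", "report", "state"},
}
clusters = cluster_translations(contexts, threshold=0.5)
```

With these toy contexts the four translations fall into the two senses given for "briefly": a temporal one and a "concisely" one.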


• Experiment
– English-Greek translation
– 150 ambiguous English nouns

• Evaluation of lexical selection
– Strict precision (exact match with the answer word)
– Enriched precision (match with the cluster of the answer word)

• Result
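The difference between the two measures can be shown with a small sketch. It assumes both are computed as the fraction of test instances counted as correct, which the slide does not spell out. The function name, predictions, and references are invented, and the clusters reuse the "briefly" example rather than the English-Greek data.

```python
from typing import List, Set, Tuple


def precision_scores(
    predictions: List[str], references: List[str], clusters: List[Set[str]]
) -> Tuple[float, float]:
    """Return (strict, enriched) precision over paired predictions and references."""
    cluster_of = {t: frozenset(c) for c in clusters for t in c}
    strict = enriched = 0
    for pred, ref in zip(predictions, references):
        # Strict: the predicted translation must be the answer word itself.
        if pred == ref:
            strict += 1
        # Enriched: any member of the answer word's sense cluster counts.
        if pred in cluster_of.get(ref, frozenset({ref})):
            enriched += 1
    n = len(references)
    return strict / n, enriched / n


sense_clusters = [{"短暫地", "短時間地"}, {"簡潔地", "簡要地"}]
predictions = ["短暫地", "短時間地", "簡要地", "短暫地"]
references = ["短暫地", "短暫地", "簡潔地", "簡要地"]
strict_p, enriched_p = precision_scores(predictions, references, sense_clusters)
```

Here only one prediction matches its answer word exactly, but three fall in the right sense cluster, so enriched precision gives credit for translations that differ in form but not in sense.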

Conclusion

• From WSD to PSD
• However, semantics is also important
• Future work
– Semantic PSD
