34
HMM Word and Phrase Alignment for Statistical Machine Translation Yonggang Deng 1 William Byrne 1,2 1 Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, USA 2 Machine Intelligence Lab, Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ, UK HLT-EMNLP Conference, 2005 Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 1 / 18

HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

HMM Word and Phrase Alignment forStatistical Machine Translation

Yonggang Deng1 William Byrne1,2

1Center for Language and Speech Processing,The Johns Hopkins University,

Baltimore, MD 21218, USA

2Machine Intelligence Lab,Cambridge University Engineering DepartmentTrumpington Street, Cambridge CB2 1PZ, UK

HLT-EMNLP Conference, 2005

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 1 / 18

Page 2: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction

Goal: improve word alignments of parallel corpora for bettertranslationIBM Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

The state of the art word alignments especially on large bitexts

We want to propose an alternativeComparable performance to Model-4Efficient training, fast, with controlled memory usage

Can build a single model over large bitextsGoal is to use the model, not just the alignments

Will give examples of extending phrase pairs under model-basedposterior distribution

Present a statistical Word-to-Phrase HMM alignment model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 2 / 18

Page 3: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Outline

1 IntroductionHMM-based word alignment modelsIBM Model-4

2 Word-to-Phrase HMM Alignment ModelsModel FormulationParameter EstimationWord Alignment Results

3 Statistical Phrase Alignment ModelsWord Alignment Induced Phrase Translation ModelsModel-based Phrase Pair PosteriorTranslation Results

4 Conclusions

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 3 / 18

Page 4: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction HMM

HMM-based Word Alignment Models (Vogel et al, ’96)

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

State sequences←→ word alignmentsModel parameters: transition probabilities, translation tablesWords are generated one by one, one transition emits one target wordEfficient Balm-Welch and Viterbi algorithms

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 4 / 18

Page 5: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction HMM

HMM-based Word Alignment Models (Vogel et al, ’96)

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

State sequences←→ word alignmentsModel parameters: transition probabilities, translation tablesWords are generated one by one, one transition emits one target wordEfficient Balm-Welch and Viterbi algorithms

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 4 / 18

Page 6: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction HMM

HMM-based Word Alignment Models (Vogel et al, ’96)

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

State sequences←→ word alignmentsModel parameters: transition probabilities, translation tablesWords are generated one by one, one transition emits one target wordEfficient Balm-Welch and Viterbi algorithms

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 4 / 18

Page 7: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

IBM Model-4 Word Alignments (Brown et al, ’93)

Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

中国 早日 加入 世贸组织

s1 s2 s3 s4

ButWhat makes the model powerful also makes computation complexExact-EM is problematic, sub-optimal estimation algorithms usedDifficult to compute statistics under the model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 5 / 18

Page 8: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

IBM Model-4 Word Alignments (Brown et al, ’93)

Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

中国 早日 加入 世贸组织

s1 s2 s3 s4

Create a tablet for each source word

ButWhat makes the model powerful also makes computation complexExact-EM is problematic, sub-optimal estimation algorithms usedDifficult to compute statistics under the model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 5 / 18

Page 9: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

IBM Model-4 Word Alignments (Brown et al, ’93)

Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

中国 早日 加入 世贸组织

s1 s2 s3 s4

Table lookup to decide fertility: # of target words connected

ButWhat makes the model powerful also makes computation complexExact-EM is problematic, sub-optimal estimation algorithms usedDifficult to compute statistics under the model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 5 / 18

Page 10: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

IBM Model-4 Word Alignments (Brown et al, ’93)

Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

中国 早日 加入 世贸组织

s1 s2 s3 s4

the

world

trade

organization

at

an

early

date

accession

to

china

‘s

Sample target words from translation table i.i.d.

ButWhat makes the model powerful also makes computation complexExact-EM is problematic, sub-optimal estimation algorithms usedDifficult to compute statistics under the model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 5 / 18

Page 11: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

IBM Model-4 Word Alignments (Brown et al, ’93)

Model-4 generated by GIZA++ Toolkit (Och and Ney, ’03)

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

the

world

trade

organization

at

an

early

date

accession

to

china

‘s

Distortion Model

ButWhat makes the model powerful also makes computation complexExact-EM is problematic, sub-optimal estimation algorithms usedDifficult to compute statistics under the model

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 5 / 18

Page 12: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Introduction IBM Model-4

Goal

To find replacement for Model-4 word-to-word Alignment withparameter estimation as efficient as HMM

retain Markov property, omit distortioncomparable performance to Model-4

incorporate fertilityefficient statistics computation

want to use models, not just the alignments

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 6 / 18

Page 13: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Make HMM More Powerful in Generating Observations

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

Target phrases rather than words are emitted after jumping into a stateBuilding links from source words to target phrases explicitlyCan model dependencies between words within phrases

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 7 / 18

Page 14: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Make HMM More Powerful in Generating Observations

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

Target phrases rather than words are emitted after jumping into a stateBuilding links from source words to target phrases explicitlyCan model dependencies between words within phrases

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 7 / 18

Page 15: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Make HMM More Powerful in Generating Observations

china ‘s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

s1

t1 t2

s2

t9

s3 s4

t3 t4 t5 t6 t7 t8 t10 t12t11

Target phrases rather than words are emitted after jumping into a stateBuilding links from source words to target phrases explicitlyCan model dependencies between words within phrases

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 7 / 18

Page 16: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Word-to-Phrase HMM Alignment Models

Word-to-phrase alignment a : saj → v j

Phrase Count Model P(K ), depends on wordsMarkov Transition P(i|i ′)Phrase Length Model n(φ; source word), similar to fertilityWord-to-Phrase Translation Probabilities P(v j |saj ; φ)

china ’s accession to the world trade organization at an early date

中国 早日 加入 世贸组织

v1 v2 v3 v4

P( 3 | 1 ) P( 4 | 3 )

P( 2 | 4 )

P( K = 4 )

P(v1 | sa1 ; 1) P(v2 | sa2 ; 2) P(v3 | sa3 ; 3) P(v4 | sa4 ; 4)

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 8 / 18

Page 17: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Word-to-Phrase HMM vs. IBM Model-4

IBM Model-4 WtoP HMMword fertility + distortion phrase length model

e2

f1

e1

f3

f4

f2

f3

f4

f2

f1

e2

e1

f3

f4

f2

f1

Not exactly equivalent, but similar descriptive powerInspired by features of Model-4but incorporated within HMM

to allow efficient estimation and alignment

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 9 / 18

Page 18: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Model Formulation

Word-to-Phrase Translation Probabilities

Basic model components can be refined in various ways

Replace weak i.i.d. word-by-word translation

P(world trade organization|-���; 3) =?

= t(world|-���) · t(trade|-���) · t(organization|-���)

←− i.i.d. t-table=t(world|-���) · t2(trade|world,-���) · t2(organization|trade,-���)

←− bigram t-tableModel i.i.d. bigramP(world|-���) 0.06 0.06P(trade|world,-���) 0.06 0.99P(organization|trade,-���) 0.06 0.99P(world trade organization|-���, 3) 0.0002 0.0588

Assigns higher probability to correct translation than i.i.dIncorporates context without losing algorithmic efficiency

DP still possibleUse same estimation techniques as used for bigram LMsData sparseness, Witten-Bell smoothing

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 10 / 18

Page 19: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Parameter Estimation

Embedded Estimation of Word-to-Phrase HMM

Forward-Backward Procedure

Incremental build (experience from ASR)Pruning can be applied

Unsupervised training from scratch

Model-1, 10 its (initial t-table)Model-2, 5 its (better t-table)WtoW HMM, 5 its (initial Markov model)WtoP HMM N=2, 3, .., each 5 its (Markov model, phrase length)WtoP HMM with bigram t-table, 5 its (bigram t-table)

Parallel Implementation

Partitioning training bitextE-step: Collect counts from each partition parallelM-step: Merge counts to update model parametersMemory efficient, virtually no limitation on training bitext size

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 11 / 18

Page 20: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Parameter Estimation

Embedded Estimation of Word-to-Phrase HMM

Forward-Backward Procedure

Incremental build (experience from ASR)Pruning can be applied

Unsupervised training from scratch

Model-1, 10 its (initial t-table)Model-2, 5 its (better t-table)WtoW HMM, 5 its (initial Markov model)WtoP HMM N=2, 3, .., each 5 its (Markov model, phrase length)WtoP HMM with bigram t-table, 5 its (bigram t-table)

Parallel Implementation

Partitioning training bitextE-step: Collect counts from each partition parallelM-step: Merge counts to update model parametersMemory efficient, virtually no limitation on training bitext size

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 11 / 18

Page 21: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Parameter Estimation

Embedded Estimation of Word-to-Phrase HMM

Forward-Backward Procedure

Incremental build (experience from ASR)Pruning can be applied

Unsupervised training from scratch

Model-1, 10 its (initial t-table)Model-2, 5 its (better t-table)WtoW HMM, 5 its (initial Markov model)WtoP HMM N=2, 3, .., each 5 its (Markov model, phrase length)WtoP HMM with bigram t-table, 5 its (bigram t-table)

Parallel Implementation

Partitioning training bitextE-step: Collect counts from each partition parallelM-step: Merge counts to update model parametersMemory efficient, virtually no limitation on training bitext size

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 11 / 18

Page 22: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Word-to-Phrase HMM Alignment Models Results

Bitext Alignment Results

Test: NIST 2001 MT-eval set, 124 sentence pairs w/ manual word alignments

Comparable performance to Model-4 on FBIS training bitext

Increasing max phrase length N improves quality in C → E direction

Bigram translation probability improves word-to-phrase links

A good balance between 1-1 and 1-N distribution can be achieved

2 4 6 8 10 121500

1850

2200

2550

2900

3250

3600

3950

4300

η

# of h

ypoth

esize

d link

s

0 2 4 6 8 10 12 1426

28

30

32

34

36

38

40

42

Over

all A

ERη

1−1 Links1−N Links Overall AER

Comparable performance when extending to large scale bitexts

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 12 / 18

Page 23: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Word Alignment Induced Phrase Translation Models

Statistical Phrase Translation Models

Phrase-based SMT performs better than word-based SMT

Phrases Pair Inventory (PPI) extracted from word aligned bitext (Och et al, ’99)

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 13 / 18

Page 24: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Word Alignment Induced Phrase Translation Models

But word alignments are imperfect ...

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

?

Relying on the one-best word alignment may exclude some valid phrase pairs

Goal is to define a probability distribution over phrase pairs

Allows more control over generation of phrase pairs

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 14 / 18

Page 25: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Word Alignment Induced Phrase Translation Models

But word alignments are imperfect ...

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

Relying on the one-best word alignment may exclude some valid phrase pairs

Goal is to define a probability distribution over phrase pairs

Allows more control over generation of phrase pairs

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 14 / 18

Page 26: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Word Alignment Induced Phrase Translation Models

But word alignments are imperfect ...

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

Relying on the one-best word alignment may exclude some valid phrase pairs

Goal is to define a probability distribution over phrase pairs

Allows more control over generation of phrase pairs

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 14 / 18

Page 27: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Model-based Phrase Pair Posterior

Doesn’t rely on a single alignment

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

j1 j2

i1 i2

Define a set of alignments that align words to words in phrasesA(i1, i2; j1, j2) = {am

1 : aj ∈ [i1, i2] iff j ∈ [j1, j2]}Calculate the likelihood of the source phrase producing the target phraseP(T, A(i1, i2; j1, j2 ) |S) =

Pa : am

1 ∈A(i1,i2;j1,j2 ) P(T, a|S)

Obtain phrase pair posteriorP(A(i1, i2; j1, j2 ) |T, S) = P(T, A(i1, i2; j1, j2 ) |S)/P(T|S)

Efficient DP-based implementation for WtoP HMM, Difficult for Model-4

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 15 / 18

Page 28: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Model-based Phrase Pair Posterior

Doesn’t rely on a single alignment

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

j1 j2

i1 i2

Define a set of alignments that align words to words in phrasesA(i1, i2; j1, j2) = {am

1 : aj ∈ [i1, i2] iff j ∈ [j1, j2]}Calculate the likelihood of the source phrase producing the target phraseP(T, A(i1, i2; j1, j2 ) |S) =

Pa : am

1 ∈A(i1,i2;j1,j2 ) P(T, a|S)

Obtain phrase pair posteriorP(A(i1, i2; j1, j2 ) |T, S) = P(T, A(i1, i2; j1, j2 ) |S)/P(T|S)

Efficient DP-based implementation for WtoP HMM, Difficult for Model-4

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 15 / 18

Page 29: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Model-based Phrase Pair Posterior

Doesn’t rely on a single alignment

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

j1 j2

i1 i2

Define a set of alignments that align words to words in phrasesA(i1, i2; j1, j2) = {am

1 : aj ∈ [i1, i2] iff j ∈ [j1, j2]}Calculate the likelihood of the source phrase producing the target phraseP(T, A(i1, i2; j1, j2 ) |S) =

Pa : am

1 ∈A(i1,i2;j1,j2 ) P(T, a|S)

Obtain phrase pair posteriorP(A(i1, i2; j1, j2 ) |T, S) = P(T, A(i1, i2; j1, j2 ) |S)/P(T|S)

Efficient DP-based implementation for WtoP HMM, Difficult for Model-4

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 15 / 18

Page 30: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Model-based Phrase Pair Posterior

Doesn’t rely on a single alignment

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

j1 j2

i1 i2

Define a set of alignments that align words to words in phrasesA(i1, i2; j1, j2) = {am

1 : aj ∈ [i1, i2] iff j ∈ [j1, j2]}Calculate the likelihood of the source phrase producing the target phraseP(T, A(i1, i2; j1, j2 ) |S) =

Pa : am

1 ∈A(i1,i2;j1,j2 ) P(T, a|S)

Obtain phrase pair posteriorP(A(i1, i2; j1, j2 ) |T, S) = P(T, A(i1, i2; j1, j2 ) |S)/P(T|S)

Efficient DP-based implementation for WtoP HMM, Difficult for Model-4

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 15 / 18

Page 31: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Model-based Phrase Pair Posterior

Doesn’t rely on a single alignment

There is no gang and money linked politics in hong kong and there will not be such politics in future either

香港 今日 没有 黑金黑金黑金黑金 政治政治政治政治,今后 亦 不会 有 黑金 政治

j1 j2

i1 i2

Define a set of alignments that align words to words in phrasesA(i1, i2; j1, j2) = {am

1 : aj ∈ [i1, i2] iff j ∈ [j1, j2]}Calculate the likelihood of the source phrase producing the target phraseP(T, A(i1, i2; j1, j2 ) |S) =

Pa : am

1 ∈A(i1,i2;j1,j2 ) P(T, a|S)

Obtain phrase pair posteriorP(A(i1, i2; j1, j2 ) |T, S) = P(T, A(i1, i2; j1, j2 ) |S)/P(T|S)

Efficient DP-based implementation for WtoP HMM, Difficult for Model-4

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 15 / 18

Page 32: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Model-based Phrase Pair Posterior

Augmented PPI for a Better Coverage

Baseline PPI

extracted from 1-best alignments using establishing techniques(Och et al., ’99)

Goal: add phrase pairs to improve test set coverage

For each foreign phrase v in test set not covered by the baseline

for each sentence pair containing vfind the English phrase u that maximizes the phrase pair posterioradd (u, v) to the baseline PPI if posterior exceeds a threshold value

Balance coverage against phrase translation quality

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 16 / 18

Page 33: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Statistical Phrase Alignment Models Translation Results

Translation Results

25.5

26

26.5

27

27.5

28

28.5

eval02 eval03 eval04(news)

Model-4 Baseline

WtoP HMM Baseline

WtoP HMM Augmented

36

37

38

39

40

41

42

43

eval02 eval03 eval04(news)

Model-4 Baseline

WtoP HMM Baseline

WtoP HMM Augmented

Chinese-English Arabic-English

TTM Decoder - WFST implementation with monotone order (Kumar et al, ’05)

Used all parallel corpora available from LDC

C-E: 200M En. words (FBIS, Xinhua, HK News, ..., all UN bitexts)A-E: 130M En. words (news, all UN bitexts)

Relaxing threshold in PPI augmenting improves coverage and BLEU score

WtoP model can even be applied to augment Model-4 PPI (see the paper)

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 17 / 18

Page 34: HMM Word and Phrase Alignment for Statistical Machine ...mi.eng.cam.ac.uk/~wjb31/ppubs/hltemnlp05wtoptalk.pdf · Introduction HMM HMM-based Word Alignment Models (Vogel et al, ’96)

Conclusions

Conclusions

The word-to-phrase HMM alignment modelis capable of good quality bitext word alignment over very largetraining bitextshas efficient training algorithm with parallel implementation

Model-based phrase pair distribution enablesan improved phrase pair extraction strategycontrolled balance coverage vs. quality

WtoP HMM performsslightly better than Model-4 on large C-E translation systemsignificantly better than Model-4 on large A-E translation system

WtoP HMM is a powerful framework for exploring new models ofword and phrase translation

Deng & Byrne (CLSP) HMM Word & Phrase Alignment for SMT HLT-EMNLP 2005 18 / 18