MACHINE TRANSLATION AND MT TOOLS: GIZA++ AND MOSES
-Nirdesh Chauhan
Outline
Problem statement in SMT
Translation models
Using Giza++ and Moses
Introduction to SMT
Given a sentence in a foreign language F, find the most appropriate translation in English E:
\[ \hat{E} = \arg\max_{E} P(E \mid F) = \arg\max_{E} P(F \mid E)\, P(E) \]
P(F|E) – Translation model
P(E) – Language model
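As a rough illustration of how the two models combine, the sketch below picks the candidate E that maximizes P(F|E) * P(E). The foreign sentence, the candidate translations and all probability values are made-up toy numbers, not output of any real model.

```python
# Toy noisy-channel scoring: choose the English candidate E that maximizes
# P(F|E) * P(E). All numbers below are hypothetical illustration values.

def best_translation(candidates, translation_model, language_model):
    """Return the candidate with the highest P(F|E) * P(E)."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])

# Hypothetical scores for one fixed foreign sentence F ("das Haus ist klein").
translation_model = {          # P(F|E)
    "the house is small": 0.20,
    "small the house is": 0.25,
    "the home is little": 0.10,
}
language_model = {             # P(E)
    "the house is small": 0.010,
    "small the house is": 0.0001,
    "the home is little": 0.005,
}

best = best_translation(list(translation_model), translation_model, language_model)
print(best)
```

The second candidate has the highest translation-model score, but its low language-model score rules it out, which is the point of keeping the two models separate.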
The Generation Process
Partition: Think of all possible partitions of the source-language sentence
Lexicalization: For a given partition, translate each phrase into the foreign language
Reordering: Permute the set of all foreign words, with words possibly moving across phrase boundaries
We need the notion of alignment to better explain the mathematics behind the generation process
Alignment
Word-based alignment
For each word in source language, align words from target language that this word possibly produces
Based on IBM Models 1-5
Model 1 is the simplest
As we go from Model 1 to Model 5, the models get more complex but more realistic
Computing these word alignments is all that Giza++ does
Alignment
A function from target position to source position:
The alignment sequence is: 2,3,4,5,6,6,6
Alignment function A: A(1) = 2, A(2) = 3, ...
A different alignment function will give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ...
To allow spurious insertion, allow alignment with word 0 (NULL).
No. of possible alignments: (I+1)^J, where I is the source length and J is the target length.
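The count of possible alignments can be checked by brute force. The sketch below enumerates every alignment function for small, arbitrarily chosen lengths I = 2 and J = 3; the function name `all_alignments` is only illustrative.

```python
from itertools import product

def all_alignments(I, J):
    """Yield every alignment (a_1, ..., a_J), where each a_j is a source
    position in 0..I and position 0 stands for the NULL word."""
    return product(range(I + 1), repeat=J)

# Small example sizes chosen for illustration only.
I, J = 2, 3
alignments = list(all_alignments(I, J))
print(len(alignments))   # 27, i.e. (I + 1) ** J
print(alignments[:3])    # first few alignments: (0, 0, 0), (0, 0, 1), (0, 0, 2)
```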
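IBM Model 1: Generative Process
\[ P(F \mid E) = \sum_{A} P(J \mid E)\, P(A \mid J, E)\, P(F \mid A, J, E) \]
\[ P(A \mid J, E)\, P(F \mid A, J, E) = P(a_1^J \mid J, e_1^I)\, P(f_1^J \mid a_1^J, J, e_1^I) = \prod_{j=1}^{J} P(a_j \mid a_1^{j-1}, J, e_1^I)\, P(f_j \mid f_1^{j-1}, a_1^J, J, e_1^I) \]
\[ P(F \mid E) = P(J \mid E) \sum_{A} \prod_{j=1}^{J} P(a_j \mid a_1^{j-1}, J, e_1^I)\, P(f_j \mid f_1^{j-1}, a_1^J, J, e_1^I) \]
IBM Model 1: Details
No assumptions so far. The above formula is exact.
Choosing length: P(J|E) = P(J|E,I) = P(J|I) = \epsilon (a constant)
Choosing alignment: all alignments equiprobable
\[ P(a_j \mid a_1^{j-1}, J, e_1^I) = \frac{1}{I+1} \]
Translation probability
\[ P(f_j \mid f_1^{j-1}, a_1^J, J, e_1^I) = t(f_j \mid e_{a_j}) \]
Putting these together:
\[ P(F \mid E) = \frac{\epsilon}{(I+1)^J} \sum_{A} \prod_{j=1}^{J} t(f_j \mid e_{a_j}) \]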
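A minimal sketch of the Model 1 likelihood, P(F|E) = epsilon / (I+1)^J * sum over A of prod over j of t(f_j | e_{a_j}), follows. It sums over all alignments explicitly and compares the result with the equivalent product-of-sums form, which is valid because each a_j is chosen independently. The lexicon table `t`, the sentence pair and `epsilon = 1.0` are assumed toy values.

```python
from itertools import product
from math import prod, isclose

def model1_likelihood(f_words, e_words, t, epsilon=1.0):
    """Sum over all (I+1)^J alignments; position 0 is the NULL word."""
    e_with_null = ["NULL"] + list(e_words)
    I, J = len(e_words), len(f_words)
    total = 0.0
    for a in product(range(I + 1), repeat=J):
        total += prod(t.get((f, e_with_null[i]), 0.0) for f, i in zip(f_words, a))
    return epsilon / (I + 1) ** J * total

def model1_likelihood_fast(f_words, e_words, t, epsilon=1.0):
    """Same quantity, computed as a product over j of a sum over i."""
    e_with_null = ["NULL"] + list(e_words)
    I, J = len(e_words), len(f_words)
    return epsilon / (I + 1) ** J * prod(
        sum(t.get((f, e), 0.0) for e in e_with_null) for f in f_words
    )

# Hypothetical lexicon probabilities t[(f, e)] for illustration only.
t = {
    ("la", "the"): 0.7, ("la", "house"): 0.1, ("la", "NULL"): 0.2,
    ("casa", "the"): 0.1, ("casa", "house"): 0.8, ("casa", "NULL"): 0.1,
}
slow = model1_likelihood(["la", "casa"], ["the", "house"], t)
fast = model1_likelihood_fast(["la", "casa"], ["the", "house"], t)
print(slow, fast)
```

The explicit sum grows as (I+1)^J, so the factorized form is the one worth using beyond toy sentences.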
Training Alignment Models
Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities: t(f|e) for Model 1, and in general the lexicon probability P(f|e) and the alignment probability P(a_i | a_{i-1}, I)
How can these probabilities be computed if all you have is a parallel corpus?
Intuition: Interdependence of Probabilities
If you knew which words are probable translations of each other, then you could guess which alignments are probable and which are improbable
If you were given alignments with probabilities, then you could compute the translation probabilities
Looks like a chicken-and-egg problem
The EM algorithm comes to the rescue
Expectation Maximization (EM) Algorithm
Used when we want the maximum likelihood estimate of the parameters of a model that depends on hidden variables.
In the present case, the parameters are the translation probabilities and the hidden variables are the alignments.
• Init: Start with an arbitrary estimate of the parameters
• E-step: Compute the expected value of the hidden variables
• M-step: Recompute the parameters that maximize the likelihood of the data, given the expected value of the hidden variables from the E-step
Example of EM Algorithm
Green house / Casa verde
The house / La casa
Init: Assume that any word can generate any word with equal probability:
P(la|house) = 1/3
E-Step
\[ P(A, F \mid J, E) = P(A \mid J, E)\, P(F \mid A, J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j}) \]
\[ P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)} \]
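M-Step
\[ \mathrm{tcount}(f \mid e) = \sum_{(F,E)} \sum_{A} P(A \mid F, E)\, C(f, e \mid A, F, E) \]
\[ t(f \mid e) = \frac{\mathrm{tcount}(f \mid e)}{\sum_{f'} \mathrm{tcount}(f' \mid e)} \]
Here C(f, e | A, F, E) is the number of times f is aligned to e in alignment A of the pair (F, E).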
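E-Step again
\[ P(A, F \mid J, E) = P(A \mid J, E)\, P(F \mid A, J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j}) \]
\[ P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)} \]
Resulting alignment probabilities: 1/3, 2/3 and 2/3, 1/3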
Repeat till convergence
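The two-sentence example can be traced in code. The sketch below makes two simplifying assumptions that are not stated on the slides: only one-to-one alignments are considered and there is no NULL word. Under those assumptions it reproduces the uniform initialization of 1/3 and the alignment probabilities 1/3, 2/3, 2/3, 1/3 after the second E-step.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

# Parallel corpus from the example: (English words, foreign words).
corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]

def e_step(t):
    """For each sentence pair, return [(alignment, P(A|F,E)), ...].
    Assumption: one-to-one alignments only; perm[j] is the index of the
    English word that generates foreign word j."""
    posteriors = []
    for es, fs in corpus:
        scored = []
        for perm in permutations(range(len(es))):
            p = Fraction(1)
            for j, i in enumerate(perm):
                p *= t[(fs[j], es[i])]
            scored.append((perm, p))
        z = sum(p for _, p in scored)
        posteriors.append([(perm, p / z) for perm, p in scored])
    return posteriors

def m_step(posteriors):
    """Collect expected counts and renormalize them into t(f|e)."""
    counts, totals = defaultdict(Fraction), defaultdict(Fraction)
    for (es, fs), aligns in zip(corpus, posteriors):
        for perm, post in aligns:
            for j, i in enumerate(perm):
                counts[(fs[j], es[i])] += post
                totals[es[i]] += post
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}

# Init: uniform over the three foreign words, so t(la|house) = 1/3.
f_vocab = {f for _, fs in corpus for f in fs}
t0 = defaultdict(lambda: Fraction(1, len(f_vocab)))

t1 = m_step(e_step(t0))          # first E-step and M-step
second = e_step(t1)              # E-step again
flat = [post for aligns in second for _, post in aligns]
print(flat)                      # [Fraction(1, 3), Fraction(2, 3), Fraction(2, 3), Fraction(1, 3)]
```

After one iteration the pairing house/casa already dominates, because "house" and "casa" co-occur in both sentence pairs.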
Limitation: only one-to-many alignments are allowed
Phrase-based alignment
More natural
Many-to-one mappings allowed
Generating Bi-directional Alignments
Existing models only generate uni-directional alignments
Combine two uni-directional alignments to get many-to-many bi-directional alignments
Hindi-Eng Alignment
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination
[Figure: Hindi-to-English alignment grid, Hindi words as columns and English words as rows]
Eng-Hindi Alignment
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination
[Figure: English-to-Hindi alignment grid, Hindi words as columns and English words as rows]
Combining Alignments
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination
[Figure: grid combining the two directional alignments, with alignment points marked + or |]
Precision (P) and recall (R) of the alignments:
P = 2/3 = 0.67, R = 2/7 = 0.29
P = 4/5 = 0.80, R = 4/7 = 0.57
P = 5/6 = 0.83, R = 5/7 = 0.71
P = 6/9 = 0.67, R = 6/7 = 0.86
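Precision and recall of an alignment can be computed by comparing its alignment points with a reference alignment. The sketch below uses made-up point sets, chosen only so that the counts match the last pair of values above (6 of 9 proposed points correct, 7 reference points); they are not the points from the Hindi example.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a set of alignment points against a gold set."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical alignment points (english_index, foreign_index).
gold = {(i, i) for i in range(7)}                                    # 7 reference points
predicted = {(i, i) for i in range(6)} | {(0, 1), (1, 2), (2, 3)}    # 9 points, 6 correct

p, r = precision_recall(predicted, gold)
print(round(p, 2), round(r, 2))   # 0.67 0.86
```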
A Different Heuristic from Moses-Site

GROW-DIAG-FINAL(e2f,f2e):
  neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
  alignment = intersect(e2f,f2e);
  GROW-DIAG(); FINAL(e2f); FINAL(f2e);

GROW-DIAG():
  iterate until no new points added
    for english word e = 0 ... en
      for foreign word f = 0 ... fn
        if ( e aligned with f )
          for each neighboring point ( e-new, f-new ):
            if (( e-new, f-new ) in union( e2f, f2e ) and
                ( e-new not aligned and f-new not aligned ))
              add alignment point ( e-new, f-new )

FINAL(a):
  for english word e-new = 0 ... en
    for foreign word f-new = 0 ... fn
      if ( ( ( e-new, f-new ) in alignment a) and
           ( e-new not aligned or f-new not aligned ) )
        add alignment point ( e-new, f-new )
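The pseudocode above can be turned into a small runnable sketch. The version below follows the conditions exactly as written here (both words unaligned in GROW-DIAG, at least one unaligned in FINAL); the representation of alignments as sets of (e, f) index pairs and the explicit length arguments are assumptions made for this illustration, not the Moses implementation.

```python
NEIGHBORING = [(-1, 0), (0, -1), (1, 0), (0, 1),
               (-1, -1), (-1, 1), (1, -1), (1, 1)]

def grow_diag_final(e2f, f2e, e_len, f_len):
    """Combine two directional alignments given as sets of (e, f) pairs."""
    union = e2f | f2e
    alignment = e2f & f2e          # start from the intersection

    def e_aligned(e):
        return any(pe == e for pe, _ in alignment)

    def f_aligned(f):
        return any(pf == f for _, pf in alignment)

    # GROW-DIAG: add neighboring union points until nothing changes.
    added = True
    while added:
        added = False
        for e in range(e_len):
            for f in range(f_len):
                if (e, f) not in alignment:
                    continue
                for de, df in NEIGHBORING:
                    e_new, f_new = e + de, f + df
                    if ((e_new, f_new) in union
                            and not e_aligned(e_new) and not f_aligned(f_new)):
                        alignment.add((e_new, f_new))
                        added = True

    # FINAL(e2f), then FINAL(f2e): add points where a word is still unaligned.
    for a in (e2f, f2e):
        for e_new in range(e_len):
            for f_new in range(f_len):
                if ((e_new, f_new) in a
                        and (not e_aligned(e_new) or not f_aligned(f_new))):
                    alignment.add((e_new, f_new))
    return alignment

# Toy directional alignments (hypothetical index pairs).
result = grow_diag_final({(0, 0), (1, 1), (2, 1)}, {(0, 0), (1, 1), (1, 2)}, 3, 3)
print(sorted(result))   # [(0, 0), (1, 1), (1, 2), (2, 1)]
```

In this toy case GROW-DIAG adds nothing, because each candidate neighbor has one word already aligned; the two remaining points come in through the FINAL steps.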
Proposed changes: after growing diagonally, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
Generating Phrase Alignments
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination
[Figure: combined alignment grid, with alignment points marked +]
Extracted phrase pairs:
a premier beach vacation destination <-> एक प्रमुख समुद्र-तटीय गंतव्य है
premier beach vacation <-> प्रमुख समुद्र-तटीय
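Phrase pairs such as the two above are typically read off a word alignment with a consistency check: a source span and a target span form a phrase pair if no word inside either span is aligned to a word outside the other. The slides do not spell this criterion out, so the sketch below is an assumed, simplified version (no extension over unaligned words), run on a made-up three-word alignment.

```python
def extract_phrases(n_src, alignment, max_len=4):
    """Return consistent phrase pairs as ((s1, s2), (t1, t2)) index spans.
    alignment is a set of (source_index, target_index) points."""
    pairs = set()
    for s1 in range(n_src):
        for s2 in range(s1, min(n_src, s1 + max_len)):
            # Target positions linked to the source span s1..s2.
            tgt = [t for (s, t) in alignment if s1 <= s <= s2]
            if not tgt:
                continue
            t1, t2 = min(tgt), max(tgt)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: every point inside the target span must come
            # from inside the source span.
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.add(((s1, s2), (t1, t2)))
    return pairs

# Hypothetical alignment: word 0 -> 0, word 1 -> 2, word 2 -> 1.
phrases = extract_phrases(3, {(0, 0), (1, 2), (2, 1)})
print(sorted(phrases))
```

The span covering source words 0..1 is rejected here, because its target span 0..2 contains a point aligned to source word 2.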
Using Moses and Giza++
Refer to http://www.statmt.org/moses_steps.html
Steps
Install all packages in Moses
Input - sentence aligned parallel corpus
Training
Tuning
Generate output on test corpus (decoding)
Example
train.en
h e l l o
h e l l o
w o r l d
c o m p o u n d w o r d
h y p h e n a t e d
o n e
b o o m
k w e e z l e b o t t e r
train.pr
hh eh l ow
hh ah l ow
w er l d
k aa m p aw n d w er d
hh ay f ah n ey t ih d
ow eh n iy
b uw m
k w iy z l ah b aa t ah r
Sample from Phrase-table
b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
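Each phrase-table entry in the sample has five fields separated by `|||`: source phrase, target phrase, two alignment fields and a list of scores. The sketch below parses one such line under that assumption; the function and key names are illustrative and not part of Moses, and other Moses versions may write the table in a different layout.

```python
import re

def parse_alignment(field):
    """Turn '(0) (1,2) ()' into [[0], [1, 2], []]."""
    groups = re.findall(r"\(([^)]*)\)", field)
    return [[int(i) for i in g.split(",") if i] for g in groups]

def parse_phrase_table_line(line):
    """Parse 'src ||| tgt ||| align ||| align ||| scores' into a dict."""
    src, tgt, align_st, align_ts, scores = [f.strip() for f in line.split("|||")]
    return {
        "source": src.split(),
        "target": tgt.split(),
        "source_to_target": parse_alignment(align_st),  # one group per source token
        "target_to_source": parse_alignment(align_ts),  # one group per target token
        "scores": [float(x) for x in scores.split()],
    }

entry = parse_phrase_table_line(
    "n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718"
)
print(entry["source"], entry["target"], entry["target_to_source"])
```

An empty group `()` marks a token with no aligned counterpart, as for `eh` in this entry.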
Testing output
Input          Output
h o t          hh aa t
p h o n e      p|UNK hh ow eh n iy
b o o k        b uw k