The Dark Secrets of MT Revealed

Machine Translation
I256: Applied Natural Language Processing
John DeNero
Some slides on loan from Dan Klein & others

Thursday, November 5, 2009
Data-Driven Machine Translation

Sentence-aligned parallel corpus:
  Yo lo haré mañana → I will do it tomorrow
  Hasta pronto → See you soon
  Hasta pronto → See you around

Target language corpus:
  I will get to it soon / See you later / He will do it

Machine translation system (model of translation):
  Yo lo haré pronto (NOVEL SENTENCE) → I will do it soon
Uses of Translation

• Assimilation: gist of a document is helpful
• Dissemination: high quality expected; may be closed domain
• Communication: wide range of quality requirements

Machine translation is much lower cost, much faster, and much easier to access than conventional translation. However, it's worse.
A Brief and Biased History

’47: MT is the “first” non-numerical computing task
’58: Berkeley’s first MT grant
’66: ALPAC report deems MT bad
’90s: Statistical data-driven approach introduced
’00s: Statistical MT thrives

Warren Weaver (’47):
“Thus it may be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.”

“When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”

John Pierce (ALPAC, ’66):
“‘Machine Translation’ presumably means going by algorithm from machine-readable source text to useful target text... In this context, there has been no machine translation...”
The Problem with Dictionary Look-ups

Chinese dictionary entries that include English “top”:
  顶部  /top/roof/
  顶端  /summit/peak/top/apex/
  顶头  /coming directly towards one/top/end/
  盖    /lid/top/cover/canopy/build/Gai/
  盖帽  /surpass/top/
  极    /extremely/pole/utmost/top/collect/receive/
  尖峰  /peak/top/
  面    /fade/side/surface/aspect/top/face/flour/
  摘心  /top/topping/

And the contexts in which English “top” appears: carrot, class, pile, condition, drawer, speed, bikini, lungs, “top dog”, “top brass”, “top of the line”, “big top”, “over the top”, “pop top”, “top off”, “off the top of my head”, “take it from the top”, “I’m on top of it”, ...

Example from Douglas Hofstadter
Levels of Language Transfer

Source text → (Analysis) → Transfer → (Generation) → Target text

Transfer can occur at increasing depths of analysis:
  Morphology
  Syntax
  Semantics
  Interlingua
Translating with Tree Transducers

Input:  lo haré de muy buen grado .
Output: I will do it gladly .

Grammar:
  ADV → 〈 de muy buen grado ; gladly 〉
  S   → 〈 lo haré ADV . ; I will do it ADV . 〉

[Diagram: the output side carries an English parse with S, NP, VP, MD, VB, PRP nodes.]
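Since the two rules above fully derive the example, a tiny sketch can make the mechanics concrete. This is toy bottom-up rule application with a made-up rule format; real systems parse with the source sides of a large synchronous grammar rather than using string replacement:

    # Toy synchronous grammar: (nonterminal, source side, target side).
    # Applying a rule rewrites its source side to the nonterminal and
    # records the corresponding target-side translation.
    RULES = [
        ("ADV", "de muy buen grado", "gladly"),
        ("S",   "lo haré ADV .",     "I will do it ADV ."),
    ]

    def translate(source):
        derived = {}  # nonterminal -> translation of the span it covers
        for symbol, src, tgt in RULES:
            if src in source:
                source = source.replace(src, symbol, 1)
                for nonterminal, english in derived.items():
                    tgt = tgt.replace(nonterminal, english)  # substitute subderivations
                derived[symbol] = tgt
        assert source == "S", "the grammar did not cover the whole sentence"
        return derived["S"]

    print(translate("lo haré de muy buen grado ."))  # -> I will do it gladly .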
A Statistical Translation Model

Synchronous derivation (from the grammar):
  ADV → 〈 de muy buen grado ; gladly 〉
  S   → 〈 lo haré ADV . ; I will do it ADV . 〉

  lo haré de muy buen grado .  ⇒  I will do it gladly .

Product of experts model:

  How good is each rule? Models that factor over rules:

    $\prod_r P(e_r \mid f_r)^{\lambda_2} \, P(f_r \mid e_r)^{\lambda_3} \cdots$

  How good is the target sentence? The language model factors over n-grams:

    $\prod_{i=1}^{I} P(e_i \mid e_{i-1}, \ldots, e_1)^{\lambda_1}$
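In log space, the product of experts becomes a weighted sum of log-probabilities, which is how such scores are usually computed in practice. A minimal sketch; the feature names, weights, and probabilities below are illustrative, not from the slides:

    import math

    def derivation_score(rules, lm_logprob, weights):
        """Weighted sum of log-probs = log of the product of experts."""
        score = weights["lm"] * lm_logprob  # language model expert
        for rule in rules:                  # rule-local experts
            score += weights["e_given_f"] * rule["log_p_e_given_f"]
            score += weights["f_given_e"] * rule["log_p_f_given_e"]
        return score

    # The two rules of the example derivation, with made-up probabilities.
    rules = [
        {"log_p_e_given_f": math.log(0.6), "log_p_f_given_e": math.log(0.5)},  # ADV rule
        {"log_p_e_given_f": math.log(0.8), "log_p_f_given_e": math.log(0.7)},  # S rule
    ]
    weights = {"lm": 1.0, "e_given_f": 0.9, "f_given_e": 0.4}
    print(derivation_score(rules, lm_logprob=-12.3, weights=weights))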
Learning to Translate

[A sequence of figures, omitted in this transcript.]

Example from Adam Lopez
Unsupervised Word Alignment
• Input: A large bitext of sentences and their translations
• Approach: Using what we know about the problem and corpus statistics, align words of translations automatically
• Exciting fact: Unsupervised methods perform well enough that very few systems use supervised word alignment
Properties of Cross-Lingual Alignments
I declare resumed the session of the european parliament
Declaro reanudado el periodo de sesiones del parlamento europeo
adjourned on Friday 17 December 1999 , ...
interrumpido el Viernes 17 de Diciembre pasado , ...
• Often one-to-one or many-to-one (usually over contiguous phrases)
• Occasionally many-to-many, driven by non-literal translations
Heuristic Estimation

• Two words that co-occur regularly are likely translations
• Normalize by the word frequencies
• Enforcing competition across words (e.g., finding a one-to-one or many-to-one mapping) is a good idea

Dice coefficient:

  $\mathrm{Dice}(e, f) = \frac{2 \cdot c(e, f)}{c(e) + c(f)}$

where c(e, f) is the number of times e and f appear together, c(e) is the count of word e, and c(f) is the count of word f.
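A small sketch of the Dice computation over a toy sentence-aligned bitext (the corpus is the one from the earlier slide):

    from collections import Counter
    from itertools import product

    bitext = [
        ("yo lo haré mañana", "i will do it tomorrow"),
        ("hasta pronto", "see you soon"),
        ("hasta pronto", "see you around"),
    ]

    c_f, c_e, c_ef = Counter(), Counter(), Counter()
    for f_sent, e_sent in bitext:
        f_words, e_words = set(f_sent.split()), set(e_sent.split())
        c_f.update(f_words)                     # count of each foreign word
        c_e.update(e_words)                     # count of each English word
        c_ef.update(product(f_words, e_words))  # sentence-level co-occurrence

    def dice(f, e):
        return 2 * c_ef[(f, e)] / (c_f[f] + c_e[e])

    print(dice("pronto", "see"))   # 1.0: they always co-occur
    print(dice("pronto", "soon"))  # 0.67: "soon" appears in only one pair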
IBM Model 1 (Brown et al., ’93)

• Probabilistic models naturally impose competition
• Assume that foreign words are generated independently
• Assume a hidden alignment vector a encoding which English word generates each foreign word

  I declare resumed the session
  Declaro reanudado el periodo de sesiones    (e.g., a₆ = 5: “sesiones” is generated by “session”)

  $P(f, a \mid e) = \prod_{j=1}^{J} P(a_j = i \mid I, J) \, P(f_j \mid e_i) = \prod_{j=1}^{J} \frac{1}{I+1} \, P(f_j \mid e_i)$
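A direct transcription of the formula, with a hypothetical translation table t for the example pair (the probability values are invented):

    # Toy Model 1 joint probability P(f, a | e); t holds made-up values
    # of the translation table P(f_j | e_i).
    t = {
        ("declaro", "declare"): 0.8,
        ("reanudado", "resumed"): 0.7,
        ("el", "the"): 0.6,
        ("periodo", "session"): 0.3,
        ("de", "session"): 0.2,
        ("sesiones", "session"): 0.4,
    }

    def model1_joint(f, e, a):
        """prod_j 1/(I+1) * t(f_j | e_{a_j}); a gives 1-based English positions."""
        I = len(e)
        p = 1.0
        for j, f_word in enumerate(f):
            p *= (1.0 / (I + 1)) * t.get((f_word.lower(), e[a[j] - 1].lower()), 1e-9)
        return p

    e = "I declare resumed the session".split()
    f = "Declaro reanudado el periodo de sesiones".split()
    a = [2, 3, 4, 5, 5, 5]  # e.g., a_6 = 5: "sesiones" aligned to "session"
    print(model1_joint(f, e, a))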
Estimating Model 1 Parameters

• Free parameters in the model: P(f | e)
• Goal is to maximize the data likelihood
• E-step computes expected alignments (posteriors):

  $P(a_j = i \mid e, f) = \frac{\frac{1}{I+1} P(f_j \mid e_i)}{\sum_{i'} \frac{1}{I+1} P(f_j \mid e_{i'})}$

• M-step computes ratios of expected counts:

  $P(f \mid e) = \frac{\text{sum of posteriors for } f \text{ aligned to } e}{\text{sum of posteriors of any } f' \text{ aligned to } e}$

• Repeat the E- and M-steps many times (like 5 or 10)
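A compact sketch of this EM loop on a toy bitext, with uniform initialization and a NULL source for unaligned words (the constant 1/(I+1) cancels in the posterior, so it is omitted):

    from collections import defaultdict

    bitext = [
        (["yo", "lo", "haré", "mañana"], ["i", "will", "do", "it", "tomorrow"]),
        (["hasta", "pronto"], ["see", "you", "soon"]),
    ]

    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] = P(f | e), uniform start

    for iteration in range(10):
        count = defaultdict(float)   # expected count of (f, e) pairs
        total = defaultdict(float)   # expected count of e
        for fs, es in bitext:
            es_null = [None] + es    # NULL word can generate unaligned words
            for f in fs:
                # E-step: posterior over which English word generated f
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-normalize expected counts into probabilities
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    print(t[("pronto", "soon")])  # grows across iterations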
Aligning Words Under the Model

• Viterbi: for every j, select the i that maximizes $P(a_j = i \mid e, f)$
  (gives competition among explanations)
• Posterior: align every (i, j) that has $P(a_j = i \mid e, f) > \tau$
  (gives control over how many alignment links to posit)
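The two decoding rules, sketched over a posterior table (the 2×2 posteriors here are invented for illustration):

    def viterbi_align(posterior, I, J):
        """For each foreign position j, link the English position with max posterior."""
        return {(max(range(I), key=lambda i: posterior(j, i)), j) for j in range(J)}

    def posterior_align(posterior, I, J, tau=0.5):
        """Keep every (i, j) link whose posterior exceeds the threshold tau."""
        return {(i, j) for j in range(J) for i in range(I) if posterior(j, i) > tau}

    table = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}
    posterior = lambda j, i: table[(j, i)]
    print(viterbi_align(posterior, I=2, J=2))    # {(0, 0), (1, 1)}
    print(posterior_align(posterior, I=2, J=2))  # {(0, 0), (1, 1)}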
Evaluation: Alignment Error Rate

Sure alignments S, possible alignments P (with S ⊆ P), and predicted alignments A. The standard definitions (Och & Ney, ’03):

  Precision = |A ∩ P| / |A|
  Recall    = |A ∩ S| / |S|
  AER       = 1 − ( |A ∩ S| + |A ∩ P| ) / ( |A| + |S| )
Problems with IBM Model 1
• Too many alignments to rare words (garbage collection)
• Alignments jump around all over the sentence
Intersected IBM Model 1

• Train Model 1 in both directions, align with each, then intersect the output (Och and Ney, ’03)
• Result is one-to-one with Viterbi alignments
• Second model filters the first, eliminating mistakes

  Model          P/R      AER
  Model 1 E→F    82/58    30.6
  Model 1 F→E    85/58    28.7
  Model 1 AND    96/46    34.8
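Intersecting the two directional alignments is literally set intersection over links; the link sets below are toy values for illustration:

    # Links are (english_position, foreign_position) pairs from each
    # directional Viterbi alignment (invented examples).
    e2f = {(0, 0), (1, 1), (1, 2), (3, 4)}
    f2e = {(0, 0), (1, 1), (2, 2), (3, 4)}
    intersected = e2f & f2e   # keeps only links both models agree on
    print(intersected)        # {(0, 0), (1, 1), (3, 4)}: higher precision, lower recall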
Joint Training for IBM Model 1

• We can intersect model predictions during training as well
• Modified alignment posterior:

  $P_{e \rightarrow f}(a_j = i \mid e, f) \cdot P_{f \rightarrow e}(a_i = j \mid e, f)$

• Models are forced to agree as they select parameters
• Same precision benefits, but higher recall from more agreement

  Model          P/R      AER
  Model 1 E→F    82/58    30.6
  Model 1 F→E    85/58    28.7
  Model 1 AND    96/46    34.8
  Model 1 INT    93/69    19.5
IBM Model 2

• Words at the beginning of sentences should align to words at the beginning
• Words at the end of sentences should align to words at the end
• Alignment probability depends on position, e.g.

  $P(f, a \mid e) = \prod_{j=1}^{J} P(a_j = i \mid I, J) \cdot P(f_j \mid e_i), \qquad P(a_j = i \mid I, J) \propto \exp\left(-\alpha \left| a_j - j \, \tfrac{I}{J} \right|\right)$
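A sketch of one standard parameterization of this positional prior (the exact form on the slide is only partially legible; alpha is a hypothetical sharpness parameter):

    import math

    # Model 2-style distortion: foreign position j (of J) prefers English
    # positions near j * I / J; larger alpha makes the preference sharper.
    def distortion(i, j, I, J, alpha=1.0):
        scores = [math.exp(-alpha * abs(k - j * I / J)) for k in range(1, I + 1)]
        return math.exp(-alpha * abs(i - j * I / J)) / sum(scores)

    # The middle of a 10-word foreign sentence prefers the middle of a
    # 10-word English sentence:
    print(distortion(i=5, j=5, I=10, J=10))  # highest
    print(distortion(i=9, j=5, I=10, J=10))  # much lower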
Phrase Movement
Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
On Tuesday Nov. 4, earthquakes rocked Japan once again
Absolute position distortion isn’t quite right
IBM Models 1/2

E: Thank you , I shall do so gladly .
   (positions 1 2 3 4 5 6 7 8 9)

A: 1 3 7 6 8 8 8 8 9

F: Gracias , lo haré de muy buen grado .

Model parameters:
  Transitions: P( A2 = 3 | I, J )
  Emissions:   P( F1 = Gracias | E_{A1} = Thank )
The HMM Model

E: Thank you , I shall do so gladly .
   (positions 1 2 3 4 5 6 7 8 9)

A: 1 3 7 6 8 8 8 8 9

F: Gracias , lo haré de muy buen grado .

Model parameters:
  Transitions: P( A2 = 3 | A1 = 1 )
  Emissions:   P( F1 = Gracias | E_{A1} = Thank )
The HMM Model

• Model 2 preferred global monotonicity
• We want local monotonicity (small jumps)
• HMM model (Vogel et al., ’96)
• Re-estimate using the forward-backward algorithm
• Handling nulls requires some care

[Figure: distribution over jump sizes, −2 −1 0 1 2 3]
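A sketch of the HMM's key idea: the transition probability depends only on the jump size between successive aligned English positions, and small jumps are preferred. The exponential form here is an illustrative choice, not the one from the lecture:

    import math

    def transition(i_prev, i_next, I, scale=0.5):
        """P(a_j = i_next | a_{j-1} = i_prev): decays with jump size."""
        jump = i_next - i_prev
        weights = [math.exp(-scale * abs((k + 1) - i_prev)) for k in range(I)]
        return math.exp(-scale * abs(jump)) / sum(weights)

    print(transition(3, 4, I=9))  # a +1 jump: likely
    print(transition(3, 8, I=9))  # a +5 jump: unlikely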
HMM Examples
[Example alignment grids shown as figures.]
AER for HMMs

  Model          AER
  Model 1 INT    19.5
  HMM E→F        11.4
  HMM F→E        10.8
  HMM AND         7.1
  HMM INT         4.7
  GIZA M4 AND     6.9
Aligning Larger Structures

In 1990, we aligned words:
  Yo lo haré mañana
  I will do it tomorrow

  English (E)   P( E | mañana )
  tomorrow      0.7
  morning       0.3

In 1999, we aligned phrases:
  Yo lo haré mañana → I will do it tomorrow

  English (E)   P( E | lo haré )
  will do it    0.8
  will do so    0.2

In 2004, we aligned trees:
  Yo lo haré mañana → I will do it tomorrow

  P( 〈VP: lo haré〉 ; 〈VP: will do it〉 ) = 0.8
  [Tree fragments with NP, VP, MD, VB, PRN nodes shown as figures.]
Aligning Structural Components

In 2009, we still align words; fragment-level correspondence is derived from word alignments.

1. Align words with a probabilistic model:
     Yo lo haré mañana
     I will do it tomorrow
2. Infer presence of larger structures from this alignment
3. Translate with the larger structures
Estimating Rule Parameters from Words

Word-aligned sentence pair:
  Thank you , I will do it gladly .
  Gracias , lo haré de muy buen grado .

Grammar rules:
  〈 haré ; will do 〉
  〈 lo X de ... grado ; X it gladly 〉

Model parameters, by relative frequency counts:

  P(es | en) = c( lo X de muy buen grado ; X it gladly ) / c( * ; X it gladly )
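A sketch of the relative-frequency estimate; the rule counts, including the competing source side "lo X con gusto", are invented:

    from collections import Counter

    # Extracted rules as (source_side, target_side) pairs with counts.
    rule_counts = Counter({
        ("lo X de muy buen grado", "X it gladly"): 3,
        ("lo X con gusto", "X it gladly"): 1,
    })

    def p_src_given_tgt(src, tgt):
        """P(es | en) = c(src ; tgt) / c(* ; tgt)."""
        total = sum(c for (s, t), c in rule_counts.items() if t == tgt)
        return rule_counts[(src, tgt)] / total

    print(p_src_given_tgt("lo X de muy buen grado", "X it gladly"))  # 3/4 = 0.75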
Learning Grammars for Translation

Word-aligned sentence pair, with an English parse (S, VP, NP, MD, VB, PRP, ADV nodes) over the target side:
  Thank you , I will do it gladly .
  Gracias , lo haré de muy buen grado .

Grammar rule extracted:
  VP → 〈 lo haré ADV ; will do it ADV 〉
What Happens in Practice

Sentence-aligned parallel corpus:
  ... appelez un chat un chat
  ... call a spade a spade
  ...

Machine translation system (model of translation):
  Je vois un chat (“I see a cat”) → I see a spade
What Happens in Practice

  Thank you , I shall do so gladly .
  Gracias , lo haré de muy buen grado .

Gloss of the Spanish: Thanks , that do [first; future] of very good degree .

A real word alignment (GIZA++ Model 4 with grow-diag-final combination) and a sampled phrase alignment (our system) [shown as alignment grids].
Example Machine Translation Pipeline

A Machine Translation Pipeline: Phrase Model Training (Moses)
[Pipeline diagram shown as a figure.]

Example from CMU INCA System (Vogel et al.)
Example Syntax-Based Translation

(New Arabic v5.1 base system, sentence 211; output generated by Jens-S. Vöckler, 2008-04-10)

Source (transliterated):
  tac-lang: urfD albaz aladla’ baá tSryHat fur uSulh alá almqaT‘e .
  bckwltr:  wrfD AlbAz AlAdlA’ bAY tSryHAt fwr wSwlh AlY AlmqATEp .

References:
  Tune.nw.0: al @-@ baz declined to make any statements upon his arrival in the province .
  Tune.nw.1: al @-@ baz refused to give any statements on arriving at al @-@ muqataah .
  Tune.nw.2: immediately upon his arrival in the area , al @-@ baz declined to give any statements .
  Tune.nw.3: al @-@ baz refused to make any statement upon his arrival at the moqata’ah .

1-best: al @-@ baz declined to give any statements upon his arrival in the province .

[1-best PoS tree omitted: an S spanning NP-C (“al @-@ baz”), a VP (“declined to give any statements upon his arrival in the province”), and “.”]

1-best model score (dot product of feature values and weights):

  feature             weight    value    product
  derivation-size       0.41     8         3.30
  glue-rule             3.89     2         7.78
  green                -0.08     0         0
  gt_prob               0.40    36.18     14.43
  identity             -9.97     0         0
  is_lexicalized       -0.65     6        -3.91
  lex_pef               1.02     5.47      5.60
  lex_pfe               0.31     4.44      1.39
  lm1                   1       22.76     22.76
  lm1-unk              30.08     0         0
  lm2                   0.74    26.66     19.79
  lm2-unk             -39.18     0         0
  missingWord          -1.29     0         0
  model1inv             1.02    10.60     10.81
  model1nrm             1.35    11.29     15.22
  nonmonotone           4.17     0         0
  olive                 1.95     0         0
  psm1n                 0.50    24.65     12.30
  text-length          -3.87    15       -58.05
  trivial_cond_prob     0.41     3.34      1.38
  unk-rule             19.28     0         0

  reported total cost: 52.82 = v · w
Automatic Translation Evaluation

• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants; most common is BLEU

Reference:  Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement

Modified n-gram precisions of the hypothesis:
  1-grams: 7/8    2-grams: 5/7    3-grams: 3/6    4-grams: 2/5

Systems are trained to optimize this metric
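A sketch of the modified (clipped) n-gram precisions computed above; full BLEU also combines these with a brevity penalty:

    from collections import Counter

    def ngram_precision(hyp, ref, n):
        """Clipped n-gram matches against one reference, and the hyp total."""
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        return clipped, sum(hyp_ngrams.values())

    ref = "Al - baz declined to make any statement".split()
    hyp = "Al - baz declined to give any statement".split()
    for n in range(1, 5):
        match, total = ngram_precision(hyp, ref, n)
        print(f"{n}-grams: {match}/{total}")
    # 1-grams: 7/8, 2-grams: 5/7, 3-grams: 3/6, 4-grams: 2/5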
Integrating MT into Other Systems
• Speech-to-speech translation
• Cross-lingual information retrieval
• Translated optical character recognition
• Mobile device integration
• Text-oriented web services of all kinds