sharon-cunningham
1
COMP791A: Statistical Language Processing
Machine Translation and Statistical Alignment
Chap. 13
2
Contents
1- Machine Translation
2- Statistical Machine Translation
3- Text Alignment
   - Length-based methods
   - Offset alignment by signal processing techniques
   - Lexical methods of sentence alignment
4- Word Alignment
3
Goal of MT
Text1 in source language --> Text2 in target language
Where:
   - meaning(text2) == meaning(text1), i.e. faithful
   - text2 is perfectly grammatical and idiomatic, i.e. fluent
MT is very hard:
   - translation programs available today do not perform very well
4
Little history of MT
1950’s
   - inspired by the code-breakers of WWII
   - Russian is just an encoded version of English
   - “We’ll have this up and running in a few years, it’ll be great, just give us lots of money”
1964-1966
   - ALPAC (Automatic Language Processing Advisory Committee) formed in 1964; its report was published in 1966
   - “…we do not have useful machine translation…”
   - “…there is no immediate or predictable prospect of useful machine translation…”
   - Nearly sank funding for all of AI.
1990’s
   - DARPA funds research in MT
   - 2 “competitive” approaches
      - Statistical MT (IBM at TJ Watson Research Center)
      - Rule-based MT (CMU, ISI, NMSU)
   - Regular competitions
   - And the winner was… Systran!
5
Difficulties in MT
   - Different word order (SVO vs VSO vs SOV languages)
      “the black cat” (DT ADJ N) --> “le chat noir” (DT N ADJ)
   - Many-to-many mapping between words in different languages
      “John knows Bill.” --> “John connaît Bill.”
      “John knows Bill will be late.” --> “John sait que Bill sera en retard.”
   - Overlapping of word senses
      [Figure: overlapping senses of English leg / foot / paw and French patte / étape / jambe / pied; sense labels: human, journey, chair, animal, bird]
6
The Transfer Metaphor
analysis --> transfer --> generation
Each arrow can be implemented with rule-based methods or probabilistically

[Figure: levels of transfer, from deepest to shallowest]
   - Interlingua: attraction(NamedJohn, NamedMary, high)   (knowledge transfer)
   - English Semantics: loves(John, Mary) <--> French Semantics: aime(Jean, Marie)   (semantic transfer)
   - English Syntax: S(NP(John) VP(loves, NP(Mary))) <--> French Syntax: S(NP(Jean) VP(aime, NP(Marie)))   (syntactic transfer)
   - English Words: John loves Mary <--> French Words: Jean aime Marie   (word transfer, memory-based translation)
7
Syntactic transfer
Solves some problems…
   - Word order
   - Some cases of lexical choice
      Ex: Dictionary of analysis
         know: verb ; transitive ; subj: human ; obj: NP || Sentence
      Dictionary of transfer
         know + obj [NP] --> connaître
         know + obj [sentence] --> savoir
But syntax is not enough…
   - No one-to-one correspondence between syntactic structures in different languages (syntactic mismatch)
8
2- Statistical MT: Being faithful & fluent
Often impossible to have a true translation; one that is:
   - Faithful to the source language, and
   - Fluent in the target language
Ex:
   Japanese: “fukaku hansei shite orimasu”
   Fluent translation: “we apologize”
   Faithful translation: “we are deeply reflecting (on our past behaviour, and what we did wrong, and how to avoid the problem next time)”
So need to compromise between faithfulness & fluency
Statistical MT tries to maximise some function that represents the importance of faithfulness and fluency
   Best-translation T* = argmax_T fluency(T) x faithfulness(T, S)
9
The Noisy Channel Model
Statistical MT is based on the noisy channel model
   - Developed by Shannon to model communication (ex. over a phone line)
Noisy channel model in SMT (ex. en|fr):
   - Assume that the true text is in English
   - But when it was transmitted over the noisy channel, it somehow got corrupted and came out in French
   - i.e. the noisy channel has deformed/corrupted the original English input into French
   - So really… French is a form of noisy English
The task is to recover the original English sentence (or to decode the French into English)
10
Fundamental Equation for SMT
Assume we are translating from FR-->EN (en|fr)
Intuitively we saw that:
   e* = argmax_e fluency(e) x faithfulness(e, f)
More formally:
   $$e^* = \arg\max_e P(e \mid f)$$
By Bayes theorem:
   $$P(e \mid f) = \frac{P(f \mid e) \times P(e)}{P(f)}$$
But P(f) is the same for all e, so:
   $$e^* = \arg\max_e P(f \mid e) \times P(e)$$
May seem circular… why not just P(e|f)?
   - P(f|e) x P(e) allows us to have a sloppy translation model
   - Hopefully P(e) will correct the mistakes of the translation model
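As a minimal sketch of the decision rule e* = argmax_e P(f|e) x P(e), the Python below scores a handful of candidate English sentences for one fixed French sentence. The candidate sentences and all probability values are invented for illustration; `best_translation` is a hypothetical helper, not part of any library.

```python
import math

def best_translation(candidates, tm_prob, lm_prob):
    """Return the candidate e that maximises P(f|e) * P(e) (computed in log space)."""
    def score(e):
        return math.log(tm_prob[e]) + math.log(lm_prob[e])
    return max(candidates, key=score)

# Hypothetical values for one fixed French sentence f (illustrative only).
tm_prob = {"home I eat": 0.30, "I eat at home": 0.20, "I eat house": 0.25}     # P(f|e)
lm_prob = {"home I eat": 0.001, "I eat at home": 0.05, "I eat house": 0.002}   # P(e)

best = best_translation(list(tm_prob), tm_prob, lm_prob)
print(best)
```

With these made-up numbers the translation model alone would prefer "home I eat", but the language model pulls the product toward the fluent candidate.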
11
Example of SMT (en|jp)
Source sentence (Japanese): “2000men taio”

Translation model (candidate translations of each source word):

                   2000men        taio
   More probable   2000           correspondence
                   Year 2000      corresponding
                   Y2K            equivalent
                   200 years      tackle
                   200 year       deal with
   Less probable   …              …

   - From the Translation model: “2000 correspondence” is the best translation
   - But the Language model: “2000 correspondence” is not frequent at all
   - so overall: “dealing with Y2K” is the best translation! (maximizes their product)
12
We need 3 things (for en|fr):
1. A Language Model of English: P(e)
   - Measures fluency
   - Probability of an English sentence
   - We can do this with an n-gram or PCFG
   ~ Provides the right word ordering and collocations
   ~ Provides a set of fluent sentences to test for potential translation
2. A Translation Model: P(f|e)
   - Measures faithfulness
   - Probability of a (French, English) pair
   - We can do this with text (word) alignment of parallel corpora
   ~ Provides the right bag of words
   ~ Tests if a given fluent sentence is a translation
3. A Decoder: argmax
   - An effective and efficient search technique to find e*
   - Usually we use a heuristic search
13
We need a Language Model P(e)
   - seen in class…
15
We need a translation model P(f|e)
ex: IBM model 3
   - Probability of an FR sentence being a translation of an EN sentence
     ~ the product of the probabilities that each FR word is the translation of some EN word
   - unigram translation model:
     $$P(f \mid e) = \prod_{i=1}^{n} P(f_i \mid e_i)$$
   - ex: P(le chien est mort | the dog is dead) = P(le|the) x P(chien|dog) x P(est|is) x P(mort|dead)
   - So we need to know, for each FR word, the probability of it mapping to each possible EN word
   - But where do we get these probabilities?
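The unigram translation model can be sketched in a few lines of Python. This assumes the simplest case, where the i-th French word is paired with the i-th English word; the probability table `t` holds invented values, and `unigram_translation_prob` is a hypothetical name.

```python
from math import prod

def unigram_translation_prob(french, english, t):
    """P(f|e) as the product of P(f_i | e_i), assuming a word-for-word pairing."""
    if len(french) != len(english):
        raise ValueError("this simplified model needs sentences of equal length")
    # Unseen word pairs get probability 0 in this sketch (no smoothing).
    return prod(t.get((f, e), 0.0) for f, e in zip(french, english))

# Hypothetical word-translation probabilities P(french_word | english_word).
t = {("le", "the"): 0.5, ("chien", "dog"): 0.8, ("est", "is"): 0.9, ("mort", "dead"): 0.6}

p = unigram_translation_prob("le chien est mort".split(), "the dog is dead".split(), t)
print(p)
```

In practice the table `t` would be estimated from word-aligned parallel text rather than written by hand.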
16
Parallel Texts
Parallel texts or bitexts
   - Same content is available in several languages
   - Official documents of countries with multiple official languages -> literal, consistent
Alignment
   - Paragraph to paragraph, sentence to sentence, word to word
[Figure: units of Language1 (Section i, Paragraph i, Sentence i, Phrase i, Word i … Word j) aligned with units of Language2 (Section k, Paragraph k, Sentence k, Phrase k, Word k … Word m)]
17
Problem 1: Fertility
word choice is not 1-to-1
   ex: Je mange à la maison. --> I eat home.
solution:
   a word with fertility n gets copied n times, and for each of these n times, gets translated independently
   ex: à la maison --> home
      à --> fertility 0
      la --> fertility 0
      maison --> fertility 1
      use unigram translation model to translate maison --> home
   ex: home --> à la maison
      home --> fertility 3
      home home home --> à la maison
   note: the translation model will give the same probability to:
      home home home --> maison à la…
      it is up to the language model to select the correct word order
18
Problem 2: Word order
word order is not the same in both languages
   ex: le chien brun --> the brown dog
solution:
   assign an offset to move words from their original position to their final position
   ex: chien brun --> brown dog
      brown --> offset +1
      dog --> offset -1
Making the offset dependent on the words would be too costly… so in IBM model 3, the offset only depends on:
   - the position of the word within the sentence!!!
   - the length of the sentences in both languages
   P(offset=o | Position = p, EngLen = m, FrLen = n)
   ex: brown dog
      offset of brown = P(offset | 1,2,2)
      ex: P(+1 | 1,2,2) = .3   P(0 | 1,2,2) = .6   P(-1 | 1,2,2) = .1
19
An Example (en|fr)

Given the English:        The  brown  dog  did  not  go  home
Fertility Model:          1    1      1    1    2    1   3
Transformed English:      The  brown  dog    did  not  not  go    home  home  home
Translation Model:        Le   brun   chien  est  n'   pas  allé  à     la    maison
Offset Model:             0    +1     -1     +1   -1   0    0     0     0     0
A possible Translation:   Le   chien  brun   n'   est  pas  allé  à     la    maison

Then use Language Model P(e) to evaluate fluency of all possible translations:
   $$e^* = \arg\max_e P(f \mid e) \times P(e)$$
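The fertility and offset steps of this example can be replayed mechanically. The sketch below is only an illustration of the bookkeeping: the word-by-word translations are copied from the example rather than sampled from a model, and `apply_fertility` and `apply_offsets` are hypothetical helper names.

```python
def apply_fertility(words, fertilities):
    """Copy each English word as many times as its fertility says."""
    out = []
    for word, n in zip(words, fertilities):
        out.extend([word] * n)
    return out

def apply_offsets(words, offsets):
    """Move the word at position i to position i + offsets[i]."""
    result = [None] * len(words)
    for i, (word, offset) in enumerate(zip(words, offsets)):
        result[i + offset] = word
    return result

english = "The brown dog did not go home".split()
fertilities = [1, 1, 1, 1, 2, 1, 3]
copied = apply_fertility(english, fertilities)
print(" ".join(copied))

# Word-by-word translations of the copied words, taken from the example.
french = ["Le", "brun", "chien", "est", "n'", "pas", "allé", "à", "la", "maison"]
offsets = [0, 1, -1, 1, -1, 0, 0, 0, 0, 0]
reordered = apply_offsets(french, offsets)
print(" ".join(reordered))
```

A real model would assign a probability to each fertility, translation, and offset choice and multiply them to obtain P(f|e).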
20
Summary: IBM-3 for (en|fr)
to find P(e|f), we need:
1. Language model for English P(e): P(wordE_i | wordE_i-1)
2. Translation model P(f|e):
   a. Translation model per se: P(wordF | wordE)
   b. Fertility model of English: P(Fertility=n | wordE)
   c. Offset model for French: P(Offset=o | pos, lenF, lenE)
22
We need a decoder
   - we can compute P(e|f) for any given pair of (en,fr) sentences… that's nice
   - but: what we really want is to find the English sentence that maximises P(e|f) given a French sentence
   - assume a vocabulary of 100,000 words in English
      - there are 10^(5n) possible English sentences of length n…
      - and many alignments of each one, and many possible offsets
   - … we need a search algorithm (ex. A*)
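A quick check of the size of the search space, assuming (as above) a vocabulary of 100,000 = 10^5 words and counting every word sequence of length n as a candidate sentence:

```python
vocabulary_size = 100_000  # 10**5 English words, as assumed above

def sentence_count(n):
    """Number of word sequences of length n over the vocabulary."""
    return vocabulary_size ** n

for n in (1, 5, 10):
    print(n, sentence_count(n))
```

Even for a 10-word sentence this gives 10^50 candidates, before alignments and offsets are considered, which is why exhaustive enumeration is not an option.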
23
3- Text alignment
   - used to find P(f|e)
   - not a trivial task
Problems:
   - not always one sentence to one sentence
      - translators do not always translate one sentence in the input into one sentence in the output
      - although true in 90% of the cases.
   - crossing dependencies
      - the order of sentences is changed in the translation.
   - Large pieces of material can disappear
24
The Rosetta Stone
   - Three scripts: Egyptian hieroglyphs, Egyptian Demotic, Greek
   - carved in 196 BC, found in 1799, decoded in 1822
25
A modern Rosetta Stone: TransSearch
26
Example
French:
   Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment.
English:
   According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rate.
Note:
   - Re-ordering of phrases
   - Disappearance of phrases (they are implied in the French version)
27
Aligning sentence and paragraph
   - A bead is an n:m grouping of sentences
   - S, T: text in two languages
      S = (s1, s2, … , si)
      T = (t1, t2, … , tj)
   - Each sentence can occur in only one bead
   - Assume no crossing (but occurs in reality)
   - Most common (90%): 1:1
   - But also: 0:1, 1:0, 2:1, 1:2, 2:2, 2:3, 3:2 …
[Figure: sentences s1 … si of S and t1 … tj of T grouped into beads b1, b2, b3, b4, b5, … bk]
28
Example
   - The two French sentences (“Quant aux eaux minérales … plus d’adeptes.” and “En effet notre sondage … de cola notamment.”) and the two English sentences (“According to our survey … of these products.” and “Cola drink manufacturers … growth rate.”) of the previous example form a 2:2 alignment
29
Approaches to Text Alignment
   - Length-Based Methods
      - short sentences will be translated with short sentences
      - long sentences will be translated with long sentences
   - Offset Alignment by Signal Processing Techniques
      - do not attempt to align beads of sentences
      - just try to align position offsets in the two parallel texts
   - Lexical Methods
      - use lexical information to align beads of sentences
32
Length-based method
   - Rationale: Short sentence -> short sentence / Long sentence -> long sentence
   - Length: nb of words or nb of characters
   - Advantages: Efficient (for similar languages) and fast!
Gale and Church (1993):
   - Find alignment A with highest probability given the two parallel texts S and T.
   - Union Bank of Switzerland Corpus (English, French, German)
   - Let D(i,j) be the lowest cost alignment (the distance) between sentences s1,…,si and t1,…,tj

$$
D(i,j) = \min
\begin{cases}
D(i,\ j-1) + \mathrm{cost}(0{:}1\ \mathrm{align}\ \emptyset,\ t_j) \\
D(i-1,\ j) + \mathrm{cost}(1{:}0\ \mathrm{align}\ s_i,\ \emptyset) \\
D(i-1,\ j-1) + \mathrm{cost}(1{:}1\ \mathrm{align}\ s_i,\ t_j) \\
D(i-1,\ j-2) + \mathrm{cost}(1{:}2\ \mathrm{align}\ s_i,\ [t_{j-1}, t_j]) \\
D(i-2,\ j-1) + \mathrm{cost}(2{:}1\ \mathrm{align}\ [s_{i-1}, s_i],\ t_j) \\
D(i-2,\ j-2) + \mathrm{cost}(2{:}2\ \mathrm{align}\ [s_{i-1}, s_i],\ [t_{j-1}, t_j])
\end{cases}
\qquad D(0,0) = 0
$$
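The recurrence for D(i,j) can be sketched as a small dynamic program over sentence lengths. This is an illustration only: the cost function used here (absolute length difference plus a fixed penalty per bead type) and the penalty values in `BEADS` are assumptions standing in for the probabilistic cost of Gale and Church, and all names are hypothetical.

```python
def bead_cost(src_len, tgt_len, penalty):
    """Simplified stand-in cost: length difference plus a penalty per bead type."""
    return abs(src_len - tgt_len) + penalty

# Bead types (sentences from S, sentences from T) with hypothetical penalties.
BEADS = {(1, 1): 0, (1, 0): 10, (0, 1): 10, (2, 1): 3, (1, 2): 3, (2, 2): 5}

def align(src_lens, tgt_lens):
    """Return (minimum cost, list of beads) for two lists of sentence lengths."""
    I, J = len(src_lens), len(tgt_lens)
    INF = float("inf")
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0
    for i in range(I + 1):
        for j in range(J + 1):
            for (di, dj), penalty in BEADS.items():
                if i < di or j < dj or D[i - di][j - dj] == INF:
                    continue
                cost = D[i - di][j - dj] + bead_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), penalty)
                if cost < D[i][j]:
                    D[i][j], back[i][j] = cost, (di, dj)
    # Follow the back-pointers from (I, J) to (0, 0) to recover the beads.
    beads, i, j = [], I, J
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return D[I][J], beads[::-1]

# Sentence lengths in characters (invented): s1+s2 together match t1.
cost, beads = align([20, 22, 60, 35], [43, 58, 36])
print(cost, beads)
```

Sentences are referred to by 0-based index in the returned beads, so `([0, 1], [0])` means that s1 and s2 are grouped with t1.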
33
Example
   L1: s1, s2, s3, s4        L2: t1, t2, t3
   alignment 1:
      cost(align(s1, t1)) + cost(align(s2, t2)) + cost(align(s3, ∅)) + cost(align(s4, t3))
   alignment 2:
      cost(align([s1, s2], t1)) + cost(align(s3, t2)) + cost(align(s4, t3))

Cost of an alignment
   - Mean length ratio of sentences (nb of characters) in bead is ~1
      - German/English = 1.1
      - French/English = 1.06
   - Calculate the difference (distance) between lengths of sentences in the beads
   - So as to minimize this distance
   - i.e. try to align beads so that the lengths of the sentences from the 2 languages in each bead are as similar as possible.
34
Results
Gale and Church (1993)
   - use Dynamic Programming to efficiently consider all possible alignments and find the minimum cost alignment
   - method performs well (at least on related languages)
      - 4% error rate
      - only 2% error rate on 1:1 alignments
      - higher error rate on more difficult alignments
   - Assumes paragraph alignment
      - Without a paragraph alignment, error rate triples
36
Offset alignment
Length-based methods work well on clean texts but may break down in real-world situations
   - Ex: noisy text (OCR output with no clear sentence or paragraph boundaries,…)
Church (1993)
   - Goal: Showing roughly what offset in one text aligns with what offset in the other.
   - uses cognates (words that are similar across languages)
      - Ex: proper names, numbers, common ancestors…
      - Ex: Smith, 848-3000, superior/supérieur
   - But: uses cognates at the level of character sequences, NOT at the word level
   - Build a dot-plot
37
Sample Dot Plot
   - the source and translated text are concatenated
   - a square graph is made with this text on both axes
   - a dot is placed at (x,y) when there is a match [Unit = character 4-grams]
[Figure: dot plot with the concatenated text S T on both axes. Main diagonal: perfect match of a text with itself. Off-diagonal quadrants: match of a text with its translation (cognates).]
The small diagonals provide an alignment in terms of offsets in the two texts
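A minimal sketch of the dot-plot construction, assuming plain exact matching of character 4-grams with no weighting or filtering of frequent n-grams (a real system would need both to keep the plot readable). The two short strings are invented, and `char_ngrams` and `dot_plot` are hypothetical names.

```python
def char_ngrams(text, n=4):
    """Map each character n-gram to the list of offsets where it starts."""
    positions = {}
    for i in range(len(text) - n + 1):
        positions.setdefault(text[i:i + n], []).append(i)
    return positions

def dot_plot(source, target, n=4):
    """Return the set of (x, y) offsets in the concatenated text whose n-grams match."""
    text = source + target
    dots = set()
    for offsets in char_ngrams(text, n).values():
        for x in offsets:
            for y in offsets:
                dots.add((x, y))
    return dots

source = "Mr Smith calls 848-3000."
target = "M. Smith appelle le 848-3000."
dots = dot_plot(source, target)

# Dots that pair an offset in the source with an offset in the target,
# with the target offset counted from the start of the target text.
cross = sorted((x, y - len(source)) for x, y in dots if x < len(source) <= y)
print(cross)
```

The pairs in `cross` come from the cognate-like strings "Smith" and "848-3000", and consecutive pairs line up along short diagonals, which is the signal used to relate offsets in the two texts.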
39
Lexical methods
Align beads of sentences using lexical information
Kay and Röscheisen (1993)
   - Idea:
      - Use word alignment to help determine sentence alignment
      - Then use sentence alignment to refine word alignment,…
   - Method:
      1. Begin with start and end of text as anchors
      2. Form an envelope of all possible alignments (no crossing of anchors) where:
         - Possible alignments must be at a certain distance away from the anchors
         - The distance increases as we get further away from the anchors
      3. Choose pairs of words that co-occur in these potential alignments
      4. Pick the best sentences involved in step 3 (having the most lexical correspondences) and use them as new anchors
      5. Repeat steps 2-4
40
Example
[Figure: grid with sentences of language 1 (1-16) on the rows and sentences of language 2 (1-19) on the columns; dots mark the sentence pairs inside the envelope of possible alignments]
41
Example (cont’d)
[Figure: same grid, sentences of language 1 (1-16) by sentences of language 2 (1-19); dots mark the remaining possible sentence pairs]
42
Example (cont’d)
[Figure: same grid, sentences of language 1 (1-16) by sentences of language 2 (1-19); fewer dots remain]
43
Example (cont’d)
[Figure: same grid, sentences of language 1 (1-16) by sentences of language 2 (1-19); same dots as on the previous slide]
44
Example (cont’d)
[Figure: same grid, sentences of language 1 (1-16) by sentences of language 2 (1-19); the set of dots has narrowed further]
45
Word Alignment
Usually done in two steps:
   1. Do sentence/text alignment
   2. Select words from aligned pairs and use frequency or chi-square to see if they co-occur more frequently than expected by chance
Ex:
   English: In the beginning God created the heavens and the earth.
   Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât.

   English: God called the expanse heaven.
   Vietnamese: Ðúc Chúa Tròi dat tên khoang không la tròi.

   English: … you are this day like the stars of heaven in number.
   Vietnamese: … các nguoi dông nhu sao trên tròi.
Can also use an existing bilingual dictionary to start the word-alignment
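Step 2 can be sketched with a chi-square test on a 2x2 contingency table of aligned sentence pairs. The tiny English-French corpus below is invented for illustration, the tokenisation is a plain whitespace split, and `chi_square` and `cooccurrence_score` are hypothetical helper names.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table:
    a = pairs containing both words, b = only the source word,
    c = only the target word, d = neither word."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def cooccurrence_score(pairs, src_word, tgt_word):
    """Count co-occurrences over aligned sentence pairs and score them."""
    a = b = c = d = 0
    for src, tgt in pairs:
        in_src = src_word in src.split()
        in_tgt = tgt_word in tgt.split()
        if in_src and in_tgt:
            a += 1
        elif in_src:
            b += 1
        elif in_tgt:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)

# Invented aligned sentence pairs (English, French).
pairs = [
    ("the dog is dead", "le chien est mort"),
    ("the brown dog", "le chien brun"),
    ("the cat is black", "le chat est noir"),
    ("a black cat", "un chat noir"),
]

print(cooccurrence_score(pairs, "dog", "chien"))
print(cooccurrence_score(pairs, "dog", "le"))
```

On this toy corpus "dog" scores higher with "chien" than with the frequent word "le", which is the kind of contrast the test is meant to expose; a corpus this small is far too little data for the statistic to be reliable.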