Session #2
• Introductions
  – Syllabus, requirements, etc.
• Language in 10 minutes
• Lecture
  – MT Approaches
  – MT Evaluation
• MT Lab
Road Map
• Multilingual Challenges for MT
• MT Approaches
• MT Evaluation
MT Approaches: MT Pyramid

[Figure: the MT Pyramid. Analysis climbs the source side from source word through source syntax to source meaning; generation descends the target side from target meaning through target syntax to target word. Gisting maps directly between source and target words at the base.]
MT Approaches: Gisting Example

Source: Sobre la base de dichas experiencias se estableció en 1988 una metodología.
Gisting output: Envelope her basis out speak experiences them settle at 1988 one methodology.
Human translation: On the basis of these experiences, a methodology was arrived at in 1988.
MT Approaches: MT Pyramid

[Figure: the MT Pyramid again, now marking two levels: Gisting at the word level and Transfer at the syntax level.]
MT Approaches: Transfer Example

• Transfer Lexicon: map SL structure to TL structure

[Figure: a transfer rule pairing the Spanish dependency tree headed by "poner" (subj X, obj "mantequilla", modifier "en Y") with the English tree headed by "butter" (subj X, obj Y), i.e. "X puso mantequilla en Y" maps to "X buttered Y".]
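Conceptually, such a rule pairs a source structure containing fixed lexical material with a target structure over the same variables. Here is a minimal sketch in Python; the rule format and the apply_rule() helper are invented for illustration, not taken from the lecture:

```python
# Toy structural transfer rule: Spanish "X puso mantequilla en Y"
# maps to English "X buttered Y". The dictionary format and the
# apply_rule() helper are hypothetical, purely for illustration.

RULE = {
    "src": {"head": "poner", "obj": "mantequilla", "prep": "en"},
    "tgt_head": "butter",
}

def apply_rule(subj, prep_obj):
    """Build the English dependency structure for a matched Spanish tree."""
    return {"head": RULE["tgt_head"], "subj": subj, "obj": prep_obj}

print(apply_rule("Maria", "the bread"))
# {'head': 'butter', 'subj': 'Maria', 'obj': 'the bread'}
```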
MT Approaches: MT Pyramid

[Figure: the MT Pyramid with three levels marked: Gisting (words), Transfer (syntax), and Interlingua (meaning) at the apex.]
MT Approaches: Interlingua Example
Lexical Conceptual Structure (Dorr, 1993)
MT Approaches: MT Pyramid

[Figure: the MT Pyramid annotated with the resources each level requires: dictionaries and parallel corpora at the word level, transfer lexicons at the syntax level, interlingual lexicons at the meaning level.]
MT Approaches: Statistical vs. Rule-based

[Figure: the MT Pyramid, contrasting where statistical and rule-based approaches operate.]
Statistical MT: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
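For reference, the decision rule of the noisy channel model can be written out as follows (this is the standard formulation; the slide itself shows only a diagram):

```latex
% Noisy channel decision rule: pick the English sentence e that is
% most probable given the observed foreign sentence f. P(f) is
% constant over e and drops out of the argmax.
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        = \arg\max_{e} \underbrace{P(e)}_{\text{language model}}\;
          \underbrace{P(f \mid e)}_{\text{translation model}}
```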
Statistical MT: Automatic Word Alignment

• GIZA++
  – A statistical machine translation toolkit used to train word alignments
  – Uses Expectation-Maximization with various constraints to bootstrap alignments

Slide based on Kevin Knight's http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
[Figure: word-alignment grid pairing the English words "Mary did not slap the green witch" with the Spanish sentence "Maria no dio una bofetada a la bruja verde".]
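To make the EM bootstrapping concrete, here is a minimal IBM Model 1 trainer in Python. This is a toy sketch of the idea behind GIZA++, not its actual implementation (the real toolkit layers the HMM model and IBM Models 2 through 5 on top of this), and the two-sentence corpus is invented for illustration:

```python
from collections import defaultdict

# Minimal IBM Model 1 EM sketch (uniform initialization, no NULL word).
corpus = [
    ("maria no dio una bofetada".split(), "mary did not slap".split()),
    ("la bruja verde".split(), "the green witch".split()),
]

# t[f][e] approximates P(f | e), started uniform over co-occurring words
t = defaultdict(lambda: defaultdict(lambda: 1.0))

for _ in range(10):                            # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:
            z = sum(t[f][e] for e in e_sent)   # E-step: normalize over e
            for e in e_sent:
                c = t[f][e] / z                # expected fractional count
                count[e][f] += c
                total[e] += c
    for e in count:                            # M-step: re-estimate t
        for f in count[e]:
            t[f][e] = count[e][f] / total[e]

# Most likely English word generating each Spanish word
for f_sent, e_sent in corpus:
    for f in f_sent:
        print(f, "->", max(e_sent, key=lambda e: t[f][e]))
```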
Giza Plateau and MT Pyramid

• EGYPT toolkit
• Giza, Giza++
• Cairo visualizer
• Pharaoh
• Hiero
• Moses
Statistical MT: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
Phrase-Based Statistical MT

• Foreign input segmented into phrases
  – A "phrase" is any sequence of words
• Each phrase is probabilistically translated into English
  – P(to the conference | zur Konferenz)
  – P(into the meeting | zur Konferenz)
• Phrase distortion (reordering)

This is state-of-the-art!

German: Morgen fliege ich nach Kanada zur Konferenz
English: Tomorrow I will fly to the conference in Canada
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
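The generative story behind such a system is commonly written as below. This is the standard phrase-based formulation (as in Koehn et al. 2003), not reproduced from the slide itself:

```latex
% Phrase-based translation model: the foreign sentence is segmented
% into I phrases, each phrase is translated with probability phi,
% and reordering is scored by a distortion model d based on the gap
% between consecutively translated phrases.
P(\bar{f}_1^I \mid \bar{e}_1^I)
  = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\;
    d(\mathrm{start}_i - \mathrm{end}_{i-1} - 1)
```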
Word Alignment Induced Phrases

[Figure: the word-alignment grid pairing "Maria no dio una bofetada a la bruja verde" with "Mary did not slap the green witch", repeated on successive slides as the extracted phrase set grows.]

Phrase pairs consistent with the alignment, from smallest to largest:

(Maria, Mary) (no, did not) (dio una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dio una bofetada a, slap the)
(Maria no, Mary did not) (no dio una bofetada, did not slap) (dio una bofetada a la, slap the)
(bruja verde, green witch) (Maria no dio una bofetada, Mary did not slap)
(a la bruja verde, the green witch) …
(Maria no dio una bofetada a la bruja verde, Mary did not slap the green witch)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
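A minimal sketch of the consistency-based extraction that produces such phrase sets (this is the standard algorithm, not Knight's code, and the alignment links below are an assumed hand alignment for this sentence pair, so the exact set extracted depends on them):

```python
# Consistency-based phrase extraction: a phrase pair is kept iff at
# least one alignment link falls inside its box and no link connects
# a word inside the box to a word outside it. Unaligned words (here
# "a") let boxes grow, which is why both (la, the) and (a la, the)
# come out.

src = "Maria no dio una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()
A = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),  # Maria, no, dio/una/bofetada
     (6, 4), (7, 6), (8, 5)}                          # la, bruja, verde

def consistent(i1, i2, j1, j2):
    touching = [(i, j) for (i, j) in A if i1 <= i <= i2 or j1 <= j <= j2]
    return bool(touching) and all(
        i1 <= i <= i2 and j1 <= j <= j2 for (i, j) in touching)

pairs = set()
for i1 in range(len(src)):
    for i2 in range(i1, len(src)):
        for j1 in range(len(tgt)):
            for j2 in range(j1, len(tgt)):
                if consistent(i1, i2, j1, j2):
                    pairs.add((" ".join(src[i1:i2 + 1]),
                               " ".join(tgt[j1:j2 + 1])))

for f, e in sorted(pairs, key=lambda p: len(p[0])):
    print(f, "||", e)
```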
Phrase-Based Models

• Sentence $f$ is decomposed into $J$ phrases: $f_1^J = f_1, \ldots, f_j, \ldots, f_J$
• Sentence $e$ is decomposed into $I$ phrases: $e_1^I = e_1, \ldots, e_i, \ldots, e_I$
• We choose the sentence with the highest probability:
  $\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J)$
Phrase-Based Models

• Model the posterior probability using a log-linear combination of feature functions.
• We have a set of $M$ feature functions $h_m(e_1^I, f_1^J)$, $m = 1, \ldots, M$. For each feature function, there exists a model parameter $\lambda_m$, $m = 1, \ldots, M$.
• The decision rule is
  $\hat{e}_1^I = \arg\max_{e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)$
• Features cover the main components:
  – Phrase-translation model
  – Reordering model
  – Language model
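A toy scoring sketch of this decision rule in Python; the candidates, feature values, and weights are all invented for illustration (in a real system they come from the phrase table, the language model, the reordering model, and tuning on held-out data):

```python
import math

# Toy log-linear scoring: score(e) = sum_m lambda_m * h_m(e, f).
weights = {"phrase_tm": 0.6, "lm": 0.3, "reordering": 0.1}

candidates = {
    "tomorrow I will fly to the conference in Canada":
        {"phrase_tm": math.log(0.6), "lm": math.log(0.4), "reordering": -1.0},
    "tomorrow fly I to the conference in Canada":
        {"phrase_tm": math.log(0.6), "lm": math.log(0.05), "reordering": -0.5},
}

def score(feats):
    return sum(weights[m] * h for m, h in feats.items())

best = max(candidates, key=lambda e: score(candidates[e]))
print(best)  # the fluent candidate wins on the language-model feature
```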
Advantages of Phrase-Based SMT
• Many-to-many mappings can handle non-compositional phrases
• Local context is very useful for disambiguating
  – "interest rate" …
  – "interest in" …
• The more data, the longer the learned phrases
  – Sometimes whole sentences
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
MT Approaches: Statistical vs. Rule-based vs. Hybrid

[Figure: the MT Pyramid once more, as a backdrop for comparing statistical, rule-based, and hybrid approaches.]
[Figure: a two-axis map of MT systems, slide courtesy of Laurie Gerber. Horizontal axis: knowledge acquisition strategy, ranging from all manual (hand-built by experts, hand-built by non-experts, electronic dictionaries) through learning from annotated data to fully automated learning from un-annotated data. Vertical axis: knowledge representation strategy, from shallow/simple (word-based only, phrase tables) through syntactic constituent structure and semantic analysis to deep/complex (interlingua). Plotted systems: the original direct approach, original statistical MT, example-based MT, typical transfer system, classic interlingual system. The label "New Research Goes Here!" marks the deep-representation, automated-acquisition corner.]
MT Approaches: Practical Considerations

• Resource availability
  – Parsers and generators
    • Input/output compatibility
  – Translation lexicons
    • Word-based vs. transfer/interlingua
  – Parallel corpora
    • Domain of interest
    • Bigger is better
• Time availability
  – Statistical training, resource building
Road Map
• Multilingual Challenges for MT
• MT Approaches
• MT Evaluation
MT Evaluation
• More art than science
• Wide range of metrics/techniques
  – interface, …, scalability, …, faithfulness, …, space/time complexity, etc.
• Automatic vs. human-based
  – Dumb machines vs. slow humans
Human-based Evaluation Example: Accuracy Criteria

Human-based Evaluation Example: Fluency Criteria
Fluency vs. Accuracy

[Figure: MT applications plotted along accuracy and fluency axes: FAHQMT (fully automatic high-quality MT), Prof. MT, Info. MT, Con. MT.]
Automatic Evaluation Example: Bleu Metric
(Papineni et al., 2001)

• Bleu: BiLingual Evaluation Understudy
  – Modified n-gram precision with length penalty
  – Quick, inexpensive and language-independent
  – Correlates highly with human evaluation
  – Bias against synonyms and inflectional variations
Test Sentence:
colorless green ideas sleep furiously

Gold Standard References:
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily

Unigram precision = 4/5 = 0.8
Bigram precision = 2/4 = 0.5
Bleu Score = $(a_1 \cdot a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} \approx 0.6325$, i.e. 63.25
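The computation above can be reproduced in a few lines of Python. This sketch implements only modified n-gram precision and the geometric mean, omitting the brevity (length) penalty and smoothing that full BLEU implementations apply:

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count at its max count in any reference."""
    cand = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref, n).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / max(1, sum(cand.values()))

cand = "colorless green ideas sleep furiously".split()
refs = ["all dull jade ideas sleep irately".split(),
        "drab emerald concepts sleep furiously".split(),
        "colorless immature thoughts nap angrily".split()]

p1 = modified_precision(cand, refs, 1)   # 4/5 = 0.8
p2 = modified_precision(cand, refs, 2)   # 2/4 = 0.5
print((p1 * p2) ** 0.5)                  # ~0.6325, matching the slide
```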
Automatic Evaluation Example: METEOR
(Lavie and Agarwal, 2007)

• Metric for Evaluation of Translation with Explicit word Ordering
• Extended matching between translation and reference
  – Porter stems, WordNet synsets
• Unigram precision, recall, parameterized F-measure
• Reordering penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against "non-statistical" MT systems
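The parameterized score combines these pieces roughly as follows (per Lavie and Agarwal 2007; α, β, γ are the tunable parameters mentioned above):

```latex
% METEOR: weighted harmonic mean of unigram precision P and recall R,
% discounted by a fragmentation penalty based on how many contiguous
% chunks the matched unigrams form.
F_{\mathrm{mean}} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R}, \qquad
\mathit{Pen} = \gamma \left(\frac{\#\mathrm{chunks}}{\#\mathrm{matches}}\right)^{\beta}, \qquad
\mathrm{score} = F_{\mathrm{mean}} \cdot (1 - \mathit{Pen})
```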
Automatic Evaluation Example: SEPIA
(Habash and ElKholy, 2008)

• A syntactically-aware evaluation metric
  – (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)
• Uses dependency representation
  – MICA parser (Nasr and Rambow, 2006)
  – 77% of all structural bigrams are surface n-grams of size 2, 3, 4
• Includes dependency surface span as a factor in the score
  – Long-distance dependencies should receive a greater weight than short-distance dependencies
• Higher degree of grammaticality?

[Figure: histogram of structural bigrams by surface span length (1, 2, 3, 4+), with the y-axis running from 0% to 50%.]
Metrics MATR Workshop
• Workshop at the AMTA conference 2008
  – Association for Machine Translation in the Americas
• Evaluating evaluation metrics
• Compared 39 metrics
  – 7 baselines and 32 new metrics
  – Various measures of correlation with human judgment
  – Different conditions: text genre, source language, number of references, etc.
Building an SMT System

• Training
  – Input
    • Parallel training data
    • Monolingual data of the target language
  – Output
    • Phrase table
    • Language model
    • Initial translation model parameters
• Tuning
  – Input
    • Parallel tune data set
    • Decoder
    • Phrase table + language model + initial translation model parameters
  – Output
    • Improved translation model parameters
• Development & Testing
  – Input
    • Parallel test data set
    • Decoder
    • Phrase table + language model + final translation model parameters
  – Output
    • BLEU score
Training

[Figure: training pipeline. Parallel source and target text is preprocessed, word-aligned, and fed to phrase extraction, producing the phrase table (PT); monolingual target text is preprocessed and used to create the language model (LM).]
Tuning

[Figure: tuning loop. The tune set is preprocessed and decoded with the LM, phrase table, and current weights; the output is scored. If the metric score is not yet optimal, the weights are adjusted and decoding repeats; once it is, tuning is done.]
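A toy rendering of this loop in Python; the synthetic metric() stands in for "decode the tune set and score the output with BLEU", and the random weight perturbation stands in for a real optimizer such as MERT:

```python
import random

random.seed(0)

# Stand-in for "decode + score": a synthetic metric that peaks at
# tm = 0.7, lm = 0.3. A real pipeline would run the decoder with the
# candidate weights and compute BLEU on the tune set instead.
def metric(w):
    return -((w["tm"] - 0.7) ** 2 + (w["lm"] - 0.3) ** 2)

weights = {"tm": 0.5, "lm": 0.5}           # initial model weights
best = metric(weights)

for _ in range(200):                       # "Adjust Weights" loop
    cand = {k: v + random.gauss(0, 0.05) for k, v in weights.items()}
    s = metric(cand)
    if s > best:                           # "Metric Score Optimal?" check
        weights, best = cand, s            # keep the improved weights

print(weights)                             # ends near the optimum (0.7, 0.3)
```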
Testing

[Figure: testing pipeline. The test set is preprocessed and decoded with the LM, phrase table, and tuned weights to produce the final output.]