Machine Translation: Challenges and Approaches. Some slides from Nizar Habash and Dragomir Radev

Machine Translation: Challenges and Approacheskathy/NLP/2019/ClassSlides/Class16MT/MT2019.pdf• (b/boy) an instance of the concept boy • Relaons link en’’es • (d/die-01 :locaon(p/park)):



Machine Translation: Challenges and Approaches

Some slides from Nizar Habash and Dragomir Radev


Announcements
• Explanation of midterm grades at end of class (remind me!)
• Reading
  • Today: C18.1-18.2 NLP
  • Next week: C18.3, 18.4, NLP
• HW2 will be returned next week
• My office hours today: 4:30-5:30


Semantic interpretation
• Semantic role labeling, FrameNet parsers, AMR parsers
  • Take syntactic tree as input, produce a semantic representation as output
• Information extraction
  • Produce relations, events, entities
• Parsing directly into programming language (language as action)
  • Percy Liang: using language to represent if-then recipes (e.g., controlling smartphones)
  • Large online repository of English/code


Multilingual Users
• Content languages for websites
• Percentage of Internet users by language

http://en.wikipedia.org/wiki/Global_Internet_usage


Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian,
Bulgarian, Catalan, Cebuano, Chichewa, Chinese, Corsican, Croatian, Czech, Danish, Dutch,
English, Esperanto, Estonian, Filipino, Finnish, French, Frisian, Galician, Georgian,
German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic,
Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean,
Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay,
Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Pashto, Persian,
Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scots Gaelic, Serbian, Sesotho, Shona,
Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tajik,
Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Xhosa,
Yiddish, Yoruba, Zulu


Thank you for your attention! Questions?


• Romance languages handled well
• Similar language pairs handled well (e.g., Spanish, Portuguese)
• Formal genres handled better

Still many problems!


Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Nov 6th)
• MT Evaluation


Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net
• MT Evaluation


Multilingual Challenges
• Orthographic Variations
  • Ambiguous spelling
    • كتب الاولاد اشعارا / كَتَبَ الأوْلادُ اشعَاراً
  • Ambiguous word boundaries
• Lexical Ambiguity
  • Bank → بنك (financial) vs. ضفة (river)
  • Eat → essen (human) vs. fressen (animal)


Multilingual Challenges: Morphological Variations
• Affixation vs. Root+Pattern
  • write → written, كتب → مكتوب
  • kill → killed, قتل → مقتول
  • do → done, فعل → مفعول
• Tokenization (figure labels: conj, article, noun, plural)
  • And the cars → and the cars
  • والسيارات → w Al SyArAt
  • Et les voitures → et le voitures

Slide from Nizar Habash


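Translation Divergences: conflation
• Arabic: لست هنا (gloss: I-am-not here)
• English: I am not here
• French: Je ne suis pas ici (gloss: I not am not here)
[Figure: dependency trees for the three sentences: لست (هنا); am (I, not, here); suis (Je, ne, pas, ici).]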


Translation Divergences
English: John swam across the river quickly
Spanish: Juan cruzó rapidamente el río nadando
  Gloss: John crossed fast the river swimming
Arabic: اسرع جون عبور النهر سباحة
  Gloss: sped john crossing the-river swimming
Chinese: 约翰 快速 地 游 过 这 条 河
  Gloss: John quickly (DE) swam cross the (Quantifier) river
Russian: Джон быстро переплыл реку
  Gloss: John quickly cross-swam river


Language Differences - vocabulary

[Example from Jurafsky and Martin]


Language Differences - Syntax
• Word order
  • SVO: English, Mandarin
  • VSO: Irish, Classical Arabic
  • SOV: Hindi, Japanese
• Word order in phrases (Fr.)
  • la maison bleue, the blue house
• Word order in sentences (Jap.)
  • I like to drink coffee
  • watashi wa kohii o nomu no ga suki desu
  • I-subj coffee-obj drink-dat-rheme like
• Prepositions (Jap.)
  • to Mariko, Mariko-ni


Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
• MT Evaluation


MT Approaches: MT Pyramid
[Figure: the MT pyramid. Source (S) and Target (T) sit at the base and Interlingua (I) at the apex; each side is labelled with the levels phrases, syntax and semantics.]


String-to-String Translation
[Figure: the MT pyramid (S, T, I; levels phrases, syntax, semantics).]


MT Approaches: Gisting Example

Sobre la base de dichas experiencias se estableció en 1988 una metodología.

Envelope her basis out speak experiences them settle at 1988 one methodology.

On the basis of these experiences, a methodology was arrived at in 1988.


Phrase-Based Translation
[Figure: the MT pyramid (S, T, I; levels phrases, syntax, semantics).]


Tree-to-Tree Translation
[Figure: the MT pyramid (S, T, I; levels phrases, syntax, semantics).]


MT Approaches: Transfer Example
• Transfer Lexicon
  • Map SL structure to TL structure
[Figure: poner (:subj X, :obj mantequilla, :mod en (:obj Y)) → butter (:subj X, :obj Y)]

X puso mantequilla en Y → X buttered Y


Tree-to-String Translation
[Figure: the MT pyramid (S, T, I; levels phrases, syntax, semantics).]


MT Approaches: MT Pyramid
[Figure: the MT pyramid. Source (S) and Target (T) sit at the base and Interlingua (I) at the apex; each side is labelled with the levels phrases, syntax and semantics.]


AMR characteristics
• Rooted, labeled graphs
• Abstract away from syntactic differences
  • He described her as a genius
  • His description of her: genius
  • She was a genius according to his description
• Use Propbank framesets
  • “bond investor”: invest-01
• Heavily biased towards English


• Variables (or nodes) for entities, events, properties, states
• Leaf nodes are labeled with concepts:
  • (b/boy) an instance of the concept boy
• Relations link entities
  • (d/die-01 :location (p/park)): there was a death in the park
• AMR concepts
  • English words (e.g., boy), Propbank framesets (e.g., want-01) or special keywords (entity-types, quantities or conjunctions)


AMR relations
• ~100 relations
• Frame arguments
  • Arg0, arg1, arg2, arg3, arg4, arg5 (Propbank)
• General semantic relations
  • :accompanier, :age, :beneficiary, :cause, :compared-to, :concession, :condition, :consist-of, :degree, :destination, :direction, :domain, :duration, :employed-by, :example, :extent, :frequency, :instrument, :li, :location, :manner, :medium, :mod, :mode, :name, :part, :path, :polarity, :poss, :purpose, :source, :subevent, :subset, :time, :topic, :value.
• Relations for quantity
  • :quant, :unit, :scale
• Relations for date entity
  • :day, :month, :year, :weekday, :time, :timezone, :quarter, :dayperiod, :season, :year2, :decade, :century, :calendar, :era.
• Relations for lists
  • :op1, :op2, …, :op10
• Plus inverses (e.g., :arg0-of, :location-of)


General semantic relations


Inverse relations
• In order to obtain rooted structures
• (s/sing-01
    :arg0 (b/boy
      :source (c/college)))
• The boy from the college sang.


• (b/boy
    :arg0-of (s/sing-01)
    :source (c/college))
  the college boy who sang


Modals and Negation
• Negation is represented with :polarity and modality is represented with concepts
• (g/go-01 :arg0 (b/boy) :polarity -)
  The boy did not go.
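
A minimal sketch of how the bracketed AMR notation on these slides could be produced from a data structure. The (variable, concept, relations) tuple layout and the name `to_amr_string` are assumptions made for illustration; they are not taken from any AMR toolkit.

```python
def to_amr_string(node):
    """Serialise a (variable, concept, relations) triple to bracketed AMR text.

    relations is a list of (role, value) pairs. A tuple value is a nested
    node; any other value is treated as a constant such as "-".
    """
    var, concept, relations = node
    parts = [f"{var}/{concept}"]
    for role, value in relations:
        rendered = to_amr_string(value) if isinstance(value, tuple) else str(value)
        parts.append(f":{role} {rendered}")
    return "(" + " ".join(parts) + ")"


# "The boy did not go": negation is a :polarity relation with the constant "-".
boy_did_not_go = ("g", "go-01", [("arg0", ("b", "boy", [])), ("polarity", "-")])
print(to_amr_string(boy_did_not_go))
```

Keeping relations as an ordered list rather than a dict preserves the order in which roles were written, which matters only for display.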


MT Approaches: MT Pyramid
[Figure: the MT pyramid (S, T, I; levels phrases, syntax, semantics), annotated with resource labels: Interlingual Lexicons, Transfer Lexicons, Dictionaries/Parallel Corpora.]


Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net
• MT Evaluation


Translation as Decoding
• “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'”
  • Warren Weaver, “Translation (1955)”


The first parallel corpus: The Rosetta Stone

Carved in 196 BC in Egypt
Deciphered by Champollion in 1822
Mixture of Egyptian (hieroglyphs and Demotic) and Greek

http://www.ancientegypt.co.uk/writing/rosetta.html


Europarl: A Parallel Corpus for Statistical Machine Translation
• Proceedings of the European Parliament
• 21 European languages
  • Romanic (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek
• 60 million words/language
• Must be aligned first

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf


Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf


Statistical MT: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf


Statistical MT
Translate from French: “une fleur rouge”?

                     p(e)    p(f|e)    p(e)*p(f|e)
1. a flower red
2. red flower a
3. flower red a
4. a red dog
5. dog cat mouse
6. a red flower

Slide from Radev


Statistical MT
Translate from French: “une fleur rouge”?

                     p(e)    p(f|e)    p(e)*p(f|e)
1. a flower red      Low
2. red flower a      Low
3. flower red a      Low
4. a red dog         High
5. dog cat mouse     Low
6. a red flower      High

Slide from Radev


Statistical MT
Translate from French: “une fleur rouge”?

                     p(e)    p(f|e)    p(e)*p(f|e)
1. a flower red      Low     High
2. red flower a      Low     High
3. flower red a      Low     High
4. a red dog         High    Low
5. dog cat mouse     Low     Low
6. a red flower      High    High

Slide from Radev


Statistical MT
Translate from French: “une fleur rouge”?

                     p(e)    p(f|e)    p(e)*p(f|e)
1. a flower red      Low     High      Low
2. red flower a      Low     High      Low
3. flower red a      Low     High      Low
4. a red dog         High    Low       Low
5. dog cat mouse     Low     Low       Low
6. a red flower      High    High      High

Slide from Radev
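
The table can be read as a tiny decoder: score each English candidate e by p(e) * p(f|e) and keep the best one. The sketch below does exactly that. The numeric values 0.01 and 0.9 are hypothetical stand-ins for the slide's Low and High labels, and the name `decode` is an assumption for illustration.

```python
# Hypothetical numbers standing in for the slide's Low / High labels.
LOW, HIGH = 0.01, 0.9

# candidate English string -> (language model p(e), translation model p(f|e))
candidates = {
    "a flower red":  (LOW, HIGH),
    "red flower a":  (LOW, HIGH),
    "flower red a":  (LOW, HIGH),
    "a red dog":     (HIGH, LOW),
    "dog cat mouse": (LOW, LOW),
    "a red flower":  (HIGH, HIGH),
}


def decode(scored):
    """Return the candidate e that maximises p(e) * p(f|e)."""
    return max(scored, key=lambda e: scored[e][0] * scored[e][1])


best = decode(candidates)
print(best)
```

Only the candidate that is both fluent English and a faithful rendering of the French gets a high product, which is the point of combining the two models.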


Statistical MT: Automatic Word Alignment
• GIZA++
  • A statistical machine translation toolkit used to train word alignments.
  • Uses Expectation-Maximization with various constraints to bootstrap alignments

Mary did not slap the green witch
Maria no dio una bofetada a la bruja verde

Slide based on Kevin Knight’s http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Statistical MT: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf


IBM’s EM trained models (1-5)
• Word translation
• Local alignment
• Fertilities
• Class-based alignment
• Re-ordering
All are separate models to train!

Model 1:
$$p(f, a \mid e) = p(a \mid e) \cdot p(f \mid a, e) = \frac{c}{(n+1)^{m}} \prod_{j=1}^{m} p(f_j \mid e_{a_j})$$
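
A minimal sketch of EM training for the Model 1 word-translation table, assuming the usual textbook formulation with a NULL word added to every English sentence. The function name `train_ibm_model1` and the toy corpus in the usage note are assumptions for illustration; this is not GIZA++ and it ignores the later models.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict keyed by (f, e).
    """
    # Prepend NULL so a foreign word may align to no English word.
    corpus = [(f_sent, ["NULL"] + e_sent) for f_sent, e_sent in corpus]
    f_vocab = {f for f_sent, _ in corpus for f in f_sent}
    # Uniform initialisation of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for f_sent, e_sent in corpus:
            for f in f_sent:
                # E-step: posterior over which English word generated f.
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

For example, on the toy corpus `[("la maison".split(), "the house".split()), ("la fleur".split(), "the flower".split()), ("maison bleue".split(), "blue house".split())]`, the table should come to prefer pairing maison with house and la with the, using nothing but co-occurrence.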


Phrase-Based Statistical MT
• Foreign input segmented into phrases
  • “phrase” is any sequence of words
• Each phrase is probabilistically translated into English
  • P(to the conference | zur Konferenz)
  • P(into the meeting | zur Konferenz)
• Phrases are probabilistically re-ordered

See [Koehn et al, 2003] for an intro. This was state-of-the-art before neural MT

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


Word Alignment Induced Phrases

Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
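
A minimal sketch of how phrase pairs like these can be read off a word alignment: a source span and a target span form a pair when no alignment link leaves the box they define. The alignment used in the usage note is an assumption (in particular, "a" is left unaligned), the function name `extract_phrases` is hypothetical, and the sketch keeps only the tightest target span for each source span, so it will not reproduce every pair on the slides.

```python
def extract_phrases(src, tgt, alignment):
    """Enumerate phrase pairs consistent with a word alignment.

    src, tgt: token lists. alignment: set of (i, j) links between
    src[i] and tgt[j]. Returns a set of (source_phrase, target_phrase).
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # Inconsistent if a target word inside [j1, j2] is linked
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

With `src = "Maria no dió una bofetada a la bruja verde".split()`, `tgt = "Mary did not slap the green witch".split()` and the assumed links `{(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3), (6, 4), (7, 6), (8, 5)}`, the result includes (Maria, Mary), (no, did not), (bruja verde, green witch) and the full sentence pair, while (dió, slap) is rejected because slap is also linked to una and bofetada.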


Advantages of Phrase-Based SMT
• Many-to-many mappings can handle non-compositional phrases
• Local context is very useful for disambiguating
  • “Interest rate” → …
  • “Interest in” → …
• The more data, the longer the learned phrases
  • Sometimes whole sentences

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt


String to Tree Translation

(Yamada and Knight 2001)

He adores listening to music

He music to listening adores

He/ha music to listening/no ga adores/desu


Clause restructuring (Collins et al.)
• Ich werde Ihnen den Report aushaendigen … damit Sie den eventuell uebernehmen koennen.
• I will pass_on to_you the report, so_that you can adopt that perhaps
• Google translate: I will give you the report ... so that you can take over the eventuality.
• verb initial: that perhaps adopt can -> adopt that perhaps can
• verb second: so that you adopt … can -> so that you can adopt
• move subject: so that can you adopt -> so that you can adopt
  (in German, split-prefix phrasal verbs are very common, e.g., “anrufen” -> “rufen Sie bitte noch einmal an”: call right back please)


Synchronous Grammars
• Generate parse trees in parallel in two languages using different rules
• E.g.,
  • NP -> ADJ N (in English)
  • NP -> N ADJ (in Spanish)
• ITG (Inversion Transduction Grammar) [Wu 1995]
  • Don’t allow all permutations in derivations
  • Only <> and [] are allowed
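
A minimal sketch of the two ITG orientations, assuming a binary derivation tree whose leaves are (source word, target word) pairs. "[]" keeps the children in the same order in both languages and "<>" inverts them on the target side. The tree encoding and the name `yield_pair` are assumptions for illustration.

```python
def yield_pair(tree):
    """Return (source_words, target_words) for an ITG-style derivation.

    A leaf is a 2-tuple (source_word, target_word). An internal node is a
    3-tuple (orientation, left, right) with orientation "[]" or "<>".
    """
    if len(tree) == 2:
        src_word, tgt_word = tree
        return [src_word], [tgt_word]
    orientation, left, right = tree
    left_src, left_tgt = yield_pair(left)
    right_src, right_tgt = yield_pair(right)
    if orientation == "[]":   # straight: same order on both sides
        return left_src + right_src, left_tgt + right_tgt
    if orientation == "<>":   # inverted: children swapped on the target side
        return left_src + right_src, right_tgt + left_tgt
    raise ValueError(f"unknown orientation: {orientation}")


# NP -> ADJ N in English corresponds to NP -> N ADJ in Spanish.
noun_phrase = ("<>", ("green", "verde"), ("witch", "bruja"))
derivation = ("[]", ("the", "la"), noun_phrase)
english, spanish = yield_pair(derivation)
print(" ".join(english), "|", " ".join(spanish))
```

Because every node is either straight or inverted, a single derivation yields both strings at once, and reorderings that cannot be built from these two operations are excluded.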


MT Approaches: Practical Considerations
• Resource Availability
  • Parsers and Generators
    • Input/Output compatibility
  • Translation Lexicons
    • Word-based vs. Transfer/Interlingua
  • Parallel Corpora
    • Domain of interest
    • Bigger is better
• Time Availability
  • Statistical training, resource building


Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Thursday)
• MT Evaluation


MT Evaluation
• More art than science
• Wide range of Metrics/Techniques
  • interface, …, scalability, …, faithfulness, ... space/time complexity, … etc.
• Automatic vs. Human-based
  • Dumb Machines vs. Slow Humans


Human-based Evaluation Example: Adequacy Criteria

5 contents of original sentence conveyed (might need minor corrections)
4 contents of original sentence conveyed BUT errors in word order
3 contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.
2 contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers
1 contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses


Human-based Evaluation Example: Fluency Criteria

5 clear meaning, good grammar, terminology and sentence structure
4 clear meaning BUT bad grammar, bad terminology or bad sentence structure
3 meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure
2 meaning unclear BUT inferable
1 meaning absolutely unclear


Today: Crowdsourcing
• Amazon Mechanical Turk or CrowdFlower
• Create a HIT for each sentence
• Get multiple workers to rate
• Pay .01 to .10 per HIT
• Complete an evaluation in hours (vs days/weeks)
• Ethics?


Automatic Evaluation Example: Bleu Metric (Papineni et al 2001)
• Bleu
  • BiLingual Evaluation Understudy
  • Modified n-gram precision with length penalty
  • Quick, inexpensive and language independent
  • Correlates highly with human evaluation
  • Bias against synonyms and inflectional variations


Automatic Evaluation Example: Bleu Metric

Test Sentence
colorless green ideas sleep furiously

Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily


Automatic Evaluation Example: Bleu Metric

Test Sentence
colorless green ideas sleep furiously

Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily

Unigram precision = 4/5

Slide from Nizar Habash


Automatic Evaluation Example: Bleu Metric

Test Sentence
colorless green ideas sleep furiously

Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily

Unigram precision = 4 / 5 = 0.8
Bigram precision = 2 / 4 = 0.5

Bleu Score = $(a_1 a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} = 0.6325$ → 63.25
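
The worked example can be checked with a short sketch of modified n-gram precision and the geometric mean. It assumes n-grams up to length 2, as on the slide, and the usual brevity penalty against the closest reference length (which equals 1 here, so it does not change the result). The function names are assumptions for illustration, and the sketch expects the candidate to have at least `max_n` tokens.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, c in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Geometric mean of modified precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c = len(candidate)
    # Reference length closest to the candidate length.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


candidate = "colorless green ideas sleep furiously".split()
references = [
    "all dull jade ideas sleep irately".split(),
    "drab emerald concepts sleep furiously".split(),
    "colorless immature thoughts nap angrily".split(),
]
print(round(bleu(candidate, references), 4))
```

The unigram "green" and the bigrams "colorless green" and "green ideas" find no match in any reference, which is where the 4/5 and 2/4 come from.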


BLEU scores for 110 translation systems trained on Europarl

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf


Bleu scores 2019 (teams in WMT)
Portuguese->Spanish, Spanish->Portuguese


Bleu scores 2019 (teams in WMT)
Hindi->Nepali, Nepali->Hindi


Automatic Evaluation Example: METEOR (Lavie and Agarwal 2007)
• Metric for Evaluation of Translation with Explicit word Ordering
• Extended Matching between translation and reference
  • Porter stems, WordNet synsets
• Unigram Precision, Recall, parameterized F-measure
• Reordering Penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against “non-statistical” MT systems
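
A minimal sketch of how the precision, recall and reordering penalty combine, assuming the commonly cited form: a weighted harmonic mean F = P·R / (α·P + (1−α)·R) scaled by (1 − γ·(chunks/matches)^β). The matching step itself (exact, stem and synonym matches) is not implemented; the function takes match statistics as input. The name `meteor_from_counts` and the default parameter values are assumptions for illustration, not tuned settings.

```python
def meteor_from_counts(matches, cand_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """Combine unigram match statistics into a METEOR-style score.

    matches:  number of candidate unigrams matched to the reference
    cand_len: number of unigrams in the candidate
    ref_len:  number of unigrams in the reference
    chunks:   number of contiguous, same-order runs the matches form
    alpha, beta, gamma: tunable parameters (defaults assumed here)
    """
    if matches == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    # Parameterised harmonic mean; alpha close to 1 weights recall heavily.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # More chunks means more reordering, so a larger penalty.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * f_mean
```

For instance, `meteor_from_counts(4, 5, 5, 1)` scores higher than `meteor_from_counts(4, 5, 5, 2)`: the same four matches are worth less when they are split into two out-of-order runs.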


Midterm grades
• Available after class
• Mean: 69
• Median: 70.25
• Max: 95.5
• Min: 32.5
• STDEV: 14.04
• Will be curved