38
CS388: Natural Language Processing Lecture 24: Mul9linguality + Morphology Greg Durrett

CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

CS388:NaturalLanguageProcessingLecture24:Mul9linguality+Morphology

GregDurrett

Page 2: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Administrivia

‣ Project2graded;average=19.0

‣ Finalprojectpresenta9onsnextweek

‣ SeeCanvasannouncementforwhoispresen9ngwhen

‣ Canbe“workinprogress”,butthereshouldbeatleastpreliminaryresults

‣ FinalreportsdueonDecember14;noslipdays

Page 3: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Dealingwithotherlanguages

‣ManyalgorithmssofarhavebeendevelopedforEnglish

‣ Somestructureslikecons9tuencyparsingdon’tmakesenseforotherlanguages‣ NeuralmethodsaretypicallytunedtoEnglish-scaleresources,maynotbethebestforotherlanguageswherelessdataisavailable

1)Whatotherphenomena/challengesdoweneedtosolve?

‣ Ques9on:

2)Howcanweleverageexis9ngresourcestodobe]erinotherlanguageswithoutjustannota9ngmassivedata?

‣ OtherlanguagespresentsomeproblemsnotseeninEnglishatall!

Page 4: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

ThisLecture

‣Morphology:effectsandchallenges

‣ Cross-lingualtaggingandparsing

‣Morphologytasks:analysis,inflec9on,wordsegmenta9on

Page 5: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Morphology

Page 6: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Whatismorphology?‣ Studyofhowwordsform

‣ Deriva9onalmorphology:createanewlexemefromabaseestrange(v)=>estrangement(n)become(v)=>unbecoming(adj)

Ibecome/shebecomes

‣ Inflec9onalmorphology:wordisinflectedbasedonitscontext

‣Maynotbetotallyregular:enflame=>inflammable

‣Mostlyappliestoverbsandnouns

Page 7: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalInflec9on‣ InEnglish: Iarrive youarrive he/she/itarrives

wearrive youarrive theyarrive[X]arrived

‣ InFrench:

Page 8: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalInflec9on‣ InSpanish:

Page 9: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

NounInflec9on

‣ Nomina9ve:I/he/she,accusa9ve:me/him/her,geni9ve:mine/his/hers

‣ Notjustverbseither;gender,number,casecomplicatethings

Igivethechildrenabook<=>IchgebedenKinderneinBuchItaughtthechildren<=>IchunterrichtedieKinder

‣ Da9ve:mergedwithaccusa9veinEnglish,showsrecipientofsomething

Page 10: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

IrregularInflec9on‣ Commonwordsareolenirregular

‣ Iam/youare/sheis

‣ However,lesscommonwordstypicallyfallintosomeregularparadigm—thesearesomewhatpredictable

‣ Jesuis/tues/elleest

‣ Yosoy/ustedestá/ellaes

Page 11: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Agglu9na9ngLangauges‣ Finnish/Turkish/Hungarian(Finno-Ugric):whatapreposi9onwoulddoinEnglishisinsteadpartoftheverb

‣Manypossibleforms—andinnewswiredata,onlyafewareobservedilla9ve:“into” adessive:“on”

Page 12: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Morphologically-RichLanguages‣ManylanguagesspokenallovertheworldhavemuchrichermorphologythanEnglish(Chineseisthemainexcep9on)

‣ CoNLL2006/2007:dependencyparsing+morphologicalanalysesfor~15mostlyIndo-Europeanlanguages

‣Wordpiece/byte-pairencodingmodelsforMTarepre]ygoodathandlingtheseifthere’senoughdata

‣ SPMRLsharedtasks(2013-2014):Syntac9cParsingofMorphologically-RichLanguages

Page 13: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Morphologically-RichLanguages

‣ Greatresourcesforchallengingyourassump9onsaboutlanguageandforunderstandingmul9lingualmodels!

Page 14: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalAnalysis/Inflec9on

Page 15: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalAnalysis‣ InEnglish,notthatmanywordforms,lexicalfeaturesonwordsandwordvectorsarepre]yeffec9ve

‣Whenwe’rebuildingsystems,weprobablywanttoknowbaseform+morphologicalfeaturesexplicitly

‣ Inotherlanguages,*lots*moreunseenwords!Affectsparsing,transla9on,…

‣ Howtodothiskindofmorphologicalanalysis?

Page 16: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalAnalysis

Ámakormányegyetlenadócsökkentésétsemjavasolja.

n=singular|case=nomina9ve|proper=no

deg=posi9ve|n=singular|case=nomina9ve

n=singular|case=nomina9ve|proper=no

n=singular|case=accusa9ve|proper=no|pperson=3rd|pnumber=singular

mood=indica9ve|t=present|p=3rd|n=singular|def=yes

Butthegovernmentdoesnotrecommendreducingtaxes.

‣Whyisthisuseful?

Page 17: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalAnalysis‣ Givenaword,needtorecognizewhatitsmorphologicalfeaturesare

‣ LotsofworkonArabicinflec9on(highamountsofambiguity)

‣ Basicapproach:

‣ Lexicon:tellsyouwhatpossibili9esare

‣ Analyzer:sta9s9calmodelthatdisambiguates

‣ModelsarelargelyCRF-like:scoremorphologicalfeaturesincontext

Page 18: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Predic9ngInflec9on‣ Otherdirec9on:givenbaseform+features,inflecttheword

Durre]andDeNero(2013)

‣ Hardforunknownwords—needmodelsthatgeneralize

w i n d e n

Page 19: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Predic9ngInflec9on

w i n d e n

i

iiia...

i1

en

eestet-...

en2

n

-sttte...

n1

i1 n1

= =====

n1

en1

en2

to wind (de)

en

estt-...

en1

‣ Otherdirec9on:givenbaseform+features,inflecttheword

‣ Hardforunknownwords—needmodelsthatgeneralize

‣ Takeabunchofexis9ngverbsfromWik9onary,extractthesechangerulesusingcharacteralignments

Durre]andDeNero(2013)

‣ TrainaCRFwithcharactern-gramcontextfeaturestolearnwheretoapplythem

Page 20: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphologicalReinflec9on

Chahuneauetal.(2013)

‣Machinetransla9onwherephrasetableisdefinedintermsoflemmas‣ “Translate-and-inflect”:translateintouninflectedwordsandpredictinflec9onbasedonsourceside

Page 21: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

WordSegmenta9on

Page 22: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphemeSegmenta9on‣ Canwedosomethingunsupervisedratherthanthesecomplicatedanalyses?

CreutzandLagus(2002)

‣ unbecoming=>un+becom+ing—weshouldbeabletorecognizethesecommonpiecesandsplitthemoff

‣ Howdoweodothis?

Page 23: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

MorphemeSegmenta9on‣ Simpleprobabilis9cmodel

CreutzandLagus(2002)

‣ p(mi)=count(token)/count(alltokens)

‣ TrainwithEM:E-stepinvolveses9ma9ngbestsegmenta9onwithViterbi,M-step:collecttokencounts

allowedexpectedneedneeded all+owe+dexpe+ctedn+e+edne+ed+ed E0

M0:edhascount3 all+ow+edexpect+edne+edne+ed+ed

‣ Someheuris9cs:rejectraremorphemes,one-le]ermorphemes

‣ Doesn’thandlestemchanges:becoming=>becom+ing

E1

Page 24: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

ChineseWordSegmenta9on

‣ LSTMsovercharacterembeddings/characterbigramembeddingstopredictwordboundaries

‣ SomelanguagesincludingChinesearetotallyuntokenized

Chenetal.(2015)

‣ Havingtherightsegmenta9oncanhelpmachinetransla9on

Page 25: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTaggingandParsing

Page 26: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging‣Mul9lingualPOSinduc9on

Snyderetal.(2008)

‣ Genera9vemodeloftwolanguagessimultaneously,jointalignment+taglearning

‣ Complexgenera9vemodel,requiresGibbssamplingforinference

Page 27: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging‣WehaveresourcesforlanguageslikeEnglish—canweusethesemoredirectly?

DasandPetrov(2011)

Ilikeitalot

Jel’aimebeaucoup

NVPRDTADJ

NPRV??

‣ TagwithEnglishtagger,projectacrossbitext,trainFrenchtagger?‣ Candosomethingsmarter

Page 28: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging

DasandPetrov(2011)

{Ilikeitalot

Jel’aimebeaucoup

NVPRDTADJ

{‣ Formagraphoftrigrams,usethesetopropagateknowledgeabouttags

Page 29: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTaggingDasandPetrov(2011)

Ilikeit

l’aimebeaucoup l’adoreunpeul’adorebeaucoup

edgeweightsbasedonsimilarityofcontextsthesetrigramsoccurin

Iloveit helovesitshelovesit

edgeweightsbasedonalignments(middlewordmustbealigned)

‣ Eachnodeisassociatedwithadistribu9onovertags,labelpropaga9onupdatestheseusingthegraph

Page 30: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging

DasandPetrov(2011)

Page 31: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging

DasandPetrov(2011)

‣ Takethesetrigramsandtreatthemas“soltrainingexamples”andlearnanHMMtagger

‣ Labelpropaga9on:encouragesnodeswithhigher-weightedgesbetweenthemtohavesimilartags

‣ Prunetoonlykeeptagsabovesomeprobabilitytogetthelexicon(validtag-wordpairs)

Page 32: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualTagging

DasandPetrov(2011)

‣ EM-HMM/featureHMM:unsupervisedmethodswithagreedymappingfromlearnedtagstogoldtags

‣ Projec9on:projecttagsacrossbitexttomakepseudogoldcorpus,trainonthat

Page 33: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualParsing

McDonaldetal.(2011)

‣ NowthatwecanPOStagotherlanguages,canweparsethemtoo?

‣ Directtransfer:trainaparseroverPOSsequencesinonelanguage,thenapplyittoanotherlanguage

Iliketomatoes

PRONVERBNOUN

Jelesaime

PRONPRONVERB

Ilikethem

PRONVERBPRON ‣ Eventhoughwe'veneverseenthissequenceinEnglishanddon’tknowthewords,wecans9llfigureitout

Page 34: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualParsing

McDonaldetal.(2011)

‣Mul9-dir:transferaparsertrainedonseveralsourcetreebankstothetargetlanguage

‣Mul9-proj:morecomplexannota9onprojec9onapproach

Page 35: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualEmbeddings

Ammaretal.(2016)

‣ mul9Cluster:usebilingualdic9onariestoformclustersofwordsthataretransla9onsofoneanother,replacecorporawithclusterIDs,train“monolingual”embeddingsoverallthesecorpora

‣ mul9CCA:“project”allotherlanguagesintoEnglish

‣ CCA:learnaprojec9onofaligneddatapointsintoasharedspace

‣ Learnasharedmul9lingualembeddingspacesoanyneuralsystemcantransferover

Page 36: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Cross-LingualEmbeddings

Ammaretal.(2016)

‣Wordvectorsworkpre]ywellat“intrinsic”tasks,someimprovementonthingslikedocumentclassifica9onanddependencyparsingaswell

Page 37: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Wherearewenow?‣ Universaldependencies:treebanks(+tags)for70+languages

‣Manylanguagesares9llsmall,soprojec9ontechniquesmays9llhelp

‣Morecorporainotherlanguages,lessandlessrelianceonstructuredtoolslikeparsers,andpretrainingonunlabeleddatameansthatperformanceonotherlanguagesisbe]erthanever

‣ BERThaspretrainedmul9lingualmodelsthatseemtoworkpre]ywell(trainedonawholebunchoflanguages)

Page 38: CS388: Natural Language Processing Lecture 24 ... · This Lecture ‣ Morphology: effects and challenges ‣ Cross-lingual tagging and parsing ‣ Morphology tasks: analysis, inflec9on,

Takeaways

‣ManylanguageshaverichermorphologythanEnglishandposedis9nctchallenges

‣ Problems:howtoanalyzerichmorphology,howtogeneratewithit

‣ CanleverageresourcesforEnglishusingbitexts

‣ Next9me:wrapup+ethicsofNLP