CS388: Natural Language Processing
Greg Durrett
Lecture 25: Mul
Administrivia
‣ Project 2 back today/tomorrow
‣ TACC allocation
Dealing with other languages
‣ Many algorithms so far have been developed for English
‣ Some structures like cons
This Lecture
‣ Morphological richness: effects and challenges
‣ Cross-lingual tagging and parsing
‣ Morphology tasks: analysis, inflection
Morphology
What is morphology?
‣ Study of how words form
‣ Derivational morphology
    estrangement (n)
    become (v) => unbecoming (adj)
‣ Inflectional morphology
    I become / she becomes
inflammable
‣ Mostly applies to verbs and nouns
Morphological Inflection
Noun Inflection
Irregular Inflection
Agglu
Morphologically-Rich Languages
‣ Many languages used all over the world have much richer morphology than English (Chinese is the main exception)
‣ Great resources for challenging your assump
Morphological Analysis/Inflection
Morphological Analysis: Hungarian
Ám a kormány egyetlen adócsökkentését sem javasolja.
n=singular|case=nominative|proper=no
deg=positive|n=singular|case=nominative
n=singular|case=nominative|proper=no
n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular
mood=indicative|t=present|p=3rd|n=singular|def=yes
But the government does not recommend reducing taxes.
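Feature bundles like those in the Hungarian example are "|"-separated key=value pairs, so they can be read into a dictionary before being used as prediction targets. A minimal sketch, assuming every pair contains exactly one "="; the function name parse_features is hypothetical and not from the lecture:

```python
def parse_features(bundle: str) -> dict:
    """Split a 'key=value|key=value' feature bundle into a dict."""
    features = {}
    for pair in bundle.split("|"):
        # partition keeps everything after the first '=' as the value
        key, _, value = pair.partition("=")
        features[key.strip()] = value.strip()
    return features


# Bundle of the final verb in the example sentence
verb = parse_features("mood=indicative|t=present|p=3rd|n=singular|def=yes")
print(verb["mood"], verb["p"], verb["n"])  # indicative 3rd singular
```

Treating each key as its own classification target is one possible setup; the slides do not say which formulation is used.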
Morphological Analysis
‣ Given a word, need to predict what its morphological features are
‣ Lots of work on Arabic inflection
Predic
Morphological Reinflection
Word Segmentation
Morpheme Segmentation
‣ un+becom+ing: we should be able to recognize these common pieces and split them off
‣ How do we do this?
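One naive baseline, shown only to make the task concrete, is to strip known affixes from a fixed list. This is a sketch under stated assumptions, not the method from the lecture: the affix inventories, the min_stem threshold, and the function name segment are all hypothetical.

```python
PREFIXES = ("un", "re", "dis")   # hypothetical prefix inventory
SUFFIXES = ("ing", "ed", "s")    # hypothetical suffix inventory


def segment(word, min_stem=3):
    """Strip at most one known prefix and one known suffix from a word."""
    pieces_before, pieces_after = [], []
    for prefix in PREFIXES:
        # Only strip if a stem of at least min_stem characters remains
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            pieces_before.append(prefix)
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            pieces_after.append(suffix)
            word = word[:-len(suffix)]
            break
    return pieces_before + [word] + pieces_after


print("+".join(segment("unbecoming")))  # un+becom+ing
print("+".join(segment("under")))       # un+der (a wrong split)
```

The second call shows why a fixed list is not enough: "under" is over-segmented because the rule cannot tell a real prefix from a coincidental string match.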
Chinese Word Segmentation
Cross-Lingual Tagging and Parsing
Cross-Lingual Tagging
‣ Labeling POS datasets is expensive
‣ Can we transfer annota
Unsupervised Tagging
‣ Mul
Tagging by Annota
Cross-Lingual Tagging
Das and Petrov (2011)
‣ EM-HMM/feature HMM: unsupervised methods with a greedy mapping from learned tags to gold tags
‣ Projec
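An unsupervised tagger outputs anonymous cluster ids, so scoring it against gold tags needs a mapping step. The sketch below uses a many-to-one greedy mapping, where each learned tag is assigned the gold tag it co-occurs with most often. That this is the exact variant used in the cited evaluation is an assumption; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def greedy_many_to_one(induced, gold):
    """Map each induced tag to the gold tag it co-occurs with most often."""
    cooccurrence = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        cooccurrence[cluster][tag] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in cooccurrence.items()}


def mapped_accuracy(induced, gold):
    """Token-level accuracy after applying the greedy mapping."""
    mapping = greedy_many_to_one(induced, gold)
    correct = sum(mapping[c] == g for c, g in zip(induced, gold))
    return correct / len(gold)


# Toy data: cluster ids from a hypothetical unsupervised tagger
induced = [0, 1, 2, 0, 1, 0]
gold = ["PRON", "VERB", "NOUN", "PRON", "VERB", "NOUN"]
print(greedy_many_to_one(induced, gold))
print(mapped_accuracy(induced, gold))
```

Because the mapping is fit on the same gold tags it is scored against, this number is optimistic, and it rises as the number of clusters grows.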
Cross-Lingual Parsing
McDonald et al. (2011)
‣ Now that we can POS tag other languages, can we parse them too?
‣ Direct transfer: train a parser over POS sequences in one language, then apply it to another language
[Figure: direct transfer. Train: "I like tomatoes" (PRON VERB NOUN) and "I like them" (PRON VERB PRON); the parser is trained to accept tag input and learns that VERB is the head of PRON and NOUN. Parse new data: "Je les aime" (PRON PRON VERB).]
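The direct-transfer idea can be illustrated with a toy delexicalized "parser" that only ever sees POS tags. This is a sketch under strong assumptions, not the parser from McDonald et al. (2011): it just counts (dependent tag, head tag) attachments and attaches each token greedily, so the output is not guaranteed to be a well-formed tree. All names here are hypothetical.

```python
from collections import Counter

ROOT = "ROOT"


def train(treebank):
    """Count (dependent tag, head tag) attachments; words are never seen."""
    counts = Counter()
    for tags, heads in treebank:
        for i, head in enumerate(heads):
            head_tag = ROOT if head == 0 else tags[head - 1]
            counts[(tags[i], head_tag)] += 1
    return counts


def parse(tags, counts):
    """Pick a root, then attach every other token to its best-scoring head."""
    root = max(range(len(tags)), key=lambda i: counts[(tags[i], ROOT)])
    heads = []
    for i, tag in enumerate(tags):
        if i == root:
            heads.append(0)
            continue
        candidates = [j for j in range(len(tags)) if j != i]
        # Prefer frequent attachments; break ties by choosing the closer head
        best = max(candidates, key=lambda j: (counts[(tag, tags[j])], -abs(i - j)))
        heads.append(best + 1)
    return heads


# English training data: heads are 1-indexed, 0 marks the root
english = [
    (["PRON", "VERB", "NOUN"], [2, 0, 2]),  # I like tomatoes
    (["PRON", "VERB", "PRON"], [2, 0, 2]),  # I like them
]
model = train(english)
print(parse(["PRON", "PRON", "VERB"], model))  # Je les aime -> [3, 3, 0]
```

Both French pronouns attach to the verb even though the word order differs from the English training sentences, because the model only learned which tags head which.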
Cross-Lingual Parsing
McDonald et al. (2011)
‣ Mul
Cross-Lingual Word Representations
Mul
‣ Hindi (Devanagari). Transfers well despite different alphabets!
‣ Japanese => English: different script and very different syntax
Where are we now?
‣ Universal dependencies: treebanks (+ tags) for 70+ languages
‣ Many languages are s
Takeaways
‣ Many languages have richer morphology than English and pose dis