Upload
doannhu
View
218
Download
0
Embed Size (px)
Citation preview
SentimentAnalysis
Announcements• Homework3dueat2:30pmnextTuesday.
FromCoreNLPtoApplications
CORENLPParsingPOStaggingSemanBcs
APPLICATIONS SenBmentSummarizaBonInformaBonExtracBon
MachineTranslaBon
Today• SenBmentanalysistasks:definiBon• SenBmentresources• TradiBonalsupervisedapproach• Neuralnetapproach
Doembeddingshandlenegation?• 1:not1.0000000000000004• 2:n't0.8595728019811346• 3:but0.839545755064721• 4:did0.8378272618764329• 5:would0.8187187243474063• 6:should0.8147740055059252• 7:if0.8116091330796058• 8:because0.7987450091713499• 9:they0.7944962528430977• 10:be0.791361002418091• 11:could0.7894321710724349• 12:never0.7860447682817786• 13:any0.7842654035407371• 14:even0.776876305477035• 15:do0.7708686700685263• 16:only0.7691260950825682• 17:might0.7673607671887417• 18:so0.7670919214405183• 19:that0.7630105150824054• 20:though0.7625415104568314• 21:does0.7556821279593668• 22:cannot0.7536603867497527• 23:neither0.7520466068389566• 24:yet0.7470431538715656• 25:although0.7465924525156928
AntonymsandSynonyms:embeddingfor“hot”• 1:hot0.9999999999999996
• 2:cool0.6137860693915586• 3:hoVest0.5816319703075693• 4:heat0.5266680665228453• 5:warm0.51671900202736• 6:cold0.5093751774671291• 7:chili0.4624909077143189• 8:dry0.4613872561048547• 9:heated0.45721498258314297• 10:bubbling0.4534137094122158• 11:hoVer0.4529186992101415• 12:spots0.4416197356093728• 13:boiling0.44035866405318447• 14:billboard0.4340849896360003• 15:soW0.4268572343642097• 16:temperature0.42600188687018437• 17:wet0.4198371006642362• 18:chocolate0.41844508951954273• 19:water0.4174513613786725• 20:temperatures0.4160490504977998• 21:drink0.41476813237122767• 22:stove0.41431353491608697• 23:humid0.41044559731588987• 24:sizzling0.4083319186481177• 25:cooking0.408002615081707
Whatissentiment?• ExpressionofposiBveornegaBveopinions• ..Towardsatopic,person,event,enBty• ..Towardsanaspect
Whysentimentanalysis?• SenBmentiscommoninonlineplaYorms• Peoplewriteabouttheirpersonalviewpoints
• UsefultounderstandwhatpeoplethinkaboutpoliBcalissues,poliBcalcandidates,importanteventsoftheday• UsefulforgeneraBngsummariesofreviews:restaurants,products,movies
Thesentimentanalysistask(s)• SubjecBvevsobjecBve• PosiBve,negaBveorneutral• DowehavesenBmenttowardsatarget?OraspectbasedsenBment?• What/whoisthesenBmentsource?
SubjectivevsObjective• �Atseveraldifferentlayers,it’safascina3ngtale.[“Who’sSpyingonOurComputers”,GeorgeMelloanWallStJournal.(Bookreview)• BellIndustriesIncincreaseditsquarterlyto10centsfrom7centsashare.
ExamplesfromWeibeetal2004
Positive/Negative/Neutral• FromUseNet:
• NegaBve:Ihadinmindyourfacts,Buddy,nothers.• PosiBve:Nicetouch.“Alleges”whateverfactspostedarenotinyourpersonaofwhatis“real”• Neutral:Marchappearstobeanes3matewhileearlieradmissioncannotbeen3relyruledout,"accordingtoChen,alsoTaiwan'schiefWTOnego3ator
ExamplesfromWeibeetal2004andRosenthal2014
SubjectivePhrases• TheforeignministrysaidThursdaythatitwas“surprised,toputitmildly”bytheU.S.StateDepartment’scri0cismofRussia’shumanrightsrecordandobjectedinparBculartothe“odious”secBononChechnya.[MoscowTimes,03/08/2002]• Subjec3vityanalysisiden3fiestextthatrevealsanauthor’sthoughts,beliefsorotherprivatestates.
ExamplesfromWeibeetal2004
SubjectivePhrasesandSources• TheforeignministrysaidThursdaythatitwas“surprised,toputitmildly”bytheU.S.StateDepartment’scri0cismofRussia’shumanrightsrecordandobjectedinparBculartothe“odious”secBononChechnya.[MoscowTimes,03/08/2002]• Whowassurprised?• WhowascriBcal?
ExamplesfromWeibeetal2004
AdditionalExamples• AuthoriBesareonlytooawarethatKashgaris4,000kilometres(2,500miles)fromBeijingbutonlyatenthofthedistancefromthePakistaniborder.
• Taiwan-madeproductsstoodagoodchanceofbecomingevenmorecompeBBvethankstowideraccesstooverseasmarketsandlowercostsformaterialimports,hesaid.
• "MarchappearstobeamorereasonableesBmatewhileearlieradmissioncannotbeenBrelyruledout,"accordingtoChen,alsoTaiwan'schiefWTOnegoBator.
• fridayeveningplansweregreat,butsaturday'splansdidntgoasexpected--iwentdancing&itwasanokclub,butterriblycrowded:-(
• WHYTHEHELLDOYOUGUYSALLHAVEMRS.KENNEDY!SHESAFUCKINGDOUCHE
• AT&Twasokaybutwhenevertheydosomethingniceinthenameofcustomerserviceitseemslikeafavor,whileT-Mobilemakesthatanormaleverydaythin
16
ExamplesfromRosenthal2014
SentimenttowardsTarget• IpreUymuchenjoyedthewholemovie.Target=wholemovie,senBment=posiBve.• Bulgariaiscri3cizedbytheEUbecauseofslowreformsinthejudiciarybranch,thenewspapernotes.Target=Bulgaria,senBment=negaBve• Stanishevwaselectedprimeministerin2005.Sincethen,hehasbeenaprominentsupporterofhiscountry’saccessiontotheEU.Target=country’saccesstotheEU,senBment=posiBve
ExamplesfromBreck&Cardieforthcoming
Datasets(Sem-evaldatasetsalsoused)
2000sentencesineachcorpus
18
Corpus AverageWordCount
AverageCharacterCount
Subjec6vePhrases
Objec6vePhrases
VocabularySize
CharacterLengthRestric6ons
LiveJournal 14.67 66.47 3035(39%) 4747(61%) 4747 30-120
MPQA 31.64 176.68 3325(41%) 4754(59%) 7614 none
TwiVer 25.22 118.55 2091(36%) 3640(64%) 8385 0-140
Wikipedia 15.57 77.20 2643(37%) 4496(63%) 4342 30-120
MPQA:extensivelyannotateddatasetbyStoyanav,CardieandWeibe2004.15opinionorientedqusBons,15factorientedquesBons.Alongwithtextspansfrom252arBcles.
RosenthalandMcKeown2013)
ExampleSentences
LiveJournal iwillhavetosBcktomycanonfilmslrunBlinafewyearsicanaffordtoupgradeagain:)
MPQA ThesaleinfuriatedBeijingwhichregardsTaiwananintegralpartofitsterritoryawaiBngreunificaBon,byforceifnecessary.
TwiVer RT@tashjade:That’sreallysad,CharlieRT“UnBltonightIneverrealisedhowfuckedupIwas”-CharlieSheen#sheenroast
Wikipedia PerhapsifreportedcriBcallybyawesternsourcebutcertainlynotbyanIsraelisource.
19
SubjecBve ObjecBve
SentimentLexicons• GeneralInquirer• SenBWordNet• DicBonaryofAffect(DAL)
DictionaryofAffectinLanguage• DicBonaryof8742wordsbuilttomeasuretheemoBonalmeaningoftexts• Eachwordisgiventhreescores(scaleof1to3)• pleasantness-alsocalledevaluaBon(ee)• acBveness(aa)• andimagery(ii)
C.M.Whissel.1989.Thedic6onaryofaffectinlanguage.InR.PlutchikandH.Kellerman,editors,EmoBon:theoryresearchandexperience,volume4,London.Acad.Press.
21
Wordnet• Propernouns(e.g.BritneySpears)areautomaBcallymarkedasobjecBve• WordsthatdonotexistintheDALarelookedupinWordnet• ComputetheaverageoftheDALscoresofallthesynonymsofthefirstsense• Iftherearenosynonyms,lookatthehypernym
22
Wiktionary• WikBonaryisafreecontentdicBonary• hVp://www.wikBonary.org
• IfaworddoesnotappearintheDALorWordnetlookitupinWikBonary• ComputetheaverageoftheDALscoresforeachwordinthedefiniBonthathasitsownWikBonarypage
23
Emoticons• 1000emoBconsweregatheredfromseverallistsavailableontheinternet• Wekeptthe192emoBconsthatappearedatleastonceandmappedeachemoBcontoasingleworddefiniBon
24
Methods• Pre-processingsteps• EmoBconkeysandcontracBonexpansion• Chunkerandtagger*• LexicalFeatures*• SyntacBcFeatures*• SocialMediaFeatures
*ApoorvAgarwal,FadiBiadsy,andKathleenR.McKeown.2009.Contextualphrase-levelpolarityanalysisusinglexicalaffectscoringandsyntac6cn-grams.InProceedingsofEACL’09
25
PreprocessingLiveJournal [i]/NPsub[willhavetos6ck]/VPobj[to]/PPobj[mycanonfilmslr]/NPobj[unBl]/
PPobj[in]/PPobj[afewyears]/NPsub[i]/NPsub[canaffordtoupgrade]/VPobj[again:)]/NPsub
MPQA [Thesale]/NPsub[infuriated]/VPobj[Beijing]/NPobj[which]/NPsub[regards]/VPsub[Taiwan]/NPobj[anintegralpart]/NPsub[of]/PPobj[itsterritoryawaiBngreunificaBon,]/NPobj[by]/PPobj[force]/NPsub[if]/obj[necessary.]/sub
TwiVer [RT@tashjade:]/NPobj [That]/Npobj[is]/VPsub[really]/sub[sad,]/sub[CharlieRT]/NPobj[”]/NPobj[UnBl]/PPobj[tonight]/NPsub[I]/NPsub[never]/sub[realised]/VPsub[how]/sub[fucked]/VPsub[up]/PPobj[I]/NPsub[was]/VPsub[”]/obj[-CharlieSheen#sheenroast]/NPobj
Wikipedia [Perhaps]/sub[if]/obj[reported]/VPsub[cri6cally]/sub[by]/PPobj[awesternsourcebut]/NPsub[certainlynot]/sub[by]/PPobj[anIsraelisource.]/NPsub
26Xuan-HieuPhan,CRFChunker:CRFEnglishPhraseChunkerhVp://crfchunker.sourceforge.net/,2006
LexicalFeatures• POSTags*• N-grams*• Performedchi-squarefeatureselecBononthen-grams
*ApoorvAgarwal,FadiBiadsy,andKathleenR.McKeown.2009.Contextualphrase-levelpolarityanalysisusinglexicalaffectscoringandsyntac6cn-grams.InProceedingsofEACL’09
27
SyntacticFeatures• Usethemarkedupchunkstoextractthefollowing:*• n-grams:1-3words• POS:NP,VP,PP,JJ,other• PosiBon:target,right,leW• SubjecBvity:subjecBve,objecBve• Minandmaxpleasantness
*ApoorvAgarwal,FadiBiadsy,andKathleenR.McKeown.2009.Contextualphrase-levelpolarityanalysisusinglexicalaffectscoringandsyntac6cn-grams.InProceedingsofEACL’09
28
SocialMediaFeatures
29
30
-
0.04
0.08
0.12
0.16
0.20CapitalW
ords s o
OutofV
ocabulary s o
EmoB
cons s o
Acronyms s o
Exclam
aBon
s s o
Repe
ated
ExclamaB
ons s o
Que
sBon
s s o
Repe
ated
Que
sBon
s s o
Ellipses s o
SocialMediaFeatures
twiVer
livejournal
wikipedia
mpqa
SMfeaturestendtobeveryrare.Thefrequencyforeachfeatureislessthan1%perdataset
SingleCorpusClassiOication
BalancedUnbalanced
• LogisBcRegressioninWeka
• 10runsof10-foldcross-validaBon
• StaBsBcalsignificanceusingthet-testwithp=.001
31
Experiment LiveJournal MPQA TwiVer Wikipedia
n-gramsize 100 2000 none none
majority 58% 59% 64% 63%
JustDAL 76.5% 75.7% 83.6% 80.4%
DicBonaries+SM 77.1% 76.1% 84% 81.4%
Wordnet 76.7% 75.6% 84% 80.7%
Wordnet+SM 77.1% 76.1% 84.2% 81.4%
DicBonaries 76.6% 75.7% 83.9% 80.7%
SM 77% 76.1% 83.7% 81.2%
Experiment LiveJournal MPQA TwiVer Wikipedia
n-gramsize 100 200 none none
majority 50% 50% 50% 50%
JustDAL 74.7% 75.7% 81.9% 79.3%
DicBonaries+SM 76.7% 76.2% 82.6% 80.2%
Wordnet 75.1% 75.8% 82.4% 79.1%
Wordnet+SM 76.6% 75.3% 82.6% 80.3%
DicBonaries 75.3% 75.8% 82.4% 79.1%
SM 76.2% 76.3% 82.2% 80.4%
SocialMediaErrorAnalysis• Wikipedia• PunctuaBonwasusefulasafeaturefordeterminingthataphraseisobjecBveifitisasmallphrase.However,severalsubjecBvephraseswereincorrectlyclassifiedbecauseofthis 32
0%
20%
40%
60%
80%
100%
Wikipedia
subjecBveobjecBve
SocialMediaErrorAnalysis• TwiVer• ellipsesdohelpindicatethatasentenceisobjecBve.Theaccuracyimprovedfrom82%to92%forsentenceswiththisfeature• AllothersocialmediafeatureswereincorrectlyclassifiedasobjecBve/subjecBvedependingonthesocialmediapreference. 33
0%
20%
40%
60%
80%
100%
TwiQer
subjecBveobjecBve
SocialMediaErrorAnalysis• LiveJournal• OutofVocabularywordsandpunctuaBonwerethemostusefulsocialmediafeatures.• InalldatasetsthepunctuaBonfeaturecausedcloseto50/50exchangebutthefeaturewasbestinLiveJournal. 34
0%
20%
40%
60%
80%
100%
Livejournal
subjecBveobjecBve
Cross-GenreClassiOication
35
TwiQer LiveJournal MPQA Wikipedia
TwiQer 71.6% 62.1% 76.9%
LiveJournal 82.5% 65.4% 80.9%
MPQA 75.6% 69.3% 71.2%
Wikipedia 82.4% 76.7% 62.4%
Training
TesBng
Thischartdisplaysthebestresultsforeachexperiment
Cross-GenreClassiOication
36
TwiQer LiveJournal MPQA Wikipedia
TwiQer 71.6% 62.1% 76.9%
LiveJournal 82.5% 65.4% 80.9%
MPQA 75.6% 69.3% 71.2%
Wikipedia 82.4% 76.7% 62.4%
Training
TesBng
• TheonlinegenresdonotdowellinpredicBngMPQAsentences
Cross-GenreClassiOication
37
TwiQer LiveJournal MPQA Wikipedia
TwiQer 71.6% 62.1% 76.9%
LiveJournal 82.5% 65.4% 80.9%
MPQA 75.6% 69.3% 71.2%
Wikipedia 82.4% 76.7% 62.4%
Training
TesBng
• LiveJournaltrainingdatadoesagoodjobofpredicBngtheotheronlinegenres• WikipediatrainingdatadoesagoodjobofpredicBngTwiVer
Cross-GenreClassiOication
38
TwiQer LiveJournal MPQA Wikipedia
TwiQer 71.6% 62.1% 76.9%
LiveJournal 82.5% 65.4% 80.9%
MPQA 75.6% 69.3% 71.2%
Wikipedia 82.4% 76.7% 62.4%
Training
TesBng
• TwiVertrainingdatadoesadecentjobofpredicBngWikipedia• WikipediatrainingdatadoesadecentjobofpredicBngLiveJournal
Cross-GenreClassiOication
39
TwiQer LiveJournal MPQA Wikipedia
TwiQer 71.6% 62.1% 76.9%
LiveJournal 82.5% 65.4% 80.9%
MPQA 75.6% 69.3% 71.2%
Wikipedia 82.4% 76.7% 62.4%
Training
TesBng
• Ingeneral,usingtheMPQAastrainingdoesnotperformwell• UsingTwiVerastrainingdoesnotperformwellinpredicBngLiveJournal
sentences
NeuralNetworkApproachestoSentiment• Goldberg:• TakeastandardRNNsuchasshowninclasslastBme• Takealabeleddataset(e.g.,IMDBsenBmentdataset)• IniBalizewithpre-trainedwordembeddings(wordtovecorglove)• UsesigmoidtopredictbinarysenBmentlabels:posiBvevsnegaBve.
Example• Foreachsentenceinthetrainingcorpus,classify,comparetogoldstandardandcomputeloss,backpropagate.• Recallthatwemayusemini-batchessothatwe’renotback-propagaBngforeachexample
• Ihadinmindyourfacts,Buddy,nothers.
RNN–Ihadinmindyourfacts,buddy,nothers.
wx
wh
x1
h0
h1
σwx
wh
x2
h2
σwx
wh
x3
h3
σ
y3sigmoid
I had in
wx
wh
x3
h3
σ
mind …
Inthisoverview,wreferstotheweightsButtherearedifferentkindsofweightsLet’sbemorespecific
RNN–Ihadinmindyourfacts,buddy,nothers.
wx
U
x1
h0
h1
wx
U
x2
h2
σwx
U
x3
h3
σ
y3sigmoid
I had in
wx
U
x3
h3
σ
mind …
σ
RNN–Ihadinmindyourfacts,buddy,nothers.
wx
U
x1
h0
h1
wx
U
x2
h2
σwx
U
x3
h3
σ
y3sigmoid
I had in
wx
U
x3
h3
σ
mind …
Waretheweights:thewordembeddingmaatrixmulBplicaBonwithxiyieldstheembeddingforxUisanotherweightmatrixH0isoWennotspecified.Histhehiddenlayer.
σ
RNN–Ihadinmindyourfacts,buddy,nothers.
wx
U
x1
h0
h1
wx
U
x2
h2
σwx
U
x3
h3
σ
y3sigmoid
I had in
wx
U
x3
h3
σ
mind …
ht=σ(Uwxt)ht-1
σ
RNN–Ihadinmindyourfacts,buddy,nothers.
wx
wh
x1
h0
h1
σwx
wh
x2
h2
σwx
wh
x3
h3
σ
Sigmoid
I had in
wx
wh
x3
h3
σ
mind …
Y=posiBve?Y=negaBve?Finalembeddingrunthroughthesigmoid
funcBon->[0,1]1=posiBve0=negaBveOWenfinalhisusedaswordembeddingforthesentence
UpdatingParametersofanRNN
wx
wh
x1
h0
h1
σwx
wh
x2
h2
σwx
wh
x3
h3
σ
y3sigmoid
Cost
wy
BackpropagaBonthroughBmeGoldlabel=0(negaBve)AdjustweightsusinggradientRepeatmanyBmeswithallexamples
SlidefromRadev
I had in
RecursiveDeepModelsforSemanticCompositionalityoveraSentimentTreebank• Socheretal,Stanford2013hVps://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf• Problemwithpreviouswork:difficultyexpressingthemeaningoflongerphrases• Goal• TopredictsenBmentatthesentenceorphraselevel• CaptureeffectofnegaBonandconjuncBons• SenBmentTreebank• RecursiveNeuralTensorNetwork
SentimentTreebank• MoviereviewexcerptsfromroVentomatoes.com(Pang&Lee2005)• 10,662sentences• ParsedbyStanfordParser(Klein&Manning2003)• 215,154phrases• EachphraselabeledforsenBmentusingAmazonMechanicalTurk(AMT)• 5classesemerge:negaBve,somewhatnegaBve,neutral,somewhatposiBve,posiBve
Example--verynegaBve++veryposiBve- NegaBve+posiBve0neutral
RecursiveNeuralModels
RNN:RecursiveNeuralNetwork
WaretheweightstolearnWεf=tanh
MV-RNNMatrixvectorRNN• Introduceweightmatrixassociatedwitheachnon-terminal(P2foradjP)andterminal(Afora)• a=not,b=very,c=good
RNTN:RecursiveNeuralTensorNetwork• TheMV-RNNhastoomanyparameterstolearn(sizeofvocabulary)• CanwegetcomposiBonalitywithreducedparameters?• P1=f([ab]u1u2a)u3u4b=f([ab]u1a+u2b)u3a+u4b
=f(u1aa+u2ab+u3ab+u4bb)
Results
Positive–“mostcompelling”
Negative–“leastcompelling”
HandlingConjunctions
NextTime• SummarizaBon