Topic and Text Analysis for Sentiment, Emotion, and Computational Social Science
November 2012
Alice Oh, [email protected]
Users & Information Lab, http://uilab.kaist.ac.kr
Thursday, December 6, 2012
Overview
• Topic modeling research
  • CIKM 2011: Distance-dependent Chinese restaurant franchise (ddCRF)
  • ICML 2012: Dirichlet process with mixed random measures (DP-MRM)
  • CIKM 2012: Recursive Chinese restaurant process for modeling topic hierarchies (rCRP)
  • NIPS Big Learning Workshop 2012: Distributed Online Learning for Latent Dirichlet Allocation (DoLDA)
• Computational social science research
  • WSDM 2011: Aspect sentiment unification model for online review analysis
  • ICWSM 2012: Social aspects of emotions in Twitter conversations
  • ACL 2012: Self-disclosure and relationship strength in Twitter conversations
Do you feel what I feel? Social Aspects of Emotions in Twitter Conversations
Suin Kim, JinYeong Bak, Alice Oh
ICWSM 2012
Asking Research Questions
Human emotion is typically studied as a within-person, one-direction, non-repetitive phenomenon; focus has traditionally been on how one individual feels in reaction to various stimuli at a certain point of time. But people recognize and inevitably react emotionally and otherwise to expressions of emotion of other people. We propose that organizational dyads and groups inhabit emotion cycles: Emotions of an individual influence the emotions, thoughts and behaviors of others; others’ reactions can then influence their future interactions with the individual expressing the original emotion, as well as that individual’s future emotions and behaviors. People can mimic the emotions of others, thereby extending the social presence of a specific emotion, but can also respond to others’ emotions, extending the range of emotions present.
Social Aspects of Emotions: Motivating Question
How are our emotions affected by others we talk to?
Social Aspects of Emotions: Research Questions
• How do we communicate our emotions?
  • Use a topic model on Twitter conversations to discover the “topics” that represent the eight emotions
  • Analyze the proportion of all tweets that express each emotion
• How do we influence other people’s emotions?
  • Analyze the emotion transitions of the tweets
  • Look for topics that change the emotions of the conversation partners
  • Find interesting patterns of emotion pairs
Social Aspects of Emotions: Data
• Twitter conversation data: approximately 220K dyads who “reply” to each other, and 1,670K conversational chains
Seed Words (We Feel Fine by Harris & Kamvar)
• Anticipation: hope, wait, await, inspir, excit, bore, readi, expect, nervou, calm, motiv, prepar, certain, anxiou, optimist, forese
• Joy: awesom, amaz, wonder, excit, glad, fine, beauti, high, lucki, super, perfect, complet, special, bless, safe, proud
• Anger: shit, bitch, ass, mean, damn, mad, jealou, piss, annoi, angri, upset, moron, rage, screw, stuck, irrit
• Surprise: amaz, wow, wonder, weird, lucki, differ, awkward, confus, holi, strang, shock, odd, embarrass, overwhelm, astound, astonish
• Fear: scare, stress, horror, nervou, terror, alarm, behind, panic, fear, afraid, desper, threaten, tens, terrifi, fright, anxiou
• Sadness: sorri, bad, aw, sad, wrong, hurt, blue, dead, lost, crush, weak, depress, wors, low, terribl, lone
• Disgust: sick, wrong, evil, fat, ugli, horribl, gross, terribl, selfish, miser, pathet, disgust, worthless, aw, asham, fuck
• Acceptance: okai, ok, same, alright, safe, lazi, relax, peac, content, normal, secur, complet, numb, fulfil, comfort, defeat
Dirichlet Forest Prior
• Dirichlet Forest Prior (Andrzejewski et al.)
• Mixture of Dirichlet tree distributions
• Dirichlet tree: a generalization of the Dirichlet distribution
• Knowledge is expressed using Must-link and Cannot-link primitives
• Must-link (love, sweetheart)
• Cannot-link (exciting, bored)
[Figure: DF-LDA]
Domain Knowledge in Dirichlet Forest Prior
• Must-link: seed words within the same emotion class
• Cannot-link: seed words from different emotion classes
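The Must-link/Cannot-link rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: `seed_constraints` is a hypothetical helper, the input is a tiny excerpt of the stemmed seed lists, and because some stems (for example "excit" and "amaz") appear in more than one class, the sketch only cannot-links word pairs that share no class.

```python
from itertools import combinations


def seed_constraints(seeds):
    """Turn emotion seed lists into Must-link and Cannot-link word pairs.

    seeds maps an emotion label to its list of seed words. Pairs are
    returned as alphabetically sorted tuples, so each appears once.
    """
    # Record every class each seed word belongs to (some stems are shared)
    classes_of = {}
    for label, words in seeds.items():
        for word in words:
            classes_of.setdefault(word, set()).add(label)

    # Must-link: every pair of distinct words inside the same class
    must_links = set()
    for words in seeds.values():
        must_links.update(combinations(sorted(set(words)), 2))

    # Cannot-link: pairs of words that share no class (assumption: a pair
    # sharing at least one class is never cannot-linked)
    cannot_links = {
        (a, b)
        for a, b in combinations(sorted(classes_of), 2)
        if classes_of[a].isdisjoint(classes_of[b])
    }
    return must_links, cannot_links


# Tiny excerpt of the seed lists; "excit" is listed under both classes
must, cannot = seed_constraints({
    "joy": ["glad", "proud", "excit"],
    "anticipation": ["hope", "wait", "excit"],
})
print(sorted(cannot))  # [('glad', 'hope'), ('glad', 'wait'), ('hope', 'proud'), ('proud', 'wait')]
```

Encoding these pairs into the Dirichlet tree structure of the prior (Andrzejewski et al.) is a separate step that the sketch does not show.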
Dirichlet Forest vs. Dirichlet
Fear (DF-LDA): don’t think but know why even wanna care worry understand
Fear (LDA): good exam lol luck just school haha i’m xx worry tomorrow
Surprise (DF-LDA): that very really cool wow wonder just some differ amazing
Surprise (LDA): just rt holy got thank did shit new love lol awesome buy oh
Sadness (DF-LDA): bad my real feel life aw sad kill lost dead hurt wrong sick
Sadness (LDA): lol just know sorry isn’t oh tweet did haha don’t thought think
Emotion Topics: How do we express emotions?
• Joy: Topic 114 (omg, love, haha, thank, really); Topic 107 (love, thank, follow, wow); Topic 159 (good, day, hope, morning, thank); Topic 158 (love, thank, miss, hug)
• Anticipation: Topic 125 (hope, better, feel, thank, soon); Topic 26 (good, thank, hope, miss); Topic 146 (come, wait, week, day, june); Topic 146 (good, day, time, work)
• Anger: Topic 131 (lmao, fuck, ass, bitch, shit); Topic 4 (ass, yo, lmao, nigga); Topic 19 (lmao, shit, damn, fuck, oh); Topic 13 (shit, nigga, smh, yea)
• Fear: Topic 48 (omg, oh, lmao, shit, scare); Topic 78 (happen, heart, attack, hospital); Topic 27 (don’t, come, night, sleep, outside); Topic 140 (time, got, work, day)
• Surprise: Topic 172 (yeag, know, think, true, funny); Topic 89 (know, don’t, think, look); Topic 15 (think, don’t, know, make, really); Topic 94 (haha, dont, think, really)
• Sadness: Topic 6 (oh, sorry, haha, know, didnt); Topic 59 (hurt, got, good, bad, pain); Topic 106 (tweet, reply, didn’t, read, sorry); Topic 155 (oh, really, make, feel)
• Disgust: Topic 116 (oh, fuck, don’t, ye, ew); Topic 116 (look, haha, oh, know); Topic 22 (don’t, oh, think, yeah, lmao); Topic 174 (don’t, think, say, people)
• Acceptance: Topic 43 (ok, oh, thank, cool, okay); Topic 102 (know, try, let, ok); Topic 199 (xx, thank, good, okay, follow); Topic 8 (night, love, good, sleep)
• Neutral: Topic 180 (com, www, http, check, youtube); Topic 156 (twitter, facebook, people, account); Topic 184 (account, google, app, work, email); Topic 67 (food, chicken, cook, rt)
Emotion Topics: How do we express emotions?
• Joy (Greeting): Topic 114 (omg, love, haha, thank, really); Topic 107 (love, thank, follow, wow)
• Anticipation (Caring): Topic 125 (hope, better, feel, thank, soon); Topic 26 (good, thank, hope, miss)
• Sadness (Sympathy): Topic 6 (oh, sorry, know, didnt); Topic 59 (hurt, got, good, bad, pain)
• Neutral (IT/Tech): Topic 180 (com, www, http, check, youtube); Topic 156 (twitter, facebook, people, account)
Emotion Transitions: Plutchik’s Wheel of Emotions
[Figure: emotion transitions arranged on Plutchik’s wheel of emotions]
Share of tweets per emotion: Joy 39.7%, Acceptance 10.4%, Fear 2.6%, Surprise 7.4%, Anticipation 15.1%, Disgust 2.9%, Sadness 9.1%, Anger 12.8%
Defining “Influence”
[Figure: a conversation between User A and User B over time]
User A: Having a tough day today. RIP Harrison. I’ll miss you a ton :/
User B: Just pray about it. God will help you.
User A: Not really religious, but thanks man. :)
User B: If you need talk you know I’m here.
Emotion labels: (Sadness), (Acceptance), (Anticipation). The slide highlights one of these tweets as the “emotion influencing tweet”.
Emotion Influences: What can you say to make your partner feel better?
• Transitions shown: Joy → Sadness, Sadness → Joy, Acceptance → Anger
• Topics shown: Topic 117 (tweet, people, don’t, read, post); Topic 59 (hurt, got, bad, pain, feel); Topic 18 (wear, look, think, love, black); Topic 24 (love, thank, great, new, look); Topic 31 (i’m, got, lmax, shit, da); Topic 13 (lmao, shit, nigga, smh, yea)
• Topic labels: Greeting, Sympathizing, Swearing, Complaining
[Figure: bar charts of emotion-influence probabilities by the emotion expressed in the influencing tweet (Anticipation, Joy, Surprise, Fear, Anger, Sadness, Disgust, Acceptance, Neutral); panels: Sadness to Joy, Joy to Anger, Anger to Joy]
Expressing Anger has a 26.5% chance of changing the partner’s emotion from Joy to Anger.
Expressing Joy has a 35.8% chance of changing the partner’s emotion from Sadness to Joy.
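These percentages can be read as conditional frequencies. The sketch below assumes each observation is a triple (the partner's emotion before, the emotion of the influencing tweet, the partner's emotion after); the triple format and `influence_probability` are hypothetical, and the toy data is made up rather than taken from the study.

```python
from collections import Counter


def influence_probability(triples, before, after):
    """For each emotion of the influencing tweet, estimate
    P(partner's emotion goes from `before` to `after` | influencing emotion).

    triples: iterable of (partner_before, influencing_emotion, partner_after).
    """
    seen = Counter()     # partner started in `before`, per influencing emotion
    changed = Counter()  # ... and then ended in `after`
    for partner_before, influencing, partner_after in triples:
        if partner_before != before:
            continue
        seen[influencing] += 1
        if partner_after == after:
            changed[influencing] += 1
    return {emo: changed[emo] / n for emo, n in seen.items()}


# Toy data, not from the study
toy = [
    ("joy", "anger", "anger"),
    ("joy", "anger", "joy"),
    ("joy", "anger", "anger"),
    ("joy", "joy", "joy"),
    ("sadness", "joy", "joy"),
]
print(influence_probability(toy, "joy", "anger"))
```

Under this reading, "26.5%" would be the value for the Anger key when `before="joy"` and `after="anger"` on the real data.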
Outliers
A: Sorry to hear about your bags. If you would like us to get someone to contact you DM us your reference and contact number.
B: it's on it's way to manch. If the woman on the check in desk in Miami hadn't been trying to be all smart! Been no problem.
A: Sorry about that. Pleased to hear they located it quickly for you though.
B: mistakes happen.
Analyzing Self-Disclosure Behaviors in Twitter Conversations Using Text Mining Techniques (Presented at ACL 2012)
JinYeong Bak, Suin Kim, Alice Oh
{jy.bak, suin.kim}@kaist.ac.kr, [email protected]
Department of Computer Science, KAIST
2012-07-11
Introduction
• In social psychology:
  • The degree of self-disclosure in a relationship depends on the strength of the relationship
  • Strategic self-disclosure can strengthen the relationship
[Illustration: “I like you too!” / “You’re my best friend!”]
Hypothesis
• Twitter conversations also show a similar pattern:
  • Dyads with high relationship strength show more self-disclosure behavior
  • Dyads with low relationship strength show less self-disclosure behavior
[Illustration: a close dyad (“I like you too!” / “You’re my best friend!”) vs. a distant dyad (“Hello~” / “Hi”)]
Methodology
• Twitter data
  • 131K users
  • 2M conversations
• Relationship strength
  • Chain frequency (CF)
  • Chain length (CL)
• Self-disclosure
  • Personal information
  • Open communication
  • Profanity
• Analysis with topic models
  • Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
  • Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])
Twitter Conversation
• A Twitter conversation chain:
  • 3 or more tweets
  • At least one reply by each user
• Our Twitter conversation data:
  • Oct 2011 to Dec 2011
  • 131K users
  • 2M chains
  • 11M tweets
[Figure: example of a conversation chain]
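A minimal Python check of the two chain conditions above, assuming each tweet records its author and whom it replies to. The `Tweet` class and `is_conversation_chain` are hypothetical names, and restricting a chain to exactly two users is an assumption based on the dyad setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tweet:
    author: str
    in_reply_to: Optional[str]  # author being replied to; None for the opening tweet
    text: str = ""


def is_conversation_chain(chain: List[Tweet]) -> bool:
    """Check the slide's conditions on a reply chain between a dyad."""
    if len(chain) < 3:  # 3 or more tweets
        return False
    authors = {t.author for t in chain}
    if len(authors) != 2:  # assumption: exactly two participants
        return False
    repliers = {t.author for t in chain if t.in_reply_to is not None}
    return repliers == authors  # at least one reply by each user


chain = [Tweet("A", None), Tweet("B", "A"), Tweet("A", "B")]
print(is_conversation_chain(chain))  # True
```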
Relationship Strength
• Social psychology literature states that relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
• CF (chain frequency): the number of conversational chains between the dyad, averaged per month
• CL (chain length): the length of conversational chains between the dyad, averaged per month
• Relationship strength:
  • A high CF or CL for a dyad means the relationship is strong
  • A low CF or CL for a dyad means the relationship is weak
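CF and CL can be computed per dyad as sketched below. The input format (one `(month, chain_length)` pair per chain) and the choice to average only over months in which the dyad has at least one chain are assumptions; the slide does not say which months enter the average.

```python
from collections import defaultdict
from statistics import mean


def relationship_strength(chains):
    """Return (CF, CL) for one dyad.

    chains: non-empty list of (month, chain_length) pairs, one per
    conversational chain, e.g. ("2011-10", 5). Assumption: averages are
    taken over the months in which the dyad has at least one chain.
    """
    lengths_by_month = defaultdict(list)
    for month, length in chains:
        lengths_by_month[month].append(length)
    # CF: number of chains in each month, averaged over months
    cf = mean(len(lengths) for lengths in lengths_by_month.values())
    # CL: average chain length within each month, averaged over months
    cl = mean(mean(lengths) for lengths in lengths_by_month.values())
    return cf, cl


print(relationship_strength([("2011-10", 3), ("2011-10", 5), ("2011-11", 4)]))
```

Averaging over the whole Oct to Dec 2011 window instead would only change the denominator of CF.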
Self-Disclosure
• Open communication (openness):
  • Negative openness
  • Nonverbal openness
  • Emotional openness
  • Receptive openness: difficult to find in tweets
  • General-style openness: not clearly defined in the literature
• Personal information:
  • Personally Identifiable Information (PII)
  • Personally Embarrassing Information (PEI)
• Profanity: nigga, ass, wtf, lmao
Self-Disclosure - Openness: Negative openness
• Method:
  • We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11]
  • ASUM is an LDA-based joint model of topic and sentiment
  • ASUM takes unannotated data and classifies each sentence (tweet) as positive, negative, or neutral
Self-Disclosure - Openness: Nonverbal openness
• Method:
  • We look for emoticons, ‘lol’, and ‘xxx’
  • Emoticons are like facial expressions: :) :( :P
  • ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are used very frequently, in a manner similar to nonverbal openness
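A minimal sketch of this check, assuming a small hand-written emoticon pattern plus the tokens 'lol' and 'xxx'. The regular expressions and `nonverbal_markers` are illustrative, not the marker list used in the study. URLs are stripped first so that "http://" is not counted as the emoticon ":/".

```python
import re

# Illustrative patterns (assumptions, not the study's list)
URL = re.compile(r"https?://\S+")
NONVERBAL = re.compile(
    r"[:;=]-?[()PpDd/]"      # simple emoticons: :) :( :P ;-) =D :/
    r"|\b(?:lol|xxx)\b",     # 'lol' and 'xxx' as whole tokens
    re.IGNORECASE,
)


def nonverbal_markers(text):
    """Return the nonverbal-openness markers found in a tweet, in order."""
    return NONVERBAL.findall(URL.sub(" ", text))


print(nonverbal_markers("haha lol :) see you tomorrow xxx"))  # ['lol', ':)', 'xxx']
```

A tweet would count as showing nonverbal openness if the returned list is non-empty.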
Self-Disclosure - Openness: Emotional openness
• Method:
  • Look for tweets that contain common expressions of feelings [We Feel Fine (Harris, J, 2009)]
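One way to look for feeling expressions, assuming they take the form of a first-person phrase such as “I feel …” or “I’m feeling …” followed by a feeling word. `FEELING_WORDS` is a hypothetical seven-word excerpt standing in for the We Feel Fine list, and `expresses_feeling` is an illustrative name.

```python
import re

# Hypothetical excerpt of a feeling-word lexicon (stand-in for the We Feel Fine list)
FEELING_WORDS = {"happy", "sad", "tired", "great", "bad", "lonely", "better"}

# "I feel X", "I'm feeling X", "I am feeling so X", ... with an optional intensifier
FEELING_PHRASE = re.compile(
    r"\bi(?:\s+am|'m|’m)?\s+feel(?:ing)?\s+(?:so\s+|really\s+|very\s+)?([a-z]+)",
    re.IGNORECASE,
)


def expresses_feeling(text):
    """True if the tweet has a first-person feeling phrase ending in a lexicon word."""
    return any(
        m.group(1).lower() in FEELING_WORDS for m in FEELING_PHRASE.finditer(text)
    )


print(expresses_feeling("Honestly I'm feeling so tired today"))  # True
print(expresses_feeling("I feel like pizza"))  # False
```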
Self-Disclosure – Personal Information
• Personally Identifiable Information (PII)
  • e.g., name, location, email address, job, social security number
• Personally Embarrassing Information (PEI)
  • e.g., clinical history, sexual life, job loss, family problem
Self-Disclosure – Personal Information
Example of PII, PEI and Profanity topics, shown by the high-probability words in each topic:
• PII 1: san, live, state, texas, south
• PII 2: tonight, time, tomorrow, good, ill
• PEI 1: pants, wear, boobs, naked, wearing
• PEI 2: teeth, doctor, dr, dentist, tooth
• PEI 3: family, brother, sister, uncle, cousin
• Profanity: nigga, lmao, shit, ass, bitch
Results
[Figure: self-disclosure measures by relationship strength (weak ↔ strong); panels: sentiment, nonverbal, emotional, profanity, PII & PEI]
[Figure: emotional openness and PII & PEI by relationship strength (weak ↔ strong)]
Results: Interpretation
• Emotional openness: when dyads are not very close, they frequently express encouragement or polite reactions to babies or pets
Results: Interpretation
• PII: when users meet new acquaintances, they use PII to introduce themselves
Results
Analyzing outliers: a weakly linked dyad that shows high self-disclosure
Distributed Online Learning for Latent Dirichlet Allocation
JinYeong Bak, Dongwoo Kim, and Alice Oh
NIPS 2012 Workshop on Big Learning
Motivation
• Problem 1: Inference for LDA takes a long time
• Problem 2: A continuously expanding corpus requires continuous updates of the model parameters
  • But plain LDA cannot update its model parameters incrementally
  • The model must be re-trained on the entire updated corpus
• Solution to 1: Distributed inference shortens inference time (Newman JMLR 2009, Wang WWW 2012)
• Solution to 2: Online (mini-batch) learning enables updates to model parameters (Hoffman NIPS 2010)
• Our approach: combine distributed inference and online learning
Distributed Online LDA
• Based on variational inference
• Mini-batch updates via stochastic learning (variational EM)
• Distribute variational EM using MapReduce
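The online part can be illustrated with the global update of online variational Bayes for LDA (Hoffman NIPS 2010): blend the current topic-word parameter lambda with an estimate computed from the mini-batch, using the step size rho_t = (tau0 + t)^(-kappa). The sketch assumes mappers have already run the per-document E-step and emitted expected topic-word counts, and that a reducer sums them. The function names and the default values of eta, tau0 and kappa are assumptions, and this is not the DoLDA code.

```python
from typing import List

Matrix = List[List[float]]  # K topics x V vocabulary words


def step_size(t: int, tau0: float = 1.0, kappa: float = 0.7) -> float:
    """Learning rate rho_t = (tau0 + t)^(-kappa), with kappa in (0.5, 1]."""
    return (tau0 + t) ** (-kappa)


def reduce_stats(shard_stats: List[Matrix]) -> Matrix:
    """Reduce step: element-wise sum of the expected topic-word counts
    (sum over the shard's documents of n_dw * phi_dwk) from each mapper."""
    K, V = len(shard_stats[0]), len(shard_stats[0][0])
    total = [[0.0] * V for _ in range(K)]
    for stats in shard_stats:
        for k in range(K):
            for w in range(V):
                total[k][w] += stats[k][w]
    return total


def online_update(lam: Matrix, shard_stats: List[Matrix], t: int,
                  num_docs: int, batch_size: int, eta: float = 0.01,
                  tau0: float = 1.0, kappa: float = 0.7) -> Matrix:
    """One mini-batch update of the topic-word variational parameter lambda."""
    sstats = reduce_stats(shard_stats)
    rho = step_size(t, tau0, kappa)
    scale = num_docs / batch_size  # scale the mini-batch up to the corpus size
    return [
        [(1 - rho) * lam[k][w] + rho * (eta + scale * sstats[k][w])
         for w in range(len(lam[k]))]
        for k in range(len(lam))
    ]
```

Because the reduce step is a plain sum, shards can be processed independently, which is what makes a MapReduce split natural for this update.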
Experimental Setup
• Data:
  • 5.1M Twitter conversations
  • 4.8M English Wikipedia articles
• 60-node Hadoop system
  • Each node has 8 × 2.30 GHz cores
Wikipedia Results
Example topics (top words):
• Topic 0: relativity, physics, einstein, quantum, gravity
• Topic 22: channel, television, tv, cable, news
• Topic 42: milk, chocolate, sugar, food, cream
• Topic 65: god, bible, moses, chapter, genesis
• Topic 94: party, election, president, member, elected
• Topic 170: season, team, league, game, football
• Topic 232: album, song, band, music, released
Minibatch size | oLDA | DoLDA | Speedup
16,384 | 238666.25 | 47994.03 | 4.97
32,768 | 188508.71 | 33470.03 | 5.63
65,536 | 206290.27 | 26788.53 | 7.70
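The Speedup column is consistent with the ratio of the oLDA column to the DoLDA column, as the short check below shows. It assumes both columns are running times in the same unit, which the slide does not state.

```python
# Timings from the table (unit not stated on the slide)
timings = {
    16_384: (238666.25, 47994.03),
    32_768: (188508.71, 33470.03),
    65_536: (206290.27, 26788.53),
}

# Speedup = oLDA time / DoLDA time, rounded to two decimals as in the table
speedups = {size: round(olda / dolda, 2) for size, (olda, dolda) in timings.items()}
print(speedups)  # {16384: 4.97, 32768: 5.63, 65536: 7.7}
```

The speedup grows with the minibatch size, from 4.97 at 16,384 to 7.70 at 65,536.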
Twitter Temporal Patterns of Topics
<Topic words: party vote people politics obama>
[Figure: document proportion of this topic by day, 10-10 to 12-04]
Conversation b1 on November 2, 2010
A: I wish I could vote today, but I have to work for 14 hours
B: is it legal for them not to give you time off to vote?
A: probably
Conversation b2 on March 31, 2012
A: Mitt Romney: "Obama should release the notes and transcripts of all his meetings with world leaders"
B: Why is he being held to higher standard than any other president.
A: did you see my Santorum 'slip' tweet? Is the media afraid to comment on it?
B: oh yes I did. I saw it mentioned yesterday also. disgusting and he should be raked over hot coals for it.
<Topic words: school mate class teacher grade>
[Figure: document proportion of this topic by day, 11-07 to 12-01]
Conversation c1 on September 5, 2011
A: Oh god, miss Waite ran over to me up the school just now! :L on the plus subjects are now picked! :D
B: what did you pick??
A: english, RE, art and psychology! :) was unsure between history and psych but found out bubbles was teaching it so nooo! :L
Conversation c2 on October 12, 2011
A: :) My day's been okay! It feels long! But school' was okayish. I hope you have an awesome day! :D
B: that's good then! Ahh hope it's not cause anything bad happened? Thanks! Have a great sleep :)
A: no! Class was just boring lol and thanks! :) i will! Even though i have to wake up early tomorrow for a midterm! :S
CAVEAT
Big data and social media data do not always give the right answers! They contain a lot of noise and bias.
Sentiment analysis is also full of problems at the big-data level, because every small assumption can cause wide swings in the final interpretation of the data.
Social media data are valuable because they have opened up possibilities for analyzing naturally occurring data in huge amounts.
We need better methods and tools that are tailored for social media.
We need to ask the right questions that can be answered well despite the biases of social media data.
For details, visit our webpage: http://uilab.kaist.ac.kr
Or email me: [email protected]