Upload
jinyeong-bak
View
133
Download
2
Tags:
Embed Size (px)
Citation preview
Self-disclosure in Twitter conversations
JinYeong [email protected]
Department of Computer Science, KAIST
About Me
2014-10-232
JinYeong Bak Ph.D. student at KAIST, U&I Lab Research interests Bayesian Data Analysis Computational Social Science
About Me
2014-10-232
JinYeong Bak Ph.D. student at KAIST, U&I Lab Research interests Bayesian Data Analysis Computational Social Science
Research Intern, MSRA, 2013, Supervisor: Chin-Yew Lin Related publications Self-Disclosure and Relationship Strength in Twitter Conversations,
ACL 2012 (with Suin Kim, Alice Oh) Self-disclosure topic model for classifying and analyzing Twitter
conversations, EMNLP 2014 (with Chin-Yew Lin, Alice Oh)
2014-10-23
Overview
2014-10-23
Self-disclosure
4 2014-10-23
Self-disclosure
4 2014-10-23
Self-disclosure
4 2014-10-23
Self-disclosure
4 2014-10-23
Limitations in Previous Works
2014-10-235
Survey Hand coding Lab environment
Limitations in Previous Works
2014-10-235
Survey Hand coding Lab environment
Hard to identify self-disclosure
in naturally occurring and large dataset
Twitter Conversations
https://twitter.com/britneyspears
Example of a Twitter conversation
6 2014-10-23
Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM)
7 2014-10-23
Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM)
7 2014-10-23
Accuracy and average F1
Self-disclosure & Social featuresWhat are relations between self-disclosure and social features in Twitter conversations?
8
2014-10-23
Self-disclosure (SD)
The verbal expressions by which a person reveals aspects of self to others [Jourard1971b]
Process of making the self known to others [Jourard&Lasakow1958]
3~40% of everyday conversation is consist of self-disclosure [Dunbar et al.1997]
Self-disclosure: Definition
10 2014-10-23
Self-disclosure: Level
2014-10-2311
Self-disclosure level [Vondracek et al.1971, Barak et al.2007]
No disclosure (G level)General information and ideas
Medium disclosure (M level)General information about self or someone close to him
High disclosure (H level)Sensitive information about self or someone close to him
Self-disclosure: G level General information and ideas No information about self or someone close to him
12 2014-10-23
Self-disclosure: M level General information about self or someone close to him Personal events, age, occupation and family members
13 2014-10-23
Self-disclosure: H level Sensitive information about self or someone close to him Problematic behaviors of self and family members Physical appearance, health, death, sexual topics
14 2014-10-23
Self-disclosure: Relations
2014-10-2315
Human relationship Degree of self-disclosure in a relationship depends on the
strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship
Self-disclosure: Relations
2014-10-2316
Benefits Can get social support from others [Derlega et al.1993]
Can cope with stress [Derlega et al.1993,Tamir and Mitchell2012]
Examples
Self-disclosure: Relations
2014-10-2317
Consideration Easy to be attacked when private information is opened Need to manage privacy boundary (e.g. people, topics) [Petronio2002]
Example
Limitations in Previous Works
2014-10-2318
Survey Asking questions to participants Cons) Biased by participants memory
Limitations in Previous Works
2014-10-2318
Survey Asking questions to participants Cons) Biased by participants memory
Hand coding Analyzing dataset by human Cons) Cannot apply to large dataset
Limitations in Previous Works
2014-10-2318
Survey Asking questions to participants Cons) Biased by participants memory
Hand coding Analyzing dataset by human Cons) Cannot apply to large dataset
Lab environment Experiments held in lab or artificial environment Cons) Not real/naturally occurring dataset
Research Questions
2014-10-2319
How can we find self-disclosure in large & naturally occurring corpus automatically?
Research Questions
2014-10-2319
How can we find self-disclosure in large & naturally occurring corpus automatically?
What are relations between self-disclosure and social features in large & naturally occurring corpus?
2014-10-23
Twitter Conversations
21
Online social networking service www.twitter.com 200 million users send over 400 million tweets daily
(2013.09)
2014-10-23https://twitter.com/NoSyu
Tweet
2014-10-2322
Users write 140-characters messages Users mention others or re-tweet other’s tweet
Conversation in Twitter
2014-10-2323
Users have a conversation in Twitter
https://twitter.com/britneyspears
Conversation Topics
2014-10-2324
Users discuss several topics with others
Soccer Politics
Conversation Topics
2014-10-2325
Users discuss several topics with others
Places Family
Twitter Conversations A Twitter conversation 5 or more tweets At least one reply by each user
https://twitter.com/britneyspears
Example of a Twitter conversation
26 2014-10-23
Twitter Conversations A Twitter conversation 5 or more tweets At least one reply by each user
Twitter conversation data Aug 2007 to Jul 2013 102K users 2M conversations 17M tweets
https://twitter.com/britneyspears
Example of a Twitter conversation
26 2014-10-23
2014-10-23
Self-disclosure and relationship strength in Twitter conversations
ACL 2012 short paper
2014-10-23
Self-disclosure: Relations
2014-10-2328
Human relationship Degree of self-disclosure in a relationship depends on the
strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship
Research Question
2014-10-2329
Does Twitter conversations also show a similar pattern? Dyads with high relationship strength show more self-
disclosure behavior Dyads with low relationship strength show less self-disclosure
behavior
Methodology Twitter data 131K users 2M conversations
30 2014-10-23
Methodology Twitter data 131K users 2M conversations
Relationship strength Conversation frequency (CF) Conversation length (CL)
30 2014-10-23
Methodology Twitter data 131K users 2M conversations
Relationship strength Conversation frequency (CF) Conversation length (CL)
Self-disclosure Personal information Profanity
30 2014-10-23
Methodology Twitter data 131K users 2M conversations
Relationship strength Conversation frequency (CF) Conversation length (CL)
Self-disclosure Personal information Profanity
Analysis with topic models Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
30 2014-10-23
Relationship Strength CF: conversation frequency The number of conversational chains between the dyad
averaged per month
CL: conversation length The length of conversational chains between the dyad
averaged per month
31 2014-10-23
Relationship Strength CF: conversation frequency The number of conversational chains between the dyad
averaged per month
CL: conversation length The length of conversational chains between the dyad
averaged per month
Relationship strength A high CF or CL for a dyad means the relationship is strong A low CF or CL for a dyad means the relationship is weak
31 2014-10-23
Self-disclosure Personal information Personally Identifiable Information (PII) Personally Embarrassing Information (PEI)
Profanity nigga, ass, wtf, lmao
32 2014-10-23
Self-disclosure: Personal InformationPersonally Identifiable Information (PII)
Personally Embarrassing Information (PEI)
33 2014-10-23
Ex) name, location, email address, job,social security number
Ex) clinical history,sexual life,job loss, family problem
Self-disclosure: Personal Information Discover topics in each conversation Use LDA [Blei2003] with 𝑘𝑘 = 300 LDA outputs a topic proportion for each conversation LDA outputs a multinomial word distribution for each topic
34 2014-10-23
Self-disclosure: Personal Information Discover topics in each conversation Use LDA [Blei2003] with 𝑘𝑘 = 300 LDA outputs a topic proportion for each conversation LDA outputs a multinomial word distribution for each topic
Find related topics Annotate conversations that best represent each topic Use Amazon Mechanical Turk Turkers annotated conversations for
Existence of PII
Existence of PEI
Keywords
34 2014-10-23
Self-disclosure: Personal InformationExample of PII, PEI and Profanity topics Shown by high probability words in each topic
PII 1 PII 2 PEI 1 PEI 2 PEI 3 Profanity
san tonight pants teeth family nigga
live time wear doctor brother lmao
state tomorrow boobs dr sister shit
texas good naked dentist uncle ass
south ill wearing tooth cousin bitch
35 2014-10-23
Results
2014-10-2336
weak strong weak strong
Profanity PII & PEI
Conversation Frequency Conversation Length
Profanity PII & PEI
Results
2014-10-2337
weak strong weak strong
profanity PII & PEI
Conversation Frequency Conversation Length
profanity PII & PEI
Results: Interpretation PII When they meet new acquaintances,
they use PII to introduce themselves
38 2014-10-23
Summary Used a large corpus of Twitter conversations Measured relationship strength by conversation frequency
and conversation length Measured self-disclosure by PII, PEI Profanity
Confirmed hypothesis that stronger relationships show more self-disclosure behaviors in Twitter conversations
39 2014-10-23
Weakness of the Paper
2014-10-2340
Use naïve definition of degree of self-disclosure PII, PEI, Profanity Need to use more concrete definition for self-disclosure degree
Weakness of the Paper
2014-10-2340
Use naïve definition of degree of self-disclosure PII, PEI, Profanity Need to use more concrete definition for self-disclosure degreeSelf-disclosure level
Weakness of the Paper
2014-10-2340
Use naïve definition of degree of self-disclosure PII, PEI, Profanity Need to use more concrete definition for self-disclosure degree
Use naïve computational method LDA with post-processing Need to build more concrete novel method
Self-disclosure level
Weakness of the Paper
2014-10-2340
Use naïve definition of degree of self-disclosure PII, PEI, Profanity Need to use more concrete definition for self-disclosure degree
Use naïve computational method LDA with post-processing Need to build more concrete novel method
Self-disclosure level
Self-disclosure Topic Model
2014-10-23
Self-disclosure Topic Model (SDTM)
EMNLP 2014 long paper
2014-10-23
Difficulties for SD research Lack of ground-truth dataset of SD level No tagged dataset for Twitter conversation No accessible self-disclosure datasets
42 2014-10-23
Difficulties for SD research Lack of ground-truth dataset of SD level No tagged dataset for Twitter conversation No accessible self-disclosure datasets
Lack of study about SD in computational linguistics Definitions and relations with others in social psychology Survey or hand-coding Related word categories in LIWC [Houghton et al.2012]
42 2014-10-23
Ground-truth Dataset Process Sample random 301
Twitter conversations Ask it to three judges Tag self-disclosure level
to each tweet Work on a web-based platform
43Screenshot of annotation web-based platform
2014-10-23
Ground-truth Dataset Process Sample random 301
Twitter conversations Ask it to three judges Tag self-disclosure level
to each tweet Work on a web-based platform
Result Tagged G: 122, M: 147, H: 32
conversations Fleiss kappa: 0.68
43Screenshot of annotation web-based platform
2014-10-23
Assumptions: First person pronounsFirst person pronouns are good indicators for self-disclosure Ex) ‘I’, ‘My’ Used in previous research [Joinson et al.2001, Barak et al.2007]
44 2014-10-23
Assumptions: First person pronounsFirst person pronouns are good indicators for self-disclosure Ex) ‘I’, ‘My’ Observed highly discriminative features between G and M/H in
annotated dataset
45
Unigram Bigram Trigram
my I love I have a
I I was is going to
I’m I have to go to
but my dad want to go
was go to and I was
I’ve my mom going to miss
2014-10-23
Assumptions: TopicsM and H level have different topics [General vs Sensitive] information about self or intimate
46 2014-10-23
Assumptions: TopicsSelf-disclosure related topics by LDA
Location Time Adult Health Family Profanity
san tonight pants teeth family nigga
live time wear doctor brother lmao
state tomorrow boobs dr sister shit
texas good naked dentist uncle ass
south ill wearing tooth cousin bitch
47 2014-10-23
Assumptions: TopicsM and H level have different topics [General vs Sensitive] information about self or intimate Can be formalized as topics
Personally Identifiable Information General information about self Ex) name, location, email address, job, …
Secrets Sensitive information about self Ex) physical appearance, health, sexuality, death, …
48 2014-10-23
Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM) Based on probabilistic topic modeling
49 2014-10-23
Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM) Based on probabilistic topic modeling Classifying G and M/H level Observed first-person pronouns Using learned maximum entropy classifier
49 2014-10-23
Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM) Based on probabilistic topic modeling Classifying G and M/H level Observed first-person pronouns Using learned maximum entropy classifier
Classifying M and H level Observed words Using seed words for each level
49 2014-10-23
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Self-Disclosure Topic Model (SDTM)
2014-10-2350
Rough description of how to infer self-disclosure in SDTM
Maximum Entropy Classifier
Topic Model
G level
M level
H level
Topic Model with Seed Words
Tweet
Maximum Entropy Classifier
2014-10-2351
Learned from annotated dataset Works better than others
(C4.5, Naïve Bayes, SVM with linear kernel, polynomial kernel and radial basis)
Used to identify aspect and opinions in topic model [Zhao2010]
Seed WordsSeed words are prior knowledge for each level G level No seed words (symmetric prior)
M level Data-driven approach in Twitter conversation
H level Data-driven approach from external dataset
52 2014-10-23
Seed Words M level Data-driven approach
Use Twitter conversation dataset
Get frequently occurred trigram that begin with ‘I’ and ‘my’
53 2014-10-23
Seed Words M level Data-driven approach
Use Twitter conversation dataset
Get frequently occurred trigram that begin with ‘I’ and ‘my’
Example seed words
53
Name Birthday Location Occupation
My name is My birthday is I live in My job is
My last name My birthday party I lived in My new job
My real name My bday is I live on My high school
2014-10-23
Seed Words H level Data-driven approach
Use external dataset (Six Billion Secrets) http://www.sixbillionsecrets.com Users write and share his/her secrets 26,523 posts
Extract high ranked word features
54 2014-10-23
Example of secret posts in Six Billion Secrets
Seed Words H level Data-driven approach
Use external dataset (Six Billion Secrets) http://www.sixbillionsecrets.com Users write and share his/her secrets 26,523 posts
Extract high ranked word features
Example seed words
54
Physical appearance Health condition Death
chubby addicted deadfat surgery died
scar syndrome suicideacne disorder funeral
2014-10-23
Example of secret posts in Six Billion Secrets
Classifying Performance Data Annotated Twitter conversation Random shuffled 80/20 train/test
55 2014-10-23
Classifying Performance Data Annotated Twitter conversation Random shuffled 80/20 train/test
Methods BOW+
Bag of Words + Bigrams + Trigrams features, Maximum entropy FirstP
Occurrence of first-person pronouns features, Maximum entropy SEED
Seed words and trigrams features, Maximum entropy FirstP+SEED
FirstP and SEED feature, Two stage Maximum entropy SDTM
Self-disclosure Topic Model
55 2014-10-23
Classifying Performance
56 2014-10-23
Classifying Performance
56 2014-10-23
Classifying Performance
57 2014-10-23
Classifying Performance
57 2014-10-23
Classifying Performance
57 2014-10-23
Classifying Performance
57 2014-10-23
2014-10-23
Self-disclosure & Social features
EMNLP 2014 long paper
Self-disclosure & Social featuresWhat are relations between self-disclosure and social features in Twitter conversations? Research questions
1. Does high self-disclosure lead to longer conversations?2. Is there difference in conversation length patterns over time
depending on overall self-disclosure level?3. Does high self-disclosure users have many conversation partners?4. Does high self-disclosure users have more conversations
frequently?
59
Research QuestionsQ1) Does high self-disclosure lead to longer conversations?
60 2014-10-23
Research QuestionsQ2) Is there difference in conversation length patterns over time depending on overall self-disclosure level?
61 2014-10-23
High SD level dyad
Low SD level dyad
Research Questions
2014-10-2362
Q3) Does high self-disclosure users have many conversation partners?
High SD level userLow SD level user
Research Questions
2014-10-2363
Q4) Does high self-disclosure users have more conversations frequently?
High SD level userLow SD level user
ResultsHigh ranked topics in each level (G, M, H levels)
Shown by high probability words in each topic
G 1 G 2 M 1 M 2 H 1 H 2
obama league send going better ass
he’s win email party sick bitch
romney game i’ll weekend feel fuck
vote season sent day throat yo
right team dm night cold shit
president cup address dinner hope fucking
64 2014-10-23
ResultsQ1) Does high self-disclosure lead to longer conversations?Ans) Positive relations between initial SD level and changes CL
65 2014-10-23
ResultsQ2) Is there difference in CL patterns over time by overall SD level?
Ans) ‘high’ and ‘mid’ groups increase CL over time, not ‘low’
‘high’ groups talk more in a conversation than ‘mid’ & ‘low’ groups
66 2014-10-23
Results
2014-10-2367
Q3) Does high self-disclosure users have many conversation partners?
Ans) ‘mid’ self-disclosure users have more conversation partners than others
# Partners # Conv / Day Words / Conv Conv Length
low 3.33 0.46 59.17 4.13
mid 3.55 0.52 61.17 4.28
high 3.47 0.54 63.26 4.45
p-value <0.001 <0.001 <0.1 <0.001
Results
2014-10-2368
Q4) Does high self-disclosure users have more conversations frequently?
Ans) ‘high’ self-disclosure users have more conversations per day than others
# Partners # Conv / Day Words / Conv Conv Length
low 3.33 0.46 59.17 4.13
mid 3.55 0.52 61.17 4.28
high 3.47 0.54 63.26 4.45
p-value <0.001 <0.001 <0.1 <0.001
Results
2014-10-2369
Finding) Researchers often look at the number of words in a conversation
for relation with self-disclosure Conversation length is more significant than # words
# Partners # Conv / Day Words / Conv Conv Length
low 3.33 0.46 59.17 4.13
mid 3.55 0.52 61.17 4.28
high 3.47 0.54 63.26 4.45
p-value <0.001 <0.001 <0.1 <0.001
Summary
2014-10-2370
Self-disclosure (SD) Definition from social psychology Limitations in previous research
Computational approaches for self-disclosure Twitter conversation dataset Self-disclosure topic model (SDTM)
Self-disclosure & Social features Relationship strength over time Conversation partners and frequency
Future Work
2014-10-2371
Self-disclosure for a user timeline tweets Have positive relations with
Loneliness [Al-Saggaf.2014]
Online social network usage [Trepte.2013]
Predict user’s Loneliness and give a social support
Usage patterns in online social network and give feedback
Self-disclosure by machine Looks like human in dialogue system Can increase satisfaction in talking cure dialogue system
Reference
2014-10-2372
[Jourard1971b] Sidney M Jourard. 1971b. The transparent self (rev. ed.). Princeton, NJ: VanNostrand.
[Jourard1958] Sidney M Jourard and Paul Lasakow. 1958. Some factors in self-disclosure. The Journal of Abnormal and Social Psychology, 56(1):91.
[Dunbar et al.1997] Robin IM Dunbar, Anna Marriott, and Neil DC Duncan. 1997. Human conversational behavior. Human Nature, 8(3):231–246.
[Vondracek and Vondracek1971] Sarah I Vondracek and Fred W Vondracek. 1971. The manipulation and measurement of self-disclosure in preadolescents. Merrill-Palmer Quarterly of Behavior and Development, 17(1):51–58.
[Chelune and others1979] Gordon J Chelune et al. 1979. Self-disclosure: Origins, patterns, and implications of openness in interpersonal relationships. Jossey-Bass San Francisco.
[Barak&Gluck-Ofri2007] Azy Barak and Orit Gluck-Ofri. 2007. Degree and reciprocity of self-disclosure in online forums. CyberPsychology & Behavior, 10(3):407–417.
[Jo2011] Jo, Yohan, and Alice H. Oh. "Aspect and sentiment unification model for online review analysis." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
Reference
2014-10-2373
[Tamir and Mitchell2012] Diana I Tamir and Jason P Mitchell. 2012. Disclosing information about the self is intrinsically rewarding. roceedings of the National Academy of Sciences, 109(21):8038–8043.
[Duck2007] Steve Duck. 2007. Human relationships. Sage. [Bak et al.2012] JinYeong Bak, Suin Kim, and Alice Oh. 2012. Self-disclosure and
relationship strength in twitter conversations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 60–64. Association for Computational Linguistics.
[Derlega et al.1993] Valerian J. Derlega, Sandra Metts, Sandra Petronio, and Stephen T. Margulis. 1993. Self-Disclosure, volume 5 of SAGE Series on Close Relationships. SAGE Publications, Inc.
[Wills1985] Thomas Ashby Wills. 1985. Supportive functions of interpersonal relationships.
[Trepte.2013] Sabine Trepte and Leonard Reinecke. 2013. The reciprocal effects of social network site use and the disposition for selfdisclosure: A longitudinal study. Computers in Human Behavior, 29(3):1102 – 1112.
[Harris, J, 2009] Kamvar, Sep, and Jonathan Harris. We feel fine: An almanac of human emotion. Simon and Schuster, 2009.
Reference
2014-10-2374
[Houghton and Joinson2012] David J Houghton and Adam N Joinson. 2012. Linguistic markers of secrets and sensitive self-disclosure in twitter. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 3480–3489. IEEE.
[Steinfield et al.2008] Charles Steinfield, Nicole B Ellison, and Cliff Lampe. 2008. Social capital, selfesteem, and use of online social network sites: A longitudinal analysis. Journal of Applied Developmental Psychology, 29(6):434–445.
[Petronio2002] Petronio, S. 2002. Boundaries of privacy: Dialectics of disclosure. 29. Albany, NY
[Valkenburg2011] Valkenburg, Patti M and Sumter. 2011. Sindy R and Peter, Jochen, Gender differences in online and offline self-disclosure in pre-adolescence and adolescence. British journal of developmental psychology
[Sprecher2012] Susan Sprecher, Stanislav Treger and Joshua D. Wondra. 2012. Effects of self-disclosure role on liking, closeness, and other impressions in get-acquainted interactions. Journal of Social and Personal Relationships.
[Zhao2010] Wayne Xin Zhao, Jing Jiang, HongfeiYan, and Xiaoming Li. 2010. Jointly modeling aspects and opinions with a maxent-lda hybrid. In Proceedings of EMNLP.
Thank you!Any questions or comments?
JinYeong [email protected]
Department of Computer Science, KAIST
75 2014-10-23