Self-disclosure in twitter conversations - talk in QCRI

Self-disclosure in Twitter conversations

JinYeong Bak

Department of Computer Science, KAIST

About Me

JinYeong Bak
Ph.D. student at KAIST, U&I Lab
Research interests: Bayesian Data Analysis, Computational Social Science
Research Intern, MSRA, 2013 (Supervisor: Chin-Yew Lin)

Related publications
Self-Disclosure and Relationship Strength in Twitter Conversations, ACL 2012 (with Suin Kim, Alice Oh)
Self-disclosure topic model for classifying and analyzing Twitter conversations, EMNLP 2014 (with Chin-Yew Lin, Alice Oh)

2014-10-23

Overview

Self-disclosure

Limitations in Previous Works

Survey, hand coding, lab environment

Hard to identify self-disclosure in naturally occurring and large datasets

Twitter Conversations

https://twitter.com/britneyspears

Example of a Twitter conversation

Self-Disclosure Topic Model (SDTM)

[Figure: graphical model of the Self-Disclosure Topic Model]

[Figure: accuracy and average F1]

Self-disclosure & Social features

What are the relations between self-disclosure and social features in Twitter conversations?

Self-disclosure (SD)

Self-disclosure: Definition

The verbal expressions by which a person reveals aspects of self to others [Jourard1971b]

Process of making the self known to others [Jourard&Lasakow1958]

3~40% of everyday conversation consists of self-disclosure [Dunbar et al.1997]

Self-disclosure: Level

Self-disclosure level [Vondracek et al.1971, Barak et al.2007]

No disclosure (G level): General information and ideas

Medium disclosure (M level): General information about self or someone close to him

High disclosure (H level): Sensitive information about self or someone close to him

Self-disclosure: G level

General information and ideas
No information about self or someone close to him

Self-disclosure: M level

General information about self or someone close to him
Personal events, age, occupation and family members

Self-disclosure: H level

Sensitive information about self or someone close to him
Problematic behaviors of self and family members
Physical appearance, health, death, sexual topics

Self-disclosure: Relations

Human relationship
Degree of self-disclosure in a relationship depends on the strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship


Benefits
Can get social support from others [Derlega et al.1993]
Can cope with stress [Derlega et al.1993, Tamir and Mitchell2012]

Examples


Consideration
Easy to be attacked once private information is disclosed
Need to manage privacy boundaries (e.g. people, topics) [Petronio2002]

Example


Limitations in Previous Works

Survey
Asking questions to participants
Cons) Biased by participants' memory

Hand coding
Analyzing dataset by human
Cons) Cannot apply to large dataset

Lab environment
Experiments held in lab or artificial environment
Cons) Not real/naturally occurring dataset

Research Questions

How can we find self-disclosure in a large, naturally occurring corpus automatically?

What are the relations between self-disclosure and social features in a large, naturally occurring corpus?

Twitter Conversations

Twitter

Online social networking service (www.twitter.com)
200 million users send over 400 million tweets daily (2013.09)

https://twitter.com/NoSyu

Tweet

Users write 140-character messages
Users mention others or retweet others' tweets

Conversation in Twitter

Users have a conversation in Twitter



Conversation Topics

Users discuss several topics with others

Soccer, Politics, Places, Family

Twitter Conversations

A Twitter conversation
5 or more tweets
At least one reply by each user

Twitter conversation data
Aug 2007 to Jul 2013
102K users
2M conversations
17M tweets
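The two criteria for a Twitter conversation stated above (5 or more tweets, at least one reply by each user) can be sketched as a simple filter. This is only an illustration: the dictionary keys `user` and `reply_to` are hypothetical field names, and the check that at least two users take part is an added assumption, not something the slides state.

```python
def is_conversation(tweets, min_tweets=5):
    """Return True if a chain of tweets meets the conversation criteria.

    tweets: list of dicts with hypothetical keys
        "user"     - author of the tweet
        "reply_to" - user being replied to, or None for a non-reply tweet
    """
    if len(tweets) < min_tweets:
        return False
    participants = {t["user"] for t in tweets}
    # Users who wrote at least one reply in this chain.
    repliers = {t["user"] for t in tweets if t.get("reply_to") is not None}
    # Assumption: a conversation needs at least two distinct users.
    return len(participants) >= 2 and repliers == participants


chain = [
    {"user": "a", "reply_to": None},
    {"user": "b", "reply_to": "a"},
    {"user": "a", "reply_to": "b"},
    {"user": "b", "reply_to": "a"},
    {"user": "a", "reply_to": "b"},
]
print(is_conversation(chain))
```

Shorter chains, or chains in which one user never replies, are rejected.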




Self-disclosure and relationship strength in Twitter conversations

ACL 2012 short paper

Human relationship
Degree of self-disclosure in a relationship depends on the strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship

Research Question

Do Twitter conversations also show a similar pattern?
Dyads with high relationship strength show more self-disclosure behavior
Dyads with low relationship strength show less self-disclosure behavior

Methodology

Twitter data
131K users
2M conversations

Relationship strength
Conversation frequency (CF)
Conversation length (CL)

Self-disclosure
Personal information
Profanity

Analysis with topic models
Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])

Relationship Strength

CF: conversation frequency
The number of conversational chains between the dyad, averaged per month

CL: conversation length
The length of conversational chains between the dyad, averaged per month

Relationship strength
A high CF or CL for a dyad means the relationship is strong
A low CF or CL for a dyad means the relationship is weak
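A minimal sketch of the two measures for one dyad follows. The slides do not spell out the averaging, so two assumptions are made here: CF divides the number of chains by the number of months in an observation window (`n_months`, a hypothetical parameter), and CL is the mean chain length within each active month, averaged over those months.

```python
from collections import defaultdict
from statistics import mean


def relationship_strength(chains, n_months):
    """Compute (CF, CL) for one dyad.

    chains: list of (month_key, n_tweets) pairs, one per conversational chain
    n_months: length of the observation window in months (assumption)
    """
    if n_months <= 0:
        raise ValueError("n_months must be positive")
    # CF: chains per month over the whole window.
    cf = len(chains) / n_months
    # CL: mean chain length per active month, then averaged over months.
    by_month = defaultdict(list)
    for month, n_tweets in chains:
        by_month[month].append(n_tweets)
    cl = mean(mean(lengths) for lengths in by_month.values()) if by_month else 0.0
    return cf, cl


cf, cl = relationship_strength(
    [("2013-01", 5), ("2013-01", 7), ("2013-02", 10)], n_months=3
)
print(cf, cl)
```

Dyads can then be ranked by either value to separate strong relationships from weak ones.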

Self-disclosure

Personal information
Personally Identifiable Information (PII)
Personally Embarrassing Information (PEI)

Profanity
nigga, ass, wtf, lmao

Self-disclosure: Personal Information

Personally Identifiable Information (PII)
Ex) name, location, email address, job, social security number

Personally Embarrassing Information (PEI)
Ex) clinical history, sexual life, job loss, family problem

Self-disclosure: Personal Information

Discover topics in each conversation
Use LDA [Blei2003] with k = 300
LDA outputs a topic proportion for each conversation
LDA outputs a multinomial word distribution for each topic

Find related topics
Annotate conversations that best represent each topic
Use Amazon Mechanical Turk
Turkers annotated conversations for: existence of PII, existence of PEI, keywords
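One way to pick the conversations that "best represent" each topic is to take those with the highest topic proportion. The sketch below assumes the LDA output is already available as a mapping from a conversation id to its list of topic proportions. The name `theta` and the cut-off `n` are illustrative, and the slides do not say how the representative conversations were actually chosen.

```python
import heapq


def top_conversations_per_topic(theta, n=3):
    """Return, for each topic index, the n conversation ids with the largest proportion.

    theta: dict mapping conversation id -> list of topic proportions
           (all lists must have the same length k)
    """
    if not theta:
        return {}
    k = len(next(iter(theta.values())))
    result = {}
    for topic in range(k):
        # nlargest iterates over the conversation ids and ranks them by proportion.
        result[topic] = heapq.nlargest(n, theta, key=lambda conv: theta[conv][topic])
    return result


theta = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}
print(top_conversations_per_topic(theta, n=2))
```

The selected conversations would then be the ones sent to annotators.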

Self-disclosure: Personal Information

Example of PII, PEI and Profanity topics
Shown by high probability words in each topic

PII 1   PII 2      PEI 1     PEI 2     PEI 3     Profanity
san     tonight    pants     teeth     family    nigga
live    time       wear      doctor    brother   lmao
state   tomorrow   boobs     dr        sister    shit
texas   good       naked     dentist   uncle     ass
south   ill        wearing   tooth     cousin    bitch

Results

[Figure: Profanity and PII & PEI, weak vs. strong relationships; panels: Conversation Frequency, Conversation Length]

Results: Interpretation

PII
When they meet new acquaintances, they use PII to introduce themselves

Summary

Used a large corpus of Twitter conversations
Measured relationship strength by conversation frequency and conversation length
Measured self-disclosure by PII, PEI, and Profanity
Confirmed the hypothesis that stronger relationships show more self-disclosure behaviors in Twitter conversations

Weakness of the Paper

Use naïve definition of degree of self-disclosure
PII, PEI, Profanity
Need to use a more concrete definition for self-disclosure degree
Self-disclosure level

Use naïve computational method
LDA with post-processing
Need to build a more concrete, novel method
Self-disclosure Topic Model

Self-disclosure Topic Model (SDTM)

EMNLP 2014 long paper

Difficulties for SD research

Lack of ground-truth dataset of SD level
No tagged dataset for Twitter conversation
No accessible self-disclosure datasets

Lack of study about SD in computational linguistics
Definitions and relations with others in social psychology
Survey or hand-coding
Related word categories in LIWC [Houghton et al.2012]

Ground-truth Dataset

Process
Sample 301 random Twitter conversations
Ask three judges to tag the self-disclosure level of each tweet
Work on a web-based platform

Result
Tagged G: 122, M: 147, H: 32 conversations
Fleiss kappa: 0.68
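For reference, Fleiss' kappa can be computed from a table of per-item category counts. The sketch below uses the standard formula and assumes every item is rated by the same number of judges. The toy counts are invented for illustration and do not reproduce the reported 0.68.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of judges who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Observed agreement: share of agreeing judge pairs per item, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 4 items, 3 judges, categories (G, M, H).
toy = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [1, 1, 1]]
print(round(fleiss_kappa(toy), 3))
```

A value of 1 means perfect agreement, and a value near 0 means agreement no better than chance.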

[Figure: screenshot of the web-based annotation platform]

Assumptions: First person pronouns

First person pronouns are good indicators for self-disclosure
Ex) ‘I’, ‘My’
Used in previous research [Joinson et al.2001, Barak et al.2007]
Observed as highly discriminative features between G and M/H in the annotated dataset

Unigram   Bigram   Trigram
my        I love   I have a
I         I was    is going to
I’m       I have   to go to
but       my dad   want to go
was       go to    and I was
I’ve      my mom   going to miss

Assumptions: Topics

M and H levels have different topics
[General vs Sensitive] information about self or intimate

Self-disclosure related topics by LDA

Location   Time       Adult     Health    Family    Profanity
san        tonight    pants     teeth     family    nigga
live       time       wear      doctor    brother   lmao
state      tomorrow   boobs     dr        sister    shit
texas      good       naked     dentist   uncle     ass
south      ill        wearing   tooth     cousin    bitch

Can be formalized as topics

Personally Identifiable Information
General information about self
Ex) name, location, email address, job, …

Secrets
Sensitive information about self
Ex) physical appearance, health, sexuality, death, …


Self-Disclosure Topic Model (SDTM)

Based on probabilistic topic modeling

Classifying G and M/H level
Observed first-person pronouns
Using learned maximum entropy classifier

Classifying M and H level
Observed words
Using seed words for each level


[Figure: rough description of how to infer self-disclosure in SDTM; components: Tweet, Maximum Entropy Classifier, Topic Model, Topic Model with Seed Words, G level, M level, H level]

Maximum Entropy Classifier

Learned from annotated dataset
Works better than others (C4.5, Naïve Bayes, SVM with linear, polynomial and radial basis kernels)
Used to identify aspects and opinions in topic model [Zhao2010]

Seed Words

Seed words are prior knowledge for each level

G level
No seed words (symmetric prior)

M level
Data-driven approach in Twitter conversation

H level
Data-driven approach from external dataset

Seed Words

M level
Data-driven approach
Use Twitter conversation dataset
Get frequently occurring trigrams that begin with ‘I’ or ‘my’

Example seed words

Name           Birthday            Location     Occupation
My name is     My birthday is      I live in    My job is
My last name   My birthday party   I lived in   My new job
My real name   My bday is          I live on    My high school
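The trigram step above can be sketched with a plain counter. The tokenizer (lowercasing plus a simple regular expression) and the `top_n` cut-off are assumptions made for illustration. The slides do not describe the preprocessing, or how the frequent trigrams were narrowed down to seed words.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def first_person_trigrams(tweets, starts=("i", "my"), top_n=10):
    """Count word trigrams whose first token is in `starts`, most frequent first."""
    counts = Counter()
    for text in tweets:
        tokens = TOKEN_RE.findall(text.lower())
        # Slide a window of three consecutive tokens over the tweet.
        for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
            if first in starts:
                counts[(first, second, third)] += 1
    return counts.most_common(top_n)


tweets = [
    "My name is Kim",
    "my name is Lee and I live in Seoul",
    "I live in Busan",
]
for trigram, freq in first_person_trigrams(tweets):
    print(" ".join(trigram), freq)
```

Candidates such as "my name is" or "i live in" would then be reviewed and grouped into categories like those in the table.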

Seed Words

H level
Data-driven approach
Use external dataset (Six Billion Secrets), http://www.sixbillionsecrets.com
Users write and share their secrets
26,523 posts
Extract high ranked word features

Example seed words

Physical appearance   Health condition   Death
chubby                addicted           dead
fat                   surgery            died
scar                  syndrome           suicide
acne                  disorder           funeral

[Figure: example of secret posts in Six Billion Secrets, http://www.sixbillionsecrets.com/]

Classifying Performance

Data
Annotated Twitter conversation
Random shuffled 80/20 train/test

Methods
BOW+: Bag of Words + Bigrams + Trigrams features, Maximum entropy
FirstP: Occurrence of first-person pronouns features, Maximum entropy
SEED: Seed words and trigrams features, Maximum entropy
FirstP+SEED: FirstP and SEED features, Two stage Maximum entropy
SDTM: Self-disclosure Topic Model
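To make the FirstP and SEED baselines more concrete, the sketch below extracts the two kinds of features from a single tweet. It covers feature extraction only, not the maximum entropy classifier. The pronoun list is an assumption, and the seed lists are a hypothetical handful taken from the example seed words on the earlier slides, not the full lists used in the paper.

```python
import re

# Assumed pronoun list; the slides only give 'I' and 'My' as examples.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'll", "i'd"}

# Hypothetical seed lists built from the example seed words shown earlier.
SEEDS = {
    "M": ["my name is", "i live in", "my birthday is"],
    "H": ["surgery", "funeral", "acne"],
}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def extract_features(tweet):
    """Return FirstP and SEED style features for one tweet."""
    tokens = TOKEN_RE.findall(tweet.lower().replace("’", "'"))
    padded = " " + " ".join(tokens) + " "
    # FirstP: number of first-person pronoun tokens.
    features = {"firstp": sum(token in FIRST_PERSON for token in tokens)}
    # SEED: number of distinct seed words or phrases present, per level.
    for level, seeds in SEEDS.items():
        features["seed_" + level] = sum((" " + s + " ") in padded for s in seeds)
    return features


print(extract_features("I live in Seoul and my surgery is tomorrow"))
```

Feature dictionaries of this kind could be passed to any standard classifier. The two-stage setup would use the pronoun feature first and the seed features second.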

Classifying Performance
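As a minimal sketch of the evaluation setup (a random 80/20 split, scored by accuracy and average F1), the helpers below assume that "average F1" means the unweighted mean of per-class F1 over the G, M and H levels. The slides do not define the averaging, and the fixed random seed is also an assumption.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle items with a fixed seed and split them into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=("G", "M", "H")):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is neither present nor predicted contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


train, test = train_test_split(range(301))
print(len(train), len(test))

gold = ["G", "G", "M", "M", "H", "H"]
pred = ["G", "M", "M", "M", "H", "G"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Because the H level is much rarer than G and M in the annotated data, an unweighted average gives it the same influence as the larger classes.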

Self-disclosure & Social features

EMNLP 2014 long paper

Self-disclosure & Social features

What are the relations between self-disclosure and social features in Twitter conversations?

Research questions
1. Does high self-disclosure lead to longer conversations?
2. Is there a difference in conversation length patterns over time depending on overall self-disclosure level?
3. Do high self-disclosure users have many conversation partners?
4. Do high self-disclosure users have conversations more frequently?

Research Questions

Q1) Does high self-disclosure lead to longer conversations?

Q2) Is there a difference in conversation length patterns over time depending on overall self-disclosure level?
[Figure: high SD level dyad vs. low SD level dyad]

Q3) Do high self-disclosure users have many conversation partners?
[Figure: high SD level user vs. low SD level user]

Q4) Do high self-disclosure users have conversations more frequently?
[Figure: high SD level user vs. low SD level user]

Results

High ranked topics in each level (G, M, H levels)
Shown by high probability words in each topic

G 1         G 2      M 1       M 2       H 1      H 2
obama       league   send      going     better   ass
he’s        win      email     party     sick     bitch
romney      game     i’ll      weekend   feel     fuck
vote        season   sent      day       throat   yo
right       team     dm        night     cold     shit
president   cup      address   dinner    hope     fucking

Results

Q1) Does high self-disclosure lead to longer conversations?
Ans) Positive relation between initial SD level and changes in CL

Q2) Is there a difference in CL patterns over time by overall SD level?
Ans) ‘high’ and ‘mid’ groups increase CL over time, but the ‘low’ group does not
‘high’ groups talk more in a conversation than ‘mid’ and ‘low’ groups

Results

Q3) Do high self-disclosure users have many conversation partners?
Ans) ‘mid’ self-disclosure users have more conversation partners than others

          # Partners   # Conv / Day   Words / Conv   Conv Length
low       3.33         0.46           59.17          4.13
mid       3.55         0.52           61.17          4.28
high      3.47         0.54           63.26          4.45
p-value   <0.001       <0.001         <0.1           <0.001
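The four columns in the table are per-user social features. A minimal sketch of how such features could be derived from conversation records is given below. The record layout `(user_a, user_b, n_tweets, n_words)`, the observation window `n_days`, and the reading of conversation length as the number of tweets are all assumptions. The grouping into low, mid and high users and the significance tests are not covered.

```python
from collections import defaultdict


def user_stats(conversations, n_days):
    """Per-user social features from dyadic conversation records.

    conversations: iterable of (user_a, user_b, n_tweets, n_words) tuples
    n_days: length of the observation window in days (assumption)
    """
    partners = defaultdict(set)
    totals = defaultdict(lambda: [0, 0, 0])  # conversations, words, tweets
    for user_a, user_b, n_tweets, n_words in conversations:
        # Each conversation counts once for both members of the dyad.
        for user, other in ((user_a, user_b), (user_b, user_a)):
            partners[user].add(other)
            totals[user][0] += 1
            totals[user][1] += n_words
            totals[user][2] += n_tweets
    return {
        user: {
            "partners": len(partners[user]),
            "conv_per_day": n_conv / n_days,
            "words_per_conv": n_words / n_conv,
            "conv_length": n_tweets / n_conv,
        }
        for user, (n_conv, n_words, n_tweets) in totals.items()
    }


records = [("a", "b", 5, 60), ("a", "c", 7, 80), ("b", "a", 6, 40)]
print(user_stats(records, n_days=10)["a"])
```

Averaging these values within each self-disclosure group would give rows comparable to those in the table.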

Q4) Do high self-disclosure users have conversations more frequently?
Ans) ‘high’ self-disclosure users have more conversations per day than others

Finding) Researchers often look at the number of words in a conversation for its relation with self-disclosure
Conversation length is more significant than # words

Summary

Self-disclosure (SD)
Definition from social psychology
Limitations in previous research

Computational approaches for self-disclosure
Twitter conversation dataset
Self-disclosure topic model (SDTM)

Self-disclosure & Social features
Relationship strength over time
Conversation partners and frequency

Future Work

Self-disclosure for a user's timeline tweets
Have positive relations with
Loneliness [Al-Saggaf.2014]
Online social network usage [Trepte.2013]
Predict a user's loneliness and give social support
Find usage patterns in online social networks and give feedback

Self-disclosure by machine
Looks like human in dialogue system
Can increase satisfaction in talking cure dialogue system

Reference

[Jourard1971b] Sidney M Jourard. 1971b. The transparent self (rev. ed.). Princeton, NJ: Van Nostrand.

[Jourard1958] Sidney M Jourard and Paul Lasakow. 1958. Some factors in self-disclosure. The Journal of Abnormal and Social Psychology, 56(1):91.

[Dunbar et al.1997] Robin IM Dunbar, Anna Marriott, and Neil DC Duncan. 1997. Human conversational behavior. Human Nature, 8(3):231–246.

[Vondracek and Vondracek1971] Sarah I Vondracek and Fred W Vondracek. 1971. The manipulation and measurement of self-disclosure in preadolescents. Merrill-Palmer Quarterly of Behavior and Development, 17(1):51–58.

[Chelune and others1979] Gordon J Chelune et al. 1979. Self-disclosure: Origins, patterns, and implications of openness in interpersonal relationships. Jossey-Bass San Francisco.

[Barak&Gluck-Ofri2007] Azy Barak and Orit Gluck-Ofri. 2007. Degree and reciprocity of self-disclosure in online forums. CyberPsychology & Behavior, 10(3):407–417.

[Jo2011] Yohan Jo and Alice H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM.

[Tamir and Mitchell2012] Diana I Tamir and Jason P Mitchell. 2012. Disclosing information about the self is intrinsically rewarding. Proceedings of the National Academy of Sciences, 109(21):8038–8043.

[Duck2007] Steve Duck. 2007. Human relationships. Sage.

[Bak et al.2012] JinYeong Bak, Suin Kim, and Alice Oh. 2012. Self-disclosure and relationship strength in Twitter conversations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 60–64. Association for Computational Linguistics.

[Derlega et al.1993] Valerian J. Derlega, Sandra Metts, Sandra Petronio, and Stephen T. Margulis. 1993. Self-Disclosure, volume 5 of SAGE Series on Close Relationships. SAGE Publications, Inc.

[Wills1985] Thomas Ashby Wills. 1985. Supportive functions of interpersonal relationships.

[Trepte.2013] Sabine Trepte and Leonard Reinecke. 2013. The reciprocal effects of social network site use and the disposition for self-disclosure: A longitudinal study. Computers in Human Behavior, 29(3):1102–1112.

[Harris, J, 2009] Sep Kamvar and Jonathan Harris. 2009. We feel fine: An almanac of human emotion. Simon and Schuster.

[Houghton and Joinson2012] David J Houghton and Adam N Joinson. 2012. Linguistic markers of secrets and sensitive self-disclosure in twitter. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 3480–3489. IEEE.

[Steinfield et al.2008] Charles Steinfield, Nicole B Ellison, and Cliff Lampe. 2008. Social capital, self-esteem, and use of online social network sites: A longitudinal analysis. Journal of Applied Developmental Psychology, 29(6):434–445.

[Petronio2002] Sandra Petronio. 2002. Boundaries of privacy: Dialectics of disclosure. Albany, NY.

[Valkenburg2011] Patti M Valkenburg, Sindy R Sumter, and Jochen Peter. 2011. Gender differences in online and offline self-disclosure in pre-adolescence and adolescence. British Journal of Developmental Psychology.

[Sprecher2012] Susan Sprecher, Stanislav Treger, and Joshua D. Wondra. 2012. Effects of self-disclosure role on liking, closeness, and other impressions in get-acquainted interactions. Journal of Social and Personal Relationships.

[Zhao2010] Wayne Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects and opinions with a maxent-lda hybrid. In Proceedings of EMNLP.

Thank you!
Any questions or comments?

JinYeong Bak

Department of Computer Science, KAIST
