KnowMe and ShareMe: Understanding Automatically Discovered Personality Traits from Social Media and User

+ CHI 2014

- Liang Gou, Michelle X. Zhou, Huahai Yang

/ 맹욱재

x 2014 fall like winter

Problem

Much recent work uses digital footprints on social media to predict personal traits.

3 main issues: 1) veracity of social media 2) imperfection in prediction algorithms 3) sensitive nature of one’s personal traits

Much research is still needed for better effectiveness, e.g., users’ preferences for sharing their computationally derived traits

Method in brief

2-part study involving 256 participants

1) examining the feasibility & effectiveness of automatically deriving 3 types of personality traits from Twitter: Big 5 personality, basic human values, fundamental needs.

2) investigating users’ opinions on using & sharing these traits.

Findings

Automatically deriving one’s personality traits from social media is potentially feasible, with various factors impacting the accuracy of the model

61.5% of users are willing to share their derived traits in the workplace.

Many factors significantly influence their sharing preferences

Intro

Psychology to Behavioral Economics: personality influences a person’s behavior & performance.

Traditional psychometric tests are impractical in the real world, e.g., asking millions of customers to take a personality test for customized service

Intro (Cont.)

Advances in Psycholinguistics: feasible to automatically infer personality traits from one’s linguistic footprints

Emergence of social media: prompted many users to leave their linguistic footprints on the internet

Method in detail

Developing a system: using one’s Twitter footprints to automatically derive her personality traits.

Not using # of posts or votes, but analyzing the language choice with a lexicon-based approach

This model computes 3 basic types of personality traits

UI of KnowMe

Pilot Experiment

3 main issues (veracity of social media, imperfection in prediction algorithms, sensitive nature of one’s personal traits)

with a limited group of users within the company

Research Questions

To find answers to 2 sets of questions

How accurate is our system?

– How well does it match the psychometric test scores?

– How well does it match our users’ perception of themselves?

Whether & how would users share in an enterprise context?

– What and with whom?

– What are the perceived benefits and risks of sharing?

Related Work

Personality Modeling & Computation

Psychology (Marketing) & Behavioral Economics -> Computational Modeling

Psycholinguistic analysis

Constructing own dictionaries

Big 5 personality applied to essays, conversation scripts, emails; inferring political orientation, emotional states

Big 5 personality applied to Facebook, Twitter

-> + Basic values, fundamental needs

Big 5, Basic Values, Needs

Related Work (Cont.)

Privacy, Contextual Integrity & Personality

Privacy preference in different types of data

personal communication (email, social media)

mobile, location-based activities

-> sharing of personal traits

People’s traits impact their privacy concerns

trust & risk propensity - certain dimensions of Big 5

-> many factors beyond Big 5 on sharing of personal traits

Contextual Integrity Theory

privacy concerns arise from violations & changes of

1) context 2) actors 3) attributes 4) transmission principle

-> 1) workplace 2) who 3) traits & properties 4) trait granularity

KnowMe

log in with Twitter ID

collecting the most recent 200 public tweets (representative sample: within 10% rank of the result from all)

-> automatically derives 3 types of personal traits

lexicon-based analysis by calculating correlation between traits and words (e.g. “we”, “us” - high agreeableness of Big 5, self-transcendence of basic values)

Big 5, basic values - LIWC dictionary

needs model - custom dictionary
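
A minimal sketch of what a lexicon-based score can look like, assuming a simple weighted sum of word-category frequencies. The mini-lexicon, the category names and the weights below are hypothetical and only for illustration; they are not the LIWC dictionary or the weights used by KnowMe.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (category -> words). NOT the LIWC dictionary.
LEXICON = {
    "first_person_plural": {"we", "us", "our"},
    "first_person_singular": {"i", "me", "my"},
}

# Hypothetical weights linking word categories to one trait (illustrative only).
AGREEABLENESS_WEIGHTS = {
    "first_person_plural": 1.0,
    "first_person_singular": -0.5,
}


def category_frequencies(tweets, lexicon):
    """Return the share of all tokens that fall into each lexicon category."""
    tokens = [t for tweet in tweets for t in re.findall(r"[a-z']+", tweet.lower())]
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


def trait_score(tweets, lexicon, weights):
    """Weighted sum of category frequencies: a simple linear lexicon score."""
    freqs = category_frequencies(tweets, lexicon)
    return sum(weights[c] * freqs[c] for c in weights)
```

Usage: `trait_score(tweets, LEXICON, AGREEABLENESS_WEIGHTS)` on a user's most recent tweets gives one raw score per trait; a real system would still need to normalize such scores against a reference population.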

Custom Dictionaries

Hybrid empirical & computational approach

1) large-scale psychometric studies on Mechanical Turk to collect training data: item-based survey collecting users’ psychometric scores describing their needs + participant-generated text from 5000 Turkers

2) how these texts correlate with each needs dimension

-> built customized dictionary

+ built statistical model to predict scores
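
The dictionary-building step can be sketched as follows, assuming that a word enters the dictionary for a needs dimension when its relative frequency in participants' texts correlates strongly enough with their survey scores. The whitespace tokenization, the `threshold` value and the function names are assumptions for illustration; the slides do not give the actual selection criterion.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def build_dictionary(texts, scores, threshold=0.5):
    """Map each word to its correlation with the scores, keeping |r| >= threshold.

    texts:  one text per participant
    scores: that participant's survey score on one needs dimension
    """
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    dictionary = {}
    for word in vocabulary:
        # Relative frequency of the word in each participant's text.
        freqs = [tokens.count(word) / len(tokens) if tokens else 0.0
                 for tokens in tokenized]
        r = pearson(freqs, scores)
        if abs(r) >= threshold:
            dictionary[word] = r
    return dictionary
```

The resulting word-to-correlation map could then serve as input features or weights for a statistical model that predicts scores.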

Participants

1325 colleagues with at least 200 tweets invited via email

-> 625 responded

-> 256 completed study

(369 dropped due to lengthy survey - 45 min)

USA (42%), Europe (32.1%), rest (25.9%)

age: 30~45 (representative of company)

Main Experiment Part 1

Assessing automatically derived traits

gauging accuracy of system

Psychometric test

50-item Big 5, 21-item basic values, 52-item fundamental needs

Perception of derived traits

video tutorial, system provided detailed explanation of traits

asked to rate how well the derived traits match their perception of themselves on a 5-point Likert scale

never given psychometric scores, to avoid interaction effects

Main Experiment Part 2

Understanding traits sharing preference

guided by framework of contextual integrity

Attribute of information

Trait type

H1a. different preference for sharing 3 types of traits

H1b. Within each trait type, different sharing preferences for its sub-traits

Trait value

H2a. values of traits affect sharing preference

H2b. more likely to share high-valued positive traits

Main Experiment Part 2 (Cont.)

Trait accuracy

H3a. accuracy of traits impacts sharing behavior

H3b. tend to share accurate traits

Actors

H4a. different preferences about sharing traits with different audiences (“public”, “distant colleagues”, “management”, “close colleagues”)

H4b. 3 types of traits impact sharing behavior

For general disposition toward privacy & adoption of new tech, asked 5 questions: 3 from Disposition to Value Privacy, 2 from Technology Innovativeness.

Main Experiment Part 2 (Cont.)

Context

users’ perceived benefits & risks of sharing traits in company.

Collected their opinions

Transmission Principles

types of constraints on information flow from senders to recipients.

Granularity of information at 3 levels: “none”, “range” (ordinal scale), “numeric” (precise score)
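
A small sketch of how the three granularity levels could be applied when a trait is shown to a recipient. The score range [0, 1], the three ordinal bins and their cut points are assumptions for illustration; the slides only name the levels.

```python
def present_trait(score, granularity):
    """Render a trait score in [0, 1] at one of three disclosure levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if granularity == "none":
        return None  # the trait is not shared at all
    if granularity == "range":
        # Hypothetical three-way ordinal binning; the cut points are assumptions.
        if score < 1 / 3:
            return "low"
        if score < 2 / 3:
            return "medium"
        return "high"
    if granularity == "numeric":
        return round(score, 2)  # precise score
    raise ValueError(f"unknown granularity: {granularity!r}")
```

For example, the same score could be shown as a precise number to close colleagues and only as a range to a wider audience.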

Results Part 1

224 completed all questions

Comparing with psychometric scores

Correlation coefficients 0.05 < r < 0.2 : consistent with previous work

using multi-dimensionality measure

Big 5: extroversion & agreeableness highly correlated (p=0.001)

basic values: conservation & openness to change negatively correlated (p=0.003)

using RV-coefficient for overall correlation

basic values(p=0.06), Big 5(p=0.83), needs(p=0.61)
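
For reference, a plain-Python sketch of the RV coefficient, which summarizes how similar two multi-dimensional score tables are (here: derived trait scores vs. psychometric scores for the same participants). It assumes the standard definition on column-centered data, RV = tr(SxSy) / sqrt(tr(SxSx) tr(SySy)) with Sx = XXᵀ and Sy = YYᵀ. The function names are illustrative, and the significance test behind the reported p-values is not shown.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every trait dimension has mean 0."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """n x n matrix of inner products between participants (X X^T)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def frobenius_inner(a, b):
    """Sum of element-wise products; equals trace(A B) for symmetric A and B."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(x_rows, y_rows):
    """RV coefficient between two tables with one row per participant."""
    sx = gram(center_columns(x_rows))
    sy = gram(center_columns(y_rows))
    denom = math.sqrt(frobenius_inner(sx, sx) * frobenius_inner(sy, sy))
    return frobenius_inner(sx, sy) / denom if denom else 0.0
```

The value lies between 0 and 1, with 1 meaning the two tables describe the same configuration of participants up to scaling.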

Results Part 1 (Cont.)

Results Part 2

Effects of Trait Type

type of traits significantly affects sharing preference (p<0.001)

“(values) seems VERY personal information...feel vulnerable if shared in workplace”


Effects of Trait Value

value of traits significant only for basic values (p=0.001)

H2a, H2b partially supported


Effects of Trait Accuracy

accuracy of all traits significantly affects sharing preference (p<0.001)

preferred to share “perfect” traits (p<0.01)

lower preference for sharing “not at all” traits than other levels (p<0.05)

H3a, H3b supported

Effects of Actors

61.5% willing to share more with close colleagues & management than others (p<0.001)

No significant difference between close colleagues and management

No interactions among traits -> consistent across all.

H4a partially supported


Effects of User’s Personality Traits - H4b supported


Perceived Benefits and Risks

Two coders independently read all complete responses (224 × 3 × 2 = 1344)

categorized them over several iterations.

Inter-coder reliability with Cohen’s Kappa: benefits = 0.94, risks = 0.95
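
As a reminder of what the reliability figure measures, here is a short sketch of Cohen's Kappa for two coders: observed agreement corrected for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The category labels in the usage note are hypothetical, not the study's coding scheme.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal category proportions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)
```

Usage: `cohens_kappa(["trust", "risk", "risk", "trust"], ["trust", "risk", "trust", "trust"])` compares two coders on four responses; values above 0.9, as reported here, indicate near-perfect agreement.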


Preferred Control Mechanisms

11 categories, Cohen’s Kappa = 0.93

Implication

Support of System Transparency

1) usage wise, clearly explain the meaning of each trait

“Existence of clear legend...might understand something

else...explains carefully”

“Give some example… of misuse/misinterpretate...good

example with benefits and risks are key”

2) functionally, system should be prescriptive and clearly state its capabilities & limitations

“...certain attribute are inaccurate...to gauge them

properly”

“...inform how many entries were taken … for such result”

Implication (Cont.)

Mixed-Initiative Privacy Preserving

1) what to share - control granularity

“...able to switch off sections of information”

“analysis of how traits are perceived by others…”

2) whom to share with - specify recipient

“approve explicitly the list of people...validate the impact..”

3) wanting an alert when someone accesses their profile

“social listening feedback loop what others might

perceive”

4) when to share - giving sense of protection

“don’t to show to new manager”

Implication (Cont.)

Mixed-Initiative Privacy Preserving

5) control of sharing frequency

track her “downtime”

6) where to share

control channel (paper, email, online depending on

context)

Implication (Cont.)

User-Assisted Personality Discovery

1) allowing user to amend & mark derived results

limitations: analytic inaccuracy, data quality, cultural influence

collective amendments from multiple users for system’s

learning

2) allowing user to comment for system’s learning

3) allowing user to select the results

“...exclude specific tweet”

4) potential system abuse - manipulating the result for

advantage

Discussion

Data Variety and Model Effectiveness

1) non-trivial task due to nuances in channel & context

“...only share work-related contents on twitter...”

2) People’s personality can change after big events like becoming a parent

3) multiple personalities even within one channel

“...my Work Twitter… my personal twitter...”

Research Implication

1) multiple sources are characterized by availability, veracity, life span and so on

2) how to consolidate multiple personalities - hybrid approach

Discussion (Cont.)

Cultural and Language Influence

1) psychological models based on Western culture

needs model based on Maslow’s hierarchy of needs - 20th-century Western middle-class males

hardly neutral

only English input - vastly different for Chinese

2) preferred to share individual traits valued by Western culture, like openness & idealism

“Family has very different meaning…”

3) language proficiency influences results

language-specific models should be developed

Discussion Point

1. Are you willing to use this system? Do you believe this system can accurately derive your traits?

2. Variety of data sources for better deriving

VS

Specific area of data for better deriving