User Trait Expression and Portrayal through Social Media

Daniel Preoţiuc-Pietro
Bloomberg LP
1 November 2018

Context

The availability of large-scale user-generated data provides the context for new applications and research.

The key elements are:
• metadata
  • user
  • time
  • location
• volume
• diversity
  • text
  • images
  • social connections (network information)

User Traits and Text

Hypothesis

User-generated text reveals individual differences in both demographic and psychological traits.

Demographic Traits

• Age (Rao et al. 2010, ACL)
• Gender (Burger et al. 2011, EMNLP)
• Location (Eisenstein et al. 2010, EMNLP)
• Political Orientation (Volkova et al. 2014, ACL)
• Popularity (Lampos et al. 2014, EACL)
• Occupation (Preotiuc-Pietro et al. 2015, ACL)
• Income (Preotiuc-Pietro et al. 2015, PLoS ONE)
• Political Ideology (Preotiuc-Pietro et al. 2017, ACL)
• Race (Preotiuc-Pietro & Ungar 2018, COLING)

Psychological Traits

Psychological traits:
• Mental illness (Coppersmith et al. 2014, ACL)
• Personality (Schwartz et al. 2013, PLoS ONE)
• Empathy (Abdul-Mageed et al. 2017, ICWSM)
• ‘Dark Triad’ Personality (Preotiuc-Pietro et al. 2016, CIKM)
• Active Open-Minded Thinking (Carpenter et al. 2018, JDM, in press)

Aspects

1. Data
2. Prediction
3. Insight

Example: Political Ideology

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Data

Social media data analysis:

• Unobtrusive
  • Observes behaviors rather than relying on self-reports
• Access to data from a larger and more diverse population
  • Traditional social science research is based on convenience lab samples
• Access to both historical and real-time data
• Fine spatial granularity

Data - Ethics

Twitter – profiles are public by default

Facebook/Instagram – users provide informed consent to share data

User-trait analysis requires trait-level information, which is provided through surveys, is sensitive and is anonymised.

All studies were approved by the Institutional Review Board (IRB).

Data - Example

We collected a new data set:
• 3,938 users (4.8M tweets)
• public Twitter handles with >100 posts

Political ideology is reported through an online survey:
• our use case is US politics
• the major US ideology spectrum is Conservative – Liberal
• seven-point scale
• additionally reported age, gender and other demographics

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Data - Applications

Social media data enables new types of applications and studies.

Real-time passive polling:

Prediction

              Prediction                                  Insight
Perspective   NLP/ML                                      Social Science
Goal          Models to predict traits of unknown users   Gain a better understanding of group behaviors and differences
Framing       Predictive task                             Exploring/testing hypotheses
Methods       Regression/Classification                   Statistical hypothesis testing; interpretable features; use domain experts in analysis

Prediction - Example

• Linear Regression
• Leaning: V. Conservative (1) – V. Liberal (7)
• Engagement: Neutral (4) – Moderate C/L (3&5) – C/L (2&6) – Very C/L (1&7)
• 10-fold cross-validation
• Range of linguistic features
• Evaluation – Pearson r between predictions and true labels

[Figure: bar chart of Pearson r per feature set, for Leaning and Engagement (y-axis .00 to .40). Bar labels as extracted (Leaning / Engagement): Unigrams .294 / .165; LIWC .286 / .149; Topics .300 / .169; Emotions .145 / .079; Political .256 / .169; All .369 / .196.]

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.
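The evaluation setup on this slide can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' pipeline: the helper names (`engagement`, `pearson_r`, `k_folds`), the 0-3 numeric coding of engagement, the round-robin fold assignment and the toy numbers are all assumptions made for the example. It shows how a 7-point leaning label folds into an engagement level and how Pearson r between predictions and true labels is computed.

```python
import math

def engagement(leaning: int) -> int:
    """Fold a 1-7 leaning label into an engagement level (distance from Neutral = 4).

    The 0-3 coding is an assumption; the slide only names the four levels.
    """
    if not 1 <= leaning <= 7:
        raise ValueError("leaning must be in 1..7")
    return abs(leaning - 4)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def k_folds(n_items: int, k: int = 10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation (round-robin split)."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test

# Toy labels and hypothetical out-of-fold predictions (not real data).
true_leaning = [1, 2, 3, 4, 5, 6, 7]
predicted = [2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 6.5]
print(round(pearson_r(predicted, true_leaning), 3))
print([engagement(v) for v in true_leaning])  # [3, 2, 1, 0, 1, 2, 3]
```

In a full experiment, a regression model would be fitted on each training fold and its predictions on the held-out fold collected before computing r once over all users.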

Prediction - Applications

Applications of predictive models of user traits:

• Improving downstream NLP tools:
  • sentiment analysis
  • text classification
• Personalised AI applications:
  • machine translation
  • dialogue systems with an identity
• Uncover and adjust model biases
• Control for demographic biases in data analysis
• Marketing or targeted ads
• Measure communities in real-time over space and time

Insight

Insight - Example

Differences between moderate and extreme users

[Figure: two word clouds: words associated with moderate liberals (5 and 6), and words associated with extreme liberals (7). Legend: relative frequency; correlation strength.]

Correlations are age and gender controlled.

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Applications - Insight

Insight allows us to:

• Gain a better understanding of:
  • human behaviors
  • language use
  • language change
  • cultural differences
  • stylistic differences
  • pragmatic differences
  • human stereotypes
• Confirm or generate new data-driven hypotheses

Aspects

1. Data
2. Prediction
3. Insight

All steps pose unique challenges and implications.

Aspects

This talk will try to address some of these aspects:

1. Data collection
   • User sampling
   • Trait collection
2. Prediction
3. Insight
   • Content
   • Phrase choice
   • Style
   • Pragmatic roles

User sampling

Collecting representative gold data for training models.

For political orientation, previous NLP research collected users:

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

User sampling

Our hypotheses:

1. These users are far more likely to be politically engaged
2. The prediction problem was over-simplified
3. Neutral users are not accounted for
4. There are differences between moderate and extreme users on the same side

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Engagement

Data set obtained using previous methods

[Figure: “Political word usage across user groups”, average percentage of political word usage (y-axis 0.00 to 4.00). Series: Media/Pundit Names, Politician Names, Political Words. Bar labels as extracted, one pair per series across the two user groups: 2.64, 2.95; 0.73, 0.79; 0.11, 0.18.]

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Engagement

Our data set (survey-based, 7-point ideology scale)

[Figure: “Political word usage across user groups”, average percentage of political word usage (y-axis 0.00 to 4.00). Series: Media/Pundit Names, Politician Names, Political Words. Bar labels as extracted, one row per series; the first and last value in each row match the data set obtained using previous methods, and the seven values in between are the survey-based groups:
2.64 0.76 0.55 0.42 0.36 0.46 0.51 0.76 2.95
0.73 0.24 0.14 0.07 0.07 0.09 0.12 0.19 0.79
0.11 0.03 0.03 0.02 0.02 0.03 0.03 0.04 0.18]

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.
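The quantity plotted here, the average percentage of political word usage per user group, can be sketched as below. The lexicon, the tokenizer and the group names are hypothetical stand-ins, since the slides do not list the actual word lists or preprocessing.

```python
import re

# Hypothetical mini-lexicon; the actual word lists are not given in the slides.
POLITICAL_WORDS = {"senate", "congress", "election", "vote", "policy"}

def tokenize(text: str):
    """Lowercase and keep runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def political_usage_pct(tweets, lexicon=POLITICAL_WORDS) -> float:
    """Percentage of a user's tokens that appear in the lexicon."""
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return 100.0 * hits / len(tokens)

def group_average(users_by_group):
    """Average the per-user percentage within each user group."""
    return {
        group: sum(political_usage_pct(tweets) for tweets in users) / len(users)
        for group, users in users_by_group.items()
    }

# Toy input: each group maps to a list of users, each user to a list of tweets.
toy = {
    "group_a": [["Vote in the election today", "Coffee first"]],
    "group_b": [["Nice weather today", "Coffee first"]],
}
print(group_average(toy))
```

Averaging per user first, and then per group, keeps very prolific users from dominating a group's score; whether the original analysis did the same is not stated on the slide.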

Over-simplification

The prediction problem was over-simplified

[Figure: bar chart of ROC AUC (y-axis .5 to 1.0) for four binary tasks: CvL, 1v7, 2v6, 3v5. Series: Topics, Political Terms, Domain Adaptation. Bar labels as extracted, one row per series (presumably in legend order), columns CvL, 1v7, 2v6, 3v5:
.891 .785 .662 .581
.972 .785 .679 .590
.976 .789 .690 .625]

ROC AUC, Logistic Regression, 10-fold cross-validation.

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.
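As a reminder of what the metric measures, ROC AUC is the probability that a randomly chosen positive user is scored above a randomly chosen negative one. The sketch below computes it directly from that definition and restricts the evaluation to two scale points, as in the 1v7 or 3v5 tasks. The function names and the toy scores are assumptions; the classifier itself (logistic regression on the slide) is not reproduced.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def binary_subset(leanings, scores, conservative, liberal):
    """Keep only users at two scale points (e.g. 3 vs 5) and relabel them 0/1."""
    kept = [(1 if y == liberal else 0, s)
            for y, s in zip(leanings, scores) if y in (conservative, liberal)]
    return [y for y, _ in kept], [s for _, s in kept]

# Toy classifier scores (hypothetical): higher means "more liberal".
leanings = [1, 1, 3, 3, 5, 5, 7, 7]
scores = [0.1, 0.2, 0.45, 0.6, 0.5, 0.55, 0.8, 0.9]
print(roc_auc(*binary_subset(leanings, scores, 1, 7)))  # 1.0
print(roc_auc(*binary_subset(leanings, scores, 3, 5)))  # 0.5
```

In the toy data the extremes are perfectly separable while the moderates are at chance, which mirrors the direction of the effect reported on the slide, not its magnitude.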

User sampling

Takeaways:

• 3x more political terms for automatically identified users compared to the highest survey-based scores
• Performance drops by 15% even when predicting extreme users
• Performance drops by 35% to close to random when predicting between political moderates

User sampling has an important impact on experimental results.

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Trait collection

Trait collection: Identifying the trait value for users.

Several common methods exist:

1. Self-report
2. Distant Supervision
3. Perception (Annotation)
4. Survey-based

Trait collection

1. Self-Report

• Method:
  • Mining profile descriptions
  • Mining tweet contents
  • Mining network connections
  • Processing profile images
• Advantages:
  • Large volume
  • Easy to implement
• Disadvantages:
  • Sample biases - some groups of users are more likely to self-disclose personal information
  • Data usually requires post-filtering due to false positives
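Mining profile descriptions for self-reports, followed by post-filtering, might look like the sketch below. The regular expression, the plausible age range and the function name are hypothetical choices for illustration, not a pattern taken from any of the cited papers.

```python
import re

# Hypothetical pattern for a first-person age statement in a profile description.
AGE_PATTERN = re.compile(
    r"\b(?:i am|i'm|im)\s+(\d{1,2})\s*(?:years old|yrs old|yo)\b",
    re.IGNORECASE,
)

def self_reported_age(description: str):
    """Return the stated age, or None if no plausible statement is found."""
    match = AGE_PATTERN.search(description)
    if match is None:
        return None
    age = int(match.group(1))
    # Post-filter implausible values, one simple guard against false positives.
    # The 13-99 range is an assumption made for this example.
    return age if 13 <= age <= 99 else None

print(self_reported_age("Coffee lover. I'm 27 years old, from Leeds"))  # 27
print(self_reported_age("My dog is 3 years old"))  # None
```

Requiring a first-person subject already removes many false positives (pets, children, anniversaries), but figurative statements still slip through, which is why manual or rule-based post-filtering is usually needed.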

Trait collection

2. Distant Supervision

• Method:
  • Map users to community statistics (e.g. Census data)
• Advantages:
  • Very large volume
  • Wide variety of traits have community statistics
• Disadvantages:
  • Statistics may be outdated
  • Twitter population is a biased sample of the general population
  • Users that can be geolocated are not representative of the Twitter population
  • Geo-located tweets might be posted from a different location than the user’s home
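A minimal sketch of distant supervision, assuming each geolocated tweet has already been resolved to a county identifier: the user inherits the community statistic of the county they tweet from most often. The county names, the income figures and the most-frequent-county heuristic are all hypothetical.

```python
# Hypothetical community statistics, e.g. median income by county (toy numbers).
COUNTY_MEDIAN_INCOME = {"county_a": 52000, "county_b": 71000}

def most_frequent_county(tweet_counties):
    """Pick the county a user tweets from most often as a proxy for home."""
    counts = {}
    for county in tweet_counties:
        counts[county] = counts.get(county, 0) + 1
    if not counts:
        return None
    # Break ties alphabetically so the result is deterministic.
    return min(counts, key=lambda c: (-counts[c], c))

def distant_label(tweet_counties, stats=COUNTY_MEDIAN_INCOME):
    """Assign the community statistic of the user's inferred home county."""
    home = most_frequent_county(tweet_counties)
    return stats.get(home) if home is not None else None

print(distant_label(["county_a", "county_b", "county_b"]))  # 71000
```

The label is a property of the community, not of the individual, so every disadvantage listed above carries straight into the training data.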

Trait collection

3. Perception

• Method:
  • Human annotation of profiles, including text
• Advantages:
  • Accurate for common traits
  • Medium volume
• Disadvantages:
  • Contains systematic biases and stereotypes of particular traits
  • Models trained on this data will capture only the perception of the annotator

Trait collection

4. Survey-based

• Method:
  • Ask users for trait information through surveys
• Advantages:
  • Collect information from the actual users
  • Can collect multiple traits
  • Can collect less common psychological traits
• Disadvantages:
  • Costly / Low volume
  • May be untruthful – but we can safeguard

Trait collection - Comparison

Comparing trait collection methods, race prediction, evaluated on survey-based traits.

Daniel Preotiuc-Pietro and Lyle Ungar. “User-Level Race and Ethnicity Predictors from Twitter Text”. In: COLING. 2018.

Survey-based vs. Perceived

We studied how the two differ in relation to demographic traits.

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Experimental Setup

20 Tweets/user

9 ratings/user

Forced choice guess

Self-rated confidence (1-5)

Real traits known in advance through self-reports

This way we isolate the textual cues from any other profile-related cues (screen name, profile pic, etc.)

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Data Set

Trait                   Outcome      #Users   #Raters
Gender                  M/F          2607     1083
Age                     Integer      1066     737
Education               Adv/BSc/HS   900      481
Political Orientation   Lib/Cons     2500     943

Data set statistics

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Human perception accuracy

[Figure: bar chart (y-axis .0 to 1.0) comparing Random, Accuracy and Majority/Average Guess for Gender (%), Education (%), Political Orientation (%) and Age (r). Bar labels as extracted, one row per series (presumably in legend order), columns in axis order:
.517 .330 .500 .000
.757 .445 .816 .416
.858 .488 .903 .631]

People are usually correct.

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.
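The difference between individual accuracy and the majority guess can be sketched as follows for a categorical trait. The ratings and labels are toy data, and the helper names are assumptions; for age, the corresponding aggregate would be the average of the raters' guesses, correlated with the true age.

```python
from collections import Counter

def majority_guess(ratings):
    """Most common label among a user's ratings.

    An odd number of raters avoids ties for binary traits.
    """
    return Counter(ratings).most_common(1)[0][0]

def accuracy(predictions, truths):
    """Fraction of predictions equal to the true label."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)

# Toy example with 3 users and 9 ratings each (hypothetical data).
ratings = {
    "u1": ["F"] * 6 + ["M"] * 3,
    "u2": ["M"] * 5 + ["F"] * 4,
    "u3": ["F"] * 4 + ["M"] * 5,
}
truth = {"u1": "F", "u2": "M", "u3": "F"}

users = sorted(ratings)
individual = accuracy(
    [r for u in users for r in ratings[u]],
    [truth[u] for u in users for _ in ratings[u]],
)
majority = accuracy(
    [majority_guess(ratings[u]) for u in users],
    [truth[u] for u in users],
)
print(round(individual, 3), round(majority, 3))  # 0.556 0.667
```

Aggregating over raters removes individual noise but not shared misconceptions: user `u3` is misjudged by most raters, so the majority guess is wrong too.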

Inaccurate Gender Stereotypes

Trained two models on the same data with:
• perceived labels
• real labels

Training on perceived traits introduces a systematic bias

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Inaccurate Gender Stereotypes

[Figure: two bar charts over true gender (Males, Females), y-axis 0 to 50. Model predictions (series Pred. Male, Pred. Female), bar labels as extracted: 40.1, 6.1, 7.9, 45.8. Human guesses (series Perc. Male, Perc. Female), bar labels as extracted: 42.2, 9.9, 7.2, 40.7.]

Model trained on >10,000 users with self-reported gender.

The accuracies for correct predictions are reversed.

The accuracies for incorrect predictions are also reversed!

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Inaccurate Gender Stereotypes

[Figure: two word clouds: words more likely to be associated with females among male authors, and words more likely to be associated with males among female authors.]

The size of a word reflects how strongly it acts as an inaccurate stereotype, e.g. ‘love’ is more likely than ‘wonderful’ to mislead people into guessing female.

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preotiuc-Pietro. “Analyzing Biases in Human Perception of User Age and Gender from Text”. In: ACL. 2016.

Controlling Perception

Can we control human perception of demographic traits?

We restrict to selecting tweets from the user’s timeline.

Daniel Preotiuc-Pietro, Sharath Chandra Guntuku, and Lyle Ungar. “Controlling Human Perception of Basic User Traits”. In: EMNLP. 2017.

Controlling Perception

Annotator accuracy on predicting gender in the three conditions.

Condition   Overall   Females   Males
Random      76.66%    40.67%    35.99%
Opposite    55.73%    32.26%    23.47%
Same        91.33%    47.83%    43.50%

Daniel Preotiuc-Pietro, Sharath Chandra Guntuku, and Lyle Ungar. “Controlling Human Perception of Basic User Traits”. In: EMNLP. 2017.

Beyond Survey-based Methods

Survey-based screening methods for mental illnesses are imperfect.

Mental illness is less likely to be self-reported due to lack of awareness or social stigma.

Surveys may not be the best tool for collecting ‘gold’ labels.

Social media can be an alternative.

Johannes Eichstaedt et al. “Facebook Language Predicts Depression in Medical Records”. In: PNAS. 2018.

Beyond survey-based methods

We linked medical records with clinical diagnosis of depression to Facebook data.

Johannes Eichstaedt et al. “Facebook Language Predicts Depression in Medical Records”. In: PNAS. 2018.

Content analysis

There are differences between neutral users and ideologically extreme users.

[Figure: two word clouds: words associated with either extreme conservative or liberal users, and words associated with neutral users. Legend: correlation strength.]

Correlations are age and gender controlled. Extreme groups are combined using matched age and gender distributions.

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.
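One common way to obtain a correlation that is "age and gender controlled" is partial correlation. The sketch below uses the standard recursive formula; whether the original analysis used this or a regression with covariates is not stated on the slide, so treat it as an assumed stand-in. All variable names and numbers are toy values, mean-centred for readability.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial_corr(xs, ys, controls):
    """Correlation of xs and ys after controlling for every sequence in controls.

    Removes one control variable per recursion step.
    """
    if not controls:
        return pearson_r(xs, ys)
    *rest, z = controls
    r_xy = partial_corr(xs, ys, rest)
    r_xz = partial_corr(xs, z, rest)
    r_yz = partial_corr(ys, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: word use and ideology are both driven by age, and by nothing else in common.
age = [-3, -1, 1, 3, 3, 1, -1, -3]
gender = [0, 0, 0, 0, 1, 1, 1, 1]
word_use = [-2, -2, 2, 2, 4, 0, 0, -4]
ideology = [-2, 0, 0, 2, 4, 2, -2, -4]

print(round(pearson_r(word_use, ideology), 3))
print(round(abs(partial_corr(word_use, ideology, [age, gender])), 3))
```

In this toy data the raw correlation is strong, but it vanishes once age is controlled for, which is the kind of confound the control is meant to remove.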

Content analysis

There are differences between moderate and extreme users on the same side.

[Figure: two word clouds: words associated with moderate liberals (5 and 6), and words associated with extreme liberals (7). Legend: relative frequency; correlation strength.]

Correlations are age and gender controlled.

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Content analysis

Rank     Correlation   Topic (most frequent words)
1        .116          hilarious, celeb, capaldi, corrie, chatty, corden, barrowman
2        .106          photo, art, pictures, photos, instagram, photoset, image
3        .106          hot, sex, naked, adult, teen, porn, lesbian, tube, tits
4        .087          turn, accidentally, barely, constantly, onto, bug, suddenly
5        .086          ha, ooo, uh, ohhh, ohhhh, maam, gotcha, gee, ohhhhh
LIWC-1   .104          fuck, gay, sex, sexy, dick, naked, fucks, cock, aids, cum
LIWC-2   .088          hate, fuck, hell, stupid, mad, sucks, suck, war, dumb, ugly

Word2Vec topics with the highest Pearson correlation between moderately liberal users and moderately conservative users (gender/age controlled).

Daniel Preotiuc-Pietro, Liu Ye, Daniel J Hopkins, and Lyle Ungar. “Beyond Binary Labels: Political Ideology Prediction of Twitter Users”. In: ACL. 2017.

Phrase Choice

Which word is more likely to be used by a female?

Charming – Fascinating

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

Which word is more likely to be used by an older person?

Impressive – Amazing

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

Which word is more likely to be used by a person of higher occupational class?

Suggestions – Proposals

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

Which word is more likely to be used by a female?

Brutal – Fierce

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Page 63: User Trait Expression and Portrayal through Social Media€¦ · Political Orientation Lib/Cons 2500 943 Data set statistics Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle

Phrase Choice

Which word is more likely to be used by an older person?

Defensive – Protective

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

Which word is more likely to be used by a person of higher occupational class?

Humour – Wit

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

[Figure: chart with a percentage axis from 50% to 100% and categories Gender, Age and Occ. Class; value labels 68.5%, 73.7% and 67.2%.]

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

The method for quantifying phrase choice is straightforward:

\[
\mathrm{Gender}(w) = \log\left(\frac{\mathrm{Female}(w)}{\mathrm{Male}(w)}\right) \tag{1}
\]

Within a paraphrase pair (w1, w2), the difference Gender(w1) − Gender(w2) is the stylistic distance.

We use only equivalent paraphrases of 1–3 grams from PPDB 2.0.

Statistics are computed over large Twitter data sets with user traits.

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.
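
The score in Equation (1) and the stylistic distance of a paraphrase pair can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation. The counts and totals are made-up toy numbers, the phrase names `phrase_a` and `phrase_b` are placeholders, Female(w) and Male(w) are assumed to be relative frequencies of the phrase in each group's text, and the add-one smoothing is an assumption added to avoid division by zero.

```python
import math

# Toy counts (hypothetical): how often each phrase appears in each group's tweets.
FEMALE_COUNTS = {"phrase_a": 20, "phrase_b": 60}
MALE_COUNTS = {"phrase_a": 50, "phrase_b": 25}
FEMALE_TOTAL = 10_000  # total phrases observed for female users (assumed)
MALE_TOTAL = 10_000    # total phrases observed for male users (assumed)


def gender_score(phrase, smoothing=1.0):
    """Gender(w) = log(Female(w) / Male(w)) on smoothed relative frequencies."""
    female = (FEMALE_COUNTS.get(phrase, 0) + smoothing) / (FEMALE_TOTAL + smoothing)
    male = (MALE_COUNTS.get(phrase, 0) + smoothing) / (MALE_TOTAL + smoothing)
    return math.log(female / male)


def stylistic_distance(w1, w2):
    """Gender(w1) - Gender(w2) for a paraphrase pair (w1, w2)."""
    return gender_score(w1) - gender_score(w2)
```

A positive score means the phrase is relatively more frequent in the female group under these toy counts; the distance is antisymmetric in the two members of the pair.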

Phrase Choice

Study which attributes of words in a pair are preferred by one group:

• Word Length in Characters
• Word Length in Syllables

Simple proxies for word complexity

• Affective Norms: Valence, Arousal, Dominance
  14k rated words
  Valence: suicide (0.15) → bacon (0.70) → laughter (1)

• Concreteness
  40k rated words: spirituality (1) → morning (3.44) → tiger (5)

• Age of Acquisition
  30k rated words: great (5.05) → splendid (7.22) → tremendous (10.63)

• More in the paper ...

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.

Phrase Choice

[Figure: bar chart of correlation coefficients (axis from -0.25 to 0.25; bar values range from -.124 to .211) for five word attributes: Concreteness, Happiness, Word Rareness, # Syllables and Word Length. Series: Occ. Class (High), Age (>30), Gender (M).]

Correlation coefficients between paraphrase pair word differences and user group differences in usage.

Daniel Preotiuc-Pietro, Wei Xu, and Lyle Ungar. “Discovering User Attribute Stylistic Differences via Paraphrasing”. In: AAAI. 2016.
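
A correlation of this kind can be sketched as follows: for every paraphrase pair, take the difference in a word attribute (here, length in characters) and the difference in a group usage score, then compute Pearson's r over all pairs. The pairs and scores below are invented for illustration and do not come from the talk or the paper; using Pearson's r is also an assumption, since the slide only says "correlation coefficients".

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paraphrase pairs and made-up per-word usage scores
# (for example a log-ratio such as Gender(w)).
PAIRS = [("big", "enormous"), ("help", "assistance"),
         ("use", "utilise"), ("buy", "purchase")]
SCORES = {"big": 0.30, "enormous": -0.10, "help": 0.20, "assistance": -0.40,
          "use": 0.10, "utilise": -0.30, "buy": 0.25, "purchase": -0.05}

# Per-pair differences: word attribute (length) and group usage score.
length_diffs = [len(w1) - len(w2) for w1, w2 in PAIRS]
score_diffs = [SCORES[w1] - SCORES[w2] for w1, w2 in PAIRS]

r = pearson_r(length_diffs, score_diffs)
```

With these toy numbers the shorter word in each pair has the higher score, so `r` comes out negative.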

Phrase Choice

[Figure: bar chart of correlation coefficients (axis from -.200 to .200; bar values range from -.068 to .182) for seven word attributes: Age of Acquisition, Concreteness, Dominance, Arousal, Happiness, # Syllables and Word Length. Series: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.]

Correlation coefficients between paraphrase pair preference and user group usage.

Daniel Preotiuc-Pietro, Jordan Carpenter, and Lyle Ungar. “Personality Driven Differences in Paraphrase Preference”. In: NLP+CSS Workshop, ACL. 2017.

Aspects

This talk will try to address some of these aspects:

1. Data collection
   • User sampling
   • Trait collection

2. Prediction

3. Insight
   • Content
   • Phrase choice
   • Style
   • Pragmatic roles

Stylistic Differences

Correlations of stylistic features with age and income.

[Figure: plot of stylistic features; x-axis Income r, y-axis Age r, both from -0.3 to 0.3. Legend groups: Surface, Readability, Syntax, Style. Labelled features: # Char/Token, # Tokens/Tweet, # Chars/Tweet, #words>5char, Type/token Ratio, Punctuation, Smileys, URLs, ARI, F-Kincaid, Coleman-Liau, Flesch RE, FOG, SMOG, LIX, Nouns, Verbs, Pronouns, Adverbs, Adjectives, Determiners, Interjections, Named entities, Contextuality, Abstract, Hedging, Specific, Elongations, Hapax legom.]

Lucie Flekova, Lyle Ungar, and Daniel Preotiuc-Pietro. “Exploring Stylistic Variation with Age and Income on Twitter”. In: ACL. 2016.
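
To make the surface group concrete, a few of the features named in the figure can be computed per user roughly as below. This is a sketch under stated assumptions rather than the paper's feature extractor: tokens are split on whitespace, "#words>5char" is interpreted as the share of tokens longer than five characters, and the type/token ratio is case-insensitive. The function name and dictionary keys are hypothetical.

```python
def surface_features(tweets):
    """Compute a few surface features over a list of tweet strings."""
    # Naive whitespace tokenizer (assumption; a Twitter-aware tokenizer would differ).
    tokenized = [tweet.split() for tweet in tweets]
    tokens = [tok for toks in tokenized for tok in toks]
    if not tokens:
        raise ValueError("need at least one token")
    return {
        "chars_per_token": sum(len(t) for t in tokens) / len(tokens),
        "tokens_per_tweet": len(tokens) / len(tweets),
        "chars_per_tweet": sum(len(t) for t in tweets) / len(tweets),
        "share_words_over_5_chars": sum(len(t) > 5 for t in tokens) / len(tokens),
        "type_token_ratio": len({t.lower() for t in tokens}) / len(tokens),
    }


features = surface_features(["the cat sat", "the elephant wandered off"])
```

Each user's feature values could then be correlated with that user's age or income across the data set, which is what the two axes of the figure report.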

Stylistic Differences

Specificity quantifies how much detail a text conveys. Two examples with their specificity ratings:

1 – Always too much.

5 – Mascara is the most commonly worn cosmetic, and women will spend an average of $4,000 on it in their lifetimes

Yifan Gao, Yang Zhong, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Predicting and Analyzing Language Specificity in Social Media Posts”. In: AAAI. 2019.

Pragmatic roles

Vulgar words are often used in communication (1%)

Despite this, they are a restricted set of words (100)

Demographic traits impact how often users employ vulgar words online (correlations with % vulgar use):

Isabel Cachola, Eric Holgate, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media”. In: COLING. 2018.
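
The "% vulgar use" measure can be sketched as the share of a user's tokens that appear in a fixed lexicon. The lexicon below is a three-word placeholder (the slide mentions a set of 100 words, which is not reproduced here), and the regular-expression tokenizer is an assumption.

```python
import re

# Placeholder lexicon (hypothetical); stands in for the restricted word list.
VULGAR_LEXICON = {"damn", "hell", "crap"}


def percent_vulgar(tweets, lexicon=VULGAR_LEXICON):
    """Percentage of tokens across a user's tweets that are in the lexicon."""
    tokens = [tok for tweet in tweets
              for tok in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    return 100.0 * sum(tok in lexicon for tok in tokens) / len(tokens)
```

For example, `percent_vulgar(["well damn", "that was good"])` counts one lexicon hit among five tokens. Computing this value per user and correlating it with a demographic trait would give the kind of coefficient the slide refers to.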

Pragmatic roles

Vulgarity is employed purposefully

Vulgar words are used for different pragmatic functions

We identified six different pragmatic functions

We annotated 8,524 instances of vulgar words across 7,800 tweets from users with known demographic traits.

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

1. Express aggression (15.2%)

The word is used in order to harm the person or group the tweet is about.

USER You are an ass Your industry is full of assholes and you do nothing to improve (...)

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

2. Express emotion (24.8%)

The word is used to express emotions (positive or negative) related to the user's internal states, exclamations, feelings or attitudes towards an object. If the vulgar term is removed, the expressed emotion is lost.

There are so many things I want to do, But investing in equipment is a pain in the ass

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

3. Emphasise (29.8%)

The word is used to emphasize a statement or feeling.

today is a good ass day URL

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

4. Auxiliary (17.0%)

The use of this word is simply a manner of speaking and does not fit any of the above descriptions. Descriptions of external emotions (those of someone else) fall into this category.

Wish USER could save my ass on these exams like he used to

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

5. Signal Group Identity (4.7%)

This word is used as a marker of identity in a specific social group.

Now this is a group of ass kickers

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Pragmatic roles

6. Non-Vulgar (8.2%)

The use of this word is not vulgar (e.g., named entities that involve vulgar words).

Kick Ass 2 - Red Band Trailer URL

Eric Holgate, Isabel Cachola, Daniel Preotiuc-Pietro, and Junyi Jessy Li. “Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions”. In: EMNLP. 2018.

Take Aways

• Data collection poses challenges:
  • Sampling biases
  • Label collection

• Insight is important for social science and obtained through
  • Interpretable modelling and prediction methods
  • Linguistically motivated features
  • Collaboration with domain experts
  • Traditional social science approaches
  • Quasi-experimental methods

Thank You!

Thank you to my amazing collaborators:
