
Private traits and attributes are predictable from digital records of human behavior

Michal Kosinski, David Stillwell, and Thore Graepel

Computational Social Media

Karolos Antoniadis

Presentation, 12th of March, 2020


Private traits and attributes are predictable from digital records of human behavior

• openness
• extraversion
• age
• sexual orientation
• gender
• ethnicity
• etc.


Problem

• Personal information might be predictable from people's digital records.

• For example, studies have shown that attributes can be predicted from browsing logs and from the language people use.


Question: Can basic digital records be used to automatically and accurately estimate personal attributes?


Contribution

Using Facebook Likes, we can accurately estimate a wide range of personal attributes that are typically assumed to be private.


Approach - Data

Objects that can be Liked: quotes, websites, press articles, books, images, etc.

Likes are shared with friends and used to express support, to bookmark content, etc.


9 million unique objects liked by users.

The majority of objects are associated with very few users.

Discard Likes with fewer than 20 users and users with fewer than 2 Likes.
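The pruning rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each Like arrives as a (user, object) pair; the function `prune` and its parameter names are hypothetical. The slides do not say whether the two filters are applied once or repeated until nothing changes, so this sketch assumes a single pass, filtering Likes first and users second.

```python
from collections import Counter


def prune(pairs, min_users_per_like=20, min_likes_per_user=2):
    """Hypothetical helper: drop rarely liked objects, then users with too few Likes.

    pairs: iterable of (user_id, object_id) tuples, one per Like.
    Each filter is applied once, in the order given on the slide (an assumption).
    """
    pairs = set(pairs)  # deduplicate: each (user, object) Like counts once
    # Number of distinct users who liked each object.
    users_per_like = Counter(obj for _, obj in pairs)
    pairs = {(u, o) for u, o in pairs if users_per_like[o] >= min_users_per_like}
    # Number of surviving Likes per user.
    likes_per_user = Counter(u for u, _ in pairs)
    return {(u, o) for u, o in pairs if likes_per_user[u] >= min_likes_per_user}


# Toy usage with small thresholds so the effect is visible.
toy = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "z"), ("c", "x")]
kept = prune(toy, min_users_per_like=2, min_likes_per_user=2)
print(sorted(kept))  # [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
```

Object "z" is dropped because only one user liked it; user "c" is then left with a single Like and is dropped as well.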

What remains?

A binary User-Like matrix of 58,466 users (rows) by 55,814 objects (columns); each cell is 1 if the user liked the object and 0 otherwise.
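A minimal sketch of how such a binary matrix could be assembled from (user, object) Like pairs. The helper `build_user_like_matrix` is hypothetical, and a dense list of lists is used only for readability; a matrix of this size would realistically be stored in a sparse format.

```python
def build_user_like_matrix(pairs):
    """Hypothetical helper: turn (user, object) Like pairs into a binary matrix.

    Rows are users, columns are objects; a cell is 1 if the user liked the object.
    Dense lists are used for illustration only.
    """
    users = sorted({u for u, _ in pairs})
    objects = sorted({o for _, o in pairs})
    row_of = {u: i for i, u in enumerate(users)}
    col_of = {o: j for j, o in enumerate(objects)}
    matrix = [[0] * len(objects) for _ in users]  # all cells start at 0
    for u, o in pairs:
        matrix[row_of[u]][col_of[o]] = 1  # mark the Like
    return users, objects, matrix


users, objects, M = build_user_like_matrix({("u1", "a"), ("u1", "b"), ("u2", "b")})
print(users, objects, M)  # ['u1', 'u2'] ['a', 'b'] [[1, 1], [0, 1]]
```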


Approach - Labels

Personality traits: measured with the International Personality Item Pool (IPIP) questionnaire.

Religion, political party, etc.: taken from users' Facebook profiles.

Ethnicity: determined by looking at users' pictures.

Two types of variables: dichotomous (e.g., gender) and numeric (e.g., age).


Approach - Models

Reduce the dimensionality of the User-Like matrix with singular value decomposition (SVD).

Use 100 components.
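To illustrate what keeping the top components means, the sketch below finds the leading right singular vectors of a small User-Like matrix by power iteration on the Gram matrix M^T M, then projects each user onto them. This is a swapped-in, toy-scale technique chosen so the example needs only the standard library; it is not the authors' implementation, and a matrix with tens of thousands of rows and columns would call for a sparse truncated-SVD routine. The function `top_components` is hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_components(M, k, iters=200, seed=0):
    """Hypothetical toy truncated SVD: the k leading right singular vectors of M.

    Power iteration on G = M^T M, orthogonalising against components already
    found (deflation). Returns (scores, components), where scores[i][j] is
    user i's projection onto component j, i.e. the rows of U * Sigma.
    """
    n_rows, n_cols = len(M), len(M[0])
    # Gram matrix over objects: G[i][j] = sum over users of M[u][i] * M[u][j]
    G = [[sum(M[u][i] * M[u][j] for u in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_cols)]
        for _ in range(iters):
            w = [_dot(row, v) for row in G]  # w = G v
            for c in components:             # remove directions already found
                proj = _dot(w, c)
                w = [wi - proj * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(_dot(w, w))
            if norm == 0.0:
                raise ValueError("matrix rank is smaller than k")
            v = [wi / norm for wi in w]
        components.append(v)
    scores = [[_dot(row, c) for c in components] for row in M]
    return scores, components


M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
scores, comps = top_components(M, k=2)
print([round(x, 3) for x in comps[0]])  # [0.707, 0.707, 0.0]
```

Each user is then described by k numbers instead of one column per object; with k = 100 this gives the compact user representation the models are built on.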

Build models that predict traits and attributes from these components.

For numeric variables: linear regression

For dichotomous variables: logistic regression
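The two model types can be sketched with the standard library alone: ordinary least squares via the normal equations for numeric variables, and logistic regression fitted by batch gradient descent for dichotomous ones. This is a minimal illustration, assuming each row of `X` holds one user's component scores; the function names, learning rate, and epoch count are hypothetical choices, not taken from the slides.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def _with_bias(row):
    return [1.0] + list(row)  # prepend an intercept term


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    Xb = [_with_bias(r) for r in X]
    d = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return _solve(XtX, Xty)


def _sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    Xb = [_with_bias(r) for r in X]
    w = [0.0] * len(Xb[0])
    n = len(Xb)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for r, t in zip(Xb, y):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, r))) - t
            for j, xj in enumerate(r):
                grad[j] += err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def predict_numeric(w, x):
    return sum(wi * xi for wi, xi in zip(w, _with_bias(x)))


def predict_probability(w, x):
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, _with_bias(x))))


# Toy usage: each row of X would be one user's SVD component scores.
w_num = fit_linear([[0], [1], [2], [3]], [1, 3, 5, 7])
print([round(v, 6) for v in w_num])  # [1.0, 2.0]
w_bin = fit_logistic([[-2], [-1], [1], [2]], [0, 0, 1, 1])
print(predict_probability(w_bin, [1.5]) > 0.5)  # True
```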


Approach - Overview

Results - Dichotomous Variables

Highest accuracy: gender & ethnicity

Lowest accuracy: divorced parents
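The slide does not state how accuracy is measured for dichotomous variables. Assuming the area under the ROC curve (AUC), a common score for binary classifiers, it can be computed as the probability that a randomly chosen positive user receives a higher predicted score than a randomly chosen negative one. The function `auc` is a hypothetical, pairwise O(P × N) sketch meant for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via its rank interpretation.

    Probability that a random positive (label 1) is scored above a random
    negative (label 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # 0.75
print(auc([1, 0], [0.5, 0.5]))                  # 0.5
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which makes scores comparable across attributes with very different base rates.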


Results - Numeric Variables
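The chart for numeric variables is not reproduced here, and the slide does not name its metric. Assuming accuracy is expressed as the Pearson correlation between predicted and actual values, it can be computed as follows; `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mp = sum(predicted) / n
    ma = sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```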


Results - Predictive Likes


Results


Results - Power of Likes

Even a single Like results in non-negligible prediction accuracy.


Results - Overview


Likes can accurately predict individual traits and attributes.

Only a few users had Likes that explicitly revealed their attributes.

Less than 5% of gay users liked explicitly gay objects.


Conclusion

Personal attributes, ranging from sexual orientation to intelligence, can be automatically and accurately inferred from people's Facebook Likes.

PROS

• Improve products and services
• Improve recommendations
• New avenues in psychology

CONS

• Revealing private attributes without consent (danger)
• What do we reveal?
• Distrust of online services


Thank you!