Mental Disorder Detection on Twitter


Mental Disorder Detection on Twitter: Bipolar Disorder and Borderline Personality Disorder

National Tsing Hua University
Department of Information System and Application

Advisor: Prof. Yi-Shin Chen
Student: Chun-Hao Chang

Introduction

18.1% of people in the United States suffer from a mental disorder (*)

Using social networks to research mental disorders

(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml

Analyze

Challenges

How to efficiently collect the tweets of patients?

How to correctly detect mental disorder patients?

Related Works

Two important related works are as follows:

Quantifying Mental Health Signals in Twitter - Johns Hopkins University (Coppersmith, G., Dredze, M., & Harman, C., 2014)

Automatically collects patients by matching "I was diagnosed with X" in tweets
Predicts 4 kinds of disorder

Predicting Depression via Social Media - Microsoft (De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E., ICWSM 2013)

Collects data from Amazon Mechanical Turk and purchased Twitter data
Able to predict that a user has depression before a formal diagnosis

700 Million Tweets from Oct 2014 to Aug 2015

Only 8 Bipolar and 3 BPD patients are found
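
The diagnosis-statement matching attributed to Coppersmith et al. above can be sketched as a regular-expression search. This is only an illustration, not their implementation: the disorder list, the function name and the exact pattern are assumptions, since the slides give only the template "I was diagnosed with X".

```python
import re

# Hypothetical list of disorder names to substitute for "X".
DISORDERS = ["borderline personality disorder", "bipolar disorder", "bipolar", "bpd"]

# Longer names come first so that "bipolar disorder" wins over "bipolar".
_ALTERNATIVES = "|".join(
    re.escape(name) for name in sorted(DISORDERS, key=len, reverse=True)
)
DIAGNOSIS_RE = re.compile(
    r"\bi\s+was\s+diagnosed\s+with\s+(?P<disorder>" + _ALTERNATIVES + r")\b",
    re.IGNORECASE,
)


def find_self_reported_diagnosis(tweet_text):
    """Return the lower-cased disorder named in a self-report, or None."""
    match = DIAGNOSIS_RE.search(tweet_text)
    return match.group("disorder").lower() if match else None
```

A pattern this strict only fires on explicit self-reports, which is consistent with the very small number of patients found in the 700 million tweets mentioned above.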

Background

Bipolar Disorder:

*Unstable and impulsive emotions

Cycling between manic and depressive episodes

Borderline Personality Disorder:

*Unstable and impulsive emotions

Impaired social interactions

Framework

Data Collecting

A community portal is a Twitter account that is followed by many patients.

The community portals can be found by searching for the disorder on the Twitter website.

Pipeline: Manually Collect Community Portals -> Collect Followers (REST API) -> Keywords Filter -> Manual Verification -> Collect Tweets (REST API)

Random samples: Randomly Sample Users (Streaming API & REST API)

Output: Tweets of Patients, Experts and Random Samples

Data Collecting

Download the followers of the community portals (5000 followers for each portal in this study).

Select suspected patients from the follower profiles by keywords ("BPD" and "Bipolar" in this study).

Manually label the users as patients, experts, or unrelated.
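
The keyword step above can be sketched as a simple profile filter. This is a minimal sketch under assumptions: followers are represented as dictionaries with hypothetical `screen_name` and `description` keys, and whole-word, case-insensitive matching is a choice made here, not something the slides specify. The manual labeling step still has to follow.

```python
import re

# Keywords named in the slides; matching rules are an assumption.
KEYWORDS = ("bpd", "bipolar")
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def candidate_patients(followers):
    """Keep followers whose profile description mentions a keyword.

    `followers` is an iterable of dicts with hypothetical keys
    'screen_name' and 'description' (description may be None).
    """
    return [
        follower
        for follower in followers
        if KEYWORD_RE.search(follower.get("description") or "")
    ]
```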

A BPD patient

A BPD Expert

Data Collecting

Patients and experts: download tweets via the REST API (at most 3200 tweets per user, excluding retweets).

Random samples:

1. Randomly sample English-speaking users via the Twitter Streaming API
2. Download their tweets via the REST API (at most 3200 tweets per user, excluding retweets)

Data Collected

Group              Users
Random Samples     823
Bipolar            798
BPD                427
Bipolar Experts    54
BPD Experts        42

We assume that these randomly sampled Twitter users and experts do not have Bipolar or BPD.

Because the prevalence of Bipolar is 2.6% and of BPD is 1.6% (*) in the United States, this assumption should not seriously damage the predictive performance.

(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
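
As a rough check on the assumption above, the expected number of mislabeled users in the random sample can be estimated from the prevalence figures. The sketch assumes the random sample follows the quoted United States prevalence, which the slides do not verify; the function name is hypothetical.

```python
RANDOM_SAMPLE_USERS = 823      # random samples collected (from the table above)
PREVALENCE = {"Bipolar": 0.026, "BPD": 0.016}  # quoted prevalence figures


def expected_mislabeled(n_users, prevalence):
    """Expected number of sampled users who actually have the disorder."""
    return n_users * prevalence


for disorder, rate in PREVALENCE.items():
    estimate = expected_mislabeled(RANDOM_SAMPLE_USERS, rate)
    print(f"{disorder}: about {estimate:.1f} of {RANDOM_SAMPLE_USERS} users")
```

Under that assumption, roughly 21 of the 823 random users would be expected to have Bipolar and roughly 13 to have BPD.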

Preprocessing

Sentiment 140 API

Emotion Classification API

Processed Data of Patients, Experts and Random Samples

Spam and Inactive User Filter

1. Number of tweets > 100
2. Fewer than 50% of tweets contain a hyperlink
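
The two filter criteria above can be sketched as a single predicate over a user's tweets. The URL pattern and the function name are assumptions made for illustration; the thresholds are the ones stated in the slide.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # assumed hyperlink pattern
MIN_TWEETS = 100       # user must have more than this many tweets
MAX_LINK_SHARE = 0.5   # share of tweets with a hyperlink must stay below this


def keep_user(tweets):
    """Return True if the user passes the spam and inactivity filter.

    `tweets` is a list of tweet texts for one user.
    """
    if len(tweets) <= MIN_TWEETS:
        return False  # inactive user
    with_link = sum(1 for text in tweets if URL_RE.search(text))
    return with_link / len(tweets) < MAX_LINK_SHARE
```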

Positive / Negative / Neutral

Data after preprocessing

Group              Users   Tweets   Average Tweets per User
Random Samples     548     796957   1454.3
Bipolar Patients   278     347774   1250.99
BPD Patients       203     225774   1112.19
Bipolar Experts    11      14056    1611.67
BPD Experts        9       19696    1790.55

Feature Extraction

TF-IDF Features (TF-IDF Calculation): open-vocabulary approach calculating unigrams and bigrams

LIWC Features (LIWC Counting): 64 categories of a psychological dictionary

Pattern of Life Features (Polarity Extraction, Emotions Extraction, Age and Gender Prediction, Social Behavior Extraction): personal behaviors, i.e. emotional patterns, social interactions and profile data
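
The open-vocabulary TF-IDF features over unigrams and bigrams can be illustrated with a small standard-library sketch. It is not the authors' implementation: whitespace tokenization, treating each user's tweets as one document, and the plain `tf * log(N / df)` weighting are all assumptions made here.

```python
import math
from collections import Counter


def ngrams(text):
    """Return lower-cased unigrams followed by bigrams of a text."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(documents):
    """Compute one {term: weight} dict per document.

    Weight = (term count / total terms in document) * log(N / document frequency).
    """
    term_counts = [Counter(ngrams(doc)) for doc in documents]
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(documents)
    vectors = []
    for counts in term_counts:
        total = sum(counts.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that appear in every document receive a weight of zero, so only vocabulary that distinguishes one user from another contributes to the feature vector.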

Feature Extraction: Pattern of Life Features

Proposed by Coppersmith et al. We further extend them as follows:

1. Polarity: positive and negative percentages, positive and negative combo ratios, flip ratio

2. Emotions: percentage of each of eight emotions

3. Age and Gender: inferred age and gender (*)

4. Social Interactions: mention rate, frequent-mention count, unique-mention count

Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8.9 (2013): e73791.
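
The social interaction features in item 4 could be computed along the following lines. The slides do not define them precisely, so the definitions here are assumptions: mention rate is taken as the share of tweets containing at least one @mention, and a "frequent" mention is an account mentioned at least `FREQUENT_THRESHOLD` times (a hypothetical value).

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")
FREQUENT_THRESHOLD = 3  # hypothetical cut-off for a "frequent" mention


def social_features(tweets):
    """Compute mention-based features from a list of tweet texts."""
    mentions = Counter()
    tweets_with_mention = 0
    for text in tweets:
        names = [name.lower() for name in MENTION_RE.findall(text)]
        if names:
            tweets_with_mention += 1
        mentions.update(names)
    n_tweets = len(tweets)
    return {
        "mention_rate": tweets_with_mention / n_tweets if n_tweets else 0.0,
        "unique_mentions": len(mentions),
        "frequent_mentions": sum(
            1 for count in mentions.values() if count >= FREQUENT_THRESHOLD
        ),
    }
```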


Feature Extraction: Illustration of Combos and Flips

Time intervals between tweets: 3 min, 900 min, 18 min, 15 min, 800 min, 200 min

Flip time threshold: 30 min
Combo time threshold: 120 min

Result: 1 flip, 3 negative combos

Flip Ratio = 1 / 7
Negative Combo Ratio = 3 / 7
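
One way to reproduce the ratios in this illustration is sketched below. The definitions are assumptions, since the slide only gives thresholds and results: a flip is a pair of consecutive tweets with opposite polarity posted within the flip threshold, and a combo counts every tweet that belongs to a run of same-polarity tweets whose gaps stay within the combo threshold. The polarity sequence in the usage example is hypothetical, chosen to be consistent with the quoted intervals and ratios, because the polarities in the original figure are not recoverable from the text.

```python
FLIP_THRESHOLD_MIN = 30
COMBO_THRESHOLD_MIN = 120


def flip_ratio(polarities, gaps, threshold=FLIP_THRESHOLD_MIN):
    """Share of tweets followed by an opposite-polarity tweet within `threshold`.

    `polarities` holds "pos", "neg" or "neu" per tweet; `gaps` holds the
    minutes between consecutive tweets (one fewer entry than `polarities`).
    """
    if not polarities:
        return 0.0
    flips = sum(
        1
        for i, gap in enumerate(gaps)
        if gap <= threshold and {polarities[i], polarities[i + 1]} == {"pos", "neg"}
    )
    return flips / len(polarities)


def combo_ratio(polarities, gaps, polarity, threshold=COMBO_THRESHOLD_MIN):
    """Share of tweets that sit in a same-polarity run with gaps within `threshold`."""
    if not polarities:
        return 0.0
    in_combo = [False] * len(polarities)
    for i, gap in enumerate(gaps):
        if gap <= threshold and polarities[i] == polarities[i + 1] == polarity:
            in_combo[i] = in_combo[i + 1] = True
    return sum(in_combo) / len(polarities)


# Intervals from the illustration; polarities are hypothetical.
gaps = [3, 900, 18, 15, 800, 200]
polarities = ["pos", "neg", "neg", "neg", "neg", "neu", "pos"]
print(flip_ratio(polarities, gaps))          # 1 flip out of 7 tweets
print(combo_ratio(polarities, gaps, "neg"))  # 3 combo tweets out of 7
```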

Classifiers Training and Evaluations

TF-IDF Models

Pattern of Life Models

LIWC Models

Random Forest Classifier Training

10-Fold Cross Validation Test

Selection Bias Test

Limited Data Test

10-Fold Cross Validation Test

Shows the relationship between precision and recall.

Randomly split the data into 10 chunks, using 9 chunks for training and 1 chunk for testing, then calculate the precision and recall over the iterations.
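
The splitting and scoring procedure can be sketched with the standard library alone. The Random Forest training step is deliberately left out, and the helper names and fixed seed are assumptions for illustration.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal chunks
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (truthy = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Each user index appears in exactly one test fold, so every user is scored once by a model that did not see them during training.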

Evaluations on Bipolar: 10-fold Cross Validation

Area Under the Curve:

Pattern of Life   0.90
LIWC              0.91
TF-IDF            0.96

Evaluations on BPD: 10-fold Cross Validation

Area Under the Curve:

Pattern of Life   0.91
LIWC              0.90
TF-IDF            0.96

Selection Bias Test

Tests whether the model is predicting people who have a disorder or people who are just talking about it.

11 Bipolar experts and 9 BPD experts are used as the testing data. The result shows how strongly each classifier tends to misclassify experts as patients.
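
The quantity measured by this test can be expressed as the share of experts that a classifier labels as patients. The sketch below assumes the classifier's outputs are already available as booleans; the function name is hypothetical and the predictions in the usage example are made-up placeholders, not results from the study.

```python
def expert_misclassification_rate(predicted_as_patient):
    """Share of experts that the classifier labeled as patients.

    `predicted_as_patient` holds one boolean per expert account.
    """
    if not predicted_as_patient:
        return 0.0
    return sum(predicted_as_patient) / len(predicted_as_patient)


# Placeholder predictions for 11 expert accounts (NOT data from the study).
example_predictions = [True, False, False, True, False, False,
                       False, False, True, False, False]
print(expert_misclassification_rate(example_predictions))
```

A lower rate suggests the classifier is responding to how patients behave rather than to disorder-related vocabulary.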

Selection Bias Test

Top 10 Keywords from TF-IDF Classifier

Bipolar          BPD
mentalhealth     dbt
meds             feeling
blog             borderline
therapy          helps
anxiety          self harm
thoughts         psychiatrist
feel better      cpn
electroboyusa    disorder
health           bpdchat
bipolarblogger   depression

The TF-IDF classifier tends to detect people who are talking about the disorder.

Limited Data Test

Reveals how precision changes when the number of tweets is limited.

Similar to 10-fold cross validation, but the testing data are extracted only from the latest K tweets.
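
Restricting a test user to the latest K tweets can be sketched as follows. The `(timestamp, text)` tuple representation and the function name are assumptions; features for the test users would then be extracted from the truncated list only.

```python
def latest_k_tweets(tweets, k):
    """Return the texts of the k most recent tweets, newest first.

    `tweets` is a list of (timestamp, text) tuples; timestamps only need
    to be comparable (for example datetime objects or epoch seconds).
    """
    newest_first = sorted(tweets, key=lambda item: item[0], reverse=True)
    return [text for _, text in newest_first[:k]]
```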

Evaluations on Bipolar: Limited Tweets Precision

Evaluations on BPD: Limited Tweets Precision

Conclusion:

How to efficiently collect the tweets of patients?

We proposed an efficient and accessible way to collect the tweets of patients.

How to correctly detect mental disorder patients?

We suggest that the Pattern of Life model gives high precision and low bias.
