Mental Disorder Detection on Twitter: Bipolar Disorder and Borderline Personality Disorder National Tsing Hua University Department of Information System and Application Advisor: Prof. Yi-Shin Chen Student: Chun-Hao Chang


Page 1: Mental Disorder Detection on Twitter

Mental Disorder Detection on Twitter: Bipolar Disorder and Borderline Personality Disorder

National Tsing Hua University
Department of Information System and Application

Advisor: Prof. Yi-Shin Chen
Student: Chun-Hao Chang

Page 2: Mental Disorder Detection on Twitter

Introduction

18.1% of people in the United States suffer from a mental disorder (*)

Using social networks to research mental disorders

(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml

Analyze

Page 3: Mental Disorder Detection on Twitter

Challenges

How to efficiently collect the tweets of patients?

How to correctly detect mental disorder patients?

Page 4: Mental Disorder Detection on Twitter

Related Works

Two important related works are as follows:

Quantifying Mental Health Signals in Twitter - Johns Hopkins University (Coppersmith, G., Dredze, M., & Harman, C. (2014))

Automatically collects patients by matching “I was diagnosed with X” in tweets. Predicts 4 kinds of disorder.

Predicting Depression via Social Media - Microsoft (M De Choudhury, M Gamon, S Counts, E Horvitz - ICWSM, 2013)

Collects data from Amazon Mechanical Turk and purchased Twitter data. Able to predict that a user has depression before a formal diagnosis.
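The self-report matching used by Coppersmith et al. can be illustrated with a regular expression. This is only a minimal sketch: the pattern, the alias table and the function name are assumptions made for illustration, not the expressions used in the cited paper.

```python
import re

# Hypothetical pattern: captures the words that follow "I was diagnosed with".
DIAGNOSIS_RE = re.compile(r"\bi was diagnosed with ([a-z ]+)", re.IGNORECASE)

# Hypothetical alias table mapping surface forms to disorder labels.
DISORDER_ALIASES = {"bipolar": "Bipolar", "bpd": "BPD", "borderline": "BPD"}


def self_reported_disorders(tweet: str) -> set:
    """Return the disorder labels a tweet claims a diagnosis for."""
    found = set()
    for match in DIAGNOSIS_RE.finditer(tweet):
        claim = match.group(1).lower()
        for alias, label in DISORDER_ALIASES.items():
            # Whole-word match so "bpd" does not fire inside another word.
            if re.search(rf"\b{alias}\b", claim):
                found.add(label)
    return found
```

A tweet that only talks about a disorder, without the diagnosis phrase, returns an empty set, which is why this approach is precise but finds few users.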

Page 5: Mental Disorder Detection on Twitter

Related Works


Matching “I was diagnosed with X” against 700 million tweets from Oct 2014 to Aug 2015 finds only 8 Bipolar and 3 BPD patients.

Page 6: Mental Disorder Detection on Twitter

Background

Bipolar Disorder:

*Unstable and impulsive emotions

Cycling between manic and depressive episodes

Borderline Personality Disorder:

*Unstable and impulsive emotions

Impaired social interactions

Page 7: Mental Disorder Detection on Twitter

Framework

Page 8: Mental Disorder Detection on Twitter

Data Collecting

A community portal is a Twitter account that is followed by many patients.

Community portals can be found by searching for the disorder on the Twitter website.

[Figure: data collection pipeline. Boxes: Manually Collect Community Portals; Collect Followers (REST API); Keywords Filter; Manual Verification; Collect Tweets (REST API); Randomly Sample Users (Streaming API & REST API). Output: Tweets of Patients, Experts and Random Samples. Step 1 is highlighted.]

Page 9: Mental Disorder Detection on Twitter
Page 10: Mental Disorder Detection on Twitter

Data Collecting

Download followers of community portals.

(5000 followers for each portal in this study)

Select suspected patients from follower profiles by keywords:

BPD and Bipolar in this study

Manually label the users as patients, experts or unrelated.
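The keyword step can be sketched as a simple filter over follower profiles. This is an illustrative sketch only: the `description` field name, the case-insensitive substring matching and the function names are assumptions. The slides name only the two keywords.

```python
# The two keywords named in this study.
KEYWORDS = ("bpd", "bipolar")


def is_candidate(profile_description: str) -> bool:
    """True if the profile text mentions any keyword (case-insensitive)."""
    text = profile_description.lower()
    return any(keyword in text for keyword in KEYWORDS)


def filter_candidates(followers: list) -> list:
    """Keep followers whose profile description matches a keyword.

    Each follower is assumed to be a dict with an optional "description" key.
    """
    return [f for f in followers if is_candidate(f.get("description", ""))]
```

The filter only narrows the follower list. Deciding whether a candidate is a patient, an expert or unrelated still needs the manual labelling step.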

[Figure: data collection pipeline, steps 2, 3 and 4 highlighted.]

Page 11: Mental Disorder Detection on Twitter

A BPD patient

Page 12: Mental Disorder Detection on Twitter

A BPD Expert

Page 13: Mental Disorder Detection on Twitter

Data Collecting

For patients and experts: download tweets by REST API

(3200 tweets at most, excluding retweets)

For random samples:

1. Randomly sample English-speaking users by Twitter Streaming API

2. Download tweets by REST API

(3200 tweets at most, excluding retweets)

[Figure: data collection pipeline, steps 5 and 6 highlighted.]

Page 14: Mental Disorder Detection on Twitter

Data Collected

Group              Users
Random Samples     823
Bipolar            798
BPD                427
Bipolar Experts    54
BPD Experts        42

We assume that these randomly sampled Twitter users and the experts do not have Bipolar or BPD.

Because the prevalence of Bipolar is 2.6% and of BPD is 1.6% (*) in the United States, this assumption should not seriously damage the predictive performance.

(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml

Page 15: Mental Disorder Detection on Twitter

Preprocessing

Spam and Inactive User Filter

Keep users with:

1. More than 100 tweets
2. Fewer than 50% of tweets containing a hyperlink

Sentiment140 API: Positive, Negative, Neutral

Emotion Classification API

Processed Data of Patients, Experts and Random Samples
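The two filter criteria can be written as a short predicate. This is a sketch under assumptions: a hyperlink is detected by the presence of "http://" or "https://" in the tweet text, and the function and parameter names are made up for illustration.

```python
def passes_user_filter(tweets: list, min_tweets: int = 100,
                       max_link_ratio: float = 0.5) -> bool:
    """Keep a user with more than `min_tweets` tweets, of which fewer than
    `max_link_ratio` contain a hyperlink."""
    if len(tweets) <= min_tweets:
        return False  # inactive user
    with_link = sum(1 for t in tweets if "http://" in t or "https://" in t)
    return with_link / len(tweets) < max_link_ratio  # otherwise likely spam
```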

Page 16: Mental Disorder Detection on Twitter

Data after preprocessing

Group               Users    Tweets    Average Tweets per User
Random Samples      548      796957    1454.3
Bipolar Patients    278      347774    1250.99
BPD Patients        203      225774    1112.19
Bipolar Experts     11       14056     1611.67
BPD Experts         9        19696     1790.55

Page 17: Mental Disorder Detection on Twitter

Feature Extraction

TF-IDF Features (TF-IDF Calculation)

Open vocabulary approach by calculating unigrams and bigrams

LIWC Features (LIWC Counting)

64 categories of a psychological dictionary

Pattern of Life Features (Polarity Extraction, Emotions Extraction, Age and Gender Prediction, Social Behavior Extraction)

Personal behaviors: Emotional Pattern, Social Interactions and Profile Data
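To make the open-vocabulary idea concrete, the following sketch computes TF-IDF weights over unigrams and bigrams in plain Python. It assumes the textbook weighting tf × log(N / df), whitespace tokenisation, and one "document" per user. The slides do not state which TF-IDF variant or tokeniser was used.

```python
import math
from collections import Counter


def ngrams(text: str) -> list:
    """Lower-cased unigrams plus bigrams of a whitespace-tokenised text."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(docs: list) -> list:
    """Return one {term: weight} dict per document."""
    term_counts = [Counter(ngrams(doc)) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each term counted once per document
    n_docs = len(docs)
    vectors = []
    for counts in term_counts:
        total = sum(counts.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document get weight 0, so the features emphasise vocabulary that separates one group of users from another.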

Page 18: Mental Disorder Detection on Twitter

Feature Extraction: Pattern of Life Features

Proposed by Coppersmith et al. We further improve them as follows:

1. Polarity: Positive and negative percentages, Positive and negative combo ratios, Flip ratio

2. Emotions: Percentage of eight emotions

3. Age and Gender: Inferred age and gender (*)

4. Social Interactions: Mentioning Rate, Frequent Mentioning Counts, Unique Mentioning Counts

(*) Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8.9 (2013): e73791.
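The social interaction features can be sketched from raw tweet text as follows. The definitions are assumptions: the mentioning rate is taken as the share of tweets containing at least one @mention, and a "frequent" mention is an account mentioned at least `frequent_threshold` times (3 here, a hypothetical value). The slides do not give the exact definitions.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")


def mention_features(tweets: list, frequent_threshold: int = 3) -> dict:
    """Compute mention-based social interaction features for one user."""
    mention_counts = Counter()
    tweets_with_mention = 0
    for tweet in tweets:
        names = [name.lower() for name in MENTION_RE.findall(tweet)]
        if names:
            tweets_with_mention += 1
        mention_counts.update(names)
    n = len(tweets)
    return {
        "mentioning_rate": tweets_with_mention / n if n else 0.0,
        "unique_mentioning_count": len(mention_counts),
        "frequent_mentioning_count": sum(
            1 for count in mention_counts.values() if count >= frequent_threshold
        ),
    }
```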

Page 19: Mental Disorder Detection on Twitter

Feature Extraction: Illustration of Combos and Flips

[Figure: timeline of 7 tweets. Time intervals between tweets: 3 min, 900 min, 18 min, 15 min, 800 min, 200 min. The timeline contains 1 flip and 3 negative combos.]

Flip time threshold: 30 min
Combo time threshold: 120 min

Flip Ratio = 1 / 7
Negative Combo Ratio = 3 / 7
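One way to read the illustration is sketched below. The definitions are assumptions, since the slide gives only thresholds and totals: a flip is a pair of consecutive tweets with opposite polarity posted within the flip threshold, and a combo counts the tweets in a run of two or more consecutive same-polarity tweets whose gaps are all within the combo threshold. Both counts are divided by the number of tweets. The polarity labels in the usage example are hypothetical, chosen so that the slide's intervals reproduce its ratios.

```python
FLIP_THRESHOLD_MIN = 30    # flip time threshold from the slide
COMBO_THRESHOLD_MIN = 120  # combo time threshold from the slide


def flip_and_combo_ratios(polarities: list, gaps_min: list,
                          combo_polarity: str = "neg") -> tuple:
    """Return (flip ratio, combo ratio) under the assumed definitions.

    polarities: one label per tweet ("pos", "neg" or "neu"), oldest first.
    gaps_min:   minutes between consecutive tweets, len(polarities) - 1 items.
    """
    n = len(polarities)
    if n == 0 or len(gaps_min) != n - 1:
        raise ValueError("need one gap between each pair of consecutive tweets")

    # Flips: opposite polarity within the flip threshold.
    flips = sum(
        1 for i, gap in enumerate(gaps_min)
        if gap <= FLIP_THRESHOLD_MIN
        and {polarities[i], polarities[i + 1]} == {"pos", "neg"}
    )

    # Combos: tweets inside runs of >= 2 same-polarity tweets with short gaps.
    combo_tweets = 0
    run = 1 if polarities[0] == combo_polarity else 0
    for i, gap in enumerate(gaps_min):
        prev, cur = polarities[i], polarities[i + 1]
        if prev == cur == combo_polarity and gap <= COMBO_THRESHOLD_MIN:
            run += 1
        else:
            if run >= 2:
                combo_tweets += run
            run = 1 if cur == combo_polarity else 0
    if run >= 2:
        combo_tweets += run

    return flips / n, combo_tweets / n


# Intervals from the slide; polarity labels are hypothetical.
gaps = [3, 900, 18, 15, 800, 200]
labels = ["pos", "neg", "neg", "neg", "neg", "pos", "neg"]
flip_ratio, negative_combo_ratio = flip_and_combo_ratios(labels, gaps)
```

With these labels the only flip is the first pair (3 min apart), and the third to fifth tweets form one negative run (18 and 15 min apart), giving the ratios 1/7 and 3/7.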

Page 20: Mental Disorder Detection on Twitter

Classifiers Training and Evaluations

TF-IDF Models

Pattern of Life Models

LIWC Models

Random Forest Classifier Training

10-Fold Cross Validation Test

Selection Bias Test

Limited Data Test

Page 21: Mental Disorder Detection on Twitter

Classifiers Training and Evaluations

10-Fold Cross Validation Test

Shows the relationship between precision and recall.

Randomly split the data into 10 chunks: 9 chunks for training and 1 chunk for testing. Calculate the precision and recall after multiple iterations.
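The splitting and scoring described above can be sketched in plain Python. The function names and the fixed random seed are assumptions for illustration. The classifier itself (a Random Forest in this study) is left out, so the sketch only shows how the chunks and the two metrics are formed.

```python
import random


def ten_fold_splits(items, k: int = 10, seed: int = 0):
    """Yield (train, test) lists: each of the k chunks is the test set once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = chunks[i]
        train = [x for j, chunk in enumerate(chunks) if j != i for x in chunk]
        yield train, test


def precision_recall(y_true: list, y_pred: list) -> tuple:
    """Precision and recall for binary labels (truthy means patient)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Sweeping the classifier's decision threshold and recomputing both metrics at each setting gives the precision-recall relationship that this test reports.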

Page 22: Mental Disorder Detection on Twitter

Evaluations on Bipolar: 10-fold Cross Validation

Area Under the Curve:

Pattern of Life    0.90
LIWC               0.91
TF-IDF             0.96

Page 23: Mental Disorder Detection on Twitter

Evaluations on BPD: 10-fold Cross Validation

Area Under the Curve:

Pattern of Life    0.91
LIWC               0.90
TF-IDF             0.96

Page 24: Mental Disorder Detection on Twitter

Classifiers Training and Evaluations

Selection Bias Test

Tests whether the model predicts people who have the disorder or people who are just talking about it.

11 Bipolar experts and 9 BPD experts are used as the testing data. The test shows the tendency of the classifiers to misclassify experts as patients.
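The quantity this test reports can be sketched as the share of experts that a classifier labels as patients. The function name is an assumption, and the predictions in the usage example are made up to show the calculation. They are not results from this study.

```python
def expert_misclassification_rate(predicted_as_patient: list) -> float:
    """Share of experts (none of whom are assumed to be patients) that the
    classifier labels as patients. Lower means less selection bias."""
    if not predicted_as_patient:
        raise ValueError("need at least one expert prediction")
    flagged = sum(1 for is_patient in predicted_as_patient if is_patient)
    return flagged / len(predicted_as_patient)


# Made-up predictions for 11 experts: 2 flagged as patients.
example_rate = expert_misclassification_rate([True] * 2 + [False] * 9)
```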

Page 25: Mental Disorder Detection on Twitter

Selection Bias Test

Page 26: Mental Disorder Detection on Twitter

TOP 10 Keywords from TF-IDF Classifier

Bipolar           BPD
mentalhealth      dbt
meds              feeling
blog              borderline
therapy           helps
anxiety           self harm
thoughts          psychiatrist
feel better       cpn
electroboyusa     disorder
health            bpdchat
bipolarblogger    depression

The TF-IDF classifier tends to detect people who are talking about the disorder.

Page 27: Mental Disorder Detection on Twitter

Classifiers Training and Evaluations

Limited Data Test

Reveals how precision changes when the number of tweets is limited.

Similar to 10-fold cross validation, but the testing data are extracted only from the latest K tweets.
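Restricting a test user to the latest K tweets can be sketched as below. The `(timestamp, text)` tuple layout and the function name are assumptions for illustration.

```python
def latest_k_tweets(tweets: list, k: int) -> list:
    """Return the k most recent tweets, oldest first.

    Each tweet is assumed to be a (timestamp, text) tuple.
    """
    if k <= 0:
        return []  # guard: a slice of [-0:] would return every tweet
    ordered = sorted(tweets, key=lambda tweet: tweet[0])
    return ordered[-k:]
```

Features for each test user would then be extracted from this truncated list, while the training users keep their full history.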

Page 28: Mental Disorder Detection on Twitter

Evaluations on Bipolar: Limited Tweets Precision

Page 29: Mental Disorder Detection on Twitter

Evaluations on BPD: Limited Tweets Precision

Page 30: Mental Disorder Detection on Twitter

Conclusion:

How to efficiently collect the tweets of patients?

We proposed an efficient and accessible way to collect tweets of patients.

How to correctly detect mental disorder patients?

We suggest that the Pattern of Life model gives high precision and low bias.