A Study on the Spacio-Temporal Trend of Brand Index using Twitter Messages Sentiment Analysis



Abstract

[Diagram: Twitter Data and Sentiment Analysis, with the fields Social, Science, Human, Art, Medical, Economy]


Introduction

Twitter Crawling

Data Pre-processing

Korean Morphology Analysis

Twitter Opinion Mining Sentiment Dictionary

Evaluating performance of candidate classifiers

Sentiment Classification

Visualize Associative Relationship of Terms

Relationship with Brand Index


Twitter Crawling

Twitter API

Streaming API: get 1% of all Twitter data in real time

REST API (Search API): get Twitter data matching a keyword

Collection period: 2013.9.9 (Mon) 9:35 pm ~ now

About 10,000 ~ 15,000 tweets per day

Total: 1,220,000 tweets (as of 2013.11.2, Sat)


Data Pre-Processing

Keep only tweets that contain at least 3 Korean characters and were posted within a 500 km radius of Seoul, Korea. This removes foreign-language tweets and special characters.

Remove tweets which only contain location information.

Remove retweets

Examples of tweets removed by these rules:

ويتكلم نهائيا السمع فقد متعب ابو الملك ان خبر اكد المستوى رفيع وامير موثوق صدرتخريف )) (( مفهوم وغير مترابط غير Sat Oct 12 00:06:37 KST 2013::كالم

I'm at Club ELLUI - @ellui_seoul ( 서울특별시 ) w/ 2 others http://t.co/zhcrncosKH::Sat Oct 12 00:02:06 KST 2013
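
The filtering rules on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the Seoul coordinates, the `RT @` retweet marker, the `I'm at ` check-in prefix, and the function names are all assumptions, and the Hangul threshold is a parameter because the slide's wording leaves the exact cutoff open.

```python
import math
import re

# Hangul syllables block (U+AC00 to U+D7A3). Jamo-only strings such as
# "ㅋㅋ" are not counted here; that is a simplification of this sketch.
HANGUL = re.compile(r"[\uac00-\ud7a3]")

SEOUL = (37.5665, 126.9780)  # assumed city-centre coordinates (lat, lon)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def keep_tweet(text, coords, min_hangul=3, radius_km=500.0):
    """Return True if a tweet passes all of the slide's filters."""
    if len(HANGUL.findall(text)) < min_hangul:
        return False  # too little Korean text
    if coords is None or haversine_km(coords, SEOUL) > radius_km:
        return False  # no location, or too far from Seoul
    if text.startswith("RT @"):
        return False  # assumed retweet marker
    if text.startswith("I'm at "):
        return False  # assumed pattern for location-only check-ins
    return True
```

Note that the check-in example above contains the Korean place name 서울특별시, so a Hangul count alone would not drop it; a separate location-only rule is needed.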


Korean Morpheme Analyzer

꼬꼬마 (Kkma) Korean Morpheme Analyzer

한나눔 (Hannanum) Korean Morpheme Analyzer

Komoran Korean Morpheme Analyzer

Lucene Korean Analyzer

은전한닢 (Eunjeon Hannip, mecab-ko) Korean Morpheme Analyzer

Performance of the analyzer

Foreign language and slang tagging

Sentiment related word tagging (slang, verb, emoticon)

It has a good dictionary.

Word spacing does not need to be considered.

However, it fails to recognize many emoticons, as well as metaphor, sarcasm, and irony.


Korean Morpheme Analyzer

> 배가 아파서 병원에 갔다 .   ("I went to the hospital because my stomach hurt.")

배 NN,F,배,*,*,*,*,*
가 JKS,F,가,*,*,*,*,*
아파서 VA+EC,F,아파서,Inflect,VA,EC,아프/VA+ㅏ서/EC,*
병원 NN,T,병원,*,*,*,*,*
에 JKB,F,에,*,*,*,*,*
갔 VV+EP,T,갔,Inflect,VV,EP,가/VV+ㅏㅆ/EP,*
다 EF,F,다,*,*,*,*,*
. SF,*,*,*,*,*,*,*
EOS

Extracted parts of speech: Noun, Verb, Adjective, Adverb, Root
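
Analyzer output of this shape can be reduced to the parts of speech named on the slide with a few lines of parsing. The sketch below assumes one token per line, with the surface form separated from a comma-separated feature list whose first field is the tag. The mapping from tags to categories (`NN`, `VV`, `VA`, `MAG`, `XR`) is an assumption for illustration; only `NN`, `VV`, and `VA` appear in the example itself.

```python
SAMPLE = """\
배 NN,F,배,*,*,*,*,*
가 JKS,F,가,*,*,*,*,*
아파서 VA+EC,F,아파서,Inflect,VA,EC,아프/VA+ㅏ서/EC,*
병원 NN,T,병원,*,*,*,*,*
에 JKB,F,에,*,*,*,*,*
갔 VV+EP,T,갔,Inflect,VV,EP,가/VV+ㅏㅆ/EP,*
다 EF,F,다,*,*,*,*,*
. SF,*,*,*,*,*,*,*
EOS
"""

# Assumed mapping from the leading tag to the slide's categories.
WANTED = {"NN": "Noun", "VV": "Verb", "VA": "Adjective",
          "MAG": "Adverb", "XR": "Root"}


def parse(output):
    """Turn analyzer output into a list of (surface, tag) pairs."""
    tokens = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, features = line.split(None, 1)
        tokens.append((surface, features.split(",")[0]))
    return tokens


def content_words(tokens):
    """Keep only tokens whose leading tag is one of the wanted categories."""
    result = []
    for surface, tag in tokens:
        head = tag.split("+")[0]  # "VA+EC" -> "VA"
        if head in WANTED:
            result.append((surface, WANTED[head]))
    return result


words = content_words(parse(SAMPLE))
```

For the example sentence this keeps 배, 아파서, 병원, and 갔, and drops the particles, endings, and punctuation.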


Building Sentiment Dictionary

1. Manually labeled Twitter data

• 6 days of Twitter data (2013.9.9, 9.16, 9.23, 9.30, 10.7, 10.14)

• Labeled positive and negative sets of Noun, Adjective, Verb, Root (8 sets in total)

• Labeled by 4 people

2. Naver Movie review data with ratings

• 20,000 reviews from 2 movies

• 545 positive, 545 negative, and 545 neutral reviews

[Figure: number of reviews per rating (1 to 10) for Movie 1 and Movie 2, with the positive and negative rating ranges marked]


Sentiment Classification

SVM Classifier

1. Training set: 150 positive, 150 negative (Twitter data)

2. Test set: 545 positive, 545 negative (Movie review data)

Accuracy = 70.64220183486239% (770/1090) (classification)
Mean squared error = 1.1743119266055047 (regression)
Squared correlation coefficient = 0.18400994471523438 (regression)
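
The reported figures can be cross-checked with simple arithmetic. The sketch below assumes the two classes were encoded as +1 and -1, which the slide does not state. Under that assumption every misclassified review contributes a squared error of 4, so the mean squared error follows directly from the number of errors. The function name is hypothetical.

```python
def check_reported_metrics(correct, total):
    """Recompute accuracy (%) and MSE from a count of correct predictions.

    Assumes labels are +1 / -1, so each wrong prediction is off by 2
    and contributes 2**2 = 4 to the summed squared error.
    """
    wrong = total - correct
    accuracy_pct = 100.0 * correct / total
    mse = 4.0 * wrong / total
    return accuracy_pct, mse


accuracy_pct, mse = check_reported_metrics(770, 1090)
print(f"Accuracy = {accuracy_pct}% (770/1090)")
print(f"Mean squared error = {mse}")
```

With 320 of the 1,090 test reviews misclassified, 4 × 320 / 1090 works out to about 1.1743, in line with the mean squared error shown above.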

Naïve Bayes Classifier

SO-PMI Classifier


Building Sentiment Dictionary

Unlabeled & labeled data set

Ternary classifier: Naïve Bayes, SO-PMI, SVM

[Diagram: the data set is passed to each of SO-PMI, SVM, and Naïve Bayes; each classifier outputs a positive set, a negative set, and a neutral set]
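
As an illustration of one of the three ternary classifiers, a multinomial Naïve Bayes model with add-one smoothing could look like the sketch below. The class name, the toy English tokens, and the smoothing choice are assumptions made for this example; the slides do not describe the features or settings actually used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of docs
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data with three classes.
docs = [["good", "nice"], ["lovely", "good"],
        ["bad", "terrible"], ["terrible", "price"],
        ["battery", "lcd"], ["price", "lcd"]]
labels = ["positive", "positive", "negative", "negative",
          "neutral", "neutral"]
model = NaiveBayes().fit(docs, labels)
```

In practice the tokens would be the nouns, verbs, adjectives, and roots produced by the morpheme analyzer rather than English words.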


Sentiment of Brand Index

Brand (keyword): Samsung Galaxy S2

Related nouns (attributes): Battery, LCD, Price, …

Correlated terms: Adjective, Verb, Noun, Adverb, … (e.g., good, nice)

Positive words (pword): Nice, pretty, lovely, …

Negative words (nword): Bad, terrible, …

Determining objectivity: PMI(word, pword) + PMI(word, nword)
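
A minimal sketch of the PMI score used for objectivity, estimated from tweet-level co-occurrence counts, is given below. The additive smoothing constant `eps`, the function names, and the toy data are assumptions. The slide does not say how the sum is thresholded, so the sketch only computes it. For comparison it also computes the standard SO-PMI polarity score, which is the difference of the two PMI terms rather than their sum and does not itself appear on the slide.

```python
import math


def pmi(docs, a, b, eps=0.01):
    """Pointwise mutual information (base 2) of tokens a and b.

    docs is a list of token sets; eps is an assumed smoothing constant
    that keeps the logarithm finite when a count is zero.
    """
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    return math.log2(((n_ab + eps) * n) / ((n_a + eps) * (n_b + eps)))


def objectivity_score(docs, word, pword, nword):
    """Sum from the slide: PMI(word, pword) + PMI(word, nword)."""
    return pmi(docs, word, pword) + pmi(docs, word, nword)


def so_pmi(docs, word, pword, nword):
    """Standard SO-PMI polarity: PMI(word, pword) - PMI(word, nword)."""
    return pmi(docs, word, pword) - pmi(docs, word, nword)


# Hypothetical tokenized tweets.
docs = [{"battery", "good"}, {"battery", "good"},
        {"battery", "nice", "good"}, {"price", "bad"},
        {"price", "bad"}, {"lcd"}]

battery_polarity = so_pmi(docs, "battery", "good", "bad")
price_polarity = so_pmi(docs, "price", "good", "bad")
```

In this toy data "battery" co-occurs only with "good" and "price" only with "bad", so the polarity score is positive for the former and negative for the latter.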


Scenario