so-yeon-kim
A Study on the Spatio-Temporal Trend of Brand Index using Twitter Messages Sentiment Analysis
Abstract
[Figure labels: Twitter Data; Sentiment Analysis; Social Science; Human; Art; Medical; Economy]
Introduction
Twitter Crawling
Data Pre-processing
Korean Morphology Analysis
Twitter Opinion Mining Sentiment Dictionary
Evaluating performance of candidate classifiers
Sentiment Classification
Visualize Associative Relationship of Terms
Relationship with Brand Index
Twitter Crawling
Twitter API
Streaming API: get 1% of all Twitter data in real time
REST API (Search API): get Twitter data matching a keyword
Collection period: 2013.9.9 (Mon) 9:35 pm to now
About 10,000 to 15,000 tweets per day
Total 1,220,000 tweets (as of 2013.11.2, Sat)
Data Pre-Processing
Keep only tweets that contain at least 3 Korean characters and were posted within a 500 km radius of Seoul, Korea. This removes foreign-language tweets and special characters.
Remove tweets which only contain location information.
Remove retweets
ويتكلم نهائيا السمع فقد متعب ابو الملك ان خبر اكد المستوى رفيع وامير موثوق صدرتخريف )) (( مفهوم وغير مترابط غير Sat Oct 12 00:06:37 KST 2013::كالم
I'm at Club ELLUI - @ellui_seoul ( 서울특별시 ) w/ 2 others http://t.co/zhcrncosKH::Sat Oct 12 00:02:06 KST 2013
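The filtering rules above can be sketched as a single predicate. This is a minimal illustration, not the authors' code: the function names, the Seoul coordinates, the `RT @` and `I'm at ` prefix checks, and reading the threshold as "3 or more Hangul syllables" are all assumptions.

```python
import math
import re

SEOUL = (37.5665, 126.9780)      # assumed city-centre coordinates (lat, lon)
HANGUL = re.compile(r"[가-힣]")   # precomposed Hangul syllables only


def km_between(a, b):
    """Great-circle distance in km between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def keep_tweet(text, coords, min_hangul=3, radius_km=500.0):
    """Return True if the tweet survives all pre-processing filters."""
    if text.startswith("RT @"):        # assumed retweet marker
        return False
    if text.startswith("I'm at "):     # assumed location check-in marker
        return False
    if len(HANGUL.findall(text)) < min_hangul:
        return False                   # too little Korean text
    return km_between(coords, SEOUL) <= radius_km
```

The check-in rule has to be separate from the Hangul count: the sample check-in tweet above contains five Hangul syllables (서울특별시), so it would pass the character filter on its own.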
Korean Morpheme Analyzer
꼬꼬마 Korean Morpheme Analyzer
한나눔 Korean Morpheme Analyzer
Komoran Korean Morpheme Analyzer
Lucene Korean Analyzer
은전한닢 Korean Morpheme Analyzer
Performance of the analyzer
Foreign language and slang tagging
Sentiment related word tagging (slang, verb, emoticon)
It has a good dictionary
No need to worry about word spacing
However, it cannot recognize many emoticons, or metaphor, sarcasm, and irony.
Korean Morpheme Analyzer
Example input ("I went to the hospital because my stomach hurt."):
> 배가 아파서 병원에 갔다 .
배	NN,F,배,*,*,*,*,*
가	JKS,F,가,*,*,*,*,*
아파서	VA+EC,F,아파서,Inflect,VA,EC,아프/VA+ㅏ서/EC,*
병원	NN,T,병원,*,*,*,*,*
에	JKB,F,에,*,*,*,*,*
갔	VV+EP,T,갔,Inflect,VV,EP,가/VV+ㅏㅆ/EP,*
다	EF,F,다,*,*,*,*,*
.	SF,*,*,*,*,*,*,*
EOS
Extracted parts of speech: Noun, Verb, Adjective, Adverb, Root
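As a rough sketch of how the analyzer output above could be reduced to the listed parts of speech, the snippet below parses the tab-separated output and keeps content words. It assumes one token per line with the tag as the first comma-separated feature. The tag prefixes chosen for noun, verb, adjective, adverb, and root (`NN`, `VV`, `VA`, `MAG`, `XR`) are an assumption about the tag set, and the function names are hypothetical.

```python
# Analyzer output for the example sentence, one token per line.
SAMPLE = (
    "배\tNN,F,배,*,*,*,*,*\n"
    "가\tJKS,F,가,*,*,*,*,*\n"
    "아파서\tVA+EC,F,아파서,Inflect,VA,EC,아프/VA+ㅏ서/EC,*\n"
    "병원\tNN,T,병원,*,*,*,*,*\n"
    "에\tJKB,F,에,*,*,*,*,*\n"
    "갔\tVV+EP,T,갔,Inflect,VV,EP,가/VV+ㅏㅆ/EP,*\n"
    "다\tEF,F,다,*,*,*,*,*\n"
    ".\tSF,*,*,*,*,*,*,*\n"
    "EOS\n"
)

# Assumed tag prefixes: noun, verb, adjective, adverb, root.
KEEP_PREFIXES = ("NN", "VV", "VA", "MAG", "XR")


def parse(output):
    """Yield (surface, tag) pairs until the EOS marker."""
    for line in output.splitlines():
        if line == "EOS":
            break
        surface, features = line.split("\t", 1)
        yield surface, features.split(",")[0]


def content_words(output):
    """Keep tokens whose leading tag (before any '+') is a content tag."""
    return [(s, t) for s, t in parse(output)
            if t.split("+")[0].startswith(KEEP_PREFIXES)]


for surface, tag in content_words(SAMPLE):
    print(surface, tag)
```

Particles (`JKS`, `JKB`), endings (`EF`), and punctuation (`SF`) are dropped, leaving the tokens that can carry sentiment.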
Building Sentiment Dictionary
1. Manually labeled Twitter data
• 6 days of Twitter data (2013.9.9, 9.16, 9.23, 9.30, 10.7, 10.14)
• Labeled positive and negative sets of Noun, Adjective, Verb, Root (total 8 sets)
• Labeled by 4 people
2. Naver Movie review data with rating
• 20,000 reviews from 2 movies
• 545 positive set, 545 negative set, 545 neutral set
[Figure: two charts, Movie 1 and Movie 2, over ratings 1 to 10, with positive and negative ranges marked]
Sentiment Classification
SVM Classifier
1. Training set - 150 positive set, 150 negative set (Twitter data)
2. Test set - 545 positive set, 545 negative set (Movie review data)
Accuracy = 70.64220183486239% (770/1090) (classification)
Mean squared error = 1.1743119266055047 (regression)
Squared correlation coefficient = 0.18400994471523438 (regression)
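The reported accuracy follows directly from the counts in the output. The short check below only reproduces that arithmetic. The 50% chance baseline is an added observation that holds because the test set is balanced (545 positive, 545 negative).

```python
correct, total = 770, 1090          # counts from the classifier output
positives, negatives = 545, 545     # test set composition

accuracy = 100.0 * correct / total  # percentage of correct predictions
# Baseline: always predicting the larger class.
baseline = 100.0 * max(positives, negatives) / total

print(f"accuracy = {accuracy:.2f}%, majority baseline = {baseline:.2f}%")
```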
Naïve Bayes Classifier
SO-PMI Classifier
Building Sentiment Dictionary
Unlabeled & labeled data set
Ternary classifier: Naïve Bayes, SO-PMI, SVM
[Figure: each of the three classifiers (SO-PMI, SVM, Naïve Bayes) outputs a positive set, a negative set, and a neutral set]
Sentiment of Brand Index
Brand (keyword): Samsung Galaxy S2
Related nouns (attribute): Battery, LCD, Price, ….
Correlated terms: Adjective, Verb, Noun, Adverb, …
Example terms: good, nice
Positive words: Nice, pretty, lovely …
Negative words: Bad, terrible …
Determining objectivity: PMI(word, pword) + PMI(word, nword)
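A minimal sketch of the PMI quantities named above, assuming co-occurrence is counted per tweet and that tweets are represented as sets of tokens. The toy corpus, the smoothing constant `eps`, and the function names are hypothetical. The slide gives only the sum PMI(word, pword) + PMI(word, nword) for determining objectivity and does not say how it is thresholded, so the sketch just computes it. The difference returned alongside it is the standard SO-PMI orientation score, added here for comparison rather than taken from the slides.

```python
import math


def pmi(word, seed, docs, eps=0.01):
    """Smoothed pointwise mutual information from per-document co-occurrence."""
    n = len(docs)
    c_word = sum(word in d for d in docs)
    c_seed = sum(seed in d for d in docs)
    c_both = sum(word in d and seed in d for d in docs)
    # eps keeps the logarithm finite when a count is zero (assumed smoothing).
    return math.log2((c_both + eps) * n / ((c_word + eps) * (c_seed + eps)))


def association(word, pwords, nwords, docs):
    """Return (difference, sum) of PMI against positive and negative seeds."""
    pos = sum(pmi(word, p, docs) for p in pwords)
    neg = sum(pmi(word, q, docs) for q in nwords)
    return pos - neg, pos + neg


# Hypothetical toy corpus: each tweet is a set of tokens.
docs = [
    {"battery", "good"}, {"battery", "nice"},
    {"price", "bad"}, {"price", "terrible"},
    {"lcd", "good"}, {"lcd", "bad"},
]
pwords, nwords = ["good", "nice"], ["bad", "terrible"]

for attr in ("battery", "price", "lcd"):
    diff, total = association(attr, pwords, nwords, docs)
    print(attr, round(diff, 3), round(total, 3))
```

In this toy corpus "battery" co-occurs only with positive seeds and "price" only with negative ones, so their difference scores have opposite signs.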
Scenario