Classification Method for Shared Information on Twitter Without Text Data
The University of Tokyo, Japan
Seigo Baba, Fujio Toriumi, Takeshi Sakaki, Kosuke Shinoda, Kazuhiro Kazama, Satoshi Kurihara, Itsuki Noda
The 3rd International Workshop on Social Web for Disaster Management (SWDM'15) with WWW'15 (May 2015, Florence, Italy)
Contents
• Introduction
• Proposed tweet clustering method
• Subjective experiments
• Linguistic similarities in clusters
• Conclusions
Information in Disaster Situation
• Local information must be collected
– For victims
  • Shelter location
  • Tsunami, ...
– For rescuers
  • Donating money
  • Volunteer activities, ...
How to collect information in disaster situation ?
• From mass media ?
– General and public information only
– Not personalized
• From social media ?
– They perform well [10 Mendoza], [11 Miyabe], [10 Sakaki]
– In particular, Twitter is useful
  • We also focus on Twitter
Classification of Tweets is required
• A large volume of tweets: 5,000 tweets were posted per second during the 2011 Great East Japan Earthquake (Official Twitter Blog, Japan)
• Collecting appropriate tweets is difficult
• Classification of tweets is required !
Weakness in Classification using Text Mining
• Example tweets: 「Shut off the gas」, 「My head hurts」, 「Wear shoes」, 「Good morning !」, 「A head office」, 「Protect your head」, ...
• Text mining groups 「My head hurts」, 「A head office」, and 「Protect your head」 into one cluster, although they share only the word "head"
• Are they topic similar?
Focusing on Retweet
• RT (Retweet): suggests that a user has an interest in a tweet [13 Toriumi]
• Grouping the example tweets by retweets (shared interest) puts 「Shut off the gas」, 「Wear shoes」, and 「Protect your head」 into one cluster
Purpose of this study
• Propose a novel tweet classification method focusing on retweets
An outline of proposed method
• Calculate the similarity between each pair of tweets
• Construct retweet network
• Network clustering
[Figure: Tweet1, Tweet2, and Tweet3 with example pairwise similarities 0.15 and 0.03]
The Similarity of Retweet Users
• Similar tweets are retweeted by similar users
• Two tweets whose similarity of retweet users is high may share a topic
– The similarity is calculated from the sets of users who retweeted each tweet
[Figure: users and tweets T1 to T5, with marks showing which user retweeted which tweet (T = Tweet)]
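The idea that tweets retweeted by similar users are similar can be sketched in a few lines. The slides do not give the formula in text form, so the Jaccard coefficient used here is an assumption, as are the function name `retweet_similarity` and the sample user IDs.

```python
def retweet_similarity(users_a, users_b):
    """Similarity of two tweets from their retweeting users.

    Assumption: Jaccard coefficient (shared users / all users).
    The slides only say the measure is based on retweet users.
    """
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        return 0.0  # neither tweet was retweeted
    return len(users_a & users_b) / len(union)


# Hypothetical data: tweet ID -> set of users who retweeted it.
retweeters = {
    "T1": {"u1", "u2", "u3", "u4"},
    "T2": {"u3", "u4", "u5"},
    "T3": {"u6"},
}

# T1 and T2 share 2 of 5 distinct users.
print(retweet_similarity(retweeters["T1"], retweeters["T2"]))  # 0.4
print(retweet_similarity(retweeters["T1"], retweeters["T3"]))  # 0.0
```

Any set-overlap measure could be substituted here; Jaccard is only a convenient choice because it stays between 0 and 1 regardless of how often a tweet was retweeted.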
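Construct retweet network
• Connect two tweets whose similarity satisfies the threshold condition
– Threshold = 0.05
– The similarities for all combinations of two tweets were calculated
– Nodes in an obtained component may be mutually topic-similar
[Figure: example with tweets T1 to T4 and pairwise similarities 0.06, 0.1, 0.03, 0.11, 0.02, and 0.01, before and after applying the threshold of 0.05]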
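The construction step can be sketched as thresholding the pairwise similarities and reading off connected components. This is a minimal sketch: the use of `>=` at the threshold, the function names, and the assignment of the figure's six values to specific tweet pairs are all assumptions.

```python
def build_retweet_network(similarity, threshold=0.05):
    """Keep an edge for every tweet pair at or above the threshold.

    `similarity` maps (tweet_a, tweet_b) -> similarity value.
    Assumption: the comparison is >=; the slides do not say
    whether a pair exactly at the threshold is linked.
    """
    adjacency = {t: set() for pair in similarity for t in pair}
    for (a, b), value in similarity.items():
        if value >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pairing of the six example values with tweet pairs.
similarity = {
    ("T1", "T2"): 0.06, ("T1", "T3"): 0.10, ("T3", "T4"): 0.11,
    ("T1", "T4"): 0.03, ("T2", "T3"): 0.02, ("T2", "T4"): 0.01,
}
network = build_retweet_network(similarity)
print(sorted(network["T1"]))               # ['T2', 'T3']
print(len(connected_components(network)))  # 1
```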
Data
• Tweets retweeted more than 100 times from March 5 to 24, 2011
– The Great East Japan Earthquake occurred on March 11
– 34,860 tweets
Retweet network
[Figure: the retweet network constructed from the data]
Network Clustering
• It is assumed that a large component contains various topics
• Apply a clustering method based on the Newman method [04 Newman] to the retweet network
– To extract clusters that contain similar tweets
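To make the clustering step concrete, here is a minimal sketch of greedy modularity merging in the spirit of Newman's fast community detection algorithm. Whether [04 Newman] refers to exactly this variant is an assumption, and the function name and toy graph are hypothetical; the graph is treated as unweighted and without self-loops.

```python
from collections import defaultdict


def greedy_modularity_clusters(edges):
    """Merge communities greedily while modularity Q increases.

    Starts with one community per node and repeatedly merges the
    connected pair with the largest gain dQ = 2 * (e_ij - a_i * a_j),
    stopping when no merge has a positive gain.
    """
    m = len(edges)
    members = {}               # community id -> set of nodes
    a = defaultdict(float)     # fraction of edge ends in each community
    e = defaultdict(float)     # half the fraction of edges between two communities
    for u, v in edges:
        members.setdefault(u, {u})
        members.setdefault(v, {v})
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
        e[frozenset((u, v))] += 1 / (2 * m)

    while True:
        best_pair, best_gain = None, 0.0
        for pair, e_ij in e.items():
            i, j = tuple(pair)
            gain = 2 * (e_ij - a[i] * a[j])
            if gain > best_gain:
                best_pair, best_gain = (i, j), gain
        if best_pair is None:
            break
        i, j = best_pair
        # Merge community j into community i.
        members[i] |= members.pop(j)
        a[i] += a.pop(j)
        del e[frozenset((i, j))]
        for pair in list(e):
            if j in pair:
                (k,) = pair - {j}
                e[frozenset((i, k))] += e.pop(pair)
    return list(members.values())


# Toy retweet network: two triangles joined by a single edge.
edges = [("T1", "T2"), ("T2", "T3"), ("T1", "T3"),
         ("T4", "T5"), ("T5", "T6"), ("T4", "T6"),
         ("T3", "T4")]
clusters = greedy_modularity_clusters(edges)
print(sorted(sorted(c) for c in clusters))
# [['T1', 'T2', 'T3'], ['T4', 'T5', 'T6']]
```

The scan over all community pairs on every merge is quadratic and only suitable for small examples; a network of thousands of tweets would need a priority queue or an existing graph library.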
Clustering Result
• 11,494 tweets → 2,001 clusters
• The following slides show some clusters
Result Example 1
• Cluster about shelter
「The Oura cafeteria on the Ueno Campus of the Tokyo University of the Arts is open. You can spend the night there.」
「[a quick report] Okumakodo is open! It looks like it has some blankets http://twitpic.com/48f6y2」
「Are you all right? [The Tokyo Bunka Kaikan just opened. It's getting dark and cold, so if you are around Ueno Station, please go there.]」
Result Example 2
• Cluster about advice for victims
「If you are evacuating with a baby, wrap the baby in a blanket and carry it in a tote bag. No baby buggies! #jishin」
「[Please spread] If you use Twitter by mobile phone, turn off your icons to conserve battery life.」
Proposed Method’s Validity
• Conduct subjective experiments to clarify the proposed method’s validity
– Are tweets in the same cluster similar to each other ?
• The experiment consists of two-choice questions
Example of a question in subjective experiment
[Figure: a question with one statement tweet and two choice tweets, A and B, asking "Which tweet is more topic-similar to me?". The tweets shown are "Twitter is a source of information", "Yahoo! Map shows the area of the rolling blackouts", and "The site gives information about power plants and rolling blackouts".]
How to Make Questions ?
• Choice tweets consist of two tweets
– Inner tweet
  • Belongs to the cluster to which the statement tweet belongs
– Outer tweet
  • Belongs to a cluster to which the statement tweet does not belong
• Two clusters are selected randomly
• A statement tweet is selected randomly
• An inner tweet and an outer tweet are selected randomly
[Figure: clusters of tweets, with the statement tweet and the inner tweet drawn from one cluster and the outer tweet drawn from the other]
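The question-generation procedure can be sketched as follows. This is an illustration under assumptions: the function name `make_question`, the dictionary layout of a question, and the requirement that the statement's cluster holds at least two tweets (so that a distinct inner tweet exists) are not taken from the slides.

```python
import random


def make_question(clusters, rng=random):
    """Build one two-choice question from a list of clusters.

    Each cluster is a list of tweets. The statement tweet and the
    inner tweet come from one randomly chosen cluster; the outer
    tweet comes from a different randomly chosen cluster.
    """
    # Assumption: the statement's cluster needs two or more tweets.
    eligible = [c for c in clusters if len(c) >= 2]
    inner_cluster = rng.choice(eligible)
    outer_cluster = rng.choice([c for c in clusters if c is not inner_cluster])

    statement, inner = rng.sample(inner_cluster, 2)
    outer = rng.choice(outer_cluster)

    choices = [inner, outer]
    rng.shuffle(choices)  # hide which choice is the inner tweet
    return {"statement": statement, "choices": choices,
            "inner": inner, "outer": outer}


# Hypothetical clusters with placeholder tweet IDs.
example_clusters = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
question = make_question(example_clusters, random.Random(0))
print(question["statement"], question["choices"])
```

Passing a seeded `random.Random` instance keeps a generated question set reproducible, which helps when the same questions must be shown to several examinees.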
Example of a question in subjective experiment
[Figure: the same example question, with the two choice tweets labeled as the inner tweet and the outer tweet]
Examinees and Questions
• 100 questions were selected randomly
– Each examinee solved 50 of them
• Fourteen examinees
– Seven examinees solved each question
– If more than four examinees select an inner tweet, the result is labeled as ‘Correct’.
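The labeling rule above is a simple vote count per question. A minimal sketch, assuming each question is summarized by the number of its seven examinees who chose the inner tweet; the function names and the vote counts are hypothetical, not experimental data.

```python
def is_correct(votes_for_inner, required_more_than=4):
    """A question is 'Correct' if more than four examinees chose the inner tweet."""
    return votes_for_inner > required_more_than


def correct_rate(vote_counts):
    """Fraction of questions labeled 'Correct'."""
    return sum(is_correct(v) for v in vote_counts) / len(vote_counts)


# Hypothetical vote counts (out of seven) for four questions.
votes = [7, 5, 4, 2]
print([is_correct(v) for v in votes])  # [True, True, False, False]
print(correct_rate(votes))             # 0.5
```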
Subjective Experiment Result
• 89% of all the questions were correct !
• The similarities are obvious
– More than six examinees selected the inner tweet in 77% of the questions
The validity was confirmed
• We confirmed the validity of the proposed method
– The proportion of clusters whose nodes are mutually similar is very high
– The similarities of the nodes in each cluster are obvious
Can classification based on text mining group them?
• Cluster about advice for victims
「This tweet was posted by a volunteer center. Yesterday, more than 1000 people read it and learned about dangerous areas and shortages. What should we do? http://t.co/4JpWlXt #jishin」
「RT [please spread] Check that your car has a jack for changing tires. They are useful for rescuing victims from rubble. #jishin #jisin」
• Some clusters have little linguistic similarity
– These are difficult to group by using text mining
Linguistic Similarities in Clusters
• A quantitative assessment of linguistic similarity is required
• Apply Vector Space Model
– Calculate the linguistic similarity between two documents based on TF-IDF
  • In this study, document = tweet
Apply Vector Space Model
• Calculate the linguistic similarity of two tweets for all combinations
– Including linked and unlinked combinations
– To calculate reference values
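A vector space comparison of two tweets can be sketched with TF-IDF weights and cosine similarity. The slides do not state the exact weighting or tokenizer, so the choices here (raw term counts, `log(N / df)`, whitespace tokenization, cosine similarity) and the function names are assumptions; Japanese tweets would need a morphological analyzer instead of `split()`.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


tweets = ["Shut off the gas", "My head hurts", "Protect your head", "Wear shoes"]
vectors = tfidf_vectors(tweets)

# Tweets sharing the word "head" look similar to a text-based method.
print(cosine(vectors[1], vectors[2]) > 0)   # True
# Tweets on the same topic but with no shared words score zero.
print(cosine(vectors[0], vectors[3]))       # 0.0
```

Averaging `cosine` over every pair of tweets in the data set gives a reference value of the kind the slides describe.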
Reference Values
• The result of the calculation for all combinations
– Average = 0.0156
• When the similarity between two tweets is under that average (0.0156), their linguistic similarity is no higher than that of randomly paired tweets
Linguistic Similarities in each cluster
• The linguistic similarity of each cluster was also calculated
– Defined as the average of the pairwise linguistic similarities over all combinations of the nodes that belong to the cluster
[Figure: a cluster of Tweet1, Tweet2, and Tweet3 has 3C2 = 3 combinations with similarities 0.5, 0.3, and 0.1, so the linguistic similarity in the cluster is their average, 0.3]
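The per-cluster score is the mean over all pairs of tweets in the cluster. A minimal sketch using the three-tweet example values (0.5, 0.3, 0.1); the function name, the lookup table, and which pair carries which value are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

REFERENCE_VALUE = 0.0156  # average similarity over all tweet pairs, from the slides


def cluster_linguistic_similarity(tweets, pair_similarity):
    """Average pairwise similarity over all combinations in a cluster.

    Requires at least two tweets; a single-tweet cluster has no pairs.
    """
    return mean(pair_similarity(a, b) for a, b in combinations(tweets, 2))


# Hypothetical assignment of the example values to the three pairs.
table = {
    frozenset(("Tweet1", "Tweet2")): 0.5,
    frozenset(("Tweet2", "Tweet3")): 0.3,
    frozenset(("Tweet1", "Tweet3")): 0.1,
}
score = cluster_linguistic_similarity(
    ["Tweet1", "Tweet2", "Tweet3"],
    lambda a, b: table[frozenset((a, b))],
)
print(round(score, 4))          # 0.3
print(score < REFERENCE_VALUE)  # False
```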
Linguistic Similarities in each cluster
• 8.25% of all clusters are under 0.0156
– Some of the clusters are as low as randomly selected tweets
– These are difficult to group by using text mining !
Example of Clusters with low linguistic similarities 1
• Cluster about life in shelter
– Linguistic similarity is 0.0108
「I've experienced two big earthquakes. I spent a few nights in a car and saw many senior citizens who seemed to be suffering from economy class syndrome from remaining in the same posture for a long time. If you have to spend too much time in a car or a cramped shelter, don't forget to stretch your legs.」
「If children are shaking or suffering from fear, hug and comfort them.」
Example of Clusters with low linguistic similarities 2
• Cluster about advice for victims
– Linguistic similarity is 0.0052
「RT [Summary of information] Open the door, Cook some rice, Place baggage at the entrance, Buy water, Snacks and a towel, Blankets, Wear shoes ....」
「My friend who survived the Great Hanshin Earthquake evacuated his house in pajamas. So tonight, sleep in clothes just in case you have to leave quickly.」
Conclusions
• We proposed a novel method for classifying tweets that focuses on retweets and does not use text mining
• Most of the obtained clusters contain local information, which is very useful in disaster situations
• A subjective experiment confirmed the validity of our method
– Nodes are similar to each other in 89% of the clusters
– The similarities are obvious
• Clusters obtained by our method are topic-similar, even if they are not linguistically similar
Future Works
• Apply a soft clustering method to the retweet network
– Our proposed method is an either-or (hard) classification
– A tweet can't belong to multiple clusters
– When soft clustering is applied to the retweet network, a tweet can belong to multiple clusters
[Figure: topics Tsunami, Shelter, Donating money, Volunteer, and Donating supplies placed under "Information for victims" and "Information for rescuers"; with soft clustering a topic can belong to both groups]
• Reduce the amount of calculation
– Information must be provided quickly in disaster situations