Paper presentation at PCI 2013.
PCI13 Thessaloniki, 19 Sep 2013
Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
Problem
#2
Online Social Networks (OSNs) are immense!
#3
Motivation
• Social Networks
  – Used to be small (Grevy's zebra dataset)
  – Easy to organize
• Online Social Networks (Twitter)
  – Contain an immense amount of data
  – Incredibly difficult to organize and to extract useful information from
• Ways to monitor activity in OSNs:
  – Keywords (produce too much information; fail when lexical variations are used)
  – Newshounds and Persons of Interest (may result in loss of information)
• Proposal to leverage:
  – Time
  – Communities formed by users interested in a specific topic
  – The behavior of these communities over time
• Provide the user with information regarding:
  – Temporal user activity per topic
  – Influential, Stable and Persistent Communities
  – Users worth following (possibility of new newshounds)
  – Content worth monitoring
#4
Framework overview
[Diagram: framework pipeline]
Input: Twitter data (mentions and hashtags in time)
Evolution Detection Process:
  – Pre-processing (Information Extraction)
  – Interaction Data Discretization
  – Temporal Adjacency Matrix Creation
  – Community Detection (Louvain)
  – Community Evolution Detection
Ranking Process:
  – Features: Persistence, Stability, Centrality* (PageRank), Community Size
  – Feature Fusion
Output: most influential users and communities + popular hashtags; Evolution Heatmap
*Ongoing work
#5
Interaction data discretization
• Community evolution study requires timeslot analysis
• Tweeting activity indicates whether users are active and whether something interesting is happening (or has just happened)
• In this framework, timeslot boundaries are placed at the local minima of the overall activity
• Peaks and positive slopes inform us that the users are interested in some phenomenon or are involved in a conversation
• Minima and negative slopes show us that the users’ interest is diminishing
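The discretization idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the half-open slot convention and the hourly counts are assumptions made for the example, and no smoothing of the activity curve is applied.

```python
from typing import List, Tuple


def local_minima(activity: List[int]) -> List[int]:
    """Indices where activity drops below the previous bin and does not exceed the next one."""
    minima = []
    for i in range(1, len(activity) - 1):
        if activity[i] < activity[i - 1] and activity[i] <= activity[i + 1]:
            minima.append(i)
    return minima


def timeslots(activity: List[int]) -> List[Tuple[int, int]]:
    """Cut the bin range into half-open slots [start, end) at every local minimum."""
    cuts = [0] + local_minima(activity) + [len(activity)]
    return [(start, end) for start, end in zip(cuts, cuts[1:]) if end > start]


# Hypothetical tweets-per-hour counts (illustrative data, not from the dataset).
counts = [5, 9, 14, 8, 3, 6, 12, 20, 11, 4, 7, 10]
print(local_minima(counts))  # [4, 9]
print(timeslots(counts))     # [(0, 4), (4, 9), (9, 12)]
```

Each slot then contains one rise and fall of activity, so a peak of interest is not split across two timeslots.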
#6
Interaction data discretization example
#7
Community detection & evolution
[Figure: sequential adjacency matrices for Timeslot (n-2), Timeslot (n-1), Timeslot (n) and Timeslot (n+1)]
Louvain Community Detection Method (V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.)
[Figure: communities detected per timeslot (n-1: C1, C2, C3, C4, C6; n: C1, C2, C3, C5; n+1: C1, C2, C3, C4, C5), linked across timeslots into time-evolving communities T1 to T5]
Figure labels: Sequential Adjacency Matrices; Evolving Communities
Timeslots [1,…,n-1,n,n+1,…]
Communities C = {C1n, C2n, ..., Ckn}
Time-Evolving Communities Ti
Louvain Community Detection
A popular greedy modularity optimization approach.
The two following steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced:
a) Small community detection by local modularity optimization
b) Aggregation of nodes belonging to the same community and creation of a network with the communities as nodes
It was selected for its:
• Speed
• Accuracy when dealing with ad-hoc networks
• Hierarchical structure, which allows communities to be examined at different resolutions
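Step (a) can be illustrated with a small sketch. This is not the Louvain reference implementation: it recomputes modularity from scratch for every trial move instead of using the incremental gain formula, it omits the aggregation step (b), and the function names and toy graph are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def modularity(edges: List[Edge], community: Dict[str, int]) -> float:
    """Modularity Q of an undirected, unweighted graph without self-loops."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


def local_moving(edges: List[Edge], max_passes: int = 10) -> Dict[str, int]:
    """Greedy local optimization: move each node to the neighbouring community with the best Q."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    community = {node: i for i, node in enumerate(sorted(neighbours))}  # start from singletons
    for _ in range(max_passes):
        moved = False
        for node in sorted(neighbours):
            best_c, best_q = community[node], modularity(edges, community)
            for c in sorted({community[n] for n in neighbours[node]}):
                trial = dict(community)
                trial[node] = c
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_c, best_q = c, q
            if best_c != community[node]:
                community[node] = best_c
                moved = True
        if not moved:
            break
    return community


# Toy graph (illustrative): two triangles joined by a single bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
result = local_moving(edges)
print(result, round(modularity(edges, result), 4))
```

On this toy graph the two triangles end up as separate communities. In the full method the communities found here would then be collapsed into single nodes and the procedure repeated on the smaller network.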
#8
T11 T21 T41 T61 T81 T91
T11 T41 T52 T91
T11 T21 T52 T81 T91
T21 T52 T74 T91
T41 T52 T74 T81 T91
#9
Community evolution detection
C11 C21 C31 C41 C51 C61 C71 C81 C91
C12 C22 C32 C42 C52 C62 C72 C82 C92
C13 C23 C33 C43 C53 C63 C73 C83 C93
C14 C24 C34 C44 C54 C64 C74 C84 C94
C15 C25 C35 C45 C55 C65 C75 C85 C95
Communities in each row are compared to communities from past rows using the Jaccard index.
Community similarity according to:
• Jaccard Index
• Adaptive threshold
Adaptive threshold:
• Relative to size
• Range: [0.7,0.1]
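A minimal sketch of the matching step, under stated assumptions: the slide gives only the threshold range and says it is relative to size, so the interpolation below (0.7 for small communities falling to 0.1 for large ones on a logarithmic size scale, with cut-offs at 3 and 1000 members) is a hypothetical choice, as are the function names and the sample user sets.

```python
import math
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size: int, small: int = 3, large: int = 1000,
                       high: float = 0.7, low: float = 0.1) -> float:
    """Hypothetical size-relative threshold, interpolated on a log scale within [low, high]."""
    if size <= small:
        return high
    if size >= large:
        return low
    frac = (math.log(size) - math.log(small)) / (math.log(large) - math.log(small))
    return high - frac * (high - low)


def best_match(current: Set[str], past: Dict[str, Set[str]]) -> Optional[str]:
    """Id of the most similar past community that clears the threshold, else None."""
    threshold = adaptive_threshold(len(current))
    best_id, best_sim = None, 0.0
    for cid, members in past.items():
        sim = jaccard(current, members)
        if sim >= threshold and sim > best_sim:
            best_id, best_sim = cid, sim
    return best_id


# Illustrative communities from a previous timeslot (made-up user names).
past = {
    "C1(n-1)": {"ann", "bob", "cat", "dan"},
    "C2(n-1)": {"eve", "fay", "gus"},
}
print(best_match({"ann", "bob", "cat", "dan", "eli"}, past))  # C1(n-1)
print(best_match({"xan", "yul", "zoe"}, past))                # None
```

A community with no match above the threshold would start a new time-evolving community; a matched one extends the evolution of its predecessor.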
#10
Single timeslot graph example
Searching through a single timeslot (i.e. approximately 24 hours) can be time-consuming. Imagine browsing through months of data! Indexing is clearly a necessity.
#11
Evolution features, fusion & ranking
[Diagram: Community Evolution → features (Centrality, Persistence, Stability) → Dynamic Community Ranking → Ranked Communities (All Users) and Ranked Users in Communities based on Centrality → Content (txt) from timeslots of interest → User Interface]
• Persistence: overall appearances / total number of timeslots
• Stability: overall consecutive appearances / total number of timeslots
• PageRank Centrality: a rough estimate of how important a node is, based on the number and quality of its links
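The first two features can be computed directly from the list of timeslots in which a dynamic community appears. The sketch below is an illustration under assumptions: the slide does not spell out how "consecutive appearances" are counted, so counting adjacent timeslot pairs is a hypothetical convention and need not reproduce the Stability values reported in the results. The function names are also illustrative.

```python
from typing import List


def persistence(appearances: List[int], total_timeslots: int) -> float:
    """Number of timeslots in which the community appears, over all timeslots."""
    return len(set(appearances)) / total_timeslots


def stability(appearances: List[int], total_timeslots: int) -> float:
    """Assumed convention: adjacent pairs (t, t+1) with both present, over all timeslots."""
    slots = set(appearances)
    consecutive = sum(1 for t in slots if t + 1 in slots)
    return consecutive / total_timeslots


# Timeslot appearances of the top-ranked community in the results, with 28 timeslots.
rank1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
print(round(persistence(rank1, 28), 6))  # 0.392857
print(stability(rank1, 28))
```

The persistence value agrees with the figure reported for the top-ranked community in the results (11 appearances in 28 timeslots).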
Pros and Cons
#12
Dynamic Community and User Ranking
• Advantages
  – Saves user time (manually searching for news is extremely time-consuming)
  – Enables browsing through the most important information
  – Provides a sense of user importance over time (users worth following for future investigations)
• Disadvantages
  – Community Detection and Community Evolution Detection are slow processes
  – No semantic ranking (lack of content consideration) renders the framework susceptible to error
Framework application example
Application on a dataset extracted from the Twitter OSN.
• Dataset Characteristics:
  – Period: 32 days
  – Keywords: 40 (English and Greek)
  – Unique users: 857K
  – Messages: 880K
  – Edges: 1.07M
#13
Greek hashtags | Greek keywords    | Global hashtags | Global keywords
-              | Michaloliakos     | -               | nazi
#Xryshaygh     | Kasidiaris        | #nazi           | far right
#GoldenDawn    | golden dawn       | #extremeright   | extreme right
#Kasidiaris    | xrysh aygh        | #farright       | Hitler
-              | illegal immigrants | -              | Swastica
Framework application example
• Results
  – Total number of communities: 232K
  – Final number of communities (excluding self loops & communities < 3): 89K
  – Total evolution steps: 7K
  – Total evolving communities: 1.1K
  – Number of Timeslots: 28
#14
• Light shades signify small communities
• Dark shades signify large communities
Framework application example (results)
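Rank 1
  – Community Id: 1,122
  – Timeslot appearance: 1,2,3,4,5,6,7,8,9,11,13
  – Size/slot: 16,15,8,5,7,28,4,8,9,8,30
  – Persistence: 0.392857
  – Stability: 0.310344
  – Centrality: 0.635401
  – Popular Tags (ranked): Indiebooks, bcn, madrid, andalucía, españa
  – Topic: Spanish book on Hitler: El Legado
Rank 2
  – Community Id: 13,2044
  – Timeslot appearance: 13,15,16,17,18,19,20,22,23,25
  – Size/slot: 3,4,9,4,6,6,5,4,7,5
  – Persistence: 0.357142
  – Stability: 0.241379
  – Centrality: 0.801170
  – Popular Tags (ranked): keepmovingforward
  – Topic: Pakistani person named Nazi
Rank 3
  – Community Id: 10,404
  – Timeslot appearance: 10,11,12,15,16,17,18,19
  – Size/slot: 6,5,4,4,9,5,3,3
  – Persistence: 0.285714
  – Stability: 0.241379
  – Centrality: 0.817923
  – Popular Tags (ranked): Israel, ashkenazi, ptsd, 2rrf
  – Topic: Israeli anti-nazi posts
Rank 4
  – Community Id: 18,89
  – Timeslot appearance: 18,19,20,21,22,23,25
  – Size/slot: 36,137,323,281,64,146,139
  – Persistence: 0.25
  – Stability: 0.206896
  – Centrality: 0.820052
  – Popular Tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi
  – Topic: Associating Jamaat (Bangladesh) to nazi
Rank 5
  – Community Id: 22,2
  – Timeslot appearance: 22,23,24,25,26,27
  – Size/slot: 977,1129,942,946,1251,2054
  – Persistence: 0.214285
  – Stability: 0.206896
  – Centrality: 0.797400
  – Popular Tags (ranked): 1,01,31,4,2
  – Topic: Videogame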
#15
Framework application example (Greek interest)
Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users.
#16
A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens
Future Work
• Speed up the community similarity search
• Enrich the framework by incorporating retweets as a feature
• Introduce the framework to journalists for constructive criticism
#17
[Diagram: proposed extended framework]
Input: Twitter data (retweets in time)
  – Mention, Retweet & Timestamp Information Extraction
  – Community Detection
  – Community Evolution Detection
  – Features: Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality
  – Fusion
Output: most influential users and communities + popular hashtags
Open questions noted on the diagram:
  – Could they be used as a Ground Truth Set? Provide a baseline
  – Query Correction & Improvement via Relevance Feedback?
Conclusions
• A framework for extracting information from evolving communities in dynamic social networks.
• Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter).
• Existence of a large number of dynamic communities with various evolutionary characteristics.
#18
Thank you!
Questions?
#19
Data and code are available at:
https://github.com/socialsensor/community-evolution-analysis/