Graph Sampling and Sparsification
Lecture 19
CSCI 4974/6971
7 Nov 2016
Today’s Biz
1. Reminders
2. Review
3. Graph Sampling/Sparsification
Reminders
• Assignment 4: due date November 10th
– Setting up and running on CCI clusters
• Assignment 5: due date TBD (before Thanksgiving break, probably 22nd)
• Assignment 6: due date TBD (early December)
• Tentative: No class November 14 and/or 17
• Final Project Presentation: December 8th
• Project Report: December 11th
• Office hours: Tuesday & Wednesday 14:00-16:00, Lally 317
• Or email me for other availability
Quick Review
Graph Compression:
Sampling and Summarization for Social Networks
Shou-De Lin, Mi-Yen Yeh, and Cheng-Te Li, National Taiwan University
Sampling and Summarization for Social Networks
PAKDD 2013 Tutorial
Shou-De Lin*, Mi-Yen Yeh#, and Cheng-Te Li*
* Computer Science and Information Engineering, National Taiwan University
# Institute of Information Science, Academia Sinica
Tutorial slides can be downloaded here: http://mslab.csie.ntu.edu.tw/tut‐pakdd13/
About This Tutorial
• This is a two-hour tutorial for PAKDD 2013 on social network sampling and summarization
– We do not expect to cover everything relevant to this topic.
– We will highlight the trends, categorize the different types of strategies, and describe some of our ongoing work
• Agenda
– Introduction + Sampling + Q/A (45+10 min)
– Summarization + Conclusion + Q/A (45+10 min)
13/05/02 Lin et al., Sampling and Summarization for Social Networks, PAKDD 2013 tutorial 2
Big Social Network
Billions of different types of nodes and links
[Figure: network visualization by Paul Butler]
What can be mined from this picture?
Motivation
[Figure: user counts of three large social networks: >1 billion, >500 million, >200 million]
• Sometimes the full network is not completely observed in advance
• Even if it is, loading everything into memory for further analysis might not be feasible
• Even if it is feasible, generating some simple statistics (e.g. average path length, diameter) can take a long time, not to mention more complicated ones (e.g. counting the occurrences of a certain pattern)
An Example on Facebook
• 1+ billion users
• Avg: 130 friends per node
It costs >1TB of memory simply to store the raw graph data (without attributes, labels, or content)
This can cause problems for information extraction, processing, and analysis
Two possible solutions: Sampling and Summarization
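A back-of-the-envelope check of the >1TB figure for the Facebook example. The slide does not say how the graph is stored, so this sketch assumes that every friendship appears as an 8-byte neighbor ID in each endpoint's adjacency list; `adjacency_list_bytes` and `bytes_per_id` are illustrative names only.

```python
def adjacency_list_bytes(num_users, avg_friends, bytes_per_id=8):
    """Bytes needed to store every user's neighbor list as fixed-width IDs."""
    return num_users * avg_friends * bytes_per_id

# Figures from the slide: 1+ billion users, 130 friends on average.
raw_bytes = adjacency_list_bytes(1_000_000_000, 130)
raw_terabytes = raw_bytes / 1e12  # decimal terabytes
print(f"{raw_terabytes:.2f} TB")
```

Under these assumptions the estimate comes out at about 1.04 TB, before any attributes, labels, or content are added.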
Sampling Versus Summarization
• Sampling
– Assume the information of nodes/links becomes known only after they are sampled
– Requires a sampling strategy to explore/expand the network gradually
– Goal: gradually identify a small set of representative nodes and links of a social network, usually given little prior information about this network
• Summarization
– The entire social network is known a priori
– Goal: condense the social network as much as possible without losing too much information
Homogeneous VS Heterogeneous Social Networks
• Homogeneous Single-Relational Network
– Single object type & link type
• Heterogeneous Multi-Relational Network
– Multiple object types & link types
• Example
– Homogeneous: link types = Friend
– Heterogeneous: link types = Friend, Family, Love
Sampling for Social Networks
Sampling Social Networks
• Assume that the detailed information of a node can only be seen after it is sampled
– The entire social network is not known in advance
• Goal
– Sample (i.e. gradually observe nodes and links) a sub-network that represents the whole network
• To preserve certain properties of the original network
Evaluating the Sampling Quality
• How to measure the quality of the sampling algorithm?
• A sampling algorithm is effective if
– The sampled social network preserves certain network properties
– Using the sampled network to perform an ultimate task (e.g. centrality analysis, link prediction, etc.), one can produce results similar to those obtained on the fully observed network
– The sampled sub-network is small
Properties Preserved (1/3)
• Homogeneous Static Social Network
– In/Out Degree Distribution
– Path Length Distribution
– Clustering Coefficient Distribution
– Eigenvalues
– Weakly/Strongly Connected Component Size Distribution
– Community Structure
– Etc.
Properties Preserved (2/3)
• Homogeneous Dynamic Social Networks (graphs are time-evolving)
– Densification Power Law
• Number of edges vs. number of nodes over time
– Shrinking diameter
• The diameter is observed to shrink and stabilize over time
– Average clustering coefficient over time
– Largest singular value of the graph adjacency matrix over time
– Etc.
Properties Preserved (3/3)
• Heterogeneous Social Network
– Node type distribution
– Intra-link and inter-link type distribution
– Higher-order type connections
Evaluation Metrics
• Whether certain properties are preserved
– For single-value properties (e.g. clustering coefficient, average path length), one can measure whether this value is preserved
– For distributional properties (e.g. degree distribution, component size distribution), one can compute the distance between the two distributions (e.g. KL divergence)
• Whether a certain end task can be performed similarly
– Perform the task using the sampled network, and check whether the results are similar to those obtained when the full network is used
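As an illustration of the distributional metric, the sketch below compares the degree distributions of a full and a sampled graph with KL divergence. The toy graphs, the additive smoothing constant `epsilon`, and the direction KL(sampled || full) are assumptions made for this example, not choices stated in the tutorial.

```python
import math
from collections import Counter

def degree_distribution(adjacency):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    total = sum(counts.values())
    return {degree: count / total for degree, count in counts.items()}

def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) over the union of supports, with additive smoothing."""
    support = sorted(set(p) | set(q))
    p_smooth = [p.get(k, 0.0) + epsilon for k in support]
    q_smooth = [q.get(k, 0.0) + epsilon for k in support]
    p_total, q_total = sum(p_smooth), sum(q_smooth)
    return sum(
        (pk / p_total) * math.log((pk / p_total) / (qk / q_total))
        for pk, qk in zip(p_smooth, q_smooth)
    )

# Toy undirected graphs stored as {node: set of neighbors}.
full_graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
sampled_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

real = degree_distribution(full_graph)
sampled = degree_distribution(sampled_graph)
divergence = kl_divergence(sampled, real)
```

A value near zero indicates that the sample reproduces the degree distribution well; the smoothing only keeps the logarithm finite when a degree is missing from one of the two graphs.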
Sampling for Homogeneous Social Networks
Three Main Strategies
• Node Selection
• Edge Selection
• Sampling by Exploration
– Random Walk
– Graph Search
– Chain-Referral Sampling
[Figure label: Seeds (i.e., ego)]
Node Selection
• Random Node Sampling
– Uniformly select a set of nodes
• Degree-based Sampling [Adamic’01]
– The probability of a node being selected is proportional to its degree (assumed known)
• PageRank-based Sampling [Leskovec’06]
– The probability of a node being selected is proportional to its PageRank value (assumed known)
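A minimal sketch of the first two node selection strategies. The graph representation (`{node: set of neighbors}`) and the function names are assumptions of this example; for degree-based sampling, nodes are drawn one at a time without replacement, with each draw proportional to degree among the remaining nodes, which is one possible reading of the slide.

```python
import random

def random_node_sampling(adjacency, k, rng):
    """Select k distinct nodes uniformly at random."""
    return rng.sample(sorted(adjacency), k)

def degree_based_sampling(adjacency, k, rng):
    """Draw up to k distinct nodes, each draw proportional to degree."""
    nodes = sorted(adjacency)
    weights = [len(adjacency[node]) for node in nodes]
    chosen = []
    while len(chosen) < k and sum(weights) > 0:
        index = rng.choices(range(len(nodes)), weights=weights)[0]
        chosen.append(nodes[index])
        weights[index] = 0  # never pick the same node twice
    return chosen

# Toy graph; node "e" is isolated, so degree-based sampling never picks it.
graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a"}, "e": set(),
}
rng = random.Random(0)
uniform_sample = random_node_sampling(graph, 3, rng)
degree_sample = degree_based_sampling(graph, 3, rng)
```

PageRank-based sampling would follow the same pattern with PageRank scores in place of degrees as weights.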
Edge Selection
• Random Edge (RE) Sampling
– Uniformly select edges at random, and then include the associated nodes
• Random Node-Edge (RNE) Sampling
– Uniformly select a node, then uniformly select an edge incident to it
• Hybrid Sampling [Leskovec’06]
– With probability p perform RE sampling, with probability 1-p perform RNE sampling
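The three edge selection rules can be sketched as follows. This is an illustrative implementation on an undirected toy graph; the helper names are hypothetical, and RNE here picks its node only among nodes that have at least one edge, which the slide does not specify.

```python
import random

def edge_list(adjacency):
    """All undirected edges as sorted (u, v) tuples, without duplicates."""
    return sorted({tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]})

def random_edge(edges, rng):
    """RE: pick one edge uniformly at random."""
    return rng.choice(edges)

def random_node_edge(adjacency, rng):
    """RNE: pick a node uniformly, then one of its incident edges uniformly."""
    candidates = sorted(node for node in adjacency if adjacency[node])
    u = rng.choice(candidates)
    v = rng.choice(sorted(adjacency[u]))
    return tuple(sorted((u, v)))

def hybrid_edge_sampling(adjacency, num_draws, p, rng):
    """Hybrid: each draw uses RE with probability p, otherwise RNE."""
    edges = edge_list(adjacency)
    sampled_edges = set()
    for _ in range(num_draws):
        if rng.random() < p:
            sampled_edges.add(random_edge(edges, rng))
        else:
            sampled_edges.add(random_node_edge(adjacency, rng))
    sampled_nodes = {node for edge in sampled_edges for node in edge}
    return sampled_nodes, sampled_edges

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
sampled_nodes, sampled_edges = hybrid_edge_sampling(graph, 5, 0.8, random.Random(1))
```

Note that RE favors high-degree nodes (they sit on more edges), while RNE gives every node the same chance of contributing an edge; the hybrid mixes the two biases.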
Edge Selection (cont.)
• Induced Edge Sampling [Ahmed’12]
– Step 1: Uniformly select edges (and consequently nodes) for several rounds
– Step 2: Add edges that exist between sampled nodes
• Frontier Sampling [Ribeiro’10]
– Step 0: Randomly select a set of nodes L as seeds
– Step 1: Select a seed u from L using degree-based sampling
– Step 2: Select an edge of u, (u, v), uniformly
– Step 3: Replace u by v in L and add (u, v) to the sequence of sampled edges
– Repeat Steps 1 to 3
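The Frontier Sampling steps above can be sketched as below. The stopping rule (a fixed number of steps, `num_steps`) and the graph representation are assumptions of this example; the slide only says to repeat Steps 1 to 3.

```python
import random

def frontier_sampling(adjacency, num_seeds, num_steps, rng):
    """Return the sequence of sampled edges (u, v)."""
    # Step 0: randomly select the seed set L.
    frontier = rng.sample(sorted(adjacency), num_seeds)
    sampled_edges = []
    for _ in range(num_steps):
        degrees = [len(adjacency[u]) for u in frontier]
        if sum(degrees) == 0:
            break  # every frontier node is isolated
        # Step 1: select a seed u from L with probability proportional to degree.
        index = rng.choices(range(len(frontier)), weights=degrees)[0]
        u = frontier[index]
        # Step 2: select an edge (u, v) uniformly among the edges of u.
        v = rng.choice(sorted(adjacency[u]))
        # Step 3: replace u by v in L and record the edge.
        frontier[index] = v
        sampled_edges.append((u, v))
    return sampled_edges

# Toy graph: a ring of six nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
edges = frontier_sampling(ring, num_seeds=2, num_steps=10, rng=random.Random(2))
```

Each entry of the frontier behaves like an independent random walker, but the degree-proportional choice in Step 1 couples the walkers.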
Sampling by Exploration
• Random Walk [Gjoka’10]
– The next-hop node is chosen uniformly among the neighbors of the current node
• Random Walk with Restart [Leskovec’06]
– Uniformly select a random node and perform a random walk with restarts
• Random Jump [Ribeiro’10]
– Same as random walk, but with probability p we jump to any node in the network
• Forest Fire [Leskovec’06]
– Choose a node u uniformly
– Generate a random number z and select z out-links of u that are not yet visited
– Apply this step recursively for all newly added nodes
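A sketch of Forest Fire sampling on an undirected toy graph. The slide only says "generate a random number z"; this example assumes z is geometrically distributed with mean p/(1-p), controlled by a hypothetical `p_forward` parameter, and restarts from a fresh uniform seed whenever the fire dies out before `target_size` nodes are collected.

```python
import random
from collections import deque

def geometric_burn_count(p_forward, rng):
    """Number of successes before the first failure; mean p / (1 - p)."""
    count = 0
    while rng.random() < p_forward:
        count += 1
    return count

def forest_fire_sample(adjacency, target_size, p_forward, rng):
    """Return a set of sampled nodes; requires 0 <= p_forward < 1."""
    target_size = min(target_size, len(adjacency))
    nodes = sorted(adjacency)
    visited = set()
    while len(visited) < target_size:
        # Choose a (new) starting node u uniformly.
        seed = rng.choice([node for node in nodes if node not in visited])
        visited.add(seed)
        queue = deque([seed])
        while queue and len(visited) < target_size:
            u = queue.popleft()
            unvisited = sorted(v for v in adjacency[u] if v not in visited)
            # Burn z not-yet-visited links of u, then recurse on the new nodes.
            z = min(geometric_burn_count(p_forward, rng), len(unvisited))
            for v in rng.sample(unvisited, z):
                if len(visited) >= target_size:
                    break
                visited.add(v)
                queue.append(v)
    return visited

graph = {
    0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 5},
    4: {2, 5}, 5: {3, 4},
}
sample = forest_fire_sample(graph, target_size=4, p_forward=0.7, rng=random.Random(3))
```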
Sampling by Exploration (cont.)
• Ego-Centric Exploration (ECE) Sampling
– Similar to random walk, but each neighbor is selected with probability p
– Multiple ECE (starting with multiple seeds)
• Depth-First / Breadth-First Search [Krishnamurthy’05]
– Keep visiting neighbors of the most recently / earliest visited nodes
• Sample Edge Count [Maiya’11]
– Move to the neighbor with the highest degree, and keep going
• Expansion Sampling [Maiya’11]
– Construct a sample with the maximal expansion. Select the neighbor v ∈ N(S) that maximizes |N({v}) − (N(S) ∪ S)|
– S: the set of sampled nodes, N(S): the 1st neighbor set of S
Example: Expansion Sampling
[Figure: example graph with nodes A, B, C, D, E, F, G, H and current sample S = {A}]
|N({A})| = 4
|N({E}) − (N({A}) ∪ {A})| = |{F, G, H}| = 3
|N({D}) − (N({A}) ∪ {A})| = |{F}| = 1
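The expansion rule can be checked on a small graph. The edge list below is hypothetical: the figure itself is not recoverable, so the edges were chosen only to be consistent with the three set sizes quoted in the example (|N({A})| = 4, expansion 3 for E, expansion 1 for D).

```python
def neighbor_set(adjacency, nodes):
    """N(S): nodes adjacent to S that are not in S themselves."""
    result = set()
    for u in nodes:
        result |= adjacency[u]
    return result - set(nodes)

def expansion_score(adjacency, sample, v):
    """|N({v}) - (N(S) ∪ S)|: new nodes that v would expose."""
    covered = neighbor_set(adjacency, sample) | set(sample)
    return len(adjacency[v] - covered)

def expansion_sampling(adjacency, seed, size):
    """Greedily add the frontier node with the largest expansion score."""
    sample = [seed]
    while len(sample) < size:
        frontier = neighbor_set(adjacency, sample)
        if not frontier:
            break
        best = max(sorted(frontier), key=lambda v: expansion_score(adjacency, sample, v))
        sample.append(best)
    return sample

# Hypothetical edges consistent with the numbers in the example.
graph = {
    "A": {"B", "C", "D", "E"}, "B": {"A"}, "C": {"A"}, "D": {"A", "F"},
    "E": {"A", "F", "G", "H"}, "F": {"D", "E"}, "G": {"E"}, "H": {"E"},
}
score_e = expansion_score(graph, ["A"], "E")
score_d = expansion_score(graph, ["A"], "D")
first_two = expansion_sampling(graph, "A", 2)
```

With S = {A}, node E exposes three new nodes and D only one, so E is added first.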
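Drawback of Random Walk: Degree Bias!
[Figure: sampled node degree distribution q_k vs. real node degree distribution p_k]
• Real average node degree ~ 94, sampled average node degree ~ 338
• Solution: modify the transition probability:
$$P_{v,w} = \begin{cases} \dfrac{1}{k_v}\,\min\left(1, \dfrac{k_v}{k_w}\right) & \text{if } w \text{ is a neighbor of } v \\ 1 - \sum_{y \neq v} P_{v,y} & \text{if } w = v \\ 0 & \text{otherwise} \end{cases}$$
where k_v denotes the degree of node v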
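A sketch of the degree-corrected (Metropolis-Hastings) random walk: propose a uniform neighbor w of the current node v and accept the move with probability min(1, k_v/k_w), otherwise stay at v. The toy graph is an assumption of this example, which also assumes no self-loops and no isolated nodes.

```python
import random

def mh_transition_probability(adjacency, v, w):
    """Transition probability from v to w for the degree-corrected walk."""
    k_v = len(adjacency[v])
    if w in adjacency[v]:
        return (1.0 / k_v) * min(1.0, k_v / len(adjacency[w]))
    if w == v:
        # Remaining probability mass: the walk stays where it is.
        return 1.0 - sum(mh_transition_probability(adjacency, v, y) for y in adjacency[v])
    return 0.0

def metropolis_hastings_walk(adjacency, start, num_steps, rng):
    """Simulate the walk; the returned list has num_steps + 1 entries."""
    walk = [start]
    current = start
    for _ in range(num_steps):
        proposal = rng.choice(sorted(adjacency[current]))
        accept = min(1.0, len(adjacency[current]) / len(adjacency[proposal]))
        if rng.random() < accept:
            current = proposal
        walk.append(current)
    return walk

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
walk = metropolis_hastings_walk(graph, "a", 20, random.Random(5))
```

Because the resulting transition matrix is symmetric, the walk's stationary distribution is uniform over nodes, which is what removes the bias toward high-degree nodes.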
Metropolis Graph Sampling [Hubler’08]
• Step 1: Initially pick one subgraph sample S with n' nodes randomly
• Step 2: Iterate the following steps until convergence
2.1: Remove one node from S
2.2: Randomly add a new node to S, giving S'
2.3: Compute the likelihood ratio $\alpha = \rho^*(S') / \rho^*(S)$
– If $\alpha \geq 1$: $S := S'$
– If $\alpha < 1$: $S := S'$ with probability $\alpha$, and $S := S$ with probability $1 - \alpha$
– $\rho^*(S)$ measures the similarity of a certain property between the sample S and the original network G
• Can be derived approximately using simulated annealing
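A sketch of the Metropolis accept/reject loop. The similarity function used here (an exponential of the gap in average degree between the induced sample and the full graph, with a hypothetical `sharpness` parameter) is only a stand-in for whatever property one wants to preserve, and a fixed iteration count replaces the convergence test.

```python
import math
import random

def average_degree(adjacency, nodes):
    """Average degree of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    return sum(len(adjacency[u] & nodes) for u in nodes) / len(nodes)

def similarity(adjacency, sample, sharpness=10.0):
    """Illustrative similarity score: 1 when average degrees match."""
    gap = abs(average_degree(adjacency, sample) - average_degree(adjacency, adjacency))
    return math.exp(-sharpness * gap)

def metropolis_graph_sampling(adjacency, sample_size, num_iterations, rng):
    nodes = sorted(adjacency)
    # Step 1: random initial sample S.
    sample = set(rng.sample(nodes, sample_size))
    for _ in range(num_iterations):
        outside = [node for node in nodes if node not in sample]
        if not outside:
            break
        # Steps 2.1 and 2.2: swap one node out and one node in.
        removed = rng.choice(sorted(sample))
        added = rng.choice(outside)
        candidate = (sample - {removed}) | {added}
        # Step 2.3: accept with probability min(1, ratio).
        ratio = similarity(adjacency, candidate) / similarity(adjacency, sample)
        if ratio >= 1 or rng.random() < ratio:
            sample = candidate
    return sample

graph = {
    0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 5},
    4: {2, 5}, 5: {3, 4},
}
sample = metropolis_graph_sampling(graph, 3, 50, random.Random(6))
```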
Sampling for Heterogeneous Social Networks
Sampling on Heterogeneous Social Networks
• Heterogeneous Social Networks (HSN)
– A graph G=<V, E> has n nodes (v1, v2, …, vn), m directed edges (e1, …, em) and k different types
– Each node/edge belongs to a type
• Given a finite set L = {L1, ..., Lk} denoting k types
• Sampling methods for HSN
– Multi-graph sampling [Gjoka’10]
– Type-distribution preserving sampling [Li’11]
– Relational-profile preserving sampling [Yang’13]
Multigraph Sampling
• Perform random walk sampling on the union multigraph of several relation graphs, so that the walk does not get stuck when an individual graph is disconnected.
Node Type Distribution Preserving Sampling
• Given a graph G and a sampled subgraph GS
• The node type distribution of GS is expected to be the same as that of G, i.e., d(Dist(GS), Dist(G)) = 0
– d() denotes the difference between two distributions
[Figure: original network vs. sampled network; the node type ratio is preserved, (9:6) = (3:2)]
Connection‐type Preserving Sampling
• Heterogeneous Connection
– For an edge E[vi, vj]
– Intra-connection edge: Type(vi) = Type(vj)
– Inter-connection edge: Type(vi) != Type(vj)
• Intra-Relationship preserving
– The ratio of intra-connection edges should be preserved, that is: d(IR(GS), IR(G)) = 0
– If the intra-relationship is preserved, the inter-relationship is also preserved
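A small sketch of the intra-connection ratio IR and its comparison between a full and a sampled edge set. The node names, types, and the use of an absolute difference for d() are assumptions of this example.

```python
def intra_connection_ratio(edges, node_type):
    """Fraction of edges whose two endpoints share the same type."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if node_type[u] == node_type[v])
    return intra / len(edges)

node_type = {"p1": "paper", "p2": "paper", "a1": "author", "a2": "author"}
full_edges = [("p1", "p2"), ("a1", "p1"), ("a2", "p2"), ("a1", "a2")]
sampled_edges = [("p1", "p2"), ("a1", "p1")]

ir_full = intra_connection_ratio(full_edges, node_type)
ir_sample = intra_connection_ratio(sampled_edges, node_type)
difference = abs(ir_sample - ir_full)  # one possible choice of d()
inter_full = 1.0 - ir_full             # inter ratio is the complement
```

Since every edge is either intra or inter, the inter-connection ratio is 1 − IR, which is why preserving one preserves the other.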
Respondent-driven Sampling
• First proposed in social science [Heck’99] to survey hidden populations
• Two main phases:
– Snowball sampling
– Finding the steady state of the recruitment matrix
[Figure: respondents in G each recruit others with a limited number of coupons c; the recruitment counts S11 ... S33 form a transition matrix, and its N-step transition gives the steady-state vector (P1, P2, P3)]
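The second phase can be illustrated with power iteration on a row-normalized recruitment matrix. The 2×2 counts below are made up for the example, and the sketch assumes every group recruited at least one respondent (so no row sums to zero).

```python
def row_normalize(counts):
    """Turn recruitment counts S[i][j] into transition probabilities."""
    return [[value / sum(row) for value in row] for row in counts]

def steady_state(transition, num_iterations=200):
    """Approximate the steady-state vector by repeated multiplication."""
    size = len(transition)
    state = [1.0 / size] * size
    for _ in range(num_iterations):
        state = [
            sum(state[i] * transition[i][j] for i in range(size))
            for j in range(size)
        ]
    return state

# Hypothetical counts: group 0 recruited 9 from group 0 and 1 from group 1, etc.
recruitment_counts = [[9, 1], [5, 5]]
transition = row_normalize(recruitment_counts)
equilibrium = steady_state(transition)
```

For this matrix the equilibrium is (5/6, 1/6): it satisfies 0.1·P1 = 0.5·P2 together with P1 + P2 = 1.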
Comparing Different Sampling Algorithms
[Figure panels: similarity of node type distribution; similarity of intra-link distribution]
• Respondent-driven sampling does well when few nodes have been sampled, but saturates at mediocre quality afterwards
• Random node sampling performs poorly in the beginning, but reaches the best results after a sufficient number of nodes are sampled.
Relational Profile Preserving Sampling
• Node-type/intra-type preservation considers the semantics of nodes, but not the structure of networks
• Propose the Relational Profile to consider semantics and structure together
– Captures the dependency between each Node Type (NT) and Edge Type (ET) of a directed heterogeneous network
– Consists of 4 relational matrices
• Conditional probabilities P(Tj|Ti) (e.g. P(LT=cites|NT=paper), where LT is the link type)
• Node to node, node to edge, edge to node, edge to edge
[Figure: the relational profile as four transition matrices over NT and ET; example labels: paper, author, cites, journal_of, authored]
Example of Relational Profile (RP)
P A C J c p a
P 0.44 0.22 0.22 0.11 0.44 0.33 0.22
A 1 1
C 1 1
J 1 1
c 1 0.22 0.44 0.33
p 0.5 0.33 0.17 0.66 0.33
a 0.5 0.5 0.6 0.4
P A C J c p a
P 0.182 0.364 0.091 0.273 0.182 0.364 0.364
A 1 1
C 1 1
J 1 1
c 1 0.5 0.5
p 0.5 0.125 0.375 0.17 0.5 0.33
a 0.5 0.5 0.22 0.33 0.44
Challenge: How to approximate RP when the true RP is unknown
• We propose Exploration by Expectation Sampling
• Aim to preserve the unknown relational profile while adding new sample nodes
1. Randomly choose a starting node and the corresponding edges
2. Based on the current RP, select the next node from all 1-degree neighbors
3. Add the new node and all its edges
4. Update the RP of the sub-sampled graph
5. Repeat steps 2, 3 & 4 until the RP converges
• Which node should be selected?
– Select the node whose inclusion can potentially lead to the largest change to the existing RP
• Use the partially observed RP to generate the ‘expected amount of change’ of each node as its score
• Weighted sampling based on the score
Relational Profile Sampling (RPS)
Idea: sample to increase the diversity
Goal: maximize the expected change of the property (relational profile distribution)
D(v, Gs) = estimated change of RP given sampling v on the current graph Gs = E[ΔP(Gs, Gs+v) | Gs], where ΔP = RMSE_RP (the RMSE between the two relational profiles)
Exploiting the existing RP, P(type(v)=t | Gs) can be obtained using the observed types of v's neighbors; each P(type | type) term can be obtained from the existing RP
[Figure: candidate node v next to the current sample Gs, with each connecting edge labeled RP(type | type)]
Evaluation
• Datasets: 3 real-life large-scale social networks
• Baselines:
– Random Walk Sampling (RW)
– Degree-based Sampling (HDS)
• Evaluation I (Property Preservation): see how well the sampled network approximates two properties of the full network
• Evaluation II (Prediction): train a prediction model using the sampled network to infer the status of the out-of-sample network:
– Node Type Prediction: predict the type of unseen nodes in the network using a sub-sampled network
– Missing Relations Prediction: recover/predict the missing links
– Features:
• fdeg = (in/out deg; avg in/out deg of neighbors)
• ftopo = (Common Neighbors; Jaccard's Coefficient; etc.)
• fnt = P(type(v)|Gs)
• fRPnode
• fRPpath
Experiments (Property Preservation)
• RP (RMSE)
– Type dependency preservation
• Weighted PageRank
– Preserving relative node weights propagated throughout the entire network
[Figure: plots for the Hep, Aca, and Movie datasets comparing RW, HDS, and RPS; x-axis: # nodes sampled (in 10s); y-axis label: Kendall-Tau]
Experiments (Prediction)
• We show the Academic Network for brevity.
[Figure: two panels, Node Type Prediction and Missing Relation Prediction; accuracy vs. number of sampled nodes for highDeg, RandWalk, and RPS]
Task-driven Network Sampling
• Sampling Community Structure [Maiya’10] [Satuluri’11]
• Sampling Network Backbone for Influence Maximization [Mathioudakis’11]
• Sampling High Centrality Individuals [Maiya’10]
• Sampling Personalized PageRank Values [Vattani’11]
• Sampling Network for Link/Label Prediction [Ahmed’12]
Short Summary
• Node and Edge Selection
– Homogeneous SN: [Leskovec’06] [Adamic’01] [Ahmed’12] [Ribeiro’10] [Kurant’12]
• Sampling by Exploration
– Homogeneous SN: [Krishnamurthy’05] [Leskovec’06] [Hubler’08] [Gjoka’10] [Ribeiro’10] [Maiya’11] [Kurant’11]
– Heterogeneous SN: [Gjoka’11] [Li’11] [Kurant’12] [Yang’13]
• Task-driven Sampling
– [Maiya’10] [Satuluri’11] [Mathioudakis’11] [Vattani’11] [Ahmed’12]
• Why sample a social network?
– The full network (e.g. Facebook) cannot be fully observed
– Crawling can be costly in terms of resource and time consumption (therefore a smart sampling strategy is needed)
Detecting Community Structures in Social Networks by Graph Sparsification
Partha Basuchowdhuri, Satyaki Sikdar, Sonu Shreshtha, Subhasis Majumder, Heritage Institute of Technology, Kolkata, India
Building Community Preserving Sparsified Network
Fast Detection of Communities from the Sparsified Network
Detecting Community Structures in Social Networks by Graph Sparsification
Partha Basuchowdhuri, Satyaki Sikdar, Sonu Shreshtha, Subhasis Majumder
Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata, India
September 5, 2016
Partha Basuchowdhuri, Satyaki Sikdar, Sonu Shreshtha, Subhasis Majumder Detecting Community Structures in Social Networks by Graph Sparsification
Figure: The tendency of people to live in racially homogeneous neighborhoods [1]. In yellow and orange blocks, the % of Afro-Americans is ≤ 25; in brown and black boxes, it is ≥ 75.
Definition of a Community
For a given graph G(V, E), find a cover C = {C1, C2, ..., Ck} such that ⋃i Ci = V.
For disjoint communities, ∀ i ≠ j we have Ci ∩ Cj = ∅
For overlapping communities, ∃ i ≠ j where Ci ∩ Cj ≠ ∅
Figure: Zachary’s Karate Club Network
C = {C1, C2, C3}, with C1 = yellow nodes, C2 = green, and C3 = blue, is a disjoint cover
However, C = {C1, C2}, with C1 = yellow & green nodes and C2 = blue & green nodes, is an overlapping cover
For our problem, we concentrate on disjoint community detection
A Little Background: Edge Betweenness Centrality
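$$c_B(e) = \sum_{s,t \in V,\; s \neq t} \frac{\sigma(s, t \mid e)}{\sigma(s, t)}$$
where σ(s, t) is the number of shortest paths between s and t, and σ(s, t | e) is the number of those paths that pass through edge e.

Top 6 edges
Edge      cB(e)    Type
(10, 13)  0.3      inter
(3, 5)    0.23333  inter
(7, 15)   0.2079   inter
(1, 8)    0.1873   inter
(13, 15)  0.1746   intra
(5, 7)    0.1476   intra

Bottom 6 edges
Edge      cB(e)    Type
(8, 11)   0.022    intra
(1, 2)    0.0269   intra
(9, 11)   0.031    intra
(8, 9)    0.0412   intra
(12, 15)  0.052    intra
(3, 4)    0.060    intra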
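Edge betweenness for an unweighted, undirected graph can be computed with one breadth-first search per source node followed by a backward accumulation pass (the approach usually attributed to Brandes). The sketch below returns unnormalized values summed over ordered pairs (s, t); the values in the tables above appear to be normalized, so the two scales are not directly comparable. The toy graph (two triangles joined by a bridge) is an assumption of this example.

```python
from collections import deque

def edge_betweenness(adjacency):
    """Unnormalized c_B(e), summed over ordered pairs (s, t) with s != t."""
    centrality = {
        tuple(sorted((u, v))): 0.0 for u in adjacency for v in adjacency[u]
    }
    for source in adjacency:
        distance = {source: 0}
        num_paths = {source: 1}  # sigma(source, v)
        predecessors = {source: []}
        order = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    num_paths[v] = 0
                    predecessors[v] = []
                    queue.append(v)
                if distance[v] == distance[u] + 1:
                    num_paths[v] += num_paths[u]
                    predecessors[v].append(u)
        # Walk back from the farthest nodes, splitting credit among predecessors.
        dependency = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in predecessors[w]:
                share = num_paths[u] / num_paths[w] * (1.0 + dependency[w])
                centrality[tuple(sorted((u, w)))] += share
                dependency[u] += share
    return centrality

# Two triangles {1, 2, 3} and {4, 5, 6} joined by the bridge (3, 4).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = edge_betweenness(graph)
top_edge = max(scores, key=scores.get)
```

The bridge carries every shortest path between the two triangles, so it receives the largest score, matching the observation that inter-community edges dominate the top of the ranking.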
The Girvan-Newman Algorithm
Proposed by Michelle Girvan and Mark Newman[2] in 2002
The Key Ideas
Based on reachability of nodes - shortest paths
Edges are selected on the basis of the edge betweenness centrality
The algorithm
1. Compute the centrality for all edges
2. Remove the edge with the largest centrality; ties can be broken randomly
3. Recalculate the centralities on the running graph
4. Iterate from step 2; stop when you get clusters of desirable quality
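The four steps can be sketched as follows. Two simplifications are assumptions of this example rather than part of the slides: the loop stops once a requested number of connected components (`num_communities`) is reached, standing in for "clusters of desirable quality", and ties are broken deterministically instead of randomly.

```python
from collections import deque

def edge_betweenness(adjacency):
    """Unnormalized edge betweenness via BFS from every source."""
    centrality = {tuple(sorted((u, v))): 0.0 for u in adjacency for v in adjacency[u]}
    for source in adjacency:
        distance, num_paths = {source: 0}, {source: 1}
        predecessors, order = {source: []}, []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    num_paths[v] = 0
                    predecessors[v] = []
                    queue.append(v)
                if distance[v] == distance[u] + 1:
                    num_paths[v] += num_paths[u]
                    predecessors[v].append(u)
        dependency = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in predecessors[w]:
                share = num_paths[u] / num_paths[w] * (1.0 + dependency[w])
                centrality[tuple(sorted((u, w)))] += share
                dependency[u] += share
    return centrality

def connected_components(adjacency):
    """Connected components as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        components.append(component)
    return components

def girvan_newman(adjacency, num_communities):
    graph = {u: set(neighbors) for u, neighbors in adjacency.items()}  # working copy
    communities = connected_components(graph)
    while len(communities) < num_communities:
        centrality = edge_betweenness(graph)  # steps 1 and 3: (re)compute
        if not centrality:
            break  # no edges left to remove
        u, v = max(centrality, key=lambda edge: (centrality[edge], edge))
        graph[u].discard(v)  # step 2: remove the most central edge
        graph[v].discard(u)
        communities = connected_components(graph)
    return communities

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
communities = girvan_newman(graph, 2)
```

Recomputing all centralities after every removal is what makes the method expensive on large graphs, which motivates sparsifying the network first.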
Building Community Preserving Sparsified NetworkFast Detection of Communities from the Sparsified Network
(a) Best edge: (10, 13)
(b) Best edge: (3, 5)
(c) Best edge: (7, 15)
(d) Best edge: (1, 8)
(e) Best edge: (2, 11)
(f) Final graph
Louvain Method: A Greedy Approach
Proposed by Blondel et al. [3] in 2008
Takes a greedy modularity-maximization approach
Very fast in practice; it is the current state of the art in disjoint community detection
Performs hierarchical partitioning, stopping when no further improvement in modularity is possible
Contracts the graph in each iteration, thereby speeding up the process
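To make "improvement in modularity" concrete, here is a small sketch of the modularity score and a deliberately naive greedy local-moving pass. It assumes an unweighted, undirected adjacency dict of sets with at least one edge; the function names are hypothetical. It recomputes modularity from scratch for every candidate move and omits the graph-contraction step, so it only illustrates the greedy idea and is not the Louvain implementation of Blondel et al.

```python
def modularity(adj, community_of):
    """Q = sum over communities of (intra_edges / m - (degree_sum / 2m)^2)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    intra = {}    # edges with both endpoints inside the community
    degree = {}   # sum of node degrees per community
    for u, nbrs in adj.items():
        cu = community_of[u]
        degree[cu] = degree.get(cu, 0) + len(nbrs)
        for v in nbrs:
            if community_of[v] == cu:
                intra[cu] = intra.get(cu, 0) + 0.5  # each edge is seen twice
    return sum(intra.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

def local_moving(adj):
    """Greedy pass: move a node to a neighbor's community if Q increases."""
    community_of = {v: v for v in adj}  # start from singleton communities
    improved = True
    while improved:
        improved = False
        for v in adj:
            best_c = community_of[v]
            best_q = modularity(adj, community_of)
            for c in {community_of[w] for w in adj[v]}:
                old = community_of[v]
                community_of[v] = c          # tentatively move v
                q = modularity(adj, community_of)
                community_of[v] = old        # undo the tentative move
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != community_of[v]:
                community_of[v] = best_c
                improved = True
    return community_of
```

Because every accepted move strictly increases modularity and there are finitely many partitions, the loop terminates. A practical implementation would instead update the gain incrementally per move.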
Louvain Method in Action
Assigning Meaningful Weights to Edges / Sparsification using t-spanner
Outline for Part I
1 Building Community Preserving Sparsified Network
   Assigning Meaningful Weights to Edges
   Sparsification using t-spanner
2 Fast Detection of Communities from the Sparsified Network
   Methodology and Visualizations
   Experimental Results
Our Method
Input: An unweighted network G(V, E)
Output: A disjoint cover C
1 Use the Jaccard coefficient to turn G into a weighted network G(V, E, W)
2 Construct a t-spanner G_S of G(V, E, W). Take the complement of G_S and call it G_comm
3 Use LINCOM to break G_comm into small but pure fragments
4 Use the second phase of the Louvain Method to piece the small fragments together to get C
Jaccard Intro
Definition
$$w_J(e(v_i, v_j)) = \frac{|\Gamma(v_i) \cap \Gamma(v_j)|}{|\Gamma(v_i) \cup \Gamma(v_j)|}$$
where Γ(v_i) is the neighborhood of the node v_i; therefore w_J ∈ [0, 1]
Jaccard works well in domains where local influence is important [4][5][6]
The computation takes O(m) time
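As a small illustration of the definition, the sketch below assigns a Jaccard weight to every edge of an adjacency dict of sets. It assumes Γ(v) is the open neighborhood (v itself is not included) and that the graph has no self-loops; the slide does not specify either, and the function name is hypothetical.

```python
def jaccard_weights(adj):
    """Map each undirected edge {u, v} to |N(u) & N(v)| / |N(u) | N(v)|."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in weights:
                continue  # already handled from the other endpoint
            common = adj[u] & adj[v]   # shared neighbors
            union = adj[u] | adj[v]    # non-empty: contains u and v
            weights[edge] = len(common) / len(union)
    return weights

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
example_weights = jaccard_weights(example)
```

Under this convention an edge whose endpoints share no neighbor, such as the pendant edge (2, 3) in the example, gets weight 0, while edges inside densely connected groups get larger weights.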
Jaccard Example
Table: Jaccard weight statistics for the top 10% of edges in terms of w_J.
Network    |E|          Intra-cluster   Top 10% of edges in terms of w_J
                        edge count      Total edges   Intra-edge   Fraction
Karate     78           21              7             7            1
Dolphin    159          39              15            15           1
Football   613          179             61            61           1
Les-Mis    254          56              25            25           1
Enron      180,811      48,498          18,383        18,220       0.99113
Epinions   405,739      146,417         40,573        36,589       0.90180
Amazon     925,872      54,403          92,587        92,584       0.99996
DBLP       1,049,866    164,268         104,986       104,986      1
Spanner
An (α, β)-spanner of a graph G = (V, E, W) is a subgraph G_S = (V, E_S, W_S) such that
$$\delta_S(u, v) \le \alpha \cdot \delta(u, v) + \beta \quad \forall\, u, v \in V$$
A t-spanner is the special case of an (α, β)-spanner where α = t and β = 0
Authors                        Size                      Running Time
Althöfer et al. [1993] [7]     $O(n^{1+1/k})$            $O(m(n^{1+1/k} + n \log n))$
Althöfer et al. [1993] [7]     $\frac{1}{2} n^{1+1/k}$   $O(m n^{1+1/k})$
Roditty et al. [2004] [8]      $\frac{1}{2} n^{1+1/k}$   $O(k n^{2+1/k})$
Roditty et al. [2005] [9]      $O(k n^{1+1/k})$          $O(km)$ (det.)
Baswana and Sen [2007] [10]    $O(k n^{1+1/k})$          $O(km)$ (rand.)
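One way to see the t-spanner condition in action is the simple greedy construction usually attributed to Althöfer et al.: scan edges by increasing length and keep an edge only if the spanner built so far cannot already connect its endpoints within t times its length. The sketch below is an illustration under assumptions, not necessarily the construction used in the slides: edges are (u, v, length) triples with positive lengths, and the function names are hypothetical.

```python
import heapq

def shortest_distance(adj, source, target, cutoff):
    """Dijkstra from source; returns inf if target is farther than cutoff."""
    best = {source: 0.0}
    heap = [(0.0, 0, source)]  # (distance, tie-breaker, node)
    tick = 1
    while heap:
        d, _, v = heapq.heappop(heap)
        if v == target:
            return d
        if d > best.get(v, float("inf")):
            continue  # stale heap entry
        for w, length in adj[v].items():
            nd = d + length
            if nd <= cutoff and nd < best.get(w, float("inf")):
                best[w] = nd
                heapq.heappush(heap, (nd, tick, w))
                tick += 1
    return float("inf")

def greedy_spanner(nodes, edges, t):
    """Return (kept, dropped) edge lists; kept edges form a t-spanner."""
    spanner = {v: {} for v in nodes}
    kept, dropped = [], []
    for u, v, length in sorted(edges, key=lambda e: e[2]):
        limit = t * length
        if shortest_distance(spanner, u, v, limit) <= limit:
            dropped.append((u, v, length))  # already spanned within factor t
        else:
            spanner[u][v] = length
            spanner[v][u] = length
            kept.append((u, v, length))
    return kept, dropped

# Unit-length triangle: with t = 2 the third edge is redundant.
kept, dropped = greedy_spanner([0, 1, 2], [(0, 1, 1), (1, 2, 1), (0, 2, 1)], 2)
```

The `dropped` list is the complement of the spanner with respect to the original edge set, which is the kind of edge set the method in these slides refers to as G_comm.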
Figure: Original network, n = 11, m = 18; δ(1, 5) = 5
Figure: A 3-spanner of the network, n = 11, m = 11; δ_S(1, 5) = 12
Since δ_S(1, 5) = 12 < t · δ(1, 5) = 15, the edge (1, 5) is discarded. The other edges are discarded in a similar fashion.
Figure: Dolphin network. n = 62, m = 159
Figure: 3-spanner. n = 62, m = 150
Figure: 5-spanner. n = 62, m = 148
Figure: 7-spanner. n = 62, m = 144
Figure: 9-spanner. n = 62, m = 138
Name       n     Spanner    #intra-community   #inter-community
Karate     34    Original   59                 19
                 3          57                 19
                 5          53                 19
                 7          51                 18
                 9          48                 19
Dolphin    59    Original   120                39
                 3          117                38
                 5          102                38
                 7          100                38
                 9          90                 38
Football   115   Original   447                163
                 3          385                166
                 5          376                166
                 7          293                166
                 9          286                165
Figure: Original US Football network
Figure: Sparsified network G_comm
Figure: Final network with communities marked as separate components
Today: In class work
• Implement node and edge sampling methods
• Compare their efficacy on various networks
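As a possible starting point for the exercise, here is a hedged sketch of uniform random node sampling (induced subgraph) and uniform random edge sampling. It is not the blank code from the course website: the function names are hypothetical, the graph is assumed to be an adjacency dict of sets with sortable node labels, and average degree is used only as one simple example of a statistic to compare.

```python
import random

def node_sample(adj, k, rng):
    """Pick k nodes uniformly at random; keep the induced subgraph."""
    chosen = set(rng.sample(sorted(adj), k))
    return {v: adj[v] & chosen for v in chosen}

def edge_sample(adj, k, rng):
    """Pick k edges uniformly at random; keep their endpoints."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    sub = {}
    for u, v in rng.sample(edges, k):
        sub.setdefault(u, set()).add(v)
        sub.setdefault(v, set()).add(u)
    return sub

def average_degree(adj):
    """Mean degree, one simple statistic for comparing a sample to G."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0

# A 10-node ring with a few chords, sampled with a fixed seed.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
for a, b in [(0, 5), (2, 7), (3, 8)]:
    ring[a].add(b)
    ring[b].add(a)
rng = random.Random(0)
by_nodes = node_sample(ring, 5, rng)
by_edges = edge_sample(ring, 5, rng)
```

Comparing `average_degree(ring)` with the value on each sample gives a first feel for the bias of each method: node sampling tends to lose many edges, whereas edge sampling never produces isolated nodes.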
Graph Sparsification and Sampling
Blank code and data available on website (Lecture 19)
www.cs.rpi.edu/~slotag/classes/FA16/index.html