Community detection and proximity measuresMaximilien Danisch1,2, Jean-Loup Guillaume1 and Bénédicte Le Grand2
1 LIP6 (CNRS – UPMC) and 2 CRI (Université Paris 1)
Context: in complex networks, there are communities
networks nodes edges communitiesWikipedia Wikipedia pages hypertext links Wikipedia categoriesFacebook profiles friendships colleagues, families, sports club
P2P peers file exchanges communities of interests
Problematics: How to find these communities?I The greedy optimization of a quality function suffers from:(i) local minima and (ii) hidden scale parameters.I Let’s look for an alternative approach!
Idea 1: use the related notion of proximity measure
Rank nodes according to their proximity to a given node.Irregularities in the decrease can indicate the presence of communities:
I Often a powerlaw is obtained = no scale can be extracted= the node belongs to several communities of various sizes.
Idea 2: use the notion of multi-ego-centered community
I While one node generaly belongs to many communities, a smallset of nodes, e.g. 2, is enough to define a single community.
Idea 3: a proximity measure has to be parametrized
n common neighbors
2 1 3
Which node is closer to 1: 2 or 3?
What to do with HUBs?
Proxαβλ(i, j) = 1dβj
λ∑l=0αlNl(i, j)
Nl(i, j): number of paths of length lbetween i and j. dj: degree of node j.
I If we know some nodes that are near to one another, e.g. nodesbelonging to a same community, we can learn the parameters thatrank these nodes as close as possible to one another.
AUC optimization: the AUC is ameasure of the accuracy of a classifier,it is equal to the probability that aclassifier ranks a randomly chosenpositive instance higher than arandomly chosen negative one.
0.0001 0.001 0.01 0.1 1
ALPHA
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
BETA
0.989
0.9892
0.9894
0.9896
0.9898
0.99
0.9902
0.9904
Application 1: find all communities of a node
A- Choose a set of candidates:
C- Clean and label the communities:
0 10 20 30 40 50
0
10
20
30
40
500.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
B- Find bi-ego-centered communities:
100 101 102 103 104 105 10610-6
10-5
10-4
10-3
10-2
10-1
100
Chess BoxingChessboardMIN: Chess notation
100 101 102 103 104 105 10610-6
10-5
10-4
10-3
10-2
10-1
100
Chess BoxingAchi, NaganoMIN: Morabaraba
I We can use a similar framework to find all overlappingcommunities in a networks.
Application 2: community completion
Average number of paths of length 1 to 6 between two nodes in the samecommunity and two nodes in different communities:
100 101 102 103 104
DEGREE OF THE TARGET NODE10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
NUM
BER
OF N
BP O
F LE
NGTH
1 INOUT
100 101 102 103 104
DEGREE OF THE TARGET NODE10-3
10-2
10-1
100
101
102
NUM
BER
OF N
BP O
F LE
NGTH
2 INOUT
100 101 102 103 104
DEGREE OF THE TARGET NODE10-1
100
101
102
103
104
NUM
BER
OF N
BP O
F LE
NGTH
3 INOUT
100 101 102 103 104
DEGREE OF THE TARGET NODE103
104
105
106
NUM
BER
OF N
BP O
F LE
NGTH
4 INOUT
100 101 102 103 104
DEGREE OF THE TARGET NODE105
106
107
108
109
NUM
BER
OF N
BP O
F LE
NGTH
5 INOUT
100 101 102 103 104
DEGREE OF THE TARGET NODE108
109
1010
1011
1012
NUM
BER
OF N
BP O
F LE
NGTH
6 INOUT
Learning a proximity measure, combining scorings and unfolding the community:
0 500 1000 1500 2000 2500 3000 3500 4000
AMONG THE K FIRST NODES
0.0
0.2
0.4
0.6
0.8
1.0
PR
OPO
RTIO
N O
F N
OD
ES
OF
TH
E C
OM
MU
NIT
Y
DGLG
KATZ
T and F
CAROP
DISTANCE2.5
2.0
1.5
1.0
0.5
0.0
0.5
1.0
1.5
SECO
ND D
ERIV
ATIV
E
10-9
10-8
10-7
10-6
10-5
10-4
10-3
PROX
IMIT
Y SC
ORE
100 101 102 104 105 106103
ALL NODESTRAINING SETTEST SETSECONDE DERIVATIVE
RANK OF THE NODE