Download pdf - Community detection and proximity measures · Community detection and proximity measures MaximilienDanisch1,2,Jean-LoupGuillaume1 andBénédicteLeGrand2 1LIP6(CNRS–UPMC)and2CRI(UniversitéParis1)

Community detection and proximity measuresMaximilien Danisch1,2, Jean-Loup Guillaume1 and Bénédicte Le Grand2

1 LIP6 (CNRS – UPMC) and 2 CRI (Université Paris 1)

Context: in complex networks, there are communities

networks nodes edges communitiesWikipedia Wikipedia pages hypertext links Wikipedia categoriesFacebook profiles friendships colleagues, families, sports club

P2P peers file exchanges communities of interests

Problematics: How to find these communities?I The greedy optimization of a quality function suffers from:(i) local minima and (ii) hidden scale parameters.I Let’s look for an alternative approach!

Idea 1: use the related notion of proximity measure

Rank nodes according to their proximity to a given node.Irregularities in the decrease can indicate the presence of communities:

I Often a powerlaw is obtained = no scale can be extracted= the node belongs to several communities of various sizes.

Idea 2: use the notion of multi-ego-centered community

I While one node generaly belongs to many communities, a smallset of nodes, e.g. 2, is enough to define a single community.

Idea 3: a proximity measure has to be parametrized

n common neighbors

2 1 3

Which node is closer to 1: 2 or 3?

What to do with HUBs?

Proxαβλ(i, j) = 1dβj

λ∑l=0αlNl(i, j)

Nl(i, j): number of paths of length lbetween i and j. dj: degree of node j.

I If we know some nodes that are near to one another, e.g. nodesbelonging to a same community, we can learn the parameters thatrank these nodes as close as possible to one another.

AUC optimization: the AUC is ameasure of the accuracy of a classifier,it is equal to the probability that aclassifier ranks a randomly chosenpositive instance higher than arandomly chosen negative one.

0.0001 0.001 0.01 0.1 1

ALPHA

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

BETA

0.989

0.9892

0.9894

0.9896

0.9898

0.99

0.9902

0.9904

Application 1: find all communities of a node

A- Choose a set of candidates:

C- Clean and label the communities:

0 10 20 30 40 50

0

10

20

30

40

500.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

B- Find bi-ego-centered communities:

100 101 102 103 104 105 10610-6

10-5

10-4

10-3

10-2

10-1

100

Chess BoxingChessboardMIN: Chess notation

100 101 102 103 104 105 10610-6

10-5

10-4

10-3

10-2

10-1

100

Chess BoxingAchi, NaganoMIN: Morabaraba

I We can use a similar framework to find all overlappingcommunities in a networks.

Application 2: community completion

Average number of paths of length 1 to 6 between two nodes in the samecommunity and two nodes in different communities:

100 101 102 103 104

DEGREE OF THE TARGET NODE10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

NUM

BER

OF N

BP O

F LE

NGTH

1 INOUT

100 101 102 103 104


10-2

10-1

100

101

102

NUM

BER

OF N

BP O

F LE

NGTH

2 INOUT

100 101 102 103 104


100

101

102

103

104

NUM

BER

OF N

BP O

F LE

NGTH

3 INOUT

100 101 102 103 104

DEGREE OF THE TARGET NODE103

104

105

106

NUM

BER

OF N

BP O

F LE

NGTH

4 INOUT

100 101 102 103 104


106

107

108

109

NUM

BER

OF N

BP O

F LE

NGTH

5 INOUT

100 101 102 103 104


109

1010

1011

1012

NUM

BER

OF N

BP O

F LE

NGTH

6 INOUT

Learning a proximity measure, combining scorings and unfolding the community:

0 500 1000 1500 2000 2500 3000 3500 4000

AMONG THE K FIRST NODES

0.0

0.2

0.4

0.6

0.8

1.0

PR

OPO

RTIO

N O

F N

OD

ES

OF

TH

E C

OM

MU

NIT

Y

DGLG

KATZ

T and F

CAROP

DISTANCE2.5

2.0

1.5

1.0

0.5

0.0

0.5

1.0

1.5

SECO

ND D

ERIV

ATIV

E

10-9

10-8

10-7

10-6

10-5

10-4

10-3

PROX

IMIT

Y SC

ORE

100 101 102 104 105 106103

ALL NODESTRAINING SETTEST SETSECONDE DERIVATIVE

RANK OF THE NODE