DEMON: A Local-first Discovery Method For Overlapping Communities

Giulio Rossetti 2,1, Michele Coscia 3, Fosca Giannotti 2, Dino Pedreschi 2,1

1 Computer Science Dep., University of Pisa, Italy {rossetti,pedre}@di.unipi.it
2 ISTI - CNR KDDLab, Pisa, Italy {fosca.giannotti, giulio.rossetti}@isti.cnr.it
3 Harvard Kennedy School, Cambridge, MA, US [email protected]

April 23rd, 2013


Outline
• Problem Definition
  • What is a community?
  • Community Discovery
• Communities and complex (social) networks
  • A matter of perspective
• DEMON Algorithm(s)
  • Properties
  • Experiments
  • Extension
• Conclusions

What is a community?
Unfortunately, there is no universally shared definition of what a community is.

A general idea is that a community represents:

“A set of entities where each entity is closer, in the network sense, to the other entities within the community than to the entities outside it.”

or

“A set of nodes more tightly connected with each other than with nodes belonging to other sets.”

Community Discovery
The aim of CD algorithms is to identify the communities hidden in the structure of a complex network.

Why Community Discovery?
• “Cluster” homogeneous nodes relying on topological information
• (Clustering networked entities)

Major Problems:
• Each algorithm models different properties of real-world communities
• Comparison and evaluation of different methodologies is not trivial
• Finding an acceptable compromise between the number of communities and their sizes
• Context dependency

Community Discovery Approaches
Given the complexity of the problem, a number of different types of approaches were proposed, analyzing:

• Directed/Undirected edges
• Weighted/Unweighted edges
• Top-Down/Bottom-Up partitioning
• Multidimensionality
• Overlap among Communities
• Hierarchical Communities
• …

DEMON: Undirected, Bottom-Up, Overlapping (with Directed, Weighted, Hierarchical extensions)

Communities in (Social) Networks
• Communities can be seen as the basic bricks of a (social) network

• In simple, small networks it is easy to identify them by looking at the structure...

…but real-world networks are not “simple”

• We cannot easily identify the different communities

• Too many nodes and edges

Are they two different phenomena?

No!

A Matter of Perspective
The only difference is in the scale

Locally, for each node the structure makes sense

Globally, we are tangled in complex overlaps

Idea: a bottom-up approach!

Reducing the complexity
Real networks are complex objects

Can we make them “simpler”?

Ego-Networks

(networks built upon a focal node, the “ego”, and the nodes to which the ego is directly connected, the “alters”, plus the ties, if any, among the alters)
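
The ego-network definition can be illustrated with a minimal sketch. It assumes an undirected graph stored as a dictionary of neighbour sets; the names `Graph`, `ego_network` and `ego_minus_ego` are hypothetical and not taken from the slides. The second function also removes the ego itself, which is what the DEMON algorithm does before looking for communities among the alters.

```python
from typing import Dict, Hashable, Set

# Assumption: an undirected graph is a dict mapping each node to its neighbour set.
Graph = Dict[Hashable, Set[Hashable]]


def ego_network(graph: Graph, ego: Hashable) -> Graph:
    """Subgraph induced by the ego and its direct neighbours (the alters)."""
    members = {ego} | graph[ego]
    # Keep only the ties whose endpoints are both inside the ego network.
    return {node: graph[node] & members for node in members}


def ego_minus_ego(graph: Graph, ego: Hashable) -> Graph:
    """Same as ego_network, but with the ego node itself removed."""
    alters = set(graph[ego])
    alters.discard(ego)  # guard against self-loops
    return {node: graph[node] & alters for node in alters}


# Toy graph: "a" knows b, c, d; b and c know each other; e is two hops away.
toy: Graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

print(ego_network(toy, "a"))    # e is excluded, d keeps only its tie to a
print(ego_minus_ego(toy, "a"))  # only the ties among the alters remain
```

In the toy graph, removing the ego leaves the tie between b and c and isolates d, which is the kind of local structure the algorithm then inspects.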

DEMON Algorithm
• For each node n:
  1. Extract the Ego Network of n
  2. Remove n from the Ego Network
  3. Perform Label Propagation [1]
  4. Insert n in each community found
  5. Update the raw community set C

• For each raw community c in C:
  1. Merge with “similar” ones in the set (given a threshold)
     (i.e. merge iff at most ε% of the smaller one is not included in the bigger one)

[1] Usha N. Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E
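
The merge rule in the second loop can be sketched as follows. This is an illustrative reading, not the authors' implementation: `epsilon` is assumed to be a fraction in [0, 1] rather than a percentage, and the function names `should_merge` and `merge_communities` are hypothetical.

```python
from typing import Hashable, Iterable, List, Set


def should_merge(a: Set[Hashable], b: Set[Hashable], epsilon: float) -> bool:
    """True iff at most a fraction epsilon of the smaller set lies outside the bigger one."""
    small, big = sorted((a, b), key=len)
    missing = len(small - big)
    return missing <= epsilon * len(small)


def merge_communities(raw: Iterable[Set[Hashable]], epsilon: float) -> List[Set[Hashable]]:
    """Fold raw communities one by one into a list of merged communities."""
    merged: List[Set[Hashable]] = []
    for community in raw:
        current = set(community)
        changed = True
        while changed:  # a merge can make `current` similar to yet another community
            changed = False
            for other in merged:
                if should_merge(current, other, epsilon):
                    merged.remove(other)
                    current |= other
                    changed = True
                    break  # restart the scan: `merged` was modified
        merged.append(current)
    return merged


raw = [{1, 2, 3, 4}, {3, 4, 5}, {1, 2, 3}, {7, 8}]
print(merge_communities(raw, epsilon=0.25))
print(merge_communities(raw, epsilon=0.34))
```

With `epsilon=0.25` only `{1, 2, 3}` is absorbed (it is fully contained in `{1, 2, 3, 4}`); raising the threshold to `0.34` also lets `{3, 4, 5}` merge, since one of its three nodes is allowed to be missing.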

Label Propagation – The idea

• Each node has a unique label (i.e. its id)

• In the first (setup) iteration each node, with probability α, changes its label to one of the labels of its neighbors;

• At each subsequent iteration each node adopts as its label the one shared (at the end of the previous iteration) by the majority of its neighbors;

• We iterate until consensus is reached.
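
A minimal sketch of the procedure described on this slide, under several assumptions that the slide does not state: the graph is a dictionary of neighbour sets, ties between equally frequent labels are broken at random, a node keeps its current label when that label is among the most frequent ones, and a `max_iter` cap stops the loop if consensus is never reached. All names are hypothetical, and the multilabel variant is not covered here.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # assumption: undirected adjacency sets


def label_propagation(graph: Graph, alpha: float = 1.0, max_iter: int = 100,
                      seed: Optional[int] = None) -> List[Set[Hashable]]:
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every node starts with its own id

    # Setup iteration: with probability alpha, copy the label of a random neighbour.
    initial = dict(labels)
    for node, neighbours in graph.items():
        if neighbours and rng.random() < alpha:
            labels[node] = initial[rng.choice(sorted(neighbours, key=repr))]

    # Synchronous iterations: each node looks at the labels of the previous round.
    for _ in range(max_iter):
        previous = dict(labels)
        for node, neighbours in graph.items():
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(previous[n] for n in neighbours)
            top = max(counts.values())
            best = [label for label, count in counts.items() if count == top]
            if previous[node] not in best:
                labels[node] = rng.choice(best)  # assumption: random tie-break
        if labels == previous:  # consensus: nobody changed label
            break

    # Nodes sharing a label form one community.
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


two_triangles: Graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(label_propagation(two_triangles, seed=0))
```

Because labels only travel along edges, nodes in different connected components can never end up in the same community; how each triangle is split depends on the random choices.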

Label Propagation – Discussion

• Why Label Propagation?
  • Quasi-linear algorithm
  • Shares our idea of what a community is

• Problem:
  • Ping-Pong effect (the algorithm is non-deterministic)

• Solution:
  • Multilabel allowed (we need overlapping communities after all…)

DEMON - Two nice properties

• Incrementality: Given a graph G, an initial set of communities C and an incremental update ∆G consisting of new nodes and new edges added to G, where ∆G contains the entire ego networks of all new nodes and of all the preexisting nodes reached by new links, then

  DEMON(∆G ∪ G, C) = DEMON(∆G, DEMON(G, C))

• Compositionality: Consider any partition of a graph G into two subgraphs G1, G2 such that, for any node v of G, the entire ego network of v in G is fully contained either in G1 or G2. Then, given an initial set of communities C:

  DEMON(G1 ∪ G2, C) = Max(DEMON(G1, C), DEMON(G2, C))

These properties make the algorithm highly parallelizable: it can run independently on different fragments of the overall network, with a relatively small amount of work needed to combine the results.

Experiments

Networks (with metadata):
• Congress (nodes: US politicians, connected if they co-sponsor the same bills)
• IMDb (nodes: actors, connected if they play in the same movies)
• Amazon (nodes: products, connected if they were purchased together)

Compared Algorithms:
• Infomap, non-overlapping state-of-the-art
  Rosvall and Bergstrom, “Maps of random walks on complex networks reveal community structure”, PNAS, 2008
• HLC, overlapping state-of-the-art
  Ahn, Bagrow and Lehmann, “Link communities reveal multiscale complexity in networks”, Nature, 2010

Quality Evaluation – Community size

• Number of communities
• Average community size

[Figure panel: Amazon]

Quality Evaluation - Label Prediction

Multilabel classifier (BRL, Binary Relevance Learner)
The community memberships of a node are the known attributes; the real-world labels (qualitative attributes) are the target to be predicted.

[Figure panels: IMDb, Congress]

Quality Evaluation - Community Cohesion

• How good is our community partition in describing real-world knowledge about the clustered entities?
  • “Similar nodes share more qualitative attributes than dissimilar nodes”

Iff CQ(P) > 1 we are grouping together similar nodes
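
The formula for CQ(P) is not recoverable from this slide, so the following is only one plausible reading of the "greater than 1" criterion and should not be taken as the authors' definition. It assumes that CQ compares the average attribute similarity of node pairs inside the same community with the average over all node pairs, and it assumes Jaccard similarity over the sets of qualitative attributes. The names `jaccard` and `cohesion_ratio` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity between two attribute sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_similarity(groups: Iterable[Iterable[Hashable]],
                             attributes: Dict[Hashable, Set[str]]) -> float:
    """Average similarity over all node pairs that share a group."""
    sims: List[float] = []
    for group in groups:
        for u, v in combinations(sorted(group, key=repr), 2):
            sims.append(jaccard(attributes[u], attributes[v]))
    return sum(sims) / len(sims) if sims else 0.0


def cohesion_ratio(communities: List[Set[Hashable]],
                   attributes: Dict[Hashable, Set[str]]) -> float:
    """Assumed reading of CQ(P): within-community similarity over global similarity."""
    within = mean_pairwise_similarity(communities, attributes)
    overall = mean_pairwise_similarity([attributes.keys()], attributes)
    if overall == 0:
        raise ValueError("no two nodes share any attribute; the ratio is undefined")
    return within / overall


attributes = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"}}
print(cohesion_ratio([{"a", "b"}, {"c", "d"}], attributes))  # similar nodes grouped
print(cohesion_ratio([{"a", "c"}, {"b", "d"}], attributes))  # dissimilar nodes grouped
```

Under these assumptions, a partition that groups nodes with shared attributes scores above 1, while a partition that mixes them scores below 1, matching the criterion stated on the slide.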

HDemon – Hierarchical merge

Why Hierarchical merge?

1. Classic DEMON Merge function did not scale well
   • Complexity issue (~O(|C|^2))
   • Bottleneck for huge networks (such as social graphs)

2. We need to find the right granularity for the communities

• Extensions needed for Label Propagation Algorithm:
  • Weighted networks
  • Directed networks (not yet used)

HDemon – Hierarchical merge

HDemon(Graph G)
    Cc <- |connectedComponent(G)|
    C <- ExtractCommunities(G)
    while (|C| > Cc)
        For c in C:
            N <- N ∪ make_node(c)
        For (n,m) in N:
            If (n share nodes with m):
                E <- E ∪ (n,m)
        C <- ExtractCommunities(new Graph(N,E))

ExtractCommunities(Graph G)
    Egos <- EgoNetworks(G)
    for e in Egos:
        C <- C ∪ LabelPropagation(e)
    return C
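
The graph-building step inside the while loop can be sketched as follows: every community becomes a node, and two community-nodes are linked when they share at least one member. Weighting each edge by the number of shared members is an assumption suggested by the weighted extension mentioned earlier, not something the pseudocode specifies; the name `community_graph` is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def community_graph(communities: List[Set[Hashable]]) -> Dict[int, Dict[int, int]]:
    """Build a graph whose nodes are community indices.

    Two communities are connected iff they share at least one member;
    the edge weight (an assumption) is the number of shared members.
    """
    graph: Dict[int, Dict[int, int]] = {i: {} for i in range(len(communities))}
    for i, j in combinations(range(len(communities)), 2):
        shared = len(communities[i] & communities[j])
        if shared:
            graph[i][j] = shared
            graph[j][i] = shared
    return graph


level_0 = [{1, 2, 3}, {3, 4}, {5, 6}]
print(community_graph(level_0))
```

Running community extraction again on this smaller graph yields the next, coarser level of the hierarchy. Note that the pairwise comparison here is itself quadratic in the number of communities; a real implementation would probably index communities by member node to avoid comparing unrelated pairs.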

Future works – Framework structure

Different scenarios may require alternative community “definitions”.

• Framework for Bottom-up (and overlapping) CD
• Regular vs. Hierarchical

FDemon(Graph G)
    C ← <OverlappingCD(e)>
    Forall c in C(v)
        C ← Merge(C, c, merging_function)
    return C

HFDemon(Graph G)
    Cc ← |connectedComponent(G)|
    C ← ExtractCommunities(G)
    while (|C| > Cc)
        For c in C:
            N ← N ∪ make_node(c)
        For (n,m) in N:
            If n share nodes with m
                E ← E ∪ (n,m)
        C ← ExtractCommunities(new Graph(N,E))

ExtractCommunities(Graph G)
    C ← <OverlappingCD(G)>
    return C

Future Works – Social Community Evolution

Thesis proposal “Evolution in Social Networks”

Idea:
1. Social networks are not static objects
   • Nodes and edges can appear and disappear
   • The same interaction could occur multiple times
   • Communities change accordingly

Major Problems:
1. Size and granularity of the communities heavily influence evolutionary models
   • Hierarchical merging?

2. Which nodes are prone to leave/join a community?
   • Role identification

3. How “strong” is a community?
   • Community strength measure
   • Community life-cycle

Conclusions

• DEMON approaches the community discovery problem through the analysis of simple network sub-structures (ego-networks)

• Overlapping and Hierarchical algorithms are guided by a social perspective

• DEMON outperforms state-of-the-art methodologies

• Possible parallel implementation: high scalability