Dense Subgraph Extraction with Application to Community Detection. Authors: Jie Chen and Yousef Saad. IEEE Transactions on Knowledge and Data Engineering


Page 2: Outline

Introduction
◦ Assumption of the proposed method
The types of graphs
The method
◦ Undirected graph
◦ Directed graph
◦ Bipartite graph
Experiments

Page 3: Introduction

A challenging problem in the analysis of graph structures is the dense subgraph problem: given a sparse graph, the objective is to identify a set of meaningful dense subgraphs.

The dense subgraphs are often interpreted as “communities”, based on the assumption that a network consists of a number of communities, between which the connections are much fewer than those inside the same community.

Page 4: The drawbacks of general partitioning methods

The number k of partitions is a mandatory input parameter, and the partitioning result is sensitive to changes in k.

Most partitioning methods yield a complete clustering of the data, so every vertex is assigned to some partition.

Many graph partitioning techniques favor balancing, i.e., the sizes of different partitions should not vary too much.

Page 5: The assumptions of the proposed method

The adjacency matrix A is a sparse matrix.

The entries of A are either 0 or 1, since the weights of the edges are not taken into account for the density of a graph.

The diagonal of A is zero, since self-loops are not allowed.
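A minimal sketch of these assumptions in Python with NumPy: a weighted matrix is reduced to its 0/1 pattern and the diagonal is cleared. The helper name `pattern_matrix` and the sample weights are hypothetical, and a dense array is used only to keep the illustration short (the slides assume a sparse A).

```python
import numpy as np

def pattern_matrix(W):
    """Return the 0/1 pattern of a weighted matrix W with self-loops removed."""
    A = (np.asarray(W) != 0).astype(int)   # edge weights are ignored
    np.fill_diagonal(A, 0)                 # no self-loops
    return A

# Hypothetical weighted adjacency matrix with two self-loops
W = np.array([[2.0, 0.5, 0.0],
              [0.5, 0.0, 3.0],
              [0.0, 3.0, 1.0]])
A = pattern_matrix(W)
print(A)
```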

Page 6: The types of graphs

Undirected graph G(V,E)
◦ V is the vertex set and E is the edge set.

[Figure: example undirected graph on vertices 1 to 5]

Adjacency matrix (symmetric):

     1 2 3 4 5
  1  0 1 0 1 1
  2  1 0 0 0 1
  3  0 0 0 0 1
  4  1 0 0 0 1
  5  1 1 1 1 0

• The density of an undirected graph is $d(G) = \frac{|E|}{|V|(|V|-1)/2}$.
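As a worked example, the density of the 5-vertex graph can be computed directly from its adjacency matrix. This sketch assumes the usual definition of density for an undirected graph, the number of edges divided by the number of vertex pairs; the function name `density_undirected` is illustrative.

```python
import numpy as np

def density_undirected(A):
    """Edges divided by the number of unordered vertex pairs."""
    A = np.asarray(A)
    n = A.shape[0]
    num_edges = np.count_nonzero(np.triu(A, k=1))  # count each edge once
    return num_edges / (n * (n - 1) / 2)

# Adjacency matrix of the example (vertices 1..5 map to indices 0..4)
A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 0, 1],
              [1, 1, 1, 1, 0]])
d = density_undirected(A)
print(d)
```

The example graph has 6 edges among 10 possible vertex pairs, so its density is 0.6.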

Page 7: The types of graphs

Bipartite graph G(V,E)
◦ Undirected graph
◦ V is the vertex set and E is the edge set.

[Figure: example bipartite graph on vertices 1 to 5]

Adjacency matrix:

     1 2 3 4 5
  1  0 0 0 1 1
  2  0 0 0 0 1
  3  0 0 0 1 1
  4  1 0 1 0 0
  5  1 1 1 0 0

The upper-right block (rows 1 to 3, columns 4 to 5) is B and the lower-left block is B^T.

• The density of a bipartite graph is $d(G) = \frac{|E|}{|V_1| \cdot |V_2|}$, where V1 and V2 are the two sides of the bipartition.
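The block structure can be checked with a short sketch: the full adjacency matrix is assembled from B and B^T, and the density is computed from B alone. This assumes the usual bipartite density, edges divided by |V1|·|V2|; the function name is illustrative.

```python
import numpy as np

def density_bipartite(B):
    """Edges divided by the number of possible edges between the two sides."""
    B = np.asarray(B)
    n1, n2 = B.shape
    return np.count_nonzero(B) / (n1 * n2)

# B links vertices {1, 2, 3} (rows) to vertices {4, 5} (columns)
B = np.array([[1, 1],
              [0, 1],
              [1, 1]])

# Full 5 x 5 adjacency matrix with zero diagonal blocks
A = np.block([[np.zeros((3, 3), dtype=int), B],
              [B.T, np.zeros((2, 2), dtype=int)]])
d = density_bipartite(B)
print(A)
print(d)
```

Here 5 of the 6 possible edges between the two sides are present.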

Page 8: The types of graphs

Directed graph G(V,E)
◦ V is the vertex set and E is the edge set.

[Figure: example directed graph on vertices 1 to 5]

Adjacency matrix (nonsymmetric):

     1 2 3 4 5
  1  0 1 0 1 0
  2  1 0 0 0 0
  3  0 0 0 0 1
  4  0 0 0 0 1
  5  0 1 1 0 0

• The density of a directed graph is $d(G) = \frac{|E|}{|V|(|V|-1)}$.

Page 9: Undirected graph

1. We construct the adjacency matrix A of G(V,E).
2. We use $M(i,j) = \frac{\langle A(:,i), A(:,j) \rangle}{\|A(:,i)\| \, \|A(:,j)\|}$ to build the matrix M that stores the cosines between any two columns of the adjacency matrix A.
3. Then we construct a weighted graph G’(V,E’) whose weighted adjacency matrix is M, i.e., the edge between vertices i and j has weight M(i,j).
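Steps 1 to 3 can be sketched in a few lines of NumPy. The function name `cosine_matrix` is illustrative, zeroing the diagonal of M is an assumption (so that G’ has no self-loops), and a dense array is used only because the example is tiny.

```python
import numpy as np

def cosine_matrix(A):
    """Cosines between all pairs of columns of A, with a zero diagonal."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0            # all-zero column: its cosines stay 0
    M = (A.T @ A) / np.outer(norms, norms)
    np.fill_diagonal(M, 0.0)           # assumption: no self-loops in G'
    return M

# 5-vertex undirected example (vertices 1..5 map to indices 0..4)
A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 0, 1],
              [1, 1, 1, 1, 0]])
M = cosine_matrix(A)
print(np.round(M, 3))
```

Vertices 2 and 4 have identical neighbor sets {1, 5}, so their cosine is 1, while vertices 3 and 5 share no neighbor and get weight 0.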

Page 10: Undirected graph

4. A top-down hierarchical clustering of the vertex set V is performed by successively deleting the edges e’ ∈ E’ in ascending order of edge weight. When G’ first becomes disconnected, V is partitioned into two subsets, each of which corresponds to a connected component of G’. The same procedure is then applied to each subset.

5. The subdivision of a subset terminates when the density of the corresponding subgraph reaches a given density threshold dmin.
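Steps 4 and 5 can be sketched as a naive recursion. This is an illustration under stated assumptions, not the authors' implementation: connectivity is re-checked from scratch after every deletion, subsets of a single vertex are discarded, and the test graph (two 4-cliques joined by one edge) and all function names are hypothetical.

```python
import numpy as np

def density(A, S):
    """Density of the subgraph of A induced by the vertex list S."""
    n = len(S)
    if n < 2:
        return 0.0
    sub = A[np.ix_(S, S)]
    return np.count_nonzero(np.triu(sub, k=1)) / (n * (n - 1) / 2)

def components(S, edges):
    """Connected components of the graph (S, edges), by depth-first search."""
    adj = {v: [] for v in S}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, comps = set(), []
    for s in S:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

def extract_dense(A, M, S, dmin, out):
    """Split S until every reported subset has density >= dmin."""
    if len(S) < 2:
        return
    if density(A, S) >= dmin:
        out.append(S)
        return
    # Edges of G' inside S, sorted by ascending weight
    weighted = sorted((M[i, j], i, j)
                      for a, i in enumerate(S) for j in S[a + 1:] if M[i, j] > 0)
    edges = [(i, j) for _, i, j in weighted]
    k = 0
    comps = components(S, edges)
    while len(comps) == 1:             # delete lightest edges until G' splits
        k += 1
        comps = components(S, edges[k:])
    for comp in comps:
        extract_dense(A, M, comp, dmin, out)

# Hypothetical test graph: cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4
A = np.zeros((8, 8), dtype=int)
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1

norms = np.linalg.norm(A, axis=0)
M = (A.T @ A) / np.outer(norms, norms)  # cosines between columns of A
np.fill_diagonal(M, 0.0)

result = []
extract_dense(A, M, list(range(8)), dmin=0.75, out=result)
print(result)
```

On this graph the whole vertex set has density 13/28, below the threshold, so it is split once; both cliques have density 1 and are reported without further subdivision.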

Page 11
Page 12

[Figure: example graph on vertices 1 to 9]
Page 13

[Figure: vertices 1 to 9, annotated with subgraph densities]

d(Gs) = 1, d(Gt) = 0.6, d(Gt) = 0.8

dmin = 0.75

Page 14

7 8 9 1 2 6 3 5 4

Page 15: Directed graph

The adjacency matrix A of a directed graph is square but not symmetric. When the algorithm is applied to a nonsymmetric adjacency matrix, it produces two different dendrograms, depending on whether M is computed from the cosines of the columns of A or of the rows of A.

We therefore symmetrize the matrix A (i.e., replace A by the pattern matrix of A + A^T) and use the resulting symmetric adjacency matrix to compute the similarity matrix M.
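The ambiguity can be seen on the 5-vertex directed example: the cosines of the columns of A and of the rows of A give different similarity matrices. The helper name `cosine_of_columns` is illustrative, and zeroing the diagonal is an assumption.

```python
import numpy as np

def cosine_of_columns(X):
    """Cosines between all pairs of columns of X, with a zero diagonal."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0            # all-zero column: its cosines stay 0
    M = (X.T @ X) / np.outer(norms, norms)
    np.fill_diagonal(M, 0.0)
    return M

# Nonsymmetric adjacency matrix of the directed example
A = np.array([[0, 1, 0, 1, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [0, 1, 1, 0, 0]])

M_cols = cosine_of_columns(A)      # similarity of in-neighborhoods
M_rows = cosine_of_columns(A.T)    # rows of A are the columns of A^T
print(M_cols[2, 3], M_rows[2, 3])
```

Vertices 3 and 4 both point only to vertex 5, so their row cosine is 1, but they have no common in-neighbor, so their column cosine is 0.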

Page 16: Directed graph transformed into an undirected graph

Remove the directions of the edges and merge the resulting duplicate edges; this yields an undirected graph G̃(V, Ẽ).

Equivalently, replace the adjacency matrix: A ← A + A^T (keeping only the nonzero pattern).

[Figure: the directed graph G(V,E) on vertices 1 to 5 and the resulting undirected graph G̃(V, Ẽ)]

Adjacency matrix of G̃(V, Ẽ):

     1 2 3 4 5
  1  0 1 0 1 0
  2  1 0 0 0 1
  3  0 0 0 0 1
  4  1 0 0 0 1
  5  0 1 1 1 0
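The symmetrization can be reproduced with a short sketch that takes the nonzero pattern of A + A^T, so a pair of opposite edges collapses into a single undirected edge. The function name `symmetrize_pattern` is illustrative.

```python
import numpy as np

def symmetrize_pattern(A):
    """0/1 pattern of A + A^T: drop directions and merge duplicate edges."""
    A = np.asarray(A)
    return ((A + A.T) != 0).astype(int)

# Nonsymmetric adjacency matrix of the directed example
A_dir = np.array([[0, 1, 0, 1, 0],
                  [1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 1],
                  [0, 1, 1, 0, 0]])
A_sym = symmetrize_pattern(A_dir)
print(A_sym)
```

The edges 1→2 and 2→1 sum to 2 in A + A^T, which is why the pattern, not the sum itself, is kept.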

Page 17: Bipartite graphs

Without removing any edge of the graph G’ (which uses M as its weighted adjacency matrix), the vertex set is already partitioned into two subsets, V1 and V2.

Any subsequent hierarchical partitioning will only further subdivide these two subsets separately.
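This can be checked numerically on the 5-vertex bipartite example, taking V1 = {1, 2, 3} and V2 = {4, 5}: a vertex of V1 and a vertex of V2 never share a neighbor, so every cosine between the two sides is zero and M is block diagonal. The helper name is illustrative, and zeroing the diagonal is an assumption.

```python
import numpy as np

def cosine_matrix(A):
    """Cosines between all pairs of columns of A, with a zero diagonal."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0            # all-zero column: its cosines stay 0
    M = (A.T @ A) / np.outer(norms, norms)
    np.fill_diagonal(M, 0.0)
    return M

# Adjacency matrix of the bipartite example (indices 0..2 are V1, 3..4 are V2)
A = np.array([[0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0]])
M = cosine_matrix(A)
cross_block = M[:3, 3:]                # weights of G' between V1 and V2
print(cross_block)
```

Since G’ has no edge between V1 and V2, it is disconnected before any edge is deleted.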

Page 18: Augment the bipartite graph

A reasonable strategy for this purpose is to augment the original bipartite graph by adding edges between some of the vertices that are connected by a path of length 2.

M̂1 is obtained by erasing the diagonal of M1.

Page 19

dmin=0.5

Page 20
Page 21
Page 22: Experiments: accuracy

Ṽi: the number of vertices of extraction result i

Vi: the number of vertices of component i
Page 23
Page 24: Experiments: Polblogs

The graph contains 1490 vertices, among which the first 758 are liberal blogs and the remaining 732 are conservative.

An edge in the graph indicates the existence of a citation between two blogs.

Page 25: Experiments

Comparison with the Clauset, Newman, and Moore (CNM) approach.

CNM approach: bottom-up hierarchical clustering.

Dataset: foldoc, a graph with 13356 vertices and 120238 edges
◦ It was extracted from the online dictionary of computing.