Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County KDD 2008 Workshop on Web Mining and Web Usage Analysis

Page 1

Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies

Akshay Java, Anupam Joshi, Tim Finin

University of Maryland, Baltimore County

KDD 2008 Workshop on Web Mining and Web Usage Analysis

Page 2

Outline

• Introduction
• Community Detection
  – Clustering Approach
  – Spectral Approach
  – Co-Clustering
• Simultaneous Clustering
• Evaluation
• Future Work
• Conclusions

Page 4

Social Media

Describes the online technologies and practices that people use to share opinions, insights, experiences, and perspectives and engage with each other.

~Wikipedia

Page 5

Social Media Graphs

G = (V,E) describing the relationships between different entities (People, Documents, etc.)

G’ = <V,T,R> a tri-partite graph that expresses how entities ‘Tag’ some resource

[Figure: tri-partite graph with three rows of nodes: Users (1, 2, 3, 4), Tags (1, 2), and URLs (1, 2, 3, 4)]
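To make the tri-partite structure G' = <V,T,R> concrete, the sketch below stores a folksonomy as (entity, tag, resource) triples and collapses it into an entity-by-tag count matrix. The function name `node_tag_matrix` and the toy triples are illustrative assumptions, not data from the talk.

```python
def node_tag_matrix(triples):
    """Collapse (entity, tag, resource) triples into an entity-by-tag count matrix."""
    users = sorted({u for u, _, _ in triples})
    tags = sorted({t for _, t, _ in triples})
    row = {u: k for k, u in enumerate(users)}
    col = {t: k for k, t in enumerate(tags)}
    C = [[0] * len(tags) for _ in users]
    for u, t, _ in triples:
        C[row[u]][col[t]] += 1  # one count per tagging event
    return users, tags, C

# Hypothetical tagging events: (user, tag, URL).
triples = [
    ("u1", "web", "url1"),
    ("u1", "ai", "url2"),
    ("u2", "web", "url1"),
    ("u3", "ai", "url3"),
]
users, tags, C = node_tag_matrix(triples)
print(tags)  # ['ai', 'web']
print(C)     # [[1, 1], [0, 1], [1, 0]]
```

Dropping the resource column is a deliberate simplification here: only who used which tag is kept.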

Page 6

What is a Community

A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it.

[Figure: example networks: Political Blogs, Twitter Network, Facebook Network]
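The definition can be checked directly by counting links. The following is a minimal sketch, assuming an undirected edge list; `link_counts` and the toy graph (two triangles joined by one bridge edge) are made up for illustration.

```python
def link_counts(edges, members):
    """Count edges inside a node set versus edges crossing its boundary."""
    members = set(members)
    inside = sum(1 for u, v in edges if u in members and v in members)
    crossing = sum(1 for u, v in edges if (u in members) != (v in members))
    return inside, crossing

# Toy graph: triangle {1, 2, 3}, triangle {4, 5, 6}, bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
inside, crossing = link_counts(edges, {1, 2, 3})
print(inside, crossing)  # 3 1
```

Under the definition above, {1, 2, 3} qualifies as a community because it has three internal links and only one link leaving the set.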

Page 8

Community Detection: Clustering Approach

1. Agglomerative/Hierarchical

Topological Overlap: Similarity is measured in terms of the number of nodes that both i and j link to. (Ravasz et al.)
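A small sketch of this similarity: `shared_neighbors` counts the nodes that both i and j link to, as stated above. The normalized variant `overlap`, which divides by the smaller of the two degrees, is an assumption based on a common formulation of topological overlap and is not spelled out on the slide.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def shared_neighbors(adj, i, j):
    """Number of nodes that both i and j link to."""
    return len(adj[i] & adj[j])

def overlap(adj, i, j):
    """Shared neighbors scaled by the smaller degree (assumed normalization)."""
    smaller = min(len(adj[i]), len(adj[j]))
    return shared_neighbors(adj, i, j) / smaller if smaller else 0.0

# Toy graph: two triangles joined by the bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adjacency(edges)
print(shared_neighbors(adj, 1, 2), shared_neighbors(adj, 1, 6))  # 1 0
```

An agglomerative method would then repeatedly merge the most similar pair of nodes or groups.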

Page 9

Community Detection: Clustering Approach

1. Agglomerative/Hierarchical

2. Divisive/Partition based

Remove edges that have the highest edge betweenness centrality (Girvan-Newman Algorithm)

[Figure: Political Books network]
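One divisive step can be sketched in plain Python: compute edge betweenness with a breadth-first accumulation in the style of Brandes, remove the highest-scoring edge, and repeat until the graph falls into more pieces. The helper names and the toy graph are illustrative assumptions; this is a sketch for small unweighted graphs, not the authors' implementation.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors.
        dist = {s: 0}
        paths = defaultdict(float)
        paths[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dep = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + dep[w])
                score[frozenset((v, w))] += share
                dep[v] += share
    # Every node pair was counted once from each endpoint.
    return {edge: total / 2.0 for edge, total in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts

def split_once(edges):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Toy graph: two triangles joined by the bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(split_once(edges))
```

In the toy graph every shortest path between the two triangles uses the bridge, so the bridge has the highest betweenness and is removed first.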

Page 10

Community Detection: Spectral Approach

• The graph can be partitioned using the eigenspectrum of the Laplacian. (Shi and Malik)

• The eigenvector associated with the second smallest eigenvalue of the graph Laplacian is the Fiedler vector.

• The graph can be recursively partitioned using the sign of the values in its Fiedler vector.

Graph Laplacian:

$L = D - W$, with normalized form $D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$

Normalized Cuts:

$\mathrm{NCut}(A,B) = \mathrm{Cut}(A,B)\left[\frac{1}{\mathrm{Vol}(A)} + \frac{1}{\mathrm{Vol}(B)}\right]$

Cut(A,B): cost of edges deleted to disconnect the graph

Vol(B): total cost of all edges that start from B
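The spectral recipe above can be sketched with NumPy: build the normalized Laplacian, take the eigenvector of the second smallest eigenvalue, and split on its sign. The function names and the toy graph are assumptions for illustration, and the sketch assumes a connected graph with no isolated nodes.

```python
import numpy as np

def fiedler_partition(W):
    """Bipartition by the sign of the Fiedler vector of the normalized Laplacian."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)  # assumes every node has positive degree
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    fiedler = d_inv_sqrt * vectors[:, 1]  # map back to the generalized problem
    return fiedler >= 0

def ncut(W, in_a):
    """NCut(A,B) = Cut(A,B) * (1/Vol(A) + 1/Vol(B))."""
    W = np.asarray(W, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    cut = W[in_a][:, ~in_a].sum()
    return cut * (1.0 / W[in_a].sum() + 1.0 / W[~in_a].sum())

# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
W = np.zeros((6, 6))
for u, v in edges:
    W[u, v] = W[v, u] = 1.0

side = fiedler_partition(W)
print(side, ncut(W, side))
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which side is labeled True. Recursive partitioning would apply the same function to each side's subgraph.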

Page 11

Community Detection: Co-Clustering

• Spectral graph bipartitioning (Dhillon et al.)

• Compute the graph Laplacian of the bipartite document-term graph, where $A \in \mathbb{R}^{n \times m}$ is the document-by-term matrix
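A hedged sketch of spectral co-clustering on a document-by-term matrix A. It assumes the usual degree-normalized form $D_1^{-1/2} A D_2^{-1/2}$ and uses its second singular vectors, with a simple sign split standing in for the k-means step that fuller treatments use. The function name and the toy counts are illustrative assumptions.

```python
import numpy as np

def cocluster_bipartition(A):
    """Split documents and terms together using second singular vectors."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # document degrees; assumed positive
    d2 = A.sum(axis=0)  # term degrees; assumed positive
    A_n = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    doc_scores = U[:, 1] / np.sqrt(d1)
    term_scores = Vt[1, :] / np.sqrt(d2)
    # Sign split: an assumed simplification of the clustering step.
    return doc_scores >= 0, term_scores >= 0

# Toy counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3,
# and document 1 mentions term 2 once, which keeps the graph connected.
A = [
    [5, 5, 0, 0],
    [5, 5, 1, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
docs, terms = cocluster_bipartition(A)
print(docs, terms)
```

Documents and terms that land on the same side form one co-cluster, so each document group comes with the terms that characterize it.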

Page 13

Social Media Graphs

[Figure: Simultaneous Cuts across two panels: Links Between Nodes; Links Between Nodes and Tags]

Page 14

Communities in Social Media

A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it and share similar tags.

Page 15

Clustering Tags and Graphs

$W' = \begin{pmatrix} I & C \\ C^{T} & \beta W \end{pmatrix}$

[Figure: example binary matrix with blocks labeled Nodes × Nodes, Nodes × Tags, Tags × Nodes, and Tags × Tags]

Fiedler Vector Polarity: (1, 1, −1, −1, −1, 1, 1, −1, −1)

• β = 0 is like co-clustering
• β = 1 gives equal importance to blog-blog and blog-tag links
• β >> 1 is like NCut
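A minimal sketch of the joint clustering step, assuming C is a node-by-tag matrix and nodes are ordered before tags. That ordering is a row and column permutation of the block layout of W', so it describes the same graph. The function name and the toy data are assumptions for illustration, not the authors' code.

```python
import numpy as np

def simcut_bipartition(W, C, beta=1.0):
    """Fiedler-vector polarity of the joint node and tag graph."""
    W = np.asarray(W, dtype=float)  # node-node links, shape (n, n)
    C = np.asarray(C, dtype=float)  # node-tag links, shape (n, t)
    n, t = C.shape
    # Joint matrix with nodes first: [[beta*W, C], [C^T, I]].
    W_joint = np.block([[beta * W, C], [C.T, np.eye(t)]])
    degree = W_joint.sum(axis=1)  # assumed positive for every row
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(n + t) - d_inv_sqrt[:, None] * W_joint * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)  # ascending eigenvalues
    polarity = np.where(d_inv_sqrt * vectors[:, 1] >= 0, 1, -1)
    return polarity[:n], polarity[n:]

# Toy data: nodes 0-1 are linked and share tags 0-1; nodes 2-4 form a
# triangle and share tags 2-3; a single link joins node 1 and node 2.
W = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (2, 4), (3, 4)]:
    W[u, v] = W[v, u] = 1.0
C = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

node_side, tag_side = simcut_bipartition(W, C, beta=1.0)
print(node_side, tag_side)
```

Because tags receive a polarity alongside the nodes, each detected community comes with the tags that fall on its side of the cut. Raising `beta` shifts weight toward the link structure.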

Page 16

Clustering Tags and Graphs

$W' = \begin{pmatrix} I & C \\ C^{T} & \beta W \end{pmatrix}$

• β = 0 is like co-clustering
• β = 1 gives equal importance to blog-blog and blog-tag links
• β >> 1 is like NCut

[Figure: panels Clustering Only Links and Clustering Links + Tags]

Page 17

Clustering Tags and Graphs

[Figure: panels Clustering Only Links and Clustering Links + Tags]

Page 19

Datasets

• Citeseer
  – Agents, AI, DB, HCI, IR, ML
  – Words used in place of tags
• Blog data
  – Derived from the WWE/Buzzmetrics dataset
  – Tags associated with blogs derived from del.icio.us
  – For dimensionality reduction, 100 topics derived from blog homepages using LDA (Latent Dirichlet Allocation)
• Pairwise similarity computed
  – RBF kernel for Citeseer
  – Cosine for blogs
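The two similarity measures named above can be written in a few lines. This is a sketch on plain feature vectors; the bandwidth parameter `sigma` is an assumption, since the value used in the experiments is not given.

```python
import math

def rbf_similarity(x, y, sigma=1.0):
    """RBF (Gaussian) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

print(rbf_similarity([0.0, 0.0], [1.0, 0.0]))     # exp(-0.5)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Applying either function to every pair of items yields the weighted similarity matrix that the cut-based methods operate on.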

Page 20

Citeseer Data

Accuracy = 36% Accuracy = 62%

Higher accuracy by adding ‘tag’ information

Page 21

Citeseer Data

[Figure: panels NCut and SimCut]

SimCut results in:
• Higher intra-cluster similarity
• Lower inter-cluster similarity
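One simple way to quantify intra-cluster and inter-cluster similarity is to average the pairwise similarities within and across clusters. The sketch below does that for a given similarity matrix and label list; the averaging scheme and the toy numbers are assumptions, not the measure used for the plots.

```python
def cluster_similarity(sim, labels):
    """Mean pairwise similarity within clusters and between clusters."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if labels[i] == labels[j] else inter
            bucket.append(sim[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)

# Hypothetical symmetric similarity matrix for four items.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.2, 0.8, 1.0],
]
intra, inter = cluster_similarity(sim, [0, 0, 1, 1])
print(intra, inter)
```

A better clustering of the same items should raise the first number and lower the second.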

Page 22

Citeseer Data

[Figure: panels NCut, SimCut, and True]

Constrains cuts based on both:
• Link Structure
• Tags

Page 23

Blog Data

[Figure: panels NCut and SimCut]

SimCut results in:
• Higher intra-cluster similarity
• Lower inter-cluster similarity

Page 24

Blog Data

[Figure: panels NCut and SimCut, 35 Clusters]

NCut: few, large clusters with low intra-cluster similarity

SimCut: moderate-size clusters with higher intra-cluster similarity

Page 25

Effect of Number of Tags, Clusters: Citeseer

More tags help, to an extent

Lower mutual information if only the graph is used

Mutual Information compares clusters to ground truth
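For reference, mutual information between a clustering and a reference labeling can be computed from co-occurrence counts. The sketch below returns the plain (unnormalized) value in nats; whether the evaluation used a normalized variant is not stated, so treat that choice as an assumption.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information, in nats, between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi

truth = [0, 0, 1, 1]
print(mutual_information(truth, [1, 1, 0, 0]))  # same grouping, renamed labels
print(mutual_information(truth, [0, 1, 0, 1]))  # grouping unrelated to truth
```

The measure ignores label names, so a clustering that recovers the true groups under different cluster ids still scores the maximum, here log 2, while an unrelated grouping scores 0.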

Page 26

Effect of Number of Tags, Clusters: Blogs

More tags help, to an extent

Lower mutual information if only the graph is used

Mutual Information compares clusters to content-based clusters (no tags/graph)

Page 28

Future Work

• Evaluating the SimCut algorithm on derived feature types such as named entities, sentiments and opinions, and links to mainstream media.

• For a dataset with ground truth, a comparison of graph-based, text-based, and graph+tag-based clustering

• Evaluating the effect of varying β

Page 30

Conclusions

• Many Social Media sites allow users to tag resources

• Incorporating folksonomies in community detection can yield better results

• SimCut can be easily implemented and relates to NCut with two simultaneous objectives
  – Minimize number of node-node edges being cut
  – Minimize number of node-tag edges being cut

• Detected communities can be associated with meaningful, descriptive tags

Page 31

Thanks!

Page 32

http://ebiquity.umbc.edu
http://socialmedia.typepad.com

Page 33

More Tags

[Figure: panels Only Graph and SimCut]

Page 34

[Figure: Citeseer (Community Size, Similarity)]

Page 35

[Figure: Blogs (Community Size, Similarity)]