
Graph Data Management Lab, School of Computer Science

GDM@FUDAN

www.gdm.fudan.edu.cn

Put conference information here: The 12th International Conference of Data Engineering

Version 1 (2012-3-25) 张俊骏

A Large-Scale Community Structure Analysis in Facebook

Email:[email protected]


Outline

• Introduction
• Data Collection Algorithm
  (1) BFS sampling
  (2) Uniform sampling
• Detecting Communities
  (1) LPA algorithm
  (2) FNCA algorithm
• Experimentation
  (1) Community structure similarity
  (2) Out-of-scale community


Introduction

• Large-scale: Facebook had over 500 million registered users in 2011.
• Community structure:
  (1) Relationships are very tight within certain areas of social life, such as family, colleagues, and friends.
  (2) Outgoing connections that do not belong to any of these categories are less likely to occur.



Introduction (3)

• Community: a substructure of the overall graph in which the density of relationships inside the community is much greater than the density of relationships between communities.
• Clustering: finding the communities within a given graph (the overall graph or a generated subgraph). In mathematical terms, find a partition
  V = V1 ∪ V2 ∪ ... ∪ Vn, where V1, ..., Vn are vertex sets and, for any Vx and Vy with x ≠ y,
  Vx ∩ Vy = Ø
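The partition condition can be checked mechanically. The following is a minimal sketch, assuming vertices are hashable values and communities are given as Python sets; the function name `is_partition` is hypothetical and not part of the original slides.

```python
def is_partition(vertices, groups):
    """Return True if `groups` are pairwise disjoint and together cover `vertices`."""
    seen = set()
    for group in groups:
        group = set(group)
        if seen & group:          # Vx ∩ Vy must be empty for distinct groups
            return False
        seen |= group
    return seen == set(vertices)  # V1 ∪ ... ∪ Vn must equal V


# A valid clustering of five vertices into three communities.
print(is_partition(range(5), [{0, 1}, {2}, {3, 4}]))
```

A clustering in which one vertex appears in two groups, or in which some vertex is left unassigned, fails the check.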


Introduction (4)

• Datasets:
  (1) Two different samples of the graph of relationships among the social network's users.
  (2) Each contains millions of entities; two fast and efficient community detection algorithms are then applied to them.
  (3) The analysis works with no a priori knowledge.


Data Collection Algorithm

•BFS Sampling


Data Collection Algorithm (2)

• BFS Sampling
  (1) Start from a single node.
  (2) Stop when the required level or number of nodes is reached.
  (3) Easy to implement; efficient.
  (4) The result depends on the node selected at the start.
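The BFS sampling steps can be sketched as follows. This is only an illustration under stated assumptions: `neighbors` is a hypothetical callable that returns the friends of a node (standing in for whatever crawler the authors used), and `max_depth` and `max_nodes` are hypothetical names for the two stopping criteria.

```python
from collections import deque


def bfs_sample(neighbors, start, max_depth=None, max_nodes=None):
    """Breadth-first sample starting from `start`.

    `neighbors` is an assumed callable: node -> iterable of adjacent nodes.
    Sampling stops at `max_depth` levels or `max_nodes` nodes, whichever comes first.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand nodes at the required level
        for friend in neighbors(node):
            if friend in visited:
                continue
            if max_nodes is not None and len(visited) >= max_nodes:
                return visited  # required node count reached
            visited.add(friend)
            queue.append((friend, depth + 1))
    return visited


# Toy friendship graph used only for illustration.
toy = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
print(sorted(bfs_sample(toy.__getitem__, 0, max_depth=1)))
```

Changing `start` changes which region of the graph is collected, which is the dependence on the starting node noted in point (4).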


Data Collection Algorithm (3)

•Uniform Sampling


Data Collection Algorithm (4)

• Uniform Sampling
• Legal ID numbers in Facebook: about 2^32
• Existing ID numbers in Facebook: about 500 million (2011)
• Thus, in theory, to mine a dataset of 1 million existing IDs, we need to test
• S = 1,000,000 / (500,000,000 / 2^32) ≈ 8,590,000 legal IDs
• So we generate 8,590,000 legal IDs at random and check whether each ID exists. If it does, we mine the information of that node; otherwise, we drop it.
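The arithmetic and the accept/reject loop above can be sketched as follows. The constants come from the slide; `exists` is a hypothetical callable standing in for the real ID lookup, and the function names are assumptions made for illustration.

```python
import random

ID_SPACE = 2 ** 32            # legal IDs (figure from the slide)
EXISTING_USERS = 500_000_000  # existing IDs in 2011 (figure from the slide)


def ids_to_test(target_existing):
    """Expected number of random legal IDs needed to find `target_existing` existing ones."""
    hit_rate = EXISTING_USERS / ID_SPACE
    return round(target_existing / hit_rate)


def uniform_sample(exists, n_trials, rng=random):
    """Draw `n_trials` random legal IDs and keep those that exist.

    `exists` is an assumed callable: ID -> bool.
    """
    sample = set()
    for _ in range(n_trials):
        candidate = rng.randrange(ID_SPACE)
        if exists(candidate):
            sample.add(candidate)   # keep the node
        # otherwise the ID is simply dropped
    return sample


print(ids_to_test(1_000_000))
```

`ids_to_test(1_000_000)` comes out near 8.59 million, in line with the figure on the slide.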


Data Collection Algorithm (5)

• Uniform Sampling
• The obvious advantage of uniform sampling is that the social network structure around the nodes does not affect the result.
• In the actual experiment, the resulting dataset is a little smaller than the BFS one, because some users hide themselves from random search.


Data Collection Algorithm(6)

•DataSet Description

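• The average clustering coefficient is the mean of the local clustering coefficients of all nodes Vi.
• The local clustering coefficient Ci of a node Vi is the ratio of the number of links among its neighbors to the number of all links that could possibly exist among them.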
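The local and average clustering coefficients described on this slide can be computed as in the sketch below. It assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbors (`adj` is a hypothetical name), and it assumes the common convention that a node with fewer than two neighbors has coefficient 0.

```python
from itertools import combinations


def local_clustering(adj, v):
    """C_v: links among v's neighbors divided by the number of possible links among them."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pair means coefficient 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0), average_clustering(adj))
```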


Detecting Communities

• LPA algorithm


Detecting Communities (2)

• LPA algorithm
  (1) Under specific conditions the algorithm may fail to converge. To avoid deadlocks and to guarantee efficient network clustering, we suggest an "asynchronous" update of the labels: each update considers the values of some neighbors from the previous iteration and of others from the current one.
  (2) About 5 iterations are sufficient to correctly classify 95% of the vertices of the network.
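The asynchronous label update can be illustrated with a short sketch. It is not the authors' implementation: the graph representation (`adj`, a dictionary of neighbor sets), the random visiting order, the tie-breaking rule, and the default of 5 sweeps (taken from point (2)) are all assumptions made here.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=5, seed=0):
    """Asynchronous label propagation on an undirected graph `adj` (node -> set of neighbors)."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}   # every vertex starts with its own label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)         # random visiting order for this sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue           # isolated vertex keeps its label
            # Asynchronous: neighbors visited earlier in this sweep already
            # carry their new labels, the others still carry the old ones.
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue           # current label is already a most frequent one
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                  # no label changed during a full sweep
    return labels


# Two disconnected triangles end up as two communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(label_propagation(adj))
```

Because labels are updated in place, a sweep never sees the fully synchronous state in which labels can oscillate between neighbors.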


Detecting Communities (3)

• LPA algorithm
  (3) A path connecting a pair of vertices in a group may pass through vertices belonging to different groups. We therefore devise a final step that splits the groups into one or more contiguous communities.
  (4) Near-linear cost.
  (5) Not stable in some cases.
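One way to realize the final splitting step in point (3) is to take connected components among vertices that share a label. The sketch below assumes that interpretation; `adj` (node -> set of neighbors) and `labels` (node -> label) are hypothetical inputs, and the slides do not say how the authors implemented the step.

```python
from collections import deque


def split_contiguous(adj, labels):
    """Relabel vertices so that every community is a connected set of same-label vertices."""
    community = {}
    next_id = 0
    for start in adj:
        if start in community:
            continue
        community[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                # Only walk along edges that stay inside the same label group.
                if u not in community and labels[u] == labels[start]:
                    community[u] = next_id
                    queue.append(u)
        next_id += 1
    return community


# Path 0-1-2 where the two end vertices share a label but are not adjacent.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(split_contiguous(adj, {0: "a", 1: "b", 2: "a"}))
```

In the example, vertices 0 and 2 carry the same label but are separated by vertex 1, so they end up in different communities. Each edge is examined a constant number of times, so the step is linear in the graph size.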


Detecting Communities (4)

• FNCA algorithm (Pre)



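Detecting Communities (5)

• FNCA algorithm

• Aij = 1 if and only if vertices i and j are connected to each other.
• δ(u, v) = 1 if and only if u = v.
• ki is the sum of Aij over all other vertices j (that is, the degree of vertex i).
• m is half the sum of the k values of all vertices (that is, the total number of edges in the graph).
• r(i) is the community to which i belongs.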
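The formula on this slide did not survive extraction. The symbols listed match the standard Newman-Girvan modularity, Q = (1/2m) Σij (Aij − ki·kj/(2m)) · δ(r(i), r(j)), so the sketch below assumes that this is the quantity meant. The graph representation (`adj`, node -> set of neighbors) and the function name are hypothetical.

```python
def modularity(adj, r):
    """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(r(i), r(j)).

    `adj` maps each vertex to the set of its neighbors (undirected, no self-loops);
    `r` maps each vertex to its community.
    """
    k = {i: len(adj[i]) for i in adj}   # k_i: degree of vertex i
    m = sum(k.values()) / 2             # m: total number of edges
    q = 0.0
    for i in adj:
        for j in adj:
            if r[i] != r[j]:
                continue                # delta(r(i), r(j)) = 0
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - k[i] * k[j] / (2 * m)
    return q / (2 * m)


# Two triangles joined by the single edge 2-3, each triangle as one community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
r = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(adj, r))
```

The double loop is quadratic in the number of vertices and is meant only to mirror the formula term by term; it is not suitable for graphs with millions of nodes.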


Detecting Communities(6)


Detecting Communities (7)

• FNCA algorithm
  (1) Experimental results show that, for most networks (even large-scale ones), the clustering solution of FNCA is good enough before the iteration number reaches 50.
  (2) Generally speaking, the community structure of a network is evident when its Q-value is greater than 0.3.
  (3) The time complexity of the FNCA algorithm cannot be worse than O(T * n * k * c).


Detecting Communities (8)

• Experimentation Result


Detecting Communities (9)

• Experimentation Result


Experimentation

• Community structure similarity


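Experimentation (2)

• Community structure similarity

• Rough method:

• Improved method:

• M11 is the total number of elements shared by v and w (their intersection), M01 stands for w − v, and M10 stands for v − w. The value of J equals 1 if and only if v = w.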
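The formulas for the two methods were not preserved in this transcript. The quantities M11, M01 and M10 are those of the Jaccard coefficient, J = M11 / (M01 + M10 + M11), so the sketch below assumes that J is this coefficient applied to two communities v and w given as sets of vertices. The handling of two empty sets is also an assumption.

```python
def jaccard(v, w):
    """Jaccard coefficient J = M11 / (M01 + M10 + M11) between two vertex sets."""
    v, w = set(v), set(w)
    m11 = len(v & w)   # elements shared by v and w
    m10 = len(v - w)   # elements only in v
    m01 = len(w - v)   # elements only in w
    total = m01 + m10 + m11
    return m11 / total if total else 1.0  # two empty sets are treated as identical


print(jaccard({1, 2, 3}, {2, 3, 4}))
```

Two communities with no common vertex score 0, and identical communities score 1, which matches the remark that J equals 1 only when v = w.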


Experimentation (3)

• Experimental results


Experimentation (4)

• Out-of-scale community

• This may be due to shortcomings of the algorithms, or such communities may really exist. Either way, it will be studied in future work.


Thank you!