Size matters: 1) Cluster structure of large networks 2) Searching the world’s social network

Jure Leskovec ([email protected])
Computer Science Department, Cornell University / Stanford University

Joint work with: Eric Horvitz, Michael Mahoney, Kevin Lang, Anirban Dasgupta


Rich data: Networks

Large on-line computing applications have detailed records of human activity:
▪ On-line communities: Facebook (120 million)
▪ Communication: Instant Messenger (~1 billion)
▪ News and social media: Blogging (250 million)

We model the data as a network (an interaction graph)

Can observe and study phenomena at scales not possible before

[Figure: Communication network]

Outline

The Small-world experiment:
▪ On a 240 million node communication network of Microsoft Instant Messenger

Small vs. large networks:
▪ Modeling community (cluster) structure of large networks

[Figures: Zachary’s karate club (N=34); Tiny part of a large social network]

How expressed are communities?

How community-like is a set of nodes?

Idea: Use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure.

Conductance (normalized cut)

Φ(S) = # edges cut / # edges inside

Small Φ(S) corresponds to more community-like sets of nodes

[Figure: a node set S and its complement S’]
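The score on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph given as an edge list; the function name `community_score` and the toy graph are hypothetical and do not come from the talk. It follows the slide's simplified ratio literally (edges cut over edges inside); the textbook conductance instead divides the cut by the total degree (volume) of the smaller side.

```python
def community_score(edges, nodes):
    """Slide-style score: edges leaving the set divided by edges inside it.

    Lower values mean a more community-like set.
    """
    members = set(nodes)
    cut = 0     # edges with exactly one endpoint in the set
    inside = 0  # edges with both endpoints in the set
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            inside += 1
        elif u_in or v_in:
            cut += 1
    if inside == 0:
        # Assumption: a set with no internal edges gets the worst score.
        return float("inf")
    return cut / inside


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

triangle_score = community_score(TOY_EDGES, {0, 1, 2})  # 1 cut edge, 3 inside
pair_score = community_score(TOY_EDGES, {0, 1})         # 2 cut edges, 1 inside
```

On the toy graph a whole triangle scores better (lower) than a pair of its nodes, which matches the intuition that the triangle is the more community-like set.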

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

▪ Bad community: Φ = 5/6 = 0.83

What is the “best” community of 5 nodes?

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

▪ Bad community: Φ = 5/7 ≈ 0.7
▪ Better community: Φ = 2/5 = 0.4

What is the “best” community of 5 nodes?

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

▪ Bad community: Φ = 5/7 ≈ 0.7
▪ Better community: Φ = 2/5 = 0.4
▪ Best community: Φ = 2/8 = 0.25

What is the “best” community of 5 nodes?

Network Community Profile Plot

We define: Network community profile (NCP) plot

Plot the score of the best community of size k

[Figure: NCP plot; x-axis: community size, log k; y-axis: log Φ(k); marked points: Φ(5) = 0.25 at k = 5 and Φ(7) = 0.18 at k = 7]
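To make the NCP definition concrete, here is a brute-force sketch for tiny graphs. It is an illustration only: it enumerates every node subset of size k, which is exponential and clearly not what is used on large networks (the talk relies on approximation algorithms for graph partitioning). The names `community_score` and `ncp_brute_force` and the toy graph are hypothetical.

```python
from itertools import combinations


def community_score(edges, members):
    """Slide-style score: (# edges cut) / (# edges inside); inf if none inside."""
    cut = inside = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            inside += 1
        elif u_in or v_in:
            cut += 1
    return cut / inside if inside else float("inf")


def ncp_brute_force(edges, max_size):
    """Return {k: best (lowest) score over all node sets of size k}."""
    nodes = sorted({n for edge in edges for n in edge})
    profile = {}
    for k in range(1, max_size + 1):
        profile[k] = min(
            community_score(edges, set(subset))
            for subset in combinations(nodes, k)
        )
    return profile


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
profile = ncp_brute_force(TOY_EDGES, 3)
```

Plotting `profile` on log-log axes gives the NCP plot for the toy graph; the best set of size 3 is one of the triangles.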

NCP plot: Low-dimensional and random graphs

[Figure panels: d-dimensional meshes; Hierarchically nested clusters]

NCP plot: Zachary’s karate club

▪ Zachary’s university karate club social network
▪ During the study the club split into 2
▪ The split (squares vs. circles) corresponds to cut B

NCP plot: Network Science

Collaborations between scientists in Networks [Newman, 2005]

Present work: Large networks

Previous work mostly focused on community structure of small networks (~100 nodes)

We examined 108 different large networks

Example of a large network

Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

More NCP plots of networks

NCP: LiveJournal (N=5M, E=42M)

[Figure: NCP plot; x-axis: k (community size); y-axis: Φ(k) (conductance). Annotations: Better and better communities; Best community has ~100 nodes; Communities get worse and worse]

Explanation: Downward part

Small clusters on the edge of the network are responsible for the downward part of the NCP plot

[Figure: NCP plot; annotation: Best cluster]

Explanation: Upward part

Each additional edge inside the cluster costs more:
▪ Φ = 1/3 = 0.33
▪ Φ = 2/4 = 0.5
▪ Φ = 8/6 ≈ 1.3
▪ Φ = 64/14 ≈ 4.6

Each node has twice as many children

[Figure: NCP plot]

Suggested network structure

Network structure: Core-periphery (jellyfish, octopus)

▪ Whiskers are responsible for good communities
▪ Denser and denser core of the network
▪ Core contains ~60% of nodes and ~80% of edges

What is a good model?

What is a good model that explains such network structure?

NCP plot shape under common models:
▪ Pref. attachment: Flat
▪ Small World: Down and Flat
▪ Geometric Pref. Attachment: Flat and Down

Forest Fire model works

Forest Fire [LKF05]: connections spread like a fire
▪ New node joins the network
▪ Selects a seed node
▪ Connects to some of its neighbors
▪ Continue recursively

Notes:
• Preferential attachment flavor - second neighbor is not uniform at random.
• Copying flavor - since we burn the seed’s neighbors.
• Hierarchical flavor - seed is parent.
• “Local” flavor - burn “near” -- in a diffusion sense -- the seed vertex.

As a community grows, it blends into the core of the network
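The four steps above can be sketched as a small generator. This is a simplified illustration and not the exact model of [LKF05]: here every unburned neighbor of a burning node catches fire independently with probability `burn_prob`, the graph is undirected, and the new node links to everything that burned. The function name and both parameters are assumptions made for this sketch.

```python
import random


def forest_fire_graph(n_nodes, burn_prob=0.35, seed=0):
    """Grow an undirected graph node by node with a simplified forest fire."""
    rng = random.Random(seed)
    adj = {0: set()}  # start from a single isolated node
    for new in range(1, n_nodes):
        # Step 1-2: the new node picks a seed node uniformly at random.
        seed_node = rng.randrange(new)
        burned = {seed_node}
        frontier = [seed_node]
        # Step 3-4: the fire spreads to some neighbors, recursively.
        while frontier:
            current = frontier.pop()
            for neighbor in adj[current]:
                if neighbor not in burned and rng.random() < burn_prob:
                    burned.add(neighbor)
                    frontier.append(neighbor)
        # The new node connects to every burned node (seed included).
        adj[new] = set()
        for target in burned:
            adj[new].add(target)
            adj[target].add(new)
    return adj


graph = forest_fire_graph(200)
```

Because the fire only reaches nodes near the seed, new links stay local, which is the "local flavor" noted on the slide; raising `burn_prob` lets fires spread further and makes the graph denser.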

Forest Fire NCP plot

[Figure: NCP plot; curve label: rewired network]

Typical cluster size

How does the size of the best cluster scale with the size of the network?

Size of best cluster over time

Cluster size remains constant (even if one allows nesting) over time

[Figure: LinkedIn network over time]

Cluster size vs. network size

Each dot is a different network

Connections

The Dunbar number
▪ 150 individuals is maximum community size

What edges “mean” and community identification
▪ Using node and edge types/attributes

Implications for machine learning
▪ No large clusters
▪ No/little (assortative) hierarchical structure
▪ Can’t be well embedded – no underlying geometry

The small-world of the MSN Instant Messenger

Joint work with Eric Horvitz, Microsoft Research

The Small-world experiment

Milgram’s small world experiment [Milgram ’67, Dodds-Muhamad-Watts ’03]
▪ People send letters from Nebraska to Boston

How many steps does it take?
▪ 6.2 on the average, thus “6 degrees of separation”

The Small-world experiment

1) Short paths exist in a social network
2) People are able to find them (using only partial knowledge of the network)

Local search: forwarding a message

[Figure: source s and target t with d(s,t) = h; good nodes: d = h-1; bad nodes: d ≥ h]
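The good/bad distinction in the figure can be computed directly when the whole graph is known. The sketch below is an illustration under assumptions: the graph is a small connected undirected adjacency dict, and the names `bfs_distances` and `split_neighbors` are hypothetical. A neighbor of s is "good" if it is one hop closer to the target than s is.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def split_neighbors(adj, s, t):
    """Split the neighbors of s into good (d = h-1) and bad (d >= h) nodes."""
    dist_to_t = bfs_distances(adj, t)
    h = dist_to_t[s]  # assumes s can reach t
    good = sorted(v for v in adj[s] if dist_to_t.get(v, float("inf")) == h - 1)
    bad = sorted(v for v in adj[s] if v not in good)
    return good, bad


# Hypothetical toy graph: only neighbor 1 of node 0 leads toward node 5.
TOY = {0: [1, 2, 3], 1: [0, 4], 2: [0, 3], 3: [0, 2], 4: [1, 5], 5: [4]}
good, bad = split_neighbors(TOY, 0, 5)
random_success = len(good) / (len(good) + len(bad))
```

Here `random_success` is the chance that forwarding to a uniformly random neighbor moves the message one hop closer, which is the baseline a learned forwarding rule has to beat.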

Our dataset: Instant Messaging

[Figures: Contact (buddy) list; Messaging window]

MSN communication

We collected the data for June 2006

4.5 TB of compressed data:
▪ 245 million users logged in
▪ 180 million users engaged in conversations
▪ 255 billion exchanged messages
▪ 1 billion conversations / day

MSN network

The network: 180M nodes, 1.3B undirected edges

MSN: path lengths

MSN Messenger network: number of steps between pairs of people

Avg. path length 6.6
90% of the people can be reached in < 8 hops

Hops  Nodes
0     1
1     10
2     78
3     3,96
4     8,648
5     3,299,252
6     28,395,849
7     79,059,497
8     52,995,778
9     10,321,008
10    1,955,007
11    518,410
12    149,945
13    44,616
14    13,740
15    4,476
16    1,542
17    536
18    167
19    71
20    29
21    16
22    10
23    3
24    2
25    3
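A table of this shape is what a breadth-first search from one node produces: the number of nodes first reached at each hop count. The sketch below shows the computation on a toy graph; it is an illustration, not the pipeline used for the MSN data, and the function names are hypothetical. The mean here includes the source itself at hop 0, mirroring the first row of the table.

```python
from collections import Counter, deque


def hop_histogram(adj, source):
    """Count how many nodes sit at each hop distance from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return Counter(dist.values())


def mean_hops(hist):
    """Average hop distance over all reached nodes (source counted at 0)."""
    total = sum(hist.values())
    return sum(hops * count for hops, count in hist.items()) / total


def fraction_within(hist, max_hops):
    """Fraction of reached nodes that are at most `max_hops` away."""
    total = sum(hist.values())
    return sum(c for hops, c in hist.items() if hops <= max_hops) / total


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4.
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hist = hop_histogram(PATH, 0)
```

Averaging such histograms over many start nodes gives an estimate of the average path length, and `fraction_within` gives statements of the form "x% of people can be reached within k hops".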

Degree distribution:

[Figure annotation: A node that exchanged messages with ~2 million people]

Robustness of shortest paths

Short paths exist and they are robust

[Figure curves: Randomized network (same degree distr.); All links; Both way links]
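The slide does not say how the randomized network was produced. One common way to randomize a graph while keeping every node's degree is the double edge swap, sketched below as an assumption rather than as the method used in the talk. The function name, parameters, and the ring example are hypothetical; the input is assumed to be a simple undirected graph with at least two edges.

```python
import random


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double edge swaps.

    Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    so every node keeps exactly the degree it started with.
    """
    rng = random.Random(seed)
    edges = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical example: a ring of 10 nodes, every node has degree 2.
RING = [(i, (i + 1) % 10) for i in range(10)]
rewired = rewire_preserving_degrees(RING, 20)
```

Comparing path lengths on the original and on the rewired graph separates what the degree distribution alone explains from what depends on the actual wiring.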

Learning to search in a network

What is the decision function that makes me forward the message to the target?

[Figure: source s and target t with d(s,t) = h; good nodes: d = h-1; bad nodes: d ≥ h]

What are the characteristics of shortest paths?
How hard is it to find them?

Does geography help?

[Figure: source s and target t]

How hard is it to find a good node?

Probability of success if we forward to a random neighbor

[Figure: source s and target t]

Algorithm accuracy at hops

Use a decision tree to learn a classifier:
▪ Model: 0.4128
▪ Random: 0.0207

[Figure: source s and target t]

The learned model

Green bar is prob. that node is good

Comparing search heuristics

Pick a pair of nodes: start at s
Walk until we hit the target t, where the next node is chosen by one of these heuristics:

Search alg.   % found   Mean path length
Random        0.0008    3,709
MinGeoDist    0.0282    778
MaxDeg        0.0158    4,964
Deg/Geo2      0.1446    2,676
Cntry         0.0108    402
Cntry*Deg     0.1313    3,114
Lang          0.0055    1,699
Lang*Deg      0.0496    3,163
Age           0.0012    2,890
Age*Deg       0.0203    5,324

It works! (in a network with 180 million nodes)

-- Milgram’s path completion is 29%
-- Dodds, Muhamad, Watts: 0.015% completion
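The walk described on this slide can be sketched as a loop with a pluggable rule for choosing the next node. The code below is an illustration under assumptions: the function names, the step limit, and the rule that a walker hands the message straight to the target when the target is a direct neighbor are all hypothetical, and only two of the heuristics (random and maximum degree) are shown.

```python
import random


def walk_search(adj, s, t, choose, max_steps=1000):
    """Walk from s until t is hit; return the number of steps, or None."""
    current, steps = s, 0
    while current != t and steps < max_steps:
        current = choose(adj, current, t)
        steps += 1
    return steps if current == t else None


def make_random_chooser(seed=0):
    """Heuristic 'Random': forward to a uniformly random neighbor."""
    rng = random.Random(seed)

    def choose(adj, current, t):
        return rng.choice(adj[current])

    return choose


def max_degree_chooser(adj, current, t):
    """Heuristic 'MaxDeg': forward to the neighbor with the most contacts.

    Assumption: if the target is a direct neighbor, go there immediately.
    """
    if t in adj[current]:
        return t
    return max(adj[current], key=lambda v: len(adj[v]))


# Hypothetical toy graph: a star with hub 0 and leaves 1..4.
STAR = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
steps_maxdeg = walk_search(STAR, 1, 3, max_degree_chooser)
steps_random = walk_search(STAR, 1, 3, make_random_chooser(seed=0))
```

Running many such walks over random (s, t) pairs and recording the fraction that reach t, together with the mean number of steps, yields a comparison table of the kind shown above; the other heuristics would need node attributes such as location, country, language, or age.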

Conclusions and reflections

Why are networks the way they are?

Only recently have basic properties been observed on a large scale
▪ Confirms social science intuitions; calls others into question

Benefits of working with large data
▪ Observe structures not visible at smaller scales