107
School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th , 2006

School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Embed Size (px)

Citation preview

Page 1: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

School of InformationUniversity of Michigan

Search in networks

Lada Adamic (U. Michigan)

NetSci Workshop

May 16th, 2006

Page 2: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Outline

Search in structured networks small world experiments geographical models hierarchical models studies:

HP labs email network (simulated) Club Nexus online community (simulated) Phone interview company (survey) LiveJournal (simulated)

Search in unstructured networks power law networks Erdos-Renyi networks P2P networks (Gnutella example)

Page 3: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Search in structured networks

Page 4: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

NE

MA

Milgram’s experiment (1960’s):

Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.

Small world experiments then

Page 5: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Milgram’s small world experiment

Target person worked in Boston as a stockbroker. 296 senders from Boston and Omaha. 20% of senders reached target. Typical strategy – if far from target choose someone

geographically closer, if close to target geographically, choose someone professionally closer

average chain length = 6.5

“Six degrees of separation”

Page 6: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Small world experiments now

email experiment Dodds, Muhamad, Watts, Science 301, (2003)

18 targets13 different countries

24,163 message chains 384 reached their targetsaverage path length 4.0

image by Stephen G. Eick http://www.bell-labs.com/user/eick/index.html(unrelated to small world experiment…)

Page 7: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Small world experiment at Columbia

Successful chains disproportionately used• weak ties (Granovetter)• professional ties (34% vs. 13%)• ties originating at work/college• target's work (65% vs. 40%)

. . . and disproportionately avoided• hubs (8% vs. 1%) (+ no evidence of funnels)• family/friendship ties (60% vs. 83%)

Strategy: Geography -> Work

Page 8: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Why study small world phenomena?

Curiosity:Why is the world small?How are people able to route messages?

Social Networking as a Business:Friendster, Orkut, MySpace,FaceBookLinkedIn, Spoke, VisiblePath

Page 9: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Six degrees of separation - to be expected

Pool and Kochen (1978) - average person has 500-1500 acquaintances

Ignoring clustering, other redundancy …

~ 103 first neighbors, 106 second neighbors, 109 third neighbors

But networks are clustered: my friends’ friends tend to be my friends

Watts & Strogatz (1998) - a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph

Page 10: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Is this the whole picture?

Why are small worlds navigable?

Page 11: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

How to choose among hundreds of acquaintances?

Strategy:Simple greedy algorithm - each participant chooses correspondentwho is closest to target with respect to the given property

Models

geographyKleinberg (2000)

hierarchical groupsWatts, Dodds, Newman (2001), Kleinberg(2001)

high degree nodesAdamic, Puniyani, Lukose, Huberman (2001), Newman(2003)

How are people are able to find short paths?

Page 12: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Reverse small world experiment

Killworth & Bernard (1978): Given hypothetical targets (name, occupation, location, hobbies, religion…)

participants choose an acquaintance for each target Acquaintance chosen based on (most often) occupation, geography only 7% because they “know a lot of people” Simple greedy algorithm: most similar acquaintance two-step strategy rare

Page 13: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

nodes are placed on a lattice andconnect to nearest neighbors

additional links placed with puv~

Spatial search

“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”

S.Milgram ‘The small world problem’, Psychology Today 1,61,1967

ruvd

Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’Proc. 32nd ACM Symposium on Theory of Computing, 2000.(Nature 2000)

Page 14: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

When r=0, links are randomly distributed, ASP ~ log(n), n size of grid

When r=0, any decentralized algorithm is at least a0n2/3

no locality

When r<2, expected time at least rn(2-r)/3

0~p p

Page 15: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Overly localized links on a latticeWhen r>2 expected search time ~ N(r-2)/(r-1)

4

1~p

d

Page 16: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Links balanced between long and short range

When r=2, expected time of a DA is at most C (log N)2

2

1~p

d

Page 17: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Hierarchical social network models

Page 18: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’NIPS 14, 2001

Hierarchical network models:

Individuals classified into a hierarchy, hij = height of the least common ancestor.

Theorem: If = 1 and outdegree is polylogarithmic, can s ~ O(log n)

Group structure models:Individuals belong to nested groupsq = size of smallest group that v,w belong to

f(q) ~ q-

Theorem: If = 1 and outdegree is polylogarithmic, can s ~ O(log n)

ijhijp b

h b=3

e.g. state-county-city-neighborhoodindustry-corporation-division-group~

Page 19: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Sketch of proof

T

S

R|R|<|R’|<|R|

k = c log2n calculate probability that s fails to have a link in R’

R’

Page 20: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Identity and search in social networksWatts, Dodds, Newman (Science,2001)

individuals belong to hierarchically nested groups

multiple independent hierarchies h=1,2,..,H coexist corresponding to occupation, geography, hobbies, religion…

pij ~ exp(- x)

Page 21: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006
Page 22: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Identity and search in social networksWatts, Dodds, Newman (2001)

Message chains fail at each node with probability pNetwork is ‘searchable’ if a fraction r of messages reach the target

N=102400

N=409600

N=204800

(1 )L

Lq p r

Page 23: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Small World Model, Watts et al.

Fits Milgram’s data well

Model parameters:N = 108

z = 300 g = 100b = 10= 1, H = 2

Lmodel= 6.7Ldata = 6.5

http://www.aladdin.cs.cmu.edu/workshops/wsa/papers/dodds-2004-04-10search.pdf

more slides on this:

Page 24: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Mary

Bob

Jane

Who couldintroduce me toRichard Gere?

High degree search

Adamic et al. Phys. Rev. E, 64 46135 (2001)

Page 25: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Small world experiments so far

Classic small world experiment:

Given a target individual, forward to one of your acquaintances

Observe chains but not the rest of the social network

Reverse small world experiment (Killworth & Bernard)

Given a hypothetical individual,

which of your acquaintances would you choose

Observe individual’s social network and possible choices,

but not resulting chains or complete social network

Page 26: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Use a well defined network:HP Labs email correspondence over 3.5 months

Edges are between individuals who sent at least 6 email messages each way

450 usersmedian degree = 10, mean degree = 13average shortest path = 3

Node properties specified:degreegeographical locationposition in organizational hierarchy

Can greedy strategies work?

Testing search models on social networksadvantage: have access to entire communication networkand to individual’s attributes

Page 27: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

100

101

102

103

104

10-8

10-6

10-4

10-2

100

outdegree

freq

uenc

y

outdegree distribution = 2.0 fit

Power-law degree distribution of all senders of email passing through HP labs

Strategy 1: High degree search

number of recipients sender has sent email to

pro

po

rtio

n o

f se

nd

ers

Page 28: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Filtered network (at least 6 messages sent each way)

0 20 40 60 800

5

10

15

20

25

30

35

number of email correspondents, k

p(k

)

0 20 40 60 8010

-4

10-2

100

k

p(k

)

Degree distribution no longer power-law, but Poisson

It would take 40 steps on average (median of 16) to reach a target!

Page 29: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Strategy 2:Geography

Page 30: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

1U

2L 3L

3U

2U

4U

1L

87 % of the4000 links arebetween individualson the same floor

Communication across corporate geography

Page 31: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Cubicle distance vs. probability of being linked

102

103

10-3

10-2

10-1

100

distance in feet

pro

po

rtio

n o

f lin

ked

pa

irs

measured1/r

1/r2

optimum for search

Page 32: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Strategy 3: Organizational hierarchy

Page 33: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Email correspondence superimposed on the organizational hierarchy

Page 34: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Example of search path

distance 1

distance 1

distance 2

hierarchical distance = 5search path distance = 4

distance 1

Page 35: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Probability of linking vs. distance in hierarchy

in the ‘searchable’ regime: 0 < < 2 (Watts, Dodds, Newman 2001)

2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6p

rob

ab

ility

of l

inki

ng

hierarchical distance h

observedfit exp(-0.92*h)

Page 36: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Results

0 5 10 15 20 250

1

2

3

4

5x 10

4

number of steps in search

nu

mb

er

of

pa

irs

distance hierarchy geography geodesic org random

median 4 7 3 6 28

mean 5.7 (4.7) 12 3.1 6.1 57.4

0 2 4 6 8 10 12 14 16 18 200

2000

4000

6000

8000

10000

12000

14000

16000

number of steps

nu

mb

er

of p

airs

hierarchy geography

Page 37: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Expt 2

Searchinga socialnetworkingwebsite

Page 38: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006
Page 39: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Profiles:

status (UG or G) yearmajor or departmentresidencegender

Personality (choose 3 exactly):you funny, kind, weird, …friendship honesty/trust, common interests, commitment, …romance - “ -freetime socializing, getting outside, reading, …support unconditional accepters, comic-relief givers, eternal optimists

Interests (choose as many as apply)books mystery & thriller, science fiction, romance, …movies western, biography, horror, …music folk, jazz, techno, …social activities ballroom dancing, barbecuing, bar-hopping, …land sports soccer, tennis, golf, …water sports sailing, kayaking, swimming, …other sports ski diving, weightlifting, billiards, …

Page 40: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Differences between data sets

• complete image of communication network

• affinity not reflected

• partial information of social network

• only friends listed

HP labs email network Online community

Page 41: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

0 20 40 60 80 1000

50

100

150

200

250

number of links

nu

mb

er

of u

sers

with

so

ma

ny

links

100

101

102

100

101

102

number of links

num

ber

of u

sers

Degree Distribution for Nexus Net 2469 users, average degree 8.2

Page 42: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Problem: how to construct hierarchies?

0 1 2 30

0.002

0.004

0.006

0.008

0.01

0.012

0.014

separation in years

pro

b. t

wo

un

de

rgra

ds

are

frie

nd

s

data

(x+1)-1.1 fit

0 1 2 3 4 50

0.005

0.01

0.015

0.02

separation in years

prob

. tw

o gr

ads

are

frie

nds data

(x+1)-1.7 fit

Probability of linking by separation in years

Page 43: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Hierarchies not useful for other attributes:

0 100 200 300 400 500 6000

0.01

0.02

0.03

0.04

0.05

0.06

distance between residences

pro

ba

bili

ty o

f be

ing

frie

nd

s

Geography

Other attributes: major, sports, freetime activities, movie preferences…

Page 44: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Strategy using user profiles

prob. two undergrads are friends (consider simultaneously)

• both undergraduate, both graduate, or one of each

• same or different year

• both male, both female, or one of each

• same or different residences

• same or different major/department

Results

random 133 390high degree 39 137profile 21 53

strategy median mean

With an attrition rate of 25%, 5% of the messages get through atan average of 4.8 steps,=> hence network is barely searchable

Page 45: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

The accuracy of small world chains in social networksPeter D. Killworth, Christopher McCarty, H. Russell Bernard, Mark House

Social Networks, 2006 First parallel study of

individuals choices vs. actual shortest paths

Network 105 members of an

interviewing bureau 10,920 shortest path

connections who knows whom who a person would select as

the next link in a chain to a particular person x recent hire

old timer

worked a while

Page 46: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Accuracy of small world chains

Shortest paths use the network of who-knows whom to calculate actual shortest

paths compare to paths formed by individuals’ choices

21.7% fail through reaching missing data 23.7% reach cycles : i chooses j, j chooses i 54.6% reach the target, with chains that are 40% longer on average

than the shortest path

Page 47: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Next choice accuracy and a Markov model

48% of the time, a person chooses a contact who is closer to the target over half of the choices are wrong!

Markov model: terminate chain with probability (attrition) choose someone closer to the target with probability p,

otherwise choose someone at same distance

Page 48: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

LiveJournal

LiveJournal provides an API to crawl the friendship network + profiles friendly to researchers great research opportunity

basic statistics Users

How many users, and how many of those are active? Total accounts: 9980558 ... active in some way: 1979716 ... that have ever updated: 6755023 ... updating in last 30 days: 1300312 ... updating in last 7 days: 751301 ... updating in past 24 hours: 216581

Page 49: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Predominantly female & young demographic

Male: 1370813 (32.4%) Female: 2856360 (67.6%) Unspecified: 1575389

13 18483

14 87505    

15 211445          

16 343922               

17 400947                  

18 414601                  

19 405472                  

20 371789                

21 303076             

22 239255           

23 194379         

24 152569       

25 127121      

26 98900     

27 73392    

28 59188   

29 48666  

Age distribution

Page 50: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Geographic Routing in Social Networks

David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins (PNAS 2005)

data used Feb. 2004 500,000 LiveJournal users with US locations giant component (77.6%) of the network clustering coefficient: 0.2

Page 51: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Degree distributions

The broad degree distributions we’ve learned to know and love but more probably lognormal than power law

broader in degree than outdegree distribution

Page 52: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Results of a simple greedy geographical algorithm

Choose source s and target t randomly Try to reach target’s city – not target itself At each step, the message is forwarded from the current message holder u

to the friend v of u geographically closest to t

stop if d(v,t) > d(u,t)

13% of the chains are completed

stop if d(v,t) > d(u,t)

pick a neighbor at random in the same city if possible, else stop

80% of the chains are completed

Page 53: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

the geographic basis of friendship

= d(u,v) the distance between pairs of people The probability that two people are friends given their

distance is equal to P() = + f(), is a constant independent of geography is 5.0 x 10-6 for LiveJournal users who are very far apart

Page 54: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

the geographic basis of friendship

The average user will have ~ 2.5 non-geographic friends The other friends (5.5 on average) are distributed according to an

approximate 1/distance relationship But 1/d was proved not to be navigable by Kleinberg, so what

gives?

Page 55: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Navigability in networks of variable geographical density

Kleinberg assumed a uniformly populated 2D lattice But population is far from uniform population networks and rank-based friendship

probability of knowing a person depends not on absolute distance but on relative distance (i.e. how many people live closer) Pr[u ->v] ~ 1/ranku(v)

Page 56: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Structured search Conclusions

Individuals associate on different levels into groups.

Individuals tend to know others who are ‘close by’

Group structure facilitates decentralized search using social ties.Hierarchy search faster than geographical search

Simple strategies are not perfect – but short (rather than shortest) chains can be found

Page 57: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Weighted shortest paths

Routes shortest route from Chicago to Boston

vertex: intersection edge weights: road distances alternative weights: expected time traveled, gas consumed… usually sum the weights from each segment

start

finish

freeway, 65 mph

40 miles/65 mph ~ 37 minutes

freeway, 70 mph

30 miles/70 mph

~ 26 minutes

surface road

25 mph, 50 miles

2 hours

Page 58: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Reliable paths through social networks

The probability of transmitting a message or infectious agent could be related to the strength of the tie e.g. rather than summing the weights, we might multiply the

probabilities of getting through

p = 0.5p = 0.5

p = 0.001 p = 1

Probability of getting an idea through to the head of labs

via CEO (0.001*1 = 0.001), via direct manager (0.5*0.5 = 0.25)

p = 0.05

Page 59: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

MotivationPower-law (PL) networks, social and P2P

Analysis of scaling of search strategies in PL networks

Simulationartificial power-law topologies, real Gnutella networks

Comparison with existing P2P search strategiesReflector, Morpheus

Directed SearchFreenet

Search in random networks

2

Page 60: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Mary

Bob

Jane

Who couldintroduce me toRichard Gere?

How do we search?

Page 61: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

AT&T Call Graph

Aiello et al. STOC ‘00

# o

f te

lep

hon

e n

um

be

rsfr

om

wh

ich

ca

lls w

ere

ma

de

# of telephone numbers called

Page 62: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

100

101

100

101

102

number of neighbors

pro

po

rtio

n o

f n

od

es

datapower-law fit = 2.07

Gnutella network

power-law link distribution

summer 2000,data provided by Clip2

Page 63: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Preferential attachment model

Nodes join at different times

The more connections a node has, the more likely it is to acquirenew connections

Growth process produces power-law network

host cache

pingping

ping

ping

ping

ping ping

Page 64: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

file sharing w/o a central index

queries broadcast to every node within radius ttl as network grows, encounter a bandwidth barrier (dial up modems cannot keep up with query traffic, fragmenting the network)

Gnutella and the bandwidth barrier

Clip 2 reportGnutella: To the Bandwidth Barrier and Beyondhttp://www.clip2.com/gnutella.html#q17

Page 65: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

1

6

54

63

67

2

94

number ofnodes found

power-law graph

Page 66: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

93

number ofnodes found

13

711

1519

Poisson graph

Page 67: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Search with knowledge of 2nd neighbors

Page 68: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Outline of search strategy

pass query onto only one neighbor at each step

requires that nodes sign query- avoid passing message onto a node twice

requires knowledge of one’s neighbors degree- pass to the highest degree node

requires knowledge of one’s neighbors neighbors- route to 2nd degree neighbors

OPTIONS

Page 69: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Generating functions

M.E.J. Newman, S.H. Strogatz, and D.J. Watts ‘Random graphs with arbitrary degree distributions and their

applications’, PRE, cond-mat/0007235

Generating functions for degree distributions

Useful for computing moments of degree distribution, component sizes, and average path lengths

00

( ) kk

k

G x p x

Page 70: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Fun with generating functions

normalization condition: probabilities sum to 1

1)1(0

0

kkpG

derivatives: the generating function contains all the information of the degree distribution

00

!

1

xk

k

k x

G

kp

Page 71: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Fun with generating functions (cont’d)

Expected degree of a randomly chosen vertex

'0(1)k

k

k kp G

Higher moments of degree distribution

1

0 )(

x

n

kk

nn xGdx

dxpkk

Page 72: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Example: Poisson distribution

Let p = z/N be the probability of an edge existing between two vertices (z is the average degree)

kkNkN

k

xppk

NxG

)1()(

00

)1(~)1( xzN epxp for large N

)1(0 )(' xzzexG

zG )1('0

zkxk

k

k ezkx

G

kp

!

1

!

10

0 just the regular Poisson

distribution

Page 73: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Introducing cutoffs

max 1k N a node cannot have more connections than there are other nodes

This is important for exponents close to 2

1 1

11kp C

x 2 2

6C

1000

( 1000, 2) ~ 0.001kp k p

Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:

1000(1 ( 1000, 2)) ~ 0.36p k

without a cutoff, for = 2have > 50% chance of observing a node with more neighbors than there are nodes

for = 2.1, have a 25% chance

Page 74: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

# of sites linking to the site

prop

ortio

n of

site

s w

/ so

man

y lin

ks

1000

Selecting from a variety of cutoffs

Nk max1.

2. /k

k eCkp Newman et al.

3.

0

Ckpk

1CNk

otherwise

Generating Function

kCN

k

xkCxG

1

10

Aiello et al.

1 million websites (~ 1997)

N

Page 75: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff

1

1

* 1

~

kN p

Ck N

k N

cutoff where expectednumber of nodes of degreek is 1

k

n(k)

k

n(k)

1

1

cutoff so thatexpected number of nodesof degree > k is 1

max

max

1

1 1max

1

1max

* 1

~

~

~

kk k

k k

N p

ck N

k N

k N

Page 76: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

The imposed cutoff can have a dramaticeffect on the properties of the graph

degrees drawn at random, for = 2, and N = 1000

Page 77: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

00

( ) kk

k

G x p x

is the probability that a randomlychosen vertex has degree k

~kp k

is a generating function

'0(1)k

k

k kp G is the expected degree of a randomlychosen vertex

'0

1 '0 1

G xG x

G is the distribution of remaining

outgoing edges following and edge

assuming neighbors don’t share edges

11 '1

'02 GGz is the expected number of second

degree neighbors

22

2

2 2

2

1

11

Generating functions for degree distributions Random graphs with arbitrary degree distributions and their applications by Newman, Strogatz & Watts

Page 78: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

search with knowledge of first neighbors

max

max

maxmax

max

max

01

' 1 10 0

1

' 1 1 20 max

1 1

'' 1 101 ' '

10 0

1 2'

20

3 2' max1 '

0

( )

( ) ( )

1(1) 1

2

( )( )

(1) (1)

( 1)(1)

( 2) 21(1)

(1)

kk

kk

kk

kk

kk

G x c k x

G x G x c k xx

G k c k k dk k

G x cG x k x

G G x

ck k x

G

kG

G

2max( 1) (3 )

( 2)(3 )

k

Generating function with cutoff

Average degree of vertex

constant in Nfor 2<<3, and kmax~Na, decreaseswith N

Average number of neighborsfollowing an edge

~

Page 79: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

search with knowledge of first neighbors (cont’d)

3 3' 3max max

1 1 max' 20 max

1 2(1)

(1) (3 ) 1 (3 )B

k kz G k

G k

In the limit t->2, ' max1

max

(1)log( )

kG

k

Let’s for the moment ignore the fact that as we do a random walk, we encounter neighborsthat we’ve seen before

s = number of steps =1B

N

z

~ ~

~

Page 80: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Search time with different cutoffs

0.18(2.1)s N

3 3m

2

ax

( ) ,2 3N N

sk N

NIf kmax = N,

max

max

log(log( )

, 2)N k

s Nk

2

21

33max 1

( ) ,2 3NN N

sk

N

If kmax = N1/(-1),

max

max

log( )(2) log( )

N ks

kN

0.1(2.1) Ns

~

~

~

~

~

~

grow from 1,000 to 1,000,000 nodes,

search time increases by a factor of ~2

grow 1000x

search time increases 3x

Page 81: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

search with knowledge of first neighbors (cont’d)

2 3 /13

3max

,2 3

( )

N Ns N

kN

If kmax = N,

So the best we can do is for exponents close to 2 N

2nd neighbor random walk, ignoring overlap:

15.0~1.2, NNS 213~, NNS

2

2

~

s B

B

n z N

NS

z N

232' max

2 1 1 1 21 max

2( ( )) (1)

1 (3 )Bx

kz G G x G

x k

~

Page 82: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Following the degree sequence

a ~ s = # of steps taken

1.0deg 1.2, NNS

2nd neighbors, ignoring overlap:

Go to highest degree node, then next highest, … etc.

max

max

1 11 max~

k

D k az Nk dk Nak

max

max

' 2(2 )1 1

2( 2) 2 4 /

( ) ~

~ ~

Dz G x Nak

s k N

Page 83: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

0 10 20 30 40 50 60 70 80 90 100

1

= 2.00 = 2.25 = 2.50 = 2.75 = 3.00 = 3.25 = 3.50 = 3.75

degree of node

deg

ree

of

nei

gh

bo

r -

1d

egre

e o

f n

od

e

2

20

10

5

Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents

Page 84: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Actor collaboration graph (imdb database)

~ 2.0-2.2

Exponents close to 2 required to search effectively

World Wide Web, ~ 2.0-2.3, high degree nodes: directories, search engines

Social networks, AT&T call graph ~ 2.1

Gnutella

100 101 102 103 104100

101

102

103

104

105

number of costars

num

ber

of a

ctor

s/ac

tres

ses actors, = 2

actresses, = 2.1

Page 85: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

15

50

18 17

8

10

9

6

Following the degree sequence

Page 86: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Complications

Should not visit same node more than once

Many neighbors of current node being visited were also neighbors of previously visited nodes, and there is a bias toward high degree nodes being ‘seen’ over and over again

Page 87: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

0 100 200 300 400 500 6000

5

10

15

20

25

30

step

deg

ree

of

no

de

not visitedvisitedneighbors visited

Status and degree of node visited

Page 88: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

1

10-2

0.1

1

step

pro

po

rtio

n o

f n

od

es

fo

un

d a

t s

tep

random walk

10 102

103

104

105

106

10-3

10-4

0 20 40 60 80 100

0.2

0.4

0.6

0.8

1

step

cum

ula

tive

no

des

fo

un

d a

t st

ep

random walk degree sequence

seeking high degree nodesspeeds up the search process

about 50% of a 10,000 node graphis explored in the first 12 steps

Progress of exploration in a 10,000 node graph knowing2nd degree neighbors

12

degree sequence

Page 89: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

101

102

103

104

105

100

101

102

103

size of graph

cove

rtim

e fo

r h

alf

the

no

des

random walk = 0.37 fit

degree sequence = 0.24 fit

Scaling of search time with size of graph

Page 90: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Comparison with a Poisson graph

100

101

102

103

100

101

102

de

gre

e o

f c

urr

en

t n

od

e

step

Poisson power-law

101

102

104

106

100

101

102

103

104

105

number of nodes in graph

co

ve

r ti

me

fo

r 1

/2 o

f g

rap

h

constant av. deg. = 3.4

= 1.0 fit

10

xzexG

xGxGz

xxG 001

expected degree and expecteddegree following a link are equal

scaling is linear

Page 91: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

0 20 40 60 80 1000

0.2

0.4

0.6

0.8

1

step

cum

ula

tive

no

des

fo

un

d a

t st

ep

high degree seeking 1st neighborshigh degree seeking 2nd neighbors

50% of the files in a 700 node network can be found in < 8 steps

Gnutella network

Page 92: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

• Maintain a list of files in their neighborhood

• Check query against list.

• Periodically contact neighbors to maintain list

• Append ID to each query processed

Required modifications to nodes

Tradeoff

storage/cpu(available)

bandwidth(limited)

for

Page 93: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

• localized indexing• traffic routed to high degree nodes

Partial implementation:

Theory vs. reality:

• overloading high degree nodes but no worse than original scenario where all nodes handle all traffic

assume high degree -> high bandwidth so can carry the traffic load

• fewer nodes used for routing, system is more susceptible to maliciousattack

Page 94: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Clip2 Distributed Search Solutionshttp://dss.clip2.com© Clip2.com, Inc.

Broadband user runningReflector

Broadband user runningGnutella

Dial-up user runningGnutella

Page 95: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

LimeWire, BearShare:drop connections to unresponsive hosts drives slower hosts to have fewer connections &move to edge of network

Kazaa, BearShare defender, Morpheus SuperNodes

from Clip2: Morpheus out of the Underworldhttp://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html

Connection-preferencing rules

Supernodes

Page 96: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Freenet

Queries are passed to one peer at a time.

Queries routed to high degree nodes.

Has a power-law topology Theodore Hong, ‘Performance’ chapter in O’Reilley’s “Peer-to-Peer, Harnessing the Power of Disruptive Technologies”

Scales as N0.275 with the size of the network, N.

Page 97: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Theodore Hong, power - law link distribution of a simulated Freenet network

Page 98: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Theodore Hong, scaling of mean search time

on a simulated Freenet network

Page 99: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Node specialization key to Freenet’s speed

Each node forwards query to node with “closest” hash key

Node passing back a match remembers the address the data came from

Results in nodes developing a bias towards a part of the keyspace

112659 ?356?

340388

396 135214

356340388

396 135214

Queries are naturally routed to high degree nodesUse keys for orientation

Page 100: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Conclusions

Search is faster and scales in power-law networks

Networks intended to be searched, such as Gnutella,have a favorable P-L topology

High degree strategy has partially been implemented in existing p2pclients, such as BearShare, Kazaa & Morpheus

Page 101: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

Current research on search

search in weighted networks expertise search P2P architectures with ‘friendship’ overlays weak ties vs. strong ties and online communication

Page 102: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

A PL link distribution shortens the average shortest path

1

1

1

21

1 zz

zzz

r

rr

Poisson: = z1

PL: > z1

Page 103: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

1 1.5 2 2.5 3 3.5 4 4.5 510

0

101

102

103

104

105

106

radius

neig

hbor

s at

rad

ius

power-law =2.5Poisson =1.0

102

104

106

0

2

4

6

N

PLPS

Page 104: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

A

B

What about the shortest path discovered along the way?

B.J. Kim et al. ‘Path finding strategies in scale-free networks’, PRE (65) 027103.

each node passes message to highest degree neighbor it hasn’t passed the message to previously

‘cut off’ loops

Page 105: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

102

103

104

105

3

3.5

4

4.5

5

5.5

6

6.5

7

7.5

8

N

av

. pa

th le

ng

th f

ou

nd

PL high degree0.72*ln(N)

A high degree seeking strategy finds shortest paths whose average scales logarithmically with the size of the graph

Page 106: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

102

103

104

105

101

102

N

av

. pa

th le

ng

th f

ou

nd

PLPoisson

N0.46

N0.48

Scaling of the path length found using a• random strategy on a PL graph• high-degree strategy on a Poisson graph

Page 107: School of Information University of Michigan Search in networks Lada Adamic (U. Michigan) NetSci Workshop May 16 th, 2006

102

103

104

105

100

101

102

103

104

N

me

dia

n s

ea

rch

co

st

PL high degreePL randPoisson high degree

But…Search costs are prohibitive, might as well do a BFS