Upload
kristian-richards
View
219
Download
0
Tags:
Embed Size (px)
Citation preview
School of InformationUniversity of Michigan
Search in networks
Lada Adamic (U. Michigan)
NetSci Workshop
May 16th, 2006
Outline
Search in structured networks small world experiments geographical models hierarchical models studies:
HP labs email network (simulated) Club Nexus online community (simulated) Phone interview company (survey) LiveJournal (simulated)
Search in unstructured networks power law networks Erdos-Renyi networks P2P networks (Gnutella example)
Search in structured networks
NE
MA
Milgram’s experiment (1960’s):
Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.
Small world experiments then
Milgram’s small world experiment
Target person worked in Boston as a stockbroker. 296 senders from Boston and Omaha. 20% of senders reached target. Typical strategy – if far from target choose someone
geographically closer, if close to target geographically, choose someone professionally closer
average chain length = 6.5
“Six degrees of separation”
Small world experiments now
email experiment Dodds, Muhamad, Watts, Science 301, (2003)
18 targets13 different countries
24,163 message chains 384 reached their targetsaverage path length 4.0
image by Stephen G. Eick http://www.bell-labs.com/user/eick/index.html(unrelated to small world experiment…)
Small world experiment at Columbia
Successful chains disproportionately used• weak ties (Granovetter)• professional ties (34% vs. 13%)• ties originating at work/college• target's work (65% vs. 40%)
. . . and disproportionately avoided• hubs (8% vs. 1%) (+ no evidence of funnels)• family/friendship ties (60% vs. 83%)
Strategy: Geography -> Work
Why study small world phenomena?
Curiosity:Why is the world small?How are people able to route messages?
Social Networking as a Business:Friendster, Orkut, MySpace,FaceBookLinkedIn, Spoke, VisiblePath
Six degrees of separation - to be expected
Pool and Kochen (1978) - average person has 500-1500 acquaintances
Ignoring clustering, other redundancy …
~ 103 first neighbors, 106 second neighbors, 109 third neighbors
But networks are clustered: my friends’ friends tend to be my friends
Watts & Strogatz (1998) - a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph
Is this the whole picture?
Why are small worlds navigable?
How to choose among hundreds of acquaintances?
Strategy:Simple greedy algorithm - each participant chooses correspondentwho is closest to target with respect to the given property
Models
geographyKleinberg (2000)
hierarchical groupsWatts, Dodds, Newman (2001), Kleinberg(2001)
high degree nodesAdamic, Puniyani, Lukose, Huberman (2001), Newman(2003)
How are people are able to find short paths?
Reverse small world experiment
Killworth & Bernard (1978): Given hypothetical targets (name, occupation, location, hobbies, religion…)
participants choose an acquaintance for each target Acquaintance chosen based on (most often) occupation, geography only 7% because they “know a lot of people” Simple greedy algorithm: most similar acquaintance two-step strategy rare
nodes are placed on a lattice andconnect to nearest neighbors
additional links placed with puv~
Spatial search
“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”
S.Milgram ‘The small world problem’, Psychology Today 1,61,1967
ruvd
Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’Proc. 32nd ACM Symposium on Theory of Computing, 2000.(Nature 2000)
When r=0, links are randomly distributed, ASP ~ log(n), n size of grid
When r=0, any decentralized algorithm is at least a0n2/3
no locality
When r<2, expected time at least rn(2-r)/3
0~p p
Overly localized links on a latticeWhen r>2 expected search time ~ N(r-2)/(r-1)
4
1~p
d
Links balanced between long and short range
When r=2, expected time of a DA is at most C (log N)2
2
1~p
d
Hierarchical social network models
Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’NIPS 14, 2001
Hierarchical network models:
Individuals classified into a hierarchy, hij = height of the least common ancestor.
Theorem: If = 1 and outdegree is polylogarithmic, can s ~ O(log n)
Group structure models:Individuals belong to nested groupsq = size of smallest group that v,w belong to
f(q) ~ q-
Theorem: If = 1 and outdegree is polylogarithmic, can s ~ O(log n)
ijhijp b
h b=3
e.g. state-county-city-neighborhoodindustry-corporation-division-group~
Sketch of proof
T
S
R|R|<|R’|<|R|
k = c log2n calculate probability that s fails to have a link in R’
R’
Identity and search in social networksWatts, Dodds, Newman (Science,2001)
individuals belong to hierarchically nested groups
multiple independent hierarchies h=1,2,..,H coexist corresponding to occupation, geography, hobbies, religion…
pij ~ exp(- x)
Identity and search in social networksWatts, Dodds, Newman (2001)
Message chains fail at each node with probability pNetwork is ‘searchable’ if a fraction r of messages reach the target
N=102400
N=409600
N=204800
(1 )L
Lq p r
Small World Model, Watts et al.
Fits Milgram’s data well
Model parameters:N = 108
z = 300 g = 100b = 10= 1, H = 2
Lmodel= 6.7Ldata = 6.5
http://www.aladdin.cs.cmu.edu/workshops/wsa/papers/dodds-2004-04-10search.pdf
more slides on this:
Mary
Bob
Jane
Who couldintroduce me toRichard Gere?
High degree search
Adamic et al. Phys. Rev. E, 64 46135 (2001)
Small world experiments so far
Classic small world experiment:
Given a target individual, forward to one of your acquaintances
Observe chains but not the rest of the social network
Reverse small world experiment (Killworth & Bernard)
Given a hypothetical individual,
which of your acquaintances would you choose
Observe individual’s social network and possible choices,
but not resulting chains or complete social network
Use a well defined network:HP Labs email correspondence over 3.5 months
Edges are between individuals who sent at least 6 email messages each way
450 usersmedian degree = 10, mean degree = 13average shortest path = 3
Node properties specified:degreegeographical locationposition in organizational hierarchy
Can greedy strategies work?
Testing search models on social networksadvantage: have access to entire communication networkand to individual’s attributes
100
101
102
103
104
10-8
10-6
10-4
10-2
100
outdegree
freq
uenc
y
outdegree distribution = 2.0 fit
Power-law degree distribution of all senders of email passing through HP labs
Strategy 1: High degree search
number of recipients sender has sent email to
pro
po
rtio
n o
f se
nd
ers
Filtered network (at least 6 messages sent each way)
0 20 40 60 800
5
10
15
20
25
30
35
number of email correspondents, k
p(k
)
0 20 40 60 8010
-4
10-2
100
k
p(k
)
Degree distribution no longer power-law, but Poisson
It would take 40 steps on average (median of 16) to reach a target!
Strategy 2:Geography
1U
2L 3L
3U
2U
4U
1L
87 % of the4000 links arebetween individualson the same floor
Communication across corporate geography
Cubicle distance vs. probability of being linked
102
103
10-3
10-2
10-1
100
distance in feet
pro
po
rtio
n o
f lin
ked
pa
irs
measured1/r
1/r2
optimum for search
Strategy 3: Organizational hierarchy
Email correspondence superimposed on the organizational hierarchy
Example of search path
distance 1
distance 1
distance 2
hierarchical distance = 5search path distance = 4
distance 1
Probability of linking vs. distance in hierarchy
in the ‘searchable’ regime: 0 < < 2 (Watts, Dodds, Newman 2001)
2 4 6 8 100
0.1
0.2
0.3
0.4
0.5
0.6p
rob
ab
ility
of l
inki
ng
hierarchical distance h
observedfit exp(-0.92*h)
Results
0 5 10 15 20 250
1
2
3
4
5x 10
4
number of steps in search
nu
mb
er
of
pa
irs
distance hierarchy geography geodesic org random
median 4 7 3 6 28
mean 5.7 (4.7) 12 3.1 6.1 57.4
0 2 4 6 8 10 12 14 16 18 200
2000
4000
6000
8000
10000
12000
14000
16000
number of steps
nu
mb
er
of p
airs
hierarchy geography
Expt 2
Searchinga socialnetworkingwebsite
Profiles:
status (UG or G) yearmajor or departmentresidencegender
Personality (choose 3 exactly):you funny, kind, weird, …friendship honesty/trust, common interests, commitment, …romance - “ -freetime socializing, getting outside, reading, …support unconditional accepters, comic-relief givers, eternal optimists
Interests (choose as many as apply)books mystery & thriller, science fiction, romance, …movies western, biography, horror, …music folk, jazz, techno, …social activities ballroom dancing, barbecuing, bar-hopping, …land sports soccer, tennis, golf, …water sports sailing, kayaking, swimming, …other sports ski diving, weightlifting, billiards, …
Differences between data sets
• complete image of communication network
• affinity not reflected
• partial information of social network
• only friends listed
HP labs email network Online community
0 20 40 60 80 1000
50
100
150
200
250
number of links
nu
mb
er
of u
sers
with
so
ma
ny
links
100
101
102
100
101
102
number of links
num
ber
of u
sers
Degree Distribution for Nexus Net 2469 users, average degree 8.2
Problem: how to construct hierarchies?
0 1 2 30
0.002
0.004
0.006
0.008
0.01
0.012
0.014
separation in years
pro
b. t
wo
un
de
rgra
ds
are
frie
nd
s
data
(x+1)-1.1 fit
0 1 2 3 4 50
0.005
0.01
0.015
0.02
separation in years
prob
. tw
o gr
ads
are
frie
nds data
(x+1)-1.7 fit
Probability of linking by separation in years
Hierarchies not useful for other attributes:
0 100 200 300 400 500 6000
0.01
0.02
0.03
0.04
0.05
0.06
distance between residences
pro
ba
bili
ty o
f be
ing
frie
nd
s
Geography
Other attributes: major, sports, freetime activities, movie preferences…
Strategy using user profiles
prob. two undergrads are friends (consider simultaneously)
• both undergraduate, both graduate, or one of each
• same or different year
• both male, both female, or one of each
• same or different residences
• same or different major/department
Results
random 133 390high degree 39 137profile 21 53
strategy median mean
With an attrition rate of 25%, 5% of the messages get through atan average of 4.8 steps,=> hence network is barely searchable
The accuracy of small world chains in social networksPeter D. Killworth, Christopher McCarty, H. Russell Bernard, Mark House
Social Networks, 2006 First parallel study of
individuals choices vs. actual shortest paths
Network 105 members of an
interviewing bureau 10,920 shortest path
connections who knows whom who a person would select as
the next link in a chain to a particular person x recent hire
old timer
worked a while
Accuracy of small world chains
Shortest paths use the network of who-knows whom to calculate actual shortest
paths compare to paths formed by individuals’ choices
21.7% fail through reaching missing data 23.7% reach cycles : i chooses j, j chooses i 54.6% reach the target, with chains that are 40% longer on average
than the shortest path
Next choice accuracy and a Markov model
48% of the time, a person chooses a contact who is closer to the target over half of the choices are wrong!
Markov model: terminate chain with probability (attrition) choose someone closer to the target with probability p,
otherwise choose someone at same distance
LiveJournal
LiveJournal provides an API to crawl the friendship network + profiles friendly to researchers great research opportunity
basic statistics Users
How many users, and how many of those are active? Total accounts: 9980558 ... active in some way: 1979716 ... that have ever updated: 6755023 ... updating in last 30 days: 1300312 ... updating in last 7 days: 751301 ... updating in past 24 hours: 216581
Predominantly female & young demographic
Male: 1370813 (32.4%) Female: 2856360 (67.6%) Unspecified: 1575389
13 18483
14 87505
15 211445
16 343922
17 400947
18 414601
19 405472
20 371789
21 303076
22 239255
23 194379
24 152569
25 127121
26 98900
27 73392
28 59188
29 48666
Age distribution
Geographic Routing in Social Networks
David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins (PNAS 2005)
data used Feb. 2004 500,000 LiveJournal users with US locations giant component (77.6%) of the network clustering coefficient: 0.2
Degree distributions
The broad degree distributions we’ve learned to know and love but more probably lognormal than power law
broader in degree than outdegree distribution
Results of a simple greedy geographical algorithm
Choose source s and target t randomly Try to reach target’s city – not target itself At each step, the message is forwarded from the current message holder u
to the friend v of u geographically closest to t
stop if d(v,t) > d(u,t)
13% of the chains are completed
stop if d(v,t) > d(u,t)
pick a neighbor at random in the same city if possible, else stop
80% of the chains are completed
the geographic basis of friendship
= d(u,v) the distance between pairs of people The probability that two people are friends given their
distance is equal to P() = + f(), is a constant independent of geography is 5.0 x 10-6 for LiveJournal users who are very far apart
the geographic basis of friendship
The average user will have ~ 2.5 non-geographic friends The other friends (5.5 on average) are distributed according to an
approximate 1/distance relationship But 1/d was proved not to be navigable by Kleinberg, so what
gives?
Navigability in networks of variable geographical density
Kleinberg assumed a uniformly populated 2D lattice But population is far from uniform population networks and rank-based friendship
probability of knowing a person depends not on absolute distance but on relative distance (i.e. how many people live closer) Pr[u ->v] ~ 1/ranku(v)
Structured search Conclusions
Individuals associate on different levels into groups.
Individuals tend to know others who are ‘close by’
Group structure facilitates decentralized search using social ties.Hierarchy search faster than geographical search
Simple strategies are not perfect – but short (rather than shortest) chains can be found
Weighted shortest paths
Routes shortest route from Chicago to Boston
vertex: intersection edge weights: road distances alternative weights: expected time traveled, gas consumed… usually sum the weights from each segment
start
finish
freeway, 65 mph
40 miles/65 mph ~ 37 minutes
freeway, 70 mph
30 miles/70 mph
~ 26 minutes
surface road
25 mph, 50 miles
2 hours
Reliable paths through social networks
The probability of transmitting a message or infectious agent could be related to the strength of the tie e.g. rather than summing the weights, we might multiply the
probabilities of getting through
p = 0.5p = 0.5
p = 0.001 p = 1
Probability of getting an idea through to the head of labs
via CEO (0.001*1 = 0.001), via direct manager (0.5*0.5 = 0.25)
p = 0.05
MotivationPower-law (PL) networks, social and P2P
Analysis of scaling of search strategies in PL networks
Simulationartificial power-law topologies, real Gnutella networks
Comparison with existing P2P search strategiesReflector, Morpheus
Directed SearchFreenet
Search in random networks
2
Mary
Bob
Jane
Who couldintroduce me toRichard Gere?
How do we search?
AT&T Call Graph
Aiello et al. STOC ‘00
# o
f te
lep
hon
e n
um
be
rsfr
om
wh
ich
ca
lls w
ere
ma
de
# of telephone numbers called
100
101
100
101
102
number of neighbors
pro
po
rtio
n o
f n
od
es
datapower-law fit = 2.07
Gnutella network
power-law link distribution
summer 2000,data provided by Clip2
Preferential attachment model
Nodes join at different times
The more connections a node has, the more likely it is to acquirenew connections
Growth process produces power-law network
host cache
pingping
ping
ping
ping
ping ping
file sharing w/o a central index
queries broadcast to every node within radius ttl as network grows, encounter a bandwidth barrier (dial up modems cannot keep up with query traffic, fragmenting the network)
Gnutella and the bandwidth barrier
Clip 2 reportGnutella: To the Bandwidth Barrier and Beyondhttp://www.clip2.com/gnutella.html#q17
1
6
54
63
67
2
94
number ofnodes found
power-law graph
93
number ofnodes found
13
711
1519
Poisson graph
Search with knowledge of 2nd neighbors
Outline of search strategy
pass query onto only one neighbor at each step
requires that nodes sign query- avoid passing message onto a node twice
requires knowledge of one’s neighbors degree- pass to the highest degree node
requires knowledge of one’s neighbors neighbors- route to 2nd degree neighbors
OPTIONS
Generating functions
M.E.J. Newman, S.H. Strogatz, and D.J. Watts ‘Random graphs with arbitrary degree distributions and their
applications’, PRE, cond-mat/0007235
Generating functions for degree distributions
Useful for computing moments of degree distribution, component sizes, and average path lengths
00
( ) kk
k
G x p x
Fun with generating functions
normalization condition: probabilities sum to 1
1)1(0
0
kkpG
derivatives: the generating function contains all the information of the degree distribution
00
!
1
xk
k
k x
G
kp
Fun with generating functions (cont’d)
Expected degree of a randomly chosen vertex
'0(1)k
k
k kp G
Higher moments of degree distribution
1
0 )(
x
n
kk
nn xGdx
dxpkk
Example: Poisson distribution
Let p = z/N be the probability of an edge existing between two vertices (z is the average degree)
kkNkN
k
xppk
NxG
)1()(
00
)1(~)1( xzN epxp for large N
)1(0 )(' xzzexG
zG )1('0
zkxk
k
k ezkx
G
kp
!
1
!
10
0 just the regular Poisson
distribution
Introducing cutoffs
max 1k N a node cannot have more connections than there are other nodes
This is important for exponents close to 2
1 1
11kp C
x 2 2
6C
1000
( 1000, 2) ~ 0.001kp k p
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
1000(1 ( 1000, 2)) ~ 0.36p k
without a cutoff, for = 2have > 50% chance of observing a node with more neighbors than there are nodes
for = 2.1, have a 25% chance
# of sites linking to the site
prop
ortio
n of
site
s w
/ so
man
y lin
ks
1000
Selecting from a variety of cutoffs
Nk max1.
2. /k
k eCkp Newman et al.
3.
0
Ckpk
1CNk
otherwise
Generating Function
kCN
k
xkCxG
1
10
Aiello et al.
1 million websites (~ 1997)
N
Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff
1
1
* 1
~
kN p
Ck N
k N
cutoff where expectednumber of nodes of degreek is 1
k
n(k)
k
n(k)
1
1
cutoff so thatexpected number of nodesof degree > k is 1
max
max
1
1 1max
1
1max
* 1
~
~
~
kk k
k k
N p
ck N
k N
k N
The imposed cutoff can have a dramaticeffect on the properties of the graph
degrees drawn at random, for = 2, and N = 1000
00
( ) kk
k
G x p x
is the probability that a randomlychosen vertex has degree k
~kp k
is a generating function
'0(1)k
k
k kp G is the expected degree of a randomlychosen vertex
'0
1 '0 1
G xG x
G is the distribution of remaining
outgoing edges following and edge
assuming neighbors don’t share edges
11 '1
'02 GGz is the expected number of second
degree neighbors
22
2
2 2
2
1
11
Generating functions for degree distributions Random graphs with arbitrary degree distributions and their applications by Newman, Strogatz & Watts
search with knowledge of first neighbors
max
max
maxmax
max
max
01
' 1 10 0
1
' 1 1 20 max
1 1
'' 1 101 ' '
10 0
1 2'
20
3 2' max1 '
0
( )
( ) ( )
1(1) 1
2
( )( )
(1) (1)
( 1)(1)
( 2) 21(1)
(1)
kk
kk
kk
kk
kk
G x c k x
G x G x c k xx
G k c k k dk k
G x cG x k x
G G x
ck k x
G
kG
G
2max( 1) (3 )
( 2)(3 )
k
Generating function with cutoff
Average degree of vertex
constant in Nfor 2<<3, and kmax~Na, decreaseswith N
Average number of neighborsfollowing an edge
~
search with knowledge of first neighbors (cont’d)
3 3' 3max max
1 1 max' 20 max
1 2(1)
(1) (3 ) 1 (3 )B
k kz G k
G k
In the limit t->2, ' max1
max
(1)log( )
kG
k
Let’s for the moment ignore the fact that as we do a random walk, we encounter neighborsthat we’ve seen before
s = number of steps =1B
N
z
~ ~
~
Search time with different cutoffs
0.18(2.1)s N
3 3m
2
ax
( ) ,2 3N N
sk N
NIf kmax = N,
max
max
log(log( )
, 2)N k
s Nk
2
21
33max 1
( ) ,2 3NN N
sk
N
If kmax = N1/(-1),
max
max
log( )(2) log( )
N ks
kN
0.1(2.1) Ns
~
~
~
~
~
~
grow from 1,000 to 1,000,000 nodes,
search time increases by a factor of ~2
grow 1000x
search time increases 3x
search with knowledge of first neighbors (cont’d)
2 3 /13
3max
,2 3
( )
N Ns N
kN
If kmax = N,
So the best we can do is for exponents close to 2 N
2nd neighbor random walk, ignoring overlap:
15.0~1.2, NNS 213~, NNS
2
2
~
s B
B
n z N
NS
z N
232' max
2 1 1 1 21 max
2( ( )) (1)
1 (3 )Bx
kz G G x G
x k
~
Following the degree sequence
a ~ s = # of steps taken
1.0deg 1.2, NNS
2nd neighbors, ignoring overlap:
Go to highest degree node, then next highest, … etc.
max
max
1 11 max~
k
D k az Nk dk Nak
max
max
' 2(2 )1 1
2( 2) 2 4 /
( ) ~
~ ~
Dz G x Nak
s k N
0 10 20 30 40 50 60 70 80 90 100
1
= 2.00 = 2.25 = 2.50 = 2.75 = 3.00 = 3.25 = 3.50 = 3.75
degree of node
deg
ree
of
nei
gh
bo
r -
1d
egre
e o
f n
od
e
2
20
10
5
Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents
Actor collaboration graph (imdb database)
~ 2.0-2.2
Exponents close to 2 required to search effectively
World Wide Web, ~ 2.0-2.3, high degree nodes: directories, search engines
Social networks, AT&T call graph ~ 2.1
Gnutella
100 101 102 103 104100
101
102
103
104
105
number of costars
num
ber
of a
ctor
s/ac
tres
ses actors, = 2
actresses, = 2.1
15
50
18 17
8
10
9
6
Following the degree sequence
Complications
Should not visit same node more than once
Many neighbors of current node being visited were also neighbors of previously visited nodes, and there is a bias toward high degree nodes being ‘seen’ over and over again
0 100 200 300 400 500 6000
5
10
15
20
25
30
step
deg
ree
of
no
de
not visitedvisitedneighbors visited
Status and degree of node visited
1
10-2
0.1
1
step
pro
po
rtio
n o
f n
od
es
fo
un
d a
t s
tep
random walk
10 102
103
104
105
106
10-3
10-4
0 20 40 60 80 100
0.2
0.4
0.6
0.8
1
step
cum
ula
tive
no
des
fo
un
d a
t st
ep
random walk degree sequence
seeking high degree nodesspeeds up the search process
about 50% of a 10,000 node graphis explored in the first 12 steps
Progress of exploration in a 10,000 node graph knowing2nd degree neighbors
12
degree sequence
101
102
103
104
105
100
101
102
103
size of graph
cove
rtim
e fo
r h
alf
the
no
des
random walk = 0.37 fit
degree sequence = 0.24 fit
Scaling of search time with size of graph
Comparison with a Poisson graph
100
101
102
103
100
101
102
de
gre
e o
f c
urr
en
t n
od
e
step
Poisson power-law
101
102
104
106
100
101
102
103
104
105
number of nodes in graph
co
ve
r ti
me
fo
r 1
/2 o
f g
rap
h
constant av. deg. = 3.4
= 1.0 fit
10
xzexG
xGxGz
xxG 001
expected degree and expecteddegree following a link are equal
scaling is linear
0 20 40 60 80 1000
0.2
0.4
0.6
0.8
1
step
cum
ula
tive
no
des
fo
un
d a
t st
ep
high degree seeking 1st neighborshigh degree seeking 2nd neighbors
50% of the files in a 700 node network can be found in < 8 steps
Gnutella network
• Maintain a list of files in their neighborhood
• Check query against list.
• Periodically contact neighbors to maintain list
• Append ID to each query processed
Required modifications to nodes
Tradeoff
storage/cpu(available)
bandwidth(limited)
for
• localized indexing• traffic routed to high degree nodes
Partial implementation:
Theory vs. reality:
• overloading high degree nodes but no worse than original scenario where all nodes handle all traffic
assume high degree -> high bandwidth so can carry the traffic load
• fewer nodes used for routing, system is more susceptible to maliciousattack
Clip2 Distributed Search Solutionshttp://dss.clip2.com© Clip2.com, Inc.
Broadband user runningReflector
Broadband user runningGnutella
Dial-up user runningGnutella
LimeWire, BearShare:drop connections to unresponsive hosts drives slower hosts to have fewer connections &move to edge of network
Kazaa, BearShare defender, Morpheus SuperNodes
from Clip2: Morpheus out of the Underworldhttp://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html
Connection-preferencing rules
Supernodes
Freenet
Queries are passed to one peer at a time.
Queries routed to high degree nodes.
Has a power-law topology Theodore Hong, ‘Performance’ chapter in O’Reilley’s “Peer-to-Peer, Harnessing the Power of Disruptive Technologies”
Scales as N0.275 with the size of the network, N.
Theodore Hong, power - law link distribution of a simulated Freenet network
Theodore Hong, scaling of mean search time
on a simulated Freenet network
Node specialization key to Freenet’s speed
Each node forwards query to node with “closest” hash key
Node passing back a match remembers the address the data came from
Results in nodes developing a bias towards a part of the keyspace
112659 ?356?
340388
396 135214
356340388
396 135214
Queries are naturally routed to high degree nodesUse keys for orientation
Conclusions
Search is faster and scales in power-law networks
Networks intended to be searched, such as Gnutella,have a favorable P-L topology
High degree strategy has partially been implemented in existing p2pclients, such as BearShare, Kazaa & Morpheus
Current research on search
search in weighted networks expertise search P2P architectures with ‘friendship’ overlays weak ties vs. strong ties and online communication
A PL link distribution shortens the average shortest path
1
1
1
21
1 zz
zzz
r
rr
Poisson: = z1
PL: > z1
1 1.5 2 2.5 3 3.5 4 4.5 510
0
101
102
103
104
105
106
radius
neig
hbor
s at
rad
ius
power-law =2.5Poisson =1.0
102
104
106
0
2
4
6
N
PLPS
A
B
What about the shortest path discovered along the way?
B.J. Kim et al. ‘Path finding strategies in scale-free networks’, PRE (65) 027103.
each node passes message to highest degree neighbor it hasn’t passed the message to previously
‘cut off’ loops
102
103
104
105
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
N
av
. pa
th le
ng
th f
ou
nd
PL high degree0.72*ln(N)
A high degree seeking strategy finds shortest paths whose average scales logarithmically with the size of the graph
102
103
104
105
101
102
N
av
. pa
th le
ng
th f
ou
nd
PLPoisson
N0.46
N0.48
Scaling of the path length found using a• random strategy on a PL graph• high-degree strategy on a Poisson graph
102
103
104
105
100
101
102
103
104
N
me
dia
n s
ea
rch
co
st
PL high degreePL randPoisson high degree
But…Search costs are prohibitive, might as well do a BFS