Geographic routing in social networks David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins Presentation prepared by Dor Medalsy
Slide 2
Introduction
Anecdotal evidence that we live in a small world, where arbitrary pairs of people are connected through extremely short chains of intermediary friends, is ubiquitous. Experimental studies have verified this property in real social networks, and theoretical models have been advanced to explain it. We introduce a richer model relating geography and social-network friendship.
Slide 3
Milgram's experiment
Sociological experiments, beginning with the seminal work of Milgram, have shown that a source person can transmit a message to a target through only a small number of intermediate friends, using only scant information about the target's geography and occupation. Successful messages passed from source to target through roughly six intermediaries: the famous "six degrees of separation."
Slide 4
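Milgram's experiment
The proposed explanation: random graphs have small diameter. Only two minor problems:
o Random graphs are a bad model of social networks.
o They do not explain the small-world phenomenon of "six degrees of separation": short paths may exist, but the model says nothing about how people find them.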
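To make the "small diameter" claim concrete, here is a minimal Python sketch (not part of the original study) that builds an Erdős-Rényi style random graph with about 8 friends per person, matching the LiveJournal average reported later, and measures hop distances by breadth-first search. The function names and parameters are hypothetical. Almost every vertex turns out to be only a handful of hops from the start, which is why random graphs were proposed as an explanation; note that nothing in the model tells a person which friend to pick.

```python
import random
from collections import deque

def random_graph(n, avg_degree, seed=0):
    """G(n, p) random graph with p chosen so the expected degree is avg_degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bfs_distances(adj, source):
    """Hop distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# avg_degree=8 mirrors the roughly 8 friends per LiveJournal user.
adj = random_graph(1000, avg_degree=8)
hub = max(range(len(adj)), key=lambda u: len(adj[u]))  # surely in the giant component
dist = bfs_distances(adj, hub)
print(len(dist), max(dist.values()))  # reachable vertices, largest hop distance
```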
Slide 5
Geographic dimension
As part of the recent surge of interest in networks, there has been active research exploring strategies for navigating synthetic and small-scale social networks, including routing through common group membership, popularity, and geographic proximity. Subjects of message-forwarding experiments report that geography and occupation are by far the two most important dimensions in choosing the next step in the chain, and geography tends to predominate in the early steps.
Slide 6
Adding nongeographic dimensions to routing strategies,
especially once the chain has arrived at a point geographically
close to the target, can make routing more efficient. However,
geography appears to be the single most valuable dimension for
routing, and we are thus interested in understanding how powerful
geography alone may be. Question: what is the connection between
friendship and geography, and to what extent can this connection
explain the navigability of large-scale real-world social
networks?
Slide 7
We present a study that combines measurements of the role of geography in a large social network with theoretical modeling of path discovery, using the measurements to validate and inform the theoretical results.
I. A simulation-based study on a 500,000-person online social network reveals that routing through geographic information alone allows people to discover short paths to a target city.
II. About 70% of friendships are derived from geographic processes.
Slide 8
III. Existing models that predict the probability of friendship solely on the basis of geographic distance are too weak to explain these friendships, rendering previous theoretical results inapplicable.
IV. We propose a density-aware model of friendship formation, called rank-based friendship, which relates the probability that a person befriends a particular candidate to the inverse of the number of closer candidates. We prove that the presence of rank-based friendship, for any population density, implies that the social network will contain discoverable short paths to small destination regions under geographic routing.
Slide 9
The LiveJournal Community
An online blogging community with 1.3 million users (about 500,000 of them in the continental USA) as of February 2004. LiveJournal users provide:
o Disturbingly detailed accounts of their personal lives.
o Profiles: geographic location, topical interests, and an explicit list of other bloggers whom the user considers to be friends.
From each user's geographic location we compute a longitude/latitude. The resolution of our geographic data is limited to the level of towns and cities, so we study the problem of global routing.
Slide 10
The LiveJournal Community
In our study, the goal is to direct a message to the target's city using geographic factors only. Once the proper locality has been reached, a local routing problem must then be solved to move the message from the correct city down to the correct person, using a wide set of potential nongeographic factors such as interests or profession.
Slide 11
The LiveJournal Community
Graph of the LiveJournal social network:
o Vertices: a set of users (500,000).
o Edges: the social relationships linking them (3,959,440 friendships). "u is a friend of v" is defined by the explicit appearance of blogger u in the list of friends in the profile of blogger v.
o d(u,v): the geographic distance between two people u and v.
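The slides do not say how d(u,v) is computed from each user's longitude/latitude. One reasonable assumption is great-circle (haversine) distance on a spherical Earth; the Python sketch below shows that choice. The function name, the Earth radius constant, and the sample coordinates are illustrative assumptions, not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Hypothetical user locations (latitude, longitude), e.g. city centroids.
new_york = (40.7128, -74.0060)
los_angeles = (34.0522, -118.2437)
print(round(geo_distance_km(*new_york, *los_angeles)))
```

Because the data are resolved only to towns and cities, all users in the same city would share one coordinate pair and hence be at distance 0 from each other.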
Slide 12
The graph is a directed network with about 8 friends per user; about 80% of these friendships are mutual. 77.6% of users form a giant component, in which any two people u and v are connected by a chain of friends. The clustering coefficient of the network is 0.2 (the proportion of the time that u and v are themselves friends, given that they have a common friend w).
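The two statistics on this slide can be computed as in the Python sketch below. The exact definitions are assumptions: "mutual" is read as the fraction of directed links u → v whose reverse v → u also exists, and the clustering coefficient is computed as a global transitivity on the undirected version of the graph (the fraction of friend-of-friend paths u - w - v whose endpoints are themselves friends). The toy network and all function names are hypothetical.

```python
from itertools import combinations

def reciprocity(out_links):
    """Fraction of directed links u -> v whose reverse link v -> u also exists."""
    total = sum(len(vs) for vs in out_links.values())
    mutual = sum(1 for u, vs in out_links.items() for v in vs
                 if u in out_links.get(v, ()))
    return mutual / total if total else 0.0

def undirected_neighbors(out_links):
    """Symmetrize the directed friendship lists, dropping self-loops."""
    nbrs = {}
    for u, vs in out_links.items():
        for v in vs:
            if u != v:
                nbrs.setdefault(u, set()).add(v)
                nbrs.setdefault(v, set()).add(u)
    return nbrs

def clustering_coefficient(out_links):
    """Fraction of paths u - w - v (common friend w) whose endpoints u, v are friends."""
    nbrs = undirected_neighbors(out_links)
    pairs = closed = 0
    for w, ns in nbrs.items():
        for u, v in combinations(sorted(ns), 2):
            pairs += 1
            if v in nbrs[u]:
                closed += 1
    return closed / pairs if pairs else 0.0

# Hypothetical toy network: person -> set of people they list as friends.
links = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a"}, "d": {"a"}}
print(reciprocity(links), clustering_coefficient(links))
```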
Slide 13
The in-degree log/log plot is more linear than the out-degree
plot, but both appear far more parabolic than linear. These curves
provide some evidence supporting a log-normal degree distribution
in social networks, instead of a power-law distribution.
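Why a parabolic log/log plot points toward a log-normal rather than a power law can be seen by taking logarithms of each density. The generic parameters μ, σ (log-normal) and C, α (power law) below are standard textbook symbols, not values fitted in the study: the log-normal gives a quadratic in ln x, a parabola on log/log axes, while the power law gives a straight line.

```latex
% Log-normal density with parameters \mu, \sigma
p(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)
\;\Longrightarrow\;
\ln p(x) = -\frac{(\ln x)^2}{2\sigma^2} + \left(\frac{\mu}{\sigma^2} - 1\right)\ln x
           - \frac{\mu^2}{2\sigma^2} - \ln\!\left(\sigma\sqrt{2\pi}\right)

% Power law with exponent \alpha
p(x) = C\,x^{-\alpha} \;\Longrightarrow\; \ln p(x) = \ln C - \alpha \ln x
```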
Slide 14
Geographic Routing
We perform a simulated version of the message-forwarding experiment in the LiveJournal social network, using only geographic information to choose the next message holder in a chain. The main goals:
1. Determine whether individuals using purely geographic information in a simple way can succeed in discovering short paths to a destination city.
2. Analyze the applicability of existing theoretical models that explain the presence or absence of short discoverable paths in networks.
Slide 15
Geographic Routing
This approach allows us to investigate the performance of simple routing schemes without relying on the voluntary participation of the people in the network. Knowing the location of every friend of every participant also allows us to analyze in detail the geographic basis of friendship underlying these results. In the simulation, messages are forwarded using the geographically greedy routing algorithm GEOGREEDY.
Slide 16
GEOGREEDY Algorithm
1) Choose a source s and a target t at random.
2) Aim to reach the target's city, not the target itself.
3) At each step, the current message holder u forwards the message to the friend v of u that is geographically closest to t, i.e., the friend minimizing d(v,t) over all friends v of u.
4) If d(v,t) > d(u,t), the chain fails.
5) Stop when the message reaches the target's city.
Problem: the forwarding condition is restrictive. Users can forward the message only to friends they have explicitly listed in their profile, and only to the friend geographically closest to the target.
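Steps 2 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: people have planar (x, y) coordinates, math.dist stands in for d(u,v), a person with no listed friends counts as a failure, and max_steps is a hypothetical guard against cycling among equidistant friends. Choosing s and t at random (step 1) is left to the caller; all names are hypothetical.

```python
import math

def geo_greedy(s, t, friends, loc, city, dist=math.dist, max_steps=1000):
    """Strict GEOGREEDY: number of steps to reach t's city, or None if the chain fails.

    friends: person -> list of people they explicitly list as friends
    loc:     person -> (x, y) coordinates
    city:    person -> city identifier
    """
    u, steps = s, 0
    while city[u] != city[t]:
        if steps >= max_steps or not friends.get(u):
            return None
        # Forward to the friend of u geographically closest to t.
        v = min(friends[u], key=lambda w: dist(loc[w], loc[t]))
        if dist(loc[v], loc[t]) > dist(loc[u], loc[t]):
            return None  # every friend is farther from t than u is: the chain fails
        u, steps = v, steps + 1
    return steps

# Hypothetical toy data on a line.
loc = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "t": (2, 0), "d": (5, 0)}
city = {"a": "A", "b": "B", "c": "C", "t": "C", "d": "D"}
friends = {"a": ["b", "d"], "b": ["c"], "c": ["t"]}
print(geo_greedy("a", "t", friends, loc, city))  # 2: a -> b -> c, and c lives in t's city
```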
Slide 17
Modified GEOGREEDY Algorithm
1) Choose a source s and a target t at random.
2) Aim to reach the target's city, not the target itself.
3) At each step, the current message holder u forwards the message to the friend v of u that is geographically closest to t, i.e., the friend minimizing d(v,t) over all friends v of u.
4) If d(v,t) > d(u,t), u instead forwards the message to a person selected at random from u's city; if no such person exists, the chain fails.
5) Stop when the message reaches the target's city.
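The modified rule changes only step 4. The Python sketch below makes the same hypothetical assumptions as a strict GEOGREEDY sketch would (planar coordinates, math.dist for d, a max_steps guard) and additionally assumes a residents lookup from each city to the people living there; the random hop picks uniformly among u's fellow residents. All names are illustrative.

```python
import math
import random

def modified_geo_greedy(s, t, friends, loc, city, residents, rng,
                        dist=math.dist, max_steps=1000):
    """Modified GEOGREEDY: steps to reach t's city, or None if the chain fails.

    residents: city identifier -> list of people living there
    rng:       a random.Random instance, for reproducible simulations
    """
    u, steps = s, 0
    while city[u] != city[t]:
        if steps >= max_steps:
            return None  # guard against cycling between random hops
        best = min(friends.get(u, []), key=lambda w: dist(loc[w], loc[t]), default=None)
        if best is not None and dist(loc[best], loc[t]) <= dist(loc[u], loc[t]):
            u = best  # the usual greedy step
        else:
            # No friend is at least as close to t: hand off to a random person in u's city.
            others = [p for p in residents[city[u]] if p != u]
            if not others:
                return None  # nobody else lives in u's city: the chain fails
            u = rng.choice(others)
        steps += 1
    return steps

# Hypothetical toy data: a's only friend is far away, but a2 shares a's city.
loc = {"a": (0, 0), "a2": (0, 0.1), "b": (1, 0), "t": (2, 0), "d": (5, 0)}
city = {"a": "A", "a2": "A", "b": "B", "t": "C", "d": "D"}
residents = {"A": ["a", "a2"], "B": ["b"], "C": ["t"], "D": ["d"]}
friends = {"a": ["d"], "a2": ["b"], "b": ["t"]}
print(modified_geo_greedy("a", "t", friends, loc, city, residents, random.Random(0)))  # 3
```

The random fallback trades chain length for completion rate, which matches the pattern on the results slide: many more chains complete, but they are longer.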
Slide 18
Results of GEOGREEDY
o Stop if d(v,t) > d(u,t): 13% of the chains are completed; median length 4, mean length 4.12.
o If d(v,t) > d(u,t), pick a neighbor at random in the same city if possible, else stop: 80% of the chains are completed; median length 12, mean length 16.74.
f(k): the fraction of pairs in which the chain reaches t's city in exactly k steps.
Slide 19
GEOGREEDY Algorithm Conclusions
1) GEOGREEDY routes messages only to the destination city and does not suffer from the problems of voluntary participation, which explains why our completion rate is significantly higher than in earlier experiments.
2) Even under restrictive forwarding conditions (a narrow choice of actions), geographic information is sufficient to perform global routing in a significant fraction of cases.
3) In this simulated experiment, the results of the strict GEOGREEDY algorithm serve as a lower bound on the presence of short discoverable paths, and the results of the modified GEOGREEDY algorithm as an upper bound.
Slide 20
The Geographic Basis of Friendship
Because a restrictive global-routing scheme enjoys a high success rate, a question naturally arises: is there some special structure relating friendship and geography that might explain this finding?
Slide 21
The Geographic Basis of Friendship
We examine the relationship between friendship probability and geographic distance:
o $\delta = d(u,v)$: the distance between a pair of people u and v.
o $P(\delta)$: the proportion of pairs u, v separated by distance $\delta$ who are friends.
The probability that two people are friends given their distance is $P(\delta) = \varepsilon + f(\delta)$, where the geographic term $f(\delta)$ falls off roughly as $1/\delta$ and $\varepsilon$ is a constant independent of geography. For LiveJournal users who are very far apart, the friendship probability is $\varepsilon \approx 5.0 \times 10^{-6}$.
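P(δ) is estimated from data as an empirical proportion within each distance bin. A minimal Python sketch follows, assuming planar coordinates, directed friendship lists, and caller-chosen bin edges (all hypothetical; math.dist stands in for whatever distance the study used). It visits every ordered pair, which is O(n²): fine for toy data, but for 500,000 users (about 2.5 × 10^11 pairs) one would have to sample pairs instead.

```python
import bisect
import math
from itertools import permutations

def friendship_probability_by_distance(people, loc, friends, edges, dist=math.dist):
    """Estimate P(delta) for each half-open distance bin [edges[i], edges[i+1]).

    Counts ordered pairs (u, v), u != v, and the fraction of them with a link u -> v.
    Returns a list of (low, high, P), with P = None for empty bins.
    """
    pairs = [0] * (len(edges) - 1)
    links = [0] * (len(edges) - 1)
    for u, v in permutations(people, 2):
        i = bisect.bisect_right(edges, dist(loc[u], loc[v])) - 1
        if 0 <= i < len(pairs):  # pairs outside the binned range are ignored
            pairs[i] += 1
            if v in friends.get(u, ()):
                links[i] += 1
    return [(edges[i], edges[i + 1], links[i] / pairs[i] if pairs[i] else None)
            for i in range(len(pairs))]

# Hypothetical toy data: a and b are close friends, c is far from both.
people = ["a", "b", "c"]
loc = {"a": (0, 0), "b": (1, 0), "c": (10, 0)}
friends = {"a": {"b"}, "b": {"a"}}
print(friendship_probability_by_distance(people, loc, friends, [0, 5, 20]))
```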
Slide 22
As $\delta$ increases, $P(\delta)$ decreases, indicating that geographic proximity indeed increases the probability of friendship. Fig. 3A verifies that geography remains crucial in online friendship. For distances larger than 1,000 km, the background friendship probability $\varepsilon$ begins to dominate geography-based friendships.
Slide 23
Slide 24
Removing nongeographic friendships
To see only the geographic friendships, we remove the nongeographic ones from our plot by correcting for the background friendship probability: $f(\delta) = P(\delta) - \varepsilon$. $f(\delta)$ decreases smoothly as $\delta$ increases. We use only the average person's 5.5 geographic links to give a sufficient explanation of the navigable small-world phenomenon.
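The background correction, and one way to relate f(δ) to a per-person count of geographic links, can be sketched in Python. Clipping negative estimates at zero is an assumption (noisy bins can fall below ε), and summing f over everyone else is an illustrative reading of "expected geographic friends", not necessarily how the 5.5 figure was derived. Function names and toy inputs are hypothetical.

```python
import math

def geographic_component(binned, eps=5.0e-6):
    """f(delta) = P(delta) - eps for each (low, high, P) bin, clipped at 0; empty bins stay None."""
    return [(lo, hi, None if p is None else max(p - eps, 0.0)) for lo, hi, p in binned]

def expected_geographic_friends(u, people, loc, f, dist=math.dist):
    """Sum of f(d(u, v)) over everyone else: u's expected number of geographic friends under f."""
    return sum(f(dist(loc[u], loc[v])) for v in people if v != u)

# Hypothetical inputs: three distance bins and a step-shaped f.
print(geographic_component([(0, 5, 1.0), (5, 20, 0.0), (20, 50, None)], eps=0.1))
loc = {"a": (0, 0), "b": (1, 0), "c": (3, 0)}
print(expected_geographic_friends("a", ["a", "b", "c"], loc, lambda d: 0.5 if d < 2 else 0.1))
```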
Slide 25
Kleinberg's social network model
Put n people on a k-dimensional grid. Connect each person to its immediate geographic neighbors. Add one long-distance link per person, where person u links to person v with probability $\Pr[u \to v] \propto d(u,v)^{-k}$.
Slide 26
Kleinberg & Watts models
Watts et al. present a model to explain searchability in social networks based on assignments of individuals to locations in multiple hierarchical dimensions. Two individuals are socially similar if they are nearby in any dimension.
Disadvantages: although interests or occupations might be naturally hierarchical, geography is far more naturally expressed in 2D Euclidean space. Their work does not include a theoretical analysis of the model as the network size grows, nor does it include a direct empirical comparison to a real social network.
Slide 27
Kleinberg's social network model
Slide 28
Kleinberg's model & GEOGREEDY
If the probability $f(d(u,v))$ of geographic friendship between u and v is roughly proportional to $1/d(u,v)^2$, then Kleinberg's two-dimensional model explains why GEOGREEDY finds short paths.
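A small simulation illustrates Kleinberg's two-dimensional setting. The Python sketch below assumes a side × side lattice with Manhattan distance (as in Kleinberg's lattice model), four local contacts per node, and one long-range contact drawn with probability proportional to d(u,v)^(-2); greedy routing then forwards to whichever contact is closest to the target. The grid size, seed, and function names are hypothetical.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between grid points a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def kleinberg_grid(side, r=2.0, seed=0):
    """Give every node of a side x side grid one long-range contact v, Pr proportional to d(u, v)^-r."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(side) for y in range(side)]
    long_link = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        long_link[u] = rng.choices(others, weights=weights, k=1)[0]
    return long_link

def greedy_route(s, t, side, long_link):
    """Forward to the contact (4 grid neighbours + long link) closest to t; return the hop count."""
    u, steps = s, 0
    while u != t:
        x, y = u
        local = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < side and 0 <= y + dy < side]
        # Some grid neighbour is always one step closer to t, so every hop makes progress.
        u = min(local + [long_link[u]], key=lambda v: manhattan(v, t))
        steps += 1
    return steps

long_link = kleinberg_grid(20, seed=1)
print(greedy_route((0, 0), (19, 19), 20, long_link))
```

Setting r to a value other than 2 in this sketch is a quick way to see the exponent matter: Kleinberg's result is that greedy routing is efficient only when the exponent matches the grid dimension.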
Slide 29
Explaining the contradiction
Existing distance-based models predict that LiveJournal should not be navigable, yet GEOGREEDY finds short paths. Evidence of the nonuniformity of the LiveJournal population: a dot is shown for every distinct United States location home to at least one LiveJournal user. The population of each successive displayed circle increases by 50,000 people. Note that the gap between the 350,000- and 400,000-person circles encompasses almost the entire Western United States.
Slide 30
Explaining the contradiction
Friendship probability as a function of distance differs between residents of the East and West coasts. A geographic model of friendship must therefore be based on more than distance alone.
Slide 31
Why does distance fail?
Population density varies widely across the US (red and blue vertices): at the same distance, two people may be best friends in Minnesota but strangers in Manhattan.
To summarize: any model of friendship that is based solely on the distance between people is insufficient to explain the geographic nature of friendships in the LiveJournal network. A model must be based on something beyond distance alone.
Slide 32
How do we handle non-uniformly distributed populations?
Rank-Based Friendship
Instead of distance, use rank as the key geographic notion:
o When examining a friend v of u, the relevant quantity is the number of people who live closer to u than v does: $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$.
o Formally, the probability that u and v are geographic friends is $\Pr[u \to v] \propto 1/\mathrm{rank}_u(v)$.
Slide 33
Rank-Based Friendship
Rank-based friendship implies that GEOGREEDY will find short paths in any social network whose friendships are rank-based, whatever its population density. The LiveJournal network exhibits rank-based friendship.
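The rank-based rule can be sketched in Python: compute each candidate's rank from u's point of view, then sample a friend with probability proportional to 1/rank. Two details are assumptions: rank is offset by +1 so the nearest person gets rank 1 instead of 0 (avoiding division by zero), and people at exactly the same distance share a rank. Planar coordinates, math.dist, and all names are hypothetical.

```python
import bisect
import math
import random

def ranks_from(u, people, loc, dist=math.dist):
    """rank_u(v) = 1 + |{w != u : d(u, w) < d(u, v)}| for every v != u.

    The +1 offset is an assumption so the nearest person gets rank 1 instead of 0.
    """
    others = [v for v in people if v != u]
    sorted_d = sorted(dist(loc[u], loc[w]) for w in others)
    # bisect_left counts the distances strictly smaller than d(u, v).
    return {v: 1 + bisect.bisect_left(sorted_d, dist(loc[u], loc[v])) for v in others}

def rank_based_friend(u, people, loc, rng, dist=math.dist):
    """Sample one geographic friend v of u with probability proportional to 1 / rank_u(v)."""
    ranks = ranks_from(u, people, loc, dist)
    candidates = list(ranks)
    weights = [1.0 / ranks[v] for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical toy data: a and d are tied as u's nearest neighbours.
people = ["u", "a", "d", "b"]
loc = {"u": (0, 0), "a": (1, 0), "d": (-1, 0), "b": (2, 0)}
print(ranks_from("u", people, loc))
print(rank_based_friend("u", people, loc, random.Random(0)))
```

Because the weights depend only on how many people are closer, the same rule produces many short links in a dense city and longer links in a sparse region, which is exactly the density awareness that a pure distance rule lacks.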
Slide 34
A rank-based population network consists of:
o A 2-dimensional grid N of locations.
o A population P of people living at points in N ($|P| = n$).
o A set $E \subseteq P \times P$ of friendships: one local edge from each person in each of the four grid directions, plus a long-range link to a fifth person, chosen by rank-based friendship.
Population networks: locations are rounded to the nearest integral point in longitude/latitude.
Slide 35
Geographic Linking in the LiveJournal Social Network
We return to the LiveJournal social network to show that rank-based friendship holds in a real network.
Slide 36
Geographic Linking in the LiveJournal Social Network
Fig. 5A: Because the LiveJournal data contain geographic information only at the level of towns and cities, our data do not have sufficient resolution to distinguish between all pairs of ranks.
Fig. 5B: We show the same data, with the probabilities averaged over ranges of 1,306 ranks.
This experiment validates that the LiveJournal social network does exhibit rank-based friendship, which thus yields a sufficient explanation for the experimentally observed navigability properties.
Slide 37
Geographic Linking in the LiveJournal Social Network
The same data are replotted (unaveraged and averaged, respectively), correcting for the background friendship probability: we plot rank r versus $P(r) - \varepsilon$, with $\varepsilon = 5.0 \times 10^{-6}$.
Slide 38
The slopes of the lines for the two coasts are nearly the same, and the lines are much closer together than the distance versus friendship-probability lines shown in Fig. 4B. This confirms that probabilities based on ranks are a more accurate representation than distance-based probabilities.
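Comparing slopes on a log/log plot amounts to fitting a straight line to (log r, log(P(r) - ε)). The Python sketch below uses ordinary least squares on the logged values; under rank-based friendship, P(r) - ε ∝ 1/r, so the fitted slope should be close to -1 for each coast. The function name and the use of plain least squares are assumptions, not the study's stated fitting procedure.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); points with x <= 0 or y <= 0 are skipped."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    return sxy / sxx

# Hypothetical corrected probabilities that follow 1/r exactly.
ranks = [1, 10, 100, 1000]
print(loglog_slope(ranks, [2.0 / r for r in ranks]))
```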
Slide 39
Summary
The LiveJournal social network displays a surprising and variable relationship between geographic distance and probability of friendship, which is inconsistent with earlier theoretical models. The network evinces short paths discoverable using geography alone, even though existing models predict the opposite. Rank-based friendship provides two desirable properties:
o (i) it matches our experimental observations regarding the relationship between geography and friendship;
o (ii) it admits a mathematical proof that networks exhibiting rank-based friendship will contain discoverable short paths.
Slide 40
Summary
The LiveJournal social network displays a surprising and variable relationship between geographic distance and probability of friendship, which is inconsistent with earlier theoretical models. Rank-based friendship is a mechanism that has been empirically observed in real networks and that theoretically guarantees small-world properties. Watts et al. suggest that multiple independent dimensions play a role in message routing, and our results confirm this viewpoint: on average, about one-third of LiveJournal friendships are independent of geography. We have shown that the natural mechanisms of friendship formation result in rank-based friendship: people have formed relationships with almost exactly the connection between friendship and rank that is required to produce a navigable small world.