Upload
hui-zhang
View
212
Download
0
Embed Size (px)
Citation preview
Computer Networks 46 (2004) 555–574
www.elsevier.com/locate/comnet
Using the small-world model to improveFreenet performance q
Hui Zhang a,*,1, Ashish Goel b,2, Ramesh Govindan a,1
a Department of Computer Science, University of Southern California, 2642 Ellendale place #207, 941 W. 37th Place,
Los Angeles, CA 90007, USAb Departments of Management Science and Engineering and (by courtesy) Computer Science, Stanford University, CA 94305, USA
Received 11 August 2003; received in revised form 5 April 2004; accepted 18 May 2004
Available online 8 July 2004
Responsible Editor: S. Lam
Abstract
Efficient data retrieval in an unstructured peer-to-peer system like Freenet is a challenging problem. In this paper, we
study the impact of workload on the performance of Freenet. We find that there is a steep reduction in the hit ratio of
document requests with increasing load in Freenet. We show that a slight modification of Freenet�s routing table cachereplacement scheme (from LRU to a replacement scheme that enforces clustering in the key space) can significantly
improve performance. Our modification is based on intuition from the small-world models and theoretical results by
Kleinberg; our replacement scheme forces the routing tables to resemble neighbor relationships in a small-world
acquaintance graph––clustering with light randomness. Our simulations show that this new scheme improves the re-
quest hit ratio dramatically while keeping the small average hops per successful request comparable to LRU. A simple,
highly idealized model of Freenet under clustering with light randomness proves that the expected message delivery time
in Freenet is O(logn) if the routing tables satisfy the small-world model and have the size h(log2n).� 2004 Elsevier B.V. All rights reserved.
Keywords: Peer-to-peer; Algorithms; Freenet; Small-world; Cache replacement
1389-1286/$ - see front matter � 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2004.05.004
q This is an extended version of the paper appearing in the proceedings of the IEEE Infocom, 2002.* Corresponding author. Tel.: +1 323 766 9537; fax: +1 213 740 7108.
E-mail address: [email protected] (H. Zhang).1 This research was supported in part by NSF grant CCR0126347.2 This research was done while the author was at the University of Southern California, and was supported in part by NSF grant
CCR0126347 and NSF Career grant No. 0133968.
556 H. Zhang et al. / Computer Networks 46 (2004) 555–574
1. Introduction
A peer-to-peer networked system is a collabo-
rating group of Internet nodes which overlay their
own special-purpose network on top of the Inter-net. Such a system performs application-level rout-
ing on top of IP routing. These systems share some
of the same characteristics as the Internet: they can
grow to be quite large, may need to utilize distrib-
uted control and configuration, usually employ a
naming scheme that allows them to address a node
without knowing its exact whereabouts, and pos-
sess a routing mechanism that allows each nodeto meaningfully communicate with the rest of the
system. Typically a peer-to-peer network performs
a very specific function such as distributed data
storage, cache replication, multicasting, and uses
normal Internet functionality for all other pur-
poses. These systems are diverse enough that it
would not be desirable to make modifications to
the IP routing/naming protocols to support eachinstance of such a system.
Peer-to-peer systems such as Napster [20] and
Gnutella [21] have received a fair amount of atten-
tion in the non-technical literature. In a parallel
development, several research groups have de-
signed a variety of peer-to-peer systems: systems
that provide infrastructure for flexibly evolving
the Internet [13,16], and systems for large-scalenetwork storage [10], anonymous publishing [3],
and application-level multicast [2,5,6].
At the core of many of these systems [3,10,13,16]
lie innovative and novel distributed algorithms for
disseminating content. These systems can be mode-
led as distributed hash–tables; they essentially dis-
tribute Ækey,dataæ tuples across various nodes in alarge network in a manner that facilitates scalableaccess to these tuples using the key. We taxonomize
these systems into two categories: structured sys-
tems where the assignment of a key to the node
on which its corresponding data is stored is deter-
mined by the structure of the key space, and
unstructured systems where no such assignment ex-
ists. Systems like CAN [13], Chord [16], and Tapes-
try [19] are examples of the former, and Freenet [3]is an instance of the latter.
Such systems can have interesting and unantici-
pated properties. In this paper, we study the per-
formance of an unstructured system, Freenet. We
find that the hit ratio (the likelihood of finding
the datum associated with a key within a fixed num-
ber of network hops) in Freenet is crucially depend-
ent on the local policy used to manage the cache ofdata (called datastore in Freenet) and the routing
table. A standard LRU-like cache replacement pol-
icy can result in significantly low hit ratio under
heavy load (i.e., when the number of files stored
in the network is high). This is an important find-
ing. While memory is cheap these days, Freenet-
like peer-to-peer systems will always have limits
on cache sizes. Thus, examining the performancedegradation of such systems under heavy load be-
comes important, especially, for example, in the
context of understanding the immunity of these
systems to denial-of-service attacks. Moreover, this
is not merely an academic exercise; heavy routing
loads and the resulting poor performance have
been observed in the Freenet network [22] recently,
caused by users splitting large files (e.g., operatingsystem images) into hundreds of smaller files to
enable faster, parallel, downloads.
At low loads, Freenet performs well because
each node ‘‘specializes’’ in a part of the key space
[3], resulting in routing tables at nodes that are
clustered around a small number of values. At
higher loads, however, frequent replacement of
routing table entries using LRU breaks up the clus-tering in the routing table. Our solution to this per-
formance degradation relies on two observations.
First, we observe that the routing tables can be
shaped by changing the cache replacement policy
at each node; this need not include any changes
to the Freenet routing protocol and is therefore
incrementally deployable. Second, we use intuition
from the small world model [8,17,18] which saysthat the routing distance in a graph is small if each
node has pointers to its geographical neighbors
(causing clustering) as well as some randomly cho-
sen far away nodes. This intuition leads us to pro-
pose a simple clustered cache replacement scheme
with a small amount of randomization.
We then perform a simulation study to compare
the two schemes under heavy loads. Our proposedcache replacement scheme results in a significantly
higher hit ratio compared to LRU, and also a
significantly smaller number of average hops per
Fig. 1. An example of a search in a Freenet network. Node A
searches for key 8 and finally finds it in E.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 557
request. The proposed scheme and LRU result in
similar values for the average number of hops
per successful request. It is interesting that such a
small change in the system can lead to a significant
improvement.To understand this, we also develop an ideal-
ized model of Freenet under our cache replace-
ment scheme. We prove that the expected
message delivery time in this idealized model is
O(logn), provided we have h(log2n) amount ofmemory per node. Here n is the number of nodes
in the system. By comparison, more structured sys-
tems such as Chord [16], Pastry [15], Tapestry [19]achieve a delivery time of O(logn) with h(logn)memory. It is surprising that an unstructured sys-
tem such as Freenet can come so close to the struc-
tured systems, even in a highly idealized model.
The paper is organized as follows: in Section 2
we give a brief description of Freenet and the
small-world model. Section 3 presents the prelimi-
nary simulation results for the original Freenetprotocol. In Section 4 a simple enhanced-clustering
cache replacement scheme is proposed and the sim-
ulation results show that the enhanced-clustering
caching outperformed both LRU caching and the
strict enhanced-clustering caching without random
shortcuts. Section 5 presents our analytical results
and Section 6 concludes the paper.
2. Freenet and the small-world model
This section provides some background on the
design of Freenet and on the properties of small-
world graphs.
2.1. Freenet
Freenet [3] is a distributed anonymous informa-
tion storage and retrieval system. In Freenet, files
are identified by binary file keys obtained by
applying a hash function to a string that describes
the contents of the file. For this reason, we use the
words key, file, and data interchangeably in this
paper. Each node maintains a routing table whichis a set of Ækey,pointeræ pairs, where pointer pointsto a node that has a copy of the file associated with
key. A steepest-ascent hill-climbing search with
backtracking is used to locate a document. Loop
detection and aHopsToLive (Freenet�s TTL) coun-ter are added to this basic scheme to avoid request
looping and exhaustive searching.
Freenet�s routing algorithm is best described
using an example. Fig. 1 shows how a request mes-
sage is routed in Freenet. A request for key 8 is ini-
tiated at node A. Node A forwards the request to
node B, which forwards it to node C since in B�srouting table C is the node who has the closestkey to key 8. Node C is unable to contact any
other nodes and returns a backtracking ‘‘request
failed’’ message to B. Node B then tries its second
choice, D. Node D finds key 8 in its routing table
and forwards the request to the corresponding
node––E. The data is returned from E via D and
B back to A, which ends this request sequence.
The data is cached on D, B and A. An entry forkey 8 is also created in the routing tables of D, B
and A. Data inserts follow a similar strategy to re-
quests [3].
In this algorithm, there is no global consensus
on where a document should be stored. Freenet
chose an unstructured key space architecture in or-
der to achieve anonymity and deniability. How-
ever, the request routing, data insertion, anddata caching algorithms in Freenet are designed
so that, over time, the key space becomes increas-
ingly structured and the routing tables converge to
a state where most of the queries are answered suc-
cessfully and quickly.
In addition to appropriately structuring theFree-
net routing tables, essential to system performance
3 The simulator can be downloaded from http://net-
web.usc.edu/~huizhang/freenet.html
558 H. Zhang et al. / Computer Networks 46 (2004) 555–574
in Freenet is its datastore. As described in Freenet
protocol [3], when a file is ultimately returned (for-
warded) for a successful retrieval (insertion), the
node caches the file in its datastore, passes the data
to the upstream (downstream) requester, and cre-ates a new entry in its routing table associating
the actual data source with the requested key.
When a new file arrives (from either a new insert
or a successful request) which would cause the
datastore to exceed the designated size, the least
recently used (LRU) files are evicted in order until
there is room. Routing table entries are also even-
tually replaced using an LRU policy as the tablefills up. Note that the size of the routing table is
chosen with the intention that the entry for a file
will be retained longer than the file itself.
2.2. Small-world model
The small-world phenomenon is pervasive in
networks arising from society, nature, and tech-nology. In many such networks, empirical obser-
vations suggest that any two individuals in the
network are likely to be connected through a short
sequence of intermediate acquaintances [9,17]. One
network construction that gives rise to small-world
behavior is where each node in the network knows
its physical neighbors, as well as a small number of
randomly chosen distant nodes. The latter repre-sent shortcuts in the network. It has been shown
that this construction leads to graphs with small
diameter, leading to a small routing distance be-
tween any two individuals [17,18]. Kleinberg [9]
defined an infinite family of network models that
naturally generalized the model in [18] and then
proved that there is exactly one model within this
family for which a decentralized algorithm existsto find short paths with high probability. In this
model, the probability of a random shortcut being
a distance x away from the source is proportional
to 1/x in one dimension, proportional to 1/x2 in
two dimensions, and so on.
2.3. Freenet and the small-world phenomenon
Freenet�s routing mechanisms were designed toevolve the network into a state with low average
path lengths and high hit ratios. In particular, in
[7] it is shown that the median path length in a
Freenet network scales logarithmically with the
size of the network and its clustering coefficient
is high, both defining characteristics of small-
world networks. However, those simulations wereconducted under low load. In their 1000-node sim-
ulation, the average number of files inserted into
the network by one node is only 2.5. We were able
to reproduce the logarithmic relationship observed
in [7] at these low loads using our simulator.
However, in this paper, we focus on the behav-
ior of Freenet under heavy loads. We show that, in
this regime, Freenet�s performance degrades con-siderably. Then, we use intuition from the small-
world model to design a cache replacement scheme
that attempts to preserve Freenet�s performanceeven under heavy loads.
3. Simulating Freenet performance under heavy load
Freenet�s design focus on anonymity makes ithard to measure the global network directly, a fact
already observed by the designers of Freenet [3].
For this reason, we resort to simulation to study
the performance of Freenet. We implemented a
stand-alone simulator 3 that mimics document
generation, storage, routing, and retrieval in Free-
net. In this section, we state the assumptions madein our simulation, define the metrics that we use in
our evaluation of Freenet and present our simula-
tion results.
Much of this paper is devoted to the study of
Freenet under heavy ‘‘loads’’. Our measure of load
is the average number of files inserted by a node
into the system, since we are interested in investi-
gating the impact of cache replacement strategies.
3.1. Simulation assumptions
Lacking data about actual file insertion work-
loads, we assume that all nodes generate data files
and send requests at the same rate. Furthermore,
we assume that all nodes have the same size of
0
0.2
0.4
0.6
0.8
1
5 10 15 20 25
Req
uest
Hit
Rat
io
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60)
LRU
Fig. 2. The curve of hit ratio vs. load for Freenet with LRU
scheme.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 559
datastore and routing table. While these assump-
tions are somewhat unrealistic, they enable us to
understand the impact of cache limits more easily.
We also examine how deviations from these
assumptions impact our conclusion.In our simulations, we do not model the impact
of node failures. Node failures are somewhat or-
thogonal to the impact of cache limits, and we
do not expect that they will alter our basic find-
ings. Node failures can impact the convergence
time of the Freenet system to a ‘‘good’’ state, how-
ever.
3.2. Performance metrics
Intuitively, the performance of a routing system
such as Freenet is well captured by the average
path length required to access a data item. In prac-
tice, Freenet imposes a HopsToLive on data ac-
cesses. Therefore, the performance of the system
is better represented by two metrics: request hit ra-tio and average hops per request.
The former is defined as the ratio of the number
of successful requests to the total number of re-
quests made. For reasons that will become later,
we define two variants of the latter metric. Average
hops per request is the ratio of the total number of
hops incurred across all requests to the number of
all requests. When a request fails, it is deemed tohave incurred HopsToLive number of hops. Aver-
age hops per successful request is defined similarly,
but only for requests that are successful.
3.3. Freenet performance under varying loads
In this section, we illustrate the performance of
Freenet under heavy load using a simple simula-tion. The duration of the simulation was 30,000
time steps, and the network had 300 nodes. All
files are of the same size (we call it uniform size dis-
tribution). Each node had a datastore limit of 60
files, a routing table limit of 110 files. The initial
topology of the system is a ring: each node has
pointer to two neighbors. This initial topology is
imposed by Freenet routing tables, and need nothave any relation to the underlying physical Inter-
net topology. Each request is limited to 40 hops.
Each node randomly generates and inserts a key
(i.e., a file) with probability K per time step in
the first 200 time steps (K varies in the range
[0.005,0.13]). All insertions are stopped after time
200. Each node generates a request for a random
key with probability R=0.002 per time stepthroughout the whole simulation. The document
popularity has an uniform distribution, i.e., all files
have the same probability to be requested.
The simulation result in Fig. 2 shows that re-
quest hit ratio decreases significantly with an in-
crease in the loads in the network. To check
whether this behavior was primarily due to cache
or routing table size limits, we repeated the abovesimulation but increased the datastore size to 180
and the routing table size to 230, or kept the data-
store size as 60 but increased the routing table size
to 240. The simulation results in Fig. 3 shows two
curves with the same shape. In addition, we did the
same simulation with the document popularity as
Zipf distribution and the document size as Zipf
distribution (the Zipf parameter a=1), and stillgot the qualitatively same curves as shown in
Fig. 4.
In an attempt to understand this rapid degrada-
tion with loads, we plot the key distribution (at the
end of the simulation) of an arbitrarily chosen
Freenet node�s routing table under light and heavyloads. Fig. 5 shows that under light loads, the keys
at this node tend to be clustered close to key num-ber 350,000 (approximately). But this clustering
0
0.5
1
1.5
2
0 200000 400000 600000 800000 1e+06Key value
Key Distribution in Routing Table under Heavy Load (LRU)
Fig. 6. The local clustering phenomenon becomes weak under
heavy load (in this case average number of keys generated per
node=20).
0
0.2
0.4
0.6
0.8
1
5 10 15 20 25
Req
uest
Hit
Rat
io
# of keys generated per node
RING TOPOLOGY (300 nodes)
LRU: cache size =180, routing table size =230LRU: cache size =60, routing table size =240
Fig. 3. Simulation with a larger datastore and/or routing table.
We see the hit ratio still decreased rapidly with the increasing of
the loads.
0
0.2
0.4
0.6
0.8
1
5 10 15 20 25
Req
uest
Hit
Rat
io
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60)
Zipf file populairtyZipf file size distribution
Fig. 4. The curves of hit ratio vs. load for Freenet with LRU
scheme, under Zipf file popularity or size distribution.
0
0.5
1
1.5
2
0 200000 400000 600000 800000 1e+06Key value
Key Distribution in Routing Table under Light Load (LRU)
Fig. 5. Local clustering is obvious under light load (in this case
average number of keys generated per node=2).
560 H. Zhang et al. / Computer Networks 46 (2004) 555–574
characteristic vanishes with an increase in the
number of files (keys) generated by each node (as
shown in Fig. 6), and the keys are more uniformlydistributed.
This disappearance of high local clustering ap-
pears to be responsible for the significantly low
hit ratio. Intuitively, the HopsToLive mechanism
(which assures reasonable latency) prevents
exhaustive search and Freenet nodes depend on lo-
cal clustering to direct the search for a given key. If
this intuition were true, we would expect that acache management strategy that preserves key
clustering under heavy loads would exhibit a much
more graceful system degradation. We investigate
such a strategy in the next section.
4. Enhanced-clustering cache replacement scheme
We are now left with an interesting problem:
how can we improve the routing performance of
the Freenet protocol efficiently without affecting
its design goals? Increasing the cache size or
HopsToLive value may not always be possible ordesirable. The cache-size is locally administered
Random key(shortcut)
s (x)
Clustered keys
Key Space
Fig. 7. For the routing table at node x to conform to the small-
world model, we need a set of key entries clustered around some
key s(x) and one or more randomly chosen shortcut keys.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 561
since it depends on individual system resources.Increasing the HopsToLive value could increase
the hit ratio, but at the expense of significantly in-
creased access latency. From Fig. 7, the crucial
observation is that we can achieve our perform-
ance goals by preserving key clustering in the
cache and this can be done passively by changing
the cache replacement policy without changing
the Freenet routing protocol or sacrificing ano-nymity and deniability. Recall that when a data-
store is full at a node, the node discards the least
recently used files and creates a new Ækey,pointerætuple in the routing table for the new file to be ca-
ched. In order to shape the routing table, we use
the following enhanced-clustering cache replace-
ment scheme instead of LRU (note that LRU is
still used for routing table replacement).
4.1. Enhanced-clustering cache replacement scheme
The enhanced-clustering cache replacement
scheme consists of two parts:
1. Each node x chooses a seed s(x) 4 randomly
from the key space S when it joins the system.2. When the datastore at a node x is full and a new
file with the tuple ÆKey u,DataSource dæ arrivesdue to either a new insertion or a successful re-
quest, the node finds out in the current data-
store the file with key v farthest from the seed
in terms of the distance in the key space S.
4 s(x) can be the globally unique identifier assigned to x in
original Freenet protocol.
Distanceðv; sðxÞÞ¼ maxw2Datastore
Distanceðw; sðxÞÞ:
(a) If Distance(u, s(x))>Distance(v, s(x)), with
the probability p (randomness) x caches the
new file, creates one entry Æu,dæ for it in therouting table, and evicts the old file associ-
ated with key v. This has the effect of creat-
ing a few random shortcuts.
(b) If Distance(u, s(x)) 6 Distance(v, s(x)), withthe probability 1�p x caches the new file,
creates one entry Æu,dæ for it in the routingtable, and evicts the old file associated with
key v. This has the effect of clustering the
keys in the routing table around the seed
of the node.
(c) Otherwise, the file is not locally cached and
no routing table entry is created for it.
4.2. Comparison of three cache replacement
schemes
We now investigate whether this simple replace-
ment scheme performs well under heavy load. To
calibrate its performance, we compare it with the
LRU cache replacement scheme. In addition, toseparately understand the contribution to per-
formance of clustering and random shortcuts, we
evaluate a third alternative that we call strict en-
hanced-clustering. Strict enhanced-clustering uses
the algorithm defined above, but with p set to 0:
this disables random shortcuts and effectively
shapes the routing table to make the network a
regular graph.In our evaluations, enhanced-clustering uses
p=0.03 and keeps small number of random short-
cuts in the routing table. Intuitively, this scheme
should make Freenet look more like a small-world
network. The probability p=0.03 was chosen in
order to achieve a combination of high hit ratio
and low average hops per request for the range
0 6 p 6 1 in simulations. It remains an interestingopen problem to determine the best value of p as a
function of the network and load characteristics.
4.2.1. Performance under heavy load
We repeated the simulation in Section 3 with
uniform file size and popularity distributions.
Fig. 8 shows the availability of the system (i.e.,
0
5
10
15
20
25
30
35
40
0 5 10 15 20 25
Avg
. Hop
s P
er R
eque
st
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 9. Average hops per request for Freenet with LRU, strict
enhanced-clustering and enhanced-clustering.
0
5
10
15
20
25
0 5 10 15 20 25
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 10. Average hops per successful request for Freenet with
LRU, strict enhanced-clustering and enhanced-clustering.
5 Of the few random shortcuts in the routing table, most are
the pointers for the keys generated by the local node itself,
leaving the real random shortcuts not nearly enough to impact
performance.
0
0.2
0.4
0.6
0.8
1
0 5 10 15 20 25
Req
uest
Hit
Rat
io
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 8. Availability of Freenet with LRU, strict enhanced-
clustering and enhanced-clustering.
562 H. Zhang et al. / Computer Networks 46 (2004) 555–574
the request hit ratio) as the load increases. Fig. 9
shows the average hops per request. These figures
show that enhanced-clustering techniques in gen-
eral result in higher availability and lower average
hops per request compared to LRU. Also, en-hanced-clustering significantly outperforms both
LRU and strict enhanced-clustering. Fig. 10 illus-
trates another interesting phenomenon. In this fig-
ure, we plot the average number of hops per
successful request. The original LRU scheme per-
forms much better than strict enhanced-clustering
in this metric. However, enhanced-clustering com-
pares well with LRU for this metric.
4.2.2. Key distribution in routing tables
To understand these performance results, we
draw the key distribution (at the end of the simu-
lation) in an arbitrarily chosen Freenet node�srouting table under heavy load for strict en-
hanced-clustering and enhanced-clustering. (Figs.11 and 12). Fig. 11 shows that the network under
strict enhanced-clustering can be modeled as a reg-
ular graph, where each node has pointers to a few
neighbors. 5 By contrast, Fig. 12 shows that en-
hanced-clustering does shape the routing table to
approximate the one in Fig. 7.
4.2.3. Hop distribution of successful requests
To get more insight into the performance differ-
ence among the three alternatives, we consider the
hop distribution of successful requests (shown in
Fig. 13) under heavy loads (26 keys per node).
While enhanced-clustering achieved much more
successful requests than LRU did (as 10,958 vs.
1396), they both have skewed hop distributions
such that most successful requests are resolved ina few hops. By contrast, strict enhanced-clustering
shows a uniform hop distribution, although the
number of its total successful requests (5781) is
much closer to that of enhanced-clustering than
0
0.5
1
1.5
2
0 200000 400000 600000 800000 1e+06Key value
Key Distribution in Routing Table under Heavy Load (Enhanced-clustering)
Fig. 12. The keys in the routing table cluster around the seed of
this node and some random keys distribute in the key space (in
this case average number of keys generated per node=20).
0
200
400
600
800
1000
1200
1400
1600
0 5 10 15 20 25 30 35 40
Num
ber
of R
eque
sts
Hops
Hop number Distribution of Successful Requests work load = 26 keys per node
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 13. Hop distribution of successful requests: heavy loads.
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
0 5 10 15 20 25 30 35 40
Num
ber
of R
eque
sts
Hops
Hop number Distribution of Successful Requests work load = 2 keys per node
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 14. Hop distribution of successful requests: light loads.
0
0.5
1
1.5
2
0 200000 400000 600000 800000 1e+06Key value
Key Distribution in Routing Table under Heavy Load (Strict enhanced-clustering)
Fig. 11. The keys in the routing table cluster around the seed of
this node (in this case average number of keys generated per
node=20).
H. Zhang et al. / Computer Networks 46 (2004) 555–574 563
LRU. This explains the results of Fig. 10. At low
loads, this performance difference is not evident
(Fig. 14) and all distributions overlap with each
other.
4.2.4. Other simulation scenarios
We ran the simulations with different initial top-
ologies (ring+random, tree, tree+random,
star+random, random), varying number of nodes
(300–3000), different values of HopsToLive (40–
100), varying cache sizes (50–200), and varying
routing table sizes (100–240). All simulation re-
sults showed qualitatively similar trends. For
example, Figs. 15 and 16 show the comparison
of three schemes with relatively large routing ta-
bles (routing table size is 240), which is qualita-tively the same as what we saw in Figs. 8 and 10.
4.2.5. Discussion
We now present some additional intuition
about why the enhanced-clustering with random
shortcuts performs well. Fig. 6 shows that the
routing table of Freenet is not very clustered when
LRU is used as the cache-replacement scheme.This is analogous to random graphs. It is well
known that random graphs have small diameters
as long as the number of edges is sufficiently large
[1]. However, since the edges are drawn at random,
0
2
4
6
8
10
12
14
16
18
0 5 10 15 20 25
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60, routing table size =240)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 16. Average hops per successful request for Freenet with
LRU, strict enhanced-clustering and enhanced-clustering: large
routing table size.
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 5 10 15 20 25
Req
uest
Hit
Rat
io
# of keys generated per node
RING TOPOLOGY (300 nodes, cache size =60, routing table size =240)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 15. Availability of Freenet with LRU, strict enhanced-
clustering and enhanced-clustering: large routing table size.
564 H. Zhang et al. / Computer Networks 46 (2004) 555–574
it is hard to use local rules to travel from a given
node to the node which contains the desired key.
Not surprisingly, under heavy load, LRU has a
low hit ratio. Fig. 11 shows that the routing table
for Freenet with strict enhanced-clustering is com-
pletely clustered. This corresponds to a highly reg-
ular graph, where each node is connected to its
neighbors. It is very easy to get from a given nodeto a desired node in such a graph. However, the
high clustering implies that each hop travels only
a small distance in the key-space, which results in
a high number of average hops. Fig. 12 shows
that the enhanced-clustering with random short-
cuts results in a small-world like graph. The
high clustering in this graph makes it easy to use
local rules to arrive at a desired node; at the same
time, the random shortcuts sometimes allow us to
travel large distances in the key-space using onehop.
Kleinberg showed that for efficient routing, a
node needs to choose a shortcut at a distance x
with probability proportional to 1/x [9]. This can
also be implemented using the following local
replacement rule: suppose there are two keys u
and v which are contending for being the shortcut
keys out of a Freenet node Y. Let xu and xv betheir distances from the Y�s seed key. Then wewould keep u with probability xv/(xu+xv) and keep
v with probability xu /(xu+xv). However, we have
chosen to implement the simpler scheme outlined
earlier in this section. It is important to mention
that this, and other simple variants, could also give
similar results as enhanced-clustering. Also, LRU
may well have other practical advantages thatwould override the improvements presented in this
section. However, the point of the paper is that en-
hanced-clustering enables the system to preserve
its small-world property regardless of workload,
and this is a desirable design.
4.2.6. Zipf distribution for file size and popularity
While uniform file size and popularity distribu-tions make it to understand the performance of
our schemes, practical file size and popularity dis-
tributions are significantly skewed. To see how
network performance evolves with time when the
document size and/or popularity are Zipf-ian, we
repeated the simulation on a 900-node network
with the ring as initial topology, using a load of
20 keys per node. The total simulation time was60,000 time steps so as to make sure the network
performance was stable at the end of the simula-
tion. We choose the Zipf parameter a as 1. Figs.17 and 18 show the document access distribution
and size distribution in the simulations. For Zipf
size distribution, we set the file size range from 1
unit to 180 units so that the average file size is
about 3 units. The datastore size is also set to180 units so that the cache capacity of each node
is around 60 files in average, comparable to that
in the simulations with uniform size distribution.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25 30 35 40
Req
uest
Hit
Rat
io
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 60 files, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 19. Evolution of hit ratio for Freenet: uniform file size and
uniform file popularity.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25 30 35 40
Req
uest
Hit
Rat
io
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 60 files, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 20. Evolution of hit ratio for Freenet: uniform file size and
Zipf file popularity.
1
10
100
1000
10000
1 10 100 1000 10000
Acc
ess
Tim
es (
in lo
g)
Ranking of documents by access times (in log)
Zipf File Access Popularity
in simulation
Fig. 17. Document access popularity: Zipf distribution.
0.0001
0.001
0.01
0.1
1
1 10 100 1000
Pro
b[fil
e f’s
siz
e >
X] (
in lo
g)
File size X (in log)
Zipf File Size Distribution
in simulation
Fig. 18. Document size: Zipf distribution.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 565
In all simulations, we assume a file�s size and itspopularity are not correlated.
Since files sizes are different, we had to amend
each of our cache replacement strategies to evict
multiple files in order to cache a new file. Once a
new file is decided to be cached, the old files will
be evicted one by one according to their closeness
(based on either time in the case of LRU or virtualdistance in the case of enhanced-clustering) until
enough room is ready for the new file.
We ran the simulations over all four possible
distribution combinations: uniform file size and
uniform file popularity, uniform file size and Zipf
file popularity, Zipf file size and uniform file pop-
ularity, Zipf file size and Zipf file popularity. Figs.
19–22 show the evolution of hit ratio in Freenetalong with the time, and Figs. 23–26 show the evo-
lution of average hops per successful request.
As we can see, the performance of both LRUand enhanced-clustering does not change dramati-
cally under different distributions. They both
achieved smaller average hops per successful re-
quest under the Zipf document popularity. This
is because the local caching actions in Freenet
makes the popular documents pervasive and easy
to be accessed. However, LRU achieved a poor re-
quest hit ratio in all cases which means it is verydifficult to find out unpopular documents in Free-
net with LRU scheme. By contrast, enhanced-clus-
tering resulted in a significantly better hit ratio
than LRU while keeping the average hops per suc-
cessful request close to that of LRU.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25 30 35 40
Req
uest
Hit
Rat
io
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 22. Evolution of hit ratio for Freenet: Zipf file size and
Zipf file popularity.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25 30 35 40
Req
uest
Hit
Rat
io
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 21. Evolution of hit ratio for Freenet: Zipf file size and
uniform file popularity.
0
5
10
15
20
0 5 10 15 20 25 30 35 40
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 60 files, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 23. Evolution of average hops per successful request for
Freenet: uniform file size and uniform file popularity.
0
5
10
15
20
0 5 10 15 20 25 30 35 40
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 60 files, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 24. Evolution of average hops per successful request for
Freenet: uniform file size and Zipf file popularity.
566 H. Zhang et al. / Computer Networks 46 (2004) 555–574
Surprisingly, strict enhanced-clustering is sensi-
tive to file size distribution. Actually, it performed
the same as enhanced-clustering under Zipf size
distribution. To illustrate this phenomenon
clearly, we varied the Zipf parameter a from 1 to
15 so that file size distribution shifted from Zipf
to uniform distribution. As shown in Figs. 27
and 28 (file popularity is uniform in the simula-tion), strict enhanced-clustering resembled en-
hanced-clustering when the files in the network
are heterogeneous in terms of size. Since short
path length in a small-world network is due to
the shortcuts, there must be shortcuts created in
Freenet with strict enhanced-clustering even
though it is not done explicitly. We believe the rea-
son for this ‘‘spontaneous shortcutting’’ is due to
the cascading evicting strategy we adopted.
Remember that the only criteria when evicting
old files is the proximity in key space. With Zipfdistribution the total size of evicted files will com-
monly be greater than the size of the new cached
file. That means normally there will be some room
left to cache small random files after each normal
caching, which leads to the ‘‘spontaneous short-
cutting’’.
As a summary, we see that enhanced-clustering
enables the system to preserve its small-world
0
5
10
15
20
2 4 6 8 10 12 14
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Parameter a: Pr.[Size = x] = cx^-(a+1)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 10 keys per node)
Strict enhanced-clusteringEnhanced-clustering
Fig. 27. Average hops per successful request for Freenet:
varying Zipf parameter a.
0
5
10
15
20
0 5 10 15 20 25 30 35 40
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 26. Evolution of average hops per successful request for
Freenet: Zipf file size and Zipf file popularity.
0
0.2
0.4
0.6
0.8
1
1.2
2 4 6 8 10 12 14
Req
uest
Hit
Rat
io
Parameter a: Pr.[Size = x] = cx^-(a+1)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 10 keys per node)
Strict enhanced-clusteringEnhanced-clustering
Fig. 28. Hit ratio for Freenet: varying Zipf parameter a.
0
5
10
15
20
0 5 10 15 20 25 30 35 40
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Time Steps (X 1500)
RING TOPOLOGY (900 nodes, cache size = 180 units, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 25. Evolution of average hops per successful request for
Freenet: Zipf file size and uniform file popularity.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 567
property regardless of the distribution type of filesize and popularity.
4.2.7. The impact of ResetDataSource probability
While key clustering in the datastores and the
routing tables is the desired status for Freenet
[3], there needs to be some way to protect the den-
iability/anonymity of data holders. In original
Freenet, the deniability (the ignorance of the con-tents in their datastore) of data holders relies on
data encryption and file splitting. As for the ano-
nymity of data holders, it primarily relies on the
ResetDataSource probability and small HopsToL-
ive perturbation. With the ResetDataSource prob-
ability, an intermediate node on the request proxy
chain will reset the returned data file�s DataSource
address with its own address. Due to smallHopsToLive perturbation, an intermediate node
cannot conclude whether or not its previous hop
is the message originator or forwarder. Clearly,
data encryption and file splitting have no direct
impact on Freenet routing performance. Small
HopsToLive perturbation also has trivial effect
on it. Therefore, in the following we only investi-
gate the factor ResetDataSource probability.We repeated the 900-node simulation in Section
4.2.6 with uniform file size and uniform file popu-
larity, and varied the ResetDataSource probability
0
0.2
0.4
0.6
0.8
1
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Req
uest
Hit
Rat
io
ResetSource probability
RING TOPOLOGY (900 nodes, cache size =60, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 29. Hit ratio for Freenet: varying ResetDataSource prob-
ability.
0
5
10
15
20
25
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
ResetSource probability
RING TOPOLOGY (900 nodes, cache size =60, load = 20 keys per node)
LRUStrict enhanced-clustering
Enhanced-clustering
Fig. 30. Average hops per successful request for Freenet:
varying ResetDataSource probability.
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1
Req
uest
Hit
Rat
io
Precentage of Freenet Nodes using enhanced-clustering scheme
2000 nodes, cache size = 60 files, load = 12 keys per node
Zipf file popularity
Fig. 31. Hit ratio vs. percentage of enhanced-clustering-ena-
bled nodes.
568 H. Zhang et al. / Computer Networks 46 (2004) 555–574
from 0.05 (the default value in Freenet configura-
tion) to 0.5. Figs. 29 and 30 show the stable Free-
net performance (i.e., the average during the last
1500 simulation time steps) as a function of the
ResetDataSource probability. As expected, the Re-
setDataSource probability has little impact on the
routing performance of LRU and strict enhanced-
clustering because routing table information is notuseful at all in the former scheme, and the Data-
Source information is not used after the first few
hops in the latter scheme. However, the perform-
ance of enhanced-clustering deteriorates with the
increasing ResetDataSource probability, especially
for the metric average hops per successful request.
This is because of the ‘‘shortening’’ shortcut span
effect caused by intermediate DataSource ad-
dresses. Despite that, enhanced-clustering main-
tained reasonable routing path length even under
a ResetDataSource probability as large as 0.5,and achieved a much higher hit ratio than the
other two schemes did.
4.2.8. Incremental deployment of enhanced-cluster-
ing scheme
Finally, we evaluated the incremental deploy-
ability of enhanced-clustering in Freenet, and spe-
cifically the performance of networks where arandom fraction x of nodes used enhanced-cluster-
ing for cache replacement, while all other nodes
used LRU replacement. Such a scenario more clo-
sely parallels how the new replacement scheme
would be deployed in an operational network. A
key criterion is that incremental deployment of
the new scheme does not degrade the performance
of the network, since this would otherwise theincentive for adoption of the scheme.
We conducted our simulations on a 2000-node
network, using a moderate load of 12 keys per
node. The total simulation time was 60,000 time
steps so as to make sure the network performance
was stable at the end of the simulation. We varied
x from 0 to 1.
Figs. 31 and 32 show the stable Freenet per-formance (the average during the last 1500 simula-
tion time steps) as a function of x. As we can see,
0
1
2
3
4
5
0 0.2 0.4 0.6 0.8 1
Avg
. Hop
s P
er S
ucce
ssfu
l Req
uest
Precentage of Freenet Nodes using enhanced-clustering scheme
2000 nodes, cache size = 60 files, load = 12 keys per node
Zipf file popularity
Fig. 32. Avg. hops per successful request vs. percentage of
enhanced-clustering-enabled nodes.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 569
the hit ratio of Freenet improved quickly with the
increase of enhanced-clustering enabled nodes.More encouragingly, a relatively small deployment
of enhanced-clustering improves Freenet hit ratios
dramatically. For example, with 20% enhanced-
clustering nodes, the hit ratio was improved more
than twice of that in a pure LRU Freenet, while
the average hops per successful request was in-
creased by only about 20%. With 50% deployment,
the hit ratio has reached above 90% of the hit ratioin a pure enhanced-clustering Freenet, while the
average hops per successful request was increased
only from 2 to 3. These results indicate that en-
hanced-clustering can actually incentivize its own
adoption, since even small deployments can signif-
icantly improve performance.
Node X Node Y
V seed(Y)
2K keys clustering around seed(Y)
Node X Node Y
Fig. 33. Node-to-node links are created by transit keys.
5. Analysis
The analytical results we present in this section
are an attempt to formally demonstrate that thesmall-world model can lead to good results in an
idealized Freenet-like scenario. These results are
not hard evidence of the superiority of our scheme,
but do give us some intuition for the performance
of enhanced-clustering.
5.1. An idealized network model
We consider an idealized model of Freenet
which assumes that the seeds are chosen so that
they are distributed evenly in the key space and
the routing tables are shaped to fit the small-world
hypothesis, i.e., the routing tables at each node
have the structure shown in Fig. 7. Assume each
node has a (2K+S)-files-sized data store and a(2M+S)-entries routing table.M should be no less
than K since the routing table should maintain at
least the pointers to the (2K+S) files in the data
store.
In the idealized model each node X caches the
files for the 2K keys centered at s(X) and S ran-
domly chosen keys. The routing table of X will
keep entries for the 2M keys centered at s(X) andthe S randomly chosen keys. Fig. 33 shows how
the links are created from one node to another.
The arrow from a key to a node means this node
contains this key in its data store and therefore
claims itself as the source of this key. The arrow
from a node to a key means this node has an entry
for this key in its routing table. A link exists from
nodes X to node Y if and only if a transit key u ex-ists between them and is called X!u!Y. For
example, in Fig. 33 Node X has a link to Y via
the transit key v. When searching for a key k, a
link X!u!Y is called a ‘‘misleading’’ link if u
is the closest key to k in the routing table of X
but Y cannot find a closer key v to k than u so that
Y can forward the request to v�s correspondingnode. Misleading links may exist in Freenet butwe will not consider the effect of misleading links
until the next subsection. Rather, we use an ideal-
ized model at the node level as shown in Fig. 34. In
Fig. 34 the solid links represent the local contacts
Node 1 Node i 1 Node i Node i+1 Node N
N nodes in the network
Fig. 34. An idealized model of Freenet from the node level. The
dotted edges correspond to shortcut connections.
570 H. Zhang et al. / Computer Networks 46 (2004) 555–574
between each pair of neighbors via their over-
lapped routing tables and the dotted links depictthe random shortcuts the nodes choose. Each node
has two pointers to its two neighbors and a link
pointing to a node randomly chosen in the net-
work. We assume that all nodes lose information
about the initial inserters of the files eventually.
The source of any file (key) is known to everyone
as the current holder of this file.
Theorem 1. In the idealized network model, if each
node x has S= logn random shortcuts, and x chooses
its random shortcuts so that any random shortcut
has an endpoint y with probability proportional to
1/dxy where dxy = js(x)� s(y)j, then the expectednumber of hops required to find a document in this
idealized system is O(logn) using Freenet�s search
algorithm. Here n is the number of nodes in the
system.
Proof. Analogous to the proof in [9], we start
by calculating the probability that node v is cho-
sen by node u for its random shortcut i (1 6 i 6logn):
d�1uvP
x6¼ud�1ux
Pd�1uvPn=2
i¼12i�1
P1
2 lnð3nÞduv:
Now, we say that the execution of the searching
for the target k is in phase j when the distance of
the current node to k is greater than 2j and at most
2j+1. Therefore, the initial value of j is at most
log(n/2). Suppose we are now in phase j,
log(logn) 6 j 6 logn�1, and the current node
holding the request is u, how long will it take to
end the search in phase j and enter the next phase?This requires the request to enter the set Bj of
nodes within distance 2j of the target k. There are
1þX2ji¼12 > 2jþ1
nodes in Bj, each is within the distance
2j+1+2j<2j+2 of u, and therefore each has a prob-
ability of at least logn/(2 ln (3n)2j+2) of being one
shortcut of u. Thus the probability that u can find
a node in Bj to forward the request to is at least
2jþ1 log n
2 lnð3nÞ2jþ2> 1
8ðwhen n � 1Þ:
Let Xj denote total number of steps spent in phase
j, log(logn) 6 j 6 logn�1. We have
E½X j ¼X1i¼1
Pr½X j P i 6X1i¼1
1� 18
� �i�1 ¼ 8:
Finally, if 0 6 j 6 log(logn), E[Xj] is still O(1) with
the analogous analysis. Now we use X to denote
the total number of steps spent during the whole
search. By definition,
X ¼Xlog n�1j¼0
X j:
By the linearity of expectation, we have
E½X ¼Xlog n�1j¼0
E½X j ¼ Oðlog nÞ: �
5.2. Idealized model with misleading links
Theorem 1 holds only if the distance from the
request to target k decreases strictly in each step
during the search. However, this condition may
not exist when there are misleading links in Free-net. In this subsection, we will consider the possi-
bility that a misleading link may be chosen at
some step of a search in Freenet. We assume that
while searching for key k, some node X forwarded
the request to Y. The node X must have had an en-
try of the form Æk 0,Yæ in its routing table, while k 0
must be one of the keys in Y�s data store. Let usassume that k 0 can be any of the (2K+S) keyswhich are present in the data store of Y with equal
probability. Clearly, Y will now find a tuple Æk*,Væ
searching direction
searching direction
seed(Y)
2K keys clustering around seed(Y)
Node X Node Y
seed(Y)
2K keys clustering around seed(Y)
Node X Node Y
(a)
(b)
Fig. 35. Two cases in which the request is forwarded through a
misleading link. (a) A request is passed from X to Y through a
key far away from seed. (b) A request is passed from X to Y
through the rightmost key of Y.
H. Zhang et al. / Computer Networks 46 (2004) 555–574 571
in its routing table such that k* is closest to k, and
then forward the request to node V. But there are
two cases in which Y cannot find such a node V:
� Key k 0 is one of the S shortcuts of Y. In thiscase all other keys may be far away from the
target key k (shown in Fig. 35a);
� Key k 0 is the closest key to k in Y�s 2K clusteredkeys (shown in Fig. 35b).
Then, the possibility that a misleading link is
chosen is equal to (S+1)/(2K+S). Next we will
prove that Freenet�s performance (logarithmicnumber of hops) in Theorem 1 can be maintained
even under the effect of misleading links as long as
a polylogarithmic-sized routing table/datastore is
provided in each node.
Theorem 2. In the idealized network model
considering the effect of misleading links, if K P32 log2(n) and each node x chooses its logn random
shortcuts so that any random shortcut has an
endpoint y with probability proportional to 1/d
where dxy = js(x)� s(y)j, then the expected number
of hops required to find a document in this idealized
system is O(logn) using Freenet�s search algorithm.
Here n is the number of nodes in the system.
Proof. From Theorem 1 we know a search will end
in at most 8 logn steps on the average when there isno effect of misleading links. We count 16 logn
steps as one run. If no misleading links are encoun-
tered during a run, then using Theorem 1 and
Markov�s inequality [11], we know that the proba-bility of finding the key in that run is at least 1/2.
Let q be the probability of encountering nomislead-
ing link during a run, and let p be the probability
that that run is successful under the effect of mis-leading links. Then p P q/2. Also, the possibility
that a misleading link is chosen at one step is equal
to (S+1)/(2K+S)<1/(32 log n), which leads to
qP 1� 1
32 log n
� �16 log n¼
ffiffiffi1
e
rðwhen n � 10Þ:
Therefore, p P 0.303. Let X denotes the total
number of runs to find a document. We have
E½X ¼X1i¼1
Pr½X P i 6X1i¼1
ð1� pÞi�1 ¼ 1p� 3:3:
The expected number of hops to find a document
is at most 3.3· (16 logn)=O(logn). h
Theorems 1 and 2 state that, our replacement
policy motivated by the small-world model gives
a logarithmic number of hops on the average in
our idealized model, both with and without mis-
leading links. Here there is a requirement that
the random shortcut be chosen from a specific dis-
tribution (called inverse rth-power distribution in[9]). It is possible to satisfy this requirement in
our scheme by using different values of p depend-
ing on the distance of the candidate shortcut key
from the seed key. The datastore (and routing ta-
ble size) required is also polylogarithmic in the size
of the Freenet system. Of course the size of the
data store should be at least large enough to make
sure that every key has a copy in some node.
5.3. Idealized Freenet model vs. skip-list based
architecture
A skip-list based architecture is one of the gen-
eral architectures adopted by many structured
572 H. Zhang et al. / Computer Networks 46 (2004) 555–574
peer-to-peer systems including Chord [16], Pastry
[15] and Tapestry [19]. In skip-list based architec-
tures, the key space S is linear, and a hierarchical
routing structure is constructed on top of this
space in a distributed fashion. The hierarchicalstructure resembles a skip-list. Each node is
responsible for a continuous range of values in
the key-space. Each node stores a pointer to a
node that is its neighbor in the key-space, to an-
other node that is roughly a distance two away,
another node that is roughly a distance 4 away,
and so on. Thus an arbitrary node stores roughly
logn pointers where n is the number of nodes inthe peer-to-peer system, and routing to a node
takes roughly logn steps. This is an elegant and
practical design.
To discover the relation between skip-list
based architecture and the idealized model, we
calculate the cumulative probability Pr(D/2, D)
of the inverse-distance probability used in the ide-
alized model, i.e., the probability that the short-cut x falls in the distance range [D/2, D]. Now,
we have
PrD2;D
� �¼ 1
c log n
XDx¼D=2
1
x� 1
c0 log nðD � 1Þ:
That means for a node i, when we divide the
remaining nodes into log(n/d) groups A0;A1; . . . ;Ai; . . . ;Alognd�1, where Ai consists of all nodes thedistance of whose seeds to s(i) is between 2id and
2i+1d, i will have a shortcut from each group with
the same probability. When the number of short-
cuts is h(logn), the routing table in the nodesresembles exactly that in the skip-list based archi-
tecture in a probabilistic way.
To summarize, in this idealized model, a care-
fully configured unstructured system gives thesame ballpark performance (i.e. polylogarithmic)
in terms of the size of the data store/routing table
and the average number of hops as the structured
schemes. If Theorem 2 continues to hold in more
realistic settings, then it would be an important
consideration in the design of peer-to-peer net-
worked systems. In order to have a less-idealized
and more realistic model, we need to analyze Free-net dynamics in more. This is an important open
problem.
6. Conclusions
In this paper we sketched some preliminary
work we did toward improving the performance
of unstructured systems such as Freenet by usingintuition from the small-world model. We first
gave a brief description of the Freenet protocol.
We then sketched the main idea behind the
small-world model and explained how we could
use intuition gained from this model to influence
Freenet performance. We presented a new but sim-
ple cache replacement scheme: enhanced-cluster-
ing. Our simulations showed a significantincrease in availability and a significant decrease
in average number of hops per request when we
used the enhanced-clustering caching rather than
LRU or strict enhanced-clustering caching. It is
important to note that this change did not involve
any modifications to the Freenet protocol, just to
local user behavior when the datastore gets filled.
Finally, we analyzed the latency of Freenet (withthe modified cache replacement scheme) in an ide-
alized model and proved that the average time to
find a key in such a model is just O(logn) where
n is the number of nodes in the system. In addition
to being interesting in their own right, the results
sketched in this paper also illustrate how theoreti-
cal techniques and insights can improve the per-
formance of peer-to-peer systems.It would be interesting to do a more complete
analysis of Freenet as a stochastic system as op-
posed to our simple idealized model. More gener-
ally, there is a need to take a principled look at the
properties of various algorithms proposed in the
peer-to-peer literature. Peer-to-peer networks have
a rich theoretical structure, and sophisticated algo-
rithmic techniques have been employed in the sys-tems currently under development [10,13,16].
Some of the rich structure of this problem was
exemplified by the pioneering work of Plaxton
et al. [12], where they give an elegant scheme to
achieve optimum latency in networks which fol-
low power-law expansion [4]. This has recently
been extended to all graphs, but with weaker per-
formance guarantees [14]. Some of the specificquestions that need to be addressed are under-
standing the fundamental bounds on the perform-
ance of peer-to-peer systems, designing algorithms
H. Zhang et al. / Computer Networks 46 (2004) 555–574 573
that allow these systems to perform at peak avail-
ability and minimum latency, and understanding
the role that the small-world model might play in
improving the performance of the peer-to-peer sys-
tems.
Acknowledgment
We would like to thank the anonymous referees
for their insightful comments that improved the
presentation of this paper.
References
[1] B. Bollobas, Random Graphs, Academic Press, London,
1985.
[2] Y. Chu, S. Rao, H. Zhang, A case for end-system
multicast, in: ACM Sigmetrics, 2000.
[3] I. Clarke, O. Sandberg, B. Wiley, T.W. Hong, Freenet: A
distributed anonymous information storage and retrieval
system, Freenet White Paper, http://freenetproject.org/
freenet.pdf, 1999.
[4] M. Faloutsos, P. Faloutsos, C. Faloutsos, On power-law
relationships of the Internet topology, in: ACM SIG-
COMM, 1999.
[5] P. Francis, Yallcast: Extending the Internet multicast
infrastructure, Unrefereed Report, http://www.yallcast.
com/docs/index.html, 1999.
[6] A. Goel, K. Munagala, Extending greedy multicast routing
to delay sensitive applications, in: ACM Symposium on
Discrete Algorithms (short abstract), July 1999, Also, to
appear as an invited paper in the special issue of Algorith-
mica on Internet Algorithms.
[7] T. Hong, Performance, in: A. Oram (Ed.), Peer-to-Peer:
Harnessing the Power of Disruptive Technologies, O�Re-illy, Sebastopol, CA, 2001.
[8] J. Kleinberg, Navigation in a small-world, Nature 406
(2000).
[9] J. Kleinberg, The small-world phenomenon: an algorithmic
perspective, Cornell Computer Science Technical Report
99-1776, 2000.
[10] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P.
Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon,
W. Weimer, C. Wells, B. Zhao, Oceanstore: An architec-
ture for global-scale persistent storage, in: Ninth Interna-
tional Conference on Architectural Support for
Programming Languages and Operating Systems (ASP-
LOS 2000), November 2000.
[11] R. Motwani, P. Raghavan, Randomized Algorithms,
Cambridge University Press, Cambridge, 1995.
[12] C.G. Plaxton, R. Rajaraman, A.W. Richa, Accessing
nearby copies of replicated objects in a distributed
environment, in: 9th Annual ACM Symposium on Parallel
Algorithms and Architectures, Newport, RI, June 1997,
pp. 311–320.
[13] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S.
Shenker, A scalable content addressable network, in: ACM
SIGCOMM, 2001.
[14] R. Rajaraman, A.W. Richa, B. Vocking, G. Vuppuluri, A
data tracking scheme for general networks, in: Thirteen
ACM Symposium on Parallel Algorithms and Architec-
tures, 2001.
[15] A. Rowstron, P. Druschel, Pastry: Scalable, distributed
object location and routing for large-scale peer-to-peer
systems, in: International Conference on Distributed Sys-
tems Platforms (Middleware), November 2001.
[16] I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balak-
rishnan, Chord: A peer-to-peer lookup service for Internet
applications, in: ACM SIGCOMM, 2001.
[17] D.J. Watts, Small-Worlds: The Dynamics of Networks
Between Order and Randomness, Princeton University
Press, Princeton, NJ, 1999.
[18] D.J. Watts, S.H. Strogatz, Collective dynamics of �small-world� networks, Nature 393 (1998) 440–442.
[19] B.Y. Zhao, J.D. Kubiatowicz, A.D. Joseph, Tapestry: an
infrastructure for fault-tolerant wide-area location and
routing, Technical Report UCB/CSD-01-1141, UC Berke-
ley, April 2001.
[20] Napster, http://www.napster.com.
[21] Gnutella, http://www.gnutella.wego.com.
[22] I. Clarke, Private communication, April 2003.
Hui Zhang received his B.Eng. degreefrom Hunan University, and hisM.Eng. degree from Institute of Auto-mation, Chinese Academy of Sciences.He is now pursuing his Ph.D. degree inthe Computer Science Department atthe University of Southern California.His research interests include peer-to-peer and overlay routing, and rand-omized algorithms.
Ashish Goel is an Assistant Professor ofManagement Science and Engineeringand (by courtesy) Computer Science atStanford University. He received hisPh.D. in Computer Science from Stan-ford in 1999, and was an AssistantProfessor of Computer Science at theUniversity of Southern California from1999 to 2002.His research interests lie inanalysis of algorithms, with computernetworks and molecular self-assemblybeing two important application do-mains. He is a recipient of an Alfred P.
Sloan faculty fellowship (2004–2006), a Terman faculty fellow-
ship from Stanford, and an NSF Career Award (2002–2007).574 H. Zhang et al. / Computer Networks 46 (2004) 555–574
Ramesh Govindan received his B.Tech.degree from the Indian Institute ofTechnology at Madras, and his M.S.and Ph.D. degrees from the Universityof California at Berkeley. He is anAssociate Professor in the ComputerScience Department at the Universityof Southern California. His researchinterests include Internet routing andtopology, and wireless sensor net-works.