ARTICLE IN PRESS
Journal of Network and Computer Applications 32 (2009) 578–588
Contents lists available at ScienceDirect
Journal of Network and Computer Applications
journal homepage: www.elsevier.com/locate/jnca
ML-Chord: A multi-layered P2P resource sharing model☆
Eric Jui-Lin Lu a,*, Yung-Fa Huang b, Shu-Chiu Lu b
a Department of Management Information Systems, National Chung Hsing University, 250 Kuo Kuang Road, Taichung 402, Taiwan, ROC
b Graduate Institute of Networking and Communication Engineering, Chaoyang University of Technology, 168 Gifeng E. Road, Wufeng, Taichung County 413, Taiwan, ROC
Article info
Article history:
Received 24 November 2007
Received in revised form 7 July 2008
Accepted 5 August 2008
Keywords:
Peer-to-peer
Semantic
Chord
RDF
Resource sharing
☆ This research was partially supported by the National Science Council, Taiwan, ROC, under Contract no. NSC95-2221-E-005-050-MY2.
* Corresponding author. Tel.: +886 4 22840864; fax: +886 4 22857173.
E-mail address: [email protected] (E.J.-L. Lu).
1084-8045/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2008.08.002
Abstract
In recent years, with the emergence of P2P technology, people have come to rely on the Internet to share resources. The number of users and shared resources is expected to become very large. As a result, much research has been dedicated to improving the scalability and efficiency of P2P models. In this paper, we propose a multi-layered P2P resource sharing model, called ML-Chord, that assigns nodes to Chord-like layers based on the categories of shared resources. The experimental results show that ML-Chord is both efficient and scalable.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
With the rapid development of the Internet, the demand for efficient resource sharing has increased quickly. Because of its simplicity, P2P technology has been widely used for sharing resources, and the number of users and shared resources is expected to become very large. This trend, however, raises two critical issues that must be resolved when designing a P2P resource sharing model: scalability and efficiency.
In the past, many P2P models were proposed. These P2P models are classified into three categories (Lv et al., 2002): centralized, decentralized and unstructured, and decentralized and structured. A well-known centralized P2P system is Napster. Because a centralized directory server is required, it is in general not scalable. In decentralized and unstructured P2P systems such as Gnutella, query messages are broadcast until the requested resources are found, which results in high traffic overhead and low scalability (Sen and Wang, 2004). For efficiency and scalability, many P2P systems are decentralized and structured, and the most well-known ones include CAN (Ratnasamy et al., 2001), Pastry (Rowstron and Druschel, 2001), Chord (Stoica et al., 2001, 2003), and Tapestry (Zhao et al., 2004). In CAN, the d-dimensional search space is dynamically partitioned into N spaces, which results in an average query cost of $O(dN^{1/d})$. In the Pastry, Chord, and Tapestry models, indexes (key–value pairs) are distributed among nodes, which results in low query costs of logarithmic order $O(\log N)$. Because resource indexes are generated using hash functions and distributed among nodes, these P2P systems are also called DHT-based (distributed hash table) systems. To perform well not only on the overlay network but also on the underlying network, multi-ring DHT models were proposed whose sub-rings are created based on network latency (Xu et al., 2003), administrative domains such as firewalls and gateways (Zhao et al., 2002; Mislove and Druschel, 2004), or content locality (Garces-Erice et al., 2003). Kaashoek and Karger (2003) also proposed a DHT-based system called Koorde, which is based on Chord and de Bruijn graphs. In Koorde, the query cost is $O(\log N)$ if each node has two neighbors, and it can be reduced to $O(\log N / \log\log N)$ if each node has $O(\log N)$ neighbors. Wepiwe and Simeonov (2005) proposed a concentric multi-ring network that further improved Koorde. Although DHT-based systems are in general scalable and efficient, they only support exact-match search.
Recently, a new area of P2P research is the use of metadata to describe the shared resources, which in part resolves the exact-match search problem inherent in pure DHT-based systems. Furthermore, it is believed that P2P systems using metadata can provide flexible and faster queries. There are many semantic-based P2P projects such as RDFStore (RDFStore, 2006), Edutella (Nejdl et al., 2002, 2003, 2004), RDFPeers (Cai and Frank, 2004; Cai et al., 2004), Expertise (Haase et al., 2004), ContextPeers (Gu et al., 2005b), SCS (Gu et al., 2005a), SuperRing (Antonopoulos et al., 2006), M-Chord (Novak and Zezula, 2005), and R-Chord (Liu and Zhuge, 2006). These semantic-based P2P models can be roughly classified into two categories based on their main objectives. The main objective of the first category is to provide faster queries by reducing the search space based on the metadata of shared resources. Examples of such P2P systems include Expertise, ContextPeers, SCS, and SuperRing. The main objective of the second category, called peer data management systems (PDMS), is to provide flexible queries such as disjunctive and range queries. Examples of such P2P systems include RDFStore, RDFPeers, M-Chord, Edutella, and R-Chord.
To support flexible queries in PDMS, each shared resource is described by metadata. All metadata are usually saved in repositories, called super-peers, which are nodes that are relatively static and have higher bandwidth or computing power. The organization of super-peers can be either centralized (e.g., RDFStore) or distributed (e.g., Edutella, RDFPeers, M-Chord, and R-Chord). In Edutella, metadata are saved in super-peers that are organized as a hypercube. In RDFPeers and M-Chord, metadata are saved in nodes that are organized in a Chord-like ring. In the R-Chord model, metadata are saved in super-peers that are organized in a hybrid structure and related to each other with views. In PDMS, queries are sent to repositories to locate the nodes that host the requested resources. Because super-peers in PDMS are relatively static, the maintenance overhead of these models is rarely discussed. Because the main purpose of this paper is to propose a P2P model that is not only efficient and scalable but also has a low maintenance cost, P2P systems in the second category are not considered in the rest of the paper.
In this paper, we focus on the first category and propose a multi-layered P2P resource sharing model called ML-Chord. In ML-Chord, all resources are classified into categories based on a selected ontology. Each category corresponds to an overlay layer in ML-Chord. Because a shared resource may belong to more than one category, the node that hosts the shared resource may be linked to more than one layer. On each layer, nodes are organized in a Chord-like manner. Based on experiments that studied average query costs, average maintenance costs, average node-joining costs, and stability under massive node failure, we show that ML-Chord is superior to both Chord and SCS.
The rest of the paper is organized as follows: Section 2 briefly reviews semantic-based P2P protocols in the first category. The design of ML-Chord is discussed in Section 3. In Section 4, various simulation experiments and their results are presented and analyzed. Finally, we conclude our work in Section 5.
Fig. 1. Framework of ContextPeers file sharing system. [Figure labels: ContextBus A, B, C, and D; legend: ContextPeer, BridgePeer.]
2. Related work
Edutella (Nejdl et al., 2002, 2003, 2004) is a Gnutella-like P2P network (Ripeanu et al., 2002) that utilizes RDF to describe and search a wide range of resources. In Edutella, users issue queries coded in the Edutella query language, and query messages are broadcast to all nearby nodes. If the queried resources cannot be found among the nearby nodes, the query messages are transmitted again to their nearby nodes until either the requested resources are found or the queries exceed their time-to-live (TTL), in which case failure messages are returned. Like all broadcast-based P2P systems, Edutella suffers from large transmission overhead and poor scalability.
Unlike Edutella, which transmits query messages blindly to nearby nodes, Haase et al. (2004) proposed a P2P protocol that selects the nodes to be queried intelligently to reduce unnecessary transmissions. In the Haase and Siebes model, each node extracts a summary, known as an expertise, from its knowledge base and sends the expertise to other nodes. When receiving an expertise, a node compares the received expertise with its own expertise. If they are similar, the node saves the expertise. When querying resources, a node sends query messages to adjacent nodes by broadcast. When receiving a query message, a node checks whether it has the requested resources. If not, it extracts the subject from the message and compares the subject with all the expertises it holds. Then, the query message is sent only to those adjacent nodes whose expertises are more similar to the query message than its own expertise. The Haase and Siebes model reduces transmission overhead by avoiding blind broadcasting. However, if there is a gap between the inquiring node and the target nodes, the resources on the target nodes may not be found.
Gu et al. (2005b) proposed ContextPeers, which classifies shared resources into groups based on their metadata. Each group is a ContextBus, as shown in Fig. 1. Each ContextBus is an unstructured network topology that delivers messages by broadcast. Each node can be linked to one or more ContextBuses based on the categories of its shared resources. A node with better capabilities (e.g., larger communication bandwidth or processing power, or more battery power) can be selected as a BridgePeer, which is linked to all ContextBuses. When receiving a query message for resources on a ContextBus to which it is linked, a node broadcasts the query message to the other nodes on that ContextBus. Otherwise, when receiving a query message for resources on a ContextBus to which it has no direct link, a node transmits the query message to a BridgePeer, and the BridgePeer then broadcasts the query message onto the target ContextBus.
As stated by its developers (Gu et al., 2005a), the scalability of ContextPeers is poor and its maintenance cost is high. Therefore, they proposed a new model called semantic context space (SCS). In SCS, a ContextBus becomes a semantic cluster (SC), which is further divided into clusters. A node is allocated to a cluster based on the main category of its shared resources. Nodes within a cluster are fully interconnected. The network topology becomes a ring, as shown in Fig. 2. When a node (e.g., N1) receives a query message Q, N1 checks whether Q falls into its own category SC0. If so, N1 broadcasts Q to its own cluster C0 and also forwards Q to adjacent clusters. Otherwise, N1 forwards Q to adjacent SCs such as SC1 and SC7. To speed up the search process, SCS allows users to define shortcuts between SCs. There is one shortcut between SC0 and SC4, as shown in Fig. 2. Although more shortcuts give better search performance, the authors warn that the maintenance cost grows rapidly as the number of shortcuts increases.
Antonopoulos et al. (2006) developed a multi-ring model based on Chord. Each shared resource is described by one or more keywords. Nodes are organized in multiple keyword rings. Each node in a keyword ring contains the list of nodes that host resources matching a certain keyword/value pair. For example, as shown in Fig. 3, the FedoraVersion ring contains nodes N2, N4, and N7, which contain the lists of nodes that host resources matching "FedoraVersion = 6.0", "FedoraVersion = 7.1", and "FedoraVersion = 8.0", respectively. A Super Ring is also needed to connect the keyword rings. To query resources matching "FedoraVersion = 7.1", the keyword "FedoraVersion" is first hashed, and the hash value is used to locate a node in the Super Ring that is connected to the FedoraVersion ring. Then, the keyword value "7.1" is hashed to locate a node in the FedoraVersion ring. One major drawback of the model is that it depends heavily on a bootstrap server.

Fig. 2. The network topology of SCS.
Fig. 3. The network topology of SuperRing.
Fig. 4. The architecture of ML-Chord. [Figure labels: category layers and the BP layer; legend: Peer, BP.]
3. The design of ML-Chord
The architecture: The proposed ML-Chord is a multi-layered P2P resource sharing model. The number of overlay layers depends on the number of categories for a specific domain or ontology. Each layer, called a category layer, is a Chord-like overlay network. There are two types of nodes: normal peers and bridge peers (BPs). Based on the categories of its shared resources, a peer may be associated with more than one layer. A peer with better capabilities (such as relatively higher processing power or bandwidth) can be selected as a BP. A BP is linked to all categories. For efficiency, all BPs themselves form a Chord-like overlay network called the BP layer. As shown in Fig. 4, if there are 4 categories, then ML-Chord has 4 + 1 layers.
Each node in ML-Chord has a unique ID number and is denoted as $N^{ID}_i$. In an all-IP based network, $N^{ID}_i$ can be calculated as $N^{ID}_i = H_m(IP) \,\|\, C_i$, where $H_m$ is a hash function of m bits, $\|$ is the concatenation symbol, and $C_i$ denotes the i-th category, where $1 \le i \le T$ and T is the total number of categories. Similarly, the ID number of a shared resource can be calculated as $K^{RID}_i = H_m(R) \,\|\, C_i$, where R denotes the content of the shared resource.
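As a minimal sketch of the ID scheme above, the following Python code derives node and resource IDs. The choice of SHA-1, the default m = 16, and the representation of the concatenation $H_m(\cdot) \,\|\, C_i$ as a (hash, category) pair are assumptions made for illustration; the paper does not specify them.

```python
import hashlib
from typing import Tuple


def h_m(data: bytes, m: int) -> int:
    """Hash data to an m-bit integer (SHA-1 reduced mod 2^m; an assumed choice)."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest % (1 << m)


def node_id(ip: str, category: int, m: int = 16) -> Tuple[int, int]:
    """Return (H_m(IP), C_i), standing in for H_m(IP) || C_i."""
    return (h_m(ip.encode("utf-8"), m), category)


def resource_id(content: bytes, category: int, m: int = 16) -> Tuple[int, int]:
    """Return (H_m(R), C_i), standing in for H_m(R) || C_i."""
    return (h_m(content, m), category)


# A peer linked to two categories has the same hash part on both layers.
id_in_c1 = node_id("192.0.2.7", 1)
id_in_c2 = node_id("192.0.2.7", 2)
```

Keeping the category as a separate component makes it easy to see that a node occupies the same ring position on every layer it is linked to.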
Successor and predecessor: The nodes in each layer (either a category layer or the BP layer) are sorted by ID number and organized in a ring, and each node is connected to a successor and a predecessor. The successor of a node id is the node that is arranged after id and closest to id, while the predecessor of id is the node that is arranged before id and closest to id. Fig. 5(A) shows the pseudocode of find_successor(), which is used to find the successor of node id in category c. Suppose a node n wants to find the successor of node id in category c; n invokes find_successor(c, id). find_successor(c, id) first checks whether n is linked to category c. If not, n asks a BP to find the successor on its behalf by invoking bp.find_successor(c, id). Otherwise, it checks whether id lies in the range between n and n.successor[c], excluding n, where n.successor[c] denotes n's successor in category c. If id is within this range, n.successor[c] is returned; otherwise, find_successor(c, id) asks n' to find the successor of id, where n' is obtained by executing find_predecessor(c, id).

Fig. 5(B) shows the pseudocode of find_predecessor(c, id), which is used to find the predecessor of node id in category c. In find_predecessor(c, id), n.finger[c] denotes n's finger table for category c, and n.finger[c, i] denotes the i-th entry of n.finger[c]. For each entry in n.finger[c], find_predecessor(c, id) checks whether n.finger[c, i] lies between n and id, excluding n and id. If so, n.finger[c, i] is returned; otherwise, n is returned.
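The lookup just described can be sketched in Python as follows. This is an illustrative reading of the prose, not the pseudocode of Fig. 5: the class layout, the scan of the finger table from the farthest entry, the fallback to the successor when no finger qualifies, and m = 6 are assumptions. The category-2 ring uses node IDs taken from the labels of Fig. 7.

```python
M = 6  # identifier bits (assumed; IDs in the example are below 64)


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps around zero


class Node:
    def __init__(self, ident, bp=None):
        self.id = ident
        self.successor = {}  # category -> Node
        self.finger = {}     # category -> list of Node
        self.bp = bp         # a bridge peer known to this node

    def find_successor(self, c, ident):
        if c not in self.successor:  # not linked to category c: ask a BP
            return self.bp.find_successor(c, ident)
        succ = self.successor[c]
        if ident == succ.id or in_open(ident, self.id, succ.id):
            return succ
        nxt = self.find_predecessor(c, ident)
        if nxt is self:              # no closer finger: fall back to successor
            nxt = succ
        return nxt.find_successor(c, ident)

    def find_predecessor(self, c, ident):
        for f in reversed(self.finger[c]):  # farthest finger first
            if in_open(f.id, self.id, ident):
                return f
        return self


def build_layer(nodes, c):
    """Fill successor and finger tables for category c from global knowledge."""
    ring = sorted(nodes, key=lambda n: n.id)

    def succ_of(t):
        return next((n for n in ring if n.id >= t), ring[0])

    for n in ring:
        n.finger[c] = [succ_of((n.id + 2 ** (k - 1)) % 2 ** M)
                       for k in range(1, M + 1)]
        n.successor[c] = n.finger[c][0]


# Category-2 layer with the node IDs shown in Fig. 7; node 51 is a BP.
layer2 = {i: Node(i) for i in (2, 8, 11, 14, 23, 27, 32, 42, 51)}
build_layer(list(layer2.values()), 2)
n34 = Node(34, bp=layer2[51])        # normal peer linked to category 1 only
build_layer([n34, layer2[51]], 1)
target = n34.find_successor(2, 19)   # forwarded through the BP; target.id is 23
```

In this ring the BP (node 51) forwards the request to node 8, which matches the first step of the query example in Fig. 7(B), and the lookup ends at node 23.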
Fig. 5. (A) Get the successor of id and (B) get the predecessor of id.
Fig. 6. Finger tables and BP finger table of N51 and N42.
Finger tables: Every peer and BP has a routing table, called a finger table, for each category to which it belongs. The size of each finger table is m. The procedure for creating finger tables is as follows: for a node id in category c, id calculates $t_k = (id + 2^{k-1}) \bmod 2^m$ for all k where $1 \le k \le m$. Then, for each $t_k$, id invokes find_successor(c, $t_k$) to obtain $t_k'$, which is the successor of $t_k$. All $t_k'$ constitute the finger table of id for category c.
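To make the finger-table formula concrete, here is a small Python sketch that computes the targets $t_k$ and resolves each to its successor on one layer. It assumes global knowledge of the layer's node IDs (a real node would call find_successor instead), and m = 6 with the category-2 IDs from Fig. 7 is an illustrative choice.

```python
from bisect import bisect_left


def finger_targets(node_id, m):
    """t_k = (id + 2^(k-1)) mod 2^m for k = 1..m."""
    return [(node_id + 2 ** (k - 1)) % 2 ** m for k in range(1, m + 1)]


def build_finger_table(node_id, layer_ids, m):
    """Resolve every target to the first node ID at or after it on the ring."""
    ring = sorted(layer_ids)
    table = []
    for t in finger_targets(node_id, m):
        i = bisect_left(ring, t)
        table.append(ring[i % len(ring)])  # wrap to the smallest ID past the end
    return table


layer2_ids = [2, 8, 11, 14, 23, 27, 32, 42, 51]
targets = finger_targets(51, 6)               # [52, 53, 55, 59, 3, 19]
table = build_finger_table(51, layer2_ids, 6)  # [2, 2, 2, 2, 8, 23]
```

The early entries of a finger table often repeat the immediate successor, while the later entries cover exponentially larger jumps around the ring.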
In addition to the finger tables for category layers, there is a BP finger table in each peer, including BPs. The creation of the BP finger table for BPs is identical to the above procedure. However, the creation of the BP finger table for normal peers is different. To create a BP finger table, a normal peer randomly selects a node from an arbitrary finger table. Then, the peer retrieves the first entry, which is a BP, from the BP finger table of the selected node, and this BP becomes the first entry of the peer's own BP finger table. The second entry of its BP finger table is the successor of the first entry. The same procedure continues until all entries are filled. The size of the BP finger table for a normal peer is d, where $1 \le d \le m$.
Fig. 6 shows an example of the finger tables of a BP and a normal peer in a domain of two categories. $N^{51}$ is a BP and has three finger tables of size m. $N^{42}$ is a normal peer and has two finger tables of size m. The size of $N^{42}$'s BP finger table is d. To determine the first entry of the BP finger table for $N^{42}$, $N^{42}$ first selects the first entry of the finger table for category 1. The selected node is $N^{51}$. Then, because the first entry of $N^{51}$'s BP finger table is $N^{23}_{BP}$, the first entry of $N^{42}$'s BP finger table is $N^{23}_{BP}$. Also, the second entry of $N^{42}$'s BP finger table is the successor of $N^{23}$, which is $N^{51}$.
Query: To query the location of a resource R in category c, the query node id hashes the resource to obtain $H_m(R)$ and invokes id.find_successor(c, n) to find the location of R, where n is the node ID in id's finger table that is greater than or equal to $H_m(R)$ and closest to $H_m(R)$. If R is in a category c to which id has no direct link, id looks up its BP finger table to find a BP and asks the BP to perform the query on its behalf. Using its finger table for category c, the BP invokes find_successor() to find the location of R. From the previous discussion, the number of hops for a query is of logarithmic order, $O(1 + \log(N/T))$, if all nodes are uniformly distributed over T category layers.
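The hop bound above can be illustrated with a short calculation. The sketch below assumes the average Chord path length of about $\frac{1}{2}\log_2 N$ reported in the original Chord analysis (Stoica et al.); this constant is not stated in this paper, which gives only the order $O(1 + \log(N/T))$. The function names are hypothetical.

```python
import math


def expected_hops_chord(n):
    """Approximate average hops on a single Chord ring of n nodes (assumed 1/2 log2 n)."""
    return 0.5 * math.log2(n)


def expected_hops_ml_chord(n, t, via_bp=True):
    """Approximate average hops on one of t equally sized layers.

    One extra hop is counted when the query is first handed to a BP.
    """
    return (1 if via_bp else 0) + 0.5 * math.log2(n / t)


chord_hops = expected_hops_chord(2 ** 10)      # 5.0
ml_hops = expected_hops_ml_chord(2 ** 10, 16)  # 4.0
```

With $N = 2^{10}$ and T = 16, these estimates are close to the average query costs of 5.004 (Chord) and 4.042 (ML-Chord) reported at failure ratio 0.0 in Table 2.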
For example, as shown in Fig. 7(A), a node $N^{34}$ in category $C_1$ attempts to find the location of a resource $K^{19}_2$. Because $N^{34}$ does not have a finger table for $C_2$, it has to ask a BP to perform the query on its behalf. $N^{34}$ looks up its BP finger table, obtains $N^{51}_{BP}$, and sends the query message to $N^{51}_{BP}$. On behalf of $N^{34}$, $N^{51}_2$ continues the query. Although $N^{51}_{BP}$ and $N^{51}_2$ are conceptually located on two different layers, they are the same node. As shown in Fig. 7(B), $N^{51}_2$ looks up its finger table for category $C_2$ and obtains $N^{8}_2$, which is close to $K^{19}_2$. $N^{51}_2$ sends the query message to $N^{8}_2$ and asks $N^{8}_2$ to continue the query. The same procedure is repeated until $N^{23}_2$ is reached.

Node join: When a new node joins ML-Chord, it first has to find its successor and then connect to it. The procedure is described in Fig. 8. If a node n wants to join a category c in which there is no existing node, n invokes create(c), as shown in Fig. 8(A). In create(c), n simply sets its predecessor to nil and its successor to itself.

However, if there are other nodes in category c, n randomly selects a node snode and invokes join(c, snode), as shown in Fig. 8(B). In join(c, snode), n sets its predecessor to nil, invokes snode.find_successor(c, n) to obtain its successor, and finally resets its successor's predecessor to n. An example of node join is illustrated in Fig. 9. Initially, $N^{23}_2$ is the successor of $N^{14}_2$, as shown in Fig. 9(A). As $N^{17}_2$ joins ML-Chord, it invokes join() and finds that $N^{23}_2$ is its successor, as shown in Fig. 9(B). Note that, at this stage, the predecessor and the successor of $N^{17}_2$ are nil and $N^{23}_2$, respectively. The predecessor of $N^{23}_2$ is $N^{17}_2$. However, the successor of $N^{14}_2$ is still $N^{23}_2$. The situation will not be corrected until maintenance is completed.
Fig. 7. An example query process.
Fig. 8. The pseudocode for node join.
Maintenance: Because nodes may join or leave ML-Chord frequently, ML-Chord must be maintained periodically to keep routing information such as finger tables, successors, and predecessors accurate. The maintenance task includes stabilize(), fix_finger(), and check_predecessor(), as described in Fig. 10. The main function of n.stabilize() is to periodically check and, if necessary, correct n's successor and predecessor. For each category c, n.stabilize() first assigns the predecessor of n's successor to x. Then, if x is within the range between n and n.successor[c], excluding n and n.successor[c], x is assigned to n.successor[c]. Finally, n.successor[c].notify(n) is invoked. The purpose of n.notify(n') is to set n.closest_preceding_node[c] (that is, n's predecessor in category c) to n' if either n.closest_preceding_node[c] is nil or n' is within the range between n.closest_preceding_node[c] and n, excluding both. For example, in Fig. 9(B), after $N^{17}_2$ joined the network, its predecessor is nil, and $N^{14}_2$'s successor is still $N^{23}_2$. As stated earlier, both are incorrect. When $N^{14}_2$ executes stabilize(), x is $N^{17}_2$. Because x is in the range between $N^{14}_2$ and $N^{23}_2$, $N^{14}_2$'s successor is set to x, which is $N^{17}_2$. Also, $N^{17}_2$ executes notify($N^{14}_2$). Since $N^{17}_2$'s predecessor is nil, it is set to $N^{14}_2$. Therefore, after stabilize() is executed, all errors are corrected, and the result is shown in Fig. 9(C).
fix_finger() is designed to maintain finger tables, and its pseudocode is shown in Fig. 10(B). In fix_finger(), every node rebuilds its finger tables and BP finger table as described earlier. Finally, n.check_predecessor() is designed to check whether the predecessor of n has left. If so, n's predecessor is set to nil, as shown in Fig. 10(C).
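The join and stabilization steps described above can be sketched for a single category layer as follows. This is a simplified illustration rather than the pseudocode of Figs. 8 and 10: finger tables are omitted, find_successor walks the ring linearly, and the class and attribute names are assumptions. The scenario reproduces the three states of Fig. 9 with nodes 14, 17, and 23.

```python
def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self    # create(): successor is the node itself
        self.predecessor = None  # create(): predecessor is nil

    def find_successor(self, ident):
        # Linear walk along successors (no finger tables in this sketch).
        n = self
        while not (ident == n.successor.id
                   or in_open(ident, n.id, n.successor.id)):
            n = n.successor
        return n.successor

    def join(self, snode):
        self.predecessor = None
        self.successor = snode.find_successor(self.id)
        self.successor.predecessor = self  # reset the successor's predecessor

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        if (self.predecessor is None
                or in_open(other.id, self.predecessor.id, self.id)):
            self.predecessor = other


# Fig. 9(A): a two-node ring, 14 -> 23 -> 14.
n14, n23 = Node(14), Node(23)
n14.successor, n14.predecessor = n23, n23
n23.successor, n23.predecessor = n14, n14

# Fig. 9(B): node 17 joins through node 14.
n17 = Node(17)
n17.join(n14)
after_join = (n17.predecessor, n17.successor.id,
              n14.successor.id, n23.predecessor.id)

# Fig. 9(C): node 14 runs stabilize(), which repairs both stale links.
n14.stabilize()
after_stabilize = (n14.successor.id, n17.predecessor.id)
```

After the join, node 14 still points at node 23 and node 17 has no predecessor; a single stabilize() call on node 14 fixes both, as the text describes.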
Fig. 9. An example node join.
Fig. 10. The pseudocode for maintenance.
4. Simulation experiments and analysis
PeerSim (2006) was used as the simulator for all experiments presented in this section, because it is developed in Java and can simulate up to 1,000,000 nodes. Since ML-Chord was developed based on Chord, it is of interest to investigate whether ML-Chord outperforms Chord on various measures. Additionally, because ML-Chord divides the network into multiple layers as SCS does, the experiments were also designed to compare ML-Chord with SCS.
4.1. Simulation environment
Unless otherwise stated, all experiments presented below are based on the parameters presented in Table 1. The number of nodes is $2^k$, where $9 \le k \le 15$. The number of categories, the number of clusters per category, and the number of shortcuts are set to 16, 8, and 2, respectively, because these values were used in the experiments of Gu et al. (2005a). However, because Chord has only one layer, the number of categories for Chord is 1. Although one node can be linked to multiple categories in ML-Chord, the number of categories to which a peer is linked is set to 1. Also, the number of BPs is set to 8. Later sections study the effects of changing the number of categories per peer and the number of BPs.
4.2. Average query costs
The average query cost is the average number of hops that query messages have to go through. It was calculated as follows: one node was randomly selected, all other nodes in the network queried that node in turn, and the average number of hops was calculated. As shown in Fig. 11, the average query cost of ML-Chord is the lowest of the three models. Additionally, while the average query costs of both ML-Chord and Chord increase only slightly, the average query cost of SCS grows significantly as the number of nodes increases.

In the previous experiments, the number of categories was fixed. However, it is interesting to examine the effect on the average query cost when the number of categories is changed. The number of nodes is fixed at $2^{10}$. The number of categories is $2^c$, where c is 1, 2, 3, 4, 5, 6, or 7. The experimental results are shown in Fig. 12. ML-Chord outperforms SCS in all cases. Also, as the number of categories increases, the average query cost of ML-Chord decreases slightly, while the average query cost of SCS decreases at first and then increases significantly. We investigated this issue further and found that in SCS, when the number of categories is small, the number of nodes in each cluster is large. As a result, the query cost is dominated by message broadcasting within clusters. When the number of categories is increased, the query cost is dominated by transmitting query messages among clusters.
Table 1. Simulation parameters

Number of nodes: $2^k$, where k = 9, 10, 11, ..., 15

Parameter                          Chord    ML-Chord    SCS
Number of categories               1        16          16
Number of categories per peer      -        1           -
Number of BPs                      -        8           -
Number of clusters per category    -        -           8
Shortcuts                          -        -           2
Fig. 11. Average query costs vs. number of nodes. [Axes: number of hops per peer vs. number of nodes, 512 to 32768; series: ML-Chord, Chord, SCS.]
Fig. 12. Average query costs vs. number of categories. [Axes: number of hops vs. number of categories, 2 to 128; series: ML-Chord, SCS.]
4.3. Maintenance costs
In P2P systems, a node may join or leave the network at will. Therefore, it is necessary to maintain the network periodically. The average maintenance cost is calculated as the total number of messages transmitted during maintenance divided by the number of nodes. In the experiments, it is assumed that no node joins or leaves the network during maintenance. The number of nodes is $2^k$, where $8 \le k \le 15$. As shown in Fig. 13, the maintenance cost of SCS is much lower than that of ML-Chord and Chord. Moreover, the maintenance cost of ML-Chord is lower than that of Chord. As described in the previous section, one major part of the maintenance cost is the cost of find_successor() at each node. Because the cost of find_successor(c, n) in ML-Chord (where the size of the search space is N/T, N is the total number of nodes, and T is the number of categories) is smaller than the cost of find_successor(n) in Chord (where the size of the search space is N), ML-Chord has a lower maintenance cost than Chord.
4.4. Node joining
When a node joins the network, it is required to transmit messages to update routing information. The average cost of joining a node is calculated as follows: when a node joins the network at a specific node, the number of transmitted messages is recorded. The node then joins at each of the other nodes in turn, and the average number of transmitted messages is calculated. The experimental results are shown in Fig. 14. From the figure, it is clear that ML-Chord outperforms both Chord and SCS. Because the cost of node joining is dominated by find_successor(), and because the cost of find_successor(c, n) in ML-Chord is smaller than the cost of find_successor(n) in Chord, the cost of node joining in ML-Chord is lower than that of Chord.
4.5. Total costs
From the previous experimental results, it is no surprise that the average query cost of ML-Chord is much lower than that of SCS, while the maintenance cost of SCS is lower than that of ML-Chord. Therefore, further investigation is required. In the following discussion, the total costs of the three models are compared. For simplicity, only query cost and maintenance cost are considered. Note, however, that the average cost of node joining in ML-Chord is lower than that in SCS and Chord.

Fig. 13. Maintenance costs vs. number of nodes. [Axes: number of messages per peer vs. number of nodes, 512 to 32768; series: ML-Chord, SCS, Chord.]
Fig. 14. Joining costs vs. number of nodes. [Axes: number of messages to join one peer vs. number of nodes, 512 to 32768; series: ML-Chord, Chord, SCS.]
Fig. 15. Analysis of total costs. [Axes: number of messages vs. number of lookups, 256 to 1536; series: ML-Chord, Chord, SCS.]
It is assumed that the number of nodes is $2^{10}$ and that maintenance tasks are performed on each node every 30 s. Within one 30 s period, there are 256, 512, 768, 1024, 1280, or 1536 queries. The total costs are shown in Fig. 15. The total cost of ML-Chord equals that of SCS when the number of queries is about 300. When the number of queries is greater than 300, ML-Chord is much superior to SCS. As for Chord and SCS, the total cost of Chord equals that of SCS when the number of queries is 760. Additionally, the total costs were calculated for $2^k$ nodes, where $11 \le k \le 15$, and the results are similar to Fig. 15 except that the break-even point moves left (i.e., the number of queries at the break-even point decreases) as the number of nodes increases. In the interest of space, these figures are omitted.
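The break-even points discussed above can be expressed with a simple cost model. The following is a sketch under stated assumptions, not a formula given by the authors: it assumes that each query hop counts as one message, that maintenance and query messages simply add, and it introduces the symbols $M_X$ (average maintenance messages per node per period for model X), $Q_X$ (average query cost of model X in hops), and q (number of queries per period).

```latex
% Total messages of model X in one maintenance period with q queries
C_X(q) = N \cdot M_X + q \cdot Q_X

% Setting C_{ML}(q) = C_{SCS}(q) gives the break-even number of queries
q^{*} = \frac{N \,\bigl(M_{\mathrm{ML}} - M_{\mathrm{SCS}}\bigr)}
             {Q_{\mathrm{SCS}} - Q_{\mathrm{ML}}}
```

Under this model, $q^{*}$ is positive because ML-Chord has the higher maintenance cost and SCS the higher query cost; beyond $q^{*}$ queries per period, the lower per-query cost of ML-Chord outweighs its higher maintenance cost.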
4.6. Node leaving/failures
In this experiment, the stability of ML-Chord, Chord, and SCS is studied. The stability of a P2P model is measured using the average query cost, the average number of timeouts, and the success rate after a large number of nodes have failed and before maintenance is performed. The number of nodes was set to $2^{10}$, and the ratios of failing nodes are 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5. For completeness, every living node was selected in turn as a target node, and the other living nodes queried the target node. From these experiments, the average, minimum, and maximum of the query costs and timeouts were calculated. The experimental results are summarized in Table 2.

Table 2. Analysis of node failures

                      Failure ratio
Simulation results    0.0        0.1        0.2        0.3        0.4        0.5

Avg. query cost (min, max)
ML-Chord              4.042      4.188      4.393      4.414      4.413      4.507
                      (1, 6)     (1, 8)     (1, 9)     (1, 9)     (1, 9)     (1, 12)
Chord                 5.004      5.315      5.629      6.004      6.286      6.632
                      (1, 10)    (1, 13)    (1, 14)    (1, 16)    (1, 18)    (1, 19)
SCS                   13.832     23.375     31.881     37.503     26.825     22.340
                      (1, 39)    (1, 309)   (1, 351)   (1, 303)   (1, 243)   (1, 192)

Avg. no. of timeouts (min, max)
ML-Chord              0.0        1.535      2.381      2.163      2.860      3.542
                      (0, 0)     (1, 9)     (1, 11)    (1, 12)    (1, 16)    (1, 22)
Chord                 0.0        2.30       3.104      4.155      5.360      7.205
                      (0, 0)     (1, 14)    (1, 19)    (1, 24)    (1, 35)    (1, 45)
SCS                   0.0        5.114      12.289     22.044     23.624     23.868
                      (0, 0)     (1, 66)    (1, 153)   (1, 231)   (1, 232)   (1, 228)

Success rate (%)
ML-Chord              100.00     99.76      95.96      94.62      86.24      80.73
Chord                 100.00     99.57      96.98      93.73      85.34      71.14
SCS                   100.00     99.99      99.13      90.99      62.76      36.56

Fig. 16. Average query costs for various numbers of categories per peer. [Axes: number of hops vs. number of nodes, 512 to 32768; series: link category 1, 2, 4, 8, 16.]
As shown in the table, the average query costs and the average numbers of timeouts of ML-Chord are far better than those of SCS and Chord. The main exceptions concern the success rate: that of SCS is higher than that of ML-Chord when the failure ratio is 0.1 or 0.2, and that of Chord is slightly higher than that of ML-Chord at 0.2. SCS does well at low failure ratios because it does not have a TTL value, and thus queries are not terminated until either the target node is found or all nodes have failed. However, when the failure ratio is greater than 0.2, the success rate of SCS is very poor. This is because, in each semantic cluster, only one node is in charge of the inter- and intra-cluster communications among semantic clusters. As the failure probability of that node increases, the success rate decreases.
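To quantify how gracefully each model degrades, the short Python sketch below computes the relative increase in average query cost between failure ratios 0.0 and 0.5, using the values from Table 2. The helper name is hypothetical, and the calculation is an illustration added here rather than an analysis performed by the authors.

```python
# Average query costs from Table 2 for failure ratios 0.0, 0.1, ..., 0.5.
avg_query_cost = {
    "ML-Chord": [4.042, 4.188, 4.393, 4.414, 4.413, 4.507],
    "Chord": [5.004, 5.315, 5.629, 6.004, 6.286, 6.632],
    "SCS": [13.832, 23.375, 31.881, 37.503, 26.825, 22.340],
}


def relative_increase(series):
    """Relative change from the first value (ratio 0.0) to the last (ratio 0.5)."""
    return (series[-1] - series[0]) / series[0]


increase = {name: relative_increase(vals) for name, vals in avg_query_cost.items()}
# Model whose query cost grows the least between failure ratios 0.0 and 0.5.
most_stable = min(increase, key=increase.get)
```

The increases come to roughly 11.5% for ML-Chord, 32.5% for Chord, and 61.5% for SCS, so ML-Chord's query cost is the least affected by node failures in this experiment.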
4.7. Number of categories per peer
As stated earlier, a node in ML-Chord may be linked to more than one category. In the previous experiments, it was assumed that each node is linked to only one layer; in other words, each node has one finger table and one BP finger table. In practice, however, it is not uncommon for a node to be linked to more than one layer. Therefore, in this section, we studied the average query costs when each node in the network was linked to $2^k$ layers, where $0 \le k \le 4$. The experimental results are shown in Fig. 16. From the figure, it is interesting to note that the average query cost is the lowest when each node is linked to only one layer. This is because the number of nodes in each category layer increases when a node is linked to more than one layer, and the query cost increases with the number of nodes in a layer. Consequently, we recommend that, even when a node can be linked to more than one layer, the node should be assigned to only one layer.
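The argument above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that nodes are spread evenly over the category layers and uses the well-known rule of thumb that an average Chord lookup takes about $\frac{1}{2}\log_2 n$ hops in a ring of $n$ nodes. The function names and the choice of 16 categories are hypothetical, and the model ignores inter-layer routing through BPs, so it only indicates the trend rather than the values plotted in Fig. 16.

```python
import math


def avg_layer_size(num_nodes, num_categories, links_per_node):
    """Average nodes per layer if every node joins `links_per_node` layers."""
    return num_nodes * links_per_node / num_categories


def expected_chord_hops(layer_size):
    """Rule-of-thumb average lookup cost inside one Chord-like layer."""
    return 0.5 * math.log2(layer_size) if layer_size > 1 else 0.0


if __name__ == "__main__":
    num_nodes, num_categories = 2**10, 16  # 16 categories is an assumption
    for k in range(5):  # each node linked to 2**k layers, 0 <= k <= 4
        links = 2**k
        size = avg_layer_size(num_nodes, num_categories, links)
        print(links, size, expected_chord_hops(size))
```

Under these assumptions, every doubling of the number of layers per node doubles the average layer size and adds about half a hop to the intra-layer lookup, which is consistent with the recommendation to assign each node to a single layer.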
4.8. The number of bridge peers
From the results of previous experiments, it is known that the average query cost is the highest when every node is a BP. Thus, it is worth investigating further to find a reasonable number of BPs for an ML-Chord network. The experimental results are presented in Fig. 17. From the figure, it is clear that the average query cost remains flat (or close to flat) as the number of BPs initially increases. However, when the number of BPs grows beyond a threshold, the query cost starts increasing rapidly. For example, when the number of nodes is $2^{10}$, the query cost starts increasing rapidly when the number of BPs reaches 16. Similarly, the threshold values are 32, 64, 128, 256, and 512 when the numbers of nodes are $2^{11}$, $2^{12}$, $2^{13}$, $2^{14}$, and $2^{15}$, respectively. As a result, it is safe to set the number of BPs to 8, which is below the threshold for every network size tested.

Fig. 17. Average query costs vs. number of BPs. (x-axis: number of BPs, 1 to 16384; y-axis: number of hops; one curve each for $2^{10}$, $2^{11}$, $2^{12}$, $2^{13}$, $2^{14}$, and $2^{15}$ nodes.)
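The threshold values quoted above follow a simple pattern, which the short check below makes explicit. It only restates the six reported data points; the observation that the threshold is always the number of nodes divided by 64 is an inference from those points, not a rule stated in the source, and the variable names are illustrative.

```python
# (number of nodes) -> (number of BPs at which the query cost starts rising),
# as reported in the text for Fig. 17.
thresholds = {
    2**10: 16,
    2**11: 32,
    2**12: 64,
    2**13: 128,
    2**14: 256,
    2**15: 512,
}

# Ratio of network size to threshold for each reported network size.
ratios = {nodes: nodes // bps for nodes, bps in thresholds.items()}

# The recommended setting of 8 BPs lies below every reported threshold.
recommended_bps = 8
below_all = all(recommended_bps < bps for bps in thresholds.values())

if __name__ == "__main__":
    print(ratios)
    print(below_all)
```

For the reported sizes the ratio is 64 in every case, so the threshold doubles whenever the network doubles, while 8 BPs stays below the threshold throughout.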
5. Conclusions and future work
When designing a P2P model, it is important to take bothefficiency and scalability into consideration. In this paper, weproposed an efficient and scalable multi-layered P2P model calledML-Chord. From the experimental results, it is shown that:
• ML-Chord is superior to SCS and Chord in queries.
• Although SCS’s maintenance cost is lower than that of ML-Chord, the overall efficiency (measured as average maintenance cost plus average query cost) of ML-Chord still outperforms both SCS and Chord.
• When the number of nodes increases, the average query cost of ML-Chord grows only slightly, while the average query cost of SCS grows rapidly. This result demonstrates that ML-Chord is more scalable than SCS.
• Table 2 shows that ML-Chord is more stable than both Chord and SCS. Its success rate is slightly lower than that of SCS when the failure ratios are 0.1 and 0.2, but when the failure ratio increases to 0.3, 0.4, and 0.5, ML-Chord proves to be more stable than the others.
• Although a node can be linked to more than one overlay layer, it is suggested that each node be linked to only one layer for better performance.
As stated earlier, peer data management systems (PDMS) are a promising and important research area. One critical issue in designing a PDMS is to select a reasonable number of super-peers to manage metadata while keeping the maintenance cost reasonably low. From the experimental results shown in Fig. 17, BPs seem to be good candidates for super-peers. However, further investigation is required.
Antonopoulos N, Salter J, Peel R. A multi-ring method for efficient multi-dimensional data lookup in p2p networks. In: Proceedings of the 1stinternational conference on scalable information systems; 2006. p. 10–6.
Cai M, Frank M. RDFPeers: a scalable distributed RDF repository based on astructured peer-to-peer network. In: Proceedings of the 13th internationalconference on World Wide Web; 2004. p. 650–7.
Cai M, Frank M, Yan B, MacGregor R. Subscribable peer-to-peer RDF repository fordistributed metadata management. Web Semant: Sci Services Agents WorldWide Web 2004;2(2):109–30.
Garces-Erice L, Biersack E, Ross K, Felber P, Urvoy-Keller G. Hierarchical peer-to-peer systems. Parallel Process Lett 2003;13(4):643–57.
Gu T, Pung HK, Zhang D. A peer-to-peer overlay for context information search. In:Proceedings of the 14th international conference on computer communica-tions and networks (ICCCN 2005). NY: Wiley; 2005a. p. 395–400.
Gu T, Tan E, Pung HK, Zhang D. ContextPeers: scalable peer-to-peer search forcontext information. In: Proceedings of the 1st international workshop oninnovations in web infrastructure (IWI 2005); 2005b.
Haase P, Siebes R, van Harmelen F. Peer selection in peer-to-peer networks withsemantic topologies. In: Proceedings of the international conference onsemantics in a networked world (ICNSW’04); 2004. p. 108–25.
Kaashoek M, Karger D. Koorde: a simple degree-optimal distributed hash table. In:Proceedings of the 2nd international workshop on peer-to-peer systems(IPTPS’03); 2003.
Liu J, Zhuge H. A semantic-based P2P resource organization model R-Chord. J SystSoftware 2006;79(11):1619–31.
Lv Q, Cao P, Cohen E, Li K, Shenker S. Search and replication in unstructured peer-to-peer networks. In: Proceedings of the 16th international conference onsupercomputing; 2002. p. 84–95.
Mislove A, Druschel P. Providing administrative control and autonomy instructured peer-to-peer overlays. In: Proceedings of the 3rd internationalworkshop on peer-to-peer systems (IPTPS’04); 2004.
Nejdl W, Wolf B, Qu C, Decker S, Sintek M, Naeve A, et al. EDUTELLA: a P2Pnetworking infrastructure based on RDF. In: Proceedings of the 11thinternational World Wide Web conference (WWW 2002); 2002. p. 604–15.
Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M, Brunkhorst I, et al. Super-peer-based routing and clustering strategies for RDF-based peer-to-peernetworks. In: Proceedings of the 12th international conference on World WideWeb; 2003. p. 536–43.
Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M, Brunkhorst I. Super-peer-based routing strategies for RDF-based peer-to-peer networks. Web Semant:Sci Services Agents World Wide Web 2004;1(2):177–86.
Novak D, Zezula P. M-Chord: a scalable distributed similarity search structure. In:Proceedings of the 2005 international conference on foundations of computerscience (FCS’05); 2005.
PeerSim, 2006. PeerSim. Available from hhttp://peersim.sourceforge.net/i.
Ratnasamy S, Francis P, Handley M, Karp R. A scalable content-addressablenetwork. In: ACM SIGCOMM 2001; 2001. p. 161–72.
RDFStore 2006. RDFStore. Available from hhttp://rdfstore.sourceforge.net/i.Ripeanu M, Foster I, Iamnitchi A. Mapping the Gnutella network: properties of
large-scale peer-to-peer systems and implications for system design. IEEEInternet Comput 2002;6(1):50–7.
ARTICLE IN PRESS
E.J.-L. Lu et al. / Journal of Network and Computer Applications 32 (2009) 578–588588
Rowstron A, Druschel P. Pastry: scalable, decentralized object location and routingfor large-scale peer-to-peer systems. In: IFIP/ACM international conference ondistributed systems platforms; 2001. p. 329–50.
Sen S, Wang J. Analyzing peer-to-peer traffic across large networks. ACM/IEEETrans Network 2004;12(2):219–32.
Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H. Chord: a scalable peer-to-peer lookup service for Internet applications. In: ACM SIGCOMM 2001;2001. p. 149–60.
Stoica I, Morris R, Linben-Dowell D, Karger DR, Kaashoek MF, Dabek F, et al. Chord:a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACMTrans Network 2003;11(1).
Wepiwe G, Simeonov P. A concentric multi-ring overlay for highly reliablep2p networks. In: Proceedings of the 2005 4th IEEE international
symposium on network computing and applications (NCA’05); 2005.p. 83–90.
Xu Z, Min R, Hu Y. HIERAS: a DHT based hierarchical p2p routing algorithm. In:Proceedings of the 2003 international conference on parallel processing(ICPP’03); 2003. p. 187–94.
Zhao B, Duan Y, Huang L, Joseph A, Kubiatowicz J. Brocade: landmark routing onoverlay networks. In: Proceedings of the 1st international workshop on peer-to-peer systems (IPTPS’02); 2002.
Zhao B, Huang L, Stribling J, Rhea S, Joseph A, Kubiatowicz J. Tapestry: a resilientglobal-scale overlay for service deployment. IEEE J Sel Areas Commun2004;22(1):41–53.