
ML-Chord: A multi-layered P2P resource sharing model









Eric Jui-Lin Lu a,*, Yung-Fa Huang b, Shu-Chiu Lu b

a Department of Management Information Systems, National Chung Hsing University, 250 Kuo Kuang Road, Taichung 402, Taiwan, ROC
b Graduate Institute of Networking and Communication Engineering, Chaoyang University of Technology, 168 Gifeng E. Road, Wufeng, Taichung County 413, Taiwan, ROC

Article info

Article history:

Received 24 November 2007

Received in revised form 7 July 2008

Accepted 5 August 2008

Keywords:

Peer-to-peer

Semantic

Chord

RDF

Resource sharing

doi:10.1016/j.jnca.2008.08.002

This research was partially supported by the National Science Council, Taiwan, ROC, under Contract no. NSC95-2221-E-005-050-MY2.
* Corresponding author. Tel.: +886 4 22840864; fax: +886 4 22857173.
E-mail address: [email protected] (E.J.-L. Lu).

Abstract

In recent years, due to the emergence of P2P technology, people increasingly rely on the Internet to share resources, and it is believed that the number of users and shared resources will become enormous. As a result, much research has been dedicated to improving the scalability and efficiency of P2P models. In this paper, we propose a multi-layered P2P resource sharing model, called ML-Chord, that assigns nodes to Chord-like layers based on the categories of their shared resources. The experimental results show that ML-Chord is both efficient and scalable.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

With the blooming development of the Internet, the demand for efficient resource sharing has increased rapidly. Due to its simplicity, P2P technology has been widely used for sharing resources, and it is believed that the number of users and shared resources will become enormous. This tendency, however, raises two critical issues that must be resolved when designing a P2P resource sharing model: one is scalability and the other is efficiency.

In the past, many P2P models were proposed. These P2P models were classified into three categories (Lv et al., 2002): centralized, decentralized and unstructured, and decentralized and structured. A well-known centralized P2P system is Napster. Because a centralized directory server is required, it is in general not scalable. For decentralized and unstructured P2P systems such as Gnutella, query messages are broadcast until the requested resources are found, and this results in high traffic overhead and low scalability (Sen and Wang, 2004). For efficiency and scalability, many P2P systems are decentralized and structured, and the most well-known ones include CAN (Ratnasamy et al., 2001), Pastry (Rowstron and Druschel, 2001), Chord (Stoica et al., 2001, 2003), and Tapestry (Zhao et al., 2004). In CAN, the d-dimensional search space is dynamically partitioned into N spaces, which results in an average query cost of O(d*N^(1/d)). For the Pastry, Chord, and Tapestry models, indexes (key-value pairs) are distributed among nodes. This results in low query costs of logarithmic order O(log N). Because resource indexes are generated using hashing functions and distributed among nodes, these P2P systems are also called DHT-based (distributed hash table) systems. To perform well not only on the overlay network but also on its underlying network, multi-ring DHT models were proposed whose sub-rings are created based on either network latency (Xu et al., 2003), administrative domains such as firewalls and gateways (Zhao et al., 2002; Mislove and Druschel, 2004), or content locality (Garces-Erice et al., 2003). Kaashoek and Karger (2003) also proposed a DHT-based system called Koorde, which is based on Chord and the de Bruijn graph. In Koorde, the query cost is O(log N) if each node has two neighbors; the query cost can be reduced to O(log N / log log N) if each node has O(log N) neighbors. In 2005, Wepiwe and Simeonov (2005) proposed a concentric multi-ring network that further improved Koorde. Although DHT-based systems are in general scalable and efficient, they only support exact-match search.

Recently, a new arena of P2P research is the usage of metadata to describe shared resources, which in part resolves the exact-match search problem embedded in pure DHT-based systems. Furthermore, it is believed that P2P systems using metadata can provide flexible and faster queries. There are many semantic-based P2P projects, such as RDFStore (RDFStore, 2006), Edutella (Nejdl et al., 2002, 2003, 2004), RDFPeers (Cai and Frank, 2004; Cai et al., 2004), Expertise (Haase et al., 2004), ContextPeers (Gu et al., 2005b), SCS (Gu et al., 2005a), SuperRing (Antonopoulos et al., 2006), M-Chord (Novak and Zezula, 2005), and R-Chord (Liu and Zhuge, 2006).


These semantic-based P2P models can be roughly classified into two categories based on their main objectives. The main objective of the first category is to provide faster queries by reducing the search space based on the metadata of shared resources. Examples of such P2P systems include Expertise, ContextPeers, SCS, and SuperRing. The main objective of the other category, called peer data management systems (PDMS), is to provide flexible queries such as disjunctive and range queries. Examples of such P2P systems include RDFStore, RDFPeers, M-Chord, Edutella, and R-Chord.

To support flexible queries in PDMS, each shared resource is described by metadata. All metadata are usually saved in repositories, called super-peers, which are nodes that are relatively static and have higher bandwidth or computing power. The organization of super-peers can be either centralized (e.g., RDFStore) or distributed (e.g., Edutella, RDFPeers, M-Chord, and R-Chord). In Edutella, metadata are saved in nodes, called super-peers, that are organized as a hypercube. In RDFPeers and M-Chord, metadata are saved in nodes that are organized in a Chord-like ring. In the R-Chord model, metadata are saved in super-peers that are organized in a hybrid structure and related to each other with views. In PDMS, queries are sent to repositories to locate the nodes that host the requested resources. Because super-peers in PDMS are relatively static, the maintenance overhead of these models is rarely discussed. Because the main purpose of this paper is to propose a P2P model that is not only efficient and scalable but also of low maintenance cost, P2P systems in the second category are not considered in the rest of the paper.

In this paper, we focus on the first category and propose a multi-layered P2P resource sharing model called ML-Chord. In ML-Chord, all resources are classified into categories based on a selected ontology. Each category corresponds to an overlay layer in ML-Chord. Because a shared resource may belong to more than one category, the node that hosts the shared resource may be linked to, or associated with, more than one layer. On each layer, nodes are organized in a Chord-like manner. From the results of our various experiments, which studied average query costs, average maintenance costs, the average cost of node joining, and the stability in case of massive node failure, we show that ML-Chord is superior to both Chord and SCS.

The rest of the paper is organized as follows: Section 2 briefly reviews semantic-based P2P protocols in the first category. The design of ML-Chord is discussed in Section 3. In Section 4, various simulation experiments and their results are presented and analyzed. Finally, we conclude our work in Section 5.


2. Related work

Edutella (Nejdl et al., 2002, 2003, 2004) is a Gnutella-like P2P network (Ripeanu et al., 2002) that utilizes RDF to describe and search a wide range of resources. In Edutella, users issue queries coded in the Edutella query language, and query messages are broadcast to all nearby nodes. If the queried resources cannot be found among the nearby nodes, the query messages are transmitted again to their nearby nodes until either the requested resources are found or failure messages are returned because the queries exceed their time-to-live (TTL). Like all broadcast-based P2P systems, Edutella suffers from large transmission overhead and poor scalability.

Unlike Edutella, which transmits query messages blindly to nearby nodes, Haase et al. (2004) proposed a P2P protocol that selects to-be-queried nodes intelligently to reduce unnecessary transmissions. In the Haase and Siebes model, each node extracts a summary, known as an expertise, from its knowledge base and sends the expertise to other nodes. When receiving an expertise, a node compares the received expertise with its own expertise. If they are similar, the node saves the expertise. When querying resources, a node sends query messages to adjacent nodes by way of broadcast. When receiving a query message, a node checks whether or not it has the requested resources. If not, it extracts the subject from the message and compares the subject with all the expertises the node has. Then, the query message is only sent to those adjacent nodes whose expertises are more similar to the query message than its own expertise. The Haase and Siebes model thus reduces transmission overhead by avoiding blind broadcasting. However, if there is a gap between the inquiring node and the target nodes, the resources on the target nodes may not be found.

Gu et al. (2005b) proposed ContextPeers, which classifies shared resources, based on their metadata, into groups. Each group is a ContextBus, as shown in Fig. 1. Each ContextBus is an unstructured network topology that delivers messages by way of broadcast. Each node can be linked to one or more ContextBuses based on the categories of its shared resources. A node with better capability (e.g., larger communication bandwidth or processing power, more battery power, etc.) can be selected as a BridgePeer, which is linked to all ContextBuses. When receiving a query message on a ContextBus it is linked to, a node broadcasts the query message to other nodes on the same ContextBus. Otherwise, when receiving a query message for resources on a ContextBus to which it has no direct link, a node transmits the query message to a BridgePeer, and the BridgePeer then broadcasts the query message onto the target ContextBus.

Fig. 1. Framework of ContextPeers file sharing system.

As stated by its developers (Gu et al., 2005a), the scalability of ContextPeers is poor and its maintenance cost is high. Therefore, they proposed a new model called semantic context space (SCS). In SCS, a ContextBus becomes a semantic cluster (SC), which is further divided into clusters. A node is allocated to a cluster based on the main category of its shared resources. Nodes within a cluster are fully interconnected. The network topology becomes a ring, as shown in Fig. 2. When a node (e.g., N1) receives a query message Q, N1 checks whether or not Q falls into its own category SC0. If yes, N1 broadcasts Q to its own cluster C0 and also forwards Q to adjacent clusters. Otherwise, N1 forwards Q to adjacent SCs such as SC1 and SC7. To speed up the search process, SCS allows users to define shortcuts between SCs; there is one shortcut between SC0 and SC4, as shown in Fig. 2. Although more shortcuts yield better search performance, the authors warned that the maintenance cost grows rapidly when the number of shortcuts is increased.

Fig. 2. The network topology of SCS.

Antonopoulos et al. (2006) developed a multi-ring model based on Chord. Each shared resource is described by one or more keywords. Nodes are organized in multiple keyword rings. Each node in a keyword ring contains the list of nodes that host resources matching a certain keyword/value pair. For example, as shown in Fig. 3, the FedoraVersion ring contains nodes N2, N4, and N7, which contain the lists of nodes that host resources matching "FedoraVersion = 6.0", "FedoraVersion = 7.1", and "FedoraVersion = 8.0", respectively. A Super Ring is also needed to connect the keyword rings. To query resources matching "FedoraVersion = 7.1", the keyword "FedoraVersion" is first hashed, and the hash value is used to locate a node in the Super Ring that is connected to the FedoraVersion ring. Then, the keyword value "7.1" is hashed to locate a node in the FedoraVersion ring. One major drawback of the model is that it heavily depends on a bootstrap server.

Fig. 3. The network topology of SuperRing.


3. The design of ML-Chord

The architecture: The proposed ML-Chord is a multi-layered P2P resource sharing model. The number of overlay layers depends on the number of categories for a specific domain or ontology. Each layer, called a category layer, is a Chord-like overlay network. There are two types of nodes: one is the normal peer, and the other is called the bridge peer (BP). Based on the categories of its shared resources, a peer may be associated with more than one layer. A peer with better capabilities (such as relatively higher processing power or bandwidth) can be selected as a BP. A BP is linked to all categories. For efficiency, all BPs themselves form a Chord-like overlay network called the BP layer. As shown in Fig. 4, if there are 4 categories, then ML-Chord has 4 + 1 = 5 layers.

Fig. 4. The architecture of ML-Chord.
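To make the layered structure concrete, the following sketch shows the per-node state that this architecture implies: one ring position (successor, predecessor, and finger table) for every layer a node joins, plus one BP finger table. This is only an illustration under our own naming; the class names, field names, and the convention of using layer 0 for the BP layer are not from the paper.

```python
# Illustrative sketch of ML-Chord per-node state (names and layer numbering are ours).
from dataclasses import dataclass, field
from typing import Dict, List, Optional

M = 16          # assumed identifier length in bits
BP_LAYER = 0    # we use 0 for the BP layer; category layers are 1..T

@dataclass
class LayerState:
    successor: Optional["MLChordNode"] = None
    predecessor: Optional["MLChordNode"] = None
    finger: List[Optional["MLChordNode"]] = field(default_factory=lambda: [None] * M)

@dataclass
class MLChordNode:
    node_id: int                      # H_m(IP) of this node
    is_bp: bool = False               # a bridge peer joins every category layer and the BP layer
    layers: Dict[int, LayerState] = field(default_factory=dict)   # layer -> ring state
    bp_finger: List["MLChordNode"] = field(default_factory=list)  # BP finger table (size d)
```

A normal peer typically holds one entry in `layers` (its own category) plus its BP finger table; a BP holds an entry for every layer, including `BP_LAYER`.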

Each node in ML-Chord has a unique ID number, denoted as NID_i. In an all-IP-based network, NID_i can be calculated as NID_i = H_m(IP) || C_i, where H_m is an m-bit hash function, || is a concatenation symbol, and C_i denotes the i-th category, with 1 ≤ i ≤ T and T the total number of categories. Similarly, the ID number of a shared resource can be calculated as K_i^RID = H_m(R) || C_i, where R denotes the content of the shared resource.
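For illustration, the identifiers above can be produced with any m-bit hash. The sketch below uses SHA-1 truncated to m bits and pairs the hash with the category index, which is one possible reading of H_m(.) || C_i; the helper names are ours, not the paper's.

```python
import hashlib

M = 16  # assumed identifier length in bits

def h_m(data: bytes, m: int = M) -> int:
    """Hash to an m-bit integer (SHA-1 truncated; the paper only requires some m-bit hash)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** m)

def node_id(ip: str, category: int) -> tuple:
    """NID_i = H_m(IP) || C_i, represented here as an (m-bit hash, category) pair."""
    return (h_m(ip.encode()), category)

def resource_key(content: bytes, category: int) -> tuple:
    """K_i^RID = H_m(R) || C_i for a shared resource with content R."""
    return (h_m(content), category)

print(node_id("192.168.0.7", 3))        # -> (some m-bit integer, 3)
print(resource_key(b"shared file", 3))
```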

Successor and predecessor: Every node in each layer (either a category layer or the BP layer) is sorted by its ID number, organized in a ring, and connected to a successor and a predecessor. The successor of a node id is the node that is arranged after id and closest to id, while the predecessor of id is the node that is arranged before id and closest to id. Fig. 5(A) shows the pseudocode of find_successor(), which is used to find the successor of node id in category c. Suppose a node n wants to find the successor of node id in category c; n invokes find_successor(c, id). find_successor(c, id) first checks whether or not n is linked to category c. If not, n asks a BP to find the successor on its behalf by invoking bp.find_successor(c, id). Otherwise, it checks whether or not id is within the range of n and n.successor[c], excluding n, where n.successor[c] denotes n's successor in category c. If id is within the specified range, n.successor[c] is returned; otherwise, find_successor(c, id) asks n' to find the successor of id, where n' is obtained by executing find_predecessor(c, id).

Fig. 5(B) shows the pseudocode of find_predecessor(c, id), which is used to find the predecessor of node id in category c. In find_predecessor(c, id), n.finger[c] denotes n's finger table for category c, and n.finger[c, i] denotes the i-th entry of n.finger[c]. For each entry in n.finger[c], find_predecessor(c, id) checks whether or not n.finger[c, i] is between n and id, excluding n and id. If yes, n.finger[c, i] is returned; otherwise, n is returned.
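A minimal sketch of this routing logic is given below, using the node representation from the earlier sketch and simulating remote calls with direct object references. The `in_interval` helper for circular-interval tests and the fallback to the successor when no closer finger is known are our own additions, not the paper's pseudocode.

```python
M = 16  # assumed identifier length in bits

def in_interval(x, a, b, inclusive_right=False, m=M):
    """True if x lies in the circular interval (a, b) (or (a, b]) on a 2^m identifier ring."""
    size = 2 ** m
    a, b, x = a % size, b % size, x % size
    if a == b:                                 # the interval covers the whole ring
        return inclusive_right or x != a
    off, span = (x - a) % size, (b - a) % size
    return 0 < off < span or (inclusive_right and off == span)

def find_successor(n, c, ident):
    """Fig. 5(A): find the successor of identifier `ident` in category c, starting from node n."""
    if c not in n.layers:                                  # n is not linked to category c:
        return find_successor(n.bp_finger[0], c, ident)    # delegate the query to a bridge peer
    succ = n.layers[c].successor
    if in_interval(ident, n.node_id, succ.node_id, inclusive_right=True):
        return succ
    closer = find_predecessor(n, c, ident)
    if closer is n:                            # no closer finger known: fall back to the successor
        return succ
    return find_successor(closer, c, ident)

def find_predecessor(n, c, ident):
    """Fig. 5(B): return a finger entry lying strictly between n and ident, or n itself."""
    for f in reversed(n.layers[c].finger):     # scan from the farthest entry towards the closest
        if f is not None and in_interval(f.node_id, n.node_id, ident):
            return f
    return n
```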


Fig. 5. (A) Get the successor of id and (B) get the predecessor of id.

Fig. 6. Finger tables and BP finger table of N51 and N42.


Finger tables: Every peer and BP has a routing table, called a finger table, for each category to which it belongs. The size of each finger table is m. The procedure for creating finger tables is as follows: for a node id in category c, id calculates t_k = (id + 2^(k-1)) mod 2^m for all k, where 1 ≤ k ≤ m. Then, for each t_k, id invokes find_successor(c, t_k) to obtain t_k', which is the successor of t_k. All t_k' constitute the finger table of id for category c.

In addition to the finger tables for category layers, there is a BP finger table in each peer, including BPs. The creation of the BP finger table for a BP is identical to the above procedure. However, the creation of the BP finger table for a normal peer is different. To create a BP finger table, a normal peer randomly selects a node from an arbitrary finger table. Then, the peer retrieves the first entry, which is a BP, from the BP finger table of the selected node, and the selected BP becomes the first entry of its BP finger table. The second entry of its BP finger table is the successor of the first entry. The same procedure continues until all entries are filled. The size of the BP finger table for a normal peer is d, satisfying 1 ≤ d ≤ m.

Fig. 6 shows an example of the finger tables of a BP and a normal peer in a domain of two categories. N51 is a BP and has three finger tables of size m. N42 is a normal peer and has two finger tables of size m. The size of N42's BP finger table is d. To determine the first entry of the BP finger table for N42, N42 first selects the first entry of its finger table for category 1. The selected node is N51. Then, because the first entry of N51's BP finger table is N23^BP (N23 on the BP layer), the first entry of N42's BP finger table is N23^BP. Also, the second entry of N42's BP finger table is the successor of N23, which is N51.
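The two construction procedures just described can be sketched as follows, reusing `find_successor` and the node representation from the earlier sketches; `known_nodes`, standing in for the arbitrary finger table from which the normal peer picks a node, is a hypothetical parameter.

```python
import random

M = 16
BP_LAYER = 0   # layer index used for the BP layer in these sketches

def build_finger_table(n, c):
    """Category-layer finger table: entry k is the successor of (id + 2^(k-1)) mod 2^m."""
    table = []
    for k in range(1, M + 1):
        t_k = (n.node_id + 2 ** (k - 1)) % (2 ** M)
        table.append(find_successor(n, c, t_k))      # find_successor as sketched earlier
    n.layers[c].finger = table

def build_bp_finger_table(n, d, known_nodes):
    """Normal peer's BP finger table of size d (1 <= d <= m)."""
    picked = random.choice(known_nodes)              # a node taken from an arbitrary finger table
    bp = picked.bp_finger[0]                         # its first BP entry becomes our first entry
    n.bp_finger = [bp]
    while len(n.bp_finger) < d:
        bp = bp.layers[BP_LAYER].successor           # each further entry: successor on the BP layer
        n.bp_finger.append(bp)
```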

Query: To query the location of a resource R in category c, the query node id hashes the resource to obtain H_m(R) and invokes id.find_successor(n, c) to find the location of R, where n is a node ID in id's finger table that is greater than or equal to H_m(R) but closest to H_m(R). If R is in a category c to which id has no direct link, id looks up its BP finger table to find a BP and asks the BP to query on its behalf. Using the finger table for category c, the BP invokes find_successor() to find the location of R. From the previous discussion, it is clear that the number of hops for a query is of logarithmic order O(1 + log(N/T)) if all nodes are uniformly distributed over the T category layers.
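Putting the pieces together, a lookup is essentially one hash plus a find_successor call, with one extra hop through a BP when the querying node is not linked to the target category. The sketch below uses the same illustrative helpers as above and is not the paper's pseudocode.

```python
import hashlib

M = 16

def h_m(data: bytes) -> int:
    """An m-bit hash (SHA-1 truncated); any m-bit hash function would do."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** M)

def lookup(n, c, resource: bytes):
    """Resolve resource R in category c starting from node n (find_successor as sketched earlier)."""
    key = h_m(resource)                    # H_m(R)
    if c in n.layers:                      # n participates in layer c: about log(N/T) hops
        return find_successor(n, c, key)
    bp = n.bp_finger[0]                    # otherwise one extra hop to a bridge peer,
    return find_successor(bp, c, key)      # giving the O(1 + log(N/T)) bound overall
```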


For example, as shown in Fig. 7(A), a node N34 in category C1 attempts to find out the location of a resource K19^2 (the superscript denotes the layer). Because N34 does not have a finger table for C2, N34 has to ask a BP to query for it. N34 looks up its BP finger table, obtains N51^BP, and sends the query message to N51^BP. On behalf of N34, N51^2 continues the query. Although N51^BP and N51^2 are conceptually located at two different layers, they are the same node. As shown in Fig. 7(B), N51^2 looks up its finger table for category C2 and obtains N8^2, which is close to K19^2. N51^2 sends the query message to N8^2 and asks N8^2 to continue the query. The same procedure is repeated until N23^2 is reached.

Node join: When a new node joins ML-Chord, it has to find its successor first and then connect to the successor. The procedure is described in Fig. 8. If a node n wants to join a category c, but there is no existing node in c, n invokes create(c) as shown in Fig. 8(A). In create(c), n simply sets its predecessor to nil and its successor to itself.

However, if there are other nodes in category c, n randomly selects a node snode and invokes join(c, snode) as shown in Fig. 8(B). In join(c, snode), n sets its predecessor to nil, invokes snode.find_successor(c, n) to obtain its successor, and finally resets its successor's predecessor to n. An example of a node join is illustrated in Fig. 9. Initially, N23^2 is the successor of N14^2, as shown in Fig. 9(A). As N17^2 joins ML-Chord, it invokes join() and finds out that N23^2 is its successor, as shown in Fig. 9(B). Note that, at this stage, the predecessor and the successor of N17^2 are nil and N23^2, respectively. The predecessor of N23^2 is N17^2. However, the successor of N14^2 is still N23^2. The situation will not be corrected until maintenance is completed.
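The create()/join() behaviour described above, including the temporary inconsistency it leaves behind, can be sketched as follows, again under the node representation used in the earlier sketches:

```python
def create(n, c):
    """n is the first node in category c: it becomes its own successor (Fig. 8(A))."""
    n.layers[c] = LayerState()              # LayerState as sketched earlier
    n.layers[c].predecessor = None
    n.layers[c].successor = n

def join(n, c, snode):
    """n joins category c through an existing node snode (Fig. 8(B))."""
    n.layers[c] = LayerState()
    n.layers[c].predecessor = None
    succ = find_successor(snode, c, n.node_id)
    n.layers[c].successor = succ
    succ.layers[c].predecessor = n          # e.g. N23's predecessor becomes N17 right away, but
                                            # N14's successor is only repaired later by stabilize()
```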

Fig. 7. An example query process.

Fig. 8. The pseudocode for node join.

Maintenance: Because nodes may join or leave ML-Chord frequently, ML-Chord must be maintained periodically to keep routing information such as finger tables, successors, and predecessors accurate. The maintenance task includes stabilize(), fix_finger(), and check_predecessor(), as described in Fig. 10. The main function of n.stabilize() is to periodically check and correct n's successor and predecessor if necessary. In n.stabilize(), for each category c, the predecessor of n's successor is first assigned to x. Then, if x is within the range of n and n.successor[c], excluding n and n.successor[c], x is assigned to n.successor[c]. Finally, n.successor[c].notify(n) is invoked. The purpose of n.notify(n') is to set n.closest_preceding_node[c] to n' if either n.closest_preceding_node[c] is nil or n' is within the range of n.closest_preceding_node[c] and n, excluding n.closest_preceding_node[c] and n. For example, in Fig. 9(B), after N17^2 joined the network, its predecessor is nil. Also, N14^2's successor is N23^2. As stated earlier, these are incorrect. When N14^2 executes stabilize(), x is N17^2. Because x is in the range of N14^2 and N23^2, N14^2's successor is set to x, which is N17^2. Also, N17^2 executes notify(N14^2). Since N17^2's predecessor is nil, it is set to N14^2. Therefore, after stabilize() is executed, all errors are corrected, and the result is shown in Fig. 9(C).

fix_finger() was designed to maintain finger tables, and its pseudocode is shown in Fig. 10(B). In fix_finger(), every node rebuilds its finger tables and its BP finger table as described earlier. Finally, n.check_predecessor() was designed to check whether or not the predecessor of n has left. If so, n's predecessor is set to nil, as shown in Fig. 10(C).
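A sketch of these three routines is given below, using the node representation and the `in_interval` and `build_finger_table` helpers from the earlier sketches; `alive` is a hypothetical liveness check (e.g., a ping with a timeout), not something the paper defines.

```python
def stabilize(n):
    """Periodically correct n's successor and predecessor in every layer it is linked to."""
    for c, layer in n.layers.items():
        x = layer.successor.layers[c].predecessor
        if x is not None and in_interval(x.node_id, n.node_id, layer.successor.node_id):
            layer.successor = x                  # a node has joined between n and its old successor
        notify(layer.successor, c, n)

def notify(n, c, candidate):
    """candidate believes it is n's predecessor in category c."""
    cur = n.layers[c].predecessor
    if cur is None or in_interval(candidate.node_id, cur.node_id, n.node_id):
        n.layers[c].predecessor = candidate

def fix_fingers(n):
    """Rebuild every finger table; a normal peer rebuilds its BP finger table the same way
    it was originally created."""
    for c in n.layers:
        build_finger_table(n, c)

def check_predecessor(n, alive):
    """If the predecessor has left or failed, reset it to nil."""
    for c, layer in n.layers.items():
        if layer.predecessor is not None and not alive(layer.predecessor):
            layer.predecessor = None
```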



Fig. 9. An example node join.

Fig. 10. The pseudocode for maintenance.


4. Simulation experiments and analysis

PeerSim (2006) was used as the simulator for all experiments presented in this section because it is developed in Java and can simulate up to 1,000,000 nodes. Since ML-Chord was developed based on Chord, it is interesting to investigate whether or not ML-Chord outperforms Chord in various measurements. Additionally, because ML-Chord divides the network into multiple layers as SCS does, the experiments were also designed to compare ML-Chord and SCS.

4.1. Simulation environment

Unless otherwise stated, all experiments presented below are based on the parameters listed in Table 1. The number of nodes is 2^k, where 9 ≤ k ≤ 15. The number of categories, the number of clusters per category, and the number of shortcuts are 16, 8, and 2, respectively; these values are used in Gu et al.'s experiments (Gu et al., 2005a) and are therefore also used in ours. However, because there is only one layer for Chord, the number of categories for Chord is 1. Although one node can be linked to multiple categories in ML-Chord, the number of categories a peer is linked to is set to 1. Also, the number of BPs is set to 8. In later sections, we study the effects of changing the number of categories a peer is linked to and the number of BPs.

4.2. Average query costs

The average query cost is the average number of hops query messages have to go through. It was calculated as follows: one node was randomly selected, all other nodes in the network queried the node in turn, and the average number of hops was calculated. As shown in Fig. 11, the experimental results show that the average query cost of ML-Chord is the lowest of the three models. Additionally, while the average query costs of both ML-Chord and Chord increase slightly, the average query cost of SCS grows significantly when the number of nodes increases.

In the previous experiments, the number of categories is fixed. However, it is interesting to know the effects on the average query cost when the number of categories is changed. The number of nodes is fixed at 2^10. The number of categories is 2^c, where c is 1, 2, 3, 4, 5, 6, or 7. The experimental results are shown in Fig. 12. It is clear that ML-Chord outperforms SCS in all cases. Also, it is observed that, when the number of categories increases, the average query cost of ML-Chord decreases slightly, while the average query cost of SCS decreases in the beginning and then increases significantly. We investigated this issue further and found that, in SCS, when the number of categories is small, the number of nodes in each cluster is large. As a result, the query cost is dominated by message broadcasting within clusters. When the number of categories is increased, the query cost becomes dominated by transmitting query messages among clusters.

Table 1. Simulation parameters

Number of nodes: 2^k, where k = 9, 10, 11, ..., 15

Parameter                         Chord    ML-Chord    SCS
Number of categories              1        16          16
Number of categories per peer     -        1           -
Number of BPs                     -        8           -
Number of clusters per category   -        -           8
Shortcuts                         -        -           2


Fig. 11. Average query costs vs. number of nodes.


Fig. 12. Average query costs vs. number of categories.


4.3. Maintenance costs

In P2P systems, a node may join or leave the network at will. Therefore, it is necessary to maintain the network periodically. The average maintenance cost is calculated as the total number of messages transmitted during maintenance divided by the number of nodes. In the experiments, it is assumed that no node joins or leaves the network during maintenance. The number of nodes is 2^k, where 8 ≤ k ≤ 15. As shown in Fig. 13, the results show that the maintenance cost of SCS is much lower than that of ML-Chord and Chord. Moreover, the maintenance cost of ML-Chord is lower than that of Chord. As described in the previous section, one major part of the maintenance cost is the cost of find_successor() at each node. Because the cost of find_successor(c, n) in ML-Chord (the size of the search space is N/T, where N is the total number of nodes and T is the number of categories) is smaller than the cost of find_successor(n) in Chord (the size of the search space is N), ML-Chord has a lower maintenance cost than Chord.

Fig. 13. Maintenance costs vs. number of nodes.

4.4. Node joining

When a node joins the network, it is required to transmit messages to update routing information. The average cost of joining a node is calculated as follows: when a node joins the network at a specific node, the number of transmitted messages is recorded. Then, the node joins the network at all other nodes in turn, and the average number of transmitted messages is calculated. The experimental results are shown in Fig. 14. From the figure, it is clear that ML-Chord outperforms both Chord and SCS. Because the cost of node joining is dominated by find_successor(), and because the cost of find_successor(c, n) in ML-Chord is smaller than the cost of find_successor(n) in Chord, the cost of node joining in ML-Chord is lower than that of Chord.

Fig. 14. Joining costs vs. number of nodes.

4.5. Total costs

From the previous experimental results, it is no surprise that the average query cost of ML-Chord is much lower than that of SCS, while the maintenance cost of SCS is lower than that of ML-Chord.

Fig. 15. Analysis of total costs.

Therefore, further investigation is required. In the following discussion, the total costs of the three models are compared. For simplicity, only the query cost and the maintenance cost are considered. However, it is noted that the average cost of joining a node in ML-Chord is lower than that of SCS and Chord.

It is assumed that the number of nodes is 2^10 and that maintenance tasks are performed on each node every 30 s. Within one 30-s period, there are 256, 512, 768, 1024, 1280, and 1536 queries. The total costs are shown in Fig. 15. From the results, the total cost of ML-Chord is identical to that of SCS when the number of queries is about 300. When the number of queries is greater than 300, ML-Chord is much superior to SCS. As for Chord and SCS, the total cost of Chord is identical to that of SCS when the number of queries is 760.


Additionally, the total costs were calculated when the numbers of nodes are 2^k, where 11 ≤ k ≤ 15, and the results are similar to Fig. 15, except that the break-even point moves left (i.e., the number of queries at the break-even point decreases) as the number of nodes increases. In the interest of space, these figures are omitted.
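The comparison above reduces to a simple message budget per maintenance period. The helper below is our own formulation of that comparison, with all costs passed in as parameters (no values from the figures are hard-coded); the break-even point reported in Fig. 15 is where two models' totals coincide.

```python
def total_cost(num_queries, avg_query_hops, num_nodes, maint_msgs_per_node):
    """Messages per maintenance period: lookup traffic plus periodic maintenance traffic."""
    return num_queries * avg_query_hops + num_nodes * maint_msgs_per_node

def break_even_queries(hops_a, maint_a, hops_b, maint_b, num_nodes):
    """Queries per period at which model A's total cost equals model B's:
    solves q*hops_a + N*maint_a = q*hops_b + N*maint_b for q."""
    return num_nodes * (maint_a - maint_b) / (hops_b - hops_a)
```

Beyond that break-even number of queries, the model with the lower per-query hop count (here ML-Chord) has the lower total cost.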

4.6. Node leaving/failures

In this experiment, the stability of ML-Chord, Chord, and SCS is studied. The stability of a P2P model is measured using the average query cost, the average number of timeouts, and the success rate after a large number of nodes fail and before maintenance is performed. The number of nodes was set to 2^10, and the ratios of failing nodes are 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5. For completeness, every living node was selected in turn as a target node, and the other living nodes queried the target node. Through these experiments, the average, minimum, and maximum query costs and numbers of timeouts were calculated. The experimental results are summarized in Table 2.

Table 2. Analysis of node failures

                              Failure ratio
              0.0            0.1            0.2            0.3            0.4            0.5

Avg. query cost (min, max)
ML-Chord  4.042 (1,6)    4.188 (1,8)    4.393 (1,9)    4.414 (1,9)    4.413 (1,9)    4.507 (1,12)
Chord     5.004 (1,10)   5.315 (1,13)   5.629 (1,14)   6.004 (1,16)   6.286 (1,18)   6.632 (1,19)
SCS       13.832 (1,39)  23.375 (1,309) 31.881 (1,351) 37.503 (1,303) 26.825 (1,243) 22.340 (1,192)

Avg. no. of timeouts (min, max)
ML-Chord  0.0 (0,0)      1.535 (1,9)    2.381 (1,11)   2.163 (1,12)   2.860 (1,16)   3.542 (1,22)
Chord     0.0 (0,0)      2.30 (1,14)    3.104 (1,19)   4.155 (1,24)   5.360 (1,35)   7.205 (1,45)
SCS       0.0 (0,0)      5.114 (1,66)   12.289 (1,153) 22.044 (1,231) 23.624 (1,232) 23.868 (1,228)

Success rate (%)
ML-Chord  100.00         99.76          95.96          94.62          86.24          80.73
Chord     100.00         99.57          96.98          93.73          85.34          71.14
SCS       100.00         99.99          99.13          90.99          62.76          36.56


As shown in the table, the average query cost and the average number of timeouts of ML-Chord are far superior to those of SCS and Chord. The only exception is that the success rate of SCS is better than that of ML-Chord when the failure ratio is 0.1 or 0.2. This is because SCS does not have a TTL value, and thus queries are not terminated until either the target node is found or all nodes have failed. However, when the failure ratio is greater than 0.2, the success rate of SCS is very poor. This is because, in each semantic cluster, only one node is in charge of the inter- and intra-cluster communications among semantic clusters. Once the failure probability of that node increases, the success rate decreases.

4.7. Number of categories per peer

As stated earlier, a node in ML-Chord may be linked to more than one category. In the previous experiments, it is assumed that each node is linked to only one layer; in other words, each node has one finger table and one BP finger table. In practice, however, a node may be linked to more than one layer. Therefore, in this section, we study the average query cost when each node in the network is linked to 2^k layers, where 0 ≤ k ≤ 4. The experimental results are shown in Fig. 16. From the figure, it is interesting to note that the average query cost is the lowest when each node is linked to only one layer. This is because the number of nodes in each category increases when a node is linked to more than one layer. When the number of nodes in a category layer increases, the query cost also increases. Consequently, we recommend that, even when a node can be linked to more than one layer, the node should be assigned to only one layer.

Fig. 16. Average query costs for various numbers of categories per peer.

4.8. The number of bridge peers

From the results of the previous experiments, it is known that the average query cost is the highest when every node is a BP. Thus, further investigation is worthwhile to find a reasonable number of BPs for an ML-Chord network. The experimental results are presented in Fig. 17. From the figure, it is clear that the average query cost remains flat (or close to flat) when the number of BPs increases in the beginning. However, when the number of BPs grows over a threshold, the query cost starts increasing rapidly. For example, when the number of nodes is 2^10, the query cost starts increasing rapidly when the number of BPs is 16. Similarly, the threshold values are 32, 64, 128, 256, and 512 when the numbers of nodes are 2^11, 2^12, 2^13, 2^14, and 2^15, respectively. As a result, it is safe to say that the number of BPs should be set to 8.

Fig. 17. Average query costs vs. number of BPs.

5. Conclusions and future work

When designing a P2P model, it is important to take both efficiency and scalability into consideration. In this paper, we proposed an efficient and scalable multi-layered P2P model called ML-Chord. From the experimental results, it is shown that:

- ML-Chord is superior to SCS and Chord in queries.
- Although SCS's maintenance cost is lower than that of ML-Chord, the overall efficiency (measured as the average maintenance cost plus the average query cost) of ML-Chord still outperforms both SCS and Chord.
- When the number of nodes increases, the average query cost of ML-Chord grows only slightly, while the average query cost of SCS grows rapidly. This result demonstrates that ML-Chord is more scalable than SCS.
- From Table 2, it is shown that ML-Chord is more stable than both Chord and SCS. However, the success rate of ML-Chord is a little worse than that of SCS when the failure ratio is 0.1 or 0.2. When the failure ratio is increased to 0.3, 0.4, and 0.5, ML-Chord proves to be more stable than the others.
- Although a node can be linked to more than one overlay layer, it is suggested that each node be linked to only one layer for better performance.

As stated earlier, peer data management systems (PDMS) are a promising and important research area. One critical issue in designing a PDMS is to select a reasonable number of super-peers to manage metadata while keeping the maintenance cost reasonably low. From the experimental results shown in Fig. 17, BPs seem to be a good candidate for super-peers. However, further investigation is required.

References

Antonopoulos N, Salter J, Peel R. A multi-ring method for efficient multi-dimensional data lookup in P2P networks. In: Proceedings of the 1st international conference on scalable information systems; 2006. p. 10–6.
Cai M, Frank M. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on World Wide Web; 2004. p. 650–7.
Cai M, Frank M, Yan B, MacGregor R. Subscribable peer-to-peer RDF repository for distributed metadata management. Web Semant: Sci Services Agents World Wide Web 2004;2(2):109–30.
Garces-Erice L, Biersack E, Ross K, Felber P, Urvoy-Keller G. Hierarchical peer-to-peer systems. Parallel Process Lett 2003;13(4):643–57.
Gu T, Pung HK, Zhang D. A peer-to-peer overlay for context information search. In: Proceedings of the 14th international conference on computer communications and networks (ICCCN 2005). NY: Wiley; 2005a. p. 395–400.
Gu T, Tan E, Pung HK, Zhang D. ContextPeers: scalable peer-to-peer search for context information. In: Proceedings of the 1st international workshop on innovations in web infrastructure (IWI 2005); 2005b.
Haase P, Siebes R, van Harmelen F. Peer selection in peer-to-peer networks with semantic topologies. In: Proceedings of the international conference on semantics in a networked world (ICNSW'04); 2004. p. 108–25.
Kaashoek M, Karger D. Koorde: a simple degree-optimal distributed hash table. In: Proceedings of the 2nd international workshop on peer-to-peer systems (IPTPS'03); 2003.
Liu J, Zhuge H. A semantic-based P2P resource organization model R-Chord. J Syst Software 2006;79(11):1619–31.
Lv Q, Cao P, Cohen E, Li K, Shenker S. Search and replication in unstructured peer-to-peer networks. In: Proceedings of the 16th international conference on supercomputing; 2002. p. 84–95.
Mislove A, Druschel P. Providing administrative control and autonomy in structured peer-to-peer overlays. In: Proceedings of the 3rd international workshop on peer-to-peer systems (IPTPS'04); 2004.
Nejdl W, Wolf B, Qu C, Decker S, Sintek M, Naeve A, et al. EDUTELLA: a P2P networking infrastructure based on RDF. In: Proceedings of the 11th international World Wide Web conference (WWW 2002); 2002. p. 604–15.
Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M, Brunkhorst I, et al. Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks. In: Proceedings of the 12th international conference on World Wide Web; 2003. p. 536–43.
Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M, Brunkhorst I. Super-peer-based routing strategies for RDF-based peer-to-peer networks. Web Semant: Sci Services Agents World Wide Web 2004;1(2):177–86.
Novak D, Zezula P. M-Chord: a scalable distributed similarity search structure. In: Proceedings of the 2005 international conference on foundations of computer science (FCS'05); 2005.
PeerSim, 2006. PeerSim. Available from <http://peersim.sourceforge.net/>.
Ratnasamy S, Francis P, Handley M, Karp R. A scalable content-addressable network. In: ACM SIGCOMM 2001; 2001. p. 161–72.
RDFStore, 2006. RDFStore. Available from <http://rdfstore.sourceforge.net/>.
Ripeanu M, Foster I, Iamnitchi A. Mapping the Gnutella network: properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet Comput 2002;6(1):50–7.
Rowstron A, Druschel P. Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems. In: IFIP/ACM international conference on distributed systems platforms; 2001. p. 329–50.
Sen S, Wang J. Analyzing peer-to-peer traffic across large networks. IEEE/ACM Trans Network 2004;12(2):219–32.
Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H. Chord: a scalable peer-to-peer lookup service for Internet applications. In: ACM SIGCOMM 2001; 2001. p. 149–60.
Stoica I, Morris R, Liben-Nowell D, Karger DR, Kaashoek MF, Dabek F, et al. Chord: a scalable peer-to-peer lookup protocol for Internet applications. IEEE/ACM Trans Network 2003;11(1).
Wepiwe G, Simeonov P. A concentric multi-ring overlay for highly reliable P2P networks. In: Proceedings of the 2005 4th IEEE international symposium on network computing and applications (NCA'05); 2005. p. 83–90.
Xu Z, Min R, Hu Y. HIERAS: a DHT based hierarchical P2P routing algorithm. In: Proceedings of the 2003 international conference on parallel processing (ICPP'03); 2003. p. 187–94.
Zhao B, Duan Y, Huang L, Joseph A, Kubiatowicz J. Brocade: landmark routing on overlay networks. In: Proceedings of the 1st international workshop on peer-to-peer systems (IPTPS'02); 2002.
Zhao B, Huang L, Stribling J, Rhea S, Joseph A, Kubiatowicz J. Tapestry: a resilient global-scale overlay for service deployment. IEEE J Sel Areas Commun 2004;22(1):41–53.