
A peer-to-peer system for on-demand sharing of capacity across network applications

Georgios Exarchakos & Nick Antonopoulos

Received: 14 March 2009 / Accepted: 24 August 2011
© Springer Science+Business Media, LLC 2011

G. Exarchakos · N. Antonopoulos
University of Surrey, Surrey GU2 7XH, Guildford, UK

N. Antonopoulos
e-mail: [email protected]

G. Exarchakos (*)
Electrical Engineering, Eindhoven University of Technology, Eindhoven 5612 AZ, The Netherlands
e-mail: [email protected]

Abstract As a plethora of distributed applications emerge, new computing platforms are necessary to support their extra and sometimes evolving requirements. This research derives its motive from deficiencies of real networked applications deployed on platforms unable to fully support their characteristics, and proposes a network architecture to address that issue. Hoverlay is a system that enables logical movement of nodes from one network to another, aiming to relieve requesting nodes which experience high workload. Node migration and a dynamic server overlay differentiate Hoverlay from Condor-based architectures, which exhibit more static links between managers and nodes. In this paper, we present a number of important extensions to the basic Hoverlay architecture, which collectively enhance the degree of control owners have over their nodes and the overall level of cooperation among servers. Furthermore, we carried out extensive simulations, which showed that Hoverlay outperforms Condor and Flock of Condors in both success rate and average successful query path length at a negligible increase in messages.

Keywords Peer-to-peer · Computational resource sharing · Resource migration · Keyword-based control

1 Introduction

Heterogeneous distributed applications deployed on different networks may have quite variable network throughput requirements during their lifetime. We define the Network Capacity as the number of user queries a node can process within a time unit. The Network Capacity depends on the combined communication and computation throughput of that node. In case the application's workload exceeds the available network capacity (overloaded situation), new nodes are required to join the network to serve the demand. On the contrary, when the application produces traffic that fewer nodes could efficiently serve (underloaded situation), the network application may free some of them and increase the remaining nodes' utilization.

Hoverlay is a P2P management system that enables the sharing of reusable resources, specifically network capacity. Network capacity is a non-replicable, reusable, stochastically available resource: only one instance of it may exist within a network and no more than one user may use it at a time [1]. This architecture facilitates the cooperation of heterogeneous networks for improving the utilization of the spare (currently not used) capacity in the whole system. The overlay consists of a set of interconnected servers, each of which represents the nodes of an underlying network.

In this paper we extend the CSOA model presented in [2] with a detailed presentation of the resource matching functionality of servers and with a mechanism for improved server collaboration based on keyword-driven descriptions of underlying applications. Moreover, this paper builds on top of [2] with an extensive evaluation section comparing Hoverlay with competitive systems. That model, CSOA, was named after the initials of a phrase describing its functionality. Now, the same model is identified with a unique name: Hoverlay.


The main contributions of this work are:

- A distributed cooperation mechanism among free resources and requestors in a multi-domain network. Hoverlay relies on resource migration; hence, control over resource mobility becomes important. The proposed mechanism discourages closed group formations via rewiring.

- Extensive experimentation on the behaviour of Hoverlay against its more static counterpart, Flock of Condors. We claim that resource migration can significantly improve the success rate and discovery latency of a distributed environment, especially in cases of flash crowds at certain areas.

The paper has the following structure. After a literature review (Section 2) to justify the scope of this work, we give a brief overview of Hoverlay (Section 3) and continue with the cooperation mechanism in Section 4. Sections 5, 6 and 7 deal with the setup of the extensive experiments and their results in two scenarios respectively. Conclusions and future work close the paper in Section 8.

2 Related work

Any centralized approach to interconnecting all the underlying networks' servers would suffer from the high workload caused by frequent queries and advertisements of the requested and available capacity respectively. The high rates of leave/join actions of the servers and nodes would cause significant extra update overhead. In case of a failure of the central manager, no capacity sharing would be possible.

Adopting a more distributed approach using P2P networks solves the single-point-of-failure and reliability problems of the centralised one. Replication of resources [3, 4] may increase the throughput performance of the overlay network since it increases the availability of the same resource [5]. Advertisement [6] or gossiping may direct the query faster to the resource provider, thus reducing the latency. DHT-based P2P systems such as CAN [7], Chord [8], Pastry [9] and Tapestry [10] can guarantee successful discovery, if the requested resource is available in the network, within O(log n) messages [11]. However, replication, gossiping/advertisement and informed resource discovery techniques of P2P networks are not applicable to the discovery of network capacity, since it is a non-replicable reusable resource with highly fluctuating availability. Organizing the overlay servers in a structured P2P network would require the use of its lookup function each time a node joins the system, resulting in a high maintenance cost.

Existing research on high throughput computing has produced several solutions to the discovery of reusable resources, especially storage capacity and CPU cycles. Condor [12, 13] is one of the most mature high throughput computing technologies. It is a distributed job scheduler providing the appropriate services for submission and execution of jobs on remote idle resources even if they are under different administrative domains. A central manager receives the advertisements of the available resources and tries to submit the queued jobs to the appropriate ones, which report back to the manager the execution state of each job. The central manager along with the idle resources constitutes the Condor pool. Flocking [14, 15] was introduced to statically link several Condor pools and share resources between them. Manual configuration of a pool's neighbours is required, thus limiting the adaptivity of the system in case of dynamic changes in resource availability. It is also assumed that the pool managers run on reliable machines [16], since their failure can prevent the execution of new jobs. These problems can be approached with a Pastry [9] self-organizing overlay of Condor pools [16]. The Condor pools are organized into a ring and proactively advertise their idle resources to their neighbours so that they can choose these advertised resources whenever necessary. Unfortunately, this P2P-based flock of Condors requires a substantial maintenance overhead for updating the proximity-aware routing tables, since it is based on the advertisements of available resources. If the availability of the resources changes very frequently, these updates need to be frequent too, and therefore incur high maintenance costs.

Finally, an important feature of Condor flocking is that the execution machines are always managed by the same managers. Thus, every new discovery of the same or similar remote resources by the same manager follows the same procedure. Given that the required network capacity could frequently exceed the locally available one, the local manager would forward equally frequent queries seeking almost the same amount of capacity, thus resulting in a significant number of messages.

The benefits of P2P overlays [11, 17] for the discovery of reusable resources have been identified and used in P-Grid. P-Grid, identifying the update overhead posed by available resources' advertisements on DHT-based overlays, uses a tree-based distributed storage system of the requesting resources' advertisements [18]. The resource providers locate in this tree the requestors they can serve and offer themselves for use. While other structured P2P networks hash the indexing keys, thus limiting the searching capabilities, P-Grid enables complex queries. The organization of this overlay raises a number of concerns about its scalability in the case of large, highly dynamic networks, since an update of one advertisement could propagate to many peers.

Sensor networks are another field that uses the benefits of P2P networks to achieve reliable cooperation of networked sensors. Recent research on P2P-based end-to-end bandwidth allocation [19] proposes a wireless unstructured overlay of sensors. Initially a central peer possesses all the bandwidth and distributes it on demand, and every query is broadcast to all peers. This system cannot be applied to the case of network capacity sharing since it assumes that the available bandwidth within the whole network is known a priori and that the topology of the network remains the same. Finally, it is suitable for wireless environments where the cost of broadcasting is the same as that of unicasting.

All the systems described above are efficient in the context they were developed for, but they are insufficient in the context of network capacity. Network characteristics may change extremely fast, so that any advertisement and/or indexing scheme could result in frequent updates with a high cost in messages.

3 Architecture overview

In this section we provide a brief summary of the Hoverlay functionality; for more details please see [2]. A Hoverlay server uses only a random list of other server IDs (Neighbour List) to share its own resources and discover new ones in the overlay. Whenever necessary, a local (requesting) server forwards queries originated from underlying nodes (internal queries) to its neighbours and waits for an answer. A Node Pool, embedded in every server, keeps records of available underutilized nodes and tries to satisfy an internal query using that capacity, reserving as much as possible. Any extra amount of capacity (if at all) not provided by the local pool is requested from neighbouring servers. Each request has a lifetime, which is the maximum time a requesting server may wait for answers from the overlay. A query terminates if its lifetime expires or an answer is received.

In case of an external query (sent by another overlay server), servers try to completely satisfy it using capacity in their local Node Pool only. If there is sufficient local capacity, a pool reserves a subset of the available resources that collectively represent at least as much capacity as the query requires and initiates a handshaking protocol to deliver those resources to the query originator server. Otherwise (not enough capacity available), it forwards the same query to its server neighbours without reserving any resources. Figure 1 illustrates the main Hoverlay components, and a sketch of this behaviour follows.
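The following sketch illustrates this query-handling behaviour. It is not the paper's implementation; class and attribute names are illustrative assumptions, and capacity is simplified to integer units.

```python
# Illustrative sketch only: names and structure are assumptions, not the
# authors' implementation. Capacity is simplified to integer units.
from dataclasses import dataclass, field

@dataclass
class Query:
    originator: "Server"
    capacity: int            # capacity units still being sought
    ttl: int = 7

@dataclass
class Server:
    pool: list = field(default_factory=list)        # Node Pool: free node capacities
    neighbours: list = field(default_factory=list)  # Neighbour List

    def on_internal_query(self, q: Query) -> None:
        # Internal queries may reserve ANY amount of locally available capacity;
        # only the remainder (if any) is requested from neighbouring servers.
        reserved = self._reserve(q.capacity)
        if reserved < q.capacity:
            self._forward(Query(self, q.capacity - reserved, q.ttl))

    def on_external_query(self, q: Query) -> None:
        # External queries are satisfied completely from the local pool or not
        # at all; if the pool falls short, forward without reserving anything.
        if sum(self.pool) >= q.capacity:
            q.originator.receive_answer(self._reserve(q.capacity))
        else:
            self._forward(q)

    def _reserve(self, amount: int) -> int:
        # Reserve free nodes until their collective capacity covers the request.
        taken = 0
        while self.pool and taken < amount:
            taken += self.pool.pop()
        return taken

    def _forward(self, q: Query) -> None:
        if q.ttl > 0:                                # lifetime/TTL check
            for n in self.neighbours:                # flooding: all neighbours
                n.on_external_query(Query(q.originator, q.capacity, q.ttl - 1))

    def receive_answer(self, capacity: int) -> None:
        print(f"answer received: {capacity} capacity units")
```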

All three server components (Neighbour List, Node Pool and Query Processor) are responsible for handling incoming queries and answers. The Underlying Network Relocator resides in every underlying node, accomplishing its logical movement as well as monitoring its workload:

- Neighbour List (NL) manages a server's direct links to neighbouring servers and determines the next destinations of a forwarded query. It applies the server's forwarding policy (if flooding, all neighbours are selected; a subset otherwise). As stated in the previous sections, the deployment of informed techniques is not an efficient approach to service discovery in a highly intermittent environment such as Hoverlay; thus, at the implementation level, the servers use blind search methods. The proposed architecture is designed to operate with no centralized monitoring layer or neighbour-list updating scheme; though there is no guarantee of good direct links between servers, the maintenance costs of these lists stay minimal. This may improve system applicability in large-scale networks. A server's neighbour list gets refreshed upon receiving an answer, replacing the oldest neighbour with the answer originator server (see the first sketch after this list). Thus, the answer rate drives the rewiring rate; if most of the system's resources are busy, answers and, thus, rewirings are rare. Symmetrically, low-loaded environments generate few queries and trigger even fewer rewirings. The neighbour lists get refreshed in situations with normal load. These frequent updates help keep the overlay connected.

- Node Pool (NP) keeps a record of free nodes until they are reused. These records primarily focus on access and commission details. Internal queries may reserve any amount of capacity available, whereas external ones can only reserve capacity that fully satisfies their requirements. Node Pools keep some of the available capacity (safety capacity) for use by underlying busy nodes only. No external query is satisfied if the capacity available in a Node Pool is lower than its safety level. Safety capacity is used to serve only internal queries produced by small fluctuations of the workload of underlying nodes. It prevents a large number of queries from being forwarded to the overlay. The safety capacity size is a percentage of the average capacity requested by the underlying nodes within the last few time units (time frame); a sketch follows this list. This percentage and time frame are application specific and are configured by server administrators. An implementation of Hoverlay may use any kind of well-established resource specification format (e.g. Condor ClassAds [20]) adopted by all servers.

- Query Processor (QP) processes any incoming query and performs all the communication activities of a server. It caches any internal and external query for a given period of time and interacts with the Node Pool to satisfy it if possible, responding back, or forwarding it to neighbouring servers otherwise. In the case of internal queries, it waits for the answers. As soon as an answer is received, it merges the discovered capacity with that reserved in the local Node Pool, if any, and acknowledges back.

- Underlying Network Relocator (UNR) resides in underlying network nodes and is responsible for controlling node relocations from one network to another. It is used by the remote servers that nodes migrate to.
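Two of the mechanisms above lend themselves to short sketches: the answer-driven rewiring of the Neighbour List and the safety-capacity rule of the Node Pool. Both are illustrative readings of the description, with assumed names and placeholder parameter values.

```python
# Sketch 1: answer-driven rewiring. The Neighbour List is kept as a FIFO, so
# receiving an answer evicts the oldest neighbour in favour of the answer's
# originator; the rewiring rate therefore tracks the answer rate.
from collections import deque

class NeighbourList:
    def __init__(self, initial, max_size=3):
        self.links = deque(initial, maxlen=max_size)  # oldest link on the left

    def on_answer(self, originator):
        if originator not in self.links:
            self.links.append(originator)  # maxlen deque drops the oldest link

    def forwarding_targets(self):
        return list(self.links)            # flooding: every neighbour selected
```

```python
# Sketch 2: safety capacity. Its level is a fraction of the average capacity
# requested by underlying nodes over a sliding time frame; both the fraction
# and the frame length are administrator-set (values here are placeholders).
from collections import deque

class SafetyCapacity:
    def __init__(self, percentage=0.2, frame_len=10):
        self.percentage = percentage
        self.requests = deque(maxlen=frame_len)  # requested capacity per time unit

    def record(self, requested):
        self.requests.append(requested)

    def level(self):
        if not self.requests:
            return 0.0
        return self.percentage * sum(self.requests) / len(self.requests)

def can_serve_external(pool_capacity, query_capacity, safety_level):
    # No external query is satisfied if serving it would leave the pool
    # below the safety level.
    return pool_capacity - query_capacity >= safety_level
```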

4 Keyword-driven resource selection

Unlike Condor, which practically relies on resource availability and reliability, Hoverlay is by definition a system to be deployed on highly dynamic environments. Here, this paper introduces a mechanism based on the use of keywords to enhance the collaboration of nodes by improving the success rate and resource utilisation of the system without creating closed groups that may fragment the topology. The aim of the following mechanism is not to shorten the paths between requestor and provider but rather to ensure that the most suitable resources are well visible to nodes. The next paragraphs present this mechanism accompanied by a worked example.

4.1 Keyword components

Answer refinement is necessary in the context of P2P networks that are dedicated to specific applications. Not all discovered nodes are able to participate in an underlying network. Even if they are able to provide the requested resources, they may be inappropriate to take over a specific task. Therefore, keywords help to specify any special requirement of the requestor. These keywords are simple words or (key, value) pairs.

The three keyword containers of the model are the Keyword List (KL), the Popular Keyword List (PKL) and the Keyword Exclusion List (KEL). Every node has a Keyword Exclusion List to specify the traffic it is not willing to serve (i.e. incompatible applications). Every server uses a Keyword List denoting the applications its underlying network runs. Queries originated from that server carry its Keyword List. The same server also collects the most frequent keywords of the queries it receives in its Popular Keyword List. These keywords are the most frequently occurring keywords in the Keyword Lists of servers within its vicinity.

The pool of a server cannot contain a node that has in its Keyword Exclusion List a keyword that is common to the Keyword List of the server. This ensures that each pool holds solely nodes that can serve traffic of the underlying network. Each query carries the Keyword List of its originating server. When a server receives a query, it tries to find nodes in its local pool that have no keywords in their Keyword Exclusion Lists in common with the query's Keyword List. This guarantees that the nodes found will be useful in the context of the requesting server, and thus compatible with the application running in the requestor's underlying network.
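These two rules reduce to empty-intersection tests on keyword sets, as in the following sketch (the helper names are ours, not the paper's):

```python
# Sketch of the two keyword-compatibility rules above; helper names are ours.

def admissible_to_pool(node_kel: set, server_kl: set) -> bool:
    # A node may enter a server's pool only if its Keyword Exclusion List
    # shares no keyword with the server's Keyword List.
    return not (node_kel & server_kl)

def matches_query(node_kel: set, query_kl: set) -> bool:
    # A pooled node can answer a query only if its KEL has no keyword in
    # common with the query's KL (the originating server's Keyword List).
    return not (node_kel & query_kl)

# Values taken from the worked example of section 4.4:
assert not matches_query({"d", "e"}, {"d", "f"})  # node n1 is rejected for Q1
assert matches_query({"a", "c"}, {"d", "f"})      # node n2 is compatible with Q1
```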

4.2 Query matching

Hoverlay is to be deployed on highly dynamic environments to support primarily the traffic needs of networked applications. It does not assume guarantees of resource availability and therefore cannot fully rely on the advertised service capacity of registered nodes. This makes kbps and MHz two feasible parameters of their capacity. Upon receiving a query, a server starts the matching process on its Node Pool. Based on the S field of the query (see [2]), it tries to discover nodes that cumulatively or individually satisfy those capacity parameters.

Though the required network throughput can easily be compared against that offered by free nodes, CPU speed and usage comparisons in heterogeneous environments are more difficult. Hoverlay assumes that multiplying the CPU usage by its clock speed gives a rough estimation of the required processing capacity. More detailed comparisons need information on both the hardware (e.g. cache, CPU architecture, MIPS, memory speed, I/O latency) and software (e.g. language or executable, compiler version, operating system) environments of nodes. Such matchmaking is well documented and used by Condor which, however, differs in purpose; Condor focuses on job completion whereas the mechanism proposed here focuses on traffic and enables nodes to request extra capacity if necessary. CompuP2P [21], a computational resource sharing paradigm deployed on dynamic P2P networks, also uses cycles/second to represent processor capacity.

Peer-to-Peer Netw. Appl.

Page 5: A Peer-To-peer System for on-Demand Sharing of Capacity Across Network Applications

The following scenario demonstrates a simple matchmaking process on a server with two free nodes in its pool. There are three nodes involved in this interaction: the requestor and the two available ones. Their configuration appears below:

1. Node A (requestor): Its network capacity is NC = 100 kbps and processor capacity PC = {50%, 2,000 MHz}, and its overload thresholds are NC = 100 kbps and PC = {40%, 2,000 MHz} for the network and processor respectively. Assuming that its load is NC = 100 kbps and PC = {60%, 2,000 MHz}, it seeks resources satisfying NC = 100 kbps and PC = {20%, 2,000 MHz}. That is, the query that reaches the server hosting nodes B and C looks for a set of nodes each of which has a minimum network throughput of 100 kbps and which collectively offer {20%, 2,000 MHz} of processor capacity.

2. Node B (free): Its overload threshold published in its server's pool is NC = 20 kbps and PC = {20%, 2,000 MHz}. This node cannot match the requirements as its network throughput is well below the requested one.

3. Node C (free): Its overload threshold published in its server's pool is NC = 200 kbps and PC = {70%, 1,000 MHz}. Its network throughput is enough to cover A's requirements, and the product of its CPU usage with clock speed is greater than that of the query: 70%*1,000 MHz > 20%*2,000 MHz.

Therefore, Node C migrates to the requesting node's underlying network to take on part of the requestor's workload.
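A sketch of this rough match check, reproducing the numbers above (the helper is illustrative, not the paper's code):

```python
# Rough capacity matching as in the scenario above: network throughput is
# compared per node, processor capacity via CPU usage x clock speed.
# Illustrative helper, not the paper's code.

def processor_units(usage_pct: float, clock_mhz: float) -> float:
    # Rough estimate of processing capacity: usage fraction times clock speed.
    return (usage_pct / 100.0) * clock_mhz

def node_matches(node_nc, node_pc, req_nc, req_pc) -> bool:
    return node_nc >= req_nc and processor_units(*node_pc) >= processor_units(*req_pc)

required_nc, required_pc = 100, (20, 2000)  # NC = 100 kbps, PC = {20%, 2,000 MHz}
print(node_matches(20, (20, 2000), required_nc, required_pc))   # Node B: False
print(node_matches(200, (70, 1000), required_nc, required_pc))  # Node C: True (700 > 400)
```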

4.3 Answer and safety capacity selection

In this section we introduce two contributions regarding answer selection and the filtering of nodes in a server's pool for filling its safety capacity. That is, we deploy keyword-based heuristics to select one among multiple concurrently received answers and to assign nodes from the Node Pool to the safety capacity container.

While the system retains its principle of using the first received answer to each query, this keyword-driven collaboration changes servers' behaviour in the case of multiple answers in the same time unit. In brief, a server chooses the one providing nodes compatible with the most applications of servers in its vicinity, as detailed below. This is a selfless approach, as each server tries to collect nodes useful to other servers, maximising its contribution to the network. To achieve that, each server does the following:

- It calculates the union of the KELs of all nodes in every answer it receives in the same time unit for the same query. Assume a server received $N_\alpha$ answers labelled $\alpha_j$, $1 \le j \le N_\alpha$, and that the j-th answer carries a set of $R_j$ discovered nodes. The Keyword Exclusion List of that answer is the union of the exclusion keywords ($KEL_n$) of each node: $KEL_j = \bigcup_{x=1}^{R_j} KEL_{n,x}$.

- It selects the answer with the minimum intersection of that union (see above) with the server's PKL. Among all answers received in the same time unit for the same query, a server accepts the answer $A$ such that $KEL_A \cap PKL = \min_{1 \le k \le N_\alpha} (KEL_k \cap PKL)$.
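In code, the selection rule stated above amounts to a minimisation over set intersections; a minimal sketch under our own naming:

```python
# Minimal sketch of the answer selection rule stated above; naming is ours.

def answer_kel(node_kels: list) -> set:
    # KEL_j: union of the exclusion keywords of all nodes in the j-th answer.
    kel = set()
    for node_kel in node_kels:
        kel |= node_kel
    return kel

def select_answer(answers: dict, pkl: set):
    # answers: answer id -> list of node KEL sets. Accept the answer A
    # minimising |KEL_A ∩ PKL| among answers received in the same time unit
    # for the same query.
    return min(answers, key=lambda a: len(answer_kel(answers[a]) & pkl))
```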

While servers tend to collect resources most useful for the whole network, they try to keep the least useful ones to fill their safety capacity. Every node arriving into a server's domain is compatible with the underlying application. However, after calculations similar to the above, a server may detect the nodes that cannot serve the traffic of most of the underlying networks in its vicinity. Such nodes are the ones with the maximum intersection of their KEL with the server's PKL. The following pseudocode provides more details on the safety capacity functionality. Assume that SF is the current set of nodes comprising the server's safety capacity, TargetSF is the level of safety capacity that the server aims for, and Pool is the set of nodes in its pool:
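The pseudocode figure itself did not survive the source extraction; the sketch below is a reconstruction consistent with the textual description only: nodes with the largest KEL ∩ PKL intersection fill SF until TargetSF is reached, and the rest stay in the pool.

```python
# Reconstruction of the missing pseudocode from its textual description;
# details beyond that description (tie-breaking, data types) are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class PooledNode:
    name: str
    capacity: int
    kel: frozenset  # Keyword Exclusion List

def rebuild_safety_capacity(pool, pkl, target_sf):
    # Rank nodes from least to most useful to the vicinity: the larger the
    # intersection of a node's KEL with the server's PKL, the less nearby
    # traffic it can serve.
    ranked = sorted(pool, key=lambda n: len(n.kel & pkl), reverse=True)
    sf, level = [], 0
    for node in ranked:
        if level >= target_sf:
            break
        sf.append(node)           # least useful nodes fill the safety capacity
        level += node.capacity
    remaining = [n for n in ranked if n not in sf]
    return sf, remaining          # SF set and nodes left for on-demand migration
```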


The above algorithm runs on every action on the pool. That is, when a new node arrives in the pool or is removed from it, the server tries to optimize its safety capacity by running that algorithm.

4.4 Worked example

Without loss of generality, the following example demonstrates a simplified interaction scenario involving the exchange of the keyword-based components explained before. The example consists of 5 servers with two neighbours each. Let us also assume that queries are forwarded to all of a server's neighbours and travel a maximum of two hops. The frequency of the received keywords appears in the PKL inside a parenthesis next to each keyword. For simplicity, this example omits the query matching and focuses on the functionality of the keyword components. Table 1 describes the initial configuration of the servers.

Table 1 Initial configuration of the worked example (node pool entries list each node's KEL)

    Server  Neighbours  KL    PKL              Node pool
    A       D, E        f     b(3),c(1),d(1)   n1={d,e}
    B       A, C        a,g   c(2),e(1)        n3={a,c,f}, n4={b,e,g}
    C       A, D        b,h   a(2)             n5={a,b}, n6={b,d}, n7={a,c,e}
    D       A, E        d,f   b(2),g(2)        n8={a,g}
    E       B, C        c,j   f(4)             n2={a,c}

Each query carries the KL of its originator. Assuming that server D sends a query carrying its KL to its neighbours A and E, the query will be Q1={D→A&E, (d,f)}. This query triggers a sequence of actions within the system:

1. Q1 reaches A: Server A already has keyword {d} in its PKL, whose frequency increases by 1; keyword {f} is also added. Then, the query matching process finds node n1 in the pool which, assumedly, can satisfy the capacity requirements of Q1, but the intersection of its KEL with the query's KL is nonempty: $KEL_{n1} \cap KL_{Q1} = \{d\}$. Thus server A cannot help with an answer and prepares Q2 to be sent to its neighbours D and E, asking for the same capacity and with the same KL. However, one of the neighbours is the query originator, and thus Q2 goes only to E: Q2={A→E, (d,f)}. At the end of this interaction server A is: A={(D,E), (f), (b(3),d(2),c(1),f(1)), (n1)}.

2. Q1 reaches E: Following the same principles, E has to update its PKL and check its Node Pool for resource availability. Though the intersection of node n2's KEL with the query's KL is empty, $KEL_{n2} \cap KL_{Q1} = \{\}$, assuming that this node cannot cover the required capacity, server E prepares Q3 to send to its neighbours B and C: Q3={E→B&C, (d,f)}. Server E is now: E={(B,C), (c,j), (f(5),d(1)), (n2)}.

3. Q2 reaches E: Server E has already processed the same query and thus rejects it without any further processing.

4. Q3 reaches B: Server B updates its PKL and checks its Node Pool for resource availability. Node n3 cannot help, as $KEL_{n3} \cap KL_{Q3} = \{f\}$; but assuming that n4 offers enough capacity to serve the requirements, it is a compatible solution, as $KEL_{n4} \cap KL_{Q3} = \{\}$. Thus, server B prepares the answer A1={B→D, (b,e,g)}. Server B is now: B={(A,C), (a,g), (c(2),d(1),e(1),f(1)), (n3, n4)}.

5. Q3 reaches C: Server C has three nodes in its Node Pool; the KELs of two of them have an empty intersection with the query's KL: $KEL_{n5} \cap KL_{Q3} = \{\}$, $KEL_{n6} \cap KL_{Q3} = \{d\}$ and $KEL_{n7} \cap KL_{Q3} = \{\}$. Assuming that neither n5 nor n7 is able to satisfy the requirements on its own, server C prepares an answer including both nodes, as their collective capacity covers the required one: A2={C→D, ((a,b),(a,c,e))}. Server C is now: C={(A,D), (b,h), (a(2),d(1),f(1)), (n5, n6, n7)}.

6. A1 and A2 reach D: Server D receives both answers in the same time unit. Therefore, it has to select the one with the least popular keywords in the KELs of the discovered nodes: $KEL_{n4} \cap PKL_D = \{b,g\}$ for A1 and $(KEL_{n5} \cup KEL_{n7}) \cap PKL_D = \{b\}$ for A2. Answer A1 offers resources that are more compatible with servers in D's vicinity; thus, A2 is rejected. Both answer originators are then notified of server D's decision.

The state of all servers after the actions above appears in Table 2.

Table 2 Final configuration of the worked example

    Server  Neighbours  KL    PKL                   Node pool
    A       D, E        f     b(3),d(2),c(1),f(1)   n1={d,e}
    B       A, C        a,g   c(2),d(1),e(1),f(1)   n3={a,c,f}
    C       A, D        b,h   a(2),d(1),f(1)        n5={a,b}, n6={b,d}, n7={a,c,e}
    D       A, E        d,f   b(2),g(2)             n8={a,g}
    E       B, C        c,j   f(5),d(1)             n2={a,c}

Node n4 was moved from server B to server D to help an overloaded underlying node. Assuming that n4 is released after some time and that the configuration of server D has stayed unchanged, the server has to decide which node between n4 and n8 it will keep for safety capacity. Calculating the intersections $KEL_{n4} \cap PKL_D = \{b,g\}$ and $KEL_{n8} \cap PKL_D = \{g\}$, it appears that n4 is more useful to servers in its vicinity. Thus, assuming that n8 is enough safety capacity for that server, n4 stays in the Node Pool for on-demand migration. This way, servers in a dynamic environment of non-replicable reusable resources (e.g. Hoverlay) realise an altruistic approach to sharing resources, decoupling the connectivity of servers from the semantic layer.


5 Experiments and evaluation

With regard to Hoverlay's principal concepts and functionalities, the following set of experiments serves as a proof of concept and a basis for a detailed evaluation. At the heart of Hoverlay is an unstructured P2P network of servers (pools of resources) designed to support on-demand resource migration between networks. Both this architectural element (a dynamic P2P overlay of pools) and the sharing technique (resource migration) are the two main differences compared to other architectures. Existing competitive resource sharing systems appropriate to serve as benchmarks are listed below:

- Condor: a local pool (manager) facilitating resource sharing within an individual network (centralised architecture). Experimentation with Condor systems provides useful material for evaluating possible costs (e.g. traffic, latency) introduced by Hoverlay, as the latter proposes an arbitrary interconnection of similar systems.

- Flock of Condors: this category represents Condor-like systems with managers interconnected via an unstructured P2P overlay. They basically assume static links between pools and no mobility of resources, as opposed to Hoverlay. This family of systems is the closest paradigm to Hoverlay.

Hoverlay, via its dynamic unstructured P2P overlay of servers, supports resource volatility and, via resource migration, aims at reducing discovery latency. All experiments below use a set of performance metrics against which all three resource sharing paradigms (Condor, Flock of Condors and Hoverlay) are assessed:

- Success rate: percentage of successful queries over the total number of those generated in the system. Due to workload fluctuations, in certain cases answers delivered to requesting nodes may be unnecessary, as their load has fallen to normal levels while waiting for a response; such cases are not excluded from this percentage. This metric serves as an indication of Hoverlay's efficiency in finding the capacity required by overloaded underlying networks.

- Hops per Answered Query: query latency represents the elapsed time from query generation till answer delivery to the requesting node. However, it depends on several factors such as connection speeds, and the processing power and memory of intermediate servers on query paths. As these factors are difficult to predict, the path length (hops) of a successful query can be used to estimate its latency, without loss of generality assuming that all hops are temporally equal. This metric corresponds to the average path length of successful queries from their requesting nodes to the first provider server. Late responses that a server may receive for an already satisfied query do not contribute to it.

- Satisfied User Queries: number of additional user queries (requested capacity) that overloaded networks have managed to satisfy with extra capacity discovered via a given overlay. If an underlying node gets a response with its requested capacity, the system has made it possible for these user queries to be processed successfully.

- Messages: number of messages (traffic) produced in the system by any operation (registrations, queries, answers and acknowledgements).

5.1 Simulation practices

For these evaluation purposes, a C++ object-oriented simulator (called Omeosis) was developed. It simulates time as a sequence of timeslots during which any message (query, answer, registration and acknowledgement) may travel for a single hop only, and it ensures their concurrent processing and propagation. It assumes that no connection introduces any extra delay; thus, any message produced during a timeslot reaches its next destination on the following timeslot. Timeslots are equivalent to iterations of the main loop. Therefore, every iteration executes three phases (see the sketch after this list):

1. Set global workload: add or remove workload on a random subset of underlying nodes. Each node of this subset takes on a chunk of that workload (w) defined as a random integer within $w \in \{x \in \mathbb{N} : 1 \le x \le \max(c)\}$, where max(c) corresponds to the maximum capacity a node can have. This chunk distribution finishes as soon as the whole additional workload of this iteration is consumed; this determines the size of that subset. Thus, there is a non-negligible probability that a) a node takes on two or more chunks, b) multiple underlying nodes of a single server take on at least one chunk, or even c) no underlying node of a server takes on any chunk.

2. Generate queries: once a message reaches a node or server, another appropriate one (if necessary) is queued in its output buffer for delivery on the following timeslot. Apart from output buffers, messages are also stored in caches with the appropriate expiry time. If an underlying node is still overloaded, as soon as the waiting time of its cached query expires, a new one with the same requirements is generated and again cached for an exponentially increasing period (i.e. $TTL + 2^{repetitions+1}$). Therefore, message queues in output buffers follow the order of the incoming ones. An exponential increase of the waiting time before a retry is a usual practice in request submission to networks and helps them avoid bursts of requests, especially during high-load situations.

3. Send produced messages: message delivery also triggers processing upon messages reaching their destination. This processing may result in more messages generated as replies to, or forwardings of, the received ones; these messages are to be delivered on the following time unit. While the two phases above may be executed in parallel or sequentially in any order, this one has to follow both. This is only an implementation requirement, as all the messages produced by the previous phases can be sent out with one pass per node.
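A sketch of the first two phases with concrete numbers. The Omeosis simulator is not published with this paper, so the structure below and the reading of the garbled backoff formula as TTL + 2^(repetitions+1) are assumptions:

```python
# Illustrative sketch of phases 1-2; structure and the backoff reading are
# assumptions, since the Omeosis simulator is not published with the paper.
import random

def distribute_workload(node_loads, new_workload, max_capacity):
    """Phase 1: split the timeslot's workload into random chunks w,
    1 <= w <= max(c), each assigned to a randomly chosen underlying node."""
    remaining = new_workload
    while remaining > 0:
        w = min(random.randint(1, max_capacity), remaining)
        node_loads[random.randrange(len(node_loads))] += w  # a node may be hit twice
        remaining -= w
    return node_loads

def next_wait(ttl, repetitions):
    """Phase 2: waiting period before an unsatisfied query is regenerated,
    read here as TTL + 2**(repetitions + 1)."""
    return ttl + 2 ** (repetitions + 1)

print(distribute_workload([0] * 10, 37, 10))  # e.g. 37 load units over 10 nodes
print([next_wait(7, r) for r in range(4)])    # [9, 11, 15, 23]
```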

Every network component (server or underlying node) incorporates a set of modules which facilitate communication (input and output buffers), message caching, time event handling (reactions to time progress such as cache cleaning and message regeneration) and message processing. Servers use a pool, which can reserve resources upon request, free resources if a response is not accepted, or otherwise release them. Apart from time events, underlying nodes react to workload changes, too. An increase of their load may trigger the query generation module. Symmetrically, a drop in load may force the rejection of pending queries in the cache. All experiments used the same initial configuration of servers and nodes, achieved by choosing the same parameters and feed for the random number generator used throughout. Both the server overlay size and the number of underlying nodes are user inputs and remain fixed during the experiment. The simulator first creates the server overlay and carries on with the underlying nodes. As soon as it generates a server, it populates its Neighbour List with a random subset of the previously created ones; thus, their popularity follows the order in which they were created, resulting in a power-law network.
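A sketch of this bootstrap procedure (illustrative; the uniform choice among previously created servers is our reading of the description, and the paper reports that the resulting in-degree distribution is power-law-like):

```python
# Sketch of the overlay bootstrap described above: each new server links to a
# random subset of previously created servers, so older servers accumulate
# incoming links. Illustrative reading, not the simulator's code.
import random

def build_overlay(num_servers, max_links=3):
    neighbour_lists = {0: []}                 # the first server has no peers yet
    for sid in range(1, num_servers):
        candidates = range(sid)               # only older servers qualify
        k = min(max_links, sid)
        neighbour_lists[sid] = random.sample(candidates, k)
    return neighbour_lists

overlay = build_overlay(10000)
in_degree = {s: 0 for s in overlay}
for links in overlay.values():
    for target in links:
        in_degree[target] += 1
# Older (low-id) servers end up with the highest incoming degrees.
```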

The global workload fluctuates based on a pattern predefined by user input; applying a positive or negative workload per timeslot on underlying nodes implements a rise or drop, respectively, of the global workload. A monitoring module records all actions triggered by any event (message deliveries, workload changes, lifetime expirations) and finally creates appropriate output files in both analytical and concise forms.

5.2 Experiments configuration

The experiments configuration is as follows:

- Network sizes: 10,000 servers (i.e. independent networks) and 50,000 nodes uniformly distributed among the servers. During the initial connection setup, a node links to any server with probability 1/10,000. Therefore, servers with no underlying nodes cannot generate queries or provide answers but may increase the hops per answered query and the number of messages. All nodes are initially free and available in their local pools.

- Capacity: each node represents capacity (c) of a size between 5 and 10 units inclusive (5 ≤ c ≤ 10). The capacity density function, showing the number of nodes representing a certain amount of capacity, follows a geometric distribution: $d(c) = \left(\frac{1}{2}\right)^{c-4}$. Multiplying the number of nodes by the sum of the c·d(c) products gives a close estimation of the system capacity: $C = 50000 \sum_{c=5}^{10} c \left(\frac{1}{2}\right)^{c-4} \approx 300{,}000$ capacity units.
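The estimate can be checked with a one-liner; the exact sum is about 290,625 units, which the text rounds to 300,000:

```python
# Check of the capacity estimate above: 50,000 nodes with d(c) = (1/2)^(c-4)
# for 5 <= c <= 10 give roughly 290,625 capacity units (~300,000 in the text).
total = 50000 * sum(c * 0.5 ** (c - 4) for c in range(5, 11))
print(total)  # 290625.0
```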

- Connectivity & Time-to-Live: each server connects to a maximum of 3 other random servers. Its Neighbour List's initial configuration occurs during the server's creation, with links to other existing servers. That is, the probability of a server attracting a new link exponentially increases with its age, resulting in a power-law incoming-degree distribution. This Neighbour List size is small enough to increase the average path length between any two servers, making access to any resource difficult for the vast majority of underlying nodes. The TTL of every query is fixed to 7; that is, each query may access a maximum of $\sum_{t=0}^{7} 3^t = 3280$ (= 32.8% of all) servers. In practice, this percentage is lower, as the average number of servers another one may reach is 26.29, i.e. only 0.26% of the Hoverlay overlay size.

- Workload: given the globally available initial capacity, the system-wide workload should have both valleys and peaks, fluctuating from 0% to even 150% of global capacity. This helps evaluate the system under several situations, like workload increases/decreases and long-lasting strenuous high-load or relaxed low-load phases.

As explained above, server-to-server degrees follow a power-law distribution, as only a small number of servers are very popular; that is a result of the way new servers join the overlay. This coincides with real systems based on preferential attachment of new nodes onto older and more stable ones, e.g. Gnutella WebCaches [22]. It is a reasonable network topology for Hoverlay, which is expected to have power-law properties as strong providers will attract more links. The initial network configuration achieves a Poisson distribution of links from nodes to servers. Node migration (in the Hoverlay case only) may distort this distribution: while good providers (pools with plenty of resources and, thus, a high node-originated incoming degree) attract more and more links, they lose their resources faster. Finally, the distribution of global capacity onto nodes (the number of nodes with the same capacity) follows a geometric density function, complying with the idea that most Hoverlay users offer low-capacity resources.

Despite the theoretical maximum number of reachable servers for each query (see above), monitoring Hoverlay in its initial phase shows that this number does not exceed 57, far lower than the theoretical one. That is the maximum number of servers a query may visit via outbound Neighbour Lists even with infinite TTL; thus, the server overlay appears to have a number of cyclic paths. Without revisiting servers, a server may access all its reachable servers with an average path length of 7.5 hops. Therefore, TTL = 7 appears to be an appropriate query path length for this network configuration.

The deployed discovery protocol is a well-known and usual benchmark in the peer-to-peer scientific community: flooding. This mechanism minimises doubts about result accuracy, as it explores the whole vicinity of each requesting server. Being more selective at query propagation at this evaluation stage could distort results through factors such as selectivity heuristics (e.g. k-walkers have an unstable success rate, and results would be unclear and non-conclusive regarding the benefits and costs of Hoverlay). Flooding on a static overlay ensures that queries from a server, whether in a Condor-based or a Hoverlay architecture, may explore the same servers; this eliminates one factor of result differentiation: the deployed search technique.

5.3 Query distribution

The system evaluation was based on two environments selected to test different aspects:

- Uniform Query Distribution: every node in the whole system has an equal probability with any other to generate a query.

- Hotspot Query Distribution: queries are generated from a specific small subset of servers. This means that servers that have generated queries in the past have a high probability of generating more.

While global capacity remains fixed, as no node joins or leaves the overlay throughout the experiments, the global workload fluctuates as shown in the two-layered Fig. 2. Both layers share the same x-axis (timeslots) but their y-axes have the same units (capacity) and different scales. The bottom layer describes the workload added or removed per timeslot, whereas the top one illustrates the system-wide capacity and the cumulative load applied to the system. A fixed positive or negative new load produces linear increases or decreases of the global load at the same intervals. After the initialization phase, the Hoverlay load fluctuates between 3/2 and 1/3 of its capacity.

Fig. 2 Global workload and capacity in (a) uniform query distribution and (b) hotspot query distribution environments: added or removed workload per timeslot, and cumulative workload and system capacity

6 Uniform query distribution

6.1 Query success rate

Figure 3 presents the success rates of Condor, Flock of Condors and Hoverlay. Based on its plots, Hoverlay outperforms both disconnected Condor pools and their flock with regard to the percentage of successful queries over the total number of generated ones. Throughout the experiment Hoverlay achieves a better success rate than Condor (by up to 50%), confirming that interconnecting individual networks contributes to the satisfaction of more user requests. For most of the experiment duration, when the system is normally and heavily loaded, Hoverlay manages to satisfy a bigger portion of queries (on average 5% more) compared to Flock of Condors.

Fig. 3 Success rate of disconnected Condors, Flock of Condors and Hoverlay in a uniform query distribution environment

Any explanation of this improvement in success rate achieved by Hoverlay should derive from the fundamental difference between the two systems: resource migration. Indeed, the reasoning is two-faceted: a) resource reservations in Flock of Condors last longer, and b) resource migrations also translate to query migrations. In detail:

- Service capacity is a highly dynamic resource; one of the system design requirements was that it should not rely on guarantees that migrated resources have the capacity their provider pools claim they have. A requesting node may reject discovered but unnecessary or unsuitable capacity. Resource migration eases the re-registration of this capacity with the requestor's pool, avoiding the extra messages and latency of returning it to its provider pool. In the case of Hoverlay, once capacity is discovered, an answer travels from the remote provider (Server A) to the requestor server (Server B), which then, at the same time, acknowledges the provider and wraps that answer to forward it to the underlying node. If for any reason that capacity is not used, it registers with Server B; providers get acknowledgements on the next timeslot (reservations last 2 timeslots). In the case of Flock of Condors, however, rejected resources need to return to the provider with the acknowledgement of their response. Before Server B acknowledges Server A, it has to wait for the acknowledgement from the underlying node. Thus, discovered resources need to stay reserved for 4 timeslots before being released. While Flock of Condors keeps resources reserved for 2 extra, practically useless timeslots, Hoverlay provides them to requesting nodes to serve extra load.

- The overlay topology in these experiments is a power-law one. With a uniform distribution of load on nodes, every node initially has the same probability of generating a query. The majority of servers are at the edges of the overlay (leaf servers) and, hence, most of the queries come from those edges. Due to this overlay topology, most of the query paths are directed towards high-incoming-degree servers. These queries, in combination with resource migrations, force capacity to move from the centre of the overlay to its edges. Further increases of the global workload will most likely generate queries from those overlay edges. With an adequate TTL and due to the small-world phenomenon in the topology, queries may traverse the whole overlay. However, if resources do not migrate, any extra increase of workload generates queries forwarded always via the same servers. A good portion of resources are managed by servers in the overlay centre which, however, have a shorter horizon, less accessible capacity and a worse success rate.

Uniform resource distribution in scale-free networks does not work to the benefit of the success rate in highly dynamic environments with highly dynamic resources.

Two vertical lines divide the area of Fig. 3 into three phases (A, B, C): A and C are marked with (+)'s and B with a (−). These marks denote the areas in which Hoverlay is more (A and C) or less (B) successful than Flock of Condors. When the whole system handles a low global workload (Phase B), Flock of Condors reaches an even 20% higher success rate than Hoverlay. This deterioration is superficial, due to the very low number of produced queries and the even lower number of satisfied ones for all three systems. Moreover, the difference between the numbers of satisfied queries of Hoverlay and Flock of Condors is negligible compared to that of Phases A or C. Therefore, migrating resources can help satisfy more user requests even in a static network (i.e. without rewiring).

Condor pools, as a set of disconnected pools preventing access to remote resources, have a 10% to even 50% lower success rate compared to the other two architectures. For the first few timeslots, Flock of Condors and Hoverlay reach 100% success by seeking both local and remote capacity. Some servers contain no local free capacity, even in the first few timeslots, in which case Condor pools cannot serve requests from their underlying nodes; hence their lower success rate than Flock of Condors or Hoverlay.

The global workload at the beginning of Phase A steadily increases, and therefore no fresh nodes appear in server pools. Gradually all resources within the requestors' vicinity are exhausted, and a) more queries fail, b) more new queries browse the overlay and c) more servers regenerate the unsatisfied ones. Capacity exhaustion increases the number of queries and their repetitions, deteriorating the success rate of all three systems.

Symmetrically, applying negative workload on nodes makes the global cumulative workload drop; some of these nodes (resources) become underloaded and available for on-demand migration or local re-commission via their pools. In some other cases, despite their workload drop, nodes may remain overloaded but reactively adjust their requested capacity downwards. That is, responses to past queries may no longer be necessary, and thus the discovered capacity may either stay in its new local pool or be partially used by the requesting node. The unused portion of that capacity may serve extra load of the underlying network without extra requests.

All systems during Phase B generate very few queries and satisfy even fewer. Flock of Condors satisfies negligibly more queries than Hoverlay, but this difference is substantial compared to the number of queries they generate. This makes the success rate of Flock of Condors 20% better than that of Hoverlay. However, this is a misleading conclusion if not accompanied by the following observation.

As shown in Fig. 2, from timeslot 80 till around 100, the global workload drops and stabilises at almost 1/4 of the global capacity. At timeslot 96 (the left border of Phase B), the system's workload approaches 2/3 of its capacity. That is the point after which Flock of Condors becomes more successful. As underlying nodes lose workload, some either become underloaded or normally loaded, or even remain overloaded at the same or lower workload levels. There are no new queries, only repeated ones. As fresh capacity becomes available, repeated queries get satisfied, improving the success rate of both systems.

While in the case of Condor flocking unnecessary capacity returns to its provider, Hoverlay moves it to the requesting server. Within Phase B, this migration is useless, as that capacity may only be used on a workload increase, or even potentially harmful for the system's success rate, as capacity may be moved away from places accessible by requesting servers. This is the case for those few repeated queries. The global workload change affected most of the loaded nodes but not all; some remain overloaded and thus keep regenerating queries. As the workload drops, and before its stabilisation at its lowest level, repeated queries from several servers move capacity out of the vicinity of other servers which keep propagating queries even during the steady-workload period. Without any workload increase to trigger query propagation from other servers, no fresh capacity can migrate into their horizon, whereas Flock of Condors repositions free capacity at its initial provider and thus probably close to the generators of repeated queries.

This clearance of available resources from requesting servers' horizons explains the success-rate swap between the Flock of Condors and Hoverlay architectures, with the former's being higher than the latter's. However, that cannot justify why their difference in Phase B is well bigger than in the other two phases. That difference can be justified considering the minimal number of queries on which this percentage is based. Flock of Condors keeps, in low global workload cases, the resources at their initial distribution among servers, bringing a bigger impact on success rate. Overloaded nodes and their queries start increasing with the global workload. At the end of Phase B a good portion of those queries are successful due to the capacity freed during the last workload drop. Therefore, the impact of that factor gets lower and the success rate of both systems increases.

6.2 Average path length of successful queries

Figure 4 presents the average number of hops successful queries had to travel before they discovered their first answer. The graphs confirm that Hoverlay manages to achieve a better success rate with shorter query paths, especially on workload fluctuations. It helps queries get responses from 0.5 to even 2 hops sooner than Flock of Condors does. This 2-hop improvement takes place around the timeslots at which the global workload started decreasing: fresh capacity appears close to the regenerators of repeated queries.

Fig. 4 Average number of query hops before they discover their first answer, deployed on disconnected Condors, Flock of Condors and Hoverlay in a uniform query distribution environment

The average path lengths of Flock of Condors and Hoverlay do not follow the pattern of the success rate and stay well below the query TTL. The power-law topology of the overlay, in the absence of rewiring, stays the same throughout the experiment. Most query paths start from the edges of the network and finish at its centre. Thus, while the workload increases, resources in the centre become scarce; the biggest portion of the capacity is close to leaf servers. Long successful paths are fewer than short ones and, thus, their average remains below the TTL mean value of 4.5 (1 hop from node to server plus TTL/2).

As soon as the workload starts dropping, the effect of resource migrations becomes clearer. Fresh capacity appears in the pools of the domains it migrated to, closer to requestors. On the contrary, Flock of Condors places that capacity back at its originators, and thus repeated queries need to travel further to re-discover it. This explains why Flock of Condors exhibits path length bursts in the first few timeslots after the workload starts decreasing. While the workload keeps dropping, free capacity increases close to the requestors' vicinity, servers generate no new queries, and more and more repeated ones are cancelled. This keeps the average hop count at low levels.

In the case of the disconnected set of Condor pools, queries can only travel from underlying nodes to their local servers; hence, one hop. Till timeslot 44, successful queries of both Flock of Condors and Hoverlay exhibit almost the same hop count. The workload increases linearly for the first 44 timeslots and all migrated nodes join the requesting underlying networks. Given that both systems use flooding tested on the same topologies, all servers explore their whole horizon and thus the average hop count has minimal deviation.

6.3 Traded capacity and cost in messages

In general, Hoverlay satisfies more load than Flock of Condors (Fig. 5 left), though both systems request almost the same amount (Fig. 5 right). This improvement comes at almost the same cost in messages (Fig. 6 left). Unlike with flooding, the number of messages spent by random walkers is highly correlated to the success rate. Therefore, Hoverlay could outperform flocks of Condors further if a search mechanism able to detect migrations were used. Condor satisfies much less of the requested capacity than the other two networked systems; however, Condor has a minimal overall cost in messages, as its queries can only travel one hop.

Following a pattern similar to that of the success rate, the satisfied capacity of Hoverlay is greater than that of Flock of Condors in phases A and C. The superiority of Flock of Condors in phase B is insignificant, as the requested capacity during that period is much lower than in the remaining two phases.

Figures 5 (left) and 6 (left and right) share a common characteristic: almost vertical deep drops and steep rises in the plotted lines. These radical changes happen at timeslots where the new load swaps from positive to negative values and vice-versa. Though within the 45–65 timeslot interval the global workload drops linearly, the number of messages, the number of queries and the requested capacity are non-zero and follow similar patterns. Similarly, the pattern of the 45–65 interval tops that of the new positive workload of the 66–79 timeslots. Differences in those patterns appear as portions of the repeated queries get satisfied or stopped as unnecessary (especially after the workload drops). Condors, due to the lack of query forwarding in the server overlay, exhibit a much smaller number of messages. Their low success rate, however, forces them to repeat many queries, eventually overtaking both Flock of Condors and Hoverlay in number of messages and queries.

Fig. 5 User-end perceived satisfaction: (left) capacity requested and satisfied from server overlays and (right) requested capacity per timeslot

Fig. 6 Cost in messages: (left) total number of messages produced and (right) total number of generated queries

To sum up, this experiment shows that even in fixed topologies, under high-workload situations and with the same search technique (exhaustive flooding) deployed, Hoverlay is more efficient than Flock of Condors in terms of the following metrics (a computational sketch of these metrics is given after the list):

– success rate (percentage of successful queries),
– satisfied capacity (portion of the requested capacity that was satisfied), and
– average path length of successful queries before they hit their first answer.
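As an illustration only, the three metrics can be computed from per-query records as in the following Python sketch; the record fields (succeeded, hops, requested, satisfied) are hypothetical names, not taken from the Hoverlay simulator:

# Minimal sketch of the three evaluation metrics, assuming each query
# is logged as a dict with hypothetical fields: 'succeeded' (bool),
# 'hops' (path length to the first answer), and 'requested'/'satisfied'
# (capacity units).

def success_rate(queries):
    """Percentage of queries that discovered at least one answer."""
    return 100.0 * sum(q['succeeded'] for q in queries) / len(queries)

def satisfied_capacity(queries):
    """Portion of the requested capacity that was actually satisfied."""
    requested = sum(q['requested'] for q in queries)
    return sum(q['satisfied'] for q in queries) / requested

def avg_successful_path_length(queries):
    """Average hop count of successful queries to their first answer."""
    hops = [q['hops'] for q in queries if q['succeeded']]
    return sum(hops) / len(hops) if hops else None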

This is essentially the positive effect of resource migration towards the requesting networks, and it comes at practically no cost in messages. However, under certain circumstances, when the global load falls below the global capacity, the resource distribution becomes skewed and may negatively affect the success rate. Considering the low total, and even lower satisfied, number of queries in those cases, that deterioration of the success rate is almost insignificant.

7 Hotspot query distribution

To set up these experiments, a few more parameters are necessary to generate hotspots throughout the simulation.

Number of spots: a subset of servers is picked to belong to three, not necessarily neighbouring, hotspot areas. Each of these areas consists of a centroid (server) and all other servers in its direct or indirect vicinity, accessible via incoming or outgoing links.

Spot radius: any server within a hotspot is at most five hops away from its centroid.

Spot lifetime: each spot area lives for 70 timeslots. At any moment there are three hotspot areas; thus, over the 210 timeslots the areas change three times. Each new set of areas is picked randomly and may overlap with the previous set.

TTL is set to a maximum of 7 hops, as longer paths would unrealistically overload the network. The workload in these three small areas cannot be as high as the load a whole network could take, since much less capacity is accessible within TTL = 7 hops from the hotspots (see Fig. 2b).
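A minimal Python sketch of this hotspot generation under the stated parameters (three areas, a five-hop radius, a 70-timeslot lifetime) is given below; the neighbour-map representation and function names are illustrative, not the simulator's actual code:

import random
from collections import deque

NUM_SPOTS, SPOT_RADIUS, SPOT_LIFETIME = 3, 5, 70  # parameters from the text

def hotspot_area(neighbours, centroid, radius=SPOT_RADIUS):
    """All servers within `radius` hops of the centroid, found by BFS
    over a neighbour map that merges incoming and outgoing links."""
    area, frontier = {centroid}, deque([(centroid, 0)])
    while frontier:
        server, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the spot radius
        for nxt in neighbours[server]:
            if nxt not in area:
                area.add(nxt)
                frontier.append((nxt, dist + 1))
    return area

def pick_hotspots(neighbours, timeslot):
    """Every SPOT_LIFETIME timeslots, draw a fresh (possibly overlapping
    with the previous) set of NUM_SPOTS centroids and expand each one."""
    rng = random.Random(timeslot // SPOT_LIFETIME)  # stable within a lifetime
    centroids = rng.sample(list(neighbours), NUM_SPOTS)
    return [hotspot_area(neighbours, c) for c in centroids]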

Most of the observations on the results below share the same analysis as in the previous experiments. Detailed explanations follow only those observations that differentiate these results from Section 6.

7.1 Query success rate

As shown in Fig. 7, all systems experience, in general, low success rates, but Hoverlay is the most successful (by up to 20%) compared to the Condor-based systems. Hotspot areas are very small compared to the network size, and the global workload is uniformly distributed among their nodes.

Fig. 7 Success rate of Condors, Flock of Condors and Hoverlay in hotspot areas

Resource migrations in Hoverlay help each hotspot area to serve more queries and to increase the number of resources that can take over future workload. Condor-based systems are task-oriented; each node is assigned a very specific job and cannot take over another from the same flock. A remote resource needs to return its control back to the originator flock after the completion of a job. In Hoverlay, by contrast, resources remain available at the server they migrated to and can take over extra workload assigned by their new manager server. Newly migrated nodes join the networks without getting overloaded; they can take on some more workload without exceeding their threshold. This reduces both the number of queries and the requested capacity, and improves the success rate.
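The retention semantics just described can be contrasted in a short illustrative sketch; the set-based pools and function names below are hypothetical, not the actual Condor or Hoverlay code:

def condor_job_done(node, origin_pool, remote_pool):
    # Condor-style: after a job completes, control of the resource
    # returns to its originator flock, so the remote flock must
    # re-discover it for any future workload.
    remote_pool.discard(node)
    origin_pool.add(node)

def hoverlay_job_done(node, origin_pool, remote_pool):
    # Hoverlay-style: the migrated node stays in the pool of the
    # server it moved to, ready to absorb extra workload assigned
    # by its new manager server.
    remote_pool.add(node)  # remains available at the destination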

On timeslot 70, a hotspot relocation takes place to areas that cumulatively contain more available capacity. This makes the success rates of all systems increase abruptly. Though there is now enough accessible and available capacity, only about 50% of the queries are successful. Workload already applied onto certain servers remains on them until it is satisfied. Therefore, past highly loaded hotspots (as is the case during timeslots 0–70) regenerate queries alongside new ones. Around timeslot 80, though the global workload starts decreasing, the success rate of all systems drops quickly too, which is justified by the following reasoning:


1. the calculation of the success rate includes queries from both past and new hotspots,

2. past hotspots do not generate new queries but only repeat unsatisfied ones,

3. new hotspots have much more capacity and less workload than past ones; thus, most of their queries get satisfied,

4. as soon as their workload starts dropping (timeslot 80 onwards), new hotspots stop producing or repeating queries,

5. queries then come only from old hotspots, which, however, have not managed to discover more capacity despite exploring their entire vicinity.

Decreasing workload helps the success rate to increase. Unlike in the previous experiments, Hoverlay is the most successful until timeslot 140. The main share of the global workload is still on servers of past hotspots. Hence, workload is removed primarily from old hotspot areas and fresh capacity remains in these areas. In the case of Flock of Condors, however, free capacity returns to its original host and owner. This may move capacity outside the hotspot areas, making it difficult to reach from the inner servers.

7.2 Average path length of successful queries

Based on the left part of Fig. 8, Hoverlay outperforms Flock of Condors in terms of the average query path length of successful queries. While the former fluctuates between 1 and 3 hops, the latter reaches as many as 8 hops, with paths on average 2 hops longer for most of the experiment.

Initially, queries tend to travel far to discover resources, as low-capacity hotspots are charged with high workload. This causes a non-smooth increase of the average path length. Resource migration helps Hoverlay reduce the requested capacity per query and thus succeed within a few hops. As hotspots exhaust all reachable capacity, query success drops to zero for the Condor-based systems and thus no average hop count is recorded for them. Within the first few timeslots, Hoverlay moves all discovered capacity inside the hotspot areas, keeping query paths shorter than the Condor-based systems and thereafter even gradually reducing them.

The second interval starts with a relatively small workload increase compared to the capacity available within the second hotspot triplet, and thus both Flock of Condors and Hoverlay exhibit similar path lengths. Once workload starts dropping, all queries come from the previous triplet of hotspots. Fresh capacity then returns to its owner: (a) outside the hotspot borders in Flock of Condors, or (b) inside them in Hoverlay. This explains why the latter satisfies repeated queries faster than the former. At the final phase of this experiment, the success rate of Flock of Condors approaches zero, and thus the average path length is calculated from a small sample, causing fluctuations on the graph.

This experiment shows the significant gains of Hoverlay over Flock of Condors, especially in cases of workload bursts at specific areas of a network. It is worth mentioning that the numbers of messages spent by these two systems are almost the same, as flooding is deployed on both. Finally, the results are further confirmed by the right side of Fig. 8, which illustrates the gains in satisfied user queries. The locality of fetched resources allows the discovery of more capacity when workload increases.

Fig. 8 Average number of hops (left) of successful queries and average satisfied requested capacity (right)

8 Conclusion

Hoverlay is a system that enables the logical movement of nodes from one network to another, aiming to relieve nodes experiencing high workload. Remote nodes migrate into the requesting node's domain to take over some of that excessive workload. It is an arbitrary network of servers (overlay), each of which represents a single underlying network. All servers use blind search techniques to discover free resources in other networks and move them into the requesting network. It is designed to be tolerant to node and server failures, since it minimises the maintenance costs of the server and node components.

Node migration and a dynamic server overlay differentiate Hoverlay from Condor-based architectures, which exhibit more static links between managers and nodes. In this paper, we have presented a number of important extensions to the basic Hoverlay architecture, which collectively enhance the degree of control owners have over their nodes and the overall level of cooperation among servers. The former is achieved through the concept of exclusion lists, used by nodes to avoid participation in incompatible applications. This is in fact a general-purpose method that can be used to enforce node mobility constraints imposed by their owners. Cooperation among servers is enhanced in two ways: firstly, each server chooses nodes based on their overall compatibility with its predecessor servers; secondly, servers reserve safety capacity with the nodes that are likely to be the least useful in the same context as above.
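A minimal sketch of how such an exclusion list could gate a node's participation, assuming applications are described by keyword sets; the function and field names are hypothetical, not Hoverlay's API:

def node_accepts(node_exclusions, app_keywords):
    """A node refuses to migrate into any application whose keyword
    description overlaps its owner-defined exclusion list."""
    return not (set(node_exclusions) & set(app_keywords))

# Example: an owner forbids participation in file-sharing applications.
assert node_accepts({'file-sharing'}, {'video', 'streaming'})
assert not node_accepts({'file-sharing'}, {'file-sharing', 'video'})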

The keyword-based cooperation of nodes in a distributed environment, even in the context of P2P systems, is a well-researched area. However, unlike other studies on semantic cooperation, Hoverlay does not use keywords to refine servers' neighbour lists based on the type of resources they provide. In highly dynamic environments, such rewiring may quickly create closed groups of servers sharing the same type of resources and fragment the network. Queries trapped in these groups cannot discover new capacity that may appear outside the group, thus deteriorating the success rate. The proposed scheme improves the visibility and mobility of widely compatible and acceptable resources and restricts the less useful ones. These latter resources tend to gather at the edges of the overlay, which practically deteriorates their access to the popular capacity at the overlay's centre and consequently their success rate.

An extensive simulator was developed to evaluate the conceptual characteristics of Hoverlay in static environments against the benchmarks of Condors and Flock of Condors. A series of simulations was run, and the results showed that Hoverlay performs better than both disconnected Condors and Flock of Condors, achieving important improvements in both success rate and average successful query path length at a negligible expense in messages. Another significant contribution of this paper is the second set of experiments, which exposes the systems to flash crowds at specific areas of a network. In these scenarios the gains of Hoverlay are clear, suggesting that resource migration makes the system more robust to network changes.

There are plenty of interesting avenues to explore in further work. Currently, we are experimenting with a new type of blind search mechanism that can trace resource mobility without logging statistical data on nodes. This concept is expected to allow Hoverlay to discover remote/migrated resources and converge to network changes very quickly. We are also developing heuristics to proactively move unutilised nodes from pools to other servers, in order to cater for future overload scenarios before they actually occur. These heuristics are based on real-time extraction of topology features.


Dr. George Exarchakos is a Researcher in Autonomic Networks at the Eindhoven University of Technology, The Netherlands. His research interests span from distributed network feature extraction and modelling to network-aware video distribution and P2P computing. He finished his PhD in the Department of Computing at the University of Surrey in 2009. He completed the BSc in Informatics and Telecommunications at the University of Athens in 2004 and, the same year, joined the MSc in Advanced Computing at Imperial College London, completing it in 2005. He collaborated with the Network Operation Centre of the University of Athens from 2003 until 2004. His achievements include an authored book on Networks for Pervasive Services published by Springer in 2011 and an edited handbook of research on P2P and Grid Systems for Service-Oriented Computing published by IGI-Global in 2010. He has received best paper awards at international network and multimedia conferences since the start of his PhD in 2005. He has contributed more than 20 articles to peer-reviewed journals and international conferences. Contact him at the Department of Electrical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands or via email at [email protected]

Dr. Nick Antonopoulos is a Senior Lecturer at the Department of Computing, University of Surrey, UK. He holds a BSc in Physics from the University of Athens (1993), an MSc in Information Technology from Aston University (1994) and a PhD in Computer Science from the University of Surrey (2000). He has worked as a networks consultant and was the co-founder and director of a company developing Web-based management information systems. He has over 9 years of academic experience, during which he has designed and managed advanced Masters programmes in computer science at the University of Surrey. He has published over 50 articles in fully refereed journals and international conferences and has received a number of best paper awards. He is the organiser and chair of the 1st international workshop on Computational P2P Networks. He is on the editorial board of the Springer journal of Peer-to-Peer Networking and Applications (effective from 2009) and on the advisory editorial board of the IGI Global Handbook of Research on Telecommunications Planning and Management for Business. He is a Fellow of the UK Higher Education Academy and a full member of the British Computer Society. His research interests include emerging technologies such as large-scale distributed systems and peer-to-peer networks, software agent architectures and security. Contact him at the Department of Computing, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom; [email protected]
