544 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL

Probabilistic Discovery of SemanticallyDiverse Content in MANETs

Andronikos Nedos, Kulpreet Singh, Raymond Cunningham, and Siobhan Clarke

Abstract—Mobile ad hoc networks rely on the opportunistic interaction of autonomous nodes to form networks without the use of

infrastructure. Given the radically decentralized nature of such networks, their potential for autonomous communication is significantly

improved when the need for a priori consensus among the nodes is kept to a minimum. This paper addresses an issue within the

domain of semantic content discovery, namely, its current reliance on the preexisting agreement between the schema of content

providers and consumers. We present OntoMobil, a semantic discovery model for ad hoc networks that removes the assumption of a

globally known schema and allows nodes to publish information autonomously. The model relies on the randomized dissemination and

replication of metadata through a gossip protocol. Given schemas with partial similarities, the randomized metadata dissemination

mechanism facilitates eventual semantic agreement and provides a substrate for the scalable discovery of content. A discovery

protocol can then utilize the replicated metadata to identify content within a predictable number of hops using semantic queries. A

stochastic analysis of the gossip protocol presents the different trade-offs between discoverability and replication. We evaluate the

proposed model by comparing OntoMobil against a broadcast-based protocol and demonstrate that semantic discovery with proactive

replication provides good scalability properties, resulting in a high discovery ratio with less overhead than a reactive nonreplicated

discovery approach.

Index Terms—Mobile ad hoc networks, distributed discovery, probabilistic algorithms, gossip protocols, semantic services.

Ç

1 INTRODUCTION

MOBILE Ad Hoc Networks (MANETs) are formed whenautonomous mobile devices with short-range wireless

communication capabilities cooperate to provide sponta-neous connectivity. Although a challenging environment,MANETs find wide applicability in scenarios whereopportunistic networking and collaborative activity isrequired, e.g., pervasive computing and mobile games. Insuch a dynamic environment, the location of requiredcontent (i.e., services or data) cannot be hard-wired becausenodes can acquire transient addresses and networks maypartition at any time. The result is that discovery becomesan important process preceding any collaborative effort. Ingeneral, discovery attempts to match required and availablefunctionality by using an appropriate representation lan-guage and suitable network support.

In MANETs, representation and discovery of content hasseveral unique requirements: the opportunistic nature ofthe network necessitates reduced human interventionrequiring automated discovery; node autonomy impliesthat content representation is difficult to standardize or toguarantee its agreement between providers and consumers;the open and dynamic nature of the network requiresdiscovery mechanisms that scale; and resource-constraineddevices dictate the efficient distribution of discovery load.We elaborate on each of these requirements below.

When minimum user intervention is required in mobilenetworks with no stable and permanent set of nodes,manually identifying relevant functionality requires userinvolvement and constitutes additional discovery latency.Representing services or data with capabilities rather thannames offers greater flexibility in discovery queries. A moreexpressive content description can encourage automateddiscovery based on queries about explicit functionality [1],rather than discovery based on the implicit associationbetween syntactic interfaces and content. Semantic servicesare a good example of how ontologies can facilitate asemantic interface for discovery and also represent thecapabilities of services.

A second requirement for content discovery is thenecessary agreement on the information representationbetween providers and clients. Standardization of informa-tion schemas is one way to address this issue, though theadministrative overhead is not negligible and requirescentralized infrastructure and planning. The problem ofhaving a standardized representation is exacerbated inMANETs when one considers the self-contained andunpredictable environment. As MANETs can form in placeswhere no Internet connectivity is guaranteed, it follows thatnodes cannot avail of common representations on theInternet and can only use the knowledge of connectednodes. When ontologies provide the representation lan-guage, it is not reasonable to assume that information inmobile nodes will be described using commonly agreedontologies. Rather, the use of multiple and independentlydeveloped domain ontologies is more likely. A comprehen-sive specification of this problem is given in [2], where theauthors use the term emergent semantics to describe semanticconsensus based not on standardization but on emergentbehavior.

544 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009

. The authors are with the Distributed Systems Group, Department ofComputer Science, School of Computer Science and Statistics, TrinityCollege Dublin, Dublin 2, Ireland.E-mail: {Andronikos.Nedos, Raymond.Cunningham,Siobhan.Clarke}@cs.tcd.ie, [email protected].

Manuscript received 8 Nov. 2007; revised 9 May 2008; accepted 18 Aug.2008; published online 5 Sept. 2008.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TMC-2007-11-0336.Digital Object Identifier no. 10.1109/TMC.2008.133.

1536-1233/09/$25.00 � 2009 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS

Authorized licensed use limited to: TRINITY COLLEGE LIBRARY DUBLIN. Downloaded on December 14, 2009 at 11:17 from IEEE Xplore. Restrictions apply.

In the distributed context envisioned by the emergentsemantics model, maintaining semantic interoperabilityremains a strong requirement. Even when data and servicesare described by multiple and independently developedontologies, service discovery, content retrieval, and seman-tic inference should be guaranteed as if operating within asingle ontology. To this end, appropriate mechanisms arerequired to map or translate metadata [3], [4].

Projects like H-MATCH [5], LARKS [6], and GLUE [4]accept semantic heterogeneity and have devised techniquesand algorithms for ontology matching, mapping, andevolution.

The additional challenges that MANETs pose to thediscovery of semantically diverse content relate to scal-ability and efficiency. Open networks with nodes that canbe both content providers and consumers require discoveryprotocols that scale. Furthermore, mobility and limiteddevice resources dictate that selected techniques must beadaptable, configurable, and able to spread load efficiently.In general, application interoperability in MANETs facesissues that require careful consideration of establishedassumptions such as reliance on standardized syntacticinterfaces or availability of common ontologies.

Current discovery architectures for MANETs addressthe above requirements to varying degrees. Traditionalservice discovery architectures like Sun’s Jini [7] andIETF’s Service Location Protocol (SLP) [8] were notdesigned for MANET environments, so they scale poorlyand require global knowledge of service templates. Otherapproaches have concentrated on distributed discoveryprotocols [9], [10], [11] but limit node autonomy byassuming a simple representation language globallyknown by all mobile peers.

The contribution of this paper lies in the design, analysis,and evaluation of a distributed discovery model calledOntoMobil. OntoMobil caters for MANETs and semanticdecentralization. Semantic decentralization is the idea thatautonomous nodes can express data or services usingdifferent ontologies that are not agreed upon a priori. Themodel relies on the decomposition of ontologies intoconcepts and the replication of these concepts through agossip protocol. This randomized concept disseminationworks in tandem with a lightweight semantic matchingmechanism that is executed in each node to facilitate theeventual semantic agreement between diverse ontologiesand provide a substrate for the scalable discovery ofcontent. The actual discovery employs a random walkprotocol, whereby semantic queries are first routed to anumber of random nodes and are subsequently evaluatedby a semantic reasoner at any provider node with acompatible ontology. A stochastic analysis is used to predictperformance and illustrate the trade-offs between highdiscoverability and the overhead incurred by replication. Acomparison between the analytical model and an imple-mentation of OntoMobil in ns2 confirms the derivedanalytical bounds.

The paper has the following layout. Section 2 containsthe state of the art in decentralized and mobile discoverysystems. Section 3 describes the model and protocols thatcomprise OntoMobil. Section 4 provides the stochastic

analysis of the gossip protocol and probabilistic bounds ofthe random walk discovery protocol, while the evaluation ispresented in Section 5, and the conclusion is given inSection 6.

2 RELATED WORK

The importance of content discovery in the design offlexible and adaptive applications can be seen in theproliferation of discovery architectures. This section re-views existing work with similar goals and features toOntoMobil in decentralized semantic topologies and ad hocdiscovery protocols.

2.1 Semantic P2P Topologies

The assumption of a centralized architecture and anintuitive scheme to name available resources can simplifythe discovery process. The activity of looking throughyellow pages or initiating a google query are typicalexamples of this. When centralized architectures cannot besupported, specialized architectures and protocols arerequired. A characteristic example is peer-to-peer (P2P)networks. Initially, discovery in these systems was confinedto naming conventions with limited expressiveness, puttingthe emphasis on topology properties such as scalability andbounded routing (e.g., Chord and Pastry). Currently, a newgeneration of P2P networks uses semantic representationscoupled with decentralized topologies.

One of the first complete systems to explore the idea ofsemantic or schema-based topologies is the EDUTELLAproject [12]. The goal of EDUTELLA is the distributeddiscovery of semantic information without the use ofcommon ontologies. In terms of topology, EDUTELLA isbased on superpeers, though the clustering algorithm ituses has several flavors.

Haase et al. [13] describe another semantic P2P topologyfocused on efficiency. The authors start from a completelyrandom network topology and eventually derive a topologythat closely matches the similarity in the semantic knowl-edge of peers. The authors evaluate the hypothesis thatcertain semantic topologies will perform more efficientlythan random topologies. A number of assumptions underliethis hypothesis. Chief among them are the assumptions ofsemantic similarity, stable connectivity, and global semanticknowledge.

INGA [14] exploits the query history in order to routesemantic queries in P2P networks. It accepts that a commonschema between peers is a simplistic assumption andassumes that peers describe content with heterogeneousmetadata. Since INGA does not define an exact topology, itselects a number of peers that are likely to contain relevantresults through observation and recording of metainforma-tion from user queries. This progressive acquisition ofknowledge is a common characteristic between INGA andOntoMobil, though the two systems use very differentmethods.

OntoMobil differs from existing semantic topologies inits assumptions, its topology, and the guarantees itprovides. First, OntoMobil is targeted toward MANETs,which are inherently dynamic, composed of autonomousnodes, and characterized by transient connectivity. These

NEDOS ET AL.: PROBABILISTIC DISCOVERY OF SEMANTICALLY DIVERSE CONTENT IN MANETS 545


conditions prevent solutions that rely on stable connectionsand global semantic knowledge. Second, OntoMobil buildsan unstructured semantic overlay by maintaining a repli-cated set of concepts. Although the OntoMobil topology isnot intended to be a general-purpose P2P topology, it canstill be classified according to the SIL model [15] as a graphwith forwarding search and forwarding index links. Whatfurther differentiates the OntoMobil topology in relation toother general-purpose content addressable topologies, e.g.,PlanetP [16], is the use of multiple weakly consistentreplicated indexes in the form of concepts instead of asingle index. Finally, the use of epidemic updates tomaintain concepts weakly consistent has the main benefitsof performance, simplicity, and an algorithm amenable toan analytical model and predictable guarantees.

2.2 Service Discovery in MANETs

Despite similarities between P2P networks and MANETs,there are a number of important differences. In anenvironment where mobile nodes develop content withouta common schema, ontology matching and discovery mustbe robust to node mobility, consider the processing capacityof nodes, and adapt to variation in the size of the network.We illustrate these design trade-offs by comparing Onto-Mobil to the following discovery systems in MANETs.

Kozat and Tassiulas [9] present a distributed discoverymechanism for MANETs. The mechanism is broker-basedand operates very close to the routing layer. There is nosuggestion as to what type of services can be supported insuch a system as the focus is on the specification of thediscovery protocol and its performance evaluation. Thebroker-based topology designed by Kozat and Tassiulasand the semantic overlay created by OntoMobil satisfydifferent requirements, making a direct comparison diffi-cult. If the broker-based topology were to be used to fulfillthe requirements of OntoMobil, the broker nodes wouldhave to bear the sole responsibility for concept matching.This would increase their processing overhead and wouldalso require a specialized hand-off procedure to transfer thematching relations between departing and new brokernodes. Because of mobility, the hand-off procedure and theconstant swapping of nodes between brokers and non-brokers would increase the traffic overhead.

Sailhan and Issarny [10] describe a scalable servicearchitecture for mobile ad hoc and hybrid networks. Thearchitecture is composed of a service representation and asuite of protocols for advertisement and service discovery.The requirement of reduced energy consumption have ledto a design that incorporates a set of connected brokernodes forming an overlay network. The core differencesbetween the OntoMobil overlay and the discovery archi-tecture in [10] reside in the representation of content, thefabric of the overlay network, and the use of broker nodes.Because OntoMobil uses ontologies to describe content, theoverlay is constructed from metadata, rather than data orservices. Furthermore, in OntoMobil every node is con-sidered to be part of the overlay, which simplifies thedesign by not requiring a special protocol for node to brokercommunication.

The Group-based Service Discovery (GSD) protocol [17]uses a caching mechanism to improve access times and

reduce the overhead associated with semantic servicediscovery in ad hoc networks. The caching mechanismworks by employing an ontology-based classification for allservices. The use of cached service concepts to aiddiscovery in GSD resembles that of OntoMobil. Amongthe differences are the common ontology assumed by GSDand the fact that OntoMobil does not use flooding butrelies on a unicast gossip protocol for concept dissemina-tion. This has the potential to improve OntoMobil’sscalability properties and to reduce overhead for nodesthat do not participate in the exchange of services. GSDalso sketches a service matching approach, which functionsby identifying service groups, input-output parameters,and service capabilities as potential matchmaking proper-ties. This can prove suboptimal when the selectiveforwarding strategy is different to the matchmaking one.For example, as outlined in the design of the forwardingstrategy, service groups are sufficient to route a service to aprovider. This does not guarantee a matched service,however, as the query signature might still be different tothe provided one. OntoMobil addresses this concern byrouting requests according to their full semantic signature.

3 ONTOMOBIL: MODEL AND PROTOCOLS

This section presents the OntoMobil model and the detailedspecification of the gossip and discovery protocols.

3.1 Model Description

The model specification uses concepts as the core abstractionand randomization as the main process. The specificationbegins by considering a finite set of concepts that are splituniformly across nodes. The aim is to compare all conceptsin a pairwise fashion by using all available nodes, while atthe same time providing a replication pattern to facilitaterapid discovery. The model uses the intuition that bothontologies and semantic queries are easily decomposed intotheir constituent concepts. To simplify this abstract modeldescription, each node is assumed to know every othernode, and protocol events in all nodes are assumed to besynchronized. In reality, the model only requires each nodeto know a subset of the participating nodes, while thesynchrony assumption is relaxed.

Initially, each node randomly selects a fixed subset of itsconcepts and transmits these concepts to a set of randomnodes. In subsequent steps, each node mixes any receivedconcepts with those from its own ontology and performsthe same random selection and transmission. To prevent theeventual replication of all concepts into all nodes, the modelrestricts both the retransmission of received concepts andtheir propagation. This restriction is necessary to allow themodel to scale as the number of concepts increase. Therestriction is facilitated by the use of two constraintparameters: a transmission threshold (age) and a propaga-tion threshold (time to live (ttl)).

The model requires each participating node to contributea fraction of its resources for ontology matching andreplication. Ontology matching combines the pairwiseconcept comparison with the randomized concept dissemi-nation to obtain network-wide semantic agreement in aprogressive and distributed fashion. By utilizing the



infrastructure that emerges from replication and semanticmatching, a discovery query can now follow a random paththrough the one-hop neighborhood of each node until itidentifies concepts that are similar to the query’s constituentconcepts. Replication guarantees that similar concepts willbe found in a number of hops that is probabilisticallybound.

Such a model offers a scalable mechanism for thediscovery of diverse semantic content and an efficientway to match heterogeneous ontologies. It is scalablebecause each node maintains only partial metadata andefficient because matching load is progressive and dis-tributed across all nodes.

3.2 System Assumptions

We consider an ad hoc network composed of N ¼fn1; n2; . . . ; nNg mobile nodes. All nodes are considered tobe active participants that maintain ontologies and sharecontent. The model can be extended to include nodes thatare not participants, though they are required to have thebasic capability to forward packets. It is assumed that allnodes communicate using a fixed-range wireless medium(e.g., IEEE 802.11) and that a unicast routing protocol (e.g.,AODV and OLSR) is available.

Each node ni 2 N maintains three different views. Theontology view, the concept view, and the node view. Theontology view represents a node’s ontology as a fixed set ofconcepts, VO

i ¼ fci1; . . . ; ciGg, where cil 2 VOi , 1 � l � G is a

concept in the ontology of node ni, and G represents themaximum number of concepts per node. The modelassumes that these ontologies are static, so this view allowsno additions or deletions of concepts for the duration of theprotocol execution. We believe that this is a reasonableassumption as ontologies are structured metadata, meaningthat they are specified during application design and do notchange often.

The concept view includes the set of concepts that arereceived from other nodes. It does not allow duplicateconcepts or concepts that exist already in the node’sontology view. This view is represented as VC

i �fckljckl 2 VO

k ; k 2 N � fnig; 1 � l � Gg, where ckl representsa concept from any ontology view except the ontology viewof node ni. The gossip protocol provides the concept viewwith the following properties:

. Probabilistically bound. The concept view is notconstrained by a fixed size; rather, the protocolguarantees a bound on its size with a certainprobability. The intent of the different gossipparameters is to keep the concept view partial, i.e.,with a certain probability, it should maintain only asubset of the total number of concepts.

. Evolving. The gossip protocol constantly inserts andremoves concepts.

. A simple random sample. Since each view does notcontain a set of concepts that concretely describe aknowledge domain, it rather contains randomizedconcepts that can belong to any of the availableontologies.

The node view is a set of node identifiers, VNi �

fnkjk 2 N � fnigg. Like the concept view, it does not allow

duplicate node identifiers and cannot contain the node’sown identifier. It is a membership view that has similarproperties to those found in recent gossip protocols [18],[19]. The node view has the following properties:

. Fixed size. Contrary to the concept view, the nodeview does not automatically adapt to an ever-increasing group size. More sophisticated protocolshave been proposed that adapt the size of themembership view as the group size increases [20].The node view is also intended to be partial withoutcontaining the complete list of all participants.

. Uniform. The probability that a node identifier existsin a specific node view is the same for all nodeidentifiers.

. Randomized. The distribution of identifiers in thenode views is that of a random distribution.

The node view is populated during a bootstrap phase and issubsequently maintained by the gossip protocol.

3.2.1 Gossip-Based Parameters

The gossip protocol is completely characterized by thefollowing parameters:

. Fc—the concept fanout specifies the number ofconcepts a sender includes in a gossip message.

. Fn—the node fanout specifies the number of destina-tion nodes a gossip message is sent to.

. Ta—age specifies the number of times a nodetransmits a received concept before the concept isremoved from the concept view. For convenience,we define function ageðcÞ, which takes concept c asinput and returns its age.

. Tt—the ttl value is assigned by each sender to anyconcept that is selected for transmission from itsontology view. It specifies the number of hops in thesemantic overlay that a concept will traverse beforebeing discarded, i.e., when the ttl value reaches zero.

3.3 Constructing the Semantic Overlay

3.3.1 Join

Before a node begins execution of the gossip protocol, itmust first populate its node view with a partial list of otherparticipants. A simple bootstrap protocol is used in order toreach a partial and uniform membership view. TheOntoMobil join protocol is based on a simple expandingring search and is used either when a node enters an ad hocnetwork or after a node failure.

3.3.2 Leave

The current gossip specification requires an explicitdisconnection procedure and does not admit unexpectedfailures. Before a node leaves the network, it declares itsintent as part of the periodic gossip transmission. Nodesthat receive such a gossip transmission 1) remove the nodeidentifier if it exists in their node view, 2) remove anyconcepts from their concept view that originate from theontology view of the departing node, and 3) remove allreferences from concepts maintained in their ontology orconcept views that point to concepts in the ontology view ofthe departing node.



In the next gossip round, the nodes that initially receivedthe leave notification will further propagate it, resulting inmore nodes executing the same removal sequence as above.Eventually, by gossiping the intention to disconnect, allparticipants will be reached, and all network-wide refer-ences to the departing node will be removed.

3.3.3 Gossip Protocol

The OntoMobil gossip protocol is influenced by [19],retaining similar semantics for the ttl and age parametersand incorporating the use of a partial membership viewmechanism found in similar protocols. It has been modifiedto a distinct protocol, described in more detail in [21], to fit thediscovery model of OntoMobil by altering the transmit andreceive operations, adding the concept fanout parameter,and optimizing the OntoMobil gossip for a proactive routingprotocol. In particular, what distinguishes the two maingossip operations of transmission and reception in [19] istheir operation on a finite set of concepts rather than onapplication-generated messages; the transmission of arandomized selection from the union of the concept andontology views rather than the transmission of bufferedmessages until such buffer is empty; and ultimately, theprovision of randomized replication rather than randomizedtransmission.

Listing 1. Gossip Transmission

1: //GossipMessage represents the network packet

2: Every t time units at node j

3: Choose Fc random concepts fck1; . . . ; ckFcg from VOj

SVCj

where k 2 N4: for all c 2 fck1; . . . ; ckFc

g do

5: if c 2 VOj then

6: c:ttl Tt7: end if

8: if c 2 VCj ^ ageðcÞ ¼ Ta then

9: VCj VC

j � fcg10: else if c 2 VC

j then

11: ageðcÞ ageðcÞ þ 1

12: end if

13: end for

14: GossipMessage:concepts fck1; . . . ; ckFcg

15: //Piggyback removal notifications

16: GossipMessage:removals BufferRemovals17: //Select destination nodes

18: Choose Fn random nodes, fn1; . . . ; nFng from VN

j

19: GossipMessage:src fjg20: for all nid 2 fn1; . . . ; nFng do

21: sendsendðnid;GossipMessageÞ22: end for

Listing 1 describes the transmission of a gossip message.Periodically, a node selects Fc concepts uniformly atrandom from the union of the ontology and concept views(line 3). A concept that is selected from the ontology view isaugmented with the ttl property (line 6), signifying thenumber of hops the concept will be propagated before beingdropped by the receiving node. If a concept is selected fromthe concept view and has already been transmitted Ta times,it is removed from the concept view (line 9). Selecting afixed number of random concepts from the union of the two

views is a simple algorithm that exhibits a desirableadaptive behavior. During the initial stages of gossiptransmission, a node’s priority is to disseminate its ownontology so that its semantic information is diffusedthroughout the network. Since the concept view containsfew elements during the initial rounds, concepts from theontology view have a higher probability of being selectedfor transmission.

If the sender has received any leave notifications, theyare also appended to the gossip message (line 16). The lastaction for the sender is to select Fn distinct nodes uniformlyat random from its node view (line 18) and to unicast thegossip message to the target nodes (line 21).

On reception of a gossip message, the receiving nodeexecutes the algorithm presented in Listing 2. If the optionalfield GossipMessage:removals exists, then the actions out-lined in Section 3.3.2 are executed. The identifiers are thenstored in the auxiliary buffer BufferRemovals, so they can befurther propagated in the next gossip transmission (line 6).

Next, the receiver matches the set of concepts includedin the gossip message against its own stored concepts(line 10). We use the H-MATCH algorithm [5] to identifyconcept equivalence relationships. When two concepts arefound to be equivalent, a reference to the matchingconcept and its corresponding source node address areappended in both concepts. The current implementationuses RDFS to model ontologies, so two RDFS predicatesare appended in both concepts: om:matchesConcept

holds a reference to the URI of the matching concept, andom:matchesConceptInNode references the correspond-ing node address.

Subsequent to matching, lines 11-14 show that if aconcept is not found in either the concept or the ontologyviews and the concept’s ttl is greater than one, it is stored inthe receiver’s concept view and the concept’s ttl value isdecremented by one.

Listing 2. Gossip Reception

1: //GossipMessage represents the gossip packet

2: On reception of a GossipMessage at node j3: //Prune entries from removed nodes

4: for all nid 2 GossipMessage:removals do

5: Execute algorithm in Section 3.3.2

6: BufferRemovals BufferRemovalsSfnidg

7: end for

8: //Process received concepts

9: for all c 2 GossipMessage:concepts do

10: Execute ontology matching algorithm between c and

VOj

SVCj

11: if c =2VOj ^ c =2VC

j ^ c:ttl > 1 then

12: c:ttl c:ttl� 1

13: VCj VC

j

Sfcg

14: end if

15: end for

16: //Node view maintenance

17: if GossipMessage:src =2VNj then

18: if jVNj j ¼ ParameterNodeView then

19: Prune VNj by removing a random node id

20: end if

21: VNj VN

j

SfGossipMessage:srcg

22: end if



Through the ontology matching algorithm, each messagehas the potential to create new associations betweenconcepts from different ontologies. This progressive aspectof ontology matching results in the following networkbehavior: the longer a node is connected to an ad hocnetwork, the more likely it is that potential associationsbetween its own and other ontologies will be identified.This prevents nodes that are short-lived or transientlyconnected from immediately overloading the network withthe task of complete ontology matching. A concept viewsize that is probabilistically bound and a fixed conceptfanout also ensure that nodes are not overwhelmed withmatching large ontologies. Although the redundancy that isinherent in gossip protocols can seem excessive, it is thisvery feature that allows progressive matching through theexchange of concepts and scalable discovery throughconcept replication.

The last action a receiver undertakes is node viewrelated. A simple algorithm derived from [18] is used. Thealgorithm maintains the fixed size of the node view,denoted by ParameterNodeView, while constantly updatingit with a small number of new identifiers (lines 17-22).

The benefits of the gossip execution are twofold: first, theontology matching component in each node can derivesemantic similarity relations between its stored conceptsand those received, and second, replicated concepts can beused to bound the number of hops required for discovery.The gossip protocol is the basis of the model and providesthe foundation on top of which distributed ontologymatching and semantic discovery can take place.

3.3.4 Optimized Gossip

An important decision in the design of OntoMobil was theseparation between the gossip and the routing protocol.While this separation places mobility adaptation to therouting layer and semantic discovery in the applicationlayer, the transmission of concepts many hops away canincrease routing overhead and failed transmissions. Wehave devised and implemented an optimization for thegossip protocol that works with a proactive routingprotocol, specifically OLSR, and combines the use oftopology-aware membership views with a weighted desti-nation selection mechanism. The idea is to have the nodeview maintain, in addition to the set of node identifiers, anestimation of the number of hops required to reach each ofthe corresponding nodes. This information can be recordedeach time a gossip message is received and maintainedtogether with an associated timer to track the accuracy ofthe hop distance. Having a node view that is populatedwith a distance metric can facilitate an algorithm that doesnot select identifiers uniformly at random but is biased tothose identifiers with a short hop count and timelyinformation. Specifically, in the optimized gossip version,the target of a gossip transmission is not selected based onuniform random sampling but using a weighted randomsampling. The node view for node ni now contains entriesof the form VN

i ¼ fðnk; wkÞ; . . .g, where nk is a nodeidentifier, and wk ¼ ðh��tÞ�1 is the weight for node kthat was h hops away from node i when it was inserted inthe node view of node i some �t time units ago.

3.4 Discovery

Discovery queries in OntoMobil [22] accept a set of conceptsdescribing the capabilities of services or generally thesemantics of required content. The mechanism that dis-seminates these queries exploits the randomized overlay tolocate nodes having ontologies with concepts equivalent tothe concepts that compose the discovery queries. Thespecification in this section covers this mechanism andassumes that a matchmaking process at each destinationnode will identify services or data that are semanticallysimilar to those requested and transmit the reply to thesource node.

Some common definitions are provided below followed

by the protocol description:

. A node initiating a discovery request becomes thesource node of the request.

. A request is composed from a set of concepts. The

following notation is used:

Q ¼ c; . . . jc 2 V0i ; i 2 N

� �:

. Matches are stored in R ¼ fðc; fni; . . .gÞ; . . . jc 2 Qg,where ni represents a node that has an ontology

containing a concept that matches c.. To simplify the description of the random walk, we

define the function fSðcÞ, where c is a concept with

c 2 S and S can be either of the two views, i.e., VO,VC, or R. This function returns the set of node

identifiers of all matched concepts currently em-

bedded in c. In other words, it returns the values

obtained from the set of om:matchesConceptIn-

Node predicates.

Listing 3. Semantic Discovery1: //RequestMessage represents the semantic query2: At source node j3: for all c 2 Q do4: if c 2 VO

j then5: R fVO

jðcÞ

6: end if7: end for8: RequestMessage:ttl ParameterRequestTTL

9: Choose a random nid from the 1-hop neighbors of j10: sendsendðnid; RequestMessageÞ11: At each node receiving RequestMessage:12: for all c 2 Q do13: if c 2 VC [ VO then14: R R[ fVOðcÞ or R R[ fVCðcÞ15: end if16: end for17: if RequestMessage:ttl ¼ 0 or all concepts found then18: D ¼

Tc2QðfRðcÞÞ

19: for all nid 2 D do20: //redirect query to node nid21: forwardforwardðnid; RequestMessageÞ22: end for23: else24: RequestMessage:ttl RequestMessage:ttl� 125: Choose a random nid from 1-hop neighbours26: sendsendðnid;RequestMessageÞ27: end if



The basic discovery mechanism in OntoMobil is flexibleand can support many possible variations. These variationsare conditioned on several choices, e.g., whether theconcepts composing the query exist in the source node’sontology or the nature of the operators that form the querycondition. The current variation, manifested in the defini-tion of Q, is selected because of its simplicity and because itutilizes the mechanisms of concept replication and match-ing. It does so by allowing the composition of queries fromnonlocal concepts, i.e., concepts that do not exist in theontology view of the source node. The difference being thatwhen a query is solely composed of concepts that exist inthe source node’s ontology, extracting matching conceptscan be accomplished either by only polling periodically thelocal concepts for new matching associations or by usingpolling followed by a discovery request. The drawbackwith a polling-only mechanism is that it becomes overlydependent on the progress of the semantic matching.Combing polling with a discovery request can increase thechances of identifying new associations in other nodes.When a query is composed of nonlocal concepts, adiscovery query is mandatory since the query conceptsmust first be discovered in the network for any subse-quently matching associations to be extracted. The motiva-tion to compose queries with nonlocal concepts stems fromthe ability to create more complex queries by combiningconcepts in the node’s ontology with concepts in thenetwork. Furthermore, it allows the discovery of contentusing predefined queries by nodes that can only hold aminimal ontology but want to avail of all networkknowledge.

We briefly describe the discovery protocol by observingthe random walk as traversing a graph built by consideringeach node as a vertex with outgoing edges to the nodesfound in its one-hop neighborhood (lines 1-10). The randomwalk mechanism is used to collect the identifiers of nodesthat maintain ontologies with concepts that match theconcepts in the discovery query. The node identifiersconstitute addresses of potential content providers since amatching relation by definition indicates partially compa-tible ontologies. When a query condition is satisfied, i.e., allquery concepts are located, or when the query’s ttl reacheszero (line 17), the second phase is initiated in which thequery is forwarded to the nodes discovered during the firstphase (line 21).

4 STOCHASTIC ANALYSIS OF ONTOMOBIL

When considering the applicability of OntoMobil, the sizeand variability of the concept view become importantfactors. These factors influence the scalability of the gossipprotocol as memory consumption, processing overhead,and the probabilistic guarantees for discovery depend onthe distribution of replicated concepts across nodes.

This section uses a stochastic analysis to derive apredictive model of the proposed gossip protocol, giventhe total number of concepts, the number of nodes, and thecharacteristic parameters of Fn, Fc, Tt, and Ta. The main aimof the analysis is to formulate a probability measure of theconcept view size, which will be used as the foundation forthe probabilistic guarantees of the random walk discoveryprotocol.

The variability of the concept view size is modeled using

the random variable V . Based on V , the probability mass

function (pmf) fV ðvÞ is derived, which contains complete

information about the distribution of V to satisfy the main

goal of the analysis.

4.1 Analytical Assumptions

To make analytical modeling tractable, a number of

assumptions and simplifications are made. Similar to the

system assumptions in Section 3.2, the analysis considers a

fixed set of N participants with cardinality N ¼ jN j, which

maintain the same number of concepts G ¼ jVOj in their

ontology views. This brings the total number of concepts to

Gtotal ¼ N �G.1

Recall that concepts are inserted in the concept view if

they do not already exist in the two views VC and VO.

Concepts are also removed when they are selected for

transmission from the concept view and their age parameter

reaches Ta. The analysis presented here focuses on modeling

the variability of the concept view size, based on the

probability that a certain number of concepts are inserted

and removed in each round.As is evident from the protocol specification, the size of

the concept view varies both across nodes and across

rounds. Specifically, a node’s concept view varies across

rounds, while in a certain round, there is variability in the

view sizes of the different nodes. We assume that the size

distribution across both of these dimensions is Gaussian,

and as the number of rounds and the number of nodes

increase, the mean and variance of these respective

distributions converge. Fig. 1 illustrates the distribution of

the concept view size through a normal probability plot

taken from a simulation of the gossip protocol with

300 nodes running for 300 rounds. Despite the dispersion

between the two data slices and the short-tailed lines, the

two slices do approximate the normal distribution, which is

sufficient for the purposes of this assumption.


1. The Appendix provides a concise table of the symbols used during theanalysis.

Fig. 1. The variability in the size of the concept view can be measured inspace (node slice) or in time (round slice). Data in this figure are takenfrom simulations with 300 nodes for 300 rounds. Despite the dispersion,it can be seen that both distributions are convergent and approximatethe Gaussian. Experiment invariants: Fc ¼ 4, Fn ¼ 2, Tt ¼ 2, and Ta ¼ 2.


The stochastic model provides an analytical distributionthat approximates both experimental distributions asderived from the per-node and the per-round samples.This implies that the distribution of the concept view sizefor a single node across time can also be used as anindicator for the distribution of the concept view sizesacross all nodes during a single round. Therefore, thebehavior of the gossip protocol is observed from a node thatis selected uniformly at random from the nodes in N .

The four main assumptions that will be used during theanalysis are presented as follows:

A1. Synchronous gossip transmission. The stochastic ana-lysis assumes that all nodes transmit in synchronousintervals. At every round, each node in N transmitsa gossip message to Fn other nodes. Node failures ormessage omissions are not taken into account, and itis assumed that the message latency between allnodes is negligible. It can be inferred that messagessent in a round will be received within the sameround. Note that this assumption only requires theupper bound on node-to-node communication to bewithin the range of the gossip time-out value. This isfeasible since there are no real-time requirementsand the time-out value can be easily adjustedwithout influencing protocol correctness.

A2. Simple random sample. All concepts in both theconcept and ontology views represent a simplerandom sample selected from the set of all concepts.This assumption is verified by the agreementbetween the discovery and analytical results thatare presented in Section 4.3. Specifically, thisassumption implies that the set of concepts in eachnode’s VC [ VO represent a simple random samplefrom

SVOi , 8i 2 N .

A3. Independence of gossip reception and transmission. Thetransmission and reception of gossip messages areindependent events that happen simultaneously. Inpractice, transmission and reception of gossipmessages have an arbitrary order within a roundand occur sequentially. However, treating them assimultaneous events allows the composition ofreception and transmission into a single well-ordered gossip action.

A4. Sequential insertion of concepts when Fn > 1. The lastassumption concerns the reception of multiplegossip messages from different senders, i.e., Fn > 1.Section 4.2.2 shows that on the average, Fc � Fnconcepts are received by each node in a singleround. The assumption of sequential insertionexcludes the case where similar concepts may existin the different Fn transmissions, essentially treatingthis case similar to the reception of a single gossipmessage containing Fc � Fn distinct concepts.

4.2 Stochastic Model Specification

The stochastic model considers a network of nodes withempty concept views at the initial round r ¼ 0. Let Vr bea random variable representing the concept view size of arandomly chosen node, so that Vr models jVCj whenr � 0. From the gossip specification, Vr has range

EV ¼ f0; 1; . . . ; Gcg, where Gc ¼ Gtotal � G. The sequence ofrandom variables fVrgr�0 can now be considered as adiscrete-time and finite-space Markov chain with EV as itsstate space. We now proceed to calculate the state vectorand transition matrix of Vr.

4.2.1 State Probability Vector

The state space represents all applicable concept view sizesin unit increments. To represent the state probabilities atround r, we use the vector

pðrÞ ¼ pðrÞ0 . . . p

ðrÞGc

h i0; ð1Þ

where pðrÞi ¼ P ½Vr ¼ i�, i 2 EV , is the probability that the

concept view has size i after round r with an initialcondition p

ð0Þ0 ¼ P ½V0 ¼ 0� ¼ 1.

If the stationary vector for pðrÞ exists, the state prob-abilities reach a steady state, and the stationary vectorrepresents the required probability measure of the conceptview size. Since the stationary vector can be mapped to thepmf fV ðvÞ, then if it shown that pðrÞ has the stationaryproperty, fV ðvÞ will have been derived.

4.2.2 Transition Probability Matrix

Given that the concept view of a random node has sizeVr ¼ i after round r, the transition to size Vrþ1 ¼ j afterround rþ 1 is subject to the following conditions:

. C1: the number of concepts received by the node,

. C2: the number of concepts inserted in the conceptview, and

. C3: the number of concepts removed from theconcept view because of the node’s transmissionand the age parameter threshold.

Condition C1 is approximated by computing the meannumber of received concepts in each round. Under theassumptions of uniform node views and synchronousgossip transmission A1, each node will receive on theaverage �c ¼ Fc � Fn concepts from Fn gossip transmissions.

Knowing the mean number of concepts that a nodereceives in a round, it can be inferred that between tworounds, the size of a concept view ranges from �Fc to �c.This variability lies in the interval of the boundary caseswhere no concepts are inserted and Fc concepts areremoved, and all �c received concepts are inserted but noconcepts are removed.

Fig. 2 illustrates the possible transitions of the conceptview size after a round. Each of these transitions can berepresented by an event that has zero or more outcomes.Two numbers compose each outcome: the number ofconcepts inserted in the view and the number of conceptsremoved from the view. These two numbers correspond toconditions C2 and C3.


Fig. 2. States and corresponding transitions for a Markov chain that

represents the variability in concept view sizes.


For example, the transition to the state of maximum viewsize reduction occurs when there are zero insertions and Fcconcept removals, while moving to the state representingthe maximum increase happens when there are zeroremovals and �c concept insertions.

The transition to other states can be computed in asimilar way, remembering that most transitions are repre-sented by multiple outcomes. An increase of one in the viewsize can happen because two concepts are inserted and oneis removed or three concepts are inserted and two areremoved, etc.

This relationship is revealed if the difference in view sizesbetween consecutive rounds, i.e., Vrþ1 � Vr, is furtherdecomposed into two random variables. Each randomvariable represents an outcome, withX denoting the numberof inserted concepts during a round ðC2Þ and Y denoting thenumber of concepts removed during the same round ðC3Þ.The two random variables have ranges EX ¼ f0; 1; . . . ; �cgand EY ¼ f0; 1; . . . ; Fcg. It follows that X � Y ¼ Vrþ1 � Vr.

As stated in assumption A3, insertion and removal areconsidered as simultaneous and independent events. LetPXðx; iÞ ¼ P ½X ¼ xjVr ¼ i� and PY ðy; iÞ ¼ P ½Y ¼ yjVr ¼ i�be used to express the probabilities that x concepts areinserted and y concepts are removed when the conceptview size is i. The probability that a view change is Vrþ1 �Vr can then be computed by using the sum of productsbetween PXðx; iÞ and PY ðy; iÞ for all x 2 EX and y 2 EY ,where x� y ¼ Vrþ1 � Vr.

The view change probability can be generalized toexpress the transition probability as

Pij ¼P ½Vrþ1 ¼ jjVr ¼ i�

¼

P8x2EX;8y2EY :

x�y¼j�i

PXðx; iÞPY ðy; iÞ; i� Fc � j � iþ �c;

0; iþ �c < j < i� Fc:

8><>:

ð2Þ

We can now proceed to compute the values Pij of theprobability matrix by deriving PXðx; iÞ and PY ðy; iÞ.

4.2.3 Concept Removal Probability

Let �ðiÞ ¼ Gþ i be a convenience function that returns thenumber of concepts in a node’s ontology and concept viewswhen the concept view has size i. The concept removalprobability is calculated assuming that concepts withdifferent age values are uniformly distributed in theconcept view. The probability of removing y concepts,when the concept fanout is Fc can then be calculated as

PY ðy; iÞ ¼di=Taey

� �b�ðiÞ� i=Tað Þc

Fc�y

� ��ðiÞFc

� � y 2 EY : ð3Þ

Equation (3) expresses the probability of selecting yconcepts from the subset of concepts in the concept viewthat have an age value of Ta � 1, while the remaining Fc � yare selected either from the ontology view or from thesubset of concepts having an age value different to Ta � 1.

4.2.4 Concept Insertion Probability

To calculate the concept insertion probability, it is necessaryto identify the number of received concepts that 1) have a ttl

value that is greater than one and 2) do not exist in theconcept and ontology views of the receiver. Let �ðiÞ be afunction that returns the number of concepts in the senderthat have a ttl value greater than one and do not exist in thereceiver’s views.

The probability of inserting x concepts having received�c can be calculated as

PXðx; iÞ ¼d�ðiÞex

� �b�ðiÞ��ðiÞc�c�x

� ��ðiÞ�c

� � x 2 EX: ð4Þ

Equation (4) expresses the probability that x conceptswill be selected from �ðiÞ concepts and therefore inserted,while the remaining �c � x concepts will either have a ttlvalue of one or exist in the receiver’s views and willaccordingly not be inserted.

4.2.5 Computation of �ðiÞAs stated in Section 4.2.4, the two factors of the gossipprotocol that influence the insertion of received conceptsinto the concept view are F1Þ the ttl values of the receivedconcepts and F2Þ the existence of the received concepts inthe concept or ontology views of the receiving node.

In order to compute the probability that the number ofconcepts selected by the sender will be inserted in thereceiver’s view, it suffices to relate both F1 and F2 to theview size of the gossip sender. Each of the selected conceptsmust have a ttl value different to one and cannot exist in thereceiver’s views. To formulate the concept insertion prob-ability, we require expressions, predicated on view size, fromwhich to derive the number of concepts having a certain ttlvalue and the number of replicated concepts between thesender and the receiver. The following paragraphs outlinean analytical derivation of the two expressions.

To compute �ðiÞ, we must find the distribution of ttlvalues across the ontology and concept views of a singlenode for F1 and the distribution of replicated conceptsbetween a sender and a receiver for F2. Since both factorsdepend on the size of the concept view, their calculationmust take place for all concept view sizes.

Knowing that concepts in the ontology view have a ttlvalue of Tt, the concept view will contain concepts with ttlvalues in the range of ½1; . . . ; Tt � 1�. Fig. 3 illustrates theintuition that in each round, the number of concepts with attl value of � is proportional to the number of concepts thathad a ttl value of � þ 1 in the previous round.


Fig. 3. The distribution of ttl values across concepts in a concept view is

proportional to the ttl distribution across concepts in both views of the

previous round.


Knowing that the ttl distribution, when the concept view

has size i, derives from the ttl distribution of the

immediately preceding size i� 1, we can use a recurrent

function, gð�; iÞ, to compute the number of concepts having

ttl value � when the size of the concept view is i. Function

gð�; iÞ has the following domain and range:

gð�; iÞ : 1; . . . ; Ttg � f0; . . . ; Gcg ! f0; . . . ;maxfG; igf g

and is specified as

gð�; iÞ ¼0; if f�gTt�1

1 ; i ¼ 0;

G; if � ¼ Tt; figGc

0 ;

i � gð�þ1;i�1ÞPTt

�¼2gð�;i�1Þ

; if f�gTt�11 ; figGc

1 :

8>><>>: ð5Þ

To compute the probability that a transmitted concept

exists in the receiver’s views (i.e., either the concept or the

ontology view), we first need to calculate the number of

identical (replicated) concepts in both views between the

sender and the receiver. Using an overall set to represent all

concepts, under the simple random sample assumption A2,

we can use two distinct subsets of equal size selected

randomly and with replacement from the overall set to

represent the sender’s and receiver’s concepts.Let the random variable Z represent the number of

replicated concepts between two nodes. The range of Z is

EZ ¼ f0; . . . ; �ðiÞg, and its probability distribution can be

expressed as

PZðz; iÞ ¼�ðiÞz

� �Gtotal��ðiÞ�ðiÞ�z

� �Gtotal

�ðiÞ

� � for z 2 EZ: ð6Þ

Equation (6) expresses the probability that z concepts are

identical when both the sender’s and the receiver’s views

have size �ðiÞ. We derive the distribution knowing that �ðiÞconcepts have already been selected from Gtotal for the first

subset and formulating the probability distribution of

identical concepts based on the second subset. Since the

size of the second subset is also �ðiÞ, the probability

distribution is calculated by having the second subset be

composed of z items chosen from the first subset and

therefore considered identical, while the rest �ðiÞ � z are

selected from the remaining set of all available concepts.

Simulation results indicate that the expected value E½Z� is a

good approximation to the mean number of identical

concepts in both views between the sender and the receiver.Assuming that the expected number of identical con-

cepts are spread uniformly across the two views in each

node, the probability that a transmission contains a certain

number of identical concepts can be easily determined.Factors F1 and F2 can now be expressed with (5) and

(6), resulting in the following equation for �ðiÞ:

�ðiÞ ¼XTt

�¼2

gð�; iÞ �PTt

�¼2 gð�; iÞ�ðiÞ � E½Z�

!: ð7Þ

4.2.6 Stationary Distribution

By construction, the probability matrix derived from (2) is

finite. It can be shown [23] that the Markov chain is

irreducible and aperiodic if for some power of the matrix,all its elements are positive.

Each transition matrix constructed under the currentmodel has elements that have either positive or zero values.Each matrix is formulated with a set of well-defined rules,with the following property:

P½ij� ¼ > 0; if i� � < j � iþ �;0; otherwise;

�ð8Þ

where P½ij� represents the matrix element at position ij,which has a value calculated from Pij. Specifically, �

corresponds to �c, and � corresponds to Fc. When � > 1 and� > 1, it is easy to show that for some r > 1, the transitionprobability matrix Pr will have all its elements positive.

Knowing that the state probability vector pðrÞ results in astationary vector and having formulated the transitionmatrix, the probability distribution of V is the stationaryvector limr!1pðrÞ, which can be computed from

pðrÞ ¼ pðr�1ÞP: ð9Þ

4.2.7 Discovery Distribution

We can now derive the probabilistic bound on conceptdiscovery for a query composed of nonlocal concepts andspecified according to the protocol in Section 3.4:

PDisc ¼ 1� 1� jVCj

Gtotal

zfflffl}|fflffl{i0BBB@

1CCCA � 1� jV

Cj þ jV0jGtotal

zfflfflfflfflfflfflffl}|fflfflfflfflfflfflffl{ii0BBB@

1CCCAn�1

0BBBB@

1CCCCAjQj

;

ð10Þ

where n � 1 represents the number of nodes in the randomwalk path, including the source node, i represents theprobability that a concept exists in the concept view of thesource node, ii represents the probability that a concept isfound in either of the two views, and jQj represents thenumber of concepts in the query.

4.3 Evaluation of Stochastic Analysis

This section evaluates the stochastic properties of the gossipprotocol by studying the accuracy of the analytical modelagainst simulation results. In all experiments, the sameparameter values were used for both the simulation and theanalysis. We chose two network sizes of 30 and 60 nodes.The varied gossip parameters were node fanout ðFnÞ, ageðTaÞ, and the concept ttl ðTtÞ, while the concept fanout ðFcÞand the ontology view size ðjVOjÞ were kept constant at 4and 10 concepts, respectively. Section 5 presents moredetails on the simulation environment.

Fig. 4 demonstrates the results between the analysis andns2 simulations. For simulation results, the 95 percentconfidence interval is calculated over 350 rounds of protocolexecution. In each experiment, all nodes record the size oftheir concept view after every transmission. To illustrateresults closer to the steady-state distribution of the conceptview, the initial 30 recordings are discarded. Each simula-tion point in the graph represents the median value from asequence of average concept view sizes obtained from five



experiments. For the analytical results, what is depicted isthe expected value of the state vector ðE½V �Þ.

For almost all parameter permutation, the stochasticanalysis gives a very accurate prediction for the behaviorof the gossip protocol. The variability that is observedwhen the concept view size is large is caused partly bythe analytical assumptions made during the derivation ofthe transition matrix in the analysis and partly by the useof weighted random sampling in the optimized gossipimplementation and the failed transmissions in thesimulation.

Fig. 5 shows the comparative performance between thens2 simulation and the analysis of the random walkdiscovery protocol. In each query, the source node selectsone concept ðjQj ¼ 1Þ uniformly at random from a set thatincludes all available concepts, except the ones in theontology view of the source node. The source nodesubsequently searches its concept view for that concept,and if it is not found, the random walk protocol is executed.It is clear that the size of the concept view influences thediscovery probability, so we used the gossip ttl ðTt ¼ 2; 3Þto vary the concept view. After the first 30 rounds of thens2 simulation, every node transmitted one discoveryquery per second. In all experiments, the query ttlðParameterRequestTTLÞ was set to five. The y-axis depictsthe discovery ratio per node in the random walk. It is theprobability of locating the query concept during therandom walk. The discovery probability is computed usingthe cumulative number of queries satisfied in eachsuccessive node divided by the total number of queries,including those that were lost.

The discovery ratio always increases in proportion to thesize of the concept view. With respect to the network size,the discovery ratio shows a linear decrease with anincreasing number of nodes and, hence, an increasingnumber of concepts. Note that this is a natural consequenceof the intrinsic relationship in OntoMobil between nodesand content (concepts), which is contrary to the usualtreatment of nodes and content as separate entities.OntoMobil can guarantee a probabilistic bound on itsproperties such as discoverability or the latency of semantic

matching, provided that its characteristic parameters adjustto variations in the network size. For example, given adesirable discovery ratio of 60 percent in a network of30 nodes with the gossip ttl parameter set to two, toguarantee a similar discovery ratio when nodes increase to60, it suffices to increase the gossip ttl to three. In general,assuming similar ontology view sizes between nodes evenas the network grows or shrinks, constant probabilisticbounds on OntoMobil properties are guaranteed by varyingthe values of the characteristic parameters to keep the ratioof the expected concept view size to total conceptsðE½V �=GtotalÞ constant.

Finally, the correspondence between the analytical andthe experimental results verifies the assumption that theunion of the concept and ontology views constitute a simplerandom sample from the population of all concepts(assumption A2 in Section 4.1), which is a further validationof the discovery probability model captured by (10).

5 EXPERIMENTAL EVALUATION

In order to assess the design choices made in OntoMobil,specifically the use of replication and gossiping, we devisedDiscBCast, a broadcast-based protocol used for the experi-mental evaluation. DiscBCast does not use replication orgossiping but instead broadcasts a discovery query to allconnected nodes. To improve the performance of DiscBCast,we used an established broadcast optimization technique,namely, BCAST, as described in [24]. The aim of DiscBCastis similar to OntoMobil: to discover semantic content inMANETs, assuming multiple heterogeneous ontologies. Tofacilitate concept matching, a random subset of conceptsfrom the query’s source node are always piggybacked withevery query. DiscBCast was also implemented in ns2 [25],and simulations between the two protocols were conductedusing similar parameters.

The experimental evaluation compared OntoMobilagainst DiscBCast in terms of the ontology matchinglatency, the overhead a node sustains due to the distributed


Fig. 4. Comparative results for the size of the concept view between the

ns2 simulation and the analysis for a range of the characteristic

parameters.

Fig. 5. Comparison of the discovery ratio between the ns2 simulation

and the analytical results predicted by (10).


matching approach and the network overhead. The latencyreveals the required time before a semantic query discoversany potential matches for its concepts. It is measured as theratio between the matching associations established whenthe query executes versus all potential matches. The nodeoverhead demonstrates the impact of matching in eachnode. It is a measure of the number of concept comparisonsper second. Finally, the network overhead illustrates thecost of each protocol and is measured in the number ofmessage receptions per second.

5.1 Experimental Setup

Simulations in ns2 were conducted with network sizes of30 and 60 nodes. Experiments last for 500 seconds each,with each experiment repeated five times. Results arealways averages of these five runs. The experimental setupuses the random trip mobility model [26] in a simulationarea that maintains a constant density. The area dimen-sions are 915 � 915 and 1,297 � 1,297 m corresponding tonetwork sizes of 30 and 60 nodes. All nodes act asparticipants, with each node maintaining an ontology of10 concepts.

Since DiscBCast does not feature epidemic disseminationor replication, the parameters of node fanout, age, and ttlare not used. Concept fanout is used with similar semanticsas in OntoMobil. It represents the number of concepts thatare piggybacked with every discovery request.

The fundamental trade-off faced in DiscBCast whensetting parameters is the network overhead vis-a-vismatching ratio. Excluding the option to vary the conceptfanout parameter, the only two parameters that influencethis trade-off is the discovery time-out and the number ofnodes querying the network. OntoMobil uses two differenttime-outs: one for the gossip and one for the discoveryprotocol. The gossip time-out is selected randomlybetween the range of U½0:68; 1:11� seconds, while thediscovery time-out expires every 1 second. It is clear thata high rate of discovery queries (low time-out) forDiscBCast will accelerate convergence of the multipleontologies but will also generate excessive traffic. ForDiscBCast, we have set the discovery time-out to the value

of OntoMobil’s gossip time-out and also used a compensa-tion factor that gives each node a probability of 0.7 ofactually executing the discovery query, reducing some-what the broadcast storm problem [27].

The evaluation of the ontology matching latency usedsynthetic data. A set of random and unique concepts weregenerated that populated the ontology view of each node.To establish a notion of similarity, a number of conceptswere randomly selected from the ontology view of eachnode, and a mapping is created between each sourceconcept and one target concept selected from the remaindernodes. The mapping enables the appearance of transitivemapping relations, meaning that a concept maybe be relatedto more than one concept in other ontologies. Note that aconcept does not contain a reference to all other concepts inthe transitive mapping. Instead, a concept maintains amapping to only a subset of the concepts that compose thetransitive relation. Constructing the mapping relation insuch a way is more realistic in a distributed environment,since each ontology is not required to have completeknowledge of all other ontologies. On the other hand, suchan approach induces additional latency in order to facilitatecomplete semantic agreement.

5.2 Results

Fig. 6 depicts the matching latency with respect to protocolrounds. Latency is measured by having each node issue adiscovery query every 1 second for OntoMobil and with70 percent probability once every U ½0:68; 1:11� for each nodein the DiscBCast simulation. The query is composed of asingle concept, and we record the number of establishedmatches in each visited node. As the number of potentialmatches is known for each concept, it is straightforward toassess the matching ratio. The main factor that influencesthe number of rounds required before complete semanticmatching is established, i.e., when all concepts areaugmented with all potential matches, is the network size.The DiscBCast protocol performs marginally better, whichis expected because of the higher rate of concept matchingfacilitated by the broadcast nature of the protocol. As nodesincrease, we also observe an increase in the matchinglatency. This is attributed to the fact that the matchinglatency increases linearly with the number of total concepts,giving an idealized mean number of concept comparisonsper node of N� jVOj2, where jVOj is a constant factor. Thegraph shows that combining a gossip unicast approach withreplication can achieve comparable results with a broadcastprotocol but, as shown in the next paragraphs, withsignificantly reduced node and network overhead.

Table 1 displays the overhead incurred by the matchingprocess between OntoMobil and DiscBCast. We measure


Fig. 6. Comparison of matching latency between OntoMobil and

DiscBCast.

TABLE 1Node Overhead Measured in Concept

Comparisons per Node per Second


the overhead using the number of concept comparisonsper node per second. It is interesting to observe that thereis no significant difference as the number of nodes double.In the case of OntoMobil, this is facilitated by the gossipprotocol, which guarantees a constant number of messagereceptions per node, regardless of the network size. In thecase of DiscBCast, the relatively constant number ofconcept comparisons is due to 1) the increase in thesimulation area, guaranteeing a constant node density andhence reducing packet collisions, and 2) the scalability ofthe BCAST optimization, which prevents network conges-tion by eliminating redundant transmissions. As alreadymentioned in Section 4, the determining factor thatinfluences the processing overhead in each node is thesize of the concept view. As expected, by increasing the Ttparameter from two to three results in doubling the size ofthe concept view, which accounts for the higher number ofconcept comparisons between the two variations of theOntoMobil protocol.

Table 2 shows the network overhead incurred by the twoprotocols as measured by the number of protocol packetsreceived per node per second. Message reception canprovide us with an accurate metric of network saturation,especially when the comparison involves a broadcast and aunicast protocol. We have excluded routing and MACcontrol packets from this measurement to remove anyoverhead associated with OLSR and BCAST. As expected,the broadcast nature of DiscBCast incurs a higher toll on thenetwork, leading to increased packet transmission failuresand bandwidth saturation.

6 CONCLUSION

This paper presented OntoMobil, a model for the discoveryof semantic content that caters for MANETs and semanticdecentralization. The model relies on a randomized conceptdissemination mechanism to build a semantic overlaytopology. Such an overlay facilitates eventual semanticagreement between heterogeneous ontologies and providesa substrate for the discovery of content. We have provided astochastic analysis of the proposed model and verified theaccuracy against simulation results. We have also comparedthe two main features of OntoMobil, replication and unicastgossiping, against a broadcast protocol in order to shedlight on the trade-offs involved. OntoMobil performsfavorably in terms of discoverability with significantlyreduced overhead.

At the moment, the stochastic analysis (Section 4.2)does not incorporate a failure model. This has an impactin the accuracy of the stochastic model in scenarios withhigh message failure rate (> 15 percent). Integrating the

probability of lost messages will enhance the predictive

power of the model, increasing the accuracy of the

expected concept view size.Another aspect of OntoMobil that requires future

attention is the integration of a leave protocol that uses asoft-state mechanism, rather than relying on an explicitdisconnection procedure (Section 3.3.2). One approach is torely on the randomized nature of transmission and use theproperty that with some probability each node willeventually contact every other node, unless a failure occurs.

APPENDIX

NOTATION

1. VCi is the concept view for node i.

2. VOi is the ontology view for node i with constant

size G.3. VN

i is the node view for node i with constant sizeParameterNodeView.

4. Tt is the network-wide constant for the ttl parameter.5. Ta is the network-wide constant for the age

parameter.6. Fn is the network-wide constant for the node fanout.7. Fc is the network-wide constant for the concept

fanout.8. Gtotal is the total number of concepts in the network.9. G is the network-wide constant for the size of the

ontology view.10. Gc is the number of concepts in the network

excluding G concepts from the ontology view of asingle node.

11. �ðiÞ is a function returning the total number ofconcepts in both ontology and concept views,given i concepts in the concept view of a node.

12. �ðiÞ is a function returning the number of conceptsthat have a ttl value different to one and do notexist in either the receiver’s ontology or conceptviews, given i concepts in the concept view.

13. N is the number of nodes in the network.14. ParameterNodeView is a constant representing

the number of node ids in a node view,ParameterNodeView ¼ jVNj.

15. ParameterRequestTTL is the network-wide constantrepresenting the number of hops before a semanticquery expires.

ACKNOWLEDGMENTS

The authors would like to acknowledge the support of

Lero: The Irish Software Engineering Research Centre,

funded by Science Foundation Ireland. They would also

like to thank Vinny Cahill and Stefan Weber for their

valuable comments and support in this work.

REFERENCES

[1] D. Martin, M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S.McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin,N. Srinivasan, and K. Sycara, OWL-S: Semantic Markup for WebServices, W3C, http://www.w3.org/Submission/OWL-S, Nov.2004.


TABLE 2Network Overhead Measured as the Average Number

of Received Messages per Node per Second


[2] K. Aberer, P. Cudre-Mauroux, A.M. Ouksel, T. Catarci, M.-S.Hacid, A. Illarramendi, V. Kashyap, M. Mecella, E. Mena, E.J.Neuhold, O. De Troyer, T. Risse, M. Scannapieco, F. Saltor, L. DeSantis, S. Spaccapietra, S. Staab, and R. Studer, “EmergentSemantics Principles and Issues,” Proc. Ninth Int’l Conf. DatabaseSystems for Advanced Applications (DASFAA ’04), pp. 25-38, 2004.

[3] K. Aberer, P. Cudre-Mauroux, and M. Hauswirth, “The ChattyWeb: Emergent Semantics through Gossiping,” Proc. 12th Int’lConf. World Wide Web (WWW ’03), pp. 197-206, 2003.

[4] A. Doan, J. Madhavan, P. Domingos, and A. Halevy, “Learning toMap between Ontologies on the Semantic Web,” Proc. 11th Int’lConf. World Wide Web (WWW ’02), pp. 662-673, 2002.

[5] S. Castano, A. Ferrara, and S. Montanelli, “H-MATCH: AnAlgorithm for Dynamically Matching Ontologies in Peer-BasedSystems,” Proc. First Int’l Workshop Semantic Web and Databases(SWDB ’03), pp. 231-250, 2003.

[6] K. Sycara, S. Widoff, M. Klusch, and J. Lu, “LARKS: DynamicMatchmaking among Heterogeneous Software Agents in Cyber-space,” Autonomous Agents and Multi-Agent Systems, vol. 5, no. 2,pp. 173-203, June 2002.

[7] K. Arnold, R. Scheifler, J. Waldo, B. O’Sullivan, and A. Wollrath,Jini Specification. Addison Wesley Longman, 1999.

[8] E. Guttman, C. Perkins, J. Veizades, and M. Day, Service LocationProtocol, Version 2. IETF, 1999.

[9] U.C. Kozat and L. Tassiulas, “Service Discovery in Mobile Ad HocNetworks: An Overall Perspective on Architectural Choices andNetwork Layer Support Issues,” Ad Hoc Networks, vol. 2, no. 1,pp. 23-44, 2004.

[10] F. Sailhan and V. Issarny, “Scalable Service Discovery forMANET,” Proc. Third IEEE Int’l Conf. Pervasive Computing andComm. (PerCom ’05), pp. 235-244, 2005.

[11] V. Lenders, M. May, and B. Plattner, “Service Discovery in MobileAd Hoc Networks: A Field Theoretic Approach,” Proc. Int’l Symp.World of Wireless, Mobile and Multimedia Networks (WoWMoM ’05),June 2005.

[12] W. Nejdl, B. Wolf, C. Qu, S. Decker, M. Sintek, A. Naeve,M. Nilsson, M. Palmer, and T. Risch, “EDUTELLA: A P2PNetworking Infrastructure Based on RDF,” Proc. 11th Int’l Conf.World Wide Web (WWW ’02), pp. 604-615, 2002.

[13] P. Haase, R. Siebes, and F. van Harmelen, “Peer Selection in Peer-to-Peer Networks with Semantic Topologies,” Proc. First Int’l Conf.Semantics in a Networked World (ICNSW’ 04), M. Bouzeghoub, ed.,pp. 108-125, June 2004.

[14] A. Loser, S. Staab, and C. Tempich, “Semantic Social OverlayNetworks,” IEEE J. Selected Areas in Comm., vol. 25, no. 1, pp. 5-14,2007.

[15] B.F. Cooper and H. Garcia-Molina, “SIL: A Model for AnalyzingScalable Peer-to-Peer Search Networks,” Computer Networks,vol. 50, no. 13, pp. 2380-2400, 2006.

[16] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, “PlanetP:Using Gossiping to Build Content Addressable Peer-to-PeerInformation Sharing Communities,” Proc. 12th IEEE Int’l Symp.High Performance Distributed Computing, pp. 236-246, 2003.

[17] D. Chakraborty, A. Joshi, T. Finin, and Y. Yesha, “GSD: A NovelGroup Based Service Discovery Protocol for MANETs,” Proc.Fourth IEEE Conf. Mobile and Wireless Comm. Networks (MWCN ’02),Sept. 2002.

[18] P.T. Eugster, R. Guerraoui, S.B. Handurukande, P. Kouznetsov,and A.M. Kermarrec, “Lightweight Probabilistic Broadcast,” ACMTrans. Computer Systems, vol. 21, no. 4, pp. 341-374, 2003.

[19] J. Luo, P.Th. Eugster, and J.-P. Hubaux, “PILOT: ProbabilistIcLightweight grOup communication sysTem for Mobile Ad HocNetworks,” IEEE Trans. Mobile Computing, vol. 3, no. 2,pp. 164-179, 2004.

[20] A.J. Ganesh, A.-M. Kermarrec, and L. Massoulie, “SCAMP: Peer-to-Peer Lightweight Membership Service for Large-Scale GroupCommunication,” Proc. Third Int’l Workshop Networked GroupComm. (NGC ’01), Nov. 2001.

[21] A. Nedos, K. Singh, R. Cunningham, and S. Clarke, “A GossipProtocol to Support Service Discovery with HeterogeneousOntologies in MANETs,” Proc. Third IEEE Int’l Conf. Wireless andMobile Computing, Networking and Comm. (WiMob ’07), Oct. 2007.

[22] A. Nedos, K. Singh, and S. Clarke, “Mobile Ad Hoc Services:Semantic Service Discovery in Mobile Ad Hoc Networks,” Proc.Fourth Int’l Conf. Service-Oriented Computing (ICSOC ’06), A. Danand W. Lamersdorf, eds., pp. 90-103, https://www.cs.tcd.ie/publications/tech-reports/reports.07/TCD-CS-2007-21.pdf, 2006.

[23] R.D. Yates and D.J. Goodman, Probability and Stochastic Processes: AFriendly Introduction for Electrical and Computer Engineers. JohnWiley & Sons, 1999.

[24] T. Kunz, “Multicasting in Mobile Ad-Hoc Networks: AchievingHigh Packet Delivery Ratios,” Proc. Conf. Centre for AdvancedStudies on Collaborative Research (CASCON ’03), pp. 156-170, 2003.

[25] S. McCanne and S. Floyd, ns, Network Simulator, 1997.[26] S. PalChaudhuri, J.-Y.L. Boudec, and M. Vojnovic, “Perfect

Simulations for Random Trip Mobility Models,” Proc. 38th Ann.Symp. Simulation (ANSS ’05), pp. 72-79, 2005.

[27] Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, and J.-P. Sheu, “The BroadcastStorm Problem in a Mobile Ad Hoc Network,” Wireless Networks,vol. 8, nos. 2/3, pp. 153-167, 2002.

Andronikos Nedos received the MSc and PhDdegrees from Trinity College Dublin. He is aresearch fellow in the Distributed SystemsGroup, Department of Computer Science,School of Computer Science and Statistics,Trinity College Dublin. His main research inter-ests include MANET service discovery protocols,group communication, randomized algorithms,and distributed middleware.

Kulpreet Singh studied civil engineering in theHimalayas and studied group communicationsystems for mobile ad hoc networks during hisPhD at Trinity College Dublin. He is currentlywith the Distributed Systems Group, Departmentof Computer Science, School of ComputerScience and Statistics, Trinity College Dublin.His interests include new networking paradigmsfor MANETs that question and challenge theassumptions made for the Internet. He has lived

on three different continents up till now.

Raymond Cunningham received the BA de-gree in mathematics and MSc and PhD degreesin computer science from Trinity College Dublin.He is a research fellow in the DistributedSystems Group, Department of ComputerScience, School of Computer Science andStatistics, Trinity College Dublin. His researchinterests cover the area of mobile distributedsystems, distributed systems optimization tech-niques and adaptive middleware. He has pub-

lished a number of peer-reviewed papers in the areas of ad hocnetworking, peer-to-peer, and distributed optimization.

Siobhan Clarke is a senior lecturer and fellow inthe Department of Computer Science, School ofComputer Science and Statistics, Trinity CollegeDublin, where she leads the Distributed SystemsGroup. Her research interests are programmingmodels and middleware for mobile and em-bedded systems.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.



Documents

544 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL