Università di Roma "La Sapienza" – Dipartimento di Informatica e Sistemistica
Middleware Laboratory (MIDLAB)

A Scalable P2P Architecture for Topic-based Event Dissemination
Roberto Baldoni¹, Roberto Beraldi¹, Vivien Quéma², Leonardo Querzoni¹, Sara Tucci Piergiovanni¹
1. Università degli Studi di Roma "La Sapienza"
2. INRIA Rhône-Alpes
19-3-2007
Index
■ The publish/subscribe communication paradigm
■ Event routing on a large scale
■ Traffic confinement
■ Current solutions
■ Evaluation of outer-cluster routing in TERA
The publish/subscribe communication paradigm
■ From provider-centric communication (client/server) to document-centric communication (publish/subscribe):
  ■ Publishers produce data in the form of events.
  ■ Subscribers declare interest in published data through subscriptions.
■ Each subscription is a filter on the set of published events.
■ An Event Notification Service (ENS) notifies each subscriber of every published event that matches at least one of its subscriptions.
■ Interaction between publishers and subscribers is decoupled in space, time and flow [Eugster et al., ACM Computing Surveys, 03].
[Figure: publishers call publish(e) on the Event Notification Service; subscribers call subscribe(s)/unsubscribe(s) and receive notify(e).]
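As a sketch of these four primitives, here is a minimal, single-process topic-based ENS in Python (the class and method names are ours and purely illustrative; a real ENS such as TERA distributes this logic over many brokers):

```python
from collections import defaultdict

class EventNotificationService:
    """Minimal in-process topic-based ENS (illustrative sketch)."""

    def __init__(self):
        # topic -> set of callbacks registered by subscribers
        self.subscriptions = defaultdict(set)

    def subscribe(self, topic, callback):
        self.subscriptions[topic].add(callback)

    def unsubscribe(self, topic, callback):
        self.subscriptions[topic].discard(callback)

    def publish(self, topic, event):
        # notify every subscriber whose subscription matches the topic
        for notify in self.subscriptions[topic]:
            notify(event)

ens = EventNotificationService()
received = []
ens.subscribe("t", received.append)   # subscribe(s)
ens.publish("t", "e1")                # publish(e) -> notify(e)
ens.publish("u", "e2")                # no subscriber for topic "u"
print(received)                       # ['e1']
```

Note how the publisher never names its subscribers: the decoupling in space mentioned above is exactly this indirection through the ENS.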
Event selection models
■ Topic-based event selection:
  ■ Each event published in the system is tagged with a topic that completely characterizes its content.
  ■ Each subscription contains a topic the subscriber is interested in.
■ Other, more complex models: hierarchical topic-based, content-based, type-based, etc.
■ Topic-based publish/subscribe fits the needs of many common applications.
[Figure: an event tagged with topic t is delivered via notify() to a subscriber of t.]
Event routing in distributed ENSs
■ In a peer-to-peer environment, peers play both the role of publishers/subscribers and that of event brokers.
■ These publish/subscribe systems are usually implemented on top of overlay networks that provide:
  ■ connectivity among participants;
  ■ an efficient and reliable substrate for information diffusion.
■ A trivial solution to the problem of event dissemination:
  ■ each event is broadcast in the network;
  ■ subscription-based filtering is performed locally;
  ■ this usually implies a great waste of resources (on the network and on the nodes).
■ The semantics of the publish/subscribe paradigm can be leveraged to confine the diffusion of each event to the set of matched subscribers, without affecting the whole network (traffic confinement).
Traffic confinement
■ Traffic confinement can be realized by solving three problems:
  ■ Interest clustering: subscribers sharing similar interests should be arranged in the same cluster; ideally, given an event, all and only the subscribers interested in that event should be grouped in a single cluster.
  ■ Outer-cluster routing: events can be published anywhere in the system, so we need a mechanism able to bring each event from the node where it is published to at least one interested subscriber.
  ■ Inner-cluster dissemination: once a subscriber receives an event, it can simply broadcast it in the cluster it is part of.
Current solutions
■ Scribe [Castro et al., IEEE Journal on Selected Areas in Communications v.20 n.8, 2002]
  ■ Topic-based publish/subscribe implemented on top of DHTs.
  ■ For each topic, a single node is responsible for acting as a rendez-vous point between published events and issued subscriptions.
  ■ Problems:
    ■ single points of failure;
    ■ hot spots;
    ■ partial traffic confinement.
Current solutions
■ CAN [Ratnasamy et al., Lecture Notes in Computer Science v.2233, 2001]
  ■ Based on techniques similar to Scribe's.
  ■ Employs clustering: for each topic, a dedicated overlay is built and maintained.
  ■ Problems:
    ■ single points of failure;
    ■ hot spots;
    ■ rendez-vous nodes are forced to subscribe to the topic.
Current solutions
■ Data Aware Multicast [Baehni et al., International Conference on Dependable Systems and Networks, 2004]
  ■ Hierarchical topic-based publish/subscribe.
  ■ Each topic is mapped to an overlay network.
  ■ Overlays are maintained and interconnected through probabilistic algorithms.
  ■ Problem:
    ■ publishers are forced to subscribe to topics.
Current solutions
■ Sub-2-Sub [Voulgaris et al., International Workshop on Peer-to-Peer Systems, 2006]
  ■ Content-based publish/subscribe.
  ■ Complex three-level infrastructure.
  ■ Employs clustering: brokers with similar interests are clustered in the same overlay.
  ■ Similarity is calculated by checking intersections among subscriptions.
  ■ Problems:
    ■ depending on the subscription distribution, a huge number of distinct overlays must be maintained;
    ■ the number of overlay networks a single node participates in is not proportional to the number of subscriptions it stores.
A two-layer infrastructure for event diffusion
■ A two-layer infrastructure:
  ■ All clients are connected by a single overlay network at the lower layer (general overlay).
  ■ Various overlay network instances at the upper layer connect clients subscribed to the same topics (topic overlays).
■ Event diffusion:
  ■ The event is routed in the general overlay toward one of the nodes subscribed to the target topic.
  ■ This node acts as an access point for the event, which is then diffused in the correct topic overlay.
[Figure 1(a), system overview: an event routed in the general overlay reaches a node used as access point, which diffuses it in its topic overlay.]
[Figure 1(b), node architecture: the TERA layer sits between the applications (subscribe, unsubscribe, publish, notify) and the network; it comprises Event Management, Subscription Management, Broadcast, Partition Merging, Access Point Lookup, Peer Sampling and Size Estimation components, on top of an Overlay Management Protocol.]
Figure 1: The TERA publish/subscribe system.
2 An Overview of TERA
TERA is a topic-based publish/subscribe system designed to offer an event diffusion service for very
large scale peer-to-peer systems. Each published event is “tagged” with a topic and is delivered to
all the subscribers that expressed their interest in the corresponding topic by issuing a subscription
for it. The set of available topics is not fixed, nor predefined: applications using TERA can
dynamically create or delete them.
2.1 Architecture
Nodes participating in TERA are organized in a two-layer infrastructure (see Figure 1(a)). At the lower layer, a global overlay network connects all nodes, while at the upper layer various topic overlay networks connect subsets of all the nodes; each topic overlay contains the nodes subscribed to the same topic. All these overlay networks are separate and are maintained through an overlay management protocol.
Subscription management and event diffusion in TERA are based on two simple ideas: nodes
Outer-cluster routing
■ Event routing in the general overlay is realized through a random walk.
■ The walk stops at the first broker that knows an access point for the target topic.
[Figure: a random walk over brokers B1–B6; each broker holds an access point table with entries ⟨topic, AP⟩ (e.g. ⟨t, B1⟩, ⟨a, B5⟩, ⟨x, B1⟩). The walk stops at the broker whose table contains an access point for topic t, and the event is forwarded to the topic overlay.]
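The walk can be sketched as follows (the `Broker` class with its `apt` dictionary and `neighbors` list is our illustrative model of a node's state, not TERA's actual data structures):

```python
import random

class Broker:
    def __init__(self, apt):
        self.apt = apt          # access point table: topic -> node id
        self.neighbors = []     # view on the general overlay

def access_point_lookup(start, topic, k):
    """Random walk of lifetime k: stop at the first broker whose
    access point table contains an entry for the target topic."""
    current = start
    for _ in range(k):
        if topic in current.apt:
            return current.apt[topic]   # forward the event to this AP
        current = random.choice(current.neighbors)
    return None                         # lifetime exhausted: lookup failed

# tiny example: only B1 knows an access point for topic "t"
b1 = Broker({"t": "B1"})
b2 = Broker({"a": "B5"})
b1.neighbors = [b2]
b2.neighbors = [b1]
print(access_point_lookup(b2, "t", 5))   # "B1"
```

A `None` result corresponds to the case, discussed later, where the node discards the event because the lookup returned no identifier.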
OMPs: Newscast, Cyclon, etc.
Architecture
[Figure 2 shows the Overlay Management Protocol maintaining the general overlay (peer sampling) and one overlay per topic (peer sampling, size estimation); the Event Management and Subscription Management components, backed by the Access Point Table and the Subscription Table; the Access Point Lookup component, which issues random walks; and the Inner-Cluster Dissemination and Partition Merging services, driven by subscription advertisements and view exchanges.]
Figure 2: A detailed view of the architecture of TERA.
as for notifying subscribers. An event dissemination starts as soon as an application publishes some data in a topic. It is done in two steps: the event is first routed to a node subscribed to the topic (this node acts as an access point for it); then, the access point diffuses the event in the overlay associated with the topic. The first step is realized through a lookup executed on the Access Point Lookup component: if the lookup returns an empty list of node identifiers, the node discards the event.
When a node subscribed to the topic receives an event for which it must act as an access point, it uses the broadcast primitive provided by the Inner-Cluster Dissemination service to forward the event to all nodes belonging to the corresponding topic overlay. When a node subscribed to the topic receives a broadcast event, it notifies the application.
Access Point Lookup.
The Access Point Lookup component plays a central role in TERA's architecture, as it is used by both the Event Management and Subscription Management components to obtain lists of access point identifiers for specific topics. Its functioning is based on a local data structure, called Access Point Table (APT), and on a distributed search algorithm based on random walks.
Each APT is a cache containing a limited number of entries, each of the form <t, n>, where t is a topic and n the identifier of a node that can act as an access point for t. APTs are continuously updated following a simple strategy: each time a node receives a subscription advertisement for topic t from a node n, it substitutes the access point identifier for t if an entry <t, n'> exists in the APT; otherwise, it adds a new entry <t, n> with probability 1/Pt, where Pt is the popularity of topic t estimated by n and attached to the subscription advertisement. When an APT exceeds a predefined size, randomly chosen entries are removed.
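The update strategy can be sketched as follows (a simplified, non-distributed model; `update_apt` and its arguments are our names, and the popularity estimate Pt is taken as an input rather than estimated by the advertiser):

```python
import random

def update_apt(apt, max_size, topic, node, popularity):
    """Process a subscription advertisement <topic, node> carrying the
    advertiser's popularity estimate Pt for the topic."""
    if topic in apt:
        apt[topic] = node                     # refresh the access point for t
    elif random.random() < 1.0 / popularity:
        apt[topic] = node                     # insert with probability 1/Pt
        if len(apt) > max_size:
            # evict a uniformly random entry when the cache overflows
            del apt[random.choice(list(apt))]

apt = {"a": "B5", "f": "B6"}
update_apt(apt, max_size=10, topic="a", node="B3", popularity=4.0)
print(apt["a"])   # "B3": the existing entry is refreshed
```

The 1/Pt insertion probability is what counteracts popular topics flooding every APT, producing the uniform topic distribution evaluated below.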
As a consequence of this update strategy, APTs have the following properties:
1. APT entries tend to contain non-stale access points;
2. inactive topics (i.e. topics that are no longer subscribed by any node) tend to disappear from APTs;
3. each access point is a uniform random sample of the population of nodes subscribed to that topic;
4. the content of each APT is a uniform random sample of the set of active topics (i.e. topics subscribed by at least one node);
5. the size of each APT is limited.
The first property is a consequence of the way new entries are added to APTs. Suppose, in fact, that there is only one topic t in the system, subscribed by two nodes, na and nb; suppose, moreover, that, at a certain point in time, nb unsubscribes t. Starting from that moment, only na will advertise t; therefore, nodes containing an entry <t, nb> will eventually substitute it with the entry <t, na>, as the uniformity of node samples provided by the peer sampling service guarantees that na will eventually advertise t to the whole system population. The second property comes from the fact that inactive topics are no longer advertised. They are, thus, eventually replaced by active topics in APTs (assuming that the set of active topics is larger than the maximum APT size). The third property is a consequence of the fact that subscription advertisements are sent to nodes returned by the peer sampling service, which provides uniform random samples, and that each node advertises its subscriptions with
Results: topic distribution in APTs
■ We want every topic to appear with the same probability in every APT, regardless of its popularity.
4.2.1 Topic distribution in APTs
We start by presenting an experiment showing that the method used in TERA to update APT content ensures a uniform distribution of topics in every APT. This is a fundamental property for APTs, as it allows TERA to use their content as a uniform random sample of the active topic population and to build the access point lookup mechanism on it. We ran tests over a system with 10^4 nodes, each advertising its subscriptions every 5 cycles to 5 neighbors out of 20 (the overlay management protocol view size). The APT size was limited to 10 entries. We issued 5000 subscriptions, distributed in various ways over 1000 distinct topics, and we measured, for each topic, the number of APTs containing an entry for it. The expected outcome of these tests is a constant value for this measure, regardless of the initial topic popularity distribution.
Figure 3(a) shows the results for an initial uniform distribution of topic popularity. The X axis represents the topic population (each topic is mapped to a number). Each black dot represents the number of times a specific topic appears in APTs, while the grey dot represents its popularity. The plot shows that each topic is present, on average, in the same number of APTs, with a very small error that is randomly distributed around the mean. This confirms that the topic distribution in APTs can be considered uniform.
Figures 3(b) and 3(c) show the results for an initial zipf distribution of topic popularity. The two graphs report the results for differently skewed popularity distributions (distribution parameter a = 0.7 and a = 2.0). As these graphs show, TERA is always able to balance APT updates and delivers an almost uniform distribution. Even in an extreme case (a = 2.0), the APT update mechanism is able to balance the updates coming from the small number of active topics (in this scenario, only 79 topics share the whole 5000 subscriptions), maintaining their presence in APTs around the same average value with a small standard deviation (always below 5%). In the next evaluations, we only report results for the zipf popularity distribution with a = 0.7, as results for other values of a did not exhibit significant differences.
4.2.2 Access Point Lookup
In this section, we evaluate the probability for the access point lookup mechanism to successfully return a node identifier for a lookup operation (in the case such a node exists). We denote by K the lifetime of the random walk (the maximum number of visited nodes), by |APT| the size of APT tables, and by |T| the number of topics^8. The probability p to find an access point for a specific topic in an APT is p = |APT|/|T|. Assuming that every APT contains the maximum allowed number of entries, the probability that an access point cannot be found within K steps is Pr{fail} = (1 − p)^K. Thus, the probability to find the access point visiting at most K nodes is Pr{success} = 1 − (1 − p)^K = 1 − (1 − |APT|/|T|)^K. Therefore, to ensure with probability P that an access point for a given topic will be found, it is necessary that K or |APT| be such that:

^8 Thanks to the fact that APTs can be considered uniform random samples of the set of active topics, each node can estimate the value of |T| at runtime [16].
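The success probability above can be transcribed directly for a quick numeric check (the function name is ours):

```python
def lookup_success_prob(apt_size, num_topics, k):
    """Pr{success} = 1 - (1 - |APT|/|T|)^K, assuming full APTs holding
    uniform random samples of the active topics."""
    p = apt_size / num_topics
    return 1.0 - (1.0 - p) ** k

# e.g. |APT| = 100, |T| = 1000: each extra hop of the walk adds coverage
print(round(lookup_success_prob(100, 1000, 10), 3))   # 0.651
```

This matches the intuition that a walk of lifetime K effectively samples K independent APTs.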
[Figure 3 panels (number of presences in APTs vs. topics): (a) uniform popularity, Std.Dev. = 1.11; (b) zipf a = 0.7, Std.Dev. = 2.16; (c) zipf a = 2.0, Std.Dev. = 51.49.]
Figure 3: The plots show how topics are distributed among APTs (black dots) when the topic popularity distribution (grey dots) is (a) uniform and (b-c) skewed (zipf with parameter a).
Results: outer-cluster routing
■ What is the probability for an event to be correctly routed in the general overlay toward an access point?
■ It depends on:
  ■ the uniform randomness of topics contained in access point tables;
  ■ the access point table size;
  ■ the random walk lifetime.
K = ln(1 − P) / ln(1 − |APT|/|T|)    or    |APT| = |T| · (1 − (1 − P)^(1/K))

Note that, given K and P, |APT| depends linearly on |T|. In order to reduce the APT size, it would be necessary to increase the random walk length (i.e. to use a large value for K), negatively affecting the time it takes to find an access point. To mitigate this problem, it is advisable to launch r multiple concurrent random walks, each having a lifetime ⌈K/r⌉. Indeed, the fact that topics are uniformly distributed among APTs guarantees that launching multiple concurrent random walks does not impact the lookup success rate. In this way, access point lookup responsiveness is improved at the cost of a slightly larger overhead, due to the independence of each random walk's lifetime.
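A small helper (ours) that applies the sizing formula above, picking K for a target success probability P and splitting it over r concurrent walks:

```python
import math

def walk_lifetime(target_p, apt_size, num_topics):
    """K = ln(1 - P) / ln(1 - |APT|/|T|), rounded up to a whole number
    of hops."""
    return math.ceil(math.log(1 - target_p)
                     / math.log(1 - apt_size / num_topics))

def per_walk_lifetime(k, r):
    """Lifetime of each of r concurrent walks: ceil(K / r)."""
    return math.ceil(k / r)

k = walk_lifetime(0.99, 100, 1000)   # K needed for P = 0.99
print(k, per_walk_lifetime(k, 4))    # 44 11
```

With |APT| = 100 and |T| = 1000, a 99% success guarantee needs K = 44 visited nodes, or four concurrent walks of 11 hops each.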
We ran experiments to check that TERA's behavior is close to the one predicted by the analytical study. Tests were run on a system with 1000 nodes, each having a Cyclon view holding 20 nodes. At the beginning, 5000 subscriptions were issued, uniformly distributed over 1000 distinct topics. Lookups were started after 1000 cycles. Each lookup was conducted by starting four concurrent random walks (r = 4).

Figure 4(a) shows how the access point lookup success ratio changes when varying the lifetime of each random walk (K) for different values of |APT|. For each line, we plotted both simulation results (solid line) and values calculated using the analytical study (dashed line). The plot confirms that TERA's lookup mechanism is able to probabilistically guarantee that an access point for an active topic will be found with probability P. Note that this plot also shows that the actual memory size required by APTs is limited. Indeed, consider the biggest APT size plotted on the graph: 400 entries. Assuming that each entry in an APT is a string containing 256 characters, the memory size occupied by an APT containing 400 entries is about 100 kB.
4.2.3 Partition Merging
In this section, we analyze the probability for the partition merging mechanism to detect a very small overlay partition, and the time it takes for this to happen. Suppose that there is a topic represented by an overlay network partitioned in two clusters containing |G| and 1 nodes, respectively^9. Let us call n this single node. The probability p to detect the partition in a cycle can be expressed as p = 1 − (pa · pb), where pa is the probability that none of the nodes in G advertises its subscriptions to n, and pb is the probability that n does not advertise its subscriptions to any of the nodes in G.

Probability pa can be expressed as

pa = (1 − Pr{a node advertises to n})^|G|

Every node in G advertises its subscription to n only if n is contained in its view for the general overlay, and if n is one of the D nodes selected for the advertisement. Let us suppose, for the sake of simplicity, that D is equal to the view size. In this case, Pr{a node advertises to n} = |View|/(N − 1),

^9 Note that the case where a partition is constituted by a single node is the most difficult to solve, as the probability for nodes belonging to distinct partitions to meet is the lowest possible.
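The term derived above can be transcribed directly (the excerpt cuts off before pb is expressed, so this sketch stops at pa; the function name is ours, and D is assumed equal to the view size as in the text):

```python
def prob_no_advert_to_n(view_size, n_total, g_size):
    """pa = (1 - |View|/(N-1))^|G|: probability that, in one cycle,
    no node of the larger partition G advertises its subscriptions
    to the lone partitioned node n."""
    return (1.0 - view_size / (n_total - 1)) ** g_size

# pa shrinks quickly as the larger partition |G| grows
print(round(prob_no_advert_to_n(20, 1000, 64), 3))
```

The larger |G| is, the smaller pa becomes, which is why bigger partitions are detected (and merged) in fewer cycles, as Figure 4(b) confirms.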
[Figure 4 panels: (a) random walk success rate vs. random walk lifetime, for APT sizes 50, 100 and 400 (simulation and theory); (b) probability of merging a partitioned node vs. cycles, for |G| = 4, 16 and 64 (simulation and theory).]
Figure 4: (a) The plot shows how the success rate for access point lookups changes when varying the maximum APT size and the random walk lifetime. Solid lines represent results from the simulator, while dashed lines plot values from the formula. (b) The plot shows how the probability to detect a topic overlay partition increases with time (cycles). Solid lines represent results from the simulator, while dashed lines plot values from the formula. The tests were run varying the number |G| of nodes subscribed to the topic.
Results: load distribution
■ The load imposed on nodes by our system is fairly distributed:
  ■ no hot spots or single points of failure;
  ■ "the more you ask, the more you pay".
[Figure 5 panels (node stress distribution, percentage of messages handled per node): (a) uniform popularity — general overlay Std.Dev. = 4.04E-06, global Std.Dev. = 1.23E-05; (b) zipf popularity — general overlay Std.Dev. = 4.13E-06, global Std.Dev. = 1.23E-05.]
Figure 5: The plots show how the load generated by TERA is distributed among nodes when the distribution of topic popularity is either uniform (a) or zipf (b). For both popularities, the figure shows in the left graph the load distribution in the general overlay and, in the right graph, the global load distribution (black points), together with the subscription distribution on nodes (grey points).
Figure 5(a) shows the results for a test with uniform topic popularity, while Figure 5(b) shows the same results for an initial zipf distribution with parameter a = 0.7. The pictures on the left show how load is distributed in the general overlay. As shown by the graphs, TERA is able to uniformly distribute load among nodes, avoiding the appearance of hot spots. This result is obtained regardless of the distribution of topic popularities. The pictures on the right show the global load experienced by nodes; in these graphs, nodes on the X axis are ordered by decreasing local subscription count (i.e. points on the left refer to nodes subscribed to more topics), in order to show how the global load is affected by the number of subscriptions maintained at each node. The number of subscriptions per node is also plotted with grey dots. The graphs show how the load distribution closely follows the distribution of subscriptions on nodes, actually implementing the pragmatic rule "the more you ask, the more you pay", thus fairly distributing the load among participants.
Results: scalability
■ Experiments show how our system scales with respect to:
  ■ number of subscriptions;
  ■ number of topics;
  ■ event publication rate;
  ■ number of nodes.
■ (The reference figure is given by a simple event flooding approach.)
[Figure 6 panels (average notification cost, messages per notification, event flooding vs. TERA): (a) varying subscriptions, with nodes = 10000, topics = 100, event rate = 1; (b) varying topics, with nodes = 10000, subscriptions = 10000, event rate = 1; (c) varying event publication rate, with nodes = 10000, topics = 100, subscriptions = 10000; (d) varying nodes, with subscriptions = 10000, topics = 100, event rate = 1.]
Figure 6: The plots show the average number of messages needed by TERA to notify an event when the number of subscriptions (a), the number of topics (b), the event publication rate (c), and the total number of nodes in the system (d) vary. For each figure, results from a simple event flooding algorithm are reported for comparison.
4.3.2 Message cost per notification
The traffic confinement strategy implemented by TERA induces some overhead. In order to assess the global impact of this overhead, we evaluated the average cost incurred by TERA to notify a single event to a subscriber, namely the total number of generated messages divided by the number of notifications. This cost includes both messages generated to diffuse the event, and messages generated for TERA's maintenance. To offer a reference figure, we also evaluated the cost incurred by a simple event-flooding-based approach^7 in the same settings.
Figure 6(a) reports the results when the total number of subscriptions varies between 10^2 and 10^6. The number of topics is fixed and equal to 100. The network considered in this test was constituted by 10^4 nodes, while the event publication rate was maintained constant at 1 event per topic in each cycle. For the evaluation to be meaningful, we required each topic to be subscribed by at least one subscriber; therefore, each curve is limited on its left end by the number of available topics. Moreover, we required each node to subscribe each topic at most once; therefore, each curve

^7 Each event is broadcast in an overlay network containing all participants. The overlay is built and maintained through the same overlay management protocol employed by TERA (Cyclon). The broadcast mechanism is also the same considered in TERA.
Experiments show how our system scales with respect to:
■Number of subscriptions.
■Number of topics.
■ Event publication rate.
■Number of nodes.
(reference figure is given by a simple event flooding approach)
Results: scalability
Average notification cost
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+01 1,E+03 1,E+05 1,E+07
Subscriptions
Messag
es p
er n
oti
ficati
on
Event flooding TERA
nodes: 10000
topics: 100
event rate: 1
(a)
Average notification cost
1,E+00
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05
Topics
Messag
es p
er n
oti
ficati
on
Event flooding TERA
nodes: 10000
subscriptions: 10000
event rate: 1
(b)
Average notification cost
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E-05 1,E-03 1,E-01 1,E+01 1,E+03 1,E+05
Event publication rate
Messag
es p
er n
oti
ficati
on
Event flooding TERA
nodes: 10000
topics: 100
subscriptions: 10000
(c)
Average notification cost
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+07
1,E+08
1,E+09
1,E+01 1,E+03 1,E+05 1,E+07 1,E+09
Nodes
Messag
es p
er n
oti
ficati
on
Event flooding TERA
subscriptions: 10000
topics: 100
event rate: 1
(d)
Figure 6: The plots show the average number of messages needed by TERA to notify an event
when the number of subscriptions (a), of topics (b), the event publication rate (c) and the total
number of nodes in the system (d) varies. For each figure, results from a simple event flooding
algorithm are reported for comparison.
4.3.2 Message cost per notification
The traffic confinement strategy implemented by TERA induces some overhead. In order to assess
the global impact of this overhead, we evaluated the average cost incurred by TERA to notify a
single event to a subscriber, namely the total number of generated messages divided by the number
of notifications. This cost includes both messages generated to diffuse the event, and messages
generated for TERA’s maintenance. To offer a reference figure, we also evaluated the cost incurred
by a simple event flooding-based approach7 in the same settings.
Figure 6(a) reports the results when the total number of subscriptions varies between 102 and
106. The number of topics is fixed and equal to 100. The network considered in this test was
constituted by 104 nodes, while the event publication rate was maintained constant at 1 event per
topic in each cycle. For the evaluation to be meaningful, we required each topic to be subscribed
by at least one subscriber; therefore, each curve is limited on its left end by the number of available
topics. Moreover, we required each node to subscribe to each topic at most once; therefore, each curve

7. Each event is broadcast in an overlay network containing all participants. The overlay is built and maintained
through the same overlay management protocol employed by TERA (Cyclon). The broadcast mechanism is also the
same as the one used in TERA.
Middleware Laboratory MIDLAB
Results: scalability

Experiments show how our system scales with respect to:
■ Number of subscriptions.
■ Number of topics.
■ Event publication rate.
■ Number of nodes.
(The reference figure is given by a simple event flooding approach.)
Conclusions and open problems

■ Current results (static setting):
  ■ The outer-cluster routing mechanism probabilistically guarantees that an event will be delivered to the correct cluster.
  ■ The impact of this mechanism on system scalability is small when a large number of events is published.
  ■ Moreover, it uniformly distributes the load on the system, avoiding the appearance of hot spots.
Conclusions and open problems

■ Open problems:
  ■ Evaluation in dynamic settings:
    ■ What happens when we introduce subscription/node churn?
    ■ An unsubscription can create a stale access point in some APTs.
    ■ We must evaluate the impact of this problem on outer-cluster routing reliability.
  ■ Can we provide (probabilistically) reliable inner-cluster dissemination?
  ■ Can we exploit node heterogeneity?
    ■ Currently we try to distribute load uniformly.
    ■ This is not necessarily desirable if participants have different characteristics and available resources.
Questions?
Results: traffic confinement

■ Topic overlays can become partitioned in specific cases.
■ The partition merging mechanism guarantees that a partitioned topic overlay will eventually be merged.
■ The more popular the topic, the less time it takes for the partition to be detected and repaired.
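As a rough intuition for why popular topics are repaired faster, partition detection can be modeled as a per-cycle Bernoulli trial. This is a simplifying assumption for illustration only (not the paper's analytical formula): if in each gossip cycle each of the |G| subscribers of a topic independently detects the foreign partition with a small probability q, the merge probability grows geometrically with the number of cycles and with |G|.

```python
# Illustrative model (assumption, not TERA's exact analysis): each of
# the G subscribers of a topic independently detects the foreign
# partition with probability q per cycle, so the whole partition
# survives one cycle undetected with probability (1 - q) ** G.

def merge_probability(G: int, cycles: int, q: float = 0.005) -> float:
    """P(partition detected and merged within `cycles` cycles)."""
    survive_one_cycle = (1.0 - q) ** G
    return 1.0 - survive_one_cycle ** cycles

# Larger topic populations are repaired sooner:
for G in (4, 16, 64):
    print(G, round(merge_probability(G, cycles=100), 3))
```

This reproduces the qualitative trend of Figure 4(b): the curves for larger |G| rise toward 1 in fewer cycles.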
[Figure 4 plots:
(a) "Random walk success rate": success rate (0.0 to 1.0) vs. random walk lifetime (0 to 10); curves for maximum APT sizes 10, 50, 100, and 400, each with simulated (Sim) and analytical (Theo) values.
(b) "Cycles needed to merge a partitioned node": probability of merge (0.0 to 1.0) vs. cycles (0 to 200); curves for |G| = 4, 16, and 64, each with simulated (Sim) and analytical (Theo) values.]
Figure 4: (a) The plot shows how the success rate for access point lookups changes when varying the
maximum APT size and the random walk lifetime. Solid lines represent results from the simulator,
while dashed lines plot values from the formula. (b) The plot shows how the probability to detect a
topic overlay partition increases with time (cycles). Solid lines represent results from the simulator,
while dashed lines plot values from the formula. The tests were run varying the number |G| of nodes
subscribed to the topic.
At the beginning, 5000 subscriptions were issued, uniformly distributed over 1000 distinct topics.
Lookups were started after 1000 cycles. Each lookup was conducted by starting four concurrent
random walks (r = 4).
Figure 4(a) shows how the access point lookup success ratio changes when varying the lifetime of
each random walk (K) for different values of |APT|. For each line, we plotted both simulation
results (solid line) and values calculated using the analytical study (dashed line). The plot confirms
that TERA's lookup mechanism probabilistically guarantees that an access point for an
active topic will be found with probability P.
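A back-of-the-envelope model makes the shape of these curves plausible. Assumption (not necessarily the paper's exact formula): if each random walk step independently lands on a node whose APT lists the target topic with probability p, then one walk of lifetime K misses with probability (1 - p)^K, and r independent walks all miss with (1 - p)^(rK).

```python
# Rough model of the lookup success rate in Figure 4(a), under the
# independence assumption stated above (illustrative, not TERA's
# published analysis).

def lookup_success(p: float, K: int, r: int = 4) -> float:
    """P(at least one of r walks of lifetime K finds an access point)."""
    return 1.0 - (1.0 - p) ** (r * K)

# E.g. with APT size 100 in a 10^4-node network, a uniform random step
# hits a useful node with p ~ 100 / 10**4 = 0.01:
print(round(lookup_success(0.01, K=10, r=4), 3))
```

Under this model, larger APTs raise p and longer lifetimes raise K, both of which push the success rate toward 1, matching the trends in the plot.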