15
Designing and orchestrating reproducible experiments on federated networking testbeds Thierry Rakotoarivelo , Guillaume Jourjon, Max Ott NICTA, 13 Garden Street, Eveleigh, NSW, Australia article info Article history: Received 1 July 2012 Received in revised form 6 December 2013 Accepted 27 December 2013 Available online 4 January 2014 Keywords: Federation Testbeds Experiment definition and orchestration abstract In addition to theoretical analysis and simulations, the evaluation of new networking tech- nologies in a real-life context and scale is critical to their global adoption and deployment. Federations of experimental platforms (aka testbeds) offer a controlled and cost-effective solution to perform such an evaluation. Most recent efforts in that area focused on building those facilities and providing experimenters with tools to allow the discovery and provi- sioning of their shared resources. Many challenges remain in order to support the complete experiment life cycle in a federated environment. We propose OMF-F, a framework which allows the definition of networking experiments and their execution over shared resources provided by different federated administrative domains. OMF-F provides a domain-specific language enabling rich event-based experi- ment descriptions. It defines a specific resource model and protocol, which together with its publish-subscribe messaging system allows automatic experiment orchestrations at a large scale. OMF-F further provides interfaces to operate with existing resource discovery and provisioning tools for federated testbeds. Our contributions in this paper are threefold. First we provide detailed descriptions of OMF-F’s design, its architecture, and its involved entities. Then, we present a quantitative evaluation of its underlying messaging and event-handling systems. Finally, we discuss two real examples of OMF-F deployed and used on federated domains to define and exe- cute experiments. Ó 2014 Elsevier B.V. All rights reserved. 1. Introduction Networking technologies have had a profound impact on societies and economies, with more than a quarter of the world now regularly using the Internet [1]. While these technologies are based on often solid theoretical frame- works, the intrinsic nature of large networks and the limi- tation in fully characterising an entire system require extensive simulation and ultimately evaluation at sufficient scale in real-world settings. Until recently the cost and effort associated with building these experimental facilities, also referred to as testbeds, limited their scale and the number of promising research ideas which could progress to the experimental validation phase. Unfortunately, the problem is not only one of scale, but also one of resource diversity and rapid technology pro- gress. Any testbed built with bleeding-edge equipment will quickly look dated and needs frequent refreshes. In addi- tion, most research groups focus on specific domains which translates to testbeds with a similarly narrow focus which in turn limits their potential audience beyond that group. There are essentially two approaches to addressing this problem. The first one is to pool the resources of a commu- nity and build a small number of large-scale shared 1389-1286/$ - see front matter Ó 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.bjp.2013.12.033 Corresponding author. Tel.: +61 293762199. E-mail addresses: [email protected] (T. Rakotoarivelo), [email protected] (G. Jourjon), [email protected] (M. Ott). Computer Networks 63 (2014) 173–187 Contents lists available at ScienceDirect Computer Networks journal homepage: www.elsevier.com/locate/comnet

Designing and orchestrating reproducible experiments on federated networking testbeds

  • Upload
    max

  • View
    217

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Designing and orchestrating reproducible experiments on federated networking testbeds

Computer Networks 63 (2014) 173–187

Contents lists available at ScienceDirect

Computer Networks

journal homepage: www.elsevier .com/ locate/comnet

Designing and orchestrating reproducible experimentson federated networking testbeds

1389-1286/$ - see front matter � 2014 Elsevier B.V. All rights reserved.http://dx.doi.org/10.1016/j.bjp.2013.12.033

⇑ Corresponding author. Tel.: +61 293762199.E-mail addresses: [email protected] (T. Rakotoarivelo),

[email protected] (G. Jourjon), [email protected] (M.Ott).

Thierry Rakotoarivelo ⇑, Guillaume Jourjon, Max OttNICTA, 13 Garden Street, Eveleigh, NSW, Australia

a r t i c l e i n f o

Article history:Received 1 July 2012Received in revised form 6 December 2013Accepted 27 December 2013Available online 4 January 2014

Keywords:FederationTestbedsExperiment definition and orchestration

a b s t r a c t

In addition to theoretical analysis and simulations, the evaluation of new networking tech-nologies in a real-life context and scale is critical to their global adoption and deployment.Federations of experimental platforms (aka testbeds) offer a controlled and cost-effectivesolution to perform such an evaluation. Most recent efforts in that area focused on buildingthose facilities and providing experimenters with tools to allow the discovery and provi-sioning of their shared resources. Many challenges remain in order to support the completeexperiment life cycle in a federated environment.

We propose OMF-F, a framework which allows the definition of networking experimentsand their execution over shared resources provided by different federated administrativedomains. OMF-F provides a domain-specific language enabling rich event-based experi-ment descriptions. It defines a specific resource model and protocol, which together withits publish-subscribe messaging system allows automatic experiment orchestrations at alarge scale. OMF-F further provides interfaces to operate with existing resource discoveryand provisioning tools for federated testbeds.

Our contributions in this paper are threefold. First we provide detailed descriptions ofOMF-F’s design, its architecture, and its involved entities. Then, we present a quantitativeevaluation of its underlying messaging and event-handling systems. Finally, we discusstwo real examples of OMF-F deployed and used on federated domains to define and exe-cute experiments.

� 2014 Elsevier B.V. All rights reserved.

1. Introduction

Networking technologies have had a profound impacton societies and economies, with more than a quarter ofthe world now regularly using the Internet [1]. While thesetechnologies are based on often solid theoretical frame-works, the intrinsic nature of large networks and the limi-tation in fully characterising an entire system requireextensive simulation and ultimately evaluation atsufficient scale in real-world settings. Until recently the

cost and effort associated with building these experimentalfacilities, also referred to as testbeds, limited their scaleand the number of promising research ideas which couldprogress to the experimental validation phase.

Unfortunately, the problem is not only one of scale, butalso one of resource diversity and rapid technology pro-gress. Any testbed built with bleeding-edge equipment willquickly look dated and needs frequent refreshes. In addi-tion, most research groups focus on specific domainswhich translates to testbeds with a similarly narrow focuswhich in turn limits their potential audience beyond thatgroup.

There are essentially two approaches to addressing thisproblem. The first one is to pool the resources of a commu-nity and build a small number of large-scale shared

Page 2: Designing and orchestrating reproducible experiments on federated networking testbeds

174 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

facilities. The second approach is to federate a large num-ber of independent testbeds to allow experimenters toassemble larger ‘virtual’ testbeds from many smaller, phys-ical ones. The two are actually complimentary as mostexperimental investigations progress through differentstages of scale and fidelity, where the majority of theexperiments can be served by the smaller testbeds.

In fact, this is what happened in the network researchcommunity over the last few years. Initiatives, such asGENI [2] in the US and FIRE1 in Europe (through projectslike OneLab2) have been developing federation frameworksto integrate existing testbeds and facilitate the addition ofnew ones with new capabilities. This has led to substantialincrease in the number and diversity of available resources.However, the number of users and research outcomes whichhave been enabled by these large facilities is still lagging.

We observe that most of the effort went into buildingthose facilities, and much less so towards tools to supportthe experimenter. This was often a deliberate decision, asthere is little consensus on how researchers want to usethese testbeds. Thus, architectures like SFA [3] adoptedthe ‘‘hour glass’’ principle of the Internet, and defined a‘‘small waist’’ on which many different user tools can bebuilt. While we agree with this principle, we believe thatits current application is too narrow and too much focusedon the provisioning of resources directly provided by theindividual testbeds.

We will argue in this paper for a small set of architec-tural principals and mechanisms to support the experi-ment life cycle in a federated environment. This includesa generic way to describe all resources required for anexperiment, the orchestration of the experiment over itslife time, as well as the provenance of all producedmeasurements.

Most of the existing federation solutions provide uni-fied mechanisms to discover and provision resources of-fered by the testbeds. However this does not readilyextend to resources provided by the experimenters. Anexample of such resource would be a software service,e.g. a novel content delivery network (CDN) service [4],which may be made available for other researchers toexperiment with, effectively running an experiment ontop of another one. Discovering, provisioning, and control-ling them in the same manner would greatly simplify thetools used to manage experiments. Although the existingmechanisms can be extended to support this, the absenceoff such use cases in almost all the relevant architecturediscussions is troubling.

Most researchers currently rely on custom scripts andad hoc tools to define and orchestrate an experiment overa direct secure connection (ssh) to the resources. However,these scripts and tools are often closely integrated with aspecific experiment realisation, and thus often do not allowits reproduction in a different context or cater for othertype of experiments.

The other element missing in most federationarchitectures is the support for instrumentation and

1 http://cordis.europa.eu/fp7/ict/fire/.2 http://onelab.eu.

measurements (I&M). This is often assumed to be part ofthe user tools, or segregated in a measurement plane. Wewill argue that I&M needs to be managed in the same man-ner as other resources. This raises issues, such as the har-monisation of resource naming and properties to ensuresound provenance tracking of produced measurement, aswell as authorisation, as collecting some measurementsmay require extended privileges (e.g. a spectrum analyserin a shared wireless testbed). Finally, maintaining long-running experiments requires ongoing measurements todetect and recover from resource failure.

To address these challenges we have developed OMF-F,a generic framework to allow the definition and orchestra-tion of experiments using shared (already provisioned) re-sources from different federated testbeds. OMF-F is basedon OMF [5], which was originally developed for single test-bed deployments. OMF-F makes three novel contributionscompared to existing experiment orchestration solutions.First it provides a domain-specific language based on anevent-based execution model to fully describe even com-plex experiments, such as ones involving cars or phonesmoving out of range of all control networks. OMF-F alsodefines a generic resource model and concise interactionprotocol, which allows third parties to contribute new re-sources as well as develop new tools and mechanisms tocontrol an experiment. Finally, it has a distributed commu-nication infrastructure supporting the scalable orchestra-tion of thousands of distributed and potentiallydisconnected resources.

In addition to these novel features, OMF-F easily inte-grates with systems from other complementary testbedframeworks. Indeed, it interfaces with SFA to allowresearchers to discover and provision resources to be usedin their experiments. It is also compatible with measure-ment resources developed within instrumentation andmonitoring initiatives.3 OMF-F can also be paired with aweb-based interface to act as a digital laboratory notebookand capture study cycles involving multiple experiments,through the use of additional wiki, versioning and statisticalanalysis capabilities [6].

Section 3 describes our objective and discusses the userand design requirements for OMF-F. From these specifica-tions, we propose a resource model and an architecture forOMF-F in Section 4. This architecture is composed of dis-tributed entities and systems to support their interactions,which we further detail in the remainder of that section. InSection 5, we focus on the event-based model used to de-scribe experiments and some of the related capabilities.Most parts of OMF-F have been either deployed in existingtestbeds and demonstrated at past events, such as the GENIEngineering Conferences (GEC). In Section 6, we evaluatethe performance of some of the key aspects of OMF-F whenused on these existing testbeds. We then present someexamples of real experiment demonstrators highlightingspecific OMF-F features in Section 7. Finally, we con-clude this paper and discuss future work directions inSection 8.

3 http://groups.geni.net/geni/wiki/GIMI.

Page 3: Designing and orchestrating reproducible experiments on federated networking testbeds

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 175

2. Background and related work

2.1. OMF (Orbit Management Framework)

OMF was originally developed to operate the ORBIT [7]wireless testbed. It has been extended in the recent yearsto support more types of resources and additional experi-mental features [5,8]. OMF has a dual purpose. First, it pro-vides services to help testbed operators efficiently managetheir resources, such as remote bootstrapping or disk im-age saving and loading. Second, it provides experimenterswith software tools to describe experiments and automat-ically run them on a testbed. In this contribution, we buildon the original OMF and significantly modify its core de-sign to allow its operation across large scale distributedand federated testbeds.

2.2. Resource discovery and provisioning in federated testbeds

The Slice-based Federation Architecture (SFA) [3] is aframework which federates many GENI and FIRE testbeds.It defines the notion of a slice, which is a container abstrac-tion for all the resources in a given experiment. Research-ers are associated with slices and can use the SFA tools todiscover, provision and include resources in their slice.SFA uses a hierarchical naming scheme to refer to entitiesand implements a distributed architecture to enable theirinteractions. It handles authentication and authorisationthrough the use of PKI-like certificates and credentials thatare trackable along the trust chains between the users andinstitutions involved in the federation.

ProtoGENI [9] is another GENI-funded framework,which implements a variant of SFA for resource discoveryand provisioning. It implements a SFA Clearinghouse en-tity acting as a centralised certificate repository and a reg-istry for other entities (e.g. users, resources) within thefederation. ProtoGENI also introduces a Slice Authority(SA) entity for each federated institution. Researchers affil-iated with an institution use its SA to create, modify, de-stroy their slices. ProtoGENI currently federates manytestbeds over multiple GENI-affiliated institutions in theUSA.

Federation frameworks were also developed within theFIRE initiative. For example, the Panlab project proposed aTEAGLE-based framework [10,11] to allow the discoveryand provision of resources on both PanLab and PlanetLabtestbeds [12]. It relies on a Panlab-specific SFA interface,which interacts with the SFA-managed PlanetLab re-sources. Expedient4 is another framework focusing on thereservation and provision of virtual machines and OpenFlowswitches. Other contributions such as Top-Hat [13] federatedifferent measurement infrastructures to provide research-ers with valuable information during the discovery andselection of resources for their experiments.

While these contributions are essential to find and pre-pare resources from various institutions to be used in anexperiment, they do not address the issues of describing

4 https://alpha.fp7-ofelia.eu/doc/index.php.

and performing the experiment itself. OMF-F complementsthese contributions by addressing these last two issues andby further providing interfaces to some of them (e.g. inter-face with SFA in Section 4.7).

2.3. Experiment definition and orchestration

The GENI User Shell (GUSH) [14] allows GENI users todescribe the execution of their experiments in an XML doc-ument. The user’s GUSH controller interprets that docu-ment and sends corresponding commands to a GUSHclient running on each resource involved in the experi-ment. GUSH relies on direct unicast connections betweenthe controller and clients and the availability of ssh. Assuch, it does not support experimental cases involving dis-connections as found in mobile wireless meshes, or delaytolerant networks.

The Network Experimentation Programming Interface(NEPI) [15] provides tools to describe an experiment as abox-model workflow. Similar to GUSH, a controller inter-prets this description, and sends related commands to asingle entity per testbed responsible for executing themon all its resources. NEPI allows the orchestration of exper-iments in simulated, emulated, or testbed environmentsand their combinations. However, it currently supports alimited number of these environments.

Emulab [16] extends the ns-25 domain-specific languageto allow users to define experiments to be performed onEmulab testbeds. This experiment orchestration systemhas many features such as detailed control of link emulationbetween resources, synchronisation barriers, or remote soft-ware install. In addition, it also has some tracing capabilitiesto collect link measurements. However, this system does notappear to be able to orchestrate experiments involving re-sources from different federated testbeds.

In contrast with these contributions, OMF-F supports awide range of experimental cases involving many hetero-geneous resources over different environments, as demon-strated in Section 7.

3. Objective, scope, and requirements

3.1. Objective

The goal of our OMF-F system is to enable the descrip-tion and automatic orchestration of reproducible experi-ments involving resources from multiple testbeds underdifferent administrative domains. Thus OMF-F aims at en-abling the experimental scenario illustrated in Fig. 1. Someresearchers working on a study want to perform someexperiments, for which they require resources (e.g. R1)from different testbeds or aggregates (e.g. A) provided bydistinct administrative domains. These resources are de-fined in a resource description, and included in a virtualaggregate (e.g. X) or slice [3,10]. Users define their experi-ments in experiment descriptions (e.g. Exp1), which areused to direct their execution over the resources withinthe virtual aggregate.

5 http://nsnam.isi.edu/nsnam/.

Page 4: Designing and orchestrating reproducible experiments on federated networking testbeds

Aggregate A Aggregate BVirtual Aggregate X

Resource Description

ResearchCollaborators Exp 1

Description

R1 R2

R4R3

Administrative Domain A Administrative Domain B

Fig. 1. Federation scenario and involved entities.

176 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

3.2. Scope

We loosely define a federation as an agreement be-tween participating entities representing experimentersand testbed providers on how to share resources and con-duct experiments. In more technical terms, federations willbe defined by allocation and usage policies as well as amutually agreed set of APIs and mechanisms for experi-menters to discover, reserve, provision, and interact withthe resources provided to the federation. Ideally, the userswould interact with the federated aggregates as if theywere a single one. Fig. 2 shows a potential set of functionalcomponents to achieve such a unified view.

The Discovery and Provisioning component (D&P) holdsthe list of capabilities offered by the federation (such asavailable resources), and the list of entities involved in it(such as institutions, users). It accepts user requests for re-sources and handles reservations if required. It may offersadvanced features such as constraint-based selection, e.g.at least x resources of type y. This component also preparesresources for their use in experiments. This provisioningphase may involve different tasks depending on the re-source’s type, e.g. creating a virtual machine, turning on aphysical device, or loading an OS image. Section 2.2 pre-sents some of the systems implementing this functionalcomponent.

The Experiment Execution component (EE) providesexperimenters with domain-specific constructs to describetheir experiments. It processes these descriptions and en-sures the provisioning of the resources and the overallexperiment orchestration over the experiment’s life-time.This may include tasks, such as setting values of resourceproperties, performing actions at certain time, or collectingspecific measurements from or through resources. Thusthe EE is critical in enabling reproducible experiments atlarge scales with resources from different aggregates. In-deed, a high level description allows the unambiguousreproduction of an experiment on similar or different

Experiment Execution (EE) Discovery & Provisioning (D&P)

User Interface

Instrumented Resources from Federated Aggregates

Experiment Description Resource Description

OM

F-F

OM

F-F

Fig. 2. Functional components in a federation of testbeds and scope ofOMF-F.

resources (e.g. how does an adhoc routing protocol per-form over different types of wireless technologies). Whilean automated orchestration facilitates its execution onlarge number of distributed resources, as it frees the userfrom dealing with resource heterogeneity and large scalecommunication. Only a few contributions have focusedon the EE component in federated environments. Indeed,the systems presented in Section 2.3 have limited capabil-ities or do not support federations. OMF-F aims at fillingthat gap and focuses specifically on this EE function.

The UI component is the interface between the usersand either or both the D&P and the EE components. Itcan be a text based user-shell [14], or a web-based soft-ware [9], or a set of web services recording a completestudy cycle from experiment design to result analysisand curation [6].

3.3. User and design requirements

From an experimenter’s perspective, a framework sup-porting the entire experiment life-cycle should have thefollowing characteristics.

Ease of use. The abstractions and APIs to describe exper-iments and initiate their execution should have a lowlearning curve, and be backed-up by comprehensive docu-mentation and tutorials.

Scalable complexity. The description and execution of anexperiment should not become more complex as the num-ber of involved entities grows.

Observable. The user should be able to monitor and col-lect information from all involved resources and the exper-iment itself.

Support study cycles. An experiment is usually part of alarger study involving multiple design, execution, resultanalysis iterations. The framework should support theseconsecutive iterations, for example through interfaces withtools such as versioning systems or statistical analysissoftware.

In addition to these user-oriented specifications, the de-sign of such a framework should also adhere to the follow-ing testbed operator-oriented requirements.

Generic. It should be modular and offer high-levelabstractions to support a large spectrum of resource types,e.g. core network and mobile personal devices, wirelesssensors, cloud-hosted services.

Secure. It should provide secure authentication for in-volved entities (e.g. experimenters, resources) to supportaccountability, and a secure authorisation scheme to con-firm the rights of entities requesting actions from otherones (e.g. user requesting the monitoring of a wirelesschannel).

Page 5: Designing and orchestrating reproducible experiments on federated networking testbeds

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 177

Scalable and distributed. It should scale to a large num-ber of aggregates, resources, and concurrent experiments;and not rely on a central entity, which may become a fail-ure or performance bottleneck, or source of conflict.

Open. Its design and implementation should be opensource to promote its dissemination and contributionsfrom third parties (e.g. support for new types of resources).

Adopt, adapt, develop. It should adopt standard technol-ogies, potentially adapting them to meet its objectives, be-fore developing new ones.

4. The OMF-F architecture

4.1. Overview

Fig. 3 presents an overview of OMF-F architecture,where several entities interact to enable the federationscenario described in the previous Section 3. At the centreof this architecture is a distributed publish-and-subscribemessaging system (pub/sub), which is realised by a set ofPeering Servers (PS). OMF-F use a topic-based messagingpattern where resources or groups of resources are repre-sented by topics. Communication among all entities isachieved by publishing and subscribing to the respectivetopics.

Any resource is associated with a Resource Controller(RC), which is the proxy for one or more barebone re-sources. The RC subscribes to the topics associated withits resources and translates messages it receives from otherentities to resource specific actions. The RC normally alsoincludes a policy component which first checks the validityof an incoming request.

An experimenter describes her experiment using a do-main-specific language, and passes this description to anExperiment Controller (EC). This EC interprets the experi-ment description and uses the pub/sub system to send re-quests to the involved RCs and to receive reports fromthem on the experiment’s progress. These RCs instructthe resources to execute the tasks within these requestsand relays their outcomes. If a resource is instrumented,it may collect filtered measurements as instructed by theRC based on the experiment description. These data aresent to measurement entities using the OMF MeasurementLibrary (OML) [17], which can process them and store orforward them to other OML components. These OML enti-ties are themselves resources, which understand the samecommunication protocol as the RCs and thus can also be

PS

OMLOML

Aggregate A PS

ResRC

ResRC

Publish/SubMessag

System

EC

Experimenter in slice X Vis & AExp 1Description

Fig. 3. OMF-F archite

organised and controlled via the EC and the experimentdescription.

The experimenter may use additional software (Vis andA) to visualise the experiment’s progress and analyse itscollected measurements [6]. The remaining of this sectiondetails the main components of this architecture, and howthey meet the requirements from Section 3.3.

4.2. Resource model and life-cycle

Our first design decision is to consider every entity par-ticipating in an experiment as a resource, independent onwho is providing it. A resource has a set of propertiesand an associated life-cycle (Fig. 4). It communicates withother resources through well defined messages. A new re-source is created by an existing resource receiving a createmessage. It is initially in the inactive state, and may transi-tion to the active state either automatically after creationor at some later stage.

A resource’s property may be set to a given value via aconfigure message. Given the large variety of resourcesthere is no support for actuator commands, such as ‘‘dotask X’’. Instead, resources are requested through a config-ure message to adjust their internal operations, so that theobservable properties reach a certain value. In other words,we request the outcome and leave the ‘‘how’’ to the re-source. For example, an experimenter may instruct a robotresource to move its arm to a certain position by setting itsarmX, armY, armZ properties to some desired coordinates,then setting its armState property to ‘‘moving’’. Some re-sources may only accept configuration messages in theinactive state, and some properties may only be set at cre-ation time, as part of the create message.

The request message asks a component to report on itsproperties via an inform message. Some types of resourcesmay only accept request messages once created, and dis-card any create or configure messages. This would be thecase for a resource which primarily provides a service ina production system, and may only be used in experimentsas a read-only source of measurements. Finally, a compo-nent is discarded when it receives a release message.

Creating new resources through an existing one estab-lishes a clear policy context in which we can decide ifthe request is valid or not. This results in a parent–childrelation between the creator and the created. As a conse-quence and to maintain a proper accountability chain, aparent can only cease to exist when all its descendants

scribe ing

OML

Aggregate BPS

ResRC

ResRC

EC

Experimenter in slice YVis & AExp 2Description

cture overview.

Page 6: Designing and orchestrating reproducible experiments on federated networking testbeds

Fig. 4. OMF-F resource model and life-cycle.

178 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

have been released as well. In our current implementation,every resource maintains a list of its children and whenreceiving a release message it will forward it to its childrenas well. The initial release request will only succeed if alldescendants have released themselves.

To support scalability we also introduce a group re-source which maintains a set of other resources. In prac-tice, this type of resource is simply a group messagingmechanism represented by a pub/sub topic. All membersof a group are requested to also subscribe to the respectivegroup topic and process the received messages accord-ingly. Given the one-way communication pattern of pub/sub there is no additional semantics associated with groupresources. For instance, there is no implied guarantee thata message sent to a group will be received by all its mem-bers. In fact, that guarantee does not even exist for individ-ual resources.

This resource model is an original feature of OMF-Fcompared to other orchestration framework [14,16]. It isembodied in the RC and addresses our requirements for ageneric, open, and observable framework.

The life-cycle model can be extended with sub-statesand transitions for any type of resource. For example, amobile robot resource might have the sub-states active/moving-forward or active/rotating. This life-cycle model isan optional part of the architecture as it is not reflectedin the ‘‘standardised’’ parts, such as the messaging proto-col. Indeed, there are no explicit messages to request a spe-cific change in the life-cycle and it may only be observableif a resource provides it through a property. However, awell defined and broadly applied model simplifies thedevelopment of resource adapters. In contrast, the messag-ing protocol to interact with a resource is defined in an ref-erence document,6 which allows third parties to developtheir own RC or EC implementation to interact with RCs orECs from other providers.

4.3. Resource naming and description

Each resource in OMF-F has an identifier of the form:resID@aggregateID, where aggregateID is unique, and resIDis unique within an aggregate.7

OMF-F provides a domain-specific language to describein a topology the list of resources associated with a slice.This language is an extension of the original OMF languagepresented in [5]. Listing 1 shows a simple topology exam-ple consisting of a PC-based, a virtual machine and a phoneresource from different aggregates. The user learnt fromthe resource discovery phase about the resources availableat a given aggregate (e.g. a physical device), or the type of

6 http://mytestbed.net/projects/omf/wiki/ArchitecturalFoundation2ProtocolInteractions.

7 The identifier of an aggregate resource is by convention:aggregateID@aggregateID.

resources that can be created by that aggregate (e.g. a vir-tual machine). She then creates the resource description inListing 1 with the reservation period and any otherrequirements captured in key/value parameters, and sendsit to each aggregate. Although this resource discovery andprovisioning phase is outside the scope of OMF-F as men-tioned earlier, we will provide later in this section somedetails on how OMF-F can easily interface with toolsimplementing that phase (such as the SFA).

Other provisioning or orchestration frameworks use asimilar resource description such as the RSpec [9,14]. TheOMF-F topology can be completely mapped into an RSpecand has the benefit of using the same language as theexperiment description, thus facilitating the dynamic addi-tion of resources within running experiments. This namingand description scheme can be applied to any type of re-sources and is based on an existing language, thus it meetsour generic and adopt/adapt design requirements.

4.4. Publish and subscribe messaging

We implemented the OMF-F pub/sub messaging systemusing the XMPP protocol [18] and existing XMPP serverssuch as OpenFire.8 This system is composed of distributedservers peering with each other, and hosting topics. Authen-ticated clients can connect to a server, subscribe to any top-ics hosted by any servers and publish messages to them.XMPP’s server-to-server protocol ensures that a messagepublished to a topic is forwarded to all of its subscribersregardless of which server they are connected to. Fig. 5shows the typical organisation of OMF-F’s pub/sub topicsfor a federated experiment scenario as described in Section 3,along with the subscribed entities from Fig. 3.9

A topic is identified by name@domainID. The domainIDis the domain name of the server hosting that topic. Eachaggregate must have an associated XMPP server, withdomainID = aggregateID (Section 4.3). The name part is aunique identifier for the topic within a domain. While thisidentifier could be a fixed-length hash, for maintenancepurpose we may define it according to the tree structurefrom Fig. 5. For example, the ID of topic res3 created forthe experiment exp1 part of sliceX hosted by the serverfoo is: sliceX/exp1/[email protected]. The experimenters in-volved in sliceX will never deal with such a topic ID, theywill only manipulate resources and OMF-F will handleany mapping to pub/sub topics.

During the discovery and provisioning phase, the userselects a pub/sub server to host her study’s slice and cre-ates the slice’s topics, e.g. [email protected] and sliceX/[email protected]. As part of the provisioning process [3,9], anadditional topic is created for each resource assigned to

8 http://www.igniterealtime.org/projects/openfire.9 OML entities are omitted for legibility as they are also resources.

Page 7: Designing and orchestrating reproducible experiments on federated networking testbeds

Listing 1. Simple OMF-F topology with optional key/value parameters partially showed for the first resource only.

PubSub topic N (experiment)OMF-F entity EE subscribed to N

RC1

slice X

resources

res 1 res 3res 4 res 2

exp 1

group jres 1 res 3

exp 2res 2

PubSub Server Foo

RC3

...

group i

ECExp 1

res 4

PubSub topic N (provisioning)

Fig. 5. OMF-F pubsub topics and subscriptions for a simple federated experiment (the tree structure of the nodes is organisational and does not implymessage forwarding).

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 179

the slice, e.g. sliceX/resources/[email protected], and the RCassociated to this resource is subscribed to that topic.These topics are shown with dashed lines in Fig. 5.

In the experiment orchestration phase handled by OMF-F, the user’s EC (ECExp1) creates further topics related to theexperiment (exp1), such as sliceX/[email protected], and sli-ceX/exp1/[email protected]. ECExp1 also creates the topics forall new resources that are created during an experimentexecution, for example a new application instance or anew container for other resources (group i). These topicsare in full line in Fig. 5. For each provisioned resource tobe used in exp1, such as res3, the ECExp1 publishes on theslice’s topic (sliceX/resources/[email protected]) a configura-tion message requesting it to join exp1. Upon receivingthat message, the corresponding RC (RC3) subscribes tothe topics of exp1 which are relevant to it, for example sli-ceX/exp1/[email protected] and sliceX/exp1/[email protected] ifres3 is included in the group i container. The RC is thenready to accept any commands related to exp1’s execution,which are published on these topics by the ECExp1.

Some additional subscriptions enable restricted mes-sage broadcasts and are not displayed on Fig. 5 for legibil-ity reasons. For example, all entities involved in anexperiment must also subscribe to the experiment’s topic(sliceX/[email protected]) to respond to experiment-widebroadcast, such as an emergency shutdown.

Our use of a pub/sub messaging system with a struc-tured topic map (Fig. 5) for OMF-F is novel compared tothe communication schemes in other frameworks[14,15]. It allows any types of entities to communicate ina scalable and distributed manner since it does not relyon a centralised architecture. It also meets our openessrequirement as it is based on an standard [18], and itadopts established pub/sub server implementations. Final-ly, its asynchronous nature allows the orchestration ofpotentially disconnected resources as illustrated inSection 7.

4.5. Authentication and authorisation

OMF-F needs to guarantee the publisher’s identity forall messages. While XMPP supports the authentication ofconnecting clients, misconfigured or compromised serversmay not enforce it properly. Thus OMF-F instead uses end-to-end authentication based on the well established prac-tice of PKI-backed digital signatures. Each OMF-F entityhas a set of public and private keys, signs its generatedmessages with its private key, and verifies the signaturesof received messages with the originator’s public key. Pub-lic keys of a slice’s entities are exchanged at resource pro-visioning between the users and the resources, or obtainedvia a trusted scheme such as a certificate authority or webof trust. The former method is currently implemented indeployed OMF-F testbeds. The other method is under eval-uation as it may not scale to a high experiment churninvolving large resource numbers. PlanetLab [12] uses asimilar authentication scheme, which enables the deploy-ment of RCs on its slivers and allows OMF-F to orchestrateexperiments with resources from both PlanetLab and OMF-F testbeds as presented in Section 7.

An OMF-F entity also needs to verify that the originatorof a message has sufficient rights for any enclosed re-quests. For example, although a user acquired the rightto use a spectrum analyser to collect some data, it maybe limited to use only some frequency ranges. We proposeto attach a set of assertions or their resolvable references toeach message, which establish the originator’s rights. Inthe previous example, the user’s request will have a firstassertion from her institution confirming her affiliation toit, then a second assertion from the owner of the resourcegiving rights on a set of ranges to affiliates of that institu-tion for a reserved time period. All assertions are signed bytheir originators and verified using the same key mecha-nism as above. Some assertions such as the last one in thisexample may be generated during the resource reservation

Page 8: Designing and orchestrating reproducible experiments on federated networking testbeds

10 http://omf.mytestbed.net/projects/omf/repository/show?rev=sfa.

180 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

and provisioning phase. This scheme can be considered arestricted, but light-weight variant of ABAC [19]. It is re-stricted as it does not allow for additional rules to be at-tached to assertions, it is light-weight as most assertionswill only be passed by reference, and it is based on widelyadopted industry standards.

4.6. Experiment and resource controllers

As previously mentioned, the EC is the entity responsi-ble for orchestrating the experiment. It interprets anexperiment description from the user, developed using adomain-specific language [5], and sends related commandsto RCs using the pub/sub scheme and our defined messag-ing protocol. In OMF-F, we significantly extended the basicEC and RC entities of the original OMF [5] along the follow-ing four directions.

First the RC was redesigned to implement the resourcemodel from Section 4.2. This generalises its source code tosupport different types of resources, such as virtual ma-chines, mobile phone applications, or wireless sensors ina consistent manner. It also provides a reference imple-mentation of our resource model and protocol, whichserves as a base for custom RC implementations by thirdparties.

OMF-F experiments are now fully event-driven, i.e. theexperimenters define events associated with tasks in theirexperiment. These events are synchronisation barriersbased on either time or values of properties or measure-ments from resources; when their conditions are met thetasks associated to them are executed. An example eventcould be ‘‘when the signal strength drops below -XdBm’’and the associated tasks could be ‘‘decrease measurementsampling rate by X’’, as presented in Listing 2. Section 5presents our design approach, a formal model, and sometechnical details for the new event-handling engine atthe core of the OMF-F EC.

The communication stack of all original OMF entitieswere modified to support the new OMF-F messaging sys-tem. This allows the EC and RCs to be on different networkdomains, removes the need for a permanent connectionbetween them, and provides scalable group communica-tion. It also enables new experiment capabilities, such asdisconnected experiments (e.g. a mobile resource tempo-rarily out of range of an EC), or long-running surveys (e.g.EC connecting episodically to RCs in a x-month data collec-tion). Finally, the EC and RC entities were extended to sup-port the authentication scheme described above, whichallowed RCs to be readily deployed on PlanetLab, and en-abled orchestration of federated experiments (Section 7).

4.7. Interface with resource discovery and provisioning, andother tools

The use of a distributed pub/sub scheme at the core ofOMF-F allows it to easily interface with existing scheduler,registry and aggregate manager functionalities defined incontributions from the GENI or FIRE initiatives. Indeedthese contributions can all be adapted to use the OMF-Fpub/sub system. The challenge remains then in the se-quence of interactions between these entities and the

OMF-F ones. For this, we propose to use the same conciseset of messages as defined for our resource model (Sec-tion 4.2). For example, a given testbed may be modelledas a resource and accepts request messages from experi-menters or brokers during the resource discovery phase.It replies via inform messages containing the resourceswhich are available or can be created. It may then acceptconfigure messages holding reservation or allocation de-mands, and again use inform to convey the outcomes. Weare working with NITOS on a software broker,10 whichimplements both the SFA API and our OMF-F messaging pro-tocol to provide resource reservation and provisioning. Thisbroker will interact on one side with experimenters or otherbrokers using SFA or OMF-F protocols, and on the other sidewith OMF-F testbeds resources, as in the above interactionexample.

This communication system also allowed the interfaceof OMF-F with a web-based digital laboratory notebook,which provides a unified interface to record research notes,track versions of experiment descriptions and associatedsoftware, automate their orchestration, and analyse the re-sults [6].

4.8. Distributed instrumentation

OMF-F uses the existing OML system [17] to instrumentresources and collect measurements. An EC executing anexperiment handles an OML-instrumented resource ormeasurement entity just like any other resource. OMF-Fand OML are two separately designed and developedframeworks, and recent OML developments are presentedin a separate contribution [20].

5. Describing and orchestrating experiments

5.1. Distributed orchestration using events

As mentioned above, an experiment orchestration canbe fully described by a set (S) of events (E) and actions(A), such as: S ¼ ðE;AÞ.

We further define an event as some function ei 2 E overa set of observations on resources and time-out events T.When the event ei fires, an action ai 2 A is executed. Execu-tion of ai in the context of the messaging model describedabove, equates to issuing a set of messages Mi. We cantherefore describe a declaration si as follows:

si ¼ ei Robsi ; Ti

� �) aiðÞ ) Mi ¼ fmi;jðRi;jÞ : Ri;j # Ract

i g

Ri ¼ Robsi [ Ract

i

Each declaration si is described by an event ei consumingobservations from the resource set Robs

i . When the eventei fires, action ai is executed, resulting in a set of messagesMi. Each message mi;j 2 Mi is addressed to a resource setRi;j. The union of all Ri;j is the set of resources Ract

i whichdeclaration si may act upon. We note that the executionof declaration si only requires communication among theunion of Robs

i and Racti . In fact, we can further restrict that

Page 9: Designing and orchestrating reproducible experiments on federated networking testbeds

Listing 2. Experiment description in OMF-F domain-specific language.

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 181

requirement by observing that an action ai can be executedmultiple times in response to a firing event eiðtÞ as long asany resource r 2 Ract

i will only receive one copy11 of eachmessage mi;jðtÞ. We can expand the above to describe a sys-tem where the action blocks are co-located with each re-source and all relevant messages are delivered locally only.

Si ¼ sj 2 Sji 2 Ractj

n o

Oi ¼ rj 2 Rjsk 2 Si ^ rj 2 Robsj

n o

In the above equation, Si defines the set of declaration swhich can potentially act on resource i. In turn, Oi is theset of resources contributing observations to the eventsassociated with the declaration in Si. We can therefore con-clude that the correct orchestration of an experiment is as-sured if every resource ri can at least receive messagesfrom all the resources in Oi, assuming that it can locallyexecute the relevant event and action operators. The abilityto determine the ‘‘communication’’ set for a resource r al-lows us to reason about the impact of resources whichmay temporarily become disconnected from the messag-ing plane. This is especially important for experiments inthe wireless, mobile, and sensor domain.

A special case occurs when a resource only depends onitself (Oi ¼ frig) and therefore can successfully participatein an experiment even if disconnected from everyone else.In practice, our fine-grain modelling of resources normallymeans that multiple resources will be ‘‘hosted’’ by a singlephysical device. While not always true, we can usually as-

11 This restriction is really only necessary for non-idempotent messages.

sume that all resources on a device can communicateindependent of network connectivity and therefore the‘‘practical’’ special case is for a situation where theobservation set of a resource ri;h hosted by resource rh isalso hosted in its entirety on rh (Oi;h ¼ frj 2 OijhostedðrjÞ ¼rhg). The experiment in Listing 2, which we will describein more detail shortly, is a simple but typical example. Inthis experiment, a resource is dynamically modified basedon measurements taken on the same resource.

5.2. Model for experiment description

We introduce a stochastic analytical model to representthe orchestration of events in an OMF-F experiment. Ourmotivation to define such a model is to provide experi-menters with a tool to deduce some characteristics of theirexperiments before running them. For example, using thismodel one could verify prior to an experiment’s runtimethat it does not contain a chain of events leading to a dead-lock (e.g. resources waiting on each other to proceed). Thismodel could further be used to check that all events in anexperiment can be triggered (i.e. reachability property), orthat ever-increasing chains of events are not possible (i.e.boundedness property). In experiments involving a largenumber of distributed resources and defined events, suchinformation would allow experimenters to save develop-ment time and effort.

We based our proposed model on Petri Nets, whichhave been previously used to assess liveness, reachabilityand boundedness of given systems [21]. More specifically,we used the Generalised Stochastic Petri Nets (GSPN) to

Page 10: Designing and orchestrating reproducible experiments on federated networking testbeds

Fig. 6. Petri Net Model of an OMF Experiments with three nodes and three actors.

182 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

describe and analyse the expected behaviours of any givenOMF-F Experiment Description. In the remainder of thissection, we apply without loss of generality this model tothe case of the experiment described in Listing 2.

The experiment in Listing 2 consists of three physicalresources which each have a specific assigned role, namelya traffic source, a sink and an observer. Once these re-sources are operational, the sender and the receiver startexchanging data and the observer reports all capturedpackets via OML [17] to an external server. This experi-ment also defines an additional event E based on the mea-surements collected by the observer. This event triggerswhen a measured metric reaches a set threshold. At thatpoint, the observer lowers its sampling rate to avoid inter-fering with the observed system.

From this experiment, we derive the Petri Net modelshown in Fig. 6. In this figure, the transition t6 representsthe specific event ‘‘ALL_UP_ AND_INSTALLED’’, which re-quires three tokens to be fired. The transitions t2–t4 repre-sent the temporal transition for a resource to be ready,while the transition t5 represents a potential experimentfailure by either a cancel from the user or a problem duringresource setup. When the event ‘‘ALL_UP_AND_IN-STALLED’’ is triggered, we enter into the sequential phaseof the experiment as described in Listing 2. In particular,the sender and receiver are driven by temporal eventswhile the observer actor is driven by both a temporal eventand a measurement-based event.

Once the Petri Net corresponding to a given experimenthas been modelled as in the above example, we can derivethe embedded Markov Chain of the related stochastic pro-cess. In this example, the derived Markov Chain is pre-sented in Appendix A, and shows the different markingschemes of the previous Petri Net model. The analysis ofthis stochastic Markov Chain will give the experimentermuch quantitative and qualitative information about herexperiment. For example, it can give temporal boundariesto any given experiment using sojourn time analysis.Moreover, it can also identify deadlock and livenessthrough the establishment of the steady state distributionand the identification of the transient and absorbing states.

We showed how the event-based approach adopted byOMF-F allows an experiment to be modelled as a Petri Net,which itself leads to a Markov Chain. Analysing the latterallows an experimenter to verify key properties of herexperiment prior to running it, thus potentially savingvaluable time in troubleshooting (e.g. will her experimentwith many resources and events ever block?). We areworking on automating this model generation and analysisto provide experimenters with systematic experimentverifications.

5.3. Event engine implementation

OMF-F provides the defEvent construct to allow experi-menters to define their custom events. Such definitionmust include the conditions that will trigger the event.These conditions could be based on given timestamps, onsome resource properties or collected measurementsreaching some values, or any combination thereof. Forexample, the defEvent statement in Listing 2 defines anevent based on a collected RSSI measurement. The OMF-Fuser documentation provides other examples of eventdefinition.12

In the case of an always-connected experiment, the ECis responsible for monitoring the triggering conditions. Inan experiment where a resource may be disconnected, itsRC is responsible for monitoring events, which can be trig-gered only if it is based on local states or measurements asdiscussed in Section 5.1. In the first case and for conditionson resource properties, the EC keeps a representation ofthe values of all resources’ properties. This representationis first built from the initial known configuration of the re-sources, and then updated at runtime using the informa-tion from the inform messages sent by the resourcesupon property updates. The EC’s event engine periodicallyruns through all defined events and checks if they shouldbe triggered based on that representation.

For conditions on collected measurements, the EC peri-odically queries the OML data store that contains theexperiment’s measurements (Section 4.8). This query is

12 http://mytestbed.net/projects/omf54/wiki/EventsHowTo.

Page 11: Designing and orchestrating reproducible experiments on federated networking testbeds

0 5 10 15 20 25 30

0.10

0.14

0.18

Number of Resource (N)

Mea

nR

espo

n se

Del

a y (D

) [s]

D = f(N) for TB1

0.4

0.6

0.8

1.0

1.2

Number of Resource (N)

Mea

nR

espo

nse

Del

ay (D

) [s]

0 50 100 150 190

D = f(N) for TB2

Fig. 7. Mean response delay (over all resources) for different numbers of resources.

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 183

derived from the event definition and its returned data aretested against the triggering conditions. One limitation ofthis mechanism is that it currently does not support anyaccess control (i.e. any EC can access data from any exper-iment on a given OML store). This is due to the lack of ac-cess control in OML, and will be addressed in future work.

6. Evaluation

We performed two sets of experiments to quantify theperformance of the pub/sub communication scheme usedby OMF-F and the responsiveness of its event-based exper-iment orchestration.

6.1. Evaluation of the OMF-F communication scheme

The objective of this first experiment set is to quantifythe delay between the time when the EC publishes a com-mand towards a group of RCs and the time when all ofthese RCs publishes back that they have executed the com-mand. We used OML [17] to instrument the EC with twomeasurement points (MP), one at message publishingand the other at message reception. Each MP records atimestamp and the message content, so we can match re-ceived replies corresponding to sent commands. In thisexperiment set, the EC asks a group of RCs to start an appli-cation with negligible startup delay. As soon as this is done,the RC published a message back to the EC. We ran thisexperiment on both a local testbed (TB1) with ORBIT-likeresources and the PlanetLab testbed (TB2). We varied thenumber of involved resources N from 1 to 30 on TB1 andfrom 1 to 190 on TB2, and ran 10 trials for each N value.13

Fig. 7 shows the mean response delay D over all re-sources for both testbeds and different number of re-sources N. D seems to grow linearly with the number ofinvolved resources, with D < 1:3s for N ¼ 190 on TB2. Alinear regression using this data gives an estimated slopeof a ¼ 5:17 � 10�3 (thus D should remain low for large Nvalues), and predicts D ¼ 2:85 for N ¼ 500 and D ¼ 5:43for N ¼ 1000.

13 Originally N ¼ 200 for TB2, however when inspecting the data wenoticed faulty PlanetLab resources on some of the trials and decided toremove them from all trials.

Fig. 8 shows the ascending mean response delay d foreach resource when N ¼ 30 on TB1 and N ¼ 190 on TB2.It shows that within a given trial, d seems to grow linearlybetween the first replying resource and the last one. Thishighlights a bottleneck in the current EC implementation,where received messages are queued before sequentialprocessing. This limit may be mitigated by the use of mul-tiple connections between the EC and the pub/sub serverand the parallel processing of incoming messages (assum-ing only one network interface on the EC side).

6.2. Evaluation of the OMF-F event engine

The goal of this second set of experiments is to quantifythe responsiveness of the OMF-F event-handling engine.Thus we measure the delay between the moment whenthe EC detects that an event happens (i.e. a synchronisationbarrier is triggered) and the moment when its associatedtasks are performed.

In this set of experiments, 100 random resources eachrun a single application. We define an event, which moni-tors the status of these applications, and which triggers ifany application dies (i.e. quits with a non-zero error code).When such an event happens, the associated tasks are toselect some replacement resources from a list of stand-byones and then start a new application instance on thesereplacements. We measure the duration between the firstdetected failed application until 100 application instancesare running again over the entire experiment. This dura-tion is composed of the time it takes for the failures to bedetected, the EC and RCs processing times, the communi-cation time between them, and the startup time of anapplication instance.14 This is a classic failure recovery sce-nario likely to occur in large scale experiments with unreli-able resources. We ran 10 trials of that experiment on 200PlanetLab resources. For each trial, we kill the running appli-cation on 50 randomly selected resources after 150 s ofruntime.

Fig. 9 shows the recovery time for a single trial of theabove experiment and for all of the 10 trials. It showsthat the EC event-handling scheme is responsive as itneeds on average about 11.18 s to recover from the

14 The application here is a simple infinite while loop; thus startup time isnegligible.

Page 12: Designing and orchestrating reproducible experiments on federated networking testbeds

50 100 150 200 250

020

4060

8010

0

Time [s]

Num

ber o

fApp

licat

ions

Recovery Time for a Single Trial

810

1214

Mean = 11.18 (+/- 3.02 )

Tim

e[s

]

Recovery Time over 10 trials

Fig. 9. Recovery time upon 50 random failures for one and many trials.

0 5 10 15 20 25 30

0.10

0.20

0.30

Index of Resource (I)

Mea

nR

espo

nse

Del

ay (d

) [s] d = f(I) for N=30 (TB1)

0.5

1.0

1.5

2.0

2.5

Index of Resource (I)

Mea

nR

espo

nse

Del

ay (d

) [s]

0 50 100 150 190

d = f(I) for N=190 (TB2)

Fig. 8. Ordered mean response delay (for each resource) when N ¼ 30 and N ¼ 190.

184 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

failure of 50 application instances after the first detectedfailure. This event engine is currently implemented usinga 1 s active pooling, i.e. every 1 s the EC pools informa-tion relevant to the event and decides if it triggered ornot. This is done using ‘‘Green Threads’’ in Ruby 1.8, thusits performance depends on the scheduling efficiency ofthe Ruby virtual machine. A similar evaluation was per-formed on an earlier version of GUSH [14] and resultedin longer recovery time, which could be due to the dif-ferent tasks performed by the resources in that case,e.g. install and start instances of a PlanetLab-specific re-source discovery software (SWORD).

7. Case studies

This section presents two examples of research experi-ments, which were performed on federated testbeds usingthe OMF-F framework. These experiments were also dem-onstrated live during the GENI Engineering Conferences(GEC). OMF-F and subsets of its components as describedin Section 4 are currently deployed on many testbeds pro-vided by different institutions.15

7.1. Distributed resolution of the obstacle problem acrosstestbeds

This experiment was part of a research work investigat-ing novel peer-to-peer scheme to solve problems (e.g. the

15 http://https://omf.mytestbed.net/projects/omf/wiki/DeploymentSite.

classic Obstacle Problem) in a distributed manner [22]. Itwas also presented as a federation demonstrator duringthe 7th GEC. In this scheme, peers were organised in coop-erating clusters with desirable communication characteris-tics, which allowed fast resolution of large scale numericalproblems via asynchronous iterative steps. In this experi-ment, the researchers used two sets of solvers, one de-ployed on the Orbit aggregate and the other on thePlanetLab aggregate. Fig. 10 presents the setup of this fed-erated experiment. This experiment demonstrated the fol-lowing features of OMF-F, as described in Section 4: thePKI-based authentication scheme, the use of the messagescheme across domains, and the extension of the orches-tration tools to control heterogeneous (PlanetLab, Orbit)resources.

7.2. ParkNet demonstration over three federated domains

This experiment was developed within the ParkNet pro-ject [23], which aimed at collecting and processing parkingspace statistics via sensor-enabled vehicles [23]. It in-volved three aggregates, namely the Poly NYU aggregateproviding WiMAX connectivity to the vehicles driving inBrooklyn, the Orbit aggregate hosting the ParkNet cloud,and a GENI aggregate providing backbone connectivity.OMF-F was used to design and execute the experimentsacross these federated aggregates. Fig. 11 presents the set-up of this experiment, which was demonstrated at the 9thGEC. This experiment illustrates many OMF-F features,such as the support for mobile wireless resources, for

Page 13: Designing and orchestrating reproducible experiments on federated networking testbeds

Fig. 11. Nineth GEC demonstration: ParkNet over 3 aggregates.

Control Network

Experimental Network

(the Internet)

EC

PubSub Server

Internet

P2PProblem solving Client

ORBITAggregate

PlanetLab Aggregate

RC

P2P Problem solving Client

OML

AM

OMLServer

OML

RC

AM

ExperimentDescription

Fig. 10. Seventh GEC demonstration: resolution of the obstacle problem.

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 185

potential disconnections, and for even more heterogeneousresources (e.g. sensors, video devices). Finally it also dem-onstrated OMF-F’s capability to drive a highly distributedexperiment with geographically distant resources.

8. Conclusion

We proposed OMF-F, a framework which allows thedefinition and orchestration of networking experimentsover shared resources provided by different federatedadministrative domains. Its main three features are (1)the support for rich, event-based experiment descriptionsin a domain-specific language; (2) the specific definitionsof how it models resources and how to communicate withthem over an asynchronous publish-subscribe system toenable automatic orchestration at large scale; and (3) thecapability to interface with existing resource discoveryand provisioning tools [3], and experiment cycle and datamanagement tools [6].

Within the global initiatives on future Internet testbeds,many contributions have focused on the critical tasks of dis-covering and provisioning resources within a federation oftestbeds. However, few groups have focused on the equallyimportant tasks of developing experiments and executingthem in that same federated environment. Such a systemwould foster reproducible and scientifically sound experi-mental research and facilitate peer-review. OMF-F focuseson that challenge and provides a complete system to defineand orchestrate large scale experiments over federatedtestbeds.

In that regard, we made three main contributions in thispaper. First we provided a detailed description of the modelsused in OMF-F, its architecture, and its involved entities. We

complemented this with overviews of OMF-F’s messagingprotocol and experiment definition language, with links totheir complete online descriptions. Second, we presentedan evaluation of two key components of OMF-F, namely itsunderlying messaging and event-handling systems. Thisevaluation showed that they are scalable to a high numberof resources (�1 k) and responsive (�11 s response time to50% failure). Then, we presented a couple of real examplesof OMF-F being deployed in different federated domainsand used to define and execute concrete interesting re-search experiments. These case-studies demonstrate theoperation and ease-of- use of OMF-F in real-life.

Our future plans include completing the authorisationscheme based on assertion sets, improving the perfor-mance of OMF-F’s messaging and event-handing systems,polishing the current implementation and documentationto attract more users (not only experimenters, but alsodevelopers contributing supports for new resources, suchas the GENI Racks [24]), and extending the existing deploy-ment base. We are collaborating with other teams withinboth the FIRE and GENI initiatives to provide a seamlessOMF-F interface with various existing contributions, andwith measurement analysis, management and curationsystems. Finally, we plan to periodically review OMF-F’sdesign, models, and mechanisms based on the insightsgained through our involvements with the FIRE and GENIprojects and the feedback from enthusiastic OMF-F users.

Appendix A. Petri net embedded markov chain

Fig. A.12 presents the embedded Markov Chain for thePetri Net from Fig. 6. We simplified this Petri Net byremoving the transition t5 (i.e. the potential failure case)

Page 14: Designing and orchestrating reproducible experiments on federated networking testbeds

Fig. A.12. Embedded Markov of the GSPN.

186 T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187

as it would have added many states in the Markov chainwhich would have resulted in a obliterated presentationof the problem. Nevertheless, without loss of generalitywe can analyse the experiment represented in Fig. 6 usingthe Markov Chain in Fig. A.12.

The embedded Markov Chain gives us the transitionmatrix P. In this matrix fið�Þ represents the function onthe line i such as fið�Þ ¼ 1�

Pnj¼1pij.

f ð�Þ p1 p2 p3 0 0 0 0 0 0 0 0 0 0 0 0 0 00 f ð�Þ 0 0 p4 0 p5 0 0 0 0 0 0 0 0 0 0 00 0 f ð�Þ 0 p6 p7 0 0 0 0 0 0 0 0 0 0 0 00 0 0 f ð�Þ 0 p8 p9 0 0 0 0 0 0 0 0 0 0 00 0 0 0 f ð�Þ 0 0 p10 0 0 0 0 0 0 0 0 0 00 0 0 0 0 f ð�Þ 0 p11 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 f ð�Þ p12 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 f ð�Þ p13 p14 p15 p16 0 0 0 0 0 00 0 0 0 0 0 0 0 f ð�Þ 0 0 0 p17 0 p20 p24 0 00 0 0 0 0 0 0 0 0 f ð�Þ 0 0 0 p18 p19 0 p21 00 0 0 0 0 0 0 0 0 0 f ð�Þ 0 0 p22 p23 0 0 00 0 0 0 0 0 0 0 0 0 0 f ð�Þ 0 0 0 p25 p26 00 0 0 0 0 0 0 0 0 0 0 0 f ð�Þ 0 0 0 0 p27

0 0 0 0 0 0 0 0 0 0 0 0 0 f ð�Þ 0 0 0 p28

0 0 0 0 0 0 0 0 0 0 0 0 0 0 f ð�Þ 0 0 p29

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 f ð�Þ 0 p30

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 f ð�Þ p31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

0BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB@

1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCA

The sojourn time for every marking and the steady statedistribution p can be extracted from this transition matrix.

References

[1] World Bank, World Development Indicators 2011, 2011.[2] M. Berman, J.S. Chase, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, S.

Seskar, GENI: a federated testbed for innovative networkexperiments, Comput. Netw. 61 (2014) 5–23.

[3] L. Peterson, R. Ricci, A. Falk, J. Chase, Slice-Based FederationArchitecture, Technical Report, Geni, 2010.

[4] Coral Content Distribution Network, 2012. <http://www.coralcdn.org>.[5] T. Rakotoarivelo, M. Ott, G. Jourjon, I. Seskar, OMF: a control and

management framework for networking testbeds, ACM Operat. Syst.Rev. 43 (2010) 54–59.

[6] G. Jourjon, T. Rakotoarivelo, M. Ott, A portal to support rigorousexperimental methodology in networking research, in: Tridentcom,Springer-Verlag, 2011.

[7] D. Raychaudhuri, I. Seskar, M. Ott, S. Ganu, K. Ramachandran, H.Kremo, R. Siracusa, H. Liu, M. Singh, Overview of ORBIT radio grid

testbed for evaluation of next-generation wireless networkprotocols, in: Wireless Communication and Networking Conference(WCNC 2005).

[8] G. Jourjon, T. Rakotoarivelo, M. Ott, From learning to researching, easethe shift through testbeds, in: Tridentcom, Springer-Verlag, 2010.

[9] J. Duerig, R. Ricci, L. Stoller, G. Wong, S. Chikkulapelly, W. Seok,Designing a federated testbed as a distributed system, in:TridentCom, Springer-Verlag, 2012.

[10] S. Wahle, T. Magedanz, K. Campowsky, Interoperability in heterogeneousresource federations, in: TridentCom, Springer-Verlag, 2010.

[11] S. Wahle, C. Tranoris, S. Fox, T. Magedanz, Resource description inlarge scale heterogeneous resource federations, in: TridentCom,Springer-Verlag, 2011.

[12] L. Peterson, T. Roscoe, The design principles of PlanetLab, ACMOperat. Syst. Rev. 40 (2006) 11–16.

[13] T. Bourgeau, J. Auge, T. Friedman, TopHat: supporting experimentsthrough measurement infrastructure federation, in: TridentCom,Springer-Verlag, 2010.

[14] J. Albrecht, D.Y. Huang, Managing distributed applications usingGUSH, in: TridentCom, Springer-Verlag, 2010.

[15] A. Quereilhac, M. Lacage, C. Freire, T. Turletti, W. Dabbous, NEPI: anintegration framework for network experimentation, in: Conferenceon Software, Telecommunication and Computer Networks (SoftCom2011).

[16] E. Eide, L. Stoller, J. Lepreau, An experimentation workbench forreplayable networking research, in: Fourth USENIX Symposium onNetworked Systems Design and Implementation (NSDI 2007).

[17] J. White, G. Jourjon, T. Rakotoarivelo, M. Ott, Measurementarchitectures for network experiments with disconnected mobilenodes, in: TridentCom, Springer-Verlag, 2010.

[18] P. Saint-Andre, Extensible Messaging and Presence Protocol (XMPP):Core, RFC 6120, IETF, 2011.

[19] N. Li, J.C. Mitchell, W.H. Winsborough, Design of a role-based trustmanagement framework, in: Symposium on Security and Privacy,IEEE Computer Society Press, 2002.

[20] O. Mehani, G. Jourjon, T. Rakotoarivelo, M. Ott, An instrumentationframework for the critical task of measurement collection in thefuture internet, Comput. Netw. 63 (2014) 68–83.

[21] F. Bause, P.S. Kritzinger, Stochastic Petri Nets – An Introduction tothe Theory, Bause and Kritzinger, 2002.

[22] T.T. Nguyen, D. El Baz, P. Spiteri, G. Jourjon, M. Chau, Highperformance peer-to-peer distributed computing with applicationto obstacle problem, in: HotP2P Workshop in Conjunction withIPDPS 2010.

[23] S. Mathur, T. Jin, N. Kasturirangan, J. Chandrashekharan, W. Xue, M.Gruteser, W. Trappe, ParkNet: drive-by sensing of road-side parkingstatistics, in: Conference on Mobile Systems, Applications andServices (MobiSys 2010).

[24] N. Bastin, A. Bavier, J.A. Blaine, J.H. Chen, N.K. Krishnan, J. Mambretti,R. McGeer, R. Ricci, N.J. Watts, The InstaGENI initiative: anarchitecture for distributed systems and advanced programmablenetworks, Comput. Netw. 61 (2014) 24–38.

Page 15: Designing and orchestrating reproducible experiments on federated networking testbeds

T. Rakotoarivelo et al. / Computer Networks 63 (2014) 173–187 187

Thierry Rakotoarivelo is a senior researcherwith the Network Research Group at NICTA,where he works on Frameworks for LargeScale Worldwide Experimental Studies. He isalso interested in using these frameworks toimprove the teaching of networking courses,using the IREEL eLearning Platform. Herecently started new activities related toexperimental measurements and analysis.Before joining NICTA, Thierry completed in2007 a PhD in a cotutelle agreement betweenUNSW (Sydney, Australia) and DMI Lab at

ENSICA (Toulouse, France). His PhD dissertation is titled: ‘‘DistributedDiscovery and Management of Alternate Internet Paths with EnhancedQuality of Service’’. Prior to that he was a senior research engineer at the

Motorola Australian Research Centre (MARC, now closed), where heworked with the Wireless Technology Lab (WTL) on Quality-of-Serviceoriented MAC protocols for Wireless LAN (WLAN).

Guillaume Jourjon received his PhD from theUniversity of New South Wales and the Tou-louse University of Science in 2008, his Engi-neering degree from the ENSICA, a Frenchaeronautical engineering school in Toulouseand his Master of Research in Telecommuni-cations and Networking. He is currently aResearcher at NICTA in the Network ResearchGroup. His research interests are in the opti-misation of distributed systems (i.e. energeticaspect and QoE), the management and controlof experimentation (testbed), measurement

architectures and live analysis, and e-learning platforms.

Max Ott is a Sr. Principal Researcher and aResearch Leader in the Networks ResearchGroup at NICTA. He is heavily involved inmany of the worldwide experimental researchfacility activities, such as GENI (US) and FIREwith many testbeds adopting the award-winning OMF control and managementframework. Before coming to NICTA he foun-ded Semandex, which is a pioneer provider ofContent-Based Networks, a newgeneration ofEnterprise Information Integration andKnowledge Management systems. At Seman-

dex he invented, designed, and directed the implementation of the coretechnology behind the company’s line of content routing products. He amstill serving on the board of Semandex. He is also holding a Research

Professor appointment at WINLAB, Rutgers University where he isresponsible for the software architecture of the ORBIT testbed. Prior tothat, he was a technical manager at the Communications and ComputingResearch Laboratory (CCRL), NEC USA where his responsibilities includedproviding technical leadership in the area of multimedia/Internet soft-ware within NEC CCRL’s network systems research activities. Before that,He was a visiting scientist at the NHK Science and Research Laboratories.He received a PhD in Electrical Engineering from the University of Tokyo,Japan, an M.S. from the Technical University Vienna, Austria, and anexcellent foundation from the HTBLA Steyr.