
EventWave: Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications

Wei-Chiu Chuang, Bo Sang, Sunghwan Yoo, Rui Gu, Milind Kulkarni
Purdue University
{weichiu,bsang,sunghwanyoo,ruigu,milind}@purdue.edu

Charles Killian
Purdue University and Google
[email protected]

Abstract

An attractive approach to leveraging the ability of cloud-computing platforms to provide resources on demand is to build elastic applications, which can dynamically scale up or down based on resource requirements. To ease the development of elastic applications, it is useful for programmers to write applications with simple sequential semantics, without considering elasticity, and rely on runtime support to provide that elasticity. While this approach has been useful in restricted domains, such as MapReduce, existing programming models for general distributed applications do not expose enough information about their inherent organization of state and computation to provide such transparent elasticity.

We introduce EVENTWAVE, an event-driven programming model that allows developers to design elastic programs with inelastic semantics while naturally exposing isolated state and computation with programmatic parallelism. In addition, we describe the runtime mechanism that exploits the exposed parallelism to provide elasticity. Finally, we evaluate our implementation through microbenchmarks and case studies to demonstrate that EVENTWAVE can provide efficient, scalable, transparent elasticity for applications run in the cloud.

1 Introduction

One of the major promises of cloud computing is the ability to use resources-on-demand. Rather than build

Copyright © 2013 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

SoCC'13, October 01–03 2013, Santa Clara, CA, USA
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2428-1/13/10...$15.00.
http://dx.doi.org/10.1145/2523616.2523617

Figure 1: A multiplayer game composed of buildings, rooms, hallways and players

out a large-scale infrastructure to support a networked service with fluctuating needs, companies can build elastic components that start running on a small-scale infrastructure (perhaps only a handful of hosts), then scale up as the usage or popularity of the service grows. Elasticity can also allow changing infrastructure size on a smaller time-scale: weekly, daily or even hourly variations in load could trigger the infrastructure size to change with the needs of the service. Further, elasticity could allow variable infrastructure size as a result of the availability of discount resources, or the sudden unexpected needs of the service in response to, e.g., flash crowds.

However, elastic infrastructure alone does not suffice to provide these benefits, because programmers must design their applications with elasticity in mind. Consider the simple multi-player game server shown in Figure 1. In this game, players wander in the virtual world. If a building is crowded with more players than one machine can handle, a logical way to scale up the system is to split the world into buildings and delegate the requests of different buildings to different nodes. The game can scale up further by splitting the buildings into rooms and hallways and delegating the requests to more machines. Any such program must explicitly account for elasticity, which means that it must support the arrival or departure of nodes from the system: safely “splitting” and “merging” the world state as the system expands and contracts without losing program state, transparently migrating state, handling partial failures of the now-distributed system, etc. And all of this without affecting gameplay semantics. While all of these tasks can


be programmed manually, implementing the necessary run-time mechanisms and reasoning about the resulting application is an exceptional burden for developers.

To accelerate the development of elastic applications for the cloud, what we need is a programming model where programmers need not be aware of the actual scale of the application, or of the runtime support that dynamically reconfigures the system to distribute application state across computing resources. To support complex, tightly coupled applications, such as the game server we described, such an elastic programming model should provide several key features: (i) stateful computation, (ii) transparent elasticity, and (iii) simple semantics. Unfortunately, existing programming models for writing elastic applications do not satisfy all of these criteria.

In recent years, there have been many systems that provide elasticity for various distributed applications. Scalable databases can adaptively distribute database state to scale up a database's capabilities [1, 4, 12]. However, these systems only target part of a program's functionality and do not, for example, provide elasticity for a program's computation.

Functional programming models, such as MapReduce [13], Dryad [19], and "bag of tasks" batch schedulers [24, 30], are based on models of computation where computations do not depend on prior system state, and individual tasks are independent of one another. As a result, the runtime system can freely distribute tasks among varying numbers of nodes without involving the programmer. However, in a game server, the game world consists of state, and the "tasks" consist of actions taken by players in interacting with this world and with each other, precluding a functional approach.

Actor-based programming models support simple, event-driven, stateful computation [17]: state is partitioned among isolated actors, each of which can be run on a separate node. The actors receive incoming events that manipulate their state, and actors interact with each other by passing messages that trigger additional events. Unfortunately, traditional actor-based models do not support elasticity, as an actor is conceived of as a monolithic unit that handles incoming events atomically. The scale of an actor-based system is determined by the number of actors.

A recent system, Orleans, supports elasticity in the actor model by replicating the state of actors, allowing multiple events to be processed simultaneously [8]. However, this state replication leads to complicated semantics for merging changes to actor state, and as a result, Orleans does not provide simple, sequential semantics. While this is acceptable in many applications, it may not be appropriate in certain cases. For example, in a game server, it seems desirable that events triggered by multiple players be resolved in some sequential order, and that every player observe that same order.

We propose EVENTWAVE, a new programming model for tightly-coupled, stateful, distributed applications that provides transparent elasticity while preserving sequential semantics. In EVENTWAVE, an application consists of a set of logical nodes. A logical node consists of some system state, and that state is manipulated by events that execute at the node. Logical nodes interact with each other by message passing. As in the actor model (and many other event-driven systems), EVENTWAVE logical nodes provide atomic event semantics: events the node receives appear to execute sequentially, and in order.

Our model: EVENTWAVE applications provide elasticity by allowing logical nodes to execute multiple events in parallel, and by allowing a single logical node to be distributed over multiple physical nodes (e.g., multiple machines in a cluster, or multiple virtual hosts). The fundamental guarantee of the EVENTWAVE model is that a logical node distributed over multiple physical nodes will maintain atomic event semantics. In other words, a programmer can reason about the behavior of her program without considering elasticity: if a program is correct when running on a set of logical nodes, its semantics will be preserved even as the program scales up or down by running on various physical nodes. Programmers focus on the logic, not the scale.

Our model is based on the insight that many applications decompose into a hierarchy of different contexts, namely that modular and component design and object-oriented programming have led to program designs where large sections of code are inherently bound to a subset of application state, and that the application state is often further bound to the parameters of the function calls. For example, a context-based design of a game server can be composed of contexts for each room and building in the game world, and contexts for each player. Events that manipulate just a portion of the state will execute in the context that state resides in. For example, when assessing which direction a player may move, the program needs to consider information only about the current room, and the players in it, not the global state.

To exploit the elasticity exposed by EVENTWAVE programs, we design a distributed run-time system. The system distributes the contexts of a single logical node across multiple physical nodes (for example, distributing different building contexts to different physical nodes). The group of physical nodes appears to the rest of the system as a single logical node, preserving the semantics of an inelastic application running on a fixed set of resources while providing the performance of an application running on a larger set of resources.


Figure 2: A game server logical node connected to several client nodes. Regardless of scale, clients see only one game server logical node. The actual composition of a logical node at runtime is transparent to the programmers, who see only the level of logical nodes.

void kidInit(int nKid) {
  Kid[nKid].location = LOCATION_IN_WORLD;
  Kid[nKid].kidDirection = DIRECTION_STATIONARY;
}

Figure 3: The code in Mace syntax

The runtime executes the tasks of a single logical node across multiple physical nodes, dispatching events in a particular context to the physical node where that context resides, and can transparently and dynamically migrate contexts to change the number of physical nodes constituting a logical node, scaling up application resources in response to demands.

Figure 2 illustrates a game server in EVENTWAVE. When running at small scales, all of the state resides on a single physical node. As the number of clients (players) increases, the game state can be dynamically distributed among more physical nodes, increasing the execution resources available to the system. As seen in the figure, the clients still interact with this newly-distributed server as though it were executing on a single node.

We build EVENTWAVE on top of Mace, a toolkit for writing event-driven distributed systems [22], providing context information through annotations. As a result, many applications can support elasticity with straightforward modification, similar to converting from C to C++. As an example, a short code snippet in Mace syntax is listed in Figure 3, and the corresponding code in EVENTWAVE syntax is listed in Figure 4.

After reviewing the key points of event-driven programming, we present the EVENTWAVE model, followed by successive levels of detail about our runtime system. We evaluate our runtime system using microbenchmarks and example applications before detailing related work and concluding.

[Kid<nKid>] void kidInit(int nKid) {
  location = LOCATION_IN_WORLD;
  kidDirection = DIRECTION_STATIONARY;
}

Figure 4: The code with context annotation

2 Event-driven programming

A popular approach to writing distributed applications is the event-driven programming model. Conceptually, an application runs on one or more logical nodes, which traditionally represent separate physical machines. A logical node's execution consists of processing events, which are self-contained units of computation, consisting of a set of method invocations. An event can be triggered externally, such as by the receipt of messages, or internally by other events.

We adopt the event-driven programming model for EVENTWAVE programs for two reasons. First, event-driven programming is a natural model for writing many distributed programs. Many applications (e.g., the multi-player game server) are designed and described in terms of reacting and responding to messages. The event-driven model is also a good fit for asynchronous distributed applications, such as the multiplayer game, where events are triggered by different external agents but must be handled in a coordinated manner. We also note that traditional task-graph style programs (e.g., Cilk programs) can be reformulated as event-driven programs, with each task represented as a separate event, and "spawns" of new tasks represented as starting a new event. Second, there is already substantial tool support for event-driven programming, including programming languages that make expressing event-based applications easy [22] and development tools such as model checkers to verify event-handling protocols [21]. Building on top of these existing tools, which have been used to implement numerous distributed applications, broadens EVENTWAVE's applicability.

While many event-based programming models use threads for low-level services such as timer scheduling and network I/O, most adhere to an atomic event model. In this model, the entire sequence of computational steps necessary to process an event must appear as though it executed atomically and in isolation from other events; the event processing is transactional. Thus, even if an event is triggered while another event is being processed, an application behaves in a sequentially consistent manner: the behavior of the application is as if the events were processed one-at-a-time, in the order they were received. This execution model is attractive because it allows programmers to easily reason about the behavior of their applications, even if many events are initiated simultaneously, or events actually run in parallel.

Typical atomic event systems preclude parallelism.


auto_types {
  Building {
    map<int, Room> rooms;
    Hallway hallway;
  }
  Room {
    set<int> kidsInRoom;
  }
  Hallway {
    array<array<int> > hallwayMap;
    set<int> kidsInHallway;
  }
  Kid {
    int currentBuilding;
    int currentRoom;
    coordinate coord;
    int kidDirection;
  }
}
state_variables {
  set<int> kidsInWorld;
  map<int, Building> buildings;
  map<int, Kid> kids;
}

Figure 5: State definitions of the example game in Figure 1

Instead, EVENTWAVE dispatches and executes events in parallel provided they are accessing disjoint state (this model is similar to, but richer than, that proposed by Yoo et al. based on read/write locking of application state [33]). The challenge, then, is to either detect or discover completely independent events. EVENTWAVE accomplishes this by extending the programming model to capture state isolation, and enhancing the runtime model to enforce event isolation, rather than either predicting (through static analysis) or detecting (through run-time checks) whether the developer correctly isolated events in their application. Section 3.3 discusses EVENTWAVE's approach to parallelism in detail.

3 EVENTWAVE

This section introduces EVENTWAVE, an event-driven, elastic programming model. We introduce the notion of contexts, which provide a hierarchical, dynamic partitioning of a node's state and associated operations on those partitioned states. We then describe an event model, which constrains how events can interact with contexts. This event model enables a parallel execution model that allows multiple events to run in parallel while preserving atomic event semantics. In Section 4, we describe how this event and execution model can support distributed execution—running the events of a single logical node across multiple physical nodes—a key step in enabling elastic execution.

3.1 Contexts

One key feature of event-driven distributed systems (unlike, e.g., dataflow models) is that logical nodes in the system contain mutable state. For instance, Figure 5 shows the application state definition for the simple

game in Figure 1. The state consists of self-contained state variables: players ("kids") running around a game world as well as buildings, rooms and hallways. Some state variables are part of player objects and capture, e.g., the player's information in the game world, such as the player's identifier. Other variables define the world: rooms and hallways reside in a building, and several buildings populate the world.

This state can be inspected and modified while responding to an event. Crucially, however, not every event requires accessing all the state in a node; instead, different events may actually access disjoint portions of a node's state. To capture this behavior, state in an EVENTWAVE logical node is organized into contexts. A context, as the name suggests, provides the environment in which an event can execute.

As an event executes, it exists in one or more contexts, and the context(s) in which an event executes control which portions of a program's state the event can access. Intuitively, two events that are executing in different contexts are accessing disjoint state, and hence can be executed in parallel (as we elaborate in Section 3.3).

A context in EVENTWAVE consists of a portion of an application's state. For example, in a game server, a "building" in the world might be one context, while a player of the game would have a separate context. Context definitions are analogous to structure definitions: there can be multiple instances of a particular context type (for example, multiple players in a game, or multiple buildings in a game world). To express this, each context has an associated identifier (e.g., a player's user name), which serves to distinguish different instances of the same context type.

Importantly, contexts are dynamic. A context is not explicitly instantiated by a programmer, but is instead instantiated when first referenced (by its identifier). For example, as new players enter a game, events are triggered with new player user names, creating the contexts in which those events execute. Intuitively, one can think of a map, for each context type, between context ids and contexts; when a particular context id is referenced, if the context exists in the map, it is used, otherwise a new context is created and added to the map. This approach to instantiation naturally fits the expected behavior of many event-driven programs (e.g., in a distributed hash table, creating new contexts as files are added or as new peers join). Figure 6 shows the state definition in Figure 5 translated into contexts. Each object in the original state is naturally translated into a context. For instance, the Room object becomes the Room context.
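For illustration only, the following C++ sketch (ours, not the EVENTWAVE implementation; all names are hypothetical) captures this create-on-first-reference behavior with one table per context type:

#include <map>
#include <memory>
#include <set>

// Hypothetical state held by one Room context instance (cf. Figure 6).
struct RoomContext {
    std::set<int> kidsInRoom;
};

// One table per context type, keyed by the context identifier.
// A context is instantiated the first time its identifier is referenced.
template <typename Ctx, typename Id>
class ContextTable {
public:
    Ctx& getOrCreate(const Id& id) {
        auto it = table_.find(id);
        if (it == table_.end())
            it = table_.emplace(id, std::make_unique<Ctx>()).first;  // first reference: create
        return *it->second;
    }
private:
    std::map<Id, std::unique_ptr<Ctx>> table_;
};

// Usage: referencing Room<5> for the first time creates that context.
//   ContextTable<RoomContext, int> rooms;
//   rooms.getOrCreate(5).kidsInRoom.insert(42);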

A hierarchy of contexts To this point, we have assumed that contexts in EVENTWAVE programs are "flat": there is no relationship between different contexts, and all contexts are independent. In reality, pieces


set<int> kidsInWorld;
context Building<int buildingID> {
  context Room<int roomID> {
    set<int> kidsInRoom;
  }
  context Hallway {
    array<array<int> > hallwayMap;
    set<int> kidsInHallway;
  }
}
context Kid<int kidID> {
  int currentBuilding;
  int currentRoom;
  coordinate coord;
  int kidDirection;
}

Figure 6: Context definition of the game in Figure 1

[Figure 7 diagram: the logical node's global context has children Building<0>, Building<1>, Kid<0>, Kid<1> and Kid<2>; each Building context has children Room<0> and Hallway.]

Figure 7: Sample context hierarchy

of state in an application have hierarchical relationships: one context can contain other contexts as sub-contexts. All explicitly declared contexts have a parent context. By default the parent context is global, but if a context is explicitly declared inside another context, the former's parent is the latter. For example, in Figure 6, the room and hallway contexts have a building context as their parent.

Figure 7 shows the context hierarchy for the example in Figure 6, with two buildings, rooms, hallways and multiple players. To name a specific context in the hierarchy, we use a double colon as the connector. For example, the first room in the first building is labeled Building〈0〉::Room〈0〉.

EVENTWAVE currently supports applications with one-to-many relations between contexts. Our future work will support applications with more complex interactions between contexts. For example, contexts may form a lattice, which can be used to represent many-to-many relationships: a building has many departments, and a department may be housed in many buildings.

3.2 Event model

As discussed in Section 2, event-driven programs can be thought of as a series of atomic events executing in response to various stimuli. EVENTWAVE supports multiple events executing simultaneously while preserving sequential, atomic event semantics (see Section 3.3). To accomplish this, the EVENTWAVE event model uses context information to restrict what events can do.

3.2.1 Context methods

An event in EVENTWAVE executes a series of methods. Each method m is associated with a context, c. We will write such methods as m[c]. A method m[c] can read and write state associated with c, but cannot access state associated with any other context.

Recall that contexts are identified by a context type and by a unique identifier. Methods are not associated with context types, but with specific contexts. The binding of methods to specific contexts occurs at run time: the context annotation takes the form c〈x〉, where x refers to a method argument. When the method is invoked, the value of this argument is used to bind the method to a particular context. Hence, the signature for a reportLocation method for a particular player p_id invoked in the Kid〈p_id〉 context would be:

[Kid<p_id>] reportLocation(int p_id)

3.2.2 Event execution

When an event starts, it invokes a single context method, m[c]—all other methods must be invoked from m[c]. This first context method is called a transition, also known as the event handler.

To read and write state in context c, the event must "acquire" c. This acquisition occurs implicitly by invoking methods in context c. An event may acquire multiple contexts by executing multiple methods, called routines. Acquiring multiple contexts within a single event ensures that a consistent state is seen across the contexts. Figure 8 shows example code in the EVENTWAVE language. A message delivery transition triggers a new event at a Building context, which calls routines synchronously to relocate a player from a room to the hallway. If the routines in the figure were implemented as independent events, consistency would not be guaranteed (e.g., the player could be in both the room and the hallway simultaneously).

Crucially, events cannot acquire contexts at will. If two events e1 and e2, with e1 logically earlier than e2, access c in the opposite order, e2's execution violates sequential semantics. Furthermore, if an event accesses multiple contexts, its operations across those contexts must appear atomic. While these problems could be addressed by requiring that only one event access the context hierarchy at a time, such a restriction precludes parallelism. Instead, we place several restrictions on events' behaviors that enable a parallel execution model that preserves sequential semantics (see Section 3.3).

If an event accesses multiple contexts, it must acquire them in a particular order. In particular, to access a context c, an event must either start by accessing c, or must have acquired a context higher than c in the context hierarchy. Because the context hierarchy creates a partial order of contexts, this access rule means that an event starts at a particular point in the hierarchy, and the set of contexts it can write to "grows" down in the hierarchy.
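As a small illustration of this acquisition rule (a sketch of ours, not the runtime's code; the parent map is a hypothetical stand-in for the hierarchy of Figure 7), an acquisition is legal only if it is the event's first acquisition or the event already holds an ancestor of the target context:

#include <map>
#include <set>
#include <string>

using ContextId = std::string;   // e.g. "Building<1>::Room<0>"

// 'parentOf' maps each context to its parent; the global context has no entry.
bool mayAcquire(const ContextId& target,
                const std::set<ContextId>& held,
                const std::map<ContextId, ContextId>& parentOf) {
    if (held.empty())
        return true;                 // the event starts by accessing this context
    // Otherwise the event must already hold a context higher in the hierarchy.
    auto it = parentOf.find(target);
    while (it != parentOf.end()) {
        if (held.count(it->second))
            return true;             // a held ancestor was found above 'target'
        it = parentOf.find(it->second);
    }
    return false;                    // no held ancestor: the acquisition is rejected
}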

If an event no longer requires write access to a context, it can call a special downgrade method to release


transitions {
  [Building<r.nBuilding>]
  deliver(from, to, RelocateRequest r) {
    downgrade();
    moveOut(r.kidID, r.nRoom, r.nBuilding);
    moveToHallway(r.kidID, r.nBuilding);
  }
}
routines {
  [Building<nBuilding>::Room<nRoom>]
  bool moveOut(int kidID, int nRoom, int nBuilding) { ... }

  [Building<nBuilding>::Hallway]
  bool moveToHallway(int kidID, int nBuilding) { ... }
}

Figure 8: A code snippet written in EVENTWAVE syntax for handling client requests

[Figure 9 diagram: five steps (1) to (5) showing the leading and trailing waves of an event e that starts at context B〈1〉, downgrades it, and then acquires R〈0〉 and H before releasing all contexts.]

Figure 9: Event waves and execution transitions of an event e. Contexts B〈0〉/B〈1〉 are abbreviations for Building〈0〉 and Building〈1〉, respectively. The context H is short for Building〈1〉::Hallway. The context R〈0〉 is short for Building〈1〉::Room〈0〉.

access to the context. However, once an event releases write access to a context, it cannot reacquire access to that context (though it may still read its state). A context c cannot be downgraded unless the event has already released access to all of c's ancestors in the hierarchy. Absent downgrades, an event only releases access to its contexts when it completes.

We note the similarity of the context access restrictions to two-phase locking approaches for isolation. In the absence of downgrades, acquiring contexts is analogous to strict two-phase locking [5], while the addition of downgrades produces a protocol analogous to tree locking [2]. As in two-phase locking, the ordering imposed on the acquisition of contexts by the context hierarchy helps prevent deadlock (as we shall see in the next section). Downgrades complicate the model somewhat, but the ordering on contexts still suffices to enable safe, parallel execution.

Intuitively, the execution of an event can be reasoned about in terms of two event waves: the leading wave and the trailing wave. The region occupied by the leading wave corresponds to the contexts that the event has write access to; the region occupied by the trailing wave corresponds to the contexts that the event has downgraded, but can still read. The trailing wave cannot pass the leading wave, as that would violate the model.

Figure 9 plots an event executing the methods in Figure 8, and shows how context acquisitions and downgrades affect the event waves. When an event starts in the deliver method (as shown in step (1)), it only has write access to context Building〈1〉. By downgrading the context, the trailing wave moves down in step (2). The event then calls the moveOut() method synchronously to acquire the context Building〈1〉::Room〈0〉, moving the leading wave down (shown in step (3)). Subsequently, it calls moveToHallway() synchronously to acquire the context Building〈1〉::Hallway (step (4)). Finally, the event releases all contexts in step (5) when it finishes.

Events can create new events by calling methods asynchronously. An asynchronous method invocation, conceptually, is identical to starting a new event and immediately calling the target of the asynchronous invocation. The new event is a separate unit of computation from its parent event, and only owns the context(s) associated with the initiating method. Asynchronous methods are roughly analogous to asynchronous calls such as "spawns" in Cilk [6] or "async" in X10 [9]; unlike in those languages, however, events triggered by asynchronous methods in EVENTWAVE are fully decoupled from their parent event after the parent event commits.

3.3 Parallel execution model

The EVENTWAVE context and event model is coupled with a parallel execution model that constrains the collective behavior of multiple events executing simultaneously. When an event is initiated (e.g., the message triggering it is received, or the asynchronous method initiating it is called), it is given a logical, monotonically-increasing timestamp that orders it with respect to all other events in the system (in the case of a distributed program, with respect to all other events on a particular node). EVENTWAVE events can execute in parallel as long as their behavior corresponds to the atomic event model: the behavior of each event should be consistent with an execution where the events executed sequentially according to their timestamp order. In other words, events appear to execute atomically, in isolation, and in timestamp order.

Because contexts are self-contained collections of state and code, an event executing in one context cannot affect the behavior of a different event executing in another context. Hence, EVENTWAVE allows events to execute in parallel as long as any context is held by at most one event at a time.

In terms of the waves of events, consider events e and e′, where e′ has a later timestamp than e. For basic correctness and limited parallelism, the leading wave for e′ must always be above the trailing wave for e, as event e can read the contexts in the trailing wave, and must not read any modifications made by e′.

To improve parallelism, we note that once an event


[Figure 10 diagram: the leading and trailing waves of event e and of the later event e′, shown over the context hierarchy of global, B〈0〉, B〈1〉, H and R〈0〉.]

Figure 10: Simultaneous view of two events e and e′ in an execution that adheres to EVENTWAVE.

e downgrades from a context, moving down its trailing wave front, it only needs read access to the context. Hence, the EVENTWAVE runtime can take a snapshot of the current state of the downgraded context before moving down the wave front. This is similar to snapshot isolation in databases [3]. The difference is that snapshot isolation reads the committed data when the transaction starts, but in EVENTWAVE, a snapshot of a context is taken when the event releases the lock. The event model ensures that once the trailing wave front moves down, the event will never modify the context. Hence it is safe for a later event, e′, to move its own leading wave front below the context—i.e., to begin accessing the context. Future reads of the context by e will read from the snapshot, preserving isolation. Effectively, this relaxes the restrictions on events so that an event's leading wave must only remain above earlier events' leading waves, rather than above their trailing waves. Figure 10 illustrates two concurrent events which obey the model.
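A simplified sketch of this snapshot-on-downgrade behavior (ours; names are hypothetical, and it assumes the event acquired the context's write lock earlier):

#include <map>
#include <mutex>
#include <string>

// Hypothetical context state: a blob of named integers.
using State = std::map<std::string, int>;

struct Context {
    std::mutex writeLock;   // at most one event writes the context at a time
    State state;
};

struct Event {
    std::map<Context*, State> snapshots;   // read-only copies of downgraded contexts

    // Downgrade: copy the state at release time, then let a later event in.
    void downgrade(Context& c) {
        snapshots[&c] = c.state;   // snapshot taken when the lock is released
        c.writeLock.unlock();      // a later event may now acquire the context
    }

    // Reads after a downgrade come from the snapshot, preserving isolation.
    int read(Context& c, const std::string& key) const {
        auto s = snapshots.find(&c);
        const State& st = (s != snapshots.end()) ? s->second : c.state;
        auto v = st.find(key);
        return v != st.end() ? v->second : 0;
    }
};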

Finally, to preserve timestamp order, an event that has completed execution cannot commit until all events with earlier timestamps have committed. In particular, any new events triggered by an event's execution (e.g., sent messages, asynchronous calls) are delayed until after the event commits. Note that this property means that an event triggered by an asynchronous call logically starts and completes after its parent event finishes.
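The commit rule itself can be pictured as a queue ordered by timestamp, with an event's buffered side effects (messages, asynchronous calls) released only when it commits. A minimal, single-threaded sketch of ours:

#include <cstdint>
#include <functional>
#include <map>
#include <vector>

struct CompletedEvent {
    uint64_t timestamp;
    std::vector<std::function<void()>> deferred;   // buffered sends / async calls
};

class Committer {
public:
    explicit Committer(uint64_t firstTimestamp) : next_(firstTimestamp) {}

    // Called when an event finishes executing.  Events commit strictly in
    // timestamp order; an event that completes out of order waits in 'pending_'.
    void finished(CompletedEvent e) {
        const uint64_t ts = e.timestamp;
        pending_.emplace(ts, std::move(e));
        while (!pending_.empty() && pending_.begin()->first == next_) {
            for (auto& sideEffect : pending_.begin()->second.deferred)
                sideEffect();               // released only at commit time
            pending_.erase(pending_.begin());
            ++next_;
        }
    }
private:
    uint64_t next_;                              // next timestamp allowed to commit
    std::map<uint64_t, CompletedEvent> pending_; // completed but uncommitted events
};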

A simple strategy to ensure sequential semantics is to force all events to start at the global context, and make their way down through the context hierarchy to perform their operations. Because of the restrictions on events' wave fronts, later events will remain "above" earlier events in the hierarchy, and events will appear to execute in sequential order. Because EVENTWAVE allows events to be initiated at deeper contexts in the hierarchy, when an event starts at context c, a "dummy" event begins at the global context and implicitly attempts to acquire the global context, then acquires lower-level contexts and immediately downgrades them until c is reached. It is clear that this strategy introduces some scalability issues (every event must essentially access every context). Section 4.3 presents a mechanism to mitigate this issue.

The next section discusses how the execution model outlined above can be satisfied in a runtime to distribute a logical node over multiple physical nodes and/or parallel-executing threads while still preserving the illusion

[Figure 11 diagram: a logical node consisting of a head node and physical nodes 1 through n. The numbered steps are: 1. receive an event; 2. execute the event; 3. continue executing the event in another context; 4. create a subevent; 5. commit the event; 6. execute the subevent.]

Figure 11: Events processed by a logical node distributed over multiple physical nodes

that a logical node is running on a single physical node with one thread. Section 5 discusses how the same mechanism can be used to support migration of contexts during elastic execution.

4 Distributed execution model and run-time

This section presents the EVENTWAVE distributed run-time system, which implements the execution model described in Section 3.3.

When running an EVENTWAVE application, the straightforward approach is to request one physical node per logical node. Execution can proceed as in the standard, inelastic event-driven model. If the physical node has multiple threads, the EVENTWAVE runtime can take advantage of events in disjoint contexts to provide additional throughput. If, however, the load on the system increases beyond the capability of a single physical node, the EVENTWAVE distributed runtime system supports distributing the execution of a single logical node across multiple physical nodes. In other words, an application written to run on n logical nodes can be run on m physical nodes, where m > n, providing additional throughput. When combined with dynamic migration (Section 5), which supports changing m over the course of execution, the runtime enables elastic execution of distributed applications.

For the remainder of this section, we will assume that the EVENTWAVE application is written for a single logical node; the mechanisms easily generalize to applications with multiple logical nodes.

4.1 Overview

A high-level overview of the EVENTWAVE distributed runtime system, and how it processes events, is shown in Figure 11. The execution of a single logical node is distributed across multiple physical nodes. A single head node serves as the representative of the logical node: all communication with the outside world is intermediated by the head node (e.g., all messages sent to this logical node are routed to the head node, and all messages sent by this logical node are sent from the head node).


All events processed by the logical node are initiated by the head node. Upon receiving an event (1), the head node assigns the event a timestamp and signals a worker node to process the event (2). Deciding which worker node gets an event is discussed in the next section. As the event executes, the worker may "pass" the event to a different worker to continue execution (3). If the event makes an asynchronous call, the new event is enqueued at the head node (4). Once the event's execution is complete, the worker currently handling the event signals the head node that the event is ready to commit (5). The head, having a global view of the logical node's execution, can ensure that all events will commit in timestamp order. As in the basic parallel execution model, once an event commits, any new events it may have triggered are initiated. In particular, any asynchronous methods called by the event will now begin, starting from the head node as with other events (6).

Requiring a single physical node to mediate all the operations of the logical node introduces a bottleneck: the throughput of the logical node is constrained by how efficiently the head node can manage execution. We note that most of the work performed by the head node is bookkeeping, while most computation will be performed by other physical nodes, somewhat mitigating this drawback. A runtime system that allows the head node's execution to be distributed across multiple physical nodes is an important topic for future work.

4.2 Context-based distribution

When a head node begins processing an event, how does it determine which worker node to send the event to? There are many issues to consider: locality (events should be assigned to workers containing the state they must access), communication (events should be distributed to minimize their transfer between worker nodes), and load balance (worker nodes should perform similar amounts of computation).

To address these issues, EVENTWAVE uses context-based distribution. The global context is mapped to the head node, but other contexts are mapped to worker nodes. When an event needs to execute in a particular context, it is sent to the worker node to which that context is mapped. If an event accesses multiple contexts during its execution, its execution will be split among multiple worker nodes, being passed back and forth according to the context it is currently in.
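The routing decision reduces to a lookup in a context-to-node mapping, as in the following sketch (ours; the types and the fallback are hypothetical):

#include <map>
#include <string>

using ContextId = std::string;   // e.g. "Building<1>::Room<0>"
using NodeId    = int;           // identifier of a physical node

class ContextMapping {
public:
    explicit ContextMapping(NodeId head) : head_(head) {}

    void place(const ContextId& c, NodeId n) { map_[c] = n; }

    // The global context lives on the head node; every other context is
    // looked up in the mapping, and the event is shipped to that node.
    NodeId nodeFor(const ContextId& c) const {
        if (c == "global") return head_;
        auto it = map_.find(c);
        return it != map_.end() ? it->second : head_;   // unmapped: fall back to head
    }
private:
    NodeId head_;
    std::map<ContextId, NodeId> map_;
};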

Because contexts capture both a portion of the program's data and the computations that act on them, this mapping strategy naturally accomplishes several of the goals laid out above. Locality is achieved by the data-centric nature of the mapping. Rather than a task-based mapping, where an event is assigned to a worker node and the data may need to be accessed remotely,

[Figure 12 diagram: timelines of events e1 and e2 contending for the lock on Room〈0〉 under the context hierarchy, showing when each event holds the lock and when e2 blocks.]

Figure 12: Left: e1 acquires the context synchronously after e2, and violates the model. Right: e2 waits for e1; e2 acquires the lock successfully after e1 downgrades.

in EVENTWAVE, events are sent to the data. Hence an event always accesses only local data.

Context mapping The mapping of contexts to nodes affects communication and load balance. A simple heuristic for reducing communication cost is to consider the hierarchical nature of contexts: an event in a context c is more likely to also interact with a context c′ that is a descendant of c. Hence, c and c′ should be mapped to the same physical node.

If the distribution of events visiting each context is known a priori, then contexts can also be mapped to nodes to achieve load balance. However, specific heuristics for mapping are beyond the scope of this paper.
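Purely as an illustration of the co-location heuristic above (concrete policies are, as noted, outside the paper's scope; all names here are hypothetical), one possible placement assigns every context to the worker that owns its top-level ancestor, so a context and its descendants share a physical node:

#include <map>
#include <string>
#include <vector>

// Context ids are path-like strings, e.g. "Building<1>::Room<0>".
// Top-level subtrees are handed out round-robin; descendants follow their root.
std::map<std::string, int> assignBySubtree(const std::vector<std::string>& contexts,
                                           int numWorkers) {
    std::map<std::string, int> placement;
    std::map<std::string, int> rootOwner;
    int next = 0;
    for (const auto& c : contexts) {
        const std::string root = c.substr(0, c.find("::"));   // top-level ancestor
        auto it = rootOwner.find(root);
        if (it == rootOwner.end())
            it = rootOwner.emplace(root, next++ % numWorkers).first;
        placement[c] = it->second;     // same node as the rest of the subtree
    }
    return placement;
}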

4.3 Parallel execution of distributed events

Once the mapping of contexts to physical nodes is accomplished, the final step is to ensure that the atomic event model is preserved by distributed execution. While the head node dispatches multiple events to worker nodes simultaneously, sequential commit order is guaranteed by the mediation of the head node in every event's execution. Because events return to the head node before committing, the head node can delay an event's completion until all earlier-timestamped events complete. The trick, then, is to ensure that the parallel execution of (ordered) events preserves the atomic event model. In particular, we must ensure that (a) only one event is in a context at a time; and (b) events access contexts in an order consistent with their timestamp order.

Accomplishing the first goal is straightforward. Every context has a lock associated with it. When an event enters a context by calling a method associated with that context, it must first acquire the lock on that context. That lock is held until the event either completes or explicitly downgrades from that context.

More challenging is ensuring that events acquire contexts according to their timestamp order. In particular, given events e1 and e2, with e1 logically earlier than e2, for any context c that both e1 and e2 want to access, e1 must acquire a lock on c before e2; e2 cannot access c until e1 releases c. While it may seem that the hierarchical nature of contexts enforces this ordering (there seems to be no way for e2 to get to c before e1 does without


“passing” e1 in the context hierarchy), we note that an event can start lower in the hierarchy, as mentioned in Section 3.3. Considering the context hierarchy from Figure 7, e1 could be executing in the global context, while a second event e2 begins in context Room〈0〉. If e1 then wants to descend to Room〈0〉, it would arrive at the context after e2 and violate the atomic event model, as illustrated in Figure 12.

As described in Section 3.3, this problem can be solved using dummy events, which clearly introduces scalability issues, as every event must touch every context. To avoid this, dummy events are not eagerly propagated through the context hierarchy, but are instead "batched" and propagated through the tree in a group.

To implement this strategy, each context has a ticket booth that contains an integer counter indicating which events (identified by timestamp) have accessed the context already. Hence, the current value of the ticket booth represents the next event the context "expects" to see. When an event e starts in context c, a dummy event with the same timestamp is issued to the global context. If the global context does not yet expect the dummy event, it is enqueued at the global context. When an event that the global context expects begins, it not only moves through the context tree itself, but carries with it all dummy events whose timestamps immediately follow it, and increments the ticket booth at the global context appropriately. For example, suppose the global context expects an event e with timestamp 7, while dummy events with timestamps 8, 9 and 11 are enqueued. When e executes, it propagates the events with timestamps 8 and 9, but not the one with timestamp 11. The ticket booth at the global context now expects timestamp 10.

The same procedure is used at all contexts. If a group of dummy events with timestamps x+1, ..., x+k, ... batched with an event e (with timestamp x) reaches a context c where an actual event e′ with timestamp x+k is waiting, the group is split: dummy events x+1, ..., x+k−1 continue through the context hierarchy with e, while dummy event x+k terminates, and events x+k+1, ... are now batched with the actual event e′, which can begin executing once e releases access to c.
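The per-context bookkeeping can be sketched as follows (a simplification of ours that tracks only the expected timestamp and the queued timestamps; the batch splitting described above is left out):

#include <cstdint>
#include <set>

// Ticket booth for one context: the next timestamp the context expects,
// plus the timestamps queued here because they arrived "early".
class TicketBooth {
public:
    explicit TicketBooth(uint64_t first) : expected_(first) {}

    // An event (real or dummy) with timestamp 'ts' reaches this context.
    // Returns true if it may pass now; otherwise it waits in the queue.
    bool arrives(uint64_t ts) {
        if (ts != expected_) { waiting_.insert(ts); return false; }
        ++expected_;
        // The passing event carries along every queued timestamp that
        // immediately follows it (e.g. 8 and 9 after 7, but not 11).
        while (!waiting_.empty() && *waiting_.begin() == expected_) {
            waiting_.erase(waiting_.begin());
            ++expected_;
        }
        return true;
    }

    uint64_t expected() const { return expected_; }

private:
    uint64_t expected_;
    std::set<uint64_t> waiting_;
};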

5 Dynamic migration

With the distributed runtime system described in the previous section, EVENTWAVE programs written for n logical nodes can run on m physical nodes. However, the ability to run a distributed program across additional nodes is only one part of the elasticity story. Because the goal of elastic programs is to increase their throughput in response to demand, and throughput is increased by running a program on additional physical nodes, we must have a mechanism for dynamically changing the number of physical nodes that a logical node is

[Figure 13 diagram: a timeline across the head node, the old node and the new node. An earlier event e1 releases the lock and gets a snapshot; the migration event em replicates the context, removes the old context, duplicates the context on the new node and releases the lock; a later event e2 arrives, waits for em, and then acquires the lock; e1 commits, and em commits and removes the old map.]

Figure 13: An example of dynamic migration

running on, migrating computation and state to new physical nodes. Furthermore, this migration should be transparent. The application must be able to change its physical footprint without affecting any users; throughput and responsiveness might vary during migration, but users should not observe any difference beyond increased latency. Finally, the developer's programming model and mental model of the application should not have to explicitly account for migration. By supporting dynamic, transparent migration in the existing EVENTWAVE programming model, the runtime provides the infrastructure necessary to develop elastic applications.

The basic approach that EVENTWAVE takes to elasticity is to support dynamic migration of contexts. Namely, during execution, a context's physical location can move. Because we want this migration to occur transparently, there are a couple of problems that must be addressed. First, because events execute within contexts, the distributed runtime must account for migration correctly to ensure that events are forwarded to the appropriate location. Second, because a migration might occur while an event has access to a context, the runtime must make sure that events can safely deal with a context that might move during their execution.

Migration algorithm

To achieve dynamic, transparent migration of a context to a new physical node, the runtime must do two things: (i) replicate the state of the context to the new node; and (ii) update the context-node mapping to reflect the new context location. The first task is a prerequisite to migration: because the goal is to move computation to another physical node, the data associated with that computation must be moved as well. The second task is necessary to ensure that future events that access the context are routed to the correct physical node.

Figure 13 shows the execution of dynamic migration. When a migration request is issued, the head node creates a special "migration event," which is inserted into the same event queue as normal application-generated events. Conceptually, all events before the migration event will use the old context mapping, while all events after the migration event will see the new context mapping.

The migration event changes the mapping of contexts


to physical nodes, but instead of updating the context mapping immediately, the runtime system keeps several versions of the mapping at the same time, so that every event before the migration event will use the old mapping, and the ones after the migration will use the new version. By keeping multiple versions of the mapping, the runtime avoids the pitfall mentioned above: events that already have access to the context will not observe the mapping change, and hence migration will be transparent. After the new context map is created, it is sent to every physical node. Just as any other event, the migration event accesses the context to be migrated, acquiring a lock on it. Once it gains access to the context (ensuring that earlier events are done modifying the context), it replicates the state of the context and transfers it to the new physical node.

After transferring state, the migration event removes the old context. Note that this is safe: the runtime will ensure that events later than the migration event will read from the new maps, and hence those events will access the replicated version of the context on the new physical node. Earlier events have already released the migrated context and taken a snapshot. Thus, even though the context is now at a new physical node, those earlier events will simply read from their snapshots and continue without interruption. Finally, when the migration event commits, it removes the old mapping. At this point, all events that may have needed the old mapping will have committed, so it is no longer necessary.
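One way to picture the multi-version mapping (a sketch of ours, not the implementation): each mapping is stored under the timestamp at which it takes effect, and an event resolves contexts against the newest version that is not newer than its own timestamp:

#include <cstdint>
#include <map>
#include <string>

using ContextId = std::string;
using NodeId    = int;
using Mapping   = std::map<ContextId, NodeId>;

class VersionedMapping {
public:
    // Install a mapping that applies to events ordered at or after 'version'
    // (e.g. the timestamp of the migration event).  An initial mapping is
    // assumed to be installed with version 0.
    void install(uint64_t version, Mapping m) { versions_[version] = std::move(m); }

    // An event with timestamp 'ts' uses the newest mapping with version <= ts,
    // so in-flight earlier events keep seeing the pre-migration placement.
    const Mapping& forEvent(uint64_t ts) const {
        auto it = versions_.upper_bound(ts);   // first version strictly newer than ts
        --it;                                  // step back to the applicable version
        return it->second;
    }

    // Once the migration event and everything earlier has committed,
    // versions older than the one still needed can be discarded.
    void garbageCollect(uint64_t ts) {
        auto keep = versions_.upper_bound(ts);
        if (keep != versions_.begin()) --keep;   // newest version <= ts stays
        versions_.erase(versions_.begin(), keep);
    }
private:
    std::map<uint64_t, Mapping> versions_;   // effective-from timestamp -> mapping
};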

With the mechanism sketched above, a critical challenge is to develop the policy for migration. We discuss application-specific policies in Section 7, but a general policy is beyond the scope of this paper.

6 Fault Tolerance

One pressing problem that arises in elastic programs is fault tolerance. If a program initially run on n physical nodes is expanded to run on 2n physical nodes, the likelihood of a node failure increases commensurately. If the failure of a single node were able to bring down the entire program, elasticity would result in more failure-prone applications. A valuable property for an elastic system to have is that the failure probability of an application remains fixed, regardless of how many physical nodes an application's logical nodes are distributed over.

Our current implementation of the EVENTWAVE model does not provide this guarantee. However, in earlier work, we described how existing EVENTWAVE mechanisms can be readily repurposed to guard against physical node failures and integrated with existing systems to recover from logical node failures [10]. We briefly summarize this approach here for completeness.

Recall that our runtime system implements three features to support distributed, parallel execution of events: (i) events execute transactionally and in sequential order to preserve atomic-event semantics; (ii) events take snapshots of contexts as they execute; and (iii) all externally-visible communication is handled by the "head" node, with incoming messages triggering events, and outgoing messages deferred until the event commits. These features directly enable adding physical-node fault tolerance with little change to the runtime, by implementing a basic checkpoint-and-rollback system: when an event commits at the head node, the context snapshots associated with the event are used to create checkpoints. If a physical node fails, the contexts on the failed node are rolled back using the committed snapshots.
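A minimal sketch of that checkpoint-and-rollback idea (our simplification of the approach summarized here; the types are hypothetical):

#include <cstdint>
#include <map>
#include <string>
#include <utility>

using ContextId = std::string;
using Snapshot  = std::string;   // serialized context state (hypothetical)

class CheckpointStore {
public:
    // When an event commits at the head node, the snapshots it took become
    // the latest durable checkpoints of the contexts it touched.
    void onCommit(uint64_t ts, const std::map<ContextId, Snapshot>& snaps) {
        for (const auto& kv : snaps)
            latest_[kv.first] = std::make_pair(ts, kv.second);
    }

    // If a physical node fails, each context it hosted is rolled back to
    // its last committed snapshot.
    Snapshot recover(const ContextId& c) const {
        auto it = latest_.find(c);
        return it != latest_.end() ? it->second.second : Snapshot{};
    }
private:
    std::map<ContextId, std::pair<uint64_t, Snapshot>> latest_;
};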

The head node must also be fault-tolerant, since it provides necessary coordination; if it fails, the logical node will fail. Since the head node looks like the entire logical node to the rest of the system, any application- or logical-node-level fault tolerance approach can be used to mask a head node failure. MaceKen [32] is a Mace extension integrating the Ken reliability protocol, which can potentially mask head node failure. It records committed events to its local persistent storage. When a head node fails, it is simply restarted and its state is restored using the local persistent storage. The state of the entire logical node is thus rolled back to the last committed event. The Ken protocol guarantees that external observers cannot distinguish a restarted logical node from a slowly-responding logical node: every external message is explicitly acknowledged, and any unacknowledged messages are retransmitted until they are acknowledged.

7 Evaluation

We implemented EVENTWAVE as an extension of the Mace runtime [22], adding 15,000 lines of C++ code. We also modified the Mace compiler to parse the context definitions and annotations and translate them into runtime API calls; this modification added 5,000 lines of Perl code.

We note that the EVENTWAVE language eases development effort. As one example, the elastic key-value store application described below was written in less than 20 lines of code, while an equivalent C++ implementation is about 6,000 LOC; the EVENTWAVE programming model allows programmers to concentrate on application semantics, leaving the complexity of distribution, migration and elasticity to the runtime.

Because EVENTWAVE is a new programming model for elastic applications, there are no standard benchmarks against which to evaluate it. Instead, we evaluate EVENTWAVE in two phases. First, we use a synthetic microbenchmark that allows us to vary the number and size of independent contexts, to evaluate the performance of the EVENTWAVE system under various scenarios. We then use two application case studies, a key-value store and a game server, to study EVENTWAVE's ability to use elasticity to maintain performance under dynamic load.


Figure 14: Microbenchmark throughput (processed events/sec vs. number of physical nodes). (a) Smaller scale, from 1 to 8 nodes (P = 0, 100, 150, 200, 300, 400). (b) Larger scale, from 16 to 128 nodes (P = 450, 500, 550).

The results of the microbenchmark and key-value store experiments were obtained on our lab cluster; each node has eight 2.33 GHz Intel Xeon cores and 8 GB of RAM, connected via 1 Gbps Ethernet.

7.1 Microbenchmark

We evaluate three different aspects of EVENTWAVE with a microbenchmark. The microbenchmark consists of a generator that produces a series of events; each event chooses one of 160 independent contexts in round-robin fashion and performs a specified amount of work.
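For concreteness, the generator can be pictured as the following loop. This is a hypothetical sketch: dispatch stands in for routing an event to the chosen context through the EVENTWAVE runtime, and the parameter names are our own.

    // Hypothetical sketch of the microbenchmark generator; the real
    // benchmark is written against the EventWave/Mace runtime.
    #include <cstdint>
    #include <functional>

    constexpr int kNumContexts = 160;  // independent contexts in the benchmark

    // Each event picks a context round-robin and is dispatched with
    // `workPerEvent` units of busy work to perform inside that context.
    void generate(uint64_t numEvents, uint64_t workPerEvent,
                  const std::function<void(int, uint64_t)>& dispatch) {
      for (uint64_t i = 0; i < numEvents; ++i) {
        int ctx = static_cast<int>(i % kNumContexts);  // round-robin choice
        dispatch(ctx, workPerEvent);
      }
    }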

Throughput. Figure 14a plots the throughput of the microbenchmark (events processed per second) for different amounts of work (P) and different numbers of physical nodes (N). P = 0 implies no real work in the event, so the throughput effectively measures the maximum throughput the EVENTWAVE runtime supports. As the number of physical nodes increases, throughput does not drop; in fact, it increases, as even with no work to be performed, processing an event still has some parallelizable components. When P is increased, each event does more work, so the overall throughput of the system drops, as seen when N = 1. Increasing the number of physical nodes increases the computational resources available to the application and hence recovers the lost performance, until the application is once again running at maximum performance.

Figure 15: Throughput change before/after migration at time 160 (processed events/sec vs. time in seconds). (a) Context size 0 MB. (b) Context size 100 MB.

As the amount of work increases, it takes more physical nodes to recover maximum performance, but the overall trend is stable. We also show microbenchmark results at larger scale, up to 128 nodes with larger workloads, in Figure 14b.

We can draw two conclusions: (i) the maximum throughput of EVENTWAVE applications does not drop as a logical node is spread over more physical nodes; and (ii) EVENTWAVE is able to effectively harness the computational resources of multiple physical nodes to maintain performance under higher levels of load.

Migration Overhead. Figure 15 plots the overhead of migrating a single context from one physical node to another at time 160. When the context is small (Figure 15a), migration has no impact on average throughput. When it is large (Figure 15b), the migration event must serialize the context, send it to the destination physical node, and deserialize it. Even though other events accessing different contexts continue to execute in parallel with the migration event, they cannot commit until the migration event does. Hence, events back up behind the migration event, leading to a transient drop in throughput during migration. Once the migration event commits, throughput temporarily exceeds the long-run average throughput as all of the backed-up events can be quickly committed. In both cases, migration does not have a long-term impact on throughput.

Figure 16: Migration latency (ms) for context sizes from 0.0 MB to 12.8 MB.

Migration Latency. Figure 16 plots the latency of migration events for different context sizes. Migration latency is proportional to the serialized size of the context, and is largely determined by network throughput and the speed of serialization/deserialization. Since migration does not require global synchronization, it is fast. For contexts smaller than 1 MB, the latency is negligible. We note that virtual machine live migration protocols such as VNsnap [20] may also be applied to reduce service disruption. Interestingly, context-based application state distribution may help live migration protocols, because a context represents a fine-grained chunk of application state that can be migrated independently of other contexts.
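As a rough first-order model (our own approximation, not a formula from the evaluation), the latency of migrating a context of serialized size S over a link of bandwidth B behaves approximately as

    t_migrate ≈ t_serialize + S / B + t_deserialize,

which is consistent with the observed proportionality between migration latency and context size.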

7.2 Elastic Key-Value Store

We implemented the elastic key-value store shown in Figure 18 using EVENTWAVE to evaluate its performance as an application-level benchmark. Note that we claim no novelty for the key-value store itself. Instead, with the programming model and runtime support described, we show that it is easy to write an elastic key-value store using less than 20 lines of code, whereas an equivalent C++ implementation is about 6,000 LOC. This evaluation demonstrates how elasticity can greatly help a memory-constrained system. The application consists of two logical nodes: one is the client (1 node) and the other is the server. On the server side, one physical node is used as the head node while one or more physical nodes hold the key-value pairs in memory. The keys are grouped into buckets using a hash function, and each bucket gets its own context; in this way, operations on independent buckets proceed in parallel.

The experiment examines the behavior of performing a series of puts and gets on the key-value store.

Figure 17: Key-value store request latency (sec) and memory/swap usage (%) over time. (a) Single-node key-value store without migration. (b) Key-value store with migration, scaling from 1 to 2, 2 to 4, and 4 to 8 nodes.

    context Bucket<int> {
      map<string, string> kvmap;
    }
    messages {
      Get { string k; }
      GetReply { string k; string v; }
      Put { string k; string v; }
    }
    transitions {
      [Bucket<hash(msg.k)>] deliver(src, dest, Get& msg) {
        route(src, GetReply(msg.k, kvmap[msg.k]));
      }
      [Bucket<hash(msg.k)>] deliver(src, dest, Put& msg) {
        kvmap[msg.k] = msg.v;
      }
    }

Figure 18: An elastic key-value store implementation

As the experiment runs and more puts are performed, the size of the key-value store increases. As a result, the nodes maintaining the store eventually exceed their physical memory capacity, and swapping occurs. Figures 17a and 17b plot the latency of put and get requests (measured at the client) over time, along with physical memory usage. In Figure 17a, an inelastic configuration is used: the key-value store is kept on a single physical node, and once swapping begins, the latency of operations unsurprisingly becomes more variable and, on average, much higher.

Figure 17b demonstrates the power of EVENTWAVE's elasticity. Because different buckets in the key-value store are different contexts, the EVENTWAVE runtime can easily migrate some buckets to other physical nodes, dynamically expanding the total amount of physical memory available to the application. We implement a simple migration policy: when the system's physical memory usage exceeds 80%, the number of physical nodes is doubled and each existing node migrates half of its contexts to a new node.
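A minimal sketch of this scale-up policy follows. The data structures and names are hypothetical, and the actual transfer of each bucket is performed by the runtime's context-migration mechanism rather than the direct list manipulation shown here.

    // Hypothetical sketch of the memory-triggered scale-up policy used
    // in the key-value store experiment; illustrative only.
    #include <cstddef>
    #include <vector>

    struct Node {
      std::vector<int> contexts;  // context (bucket) ids hosted on this node
      double memUsage = 0.0;      // fraction of physical memory in use
    };

    // If any node exceeds the threshold, double the number of physical
    // nodes: each existing node hands half of its contexts to a new node.
    void maybeScaleUp(std::vector<Node>& nodes, double threshold = 0.8) {
      bool pressured = false;
      for (const auto& n : nodes)
        if (n.memUsage > threshold) pressured = true;
      if (!pressured) return;

      std::size_t original = nodes.size();
      for (std::size_t i = 0; i < original; ++i) {
        Node fresh;
        std::size_t half = nodes[i].contexts.size() / 2;
        fresh.contexts.assign(nodes[i].contexts.end() - half,
                              nodes[i].contexts.end());
        nodes[i].contexts.resize(nodes[i].contexts.size() - half);
        nodes.push_back(fresh);  // in the real system: issue migration events
      }
    }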


Figure 19: Performance of the game server with and without elasticity (tag servers on EC2). (a) Request latency (sec) and number of connected clients over time. (b) Number of physical nodes used by the server over time.

The figure demonstrates three such migration events, at which point the application is using eight physical nodes. We note several points: (i) the sequential commit policy of EVENTWAVE means that some events that get "trapped" behind a migration event exhibit much longer latency, as they must wait for the migration to complete before committing; (ii) after migration, physical memory usage drops commensurately; (iii) even as the key-value store is spread over more physical nodes, the latency of operations does not increase; and (iv) the elastic application is able to avoid paging, and hence sustains low latencies for much longer than the inelastic version.

7.3 Multiplayer Game Server

Our second case study concerns the ability of the EVENTWAVE multiplayer game example of Section 3 to elastically adapt to changing loads. The game is implemented as a single logical node representing the game server, and multiple logical nodes representing the clients. The game state is organized into contexts as in Figure 6. We deploy 128 clients over 16 EC2 Small Instances to generate workload. To balance cost and latency, the server head node resides on an Extra Large instance, while the other physical nodes are Small Instances.

We generate an artificial "load" for the game server in the form of players that are randomly distributed among the buildings, hallways and rooms, and move randomly around the world. We simulate real-world gamer behavior by using a Gaussian distribution to have clients randomly join and leave the game server. Figure 19a shows the number of connected clients at a given time; the number of connected players varies from 0 to around 90 in a periodic pattern. We evaluate four full periods of this behavior, measuring the average latency experienced by clients as they attempt to move around the game world.

Figure 19b shows the number of physical nodes used by the server. For the first, second and fourth periods, the elasticity mechanism is not activated, and the server uses only a single physical node. As Figure 19a demonstrates, the average latency experienced by the clients increases and decreases with the load. At high loads, the server is unable to process client requests fast enough, and average latency increases dramatically.

For the third period, we use a simple elasticity policy. If the number of clients exceeds some threshold, the server doubles the number of physical nodes used, migrating half of its contexts to the new physical nodes. This process continues up to a maximum of 64 physical nodes. If the number of clients drops below a certain level, elasticity is exploited in the opposite direction, and the number of physical nodes is decreased. As Figure 19b demonstrates, the number of servers used therefore varies in proportion to the load on the system. As expected, this dynamic migration rapidly provides more computational resources to the server, and as a consequence, the average latency experienced by clients in the third period is much reduced. Similarly, when connected clients leave, the application scales down and releases the extra nodes. With this elasticity support, the server sustains low request latency while minimizing resource usage under a dynamic workload.
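This policy can be sketched as follows. The thresholds and names are illustrative (the concrete high- and low-water marks used in the experiment are not reproduced here), and doubling or halving the node count corresponds to migrating half of the contexts as described above.

    // Hypothetical sketch of the client-count-driven elasticity policy
    // for the game server; thresholds and names are illustrative only.
    #include <algorithm>
    #include <cstddef>

    struct ElasticityPolicy {
      std::size_t maxNodes = 64;     // upper bound used in the experiment
      std::size_t scaleUpClients;    // high-water mark (assumed parameter)
      std::size_t scaleDownClients;  // low-water mark (assumed parameter)

      // Returns the new physical-node count given the current count and
      // the number of connected clients. Doubling/halving mirrors the
      // "migrate half the contexts" mechanism described in the text.
      std::size_t adjust(std::size_t nodes, std::size_t clients) const {
        if (clients > scaleUpClients && nodes < maxNodes)
          return std::min(nodes * 2, maxNodes);
        if (clients < scaleDownClients && nodes > 1)
          return std::max<std::size_t>(nodes / 2, 1);
        return nodes;
      }
    };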

8 Related Work

Parallel event execution. Recent research on automatically parallelizing sequential code has been moving toward systems where programmers specify parallelism to allow the compiler to take advantage of it. One notable example is Bamboo [34]. This approach enables parallelism on shared-memory multicore systems by utilizing object parameter guards, which are then compiled into locks. It is similar to our EVENTWAVE programming model and runtime implementation in that both provide parallelism annotations that direct the compiler to generate an appropriate locking scheme to support parallel execution of events. However, unlike EVENTWAVE, Bamboo does not support distributed or elastic execution.

Computation offloading. One line of research with similar goals to EVENTWAVE is computation offloading, where an application is partitioned between a client and a server (or multiple servers) to improve performance [16, 23, 26, 29, 31].


These approaches are similar to EVENTWAVE's distribution of a single logical node's computation across multiple physical nodes. However, there are key differences. Some of these approaches use static partitioning: the program is analyzed at compile time and a fixed partition across client and server is computed, so they cannot provide dynamic elasticity. Others perform dynamic partitioning, allowing them to respond to changing environments and load. However, these approaches mostly target applications with either a single thread of control, or ones where the programmer has explicitly added parallelism and synchronization. EVENTWAVE aims to support parallelism while allowing programmers to reason sequentially.

Pilot job frameworks. Pilot job systems support the execution of a set of tasks on an elastic set of computational resources [7, 11, 15, 25, 27, 30]. The underlying commonality of these systems is that an application must be broken up into a set of isolated tasks. These tasks can be organized either as a "bag of tasks," where the tasks can execute independently in any order [7, 15, 25], or as a DAG of tasks, where the completion of one (or more) tasks enables the execution of later tasks [11, 27, 30]. These models are fundamentally more restrictive than EVENTWAVE's: while tasks are roughly analogous to EVENTWAVE's events, tasks have very limited interaction and cannot communicate with one another, whereas events can be organized in arbitrary ways and can communicate through context state. Furthermore, pilot job frameworks use, effectively, a computation-centric approach to elasticity, where tasks are the basic unit of distribution and adding resources affects how tasks are distributed. EVENTWAVE, in contrast, uses a data-centric approach to elasticity, where state is the basic unit of distribution. This facilitates state-based interactions between events and also leads to better locality.

Actor model. The Actor Model is the basis for several systems [18, 28]. Actors are collections of state and code that communicate via message passing, with each actor behaving atomically. There is a clear connection between an actor and a context: actors have implicit parallelism because each actor is independent of the others. Much like EVENTWAVE, Actors adopt a data-centric approach to distribution, with computation being co-located with its associated data. The primary difference between the two models is that EVENTWAVE provides event atomicity across multiple contexts, rather than treating contexts as independent entities.

Orleans extends the Actor Model to allow transactional execution across multiple actors and to support elasticity [8]. However, the elasticity model of Orleans differs from EVENTWAVE's. In EVENTWAVE, elasticity is achieved by partitioning state across different resources, while Orleans achieves elasticity through state replication, allowing parallel execution of the same actor at multiple physical nodes. Orleans' programming model trades off consistency for flexibility: Orleans' transactional events have no restrictions on their execution, unlike EVENTWAVE's event model, but Orleans' replication-based approach to elasticity does not provide sequential consistency. We note, however, that EVENTWAVE may be complementary to Orleans: an Orleans actor could be implemented using EVENTWAVE, allowing the use of EVENTWAVE's elasticity mechanism and hence providing stronger semantics.

Scalable databases. Scalable databases, notably ElasTras [12], Megastore [1] and Cloud SQL Server [4], typically employ a two-level structure to partition the data into shards; ACID properties are guaranteed within a shard.

At a high level, EVENTWAVE and these systems use the same basic design principles: state (or data) is partitioned and hosted by a set of nodes. The global partition manager in Cloud SQL Server and the dynamic partitioning mechanism of ElasTras are both similar in principle to how EVENTWAVE maps contexts to nodes, but both approaches primarily target failure recovery.

Live migration. Both virtual machine live migration and database live migration [14] aim at redistributing the state of a distributed system. At a high level, their redistribution is similar to EVENTWAVE's migration of context state. However, these systems solve a different problem, attempting to achieve load balance in multi-tenant environments rather than providing elasticity.

9 Conclusions

Developing elastic cloud applications that can dynamically scale is hard, because elasticity complicates the program's logic.

We described EVENTWAVE, a new programming model for tightly-coupled, stateful, distributed applications that provides transparent elasticity while preserving sequential semantics. An application is written as a fixed number of logical nodes, and the runtime provides elastic execution on arbitrary numbers of physical nodes.

Our case studies suggest that EVENTWAVE eases the development effort for elastic cloud applications, and that EVENTWAVE applications scale efficiently.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments and suggestions, and our shepherd, Phil Bernstein, for his many useful suggestions on how to improve this paper.


References

[1] J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In CIDR, volume 11, pages 223–234, 2011.
[2] R. Bayer and M. Schkolnick. Concurrency of operations on B-trees. Acta Informatica, 9(1):1–21, 1977.
[3] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. ACM SIGMOD Record, 24(2):1–10, 1995.
[4] P. A. Bernstein, I. Cseri, N. Dani, N. Ellis, A. Kalhan, G. Kakivaya, D. B. Lomet, R. Manne, L. Novik, and T. Talius. Adapting Microsoft SQL Server for cloud computing. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 1255–1263. IEEE, 2011.
[5] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems, volume 370. Addison-Wesley, New York, 1987.
[6] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In PPoPP'95, pages 207–216, 1995.
[7] B. Bode, D. M. Halstead, R. Kendall, Z. Lei, and D. Jackson. The Portable Batch Scheduler and the Maui Scheduler on Linux clusters. In USENIX, 4th Annual Linux Showcase & Conference, 2000.
[8] S. Bykov, A. Geller, G. Kliot, J. R. Larus, R. Pandya, and J. Thelin. Orleans: Cloud computing for everyone. In Proceedings of the 2nd ACM Symposium on Cloud Computing, page 16. ACM, 2011.
[9] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. Von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. ACM SIGPLAN Notices, 40(10):519–538, 2005.
[10] W.-C. Chuang, B. Sang, C. Killian, and M. Kulkarni. Programming model support for dependable, elastic cloud applications. In Proceedings of the Eighth USENIX Conference on Hot Topics in System Dependability, pages 9–9. USENIX Association, 2012.
[11] P. Couvares, T. Kosar, A. Roy, J. Weber, and K. Wenger. Workflow in Condor. In Workflows for e-Science, editors: I. Taylor, E. Deelman, D. Gannon, M. Shields. Springer, 2007.
[12] S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: An elastic transactional data store in the cloud. USENIX HotCloud, 2, 2009.
[13] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI'04, pages 10–10. USENIX Association, 2004.
[14] A. J. Elmore, S. Das, D. Agrawal, and A. El Abbadi. Zephyr: Live migration in shared nothing databases for elastic cloud platforms. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pages 301–312. ACM, 2011.
[15] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke. Condor-G: A computation management agent for multi-institutional grids. Cluster Computing, 5(3):237–246, 2002.
[16] X. Gu, K. Nahrstedt, A. Messer, I. Greenberg, and D. Milojicic. Adaptive offloading for pervasive computing. Volume 3, pages 66–73. IEEE, 2004.
[17] C. Hewitt. Actor model of computation: Scalable robust information systems. arXiv preprint arXiv:1008.1459, 2010.
[18] C. Hewitt. Tutorial for ActorScript (TM) extension of C# (TM), Java (TM), and Objective C (TM): iAdaptive (TM) concurrency for AntiCloud (TM) privacy and security. 2010.
[19] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 41(3):59–72, 2007.
[20] A. Kangarlou, P. Eugster, and D. Xu. VNsnap: Taking snapshots of virtual networked environments with minimal downtime. In Dependable Systems & Networks, 2009. DSN'09. IEEE/IFIP International Conference on, pages 524–533. IEEE, 2009.
[21] C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Detecting liveness bugs in systems code. In NSDI, 2007.
[22] C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. M. Vahdat. Mace: Language support for building distributed systems. In PLDI '07, pages 179–188. ACM, 2007.
[23] Z. Li, C. Wang, and R. Xu. Computation offloading to save energy on handheld devices: A partition scheme. In CASES'01, pages 238–246. ACM, 2001.
[24] M. Litzkow, M. Livny, and M. Mutka. Condor: A hunter of idle workstations. In ICDCS'88, 1988.
[25] A. Luckow, L. Lacinski, and S. Jha. SAGA BigJob: An extensible and interoperable pilot-job abstraction for distributed applications and systems. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 135–144. IEEE, 2010.
[26] P. McGachey, A. Hosking, and J. Moss. Class transformations for transparent distribution of Java applications. Volume 10, 2011.
[27] J. T. Moscicki. DIANE - distributed analysis environment for grid-enabled simulation and analysis of physics data. In Nuclear Science Symposium Conference Record, 2003 IEEE, volume 3, pages 1617–1620. IEEE, 2003.
[28] M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine multicomputer: An architectural evaluation. In ACM SIGARCH Computer Architecture News, volume 21, pages 224–235. ACM, 1993.
[29] S. Ou, K. Yang, and A. Liotta. An adaptive multi-constraint partitioning algorithm for offloading in pervasive systems. In PerCom'06, pages 10 pp. IEEE, 2006.
[30] I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, and M. Wilde. Falkon: A fast and light-weight task execution framework. In SC'07, pages 1–12. IEEE, 2007.
[31] L. Wang and M. Franz. Automatic partitioning of object-oriented programs for resource-constrained mobile devices with multiple distribution objectives. In ICPADS'08, pages 369–376. IEEE, 2008.
[32] S. Yoo, C. Killian, T. Kelly, H. K. Cho, and S. Plite. Composable reliability for asynchronous systems. In USENIX Annual Technical Conference (ATC), 2012.
[33] S. Yoo, H. Lee, C. Killian, and M. Kulkarni. InContext: Simple parallelism for distributed applications. In HPDC'11, pages 97–108. ACM, 2011.
[34] J. Zhou and B. Demsky. Bamboo: A data-centric, object-oriented approach to many-core software. In PLDI '10, pages 388–399. ACM, 2010.