
DMake: A Tool for Monitoring and Configuring Distributed Systems

Theodoros Gkountouvas
Cornell University
Ithaca, NY, USA
[email protected]

Kenneth Birman
Cornell University
Ithaca, NY, USA
[email protected]

David Anderson
Washington State University
Pullman, WA, USA
[email protected]

ABSTRACT
Designers of long-running distributed systems must define management policies to address the various runtime disruptions that can arise in distributed settings. In many applications, notably ones requiring high-assurance guarantees, this management infrastructure must be flexible enough to represent non-trivial requirements and dependencies, and poised to quickly adapt if changes in the behavior of the system require an intervention. Our belief is that with the emergence of cloud computing and a very large scale “migration” of applications towards cloud hosting, a growing community of non-experts will find themselves developing such applications, and that software tools oriented towards this sort of cloud user will be required.

Our paper describes DMake, a new system responding to this need. DMake is unusual in offering a very flexible range of management options through a familiar and widely popular model: that of the Unix make utility. DMake represents a significant generalization of Unix make, and its extended “makefile” format can express complex policies. The system has a rigorous semantics, including strong consistency and reliability properties that map to distributed consensus (state machine replication). We believe that DMake opens the door to much more sophisticated management policies by users with relatively little specialized training in distributed computing or system management.

Keywords
Distributed Systems, Management, Monitor, Configuration, Make, DMake

1. INTRODUCTION
Management of autonomous applications is a classic problem for which a variety of solutions have been created, including some extremely sophisticated ones. For example, many readers will be familiar with dashboard interfaces for initial deployment and resource allocation for managing distributed applications on platforms like Amazon’s Elastic Compute Cloud (EC2) [1], where the “Elastic Beanstalk” [19] service is used to automate load-balancing, fail-over and several other features: one selects the desired configuration and then fills in a parameters page. In other settings one finds far more elaborate domain-specific tools that provide users with the ability to employ highly customized policies, but these often have specialized models that the typical developer might find unfamiliar and hard to use. For example, cloud developers at companies like Google, Microsoft or Amazon have access to elaborate internal management infrastructures, with features going well beyond what Elastic Beanstalk or similar APIs provide. Our work begins with the observation that there is surprisingly little available in the “middle” of the spectrum: tools allowing customization of policies but that are easy to use and aimed at the large community of non-expert developers.

DMake is a new management tool that automates enforcement of user-defined management policies, mapping the management problem to a more familiar paradigm: that of constructing a system from components in a manner controlled by a dependency graph represented by a structure similar to a Makefile [2, 9]. The intent is to provide developers with much more control than is possible with tools like Elastic Beanstalk, yet without requiring them to become domain experts in the area of cloud-hosted distributed system management.

The policies provided to DMake take the form of application-specific dependency graphs, and the basic DMake model is one in which events trigger reactive actions, a structure that is also used in Makefiles. Predefined events include failures of compute nodes or failures of application processes, which are detected by DMake, but the system is easily extended to sense any change in the state of an application process provided that there is some way to track that state (our preferred approach is simply to have the application update status files, which use an XML data representation and can encode whatever data the developer wishes to track). DMake’s management cycle starts when some form of management event occurs. The system captures the associated information (or aggregates information from multiple locations), and then enforces the management policies according to the dependency graph specified by the current Makefile, which triggers the creation of an updated configuration for the application. This configuration is then employed by the system to adjust system functionality in any desired manner:


Management of wide-area bulk power networks poses challenges. One must deploy instrumentation (so-called synchrophasor measurement units, or PMUs) on the power busses with adequate redundancy to properly capture grid state. Bulk power systems are operated by regional management entities called independent system operators (ISOs), and these need to collaborate and coordinate cross-regional power flow. ISOs increasingly need to integrate renewable power into their networks (large wind farms, solar, etc.), and these sources can fluctuate, so the ISO must be prepared to balance any variability using other sources of power, perhaps imported over long distances. Disruptions can sweep through the grid much like a wave propagating on a water surface, and the ISO must watch for such events and take remedial actions in real time to prevent damage or disruptive outages. Further, there is a need to plan ahead so that if power demand surges due to weather conditions or other unusual events, the generating capacity will be prescheduled and available to respond to that demand.

Not surprisingly, the machinery of the smart grid translates to a fairly large amount of software that involves many cooperating subsystems, each consisting of multiple programs and data sources. Overall, one has to manage the PMU deployment and the network connecting the PMUs to the data collectors (a network that would often include PMU data concentrators as a kind of relaying and control component). In our work on GridCloud, the network terminates within the Amazon EC2 cloud in a bank of data collectors which shard the data, so that any given PMU is redundantly monitored and archived. Then we can run applications such as state estimation on the data, and these applications also involve multiple programs running on multiple nodes. There are many such applications, some running continuously (if these are disrupted by some sort of EC2 elasticity event, we need to immediately repair the configuration to restore full service within a few seconds), and others on demand (i.e., if an operator poses a “what if?” question, or launches some sort of rarely needed program to investigate a new development).

Figure 1: GridCloud as Motivation for DMake

applications can be stopped or started, entire nodes shut down, new parameter files written, and many more changes are supported. Interaction with active applications is accomplished by writing XML files that the application reads (an interrupt or notification event can also be sent to the application, if desired).

The structure of our paper is as follows. We begin by briefly motivating the need for a new kind of management solution in the sections that immediately follow. The architecture of DMake is described in Section 4, and Section 5 covers the optimizations made to improve the performance of DMake. We report the evaluation results for DMake in Section 6. We summarize topics for future research in Section 7, and in Section 8 we give more details on prior work related to DMake.

2. CLOUD MANAGEMENT
2.1 Motivation: GridCloud
Our work started as a side-effect of setting out to create a new kind of cloud computing platform: GridCloud, which is a hosted environment for the so-called Smart Power Grid [13]. Very briefly, this term refers to a new class of computing systems that capture information about the state of a power grid and then use machine learning or optimization techniques to efficiently deliver electric power to the home, industry, and other kinds of infrastructure such as urban street lighting systems. The power grid is hierarchical, with a bulk power infrastructure that handles high-tension lines over long distances and a collection of regional distribution systems that deliver power to the home over lower-tension networks. We’re looking at both transmission and distribution, but the work that led us to develop DMake is concerned with the bulk power network (Figure 1).

The critical nature of bulk power delivery brings the obvious requirements: extremely strong security (all stored or transmitted data is encrypted), tightly controlled sharing for information moved from ISO to ISO or to the various distribution network operators, real-time data delivery obligations, consistency, redundancy for archived historical records, geographic redundancy for the system as a whole, etc. The main elements of these applications are software systems created by the same experts who have operated the ISOs’ control systems for decades: normal developers who use normal computing tools and have extensive experience, but primarily with small dedicated clusters.

Why host a smart-grid control system on the cloud? The reasoning centers on economics and is out of our scope here, but in short, the cloud offers the scaling these systems require (PMU deployments will someday be national in scale: a full US deployment could include 100,000 PMUs, each generating 44 bytes of data 30 times per second). Obviously any single ISO would need just some of this data, but the volume would be considerable; the cloud fits this model today, while other approaches might require substantial and costly new data centers. Thus it is appealing to at least consider using existing cloud systems, operated by vendors that the power grid community considers adequately trustworthy.
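For scale, the aggregate raw data rate implied by these figures works out to roughly a gigabit per second, modest for a cloud but substantial for a control network:

$$100{,}000 \times 44\ \text{B} \times 30\ \text{s}^{-1} = 132\ \text{MB/s} \approx 1.06\ \text{Gb/s}$$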


In effect, one can then leverage the huge existing investmentthat went into creating the cloud and today sustains it.

2.2 The management challenge
As this background should make clear, our work is motivated by the emergence of a new kind of cloud computing application. In the past, most cloud applications were built using platform-as-a-service (PaaS) solutions: customizable platforms that support some style of cloud computing popularized (and highly monetized) by the Web. Popular examples include Google AppEngine [24], Microsoft Azure [3], etc. Such systems allow the user to build a wide variety of self-managed cloud-hosted functionality, but they also constrain the developer to a certain style of computing that on the whole rejects strong guarantees in favor of weaker forms of eventual consistency. Further, these kinds of assumptions are often baked deeply into not just the application development tools, but also the associated management infrastructures.

But GridCloud is just one of many platforms that depart from those PaaS models by using the cloud as infrastructure (IaaS), importing potentially complex functionality that brings an application into a cloud setting for which much stronger requirements arise. GridCloud illustrates many of these: it must operate in a highly autonomous manner, and because of the EC2 sharing and elasticity model, events such as failures, activity migration or transient load spikes may be common. For example, our effort emerges from a community developing cloud-hosted smart power grid applications. In the past, power grid state estimation (SCADA) and analytic systems ran mostly on dedicated HPC clusters. The cloud is not a particularly good match for these kinds of systems at present, but has such strong cost advantages, and brings such exciting new capabilities (particularly with respect to elastic computing and hosting large data sets), that work aimed at overcoming barriers to adoption seems appealing. DMake is a contribution aimed at this need.

Our assumption is that the DMake user community will be united by a number of features. First, just as in the power grid example, they will be a community with extensive existing code bases but motivated to migrate toward the cloud by a new need to leverage elasticity (pay for what you use, when you need it), to reduce cost of ownership, and to leverage cloud-scale data collection and machine-learning algorithms to optimize power delivery. Indeed, for certain kinds of large-scale uses, the cloud seems to be the only computing model that one could reasonably consider. Yet such communities also bring a very real form of inertia. The very nature of the process forces developers who have long been comfortable with a dedicated Linux server model of computing to suddenly confront platforms like EC2 or Eucalyptus, a situation very different from the experience of a developer working at a company like Google or Amazon on applications created with the cloud in mind from the outset. Moreover, again using GridCloud as our example, because the existing power grid still centers on human control, these developers are not merely moving applications into an unfamiliar setting, but are interconnecting them to control centers staffed by operators who bring all sorts of human-factors considerations into the mix, while also extending applications that in the past might have been used mostly in offline settings to play real-time, infrastructure-critical roles.

Notice the complex mix of needs that now arises. Our goals include robustly capturing data from a wide area in real time, performing an optimization task that might lead to actions such as directly controlling switchable bulk power lines, or responding to a transient disruption, and we need to carry out these tasks on a cloud infrastructure that might be reconfigured dynamically. The system will need to react to these events, yet was built by a developer who is familiar with a standard style of legacy “owned” HPC computing, and who needs to reuse the large body of pre-existing code and solutions used in that setting. Thus, control actions that in the past would have occurred offline and with ample time will now need to be performed from the cloud and in real time, using automated solutions that must remain continuously active. Control mishaps could cause costly damage to physical infrastructure, like large transformers or wind generating systems, which could take years to repair or replace. In summary, we see a mix of needs that diverge in many ways from those of the more typical cloud services operated in support of mobile apps or social networking. Further, this need is not unique to the smart power grid community. Similar trends are occurring in health care, transportation control (especially smart vehicles that use cloud-hosted helper services), smart buildings and cities, and many more settings.

3. MANAGEMENT AS MIDDLEWARE
Our DMake solution operates as a middleware layer, resident between the application itself and the management hooks that can be used to directly access application state or modify application configuration (see Figure 2), representing policies as dependency graphs, an approach very similar to that of a Makefile. As seen in the figure, management of a distributed system using DMake involves (a) monitoring the state of the system, (b) taking management decisions that depend on the monitored state, and (c) configuring the system in order to comply with these decisions. DMake presumes that this procedure should be automatic.
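In pseudocode terms, the monitor/decide/configure cycle that DMake automates could be sketched as follows. This is a minimal illustration only; the function names are hypothetical and not part of DMake's actual API:

```python
import time

def management_loop(monitor, decide, configure, period=1.0, max_cycles=None):
    """Generic monitor/decide/configure cycle.

    monitor()          returns the currently observed system state
    decide(state)      maps state to a desired configuration (the policy)
    configure(config)  pushes that configuration out to the system
    """
    last_config = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        state = monitor()
        config = decide(state)
        if config != last_config:  # act only when the policy output changes
            configure(config)
            last_config = config
        cycles += 1
        time.sleep(period)
```

In DMake the decide step is driven by the makefile-style dependency graph rather than an arbitrary function, but the control flow is the same.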

[Figure 2 diagram: the Management Layer (Policy 1, Policy 2, Policy 3) connects through DMake to the Application Layer (Proc 1, Proc 2, Proc 3).]

Figure 2: DMake as Middleware

There are many management tools in the same domain as DMake, but we believe that DMake is the first to combine dynamic management of applications with strong guarantees, the ability to define a rich set of management policies, and the ease of use afforded by our decision to mimic the


make utility. More specifically, some of these tools [15, 23] focus on deploying the application to a distributed environment and on allocating resources to application processes, and thus offer limited management choices. Other solutions [4, 5] provide dynamic management but require human intervention to manage the distributed system when something unexpected happens, rendering them more suitable for debugging. Several tools [11] are designed for management of specific types of systems and do not target a wide range of distributed systems applications. Finally, there are tools [11, 18] designed only for monitoring, which do not link monitoring and management of the system.

We believe that the implementation of user-defined management policies is significantly simplified by separating it from the managed application code. In this we are inspired by trends in networking, where researchers are demonstrating major advantages to Software Defined Networks [21], in which the control plane (where management decisions are made) is decoupled from the data plane. Indeed, we increasingly view DMake as a tool that might, with further evolution, be usable in SDN settings; they are, after all, large distributed systems with distributed control needs, triggered by events, and subject to a variety of constraints. But the SDN community is not the only one to have successfully separated management from the application itself. For example, the Hadoop platform [20] has been hugely successful. In this system, a master node orchestrates the worker nodes according to management policies provided by the user. And the SWIFT scripting technology [22, 25] is similarly popular for control of semi-autonomous HPC applications, which are often developed on dedicated clusters but then operated in very large scale shared ones.

In contrast, DMake targets applications that will be wired directly to the real world, capturing data in real time, processing it rapidly, and delivering results to human operators who need to see system state within seconds, and who need strong guarantees of security and consistency when they share information across regional boundaries. The requirements thus take us in a direction very different from those of these prior solutions, which generally have a strongly “batch computing” feel to them, or, in the case of today’s cloud computing control systems, are mostly aimed at servers that are far less interdependent and interconnected than in a system like GridCloud. For example, with Elastic Beanstalk, Amazon will dynamically vary the pool of first-tier servers, and then encourages a style of coding in which those servers can be launched or killed as needed, without loss of information (they store data in S3, Dynamo or DynamoDB [16], or even Zookeeper [10]). This is very different from what a community like our target one would be used to. DMake aims at a developer who has absolutely no intention of reimplementing everything from scratch in a completely new style.

DMake’s role begins when a cloud-hosted system experiences a disruption, a load shift, a need to reconfigure, etc. When such issues do arise, it is important that they be rapidly resolved, since each such contingency reduces the ability of the system to deal with further disruptions. Moreover, because our goal is to manage critical infrastructures, such as the bulk power transmission network, DMake must ensure that the GridCloud system only passes through consistent, correct states and is never misconfigured in ways that can cause harm to the physical grid it operates. Finally, the DMake management code must be reasonably concise.

We believe we have achieved all of these objectives. For example, our GridCloud management system required just six hundred lines of application management code, with which it creates an initial setup of 72 nodes running approximately 5000 processes, handles node and process failures, and provides policies for sophisticated dynamic workload balancing that maintain the performance of GridCloud at satisfactory levels even in the presence of the kinds of instabilities common in Amazon’s EC2 (and very uncommon in the HPC settings where most components of GridCloud were originally developed). Obviously, as the power community imports more and more applications into GridCloud, these numbers will grow. We believe that developers from other communities could achieve similar results.

4. ARCHITECTURE
The deployment of DMake is conceptually simple, as seen in Figure 3. A DMake process should be instantiated on every compute node in the system, along with the executables for whatever application processes the user wishes to deploy. DMake itself is launched first and, as will be explained in more detail shortly, it will then launch the application processes that the current configuration requires on the node in question. Because they are children of DMake, these application processes can be monitored and configured by the DMake daemon.
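As a deliberately simplified illustration of this parent-child relationship (a sketch, not DMake's implementation), a per-node daemon might launch and watch its application processes roughly as follows:

```python
import subprocess
import sys

def launch_and_watch(commands):
    """Launch each configured application process as a child of the
    daemon, wait for them, and report their exit codes.  Because the
    processes are children, an exit (i.e. a failure) is observed
    directly, which is one of the predefined management events."""
    children = {name: subprocess.Popen(argv) for name, argv in commands.items()}
    for proc in children.values():
        proc.wait()
    return {name: proc.returncode for name, proc in children.items()}

# Hypothetical example: one well-behaved process, one that fails.
codes = launch_and_watch({
    "ok":   [sys.executable, "-c", "print('up')"],
    "fail": [sys.executable, "-c", "raise SystemExit(3)"],
})
```

A real daemon would of course poll rather than block, and restart or report failed children according to the current policy.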

We needed a simple way for application processes to share their state with DMake, and settled on an approach in which this is done by writing information into an XML file. The application state can thus include anything the user needs to monitor for configuring the whole application. DMake augments application-supplied state with additional state it captures from the operating system. DMake periodically checks whether the state has changed, under a refresh policy the developer can control (frequency of checking, event-driven or time-driven, etc.).
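The paper does not fix a schema for these status files; the producer side might look like the following sketch, in which the element and attribute names are hypothetical. The atomic rename guards against the daemon reading a half-written file:

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET

def write_status(path, state):
    """Publish application state as an XML status file, atomically."""
    root = ET.Element("status", timestamp=str(time.time()))
    for key, value in state.items():
        ET.SubElement(root, "field", name=key).text = str(value)
    # Write to a temp file in the same directory, then rename into place,
    # so a concurrent reader sees either the old or the new file, never a
    # partial one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        ET.ElementTree(root).write(f, encoding="utf-8")
    os.replace(tmp, path)

# The application calls this whenever monitored state changes:
write_status("proc1.status.xml", {"role": "collector", "queue_depth": 17})
```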

For example, data-streaming applications may need to frequently review network latencies and sense process failures, reacting by quickly rerouting data to a different data collection node so as to avoid loss of data. In this case, the frequency at which DMake checks for updates should be set to a relatively small interval in order to avoid disruptions to the application itself. However, rapid polling consumes CPU time, and frequent checks (<200ms) can have a significant impact on the performance of the application (especially on compute nodes with a low number of CPUs). In contrast, an application that does not need such active management can afford a more relaxed frequency of status checking, and thus could avoid much of the associated overhead. Continuing with our example: clearly, if we decide to reroute traffic from some PMU sensor to a different data collector node, that data collecting process needs to be told of its new role. This is an example of a dynamic configuration task. We address it by having DMake update the relevant configuration files, again by writing information in an XML format that the application can scan. Much as for the application-state file, but now with information flowing in the opposite direction,


the application processes are expected to periodically check for updates to the configuration, pull the new configuration if any, and adapt to it. Again, the check can be time-based, with a periodicity fixed by the developer, or event-based, in which case DMake will deliver a signal (on Linux) or an event object (on Windows) to the application.
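Either periodic check, DMake polling state files or an application polling its configuration file, can be made cheap by comparing modification times before parsing anything. A minimal sketch of such a helper (hypothetical, not DMake's code):

```python
import os

def file_changed(path, last_mtime):
    """Return (changed, new_mtime) by comparing the file's current
    modification time against the last one observed.  Parsing the XML
    is only worthwhile when this returns True."""
    try:
        mtime = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return False, last_mtime
    return mtime != last_mtime, mtime
```

The polling period then becomes the knob discussed above: small for data-streaming applications, relaxed for everything else.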

While the developer does need to modify the managed application to write its state periodically, and to read configuration data periodically, we see these as acceptable obligations to impose. DMake doesn’t require any particular coding style, language, or even any particular speed of reaction, and because there are powerful existing XML file I/O tools for every language, the developer shouldn’t find it particularly difficult to carry out these new tasks. In contrast, a developer who decides to build an application using an existing cloud platform would be urged to work in a completely new style that would typically be best suited to the creation of a completely new application, from the ground up.
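For the event-based alternative mentioned above, a Linux application could arrange for a cheap reload flag to be set from a signal handler. SIGUSR1 here is an arbitrary choice; the paper does not specify which signal DMake delivers:

```python
import signal

_config_dirty = False  # flipped by the handler, consumed by the main loop

def _on_config_update(signum, frame):
    global _config_dirty
    _config_dirty = True

# Register the handler once at startup.
signal.signal(signal.SIGUSR1, _on_config_update)

def maybe_reload(load_config):
    """Call at a safe point in the application's main loop; reloads the
    configuration only if the daemon has signalled an update."""
    global _config_dirty
    if _config_dirty:
        _config_dirty = False
        return load_config()
    return None
```

Doing only the flag flip inside the handler keeps the handler async-signal-safe in spirit; the actual XML parsing happens in the main loop.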

[Figure 3 diagram: DMake alongside an application process, with the OS, state, and config files, on a single compute node.]

Figure 3: DMake deployment inside the compute nodes. The dashed lines represent the monitoring phase and how DMake acquires the state of the application processes. The solid lines represent the configuration phase and how DMake configures the application processes.

As each cloud node is launched, its local DMake daemon connects to the other active DMake processes to form a DMake group, using a software library that addresses these group formation and messaging tasks (see Figure 4). They also elect a DMake leader that takes the role of the coordinator in the system. The DMake leader collects all the state and initiates management actions according to the user’s policies. The DMake leader itself is a replicated state machine: one should understand it as a virtual process, operating in a subgroup of the full DMake deployment with a size that can be customized by the user depending on how critical the application is. For applications that do not require high assurance and are mostly concerned with performance, the size can be one. Loss of the leader need not be catastrophic: if it fails, the system remains unmanaged for a short period of time, after which DMake detects that it has lost its leader (via consensus) and automatically launches a new leader. The new leader queries all the DMake processes for the current state; once it has pulled all the state it starts functioning, and the system is no longer unmanaged.
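The recovery sequence for the single-leader case can be sketched abstractly as follows. This is an illustration of the idea only; DMake delegates membership tracking and consensus to its group communication library rather than hand-rolling them like this:

```python
def recover_leadership(members):
    """After the old leader is lost, pick a new one deterministically
    from the surviving members and have it pull every node's current
    state before it resumes management.

    `members` maps a node name to an object exposing get_state().  A
    deterministic rule (lowest name wins) lets every survivor agree on
    the same leader once they agree on the membership.
    """
    leader = min(members)  # deterministic choice, no extra messages
    # The system stays unmanaged until the new leader holds all state.
    collected = {name: node.get_state() for name, node in members.items()}
    return leader, collected
```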

On the other hand, for applications like GridCloud that need continuous management and cannot tolerate even brief loss of the leader role, we simply replicate the leader. The group acts as a single entity, and if one of its members fails, the others continue to act without even a brief loss of continuity. The group management library we’ve selected, Isis2 [6, 7], was explicitly designed for cloud settings; it handles group formation and provides the communication and dynamic reconfiguration mechanisms described above, offering strong consistency and reliability guarantees.

The use of Isis2 (isis2.codeplex.com) simplifies DMake in many ways, bringing fault tolerance and consistency in the form of a strong execution model: virtual synchrony [8]. In this model, members of a process group see identical events in identical order, including membership changes triggered by process or node failures, which are automatically sensed by Isis2. The library also handles bootstrapping, and will automatically configure itself to use the best available communication option (UDP only, UDP + IPMC, RDMA over InfiniBand, etc.); it provides multicast flow control and ordering, and includes features for high-speed “out of band” file transfer. The consistency features of the platform help ensure that DMake’s management actions are consistent as well.

Figure 4: Architecture of DMake. The dashed arrows correspond to transfer of state from the local nodes to the DMake leader. The full arrows correspond to configuration messages from the DMake leader back to the local nodes. The first one is processed (green) while the other ones are ignored (red).

We see this pattern in Figure 4. For each managed subsystem, information captured from the local nodes where that subsystem has components is transferred to the state machine of leader replicas. The replicas share state information: files, stored within folders (one copy per replica), with naming conventions used to distinguish the managed entity from which the state was captured, and one file per instance in the case of managed entities that can themselves be replicated over a set of nodes. An application consisting of multiple subsystems might instantiate this pattern multiple times, once per subsystem, or it could use a single overall configuration management infrastructure covering all the subsystems at once; as we will see, the choice would generally depend upon the kinds of configuration policies desired for the application (e.g. in an application that needs one B server for every N members of the A service, one would want both managed by a single DMake instance; if A and B were completely independent, one might prefer to also manage them independently).

Figure 5 explains how the files are stored in the leader replicas. DMake creates a root directory directly under the directory in which it operates. Under the root directory, folders with the names of the nodes in the DMake group are created. These names can be customized by the user, either during the start-up of DMake or during execution (changes in node names); if no name is given, DMake employs the DNS hostname or IP address. Below these folders, the application process folders are created, along with the corresponding state and configuration files.
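As an illustration of the layout just described, the following Python sketch builds a replica-local directory tree: a root folder, one folder per node, one folder per managed process, each holding a `state.xml` and `config.xml` as in Figure 5. The function and parameter names are our own, not part of DMake.

```python
import os

def init_state_tree(root, nodes, procs_by_node):
    """Create the on-disk layout a DMake leader replica keeps: one folder
    per node, one folder per managed process under it, each holding that
    process's state.xml and config.xml (illustrative sketch of Figure 5)."""
    for node in nodes:
        for proc in procs_by_node.get(node, []):
            proc_dir = os.path.join(root, node, proc)
            os.makedirs(proc_dir, exist_ok=True)
            # state.xml is filled from monitored data; config.xml is
            # (re)generated by the management policies.
            for fname in ("state.xml", "config.xml"):
                path = os.path.join(proc_dir, fname)
                if not os.path.exists(path):
                    open(path, "w").close()
    return root
```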

In addition to the application status files, DMake uses files to report other kinds of management events. When a new node joins the DMake group, a folder is created to represent its state, and the event is reported under it. Similarly, when a failed node is detected, the event is reported under the corresponding folder, and later the event file is deleted along with the folder. A failed-node event is triggered when a DMake process fails. Notice that because of network partitioning events that can arise even within a cloud computing system (we have seen such events within Amazon EC2), it may be that the node has not failed and the application processes are actually still functioning, although DMake has lost the ability to monitor and configure them. We adopt the view that an uncontrollable application is a failed application, because we view active control as part of application health. Accordingly, a DMake agent that loses connectivity with the rest of the DMake group shuts the node down. DMake can also track other node attributes such as CPU loads, paging rates, etc.

DMake generates a per-node configuration file that it passes to the daemon on the node, which reads the file to learn the processes it should run (or stop), how to find their executables, what file names it should monitor to capture application state, and how to configure node-level parameters such as environment variables.

The user defines the policies for the managed application using a dependency graph (see Figure 6). This dependency graph turns out to be a valuable source of information; as shown later in Section 5, it can be used for application optimization and to automate self-repair after a disruptive episode, and it even lends itself to a more formal style of analysis that permits us to explore reachable states and to reason rigorously about management semantics. Thus the dependency graph is, in a very real sense, the core of DMake. We

Figure 5: Events (green), state and configuration files (grey) as they are kept in the DMake leader replicas. Directories are marked in yellow.

use the term DMakefile to refer to this graph.

The DMakefile scripting language is the same as that used by the Linux make utility [9], and hence can encode simple management logic. However, far more sophisticated management actions can also be defined, by providing user-defined programs that maintain an internal application model and, when the application state changes, initiate reconfiguration or repair actions. Such user-defined management code would be triggered by changes to processes' state files, events (new node, failed node, failed processes), initial configuration files for the application (instructions about initial setup of the application), or intermediate files that aggregate the state from multiple processes (typically output files for some other policy). Notice that because DMake uses files to represent events, there is a natural match to the Makefile model, in which changes to files trigger rebuilding of dependent files: in DMake, changes in the environment or application state trigger the recomputation of configuration files. A notational extension to the Makefile syntax is used to access information within a file: if X is a file, X{tag} represents the value associated with some xml-tagged element within that file (or the set of values, if the tag is used multiple times).
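To make the notation concrete, a DMakefile rule might look like the following sketch. The rule shape follows the format of Figure 6 (output, inputs, policy), and the X{tag} extension reads an xml-tagged field; the file names and the balance_load command are hypothetical, not part of DMake's distribution.

```
# Hypothetical rule: regenerate node1's configuration whenever the
# collector's state file changes, reading its xml "load" field via
# the X{tag} extension described above.
node1/config.xml: node1/collector1/state.xml
	balance_load node1/collector1/state.xml{load} > node1/config.xml
```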

Since the DMake leader is elected from the DMake group, it is important that any node in the DMake group eligible for this leadership role have a way to access the needed management programs, system state files and other data files. Thus, we need to replicate the associated data across the nodes that might participate in the leader replica group, and this could be costly. Accordingly, DMake includes an option that can be used to restrict the set of nodes that will be considered as candidates for leadership. This narrows the choices for DMake but limits the need for replication to a smaller group.

Outputs of the management computation again take the form of files: these may contain some form of intermediate information, or could be configuration files for delivery to application or DMake processes. Again, DMake actions can


either generate entirely new versions of the needed files, or can update fields within them: X{tag}=Y creates an empty file X if the file doesn't already exist, then updates the file, setting the xml field with the specified tag to the given value. The procedure followed when DMake applies the policies in the DMakefile is very similar to the behavior of make. DMake parses the DMakefile to find input files that changed and triggers the corresponding policies. If output files change, DMake triggers the policies that take those files as inputs, and this process continues until there are no new policies that can be triggered. Indeed, rather than replicate the functionality of Make, DMake actually runs Make: our code is basically a wrapper that creates a context within which the normal functionality of Make can be leveraged to perform distributed management tasks. DMake runs Make whenever it might have an action to trigger: for example, when the state of application processes changes, or when a DMake daemon reports an event (such as a new node, failed node, or failed process).
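A minimal sketch of the X{tag} accessor semantics, using Python's standard xml library (illustrative only; this is our reading of the notation, not DMake's actual implementation):

```python
import os
import xml.etree.ElementTree as ET

def read_tag(path, tag):
    """X{tag}: return the value of the xml-tagged element in file X,
    or the list of values if the tag occurs multiple times."""
    vals = [e.text for e in ET.parse(path).getroot().iter(tag)]
    return vals[0] if len(vals) == 1 else vals

def write_tag(path, tag, value):
    """X{tag}=Y: create an empty file X if it doesn't already exist,
    then set the xml field with the specified tag to the given value."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        root = ET.parse(path).getroot()
    else:
        root = ET.Element("config")  # assumed root element name
    elem = root.find(tag)
    if elem is None:
        elem = ET.SubElement(root, tag)
    elem.text = str(value)
    ET.ElementTree(root).write(path)
```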

Make is not designed for concurrent execution, hence we impose a simple form of locking: while a management policy is actively computing, new state or event updates are not processed. This avoids the risk of erroneous results. When Make finishes, the DMake leader replicas process any updates that were generated during its execution, which could trigger a further execution of Make. Under the state machine replication model we use, leader replicas always process updates in the order they were received: events occur in a total order that also respects the sequencing in which they initially occurred (that is, a total order that is also causal and FIFO). We are thus able to ensure that an old state will never overwrite a new one. Further, since policies can cause changes to the configuration of the application processes being managed, DMake distributes to each of the managed local nodes the corresponding configuration file (if it has changed) at the end of every execution of Make.
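The serialized run-Make-then-drain-updates cycle can be sketched as follows. Here `run_make` stands in for an actual invocation of Make over a batch of queued updates, returning whatever derived updates the run produced; all names are ours, for illustration.

```python
def management_cycle(pending, run_make):
    """Process queued state/event updates serially: updates arriving
    while Make runs are queued, and each Make run over a batch may
    produce derived updates that trigger a further run, until quiescent.
    Returns how many times Make was invoked."""
    runs = 0
    while pending:
        # Take the queued updates in the total order the replicas agreed on.
        batch, pending[:] = list(pending), []
        new_updates = run_make(batch)   # Make may change output files...
        pending.extend(new_updates)     # ...which are queued as fresh inputs
        runs += 1
    return runs
```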

Notice that although the DMake leaders perform the same actions in the same order, they might not run in perfect temporal lock-step. Accordingly, when we have more than one leader replica, an update can be initiated earlier (in clock time) by one replica than by some other replica. In order to disambiguate messages from the leaders to the local nodes, we maintain a vector clock structure whose fields correspond to the versions of the state files the leader replicas have received from the local nodes. When we run Make, we mark the output configuration files with the value of the vector clock. We piggyback this value on the message that delivers the configuration files to the local nodes, and by comparing the values from two different replicas, a node can tell which configuration carries the most recent information. The values are always comparable, since replicas process state updates in the same order.
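The comparison step can be sketched as follows: because all leader replicas process state updates in the same total order, one clock always dominates the other componentwise, so a local node simply keeps the configuration stamped with the larger clock. This is an illustrative sketch under that assumption, not DMake's code.

```python
def fresher(vc_a, vc_b):
    """Return the vector clock of the more recent configuration.
    Field i counts the state-file versions received from node i;
    since replicas apply updates in one total order, the two clocks
    are always comparable componentwise."""
    assert len(vc_a) == len(vc_b)
    if all(a >= b for a, b in zip(vc_a, vc_b)):
        return vc_a
    assert all(b >= a for a, b in zip(vc_a, vc_b)), "clocks must be comparable"
    return vc_b
```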

output: input1 input2 ... inputn
    policy

Figure 6: Management Policies Format

There are some inherent limitations to the architecture of DMake. First, while our model is simple, our centralized approach to management implies that if the application being managed is very large or generates very frequent management events, the system could suffer from performance bottlenecks in the leader replicas. The DMake leader replicas have to maintain the state of the whole system, run Make and distribute back the configuration files that changed. As a result, they incur potentially substantial performance overheads, and in a heavily loaded setting the reconfiguration of the nodes will reflect these delays. However, even in GridCloud, a fairly demanding use, our experience suggests that reconfiguration of the system is not very frequent and management generally does not require a level of state or computation that would cause significant delay. Another limitation is that a complex reconfiguration of a system can take a significant amount of time (the response time, as we refer to it from now on). Applications that require quick reconfiguration (within milliseconds), or applications where the state changes frequently (and might receive a configuration computed from a state older than the current one), might not be a good match for our approach, since DMake requires a round trip to the leader replicas and a certain amount of delay is thus incurred. The next section explains how DMake tries to alleviate these limitations.

5. OPTIMIZATIONS
The optimizations in DMake aim to relieve the burden on the leader replicas. First, we batch messages from the local nodes to the leader replicas and from the leader replicas to the local nodes. Messages that originate from or target application processes residing on the same node are similarly batched. Thus, DMake ends up using significantly less network bandwidth, especially for applications that require frequent updates. Additionally, users may annotate their policies in the DMakefile with a "local" tag. A local policy is one that can be performed without obtaining state from any other node. In such cases, DMake can perform the necessary action without involving the central leader. For example, if the user desires to relaunch an application that fails, DMake can do so with no delay and no communication. Notice, however, that this assumes that the node configuration files are not in any way dependent upon global information that might need to be accessed as a consequence of the failure event. The local tag allows the developer to control the use of this feature, which can be useful in situations where the DMakefile doesn't encode enough information to correctly sense the presence of a global dependency. When the local optimization can be used, we avoid extra overhead in the leader instances and achieve improved response time, since we avoid the round trip to the leader.

Finally, by annotating policies with different tags, users can instruct DMake to employ distinct leaders for the different policies. The idea is as follows: when tags are used, DMake will check the dependency graph for disjoint management rules, treating the tags as hints but then verifying that there are no cross-tag dependencies. If two policies are tagged using distinct tags and the associated event files are confirmed to be independent (no file is created using information from the other), DMake will employ multiple different leader groups (corresponding to different Isis2 process groups), one for each distinct tag. This optimization can thus yield a decentralized management behavior, where the overhead of management is spread over multiple replication groups. In effect, disjoint subsystems can thus be managed


by distinct DMake leaders.
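The disjointness check described above can be sketched as follows: treat the tags as hints, then verify that no rule in one tag reads or writes a file that a rule in another tag touches. This is an illustrative sketch of the check, not DMake's code; the rule representation is our own.

```python
def verify_disjoint_tags(rules):
    """rules: list of (tag, output, inputs) triples from the DMakefile.
    Verify that no two distinct tags share any file (so no policy in one
    tag can depend on information produced under another). Returns the
    set of tags that may safely run under separate leader groups."""
    files_by_tag = {}
    for tag, output, inputs in rules:
        files_by_tag.setdefault(tag, set()).update([output], inputs)
    tags = list(files_by_tag)
    for i, t1 in enumerate(tags):
        for t2 in tags[i + 1:]:
            shared = files_by_tag[t1] & files_by_tag[t2]
            if shared:
                raise ValueError(f"tags {t1!r} and {t2!r} share files {shared}")
    return set(tags)
```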

6. EVALUATION
All the experiments were conducted on Amazon's EC2 platform. We use c3.xlarge instances that have 4 vCPUs (High Frequency Intel Xeon E5-2670 v2 Ivy Bridge processors), 7.5GB memory and "moderate" network bandwidth. For some compute nodes in GridCloud we use c3.large instances, which have only 2 vCPUs and slightly inferior performance characteristics relative to the c3.xlarge ones. In order to evaluate DMake we use two applications: a dummy dynamic scheduler and GridCloud.

DMake: Dynamic Scheduler
We built a static and a dynamic scheduler for memory-intensive workloads. All the tasks require approximately the same amount of resources and similar time for completion. For the static scheduler we assume that the number of available nodes is known beforehand; thus, we simply assign to every node its share of the pool of tasks. Since there is no heterogeneity in the jobs or in the hardware we use, this scheduling is highly efficient. Next, we designed a dynamic scheduler with the help of DMake that can perform quite well even in unfavorable conditions. The dynamic scheduler initially assigns a batch of jobs to the nodes that join the group. It then monitors them to determine when it should assign them another batch of work, and it continues until all the tasks have finished. The dynamic scheduler has the potential to outperform the static one in cases where the outcome of the execution cannot be easily predicted; our experiment undertook to evaluate this hypothesis.
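The batch-reassignment idea can be sketched as follows. In the real scheduler the loop is event-driven (each node's state-file change triggers the policy that hands out the next batch); here `run_batch` is a hypothetical stand-in for dispatching work and blocking until the node finishes, and all names are ours.

```python
def dynamic_schedule(tasks, nodes, batch_size, run_batch):
    """Sketch of the dynamic scheduler built on DMake: hand each ready
    node a batch from the task pool, and re-assign a new batch whenever
    a node completes its current one, until the pool is empty."""
    tasks = list(tasks)
    done = {n: [] for n in nodes}
    ready = list(nodes)
    while tasks and ready:
        node = ready.pop(0)                      # next node reporting completion
        batch, tasks = tasks[:batch_size], tasks[batch_size:]
        run_batch(node, batch)                   # dispatch and await the batch
        done[node].extend(batch)
        ready.append(node)                       # node is ready for more work
    return done
```

A slow node simply reports completion less often and so receives fewer batches, which is how the scheduler adapts in the 33%-slow-nodes experiment.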

Figure 7: Normal Operation. Percentage of jobs completed (%) over time (s), with DMake (w DMake) and without DMake (wo DMake).

In the first experiment (see Figure 7), we ran the static and the dynamic scheduler with 30 available nodes and 9000 different jobs. Although they ended up making very similar decisions (the same number of tasks were assigned to each node), static scheduling slightly outperforms the dynamic one. This happens because the dynamic scheduler needs to keep track of the current state in order to take further decisions, while the static one is able to distribute all the tasks at the outset. However, the difference is very small: the number of completed jobs in the dynamic case is within 5% of the number completed in the static system. In effect, although DMake constantly monitors and reconfigures the system, this background activity added only a small overhead.

Next we repeated the experiment with 1/3 of the nodes significantly slowed down (artificially) to see how the dynamic scheduler adapts when compared with the static scheduler. Figure 8 shows that the dynamic scheduler maintains a constant rate of task completion over time while the static scheduler does not. The dynamic scheduler is able to detect the slow nodes and distributes tasks accordingly. In contrast, the static scheduler distributes tasks evenly over both slow and fast nodes. As a result, when the fast nodes complete their tasks (around 240 seconds from the beginning of the experiment), the slow ones still have more than half of their tasks to do (they end up finishing at around 550 seconds). In this case, the dynamic scheduler thus outperforms the static one (327 versus 548 seconds). We would argue that there are actually many situations in which the dynamic scheduler would be a better choice. For example, integrating new worker nodes during the execution would be impossible with the naive static approach, since it has no way to recompute its schedule; the dynamic scheduler would adapt and perform well under these conditions. Failure raises similar issues, as do situations in which the nodes are heterogeneous hardware, where it is difficult to predict the completion time of a task, or where tasks need different resources and their execution times vary accordingly.

Figure 8: With 33% Slow Nodes. Percentage of jobs completed (%) over time (s), with DMake (w DMake) and without DMake (wo DMake).

Figure 9 shows the DMake response times for the same experiments, again looking at two scenarios. Here, the term response time denotes the interval between the moment a local DMake process detected a change in the state file of an application process and the moment it wrote to the corresponding configuration file after receiving a new configuration from the relevant leader. Notice that state changes do not necessarily lead to a new configuration, and we count only the ones that do. The application response time is the time between the actual change of the process state and the enforcement of the configuration that "fixes" the issue caused by it.

As can be seen in Figure 9, the majority of the DMake


responses are within 0.5 seconds. Given that the average round trip from the DMake leader to the local nodes at the time of the experiment was 0.25 seconds, these results seem very reasonable to us (EC2 communication can be surprisingly slow because the nodes are virtualized and often run on shared physical hardware, so there can be long scheduling delays). There are a few cases in which the response time spikes to more than 1 second. Unfortunately, EC2 has poor clock synchronization and offers limited monitoring options, so we are not able to fully isolate the causes. Our belief is that these outliers are caused either by transient network bottlenecks (again, recall that our nodes are shared), or by spikes of state updates that cause delays in the execution of the DMake leader. However, for this experiment we used a single leader. If we had used a replica group consisting of more than one, our solution would have been more tolerant of late responses, since all replicas would initiate the same action, and we would end up measuring the delay until the first response out of many instead of just one. On the other hand, we would then see slightly increased overhead within the replica group itself, an effect that could slightly increase the response delay. We use the term "slight" because in our experience, Isis2 has been quite fast on this platform.

The application response time depends on the frequency with which DMake checks state and configuration files (see Section 4). In this particular experiment, we set the rate to one second. Thus, the application and DMake response times for the same event should not differ by more than two seconds (at most one second for DMake to get the state and at most one second for the process to retrieve the configuration). As seen in Figure 9, this is indeed the case. We can increase the rate to obtain better response times for the processes, paying a mild penalty in increased demand on CPU resources. However, for this application the 1-second rate is adequate, yielding high performance matched to the experimental setup.

Figure 9: DMake and Application Response Times. Distribution of response times (%) over time (s).

It should be stressed that the dynamic scheduler is a dummy application, designed purely to evaluate the performance of DMake. In our view, the experiment represents a good stress test for DMake, since it requires constant reconfiguration and has frequent state updates. However, it is atypical in the sense that few real applications would require so demanding and active a form of management: DMake targets a class of uses that would normally involve long-running systems supporting users who need responses within seconds, but not milliseconds.

DMake: GridCloud
Next, we experimented using our GridCloud platform, described in Section 2. The experiment used a real deployment of the current version of the system, in which we collect data from simulated Phasor Measurement Units (PMUs) deployed on the Internet so as to replicate conditions seen in real power grids (security concerns make it difficult to experiment on the actual US power grid, hence these sorts of emulation or simulation studies are the best one can do). Our configuration, however, was extremely realistic, and enabled us to capture data at the proper rates, and then to run a variety of power grid applications on top of that data. After the data is acquired in the front end (collectors), it is forwarded to the back end (state estimators). The state estimators collectively calculate the global state of the electric power grid, and forward the data to a final node, where it is rendered for display to the grid operator.

As mentioned before, GridCloud is the application that motivated DMake, and it is typical of a larger class of streaming applications where data flows from sensors deployed in the real world into the cloud, is handled by multiple systems on multiple nodes, and is finally delivered in real time to a human user. GridCloud needs strong guarantees about the data (no loss of streams is permitted) and thus all the processing nodes within our cloud configuration are replicated so as to compensate for any possible node failures. Furthermore, GridCloud requires liveness guarantees (the data visualized should correspond to a recent state of the system). The system is thus a real and serious test for DMake. Further, it requires that a variety of management policies be enforced.

The experiment shown below starts with a demanding initial configuration in which we use 24 compute nodes for 381 state estimator processes and 42 compute nodes for 4731 collector processes (a configuration that might correspond to a monitoring setup for the entire Pacific NW). In total we use DMake to configure 72 nodes that run more than 5000 processes. The state estimator configurations here are relatively simple and do not require sophisticated policies. For the collector processes, on the other hand, things get more complicated: we have to enforce application-specific constraints about the mapping of data streams to the appropriate collectors, and we need to simultaneously balance the workload between collector nodes. Our goal is to assign approximately the same number of streams (or collector processes, since we have one process per stream) to each of the collector nodes.

Figure 10 shows the amount of time DMake needs to configure the entire GridCloud platform. More specifically, the figure shows what percentage of processes has been configured (and has also provided an acknowledgement to the DMake leader) after the formation of the group. The first 44 seconds are needed for some initial DMake configuration after the group formation and for enforcing all the sophisticated policies described in the previous paragraph for the 4731 collector processes (Make). The whole platform is configured in approximately 60 seconds, resulting in more than 80 process configurations per second on average. The effects of batching are clear: one can see bursts of actions throughout the whole timeline.

Figure 10: DMake in GridCloud Initial Deployment. Percentage of processes configured (%) over time (s).

We reran the same experiment with the injection of collector process failures (1/4 of the processes fail shortly after they are configured). In this experiment, our policy for handling these failures is to rerun the collector with the same configuration, a policy that can be carried out locally on each node (see Section 5). Thus, process failures are handled within 200ms, which is the monitoring cycle (every 200ms DMake locally searches for failed processes). This permits us to completely mask the failures: in the eyes of our human operators, the resulting run is not distinguishable from the failure-free one. Note, however, that this class of failures is easy to detect, because the node itself remains healthy and hence the DMake daemon can directly sense the crashes.

We also have policies for node failures. In case a collector node fails, we look for other available collector nodes that are not fully loaded and distribute the workload to them according to our load-balancing policies, somewhat like the dynamic scheduler experiment. If there are no available collectors that also satisfy the constraints GridCloud imposes, our current policy is to wait until some node becomes available (it would not be hard to build a fancier one in which we would ask EC2 to launch extra nodes). If a state estimator node fails, we stop the collector processes that transfer data to it and then wait for an available state estimator node to take on the unassigned workload. But here failure detection is a much harder problem: the DMake daemon becomes unresponsive, but on EC2 such events are common and Isis2 has a long built-in hysteresis to reduce the risk of false detections. Once failure detection actually occurs, the DMake response time is very short. We did not include data for this case (although we have tested it): the resulting graph is dominated by the failure-detector delay parameter used by Isis2 and hence is not a very interesting case for evaluation of DMake.

7. FUTURE WORK
As future work, we would like to explore the idea of integrating formal methods into distributed computing management. By doing so, we hope to be able to describe desired application properties (for example, that a system always creates one data collector for each data stream), and use formal methods to confirm that the policies applied to the underlying application satisfy the objectives. Similarly, it may be of interest to explore the space of states the application could enter, and the trajectories employed to recover from undesired states; one could flag as incorrect a policy that might leave the system permanently in an undesired state.

Another interesting topic is to create a management framework for multiple users. Recall that GridCloud is intended as a shared management and monitoring infrastructure that might be used by many operators, each of whom might launch applications as needed. This creates the need for a collaboration platform within which we may need to confirm that different users employ compatible policies, and more generally to explore the question of whether policies might interact with one another in an undesired manner. Indeed, we see the lack of a framework like ours as one major barrier to the emergence of solutions for the smart grid and similar applications.

8. RELATED WORK
Early work [12, 14, 17] focuses on management policies: how they should be defined by users and how conflicts are resolved when these policies are provided by different entities. This work emphasizes the need for dynamic management and automatic enforcement of policies on the underlying systems. It envisions policies as hierarchies in which low-level policies are generated automatically from high-level user-defined policies, sharing DMake's perspective that policies should be easy for users to define. Furthermore, it discusses conflicts in management policies posed by different users and proposes an authorization scheme to resolve them.

Rhizoma [23] is a more recent work with an architecture very similar to ours. In Rhizoma, the compute nodes form a communication group and elect a coordinator amongst themselves. As in DMake, the coordinator is not a single point of failure: if it fails, a new coordinator is elected from the rest of the group. The coordinator collects all the monitored data and applies the corresponding actions to the rest of the system. Actions in Rhizoma are defined by the users in the form of a constraint program (using constraint logic programming). However, Rhizoma is more focused on deployment of the application and resource allocation; the only actions allowed are adding or removing nodes, which significantly limits the management options available to the user.

Plush [4, 5] is another tool focused more on the deployment and debugging of distributed applications. Plush uses a primary-backup scheme for controllers, following a slightly different approach than Rhizoma and DMake. Again, all the monitored data is collected at the controllers, and Plush can take actions according to the user's input. However, Plush requires human intervention to take actions and reconfigure the system, rendering it inappropriate for applications where reaction to certain events within a minimum amount of time is critical. Thus, it is more appropriate for debugging, or for applications where a quick response to state changes is not crucial. Also, users need to become familiar with the concept of blocks that Plush utilizes to define their policies, which makes Plush harder to use.

Other works in this domain [11, 18] aim to monitor the underlying applications efficiently. Monitoring the applications enables dynamic management of the system, but these tools deal solely with monitoring and treat the management of the system as an independent procedure.

9. CONCLUSIONS
Our work on the smart grid led us to recognize that existing cloud management tools are inadequate in many ways. While well suited to the control of highly scalable, stateless, first-tier cloud applications that store data deeper in the cloud, such tools offer very little to developers who are faced with migrating applications from dedicated cluster environments onto cloud platforms. Such developers will often have existing code that needs to run in new settings, and that must be closely monitored and managed to maintain availability. The DMake system addresses this requirement, using a familiar paradigm (Make and the associated Makefile model), but augmenting it to play a management role in which distributed events are mapped to file updates.

Our evaluation shows that DMake can handle some very demanding, large-scale use cases. While we see a number of topics for future work, the existing system is already capable of managing GridCloud, a complete system for management of the bulk power grid, and could easily be adapted to other applications with similar needs. The solution is open source and will be available for no-fee download from our development website by mid-2014.

10. ACKNOWLEDGEMENTS
The authors are grateful to Robbert van Renesse, Carl Hauser, Dave Bakken and Nate Foster for their help and suggestions. We also wish to acknowledge our funding sources: the DOE/ARPA-E GENI program, DARPA's MRC program, the NSF Smart Grid program and Amazon EC2.

11. REFERENCES
[1] Amazon EC2. aws.amazon.com/ec2/.

[2] Make. www.gnu.org/software/make/manual/make.html.

[3] Microsoft Azure. azure.microsoft.com.

[4] J. Albrecht, C. Tuttle, R. Braud, D. Dao, N. Topilski, A. C. Snoeren, and A. Vahdat. Distributed Application Configuration, Management, and Visualization with Plush. ACM Trans. Internet Technol., 11(2):6:1–6:41, Dec. 2011.

[5] J. Albrecht, C. Tuttle, A. C. Snoeren, and A. Vahdat.PlanetLab application management using Plush. ACMSIGOPS Operating Systems Review, 40(1):33–40, 2006.

[6] K. Birman. Isis2 Cloud Computing Library, August 2012.

[7] K. Birman and H. Sohn. Hosting Dynamic Data in the Cloud with Isis2 and the Ida DHT. ACM Workshop on Timely Results in Operating Systems (TRIOS), at SOSP 2013.

[8] K. P. Birman. Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services. Springer, 2012.

[9] S. I. Feldman. Make—A program for maintaining computer programs. Software: Practice and Experience, 9(4):255–265, 1979.

[10] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Annual Technical Conference, volume 8, pages 11–11, 2010.

[11] M. Kutare, G. Eisenhauer, C. Wang, K. Schwan, V. Talwar, and M. Wolf. Monalytics: Online Monitoring and Analytics for Managing Large Scale Data Centers. In Proceedings of the 7th International Conference on Autonomic Computing, ICAC '10, pages 141–150, New York, NY, USA, 2010. ACM.

[12] E. Lupu and M. Sloman. Conflicts in policy-based distributed systems management. Software Engineering, IEEE Transactions on, 25(6):852–869, Nov 1999.

[13] K. Maheshwari, M. Lim, L. Wang, K. Birman, and R. van Renesse. Toward a reliable, secure and fault tolerant smart grid state estimation in the cloud. In Innovative Smart Grid Technologies (ISGT), 2013 IEEE PES, pages 1–6, Feb 2013.

[14] J. Moffett and M. Sloman. Policy hierarchies for distributed systems management. Selected Areas in Communications, IEEE Journal on, 11(9):1404–1414, Dec 1993.

[15] D. Rosu, K. Schwan, S. Yalamanchili, and R. Jha. On adaptive resource allocation for complex real-time applications. In Real-Time Systems Symposium, 1997. Proceedings., The 18th IEEE, pages 320–329, Dec 1997.

[16] S. Sivasubramanian. Amazon DynamoDB: a seamlessly scalable non-relational database service. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 729–730. ACM, 2012.

[17] M. Sloman. Policy driven management for distributed systems. Journal of Network and Systems Management, 2(4):333–360, 1994.

[18] R. Van Renesse, K. P. Birman, and W. Vogels. Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. ACM Trans. Comput. Syst., 21(2):164–206, May 2003.

[19] J. Van Vliet, F. Paganelli, S. van Wel, and D. Dowd. Elastic Beanstalk. O'Reilly Media, Inc., 2011.

[20] H. Wiki. MapReduce — Hadoop Wiki. http://en.wikipedia.org/w/index.php?title=MapReduce&oldid=606038028.

[21] Wikipedia. Software-defined networking — Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Software-defined_networking&oldid=605183730.

[22] M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, and I. Foster. Swift: A language for distributed parallel scripting. Parallel Computing, 37(9):633–652, 2011.

[23] Q. Yin, A. Schüpbach, J. Cappos, A. Baumann, and T. Roscoe. Rhizoma: A Runtime for Self-deploying, Self-managing Overlays. In J. Bacon and B. Cooper, editors, Middleware 2009, volume 5896 of Lecture Notes in Computer Science, pages 184–204. Springer Berlin Heidelberg, 2009.

[24] A. Zahariev. Google App Engine. Helsinki University of Technology, 2009.

[25] Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. Von Laszewski, V. Nefedova, I. Raicu, T. Stef-Praun, and M. Wilde. Swift: Fast, reliable, loosely coupled parallel computation. In Services, 2007 IEEE Congress on, pages 199–206. IEEE, 2007.