
Capsule-oriented Programming

Hridesh Rajan, Steven M. Kautz, Eric Lin, Sarah Kabala, Ganesha Upadhyaya, Yuheng Long, Rex Fernando, and Loránd Szakács

TR #13-01
Initial Submission: February 28, 2013.

Keywords: concurrency, modularity, synergy
CR Categories:
D.2.10 [Software Engineering] Design
D.1.5 [Programming Techniques] Object-Oriented Programming
D.2.2 [Design Tools and Techniques] Modules and interfaces, Object-oriented design methods
D.2.3 [Coding Tools and Techniques] Object-Oriented Programming
D.3.3 [Programming Languages] Language Constructs and Features - Control structures

Copyright (c) 2013, Hridesh Rajan, Steven M. Kautz, Eric Lin, Sarah Kabala, Ganesha Upadhyaya, Yuheng Long, Rex Fernando, and Loránd Szakács.

Department of Computer Science
226 Atanasoff Hall
Iowa State University
Ames, Iowa 50011-1041, USA


Capsule-oriented Programming

Hridesh Rajan, Steven M. Kautz, Eric Lin, Sarah Kabala, Ganesha Upadhyaya, Yuheng Long, Rex Fernando, and Loránd Szakács

Iowa State University, Ames, IA, 50011, USA
{hridesh,smkautz,eylin,skabala,ganeshau,csgzlong,fernanre,lorand}@iastate.edu

ABSTRACT

Many programmers find writing and reasoning about concurrent programs difficult and can benefit from better abstractions for concurrency. A promising class of such concurrency abstractions combines state and control within a single linguistic mechanism and uses asynchronous messages for communication, e.g., active objects or actors. One hurdle is the need to adapt to an asynchronous style of programming. We believe that most benefits of actor-like abstractions can be brought to sequentially-trained programmers via a more familiar synchronous model. We call this model capsule-oriented programming: programmers describe a system in terms of its modular structure and write sequential code to implement the operations of those modules using a new abstraction that we call capsule. Capsule-oriented programs look like familiar sequential programs but they are implicitly concurrent. We present Panini, a capsule-oriented programming language, and its compiler, which help programmers avoid two classes of concurrency errors: sequential inconsistency and data races due to sharing. We have refactored the Java Grande and NPB benchmarks (>134,000 LOC) using Panini, leading to simpler and shorter programs that perform as well as the parallel versions provided with the benchmarks.

1. INTRODUCTION

Modern software systems tend to be distributed, event-driven, and asynchronous, often requiring components to maintain multiple threads for message and event handling. In addition, there is increasing pressure on developers to introduce concurrency into applications in order to take advantage of multicore processors to improve performance. Yet concurrent programming stubbornly remains difficult and error-prone. First, a programmer must partition the overall system workload into tasks. Second, tasks must be associated with threads of execution in a manner that improves utilization while minimizing overhead; note that this set of decisions is highly dependent on characteristics of the platform, such as the number of available cores. Finally, the programmer must manage the dependence, interaction, and potential interleaving between tasks to maintain the intended semantics and avoid concurrency hazards, often by using low-level primitives for synchronization.


To address these issues, programmers need better abstractions that hide the details of concurrency and allow them to focus on the program logic.

The significance of better abstractions for concurrency is not lost on the research community. Some key ideas from the last two decades include guardians [25], active objects [23, 32], and actors [3], all of which combine state and control within a single linguistic mechanism and use asynchronous messages for communication. We believe that a major gap remains. There is an impedance mismatch between sequential and implicitly concurrent code written using actor-like abstractions that is hard for a sequentially trained programmer to overcome. These programmers typically rely upon the sequence of operations to reason about their programs.

Contributions. We present capsule-oriented programming to address the challenges of concurrent programming. A central goal of this programming style is to provide tools to enable programmers to simply do what they do best, that is, to describe a system in terms of its modular structure [34] and write sequential code to implement the operations of those modules. To achieve this, we introduce a new abstraction that we call capsule. A capsule is similar to a process in CSP [21]; it defines a set of public operations, and also serves as a memory region for some set of ordinary objects.

One goal in capsule-oriented programming is that the programmer should get the benefits of asynchronous execution without being forced to adapt to an asynchronous, message-passing style of programming. To the programmer, inter-capsule calls look like ordinary method calls. Capsule-oriented programs are implicitly concurrent. There are no explicit threads or synchronization locks; if necessary or beneficial, concurrency is introduced by the compiler. Capsule-oriented programming eliminates two classes of concurrency errors: sequential inconsistency and race conditions due to shared data.

We have realized the basic ideas behind capsule-oriented programming in an extension of Java that we call Panini (§3) and have implemented a compiler for Panini by extending the industry-standard javac compiler (§4). Our experience with over 134,000 lines of capsule-oriented programs shows that they are simpler, shorter, and have performance comparable to explicitly concurrent versions written by expert benchmark programmers (§5 and §6).

2. MOTIVATION

To illustrate the challenges of concurrent program design, consider a simplified navigation system. The system consists of four components: a route calculator, a maneuver generator, an interface to a GPS unit, and a UI. The UI requests a new route by invoking a calculate operation on the route calculator, assumed to be computationally intensive. When finished, the route is passed to the maneuver generator via method setNewRoute.


Java program with explicit concurrency

1  class ManeuverGen {
2    private Route currentRoute;
3    private Position currentPosition;
4    private UI ui;
5    public synchronized void setNewRoute(Route r) {
6      currentRoute = r;
7    }
8    public synchronized void updatePosition(Position p) {
9      currentPosition = p;
10     final Position temp = p;
11     Runnable r = new Runnable() {
12       public void run() { ui.updatePosition(temp); }
13     };
14     SwingUtilities.invokeLater(r);
15     final Instruction inst = nextManeuver();
16     if (inst != null) {
17       r = new Runnable() {
18         public void run() { ui.announceNextTurn(inst); }
19       };
20       SwingUtilities.invokeLater(r);
21     }
22   }
23   public synchronized Position getCurrentPosition() {
24     return currentPosition;
25   }
26   private Instruction nextManeuver() { /* ... */ }
27 }
28 interface Calculator { void calculate(Position dst); }
29 class Shortest implements Calculator {
30   private ManeuverGen mg;
31   public Shortest(ManeuverGen mg) { this.mg = mg; }
32   public void calculate(final Position dst) {
33     Thread t = new Thread(new Runnable() {
34       public void run() {
35         Route r = helper(mg.getCurrentPosition(), dst);
36         mg.setNewRoute(r);
37       }
38     });
39     t.start();
40   }
41   private Route helper(Position src, Position dst) { /* */ }
42 }
43 class GPS {
44   private ManeuverGen mg;
45   public GPS(ManeuverGen mg) { this.mg = mg; }
46   public void runLoop() {
47     while (true) mg.updatePosition( readData() );
48   }
49   private native Position readData();
50 }

Java program, con't

51 class UI { /* provides updatePosition, announceNextTurn */ }
52 class Navigation {
53   public static void main(String[] args) {
54     ManeuverGen mg = new ManeuverGen();
55     Calculator rc = new Shortest(mg);
56     final GPS gps = new GPS(mg);
57     Thread t = new Thread(new Runnable() {
58       public void run() { gps.runLoop(); }
59     });
60     t.start();
61     //
62     // Also create and start UI, details not shown
63     //
64   }
65 }

Implicitly concurrent Panini program

66 capsule ManeuverGen (UI ui) {              // Requires an instance of capsule UI
67   Route currentRoute = null;               // A capsule state
68   Position position = null;
69   void setNewRoute(Route r) { currentRoute = r; } // A capsule procedure
70   void updatePosition(Position p) {
71     position = p;
72     ui.updatePosition(p);                  // Inter-capsule procedure call
73     Instruction inst = nextManeuver();
74     if (inst != null) ui.announceNextTurn(inst);
75   }
76   Position getCurrentPosition() { return position; }
77   private Instruction nextManeuver() { /* ... */ } // A helper procedure
78 }
79 signature Calculator { void calculate(Position dst); }
80 capsule Shortest (ManeuverGen m) implements Calculator {
81   void calculate(Position dst) {
82     Route r = helper(m.getCurrentPosition(), dst);
83     m.setNewRoute(r);
84   }
85   private Route helper(Position src, Position dst) { /* ... */ }
86 }
87 capsule GPS (ManeuverGen mg) {
88   void run() {
89     while (true) mg.updatePosition( readData() );
90   }
91   private native Position readData();
92 }
93 capsule UI { /* provides updatePosition, announceNextTurn */ }
94 system Navigation {
95   UI ui; ManeuverGen m; Shortest r; GPS g;  // Capsule instances
96   m(ui); r(m); g(m);                        // Wiring capsule instances
97 }

Figure 1: Programs for a simplified navigation system. Classes Position, Route, and Instruction are elided.

The GPS interface continually parses the data stream from the hardware and updates the maneuver generator with the current position via method updatePosition. The maneuver generator checks the position against the current route and generates a new turn instruction for the UI if needed (not computationally intensive).

The modular structure of the system is clear from the description above, and it is not difficult to define four Java classes with appropriate methods corresponding to this design. However, the system will not yet work. The programmer is faced with a number of non-trivial decisions: Which of these components needs to be associated with an execution thread of its own? Which operations must be executed asynchronously? Where is synchronization going to be needed? A human expert might reach the following conclusions, shown in the listing in Figure 1.

• A thread is needed to read the GPS data (lines 57-60).
• The UI, as usual, has its own event-handling thread. The calls on the UI need to pass their data to the event-handling thread via the UI event queue (lines 10–14 and 17–20).
• The route calculation needs to run in a separate thread; otherwise, calls to calculate will "steal" the UI event thread and cause the UI to become unresponsive (lines 33–39).
• The ManeuverGen class does not need a dedicated thread; however, its methods need to be synchronized, since its data is accessed by both the GPS thread and the thread doing route calculation (lines 5, 8, and 23).

None of the conclusions above, in itself, is difficult to implement in Java. Rather, in practice it is the process of visualizing the interactions between the components, in order to reach those conclusions, that is extremely challenging for programmers [28, 2].

3. THE PANINI LANGUAGE

A central goal of capsule-oriented programming and the Panini language is to help sequentially trained programmers deal with the challenges of concurrent program design.

Overview. The Panini programmer specifies a system as a collection of capsules and ordinary object-oriented classes.


A capsule is an abstraction for decomposing a software system into its parts, and a system is a mechanism for composing these parts together.¹² A capsule is like Parnas's information-hiding module [34] in that it defines a set of public operations, hides the implementation details, and could serve as a work assignment for a developer or a team of developers. Beyond these standard responsibilities, a capsule also serves as a memory region, or ownership domain [11, 10], for some set of standard object instances and behaves as an independent logical process [21]. Inter-capsule calls look like ordinary method calls to the programmer. The object-oriented features are standard, but there are no explicit threads or synchronization locks in Panini.

3.1 Declarations in Panini

A program in Panini can have zero or more signature declarations, zero or more class declarations, zero or more capsule declarations, and a system declaration.

Signature. A signature declaration in Panini contains one or more abstract procedure signatures. An example signature declaration, Calculator, appears on line 79 in Figure 1. Each procedure signature has a return type, a name, and zero or more formal parameters. This syntax is similar to interfaces in Java, except that for simplicity we do not allow signature inheritance.

Capsule. A capsule declaration consists of the keyword ‘capsule’, the name of the capsule, zero or more formal parameters representing dependencies on other capsules, and zero or more signatures representing services that the capsule provides, followed by the capsule's implementation. This is similar to both the Mesa [31] and nesC [16] languages. Each procedure declaration in every signature implemented by the capsule must match in entirety with exactly one capsule procedure. Panini does not have capsule inheritance but does have class inheritance. The primary mechanism for reuse of capsules is composition.

The example in Figure 1 contains four capsule declarations. The capsule ManeuverGen requires a UI capsule instance on line 66, (UI ui). Similarly, the capsule Shortest declares that it requires an instance of the ManeuverGen capsule and provides the Calculator signature on line 80. After a capsule instance is correctly initialized, expressions inside the capsule instance may access these imported capsule instances using their names, e.g., ui. Like ML modules [29], Panini capsule instances are not first-class values, so capsule instances may not be passed as arguments to methods or stored in capsule states or object fields.

A capsule implementation consists of zero or more state declarations, zero or more capsule procedures, and zero or more internal class declarations. A state declaration has a class type, a name, and optionally an initialization expression, so in that sense it is similar to a field in traditional class declarations, except that a capsule instance controls all accesses to the object graph reachable from its states, i.e., a capsule instance acts as a dominator for the graph [11]. All state declarations are private to a capsule; therefore, no visibility modifiers are necessary. Two examples appear on lines 67-68 in Figure 1.

A capsule procedure has a return type, a name, zero or more arguments, and a body. Helper procedures can be declared by qualifying them with the modifier private. All capsule procedures, except helper procedures, constitute the interface of the capsule. There is one designated, optional capsule procedure, run, indicating that the capsule can start computation without external stimuli.

¹The intent of capsule, a modeling notation in UML-RT and ROOM [39], is similar in that there is an activity, a state machine, and an interface, but Panini capsules are a language feature.
²Unrelated to CAPSULE [27], a stream processing framework.

A capsule can also contain standard object-oriented class declarations. These class declarations are considered internal to the capsule and are not visible outside the capsule. A class declaration that is used across two or more capsules should be declared outside a capsule, as is usual in object-oriented languages.

System Declaration. A system declaration is a declarative static specification of the topology of capsule instances that would be present in the program. The system declaration for the navigation system appears on lines 94-97 in Figure 1; line 95 specifies the capsule instances that will be participating in this system, e.g., an instance of UI, and line 96 specifies how these instances are connected, e.g., the GPS instance g is connected to the ManeuverGen instance m. The topology of capsules is fixed for a program and does not change dynamically, which facilitates more precise static analysis of capsule interactions (§4). Arrays of capsule instances of fixed length can also be declared.

Design Rationale for Capsules. At first glance a capsule declaration may look similar to a class declaration, thus naturally raising the question as to why a new syntactic category is essential, and why class declarations may not be enhanced with the additional capabilities that capsules provide, namely, confinement (as in Erlang [4]) and an activity thread (as in previous work on concurrent OO languages [8, 44, 40]). There are three main reasons for this design decision in Panini. First, we believe based on previous experience that objects may be too fine-grained to think of each one as a potentially independent activity [33]. Second, we wanted to specify a system as a set of related capsules with a fixed topology, in order to make it feasible to perform static analysis of the system graphs described in §4; this implies that capsules should not be first-class values. Third, there is a large body of OO code that is written without any regard to confinement. Changing the semantics of classes would have made reusing this vast set of libraries difficult, if not impossible. In the current design of Panini, since the syntactic categories are different, sequential OO code can be reused within the boundary of a capsule without needing any modification.

3.2 Informal Semantics

Any capsule with a run procedure begins executing independently as soon as the initialization and interconnection of all capsules is complete and may generate calls to the procedures of other capsules. For example, referring to Figure 1, capsule GPS will run the code on lines 88-90. Capsules without a run procedure, such as Shortest and ManeuverGen, perform computation only when their procedures are invoked.

As discussed in more detail in §4, even a capsule without a run procedure may be associated with an independent execution thread. Thus, in any nontrivial Panini program there will be issues of thread-safety to address for all capsules. Thread-safety is primarily ensured by confining the object graphs rooted at each capsule's states, i.e., only a single execution thread has access to the states of a capsule. In inter-capsular calls, built-in and immutable types are passed and returned by value, whereas reference types are passed by linear transfer of ownership for the object graph rooted at the parameter object or result object.

Although capsule procedures may execute asynchronously, the programmer does not have to program in a message-passing style. A call to a capsule procedure behaves like an ordinary method call. Calling a capsule procedure may have side-effects on the state of the capsule instance, e.g., reading or writing state, and may also lead to external calls to other capsule procedures. For two consecutive procedure calls on the same capsule instance, the side-effects of the first procedure call are always visible to the second procedure call, to provide sequential consistency.


However, it is also possible that two calls on different capsules ultimately lead to effects within a single capsule, and we also ensure that the effects of consecutive calls, as observable within a given capsule, are always seen in the order intended by the programmer.

Navigation system revisited. Compared to the explicitly concurrent Java program on lines 1-65 in Figure 1, the Panini program on lines 66-97 is implicitly concurrent. Owing to the declarative nature of some Panini features, this program is much shorter than the Java program. Most importantly, this example illustrates some of the key advantages of the Panini approach for programmers:

• They don't need to create explicit threads or specify whether a given capsule needs its own thread of execution.
• They don't need to recognize or reason about potential data races.
• They work within a familiar method-call style interface with a reasonable expectation of sequential consistency.
• All synchronization-related details are abstracted away and are fully transparent to them.

4. THE PANINI COMPILER

To show the feasibility of realizing capsule-oriented programming in an industrial-strength compiler, we extended the OpenJDK Java compiler (javac) with support for capsule-oriented programming to create the Panini compiler. We considered library- and annotation-based approaches instead of extending the syntax of Java, but the notational burden of these alternatives was significant enough to hamper usability, so we opted for syntax changes.

There were three major challenges: how to detect confinement violations, how to detect the possibility of sequential inconsistency, and how to realize capsules while minimizing overhead.

Confinement. Within a capsule, thread-safety is primarily ensured by confinement: only a single capsule instance has access to the objects belonging to the instance. Confinement in itself is generally insufficient, however, since mutable objects can still be passed as arguments to inter-capsule procedure calls and returned as values. Unlike Erlang, which enforces that all data is immutable [4], or Scala actors with capabilities [19], which use an ownership type system [11, 10], we have adapted a static analysis for confinement [18] that tracks variables used as parameters in inter-capsule procedure calls and in return statements of public procedures.
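To make the property being checked concrete, here is a toy Java illustration. It is not Panini's actual confinement analysis (which is adapted from [18]); the statement encoding and variable names below are hypothetical, and it only sketches the rule that a mutable object should not be used by the sending capsule after it has been passed across a capsule boundary.

    import java.util.*;

    // Toy illustration of the confinement rule, not the Panini compiler's analysis.
    // "Statements" are strings in a made-up form: "send x" passes x to another
    // capsule, "use x" reads or writes x afterwards in the sending capsule.
    class ConfinementCheckSketch {
      static void check(List<String> stmts) {
        Set<String> transferred = new HashSet<>();
        for (String s : stmts) {
          String[] parts = s.split(" ");
          String op = parts[0], var = parts[1];
          if (op.equals("use") && transferred.contains(var))
            System.out.println("warning: '" + var + "' used after being passed to another capsule");
          if (op.equals("send")) transferred.add(var);
        }
      }

      public static void main(String[] args) {
        check(List.of("send route", "use route")); // flagged
        check(List.of("use pos", "send pos"));     // fine: used before the transfer
      }
    }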

Sequential consistency. In Panini, all instances of capsules in a system and all interconnections between them are declared statically; there is no inheritance for capsules and references to them cannot be passed as procedure arguments. The compiler exploits these properties to efficiently construct a system graph showing dependencies between capsule instances. In addition, when compiling a system declaration the compiler produces a more detailed interprocedural system graph in which the nodes are capsule procedures and edges are interprocedural calls. The compiler detects cases in which two calls lead to observable effects within the same capsule instance and provides a warning to the programmer. In some cases, that warning may be benign, and we provide an annotation Commutes for the programmer to express that. Our current prototype does not check whether this annotation is correctly placed; however, Diniz and Rinard's commutativity analysis could be applied [13]. If two calls are made to the same capsule instance, the FIFO ordering of messages (see below) ensures that effects within that capsule instance will occur in order.
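As a rough illustration of the kind of warning this system graph enables, the following Java sketch builds a capsule-instance call graph from hypothetical wiring (the capsule names A through D are made up, and this is a deliberately simplified stand-in for the compiler's interprocedural analysis): if two distinct call paths can reach the same capsule instance, the ordering of effects there is not determined by program order alone, so a warning is emitted.

    import java.util.*;

    // Simplified sketch of the system-graph check described above; not the
    // Panini compiler's implementation. The wiring is hypothetical and the
    // graph is assumed to be acyclic.
    class SystemGraphSketch {
      private final Map<String, List<String>> calls = new HashMap<>();

      void wire(String from, String to) {
        calls.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
      }

      // Counts distinct call paths from src to target.
      int paths(String src, String target) {
        if (src.equals(target)) return 1;
        int n = 0;
        for (String next : calls.getOrDefault(src, List.of()))
          n += paths(next, target);
        return n;
      }

      public static void main(String[] args) {
        SystemGraphSketch g = new SystemGraphSketch();
        g.wire("A", "B"); // capsule A calls B ...
        g.wire("A", "C"); // ... and C,
        g.wire("B", "D"); // and both B and C
        g.wire("C", "D"); // call into D: two paths from A to D.
        if (g.paths("A", "D") > 1)
          System.out.println("warning: effects of A's calls may reach D in either order");
      }
    }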

Execution model. A capsule instance is a potential execution domain as well as an ownership domain for its state.

Figure 2: A capsule. Procedure invocations are enqueued as requests on the Message Ring Buffer. Requests are processed by an execution thread in FIFO order.

An important feature of Panini is that the specification of capsules by the programmer is decoupled from the decisions about how to map each capsule's activities to OS threads. Logically, each capsule is an independent process-like entity, and the invocation of a capsule procedure triggers creation of a request object that is placed on a FIFO queue (the message ring buffer shown in Figure 2). In the default implementation, requests are executed in FIFO order by a single thread associated with the capsule instance.

From the caller's point of view, invoking a capsule procedure looks like an ordinary method call. The call returns immediately and the caller receives a future as a proxy for the actual return value (void return values are allowed). The future is completely transparent to the programmer; that is, it is an automatically generated type that is consistent with the expected return value [35]. If the value is not used immediately, the caller can continue execution. An attempt by the caller to invoke a method on the returned future will cause the caller to block until the callee has finished executing the procedure, automatically providing a synchronization point.
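The following Java fragment is a rough analogue of this execution model built on the standard java.util.concurrent library; it is not the code the Panini compiler actually generates, and unlike Panini's transparent futures the Future here is visible to the caller. Each capsule-like object owns a single-threaded executor, so submitted requests run one at a time in FIFO order, and the caller blocks only when it finally uses the result.

    import java.util.concurrent.*;

    // Rough analogue of a capsule at run time: one thread draining a FIFO queue
    // of requests, with procedure invocations returning futures. Illustrative
    // only; the state and procedure below are hypothetical.
    class CapsuleRuntimeSketch {
      static class CapsuleLike {
        // One dedicated thread per capsule instance; submissions are FIFO.
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private long state = 0; // confined: only the worker thread touches it

        // Analogue of a capsule procedure: enqueue a request, return a future.
        Future<Long> increment(long by) {
          return worker.submit(() -> { state += by; return state; });
        }

        void shutdown() { worker.shutdown(); }
      }

      public static void main(String[] args) throws Exception {
        CapsuleLike c = new CapsuleLike();
        Future<Long> f = c.increment(41); // returns immediately
        // ... the caller can keep doing unrelated work here ...
        System.out.println(f.get());      // blocks only when the value is needed
        c.shutdown();
      }
    }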

5. FURTHER APPLICATIONS

In coarse-grained concurrent applications, such as the simplified navigation system illustrated in Figure 1, the main motivation is not necessarily to achieve parallel execution but rather to correctly and safely model components that are "logically autonomous" [24]. These kinds of asynchronous, event-driven systems are the obvious candidates for Panini. However, the capsule abstraction also adapts easily to other styles of parallel programming, while retaining Panini's advantages of abstracting away the concurrency control mechanisms and ensuring data confinement.

Master-worker example. As an example, Figure 3 is a simple parallel program in which a number of "worker" tasks execute a Monte Carlo approximation of π in parallel; a "master" task combines the results as the workers finish. Each call to compute on line 17 executes asynchronously in an instance of a Worker capsule; the returned Number object is transparently replaced by a future for the eventual result. The futures provide an implicit barrier; that is, in the call to value on line 19, the execution of the run procedure in the Master capsule blocks until the corresponding Worker has finished computing its result.

The explicitly concurrent Java program has the application's concerns tangled with the concurrency concerns, whereas the Panini program abstracts away the details of concurrency. As Figure 3 shows, the performance of the Panini program is comparable to that of the thread-based program. A more significant potential benefit is that the Panini compiler can be employed to guard against race conditions when parallelism is introduced into an application.

Thread pool example. Figure 4 shows the Java and Panini code for a simplified server using a thread pool implemented with the leader-followers pattern [38].


Java program with threads and synchronization

1  class Worker implements Runnable {
2    long num;
3    private final CountDownLatch doneSignal;
4    Worker(long num, CountDownLatch doneSignal) {
5      this.num = num;
6      this.doneSignal = doneSignal;
7    }
8    Random prng = new Random();
9    Number _circleCount = null; // Emulates return value of worker
10   Number getCircleCount() { return _circleCount; }
11   public void run() {
12     _circleCount = new Number(0);
13     for (long j = 0; j < num; j++) {
14       double x = prng.nextDouble();
15       double y = prng.nextDouble();
16       if ((x * x + y * y) < 1) _circleCount.incr();
17     }
18     doneSignal.countDown();
19   }
20 }
21 class Master {
22   void assign(long totalCount, int numWorkers) {
23     CountDownLatch l = new CountDownLatch(numWorkers);
24     Worker[] workers = new Worker[numWorkers];
25     for (int i = 0; i < numWorkers; ++i) {
26       workers[i] = new Worker(totalCount/numWorkers, l);
27       new Thread(workers[i]).start();
28     }
29     try {
30       l.await();
31     } catch (InterruptedException e) { /* Error recovery */ }
32     Number[] results = new Number[numWorkers];
33     for (int i = 0; i < numWorkers; i++)
34       results[i] = workers[i].getCircleCount();
35     long total = 0;
36     for (Number result: results) total += result.value();
37     double pi = 4.0 * total / totalCount;
38   }
39 }
40 public class Pi {
41   public static void main(String[] args) {
42     Master master = new Master();
43     master.assign(50000000, 10);
44   }
45 }

Panini program

1  capsule Worker (int num) {
2    Random prng = new Random();
3    Number compute() {
4      Number _circleCount = new Number(0);
5      for (int j = 0; j < num; j++) {
6        double x = prng.nextDouble();
7        double y = prng.nextDouble();
8        if ((x * x + y * y) < 1) _circleCount.incr();
9      }
10     return _circleCount;
11   }
12 }
13 capsule Master (int totalCount, Worker[] workers) {
14   void run() {
15     Number[] results = new Number[workers.length];
16     for (int i = 0; i < workers.length; i++)
17       results[i] = workers[i].compute();
18     long total = 0;
19     for (Number result: results) total += result.value();
20     double pi = 4.0 * total / totalCount;
21   }
22 }
23 system Pi {
24   Master master; Worker workers[10];      // Statically declared capsule array
25   master(50000000, workers);              // Wiring capsules together
26   for (Worker w : workers) w(5000000);    // Giving initial value to num
27 }

Performance results (plot not reproduced)

Figure 3: Parallel programs for Monte Carlo calculation of π. The user-defined Number class is the same in both Java and Panini.

The Java code is based on the implementation of the Tomcat 6 web server [1]. The thread pool Pool consists of a queue of idle threads. When the pool receives a request from the Host via the runIt method to handle a connection, a Worker is removed from the queue and awakened with the doRunIt method. When finished handling the connection, the worker calls available to put itself back on the idle queue and then waits for the next task. If there are no idle threads in the queue when the host calls runIt, the main thread will block until a worker is available to handle the request, automatically providing a form of throttling to ensure that existing connections are handled before any new connections are accepted. (Note that this example, while faithful to the Tomcat implementation, does not precisely match the leader-followers pattern as described in [38], in which the worker threads also take turns listening for connections.)

The Panini version is functionally the same, but is dramatically simpler since all of the queueing, waiting, and notification is implicit. When a Worker calls the getConnection procedure, the call returns immediately with a future representing the Socket object, and a task corresponding to the procedure body is queued for execution in the Host. When a worker attempts to use the socket in its handleConnection helper, it blocks until the Host provides the actual socket.

Pipeline example. Panini's features can also be useful for implementing applications that can benefit from pipeline parallelism. To illustrate, consider the problem of maintaining the running average and maximum of a stream of numbers, e.g., the average and maximum price of a stock in a day. Figure 5 presents a model with four capsules: Source, Average, Max, and Sink. Each of these capsules implements the signature Stage, which provides only one procedure, consume. The capsule Source generates a stream of random numbers and sends it down the pipeline, where Average updates its running average, Max updates the maximum, and Sink counts. Finally, on line 33, instances of these capsules are connected as a pipeline src -> avg -> max -> snk.

This code is very similar to how one would write a sequential program to model the same scenario, so the structure of this Panini program would be familiar to a sequential programmer. This code is also free of any concurrency-related concerns, such as setup and teardown of threads for running each stage in the pipeline concurrently and synchronization between adjacent stages to hand off the input to the next stage, which is typical of a pipeline pattern. This code would, however, have all of the benefits of the explicitly concurrent implementation. Therefore, we believe that a sequential programmer would have a greater chance of success.
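To make the comparison concrete, here is what a plain sequential Java rendering of the Figure 5 pipeline might look like (this fragment is illustrative and not part of the paper's benchmarks; the class and field names are made up). Every stage becomes an ordinary method call, which is exactly the calling style the Panini version keeps while the compiler adds the concurrency.

    import java.util.Random;

    // Sequential counterpart of the Figure 5 pipeline, for comparison only.
    class SequentialPipeline {
      static long count = 0, max = Long.MIN_VALUE;
      static double average = 0;

      // Average, Max, and Sink collapsed into one sequential consume step.
      static void consume(long n) {
        average = ((count * average) + n) / (count + 1);
        if (n > max) max = n;
        count++;
      }

      public static void main(String[] args) {
        Random prng = new Random();
        for (int j = 0; j < 500; j++) consume(prng.nextInt(1024)); // plays the Source role
        System.out.println("avg=" + average + " max=" + max + " count=" + count);
      }
    }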


Java

1  public class Server {
2    public static void main(String[] args) {
3      new Host().runServer();
4    }
5  }
6  class Pool {
7    Queue<Worker> idle = new LinkedList<Worker>();
8    public void addWorker(Worker w) {
9      idle.add(w);
10   }
11   public synchronized void runIt(Socket s) {
12     while (idle.isEmpty()) wait();
13     Worker w = idle.remove();
14     w.doRunIt(s);
15   }
16   public synchronized void available(Worker w) {
17     idle.add(w);
18     notify();
19   }
20 }
21 class Worker extends Thread {
22   private Pool p;
23   private Socket s;
24   public Worker(Pool p) { this.p = p; }
25   public void run() {
26     while (true) {
27       synchronized(this) {
28         while (s == null) wait();
29         handleConnection(s);
30         s = null;
31       }
32       p.available(this);
33     }
34   }
35   public synchronized void doRunIt(Socket s) {
36     this.s = s;
37     notify();
38   }
39   private void handleConnection(Socket s) { /*...*/ }
40 }

Java program, con't

41 class Host {
42   private Pool p;
43   public Host() {
44     p = new Pool();
45     for (int i = 0; i < 10; ++i) {
46       p.addWorker(new Worker(p));
47     }
48   }
49   public void runServer() {
50     ServerSocket ss = new ServerSocket(8080);
51     while (true) {
52       Socket s = ss.accept();
53       p.runIt(s);
54     }
55   }
56 }

Panini

57 capsule Host() {
58   ServerSocket ss = new ServerSocket(8080);
59   Socket getConnection() {
60     Socket s = ss.accept();
61     return s;
62   }
63 }
64 capsule Worker(Host h) {
65   void run() {
66     while (true) {
67       Socket s = h.getConnection();
68       handleConnection(s);
69     }
70   }
71   private void handleConnection(Socket s) { /*...*/ }
72 }
73 system Server {
74   Host h; Worker workers[10]; // Statically declared capsule array
75   for (Worker w: workers)
76     w(h);                     // Wiring capsules together
77 }

Figure 4: Server with a leader-followers style thread pool (based on Tomcat 6). Exception handling code is elided.

6. EVALUATION: PERFORMANCE

We now investigate three critical features of Panini to determine its viability as a language and an abstraction for non-trivial, implicitly concurrent programming. First, in §6.2 we translate the 15 popular benchmarks shown in Figure 6 into Panini and determine the lines of code changed by minimal rewriting. Second, in §6.3 we test the speedup of these Panini benchmarks against the original serial and parallel versions. Third, in §6.4 we examine the memory overhead of the Panini benchmarks compared with the originals.

6.1 Benchmarks

Our selected benchmark suites, JavaGrande (JG) [41] and NPB [15], offer explicitly-threaded and explicitly-serial reference programs. Our rewriting translated only concurrency-related code and structure from the original programs and did not alter or optimize code not related to concurrency. We translated every Java benchmark in NPB and every benchmark in JG that had both a serial and a threaded version. For the rest of our evaluation, we refer to these explicitly-sequential and explicitly-parallel programs as SERIAL and PARALLEL, respectively. Figure 6 summarizes several characteristics of these programs gathered with SOOT [42] and lists what problem sizes our experiments used.

6.2 Code Size

Explicitly concurrent code can be some of the most intricate code in a program and challenging to write. Reducing this volume of code eases the programmer's burden and limits development bugs.

Research Question 1: Do Panini programs require less code to implicitly achieve parallel and serial execution?

Methodology. The translation of NPB and JG benchmarks from Java to Panini allowed many lines of explicitly-concurrent code to be replaced by implicitly-concurrent lines of code. We used the following non-intrusive guidelines for rewriting:

• Change thread objects to Panini capsules
• Change synchronized methods and blocks to Panini capsule methods and calls
• Create Panini capsule fields for top-level class instances
• Remove explicitly-concurrent and serial-only code

We used standard techniques for counting lines of code, which ignore blank lines and comments and count multi-statement lines with a weight equal to the number of actual statements.³
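A rough Java approximation of that counting rule is shown below; it is an illustration rather than the tool cited in the footnote, and it handles only line comments.

    import java.nio.file.*;
    import java.util.*;

    // Approximates the counting rule above: skip blank lines and line comments,
    // and weight a line by the number of statements (';') it contains.
    class LocCount {
      static long count(List<String> lines) {
        long total = 0;
        for (String raw : lines) {
          String line = raw.replaceAll("//.*$", "").trim(); // strip line comments
          if (line.isEmpty()) continue;
          long stmts = line.chars().filter(c -> c == ';').count();
          total += Math.max(1, stmts); // a code line counts at least once
        }
        return total;
      }

      public static void main(String[] args) throws Exception {
        System.out.println(count(Files.readAllLines(Paths.get(args[0]))));
      }
    }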

Results. The lines of code added and deleted by our rewriting are shown in Figure 7.

Analysis. The translated Panini benchmarks show a significant drop in lines of concurrency-related code because synchronization patterns are more concise in Panini. For example, execution of parallel workers is synchronized regularly in NPB with 11 lines in a master object and 13 lines in a worker object. Panini realizes this pattern implicitly with only 2 and 4 capsule lines, respectively.

³http://reasoning.com/downloads/java_line_count_estimator.html


1  signature Stage { void consume(long n); }
2  capsule Source (Stage next, long num) {
3    Random prng = new Random();
4    void run() {
5      for (int j = 0; j < num; j++) {
6        long n = prng.nextInt(1024);
7        next.consume(n);
8      }
9    }
10 }
11 capsule Average (Stage next) implements Stage {
12   long average = 0;
13   long count = 0;
14   void consume(long n) {
15     next.consume(n);
16     average = ((count * average) + n) / (count + 1);
17     count++;
18   }
19 }
20 capsule Max (Stage next) implements Stage {
21   long max = Long.MIN_VALUE;
22   void consume(long n) {
23     next.consume(n);
24     if (n > max) max = n;
25   }
26 }
27 capsule Sink(long num) implements Stage {
28   long count = 0;
29   void consume(long n) { count++; }
30 }
31 system Pipeline {
32   Source src; Average avg; Max max; Sink snk;
33   src(avg,500); avg(max); max(snk); snk(500);
34 }

Figure 5: An Example of Pipeline Parallelism in Panini.

Benchmark        Abbr.  Size  # Methods  # Statements

Java Grande
  Crypt           cr     C       30         1,567
  LUFact          lu     C       33         1,737
  MolDyn          md     B       35         2,417
  MonteCarlo      mc     B      110         2,252
  RayTracer       rt     B       68         2,303
  Series          ser    B       28           873
  SOR             sor    C       26           771
  SparseMatmult   sm     C       27           818

NPB
  BT              –      A       44        34,804
  CG              –      A       31         3,434
  FT              –      A       37         4,831
  IS              –      A       27         2,358
  LU              –      A       44        36,736
  MG              –      A       41         7,818
  SP              –      A       44        28,098

Total                           625       130,817

Figure 6: Static Characteristics of Evaluation Benchmarks

By reducing the volume and complexity of concurrent code, Panini promises to boost programmer productivity and success during concurrent program design. When rewriting these benchmarks in Panini, our student programmers gave frequent feedback that the Panini code they created was easier for them to write and understand than the explicitly-concurrent code of the original benchmarks.

6.3 Speedup

To ensure that Panini program performance is comparable to that of explicitly-parallel programs, we examined the speedup that Panini benchmarks achieve with reference to the original serial versions and original parallel versions. We define Panini speedup as:

Speedup = Reference Time / Panini Time

Because speedup is a ratio, its average is correctly computed as a geometric mean rather than as an arithmetic mean.
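For instance, the averaging convention can be computed as below; the per-benchmark speedup values in the example are made up for illustration and are not results from this paper.

    // Geometric mean of speedup ratios, the averaging convention used below.
    class GeoMean {
      static double geomean(double[] xs) {
        double logSum = 0;
        for (double x : xs) logSum += Math.log(x);
        return Math.exp(logSum / xs.length);
      }

      public static void main(String[] args) {
        double[] speedups = {0.9, 1.0, 1.2};   // hypothetical per-benchmark ratios
        System.out.println(geomean(speedups)); // ~1.026, vs. arithmetic mean ~1.033
      }
    }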

Figure 7: Change in Benchmark Code Size

General Methodology. Following the methodology of Georges et al., we tested the speedup of Panini benchmarks against the originals using two different metrics [17]. Start-up performance is measured for "one VM invocation and a single benchmark iteration." Steady-state performance is measured using "one VM invocation and multiple benchmark iterations," where each benchmark is repeated within the VM until its performance reaches steady-state. Iteration time measurements are taken after steady-state is reached.

For our experiments, steady-state performance is reached when the coefficient of variation of the most recent three iteration times of a benchmark falls below 0.02. Some changes to the original benchmarks were necessary to reset static result variables, restart timers, join created threads, and cut object reference loops in support of repeated iterations in a single VM.
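A small Java sketch of this stopping criterion follows; the iteration times are hypothetical values in milliseconds, and the benchmark-harness details are omitted.

    // Steady-state test: coefficient of variation (stddev / mean) of the last
    // three iteration times must fall below 0.02.
    class SteadyState {
      static double cov(double[] xs) {
        double mean = 0, var = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        for (double x : xs) var += (x - mean) * (x - mean);
        return Math.sqrt(var / xs.length) / mean;
      }

      public static void main(String[] args) {
        double[] lastThree = {101.2, 99.8, 100.4}; // hypothetical times (ms)
        System.out.println(cov(lastThree) < 0.02 ? "steady state reached" : "keep iterating");
      }
    }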

We repeated our comparison on multiple evaluation platforms with a number of cores ranging from 2 to 12, including an Intel Core 2 Duo with two cores, an Intel Core i5 with four cores, an AMD Opteron 2431 with six cores, and an AMD Opteron 2431 with twelve cores. We also considered several different values for the number of threads used by each benchmark in order to compare performance across a range of thread-to-core ratios.

6.3.1 Start-up Time

When a Java program is run only once, a previous adaptive compilation of that program is not available to the VM. If the program is fairly short, the execution time will include JIT-ing but will not be long enough to benefit much from it. In these cases, start-up time is a critical performance measure.

Research Question 2: Do Panini programs have better start-up performance than sequential programs?

Methodology. We took 30 start-up time samples for each benchmark on each evaluation platform and at each number of threads of interest. The number of threads ranged from "0," which implies serial execution, up to 24, which is twice the number of cores on our largest evaluation machine. Time was measured as the OS-reported CPU time used to execute the program once in a fresh VM. Because the NPB benchmarks BT, LU, and SP take orders of magnitude longer to run than all other benchmarks, we sampled them only 3 times and not on our 2-core platform. We computed 95% confidence intervals (CI's) using Student's t-inference to verify the reliability of our observed summary statistics.

Results. At a glance, Figure 8 shows the speedup for all test cases and all Panini benchmarks with reference to SERIAL. As white changes to blue, speedup increases from 1.0 to 12.0. As white changes to yellow, speedup decreases from 1.0 to 0.0.

Analysis. Figure 8 is dominated by white and blue, which shows that Panini speedup is near 1.0 or above 1.0 in most cases. Only a few test cases show a slowdown with Panini in light yellow.


Figure 8: Start-up Time Speedup For Panini vs. SERIAL. (Stronger blues show that Panini performance is better.)

This overview gives quick visual evidence that a Panini program is as good as or better than SERIAL. Not all programs exhibit start-up time speedup; so, we examine head-to-head speedup at each experiment setting to confirm that Panini's implicitly-concurrent performance is on par with PARALLEL.

Research Question 3: Is the start-up time speedup of Panini programs comparable to that achieved by human experts?

Methodology. To answer this research question, we used the test data that was gathered for the previous question.

Results. Figure 9 shows the head-to-head speedup of each Panini benchmark against its explicit PARALLEL or SERIAL version. The confidence intervals for start-up time speedup were very tight; so, they are not shown.

Analysis. The majority of Panini benchmarks show start-up time performance close to that of their explicitly-parallel counterparts. SOR from JavaGrande exhibits unpredictable behavior while the others follow consistent speedup patterns. As the number of machine cores increases, speedup tends to vary more. The 2-core machine shows speedup constricted around 1.0 while the 12-core machine shows speedup ranging more loosely between about 0.8 and 1.2.

The Panini speedup for SOR is unique and somewhat erratic. We inspected the PARALLEL code for SOR and discovered that a busy wait on shared data values is used for thread synchronization. This has rising costs as the number of threads increases; so, Panini's automatic synchronization mechanism performs much better at non-ideal thread-to-core ratios. It may be that this behavior is intended for benchmarking purposes. Other JG benchmarks make use of a TournamentBarrier and show better behavior as thread count increases.

Assuming a log-normal distribution for speedup, statistical t-inference yields a 99% confidence interval of (0.997, 1.021) for speedup across all start-up time experiments. Repeating the t-test for an alternative hypothesis that true speedup is less than 1.0 yields a p-value of 0.9699 against that alternative. So, even when restricted to the same compilation decision as human experts, there is strong evidence that a Panini program with compiler-generated synchronization will execute as fast as or faster than a vetted Java program with manually-written synchronization.

6.3.2 Steady-State Time

When a Java program runs for a long time, adaptive compilation and class loading performed during start-up have much less impact on overall performance. This can be approximated for shorter programs with repeated executions in a single VM. For long-running concurrent applications such as server programs, steady-state performance is of greater interest than start-up time.

Research Question 4: Do Panini programs have better steady-state performance than sequential programs?

Methodology. We took 10 steady-state time samples for each benchmark on the same combination of platforms and program thread counts as the preceding start-up time experiment. Time was measured as the average clock time of three iterations of the program inside an existing VM after steady-state performance was reached. Because the NPB benchmarks BT, LU, and SP take orders of magnitude longer to run than all other benchmarks, we sampled them 2 times and not on our 2-core evaluation platform. We again computed 95% confidence intervals.

Results. Figure 10 shows an overview of the steady-state time speedup of each Panini benchmark with reference to SERIAL.

Analysis. Almost entirely white and blue, Figure 10 establishes that Panini programs are as fast as or faster than SERIAL. Not all Panini benchmarks show speedup against SERIAL; so, we again more closely examine steady-state time in head-to-head comparisons of Panini benchmarks with the originals.

Research Question 5: Is the steady-state time speedup of Panini programs comparable to that achieved by human experts?

Methodology. To answer this research question, we used the test data that was gathered for the previous question.

Results. Head-to-head steady-state speedup for Panini with reference to PARALLEL and SERIAL is shown in Figure 12.

Analysis. As before, most Panini benchmarks perform virtually the same as their counterparts. The variation in speedup increases with the number of cores on the platform, exhibiting little difference from 1.0 on a 2-core machine and a range of about 1.3 to 0.7 on a 12-core machine. Though not shown, the 95% CI's remain between 0.6 and 1.6 for all benchmarks other than SOR, whose busy-sync behavior is again a problem for PARALLEL.

The expected steady-state speedup overall is 1.003, with a 99% confidence interval of (0.988, 1.018). Though not as clearly equivalent as it was for start-up time, there remains evidence that a Panini program with automatic, implicit concurrency will execute as fast as an expert's explicitly-parallel Java program in steady state.

6.4 Memory Overhead

Efficient use of both time and memory is essential for applications to reduce energy costs and allow other programs to be productive on a shared platform. Having shown that the size of concurrency code is decreased and that the execution time remains generally unchanged for Panini programs, we consider finally what memory overhead, if any, is introduced by Panini.

Research Question 6: How much memory overhead is introduced by Panini and what are the primary sources of this overhead?

Methodology. Using our 4-core evaluation platform, we logged garbage collector activity for 10 runs of each benchmark at each number of threads used in the speedup experiments of §6.3. Because the NPB benchmarks BT, LU, and SP take orders of magnitude longer to run, we sampled them only 3 times.


Figure 9: Start-up Time Speedup For Panini vs. Originals (equivalent, overall, with no direct concurrent programming; geometric-mean speedups: 1.019 on 2 cores, 0.997 on 4 cores, 1.012 on 6 cores, 1.009 on 12 cores).

Figure 10: Steady-state Time Speedup For Panini vs. SERIAL. (Stronger blues show that Panini performance is better.)

Figure 11: Change in Java Heap Used For Panini vs. Originals

Results. Figure 11 shows the head-to-head ratio of the heap memory used by Panini over PARALLEL and SERIAL on our 4-core evaluation platform.

Analysis. Panini's automatically returned future objects are small; however, when a program otherwise uses little memory per synchronization, this overhead becomes more apparent. In particular, LUFact uses shared objects throughout its execution yet has several barriers per iteration.

Our Panini version of SparseMatmult was based on the PARALLEL version, which creates a copy of 3 large matrices in each worker object. The SERIAL version uses a single master copy of its matrices. So, SERIAL consumes only about half as much Java heap as the Panini version, which has 1 serial worker capsule.

6.5 Summary

Panini programs automatically achieve start-up and steady-state time performance that is likely indistinguishable from hand-crafted, explicitly-parallel programs, all in fewer lines of code and with no explicitly concurrent code. Compiler-generated synchronization introduces minimal memory overhead, which becomes trivial given sufficient computation per capsule method. Thus, Panini is a very attractive choice for a simple, implicitly-concurrent development language that yields no performance loss.

7. RELATED IDEAS

The idea of exposing concurrency at the boundary of components has appeared in one form or another in much previous work, e.g., guardians [25], active objects [32], and actors [3], as in Erlang [4], Scala [20], and AmbientTalk [12]. Many of the early proposals were in the context of distributed systems; nevertheless, they are relevant to current and ongoing efforts in the design, semantics, and implementation of concurrent language features [4, 20], including capsules. Bal, Steiner, and Tanenbaum present an excellent overview of this research [5]. The observation that improving parallelism requires better abstractions has also appeared in the literature, e.g., X10 [9], Habanero [7], and Galois [22]. A primary goal of Panini is to decrease the impedance mismatch between sequential and implicitly concurrent code to help sequentially trained programmers who may struggle with concurrency. In contrast to prior work that provides an asynchronous programming model, calls to capsules are treated as logically synchronous in Panini. Many decisions in the design of Panini and its implementation are driven by this fundamental difference in philosophy.

Active Objects. Lavender and Schmidt [23] and Nierstrasz [32], among others, investigated the notion of active objects. Most recently Schäfer and Poetzsch-Heffter [37] as well as Clarke et al. [10] have further investigated this design. The notion of a "domain" in Hybrid [32] is closely related to a capsule. A domain encapsulates a set of objects, whereas a capsule acts as a dominator for the object graph reachable via the capsule [11]. Like a capsule, a domain can only have a single active thread of control called an "activity," which is like a token that is moved from one domain to another; in Panini a capsule logically retains an active thread of control throughout its lifetime. In Hybrid all calls (except for "activity start") were remote procedure calls with blocking send, whereas in Panini intra-capsular calls are synchronous and inter-capsular calls are logically synchronous.


Figure 12: Steady-state Time Speedup For Panini vs. Originals (equivalent, overall, with no direct concurrent programming; geometric-mean speedups: 1.014 on 2 cores, 1.016 on 4 cores, 1.003 on 6 cores, 0.98 on 12 cores).

MOAO [10] distinguishes between "active" and "passive" objects and uses futures to introduce concurrency for calls on active objects. Transparent futures using proxy objects similar to those generated by the Panini compiler are described in [35].

Argus. The notion of capsules in Panini is also related to the notion of guardians in Argus [25]. Like guardians, capsules encapsulate and control access to resources. However, unlike guardians, sending a request does not create a concurrent process in Panini, avoiding concurrency-related issues within a capsule's boundary. Unlike the dynamic creation of guardians in Argus, capsules are statically created and configured in Panini, which helps provide sequential consistency via a compile-time analysis.

More declarative concurrency. Unlike Jade [36] and similar approaches like Cilk [14] that focus on fine-grained parallelism, Panini combines both concurrent tasks and data into the capsule abstraction to provide coarse-grained implicit concurrency. Capsules are also more general than async event types, which are specific to the implicit-invocation design style [26].

Concurrent object-oriented languages. There is also a rich body of work on concurrent object-oriented languages such as COOL [8], Seuss [30], Concurrent Smalltalk [44], and BETA [40]. See Papathomas's survey for an overview [33]. Unlike the works above, in Panini objects do not execute in the context of a local process, which avoids creating too many processes [33]. Panini's design also avoids the inheritance anomaly [33].

StreamIt. Capsules in Panini can also be used like filters and streams in the StreamIt language, as we show in §5; however, since the focus of StreamIt is streaming applications, it provides some specialized (and highly optimized) features for this domain, e.g., splitters and joiners. Panini is intended to be a general-purpose language in which these features can be defined by the programmer; e.g., consider a capsule Splitter in Figure 5 that connects to two other capsules and calls the procedure consume on both, as sketched below.
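Figure 5 itself is not reproduced here; the following is a minimal Java rendering of the same structure, with a hypothetical Worker interface and item type, to show that a splitter is nothing more than a component holding references to two downstream consumers and forwarding each item to both. In Panini, the two workers would be capsules wired to the Splitter in the system's design, and the forwarded calls would be implicitly concurrent.

// Hypothetical downstream interface for this sketch.
interface Worker {
  void consume(String item);
}

// A splitter forwards every item it receives to both downstream workers.
class Splitter implements Worker {
  private final Worker left;
  private final Worker right;

  Splitter(Worker left, Worker right) {
    this.left = left;
    this.right = right;
  }

  @Override
  public void consume(String item) {
    left.consume(item);
    right.consume(item);
  }
}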

Explicitly concurrent languages. Compared to explicitly concurrent features like threads in Java and C#, and approaches like Unified Parallel C (UPC) [6] and Titanium [43], Panini provides implicit concurrency at the capsule boundary. The advantage of Panini's approach compared to these ideas is its ease of use for a sequentially trained programmer, whereas a disadvantage is the lack of fine-grained control over concurrent structures.

Modules in sequential languages. One of the earliest languages to have the notion of modules with exported and imported interfaces was Mesa [31], a language designed and developed at Xerox Palo Alto Research Center in the 1970s and early 1980s. The notions of modules and signatures in ML are similar [29]. Panini's capsules are based on Mesa's modules but also provide implicit concurrency.

8. CONCLUSION AND FUTURE WORK

Programmers come in two camps: those who have mastered reasoning about the interleaving of concurrent tasks and those who find it difficult. There are many in the second camp [28, 2]. There is a significant body of research on helping programmers in the first camp, typically HPC programmers [9, 7, 22]. Programmers who find concurrency hard to master can benefit from better abstractions that provide implicit concurrency [3, 25, 32]. While the jury is still out on whether the asynchronous message-passing model for concurrency is the best path going forward, it is clear that the existing software development workforce was not educated to adopt that model [2].

We have argued for research on programming language mechanisms that adhere to assumptions that a sequentially trained programmer would rely upon, such as the sequential semantics of procedure calls. However, we do not promise concurrency for free; rather, we rely upon the observation that capable sequential programmers have received some formal training in modularization-related concepts in both basic and advanced courses. To harness this knowledge toward exposing modularization-guided implicit concurrency in program design, we have shown how to fuse the notion of a module [31, 16, 29] with the notion of an activity [32] and an ownership domain [11] such that separating program logic into capsules naturally contributes toward implicit concurrency. Capsules provide simultaneous benefits in terms of both software evolution and program performance.

In Panini, the specification of capsules by the programmer is decoupled from decisions regarding how to associate them with threads of execution. The default implementation associates a dedicated OS thread with each capsule. However, in some cases a purely thread-based execution model may introduce unnecessary overhead. Our immediate future work focuses on a compilation strategy that, based on a set of heuristics and program metrics we have identified, selects for each capsule either a thread-based implementation, a purely sequential implementation, or a task-based implementation in which one execution thread is shared by a group of similar capsules.
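To make the distinction concrete, the following Java sketch (an illustration of the design space only, not Panini's actual runtime) contrasts a capsule whose message queue is drained by a dedicated thread with one whose messages are run as tasks on an executor shared by several capsules (which may itself be backed by a single thread):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of one capsule's mailbox and two ways of executing its messages.
class CapsuleMailbox {
  private final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();

  void post(Runnable message) {
    inbox.add(message);
  }

  // Thread-based execution: a dedicated OS thread loops over the mailbox.
  void startDedicatedThread() {
    Thread worker = new Thread(() -> {
      try {
        while (true) {
          inbox.take().run();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop when interrupted
      }
    });
    worker.setDaemon(true);
    worker.start();
  }

  // Task-based execution: pending messages are drained as a task on an
  // executor shared with other capsules, so mostly idle capsules do not
  // each hold an OS thread.
  void drainOn(ExecutorService sharedExecutor) {
    sharedExecutor.submit(() -> {
      Runnable next;
      while ((next = inbox.poll()) != null) {
        next.run();
      }
      // A real runtime would reschedule this drain when new messages arrive.
    });
  }
}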

A key challenge in this work was to provide a safe and efficient implementation while retaining familiar semantics. We evaluated the Panini language on several benchmarks, where it showed speedup comparable to manually parallelized versions while providing the additional benefit of simpler, more modular code. Since concurrency is implicit, Panini hides these concerns from programmers, allowing them to focus on the task at hand. By bringing most benefits of actor-like abstractions to sequentially trained programmers via a familiar synchronous model, Panini could help these programmers overcome the tough hurdle of concurrent program design.


Acknowledgements

This work was supported in part by the NSF under grants CCF-08-46059 and CCF-11-17937. We thank Mehdi Bagherzadeh, Ferosh Jacob, Gary T. Leavens, Robyn Lutz, Sean Mooney, Henrique Rebêlo, Laurel Tweed, and David Weiss for comments.

9. REFERENCES

[1] Apache Tomcat. http://tomcat.apache.org/.
[2] ACM/IEEE-CS Joint Task Force. Computer science curricula 2013 (CS2013). Technical report, ACM/IEEE, 2012.
[3] G. Agha. Actors: a model of concurrent computation in distributed systems. Technical Report AITR-844, 1985.
[4] J. Armstrong, R. Williams, M. Virding, and C. Wikstroem. Concurrent Programming in ERLANG. Prentice-Hall, 1996.
[5] H. E. Bal, J. G. Steiner, and A. S. Tanenbaum. Programming languages for distributed computing systems. ACM Comput. Surv., 21(3):261–322, Sept. 1989.
[6] W. W. Carlson, J. M. Draper, D. E. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and language specification. Center for Computing Sciences, Institute for Defense Analyses, 1999.
[7] V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-Java: the new adventures of old X10. In PPPJ, pages 51–61, 2011.
[8] R. Chandra, A. Gupta, and J. L. Hennessy. COOL: An object-based language for parallel programming. Computer '94, 27.
[9] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In OOPSLA '05.
[10] D. Clarke, T. Wrigstad, J. Östlund, and E. B. Johnsen. Minimal ownership for Active Objects. In APLAS '08.
[11] D. G. Clarke, J. M. Potter, and J. Noble. Ownership types for flexible alias protection. In OOPSLA '98.
[12] J. Dedecker, T. Van Cutsem, S. Mostinckx, T. D'Hondt, and W. De Meuter. Ambient-oriented programming in AmbientTalk. In ECOOP, pages 230–254, 2006.
[13] P. C. Diniz and M. C. Rinard. Commutativity analysis: a new analysis technique for parallelizing compilers. ACM Trans. Program. Lang. Syst., 19(6):942–991, 1997.
[14] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pages 212–223, 1998.
[15] M. Frumkin, M. Schultz, H. Jin, and J. Yan. Implementation of the NAS Parallel Benchmarks in Java. 2002.
[16] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The nesC language: A holistic approach to networked embedded systems. In PLDI, pages 1–11, 2003.
[17] A. Georges, D. Buytaert, and L. Eeckhout. Statistically rigorous Java performance evaluation. In OOPSLA '07, pages 57–76.
[18] C. Grothoff, J. Palsberg, and J. Vitek. Encapsulating objects with confined types. ACM Trans. Program. Lang. Syst., 29(6), Oct. 2007.
[19] P. Haller and M. Odersky. Capabilities for uniqueness and borrowing. In ECOOP '10.
[20] P. Haller and M. Odersky. Scala actors: Unifying thread-based and event-based programming. Theor. Comput. Sci. '09, 410.
[21] C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, Aug. 1978.
[22] M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07.
[23] R. Lavender and D. Schmidt. Active object – an object behavioral pattern for concurrent programming. In Pattern languages of program design 2, 1996.
[24] D. Lea. Concurrent Programming in Java. Addison-Wesley, Boston, MA, USA, 1999.
[25] B. Liskov and R. Scheifler. Guardians and Actions: Linguistic support for robust, distributed programs. TOPLAS '83, 5.
[26] Y. Long, S. L. Mooney, T. Sondag, and H. Rajan. Implicit invocation meets safe, implicit concurrency. In GPCE, pages 63–72. ACM, 2010.
[27] G. Losa, V. Kumar, H. Andrade, B. Gedik, M. Hirzel, R. Soulé, and K.-L. Wu. CAPSULE: language and system support for efficient state sharing in distributed stream processing systems. In DEBS '12.
[28] D. Meder, V. Pankratius, and W. F. Tichy. Parallelism in curricula: an international survey. Technical report, University of Karlsruhe, 2008.
[29] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press, Cambridge, MA, USA, 1990.
[30] J. Misra. A simple, object-based view of multiprogramming. Form. Methods Syst. Des., 20(1):23–45, 2002.
[31] J. G. Mitchell, W. Maybury, and R. Sweet. Mesa language manual. Technical Report CSL-79-3, Xerox Palo Alto Research Center, 1979.
[32] O. M. Nierstrasz. Active objects in Hybrid. In OOPSLA '87.
[33] M. Papathomas. Concurrency in object-oriented programming languages. In Object-oriented software composition. Prentice Hall, 1995.
[34] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Commun. ACM, 15(12):1053–1058, Dec. 1972.
[35] P. Pratikakis, J. Spacco, and M. Hicks. Transparent proxies for Java futures. In OOPSLA '04, pages 206–223.
[36] M. C. Rinard and M. S. Lam. The design, implementation, and evaluation of Jade. TOPLAS '98, 20.
[37] J. Schäfer and A. Poetzsch-Heffter. JCoBox: generalizing active objects to concurrent components. In ECOOP '10.
[38] D. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture, Volume 2. Wiley, New York, NY, USA, 2000.
[39] B. Selic, G. Gullekson, and P. Ward. Real-Time Object-Oriented Modeling. John Wiley & Sons, 1994.
[40] B. Shriver and P. Wegner. Research directions in object-oriented programming, 1987.
[41] L. Smith, J. Bull, and J. Obdrizalek. A parallel Java Grande benchmark suite. In ACM/IEEE Conf. on Supercomputing, pages 6–6, 2001.
[42] R. Vallee-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot – a Java bytecode optimization framework. In the conference of the Centre for Advanced Studies on Collaborative Research, page 13, 1999.
[43] K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, et al. Titanium: A high-performance Java dialect. Concurrency Practice and Experience, 10(11-13):825–836.
[44] A. Yonezawa and M. Tokoro. Object-oriented concurrent programming, 1990.