Distributed Programming in Scala with APGAS Philippe Suter, Olivier Tardieu, Josh Milthorpe IBM Research Picture by Simon Greig

APGAS - Context

• Model for concurrency + distribution in X10.

• X10, general purpose language
  – Developed at IBM Research for 10+ years.
  – Focus/bias towards distributed HPC tasks.
  – JVM + native back-ends (through Java & C++).
  – Some X10 apps ran on >50K cores.

Asynchronous Partitioned Global Address Space

http://x10-lang.org and X10’15 @ PLDI (tomorrow)

APGAS in Scala

• Goal: expose the concurrent/distributed core of X10 as a library.
  – In Java 8 and as a Scala DSL.

• This contribution:
  – Introduction to programming w/ APGAS in Scala.
  – Illustrated through two benchmarks:
    • K-means clustering
    • Unbalanced Tree Search (see paper)
  – Contrasting model with Akka (see paper).
  – Preliminary experimental scaling results.

APGAS Primer

• Concurrent tasks run at distributed places.
• The environment exposes the available places.

def places : Seq[Place]
def here : Place

def asyncAt(p : Place)(body: =>Unit) : Unit
def async(body: =>Unit) : Unit

• Tasks can be remote or local.
• Tasks are asynchronous by default.

APGAS Primer

• The termination of tasks is controlled by the finish construct.

def finish(body: =>Unit) : Unit

• Blocks until enclosed tasks have completed, including all nested tasks, local or remote.

• Distributed termination is challenging; finish is a powerful contribution of APGAS.

Hello World

finish {
  for (p <- places) {
    asyncAt(p) {
      println(s"Hello from $here.")
    }
  }
}

Completes when all places have completed their task.

asyncAt returns immediately.

$> …
Hello from place(0).
Hello from place(3).
Hello from place(1).
Hello from place(2).

“Academic” Fibonacci

def fibonacci(i: Int) : Long = {
  if (i <= 1) i else {
    var a, b = 0L
    finish {
      async { a = fibonacci(i - 2) }
      b = fibonacci(i - 1)
    }
    a + b
  }
}

finish guards a single async…

…but recursive invocations enclose many more.

finish completes exactly when the computation of all dependencies is complete.
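
The way finish tracks nested tasks can be approximated on a single JVM with the standard library. The sketch below is an emulation under stated assumptions, not the APGAS runtime: it has no places, `LocalFinish` is a hypothetical name, and it uses `java.util.concurrent.Phaser` to count outstanding tasks. Each async registers with the innermost enclosing finish before it is submitted, so finish cannot return while any transitively spawned task is still running.

```scala
import java.util.concurrent.{ExecutorService, Executors, Phaser, ThreadFactory}
import scala.util.DynamicVariable

// Hypothetical single-JVM emulation of finish/async (no places, no distribution).
object LocalFinish {
  // Unbounded pool of daemon threads: tasks blocked in a nested finish
  // never starve other tasks, and the pool never keeps the JVM alive.
  private val pool: ExecutorService = Executors.newCachedThreadPool(new ThreadFactory {
    def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
  })

  // Innermost enclosing finish of the currently running task, if any.
  private val current = new DynamicVariable[Option[Phaser]](None)

  def async(body: => Unit): Unit = {
    val enclosing = current.value
    enclosing.foreach(_.register()) // count the task before it starts
    pool.execute(new Runnable {
      def run(): Unit =
        try current.withValue(enclosing)(body) // nested asyncs see the same finish
        finally enclosing.foreach(_.arriveAndDeregister())
    })
  }

  def finish(body: => Unit): Unit = {
    val phaser = new Phaser(1) // one party for the finish body itself
    try current.withValue(Some(phaser))(body)
    finally phaser.arriveAndAwaitAdvance() // block until every registered task is done
  }

  // The Fibonacci example from the slide, run against this emulation.
  def fibonacci(i: Int): Long =
    if (i <= 1) i
    else {
      var a, b = 0L
      finish {
        async { a = fibonacci(i - 2) }
        b = fibonacci(i - 1)
      }
      a + b
    }
}
```

Calling `LocalFinish.fibonacci(10)` spawns one task per recursive step; every finish returns only after its own async and all tasks nested beneath it have completed.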

Messages and Memory

• Default mechanism for transferring memory between places is to capture it in the closure of the body of asyncAt.

• APGAS lets the programmer define global symbols for memory local to places.

class Worker(…) extends PlaceLocal

Place-local Objects

• All instances of PlaceLocal resolve to objects that are place-specific.

class Worker(…) extends PlaceLocal

val w : Worker = PlaceLocal.forPlaces(places) { new Worker(…) }

for (p <- places) {
  asyncAt(p) {
    w.work()
  }
}

One distinct instance is created at each place.

Here, w resolves to the worker at place p.
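
The resolution rule can be illustrated with a single-JVM emulation. This is a sketch under assumptions, not the library's mechanism: `PlaceLocalSketch`, `PlaceLocalHandle`, and the synchronous `at` helper are hypothetical names, places are simulated by rebinding `here`, and the handle must be dereferenced explicitly with `w()`, whereas on the slide `w` is used directly as a `Worker`.

```scala
import scala.util.DynamicVariable

// Hypothetical emulation: places are simulated inside one JVM.
object PlaceLocalSketch {
  final case class Place(id: Int)

  val places: Seq[Place] = (0 until 4).map(Place(_))

  // The place the current code pretends to be running at.
  private val current = new DynamicVariable[Place](places.head)
  def here: Place = current.value

  // Synchronous stand-in for asyncAt: runs body with `here` rebound to p.
  def at[A](p: Place)(body: => A): A = current.withValue(p)(body)

  // One global handle, one distinct instance per place.
  final class PlaceLocalHandle[T](instances: Map[Place, T]) {
    def apply(): T = instances(here) // resolves to the instance at the current place
  }

  // Evaluates init once at each place, as forPlaces does on the slide.
  def forPlaces[T](ps: Seq[Place])(init: => T): PlaceLocalHandle[T] =
    new PlaceLocalHandle(ps.map(p => p -> at(p)(init)).toMap)

  // A worker that remembers the place at which it was created.
  final class Worker { val home: Place = here }
}
```

With `val w = forPlaces(places)(new Worker)`, evaluating `at(p)(w().home)` yields `p` for every place: the same handle resolves to a different object depending on where it is dereferenced.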

Global and Shared References

• For objects that cannot extend PlaceLocal, APGAS provides a wrapper (“pointer”).

trait GlobalRef[T] { def apply(): T }

• Shared references refer to an object at a particular place and can only be dereferenced there.
  – Useful to “call back” from an asynchronous task.

trait SharedRef[T] { def apply(): T }

Global and Shared References

// at place p1
val largeArray : Array[Double] = …
val ref = SharedRef.make(largeArray)

asyncAt(p2) {
  …
  asyncAt(p1) {
    val array = ref()
    array(…) = …
  }
  …
}

Dereference at p1 resolves to largeArray.

largeArray is never captured, therefore never serialized.

Dereferencing ref() directly in the body of asyncAt(p2), outside the nested asyncAt(p1), would be an error.
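
The home-place rule can be sketched as a runtime check. The emulation below is an assumption-laden illustration, not the library's implementation: `SharedRefSketch` and the synchronous `at` helper are hypothetical, places are simulated in one JVM, and the choice of `IllegalStateException` is arbitrary. In a real distributed setting only a small identifier would travel with the closure, which is why the referenced array is never serialized.

```scala
import scala.util.DynamicVariable

// Hypothetical emulation of a reference that is only valid at its home place.
object SharedRefSketch {
  final case class Place(id: Int)

  private val current = new DynamicVariable[Place](Place(0))
  def here: Place = current.value

  // Synchronous stand-in for asyncAt: runs body with `here` rebound to p.
  def at[A](p: Place)(body: => A): A = current.withValue(p)(body)

  final class SharedRef[T] private (val home: Place, target: T) {
    // Dereferencing anywhere but the home place is rejected.
    def apply(): T = {
      if (here != home)
        throw new IllegalStateException(s"SharedRef homed at $home dereferenced at $here")
      target
    }
  }

  object SharedRef {
    // The home place is wherever the reference is created.
    def make[T](target: T): SharedRef[T] = new SharedRef(here, target)
  }
}
```

Creating the reference inside `at(p1)` and hopping `at(p2) { at(p1) { ref() } }` succeeds, while `at(p2) { ref() }` throws, mirroring the "call back" pattern on the slide.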

Distributed K-means Clustering

• Goal: iteratively divide a set of points into K disjoint clusters.
• Distribute the points among workers.
• In each iteration:
  – workers:
    • compute the new centroids for their own points.
    • communicate their view of the centroids to the master.
  – the master:
    • aggregates all workers’ data and checks convergence.

Distributed K-Means: Memory

• Each worker needs to hold:
  – Its set of points.
  – Its local view of centroids.

• In addition, the master holds:
  – The aggregated centroids.

• In our implementation, the workers write their results directly into the master’s data.
  – Requires a synchronized data structure.

GlobalRef[WorkerData]

SharedRef[MasterData]

Distributed K-Means: Structure

while (!converged) {
  finish {
    for (p <- places) {
      asyncAt(p) {
        // compute new local centroids
        asyncAt(masterRef.home()) {
          // merge local centroids in master
        }
      }
    }
  }
}
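
The same loop structure can be sketched without APGAS, using standard-library futures on one JVM. Everything here is an illustrative assumption: `KMeansSketch`, `MasterData.merge`, and `run` are hypothetical names, points are one-dimensional to keep the code short, each partition stands in for a worker's points, and waiting on the futures plays the role of finish. The synchronized merge corresponds to the synchronized data structure the master requires.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical single-JVM sketch of the master/worker K-means loop (1-D points).
object KMeansSketch {
  // Master-side accumulator; workers merge into it concurrently, hence the lock.
  final class MasterData(k: Int) {
    private val sums = new Array[Double](k)
    private val counts = new Array[Int](k)

    def merge(s: Array[Double], c: Array[Int]): Unit = synchronized {
      for (j <- 0 until k) { sums(j) += s(j); counts(j) += c(j) }
    }

    // Centroid j becomes the mean of its points; an empty cluster keeps its old centroid.
    def centroids(old: Array[Double]): Array[Double] = synchronized {
      Array.tabulate(k)(j => if (counts(j) == 0) old(j) else sums(j) / counts(j))
    }
  }

  def nearest(cs: Array[Double], x: Double): Int =
    cs.indices.minBy(j => math.abs(cs(j) - x))

  def run(partitions: Seq[Array[Double]], init: Array[Double], eps: Double = 1e-9): Array[Double] = {
    var cs = init
    var converged = false
    while (!converged) {
      val master = new MasterData(cs.length)
      val snapshot = cs // centroids as seen by the workers in this iteration
      val tasks = partitions.map { pts =>
        Future {
          // compute local partial sums and counts per cluster
          val s = new Array[Double](snapshot.length)
          val c = new Array[Int](snapshot.length)
          for (x <- pts) { val j = nearest(snapshot, x); s(j) += x; c(j) += 1 }
          master.merge(s, c) // merge local results in master
        }
      }
      Await.result(Future.sequence(tasks), Duration.Inf) // plays the role of finish
      val next = master.centroids(cs)
      converged = cs.indices.forall(j => math.abs(next(j) - cs(j)) <= eps)
      cs = next
    }
    cs
  }
}
```

For example, `KMeansSketch.run(Seq(Array(1.0, 2.0, 9.0), Array(3.0, 10.0, 11.0)), Array(0.0, 5.0))` settles on two centroids, one for the small values and one for the large ones.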

Unbalanced Tree Search

• Counts nodes in a dynamically generated tree.
• Each node:
  – Has an associated SHA1 hash.
  – Has a number of children determined by a probabilistic law.
• Trees are unbalanced in an unpredictable but deterministic way.
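
A sequential sketch shows how hashing makes the tree both irregular and reproducible. The details are assumptions chosen for illustration and do not reproduce the benchmark's actual parameters: `UtsSketch` is a hypothetical name, a child's hash is taken as SHA-1 of the parent's hash plus the child index, and the probabilistic law is a simple "all or nothing" rule driven by the first hash byte, with a depth bound to guarantee termination.

```scala
import java.security.MessageDigest

// Hypothetical sequential sketch of a hash-driven unbalanced tree.
object UtsSketch {
  type Hash = Array[Byte]

  private def sha1(bytes: Array[Byte]): Hash =
    MessageDigest.getInstance("SHA-1").digest(bytes)

  def root(seed: Int): Hash = sha1(BigInt(seed).toByteArray)

  // A child's hash depends only on its parent's hash and its index,
  // so the same seed always yields the same tree.
  def child(parent: Hash, i: Int): Hash = sha1(parent ++ BigInt(i).toByteArray)

  // Illustrative law (an assumption): the first hash byte, read as a value in [0, 256),
  // gives `branching` children with probability about q, and none otherwise.
  def numChildren(h: Hash, depth: Int, maxDepth: Int, branching: Int, q: Double): Int =
    if (depth < maxDepth && (h(0) & 0xff) < q * 256) branching else 0

  // Counts nodes with an explicit work list, so deep trees cannot overflow the stack.
  def count(seed: Int, maxDepth: Int, branching: Int, q: Double): Long = {
    var work = List((root(seed), 0))
    var n = 0L
    while (work.nonEmpty) {
      val (h, d) = work.head
      work = work.tail
      n += 1
      for (i <- 0 until numChildren(h, d, maxDepth, branching, q))
        work = (child(h, i), d + 1) :: work
    }
    n
  }
}
```

The work list is the piece a distributed version would split among workers: since subtree sizes cannot be predicted from a node's hash, a static partition would leave some workers idle.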

Unbalanced Tree Search

• Algorithm combines work-stealing and work-dealing among workers.

• Workers are modeled as state machines.
• Termination:
  – in APGAS: a single, top-level finish.
  – in Akka: requires a counting protocol.

APGAS Implementation

• APGAS implementation:
  – ~2000 lines Java 8
  – ~200 lines Scala (definitions, helpers, serialization)

• Tasks are scheduled using fork/join.
• Distribution built on top of Hazelcast.

• Benchmarks are ~1200 Scala lines
  – 1/3 APGAS, 1/3 Akka, 1/3 common.

Performance Evaluation

• For both benchmarks, we ran a fixed problem using 1, 2, 4, 8, 16, and 32 workers.

• Measured “units of work” per second per worker.

• All experiments ran on a single 48-core machine.
  – Akka benchmarks use akka-remote.

Performance Evaluation

• Experiments are meant to:
  – be a sanity check,
  – provide evidence of scalability potential.

• Please do not interpret as a claim that X is better than Y.

“Comparable performance and scalability for comparable complexity.”

K-Means

[Figure: iterations/second/worker (y-axis, ticks from 0.34 to 0.48) against number of workers (x-axis, ticks from 0 to 35); series: APGAS, Akka.]

Unbalanced Tree Search

[Figure: millions of nodes/second/worker (y-axis, ticks from 8.4 to 9.6) against number of workers (x-axis, ticks from 0 to 35); series: APGAS, Akka.]

Conclusion

• Made the APGAS programming model accessible to Scala programmers.

• Programming style is different, but a good fit for some problems.

• In particular, finish concisely solves hard distributed termination problems.

• Complexity is similar to equivalent Akka implementations.

• Promising preliminary scaling results.

Thank you!