Vivek Kumar1, Julian Dolby2, Stephen M Blackburn3
1 Rice University (Work done during affiliation with Australian National University) 2 IBM T.J. Watson Research 3 Australian National University
Integrating Asynchronous Task Parallelism and Data-centric Atomicity
Introduction
Integrating Asynchronous Task Parallelism and Data-centric Atomicity | Kumar et al. | PPPJ ‘16 2
Hardware and Software Yesterday
OS on a single CPU (1990s, early 2000s)
“What Andy giveth, Bill taketh away”
Hardware and Software Today
OS on multiple CPUs and a GPU
“The Free Lunch is Now Over!”
The 3P Challenge
• Productivity
• Performance
• Portability
Options?
• Productivity
  – “Doing” lots of things at once (parallelism): threads, fork/join (Java), async-finish (X10, HJ)
  – “Dealing” with lots of things at once (concurrency): locks, transactions, isolated (HJ)
• Performance
  – Work-stealing scheduling
• Portability
  – Managed runtime to hide hardware complexities
Productivity and Performance Challenge
• Parallelism
  – Expose the “right” amount of parallelism
• Concurrency
  – A “control-centric” approach might lead to data races and deadlocks
Problem Statement
Contributions
✔ Atomic Java with Work-Stealing (AJWS)
  A new parallel programming model
✔ Annotations to expose task parallelism and data-centric concurrency control
  Compiler transformation of AJWS to vanilla Java that uses a highly efficient work-stealing runtime implemented directly inside a JVM
✔ Detailed performance study
  Using three large open-source Java applications and evaluating the performance on a multicore smartphone
✔ Results
  Showing that AJWS improves both productivity and performance over conventional approaches
Motivating Analysis
Bank Transaction
class Bank {
  void fundTransfer() {
    Transfer[N] t;
    for (i=0; i<N; i++) { t[i].run(); }
  }
  void interest() {
    Account[N] a;
    for (i=0; i<N; i++) { a[i].addInterest(); }
  }
}
class Account {
  int balance;
  public Account(b) { ... }
  void addInterest() { ... }
  void debit(amount) { ... }
  void credit(amount) { ... }
}
How to parallelize?
void interest() {
  Account[N] a;
  finish for (i=0; i<N; i++) {
    async { a[i].addInterest(); }
  }
}
That’s elegant
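For readers without an async-finish runtime, the finish/async loop on this slide can be approximated in plain Java. The sketch below is a stand-in under stated assumptions, not AJWS or TryCatchWS: `ExecutorService.invokeAll`, which returns only after every submitted task has completed, plays the role of `finish`, and one `Callable` per account plays the role of `async`. The `SketchAccount` class, its 5% interest rule, and the pool size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical account; the interest rule is illustrative only.
class SketchAccount {
    int balance;
    SketchAccount(int balance) { this.balance = balance; }
    void addInterest() { balance += balance / 20; } // assumed 5% interest
}

class AsyncFinishSketch {
    // Emulates: finish for (i = 0; i < N; i++) async { a[i].addInterest(); }
    static void interest(SketchAccount[] a, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (SketchAccount account : a) {
                // "async": each iteration becomes an independent task
                tasks.add(() -> { account.addInterest(); return null; });
            }
            // "finish": invokeAll returns only after all tasks have completed
            pool.invokeAll(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SketchAccount[] a = { new SketchAccount(100), new SketchAccount(200) };
        interest(a, 2);
        System.out.println(a[0].balance + " " + a[1].balance);
    }
}
```

Each account is touched by exactly one task here, so no locking is needed. Creating one task per account also makes the number of tasks grow with N, which is where the cost of task creation starts to matter.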
class Transfer {
  Account from, to;
  int amount;
  void run() { from.debit(amount); to.credit(amount); }
}
Wait… but what about task granularity?
I can use TryCatchWS and avoid that hassle
• TryCatchWS (Kumar et al., OOPSLA 2012)
  – JVM support for work-stealing for X10
  – Extremely low overheads
So, should I learn X10?
Concurrency…?
Bank Transaction
void interest() {
  Account[N] a;
  for (i=0; i<N; i++) {
    a[i].lock();
    a[i].addInterest();
    a[i].unlock();
  }
}
class Transfer {
  Account from, to;
  int amount;
  void run() {
    from.lock(); to.lock();
    from.debit(amount); to.credit(amount);
    from.unlock(); to.unlock();
  }
}
Deadlock?
Deadlock?
T1 runs a transfer and calls lock() on its from account, then on its to account.
T2 runs a transfer in the opposite direction, so its from is T1’s to and its to is T1’s from.
If each thread acquires its first lock before the other acquires its second, both wait forever.
Bank Transaction
Dolby et al.’s data-centric annotations
class Account {
  @Atomicset(A);
  @Atomic(A) int balance;
  public Account(b) { ... }
  void addInterest() { ... }
  void debit(amount) { ... }
  void credit(amount) { ... }
}
class Transfer {
  Account from, to;
  int amount;
  @Atomic(from) @Atomic(to) void run() { from.debit(amount); to.credit(amount); }
}
But that’s not available to me…
How to parallelize?
Insights
• Integrate async-finish task parallelism with the data-centric approach to synchronization
• For high-performance load balancing, rely on the TryCatchWS work-stealing runtime
AJWS Programming Model
Bank Transaction in AJWS
AJWS – Atomic Java with Work-Stealing
class Bank {
  void fundTransfer() {
    Transfer[N] t;
    finish for (i=0; i<N; i++) {
      async { t[i].run(); }
    }
  }
  void interest() {
    Account[N] a;
    finish for (i=0; i<N; i++) {
      async { a[i].addInterest(); }
    }
  }
}
class Account {
  @Atomicset(A);
  @Atomic(A) int balance;
  public Account(b) { ... }
  void addInterest() { ... }
  void debit(amount) { ... }
  void credit(amount) { ... }
}
class Transfer {
  Account from, to;
  int amount;
  @Atomic(from) @Atomic(to) void run() { from.debit(amount); to.credit(amount); }
}
Annotations in AJWS
• Parallelism (Kumar et al., OOPSLA 2012)
  – async, finish
• Data-centric Atomicity (Dolby et al., TOPLAS 2012)
  – @Atomicset(A)
    • Denotes a group of memory locations that share the same consistency property and should be updated atomically
  – @Atomic(A)
    • Annotation on fields that belong to atomic set A
    • Annotation on a method to declare it an additional unit of work for atomic set A
  – @AliasAtomic(A=this.A)
    • Annotation on an object to specify that the object’s atomic set A is unified with the current class’s atomic set (this.A)
Translating AJWS to Vanilla Java
Translating Data-centric Annotations
AJWS source:
class Account {
  @Atomicset(A);
  @Atomic(A) int balance;
  public Account(b) { ... }
  void addInterest() { ... }
  ...
}
Translated Java:
class Account {
  OrderedLock _lockA;
  int balance;
  public Account(b) { ... _lockA = new OrderedLock(); }
  void addInterest() { synchronized(_lockA) { ... } }
  void addInterest_internal() { ... }
}
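A runnable approximation of the translated Account is sketched below. The slide names `OrderedLock` but does not show its definition, so the version here (a monitor object carrying a unique, increasing id) is an assumption. The 5% interest rule and the `getBalance` accessor are also invented for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Assumed shape of OrderedLock: a monitor object with a globally unique id
// that gives all locks a total order. The slides do not show its definition.
class OrderedLock implements Comparable<OrderedLock> {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement();

    @Override
    public int compareTo(OrderedLock other) {
        return Long.compare(id, other.id);
    }
}

class Account {
    private final OrderedLock _lockA = new OrderedLock(); // one lock for atomic set A
    private int balance;                                   // field in atomic set A

    Account(int balance) { this.balance = balance; }

    OrderedLock getLock() { return _lockA; }

    // Public entry point: takes the atomic set's lock, then delegates.
    void addInterest() {
        synchronized (_lockA) { addInterest_internal(); }
    }

    // Unsynchronized body, for callers that already hold _lockA.
    void addInterest_internal() { balance += balance / 20; } // assumed 5% rule

    // Hypothetical accessor, added so the result can be observed safely.
    int getBalance() {
        synchronized (_lockA) { return balance; }
    }
}
```

Splitting each method into a locking wrapper and an `_internal` body lets a caller that already holds the lock invoke the body directly instead of re-entering the monitor.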
Translating Data-centric Annotations
AJWS source:
class Transfer {
  Account from, to;
  @Atomic(from) @Atomic(to) void run() { from.debit(amount); to.credit(amount); }
}
Translated Java:
class Transfer {
  Account from, to;
  void run() {
    OrderedLock[2] lock;
    lock[0] = from.getLock();
    lock[1] = to.getLock();
    sort(lock);
    synchronized(lock[0]) {
      synchronized(lock[1]) {
        from.debit_internal(amount);
        to.credit_internal(amount);
      }
    }
  } // run()
}
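The translated `run()` avoids deadlock by having every thread acquire the two account locks in the same global order. A self-contained sketch of that idea follows. `OrderedLock` is assumed to order locks by a unique id, `sort(lock)` from the slide is realized with `Arrays.sort`, and `TransferDemo` with its amounts is hypothetical scaffolding.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Assumed definition: locks are totally ordered by a unique id.
class OrderedLock implements Comparable<OrderedLock> {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement();
    @Override public int compareTo(OrderedLock o) { return Long.compare(id, o.id); }
}

class Account {
    private final OrderedLock lock = new OrderedLock();
    private int balance;
    Account(int balance) { this.balance = balance; }
    OrderedLock getLock() { return lock; }
    // _internal methods assume the caller already holds this account's lock.
    void debit_internal(int amount) { balance -= amount; }
    void credit_internal(int amount) { balance += amount; }
    int getBalance() { synchronized (lock) { return balance; } }
}

class Transfer {
    final Account from, to;
    final int amount;
    Transfer(Account from, Account to, int amount) {
        this.from = from; this.to = to; this.amount = amount;
    }

    void run() {
        OrderedLock[] lock = { from.getLock(), to.getLock() };
        Arrays.sort(lock); // every thread locks in ascending id order
        synchronized (lock[0]) {
            synchronized (lock[1]) {
                from.debit_internal(amount);
                to.credit_internal(amount);
            }
        }
    }
}

class TransferDemo {
    // Runs opposite-direction transfers on two threads and returns the total balance.
    static int runOpposing(int rounds) {
        Account x = new Account(1000), y = new Account(1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) new Transfer(x, y, 1).run(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) new Transfer(y, x, 2).run(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return x.getBalance() + y.getBalance();
    }

    public static void main(String[] args) {
        System.out.println(runOpposing(10_000));
    }
}
```

Because both threads sort the same pair of locks before acquiring them, neither can hold one lock while waiting for the other in the reverse order. The total balance is unchanged by any number of transfers.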
Translating Work-Stealing Annotations
AJWS source:
void interest() {
  Account[N] a;
  finish for (i=0; i<N; i++) {
    async { a[i].addInterest(); }
  }
}
Translated Java:
void interest() {
  try {
    for (i=0; i<N; i++) {
      try {
        RT.continuation();
        a[i].addInterest();
        RT.checkIfStolen();
      } catch(ThiefEntry e) { }
    }
    RT.doFinish();
  } catch(Finish f) { }
}
[Diagram: the victim’s stack holds interest() and a[0].addInterest(); thief stacks hold interest(), one of them running a[1].addInterest(). Arrow labels: “Yieldpoint, OSR” and “throw exception ThiefEntry”.]
Implementation
• Open-sourced prototype implementation of AJWS
  – Uses the JastAdd compilation framework (unlike Eclipse refactoring in AJ and Polyglot in X10’s TryCatchWS)
• Currently uses Java synchronized blocks for concurrency control
  – Future work would explore other options (e.g. transactions)
• Using a parallelism annotation inside an atomic section is not allowed
Performance Evaluation
Benchmarks
• jMetal
  – Framework providing a set of classes that can be used as templates for multi-objective optimization
  – Uses Java Executor, Thread, synchronized
• JTransforms
  – Multithreaded FFT library
  – Uses Java Future
• SJXP
  – XML parser built for Android OS
  – Original implementation is sequential
Experimental Infrastructure
• Hardware platform
  – 4-core ASUS ZenFone 2
  – Intel Atom cores running at 2.3 GHz
  – LinuxDeploy application to install Ubuntu 15.10
• Software platform
  – JikesRVM Java VM, Mercurial version 11181
• Measurements
  – 15 invocations of each experiment with 6 iterations per invocation
  – We report the mean of the final iteration in each invocation
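The reporting rule on this slide (take the final iteration of each invocation, then average across invocations) can be written as a small helper. This is only a sketch: the `times[invocation][iteration]` layout and every timing value are invented for illustration and are not measured data. The `speedup` helper uses the ratio that labels the performance plots in this deck (time for sequential / time for parallel).

```java
class MeasurementSketch {
    // Mean of the last iteration of each invocation; earlier iterations are ignored.
    static double meanOfFinalIterations(double[][] times) {
        double sum = 0.0;
        for (double[] invocation : times) {
            sum += invocation[invocation.length - 1];
        }
        return sum / times.length;
    }

    // Speedup as labelled on the plots: sequential time divided by parallel time.
    static double speedup(double sequentialTime, double parallelTime) {
        return sequentialTime / parallelTime;
    }

    public static void main(String[] args) {
        // Made-up timings in ms: 3 invocations x 6 iterations (the deck uses 15 x 6).
        double[][] parallel = {
            { 9, 8, 7, 6, 5, 4 },
            { 9, 8, 7, 6, 5, 6 },
            { 9, 8, 7, 6, 5, 5 },
        };
        double parallelMean = meanOfFinalIterations(parallel);
        System.out.println("mean of final iterations: " + parallelMean);
        System.out.println("speedup vs. a 10 ms sequential run: " + speedup(10.0, parallelMean));
    }
}
```

Reporting only the last iteration discards the earlier iterations of each invocation, which in a JVM are the ones most affected by class loading and JIT compilation.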
Evaluating AJWS Productivity
               Sequential         Original (Default)          AJWS
Benchmark      Files  LOC         synchs  ‖ism  Effort        @AtomicSet  @Atomic  ‖ism  Effort
jMetal         329    28,216      10      6     1.2%          2           3        47    0.1%
JTransforms    45     42,756      0       372   14.3%         0           0        372   3.7%
SJXP           17     1,250       –       –     –             1           2        1     6.5%
Evaluating AJWS Performance
[Figure: three panels (jMetal, JTransforms, SJXP) plotting speedup over sequential (time for sequential / time for parallel, axis from 0.5 to 3) against 1 to 4 threads. Legend: Original (Default) and AJWS; one panel’s legend reads Sequential and AJWS.]
Conclusion
• AJWS provides annotations in Java to enhance productivity in parallel programming
• Prototype implementation that integrates task parallelism and data-centric atomicity
• High-performance load balancing without incurring overheads
• Evaluation of AJWS using 3 large open-source applications
  – Extremely low syntactic overheads
  – Delivers better performance than the original versions