Are New Languages Necessary for Manycore?
David I. August
Department of Computer Science
Princeton University
Why New Multicore Languages Will Fail
• Money is earned by relieving customer pain
• The Market
• Legacy, Legacy, Legacy
• Programmers adopt new programming models
• Parallel programming is more difficult
• Parallel programming models have longevity issues
• Automatic Thread Extraction (ATE)
Automatic Thread Extraction
“That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!” – Richard Lethin
“Compiler can’t determine a tree from a graph…” – Burton Smith
“Compiler can’t determine dependences without type information. Even then…” – Burton Smith
“Decades of automatic parallelization work has been a failure…” – James Larus
“All that icky pointer chasing code...” – Tim Mattson
How To Get Parallelism For Multicore?
• Nine months ago, with an open mind…
• A priori select ALL C programs from SPEC CINT 2000
• Our objective function (in priority order):
  1. Extract meaningful parallelism
  2. Prefer automatic over manual
  3. Minimize impact to the programmer when manual
Our Results

Benchmark     Threads at Peak   Speedup   LOCs Changed
164.gzip      32+               29.91     26
175.vpr       15                 3.59      1
176.gcc       16                 5.06     17
181.mcf       32+                2.84      0
186.crafty    32+               25.18      9
197.parser    32+               24.50      2
253.perlbmk   5                  1.21      0
254.gap       10                 1.94      1
255.vortex    32+                4.92      0
256.bzip2     12                 6.72      0
300.twolf     8                  2.06      1
GEOMEAN       17                 5.54
ARITHMEAN     20                 9.81

M.L.O.P.: 5 Generations, 32 Cores, 5.3x Speedup
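The two summary rows can be checked directly from the per-benchmark speedups. The sketch below is illustrative only: the dictionary and function names are made up for this example, and the numbers are copied from the results slide. The geometric mean is much lower than the arithmetic mean because it is less dominated by the few very large speedups (gzip, crafty, parser).

```python
import math

# Per-benchmark speedups copied from the results slide.
SPEEDUPS = {
    "164.gzip": 29.91, "175.vpr": 3.59, "176.gcc": 5.06, "181.mcf": 2.84,
    "186.crafty": 25.18, "197.parser": 24.50, "253.perlbmk": 1.21,
    "254.gap": 1.94, "255.vortex": 4.92, "256.bzip2": 6.72, "300.twolf": 2.06,
}


def arithmetic_mean(values):
    """Plain average of the values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    values = list(SPEEDUPS.values())
    print(f"arithmetic mean: {arithmetic_mean(values):.2f}")
    print(f"geometric mean:  {geometric_mean(values):.2f}")
```

Both values agree with the GEOMEAN and ARITHMEAN rows of the slide to two decimal places.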
Our Recipe
Recent Compiler Technology:
• Decoupled Software Pipelining (DSWP) [MICRO 05]
• Parallel-Stage DSWP (PS-DSWP)
• Speculative DSWP (Spec-DSWP) [PACT 07]
• Existing Technology: Speculative DOALL, TLS
• Targeted Memory Profiling
• Procedure Boundary Elimination [PLDI 06]
Hardware Support:
• Compiler-Controlled Speculation
• Streaming Communication [MICRO 06]
Typical Example: 197.parser
Threads run on multicore model with Itanium 2 cores.
[Figure: three-stage pipeline: Find English Sentences -> Parse Sentences (95%) -> Emit Results. Labels: DSWP; PS-DSWP (Spec DOALL Middle Stage)]
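The pipeline shape on this slide can be sketched by hand. The code below is not the compiler transformation from the talk: it is a hand-written analogue in which every name (`find_sentences`, `parse_sentence`, `run_pipeline`, the sentinel convention) is an assumption made for illustration. It shows a sequential first stage, a replicated middle stage as in the PS-DSWP label, and a sequential last stage that restores the original order, all connected by queues. It omits speculation entirely, and Python threads illustrate the structure rather than real speedup.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (illustrative convention)


def find_sentences(text, out_q, n_workers):
    """Stage 1 (sequential): split the input and tag each sentence with its index."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for index, sentence in enumerate(sentences):
        out_q.put((index, sentence))
    for _ in range(n_workers):  # one end marker per middle-stage worker
        out_q.put(SENTINEL)


def parse_sentence(sentence):
    """Hypothetical stand-in for the expensive parsing work."""
    return sentence.split()


def parse_worker(in_q, out_q):
    """Stage 2 (replicated): each worker parses sentences independently."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        index, sentence = item
        out_q.put((index, parse_sentence(sentence)))


def emit_results(in_q, n_workers):
    """Stage 3 (sequential): reorder results back into input order."""
    pending, results, next_index, finished = {}, [], 0, 0
    while finished < n_workers:
        item = in_q.get()
        if item is SENTINEL:
            finished += 1
            continue
        pending[item[0]] = item[1]
        while next_index in pending:
            results.append(pending.pop(next_index))
            next_index += 1
    return results


def run_pipeline(text, n_workers=4):
    """Wire the three stages together with two queues and run them."""
    q1, q2 = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=find_sentences, args=(text, q1, n_workers))]
    threads += [threading.Thread(target=parse_worker, args=(q1, q2))
                for _ in range(n_workers)]
    for t in threads:
        t.start()
    results = emit_results(q2, n_workers)
    for t in threads:
        t.join()
    return results
```

Data flows in one direction only, from earlier stages to later ones, so no stage ever waits on a stage downstream of it.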
What We Learned
1. A new way of thinking about dependences: Go With the Flow
2. TLP is easier to extract than ILP
3. A holistic approach is better
4. A limitation exists in the sequential model: Determinism
Determinism: A Double Edged Sword
while(<cond>):
    <work>
    x = Rand()
    <work>

int Rand():
    state = f2(state)
    return f1(state)

[Figure: loop iterations 1 to 4 scheduled as DOALL versus SEQUENTIAL]
56 LOCs in 11 programs: 22 annotations
Only 2 programs needed more
Most common culprit: Custom Allocators
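The Rand() loop above can be made concrete. In the sketch below, the generator constants, the class name, and the use of a lock and a thread pool are all assumptions chosen for illustration; the talk does not give an implementation. The hidden `state` update makes every call depend on the previous one, which forces sequential execution if each iteration must see exactly the value it would have seen in the original order. If the programmer states that any order of calls is acceptable, each call only needs to be atomic, and the iterations can run as a DOALL loop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Rand:
    """Small linear congruential generator with hidden shared state (illustrative)."""

    def __init__(self, seed=1):
        self.state = seed
        self._lock = threading.Lock()

    def __call__(self):
        # The read-modify-write of `state` is the loop-carried dependence.
        # The lock makes each call atomic, so calls may happen in any order.
        with self._lock:
            self.state = (1103515245 * self.state + 12345) % 2**31  # plays the role of f2
            return self.state >> 16                                  # plays the role of f1


def run_sequential(n, seed=1):
    """Iteration i always receives the i-th random number."""
    rand = Rand(seed)
    return [rand() for _ in range(n)]


def run_doall(n, seed=1, workers=4):
    """Iterations run concurrently; which iteration gets which number may vary."""
    rand = Rand(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: rand(), range(n)))
```

Both versions draw the same set of numbers from the generator; only the assignment of numbers to iterations can differ between runs of `run_doall`. Whether that difference is acceptable is something only the programmer can say, which is why it has to be specified rather than inferred.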
What about Manycore?
Multicore
• New languages aren’t necessary
• Legacy code easily adjusted
Manycore
• Implicitly Parallel Sequential Programming
• No optimization for sequential (custom allocators)
• Points of non-determinism specified
• Parallel algorithms in sequential codes
• Debuggability, Understandability, Sanity
The Answer Originates with ATE
The Old Way:
PL folks would write languages, Architecture folks would make HW, and Compiler folks would dutifully connect the two.
This will fail for Manycore:
• Unduly burden the programmer
• Performance will suffer
There’s a New Way…
How Code Was Transformed

Benchmark     LOC (All)   LOC (Model)   Model Techniques   Compiler Techniques Applied
164.gzip      26          2             Y-Branch           TLS Memory, DSWP
175.vpr       1           1             PURE               Alias, Value, & Control Spec, TLS Mem, DSWP
176.gcc       17          7             PURE               Alias & Control Spec, TLS Mem, DSWP
181.mcf       0           0             -                  Alias, Silent Store, & Control Spec, TLS Mem, DSWP, Nested
186.crafty    9           9             PURE               TLS Mem, DSWP, Nested
197.parser    2           2             PURE               TLS Mem, DSWP
253.perlbmk   0           0             -                  Alias, Control, & Value Spec, DSWP
254.gap       1           1             PURE               TLS Memory, DSWP, Alias Spec
255.vortex    0           0             -                  Alias & Value Spec, TLS Mem, DSWP
256.bzip2     0           0             -                  TLS Memory, DSWP
300.twolf     1           1             PURE               Alias & Control Spec, TLS Mem, DSWP