Are New Languages Necessary for Manycore?
David I. August
Department of Computer Science
Princeton University


THIS is the Problem!

[Figure: SPEC CPU INTEGER PERFORMANCE plotted against TIME, annotated "2004" and "?".]

Why New Multicore Languages Will Fail
• Money is earned by relieving customer pain
• The Market
• Legacy, Legacy, Legacy
• Programmers adopt new programming models
• Parallel programming is more difficult
• Parallel programming models have longevity issues

• Automatic Thread Extraction (ATE)

Automatic Thread Extraction
“That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!” – Richard Lethin
“Compiler can’t determine a tree from a graph…” – Burton Smith
“Compiler can’t determine dependences without type information. Even then…” – Burton Smith
“Decades of automatic parallelization work has been a failure…” – James Larus
“All that icky pointer chasing code...” – Tim Mattson

How To Get Parallelism For Multicore?
• Nine months ago, with an open mind…
• A priori select ALL C programs from SPEC CINT 2000
• Our objective function (in priority order):
1. Extract meaningful parallelism
2. Prefer automatic over manual
3. Minimize impact to the programmer when manual

Our Results

Benchmark      Threads at Peak   Speedup   LOCs Changed
164.gzip       32+               29.91     26
175.vpr        15                3.59      1
176.gcc        16                5.06      17
181.mcf        32+               2.84      0
186.crafty     32+               25.18     9
197.parser     32+               24.50     2
253.perlbmk    5                 1.21      0
254.gap        10                1.94      1
255.vortex     32+               4.92      0
256.bzip2      12                6.72      0
300.twolf      8                 2.06      1
GEOMEAN        17                5.54
ARITHMEAN      20                9.81

M.L.O.P.:
5 Generations
32 Cores
5.3x Speedup
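
The two summary rows of the results table can be recomputed from the per-benchmark speedups. The sketch below is only a check of that arithmetic; the dictionary and function names are illustrative and not from the talk.

```python
import math

# Speedups copied from the "Our Results" table, one per benchmark.
speedups = {
    "164.gzip": 29.91, "175.vpr": 3.59, "176.gcc": 5.06, "181.mcf": 2.84,
    "186.crafty": 25.18, "197.parser": 24.50, "253.perlbmk": 1.21,
    "254.gap": 1.94, "255.vortex": 4.92, "256.bzip2": 6.72, "300.twolf": 2.06,
}


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Plain average."""
    values = list(values)
    return sum(values) / len(values)


geo = geometric_mean(speedups.values())
arith = arithmetic_mean(speedups.values())
print(f"GEOMEAN   {geo:.2f}")    # matches the table's 5.54
print(f"ARITHMEAN {arith:.2f}")  # matches the table's 9.81
```

The gap between the two means comes from the three benchmarks with speedups near 25x to 30x, which pull the arithmetic mean up far more than the geometric mean.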

Our Recipe
Recent Compiler Technology:
• Decoupled Software Pipelining (DSWP) [MICRO 05]
• Parallel-Stage DSWP (PS-DSWP)
• Speculative DSWP (Spec-DSWP) [PACT 07]
• Existing Technology: Speculative DOALL, TLS
• Targeted Memory Profiling
• Procedure Boundary Elimination [PLDI 06]

Hardware Support:
• Compiler-Controlled Speculation
• Streaming Communication [MICRO 06]
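
To make the DSWP item concrete, here is a minimal hand-written sketch of the general idea: a loop is split into pipeline stages that run on separate threads and communicate in one direction through a queue. It is not the compiler's output. The linked-list loop, the `work` function, and the use of Python's `queue.Queue` in place of hardware streaming communication are all assumptions made for illustration, and Python threads will not show a real speedup here.

```python
import queue
import threading


class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def work(value):
    # Hypothetical per-node computation standing in for the loop body.
    return value * value


def sequential_loop(head):
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


def pipelined_loop(head):
    """Two-stage pipeline: stage 1 walks the list, stage 2 does the work."""
    channel = queue.Queue(maxsize=64)  # stands in for a core-to-core queue
    done = object()                    # sentinel marking the end of the stream
    result = []

    def traverse():
        # Stage 1 keeps the pointer-chasing recurrence to itself.
        node = head
        while node is not None:
            channel.put(node.value)
            node = node.next
        channel.put(done)

    def compute():
        # Stage 2 consumes values in order; nothing flows back to stage 1.
        total = 0
        while True:
            item = channel.get()
            if item is done:
                break
            total += work(item)
        result.append(total)

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Calling `pipelined_loop(build_list(range(1, 101)))` returns the same total as `sequential_loop` on the same list. The point of the sketch is that all dependences between the two threads flow one way, so the traversal stage can run ahead of the compute stage.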

Typical Example: 197.parser

Threads run on multicore model with Itanium 2 cores.

[Figure: three pipeline stages: Find English Sentences, Parse Sentences (95%), Emit Results. Labels: DSWP; PS-DSWP (Spec DOALL Middle Stage).]
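
The three-stage shape on this slide, with a replicated middle stage, can be sketched by hand as follows. This is an illustration of the pipeline structure only: the function names and the toy "parse" are hypothetical, speculation is not modeled, and a thread pool stands in for the parallel stage.

```python
from concurrent.futures import ThreadPoolExecutor


def find_sentences(text):
    """Stage 1 (sequential): split the input into sentences."""
    for raw in text.split("."):
        sentence = raw.strip()
        if sentence:
            yield sentence


def parse_sentence(sentence):
    """Stage 2 (replicated): hypothetical stand-in for the expensive parse."""
    words = sentence.split()
    return {"sentence": sentence, "words": len(words)}


def emit_results(parsed):
    """Stage 3 (sequential): emit results in the original sentence order."""
    return [f"{p['words']} words: {p['sentence']}" for p in parsed]


def run_pipeline(text, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs parse_sentence concurrently on several workers but
        # yields the results in input order, so stage 3 sees them in sequence.
        parsed = pool.map(parse_sentence, find_sentences(text))
        return emit_results(parsed)
```

For example, `run_pipeline("The cat sat. Dogs bark loudly at night. Go.")` returns one line per sentence, in input order. Because the middle stage treats each sentence independently, it is the natural stage to replicate across many threads while the first and last stages stay sequential.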

What We Learned

1. A new way of thinking about dependences: Go With the Flow
2. TLP is easier to extract than ILP
3. A holistic approach is better
4. A limitation exists in the sequential model: Determinism

Determinism: A Double Edged Sword

    while(<cond>):
        <work>
        x = Rand()
        <work>

    int Rand():
        state = f2(state)
        return f1(state)

[Figure: iterations 1, 2, 3, 4 scheduled under SEQUENTIAL and DOALL execution.]

56 LOCs in 11 programs: 22 annotations
Only 2 programs needed more
Most common culprit: Custom Allocators
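
The dependence in the Rand() pseudocode can be seen in a small runnable sketch. The concrete `f1` and `f2` below (a small linear congruential generator) and the seed are assumptions chosen for illustration; the slide leaves them abstract.

```python
def make_rand(seed=1):
    """Return a Rand() with hidden state, mirroring the slide's pseudocode."""
    state = [seed]

    def f2(s):
        # Hypothetical state update (a small linear congruential step).
        return (1103515245 * s + 12345) % 2**31

    def f1(s):
        # Hypothetical output function.
        return s % 100

    def rand():
        state[0] = f2(state[0])
        return f1(state[0])

    return rand


def run(iteration_order):
    """Run the loop body once per iteration, in the given order."""
    rand = make_rand()
    x = {}
    for i in iteration_order:
        x[i] = rand()  # loop-carried dependence through Rand's state
    return x


in_order = run([0, 1, 2, 3])
reordered = run([2, 0, 3, 1])
```

Both runs draw the same sequence of values from Rand(), but which iteration receives which value depends on the order of the calls. Sequential semantics fix that order, which serializes the loop. If the program only needs each iteration to get some value from the stream, the order can be relaxed and the iterations become candidates for DOALL execution.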

What about Manycore?

Multicore
• New languages aren’t necessary
• Legacy code easily adjusted

Manycore
• Implicitly Parallel Sequential Programming
  • No optimization for sequential (custom allocators)
  • Points of non-determinism specified
• Parallel algorithms in sequential codes
• Debuggability, Understandability, Sanity

The Answer Originates with ATE

The Old Way:
PL folks would write languages, Architecture folks would make HW, and Compiler folks would dutifully connect the two.

This will fail for Manycore:
• Unduly burden the programmer
• Performance will suffer

There’s a New Way…

DO NOT POST ANYTHING AFTER THIS SLIDE


How Code Was Transformed

Benchmark     LOC (All)   LOC (Model)   Model Techniques   Compiler Techniques Applied
164.gzip      26          2             Y-Branch           TLS Memory, DSWP
175.vpr       1           1             PURE               Alias, Value, & Control Spec, TLS Mem, DSWP
176.gcc       17          7             PURE               Alias & Control Spec, TLS MEM, DSWP
181.mcf       0           0             -                  Alias, Silent Store, & Control Spec, TLS Mem, DSWP, Nested
186.crafty    9           9             PURE               TLS Mem, DSWP, Nested
197.parser    2           2             PURE               TLS Mem, DSWP
253.perlbmk   0           0             -                  Alias, Control, & Value Spec, DSWP
254.gap       1           1             PURE               TLS Memory, DSWP, Alias Spec
255.vortex    0           0             -                  Alias & Value Spec, TLS Mem, DSWP
256.bzip2     0           0             -                  TLS Memory, DSWP
300.twolf     1           1             PURE               Alias & Control Spec, TLS Mem, DSWP

PURE


Y-Branch


SPEC 2006: 403.gcc

Threads run on multicore model with Itanium 2 cores.