
A Roadmap to Restoring Computing's Former Glory

David I. August

Princeton University

(Not speaking for Parakinetics, Inc.)

Golden era of computer architecture

[Figure: SPEC CINT performance (log scale) by year, 1992-2012, spanning the CPU92, CPU95, CPU2000, and CPU2006 suites; annotated "~3 years behind".]

Era of DIY:
• Multicore
• Reconfigurable
• GPUs
• Clusters

10 Cores!

10-Core Intel Xeon: “Unparalleled Performance”

P6 SUPERSCALAR ARCHITECTURE (CIRCA 1994)

[Diagram: parallel resources with automatic speculation, automatic pipelining, automatic allocation/scheduling, and commit.]

MULTICORE ARCHITECTURE (CIRCA 2010)

[Diagram: parallel resources with automatic speculation, automatic pipelining, automatic allocation/scheduling, and commit.]

[Figure: two plots of threads vs. time: realizable parallelism, and parallel library calls. Credit: Jack Dongarra]

“Compiler Advances Double Computing Power Every 18 Years!” – Proebsting’s Law
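Taken at face value (a back-of-the-envelope reading, not a figure from the deck), doubling every 18 years is compound growth of

\[ 2^{1/18} \approx 1.04, \]

that is, roughly 4% per year from compiler advances, a small fraction of the annual gains hardware delivered during the golden era shown above.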

Multicore Needs:

1. Automatic resource allocation/scheduling, speculation/commit, and pipelining.
2. Low-overhead access to programmer insight.
3. Code reuse. Ideally, this includes support of legacy codes as well as new codes.
4. Intelligent automatic parallelization.

Parallel Programming · Automatic Parallelization · Parallel Libraries · Computer Architecture

The proposal: implicitly parallel programming with critique-based, iterative, occasionally interactive, speculatively pipelined automatic parallelization. A roadmap to restoring computing's former glory.

Multicore Needs (recap): items 1-4 above.

One Implementation

[System diagram: new or existing sequential code and new or existing libraries, each carrying insight annotations, feed the DSWP family of optis, speculative optis, and other optis; a Complainer/Fixer drives the loop; the output is parallelized code running on machine-specific performance primitives.]

Spec-PS-DSWP

[Schedule over cycles 0-5 on four cores: the sequential list-traversal stage (LD:1-LD:5) runs on Core 1, the parallel work stage (W:1-W:4) is spread across Cores 2 and 3, and commits (C:1-C:3) run on Core 4.]

Example

    A: while (node) {
    B:   node = node->next;
    C:   res = work(node);
    D:   write(res);
       }

Program Dependence Graph: nodes A, B, C, D, connected by control dependences and data dependences.

[Schedule: statements A1-D2 spread across Cores 1-3 over time.]
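Before the speculative variants, it helps to see a baseline DSWP partitioning of this loop. Below is a minimal hand-written sketch, assuming POSIX threads and a toy busy-waiting queue; real DSWP inserts produce/consume primitives automatically, and work(), the list contents, and the queue here are hypothetical stand-ins. The sequential pointer chase (A+B) becomes stage 1, work() (C) stage 2, and output (D) stage 3, with values flowing one way between cores.

    /* DSWP sketch: three pipeline stages on three threads, linked by
     * single-producer/single-consumer queues. Toy code: busy-waiting
     * queues and GCC builtins, not production synchronization. */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct node { struct node *next; int val; } node_t;

    #define QSIZE 64
    #define DONE  ((intptr_t)-1)             /* assumed out-of-band token */
    typedef struct {
        intptr_t buf[QSIZE];
        volatile unsigned head, tail;
    } queue_t;

    static void q_push(queue_t *q, intptr_t v) {
        while (q->tail - q->head == QSIZE) ;      /* full: spin */
        q->buf[q->tail % QSIZE] = v;
        __sync_synchronize();                     /* publish data first */
        q->tail++;
    }

    static intptr_t q_pop(queue_t *q) {
        while (q->tail == q->head) ;              /* empty: spin */
        intptr_t v = q->buf[q->head % QSIZE];
        __sync_synchronize();
        q->head++;
        return v;
    }

    static queue_t q1, q2;                        /* inter-stage queues */
    static int work(node_t *n) { return 2 * n->val; }   /* stand-in */

    /* Stage A+B: the sequential pointer chase, alone on one core. */
    static void *stage_traverse(void *list) {
        for (node_t *n = list; n; n = n->next)
            q_push(&q1, (intptr_t)n);
        q_push(&q1, DONE);
        return NULL;
    }

    /* Stage C: the expensive work(), decoupled from the traversal. */
    static void *stage_work(void *unused) {
        intptr_t v;
        while ((v = q_pop(&q1)) != DONE)
            q_push(&q2, (intptr_t)work((node_t *)v));
        q_push(&q2, DONE);
        return unused;
    }

    /* Stage D: output in original program order. */
    static void *stage_write(void *unused) {
        intptr_t v;
        while ((v = q_pop(&q2)) != DONE)
            printf("%ld\n", (long)v);
        return unused;
    }

    int main(void) {
        node_t c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
        pthread_t t1, t2, t3;
        pthread_create(&t1, NULL, stage_traverse, &a);
        pthread_create(&t2, NULL, stage_work, NULL);
        pthread_create(&t3, NULL, stage_write, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        return 0;
    }

The communication is acyclic and one-way, which is the property the later comparison slides lean on: queue latency adds pipeline-fill time but never lands on the loop's recurrence.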

Spec-DOALL

[Same example and Program Dependence Graph. Spec-DOALL assigns each whole iteration to its own core: A1, B1, C1, D1 on Core 1; A2, B2, C2, D2 on Core 2; A3, B3, ... on Core 3.]

To issue iterations before the exit test resolves, Spec-DOALL speculates the loop exit: "A: while (node) {" becomes "while (true) {", with the exit test checked inside the loop, as sketched below.

[Schedule: iterations B2-D2, B3-D3, B4-D4 issued in parallel across cores. Result on 197.parser: slowdown.]
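A runnable toy version of that transformation, sequential and with the speculation runtime stubbed out as no-ops (spec_begin/spec_commit/spec_squash are hypothetical hooks, not an API from the deck):

    /* Loop-exit speculation, Spec-DOALL style: "while (node)" becomes
     * "while (1)", and the exit test moves inside the loop as a
     * misspeculation check. Stubs stand in for a speculation runtime. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct node { struct node *next; int val; } node_t;

    static void spec_begin(void)  {}   /* hypothetical: checkpoint     */
    static void spec_commit(void) {}   /* hypothetical: publish writes */
    static void spec_squash(void) {}   /* hypothetical: discard writes */

    static int  work(node_t *n)  { return 2 * n->val; }
    static void write_res(int r) { printf("%d\n", r); }

    int main(void) {
        node_t c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
        node_t *node = &a;

        while (1) {                    /* A: exit test speculated away */
            spec_begin();
            node = node->next;         /* B: the pointer chase         */
            if (!node) {               /* exit check, now in-loop      */
                spec_squash();         /* squash the extra iteration   */
                break;
            }
            int res = work(node);      /* C */
            write_res(res);            /* D */
            spec_commit();
        }
        return 0;
    }

Exit speculation alone does not rescue this loop: B is still a loop-carried dependence, so parallel iterations would misspeculate on the pointer chase constantly, which is presumably what the 197.parser slowdown records.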

Spec-DOACROSS and Spec-DSWP

[Schedules on Cores 1-3. Spec-DOACROSS: whole iterations rotate around the cores (B1/C1/D1 on Core 1, B2/C2/D2 on Core 2, ...). Spec-DSWP: each stage stays on one core (all Bi on Core 1, all Ci on Core 2, all Di on Core 3). Throughput: 1 iter/cycle for both.]

Comparison: Spec-DOACROSS and Spec-DSWP

[Schedules under varying communication latency. Comm. latency = 1: both achieve 1 iter/cycle. Comm. latency = 2: Spec-DOACROSS falls to 0.5 iter/cycle, while Spec-DSWP still achieves 1 iter/cycle after the pipeline fill time.]
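A simplified throughput model (my gloss on the slide, assuming unit-time statements) captures why. Spec-DOACROSS forwards the loop-carried value core-to-core every iteration, so the communication latency L sits on the recurrence; Spec-DSWP pins each stage to a core and sends values one way, so L only lengthens the fill:

\[
\mathrm{Throughput}_{\mathrm{DOACROSS}} = \frac{1}{L}\ \text{iter/cycle},
\qquad
\mathrm{Throughput}_{\mathrm{DSWP}} = \frac{1}{\max_s t_s} = 1\ \text{iter/cycle after fill.}
\]

At L = 1 both give 1 iter/cycle; at L = 2 the model reproduces the slide's 0.5 vs. 1 iter/cycle.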

TLS vs. Spec-DSWP [MICRO 2010]

[Figure: performance speedup (×) vs. (number of total cores, number of nodes), from (1,1) to (128,32); y-axis 0-50×; series: TLS and Spec-PS-DSWP; geomean of 11 benchmarks on the same cluster.]

Multicore Needs (recap): items 1-4 above.

[System diagram repeated (see above).]


    char *memory;

    void *alloc(int size);

    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }

[Execution plan: alloc1-alloc6 on Cores 1-3 over time; the dependence through memory serializes the calls.]


    char *memory;

    @Commutative
    void *alloc(int size);

    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }

[Execution plan: with the annotation, alloc1-alloc6 may run concurrently across Cores 1-3.]


[Same code and execution plan, annotated @Commutative.]

Easily Understood Non-Determinism!

[MICRO ’07, Top Picks ’08; Automatic: PLDI ’11]

~50 of ½ million LOCs modified in SPECint2000. The mods also include Non-Deterministic Branch.
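A minimal sketch of the contract @Commutative declares, assuming a lock-based realization (the deck does not show the system's actual mechanism): calls to alloc() may run in any order, so the parallelizer may drop the inter-call dependence on memory, provided each call executes atomically. The pool setup and worker harness below are hypothetical.

    /* Sketch: a lock makes the annotated alloc() atomic, so concurrent
     * calls commute: any interleaving hands out distinct blocks, just
     * not deterministic ones. Lock-based atomicity is an assumption of
     * this sketch. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    static char *memory;                               /* bump pointer */
    static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

    /* @Commutative */
    void *alloc(int size) {
        pthread_mutex_lock(&alloc_lock);               /* atomic region */
        void *ptr = memory;
        memory = memory + size;
        pthread_mutex_unlock(&alloc_lock);
        return ptr;
    }

    static void *worker(void *unused) {
        for (int i = 0; i < 4; i++)
            printf("got %p\n", alloc(16));             /* order varies */
        return unused;
    }

    int main(void) {
        memory = malloc(1 << 20);                      /* backing pool */
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Which caller gets which block varies run to run, but every schedule is correct: that is the easily understood non-determinism the slide names.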

Multicore Needs (recap): items 1-4 above.

[System diagram repeated (see above).]

Iterative Compilation [Cooper ’05; Almagor ’04; Triantafyllis ’05]

[Figure: the same three transformations (Unroll, Sum Reduction, Rotate) applied in different orders give wildly different results, with observed speedups of 0.90X, 0.10X, 30.0X, 1.1X, 0.8X, and 1.5X.]
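The search those numbers imply can be driven by a loop like the following sketch: enumerate orderings, rebuild, time, keep the best. mycc, its -passes= flag, app.c, and the pass names are hypothetical placeholders, not a real compiler's interface.

    /* Hypothetical iterative-compilation driver: try each ordering of
     * three passes, rebuild and time the program, keep the fastest. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);   /* wall-clock time */
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static double build_and_run(const char *passes) {
        char cmd[256];
        snprintf(cmd, sizeof cmd, "mycc -passes=%s app.c -o app", passes);
        if (system(cmd) != 0) return 1e30;     /* build failed */
        double t0 = now_sec();
        if (system("./app") != 0) return 1e30; /* run failed */
        return now_sec() - t0;
    }

    int main(void) {
        const char *orders[] = {               /* all 6 orderings */
            "unroll,sumred,rotate", "unroll,rotate,sumred",
            "sumred,unroll,rotate", "sumred,rotate,unroll",
            "rotate,unroll,sumred", "rotate,sumred,unroll",
        };
        const char *best = NULL;
        double best_t = 1e30;
        for (int i = 0; i < 6; i++) {
            double t = build_and_run(orders[i]);
            printf("%-24s %8.3f s\n", orders[i], t);
            if (t < best_t) { best_t = t; best = orders[i]; }
        }
        printf("best ordering: %s\n", best ? best : "(none)");
        return 0;
    }

The Complainer slides that follow make this search targeted rather than blind.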

PS-DSWP Complainer

[PDG view of the remaining blockers:
• Red edges: dependences between malloc() and free()
• Blue edges: dependences between rand() calls
• Green edges: flow dependences inside the inner loop
• Orange edges: dependences between function calls]

The Complainer asks "Who can help me?" and the blockers are discharged in turn across the following slides: a Sum Reduction, a Commutative annotation from the PROGRAMMER, and a Commutative annotation from the LIBRARY.

Scalable Speedup!

[Figure: speedup vs. number of cores (1-64), y-axis 0-50, comparing Parallel HMMER V2 with HMMER with Commutative.]

Multicore Needs (recap): items 1-4 above.

[System diagram repeated (see above).]

Performance relative to best sequential: 128 cores in 32 nodes with Intel Xeon processors [MICRO 2010].

Restoration of Trend

[Figure: the performance trend of the opening slide restored: architecture/devices of the Era of DIY (multicore, reconfigurable, GPUs, clusters) compounded with compiler technology; Proebsting's Law ("Compiler Advances Double Computing Power Every 18 Years!") quoted again.]

Compiler technology inspired class of architectures?

The End
