13
[email protected] 21 May 2013 Opening keynote, the 6 th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden 1 Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de/RH-bio.pdf The Impact of Reconfigurable Computing on Manycore Programming Trends Reiner Hartenstein 1 10 Sep 2009, Stockholm, Sweden keynote [email protected] 9:15 10:00 Teaching for Change: an early martyr „Turing is irrelevant“ The von Neumann model is the emulation of a tape machine http://www.sigsoft.org/SEN/parnas.html D. L. Parnas (keynote): 10 th Conf. Softw. Engineering Education and Training (CSEET '97) „The von Neumann syndrome“: coined ~ a decade later Prof. C.V. Ramamoorthy, (UC Berkeley), SDPS 2006, San Diego, CA Dijkstra 1968: The Goto considered harmful R. Hartenstein, G. Koch 1975: The universal Bus considered harmful Backus 1978: Can programming be liberated from the von Neumann style? Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style L. Savain 2006: Why Software is bad … Peter G. Neumann 1985-2003: 216x “Inside Risks“ (18 years inside back cover of Comm_ACM) Critique of von Neumann is not new: punished for blasphemy? (mimicking tape on RAM) Peter G. Neumann Teaching for Change http://hartenstein.de © 2009, [email protected] 3 Outline (1) The Power Consumption of Computing The Single-Core Approach The Multicore Scenario The Silver Bullet? A CPU-centric Flat World The Generalisation of Software Engineering Conclusions http://hartenstein.de 4 Impact of the von Neumann Syndrome http://www.forbes.com/forbes/1999/0531/6311070a.html Dig more coal -- the PCs are coming Peter W. Huber, Mark P. Mills, 05.31.99 [1989 from a student at Kaiserslautern] never run out of energy? typical oil field operation coal hydro nuclear gas oil [Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/ 2007: 80% crude oil coming from decline fields natural gas: similar situation > 30 % ~ 55 % Production (%) 100 0 5 „6 more Saudi Arabias needed for demand predicted for 2030“ http://hartenstein.de © 2009, [email protected] 6 Server Farms the electricity bill is a key issue at banks of the Columbia river: [Randy Katz: IEEE Spectrum, Febr. 2009] Am. football fields at Quincy, size: 10 Power consumption by internet: x30 til 2030 if trends continue G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8 11 Sep 2008 Quincy Dalles Boardman WASHINGTON OREGON 48 MW 48 MW power for 40,000 homes each 6500 m 2 each 6500 m 2 at Dallas

never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

1 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

The Impact of

Reconfigurable Computing on

Manycore Programming Trends

Reiner Hartenstein

1

10 Sep 2009, Stockholm, Sweden

ke

yn

ote

[email protected]

9:15 – 10:00 http://hartenstein.de

© 2009, [email protected]

2

Teaching for Change: an early martyr

„Turing is irrelevant“

The von Neumann model is the emulation of a tape machine

http://www.sigsoft.org/SEN/parnas.html

D. L. Parnas (keynote):

"Teaching for Change“;

10th Conf. Softw. Engineering Education

and Training (CSEET '97)

„The von Neumann syndrome“: coined ~ a decade later

Prof. C.V. Ramamoorthy, (UC Berkeley),

SDPS 2006, San Diego, CA

Brad Cox 1990: Planning the Software Industrial Revolution

Dijkstra 1968: The Goto considered harmful

R. Hartenstein, G. Koch 1975: The universal Bus considered harmful Backus 1978: Can programming be liberated from the von Neumann style?

Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style L. Savain 2006:

Why Software is bad …

Peter G. Neumann 1985-2003: 216x “Inside Risks“ (18 years inside back cover

of Comm_ACM)

Critique of von Neumann is not new:

punished for blasphemy?

(mimicking tape on RAM)

Peter G. Neumann

Teaching for Change

http://hartenstein.de

© 2009, [email protected]

3

Outline (1)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

4

Impact of the

von Neumann

Syndrome

http://www.forbes.com/forbes/1999/0531/6311070a.html

Dig more coal -- the PCs are coming Peter W. Huber, Mark P. Mills, 05.31.99

[1989 from a student at Kaiserslautern]

http://hartenstein.de

© 2009, [email protected]

5

never run out of energy?

typical oil field operation

coal

hydro

nuclear

gas

oil

[Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/

2007: 80% crude oil coming from decline fields

natural gas: similar situation

> 30 %

~ 55 %

Pro

du

ctio

n (%

)

100

0 5

„6 more Saudi Arabias needed for demand predicted for 2030“

http://hartenstein.de

© 2009, [email protected]

6

Server Farms

the electricity bill is a key issue

at banks of the Columbia river: [Randy Katz: IEEE Spectrum, Febr. 2009]

Am. football fields at Quincy, size: 10

Power consumption by internet: x30 til 2030 if trends continue G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8 –11 Sep 2008

Quincy

Dalles

Boardman

WASHINGTON

OREGON

48 MW 48 MW power for 40,000 homes

each 6500 m2 each 6500 m2

at Dallas

Page 2: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

2 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

7

Power Consumption of Computers

Energy cost may overtake IT equipment cost in the near future

but „we may ultimately need revolutionary new solutions“ [Horst Simon, LBNL, Berkeley]

... has become an industry-wide issue: incremental improvements are on track,

[Albert Zomaya]

Current trends will lead to unaffordable future operation cost of our cyber infrastructure

(subject of my talk)

http://hartenstein.de

© 2009, [email protected]

8

Outline (2)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

9

70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08

109

108

107

106

105

104

103

free ride on

Moore‘s Law

the burden of

software performance is the task of chip designers*

year

*) M-&-C-created population

Single-core

approach: Software Performance

http://hartenstein.de

© 2009, [email protected]

10

Rapid VLSI Design Education Revolution

1980 - 1983

10 Carver Mead

Lynn Conway

The most effective project in the

history of modern computer science 10

E.I.S. project

The incubator of the free ride on Moore‘s law

DARPA; NSF; many national governments; European Union …

massive

funding:

[Foto: GMD]

Created the missing designer population

(Heinz Riesenhuber)

http://hartenstein.de

© 2009, [email protected]

11

70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08

109

108

107

106

105

104

103

The End of Moore‘s Law

the end of the

single-core era

year

The end of

Moore‘s Law

soon: the 20

nm wall

stop in

2005

traditional instruction-based computing is running out of steam [DAC’09 special session: Computation in the Post-Turing Era]

http://hartenstein.de

© 2009, [email protected]

12

year

70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08

1010

1013

1012

1011

relative performance

109

108

107

106

105

104

103

10 12 14 16 18 20 22 24 26 28 30

the end of the

single-core era

number of transistors

doubles every 2 years

Growth beyond Moore‘s Law?

processor cores

the key issue:

Page 3: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

3 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

13 13

year

relative performance

94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30

be

gin of the

multic

ore era

Multimedia in the Multicore Era

Multimedia Performance Needs

application performance needs up to:

Audio 800 MIPS Graphics 11 GOPS Video 160 GOPS Digital TV 900 GOPS

[Pierre Paulin, MPSoC’09]

http://hartenstein.de

© 2009, [email protected]

14 14

year

relative performance

94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30

be

gin of the

multicore era

next standard

Broadband in the Multicore Era

needed performance

growing faster than

Moore‘s law

[courtesy E. Sanchez] MIPS

GSM GPRS EDGE UMTS

http://hartenstein.de

© 2009, [email protected]

15

ICT is at an inflection point

Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.

Cheap Revolution:

„Broadband is significant at the inflection point, prompting major market governance changes“

massive funding needed Cowhey‘s & Aronson‘s Law:

affordable broadband & software performance

„Future prosperity depends on network capacity, ..., efficient pricing, and flexible platforms“

handheld & living room commercially more important than the comparatively small PC market.

requirement growing

faster than Moore‘s law

[courtesy E. Sanchez] MIPS

http://hartenstein.de

© 2009, [email protected]

16

Funding market governance changes

RUS Broadband Initiatives Program (BIP)

http://www.broadbandusa.gov/

NTIA Broadband Technology Opportunities Program (BTOP).

ARPA-E ?

EU-FP7 ?

DARPA ?

other sources

EFRCEs ? Energy Frontier Research Centers 777 bio $

The Recovery Act: $7.2 billion

http://hartenstein.de

© 2009, [email protected]

17

Outline (3)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

18

Multicore has been around for decades

•ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/ Stellar/Stardent

•DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines

•Kendall Square Research •Key Computer Laboratories

Dead (Super)Computer Society [Gordon Bell, keynote, ISCA 2000]

•MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer •Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics

the single core sequential mind set was the winner

only 2 or 3 successes

most in 1985-1995

- mainly research

18

Page 4: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

4 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

19 19

Speed-up factors by GPGPUs (1)

The power efficiency is disputable The power efficiency is disputable

(up to ~150 x)

[Michael Garland, NVIDIA Research: Parallel Computing on Manycore GPUs; IPDPS, Rome, Italy, June 25-29, 2009]

this hardware can only be used only in certain ways.

Jan 2007

July 2007

Jan 2008

July 2008

Jan 2009

July 2009

Jan 2010

100

103

102

101 Spe

edup

-Fac

tor

Imaging

Video Video

146

20

130

30

100

47 50

149

18

36

Bioinformatics

Numerics Numerics

effective only at problems that can be solved using stream processing.

streams provide data parallelism *) migration from x86 singlecore

*

? such speed-ups by GPGPUs only for embarrassingly parallel applications

?

http://hartenstein.de

© 2009, [email protected]

20 20

Speed-up factors by GPGPUs (2)

CUDA ZONE pages [NVIDIA Corp.]: non-reviewed CUDA user submissions

http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home

Spe

edup

-Fac

tor

Jan 2007

July 2007

Jan 2008

July 2008

Jan 2009

July 2009

Jan 2010

Cryptography Cryptography

12

50

55

2 Imaging

5

169

20

100

50

30

327

100

5

20

90

109

13

40

10

15

10

36

100

50

35

50

Bioinformatics

3 . 5

30

270

20

4

15 16 26

10

4

150

100

35

29

13

4 . 3

Numerics Numerics

15

35

60

40

4

170

12

90

10

15

500 420

75

675 340

50

10

172

50 60

100

169

2

100

50

3

2

5

10

270

27

8 7

32

470

150

9

10

100

30

138

55

Video & Audio Video & Audio

7

20

9

10

9

60

CFD CFD Computational Fluid Dyamics Computational Fluid Dyamics

23

120

39

17

55

100

77

10

29

1 . 3

4

10

DCC Digital Content Creation

5

3 5

Graphics

50

2

100

16

25 26

3

Astrophysics Astrophysics

250 250

100

103

102

101 DSP Digital Signal Processing

5

35

50

31

35

8

260

EDA

34 oil & gas

Compute Unified Device Architecture (CUDA), accelerates BLAS libraries (Basic Linear Algebra Subroutines)

Less flexible than FPGAs.

(GPGPU tool development years earlier than f. x86)

NVIDIA GeForce GTX

stream processor

cores

minium power supply

recommended

275 240 650–680 watt

295 480 650–680 watt

Intel Xeon "Nehalem-EX" for servers: 8 cores Intel Core™2 Quad (desktop PCs): 4 cores

(up to ~600 x)

*) migration from x86 singlecore

*

http://hartenstein.de

© 2009, [email protected]

21 21

year

relative performance

94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30

Growth by Multicore

be

gin of the

multic

ore era

http://hartenstein.de

© 2009, [email protected]

22

John Hennessy:

widespread confusion and competing claims, „I would be panicked if I were in industry“

Hastily knitted

compilers for the

heavy lifting?

e. g. automatically parallelizing

compilation via multi-threading, and many other

ad-hoc solutions

“wait for current generation of

programmers to die off and be replaced

by programmers brought up on the

new paradigm?” [Jim Turley]

new types of bugs

introduced

http://hartenstein.de

© 2009, [email protected]

23

Outline (4)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

24

year

relative performance

94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30

be

gin of the

multicore era

Growth in the Multicore Era

Page 5: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

5 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

25 25

FFT FFT 100

Reed-Solomon Decoding

Reed-Solomon Decoding 2400

Viterbi Decoding Viterbi Decoding 400

1000

MAC MAC

DSP and wireless

molecular dynamics simulation

molecular dynamics simulation

88

BLAST BLAST 52

protein identification protein identification 40

Smith-Waterman pattern matching Smith-Waterman pattern matching

288

Bioinformatics

GRAPE GRAPE

20 20 Astrophysics Astrophysics

SPIHT wavelet-based image compression

SPIHT wavelet-based image compression 457

real-time face detection

real-time face detection

6000 6000

video-rate stereo vision

video-rate stereo vision

900 pattern

recognition pattern

recognition 730

Image processing, Pattern matching, Multimedia

3000 CT imaging CT imaging

100

103

106

Spe

edup

-Fac

tor

Speed-up

factors

obtained

by Software

to Configware

migration

vs. GPU: almost 50x (up to ~30,000x) ~50x

(200x)

(200x)

CUDA ZONE

Garland IPDPS‘09

8723

DNA & protein sequencing

crypto crypto 1000

28514 DES breaking DES breaking

by FPGA:

intel supports direct front side bus access by FPGAs

“... design techniques will evolve, by necessity, to satisfy the demands of reconfigurable hardware and software programmability”. J. R. Rattner, DAC 2008

2 orders of magnitude

http://hartenstein.de

© 2009, [email protected]

26 26

FFT FFT 100

Reed-Solomon Decoding

Reed-Solomon Decoding 2400

Viterbi Decoding Viterbi Decoding 400

1000

MAC MAC

DSP and wireless

molecular dynamics simulation

molecular dynamics simulation

88

BLAST BLAST 52

protein identification protein identification 40

Smith-Waterman pattern matching Smith-Waterman pattern matching

288

Bioinformatics

GRAPE GRAPE

20 20 Astrophysics Astrophysics

SPIHT wavelet-based image compression

SPIHT wavelet-based image compression 457

real-time face detection

real-time face detection

6000 6000

video-rate stereo vision

video-rate stereo vision

900 pattern

recognition pattern

recognition 730

Image processing, Pattern matching, Multimedia

3000 CT imaging CT imaging crypto crypto

1000

28514 DES breaking DES breaking

8723

DNA & protein sequencing

1985 1990 100

103

106

Spe

edup

-Fac

tor

*) TU Kaiserslautern 100

103

106 Speedup-Factor

+ Pre-FPGA solutions

2000

39.4

160

15000

2-D FIR filter (no FPGA: DPLA by TU-KL*) 2-D FIR filter (no FPGA: DPLA by TU-KL*)

Lee Routing (DPLA by TU-KL*) Lee Routing (DPLA by TU-KL*)

Grid-based DRC: no FPGA: DPLA on MoM by TU-KL*

Grid-based DRC: no FPGA: DPLA on MoM by TU-KL*

Grid-based DRC* („fair comparizon“) Grid-based DRC* („fair comparizon“)

fabricated by E.I.S. Multi University

Project

1984: 1 DPLA replaced 256

FPGAs

http://hartenstein.de

© 2009, [email protected]

27

Software vs. FPGA

96 98 00 02 04 06 08

105

104

103

102

101

100 year

10

relative performance

1990 1995 2000

200

100

0

50

150

75

25

125

175

SP

EC

fp2000/M

Hz/B

illion

Tra

nsis

tors

HP

[BWRC, UC Berkeley, 2004]

0.5

x M

OP

S/M

Hz/B

illion

Tra

nsis

tors

420

1996

46

Benchmarks: Moore‘s law does not indicate microprocessor MIPS

?

!

http://hartenstein.de

© 2009, [email protected]

28

Moore’s law not applicable to all aspects

For multicore*: the Law of More …

…with drastically declining programmer productivity

*) number of cores doubles every 2 years

http://hartenstein.de

© 2009, [email protected]

29

Massive

Energy Saving

factors: ~10%

of speedup factor

29

FFT FFT 100

Reed-Solomon Decoding

Reed-Solomon Decoding 2400

Viterbi Decoding Viterbi Decoding 400

1000

MAC MAC

DSP and wireless

molecular dynamics simulation

molecular dynamics simulation

88

BLAST BLAST 52

protein identification protein identification 40

Smith-Waterman pattern matching Smith-Waterman pattern matching

288

Bioinformatics

GRAPE GRAPE

20 20 Astrophysics Astrophysics

SPIHT wavelet-based image compression

SPIHT wavelet-based image compression 457

real-time face detection

real-time face detection

6000 6000

video-rate stereo vision

video-rate stereo vision

900 pattern

recognition pattern

recognition 730

Image processing, Pattern matching, Multimedia

3000 CT imaging CT imaging crypto crypto

1000

28514 DES breaking DES breaking

100

103

106

Spe

edup

-Fac

tor

http://hartenstein.de

© 2009, [email protected]

8723

DNA & protein sequencing

Software

vs. FPGA

(2)

http://hartenstein.de

© 2009, [email protected]

30 30

[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]

Application Speed-up factor

Savings factors

Power Cost Size

DNA and Protein sequencing 8723 779 22 253

DES breaking 28514 3439 96 1116

much less equipment

needed

much less memory and bandwidth needed massively saving energy

RC*: Demonstrating the intensive Impact

SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster

Tarek El-Ghazawi

*) RC = Reconfigurable Computing

Page 6: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

6 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

31

Why such Speed-up Factors ...

... with FPGAs: a much worse technology ! massive wiring overhead

+ routing congestion growing with FPGA size

+ massive reconfigurability overhead

main reason: no von Neumann Syndrome!

more recently also: more „platform FPGAs“

The „Reconfigurable Computing Paradox“

http://hartenstein.de

© 2009, [email protected]

32

RC versus Multicore

RC: speed-up often higher by orders of magnitude

RC: energy-efficiency often higher: very much, or, by orders of magnitude ?

Sure !

Sure !

We need both: Multicore and RC

this is the

silver bullet Multicore:

legacy software, control-intensive applications, etc.

„RC“ = Reconfigurable Computing

http://hartenstein.de

© 2009, [email protected]

33 33

year

relative performance

94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30

en

d o

f th

e

sin

gle

co

re

e

ra

33 Reconfigurable Computing is indispensable!

For a Booming Multicore Era

von-Neumann-only is not the silver bullet http://hartenstein.de

© 2009, [email protected]

34

Outline (5)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

35

CPU-centric

flat world

sequential-only mind set –

(Aristotelian model)

typical programmer qualification:

This Software-centric world model is obsolete

CPU-“centric“ but no hardware know-how CPU-“centric“ but no hardware know-how

(kind of tunnel view)

http://hartenstein.de

© 2009, [email protected]

36

Machine Model of the Mainframe Era

Machine model

resources sequencer

property programming

source property programming source state register

CPU hardwired - programmable Software (instruction streams)

program counter

Page 7: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

7 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

37

40 years Software Crisis

Nathan’s Law: Software is a gas. It expands to fill its container ...

Nathan Myhrvold

… until being limited by Moore’s Law [& Kryder’s Law]

Wirth‘s Law

“software is slowing faster than hardware is accelerating“

F. L. Bauer

Oct 1957 The Economist: Nov 19th 1955

In 1955, Parkinson could not have foreseen the impact of software. formula: bureaucracy growth independent of actual work to be done

[Niklaus Wirth]

[Cyril Northcote Parkinson]

Software critics is not new:

F. L. Bauer 1968, coined the term „Software Crisis“

N. N. 1995: THE STANDISH GROUP REPORT

Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005

Anthony Berglas 2008: Why it is Important that Software Projects Fail L. Savain 2006: Why Software is bad

Peter G. Neumann 1985-2003:

216x “Inside Risks“(18 years inside back cover of Comm_ACM)

http://hartenstein.de

© 2009, [email protected]

38

“Software”

overhead piles up to code sizes of astronomic dimensions

The von Neumann

Syndrome:

stands for extremely memory-cycle-hungry instruction streams

stolen from Bob Colwell

memory wall,

caches, ...

from earlier talks: from earlier talks:

datastream parallelism

instruction stream parallelism

data meet PU

C.V. Ramamoorthy

“The Memory Wall”

coined by Sally McKee (& co-author)

Patterson’s Law:

Dave Patterson

bandwidth gap grows 50% / year

has reached >1000x

the uglyness of this term

http://hartenstein.de

© 2009, [email protected]

39

Machine Model of the PC Era

Machine model

resources sequencer

property programming

source property programming source state register

ASIC accelerator hardwired - hardwired -

CPU hardwired - programmable Software (instruction streams)

program counter

Application-Specific Integrated Circuit & other accelerators: e.g. graphics processor

wagging the dog“

“the tail is

ISIS 1997 Austin, TX [ ]

http://hartenstein.de

© 2009, [email protected]

40

Science does not progress continuously,

Thomas S. Kuhn 1969: The Structure of Scientific Revolutions

…in which the established paradigm is overthrown and replaced. .

? ? ?

Thomas S. Kuhn

The von Neumann paradigm?

… shortcomings in an established paradigm produces a crisis

that may lead to a revolution

http://hartenstein.de

© 2009, [email protected]

41

Outline (6)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

42

From CPU to RPU

machine model

right now

resources sequencer

property programming

source property programming source

state register

ASIC accelerator hardwired - hardwired -

CPU hardwired - programmable Software

(instruction streams)

program counter

RPU accelerator

programmable Configware (configuration

code) programmable Flowware

(data streams) data

counters

we need 2 more program sources

Reconfigurable

Processing Unit

non-von-Neumann-

now accelerators are programmable!

Page 8: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

8 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

43

[Thomas S. Kuhn 1969: The Structure of Scientific Revolutions]

“… in which the established paradigm is overthrown and replaced.”

However, not the von Neumann paradigm will be overthrown and replaced.

The CPU-centric world model of Software Engineering will be replaced by removing the tunnel view perspective

Thomas Kuhn is right !

What Revolution?

http://hartenstein.de

© 2009, [email protected]

44

RC* outside a

CPU-centric

flat world?

For the

Multicore era

we need

a new model

(Copernican)

For the

Multicore era

we need

a new model

(Copernican)

*) RC = Reconfigurable Computing

http://hartenstein.de

© 2009, [email protected]

45

Program Program Performance

„Multicore computers shift the burden of software performance from chip designers to programmers.“

we anyway need a Software Education Revolution ... Since People have to write code differently,

[J. Larus: Spending Moore's Dividend; C_ACM, May 2009]

... performance drops & other problems in moving single-core to multicore ...

... the chance to move RC* from niche to mainstream

a scenario like before the Mead-&-Conway revolution Missing programmer population and methodology:

*) RC = Reconfigurable Computing

Embedded syst. & hdw scene have the right background to reform the parallelism education of SW programmers

http://hartenstein.de

© 2009, [email protected]

46

A Heliocentric CS Model

FE

Flowware

Engineering

PE

Program Engineering

The Generalization of Software Engineering —

A Twin Paradigm Dual Dichotomy Approach.

time to space mapping

issue

SE

Software

Engineering

RPU RPU

special

*) do not confuse with „dataflow“!

http://hartenstein.de

© 2009, [email protected]

47

A Multicore Submarine Model?

C is not the silver bullet: it’s inherently serial

mapping parallelism just into the time domain: “abstracting” away the space domain is fatal

But nobody wants to learn a new language. There is no easy way to program in parallel

The programmer needs to understand how data flows through cores, accelerators, interconnect and peripherals

The programmer* needs system visualization in the space domain, to understand performance under parallelism

The datastream model of the twin-paradigm approach helps to understand the space domain and parallelism

*) and, especially the student

http://hartenstein.de

© 2009, [email protected]

48

Our Contemporary Computer Machine Model

Machine model

resources sequencer

property programming

source property programming source state register

ASIC accelerator hardwired - hardwired -

CPU hardwired - programmable Software (instruction streams)

program counter

RPU accelerator programmable

Configware (configuration

code) programmable

Flowware (data

streams)

data counters

twin Paradigm Dichotomy

in CPU

in RAM

data counters of reconfigurable address generators in asM (auto-sequencing) data memory blocks

the same language primitives!

Page 9: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

9 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

49

Time to Space Mapping

Machine model

resources sequencer

property programming

source property programming source state register

ASIC accelerator hardwired - hardwired -

CPU hardwired - programmable Software (instruction streams)

program counter

RPU accelerator programmable

Configware (configuration

code) programmable

Flowware (data

streams)

data counters

Relativity Dichotomy

„The biggest payoff will come from Putting Old ideas into Practice and teaching people how to apply them properly.“ D

avid

Par

nas loop turns

2 pipeline C C

1967

http://hartenstein.de

© 2009, [email protected]

50

How to achieve acceptance

Hardware description languages hidden

Courses tailored for students not being hardware-savvy

Tools usable by users not being hardware designers

[Courtesy Richard Newton]

„How to hide the ugliness from the user“ [Herman Schmit]

http://hartenstein.de

© 2009, [email protected]

51

traditional qualification in the time domain

51

Software Education (R)evolution:

+ lean qualification in the space domain

= lean hardware modeling qualification at a higher level of abstraction

by simultaneous dual domain co-education:

viable methodology for dual rail education (only a few % curricula need to be changed)

step by step, not overthrowing the SE scene

http://hartenstein.de

© 2009, [email protected]

52

We need a Software Education Revolution

2010 - ....

52

The incubator of the free ride on Cowhey‘s & Aronson‘s law massive

funding

required partially re-write the code

Create the missing programmer population

next most effective project in the history of modern computer science

DOS to Windows took 10 years

http://hartenstein.de

© 2009, [email protected]

53

Community Building Function

of the DATE Friday Workshop Friday, March 12, 2010, 08:30 – 17:00

Friday Workshop

[email protected]

Software Education Revolution

for using Multicore - and RC* (SERUM-RC*)

http://www.date-conference.com DATE-Conference, Dresden, DE:

CfP: http://fpl.org/cfp/ 53

*) Reconfigurable Computing

http://hartenstein.de

© 2009, [email protected]

54

RAW 2010

17th Reconfigurable Architectures Workshop

April 19-20, 2010, Atlanta (Georgia), USA

http://www.ece.lsu.edu/vaidy/raw/

Run-Time Reconfiguration & Adaptive Computing:

Architectures, Algorithms, Technologies

http://www.ipdps.org/

24th IEEE International Parallel and Distributed Processing Symposium

April 19-23, 2010, Atlanta (Georgia) USA

in conjunction with:

Manuscript due: October 18, 2009 Notification of acceptance: December 14, 2009 Camera-ready Papers Due: February 1, 2010

Page 10: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

10 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

55

Outline (7)

• The Power Consumption of Computing

• The Single-Core Approach

• The Multicore Scenario

• The Silver Bullet?

• A CPU-centric Flat World

• The Generalisation of Software Engineering

• Conclusions

http://hartenstein.de

© 2009, [email protected]

56

To maintain a Booming Multicore Era:

Not without Reconfigurable Computing!

Conclusions (1)

04 06 08 10 12 14 16 18 20 22 24 26 28 30

year

relative performance

56

possible for 2 or 3 more decades?

th

e e

nd

o

f th

e

sin

gle

co

re

e

ra

http://hartenstein.de

© 2009, [email protected]

57

additional Flowware / Configware skills are essential qualifications for programmers.

Mead-&-Conway-style SE Revolution toward dual-rail education is urgently needed

key motivation: performance and energy consumption of programs

we need to master hetero of all 3: Singlecore, Multicore, & Reconfigurable Computing

massive long term

R&D funding required

like known from DARPA A main problem: selecting (or creating) tools for lab courses

SERUM-RC

the key issue: ease of use!

Conclusions

http://hartenstein.de

© 2009, [email protected]

58

We need „une' Levée en Masses“

58

We need „une'

Levée en Masses“

We need „une'

Levée en Masses“

http://hartenstein.de

© 2009, [email protected]

59

thank you for your

patience

59

http://hartenstein.de

© 2009, [email protected]

60

END

Page 11: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

11 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

61

Credited to be „The father of Reconfigurable Computing“ (also pre-FPGA era) [1],

EU grant (80ies), 85 mio ECU (pre-€): complete EDA framework [4,5] around KARL

1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)

1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution: the multi university „E.I.S. project“ (gov. grant: 38 million Deutschmark)

IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards

Professor (ordinarius emeritus), TU Kaiserslautern

CV of Reiner Hartenstein

All acad. degrees from KIT Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)

Creator of KARL[2], most successful [3] trailblazer HDL before VHDL came up

[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)

[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html

[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator, embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [4]

[2] R. Hartenstein: Fundamentals of Structured Hardware Design; American Elsevier, 1977 -- Bestseller

Founder / co-founder of several international annual conference series

[email protected]

61

1977 & later used as a

textbook at UC Berkeley

(not only here)

KARL: a Pascalish hardware language

[3] for users, usage details, quotations,etc.see: http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html

his hobby: giving keynotes

http://hartenstein.de/keynotes.htm

http://hartenstein.de

© 2009, [email protected]

62

backup for discussion

62

http://hartenstein.de

© 2009, [email protected]

63

Double Dichotomy

2) Relativity Dichotomy

-Procedure time:

(Software-Domain)

-Structure space:

(Configware-Domain)

1) Paradigm Dichotomy

instruction stream von Neumann Machine

(Software-Domain) data stream Datastream Machine

(Flowware-Domain)

63

http://hartenstein.de

© 2009, [email protected]

64

Relativity Dichotomy

time domain: space domain:

procedure domain structure domain

2 phases: 1) programming

instruction streams 2) run time

3 phases: 1) reconfiguration

of structures

time space

2) programming data streams

3) run time

64

von Neumann Machine Datastream Machine

(time time/space)

http://hartenstein.de

© 2009, [email protected]

65

time-iterative to space-iterative

65

a time to

space/time

mapping

loop transformation methodogy: 70ies and later

n*k time steps, 1 CPU

n time steps, k DPUs

Often the space dimension is limited n time steps, 1 CPU

1 time step, n DPUs

a time to

space

mapping

e. g. example: bubble sort migration Strip mining

[D. Loveman, J-ACM, 1977]

http://hartenstein.de

© 2009, [email protected]

66

time to space mapping

time domain: space domain:

procedure domain structure domain

66

program loop n time steps, 1 CPU

pipeline 1 time step, n DPUs

Bubble Sort n x k time steps,

1 „conditional swap“ unit

Shuffle Sort k time steps, n „conditional swap“ units

time algorithm space algorithm

conditional swap

x

y

conditional swap

conditional swap

conditional swap

conditional swap

time algorithm

space/time algorithm s

Page 12: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

12 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

67

1

2

3

4

5

6

7

8 y

x

1 2 3 4 5 6 7 8

JPEG zigzag scan pattern

EastScan is step by [1,0] end EastScan;

SouthScan is step by [0,1] endSouthScan;

*> Declarations

NorthEastScan is loop 6 times until [*,1] step by [1,-1] endloop end NorthEastScan;

SouthWestScan is loop 7 times until [1,*] step by [-1,1] endloop end SouthWestScan;

HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;

goto PixMap[1,1]

HalfZigZag; SouthWestScan uturn (HalfZigZag)

HalfZigZag

data counter data counter

data counter data counter

2

1

3

4

HalfZigZag

Main program:

Flowware language example (MoPL): programming the datastream

x

y

67 (an animation)

http://hartenstein.de

© 2009, [email protected]

68

Programming model: Flowware

Adder

Speaker

FMDemod

LPF1

Split

Gather

LPF2 LPF3

HPF1 HPF2 HPF3

Source:

MIT

StreamIT

• Pros for streaming – Streamlined, low-overhead

communication – (More) deterministic behaviour – Good match for many simple media

rich applications

[Pierre Paulin]

We‘ve to find out, which applications types and programming models Students should exercise for the flowware approach

• Cons – control-dominated applications – shunt yard problem

http://hartenstein.de

© 2009, [email protected]

69

Flowware

from a generalization of the systolic arrays

supports any wild free form of pipe networks:

spiral, zigzag, fork and join, and even more wild,

unidirectional and fully or partially bidirectional,

Flowware: scheduling data streams -

Fifos, stacks, registers, register files, RAM blocks...

Flowware means parallelism resulting from time to space migration

http://hartenstein.de

© 2009, [email protected]

70

Ways to implement an Algorithm

• Hardware

• Software

• Configware

• mixed

von Neumann-machine

datastream machine

multicore

. manycore per se

singlecore

manycore

RAM-based

http://hartenstein.de

© 2009, [email protected]

71

Acceleration Mechanisms

•parallelism by multi bank memory architecture •auxiliary hardware for address calculation •address calculation before run time •avoiding multiple accesses to the same data. •avoiding memory cycles for address computation •optimization by storage scheme transformations •optimization by memory architecture transformations

http://hartenstein.de

© 2009, [email protected]

72

New boundary constraints are the limiting factor

7272 27 October 200827 October 2008 Software 2008, ZurichSoftware 2008, Zurich

Legacy scientific applications: predominantly sequential

The entire software ecosystem will need to evolve (including curricula): O/S, libraries, software development environments, compilers and languages

additional levels of parallelism: chaining, pipelining, systolic, super-systolic, wavefront arrays

additional data structures and storage organization: the new distributed memory discipline

New boundary constraints

Page 13: never run out of energy? Farms · •DAPP •Denelcor • •Elexsi •ETA Systems 1985 •Evans and Sutherland research Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace

[email protected]

21 May 2013

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

13 Reiner Hartenstein, TU Kaiserslautern, Germany

http://hartenstein.de/RH-bio.pdf

http://hartenstein.de

© 2009, [email protected]

73

old Paradigms and Methodologies

1946: Machine Paradigm (von Neumann)

1980: Datastreams (Kung, Leiserson)

1989: Anti Machine** Paradigm (TU-KL)

1990: first rDPA* (Rabaey)

1994: higher Anti Machine** Programming Language (Flowware: TU-KL)

1995: super systolic array: rDPA (Kress)

1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

1997+: Discipline of Distributed Memory Architectures (IMEC …)

1997: first automatically partitioning Configware/Software Co-Compiler (TU-KL)

*) rDPA = reconfigurable Data Path Array

**) datastream machine (flowware machine): no „dataflow machine“!!

http://hartenstein.de

© 2009, [email protected]

74

http://hartenstein.de

© 2009, [email protected]

75 [email protected]

Teaching computing fundamentals

Ignoring Reconfigurable Computing in teaching computing fundamentals within our CS curricula

causing to waste billions of dollars.

is one of the biggest mistakes in the history of information technology application