Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
1 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
The Impact of
Reconfigurable Computing on
Manycore Programming Trends
Reiner Hartenstein
1
10 Sep 2009, Stockholm, Sweden
ke
yn
ote
9:15 – 10:00 http://hartenstein.de
© 2009, [email protected]
2
Teaching for Change: an early martyr
„Turing is irrelevant“
The von Neumann model is the emulation of a tape machine
http://www.sigsoft.org/SEN/parnas.html
D. L. Parnas (keynote):
"Teaching for Change“;
10th Conf. Softw. Engineering Education
and Training (CSEET '97)
„The von Neumann syndrome“: coined ~ a decade later
Prof. C.V. Ramamoorthy, (UC Berkeley),
SDPS 2006, San Diego, CA
Brad Cox 1990: Planning the Software Industrial Revolution
Dijkstra 1968: The Goto considered harmful
R. Hartenstein, G. Koch 1975: The universal Bus considered harmful Backus 1978: Can programming be liberated from the von Neumann style?
Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style L. Savain 2006:
Why Software is bad …
Peter G. Neumann 1985-2003: 216x “Inside Risks“ (18 years inside back cover
of Comm_ACM)
Critique of von Neumann is not new:
punished for blasphemy?
(mimicking tape on RAM)
Peter G. Neumann
Teaching for Change
http://hartenstein.de
© 2009, [email protected]
3
Outline (1)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
4
Impact of the
von Neumann
Syndrome
http://www.forbes.com/forbes/1999/0531/6311070a.html
Dig more coal -- the PCs are coming Peter W. Huber, Mark P. Mills, 05.31.99
[1989 from a student at Kaiserslautern]
http://hartenstein.de
© 2009, [email protected]
5
never run out of energy?
typical oil field operation
coal
hydro
nuclear
gas
oil
[Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/
2007: 80% crude oil coming from decline fields
natural gas: similar situation
> 30 %
~ 55 %
Pro
du
ctio
n (%
)
100
0 5
„6 more Saudi Arabias needed for demand predicted for 2030“
http://hartenstein.de
© 2009, [email protected]
6
Server Farms
the electricity bill is a key issue
at banks of the Columbia river: [Randy Katz: IEEE Spectrum, Febr. 2009]
Am. football fields at Quincy, size: 10
Power consumption by internet: x30 til 2030 if trends continue G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8 –11 Sep 2008
Quincy
Dalles
Boardman
WASHINGTON
OREGON
48 MW 48 MW power for 40,000 homes
each 6500 m2 each 6500 m2
at Dallas
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
2 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
7
Power Consumption of Computers
Energy cost may overtake IT equipment cost in the near future
but „we may ultimately need revolutionary new solutions“ [Horst Simon, LBNL, Berkeley]
... has become an industry-wide issue: incremental improvements are on track,
[Albert Zomaya]
Current trends will lead to unaffordable future operation cost of our cyber infrastructure
(subject of my talk)
http://hartenstein.de
© 2009, [email protected]
8
Outline (2)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
9
70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08
109
108
107
106
105
104
103
free ride on
Moore‘s Law
the burden of
software performance is the task of chip designers*
year
*) M-&-C-created population
Single-core
approach: Software Performance
http://hartenstein.de
© 2009, [email protected]
10
Rapid VLSI Design Education Revolution
1980 - 1983
10 Carver Mead
Lynn Conway
The most effective project in the
history of modern computer science 10
E.I.S. project
The incubator of the free ride on Moore‘s law
DARPA; NSF; many national governments; European Union …
massive
funding:
[Foto: GMD]
Created the missing designer population
(Heinz Riesenhuber)
http://hartenstein.de
© 2009, [email protected]
11
70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08
109
108
107
106
105
104
103
The End of Moore‘s Law
the end of the
single-core era
year
The end of
Moore‘s Law
soon: the 20
nm wall
stop in
2005
traditional instruction-based computing is running out of steam [DAC’09 special session: Computation in the Post-Turing Era]
http://hartenstein.de
© 2009, [email protected]
12
year
70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 00 02 04 06 08
1010
1013
1012
1011
relative performance
109
108
107
106
105
104
103
10 12 14 16 18 20 22 24 26 28 30
the end of the
single-core era
number of transistors
doubles every 2 years
Growth beyond Moore‘s Law?
processor cores
the key issue:
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
3 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
13 13
year
relative performance
94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30
be
gin of the
multic
ore era
Multimedia in the Multicore Era
Multimedia Performance Needs
application performance needs up to:
Audio 800 MIPS Graphics 11 GOPS Video 160 GOPS Digital TV 900 GOPS
[Pierre Paulin, MPSoC’09]
http://hartenstein.de
© 2009, [email protected]
14 14
year
relative performance
94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30
be
gin of the
multicore era
next standard
Broadband in the Multicore Era
needed performance
growing faster than
Moore‘s law
[courtesy E. Sanchez] MIPS
GSM GPRS EDGE UMTS
http://hartenstein.de
© 2009, [email protected]
15
ICT is at an inflection point
Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.
Cheap Revolution:
„Broadband is significant at the inflection point, prompting major market governance changes“
massive funding needed Cowhey‘s & Aronson‘s Law:
affordable broadband & software performance
„Future prosperity depends on network capacity, ..., efficient pricing, and flexible platforms“
handheld & living room commercially more important than the comparatively small PC market.
requirement growing
faster than Moore‘s law
[courtesy E. Sanchez] MIPS
http://hartenstein.de
© 2009, [email protected]
16
Funding market governance changes
RUS Broadband Initiatives Program (BIP)
http://www.broadbandusa.gov/
NTIA Broadband Technology Opportunities Program (BTOP).
ARPA-E ?
EU-FP7 ?
DARPA ?
other sources
EFRCEs ? Energy Frontier Research Centers 777 bio $
The Recovery Act: $7.2 billion
http://hartenstein.de
© 2009, [email protected]
17
Outline (3)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
18
Multicore has been around for decades
•ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/ Stellar/Stardent
•DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines
•Kendall Square Research •Key Computer Laboratories
Dead (Super)Computer Society [Gordon Bell, keynote, ISCA 2000]
•MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer •Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics
the single core sequential mind set was the winner
only 2 or 3 successes
most in 1985-1995
- mainly research
18
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
4 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
19 19
Speed-up factors by GPGPUs (1)
The power efficiency is disputable The power efficiency is disputable
(up to ~150 x)
[Michael Garland, NVIDIA Research: Parallel Computing on Manycore GPUs; IPDPS, Rome, Italy, June 25-29, 2009]
this hardware can only be used only in certain ways.
Jan 2007
July 2007
Jan 2008
July 2008
Jan 2009
July 2009
Jan 2010
100
103
102
101 Spe
edup
-Fac
tor
Imaging
Video Video
146
20
130
30
100
47 50
149
18
36
Bioinformatics
Numerics Numerics
effective only at problems that can be solved using stream processing.
streams provide data parallelism *) migration from x86 singlecore
*
? such speed-ups by GPGPUs only for embarrassingly parallel applications
?
http://hartenstein.de
© 2009, [email protected]
20 20
Speed-up factors by GPGPUs (2)
CUDA ZONE pages [NVIDIA Corp.]: non-reviewed CUDA user submissions
http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home
Spe
edup
-Fac
tor
Jan 2007
July 2007
Jan 2008
July 2008
Jan 2009
July 2009
Jan 2010
Cryptography Cryptography
12
50
55
2 Imaging
5
169
20
100
50
30
327
100
5
20
90
109
13
40
10
15
10
36
100
50
35
50
Bioinformatics
3 . 5
30
270
20
4
15 16 26
10
4
150
100
35
29
13
4 . 3
Numerics Numerics
15
35
60
40
4
170
12
90
10
15
500 420
75
675 340
50
10
172
50 60
100
169
2
100
50
3
2
5
10
270
27
8 7
32
470
150
9
10
100
30
138
55
Video & Audio Video & Audio
7
20
9
10
9
60
CFD CFD Computational Fluid Dyamics Computational Fluid Dyamics
23
120
39
17
55
100
77
10
29
1 . 3
4
10
DCC Digital Content Creation
5
3 5
Graphics
50
2
100
16
25 26
3
Astrophysics Astrophysics
250 250
100
103
102
101 DSP Digital Signal Processing
5
35
50
31
35
8
260
EDA
34 oil & gas
Compute Unified Device Architecture (CUDA), accelerates BLAS libraries (Basic Linear Algebra Subroutines)
Less flexible than FPGAs.
(GPGPU tool development years earlier than f. x86)
NVIDIA GeForce GTX
stream processor
cores
minium power supply
recommended
275 240 650–680 watt
295 480 650–680 watt
Intel Xeon "Nehalem-EX" for servers: 8 cores Intel Core™2 Quad (desktop PCs): 4 cores
(up to ~600 x)
*) migration from x86 singlecore
*
http://hartenstein.de
© 2009, [email protected]
21 21
year
relative performance
94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30
Growth by Multicore
be
gin of the
multic
ore era
http://hartenstein.de
© 2009, [email protected]
22
John Hennessy:
widespread confusion and competing claims, „I would be panicked if I were in industry“
Hastily knitted
compilers for the
heavy lifting?
e. g. automatically parallelizing
compilation via multi-threading, and many other
ad-hoc solutions
“wait for current generation of
programmers to die off and be replaced
by programmers brought up on the
new paradigm?” [Jim Turley]
new types of bugs
introduced
http://hartenstein.de
© 2009, [email protected]
23
Outline (4)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
24
year
relative performance
94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30
be
gin of the
multicore era
Growth in the Multicore Era
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
5 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
25 25
FFT FFT 100
Reed-Solomon Decoding
Reed-Solomon Decoding 2400
Viterbi Decoding Viterbi Decoding 400
1000
MAC MAC
DSP and wireless
molecular dynamics simulation
molecular dynamics simulation
88
BLAST BLAST 52
protein identification protein identification 40
Smith-Waterman pattern matching Smith-Waterman pattern matching
288
Bioinformatics
GRAPE GRAPE
20 20 Astrophysics Astrophysics
SPIHT wavelet-based image compression
SPIHT wavelet-based image compression 457
real-time face detection
real-time face detection
6000 6000
video-rate stereo vision
video-rate stereo vision
900 pattern
recognition pattern
recognition 730
Image processing, Pattern matching, Multimedia
3000 CT imaging CT imaging
100
103
106
Spe
edup
-Fac
tor
Speed-up
factors
obtained
by Software
to Configware
migration
vs. GPU: almost 50x (up to ~30,000x) ~50x
(200x)
(200x)
CUDA ZONE
Garland IPDPS‘09
8723
DNA & protein sequencing
crypto crypto 1000
28514 DES breaking DES breaking
by FPGA:
intel supports direct front side bus access by FPGAs
“... design techniques will evolve, by necessity, to satisfy the demands of reconfigurable hardware and software programmability”. J. R. Rattner, DAC 2008
2 orders of magnitude
http://hartenstein.de
© 2009, [email protected]
26 26
FFT FFT 100
Reed-Solomon Decoding
Reed-Solomon Decoding 2400
Viterbi Decoding Viterbi Decoding 400
1000
MAC MAC
DSP and wireless
molecular dynamics simulation
molecular dynamics simulation
88
BLAST BLAST 52
protein identification protein identification 40
Smith-Waterman pattern matching Smith-Waterman pattern matching
288
Bioinformatics
GRAPE GRAPE
20 20 Astrophysics Astrophysics
SPIHT wavelet-based image compression
SPIHT wavelet-based image compression 457
real-time face detection
real-time face detection
6000 6000
video-rate stereo vision
video-rate stereo vision
900 pattern
recognition pattern
recognition 730
Image processing, Pattern matching, Multimedia
3000 CT imaging CT imaging crypto crypto
1000
28514 DES breaking DES breaking
8723
DNA & protein sequencing
1985 1990 100
103
106
Spe
edup
-Fac
tor
*) TU Kaiserslautern 100
103
106 Speedup-Factor
+ Pre-FPGA solutions
2000
39.4
160
15000
2-D FIR filter (no FPGA: DPLA by TU-KL*) 2-D FIR filter (no FPGA: DPLA by TU-KL*)
Lee Routing (DPLA by TU-KL*) Lee Routing (DPLA by TU-KL*)
Grid-based DRC: no FPGA: DPLA on MoM by TU-KL*
Grid-based DRC: no FPGA: DPLA on MoM by TU-KL*
Grid-based DRC* („fair comparizon“) Grid-based DRC* („fair comparizon“)
fabricated by E.I.S. Multi University
Project
1984: 1 DPLA replaced 256
FPGAs
http://hartenstein.de
© 2009, [email protected]
27
Software vs. FPGA
96 98 00 02 04 06 08
105
104
103
102
101
100 year
10
relative performance
1990 1995 2000
200
100
0
50
150
75
25
125
175
SP
EC
fp2000/M
Hz/B
illion
Tra
nsis
tors
HP
[BWRC, UC Berkeley, 2004]
0.5
x M
OP
S/M
Hz/B
illion
Tra
nsis
tors
420
1996
46
Benchmarks: Moore‘s law does not indicate microprocessor MIPS
?
!
http://hartenstein.de
© 2009, [email protected]
28
Moore’s law not applicable to all aspects
For multicore*: the Law of More …
…with drastically declining programmer productivity
*) number of cores doubles every 2 years
http://hartenstein.de
© 2009, [email protected]
29
Massive
Energy Saving
factors: ~10%
of speedup factor
29
FFT FFT 100
Reed-Solomon Decoding
Reed-Solomon Decoding 2400
Viterbi Decoding Viterbi Decoding 400
1000
MAC MAC
DSP and wireless
molecular dynamics simulation
molecular dynamics simulation
88
BLAST BLAST 52
protein identification protein identification 40
Smith-Waterman pattern matching Smith-Waterman pattern matching
288
Bioinformatics
GRAPE GRAPE
20 20 Astrophysics Astrophysics
SPIHT wavelet-based image compression
SPIHT wavelet-based image compression 457
real-time face detection
real-time face detection
6000 6000
video-rate stereo vision
video-rate stereo vision
900 pattern
recognition pattern
recognition 730
Image processing, Pattern matching, Multimedia
3000 CT imaging CT imaging crypto crypto
1000
28514 DES breaking DES breaking
100
103
106
Spe
edup
-Fac
tor
http://hartenstein.de
© 2009, [email protected]
8723
DNA & protein sequencing
Software
vs. FPGA
(2)
http://hartenstein.de
© 2009, [email protected]
30 30
[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]
Application Speed-up factor
Savings factors
Power Cost Size
DNA and Protein sequencing 8723 779 22 253
DES breaking 28514 3439 96 1116
much less equipment
needed
much less memory and bandwidth needed massively saving energy
RC*: Demonstrating the intensive Impact
SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster
Tarek El-Ghazawi
*) RC = Reconfigurable Computing
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
6 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
31
Why such Speed-up Factors ...
... with FPGAs: a much worse technology ! massive wiring overhead
+ routing congestion growing with FPGA size
+ massive reconfigurability overhead
main reason: no von Neumann Syndrome!
more recently also: more „platform FPGAs“
The „Reconfigurable Computing Paradox“
http://hartenstein.de
© 2009, [email protected]
32
RC versus Multicore
RC: speed-up often higher by orders of magnitude
RC: energy-efficiency often higher: very much, or, by orders of magnitude ?
Sure !
Sure !
We need both: Multicore and RC
this is the
silver bullet Multicore:
legacy software, control-intensive applications, etc.
„RC“ = Reconfigurable Computing
http://hartenstein.de
© 2009, [email protected]
33 33
year
relative performance
94 96 98 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30
en
d o
f th
e
sin
gle
co
re
e
ra
33 Reconfigurable Computing is indispensable!
For a Booming Multicore Era
von-Neumann-only is not the silver bullet http://hartenstein.de
© 2009, [email protected]
34
Outline (5)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
35
CPU-centric
flat world
sequential-only mind set –
(Aristotelian model)
typical programmer qualification:
This Software-centric world model is obsolete
CPU-“centric“ but no hardware know-how CPU-“centric“ but no hardware know-how
(kind of tunnel view)
http://hartenstein.de
© 2009, [email protected]
36
Machine Model of the Mainframe Era
Machine model
resources sequencer
property programming
source property programming source state register
CPU hardwired - programmable Software (instruction streams)
program counter
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
7 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
37
40 years Software Crisis
Nathan’s Law: Software is a gas. It expands to fill its container ...
Nathan Myhrvold
… until being limited by Moore’s Law [& Kryder’s Law]
Wirth‘s Law
“software is slowing faster than hardware is accelerating“
F. L. Bauer
Oct 1957 The Economist: Nov 19th 1955
In 1955, Parkinson could not have foreseen the impact of software. formula: bureaucracy growth independent of actual work to be done
[Niklaus Wirth]
[Cyril Northcote Parkinson]
Software critics is not new:
F. L. Bauer 1968, coined the term „Software Crisis“
N. N. 1995: THE STANDISH GROUP REPORT
Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
Anthony Berglas 2008: Why it is Important that Software Projects Fail L. Savain 2006: Why Software is bad
Peter G. Neumann 1985-2003:
216x “Inside Risks“(18 years inside back cover of Comm_ACM)
http://hartenstein.de
© 2009, [email protected]
38
“Software”
overhead piles up to code sizes of astronomic dimensions
The von Neumann
Syndrome:
stands for extremely memory-cycle-hungry instruction streams
stolen from Bob Colwell
memory wall,
caches, ...
from earlier talks: from earlier talks:
datastream parallelism
instruction stream parallelism
data meet PU
C.V. Ramamoorthy
“The Memory Wall”
coined by Sally McKee (& co-author)
Patterson’s Law:
Dave Patterson
bandwidth gap grows 50% / year
has reached >1000x
the uglyness of this term
http://hartenstein.de
© 2009, [email protected]
39
Machine Model of the PC Era
Machine model
resources sequencer
property programming
source property programming source state register
ASIC accelerator hardwired - hardwired -
CPU hardwired - programmable Software (instruction streams)
program counter
Application-Specific Integrated Circuit & other accelerators: e.g. graphics processor
wagging the dog“
“the tail is
ISIS 1997 Austin, TX [ ]
http://hartenstein.de
© 2009, [email protected]
40
Science does not progress continuously,
Thomas S. Kuhn 1969: The Structure of Scientific Revolutions
…in which the established paradigm is overthrown and replaced. .
? ? ?
Thomas S. Kuhn
The von Neumann paradigm?
… shortcomings in an established paradigm produces a crisis
that may lead to a revolution
http://hartenstein.de
© 2009, [email protected]
41
Outline (6)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
42
From CPU to RPU
machine model
right now
resources sequencer
property programming
source property programming source
state register
ASIC accelerator hardwired - hardwired -
CPU hardwired - programmable Software
(instruction streams)
program counter
RPU accelerator
programmable Configware (configuration
code) programmable Flowware
(data streams) data
counters
we need 2 more program sources
Reconfigurable
Processing Unit
non-von-Neumann-
now accelerators are programmable!
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
8 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
43
[Thomas S. Kuhn 1969: The Structure of Scientific Revolutions]
“… in which the established paradigm is overthrown and replaced.”
However, not the von Neumann paradigm will be overthrown and replaced.
The CPU-centric world model of Software Engineering will be replaced by removing the tunnel view perspective
Thomas Kuhn is right !
What Revolution?
http://hartenstein.de
© 2009, [email protected]
44
RC* outside a
CPU-centric
flat world?
For the
Multicore era
we need
a new model
(Copernican)
For the
Multicore era
we need
a new model
(Copernican)
*) RC = Reconfigurable Computing
http://hartenstein.de
© 2009, [email protected]
45
Program Program Performance
„Multicore computers shift the burden of software performance from chip designers to programmers.“
we anyway need a Software Education Revolution ... Since People have to write code differently,
[J. Larus: Spending Moore's Dividend; C_ACM, May 2009]
... performance drops & other problems in moving single-core to multicore ...
... the chance to move RC* from niche to mainstream
a scenario like before the Mead-&-Conway revolution Missing programmer population and methodology:
*) RC = Reconfigurable Computing
Embedded syst. & hdw scene have the right background to reform the parallelism education of SW programmers
http://hartenstein.de
© 2009, [email protected]
46
A Heliocentric CS Model
FE
Flowware
Engineering
PE
Program Engineering
The Generalization of Software Engineering —
A Twin Paradigm Dual Dichotomy Approach.
time to space mapping
issue
SE
Software
Engineering
RPU RPU
special
*) do not confuse with „dataflow“!
http://hartenstein.de
© 2009, [email protected]
47
A Multicore Submarine Model?
C is not the silver bullet: it’s inherently serial
mapping parallelism just into the time domain: “abstracting” away the space domain is fatal
But nobody wants to learn a new language. There is no easy way to program in parallel
The programmer needs to understand how data flows through cores, accelerators, interconnect and peripherals
The programmer* needs system visualization in the space domain, to understand performance under parallelism
The datastream model of the twin-paradigm approach helps to understand the space domain and parallelism
*) and, especially the student
http://hartenstein.de
© 2009, [email protected]
48
Our Contemporary Computer Machine Model
Machine model
resources sequencer
property programming
source property programming source state register
ASIC accelerator hardwired - hardwired -
CPU hardwired - programmable Software (instruction streams)
program counter
RPU accelerator programmable
Configware (configuration
code) programmable
Flowware (data
streams)
data counters
twin Paradigm Dichotomy
in CPU
in RAM
data counters of reconfigurable address generators in asM (auto-sequencing) data memory blocks
the same language primitives!
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
9 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
49
Time to Space Mapping
Machine model
resources sequencer
property programming
source property programming source state register
ASIC accelerator hardwired - hardwired -
CPU hardwired - programmable Software (instruction streams)
program counter
RPU accelerator programmable
Configware (configuration
code) programmable
Flowware (data
streams)
data counters
Relativity Dichotomy
„The biggest payoff will come from Putting Old ideas into Practice and teaching people how to apply them properly.“ D
avid
Par
nas loop turns
2 pipeline C C
1967
http://hartenstein.de
© 2009, [email protected]
50
How to achieve acceptance
Hardware description languages hidden
Courses tailored for students not being hardware-savvy
Tools usable by users not being hardware designers
[Courtesy Richard Newton]
„How to hide the ugliness from the user“ [Herman Schmit]
http://hartenstein.de
© 2009, [email protected]
51
traditional qualification in the time domain
51
Software Education (R)evolution:
+ lean qualification in the space domain
= lean hardware modeling qualification at a higher level of abstraction
by simultaneous dual domain co-education:
viable methodology for dual rail education (only a few % curricula need to be changed)
step by step, not overthrowing the SE scene
http://hartenstein.de
© 2009, [email protected]
52
We need a Software Education Revolution
2010 - ....
52
The incubator of the free ride on Cowhey‘s & Aronson‘s law massive
funding
required partially re-write the code
Create the missing programmer population
next most effective project in the history of modern computer science
DOS to Windows took 10 years
http://hartenstein.de
© 2009, [email protected]
53
Community Building Function
of the DATE Friday Workshop Friday, March 12, 2010, 08:30 – 17:00
Friday Workshop
Software Education Revolution
for using Multicore - and RC* (SERUM-RC*)
http://www.date-conference.com DATE-Conference, Dresden, DE:
CfP: http://fpl.org/cfp/ 53
*) Reconfigurable Computing
http://hartenstein.de
© 2009, [email protected]
54
RAW 2010
17th Reconfigurable Architectures Workshop
April 19-20, 2010, Atlanta (Georgia), USA
http://www.ece.lsu.edu/vaidy/raw/
Run-Time Reconfiguration & Adaptive Computing:
Architectures, Algorithms, Technologies
http://www.ipdps.org/
24th IEEE International Parallel and Distributed Processing Symposium
April 19-23, 2010, Atlanta (Georgia) USA
in conjunction with:
Manuscript due: October 18, 2009 Notification of acceptance: December 14, 2009 Camera-ready Papers Due: February 1, 2010
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
10 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
55
Outline (7)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
http://hartenstein.de
© 2009, [email protected]
56
To maintain a Booming Multicore Era:
Not without Reconfigurable Computing!
Conclusions (1)
04 06 08 10 12 14 16 18 20 22 24 26 28 30
year
relative performance
56
possible for 2 or 3 more decades?
th
e e
nd
o
f th
e
sin
gle
co
re
e
ra
http://hartenstein.de
© 2009, [email protected]
57
additional Flowware / Configware skills are essential qualifications for programmers.
Mead-&-Conway-style SE Revolution toward dual-rail education is urgently needed
key motivation: performance and energy consumption of programs
we need to master hetero of all 3: Singlecore, Multicore, & Reconfigurable Computing
massive long term
R&D funding required
like known from DARPA A main problem: selecting (or creating) tools for lab courses
SERUM-RC
the key issue: ease of use!
Conclusions
http://hartenstein.de
© 2009, [email protected]
58
We need „une' Levée en Masses“
58
We need „une'
Levée en Masses“
We need „une'
Levée en Masses“
http://hartenstein.de
© 2009, [email protected]
59
thank you for your
patience
59
http://hartenstein.de
© 2009, [email protected]
60
END
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
11 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
61
Credited to be „The father of Reconfigurable Computing“ (also pre-FPGA era) [1],
EU grant (80ies), 85 mio ECU (pre-€): complete EDA framework [4,5] around KARL
1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)
1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution: the multi university „E.I.S. project“ (gov. grant: 38 million Deutschmark)
IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards
Professor (ordinarius emeritus), TU Kaiserslautern
CV of Reiner Hartenstein
All acad. degrees from KIT Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)
Creator of KARL[2], most successful [3] trailblazer HDL before VHDL came up
[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)
[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html
[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator, embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [4]
[2] R. Hartenstein: Fundamentals of Structured Hardware Design; American Elsevier, 1977 -- Bestseller
Founder / co-founder of several international annual conference series
61
1977 & later used as a
textbook at UC Berkeley
(not only here)
KARL: a Pascalish hardware language
[3] for users, usage details, quotations,etc.see: http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html
his hobby: giving keynotes
http://hartenstein.de/keynotes.htm
http://hartenstein.de
© 2009, [email protected]
62
backup for discussion
62
http://hartenstein.de
© 2009, [email protected]
63
Double Dichotomy
2) Relativity Dichotomy
-Procedure time:
(Software-Domain)
-Structure space:
(Configware-Domain)
1) Paradigm Dichotomy
instruction stream von Neumann Machine
(Software-Domain) data stream Datastream Machine
(Flowware-Domain)
63
http://hartenstein.de
© 2009, [email protected]
64
Relativity Dichotomy
time domain: space domain:
procedure domain structure domain
2 phases: 1) programming
instruction streams 2) run time
3 phases: 1) reconfiguration
of structures
time space
2) programming data streams
3) run time
64
von Neumann Machine Datastream Machine
(time time/space)
http://hartenstein.de
© 2009, [email protected]
65
time-iterative to space-iterative
65
a time to
space/time
mapping
loop transformation methodogy: 70ies and later
n*k time steps, 1 CPU
n time steps, k DPUs
Often the space dimension is limited n time steps, 1 CPU
1 time step, n DPUs
a time to
space
mapping
e. g. example: bubble sort migration Strip mining
[D. Loveman, J-ACM, 1977]
http://hartenstein.de
© 2009, [email protected]
66
time to space mapping
time domain: space domain:
procedure domain structure domain
66
program loop n time steps, 1 CPU
pipeline 1 time step, n DPUs
Bubble Sort n x k time steps,
1 „conditional swap“ unit
Shuffle Sort k time steps, n „conditional swap“ units
time algorithm space algorithm
conditional swap
x
y
conditional swap
conditional swap
conditional swap
conditional swap
time algorithm
space/time algorithm s
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
12 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
67
1
2
3
4
5
6
7
8 y
x
1 2 3 4 5 6 7 8
JPEG zigzag scan pattern
EastScan is step by [1,0] end EastScan;
SouthScan is step by [0,1] endSouthScan;
*> Declarations
NorthEastScan is loop 6 times until [*,1] step by [1,-1] endloop end NorthEastScan;
SouthWestScan is loop 7 times until [1,*] step by [-1,1] endloop end SouthWestScan;
HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;
goto PixMap[1,1]
HalfZigZag; SouthWestScan uturn (HalfZigZag)
HalfZigZag
data counter data counter
data counter data counter
2
1
3
4
HalfZigZag
Main program:
Flowware language example (MoPL): programming the datastream
x
y
67 (an animation)
http://hartenstein.de
© 2009, [email protected]
68
Programming model: Flowware
Adder
Speaker
FMDemod
LPF1
Split
Gather
LPF2 LPF3
HPF1 HPF2 HPF3
Source:
MIT
StreamIT
• Pros for streaming – Streamlined, low-overhead
communication – (More) deterministic behaviour – Good match for many simple media
rich applications
[Pierre Paulin]
We‘ve to find out, which applications types and programming models Students should exercise for the flowware approach
• Cons – control-dominated applications – shunt yard problem
http://hartenstein.de
© 2009, [email protected]
69
Flowware
from a generalization of the systolic arrays
supports any wild free form of pipe networks:
spiral, zigzag, fork and join, and even more wild,
unidirectional and fully or partially bidirectional,
Flowware: scheduling data streams -
Fifos, stacks, registers, register files, RAM blocks...
Flowware means parallelism resulting from time to space migration
http://hartenstein.de
© 2009, [email protected]
70
Ways to implement an Algorithm
• Hardware
• Software
• Configware
• mixed
von Neumann-machine
datastream machine
multicore
. manycore per se
singlecore
manycore
RAM-based
http://hartenstein.de
© 2009, [email protected]
71
Acceleration Mechanisms
•parallelism by multi bank memory architecture •auxiliary hardware for address calculation •address calculation before run time •avoiding multiple accesses to the same data. •avoiding memory cycles for address computation •optimization by storage scheme transformations •optimization by memory architecture transformations
http://hartenstein.de
© 2009, [email protected]
72
New boundary constraints are the limiting factor
7272 27 October 200827 October 2008 Software 2008, ZurichSoftware 2008, Zurich
Legacy scientific applications: predominantly sequential
The entire software ecosystem will need to evolve (including curricula): O/S, libraries, software development environments, compilers and languages
additional levels of parallelism: chaining, pipelining, systolic, super-systolic, wavefront arrays
additional data structures and storage organization: the new distributed memory discipline
New boundary constraints
21 May 2013
Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden
13 Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de/RH-bio.pdf
http://hartenstein.de
© 2009, [email protected]
73
old Paradigms and Methodologies
1946: Machine Paradigm (von Neumann)
1980: Datastreams (Kung, Leiserson)
1989: Anti Machine** Paradigm (TU-KL)
1990: first rDPA* (Rabaey)
1994: higher Anti Machine** Programming Language (Flowware: TU-KL)
1995: super systolic array: rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: Discipline of Distributed Memory Architectures (IMEC …)
1997: first automatically partitioning Configware/Software Co-Compiler (TU-KL)
*) rDPA = reconfigurable Data Path Array
**) datastream machine (flowware machine): no „dataflow machine“!!
http://hartenstein.de
© 2009, [email protected]
74
http://hartenstein.de
© 2009, [email protected]
Teaching computing fundamentals
Ignoring Reconfigurable Computing in teaching computing fundamentals within our CS curricula
causing to waste billions of dollars.
is one of the biggest mistakes in the history of information technology application