CAPES / DFG Project Universidade do Brasilia

Universitaet Kaiserslautern, Universitaet Karlsruhe

Reiner Hartenstein*

University of Kaiserslautern

November 14, 2003, Brasilia, Brazil

Present and Future of Reconfigurable Systems

*) IEEE fellow

© 2003, http://hartenstein.de

University of Kaiserslautern

Xputer Lab

Literature (also downloads)

http://hartenstein.de

also click „recent talks“

this page also links to available Ph.D. theses:

Becker, Herz, Kress, Nageldinger, ...



Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

The opportunity to introduce the structural domain to programmers ...

The structural domain has become RAM-based

... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm



IT ages

[Figure: timeline from 1957 to 2007 in steps of 10 years, showing the mainframe age, the computer age (PC age) and the morphware age (data streams, flowware), with the marker „here?“]

von Neumann does not support morphware



>> outline <<

• fine grain reconfigurable
• Placement and routing
• coarse grain reconfigurable
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks

http://www.uni-kl.de



fine grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic

just logic design on a strange platform ?

speed-up of up to 3 orders of magnitude



NRE and mask cost

[Figure: mask set cost in millions of $ (axis 1 to 4) versus feature size (0.8, 0.6, 0.35, 0.25, 0.18, 0.15, 0.13, 0.1, 0.07) and no. of masks (12, 12, 16, 20, 26, 28, 30, >30). Sources: eASIC, Dataquest]

[Figure: market shares: PC 25%, communication 22%, others 31%, automotive 6%, consumer 16%]

Top 4 PLD Manufacturers 2000 (total: $3.7 billion): Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%

• [Dataquest] > $7 billion by 2003.

• FPGAs going into every type of application – also SoC

• fastest growing segment of semiconductor market

you don't need specific silicon !

rGAs



rGA with island architecture (detail)

[Figure labels: switch, connect]



switch box

• reconfigurable

[Figure: switch box with switch points]



connect box

• reconfigurable

[Figure: connect box with connect points]

part of configuration memory



connect point (enlarged)

• reconfigurable

[Figure: illustration of a connect point at a reconfigurable logic box]



connection activated

The wiring for function selection of the rLB is not shown.

[Figure: illustration of a reconfigurable logic box]



connect point activated

• Routing



3 switch points activated

the 4th switch point

the 5th switch point

• Routing

[Figure: switch box with switch points]



Routing continued

• Routing



• Routing

Routing completed for 1 net

[Figure: net routed from A to B]

Placement and routing software has been known for 25 years.

Such network problems can be handled manually or with the help of graph theory.

1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P

20 Transistors + 20 Flipflops



Routing: long distance nets

[Figure: region with nets A and B]

Passing through: long distance wiring from rLBs outside this region

A path can be used only once at a time .....



routing congestion

[Figure: nets A, B, C and D]

C and D are not reachable.

A bridge can be passed only once (bridges of Königsberg)

C cannot be connected with D.
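The congestion rule on this slide (a path can be used only once at a time) can be illustrated with a small maze router. This is a minimal sketch, not the routing algorithm of any particular rGA tool: the grid model, the `route` helper and the net coordinates are hypothetical. Each net is routed by breadth-first search and then blocks its cells for all later nets.

```python
from collections import deque

def route(grid_w, grid_h, blocked, src, dst):
    """Breadth-first search for a shortest path from src to dst that avoids blocked cells.

    Returns the path as a list of (x, y) cells, or None if dst cannot be reached.
    """
    if src in blocked or dst in blocked:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < grid_w and 0 <= ny < grid_h
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None

# Hypothetical 3 x 3 routing region. Net A-B is routed first, across the middle row.
used = set()
path_ab = route(3, 3, used, (0, 1), (2, 1))
used.update(path_ab)                     # a routing resource carries only one net

# Net C-D would have to cross the middle row, which is now fully occupied.
path_cd = route(3, 3, used, (1, 0), (1, 2))
```

In this toy region the second call returns `None`: the first net has consumed every crossing, which is the situation the slide describes for C and D. A real router would rip up and reroute, or use a route-through detour where one exists.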








Leonhard Euler

Euler's problem of the bridges of Königsberg (1736) is such a network problem:

Find a path that crosses each bridge exactly once .....

... also an optimization: none of the bridges remains unused.
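Euler's answer rests on counting how many bridges touch each land mass. The sketch below checks that parity condition for the seven historical bridges; the function names are hypothetical, and the check assumes a connected multigraph.

```python
from collections import Counter

# The seven bridges of Königsberg as edges of a multigraph.
bridges = [
    ("Left Bank", "Kneiphof Island"), ("Left Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"), ("Right Bank", "Kneiphof Island"),
    ("Left Bank", "Other Island"), ("Right Bank", "Other Island"),
    ("Kneiphof Island", "Other Island"),
]

def odd_degree_nodes(edges):
    """Return the nodes that are touched by an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def has_euler_path(edges):
    """In a connected multigraph, a walk using every edge exactly once
    exists if and only if zero or two nodes have odd degree."""
    return len(odd_degree_nodes(edges)) in (0, 2)

print(has_euler_path(bridges))        # all four land masses have odd degree
print(has_euler_path(bridges[:-1]))   # same city with one bridge removed
```

With all seven bridges every node has odd degree, so no such walk exists; removing a single bridge leaves exactly two odd nodes and a walk becomes possible.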




L. Euler: Solutio problematis ad geometriam situs pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140

[Figure: the bridges of Königsberg as a graph of nodes and edges; nodes: Left Bank, Right Bank, Kneiphof Island, Other Island]



Data structures for Graphs

[Figure: a directed graph and an undirected graph on the nodes 1 to 4, each shown with its adjacency matrix (from / to) and its adjacency list]

J. E. Hopcroft, R. E. Tarjan: Efficient algorithms for graph manipulation; Comm. ACM, 1973
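The two data structures named on this slide can be sketched in a few lines. The edge set below is a hypothetical example on four nodes, not a reconstruction of the slide's figure; the helper names are likewise assumptions.

```python
def to_adjacency_matrix(n, edges, directed=True):
    """Build an n x n 0/1 matrix; row = 'from' node, column = 'to' node (nodes numbered from 1)."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src - 1][dst - 1] = 1
        if not directed:
            matrix[dst - 1][src - 1] = 1   # an undirected edge is stored in both directions
    return matrix

def to_adjacency_list(matrix):
    """Convert the matrix into {node: [successor nodes]}."""
    return {i + 1: [j + 1 for j, bit in enumerate(row) if bit]
            for i, row in enumerate(matrix)}

edges = [(1, 2), (2, 3), (2, 4), (3, 1)]   # hypothetical edge set
directed = to_adjacency_matrix(4, edges, directed=True)
undirected = to_adjacency_matrix(4, edges, directed=False)

print(to_adjacency_list(directed))
print(to_adjacency_list(undirected))
```

The matrix costs n x n entries regardless of the number of edges, while the list grows with the edges only. For sparse netlists, as in placement and routing, that is the usual reason to prefer the list form.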



Large Scale Routing

ENIAC, completed 1945

• Partitioning over racks in the hall
• Partitioning over card cages in the rack
• Partitioning over boards (cards) in card cages
• Partitioning over chips etc. on the card (e. g. SBC)
• Partitioning over blocks on the chip (e. g. microprocessor)



PCBs (printed circuit boards)

for 40 years

MULTEC at Böblingen has produced printed circuit boards since 1963

planar „wiring“

no. of pins is limited



Integrated Circuit (Chip)

limited number of pins

„wiring“ on a planar surface



hierarchy

rack, card cage, card, chip, macro cell, basic cell, more levels

[Figure: hierarchy diagram with blocks labelled Kaiserslautern 1, KL2, KL3, KL4, FTI1, FTI2, JWGU, IMS1, IMS2, IMS3, IMS]



wiring hierarchy

• cables in the rack connect the card cages
• card cage wiring connects the cards
• card wiring connects the chips
• on-chip wiring connects the cells (macro cell, cell)

*) 1930s: telephone switching (without chips; crossbar / two-motion selectors instead of cards); 1940s: first computers (without chips)



An obsolete Application Area



before fabrication ?

after fabrication ?



Emulators

[Figure: emulation products: Celaro Pro (Mentor), Quickturn, Dini Group, PCI bus extender (Dini Group)]



Crossbar

full crossbar: n crossbar chips in a row, no. of crossbar chips n x n/2 (n = 8: 32 chips; n = 100: 5000 chips)

partial crossbar: n crossbar chips in a row, no. of crossbar chips n (n = 8: 8 chips; n = 100: 100 chips)

[Figure labels: 324 x 4, n=8, 64, 64, 14, 32]



Routing

14 Logic Chips (Lchip) with 128 pins (occasionally for route-through)

32 Crossbar Chips (Xchip) with 72 I/O pins (for route-through only)

each Xchip: 4 pins connected to each Lchip

8 Logic cards per card cage

8 Ychip cards per card cage

8 card cages per rack

Backplane: 8 Zboard cards per rack

[Figure labels: logic card, card cage, rack]



Crossbar ?

1913 J. N. Reynolds's crossbar switch

1915 patent granted

1919 Betulander's crossbar switch

1926 first public telephone switching application in Sweden

1964 NASA telemetrics crossbar array



RWC Real World Computing, Japan, 40 TFLOPS

Crossbar weight: 220 tons, 3000 km cable, 5120 processors with 5000 pins each



Routing Congestion

Example

direct connection impossible

route-through: detour connection

[Figure: 2 x 4 array of rGAs]



Routing-only configuration (2 examples)

rLB: identity function configured

• Routing



Graphs, Partitioning, Algorithms

T. Uehara, W. M. van Cleemput: Optimal Layout of CMOS Functional Arrays; IEEE Trans. C-30, pp. 305-312, May 1981

B. Kernighan, S. Lin: An Efficient Heuristic Procedure for Partitioning Graphs; BSTJ 49, 1970

C. Alpert, A. Kahng: Recent Directions in Netlist Partitioning: A Survey; Integration, vol 19 (1-2), pp. 1-81, 1995

T. Cormen, et al.: Introduction to Algorithms; MIT Press / McGraw-Hill, 1991



why emulators are obsolete

[Figure: system gates per rGA chip (logarithmic axis, 100 to 10 000 000) versus year (1984 to 2004); marked devices: XC 4085XL, XC 40250XV, Virtex, Virtex II, planned. Source: Xilinx data]



why declining ASIC business?

ASIC emulators have been a transient solution: now with declining commercial significance.

More and more, the prototyping platform of rGA-based systems will be delivered directly to the customer as the product: fully configured

ASICs lost the battle. rGAs are the winners

[Figure: number of design starts (0 to 50,000) per year (2001 to 2004); curve label: rGA-based. Source: N. Tredennick, Gilder Technology Report, 2003]

you don't need specific silicon !



Xilinx: full hierarchy on chip

from rack to chip

• Xilinx Virtex-II Pro FPGA Architecture

• FPGA fabric based on Virtex-II architecture

• PowerPC 405 RISC CPU (PPC405) cores

[Figure labels: On-Chip Memory Controller, PowerPC Core, Embedded RAM, RocketIO. Source: Ivo Bolsens, Xilinx]








focusing on coarse grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic

just logic design on a strange platform

• Coarse Grain platforms:

Reconfigurable Computing: not that new – but shaking the fundamentals of CS curricula

an order of magnitude more MIPS/mW than fine grain



why coarse grain

[Figure: MOPS / mW (logarithmic axis, 0.001 to 1000) versus feature size in µm (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); curves: hardwired, rDPAs (reconfigurable computing)*, FPGAs (reconfigurable logic), instruction set processors (DSP, standard microprocessor). Sources: T. Claasen et al.: ISSCC 1999; *) R. Hartenstein: ISIS 1997]

[Figure: flexibility versus throughput for hard-wired, coarse grain, FPGAs and von Neumann]

coarse grain goes far beyond bridging the gap



rDPA (Reconfigurable Datapath Array)

Reconfigurable Interconnect Fabric (RIF)

[Figure: two arrays of rDPUs, one with a separate routing area and one with the RIF laid out over the rDPUs]

RIF laid out over rDPUs: rDPA wired by abutment



CMOS interconnect resources

Foundries offer up to 9 metal layers and up to 3 poly layers

reconfigurable interconnect fabric laid out over the rDPU cell



Commercial rDPAs

XPU family (IP cores): PACT Corp., Munich

XPU128



mapping algorithms efficiently onto rDPA

SNN filter on KressArray

array size: 10 x 16 = 160 rDPUs

Legend: rDPU not used; rDPU used for routing only (route-through only); operator and routing; port location marker; backbus connect

by the way: example of scalability / relocatability by EDA support

„Structured Configware Design“ [R. H.]




badly scalable

Hundreds of rGAs or very large rGAs

Routing congestion growing exponentially

•Routing



Communication Resource Requirements

... often Functional Resources are not the Throughput Bottleneck

In some Application Areas, such as Wireless Communication, Reconfigurable Computing Arrays need extraordinarily rich and powerful Communication Resources

The Solution: Generators for Domain-specific RA Platforms



KressArray Family generic Fabrics: a few examples

• Select Function Repertory

• Select mode, number, width of NNports

• select Nearest Neighbour (NN) Interconnect: an example

• more NNports: rich Route Resources

• route-through only

• route-through and function

Examples of 2nd Level Interconnect: laid out over rDPU cell - no separate routing areas !

[Figure labels: 16, 32, 8, 24, 4, 2, rDPU]

http://kressarray.de



Super Pipe Networks

array | applications | pipeline properties (shape, resources) | mapping | scheduling (data stream formation)

systolic array | regular data dependencies only | linear only, uniform only | linear projection or algebraic synthesis

super-systolic RA** | no restrictions | no restrictions | simulated annealing or P&R algorithm | (e.g. force-directed) scheduling algorithm

The key is mapping, rather than architecture

**) KressArray [ASP-DAC-1995]








Morphware machines vs. hardwired machines

platform | program source running on it

hardware | (not programmable)

morphware, fine grain: rGA (FPGA) | configware

morphware, coarse grain: rDPU, rDPA | configware

machine | program source running on it

reconfigurable data stream processor | flowware & configware

hardwired data stream processor | flowware

instruction stream processor (v. N.) | software

A clear terminology helps a lot



[Figure: a DPA with input data streams entering and output data streams leaving its ports, drawn as skewed sequences of data items over time and port #]

Flowware defines: ... which data item at which time at which port
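The statement that flowware defines which data item arrives at which time at which port can be made concrete as a schedule keyed by (time, port). This is a minimal sketch under an assumption: it uses the skewed input pattern familiar from systolic arrays, where row i is delayed by i time steps. The function names and data items are hypothetical and do not come from any flowware language.

```python
def skewed_input_streams(matrix):
    """Return a schedule {(time, port): item} that feeds row i of `matrix`
    into port i, delayed by i time steps."""
    schedule = {}
    for port, row in enumerate(matrix):
        for j, item in enumerate(row):
            schedule[(port + j, port)] = item
    return schedule

def items_at(schedule, time, ports):
    """What each port receives at one time step; '-' marks an idle slot."""
    return [schedule.get((time, p), "-") for p in range(ports)]

streams = skewed_input_streams([["a0", "a1", "a2"],
                                ["b0", "b1", "b2"],
                                ["c0", "c1", "c2"]])

for t in range(5):
    print(t, items_at(streams, t, ports=3))
```

No instruction is fetched at run time in this picture: the schedule is fixed beforehand, and the data path only has to consume whatever arrives at its ports.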



Paradigm Shifts: Nick Tredennick's view

instruction-stream-based computing:

• resources fixed
• algorithms variable: programmable by Software (instruction-stream)

data-stream-based reconfigurable computing:

• resources variable: programmable by Configware
• algorithms variable: programmable by Flowware (data-stream)

why 2 program sources ?



Flowware heading toward mainstream

• Data-stream-based Computing is heading for mainstream

– 1997 SCCC (LANL) Streams-C Configurable Computing

– SCORE (UCB) Stream Computations Organized for Reconfigurable Execution

– ASPRC (UCB) Adapting Software Pipelining for Reconfigurable Computing

– 2000 Bee (UCB), ...

– Most stream-based multimedia systems, etc.

– Many other areas ....

Flowware ..... mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation

Flowware: managing data streams

Software: managing instruction streams



control-procedural vs. data-procedural

The structural domain is primarily data-stream-based:

Flowware provides a (data-)procedural abstraction of the (data-stream-based) structural domain

Flowware converts „procedural vs. structural“ into „control-procedural vs. data-procedural“ ...

... a Trojan horse to introduce the structural domain to the procedural mind set of programmers








Configware / Flowware Compilation

[Figure labels: high level source, wrapper, rDPA intermediate, mapper (output: configware), scheduler (output: flowware), r. DataPath Array, data streams, asM distributed memory architecture (memory banks M), data sequencer, address generator]

„instruction“ fetch before runtime



extremely high efficiency: flowware-based computing

1. avoiding address computation memory cycle overhead

2. avoiding instruction fetch and interpretation overhead

3. high parallelism, massively multiple deep pipelines

4. much less configuration memory

5. interconnect laid out over the cell: no extra routing areas

6. methodologies readily available



Programming Language Paradigms

language category | Software Languages | Languages f. Anti Machine

sequencing | both deterministic procedural sequencing: traceable, checkpointable

operation sequence driven by | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting; no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching

state register | program counter | data counter(s)

address computation | massive memory cycle overhead | overhead avoided

instruction fetch | memory cycle overhead | overhead avoided

parallel memory bank access | interleaving only | no restrictions

language features | control flow + data manipulation | data streams only (no data manipulation)

flowware languages: very easy to learn, much more simple, much more powerful (multiple GAGs)



Machine Paradigms

machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine

driven by | instruction streams | data streams (no “dataflow”)

engine principles | instruction sequencing | sequencing data streams

state register | single program counter | (multiple) data counter(s)

communication path set-up (“instruction fetch”) | at run time | at load time

data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.

data path operation | sequential | parallel pipe network etc.

also hardwired implementations**

**) e. g. Bee project, Prof. Brodersen








computing paradigms and methodologies

1946: machine paradigm (von Neumann)

1980: data streams (Kung, Leiserson)

1989: anti machine paradigm

1990: 1st rDPU* (Rabaey)

1994: anti machine high level programming language

1995: super systolic rDPA (Kress)

1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

1997+: discipline of distributed memory architecture

1997: 1st configware / software partitioning compiler

[Figure label: flowware]

*) rDPU = reconfigurable Data Path Unit



The Secret of Success: Co-Compilation

[Figure: co-compilation flow. A high level PL source enters the Partitioner, which is supported by an Analyzer / Profiler and by Resource Parameters (supporting different platforms, supporting platform-based design). The SW compiler produces SW code for the „vN“ machine paradigm; the CW compiler produces CW code for the anti machine paradigm]



Machine paradigms

von Neumann instruction stream machine: memory M, I/O and CPU (instruction sequencer + DPU); driven by the instruction stream; program source: Software

data-stream machine (anti machine): asM** (memory M + data address generator / data sequencer), I/O and a DPU or rDPU, or an (r)DPA with distributed memory architecture*; driven by data streams; program source: Flowware (and Configware if reconfigurable)

*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS, 2002

also see books by Francky Catthoor et al.



Synthesizable distributed memory architecture ... for a Stream-based Soft Machine

[Figure labels: Compiler, Scheduler, rDPA („instructions“), Sequencers (data stream generator), Memory (data memory) made of memory banks]



PC replaced by PS

[Figure: timeline from 1957 to 2007 in steps of 10 years, showing the mainframe age, the computer age (PC age) and the morphware age (data streams, flowware)]

PC replaced by PS (personal supercomputer)

[Figure labels: co-compiler, µProc (von Neumann), rDPA (anti machine)]



all methodologies available

[Figure: timeline from 1957 to 2007 in steps of 10 years, showing the morphware age (data streams, flowware)]

free know-how for personal super computer

[Figure labels: co-compiler, µProc, rDPA]

.... and all other methodologies available from literature



We have an education problem

... we need a second machine paradigm

The typical programmer has difficulty understanding function evaluation without machine mechanisms....

Traditional CS: programming is (control-)procedural, instruction-stream-based – sources: software

[Figure labels: µprocessor, accelerators]

It's the gap between procedural and structural mind set

Crossing the Hardware / Software Chasm [Mike Butts]



Ubiquitous Embedded Systems

... and the main focus in system design

embedded software and configware became the main vehicle to product differentiation ...

(Performance and) Flexibility are key issues

current CS curricula do not qualify our students



misqualified: jobless CS graduates ?

[Figure: growth factor (1 to 2) versus months (0, 10, 12, 18): embedded software [DTI* law] compared with [Moore's law] (1.4/year)]

*) Department of Trade and Industry, London

90% of all code written for embedded systems

The real labor market: 10 times more programmers will write embedded applications than computer software by 2010








EDA Industry Revolution every 7 years

1978: Transistor entry: Applicon, Calma, CV ...

1985: Schematics entry: Daisy, Mentor, Valid ...

1992: Synthesis (HDLs): Cadence, Synopsys ...

1999

EDA industry paradigm switching every 7 years

McKinsey Curves [Keutzer / Newton]



EDA the main bottleneck

[Figure courtesy of Richard Newton]

math formula ? TRS ?



Biggest Mistake of EDA

guess it !



The next EDA Industry Revolution

1978: Transistor entry: Applicon, Calma, CV ...

1985: Schematics entry: Daisy, Mentor, Valid ...

1992: Synthesis (HDLs): Cadence, Synopsys ...

1999

(Co-) Compilation: data-stream-based DPAs

EDA industry paradigm switching every 7 years

McKinsey Curves [Keutzer / Newton]

Von Neumann does not support Morphware

higher abstraction level: System-C; math formula: TRS*

*) Term Rewriting Systems



Algorithmic cleverness needed

Example - migration from signal processor to rGA: very high throughput on slow, low-power FPGAs is obtained only by algorithmic cleverness:

loop transformations ....

We need an all-embracing taxonomy of algorithms and a survey of algorithm transformations ....

optimization, partitioning, signal processing, (de-)coding algorithms (wireless communication), image processing, sorting, .... and many more areas .....



algorithmic cleverness needed for CS graduates in embedded systems

the hardware / configware / software partitioning problem: current CS curricula do not qualify our students

software / configware migration: current CS curricula do not qualify our students

extending software engineering into software / flowware engineering: the anti machine paradigm and reconfigurable computing are the curricular enablers



thank you

- END -



Appendix for discussion



Processor Memory Performance Gap

[Figure: performance (logarithmic axis, 1 to 1000) versus year (1980 to 2000); curves: CPU (µProc, 60% / yr) and DRAM (7% / yr)]

Processor-Memory Performance Gap: grows 50% / year
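The 50% per year figure follows from the two growth rates quoted on the chart. A short worked check, assuming both rates compound annually (variable names are illustrative only):

```python
cpu_growth = 1.60    # µProc performance: +60% per year (from the slide)
dram_growth = 1.07   # DRAM performance: +7% per year (from the slide)

# The gap is the ratio of the two curves, so its yearly growth is the ratio of the rates.
gap_growth = cpu_growth / dram_growth

def gap_after(years):
    """Relative processor-memory gap after the given number of years (1.0 at the start)."""
    return gap_growth ** years

print(f"gap grows by about {100 * (gap_growth - 1):.0f}% per year")
print(f"after 10 years the gap is about {gap_after(10):.0f}x")
```

Because the gap compounds, a decade at these rates widens it by well over an order of magnitude. This is the background for the claim that caches alone cannot rebalance a von Neumann machine.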



Why a dichotomy of machine paradigms?

vN: unbalanced (CPU, caches, ...: vN bottleneck)

data stream machine:

• bad message: caches do not help

• good message: no vN bottleneck

• caches not needed

The anti machine has no von Neumann bottleneck

stolen from Bob Colwell



„Pollack's Law“ (simplified) [intel]

[Figure: growth factor of performance and of area efficiency versus feature size in µm (down to 0.1)]



Loop Transformation Examples

[Figure: loop transformation examples for resource parameter driven Co-Compilation]

sequential process: loop 1-16 body endloop

loop unrolling: loop 1-8 body body endloop

strip mining: fork; loop 1-8 body endloop in parallel with loop 9-16 body endloop; join

host / reconf. array: loop 1-4 trigger endloop
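The two transformations named on this slide can be written out for a 16-iteration loop. This is a minimal sequential sketch: `body` is a hypothetical stand-in for the loop body, and the fork / join of strip mining is only indicated by comments, since the strips run one after another here.

```python
def body(i, out):
    out.append(i * i)            # hypothetical stand-in for the loop body

def original(n=16):
    out = []
    for i in range(1, n + 1):    # loop 1-16
        body(i, out)
    return out

def strip_mined(n=16, strip=8):
    """Split the iteration space into strips (1-8, 9-16). The strips are
    independent, so they could be forked onto parallel resources."""
    partial = []
    for start in range(1, n + 1, strip):                       # fork: one strip each
        out = []
        for i in range(start, min(start + strip, n + 1)):
            body(i, out)
        partial.append(out)
    return [x for part in partial for x in part]               # join

def unrolled_by_two(n=16):
    """Half as many iterations with two copies of the body (assumes n is even)."""
    out = []
    for i in range(1, n + 1, 2):
        body(i, out)
        body(i + 1, out)
    return out

print(original() == strip_mined() == unrolled_by_two())
```

Which transformation pays off depends on the resources available, for example how many data path units can run a strip in parallel; that dependence is why the slide calls the choice resource parameter driven.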



The design crisis

[Figure: design cost and product life cycle versus year]

The long turnaround times of ASIC fabrication are becoming increasingly unaffordable

Growing demand: fast patches and upgrades, preferably at the customer's site, promoting the longevity of the product



Summary of the Anti Machine Paradigm

• anti language primitives are almost the same (slightly extended)

• anti machine execution potential is dramatically more powerful

• provides drastically more flexibility

• not always replacing von Neumann



Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

Currently running: the next fundamental revolution after introduction of the microprocessor

The structural domain has become RAM-based

However, CS curricula ignore this impact of Reconfigurable Computing – key issue in embedded systems ...

... causing the coming disaster by unqualified CS graduates pushing up the unemployment rate ?



All enabling technologies are available

• anti machine and all its architectural resources

• parallel memory IP cores and generators

• anything else needed

• languages & (co-)compilation techniques

• morphware vendors like PACT ....

• literature from last 30 years



New horizons

• A new RAM-based platform going mainstream
• Configware industry
• New machine paradigm
• New theory needed
• New architectures – without v. N. bottleneck
• New compilation techniques
• More effective parallelism provided
• Rich material is already available in many areas
• Lots of similarities with the classical v.N. world
• But a few asymmetries: a challenge



evangelist's material + lobby space

Evangelist's material:
• http://hartenstein.de – click „recent talks“

Lobby space:
• http://morphware.net
• http://configware.org
• http://data-streams.org
• http://flowware.net

Trailblazer group:
• you are welcome to improve, rewrite, post links ...
• You are welcome to join the trailblazer group



The genius of von Neumann

• enormous impact of the von Neumann paradigm
• even stronger impact by a dichotomy of paradigms:
• von Neumann of matter
• von Neumann of anti matter
• von Neumann machine vs. anti machine
• does not mean toppling v. N.'s monument
• it multiplies the glory of von Neumann



MPU performance stalled

Bill Gates' law: relative computation time needed doubles every 2 years

had been compensated by Moore's law

Moore's law will stall soon for MPUs



Basics of Binding Time

time of “Instruction Fetch”:

• run time: microprocessor, parallel computer
• loading time: Reconfigurable Computing
• compile time



Time to Market

• Morphware brings a new dimension to digital system development and has a strong impact on SoC design.

• Flexibility supports turn-around times of minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades

• A New Business Model (in-field debugging and upgrading ... )

• A Fundamental Paradigm Shift in Silicon Application

[Figure: revenue / month versus time / months (1 to 30): ASIC Product compared with reconfigurable Product with download (Product, Update 1, Update 2). Source: Tom Kean]



KressArray principles

• take systolic array principles

• replace classical synthesis by simulated annealing

• yields the super systolic array

• a generalization of the systolic array

• no longer restricted to regular data dependencies

• now reconfigurability makes sense
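Replacing classical synthesis by simulated annealing means that operators are placed on the array by randomized swaps instead of by a linear projection. The following is a minimal sketch of that idea only, not the KressArray mapper: the cost function (total Manhattan wire length), the cooling schedule, the parameter defaults and all names are assumptions chosen for illustration.

```python
import math
import random

def wire_length(placement, nets):
    """Total Manhattan distance over all two-point nets."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal(ops, nets, width, height, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Place operators on a width x height array; returns (placement, cost).

    Sketch limitations: swaps only exchange placed operators, and
    len(ops) must not exceed width * height.
    """
    rng = random.Random(seed)
    cells = [(x, y) for y in range(height) for x in range(width)]
    rng.shuffle(cells)
    placement = dict(zip(ops, cells))          # random initial placement
    cost = wire_length(placement, nets)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)    # geometric cooling
        a, b = rng.sample(ops, 2)
        placement[a], placement[b] = placement[b], placement[a]  # propose a swap
        new_cost = wire_length(placement, nets)
        # Always accept improvements; accept a worse placement with a
        # probability that shrinks as the temperature drops.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo the swap
    return best, best_cost

ops = list("abcdefghi")                        # nine hypothetical operators
nets = list(zip(ops, ops[1:]))                 # a simple pipeline a-b-c-...-i
best, best_cost = anneal(ops, nets, width=3, height=3)
print(best_cost, best)
```

Nothing in the cost function requires the data dependencies to be regular, which is the point the slide makes about lifting the restriction of classical systolic synthesis.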



Significance of Address Generators

• Address generators have the potential to reduce computation time significantly.

• In a grid-based design rule check, a speed-up of more than 2000 has been achieved compared to a VAX-11/750

• Dedicated address generators contributed a factor of 10 by avoiding the memory cycles otherwise spent on address computation
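What a dedicated address generator does can be shown with a scan over a rectangular window of a row-major array. This is a sketch, assuming a simple two-level scan; it is not a model of the Xputer's address generators, and the function name and parameters are hypothetical. The addresses are produced by additions only, with the pattern parameters fixed before the scan starts.

```python
def scan_addresses(base, width, x0, y0, win_w, win_h):
    """Yield the linear addresses of a win_w x win_h window inside a
    row-major array with `width` elements per row, using only additions."""
    row_start = base + y0 * width + x0     # computed once, before the scan starts
    for _ in range(win_h):
        addr = row_start
        for _ in range(win_w):
            yield addr
            addr += 1                      # step along the row
        row_start += width                 # step to the next row

# A 3 x 2 window at column 2, row 1 of an array that is 8 elements wide.
print(list(scan_addresses(base=0, width=8, x0=2, y0=1, win_w=3, win_h=2)))
```

In hardware the two loops become counters running next to the data path, so no processor instructions or memory cycles are spent on computing the next address.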



Acceleration Mechanisms

• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• improve parallelism by memory architecture transformations
• alleviate interconnect overhead (delay, power and area)



Why coarse grain ?

[Figure: growth from 1960 to 2010 on a logarithmic axis (1 to 100 000 000): algorithmic complexity (Shannon's Law) for the wireless generations 1G, 2G, 3G, 4G, transistors/chip, memory, normalized processor speed (microprocessor / DSP) and battery performance; second axis: computational efficiency in mA / MIP (0.001 to 100), with StrongARM and SH7752 marked. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
