Transport Triggered Architectures used for Embedded Systems. Henk Corporaal, EE department, Delft Univ. of Technology, h.corporaal@et.tudelft.nl. International Symposium on NEW TRENDS IN COMPUTER ARCHITECTURE, Gent, Belgium, December 16, 1999


Transport Triggered Architectures used for Embedded Systems

Henk Corporaal

EE department

Delft Univ. of Technology

h.corporaal@et.tudelft.nl

http://cs.et.tudelft.nl

International Symposium on NEW TRENDS IN COMPUTER ARCHITECTURE

Gent, Belgium

December 16, 1999


Topics

• MOVE project goals
• Architecture spectrum of solutions
• From VLIW to TTA
• Code generation for TTAs
• Mapping applications to processors
• Achievements
• TTA related research


MOVE project goals
• Remove bottlenecks of current ILP processors
• Tools for quick processor and system design; offer expertise in a package
• Application driven design process
• Exploit ILP to its limits (but not further!!)
• Replace hardware complexity with software complexity as far as possible
• Extreme functional flexibility
• Scalable solutions
• Orthogonal concept (combine with SIMD, MIMD, FPGA function units, ...)


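Architecture design spectrum

Four-dimensional architecture design space: I, O, D, S, where the superpipelining degree is $S = \sum_{op} \mathrm{freq}(op) \cdot \mathrm{lt}(op)$ (operation frequency times operation latency, summed over all operations)

Operations/instruction ‘O’

Instructions/cycle ‘I’

Data/operation ‘D’

Superpipelining degree ‘S’

[Figure: the design space spanned by O, I, D and S, with origin (1,1,1,1); VLIW, Superpipelined, RISC, SIMD, Superscalar, Dataflow and CISC are placed in it, and the MOVE design space is marked.]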


Architecture design spectrum

Architecture      I     O     D     S     Mpar
CISC              0.2   1.2   1.1   1     0.26
RISC              1     1     1     1.2   1.2
VLIW              1     10    1     1.2   12
Superscalar       4     1     1     1.2   4.8
Superpipelined    1     1     1     3     3
Vector            0.1   1     64    5     32
SIMD              1     1     128   1.2   154
MIMD              32    1     1     1.2   38
Dataflow          10    1     1     1.2   12

Mpar = I × O × D × S is the amount of parallelism to be exploited by the compiler / application!
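As a quick check on the table, the following C sketch recomputes Mpar as the product of the four factors. It reproduces every row up to rounding. The struct and function names are illustrative and are not part of any MOVE tool.

```c
#include <assert.h>

/* One row of the design-space table. */
typedef struct {
    const char *name;
    double i;   /* instructions per cycle      */
    double o;   /* operations per instruction  */
    double d;   /* data elements per operation */
    double s;   /* superpipelining degree      */
} arch_point;

/* Mpar: the parallelism the compiler/application must expose. */
static double mpar(const arch_point *p)
{
    assert(p->i > 0 && p->o > 0 && p->d > 0 && p->s > 0);
    return p->i * p->o * p->d * p->s;
}

static const arch_point table[] = {
    {"CISC",           0.2,  1.2,   1.1, 1.0},
    {"RISC",           1.0,  1.0,   1.0, 1.2},
    {"VLIW",           1.0, 10.0,   1.0, 1.2},
    {"Superscalar",    4.0,  1.0,   1.0, 1.2},
    {"Superpipelined", 1.0,  1.0,   1.0, 3.0},
    {"Vector",         0.1,  1.0,  64.0, 5.0},
    {"SIMD",           1.0,  1.0, 128.0, 1.2},
    {"MIMD",          32.0,  1.0,   1.0, 1.2},
    {"Dataflow",      10.0,  1.0,   1.0, 1.2},
};
```

SIMD evaluates to 153.6 and MIMD to 38.4. The table rounds these to 154 and 38.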


Architecture design spectrum

Which choice: I, O, D, or S? A few remarks:
• I: instructions/cycle
  Superscalar / dataflow: limited scaling due to complexity
  MIMD: do it yourself
• O: operations/instruction
  VLIW: a good choice if binary compatibility is not an issue; speedup for all types of applications


Architecture design spectrum
• D: data/operation
  SIMD / Vector: the application has to offer this type of parallelism; may be a good choice for multimedia
• S: pipelining degree
  Superpipelined: a cheap solution; however, operation latencies may become dominant and unused delay slots increase

The MOVE project initially concentrates on O and S.


From VLIW to TTA

• VLIW scaling problems: number of ports on the register file; bypass complexity
• Flexibility problems: can we plug in arbitrary functionality?
• TTA: reverse the programming paradigm; template; characteristics


From VLIW to TTA

General organization of a VLIW

[Figure: block diagram with instruction memory, instruction fetch unit, instruction decode unit, FU-1 to FU-5, register file, bypassing network and data memory; the CPU boundary is marked.]


From VLIW to TTA

Strong points of VLIW:
• Scalable (add more FUs)
• Flexible (an FU can be almost anything)

Weak points, with N FUs:
• Bypassing complexity: $O(N^2)$
• Register file complexity: $O(N)$
• Register file size: $O(N^2)$
• Register file design restricts FU flexibility

Solution: mirror the programming paradigm
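The scaling claims can be made concrete with a first-order cost model. This C sketch is a hypothetical model, not taken from the slides: it assumes every FU reads two operands and writes one result per cycle, a single full bypass stage in which every operand input can select any FU result, and register-file cell area proportional to the square of the port count (a common first-order rule). Under these assumptions, doubling N doubles the port count but quadruples both the bypass paths and the register-file area.

```c
#include <assert.h>

typedef struct {
    int  rf_read_ports;
    int  rf_write_ports;
    int  bypass_paths;    /* result-to-operand-input connections     */
    long rf_area_units;   /* relative area, proportional to ports^2  */
} vliw_cost;

/* Hypothetical first-order cost of a VLIW with n_fus function units. */
static vliw_cost vliw_cost_model(int n_fus)
{
    assert(n_fus > 0);
    vliw_cost c;
    c.rf_read_ports  = 2 * n_fus;           /* two operands per FU: O(N)       */
    c.rf_write_ports = n_fus;               /* one result per FU: O(N)         */
    c.bypass_paths   = n_fus * 2 * n_fus;   /* N results x 2N inputs: O(N^2)   */
    long ports = c.rf_read_ports + c.rf_write_ports;
    c.rf_area_units  = ports * ports;       /* (3N)^2: O(N^2)                  */
    return c;
}
```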


Transport Triggered Architecture

General organization of a TTA

[Figure: block diagram with the same labelled components as the VLIW organization: instruction memory, instruction fetch unit, instruction decode unit, FU-1 to FU-5, register file, bypassing network and data memory; the CPU boundary is marked.]


TTA structure; datapath details

[Figure: TTA datapath with integer RF, float RF, boolean RF, instruction unit, immediate unit, two load/store units, two integer ALUs and a float ALU, each attached to the transport buses through sockets.]


TTA characteristics

Hardware
• Modular: Lego play tool generator
• Very flexible and scalable; easy inclusion of Special Function Units (SFUs)
• Low complexity:
  50% reduction in # register ports
  reduced bypass complexity (no associative matching)
  up to 80% reduction in bypass connectivity
  trivial decoding
  reduced register pressure


Register pressure

[Figure: read and write ports required, as a function of ILP degree (1 to 5); separate series for read ports and write ports.]


TTA characteristics

Software

A traditional operation-triggered instruction:

mul r1, r2, r3

A transport-triggered instruction:

r3 -> mul.o, r2 -> mul.t; mul.r -> r1

Extra scheduling optimizations. However: more difficult to schedule!
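The difference between the two instruction styles can be sketched in C. In this hypothetical model (all names are illustrative), an FU has an operand register o, a trigger port t and a result register r. A move into o only latches data, while a move into t starts the operation. Latency is ignored, so the result is available immediately, and the operand-minus-trigger order for sub is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fu_op)(int operand, int trigger);

typedef struct {
    fu_op op;  /* operation performed when the trigger port is written */
    int   o;   /* operand register, latched by "x -> fu.o"              */
    int   r;   /* result register, read by "fu.r -> y"                  */
} tta_fu;

static int op_mul(int operand, int trigger) { return operand * trigger; }
static int op_sub(int operand, int trigger) { return operand - trigger; }

/* "x -> fu.o": only stores the operand; nothing is computed. */
static void move_to_operand(tta_fu *fu, int x) { fu->o = x; }

/* "x -> fu.t": writing the trigger port fires the operation. */
static void move_to_trigger(tta_fu *fu, int x)
{
    assert(fu->op != NULL);
    fu->r = fu->op(fu->o, x);
}

/* "fu.r -> y": transports the result to its destination. */
static int move_from_result(const tta_fu *fu) { return fu->r; }
```

The moves r3 -> mul.o, r2 -> mul.t and mul.r -> r1 correspond to calling move_to_operand, move_to_trigger and move_from_result in that order. The operation itself is a side effect of a data transport, not an opcode.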


Code generation trajectory

[Figure: code generation flow: Application (C) → compiler frontend → sequential code → compiler backend → parallel code; also shown: sequential simulation, parallel simulation, architecture description, profiling data, and input/output for both simulations.]

• Frontend: GCC or SUIF (adapted)


TTA compiler characteristics

• Handles all ANSI C programs
• Region scheduling scope with speculative execution
• Using profiling
• Software pipelining
• Predicated execution (e.g. for stores)
• Multiple register files
• Integrated register allocation and scheduling
• Fully parametric


Code generation for TTAs

TTA specific optimizations:
• common operand elimination
• software bypassing
• dead result move elimination
• scheduling freedom of T, O and R (trigger, operand and result moves)

Our scheduler (compiler backend) exploits these advantages.


TTA specific optimizations

Bypassing can eliminate the need for RF accesses.

Example:
r1 -> add.o, r2 -> add.t;
add.r -> r3;
r3 -> sub.o, r4 -> sub.t;
sub.r -> r5;

Translates into:
r1 -> add.o, r2 -> add.t;
add.r -> sub.o, r4 -> sub.t;
sub.r -> r5;

If r3 is not used afterwards, the write add.r -> r3 becomes a dead result move and is removed as well.
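The rewrite can be sketched as a pass over a straight-line move list. This C sketch is an illustration, not the MOVE scheduler: it ignores cycle timing and assumes the bypassed register is not live after the list. A move fu.r -> rK is folded into its single reader (software bypassing) and then deleted (dead result move elimination), provided rK is not redefined and the FU is not triggered again in between. The register and port naming convention is an assumption.

```c
#include <assert.h>
#include <string.h>

#define MAX_MOVES 32

/* One transport "src -> dst": "r3" is a register; "add.o", "add.t" and
   "add.r" are FU operand, trigger and result ports (assumed naming). */
typedef struct { char src[8]; char dst[8]; } move;

static int is_reg(const char *s) { return s[0] == 'r' && s[1] >= '0' && s[1] <= '9'; }

static int is_result_port(const char *s)
{
    size_t len = strlen(s);
    return len > 2 && strcmp(s + len - 2, ".r") == 0;
}

/* Number of register-file reads plus writes in a move list. */
static int count_rf_accesses(const move *m, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += is_reg(m[i].src) + is_reg(m[i].dst);
    return count;
}

/* True if dst is the trigger port of the FU owning result_port
   (e.g. "add.t" for "add.r"): that FU's result would be overwritten. */
static int retriggers(const char *dst, const char *result_port)
{
    size_t len = strcspn(result_port, ".");
    return strncmp(dst, result_port, len) == 0 && strcmp(dst + len, ".t") == 0;
}

/* Returns the new length of the move list. */
static int bypass_and_eliminate(move *m, int n)
{
    assert(n <= MAX_MOVES);
    for (int i = 0; i < n; i++) {
        if (!is_result_port(m[i].src) || !is_reg(m[i].dst))
            continue;                              /* want "fu.r -> rK" only */
        int reader = -1, readers = 0, retrig_at = n;
        for (int j = i + 1; j < n; j++) {
            if (strcmp(m[j].src, m[i].dst) == 0) { reader = j; readers++; }
            if (retrig_at == n && retriggers(m[j].dst, m[i].src)) retrig_at = j;
            if (strcmp(m[j].dst, m[i].dst) == 0) break;   /* rK redefined */
        }
        if (readers != 1 || reader >= retrig_at)
            continue;
        strcpy(m[reader].src, m[i].src);           /* bypass: read the FU port */
        memmove(&m[i], &m[i + 1], (size_t)(n - i - 1) * sizeof *m);
        n--;                                       /* the RF write was dead    */
        i--;                                       /* re-examine the new m[i]  */
    }
    return n;
}
```

On the example, register-file accesses drop from 6 (reads of r1, r2, r3, r4; writes of r3, r5) to 4.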


Mapping applications to processors

We have described a
• Templated architecture
• Parametric compiler exploiting specifics of the template

Problem:

How to tune a processor architecture for a certain application domain?


Mapping applications to processors

[Figure: Move framework: architecture parameters, optimizer, parametric compiler, hardware generator, feedback and user interaction; outputs are parallel object code and a chip. Inset: Pareto curve (solution space) of design points, cost versus execution time.]
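The Pareto curve keeps only design points that no other point beats on both cost and execution time. A minimal C sketch of that filter follows. The names design_point and pareto_front are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```c
#include <assert.h>

/* One evaluated processor configuration from the exploration loop. */
typedef struct { double cost; double exec_time; } design_point;

/* a dominates b: no worse in both objectives and strictly better in one. */
static int dominates(design_point a, design_point b)
{
    return a.cost <= b.cost && a.exec_time <= b.exec_time &&
           (a.cost < b.cost || a.exec_time < b.exec_time);
}

/* Sets on_front[i] = 1 for non-dominated points; returns their count. */
static int pareto_front(const design_point *p, int n, int *on_front)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        on_front[i] = 1;
        for (int j = 0; j < n && on_front[i]; j++)
            if (j != i && dominates(p[j], p[i]))
                on_front[i] = 0;
        count += on_front[i];
    }
    return count;
}
```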


Achievements within the MOVE project
• Transport Triggered Architecture (TTA) template: lego playbox toolkit
• Design framework almost operational; you may add your own ‘strange’ function units (no restrictions)
• Several chips have been designed by TUD and industry; their applications include:
  Intelligent datalogger
  Video image enhancement (video stretcher)
  MPEG2 decoder
  Wireless communication


Video stretcher board containing TTA


Intelligent datalogger
• mixed signal
• special FUs
• on-chip RAM and ROM
• operates stand-alone
• core generated automatically
• C compiler


TTA related research

• RoD: registers on demand scheduling
• SFUs: pattern detection
• CTT: code transformation tool
• Multiprocessor single chip embedded systems
• Global program optimizations
• Automatic fixed point code generation
• ReMove


RoD: Register on Demand scheduling


Phase ordering problem: scheduling vs. register allocation
• Early register assignment
  introduces false dependencies
  bypassing information is not available
• Late register assignment
  the span of live ranges is likely to increase, which leads to more spill code
  spill/reload code is inserted after scheduling, which requires an extra scheduling step
• Integrated with the instruction scheduler: RoD
  more complex


RoD

Code to schedule:
4 -> add.o, x -> add.t, add.r -> y;
r0 -> sub.o, y -> sub.t, sub.r -> z;

Schedule after each step (the register reservation tables, RRTs, record which of r0, r1 and r7 are occupied):
step 1: 4 -> add.o, r1 -> add.t
step 2: 4 -> add.o, r1 -> add.t; add.r -> r1
step 3: 4 -> add.o, r1 -> add.t; add.r -> sub.t
step 4: 4 -> add.o, r1 -> add.t; add.r -> sub.t, r0 -> sub.o
step 5: 4 -> add.o, r1 -> add.t; add.r -> sub.t, r0 -> sub.o; sub.r -> r7
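A register reservation table can be sketched as one bitmask of occupied registers per cycle. In this C sketch, the sizes, the names and the lowest-free-register policy are assumptions, not details from the slides. A register is chosen only once the scheduler has fixed both the defining cycle and the last use, which is the essence of assigning registers on demand. A value whose only use is bypassed never needs this call and so never occupies a register.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS   8    /* assumed register-file size  */
#define MAX_CYCLES 64   /* assumed scheduling horizon  */

/* Bit k of rrt[c] is set when register k holds a live value in cycle c. */
typedef struct { uint32_t rrt[MAX_CYCLES]; } rrt_t;

/* Returns the assigned register, or -1 if none is free over the whole
   live range [def_cycle, last_use] (spill code would then be needed). */
static int assign_on_demand(rrt_t *t, int def_cycle, int last_use)
{
    assert(0 <= def_cycle && def_cycle <= last_use && last_use < MAX_CYCLES);
    uint32_t busy = 0;
    for (int c = def_cycle; c <= last_use; c++)
        busy |= t->rrt[c];
    for (int r = 0; r < NUM_REGS; r++) {
        if (!(busy & (1u << r))) {
            for (int c = def_cycle; c <= last_use; c++)
                t->rrt[c] |= 1u << r;      /* reserve r for the live range */
            return r;
        }
    }
    return -1;
}
```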


Spilling
• Occurs when the number of simultaneously live variables exceeds the number of registers
• Contents of variables are stored in memory
• The performance impact of the inserted extra code must be as small as possible
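Whether spilling is needed at all follows from the register pressure: the largest number of values live in any one cycle. This C sketch computes it from inclusive live ranges by scanning every cycle. The names and the cycle-based representation are assumptions for illustration.

```c
#include <assert.h>

typedef struct { int def; int last_use; } live_range;  /* inclusive cycles */

/* Maximum number of values live in the same cycle (register pressure). */
static int max_pressure(const live_range *lr, int n, int horizon)
{
    int max = 0;
    for (int c = 0; c < horizon; c++) {
        int live = 0;
        for (int i = 0; i < n; i++)
            if (lr[i].def <= c && c <= lr[i].last_use)
                live++;
        if (live > max)
            max = live;
    }
    return max;
}

/* Minimum number of values that must be held in memory at the peak. */
static int values_to_spill(const live_range *lr, int n, int horizon, int num_regs)
{
    assert(num_regs >= 0);
    int excess = max_pressure(lr, n, horizon) - num_regs;
    return excess > 0 ? excess : 0;
}
```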


Spilling

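[Figure: live ranges of x and y (def x, def y, use x, use y); spilling register r1 inserts a store after its definition (def r1, store r1) and a load before its use (load r1, use r1).]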


Spilling

Operation to schedule:
x -> sub.o, r1 -> sub.t;
sub.r -> r3;

Code after spill code insertion:
4 -> add.o, fp -> add.t;
add.r -> z;
z -> ld.t;
ld.r -> x;
x -> sub.o, r1 -> sub.t;
sub.r -> r3;

Bypassed code:
4 -> add.o, fp -> add.t;
add.r -> ld.t;
ld.r -> sub.o, r1 -> sub.t;
sub.r -> r3;


RoD compared with early assignment

[Figure: speedup of RoD [%] (-5 to 35) versus number of registers (32, 24, 20, 16, 12, 10), for a68, bison, compress, dhrystone, gzip, sieve, sort, sum, uniq, wc and the average.]


RoD compared with early assignment

[Figure: impact of decreasing the number of registers: cycle count increase [%] (0 to 24) versus number of registers (12 to 32), for RoD and for early assignment.]


Special Functionality: SFUs


Mapping applications to processors

• SFUs may help! Which one do I need? Tradeoff between costs and performance
• SFU granularity?
  Coarse grain: do it yourself (profiling helps); the Move framework supports this
  Fine grain: tooling needed


SFUs: fine grain patterns

Why use fine grain SFUs:
• code size reduction
• register file #ports reduction
• could be cheaper and/or faster
• transport reduction
• power reduction (avoid charging non-local wires)

Which patterns need support? Detection of recurring operation patterns is needed.


SFUs: Pattern identification

Method:
• Trace analysis
• Build DDG (data dependence graph)
• Create pattern library on demand
• Fuse partial matches into complete matches
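The first step of pattern detection, counting which operation feeds which, can be sketched in C over a DDG built from a trace. This sketch (the names and the opcode set are hypothetical) only counts connected 2-node producer/consumer pairs. The multi-output, non-tree patterns described in these slides need a more general matcher.

```c
#include <assert.h>

enum opcode { OP_ADD, OP_SUB, OP_MUL, OP_SHL, OP_LD, NUM_OPCODES };

/* A DDG node from the trace: src[k] is the index of the node producing
   operand k, or -1 if the operand comes from outside the trace. */
typedef struct { enum opcode op; int src[2]; } ddg_node;

/* counts[p][c] = occurrences of the 2-node pattern "p feeds c". */
static void count_pairs(const ddg_node *g, int n,
                        int counts[NUM_OPCODES][NUM_OPCODES])
{
    for (int p = 0; p < NUM_OPCODES; p++)
        for (int c = 0; c < NUM_OPCODES; c++)
            counts[p][c] = 0;
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 2; k++) {
            int s = g[i].src[k];
            if (s < 0)
                continue;                    /* operand not produced in trace */
            if (k == 1 && s == g[i].src[0])
                continue;                    /* same producer counted once    */
            assert(s < i);                   /* trace order: producer first   */
            counts[g[s].op][g[i].op]++;
        }
    }
}
```

For a trace of two multiply-accumulate steps (load, load, mul, add), the load-to-mul pair is the most frequent, which would make a combined SFU candidate.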


SFUs: fine grain patterns

General pattern & subject graph:
• multi-output
• non-tree
• operand and operation nodes


SFUs: covering results


SFUs: top-10 patterns (2 ops)


SFUs: conclusions

• Most patterns are multi-output and not tree-like
• Patterns 1, 4, 6 and 8 have implementation advantages
• 20 additional 2-node patterns give a 40% reduction (in operation count)
• Group operations into classes for even better results

Now: scheduling for these patterns? How?


Source-to-Source transformations


Design transformations

Source-to-source transformations
CTT: code transformation tool

[Figure: CTT takes input C sources and produces output C sources, with a GUI and a library of transformations.]


Transformation example: loop embedding

Before:

    ....
    for (i = 0; i < 100; i++) {
        do_something();
    }
    ....
    void do_something() { procedure body }

After loop embedding:

    ....
    do_something2();
    ....
    void do_something2() {
        int i;
        for (i = 0; i < 100; i++) {
            procedure body
        }
    }
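The transformation can be checked with a small runnable C sketch in which a hypothetical procedure body (adding 3 to a global) stands in for the elided body. Both versions produce the same result, but the embedded version executes one call instead of 100. Presumably this also lets the scheduler see the whole loop inside a single procedure. The transformation is valid here only because the body does not depend on the caller's i.

```c
#include <assert.h>

static int total = 0;   /* effect of the hypothetical procedure body */
static int calls = 0;   /* number of procedure calls executed        */

/* Before: the loop is in the caller, one call per iteration. */
static void do_something(void)
{
    calls++;
    total += 3;              /* procedure body */
}

static void caller_before(void)
{
    int i;
    for (i = 0; i < 100; i++)
        do_something();
}

/* After loop embedding: the loop moves into the callee. */
static void do_something2(void)
{
    int i;
    calls++;
    for (i = 0; i < 100; i++)
        total += 3;          /* procedure body */
}

static void caller_after(void)
{
    do_something2();
}
```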


Structure of transformation

PATTERN { description of the code selection stage }

CONDITIONS { additional constraints }

RESULT { description of the new code }


Implementation

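[Figure: CTT implementation: input sources are turned into IR by the SUIF front-end and joined by the SUIF linker; the Code Transformation Engine applies the transformations (also read through a SUIF front-end) to the IR; s2c converts the resulting IR back into output sources.]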


Experimental results

Can handle transformations like: loop peeling, index set splitting, loop reversal, loop skewing, loop fusion, wave fronting, inlining, loop fission, strip mining, code sinking, unswitching, loop embedding and extraction.

Could transform 39 out of 45 SIMD loops (in a set of 9 DSP benchmarks and MPEG).


Partitioning your program for multiprocessor single chip solutions


Multiprocessor embedded system

[Figure: three ASIPs (Asip1, Asip2, Asip3), each a core with its own SFUs (sfu1, sfu2, sfu3), together with RAM blocks, I/O and a TPU.]

• An ASIP based heterogeneous multiprocessor
• How to partition and map your application?
• Splitting threads


Design transformations

Why splitting threads?
• Combine fine (ILP) and coarse grain parallelism
• Avoid ILP bottleneck
• Multiprocessor solution may be cheaper
• More efficient resource use
• Wire delay problem: clustering needed!


Experimental results of partitioner

[Figure: speedup (0 to 18) per benchmark with 1, 2, 3 and 4 processors.]


Instant frequency tracking example


Global program optimizations


Traditional compilation path

• Compiler output is textual, i.e. assembly: loss of source-level information.
• The object code defines the program’s memory layout: an efficient binary representation, but not suitable for code transformations.

[Figure: source file → compiler → assembly → assembler → object code; object code and library code are combined into the executable.]


New Compilation Path

Structured machine-level representation of the program:
• the representation is accessible to “binary tools”,
• high-level information is maintained and passed to the linker,
• code transformations on whole programs are easier.

The link function and the section offsets information must be rethought.

[Figure: source file → front-end → machine-level IR; machine-level IR and library code IR → linked machine code.]


Inter-module Register Allocation
• After linkage, global exported variables can be allocated to registers
• Performing re-allocation of exported variables before scheduling is expensive
• Solution: re-allocation after linking all modules
  an analysis of variable aliasing (is the address taken?) is computed and maintained
  a larger pool of live-range candidates is available for actual register allocation
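The aliasing check can be sketched in C. In this hypothetical representation, each module's IR records which globals it references and whether it takes their address. After all modules are linked, a global is kept as a register candidate only if no module takes its address. The types and names are assumptions, not the MOVE linker's data structures.

```c
#include <assert.h>
#include <string.h>

/* Per-module summary of one referenced global (hypothetical). */
typedef struct { const char *name; int address_taken; } global_use;
typedef struct { const global_use *uses; int n_uses; } module_summary;

/* True if any linked module takes the address of the global. */
static int aliased_anywhere(const module_summary *mods, int n_mods, const char *name)
{
    for (int m = 0; m < n_mods; m++)
        for (int u = 0; u < mods[m].n_uses; u++)
            if (strcmp(mods[m].uses[u].name, name) == 0 && mods[m].uses[u].address_taken)
                return 1;
    return 0;
}

static int already_listed(const char **out, int n_out, const char *name)
{
    for (int k = 0; k < n_out; k++)
        if (strcmp(out[k], name) == 0)
            return 1;
    return 0;
}

/* Collects globals that may be promoted to registers after linking;
   returns how many were written to out[]. */
static int register_candidates(const module_summary *mods, int n_mods,
                               const char **out, int max_out)
{
    int n_out = 0;
    for (int m = 0; m < n_mods; m++)
        for (int u = 0; u < mods[m].n_uses; u++) {
            const char *name = mods[m].uses[u].name;
            if (aliased_anywhere(mods, n_mods, name) || already_listed(out, n_out, name))
                continue;
            assert(n_out < max_out);
            out[n_out++] = name;
        }
    return n_out;
}
```

A global whose address is taken in only one module is still excluded everywhere, because a pointer to it may reach any other module.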


Fixed-point conversion: motivation

• Cost of floating-point hardware.
• Most “embedded” programs are written in ANSI C.
• C does not support fixed-point arithmetic.
• Manual writing of fixed-point programs is tedious and error-prone (insertion of scaling operations).
• Fixed-point extensions to C are only a partial solution.


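Fixed-point conversion

Example:

acc += (*coef_ptr) * (*data_ptr)

[Figure: data-flow graph of this statement before and after conversion. Floating-point version: two loads (coef_ptr, data_ptr), mul, add into acc. Fixed-point version: the mul is replaced by a call to mulh(), and scaling shifts (>>1, <<1) are inserted.]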
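The scaling in such a conversion can be illustrated in C. This is a sketch under assumed formats that are not taken from the slides: coefficients and samples with 30 fractional bits, an accumulator with 27 fractional bits for headroom, and mulh() returning the upper 32 bits of the 64-bit product, which leaves 28 fractional bits. One right shift then aligns each product with the accumulator. Right-shifting negative values is implementation-defined in C; the sketch assumes the arithmetic shift used by mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

#define COEF_FRAC 30   /* assumed fractional bits of coefficients */
#define DATA_FRAC 30   /* assumed fractional bits of samples      */
#define ACC_FRAC  27   /* assumed fractional bits of accumulator  */

/* mulh() drops 32 fractional bits; this shift aligns the rest with acc. */
#define ALIGN_SHIFT (COEF_FRAC + DATA_FRAC - 32 - ACC_FRAC)   /* = 1 */
#if ALIGN_SHIFT < 0
#error "accumulator format has more fractional bits than the product"
#endif

/* Upper 32 bits of the 64-bit product, like a "multiply high" unit. */
static int32_t mulh(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Round a real value to a fixed-point integer with 'frac' fraction bits. */
static int32_t to_fixed(double x, int frac)
{
    double scaled = x * (double)(1L << frac);
    return (int32_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

static double to_double(int32_t v, int frac)
{
    return (double)v / (double)(1L << frac);
}

/* acc += (*coef_ptr) * (*data_ptr), with the scaling shift made explicit. */
static int32_t fixed_dot(const int32_t *coef, const int32_t *data, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += mulh(coef[i], data[i]) >> ALIGN_SHIFT;
    return acc;
}
```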


Methodology
• The user starts with a floating-point version of the application.
• The user annotates a selected set of FP variables.
• The converter automatically converts the remaining variables/temporaries and delivers feedback.
• Result: a source file in which floating-point variables are replaced by integer variables with appropriate scaling operations.

[Figure: C program → user annotates → annotated C program → converter → fixed-point C program.]


Link-time code conversion

Problem: linking fixed-point code with library code
• transformations on binary code are impractical
• source-level linkage is awkward

Solution: floating- to fixed-point conversion of library code “on the fly” during linkage.

Advantages:
• No need to compile in advance a specific version of the library for a particular fixed-point format.
• Information about the fixed-point format can flow between user and library code in both directions.


Experimental Results

Accuracy metric: signal-to-quantization-noise ratio, SQNR (dB):

$\mathrm{SQNR} = 10 \log_{10} \frac{E[S^2]}{E[(S - S')^2]}$

where S = floating-point signal and S’ = fixed-point signal.

Test programs: 35th-order FIR, 6th-order IIR filters

SQNR (dB)
program   fixed-p. 1   fixed-p. 2   floating-p.
FIR       33.1         74.7         70.9
IIR       20.3         55.1         64.9


Experimental Results

Performance and code size

                Floating-point                       Fixed-point
                hardware          sw emulation       version 2
program         cycles   size     cycles    size     cycles   size
FIR             32826    66       151849    170      39410    72
IIR             7422     73       39192     258      8723     93


What next?

How to map your application A(L,A,D) to hardware (L,N,C)?

L: design level (e.g. architecture, implementation or realization level)
A: application components
D: dependences between application components
N: hardware components
C: connections between hardware components


Integrated design environment

[Figure: design cycle linking a software description AG(L,A,D) and a hardware description RG(L,N,C) through a mapper & scheduler to a design point; analysis produces statistics, and exploration steers the design transformations and the mapping.]

In the MOVE project we mostly ‘closed’ the right part of the design cycle!!


Conclusions / Discussion
• Billions of embedded systems with embedded processors are sold annually; how can these systems be designed quickly, cheaply, correctly, with low power, ...?
• We have experience with tuning architectures for applications:
  extremely flexible templated TTA, used by several companies
  parametric code generation
  automatic TTA design space exploration
• The challenge: automated tuning of applications for architectures, closing the Y-chart; a design transformation framework is needed