TDTS01 Lecture 7

High-Level Synthesis II

Zebo Peng

Embedded Systems Laboratory

IDA, Linköping University

2014-02-13

Lecture 7

Advanced HLS issues

Control unit synthesis

Allocation and binding


Allocation and Binding

Allocation (unit selection): determine the type and number of hardware resources required, including

• Functional units

• Storage elements

• Buses

Binding: assignment to resource instances:

• Operations to functional unit instances

• Values to be stored to instances of storage elements

• Data transfers to bus instances

Allocation and binding generate the datapath of the design.


Allocation and Binding Principle

Resource sharing: Allow multiple non-concurrent operations to share the same hardware as much as possible.

Optimization goal:

Minimize total cost of functional units, registers, bus drivers, and multiplexers.

Minimize total interconnection length (placement info needed).

Constraint on critical path delay.

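[Figure: resource-sharing example with control steps s1 and s2, in which additions +1 and +3 share one adder and additions +2 and +4 share another; inputs a to h, outputs o1 to o4.]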


Allocation/Binding: Approach 1

Constructive: start with an empty datapath and add functional, storage and interconnection components as needed.

Greedy algorithms: perform allocation/binding for one control step at a time.

Rule-based: used to select the type and number of functional units, especially prior to scheduling.

[Figure: a scheduled data-flow graph over control steps 1 to 3 with additions a1 to a4 and multiplications m1 and m2, and the resulting datapath with one multiplier (m1, m2), two adders (a1, a3 and a2, a4) and a register.]
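The greedy, step-by-step variant can be sketched as follows. This is a minimal illustration, not the lecture's own procedure: the function name `greedy_bind` and the three-step schedule are assumptions made up for the example, and each operation is simply bound to the first unit of the right type that is still free in its control step.

```python
def greedy_bind(schedule):
    """Bind operations to unit instances one control step at a time.

    schedule: list of control steps, each a list of (operation, unit_type).
    Returns {unit_type: [operations bound to instance 0, instance 1, ...]}.
    """
    units = {}
    for step in schedule:
        busy = {}  # unit_type -> number of instances already used in this step
        for op, unit_type in step:
            index = busy.get(unit_type, 0)
            instances = units.setdefault(unit_type, [])
            if index == len(instances):
                # Every existing instance is busy in this step: add a new one.
                instances.append([])
            instances[index].append(op)
            busy[unit_type] = index + 1
    return units


# Hypothetical schedule (an assumption, not taken from the slides).
schedule = [
    [("a1", "adder"), ("a2", "adder"), ("m1", "multiplier")],
    [("m2", "multiplier")],
    [("a3", "adder"), ("a4", "adder")],
]
units = greedy_bind(schedule)
print(units)
```

For this schedule the sketch allocates two adders and one multiplier. A different processing order within a step can give a different, equally sized binding.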


Allocation/Binding: Approach 2

Graph-theoretical formulations: sub-tasks are mapped into well-defined problems in graph theory.

Clique partitioning.

Left-edge algorithm.

Graph coloring.


Clique Partitioning

G = (V, E), an undirected graph with a set V of vertices and a set E of edges.

A clique is a set of vertices that form a complete subgraph of G.

The Clique Partitioning Problem: to partition G into a minimal number of cliques such that each vertex belongs to exactly one clique.

[Figure: a clique partitioning example on vertices a1 to a4, divided into two cliques.]


Allocation as Clique Partitioning

Functional unit allocation:

Each vertex represents an operation.

An edge connects two vertices iff:

The two operations are scheduled into different control steps, and

There exists a functional unit that is capable of carrying out both operations.

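[Figure: the scheduled data-flow graph (control steps 1 to 3, additions a1 to a4, multiplications m1 and m2) and the corresponding graph on vertices a1 to a4, m1 and m2.]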
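The two edge conditions above translate directly into a small graph-construction routine. The sketch below is illustrative only: the names `compatibility_graph`, `step_of` and `unit_types_of` are hypothetical, and the schedule is an assumed example rather than the one drawn on the slide.

```python
from itertools import combinations


def compatibility_graph(step_of, unit_types_of):
    """Return the edges between operations that may share a functional unit.

    step_of:       {operation: control step}
    unit_types_of: {operation: set of unit types able to execute it}
    """
    edges = set()
    for u, v in combinations(sorted(step_of), 2):
        different_steps = step_of[u] != step_of[v]
        shared_unit_type = unit_types_of[u] & unit_types_of[v]
        if different_steps and shared_unit_type:
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical schedule: four additions and two multiplications in three steps.
step_of = {"a1": 1, "a2": 1, "m1": 1, "a3": 2, "m2": 2, "a4": 3}
unit_types_of = {
    op: {"adder"} if op.startswith("a") else {"multiplier"} for op in step_of
}
edges = compatibility_graph(step_of, unit_types_of)
for edge in sorted(sorted(e) for e in edges):
    print(edge)
```

Here a1 and a2 are not connected because they share a control step, and no addition is connected to a multiplication because no unit type in this example executes both.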


Storage Allocation as Clique Partitioning

Storage allocation as a clique partitioning problem:

Each value that needs to be stored is mapped to a vertex.

Two vertices are connected iff the life-times of the two values do not intersect.

The clique partitioning problem is NP-complete.

Efficient heuristics must be developed.

Ex. Tseng developed a polynomial-time algorithm, based on step-wise grouping, which generates very good results.


Tseng’s Algorithm

A super-graph is derived from the original graph.

Find two connected super-nodes such that they have the maximum number of common neighbors.

Merge the two nodes and repeat from the first step, until no more mergers can be carried out.

Example graph on vertices V1 to V5:

Edge       Common neighbors
(V1,V3)    1
(V1,V4)    1
(V2,V3)    0
(V2,V5)    0
(V3,V4)    1
(V4,V5)    0


Tseng’s Algorithm (Cont’d)

With V1 and V3 merged into the super-node S1-3:

Edge        Common neighbors
(S1-3,V4)   0
(V2,V5)     0
(V4,V5)     0

With S1-3 and V4 merged into the super-node S1-3-4:

Edge        Common neighbors
(V2,V5)     0
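The merge loop can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Tseng's published procedure in full: the function name `tseng_clique_partition` is hypothetical, ties between candidate pairs are broken by taking the first pair found, and a merged super-node keeps only the neighbors that both of its parts had, so every super-node remains a clique.

```python
from itertools import combinations


def tseng_clique_partition(vertices, edges):
    """Greedy clique partitioning by repeatedly merging connected super-nodes."""
    # Each super-node is a frozenset of original vertices.
    adj = {frozenset([v]): set() for v in vertices}
    for u, v in edges:
        adj[frozenset([u])].add(frozenset([v]))
        adj[frozenset([v])].add(frozenset([u]))

    while True:
        best = None
        for a, b in combinations(list(adj), 2):
            if b in adj[a]:
                common = len(adj[a] & adj[b])
                if best is None or common > best[0]:
                    best = (common, a, b)
        if best is None:
            break  # no connected pair is left, so no more mergers are possible
        _, a, b = best
        common_neighbors = adj[a] & adj[b]
        for n in adj[a] | adj[b]:
            adj[n].discard(a)
            adj[n].discard(b)
        del adj[a], adj[b]
        merged = a | b
        adj[merged] = set(common_neighbors)
        for n in common_neighbors:
            adj[n].add(merged)

    return sorted(sorted(clique) for clique in adj)


vertices = ["V1", "V2", "V3", "V4", "V5"]
edges = [("V1", "V3"), ("V1", "V4"), ("V2", "V3"),
         ("V2", "V5"), ("V3", "V4"), ("V4", "V5")]
cliques = tseng_clique_partition(vertices, edges)
print(cliques)
```

On the five-vertex example the sketch ends with the cliques {V1, V3, V4} and {V2, V5}. Because all remaining pairs tie at zero common neighbors after the first merge, the order of the later mergers may differ from the order drawn on the slides.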


Left-Edge (LE) Algorithm

Used in channel routing to minimize the number of tracks used to connect points (layout design).

To minimize the number of needed tracks.

To reduce wire lengths.

To avoid wire crossings.


LE Algorithm for Reg. Allocation

Map the birth time of a value to the left (top) edge, and its death time to the right (bottom) edge of a wire.

[Figure: a data-flow graph with operations 1 to 10, inputs i1 to i5, intermediate values a to g and outputs o1 to o3, next to the life-time intervals of these values.]


The Left-Edge Algorithm

1. The values are sorted in increasing order of their birth times.

2. The first value is assigned to the first register.

3. The list is then scanned for the next value whose birth time is equal to or larger than the death time of the previous value.

4. This value is assigned to the current register.

5. The list is scanned until no more value can share the same register.

6. A new register is then introduced to hold the next value in the sorted list, and the algorithm iterates from step 3.
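The six steps above can be sketched as a short routine. The function name `left_edge` and the life-time values are assumptions chosen for illustration; a life-time is given as a (birth, death) pair, and a value may reuse a register when its birth time is equal to or larger than the death time of the value last placed there.

```python
def left_edge(lifetimes):
    """Assign values to registers with the left-edge algorithm.

    lifetimes: {value: (birth, death)}
    Returns a list of registers, each a list of the values it holds in turn.
    """
    # Step 1: sort by birth time (the name breaks ties deterministically).
    pending = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))
    registers = []
    while pending:
        current, leftover = [], []
        last_death = None
        # Steps 2 to 5: scan the list once, packing the current register.
        for value in pending:
            birth, death = lifetimes[value]
            if last_death is None or birth >= last_death:
                current.append(value)
                last_death = death
            else:
                leftover.append(value)
        # Step 6: open a new register for whatever did not fit.
        registers.append(current)
        pending = leftover
    return registers


# Hypothetical life-times (not the ones in the lecture's example figure).
lifetimes = {"a": (0, 2), "b": (0, 3), "c": (2, 4), "d": (3, 5), "e": (1, 4)}
registers = left_edge(lifetimes)
print(registers)
```

For these life-times three values (a, b and e) are alive at the same time, so three registers is also the lower bound.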


LE Algorithm Example

[Figure: three panels for the values i1 to i5, a to g and o1 to o3: the original life-times, the list sorted by birth time, and the allocated registers R1 to R5.]


LE Algorithm Discussions

The algorithm is guaranteed to allocate the minimum number of registers.

However, it has two disadvantages:

Not all life-time tables can be interpreted as intersecting intervals on a line, for example in the presence of:

• Loops

• Conditional branches

The assignment is neither unique nor necessarily optimal in other respects, for example in terms of the number of multiplexers.


Allocation/Binding: Approach 3

Transformational allocation: starting from an initial allocation and binding, a final design is obtained by successive transformations.

Usually it starts with a maximal allocation (each operation has its dedicated physical unit).

The design is then improved by merging physical units step by step, so that hardware resources are shared as much as possible.

[Figure: the adder used in Si and the adder used in Sj are merged into a single adder shared by both (Si,j).]


Control-Unit Synthesis

Two basic approaches are widely used:

Microcode.

Hard-wired.

The basic assumptions:

A synchronous controller is used.

A schedule is given with the set of activation signals (for enabling, multiplexer input selection, bus control, etc.).

The controller is modeled as a finite-state machine.


Microcoded Control Synthesis

To store the control information in an organized fashion.

A microcode ROM of size λ is used, where λ is the number of schedule steps.

The ROM must have $\lceil \log_2 \lambda \rceil$ address bits (note: $\lceil x \rceil$ denotes the ceiling function).

A synchronous counter with a reset signal is used to address the ROM.

The counter is controlled by the system clock.

The ROM contents can be implemented as horizontal or vertical microcode.


Horizontal Microcode

Each activation signal is associated with one bit of the word in the microcode.

Address   Microword (one bit per activation signal)
00        1 1 0 0 0 1 0 1 0 1 0
01        0 0 1 0 0 0 1 0 1 0 1
10        0 0 0 1 0 0 0 0 0 0 0
11        0 0 0 0 1 0 0 0 0 0 0

[Figure: a counter with Reset and Clock inputs addresses the ROM, which holds λ microwords.]

The word length is usually much larger than λ, and the ROM is therefore wider than it is high.

Each bit is connected directly to an activation signal: high performance.

There are many zeros: wasted storage resource.
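As a rough illustration, the horizontal ROM contents and the address width can be derived from a schedule that lists the active signals per step. The schedule below is read off the slide's microword table (signals numbered 1 to 11); the function names `horizontal_rom` and `address_bits` are hypothetical.

```python
from math import ceil, log2


def horizontal_rom(schedule, n_signals):
    """One microword per schedule step; bit k is 1 iff signal k is active."""
    return [
        [1 if k in active else 0 for k in range(1, n_signals + 1)]
        for active in schedule
    ]


def address_bits(n_steps):
    """Number of ROM address bits: the ceiling of log2 of the step count."""
    return ceil(log2(n_steps))


# Active signals per schedule step, taken from the microword table.
schedule = [{1, 2, 6, 8, 10}, {3, 7, 9, 11}, {4}, {5}]
rom = horizontal_rom(schedule, 11)
for address, word in enumerate(rom):
    print(format(address, f"0{address_bits(len(rom))}b"), word)
```

With four steps the ROM needs two address bits, and each of its four words is eleven bits wide.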


Vertical Microcode

A fully vertical microcode encodes the n activation signals with $\lceil \log_2 n \rceil$ bits to reduce the width of the ROM. Several words may be needed for a schedule step.

Horizontal microwords (activation signals 1 to 11, n = 11):

1 1 0 0 0 1 0 1 0 1 0
0 0 1 0 0 0 1 0 1 0 1
0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0

Vertical microwords, one per active signal, which a decoder expands into the activation signals:

0001 0010 0110 1000 1010 0011 0111 1001 1011 0100 0101
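A small sketch of the encoding and of the decoder, using the same schedule (signals 1 to 11, four steps). The names `vertical_words` and `decode` are hypothetical. One assumption deserves a note: the example numbers its code-words from 0001, so the sketch sizes the word to hold the value n itself, which gives 4 bits for n = 11, the same width as $\lceil \log_2 n \rceil$ in this case.

```python
def vertical_words(schedule, n_signals):
    """Encode each active signal as its own binary code-word, step by step."""
    width = n_signals.bit_length()  # enough bits to write the value n
    return [
        [format(k, f"0{width}b") for k in sorted(active)]
        for active in schedule
    ]


def decode(word, n_signals):
    """Decoder: expand one code-word into a one-hot activation vector."""
    k = int(word, 2)
    return [1 if i == k else 0 for i in range(1, n_signals + 1)]


schedule = [{1, 2, 6, 8, 10}, {3, 7, 9, 11}, {4}, {5}]
words = vertical_words(schedule, 11)
for step, step_words in enumerate(words, start=1):
    print(step, step_words)

# OR-ing the decoded words of step 1 restores its horizontal microword.
step1 = [max(bits) for bits in zip(*(decode(w, 11) for w in words[0]))]
print(step1)
```

The first schedule step needs five ROM words here, which is the cost the slide refers to when it says that several words may be needed for a schedule step.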


Vertical Microcode Issues

A decoder is needed, which can be implemented by another ROM to form a two-stage control store.

Operation concurrency may not be fully supported.

Reserve code-words for concurrent operations.

• e.g., using “1100” to denote activation of the first group of activation signals.

Vertical control schemes can be implemented by:

Lengthening the schedule, or

Reading multiple ROM words in each step.

Both have, however, some disadvantages.

[Figure: the vertical microwords grouped by schedule step (0001 0010 0110 1000 1010 | 0011 0111 1001 1011 | 0100 | 0101), feeding a decoder that produces the activation signals.]


Microcode Optimization

To find the shortest encoding of the words such that full concurrency is preserved: the microcode compaction (MC) problem, which is intractable.

MC can be approached by partitioning the operations into groups such that only one operation is active in each group, and therefore vertical encoding can be used within each group.

Original microwords (columns are activation signals 1 to 11):

1  2  3  4  5  6  7  8  9  10 11
1  1  0  0  0  1  0  1  0  1  0
0  0  1  0  0  0  1  0  1  0  1
0  0  0  1  0  0  0  0  0  0  0
0  0  0  0  1  0  0  0  0  0  0

Columns reordered so that the signals of each group are adjacent:

1  3  4  2  6  7  5  8  9  10 11
1  0  0  1  1  0  0  1  0  1  0
0  1  0  0  0  1  0  0  1  0  1
0  0  1  0  0  0  0  0  0  0  0
0  0  0  0  0  0  1  0  0  0  0

Encoded microwords (fields A to E, 9 bits in total):

A   B  C   D   E
01  1  01  01  01
10  0  10  10  10
11  0  00  00  00
00  0  11  00  00

[Figure: the encoded fields are expanded into the activation signals by the decoders D1 to D4.]


Microcode Compaction

To minimize the number of groups.

Construct a conflict graph, where the vertices correspond to the operations and the edges represent concurrency.

A minimum coloring of this graph gives the minimum number of groups needed.

Note: this does not necessarily lead to the minimum number of word bits (e.g., 10 operations can be divided as 5+5, which needs 3+3 = 6 bits, or as 7+3, which needs 3+2 = 5 bits).

[Figure: a conflict graph on vertices 1 to 6, shown before and after coloring.]
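The grouping step can be sketched with a simple greedy coloring. Two caveats: greedy coloring is a swapped-in heuristic that does not guarantee a minimum coloring in general, and the bit count assumes that each group reserves one code for "no signal active". The helper names are hypothetical; the schedule is the four-step, eleven-signal example used in the microcode tables.

```python
from itertools import combinations
from math import ceil, log2


def conflict_graph(schedule):
    """Connect two signals iff they are active in the same schedule step."""
    adj = {}
    for active in schedule:
        for s in active:
            adj.setdefault(s, set())
        for u, v in combinations(active, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by a colored neighbor."""
    color = {}
    for v in sorted(adj):
        used = {color[n] for n in adj[v] if n in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color


def field_bits(group_size):
    """Bits for one vertically encoded group, with one code meaning 'none'."""
    return ceil(log2(group_size + 1))


schedule = [{1, 2, 6, 8, 10}, {3, 7, 9, 11}, {4}, {5}]
color = greedy_coloring(conflict_graph(schedule))
groups = {}
for signal, c in sorted(color.items()):
    groups.setdefault(c, []).append(signal)
word_bits = sum(field_bits(len(g)) for g in groups.values())
print(groups, word_bits)
```

Five groups is the minimum for this schedule, since five signals are concurrent in the first step. The greedy grouping needs 10 bits, whereas a five-group partition with sizes 3, 1, 3, 2 and 2 needs only 9, which illustrates the note above about group count versus word width.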


Hard-Wired Control Synthesis

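Generate a Moore-type finite-state machine from a schedule.

Synthesize the FSM model.

Microwords of the schedule (columns are activation signals 1 to 11):

1  2  3  4  5  6  7  8  9  10 11
1  1  0  0  0  1  0  1  0  1  0
0  0  1  0  0  0  1  0  1  0  1
0  0  0  1  0  0  0  0  0  0  0
0  0  0  0  1  0  0  0  0  0  0

[Figure: the derived FSM with states S1 to S4, entered through Reset; the states are labeled with the activation signals 1,2,6,8,10; 3,7,9,11; 4; and 5.]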
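A behavioral sketch of such a Moore machine, with one state per schedule step and outputs that depend only on the current state. The class name `MooreController` is hypothetical, and wrapping from the last state back to the first is an assumption made for the example; a real controller might instead halt or wait for a start signal.

```python
class MooreController:
    """One state per schedule step; outputs depend only on the current state."""

    def __init__(self, outputs_per_state):
        self.outputs_per_state = outputs_per_state
        self.state = 0

    def reset(self):
        """Return to the first state."""
        self.state = 0

    def outputs(self):
        """Activation signals asserted in the current state."""
        return self.outputs_per_state[self.state]

    def clock(self):
        """Advance one state on a clock edge (wrap-around is an assumption)."""
        self.state = (self.state + 1) % len(self.outputs_per_state)


# Active signals per state, taken from the schedule's microwords.
controller = MooreController([{1, 2, 6, 8, 10}, {3, 7, 9, 11}, {4}, {5}])
trace = []
for _ in range(5):
    trace.append(sorted(controller.outputs()))
    controller.clock()
print(trace)
```

Synthesizing the FSM then amounts to choosing a state encoding and deriving the next-state and output logic, which this behavioral model leaves out.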


Advanced Issues of HLS

Many-to-many mapping between operations and physical components.

Re-use of previous designs (partial structure).

Synthesis with commercially available sub-systems, IP-based synthesis.

HLS with testability consideration.

[Figure: many-to-many mapping between the operations +, x and - and the components Adder, Mult, ALU and Subs; bit-width compatibility.]


Summary

High-level synthesis is one of the most important design steps in the design process of electronic systems.

The use of efficient HLS tools has led to great improvements in design productivity.

The two most important tasks are scheduling and allocation/binding, which are interdependent.

Controller design is also an important task, and its interaction with datapath design should be considered.

The HLS tasks are usually formulated as optimization problems and heuristic algorithms are used.