Physically Aware Data Communication Optimization for Hardware Synthesis

Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering
University of California, Santa Barbara

Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department
University of California, Los Angeles


Hardware Compilation

[Diagram: application specified in a high-level language → Compiler → HDL (behavioral, structural) → Synthesis and Physical Design → chip, bitstream, …]

We focus our efforts on mapping an application written in a high-level language to a hardware description

We want this mapping to have optimal characteristics (area, latency, etc.)

In this talk, we focus on the problem of minimizing data communication in the final hardware

Obligatory Design Flow Slide

Front end: Application Specification → SUIF: Syntactic & Semantic Analysis → AST → Machine SUIF: Compiler Backend → SSA CDFG

Basic Block Entity
1. Create interface
2. Transform instruction list to dataflow graph
3. Transform dataflow graph to behavioral HDL code
4. Synthesize behavioral HDL code to RTL code (Behavioral Synthesis)

entity basic_block is…
architecture behavioral of basic_block…

CFG Entity
5. Create CFG interface
6. Determine structural control and data communication between basic block entities (Entity 1 to Entity 4)
7. Generate synthesizable RTL code
8. Synthesize RTL code (Logical & Physical Synthesis)

entity cfg is…
architecture behavioral of cfg…

Characterizing Data Communication

Examples of data communication schemes

Distributed: Control Nodes 1 to 4 are connected to each other directly. Data communication = wire

Centralized: Control Nodes 1 to 4 share a bus to a memory (register bank, RAM). Data communication = memory access

Identifying Data Communication

Determine the relationship between the place(s) where data is defined and where data is used

[Figure: example CFG whose control nodes define and use the variables a, b and c]

Naïve method: all use points of a variable depend on all definitions of that variable

Not every use point actually uses every definition of the variable

Need analysis to minimize the amount of data communication

Global Data Communication = 5 variables
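The naïve method can be sketched in a few lines. This is only an illustration: the control nodes `n1` to `n4` and their definition and use sets are hypothetical and are not read off the slide's figure.

```python
from itertools import product

# Hypothetical control nodes: which variables each one defines and uses.
# These sets are illustrative only; they are not the CFG from the slide.
defs = {"n1": {"a", "b"}, "n2": {"a"}, "n3": {"a", "c"}, "n4": set()}
uses = {"n1": set(), "n2": {"b"}, "n3": {"a"}, "n4": {"a", "b", "c"}}


def naive_links(defs, uses):
    """Naive method: every use point of a variable depends on every definition of it."""
    links = set()
    for (d_node, d_vars), (u_node, u_vars) in product(defs.items(), uses.items()):
        if d_node == u_node:
            continue  # data produced and consumed in the same node needs no link
        for var in d_vars & u_vars:
            links.add((d_node, u_node, var))
    return links


links = naive_links(defs, uses)
for src, dst, var in sorted(links):
    print(f"{src} -> {dst}: {var}")
```

Every one of these links becomes a wire or a memory access in hardware, which is why a more precise definition/use analysis pays off.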

Use of SSA in Compilation

[Figure: the example CFG before and after SSA conversion. The definitions of a become a1, a2 and a3, merged by a4 ← Φ(a2, a3); the definitions of b become b1 and b2; c becomes c1]

Must determine relationship between where data is generated and where data is used

Problem formulations
  [DAC02]: Minimize the total number of bits communicated between all pairs of control nodes
  Today: Minimize overall wirelength

SSA (Static Single Assignment)
  Changes each variable to have a unique definition point
  Must add Φ-nodes to merge definitions
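The two problem formulations can be contrasted with a small sketch. The links, bit widths and node positions below are hypothetical, and modeling wirelength as one wire per bit with Manhattan routing is an assumption made for illustration, not a cost model stated on the slides.

```python
# Hypothetical data communication links: (source node, destination node, bits).
links = [("n1", "n2", 32), ("n1", "n4", 32), ("n3", "n4", 16)]

# Hypothetical control-node centers taken from some floorplan, in arbitrary units.
centers = {"n1": (0, 0), "n2": (4, 0), "n3": (0, 3), "n4": (4, 3)}


def total_bits(links):
    """Bits-based objective: bits communicated between all pairs of control nodes."""
    return sum(bits for _, _, bits in links)


def total_wirelength(links, centers):
    """Wirelength objective, assuming one wire per bit and Manhattan distance."""
    length = 0
    for src, dst, bits in links:
        (x1, y1), (x2, y2) = centers[src], centers[dst]
        length += bits * (abs(x1 - x2) + abs(y1 - y2))
    return length


print("bits:", total_bits(links))
print("wirelength:", total_wirelength(links, centers))
```

The first objective ignores where the control nodes end up on the chip; the second cannot be evaluated without at least an approximate floorplan.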

Physically Aware Compiler Transforms

Consider layout information during compilation
Modify transforms to consider physical info

Ideal: full physical synthesis. Extremely accurate, but far too time-consuming

Approximate using floorplanning
  Much faster
  Gives "good enough" high-level physical picture

Previous data communication work
  No physical information
  Can lead to negative results

[Diagram: application, Hardware Compilation, Floor-planner, Physical Synthesis]

Let's Get Physical!

Physically Aware Data Communication

Modify placement of Φ-functions to consider wirelength

Φ-Placement Algorithm

1. Given a CFG Gcfg(Vcfg, Ecfg)
2. perform_ssa(Gcfg)
3. calculate_def_use_chains(Gcfg)
4. remove_back_edges(Gcfg)
5. topological_sort(Gcfg)
6. foreach vertex v ∈ Vcfg
7.   foreach Φ-node φ ∈ v
8.     s ← φ.sources
9.     d ← |def_use_chain(φ.dest)|
10.    IDF ← iterated_dominance_frontier(s)
11.    PossiblePlacements ← findPlacementOptions(IDF)
12.    place(φ) ← selectBest(PossiblePlacements)
13.    distribute/duplicate φ to place(φ)
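The selectBest step can be illustrated with a minimal sketch. It assumes that each candidate placement is a single control node, that block centers come from a floorplan, and that cost is bits times Manhattan distance; the slides do not spell out the cost function, so all of these, along with the coordinates and the intermediate node `N7`, are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def placement_cost(candidate, sources, uses, centers, bits=32):
    """Wire cost of putting the phi-node in `candidate` (assumed cost model).

    One wire bundle runs from every source definition to the phi-node,
    and one from the phi-node to every use of its destination.
    """
    here = centers[candidate]
    incoming = sum(manhattan(centers[s], here) for s in sources)
    outgoing = sum(manhattan(here, centers[u]) for u in uses)
    return bits * (incoming + outgoing)


def select_best(candidates, sources, uses, centers):
    """Pick the candidate node with the smallest estimated wirelength."""
    return min(candidates, key=lambda c: placement_cost(c, sources, uses, centers))


# Hypothetical floorplan: block centers in arbitrary units.
centers = {"N3": (0, 0), "N5": (6, 0), "N7": (3, 2), "N9": (3, 6)}
best = select_best(["N7", "N9"], sources=["N3", "N5"], uses=["N9"], centers=centers)
print(best)
```

In this toy floorplan, merging the two definitions early in `N7` and sending a single value on to `N9` is cheaper than routing both definitions all the way to `N9`.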

FindPlacementOptions Algorithm

1. Given a set of CFG nodes R
2. Φ-options ← ∅
3. insert(R) into Φ-options
4. foreach instruction i ∈ R
5.   if (i is a destination of Φ-function f)
6.     return Φ-options
7. temp_Φ-options ← ∅
8. foreach non-dominated child c of R
9.   temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
10. return Φ-options ∪ temp_Φ-options
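A rough Python rendering of this recursion follows. It simplifies the input to a single CFG node rather than a set, represents each placement option as a frozenset of node names, and uses a hypothetical `children` map (non-dominated children) and `uses_phi` set (nodes where the recursion stops); none of these names come from the slides.

```python
def cross_product_join(left, right):
    """Combine every option in `left` with every option in `right`."""
    if not left:  # nothing accumulated yet: the first child's options stand alone
        return set(right)
    return {a | b for a in left for b in right}


def find_placement_options(node, children, uses_phi):
    """Return the set of placement options (frozensets of nodes) rooted at `node`."""
    options = {frozenset({node})}  # placing the phi-node here is always an option
    if node in uses_phi:
        return options             # cannot push the phi-node below this node
    temp = set()
    for child in children.get(node, []):
        temp = cross_product_join(temp, find_placement_options(child, children, uses_phi))
    return options | temp


# Hypothetical region: A branches to B and C, and C branches to D and E.
children = {"A": ["B", "C"], "C": ["D", "E"]}
uses_phi = {"B", "D", "E"}
options = find_placement_options("A", children, uses_phi)
for opt in sorted(sorted(o) for o in options):
    print(opt)
```

Each option is a set of nodes that together cover every path, so pushing the Φ-node downward means duplicating it across all nodes of the chosen option.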

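Algorithm in Action

FAST function from the MediaBench test suite

[Figure: CFG of FAST with T/F branch edges, showing nn_4, i_2 and nn_5, i_3 between nodes N3 and N9]

Algorithm in Action

[Figure: the same CFG shown twice, with nn_4, i_2 and nn_5, i_3 at different positions between N3 and N9]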

Full Floorplanning Results

Simple iterative approach

[Diagram: Hardware Compilation, Full Floor-planner, Physical Synthesis]

1. Initial optimization minimizes data communication
2. Full SA-based floorplanning
3. Reoptimization to minimize wirelength, based on the floorplan
4. Full SA-based floorplanning

[Chart: Floorplan Wirelength. x-axis: benchmark; y-axis: wirelength (logarithmic, 1 to 10,000,000); series: WL (first), WL (second)]

Spectacularly negative results

Incremental Floorplanning

Incremental Placement [Coudert et al]: Given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it.

Equally applicable to floorplanning: given an optimized floorplan and a set of changes to the modules (e.g., due to Φ-function movement), modify the floorplan to improve it.

[Figure: Initial Floorplan (modules 1 to 4 in a 6 × 6 region) → Perturbations → Modified Floorplan → Incremental Floorplan, with a slicing tree whose nodes are annotated 2/2.3, 9/10.1, 11/12.4, 16/18, 5/5.6, 27/30.4 and 32/36]

Our Incremental Floorplanner

[Figure: Initial Floorplan (modules 1 to 4 in a 6 × 6 region) → Perturbations → Modified Floorplan → Incremental Floorplanner]

Our Incremental Floorplanner

1. Calculate area & room of each node: bottom-up slicing tree traversal
2. Area redistribution
  Top-down traversal
  Increase area if necessary
    Not enough space at root
    Aspect ratios become too distorted

[Figure: Modified Floorplan → Incremental Floorplan, with a slicing tree whose nodes are annotated 2/2.3, 9/10.1, 11/12.4, 16/18, 5/5.6, 27/30.4 and 32/36]

Simple, yet effective

Other, more complicated algorithms might work better
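The two traversals can be sketched on a small slicing tree. This is a guess at the mechanics, not the floorplanner itself: it assumes room is handed down in proportion to area, reads the slide's tree annotations as area/room pairs, and borrows the leaf areas 2, 9, 16, 5 and the root room 36 from them. The module names and tree shape are hypothetical, and aspect ratios are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    area: float = 0.0  # leaf: module area; internal node: filled in bottom-up
    children: List["Node"] = field(default_factory=list)
    room: float = 0.0  # space granted to this subtree, filled in top-down


def compute_area(node):
    """Bottom-up traversal: an internal node's area is the sum of its children."""
    if node.children:
        node.area = sum(compute_area(child) for child in node.children)
    return node.area


def redistribute(node, room):
    """Top-down traversal: split the available room proportionally to area."""
    if room < node.area:
        raise ValueError(f"not enough space at {node.name}: {room} < {node.area}")
    node.room = room
    for child in node.children:
        redistribute(child, room * child.area / node.area)


# Hypothetical slicing tree ((m1, m2), m3), m4 with the areas from the slide figure.
m1, m2, m3, m4 = Node("m1", 2), Node("m2", 9), Node("m3", 16), Node("m4", 5)
inner = Node("inner", children=[m1, m2])
mid = Node("mid", children=[inner, m3])
root = Node("root", children=[mid, m4])

compute_area(root)
redistribute(root, 36.0)
for n in (m1, m2, inner, m3, m4, mid, root):
    print(f"{n.name}: {n.area}/{n.room}")
```

If a perturbation grows a module so that the total area exceeds the room at the root, `redistribute` raises, which corresponds to the "not enough space at root" case where the floorplan area has to be increased.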

MediaBench Functions

Benchmark Blocks Links Weight Initial WL

1 adpcm coder 33 31 54 2688 35568
2 adpcm decoder 26 23 44 1952 21588
3 internal filter 10 143 60 17088 411637
4 internal expand 101 94 257 14336 317031
5 compress output 34 17 60 2368 29114
6 mpeg2dec block 62 13 66 2272 34510
7 mpeg2dec vector 16 4 26 1024 4366
8 FAST 14 4 15 704 3714
9 FR4TR 77 87 155 704 340697
10 det 12 5 13 7936 3772

Incremental Floorplanning Results

[Chart: normalized wirelength (0 to 1.2) for benchmarks 1 to 10 and their average; series: Initial, Overall Optimal, Overall Incremental, Phi Optimal, Phi Incremental]

"Optimal" Approach:
  12% Overall Wirelength Reduction
  25% Phi-node Wirelength Reduction

Our Approach:
  6% Overall Wirelength Reduction
  8% Phi-node Wirelength Reduction

Related Work

Hardware compilation projects using SSA
  PDG+SSA form [UCSB]
  CASH [CMU]
  SA-C [UCR]
  Sea Cucumber [BYU]

Physically aware behavioral synthesis techniques
  SA for scheduling, binding and floorplanning [Prabhakaran97]
  SA for binding and floorplanning [Yung-Ming94]
  Scheduling, allocation and binding [Dougherty00]
  Fasolt: bus topology [Knapp92]
  High level synthesis [Tarafdar00]

Incremental CAD
  Problem overview/challenges [Coudert00]
  Floorplanning [Crenshaw99]

Conclusions

It’s been a long strange trip…

SSA is a nice IR for hardware compilation
  Explicitly shows data flow
  Useful for exploiting parallelism

Compiler techniques applied to hardware design can reduce wirelength
  They must be aware of physical information
  They must use incremental floorplanning