
Physical Design for Reconfigurable Computing Systems using Firm Templates


Page 1

Northwestern VLSI CAD Group

K. Bazargan, R. Kastner, M. Sarrafzadeh

Physical Design for Reconfigurable Computing Systems using Firm Templates

Department of Electrical & Computer Engineering

Northwestern University

Page 2

Sep 10, 99

Outline
• FPGA: What and why?
• What is a Reconfigurable Computing System (RCS)?
• Application example
• RCS: System components
• Online placement: problem definition and our approach
• Offline placement and scheduling
• Flexible modules and firm templates
• Conclusion and future work


Page 4

The Architecture of a Reconfigurable System

[Figure: block diagram of the CPU and the RFU (reconfigurable functional unit), both exchanging data with the Data Memory; the Instruction Memory (Program) supplies CPU instructions and RFUOPs; control and data connections link the components.]

Page 5

Execution of a Sample Program

x = 3*a - b; …
c = RFUOP1(x, 5);
y = 4*x - c;
for (i = 0; i < 3; i++) {
    x += RFUOP2(y);
    ++y;
}
z = RFUOP1(x, 3);
a = z - y;
b = RFUOP3(a, b);
c = a - b; …

[Figure: the code, its data-flow graph (DFG), and the resulting execution in RFU space (x, y) over time t. Statements marked (on CPU) run on the processor and RFUOPs marked (on RFU) run on the RFU; some RFUOPs run in parallel, but when there is no room on the RFU to run all of them in parallel, they run in sequence.]


Page 7

Application Example: Image Restoration

The value of the center pixel in the next iteration is

$$x_{k+1} = \alpha\, y + x_k - \beta\,(d ** x_k)$$

where $\alpha$ and $\beta$ stand for the two constant weights of the update (assumed names) and
• $y$: the pixel value from the original degraded image
• $x_k$: the pixel value from the previous iteration
• $d ** x_k$: the weighted sum $r_1 \cdot (\text{eight neighbor pixels}) + r_0 \cdot (\text{center pixel})$, i.e., the 3 × 3 mask

$$d = \begin{bmatrix} r_1 & r_1 & r_1 \\ r_1 & r_0 & r_1 \\ r_1 & r_1 & r_1 \end{bmatrix}$$
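To make the update concrete, here is a minimal Python sketch of one iteration over a small image held as a list of rows. The function name `restore_step` and the weight names `alpha` and `beta` are hypothetical, and copying border pixels unchanged is an assumption made only to keep the example short. Because every new pixel depends only on the previous iteration $x_k$, all interior pixels can be computed independently, which is what lets hardware restore them in parallel.

```python
def restore_step(x, y, r0, r1, alpha, beta):
    """One iteration x_{k+1} = alpha*y + x_k - beta*(d ** x_k) on a 2-D image.

    x, y: lists of rows (current estimate x_k and degraded image y).
    d ** x_k is the 3x3 weighted sum r1*(eight neighbours) + r0*(centre).
    Border pixels are copied unchanged (an assumption for this sketch).
    """
    rows, cols = len(x), len(x[0])
    nxt = [row[:] for row in x]  # reads use x_k only, so pixels are independent
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = sum(
                x[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            weighted = r1 * neighbours + r0 * x[i][j]
            nxt[i][j] = alpha * y[i][j] + x[i][j] - beta * weighted
    return nxt
```

For a 3 × 3 image of ones with y = 2, r0 = 1, r1 = 0.5, alpha = 0.1 and beta = 0.2, the center pixel becomes 0.2 + 1 - 0.2 · 5 ≈ 0.2.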

Page 8

Image Restoration (cont.)

• Incentive:
– Processing of large images using FPGAs with limited resources
• Strategy:
– Segmentation of the image into smaller images suitable for the FPGA
– Segments of size m × n are surrounded by an overlap of o.

[Figure: an m × n segment surrounded by an overlap of width o.]

Page 9

Image Restoration: Data Flow Strategy

[Figure: segments (m × n, overlap o) moving between MEMORY and the RFU.]

• Data flow strategy
– Pixels of individual segments are restored in parallel by hardware.
– Restored segments are written back after the overlap is discarded.
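The segmentation and write-back steps can be sketched as follows, assuming m counts columns and n counts rows, and that the overlap o is added on every side of a segment and clipped at the image border. `segments` and `write_back` are hypothetical helpers, and the restoration itself is left out: the usage check simply copies each padded segment back, which must reproduce the input exactly.

```python
def segments(width, height, m, n, o):
    """Yield (core, padded) boxes as half-open (x0, y0, x1, y1) tuples.

    core:   an m x n tile (m columns by n rows, an assumed orientation),
            clipped at the image edge; this is the part written back.
    padded: the core grown by the overlap o on every side and clipped to
            the image; this is the part loaded onto the RFU.
    """
    for y0 in range(0, height, n):
        for x0 in range(0, width, m):
            x1, y1 = min(x0 + m, width), min(y0 + n, height)
            padded = (max(x0 - o, 0), max(y0 - o, 0),
                      min(x1 + o, width), min(y1 + o, height))
            yield (x0, y0, x1, y1), padded


def write_back(out, restored, core, padded):
    """Copy only the core of a restored padded segment into out (overlap discarded).

    restored is indexed relative to the padded box's lower corner.
    """
    cx0, cy0, cx1, cy1 = core
    px0, py0 = padded[0], padded[1]
    for yy in range(cy0, cy1):
        for xx in range(cx0, cx1):
            out[yy][xx] = restored[yy - py0][xx - px0]
```

Keeping the overlap only on the RFU side means neighbor lookups near a segment edge see real pixels, while memory receives each pixel exactly once.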

Page 10

Image Restoration Example

[Figure: degraded image and restored image.]


Page 12

System Components

[Figure: system block diagram with the CPU, RFU, Data Memory, Instruction Mem. (Prog.), Program Manager, Configuration Memory (holding config. bits for RFUOPs), RFU Manager, Placement Engine, Cache Manager and Prefetch/Branch Prediction Unit; control, data and CPU-instruction connections link them.]


Page 14

Online Placement: Problem Definition

• Input:
– RFU dimensions (W, H)
– List of RFUOP events: (w, h, arrival, departure)
• Output:
– For each module, either
• Rejected (not able to place) [penalty?]
• Accepted: (x, y)

[Figure: modules over time, from arrival to departure, marked as accepted or rejected.]
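A small Python sketch of this input/output contract: RFUOP events are replayed in time order, an arriving module is either accepted at some (x, y) or rejected, and a departing module frees its area. The placer is a brute-force bottom-left scan over an occupancy grid, an assumed baseline chosen for brevity rather than the empty-rectangle method described in the following slides; `RFUOP` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RFUOP:
    w: int          # width on the RFU
    h: int          # height on the RFU
    arrival: int    # time the module enters
    departure: int  # time it leaves (assumed > arrival)


def _free(grid, x, y, w, h):
    return all(not grid[r][c] for r in range(y, y + h) for c in range(x, x + w))


def _mark(grid, x, y, w, h, used):
    for r in range(y, y + h):
        for c in range(x, x + w):
            grid[r][c] = used


def simulate(W: int, H: int, ops: List[RFUOP]) -> List[Optional[Tuple[int, int]]]:
    """Replay arrival/departure events; (x, y) per accepted op, None if rejected."""
    grid = [[False] * W for _ in range(H)]  # grid[y][x] is True when occupied
    result: List[Optional[Tuple[int, int]]] = [None] * len(ops)
    events = []
    for i, op in enumerate(ops):
        events.append((op.arrival, 1, i))    # 1 = arrival
        events.append((op.departure, 0, i))  # 0 = departure, sorted first at equal times
    for _, kind, i in sorted(events):
        op = ops[i]
        if kind == 0:
            if result[i] is not None:        # a rejected op never occupied the RFU
                x, y = result[i]
                _mark(grid, x, y, op.w, op.h, False)
            continue
        # Arrival: bottom-left first fit, scanning rows bottom-up, then columns.
        for y in range(H - op.h + 1):
            x = next((cx for cx in range(W - op.w + 1)
                      if _free(grid, cx, y, op.w, op.h)), None)
            if x is not None:
                _mark(grid, x, y, op.w, op.h, True)
                result[i] = (x, y)
                break
    return result
```

On a 4 × 4 RFU, two 4 × 2 modules fill the device, so a 2 × 2 module arriving while they are resident is rejected, and one arriving after they leave is placed at (0, 0).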

Page 15

Online Placement

• When a new RFUOP arrives:
– Is there enough room?
– If yes, which location is best?
• Previous work
– Bin-packing heuristics (1-D): O(n²)
• First Fit, Best Fit, Shelf, Look ahead, …
– [Chazelle’83] The Bottom-Left heuristic: O(n²)
– [Healy-Creavin’97]: O(n² lg n)

[Figure: current placement + new module to be inserted = ?]

Page 16

Our Online Placement

• Our approach:
– Divide the empty space into explicit “empty rectangles” (ERs)
• When a new RFUOP arrives:
– Is there enough room? (Is any ER large enough?)
– If yes, which location is best? (Which ER is best?)
• Packing rule
– Best Fit, Bottom Left, First Fit

Page 17

Heuristics for Choosing an Empty Rectangle

[Figure: current placement + new module to be inserted = ?, with candidate empty rectangles A and B.]

• BF (Best Fit): places the new module in the empty rectangle that causes less wasted space. Area(A) < Area(B) => choose A.
• FF (First Fit): either A or B could be chosen for placing the new module.
• BL (Bottom Left): chooses the empty rectangle that lies further to the bottom left. With points P1 and P2 marked in the figure, y(P2) < y(P1) => choose B.
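A minimal sketch of the three packing rules, assuming the empty rectangles are already available as (x, y, w, h) tuples with the origin at the bottom-left of the RFU. `choose_er` is a hypothetical name; BF measures wasted space as the ER's area minus the module's area, and BL prefers the lowest ER, breaking ties by the leftmost one.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h), origin at the RFU's bottom-left


def choose_er(ers: List[Rect], w: int, h: int, rule: str) -> Optional[Rect]:
    """Pick an empty rectangle that can hold a w x h module, using FF, BF or BL."""
    fitting = [r for r in ers if r[2] >= w and r[3] >= h]
    if not fitting:
        return None                                    # no ER is large enough: reject
    if rule == "FF":                                   # first fitting ER in list order
        return fitting[0]
    if rule == "BF":                                   # least wasted area
        return min(fitting, key=lambda r: r[2] * r[3] - w * h)
    if rule == "BL":                                   # lowest corner, then leftmost
        return min(fitting, key=lambda r: (r[1], r[0]))
    raise ValueError(f"unknown rule {rule!r}")
```

The placed module then goes at the chosen ER's (x, y) corner; how the remaining empty space is re-partitioned is the subject of the following slides.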


Page 19

Keeping All Empty Rectangles

Page 20

Our Online Placement

• Our approach:
– Divide the empty space into explicit “empty rectangles”
• When a new RFUOP arrives:
– Is there enough room? (Is any ER large enough?)
– If yes, which location is best? (Which ER is best?)
• Managing the empty space:
– Keep empty rectangles explicitly; use a “range tree” to store and access them
– Efficient use of RFU real estate
• KAMER: Keep all O(n²) maximal empty rectangles
• Fast but sub-optimal: keep only O(n) empty rectangles
– Shorter Seg. (SSEG), Square Empty Rects. (SQR), ...
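What "maximal empty rectangle" means can be checked with a brute-force sketch on a small occupancy grid: an empty rectangle is maximal when it cannot grow by one row or column in any direction. This only illustrates the definition; it is not the incremental KAMER update or its range-tree storage, and `maximal_empty_rectangles` is a hypothetical name.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def maximal_empty_rectangles(grid: List[List[bool]]) -> List[Rect]:
    """Every maximal empty rectangle of an occupancy grid (grid[y][x] True = occupied).

    Brute force: an empty rectangle is maximal when it cannot grow by one
    row or column to the left, right, bottom or top.
    """
    H, W = len(grid), len(grid[0])

    def empty(x: int, y: int, w: int, h: int) -> bool:
        return all(not grid[r][c] for r in range(y, y + h) for c in range(x, x + w))

    result = []
    for y in range(H):
        for x in range(W):
            for h in range(1, H - y + 1):
                for w in range(1, W - x + 1):
                    if not empty(x, y, w, h):
                        continue
                    can_grow = (
                        (x > 0 and empty(x - 1, y, w + 1, h))        # left
                        or (x + w < W and empty(x, y, w + 1, h))     # right
                        or (y > 0 and empty(x, y - 1, w, h + 1))     # bottom
                        or (y + h < H and empty(x, y, w, h + 1))     # top
                    )
                    if not can_grow:
                        result.append((x, y, w, h))
    return result
```

With one occupied cell in the middle of a 3 × 3 RFU, the four maximal empty rectangles overlap pairwise (two full rows and two full columns), which is why keeping all of them costs more than a non-overlapping O(n) partition.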

Page 21

Keeping O(n) Empty Rectangles - SSEG

Page 22

Heuristics for Choosing a Segment

[Figure: for each heuristic, two candidate segments S1 and S2 and the empty rectangles A, B, C, D they create.]

• SSEG (Shorter Seg): chooses the shorter of the two segments (S1 < S2).
• LSEG (Longer Seg): chooses the longer of the two segments (S1 < S2).
• BER (Balanced Empty Rects): chooses the segment that creates less area difference: Area(B) - Area(A) > Area(D) - Area(C).
• LSQR (Larger Rect Square): chooses the segment that makes the larger rectangle closer to a square: AspectRatio(B) > AspectRatio(D).
• LER (Large Empty Rects): chooses the segment that creates the larger empty rectangle: Area(B) > Area(D).
• SQR (Square Rects): chooses the segment that creates empty rectangles closer to squares: Max{AR(A), AR(B)} < Max{AR(C), AR(D)}, where AR = aspect ratio.
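One way to read the six segment heuristics in code, under stated assumptions: the module sits at the bottom-left corner of a W × H empty rectangle, and the leftover L-shaped space is cut either by extending the module's top edge (a horizontal segment of length W - w) or its right edge (a vertical segment of length H - h). The mapping from each heuristic's description to a scoring key is an interpretation of the slide, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height) of a rectangle


def split_options(W: int, H: int, w: int, h: int) -> Dict[str, Tuple[int, List[Size]]]:
    """Segment length and the two empty rectangles produced by each cut.

    'horizontal' extends the module's top edge (segment length W - w);
    'vertical' extends its right edge (segment length H - h).
    """
    return {
        "horizontal": (W - w, [(W - w, h), (W, H - h)]),
        "vertical": (H - h, [(w, H - h), (W - w, H)]),
    }


def aspect(r: Size) -> float:
    return max(r) / min(r)  # >= 1; 1 means square


def choose_split(W: int, H: int, w: int, h: int, rule: str) -> str:
    opts = split_options(W, H, w, h)

    def key(name: str) -> float:
        seg, rects = opts[name]
        rects = [r for r in rects if r[0] > 0 and r[1] > 0] or [(1, 1)]  # drop empty pieces
        areas = sorted(a * b for a, b in rects)
        larger = max(rects, key=lambda r: r[0] * r[1])
        return {
            "SSEG": seg,                           # shorter segment
            "LSEG": -seg,                          # longer segment
            "LER": -areas[-1],                     # larger empty rectangle
            "BER": areas[-1] - areas[0],           # smaller area difference
            "SQR": max(aspect(r) for r in rects),  # all pieces close to square
            "LSQR": aspect(larger),                # larger piece close to square
        }[rule]

    return min(opts, key=key)  # lowest score wins; ties go to 'horizontal'
```

For a wide, short empty rectangle (W = 10, H = 4) and a 2 × 2 module, SSEG, LER, SQR and LSQR pick the vertical cut, while LSEG and BER pick the horizontal one.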

Page 23

How Good Is a Placement?

• Acceptance rate
– Percentage of modules accepted (placed)
• Volume penalty
– Area complexity
– Time-span in the system (loop iterations)
– Penalty of rejecting a module: penalty = volume = area × time
• Input data
– Randomly generated dimensions
– Randomly generated enter/leave times
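A short sketch of the two quality measures, assuming each module is given as (w, h, arrival, departure) and that every rejected module adds its full volume, area × time, to the penalty. `evaluate` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Module = Tuple[int, int, int, int]  # (w, h, arrival, departure)


def evaluate(mods: List[Module], placed: List[Optional[Tuple[int, int]]]) -> Tuple[float, int]:
    """Return (acceptance rate in percent, penalty = summed volume of rejected modules)."""
    accepted = sum(p is not None for p in placed)
    penalty = sum(w * h * (dep - arr)
                  for (w, h, arr, dep), p in zip(mods, placed) if p is None)
    return 100.0 * accepted / len(mods), penalty
```

For four modules where only a 2 × 2 module living from time 2 to 10 is rejected, the acceptance rate is 75% and the penalty is 2 · 2 · 8 = 32.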

Page 24

Program snapshot

Page 25

Online Placement Results

Percentage of accepted modules using different bin-packing and empty-space partitioning rules

Bin-pack  Data set  KAMER  SSEG   BER    LSQR   LSEG   LER    SQR
FF        ra2048    79.25  74.26  61.52  70.36  52.83  73.87  70.36
          ra4096    84.59  79.1   66.84  74.39  58.37  79.49  74.73
          ra8192    79.71  73.39  63.23  69.87  55.87  74.88  68.11
          ra16384   81.35  75.08  63.59  70.42  55.73  76.13  69.38
          Avg(FF)   81.23  75.46  63.80  71.26  55.70  76.09  70.65
BF        ra2048    82.52  77.49  67.18  75.05  58.93  76.46  74.66
          ra4096    87.06  81.76  73.22  80.32  64.57  81.66  79.78
          ra8192    82.28  77.57  67.85  73.91  59.04  76.12  73.77
          ra16384   84.04  78.81  68.5   75.36  60.92  78.25  75.44
          Avg(BF)   83.97  78.91  69.19  76.16  60.86  78.12  75.91
BL        ra2048    81.84  76.22  61.72  73.29  55.57  76.07  71.83
          ra4096    86.18  81.93  70.29  78.56  62.33  81.42  78.54
          ra8192    81.17  75.71  65.04  72.9   59.71  76.54  72.18
          ra16384   83.46  77.39  64.97  74.53  58.23  78.29  73.25
          Avg(BL)   83.16  77.81  65.50  74.82  58.96  78.08  73.95

Page 26

Online Placement Results (cont.)

Penalties for different partitioning heuristics when BF is used

[Chart: penalty (0 to 1.8E+08) for each partitioning heuristic (KAMER, SSEG, BER, LSQR, LSEG, LER, SQR) on data sets A2048, A4096, A8192 and A16384.]

Page 27

Online Placement Results (cont.)

Running time comparison (time to place the "A16384" file)

Time (sec.)  BF     FF     BL
KAMER        35.77  34.27  34.74
SSEG         2.23   2.12   2.24


Page 29

3-D Floorplanning

[Figure: the DFG and its schedule on the RFU and the CPU, drawn as a 3-D floorplan with RFU area (x, y) on two axes and time t on the third.]

Page 30

3-D Floorplanning

[Figure: DFG, schedule (RFU/CPU) and 3-D floorplan in (x, y, t).]

By deleting this RFUOP (the CPU performs the operation)...

Page 31

3-D Floorplanning

[Figure: DFG, schedule (RFU/CPU) and 3-D floorplan in (x, y, t).]

This RFUOP can be moved on the RFU.

Page 32

3-D Floorplanning

[Figure: DFG, schedule (RFU/CPU) and 3-D floorplan in (x, y, t).]

These RFUOPs can be performed earlier...

Page 33

3-D Floorplanning

[Figure: DFG, schedule (RFU/CPU) and the resulting 3-D floorplan in (x, y, t).]

Page 34

Our Current 3-D Floorplanners

• No change in the schedule
– Fixed insertion and deletion times of RFUOPs
• Annealing-based
– Move set
• Move an operation from the CPU set to the RFU set
• Move an operation from the RFU set to the CPU set
• Displace an already placed RFUOP on the RFU
– Cost function
• Penalty in rejecting modules (sum of volumes of the RFUOPs in the CPU set)
• No overlap allowed during annealing
• Greedy
– Sort the modules on decreasing volume, apply KAMER
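The annealing-based floorplanner can be sketched roughly as follows, assuming each RFUOP is a (w, h, t_start, t_end) box whose time interval is fixed by the schedule. The three moves and the cost (summed volume of the CPU set, with no overlaps allowed) follow the slide; the initial temperature, the geometric cooling factor, the 50/50 choice between displacing and moving back to the CPU, and all names are assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Mod = Tuple[int, int, int, int]    # (w, h, t_start, t_end); schedule is fixed
Pos = Optional[Tuple[int, int]]    # (x, y) on the RFU, or None = CPU set


def overlaps(a: Mod, pa: Tuple[int, int], b: Mod, pb: Tuple[int, int]) -> bool:
    """True if the two 3-D boxes (x, y, t) intersect."""
    return (pa[0] < pb[0] + b[0] and pb[0] < pa[0] + a[0] and
            pa[1] < pb[1] + b[1] and pb[1] < pa[1] + a[1] and
            a[2] < b[3] and b[2] < a[3])


def penalty(mods: List[Mod], pos: List[Pos]) -> int:
    """Cost: summed volume (w * h * duration) of the modules left in the CPU set."""
    return sum(w * h * (t1 - t0) for (w, h, t0, t1), p in zip(mods, pos) if p is None)


def anneal(W: int, H: int, mods: List[Mod], steps: int = 3000, seed: int = 0):
    rng = random.Random(seed)
    pos: List[Pos] = [None] * len(mods)  # start with every RFUOP on the CPU
    cur = penalty(mods, pos)
    best, best_pos = cur, pos[:]
    temp = max(1.0, max(w * h * (t1 - t0) for w, h, t0, t1 in mods))
    for _ in range(steps):
        i = rng.randrange(len(mods))
        w, h = mods[i][0], mods[i][1]
        cand = pos[:]
        if pos[i] is None or rng.random() < 0.5:
            # CPU -> RFU move, or displace an RFUOP already on the RFU
            if w > W or h > H:
                continue
            q = (rng.randrange(W - w + 1), rng.randrange(H - h + 1))
            if any(j != i and pos[j] is not None and overlaps(mods[i], q, mods[j], pos[j])
                   for j in range(len(mods))):
                continue                  # no overlap allowed during annealing
            cand[i] = q
        else:
            cand[i] = None                # RFU -> CPU move
        new = penalty(mods, cand)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            pos, cur = cand, new
            if cur < best:
                best, best_pos = cur, pos[:]
        temp *= 0.998                     # geometric cooling (assumed schedule)
    return best, best_pos
```

Rejecting overlapping moves outright keeps every visited state legal, so the best state seen can be returned directly as a floorplan.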

Page 35

Our Current 3-D Floorplanners (cont.)

• KAMER-BF-Decreasing (KAMER-BFD)
– Sort the modules on their volumes
– Use KAMER to find a fast placement of the modules
• Low-temperature annealing (LTSA)
– Similar to KAMER-BFD, but use KAMER to place only the X% largest modules
– Use low-temperature annealing to place the rest
• Zero-temperature annealing (ZTSA), greedy
– Use KAMER to place as many modules as you can
– Use only the displace and move-from-CPU-to-RFU annealing moves

Page 36

Our Current 3-D Floorplanners (cont.)

• BFOP: Best Fit Online Placement
– Sort the RFUOPs on volume (decreasing)
– For each RFUOP, find candidate “corners”
– Choose the corner that results in minimum wasted area (similar to the well-studied 2-D bin-packing problem)

[Figure: the floor corresponding to time t1 in the (x, y, t) floorplan, with its candidate corners at t1.]
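A greedy sketch in the spirit of BFOP, under the same (w, h, t_start, t_end) assumption: RFUOPs are taken in decreasing volume, and candidate corners are the RFU origin plus the right and top corners of already placed RFUOPs whose lifetimes overlap. The slide's minimum-wasted-area choice is replaced here by a simpler bottom-left choice among feasible corners, so this is not BFOP itself; `greedy_corners` and the helpers are hypothetical names.

```python
from typing import List, Optional, Tuple

Mod = Tuple[int, int, int, int]  # (w, h, t_start, t_end)


def volume(m: Mod) -> int:
    return m[0] * m[1] * (m[3] - m[2])


def collide(a: Mod, pa: Tuple[int, int], b: Mod, pb: Tuple[int, int]) -> bool:
    """3-D overlap test in (x, y, t)."""
    return (pa[0] < pb[0] + b[0] and pb[0] < pa[0] + a[0] and
            pa[1] < pb[1] + b[1] and pb[1] < pa[1] + a[1] and
            a[2] < b[3] and b[2] < a[3])


def greedy_corners(W: int, H: int, mods: List[Mod]) -> List[Optional[Tuple[int, int]]]:
    order = sorted(range(len(mods)), key=lambda i: -volume(mods[i]))  # decreasing volume
    pos: List[Optional[Tuple[int, int]]] = [None] * len(mods)
    for i in order:
        w, h, t0, t1 = mods[i]
        # Candidate corners: the origin plus the right and top corners of placed
        # modules that share time with this one (the "floors" it passes through).
        corners = {(0, 0)}
        for j, p in enumerate(pos):
            if p is not None and t0 < mods[j][3] and mods[j][2] < t1:
                corners.add((p[0] + mods[j][0], p[1]))
                corners.add((p[0], p[1] + mods[j][1]))
        feasible = [c for c in corners
                    if c[0] + w <= W and c[1] + h <= H
                    and not any(p is not None and collide(mods[i], c, mods[j], p)
                                for j, p in enumerate(pos))]
        if feasible:  # bottom-left stand-in for the min-wasted-area rule
            pos[i] = min(feasible, key=lambda c: (c[1], c[0]))
    return pos  # None = left on the CPU
```

Because time is one of the three dimensions, a module whose lifetime starts when earlier ones end can reuse the same corner, as the last module in the usage check does.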

Page 37

Annealing-Based Offline vs. Online

Algorithm    Dataset  Offline acc. rate  Online acc. rate  Ratio    Offline penalty  Online penalty  Ratio
LTSA X=100%  T50      70                 84                83.33%   147287           213153          69.10%
             T100     72                 83                86.75%   253566           307879          82.36%
             S100     86                 84                102.38%  464049           508923          91.18%
             S200     81                 89.5              90.50%   539435           612623          88.05%
             S1024    84.5               84.6              99.88%   4468662          4643786         96.23%
             A1024    87                 89                97.75%   427761           456627          93.68%
             Avg      80.08              85.68             93.43%   1050126          1123831         86.77%
LTSA X=20%   T50      76                 84                90.48%   148975           213153          69.89%
             T100     82                 83                98.79%   225603           307879          73.28%
             S100     81                 84                96.43%   287153           508923          56.42%
             S200     85.5               89.5              95.53%   359980           612623          58.76%
             A1024    81                 89                91.01%   213036           456627          46.65%
             Avg      81.10              85.90             94.45%   246949           419841          61.00%

Percentage of accepted modules and penalties using two offline parameters. The higher the RFU acceptance rate and the lower the penalty, the better the algorithm.

Page 38

Offline Placement Results - All

Comparison of different offline algorithms

[Chart: penalty of placement (0 to 700000) on data files Tiny50, Tiny100, Small100, Small200 and A100 for KAMER-BFD, LTSA, ZTSA and BFOP.]


Page 40

Flexible Modules

• Library of soft templates
– Flexible shapes
• Constant area, different width and height
• Problem? Hard to build (physical design must be done for each shape)
– Median
• Use the same area, but a square shape
– Rotation
• Placement method
– Use the best shape (minimum wasted area)
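A sketch of the shape options, assuming integer dimensions: a soft template's flexible shapes are all w × h pairs with the same area (this list already contains every rotation), the median shape is approximated by the smallest integer square with at least that area, and placement picks the (shape, empty rectangle) pair with minimum wasted area. All names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Shape = Tuple[int, int]           # (w, h)
Rect = Tuple[int, int, int, int]  # empty rectangle (x, y, w, h)


def flexible_shapes(area: int) -> List[Shape]:
    """All integer w x h shapes with exactly the given area (includes rotations)."""
    return [(w, area // w) for w in range(1, area + 1) if area % w == 0]


def median_shape(area: int) -> Shape:
    """Smallest integer square whose area is at least the module's area."""
    s = math.isqrt(area)
    if s * s < area:
        s += 1
    return (s, s)


def best_fit(shapes: List[Shape], ers: List[Rect]) -> Optional[Tuple[int, Tuple[int, int], Shape]]:
    """Return (wasted area, ER corner, shape) with minimum wasted area, or None."""
    best = None
    for w, h in shapes:
        for x, y, ew, eh in ers:
            if w <= ew and h <= eh:
                waste = ew * eh - w * h
                if best is None or waste < best[0]:
                    best = (waste, (x, y), (w, h))
    return best
```

With an area-6 module and empty rectangles of 2 × 6 and 3 × 3, the 2 × 3 shape in the 3 × 3 rectangle wastes only 3 cells, whereas a fixed 6 × 1 shape fits nowhere.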

Page 41

Using Flexible Modules in BFOP

[Chart: quality improvement (percentage, 0% to 120%) when using flexible modules, for data files Tiny50, Tiny100, Small100, Small200, Small1024, A100, A2048 and the average; series: Median and Median/Rotation.]

Median uses a square module with the same area.

Page 42

Flexible Modules (cont.)

• “Firm” templates
– Slice the module into x horizontal or vertical strips
– If the module cannot be placed, use the 2-split, 3-split, … until it fits.
• Problem?
– Routing!
– Only limited module types can be split (like carry chains, etc., with minimal communication between stages)

[Figure: vertical 3-split.]
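A sketch of the k-split search on an occupancy grid: try the unsplit module first, then 2, 3, … near-equal vertical or horizontal strips, each placed independently with a bottom-left scan, until all strips of one split fit. The strip sizing, the vertical-before-horizontal order, the default limit of six strips and all names are assumptions, and the routing between strips, the main problem noted on this slide, is not modeled.

```python
from typing import List, Optional, Tuple


def strips(w: int, h: int, k: int, vertical: bool) -> List[Tuple[int, int]]:
    """Cut a w x h module into k strips of near-equal size."""
    n = w if vertical else h
    sizes = [n // k + (1 if s < n % k else 0) for s in range(k)]
    return [(s, h) if vertical else (w, s) for s in sizes]


def place_bl(grid, w: int, h: int) -> Optional[Tuple[int, int]]:
    """Bottom-left first fit on an occupancy grid (grid[y][x] True = used)."""
    H, W = len(grid), len(grid[0])
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if all(not grid[r][c] for r in range(y, y + h) for c in range(x, x + w)):
                return x, y
    return None


def place_firm(grid, w: int, h: int, max_k: int = 6):
    """Try no split, then 2-split, 3-split, ... until every strip fits."""
    for k in range(1, max_k + 1):
        for vertical in (True, False):
            if k > (w if vertical else h) or (k == 1 and not vertical):
                continue                  # too many strips, or duplicate no-split case
            trial = [row[:] for row in grid]
            spots = []
            for sw, sh in strips(w, h, k, vertical):
                spot = place_bl(trial, sw, sh)
                if spot is None:
                    break                 # this split does not fit; try the next one
                for r in range(spot[1], spot[1] + sh):
                    for c in range(spot[0], spot[0] + sw):
                        trial[r][c] = True
                spots.append(spot)
            else:
                grid[:] = trial           # commit all strips at once
                return k, vertical, spots
    return None                           # reject even with max_k strips
```

On a 4 × 4 RFU with one fully occupied column, a 3 × 2 module does not fit whole, but its vertical 2-split (a 2 × 2 and a 1 × 2 strip) does.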

Page 43

Quality Improvements Using Firm Templates

Placement improvement when using firm templates (in OBFD)

[Chart: percentage improvement over no-split (0% to 100%) for Split-2 through Split-6, on data files Tiny50, Tiny100, Small100, Small200, Small1024, A100, A1024 and the average.]


Page 45

Conclusion

• Which online algorithm?
– If speed is an issue, SSEG; otherwise, KAMER
• Online or offline?
– If you have the schedule => offline
• Which offline algorithm?
– BFOP is the best (faster and better quality)
• Median? Flexibility? Firm templates?
– Surprisingly, median gives little improvement
– If flexible shapes are available, they are better than splitting (no additional routing problem)
– How many splits?
• From no-split to 2-split: 23% improvement
• From 5-split to 6-split: 3% improvement