Page 1: Address comments to lhe@ee.ucla.edu FPGA Area Reduction by Multi-Output Sequential Resynthesis Yu Hu 1, Victor Shih 2, Rupak Majumdar 2 and Lei He 1 1

FPGA Area Reduction by Multi-Output Sequential Resynthesis

Yu Hu (1), Victor Shih (2), Rupak Majumdar (2) and Lei He (1)

(1) Electrical Engineering Dept., UCLA

(2) Computer Science Dept., UCLA

Presented by Yu Hu

Page 2: Outline

Background and Motivation

Combinational Resynthesis with MIMO Blocks

Sequential Resynthesis

Experimental Results

Conclusion and Future Work

Page 3: Background

Area-optimal Technology Mapping for LUT-based FPGAs is NP-Hard [Farrahi, TCAD’94]

Post-mapping resynthesis is effective to reduce area (LUT#) [Ling, DAC’05]

Design flow: RTL Synthesis, Logic Synthesis, Technology Mapping, Resynthesis, Packing, P&R

Resynthesis can target area reduction, fault tolerance, power optimization, physical-aware optimization, and many others.

Page 4: Boolean Matching Based Resynthesis

Attempt to re-map a logic block to reduce LUT#

BM can be used to handle both homogeneous and heterogeneous PLBs

(Source: Andrew Ling, University of Toronto, DAC'05)

Page 5: Overall Flow of BM-based Resynthesis

Multiple iterations of block-based Boolean matching

(Source: Andrew Ling, University of Toronto, DAC'05)

Page 6: Limitations of Existing Work

Existing work considers only single-output logic blocks

Existing work considers only the combinational portion of the circuit

A larger solution space can be explored, and area could be reduced further, if
  multiple-output logic blocks are considered
  FF boundaries are eliminated

Page 7: Motivation Example – Retiming

[Figure: a 2-LUT network with LUTs a, b, c, d, e, inputs x1, x2, x3 and outputs O1, O2, drawn twice to illustrate retiming.]

Resynthesis is restricted by FF boundaries …

Retiming creates chances for resynthesis

Page 8: Motivation Example – MISO Resynthesis

[Figure: the original 2-LUT network (LUTs a, b, c, d, e; inputs x1, x2, x3; outputs O1, O2) next to the MISO-resynthesized 2-LUT network (LUTs f, c, g, e).]

Function of O2 has to be preserved … Only 1-LUT reduction

Page 9: Motivation Example – MIMO Resynthesis

[Figure: the original 2-LUT network (LUTs a, b, c, d, e; inputs x1, x2, x3; outputs O1, O2) next to the MIMO-resynthesized 2-LUT network (LUTs h, i).]

60% area reduction is obtained by sequential MIMO resynthesis!

Page 10: Major Contributions

Present a Boolean matching based resynthesis algorithm considering multi-output logic blocks

Propose a sequential resynthesis technique
  Reduces area by up to 10% compared to combinational resynthesis, when both use MIMO blocks

Page 11: Outline

Background and Motivation

Combinational Resynthesis with MIMO Blocks
  SAT-based Boolean Matching for Multiple Output Functions
  Resynthesis Algorithm
  Experimental Results

Sequential Resynthesis

Experimental Results

Conclusion and Future Work

Page 12: Existing Boolean Matching for MISO

[Figure: a function f compared against a circuit g built from 2-LUTs.]

Formulate the sub-problem of resynthesis as Boolean matching (BM)
  BM: Can function f be implemented in circuit g?
  Resynthesis: Is there a configuration of g such that, for all inputs to g, f is equivalent to g?

(Source: Andrew Ling, University of Toronto, DAC'05)

Page 13: SAT-BM for Multi-Output Functions

[Figure: truth tables of the target functions f1 and f2 over inputs x1, x2, x3, and a PLB (X, F) made of two 2-LUTs: LUT1 takes x1, x2 and produces F2; LUT2 takes F2, x3 and produces F1.]

Characteristic function of a 2-LUT with configuration bits L0 to L3:

G_LUT[i1, i2, F] =
  ( i1 +  i2 + ¬L0 + F) ( i1 +  i2 + L0 + ¬F)
  ( i1 + ¬i2 + ¬L1 + F) ( i1 + ¬i2 + L1 + ¬F)
  (¬i1 +  i2 + ¬L2 + F) (¬i1 +  i2 + L2 + ¬F)
  (¬i1 + ¬i2 + ¬L3 + F) (¬i1 + ¬i2 + L3 + ¬F)

G = G_LUT1[x1, x2, F2] · G_LUT2[F2, x3, F1]

Configuration bits are encoded as SAT literals
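The clause list above can be sanity-checked by enumeration. The following is a minimal sketch, not the authors' implementation: the helper names `lut_clauses`, `satisfied` and `check_encoding` and the integer variable numbering are assumptions made for illustration. It builds the eight clauses of G_LUT and confirms that they hold exactly when F equals the configuration bit selected by (i1, i2).

```python
from itertools import product

def lut_clauses(i1, i2, f, cfg):
    """Clauses of G_LUT[i1, i2, F] for a 2-input LUT.

    Variables are positive integers; a negative integer is a negated
    literal. cfg = (L0, L1, L2, L3) are the configuration-bit variables.
    """
    clauses = []
    for row, (a, b) in enumerate(product((0, 1), repeat=2)):
        # These two literals are both false only when (i1, i2) == (a, b).
        s1 = -i1 if a else i1
        s2 = -i2 if b else i2
        clauses.append((s1, s2, -cfg[row], f))   # L_row implies F
        clauses.append((s1, s2, cfg[row], -f))   # F implies L_row
    return clauses

def satisfied(clauses, assignment):
    """True if every clause has at least one true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def check_encoding():
    """Exhaustively compare the CNF with the LUT semantics F = L[2*i1 + i2]."""
    i1, i2, f, cfg = 1, 2, 3, (4, 5, 6, 7)
    clauses = lut_clauses(i1, i2, f, cfg)
    for bits in product((False, True), repeat=7):
        assignment = dict(zip(range(1, 8), bits))
        selected = assignment[cfg[2 * assignment[i1] + assignment[i2]]]
        if satisfied(clauses, assignment) != (assignment[f] == selected):
            return False
    return True

print(check_encoding())
```

The signed-integer literal convention mirrors the DIMACS style accepted by most SAT solvers, so the same clause list could be handed to a solver instead of being enumerated.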

Page 14: SAT-BM for Multi-Output Functions

[Figure: truth tables of the target functions f1 and f2 over inputs x1, x2, x3, and the two-LUT PLB (X, F), with signals labeled F1, F2, G1 and G2.]

G = G_LUT1[x1, x2, F2] · G_LUT2[F2, x3, F1]

Replicated SAT Problem:

G_expand = G[X/000, F1/0, F2/0] · G[X/001, F1/0, F2/0] ·
           G[X/010, F1/1, F2/0] · G[X/011, F1/0, F2/0] ·
           G[X/100, F1/1, F2/0] · G[X/101, F1/0, F2/0] ·
           G[X/110, F1/1, F2/1] · G[X/111, F1/1, F2/1]

The solution of this SAT problem corresponds to the Boolean matching result (SAT!)
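The replication idea can be illustrated without a SAT solver. The sketch below is an assumption-laden stand-in: it enumerates all 2^8 configurations by brute force where the slides use miniSAT, the names `plb_outputs` and `boolean_match` are hypothetical, and the target functions are chosen for illustration rather than taken from the truth table on the slide. Each row of the target table plays the role of one factor of G_expand: all rows constrain the same shared configuration bits.

```python
from itertools import product

def plb_outputs(cfg1, cfg2, x1, x2, x3):
    """Evaluate the two-LUT PLB: F2 = LUT1(x1, x2), F1 = LUT2(F2, x3)."""
    f2 = cfg1[2 * x1 + x2]
    f1 = cfg2[2 * f2 + x3]
    return f1, f2

def boolean_match(target):
    """Return (cfg1, cfg2) realizing target, or None if no match exists.

    target maps each input triple (x1, x2, x3) to the required (F1, F2).
    Every row adds one replicated constraint on the shared configuration.
    """
    for bits in product((0, 1), repeat=8):
        cfg1, cfg2 = bits[:4], bits[4:]
        if all(plb_outputs(cfg1, cfg2, *x) == out for x, out in target.items()):
            return cfg1, cfg2
    return None

# Illustrative target: F2 = x1 AND x2, F1 = F2 OR x3.
target = {(a, b, c): ((a & b) | c, a & b)
          for a, b, c in product((0, 1), repeat=3)}
match = boolean_match(target)
print(match)

# A target this PLB cannot realize: F1 = x1 XOR x2 XOR x3 with F2 = x1 AND x2.
bad = {(a, b, c): (a ^ b ^ c, a & b)
       for a, b, c in product((0, 1), repeat=3)}
print(boolean_match(bad))
```

Brute force is only workable here because the block has eight configuration bits; for the 4-LUT templates used in the experiments the search space is far larger, which is where a SAT formulation pays off.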

Page 15: Unique Problem of MIMO Synthesis

MIMO resynthesis can generate new paths in the block

The new path might cause combinational cycles

Conservative solution: detect combinational cycles and discard resynthesis solutions with cycles

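[Figure: a network of nodes 1 to 5 between PI and PO, drawn before and after MIMO resynthesis; a new path through the resynthesized block (possibly a false path) closes a combinational cycle.]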
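The conservative check can be sketched as a standard depth-first search for a back edge. This is a minimal illustration, not the authors' code; the function name `has_combinational_cycle` and the adjacency-dictionary format are assumptions, and edges that pass through FFs are assumed to have been left out of the graph beforehand.

```python
def has_combinational_cycle(fanouts):
    """Detect a cycle in a directed graph given as {node: [successors]}.

    Edges through FFs are assumed to be excluded, so any cycle found is
    purely combinational.
    """
    state = {}  # node -> "active" (on the DFS stack) or "done"

    def visit(node):
        state[node] = "active"
        for succ in fanouts.get(node, ()):
            seen = state.get(succ)
            # Reaching an active node means we found a back edge.
            if seen == "active" or (seen is None and visit(succ)):
                return True
        state[node] = "done"
        return False

    return any(state.get(n) is None and visit(n) for n in list(fanouts))

# Illustrative graphs (node names are arbitrary).
acyclic = {"PI": [1], 1: [2, 3], 2: [4], 3: [4], 4: ["PO"]}
cyclic = {"PI": [1], 1: [2], 2: [3], 3: [1, "PO"]}
print(has_combinational_cycle(acyclic), has_combinational_cycle(cyclic))
```

A purely structural check like this also rejects cycles that run through false paths, which is why the slides call the approach conservative.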

Page 16: Experimental Settings

Implementation in OAGear
  SAT-BM uses miniSAT2.0

20 biggest MCNC benchmarks are tested
  10 combinational
  10 sequential
  mapped with 4-LUTs by Berkeley ABC

Resynthesis settings
  One traversal is performed
  Blocks with up to 10 inputs are considered

Results are verified by ABC equivalency checkers

Page 17: Experimental Settings – PLB templates

All three possible structures for PLBs with up to 10 inputs and fewer than four 4-LUTs [Ling, DAC’05]

All intermediate wires are treated as the outputs in MIMO resynthesis

[Figure: the three PLB templates built from 4-LUTs: two three-LUT structures with inputs X1 to X10 and one two-LUT structure with inputs X1 to X7; the outputs are labeled G, F, F1 and F2.]

Page 18: Combinational Resynthesis: MISO vs. MIMO

MIMO does not outperform MISO significantly, probably due to
  Rejecting “false paths” introduced by MIMO resynthesis
  Narrow PLB templates
  Small block size and LUT size
  No iterations of resynthesis

Circuit   ABC     LUT#                              Runtime (min)
                  MISO             MIMO             MISO    MIMO
alu4      720     716 (-0.56%)     714 (-0.83%)     29      870
apex2     965     959 (-0.62%)     958 (-0.73%)     14      429
apex4     791     790 (-0.13%)     789 (-0.25%)     1       247
des       1249    1243 (-0.48%)    1242 (-0.56%)    69      347
ex1010    1103    1090 (-1.18%)    1090 (-1.18%)    55      251
ex5p      541     538 (-0.55%)     538 (-0.55%)     1       58
misex3    736     733 (-0.41%)     730 (-0.82%)     8       406
pdc       2210    2194 (-0.72%)    2191 (-0.86%)    21      241
seq       998     995 (-0.30%)     994 (-0.40%)     2       434
spla      2126    2096 (-1.41%)    2096 (-1.41%)    38      198
ave       1144    1135 (-0.64%)    1134 (-0.76%)    24      348
ratio             1                99.9%            1       15X

Page 19: Outline

Background and Motivation

Combinational Resynthesis with MIMO Blocks

Sequential Resynthesis

Experimental Results

Conclusion and Future Work

Page 20: Structure Impact on Sequential Resynthesis

The structure of a logic block decides the sequential resynthesis strategy

Retiming
  Classic retiming
    All edges have non-negative weights after retiming
  Peripheral retiming
    Results in a negative number of FFs on peripheral edges

Logic Duplication
  Allow duplication
  Do not allow duplication
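The distinction between the two retiming styles can be made concrete with the usual retiming equation, where an edge u to v carrying w FFs carries w + r(v) - r(u) FFs after node lags r are applied. The sketch below is illustrative only: the function names, the edge-dictionary format and the notion of a `boundary` node set are assumptions, not part of the slides.

```python
def retimed_weights(edges, r):
    """edges: {(u, v): ff_count}; r: {node: lag}. Returns retimed FF counts."""
    return {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in edges.items()}

def is_classic(edges, r):
    """Classic retiming: every edge keeps a non-negative FF count."""
    return all(w >= 0 for w in retimed_weights(edges, r).values())

def is_peripheral_only(edges, r, boundary):
    """Peripheral retiming: negative counts are tolerated only on edges
    that touch a block input or output (the nodes in boundary)."""
    return all(w >= 0 or u in boundary or v in boundary
               for (u, v), w in retimed_weights(edges, r).items())

# Illustrative block: in -> a -> out with one FF after node a.
edges = {("in", "a"): 0, ("a", "out"): 1}
backward = {"a": 1}    # move the FF from the output of a to its input
forward = {"a": -1}    # ask for an FF that the input edge does not have

print(retimed_weights(edges, backward))
print(is_classic(edges, backward), is_classic(edges, forward))
print(is_peripheral_only(edges, forward, {"in", "out"}))
```

In the second case the input edge ends up with -1 FF, which is the situation the slides describe as borrowing FFs from outside the block.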

Page 21: Case I: Classic Retiming w/o Duplication

[Figure: a block of 4-LUTs and FFs, shown in its original form and after each of the steps below.]

Step1: backward retiming

Step2: combinational resynthesis

Step3: forward retiming

Page 22: Case II: Peripheral Retiming w/o Duplication

[Figure: a block of three 3-LUTs and one FF with inputs x1 to x5 and outputs out1, out2, shown in four panels; after peripheral retiming a "-1 FF" appears at the block boundary, and resynthesis reduces the block to two 3-LUTs.]

Step1: peripheral retiming

Step2: combinational resynthesis

Step3: check feasibility of forward retiming

Borrow FFs from outside.

A resynthesis solution w/ feasible retiming

Page 23: Case II: Peripheral Retiming w/o Duplication

[Figure: the same four-panel example (inputs x1 to x5, outputs out1, out2); in the last panel the "-1 FF" is gone and the two-LUT block carries only regular FFs.]

Step4: forward retiming

Page 24: Case III: Retiming w/ Duplication

[Figure: a block of four LUTs and one FF with inputs x1 to x4; paths through the block are annotated FF# = 1 and FF# = 0.]

FF not movable!

Duplication is required to enable retiming!

Page 25: Case III: Peripheral Retiming w/ Duplication

[Figure: six panels on a four-LUT block with inputs x1 to x4 and an FF that is not movable. One LUT is duplicated into LUT-a and LUT-b (with duplicated copies of inputs x2 and x3), the FFs are retimed, the block is resynthesized into a version containing LUT-c and LUT-d, and the final panel shows a three-LUT block with FFs.]

Identical configuration for LUT-c and LUT-d.

Page 26: Duplication or Not? – A Sufficient and Necessary Condition

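An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91]

a. All paths from a given input i to a given output j have the same FF#

b. There exist numbers αi and βj for input i and output j, s.t. FF# in (i,j) path is equal to (αi+βj)

Example (rows are outputs, columns are inputs, * marks pairs with no path):

[ α1+β1  α2+β1  α3+β1  α4+β1 ]   [ 1  0  1  1 ]
[   *      *    α3+β2  α4+β2 ] = [ *  *  0  0 ]

α1 = 1, α2 = 0, α3 = 1, α4 = 1, β1 = 0, β2 = -1

[Figure: the example block of three LUTs with FFs, inputs labeled α1 to α4 and outputs out1, out2 labeled β1, β2, drawn before and after retiming; a "-1 FF" appears at the boundary.]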
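Condition (b) can be tested by simple propagation over the input/output pairs. The sketch below is one straightforward way to do it and is not claimed to be the algorithm from [Brayton, TCAD’91]; the function name `decompose` and the dictionary format are assumptions. It fixes one value per connected group of inputs and outputs, derives the rest from αi + βj = FF#(i, j), and reports failure on the first contradiction.

```python
from collections import deque

def decompose(ff):
    """Find alpha, beta with ff[(i, j)] == alpha[i] + beta[j] for every
    connected input/output pair, or return None if none exist.

    ff maps (input, output) pairs to the FF count on their paths; pairs
    with no path (the '*' entries) are simply absent.
    """
    adj = {}
    for (i, j), w in ff.items():
        adj.setdefault(("in", i), []).append((("out", j), w))
        adj.setdefault(("out", j), []).append((("in", i), w))
    val = {}
    for root in adj:
        if root in val:
            continue
        val[root] = 0          # free choice: one degree of freedom per group
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for other, w in adj[node]:
                if other not in val:
                    val[other] = w - val[node]
                    queue.append(other)
                elif val[other] + val[node] != w:
                    return None
    alpha = {n: v for (kind, n), v in val.items() if kind == "in"}
    beta = {n: v for (kind, n), v in val.items() if kind == "out"}
    return alpha, beta

# FF counts from the example matrix: inputs 1..4, outputs 1..2.
ff = {(1, 1): 1, (2, 1): 0, (3, 1): 1, (4, 1): 1, (3, 2): 0, (4, 2): 0}
alpha, beta = decompose(ff)
print(alpha, beta)

# A block that needs duplication: no alpha/beta assignment fits.
print(decompose({(1, 1): 0, (1, 2): 1, (2, 1): 0, (2, 2): 0}))
```

The values returned for the example differ from those on the slide by a constant shift (every α lowered by 1, every β raised by 1), which is expected: adding t to all αi and subtracting t from all βj leaves every sum unchanged.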

Page 27: Duplication or Not? – A Sufficient and Necessary Condition

An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91]

a. All paths from a given input i to a given output j have the same FF#

b. There exist numbers αi and βj for input i and output j, s.t. FF# in (i,j) path is equal to (αi+βj)

Time complexity O(e min(m,n))
  Negligible for small blocks

Classic or peripheral retiming?
  Classic retiming iff there exist non-negative αi and βj
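Given one valid pair (α, β), the non-negativity question reduces to a single inequality, under the assumption that the block's inputs and outputs form one connected group, so that every other solution is (αi - t, βj + t) for some integer t. A shift with αi - t ≥ 0 and βj + t ≥ 0 exists exactly when min α + min β ≥ 0. The function names below are hypothetical.

```python
def retiming_kind(alpha, beta):
    """Classify a block as 'classic' or 'peripheral'.

    Assumes alpha, beta satisfy FF#(i, j) = alpha[i] + beta[j] and that
    the block is connected, so solutions differ only by a common shift t.
    """
    if min(alpha.values()) + min(beta.values()) >= 0:
        return "classic"
    return "peripheral"

def nonnegative_solution(alpha, beta):
    """Return a shifted (alpha, beta) with no negative entry, or None."""
    if retiming_kind(alpha, beta) != "classic":
        return None
    t = min(alpha.values())  # largest shift that keeps every alpha >= 0
    return ({i: a - t for i, a in alpha.items()},
            {j: b + t for j, b in beta.items()})

# Values from the example on the previous page.
alpha = {1: 1, 2: 0, 3: 1, 4: 1}
beta = {1: 0, 2: -1}
print(retiming_kind(alpha, beta))

# An illustrative block where a non-negative assignment exists.
print(nonnegative_solution({1: 2, 2: 3}, {1: -1}))
```

For the slide's example the smallest α is 0 and the smallest β is -1, so no shift removes the negative entry and the block falls under peripheral retiming, which matches the "-1 FF" drawn in the figure.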

Page 28: Can We Accept Every Single Resynthesis? – Feasibility Checking for Sequential Resynthesis

Initial State Computation
  Filter out some of the rewriting steps so that an equivalent initial state for the synthesized machine can be computed from a given initial state of the original machine.
  Rewriting invariant [Brayton, IWLS’07]
  Can be reduced to a SAT problem

Clock Period Preservation
  A New Retiming-based Technology Mapping Algorithm for LUT-based FPGAs [Pan, FPGA’98]
  Sequential arrival time: l-values

Page 29: Experimental Results – Sequential vs. Combinational Resynthesis

Seq-resynthesis obtains up to 9% area reduction

Factors that affect seq-resynthesis
  Sequential structure
  All factors in combinational resynthesis

Circuit    ABC     LUT#                                Runtime (min)
                   Comb              Seq               Comb    Seq
bigkey     1261    1261 (0.00%)      1244 (-1.35%)     2709    1898
clma       4210    4167 (-1.02%)     4116 (-2.23%)     2697    3825
diffeq     674     674 (0.00%)       673 (-0.15%)      655     856
dsip       1554    1330 (-14.41%)    1338 (-13.90%)    705     1481
elliptic   441     419 (-4.99%)      419 (-4.99%)      32      370
frisc      2841    2660 (-6.37%)     2595 (-8.66%)     1364    1537
s298       44      41 (-6.82%)       37 (-15.91%)      186     125
s38417     3134    3105 (-0.93%)     3117 (-0.54%)     3466    6092
s38584     3720    3654 (-1.77%)     3655 (-1.75%)     2867    8363
tseng      946     935 (-1.16%)      934 (-1.27%)      1331    1492
ave        1883    1825 (-3.75%)     1813 (-5.07%)     1601    2604
Ratio              1                 99.3%             1       1.6X

Page 30: Outline

Background and Motivation

Combinational Resynthesis with MIMO Blocks
  SAT-based Boolean Matching for Multiple Output Functions
  Resynthesis Algorithm

Sequential Resynthesis

Conclusion and Future Work

Page 31: Conclusions and Future Work

Proposed a new resynthesis approach considering both MIMO blocks and retiming

Results indicate that sequential resynthesis obtains more gain than MIMO resynthesis

Future work
  PLBs from [Ling, DAC’05] are optimal only for MISO, and we will develop new PLB structures for MIMO resynthesis
  Study resynthesis for heterogeneous FPGAs

Page 32: Thanks

FPGA Area Reduction by Multi-Output Sequential Resynthesis

Yu Hu, Victor Shih, Rupak Majumdar and Lei He