Address comments to [email protected]
FPGA Area Reduction by Multi-Output Sequential Resynthesis
Yu Hu1, Victor Shih2, Rupak Majumdar2 and Lei He1
1Electrical Engineering Dept., UCLA
2Computer Science Dept., UCLA
Presented by Yu Hu
Outline
Background and Motivation
Combinational Resynthesis with MIMO Blocks
Sequential Resynthesis
Experimental Results
Conclusion and Future Work
Background
Area-optimal Technology Mapping for LUT-based FPGAs is NP-Hard [Farrahi, TCAD’94]
Post-mapping resynthesis is effective to reduce area (LUT#) [Ling, DAC’05]
[Figure: design flow: RTL Synthesis, Logic Synthesis, Technology Mapping, Resynthesis, Packing, P&R]
Resynthesis can target area reduction, fault tolerance, power optimization, physical-aware optimization, and many others.
Boolean Matching Based Resynthesis
Attempt to re-map a logic block to reduce LUT#
BM can be used to handle both homogeneous and heterogeneous PLBs
(Source: Andrew Ling, University of Toronto, DAC'05)
Overall Flow of BM-based Resynthesis
Multi-iterations of block-based Boolean Matching
(Source: Andrew Ling, University of Toronto, DAC'05)
Limitations of Existing Work
Only single-output logic blocks are considered
Only the combinational portion of the circuit is considered
A larger solution space can be explored, and area can be reduced further, if
  Multiple-output logic blocks are considered
  FF boundaries are eliminated
Motivation Example – Retiming
[Figure: a 2-LUT network with LUTs a, b, c, d, e, inputs x1, x2, x3 and outputs O1, O2, shown before and after retiming]
Resynthesis is restricted by FF boundaries …
Retiming creates chances for resynthesis
Motivation Example – MISO Resynthesis
[Figure: the 2-LUT network with LUTs a, b, c, d, e (inputs x1, x2, x3; outputs O1, O2) and the network after MISO resynthesis with LUTs f, c, g, e]
The function of O2 has to be preserved … only a 1-LUT reduction
Motivation Example – MIMO Resynthesis
[Figure: the 2-LUT network with LUTs a, b, c, d, e (inputs x1, x2, x3; outputs O1, O2) and the network after MIMO resynthesis with LUTs h, i]
60% area reduction is obtained by sequential MIMO resynthesis!
Major Contributions
Present a Boolean matching based resynthesis algorithm considering multi-output logic blocks
Propose a sequential resynthesis technique
  Reduces area by up to 10% compared to combinational resynthesis, when both use MIMO blocks
Outline
Background and Motivation
Combinational Resynthesis with MIMO Blocks
  SAT-based Boolean Matching for Multiple Output Functions
  Resynthesis Algorithm
  Experimental Results
Sequential Resynthesis
Experimental Results
Conclusion and Future Work
Existing Boolean Matching for MISO
[Figure: a function f next to a circuit g built from 2-LUTs: can f be implemented in g?]
Formulate the sub-problem of resynthesis as Boolean matching (BM)
  BM: Can function f be implemented in circuit g?
  Resynthesis: Is there a configuration of g such that, for all inputs to g, f is equivalent to g?
(Source: Andrew Ling, University of Toronto, DAC'05)
SAT-BM for Multi-Output Functions
[Figure: truth tables of the target functions f1 and f2 over x1, x2, x3, and a PLB (X, F) built from two 2-LUTs: LUT1 takes x1 and x2 and drives F2; LUT2 takes F2 and x3 and drives F1]
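Characteristic function of a 2-LUT with inputs i1, i2, output F and configuration bits L0 to L3:
\[
\begin{aligned}
G_{LUT}[i_1, i_2, F] ={}& (i_1 + i_2 + \neg L_0 + F)\,(i_1 + i_2 + L_0 + \neg F) \\
& (i_1 + \neg i_2 + \neg L_1 + F)\,(i_1 + \neg i_2 + L_1 + \neg F) \\
& (\neg i_1 + i_2 + \neg L_2 + F)\,(\neg i_1 + i_2 + L_2 + \neg F) \\
& (\neg i_1 + \neg i_2 + \neg L_3 + F)\,(\neg i_1 + \neg i_2 + L_3 + \neg F)
\end{aligned}
\]
Characteristic function of the PLB:
\[
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
\]
Configuration bits are encoded as SAT literals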
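The clause pattern of the LUT characteristic function can be generated mechanically. The following is a minimal sketch, not the authors' implementation: the function names `lut_cnf` and `satisfied`, the DIMACS-style signed-integer literals, and the variable numbering are all assumptions made for illustration.

```python
from itertools import product

def lut_cnf(inputs, out, cfg):
    """Return CNF clauses (lists of signed ints, DIMACS style) that force
    out == cfg[m], where m is the minterm index of the inputs
    (first input is the most significant bit)."""
    k = len(inputs)
    assert len(cfg) == 2 ** k
    clauses = []
    for m, bits in enumerate(product((0, 1), repeat=k)):
        # Guard literals are all false exactly when the inputs equal `bits`.
        guard = [v if b == 0 else -v for v, b in zip(inputs, bits)]
        clauses.append(guard + [-cfg[m], out])  # L_m implies F
        clauses.append(guard + [cfg[m], -out])  # F implies L_m
    return clauses

def satisfied(clauses, assignment):
    """Check a full assignment (dict: variable -> bool) against the clauses."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# Hypothetical numbering: i1=1, i2=2, F=3, L0..L3=4..7
clauses = lut_cnf([1, 2], 3, [4, 5, 6, 7])
```

For a 2-LUT this yields the eight clauses of G LUT; the PLB's characteristic function is then the concatenation of the clause lists of LUT1 and LUT2 with the shared variable F2.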
SAT-BM for Multi-Output Functions
[Figure: the truth tables of f1 and f2 and the two-LUT PLB (X, F) from the previous slide]
\[
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
\]
Replicated SAT Problem:
\[
\begin{aligned}
G_{expand} ={}& G[X/000, F_1/0, F_2/0] \cdot G[X/001, F_1/0, F_2/0] \\
\cdot{}& G[X/010, F_1/1, F_2/0] \cdot G[X/011, F_1/0, F_2/0] \\
\cdot{}& G[X/100, F_1/1, F_2/0] \cdot G[X/101, F_1/0, F_2/0] \\
\cdot{}& G[X/110, F_1/1, F_2/1] \cdot G[X/111, F_1/1, F_2/1]
\end{aligned}
\]
[Figure: the PLB (X, F) with inputs x1, x2, x3 and outputs F1, F2, with labels G1, G2; the solver reports SAT]
The solution of this SAT problem corresponds to the Boolean matching result
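The replication idea (one copy of G per input row, all copies sharing the same configuration bits) can be illustrated without a SAT solver on a block this small. The sketch below is an assumption-laden stand-in: it enumerates the 256 configurations of the two 2-LUTs by brute force instead of calling a solver, the names `eval_plb` and `match` are hypothetical, and the target function is chosen for illustration rather than taken from the slide's truth table.

```python
from itertools import product

def eval_plb(cfg1, cfg2, x1, x2, x3):
    """Evaluate the two-LUT PLB: F2 = LUT1(x1, x2), F1 = LUT2(F2, x3)."""
    f2 = cfg1[2 * x1 + x2]
    f1 = cfg2[2 * f2 + x3]
    return f1, f2

def match(target):
    """Search for configuration bits under which the PLB reproduces
    target[(x1, x2, x3)] == (f1, f2) on every input row.
    Brute force here plays the role of the SAT solver."""
    rows = list(product((0, 1), repeat=3))
    for bits in product((0, 1), repeat=8):
        cfg1, cfg2 = bits[:4], bits[4:]
        # One constraint per input row, all sharing cfg1 and cfg2.
        if all(eval_plb(cfg1, cfg2, *x) == target[x] for x in rows):
            return cfg1, cfg2
    return None  # no configuration matches: the SAT problem is unsatisfiable

# Illustrative target (an assumption): f2 = x1 AND x2, f1 = f2 XOR x3
target = {(x1, x2, x3): ((x1 & x2) ^ x3, x1 & x2)
          for x1, x2, x3 in product((0, 1), repeat=3)}
config = match(target)
```

Because the intermediate wire F2 is constrained as an output, this is the multi-output variant of the matching problem; dropping the F2 component of the comparison would give the single-output case.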
Unique Problem of MIMO Synthesis
MIMO resynthesis can generate new paths in the block
A new path might cause a combinational cycle
Conservative solution: detect combinational cycles and discard resynthesis solutions with cycles
[Figure: a network of nodes 1 to 5 between PI and PO, before and after MIMO resynthesis; the new path (a false path?) closes a combinational cycle]
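The conservative check above amounts to cycle detection on the combinational netlist graph. A minimal sketch follows, assuming the netlist is given as a dictionary from each node to its fanout nodes with edges through FFs already removed; the function name and the example graphs are hypothetical and do not reproduce the figure.

```python
def has_combinational_cycle(fanout):
    """Depth-first search with three colors; a GREY node reached again
    means the current path loops back on itself."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in fanout.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True          # back edge: combinational cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(fanout))

# Hypothetical netlists: `after` adds an edge n4 -> n3 while n3 -> n4 exists.
before = {"PI": ["n1", "n2"], "n1": ["n3"], "n2": ["n4"],
          "n3": ["n4"], "n4": ["n5"], "n5": ["PO"]}
after = dict(before, n4=["n5", "n3"])
```

A resynthesis candidate whose graph makes `has_combinational_cycle` return True would be discarded under the conservative policy, even if the cycle runs through a false path.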
Experimental Settings
Implementation in OAGear
  SAT-BM uses miniSAT2.0
20 biggest MCNC benchmarks are tested
  10 combinational
  10 sequential
  Mapped with 4-LUTs by Berkeley ABC
Resynthesis settings
  One traversal is performed
  Blocks with up to 10 inputs are considered
Results are verified by ABC equivalency checkers
Experimental Settings – PLB templates
All three possible structures for PLBs with up to 10 inputs and fewer than four 4-LUTs [Ling, DAC’05]
All intermediate wires are treated as the outputs in MIMO resynthesis
[Figure: the three PLB templates: two built from three 4-LUTs with inputs X1 to X10, and one built from two 4-LUTs with inputs X1 to X7; G is the block output and F, F1, F2 are intermediate wires]
Combinational Resynthesis: MISO vs. MIMO
MIMO does not outperform MISO significantly, probably due to
  Rejecting “false paths” introduced by MIMO resynthesis
  Narrow PLB templates
  Small block size and LUT size
  No iterations of resynthesis
Circuit   ABC LUT#   MISO LUT#        MIMO LUT#        MISO runtime (min)   MIMO runtime (min)
alu4      720        716 (-0.56%)     714 (-0.83%)     29                   870
apex2     965        959 (-0.62%)     958 (-0.73%)     14                   429
apex4     791        790 (-0.13%)     789 (-0.25%)     1                    247
des       1249       1243 (-0.48%)    1242 (-0.56%)    69                   347
ex1010    1103       1090 (-1.18%)    1090 (-1.18%)    55                   251
ex5p      541        538 (-0.55%)     538 (-0.55%)     1                    58
misex3    736        733 (-0.41%)     730 (-0.82%)     8                    406
pdc       2210       2194 (-0.72%)    2191 (-0.86%)    21                   241
seq       998        995 (-0.30%)     994 (-0.40%)     2                    434
spla      2126       2096 (-1.41%)    2096 (-1.41%)    38                   198
ave       1144       1135 (-0.64%)    1134 (-0.76%)    24                   348
ratio     -          1                99.9%            1                    15X
Outline
Background and Motivation
Combinational Resynthesis with MIMO Blocks
Sequential Resynthesis
Experimental Results
Conclusion and Future Work
Structure Impact on Sequential Resynthesis
The structure of a logic block determines the sequential resynthesis strategy
Retiming
  Classic retiming
    All edges have non-negative weights after retiming
  Peripheral retiming
    Results in a negative number of FFs at peripheral edges
Logic duplication
  Duplication allowed
  Duplication not allowed
[Figure: example blocks of 4-LUTs with FFs at different positions]
Case I: Classic Retiming w/o Duplication
Step1: backward retiming
Step2: combinational resynthesis
Step3: forward retiming
Case II: Peripheral Retiming w/o Duplication
[Figure sequence: a block of 3-LUTs with one FF, inputs x1 to x5 and outputs out1, out2; after peripheral retiming a -1 FF appears at the periphery; after resynthesis the block uses two 3-LUTs]
Step1: peripheral retiming
  Borrow FFs from outside.
Step2: combinational resynthesis
Step3: check feasibility of forward retiming
  A resynthesis solution w/ feasible retiming
Case II: Peripheral Retiming w/o Duplication
[Figure sequence: the same block as on the previous slide, ending with the resynthesized two-LUT block after the borrowed -1 FF has been returned]
Step4: forward retiming
Case III: Retiming w/ Duplication
[Figure: a block of LUTs with inputs x1 to x4 in which one path has FF# = 1 and another has FF# = 0]
FF not movable!
Duplication is required to enable retiming!
Case III: Peripheral Retiming w/ Duplication
[Figure sequence: the FF is not movable in the original block (inputs x1, x2, x3, x4); a LUT is duplicated into LUT-a and LUT-b, with inputs x2 and x3 duplicated; the FFs are retimed; resynthesis yields LUT-c and LUT-d; the duplicated LUTs are merged back]
Identical configuration for LUT-c and LUT-d.
Duplication or Not? – A Sufficient and Necessary Condition
An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91]
a. All paths between the same input-output pair have the same FF#
b. There exist numbers αi and βj for input i and output j, s.t. the FF# on every (i,j) path is equal to (αi+βj)
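\[
\begin{pmatrix}
\alpha_1+\beta_1 & \alpha_2+\beta_1 & \alpha_3+\beta_1 & \alpha_4+\beta_1 \\
* & * & \alpha_3+\beta_2 & \alpha_4+\beta_2
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 1 & 1 \\
* & * & 0 & 0
\end{pmatrix}
\]
\[
\alpha_1 = 1,\ \alpha_2 = 0,\ \alpha_3 = 1,\ \alpha_4 = 1,\ \beta_1 = 0,\ \beta_2 = -1
\]
[Figure: a block of LUTs and FFs with inputs labeled α1 to α4 and outputs out1, out2 labeled β1, β2, before and after peripheral retiming, which leaves a -1 FF at the periphery]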
Duplication or Not? – A Sufficient and Necessary Condition
An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91]
a. All paths between the same input-output pair have the same FF#
b. There exist numbers αi and βj for input i and output j, s.t. the FF# on every (i,j) path is equal to (αi+βj)
Time complexity O(e min(m,n))
  Negligible for small blocks
Classic or peripheral retiming?
  Classic retiming iff there exist non-negative αi and βj
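Condition (b) and the classic-versus-peripheral test can be checked by propagating labels over the input-output pairs. The sketch below is an illustration under stated assumptions, not the algorithm of [Brayton, TCAD’91]: it assumes condition (a) already holds, so `ff[(i, j)]` is the single FF count of input i to output j (pairs with no path are simply absent), and the function name `retiming_labels` is hypothetical. Labels are determined only up to a shift c (αi + c, βj - c) within each connected group, so non-negative labels exist iff min α + min β ≥ 0 in every group.

```python
from collections import deque

def retiming_labels(ff, n_in, n_out):
    """Return (alpha, beta, classic) with ff[(i, j)] == alpha[i] + beta[j]
    for every listed pair, or None if no such labels exist."""
    alpha, beta = [None] * n_in, [None] * n_out
    outs_of = [[] for _ in range(n_in)]
    ins_of = [[] for _ in range(n_out)]
    for i, j in ff:
        outs_of[i].append(j)
        ins_of[j].append(i)
    classic = True
    for seed in range(n_in):
        if alpha[seed] is not None:
            continue
        alpha[seed] = 0                      # free choice: fixes the shift
        comp_a, comp_b = [0], []
        queue = deque([("in", seed)])
        while queue:
            kind, k = queue.popleft()
            if kind == "in":
                for j in outs_of[k]:
                    want = ff[(k, j)] - alpha[k]
                    if beta[j] is None:
                        beta[j] = want
                        comp_b.append(want)
                        queue.append(("out", j))
                    elif beta[j] != want:
                        return None          # inconsistent: duplication needed
            else:
                for i in ins_of[k]:
                    want = ff[(i, k)] - beta[k]
                    if alpha[i] is None:
                        alpha[i] = want
                        comp_a.append(want)
                        queue.append(("in", i))
                    elif alpha[i] != want:
                        return None
        # Some shift makes all labels non-negative iff min(a) + min(b) >= 0.
        if comp_b and min(comp_a) + min(comp_b) < 0:
            classic = False
    beta = [0 if b is None else b for b in beta]
    return alpha, beta, classic

# FF-count matrix from the slide (0-based indices; '*' entries omitted).
ff = {(0, 0): 1, (1, 0): 0, (2, 0): 1, (3, 0): 1, (2, 1): 0, (3, 1): 0}
result = retiming_labels(ff, n_in=4, n_out=2)
```

On the slide's matrix the sketch returns labels that are a shifted version of α = (1, 0, 1, 1), β = (0, -1) and reports that no non-negative labeling exists, which is consistent with the example requiring peripheral retiming.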
Can We Accept Every Single Resynthesis? – Feasibility Checking for Sequential Resynthesis
Initial State Computation
  Filter out some of the rewriting steps so that an equivalent initial state for the synthesized machine can be computed from a given initial state of the original machine.
  Rewriting invariant [Brayton, IWLS’07]
  Can be reduced to a SAT problem
Clock Period Preservation
  A New Retiming-based Technology Mapping Algorithm for LUT-based FPGAs [Pan, FPGA’98]
  Sequential arrival time: l-values
Experimental Results – Sequential vs. Combinational Resynthesis
Seq-resynthesis obtains up to 9% area reduction
Factors that affect seq-resynthesis
  Sequential structure
  All factors in combinational resynthesis
Circuit    ABC LUT#   Comb LUT#         Seq LUT#          Comb runtime (min)   Seq runtime (min)
bigkey     1261       1261 (0.00%)      1244 (-1.35%)     2709                 1898
clma       4210       4167 (-1.02%)     4116 (-2.23%)     2697                 3825
diffeq     674        674 (0.00%)       673 (-0.15%)      655                  856
dsip       1554       1330 (-14.41%)    1338 (-13.90%)    705                  1481
elliptic   441        419 (-4.99%)      419 (-4.99%)      32                   370
frisc      2841       2660 (-6.37%)     2595 (-8.66%)     1364                 1537
s298       44         41 (-6.82%)       37 (-15.91%)      186                  125
s38417     3134       3105 (-0.93%)     3117 (-0.54%)     3466                 6092
s38584     3720       3654 (-1.77%)     3655 (-1.75%)     2867                 8363
tseng      946        935 (-1.16%)      934 (-1.27%)      1331                 1492
ave        1883       1825 (-3.75%)     1813 (-5.07%)     1601                 2604
ratio      -          1                 99.3%             1                    1.6X
Outline
Background and Motivation
Combinational Resynthesis with MIMO Blocks
  SAT-based Boolean Matching for Multiple Output Functions
  Resynthesis Algorithm
Sequential Resynthesis
Conclusion and Future Work
Conclusions and Future Work
Proposed a new resynthesis approach considering both MIMO blocks and retiming
Results indicate that sequential resynthesis obtains more gain than MIMO resynthesis
Future work
  PLBs from [Ling, DAC’05] are optimal only for MISO, and we will develop new PLB structures for MIMO resynthesis
  Study resynthesis for heterogeneous FPGAs
Thanks