Y. Hu, V. Shih, R. Majumdar and L. He, “Exploiting Symmetries to Speedup SAT-based Boolean Matching for Logic Synthesis of FPGAs”, TCAD 2008.
Y. Hu, V. Shih, R. Majumdar and L. He, “FPGA Area Reduction by Multi-Output Function Based Sequential Resynthesis”, DAC 2008.
Y. Hu, Z. Feng, R. Majumdar and L. He, “Robust FPGA Resynthesis Based on Fault-Tolerant Boolean Matching”, ICCAD 2008.
FPGA Resynthesis for Area and Reliability
Abstract
Resynthesis, a circuit rewriting technique in the FPGA CAD flow,
has emerged to cope with the inherent NP-hardness of many
CAD tasks and the ever-increasing design complexity and
logic capacity of FPGAs. Targeting area and reliability
optimization, this project proposes two logic resynthesis
algorithms that use an efficient SAT-based Boolean
matching as the optimization engine. In contrast to existing
resynthesis approaches, the proposed algorithms explore multiple
design freedoms and architecture features to achieve better
quality.
Motivations
Heuristic FPGA synthesis yields sub-optimal results
A 500X gap exists between optimal and heuristic technology mapping [Cong, FPGA’06]
Growing design complexity and FPGA capacity increase the optimality gap
Resynthesis is needed to improve quality (area/performance/power/reliability)
Rewrite the logic or physical design
Perform iterations for design closure
Resynthesis for Area & SER
SAT-based Boolean Matching
MIMO & Retiming for Area
References
Student: Yu Hu (Ph.D., 2009)
Advisor: Lei He
EDA Lab, Electrical Engineering Department, UCLA
http://eda.ee.ucla.edu
Area-Aware Resynthesis
Resynthesis Framework
Area Reduction Results
Exploring Symmetries in BM
SAT-BM can be much faster if we exploit symmetries in
Boolean function, e.g., b and c are symmetric in a(b+c)
FPGA PLB architecture, e.g., pins in an LUT are symmetric
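The functional symmetry mentioned above can be checked directly on small functions. The following is a minimal sketch, assuming a brute-force truth-table comparison; the helper name `is_symmetric` is hypothetical and this is not the symmetry detection used in the SAT-BM implementation.

```python
from itertools import product

def is_symmetric(f, n, i, j):
    """Return True if swapping inputs i and j never changes the output of f.

    f is an n-input Boolean function taking 0/1 arguments.
    """
    for bits in product((0, 1), repeat=n):
        swapped = list(bits)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if f(*bits) != f(*swapped):
            return False
    return True

def f(a, b, c):
    # The example from the poster: a(b+c)
    return a & (b | c)

print(is_symmetric(f, 3, 1, 2))  # b and c: True
print(is_symmetric(f, 3, 0, 1))  # a and b: False
```

A matcher that knows b and c are interchangeable only needs to try one of the two pin assignments, which is where the search-space reduction comes from.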
Proposed Resynthesis
Resynthesis based on LUT reconfiguration
Leverage the inherent flexibility in LUT-based FPGAs
Reduce area without performance degradation
Increase reliability with negligible area overhead
A SAT-based Boolean matching is the key
A formal method ensuring correct-by-construction results
Flexible enough to deal with heterogeneous FPGAs
Efficient proposed implementation for scalability
200X speedup
Multi-iterations of block-based re-mapping
Sequential resynthesis obtains up to 9% area reduction
Factors affecting sequential resynthesis quality:
Sequential structure
PLB templates, the number of iterations
SER model: [Mukherjee, HPCA, 2005]
Assume large industrial FPGA: 330,000 LUTs
Robust PLB Structure
Stochastic Resynthesis
Fault-Tolerant Boolean Matching
MTTF Evaluation
Different synthesis algorithms lead to different area-robustness tradeoffs
Stochastic resynthesis maximizes the yield rate under random faults
No testing overhead, negligible area/performance overhead
[Figure: CAD flow with the stages Logic synthesis, Resynthesis and Physical design; labels: High-level circuit description, Bitstream, Timing info, Fault info, Architecture specification]
Boolean matching answers a Yes-No question
Can PLB p implement Boolean function f?
If yes, give the configuration bits for all LUTs in p.
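The Yes-No question above can be illustrated on a toy instance. This is a minimal sketch that swaps the SAT encoding for exhaustive enumeration, which is only feasible for tiny templates. The PLB template used here (two cascaded 2-input LUTs, `out = LUT_b(LUT_a(x1, x2), x3)`) and all function names are assumptions made for illustration, not the poster's PLB or implementation.

```python
from itertools import product

def lut(config, inputs):
    """Evaluate a k-input LUT; config holds 2**k bits, inputs holds k bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

def plb(cfg_a, cfg_b, x1, x2, x3):
    """Hypothetical PLB template: out = LUT_b(LUT_a(x1, x2), x3)."""
    g = lut(cfg_a, (x1, x2))
    return lut(cfg_b, (g, x3))

def boolean_match(f):
    """Return (cfg_a, cfg_b) if the PLB can implement f, else None."""
    vectors = list(product((0, 1), repeat=3))
    for cfg_a in product((0, 1), repeat=4):
        for cfg_b in product((0, 1), repeat=4):
            if all(plb(cfg_a, cfg_b, *x) == f(*x) for x in vectors):
                return cfg_a, cfg_b  # "yes", with the configuration bits
    return None  # "no": no configuration realizes f

def and_or(a, b, c):
    return (a & b) | c

def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)

print(boolean_match(and_or) is not None)  # True: the template fits
print(boolean_match(majority))            # None: one intermediate wire is not enough
```

A SAT-based formulation answers the same question without enumerating every configuration, which is what makes larger templates tractable.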
[Figure: PLB template PLB_d, built from LUT4, LUT4 and LUT3 blocks]
[Figure: runtime (s), axis from 0.001 to 100, versus number of inputs of the Boolean function (5 to 9), comparing Ling'05, Ours, and Ours+Cong'07]
Each re-mapping is based on SAT-BM
[Figure: multi-output block with nodes a, b, c, d, e, inputs x1, x2, x3 and outputs O1, O2]
Retiming breaks register boundaries for resynthesis
[Figure: block with nodes f, c, g, e, inputs x1, x2, x3 and outputs O1, O2]
Function of O2 has to be preserved, i.e., c and e need to be duplicated, which is not required if MIMO block is considered.
[Figure: block with nodes h, i, inputs x1, x2, x3 and outputs O1, O2]
Case I: Classic retiming w/o duplication
Case II: Peripheral retiming w/o duplication
Case III: Peripheral retiming w duplication
[Figure: retiming examples built from 4-LUT, 3-LUT, LUT and FF blocks, with inputs x1 to x5 and outputs out1, out2]
Circuit    ABC LUT#   Comb LUT#          Seq LUT#           Comb runtime (min)   Seq runtime (min)
bigkey     1261       1261 (0.00%)       1244 (-1.35%)      2709                 1898
clma       4210       4167 (-1.02%)      4116 (-2.23%)      2697                 3825
di_eq      674        674 (0.00%)        673 (-0.15%)       655                  856
dsip       1554       1330 (-14.41%)     1338 (-13.90%)     705                  1481
elliptic   441        419 (-4.99%)       419 (-4.99%)       32                   370
frisc      2841       2660 (-6.37%)      2595 (-8.66%)      1364                 1537
s298       44         41 (-6.82%)        37 (-15.91%)       186                  125
s38417     3134       3105 (-0.93%)      3117 (-0.54%)      3466                 6092
s38584     3720       3654 (-1.77%)      3655 (-1.75%)      2867                 8363
tseng      946        935 (-1.16%)       934 (-1.27%)       1331                 1492
ave        1883       1825 (-3.75%)      1813 (-5.07%)      1601                 2604
Ratio                 1                  99.3%              1                    1.6X
[Figure: fault model of a logic block; labels: Intermediate logics, Fault rate of previous logics, Input faults of LB2, Faults in config-bits]
Both faults in LUT configuration and interconnect are considered and modeled as random variables.
18 synthesis solutions for MCNC-i10
Input:
PLB template H and Boolean function F
Fault rate for the inputs and SRAM bits of PLB H
Output:
Either that F cannot be implemented by PLB H
Or the configuration of H which minimizes the probability that faults are observable in the output of the PLB under all input vectors
Fault-Tolerant BM task breakdown:
Find multiple Boolean matching solutions
Evaluate the stochastic fault rate
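The two-step breakdown above (find multiple matches, then evaluate their fault rate) can be sketched on a toy instance. Everything here is an assumption made for illustration: the PLB template `out = LUT_b(LUT_a(x1, x2), x3)`, the function names, exhaustive enumeration in place of SAT/SSAT, and a simplified fault model in which exactly one SRAM configuration bit is flipped, chosen uniformly, with uniform input vectors and no input faults.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=3))

def lut(config, inputs):
    """Evaluate a k-input LUT; config holds 2**k bits, inputs holds k bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

def plb(cfg, x):
    """Hypothetical template: cfg[:4] configures LUT_a, cfg[4:] configures LUT_b."""
    g = lut(cfg[:4], x[:2])
    return lut(cfg[4:], (g, x[2]))

def matches(f):
    """Step 1: collect every configuration of the PLB that implements f."""
    return [cfg for cfg in product((0, 1), repeat=8)
            if all(plb(cfg, x) == f(*x) for x in INPUTS)]

def observable_fault_rate(cfg):
    """Step 2: fraction of (flipped bit, input vector) pairs with a wrong output."""
    wrong = 0
    for k in range(len(cfg)):
        faulty = cfg[:k] + (1 - cfg[k],) + cfg[k + 1:]
        wrong += sum(plb(faulty, x) != plb(cfg, x) for x in INPUTS)
    return wrong / (len(cfg) * len(INPUTS))

def fault_tolerant_match(f):
    """Return the matching configuration with the lowest fault rate, or None."""
    candidates = matches(f)
    if not candidates:
        return None
    return min(candidates, key=observable_fault_rate)

def passthrough(a, b, c):
    return c  # F depends only on x3

best = fault_tolerant_match(passthrough)
print(observable_fault_rate(best))                      # 0.125
print(observable_fault_rate((0, 0, 0, 0, 0, 1, 1, 0)))  # 0.25, also a valid match
```

Both configurations implement the same function, but the second one exposes faults in LUT_a at the output, while the best one masks them. Choosing among functionally equivalent configurations in this way is the point of the fault-tolerant formulation.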
Deterministic SAT vs. SSAT
Deterministic SAT Stochastic SAT
Faults in intermediate wires
Faults in LUT configurations
[Figure: PLB built from LUT4 blocks with inputs X1 to X12, intermediate signals z1, z2, z3 and outputs G, F1, F2; two LUT2 examples with don't-care entries (x) in their truth tables (00→1, 01→1, 10→x, 11→1 and 00→x, 01→x, 10→x, 11→x), labeled (a) Satisfiability don't-care and (b) Observability don't-care]
Robust PLB structure introduces more potential for don’t-cares
Stochastic resynthesis maximizes don’t-cares w/ FTBM and robust PLB
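As an illustration of where such don't-cares come from, the sketch below computes satisfiability don't-cares for a downstream LUT: input patterns that the upstream logic can never produce, so the corresponding configuration bits can take any value. The upstream functions `z1` and `z2` and the helper name are hypothetical and not taken from the poster's figure.

```python
from itertools import product

def satisfiability_dont_cares(upstream, n_inputs):
    """Return the LUT input patterns that the upstream functions never produce."""
    reachable = {
        tuple(g(*x) for g in upstream)
        for x in product((0, 1), repeat=n_inputs)
    }
    all_patterns = set(product((0, 1), repeat=len(upstream)))
    return sorted(all_patterns - reachable)

def z1(x1, x2):
    return x1 & x2  # hypothetical upstream LUT output

def z2(x1, x2):
    return x1 | x2  # hypothetical upstream LUT output

# A LUT2 fed by (z1, z2) never sees z1=1 with z2=0.
print(satisfiability_dont_cares([z1, z2], 2))  # [(1, 0)]
```

A fault that flips the configuration bit at an unreachable pattern is never visible at the output, which is why more don't-cares translate into higher robustness.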
[Figure: MTTF (years), axis from 0 to 30: ABC 20.66, Stochastic Resynth 27.15]
31% MTTF increase!
LUT utilization is low for Xilinx V-5 FPGA