Combining Technology Mapping and Retiming
EECS 290A Sequential Logic Synthesis and Verification
Outline
• Motivation
• Technology mapping for combinational circuits
• Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
• Technology mapping for sequential circuits
  • Computation of cuts
  • Search for the optimum-delay solution
    • Computation of optimum l-values
  • Constructing the solution
  • Retiming for optimum delay
Traditional Tech Mapping Approach
• Cut sequential circuit at the latch boundary
• Optimize and map the combinational part
  • Pros: Preserves latch encoding
  • Cons: Potentially suboptimal
• (Optional) Retime the mapped circuit
[Figure: combinational logic block with PI, PO, LI, and LO ports, and latches connecting LO and LI]
Motivating Example: LUT Size = 3
[Figure: circuit with inputs i1 and i2, internal nodes a, b, c, and output f, shown before and after mapping and retiming. Panel labels: "mapping", "retiming", "2 LUTs", "1 LUT".]
Basic Mapping: Overview
• Pre-compute truth tables of gates (supergates)
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts
• Map network for delay
• Recover area using heuristics
• Select final mapping
What is Mapping?
• Mapping expresses functions using gates
[Figure: mapped network with inputs x1 to x5 and outputs z1, z2, z3]
Basic Mapping: AND-INV Graphs
Two AIGs for the same function:
• F(a,b,c,d) = ab + d(ac’ + bc): 6 nodes, 4 levels
• F(a,b,c,d) = ac’(b’d’)’ + bc(a’d’)’ = ac’(b+d) + bc(a+d): 7 nodes, 3 levels
Karnaugh map of F (identical for both forms):
cd\ab  00  01  11  10
00      0   0   1   0
01      0   0   1   1
11      0   1   1   0
10      0   0   1   0
[Figure: the two AIGs over inputs a, b, c, d]
Basic Mapping: Computing AIG
• Technology-independent synthesis
  • Any synthesis flow can be used
• Constructing AIG from factored forms
  • SOPs are factored using algebraic factoring
• Balancing AIG
  • Reduces delay
[Figure: AIG with inputs x1 to x5 and outputs z1, z2, z3; the marked node n has function Fn = x2 x3’ x4]
Basic Mapping: Cuts
• Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C
• The node itself is an elementary cut
• k-feasible cuts are cuts containing at most k nodes
• The average number of 5-feasible cuts in benchmarks is about 20 per node
[Figure: AIG fragment with node n above inputs x1, x2, x3]
Basic Mapping: Computing Cuts
Compute all 2-feasible cuts of node n.
• Cuts for node p = {{p}, {s, x2}, {x1, x2}}
• Cuts for node q = {{q}, {x2, t}, {x2, x3}}
• Cuts for node n = pairwise unions of the cuts of p and the cuts of q, plus {n}
  = {{n}, {p, q}, {p, x2, t}, {p, x2, x3}, …}
• 2-feasible cuts for node n = {{n}, {p, q}}
[Figure: node n with fanins p and q, internal nodes s and t, and inputs x1, x2, x3]
All k-feasible cuts are computed in one pass over the AIG
• Assign elementary cuts for primary inputs
• For each internal node
  • merge the cut sets of children while removing duplicated cuts
  • add the elementary cut composed of the node itself
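The one-pass procedure above can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name `enumerate_cuts`, the dictionary-based netlist, and the connectivity of the example (s = AND(x1, x2), t = AND(x2, x3), p = AND(s, x2), q = AND(x2, t), n = AND(p, q)) are assumptions chosen so that the cut sets agree with the 2-feasible example on the previous slide.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k nodes.

    fanins: maps each internal node to its fanin nodes (PIs are not keys).
    order:  all nodes in topological order (fanins before fanouts).
    """
    cuts = {}
    for n in order:
        elementary = frozenset([n])
        if n not in fanins:                 # primary input: elementary cut only
            cuts[n] = {elementary}
            continue
        merged = {elementary}               # the node itself is always a cut
        # Pick one cut per child and union them; the set drops duplicates.
        for combo in product(*(cuts[c] for c in fanins[n])):
            cut = frozenset().union(*combo)
            if len(cut) <= k:               # keep only k-feasible cuts
                merged.add(cut)
        cuts[n] = merged
    return cuts

# Hypothetical netlist consistent with the cut sets listed on the slide.
fanins = {
    "s": ["x1", "x2"], "t": ["x2", "x3"],
    "p": ["s", "x2"],  "q": ["x2", "t"],
    "n": ["p", "q"],
}
order = ["x1", "x2", "x3", "s", "t", "p", "q", "n"]
cuts = enumerate_cuts(fanins, order, k=2)
print(sorted(sorted(c) for c in cuts["n"]))   # [['n'], ['p', 'q']]
```

Filtering by size during the merge, rather than afterwards, keeps the intermediate cut sets small; raising `k` to 3 would also admit cuts such as {p, x2, t}.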
Basic Mapping: Truth Tables
• Truth table is a bit-string representing the Boolean function of a cut
• Truth tables are computed for all cuts of all nodes
  • For each cut, assign elementary variables to cut leaves
  • Compute the truth tables for the internal nodes in topological order
[Figure: node q with fanins x1 and t, where t has fanins x2 and x3]
Bit-strings are written MSB first (LSB last):
x1 = 10101010
x2 = 11001100
x3 = 11110000
t = x2 & x3 = 11000000
q = x1 & t = 10000000
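The bit-string computation above can be reproduced with integer bitwise operations. The sketch below is illustrative only: the helper names `elementary_var` and `cut_truth_table` are assumptions, and it handles plain AND nodes without the inverters a real AIG would carry.

```python
def elementary_var(i, num_vars):
    """Truth table of the i-th cut leaf as an int: bit j is set iff bit i of j is 1."""
    table = 0
    for j in range(1 << num_vars):
        if (j >> i) & 1:
            table |= 1 << j
    return table

def cut_truth_table(root, leaves, fanins):
    """Truth table of `root` as a function of the cut `leaves` (AND nodes only)."""
    tables = {leaf: elementary_var(i, len(leaves)) for i, leaf in enumerate(leaves)}

    def visit(n):
        # Fanins are evaluated before the node itself, i.e. in topological order.
        if n not in tables:
            a, b = fanins[n]
            tables[n] = visit(a) & visit(b)
        return tables[n]

    return visit(root)

# The example from the slide: t = x2 & x3, q = x1 & t.
fanins = {"t": ("x2", "x3"), "q": ("x1", "t")}
leaves = ["x1", "x2", "x3"]
print(format(cut_truth_table("t", leaves, fanins), "08b"))   # 11000000
print(format(cut_truth_table("q", leaves, fanins), "08b"))   # 10000000
```

Storing each table in one machine word is what makes comparing a cut against pre-computed gate truth tables a single equality test for small cuts.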
Basic Mapping: Delay Optimality
• Assign the arrival times of the primary inputs
• For each node, in topological order
  • Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match)
  • Compute the arrival times of each cut, in both phases
  • Select the best cut for each phase
  • When arrival times are equal, use area as a tie-breaker
[Figure: node with four cuts c1 to c4, where T_c2 < T_c3 < T_c1 < T_c4, so c2 is the best cut]
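A simplified sketch of the delay-oriented pass follows. It makes several assumptions that are not on the slide: every cut is treated as implementable by one LUT with unit delay (so truth-table matching and the two phases are skipped), the number of leaves stands in for area in the tie-breaker, and the name `map_for_delay` and the small netlist are hypothetical.

```python
def map_for_delay(order, cuts, pi_arrival, lut_delay=1):
    """Pick, for each internal node, the non-trivial cut with the earliest arrival.

    order:      nodes in topological order.
    cuts:       {node: list of non-trivial cuts (frozensets of leaf nodes)}.
    pi_arrival: {primary input: arrival time}.
    """
    arrival = dict(pi_arrival)
    best_cut = {}
    for n in order:
        if n in arrival:                    # primary input, already timed
            continue

        def cut_arrival(c):
            return lut_delay + max(arrival[u] for u in c)

        # Arrival time first; fewer leaves as a crude area tie-breaker (assumption).
        best_cut[n] = min(cuts[n], key=lambda c: (cut_arrival(c), len(c)))
        arrival[n] = cut_arrival(best_cut[n])
    return arrival, best_cut

# Hypothetical netlist with hand-listed cuts of at most three leaves.
order = ["x1", "x2", "x3", "s", "t", "p", "q", "n"]
cuts = {
    "s": [frozenset(["x1", "x2"])],
    "t": [frozenset(["x2", "x3"])],
    "p": [frozenset(["s", "x2"]), frozenset(["x1", "x2"])],
    "q": [frozenset(["x2", "t"]), frozenset(["x2", "x3"])],
    "n": [frozenset(["p", "q"]), frozenset(["x1", "x2", "x3"])],
}
arrival, best_cut = map_for_delay(order, cuts, {"x1": 0, "x2": 0, "x3": 0})
```

In this example the cut {x1, x2, x3} gives node n an arrival time of 1, whereas {p, q} would give 2, so the wider cut is selected.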
Basic Mapping: Area Recovery
• Performs three passes
  • Minimize area flow
  • Minimize exact area for best matches
  • Minimize area by phase assignment
• In each pass, for all nodes, in topological order
  • Consider matches with ArrivalTime <= RequiredTime
  • Among these matches, pick the one minimizing area (flow)
  • When areas (flows) are equal, use delay as a tie-breaker
[Figure: node with four cuts c1 to c4, where A_c2 < A_c3 < A_c1 < A_c4, so c2 is the best cut]
Basic Mapping: Area Flow
Definition:
• Area flow of a primary input is 0
• Area flow of a node n in the network is
  $\mathrm{AF}(n) = \left[\, \mathrm{Area}(n) + \sum_i \mathrm{AF}(\mathrm{fanin}_i(n)) \,\right] / \mathrm{NumFanouts}(n)$
[Figure: example network annotated with area-flow values: 0 and 0 at the inputs, 1/3 at one node, and (1 + 1/3) / 2 = 2/3 at its fanout]
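The area-flow recurrence can be evaluated in one topological pass. The sketch below is a hedged illustration: the function name `area_flow`, the unit gate areas, and the two-node netlist are assumptions picked so that the results reproduce the values 1/3 and 2/3 annotated in the figure; the fanout counts are supplied explicitly because the rest of the network is not shown.

```python
from fractions import Fraction

def area_flow(order, fanins, area, num_fanouts):
    """AF(n) = (Area(n) + sum of AF over fanins of n) / NumFanouts(n); AF(PI) = 0."""
    af = {}
    for n in order:                         # topological order: fanins first
        if n not in fanins:                 # primary input
            af[n] = Fraction(0)
        else:
            total = area[n] + sum(af[f] for f in fanins[n])
            af[n] = Fraction(total) / num_fanouts[n]
    return af

# Hypothetical fragment: u has unit area and 3 fanouts, v has unit area and 2 fanouts.
order = ["i1", "i2", "u", "v"]
fanins = {"u": ["i1", "i2"], "v": ["u", "i2"]}
area = {"u": 1, "v": 1}
num_fanouts = {"u": 3, "v": 2}
af = area_flow(order, fanins, area, num_fanouts)
print(af["u"], af["v"])   # 1/3 2/3
```

Exact fractions are used here only to make the arithmetic match the slide; floating point would be the usual choice in a mapper.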
Basic Mapping: Area of a Match
• Definition. Area of a match is the sum of the areas of all the gates in the maximum fanout-free cone (MFFC) of the root gate (includes the root gate and some of the fanins)
[Figure: match M1 rooted at gate g1 in a network of gates g1 to g13]
A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)
Basic Mapping: Select Final Mapping
• Extract the final mapping from the AIG after the best matches are assigned to each node
  • Select the best match for each primary output node
  • Recursively, for each fanin of a selected match, select its best match
[Figure: mapped network with inputs x1 to x5 and outputs z1, z2, z3]
Mapping for Sequential Circuits
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts (iteration over the circuit)
• For each node, compute l-values (iteration over the circuit)
• Map network for delay (iteration over the clock periods)
• Recover area using heuristics
• Select final mapping
P. Pan and C.-C. Lin, “A new retiming-based technology mapping algorithm for LUT-based FPGAs”, Proc. FPGA ’98.
l-Value: A Generalization of Combinational Delay
• Definition. For each edge e: u → v in S, we assign an l-weight equal to $-\phi \cdot d + \delta_{uv}$, where $\phi$ is the clock period, $d$ is the number of latches on the edge, and $\delta_{uv}$ is the combinational delay of pin u of node v.
• Definition. The l-value of a node in S is defined as the maximum weight of the paths from the PIs to the node using the l-weights.
• Theorem: S can be retimed to a clock period $\phi$ iff the l-value of each PO is less than or equal to $\phi$.
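The definitions above amount to a longest-path computation with edge weight -phi*d + delta, which can be sketched as a Bellman-Ford style relaxation. Everything concrete below is an assumption for illustration: the names `l_values` and `can_retime`, the unit pin delays, the choice of node c as the only PO, and the three-node circuit with two latched feedback edges.

```python
import math

def l_values(nodes, pis, edges, phi):
    """Longest-path weights from the PIs under l-weights -phi*d + delta.

    edges: list of (u, v, d, delta) with d latches and pin delay delta.
    Returns {node: l-value}, or None when a positive-weight cycle makes
    the l-values diverge (phi is then too small).
    """
    l = {v: (0 if v in pis else -math.inf) for v in nodes}
    for _ in range(len(nodes)):             # |V| passes suffice without positive cycles
        changed = False
        for u, v, d, delta in edges:
            cand = l[u] - phi * d + delta
            if cand > l[v]:
                l[v] = cand
                changed = True
        if not changed:
            return l
    return None

def can_retime(nodes, pis, pos, edges, phi):
    """Theorem check: every PO must have an l-value of at most phi."""
    l = l_values(nodes, pis, edges, phi)
    return l is not None and all(l[p] <= phi for p in pos)

# Hypothetical circuit: a = f(i1, latched c), b = f(i2, c), c = f(a, latched b).
nodes = ["i1", "i2", "a", "b", "c"]
pis = {"i1", "i2"}
edges = [("i1", "a", 0, 1), ("c", "a", 1, 1),
         ("i2", "b", 0, 1), ("c", "b", 0, 1),
         ("a", "c", 0, 1), ("b", "c", 1, 1)]
print(can_retime(nodes, pis, ["c"], edges, phi=1))   # False
print(can_retime(nodes, pis, ["c"], edges, phi=2))   # True
```

With phi = 1 the loop c → b → c has total weight 1, so the l-values grow without bound; with phi = 2 every loop has weight 0 and the values settle at l(a) = 1, l(b) = 3, l(c) = 2.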
Example
[Figure: circuit with inputs i1, i2, internal nodes a, b, c, and output f]
• D = 1, $\phi = 1$: infeasible
  l(a) = 1, l(c) = 2, etc.
• D = 1, $\phi = 2$: feasible
  l(a) = 1, l(c) = 2; l(a) = 1, l(c) = 2; etc.
• D = 1, $\phi = 3$: feasible
  l(a) = 1, l(c) = 2; l(a) = 0, l(c) = 1; etc.
Computing Cuts
    for each non-PO node v in N
        L_v = {{v^0}};
    done = false;
    while ( done == false ) do
        done = true;
        for each node v (not PI or PO) in N do
            tmp = merge( L_u1, L_u2, …, L_ut );   // u1, …, ut are the fanins of v
            if ( tmp ⊈ L_v ) then
                L_v = tmp ∪ {{v^0}};
                done = false;
    return success; // L_v settled to C_v for each v

$\mathrm{merge}(C_{u_1}, C_{u_2}, \ldots, C_{u_t}) = \{\, c = c_1^{d_1} \cup c_2^{d_2} \cup \cdots \cup c_t^{d_t} \mid c_i \in C_{u_i} \text{ and } |c| \le k \,\}$

where $c_i^{d_i} = \{\, x^{d + d_i} \mid x^d \in c_i \,\}$ and $d_i$ is the number of latches on the edge from $u_i$ to $v$.
Example
Cuts added in each iteration, per node (x^d denotes node x seen through d latches):
Iteration 0:
  i1: {i1^0}    i2: {i2^0}    a: {a^0}    b: {b^0}    c: {c^0}
Iteration 1:
  a: {i1^0, c^1}
  b: {i2^0, c^0}
  c: {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
Iteration 2:
  a: {i1^0, a^1, b^2}
  b: {i2^0, a^0, b^1}
[Figure: circuit with inputs i1, i2 and internal nodes a, b, c]
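The iterative cut computation with latch counts can be sketched as below. This is an illustration under stated assumptions: the function name `sequential_cuts`, the `max_rounds` safety guard, the encoding of a cut element x^d as the tuple `(x, d)`, and k = 3 are all choices made here, and the circuit connectivity (a reads i1 and c through one latch, b reads i2 and c, c reads a and b through one latch) is inferred from the cuts listed in the example rather than stated on the slide.

```python
from itertools import product

def sequential_cuts(pis, fanins, k, max_rounds=100):
    """Iterate the merge until no cut set changes.

    fanins: {v: [(u, d), ...]} with d latches on the edge u -> v.
    A cut is a frozenset of (node, latch_count) pairs.
    """
    cuts = {v: {frozenset([(v, 0)])} for v in list(pis) + list(fanins)}
    for _ in range(max_rounds):             # guard added for this sketch
        done = True
        for v, ins in fanins.items():
            # Shift every cut of fanin u by the d latches on the edge u -> v.
            shifted = [
                [frozenset((x, dx + d) for x, dx in c) for c in cuts[u]]
                for u, d in ins
            ]
            tmp = set()
            for combo in product(*shifted):
                c = frozenset().union(*combo)
                if len(c) <= k:             # keep only k-feasible cuts
                    tmp.add(c)
            if not tmp <= cuts[v]:          # something new was found
                cuts[v] = tmp | {frozenset([(v, 0)])}
                done = False
        if done:
            return cuts
    raise RuntimeError("cut sets did not settle within max_rounds")

# Circuit inferred from the example (an assumption).
fanins = {
    "a": [("i1", 0), ("c", 1)],
    "b": [("i2", 0), ("c", 0)],
    "c": [("a", 0), ("b", 1)],
}
cuts = sequential_cuts(["i1", "i2"], fanins, k=3)
```

For this circuit the loop settles after three rounds, with three cuts for a, three for b, and five for c, including the elementary cuts.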
Finding Minimum l-Values
    for each node v in N do
        if ( v is a PI ) l(v) = 0;
        else l(v) = -∞;
    done = false;
    while ( done == false ) do
        done = true;
        for each non-PI node v in N do
            tmp = min over cuts c of v ( max[ l(u) - φ·d + δ_uv | u^d ∈ c ] )
            if ( l(v) < tmp )
                l(v) = tmp; done = false;
            if ( v is a PO and l(v) > φ ) return failure;
    return success; // bounds have settled
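A direct transcription of this loop into Python might look as follows. It is a sketch under assumptions: the name `min_l_values`, the uniform LUT delay `delay=1` in place of per-pin delays, the `max_passes` guard, and the hand-listed non-trivial cuts (for a hypothetical three-node circuit with k = 3 and node c as the only PO) are not taken from the slides.

```python
import math

def min_l_values(pis, pos, cuts, phi, delay=1, max_passes=1000):
    """Return {node: minimum l-value}, or None if phi is infeasible.

    cuts: {v: list of non-trivial cuts}, each cut a list of (u, d) pairs
          meaning leaf u reached through d latches.
    """
    l = {v: 0 for v in pis}
    l.update({v: -math.inf for v in cuts})
    for _ in range(max_passes):             # guard added for this sketch
        done = True
        for v, v_cuts in cuts.items():
            # Best cut: smallest worst-case leaf l-value plus l-weight.
            tmp = min(max(l[u] - phi * d + delay for u, d in c) for c in v_cuts)
            if l[v] < tmp:
                l[v] = tmp
                done = False
            if v in pos and l[v] > phi:
                return None                 # failure: phi cannot be met
        if done:
            return l                        # success: bounds have settled
    return None

# Hypothetical circuit and its non-trivial 3-feasible cuts (assumptions).
cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_opt = min_l_values(["i1", "i2"], ["c"], cuts, phi=1)
```

For this example the loop succeeds with phi = 1 and settles at l(a) = 1, l(b) = 2, l(c) = 1, because the cut {i1^0, c^1, i2^1} lets c be covered by a single LUT.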
Constructing Mapping Solution
    U = the set of POs
    S = { v | v is a PI or PO }
    while ( U ≠ ∅ ) do
        v = any node in U; U = U – {v};
        for each non-trivial cut c ∈ C_v do
            if ( l_opt(v) == max[ l_opt(u) - φ·d + δ_uv | u^d ∈ c ] )
                c_best = c;
        for each u^d ∈ c_best do
            if ( u is not in S )
                S = S ∪ {u}; U = U ∪ {u};
            create an edge in S from u to v with d FFs;
    return S;
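The construction step can be sketched as a worklist traversal from the POs. The details below are assumptions for illustration: the name `construct_mapping`, the uniform LUT delay `delay=1`, taking the first matching cut instead of the last, and the example inputs (non-trivial cuts and l-values for a hypothetical three-node circuit at phi = 1).

```python
def construct_mapping(pis, pos, cuts, l_opt, phi, delay=1):
    """Return (selected nodes, edges); each edge is (u, v, d) with d flip-flops."""
    selected = set(pis) | set(pos)
    worklist = list(pos)
    edges = []
    while worklist:
        v = worklist.pop()
        if v in pis:
            continue                        # a PI has no cut to implement
        # A cut that realizes the optimal l-value of v.
        best = next(
            c for c in cuts[v]
            if l_opt[v] == max(l_opt[u] - phi * d + delay for u, d in c)
        )
        for u, d in best:
            if u not in selected:
                selected.add(u)
                worklist.append(u)
            edges.append((u, v, d))
    return selected, edges

# Hypothetical inputs: non-trivial cuts and assumed optimal l-values for phi = 1.
cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
selected, edges = construct_mapping(["i1", "i2"], ["c"], cuts, l_opt, phi=1)
```

Here only the cut {i1^0, c^1, i2^1} matches l_opt(c) = 1, so the result is a single LUT for c fed by i1 directly and by c and i2 through one flip-flop each; nodes a and b are never selected.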
Performing Final Retiming
• Retime each node v with the following retiming lag:

$$r(v) = \begin{cases} 0 & \text{if } v \text{ is a PI or PO} \\ \left\lceil l_{opt}(v) / \phi \right\rceil - 1 & \text{otherwise} \end{cases}$$

where $l_{opt}(v)$ is the optimal l-value of v and $\phi$ is the selected clock period
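As a small worked illustration, the lag can be computed per node as below. The sketch assumes the lag has the form ceil(l_opt(v) / phi) - 1 for internal nodes and 0 for PIs and POs; the function name `retiming_lag` and the sample l-values are hypothetical.

```python
import math

def retiming_lag(v, l_opt, phi, pis, pos):
    """Retiming lag r(v): 0 for PIs and POs, ceil(l_opt(v) / phi) - 1 otherwise."""
    if v in pis or v in pos:
        return 0
    return math.ceil(l_opt[v] / phi) - 1

# Hypothetical l-values for a clock period of 2.
l_opt = {"i1": 0, "a": 1, "b": 3, "c": 2, "f": 2}
lags = {v: retiming_lag(v, l_opt, 2, pis={"i1"}, pos={"f"}) for v in l_opt}
print(lags)   # {'i1': 0, 'a': 0, 'b': 1, 'c': 0, 'f': 0}
```

Under this form, a node whose l-value lies in the interval (j * phi, (j + 1) * phi] receives lag j, so nodes with l-values of at most phi are not moved.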