Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Page 1: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Combining Technology Mapping and Retiming

EECS 290A Sequential Logic Synthesis and Verification

Page 2: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Outline

• Motivation
• Technology mapping for combinational circuits
• Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
• Technology mapping for sequential circuits
  • Computation of cuts
  • Search for the optimum-delay solution
    • Computation of optimum l-values
  • Constructing the solution
  • Retiming for optimum delay

Page 3: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Traditional Tech Mapping Approach

• Cut the sequential circuit at the latch boundary
• Optimize and map the combinational part
  • Pros: preserves latch encoding
  • Cons: potentially suboptimal
• (Optional) Retime the mapped circuit

[Figure: combinational logic block with inputs PI and LO and outputs PO and LI; the latches connect LI back to LO.]

Page 4: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Motivating Example: LUT Size = 3

[Figure: a sequential circuit with signals i1 and i2, nodes a, b, and c, and output f. Mapping the circuit as given yields 2 LUTs; retiming it first and then mapping yields 1 LUT.]

Page 5: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Overview

• Pre-compute truth tables of gates (supergates)
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts
• Map network for delay
• Recover area using heuristics
• Select final mapping

Page 6: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

What is Mapping?

• Mapping expresses functions using gates

[Figure: network with primary inputs x1 to x5 and primary outputs z1 to z3.]

Page 7: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: AND-INV Graphs

Karnaugh map of F (rows: cd, columns: ab):

  cd \ ab   00  01  11  10
  00         0   0   1   0
  01         0   0   1   1
  11         0   1   1   0
  10         0   0   1   0

Two AIGs for the same function:

• F(a,b,c,d) = ab + d(ac' + bc): 6 nodes, 4 levels
• F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d): 7 nodes, 3 levels

[Figure: the two AIG structures over leaves a, b, c, d.]
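The two factored forms can be checked for equivalence by brute force over all 16 input assignments. The sketch below is only an illustration; the function names `f_deep` and `f_shallow` are made up here and do not come from the slides.

```python
from itertools import product


def f_deep(a, b, c, d):
    # F = ab + d(ac' + bc): the 6-node, 4-level structure
    return (a and b) or (d and ((a and not c) or (b and c)))


def f_shallow(a, b, c, d):
    # F = ac'(b + d) + bc(a + d): the 7-node, 3-level structure
    return (a and not c and (b or d)) or (b and c and (a or d))


assignments = list(product([False, True], repeat=4))

# True when both forms agree on every assignment
equivalent = all(bool(f_deep(*x)) == bool(f_shallow(*x)) for x in assignments)

# Number of assignments for which F is 1 (the ones in the Karnaugh map)
onset_size = sum(bool(f_deep(*x)) for x in assignments)

print(equivalent, onset_size)
```

The shallower form costs one extra node but saves a level, which is the kind of trade-off that AIG balancing exploits.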

Page 8: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Computing AIG

• Technology-independent synthesis
  • Any synthesis flow can be used
• Constructing AIG from factored forms
  • SOPs are factored using algebraic factoring
• Balancing AIG
  • Reduces delay

[Figure: network with primary inputs x1 to x5 and primary outputs z1 to z3; node n has the function Fn = x2 x3' x4.]

Page 9: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Cuts

• Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C
• The node itself is an elementary cut
• k-feasible cuts are cuts containing at most k nodes
• The average number of 5-feasible cuts in benchmarks is about 20 per node

[Figure: node n with primary inputs x1, x2, x3.]

Page 10: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Computing Cuts

Example: compute all 2-feasible cuts of node n. Below, ⊗ denotes merging two cut sets (all pairwise unions) and ∪ is set union.

• Cuts for node p = {{p}, {s,x2}, {x1,x2}}
• Cuts for node q = {{q}, {x2,t}, {x2,x3}}
• Cuts for node n = ({{p}, {s,x2}, {x1,x2}} ⊗ {{q}, {x2,t}, {x2,x3}}) ∪ {{n}} = {{n}, {p,q}, {p,x2,t}, {p,x2,x3}, …}
• 2-feasible cuts for node n = {{n}, {p,q}}

[Figure: node n with fanins p and q, internal nodes s and t, and primary inputs x1, x2, x3.]

All k-feasible cuts are computed in one pass over the AIG:

• Assign elementary cuts for primary inputs
• For each internal node
  • merge the cut sets of children while removing duplicated cuts
  • add the elementary cut composed of the node itself
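The one-pass procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the mapper's actual code: the function name `enumerate_cuts`, the dictionary-based netlist, and the three-input example network are all assumptions made for this sketch.

```python
def enumerate_cuts(order, fanins, k):
    """Return {node: set of k-feasible cuts}; each cut is a frozenset of nodes.

    order  -- nodes in topological order (hypothetical input format)
    fanins -- {node: (left, right)} for AND nodes; primary inputs are absent
    """
    cuts = {}
    for n in order:
        if n not in fanins:
            # Primary input: only the elementary cut
            cuts[n] = {frozenset([n])}
            continue
        left, right = fanins[n]
        # Merge the children's cut sets; the set removes duplicates,
        # and the size filter keeps only k-feasible cuts.
        merged = {
            cl | cr
            for cl in cuts[left]
            for cr in cuts[right]
            if len(cl | cr) <= k
        }
        # Add the elementary cut made of the node itself
        merged.add(frozenset([n]))
        cuts[n] = merged
    return cuts


# Hypothetical network: t = x2 & x3, q = x1 & t
order = ["x1", "x2", "x3", "t", "q"]
fanins = {"t": ("x2", "x3"), "q": ("x1", "t")}

cuts3 = enumerate_cuts(order, fanins, k=3)
cuts2 = enumerate_cuts(order, fanins, k=2)
print(sorted(sorted(c) for c in cuts3["q"]))
```

With k = 3 node q keeps the cut {x1, x2, x3}; with k = 2 that cut is filtered out and only {q} and {x1, t} remain.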

Page 11: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Truth Tables

• A truth table is a bit-string representing the Boolean function of a cut
• Truth tables are computed for all cuts of all nodes
  • For each cut, assign elementary variables to the cut leaves
  • Compute the truth tables for the internal nodes in topological order

Example for the cut {x1, x2, x3} of node q (MSB on the left, LSB on the right):

  x1 = 10101010
  x2 = 11001100
  x3 = 11110000
  t = x2 & x3 = 11000000
  q = x1 & t = 10000000

[Figure: node q with fanins x1 and t; node t with fanins x2 and x3.]
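Because each truth table is a bit-string, the tables of internal nodes can be computed with bitwise operations on machine words. The sketch below reproduces the example values using Python integers; the helper name `elementary_var` is an assumption of this sketch.

```python
def elementary_var(i, num_leaves):
    """Truth table of the i-th cut leaf: bit m is the value of the leaf in minterm m."""
    table = 0
    for m in range(1 << num_leaves):
        if (m >> i) & 1:
            table |= 1 << m
    return table


NUM_LEAVES = 3
MASK = (1 << (1 << NUM_LEAVES)) - 1  # 8 minterms, so 8 significant bits

x1 = elementary_var(0, NUM_LEAVES)
x2 = elementary_var(1, NUM_LEAVES)
x3 = elementary_var(2, NUM_LEAVES)

t = x2 & x3          # AND node
q = x1 & t           # AND node
not_t = ~t & MASK    # a complemented edge is a masked bitwise NOT

for name, table in [("x1", x1), ("x2", x2), ("x3", x3), ("t", t), ("q", q)]:
    print(name, format(table, "08b"))
```

Matching a cut against a gate then reduces to comparing two integers.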

Page 12: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Delay Optimality

• Assign the arrival times of the primary inputs
• For each node, in topological order
  • Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match)
  • Compute the arrival times of each cut, in both phases
  • Select the best cut for each phase
  • When arrival times are equal, use area as a tie-breaker

[Figure: a node with four cuts c1 to c4, where T_c2 < T_c3 < T_c1 < T_c4, so c2 is the best cut.]

Page 13: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Area Recovery

• Performs three passes
  • Minimize area flow
  • Minimize exact area for best matches
  • Minimize area by phase assignment
• In each pass, for all nodes, in topological order
  • Consider matches with ArrivalTime <= RequiredTime
  • Among these matches, pick the one minimizing area (flow)
  • When areas (flows) are equal, use delay as a tie-breaker

[Figure: a node with four cuts c1 to c4, where A_c2 < A_c3 < A_c1 < A_c4, so c2 is the best cut.]

Page 14: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Area Flow

Definition:

• The area flow of a primary input is 0
• The area flow of a node n in the network is

$$AF(n) = \frac{\mathrm{Area}(n) + \sum_i AF(\mathrm{fanin}_i(n))}{\mathrm{NumFanouts}(n)}$$

[Figure: example network annotated with area-flow values: 0 at the primary inputs, 1/3, and (1 + 1/3) / 2 = 2/3.]
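Area flow can be computed in a single topological pass, taking AF(n) = (Area(n) + sum of the fanins' area flows) / NumFanouts(n). The sketch below is illustrative only: the function name, the input dictionaries, and the two-gate network (chosen so that the values 1/3 and 2/3 appear) are assumptions, and the guard against zero fanouts is a choice made here, not something the slides specify.

```python
from fractions import Fraction


def area_flow(order, fanins, area, num_fanouts):
    """Return {node: area flow}, visiting nodes in topological order."""
    af = {}
    for n in order:
        if n not in fanins:
            af[n] = Fraction(0)  # primary inputs have area flow 0
            continue
        total = area[n] + sum(af[u] for u in fanins[n])
        # Assumption: treat a node without fanouts as having one,
        # to avoid dividing by zero.
        af[n] = Fraction(total) / max(1, num_fanouts[n])
    return af


# Hypothetical network: g1 has area 1 and three fanouts (only one is modeled),
# g2 has area 1, two fanouts, and g1 among its fanins.
order = ["x1", "x2", "x3", "g1", "g2"]
fanins = {"g1": ["x1", "x2"], "g2": ["g1", "x3"]}
area = {"g1": 1, "g2": 1}
num_fanouts = {"g1": 3, "g2": 2}

af = area_flow(order, fanins, area, num_fanouts)
print(af["g1"], af["g2"])  # 1/3 2/3
```

Dividing by the fanout count shares the cost of a node among the logic cones that use it, which is why area flow is a useful estimate before the exact area pass.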

Page 15: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Area of a Match

Definition. The area of a match is the sum of the areas of all gates in the maximum fanout-free cone (MFFC) of the root gate (this includes the root gate and some of its fanins).

[Figure: network of gates g1 to g13 with match M1 rooted at g1.]

A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)

Page 16: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Basic Mapping: Select Final Mapping

• Extract the final mapping from the AIG after the best matches are assigned to each node
  • Select the best match for each primary output node
  • Recursively, for each fanin of a selected match, select its best match

[Figure: mapped network with primary inputs x1 to x5 and primary outputs z1 to z3.]

Page 17: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Mapping for Sequential Circuits

• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts (iteration over the circuit)
• For each node, compute l-values (iteration over the circuit)
• Map network for delay (iteration over the clock periods)
• Recover area using heuristics
• Select final mapping

P. Pan and C.-C. Lin, “A new retiming-based technology mapping algorithm for LUT-based FPGAs”, Proc. FPGA ’98.

Page 18: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

l-Value: A Generalization of Combinational Delay

Definition. For each edge $e: u \to v$ in $S$, we assign an l-weight equal to $-\phi d + \delta_{uv}$, where $\phi$ is the clock period, $d$ is the number of latches on the edge, and $\delta_{uv}$ is the combinational delay of pin $u$ of node $v$.

Definition. The l-value of a node in $S$ is defined as the maximum weight of the paths from the PIs to the node using the l-weights.

Theorem. $S$ can be retimed to a clock period $\phi$ iff the l-value of each PO is less than or equal to $\phi$.
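Since the l-value is a longest-path quantity, it can be computed by Bellman-Ford-style relaxation, taking the l-weight of an edge with d latches and pin delay delta to be -phi * d + delta. The sketch below is a minimal illustration under assumptions: the edge-list format, the function names, and the three-node circuit are hypothetical, and treating non-converging l-values (a loop of positive total l-weight) as infeasible is a choice made for this sketch.

```python
def l_values(nodes, pis, edges, phi):
    """Longest-path l-values from the PIs, or None if they keep growing.

    edges -- list of (u, v, latches, delay) tuples (hypothetical format)
    """
    l = {v: (0 if v in pis else float("-inf")) for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, latches, delay in edges:
            cand = l[u] - phi * latches + delay  # add the l-weight of u -> v
            if cand > l[v]:
                l[v] = cand
                changed = True
        if not changed:
            return l
    # Still changing after |V| passes: some loop has positive l-weight
    return None


def retimable(nodes, pis, pos, edges, phi):
    """Apply the theorem: every PO must have l-value <= phi."""
    l = l_values(nodes, pis, edges, phi)
    return l is not None and all(l[p] <= phi for p in pos)


# Hypothetical circuit: x -> g1 -> g2, with a feedback edge g2 -> g1
# through one latch; every gate has unit delay.
nodes = ["x", "g1", "g2"]
edges = [("x", "g1", 0, 1), ("g1", "g2", 0, 1), ("g2", "g1", 1, 1)]

print(retimable(nodes, {"x"}, {"g2"}, edges, phi=1))  # False
print(retimable(nodes, {"x"}, {"g2"}, edges, phi=2))  # True
```

In this circuit the loop contains two unit-delay gates and one latch, so its total l-weight is 2 - phi, which is positive for phi = 1 and zero for phi = 2.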

Page 19: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Example

[Figure: circuit with signals i1 and i2, nodes a, b, and c, and output f.]

• D = 1, $\phi$ = 1: infeasible
  l(a) = 1, l(c) = 2, etc.
• D = 1, $\phi$ = 2: feasible
  l(a) = 1, l(c) = 2, l(a) = 1, l(c) = 2, etc.
• D = 1, $\phi$ = 3: feasible
  l(a) = 1, l(c) = 2, l(a) = 0, l(c) = 1, etc.

Page 20: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Computing Cuts

  for each non-PO node v in N
      L_v = {{v^0}};
  done = false;
  while ( done == false ) do
      done = true;
      for each node v (not PI or PO) in N do
          tmp = merge( L_u1, L_u2, …, L_ut );
          if ( tmp ⊈ L_v ) then
              L_v = tmp ∪ {{v^0}};
              done = false;
  return success; // L_v settled to C_v for each v

$$\mathrm{merge}(C_{u_1}, C_{u_2}, \ldots, C_{u_t}) = \{\, c = c_1^{d_1} \cup c_2^{d_2} \cup \cdots \cup c_t^{d_t} \mid c_i \in C_{u_i} \text{ and } |c| \le k \,\}$$

where $c_i^{d_i} = \{ x^{d + d_i} \mid x^d \in c_i \}$ and $d_i$ is the number of latches on the edge from $u_i$ to $v$.

Page 21: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Example

Cuts found in each iteration (the superscript is the number of latches crossed):

  Iteration 0:
    i1: {i1^0}
    i2: {i2^0}
    a:  {a^0}
    b:  {b^0}
    c:  {c^0}
  Iteration 1:
    a:  {i1^0, c^1}
    b:  {i2^0, c^0}
    c:  {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
  Iteration 2:
    a:  {i1^0, a^1, b^2}
    b:  {i2^0, a^0, b^1}

[Figure: circuit with signals i1 and i2 and nodes a, b, and c.]
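The iterative cut computation can be sketched as follows. This is only an illustration under stated assumptions: the circuit structure (a reads i1 and c through one latch, b reads i2 and c, c reads a and b through one latch) is inferred from the cut table rather than given explicitly, k = 3 is assumed from the LUT size, cut sets are accumulated with a union, and the `max_lag` cap is a safeguard added here so that the loop always terminates.

```python
def shift(cut, d):
    """Move every element of a cut across d more latches."""
    return frozenset((x, lag + d) for x, lag in cut)


def sequential_cuts(fanins, pis, k, max_lag=8):
    """fanins[v] = list of (u, latches on edge u -> v). Returns {node: set of cuts}."""
    cuts = {v: {frozenset([(v, 0)])} for v in list(pis) + list(fanins)}
    done = False
    while not done:
        done = True
        for v, ins in fanins.items():
            partial = {frozenset()}
            for u, d in ins:
                nxt = set()
                for p in partial:
                    for c in cuts[u]:
                        m = p | shift(c, d)
                        # keep k-feasible cuts; max_lag is a hypothetical cap
                        if len(m) <= k and all(lag <= max_lag for _, lag in m):
                            nxt.add(m)
                partial = nxt
            if not partial <= cuts[v]:  # some merged cut is new
                cuts[v] |= partial
                done = False
    return cuts


# Circuit inferred from the example table (an assumption of this sketch)
FANINS = {
    "a": [("i1", 0), ("c", 1)],
    "b": [("i2", 0), ("c", 0)],
    "c": [("a", 0), ("b", 1)],
}
PIS = ["i1", "i2"]

cuts = sequential_cuts(FANINS, PIS, k=3)
for v in ("a", "b", "c"):
    print(v, sorted(sorted(c) for c in cuts[v]))
```

On this circuit the loop settles after three passes and finds the cuts listed in the example: three for a, three for b, and five for c, counting the trivial cuts.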

Page 22: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Finding Minimum l-Values

  for each node v in N do
      if ( v is a PI ) l(v) = 0;
      else l(v) = -∞;
  done = false;
  while ( done == false ) do
      done = true;
      for each non-PI node v in N do
          tmp = min over cuts c of v ( max[ l(u) - φ·d + δ_uv | u^d ∈ c ] )
          if ( l(v) < tmp )
              l(v) = tmp; done = false;
          if ( v is a PO and l(v) > φ ) return failure;
  return success; // bounds have settled
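A direct Python rendering of this loop is sketched below, with l(v) updated to the minimum over cuts of max(l(u) - phi * d) plus the delay. Several things are assumptions of the sketch rather than facts from the slides: every LUT has unit delay, only non-trivial cuts are listed, node c is treated as the PO, the cut lists are hard-coded for a three-node example circuit, and `max_passes` is a safety cap that also reports failure.

```python
NEG_INF = float("-inf")

# Non-trivial cuts as tuples of (leaf, latches); hypothetical example data
CUTS = {
    "a": [(("i1", 0), ("c", 1)), (("i1", 0), ("a", 1), ("b", 2))],
    "b": [(("i2", 0), ("c", 0)), (("i2", 0), ("a", 0), ("b", 1))],
    "c": [
        (("a", 0), ("b", 1)),
        (("a", 0), ("i2", 1), ("c", 1)),
        (("i1", 0), ("c", 1), ("b", 1)),
        (("i1", 0), ("c", 1), ("i2", 1)),
    ],
}
PIS = {"i1", "i2"}


def min_l_values(pis, pos, cuts, phi, delay=1, max_passes=100):
    """Return settled l-values, or None if some PO exceeds phi."""
    l = {v: 0 for v in pis}
    l.update({v: NEG_INF for v in cuts})
    for _ in range(max_passes):
        done = True
        for v, v_cuts in cuts.items():
            # best cut = smallest worst-case arrival over its leaves
            tmp = min(
                max(l[u] - phi * d for u, d in c) + delay for c in v_cuts
            )
            if l[v] < tmp:
                l[v] = tmp
                done = False
            if v in pos and l[v] > phi:
                return None  # target clock period is infeasible
        if done:
            return l
    return None  # did not settle within the assumed cap


print(min_l_values(PIS, {"c"}, CUTS, phi=1))
print(min_l_values(PIS, {"b"}, CUTS, phi=1))
```

With c as the PO the values settle at l(a) = 1, l(b) = 2, l(c) = 1, so phi = 1 is feasible; if b were the PO instead, l(b) = 2 would exceed phi = 1 and the search would fail.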

Page 23: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Constructing Mapping Solution

  U = the set of POs
  S = { v | v is a PI or PO }
  while ( U ≠ ∅ ) do
      v = any node in U; U = U - {v};
      for each non-trivial cut c ∈ C_v do
          if ( l_opt(v) == max[ l_opt(u) - φ·d + δ_uv | u^d ∈ c ] )
              c_best = c;
      for each u^d ∈ c_best do
          if ( u is not in S )
              S = S ∪ {u}; U = U ∪ {u};
          create an edge in S from u to v with d FFs;
  return S;
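The construction can be sketched as a worklist traversal from the POs. As before, this is an illustration under assumptions: unit LUT delay, node c acting as the PO, hard-coded cuts and l-values for a three-node example circuit, and the first cut that meets the l-value is chosen (the pseudocode keeps the last one).

```python
# Non-trivial cuts as tuples of (leaf, latches); hypothetical example data
CUTS = {
    "a": [(("i1", 0), ("c", 1)), (("i1", 0), ("a", 1), ("b", 2))],
    "b": [(("i2", 0), ("c", 0)), (("i2", 0), ("a", 0), ("b", 1))],
    "c": [
        (("a", 0), ("b", 1)),
        (("a", 0), ("i2", 1), ("c", 1)),
        (("i1", 0), ("c", 1), ("b", 1)),
        (("i1", 0), ("c", 1), ("i2", 1)),
    ],
}
# l-values assumed to have settled for phi = 1 (example data)
L_OPT = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}


def construct_mapping(pis, pos, cuts, l_opt, phi, delay=1):
    """Return (selected nodes S, list of (u, v, FFs) edges of the mapped circuit)."""
    selected = set(pis) | set(pos)                # S
    worklist = [v for v in pos if v not in pis]   # U
    edges = []
    while worklist:
        v = worklist.pop()
        # a cut whose worst leaf arrival reproduces l_opt(v)
        best = next(
            c for c in cuts[v]
            if l_opt[v] == max(l_opt[u] - phi * d for u, d in c) + delay
        )
        for u, d in best:
            if u not in selected:
                selected.add(u)
                worklist.append(u)
            edges.append((u, v, d))  # edge u -> v carrying d flip-flops
    return selected, edges


selected, edges = construct_mapping({"i1", "i2"}, {"c"}, CUTS, L_OPT, phi=1)
print(sorted(selected), edges)
```

Here the chosen cut of c is {i1^0, c^1, i2^1}, so the whole circuit collapses into a single LUT with a registered feedback input.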

Page 24: Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification

Performing Final Retiming

Retime each node $v$ with the following retiming lag:

$$r(v) = \begin{cases} 0 & \text{if } v \text{ is a PI or PO} \\ \left\lceil l_{opt}(v) / \phi \right\rceil - 1 & \text{otherwise} \end{cases}$$

where $l_{opt}(v)$ is the optimal l-value of $v$ and $\phi$ is the selected clock period.
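Assuming the lag is 0 for PIs and POs and ceil(l_opt(v) / phi) - 1 for every other node, the final step is a one-line computation per node. The function name and the sample l-values below are hypothetical.

```python
import math


def retiming_lags(l_opt, phi, pis_and_pos):
    """Return {node: lag}; PIs and POs are never retimed."""
    return {
        v: 0 if v in pis_and_pos else math.ceil(l / phi) - 1
        for v, l in l_opt.items()
    }


# Sample l-values for clock period phi = 1 (example data)
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
lags = retiming_lags(l_opt, phi=1, pis_and_pos={"i1", "i2", "c"})
print(lags)
```

A node whose l-value fits within one clock period gets lag 0; each additional period of l-value adds one to its lag.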