Sequential Timing Optimization
Long path timing constraints
• Data must not reach destination FF too late
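$s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
[Figure: timing diagram showing the clock arrival times s_i and s_j, the path delay d(i,j) and T_setup, for a path from FF i to FF j with maximum delay d_max(i,j)]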
Short path timing constraints
• FF should not get >1 data set per period
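$s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
[Figure: timing diagram showing the clock arrival times s_i and s_j, the minimum path delay d_min(i,j) and T_hold, for a path from FF i to FF j]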
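The long path and short path constraints can be checked for a single FF pair in a few lines. This is a minimal sketch: the function name `check_pair` and the numbers in the usage lines are illustrative assumptions, not values from the slides.

```python
def check_pair(s_i, s_j, d_min, d_max, t_setup, t_hold, period):
    """Check both timing constraints for one path FF i -> FF j.

    Returns (setup_ok, hold_ok).
    """
    # Long path: data must not reach the destination FF too late.
    setup_ok = s_i + d_max + t_setup <= s_j + period
    # Short path: the FF should not get more than one data set per period.
    hold_ok = s_i + d_min >= s_j + t_hold
    return setup_ok, hold_ok


# Hypothetical numbers: with zero skew the long path fails at P = 3 ...
zero_skew = check_pair(0.0, 0.0, d_min=1.0, d_max=3.0,
                       t_setup=0.5, t_hold=0.5, period=3.0)
# ... while delaying the destination clock by 0.5 satisfies both constraints.
with_skew = check_pair(0.0, 0.5, d_min=1.0, d_max=3.0,
                       t_setup=0.5, t_hold=0.5, period=3.0)
```

In this example the skew at the destination FF buys back exactly the missing setup margin without violating the hold constraint.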
Clock skew optimization
• Another approach for sequential timing optimization
• Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing
– For zero skew, delay from clock source to all FF’s = T
– Positive skew of $\delta$ at FFk
• Change delay from clock source to FFk to $T + \delta$
– Negative skew of $\delta$ at FFk
• Change delay from clock source to FFk to $T - \delta$
• Problem statement: set skews for optimized performance
Sequential timing optimization
• Two “true” sequential timing optimization methods
– Retiming: moving latches around in a design
– Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly “synchronous”
[Figure: circuit diagrams of clocked FFs around Comb Block 1 and Comb Block 2, including a version with a Delay element on one clock line]
Finding the optimal clock period using skews
• Represented by the optimization problem below; solve for P and the optimal skews
minimize $P$ subject to, for all pairs of FF’s (i,j) connected by a combinational path,
$s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
$s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
• If $d_{\max}(i,j)$ and $d_{\min}(i,j)$ are constant, this is a linear program in the variables $s_i$ and $P$
Graph-based approaches
• For a constant clock period P, the linear program reduces to a system of difference constraints
$s_p - s_q \le \text{constant}$
• As before, perform a binary search on P
• For each value of P build an equivalent constraint graph
• Shortest path in the constraint graph gives a set of skews for a given value of P
• If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations
[Figure: constraint-graph edge between vertices i and j with weight f(P)]
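The graph-based procedure for skews can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the caller-supplied upper bound `p_hi` (assumed to be a feasible period), and the two-FF loop in the usage lines are all hypothetical. Each constraint $s_v - s_u \le w$ becomes an edge u → v of weight w, and a negative cycle signals an infeasible P.

```python
def feasible_skews(paths, n_ffs, period, t_setup=0.0, t_hold=0.0):
    """Return one valid skew per FF for this period, or None if infeasible.

    paths: list of (i, j, d_min, d_max), one per combinational path FF i -> FF j.
    """
    edges = []  # (u, v, w) encodes the difference constraint s_v - s_u <= w
    for i, j, d_min, d_max in paths:
        # Long path: s_i - s_j <= P - d_max - T_setup
        edges.append((j, i, period - d_max - t_setup))
        # Short path: s_j - s_i <= d_min - T_hold
        edges.append((i, j, d_min - t_hold))
    # Starting every label at 0 models a virtual source with a
    # zero-weight edge to each FF.
    s = [0.0] * n_ffs
    for _ in range(n_ffs):
        changed = False
        for u, v, w in edges:
            if s[u] + w < s[v]:
                s[v] = s[u] + w
                changed = True
        if not changed:
            return s
    return None  # still relaxing after n_ffs passes: negative cycle


def min_period_with_skews(paths, n_ffs, p_hi, t_setup=0.0, t_hold=0.0, tol=1e-6):
    """Binary search on P; p_hi is an assumed feasible upper bound."""
    best = feasible_skews(paths, n_ffs, p_hi, t_setup, t_hold)
    if best is None:
        return None
    p_lo = 0.0
    while p_hi - p_lo > tol:
        p = (p_lo + p_hi) / 2
        s = feasible_skews(paths, n_ffs, p, t_setup, t_hold)
        if s is None:
            p_lo = p
        else:
            p_hi, best = p, s
    return p_hi, best


# Hypothetical loop of two FFs: delay 3 from FF 0 to FF 1, delay 1 back.
example = min_period_with_skews([(0, 1, 3.0, 3.0), (1, 0, 1.0, 1.0)],
                                n_ffs=2, p_hi=3.0)
```

In the example loop the two long path constraints sum to 4 ≤ 2P, so the search converges to a period close to 2, with FF 1 clocked about one time unit later than FF 0.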
Retiming
Assume unit gate delays, no setup times
Initial Circuit: P=3
Retimed Circuit: P=2
[Figure: the initial and retimed circuits, drawn with clocked FFs around Comb Block 1 and Comb Block 2]
Retiming: Definition
• Relocation of flip-flops (FF’s) and latches (usually to achieve lower clock periods)
• Maintain the latency of all paths in circuit, i.e., number of FF stages on any input-output path must remain unchanged
Graph Notation of Circuit
$w(e_{uv})$ = # latencies between u and v
$r(u)$ is # latencies moved across gate u
$r(PI) = r(PO) = 0$: Merge them both into a “host” node h with $r(h) = 0$
$w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$
[Figure: edge from u to v with gate delays d(u) and d(v), drawn with $w(e_{uv}) = 2$ to illustrate the notation; retiming example with $w(e_{uv}) = 1$, $r(u) = 1$, $r(v) = 2$, giving $w_r(e_{uv}) = 2$]
For a path from v1 to vk
• Consider a path of vertices
– Define $w(v_1 \to v_k) = w_{12} + w_{23} + \dots + w_{(k-1,k)}$
– After retiming,
$w_r(v_1 \to v_k) = w^r_{12} + w^r_{23} + \dots + w^r_{(k-1,k)}$
$= [w_{12} + r(2) - r(1)] + [w_{23} + r(3) - r(2)] + \dots + [w_{(k-1,k)} + r(k) - r(k-1)]$
$= w(v_1 \to v_k) + r(k) - r(1)$
– For a cycle, $v_1 = v_k$, which implies that $w_r = w$ for a cycle
– In other words, retiming leaves the # latencies unchanged on any cycle
[Figure: path v1 → v2 → v3 → … → vk with edge weights w12, w23, w34, …, w(k-1,k)]
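The telescoping argument can be checked numerically. The sketch below uses a small hypothetical three-vertex cycle (the vertex names, weights, and r values are assumptions chosen so that all retimed weights stay non-negative); the edge u → v reuses the values w = 1, r(u) = 1, r(v) = 2 from the notation slide.

```python
def retime(weights, r):
    """Apply w_r(e_uv) = w(e_uv) + r(v) - r(u) to every edge."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in weights.items()}


def path_weight(weights, path):
    """Sum of edge weights along a list of vertices."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical cycle u -> v -> x -> u.
weights = {("u", "v"): 1, ("v", "x"): 0, ("x", "u"): 2}
r = {"u": 1, "v": 2, "x": 2}
retimed = retime(weights, r)

path = ["u", "v", "x"]
cycle = ["u", "v", "x", "u"]
before_path = path_weight(weights, path)
after_path = path_weight(retimed, path)    # before_path + r["x"] - r["u"]
before_cycle = path_weight(weights, cycle)
after_cycle = path_weight(retimed, cycle)  # unchanged on a cycle
```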
Constraints for retiming
• Non-negativity constraints (cannot have negative latencies)
– $w_r$ on each edge must be non-negative
– For any edge from vertex u to vertex v,
$w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$
i.e., $r(u) - r(v) \le w(u,v)$
• Period constraints (need a latency if path delay > period)
– (or more precisely, path delay + $T_{\text{setup}}$ > period)
– For any path from vertex $v_1$ to vertex $v_k$, under clock period P,
$w_r(v_1 \to v_k) = w(v_1 \to v_k) + r(v_k) - r(v_1) \ge 1$ if $\text{delay}(v_1 \to v_k) > P$
i.e., $r(v_1) - r(v_k) \le w(v_1 \to v_k) - 1$ if $\text{delay}(v_1 \to v_k) > P$
Example
• Circuit graph:
– Vertex weights = gate delays
– Edge weights = # latencies
• Non-negativity constraints
1. $r(h) - r(G1) \le 0$
2. $r(G1) - r(G2) \le 0$
3. $r(G2) - r(G3) \le 0$
4. $r(G3) - r(G4) \le 1$
5. $r(G4) - r(h) \le 0$
• Period constraints for P = 2
6. $r(h) - r(G3) \le -1$
7. $r(G1) - r(G3) \le -1$
8. $r(G2) - r(G4) \le 0$
9. $r(G2) - r(h) \le 0$
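[Figure: the example circuit (clocked FFs around Comb Block 1 and Comb Block 2) and its circuit graph on vertices h, G1, G2, G3, G4, with gate delays as vertex weights and latency counts as edge weights]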
Graph-based approaches
• System of difference constraints
$r(u) - r(v) \le c$
• Equivalent constraint graph
• Shortest path in the constraint graph gives a set of valid r values for a given value of P (note that period constraints change for different values of P)
• If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations
[Figure: the constraint $r(u) - r(v) \le c$ drawn as an edge from v to u with weight c]
Corresponding shortest path problem
• Find shortest path from host to get
– r(h) = 0
– r(G1) = 0
– r(G2) = 0
– r(G3) = 1
– r(G4) = 0
• This gives the solution
[Figure: constraint graph on vertices h, G1, G2, G3, G4 with edge weights 0, 1 and -1, and the resulting retimed circuit]
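The nine constraints of the example can be solved with a short Bellman-Ford sketch. The function name and data layout are assumptions; the constraints are the ones listed for P = 2, each $r(u) - r(v) \le c$ turned into an edge from v to u of weight c, with shortest paths taken from the host h.

```python
import math


def solve_difference_constraints(vertices, constraints, source):
    """Solve r(u) - r(v) <= c by shortest paths from `source`.

    constraints: list of (u, v, c). Returns a dict of r values, or None
    if a negative cycle makes the system infeasible. Assumes every
    vertex is reachable from the source in the constraint graph.
    """
    edges = [(v, u, c) for (u, v, c) in constraints]  # edge v -> u, weight c
    r = {x: math.inf for x in vertices}
    r[source] = 0
    for _ in range(len(vertices) - 1):
        for a, b, c in edges:
            if r[a] + c < r[b]:
                r[b] = r[a] + c
    for a, b, c in edges:
        if r[a] + c < r[b]:
            return None  # an update is still possible: negative cycle
    return r


vertices = ["h", "G1", "G2", "G3", "G4"]
constraints = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),   # non-negativity 1-3
    ("G3", "G4", 1), ("G4", "h", 0),                    # non-negativity 4-5
    ("h", "G3", -1), ("G1", "G3", -1),                  # period (P = 2) 6-7
    ("G2", "G4", 0), ("G2", "h", 0),                    # period (P = 2) 8-9
]
r = solve_difference_constraints(vertices, constraints, "h")
```

The result reproduces the values listed above: only r(G3) is 1, which moves the latency on edge G3 → G4 to edge G2 → G3.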
Overall scheme for minimum period retiming
• Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!)
– Binary search over P = [0, Punretimed]
– Punretimed = period of unretimed circuit = upper bound on optimal P
– Range in some iteration of the search = [Pmin, Pmax]
– Build shortest path graph with non-negativity constraints (independent of P)
– At each value of P
• Add period constraints to shortest path graph (related to W, D matrices discussed in class – will not describe here)
• Solve shortest path problem
• If negative cycle found, set Pmin = P; else set Pmax = P
• Iterate until range of P is sufficiently small
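The outer loop of this scheme is an ordinary binary search around a feasibility test. In the sketch below, `is_feasible` is a hypothetical placeholder for "build the constraint graph for P and report whether it has no negative cycle"; the tolerance and the lambda in the usage line are also assumptions.

```python
def min_feasible_period(is_feasible, p_unretimed, tol=1e-3):
    """Binary search over [0, p_unretimed] for the smallest feasible P.

    is_feasible(p) should return True when no negative cycle is found
    for period p (assumed monotone: feasible for all larger p).
    """
    p_min, p_max = 0.0, float(p_unretimed)
    while p_max - p_min > tol:
        p = (p_min + p_max) / 2
        if is_feasible(p):
            p_max = p   # P works: tighten the upper end
        else:
            p_min = p   # negative cycle found: raise the lower end
    return p_max


# Stand-in oracle for a circuit whose optimal period is 2 (illustrative only).
best = min_feasible_period(lambda p: p >= 2, p_unretimed=3)
```

Returning `p_max` rather than the midpoint keeps the answer on the feasible side of the final range.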
Finding shortest paths
• Dijkstra’s algorithm
– O(V log V + E) for a graph with V vertices and E edges
– Applicable only if all edge weights are non-negative
– The latter condition does not hold in our case!
• Bellman-Ford algorithm
– O(VE) for a graph with V vertices and E edges
– Outline
for I = 1 to V – 1
    for each edge (u,v) ∈ E
        update neighbor’s weights as r(v) = min[r(u) + d(u,v), r(v)]
for each edge (u,v) ∈ E
    if r(u) + d(u,v) < r(v) then a negative cycle exists
• Basic idea: in iteration I, update lowest cost path with at most I edges
• After V – 1 iterations, if any update is still required, a negative cycle exists
“Relaxation” algorithm for retiming
• Perform a binary search on clock period P as before
• At each value of P check feasibility as follows
– Repeat V – 1 times (where V = # vertices)
1. Set r(u) = 0 for each vertex
2. Perform timing analysis to find the clock period of the circuit
3. For any vertex u whose arrival time (delay of the longest latency-free path ending at u) is > P, r(u)++
4. If no such vertex exists, P is feasible
5. Else, retime the circuit using these values of r; update the circuit and go to step 1
– If clock period > P after V – 1 iterations, then P is infeasible
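The feasibility check can be sketched as below. This version accumulates r against the original graph instead of resetting r and updating the circuit in every pass, which is assumed to be equivalent. The function names and the four-gate ring used as input are hypothetical, and the sketch assumes every cycle carries at least one latency.

```python
def retimed_weights(edges, r):
    """w_r(u,v) = w(u,v) + r(v) - r(u) for every edge."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def arrival_times(delay, weights):
    """Timing analysis: longest latency-free path delay ending at each gate."""
    preds = {v: [] for v in delay}
    for (u, v), w in weights.items():
        if w == 0:  # no latency on the edge, so the path continues through it
            preds[v].append(u)
    memo = {}

    def arrive(v):
        if v not in memo:
            memo[v] = delay[v] + max((arrive(u) for u in preds[v]), default=0)
        return memo[v]

    return {v: arrive(v) for v in delay}


def feasible_retiming(delay, edges, period):
    """Return r values achieving `period`, or None if it is infeasible."""
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        arr = arrival_times(delay, retimed_weights(edges, r))
        late = [v for v in delay if arr[v] > period]
        if not late:
            return r
        for v in late:
            r[v] += 1  # pull one latency onto the inputs of each late gate
    arr = arrival_times(delay, retimed_weights(edges, r))
    return r if max(arr.values()) <= period else None


# Hypothetical ring of four unit-delay gates with two latencies on edge d -> a.
delay = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 0, ("d", "a"): 2}
r_for_2 = feasible_retiming(delay, edges, period=2)
r_for_1 = feasible_retiming(delay, edges, period=1)
```

For this ring, a period of 2 is reached by moving one latency from edge d → a to edge b → c, while a period of 1 would need four latencies on a cycle that only has two.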
The retiming-skew relationship
• Skew
• Retiming
• Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1
• Magnitude of optimal skew = amount of delay that the FF has to move across
• Can be generalized for another approach to retiming
[Figure: the circuit around Comb Block 1 and Comb Block 2, shown once with a clock delay of 1 at one FF (skew) and once with the FF relocated (retiming)]
Can move from skews to retiming
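• Moving a flip-flop across a gate G
– left → right ⇔ increasing its skew by delay(G)
– right → left ⇔ reducing its skew by delay(G)
[Figure: FF with old skew s moved across a gate of delay d; new skew = s + d]
• More generally,
$s_j = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,j))$
$s_k = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,k))$
[Figure: four FFs with skews s1, s2, s3, s4 moved across a block to the output positions FF j and FF k]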
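The general rule is a one-line maximum. The sketch below assumes that MAX(i,j) denotes the maximum delay from input FF i to output position j, which the slides do not define explicitly; the function name and numbers are illustrative.

```python
def skew_after_move(input_skews, max_delays):
    """New skew at an output position: max over inputs of s_i + MAX(i, j).

    input_skews[i] is s_i; max_delays[i] is the assumed maximum delay
    from input FF i to this output position.
    """
    return max(s + d for s, d in zip(input_skews, max_delays))


# Single gate: old skew s = 0.5, delay d = 1.0, so the new skew is s + d.
single = skew_after_move([0.5], [1.0])
# Four inputs feeding one output position (hypothetical values).
s_j = skew_after_move([0.0, 1.0, 0.0, 2.0], [3.0, 1.0, 2.0, 0.5])
```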
Another approach to retiming
• Two-phase approach
– Phase A: Find optimal skews (complexity depends on the number of FF’s, not the number of gates)
– Phase B: Relocate FF’s to retime circuit (since most FF movements are seen to be local in practice, this does not take too long)
– Not provably better than earlier approach in terms of complexity, but practically works very well