A REVISIT TO THE PRIMAL-DUAL BASED CLOCK SKEW SCHEDULING ALGORITHM
Min Ni and Seda Ogrenci Memik
EECS Department, Northwestern University
AGENDA
- Introduction
- Related Work
- The Primal-Dual Algorithm
  - The existing primal-dual approach
  - Our enhanced implementation
- Experimental Results
- Conclusion
INTRODUCTION
The Problem of Clock Skew Scheduling
Constraint graph, with one edge per timing constraint:
$$L_i - L_j \le t_{ji}$$
$$L_i - L_j + T_{ij} \le P$$
MINIMIZE P
RELATED WORK
Existing Approaches for Solving Clock Skew Scheduling
- Linear programming
- Binary search with iterative shortest path problem: O(|V||E|log(C/n))
- Primal-dual based algorithm (Burns’): O(|V|^2|E|)
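The binary search approach can be sketched as follows. This is a minimal illustration, not the implementation from any cited work: it assumes hold constraints of the form L_i - L_j <= t_ji and setup constraints of the form L_i - L_j + T_ij <= P, and the function names, the tuple layout, and the two-register delay values are all hypothetical. Feasibility for a fixed P is checked with Bellman-Ford negative-cycle detection on the constraint graph.

```python
def feasible(n, setup, hold, P, eps=1e-12):
    """Return a feasible list of skews L for period P, or None if none exists.

    setup: (i, j, T_ij) tuples meaning L_i - L_j + T_ij <= P  (assumed form)
    hold:  (i, j, t_ji) tuples meaning L_i - L_j <= t_ji      (assumed form)
    """
    # A difference constraint L_i - L_j <= w becomes an edge j -> i of weight w.
    edges = [(j, i, P - T) for i, j, T in setup]
    edges += [(j, i, t) for i, j, t in hold]
    # Starting every node at 0 acts like a virtual source joined to all nodes.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # converged: dist satisfies every constraint
    return None  # still relaxing after n passes: negative cycle


def min_period(n, setup, hold, lo, hi, tol=1e-6):
    """Binary search on P; assumes lo is infeasible and hi is feasible."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(n, setup, hold, mid) is not None:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical two-register example (delays chosen for illustration only).
setup = [(0, 1, 5.0), (1, 0, 3.0)]
hold = [(1, 0, 2.0), (0, 1, 1.0)]
period = min_period(2, setup, hold, lo=0.0, hi=10.0)
print(round(period, 3))
```

In this toy instance the two setup constraints sum to 8 <= 2P, so the search converges to a period of approximately 4.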
THE PRIMAL-DUAL APPROACH
Theory of the Primal-Dual Algorithm
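PRIMAL:
$$\begin{aligned}
\min\ & P \\
\text{s.t.}\ & L_i - L_j \le t_{ji}, && (i,j) \in E_h \\
& L_i - L_j + T_{ij} \le P, && (i,j) \in E_s
\end{aligned}$$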
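Complementary slackness theorem: a feasible solution of PRIMAL and a feasible solution of DUAL are both optimal if and only if they satisfy the complementary slackness conditions. The approach starts from a feasible solution of PRIMAL and looks for a matching feasible solution of DUAL.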
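DUAL:
$$\begin{aligned}
\max\ & \sum_{(i,j)\in E_s} T_{ij}\,\lambda_{ij} - \sum_{(i,j)\in E_h} t_{ji}\,\mu_{ij} \\
\text{s.t.}\ & \sum_{(i,j)\in E_s} \lambda_{ij} = 1 \\
& \sum_{(i,j)\in E_s} \lambda_{ij} - \sum_{(j,i)\in E_s} \lambda_{ji} + \sum_{(i,j)\in E_h} \mu_{ij} - \sum_{(j,i)\in E_h} \mu_{ji} = 0, && i \in V \\
& \lambda_{ij} \ge 0,\ \mu_{ij} \ge 0
\end{aligned}$$
Primal variables: $L_i$, $P$
Dual variables: $\lambda_{ij}$ for $(i,j) \in E_s$, $\mu_{ij}$ for $(i,j) \in E_h$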
PRIMAL-DUAL APPROACH
The complementary slackness conditions
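$$\begin{aligned}
L_i \Big( \sum_{(i,j)\in E_s} \lambda_{ij} - \sum_{(j,i)\in E_s} \lambda_{ji} + \sum_{(i,j)\in E_h} \mu_{ij} - \sum_{(j,i)\in E_h} \mu_{ji} \Big) &= 0, && i \in V \\
P \Big( 1 - \sum_{(i,j)\in E_s} \lambda_{ij} \Big) &= 0 \\
\mu_{ij}\,\big( t_{ji} - L_i + L_j \big) &= 0, && (i,j) \in E_h \\
\lambda_{ij}\,\big( P - T_{ij} - L_i + L_j \big) &= 0, && (i,j) \in E_s
\end{aligned}$$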
General format: variable times constraints
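Starting from a feasible solution $\{L_i, P\}$, if we can also find a feasible solution $\{\lambda_{ij}, \mu_{ij}\}$ of dual variables to the above system of linear equations, the feasible solution is optimal.
If the slack of an edge's constraint is > 0, then its dual variable ($\lambda_{ij}$ or $\mu_{ij}$) must be zero; edges whose slack = 0 are called admissible edges.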
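The admissibility test can be illustrated with a short sketch. It assumes setup constraints of the form L_i - L_j + T_ij <= P and hold constraints of the form L_i - L_j <= t_ji; the function name `admissible_edges`, the tuple layout, and the sample values are hypothetical and chosen only for illustration.

```python
def admissible_edges(L, P, setup, hold, eps=1e-9):
    """Return the edges whose constraint is tight (slack == 0) at (L, P).

    setup: (i, j, T_ij) tuples meaning L_i - L_j + T_ij <= P  (assumed form)
    hold:  (i, j, t_ji) tuples meaning L_i - L_j <= t_ji      (assumed form)
    """
    tight = []
    for i, j, T in setup:
        slack = P - T - (L[i] - L[j])  # >= 0 whenever (L, P) is feasible
        if abs(slack) <= eps:
            tight.append(("setup", i, j))
    for i, j, t in hold:
        slack = t - (L[i] - L[j])
        if abs(slack) <= eps:
            tight.append(("hold", i, j))
    return tight


# Hypothetical two-register instance with a feasible solution at P = 4.
setup = [(0, 1, 5.0), (1, 0, 3.0)]
hold = [(1, 0, 2.0), (0, 1, 1.0)]
tight = admissible_edges([-1.0, 0.0], 4.0, setup, hold)
print(tight)
```

Here both setup edges are tight and together form a cycle 0 -> 1 -> 0, while both hold edges keep positive slack and so must carry a zero dual variable.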
RESTRICTED DUAL PROBLEM
Solve the system of linear equations on only admissible edges
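$$\begin{aligned}
\sum_{(i,j)\in E_s} \lambda_{ij} - \sum_{(j,i)\in E_s} \lambda_{ji} + \sum_{(i,j)\in E_h} \mu_{ij} - \sum_{(j,i)\in E_h} \mu_{ji} &= 0, && i \in V \\
1 - \sum_{(i,j)\in E_s} \lambda_{ij} &= 0
\end{aligned}$$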
This is equivalent to solving the following restricted dual problem
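$$\begin{aligned}
\min\ & \xi \\
\text{s.t.}\ & \xi + \sum_{(i,j)\in E_s} \lambda_{ij} = 1 \\
& \sum_{(i,j)\in E_s} \lambda_{ij} - \sum_{(j,i)\in E_s} \lambda_{ji} + \sum_{(i,j)\in E_h} \mu_{ij} - \sum_{(j,i)\in E_h} \mu_{ji} = 0, && i \in V \\
& \lambda_{ij} \ge 0,\ \mu_{ij} \ge 0,\ \xi \ge 0
\end{aligned}$$
where $\xi$ is an artificial variable and the sums run over admissible edges only.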
If the minimum is 0, then we are done. However, the problem is still not straightforward to solve because it is posed on the dual variables.
RESTRICTED PRIMAL PROBLEM
Check on the Restricted Primal Problem
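$$\begin{aligned}
\max\ & \delta \\
\text{s.t.}\ & d_i - d_j + \delta \le 0, && (i,j) \in E_s \text{ admissible} \\
& d_i - d_j \le 0, && (i,j) \in E_h \text{ admissible} \\
& \delta \le 1
\end{aligned}$$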
It can be proved that this problem has optimal value 0 if there exists a cycle in the admissible graph Ga (the graph consisting of admissible edges only).
PRIMAL-DUAL ALGORITHM
Starting from an empty admissible graph, incrementally reduce the clock period value until a cycle emerges in the admissible graph.
The effect of reducing P is that more edges become admissible and those are inserted into admissible graph Ga.
Admissible edges satisfy their constraint with equality:
$$L_i - L_j - t_{ji} = 0$$
$$L_i - L_j + T_{ij} - P = 0$$
Two main tasks in the while loop:
1. Find THETA
2. Maintain Ga
PRIMAL-DUAL BURNS’ IMPLEMENTATION
The strategy used for maintaining the admissible graph Ga and updating the THETA values determines the efficiency of the implementation.
AN EXAMPLE
5 iterations to find the minimum clock period P by updating the admissible graph and the theta value.
[Figure: worked example; annotations: edge that becomes admissible, theta value, skew]
ENHANCED IMPLEMENTATION
Two major sources of overhead in the existing implementation:
- Scan through all edges (|E|) in the graph to create the admissible graph Ga from scratch in each iteration
- Calculate theta values for all edges (|E|) in the graph and find the minimum one
MAINTAINING ADMISSIBLE GRAPH
Theorem: If exactly one minimum theta value edge (i, j) is added into the admissible graph Ga, then Ga is a forest until a cycle is generated.
- Add the new admissible edge and remove edges that become non-admissible
- No need to call a negative cycle detection routine; a parent list is maintained instead
- Complexity is O(|V|), compared with O(|E|) for the same step in Burns’ implementation
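One way a parent list can replace a general cycle detection routine is sketched below. This is an assumption-laden illustration rather than the authors' data structure: the class name `AdmissibleForest` and its methods are hypothetical, and it assumes each node has at most one incoming admissible edge while Ga is a forest.

```python
class AdmissibleForest:
    """Forest stored as a parent list: parent[j] == i encodes the edge i -> j."""

    def __init__(self):
        self.parent = {}

    def add_edge(self, i, j):
        """Add edge i -> j. Return the cycle as a node list if one is closed,
        otherwise None. The walk visits at most |V| nodes."""
        path = [i]
        node = i
        # Walk from i toward its root; the edge closes a cycle iff we meet j.
        while node != j and node in self.parent:
            node = self.parent[node]
            path.append(node)
        if node == j:
            return path[::-1]  # j, ..., i (the new edge i -> j closes it)
        if j in self.parent:
            # Assumption: a stale incoming edge is removed before a new one.
            raise ValueError("node already has an incoming admissible edge")
        self.parent[j] = i
        return None

    def remove_edge(self, i, j):
        """Drop edge i -> j if it is present (it became non-admissible)."""
        if self.parent.get(j) == i:
            del self.parent[j]


forest = AdmissibleForest()
first = forest.add_edge(0, 1)
second = forest.add_edge(1, 2)
cycle = forest.add_edge(2, 0)
print(cycle)
```

Each insertion only follows parent pointers upward, so its cost is bounded by the depth of the tree rather than by the number of edges in the graph.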
EFFICIENT CALCULATION OF THETA
- Similar to Dijkstra’s shortest path algorithm, where a set of edges is maintained as candidates for shortest path tree edges
- In our problem, we need to find the minimum theta edge to add into Ga
- In Burns’ implementation, all edges are scanned during each iteration, and theta values are recalculated for all edges
- We maintain a much smaller set of candidates in a heap; theta values are recalculated only for a subset of this small candidate set
- O(log |V|) for maintaining the heap
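The candidate heap can be sketched with Python's `heapq` module. This is only an illustration of the general idea: the class name `ThetaCandidates` is hypothetical, how theta is computed for an edge is not shown, and stale entries are handled by lazy deletion, which is a design choice made here and not taken from the slides.

```python
import heapq


class ThetaCandidates:
    """Min-heap of (theta, edge) candidates with lazy invalidation."""

    def __init__(self):
        self._heap = []
        self._current = {}  # edge -> latest theta value

    def update(self, edge, theta):
        """Insert an edge or record a recalculated theta for it."""
        self._current[edge] = theta
        heapq.heappush(self._heap, (theta, edge))  # logarithmic per push

    def discard(self, edge):
        """Forget an edge; its heap entries are skipped when popped."""
        self._current.pop(edge, None)

    def pop_min(self):
        """Return (edge, theta) with the smallest current theta, or None."""
        while self._heap:
            theta, edge = heapq.heappop(self._heap)
            if self._current.get(edge) == theta:  # skip outdated entries
                del self._current[edge]
                return edge, theta
        return None


cands = ThetaCandidates()
cands.update((0, 1), 3.0)
cands.update((1, 2), 2.0)
cands.update((1, 2), 5.0)  # recalculated theta supersedes 2.0
best = cands.pop_min()
print(best)
```

Only the edges whose theta actually changes are pushed again, so the rest of the candidate set is left untouched between iterations.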
ASYMPTOTIC RUNTIME IMPROVEMENT
- Our implementation has an asymptotic runtime of O(|V||E| + |V|^2 log|V|), while it is O(|V|^2|E|) for Burns’ implementation
- Very similar to the improvement from the Bellman-Ford algorithm, O(|V||E|), to Dijkstra’s algorithm, O(|E| + |V| log|V|), for the shortest path problem
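As a rough worked comparison of the two bounds, assume a sparse constraint graph with |E| = c|V| for some constant c. This sparsity is an assumption made here for illustration only, not a property stated in the slides.

```latex
\begin{aligned}
\text{Burns':}\quad & O(|V|^2 |E|) = O(c\,|V|^3) = O(|V|^3) \\
\text{Enhanced:}\quad & O(|V||E| + |V|^2 \log |V|) = O(c\,|V|^2 + |V|^2 \log |V|) = O(|V|^2 \log |V|)
\end{aligned}
```

Under that assumption the two bounds differ by a factor on the order of |V| / log|V|.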
EXPERIMENTAL SETUP
Benchmark circuits
- ISCAS89 large circuits
- ITC99
Delay data
- Resynthesis in Synopsys Design Compiler (VHDL)
- Delay is exported from the Standard Delay Format (SDF) file
Comparison between Burns’ and ours
- Same graph data structure
- Same graph manipulating subroutines
- Same routine for calculating theta values
EXPERIMENTAL RESULTS
CONCLUSIONS
- A much more efficient implementation of the primal-dual based clock skew scheduling algorithm than Burns’ implementation
- Superior in both theoretical and practical runtime efficiency
- On average 95X speedup on 20 test circuits