▸ What is RCS?
▸ What is the time complexity of the ASAP scheduling algorithm in terms of |V| and |E|?
▸ With the ILP formulation, how many 0-1 variables are needed for a graph with N nodes and a latency bound of L cycles?
Revisiting Some Useful Scheduling Concepts
▸ More on resource-constrained scheduling
  – ILP formulation
  – List scheduling
▸ Time-constrained scheduling
▸ Scalable scheduling with realistic design constraints
  – SDC-based scheduling
Outline
▸ Two operations: v1 and v2
  – Each operation has a full-cycle delay
  – No operation chaining allowed
  – L = 3, i.e., a three-cycle scheduling window
▸ Objective: ASAP (no resource constraints here)
▸ Write down the ILP formulation
Exercise: ILP for ASAP Scheduling
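[Figure: data-flow graph with two operations, a multiplication (×) and an addition (+), with nodes v1 and v2.]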
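One possible formulation of this exercise is sketched below. It rests on two assumptions that the extracted text does not state: v2 consumes the result of v1, and "ASAP" is modelled as minimizing the sum of the start times. The binary variable $x_{i,k}$ is 1 when operation $v_i$ starts in cycle $k$.

```latex
\begin{aligned}
\min\quad & \sum_{k=1}^{3} k\,x_{1,k} + \sum_{k=1}^{3} k\,x_{2,k} \\
\text{s.t.}\quad
 & x_{1,1} + x_{1,2} + x_{1,3} = 1 \\
 & x_{2,1} + x_{2,2} + x_{2,3} = 1 \\
 & \sum_{k=1}^{3} k\,x_{2,k} - \sum_{k=1}^{3} k\,x_{1,k} \ge 1 \\
 & x_{i,k} \in \{0, 1\}, \quad i = 1, 2, \;\; k = 1, 2, 3
\end{aligned}
```

Under these assumptions the optimum is $x_{1,1} = x_{2,2} = 1$, i.e., v1 starts in cycle 1 and v2 in cycle 2.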
ILP Formulation of RCS
▸ Linear constraints:
  – Unique start times:
\[ \sum_{k} x_{ik} = 1, \quad i = 1, 2, \ldots, N \]
  – Dependence must be satisfied (no chaining):
\[ t_j \ge t_i + d_i + 1 \;\; \forall (v_i, v_j) \in E \;\Rightarrow\; \sum_{k} k\,x_{jk} \ge \sum_{k} k\,x_{ik} + d_i + 1 \]
  – Resource constraints:
\[ \sum_{i:\,RT(v_i) = r} \; \sum_{l = k - d_i}^{k} x_{il} \le a_r, \quad r = 1, \ldots, n_{res}, \;\; k = 1, \ldots, L \]
▸ $RT(v_i)$: resource type ID (between 1 and $n_{res}$) of operation $v_i$
▸ $a_r$: number of available resources of type r
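The three constraint families can be checked directly on a candidate schedule. The sketch below is illustrative only: the function name `rcs_violations`, the tuple format of its result, and the three-operation example are hypothetical, and it reads $d_i$ as the number of extra cycles an operation stays busy (so a single-cycle operation has $d_i = 0$), which is an assumption that matches the single-cycle inequalities used later in the slides.

```python
from collections import defaultdict


def rcs_violations(start, edges, d, rtype, avail, L):
    """List the RCS constraints violated by a candidate schedule.

    start[v] is the one control step k with x[v, k] = 1, so the
    unique-start-time constraint reduces to a window check.
    d[v] is assumed to be the number of extra cycles v stays busy.
    """
    found = []
    # Unique start time inside the scheduling window 1..L.
    for v, t in start.items():
        if not 1 <= t <= L:
            found.append(("window", v))
    # Dependence: t_v >= t_u + d_u + 1 for every edge (u, v).
    for u, v in edges:
        if start[v] < start[u] + d[u] + 1:
            found.append(("dependence", u, v))
    # Resource: operations busy in step k must fit the available units.
    for k in range(1, L + 1):
        busy = defaultdict(int)
        for v, t in start.items():
            if t <= k <= t + d[v]:
                busy[rtype[v]] += 1
        for r, n in busy.items():
            if n > avail[r]:
                found.append(("resource", r, k))
    return found


# Hypothetical example: a feeds c, and only one multiplier is available.
edges = [("a", "c")]
d = {"a": 0, "b": 0, "c": 0}
rtype = {"a": "mult", "b": "mult", "c": "mult"}
avail = {"mult": 1}

print(rcs_violations({"a": 1, "b": 2, "c": 3}, edges, d, rtype, avail, L=3))
print(rcs_violations({"a": 1, "b": 1, "c": 2}, edges, d, rtype, avail, L=3))
```

The first schedule satisfies every constraint; the second places two multiplications in step 1 and is reported as a resource violation.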
▸ Objective: min $c^T t$
  – t = start times vector, c = cost weight (e.g., [0 ... 0 1])
▸ To minimize the overall latency, we can introduce a pseudo node to serve as a unique sink ($v_{N+1}$)
  – This sink depends on the original primary output nodes
  – We then minimize the start time of the sink node
ILP Formulation of RCS: Objective
[Figure: example data-flow graph with operations v1 to v11 and an added pseudo sink node $v_{N+1}$.]
\[ \min\; t_{N+1} \;=\; \min \sum_{k=1}^{L} k\, x_{N+1,k} \]
▸ In general, the following will help the ILP solver run faster
  – Minimize # of variables and constraints
  – Simplify the constraints
▸ We can write the ILP without ASAP and ALAP, but using ASAP and ALAP will simplify the inequalities
Use of ASAP and ALAP
[Figure: the example graph (v1 to v11) over control steps 1 to 4, shown as an ASAP schedule and as an ALAP schedule.]
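A minimal sketch of how the ASAP and ALAP start times bound each operation's window, assuming unit delay and no chaining. The edge list is an assumption: the edges v6→v7, v7→v5, v8→v9 and v10→v11 follow from the dependence inequalities on the later slides, while v1→v3, v2→v3, v3→v4 and v4→v5 are inferred from the fixed start times of those nodes and are not spelled out in the extracted text. The function name `asap_alap` is hypothetical.

```python
from graphlib import TopologicalSorter

# Assumed edges of the 11-operation example graph (see lead-in).
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
NODES = range(1, 12)


def asap_alap(nodes, edges, L):
    """Return (asap, alap) start steps for unit-delay operations."""
    preds = {v: [] for v in nodes}
    succs = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    # static_order() lists every node after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    asap = {}
    for v in order:
        asap[v] = 1 + max((asap[u] for u in preds[v]), default=0)
    alap = {}
    for v in reversed(order):
        alap[v] = min((alap[w] for w in succs[v]), default=L + 1) - 1
    return asap, alap


asap, alap = asap_alap(NODES, EDGES, L=4)
# Only steps inside [asap, alap] need a 0-1 variable.
n_vars = sum(alap[v] - asap[v] + 1 for v in NODES)
print(n_vars, "variables instead of", len(NODES) * 4)
```

With these assumed edges the windows reproduce the reduced constraints shown on the slides, for example v6 in steps 1 to 2 and v9 in steps 2 to 4.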
ILP Formulation: Unique Start Time Constraints (assume L = 4)
▸ Without using ASAP and ALAP:
\[
\begin{aligned}
x_{1,1} + x_{1,2} + x_{1,3} + x_{1,4} &= 1 \\
x_{2,1} + x_{2,2} + x_{2,3} + x_{2,4} &= 1 \\
&\;\;\vdots \\
x_{11,1} + x_{11,2} + x_{11,3} + x_{11,4} &= 1
\end{aligned}
\]
▸ Using ASAP and ALAP:
\[
x_{1,1} = 1, \quad x_{2,1} = 1, \quad x_{3,2} = 1, \quad \ldots, \quad x_{6,1} + x_{6,2} = 1, \quad \ldots, \quad x_{9,2} + x_{9,3} + x_{9,4} = 1, \quad \ldots
\]
  since
\[
x_{il} = 0 \;\text{ for }\; l < t_i^S \;\text{ and }\; l > t_i^L, \qquad t_i^S = ASAP(v_i), \;\; t_i^L = ALAP(v_i)
\]
ILP Formulation: Dependence Constraints (assume L = 4)
▸ Using ASAP and ALAP, the non-trivial inequalities are (assuming no chaining and single-cycle ops):
\[
\begin{aligned}
2x_{7,2} + 3x_{7,3} - x_{6,1} - 2x_{6,2} - 1 &\ge 0 \\
2x_{9,2} + 3x_{9,3} + 4x_{9,4} - x_{8,1} - 2x_{8,2} - 3x_{8,3} - 1 &\ge 0 \\
2x_{11,2} + 3x_{11,3} + 4x_{11,4} - x_{10,1} - 2x_{10,2} - 3x_{10,3} - 1 &\ge 0 \\
4x_{5,4} - 2x_{7,2} - 3x_{7,3} - 1 &\ge 0 \\
5x_{n,5} - 2x_{9,2} - 3x_{9,3} - 4x_{9,4} - 1 &\ge 0 \\
5x_{n,5} - 2x_{11,2} - 3x_{11,3} - 4x_{11,4} - 1 &\ge 0
\end{aligned}
\]
ILP Formulation: Resource Constraints (assume L = 4)
▸ Resource constraints (assuming 2 ALUs and 2 multipliers)
  – Multipliers:
\[
\begin{aligned}
x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} &\le 2 \\
x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} &\le 2 \\
x_{7,3} + x_{8,3} &\le 2
\end{aligned}
\]
  – ALUs:
\[
\begin{aligned}
x_{10,1} &\le 2 \\
x_{9,2} + x_{10,2} + x_{11,2} &\le 2 \\
x_{4,3} + x_{9,3} + x_{10,3} + x_{11,3} &\le 2 \\
x_{5,4} + x_{9,4} + x_{11,4} &\le 2
\end{aligned}
\]
▸ A widely-used heuristic algorithm for RCS
  – Schedule one control step (cycle) at a time
  – Maintain a list of “ready” operations considering dependence
  – Assign priorities to operations; most “critical” operations (with the highest priorities) go first
▸ Often refers to a family of algorithms
  – Typically classified by the way the priority function is calculated
    • Static priority: Priorities are calculated once before scheduling
    • Dynamic priority: Priorities are updated during scheduling
List Scheduling
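Static Priority Example: Node Height
[Figure: data-flow graph with six multiplications (×), two additions (+), two subtractions (-), one comparison (<), and a sink node; the node heights are 4, 4, 3, 3, 2, 2, 2, 2, 1, 1, 1.]
Nodes are labelled with distance to sink (height)
Ready operations are colored in green
▸ Assumptions:
  – All operations have unit delay
  – 2 MULTs, 1 AddSub, and 1 CMP available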
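The procedure can be sketched in a few lines, assuming unit delay and the resource mix above. The node numbering, the edge list, and the mapping of nodes to operation types are assumptions taken from the 11-operation graph used in the ILP slides, since the figure here does not name its nodes; `list_schedule` is a hypothetical helper, and ties in height are broken by node number. The sketch also assumes every resource type in use has at least one unit available.

```python
def list_schedule(optype, edges, avail):
    """List scheduling with static height priority, unit-delay operations."""
    preds = {v: [] for v in optype}
    succs = {v: [] for v in optype}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Static priority: height = number of nodes on the longest path to a sink.
    height = {}

    def node_height(v):
        if v not in height:
            height[v] = 1 + max((node_height(w) for w in succs[v]), default=0)
        return height[v]

    for v in optype:
        node_height(v)

    start, step = {}, 0
    while len(start) < len(optype):
        step += 1
        # Ready: unscheduled, with every predecessor done in an earlier step.
        ready = [v for v in optype
                 if v not in start and all(u in start for u in preds[v])]
        ready.sort(key=lambda v: (-height[v], v))
        free = dict(avail)
        for v in ready:
            if free[optype[v]] > 0:
                free[optype[v]] -= 1
                start[v] = step
    return start


# Assumed example graph (see lead-in).
OPTYPE = {1: "mult", 2: "mult", 3: "mult", 6: "mult", 7: "mult", 8: "mult",
          4: "addsub", 5: "addsub", 9: "addsub", 10: "addsub", 11: "cmp"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
AVAIL = {"mult": 2, "addsub": 1, "cmp": 1}

schedule = list_schedule(OPTYPE, EDGES, AVAIL)
print(schedule, "latency:", max(schedule.values()))
```

Because the priorities are static, the heights are computed once before the loop; a dynamic-priority variant would recompute the sort key inside the loop.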
Ready Nodes with Highest Priorities Picked First
[Figure: two snapshots of the same height-labelled graph illustrating this step.]
Update Ready Nodes and Repeat for Each Step
[Figure: further snapshots of the same height-labelled graph as scheduling proceeds step by step.]
Repeat Until All Nodes Scheduled
[Figure: final snapshots of the same height-labelled graph.]
A Special Case
▸ With the following (very) restrictive conditions:
  – All operations have unit delay
  – All operations (and resources) of the same type
  – Graph is a forest
▸ List scheduling with static height-based priorities guarantees optimality
▸ This is known as Hu’s algorithm
  – T. C. Hu, Parallel sequencing and assembly line problems. Operations Research, 9(6), 841-848, 1961
▸ Dual problem of resource-constrained scheduling
  – Overall latency is given as a constraint (deadline)
  – Minimize the total cost in terms of area (or resource usage), power, etc.
▸ NP-hard problem
  – ILP formulation is exact but is not a polynomial-time solution
  – Force-directed scheduling is a well-known heuristic for TCS (see De Micheli chapter 5.4.4)
Time-Constrained Scheduling (TCS)
▸ Minimize the number of classrooms that the school has to allocate for the following courses
▸ Steps to formulate the ILP
  1. Create variables
  2. Each course must be scheduled to exactly one of the preferred slots
  3. Determine the number of rooms required
Exercise: ILP for Classroom Allocation
Course    Preferred slots
A         (1), (2)
B         (1), (3)
C         (2), (3)
D         (2)

Slot      Time
(1)       8:00 – 10:00am
(2)       10:00am – 12:00pm
(3)       12:00 – 2:00pm
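The three formulation steps can be mirrored by a small exhaustive search, which is practical here because the instance is tiny. This is a sketch rather than an ILP solver: choosing one preferred slot per course plays the role of the 0-1 variables with their "exactly one slot" constraint, and the room count R is the largest number of courses sharing a slot, i.e., the smallest R with R ≥ (courses in slot s) for every s. The names `PREFERRED`, `SLOTS` and `min_rooms` are hypothetical.

```python
from itertools import product

PREFERRED = {"A": (1, 2), "B": (1, 3), "C": (2, 3), "D": (2,)}
SLOTS = (1, 2, 3)


def min_rooms(preferred, slots):
    """Return (rooms, assignment) minimizing the number of rooms needed."""
    courses = sorted(preferred)
    best = None
    # Steps 1 and 2: each course takes exactly one of its preferred slots.
    for choice in product(*(preferred[c] for c in courses)):
        # Step 3: rooms needed = busiest slot under this assignment.
        rooms = max(choice.count(s) for s in slots)
        if best is None or rooms < best[0]:
            best = (rooms, dict(zip(courses, choice)))
    return best


rooms, assignment = min_rooms(PREFERRED, SLOTS)
print(rooms, assignment)
```

Four courses share three slots, so at least two rooms are needed, and the search confirms that two suffice.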
▸ Pros: versatile modeling ability
  – Can be extended to handle almost every design aspect
    • Resource allocation
    • Module selection
    • Area, power, etc.
▸ Cons: computationally expensive
  – #variables = O(#nodes × #c-steps)
  – 0-1 assignment variables: need extensive search to find optimal solution
Summary: ILP Scheduling
Tension between Scalability and Quality
[Figure: scheduling approaches placed between “high scalability (w/ greedy decisions)” and “high quality (w/ global optimization)”, with “low quality” and “slow runtime” as the respective drawbacks.]
▸ List scheduling (e.g., [Parker et al., DAC’86])
▸ Force-directed [Paulin & Knight, TCAD’89]
▸ Meta heuristics (e.g., Ant colony [Wang et al., TCAD’07])
▸ (M)ILP
▸ ? : ① Handle rich constraints ② Perform global optimization ③ Achieve fast runtime
SDC-Based Scheduling
▸ SDC = System of difference constraints [J. Cong & Z. Zhang, DAC, 2006] [Z. Zhang & B. Liu, ICCAD, 2013]
▸ Example
  – Target cycle time: 5ns
  – Delay estimates: Add (+) 1ns, Load (ld) 3ns, Store (st) 1ns
[Figure: data-flow graph with loads v0, v1, v2 (3ns each), additions v3 and v4 (1ns each), and store v5 (1ns).]
▸ $s_i$: schedule variable for operation i
▸ Dependence constraints
  – <v0, v4>: s0 – s4 ≤ 0
  – <v1, v3>: s1 – s3 ≤ 0
  – <v2, v3>: s2 – s3 ≤ 0
  – <v3, v4>: s3 – s4 ≤ 0
  – <v4, v5>: s4 – s5 ≤ 0
▸ Cycle time constraints (timing constraints)
  – v2 → v5: s2 – s5 ≤ -1
  – v1 → v5: s1 – s5 ≤ -1
▸ Operation chaining is naturally supported
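A system of difference constraints can be checked for feasibility with Bellman-Ford on its constraint graph. The sketch below is a swapped-in textbook technique, not the solver described in the cited papers; `solve_sdc` and the tuple encoding `(u, v, b)` for $s_u - s_v \le b$ are hypothetical. It returns one feasible integer assignment shifted so that the earliest cycle is 0, or `None` when the constraints contradict each other.

```python
def solve_sdc(n, constraints):
    """Solve s[u] - s[v] <= b for all (u, v, b); return None if infeasible."""
    # dist starts at 0 for every variable, which models a virtual source
    # connected to each node by a zero-weight edge.
    dist = [0] * n
    for _ in range(n):
        changed = False
        for u, v, b in constraints:
            # Constraint s[u] - s[v] <= b is the edge v -> u with weight b.
            if dist[v] + b < dist[u]:
                dist[u] = dist[v] + b
                changed = True
        if not changed:
            lowest = min(dist)
            return [x - lowest for x in dist]
    # Still relaxing after n passes: negative cycle, so no solution exists.
    return None


# Constraints from the example: five dependences, two cycle-time constraints.
CONSTRAINTS = [(0, 4, 0), (1, 3, 0), (2, 3, 0), (3, 4, 0), (4, 5, 0),
               (2, 5, -1), (1, 5, -1)]

s = solve_sdc(6, CONSTRAINTS)
print(s)
```

In the returned assignment several dependent operations may share a cycle, which is how chaining shows up: a dependence only requires $s_u \le s_v$, and the cycle-time constraints force a cycle boundary where the accumulated delay would exceed the target.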
Difference Constraints
▸ A difference constraint is a formula of the form x – y ≤ b or x – y < b for numeric variables x and y, and constant b
▸ With scheduling variables, we use integer difference constraints to model a variety of scheduling constraints
  – x and y must have integral values
    • Thus b only needs to be an integer, which makes the form x – y < b redundant (it is equivalent to x – y ≤ b – 1)
SDC Constraint Matrix
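▸ The constraint matrix of SDC is a totally unimodular matrix (TUM):
  – Every nonsingular square submatrix has a determinant of -1 or +1
\[
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 1 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & -1
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}}_{s}
\le
\underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -1 \\ -1 \end{bmatrix}}_{b}
\]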
Linear programming with a TUM constraint matrix guarantees integral solutions [Hoffman & Kruskal, 1956] [Hochbaum & Shanthikumar, 1990]
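For a matrix this small, total unimodularity can be confirmed by brute force: enumerate every square submatrix and check that its determinant is -1, 0, or +1. The sketch below does exactly that with exact integer arithmetic; `det` and `is_totally_unimodular` are hypothetical helper names, and the approach is exponential, so it is only meant as a sanity check on toy examples.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, a in enumerate(m[0]):
        if a:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * a * det(minor)
    return total


def is_totally_unimodular(a):
    """True if every square submatrix has determinant -1, 0, or +1."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                sub = [[a[r][c] for c in cs] for r in rs]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Constraint matrix A of the SDC example.
A = [
    [1, 0, 0, 0, -1, 0],
    [0, 1, 0, -1, 0, 0],
    [0, 0, 1, -1, 0, 0],
    [0, 0, 0, 1, -1, 0],
    [0, 0, 0, 0, 1, -1],
    [0, 0, 1, 0, 0, -1],
    [0, 1, 0, 0, 0, -1],
]

print(is_totally_unimodular(A))
print(is_totally_unimodular([[1, 1], [-1, 1]]))
```

Each row of A has exactly one +1 and one -1, which is the structure of a directed-graph incidence matrix, so the first check succeeds; the second matrix has determinant 2 and fails.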