27
More Scheduling ECE 5775 High-Level Digital Design Automation Fall 2018

ECE 5775 High-Level Digital Design Automation …High-Level Digital Design Automation Fall 2018 Lab 2 due on Monday 9/24 Lab 3 will be posted over the weekend 1 Announcements What

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

More Scheduling

ECE 5775High-Level Digital Design Automation

Fall 2018

▸ Lab 2 due on Monday 9/24

▸ Lab 3 will be posted over the weekend

1

Announcements

▸ What is RCS?

▸ What is the time complexity of the ASAP scheduling algorithm in terms of |V| and |E|?

▸ With the ILP formulation, how many 0-1 variables are needed for a graph with N nodes and a latency bound of L cycles?

2

Revisiting Some Useful Scheduling Concepts

▸ More on resource-constrained scheduling– ILP formulation– List scheduling

▸ Time-constrained scheduling

▸ Scalable scheduling with realistic design constraints– SDC-based scheduling

3

Outline

▸ Two operations: v1 and v2– Each operate has a full-cycle delay – No operation chaining allowed– L = 3, i.e., a three-cycle scheduling window

▸ Objective: ASAP (no resource constraints here)▸ Write down the ILP formulation

4

Exercise: ILP for ASAP Scheduling

´

+

v1

v2

▸ Linear constraints:

– Unique start times:

– Dependence must be satisfied (no chaining)

– Resource constraints

ILP Formulation of RCS

xik =1, i =1,2,...,Nk∑

t j ≥ ti + di +1:∀(vi,vj )∈ E⇒ k x jkk∑ ≥ k xik +

k∑ di +1

xill=k−di

k

∑i:RT (vi )=r∑ ≤ ar, r =1,...,nres, k =1,...,L

5

RT(vi) : resource type ID (between 1~nres) of operation vi

ar is the number of available resources for resource of type r

▸ Objective: min cTt– t = start times vector, c = cost weight (e.g., [0 ...0 1])

▸ To minimize the overall latency, we can introduce a pseudo node to serve as a unique sink (vN+1) – This sink depends on the original primary output nodes – We then minimize the start time of the sink node

6

ILP Formulation of RCS: Objective

+

´

´

´´

´

´

+ <

-

-

sink

v2v1

v3

v4

v5

vN+1

v6

v7 v8

v9

v10

v11

=+

L

kkNxk

1,1min tN+1 min

▸ In general, the following will help the ILP solver run faster– Minimize # of variables and constraints– Simplify the constraints

▸ We can write the ILP without ASAP and ALAP, but using ASAP and ALAP will simplify the inequalities

+´´´´

´ ´ + <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7

v8

v9

v10

v11

´

´´

´

´

+ <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7 v8

v9

v10

v11

7

Use of ASAP and ALAP

ASAP schedule ALAP schedule

x1,1 + x1,2 + x1,3 + x1,4 =1x2,1 + x2,2 + x2,3 + x2,4 =1...x11,1 + x11,2 + x11,3 + x11,4 =1

8

ILP Formulation: Unique Start Time Constraints

x1,1 =1x2,1 =1x3,2 =1...x6,1 + x6,2 =1...x9,2 + x9,3 + x9,4 =1...

▸ Without using ASAP and ALAP ▸ Using ASAP and ALAP

))(),((

0

iLii

Si

Li

Siil

vALAPtvASAPttlandtlforx

==

><=

+´´´´

´ ´ + <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7

v8

v9

v10

v11

´

´´

´

´

+ <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7 v8

v9

v10

v11

assume L=4

ILP Formulation: Dependence Constraints

▸ Using ASAP and ALAP, the non-trivial inequalities are: (assuming no chaining and single-cycle ops)

2x7,2 +3x7,3 − x6,1 − 2x6,2 −1≥ 02x9,2 +3x9,3 + 4x9,4 − x8,1 − 2x8,2 −3x8,3 −1≥ 0

2x11,2 +3x11,3 + 4x11,4 − x10,1 − 2x10,2 −3x10,3 −1≥ 04x5,4 − 2x7,2 −3x7,3 −1≥ 0

5xn,5 − 2x9,2 −3x9,3 − 4x9,4 −1≥ 05xn,5 − 2x11,2 −3x11,3 − 4x11,4 −1≥ 0

9

+´´´´

´ ´ + <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7

v8

v9

v10

v11

´

´´

´

´

+ <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7 v8

v9

v10

v11

assume L=4

ILP Formulation: Resource Constraints▸ Resource constraints (assuming 2 ALUs and 2 multipliers)

2222222

4,114,94,5

3,113,103,93,4

2,112,102,9

1,10

3,83,7

2,82,72,62,3

1,81,61,21,1

++

+++

++

+

+++

+++

xxxxxxxxxxxxxxxxxxxxx

10

+´´´´

´ ´ + <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7

v8

v9

v10

v11

´

´´

´

´

+ <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7 v8

v9

v10

v11

assume L=4

▸ A widely-used heuristic algorithm for RCS – Schedule one control step (cycle) at a time– Maintain a list of “ready” operations considering dependence– Assign priorities to operations; most “critical” operations (with

the highest priorities) go first

▸ Often refers to a family of algorithms– Typically classified by the way priority function is calculated

• Static priority: Priorities are calculated once before scheduling• Dynamic priority calculation: Priorities are updated during scheduling

11

List Scheduling

12

Static Priority Example: Node Height

× +× × ×

× × + <

-

-

sink

4 4

3 2

2

1

3

1 1

2 2

Nodes are labelled with distance to sink (height)

Ready operations are colored in green

§ Assumptions:– All operations have unit delay– 2 MULTs, 1 AddSub, and 1 CMP available

13

Ready Nodes with Highest Priorities Picked First

× +× × ×

× × + <

-

-

sink

4 4

3 2

2

1

3

1 1

2 2 × +×

× ××

× + <-

-

sink

4 4

3

22

1

3

1 1

2

2

§ Assumptions:– All operations have unit delay– 2 MULTs, 1 AddSub, and 1 CMP available

14

× +×

× ××

× + <-

-

sink

4 4

3

22

1

3

1 1

2

2 × +×

×

×

×

×

+

<

-

-

sink

4 4

3

22

1

3

1

1

2

2

§ Assumptions:– All operations have unit delay– 2 MULTs, 1 AddSub, and 1 CMP available

Update Ready Nodes and Repeat for Each Step

15

Update Ready Nodes and Repeat for Each Step

× +×

×

×

×

×

+

<

-

-

sink

4 4

3

22

1

3

1

1

2

2 × +×

×

×

×

×

+

<

-

-

sink

4 4

3

22

1

3

1

1

2

2

§ Assumptions:– All operations have unit delay– 2 MULTs, 1 AddSub, and 1 CMP available

16

Repeat Until All Nodes Scheduled

× +×

×

×

×

×

+

<

-

-

sink

4 4

3

22

1

3

1

1

2

2 × +×

×

×

×

×

+

<

-

-

sink

4 4

3

22

1

3

1

1

2

2

§ Assumptions:– All operations have unit delay– 2 MULTs, 1 AddSub, and 1 CMP available

A Special Case

▸ With the following (very) restrictive conditions:– All operations have unit delay– All operations (and resources) of the same type– Graph is a forest

▸ List scheduling with static height-based priorities guarantees optimality

▸ This is known as Hu’s algorithm– T. C. Hu, Parallel sequencing and assembly line

problems. Operations Research, 9(6), 841-848, 1961– Guarantees

17

▸ Dual problem of resource-constrained scheduling– Overall latency is given as a constraint (deadline) – Minimize the total cost in terms of area (or resource

usage), power, etc.

▸ NP-hard problem– ILP formulation is exact but is not a polynomial-time

solution– Force-directed scheduling is a well-known heuristic

for TCS (see De Micheli chapter 5.4.4)

18

Time-Constrained Scheduling (TCS)

▸ Minimize the number of classrooms that the school has to allocate for the following courses

▸ Steps to formulate the ILP1. Create variables2. Each course must be scheduled to exactly one of the preferred slots3. Determine the number of rooms required

19

Exercise: ILP for Classroom Allocation

Course PreferredSlots

A (1) (2)B (1) (3)C (2) (3) D (2)

(1) 8:00 – 10:00am(2) 10:00am – 12:00pm(3) 12:00 – 2:00pm

▸ Pros: versatile modeling ability– Can be extended to handle almost every design

aspects• Resource allocation • Module selection• Area, power, etc.

▸ Cons: computationally expensive– #variables = O( #nodes * #c-steps)– 0-1 assignment variables: need extensive search to

find optimal solution

20

Summary: ILP Scheduling

21

Tension between Scalability and Quality

High scalability

(w/ greedy decisions)

High quality(w/ global optimization)

Lowquality

Slow runtime

List scheduling(e.g., [Parker et al., DAC’86])

(M) ILP

Force-directed

[Paulin & Knight,

TCAD’89]

①Handle rich constraints②Perform global optimization③Archive fast runtime

Meta heuristics(e.g., Ant colony [Wang et al., TCAD’07])

?

▸ SDC = System of difference constraints

22

SDC-Based Scheduling

• Target cycle time: 5ns• Delay estimates

– Add (+) 1ns– Load (ld) 3ns– Store (st) 1ns

si : schedule variable for operation i

§ Dependence constraints<v0 ,v4>:s0 – s4 ≤0<v1 ,v3>:s1 – s3 ≤0<v2 ,v3>:s2 – s3 ≤0<v3 ,v4>:s3 – s4 ≤0<v4 ,v5>:s4 – s5 ≤0

§ Cycle time constraintsv2à v5 :s2 – s5 ≤-1v1à v5 :s1 – s5 ≤-1

[J. Cong & Z. Zhang, DAC, 2006] [Z. Zhang & B. Liu, ICCAD, 2013]

ld

+

ldld

+

v1

v3

v4

v2v0

1ns

3ns

1ns

stv51ns

Timing constraints

Operation chaining is naturally supported

Difference Constraints

▸ A difference constraint is a formula in the form of x – y £ b or x – y < b for numeric variables x and y, and constant b

▸ With scheduling variables, we use integer difference constraints to model a variety of scheduling constraints– x and y must have integral values

• Thus b only needs to be an integer => form x – y < b is redundant

23

SDC Constraint Matrix

▸ The constraint matrix of SDC is a totally unimodular matrix (TUM): – Every nonsingular square submatrix has a determinant of -1/+1.

1 0 0 0 -1 00 1 0 -1 0 00 0 1 -1 0 00 0 0 1 -1 00 0 0 0 1 -10 0 1 0 0 -10 1 0 0 0 -1

s0s1s2s3s4s5

00000-1-1

£

A s b

24

Linear programming with a TUM constraint matrix guarantees integral solutions [Hoffman & Kruskal, 1956] [Hochbaum & Shanthikumar, 1990]

▸ More on SDC scheduling▸ Resource sharing

25

Next Class

▸ These slides contain/adapt materials developed by– Ryan Kastner (UCSD)

26

Acknowledgements