

Dynamic Scheduling and Dynamic Percolation

Elkin Garcia, CAPSL – UD

Based on a presentation made by Rishi Khan, ET International

Motivation

• Static Scheduling cannot achieve maximum performance on a many-core architecture (C64)

• Even for regular applications like matrix multiplication

Issues of Static Scheduling

• Blocks are not necessarily multiples of the Optimal Tile Size (OTS)

• Extra overhead for processing non-optimally sized tiles.

• It is worse when many processors share a small, fixed amount of on-chip memory
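The first issue can be illustrated with a small count. This is a minimal sketch, assuming a square optimal tile of 6x6 (the tile size quoted later for the computation tasks); the function name `tile_census` and the block sizes are hypothetical and chosen only for illustration.

```python
def tile_census(rows, cols, ots=6):
    """Count full and partial tiles when a rows x cols block is cut into ots x ots tiles.

    Returns (full_tiles, partial_tiles). Partial tiles are the leftover
    strips at the bottom and right edges of the block.
    """
    full_r, rem_r = divmod(rows, ots)
    full_c, rem_c = divmod(cols, ots)
    full = full_r * full_c
    # One extra tile row/column is needed whenever there is a remainder.
    total = (full_r + int(rem_r > 0)) * (full_c + int(rem_c > 0))
    return full, total - full


# A 48x48 block is an exact multiple of 6: no partial tiles.
print(tile_census(48, 48))
# A 40x40 block is not: 36 full tiles plus 13 partial ones.
print(tile_census(40, 40))
```

With these assumptions, `tile_census(192, 192)` gives 1024 full tiles and no partial ones, so a 192x192 result matrix divides evenly, while a statically assigned 40x40 block leaves 13 of its 49 tiles at a non-optimal size.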

Issues of Static Scheduling (2)

• SS does not consider stalls due to arbitration of shared resources.

• SS assumes that TUs doing the same amount of work will complete at the same time.

• Many-core architectures have plenty of shared resources:

– FPUs, crossbar, memory, and I-Caches that can produce unexpected stalls.

Dynamic Scheduling (DS)

Balances the work optimally in the presence of shared resources, with higher efficiency.

- Matrix C is partitioned only into tiles.

- Atomic operations are used for low overhead.

[Figure: A × B = C]
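The idea of claiming tiles of C at runtime can be sketched as follows. This is an illustration under stated assumptions, not the Cyclops64 implementation: Python has no hardware fetch-and-add, so the hypothetical `TileCounter` class emulates one with a lock, and the names `multiply_dynamic`, `tile`, and `workers` are invented for the example.

```python
import threading


class TileCounter:
    """Stand-in for an atomic fetch-and-add on a shared tile index (assumption)."""

    def __init__(self, total):
        self._next = 0
        self._total = total
        self._lock = threading.Lock()

    def claim(self):
        """Return the next unclaimed tile index, or None when all are taken."""
        with self._lock:
            if self._next >= self._total:
                return None
            idx = self._next
            self._next += 1
            return idx


def multiply_dynamic(A, B, tile=2, workers=4):
    """Compute C = A x B; each worker repeatedly claims one tile of C."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    tiles_r = -(-n // tile)  # ceiling division
    tiles_c = -(-m // tile)
    counter = TileCounter(tiles_r * tiles_c)

    def worker():
        while (t := counter.claim()) is not None:
            r0 = (t // tiles_c) * tile
            c0 = (t % tiles_c) * tile
            # Each tile is written by exactly one worker, so no further locking is needed.
            for i in range(r0, min(r0 + tile, n)):
                for j in range(c0, min(c0 + tile, m)):
                    C[i][j] = sum(A[i][x] * B[x][j] for x in range(k))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return C
```

A worker that stalls on a shared resource simply claims fewer tiles; no worker waits for a fixed, pre-assigned block to finish.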

Results on Cyclops64


Dynamic Percolation (DP)

• Assign tasks (codelets) to TUs at runtime with low overhead using a lock-free queue:

• Computation tasks: Compute optimally sized tiles of 6x6.

• Data movement tasks: Move inputs and outputs between SRAM and DRAM using double buffering.
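Double buffering for the data movement tasks can be sketched as below. This is a simplified illustration, not the original runtime: a single background thread plays the role of the data mover, Python lists stand in for DRAM chunks and SRAM buffers, and the names `process_double_buffered`, `dram_chunks`, and `compute` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_double_buffered(dram_chunks, compute):
    """Apply compute() to each chunk while the next chunk is being copied in.

    Two buffers alternate: one is being computed on, the other is being filled.
    """
    results = []
    if not dram_chunks:
        return results
    buffers = [None, None]  # the two "SRAM" buffers

    def load(slot, chunk):
        buffers[slot] = list(chunk)  # models a DRAM -> SRAM copy

    with ThreadPoolExecutor(max_workers=1) as mover:
        pending = mover.submit(load, 0, dram_chunks[0])
        for i in range(len(dram_chunks)):
            pending.result()  # wait until the current buffer is filled
            cur = i % 2
            if i + 1 < len(dram_chunks):
                # Start filling the other buffer before computing on this one.
                pending = mover.submit(load, 1 - cur, dram_chunks[i + 1])
            results.append(compute(buffers[cur]))
    return results


print(process_double_buffered([[1, 2], [3, 4], [5]], sum))
```

A buffer is refilled only after the computation that used it has returned, so the copy and the computation never touch the same buffer at the same time.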

Codelets

• A non-preemptive piece of code that runs to completion once its dependencies/events are met.

• Dependencies/Events:

– Data dependencies

– Resource constraints (threads/bandwidth)

– Desired behavior (power)
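One common way to model "runs once its dependencies are met" is a dependency counter. The sketch below assumes that approach; it is not taken from the slides, it is single-threaded, and the class name `Codelet` and its fields are hypothetical.

```python
class Codelet:
    """A unit of work that fires once all of its dependencies have been signaled."""

    def __init__(self, name, body, num_deps):
        if num_deps < 1:
            raise ValueError("this sketch expects at least one dependency")
        self.name = name
        self.body = body          # callable with no arguments
        self.pending = num_deps   # dependencies still outstanding
        self.done = False

    def signal(self):
        """Record that one dependency was satisfied; run the body on the last one."""
        if self.done:
            raise RuntimeError(f"{self.name} has already run")
        self.pending -= 1
        if self.pending == 0:
            self.body()           # runs to completion, never preempted here
            self.done = True
        return self.done


log = []
comp = Codelet("comp", lambda: log.append("comp"), num_deps=2)
comp.signal()   # e.g. input data has arrived
comp.signal()   # e.g. a thread unit is available; the body runs now
```

Data, resource, and behavior constraints are all treated the same way here: each is just one more signal the codelet waits for.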

Matrix Multiply in SRAM using Petri nets

[Figure: Petri net for A × B = C, size 192x192. Nodes: INIT, Comp1, Clean. Token count: 1024. Labels: T, Low. Legend: place, transition, tokens.]
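The standard Petri net firing rule (a transition is enabled when every input place holds enough tokens, and firing moves tokens from inputs to outputs) can be sketched as follows. The net built by `build_sram_net` is only a guess at the figure's structure: the place names `start`, `tile_ready`, `tile_done`, and `done`, and the arc weights, are assumptions made for illustration.

```python
class PetriNet:
    """Minimal place/transition net with weighted arcs."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> token count
        self.transitions = {}          # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (dict(inputs), dict(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, n in inputs.items():
            self.marking[p] -= n
        for p, n in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + n


def build_sram_net(tiles=1024):
    """Assumed structure: INIT releases one token per tile, Comp1 computes
    one tile per firing, Clean fires once all tiles are computed."""
    net = PetriNet({"start": 1})
    net.add_transition("INIT", {"start": 1}, {"tile_ready": tiles})
    net.add_transition("Comp1", {"tile_ready": 1}, {"tile_done": 1})
    net.add_transition("Clean", {"tile_done": tiles}, {"done": 1})
    return net
```

Under these assumptions, Comp1 can fire 1024 times after INIT, in any order and on any thread unit, and Clean becomes enabled only after the last one.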

Matrix Multiply in DRAM

[Figure: Petri net. Nodes: INIT, Comp1, Clean; INIT, Copy1, Clean; Done. Token counts: 1024, 8. Edge labels: T, F. Priority: Low.]

Double Buffer Computation Example

Rules:

• Always take the highest-priority task first

• If two tasks have the same priority, take the task that was enabled first

• Otherwise, choose arbitrarily

[Figure: animated Petri net shown over 13 slides. Nodes for set 1: INIT, Comp1, Clean; INIT, Copy1, Clean; Done. Nodes for set 2: INIT, Comp2, Clean; INIT, Copy2, Clean; Done. Other nodes: Start, ΔT. Token counts: 1024, 8. Edge labels: T, F. Priorities: High, Low.]

Labels highlighted in each frame of the animation:

• Frame 1: initial net

• Frame 2: Init Set 1 (High); ΔT; token counts 1, 1

• Frame 3: Init Set 2; Comp1(1024); token counts 1, 1024

• Frame 4: Comp1(1020); Comp2(1024); token counts 1020, 1024

• Frame 5: Comp2(1024); Clean; token count 1024

• Frame 6: Comp2(1020); Init Copy Set; token counts 1, 1020

• Frame 7: Comp2(1000); Copy (8); token counts 8, 1000

• Frame 8: Comp2(500); Clean; token count 500

• Frame 9: Comp2(495); Done; token counts 1, 495

• Frame 10: Comp2(490); Init Set 1; token counts 1, 490

• Frame 11: Comp2(480); Comp1(1024); token counts 480, 1024

• Frame 12: Comp1(1024); Clean; token count 1024

• Frame 13: Comp1(1020); Init Copy Set 2; token counts 1020, 1
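The three selection rules of the double buffer example can be sketched with a priority queue. This is an illustrative sketch, not the lock-free queue used on Cyclops64: the class name `ReadyQueue`, the numeric priority values, and the single-threaded design are assumptions.

```python
import heapq
import itertools

LOW, HIGH = 0, 1  # hypothetical numeric encoding of the Low/High labels


class ReadyQueue:
    """Holds enabled tasks and hands them out by priority, then by enabling order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # increases with every enabled task

    def enable(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        # The sequence number breaks ties in favor of the task enabled first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def take(self):
        """Return the next task to run, or None if nothing is enabled."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = ReadyQueue()
q.enable("Comp2", LOW)
q.enable("Init Set 1", HIGH)
q.enable("Comp1", LOW)
print([q.take() for _ in range(3)])
```

Because every task gets a unique sequence number, the third rule (choose arbitrarily) never applies in this sketch; it would only matter if two tasks could be enabled at exactly the same moment.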

The Cool Demo on MxM

• Due to Rishi Khan (ETI)

04/21/23 UHPC-Portland-Meeting-06-2009 24

Final Results


Scalability


Summary

• Static optimizations increase performance substantially.

• Dynamic Scheduling and Dynamic Percolation mitigate the unpredictable effects of resource sharing.

• The optimizations implemented are also power efficient.

04/21/23 Tutorial Project Part 2 28

Acknowledgements

• Professor Guang Gao

• ETI and CAPSL people who have helped on this project (Rishi Khan, Daniel Orozco, Kelly Livingston, Ioannis Venetis)

• Members of CAPSL
