31
2009-11-27 1 Simplified Design Flow (a picture from Ingo Sander) Hardware architecture So far, we have talked about only single processor systems “Concurrency” implemented by “scheduling” RT systems often consist of several processors Multiprocessor systems “Tightly connected” processors by a “high-speed” interconnect e.g. cross-bar, bus, NoC (network on chip) etc. Single processor with multi-thread Distributed Systems “Loosely connected” processors by a “low-speed” network e.g. CAN, Ethernet, Token Ring etc. --

Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

1

Simplified Design Flow

(a picture from Ingo Sander)

Hardware architecture

So far, we have talked about only single processor systems

– “Concurrency” implemented by “scheduling”

RT systems often consist of several processors

• Multiprocessor systems

– “Tightly connected” processors by a “high-speed” interconnect e.g. cross-bar, bus, NoC (network on chip) etc.

– Single processor with multi-thread

• Distributed Systems– “Loosely connected” processors by a “low-speed” network e.g. CAN, Ethernet, Token

Ring etc. --

Page 2: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

2

Multiprocessor/Distributed Systems

-- Local area distributed system (our focus)

Complete system

(d)

LAN

(examples)

OUTLINE

• Why multiprocessor?– energy, performance and predictability

• What are multiprocessor systems– OS etc

• Design RT systems on multiprocessors– Task Assigment

• Multiprocessor scheduling– (semi-)partitioned scheduling

– global scheduling

Page 3: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

3

Why multiprocessor systems?

“Tightly connected” processors by a “high-speed” interconnect e.g. cross-bar, bus, NoC etc.

1

10

100

1000

Now

Performance[log]

Year

Single Core

Multicore:Requires Parallel Applications

Hardware: Trends

Page 4: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

4

You must do “parallel processing” to get:

• Higher Performance– Single-core reached the “limit” of implicit parallelism – Frequency does not scale any more

• Lower Power Consumption: Battery lifetime, cost of energy

– Increasing the cores, decreasing the frequency• Performance = Cores * F 2* Cores * F/2 Cores * F• Power = C * V2 * F 2* C * (V /2)2 * F/2 C * V2 /4 * F

Keep the “same performance” using ¼ of the energy(doubling the cores) -- theoretically

• Experiments show that a 1% clock increase results in a 3% power increase.

CPU frequency vs Power consumption

1. Standard processor over-clocked 20%

2. Standard processor

3. Two standard processors each under-clocked 20%

Page 5: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

5

What’s happening now?

• General-Purpose Computing(Symposium on High-Performance Chips, Hot Chips 21, Palo Alto, Aug 23-25, 2009)

– 4 cores in notebooks

– 12 cores in servers

• AMD 12-core Magny-Cours will consume less energy than previous generations with 6 cores

– 16 cores for IBM servers, Power 7

• Embedded Systems

– 4 cores in ARM11 MPCore embedded processors

What next?

• Manycores (>100’s of cores) predicted to be here in a few years – e.g. Ericsson

Page 6: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

6

11

CPU

L1

CPU

L1

CPU

L1

CPU

L1

CPU

L1

CPU

L1

CPU

L1

CPU

L1

Ban

dw

idth

Typical Multicore Architecture

11

L2 Cache

Off-chip memory

Multiprocessor models

• Identical multiprocessors:

– each processor has the same computing capacity

• Uniform multiprocessors:

– different processors have different computing capacities

• Heterogeneous multiprocessors:

– each (task, processor) pair may have a different computing capacity

Page 7: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

7

Multiprocessor models

P1 P2 P3

background – problem – results

Co

mp

utin

g c

ap

acity

identical multiprocessors: each processor has the same computing capacity

Multiprocessor models

P1 P2 P3

Task T1 Task T2

background – problem – results

Page 8: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

8

uniform multiprocessors: different processors have different computing capacities

Multiprocessor models

P1 P2 P3

Task T1 Task T2

x

x/2 x/3

y y/2 y/3

background – problem – results

speed = 1 speed = 2 speed = 3

Multiprocessor models

P1 P2 P3

Task T1 Task T2

x/2 x/3

x

y

1.5 y

background – problem – results

heterogeneous multiprocessors: each (task, processor) pair may have a different computing capacity

y

Page 9: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

9

Single processor vs. multiprocessor OS

Page 10: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

10

Symmetric Multiprocessing (SMP)

Page 11: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

11

Simplified Design Flow of RT Systems

(a picture from Ingo Sander)

Task assignment

Task Assignment

• In the design flow:– First, the application is partitioned into tasks or task graphs.

– At some stage, the execution times, communication costs, data and control dependencies of all the tasks become known.

• Task assignment determines– how many processors needed (bin-packing problem)

– on which processor each task executes

• This is a very complex problem (NP-hard)– Often done off-line

– Often heuristics work

Page 12: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

12

Task Assignment

• The task models used in task assignment can vary in complexity

depending what considered/ignored:– Communication costs

– Data and control dependencies

– Resource requirements e.g. WCET, memory etc

• In multi-core platforms with shared caches, communication costs for tasks on the same chip may be very small

• It is often meaningful to consider the execution time requirement (WCET) and ignore communication in an early design phase

Multiprocessor scheduling

• "Given a set J of jobs where job ji has length li and a number of processors mi, what is the minimum possible time required to schedule all jobs in J on m

processors such that none overlap?" – Wikipedia

– That is, design a schedule such that the response time of the last tasks is minimized

(Alternatively, given M processors and N tasks,find a mapping from tasks to processors such that all the tasks are schedulable)

• The problem is NP-complete• It is also known as the “load balancing problem”

Page 13: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

13

Multiprocessor scheduling of “periodic/sporadic real-time tasks”

• Partitioned scheduling

– Static allocation – task assignment• Each task may only execute on a fixed processor

• No task migration

• Semi-partitioned scheduling

– Static allocation – task assignment• Each instance (or part of it) of a task is assigned to a fixed processor

• task instance or part of it may migrate

• Global scheduling

– Dynamic allocation – (dynamic) task assignment• Any instance of any task may execute on any processor

• Task migration

Multiprocessor Scheduling

952

1 6

8

4

new task instance

waiting queue

cpu 1 cpu 2 cpu 3

Global scheduling

cpu 1 cpu 2 cpu 3

2

5

2

1

22

3

6

7

4

2 3

new task instance

cpu 1 cpu 2 cpu 3

5

1

2

8

6

3

9

7

4

new task instance

Partitioned scheduling Semi-partitioned scheduling

Page 14: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

14

Partitioned scheduling

• Two steps:

– Determine a mapping of tasks to processors

– Perform run-time scheduling

• Example: Partitioned with EDF

– Assign tasks to the processors such that no processor’s capacity is exceeded (utilization bounded by 1.0)

– Schedule each processor using EDF

Bin-packing algorithms

• The problem concerns packing objects of varying sizes in boxes (”bins”) with the objective of minimizing number of used boxes.

– Solutions (Heuristics): Next Fit and First Fit

• Application to multiprocessor systems:

– Bins are represented by processors and objects by tasks.

– The decision whether a processor is ”full” or not is derived from a utilization-based schedulability test.

Page 15: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

15

Rate-Monotonic-First-Fit (RMFF): [Dhall and Liu, 1978]

• First, sort the tasks in the order of increasing periods.

• Task Assignment

– All tasks are assigned in the First Fit manner starting from the task with highest priority

– A task can be assigned to a processor if all the tasks assigned to the processor are RM-schedulable i.e.• the total utilization of tasks assigned on that processor is bounded

by n(21/n-1) where n is the number of tasks assigned.

(One may also use the Precise test to get a better assignment!)

– Add a new processor if needed for the RM-test.

Partitioned scheduling

• Advantages:

– Most techniques for single-processor scheduling are also applicable here

• Partitioning of tasks can be automated

– Solving a bin-packing algorithm

• Disadvantages:

– Cannot exploit/share all unused processor time

Page 16: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

16

Partitioned Scheduling to Minimize Communication Costs

• Communication between tasks which run on same processor is usually significantly lower than for tasks that run on different processors

– C ij = Communication Cost (e.g. amount of data to send)

• When communication costs are considered, objective of task assignment is twofold

– Find minimum number of processors needed

– Find minimum of communication costs for this number of processors

• The problem may be solved using optimization techniques such as integer/linear programming.

Example of Task Partitioning

P2P1 P1

Page 17: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

17

Semi-Partitioned scheduling

• High resource utilization

• High overhead (due to task migration)

Example:(semi-)Partitioned Scheduling Algorithm

• sort all tasks in increasing priority order

7

6

5

4

3

2

1highest priority

lowest priority

Page 18: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

18

• select the processor on which the assigned utilization is the lowest

7

6

5

4

3

2

1

P1 P2 P3

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

• select the processor on which the assigned utilization is the lowest

6

5

4

3

2

1

P1 P2 P3

7

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

Page 19: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

19

• select the processor on which the assigned utilization is the lowest

5

4

3

2

1

P1 P2 P3

76

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

• select the processor on which the assigned utilization is the lowest

4

3

2

1

P1 P2 P3

76

5

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

Page 20: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

20

• select the processor on which the assigned utilization is the lowest

3

2

1

P1 P2 P3

76 54

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

• select the processor on which the assigned utilization is the lowest

2

1

P1 P2 P3

76

54

3

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

Page 21: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

21

• select the processor on which the assigned utilization is the lowest

1

P1 P2 P3

76 54

321

22

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

• select the processor on which the assigned utilization is the lowest

1

P1 P2 P3

76

54

32122

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

Page 22: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

22

• select the processor on which the assigned utilization is the lowest

P1 P2 P3

76 54

321

12

11

22

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

• select the processor on which the assigned utilization is the lowest

P1 P2 P3

76

54

3211211

22

highest priority

lowest priority

Example:(semi-)Partitioned Scheduling Algorithm

Page 23: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

23

Global scheduling

• All ready tasks are kept in a global queue

• When selected for execution, a task can be dispatched to any processor, even after being preempted

Algorithms for Global scheduling

• EDF – Unfortunately not optimal!

– No simple schedulability test known (only sufficient)

• Fixed Priority Scheduling e.g. RM

– Difficult to find the optimal priority order

– Difficult to check the schedulability

• Any algorithm for single processor scheduling may work, but schedulability analysis is non-trivial.

Page 24: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

24

Example Priority Assignment

Schedulability Test

Note that U/m is close to 1/3 when m is large enough.

This means that regardless of the number of tasks, and the number of processors if no more than 33% of the processing capacity is required, the task set is schedulable. For traditional RM, this is not true.

Page 25: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

25

Priority Assignment

Priority Assignment

Page 26: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

26

Poor Resource Utilization

Page 27: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

27

Known Results

20

10

30

40

60

50

70

80

Multiprocessor Scheduling

Global Partitioned

Fixed

Priority

Dynamic

Priority

Semi-partitioned

Fixed

Priority

Dynamic

Priority

Fixed

Priority

Dynamic

Priority

38

%

50

Liu and Layland’s

Utilization Bound

50 50

65 66

utilizationbound

69

.3

Global Scheduling: + and -

• Advantages:– Supported by most multiprocessor operating systems

• Windows NT, Solaris, Linux, ...

– Effective utilization of processing resources (if it works)• Unused processor time can easily be reclaimed at run-time (mixture of hard

and soft RT tasks to optimize resource utilization)

• Disadvantages:– Few results from single-processor scheduling can be used– No “optimal” algorithms known except idealized assumption (Pfair sch)– Poor resource utilization for hard timing constraints

• No more than 50% resource utilization can be guaranteed for hard RT tasks

– Suffers from scheduling anomalies• Adding processors and reducing computation times and other parameters can

actually decrease optimal performance with some scenarios!

Page 28: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

28

Underlying causes

– The ”root of all evil” in global scheduling: (Liu, 1969):

The simple fact that a task can use only one processor even when several processors are free at the same time adds a surprising amount of difficulty to the scheduling of multiple processors.

– Dhall’s effect: with RM, DM and EDF, some low-utilization task sets can be un-schedulable regardless of how many processors are used.

– Hard-to-find critical instant: a critical instant does not always occur when a task arrives at the same time as all its higher-priority tasks.

Page 29: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

29

How to avoid Dhall’s effect

• Problem: RM, DM and EDF only account for task periods!

– Actual computation demands are not accounted for.

• Solution: Dhall’s effect can easily be avoided by letting tasks with high utilization receive higher priority:

Page 30: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

30

MP Scheduling Anomalies

• Adding processors, reducing computation times, or “improving” other parameters can actually decrease optimal performance in some scenarios!

Doubling Processor Speed (increasing the number of processors) may cause a deadline

miss

Page 31: Simplified Design Flo€¦ · Complete system (d) LAN (examples) OUTLINE •Why multiprocessor? –energy, performance and predictability •What are multiprocessor systems –OS

2009-11-27

31

Anomalies under Resource constraints

• Task set J of five tasks on two processors.• Task 2 and 4 share the same resource in exclusive mode• Static allocation P1 (J1,J2) and P2 (J3,J4,J5)• Reducing the computation time of task 1 will increase the optimal

schedule time!

1 2

3 4 5

P1

P2

0 2 4 6 8 10 12 14

1 2

3 4 5

P1

P2

0 2 4 6 8 10 12 14

16 18

16 18

20 22

20 22

critical section

blocking