
EECS 570

Page 1: EECS 570

EECS 570: Fall 2003 -- rev1

EECS 570

• Notes on Chapter 2 – Parallel Programs

Page 2: EECS 570

Terminology

• Task:
– Programmer-defined sequential piece of work
– Concurrency is only across tasks
– Qualitative amount of work may be:
• small
• large

• Process (thread):
– Abstract entity that performs tasks
– Equivalent OS concepts
– Must communicate and synchronize with other processes
– Executes on a processor
• typically a one-to-one mapping (see the sketch below)
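
A minimal sketch of the task/process distinction in Python (the work function and pool size are illustrative, not from the slides):

    from multiprocessing import Pool

    def task(item):
        # A programmer-defined sequential piece of work; concurrency
        # exists only across tasks, never within one.
        return item * item

    if __name__ == "__main__":
        # 4 processes perform 16 tasks; the processes typically map
        # one-to-one onto processors.
        with Pool(processes=4) as pool:
            print(pool.map(task, range(16)))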

Page 3: EECS 570

Steps in Creating a Parallel Program

[Figure: decomposition → assignment → orchestration → mapping, taking a sequential computation to tasks, processes, and processors]

Page 4: EECS 570

Decomposition

• Break up computation into tasks to be divided among processes
– could be static, quasi-static, or dynamic
– i.e., identify concurrency and decide level at which to exploit it

• Goal: Enough tasks to keep processes busy... but not too many

Page 5: EECS 570

Amdahl's Law

• Assume fraction s of sequential execution is inherently serial
– remainder (1 - s) can be perfectly parallelized

• Speedup with p processors is:

Speedup(p) = 1 / (s + (1 - s)/p)

• Limit: as p → ∞, speedup approaches 1/s
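
A quick numeric check of the formula (a minimal Python sketch; the 5% serial fraction is an arbitrary example, not from the slides):

    def amdahl_speedup(s, p):
        # Speedup(p) = 1 / (s + (1 - s) / p)
        return 1.0 / (s + (1.0 - s) / p)

    # Even a 5% serial fraction caps speedup at 1/s = 20:
    for p in (8, 32, 1024):
        print(p, round(amdahl_speedup(0.05, p), 2))  # 5.93, 12.55, 19.64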

Page 6: EECS 570

Aside on Cost-Effective Computing

• Isn't Speedup(P) < P inefficient?
• If only throughput matters, why not use P separate computers instead?
• But much of a computer's cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]
– Let Costup(P) = Cost(P) / Cost(1)

• Parallel computing is cost-effective when Speedup(P) > Costup(P) (sketched below)

• E.g., for an SGI PowerChallenge with 500 MB: Costup(32) = 8.6
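
The criterion is easy to mechanize (a minimal sketch; the prices and the measured speedup are hypothetical, only Costup(32) = 8.6 comes from the slide):

    def costup(cost_p, cost_1):
        # Costup(P) = Cost(P) / Cost(1)   [Wood & Hill]
        return cost_p / cost_1

    # Hypothetical prices: a 32-processor machine at 8.6x the uniprocessor cost.
    c = costup(cost_p=860_000, cost_1=100_000)   # 8.6
    speedup = 12.0                               # assumed measured speedup
    print(speedup > c)   # True: cost-effective even though 12 << 32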

Page 7: EECS 570

Assignment

• Assign tasks to processes
– Again, can be static, dynamic, or in between

• Goals:
– Balance workload
– Reduce communication
– Minimize management overhead

• Decomposition + Assignment = Partitioning
• Mostly independent of architecture/programming model

Page 8: EECS 570

Orchestration

• How do we achieve task communication, synchronization, and assignment, given the programming model?
– data structures (naming)
– task scheduling
– communication: messages, shared-data accesses
– synchronization: locks, semaphores, barriers, etc. (see the sketch below)

• Goals:
– Reduce cost of communication and synchronization
– Preserve data locality (reduce communication, enhance caching)
– Schedule tasks to satisfy dependencies early
– Reduce overhead of parallelism management
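
A minimal sketch of two of these mechanisms, locks and barriers, using Python's threading module (the partial-sum task is a placeholder):

    import threading

    NUM_THREADS = 4
    total = 0
    lock = threading.Lock()                   # protects the shared sum
    barrier = threading.Barrier(NUM_THREADS)  # all threads rendezvous here

    def worker(tid):
        global total
        partial = sum(range(tid * 100, (tid + 1) * 100))  # placeholder work
        with lock:            # synchronized update of shared data
            total += partial
        barrier.wait()        # no thread proceeds until all have added
        if tid == 0:
            print("total =", total)   # 79800

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()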

Page 9: EECS 570

Mapping

• Assign processes to processors
– Usually up to OS, maybe with user hints/preferences
– Usually assume one-to-one, static

• Terminology:
– space sharing
– gang scheduling
– processor affinity (see the sketch below)
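
As a concrete example of a user hint, pinning the calling process to one CPU (a minimal sketch; os.sched_setaffinity is Linux-specific):

    import os

    print(os.sched_getaffinity(0))   # CPUs the OS may schedule us on
    os.sched_setaffinity(0, {0})     # hint: restrict this process to CPU 0
    print(os.sched_getaffinity(0))   # now {0}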

Page 10: EECS 570

Parallelizing Computation vs. Data

Above view is centered around computation
• Computation is decomposed and assigned (partitioned)

Partitioning data is often a natural view too
• Computation follows data: owner computes (sketched below)
• Grid example; data mining; High Performance Fortran (HPF)

But not always sufficient
• Distinction between computation and data stronger in many applications
– Barnes-Hut, Raytrace
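
A minimal owner-computes sketch (the helper names and the grid update are hypothetical):

    def owned_rows(pid, n, p):
        # Data partitioning: process pid owns one contiguous block of rows.
        return range(pid * n // p, (pid + 1) * n // p)

    def sweep(grid, pid, n, p):
        # Owner computes: each process updates only the rows it owns.
        for i in owned_rows(pid, n, p):
            grid[i] = [x + 1 for x in grid[i]]   # placeholder stencil step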

Page 11: EECS 570

Assignment

• Static assignments (given decomposition into rows)
– block: row i is assigned to process floor(i / (n/p))
– cyclic: process i is assigned rows i, i+p, and so on

• Dynamic
– get a row index, work on the row, get a new row, and so on

• Static assignment reduces concurrency (from n to p)
– block assignment reduces communication by keeping adjacent rows together (see the sketch below)
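
A minimal sketch of the three policies for n rows and p processes (function names are illustrative, not from the slides):

    import itertools

    def block_owner(i, n, p):
        # Block: row i goes to process floor(i / (n/p)).
        return i // (n // p)

    def cyclic_owner(i, p):
        # Cyclic: rows i, i+p, i+2p, ... all belong to process i mod p.
        return i % p

    # Dynamic: each process grabs the next unclaimed row; a real
    # implementation would use a synchronized counter (fetch-and-add).
    next_row = itertools.count()
    def grab_row():
        return next(next_row)

    n, p = 8, 4
    print([block_owner(i, n, p) for i in range(n)])  # [0, 0, 1, 1, 2, 2, 3, 3]
    print([cyclic_owner(i, p) for i in range(n)])    # [0, 1, 2, 3, 0, 1, 2, 3]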