EECS 570
• Notes on Chapter 2 – Parallel Programs
Terminology
• Task:
  – Programmer-defined sequential piece of work
  – Concurrency is only across tasks
  – Amount of work per task may be small or large
• Process (thread):
  – Abstract entity that performs tasks
  – Equivalent OS concepts
  – Must communicate and synchronize with other processes
  – Executes on a processor (typically a one-to-one mapping)
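To make the task/process distinction concrete, here is a minimal sketch (mine, not from the slides) in Python: many small programmer-defined tasks are mapped onto a few worker threads, so concurrency exists only across tasks while each task itself runs sequentially.

    # Minimal sketch: many tasks, few processes (threads).
    from concurrent.futures import ThreadPoolExecutor

    def task(i):
        # A small, sequential, programmer-defined piece of work.
        return i * i

    if __name__ == "__main__":
        # 4 worker threads execute 100 tasks; each thread typically
        # maps one-to-one onto a processor.
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(task, range(100)))
        print(sum(results))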
Steps in Creating a Parallel Program
• Decomposition → Assignment → Orchestration → Mapping (each step is detailed on the following slides)
Decomposition
• Break up computation into tasks to be divided among processes
  – Could be static, quasi-static, or dynamic
  – i.e., identify concurrency and decide the level at which to exploit it
• Goal: enough tasks to keep processes busy... but not too many
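As a rough illustration of this tradeoff (mine, not from the slides), the Python sketch below statically decomposes a sum into chunk tasks; the grain parameter controls how many tasks exist, with too few leaving workers idle and too many adding scheduling overhead.

    # Hypothetical sketch: static decomposition of a sum into tasks.
    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(chunk):
        return sum(chunk)

    def parallel_sum(data, grain):
        # 'grain' sets the task size: about len(data)/grain tasks in total.
        chunks = [data[i:i + grain] for i in range(0, len(data), grain)]
        with ProcessPoolExecutor() as pool:
            return sum(pool.map(partial_sum, chunks))

    if __name__ == "__main__":
        data = list(range(1_000_000))
        print(parallel_sum(data, grain=100_000))   # 10 tasks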
Amdahl's Law
• Assume a fraction s of sequential execution is inherently serial
  – The remainder (1 - s) can be perfectly parallelized
• Speedup with p processors:

  Speedup(p) = 1 / (s + (1 - s)/p)

• Limit: as p → ∞, Speedup(p) → 1/s
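A few lines of Python make the limit visible; this simply evaluates the formula above and is not part of the original slides.

    # Amdahl's Law as stated above: Speedup(p) = 1 / (s + (1 - s)/p).
    def speedup(s, p):
        return 1.0 / (s + (1.0 - s) / p)

    # With s = 0.1, speedup approaches the limit 1/s = 10 as p grows:
    for p in (1, 2, 8, 32, 1024):
        print(p, round(speedup(0.1, p), 2))   # 1.0, 1.82, 4.71, 7.8, 9.91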
Aside on Cost-Effective Computing
• Isn't Speedup(P) < P inefficient?
• If only throughput matters, why not use P separate computers instead?
• But much of a computer's cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]
• Let Costup(P) = Cost(P) / Cost(1)
• Parallel computing is cost-effective when Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge with 500 MB of memory, Costup(32) = 8.6, so 32 processors pay off whenever Speedup(32) > 8.6
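The criterion is easy to apply given a cost model. The sketch below uses made-up numbers (the fixed and per-processor costs and the measured speedup are assumptions, not figures from Wood & Hill) to show the comparison.

    # Hypothetical cost model: a large fixed cost (memory, I/O, chassis)
    # plus an incremental per-processor cost. Numbers are illustrative.
    FIXED = 40_000       # assumed base cost ($)
    PER_PROC = 10_000    # assumed cost per processor ($)

    def costup(p):
        return (FIXED + p * PER_PROC) / (FIXED + PER_PROC)

    # Cost-effective when Speedup(P) > Costup(P):
    p, measured_speedup = 32, 12.0       # assumed measured speedup
    print(round(costup(p), 1))           # 7.2 under this model
    print(measured_speedup > costup(p))  # True: cost-effective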
Assignment
• Assign tasks to processes
  – Again, can be static, dynamic, or in between
• Goals:
  – Balance workload
  – Reduce communication
  – Minimize management overhead
• Decomposition + Assignment = Partitioning
• Mostly independent of architecture/programming model
Orchestration
• How do we achieve task communication, synchronization, and assignment, given the programming model?
  – Data structures (naming)
  – Task scheduling
  – Communication: messages, shared-data accesses
  – Synchronization: locks, semaphores, barriers, etc.
• Goals:
  – Reduce the cost of communication and synchronization
  – Preserve data locality (reduce communication, enhance caching)
  – Schedule tasks to satisfy dependencies early
  – Reduce the overhead of parallelism management
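For a concrete (hypothetical) feel for two of the primitives named above, the Python sketch below uses a lock to protect a shared sum and a barrier to separate the accumulate phase from the report phase.

    # Sketch: a lock for shared-data access, a barrier between phases.
    import threading

    NTHREADS = 4
    total = 0
    lock = threading.Lock()
    barrier = threading.Barrier(NTHREADS)

    def worker(tid, data):
        global total
        local = sum(data[tid::NTHREADS])  # this thread's share of the work
        with lock:                        # synchronized shared-data update
            total += local
        barrier.wait()                    # wait until all partial sums are in
        if tid == 0:
            print("total =", total)

    data = list(range(100))
    threads = [threading.Thread(target=worker, args=(t, data))
               for t in range(NTHREADS)]
    for t in threads: t.start()
    for t in threads: t.join()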
Mapping
• Assign processes to processors
  – Usually up to the OS, maybe with user hints/preferences
  – Usually assume a one-to-one, static mapping
• Terminology:
  – Space sharing
  – Gang scheduling
  – Processor affinity
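As an example of a user affinity hint, Python exposes the Linux affinity system call; the sketch below (Linux-only, an illustration rather than part of the slides) pins the calling process to two CPUs.

    # Processor-affinity hint to the OS mapping (Linux-only).
    import os

    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {0, 1})   # pin this process to CPUs 0 and 1
        print(os.sched_getaffinity(0))    # -> {0, 1}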
Parallelizing Computation vs. Data
• The view above is centered on computation
  – Computation is decomposed and assigned (partitioned)
• Partitioning data is often a natural view too
  – Computation follows data: "owner computes"
  – Examples: grid computations; data mining; High Performance Fortran (HPF)
• But it is not always sufficient
  – The distinction between computation and data is stronger in many applications, e.g., Barnes-Hut, Raytrace
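A minimal sketch of the "owner computes" rule (hypothetical, not from the slides): the data is block-partitioned among p owners, and each process updates only the elements it owns.

    # Owner computes: partition the DATA; computation follows the data.
    def owner(i, n, p):
        return i // (n // p)          # block partition: owner of element i

    def owned_updates(me, a, p):
        for i in range(len(a)):
            if owner(i, len(a), p) == me:
                a[i] *= 2             # each element updated only by its owner

    a = list(range(8))
    for me in range(4):               # simulate p = 4 owners in turn
        owned_updates(me, a, 4)
    print(a)                          # every element touched exactly once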
Assignment
• Static assignments (given decomposition into n rows among p processes)
  – Block: row i is assigned to process floor(i / (n/p))
  – Cyclic: process i is assigned rows i, i+p, i+2p, and so on
• Dynamic
  – Get a row index, work on the row, get a new row, and so on
• Static assignment reduces concurrency (from n to p)
  – Block assignment reduces communication by keeping adjacent rows together
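The three schemes reduce to simple index computations; here is a sketch in Python (variable names are mine) for n rows and p processes.

    # Block and cyclic static assignment of n rows to p processes.
    def block_rows(proc, n, p):
        size = n // p                    # rows per process (assumes p divides n)
        return list(range(proc * size, (proc + 1) * size))

    def cyclic_rows(proc, n, p):
        return list(range(proc, n, p))   # rows proc, proc+p, proc+2p, ...

    n, p = 12, 3
    print(block_rows(1, n, p))    # [4, 5, 6, 7]  (adjacent rows together)
    print(cyclic_rows(1, n, p))   # [1, 4, 7, 10]

    # Dynamic assignment: grab the next unclaimed row from a shared
    # counter (a real version would protect this with a lock).
    from itertools import count
    next_row = count(0)
    row = next(next_row)          # get a row, work on it, get a new one...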