30
INTRODUCTION TO PARALLEL ALGORITHMS

INTRODUCTION TO PARALLEL ALGORITHMS. Objective Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Embed Size (px)

Citation preview

Page 1: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

INTRODUCTION TO PARALLEL ALGORITHMS

Page 2: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Objective

Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Characteristics of Tasks and Interactions Parallel Algorithm Design Models

Page 3: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

What is a Parallel Algorithm? Imagine you needed to find a lost child in

the woods.

Even in a small area, searching by yourself would be very time consuming

Now if you gathered some friends and family to help you, you could cover the woods in much faster manner…

Page 4: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Sherwood Forest

Page 5: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Definition

In computer science, a parallel algorithm or concurrent algorithm, as opposed to a traditional sequential (or serial) algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.

Page 6: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Elements of a Parallel Algorithm

„ Pieces of work that can be done concurrently- tasks

„ Mapping of the tasks onto multiple processors- processes vs processors

„ Distribution of input/output & intermediate data across the different processors

„ Management the access of shared data either input or intermediate „ Synchronization of the processors at various points

of the parallel execution

Page 7: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Decomposition, Tasks, and Dependency Graphs The first step in developing a parallel algorithm is to

decompose the problem into tasks that can be executed concurrently

A given problem may be docomposed into tasks in many different ways.

Tasks may be of same, different, or even indetermined sizes.

A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.

Page 8: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Granularity of Task Decompositions

The number of tasks into which a problem is decomposed determines its granularity.

Decomposition into a large number of tasks results in fine-grained decomposition and that into a small number of tasks results in a coarse grained decomposition.

A coarse grained counterpart to the dense matrix-vector product example. Each task in this example corresponds to the computation of three elements of the result vector.

Page 9: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Example: Multiplying a Dense Matrix with a Vector

Computation of each element of output vector y is independent of other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.

Page 10: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Example: Database Query Processing Consider the execution of the query:

MODEL = ``CIVIC'' AND YEAR = 2001 AND(COLOR = ``GREEN'' OR COLOR = ``WHITE)

on the following database:

ID# Model Year Color Dealer Price

4523 Civic 2002 Blue MN $18,000

3476 Corolla 1999 White IL $15,000

7623 Camry 2001 Green NY $21,000

9834 Prius 2001 Green CA $18,000

6734 Civic 2001 White OR $17,000

5342 Altima 2001 Green FL $19,000

3845 Maxima 2001 Blue NY $22,000

8354 Accord 2000 Green VT $18,000

4395 Civic 2001 Red CA $17,000

7352 Civic 2002 Red WA $18,000

Page 11: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Example: Database Query Processing

The execution of the query can be divided into subtasks in variousways. Each task can be thought of as generating an intermediatetable of entries that satisfy a particular clause.

Decomposing the given query into a number of tasks.Edges in this graph denote that the output of one task

is needed to accomplish the next.

Page 12: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Task-Dependency Graph

Key Concepts Derived from the Task Dependency Graph

…Degree of Concurrency „ The number of tasks that can be concurrently

executed …Critical Path

„ The longest vertex-weighted path in the graph … The weights represent task size

…Task granularity affects both of the above characteristics

Page 13: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Critical Path Length A directed path in the task dependency graph represents

a sequence of tasks that must be processed one after the other.

The longest such path determines the shortest time in which the program can be executed in parallel.

The length of the longest path in a task dependency

graph is called the critical path length.

Page 14: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Critical Path Length Consider the task dependency graphs of the two

database query decompositions:

Page 15: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Task-Interaction Graph Captures the pattern of interaction between tasks … This graph usually contains the task-dependency graph as a

subgraph „ i.e., there may be interactions between tasks even if

there are no dependencies … these interactions usually occur due to accesses on shared

data

Page 16: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Attributes of parallel algorithms

concurrency scalability locality and modularity

Page 17: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Contd… Concurrency refers to the ability to perform many

actions simultaneously; this is essential if a program is to execute on many processors. 

Scalability indicates resilience to increasing processor counts and is equally important, as processor counts appear likely to grow in most environments. 

Locality means a high ratio of local memory accesses to remote memory accesses (communication); this is the key to high performance on multicomputer architectures. 

Modularity ---the decomposition of complex entities into simpler components---is an essential aspect of software engineering, in parallel computing as well as sequential computing.

Page 18: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Design Process of Parallel Algorithms Partitioning. The computation that is to be performed and the

data operated on by this computation are decomposed into small tasks. Practical issues such as the number of processors in the target computer are ignored, and attention is focused on recognizing opportunities for parallel execution.

Communication. The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined. 

Agglomeration. The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs. If necessary, tasks are combined into larger tasks to improve performance or to reduce development costs. 

Mapping. Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of maximizing processor utilization and minimizing communication costs. Mapping can be specified statically or determined at runtime by load-balancing algorithms.

Page 19: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Contd…

Page 20: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Communication Static

Each processor is hard-wired to every other processor

Completely Connected Star Connected Bounded Degree

Dynamic Processors are connected to a series of switches

Page 21: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Agglomeration

Page 22: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Processes and Mapping

In general, the number of tasks in a decomposition exceeds the number of processing elements available.

For this reason, a parallel algorithm must also provide a mapping of tasks to processes.

Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use processes, not in the UNIX sense of a process, rather, simply as a collection of tasks and associated data.

Page 23: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Processes and Mapping (Cont..) Appropriate mapping of tasks to processes is critical to the

parallel performance of an algorithm. Mappings are determined by both the task dependency and

task interaction graphs. Task dependency graphs can be used to ensure that work is

equally spread across all processes at any point (minimum idling and optimal load balance).

Task interaction graphs can be used to make sure that processes need minimum interaction with other processes (minimum communication).

Page 24: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Processes and Mapping (Cont..)An appropriate mapping must minimize parallel execution time by:

Mapping independent tasks to different processes.

Assigning tasks on critical path to processes as soon as they become available.

Minimizing interaction between processes by mapping tasks with dense interactions to the same process.

Page 25: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Processes and Mapping: Example

Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes.

Page 26: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Parallel Algorithm Models Master-Slave Model: One or more processes generate work and

allocate it to worker processes. This allocation may be static or dynamic.

Pipeline / Producer-Comsumer Model: A stream of data is passed through a succession of processes, each of which perform some task on it.

Hybrid Models: A hybrid model may be composed either of multiple models applied hierarchically or multiple models applied sequentially to different phases of a parallel algorithm.

Page 27: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Refrences Principles of Parallel Algorithm Design by Ananth Grama,

Anshul Gupta, George Karypis, and Vipin Kumar

http://www.cs.cmu.edu/~scandal/nesl/algorithms.html

http://www-users.cs.umn.edu/~karypis/parbook/

Page 28: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Summary Parallel Algorithm:

It is an algorithm which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.

Decompose the problem into tasks that can be executed concurrently The number of tasks into which a problem is decomposed determines its

granularity. Task-Dependency Graph

Based on this graph,mapping is done between processes and processors

Task-Interaction Graph Captures the pattern of interaction between tasks

Critical Path Length A directed path in the task dependency graph represents a sequence of

tasks that must be processed one after the other. The length of the longest path in a task dependency graph is called the

critical path length.

Page 29: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

Summary(Cont….) Attributes of parallel algorithms

concurrency scalability locality and modularity

Design Process of Parallel Algorithms Partitioning Communication Agglomeration Mapping

Page 30: INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors

THANK YOU