
Parallel and Distributed Algorithms-IMPORTANT QUESTION




PARALLEL AND DISTRIBUTED ALGORITHMS

IMPORTANT QUESTIONS

Q1. Differentiate between parallel and distributed computing systems with appropriate diagrams?

DIAGRAMS OF DISTRIBUTED AND PARALLEL SYSTEM:-

(a)–(b) A Distributed System. (c) A Parallel System.

Q2. Explain different stages of designing parallel algorithms?

Designing Parallel Algorithms

• Parallel algorithm design is not easily reduced to simple recipes.

• Goal is to suggest a framework within which parallel algorithm design can be explored.

• In the process, we hope to develop intuition as to what constitutes a good parallel algorithm.

Some issues in the design of parallel algorithms

• Efficiency

• Scalability

• Partitioning computations:

○ Domain Decomposition


○ Functional Decomposition Techniques

• Locality

• Synchronous and Asynchronous communication

• Agglomeration as a means of reducing communication

• Load-balancing strategies.

STAGES OF DESIGNING PARALLEL ALGORITHMS:-

1. Methodical Design

➢ This methodology structures the design process as four distinct stages:

• Partitioning,

• Communication,

• Agglomeration, and

• Mapping.

➢ In the first two stages, we focus on concurrency and scalability. The third and fourth stages deal with locality and other performance-related issues.

• Partitioning

✔ Decompose the computation and the data operated on by this computation into small tasks.

✔ Practical issues such as the number of processors in the target computer are ignored.

✔ Focus on recognizing opportunities for parallel execution.

• Communication

✔ The communication required to coordinate task execution is determined.

✔ Appropriate communication structures and algorithms are defined.

• Agglomeration

✔ The task and communication structures defined in the first two stages of a design are evaluated with respect to
○ Performance requirements
○ Implementation costs.

✔ If necessary, tasks are combined into larger tasks to improve performance or reduce development costs.

• Mapping


✔ Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of
○ Maximizing processor utilization and
○ Minimizing communication costs.

✔ Mapping can be specified statically or determined at runtime by load-balancing algorithms.

DIAGRAM :-

Figure 2.1: PCAM: a design methodology for parallel programs. Starting with a problem specification, we develop a partition, determine communication requirements, agglomerate tasks, and finally map tasks to processors.

2. Partitioning

➢ The partitioning stage of a design is intended to expose opportunities for parallel execution.

➢ The focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.

➢ In later design stages, evaluation of communication requirements, the target architecture, or software engineering issues may lead us to forego opportunities for parallel execution identified at this stage.

➢ In this first stage we avoid prejudging alternative partitioning strategies.

➢ A good partition divides into small pieces both the

• Computation associated with a problem and
• The data on which this computation operates.

➢ Programmers most commonly first focus on the data associated with a problem:


• first determine an appropriate partition for the data, and

• finally work out how to associate computation with data.

➢  This partitioning technique is termed domain decomposition.

(i) Domain Decomposition

➢ We divide these data into small pieces of approximately equal size.

➢ Next, we partition the computation that is to be performed. This partitioning yields a number of tasks, each comprising some data and a set of operations on that data.

➢ Typically, communication is required to move data between tasks. This requirement is addressed in the next phase of the design process.

Example:

➢ Consider the domain decomposition of a problem involving a three-dimensional grid.

➢ Computation is performed repeatedly on each grid point.

➢ In the early stages of a design, we favour the most aggressive decomposition possible, which in this case defines one task for each grid point (a toy code sketch follows the figure caption below).

Figure 2.2: Domain decompositions for a problem involving a three-dimensional grid. One-, two-, and three-dimensional decompositions are possible; in each case, data associated with a single task are shaded. A three-dimensional decomposition offers the greatest flexibility and is adopted in the early stages of a design.
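As a toy illustration (my own sketch, not from the source), the Python code below decomposes a grid into equal blocks along each axis, defining one task per block. The function name decompose, its parameters, and the 8x8x8 grid size are all assumed values for illustration.

import itertools

# Toy domain decomposition (assumed example): split an nx x ny x nz grid
# into px x py x pz near-equal blocks, one block per task.
def decompose(nx, ny, nz, px, py, pz):
    def ranges(n, p):
        step = n // p
        # Each entry is a half-open index range; the last block absorbs any remainder.
        return [(i * step, (i + 1) * step if i < p - 1 else n) for i in range(p)]
    tasks = []
    for (x0, x1), (y0, y1), (z0, z1) in itertools.product(
            ranges(nx, px), ranges(ny, py), ranges(nz, pz)):
        tasks.append(((x0, x1), (y0, y1), (z0, z1)))
    return tasks

# 8x8x8 grid, 2 blocks per axis -> 8 tasks of 4x4x4 points each.
print(len(decompose(8, 8, 8, 2, 2, 2)))  # 8

With one block per grid point (px = nx, and so on) this reproduces the most aggressive fine-grained decomposition; choosing fewer, larger blocks corresponds to the agglomeration stage later in the design.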

(ii) Functional Decomposition

➢ Domain decomposition forms the foundation for most parallel algorithms.

➢ Functional decomposition is valuable as a different way of thinking about problems.

➢ A focus on the computations that are to be performed can sometimes reveal structure in a problem that would not be obvious from a study of data alone.


Figure 2.3: Functional decomposition in a computer model of climate. Each model component can be thought of as a separate task, to be parallelized by domain decomposition. Arrows represent exchanges of data between components during computation: the atmosphere model generates wind velocity data that are used by the ocean model, the ocean model generates sea surface temperature data that are used by the atmosphere model, and so on.
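As a minimal sketch of this idea (an assumed example, not the actual climate model), each component below is a Python function standing in for a task, and the loop exchanges their outputs each timestep. The function names and the formulas inside them are invented purely for illustration.

# Toy functional decomposition (assumed example): each model component
# is a separate task; data flows between components each timestep.
def atmosphere(sea_surface_temp):
    # Produces wind velocity data from the ocean's temperature field.
    return [0.1 * t for t in sea_surface_temp]

def ocean(wind_velocity):
    # Produces sea surface temperature data from the atmosphere's wind field.
    return [20.0 + w for w in wind_velocity]

sst = [20.0, 21.0, 22.0]      # initial sea surface temperatures
for step in range(3):          # coupled time-stepping loop
    wind = atmosphere(sst)     # atmosphere task consumes SST
    sst = ocean(wind)          # ocean task consumes wind
print(sst)

In a real parallel code each component would itself be parallelized by domain decomposition, as the caption notes; here the decomposition by function is the point.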

Q3. Write an algorithm for list ranking?

LIST-RANK(L) (in O(lg n) time)

1. for each processor i, in parallel
2.     do if next[i] = nil
3.         then d[i] ← 0
4.         else d[i] ← 1
5. while there exists an object i such that next[i] ≠ nil
6.     do for each processor i, in parallel
7.         do if next[i] ≠ nil
8.             then d[i] ← d[i] + d[next[i]]
9.                  next[i] ← next[next[i]]
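For comparison, here is a sequential Python simulation of the pointer-jumping algorithm above (a sketch, not a true parallel implementation). It assumes next_[i] holds the index of i's successor, with None standing in for nil, and snapshots both arrays each round to mimic the PRAM's synchronous reads-before-writes.

# Sequential simulation of LIST-RANK by pointer jumping.
def list_rank(next_):
    next_ = list(next_)  # copy so the caller's list is not modified
    n = len(next_)
    d = [0 if next_[i] is None else 1 for i in range(n)]
    # Repeat while some object still has a successor.
    while any(nx is not None for nx in next_):
        # Snapshot: on a PRAM, all reads of a step happen before any writes.
        d_old, next_old = d[:], next_[:]
        for i in range(n):
            if next_old[i] is not None:
                d[i] = d_old[i] + d_old[next_old[i]]   # line 8 of the pseudocode
                next_[i] = next_old[next_old[i]]       # line 9: pointer jumping
    return d

# Example: list 0 -> 1 -> 2 -> 3 (3 is the tail).
print(list_rank([1, 2, 3, None]))  # [3, 2, 1, 0]

Each round doubles the distance spanned by every pointer, so the while loop executes O(lg n) times, matching the stated bound.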

Q4. Describe Flynn's classification of parallel computers with suitable diagrams?

Flynn's Classical Taxonomy

• There are different ways to classify parallel computers. One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy.

• Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction and Data. Each of these dimensions can have only one of two possible states: Single or Multiple.

• The matrix below defines the 4 possible classifications according to Flynn:


SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data

Single Instruction, Single Data (SISD):

• A serial (non-parallel) computer

• Single Instruction: Only one instruction stream is being acted on by the CPU during any one clock cycle

• Single Data: Only one data stream is being used as input during any one clock cycle

• Deterministic execution

• This is the oldest and, even today, the most common type of computer

• Examples: older generation mainframes, minicomputers and workstations; most modern-day PCs.

 Single Instruction, Multiple Data (SIMD):

• A type of parallel computer

• Single Instruction: All processing units execute the same instruction at any given clock cycle

• Multiple Data: Each processing unit can operate on a different data element

• Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.

• Synchronous (lockstep) and deterministic execution

•  Two varieties: Processor Arrays and Vector Pipelines

• Examples:

○ Processor Arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV

○ Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10

• Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.


 Multiple Instruction, Single Data (MISD):

• A type of parallel computer

• Multiple Instruction: Each processing unit operates on the data independently via separate instruction streams.

• Single Data: A single data stream is fed into multiple processing units.

• Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).

• Some conceivable uses might be:

○ multiple frequency filters operating on a single signal stream

○ multiple cryptography algorithms attempting to crack a single

coded message.

 Multiple Instruction, Multiple Data (MIMD):

• A type of parallel computer

• Multiple Instruction: Every processor may be executing a different instruction stream

• Multiple Data: Every processor may be working with a different data stream

• Execution can be synchronous or asynchronous, deterministic or non-deterministic

• Currently, the most common type of parallel computer - most modern supercomputers fall into this category.

• Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.

• Note: many MIMD architectures also include SIMD execution sub-components

Q5. Describe the parallel computing memory architecture?

Parallel Computing Memory Architecture:-

Shared Memory

 General Characteristics:

• Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as global address space.


• Multiple processors can operate independently but share the same memory resources.

• Changes in a memory location effected by one processor are visible to all other processors.

• Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA

Uniform Memory Access (UMA):

• Most commonly represented today by Symmetric Multiprocessor (SMP) machines

• Identical processors

• Equal access and access times to memory

• Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

 Non-Uniform Memory Access (NUMA):

• Often made by physically linking two or more SMPs

• One SMP can directly access memory of another SMP

• Not all processors have equal access time to all memories

• Memory access across link is slower

• If cache coherency is maintained, then may also be called CC-NUMA - Cache Coherent NUMA


Shared Memory (NUMA)

 Advantages:

• Global address space provides a user-friendly programming perspective to memory

• Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs

 Disadvantages:

• The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.

• Programmer responsibility for synchronization constructs that ensure "correct" access of global memory.

• Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors.

Parallel Computer Memory Architectures

Distributed Memory

 General Characteristics:

• Like shared memory systems, distributed memory systems vary widely but share a common characteristic: distributed memory systems require a communication network to connect inter-processor memory.


• Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors.

• Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.

• When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.

• The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.

 Advantages:

• Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.

• Each processor can rapidly access its own memory without interference and without the overhead incurred with trying to maintain cache coherency.

• Cost effectiveness: can use commodity, off-the-shelf processors and networking.

 Disadvantages:

• The programmer is responsible for many of the details associated with data communication between processors.

• It may be difficult to map existing data structures, based on global memory, to this memory organization.

• Non-uniform memory access (NUMA) times

Q6. Differentiate between:
(i) UMA and NUMA
(ii) Shared Memory and Distributed Memory

(For the UMA vs NUMA comparison, see the UMA and NUMA characteristics under Q5; the shared vs distributed memory comparison follows.)


Shared Memory

 General Characteristics:

• Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as global address space.

• Multiple processors can operate independently but share the same memory resources.

• Changes in a memory location effected by one processor are visible to all other processors.

• Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA

Distributed Memory

 General Characteristics:

• Like shared memory systems, distributed memory systems vary widely but share a common characteristic: distributed memory systems require a communication network to connect inter-processor memory.

• Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors.

• Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.

• When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.


• The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.

Q7. What do you mean by binary search? Write an algorithm for binary search?

• In computer science, a binary search or half-interval search algorithm finds the position of a specified value (the input "key") within a sorted array.

• At each stage, the algorithm compares the input key value with the key value of the middle element of the array. If the keys match, then a matching element has been found, so its index, or position, is returned. Otherwise, if the sought key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element or, if the input key is greater, on the sub-array to the right.

• If the remaining array to be searched is reduced to zero, then the key cannot be found in the array and a special "Not found" indication is returned.

• A binary search halves the number of items to check with each iteration, so locating the item (or determining its absence) takes logarithmic time.

// Recursive binary search over the sorted array A[low..high];
// returns the index of value, or -1 if it is not present.
int BinarySearch(int A[], int value, int low, int high) {
    if (high < low)
        return -1;                        // not found
    int mid = low + (high - low) / 2;     // midpoint of the current sub-array
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid - 1);
    else if (A[mid] < value)
        return BinarySearch(A, value, mid + 1, high);
    else
        return mid;                       // found
}
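For example, with int A[] = {1, 3, 5, 7, 9}, the call BinarySearch(A, 7, 0, 4) returns index 3, while BinarySearch(A, 4, 0, 4) returns -1. Writing mid as low + (high - low) / 2 rather than (low + high) / 2 avoids signed-integer overflow when low and high are both very large.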

Q9. What is Parallel Computing?

•  Traditionally, software has been written for serial computation:


○ To be run on a single computer having a single Central Processing Unit (CPU);

○ A problem is broken into a discrete series of instructions.

○ Instructions are executed one after another.

○ Only one instruction may execute at any moment in time.

• In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:

○  To be run using multiple CPUs

○ A problem is broken into discrete parts that can be solved concurrently

○ Each part is further broken down to a series of instructions

○ Instructions from each part execute simultaneously on different CPUs


For example:

•  The computer resources might be:

○ A single computer with multiple processors;

○ An arbitrary number of computers connected by a network;

○ A combination of both.

•  The computational problem should be able to:


○ Be broken apart into discrete pieces of work that can be solved simultaneously;

○ Execute multiple program instructions at any moment in time;

○ Be solved in less time with multiple compute resources than with a single compute resource.
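As a minimal concrete sketch (an assumed example, not from the source), the Python program below breaks a large sum into four discrete parts that worker processes evaluate simultaneously; the chunking scheme and worker count are arbitrary choices for illustration.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each part is a series of instructions executed on its own piece of data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Break the problem into 4 discrete parts that can be solved concurrently.
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)  # parts run on different CPUs
    print(sum(partials) == sum(data))  # True

The same decomposition idea applies whether the "multiple compute resources" are cores in one machine, as here, or computers connected by a network.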

Distributed computing

• Distributed computing is a field of computer science that studies distributed systems.

• A distributed system consists of multiple autonomous computers that communicate through a computer network.

• The computers interact with each other in order to achieve a common goal.

• A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.

• Distributed computing also refers to the use of distributed systems to solve computational problems.

• In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.

Q10. COMPLEXITY MEASURES:

 The three basic aims of complexity theory are:


1. Introducing a notation to specify complexity: the first aim is to introduce a mathematical notation to specify the functional relationship between the input size of the problem and the consumption of computational resources, e.g., computational time and memory space.

2. Choice of machine model to standardize the measures: the second aim is to specify an underlying machine model to prescribe an associated set of measures for the consumption of resources. These measures are standardized so that they are invariant under possible variations in the algorithm.

3. Refinement of the measures for parallel computation: having obtained the Turing measures, the third aim is to understand how fast we can solve certain problems when a large number of processors are put together to work in parallel.

SOURCE:-

INTERNET