Parallel Computing
CS 147: Computer Architecture
Instructor: Professor Sin-Min Lee
Spring 2011
By: Alice Cotti
Background
Amdahl's law and Gustafson's law
Dependencies
Race conditions, mutual exclusion, synchronization, and parallel slowdown
Fine-grained, coarse-grained, and embarrassing parallelism
Amdahl's Law
The speed-up of a program from parallelization is limited by how much of the program can be parallelized.
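In its standard form (stated here for completeness; the formula itself is not on the slide), with $P$ the fraction of the program that can be parallelized and $N$ the number of processors, the speed-up is

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

As $N \to \infty$, $S(N)$ approaches $1/(1 - P)$: the serial fraction sets a hard ceiling on the achievable speed-up.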
Dependencies
Consider the following function, which demonstrates a flow dependency:
1: function Dep(a, b)
2:    c := a·b
3:    d := 2·c
4: end function
Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 uses the result of operation 2. This read-after-write relationship is called a flow dependency.
Dependencies
Now consider the following function:
1: function NoDep(a, b)
2:    c := a·b
3:    d := 2·b
4:    e := a+b
5: end function
In this example, there are no dependencies between the instructions, so they can all be run in parallel.
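As an illustrative sketch (not part of the original slides), the three independent assignments in NoDep can run concurrently using OpenMP sections in C; the input values are arbitrary placeholders:

#include <stdio.h>

int main(void) {
    double a = 3.0, b = 4.0;   /* arbitrary inputs for illustration */
    double c, d, e;

    /* The three assignments have no mutual dependencies, so each can
       execute in its own section. Compile with -fopenmp. */
    #pragma omp parallel sections
    {
        #pragma omp section
        c = a * b;
        #pragma omp section
        d = 2 * b;
        #pragma omp section
        e = a + b;
    }

    printf("c=%g d=%g e=%g\n", c, d, e);
    return 0;
}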
Race condition
A flaw whereby the output or result of the process is unexpectedly and critically dependent on the sequence or timing of other events.
Can occur in electronic systems, logic circuits, and multithreaded software; a software example is sketched below.
(Figure: race condition in a logic circuit. Here, Δt1 and Δt2 represent the propagation delays of the logic elements. When the input value A changes, the circuit outputs a short spike of duration Δt1.)
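As a minimal sketch of the same flaw in multithreaded software (an added example; names are illustrative), two threads increment a shared counter without synchronization, so the final value depends on thread timing:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;              /* read-modify-write: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but lost updates usually make it smaller. */
    printf("counter = %ld\n", counter);
    return 0;
}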
Fine-grained, coarse-grained, and embarrassing parallelism
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
Fine-grained parallelism: subtasks must communicate many times per second.
Coarse-grained parallelism: subtasks communicate only a few times per second.
Embarrassingly parallel: subtasks rarely or never have to communicate. Embarrassingly parallel applications are the easiest to parallelize (a sketch follows).
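A minimal sketch of an embarrassingly parallel loop in C (the function and data names are illustrative, not from the slides): each iteration touches only its own array element, so the threads never need to communicate.

/* Each iteration writes only out[i] and reads only in[i], so no
   synchronization is needed between threads. Compile with -fopenmp. */
void square_all(const double *in, double *out, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}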
Types of parallelism
Data parallelism
Task parallelism
Bit-level parallelism (sketched below)
Instruction-level parallelism
(Figure: a five-stage pipelined superscalar processor capable of issuing two instructions per cycle. It can hold two instructions in each stage of the pipeline, for a total of up to ten instructions executing simultaneously.)
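As a small illustration of bit-level parallelism (an added sketch, not from the slides), a single 64-bit word operation carries out 64 one-bit operations at once:

#include <stdint.h>
#include <stddef.h>

/* XORs two bit arrays word by word: each 64-bit operation performs
   64 independent one-bit XORs in a single instruction. */
void xor_bits(uint64_t *dst, const uint64_t *src, size_t nwords) {
    for (size_t i = 0; i < nwords; i++)
        dst[i] ^= src[i];
}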
Hardware
Memory and communication
Classes of parallel computers
Multicore computing
Symmetric multiprocessing
Distributed computing
Multicore Computing
PROS
Offers more parallelism than a dual-core design.
Cores that do not contend for the same bus and bandwidth can be even faster.

CONS
Heat dissipation problems.
More expensive.
Software
Parallel programming languages
Automatic parallelization
Application checkpointing
Parallel programming languages
Concurrent programming languages, libraries, APIs, and parallel programming models (such as Algorithmic Skeletons) have been created for programming parallel computers.
These are generally classified by the memory architecture they assume (a distributed-memory sketch follows the list):
Shared memory
Distributed memory
Distributed shared memory
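As a hedged sketch of the distributed-memory (message-passing) style (an added example; it assumes an MPI installation and is not from the slides), rank 0 sends a value to rank 1:

#include <mpi.h>
#include <stdio.h>

/* Run with, e.g.: mpicc msg.c && mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                       /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}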
Automatic parallelization

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, it has had only limited success.
Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization.
A few fully implicit parallel programming languages exist—SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.
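To make the difficulty concrete (an added sketch, not from the slides): the first loop below carries a dependency between iterations and resists automatic parallelization, while the second has independent iterations that a compiler can parallelize safely.

void examples(double *a, const double *b, int n) {
    /* Loop-carried dependency: a[i] needs a[i-1], so iterations
       must run in order and the loop stays serial. */
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];

    /* Independent iterations: safe for a compiler to parallelize. */
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * a[i];
}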
Application checkpointing
The larger and more complex a computer is, the more that can go wrong and the shorter the mean time between failures.
Application checkpointing is a technique whereby the computer system takes a "snapshot" of the application, recording its current state. This information can be used to restore the program if the computer fails.
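A hypothetical checkpointing sketch in C (the file name, snapshot interval, and state layout are all assumptions for illustration): the loop periodically saves its state so a restarted run can resume rather than start over.

#include <stdio.h>

int main(void) {
    long long i = 0, total = 0;

    /* On start-up, resume from a previous snapshot if one exists. */
    FILE *f = fopen("checkpoint.dat", "rb");
    if (f) {
        if (fread(&i, sizeof i, 1, f) != 1 ||
            fread(&total, sizeof total, 1, f) != 1)
            i = total = 0;      /* corrupt snapshot: start over */
        fclose(f);
    }

    for (; i < 1000000000LL; i++) {
        /* Snapshot before this iteration's work, so a resumed run
           repeats no additions and skips none. */
        if (i % 100000000LL == 0) {
            f = fopen("checkpoint.dat", "wb");
            fwrite(&i, sizeof i, 1, f);
            fwrite(&total, sizeof total, 1, f);
            fclose(f);
        }
        total += i;
    }
    printf("total = %lld\n", total);
    return 0;
}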
Algorithmic methods
Parallel computing is used in a wide range of fields, from bioinformatics to economics. Common types of problems found in parallel computing applications are:
Dense linear algebra (see the sketch after this list)
Sparse linear algebra
Dynamic programming
Finite-state machine simulation
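A brief dense linear algebra sketch in C (an added example; the function name and row-major layout are assumptions): a parallel matrix-vector product, where each output row is independent.

/* y = A*x with A stored row-major as a flat n*n array. Rows are
   independent, so the outer loop parallelizes with no communication.
   Compile with -fopenmp. */
void matvec(int n, const double *A, const double *x, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[(long)i * n + j] * x[j];
        y[i] = sum;
    }
}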
Programming
The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed.
The base language of supercomputer code is, in general, Fortran or C, using special libraries to share data between nodes.
The new massively parallel GPGPUs have hundreds of processor cores and are programmed using programming models such as CUDA and OpenCL.
Classes of parallel computers
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism.
Multicore computing
Symmetric multiprocessing
Distributed computing
Specialized parallel computers
Multicore computing
Includes multiple execution units ("cores") on the same chip.
Can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar.
Simultaneous multithreading, by contrast, uses only one execution unit; when that unit is idling (such as during a cache miss), it processes a second thread. IBM's Cell microprocessor, designed for the Sony PlayStation 3, is a multithreaded multicore design.
Symmetric multiprocessing
A computer system with multiple identical processors that share memory and connect via a bus.
Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors.
Because of the small size of the processors and the significant reduction in bus-bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective.
Distributed computing
A distributed memory computer system in which the processing elements are connected by a network.
Highly scalable.
(Figure: (a) and (b) show a distributed system; (c) shows a parallel system.)
Specialized parallel computers
Within parallel computing, there are specialized parallel devices that tend to be applicable to only a few classes of parallel problems.
Reconfigurable computing
General-purpose computing on graphics processing units
Application-specific integrated circuits
Vector processors
Questions?