8/13/2019 Parallel Processrs Ppt
1/40
PRESENTATION ON
PARALLEL PROCESSORS
INTRODUCTION
A parallel processor is a processor that performs concurrent data-processing tasks, which results in a shorter execution time.
Parallel processing involves simultaneous computations in the CPU for the purpose of increasing its computational speed. Instead of processing each instruction sequentially as in a conventional computer, parallel processing is established by distributing the data among multiple functional units.
For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The arithmetic, logic, and shift operations can be separated into three units, and the operands diverted to each unit under the supervision of a control unit.
Processor with multiple functional units

[Figure: processor registers, connected to memory, feed eight functional units operating in parallel: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, and floating-point divide.]
The figure shows one possible way of separating the execution unit into eight functional units.
The operands in the registers are applied to one of the units, depending on the operation specified by the instruction.
The adder-subtractor and integer multiplier perform arithmetic operations on integer numbers.
The floating-point operations are separated into three circuits operating in parallel.
The logic, shift, and increment operations can be performed concurrently on different data.
All units are independent of each other, so one number can be incremented while another number is being shifted.
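The independence of the units can be sketched in software. This is a minimal illustration, not a hardware model: the unit names and the thread-pool dispatch are assumptions made for the example.

```python
# A minimal sketch: independent "functional units" working concurrently,
# in the spirit of the multi-unit processor in the figure above.
from concurrent.futures import ThreadPoolExecutor

def incrementer(x):          # increment unit
    return x + 1

def shifter(x):              # shift unit (logical left shift by 1)
    return x << 1

def adder_subtractor(a, b):  # adder-subtractor unit
    return a + b

with ThreadPoolExecutor() as pool:
    # One number is incremented while another is shifted and a third
    # pair is added -- the units do not depend on each other.
    f1 = pool.submit(incrementer, 7)
    f2 = pool.submit(shifter, 3)
    f3 = pool.submit(adder_subtractor, 2, 5)
    results = (f1.result(), f2.result(), f3.result())

print(results)  # (8, 6, 7)
```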
ADVANTAGES:
Shorter execution time and hence higher throughput, which is the maximum number of results that can be generated per unit time by a processor.
Parallel processing is much faster than sequential processing when it comes to doing repetitive calculations on vast amounts of data. This is because a parallel processor is capable of multithreading on a large scale, and can therefore simultaneously process several streams of data. This makes parallel processors suitable for graphics cards, since the calculations required for generating millions of pixels per second are all repetitive.

DISADVANTAGES:
More hardware is required, and with it more power, which makes parallel processing a poor fit for low-power and mobile devices.
CLASSIFICATION
There are a variety of ways in which parallel processing can be classified. It can be based on:
the internal organization of the processor
the interconnection structure between processors
the flow of information through the system
Michael J. Flynn's classification
One of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy.
It organizes computer systems by:
o the number of instruction streams and
o the number of data streams that are manipulated simultaneously.
Flynn's classification divides computers into four major groups:
Single Instruction, Single Data (SISD)
Single Instruction, Multiple Data (SIMD)
Multiple Instruction, Single Data (MISD)
Multiple Instruction, Multiple Data (MIMD)
Single Instruction, Single Data (SISD)
SISD represents a serial (non-parallel) computer, containing a control unit, a processor unit, and a memory unit.
Single instruction: only one instruction stream is acted on by the CPU during any one clock cycle.
Single data: only one data stream is used as input during any one clock cycle.
Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities.
Parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing.
This is the oldest and historically the most common type of computer.
Examples: older-generation mainframes, minicomputers, and workstations; single-core PCs.
Single Instruction, Multiple Data (SIMD)
A type of parallel computer.
It represents an organization that includes many processing units under the supervision of a common control unit.
Single instruction: all processing units execute the same instruction at any given clock cycle.
Multiple data: each processing unit can operate on a different data element.
The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously.
Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
Examples:
Processor arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV
Vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
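The SIMD idea — one instruction broadcast to many data elements — can be sketched in a few lines. This is only the organizational idea in pure Python (real SIMD does this in hardware with vector registers); the pixel-brightening "instruction" is an invented example of the regular, repetitive work SIMD suits.

```python
# A toy SIMD step: one control unit broadcasts a single instruction, and
# every processing element applies it to its own data element in lockstep.
def simd_step(instruction, data_elements):
    # same instruction, multiple data: one result per processing element
    return [instruction(x) for x in data_elements]

# Each "processing element" holds one pixel value; the common instruction
# brightens it, saturating at 255.
brighten = lambda pixel: min(pixel + 40, 255)
pixels = [10, 100, 200, 250]
print(simd_step(brighten, pixels))  # [50, 140, 240, 255]
```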
Multiple Instruction, Single Data (MISD)
The MISD structure is mainly of theoretical interest, since almost no practical system has been constructed using this organization.
A single data stream is fed into multiple processing units.
Each processing unit operates on the data independently via independent instruction streams.
Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).
Some conceivable uses might be: multiple frequency filters operating on a single signal stream; multiple cryptography algorithms attempting to crack a single coded message.
Multiple Instruction, Multiple Data (MIMD)
MIMD organization refers to a computer system capable of processing several programs at the same time.
Most multiprocessor and multicomputer systems can be classified in this category.
Currently the most common type of parallel computer; most modern computers fall into this category.
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Examples: most current supercomputers, networked parallel computer clusters and "grids", multiprocessor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution sub-components.
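A minimal MIMD sketch: each worker executes a different instruction stream on a different data stream, asynchronously. The worker functions and data sets are invented for illustration; a thread pool stands in for the separate processors.

```python
# MIMD in miniature: different instruction streams, different data streams.
from concurrent.futures import ThreadPoolExecutor

def sum_stream(data):        # instruction stream 1
    return sum(data)

def max_stream(data):        # instruction stream 2
    return max(data)

def count_even(data):        # instruction stream 3
    return sum(1 for x in data if x % 2 == 0)

streams = [(sum_stream, [1, 2, 3]),
           (max_stream, [7, 4, 9]),
           (count_even, [2, 4, 5])]

with ThreadPoolExecutor() as pool:
    # each (function, data) pair runs independently and asynchronously
    futures = [pool.submit(fn, data) for fn, data in streams]
    results = [f.result() for f in futures]

print(results)  # [6, 9, 2]
```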
A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently.
Superscalar processors have the ability to initiate multiple instructions during the same clock cycle.
A superscalar architecture consists of a number of pipelines that work in parallel.
PIPELINE
A pipeline is a set of data-processing elements connected in series, so that the output of one element is the input of the next.
PIPELINING
Pipelining allows the processor to read a new instruction from memory before it has finished processing the current one. As an instruction goes through each stage, the next instruction follows it; it does not need to wait until the previous one completely finishes.
Pipelining saves time by ensuring that the microprocessor can start executing a new instruction before completing the current or previous ones. However, it can still complete just one instruction per clock cycle.
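The time saved can be put in numbers. With a k-stage pipeline, the first instruction takes k cycles and each following instruction completes one cycle later, so n instructions need k + (n - 1) cycles instead of n × k. The stage count and instruction count below are invented example figures.

```python
# Back-of-envelope pipeline timing.
def sequential_cycles(k, n):
    # without pipelining: every instruction takes all k stages by itself
    return n * k

def pipelined_cycles(k, n):
    # with pipelining: k cycles to fill, then one completion per cycle
    return k + (n - 1)

k, n = 5, 100
print(sequential_cycles(k, n))  # 500
print(pipelined_cycles(k, n))   # 104
print(round(sequential_cycles(k, n) / pipelined_cycles(k, n), 2))  # 4.81
```

For large n the speedup approaches k, the number of stages.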
ADVANTAGES
Allows the instruction execution rate to exceed the clock rate (a CPI of less than 1).
It thereby allows higher CPU throughput than would otherwise be possible at the same clock rate.
THROUGHPUT: the maximum number of instructions that can be carried out in a given period of time.
Superscalar Architectures
A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time.
Superscalar Execution
Instruction-Level Parallelism
Superscalar processors are designed to exploit more instruction-level parallelism in user programs.
For example:

    load  R1 ← R2          add   R3 ← R3, 1
    add   R3 ← R3, 1       add   R4 ← R3, R2
    add   R4 ← R4, R2      store [R4] ← R0

The three instructions on the left are independent, and in theory all three could be executed in parallel.
The three instructions on the right cannot be executed in parallel, because the second instruction uses the result of the first, and the third instruction uses the result of the second.
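The dependence check behind this example can be sketched directly: a later instruction depends on an earlier one if it reads a register the earlier one writes (a read-after-write hazard). The `(dest, sources)` encoding is an assumption made for the sketch.

```python
# True if no instruction reads a register written by an earlier
# instruction in the same group (i.e. the group is free of RAW hazards).
def independent(instructions):
    written = set()
    for dest, sources in instructions:
        if any(src in written for src in sources):
            return False
        written.add(dest)
    return True

# The two instruction groups from the slide, as (dest, sources) pairs.
left = [("R1", ["R2"]), ("R3", ["R3"]), ("R4", ["R4", "R2"])]
right = [("R3", ["R3"]), ("R4", ["R3", "R2"]), ("MEM", ["R4", "R0"])]

print(independent(left))   # True
print(independent(right))  # False
```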
Fetching and dispatching two instructions per cycle (degree 2)
One floating-point and two integer operations are issued and executed simultaneously; each unit is pipelined and executes several operations in different pipeline stages.
Hardware organization of a superscalar processor
Some Architectures
PowerPC 604
  six independent execution units:
    Branch execution unit
    Load/Store unit
    3 Integer units
    Floating-point unit
  in-order issue
  register renaming
PowerPC 620
  provides, in addition to the 604, out-of-order issue
Pentium
  three independent execution units:
    2 Integer units
    Floating-point unit
  in-order issue
Intel P5 Microarchitecture
Used in initial Pentium processor
Could execute up to 2 instructions simultaneously
PIPELINING:
Pipelining is a technique of decomposing a sequential process (instruction) into sub-operations, with each sub-operation executed in a special dedicated segment that operates concurrently with all the other segments.
Each segment performs the partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline.
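The segment structure has a natural software analogue: generators connected in series, where the output of one segment is the input of the next. The stage names are illustrative, and note that Python generators pass items along one at a time rather than truly operating concurrently; only the series connection and the partial processing per segment are being modeled.

```python
# Pipeline segments as chained generators.
def fetch(values):           # segment 1: supplies the stream
    for v in values:
        yield v

def decode(stream):          # segment 2: partial processing
    for v in stream:
        yield v * 2

def execute(stream):         # segment 3: partial processing
    for v in stream:
        yield v + 1

# Each element's output feeds the next element, as in the definition above.
pipeline = execute(decode(fetch([1, 2, 3])))
print(list(pipeline))  # [3, 5, 7]
```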
SUPERPIPELINING:
Superpipelining is the breaking of the longer stages of a pipeline into smaller stages, which shortens the clock period per instruction. More instructions can therefore be executed in the same time compared with an ordinary pipelined structure.
Breaking up the stages increases efficiency because the clock period is determined by the longest stage.
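A quick numerical sketch of that last point, with invented stage latencies: since the clock period is set by the slowest stage, splitting one long stage shortens every cycle.

```python
# Stage latencies in nanoseconds (illustrative numbers only).
base_stages = [2, 5, 2, 3]           # clock period = max stage = 5 ns
super_stages = [2, 2.5, 2.5, 2, 3]   # the 5 ns stage split into two 2.5 ns stages

# After the split, the longest remaining stage (3 ns) sets the period,
# so the clock can run faster even though there are more stages.
print(max(base_stages))   # 5
print(max(super_stages))  # 3
```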
TIMING DIAGRAM:
Comparison of clock time per cycle:
Some processors with superpipelined architectures: MIPS R4000, Intel NetBurst, the ARM11 core.
ARM cores are famous for their simple and cost-effective design. However, ARM cores have also evolved to show superpipelining characteristics, with architectural features to hide possible long pipeline stalls. The ARM11 (specifically, the ARM1136JF) is a high-performance, low-power processor equipped with an eight-stage pipeline.
The core consists of two fetch stages, one decode stage, one issue stage, and four stages for the integer pipeline.
The eight stages of the ARM11 core are:
DIFFERENCE BETWEEN SUPERSCALING AND SUPERPIPELINING
SUPERSCALING: creates multiple pipelines within a processor, allowing the CPU to execute multiple instructions simultaneously.
SUPERPIPELINING: breaks the instruction pipeline into smaller pipeline stages, allowing the CPU to start executing the next instruction before completing the previous one. The processor can run multiple instructions simultaneously, with each instruction at a different stage of completion.
ASPECT                    | SUPERSCALING                                  | SUPERPIPELINING
1. Approach               | Dynamically issues multiple instructions per cycle. | Divides the long-latency stages of the pipeline into shorter stages.
2. Instruction issue rate | Multiple                                      | Multiple (different instructions at different stages of completion)
3. Effects                | Affects the clock-per-instruction (CPI) term of the performance equation.* | Affects the clock-cycle-time term of the performance equation.
4. Difficulty of design   | Complex design issues                         | Relatively easier design
5. Additional aids        | Additional hardware units required, such as the fetch units. | No additional hardware units required.
INSTRUCTION ISSUE STYLE:
Both superscaling and superpipelining follow dynamic instruction scheduling.
In dynamic scheduling, instructions are fetched sequentially in program order. However, those instructions are decoded and stored in a scheduling window of the processor's execution core. After decoding the instructions, the processor core obtains the dependency information between them and can identify the instructions that are ready for execution.
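An illustrative scheduling window can make this concrete: decoded instructions wait in the window, and those whose source registers are not written by an earlier, still-pending instruction are ready to issue. The `(name, dest, sources)` encoding and the instruction names are assumptions for the sketch.

```python
# Pick out the instructions in the window that are ready to execute:
# an instruction is ready if none of its sources is the destination of
# an earlier, still-pending instruction in the window.
def ready_instructions(window):
    pending_writes = set()
    ready = []
    for name, dest, sources in window:
        if not any(src in pending_writes for src in sources):
            ready.append(name)
        pending_writes.add(dest)
    return ready

window = [
    ("i1", "R1", ["R2"]),
    ("i2", "R3", ["R1"]),   # waits on i1's result
    ("i3", "R4", ["R5"]),   # independent: ready immediately
]
print(ready_instructions(window))  # ['i1', 'i3']
```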
*The performance equation of a microprocessor (IC = instruction count, CPI = average clock cycles per instruction):
Execution Time = IC × CPI × clock cycle time
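The equation ties the comparison table together, as a worked example with invented numbers shows: superscaling attacks the CPI term, superpipelining attacks the clock-cycle-time term, and either route shortens execution time.

```python
# Execution Time = IC * CPI * clock cycle time (result in nanoseconds here).
def execution_time(ic, cpi, cycle_time_ns):
    return ic * cpi * cycle_time_ns

ic = 1_000_000                              # instruction count (assumed)
scalar = execution_time(ic, 1.0, 1.0)       # CPI = 1, 1 ns cycle (1 GHz)
superscalar = execution_time(ic, 0.5, 1.0)  # superscaling lowers CPI
superpipe = execution_time(ic, 1.0, 0.5)    # superpipelining shortens the cycle

print(scalar, superscalar, superpipe)  # 1000000.0 500000.0 500000.0
```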
CONCLUSION:
From all this we can conclude that parallel processing, superscaling, and superpipelining are different architectural improvements introduced to increase the efficiency of modern-day computers.