8/13/2019 Parallel Processrs Ppt
1/40
PRESENTATION ON
PARALLEL PROCESSORS
INTRODUCTION
A parallel processor is a processor that performs concurrent data-processing tasks, which results in a shorter execution time.
Parallel processing involves simultaneous computations in the CPU for the purpose of increasing its computational speed. Instead of processing each instruction sequentially as in a conventional computer, parallel processing is established by distributing the data among multiple functional units.
For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The arithmetic, logic, and shift operations can be separated into three units, and the operands diverted to each unit under the supervision of a control unit.
Processor with multiple functional units

[Figure: processor registers, connected to memory, feed eight functional units operating in parallel: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, and floating-point divide.]
The figure shows one possible way of separating the execution unit into eight functional units.
The operands in the registers are applied to one of the units, depending on the operation specified by the instruction.
The adder-subtractor and integer multiplier perform arithmetic operations on integer numbers.
The floating-point operations are separated into three circuits operating in parallel.
The logic, shift, and increment operations can be performed concurrently on different data.
All units are independent of each other, so one number can be incremented while another number is being shifted.
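The independence of the units can be sketched in software. This is a minimal illustration, not a hardware model: the unit names and the thread-pool dispatch are assumptions made for the example.

```python
# A minimal sketch: independent "functional units" working concurrently,
# in the spirit of the multi-unit processor in the figure above.
from concurrent.futures import ThreadPoolExecutor

def incrementer(x):          # increment unit
    return x + 1

def shifter(x):              # shift unit (logical left shift by 1)
    return x << 1

def adder_subtractor(a, b):  # adder-subtractor unit
    return a + b

with ThreadPoolExecutor() as pool:
    # One number is incremented while another is shifted and a third
    # pair is added -- the units do not depend on each other.
    f1 = pool.submit(incrementer, 7)
    f2 = pool.submit(shifter, 3)
    f3 = pool.submit(adder_subtractor, 2, 5)
    results = (f1.result(), f2.result(), f3.result())

print(results)  # (8, 6, 7)
```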
ADVANTAGES:
Shorter execution time and hence higher throughput, which is the maximum number of results that can be generated per unit time by a processor.
Parallel processing is much faster than sequential processing when it comes to doing repetitive calculations on vast amounts of data. This is because a parallel processor is capable of multithreading on a large scale, and can therefore simultaneously process several streams of data. This makes parallel processors suitable for graphics cards, since the calculations required for generating millions of pixels per second are all repetitive.

DISADVANTAGES:
More hardware is required, and with it more power, which makes parallel processing a poor fit for low-power and mobile devices.
CLASSIFICATION
There are a variety of ways in which parallel processing can be classified. It can be based on:
the internal organization of the processor
the interconnection structure between processors
the flow of information through the system
Michael J. Flynn's classification
One of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy.
It organizes computer systems by:
o the number of instruction streams and
o the number of data streams that are manipulated simultaneously.
Flynn's classification divides computers into four major groups:
Single Instruction, Single Data (SISD)
Single Instruction, Multiple Data (SIMD)
Multiple Instruction, Single Data (MISD)
Multiple Instruction, Multiple Data (MIMD)
Single Instruction, Single Data (SISD)
SISD represents a serial (non-parallel) computer, containing a control unit, a processor unit, and a memory unit.
Single instruction: only one instruction stream is acted on by the CPU during any one clock cycle.
Single data: only one data stream is used as input during any one clock cycle.
Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities.
Parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing.
This is the oldest and historically the most common type of computer.
Examples: older-generation mainframes, minicomputers, and workstations; single-core PCs.
Single Instruction, Multiple Data (SIMD)
A type of parallel computer.
It represents an organization that includes many processing units under the supervision of a common control unit.
Single instruction: all processing units execute the same instruction at any given clock cycle.
Multiple data: each processing unit can operate on a different data element.
The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously.
Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
Examples:
Processor arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV
Vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
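The SIMD idea — one instruction broadcast to many data elements — can be sketched in a few lines. This is only the organizational idea in pure Python (real SIMD does this in hardware with vector registers); the pixel-brightening "instruction" is an invented example of the regular, repetitive work SIMD suits.

```python
# A toy SIMD step: one control unit broadcasts a single instruction, and
# every processing element applies it to its own data element in lockstep.
def simd_step(instruction, data_elements):
    # same instruction, multiple data: one result per processing element
    return [instruction(x) for x in data_elements]

# Each "processing element" holds one pixel value; the common instruction
# brightens it, saturating at 255.
brighten = lambda pixel: min(pixel + 40, 255)
pixels = [10, 100, 200, 250]
print(simd_step(brighten, pixels))  # [50, 140, 240, 255]
```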
Multiple Instruction, Single Data (MISD)
The MISD structure is mainly of theoretical interest, since almost no practical system has been constructed using this organization.
A single data stream is fed into multiple processing units.
Each processing unit operates on the data independently via independent instruction streams.
Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).
Some conceivable uses might be: multiple frequency filters operating on a single signal stream; multiple cryptography algorithms attempting to crack a single coded message.
Multiple Instruction, Multiple Data (MIMD)
MIMD organization refers to a computer system capable of processing several programs at the same time.
Most multiprocessor and multicomputer systems can be classified in this category.
Currently the most common type of parallel computer; most modern computers fall into this category.
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Examples: most current supercomputers, networked parallel computer clusters and "grids", multiprocessor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution sub-components.
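A minimal MIMD sketch: each worker executes a different instruction stream on a different data stream, asynchronously. The worker functions and data sets are invented for illustration; a thread pool stands in for the separate processors.

```python
# MIMD in miniature: different instruction streams, different data streams.
from concurrent.futures import ThreadPoolExecutor

def sum_stream(data):        # instruction stream 1
    return sum(data)

def max_stream(data):        # instruction stream 2
    return max(data)

def count_even(data):        # instruction stream 3
    return sum(1 for x in data if x % 2 == 0)

streams = [(sum_stream, [1, 2, 3]),
           (max_stream, [7, 4, 9]),
           (count_even, [2, 4, 5])]

with ThreadPoolExecutor() as pool:
    # each (function, data) pair runs independently and asynchronously
    futures = [pool.submit(fn, data) for fn, data in streams]
    results = [f.result() for f in futures]

print(results)  # [6, 9, 2]
```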
A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently.
Superscalar processors have the ability to initiate multiple instructions during the same clock cycle.
A superscalar architecture consists of a number of pipelines that work in parallel.
PIPELINE
A pipeline is a set of data-processing elements connected in series, so that the output of one element is the input of the next.
PIPELINING
Pipelining allows the processor to read a new instruction from memory before it has finished processing the current one. As an instruction goes through each stage, the next instruction follows it; it does not need to wait until the previous one completely finishes.
Pipelining saves time by ensuring that the microprocessor can start executing a new instruction before completing the current or previous ones. However, it can still complete just one instruction per clock cycle.
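The time saved can be put in numbers. With a k-stage pipeline, the first instruction takes k cycles and each following instruction completes one cycle later, so n instructions need k + (n - 1) cycles instead of n × k. The stage count and instruction count below are invented example figures.

```python
# Back-of-envelope pipeline timing.
def sequential_cycles(k, n):
    # without pipelining: every instruction takes all k stages by itself
    return n * k

def pipelined_cycles(k, n):
    # with pipelining: k cycles to fill, then one completion per cycle
    return k + (n - 1)

k, n = 5, 100
print(sequential_cycles(k, n))  # 500
print(pipelined_cycles(k, n))   # 104
print(round(sequential_cycles(k, n) / pipelined_cycles(k, n), 2))  # 4.81
```

For large n the speedup approaches k, the number of stages.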
ADVANTAGES
Allows the instruction execution rate to exceed the clock rate (a CPI of less than 1).
It thereby allows higher CPU throughput than would otherwise be possible at the same clock rate.
THROUGHPUT: the maximum number of instructions that can be carried out in a given period of time.
Superscalar Architectures
A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time.
Superscalar Execution
Instruction-Level Parallelism
Superscalar processors are designed to exploit more instruction-level parallelism in user programs.
For example:

    load  R1 ← R2          add   R3 ← R3, 1
    add   R3 ← R3, 1       add   R4 ← R3, R2
    add   R4 ← R4, R2      store [R4] ← R0

The three instructions on the left are independent, and in theory all three could be executed in parallel.
The three instructions on the right cannot be executed in parallel, because the second instruction uses the result of the first, and the third instruction uses the result of the second.
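The dependence check behind this example can be sketched directly: a later instruction depends on an earlier one if it reads a register the earlier one writes (a read-after-write hazard). The `(dest, sources)` encoding is an assumption made for the sketch.

```python
# True if no instruction reads a register written by an earlier
# instruction in the same group (i.e. the group is free of RAW hazards).
def independent(instructions):
    written = set()
    for dest, sources in instructions:
        if any(src in written for src in sources):
            return False
        written.add(dest)
    return True

# The two instruction groups from the slide, as (dest, sources) pairs.
left = [("R1", ["R2"]), ("R3", ["R3"]), ("R4", ["R4", "R2"])]
right = [("R3", ["R3"]), ("R4", ["R3", "R2"]), ("MEM", ["R4", "R0"])]

print(independent(left))   # True
print(independent(right))  # False
```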
Fetching and dispatching two instructions per cycle (degree 2)
One floating-point and two integer operations are issued and executed simultaneously; each unit is pipelined and executes several operations in different pipeline stages.
Hardware organization of a superscalar processor
Some Architectures
PowerPC 604
  six independent execution units:
    Branch execution unit
    Load/Store unit
    3 Integer units
    Floating-point unit
  in-order issue
  register renaming
PowerPC 620
  provides, in addition to the 604, out-of-order issue
Pentium
  three independent execution units:
    2 Integer units
    Floating-point unit
  in-order issue
Intel P5 Microarchitecture
Used in initial Pentium processor
Could execute up to 2 instructions simultaneously
PIPELINING:
Pipelining is a technique of decomposing a sequential process (instruction) into sub-operations, with each sub-operation executed in a special dedicated segment that operates concurrently with all the other segments.
Each segment performs the partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline.
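The segment structure has a natural software analogue: generators connected in series, where the output of one segment is the input of the next. The stage names are illustrative, and note that Python generators pass items along one at a time rather than truly operating concurrently; only the series connection and the partial processing per segment are being modeled.

```python
# Pipeline segments as chained generators.
def fetch(values):           # segment 1: supplies the stream
    for v in values:
        yield v

def decode(stream):          # segment 2: partial processing
    for v in stream:
        yield v * 2

def execute(stream):         # segment 3: partial processing
    for v in stream:
        yield v + 1

# Each element's output feeds the next element, as in the definition above.
pipeline = execute(decode(fetch([1, 2, 3])))
print(list(pipeline))  # [3, 5, 7]
```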
SUPERPIPELINING:
Superpipelining is the breaking of the longer stages of a pipeline into smaller stages, which shortens the clock period per instruction. More instructions can therefore be executed in the same time compared with an ordinary pipelined structure.
Breaking up the stages increases efficiency because the clock period is determined by the longest stage.
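A quick numerical sketch of that last point, with invented stage latencies: since the clock period is set by the slowest stage, splitting one long stage shortens every cycle.

```python
# Stage latencies in nanoseconds (illustrative numbers only).
base_stages = [2, 5, 2, 3]           # clock period = max stage = 5 ns
super_stages = [2, 2.5, 2.5, 2, 3]   # the 5 ns stage split into two 2.5 ns stages

# After the split, the longest remaining stage (3 ns) sets the period,
# so the clock can run faster even though there are more stages.
print(max(base_stages))   # 5
print(max(super_stages))  # 3
```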
TIMING DIAGRAM:
Comparison of clock time per cycle:
Some processors with superpipelined architectures: MIPS R4000, Intel NetBurst, the ARM11 core.
ARM cores are famous for their simple and cost-effective design. However, ARM cores have also evolved to show superpipelining characteristics, with architectural features to hide possible long pipeline stalls. The ARM11 (specifically, the ARM1136JF) is a high-performance, low-power processor equipped with an eight-stage pipeline.
The core consists of two fetch stages, one decode stage, one issue stage, and four stages for the integer pipeline.
The eight stages of the ARM11 core are:
DIFFERENCE BETWEEN SUPERSCALING AND SUPERPIPELINING
SUPERSCALING: creates multiple pipelines within a processor, allowing the CPU to execute multiple instructions simultaneously.
SUPERPIPELINING: breaks the instruction pipeline into smaller pipeline stages, allowing the CPU to start executing the next instruction before completing the previous one. The processor can run multiple instructions simultaneously, with each instruction at a different stage of completion.
ASPECT                    | SUPERSCALING                                  | SUPERPIPELINING
1. Approach               | Dynamically issues multiple instructions per cycle. | Divides the long-latency stages of the pipeline into shorter stages.
2. Instruction issue rate | Multiple                                      | Multiple (different instructions at different stages of completion)
3. Effects                | Affects the clock-per-instruction (CPI) term of the performance equation.* | Affects the clock-cycle-time term of the performance equation.
4. Difficulty of design   | Complex design issues                         | Relatively easier design
5. Additional aids        | Additional hardware units required, such as the fetch units. | No additional hardware units required.
INSTRUCTION ISSUE STYLE:
Both superscaling and superpipelining follow dynamic instruction scheduling.
In dynamic scheduling, instructions are fetched sequentially in program order. However, those instructions are decoded and stored in a scheduling window of the processor's execution core. After decoding the instructions, the processor core obtains the dependency information between them and can identify the instructions that are ready for execution.
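An illustrative scheduling window can make this concrete: decoded instructions wait in the window, and those whose source registers are not written by an earlier, still-pending instruction are ready to issue. The `(name, dest, sources)` encoding and the instruction names are assumptions for the sketch.

```python
# Pick out the instructions in the window that are ready to execute:
# an instruction is ready if none of its sources is the destination of
# an earlier, still-pending instruction in the window.
def ready_instructions(window):
    pending_writes = set()
    ready = []
    for name, dest, sources in window:
        if not any(src in pending_writes for src in sources):
            ready.append(name)
        pending_writes.add(dest)
    return ready

window = [
    ("i1", "R1", ["R2"]),
    ("i2", "R3", ["R1"]),   # waits on i1's result
    ("i3", "R4", ["R5"]),   # independent: ready immediately
]
print(ready_instructions(window))  # ['i1', 'i3']
```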
*The performance equation of a microprocessor (IC = instruction count, CPI = average clock cycles per instruction):
Execution Time = IC × CPI × clock cycle time
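The equation ties the comparison table together, as a worked example with invented numbers shows: superscaling attacks the CPI term, superpipelining attacks the clock-cycle-time term, and either route shortens execution time.

```python
# Execution Time = IC * CPI * clock cycle time (result in nanoseconds here).
def execution_time(ic, cpi, cycle_time_ns):
    return ic * cpi * cycle_time_ns

ic = 1_000_000                              # instruction count (assumed)
scalar = execution_time(ic, 1.0, 1.0)       # CPI = 1, 1 ns cycle (1 GHz)
superscalar = execution_time(ic, 0.5, 1.0)  # superscaling lowers CPI
superpipe = execution_time(ic, 1.0, 0.5)    # superpipelining shortens the cycle

print(scalar, superscalar, superpipe)  # 1000000.0 500000.0 500000.0
```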
CONCLUSION:
From all this we can conclude that parallel processing, superscaling, and superpipelining are different architectural improvements introduced to increase the efficiency of modern-day computers.