8/3/2019 CP0804_06-Apr-2011_RM01_unit 5
1/63
UNIT-5
PIPELINE AND VECTOR PROCESSING
PIPELINING AND VECTOR PROCESSING
Introduction to pipelining and pipeline hazards
Design issues of pipeline architecture
Instruction-level parallelism and advanced issues
Parallel processing concepts:
Vector processing
Array processors
CISC
RISC
VLIW
PARALLEL PROCESSING
1. Parallel processing provides simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system.
Ex: while one instruction is being executed in the ALU, the next instruction can be read from memory.
2. Parallel processing increases hardware complexity and cost.
Multiple functional units:
Adder-subtractor
Integer multiply
Logic unit
Shift unit
Incrementer
Floating-point add-subtract
Floating-point multiply
Floating-point divide
PROCESSOR WITH MULTIPLE FUNCTIONAL UNITS

[Diagram: the processor registers connect to memory and to eight functional units operating in parallel: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point divide]
PARALLEL COMPUTERS
Architectural Classification

Flynn's classification is based on the multiplicity of instruction streams and data streams:
Instruction stream: the sequence of instructions read from memory
Data stream: the operations performed on the data in the processor

                          Number of Data Streams
                          Single      Multiple
Number of      Single     SISD        SIMD
Instruction
Streams        Multiple   MISD        MIMD
SISD: Instructions are executed sequentially; the system may or may not have internal parallel-processing capabilities.
SIMD: Many processing units operate under the supervision of a common control unit. All processors receive the same instruction from the control unit but operate on different items of data.
MISD: Of theoretical interest only; not practically implemented.
MIMD: Several programs are processed at the same time.
Parallel processing techniques:
Pipeline processing
Vector processing
Array processing
Pipeline processing:
Arithmetic suboperations or the phases of the computer instruction cycle overlap in execution.
Vector processing:
Deals with computations involving large vectors and matrices.
Array processing:
Performs computations on large arrays of data.
PIPELINING
A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
The result obtained from the computation in each segment is transferred to the next segment in the pipeline.
The computations are thus overlapped.
A register holds the data and a combinational circuit performs the suboperation in each segment.
WHAT IS PIPELINING?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution to make fast CPUs.
It exploits parallelism among the instructions in a sequential instruction stream.
THE METHODOLOGY
In a pipeline, each step is called a pipe stage (or pipe segment) and completes a part of an instruction.
The stages are connected to one another to form a pipe.
Instructions enter at one end, progress through each stage, and exit at the other end.
PIPELINING
Example of pipeline processing: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1: R1 <- Ai, R2 <- Bi        (load Ai and Bi)
Segment 2: R3 <- R1 * R2, R4 <- Ci   (multiply and load Ci)
Segment 3: R5 <- R3 + R4             (add)

[Diagram: memory supplies Ai, Bi, and Ci; R1 and R2 feed a multiplier whose result goes to R3 while Ci is loaded into R4; an adder combines R3 and R4 into R5]
OPERATIONS IN EACH PIPELINE STAGE

Clock    Segment 1      Segment 2          Segment 3
pulse    R1     R2      R3        R4       R5
1        A1     B1
2        A2     B2      A1*B1     C1
3        A3     B3      A2*B2     C2       A1*B1+C1
4        A4     B4      A3*B3     C3       A2*B2+C2
5        A5     B5      A4*B4     C4       A3*B3+C3
6        A6     B6      A5*B5     C5       A4*B4+C4
7        A7     B7      A6*B6     C6       A5*B5+C5
8                       A7*B7     C7       A6*B6+C6
9                                          A7*B7+C7
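The clock-pulse table above can be reproduced with a small simulation. This is an illustrative sketch (register contents are tracked symbolically as strings), not a hardware model; the function name `simulate` is an assumption.

```python
# Simulate the three-segment pipeline computing Ai * Bi + Ci for i = 1..7.
# Each iteration of the loop is one clock pulse; segments are updated from
# the back of the pipe forward so each stage sees the values latched on
# the previous pulse.

def simulate(n=7):
    rows = []
    r1 = r2 = r3 = r4 = r5 = None
    for clock in range(1, n + 3):              # k + n - 1 = 3 + 7 - 1 = 9 pulses
        r5 = f"{r3}+{r4}" if r3 is not None else None     # segment 3: add
        if r1 is not None:                                # segment 2: multiply, load Ci
            r3, r4 = f"{r1}*{r2}", f"C{clock - 1}"
        else:
            r3 = r4 = None
        if clock <= n:                                    # segment 1: load Ai and Bi
            r1, r2 = f"A{clock}", f"B{clock}"
        else:
            r1 = r2 = None
        rows.append((clock, r1, r2, r3, r4, r5))
    return rows

for row in simulate():
    print(row)
# clock pulse 3, for example, gives ('A3', 'B3', 'A2*B2', 'C2', 'A1*B1+C1')
```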
GENERAL PIPELINE
General structure of a 4-segment pipeline: input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with all registers driven by a common clock.

Space-Time Diagram

Clock cycle:   1   2   3   4   5   6   7   8   9
Segment 1:     T1  T2  T3  T4  T5  T6
Segment 2:         T1  T2  T3  T4  T5  T6
Segment 3:             T1  T2  T3  T4  T5  T6
Segment 4:                 T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: number of tasks to be performed

Conventional machine (non-pipelined):
tn: time to complete each task
t1: time required to complete the n tasks
t1 = n * tn

Pipelined machine (k stages):
tp: clock cycle (time to complete each suboperation)
tk: time required to complete the n tasks
tk = (k + n - 1) * tp

Speedup:
Sk = n * tn / ((k + n - 1) * tp)
lim (n -> infinity) Sk = tn / tp   ( = k, if tn = k * tp )
PIPELINE AND MULTIPLE FUNCTIONAL UNITS

[Diagram: four processors P1..P4 operate in parallel on instructions Ii, Ii+1, Ii+2, Ii+3]

Example: 4-stage pipeline
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 4 * 20 = 80 ns

Pipelined system: tk = (k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
Non-pipelined system: t1 = n * k * tp = 100 * 80 = 8000 ns
Speedup: Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is thus roughly comparable to a system with 4 identical functional units.
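The worked example can be checked directly from the speedup formulas. A minimal sketch, using the slide's numbers (k = 4 stages, tp = 20 ns, n = 100 tasks, tn = k * tp):

```python
# Pipeline speedup: Sk = n*tn / ((k + n - 1)*tp), with tn = k*tp here.

k, tp, n = 4, 20, 100           # 4 stages, 20 ns per stage, 100 tasks
tn = k * tp                     # 80 ns per task on the non-pipelined machine

pipelined = (k + n - 1) * tp    # (4 + 99) * 20 = 2060 ns
non_pipelined = n * tn          # 100 * 80 = 8000 ns
speedup = non_pipelined / pipelined

print(pipelined, non_pipelined, round(speedup, 2))   # 2060 8000 3.88
# As n grows, the speedup approaches tn/tp = k = 4.
```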
Disadvantages of a pipeline
Different segments may take different times to complete their suboperations.
The clock cycle must be chosen to equal the time delay of the segment with the maximum propagation time.
This causes all other segments to waste time waiting for the next clock.

Two areas of computer design where pipeline organization is applicable:
1. Arithmetic pipeline: divides an arithmetic operation into suboperations for execution in the pipeline segments.
2. Instruction pipeline: operates on a stream of instructions by overlapping the fetch, decode, and execute phases of the instruction cycle.
PIPELINE HAZARDS
WHAT ARE PIPELINE HAZARDS?
Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. They reduce the performance from the ideal speedup gained by pipelining.
CLASSIFICATION OF HAZARDS
Structural hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
CLASSIFICATION OF HAZARDS
Control hazards: arise from the pipelining of branches and other instructions that change the PC.
STRUCTURAL HAZARDS
For a system to be free of structural hazards, functional units must be pipelined and resources duplicated enough to allow all possible combinations of instructions in the pipeline.
Structural hazards arise for the following reasons:
STRUCTURAL HAZARDS
When a functional unit is not fully pipelined, the sequence of instructions using that unit cannot proceed at the rate of one per clock cycle.
When a resource is not duplicated enough to allow all possible combinations of instructions.
Ex: a machine may have a single register-file write port but need to perform two writes during the same clock cycle.
STRUCTURAL HAZARDS
Consider a machine with a single memory shared by data and instructions. An instruction containing a data-memory reference will conflict with the instruction fetch of a later instruction.
This is resolved by stalling the pipeline for one clock cycle when the data-memory access occurs.
DATA HAZARDS
Data hazards occur when the pipeline changes the order of read/write accesses to operands, so that the order differs from the order seen by sequentially executing instructions on an unpipelined machine.
CLASSIFICATION OF DATA HAZARDS
RAW (read after write): consider two instructions i and j, with i occurring before j.
j tries to read a source before i actually writes it, so j gets the old value.
Ex:
ADD R1,R2,R3
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R1,R11
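The RAW dependences in the sequence above can be found mechanically: an instruction has a RAW dependence on an earlier one when it reads a register the earlier instruction writes. A small sketch; the tuple encoding (mnemonic, destination, sources) is an illustrative assumption.

```python
# Detect RAW hazards in a straight-line instruction sequence.

def raw_hazards(program):
    hazards = []
    for j in range(len(program)):
        _, _, srcs = program[j]
        for i in range(j):
            op_i, dest_i, _ = program[i]
            if dest_i in srcs:                 # j reads what i writes
                hazards.append((op_i, program[j][0]))
    return hazards

program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR",  "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]
print(raw_hazards(program))
# Every later instruction reads R1, which ADD writes:
# [('ADD', 'SUB'), ('ADD', 'AND'), ('ADD', 'OR'), ('ADD', 'XOR')]
```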
CLASSIFICATION OF DATA HAZARDS
This hazard is overcome by a simple hardware technique called forwarding.
In forwarding, the ALU result from the EX/MEM register is always fed back into the ALU input latches.
If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source of the current ALU operation, the control logic selects the forwarded result as the ALU input rather than the value read from the register file.
CLASSIFICATION OF DATA HAZARDS
WAW (write after write):
j tries to write an operand before it is written by i. The writes are thus performed in the wrong order, leaving the value written by i, rather than by j, as the final value.
This hazard is present in pipelines that write in more than one pipe stage. In DLX, however, this is not a hazard, as writes occur only in the WB stage.
CLASSIFICATION OF DATA HAZARDS
Ex:
LW R1,0(R2)
ADD R1,R2,R3
CLASSIFICATION OF DATA HAZARDS
WAR (write after read):
j tries to write a destination before it is read by i.
This does not happen in DLX, as all reads occur early (in the ID stage) and all writes occur late (in the WB stage).
Ex:
SW 0(R1),R2
ADD R2,R3,R4
CONTROL HAZARDS
Control hazards cause a greater performance loss than data hazards.
The simplest method of dealing with branches is to stall the pipeline as soon as the branch is detected in the ID stage, until the MEM stage, where the new PC is finally determined.
CONTROL HAZARDS
Each branch causes a 3-cycle stall in the DLX pipeline, a significant loss given that about 30% of the instructions executed are branches.
The number of stall cycles per branch is reduced by testing the branch condition in the ID stage and computing the destination address there with a separate adder.
This reduces the cost to one clock cycle per branch.
WHAT MAKES PIPELINING HARD TO IMPLEMENT?
Exceptional situations: situations in which the normal order of execution is changed. They arise from instructions that raise exceptions, which may force the machine to abort the instructions in the pipeline before they complete.
WHAT MAKES PIPELINING HARD TO IMPLEMENT?
Some of the exceptions include:
o Integer arithmetic overflow/underflow
o Power failure
o Hardware malfunctions
o I/O device requests
Arithmetic pipeline
Usually found in high-speed computers.
Used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations.
Ex: floating-point addition and subtraction, X = A x 2^a and Y = B x 2^b, where A and B are fractions representing the mantissas and a and b are the exponents.
The suboperations performed in the four segments are:
[1] Compare the exponents
[2] Align the mantissas
[3] Add or subtract the mantissas
[4] Normalize the result
ARITHMETIC PIPELINE
Floating-point adder: X = A x 2^a, Y = B x 2^b

Segment 1: compare the exponents by subtraction; choose the larger exponent
Segment 2: align the mantissa of the number with the smaller exponent (shift by the exponent difference)
Segment 3: add or subtract the mantissas
Segment 4: normalize the result and adjust the exponent

[Diagram: the four segments are separated by registers R that latch the intermediate results]
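The four suboperations can be sketched in software. This is a minimal illustration, assuming a base-2 (mantissa, exponent) representation with value = mantissa * 2^exponent; the function name `fp_add` and the representation are assumptions, not the slide's hardware.

```python
# Four-segment floating-point addition: compare, align, add, normalize.

def fp_add(a, b):
    (ma, ea), (mb, eb) = a, b

    # Segment 1: compare the exponents by subtraction; choose the larger
    diff = ea - eb
    exp = max(ea, eb)

    # Segment 2: align the mantissa of the number with the smaller exponent
    if diff > 0:
        mb = mb / (2 ** diff)
    elif diff < 0:
        ma = ma / (2 ** -diff)

    # Segment 3: add the mantissas
    m = ma + mb

    # Segment 4: normalize the result and adjust the exponent
    while abs(m) >= 1.0:
        m /= 2.0
        exp += 1
    while m != 0.0 and abs(m) < 0.5:
        m *= 2.0
        exp -= 1
    return (m, exp)

# X = 0.5 * 2**3 = 4, Y = 0.5 * 2**2 = 2  ->  6 = 0.75 * 2**3
print(fp_add((0.5, 3), (0.5, 2)))  # (0.75, 3)
```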
Instruction pipeline
An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
This causes the instruction fetch and execute phases to overlap and perform simultaneous operations.
INSTRUCTION CYCLE
Six phases in an instruction cycle:
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

Some instructions skip some phases:
* Effective-address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
Four-segment CPU pipeline:

Segment 1: fetch instruction from memory
Segment 2: decode instruction and calculate effective address; if the instruction is a branch, empty the pipe and update the PC
Segment 3: fetch operand from memory
Segment 4: execute instruction; if an interrupt is pending, empty the pipe and handle the interrupt; then update the PC
TIMING OF INSTRUCTION PIPELINE

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1    FI  DA  FO  EX
2                    FI  DA  FO  EX
3 (Branch)               FI  DA  FO  EX
4                            FI  --  --  FI  DA  FO  EX
5                                            FI  DA  FO  EX
6                                                FI  DA  FO  EX
7                                                    FI  DA  FO  EX
Major difficulties
Resource conflicts: caused by access to memory by two segments at the same time. These conflicts can be resolved by using separate instruction and data memories.
Data dependency conflicts: arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
Branch difficulties: arise from branch and other instructions that change the value of the PC.
Data dependency
A data dependency occurs when an instruction needs data that are not yet available. Ways of handling it:
1. Hardware interlocks: a circuit that detects instructions whose source operands are destinations of instructions farther up in the pipeline. Such an instruction is delayed by enough clock cycles to resolve the conflict.
2. Operand forwarding: special hardware detects a conflict and then avoids it by routing the data through special paths between pipeline segments.
3. Delayed load: the compiler reorders the instructions as necessary to delay the loading of the conflicting data, inserting no-operation instructions where needed.
Handling of branch instructions
Prefetch target instruction:
Fetch instructions in both streams, branch not taken and branch taken.
Both are saved until the branch is executed; then the right instruction stream is selected and the wrong stream discarded.
Branch target buffer (BTB; associative memory):
Each entry holds the address of a previously executed branch, its target instruction, and the next few instructions.
When fetching an instruction, the BTB is searched. If found, fetch the instruction stream from the BTB; if not, fetch the new stream and update the BTB.
Loop buffer (high-speed register file):
Stores an entire loop, allowing it to be executed without accessing memory.
Branch prediction:
Guesses the branch condition and fetches an instruction stream based on the guess. A correct guess eliminates the branch penalty.
Delayed branch:
The compiler detects the branch and rearranges the instruction sequence, inserting useful instructions that keep the pipeline busy in the presence of a branch instruction.
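The branch-prediction idea above (guess the condition, fetch along the guessed path) can be sketched in a few lines. The slides do not specify a scheme, so the 2-bit saturating counter used here is an assumption, chosen because it is a common textbook predictor; the names and the sample outcome sequence are illustrative.

```python
# A 2-bit saturating-counter branch predictor: each branch PC maps to a
# counter 0..3; values >= 2 mean "predict taken". The counter moves one
# step toward the actual outcome on every resolution.

def make_predictor():
    table = {}                          # branch PC -> 2-bit counter

    def predict(pc):
        return table.get(pc, 2) >= 2    # start weakly "taken"

    def update(pc, taken):
        c = table.get(pc, 2)
        table[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

    return predict, update

predict, update = make_predictor()
outcomes = [True, True, False, True, True]   # actual behaviour of one branch
correct = 0
for taken in outcomes:
    if predict(0x40) == taken:
        correct += 1
    update(0x40, taken)
print(correct, "of", len(outcomes), "predicted correctly")
```

A correct guess costs nothing; each mispredict (here the single not-taken outcome) incurs the branch penalty.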
RISC pipeline
RISC: a machine with a very fast clock cycle that executes instructions at the rate of one instruction per cycle.
RISC PIPELINE
Instruction cycles of a three-stage instruction pipeline:

Data manipulation instructions
I: Instruction fetch
A: Decode, read registers, ALU operation
E: Write a register

Load and store instructions
I: Instruction fetch
A: Decode, evaluate effective address
E: Register-to-memory or memory-to-register transfer

Program control instructions
I: Instruction fetch
A: Decode, evaluate branch address
E: Write register (PC)
DELAYED LOAD
Three-segment pipeline timing

LOAD:  R1 <- M[address 1]
LOAD:  R2 <- M[address 2]
ADD:   R3 <- R1 + R2
STORE: M[address 3] <- R3

Pipeline timing with data conflict
clock cycle    1  2  3  4  5  6
Load R1        I  A  E
Load R2           I  A  E
Add R1+R2            I  A  E
Store R3                I  A  E

Pipeline timing with delayed load
clock cycle    1  2  3  4  5  6  7
Load R1        I  A  E
Load R2           I  A  E
NOP                  I  A  E
Add R1+R2               I  A  E
Store R3                   I  A  E

Advantage: the data dependency is taken care of by the compiler rather than the hardware.
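The compiler's role in delayed load can be sketched as a simple pass over the instruction sequence: insert a NOP whenever a load's destination is used as a source by the very next instruction. The tuple encoding (op, dest, sources) and the function name are illustrative assumptions.

```python
# Insert a NOP after any load whose destination register is read by the
# immediately following instruction (the delayed-load slot).

def insert_delayed_load_nops(program):
    out = []
    for i, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        if op == "LOAD" and i + 1 < len(program):
            next_srcs = program[i + 1][2]
            if dest in next_srcs:
                out.append(("NOP", None, ()))
    return out

program = [
    ("LOAD", "R1", ()),            # R1 <- M[address 1]
    ("LOAD", "R2", ()),            # R2 <- M[address 2]
    ("ADD", "R3", ("R1", "R2")),   # R3 <- R1 + R2
    ("STORE", None, ("R3",)),      # M[address 3] <- R3
]
for instr in insert_delayed_load_nops(program):
    print(instr[0])
# LOAD, LOAD, NOP, ADD, STORE -- matching the delayed-load timing above
```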
DELAYED BRANCH

Using no-operation instructions:
Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load A         I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E

Rearranging the instructions:
Clock cycles:     1  2  3  4  5  6  7  8
1. Load A         I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E

The compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps.
CISC (COMPLEX INSTRUCTION SET COMPUTING)
A CISC is a computer in which a single instruction can execute several low-level operations and which is capable of multi-step operations or addressing modes within a single instruction.
Some complex instructions are difficult or impossible to execute in one cycle through the pipeline.
Many different addressing modes; instructions of different lengths.
Implementing a CISC Architecture
There are simple and complex instructions in a CISC architecture. One approach:
Adapt the RISC pipeline: execute the simple, frequently used CISC instructions as in RISC.
For the more complex instructions, use microinstructions: a sequence of microinstructions is stored in ROM for each complex CISC instruction. Complex instructions often involve multiple microoperations or memory accesses in sequence.
When a complex instruction is decoded in the DOF stage of the pipeline, the microcode address and control are given to the microcode counter. Microinstructions are executed until the instruction is completed. Each microoperation is simply a set of control input signals.
Example: a certain CISC instruction I has microcode stored in the microcode ROM at address A. When I is decoded in the main pipeline, the main pipeline is stalled and control is given to the microcode control (MC). At each subsequent clock cycle, MC increments and executes the next microoperation (a control word that controls the datapath). The last microoperation in the sequence gives control back to the main pipeline and un-stalls it.
CISC Approach
The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible.
This is achieved by building processor hardware that is capable of understanding and executing a series of operations.
Ex: MULT 2:3, 5:2
For this task a CISC processor comes prepared with a specific instruction.
The instruction loads the two values into separate registers, multiplies the operands in the execution unit, and stores the product in the appropriate register.
The entire task of multiplying two numbers is thus completed with one instruction: MULT is a complex instruction.
CISC minimizes the number of instructions per program at the cost of the number of cycles per instruction.
VECTOR PROCESSING
Vector processing applications: problems that can be efficiently formulated in terms of vectors:
Long-range weather forecasting
Petroleum exploration
Seismic data analysis
Medical diagnosis
Aerodynamics and space-flight simulations
Artificial intelligence and expert systems
Image processing

Vector processor (computer): can process vectors, and related data structures such as matrices and multi-dimensional arrays, much faster than conventional computers. Vector processors may also be pipelined.
VECTOR PROGRAMMING

    DO 20 I = 1, 100
20  C(I) = B(I) + A(I)

Conventional computer:
    Initialize I = 0
20  Read A(I)
    Read B(I)
    Store C(I) = A(I) + B(I)
    Increment I = I + 1
    If I <= 100 goto 20

Vector computer:
    C(1:100) = A(1:100) + B(1:100)
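The contrast between the scalar loop and the single vector statement can be illustrated with plain Python lists standing in for vector registers; the variable names mirror the Fortran fragment above.

```python
# Scalar loop vs. vector statement for C(1:100) = A(1:100) + B(1:100).

A = list(range(1, 101))        # A(1:100) = 1..100
B = list(range(101, 201))      # B(1:100) = 101..200

# Conventional (scalar) form: one element per loop iteration
C_scalar = []
for i in range(100):
    C_scalar.append(A[i] + B[i])

# Vector form: one operation expressed over the whole vector
C_vector = [a + b for a, b in zip(A, B)]

print(C_scalar == C_vector)           # True
print(C_vector[0], C_vector[-1])      # 102 300
```

On a vector machine the second form is a single instruction that streams all 100 element pairs through a pipelined adder.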
VECTOR INSTRUCTIONS
f1: V -> V
f2: V -> S
f3: V x V -> V
f4: V x S -> V
V: vector operand, S: scalar operand

Type  Mnemonic  Description (I = 1, ..., n)
f1    VSQR      Vector square root      B(I) <- SQR(A(I))
      VSIN      Vector sine             B(I) <- sin(A(I))
      VCOM      Vector complement       A(I) <- A(I)'
f2    VSUM      Vector summation        S <- sum of A(I)
      VMAX      Vector maximum          S <- max{A(I)}
f3    VADD      Vector add              C(I) <- A(I) + B(I)
      VMPY      Vector multiply         C(I) <- A(I) * B(I)
      VAND      Vector AND              C(I) <- A(I) . B(I)
      VLAR      Vector larger           C(I) <- max(A(I), B(I))
      VTGE      Vector test >=          C(I) <- 0 if A(I) < B(I), 1 if A(I) >= B(I)
f4    SADD      Vector-scalar add       B(I) <- S + A(I)
      SDIV      Vector-scalar divide    B(I) <- A(I) / S
VECTOR INSTRUCTION FORMAT

| Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length |

Pipeline for inner product:
[Diagram: Source A and Source B feed a multiplier pipeline; its products feed an adder pipeline that accumulates the inner product]
MULTIPLE MEMORY MODULES AND INTERLEAVING

Multiple-module memory: memory is partitioned into modules M0, M1, M2, M3, each with its own address register (AR), memory array, and data register (DR), connected to common address and data buses.
Address interleaving: different sets of addresses are assigned to different memory modules.
Pipeline and vector processors often require simultaneous access to memory from two or more sources.
An instruction pipeline may require fetching an instruction and an operand at the same time from two different segments.
Memory can be partitioned into a number of modules connected to common memory address and data buses. A memory module is a memory array together with its own address and data registers.
One module can initiate a memory access while other modules are in the process of reading or writing, and each module can honor a memory request independent of the state of the other modules.
Advantage of modular memory
Interleaving:
In an interleaved memory, different sets of addresses are assigned to different memory modules.
A vector processor that uses an n-way interleaved memory can fetch n operands from n different modules.
Example: in a 2-module memory system, the even addresses may be in one module and the odd addresses in the other.
A CPU with an instruction pipeline can take advantage of multiple memory modules so that each segment in the pipeline can access memory independently of the memory accesses of other segments.
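The low-order interleaving described above can be sketched in a couple of lines: with n modules, address a lives in module a mod n at local offset a div n. The function name and layout are illustrative assumptions.

```python
# Map a memory address to (module, offset) under n-way low-order interleaving.

def interleave(address, n_modules):
    module = address % n_modules    # low-order bits select the module
    offset = address // n_modules   # remaining bits select the word within it
    return module, offset

# In a 2-module system, even addresses land in M0 and odd addresses in M1:
for addr in range(6):
    print(addr, "->", interleave(addr, 2))
# Consecutive addresses of a vector thus fall in different modules, so an
# n-way interleaved memory can deliver n operands per memory cycle.
```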
Array processors
An array processor performs computations on large arrays of data.
Attached array processor: an auxiliary processor attached to a general-purpose computer.
SIMD array processor: a processor with a single-instruction, multiple-data organization. It manipulates vector instructions by means of multiple functional units responding to a common instruction.
Attached array processor
Enhances the performance of a computer by providing vector processing for complex scientific applications.

[Diagram: a general-purpose computer connects through an input-output interface to the attached array processor; the main memory and the processor's local memory are linked by a high-speed memory-to-memory bus]
SIMD array processors
SIMD array processor organization:

[Diagram: a master control unit, backed by a main memory, broadcasts instructions to processing elements PE1, PE2, ..., PEn, each with its own local memory M1, M2, ..., Mn]
It is a computer with multiple processing units operating in parallel.
It consists of a set of identical processing elements, each having a local memory.
Each processing element includes an ALU, a floating-point arithmetic unit, and working registers.
The master control unit controls the operations in the processing elements, and the main memory is used for storage of the program.
The function of the master control unit is to decode the instructions and determine how they are to be executed.
Vector instructions are broadcast to all PEs simultaneously.
Vector operands are distributed to the local memories prior to the parallel execution of the instructions.