CP0804_06-Apr-2011_RM01_unit 5



    UNIT-5

    PIPELINE AND VECTOR PROCESSING


    PIPELINING AND VECTOR PROCESSING

    Introduction to pipelining and pipeline hazards

    Design issues of pipeline architecture

    Instruction level parallelism and advanced issues

    Parallel processing concepts

    Vector processing

    Array processors

    CISC

    RISC

    VLIW


    PARALLEL PROCESSING

    1. Parallel processing provides simultaneous data-processing tasks for the
    purpose of increasing the computational speed of a computer system.

    Example: while one instruction is being executed in the ALU, the next
    instruction can be read from memory.

    2. Parallel processing increases hardware complexity and cost.


    Multiple functional units

    Adder-subtractor

    Integer multiply

    Logic unit

    Shift unit

    Incrementer

    Floating-point add-subtract

    Floating-point multiply

    Floating-point divide


    PROCESSOR WITH MULTIPLE FUNCTIONAL UNITS

    [Figure: processor registers feeding eight parallel functional units
    (adder-subtractor, integer multiply, logic unit, shift unit, incrementer,
    floating-point add-subtract, floating-point multiply, floating-point
    divide), connected to memory]


    PARALLEL COMPUTERS

    Architectural Classification

    Flynn's classification is based on the multiplicity of instruction streams
    and data streams:

    Instruction stream: sequence of instructions read from memory
    Data stream: operations performed on the data in the processor

                                Number of Data Streams
                                Single      Multiple
    Number of       Single      SISD        SIMD
    Instruction
    Streams         Multiple    MISD        MIMD


    SISD

    Instructions are executed sequentially; the system may or may not have
    internal parallel-processing capabilities.

    SIMD

    Many processing units operate under the supervision of a common control
    unit. All processors receive the same instruction from the control unit
    but operate on different items of data.

    MISD: of theoretical interest only; not practically implemented.

    MIMD: several programs are processed at the same time.


    Parallel processing techniques

    Pipeline processing

    Vector processing

    Array processing


    Pipeline processing:

    Arithmetic sub-operations or the phases of the computer instruction
    cycle overlap in execution.

    Vector processing:

    Deals with computations involving large vectors and matrices.

    Array processing:

    Performs computations on large arrays of data.


    PIPELINING

    A technique of decomposing a sequential process into sub-operations, with
    each sub-process executed in a special dedicated segment that operates
    concurrently with all other segments.

    The result obtained from the computation in each segment is transferred
    to the next segment in the pipeline.

    Overlapping of computation.

    A register holds the data and a combinational circuit performs the
    sub-operation in each segment.


    WHAT IS PIPELINING?

    Pipelining is an implementation technique where multiple instructions are
    overlapped in execution to make fast CPUs.

    It is an implementation technique which exploits parallelism among the
    instructions in a sequential instruction stream.


    THE METHODOLOGY

    In a pipeline, each step is called a pipe stage (or pipe segment) and
    completes a part of an instruction.

    The stages are connected to one another to form a pipe.

    Instructions enter at one end, progress through each stage, and exit at
    the other end.


    PIPELINING

    Example of pipeline processing: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

    R1 ← Ai, R2 ← Bi          Load Ai and Bi
    R3 ← R1 * R2, R4 ← Ci     Multiply and load Ci
    R5 ← R3 + R4              Add

    [Figure: Ai, Bi, Ci come from memory; segment 1 loads R1 and R2;
    segment 2 feeds R1, R2 through the multiplier into R3 and loads Ci into
    R4; segment 3 feeds R3, R4 through the adder into R5]


    OPERATIONS IN EACH PIPELINE STAGE

    Clock Pulse | Segment 1 (R1, R2) | Segment 2 (R3, R4) | Segment 3 (R5)
         1      | A1, B1             |                    |
         2      | A2, B2             | A1*B1, C1          |
         3      | A3, B3             | A2*B2, C2          | A1*B1 + C1
         4      | A4, B4             | A3*B3, C3          | A2*B2 + C2
         5      | A5, B5             | A4*B4, C4          | A3*B3 + C3
         6      | A6, B6             | A5*B5, C5          | A4*B4 + C4
         7      | A7, B7             | A6*B6, C6          | A5*B5 + C5
         8      |                    | A7*B7, C7          | A6*B6 + C6
         9      |                    |                    | A7*B7 + C7

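The clock-pulse table above can be checked with a short simulation. This is a minimal sketch (not from the slides): each loop iteration is one clock pulse, and the segment registers shift one stage per pulse.

```python
def pipeline_abc(A, B, C):
    """Compute [A[i]*B[i] + C[i]] through a 3-segment pipeline."""
    n = len(A)
    seg1 = None   # contents of R1, R2 after segment 1
    seg2 = None   # contents of R3, R4 after segment 2
    out = []      # values written to R5 by segment 3
    for clock in range(n + 2):                 # k + n - 1 = 3 + n - 1 pulses
        if seg2 is not None:                   # segment 3: R5 <- R3 + R4
            out.append(seg2[0] + seg2[1])
        # segment 2: R3 <- R1 * R2, R4 <- Ci (Ci pairs with last pulse's Ai, Bi)
        seg2 = (seg1[0] * seg1[1], C[clock - 1]) if seg1 is not None else None
        if clock < n:                          # segment 1: R1 <- Ai, R2 <- Bi
            seg1 = (A[clock], B[clock])
        else:
            seg1 = None
    return out

# seven tasks, as in the table: result i appears at clock pulse i + 2
res = pipeline_abc([1, 2, 3, 4, 5, 6, 7], [1] * 7, [1, 2, 3, 4, 5, 6, 7])
```

With seven tasks the loop runs 9 clock pulses, matching the 9 rows of the table.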

    GENERAL PIPELINE

    General structure of a 4-segment pipeline: Input → S1/R1 → S2/R2 →
    S3/R3 → S4/R4, with all registers driven by a common clock.

    Space-Time Diagram (tasks T1..T6 through the 4 segments):

    Clock cycle | 1   2   3   4   5   6   7   8   9
    Segment 1   | T1  T2  T3  T4  T5  T6
    Segment 2   |     T1  T2  T3  T4  T5  T6
    Segment 3   |         T1  T2  T3  T4  T5  T6
    Segment 4   |             T1  T2  T3  T4  T5  T6


    PIPELINE SPEEDUP

    n: number of tasks to be performed

    Conventional machine (non-pipelined)
    tn: clock cycle
    t1: time required to complete the n tasks
    t1 = n * tn

    Pipelined machine (k stages)
    tp: clock cycle (time to complete each sub-operation)
    tk: time required to complete the n tasks
    tk = (k + n - 1) * tp

    Speedup:
    Sk = n*tn / ((k + n - 1)*tp)

    As n → ∞:  Sk → tn/tp   ( = k, if tn = k * tp )

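The speedup formula is easy to evaluate numerically. A small sketch (not from the slides), using the symbols defined above:

```python
def pipeline_speedup(n, k, tn, tp):
    """Sk = n*tn / ((k + n - 1)*tp), the pipeline speedup formula."""
    return (n * tn) / ((k + n - 1) * tp)

# the slides' later example: k = 4 stages, tp = 20 ns, n = 100 tasks,
# tn = 4 * 20 = 80 ns per task on the non-pipelined machine
s100 = pipeline_speedup(100, 4, 80, 20)     # 8000 / 2060 ≈ 3.88
# as n grows, Sk approaches tn/tp = k = 4
s_big = pipeline_speedup(10**6, 4, 80, 20)
```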

    PIPELINE AND MULTIPLE FUNCTIONAL UNITS

    [Figure: four functional units P1..P4 executing instructions Ii, Ii+1,
    Ii+2, Ii+3 in parallel]

    Example: 4-stage pipeline
    - sub-operation in each stage: tp = 20 ns
    - 100 tasks to be executed
    - 1 task in the non-pipelined system: 4 * 20 = 80 ns

    Pipelined system: (k + n - 1)*tp = (4 + 99) * 20 = 2060 ns

    Non-pipelined system: n*k*tp = 100 * 80 = 8000 ns

    Speedup: Sk = 8000 / 2060 = 3.88

    A 4-stage pipeline is basically equivalent to a system with 4 identical
    functional units.


    Disadvantage of pipeline

    Different segments may take different times to complete their
    sub-operations.

    The clock cycle must be chosen to equal the time delay of the segment
    with the maximum propagation time.

    This causes all other segments to waste time waiting for the next clock.

    Two areas of computer design where pipeline organization is applicable:

    1. Arithmetic pipeline: divides an arithmetic operation into
    sub-operations for execution in the pipeline segments.

    2. Instruction pipeline: operates on a stream of instructions by
    overlapping the fetch, decode, and execute phases of the instruction
    cycle.


    PIPELINE HAZARDS

    WHAT ARE PIPELINE HAZARDS?

    Hazards are situations that prevent the next instruction in the
    instruction stream from executing during its designated clock cycle.
    They reduce the performance from the ideal speedup gained by pipelining.


    CLASSIFICATION OF HAZARDS

    Structural hazards: arise from resource conflicts when the hardware
    cannot support all possible combinations of instructions in simultaneous
    overlapped execution.

    Data hazards: arise when an instruction depends upon the result of a
    previous instruction in a way that is exposed by the overlapping of
    instructions in the pipeline.


    CLASSIFICATION OF HAZARDS

    Control Hazards : arise from the pipelining

    of branches and other instructions that

    change the PC


    STRUCTURAL HAZARDS

    For a system to be free from structural hazards, functional units must be
    pipelined and resources duplicated enough to allow all possible
    combinations of instructions in the pipeline.

    Structural hazards arise for the following reasons:


    STRUCTURAL HAZARDS

    When a functional unit is not fully pipelined, the sequence of
    instructions using that unit cannot proceed at the rate of one per clock
    cycle.

    When a resource is not duplicated enough to allow all possible
    combinations of instructions.

    Example: a machine may have one register-file write port, but may need
    to perform 2 writes during the same clock cycle.


    STRUCTURAL HAZARDS

    Consider a machine with a single memory shared between data and
    instructions. An instruction containing a data-memory reference will
    conflict with the instruction fetch of a later instruction.

    This is resolved by stalling the pipeline for one clock cycle when the
    data-memory access occurs.


    DATA HAZARDS

    Data hazards occur when the pipeline changes the order of read/write
    accesses to operands so that the order differs from the order seen by
    sequentially executing instructions on an unpipelined machine.


    CLASSIFICATION OF DATA HAZARDS

    RAW (read after write): consider two instructions i and j, with i
    occurring before j.

    j tries to read a source before i actually writes it; as a result, j
    gets the old value.

    Example (R1, written by ADD, is read by the following instructions):

    ADD R1,R2,R3
    SUB R4,R1,R5
    AND R6,R1,R7
    OR  R8,R1,R9
    XOR R10,R1,R11


    CLASSIFICATION OF DATA HAZARDS

    This hazard is overcome by a simple hardware technique called forwarding.

    In forwarding, the ALU result from the EX/MEM register is always fed
    back into the ALU input latches.

    If the forwarding hardware detects that the previous ALU operation has
    written the register corresponding to a source for the current ALU
    operation, the control logic selects the forwarded result as the ALU
    input rather than the value read from the register file.
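The forwarding decision described above can be sketched as follows. This is an illustrative model, not a real pipeline: the register names and the dictionary-based EX/MEM latch are invented for the example.

```python
def alu_input(src_reg, regfile, ex_mem):
    """Select the forwarded ALU result when the previous instruction
    wrote src_reg; otherwise read the register file."""
    if ex_mem is not None and ex_mem["dest"] == src_reg:
        return ex_mem["value"]            # forward from the EX/MEM latch
    return regfile[src_reg]               # no dependence: use register file

regs = {"R1": 0, "R2": 3, "R3": 4, "R5": 10}
# ADD R1,R2,R3 has just finished EX; its result (7) sits in EX/MEM
# and has not yet been written back to R1
ex_mem = {"dest": "R1", "value": regs["R2"] + regs["R3"]}
# SUB R4,R1,R5 reads R1 in the very next cycle:
fwd = alu_input("R1", regs, ex_mem)       # forwarding supplies 7, not the stale 0
other = alu_input("R5", regs, ex_mem)     # R5 has no conflict: read normally
```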


    CLASSIFICATION OF DATA HAZARDS

    WAW (write after write):

    j tries to write an operand before it is written by i. The writes are
    performed in the wrong order, leaving the value written by i as the
    final value.

    This hazard is present in pipelines that write in more than one pipe
    stage. In DLX this isn't a hazard, as it writes only in the WB stage.


    CLASSIFICATION OF DATA HAZARDS

    Example (both instructions write R1):

    LW  R1,0(R2)
    ADD R1,R2,R3


    CLASSIFICATION OF DATA HAZARDS

    WAR (write after read):

    j tries to write a destination before it is read by i.

    This doesn't happen in DLX, as all reads occur early (in the ID stage)
    and all writes occur late (in the WB stage).

    Example:

    SW  0(R1),R2
    ADD R2,R3,R4


    CONTROL HAZARDS

    Control hazards cause a greater performance loss than data hazards.

    The simplest method of dealing with branches is to stall the pipeline as
    soon as the branch is detected in the ID stage, until the new PC is
    finally determined in the MEM stage.


    CONTROL HAZARDS

    Each branch causes a 3-cycle stall in the DLX pipeline, a significant
    loss given that about 30% of the instructions executed are branches.

    The branch stall is reduced by testing the branch condition in the ID
    stage and computing the destination address there using a separate
    adder.

    Thus there is only a one-clock-cycle stall on branches.


    WHAT MAKES PIPELINING HARD TO IMPLEMENT?

    EXCEPTIONAL SITUATIONS: situations in which the normal order of
    execution is changed. They are caused by instructions that raise
    exceptions and may force the machine to abort the instructions in the
    pipeline before they complete.


    WHAT MAKES PIPELINING HARD TO IMPLEMENT?

    Some of the exceptions include:

    o Integer arithmetic overflow/underflow
    o Power failure
    o Hardware malfunctions
    o I/O device requests


    Arithmetic pipeline

    Usually found in high-speed computers.

    Used to implement floating-point operations and multiplication of
    fixed-point numbers.

    Example: floating-point addition and subtraction.

    A and B are fractions representing the mantissas, and a and b are the
    exponents.

    The sub-operations performed in the four segments are:

    [1] Compare the exponents
    [2] Align the mantissas
    [3] Add/subtract the mantissas
    [4] Normalize the result


    ARITHMETIC PIPELINE

    Floating-point adder:  X = A x 2^a,  Y = B x 2^b

    Segment 1: Compare the exponents (by subtraction) and choose the larger
    exponent
    Segment 2: Align the mantissas using the exponent difference
    Segment 3: Add or subtract the mantissas
    Segment 4: Normalize the result and adjust the exponent

    [Figure: the four segments in sequence, separated by registers R that
    latch the exponents and mantissas between stages]


    Instruction pipeline

    An instruction pipeline reads consecutive instructions from memory while
    previous instructions are being executed in other segments.

    This causes the instruction fetch and execute phases to overlap and
    perform simultaneous operations.


    INSTRUCTION CYCLE

    Six phases in an instruction cycle:

    [1] Fetch an instruction from memory
    [2] Decode the instruction
    [3] Calculate the effective address of the operand
    [4] Fetch the operands from memory
    [5] Execute the operation
    [6] Store the result in the proper place

    Some instructions skip some phases:
    * Effective-address calculation can be done as part of the decoding phase
    * Storage of the operation result into a register is done automatically
    in the execution phase

    ==> 4-Stage Pipeline

    [1] FI: Fetch an instruction from memory
    [2] DA: Decode the instruction and calculate the effective address of
    the operand
    [3] FO: Fetch the operand
    [4] EX: Execute the operation


    INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

    Four-segment CPU pipeline:

    Segment 1: Fetch instruction from memory
    Segment 2: Decode instruction and calculate effective address; if the
    instruction is a branch, empty the pipe and update the PC
    Segment 3: Fetch operand from memory
    Segment 4: Execute instruction; if an interrupt is pending, empty the
    pipe, handle the interrupt, and update the PC


    Timing of instruction pipeline

    Step:           1   2   3   4   5   6   7   8   9   10  11  12  13
    Instruction 1:  FI  DA  FO  EX
                2:      FI  DA  FO  EX
       (Branch) 3:          FI  DA  FO  EX
                4:              FI  -   -   FI  DA  FO  EX
                5:                              FI  DA  FO  EX
                6:                                  FI  DA  FO  EX
                7:                                      FI  DA  FO  EX

    The fetch of instruction 4 at step 4 is discarded when the branch in
    instruction 3 is detected; fetching resumes from the branch target after
    the branch executes.


    Major difficulties

    Resource conflicts:

    Caused by access to memory by two segments at the same time. These
    conflicts can be resolved by using separate instruction and data
    memories.

    Data dependency conflicts:

    Arise when an instruction depends on the result of a previous
    instruction, but that result is not yet available.

    Branch difficulties:

    Arise from branch and other instructions that change the value of the PC.


    Data dependency

    A data dependency occurs when an instruction needs data that are not yet
    available. Solutions:

    1. Hardware interlocks: a circuit detects instructions whose source
    operands are destinations of instructions farther up in the pipeline.
    An instruction whose source is not available is delayed by enough clock
    cycles to resolve the conflict.

    2. Operand forwarding: special hardware detects a conflict and then
    avoids it by routing the data through special paths between pipeline
    segments.

    3. Delayed load: the compiler reorders the instructions as necessary to
    delay the loading of conflicting data by inserting no-operation
    instructions.


    Handling of branch instructions

    Prefetch target instruction:

    Fetch instructions in both streams, branch-not-taken and branch-taken.

    Both are saved until the branch is executed; then select the right
    instruction stream and discard the wrong one.

    Branch target buffer (BTB; associative memory):

    Each entry holds the address of a previously executed branch, its
    target instruction, and the next few instructions.

    When fetching an instruction, search the BTB.

    If found, fetch the instruction stream in the BTB; if not, fetch a new
    stream and update the BTB.
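A toy model of the BTB lookup just described, sketched in Python. The addresses and the dictionary-based table are illustrative, not a real hardware design:

```python
class BranchTargetBuffer:
    """Associative table: branch PC -> previously taken target PC."""
    def __init__(self):
        self.table = {}

    def next_fetch(self, pc, fallthrough):
        # BTB hit: fetch the recorded stream; miss: fetch sequentially
        return self.table.get(pc, fallthrough)

    def update(self, pc, target):
        # record (or refresh) the taken target for this branch
        self.table[pc] = target

btb = BranchTargetBuffer()
miss = btb.next_fetch(100, 104)   # first encounter: miss, fall through to 104
btb.update(100, 400)              # the branch at 100 was taken to 400
hit = btb.next_fetch(100, 104)    # next encounter: hit, fetch the BTB stream
```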


    Loop buffer (high-speed register file):

    Stores an entire loop, allowing it to execute without accessing memory.

    Branch prediction:

    Guess the branch condition and fetch an instruction stream based on the
    guess. A correct guess eliminates the branch penalty.

    Delayed branch:

    The compiler detects the branch and rearranges the instruction sequence
    by inserting useful instructions that keep the pipeline busy in the
    presence of a branch instruction.


    RISC pipeline

    RISC: a machine with a very fast clock cycle that executes at the rate
    of one instruction per cycle.


    RISC PIPELINE

    Instruction cycles of the three-stage instruction pipeline:

    Data manipulation instructions
    I: Instruction fetch
    A: Decode, read registers, ALU operation
    E: Write a register

    Load and store instructions
    I: Instruction fetch
    A: Decode, evaluate effective address
    E: Register-to-memory or memory-to-register transfer

    Program control instructions
    I: Instruction fetch
    A: Decode, evaluate branch address
    E: Write register (PC)


    DELAYED LOAD

    Three-segment pipeline timing

    LOAD:  R1 ← M[address 1]
    LOAD:  R2 ← M[address 2]
    ADD:   R3 ← R1 + R2
    STORE: M[address 3] ← R3

    Pipeline timing with data conflict:

    clock cycle  1  2  3  4  5  6
    Load R1      I  A  E
    Load R2         I  A  E
    Add R1+R2          I  A  E
    Store R3              I  A  E

    Pipeline timing with delayed load:

    clock cycle  1  2  3  4  5  6  7
    Load R1      I  A  E
    Load R2         I  A  E
    NOP                I  A  E
    Add R1+R2             I  A  E
    Store R3                 I  A  E

    Advantage: the data dependency is taken care of by the compiler rather
    than the hardware.

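The compiler-side fix above — inserting a NOP into the load delay slot — can be sketched as a small rewriting pass. The `(op, dest, sources)` tuples are an invented representation for illustration:

```python
def insert_load_delays(program):
    """Insert a NOP whenever an instruction uses the register loaded
    by the immediately preceding LOAD."""
    out = []
    for instr in program:
        op, dest, srcs = instr
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in srcs:
                out.append(("NOP", None, ()))   # fill the load delay slot
        out.append(instr)
    return out

prog = [("LOAD", "R1", ()),
        ("LOAD", "R2", ()),
        ("ADD", "R3", ("R1", "R2")),    # uses R2 right after LOAD R2
        ("STORE", None, ("R3",))]
fixed = insert_load_delays(prog)        # NOP lands between LOAD R2 and ADD
```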

    DELAYED BRANCH

    Using no-operation instructions:

    Clock cycles:   1  2  3  4  5  6  7  8  9  10
    1. Load A       I  A  E
    2. Increment       I  A  E
    3. Add                I  A  E
    4. Subtract              I  A  E
    5. Branch to X              I  A  E
    6. NOP                         I  A  E
    7. NOP                            I  A  E
    8. Instr. in X                       I  A  E

    Rearranging the instructions:

    Clock cycles:   1  2  3  4  5  6  7  8
    1. Load A       I  A  E
    2. Increment       I  A  E
    3. Branch to X        I  A  E
    4. Add                   I  A  E
    5. Subtract                 I  A  E
    6. Instr. in X                 I  A  E

    The compiler analyzes the instructions before and after the branch and
    rearranges the program sequence by inserting useful instructions in the
    delay steps.


    CISC (COMPLEX INSTRUCTION SET COMPUTING)

    A CISC is a computer in which a single instruction can execute several
    low-level operations and which is capable of multi-step operations or
    addressing modes within a single instruction.

    Some complex instructions are difficult or impossible to execute in one
    cycle through the pipeline.

    Many different addressing modes; instructions of different lengths.


    Implementing a CISC Architecture

    There are simple and complex instructions in a CISC architecture. One
    approach:

    Adapt the RISC pipeline. Execute the simple, frequently used CISC
    instructions as in RISC.

    For the more complex instructions, use microinstructions: a sequence of
    microinstructions is stored in ROM for each complex CISC instruction.
    Complex instructions often involve multiple microoperations or memory
    accesses in sequence.

    When a complex instruction is decoded in the DOF stage of the pipeline,
    the microcode address and control are given to the microcode counter.
    Microinstructions are executed until the instruction is completed.

    Each microoperation is simply a set of control input signals.

    Example: a certain CISC instruction I has microcode written in the
    microcode ROM at address A. When I is decoded in the main pipeline, the
    main pipeline is stalled and control is given to the microcode control
    (MC). At each subsequent clock cycle, the MC increments and executes the
    next microoperation (a control word that controls the datapath). The
    last microoperation in the sequence gives control back to the main
    pipeline and un-stalls it.


    CISC Approach

    The primary goal of CISC architecture is to complete a task in as few
    lines of assembly as possible.

    This is achieved by building processor hardware that is capable of
    understanding and executing a series of operations.

    EX: MULT 2:3,5:2

    For this task a CISC processor comes prepared with a specific
    instruction. This instruction loads the two values into separate
    registers, multiplies the operands in the execution unit, and stores the
    product in the appropriate register.

    The entire task of multiplying two numbers can be completed with one
    instruction.

    MULT ----- complex instruction.

    CISC minimizes the number of instructions per program, sacrificing the
    number of cycles per instruction.


    VECTOR PROCESSING

    Vector Processing Applications

    Problems that can be efficiently formulated in terms of vectors:

    Long-range weather forecasting
    Petroleum exploration
    Seismic data analysis
    Medical diagnosis
    Aerodynamics and space-flight simulations
    Artificial intelligence and expert systems
    Image processing

    Vector processor (computer):

    Has the ability to process vectors, and related data structures such as
    matrices and multi-dimensional arrays, much faster than conventional
    computers.

    Vector processors may also be pipelined.


    VECTOR PROGRAMMING

    DO 20 I = 1, 100
    20 C(I) = B(I) + A(I)

    Conventional computer:

       Initialize I = 0
    20 Read A(I)
       Read B(I)
       Store C(I) = A(I) + B(I)
       Increment I = I + 1
       If I <= 100 go to 20

    Vector computer:

    C(1:100) = A(1:100) + B(1:100)
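The two styles can be mimicked in Python — the element-by-element loop of the conventional computer versus a single whole-vector operation:

```python
A = list(range(100))          # A(1:100)
B = list(range(100, 200))     # B(1:100)

# conventional computer: explicit loop, one element per iteration
C_loop = []
for i in range(100):
    C_loop.append(A[i] + B[i])

# vector computer: one whole-array operation, C(1:100) = A(1:100) + B(1:100)
C_vec = [a + b for a, b in zip(A, B)]
```

Both produce the same result; the vector form expresses the entire computation as one operation, which is what a vector instruction does in hardware.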


    VECTOR INSTRUCTIONS

    f1: V → V
    f2: V → S
    f3: V x V → V
    f4: V x S → V

    V: vector operand, S: scalar operand

    Type  Mnemonic  Description (I = 1, ..., n)
    f1    VSQR      Vector square root     B(I) ← SQR(A(I))
    f1    VSIN      Vector sine            B(I) ← sin(A(I))
    f1    VCOM      Vector complement      A(I) ← complement of A(I)
    f2    VSUM      Vector summation       S ← Σ A(I)
    f2    VMAX      Vector maximum         S ← max{A(I)}
    f3    VADD      Vector add             C(I) ← A(I) + B(I)
    f3    VMPY      Vector multiply        C(I) ← A(I) * B(I)
    f3    VAND      Vector AND             C(I) ← A(I) . B(I)
    f3    VLAR      Vector larger          C(I) ← max(A(I), B(I))
    f3    VTGE      Vector test >=         C(I) ← 0 if A(I) < B(I)
                                           C(I) ← 1 if A(I) >= B(I)
    f4    SADD      Vector-scalar add      B(I) ← S + A(I)
    f4    SDIV      Vector-scalar divide   B(I) ← A(I) / S


    VECTOR INSTRUCTION FORMAT

    | Operation code | Base address source 1 | Base address source 2 |
    Base address destination | Vector length |

    Pipeline for inner product:

    [Figure: source A and source B feed a multiplier pipeline, whose
    products feed an adder pipeline]


    MULTIPLE MEMORY MODULE AND INTERLEAVING

    Multiple-module memory

    Address interleaving: different sets of addresses are assigned to
    different memory modules.

    [Figure: four modules M0..M3, each a memory array with its own address
    register (AR) and data register (DR), all connected to a common address
    bus and data bus]


    Pipeline and vector processors often require simultaneous access to
    memory from two or more sources.

    An instruction pipeline may require fetching an instruction and an
    operand at the same time from two different segments.

    Memory can be partitioned into a number of modules connected to common
    memory address and data buses. A memory module is a memory array
    together with its own address and data registers.

    One module can initiate a memory access while other modules are in the
    process of reading or writing, and each module can honor a memory
    request independent of the state of the other modules.


    Advantage of modular memory

    Interleaving:

    In an interleaved memory, different sets of addresses are assigned to
    different memory modules.

    A vector processor that uses n-way interleaved memory can fetch n
    operands from n different modules.

    Example: in a two-module memory system, the even addresses may be in
    one module and the odd addresses in the other.

    A CPU with an instruction pipeline can take advantage of multiple
    memory modules so that each segment in the pipeline can access memory
    independent of memory accesses from other segments.
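The interleaved address mapping can be sketched as follows — a minimal model in which `addr mod n` selects the module and `addr div n` the word within it:

```python
def module_of(addr, n_modules):
    """Map an address to (module number, offset within the module)."""
    return addr % n_modules, addr // n_modules

# two-module example from the text: even addresses in M0, odd in M1
m_even = module_of(6, 2)    # -> (0, 3)
m_odd = module_of(7, 2)     # -> (1, 3)

# n consecutive addresses hit n distinct modules in an n-way system,
# so an n-way interleaved memory can supply n operands at once
modules = [module_of(a, 4)[0] for a in range(100, 104)]
```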


    Array processors

    An array processor is a processor that performs computations on large
    arrays of data.

    Attached array processor:

    An auxiliary processor attached to a general-purpose computer.

    SIMD array processor:

    A processor with a single-instruction, multiple-data organization. It
    manipulates vector instructions by means of multiple functional units
    responding to a common instruction.


    Attached array processor

    Enhances the performance of a computer by providing vector processing
    for complex scientific applications.

    [Figure: a general-purpose computer (with main memory) connected through
    an input-output interface to the attached array processor (with local
    memory) over a high-speed memory-to-memory bus]


    SIMD Array processors

    SIMD array processor organization

    [Figure: a master control unit and main memory drive processing elements
    PE1..PEn, each paired with a local memory M1..Mn]


    A SIMD array processor is a computer with multiple processing units
    operating in parallel.

    It consists of a set of identical processing elements, each having a
    local memory.

    Each processing element includes an ALU, a floating-point arithmetic
    unit, and working registers.

    The master control unit controls the operations in the processing
    elements.

    Main memory is used for storage of the program.

    The function of the master control unit is to decode the instructions
    and determine how they are to be executed.

    Vector instructions are broadcast to all PEs simultaneously.

    Vector operands are distributed to the local memories prior to parallel
    execution of the instruction.
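A toy model of this broadcast: one instruction is applied by every PE to the operands in its own local memory. The dictionary-based local memories are an invented representation for illustration:

```python
def simd_broadcast(instruction, local_memories):
    """Master control unit broadcasts one instruction; every PE applies
    it to its own local operands and produces its own result."""
    return [instruction(mem["A"], mem["B"]) for mem in local_memories]

# distribute the vector operands to the PEs' local memories...
pes = [{"A": 1, "B": 10}, {"A": 2, "B": 20}, {"A": 3, "B": 30}]
# ...then broadcast a single ADD instruction to all PEs at once
sums = simd_broadcast(lambda a, b: a + b, pes)
```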
