
COMP 212 Computer Organization & Architecture: Pipeline Re-Cap · comp212/lec2008/lec-12-risc... · 2008-11-26 · Z. Li, 2008



Comp 212 Computer Org & Arch, Z. Li, 2008

COMP 212 Computer Organization & Architecture

COMP 212 Fall 2008

Lecture 12

RISC & Superscalar


Pipeline Re-Cap

• Pipelining is a form of ILP (Instruction-Level Parallelism)

– Divide the instruction cycle into stages and overlap their execution

– Could potentially achieve a k-times speedup with a k-stage pipeline

• Pipeline hazards:

– Structural: two micro-ops require the same circuit in the same cycle

– Control: the branch target PC is not known until execution

– Data: an instruction reads the output of a previous instruction that has not been written yet


Instruction Micro-Operations

• A 6-stage pipeline

– Execution takes longer than fetch

– Break up execution into sub-cycles, i.e., DI, CO, FO, EI, WO

– Allow overlapping, or prefetch the next instruction

– Branch: may have to re-fetch the correct instruction


Instruction Pipeline – no hazard

Speedup: 9 instructions × 6 stages = 54 time slots (no pipeline) vs. 14 time slots (pipelined).


Conditional branching

• The correct PC address is runtime dependent

Branch


Alternative Pipeline View

Flush out I6-I3

Found that the correct PC should be I15


Speedup – perfect case

• k-stage pipeline, n instructions, execution time speed up:
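
A minimal sketch of the perfect-case arithmetic, assuming every stage takes one time slot and there are no hazards or stalls. Under those assumptions n instructions need n·k slots without a pipeline and k + (n − 1) slots with one, which reproduces the 54 vs. 14 figure from the no-hazard example. The function names are illustrative only.

```python
def unpipelined_slots(k: int, n: int) -> int:
    """Time slots for n instructions when each runs all k stages alone."""
    return n * k


def pipelined_slots(k: int, n: int) -> int:
    """Time slots with a k-stage pipeline and no stalls.

    The first instruction needs k slots; each later one finishes
    one slot after its predecessor.
    """
    return k + (n - 1)


def speedup(k: int, n: int) -> float:
    """Ideal speedup = n*k / (k + n - 1); approaches k as n grows."""
    return unpipelined_slots(k, n) / pipelined_slots(k, n)


# The 6-stage, 9-instruction example from the slides.
print(unpipelined_slots(6, 9), pipelined_slots(6, 9))  # 54 14
print(speedup(6, 9))
```

For long instruction streams the ratio tends toward k, which is where the "k-times speedup for a k-stage pipeline" claim comes from.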


Dealing with Branches

• Pipeline efficiency depends on a steady stream of instructions that keeps the pipeline full

• Conditional branching is a major drawback for efficiency

• Can be dealt with by:

– Multiple Streams

– Prefetch Branch Target

– Loop buffer

– Branch prediction

– Delayed branching


Branch Prediction – Static Solutions

• Predict never taken

– Assume that jump will not happen

– Always fetch next instruction

– 68020 & VAX 11/780

• Predict always taken

– Assume that jump will happen

– Always fetch target instruction

• Predict by opcode

– Based on statistics collected on how often each branch opcode is taken

– Prediction accuracy > 75%


Branch Prediction – Dynamic, Runtime Based

• Taken/Not taken switch

– Use 1 or 2 bits to record taken/not taken history

– Good for loops

• Branch history table

– Based on previous history

– Good for loops
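
The 2-bit taken/not-taken switch can be sketched as a saturating counter. This is a common textbook variant and is an assumption here: the state diagram used in the lecture may wire its transitions slightly differently. The class and function names are made up for the example.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor) -> int:
    """Run the predictor over a sequence of branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through; the loop runs twice.
loop_branch = ([True] * 9 + [False]) * 2
print(count_mispredictions(loop_branch, TwoBitPredictor()))  # 4
```

After the two warm-up misses, the only miss per loop run is the final exit, and a single exit does not flip the prediction for the next run. That is why a 2-bit history is described as good for loops.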


Branch Prediction State Diagram


RISC

Reduced Instruction Set Computer


Motivation of RISC

• Improve Pipeline efficiency

– Fixed instruction format and small number of instructions:

» Make the operations more predictable and manageable

– Large register files

» avoid data dependency and hazard

– Both compile time and run time pipeline optimization,

» register renaming, out of order execution.


A little bit of history….

• The computer family concept

– IBM System/360 1964, DEC PDP-8

– Separates architecture from implementation

• Microprogrammed control unit

– Idea by Wilkes 1951, produced by IBM S/360 1964

– Flexibility and extensibility in CPU control implementation.

• Cache memory

– IBM S/360 model 85 1969


A bit of history….

• Solid State RAM

– (See memory notes)

• Microprocessors

– Intel 4004 1971

• Pipelining

– Introduces parallelism into fetch execute cycle

• Multiple processors


The Next Step - RISC

• Reduced Instruction Set Computer

• Key features

– Large number of general purpose registers

– or use of compiler technology to optimize register use

– Limited and simple instruction set

– Emphasis on optimising the instruction pipeline


Instruction Characteristics

• Operations Performed

– Functions to be performed, how it interacts with memory

• Operands Used

– Types of operands

– Memory organization and addressing modes

• Executing Sequence

– Control and pipeline operations


Operations

• Assignments

– Movement of data

• Conditional statements (IF, THEN, FOR, WHILE)

– Sequence control

• Procedure call-return is very time consuming

• Some HLL statements lead to many machine-code operations


Operation Statistics

• In High Level Languages (HLLs) like C and Pascal, assignment is the dominant operation

• Number of machine instructions / memory references:


Operands

• Mainly local scalar variables

• Optimisation should concentrate on accessing local variables

                    Pascal    C      Average
Integer Constant    16%       23%    20%
Scalar Variable     58%       53%    55%
Array/Structure     26%       24%    25%


Procedure Calls

• Time consuming,

– Depends on number of parameters passed

– Depends on level of nesting

• Most programs do not do a lot of calls followed by lots of returns

• Most variables are local


Implications

• Best support is given by optimising the most used and most time-consuming features

• Large number of registers

– Operand referencing

• Careful design of pipelines

– Branch prediction etc.

• Simplified (reduced) instruction set


Large Register File

• Software solution

– Require compiler to allocate registers

– Allocate based on most used variables in a given time

– Requires sophisticated program analysis

• Hardware solution

– Have more registers

– Thus more variables will be in registers


Why CISC ?

• Software costs far exceed hardware costs

• Increasingly complex high level languages (HLL)

• Semantic gap: machine instruction vs HLL instruction

• Leads to:

– Large instruction sets

– More addressing modes

– Hardware implementations of HLL (high level language) statements

» e.g. CASE (switch) on VAX


Intention of CISC

• Ease compiler writing

• Improve execution efficiency

– Complex operations in microcode/micro-ops

• Support more complex HLLs

• However, CISC instructions are complex, hard to predict

and optimize.


Variable access localization


Registers for Local Variables

• Register is the fastest storage

– Better than cache and memory

• Keeping data in registers as much as possible is good for performance

– Software approach: the compiler decides at compile time which variables are assigned to registers

– Hardware approach: register windows


Register Windows

• Most operand references are to a few local variables in the function, along with a couple of globals

• Function calls change the set of local variables

• Function calls also involve passing parameters

• So, instead of using the stack to save local variables and pass parameters, partition the register file into sets (windows),

• and select a different window according to program execution.


Register Windows cont.

• Three areas within a register set

– Parameter registers

– Local registers

– Temporary registers

• Examples:

– Berkeley RISC uses 8 windows of 16 registers each


Overlapping Register Windows

– Temporary registers from one set overlap the parameter registers of the next

– This allows parameter passing without moving data


Circular Buffer diagram

• Managing register windows

– When a call is made, a current window pointer is moved to show the currently active register window

– If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory

– A saved window pointer identifies the window most recently saved to memory, i.e., where the next restore should go
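
The window management described above can be sketched as a small bookkeeping model. It is a simplification under stated assumptions: it tracks only how many windows are resident and how many have been spilled, keeps one window free because adjacent windows overlap (so N windows hold N-1 activations), and does not model register contents. The class and attribute names are hypothetical.

```python
class RegisterWindows:
    """Bookkeeping model of a circular buffer of register windows."""

    def __init__(self, n_windows: int = 8):
        self.n = n_windows
        self.cwp = 0        # current window pointer
        self.resident = 1   # activations currently held in registers
        self.spilled = 0    # activations saved to memory

    def call(self) -> None:
        # Overflow: all usable windows are taken, so save the oldest one.
        if self.resident == self.n - 1:
            self.spilled += 1
            self.resident -= 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self) -> None:
        # Underflow: the caller's window was spilled, so restore it first.
        if self.resident == 1:
            if self.spilled == 0:
                raise RuntimeError("return without a matching call")
            self.spilled -= 1
            self.resident += 1
        self.cwp = (self.cwp - 1) % self.n
        self.resident -= 1


w = RegisterWindows(8)
for _ in range(10):   # nest 10 calls deep
    w.call()
print(w.cwp, w.spilled)  # 2 4
```

With 8 windows, only nesting deeper than 6 calls below the starting procedure causes traffic to memory, which matches the observation that most programs do not make long runs of calls followed by long runs of returns.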


Global Variables

• Allocated by the compiler to memory

– Inefficient for frequently accessed variables

• Have a set of registers for global variables

– E.g., reserve R0-R7 for storing globals.


Referencing variable in windowed register


Referencing variable in cache


Registers v Cache

• Windowed registers:

– hold all local variables of the N-1 most recent procedure calls; faster access; handle globals well

• Cache:

– holds a selection of recently used variables; makes more efficient use of its storage


Compiler Based Register Optimization

• Assume a small number of registers (16-32)

• Optimizing register use is up to the compiler

• HLL programs have no explicit references to registers

• Assign a symbolic or virtual register to each candidate variable

• Map the (unlimited) symbolic registers to real registers

• Symbolic registers whose use does not overlap can share a real register

• If you run out of real registers, some variables are kept in memory


How to assign variables to registers ?

• Graph coloring algorithm:

– Build the register interference graph: each node is a variable (symbolic register)

– If two variables are alive at the same time, i.e., they interfere with each other, draw an edge between them

– Try to find the smallest number of colors for all nodes such that nodes that interfere with each other do not have the same color

– Each color is assigned to a different register
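
A minimal sketch of the idea, under assumptions that are not from the slides: each variable has a single live range given as a half-open interval, and colors are picked greedily in order of decreasing degree. Greedy coloring is a heuristic and does not guarantee the smallest number of colors. Variables that cannot be colored are reported as spilled to memory. All names are illustrative.

```python
def build_interference(live_ranges):
    """live_ranges maps a variable to a half-open (start, end) interval.

    Two variables interfere (get an edge) when their intervals overlap.
    """
    names = list(live_ranges)
    graph = {v: set() for v in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 < e2 and s2 < e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(graph, n_registers):
    """Greedy coloring: returns (variable -> register index, spilled variables)."""
    assignment, spilled = {}, []
    # Heuristic: handle the most constrained (highest-degree) nodes first.
    for v in sorted(graph, key=lambda v: -len(graph[v])):
        used = {assignment[u] for u in graph[v] if u in assignment}
        free = [r for r in range(n_registers) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)  # no register left: keep this variable in memory
    return assignment, spilled


live = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (4, 7), "e": (6, 8)}
graph = build_interference(live)
print(color(graph, 3))
print(color(graph, 2))
```

In this example a, b and c are all alive at once, so three registers are enough for all five variables, while two registers force one of them into memory.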


RISC Pipelining

• Most instructions are register to register

• Two phases of execution

– I: Instruction fetch

– E: Execute

» ALU operation with register input and output

• For load and store

– I: Instruction fetch

– E: Execute

» Calculate memory address

– D: Memory

» Register to memory or memory to register operation


Effects of Pipelining

[Figure: pipeline timing diagrams: 13 cycles; 10 cycles with 1 memory port; 8 cycles with 2 memory ports]


Optimization of Pipelining

• Out of order execution

– Insert NOOPs so that no circuitry is needed to clear the pipeline

– Out of order execution:


CISC vs RISC: a summary


Comparison of CISC/RISC processors


CISC vs RISC

• Compiler simplification?

– Complex machine instructions harder to exploit

– Optimization more difficult

• Smaller programs?

– Program takes up less memory but…

– Memory is now cheap

– May not occupy fewer bits, just look shorter in symbolic form

» More instructions require longer op-codes

» Register references require fewer bits


CISC vs RISC

• Faster programs?

– Bias towards use of simpler instructions

– More complex control unit

– Microprogram control store larger

– thus simple instructions take longer to execute

• It is far from clear that CISC is the appropriate solution


RISC Characteristics

• Simple instructions

– One instruction per cycle

– Register to register operations

– Few, simple addressing modes

– Few, simple instruction formats

– Hardwired design (no microcode)

– Fixed instruction format

• More compile time optimization effort

– Register renaming

– Out of order execution


No conclusive comparison

• Quantitative

– Compare program sizes and execution speeds

• Qualitative

– Examine issues of high level language support and use of VLSI real estate

• Problems

– No pair of RISC and CISC machines that are directly comparable

– No definitive set of test programs

– Difficult to separate hardware effects from compiler effects

– Most comparisons done on “toy” rather than production machines

– Most commercial devices are a mixture


Superscalar Architecture


What is Superscalar?

• Scalar computer: handles one instruction and one data item at a time

• Vector computer: handles multiple data items at a time

• Superscalar Computer:

– Multiple independent pipelines (2 int, 2 fp, 1 mem) are implemented

– Each pipeline has stages which can also handle multiple instructions


Superscalar vs Superpipeline

• Superpipeline:

– Many pipeline stages need less than half a clock cycle

– Doubling the internal clock speed gets two tasks done per external clock cycle

• Superscalar allows parallel fetch and execute


Limitations

• Instruction level parallelism

• Compiler based optimisation

• Hardware techniques

• Limited by

– True data dependency

– Procedural dependency

– Resource conflicts

– Output dependency

– Antidependency


True Data Dependency

• ADD r1, r2 (r1 := r1+r2;)

• MOVE r3,r1 (r3 := r1;)

• Can fetch and decode the second instruction in parallel with the first

• Cannot execute the second instruction until the first is finished


Procedural Dependency

• Cannot execute instructions after a branch in parallel with instructions before the branch

• Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed

• This prevents simultaneous fetches


Resource Conflict

• Two or more instructions require access to the same resource at the same time

– e.g. two arithmetic instructions

• Solution: Can duplicate resources

– e.g. have two arithmetic units


Effect of Dependency

• Illustration of

– Data dependency

– Procedural (branch) dependency

– Resource dependency


Design Issues

• Instruction level parallelism

– Instructions in a sequence are independent

– Execution can be overlapped

– Governed by data and procedural dependency

• Machine Parallelism

– Ability to take advantage of instruction level parallelism

– Governed by number of parallel pipelines


Instruction Issue Policy

• Order in which instructions are fetched

• Order in which instructions are executed

• Order in which instructions change registers and memory


In-order issue & execute

• Issue instructions in the order they occur

• Not very efficient

• May fetch more than one instruction at a time

• Instructions must stall if necessary (example: 2 fetch, 3 execute, 2 memory ports)


In-order issue, out-of-order execute

• Output dependency

– R3 := R3 + R5; (I1)

– R4 := R3 + 1; (I2)

– R3 := R5 + 1; (I3)

– I2 depends on the result of I1: true data dependency

– If I3 completes before I1, R3 is left holding I1's result instead of I3's: output (write-write) dependency


Out-of-order issue and execute

• Decouple the decode pipeline from the execution pipeline

• Can continue to fetch and decode until the instruction window is full

• When a functional unit becomes available, an instruction can be executed

• Since instructions have already been decoded, the processor can look ahead


Antidependency

• Read-write (write-after-read) dependency

– R3:=R3 + R5; (I1)

– R4:=R3 + 1; (I2)

– R3:=R5 + 1; (I3)

– R7:=R3 + R4; (I4)

– I3 cannot complete before I2 starts, as I2 needs the value in R3 and I3 changes R3
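
The three kinds of register dependency in the I1-I4 example can be found mechanically. The sketch below is an illustration, not how any particular processor does it: it assumes each instruction has one destination register and lists only its register sources (constants are omitted), and it labels the results RAW (true data), WAR (anti) and WAW (output). The `Instr` type and `dependencies` function are made-up names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: tuple


def dependencies(program):
    """Return (earlier, later, kind) triples in program order."""
    last_writer = {}   # register -> name of the latest instruction writing it
    readers = {}       # register -> instructions that read it since that write
    deps = []
    for ins in program:
        for r in ins.srcs:
            if r in last_writer:                    # read after write: true dependency
                deps.append((last_writer[r], ins.name, "RAW"))
        for reader in readers.get(ins.dest, []):    # write after read: antidependency
            deps.append((reader, ins.name, "WAR"))
        if ins.dest in last_writer:                 # write after write: output dependency
            deps.append((last_writer[ins.dest], ins.name, "WAW"))
        for r in ins.srcs:
            readers.setdefault(r, []).append(ins.name)
        last_writer[ins.dest] = ins.name
        readers[ins.dest] = []                      # the new value has no readers yet
    return deps


program = [
    Instr("I1", "R3", ("R3", "R5")),   # R3 := R3 + R5
    Instr("I2", "R4", ("R3",)),        # R4 := R3 + 1
    Instr("I3", "R3", ("R5",)),        # R3 := R5 + 1
    Instr("I4", "R7", ("R3", "R4")),   # R7 := R3 + R4
]
for dep in dependencies(program):
    print(dep)
```

Only the RAW entries are real flows of data; the WAR and WAW entries exist because R3 is reused as a name, which is what register renaming removes.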


Register Renaming

• Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program

• May result in a pipeline stall

• Registers allocated dynamically

– i.e. registers are not specifically named


Register Renaming example

• R3b:=R3a + R5a (I1)

• R4b:=R3b + 1 (I2)

• R3c:=R5a + 1 (I3)

• R7b:=R3c + R4b (I4)

• A name without a subscript refers to the logical register in the instruction

• A name with a subscript refers to the hardware register allocated to it

• Note that R3a, R3b and R3c are three different hardware registers
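
A minimal sketch of renaming applied to the same four instructions. It assumes an unlimited-enough pool of physical registers named P0, P1, ... and never frees one, which real hardware must do; the `rename` function and its naming scheme are hypothetical. Every write gets a fresh physical register, so P0, P2 and P4 below play the roles of R3a, R3b and R3c.

```python
def rename(program, n_physical=16):
    """program: list of (dest, srcs) pairs using logical register names.

    Returns the same instructions rewritten with physical register names.
    """
    mapping = {}      # logical register -> physical register holding its current value
    next_free = 0

    def fresh():
        nonlocal next_free
        if next_free == n_physical:
            raise RuntimeError("out of physical registers")
        name = f"P{next_free}"
        next_free += 1
        return name

    renamed = []
    for dest, srcs in program:
        phys_srcs = []
        for s in srcs:
            if s not in mapping:
                mapping[s] = fresh()     # value that was live before the sequence
            phys_srcs.append(mapping[s])
        mapping[dest] = fresh()          # every write gets a new physical register
        renamed.append((mapping[dest], tuple(phys_srcs)))
    return renamed


logical = [
    ("R3", ("R3", "R5")),   # I1
    ("R4", ("R3",)),        # I2
    ("R3", ("R5",)),        # I3
    ("R7", ("R3", "R4")),   # I4
]
for line in rename(logical):
    print(line)
```

After renaming, I3 writes P4 while I1 writes P2 and I2 reads P2, so the output dependency and antidependency on R3 are gone and only the true dependencies remain.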


Machine Parallelism

• Duplication of Resources

• Out of order issue

• Renaming

• Not worth duplicating functional units without register renaming

• Need instruction window large enough (more than 8)


Performances


Branch Prediction

• 80486 fetches both the next sequential instruction after a branch and the branch target instruction

• Gives two cycle delay if branch taken


RISC - Delayed Branch

• Calculate the result of the branch before unusable instructions are pre-fetched

• Always execute the single instruction immediately following the branch

• Keeps pipeline full while fetching new instruction stream

• Not as good for superscalar

– Multiple instructions need to execute in delay slot

– Instruction dependence problems

• Revert to branch prediction


Superscalar Implementation

• Simultaneously fetch multiple instructions

• Logic to determine true dependencies involving register values

• Mechanisms to communicate these values

• Mechanisms to initiate multiple instructions in parallel

• Resources for parallel execution of multiple instructions

• Mechanisms for committing process state in correct order


PowerPC

• Direct descendant of IBM 801, RT PC and RS/6000

• All are RISC

• RS/6000 first superscalar

• PowerPC 601 superscalar design similar to RS/6000

• Later versions extend superscalar concept


PowerPC 601 General View


PowerPC 601 Pipeline


Summary

• RISC

– A design that simplifies elementary instruction processing

– Allows for later optimization and efficiency improvements by the compiler and run-time circuits

– Mainstream solution now: MIPS, SPARC, PowerPC, etc.

• Superscalar

– Multiple fetch, execution and memory port units

– Additional dimension to achieve ILP

– Brings more complex issues for the consistency and correctness of execution.