Comp 212 Computer Org & Arch, Z. Li, 2008
COMP 212 Computer Organization & Architecture
COMP 212 Fall 2008
Lecture 12
RISC & Superscalar
Pipeline Re-Cap
• Pipelining is a form of ILP (Instruction-Level Parallelism)
– Divide the instruction cycle into stages and overlap their execution
– Can potentially achieve a k-fold speedup with a k-stage pipeline
• Pipeline Hazards:
– Structural: two micro-ops require the same circuit in the same cycle
– Control: the branch target PC is not known until execution
– Data: an instruction reads the output of the previous instruction before it is written
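As an illustrative sketch (not from the slides), the data-hazard case can be detected mechanically by checking whether an instruction reads a register written by the instruction just before it; the tuple encoding and register names here are hypothetical:

```python
def raw_hazards(instrs):
    """instrs: list of (dest, srcs) pairs of register names.
    Flag a read-after-write (RAW) hazard whenever an instruction
    reads a register written by the immediately preceding one."""
    hazards = []
    for i in range(1, len(instrs)):
        prev_dest = instrs[i - 1][0]
        if prev_dest in instrs[i][1]:
            hazards.append(i)
    return hazards

# I1 writes r1; I2 reads r1 -> hazard at index 1
prog = [("r1", ("r2", "r3")),   # ADD r1, r2, r3
        ("r4", ("r1", "r5"))]   # ADD r4, r1, r5
print(raw_hazards(prog))        # [1]
```

A real pipeline resolves such hazards by stalling or forwarding; this only shows where they arise.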
Instruction Micro-Operations
• A 6-stage pipeline
– Execution takes longer than fetch
– Break execution into sub-cycles, i.e., DI, CO, FO, EI, WO
– Allows overlapping, or prefetching, of instructions
– Branch: may have to re-fetch the correct instruction
Instruction Pipeline – no hazard
Speedup: 9 instructions × 6 stages = 54 time slots without pipelining vs. 14 time slots pipelined.
Conditional branching
• The correct PC address is runtime dependent
Branch
Alternative Pipeline View
Flush out I3–I6 once the correct PC is found to be I15.
Speedup – perfect case
• For a k-stage pipeline executing n instructions, the ideal speedup is
  Sk = nk / (k + (n − 1))
  (e.g., n = 9, k = 6 gives 54/14 ≈ 3.9)
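As a sketch of the arithmetic, assuming the standard ideal-pipeline model (one stage per cycle, no hazards):

```python
def sequential_cycles(n, k):
    """Time units for n instructions without pipelining."""
    return n * k

def pipeline_cycles(n, k):
    """Time units for n instructions on a k-stage pipeline, no hazards:
    the first instruction takes k cycles, then one retires per cycle."""
    return k + (n - 1)

n, k = 9, 6
print(sequential_cycles(n, k))   # 54
print(pipeline_cycles(n, k))     # 14
print(sequential_cycles(n, k) / pipeline_cycles(n, k))  # ~3.86x speedup
```

For large n the ratio approaches k, the k-fold speedup mentioned earlier.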
Dealing with Branches
• Pipeline efficiency depends on a steady stream of
instructions that fills up the pipeline
• Conditional branching is a major drawback for efficiency
• Can be dealt with by:
– Multiple Streams
– Prefetch Branch Target
– Loop buffer
– Branch prediction
– Delayed branching
Branch Prediction – Static Solutions
• Predict never taken
– Assume that jump will not happen
– Always fetch next instruction
– 68020 & VAX 11/780
• Predict always taken
– Assume that jump will happen
– Always fetch target instruction
• Predict by opcode
– Based on statistics collected on different opcodes w.r.t. branching
– Correct rate > 75%
Branch Prediction – Dynamic, Runtime Based
• Taken/Not taken switch
– Use 1 or 2 bits to record taken/not taken history
– Good for loops
• Branch history table
– Based on previous history
– Good for loops
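A minimal sketch of the taken/not-taken switch described above, assuming a 2-bit saturating counter (the class name and initial state are illustrative):

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. One misprediction does not flip a
    strongly established prediction -- good for loops."""
    def __init__(self, state=3):              # start strongly taken
        self.state = state
    def predict(self):
        return self.state >= 2
    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 9 times then falling through mispredicts only once,
# at the loop exit -- and still predicts taken on the next visit.
p = TwoBitPredictor()
mispredicts = 0
for taken in [True] * 9 + [False]:
    if p.predict() != taken:
        mispredicts += 1
    p.update(taken)
print(mispredicts)   # 1
print(p.predict())   # True: a single not-taken did not flip the prediction
```

A branch history table simply keeps one such counter per (hashed) branch address.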
Branch Prediction State Diagram
RISC
Reduced Instruction Set Computer
Motivation of RISC
• Improve Pipeline efficiency
– Fixed instruction format and small number of instructions:
» Make the operations more predictable and manageable
– Large register files
» avoid data dependency and hazard
– Both compile time and run time pipeline optimization,
» register renaming, out of order execution.
A little bit of history….
• The computer family concept
– IBM System/360 1964, DEC PDP-8
– Separates architecture from implementation
• Microprogrammed control unit
– Proposed by Wilkes in 1951, first produced in the IBM S/360 in 1964
– Flexibility and extensibility in CPU control implementation
• Cache memory
– IBM S/360 model 85 1969
A bit of history….
• Solid State RAM
– (See memory notes)
• Microprocessors
– Intel 4004 1971
• Pipelining
– Introduces parallelism into the fetch-execute cycle
• Multiple processors
The Next Step - RISC
• Reduced Instruction Set Computer
• Key features
– Large number of general-purpose registers, or use of compiler technology to optimize register use
– Limited and simple instruction set
– Emphasis on optimising the instruction pipeline
Instruction Characteristics
• Operations Performed
– Functions to be performed, how it interacts with memory
• Operands Used
– Types of operands
– Memory organization and addressing modes
• Execution Sequence
– Control and pipeline operations
Operations
• Assignments
– Movement of data
• Conditional statements (IF/THEN, FOR, WHILE)
– Sequence control
• Procedure call/return is very time consuming
• Some HLL statements lead to many machine-code operations
Operation Statistics
• In High Level Language (HLL) like C/Pascal, assignment is
the dominating operation
• Number of machine instructions/memory references:
Operands
• Mainly local scalar variables
• Optimisation should concentrate on accessing local
variables
Operand type       Pascal   C      Average
Integer constant   16%      23%    20%
Scalar variable    58%      53%    55%
Array/Structure    26%      24%    25%
Procedure Calls
• Time consuming,
– Depends on number of parameters passed
– Depends on level of nesting
• Most programs do not do a lot of calls followed by lots of returns
• Most variables are local
Implications
• Best support is given by optimising most used and most
time consuming features
• Large number of registers
– Operand referencing
• Careful design of pipelines
– Branch prediction etc.
• Simplified (reduced) instruction set
Large Register File
• Software solution
– Require compiler to allocate registers
– Allocate based on most used variables in a given time
– Requires sophisticated program analysis
• Hardware solution
– Have more registers
– Thus more variables will be in registers
Why CISC ?
• Software costs far exceed hardware costs
• Increasingly complex high level languages (HLL)
• Semantic gap: machine instruction vs HLL instruction
• Leads to:
– Large instruction sets
– More addressing modes
– Hardware implementations of HLL (high level language) statements
» e.g. CASE (switch) on VAX
Intention of CISC
• Ease compiler writing
• Improve execution efficiency
– Complex operations in microcode/micro-ops
• Support more complex HLLs
• However, CISC instructions are complex, hard to predict
and optimize.
Variable access localization
Registers for Local Variables
• Register is the fastest storage
– Better than cache and memory
• Keeping data assignments in registers is good for performance
– Software approach: the compiler figures out variable-to-register assignment at compile time
– Hardware approach: register windows
Register Windows
• Most operands reference a few local variables in the function, along with a couple of globals
• Function calls change the local variable set
• Function calls also involve parameters to be passed
• So, instead of using the stack to save local variables and pass parameters, partition the register file into sets (windows),
• and select a different window to access according to program execution
Register Windows cont.
• Three areas within a register set
– Parameter registers
– Local registers
– Temporary registers
• Examples:
– Berkeley RISC uses 8 windows of 16 registers each
Overlapping Register Windows
– Temporary registers from one set overlap parameter registers from
the next
– This allows parameter passing without moving data
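A minimal sketch of how overlapping windows map onto a circular physical register file; the window sizes below are assumptions for illustration, not the actual Berkeley RISC layout:

```python
# Hypothetical sizing: 8 windows, 4 overlapping registers on each side,
# 8 private locals per window.
N_WINDOWS, OVERLAP, LOCALS = 8, 4, 8
STRIDE = OVERLAP + LOCALS          # distance between window bases
PHYS = N_WINDOWS * STRIDE          # size of the circular physical file

def phys_reg(window, logical):
    """Map logical register `logical` (0 .. 2*OVERLAP+LOCALS-1) of
    `window` onto the circular physical register file.
    Layout per window: [params | locals | temporaries]."""
    assert 0 <= logical < LOCALS + 2 * OVERLAP
    return (window * STRIDE + logical) % PHYS

# The caller's temporaries are physically the callee's parameters,
# so parameters pass without any data movement:
for i in range(OVERLAP):
    assert phys_reg(0, OVERLAP + LOCALS + i) == phys_reg(1, i)
```

The modulo makes the file circular: the last window's temporaries wrap around and overlap the first window's parameters, which is why deep call nesting eventually forces the oldest window to spill to memory.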
Circular Buffer diagram
• Managing register window
– When a call is made, a current window
pointer is moved to show the
currently active register window
– If all windows are in use, an interrupt
is generated and the oldest window
(the one furthest back in the call
nesting) is saved to memory
– A saved window pointer indicates which window was most recently saved to memory, so it can be restored on return
Global Variables
• Allocated by the compiler to memory
– Inefficient for frequently accessed variables
• Have a set of registers for global variables
– E.g., require R0–R7 to be used for storing globals
Referencing variable in windowed register
Referencing variable in cache
Registers v Cache
• Windowed registers:
– store all local variables of the N−1 most recent procedure calls; faster; handle globals well
• Cache:
– stores a selection of recently used variables; makes more efficient use of memory
Compiler Based Register Optimization
• Assume a small number of registers (16-32)
• Optimizing is up to the compiler
• HLL programs have no explicit references to registers
• Assign a symbolic or virtual register to each candidate variable
• Map the (unlimited) symbolic registers to real registers
• Symbolic registers that do not overlap can share a real register
• If you run out of real registers, some variables use memory
How to assign variables to registers ?
• Graph Coloring Algorithm:
– Build a register interference graph
– If two variables are live at the same time, they interfere: draw an edge between them
– Find the smallest number of colors such that nodes that interfere with each other do not share a color
– Each color is assigned a different register
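A hedged sketch of the idea using simple greedy coloring; production compilers use more elaborate schemes (e.g., Chaitin-style simplification), but the interference constraint is the same:

```python
def color_registers(interference, k):
    """Greedy graph coloring over the interference graph: variables
    connected by an edge (live at the same time) never share a register.
    Returns (register assignment, list of variables spilled to memory)."""
    colors, spilled = {}, []
    # Color high-degree (most constrained) nodes first.
    for v in sorted(interference, key=lambda v: len(interference[v]),
                    reverse=True):
        used = {colors[u] for u in interference[v] if u in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)   # no register left: variable lives in memory
    return colors, spilled

# a, b, c are pairwise live together; d interferes with nothing.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
colors, spilled = color_registers(graph, 2)   # only 2 registers available
print(colors, spilled)   # one of a/b/c must spill; d reuses a register
```

The triangle a-b-c needs three colors, so with two registers one variable spills, exactly the "run out of real registers" case from the previous slide.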
RISC Pipelining
• Most instructions are register to register
• Two phases of execution
– I: Instruction fetch
– E: Execute
» ALU operation with register input and output
• For load and store
– I: Instruction fetch
– E: Execute
» Calculate memory address
– D: Memory
» Register to memory or memory to register operation
Effects of Pipelining
Figure: 13 cycles (sequential) vs. 10 cycles (pipelined, 1 memory port) vs. 8 cycles (pipelined, 2 memory ports)
Optimization of Pipelining
• Out-of-order execution
– Insert NOOPs to avoid having circuits clear the pipeline
– Reorder instructions (out-of-order execution) to fill those slots with useful work:
CISC vs RISC: a summary
Comparison of CISC/RISC processors
CISC vs RISC
• Compiler simplification?
– Complex machine instructions harder to exploit
– Optimization more difficult
• Smaller programs?
– Program takes up less memory but…
– Memory is now cheap
– May not occupy fewer bits, just looks shorter in symbolic form
» More instructions require longer op-codes
» Register references require fewer bits
CISC vs RISC
• Faster programs?
– Bias towards use of simpler instructions
– More complex control unit
– Microprogram control store larger
– thus simple instructions take longer to execute
• It is far from clear that CISC is the appropriate solution
RISC Characteristics
• Simple instructions
– One instruction per cycle
– Register to register operations
– Few, simple addressing modes
– Few, simple instruction formats
– Hardwired design (no microcode)
– Fixed instruction format
• More compile time optimization effort
– Register renaming
– Out of order execution
No conclusive comparison
• Quantitative
– compare program sizes and execution speeds
• Qualitative
– examine issues of high-level language support and use of VLSI real estate
• Problems
– No pair of RISC and CISC machines is directly comparable
– No definitive set of test programs
– Difficult to separate hardware effects from compiler effects
– Most comparisons done on “toy” rather than production machines
– Most commercial devices are a mixture
Superscalar Architecture
What is Superscalar?
• Scalar computer: handles one instruction on one data item at a time
• Vector computer: handles multiple data items at a time
• Superscalar Computer:
– Multiple independent pipelines (2 int, 2 fp, 1 mem) are implemented
– Each pipeline has stages which can also handle multiple instructions
Superscalar vs Superpipeline
• Superpipeline:
– Many pipeline stages
need less than half a
clock cycle
– Double internal clock
speed gets two tasks per
external clock cycle
• Superscalar allows
parallel fetch execute
Limitations
• Instruction level parallelism
• Compiler based optimisation
• Hardware techniques
• Limited by
– True data dependency
– Procedural dependency
– Resource conflicts
– Output dependency
– Antidependency
True Data Dependency
• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3,r1 (r3 := r1;)
• Can fetch and decode second instruction in parallel with
first
• Can NOT execute second instruction until first is finished
Procedural Dependency
• Can not execute instructions after a branch in parallel
with instructions before a branch
• Also, if instruction length is not fixed, instructions have
to be decoded to find out how many fetches are needed
• This prevents simultaneous fetches
Resource Conflict
• Two or more instructions requiring access to the same
resource at the same time
– e.g. two arithmetic instructions
• Solution: Can duplicate resources
– e.g. have two arithmetic units
Effect of Dependency
• Illustration of
– Data dependency
– Procedural (branch) dependency
– Resource dependency
Design Issues
• Instruction level parallelism
– Instructions in a sequence are independent
– Execution can be overlapped
– Governed by data and procedural dependency
• Machine Parallelism
– Ability to take advantage of instruction level parallelism
– Governed by number of parallel pipelines
Instruction Issue Policy
• Order in which instructions are fetched
• Order in which instructions are executed
• Order in which instructions change registers and memory
In-order issue & exec
• Issue instructions in the order they occur
• Not very efficient
• May fetch >1 instruction
• Instructions must stall if necessary (2 fetch, 3 exe, 2 mem ports)
In order issue, out-of-order execute
• Output dependency
– R3 := R3 + R5; (I1)
– R4 := R3 + 1; (I2)
– R3 := R5 + 1; (I3)
– I2 depends on the result of I1 - data dependency
– If I3 completes before I1, the result left in R3 will be wrong - output (write-write) dependency
Out-of-order issue and execute
• Decouple the decode pipeline from the execution pipeline
• Can continue to fetch and decode until the instruction window is full
• When a functional unit becomes available, an instruction can be executed
• Since instructions have already been decoded, the processor can look ahead
Antidependency
• Read-write (write-after-read) dependency
– R3:=R3 + R5; (I1)
– R4:=R3 + 1; (I2)
– R3:=R5 + 1; (I3)
– R7:=R3 + R4; (I4)
– I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3
Register Renaming
• Output and antidependencies occur because register
contents may not reflect the correct ordering from the
program
• May result in a pipeline stall
• Registers allocated dynamically
– i.e. registers are not specifically named
Register Renaming example
• R3b:=R3a + R5a (I1)
• R4b:=R3b + 1 (I2)
• R3c:=R5a + 1 (I3)
• R7b:=R3c + R4b (I4)
• A register without a subscript refers to the logical register named in the instruction
• A register with a subscript is the hardware register actually allocated
• Note that R3a, R3b, and R3c are different hardware registers
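A minimal sketch of such a renaming pass, assuming an unlimited supply of fresh physical registers p0, p1, …; the tuple encoding of instructions is hypothetical:

```python
def rename(instrs):
    """instrs: list of (dest, src1, src2) tuples of architectural register
    names (or literals). Give every write a fresh physical register and
    redirect reads through a register alias table (RAT). This removes
    WAW (output) and WAR (anti) dependencies while preserving true
    RAW dependencies."""
    rat = {}        # architectural name -> current physical register
    out = []
    for i, (dest, *srcs) in enumerate(instrs):
        srcs = [rat.get(s, s) for s in srcs]  # read the latest mapping
        phys = f"p{i}"                        # fresh register per write
        rat[dest] = phys
        out.append((phys, *srcs))
    return out

# The slide's sequence I1..I4:
renamed = rename([("R3", "R3", "R5"),   # I1
                  ("R4", "R3", "1"),    # I2
                  ("R3", "R5", "1"),    # I3
                  ("R7", "R3", "R4")])  # I4
print(renamed)
# [('p0', 'R3', 'R5'), ('p1', 'p0', '1'), ('p2', 'R5', '1'), ('p3', 'p2', 'p1')]
```

After renaming, I1 and I3 write different physical registers (p0 vs. p2), so I3 may complete in any order relative to I1 and I2, yet I4 still reads I3's result (p2), matching the R3a/R3b/R3c example. A real processor would also recycle physical registers once their values are dead.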
Machine Parallelism
• Duplication of Resources
• Out of order issue
• Renaming
• Duplicating functional units is not worthwhile without register renaming
• Need an instruction window large enough (more than 8 entries)
Performance
Branch Prediction
• 80486 fetches both next sequential instruction after
branch and branch target instruction
• Gives two cycle delay if branch taken
RISC - Delayed Branch
• Calculate the result of the branch before unusable instructions are prefetched
• Always execute the single instruction immediately following the branch (the delay slot)
• Keeps the pipeline full while fetching the new instruction stream
• Not as good for superscalar
– Multiple instructions need to execute in delay slot
– Instruction dependence problems
• Revert to branch prediction
Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies involving register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple instructions in parallel
• Resources for parallel execution of multiple instructions
• Mechanisms for committing process state in correct order
PowerPC
• Direct descendent of IBM 801, RT PC and RS/6000
• All are RISC
• RS/6000 first superscalar
• PowerPC 601 superscalar design similar to RS/6000
• Later versions extend superscalar concept
PowerPC 601 General View
PowerPC 601 Pipeline
Summary
• RISC
– A design that simplifies elementary instruction processing
– Allows for optimization and improvements of efficiency by compiler
and run-time circuits later
– Now the mainstream approach: MIPS, SPARC, PowerPC, etc.
• Superscalar
– Multiple fetch, execution and memory port units
– Additional dimension to achieve ILP
– Raises more complex issues for consistency and correctness of execution