Pipelined Datapath and Control
(Lecture #13)
ECE 445 – Computer Organization
The slides included herein were taken from the materials accompanying Computer Organization and Design, 4th Edition, by Patterson and Hennessy,
and were used with permission from Morgan Kaufmann Publishers.
Fall 2010 ECE 445 - Computer Organization 2
Material to be covered ...
Chapter 4: Sections 5 – 9, 13 – 14
Performance of the Single-Cycle MIPS
Example: MIPS Clock Rate
Determine the clock rate for the MIPS architecture, assuming the following:
- The MIPS is a single-cycle machine: 1 clock cycle per instruction (CPI = 1)
- Access time for memory units = 200 ps
- Operation time for ALU and adders = 100 ps
- Access time for register file = 50 ps
Example: MIPS Clock Rate
Instruction Class   Functional Units Used by the Instruction Class
ALU Instruction     Inst. Fetch, Register, ALU, Register
Load Word           Inst. Fetch, Register, ALU, Memory, Register
Store Word          Inst. Fetch, Register, ALU, Memory
Branch              Inst. Fetch, Register, ALU
Jump                Inst. Fetch
Example: MIPS Clock Rate
Instruction Class   Instr Memory   Register read   ALU operation   Data Memory   Register write   Total
ALU Instruction     200            50              100             0             50               400 ps
Load Word           200            50              100             200           50               600 ps
Store Word          200            50              100             200           0                550 ps
Branch              200            50              100             0             0                350 ps
Jump                200            0               0               0             0                200 ps
(All times in ps.)
Example: MIPS Clock Rate
The clock cycle time for a machine with a single clock cycle per instruction will be determined by the longest instruction.
In this example, the load word instruction requires 600 ps.
The clock rate is then
Clock rate = 1 / Clock Cycle Time
Clock rate = 1 / 600 ps = 1.67 GHz
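The arithmetic in this example can be checked with a short script. This is only a sketch: the dictionary names and the function names are illustrative and do not come from the slides; the delays and the unit usage per instruction class are the ones given in the example.

```python
# Component delays in picoseconds, as given in the example.
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 50,
    "alu": 100,
    "data_mem": 200,
    "reg_write": 50,
}

# Functional units used by each instruction class (from the example's table).
UNITS = {
    "ALU Instruction": ["instr_mem", "reg_read", "alu", "reg_write"],
    "Load Word": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "Store Word": ["instr_mem", "reg_read", "alu", "data_mem"],
    "Branch": ["instr_mem", "reg_read", "alu"],
    "Jump": ["instr_mem"],
}


def instruction_time_ps(instr_class):
    """Total delay of one instruction class: the sum of the units it uses."""
    return sum(DELAY_PS[unit] for unit in UNITS[instr_class])


def single_cycle_clock():
    """Return (clock period in ps, clock rate in GHz) for a single-cycle design."""
    # The slowest instruction class sets the clock period.
    period_ps = max(instruction_time_ps(c) for c in UNITS)
    # 1 / (1 ps) = 1 THz = 1000 GHz, so rate in GHz = 1000 / period in ps.
    rate_ghz = 1000.0 / period_ps
    return period_ps, rate_ghz


if __name__ == "__main__":
    period, rate = single_cycle_clock()
    print(f"Clock period = {period} ps, clock rate = {rate:.2f} GHz")
```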
Performance Issues
- Longest delay determines clock period
  - Critical path: load word (lw) instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary clock period for different instructions
- Violates design principle
  - Making the common case fast
- Improve performance by pipelining
How does pipelining work?
Pipelining Analogy
§4.5 An Overview of Pipelining
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
- Non-stop: Speedup = 2n / (0.5n + 1.5) ≈ 4 = number of stages
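The two speedup figures can be reproduced with a small function. It is a sketch under one assumption that the slide implies but does not state outright: the laundry has four stages of 0.5 hours each (2 hours per load in total). The function name and parameters are illustrative.

```python
def laundry_speedup(n_loads, stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential laundry for n_loads loads.

    Assumes every stage takes the same time (stage_hours).
    """
    # Sequential: each load runs through all stages before the next starts.
    sequential_hours = n_loads * stages * stage_hours
    # Pipelined: the first load fills the pipeline, then one load
    # finishes every stage_hours.
    pipelined_hours = (stages - 1) * stage_hours + n_loads * stage_hours
    return sequential_hours / pipelined_hours


if __name__ == "__main__":
    print(f"4 loads:    {laundry_speedup(4):.2f}")
    print(f"1000 loads: {laundry_speedup(1000):.2f}")
```

As the number of loads grows, the fill time of 1.5 hours becomes negligible and the ratio approaches the number of stages.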
Objective:
Keep all stages of the pipeline busy at all times.
Pipelining: Improving Performance
                 Latency    Max. Throughput
Non-Pipelined    2 hours    0.5 loads/hour
Pipelined        2 hours    2 loads/hour
Latency = time from the start of one load to the end of the same load. The length of time for each load does not change.
Maximum Throughput = number of loads completed per hour, assuming all stages of the pipeline are busy at all times.
Pipelining: Improving Performance
Pipelining improves performance by increasing instruction throughput, rather than decreasing
execution time of an individual instruction.
The MIPS Pipeline
MIPS Pipeline
Five stages, one step per stage
- IF: Instruction fetch from memory
- ID: Instruction decode & register read
- EX: Execute operation or calculate address
- MEM: Access memory operand
- WB: Write result back to register
MIPS Pipeline
Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch   Register read   ALU op    Memory access   Register write   Total time
lw         200 ps        100 ps          200 ps    200 ps          100 ps           800 ps
sw         200 ps        100 ps          200 ps    200 ps                           700 ps
R-format   200 ps        100 ps          200 ps                    100 ps           600 ps
beq        200 ps        100 ps          200 ps                                     500 ps
Pipeline Performance
Single-cycle (Tc = 800 ps)
Pipelined (Tc = 200 ps)
Why is the clock period 800 ps?
Why is the clock period 200 ps?
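One way to see where the two clock periods come from is to compute them from the stage times. In this sketch the single-cycle period is the total time of the slowest instruction, and the pipelined period is the time of the slowest stage. The mapping of the table's columns onto the stage names IF, ID, EX, MEM, WB and all identifiers below are assumptions made for illustration.

```python
# Stage times in picoseconds, from the stated assumptions:
# 100 ps for register read or write, 200 ps for the other stages.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction actually uses, per the table.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def single_cycle_period_ps():
    """One cycle must fit the slowest whole instruction (here lw)."""
    return max(sum(STAGE_PS[s] for s in stages) for stages in USES.values())


def pipelined_period_ps():
    """One cycle must fit the slowest single stage."""
    return max(STAGE_PS.values())


if __name__ == "__main__":
    print("Single-cycle Tc:", single_cycle_period_ps(), "ps")
    print("Pipelined Tc:   ", pipelined_period_ps(), "ps")
```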
Pipeline Speedup
- If all stages are balanced, i.e., all take the same time:
    Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
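The effect of unbalanced stages can be illustrated with the clock periods from the preceding slides (800 ps single-cycle, 200 ps pipelined, five stages). The sketch below assumes an ideal pipeline with no stalls; the function names are illustrative.

```python
def single_cycle_time_ps(n, period_ps=800):
    """Time to run n instructions, one full clock cycle each."""
    return n * period_ps


def pipelined_time_ps(n, stages=5, period_ps=200):
    """Time to run n instructions in an ideal pipeline with no stalls.

    The first instruction needs `stages` cycles; each later one
    completes one cycle after its predecessor.
    """
    return (stages + n - 1) * period_ps


def speedup(n):
    return single_cycle_time_ps(n) / pipelined_time_ps(n)


if __name__ == "__main__":
    for n in (3, 100, 1_000_000):
        print(f"n = {n}: speedup = {speedup(n):.3f}")
```

With these numbers the speedup approaches 800/200 = 4 for long instruction streams, not 5: the stages are not balanced, so the speedup is less than the number of stages.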
Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands, i.e., on word boundaries
    - Memory access takes only one cycle
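The point about regular formats can be made concrete: because the register fields sit at the same bit positions in every 32-bit MIPS instruction, they can be extracted before the opcode has been interpreted. The following is a minimal sketch; the function name and the returned dictionary are illustrative, while the bit positions are those of the standard MIPS R- and I-formats.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20-16, second source / I-format dest
        "rd": (word >> 11) & 0x1F,      # bits 15-11, R-format destination
        "shamt": (word >> 6) & 0x1F,    # bits 10-6
        "funct": word & 0x3F,           # bits 5-0
        "imm": word & 0xFFFF,           # bits 15-0, I-format immediate
    }


if __name__ == "__main__":
    # add $t0, $s1, $s2 and lw $t0, 32($s3), hand-encoded for this example.
    for word in (0x02324020, 0x8E680020):
        print(hex(word), decode_fields(word))
```

All fields are extracted unconditionally; which of them are meaningful depends on the format, which is why the hardware can read rs and rt from the register file while the opcode is still being decoded.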
Pipeline Summary
The BIG Picture
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
  - Hazards will be discussed in upcoming lectures
- Instruction set design affects complexity of pipeline implementation
MIPS Pipelined Datapath
§4.6 Pipelined Datapath and Control
Pipeline registers
- Need registers between stages
  - To hold information produced in previous cycle
Why?
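A rough software model may help with this question. The sketch below treats the four pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) as slots that are all updated on the same clock edge, so each stage works on what the previous stage produced one cycle earlier. It is an illustration only: the function names are hypothetical, instructions are plain strings, and no actual computation is modeled.

```python
PIPE_REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def clock_edge(regs, fetched):
    """Return the pipeline register contents after one clock edge.

    Every register captures what the stage before it held during the
    cycle that just ended; IF/ID captures the newly fetched instruction.
    """
    return {
        "IF/ID": fetched,
        "ID/EX": regs["IF/ID"],
        "EX/MEM": regs["ID/EX"],
        "MEM/WB": regs["EX/MEM"],
    }


def run(program, cycles):
    """Clock the pipeline `cycles` times and record the registers each time."""
    regs = {name: None for name in PIPE_REGS}
    history = []
    for cycle in range(cycles):
        fetched = program[cycle] if cycle < len(program) else None
        regs = clock_edge(regs, fetched)
        history.append(dict(regs))
    return history


if __name__ == "__main__":
    for i, regs in enumerate(run(["lw", "sw", "add"], 6), start=1):
        print(f"after cycle {i}: {regs}")
```

Without the registers, the value an instruction produced in one stage would be overwritten as soon as the next instruction entered that stage.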
Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - “Single-clock-cycle” pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - “Multi-clock-cycle” diagram
    - Graph of operation over time
- We’ll look at “single-clock-cycle” diagrams for load word and store word.
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
Wrong register number
Why?
Corrected Datapath for Load
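The "wrong register number" problem can be sketched in a few lines. In the uncorrected datapath the write register number is taken from the IF/ID register, which by the time the load reaches WB holds the instruction that is three positions later in the program. In the corrected datapath the number travels with the load through ID/EX, EX/MEM, and MEM/WB. Everything in this sketch (the `Instr` tuple, the sample program, the function name) is hypothetical and only models which register number reaches the write port.

```python
from collections import namedtuple

# Hypothetical instruction record: assembly text plus destination register.
Instr = namedtuple("Instr", ["text", "write_reg"])

program = [
    Instr("lw  $t0, 0($s0)", "$t0"),
    Instr("add $t1, $s1, $s2", "$t1"),
    Instr("sub $t2, $s3, $s4", "$t2"),
    Instr("and $t3, $s5, $s6", "$t3"),
    Instr("or  $t4, $s7, $s0", "$t4"),
]


def write_register_at_wb(program, index, corrected):
    """Register number seen at the write port while program[index] is in WB."""
    if corrected:
        # The destination is carried along in ID/EX, EX/MEM, and MEM/WB.
        return program[index].write_reg
    # Uncorrected: the number comes from IF/ID, which now holds the
    # instruction in the ID stage, three instructions after the one in WB.
    later = index + 3
    return program[later].write_reg if later < len(program) else None


if __name__ == "__main__":
    print("uncorrected:", write_register_at_wb(program, 0, corrected=False))
    print("corrected:  ", write_register_at_wb(program, 0, corrected=True))
```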
EX for Store
MEM for Store
WB for Store
Multi-Cycle Pipeline Diagram
- Form showing resource usage
Multi-Cycle Pipeline Diagram
- Traditional form
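Since the figure itself is not reproduced here, a text version of the traditional multi-cycle diagram can be generated as follows. This is a sketch assuming an ideal five-stage pipeline with no stalls; the function name, the column widths, and the sample instructions are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def multi_cycle_diagram(instructions, label_width=12):
    """Return one text row per instruction, one column per clock cycle.

    Instruction i enters IF in cycle i and advances one stage per cycle.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * n_cycles
        for offset, stage in enumerate(STAGES):
            cells[i + offset] = stage
        line = instr.ljust(label_width) + " ".join(c.ljust(3) for c in cells)
        rows.append(line.rstrip())
    return rows


if __name__ == "__main__":
    for row in multi_cycle_diagram(["lw", "sub", "add"]):
        print(row)
```

Each row is shifted one column to the right of the row above it, which is the staircase shape of the traditional diagram.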
Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle
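The state of the pipeline in one cycle can likewise be computed directly. The sketch assumes an ideal five-stage pipeline with no stalls and 1-based cycle numbers; the function name and the sample instruction list are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_snapshot(instructions, cycle):
    """Map each stage to the instruction occupying it in the given cycle.

    Cycles are numbered from 1; instruction 0 is fetched in cycle 1.
    A stage with no instruction in it maps to None.
    """
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - 1 - depth  # index of the instruction currently in this stage
        snapshot[stage] = instructions[i] if 0 <= i < len(instructions) else None
    return snapshot


if __name__ == "__main__":
    program = ["lw", "sub", "add", "lw2", "add2"]
    print(pipeline_snapshot(program, 5))
```

In cycle 5 of a five-instruction program every stage is busy: the first instruction is in WB while the fifth is being fetched.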
Questions?