ECE 4750 Computer Architecture
Topic 2: From CISC to RISC
Christopher Batten
School of Electrical and Computer Engineering
Cornell University
http://www.csl.cornell.edu/courses/ece4750
slide revision: 2013-09-08-23-34
Motivating RISC Memory Basics Single-Cycle Unpipelined MIPS Processor Multi-Cycle Unpipelined MIPS Processor
CPI for Microcoded Machine
Inst 1: 7 cycles
Inst 2: 5 cycles
Inst 3: 10 cycles
- Total clock cycles = 7 + 5 + 10 = 22
- Total instructions = 3
- Clocks per Instruction (CPI) = 22 / 3 = 7.33
- CPI is always an average over a large number of instructions
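The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch; the function name `average_cpi` is illustrative and does not come from the course material.

```python
def average_cpi(cycles_per_inst):
    """Average clocks per instruction over a list of per-instruction cycle counts."""
    if not cycles_per_inst:
        raise ValueError("need at least one instruction")
    return sum(cycles_per_inst) / len(cycles_per_inst)


# Cycle counts for the three instructions shown on the slide.
microcoded = [7, 5, 10]
print(f"total cycles = {sum(microcoded)}")
print(f"CPI = {average_cpi(microcoded):.2f}")
```

Three instructions are far too few for a meaningful CPI; in practice the same average is taken over a long dynamic instruction stream.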
ECE 4750 T02: From CISC to RISC 2 / 43
“Iron Law” of Processor Performance
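$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$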
- Instructions / program depends on source code, compiler, ISA
- Cycles / instruction (CPI) depends on ISA, microarchitecture
- Time / cycle depends upon microarchitecture and implementation
Microarchitecture          CPI   Cycle Time   Covered in
Microcoded                 >1    short        last topic
Single-Cycle Unpipelined   1     long         this topic
Multi-Cycle Unpipelined    >1    short        this topic
Pipelined                  ≈1    short        next topic
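To see how the three Iron Law terms trade off, the sketch below multiplies them for three designs. Every number in `designs` is hypothetical and chosen only to mirror the ">1 / 1" and "short / long" entries in the table; none of it is measured data.

```python
def execution_time_ns(instructions, cpi, cycle_time_ns):
    """Iron Law: time/program = (insts/program) * (cycles/inst) * (time/cycle)."""
    return instructions * cpi * cycle_time_ns


# Hypothetical values for one program on three designs (illustrative only).
designs = {
    "microcoded":   {"instructions": 1_000_000, "cpi": 7.33, "cycle_time_ns": 10.0},
    "single-cycle": {"instructions": 1_500_000, "cpi": 1.0,  "cycle_time_ns": 50.0},
    "multi-cycle":  {"instructions": 1_500_000, "cpi": 4.33, "cycle_time_ns": 10.0},
}

for name, params in designs.items():
    time_ms = execution_time_ns(**params) / 1e6  # ns -> ms
    print(f"{name:12s} {time_ms:8.2f} ms")
```

No single term decides the outcome: a design can win on CPI and still lose on cycle time or instruction count.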
Agenda
Technology Trends Motivating RISC
Memory Basics
Single-Cycle Unpipelined MIPS Processor
Multi-Cycle Unpipelined MIPS Processor
Minicomputers in the 1970’s
Extremely popular VAX 11/780, first available in 1977; often used as a baseline for benchmarking and assumed to have a speed of 1M instructions/second (1 MIPS): 5 MHz, TTL devices
- Implemented with racks of discrete components
- Used microcode to implement CISC ISA
- Applications in business, scientific, commercial computing
Microprocessors in the 1970’s
First microprocessor is the Intel 4004, fabricated in 1971 and designed for a desktop printing calculator: 750 kHz, 8–16 cycles/inst, 8 µm PMOS, 2.3K transistors, 12 mm², microcoded control to implement CISC ISA
- Microprocessors made possible by new integrated circuit tech
- Constrained by what could fit on a single chip, leading to few-bit datapaths with hardwired control
- Initial application was for embedded control
- 8-bit microprocessors used in hobbyist personal computers
  - Micral, Altair, TRS-80, Apple-II
  - Usually had 16-bit address space (64 KB directly addressable)
  - Simple BASIC interpreter in ROM or cassette tape
DRAM in the 1970’s
- Dramatic progress in MOSFET memory technology
- 1970 → Intel introduces first DRAM (Model 1103 w/ 1 Kb)
- 1979 → Fujitsu introduces 64 Kb DRAM
- By mid-1970’s became obvious that microprocessors would soon have >64 KB of physical memory
VisiCalc as “Killer” App and Eventually the IBM PC
Analyzing Microcoded Machines
- John Cocke and group at IBM
  - Working on a simple pipelined processor, 801, and advanced compilers
  - Ported experimental PL8 compiler to IBM 370, and only used simple register-register and load/store instructions similar to 801
  - Code ran faster than other existing compilers that used all 370 instructions! (up to 6 MIPS, whereas 2 MIPS considered good before)
- Joel Emer and Douglas Clark at DEC
  - Measured VAX-11/780 using external hardware
  - Found it was actually a 0.5 MIPS machine, not a 1 MIPS machine
  - 20% of VAX instrs = 60% of µcode, but only 0.2% of the dynamic execution
- VAX 8800, high-end VAX in 1984
  - Control store: 16K×147b RAM, Unified Cache: 64K×8b RAM
  - 4.5× more microstore RAM than cache RAM!
From CISC to RISC
- Key changes in tech constraints
  - Logic, RAM, ROM all implemented with MOS transistors
  - RAM ≈ same speed as ROM
- Use fast RAM to build fast instruction cache of user-visible instructions, not fixed hardware microfragments
  - Change contents of fast instruction memory to fit what app needs
- Use simple ISA to enable hardwired pipelined implementation
  - Most compiled code only used a few of the CISC instructions
  - Simpler encoding allowed pipelined implementations
  - Load/Store Reg-Reg ISA as opposed to Mem-Mem ISA
- Further benefit with integration
  - Early 1980’s → fit 32-bit datapath, small caches on single chip
  - No chip crossing in common case allows faster operation
From CISC to RISC
[Figure: two controllers side by side. Vertical μCode Controller: μPC → ROM for μInst → small decoder. RISC Controller: User PC → RAM for Instr Cache → "larger" decoder.]
Berkeley RISC Chips
RISC-I fabricated in 1982 under the direction of David Patterson and probably the first VLSI RISC processor: 1 MHz, 5 µm NMOS, 44.5K transistors, 77 mm²
RISC-II was the 1983 follow up with several improvements: 3 MHz, 3 µm NMOS, 40.7K transistors, 60 mm²
Stanford MIPS Chips
First MIPS prototype fabricated in 1984 under direction of John Hennessy; MIPS-X was the 1986 follow up: 5-stage, 20 MHz, 2 µm 2-layer CMOS
John Hennessy leaves Stanford to form MIPS Computer Systems and their first chip is MIPS R2000 in 1986: 8–15 MHz, 2 µm 2-layer CMOS, 110K transistors, 80 mm²
MIPS vs. VAX
[Figure: ratio of MIPS to VAX (y-axis from 0.0 to 4.0) for three quantities, Performance Ratio, Instructions Executed Ratio, and CPI Ratio, across the benchmarks spice, matrix, nasa7, fpppp, tomcatv, doduc, espresso, eqntott, and li. Source: H&P, Appendix J, from Bhandarkar and Clark, 1991]
Chart annotations: 2x more instr, 6x lower CPI, 2-4x higher perf
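As a rough consistency check on the chart's annotations, the Iron Law expresses the speedup in terms of the two annotated ratios. This sketch assumes the two machines have comparable cycle times, which the slide does not state:

```latex
\frac{\text{Time}_{\text{VAX}}}{\text{Time}_{\text{MIPS}}}
  = \frac{I_{\text{VAX}}}{I_{\text{MIPS}}}
    \times \frac{\text{CPI}_{\text{VAX}}}{\text{CPI}_{\text{MIPS}}}
    \times \frac{t_{c,\text{VAX}}}{t_{c,\text{MIPS}}}
  \approx \frac{1}{2} \times 6 \times 1 = 3
```

Under that assumption the estimate of about 3x falls inside the 2-4x performance range annotated on the chart.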
CISC/RISC Convergence
[Embedded article page: Linley Gwennap, "MIPS R10000 Uses Decoupled Architecture: High-Performance Core Will Drive MIPS High-End for Years," Microprocessor Report, Vol. 8, No. 14, October 24, 1994, © 1994 MicroDesign Resources. The legible text of the page follows.]

Not to be left out in the move to the next generation of RISC, MIPS Technologies (MTI) unveiled the design of the R10000, also known as T5. As the spiritual successor to the R4000, the new design will be the basis of high-end MIPS processors for some time, at least until 1997. By swapping superpipelining for an aggressively out-of-order superscalar design, the R10000 has the potential to deliver high performance throughout that period.

The new processor uses deep queues to decouple the instruction fetch logic from the execution units. Instructions that are ready to execute can jump ahead of those waiting for operands, increasing the utilization of the execution units. This technique, known as out-of-order execution, has been used in PowerPC processors for some time (see 081402.PDF), but the new MIPS design is the most aggressive implementation yet, allowing more instructions to be queued than any of its competitors.

Taking advantage of its experience with the 200-MHz R4400, MTI was able to streamline the design and expects it to run at a high clock rate. Speaking at the Microprocessor Forum, MTI’s Chris Rowen said that the first R10000 processors will reach a speed of 200 MHz, 50% faster than the PowerPC 620. At this speed, he expects performance in excess of 300 SPECint92 and 600 SPECfp92, challenging Digital’s 21164 for the performance lead. Due to schedule slips, however, the R10000 has not yet taped out; we do not expect volume shipments until 4Q95, by which time Digital may enhance the performance of its processor.

Speculative Execution Beyond Branches

The front end of the processor is responsible for maintaining a continuous flow of instructions into the queues, despite problems caused by branches and cache misses. As Figure 1 shows, the chip uses a two-way set-associative instruction cache of 32K. Like other highly superscalar designs, the R10000 predecodes instructions as they are loaded into this cache, which holds four extra bits per instruction. These bits reduce the time needed to determine the appropriate queue for each instruction.

The processor fetches four instructions per cycle from the cache and decodes them. If a branch is discovered, it is immediately predicted; if it is predicted taken, the target address is sent to the instruction cache, redirecting the fetch stream. Because of the one cycle needed to decode the branch, taken branches create a “bubble” in the fetch stream; the deep queues, however, generally prevent this bubble from delaying the execution pipeline.

The sequential instructions that are loaded during this extra cycle are not discarded but are saved in a “resume” cache. If the branch is later determined to have been mispredicted, the sequential instructions are reloaded from the resume cache, reducing the mispredicted branch penalty by one cycle. The resume cache has four entries of four instructions each, allowing speculative execution beyond four branches.

The R10000 design uses the standard two-bit Smith method to predict [page image ends here]

Figure 1. The R10000 uses deep instruction queues to decouple the instruction fetch logic from the five function units.
[Block diagram labels: Instruction Cache (32K, two-way associative), PC Unit, Predecode Unit, ITLB (8 entry), BHT (512 x 2), Resume Cache, Decode/Map/Dispatch, Active List, Map Table, Memory Queue (16 entries), Integer Queue (16 entries), FP Queue (16 entries), Integer Registers (64 × 64 bits), FP Registers (64 × 64 bits), Address Adder, ALU1, ALU2, FP Adder, FP Mult, FP divide and square-root units, Main TLB (64 entries), Data Cache (32K, two-way associative), L2 Cache Interface (128 bits) to Data SRAM and Tag SRAM (512K-16M), System Interface to Avalanche Bus (64-bit addr/data)]
MIPS R10K uses sophisticated out-of-order engine; branch delay slot not useful
– Gwennap, MPR, 1994
Intel Nehalem frontend breaks x86 CISC into smaller RISC-like µops; µcode engine handles rarely used complex instr
– Kanter, Real World Technologies, 2009
Register File with Combinational Read
[Figure: 2R+1W register file. Inputs: ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd), WE (we), Clock. Outputs: ReadData1 (rd1), ReadData2 (rd2). Inset "Single Register": n flip-flops with inputs D0 ... Dn-1 and outputs Q0 ... Qn-1, sharing Clk and En.]
Register File Implementation
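[Figure: register file implementation with registers reg 0, reg 1, ..., reg 31, each 32 bits wide. Write side: 5-bit ws, 32-bit wd, we, and clk. Read side: 5-bit rs1 and rs2 select the 32-bit outputs rd1 and rd2.]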
- Register files with large number of ports are difficult to implement
- Almost all MIPS instrs have exactly two register source operands
- Intel’s Itanium general-purpose register file has 128 registers with 8 read ports and 4 write ports!
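A behavioral sketch of the 2R+1W register file from the previous two slides may help fix the port names. The class and method names are illustrative, and the sketch does not hardwire register 0 to zero, since the slides do not show that detail.

```python
class RegisterFile:
    """2-read, 1-write register file: combinational reads, clocked write."""

    def __init__(self, num_regs=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def read(self, rs1, rs2):
        # Combinational read: returns whatever is currently stored.
        return self.regs[rs1], self.regs[rs2]

    def clock_edge(self, we, ws, wd):
        # State changes only at the clock edge, and only if write enable is set.
        if we:
            self.regs[ws] = wd & self.mask


rf = RegisterFile()
rf.clock_edge(we=True, ws=3, wd=0xDEADBEEF)
rf.clock_edge(we=False, ws=4, wd=123)  # write enable low: register 4 unchanged
rd1, rd2 = rf.read(3, 4)
print(hex(rd1), rd2)
```

Each extra read port would add another selector argument to `read`, which hints at why heavily ported register files are costly in hardware.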
“Magic” Memory Model
[Figure: MAGIC RAM block with inputs Address, WriteData, WriteEnable, and Clock, and output ReadData]
- Read is combinational
- Write is performed at the rising clock edge if enabled
- Write address must be stable at the clock edge
- Later we will consider using more realistic memory
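The behavior listed above can be captured in a small Python model. The names are illustrative, and reading an unwritten address as 0 is an assumption of the sketch rather than something the slide specifies.

```python
class MagicMemory:
    """Idealized memory: combinational read, write at the rising clock edge."""

    def __init__(self):
        self.data = {}  # sparse storage; unwritten addresses read as 0 (assumption)

    def read(self, addr):
        # Combinational: the result is available in the same cycle.
        return self.data.get(addr, 0)

    def rising_edge(self, addr, write_data, write_enable):
        # The caller must hold addr stable up to this point.
        if write_enable:
            self.data[addr] = write_data


magic = MagicMemory()
magic.rising_edge(addr=0x20, write_data=7, write_enable=True)
print(magic.read(0x20))
```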
More Realistic Memory Model
[Figure: SRAM block with inputs Address, WriteData, WriteEnable, and Clock, and output ReadData]
- Synchronous operation
- Read data ready next cycle
- Read/write data buses share single internal bit lines
[Figure panels: Simplified SRAM Read, Simplified SRAM Write]
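For contrast with the idealized model, here is a sketch of a synchronous memory in which a read issued at one clock edge only becomes visible afterwards. The names are hypothetical; allowing either a read or a write per edge is a modeling choice meant to reflect the shared bit lines.

```python
class SyncSRAM:
    """Synchronous memory: a read issued at one edge is visible in the next cycle."""

    def __init__(self):
        self.data = {}
        self.read_data = None  # output register; undefined before the first read

    def rising_edge(self, addr, write_data=0, write_enable=False):
        # One access per cycle: either a write or a read, never both.
        if write_enable:
            self.data[addr] = write_data
        else:
            self.read_data = self.data.get(addr, 0)


sram = SyncSRAM()
sram.rising_edge(0x10, write_data=42, write_enable=True)  # cycle 0: write
sram.rising_edge(0x10)                                    # cycle 1: issue read
print(sram.read_data)                                     # usable from cycle 2
```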
MIPS Instruction Formats
Format    Fields, as bit range: name (width in bits)
ALU       31:26: 0 (6) | 25:21: rs (5) | 20:16: rt (5) | 15:11: rd (5) | 10:6: 0 (5) | 5:0: func (6)
          R[rd] ← R[rs] func R[rt]
ALUI      31:26: opcode (6) | 25:21: rs (5) | 20:16: rt (5) | 15:0: immediate (16)
          R[rt] ← R[rs] op immediate
LD/ST     31:26: opcode (6) | 25:21: rs (5) | 20:16: rt (5) | 15:0: offset (16)
          LD: R[rt] ← M[ R[rs] + sext(offset) ]
          ST: M[ R[rs] + sext(offset) ] ← R[rt]
BEQZ      31:26: opcode (6) | 25:21: rs (5) | 20:16: 0 (5) | 15:0: offset (16)
          if ( R[rs] == 0 ) PC ← PC+4 + offset*4
JR/JALR   31:26: opcode (6) | 25:21: rs (5) | 20:16: 0 (5) | 15:0: 0 (16)
          PC ← R[rs]; JALR also does R[31] ← PC+8
J/JAL     31:26: opcode (6) | 25:0: target (26)
          PC ← jtarg( PC, target ); JAL also does R[31] ← PC+8
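The field boundaries above translate directly into shift-and-mask code. The sketch below uses illustrative helper names (`bits`, `sext16`, `decode`, `beqz_next_pc`) and assumes that the BEQZ offset is sign-extended before scaling, which the slide's one-line semantics do not spell out.

```python
def bits(inst, hi, lo):
    """Extract inst<hi:lo> from a 32-bit instruction word."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext16(value):
    """Sign-extend a 16-bit field to a Python int."""
    return value - 0x10000 if value & 0x8000 else value


def decode(inst):
    """Return every field used by the formats; callers pick the relevant ones."""
    return {
        "opcode": bits(inst, 31, 26),
        "rs":     bits(inst, 25, 21),
        "rt":     bits(inst, 20, 16),
        "rd":     bits(inst, 15, 11),
        "func":   bits(inst, 5, 0),
        "imm":    bits(inst, 15, 0),
        "target": bits(inst, 25, 0),
    }


def beqz_next_pc(pc, rs_value, offset):
    """BEQZ: PC+4 + offset*4 when R[rs] == 0, otherwise fall through to PC+4."""
    if rs_value == 0:
        return pc + 4 + sext16(offset) * 4  # sign extension is an assumption
    return pc + 4


# Example word with opcode=0, rs=1, rt=2, rd=3, func=0x21.
fields = decode(0x00221821)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["func"]))
```

Because rs, rt, and rd sit in the same bit positions in every format, the register file can be read before the opcode is fully decoded.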
Instruction Execution Steps
- 1. Instruction fetch
- 2. Decode and register fetch
- 3. ALU operation
- 4. Memory operation if required
- 5. Register write-back if required
- Also: computation of the next instruction to fetch
Datapath: ALU Reg-Reg Instructions (ADDU)
[Figure: datapath for ALU reg-reg instructions. The PC addresses Inst. Memory and is incremented by a 0x4 adder. inst<25:21> and inst<20:16> select GPR read ports rs1 and rs2, inst<15:11> selects the write register ws, and inst<5:0> feeds ALU Control. The ALU (with zero output z) operates on rd1 and rd2 and returns its result to wd; OpCode generates RegWrite (we).]
Datapath: ALUI Reg-Imm Instructions (ADDIU)
[Figure: datapath for ALU reg-imm instructions. inst<25:21> selects rs1, inst<20:16> selects the write register ws, inst<15:0> passes through Imm Ext (controlled by ExtSel) to the second ALU input, and inst<31:26> feeds ALU Control. OpCode generates ExtSel and RegWrite.]
Address Conflicts in Merged Datapath with Muxes
[Figure: the reg-reg and reg-imm datapaths overlaid. Conflicts: ws is driven by inst<15:11> or inst<20:16>, the second ALU input by rd2 or Imm Ext, and ALU Control by inst<5:0> or inst<31:26>.]
Datapath: ALU and ALUI Instructions
[Figure: merged datapath for ALU and ALUI instructions. Mux RegDst (rt / rd) selects ws from <20:16> or <15:11>; mux BSrc (Reg / Imm) selects the second ALU input from rd2 or Imm Ext; ALU Control takes <31:26> and <5:0> and is steered by OpSel. OpCode generates ExtSel, BSrc, RegDst, OpSel, and RegWrite.]
Approach for Program and Data Memory
- Harvard-style: separate program and data memories
  - Inspired by Howard Aiken and the Mark I
  - Read-only program memory
  - Read/write data memory
  - Need some way to load program memory
- Princeton-style: unified program and data memories
  - Inspired by von Neumann
  - Single read/write memory for both
  - Load/store instructions require accessing memory twice during execution
Most modern machines are mixed, with separate instruction and data caches but a unified main memory that holds both the program and data
Datapath: Load Instructions (LW)
[Figure: datapath for load instructions. The ALU adds R[rs] and the extended offset to form the Data Memory address; mux WBSrc (ALU / Mem) selects rdata for write-back to the GPRs. Data Memory ports: addr, wdata, rdata, we (MemWrite). Control signals: RegDst, BSrc, ExtSel, OpSel, WBSrc, MemWrite, RegWrite.]
Datapath: Store Instructions (SW)
[Figure: datapath for store instructions, with the same components and control signals as the load datapath. The ALU forms the address from R[rs] and the extended offset, register data drives Data Memory wdata, and MemWrite enables the write.]
Datapath: Conditional Branches (BEQZ)
[Figure: datapath for conditional branches. A second adder computes the branch target (br) from pc+4 and the extended offset; a zero? check feeds the controller, and mux PCSrc selects between pc+4 and br for the next PC.]
Datapath: Register-Indirect Jumps (JR)
[Figure: datapath for register-indirect jumps. Mux PCSrc gains a third input, rind (the register value), alongside br and pc+4.]
Datapath: Register-Indirect Jump-&-Link (JALR)
[Figure: datapath for register-indirect jump-and-link. As for JR, with the constant 31 added as a RegDst choice so that the link address can be written to R[31].]
Datapath: Absolute Jump-&-Link (J,JAL)
[Figure: datapath for absolute jumps. Mux PCSrc gains a fourth input, jabs, alongside br, pc+4, and rind.]
Final Harvard Style Datapath for MIPS
[Figure: complete Harvard-style datapath. Components: PC, Inst. Memory, GPRs, Imm Ext, ALU with ALU Control, zero? check, Data Memory, and two adders. Muxes: RegDst, BSrc, WBSrc, and PCSrc (pc+4 / br / rind / jabs). Other control signals: ExtSel, OpSel, MemWrite, RegWrite.]
Hardwired Controller is Pure Combinational Logic
[Figure: the controller is a single block of combinational logic. Inputs: op code and zero?. Outputs: ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc. Within ALU Control, Inst<31:26> (Opcode) passes through a decode map and, together with Inst<5:0> (Func), is selected by ALUop.]
OpSel = { Func, Op, +, 0? }
ExtSel = { sExt16, uExt16, High16 }
Hardwired Control Table
Opcode       ExtSel   BSrc   OpSel   MemW   RegW   WBSrc   RegDst   PCSrc
ALU          *        Reg    Func    no     yes    ALU     rd       pc+4
ALUi         sExt16   Imm    Op      no     yes    ALU     rt       pc+4
ALUiu        uExt16   Imm    Op      no     yes    ALU     rt       pc+4
LW           sExt16   Imm    +       no     yes    Mem     rt       pc+4
SW           sExt16   Imm    +       yes    no     *       *        pc+4
BEQZ (z=0)   sExt16   *      0?      no     no     *       *        br
BEQZ (z=1)   sExt16   *      0?      no     no     *       *        pc+4
J            *        *      *       no     no     *       *        jabs
JAL          *        *      *       no     yes    PC      R31      jabs
JR           *        *      *       no     no     *       *        rind
JALR         *        *      *       no     yes    PC      R31      rind

BSrc = { Reg, Imm }          RegDst = { rt, rd, R31 }
WBSrc = { ALU, Mem, PC }     PCSrc = { pc+4, br, rind, jabs }
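Because the controller is pure combinational logic, the control table can be written as a lookup from instruction class to signal values. The sketch below is one way to hold it in Python; the dictionary layout and the function name `control_signals` are illustrative, and the row labels (including the two BEQZ rows) are copied from the table as written.

```python
SIGNALS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")

# One row per instruction class; "*" marks a don't-care entry.
CONTROL_TABLE = {
    "ALU":      ("*",      "Reg", "Func", "no",  "yes", "ALU", "rd",  "pc+4"),
    "ALUi":     ("sExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "ALUiu":    ("uExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "LW":       ("sExt16", "Imm", "+",    "no",  "yes", "Mem", "rt",  "pc+4"),
    "SW":       ("sExt16", "Imm", "+",    "yes", "no",  "*",   "*",   "pc+4"),
    "BEQZ z=0": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "br"),
    "BEQZ z=1": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "pc+4"),
    "J":        ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "jabs"),
    "JAL":      ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "jabs"),
    "JR":       ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "rind"),
    "JALR":     ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "rind"),
}


def control_signals(row):
    """Look up the control signals for one row of the table."""
    return dict(zip(SIGNALS, CONTROL_TABLE[row]))


print(control_signals("LW"))
```

In hardware the don't-care entries let the logic synthesizer pick whichever value gives the smallest circuit; a software model has to carry them explicitly.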
Single-Cycle Hardwired Control
Requires that clock period is sufficiently long so that all of the following steps can be completed
- 1. Instruction fetch
- 2. Decode and register fetch
- 3. ALU operation
- 4. Data read or data store if required
- 5. Register write-back setup time if required
$t_c > t_{ifetch} + t_{rfrd} + t_{ALU} + t_{dmem} + t_{rfwr}$
At the rising edge of the clock: the PC, the register file, and the memory are updated
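A quick numeric illustration of the clock-period bound, using component delays that are entirely made up for the example:

```python
# Hypothetical component delays in nanoseconds (illustrative only).
single_cycle_delays_ns = {
    "ifetch": 2.0,
    "rfrd": 1.0,
    "alu": 2.0,
    "dmem": 2.0,
    "rfwr": 1.0,
}


def single_cycle_min_period(delays):
    """Lower bound on t_c: all five steps must finish within one clock period."""
    return (delays["ifetch"] + delays["rfrd"] + delays["alu"]
            + delays["dmem"] + delays["rfwr"])


print(single_cycle_min_period(single_cycle_delays_ns), "ns")
```

Every instruction pays this full period, even one that never touches data memory.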
Multi-Cycle Unpipelined Datapath
[Figure: multi-cycle unpipelined datapath divided into five phases: fetch, decode & reg-fetch, execute, memory, and write-back. Components: PC with 0x4 adder, Inst. Memory (addr, rdata), instruction register (IR), GPRs, Imm Ext, ALU, and Data Memory (addr, wdata, rdata, we).]
Clock period is reduced by dividing the execution of an instruction into multiple cycles; allows for more realistic synchronous memory
$t_c > \max(t_{ifetch}, t_{rf}, t_{ALU}, t_{dmem}, t_{rfwr})$
CPI will of course be greater than one
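The sketch below contrasts the multi-cycle bound (the slowest single phase) with the single-cycle bound (the sum of all phases). The delays are hypothetical, and the per-instruction cycle counts of 5, 3, and 5 are the ones used in this deck's summary example.

```python
# Hypothetical phase delays in nanoseconds (illustrative only).
phase_delays_ns = {"ifetch": 2.0, "rf": 1.0, "alu": 2.0, "dmem": 2.0, "rfwr": 1.0}


def multi_cycle_min_period(delays):
    """Lower bound on t_c: only the slowest phase must fit in one cycle."""
    return max(delays.values())


def program_time_ns(cycles_per_inst, period_ns):
    """Total time for a sequence of instructions at a given clock period."""
    return sum(cycles_per_inst) * period_ns


multi_period = multi_cycle_min_period(phase_delays_ns)
single_period = sum(phase_delays_ns.values())

multi_time = program_time_ns([5, 3, 5], multi_period)    # multi-cycle machine
single_time = program_time_ns([1, 1, 1], single_period)  # single-cycle machine
print(multi_period, multi_time, single_period, single_time)
```

With these particular made-up delays the three-instruction sequence takes 26 ns on the multi-cycle machine and 24 ns on the single-cycle machine, so a shorter clock period does not by itself guarantee a faster design; the result depends on how well the phases are balanced and on the instruction mix.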
Multi-Cycle Unpipelined Controller
[Figure: multicycle PARCv1 state diagram, reproduced from ECE 4750 Computer Architecture, Fall 2011, Lab 2: Multicycle PARCv2 Processor (Figure 2, Appendix)]
Summary
- Microcoding less attractive due to evolving technology constraints
- Unpipelined µarch first step towards RISC design philosophy
- “Iron Law” of processor performance helps explain design space

Microcoded:                 Inst 1: 7 cycles, Inst 2: 5 cycles, Inst 3: 10 cycles   CPI = 7.33
Single-Cycle Unpipelined:   Inst 1: 1 cycle,  Inst 2: 1 cycle,  Inst 3: 1 cycle     CPI = 1
Multi-Cycle Unpipelined:    Inst 1: 5 cycles, Inst 2: 3 cycles, Inst 3: 5 cycles    CPI = 4.33

Microarchitecture          CPI   Cycle Time   Covered in
Microcoded                 >1    short        last topic
Single-Cycle Unpipelined   1     long         this topic
Multi-Cycle Unpipelined    >1    short        this topic
Pipelined                  ≈1    short        next topic
Acknowledgements
Some of these slides contain material developed and copyrighted by:
Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), David Patterson (UCB)
MIT material derived from course 6.823
UCB material derived from courses CS152 and CS252