Upload
others
View
20
Download
0
Embed Size (px)
Citation preview
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 2
CSE 420 Chapter 2 — Instructions: Language of the Computer — 3
The MIPS Instruction Set n Used as the example throughout the book n Large share of embedded core market
but dwarfed by ARM
n Typical of many modern ISAs n See MIPS Reference Data tear-out card, and
Appendixes B and E
CSE 420 Chapter 2 — Instructions: Language of the Computer — 4
Arithmetic Operations n Add and subtract have three operands
n Two sources and one destination add a, b, c # a gets b + c
n All MIPS arithmetic ops have this form n Design Principle 1:
Simplicity favors regularity. n Regularity makes implementation simpler n Regularity enables higher performance
at lower cost
§2.2 Operations of the C
omputer H
ardware
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 4
CSE 420 Chapter 2 — Instructions: Language of the Computer — 7
Register Operands n Arithmetic instructions use register operands n MIPS has a 32 × 32-bit register file
n Used for frequently accessed data n Numbered 0 to 31
n Assembler names n $t0, $t1, …, $t9 for temporary values n $s0, $s1, …, $s7 for saved variables
n Design Principle 2: Smaller is faster.
§2.3 Operands of the C
omputer H
ardware
CSE 420 Chapter 2 — Instructions: Language of the Computer — 8
Register Operand Example n C code: f = (g + h) - (i + j);
n Register assignment? n f in $s0, g in $s1, h in $s2, i in $s3, j in $s4 n Temps?
n Compiled MIPS code: add $t0, $s1, $s2 add $t1, $s3, $s4 sub $s0, $t0, $t1
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 5
CSE 420 Chapter 2 — Instructions: Language of the Computer — 9
Load-Store Architecture n Data is in memory
n Load values from memory into registers n Store result from register to memory
n Details: n Memory is byte addressed
n Each address identifies an 8-bit byte n Words are aligned in memory
n Word addresses must be a multiple of 4 n MIPS is Big Endian
n Most-significant byte at least address of a word n c.f. Little Endian: least-significant byte at least address
CSE 420 Chapter 2 — Instructions: Language of the Computer — 10
Memory Operand Example 1 n C code: g = h + A[8];
n g in $s1, h in $s2, base address of A in $s3 n Compiled MIPS code:
n Index 8 requires offset of 32 Why?
lw $t0, 32($s3) # load word add $s1, $s2, $t0
offset base register
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 6
CSE 420 Chapter 2 — Instructions: Language of the Computer — 11
Memory Operand Example 2 n C code: A[12] = h + A[8];
n h in $s2, base address of A in $s3 n Compiled MIPS code:
n Index 8 requires offset of 32 n Index 12 requires offset of ?
lw $t0, 32($s3) # load word add $t0, $s2, $t0 sw $t0, 48($s3) # store word
CSE 420 Chapter 2 — Instructions: Language of the Computer — 12
Registers vs. Memory n Registers are faster than memory n Load-store è
“more” instructions to be executed n Register optimization is important!
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 7
CSE 420 Chapter 2 — Instructions: Language of the Computer — 13
Immediate Operands n Constant data specified in an instruction addi $s3, $s3, 4
n There is no “subtract immediate” instruction n Subtract is “add a negative” addi $s2, $s1, -4
n Design Principle 3: Make the common case fast. n Small constants are common n Immediate operand avoids a load instruction n Why?
CSE 420 Chapter 2 — Instructions: Language of the Computer — 14
The Constant Zero n MIPS register 0 ($zero) is the constant 0
n Cannot be overwritten n Useful for common operations
n e.g., move between registers add $t2, $s1, $zero
n Why a fixed register zero?
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 10
CSE 420 Chapter 2 — Instructions: Language of the Computer — 19
Sign Extension n How do you
represent a number using more bits? e.g. move an 8-bit number into a 16-bit word
n Replicate the sign bit to the left n c.f. unsigned values: extend with 0s
n Examples: 8-bit to 16-bit n +2: 0000 0010 => 0000 0000 0000 0010 n –2: 1111 1110 => 1111 1111 1111 1110
n In MIPS instruction set n addi: extend immediate value n lb, lh: extend loaded byte/halfword n beq, bne: extend the displacement
CSE 420 Chapter 2 — Instructions: Language of the Computer — 20
Representing Instructions n Instructions are encoded in binary
n Called “machine code”
n MIPS instructions n Encoded as 32-bit instruction words (Regularity!) n Small number of formats encode
opcode, register numbers, … n Register numbers
n $t0 – $t7 are reg’s 8 – 15 n $s0 – $s7 are reg’s 16 – 23 n $t8 – $t9 are reg’s 24 – 25 n (0-8 and 25-31 described later – procedure calls)
§2.5 Representing Instructions in the C
omputer
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 11
CSE 420 Chapter 2 — Instructions: Language of the Computer — 21
MIPS R-format Instructions
n Instruction fields n op: operation code (opcode) n rs: first source register number n rt: second source register number n rd: destination register number n shamt: shift amount (00000 for now) n funct: function code (extends opcode) (Why?)
op rs rt rd shamt funct 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
CSE 420 Chapter 2 — Instructions: Language of the Computer — 22
R-format Example
add $t0, $s1, $s2
special $s1 $s2 $t0 0 add
0 17 18 8 0 32
000000 10001 10010 01000 00000 100000
000000100011001001000000001000002 = 0232402016
op rs rt rd shamt funct 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 12
CSE 420 Chapter 2 — Instructions: Language of the Computer — 23
Hexadecimal n Base 16
n Compact representation of bit strings n 4 bits per hex digit
0 0000 4 0100 8 1000 c 1100 1 0001 5 0101 9 1001 d 1101 2 0010 6 0110 a 1010 e 1110 3 0011 7 0111 b 1011 f 1111
n Example: eca8 6420 n 1110 1100 1010 1000 0110 0100 0010 0000
CSE 420 Chapter 2 — Instructions: Language of the Computer — 24
MIPS I-format Instructions
n Immediate arithmetic and load/store instructions n rt: destination or source register number n Constant: –215 to +215 – 1 n Address: offset added to base address in rs
n Design Principle 4: Good design demands good compromises. n Different formats complicate decoding,
but allow 32-bit instruction uniformity n Keep formats as similar as possible
op rs rt constant or address 6 bits 5 bits 5 bits 16 bits
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 13
CSE 420 Chapter 2 — Instructions: Language of the Computer — 25
Stored Program Computers n Instructions represented in
binary, just like data n Instructions and data
stored in memory n Programs can
operate on programs n e.g., compilers, linkers, …
n Binary compatibility allows compiled programs to work on other computers with the same ISA.
The BIG Picture
CSE 420 Chapter 2 — Instructions: Language of the Computer — 26
Logical Operations n Instructions for bitwise manipulation
Operation C Java MIPS Shift left << << sll
Shift right >> >>> srl
Bitwise AND & & and, andi
Bitwise OR | | or, ori
Bitwise NOT ~ ~ nor
n Useful for extracting and inserting groups of bits in a word
§2.6 Logical Operations
The University of Adelaide, School of Computer Science 17 January 2011
Chapter 2 — Instructions: Language of the Computer 27
CSE 420 Chapter 2 — Instructions: Language of the Computer — 53
Branch Addressing n Branch instructions specify
n Opcode, two registers, target address n Most branch targets are near branch
n Forward or backward
op rs rt constant or address 6 bits 5 bits 5 bits 16 bits
n PC-relative addressing n Target address = PC + offset × 4 n PC already incremented by 4 by this time
CSE 420 Chapter 2 — Instructions: Language of the Computer — 54
Jump Addressing n Jump (j and jal) targets could be
anywhere in text segment n Encode full address in instruction
op address 6 bits 26 bits
n (Pseudo)Direct jump addressing n Target address = PC31…28 : (address × 4)
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 1
Chapter 4
The Processor
CSE 420 Chapter 4 — The Processor — 2
Introduction n CPU performance factors
n Instruction count n Determined by ISA and compiler
n CPI and Cycle time n Determined by CPU hardware
n We will examine two MIPS implementations n A simplified version n A more realistic pipelined version
n Simple subset, shows most aspects n Memory reference: lw, sw n Arithmetic/logical: add, sub, and, or, slt n Control transfer: beq, j
§4.1 Introduction
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 2
CSE 420 Chapter 4 — The Processor — 3
Instruction Execution n PC → instruction memory, fetch instruction n Register numbers → register file, read registers n Depending on instruction class
n Use ALU to calculate n Arithmetic result n Memory address for load/store n Branch target address
n Access data memory for load/store n PC ← target address or PC + 4
CSE 420 Chapter 4 — The Processor — 4
CPU Overview
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 3
CSE 420 Chapter 4 — The Processor — 5
Multiplexers n Can’t just join
wires together n Use multiplexers
CSE 420 Chapter 4 — The Processor — 6
Control
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 6
CSE 420 Chapter 4 — The Processor — 11
Clocking Methodology n Combinational logic
transforms data during clock cycles n Between clock edges n Input from state elements,
output to state element n Longest delay determines clock period
CSE 420 Chapter 4 — The Processor — 12
Building a Datapath n Datapath
n Elements that process data and addresses in the CPU
n Registers, ALUs, mux’s, memories, …
n We will build a MIPS datapath incrementally n Refining the overview design
§4.3 Building a D
atapath
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 7
CSE 420 Chapter 4 — The Processor — 13
Instruction Fetch
32-bit register
Increment by 4 for next instruction
CSE 420 Chapter 4 — The Processor — 14
R-Format Instructions n Read two register operands n Perform arithmetic/logical operation n Write register result
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 8
CSE 420 Chapter 4 — The Processor — 15
Load/Store Instructions n Read register operands n Calculate address using 16-bit offset
n Use ALU, but sign-extend offset n Load: Read memory and update register n Store: Write register value to memory
CSE 420 Chapter 4 — The Processor — 16
Branch Instructions n Read register operands n Compare operands
n Use ALU, subtract and check Zero output n Calculate target address
n Sign-extend displacement n Shift left 2 places (word displacement) n Add to PC + 4
n Already calculated by instruction fetch
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 9
CSE 420 Chapter 4 — The Processor — 17
Branch Instructions
Just re-routes
wires
Sign-bit wire replicated
CSE 420 Chapter 4 — The Processor — 18
Composing the Elements n First-cut data path
does an instruction in one clock cycle n Each data-path element
can only do one function at a time n Hence, we need
separate instruction and data memories n Use multiplexers where alternate data
sources are used for different instructions
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 10
CSE 420 Chapter 4 — The Processor — 19
R-Type/Load/Store Datapath
Full Data Path
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 11
CSE 420 Chapter 4 — The Processor — 21
ALU Control n ALU used for
n Load/Store: F = add n Branch: F = subtract n R-type: F depends on funct field
§4.4 A Sim
ple Implem
entation Schem
e
ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR
CSE 420 Chapter 4 — The Processor — 22
ALU Control n Assume 2-bit ALUOp derived from opcode
n Combinational logic derives ALU control
opcode ALUOp Operation funct ALU function ALU control lw 00 load word XXXXXX add 0010
sw 00 store word XXXXXX add 0010 beq 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010
subtract 100010 subtract 0110 AND 100100 AND 0000 OR 100101 OR 0001
set-on-less-than 101010 set-on-less-than 0111
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 12
CSE 420 Chapter 4 — The Processor — 23
The Main Control Unit n Control signals derived from instruction
0 rs rt rd shamt funct 31:26 5:0 25:21 20:16 15:11 10:6
35 or 43 rs rt address 31:26 25:21 20:16 15:0
4 rs rt address 31:26 25:21 20:16 15:0
R-type
Load/ Store
Branch
opcode always read
read, except for load
write for R-type
and load
sign-extend and add
Data Path With Control
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 16
CSE 420 Chapter 4 — The Processor — 31
Pipelining Analogy n Pipelined laundry: overlapping execution
n Parallelism improves performance n Four stages: Wash – Dry – Fold - Hang
§4.5 An O
verview of P
ipelining n Four loads: n Speedup
= 8/3.5 = 2.3 n Non-stop:
n Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
CSE 420 Chapter 4 — The Processor — 32
MIPS Pipeline n Five stages, one step per stage
1. IF: Instruction fetch from memory 2. ID: Instruction decode & register read 3. EX: Execute operation or calculate address 4. MEM: Access memory operand 5. WB: Write result back to register
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 19
CSE 420 Chapter 4 — The Processor — 37
Hazards n Situations that prevent starting the next
instruction in the next cycle n Structural hazard
n A required resource is busy n Data hazard
n Need to wait for previous instruction to complete its data read/write
n Control hazard n Deciding on control action
depends on previous instruction
CSE 420 Chapter 4 — The Processor — 38
Structure Hazards n Conflict for use of a resource n In MIPS pipeline with a single memory
n Load/store requires data access n Instruction fetch would have to stall
for that cycle n Would cause a pipeline “bubble”
n Hence, pipelined data paths require separate instruction & data memories n Or separate instruction & data caches
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 25
CSE 420 Chapter 4 — The Processor — 49
MIPS Pipelined Datapath §4.6 P
ipelined Datapath and C
ontrol
WB
MEM
Right-to-left flow leads to hazards
CSE 420 Chapter 4 — The Processor — 50
Pipeline registers n Need registers between stages
n To hold information produced in previous cycle
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 26
CSE 420 Chapter 4 — The Processor — 51
Pipeline Operation n Cycle-by-cycle flow of instructions
through the pipelined data path n “Single-clock-cycle” pipeline diagram
n Shows pipeline usage in a single cycle n Highlight resources used
n c.f. “multi-clock-cycle” diagram n Graph of operation over time
n We’ll look at “single-clock-cycle” diagrams for load & store
CSE 420 Chapter 4 — The Processor — 52
IF for Load, Store, …
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 27
CSE 420 Chapter 4 — The Processor — 53
ID for Load, Store, …
CSE 420 Chapter 4 — The Processor — 54
EX for Load
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 28
CSE 420 Chapter 4 — The Processor — 55
MEM for Load
CSE 420 Chapter 4 — The Processor — 56
WB for Load
Wrong register number
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 29
CSE 420 Chapter 4 — The Processor — 57
Corrected Datapath for Load
CSE 420 Chapter 4 — The Processor — 58
EX for Store
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 33
CSE 420 Chapter 4 — The Processor — 65
Pipelined Control n Control signals derived from instruction
n As in single-cycle implementation
Chapter 4 — The Processor — 66
Pipelined Control
Morgan Kaufmann Publishers 26 January 2011
Chapter 4 — The Processor 38
CSE 420 Chapter 4 — The Processor — 75
Datapath with Forwarding
CSE 420 Chapter 4 — The Processor — 76
Load-Use Data Hazard
Need to stall for one cycle