Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
cps 104 1
Designing a Single Cycle Datapath
Computer Science 104 Alvin R. Lebeck
cps 104 2
Administrivia
Homework Due Friday March 2
Prof. Hilton no office hours this week
Reading Ch 4.
Review
Register File
Where are we with respect to the BIG picture?
The Steps of Designing a Processor
Datapath Design
cps 104 3
What did you learn last time?
cps 104 4
Register Cells on a bus
One can “source” and “sink” from any cell on the bus by activating the right controls, IE--input enable, and OE--output enable.
D
E
Q
Q
D latch
IE OE
D
E
Q
Q
D latch
IE OE
D
E
Q
Q
D latch
IE OE
D
E
Q
Q
D latch
IE OE
cps 104 5
3-Port Register Cell
Q
Q
D ata-In
D E nable OutA OutB
Bus-B
Bus-A
Bus-C
Complement Q
• Stores one bit of a register • Can Read onto Bus-A & Bus-B and Write from Bus-C Simultaneously
cps 104 6
Q
Q
Bus-B
Bus-A
Bus-C
Q
Q
Bus-B
Bus-A
Bus-C
Bit-0
Bit-1
EA EB EC
3-Port Register File
cps 104 7
Address Decode Circuit
A0 A1 EA B0 B1 EB C0 C1 EC
Q
Q
Data-in
OutA OutB DEnable Bus-A
Bus-B Bus-C
Register address: 01
cps 104 8
Reg-0 Reg-1 Reg-2 Reg-3
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
One-bit Cell
A3 B3 C3
A-En Add-A1 Add-A0 B -En Add-B 1 Add-B 0
C -En Add-C 1 Add-C 0
A1 B1 C1
A2 B2 C2
A0 B0 C0
Register File (Four 4-bit Registers)
cps 104 9
Digital Logic Summary
° Given Boolean function, generate a circuit to “realize” the function.
° Constructed circuits that can add and subtract.
° The ALU: a circuit that can add, subtract, detect overflow, compare, and do bit-wise operations (AND, OR, NOT)
° Shifter
° Memory Elements: SR-Latch, D Latch, D Flip-Flop
° Tri-state drivers & Bus Communication vs. MUX
° Register Files
° Control Signals modify what the circuit does with inputs • ALU, Shift, Register Read/Write
cps 104 10
What is Computer Architecture?
• Coordination of levels of abstraction
I/O system CPU
Compiler
Operating System
Application
Digital Design Circuit Design
• Under a set of rapidly changing technology Forces
Instruction Set Architecture, Memory, I/O
Firmware
Memory
Software
Hardware
Interface Between HW and SW
cps 104 11
The Big Picture: Where are We Now?
The Five Classic Components of a Computer
Control
Datapath
Memory
Processor Input
Output
Today’s Topic: Datapath Design
cps 104 12
Datapath Design
° How do we build hardware to implement the MIPS instructions?
° Add, LW, SW, Beq, Jump
cps 104 13
An Abstract View of the Implementation
Clk
5
Rw Ra Rb 32 32-bit Registers
Rd
AL
U
Clk
Data In
DataOut
Data Address Ideal
Data Memory
Instruction
Instruction Address
Ideal Instruction
Memory
Clk PC
5 Rs
5 Rt
16 Imm
32
32 32 32
A
B
cps 104 14
The MIPS Instruction Formats
° All MIPS instructions are 32 bits long. The three instruction formats:
• R-type
• I-type
• J-type
° The different fields are: • op: operation of the instruction • rs, rt, rd: the source and destination register specifiers • shamt: shift amount • funct: selects the variant of the operation in the “op” field • address / immediate: address offset or immediate value • target address: target address of the jump instruction
op target address 0 26 31
6 bits 26 bits
op rs rt rd shamt funct 0 6 11 16 21 26 31
6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
op rs rt immediate 0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
cps 104 15
The MIPS Subset (We can’t implement them all!)
° ADD and subtract • add rd, rs, rt • sub rd, rs, rt
° OR Immediate: • ori rt, rs, imm16
° LOAD and STORE • lw rt, rs, imm16 • sw rt, rs, imm16
° BRANCH: • beq rs, rt, imm16
° JUMP: • j target
op target address 0 26 31
6 bits 26 bits
op rs rt rd shamt funct 0 6 11 16 21 26 31
6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
op rs rt immediate 0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
cps 104 16
The Hardware “Program”
Instruction Fetch
Instruction Decode
Operand Fetch
Execute
Result Store
Next Instruction
How do I build the hardware to implement the MIPS instructions and their sequencing?
cps 104 17
Combinational Logic Elements (Basic Building Blocks)
° Adder
° MUX
° ALU
32
32
A
B 32
Sum
Carry
32
32
A
B 32
Result
Zero
OP
32 A
B 32
Y 32
Select
Adder
MU
X
AL
U
CarryIn
cps 104 18
Storage Element: Register (Basic Building Block)
° Register • Similar to the D Flip Flop except
- N-bit input and output - Write Enable input
• Write Enable: - negated (0): Data Out will not change - asserted (1): Data Out will become the same
as Data In.
Clk
Data In
Write Enable
N N
Data Out
cps 104 19
Storage Element: Register File
° Register File consists of 32 registers: • Two 32-bit output busses: busA and busB • One 32-bit input bus: busW
° Register is selected by: • RA selects the register to put on busA • RB selects the register to put on busB • RW selects the register to be written
via busW when Write Enable is 1
° Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block:
- RA or RB valid => busA or busB valid after “access time.”
Clk
busW
Write Enable
32 32
busA
32 busB
5 5 5 RW RA RB
32 32-bit Registers
cps 104 20
Storage Element: Idealized Memory
° Memory (idealized) • One input bus: Data In • One output bus: Data Out
° Memory word is selected by: • Write Enable = 0: Address selects the word to put on the
Data Out bus • Write Enable = 1: Address selects the memory
word to be written via the Data In bus
° Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic
block: - Address valid => Data Out valid after “access time.”
Clk
Data In
Write Enable
32 32 DataOut
Address
cps 104 21
An Abstract View of the Implementation
Clk
5
Rw Ra Rb 32 32-bit Registers
Rd
AL
U
Clk
Data In
DataOut
Data Address Ideal
Data Memory
Instruction
Instruction Address
Ideal Instruction
Memory
Clk PC
5 Rs
5 Rt
16 Imm
32
32 32 32
A
B
cps 104 22
Clocking Methodology
All storage elements are clocked by the same clock edge
Cycle Time >= CLK-to-Q + Longest Delay Path + Setup + Clock Skew
Longest delay path = critical path
Clk
Don’t Care Setup Hold
.
.
.
.
.
.
.
.
.
.
.
.
Setup Hold
cps 104 23
An Abstract View of the Critical Path ° Register file and ideal memory:
• The CLK input is a factor ONLY during write operation • During read operation, behave as combinational logic:
- Address valid => Output valid after “access time.”
Clk
5
Rw Ra Rb 32 32-bit Registers
Rd A
LU
Clk
Data In
DataOut
Data Address Ideal
Data Memory
Instruction
Instruction Address
Ideal Instruction
Memory
Clk PC
5 Rs
5 Rt
16 Imm
32
32 32 32
cps 104 24
The Steps of Designing a Processor
° Instruction Set Architecture => Register Transfer Language
° Register Transfer Language => • Datapath components • Datapath interconnect
° Datapath components => Control signals
° Control signals => Control logic
cps 104 25
Overview of the Instruction Fetch Unit
° The common Register Transfer Language (RTL) operations • Fetch the Instruction: mem[PC] • Update the program counter:
- Sequential Code: PC
cps 104 26
RTL: The ADD Instruction
° add rd, rs, rt
• mem[PC] Fetch the instruction from memory
• R[rd]
cps 104 27
RTL: The Load Instruction
° lw rt, rs, imm16
• mem[PC] Fetch the instruction from memory
• Address
cps 104 28
RTL: The ADD Instruction
° add rd, rs, rt
• mem[PC] Fetch the instruction from memory
• R[rd]
cps 104 29
RTL: The Subtract Instruction
° sub rd, rs, rt
• mem[PC] Fetch the instruction from memory
• R[rd]
cps 104 30
Datapath for Register-Register Operations ° R[rd]
cps 104 31
Register-Register Timing
32
Result
ALUctr
Clk
busW
RegWr
32 32
busA
32 busB
5 5 5
Rw Ra Rb 32 32-bit Registers
Rs Rt Rd
AL
U
Register Write Occurs Here
Clk
PC
Rs, Rt, Rd, Op, Func
Clk-to-Q
ALUctr
Instruction Memory Access Time
Old Value
New Value
RegWr Old Value
New Value
Delay through Control Logic
busA, B Register File Access Time
Old Value
New Value
busW ALU Delay
Old Value
New Value
Old Value
New Value
New Value Old Value
Inst fetch Decode Opr. fetch Execute Write Back
cps 104 32
RTL: The OR Immediate Instruction
° ori rt, rs, imm16
• mem[PC] Fetch the instruction from memory
• R[rt]
cps 104 33
Datapath for Logical Operations with Immediate ° R[rt]
cps 104 34
RTL: The Load Instruction
° lw rt, rs, imm16
• mem[PC] Fetch the instruction from memory
• Address
cps 104 35
Datapath for Load Operations ° R[rt]
cps 104 36
RTL: The Store Instruction
° sw rt, rs, imm16
• mem[PC] Fetch the instruction from memory
• Address
cps 104 37
Datapath for Store Operations ° Mem[R[rs] + SignExt[imm16]
cps 104 38
RTL: The Branch Instruction
° beq rs, rt, imm16
• mem[PC] Fetch the instruction from memory
• Cond
cps 104 39
Datapath for Branch Operations ° beq rs, rt, imm16 We need to compare Rs and Rt!
op rs rt immediate 0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
ALUctr
Clk
busW
RegWr
32 32
busA
32 busB
5 5 5
Rw Ra Rb 32 32-bit Registers
Rs
Rt
Rt
Rd RegDst
Extender
Mux
Mux
32 16
imm16
ALUSrc
ExtOp
AL
U
PC Clk
Next Address Logic 16
imm16
Branch
To Instruction Memory
Zero
cps 104 40
Binary Arithmetic for the Next Address
° In theory, the PC is a 32-bit byte address into the instruction memory: • Sequential operation: PC = PC + 4 • Branch operation: PC = PC + 4 + SignExt[Imm16] * 4
° The magic number “4” always comes up because: • The 32-bit PC is a byte address • And all our instructions are 4 bytes (32 bits) long
° In other words: • The 2 LSBs of the 32-bit PC are always zeros • There is no reason to have hardware to keep the 2 LSBs
° In practice, we can simplify the hardware by using a 30-bit PC: • Sequential operation: PC = PC + 1 • Branch operation: PC = PC + 1 + SignExt[Imm16] • In either case: Instruction-Memory-Address = PC concat “00”
cps 104 41
Next Address Logic: Expensive and Fast Solution
° Using a 30-bit PC: • Sequential operation: PC = PC + 1 • Branch operation: PC = PC + 1 + SignExt[Imm16] • In either case: Instruction-Memory-Address = PC concat “00”
30 30
SignExt
30
16 imm16
Mux
0
1
Adder
“1”
PC
Clk A
dder
30
30
Branch Zero
Addr
Instruction Memory
Addr “00”
32
Instruction Instruction
30
cps 104 42
Next Address Logic
30
30 SignExt
30 16 imm16
Mux
0
1
Adder
“0”
PC
Clk
30
Branch Zero
Addr
Instruction Memory
Addr “00”
32
Instruction
30
“1”
Carry In
Instruction
cps 104 43
RTL: The Jump Instruction
° j target
• mem[PC] Fetch the instruction from memory
• PC
cps 104 44
Instruction Fetch Unit
30 30
SignExt
30
16 imm16
Mux
0
1
Adder “1”
PC
Clk A
dder 30
30
Branch Zero
“00”
Addr
Instruction Memory
Addr
32
Mux
1
0
26
4 PC+4
Target 30
° j target • PC
cps 104 45
Putting it All Together: A Single Cycle Datapath
32
ALUctr
Clk
busW
RegWr
32 32
busA
32 busB
5 5 5
Rw Ra Rb 32 32-bit Registers
Rs
Rt
Rt
Rd RegDst
Extender
Mux
Mux
32 16 imm16
ALUSrc
ExtOp
Mux
MemtoReg
Clk
Data In WrEn
32 Adr
Data Memory
32
MemWr A
LU
Instruction Fetch Unit
Clk
Zero
Instruction
Jump
Branch
° We have everything except control signals.
0
1
0
1
0 1
Imm16 Rd Rs Rt