9/4/2018
EEC 581
Computer Architecture
Datapath
Department of Electrical Engineering and Computer Science
Cleveland State University
Architects write the checks that the design engineers have
to cash. If the amount is too high, the whole project
goes bankrupt.
Design engineers must constantly juggle many conflicting
demands: schedule, performance, power dissipation,
features, testing, documentation, training and hiring.
The Pentium Chronicles, Colwell, pg. 64 & 63
Review: MIPS Organization
[Figure: MIPS organization. Processor: PC, ALU, and a Register File of 32 32-bit registers ($zero - $ra) with 5-bit src1 addr, src2 addr, and dst addr inputs, a 32-bit write data input, and 32-bit src1 data and src2 data outputs. Memory: 2^30 words of 32 bits, with read/write addr, read data, and write data lines; word addresses (binary) run 0…0000, 0…0100, 0…1000, 0…1100, up to 1…1100; byte addresses are big endian. Two 32-bit adders handle PC + 4 and the br offset. Execution cycle: Fetch (PC = PC + 4), Decode, Exec.]
The Processor: Datapath and Control
We're ready to look at an implementation of MIPS, simplified to contain only:
  memory-reference instructions: lw, sw
  arithmetic-logical instructions: add, sub, and, or, slt
  control flow instructions: beq, j
Generic implementation:
  use the program counter (PC) to supply the instruction address
  get the instruction from memory
  read registers
  use the instruction to decide exactly what to do
Execution cycle: Fetch (PC = PC + 4), Decode, Exec
Ada Lovelace, 1815-1852
Wrote the first computer program.
It calculated the Bernoulli numbers using Charles Babbage’s
Analytical Engine.
She was the only legitimate child of the poet Lord Byron.
Charles Babbage's Analytical Engine, 1871.
This was the first fully-automatic calculating machine.
More Implementation Details
Abstract / Simplified View:
Two types of functional units:
  Elements that operate on data values (combinational)
  Elements that contain state (sequential)
[Figure: abstract view of the datapath: PC, Instruction memory (Address, Instruction), Registers (Register #, Data), ALU, and Data memory (Address, Data).]
State Elements
Unclocked vs. clocked
Clocks are used in synchronous logic
  When should an element that contains state be updated?
[Figure: clock waveform showing cycle time, rising edge, and falling edge.]
Combinational Logic Review
Combinational logic circuits are memoryless
No feedback in combinational logic circuits
The output equals the function implemented by the logic network once the switching transients have settled
Outputs can have multiple logical transitions before settling to the correct value
[Figure: Input feeds a Combinational Circuit, which produces Output.]
Sequential Logic Circuits
Sequential circuits consist of
  Combinational logic circuits
  State information (stored in memory)
Output is a function of inputs and present state
Can be synchronous or asynchronous
  Controlled by a periodic clock or an event trigger
[Figure: inputs and Present State feed the combinational circuits, which produce outputs and Next State; Next State passes through a storage element (delay) to become Present State.]
An unclocked state element: the set-reset latch
The output depends on present inputs and also on past inputs.
It consists of two cross-coupled NOR gates, with two inputs, S and R, and two outputs, Q and QN.
It is similar to the cross-coupled inverters, but its state can be controlled by S and R, which set and reset the output Q.
[Figure: SR latch built from two cross-coupled NOR gates, with inputs S and R and outputs Q and QN.]
  S  R  Q  QN
  0  0  Q  QN   No change
  0  1  0  1    Reset
  1  0  1  0    Set
  1  1  0  0    Undefined
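The behaviour of the cross-coupled NOR gates can be checked with a small simulation. This is only a sketch: the function names `nor` and `sr_latch` and the fixed iteration limit are assumptions made for illustration, and real gate delays are not modelled.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(s: int, r: int, q: int, qn: int) -> tuple:
    """Apply S and R to a latch currently holding (q, qn).

    Both NOR gates are re-evaluated together until the outputs stop
    changing (assumed limit of 4 rounds, enough for the cases below).
    """
    for _ in range(4):
        new_q = nor(r, qn)   # Q  = NOR(R, QN)
        new_qn = nor(s, q)   # QN = NOR(S, Q)
        if (new_q, new_qn) == (q, qn):
            break
        q, qn = new_q, new_qn
    return q, qn


print(sr_latch(1, 0, 0, 1))  # set
print(sr_latch(0, 0, 1, 0))  # no change
print(sr_latch(0, 1, 1, 0))  # reset
```

With S = R = 1 both outputs are driven to 0, which is why that row is listed as undefined: Q and QN are no longer complements.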
Latches and Flip-flops
Output is equal to the stored value inside the element
  (don't need to ask for permission to look at the value)
Change of state (value) is based on the clock
  Latches: state changes whenever the inputs change and the clock is asserted (level-sensitive latch)
    "asserted" means logically true, which could mean electrically low
  Flip-flops: state changes only on a clock edge (edge-triggered methodology)
    master-slave and edge-triggered flip-flops
A clocking methodology defines when signals can be read and written
  we wouldn't want to read a signal at the same time it was being written
Thus, flip-flop refers to a bistable element. (Edge-triggered registers are also called flip-flops.)
D-latch
Two inputs:
  the data value to be stored (D)
  the clock signal (C) indicating when to read and store D
Two outputs:
  the value of the internal state (Q) and its complement (_Q)
[Figure: D-latch gate diagram with inputs D and C and outputs Q and _Q, and a timing diagram of D, C, and Q.]
The latch is transparent when the clock is high (it copies the input to the output).
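Transparency can be seen by stepping a behavioural model through sampled waveforms. This is a minimal sketch; the function name `d_latch_trace` and the list-of-samples representation are assumptions for illustration, not part of the slides.

```python
def d_latch_trace(d_wave, c_wave, q0=0):
    """Return the Q value at each sample of a level-sensitive D-latch.

    While C is 1 the latch is transparent (Q follows D);
    while C is 0 it holds the last value it saw.
    """
    q = q0
    trace = []
    for d, c in zip(d_wave, c_wave):
        if c:          # clock asserted: copy input to output
            q = d
        trace.append(q)
    return trace


# D changes while C is low are ignored; changes while C is high pass through.
print(d_latch_trace([1, 0, 1, 1, 0], [0, 0, 1, 0, 0]))
```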
10T D Latch w/ Transmission Gates
[Figure: 10-transistor D latch with input D, clock C and its complement, and outputs Q and its complement.]
The circuit consists of a t-gate based multiplexer and
a non-inverting buffer (built as a cascade of two inverters).
Transmission Gates
Pass transistors produce degraded outputs
Transmission gates pass both 0 and 1 well
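  g = 0, gb = 1: the switch between a and b is open
  g = 1, gb = 0: the switch between a and b is closed
    Input 0 gives a strong 0 at the output
    Input 1 gives a strong 1 at the output
[Figure: transmission gate circuit and symbols, with terminals a and b and complementary controls g and gb.]
CMOS as a switch
  Amp
  Switch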
10T D Latch w/ Transmission Gates
Writing data (C = 1)
[Figure: 10T D latch with C = 1; the value D propagates to Q.]
When the clock input is high, the current value from the data input (D)
will propagate through the left transmission gate and through the inverters.
10T D Latch w/ Transmission Gates
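Holding data (C = 0)
[Figure: 10T D latch with C = 0; the new input D_new is blocked while the stored value D circulates through the feedback path.]
When the clock input is low, the output value Q of the latch is fed back into the input of the first-stage inverter. Therefore, the latch stores whatever value it held when the clock signal changed to low.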
Problem of Transparency
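[Figure: a transparent D-latch (inputs D and En, output Q) in a feedback loop, with a timing diagram of C, D, and Q; while the latch is transparent, the output is oscillating and unstable.]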
D flip-flop
Output changes only on the clock edge
[Figure: D flip-flop built from two D latches in series, clocked by C, with input D and outputs Q and _Q; timing diagram of D, C, and Q.]
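Chaining two latches on opposite clock phases gives edge-triggered behaviour. The sketch below assumes a falling-edge arrangement (master transparent while C = 1, slave transparent while C = 0); the class names `DLatch` and `DFlipFlop` and the helper `trace` are hypothetical, chosen only for this example.

```python
class DLatch:
    """Level-sensitive latch: transparent while c == 1."""

    def __init__(self):
        self.q = 0

    def update(self, d: int, c: int) -> int:
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Master-slave flip-flop (assumed falling-edge triggered)."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d: int, c: int) -> int:
        m = self.master.update(d, c)          # master follows D while C is high
        return self.slave.update(m, 1 - c)    # slave copies master while C is low


def trace(d_wave, c_wave):
    """Step a fresh flip-flop through sampled D and C waveforms."""
    ff = DFlipFlop()
    return [ff.update(d, c) for d, c in zip(d_wave, c_wave)]


# Q only picks up D's value when C goes from 1 to 0.
print(trace([1, 1, 1, 0, 0, 1], [0, 1, 1, 0, 0, 1]))
```

Because the two latches are never transparent at the same time, a change on D cannot race through to Q within one clock phase, which is what removes the transparency problem.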
Our Implementation
An edge triggered methodology
Typical execution:
read contents of some state elements,
send values through some combinational logic
write results to one or more state elements
[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds state element 2.]
Register File
Built using D flip-flops
[Figure: read ports of the register file. Read register number 1 and Read register number 2 each drive a multiplexor that selects among Register 0 through Register n, producing Read data 1 and Read data 2. The register file block has inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2.]
* More recently, register files can be implemented as fast static RAMs with multiple ports.
Register File
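Note: we still use the real clock to determine when to write
[Figure: write port of the register file. The Register number drives an n-to-1 decoder with outputs 0 through n; each decoder output is combined with the Write signal to drive the C input of one register (Register 0 through Register n), and Register data drives the D input of every register.]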
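The two read ports and the single clocked write port can be modelled behaviourally. This is a sketch under stated assumptions: the class name `RegisterFile` and the method names are hypothetical, and the hard-wiring of register 0 to zero follows the usual MIPS convention rather than anything shown on this slide.

```python
class RegisterFile:
    """Behavioural model: two combinational read ports, one clocked write port."""

    def __init__(self, n: int = 32):
        self.regs = [0] * n

    def read(self, read_reg1: int, read_reg2: int):
        # Reads are combinational: the outputs always reflect stored values.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write: bool, write_reg: int, write_data: int) -> None:
        # State changes only on the clock edge, and only if Write is asserted.
        # Assumption: register 0 ($zero) ignores writes, as in MIPS.
        if write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(True, 8, 42)     # write 42 into register 8
rf.clock_edge(False, 9, 7)     # Write not asserted: register 9 unchanged
print(rf.read(8, 9))
```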
Simple Implementation
Include the functional units we need for each
instruction
[Figure: the functional units.
  Instruction memory (Instruction address in, Instruction out), program counter (PC), and adder (Add, Sum)
  Registers (5-bit Read register 1, Read register 2, and Write register; Write data; RegWrite; Read data 1 and Read data 2) and ALU (3-bit ALU control; Zero and ALU result)
  Data memory unit (Address, Write data, Read data; MemRead, MemWrite) and sign-extension unit (16 bits in, 32 bits out)]
Fetching Instructions: Memory
Fetching instructions involves
reading the instruction from the Instruction Memory
updating the PC to hold the address of the next instruction
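[Figure: the PC drives the Read Address of the Instruction Memory, which outputs the Instruction; an adder (Add) computes PC + 4 and feeds it back to the PC.]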
Decoding Instructions: Register
Decoding instructions involves
  sending the fetched instruction's opcode and function field bits to the control unit
  reading two values from the Register File
    Register File addresses are contained in the instruction
[Figure: the Instruction supplies Read Addr 1, Read Addr 2, and Write Addr to the Register File (Write Data in; Read Data 1 and Read Data 2 out) and its opcode and function field bits to the Control Unit.]
Executing R Format Operations: ALU
R format operations (add, sub, slt, and, or)
perform the (op and funct) operation on values in rs and rt
store the result back into the Register File (into location rd)
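[Figure: the Instruction supplies Read Addr 1, Read Addr 2, and Write Addr to the Register File; Read Data 1 and Read Data 2 feed the ALU (ALU control in; overflow and zero out), and the ALU result returns to Write Data under control of RegWrite.]
R-type format:
  field  op     rs     rt     rd     shamt  funct
  bits   31-26  25-21  20-16  15-11  10-6   5-0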
The Register File is not written every cycle (e.g. sw), so we need an
explicit write control signal for the Register File
Executing Load and Store Operations
Load and store operations involve
  computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
  for a store, writing the value (read from the Register File during decode) to the Data Memory
  for a load, writing the value read from the Data Memory to the Register File
[Figure: the R-format datapath (Register File, ALU with ALU control, overflow, and zero, RegWrite) extended with a Sign Extend unit (16 bits to 32 bits) feeding the ALU and a Data Memory (Address, Write Data, Read Data; MemWrite, MemRead) whose Read Data returns to the Register File.]
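The address computation for lw and sw can be sketched as follows. The function names `sign_extend16` and `effective_address` are hypothetical, and 32-bit wraparound is modelled with a mask.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """Memory address = base register + sign-extended 16-bit offset (mod 2^32)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


# Positive offset, as in lw $1, 100($2) with $2 = 0x1000
print(hex(effective_address(0x1000, 100)))
# Negative offset: 0xFFFC is -4 after sign extension
print(hex(effective_address(0x1000, 0xFFFC)))
```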
Executing Branch Operations
Branch operations involve
  comparing the operands read from the Register File during decode for equality (zero ALU output)
  computing the branch target address by adding the updated PC (PC + 4) to the 16-bit sign-extended offset field in the instruction, shifted left by 2 bits
[Figure: the Register File outputs feed the ALU, whose zero output goes to the branch control logic; the 16-bit offset is sign-extended to 32 bits, shifted left 2, and added to PC + 4 to form the branch target address.]
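The two halves of beq, the equality test and the target computation, can be sketched together. The function names below are hypothetical, and register values are passed in directly rather than read from a register file.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset << 2), modulo 2^32."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Select the next PC for beq: the ALU subtracts and checks Zero."""
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(next_pc_beq(0x00400000, 5, 5, 3)))  # taken
print(hex(next_pc_beq(0x00400000, 5, 6, 3)))  # not taken
```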
Executing Jump Operations
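Jump operations involve
  replacing the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits
[Figure: the fetch datapath (PC, Instruction Memory, PC + 4 adder) extended for jumps: the 26-bit instruction field is shifted left 2 to give 28 bits, which are combined with 4 bits of the PC to form the jump address.]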
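The jump address is a bit-level splice, which a few mask operations make concrete. This sketch assumes the upper 4 bits come from PC + 4, as in the standard MIPS definition; the function name `jump_address` is hypothetical.

```python
def jump_address(pc: int, instr: int) -> int:
    """Form the 32-bit jump target from the current PC and a fetched j instruction.

    upper 4 bits : from PC + 4 (assumption: standard MIPS behaviour)
    lower 28 bits: instruction bits 25-0 shifted left by 2
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target_field = instr & 0x03FFFFFF        # lower 26 bits of the instruction
    return (pc_plus_4 & 0xF0000000) | (target_field << 2)


# j with opcode 2 and a 26-bit target field of 0x100000
instr = (2 << 26) | 0x0100000
print(hex(jump_address(0x00400010, instr)))
```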
Creating a Single Datapath from the Parts
Assemble the datapath segments and add control lines and
multiplexors as needed
Single cycle design: fetch, decode, and execute each
instruction in one clock cycle
no datapath resource can be used more than once per instruction,
so some must be duplicated (e.g., separate Instruction Memory
and Data Memory, several adders)
multiplexors needed at the input of shared elements with control
lines to do the selection
write signals to control writing to the Register File and Data
Memory
Cycle time is determined by length of the longest path
Building the Datapath
Use multiplexers to stitch them together
[Figure: the complete single-cycle datapath. PC, Instruction memory, Registers, ALU, Data memory, Sign extend, Shift left 2, and two adders are stitched together by four multiplexors under the control signals RegDst, ALUSrc, MemtoReg, PCSrc, RegWrite, MemRead, MemWrite, and ALUOp. Instruction fields used: [25-21] (Read register 1), [20-16] (Read register 2), [20-16] or [15-11] (Write register), [15-0] (Sign extend), and [5-0] (ALU control).]
R-Type Instructions (e.g. add $2, $3, $4; Not JR/JALR)
[Figure: the single-cycle datapath as used by an R-type instruction.]
I-Type Instructions (e.g. lw $4, 1000($15))
[Figure: the single-cycle datapath as used by a load instruction.]
I-type Instruction for Branches
(e.g. beq $4, $5, Label7)
[Figure: the single-cycle datapath as used by a branch-equal instruction.]
Control
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexer inputs)
Information comes from the 32 bits of the instruction
Example: add $8, $17, $18
Instruction format:
  op      rs     rt     rd     shamt  funct
  000000  10001  10010  01000  00000  100000
The ALU's operation is based on the instruction type and the function code
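The field layout in the example can be checked by packing and unpacking the six fields. This is a sketch; `encode_r` and `decode_r` are hypothetical helper names, and the field widths are the standard R-type widths (6, 5, 5, 5, 5, 6).

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack R-type fields into a 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r(word: int) -> dict:
    """Unpack a 32-bit word into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $8, $17, $18: op = 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32
word = encode_r(0, 17, 18, 8, 0, 32)
print(format(word, "032b"))
print(decode_r(word))
```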
ALU Control
e.g., what should the ALU do with this instruction?
Example: lw $1, 100($2)
  op  rs  rt  16-bit offset
  35  2   1   100
ALU control input:
  000  AND
  001  OR
  010  add
  110  subtract
  111  set-on-less-than
Why is the code for subtract 110 and not 011? What do you need for the slt instruction?
[Figure: ALU with a 3-bit ALU control input and outputs Zero and ALU result.]
The main control unit generates the ALUOp bits for the ALU control unit, and the ALU control unit generates the ALU control input, which reduces the size of the main control.
Supporting slt
[Figure: two 1-bit ALU slices supporting slt. Each takes a, b, Binvert, CarryIn, and Less, and a multiplexor with inputs 0 through 3 selected by Operation produces Result. The ordinary slice also produces CarryOut; the MSB logic block adds a Set output and overflow detection (Overflow).]
Overflow logic depends on whether the operation is an addition or a subtraction. Let a and b be the sign bits of the operands, and call the sign bit of the result (for addition or subtraction) Nf:
  if (addition) overflow = (a and b and (not Nf)) or ((not a) and (not b) and Nf)
i.e., for addition, if the sign bits of the operands are the same but the sign bit of the result is different, then overflow has occurred.
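The addition overflow rule translates directly into code. A minimal sketch for 32-bit operands follows; the function name `add32_overflow` is hypothetical, and the variables a, b, and nf mirror the sign bits named in the rule.

```python
def add32_overflow(x: int, y: int):
    """Add two 32-bit values; return (32-bit result, overflow flag)."""
    result = (x + y) & 0xFFFFFFFF
    a = (x >> 31) & 1        # sign bit of first operand
    b = (y >> 31) & 1        # sign bit of second operand
    nf = (result >> 31) & 1  # sign bit of the result
    overflow = (a and b and not nf) or ((not a) and (not b) and nf)
    return result, bool(overflow)


print(add32_overflow(0x7FFFFFFF, 1))   # largest positive + 1
print(add32_overflow(0xFFFFFFFF, 1))   # -1 + 1
```

Operands with different signs can never overflow on addition, which is why only the two same-sign cases appear in the expression.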
Control the ALU
Must describe hardware to compute the 3-bit ALU control input
  given the instruction type (ALUOp, computed from the instruction type by decoding inst[31:26]):
    00 = lw, sw
    01 = beq
    10 = arithmetic (incl. slt)
  and the function code for arithmetic (funct = inst[5:0])
Describe it using a truth table (can turn into gates):
  ALUOp1 ALUOp0 | F5 F4 F3 F2 F1 F0 | ALU control | operation
  0      0      | X  X  X  X  X  X  | 010         | add (lw/sw)
  X      1      | X  X  X  X  X  X  | 110         | sub (beq)
  1      X      | X  X  0  0  0  0  | 010         | add (arith)
  1      X      | X  X  0  0  1  0  | 110         | sub (arith)
  1      X      | X  X  0  1  0  0  | 000         | and (arith)
  1      X      | X  X  0  1  0  1  | 001         | or (arith)
  1      X      | X  X  1  0  1  0  | 111         | slt (arith)
[Figure: the ALU control block takes ALUOp and funct = inst[5:0] and drives the 3-bit ALU control input of the ALU, which outputs Zero and ALU result.]
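The truth table can be written as a small lookup function. This is a sketch: the name `alu_control` is hypothetical, the rows are applied in the order listed, and only the low four funct bits are examined because F5 and F4 are don't-cares.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and 6-bit funct field to the 3-bit ALU control input."""
    alu_op1 = (alu_op >> 1) & 1
    alu_op0 = alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b010                      # lw/sw: add
    if alu_op0 == 1:
        return 0b110                      # beq: subtract
    # ALUOp1 = 1: arithmetic, decided by funct bits F3..F0
    by_funct = {
        0b0000: 0b010,  # add
        0b0010: 0b110,  # sub
        0b0100: 0b000,  # and
        0b0101: 0b001,  # or
        0b1010: 0b111,  # slt
    }
    return by_funct[funct & 0xF]


print(format(alu_control(0b10, 0b100000), "03b"))  # add
print(format(alu_control(0b10, 0b101010), "03b"))  # slt
```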
Two level implementation
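[Figure: two-level control. The 6-bit opcode (instruction register bits 31-26) feeds Control 1, which produces the 2-bit ALUOp (00: lw, sw; 01: beq; 10: add, sub, and, or, slt). ALUOp and the 6-bit funct field (bits 5-0) feed Control 2, which produces the 3-bit ALU control (000: and; 001: or; 010: add; 110: sub; 111: set on less than) for the ALU.]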
ALU Control
Simple combinational logic (truth tables)
[Figure: ALU control block. Gate-level logic with inputs ALUOp (ALUOp1, ALUOp0) and F (5-0), of which F3, F2, F1, and F0 are used, and outputs Operation2, Operation1, and Operation0.]
Three Instruction Classes Format
R-type instructions all have an opcode of 0.
  R-type instructions have three operands: fields rs and rt are the sources, and rd is the destination.
Instruction format for load (opcode = 35) and store (opcode = 43) instructions:
  Register rs is the base register, which is added to the 16-bit address field to form the memory address.
  For loads, rt is the destination register for the loaded value.
  For stores, rt is the source register whose value should be stored into memory.
Instruction format for branch equal (opcode = 4):
  Registers rs and rt are the source registers that are compared for equality.
  The 16-bit address field is sign-extended, shifted, and added to the PC to compute the branch target address.
  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1
[Figure: the single-cycle datapath with the Control unit. Instruction [31-26] drives Control, which generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Annotation: use rt, not rd.]
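The control table is a pure function of the opcode, so it can be sketched as a lookup. The names `CONTROL_TABLE` and `main_control` are hypothetical; the opcodes (0, 35, 43, 4) are those given for the three instruction classes, and don't-care entries (X) are represented as `None`.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care

# One row per opcode, in the same column order as SIGNALS.
CONTROL_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),   # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),   # lw
    43: (X, 1, X, 0, 0, 1, 0, 0, 0),   # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0, 1),   # beq
}


def main_control(opcode: int) -> dict:
    """Return the control signals for a 6-bit opcode as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


print(main_control(35))
```

Because sw and beq never write a register, RegDst and MemtoReg are don't-cares for them: with RegWrite = 0 the values those multiplexors select are never stored.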
The Effect of the Seven Control Signals
Textbook Figure 5.16 p306.
Control Unit Signals
[Figure: the control unit as combinational logic. Inputs: Op5 through Op0 (Inst[31:26]), decoded into R-format, lw, sw, and beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0, which harness the datapath.]
Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down, and the right thing to be done
  The ALU might not produce the "right answer" right away
  We use write signals along with the clock to determine when to write
Cycle time is determined by the length of the longest path
We are ignoring some details like setup and hold times
[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds state element 2.]