45
cps 104 1 Designing a Single Cycle Datapath Computer Science 104 Alvin R. Lebeck

Designing a Single Cycle Datapathdb.cs.duke.edu › courses › spring12 › cps104 › slides › lecture_datapath_1.pdfAn Abstract View of the Critical Path ° Register file and

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

  • cps 104 1

    Designing a Single Cycle Datapath

    Computer Science 104 Alvin R. Lebeck

  • cps 104 2

    Administrivia

     Homework Due Friday March 2

     Prof. Hilton no office hours this week

     Reading Ch 4.

     Review

     Register File

     Where are we with respect to the BIG picture?

     The Steps of Designing a Processor

     Datapath Design

  • cps 104 3

    What did you learn last time?

  • cps 104 4

    Register Cells on a bus

    One can “source” and “sink” from any cell on the bus by activating the right controls, IE--input enable, and OE--output enable.

    D

    E

    Q

    Q

    D latch

    IE OE

    D

    E

    Q

    Q

    D latch

    IE OE

    D

    E

    Q

    Q

    D latch

    IE OE

    D

    E

    Q

    Q

    D latch

    IE OE

  • cps 104 5

    3-Port Register Cell

    Q

    Q

    D ata-In

    D E nable OutA OutB

    Bus-B

    Bus-A

    Bus-C

    Complement Q

    •  Stores one bit of a register •  Can Read onto Bus-A & Bus-B and Write from Bus-C Simultaneously

  • cps 104 6

    Q

    Q

    Bus-B

    Bus-A

    Bus-C

    Q

    Q

    Bus-B

    Bus-A

    Bus-C

    Bit-0

    Bit-1

    EA EB EC

    3-Port Register File

  • cps 104 7

    Address Decode Circuit

    A0 A1 EA B0 B1 EB C0 C1 EC

    Q

    Q

    Data-in

    OutA OutB DEnable Bus-A

    Bus-B Bus-C

    Register address: 01

  • cps 104 8

    Reg-0 Reg-1 Reg-2 Reg-3

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    One-bit Cell

    A3 B3 C3

    A-En Add-A1 Add-A0 B -En Add-B 1 Add-B 0

    C -En Add-C 1 Add-C 0

    A1 B1 C1

    A2 B2 C2

    A0 B0 C0

    Register File (Four 4-bit Registers)

  • cps 104 9

    Digital Logic Summary

    ° Given Boolean function, generate a circuit to “realize” the function.

    ° Constructed circuits that can add and subtract.

    ° The ALU: a circuit that can add, subtract, detect overflow, compare, and do bit-wise operations (AND, OR, NOT)

    ° Shifter

    ° Memory Elements: SR-Latch, D Latch, D Flip-Flop

    ° Tri-state drivers & Bus Communication vs. MUX

    ° Register Files

    ° Control Signals modify what the circuit does with inputs •  ALU, Shift, Register Read/Write

  • cps 104 10

    What is Computer Architecture?

    •  Coordination of levels of abstraction

    I/O system CPU

    Compiler

    Operating System

    Application

    Digital Design Circuit Design

    •  Under a set of rapidly changing technology Forces

    Instruction Set Architecture, Memory, I/O

    Firmware

    Memory

    Software

    Hardware

    Interface Between HW and SW

  • cps 104 11

    The Big Picture: Where are We Now?

     The Five Classic Components of a Computer

    Control

    Datapath

    Memory

    Processor Input

    Output

    Today’s Topic: Datapath Design

  • cps 104 12

    Datapath Design

    ° How do we build hardware to implement the MIPS instructions?

    ° Add, LW, SW, Beq, Jump

  • cps 104 13

    An Abstract View of the Implementation

    Clk

    5

    Rw Ra Rb 32 32-bit Registers

    Rd

    AL

    U

    Clk

    Data In

    DataOut

    Data Address Ideal

    Data Memory

    Instruction

    Instruction Address

    Ideal Instruction

    Memory

    Clk PC

    5 Rs

    5 Rt

    16 Imm

    32

    32 32 32

    A

    B

  • cps 104 14

    The MIPS Instruction Formats

    ° All MIPS instructions are 32 bits long. The three instruction formats:

    • R-type

    •  I-type

    •  J-type

    °  The different fields are: •  op: operation of the instruction •  rs, rt, rd: the source and destination register specifiers •  shamt: shift amount •  funct: selects the variant of the operation in the “op” field •  address / immediate: address offset or immediate value •  target address: target address of the jump instruction

    op target address 0 26 31

    6 bits 26 bits

    op rs rt rd shamt funct 0 6 11 16 21 26 31

    6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

    op rs rt immediate 0 16 21 26 31

    6 bits 16 bits 5 bits 5 bits

  • cps 104 15

    The MIPS Subset (We can’t implement them all!)

    ° ADD and subtract •  add rd, rs, rt •  sub rd, rs, rt

    ° OR Immediate: •  ori rt, rs, imm16

    °  LOAD and STORE •  lw rt, rs, imm16 •  sw rt, rs, imm16

    ° BRANCH: •  beq rs, rt, imm16

    °  JUMP: •  j target

    op target address 0 26 31

    6 bits 26 bits

    op rs rt rd shamt funct 0 6 11 16 21 26 31

    6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

    op rs rt immediate 0 16 21 26 31

    6 bits 16 bits 5 bits 5 bits

  • cps 104 16

    The Hardware “Program”

    Instruction Fetch

    Instruction Decode

    Operand Fetch

    Execute

    Result Store

    Next Instruction

    How do I build the hardware to implement the MIPS instructions and their sequencing?

  • cps 104 17

    Combinational Logic Elements (Basic Building Blocks)

    ° Adder

    ° MUX

    ° ALU

    32

    32

    A

    B 32

    Sum

    Carry

    32

    32

    A

    B 32

    Result

    Zero

    OP

    32 A

    B 32

    Y 32

    Select

    Adder

    MU

    X

    AL

    U

    CarryIn

  • cps 104 18

    Storage Element: Register (Basic Building Block)

    ° Register •  Similar to the D Flip Flop except

    -  N-bit input and output -  Write Enable input

    • Write Enable: -  negated (0): Data Out will not change -  asserted (1): Data Out will become the same

    as Data In.

    Clk

    Data In

    Write Enable

    N N

    Data Out

  • cps 104 19

    Storage Element: Register File

    ° Register File consists of 32 registers: •  Two 32-bit output busses: busA and busB • One 32-bit input bus: busW

    ° Register is selected by: • RA selects the register to put on busA • RB selects the register to put on busB • RW selects the register to be written

    via busW when Write Enable is 1

    ° Clock input (CLK) •  The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block:

    -  RA or RB valid => busA or busB valid after “access time.”

    Clk

    busW

    Write Enable

    32 32

    busA

    32 busB

    5 5 5 RW RA RB

    32 32-bit Registers

  • cps 104 20

    Storage Element: Idealized Memory

    ° Memory (idealized) • One input bus: Data In • One output bus: Data Out

    ° Memory word is selected by: • Write Enable = 0: Address selects the word to put on the

    Data Out bus • Write Enable = 1: Address selects the memory

    word to be written via the Data In bus

    ° Clock input (CLK) •  The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic

    block: -  Address valid => Data Out valid after “access time.”

    Clk

    Data In

    Write Enable

    32 32 DataOut

    Address

  • cps 104 21

    An Abstract View of the Implementation

    Clk

    5

    Rw Ra Rb 32 32-bit Registers

    Rd

    AL

    U

    Clk

    Data In

    DataOut

    Data Address Ideal

    Data Memory

    Instruction

    Instruction Address

    Ideal Instruction

    Memory

    Clk PC

    5 Rs

    5 Rt

    16 Imm

    32

    32 32 32

    A

    B

  • cps 104 22

    Clocking Methodology

     All storage elements are clocked by the same clock edge

     Cycle Time >= CLK-to-Q + Longest Delay Path + Setup + Clock Skew

     Longest delay path = critical path

    Clk

    Don’t Care Setup Hold

    .

    .

    .

    .

    .

    .

    .

    .

    .

    .

    .

    .

    Setup Hold

  • cps 104 23

    An Abstract View of the Critical Path ° Register file and ideal memory:

    •  The CLK input is a factor ONLY during write operation • During read operation, behave as combinational logic:

    -  Address valid => Output valid after “access time.”

    Clk

    5

    Rw Ra Rb 32 32-bit Registers

    Rd A

    LU

    Clk

    Data In

    DataOut

    Data Address Ideal

    Data Memory

    Instruction

    Instruction Address

    Ideal Instruction

    Memory

    Clk PC

    5 Rs

    5 Rt

    16 Imm

    32

    32 32 32

  • cps 104 24

    The Steps of Designing a Processor

    °  Instruction Set Architecture => Register Transfer Language

    ° Register Transfer Language => • Datapath components • Datapath interconnect

    ° Datapath components => Control signals

    ° Control signals => Control logic

  • cps 104 25

    Overview of the Instruction Fetch Unit

    °  The common Register Transfer Language (RTL) operations •  Fetch the Instruction: mem[PC] • Update the program counter:

    -  Sequential Code: PC

  • cps 104 26

    RTL: The ADD Instruction

    °  add rd, rs, rt

    • mem[PC] Fetch the instruction from memory

    • R[rd]

  • cps 104 27

    RTL: The Load Instruction

    °  lw rt, rs, imm16

    • mem[PC] Fetch the instruction from memory

    • Address

  • cps 104 28

    RTL: The ADD Instruction

    °  add rd, rs, rt

    • mem[PC] Fetch the instruction from memory

    • R[rd]

  • cps 104 29

    RTL: The Subtract Instruction

    °  sub rd, rs, rt

    • mem[PC] Fetch the instruction from memory

    • R[rd]

  • cps 104 30

    Datapath for Register-Register Operations ° R[rd]

  • cps 104 31

    Register-Register Timing

    32

    Result

    ALUctr

    Clk

    busW

    RegWr

    32 32

    busA

    32 busB

    5 5 5

    Rw Ra Rb 32 32-bit Registers

    Rs Rt Rd

    AL

    U

    Register Write Occurs Here

    Clk

    PC

    Rs, Rt, Rd, Op, Func

    Clk-to-Q

    ALUctr

    Instruction Memory Access Time

    Old Value

    New Value

    RegWr Old Value

    New Value

    Delay through Control Logic

    busA, B Register File Access Time

    Old Value

    New Value

    busW ALU Delay

    Old Value

    New Value

    Old Value

    New Value

    New Value Old Value

    Inst fetch Decode Opr. fetch Execute Write Back

  • cps 104 32

    RTL: The OR Immediate Instruction

    °  ori rt, rs, imm16

    • mem[PC] Fetch the instruction from memory

    • R[rt]

  • cps 104 33

    Datapath for Logical Operations with Immediate ° R[rt]

  • cps 104 34

    RTL: The Load Instruction

    °  lw rt, rs, imm16

    • mem[PC] Fetch the instruction from memory

    • Address

  • cps 104 35

    Datapath for Load Operations ° R[rt]

  • cps 104 36

    RTL: The Store Instruction

    °  sw rt, rs, imm16

    • mem[PC] Fetch the instruction from memory

    • Address

  • cps 104 37

    Datapath for Store Operations ° Mem[R[rs] + SignExt[imm16]

  • cps 104 38

    RTL: The Branch Instruction

    °  beq rs, rt, imm16

    • mem[PC] Fetch the instruction from memory

    • Cond

  • cps 104 39

    Datapath for Branch Operations °  beq rs, rt, imm16 We need to compare Rs and Rt!

    op rs rt immediate 0 16 21 26 31

    6 bits 16 bits 5 bits 5 bits

    ALUctr

    Clk

    busW

    RegWr

    32 32

    busA

    32 busB

    5 5 5

    Rw Ra Rb 32 32-bit Registers

    Rs

    Rt

    Rt

    Rd RegDst

    Extender

    Mux

    Mux

    32 16

    imm16

    ALUSrc

    ExtOp

    AL

    U

    PC Clk

    Next Address Logic 16

    imm16

    Branch

    To Instruction Memory

    Zero

  • cps 104 40

    Binary Arithmetic for the Next Address

    °  In theory, the PC is a 32-bit byte address into the instruction memory: •  Sequential operation: PC = PC + 4 • Branch operation: PC = PC + 4 + SignExt[Imm16] * 4

    °  The magic number “4” always comes up because: •  The 32-bit PC is a byte address • And all our instructions are 4 bytes (32 bits) long

    °  In other words: •  The 2 LSBs of the 32-bit PC are always zeros •  There is no reason to have hardware to keep the 2 LSBs

    °  In practice, we can simplify the hardware by using a 30-bit PC: •  Sequential operation: PC = PC + 1 • Branch operation: PC = PC + 1 + SignExt[Imm16] •  In either case: Instruction-Memory-Address = PC concat “00”

  • cps 104 41

    Next Address Logic: Expensive and Fast Solution

    ° Using a 30-bit PC: •  Sequential operation: PC = PC + 1 • Branch operation: PC = PC + 1 + SignExt[Imm16] •  In either case: Instruction-Memory-Address = PC concat “00”

    30 30

    SignExt

    30

    16 imm16

    Mux

    0

    1

    Adder

    “1”

    PC

    Clk A

    dder

    30

    30

    Branch Zero

    Addr

    Instruction Memory

    Addr “00”

    32

    Instruction Instruction

    30

  • cps 104 42

    Next Address Logic

    30

    30 SignExt

    30 16 imm16

    Mux

    0

    1

    Adder

    “0”

    PC

    Clk

    30

    Branch Zero

    Addr

    Instruction Memory

    Addr “00”

    32

    Instruction

    30

    “1”

    Carry In

    Instruction

  • cps 104 43

    RTL: The Jump Instruction

    °  j target

    • mem[PC] Fetch the instruction from memory

    •  PC

  • cps 104 44

    Instruction Fetch Unit

    30 30

    SignExt

    30

    16 imm16

    Mux

    0

    1

    Adder “1”

    PC

    Clk A

    dder 30

    30

    Branch Zero

    “00”

    Addr

    Instruction Memory

    Addr

    32

    Mux

    1

    0

    26

    4 PC+4

    Target 30

    °  j target •  PC

  • cps 104 45

    Putting it All Together: A Single Cycle Datapath

    32

    ALUctr

    Clk

    busW

    RegWr

    32 32

    busA

    32 busB

    5 5 5

    Rw Ra Rb 32 32-bit Registers

    Rs

    Rt

    Rt

    Rd RegDst

    Extender

    Mux

    Mux

    32 16 imm16

    ALUSrc

    ExtOp

    Mux

    MemtoReg

    Clk

    Data In WrEn

    32 Adr

    Data Memory

    32

    MemWr A

    LU

    Instruction Fetch Unit

    Clk

    Zero

    Instruction

    Jump

    Branch

    ° We have everything except control signals.

    0

    1

    0

    1

    0 1

    Imm16 Rd Rs Rt