4 29 03 ImplementingMIPS 0429


  • 8/12/2019 4 29 03 ImplementingMIPS 0429

    1/45

    1

    The Midterm is Coming

    Midterm on May 8th. Midterm review on May 6th. Come to class with questions.

    The midterm will cover everything before it.

    It will mostly resemble the homeworks and the reading quizzes.

    We will send out a reading quiz compendium on Thursday.

    It will be challenging. It will be curved.


    The Final is Also Coming (but more slowly)

    Despite what the online schedule says, we have only one final time, and it is:

    6/10/2014, 8:00am-11:00am.


    Implementing a MIPS Processor

    Readings: 4.1-4.9


    Goals for this Class

    Understand how CPUs run programs
      How do we express the computation to the CPU?
      How does the CPU execute it?
      How does the CPU support other system components (e.g., the OS)?
      What techniques and technologies are involved and how do they work?

    Understand why CPU performance (and other metrics) varies
      How does CPU design impact performance?
      What trade-offs are involved in designing a CPU?
      How can we meaningfully measure and compare computer systems?

    Understand why program performance varies
      How do program characteristics affect performance?
      How can we improve a program's performance by considering the CPU running it?
      How do other system components impact program performance?


    Foreshadowing

    Act I: A Single-cycle Processor
      Simplest design
      Not how many real machines work (maybe some deeply embedded processors)
      Figure out the basic parts; what it takes to execute instructions

    Act II: A Pipelined Processor
      This is how many real machines work
      Exploit parallelism by executing multiple instructions at once


    Target ISA

    We will focus on part of MIPS
      Enough to run into the interesting issues
      Memory operations
      A few arithmetic/logical operations (generalizing is straightforward)
      BEQ and J

    This corresponds pretty directly to what you'll be implementing in 141L.


    Basic Steps for Execution

    Fetch an instruction from the instruction store
    Decode it
      What does this instruction do?
    Gather inputs
      From the register file
      From memory
    Perform the operation
    Write back the outputs
      To register file or memory
    Determine the next instruction to execute
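The execution steps above can be sketched as a small interpreter loop. This is only an illustration under stated assumptions: instructions are pre-decoded Python tuples rather than 32-bit words, memory is a dictionary keyed by byte address, and the names `run`, `regs`, and `mem` are hypothetical, not from the slides.

```python
def run(program, regs, mem, max_steps=1000):
    """Run a list of pre-decoded instruction tuples until the PC leaves the program."""
    pc = 0  # byte address; each instruction occupies 4 bytes
    for _ in range(max_steps):
        if not 0 <= pc // 4 < len(program):
            break
        inst = program[pc // 4]          # 1. fetch
        op = inst[0]                     # 2. decode
        next_pc = pc + 4                 # default: fall through
        if op in ("add", "sub"):
            _, rd, rs, rt = inst
            a, b = regs[rs], regs[rt]    # 3. gather inputs from the register file
            result = a + b if op == "add" else a - b   # 4. perform the operation
            regs[rd] = result & 0xFFFFFFFF             # 5. write back
        elif op == "addi":
            _, rt, rs, imm = inst
            regs[rt] = (regs[rs] + imm) & 0xFFFFFFFF
        elif op == "lw":
            _, rt, imm, rs = inst
            regs[rt] = mem.get(regs[rs] + imm, 0)      # input comes from memory
        elif op == "sw":
            _, rt, imm, rs = inst
            mem[regs[rs] + imm] = regs[rt]             # output goes to memory
        elif op == "beq":
            _, rs, rt, offset = inst
            if regs[rs] == regs[rt]:
                next_pc = pc + 4 + (offset << 2)
        elif op == "j":
            next_pc = inst[1] << 2
        else:
            raise ValueError(f"unsupported op: {op}")
        regs[0] = 0                      # register 0 always reads as zero
        pc = next_pc                     # 6. determine the next instruction
    return regs, mem


regs = [0] * 32
mem = {}
program = [
    ("addi", 1, 0, 5),   # $1 = 5
    ("addi", 2, 0, 7),   # $2 = 7
    ("add", 3, 1, 2),    # $3 = $1 + $2
    ("sw", 3, 0, 0),     # MEM[0] = $3
    ("lw", 4, 0, 0),     # $4 = MEM[0]
]
run(program, regs, mem)
```

Every instruction passes through the same six steps in one loop iteration, which loosely mirrors a single-cycle design: one instruction finishes completely before the next begins.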


    The Processor Design Algorithm

    Once you have an ISA:

    Design/draw the datapath
      Identify and instantiate the hardware for your architectural state
      For each instruction
        Simulate the instruction
        Add and connect the datapath elements it requires
        Is it workable? If not, fix it.

    Design the control
      For each instruction
        Simulate the instruction
        What control lines do you need?
        How will you compute their value?
        Modify control accordingly
        Is it workable? If not, fix it.

    You've already done much of this in 141L.


    Arithmetic; R-Type

    Inst = Mem[PC]
    REG[rd] = REG[rs] op REG[rt]
    PC = PC + 4

    bits    31:26  25:21  20:16  15:11  10:6   5:0
    name    op     rs     rt     rd     shamt  funct
    # bits  6      5      5      5      5      6
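The field layout in the table can be turned into a decoder with shifts and masks. A minimal sketch; the function name `decode_r_type` is an assumption made for illustration, and the example word uses the standard MIPS encoding of `add` (op 0, funct 0x20).

```python
def decode_r_type(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31:26, 6 bits
        "rs":    (word >> 21) & 0x1F,  # bits 25:21, 5 bits
        "rt":    (word >> 16) & 0x1F,  # bits 20:16, 5 bits
        "rd":    (word >> 11) & 0x1F,  # bits 15:11, 5 bits
        "shamt": (word >> 6) & 0x1F,   # bits 10:6,  5 bits
        "funct": word & 0x3F,          # bits 5:0,   6 bits
    }


# Encode add $1, $2, $3 by hand: rd = 1, rs = 2, rt = 3.
word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x20
fields = decode_r_type(word)
```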


    ADDI; I-Type

    PC = PC + 4
    REG[rt] = REG[rs] op SignExtImm

    bits    31:26  25:21  20:16  15:0
    name    op     rs     rt     imm
    # bits  6      5      5      16
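SignExtImm widens the 16-bit immediate to 32 bits by copying its top bit. A small sketch of that step and of ADDI built on it; `sign_extend16` and `addi` are hypothetical helper names, and registers are modeled as a plain Python list.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addi(regs, rt, rs, imm16):
    """REG[rt] = REG[rs] + SignExtImm, wrapped to 32 bits."""
    regs[rt] = (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF


regs = [0] * 32
regs[2] = 10
addi(regs, 1, 2, 0xFFFF)  # the immediate 0xFFFF sign-extends to -1
```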


    Load Word

    PC = PC + 4
    REG[rt] = MEM[signextendImm + REG[rs]]

    bits    31:26  25:21  20:16  15:0
    name    op     rs     rt     immediate
    # bits  6      5      5      16


    Store Word

    PC = PC + 4
    MEM[signextendImm + REG[rs]] = REG[rt]

    bits    31:26  25:21  20:16  15:0
    name    op     rs     rt     immediate
    # bits  6      5      5      16
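Load Word and Store Word share the same address computation and differ only in the direction of the data transfer. A sketch under simple assumptions: memory is a dictionary keyed by byte address, unaligned accesses are rejected, and the helper names are illustrative rather than taken from the slides.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(regs, rs, imm16):
    """signextendImm + REG[rs], wrapped to 32 bits."""
    return (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF


def lw(regs, mem, rt, imm16, rs):
    addr = effective_address(regs, rs, imm16)
    if addr % 4:
        raise ValueError("unaligned word access")
    regs[rt] = mem.get(addr, 0)   # REG[rt] = MEM[addr]


def sw(regs, mem, rt, imm16, rs):
    addr = effective_address(regs, rs, imm16)
    if addr % 4:
        raise ValueError("unaligned word access")
    mem[addr] = regs[rt]          # MEM[addr] = REG[rt]


regs = [0] * 32
mem = {}
regs[5] = 0x100
regs[1] = 42
sw(regs, mem, 1, 0xFFFC, 5)   # offset -4, so the address is 0xFC
lw(regs, mem, 4, 0xFFFC, 5)
```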


    A Single-cycle Processor

    Performance refresher
      ET = IC * CPI * CT
      Single-cycle CPI == 1; that sounds great
      Unfortunately, single-cycle CT is large

    Even RISC instructions take quite a bit of effort to execute
      This is a lot to do in one cycle


    Our Hardware is Mostly Idle

    Cycle time = 18 ns
    Slowest module (ALU) is ~6 ns


    Processor Design in Two Acts

    Act II: A pipelined CPU


    Pipelining

    Letter  Answer
    A       Allows the execution of multiple instructions to overlap
    B       Prevents branch articulation
    C       Significantly decreases the amount of time it takes to execute a particular instruction
    D       Significantly increases the amount of time it takes to implement a particular instruction
    E       A and D


    Pipelining

    Letter Answer

    A Increases instruction count

    B Reduces CPI

    C Reduces cycle time

    D Has no effect on performance

    E B and C


    Data hazards

    Letter Answer

    A Occur because a value is not ready when it's needed

    B Occur because the next PC is not yet known.

    C Cannot be removed.

    D A and B

    E All of the above


    Stalling a processor

    Letter Answer

    A Reduces CPI and increases instruction count.

    B Means that instructions early in the pipeline stop making progress

    C Can resolve some hazards.

    D B and C

    E A and C


    Forwarding

    Letter Answer

    A Is just for email.

    B Allows the processor to resolve control hazards.

    C Improves CPI

    D Reduces cycle time

    E Interacts poorly with stalling.


    Pipelining Review


    Pipelining

    Break up the logic with pipeline registers into pipeline stages
    Each stage can act on a different instruction/data
    States/control signals of instructions are held in pipeline registers (latches)

    [Figure: a 10 ns block of logic between latches, split by additional latches into five 2 ns pipeline stages.]

    Pipelining

    [Figure: the five 2 ns pipeline stages and their latches, shown for cycle #1 through cycle #5.]

    Performance of a pipeline processor

    If we have 500 instructions, what's the speedup of a 5-stage pipelined processor with a 2 ns cycle time vs. a single-cycle processor with a 10 ns cycle time?

    A. 5
    B. 4.96
    C. 2.78
    D. 1
    E. None of the above
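One way to work through this question, assuming the pipeline never stalls and one instruction enters every cycle: the first instruction needs all five cycles, and each later instruction completes one cycle after its predecessor. The function names below are illustrative only.

```python
def single_cycle_time_ns(n_instructions, cycle_ns):
    """Every instruction takes exactly one (long) cycle."""
    return n_instructions * cycle_ns


def pipelined_time_ns(n_instructions, n_stages, cycle_ns):
    """Fill the pipeline once, then retire one instruction per cycle (no stalls assumed)."""
    return (n_stages + n_instructions - 1) * cycle_ns


single = single_cycle_time_ns(500, 10)   # 500 * 10 ns
piped = pipelined_time_ns(500, 5, 2)     # (5 + 499) cycles * 2 ns
speedup = single / piped
print(f"{single} ns vs {piped} ns -> speedup {speedup:.2f}")
```

The speedup falls just short of the stage count because of the cycles spent filling the pipeline. With more instructions, the fill cost matters less and the ratio approaches 5.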


    Recap: Clock

    A hardware signal that defines when data is valid and stable
      Think about the clock in real life!

    We use edge-triggered clocking
      Values stored in the sequential logic are updated only on a clock edge

    [Figure labels: sequential logic, combinational logic]

    The 5-Stage MIPS Pipeline

    Instruction Fetch (IF)
      Read the instruction
    Instruction Decode (ID)
      Figure out the incoming instruction
      Fetch the operands from the register file
    Execution (EXE): ALU
      Perform ALU functions
    Memory Access (MEM)
      Read/write data memory
    Write Back (WB): write back results to registers
      Write to register file

    Pipelined Datapath

    [Figure: datapath with PC, +4 adder, instruction memory (Read Address, Instruction), register file (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), sign extend (16 to 32 bits), shift left 2, branch adder, ALU, and data memory (Address, Write Data, Read Data).]

    Pipelined datapath

    [Figure: the datapath divided into five stages (Instruction Fetch, Instruction Decode, Execution, Memory Access, Write Back) by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. The register file is addressed by inst[25:21], inst[20:16], and inst[15:11]. Control signals shown: RegDst, RegWrite, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg, and PCSrc, with PCSrc = Branch & Zero.]

    Will this work?
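The relation PCSrc = Branch & Zero, together with the "Shift left 2" and "Add" blocks in the figure, selects the next PC. A minimal sketch of that selection; the function `next_pc` and its argument names are assumptions made for illustration.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc, branch, zero, imm16):
    """Choose between PC + 4 and the branch target, as the PCSrc mux does."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Branch target: PC + 4 plus the sign-extended immediate shifted left by 2.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    pc_src = branch and zero      # PCSrc = Branch & Zero
    return target if pc_src else pc_plus_4


taken = next_pc(0x100, True, True, 3)        # branch taken: 0x104 + 12
not_taken = next_pc(0x100, True, False, 3)   # ALU result not zero: fall through
```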

    Pipelined datapath

    [Figure: the same pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB), annotated with this instruction sequence:]

    add $1, $2, $3
    lw $4, 0($5)
    sub $6, $7, $8
    sub $9, $10, $11
    sw $1, 0($12)




    Pipelined datapath

    [Figure: the same pipelined datapath and instruction sequence (add $1, $2, $3; lw $4, 0($5); sub $6, $7, $8; sub $9, $10, $11; sw $1, 0($12)), with RegDst called out.]

    Is this right?

    Pipelined datapath

    [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, with inst[15:11] labeled separately from inst[25:21] and inst[20:16]. Control signals shown: RegDst, RegWrite, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg, PCSrc.]

    Pipelined datapath + control

    [Figure: the pipelined datapath with a Control unit added. The control signals are grouped into EX, ME, and WB fields and carried along in the pipeline registers: ID/EX holds WB, ME, and EX; EX/MEM holds WB and ME; MEM/WB holds WB, which drives RegWrite.]
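The grouping of control signals into EX, ME, and WB fields can be sketched as a table lookup at decode time, with each pipeline register keeping only the fields still needed downstream. The signal values below follow a common textbook convention for this datapath and are an assumption here (mux polarity in the slides' figure may differ); `None` marks a don't-care.

```python
# Control word per instruction class, grouped by the stage that consumes it.
CONTROL = {
    "r-type": {"EX": {"RegDst": 1, "ALUSrc": 0, "ALUop": 0b10},
               "ME": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX": {"RegDst": 0, "ALUSrc": 1, "ALUop": 0b00},
               "ME": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX": {"RegDst": None, "ALUSrc": 1, "ALUop": 0b00},
               "ME": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":    {"EX": {"RegDst": None, "ALUSrc": 0, "ALUop": 0b01},
               "ME": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def control_in_pipeline_registers(kind):
    """Show which control fields each pipeline register still carries."""
    signals = CONTROL[kind]
    return {
        "ID/EX":  {k: signals[k] for k in ("EX", "ME", "WB")},
        "EX/MEM": {k: signals[k] for k in ("ME", "WB")},
        "MEM/WB": {k: signals[k] for k in ("WB",)},
    }


lw_regs = control_in_pipeline_registers("lw")
```

Each field is dropped once its stage has consumed it, so RegWrite is the last signal to travel, reaching the register file only when the instruction is in Write Back.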

    Simplified pipeline diagram

    1. Use symbols to represent the physical resources, with the abbreviations for pipeline stages: IF, ID, EXE, MEM, WB
    2. The horizontal axis represents the timeline; the vertical axis represents the instruction stream
    3. Example:

    add $1, $2, $3     IF  ID  EXE MEM WB
    lw $4, 0($5)           IF  ID  EXE MEM WB
    sub $6, $7, $8             IF  ID  EXE MEM WB
    sub $9, $10, $11               IF  ID  EXE MEM WB
    sw $1, 0($12)                      IF  ID  EXE MEM WB
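A diagram of this kind can be generated mechanically: instruction i occupies stage s during cycle i + s + 1. A short sketch, assuming no stalls; the function name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction; each 4-character column is one clock cycle."""
    n_cycles = len(instructions) + len(STAGES) - 1
    label_width = max(len(text) for text in instructions)
    rows = []
    for i, text in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage          # instruction i is in stage s at cycle i + s + 1
        timeline = " ".join(cell.ljust(3) for cell in cells).rstrip()
        rows.append(text.ljust(label_width) + "  " + timeline)
    return rows


program = [
    "add $1, $2, $3",
    "lw $4, 0($5)",
    "sub $6, $7, $8",
    "sub $9, $10, $11",
    "sw $1, 0($12)",
]
rows = pipeline_diagram(program)
print("\n".join(rows))
```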

    How much speedup should pipelining provide, and why?

    Letter  Answer
    A       5x, by Amdahl's law (x = 0.8, S = 6.25)
    B       25x, by the PE, since CPI goes up by 5x and cycle time goes down by 5x
    C       2.24x, by the PE and the quotient rule
    D       5x, by the PE, since cycle time goes down by 90%
    E       5x, by the PE, since clock rate goes up by 5x


    Pipelining Inaction

    Module latencies (from the annotated datapath figures):

    Module       Latency
    Ctrl         0.797 ns
    Imem         2.77 ns
    ArgBMux      1.124 ns
    ALU          6.527 ns
    Dmem         1.744 ns
    WriteRegMux  3.067 ns
    RegFile      2.27 ns

    Single-cycle Implementation to scale

    Ideal 5-stage Pipeline (3.733 ns -> 267 MHz)

    18.667 ns -> 3.733 ns == 80% reduction in CT
    L_old = IC * CPI * CT_old
    L_new = IC * CPI * CT_new
    CT_new = 0.2 * CT_old
    L_new = 0.2 * L_old
    Speedup = L_old / L_new = 5x
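The ideal-case arithmetic above can be checked in a few lines. This assumes perfectly balanced stages, no latch overhead, and unchanged IC and CPI, which is what "ideal" means on this slide; the function name is illustrative.

```python
def ideal_pipeline(ct_old_ns, n_stages):
    """Split the single-cycle time evenly across n_stages."""
    ct_new_ns = ct_old_ns / n_stages
    clock_mhz = 1000.0 / ct_new_ns          # a 1 ns period corresponds to 1000 MHz
    speedup = ct_old_ns / ct_new_ns         # IC and CPI cancel in L_old / L_new
    return ct_new_ns, clock_mhz, speedup


ct_new_ns, clock_mhz, speedup = ideal_pipeline(18.667, 5)
print(f"CT = {ct_new_ns:.3f} ns, clock = {clock_mhz:.1f} MHz, speedup = {speedup:.2f}x")
```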

    Single-cycle Implementation to scale

    Ideal 5-stage Pipeline (3.733 ns -> 267 MHz)

    Realistic 5-stage Pipeline

    What's the actual speedup? Clock rate?

    Letter  Answer
    A       3x; 150 MHz
    B       1.02x; 76.6 MHz
    C       2.85x; 153 MHz
    D       5.49x; 294 MHz
    E       None of the above
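One way to reason about the realistic case is to note that a pipeline's cycle time cannot be shorter than its slowest stage. The sketch below assumes that the slowest stage is bounded by the single slowest module in the latency figures and that latch overhead is ignored. Both are simplifications, not claims about the actual design.

```python
# Module latencies in ns, as listed on the earlier slides.
latencies_ns = {
    "Ctrl": 0.797,
    "Imem": 2.77,
    "ArgBMux": 1.124,
    "ALU": 6.527,
    "Dmem": 1.744,
    "WriteRegMux": 3.067,
    "RegFile": 2.27,
}
ct_single_ns = 18.667                       # single-cycle cycle time from the slides

slowest = max(latencies_ns, key=latencies_ns.get)
ct_pipe_ns = latencies_ns[slowest]          # assumed pipelined cycle time
clock_mhz = 1000.0 / ct_pipe_ns
speedup = ct_single_ns / ct_pipe_ns         # IC and CPI assumed unchanged
print(f"slowest module: {slowest}, clock = {clock_mhz:.1f} MHz, speedup = {speedup:.2f}x")
```

Because the stages are unbalanced, the clock is set by the ALU rather than by one fifth of the original cycle, so the speedup lands well below the ideal 5x.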