94
1 Digital Design Copyright © 2006 Frank Vahid Digital Design Chapter 5: Register-Transfer Level (RTL) Design

Chapter 5: Register-Transfer Level (RTL) Designharris/cs151/slides/dd_vahid_ch5.pdf(coin detected) – Input from datapath comparator’s output, which we named tot_lt_s • Controller’s

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • 1Digital DesignCopyright © 2006Frank Vahid

    Digital Design

    Chapter 5: Register-Transfer Level

    (RTL) Design

  • 5.1

    2Digital DesignCopyright © 2006Frank Vahid

    IntroductionCombinational

    logic

    n0s1 s0

    n1

    bobi

    clk State register

    FSM

    inpu

    ts

    FSM

    outp

    uts

    • Chapter 3: Controllers– Control input/output: single bit (or just a

    few) representing event or state– Finite-state machine describes

    behavior; implemented as state register and combinational logic

    • Chapter 4: Datapath components– Data input/output: Multiple bits

    collectively representing single entity– Datapath components included

    registers, adders, ALU, comparators, register files, etc.

    • This chapter: custom processors– Processor: Controller and datapath

    components working together to implement an algorithm

    Register Comparatorsi

    ansis

    Register fileALU

    z

    e

    Combinationallogic

    n0s1 s0

    n1

    bobi

    State register

    Register file

    ALU

    DatapathController

  • 3Digital DesignCopyright © 2006Frank Vahid

    RTL Design: Capture Behavior, Convert to Circuit• Recall

    – Chapter 2: Combinational Logic Design• First step: Capture behavior (using equation

    or truth table)• Remaining steps: Convert to circuit

    – Chapter 3: Sequential Logic Design• First step: Capture behavior (using FSM)• Remaining steps: Convert to circuit

    • RTL Design (the method for creating custom processors)– First step: Capture behavior (using high-

    level state machine, to be introduced) – Remaining steps: Convert to circuit

    Capture behavior

    Convert to circuit

  • 5.2

    4Digital DesignCopyright © 2006Frank Vahid

    RTL Design Method

  • 5Digital DesignCopyright © 2006Frank Vahid

    RTL Design Method: “Preview” Example• Soda dispenser

    – c: bit input, 1 when coin deposited

    – a: 8-bit input having value of deposited coin

    – s: 8-bit input having cost of a soda

    – d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda

    as

    cd

    Sodadispenserprocessor

    as

    cd

    Sodadispenserprocessor

    25

    1 025

    1

    1

    500

    0

    0

    0tot: 25

    tot: 50

    a

    How can we precisely describe this processor’s behavior?

  • 6Digital DesignCopyright © 2006Frank Vahid

    Preview Example: Step 1 --Capture High-Level State Machine

    • Declare local register tot• Init state: Set d=0, tot=0• Wait state: wait for coin

    – If see coin, go to Add state• Add state: Update total value:

    tot = tot + a– Remember, a is present coin’s

    value– Go back to Wait state

    • In Wait state, if tot >= s, go to Disp(ense) state

    • Disp state: Set d=1 (dispense soda)– Return to Init state

    Inputs: c (bit), a (8 bits), s (8 bits)Outputs: d (bit)Local registers: tot (8 bits)

    Wait

    Add

    Disp

    Init

    d=0tot=0

    c’*(tot

  • 7Digital DesignCopyright © 2006Frank Vahid

    Preview Example: Step 2 -- Create Datapath

    ldclr

    tot

    8-bit<

    8-bitadder

    8

    8

    88

    s a

    Datapath

    tot_ldtot_clr

    tot_lt_s

    Inputs : c (bit), a(8 bits), s (8 bits)Outputs : d (bit)Local reg isters: tot (8 bits)

    Wait

    Add

    Disp

    Init

    d=0tot=0

    c‘ (tot

  • 8Digital DesignCopyright © 2006Frank Vahid

    Preview Example: Step 3 –Connect Datapath to a Controller

    • Controller’s inputs– External input c

    (coin detected)– Input from datapath

    comparator’s output, which we named tot_lt_s

    • Controller’s outputs– External output d

    (dispense soda)– Outputs to datapath

    to load and clear the tot register

    tot_lt_s

    tot_clr

    tot_ld

    Controller Datapath

    s

    c

    d

    a8 8

    ldclr

    tot

    8-bit<

    8-bitadder

    8

    8

    88

    s a

    Datapath

    tot_ldtot_clr

    tot_lt_s

  • 9Digital DesignCopyright © 2006Frank Vahid

    Preview Example: Step 4 – Derive the Controller’s FSM

    • Same states and arcs as high-level state machine

    • But set/read datapathcontrol signals for all datapathoperations and conditions

    tot_lt_s

    tot_clr

    tot_ld

    Con

    trolle

    r

    Dat

    apat

    h

    s

    c

    d

    a8 8

    ldclr tpt

    8-bit<

    8-bitadder

    8

    8

    88

    s a

    Datapath

    tot_ldtot_clr

    tot_lt_s

    Inputs::c, tot_lt_s(bit)Outputs:d, tot_ld, tot_clr (bit)

    Wait

    Disp

    Init

    d=0tot_clr=1

    c’* tot_lt_s’

    c’*tot_lt_s

    d=1

    c

    tot_ld=1

    c

    d

    tot_ld

    tot_clr

    tot_lt_s

    Controller

    Add

  • 10Digital DesignCopyright © 2006Frank Vahid

    Preview Example: Completing the Design• Implement the FSM as

    a state register and logic– As in Ch3– Table shown on right

    d000000000

    1

    000000001

    0

    111100000

    0

    n0111111001

    0

    n1000010110

    0

    010101010

    0

    c001100110

    0

    s1000000001

    1

    s0000011110

    1

    tot_lt_s

    tot_ld

    tot_clr

    Init

    Wai

    tAd

    dDi

    sp

    Inputs::c, tot_lt_s(bit)Outputs:d, tot_ld, tot_clr (bit)

    Wait

    Disp

    Init

    d=0tot_clr=1

    c’* tot_lt_s’

    c’*tot_lt_s

    d=1

    c

    tot_ld=1

    c

    d

    tot_ld

    tot_clr

    tot_lt_s

    Controller

    Add

  • 11Digital DesignCopyright © 2006Frank Vahid

    Step 1: Create a High-Level State Machine• Let’s consider each step of the

    RTL design process in more detail

    • Step 1– Soda dispenser example– Not an FSM because:

    • Multi-bit (data) inputs a and s• Local register tot• Data operations tot=0, tot

  • 12Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance Measurer

    Object ofinterest

    D

    2D = T sec * 3*108 m/secsensor

    laser

    T (in seconds)

    • Example of how to create a high-level state machine to describe desired processor behavior

    • Laser-based distance measurement – pulse laser, measure time T to sense reflection– Laser light travels at speed of light, 3*108 m/sec – Distance is thus D = T sec * 3*108 m/sec / 2

  • 13Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance Measurer

    sensor

    laser

    T (in seconds)

    Laser-baseddistancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensor

    • Inputs/outputs– B: bit input, from button to begin measurement– L: bit output, activates laser– S: bit input, senses laser reflection– D: 16-bit output, displays computed distance

  • 14Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance MeasurerLaser-based

    distancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensor

    Inputs: BOutputs:

    , S(1 bit each)L (bit), D (16 bits)

    S0 ?

    a L = 0 (laser off)D = 0 (distance = 0)

    • Step 1: Create high-level state machine• Begin by declaring inputs and outputs• Create initial state, name it S0

    – Initialize laser to off (L=0)– Initialize displayed distance to 0 (D=0)

  • 15Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance MeasurerLaser-based

    distancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensor

    Inputs: B, S (1 bit each)Outputs: L (bit), D (16 bits)

    S0

    L = 0D = 0

    S1 ?

    B’ (button not pressed)

    B(buttonpressed)

    S0

    a

    • Add another state, call S1, that waits for a button press– B’ – stay in S1, keep waiting– B – go to a new state S2

    Q: What should S2 do? A: Turn on the lasera

  • 16Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance Measurer

    S0 S1 S2

    L = 0D = 0

    L = 1(laser on)

    S3

    L = 0(laser off)

    B’

    B

    Laser-based

    distancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensor

    Inputs: B, S (1 bit each)Outputs: L (bit), D (16 bits)

    a

    • Add a state S2 that turns on the laser (L=1)• Then turn off laser (L=0) in a state S3

    Q: What do next? A: Start timer, wait to sense reflectiona

  • 17Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance MeasurerLaser-based

    distancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensorLocal Registers: Dctr (16 bits)

    S0 S1 S2 S3

    L = 0D = 0

    L = 1 L = 0Dctr = Dctr + 1(count cycles)

    Dctr = 0(reset cycle

    count)

    B’

    B

    S’ (no reflection)

    S (reflection)?

    Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)

    a

    • Stay in S3 until sense reflection (S)• To measure time, count cycles for which we are in S3

    – To count, declare local register Dctr– Increment Dctr each cycle in S3– Initialize Dctr to 0 in S1. S2 would have been O.K. too

  • 18Digital DesignCopyright © 2006Frank Vahid

    Step 1 Example: Laser-Based Distance MeasurerLaser-based

    distancemeasurer16

    from button

    to displayS

    L

    D

    Bto laser

    from sensor

    S0 S1 S2 S3

    L = 0D = 0

    L = 1 L=0Dctr = Dctr + 1

    Dctr = 0

    B’ S’

    B SD = Dctr / 2

    (calculate D)

    S4

    Local Registers: Dctr (16 bits)Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)

    a

    • Once reflection detected (S), go to new state S4– Calculate distance – Assuming clock frequency is 3x108, Dctr holds number of meters, so

    D=Dctr/2

    • After S4, go back to S1 to wait for button again

  • 19Digital DesignCopyright © 2006Frank Vahid

    Step 2: Create a Datapath• Datapath must

    – Implement data storage– Implement data computations

    • Look at high-level state machine, do three substeps– (a) Make data inputs/outputs be datapath

    inputs/outputs– (b) Instantiate declared registers into the

    datapath (also instantiate a register for each data output)

    – (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

    Instantiate: to introduce a new component into a design.

  • 20Digital DesignCopyright © 2006Frank Vahid

    Step 2 Example: Laser-Based Distance Measurer(a) Make data

    inputs/outputs be datapathinputs/outputs

    (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)

    (c) Examine every state and transition, and instantiate datapathcomponents and connections to implement any data computations

    DatapathDreg_clr

    Dctr_clrDctr_cnt

    Dreg_ld

    Local Registers: Dctr (16 bits)

    S0 S1 S2 S3

    L = 0D = 0

    L = 1 L=0Dctr = Dctr + 1

    Dctr = 0

    B‘ S‘

    B SD = Dctr / 2

    (calculate D)

    S4

    loadQ

    IDreg: 16-bit

    registerQ

    Dctr: 16-bitup-counter

    16

    D

    clearclearcount

    a

    Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)

  • 21Digital DesignCopyright © 2006Frank Vahid

    Step 2 Example: Laser-Based Distance Measurer(c) (continued)

    Examine every state and transition, and instantiate datapathcomponents and connections to implement any data computations

    clearcount

    clearload

    Q Q

    IDctr: 16-bitup-counter

    Dreg: 16-bitregister

    16

    D

    Datapath

    Dreg_clr

    Dctr_clrDctr_cnt

    Dreg_ld 16

    16

    >>1

    a

    Local Registers: Dctr (16 bits)

    S0 S1 S2 S3

    L = 0D = 0

    L = 1 L=0Dctr = Dctr + 1

    Dctr = 0

    B‘ S‘

    B SD = Dctr / 2

    (calculate D)

    S4

    Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)

  • 22Digital DesignCopyright © 2006Frank Vahid

    Step 2 Example Showing Mux Use

    • Introduce mux when one component input can come from more than one source

    T0

    T1

    R = E + F

    R = R + G

    E,F, G, R (16 bits)Localregisters:

    (a)

    E F G

    A B+

    R

    add_A_s0add_B_s0

    2⋅ 1 2⋅ 1

    (d)

    ××

    a

    E F G

    A B+

    R

    (b)

    E F G

    A B+

    R

    (c)

  • 23Digital DesignCopyright © 2006Frank Vahid

    Step 3: Connecting the Datapath to a Controller

    • Laser-based distance measurer example

    • Easy – just connect all control signals between controller and datapath300 MHz Clock

    D

    B L

    S

    16to display

    from buttonController

    to laserfrom sensor

    Datapath

    Dreg_clr

    Dreg_ld

    Dctr_clr

    Dctr_cnt

    clearcount

    clearload

    Q Q

    IDctr: 16-bitup-counter

    Dreg: 16-bitregister

    16D

    Datapath

    Dreg_clr

    Dctr_clrDctr_cnt

    Dreg_ld

  • 24Digital DesignCopyright © 2006Frank Vahid

    Step 4: Deriving the Controller’s FSM

    • FSM has same structure as high-level state machine– Inputs/outputs all

    bits now– Replace data

    operations by bit operations using datapath

    300 MHz Clock

    D

    B L

    S

    16to display

    from button Controllerto laserfrom sensor

    Datapath

    Dreg_clr

    Dreg_ld

    Dctr_clr

    Dctr_cnt

    Inputs: B, SOutputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

    S0 S1 S2 S3

    L = 0 L = 1 L = 0L = 0

    B’ S’

    B SS4

    L = 0

    Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)Local Registers: Dctr (16 bits)

    S0 S1 S2 S3

    L = 0D = 0

    L = 1 L=0Dctr = Dctr + 1

    Dctr = 0

    B’ S’

    B SD = Dctr / 2

    (calculate D)

    S4

    a

    Dreg_clr = 1Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser off)(clear D reg)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 1Dctr_cnt = 0(clear count)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser on)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 1(laser off)(count up)

    Dreg_clr = 0Dreg_ld = 1Dctr_clr = 0Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)

  • Step 4: Deriving the Controller’s FSM

    25Digital DesignCopyright © 2006Frank Vahid

    • Using shorthand of outputs not assigned implicitly assigned 0

    S0 S1 S2 S3

    L = 0 L = 1 L = 0L = 0

    B’ S’

    B SS4

    L = 0Dreg_clr = 1Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser off)(clear D reg)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 1Dctr_cnt = 0(clear count)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser on)

    Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 1(laser off)(count up)

    Dreg_clr = 0Dreg_ld = 1Dctr_clr = 0Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)

    S0 S1 S2 S3

    L = 0 L = 1 L = 0

    B’ S’

    B S

    (laser on)

    S4

    Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

    Dreg_clr = 1(laser off)(clear D reg)

    Dctr_clr = 1(clear count) Dctr_cnt = 1

    (laser off)(count up)

    Dreg_ld = 1Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)

    a

  • 26Digital DesignCopyright © 2006Frank Vahid

    Step 4

    • Implement FSM as state register and logic (Ch3) to complete the design

    clearcount

    clearload

    Q Q

    IDctr: 16-bitup-counter Dreg: 16-bitregister

    16D

    Datapath

    Dreg_clr

    Dctr_clrDctr_cnt

    Dreg_ld

    300 MHz Clock

    D

    B L

    S

    16to display

    from button

    Con

    trolle

    r to laserfrom sensor

    Dat

    apat

    h

    Dreg_clr

    S0 S1 S2 S3

    L = 0 L = 1 L = 0

    B’ S’

    B S

    (laser on)

    S4

    Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

    Dreg_clr = 1(laser off)(clear D reg)

    Dctr_clr = 1(clear count) Dctr_cnt = 1

    (laser off)(count up)

    Dreg_ld = 1Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)

    Dreg_ld

    Dctr_clr

    Dctr_cnt

  • 5.3

    27Digital DesignCopyright © 2006Frank Vahid

    RTL Design Examples and Issues

    32

    4 A

    rdD

    Per0 Per1 Per15

    Masterprocessor

    Faddr

    4

    ADrd

    Bus interface

    Main part

    Peripheral

    Q32

    to/from processor bus

    32 4

    • We’ll use several more examples to illustrate RTL design

    • Example: Bus interface– Master processor can read

    register from any peripheral• Each register has unique 4-bit

    address • Assume 1 register/periph.

    – Sets rd=1, A=address– Appropriate peripheral places

    register data on 32-bit D lines• Periph’s address provided on

    Faddr inputs (maybe from DIP switches, or another register)

  • 28Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Bus Interface

    WaitMyAddress

    Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)

    rd’ rd

    SendData

    D = “Z”Q1 = Q

    (A = Faddr)and rd

    ((A = Faddr)and rd’)

    D = Q1

    • Step 1: Create high-level state machine– State WaitMyAddress

    • Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1

    • Wait until this peripheral’s address is seen (A=Faddr) and rd=1– State SendData

    • Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)

  • 29Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Bus Interface

    WaitMyAddress

    Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)

    rd’ rd

    SendData

    D = “Z”Q1 = Q

    (A = Faddr)and rd

    ((A = Faddr)and rd’)

    D = Q1

    W W

    ZD Z ZQ1 Q1

    W W WSD SD SD

    clkInputs

    StateOutputs

    rd

  • 30Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Bus Interface

    WaitMyAddress

    Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)

    rd’ rd

    SendData

    D = “Z”Q1 = Q

    (A = Faddr)and rd

    ((A = Faddr)and rd)’

    D = Q1

    • Step 2: Create a datapath(a) Datapath inputs/outputs(b) Instantiate declared registers(c) Instantiate datapath components and

    connections

    DatapathBus interface

    Q1_ldld Q1

    F Qaddr

    4 4 32

    A

    D_en

    A_eq_Faddr= (4-bit) 32

    32

    D

    a

  • 31Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Bus Interface

    • Step 3: Connect datapath to controller• Step 4: Derive controller’s FSM

    a

    WaitMyAddress

    Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)

    rd’ rd

    SendData

    D = “Z”Q1 = Q

    (A = Faddr)and rd

    ((A = Faddr)and rd)’

    D = Q1rd

    Inputs: rd, A_eq_Faddr (bit)Outputs: Q1_ld, D_en (bit)

    WaitMyAddress

    rd‘ rd

    SendData

    D_en = 0Q1_ld = 1

    D_en = 1Q1_ld = 0

    A_eq_Faddrand rd

    (A_eq_Faddrand rd)‘

    DatapathBus interface

    Q1_ldld Q1

    Faddr Q

    4 4 32

    A

    D_en

    A_eq_Faddr= (4-bit) 32

    32

    D

  • 32Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    • Video is a series of frames (e.g., 30 per second)• Most frames similar to previous frame

    – Compression idea: just send difference from previous frame

    Digitizedframe 2

    1 Mbyte

    Frame 2

    Digitizedframe 1

    Frame 1

    1 Mbyte(a)

    Digitizedframe 1

    Frame 1

    1 Mbyte(b)

    Only difference: ball moving

    aDifference of

    2 from 1

    0.01 Mbyte

    Frame 2

    Just send difference

  • 33Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    Frame 2Frame 1compare Each is a pixel, assume

    represented as 1 byte(actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)

    • Need to quickly determine whether two frames are similar enough to just send difference for second frame– Compare corresponding 16x16 “blocks”

    • Treat 16x16 block as 256-byte array– Compute the absolute value of the difference of each array item– Sum those differences – if above a threshold, send complete frame

    for second frame; if below, can use difference method (using another technique, not described)

  • 34Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    !(i

  • 35Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    !(i=256)• S3: add difference to sum,

    increment index• S4: done, write to output

    sad_reg

    B

    A

    go

    SAD

    sad

    Inputs: A, B (256 byte memory); go (bit)Outputs: sad (32 bits)Local registers: sum, sad_reg (32 bits); i (9 bits)

    !goS0go

    S1 sum = 0i = 0

    S3 sum=sum+abs(A[i]-B[i])i=i+1

    S4 sad_reg = sum

    S2

    i

  • 36Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    • Step 2: Create datapath

    !(i

  • 37Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    !(i

  • 38Digital DesignCopyright © 2006Frank Vahid

    RTL Example: Video Compression – Sum of Absolute Differences

    • Comparing software and custom circuit SAD – Circuit: Two states (S2 & S3) for

    each i, 256 i’s 512 clock cycles– Software: Loop (for i = 1 to 256), but

    for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item 256*6 = 1536 cycles

    – Circuit is about 3 times (300%) faster

    – Later, we’ll see how to build SAD circuit that is even faster

    !(i

  • 39Digital DesignCopyright © 2006Frank Vahid

    RTL Design Pitfalls and Good Practice• Common pitfall: Assuming

    register is update in the state it’s written– Final value of Q?– Final state?– Answers may surprise you

    • Value of Q unknown• Final state is C, not D

    – Why?• State A: R=99 and Q=R

    happen simultaneously• State B: R not updated with

    R+1 until next clock cycle, simultaneously with state register being updated

    A B

    C

    D

    R>=100

    R

  • 40Digital DesignCopyright © 2006Frank Vahid

    RTL Design Pitfalls and Good Practice

    BA B2

    C

    D

    R>=100

    R

  • 41Digital DesignCopyright © 2006Frank Vahid

    RTL Design Pitfalls and Good Practice• Common pitfall:

    Reading outputs– Outputs can only be

    written– Solution: Introduce

    additional register, which can be written and read

    Inputs: A, B (8 bits)Outputs: P (8 bits)

    Inputs: A, B (8 bits)Outputs: P (8 bits)Local register: R (8 bits)

    TS

    P=P+BP=A

    S T

    R=AP=A

    P=R+B

    (a) (b)

  • 42Digital DesignCopyright © 2006Frank Vahid

    RTL Design Pitfalls and Good Practice• Good practice: Register

    all data outputs– In fig (a), output P would

    show spurious values as addition computes

    • Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component

    – In fig (b), spurious outputs reduced, and longest register-to-register path is clear

    +

    RB

    P

    +

    R

    Preg

    B

    P(b)

    (a)

  • 43Digital DesignCopyright © 2006Frank Vahid

    Control vs. Data Dominated RTL Design• Designs often categorized as control-dominated or data-

    dominated– Control-dominated design – Controller contains most of the

    complexity– Data-dominated design – Datapath contains most of the complexity– General, descriptive terms – no hard rule that separates the two

    types of designs– Laser-based distance measurer – control dominated– Bus interface, SAD circuit – mix of control and data– Now let’s do a data dominated design

  • 44Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Filter concept

    – Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)

    – That 240 is probably wrong!• Could be electrical noise

    – Filter should remove such noise in its output Y

    – Simple filter: Output average of last N values

    • Small N: less filtering• Large N: more filtering, but

    less sharp output

    1212

    Y

    clk

    X

    digital filter

  • 45Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• FIR filter

    – “Finite Impulse Response”– Simply a configurable weighted

    sum of past input values– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

    • Above known as “3 tap”• Tens of taps more common• Very general filter – User sets the

    constants (c0, c1, c2) to define specific filter

    – RTL design• Step 1: Create high-level state

    machine– But there really is none! Data

    dominated indeed.• Go straight to step 2

    1212

    Y

    clk

    X

    digital filter

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

  • 46Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath

    – Begin by creating chain of xt registers to hold past values of X

    1212Y

    clk

    Xdigital filter

    xt0 xt1 xt2

    12 12 12 12

    x(t-2)x(t-1)x(t)3-tap FIR filter

    X Y

    clk

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

    180 180181 180181240

    Suppose sequence is: 180, 181, 240

    a

  • 47Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath

    (cont.)– Instantiate registers for

    c0, c1, c2– Instantiate multipliers to

    compute c*x values

    1212Y

    clk

    Xdigital filter

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

    xt0 xt1 xt2

    x(t-2)x(t-1)x(t)3-tap FIR filter

    X

    Y

    clk

    c1c0 c2

    ∗∗ ∗

    a

  • 48Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath

    (cont.)– Instantiate adders

    1212Y

    clk

    Xdigital filter

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

    xt0 xt1 xt2

    x(t-2)x(t-1)x(t)

    3-tap FIR filter

    X

    Y

    clk

    c0 c1 c2

    ∗ ∗ ∗

    + +

    a

  • 49Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath (cont.)

    – Add circuitry to allow loading of particular c register

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

    a

    1212Y

    clk

    Xdigital filter

    xt0 xt1 xt2

    x(t-2)x(t-1)x(t)

    3-tap FIR filter

    X

    Y

    clk

    c0 c1 c2

    * *

    +

    *

    +

    3210

    2x4

    yreg

    eCa1

    CL

    C

    Ca0

  • 50Digital DesignCopyright © 2006Frank Vahid

    Data Dominated RTL Design Example: FIR Filter• Step 3 & 4: Connect to controller, Create FSM

    – No controller needed– Extreme data-dominated example– (Example of an extreme control-dominated design – an FSM, with no

    datapath)• Comparing the FIR circuit to a software implementation

    – Circuit• Assume adder has 2-gate delay, multiplier has 20-gate delay• Longest past goes through one multiplier and two adders

    – 20 + 2 + 2 = 24-gate delay• 100-tap filter, following design on previous slide, would have about a 34-gate

    delay: 1 multiplier and 7 adders on longest path– Software

    • 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.

    • (100*2 + 100*2)*10 = 4000 gate delays– Circuit is more than 100 times faster (10,000% faster). Wow.

    y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

  • 5.4

    51Digital DesignCopyright © 2006Frank Vahid

    Determining Clock Frequency• Designers of digital circuits

    often want fastest performance– Means want high clock

    frequency• Frequency limited by longest

    register-to-register delay– Known as critical path– If clock is any faster, incorrect

    data may be stored into register– Longest path on right is 2 ns

    • Ignoring wire delays, and register setup and hold times, for simplicity

    a

    +

    b

    c

    2 nsdelay

    clk

  • 52Digital DesignCopyright © 2006Frank Vahid

    Critical Path• Example shows four paths

    – a to c through +: 2 ns– a to d through + and *: 7 ns– b to d through + and *: 7 ns– b to d through *: 5 ns

    • Longest path is thus 7 ns• Fastest frequency

    – 1 / 7 ns = 142 MHz

    + *

    c d

    7 ns7 ns

    5 nsdelay

    2 nsdelay

    Max(2,7,7,5)= 7 ns

    a b

    5 ns

    7 ns

    7 ns2 ns

  • 53Digital DesignCopyright © 2006Frank Vahid

    Critical Path Considering Wire Delays• Real wires have delay too

    – Must include in critical path• Example shows two paths

    – Each is 0.5 + 2 + 0.5 = 3 ns• Trend

    – 1980s/1990s: Wire delays were tiny compared to logic delays

    – But wire delays not shrinking as fast as logic delays

    • Wire delays may even be greater than logic delays!

    • Must also consider register setup and hold times, also add to path

    • Then add some time to the computed path, just to be safe– e.g., if path is 3 ns, say 4 ns instead

    a

    +

    b

    c

    2 ns

    3 ns

    3 ns

    0.5 ns0.5 ns

    0.5 ns

    clk

    3 ns

  • 54Digital DesignCopyright © 2006Frank Vahid

    A Circuit May Have Numerous Paths• Paths can exist

    – In the datapath– In the controller– Between the

    controller and datapath

    – May be hundreds or thousands of paths

    • Timing analysis tools that evaluate all possible paths automatically very helpful

    Combinational logic

    c

    tot_lt_s

    clk

    n1

    d

    tot_ld

    tot_lt_s

    tot_clr

    s0s1

    n0

    State register

    s

    8 8

    8

    8

    a

    ld

    clrtot

    Datapath

    8-bit<

    8-bitadder

    (c)

    (b) (a)

  • 5.5

    55Digital DesignCopyright © 2006Frank Vahid

    Behavioral Level Design: C to Gates

    !goS0go

    S1 sum = 0i = 0

    S3 sum=sum+abs(A[i]-B[i])i=i+1

    S4 sad_reg = sum

    S2

    i

  • 56Digital DesignCopyright © 2006Frank Vahid

    Behavioral-Level Design: Start with C (or Similar Language)

    • Replace first step of RTL design method by two steps– Capture in C, then convert C to high-level state machine– How convert from C to high-level state machine?

    Step 1A: Capture in C

    Step 1B: Convert to high-level state machinea

  • 57Digital DesignCopyright © 2006Frank Vahid

    Converting from C to High-Level State Machine• Convert each C construct to

    equivalent states and transitions

    • Assignment statement– Becomes one state with

    assignment

    • If-then statement– Becomes state with condition

    check, transitioning to “then”statements if condition true, otherwise to ending state

    • “then” statements would also be converted to states

    target=expression

    atarget = expression;

    (then stmts)

    !cond

    cond

    (end)

    if (cond) {// then stmts

    }

    a

  • 58Digital DesignCopyright © 2006Frank Vahid

    Converting from C to High-Level State Machine• If-then-else

    – Becomes state with condition check, transitioning to “then”statements if condition true, or to “else” statements if condition false

    • While loop statement– Becomes state with condition

    check, transitioning to while loop’s statements if true, then transitioning back to condition check

    !cond

    cond

    (end)

    (then stmts) (else stmts)

    if (cond) {// then stmts

    }else {

    // else stmts}

    a

    !cond

    cond(while stmts)

    (end)

    while (cond) {// while stmts

    }a

  • 59Digital DesignCopyright © 2006Frank Vahid

    Simple Example of Converting from C to High-Level State Machine

    • Simple example: Computing the maximum of two numbers– Convert if-then-else statement to states (b)– Then convert assignment statements to states (c)

    (end)

    (c)

    X>Y

    !(X>Y)

    (end)

    (then stmts) (else stmts)

    (b)

    X>Y

    !(X>Y)

    Max=X Max=Y

    (a)

    Inputs: uint X, YOutputs: uint Max

    if (X > Y) {

    }else {

    }

    Max = X;

    Max = Y;

    a a

  • 60Digital DesignCopyright © 2006Frank Vahid

    Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine

    • Convert each construct to states– Simplify when possible,

    e.g., merge states• From high-level state

    machine, follow RTL design method to create circuit

    • Thus, can convert C to gates using straightforward automatable process– Not all C constructs can be

    efficiently converted– Use C subset if intended

    for circuit– Can use languages other

    than C, of course

    sum = sum + abs(A[i] - B[i]);

    (a)

    Inputs: byte A[256, B[256]bit go;

    Output: int sadmain(){

    uint sum; short uint I;while (1) {

    sum = 0;i = 0;

    while (!go);

    while (i < 256) {

    i = i + 1;}sad = sum;}

    }

    (d)

    !go go

    sum=0i=0

    (g)

    !go go

    sum=0i=0

    !(i

  • 5.6

    61Digital DesignCopyright © 2006Frank Vahid

    Memory Components• Register-transfer level

    design instantiates datapathcomponents to create datapath, controlled by a controller– A few more components are

    often used outside the controller and datapath

    • MxN memory– M words, N bits wide each

    • Several varieties of memory, which we now introduce

    N-bitswide each

    M×N memory

    M w

    ords

  • 62Digital DesignCopyright © 2006Frank Vahid

    Random Access Memory (RAM)• RAM – Readable and writable memory

    – “Random access memory”• Strange name – Created several decades ago to

    contrast with sequentially-accessed storage like tape drives

    – Logically same as register file – Memory with address inputs, data inputs/outputs, and control

    • RAM usually just one port; register file usually two or more

    – RAM vs. register file• RAM typically larger than roughly 512 or 1024

    words• RAM typically stores bits using a bit storage

    approach that is more efficient than a flip flop• RAM typically implemented on a chip in a square

    rather than rectangular shape – keeps longest wires (hence delay) short

    32

    4

    32

    4W_data

    W_addr

    W_en

    R_data

    R_addr

    R_en16×32

    register file

    Register file from Chpt. 4

    32

    10data

    addr

    rw

    en

    1024× 32RAM

    RAM block symbol

  • 63Digital DesignCopyright © 2006Frank Vahid

    RAM Internal Structure32

    10data

    addr

    rw

    en

    1024x32RAM

    addr0addr1

    addr(A-1)

    clkenrw

    addr

    Let A = log2M

    to all cells

    wdata(N-1)

    rdata(N-1)

    wdata(N-2)

    rdata(N-2)

    wdata0

    rdata0

    bit storageblock(aka “cell”)

    word

    word

    RAM cell

    wordenable

    wordenable

    rw

    data cell

    data

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    AxMdecoder

    enable

    • Similar internal structure as register file– Decoder enables appropriate word based on address

    inputs– rw controls whether cell is written or read– Let’s see what’s inside each RAM cell

  • 64Digital DesignCopyright © 2006Frank Vahid

    Static RAM (SRAM)

    • “Static” RAM cell– 6 transistors (recall inverter is 2 transistors)– Writing this cell

    • word enable input comes from decoder• When 0, value d loops around inverters

    – That loop is where a bit stays stored• When 1, the data bit value enters the loop

    – data is the bit to be stored in this cell– data’ enters on other side– Example shows a “1” being written into cell

    addr0addr1

    addr(A-1)

    clkenrw

    add

    r

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    to all cells

    wdata(N-1)

    rdata(N-1)

    wdata(N-2)

    rdata(N-2)

    wdata0

    rdata0

    bit storageblock(aka cell )

    word

    ,,,,

    cell

    wordenable

    wordenable

    rw

    data

    data

    SRAM celldata data’

    d’dcell

    0wordenable

    1

    1

    1

    0

    0

    32

    10data

    addr

    rw

    en

    1024x32RAM

    SRAM celldata data’

    d

    wordenable

    data data’

    d’dcell

    0wordenable

    1 0 a

    a

    a

  • 65Digital DesignCopyright © 2006Frank Vahid

    Static RAM (SRAM)

    • “Static” RAM cell– Reading this cell

    • Somewhat trickier• When rw set to read, the RAM logic sets

    both data and data’ to 1• The stored bit d will pull either the left line or

    the right bit down slightly below 1• “Sense amplifiers” detect which side is

    slightly pulled down– The electrical description of SRAM is really

    beyond our scope – just general idea here, mainly to contrast with DRAM...

    addr0addr1

    addr(A-1)

    clkenrw

    add

    r

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    to all cells

    wdata(N-1)

    rdata(N-1)

    wdata(N-2)

    rdata(N-2)

    wdata0

    rdata0

    bit storageblock(aka cell )

    word

    ,,,,

    cell

    wordenable

    wordenable

    rw

    data

    data

    SRAM cell

    32

    10data

    addr

    rw

    en

    1024x32RAM

    data data’

    d

    1

    1 1

    wordenable

    To sense amplifiers

    1 0

    1

  • 66Digital DesignCopyright © 2006Frank Vahid

    Dynamic RAM (DRAM)

    • “Dynamic” RAM cell– 1 transistor (rather than 6)– Relies on large capacitor to store bit

    • Write: Transistor conducts, data voltage level gets stored on top plate of capacitor

    • Read: Just look at value of d• Problem: Capacitor discharges over time

    – Must “refresh” regularly, by reading d and then writing it right back

    addr0addr1

    addr(A-1)

    clkenrw

    add

    r

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    to all cells

    wdata(N-1)

    rdata(N-1)

    wdata(N-2)

    rdata(N-2)

    wdata0

    rdata0

    bit storageblock(aka cell )

    word

    ,,,,

    cell

    wordenable

    wordenable

    rw

    data

    data DRAM cell

    32

    10data

    addr

    rw

    en

    1024x32RAM

    wordenable

    data

    cell

    (a)

    (b)

    data

    enable

    d discharges

    dcapacitorslowlydischarging

  • 67Digital DesignCopyright © 2006Frank Vahid

    Comparing Memory Types• Register file

    – Fastest– But biggest size

    • SRAM– Fast– More compact than register file

    • DRAM– Slowest

    • And refreshing takes time– But very compact

    • Use register file for small items, SRAM for large items, and DRAM for huge items– Note: DRAM’s big capacitor requires

    a special chip design process, so DRAM is often a separate chip

    MxN Memoryimplemented as a:

    registerfile

    SRAM

    DRAM

    Size comparison for samenumber of bits (not to scale)

  • 68Digital DesignCopyright © 2006Frank Vahid

    Reading and Writing a RAMclk

    addr

    data

    rw

    en

    1 2

    9 913

    999 Z 500500

    • Writing– Put address on addr lines, data on data lines, set rw=1, en=1

    • Reading– Set addr and en lines, but put nothing (Z) on data lines, set rw=0– Data will appear on data lines

    • Don’t forget to obey setup and hold times– In short – keep inputs stable before and after a clock edge

    valid

    valid

    Z 500

    accesstime

    setuptime

    holdtime

    setuptime

    clk

    addr

    data

    rw

    3

    1 means write

    RAM[9]now equals 500

    RAM[13]now equals 999

    (b)

  • 69Digital DesignCopyright © 2006Frank Vahid

    RAM Example: Digital Sound Recorder

    • Behavior– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM

    • We’ll use a 4096x16 RAM (12-bit wide RAM not common)– Play back later– Common behavior in telephone answering machine, toys, voice recorders

    • To record, processor should read a-to-d, store read values into successive RAM words– To play, processor should read successive RAM words and enable d-to-a

    wire

    speaker

    microphone

    wireanalog-to-

    digitalconverter

    digital-to-analog

    converterad_ld da_ld

    Rrw RenRa12

    16

    processor

    ad_buf

    data

    addr

    rw en

    4096⋅ 16RAM

  • 70Digital DesignCopyright © 2006Frank Vahid

    RAM Example: Digital Sound Recorder

    ad_ld=1ad_buf=1Ra=aRrw=1Ren=1

    S

    a=0

    a=a+1

    a=4095

    a

  • 71Digital DesignCopyright © 2006Frank Vahid

    RAM Example: Digital Sound Recorder– Now create play behavior– Use local register a again,

    create state machine that counts from 0 to 4095 again

    • For each a– Read RAM– Write to digital-to-analog conv.

    • Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus

    – The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play

    a

    da_ld=1

    ad_buf=0Ra=aRrw=0Ren=1

    V

    a=0

    a=a+1

    a=4095

    a

  • 72Digital DesignCopyright © 2006Frank Vahid

    Read-Only Memory – ROM• Memory that can only be read from, not

    written to– Data lines are output only– No need for rw input

    • Advantages over RAM– Compact: May be smaller– Nonvolatile: Saves bits even if power supply

    is turned off– Speed: May be faster (especially than

    DRAM)– Low power: Doesn’t need power supply to

    save bits, so can extend battery life• Choose ROM over RAM if stored data won’t

    change (or won’t change often)– For example, a table of Celsius to Fahrenheit

    conversions in a digital thermometer

    32

    10data

    addr

    rw

    en

    1024× 32RAM

    RAM block symbol

    32

    10data

    addr

    en

    1024x32ROM

    ROM block symbol

  • 73Digital DesignCopyright © 2006Frank Vahid

    Read-Only Memory – ROM32

    10data

    addr

    en

    1024x32ROM

    ROM block symbol

    ROM cell

    addr0addr1

    addr(A-1)

    clken

    addr

    Let A = log2M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    AxMdecoder

    wordenable

    rdata(N-1) rdata(N-2) rdata0

    bit storageblock(aka “cell”)

    word

    wordenable

    wordenable

    data

    data

    • Internal logical structure similar to RAM, without the data input lines

  • 74Digital DesignCopyright © 2006Frank Vahid

    ROM Types• If a ROM can only be read, how

    are the stored bits stored in the first place?– Storing bits in a ROM known as

    programming– Several methods

    • Mask-programmed ROM– Bits are hardwired as 0s or 1s

    during chip manufacturing• 2-bit word on right stores “10”• word enable (from decoder) simply

    passes the hardwired value through transistor

    – Notice how compact, and fast, this memory would be

    cell cell

    wordenable

    data line data line01

    addr0addr1

    addr(A-1)

    en

    addr

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    data(N-1) data(N-2) data0

    bit storageblock(a cell )

    word

    ,,,,

    cellword

    enableword

    enable

    data

    data

  • 75Digital DesignCopyright © 2006Frank Vahid

    ROM Types• Fuse-Based Programmable

    ROM– Each cell has a fuse– A special device, known as a

    programmer, blows certain fuses (using higher-than-normal voltage)

    • Those cells will be read as 0s (involving some special electronics)

    • Cells with unblown fuses will be read as 1s

    • 2-bit word on right stores “10”– Also known as One-Time

    Programmable (OTP) ROM

    cell cell

    wordenable

    data line data line11

    blown fusefuse

    addr0addr1

    addr(A-1)

    en

    addr

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    data(N-1) data(N-2) data0

    bit storageblock(a cell )

    word

    ,,,,

    cellword

    enableword

    enable

    data

    data

    a

  • 76Digital DesignCopyright © 2006Frank Vahid

    ROM Types• Erasable Programmable ROM

    (EPROM)– Uses “floating-gate transistor” in each cell– Special programmer device uses higher-

    than-normal voltage to cause electrons to tunnel into the gate

    • Electrons become trapped in the gate• Only done for cells that should store 0• Other cells (without electrons trapped in

    gate) will be 1– 2-bit word on right stores “10”

    • Details beyond our scope – just general idea is necessary here

    – To erase, shine ultraviolet light onto chip• Gives trapped electrons energy to escape• Requires chip package to have window

    addr0addr1

    addr(A-1)

    en

    addr

    Let A = log2 M

    a0a1

    d0

    d1

    d(M-1)

    a(A-1)

    e

    A ⋅ Mdecoder

    wordenable

    data(N-1) data(N-2) data0

    bit storageblock(a cell )

    word

    ,,,,

    cellword

    enableword

    enable

    data

    data

    cell cell

    wordenable

    data line data line

    eÐeÐa

    ting

    g

    a

    t

    e t

    r

    t

    or

    trapped electrons

    01

    float

    ing-

    gate

    tra

    nsis

    tor

  • 77Digital DesignCopyright © 2006Frank Vahid

    ROM Types• Electronically-Erasable Programmable ROM

    (EEPROM)– Similar to EPROM

    • Uses floating-gate transistor, electronic programming to trap electrons in certain cells

    – But erasing done electronically, not using UV light– Erasing done one word at a time

    • Flash memory– Like EEPROM, but all words (or large blocks of

    words) can be erased simultaneously– Become common relatively recently (late 1990s)

    • Both types are in-system programmable– Can be programmed with new stored bits while in the

    system in which the ROM operates• Requires bi-directional data lines, and write control input• Also need busy output to indicate that erasing is in

    progress – erasing takes some time

    a

    ting

    g

    a

    t

    e t

    r

    t

    or

    32

    10data

    addr

    en

    write

    busy

    1024x32EEPROM

  • 78Digital DesignCopyright © 2006Frank Vahid

    ROM Example: Talking Doll4096x16 ROM

    processor

    d

    a

    Ra

    16

    Ren

    da_ld

    digital-to-analog

    converter

    v

    speaker

    vibrationsensor

    “Hello there!”

    “Hello there!” audio divided into 4096 samples, storedin ROM

    “Hello there!”

    a

    • Doll plays prerecorded message, trigger by vibration– Message must be stored without power supply Use a ROM, not a RAM,

    because ROM is nonvolatile• And because message will never change, use a mask-programmed ROM or

    OTP ROM– Processor should wait for vibration (v=1), then read words 0 to 4095 from

    the ROM, writing each to the d-to-a

  • 79Digital DesignCopyright © 2006Frank Vahid

    ROM Example: Talking Doll

    d

    a

    4096x16 ROM

    processor

    Ra

    16

    Ren

    da_ld

    digital-to-analog

    converter

    v

    Sa=0

    da_ld=1a=a+1a=4095

    a

  • 80Digital DesignCopyright © 2006Frank Vahid

    ROM Example: Digital Telephone Answering Machine Using a Flash Memory

    • Want to record the outgoing announcement

    – When rec=1, record digitized sound in locations 0 to 4095

    – When play=1, play those stored sounds to digital-to-analog converter

    • What type of memory?– Should store without power

    supply – ROM, not RAM– Should be in-system

    programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM

    – Will always erase entire memory when reprogramming – Flash better than EEPROM

    analog-to-digital

    converterdigital-to-

    analogconverterad_ld

    da_ld

    Rrw Rener buRa12

    16

    processor

    ad_buf

    busy

    4096x16 Flash

    recplayrecord

    microphone speaker

    “We’re not home.”

  • 81Digital DesignCopyright © 2006Frank Vahid

    ROM Example: Digital Telephone Answering Machine Using a Flash Memory

    • High-level state machine– Once rec=1, begin

    erasing flash by setting er=1

    – Wait for flash to finish erasing by waiting for bu=0

    – Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a

    en

    analog-to-digital

    converterdigital-to-

    analogconverterad_ld

    da_ld

    Rrw Ren er buRa12

    16

    processor

    ad_buf

    4096x16 Flash

    recplayrecord

    microphone speaker

    a

    w

    d r

    Ter=0

    bu

    bu’

    er=1rec

    S

    Local register: a (13 bits)

    a=4096

    a

  • 82Digital DesignCopyright © 2006Frank Vahid

    Blurring of Distinction Between ROM and RAM• We said that

    – RAM is readable and writable– ROM is read-only

    • But some ROMs act almost like RAMs– EEPROM and Flash are in-system programmable

    • Essentially means that writes are slow– Also, number of writes may be limited (perhaps a few million times)

    • And, some RAMs act almost like ROMs– Non-volatile RAMs: Can save their data without the power supply

    • One type: Built-in battery, may work for up to 10 years• Another type: Includes ROM backup for RAM – controller writes RAM contents to

    ROM before turning off• New memory technologies evolving that merge RAM and ROM benefits

    – e.g., MRAM• Bottom line

    – Lot of choices available to designer, must find best fit with design goals

    EEPROMROM Flash

    NVRAM

    RAMa

  • 5.7

    83Digital DesignCopyright © 2006Frank Vahid

    Queues• A queue is another component

    sometimes used during RTL design

    • Queue: A list written to at the back, from read from the front– Like a list of waiting restaurant

    customers• Writing called a push, reading

    called a pop• Because first item written into a

    queue will be the first item read out, also called a FIFO (first-in-first-out)

    frontback

    read (andremove) itemsfrom front ofthe queue

    write itemsto the backof the queue

  • 84Digital DesignCopyright © 2006Frank Vahid

    Queues

    r f

    01234567

    fr

    0

    A

    1234567

    A

    fr

    0

    AB

    1234567

    B

    fr

    0

    B

    1234567

    A

    • Queue has addresses, and two pointers: rear and front– Initially both point to 0

    • Push (write)– Item written to address pointed to

    by rear– rear incremented

    • Pop (read)– Item read from address pointed

    to by front– front incremented

    • If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)

    a

    a

    a

  • 85Digital DesignCopyright © 2006Frank Vahid

    Queues• Treat memory as a circle

    – If front or rear reaches 7, next (incremented) value should be 0 rather than 8 (for a queue with addresses 0 to 7)

    • Two conditions of interest– Full queue – no room for more items

    • In 8-entry queue, means 8 items present• No further pushes allowed until a pop occurs• Causes front=rear

    – Empty queue – no items• No pops allowed until a push occurs• Causes front=rear

    – Both conditions have front=rear• To detect whether front=rear means full or

    empty, need state machine that detects if previous operation was push or pop, sets full or empty output signal (respectively)

    fr

    0

    B

    1234567

    A

    B

    1 7

    2 6

    3 5

    4

    0

    frr

    a

  • 86Digital DesignCopyright © 2006Frank Vahid

    Queue Implementation8⋅ 16 register file

    clr

    3-bitup counter

    3-bitup counter

    incclr

    inc

    rear front

    =

    wr

    rd

    reset

    wdata rdata16 16

    33

    wdata

    waddrwr

    rdata

    raddr

    rd

    eq

    Cont

    rolle

    r

    full

    empty8-word 16-bit queue

    • Can use register file for item storage

    • Implement rear and frontusing up counters– rear used as register file’s

    write address, front as read address

    • Simple controller would set control lines for pushes and pops, and also detect full and empty situations– FSM for controller not

    shown

  • 87Digital DesignCopyright © 2006Frank Vahid

    Common Uses of a Queue• Computer keyboard

    – Pushes pressed keys onto queue, meanwhile pops and sends to computer

    • Digital video recorder– Pushes captured frames, meanwhile pops frames, compresses

    them, and stores them

    • Computer network routers– Pushes incoming packets onto queue, meanwhile pops packets,

    processes destination information, and forwards each packet out over appropriate port

  • 88Digital DesignCopyright © 2006Frank Vahid

    Queue Usage Example

    r f

    01234567

    fr

    0123456

    9585723

    7

    fr

    01234567

    f r

    01234567

    9585723

    95857236

    r f

    01234567

    data:9

    full35857236

    ERROR! Pushing a full queueresults in unknown state

    Initially emptyqueue

    1. After pushing9, 5, 8, 5, 7, 2, 3

    2. After popping

    3. After pushing 6

    4. After pushing 3

    5. After pushing 4

    • Example series of pushes and pops– Note how rear and front

    pointers move– Note that popping doesn’t

    really remove the data from the queue, but that data is no longer accessible

    – Note how rear (and front) wraps around from address 7 to 0

    • Note: pushing a full queue is an error– As is popping an empty queue

  • 5.8

    89Digital DesignCopyright © 2006Frank Vahid

    Hierarchy – A Key Design Concept

    • Hierarchy– An organization with a few items at the

    top, with each item decomposed into other items

    – Common example: A country• 1 item at the top (the country)• Country item decomposed into

    state/province items• Each state/province item decomposed into

    city items• Hierarchy helps us manage complexity

    – To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc.

    – Imagine trying to comprehend a controller and datapath at the level of gates

    P

    r

    o

    vin

    c

    e 3

    P

    r

    o

    vin

    c

    e 2

    P

    r

    o

    vin

    c

    e 1

    CityF

    Country A

    vinvin

    vin

    P

    r

    o

    c

    P

    r

    o

    c

    e 2

    P

    r

    o

    c

    e 1

    Province 1

    Province 2

    Province 3

    Province 1

    Province 2

    Province 3

    Map showing all levels of hierarchy

    Map showing just top two levels of hierarchy

    CityGCityE

    CityDCityA

    CityB

    CityC

    Country A

  • 90Digital DesignCopyright © 2006Frank Vahid

    Hierarchy and Abstraction

    • Abstraction– Hierarchy often involves not just grouping

    items into a new item, but also associating higher-level behavior with the new item, known as abstraction

    • e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers

    – Frees designer from having to remember, or even from having to understand, the lower-level details P

    r

    o

    vin

    c

    e 3

    P

    r

    o

    vin

    c

    e 2

    P

    r

    o

    vin

    c

    e 1

    vin

    P

    r

    o

    c

    e 1

    vin

    P

    r

    o

    P

    r

    o

    a7.. a0 b7.. b0

    s7.. s0co

    ci8-bit adder

    vin

    c c

    e 2 e 3

  • 91Digital DesignCopyright © 2006Frank Vahid

    Hierarchy and Composing Larger Components from Smaller Versions

    • A common task is to compose smaller components into a larger one

    – Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate

    • Can simple compose the 9-input gate from several 3-input gates

    – Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux

    • s2 selects either top or bottom 4x1• s1s0 select particular 4x1 input• Implements 8x1 mux – 8 data inputs, 3 selects, one output

    vin

    P

    r

    o

    c

    e 3

    P

    r

    o

    c

    e 2

    4⋅ 1

    2⋅ 1

    d

    d

    i0i1

    i1

    i0

    i2i3

    i0i1i2i3

    i4i5i6i7

    s1 s0

    s04⋅ 1

    d

    i0i1i2i3

    s1 s0

    s1 s0 s2

    a

    P

    P

    r

    o

    vin

    c

    e 3

    P

    r

    o

    vin

    c

    e 2

    P

    r

    o

    vin

    c

    e 1

    r

    o

    vin

    vin

    c

    e 1

  • 92Digital DesignCopyright © 2006Frank Vahid

    Hierarchy and Composing Larger Components from Smaller Versions

    • Composing memory very common• Making memory words wider

    – Easy – just place memories side-by-side until desired width obtained– Share address/control lines, concatenate data lines– Example: Compose 1024x8 ROMs into 1024x32 ROM

    P

    r

    o

    vin

    c

    e 3

    P

    r

    o

    vin

    c

    e 2

    P

    r

    o

    vin

    c

    e 1

    vin

    P

    r

    o

    c

    e 3

    P

    r

    o

    P

    r

    o

    1024x32ROM

    1024x8ROM

    data

    addr

    endata

    8 8

    32

    8 8

    10

    10

    en

    en

    addr

    addr

    data(31..0)

    1024x8ROM

    addr

    endata

    1024x8ROM

    addr

    endata

    1024x8ROM

    addr

    endata

  • 93Digital DesignCopyright © 2006Frank Vahid

    Hierarchy and Composing Larger Components from Smaller Versions

    • Creating memory with more words – Put memories on top of one another until the

    number of desired words is achieved– Use decoder to select among the memories

    • Can use highest order address input(s) as decoder input

    • Although actually, any address line could be used

    – Example: Compose 1024x8 memories into 2048x8 memory

    P

    r

    o

    vin

    c

    e 3

    P

    r

    o

    vin

    c

    e 2

    P

    r

    o

    vin

    c

    e 1

    vinvin

    P

    r

    o

    c

    e 3

    P

    r

    o

    c

    e 2

    P

    1024x8ROM

    addr

    en data

    1024x8ROM

    addr

    en data

    0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 1 0

    0 1 1 1 1 1 1 1 1 1 00 1 1 1 1 1 1 1 1 1 1

    1 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 1 0

    1 1 1 1 1 1 1 1 1 1 01 1 1 1 1 1 1 1 1 1 1

    a0a10a9a8

    a10 just chooses which memory to access

    a

    2048x8ROM

    data

    8

    11

    11

    en

    addr

    1024x8ROM

    addr

    en data

    8

    1024x8ROM

    addr

    en data

    8

    a9..a0

    a10 d0

    d1

    en

    addr

    1x2dcdi0e

    a

    To create memory with more words and wider words, can first compose to enough words, then widen.

  • 94Digital DesignCopyright © 2006Frank Vahid

    Chapter Summary– Modern digital design involves creating processor-level components– Four-step RTL method can be used

    • 1. High-level state machine 2. Create datapath 3. Connect datapathto controller 4. Derive controller FSM

    – Several example• Control dominated, data dominated, and mix

    – Determining fastest clock frequency• By finding critical path

    – Behavioral-level design – C to gates• By using method to convert C (subset) to high-level state machine

    – Additional RTL components• Memory: RAM, ROM• Queues

    – Hierarchy: A key concept used throughout Chapters 2-5

    Digital DesignIntroductionRTL Design: Capture Behavior, Convert to CircuitRTL Design MethodRTL Design Method: “Preview” ExamplePreview Example: Step 1 --Capture High-Level State MachinePreview Example: Step 2 -- Create DatapathPreview Example: Step 3 – Connect Datapath to a ControllerPreview Example: Step 4 – Derive the Controller’s FSMPreview Example: Completing the DesignStep 1: Create a High-Level State MachineStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 2: Create a DatapathStep 2 Example: Laser-Based Distance MeasurerStep 2 Example: Laser-Based Distance MeasurerStep 2 Example Showing Mux UseStep 3: Connecting the Datapath to a ControllerStep 4: Deriving the Controller’s FSMStep 4: Deriving the Controller’s FSMStep 4RTL Design Examples and IssuesRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeControl vs. Data Dominated RTL DesignData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterDetermining Clock FrequencyCritical PathCritical Path Considering Wire DelaysA Circuit May Have Numerous PathsBehavioral Level Design: C to GatesBehavioral-Level Design: Start with C (or Similar Language)Converting from C to High-Level State MachineConverting from C to High-Level State MachineSimple Example of Converting from C to High-Level State MachineExample: Converting Sum-of-Absolute-Differences C code to High-Level State MachineMemory ComponentsRandom Access Memory (RAM)RAM Internal StructureStatic RAM (SRAM)Static RAM (SRAM)Dynamic RAM (DRAM)Comparing Memory TypesReading and Writing a RAMRAM Example: Digital Sound RecorderRAM Example: Digital Sound RecorderRAM Example: Digital Sound RecorderRead-Only Memory – ROMRead-Only Memory – ROMROM TypesROM TypesROM TypesROM TypesROM Example: Talking DollROM Example: Talking DollROM Example: Digital Telephone Answering Machine Using a Flash MemoryROM Example: Digital Telephone Answering Machine Using a Flash MemoryBlurring of Distinction Between ROM and RAMQueuesQueuesQueuesQueue ImplementationCommon Uses of a QueueQueue Usage ExampleHierarchy – A Key Design ConceptHierarchy and AbstractionHierarchy and Composing Larger Components from Smaller VersionsHierarchy and Composing Larger Components from Smaller VersionsHierarchy and Composing Larger Components from Smaller VersionsChapter Summary