Register Transfer Methodology II · Methodology II. RTL Hardware Design by P. Chu Chapter 12 2...

Preview:

Citation preview

RTL Hardware Design

by P. Chu

Chapter 12 1

Register Transfer

Methodology II

RTL Hardware Design

by P. Chu

Chapter 12 2

Outline

1. Design example: One–shot pulse

generator

2. Design Example: GCD

3. Design Example: UART

4. Design Example: SRAM Interface

Controller

5. Square root approximation circuit

RTL Hardware Design

by P. Chu

Chapter 12 3

1. One–shot pulse generator

• Sequential circuit divided into

– Regular sequential circuit: w/ regular next-state logic

– FSM: w/ random next-state logic

– FSMD: w/ both

• Division for code development; no formal

definition;

• Some design can be coded in different types

• FSMD is most flexible

• One–shot pulse generator as an example

RTL Hardware Design

by P. Chu

Chapter 12 4

• Basic block diagram

RTL Hardware Design

by P. Chu

Chapter 12 5

• Refined block diagram of FSMD

RTL Hardware Design

by P. Chu

Chapter 12 6

• Regular sequential circuit. E.g., mod-10

counter

RTL Hardware Design

by P. Chu

Chapter 12 7

• FSM. E.g., edge-detection circuit

RTL Hardware Design

by P. Chu

Chapter 12 8

• FSMD.

E.g., multiplier

RTL Hardware Design

by P. Chu

Chapter 12 9

• One-shot pulse generator

– I/O: Input: go, stop; Output: pulse

– go is the trigger signal, usually asserted for

only one clock cycle

– During normal operation, assertion of go

activates pulse for 5 clock cycles

– If go is asserted again during this interval, it

will be ignored

– If stop is asserted during this interval, pulse

will be cut short and return to 0

RTL Hardware Design

by P. Chu

Chapter 12 10

• FSM implementation

RTL Hardware Design

by P. Chu

Chapter 12 11

RTL Hardware Design

by P. Chu

Chapter 12 12

RTL Hardware Design

by P. Chu

Chapter 12 13

RTL Hardware Design

by P. Chu

Chapter 12 14

• Regular sequential circuit implementation

– Based on a mod-5 counter

– Use a flag FF to indicate whether counter should be active

– Code difficult to comprehend

RTL Hardware Design

by P. Chu

Chapter 12 15

RTL Hardware Design

by P. Chu

Chapter 12 16

• FSMD Implementation

RTL Hardware Design

by P. Chu

Chapter 12 17

RTL Hardware Design

by P. Chu

Chapter 12 18

RTL Hardware Design

by P. Chu

Chapter 12 19

• Comparison:

– FSMD is most flexible and easy to

comprehend

• What happens to the following

modifications

– The delay extend from 5 cycles to 100 cycles

– The stop signal is only effective for the first 2

delay cycles and will be ignored otherwise

RTL Hardware Design

by P. Chu

Chapter 12 20

• “Programmable” one-shot generator

– The desired width can be programmed.

– The circuit enters the programming mode when both go and stop are asserted

– The desired width shifted in via go in the next

three clock cycles

RTL Hardware Design

by P. Chu

Chapter 12 21

• Can be easily

extended in

ASMD chart

• How about FSM

and regular

sequential circuit?

RTL Hardware Design

by P. Chu

Chapter 12 22

2. GCD circuit

• GCD: Greatest Common Divisor

– E.g, gcd(1, 10)=1, gcd(12,9)=3

• GCD without division:

RTL Hardware Design

by P. Chu

Chapter 12 23

• Pseudo algorithm

RTL Hardware Design

by P. Chu

Chapter 12 24

• Modified pseudo algorithm w/o while loop

RTL Hardware Design

by P. Chu

Chapter 12 25

• ASMD chart

RTL Hardware Design

by P. Chu

Chapter 12 26

• VHDL code

RTL Hardware Design

by P. Chu

Chapter 12 27

RTL Hardware Design

by P. Chu

Chapter 12 28

RTL Hardware Design

by P. Chu

Chapter 12 29

• What is the problem of this code?

• Another observation

RTL Hardware Design

by P. Chu

Chapter 12 30

RTL Hardware Design

by P. Chu

Chapter 12 31

• What is the performance now?

• Can we do better with more hardware

resources

RTL Hardware Design

by P. Chu

Chapter 12 32

3. SRAM Controller

• SRAM = Static Random Access Memory

• To improve density, clock is not used

• Basic Signals

– Oe = Output enable

– Cs = chip select

– We write enable

• Read cycle (when data becomes valid)

– Address or oe-controlled

RTL Hardware Design

by P. Chu

Chapter 12 33

• Basic block diagram

RTL Hardware Design

by P. Chu

Chapter 12 34

• Taa = Address access time– Roughly used as “speed” e.g. “20-ns SRAM”

• Toh = Output hold time– Time data valid after address change

• Trc = Total (min) time of read cycle– Close to Taa for SRAM

READ CYCLE

RTL Hardware Design

by P. Chu

Chapter 12 35

• Tolz = Output enable to low Z time– Time to leave high-Z state after OE enabled

• Toe = Output enable to output valid time– Time data valid after OE

• Tohz = Output to Z time– Time to enter Hi-Z state after OE disabled

RTL Hardware Design

by P. Chu

Chapter 12 36

• Twp = Time that we asserted– Data is latched during this period

• Tas = Address setup time– Time address valid before we

• Tah = Address hold time– Time address valid after we

WRITE CYCLE

RTL Hardware Design

by P. Chu

Chapter 12 37

• Tds = Data setup time– Time data valid before latching edge (we 0 to 1)

• Tdh = Data hold time– Time data valid after latching edge

• Twr = Write cycle time– Time (min) between write operations

WRITE CYCLE

RTL Hardware Design

by P. Chu

Chapter 12 38

• Example Timing Parameters

RTL Hardware Design

by P. Chu

Chapter 12 39

• Takes commands from system

• Executes read/write cycles

• Role of SRAM Controller

RTL Hardware Design

by P. Chu

Chapter 12 40

• Block Diagram

RTL Hardware Design

by P. Chu

Chapter 12 41

• Sketch FSM

– Activities of read/write cycles

• Refine

– Consider actual timings

– Identify overlapping operations

• Control Path of SRAM Controller

RTL Hardware Design

by P. Chu

Chapter 12 42

• 5 State write cycle

RTL Hardware Design

by P. Chu

Chapter 12 43

• Revised 3 State write cycle

RTL Hardware Design

by P. Chu

Chapter 12 44

• Control Path

for Slow

SRAM

RTL Hardware Design

by P. Chu

Chapter 12 45

• Write Timing (if Tclk=25ns)

> 5ns> 20ns

> 70ns

> 35ns > 5 ns

RTL Hardware Design

by P. Chu

Chapter 12 46

• Revised States (Write)

• 5 Cycles

+ idle

= 150 ns

• Margin of at

least 5 ns

RTL Hardware Design

by P. Chu

Chapter 12 47

• Repeat Procedure for Read Cycle> 5ns < 120ns

> 120 ns

TohN/A

RTL Hardware Design

by P. Chu

Chapter 12 48

• Assumed ideal FSMD

– Ignoring: output delay, hold/setup time on Rs2m

• More detailed read timing

Tctrl = OE output

delay or Addr

output delay

Toz = Time to High-

Z after OE goes

high

RTL Hardware Design

by P. Chu

Chapter 12 49

• Setup Time Check

– Minimize Tctrloutput look-ahead

Tctrl = Tcq

– 120ns SRAM

25ns Clock

– Met by most device technologies

RTL Hardware Design

by P. Chu

Chapter 12 50

• Hold Time Check

– Since Tctrl = Tcq ,

also easily met

• Other considerations

– Data bus conflict

– Tds, Tdh

– Because of large margin

Ideal, initial analysis probably valid

RTL Hardware Design

by P. Chu

Chapter 12 51

• Final SRAM

Controller

RTL Hardware Design

by P. Chu

Chapter 12 52

• Slow SRAM controller

Only suitable for sporadic access

• What about

Fast 20-ns

SRAM?

RTL Hardware Design

by P. Chu

Chapter 12 53

– Meets timing, but very narrow margin

– Manual fine-tuning

– If back-to-back, requires cell-level (not RTL) design

• One Clock Cycle Access (+idle)

RTL Hardware Design

by P. Chu

Chapter 12 54

Square root approximation circuit

• A example of data-oriented (computation-intensive) application

• Equation:

• 0.125x and 0.5y corresponds to shift right 3 bits and 1 bit

RTL Hardware Design

by P. Chu

Chapter 12 55

• Pseudo code:

RTL Hardware Design

by P. Chu

Chapter 12 56

• Direct “data-flow” implementation

RTL Hardware Design

by P. Chu

Chapter 12 57

RTL Hardware Design

by P. Chu

Chapter 12 58

• Requires one adder and six subtractors

• Code contains only concurrent signal

assignment statements

• The order is not important.

• Sequence of execution is embedded in the

flow of data

RTL Hardware Design

by P. Chu

Chapter 12 59

• Data flow graph

– Shows data dependency

– Node (circle): an operation

– Arcs: input and output

variables

• Note that there is limited

degree of parallelism

– At most two operations can

be perform simultaneously

RTL Hardware Design

by P. Chu

Chapter 12 60

• RT methodology can be used to share the operators

• Tasks in converting a dataflow graph to an ASMD chart

– Scheduling: when a function (circle) can start execution

– Binding: which functional unit is assigned to perform the operation

• In square root algorithm,

– all operations can be performed by a modified addition unit

– No function unit is needed for shifting

RTL Hardware Design

by P. Chu

Chapter 12 61

• Scheduling with two functional units

RTL Hardware Design

by P. Chu

Chapter 12 62

• Scheduling with

one functional unit

RTL Hardware Design

by P. Chu

Chapter 12 63

• ASMD

chart

RTL Hardware Design

by P. Chu

Chapter 12 64

• Registers can be shared as well

– reduce the number of unique variables

– A variable can be reused if its value is no longer

needed

• E.g.,

RTL Hardware Design

by P. Chu

Chapter 12 65

RTL Hardware Design

by P. Chu

Chapter 12 66

• VHDL code

– Needs to manually code the data path to

ensure sharing of functional units

– One unit for abs and min

– One unit for abs, min, - and +

– Can be implemented by using an

adder/subtractor with special input and output

routing circuits

RTL Hardware Design

by P. Chu

Chapter 12 67

RTL Hardware Design

by P. Chu

Chapter 12 68

RTL Hardware Design

by P. Chu

Chapter 12 69

RTL Hardware Design

by P. Chu

Chapter 12 70

RTL Hardware Design

by P. Chu

Chapter 12 71

High-level synthesis

• Convert a “dataflow code” into ASMD based code (RTL code).

– RTL code can be optimized for performance (min # clock cycles), area (min # functional units) etc.

– Perform scheduling, binding

– Minimize # registers and muxes

• Mainly for computation intensive applications (e.g., DSP)

RTL Hardware Design

by P. Chu

Chapter 12 72

Summary

Design examples:

Pulse Generator, GCD, SRAM, Sqrt. Approx.

RT Design Methodology

Flexible, often clearer than pure FSM

Important for initial algorithm to be efficient

Mapping Dataflow to FSMD

Scheduling/binding

Tradeoff of speed/area

High-level Synth for complex algorithms

Recommended