EEM 334 Digital Systems II - eskisehir.edu.treem.eskisehir.edu.tr/userfiles/atdogan/files/L09-SeqCktsinPractice.pdf · Poor Design Practices Poor Design Practices • Synchronous

3/21/12


EEM 334 Digital Systems II

Outline

•  Poor Design Practices
•  Counters
•  Register File
•  Pipelined Circuits

Poor Design Practices

•  Synchronous design is the most important methodology

•  Poor practice in the past (to save chips)
   – Misuse of asynchronous reset
   – Misuse of gated clock
   – Misuse of derived clock

Misuse of Asynchronous Signals

Do not use asynchronous reset/preset signals in regular operation

Decade Counter with Asynchronous Reset



Problems of the Design

•  The transition from 9 (1001) to 0 (0000) is noisy
   – 1001 -> 1010 -> 0000
•  The design is not reliable
   – A combinational circuit is needed to produce the clear signal, and glitches may exist
   – Every glitch will reset the register to “0000”
•  Asynchronous reset is used in normal operation
   – Timing analysis is difficult

Remedy: Load “0000” Synchronously
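
The remedy can be sketched in VHDL as follows. This is a minimal illustration, not the circuit from the slide figure: the entity name mod10_counter, the port names, and the architecture name are assumptions. The asynchronous reset is kept for system initialization only; the return from 9 to 0 happens in the next-state logic, so the register loads “0000” on a clock edge like any other state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a mod-10 (decade) counter.
entity mod10_counter is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end mod10_counter;

architecture sync_clear_arch of mod10_counter is
   signal r_reg, r_next : unsigned(3 downto 0);
begin
   -- State register. The asynchronous reset is meant for
   -- power-on initialization only, not for normal counting.
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;

   -- Next-state logic: after 9 (1001), load "0000" synchronously.
   r_next <= (others => '0') when r_reg = 9 else
             r_reg + 1;

   -- Output logic.
   q <= std_logic_vector(r_reg);
end sync_clear_arch;
```

Because the comparison with 9 only feeds the register's data input, any glitch on it settles before the next clock edge, and the intermediate state 1010 never appears.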



Misuse of Gated Clocks

Do not use a gated clock to suspend system operation

Binary Counter with a Gated Clock


Problems of the Design

•  The enable signal changes independently of the clock signal
   – The output pulse can be very narrow
   – It can cause the counter to malfunction
•  If the enable is not glitch free
   – Glitches pass through the AND gate
   – The counter may see them as clock edges

Remedy: Use a Synchronous Enable
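
A possible VHDL sketch of this remedy is shown below. The entity name binary_counter_en, the 4-bit width, and the port names are assumptions made for illustration. The clock reaches the register unmodified; the enable only selects between "increment" and "hold" in the next-state logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 4-bit binary counter with synchronous enable.
entity binary_counter_en is
   port(
      clk, reset : in  std_logic;
      en         : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end binary_counter_en;

architecture sync_en_arch of binary_counter_en is
   signal r_reg, r_next : unsigned(3 downto 0);
begin
   -- State register, clocked directly by clk (no gating).
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;

   -- Next-state logic: count when enabled, otherwise hold.
   r_next <= r_reg + 1 when en = '1' else
             r_reg;

   -- Output logic.
   q <= std_logic_vector(r_reg);
end sync_en_arch;
```

With this structure, en is sampled only at the rising edge of clk, so a narrow pulse or a glitch on en between edges cannot produce a spurious count.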


Misuse of Derived Clocks

•  A digital system with many subsystems
•  Subsystems: fast (processor), slow (I/O)
•  Slow clock signal for the slow subsystem
•  The system is no longer synchronous
   – Two clocks with different frequencies and phases
   – Timing analysis is very involved

Remedy: Use a One-Clock Enable Pulse

Timer (with derived clocks)

•  Mod-1M counter: [0, 999999]
   – clk: 1 MHz
   – sclk: low for [0, 499999]; high for [500000, 999999]
•  Mod-60 second counter: [0, 59]
   – sclk: 1 Hz
   – mclk: low for [0, 29]; high for [30, 59]
•  Mod-60 minute counter: [0, 59]
   – mclk: 1/60 Hz

Timer (with enable pulses)

•  Mod-1M counter: [0, 999999]
   – s_en: low for [0, 999999] except at 500000
•  Mod-60 second counter: [0, 59]
   – m_en: low for [0, 59] except at 30
•  Mod-60 minute counter: [0, 59]
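
The single-clock timer can be sketched in VHDL roughly as follows. The entity and port names are assumptions. One detail goes slightly beyond the bullets and is labeled in the code: m_en is qualified with s_en so that it is high for only one cycle of clk, rather than for the whole second during which the second counter holds 30.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: second/minute timer driven by a 1 MHz clock.
entity timer is
   port(
      clk, reset : in  std_logic;
      sec, min   : out std_logic_vector(5 downto 0)
   );
end timer;

architecture single_clock_arch of timer is
   signal r_reg, r_next : unsigned(19 downto 0);  -- mod-1M counter
   signal s_reg, s_next : unsigned(5 downto 0);   -- seconds
   signal m_reg, m_next : unsigned(5 downto 0);   -- minutes
   signal s_en, m_en    : std_logic;
begin
   -- All registers share the same clock.
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
         s_reg <= (others => '0');
         m_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
         s_reg <= s_next;
         m_reg <= m_next;
      end if;
   end process;

   -- Mod-1M counter and one-clock enable pulse for the seconds.
   r_next <= (others => '0') when r_reg = 999999 else
             r_reg + 1;
   s_en   <= '1' when r_reg = 500000 else '0';

   -- Mod-60 second counter, advanced only when s_en is high.
   s_next <= (others => '0') when (s_reg = 59 and s_en = '1') else
             s_reg + 1       when s_en = '1' else
             s_reg;
   -- Assumption: AND with s_en keeps m_en one clock cycle wide.
   m_en   <= '1' when (s_reg = 30 and s_en = '1') else '0';

   -- Mod-60 minute counter, advanced only when m_en is high.
   m_next <= (others => '0') when (m_reg = 59 and m_en = '1') else
             m_reg + 1       when m_en = '1' else
             m_reg;

   -- Output logic.
   sec <= std_logic_vector(s_reg);
   min <= std_logic_vector(m_reg);
end single_clock_arch;
```

The enable pulses are asserted at the same counts where the derived clocks sclk and mclk would have had their rising edges, so the counting behavior is preserved while the whole system stays in one clock domain.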


Counters

•  Binary counter
•  Gray counter
•  Ring counter
•  Linear Feedback Shift Register (LFSR)
•  BCD counter

Binary Counter

•  State follows the binary counting sequence
•  Use an incrementor for the next-state logic

Block diagram: the register r_reg (inputs clk and reset) drives the output q; an incrementor (+1) computes r_next from r_reg.

Gray Counter

•  State changes one bit at a time
•  Use a Gray incrementor
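
One common way to build a Gray incrementor is to convert the Gray code to binary, add 1, and convert back. The sketch below follows that approach; it is an illustration under assumed names (gray_counter4, WIDTH, g_reg), not necessarily the circuit shown on the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 4-bit Gray counter.
entity gray_counter4 is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end gray_counter4;

architecture arch of gray_counter4 is
   constant WIDTH : natural := 4;
   signal g_reg, g_next : unsigned(WIDTH-1 downto 0);
begin
   -- State register holds the Gray-coded value.
   process(clk, reset)
   begin
      if reset = '1' then
         g_reg <= (others => '0');
      elsif rising_edge(clk) then
         g_reg <= g_next;
      end if;
   end process;

   -- Gray incrementor: Gray -> binary, +1, binary -> Gray.
   process(g_reg)
      variable bin : unsigned(WIDTH-1 downto 0);
   begin
      -- Gray to binary: b(i) = g(i) xor b(i+1), MSB copied as is.
      bin(WIDTH-1) := g_reg(WIDTH-1);
      for i in WIDTH-2 downto 0 loop
         bin(i) := g_reg(i) xor bin(i+1);
      end loop;
      -- Binary increment (wraps around at 2**WIDTH).
      bin := bin + 1;
      -- Binary to Gray: g(i) = b(i) xor b(i+1).
      g_next <= bin xor shift_right(bin, 1);
   end process;

   -- Output logic.
   q <= std_logic_vector(g_reg);
end arch;
```

The next-state logic is larger than a plain incrementor, but consecutive register values differ in exactly one bit, including the wrap from the last code word back to 0000.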

LFSR (Linear Feedback Shift Register)

•  A shift register with a special feedback circuit to generate the serial input
•  The feedback circuit performs an xor operation over specific bits
•  Can circulate through 2^n - 1 states for an n-bit register

4-bit LFSR

•  An n-bit LFSR can cycle through 2^n - 1 states
•  A feedback circuit that achieves this always exists
•  The sequence is pseudorandom

LFSR

•  Pseudorandom: used in testing, data encryption/decryption
•  A counter with simple next-state logic
   – e.g., a 128-bit LFSR uses 3 xor gates to circulate through 2^128 - 1 patterns (2^128 cycles at 100 GHz take on the order of 10^20 years)
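
A 4-bit LFSR can be sketched in VHDL as below. The entity name, the seed value “0001”, and the choice of taps (bits 1 and 0, shifting right) are assumptions; this is one tap selection that yields the maximal 15-state sequence, and the slide figure may use a different one. The all-zero state is excluded, which is why the register is initialized to a nonzero seed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: 4-bit maximal-length LFSR.
entity lfsr4 is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end lfsr4;

architecture arch of lfsr4 is
   -- Assumed seed; any nonzero value works.
   constant SEED : std_logic_vector(3 downto 0) := "0001";
   signal r_reg, r_next : std_logic_vector(3 downto 0);
   signal fb            : std_logic;
begin
   -- Shift register. Reset loads the seed, because the all-zero
   -- state would lock the LFSR (0 xor 0 = 0).
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= SEED;
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;

   -- Feedback: xor of the two least significant bits (assumed taps).
   fb     <= r_reg(1) xor r_reg(0);
   -- Shift right by one; the feedback bit is the serial input.
   r_next <= fb & r_reg(3 downto 1);

   -- Output logic.
   q <= r_reg;
end arch;
```

Starting from 0001, the register steps through 1000, 0100, 0010, 1001, 1100, 0110, 1011, 0101, 1010, 1101, 1110, 1111, 0111, 0011 and then returns to 0001, visiting all 2^4 - 1 = 15 nonzero states.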

Register File

•  Registers arranged as a 1-D array
•  Each register is identified by an address
•  Normally has 1 write port (with a write enable signal)
•  Can have multiple read ports
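
A small register file along these lines might look as follows in VHDL. The entity name reg_file, the generics W and B, and the choice of two read ports are assumptions for illustration. Writing is synchronous and controlled by wr_en; reading is combinational, with one multiplexer per read port.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 2**W words of B bits,
-- one write port and two read ports.
entity reg_file is
   generic(
      W : natural := 2;   -- number of address bits
      B : natural := 8    -- number of data bits
   );
   port(
      clk, reset       : in  std_logic;
      wr_en            : in  std_logic;
      w_addr           : in  std_logic_vector(W-1 downto 0);
      w_data           : in  std_logic_vector(B-1 downto 0);
      r_addr0, r_addr1 : in  std_logic_vector(W-1 downto 0);
      r_data0, r_data1 : out std_logic_vector(B-1 downto 0)
   );
end reg_file;

architecture arch of reg_file is
   -- The registers form a 1-D array indexed by address.
   type reg_array_type is array (0 to 2**W-1)
      of std_logic_vector(B-1 downto 0);
   signal array_reg : reg_array_type;
begin
   -- Write port: the addressed register is updated on the
   -- clock edge only when wr_en is asserted.
   process(clk, reset)
   begin
      if reset = '1' then
         array_reg <= (others => (others => '0'));
      elsif rising_edge(clk) then
         if wr_en = '1' then
            array_reg(to_integer(unsigned(w_addr))) <= w_data;
         end if;
      end if;
   end process;

   -- Read ports: each one is a multiplexer over the array.
   r_data0 <= array_reg(to_integer(unsigned(r_addr0)));
   r_data1 <= array_reg(to_integer(unsigned(r_addr1)));
end arch;
```

Adding a read port only adds another multiplexer, which is why multiple read ports are cheap compared with multiple write ports.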

Pipelined Circuits


Pipelining

•  Two performance criteria:
   – Delay: time required to complete one task
   – Throughput: number of tasks completed per unit time
•  E.g., ATM machine
   – Original: 3 minutes to process a transaction
     delay: 3 min; throughput: 20 trans per hour
   – Option 1: faster machine, 1.5 min to process
     delay: 1.5 min; throughput: 40 trans per hour
   – Option 2: two machines
     delay: 3 min; throughput: 40 trans per hour
•  Pipelined circuit: increases throughput

•  Non-pipelined:
   – Delay: 60 min
   – Throughput: 1/60 load per min
•  Pipelined:
   – Delay: 60 min
   – Throughput: k/(40 + 20k) load per min, about 1/20 when k is large
   – Throughput is 3 times better than non-pipelined
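
The throughput expression can be derived in a few lines. The derivation below assumes that the 60-minute task is split into three stages of 20 minutes each; this is inferred from the 60-minute delay and the 40 + 20k term and is not stated explicitly in the bullets. Here k is the number of loads and T(k) the total time to finish them.

```latex
\begin{align*}
T_{\text{non-pipelined}}(k) &= 60k \ \text{min}
  && \text{(each load waits for the previous one to finish)} \\
T_{\text{pipelined}}(k) &= 60 + 20(k-1) = 40 + 20k \ \text{min}
  && \text{(first load takes 60 min, then one load every 20 min)} \\
\text{throughput}_{\text{pipelined}} &= \frac{k}{40 + 20k}
  \;\longrightarrow\; \frac{1}{20} \ \text{load/min}
  && (k \to \infty) \\
\frac{\text{throughput}_{\text{pipelined}}}{\text{throughput}_{\text{non-pipelined}}}
  &\approx \frac{1/20}{1/60} = 3
\end{align*}
```

The delay of a single load is unchanged at 60 min; only the rate at which finished loads come out improves.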


Multiplication


8-bit Combinational Multiplier

Pipelined Comb Multiplier


   -- merged stage 0 & 1 for pipeline
   bv0 <= (others=>b(0));
   bp0 <= unsigned('0' & (bv0 and a));
   pp0 <= bp0;
   a0 <= a;
   b0 <= b(4 downto 1);
   bv1 <= (others=>b0(1));
   bp1 <= unsigned('0' & (bv1 and a0));
   pp1_next(6 downto 1) <= ('0' & pp0(5 downto 1)) + bp1;
   pp1_next(0) <= pp0(0);
   a1_next <= a0;
   b1_next <= b0(4 downto 2);
   -- stage 2
   bv2 <= (others=>b1_reg(2));
   bp2 <= unsigned('0' & (bv2 and a1_reg));
   pp2_next(7 downto 2) <= ('0' & pp1_reg(6 downto 2)) + bp2;
   pp2_next(1 downto 0) <= pp1_reg(1 downto 0);
   a2_next <= a1_reg;
   b2_next <= b1_reg(4 downto 3);
   -- stage 3
   bv3 <= (others=>b2_reg(3));
   bp3 <= unsigned('0' & (bv3 and a2_reg));
   pp3_next(8 downto 3) <= ('0' & pp2_reg(7 downto 3)) + bp3;
   pp3_next(2 downto 0) <= pp2_reg(2 downto 0);
   a3_next <= a2_reg;
   b3_next(4) <= b2_reg(4);
   -- stage 4
   bv4 <= (others=>b3_reg(4));
   bp4 <= unsigned('0' & (bv4 and a3_reg));
   pp4_next(9 downto 4) <= ('0' & pp3_reg(8 downto 4)) + bp4;
   pp4_next(3 downto 0) <= pp3_reg(3 downto 0);
   -- output
   y <= std_logic_vector(pp4_reg);
end effi_4_stage_pipe_arch;

Tree-shaped pipelined multiplier. The discussion in Section 7.5.4 shows that we can rearrange a cascading network to reduce the propagation delay. In an n-bit combinational multiplier, the critical path consists of n - 1 adders in a cascading network. The critical path can be reduced to ⌈log2 n⌉ adders when a tree-shaped network is used. The same scheme can be applied to the pipelined multiplier. The 5-bit tree-shaped combinational circuit is shown in Figure 9.21(a). The five bit products are first evaluated in parallel and then fed into the tree-shaped network. The pipelined version is shown in Figure 9.21(b). It is divided into three stages, and the required registers are shown as dark bars. Note that one bit product has to be carried through two stages. The VHDL code is given in Listing 9.23.

Listing 9.23 Tree-shaped three-stage pipelined multiplier

architecture tree_pipe_arch of mult5 is
   constant WIDTH: integer:=5;
   signal bv0, bv1, bv2, bv3, bv4:
      std_logic_vector(WIDTH-1 downto 0);
   signal bp0, bp1, bp2, bp3, bp4:
      unsigned(2*WIDTH-1 downto 0);
   signal bp4_s1_reg, bp4_s2_reg:
      unsigned(2*WIDTH-1 downto 0);
   signal bp4_s1_next, bp4_s2_next:
      unsigned(2*WIDTH-1 downto 0);
   signal pp01_reg, pp23_reg, pp0123_reg, pp01234_reg:
      unsigned(2*WIDTH-1 downto 0);
   signal pp01_next, pp23_next, pp0123_next, pp01234_next:
      unsigned(2*WIDTH-1 downto 0);
begin
   -- pipeline registers (buffers)
   process(clk,reset)
   begin
      if (reset='1') then
         pp01_reg <= (others=>'0');
         pp23_reg <= (others=>'0');
         pp0123_reg <= (others=>'0');
         pp01234_reg <= (others=>'0');
         bp4_s1_reg <= (others=>'0');
         bp4_s2_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         pp01_reg <= pp01_next;
         pp23_reg <= pp23_next;
         pp0123_reg <= pp0123_next;
         pp01234_reg <= pp01234_next;
         bp4_s1_reg <= bp4_s1_next;
         bp4_s2_reg <= bp4_s2_next;
      end if;
   end process;

   -- stage 1
   -- bit product
   bv0 <= (others=>b(0));
   bp0 <= unsigned("00000" & (bv0 and a));
   bv1 <= (others=>b(1));
   bp1 <= unsigned("0000" & (bv1 and a) & "0");
   bv2 <= (others=>b(2));
   bp2 <= unsigned("000" & (bv2 and a) & "00");
   bv3 <= (others=>b(3));
   bp3 <= unsigned("00" & (bv3 and a) & "000");
   bv4 <= (others=>b(4));
   bp4 <= unsigned("0" & (bv4 and a) & "0000");
   -- adder
   pp01_next <= bp0 + bp1;
   pp23_next <= bp2 + bp3;
   bp4_s1_next <= bp4;
   -- stage 2
   pp0123_next <= pp01_reg + pp23_reg;
   bp4_s2_next <= bp4_s1_reg;
   -- stage 3
   pp01234_next <= pp0123_reg + bp4_s2_reg;
   -- output
   y <= std_logic_vector(pp01234_reg);
end tree_pipe_arch;

Figure 9.21 Block diagrams of tree-shaped non-pipelined and pipelined multipliers: (a) non-pipelined design; (b) pipelined design.

In terms of performance, the delay in the tree-shaped multiplier is smaller since it has only three pipelined stages. The improvement will become more significant for a larger multiplier. On the other hand, the throughput of the two pipelined designs is similar because they have a similar clock rate. Both can generate a new multiplication result in each clock cycle.

Although the division of the adder-based multiplier appears to be reasonable, it is not optimal. Examining the circuit at a finer granularity can shed light on the data dependency in the internal structure and lead to a more efficient partition. This issue is discussed in Section 15.4.2.

9.4.4 Synthesis of pipelined circuits and retiming

The major step in adding a pipeline to a combinational circuit is to divide the circuit into adequate stages. To achieve this goal, we must know the propagation delays of the relevant components. However, since the components will be transformed, merged and optimized during synthesis, and wiring delays will be introduced during placement and routing, this information cannot easily be determined at the RT level.
