
EEM 334 Digital Systems II - eskisehir.edu.treem.eskisehir.edu.tr/userfiles/atdogan/files/L09-SeqCktsinPractice.pdf · Poor Design Practices Poor Design Practices • Synchronous


3/21/12


EEM 334 Digital Systems II

Outline

•  Poor Design Practices
•  Counters
•  Register File
•  Pipelined Circuits


Poor Design Practices

•  Synchronous design is the most important methodology

•  Poor practice in the past (to save chips)
   – Misuse of asynchronous reset
   – Misuse of gated clock
   – Misuse of derived clock


Misuse of Asynchronous Signals

Do not use asynchronous reset/preset signals in regular operation

Decade Counter with Asynchronous Reset


Problems of the Design

•  The transition from 9 (1001) to 0 (0000) is noisy
   – 1001 -> 1010 -> 0000
•  The design is not reliable
   – A combinational circuit is needed to produce the clear signal and glitches may exist
   – Every glitch will reset the register to “0000”
•  Asynchronous reset is used in the normal operation
   – Timing analysis is difficult

Page 5: EEM 334 Digital Systems II - eskisehir.edu.treem.eskisehir.edu.tr/userfiles/atdogan/files/L09-SeqCktsinPractice.pdf · Poor Design Practices Poor Design Practices • Synchronous

3/21/12

5

Remedy: Load “0000” Synchronously
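The remedy can be sketched in VHDL as follows. This is a minimal sketch rather than the circuit from the slide: the entity and signal names (mod10_counter, r_reg, r_next) are assumptions, and the asynchronous reset is kept only for system initialization. The wrap from 9 to 0 goes through the next-state logic, so the register never sees the intermediate value 1010.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and ports, chosen for illustration only.
entity mod10_counter is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end mod10_counter;

architecture sync_clear_arch of mod10_counter is
   signal r_reg, r_next : unsigned(3 downto 0);
begin
   -- state register; reset is for initialization, not normal operation
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;
   -- next-state logic: "0000" is loaded synchronously after 9
   r_next <= (others => '0') when r_reg = 9 else r_reg + 1;
   -- output
   q <= std_logic_vector(r_reg);
end sync_clear_arch;
```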


Misuse of Gated Clocks

Do not use a gated clock to suspend system operation

Binary Counter with a Gated Clock


Problems of the Design

•  The enable signal changes independently of the clock signal
   – The output pulse can be very narrow
   – It can cause the counter to malfunction
•  If the enable is not glitch free
   – Glitches pass through the AND gate
   – Counter may see them as clock edges

Remedy: Use a Synchronous Enable
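A minimal sketch of a binary counter with a synchronous enable, assuming a 4-bit width and hypothetical names (en_counter, en, r_reg, r_next). The clock reaches the register untouched; the enable only selects between holding and incrementing in the next-state logic, so a glitch on en matters only if it is still present at the clock edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and ports, chosen for illustration only.
entity en_counter is
   port(
      clk, reset : in  std_logic;
      en         : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end en_counter;

architecture sync_en_arch of en_counter is
   signal r_reg, r_next : unsigned(3 downto 0);
begin
   -- state register, clocked directly by clk (no gate in the clock path)
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;
   -- next-state logic: count when enabled, otherwise hold the state
   r_next <= r_reg + 1 when en = '1' else r_reg;
   -- output
   q <= std_logic_vector(r_reg);
end sync_en_arch;
```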


Misuse of Derived Clocks

•  A digital system with many subsystems
•  Subsystems: fast (processor), slow (I/O)
•  Slow clock signal for the slow subsystem
•  The system is no longer synchronous
   – Two clocks with different frequencies and phases
   – Timing analysis is very involved


Remedy: Use one-clock Enable Pulse

Timer with derived clocks (poor design)

•  Mod-1M counter: [0, 999999]
   – clk: 1 MHz
   – sclk: low for [0, 499999]; high for [500000, 999999]
•  Mod-60 second counter: [0, 59]
   – sclk: 1 Hz
   – mclk: low for [0, 29]; high for [30, 59]
•  Mod-60 minute counter: [0, 59]
   – mclk: 1/60 Hz


Timer with enable pulses (remedy)

•  Mod-1M counter: [0, 999999]
   – s_en: high for one clock cycle when the count is 500000; low otherwise
•  Mod-60 second counter: [0, 59]
   – m_en: high only when the count is 30; low otherwise
•  Mod-60 minute counter: [0, 59]
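The enable-pulse timer can be sketched in VHDL as follows. This is a sketch under stated assumptions, not the slide's circuit: the entity name, port names (second, minute) and register names are hypothetical, and m_en is additionally qualified by s_en so that it lasts exactly one clock cycle. All three counters share the single 1 MHz clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and ports; clk is assumed to be 1 MHz.
entity timer is
   port(
      clk, reset     : in  std_logic;
      second, minute : out std_logic_vector(5 downto 0)
   );
end timer;

architecture single_clock_arch of timer is
   signal r_reg, r_next : unsigned(19 downto 0);  -- mod-1M counter
   signal s_reg, s_next : unsigned(5 downto 0);   -- mod-60 second counter
   signal m_reg, m_next : unsigned(5 downto 0);   -- mod-60 minute counter
   signal s_en, m_en    : std_logic;
begin
   -- all registers use the same clock
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= (others => '0');
         s_reg <= (others => '0');
         m_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg <= r_next;
         s_reg <= s_next;
         m_reg <= m_next;
      end if;
   end process;
   -- mod-1M counter and one-clock-wide second enable
   r_next <= (others => '0') when r_reg = 999999 else r_reg + 1;
   s_en   <= '1' when r_reg = 500000 else '0';
   -- second counter advances only when s_en is asserted
   s_next <= (others => '0') when (s_reg = 59 and s_en = '1') else
             s_reg + 1       when s_en = '1' else
             s_reg;
   -- assumption: m_en is gated by s_en so it is one clock cycle wide
   m_en   <= '1' when (s_reg = 30 and s_en = '1') else '0';
   -- minute counter advances only when m_en is asserted
   m_next <= (others => '0') when (m_reg = 59 and m_en = '1') else
             m_reg + 1       when m_en = '1' else
             m_reg;
   -- outputs
   second <= std_logic_vector(s_reg);
   minute <= std_logic_vector(m_reg);
end single_clock_arch;
```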


Counters


•  Binary counter
•  Gray counter
•  Ring counter
•  Linear Feedback Shift Register (LFSR)
•  BCD counter

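Binary Counter

•  State follows binary counting sequence
•  Use an incrementor for the next-state logic

[Figure: binary counter block diagram. An incrementor (+1) computes r_next from r_reg; a register with d, clk and reset inputs holds r_reg and drives q.]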


Gray Counter

•  State changes one bit at a time
•  Use a Gray incrementor
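One way to build a Gray incrementor is to convert the Gray code to binary, add 1, and convert back. The sketch below takes that approach, which is an assumption since the slide's figure is not recoverable; the entity name, WIDTH constant and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and ports, chosen for illustration only.
entity gray_counter4 is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end gray_counter4;

architecture arch of gray_counter4 is
   constant WIDTH : natural := 4;
   signal g_reg, g_next : unsigned(WIDTH-1 downto 0);
   signal b, b1         : unsigned(WIDTH-1 downto 0);
begin
   -- state register holds the Gray-coded count
   process(clk, reset)
   begin
      if reset = '1' then
         g_reg <= (others => '0');
      elsif rising_edge(clk) then
         g_reg <= g_next;
      end if;
   end process;
   -- Gray to binary: b(i) = g(i) xor b(i+1), with b(WIDTH) taken as '0'
   b <= g_reg xor ('0' & b(WIDTH-1 downto 1));
   -- binary increment
   b1 <= b + 1;
   -- binary to Gray: g(i) = b1(i) xor b1(i+1)
   g_next <= b1 xor ('0' & b1(WIDTH-1 downto 1));
   -- output
   q <= std_logic_vector(g_reg);
end arch;
```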


LFSR (Linear Feedback Shift Register)

•  A shift register with a special feedback circuit to generate the serial input
•  The feedback circuit performs an xor operation over specific bits
•  Can circulate through 2^n - 1 states for an n-bit register


4-bit LFSR

•  An n-bit LFSR can cycle through 2^n - 1 states
•  Such a feedback circuit always exists
•  The sequence is pseudorandom
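A minimal sketch of a 4-bit LFSR. The slide's schematic is not recoverable, so the tap positions (bits 1 and 0), the shift direction, the seed "0001" and all names are assumptions. The all-zero state must be avoided because the xor feedback would keep the register at zero forever, which is why reset loads a nonzero seed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name and ports, chosen for illustration only.
entity lfsr4 is
   port(
      clk, reset : in  std_logic;
      q          : out std_logic_vector(3 downto 0)
   );
end lfsr4;

architecture arch of lfsr4 is
   -- assumed nonzero seed; "0000" would lock up the register
   constant SEED : std_logic_vector(3 downto 0) := "0001";
   signal r_reg, r_next : std_logic_vector(3 downto 0);
   signal fb : std_logic;
begin
   -- shift register
   process(clk, reset)
   begin
      if reset = '1' then
         r_reg <= SEED;
      elsif rising_edge(clk) then
         r_reg <= r_next;
      end if;
   end process;
   -- feedback: xor of the two least significant bits (assumed taps)
   fb     <= r_reg(1) xor r_reg(0);
   -- shift right, feeding fb into the most significant bit
   r_next <= fb & r_reg(3 downto 1);
   -- output
   q <= r_reg;
end arch;
```

With these taps and seed, the register steps through 0001, 1000, 0100, 0010, 1001, ... and visits all 15 nonzero states before returning to 0001.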


•  Pseudorandom: used in testing, data encryption/decryption
•  A counter with simple next-state logic
   e.g., a 128-bit LFSR uses 3 xor gates to circulate through 2^128 - 1 patterns (takes on the order of 10^20 years for a 100 GHz system)


Register File

•  Registers arranged as a 1-D array
•  Each register is identified by an address
•  Normally has 1 write port (with a write enable signal)
•  Can have multiple read ports
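These properties can be sketched in VHDL as follows. The word width, the number of address bits, the choice of two read ports and all names (reg_file, wr_en, w_addr, r_addr0, ...) are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; W and A are assumed sizes, not from the slides.
entity reg_file is
   generic(
      W : natural := 8;   -- bits per register
      A : natural := 2    -- address bits, giving 2**A registers
   );
   port(
      clk, reset       : in  std_logic;
      wr_en            : in  std_logic;
      w_addr           : in  std_logic_vector(A-1 downto 0);
      w_data           : in  std_logic_vector(W-1 downto 0);
      r_addr0, r_addr1 : in  std_logic_vector(A-1 downto 0);
      r_data0, r_data1 : out std_logic_vector(W-1 downto 0)
   );
end reg_file;

architecture arch of reg_file is
   -- registers arranged as a 1-D array indexed by address
   type reg_array is array (0 to 2**A-1) of
      std_logic_vector(W-1 downto 0);
   signal array_reg : reg_array;
begin
   -- single write port with write enable
   process(clk, reset)
   begin
      if reset = '1' then
         array_reg <= (others => (others => '0'));
      elsif rising_edge(clk) then
         if wr_en = '1' then
            array_reg(to_integer(unsigned(w_addr))) <= w_data;
         end if;
      end if;
   end process;
   -- two independent read ports (combinational reads)
   r_data0 <= array_reg(to_integer(unsigned(r_addr0)));
   r_data1 <= array_reg(to_integer(unsigned(r_addr1)));
end arch;
```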


Pipelined Circuits


Pipelining

•  Two performance criteria:
   – Delay: required time to complete one task
   – Throughput: number of tasks completed per unit time
•  E.g., ATM machine
   – Original: 3 minutes to process a transaction
     delay: 3 min; throughput: 20 trans per hour
   – Option 1: faster machine, 1.5 min to process
     delay: 1.5 min; throughput: 40 trans per hour
   – Option 2: two machines
     delay: 3 min; throughput: 40 trans per hour
•  Pipelined circuit: increase throughput


•  Non-pipelined:
   – Delay: 60 min
   – Throughput: 1/60 load per min
•  Pipelined:
   – Delay: 60 min
   – Throughput: k/(40+k*20) load per min
      •  about 1/20 when k is large
   – Throughput 3 times better than non-pipelined
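The pipelined figures can be worked out as follows. The derivation assumes three stages of 20 minutes each, which is an inference from the 60-minute delay and the 40 + 20k term rather than something the extracted slide states; k is the number of loads and T(k) is the total time to finish them.

```latex
\begin{align*}
T(k) &= 60 + 20\,(k-1) = 40 + 20k \quad \text{min} \\
\text{throughput}(k) &= \frac{k}{40 + 20k}
   \;\xrightarrow{\;k \to \infty\;}\; \frac{1}{20} \ \text{load per min} \\
\frac{1/20}{1/60} &= 3
\end{align*}
```

The first load still takes the full 60 minutes, so the delay is unchanged; only the rate at which later loads complete improves.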


Throughput:

Multiplication


8-bit Combinational Multiplier

Pipelined Comb Multiplier


PIPELINED DESIGN 305

   -- merged stage 0 & 1 for pipeline
   bv0 <= (others=>b(0));
   bp0 <= unsigned('0' & (bv0 and a));
   pp0 <= bp0;
   a0 <= a;
   b0 <= b(4 downto 1);
   --
   bv1 <= (others=>b0(1));
   bp1 <= unsigned('0' & (bv1 and a0));
   pp1_next(6 downto 1) <= ('0' & pp0(5 downto 1)) + bp1;
   pp1_next(0) <= pp0(0);
   a1_next <= a0;
   b1_next <= b0(4 downto 2);
   -- stage 2
   bv2 <= (others=>b1_reg(2));
   bp2 <= unsigned('0' & (bv2 and a1_reg));
   pp2_next(7 downto 2) <= ('0' & pp1_reg(6 downto 2)) + bp2;
   pp2_next(1 downto 0) <= pp1_reg(1 downto 0);
   a2_next <= a1_reg;
   b2_next <= b1_reg(4 downto 3);
   -- stage 3
   bv3 <= (others=>b2_reg(3));
   bp3 <= unsigned('0' & (bv3 and a2_reg));
   pp3_next(8 downto 3) <= ('0' & pp2_reg(7 downto 3)) + bp3;
   pp3_next(2 downto 0) <= pp2_reg(2 downto 0);
   a3_next <= a2_reg;
   b3_next(4) <= b2_reg(4);
   -- stage 4
   bv4 <= (others=>b3_reg(4));
   bp4 <= unsigned('0' & (bv4 and a3_reg));
   pp4_next(9 downto 4) <= ('0' & pp3_reg(8 downto 4)) + bp4;
   pp4_next(3 downto 0) <= pp3_reg(3 downto 0);
   -- output
   y <= std_logic_vector(pp4_reg);
end effi_4_stage_pipe_arch;

Tree-shaped pipelined multiplier  Discussion in Section 7.5.4 shows that we can rearrange a cascading network to reduce the propagation delay. In an n-bit combinational multiplier, the critical path consists of n - 1 adders in a cascading network. The critical path can be reduced to ⌈log₂ n⌉ adders when a tree-shaped network is used. The same scheme can be applied to the pipelined multiplier. The 5-bit tree-shaped combinational circuit is shown in Figure 9.21(a). The five bit products are first evaluated in parallel and then fed into the tree-shaped network. The pipelined version is shown in Figure 9.21(b). It is divided into three stages and the required registers are shown as dark bars. Note that one bit product has to be carried through two stages. The VHDL code is given in Listing 9.23.

Listing 9.23 Tree-shaped three-stage pipelined multiplier

architecture tree_pipe_arch of mult5 is
   constant WIDTH: integer:=5;
   signal bv0, bv1, bv2, bv3, bv4:
      std_logic_vector(WIDTH-1 downto 0);
   signal bp0, bp1, bp2, bp3, bp4:
      unsigned(2*WIDTH-1 downto 0);
   signal bp4_s1_reg, bp4_s2_reg:
      unsigned(2*WIDTH-1 downto 0);
   signal bp4_s1_next, bp4_s2_next:
      unsigned(2*WIDTH-1 downto 0);
   signal pp01_reg, pp23_reg, pp0123_reg, pp01234_reg:
      unsigned(2*WIDTH-1 downto 0);
   signal pp01_next, pp23_next, pp0123_next, pp01234_next:
      unsigned(2*WIDTH-1 downto 0);
begin
   -- pipeline registers (buffers)
   process(clk,reset)
   begin
      if (reset='1') then
         pp01_reg <= (others=>'0');
         pp23_reg <= (others=>'0');
         pp0123_reg <= (others=>'0');
         pp01234_reg <= (others=>'0');
         bp4_s1_reg <= (others=>'0');
         bp4_s2_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         pp01_reg <= pp01_next;
         pp23_reg <= pp23_next;
         pp0123_reg <= pp0123_next;
         pp01234_reg <= pp01234_next;
         bp4_s1_reg <= bp4_s1_next;
         bp4_s2_reg <= bp4_s2_next;
      end if;
   end process;
   -- stage 1
   -- bit product
   bv0 <= (others=>b(0));
   bp0 <= unsigned("00000" & (bv0 and a));
   bv1 <= (others=>b(1));
   bp1 <= unsigned("0000" & (bv1 and a) & "0");
   bv2 <= (others=>b(2));
   bp2 <= unsigned("000" & (bv2 and a) & "00");
   bv3 <= (others=>b(3));
   bp3 <= unsigned("00" & (bv3 and a) & "000");
   bv4 <= (others=>b(4));
   bp4 <= unsigned("0" & (bv4 and a) & "0000");
   -- adder
   pp01_next <= bp0 + bp1;
   pp23_next <= bp2 + bp3;
   bp4_s1_next <= bp4;
   -- stage 2
   pp0123_next <= pp01_reg + pp23_reg;
   bp4_s2_next <= bp4_s1_reg;
   -- stage 3
   pp01234_next <= pp0123_reg + bp4_s2_reg;
   -- output
   y <= std_logic_vector(pp01234_reg);
end tree_pipe_arch;

Figure 9.21 Block diagrams of tree-shaped non-pipelined and pipelined multipliers. (a) Non-pipelined design. (b) Pipelined design.

In terms of performance, the delay in the tree-shaped multiplier is smaller since it has only three pipelined stages. The improvement will become more significant for a larger multiplier. On the other hand, the throughput of the two pipelined designs is similar because they have a similar clock rate. Both can generate a new multiplication result in each clock cycle.

Although the division of the adder-based multiplier appears to be reasonable, it is not optimal. Examining the circuit in "finer granularity" can shed light on the data dependency of the internal structure and lead to a more efficient partition. This issue is discussed in Section 15.4.2.

9.4.4 Synthesis of pipelined circuits and retiming

The major step in adding a pipeline to a combinational circuit is to divide the circuit into adequate stages. To achieve this goal, we must know the propagation delays of the relevant components. However, since the components will be transformed, merged and optimized during synthesis, and wiring delays will be introduced during placement and routing, this information cannot easily be determined at the RT level.