6.1 ALU Blocks and Control 1. Adder 2. Multiplier 3. Datapath Generation Contents


6.1

ALU Blocks and Control

1. Adder

2. Multiplier

3. Datapath Generation

Contents

6.2

1. Adder: Full Adder

Boolean equation

$CARRY = A \cdot B + B \cdot C + C \cdot A = A \cdot B + C \cdot (A + B)$

$SUM = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{CARRY} \cdot (A + B + C)$

SUM: odd parity of A, B, C. CARRY: 1 when at least two of A, B, C are 1.
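The two forms of each full-adder equation above can be checked against plain integer addition. A minimal sketch; the function names are illustrative and not from the slides.

```python
from itertools import product

def carry_two_level(a, b, c):
    # CARRY = A.B + B.C + C.A
    return (a & b) | (b & c) | (c & a)

def carry_factored(a, b, c):
    # CARRY = A.B + C.(A + B)
    return (a & b) | (c & (a | b))

def sum_minterms(a, b, c):
    # SUM as the OR of its four minterms (odd parity)
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (a & b & c) | (a & nb & nc) | (na & b & nc) | (na & nb & c)

def sum_from_carry(a, b, c):
    # SUM = A.B.C + not(CARRY).(A + B + C)
    carry = carry_factored(a, b, c)
    return (a & b & c) | ((1 - carry) & (a | b | c))

# Exhaustive check over the eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    assert carry_two_level(a, b, c) == carry_factored(a, b, c) == total // 2
    assert sum_minterms(a, b, c) == sum_from_carry(a, b, c) == total % 2
```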

6.3

Which is better?

Boolean Equation 1 :

$CARRY = A \cdot B + C \cdot (A + B)$

$SUM = A B C + \overline{CARRY} \cdot (A + B + C)$

Boolean Equation 2 :

$SUM = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C$

$CARRY = A B C + \overline{SUM} \cdot (A + B + C)$

CARRY evaluation is more urgent, since CARRY is in the critical path of a ripple carry adder, so Equation 1, which produces CARRY first, is preferred.

[Figure: Ripple Carry Adder. A chain of ADDER blocks; stage i takes Ai, Bi and Ci and produces Si and Ci+1, from stage 0 (A0, B0, C0) to stage n (An, Bn).]
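The ripple-carry structure can be modelled bit by bit. A minimal sketch, assuming unsigned n-bit operands held in Python integers; `full_adder` and `ripple_carry_add` are illustrative names. The carry is evaluated first and SUM is derived from it, as in Boolean Equation 1.

```python
def full_adder(a, b, c):
    """One stage: CARRY first, then SUM from the complement of CARRY."""
    carry = (a & b) | (c & (a | b))
    s = (a & b & c) | ((1 - carry) & (a | b | c))
    return s, carry

def ripple_carry_add(a, b, n, c0=0):
    """Add two n-bit numbers; the carry ripples from bit 0 to bit n-1."""
    carry = c0
    result = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # (n-bit sum, carry out)
```

For example, `ripple_carry_add(11, 6, 4)` returns the 4-bit sum 1 with carry out 1, i.e. 17.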

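6.4

Alternating Complementary Form

[Figure: the same cell at odd stages and at even stages. One kind of stage takes true inputs A, B, C and the other takes complemented inputs, so that complemented SUM and CARRY outputs can feed the next stage without extra inverters.]

With true inputs:

$CARRY = A \cdot B + C \cdot (A + B)$

$SUM = A B C + \overline{CARRY} \cdot (A + B + C)$

With complemented inputs:

$\overline{CARRY} = (\bar{A} + \bar{B}) \cdot (\bar{C} + \bar{A} \bar{B})$

$\overline{SUM} = (\bar{A} + \bar{B} + \bar{C}) \cdot (CARRY + \bar{A} \bar{B} \bar{C})$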


6.5

Alternating Complementary Form

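6.6

Dynamic Serial Adder

[Figure: serial adder. Operands A ($a_{n-1} \ldots a_0$) and B ($b_{n-1} \ldots b_0$) enter one bit per clock; the adder produces S ($s_{n-1} \ldots s_0$) from its SUM output; the CARRY output is stored in a flip-flop (D, Q, R/S, CLOCK) and fed back as the carry input C.]

[Equations: time-indexed forms of the full-adder equations $CARRY = A \cdot B + C \cdot [A + B]$ and $SUM = A B C + \overline{CARRY} \cdot [A + B + C]$, with the inputs and outputs taken at clock steps t and t ± 1.]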

6.7

Dynamic Configuration

[Figure: dynamic full adder. A clocked (CK) CARRY GATE with inputs A, B, C and an optional precharge device produces C (CARRY); a clocked SUM GATE with inputs A, B, C and an optional precharge device produces SUM; a Set/Reset circuit (S, R) is clocked by CK.]

$CARRY = A \cdot B + C \cdot [A + B]$

6.8

Full Adder Truth Table

Minterm  0 1 2 3 4 5 6 7
A        0 0 0 0 1 1 1 1
B        0 0 1 1 0 0 1 1
C        0 1 0 1 0 1 0 1
CARRY    0 0 0 1 0 1 1 1
SUM      0 1 1 0 1 0 0 1

Mutually complementary: minterms 0 and 7, 1 and 6, 2 and 5, 3 and 4.

[Figure: on-terms of FC and on-terms of FS.]

Conjugate symmetry: inverting the inputs inverts the outputs.

$SUM = F_S(A, B, C)$, $CARRY = F_C(A, B, C)$

$\overline{SUM} = F_S(\bar{A}, \bar{B}, \bar{C})$, $\overline{CARRY} = F_C(\bar{A}, \bar{B}, \bar{C})$
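The conjugate symmetry of the full adder (inverting all inputs inverts both outputs) can be checked exhaustively. A minimal sketch; `is_self_dual` is an illustrative helper name.

```python
from itertools import product

def f_sum(a, b, c):
    # SUM of the full adder (odd parity)
    return a ^ b ^ c

def f_carry(a, b, c):
    # CARRY of the full adder
    return (a & b) | (c & (a | b))

def is_self_dual(f):
    """True if f(not a, not b, not c) == not f(a, b, c) for every input."""
    return all(
        f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

This property is what allows a stage fed with complemented inputs to deliver complemented outputs with the same logic. A plain 3-input AND, by contrast, is not self-dual.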

6.9

Another Configuration of Carry & Sum Logic

[Figure: CARRY STAGE with inputs A, B, C, built from "1 PROPAGATE" and "1 GENERATE" branches; SUM STAGE with inputs A, B, C and CARRY.]

$CARRY(t+1) = F_C(A, B, C) = A B + B C + C A = A B + C \cdot (A + B)$

$SUM(t+1) = F_S(A, B, C) = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{CARRY} \cdot (A + B + C)$


6.10

Dynamic full adder using np CMOS logic style


6.11

Layout of the dynamic full adder

6.12

Looking at the FA Truth Table

A        0 0 0 0 1 1 1 1
B        0 0 1 1 0 0 1 1
C        0 1 0 1 0 1 0 1
CARRY    0 0 0 1 0 1 1 1
SUM      0 1 1 0 1 0 0 1

With $P = A \oplus B$:

$SUM = P \bar{C} + \bar{P} C$

$SUM = C$ when $A \oplus B = 0$

$SUM = \bar{C}$ when $A \oplus B = 1$

$CARRY = P C + \bar{P} B$

$CARRY = A$ (or $B$) when $A \oplus B = 0$

$CARRY = C$ when $A \oplus B = 1$
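The propagate-based reading of the truth table amounts to two 2-way selections controlled by P = A xor B. A minimal sketch with an illustrative function name:

```python
def full_adder_via_propagate(a, b, c):
    """Full adder written as two selections controlled by P = A xor B."""
    p = a ^ b
    # SUM = P.not(C) + not(P).C : pass not(C) when P = 1, C when P = 0
    s = (p & (1 - c)) | ((1 - p) & c)
    # CARRY = P.C + not(P).B : pass C when P = 1, B (equal to A) when P = 0
    carry = (p & c) | ((1 - p) & b)
    return s, carry
```

Each output is either an input or its complement passed through, which is the form a transmission-gate multiplexer implements directly.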

6.13

Transmission Gate Implementation

[Figure: transmission-gate full adder. A and B form $P = (A \oplus B)$ and $\bar{P}$; transmission gates controlled by P select between C and $\bar{C}$ for SUM ($P \bar{C} + \bar{P} C$) and between C and B for CARRY.]

6.14

CLA (Carry Lookahead Adder)

$C_i = G_i + P_i C_{i-1}$, where $G_i = A_i \cdot B_i$ and $P_i = A_i \oplus B_i$

$C_i = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_2 P_1 C_0$

$S_i = C_{i-1} \oplus P_i$

Available for (# of inputs ≤ 4)

[Figure: 4-bit lookahead carry chain. C0 with P1, G1, P2, G2, P3, G3, P4, G4 produces C1, C2, C3, C4; each Gn, Pn is formed from An, Bn.]
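The expanded lookahead equation can be evaluated directly, so that each carry depends only on the G and P signals and C0, not on the previous carry. A minimal sketch, assuming bit lists given LSB first and 0-based indexing (the slide numbers the stages from 1); `cla_add` is an illustrative name.

```python
def cla_add(a_bits, b_bits, c0=0):
    """Carry lookahead addition on bit lists (LSB first).

    carries[i] is the carry into bit i; carries[n] is the carry out.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c0]
    for i in range(len(g)):
        # g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + ... + p_i ... p_0 c0
        term, prefix = 0, 1
        for j in range(i, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        term |= prefix & c0
        carries.append(term)
    sums = [p[i] ^ carries[i] for i in range(len(p))]
    return sums, carries
```

The inner loop grows with i, which mirrors the fan-in growth that limits this form to a few bits per block.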


6.15

Carry bypass structure - basic concept

6.16

(N=16)-bit carry bypass adder (each stage: M bits)

Worst case delay:

$t_p = t_{setup} + M \cdot t_{carry} + (N/M - 1) \cdot t_{bypass} + M \cdot t_{carry} + t_{sum}$

$t_{setup}$ : time to create G and P signals

$t_{carry}$ : propagation delay through a single bit

$t_{bypass}$ : propagation delay through MUX

$t_{sum}$ : time to generate sum
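The worst-case delay expression can be evaluated for different stage sizes M. A minimal sketch that transcribes the formula as given on the slide; the unit delay values used as defaults are assumptions for illustration only.

```python
def bypass_adder_delay(n, m, t_setup=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Worst-case delay of an N-bit carry bypass adder with M bits per stage.

    tp = tsetup + M*tcarry + (N/M - 1)*tbypass + M*tcarry + tsum
    """
    return t_setup + m * t_carry + (n / m - 1) * t_bypass + m * t_carry + t_sum
```

With unit delays, `bypass_adder_delay(16, 4)` gives 13.0: ripple through the first stage, three bypasses, ripple through the last stage, plus setup and sum.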

6.17

Combining 4 Domino Carry Lookahead Blocks

Manchester Carry Chain (4-bit)

Limit: 4 stages

In the worst case, 6 series transistors to ground.

$C_1 = G_1 + P_1 C_0$

[Figure: clocked (CK) 4-bit Manchester carry chain from C0 to C4, with pass transistors P1 to P4 and pull-downs G1 to G4 producing C1 to C4; block diagram with GP Block (G1 P1 G2 P2 G3 P3 G4 P4), MANCHESTER CARRY CHAIN (C0, C1, C2, C3, C4) and Sum Block.]

6.18

Improving Worst Case Carry Prop. Time

[Figure: MANCHESTER CARRY CHAIN from C0 to C4, with a second clocked (CK) pass-transistor chain gated by P1, P2, P3, P4 between C0 and C4.]

Faster pass transistor chain due to lower parasitic C loading

6.19

Manchester CC Adder Floorplan

Dual CC scheme: one for carry propagation, the other for off-loading the 1st CC from the SUM block.

[Figure: floorplan with four bit slices (BIT 1 to BIT 4). Each slice takes Ai and Bi and contains a GP GENERATE block, two MANCHESTER CARRY CHAIN cells and a SUM GENERATE block producing Si; C0 enters at BIT 1 and C4 leaves at BIT 4.]

6.20

CSA (Carry Select Adder)

[Figure: 8-bit carry select adder. Bits 0 to 3 (A0 ~ A3, B0 ~ B3, C0) produce S0 ~ S3 and C4. Bits 4 to 7 (A4 ~ A7, B4 ~ B7) are added twice: with carry-in 0, giving $S_4^0$ ~ $S_7^0$ and $C_8^0$, and with carry-in 1, giving $S_4^1$ ~ $S_7^1$ and $C_8^1$. C4 selects S4 ~ S7 and C8.]

Carry Selection

$C_8 = C_4 \cdot C_8^1 + \bar{C}_4 \cdot C_8^0$

$= C_4 \cdot C_8^1 + C_8^0$ (since $\overline{C_8^1} \cdot C_8^0 = 0$ always)

$= C_8^1 \cdot (C_4 + C_8^0)$

Realization of MUX with restoring logic

[Figure: restoring-logic carry selection; C4, $C_8^0$, $C_8^1$ give C8, and C8, $C_{12}^0$, $C_{12}^1$ give C12.]

Note) Realization of MUX with pass-transistor gates: threshold voltage loss per stage (Vdd, Vdd - Vt, Vdd - 2Vt)

Use restoring logic for critical path
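The carry select scheme and the simplified carry selection can be sketched as follows. A minimal sketch for 8 bits with two 4-bit blocks; `add_block` stands in for any 4-bit adder and is an illustrative helper, not part of the slides. The upper block is computed for both carry-in values, and the simplified form C8 = C8^0 + C4.C8^1 is used in place of a full multiplexer for the carry.

```python
def add_block(a, b, cin, width=4):
    """Stand-in for a width-bit adder: returns (sum bits, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add8(a, b, c0=0):
    lo_a, hi_a = a & 0xF, a >> 4
    lo_b, hi_b = b & 0xF, b >> 4
    s_lo, c4 = add_block(lo_a, lo_b, c0)
    # Prepare both results for the upper block before C4 is known.
    s_hi0, c8_0 = add_block(hi_a, hi_b, 0)
    s_hi1, c8_1 = add_block(hi_a, hi_b, 1)
    s_hi = s_hi1 if c4 else s_hi0           # sum selection (MUX)
    c8 = c8_0 | (c4 & c8_1)                 # simplified carry selection
    return (s_hi << 4) | s_lo, c8
```

The simplification is valid because a block that produces a carry with carry-in 0 also produces one with carry-in 1.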

6.21

CSA (Carry Select Adder)

For carry propagation, use restoring logic in the alternating pattern

[Figure: carry path of the carry select adder. The first block (A0 ~ A3, B0 ~ B3, C0) produces S0 ~ S3 and C4; later blocks combine $C_8^0$, $C_8^1$ into C8 and $C_{12}^0$, $C_{12}^1$ into C12.]

Number of bits for each stage

ex1) 32-bit case : 4, 4, 5, 6, 7, 6 (or 4, 4, 5, 6, 6, 7)

ex2) 64-bit case : 4, 4, 5, 6, 7, 8, 9, 10

6.22

Minimization of Carry Propagation Path Delay

Carry select scheme (prepare the result for each case, Cin=1 and Cin=0)

Simplify the carry selection using the relation between $C_i^0$ and $C_i^1$

Take complemented carries, alternating between the even and odd stages

Adjust each block size with consideration for the delay of the carry select logic: carry propagation delay of each block = carry propagation delay to the block

e.g. for a 32-bit path: 4 4 5 6 6 7

6.23

16-bit Linear CSA (Carry Select Adder)

$t_{add} = t_{setup} + M \cdot t_{carry} + (N/M) \cdot t_{mux} + t_{sum}$

M : # of bits/stage

N : total # of bits

6.24

Square Root CSA

$t_{add} = t_{setup} + M \cdot t_{carry} + \sqrt{2N} \cdot t_{mux} + t_{sum}$

$N = M + (M+1) + \cdots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right) \approx \frac{P^2}{2}$

Assumed: MUX delay is comparable to 1-stage carry propagation delay

[Figure annotations: "9 stage"; "12 ~ 6(?)": number of clock cycles for this signal to be obtained]
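The linear and square-root carry select delays can be compared numerically. A minimal sketch that transcribes the two delay expressions with M bits in the first stage; the unit delay defaults are assumptions for illustration, and `stages_needed` (an illustrative helper) counts the stages P when each stage is one bit wider than the previous one.

```python
import math

def linear_csa_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """tadd = tsetup + M*tcarry + (N/M)*tmux + tsum"""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum

def sqrt_csa_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """tadd = tsetup + M*tcarry + sqrt(2N)*tmux + tsum"""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum

def stages_needed(n, m):
    """Smallest P with M + (M+1) + ... + (M+P-1) >= N."""
    covered, p = 0, 0
    while covered < n:
        covered += m + p
        p += 1
    return p
```

For N = 32 and M = 2, `stages_needed` gives 7 stages (2+3+4+5+6+7+8 = 35 bits), close to the sqrt(2N) = 8 used in the approximation.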


6.25

Propagation Delay of Linear and Square Root CSA and linear RCA

6.26

Carry Skip Adder: a compromise between the Ripple Carry Adder and the CLA Adder

$P_{0,3} = p_0 \cdot p_1 \cdot p_2 \cdot p_3$

$G_{0,3} = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1$

[Figure: 16-bit carry skip adder built from four 4-bit blocks (a0 b0 to a3 b3, ..., a12 b12 to a15 b15). The carries c4, c8, c12 and c16 are formed from c0 and the block signals P4,7, G4,7, P8,11, G8,11, P12,15, G12,15.]

Worst case delay

6.27

The $p_i$'s and $g_i$'s are computed from $p_i = a_i \oplus b_i$ and $g_i = a_i \cdot b_i$.

Initially, c4, c8 and c12 are cleared.

After 4 clock cycles (at T0+4Tc), the G-values are calculated as cout assuming ci=0 (the P-values are also calculated by then).

At this time (at T0+4Tc), the true cout of the first stage, c4, is obtained.

After one, two and three further clock cycles respectively, assuming the delay of each AOI gate is Tc, the true values of c8, c12 and c16 are obtained.

Sum and cout of the last block are obtained at (T0+4Tc+2Tc+4Tc).

Worst case delay
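The block-level carry logic of the carry skip adder can be sketched as follows. A minimal sketch for 16 bits in four 4-bit blocks; the function names are illustrative. Each block's group G is its carry out assuming carry-in 0 and its group P is the AND of the bit propagates, so the true block carry is G + P.cin.

```python
def group_pg(a, b, width=4):
    """Group propagate and generate of one block (bits taken LSB first)."""
    group_p, group_g = 1, 0
    for i in range(width):
        p = ((a ^ b) >> i) & 1     # p_i = a_i xor b_i
        g = ((a & b) >> i) & 1     # g_i = a_i . b_i
        group_g = g | (p & group_g)
        group_p &= p
    return group_p, group_g

def carry_skip_add16(a, b, c0=0):
    carry, result = c0, 0
    for block in range(4):
        xa = (a >> (4 * block)) & 0xF
        xb = (b >> (4 * block)) & 0xF
        group_p, group_g = group_pg(xa, xb)
        # Sum bits of the block use the true carry into the block.
        result |= ((xa + xb + carry) & 0xF) << (4 * block)
        # One AOI gate per block: c_out = G + P . c_in
        carry = group_g | (group_p & carry)
    return result, carry
```

Only the first and last blocks need a full ripple in the worst case; the blocks in between contribute one gate delay each.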

6.28

Comparison of Carry Select & Carry Skip Adder

A 32-bit Carry Select Adder

Stage #      1  2  3  4  5  6
bits/stage   4  4  5  6  7  6
inc. delay   4  1  1  1  1  1

Speed: 9 k2 for 32 bits (k2 = delay due to 1-bit addition or MUX)

Area ≈ 2 × Area(RCA)

A 32-bit Carry Skip Adder

Stage #      1  2  3  4  5  6
bits/stage   4  5  6  7  8  2
inc. delay   4  1  1  1  1  2

Speed: 10 k2 for 32 bits

Area = Area(RCA) + Area(P-logic)

6.29

Conditional Sum Adder

[Figure: conditional sum adder for bits 0 to 2. For each bit i (A0 B0, A1 B1, A2 B2), both $S_i^1$, $C_{i+1}^1$ (carry-in 1) and $S_i^0$, $C_{i+1}^0$ (carry-in 0) are prepared. Multiplexers (MPX) select S0 and C1 using C0; a triple 2-input MUX controlled by C1 selects S1, S2 and C3 from the values prepared for C1=1 and C1=0.]

6.30

Carry Lookahead Tree Adder

The previous CLA implementation is not very adequate because of fan-in and fan-out problems and irregularity, despite the small number (5) of logic levels. Make it regular, using $\log_2 n$ logic levels.

$g_i = a_i \cdot b_i$

$p_i = a_i \oplus b_i$

$G_{i,k} = G_{j+1,k} + P_{j+1,k} \cdot G_{i,j}$

$P_{i,k} = P_{i,j} \cdot P_{j+1,k}$

[Figure, 1st part: tree for 4 bits. Leaf cells form gi, pi from ai, bi; internal cells combine (G0,1, P0,1) and (G2,3, P2,3) into (G0,3, P0,3); in general a cell combines (Gi,j, Pi,j) and (Gj+1,k, Pj+1,k) into (Gi,k, Pi,k).]

6.31

Carry Lookahead Tree Adder

$C_{j+1} = G_{i,j} + P_{i,j} \cdot C_i$

$g_i = a_i \cdot b_i$

$p_i = a_i \oplus b_i$

$S_i = a_i \oplus b_i \oplus c_i$

[Figure, 2nd part: carries are distributed back down the tree. A cell receiving Ci and (Gi,j, Pi,j) passes Ci on and produces Cj+1; for 4 bits, C0 with (G0,1, P0,1) gives C2, C0 with (g0, p0) gives C1, and C2 with (g2, p2) gives C3.]

[Figure: complete CLA tree adder for a3 b3, a2 b2, a1 b1, a0 b0 and C0, producing S3, S2, S1, S0.]
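The two parts of the tree adder can be sketched as an upward pass that forms the group signals and a downward pass that distributes the carries. A minimal sketch, assuming a simple recursive split at the midpoint of each range; the function names and the dictionary of group signals are illustrative choices, not taken from the slides.

```python
def build_groups(g, p):
    """1st part: compute (G, P) for every tree node, keyed by bit range."""
    groups = {}

    def up(lo, hi):
        if lo == hi:
            groups[(lo, hi)] = (g[lo], p[lo])
        else:
            mid = (lo + hi) // 2
            g_lo, p_lo = up(lo, mid)
            g_hi, p_hi = up(mid + 1, hi)
            # G_{i,k} = G_{j+1,k} + P_{j+1,k}.G_{i,j} ; P_{i,k} = P_{i,j}.P_{j+1,k}
            groups[(lo, hi)] = (g_hi | (p_hi & g_lo), p_lo & p_hi)
        return groups[(lo, hi)]

    up(0, len(g) - 1)
    return groups

def distribute_carries(groups, n, c0):
    """2nd part: C_{j+1} = G_{i,j} + P_{i,j}.C_i, applied down the tree."""
    carries = [0] * (n + 1)

    def down(lo, hi, c_in):
        carries[lo] = c_in
        if lo < hi:
            mid = (lo + hi) // 2
            g_lo, p_lo = groups[(lo, mid)]
            down(lo, mid, c_in)
            down(mid + 1, hi, g_lo | (p_lo & c_in))

    down(0, n - 1, c0)
    g_all, p_all = groups[(0, n - 1)]
    carries[n] = g_all | (p_all & c0)
    return carries

def tree_add(a, b, n, c0=0):
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    carries = distribute_carries(build_groups(g, p), n, c0)
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]
```

Both passes have depth proportional to log2 n, which is the regularity the tree form is meant to provide.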


6.32

Carry Save Adder

Ripple Carry Adder

Carry Lookahead Adder

CSA (Conditional Sum Adder)

CSA (Carry Select Adder)

CSA (Carry Skip Adder)

CSA (Carry Save Adder)

Carry Propagate Adder

6.33

Carry Save Adder

The carry save adder is used wherever a large number of operands have to be added.

[Figure: CSA stages built from rows of full adders (F.A), each taking ai, bi, ci with no carry propagation along a row, followed by a CPA (carry propagate adder) row. Iterative version: one row of F.A with Carry F/F and Sum F/F registers feeding back the previous cycle's carry and sum together with the new operand.]
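The iterative carry save scheme keeps the running total as a separate sum word and carry word and only propagates carries once at the end. A minimal sketch for non-negative integers; `carry_save_step` and `add_many` are illustrative names.

```python
def carry_save_step(x, y, z):
    """One row of full adders: three words in, sum word and carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (z & x)) << 1   # carries move one bit to the left
    return s, c

def add_many(operands):
    """Accumulate operands in carry-save form, then do one final addition."""
    s, c = 0, 0                              # Sum F/F and Carry F/F contents
    for op in operands:
        s, c = carry_save_step(s, c, op)
    return s + c                             # carry propagate adder (CPA)
```

Each step preserves the invariant s + c = sum of the operands seen so far, so the slow carry propagation happens only once.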

6.34

2. Multiplier

Add-and-Shift Algorithm

[Figure: one binary multiplication (multiplicand × multiplier) worked twice: the multiplication procedure by the pencil-and-paper method, and the multiplication procedure by the add-and-shift algorithm.]
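The add-and-shift procedure can be sketched with a single product register. A minimal sketch for unsigned n-bit operands; the register layout (multiplier in the low half, accumulated sum in the high half) is one common arrangement and is an assumption here, not taken from the figure.

```python
def add_and_shift(multiplicand, multiplier, n):
    """Unsigned n-bit multiplication by repeated conditional add and right shift."""
    product = multiplier                    # low half initially holds the multiplier
    for _ in range(n):
        if product & 1:                     # current multiplier bit
            product += multiplicand << n    # add multiplicand into the high half
        product >>= 1                       # shift the whole register right
    return product
```

For example, with n = 4, `add_and_shift(13, 11, 4)` goes through the register values 109, 158, 79 and ends at 143.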

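6.35

The Serial-Parallel Multiplier

If $A = (a_n, a_{n-1}, \ldots, a_0)$ and $B = (b_n, b_{n-1}, \ldots, b_0)$, the product $A \cdot B$ is expressed as

$A \cdot B = A \cdot 2^n \cdot b_n + A \cdot 2^{n-1} \cdot b_{n-1} + \cdots + A \cdot 2^0 \cdot b_0$

[Figure: serial-parallel multiplier. One operand (a3 a2 a1 a0) is applied through a chain of D flip-flops and gated by the bits b0, b1, b2 of the other operand; a chain of full adders (F.A) with D flip-flops accumulates the partial products and delivers the Output serially.]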


6.36

4x4 array multiplier

6.37

$t_{mult} = [(M-1) + (N-1)] \cdot t_{carry} + (N-1) \cdot t_{sum} + t_{and}$

Both $t_{carry}$ and $t_{sum}$ are important.

Sum and carry generation times need to be similar.

[Figure: critical path through the array; N (4) and M (3) mark its two dimensions.]


6.38

Carry-save Multiplier(CSM)

Rectangular floorplan of CSM

6.39

The Modified Booth Algorithm (cont')

Booth Encoder Table

b2k+1  b2k  b2k-1 | multiplied by
  0     0     0   |   0
  0     0     1   |  + x
  0     1     0   |  + x
  0     1     1   |  + 2x
  1     0     0   |  - 2x
  1     0     1   |  - x
  1     1     0   |  - x
  1     1     1   |   0

[Figure: Booth Encoder with inputs b2k-1, b2k, b2k+1 and outputs A, 2A and negative.]

$A = b_{2k} \oplus b_{2k-1}$

$negative = b_{2k+1}$

6.40

Booth Multiplication Example

A = 17, X = -9, result = -153

Operation sequence:

Initial 0

Add -A

2-bit Shift

Add 2A

2-bit Shift

Add -A

[Figure: bit-level worksheet of the partial sums for the operations above.]

6.41

The Modified Booth Algorithm

Let's consider a number $B = (b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$ written in 2's complement:

$B = -b_{n-1} 2^{n-1} + \sum_{k=0}^{n-2} b_k 2^k$

B may be rewritten as follows:

$B = \sum_{k=0}^{n/2-1} (b_{2k-1} + b_{2k} - 2 b_{2k+1}) \cdot 2^{2k}$ (assume $b_{-1} = 0$)

Example

$B = (b_{-1} + b_0 - 2 b_1) \cdot 2^0 + (b_1 + b_2 - 2 b_3) \cdot 2^2 + (b_3 + b_4 - 2 b_5) \cdot 2^4$

In this equation, each term in brackets is in the set {-2, -1, 0, 1, 2}.

An n-bit multiplier generates exactly n/2 partial products.
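The recoding of B into n/2 digits from {-2, -1, 0, 1, 2} can be sketched directly from the rewritten sum. A minimal sketch, assuming n is even and the multiplier is an n-bit two's-complement value held in a Python integer; the function names are illustrative.

```python
def booth_digits(b, n):
    """Radix-4 Booth digits of an n-bit two's-complement number, LSB digit first."""
    bits = [(b >> i) & 1 for i in range(n)]   # also correct for negative Python ints
    digits = []
    prev = 0                                  # b_{-1} = 0
    for k in range(n // 2):
        b_2k, b_2k1 = bits[2 * k], bits[2 * k + 1]
        digits.append(prev + b_2k - 2 * b_2k1)
        prev = b_2k1
    return digits

def booth_multiply(a, b, n):
    """Sum of the n/2 partial products digit_k * A * 4^k."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(b, n)))
```

For the multiplier -9 with n = 6, the digits come out as -1, +2, -1, which corresponds to the sequence add -A, shift, add 2A, shift, add -A, and `booth_multiply(17, -9, 6)` gives -153.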

6.42

Parallel Multiplier

A multiplier has two basic operations:

The generation of partial products

The summation of partial products

A parallel multiplier avoids the overhead that is due to the separate controls of these two operations.

The gain in speed is obtained at the expense of extra hardware.

A parallel multiplier can be implemented such that it supports a high rate of pipelining.

6.43

The Braun Multiplier

                        a3    a2    a1    a0
                     ×  b3    b2    b1    b0
                        a3b0  a2b0  a1b0  a0b0
                  a3b1  a2b1  a1b1  a0b1
            a3b2  a2b2  a1b2  a0b2
      a3b3  a2b3  a1b3  a0b3
      P6    P5    P4    P3    P2    P1    P0

A straightforward implementation

[Figure: basic cell. Its inputs are one bit of the new partial product ( ai . bj ), one bit of the previous partial product, and the carry in.]

In the first four rows there is no horizontal carry propagation (using carry-save adder)

6.44

The Braun Multiplier (cont')

[Figure: 4x4 Braun array. Inputs a3 a2 a1 a0 and b0, b1, b2, b3; four rows of three full adders (F.A) with 0 inputs at the array boundary; outputs p0 to p7.]

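6.45

Baugh-Wooley Multiplier

Modified in order to allow multiplication of signed numbers

Let's consider two numbers A and B (2's complement numbers):

$A = (a_{n-1} \ldots a_0) = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$

$B = (b_{n-1} \ldots b_0) = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$

$a_{n-1}$ : sign bit in 2's complement. $A = \sum_{i=0}^{n-2} a_i 2^i$ when $a_{n-1} = 0$, and $A = -2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$ when $a_{n-1} = 1$.

The product A.B is

$A \cdot B = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - a_{n-1} \sum_{i=0}^{n-2} b_i 2^{n+i-1} - b_{n-1} \sum_{i=0}^{n-2} a_i 2^{n+i-1}$

$= a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} + b_{n-1} \left( -2^{2n-2} + 2^{n-1} + \sum_{i=0}^{n-2} \bar{a}_i 2^{n+i-1} \right) + a_{n-1} \left( -2^{2n-2} + 2^{n-1} + \sum_{i=0}^{n-2} \bar{b}_i 2^{n+i-1} \right)$

$= -2^{2n-1} + (\bar{a}_{n-1} + \bar{b}_{n-1} + a_{n-1} b_{n-1}) 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} + (a_{n-1} + b_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2} b_{n-1} \bar{a}_i 2^{n+i-1} + \sum_{i=0}^{n-2} a_{n-1} \bar{b}_i 2^{n+i-1}$

because $-\sum_{i=0}^{n-2} a_i 2^{n+i-1} = -2^{2n-2} + 2^{n-1} + \sum_{i=0}^{n-2} \bar{a}_i 2^{n+i-1}$ and $-(a_{n-1} + b_{n-1}) 2^{2n-2} = -2^{2n-1} + (\bar{a}_{n-1} + \bar{b}_{n-1}) 2^{2n-2}$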
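The Baugh-Wooley rearrangement, in which every partial-product term except one constant carries a positive sign, can be checked numerically against ordinary signed multiplication. A minimal sketch that evaluates the standard final form term by term for n-bit two's-complement operands; the function name is illustrative.

```python
def baugh_wooley_product(a, b, n):
    """Evaluate the Baugh-Wooley form of A*B for n-bit two's-complement A, B."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> i) & 1 for i in range(n)]
    a_s, b_s = abit[n - 1], bbit[n - 1]              # sign bits
    total = -(1 << (2 * n - 1))                      # the only negative term
    total += ((1 - a_s) + (1 - b_s) + a_s * b_s) << (2 * n - 2)
    total += sum((abit[i] * bbit[j]) << (i + j)
                 for i in range(n - 1) for j in range(n - 1))
    total += (a_s + b_s) << (n - 1)
    total += sum((b_s * (1 - abit[i])) << (n + i - 1) for i in range(n - 1))
    total += sum((a_s * (1 - bbit[i])) << (n + i - 1) for i in range(n - 1))
    return total
```

Since only complemented bits and constants appear, the array can be built from the same full adders as the unsigned array.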

6.46

Baugh-Wooley Multiplier (cont')

[Figure: 4x4 Baugh-Wooley array. Inputs a3 a2 a1 a0 and b0, b1, b2, b3; rows of full adders (F.A) as in the Braun array, with additional full adders taking a3, b3 and a constant 1; outputs p0 to p7.]

6.47

Wallace Tree Multipliers

Full adder vs Wallace tree

Useful whenever a large number of operands are to be added.

Completion time in a Braun or Baugh-Wooley multiplier using a ripple carry adder: proportional to twice the number of bits, 2n

Using Wallace trees: proportional to log2(n)

[Figure: a full adder takes three inputs of weight $2^0$ and produces outputs of weight $2^1$ and $2^0$; a Wallace tree takes n inputs of weight $2^0$ and produces a multi-bit count of them.]

6.48

Recursive Decomposition of the Multiplication

Partitioning two operands

$A = 2^p \cdot A_H + A_L$

$B = 2^p \cdot B_H + B_L$

$A \cdot B = 2^{2p} \cdot A_H B_H + 2^p \cdot (A_H B_L + A_L B_H) + A_L B_L$

Four terms (AH.BH, AH.BL, AL.BH, AL.BL) are computed using 4 p-bit multipliers

The results are collected through Wallace tree
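The decomposition can be applied recursively until the operands are small enough for a base multiplier. A minimal sketch for unsigned operands; the `base_width` parameter and the use of Python's `*` as the base multiplier are assumptions for illustration, and the final additions stand in for the Wallace tree.

```python
def recursive_multiply(a, b, width, base_width=4):
    """Multiply two unsigned width-bit numbers by splitting into high/low halves."""
    if width <= base_width:
        return a * b                         # stand-in for a small array multiplier
    p = width // 2
    mask = (1 << p) - 1
    a_h, a_l = a >> p, a & mask              # A = 2^p.AH + AL
    b_h, b_l = b >> p, b & mask              # B = 2^p.BH + BL
    hh = recursive_multiply(a_h, b_h, width - p, base_width)
    hl = recursive_multiply(a_h, b_l, width - p, base_width)
    lh = recursive_multiply(a_l, b_h, width - p, base_width)
    ll = recursive_multiply(a_l, b_l, p, base_width)
    # A.B = 2^(2p).AH.BH + 2^p.(AH.BL + AL.BH) + AL.BL
    return (hh << (2 * p)) + ((hl + lh) << p) + ll
```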

6.49

Recursive Decomposition of the Multiplication

Aligning the four partial products

[Figure: operands split into AH, AL and BH, BL; the four products AL X BL, AH X BL, AL X BH and AH X BH are aligned by weight and combined through blocks labelled "4 X W3" and a final Adder.]

6.50

Booth's Algorithm Array Multiplication

Another approach to the design of a parallel multiplier for two's complement operands

The basic cell in row i performs an add, a subtract or a transfer only

CASS (Controlled Add/Subtract/Shift) Cell

[Figure: CASS cell with inputs Pin (partial product), a, cin and controls H, D; outputs Pout and cout.]

$P_{out} = P_{in} \oplus (a \cdot H) \oplus (c_{in} \cdot H)$

If H = 0, $P_{out} = P_{in}$ (transfer)

If H = 1, $P_{out} = P_{in} \oplus a \oplus c_{in}$ (sum)

$c_{out} = (P_{in} \oplus D) \cdot (a + c_{in}) + a \cdot c_{in}$

If D = 0, $c_{out} = P_{in} \cdot (a + c_{in}) + a \cdot c_{in}$ (add)

If D = 1, $c_{out} = \overline{P_{in}} \cdot (a + c_{in}) + a \cdot c_{in}$ (subtract)
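The CASS cell equations can be checked against one-bit addition and subtraction. A minimal sketch, assuming the standard cell equations Pout = Pin xor (a.H) xor (cin.H) and cout = (Pin xor D).(a + cin) + a.cin; `cass_cell` is an illustrative name.

```python
def cass_cell(p_in, a, c_in, h, d):
    """Controlled add/subtract/shift cell: returns (p_out, c_out)."""
    p_out = p_in ^ (a & h) ^ (c_in & h)
    c_out = ((p_in ^ d) & (a | c_in)) | (a & c_in)
    return p_out, c_out
```

With H = 1 and D = 0 the cell is a full adder (cout is a carry); with H = 1 and D = 1 it is a full subtractor (cout is a borrow); with H = 0 the partial product bit passes through unchanged.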

6.51

Booth's Algorithm Array Multiplication (cont')

[Figure: array of CASS cells. Each row has a CTRL block, driven by the multiplier bits x0, x1, x2, x3, that supplies H and D to the row; the multiplicand a3 a2 a1 a0 and 0 inputs enter from the top; outputs P6, P5, P4, P3, P2, P1, P0.]

$H = X_i \oplus X_{i-1}$

$D = X_i$

Xi  Xi-1 | H  D | Operation
0   0    | 0  d | Shift
0   1    | 1  0 | Add
1   0    | 1  1 | Subtract
1   1    | 0  d | Shift

(d: don't care)
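The row control derived from adjacent multiplier bits can be sketched at word level. A minimal sketch, assuming H = Xi xor Xi-1 and D = Xi as row controls, X-1 = 0, and an n-bit two's-complement multiplier held in a Python integer; the function name is illustrative.

```python
def booth_array_multiply(a, x, n):
    """Word-level model of the row controls: H selects add/subtract vs. shift only."""
    acc = 0
    prev = 0                                 # X_{-1} = 0
    for i in range(n):
        x_i = (x >> i) & 1                   # also correct for negative Python ints
        h, d = x_i ^ prev, x_i
        if h:                                # H = 1: add (D = 0) or subtract (D = 1)
            acc += (-a if d else a) << i
        prev = x_i                           # H = 0: shift only
    return acc
```

For example, `booth_array_multiply(17, -9, 6)` returns -153.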


6.52

Generalized block diagram of an array multiplier

6.53

Q. Why use an array multiplier if it requires as many addition steps?

A1) An array multiplier is a combinational circuit, where the signals flow without being clocked.

Multi-pass array multiplier: normally uses a clock, but the cycle time for passing through k arrays is < kTc


6.54

A2) Some speed-up schemes are possible.

e.g. E/O array, Wallace-tree

Even-Odd Array


6.55

Wallace-tree Multiplier

6.56

6 x 6 Wallace-tree Multiplier Example

$Delay \propto \log_{3/2} n$ (n : width of the Wallace tree)

e.g. For 32-bit, number of adders necessary for each stage is

32 - 22 - 16 - 12 - 8 - 6 - 4 - 3 - 2

Total delay = 9 x adder delay
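The logarithmic depth of the tree can be illustrated by counting how many rows of operands remain after each level of 3:2 reduction. A minimal sketch, assuming every complete group of three rows is replaced by two and leftover rows pass through unchanged; with this particular grouping rule the intermediate counts can differ slightly from a sequence obtained with another convention, while the growth rate stays close to log base 3/2.

```python
import math

def wallace_row_counts(rows):
    """Rows remaining after each level of 3:2 reduction, down to two rows."""
    counts = [rows]
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full group of 3 rows becomes 2
        counts.append(rows)
    return counts

def wallace_levels(rows):
    """Number of full-adder levels before the final two-operand addition."""
    return len(wallace_row_counts(rows)) - 1

def log_estimate(rows):
    """Estimate log_{3/2}(rows / 2) of the number of levels."""
    return math.log(rows / 2, 1.5)
```

For 32 rows this sketch needs 8 reduction levels, against an estimate of about 6.8.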


6.58

3. Datapath Generation

Datapath and its elements in bit-slice organization

[Figure: MEMORY, DATAPATH, CONTROL and INPUT-OUTPUT blocks.]


6.59

Two layout strategies for bit-slice datapath


6.60

Layout of 4-bit DP using layout strategy II (feedthrough)


6.61

1-D placement vs. 2-D placement


6.62

1-D placement vs. 2-D placement(Cont’)

6.63

Datapath Layout Flow

RTL description

Floorplan : circuit design floorplan (block ordering, bus track assignment)

Schematic Drawing : transistor sizing

Cell Drawing : leaf cell layout

Layout Assemble : leaf cell integration (routing)

DRC / LVS : design rule check, layout vs. schematic

Back-Annotation : simulation with the exact capacitance

Datapath Layout


6.64

Datapath Design Case (ACCENT HK386)

real mode support of x86 instruction set

enhanced (pipelined) datapath

problems & practices of general DP layout

6.65

Datapath structure

3 major blocks:

ALU, register file (32bit)

barrel shifter (40bit)

segment/effective address (32bit)

[Figure: floorplan with Segment/EA, ALU, Register File and Barrel Shifter blocks.]

6.66

Track capacity

[Figure: cell cross-section with N-well and P-well, VSS and VDD power lines, TRACK(6), and control and clock lines; metal1 and metal2 layers.]

6 vertical wires/track in metal 1

metal3 reserved for P & G routing

6.67

Power Grid

From bottom & left (chip edges)

Considering IR drop

[Figure: power grid over the Segment/EA, ALU, RF and BSH blocks.]

6.68

Cell Structure

Initial cell template decision:

N-well on the left

P-well on the right

data flow vertical

control flow horizontal

Similar cell structure as VTI

Cell width

– 80 for PMOS

– 70 for NMOS

[Figure: cell template with N-well and P-well regions; dimension labels 25, 10, 35, 45, 10, 25 and 70, 80.]

6.69

Cell Structure

A power line passes through every cell

power line width: 10 (2 contacts)

power line location: 25 to the inside from the boundary

6.70

Accent Cell Layout Flow (a student's lament)

Block Spec.

Schematic

SPICE

Simulated at first with assumed capacitance values

Finished transistor sizing quickly

Assumed that, since the capacitance values were not accurate, optimization was unnecessary and meeting the spec was enough

Since accurate capacitance is available only after the whole datapath is assembled, set the work aside for a long time

Modifying the layout after assembly means assembling everything again, which is an enormous amount of manual work

6.71

Cell Design(I)

Using 45 degree line for cell design

[Figure: cell layout with data flow and control flow directions marked.]


6.72

Cell Design(II) needless effort to reduce cell size

ugly poly; current crowding

Data flow


6.73

Critical path used for transistor sizing in relevant datapath element

6.74

• Track assignment needs to be done before the cell layout (not after).

[Figure: assembled datapath; data flow direction marked.]

6.75

The value of grades

College grades and success in society are not strongly correlated, and this is in fact not surprising: the factors behind success in society and the criteria for college grades are often quite different.