VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California


VLSI Arithmetic: Adders & Multipliers

Prof. Vojin G. Oklobdzija

University of California

http://www.ece.ucdavis.edu/acsel

Oklobdzija 2004 Computer Arithmetic 2

Introduction

• Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design.

• The objective of computer arithmetic is to develop appropriate algorithms that use the available hardware in the most efficient way.

• Ultimately, speed, power, and chip area are the most commonly used measures, which creates a strong link between the algorithms and the implementation technology.


Basic Operations

• Addition

• Multiplication

• Multiply-Add

• Division

• Evaluation of Functions

• Multi-Media

Addition of Binary Numbers


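Addition of Binary Numbers

Full Adder. The full adder is the fundamental building block of most arithmetic circuits.

The sum and carry outputs are described as:

$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$

$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$

[Figure: full adder with inputs a_i, b_i, C_in and outputs s_i, C_out]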


Addition of Binary Numbers

      Inputs      |   Outputs   |
c_i   a_i   b_i   |  s_i  c_i+1 |
 0     0     0    |   0     0   |
 0     0     1    |   1     0   |  propagate
 0     1     0    |   1     0   |  propagate
 0     1     1    |   0     1   |  generate
 1     0     0    |   1     0   |
 1     0     1    |   0     1   |  propagate
 1     1     0    |   0     1   |  propagate
 1     1     1    |   1     1   |  generate


Full-Adder Implementation

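The full-adder operation is defined by the equations:

$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = p_i \oplus c_i$

$c_{i+1} = a_i b_i + a_i c_i + b_i c_i = g_i + p_i c_i$

A one-bit adder can be implemented as shown, using the carry-propagate signal $p_i$ and the carry-generate signal $g_i$:

$p_i = a_i \oplus b_i$

$g_i = a_i b_i$

[Figure: one-bit adder built from p_i and g_i, with inputs a_i, b_i, c_in and outputs s_i, c_out]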
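A minimal Python sketch of the one-bit adder in this propagate/generate form, checked against every row of the truth table (rows are visited in the same c_i, a_i, b_i order as the table). The function name `full_adder` is illustrative, not taken from the slides.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder written with carry-propagate/generate (inputs are 0 or 1)."""
    p = a ^ b            # p_i = a_i XOR b_i
    g = a & b            # g_i = a_i AND b_i
    s = p ^ c            # s_i = p_i XOR c_i
    c_out = g | (p & c)  # c_{i+1} = g_i + p_i c_i
    return s, c_out


# Walk the eight rows of the truth table in (c_i, a_i, b_i) order.
for c, a, b in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert a + b + c == 2 * c_out + s               # arithmetic meaning of the outputs
    assert c_out == (a & b) | (a & c) | (b & c)     # majority form of the carry
    print(c, a, b, "->", s, c_out)
```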


High-Speed Addition

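$s_i = p_i \oplus c_i$

$c_{i+1} = g_i + p_i c_i$

$p_i = a_i \oplus b_i$, $g_i = a_i b_i$

A one-bit adder can be implemented more efficiently with a multiplexer (MUX) in the carry path, because a MUX is faster.

[Figure: MUX-based one-bit adder: p_i drives the select input of a 2:1 MUX that chooses between c_in (select = 1) and the a_i/b_i input (select = 0) to produce c_out; s_i is formed from p_i and c_in]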


The Ripple-Carry Adder


The Ripple-Carry Adder

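[Figure: 4-bit ripple-carry adder: four full adders (FA) with inputs A0..A3, B0..B3 and sums S0..S3; the carry-out C_o,k of each stage is the carry-in C_i,k+1 of the next, starting from C_i,0]

The worst-case delay is linear in the number of bits:

$t_{adder} = (N-1)\,t_{carry} + t_{sum}$

$t_d = O(N)$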

Goal: Make the fastest possible carry path circuit

From Rabaey
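The ripple-carry structure and its linear worst-case delay, t_adder = (N-1) t_carry + t_sum, can be sketched in Python as follows. This is a behavioural model only; `t_carry` and `t_sum` are unit-free values supplied by the caller (an assumption), not figures for any particular circuit.

```python
def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit unsigned integers; the carry ripples from bit 0 to bit n-1."""
    s, c = 0, c_in
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = ai ^ bi, ai & bi
        s |= (p ^ c) << i          # s_i = p_i XOR c_i
        c = g | (p & c)            # c_{i+1} = g_i + p_i c_i (one carry delay per bit)
    return s, c                    # n-bit sum and carry-out


def rca_delay(n, t_carry, t_sum):
    """Worst-case delay model: t_adder = (N - 1) t_carry + t_sum, i.e. O(N)."""
    return (n - 1) * t_carry + t_sum


print(ripple_carry_add(0xFFFF, 0x0001, 16))   # -> (0, 1): the carry ripples through all 16 bits
print(rca_delay(32, t_carry=1.0, t_sum=1.0))  # -> 32.0
```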


Inversion Property

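[Figure: a full adder (FA) with inputs A, B, C_i and outputs S, C_o, next to the same FA with all inputs and outputs inverted]

$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$

$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$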

From Rabaey
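An exhaustive check of the inversion property, as a short Python sketch. The helpers `fa_sum` and `fa_carry` are hypothetical names for the full-adder sum (three-input XOR) and carry (majority) functions.

```python
from itertools import product


def fa_sum(a, b, c):
    return a ^ b ^ c


def fa_carry(a, b, c):
    return (a & b) | (a & c) | (b & c)   # majority function


def inversion_property_holds():
    """Check S(~A,~B,~Ci) == ~S(A,B,Ci) and Co(~A,~B,~Ci) == ~Co(A,B,Ci) for all inputs."""
    for a, b, c in product((0, 1), repeat=3):
        na, nb, nc = 1 - a, 1 - b, 1 - c
        if fa_sum(na, nb, nc) != 1 - fa_sum(a, b, c):
            return False
        if fa_carry(na, nb, nc) != 1 - fa_carry(a, b, c):
            return False
    return True


print(inversion_property_holds())   # -> True
```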


Minimize Critical Path by Reducing Inverting Stages

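[Figure: 4-bit ripple-carry adder built from full-adder cells FA’ (inputs A0..A3, B0..B3, sums S0..S3, carries C_i,0 and C_o,0..C_o,3), alternating odd and even cells]

Exploit the inversion property, so that no extra inverting stages are needed in the carry path.

Note: this needs 2 different types of cells.

From Rabaey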
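The odd/even-cell idea can be modelled behaviourally under one assumption made here for illustration: every cell's carry gate is inverting, as in a mirror adder. Even cells then drive the complemented carry, and odd cells, fed with complemented a_i and b_i, turn it back into the true carry by the inversion property. Sum outputs are corrected off the carry path. This is a logic-level Python sketch, not a transistor netlist.

```python
def fa_sum(a, b, c):
    return a ^ b ^ c


def inverting_carry(a, b, c):
    """Carry gate of an inverting cell: it produces the COMPLEMENTED majority."""
    return 1 - ((a & b) | (a & c) | (b & c))


def alternating_rca(a, b, n, c_in=0):
    """Ripple-carry adder whose carry wire alternates polarity from cell to cell.

    Even cells get the true carry and drive the complemented carry.
    Odd cells get the complemented carry and use complemented a_i, b_i, so their
    inverting carry gate drives the true carry again; no inverter sits on the carry path.
    """
    wire, s = c_in, 0                         # the wire holds the true carry at even cells
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:                        # even cell: true inputs
            s |= fa_sum(ai, bi, wire) << i
            wire = inverting_carry(ai, bi, wire)           # = NOT c_{i+1}
        else:                                 # odd cell: complemented inputs
            s |= (1 - fa_sum(1 - ai, 1 - bi, wire)) << i   # sum re-inverted off the carry path
            wire = inverting_carry(1 - ai, 1 - bi, wire)   # = c_{i+1}
    c_out = 1 - wire if n % 2 else wire       # an odd number of cells leaves the wire complemented
    return s, c_out


print(alternating_rca(0b1011, 0b0110, 4))     # 11 + 6 = 17 -> (1, 1)
```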


Ripple Carry Adder

Carry chain of an RCA implemented using multiplexers from the standard-cell library:

[Figure: MUX-based carry chain for bits i, i+1, i+2 (inputs a, b; sums s_i, s_i+1, s_i+2); the critical path runs from c_in through the carries c_i, c_i+1 to c_out]

Oklobdzija, ISCAS’88
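A behavioural Python sketch of a multiplexer-based carry chain. The MUX wiring is an assumption consistent with the one-bit MUX adder described earlier: p_i = a_i XOR b_i selects the incoming carry, and otherwise the carry is a_i (which equals b_i whenever p_i = 0).

```python
def mux_carry_add(a, b, n, c_in=0):
    """Adder whose carry chain is a string of 2:1 multiplexers, one per bit.

    Assumed MUX wiring: select = p_i = a_i XOR b_i.
      p_i = 1 -> c_{i+1} = c_i  (the MUX passes the incoming carry)
      p_i = 0 -> c_{i+1} = a_i  (a_i == b_i, so the bit generates or kills the carry)
    """
    c, s = c_in, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi
        s |= (p ^ c) << i            # the sum is formed off the critical path
        c = c if p else ai           # the MUX: the only gate on the carry path
    return s, c


print(mux_carry_add(200, 100, 8))    # -> (44, 1): 300 = 256 + 44
```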


Manchester Carry-Chain Realization of the Carry Path

• A simple and very popular scheme for implementing the carry signal path

[Figure: dynamic Manchester carry chain from carry in to carry out; each stage has a propagate device, a generate device, and a predischarge & kill device; V_dd supplies]


Original Design: T. Kilburn, D. B. G. Edwards, D. Aspinall, “Parallel Addition in Digital Computers: A New Fast ‘Carry’ Circuit”, Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.


Manchester Carry Chain (CMOS)

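[Figure: CMOS Manchester carry chain with propagate signals P0..P4 on pass transistors, generate signals G0..G4, carry input C_i,0 and V_DD]

Kilburn, et al., IEE Proc., 1959.

• Implement P with pass transistors
• Implement G with a pull-up, and kill (delete) with a pull-down
• Use dynamic logic to reduce the complexity and speed up the chain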
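A static, switch-level Python sketch of what the Manchester chain computes: each stage pulls its carry node up (generate), pulls it down (kill), or connects it to the previous node through a pass transistor (propagate). It ignores precharge/predischarge timing and all electrical effects. The helper `longest_pass_chain` is a hypothetical name; it only reports how many pass transistors end up in series, which is what makes long chains slow.

```python
def manchester_carries(a, b, n, c_in=0):
    """Static, switch-level view of a Manchester carry chain.

    Each stage is one of three switch settings:
      generate  (a_i = b_i = 1): the carry node is pulled up   -> 1
      kill      (a_i = b_i = 0): the carry node is pulled down -> 0
      propagate (a_i != b_i):    a pass transistor links the node to the previous node
    Returns the carries c_0 .. c_n.
    """
    carries = [c_in]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:
            carries.append(1)               # generate device
        elif not (ai or bi):
            carries.append(0)               # kill device
        else:
            carries.append(carries[-1])     # propagate: pass transistor conducts
    return carries


def longest_pass_chain(a, b, n):
    """Longest run of consecutive propagate stages, i.e. pass transistors in series."""
    longest = run = 0
    for i in range(n):
        if ((a >> i) ^ (b >> i)) & 1:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


print(manchester_carries(0b1011, 0b0110, 4))   # -> [0, 0, 1, 1, 1]
print(longest_pass_chain(0b1011, 0b0110, 4))   # -> 2
```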


Pass-Transistor Realization in DPL

[Figure: DPL full adder: XOR/XNOR, AND/NAND and OR/NOR pass-transistor gates on A, B and their complements, followed by multiplexers and buffers that produce the sum S and the carry-out CO in complementary form from the carry C]


Carry-Skip Adder

MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE Trans. on Comp., 12/61


Carry-Skip Adder

[Figure: 4-bit ripple block (full adders with P0..P3, G0..G3 and carries C_i,0, C_o,0..C_o,3), shown without and with a bypass multiplexer controlled by BP = P0 P1 P2 P3]

Idea: if P0 P1 P2 P3 = 1, then C_o,3 = C_i,0 (the carry bypasses the block); otherwise the block itself kills or generates the carry.

From Rabaey
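A functional Python sketch of the carry-skip idea, assuming equal groups of k bits with k dividing n. Functionally the bypass never changes the result (when BP = 1 the rippled carry equals the group carry-in anyway); it only shortens the critical path, which this model does not time.

```python
def carry_skip_add(a, b, n, k, c_in=0):
    """Carry-skip (bypass) adder: n bits in groups of k bits.

    Inside a group the carry ripples. The group propagate BP = p_0 p_1 ... p_{k-1}
    drives the bypass multiplexer: if BP = 1 the group carry-out is simply the
    group carry-in, so in hardware it does not have to wait for the ripple.
    """
    if n % k:
        raise ValueError("this sketch assumes k divides n")
    s, c = 0, c_in
    for base in range(0, n, k):
        group_cin, bp, ripple = c, 1, c
        for i in range(base, base + k):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            bp &= p
            s |= (p ^ ripple) << i
            ripple = g | (p & ripple)
        c = group_cin if bp else ripple      # bypass multiplexer
    return s, c


print(carry_skip_add(0xFFFF, 1, 16, 4))      # -> (0, 1): every group bypasses the carry
```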


Carry-Skip Adder: N bits, k bits/group, r = N/k groups

[Figure: groups G_0..G_r-1, each a k-bit ripple chain over a_j, b_j producing sums S_0..S_N-1; group propagate signals P_0..P_r-1 drive AND/OR skip logic from C_in to C_out]

Critical path delay = 2(k-1) + (N/k - 2)


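Carry-Skip Adder

$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$

[Figure: delay t_p versus N for the ripple adder and the bypass adder; the ripple adder's delay grows linearly, and the bypass adder becomes faster beyond about N = 4..8]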
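The carry-skip delay, t_d = 2(k-1) t_RCA + (N/k - 2) t_SKIP, can be explored numerically. In this Python sketch `t_rca` and `t_skip` default to one unit each (an assumption), only group sizes that divide N are tried, and the continuous optimum comes from setting the derivative of t_d with respect to k to zero, which gives k = sqrt(N t_SKIP / (2 t_RCA)).

```python
import math


def skip_delay(n, k, t_rca=1.0, t_skip=1.0):
    """t_d = 2(k-1) t_RCA + (N/k - 2) t_SKIP for a one-level carry-skip adder."""
    return 2 * (k - 1) * t_rca + (n / k - 2) * t_skip


def best_group_size(n, t_rca=1.0, t_skip=1.0):
    """Try every group size k that divides n; return (k, delay) with the smallest delay."""
    sizes = [k for k in range(1, n + 1) if n % k == 0]
    return min(((k, skip_delay(n, k, t_rca, t_skip)) for k in sizes), key=lambda kd: kd[1])


def continuous_optimum(n, t_rca=1.0, t_skip=1.0):
    """d t_d / dk = 2 t_RCA - N t_SKIP / k**2 = 0  ->  k_opt = sqrt(N t_SKIP / (2 t_RCA))."""
    return math.sqrt(n * t_skip / (2 * t_rca))


print(best_group_size(32))        # -> (4, 12.0)
print(continuous_optimum(32))     # -> 4.0
```

With unit delays the 32-bit optimum (k = 4, delay 12) is the same "12 for CSKA" figure used in the Variable Block Adder comparison.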


Variable Block Adder (Oklobdzija, Barnes: IBM 1985)


Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

[Figure: carry chain from C_in to C_out through groups G_0..G_m with group propagate signals P_0..P_m; inputs a_0, b_0 .. a_N-1, b_N-1 and sums S_0..S_N-1; the carry signal path ripples inside a group and skips across groups]


Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

[Figure: 32-bit carry chain with variable group sizes; the worst-case path adds up to 9]

Any-point-to-any-point delay = 9, compared with 12 for a CSKA
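The 9-versus-12 comparison can be reproduced with a simple unit-delay model (our assumption): one unit per bit rippled, one unit per group skipped, and a worst case taken over a carry generated at the start of one group and consumed at the end of another. The uniform case is the 32-bit CSKA with k = 4. The variable group sizes 1, 3, 4, 5, 6, 5, 4, 3, 1 are also an assumption: they sum to 32 and match the digits legible in the slide's figure, but the exact layout is not recoverable from the source.

```python
def worst_case_carry_delay(groups):
    """Worst-case carry delay of a one-level skip chain with the given group sizes.

    Unit-delay model (an assumption): 1 per bit rippled, 1 per group skipped.
    A carry generated at the first bit of group b ripples s_b - 1 bits, skips the
    groups b+1 .. e-1, then ripples s_e - 1 bits inside group e.
    """
    worst = max(s - 1 for s in groups)             # generated and consumed in one group
    for b in range(len(groups)):
        for e in range(b + 1, len(groups)):
            worst = max(worst, (groups[b] - 1) + (e - b - 1) + (groups[e] - 1))
    return worst


uniform = [4] * 8                           # 32-bit CSKA with k = 4
variable = [1, 3, 4, 5, 6, 5, 4, 3, 1]      # assumed VBA group sizes, 32 bits in total
print(worst_case_carry_delay(uniform))     # -> 12
print(worst_case_carry_delay(variable))    # -> 9
```

Small groups at both ends and large groups in the middle balance the "ripple out" and "ripple in" portions of every path, which is why the variable layout wins.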


Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)


Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

[Figure: 4-bit carry block with propagate P0..P3, generate G0..G3, carry-in C_i,0, bypass BP and carry-out C_o,3]

Delay model:


Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Group Length

Oklobdzija, Barnes, Arith’85

321 cNcctd


Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Block Lengths

• No closed-form solution for the delay
• It is a dynamic programming problem


Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)


Delay Comparison: Variable Block Adder

[Figure: delay (0 to 16) versus adder size N (4 to 60 bits) for the VBA, the multi-level VBA and the CLA]

VLSI Arithmetic: Lecture 4


Review

Lecture 3


Delay Comparison: Variable Block Adder

[Figure: delay versus adder size N (4 to 60 bits) for the VBA, the multi-level VBA and the CLA; the VBA curve shows a square-root dependency on N, the multi-level schemes a logarithmic dependency]


Circuit Issues

• Adder speed cannot be estimated from:
  – the logic gates in the critical path
  – the number of transistors in the path
  – the number of logic levels in the path

• Estimating adder speed is much more complex, and many of the “fast” schemes can be misleading.


Fan-Out Dependency


Fan-In Dependency

This looks like “Logical Effort”

(1985)


Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)


Carry-Lookahead Adder (Weinberger and Smith, 1958)

Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, pp. 3-12, 1958.

ARITH-13: presenting the Achievement Award to Arnold Weinberger of IBM (who invented the CLA adder in 1958)


CLA Definitions: One-bit adder

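$s_i = p_i \oplus c_i$

$c_{i+1} = g_i + p_i c_i$

$p_i = a_i \oplus b_i$, $g_i = a_i b_i$

[Figure: MUX-based one-bit adder with inputs a_i, b_i, c_in and outputs s_i, c_out]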


CLA Definitions: 4-bit Adder

[Figure: four one-bit cells for bits i..i+3; each forms g and p from a and b; carries C_i..C_i+4]

$c_{i+1} = a_i b_i + a_i c_i + b_i c_i = g_i + p_i c_i$

$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$


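Carry-Lookahead Adder: 4 bits

[Figure: four one-bit cells for bits i..i+3; each forms g and p from a and b; carries C_i..C_i+4]

$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$

$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i) = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i + p_{i+3} p_{i+2} p_{i+1} p_i c_i$

The first four terms of $c_{i+4}$ form the group generate $G_j$; the product $p_{i+3} p_{i+2} p_{i+1} p_i$ is the group propagate $P_j$.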


Carry-Lookahead Adder

$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$

$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$

$c_{4(j+1)} = G_j + P_j c_{4j}$

One gate delay to calculate p, g
One to calculate P and two for G
Three gate delays to calculate C4(j+1)
Compare that to 8 in an RCA!

[Figure: 4-bit CLA block j: bit cells produce g, p from a, b; the P, G group logic produces G_j, P_j and the carries C4j+1, C4j+2, C4j+3 and C4(j+1) from C_in]
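A Python sketch of one 4-bit lookahead block, using the group equations G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i, P_j = p_{i+3} p_{i+2} p_{i+1} p_i and c_{4(j+1)} = G_j + P_j c_{4j}, checked exhaustively against ordinary addition. The function names are illustrative.

```python
def group_pg(g, p):
    """Group generate/propagate of a 4-bit block; index 0 is bit i, index 3 is bit i+3."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def cla4_carries(a, b, c_in):
    """Carries c_{i+1} .. c_{i+4} of a 4-bit block, each in two-level AND-OR form."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    G, P = group_pg(g, p)
    c4 = G | (P & c_in)                      # c_{4(j+1)} = G_j + P_j c_{4j}
    return [c1, c2, c3, c4]


print(cla4_carries(0b0111, 0b0001, 0))       # 7 + 1 -> [1, 1, 1, 0]
```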


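Carry-Lookahead Adder (Weinberger and Smith)

$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$

$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$

$C_{4(j+1)} = G^*_j + P^*_j C_{4j}$ (here the carries C are indexed by 4-bit block)

[Figure: super-block built from four 4-bit blocks with G_j..G_j+3 and P_j..P_j+3; it produces G*, P* and the block carries C4j+1, C4j+2, C4j+3 and C4(j+1) from C4j]

Additional two gate delays

C16 takes a total of 5 gate delays, versus 32 for an RCA!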
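Applying the same lookahead equations twice gives a 16-bit two-level CLA: bit (g, p), then four 4-bit (G, P) groups, then one (G*, P*) super-block with C16 = G* + P* C_in. In this Python sketch the bit carries inside each 4-bit group are rippled only to form the sums, for brevity; a real CLA expands them with the 4-bit equations as well. Names such as `lookahead4` are hypothetical.

```python
def pg_bits(a, b, n):
    """Bit-level generate g_i = a_i b_i and propagate p_i = a_i XOR b_i."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    return g, p


def lookahead4(G, P):
    """Combine four (G, P) pairs, index 0 least significant, into one group pair."""
    g_out = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    p_out = P[3] & P[2] & P[1] & P[0]
    return g_out, p_out


def cla16(a, b, c_in=0):
    g, p = pg_bits(a, b, 16)
    # Level 1: four 4-bit groups -> G_j, P_j
    G, P = zip(*(lookahead4(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]) for j in range(4)))
    # Level 2: one super-block -> G*, P*
    G_star, P_star = lookahead4(G, P)
    # Carries into groups 1..3 in two-level AND-OR form, and the super-block carry-out
    c4 = G[0] | (P[0] & c_in)
    c8 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c_in)
    c12 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c_in)
    c16 = G_star | (P_star & c_in)
    # Sums: bit carries inside each group are rippled here only for brevity
    s = 0
    for j, c in enumerate((c_in, c4, c8, c12)):
        for i in range(4 * j, 4 * j + 4):
            s |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return s, c16


print(cla16(0xFFFF, 0x0001))    # -> (0, 1)
```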


32-bit Carry-Lookahead Adder

[Figure: 32-bit CLA block diagram with carries C_in, C4, C8, C12, C16, C20, C24, C28 and C_out]

• Individual adders generating g_i, p_i and the sum S_i
• Carry-lookahead blocks of 4 bits generating G_i, P_i and C_in for the adders
• Carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i and C_in for the 4-bit blocks
• Group producing the final carry C_out and C16

Critical path delay = 1 (for g_i, p_i) + 2×2 (for G, P) + 3×2 (for C_in) + 1 XOR (for the sum) ≈ 12 gate delays


Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958)


Carry-Lookahead Adder (Weinberger and Smith: original derivation)


Carry-Lookahead Adder (Weinberger and Smith): please notice the similarity with parallel-prefix adders!


Motorola: CLA Implementation Example

A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.


Critical path in Motorola's 64-bit CLA

Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63

[Figure: 64-bit CLA block diagram (PG blocks and carry block) with group signals such as G3:0, G15:0, G47:0 and carries C0..C63; arrival times along the critical path: 1.05 ns, 1.7 ns, 2.0 ns, 2.35 ns, 2.7 ns, 3.75 ns, 4.8 ns]


Motorola's 64-bit CLA

Conventional PG block:
• The carry ripples locally, with 5 transistors in the path
• No better situation here!

Basically, this is MCC performance with carry-skip. One should not expect any better results than with the VBA.


Motorola's 64-bit CLA

Modified PG Block

Intermediate propagate signals P_i:0 are generated to speed up C3.

Still, the critical path resembles an MCC.


Motorola's 64-bit CLA

[Figure: arrival times with the modified PG block: 1.8 ns, 2.2 ns, 2.9 ns, 3.2 ns, 3.55 ns, 3.9 ns]


[Figure: critical path A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63 of the 64-bit CLA, annotated with both sets of arrival times: 1.05, 1.7, 2.0, 2.35, 2.7, 3.75, 4.8 ns and 1.8, 2.2, 2.9, 3.2, 3.55, 3.9 ns]

Delay Optimized CLA

B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol. 3, No. 4, October 1991


Delay Optimized CLA: Lee-Oklobdzija ‘91

(a.) Fixed groups and levels
(b.) Variable-sized groups, fixed levels
(c.) Variable-sized groups and fixed levels
(d.) Variable-sized groups and levels


Two-Levels of Logic Implementation of the Carry Block


Two-Levels of Logic Implementation of the Carry-Lookahead Block


Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)


Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)


Delay Optimized CLA: Lee-Oklobdzija ‘91

Delay: Two-level BCLA; Delay: Three-level BCLA


Delay Optimized CLA: Lee-Oklobdzija ‘91

(a.) 2-level BCLA = 8.5 ns; (b.) 3-level BCLA = 8.9 ns

Ling’s Adder

Huey Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol. 25, No. 3, 1981.

Used in: IBM 3033, IBM S370/168, Amdahl V6, HP, etc.


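Ling’s Derivations

ai  bi | pi  gi  ti
 0   0 |  0   0   0
 0   1 |  1   0   1
 1   0 |  1   0   1
 1   1 |  0   1   1

Define the pseudo-carry $H_{i+1} = C_{i+1} + C_i$, with $g_i = a_i b_i$, $p_i = a_i \oplus b_i$ and $t_i = a_i + b_i = g_i + p_i$.

[Figure: full adder with inputs a_i, b_i, c_i and outputs s_i, c_i+1]

$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$.

Starting from $C_{i+1} = g_i + p_i C_i$:

$p_i C_{i+1} = p_i (g_i + p_i C_i) = p_i C_i$ (because $p_i g_i = 0$)

$p_i C_i = p_i (C_i + C_{i+1}) = p_i H_{i+1}$

$C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$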


Ling’s Derivations

From $H_{i+1} = C_{i+1} + C_i$ and $C_{i+1} = g_i + p_i C_i$:

$H_{i+1} = g_i + p_i C_i + C_i = g_i + C_i$

$H_{i+1} = g_i + t_{i-1} H_i$ (the fundamental expansion), because $C_i = t_{i-1} H_i$

Now we need to derive the sum equation.


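Ling Adder

Variation of CLA: Ling, IBM J. Res. Dev., 5/81

Conventional:
$C_{i+1} = g_i + p_i C_i$
$S_i = p_i \oplus C_i$
$p_i = a_i \oplus b_i$, $g_i = a_i b_i$

Ling’s equations:
$H_{i+1} = g_i + t_{i-1} H_i$
$S_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$
$t_i = a_i + b_i$, $g_i = a_i b_i$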
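A Python sketch of an adder built directly from Ling's recurrence H_{i+1} = g_i + t_{i-1} H_i and sum S_i = (t_i XOR H_{i+1}) + g_i t_{i-1} H_i, with t_i = a_i + b_i and g_i = a_i b_i, checked against integer addition. The treatment of bit 0 is an assumption of the sketch: the carry-in is used as H_0 with t_{-1} = 1, so that C_0 = t_{-1} H_0 = c_in; the final carry uses C_n = t_{n-1} H_n.

```python
def ling_add(a, b, n, c_in=0):
    """n-bit adder using Ling's pseudo-carry H_{i+1} = C_{i+1} + C_i."""
    H = c_in        # H_0 (assumption: carry-in enters here)
    t_prev = 1      # t_{-1} (assumption: 1, so C_0 = t_{-1} H_0 = c_in)
    s = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi
        H_next = g | (t_prev & H)                  # H_{i+1} = g_i + t_{i-1} H_i
        s |= ((t ^ H_next) | (g & t_prev & H)) << i  # S_i = (t_i XOR H_{i+1}) + g_i t_{i-1} H_i
        H, t_prev = H_next, t
    c_out = t_prev & H                             # C_n = t_{n-1} H_n
    return s, c_out


print(ling_add(0xFFFF, 1, 16))                     # -> (0, 1)
```

The real carry is only recovered at the end (C_n = t_{n-1} H_n); inside the adder the simpler H signals are all that propagate.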


Ling Adder

Variation of CLA

$C_{i+1} = g_i + p_i C_i = g_i + g_i C_i + p_i C_i = g_i + (g_i + p_i) C_i$

$C_{i+1} = g_i + t_i C_i$

Ling’s equation:

$H_{i+1} = g_i + t_{i-1} H_i$

See: Doran, IEEE Trans. on Comp., Vol. 37, No. 9, Sept. 1988.

Ling uses a different transfer function. Four such functions have the desired properties (Ling’s is one of them).


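Ling Adder

Conventional (fan-in of 5):

$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$

Ling (fan-in of 4):

$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$

$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$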
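An exhaustive Python check, with t_i = a_i + b_i and g_i = a_i b_i, that the conventional 4-bit expression gives C_4, that Ling's fan-in-4 expression g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_in equals H_4 = C_4 + C_3, and that the true carry is recovered as C_4 = t_3 H_4. The function name `check_h4` is illustrative.

```python
from itertools import product


def check_h4():
    """Compare C_4 and H_4 = C_4 + C_3 with the 4-bit expansions, for all 512 inputs."""
    for bits in product((0, 1), repeat=9):
        a, b, c_in = bits[0:4], bits[4:8], bits[8]
        g = [a[i] & b[i] for i in range(4)]
        t = [a[i] | b[i] for i in range(4)]
        C = [c_in]                                   # reference carries: C_{i+1} = g_i + t_i C_i
        for i in range(4):
            C.append(g[i] | (t[i] & C[i]))
        conventional = (g[3] | (t[3] & g[2]) | (t[3] & t[2] & g[1])
                        | (t[3] & t[2] & t[1] & g[0]) | (t[3] & t[2] & t[1] & t[0] & c_in))
        ling = (g[3] | g[2] | (t[2] & g[1])
                | (t[2] & t[1] & g[0]) | (t[2] & t[1] & t[0] & c_in))
        if conventional != C[4]:
            return False
        if ling != (C[4] | C[3]):                    # H_4 = C_4 + C_3
            return False
        if C[4] != (t[3] & ling):                    # C_4 = t_3 H_4
            return False
    return True


print(check_h4())   # -> True
```

The largest product term shrinks from five inputs (t_3 t_2 t_1 t_0 C_in) to four (t_2 t_1 t_0 C_in), which is the fan-in advantage noted on the slide.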


Advantages of Ling’s Adder

• Uniform loading in fan-in and fan-out

• H16 contains 8 terms, compared with 15 for G16.

• H16 can be implemented with one level of logic (in ECL), while G16 cannot.

(Ling’s adder takes full advantage of the wired-OR, which is of special importance when ECL technology is used)

VLSI Arithmetic: Lecture 5


Review

Lecture 4


Advantages of Ling’s Adder

• Uniform loading in fan-in and fan-out

• H16 contains 8 terms, compared with 15 for G16.

• H16 can be implemented with one level of logic (in ECL), while G16 cannot (with an 8-way wired-OR).

(Ling’s adder takes full advantage of the wired-OR, which is of special importance when ECL technology is used; his IBM technology was limited to a fan-in of 4 and a wired-OR of 8)


Ling: Weinberger Notes


Advantage of Ling’s Adder

• 32-bit adder used in the IBM 3033, IBM S370/Model 168 and Amdahl V6

• Implements 32-bit addition in 3 levels of logic

• Implements the 32-bit AGEN (B + Index + Disp) in 4 levels of logic (rather than 6)

• 5 levels of logic for the 64-bit adder used in an HP processor


Implementation of Ling’s Adder in CMOS

(S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96)


S. Naffziger, ISSCC’96

$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$

$C_{i+1} = t_i H_{i+1}$


S. Naffziger, ISSCC’96

$C_{16} = p_{15} H_{16} = p_{15}(g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$


Ling Adder Critical Path


Ling Adder: Circuits

[Figure: clocked (CK) dynamic circuits for the group generate G4, propagate P4 and kill K from A0..A3, B0..B3; the Ling carry LC (LCH/LCL); and the sum selection from C0H/C0L and C1H/C1L producing SumH/SumL]


LCS4 – Critical G Path

[Figure: critical G path through 4b, 12b, 16b and 32b carry stages (G3, G4, P4 as (k,p) or (g,p)), carries C15, C31, C47, and sums S48..S63]


LCS4 – Logical Effort Delay

Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)

Stage          Branch    LE     Parasitic
dg3# (dg3)       4.0    0.98    2.97
g4 (NAND2)       2.0    1.11    1.84
C15# (GG4)       1.0    1.01    1.80
C15 (INV)        1.0    1.00    1.00
C47# (LC)        3.0    1.03    3.32
C47 (INV)        1.0    1.00    1.00
C47#b (INV)      1.0    1.00    1.00
C47b (INV)       1.0    1.00    1.00
S63# (SUM)      16.0    0.86    1.36
S63 (INV)        1.0    1.00    1.00

Path totals: path effort 3.74E+02 (total branch effort 3.84E+02 × total LE 9.73E-01); optimal stage effort f_o,opt = 1.81.
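The path totals can be recomputed from the per-stage numbers with the standard Logical Effort formulas: path effort F = G·B·H, optimal stage effort f = F^(1/N), and delay D = N·f + P in units of τ. This Python sketch assumes an electrical effort H = 1, which is what makes F equal to the product of the branch and LE totals on the slide; it does not attempt to reproduce the picosecond or FO4 figures.

```python
import math

# (stage, branch effort b, logical effort g, parasitic delay p), copied from the table
STAGES = [
    ("dg3# (dg3)",  4.0, 0.98, 2.97),
    ("g4 (NAND2)",  2.0, 1.11, 1.84),
    ("C15# (GG4)",  1.0, 1.01, 1.80),
    ("C15 (INV)",   1.0, 1.00, 1.00),
    ("C47# (LC)",   3.0, 1.03, 3.32),
    ("C47 (INV)",   1.0, 1.00, 1.00),
    ("C47#b (INV)", 1.0, 1.00, 1.00),
    ("C47b (INV)",  1.0, 1.00, 1.00),
    ("S63# (SUM)", 16.0, 0.86, 1.36),
    ("S63 (INV)",   1.0, 1.00, 1.00),
]


def path_summary(stages, electrical_effort=1.0):
    """Logical-effort path analysis; delays are in units of tau."""
    n = len(stages)
    B = math.prod(b for _, b, _, _ in stages)      # path branching effort
    G = math.prod(g for _, _, g, _ in stages)      # path logical effort
    P = sum(p for _, _, _, p in stages)            # total parasitic delay
    F = G * B * electrical_effort                  # path effort (H assumed = 1)
    f_opt = F ** (1.0 / n)                         # best effort per stage
    return {"B": B, "G": G, "F": F, "f_opt": f_opt, "P": P, "D": n * f_opt + P}


summary = path_summary(STAGES)
print({key: round(value, 3) for key, value in summary.items()})
```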


Results:

• 0.5 µm technology

• Speed: 0.930 ns

• Nominal process, 80 °C, V = 3.3 V

See: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96

Prefix Adders and Parallel Prefix Adders


from: Ercegovac-Lang


Prefix Adders

The following recurrence operation is defined:

$(g, p) \circ (g', p') = (g + p g', \; p p')$

such that:

$(G_i, P_i) = (g_0, p_0)$ for $i = 0$, and $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$

$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$

$c_1 = g_0 + p_0 c_{in}$, with $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$

This operation is associative, but not commutative. It can also span a range of bits (overlapping and adjacent).
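A Python sketch of the prefix operator, with an exhaustive check that it is associative but not commutative, and the serial prefix recurrence, which reproduces the ripple carries when started from (g_{-1}, p_{-1}) = (c_in, c_in). The name `dot` for the ∘ operator is our choice.

```python
from itertools import product


def dot(hi, lo):
    """Prefix operator (g, p) o (g', p') = (g + p g', p p').

    `hi` is the more significant (g, p) pair, `lo` the less significant one.
    """
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def serial_prefix_carries(a, b, n, c_in=0):
    """(G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1}) from (c_in, c_in); returns c_0 .. c_n."""
    acc = (c_in, c_in)
    carries = [c_in]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        acc = dot((ai & bi, ai ^ bi), acc)
        carries.append(acc[0])             # c_{i+1} = G_i
    return carries


PAIRS = list(product((0, 1), repeat=2))
associative = all(dot(dot(x, y), z) == dot(x, dot(y, z)) for x, y, z in product(PAIRS, repeat=3))
commutative = all(dot(x, y) == dot(y, x) for x, y in product(PAIRS, repeat=2))
print(associative, commutative)     # -> True False
```

Associativity is what allows the serial chain to be regrouped into the shallow trees discussed on the following slides.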


from: Ercegovac-Lang


Parallel Prefix Adders: variety of possibilities (from: Ercegovac-Lang)


Pyramid Adder: M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units”, IFIP Congress, Munich, Germany, 1962.


Parallel Prefix Adders: variety of possibilities (from: Ercegovac-Lang)


Parallel Prefix Adders: variety of possibilities (from: Ercegovac-Lang)


Hybrid BK-KS Adder


Parallel Prefix Adders: S. Knowles 1999

• The operation is associative: h > i ≥ j ≥ k

• The operation is idempotent: h > i ≥ j ≥ k

• It produces the carry when c_in = 0


Parallel Prefix Adders: Ladner-Fisher

Exploits associativity, but not idempotency. Produces minimal logical depth


Parallel Prefix Adders: Ladner-Fisher (16,8,4,2,1)

Two wires at each level. Uniform fan-in of two. Large fan-out (of 16, i.e. n/2); large capacitive loading combined with long wires (in the last stages).
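A Python sketch of a minimum-depth prefix tree of this shape (often drawn as the Sklansky tree): at level l, every position with bit l set combines with the last position of the preceding 2^l block. The helper `max_fanout_per_level` (a hypothetical name) shows where the 16, 8, 4, 2, 1 fan-outs come from for 32 bits. This is a logic-level model only; buffering and wire load are not represented, and folding c_in into bit 0 is a simplification of the sketch.

```python
def dot(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); `hi` is the more significant operand."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def ladner_fisher_prefix(gp):
    """Minimum-depth prefix over (g, p) pairs, index 0 = LSB; returns (G, P) over bits 0..i."""
    out = list(gp)
    n, level = len(out), 0
    while (1 << level) < n:
        span = 1 << level
        for i in range(n):
            if i & span:
                src = ((i >> level) << level) - 1   # bit `level` of src is 0: not updated this level
                out[i] = dot(out[i], out[src])
        level += 1
    return out


def max_fanout_per_level(n):
    """Largest number of nodes fed by one source node, level by level."""
    result, level = [], 0
    while (1 << level) < n:
        counts = {}
        for i in range(n):
            if i & (1 << level):
                src = ((i >> level) << level) - 1
                counts[src] = counts.get(src, 0) + 1
        result.append(max(counts.values()))
        level += 1
    return result


def prefix_add(a, b, n, prefix=ladner_fisher_prefix, c_in=0):
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    gp = list(zip(g, p))
    gp[0] = (g[0] | (p[0] & c_in), p[0])            # fold the carry-in into bit 0
    carries = [c_in] + [G for G, _ in prefix(gp)]   # c_{i+1} = G_{i:0}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]


print(max_fanout_per_level(32))    # -> [1, 2, 4, 8, 16]
```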


Parallel Prefix Adders: Kogge-Stone

Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fisher.

Buffers are needed in both cases (K-S and L-F).
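For comparison, a Kogge-Stone sketch in Python in the same style: every node i combines with node i - d for d = 1, 2, 4, ..., so each node drives at most one lateral wire per level, but each level needs n - d operators (and wires). The function also returns the level and operator counts so the wiring cost is visible; folding c_in into bit 0 is again a simplification of the sketch.

```python
def dot(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); `hi` is the more significant operand."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def kogge_stone_prefix(gp):
    """Kogge-Stone prefix over (g, p) pairs, index 0 = LSB.

    Returns (prefix pairs, number of levels, number of o-operators).
    """
    out = list(gp)
    n, d, levels, ops = len(out), 1, 0, 0
    while d < n:
        # The comprehension reads the previous level's `out` before rebinding it.
        out = [dot(out[i], out[i - d]) if i >= d else out[i] for i in range(n)]
        ops += n - d
        d *= 2
        levels += 1
    return out, levels, ops


def ks_add(a, b, n, c_in=0):
    """n-bit addition with carries c_{i+1} = G_{i:0} from the Kogge-Stone tree."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    gp = list(zip(g, p))
    gp[0] = (g[0] | (p[0] & c_in), p[0])           # fold the carry-in into bit 0
    prefix, _, _ = kogge_stone_prefix(gp)
    carries = [c_in] + [G for G, _ in prefix]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]


_, levels, ops = kogge_stone_prefix([(0, 0)] * 32)
print(levels, ops)    # -> 5 129
```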


Kogge-Stone Adder


Parallel Prefix Adders: Brent-Kung

• Sets the fan-out to one

• Avoids the explosion of wires (as in K-S)

• Makes no sense in CMOS:
  – the fan-out = 1 limit is arbitrary and extreme
  – much of the capacitive load is due to wire (anyway)

• It is more efficient to insert buffers in L-F than to use the B-K scheme


Brent-Kung Adder


Parallel Prefix Adders: Han-Carlson

• Is a hybrid synthesis of L-F and K-S

• Trades an increase in logic depth for a reduction in fan-out:
  – effectively a higher-radix variant of K-S
  – others do it similarly by serializing the prefix computation at the higher fan-out nodes

• Others similarly trade logical depth for a reduction of fan-out and wiring.


Parallel Prefix Adders: variety of possibilities (from: Knowles)

Bounded by L-F and K-S at the ends


Parallel Prefix Adders: variety of possibilities (Knowles 1999)

The following rules are used:

• Lateral wires at the jth level span 2^j bits

• Lateral fan-out at the jth level is a power of 2 up to 2^j

• Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.


Parallel Prefix Adders: variety of possibilities (Knowles 1999)

• The number of minimal-depth graphs of this type is given in:

• At 4 bits there are only K-S and L-F; beyond that there are several new possibilities.


Parallel Prefix Adders: variety of possibilities

Example of a new 32-bit adder [4,4,2,2,1]

Knowles 1999



Parallel Prefix Adders: variety of possibilities (Knowles 1999)

• Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)

• K-S is the fastest
• K-S adders are wire-limited (requiring 80% more area)
• The difference between the examined schemes is less than 15%


Parallel Prefix Adders: variety of possibilities (Knowles 1999)

Conclusion

• Irregular, hybrid schemes are possible

• The speed-up of 15% is achieved at the cost of large wiring, hence area and power

• Circuits close in speed to K-S are available at significantly lower wiring cost

VLSI Arithmetic: Lecture 6


Review

Lecture 5


Two Parallel Prefix Adder Structures

[Figure: 16-bit Kogge-Stone and Han-Carlson prefix trees, with (G,P) levels 1 to 4 producing the carries C1..C15 and Cout]

Kogge-Stone:
• log(bits) carry stages
• Extra wiring

Han-Carlson:
• log(bits) + 1 carry stages
• Reduced wiring and gates


Possibilities for Further Research

• The logical depth is important (Knowles was right)

• The fan-out is less important than the fan-in (Knowles was wrong):
  – It is possible to examine a variety of topologies with restricted and varied fan-in.

• Driving strength and Logical Effort rules were overlooked, or at least neglected:
  – It is possible to create a number of topologies taking LE rules into account.
  – It is further possible to combine these rules with a compound domino implementation, taking advantage of the two different rules governing “dynamic” and “static” gates.

• It is still possible to produce a better adder!


Other Types of Adders

Conditional Sum Adder

J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, pp. 226-231, 1960.


Conditional Sum Adder

from: Ercegovac-Lang


Conditional Sum Adder


Conditional Sum Adder

from: Ercegovac-Lang


Conditional Sum Adder

from: Ercegovac-Lang


Conditional Sum Adder

Carry-Select Adder

O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34


Carry-Select Sum Adder

from: Ercegovac-Lang


Carry-Select Adder

Addition under the assumption of Cin = 0 and Cin = 1.


Carry-Select Adder: combining two 32-b VBAs in select mode

Delay = VBA32 + MUX
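A Python sketch of the 64-bit carry-select arrangement: the upper 32 bits are added twice, once assuming carry-in 0 and once assuming carry-in 1, in parallel with the lower 32 bits, and the lower carry-out drives the final multiplexer (hence Delay = VBA32 + MUX). `block_add` is a stand-in using plain integer addition; it does not model a VBA.

```python
def block_add(a, b, width, c_in):
    """Stand-in for one 32-bit adder block (a VBA on the slide); plain integer addition here."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + c_in
    return total & mask, total >> width


def carry_select_add(a, b, n=64, c_in=0):
    """Carry-select: the upper half is added for both possible carry-ins; the lower
    half's carry-out then selects one of the two precomputed results."""
    half = n // 2
    lo_sum, lo_carry = block_add(a, b, half, c_in)
    hi0 = block_add(a >> half, b >> half, n - half, 0)   # assumes carry-in = 0
    hi1 = block_add(a >> half, b >> half, n - half, 1)   # assumes carry-in = 1
    hi_sum, c_out = hi1 if lo_carry else hi0             # the final MUX
    return (hi_sum << half) | lo_sum, c_out


print(carry_select_add(2**64 - 1, 1))    # -> (0, 1)
```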


Carry-Select Adder

O.J. Bedrij, IBM Poughkeepsie, 1962
