Faster functional modules lessons taught from FP-ADDERs Guy Even Electrical Engineering Dept....

Faster functional modules: lessons taught from FP-ADDERs

Guy Even

Electrical Engineering Dept.

Tel-Aviv Univ.

Silicon Value Seminar (April 29, 2002)

Outline
• FP-Adder: an example of a complicated module
  – brief overview
  – focus on two sub-blocks
• Counting leading zeros – priority encoders
  various design methods:
  • divide & conquer
  • parallel prefix computation
  • redundant addition
• Adders:
  – fast adders
  – compound adders

Background

Faster clock rates require faster modules.

Example: Floating-Point Adders

• early designs: 50-60 logic levels.
• 15-20 gate levels per cycle ⇒ 3-4 cycles!
• new designs: 25 gate levels ⇒ 2 cycles.

How? Better algorithms and faster sub-blocks.

Floating-Point Add
• Algorithm: why 50-60 logic levels?
• Sub-modules: list of sub-blocks.

floating-point number: S = sign, E = exponent, F = mantissa
$$(-1)^{S} \cdot 2^{E} \cdot F$$

floating-point addition:
input: (Sa,Ea,Fa) & (Sb,Eb,Fb)
output: (S,E,F) such that:
$$(-1)^{S} \cdot 2^{E} \cdot F = (-1)^{S_a} \cdot 2^{E_a} \cdot F_a + (-1)^{S_b} \cdot 2^{E_b} \cdot F_b$$

FP-Add: naïve algorithm
Pre-process:
• SWAP operands
• Align mantissa of smaller operand (shift right)
• Compute sticky-bit (OR of bits shifted outside)
Add/Sub:
• Add/Sub mantissas
• Convert sum to sign & mag (abs(negative sum))
• Normalize sum (shift left)
Round:
• rounding decision
• INC according to rounding decision
RESULT
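The naive flow above can be sketched as a bit-accurate toy model in Python. Assumptions that are not from the slides: a hypothetical P = 8-bit integer mantissa F with its top bit set, so an operand (S, E, F) stands for (-1)^S · F · 2^E; G = 3 extra low-order bits kept during alignment; round-to-nearest-even as the rounding mode; None for an exactly zero sum; zeros, infinities, subnormals and exponent overflow are not modelled.

```python
P = 8  # hypothetical mantissa width (bits)
G = 3  # extra low-order bits (guard, round, sticky) kept during alignment


def fp_add(a, b):
    """Naive FP-add sketch.  An operand (S, E, F) stands for (-1)**S * F * 2**E,
    where F is a P-bit integer with its top bit set (normalized, nonzero).
    Returns the sum rounded to nearest-even, or None if it is exactly zero."""
    (sa, ea, fa), (sb, eb, fb) = a, b
    # Pre-process: SWAP so that operand a has the larger exponent.
    if ea < eb:
        (sa, ea, fa), (sb, eb, fb) = (sb, eb, fb), (sa, ea, fa)
    # Align: shift the smaller mantissa right by d; the bits shifted
    # outside are ORed into a sticky bit at the lowest position.
    d = ea - eb
    xa, xb = fa << G, fb << G
    if d > 0:
        sticky = int(xb & ((1 << d) - 1) != 0)
        xb = (xb >> d) | sticky
    # Add/Sub mantissas, then convert the sum to sign & magnitude.
    s = xa + xb if sa == sb else xa - xb
    sign = sa
    if s < 0:  # abs(negative sum); only possible when d == 0
        sign, s = 1 - sa, -s
    if s == 0:
        return None
    e = ea - G  # value is now (-1)**sign * s * 2**e, up to the sticky bit
    # Normalize: move the leading one to bit position P+G-1.
    lz = (P + G) - s.bit_length()  # leading zeros; -1 on a carry-out
    if lz >= 0:
        s, e = s << lz, e - lz
    else:  # carry-out: shift right by one, keeping the sticky information
        s, e = (s >> 1) | (s & 1), e + 1
    # Round: rounding decision (nearest, ties to even), then INC.
    q, rem = s >> G, s & ((1 << G) - 1)
    half = 1 << (G - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
        if q == 1 << P:  # the increment overflowed the mantissa
            q, e = q >> 1, e + 1
    return (sign, e + G, q)
```

For example, fp_add((0, 0, 255), (0, -8, 255)) returns (0, 1, 128): 255 + 255/256 rounded to 8 bits is 256. Each stage consumes the previous stage's output, and this chain of dependent shifts, additions and increments is what a straightforward circuit pays for in logic levels.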

focus: normalization shift
Problem:
– LZ = number of leading zeros
– shift left by LZ positions
Use a priority encoder!
Two types of priority encoders:
– unary; example: X[1:4]=0010 ⇒ A[1:4]=0011
– binary; example: X[1:4]=0010 ⇒ Y[2:0]=010

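Unary PENC
– input: X[1:n]
– output: A[1:n]
– functionality:
$$A[i] = \begin{cases} 1 & \text{if } \exists\, j \le i : X[j] = 1 \\ 0 & \text{otherwise.} \end{cases}$$
Simpler: $A[i] = \mathrm{OR}(X[1], X[2], \ldots, X[i])$

Implementation: what is the best design?
Delay = Ω(log n) & Cost = Ω(n).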

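Unary PENC – divide & conquer
[Figure: X[1:n/2] feeds U-PENC(n/2), giving A[1:n/2]; X[1+n/2:n] feeds a second U-PENC(n/2), whose outputs are ORed (OR(n/2)) with the output of an OR-tree(n/2) over X[1:n/2], giving A[1+n/2:n].]
delay: O(log n), which is optimal; still O(log n) even if fan-out is considered (the OR-tree output has linear fan-out ⇒ logarithmic delay)
cost: O(n log n), not optimal
share OR-tree ⇒ slight reduction of cost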
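A gate-level Python sketch of this divide & conquer recursion. Signals are modelled as hypothetical (bit, depth) pairs and only 2-input OR gates are counted, ignoring fan-out; the share flag reuses A[n/2], which already equals OR(X[1:n/2]), instead of a separate OR-tree, as one reading of "share OR-tree".

```python
def or_gate(a, b):
    """2-input OR on (bit, depth) signals: ready one level after its later input."""
    return (a[0] | b[0], max(a[1], b[1]) + 1)


def or_tree(xs):
    """Balanced OR-tree over the signals xs; returns (signal, gate_count)."""
    if len(xs) == 1:
        return xs[0], 0
    h = len(xs) // 2
    left, cl = or_tree(xs[:h])
    right, cr = or_tree(xs[h:])
    return or_gate(left, right), cl + cr + 1


def u_penc(xs, share=False):
    """Divide & conquer unary PENC, A[i] = OR(X[1..i]); len(xs) a power of two.
    Returns (outputs, gate_count)."""
    n = len(xs)
    if n == 1:
        return list(xs), 0
    h = n // 2
    al, cl = u_penc(xs[:h], share)  # U-PENC(n/2) on X[1:n/2]
    ar, cr = u_penc(xs[h:], share)  # U-PENC(n/2) on X[1+n/2:n]
    if share:
        left_or, ct = al[-1], 0  # reuse A[n/2] = OR(X[1:n/2])
    else:
        left_or, ct = or_tree(xs[:h])  # separate OR-tree(n/2)
    # OR(n/2): fold the left half's OR into every right-half output
    return al + [or_gate(left_or, a) for a in ar], cl + cr + ct + h
```

For n = 16 this counts 49 gates without sharing and 32 with sharing (n log n − n + 1 versus (n/2) log n), both at depth 4 = log n: sharing saves only a constant factor, and the cost stays Θ(n log n).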

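Unary PENC - improve

Parallel Prefix Computation (PPC) [FL,BK]!
$$\begin{aligned} A[1] &= X[1] \\ A[2] &= \mathrm{OR}(X[1], X[2]) \\ A[3] &= \mathrm{OR}(X[1], X[2], X[3]) \\ &\;\;\vdots \\ A[n] &= \mathrm{OR}(X[1], X[2], \ldots, X[n]) \end{aligned}$$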

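Unary PENC = PPC(OR)
[Figure: the inputs are ORed in neighbouring pairs (X[1],X[2]), (X[3],X[4]), …, (X[n-1],X[n]) and fed to U-PENC(n/2); together with further OR gates this yields the outputs A[1], A[2], …, A[n].]
delay = O(log n)
cost = O(n)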
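The recursion in the figure (pair up neighbours, solve a half-size problem for the even positions, then one more gate per odd position) as a Python sketch. It is written for any associative operator, not only OR; signals are hypothetical (value, depth) pairs, each application of the operator counts as one gate, and n is assumed to be a power of two.

```python
from operator import or_


def ppc(xs, op):
    """Recursive parallel prefix computation over (value, depth) signals.
    Returns (out, gates) with out[i] = xs[0] op xs[1] op ... op xs[i];
    op must be associative and len(xs) a power of two."""
    n = len(xs)
    if n == 1:
        return list(xs), 0

    def gate(a, b):
        # one op-gate; its output is ready one level after its later input
        return (op(a[0], b[0]), max(a[1], b[1]) + 1)

    # first level: combine neighbouring pairs (X[1],X[2]), (X[3],X[4]), ...
    pairs = [gate(xs[2 * i], xs[2 * i + 1]) for i in range(n // 2)]
    # a half-size PPC on the pairs yields the even outputs A[2], A[4], ...
    evens, inner = ppc(pairs, op)
    out = [xs[0]]  # A[1] = X[1]
    for i in range(n // 2):
        out.append(evens[i])  # A[2i+2]
        if 2 * i + 2 < n:
            out.append(gate(evens[i], xs[2 * i + 2]))  # A[2i+3] = A[2i+2] op X[2i+3]
    return out, inner + n // 2 + (n // 2 - 1)


# U-PENC = PPC(OR), on the unary example X[1:4] = 0010
prefix, gates = ppc([(b, 0) for b in [0, 0, 1, 0]], or_)
```

With OR on 16 inputs it uses 26 gates (2n − 2 − log n) at depth 6 (2 log n − 2). Passing string concatenation instead of OR checks that the left-to-right order of a non-commutative operator is preserved.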

PPC - properties


Fan-out:

Logarithmic fan-out can be decreased to constant (cost still O(n)).

Layout:

O(n log n) area.

Same design as “Brent-Kung” adder.

Applicable for every associative operator.

Binary PENC
– input: X[1:n] (n = 2^k)
– output: Y[k:0]
– functionality: $\sum_{i=0}^{k} Y[i] \cdot 2^i$ = number of leading zeros in X[1:n]

Relation to Unary PENC:
$$\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j]).$$

Implementation: what is the best design?
Delay = Ω(log n) & Cost = Ω(n).

Binary PENC – simple & optimal
[Figure: X[1:n] → PPC(OR) → A[1:n] → diff(n) → encoder(n) → Y[k:0]]
$A[i] = \mathrm{OR}(X[1], \ldots, X[i])$
delay(diff(n)) = constant, cost(diff(n)) = O(n)
delay(encoder(n)) = O(log n), cost(encoder(n)) = O(n)
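A functional Python sketch of the PPC(OR) → diff → encoder pipeline. For brevity the prefix OR is computed serially with itertools.accumulate; the recursive one-hot-to-binary encoder (O(n) gates, O(log n) depth) is an assumed design, since the slide does not show the encoder's internals; and an all-zero input, which has no leading one, is assumed to set only the top bit Y[k] (Y = n).

```python
from itertools import accumulate
from operator import or_


def diff(a):
    """diff(n): D[i] = A[i] AND NOT A[i-1] (A[0] = 0) marks the leading one."""
    return [ai & (1 - prev) for ai, prev in zip(a, [0] + a[:-1])]


def encode(d):
    """Recursive one-hot -> binary encoder, len(d) a power of two (assumed design).
    Returns (valid, bits): valid = OR(d); bits (LSB first) give the 0-based
    position of the single 1, or are all zero when d has no 1."""
    if len(d) == 1:
        return d[0], []
    h = len(d) // 2
    vl, bl = encode(d[:h])
    vr, br = encode(d[h:])
    # The new top bit says "the 1 is in the right half"; at most one half is
    # non-zero, so the lower bits of the two halves can simply be ORed.
    return vl | vr, [x | y for x, y in zip(bl, br)] + [vr]


def bin_penc(x):
    """Binary PENC = PPC(OR) -> diff -> encoder; returns LZ(X[1:n]) in 0..n."""
    a = list(accumulate(x, or_))  # PPC(OR), computed serially in this sketch
    valid, bits = encode(diff(a))
    y = sum(bit << i for i, bit in enumerate(bits))  # position of leading one = LZ
    return y | ((1 - valid) << len(bits))  # all-zero input: Y[k] = 1, i.e. Y = n
```

bin_penc([0, 0, 1, 0]) returns 2, matching the binary example X[1:4]=0010 ⇒ Y[2:0]=010.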

Binary PENC – with adder tree
[Figure: X[1:n] → PPC(OR) → A[1:n] → ADD-tree(n) → Y[k:0]]
$A[i] = \mathrm{OR}(X[1], \ldots, X[i])$ and $\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j])$
problem:
each adder(k) in the tree has O(log k) delay, so the total delay is O(log n · log k).

Redundant addition
add columns in parallel using Full-Adders:
[Figure: three rows a3 a2 a1 a0, b3 b2 b1 b0 and c3 c2 c1 c0 are added column by column by full adders, producing two rows x3 x2 x1 x0 and y3 y2 y1 y0.]
Partial compression or (3:2)-addition: delay is constant!
Tree structure enables (n:2)-addition with O(log n) delay.
(n:2)-addition is used in fast multipliers.
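(3:2)- and (n:2)-addition on Python integers, as a sketch: each call to csa stands for one row of independent full adders (sum = XOR, carry = majority shifted one column left), and the tree keeps reducing groups of three numbers to two until only a carry-save pair remains. The grouping order inside the loop is an arbitrary choice.

```python
def csa(a, b, c):
    """(3:2)-addition: a row of independent full adders, one per bit column.
    Returns (sum, carry) with sum + carry == a + b + c (non-negative ints)."""
    s = a ^ b ^ c                               # full-adder sum bit per column
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority = carry, one column left
    return s, carry


def add_n_to_2(nums):
    """(n:2)-addition: a tree of 3:2 compressors.  Returns ([x, y], levels)
    with x + y == sum(nums); each level costs one full-adder delay."""
    nums = list(nums)
    levels = 0
    while len(nums) > 2:
        nxt = []
        for i in range(0, len(nums) - 2, 3):  # compress groups of three
            nxt.extend(csa(*nums[i:i + 3]))
        nxt.extend(nums[len(nums) - len(nums) % 3:])  # 0-2 leftovers pass through
        nums = nxt
        levels += 1
    return nums + [0] * (2 - len(nums)), levels


# FA-tree use in a binary PENC: LZ = sum of the one-bit numbers 1 - A[j].
A = [0, 0, 1, 1, 1, 1, 1, 1]  # prefix OR of X = 00101101
(x, y), _ = add_n_to_2([1 - a for a in A])
lz = x + y  # the final 2:1 adder
```

The tree for 8 inputs needs 4 levels (8 → 6 → 4 → 3 → 2), and the number of levels grows logarithmically with the number of inputs; only the last step needs a carry-propagating 2:1 adder.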

Binary PENC – O(log n) delay
Tree of Full-Adders:
• delay of each full-adder is constant
• depth is O(log n)
• output is a carry-save number
[Figure: X[1:n] → PPC(OR) → A[1:n] → FA-tree(n) → 2 × [k:0] (carry-save) → 2:1-Adder → Y[k:0]]
$A[i] = \mathrm{OR}(X[1], \ldots, X[i])$ and $\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j])$

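Binary PENC – divide & conquer
Split X[1:n] into X_L = X[1:n/2] and X_R = X[n/2+1:n]:
$$\text{Bin-PENC}(X) = \begin{cases} \text{Bin-PENC}(X_L) & \text{if } X_L \neq 0\cdots0 \\ n/2 + \text{Bin-PENC}(X_R) & \text{if } X_L = 0\cdots0 \end{cases}$$
$X_L = 0\cdots0 \iff Y_L[k-1] = 1$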

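Binary PENC – divide & conquer
[Figure: Bin-PENC(n/2) on X[1:n/2] gives YL; Bin-PENC(n/2) on X[1+n/2:n] gives YR, to which a "+2^(k-1)" block (Half Adder) is applied; a MUX(k) selected by YL[k-1] outputs Y[k:0].]
+2^(k-1) (half adder): delay = constant, cost = O(log n)
MUX(k): delay = constant, cost = O(log n); its select YL[k-1] has fan-out k, which incurs O(log log n) delay
initial analysis: delay = O(log n), cost = O(n)
bottom line: delay = O(log n · log log n), cost = O(n)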
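The divide & conquer recurrence as a recursive Python sketch; it models the function only, not the delay. X_L is all zeros exactly when Y_L = n/2, i.e. when Y_L[k-1] = 1, which is the MUX select bit.

```python
def bin_penc_dc(x):
    """Divide & conquer binary PENC; len(x) = n = 2**k.
    Returns Y = number of leading zeros of X[1:n], a value in 0..n (Y[k:0])."""
    n = len(x)
    if n == 1:
        return 1 - x[0]  # one input bit: LZ is 1 iff that bit is 0
    h = n // 2  # n/2 = 2**(k-1)
    yl = bin_penc_dc(x[:h])  # Bin-PENC(n/2) on X[1:n/2]
    yr = bin_penc_dc(x[h:])  # Bin-PENC(n/2) on X[1+n/2:n]
    # YL[k-1] = 1 iff yl == n/2 iff the left half is all zeros (MUX select)
    select = (yl >> (h.bit_length() - 1)) & 1
    # "+2^(k-1)" block applied to YR, then MUX(k)
    return h + yr if select else yl
```

In hardware the two recursive calls run in parallel, so the delay recurrence adds one half-adder, one MUX and the fan-out buffering of the select bit per level.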

PENC – quick summary
design   | method          | cost               | delay
U-PENC   | div & conquer   | n log n            | log n
U-PENC   | PPC             | n (area = n log n) | log n
Bin-PENC | PPC + encoder   | n                  | log n
Bin-PENC | PPC + Add_tree  | n                  | log n
Bin-PENC | div & conquer   | n                  | log n · log log n

PENC - further issues

back to FP-Adder:

can we estimate LZ before subtracting?

must pre-process to avoid “catastrophic cancellation”!

method: partial compression (signed half-adders).

focus – adder
Avoid INC after rounding decision by pre-computing the increment:
[Figure: a Compound Adder outputs a+b and a+b+1; a MUX controlled by the rounding decision selects the RESULT.]

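PPC Adder [FL,BK]
• computes carry bits C[n:1]
• sum bits satisfy: S[i] = XOR(A[i], B[i], C[i]).
• computation of carry bits C[n:1]:

Define (k = kill, p = propagate, g = generate):
$$\sigma[i] = \begin{cases} k & \text{if } A[i]+B[i] = 0 \\ p & \text{if } A[i]+B[i] = 1 \\ g & \text{if } A[i]+B[i] = 2 \end{cases}$$
claim: $C[i+1] = 1 \iff \exists\, j \le i : \sigma[i:j] = p \cdots p\, g$

example:
A[3:0]=0100
B[3:0]=1110
σ[3:0]=pgpk
C[4:1]=1100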

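PPC adder (cont.)

how to compute the event $\exists\, j \le i : \sigma[i:j] = p \cdots p\, g$?

define an operator $\circ : \{k,p,g\} \times \{k,p,g\} \to \{k,p,g\}$ as follows:
$g \circ x = g$
$p \circ x = x$
$k \circ x = k$
claim: ∘ is associative.

definition: $\pi[i] = \sigma[i] \circ \cdots \circ \sigma[0]$.
claim: $C[i+1] = 1 \iff \pi[i] = g$.

compute π[i] using PPC with ∘-gates!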
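A Python sketch of the PPC adder built from the operator ∘ above. For brevity π[i] is computed with a Kogge-Stone-style doubling scan (ceil(log2 n) rounds of ∘-gates), which is a different PPC topology from the Brent-Kung recursion shown earlier; the function names op and ppc_add are hypothetical.

```python
def op(x, y):
    """Carry operator on {'k', 'p', 'g'}: g∘y = g, k∘y = k, p∘y = y."""
    return y if x == 'p' else x


def ppc_add(a, b, n):
    """n-bit PPC adder sketch; returns (C[n], S) with S = (a + b) mod 2**n."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    sigma = ['kpg'[A[i] + B[i]] for i in range(n)]  # k / p / g per position
    # pi[i] = sigma[i] ∘ sigma[i-1] ∘ ... ∘ sigma[0] via a doubling scan:
    # after the round with distance d, pi[i] covers positions i down to i-2d+1.
    pi = sigma[:]
    d = 1
    while d < n:
        pi = [op(pi[i], pi[i - d]) if i >= d else pi[i] for i in range(n)]
        d *= 2
    C = [0] + [int(p == 'g') for p in pi]  # C[0] = 0; C[i+1] = 1 iff pi[i] = g
    S = [A[i] ^ B[i] ^ C[i] for i in range(n)]
    return C[n], sum(s << i for i, s in enumerate(S))
```

ppc_add(0b0100, 0b1110, 4) returns (1, 0b0010), consistent with σ[3:0] = pgpk and C[4:1] = 1100 in the example.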

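Compound Adder [T]
how to compute a+b & a+b+1?
– use 2 separate adders, or
– understand the PPC adder:

a+b+1 = (a+0.5)+(b+0.5), i.e., append a position −1 with A[−1] = B[−1] = 1, so σ[−1] = g.

recall: $\pi[i] = \sigma[i] \circ \cdots \circ \sigma[0]$.

Now, for a+b+1: $\pi'[i] = \sigma[i] \circ \cdots \circ \sigma[0] \circ g = \pi[i] \circ g$.

Therefore, $C'[i+1] = 1 \iff \pi'[i] = g \iff \pi[i] \neq k$, so one PPC yields the carries of both a+b and a+b+1.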
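A Python sketch of the compound adder idea: one prefix computation of π[i], then two carry vectors read off it, C[i+1] = [π[i] = g] for a+b and C'[i+1] = [π[i] ≠ k] for a+b+1 (with carry-in C'[0] = 1). The doubling scan and the names op and compound_add are illustrative assumptions.

```python
def op(x, y):
    """Carry operator on {'k', 'p', 'g'}: g∘y = g, k∘y = k, p∘y = y."""
    return y if x == 'p' else x


def compound_add(a, b, n):
    """Compound adder sketch: returns ((a+b) mod 2**n, (a+b+1) mod 2**n)
    from a single prefix computation of pi[i] = sigma[i] ∘ ... ∘ sigma[0]."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    pi = ['kpg'[A[i] + B[i]] for i in range(n)]  # starts as sigma
    d = 1
    while d < n:  # doubling scan with ∘-gates, shared by both sums
        pi = [op(pi[i], pi[i - d]) if i >= d else pi[i] for i in range(n)]
        d *= 2
    C = [0] + [int(p == 'g') for p in pi]   # carries of a+b
    C1 = [1] + [int(p != 'k') for p in pi]  # carries of a+b+1: pi[i] ∘ g = g unless pi[i] = k
    S = sum((A[i] ^ B[i] ^ C[i]) << i for i in range(n))
    S1 = sum((A[i] ^ B[i] ^ C1[i]) << i for i in range(n))
    return S, S1
```

In the FP-adder, the rounding decision then only drives the MUX between the two outputs, instead of feeding a separate incrementer.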

Conclusion

• faster modules require clever designs

• starting point: gate count (for delay & cost)

• Must take fan-out & layout into account

• lots of methods:
  – divide & conquer
  – parallel prefix computation
  – redundant arithmetic
