Jackson Adders Prof. David Money Harris 9 July 2010



Overview

– Definitions
– Tree Adders
– Ling Adders
– Jackson Adders
– 18-bit Jackson Tree
– Evaluation Methodology
– Preliminary Results


Addition

Carry Propagate Adder

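– Inputs: $A_{N:0}$, $B_{N:1}$
  • $A_0 = C_{in}$
– Outputs: $S_{N:1}$
  • Discard $C_{out}$

[Figure: adder symbol with inputs $A_{N...1}$, $B_{N...1}$, $C_{in}$ and outputs $S_{N...1}$, $C_{out}$]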


Propagate, Generate, Kill, Oh My!

Bitwise signals
– Generate: $G_{i:i} = G_i \equiv A_i B_i$
– Propagate: $P_{i:i} = P_i \equiv A_i + B_i$ (also called $\overline{K_i}$)
– $X_i \equiv A_i \oplus B_i$

Group recursion to form prefixes
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
– A group generates if the upper part generates, or if the upper part propagates and the lower part generates

Bitwise sum
– $S_i = X_i \oplus G_{i-1:0}$
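The bitwise and group equations above are already enough to build a working adder. Below is a minimal Python sketch, assuming the $A_0 = C_{in}$ convention from the Addition slide (so $G_0 = C_{in}$) and organizing the group recursion as a plain ripple rather than a tree; the function name `pg_prefix_add` is our own, not from the slides.

```python
import random

def pg_prefix_add(a: int, b: int, cin: int, n: int) -> tuple:
    """Illustrative helper: returns (S[n:1], Cout) using bitwise PG, group PG, and sum logic.

    Bit k-1 of a and b is column k; column 0 holds Cin (A_0 = Cin, so G_0 = Cin).
    """
    G, P, X = [cin], [0], [0]  # column 0: G_0 = Cin, P_0 = 0
    for i in range(1, n + 1):
        ai, bi = (a >> (i - 1)) & 1, (b >> (i - 1)) & 1
        G.append(ai & bi)  # G_i = A_i B_i
        P.append(ai | bi)  # P_i = A_i + B_i
        X.append(ai ^ bi)  # X_i = A_i xor B_i
    # Group generates G_{i:0} via the valency-2 recursion with k = i (a ripple organization)
    Gpre = [G[0]]
    for i in range(1, n + 1):
        Gpre.append(G[i] | (P[i] & Gpre[i - 1]))
    s = 0
    for i in range(1, n + 1):
        s |= (X[i] ^ Gpre[i - 1]) << (i - 1)  # S_i = X_i xor G_{i-1:0}
    return s, Gpre[n]

print(pg_prefix_add(0b1011, 0b0110, 1, 4))  # 11 + 6 + 1 = 18 -> (2, 1)

# Self-check against integer addition on random 16-bit operands
for _ in range(1000):
    a, b, c = random.getrandbits(16), random.getrandbits(16), random.getrandbits(1)
    assert pg_prefix_add(a, b, c, 16) == ((a + b + c) & 0xFFFF, (a + b + c) >> 16)
```

Any prefix tree computes the same $G_{i:0}$ values; the tree structures that follow only change how the recursion is organized.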


Higher Valency Groups

Valency-2
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$

Valency-3
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} G_{l-1:j})$

Valency-4
– Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:m} P_{m-1:j}$
– Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} (G_{l-1:m} + P_{l-1:m} G_{m-1:j}))$
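A quick exhaustive check, written in Python as an illustration rather than a gate-level model, that the valency-3 and valency-4 equations give the same (G, P) as nested valency-2 cells. The helper names `black2`, `black3`, and `black4` are hypothetical.

```python
from itertools import product

def black2(hi, lo):
    """Illustrative valency-2 black cell on (G, P) pairs: upper group i:k, lower group k-1:j."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]

def black3(top, mid, bot):
    """Valency-3: G = G_top + P_top (G_mid + P_mid G_bot), P = P_top P_mid P_bot."""
    (gt, pt), (gm, pm), (gb, pb) = top, mid, bot
    return gt | (pt & (gm | (pm & gb))), pt & pm & pb

def black4(w, x, y, z):
    """Valency-4: the same nesting one level deeper."""
    (gw, pw), (gx, px), (gy, py), (gz, pz) = w, x, y, z
    return gw | (pw & (gx | (px & (gy | (py & gz))))), pw & px & py & pz

pairs = list(product((0, 1), repeat=2))  # every (G, P) combination
for t, m, b in product(pairs, repeat=3):
    assert black3(t, m, b) == black2(t, black2(m, b)) == black2(black2(t, m), b)
for w, x, y, z in product(pairs, repeat=4):
    assert black4(w, x, y, z) == black2(black2(w, x), black2(y, z))
```

Because the valency-2 operator is associative, a higher-valency cell is just a flattened group of valency-2 cells; the tradeoff is fewer logic levels against more complex gates.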


Tree Adders

How should the recursion be organized?

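[Figure: 4-bit adder in three stages. 1: Bitwise PG logic forms $P_i$ and $G_i$ from $A_i$ and $B_i$ (with $G_0$, $P_0$ from $C_{in}$). 2: Group PG logic forms the prefixes $G_{0:0}$ through $G_{3:0}$ (the carries $C_0$ through $C_3$). 3: Sum logic forms $S_1$ through $S_4$ and $C_{out}$.]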


Black and Gray Cells

Black cell: computes group G and P

Gray cell: computes group G only

[Figure: black and gray cell symbols and gate-level implementations that combine groups $i:k$ and $k-1:j$ into $i:j$, including inverting versions that alternate between odd and even rows, and higher-valency cells that combine $i:k$, $k-1:l$, $l-1:m$, $m-1:j$ into $i:j$.]


Tree Adders

[Figure: 16-bit prefix trees: (a) Brent-Kung, (b) Sklansky, (c) Han-Carlson, (d) Kogge-Stone, (e) Knowles [2,1,1,1], (f) Ladner-Fischer. Each tree combines bit columns 0 through 15 into the prefixes 15:0 through 0:0.]
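The trees in the figure differ only in how they wire the same black cell. As a functional sketch (not a timing or wiring model), here are Kogge-Stone and Sklansky prefix computations in Python, checked against a serial ripple. The function names are our own, and columns are numbered from 0 as in the figure.

```python
import random

def black(hi, lo):
    """Illustrative black cell: (G, P) of upper group i:k combined with lower group k-1:j."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]

def kogge_stone(pg):
    """pg[i] = (G_i, P_i) for columns 0..n-1; returns [(G_{i:0}, P_{i:0})]."""
    cur, span = list(pg), 1
    while span < len(cur):
        # Every column i >= span combines with column i - span of the previous row
        cur = [black(c, cur[i - span]) if i >= span else c for i, c in enumerate(cur)]
        span *= 2
    return cur

def sklansky(pg):
    """Same prefixes with Sklansky's divide-and-conquer fanout."""
    cur, span = list(pg), 1
    while span < len(cur):
        for i in range(len(cur)):
            if i & span:  # upper half of its 2*span block
                base = i & ~(2 * span - 1)
                # combine with the top column of the lower half (unchanged in this row)
                cur[i] = black(cur[i], cur[base + span - 1])
        span *= 2
    return cur

def ripple(pg):
    """Reference: serial recursion G_{i:0} = G_i + P_i G_{i-1:0}."""
    out = [pg[0]]
    for c in pg[1:]:
        out.append(black(c, out[-1]))
    return out

n = 16
a, b = random.getrandbits(n), random.getrandbits(n)
pg = [(((a >> i) & (b >> i)) & 1, ((a >> i) | (b >> i)) & 1) for i in range(n)]
assert kogge_stone(pg) == sklansky(pg) == ripple(pg)
```

Both use $\log_2 n$ rows; Kogge-Stone keeps fanout at 1 with many cells per row, while Sklansky uses fewer cells but lets fanout double every row.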


Higher Valency Trees


Sparse Trees

Sklansky sparseness 4
– Only compute prefixes for every 4th column
– Precompute 4-bit results for each possible carry-in
– Select result based on carry (group generate)

[Figure: 32-bit Sklansky tree with sparseness 4. The tree computes prefixes only for every 4th column (for example $4:1$, $8:1$, $12:1$, $16:1$, $20:1$, $24:1$, $28:1$); the remaining columns are handled by 4-bit blocks that precompute their results.]
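A minimal Python sketch of the sparseness-4 idea for 32-bit operands. It models function only, not the Sklansky wiring: the carries into each 4-bit block come from block (G, P) pairs through a serial prefix instead of a tree, and Python's `+` stands in for the precomputed 4-bit adders. `sparse4_add` is a hypothetical name.

```python
import random

def sparse4_add(a: int, b: int, cin: int, n: int = 32) -> int:
    """Illustrative sparseness-4 carry-select adder; returns (a + b + cin) mod 2**n."""
    assert n % 4 == 0
    s = 0
    carry = cin  # group generate into the current block, G_{j-1:0} including Cin
    for j in range(0, n, 4):
        a4, b4 = (a >> j) & 0xF, (b >> j) & 0xF
        # Precompute the 4-bit result for each possible carry-in
        sum0, sum1 = (a4 + b4) & 0xF, (a4 + b4 + 1) & 0xF
        # Select the result based on the carry (group generate)
        s |= (sum1 if carry else sum0) << j
        # Block (G, P) for bits j+3..j, then the next sparse prefix
        G, P = 0, 1
        for i in range(j, j + 4):
            gi = (a >> i) & (b >> i) & 1
            pi = ((a >> i) | (b >> i)) & 1
            G, P = gi | (pi & G), pi & P
        carry = G | (P & carry)
    return s

for _ in range(1000):
    a, b, c = random.getrandbits(32), random.getrandbits(32), random.getrandbits(1)
    assert sparse4_add(a, b, c) == (a + b + c) & 0xFFFFFFFF
```

Only one prefix per four columns reaches the critical path; the block sums are computed in parallel and are ready by the time the select arrives.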


Carry Selection

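[Figure: carry selection. (a) Two adders compute $S_{i:j}$ from $A_{i:j}$ and $B_{i:j}$ for carry-ins of 0 and 1, and $G_{j-1:0}$ selects the result. (b), (c) Bit-level versions for $S_1$ through $S_4$ built from the bitwise $P$ and $G$ signals and $C_{in}$.]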


Ling Adders

– Factor some complexity out of the first term
– Insert it back into the sum selection
– Remove one transistor from the critical path

Exploits the fact that $G_i P_i = (A_i B_i)(A_i + B_i) = A_i B_i = G_i$


Ling Equations

Define pseudogenerate: $H_{i:j} \equiv G_i + G_{i-1:j}$
– Simpler than $G_{i:j} = G_i + P_i G_{i-1:j}$
– Recreate $G_{i:j} = P_i H_{i:j} = P_i (G_i + G_{i-1:j}) = G_i + P_i G_{i-1:j}$

Define pseudopropagate: $I_{i:j} \equiv P_{i-1:j-1}$
– Shifted version of the group propagate

Valency-2 recursion is the same as for PG
– $H_{i:j} = H_{i:k} + I_{i:k} H_{k-1:j}$
– $I_{i:j} = I_{i:k} I_{k-1:j}$

Sum: $S_i = X_i \oplus G_{i-1:0} = X_i \oplus (P_{i-1} H_{i-1:0})$
– Selection mux: $S_i = H_{i-1:0}\ ?\ (X_i \oplus P_{i-1}) : X_i$

The sum selection mux chooses $S_i$ based on the late-arriving $H_{i-1:0}$.
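To see the Ling equations work end to end, here is a Python sketch that runs the $(H, I)$ pairs through a Kogge-Stone prefix using exactly the PG black-cell operator, then applies the sum-selection mux. Assumptions (ours): column 0 carries $C_{in}$ with $G_0 = P_0 = C_{in}$, so that $P_0 G_0 = G_0$ still holds; `ling_add` is a hypothetical name.

```python
import random

def ling_add(a: int, b: int, cin: int, n: int) -> int:
    """Illustrative Ling adder: returns S[n:1] of a + b + cin (carry-out dropped)."""
    A = [cin] + [(a >> (i - 1)) & 1 for i in range(1, n + 1)]  # A_0 = Cin
    B = [0] + [(b >> (i - 1)) & 1 for i in range(1, n + 1)]
    G = [u & v for u, v in zip(A, B)]
    P = [u | v for u, v in zip(A, B)]  # P_0 = Cin
    X = [u ^ v for u, v in zip(A, B)]
    G[0] = cin  # G_0 = Cin

    # Leaves (H_{i:i}, I_{i:i}) = (G_i, P_{i-1}); column 0 has nothing below it
    cur = [(G[0], 1)] + [(G[i], P[i - 1]) for i in range(1, n + 1)]
    # Kogge-Stone with the PG cell: H = H_hi + I_hi H_lo, I = I_hi I_lo
    span = 1
    while span < len(cur):
        cur = [(h | (iv & cur[k - span][0]), iv & cur[k - span][1]) if k >= span else (h, iv)
               for k, (h, iv) in enumerate(cur)]
        span *= 2
    H = [h for h, _ in cur]  # H[i] = H_{i:0}

    s = 0
    for i in range(1, n + 1):
        # Sum-selection mux: S_i = H_{i-1:0} ? (X_i xor P_{i-1}) : X_i
        s_i = X[i] ^ P[i - 1] if H[i - 1] else X[i]
        s |= s_i << (i - 1)
    return s

for _ in range(1000):
    a, b, c = random.getrandbits(16), random.getrandbits(16), random.getrandbits(1)
    assert ling_add(a, b, c, 16) == (a + b + c) & 0xFFFF
```

The prefix network never forms $G_{i:0}$; the missing $P_{i-1}$ factor is reinserted only at the mux data input.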


Ling Circuits

Simplifies the first stage
– Compute $H_{i+1:i}$ in one swell foop

$G_{2:1} = G_2 + P_2 G_1 = A_2 B_2 + (A_2 + B_2) A_1 B_1$ (too hard)

$H_{2:1} = G_2 + G_1 = A_2 B_2 + A_1 B_1$ (easy)

[Figure: (a) compound gate for $G_{2:1}$, labeled too hard; (b) compound gate for $H_{2:1}$, labeled easy]


Jackson Adders

Generalized Ling technique
– Simplify logic in the prefix tree as well
– Use sum selection to reinsert the missing terms
– Balance the logic so that the data and select inputs of the sum mux are comparable in criticality

Developed by Jackson and Talwar in 2004
– Used in the Arithmetica synthesis tool
– Parameterized by architecture, valency, and sparseness
– Reportedly produced superior energy-delay tradeoffs
– [Burgess09] indicates benefits over standard designs
– No comprehensible, complete published designs


Jackson Logic

Define new terms
– D: a group generates or propagates a carry
  $D_{i:j} = G_{i:j} + P_{i:j} = G_{i:j+1} + P_{i:j}$, with $D_{i:i} = p_i$
– Special case B: a group generates a carry in at least one bit
  $B_{i:j} = g_i + g_{i-1} + \cdots + g_j$

Rewrite group generate:
– A group generates if the upper part generates or propagates, and either at least one bit of the upper part generates or the lower part generates

$G_{i:j} = D_{i:k} (B_{i:k} + G_{k-1:j})$
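The new D and B terms and the rewritten group generate can be checked exhaustively on small operands. This Python sketch uses our own conventions: columns are numbered from 0, bitwise $p$ is OR-propagate (so $g$ implies $p$), and an empty group has $G = 0$ and $P = 1$. The helpers `grp`, `B`, and `D` are hypothetical names.

```python
from itertools import product

def grp(g, p, i, j):
    """Illustrative helper: (G_{i:j}, P_{i:j}); an empty group (i < j) gives (0, 1)."""
    G, P = 0, 1
    for m in range(j, i + 1):
        G, P = g[m] | (p[m] & G), p[m] & P  # G_{m:j} = g_m + p_m G_{m-1:j}
    return G, P

def B(g, i, j):
    """B_{i:j}: at least one bit of i:j generates."""
    return int(any(g[j:i + 1]))

def D(g, p, i, j):
    """D_{i:j} = G_{i:j} + P_{i:j}: the group generates or propagates a carry."""
    G, P = grp(g, p, i, j)
    return G | P

N = 5
for abits in product((0, 1), repeat=N):
    for bbits in product((0, 1), repeat=N):
        g = [u & v for u, v in zip(abits, bbits)]
        p = [u | v for u, v in zip(abits, bbits)]
        for i in range(N):
            assert D(g, p, i, i) == p[i]
            for j in range(i + 1):
                assert D(g, p, i, j) == grp(g, p, i, j + 1)[0] | grp(g, p, i, j)[1]
                for k in range(j + 1, i + 1):
                    G_ij = grp(g, p, i, j)[0]
                    assert G_ij == D(g, p, i, k) & (B(g, i, k) | grp(g, p, k - 1, j)[0])
```

The key fact is $D_{i:k} B_{i:k} = G_{i:k}$: if every bit of a group propagates and at least one bit generates, the highest generating bit's carry reaches the top.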


Reduced Generate

Again, rename the bracketed term: the reduced generate, R
– $R^p$ has the top p propagate signals stripped out
– $R^0_{i:j} = G_{i:j}$
– $R^1_{i:j} = H_{i:j}$
– Jackson considers $p \ge 2$

Group generate can be rewritten in terms of R
– Computing R prefixes can be easier than computing G

$G_{i:j} = D_{i:k} (B_{i:k} + G_{k-1:j})$

$R^p_{i:j} = B_{i:i-p+1} + G_{i-p:j}$

$G_{i:j} = D_{i:i-p+1} R^p_{i:j}$
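A small exhaustive Python check of the reduced-generate identities, assuming columns numbered from 0, OR-propagate, and empty groups with $G = 0$, $P = 1$. The superscript $p$ is called `r` in the code so it does not clash with the bitwise list `p`; `grp`, `D`, and `R` are hypothetical helper names.

```python
from itertools import product

def grp(g, p, i, j):
    """Illustrative helper: (G_{i:j}, P_{i:j}); an empty group (i < j) gives (0, 1)."""
    G, P = 0, 1
    for m in range(j, i + 1):
        G, P = g[m] | (p[m] & G), p[m] & P
    return G, P

def D(g, p, i, j):
    G, P = grp(g, p, i, j)
    return G | P

def R(g, p, r, i, j):
    """Reduced generate R^r_{i:j} = B_{i:i-r+1} + G_{i-r:j}."""
    return int(any(g[i - r + 1:i + 1])) | grp(g, p, i - r, j)[0]

N = 5
for abits in product((0, 1), repeat=N):
    for bbits in product((0, 1), repeat=N):
        g = [u & v for u, v in zip(abits, bbits)]
        p = [u | v for u, v in zip(abits, bbits)]
        for i in range(N):
            for j in range(i + 1):
                G_ij = grp(g, p, i, j)[0]
                assert R(g, p, 0, i, j) == G_ij                            # R^0 = G
                assert R(g, p, 1, i, j) == g[i] | grp(g, p, i - 1, j)[0]   # R^1 = H
                for r in range(i - j + 2):
                    assert G_ij == D(g, p, i, i - r + 1) & R(g, p, r, i, j)
```

Stripping the top $p$ propagates out of the tree is only safe because $D_{i:i-p+1}$ multiplies them back in later, which is the job of the sum-selection logic.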


Hyperpropagate

Another term will be useful for the recursion: hyperpropagate
– Define $Q^p_{i:j} = P_{i:i-p+1} D_{i-p:j}$
– Special case for 2-bit groups: $Q^1_{i:i-1} = Q^2_{i:i-1} = P_{i:i-1}$


Jackson Recursions

Valency-2 is no simpler
– $R^p_{i:j} = R^p_{i:k} + Q^{i-p-k+1}_{i-p:k-q} R^q_{k-1:j}$
– $Q^p_{i:j} = Q^p_{i:k} (R^{i-p-k+1}_{i-p:k-q} + Q^q_{k-1:j})$
– Compare with $G_{total} = G_{top} + P_{top} G_{bot}$ and $P_{total} = P_{top} P_{bot}$

Valency-3 simplifies R at the expense of Q
– $R_{total} = R_{top} + R_{mid} + Q_{flex} R_{bot}$
– $Q_{total} = Q_{top} Q_{mid} (R_{flex} + Q_{bot})$
– Compare with $G_{total} = G_{top} + P_{top} G_{mid} + P_{top} P_{mid} G_{bot}$ and $P_{total} = P_{top} P_{mid} P_{bot}$
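The indexed valency-2 recursions can be spot-checked exhaustively on 4-bit operands. In this Python sketch, `R` and `Q` follow the definitions $R^p_{i:j} = B_{i:i-p+1} + G_{i-p:j}$ and $Q^p_{i:j} = P_{i:i-p+1} D_{i-p:j}$, with columns numbered from 0, OR-propagate, and empty groups giving $G = 0$, $P = 1$; the superscript is called `r` in code, and all helper names are hypothetical.

```python
from itertools import product

def grp(g, p, i, j):
    """Illustrative helper: (G_{i:j}, P_{i:j}); an empty group (i < j) gives (0, 1)."""
    G, P = 0, 1
    for m in range(j, i + 1):
        G, P = g[m] | (p[m] & G), p[m] & P
    return G, P

def D(g, p, i, j):
    G, P = grp(g, p, i, j)
    return G | P

def R(g, p, r, i, j):
    """Reduced generate R^r_{i:j} = B_{i:i-r+1} + G_{i-r:j}."""
    return int(any(g[i - r + 1:i + 1])) | grp(g, p, i - r, j)[0]

def Q(g, p, r, i, j):
    """Hyperpropagate Q^r_{i:j} = P_{i:i-r+1} D_{i-r:j}."""
    return grp(g, p, i, i - r + 1)[1] & D(g, p, i - r, j)

N = 4
for abits in product((0, 1), repeat=N):
    for bbits in product((0, 1), repeat=N):
        g = [u & v for u, v in zip(abits, bbits)]
        p = [u | v for u, v in zip(abits, bbits)]
        for i in range(1, N):  # special case for 2-bit groups
            assert Q(g, p, 1, i, i - 1) == Q(g, p, 2, i, i - 1) == grp(g, p, i, i - 1)[1]
        for i in range(N):
            for j in range(i + 1):
                for r in range(i - j + 1):
                    for k in range(j + 1, min(i, i - r + 1) + 1):
                        t = i - r - k + 1  # superscript of the "flex" term
                        for q in range(k - j + 1):
                            assert R(g, p, r, i, j) == R(g, p, r, i, k) | (
                                Q(g, p, t, i - r, k - q) & R(g, p, q, k - 1, j))
                            assert Q(g, p, r, i, j) == Q(g, p, r, i, k) & (
                                R(g, p, t, i - r, k - q) | Q(g, p, q, k - 1, j))
```

The split point $k$ must satisfy $j < k \le i - p + 1$ so that the upper group still contains all $p$ stripped bits; the loops above cover exactly that range.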


Valency-3 Circuits

[Figure: valency-3 Jackson cells in a compound-gate implementation and a simpler-gate implementation. Each computes $R_{total}$ from $R_{top}$, $R_{mid}$, $Q_{flex}$, $R_{bot}$ and $Q_{total}$ from $Q_{top}$, $Q_{mid}$, $R_{flex}$, $Q_{bot}$.]


Logical Effort of Valency-3

                   PG     RQ compound   RQ simpler
G, generate        4      2.67          2.22
G, propagate       1.67   3.33          2.77
P, generate        5      4.33          4
P, propagate       4      4.66          4

(G = logical effort, P = parasitic delay)
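One entry of the table can be reproduced by hand. Assuming (our assumption; the slide does not show the circuit) that the PG propagate path $P_{top} P_{mid} P_{bot}$ is a NAND3 followed by an inverter, with the usual PMOS/NMOS mobility ratio of 2 and parasitic delay in units of an inverter's, the logical effort is 5/3 and the parasitic delay is 3 + 1 = 4, matching the PG propagate column. The helper names are hypothetical.

```python
def nand_effort(n: int, gamma: float = 2.0) -> float:
    """Illustrative helper: logical effort of an n-input NAND sized for unit-inverter drive."""
    return (n + gamma) / (1 + gamma)

def and_gate(n: int) -> tuple:
    """(G, P) of a noninverting n-input AND built as NANDn + inverter."""
    G = nand_effort(n) * 1.0  # inverter logical effort is 1
    P = n + 1                 # parasitic delay: n for the NAND, 1 for the inverter
    return G, P

G3, P3 = and_gate(3)
print(f"AND3: G = {G3:.2f}, P = {P3}")  # prints AND3: G = 1.67, P = 4
```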


Sum Selection

Select sum based on $R^p_{i-1:0}$
– Requires a p-bit D signal for the sum-selection data input
  • This is the complexity that is factored out of R

$s_i = x_i \oplus G_{i-1:0} = x_i \oplus (D_{i-1:i-p} R^p_{i-1:0}) = \overline{R^p_{i-1:0}}\, x_i + R^p_{i-1:0} (x_i \oplus D_{i-1:i-p})$

D recursion

$D_{i:j} = D_{i:i-p+1} (R^p_{i:k} + Q^{i-p-k+1}_{i-p:j})$
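The D recursion and the sum-selection mux can be checked together on 5-bit operands. The Python sketch assumes columns numbered from 0, OR-propagate, and empty groups with $G = 0$, $P = 1$; `grp`, `D`, `R`, and `Q` are hypothetical helpers implementing the definitions from the earlier slides, with the superscript called `r`.

```python
from itertools import product

def grp(g, p, i, j):
    """Illustrative helper: (G_{i:j}, P_{i:j}); an empty group (i < j) gives (0, 1)."""
    G, P = 0, 1
    for m in range(j, i + 1):
        G, P = g[m] | (p[m] & G), p[m] & P
    return G, P

def D(g, p, i, j):
    G, P = grp(g, p, i, j)
    return G | P

def R(g, p, r, i, j):
    """Reduced generate R^r_{i:j} = B_{i:i-r+1} + G_{i-r:j}."""
    return int(any(g[i - r + 1:i + 1])) | grp(g, p, i - r, j)[0]

def Q(g, p, r, i, j):
    """Hyperpropagate Q^r_{i:j} = P_{i:i-r+1} D_{i-r:j}."""
    return grp(g, p, i, i - r + 1)[1] & D(g, p, i - r, j)

N = 5
for abits in product((0, 1), repeat=N):
    for bbits in product((0, 1), repeat=N):
        g = [u & v for u, v in zip(abits, bbits)]
        p = [u | v for u, v in zip(abits, bbits)]
        x = [u ^ v for u, v in zip(abits, bbits)]
        for i in range(N):
            for j in range(i + 1):
                for r in range(i - j + 1):
                    for k in range(j, i - r + 2):
                        # D_{i:j} = D_{i:i-r+1} (R^r_{i:k} + Q^{i-r-k+1}_{i-r:j})
                        assert D(g, p, i, j) == D(g, p, i, i - r + 1) & (
                            R(g, p, r, i, k) | Q(g, p, i - r - k + 1, i - r, j))
        for i in range(1, N):
            for r in range(i + 1):
                # s_i = R^r_{i-1:0} ? (x_i xor D_{i-1:i-r}) : x_i
                s_i = x[i] ^ D(g, p, i - 1, i - r) if R(g, p, r, i - 1, 0) else x[i]
                assert s_i == x[i] ^ grp(g, p, i - 1, 0)[0]
```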


Prior Work

[Jackson04]
+ Introduced R and Q
+ Showed how to compute a single sum output
- Does not show how to build an entire adder
- Does not include recursions for D or valency-2 R/Q

[Burgess09]
+ Comments on the critical path
+ Comparisons suggest benefits of the Jackson adder
- Hard-to-decipher diagram of a 24-bit adder


Example

18-bit Jackson adder
– Sklansky tree with sparseness 2
– Valency-2 initial stage (like Ling)
– Valency-3 second and third stages
– Only 4 levels of noninverting logic


Initial Stage

Reduced generate

$R^1_{2i+1:2i} = g_{2i+1} + g_{2i} = a_{2i+1} b_{2i+1} + a_{2i} b_{2i}$

$R^1_{1:0} = a_0 + a_1 b_1$

Hyperpropagate

$Q^1_{2i+2:2i+1} = p_{2i+2}\, p_{2i+1} = (a_{2i+2} + b_{2i+2})(a_{2i+1} + b_{2i+1})$

Also need $g_i$ for even bits, $p_i$ for odd bits, and $x_i$ for all bits
– For the sum selection logic


Second Stage

Compute 4- and 6-bit group signals
– Note the potential for sharing common terms

$R^3_{17:12} = R^1_{17:16} + R^1_{15:14} + Q^1_{14:13} R^1_{13:12}$
$R^1_{15:12} = R^1_{15:14} + Q^1_{14:13} R^1_{13:12}$

$R^3_{11:6} = R^1_{11:10} + R^1_{9:8} + Q^1_{8:7} R^1_{7:6}$
$R^1_{9:6} = R^1_{9:8} + Q^1_{8:7} R^1_{7:6}$

$R^3_{5:0} = R^1_{5:4} + R^1_{3:2} + Q^1_{2:1} R^1_{1:0}$
$R^1_{3:0} = R^1_{3:2} + Q^1_{2:1} R^1_{1:0}$

$Q^3_{14:9} = Q^2_{14:13} Q^1_{12:11} (R^1_{11:10} + Q^1_{10:9})$
$Q^1_{12:9} = Q^1_{12:11} (R^1_{11:10} + Q^1_{10:9})$

$Q^3_{8:3} = Q^2_{8:7} Q^1_{6:5} (R^1_{5:4} + Q^1_{4:3})$
$Q^1_{6:3} = Q^1_{6:5} (R^1_{5:4} + Q^1_{4:3})$


Third Stage

Reduced generate signals for all groups

$R^9_{17:0} = R^3_{17:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^7_{15:0} = R^1_{15:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^5_{13:0} = R^1_{13:12} + R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^3_{11:0} = R^3_{11:6} + Q^3_{8:3} R^3_{5:0}$
$R^1_{9:0} = R^1_{9:6} + Q^3_{8:3} R^3_{5:0}$
$R^1_{7:0} = R^1_{7:6} + Q^1_{6:3} R^3_{5:0}$


D Logic

Medium-length groups of D are required for sum selection

Note that $D_{17:9}$ depends on $R^3_{17:12}$
– Hence, it arrives at the same time as $R^9_{17:0}$

$D_{17:9} = D_{17:15} (R^3_{17:12} + Q^3_{14:9}) = D_{17:17} (R^1_{17:16} + Q^1_{16:15}) (R^3_{17:12} + Q^3_{14:9})$
$D_{15:9} = D_{15:15} (R^1_{15:12} + Q^3_{14:9})$
$D_{13:9} = D_{13:13} (R^1_{13:12} + Q^1_{12:9})$
$D_{11:9} = D_{11:11} (R^1_{11:10} + Q^1_{10:9})$
$D_{9:9} = p_9$
$D_{7:7} = p_7$
$D_{5:3} = G_{5:4} + P_{5:3} = D_{5:5} (R^1_{5:4} + Q^1_{4:3})$
$D_{3:3} = p_3$
$D_{1:1} = p_1$


Sum Selection

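Sparseness of 2 requires a 1-bit ripple from even to odd columns

$s_{2i} = x_{2i} \oplus G_{2i-1:0} = x_{2i} \oplus (D_{2i-1:2i-p} R^p_{2i-1:0}) = R^p_{2i-1:0}\ ?\ (x_{2i} \oplus D_{2i-1:2i-p}) : x_{2i}$

$s_{2i+1} = x_{2i+1} \oplus (g_{2i} + x_{2i} G_{2i-1:0}) = x_{2i+1} \oplus (g_{2i} + x_{2i} D_{2i-1:2i-p} R^p_{2i-1:0})$
$\phantom{s_{2i+1}} = R^p_{2i-1:0}\ ?\ (x_{2i+1} \oplus (g_{2i} + x_{2i} D_{2i-1:2i-p})) : (x_{2i+1} \oplus g_{2i})$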
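Putting the initial stage, second and third stages, D logic, and sum selection together gives a complete bit-level model of the 18-bit adder. The Python sketch below is our transcription of the equations on these slides, not a netlist; it assumes bit k-1 of each integer operand is column k and that $a_0 = C_{in}$ (so $g_0 = C_{in}$), and `jackson18` is a hypothetical name. The sum-selection mux only ever uses the product $D \cdot R$, which is all the self-check relies on.

```python
import random

def jackson18(a: int, b: int, cin: int) -> int:
    """Illustrative model of the 18-bit Jackson adder; returns S[18:1] (carry-out dropped)."""
    A = [cin] + [(a >> (k - 1)) & 1 for k in range(1, 19)]
    B = [0] + [(b >> (k - 1)) & 1 for k in range(1, 19)]
    g = [u & v for u, v in zip(A, B)]
    p = [u | v for u, v in zip(A, B)]
    x = [u ^ v for u, v in zip(A, B)]
    g[0] = cin  # R^1_{1:0} = a_0 + a_1 b_1, i.e. g_0 = a_0 = Cin

    # Initial stage, keyed by top bit: R1[2i+1] = R^1_{2i+1:2i}, Q1[2i+2] = Q^1_{2i+2:2i+1}
    R1 = {2 * i + 1: g[2 * i + 1] | g[2 * i] for i in range(9)}
    Q1 = {2 * i + 2: p[2 * i + 2] & p[2 * i + 1] for i in range(8)}

    # Second stage (Q^2 of a 2-bit group equals Q^1)
    R3_17_12 = R1[17] | R1[15] | (Q1[14] & R1[13])
    R1_15_12 = R1[15] | (Q1[14] & R1[13])
    R3_11_6 = R1[11] | R1[9] | (Q1[8] & R1[7])
    R1_9_6 = R1[9] | (Q1[8] & R1[7])
    R3_5_0 = R1[5] | R1[3] | (Q1[2] & R1[1])
    R1_3_0 = R1[3] | (Q1[2] & R1[1])
    Q3_14_9 = Q1[14] & Q1[12] & (R1[11] | Q1[10])
    Q1_12_9 = Q1[12] & (R1[11] | Q1[10])
    Q3_8_3 = Q1[8] & Q1[6] & (R1[5] | Q1[4])
    Q1_6_3 = Q1[6] & (R1[5] | Q1[4])

    # Third stage: R[2k] = R^p_{2k-1:0}, the select for columns 2k and 2k+1
    low = Q3_8_3 & R3_5_0
    R = {2: R1[1], 4: R1_3_0, 6: R3_5_0, 8: R1[7] | (Q1_6_3 & R3_5_0),
         10: R1_9_6 | low, 12: R3_11_6 | low, 14: R1[13] | R3_11_6 | low,
         16: R1_15_12 | R3_11_6 | low, 18: R3_17_12 | R3_11_6 | low}

    # D logic: D[2k] = D_{2k-1:2k-p}, the term factored out of R[2k]
    D = {2: p[1], 4: p[3], 6: p[5] & (R1[5] | Q1[4]), 8: p[7], 10: p[9],
         12: p[11] & (R1[11] | Q1[10]), 14: p[13] & (R1[13] | Q1_12_9),
         16: p[15] & (R1_15_12 | Q3_14_9),
         18: p[17] & (R1[17] | Q1[16]) & (R3_17_12 | Q3_14_9)}

    # Sum selection with a 1-bit ripple from even to odd columns
    s = [0] * 19
    s[1] = x[1] ^ g[0]
    for e in range(2, 19, 2):
        s[e] = x[e] ^ D[e] if R[e] else x[e]
        if e < 18:
            s[e + 1] = x[e + 1] ^ (g[e] | (x[e] & D[e])) if R[e] else x[e + 1] ^ g[e]
    return sum(bit << (k - 1) for k, bit in enumerate(s) if k >= 1)

for _ in range(10000):
    a, b, c = random.getrandbits(18), random.getrandbits(18), random.getrandbits(1)
    assert jackson18(a, b, c) == (a + b + c) & ((1 << 18) - 1)
```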


Prefix Network

[Figure: 18-bit Jackson prefix network with sparseness 2. The initial stage takes $a_0$ and the input pairs $a_1, b_1$ through $a_{18}, b_{18}$ and forms $R^1_{2i+1:2i}$, $Q^1_{2i+2:2i+1}$, $x_{2i+1}$, $x_{2i+2}$, $p_{2i+1}$, and $g_{2i+2}$, buffering the noncritical logic. The second stage forms $R^1_{3:0}$, $R^3_{5:0}$, $R^1_{9:6}$, $R^3_{11:6}$, $R^1_{15:12}$, $R^3_{17:12}$, $Q^1_{6:3}$, $Q^3_{8:3}$, $Q^1_{12:9}$, and $Q^3_{14:9}$. The third stage forms $R^1_{7:0}$, $R^1_{9:0}$, $R^3_{11:0}$, $R^5_{13:0}$, $R^7_{15:0}$, and $R^9_{17:0}$. Each sum-selection cell uses $R^p_{2k-1:0}$, $D_{2k-1:2k-p}$, $x_{2k}$, $x_{2k+1}$, and $g_{2k}$ to produce $s_{2k}$ and $s_{2k+1}$; $s_1$ is formed separately.]

Notes: Black cells compute R and Q. Gray cells compute only R. The D network is not shown.


Observations

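Only 4 levels of noninverting logic

$D_{17:9}$ is critical
– Too much is factored out of $R^9_{17:0}$
– Could eliminate the need for it by doing a 2-bit ripple into $s_{18}$

$s_{18} = x_{18} \oplus G_{17:0}$
$\phantom{s_{18}} = x_{18} \oplus (g_{17} + p_{17} g_{16} + p_{17} p_{16} G_{15:0})$
$\phantom{s_{18}} = x_{18} \oplus (g_{17} + p_{17} g_{16} + p_{17} p_{16} D_{15:9} R^7_{15:0})$
$\phantom{s_{18}} = D_{15:9} R^7_{15:0}\ ?\ x_{18} \oplus (g_{17} + x_{17} (g_{16} + p_{16})) : x_{18} \oplus (g_{17} + x_{17} g_{16})$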
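The 2-bit ripple into $s_{18}$ can be checked exhaustively over columns 16 to 18. In this Python sketch the select `sel` stands in for $D_{15:9} R^7_{15:0} = G_{15:0}$, the carry into column 16, and is treated as a free input; `s18_ripple` is our name.

```python
from itertools import product

def s18_ripple(a16, b16, a17, b17, a18, b18, sel):
    """Illustrative s18 with a 2-bit ripple; sel models G_{15:0} = D_{15:9} R^7_{15:0}."""
    g16, p16 = a16 & b16, a16 | b16
    g17, x17 = a17 & b17, a17 ^ b17
    x18 = a18 ^ b18
    if sel:
        return x18 ^ (g17 | (x17 & (g16 | p16)))
    return x18 ^ (g17 | (x17 & g16))

# Columns 16, 17, 18 carry weights 1, 2, 4; bit 2 of the partial sum is s18
for a16, b16, a17, b17, a18, b18, sel in product((0, 1), repeat=7):
    total = (a16 + 2 * a17 + 4 * a18) + (b16 + 2 * b17 + 4 * b18) + sel
    assert s18_ripple(a16, b16, a17, b17, a18, b18, sel) == (total >> 2) & 1
```

Using $x_{17}$ in place of $p_{17}$ is safe because the two differ only when $g_{17} = 1$, and then the OR is already 1.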


Comparison Methodology

Goal: energy-delay curves for Jackson adders compared to conventional adders

How can we objectively compare against the best conventional design?
– Technology mapping challenges
– Sizing
  • Gatesizer limitations
  • SCOT is better, but we only have 130 nm models
– Inadequate design effort on conventional cases

Plan: synthesize with Design Compiler
– Compare against assign y = a + b;


Preliminary Results

130 nm Artisan library for IBM CMOS8sf
– 1.2 V
– FO4 delay: 55 ps

Fastest designs are 570 ps (about 10 FO4)

The Jackson designs take more energy except at very long delays

The s18 optimization helps at the fastest delays

[Figure: Energy-Delay Tradeoffs. Energy (fJ, 0 to 1800) vs. delay (ns, 0 to 2) for the Jackson, Behavioral, and JacksonOptS18 designs.]


Optimization Ideas

Compare against Design Compiler architectures
– Starts with NAND/NOR to compute $\overline{g_i}$, $\overline{p_i}$
  • Computes $x_i = p_i \cdot \overline{g_i}$ to avoid costly XORs
– Appears to use a valency-2 Sklansky tree with inverting gates
– Final XOR

Logical effort analysis of the critical path
– Look for areas to reduce effort

Architecture
– Valency: consider direct bitwise PG, followed by a valency-3 Jackson tree
– Sparseness (sparseness 3 in the tree above?, sparseness 1)
– Sklansky vs. Kogge-Stone

Verilog coding
– Does sharing of terms explicitly help or hurt?
– Code tuning experiments
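The $x_i = p_i \cdot \overline{g_i}$ trick works because, with OR-propagate $p_i = a_i + b_i$, the XOR is 1 exactly when at least one input is 1 but not both. A four-case Python check (the helper name is hypothetical):

```python
from itertools import product

def x_from_nand_nor(a: int, b: int) -> int:
    """Illustrative helper: x = p * ~g built from NAND (~g) and NOR (~p) outputs."""
    g_bar = 1 - (a & b)  # NAND output
    p_bar = 1 - (a | b)  # NOR output
    return (1 - p_bar) & g_bar

for a, b in product((0, 1), repeat=2):
    assert x_from_nand_nor(a, b) == a ^ b
```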


Sun Feedback

Issues raised at the Sun review on 9 July 2010
– Should we use SCOT to evaluate the effects of continuous sizing?
– Follow SCOT up with SPICE
– Start without wire loads, add them later
– Wire load modeling in Design Compiler


Short-Term Action Items

Adder modeling (write equations, code in Verilog, compare to DC)
– 32-bit Sklansky valency-2 baseline similar to DC
  • NAND/NOR to form Pbar, Gbar
  • P * Gbar to form X
  • Inverting stages of group logic
  • Final XOR
  • Does it exactly match DC results?
– 27-bit Jackson (1-bit, followed by 3 radix-3 stages)
– 54-bit Jackson (2-bit Ling PG, followed by 3 radix-3 stages)
– Explore optimization of the 18-bit design

Logical effort analysis of the critical path through the 18-bit Jackson adder

Tool to automatically generate energy-delay curves with DC

Tool flow for DC 2010 with placement and expected wire capacitance

Subversion repository setup

Selection of cell library


Cell Library

IBM 45 nm partially-depleted SOI 12S ARM library
– sc12_base_v31_rvt_soi12s0_ss_nominal_max_0p90v_125c_mxs.lib
– A12TR library with regular-Vt (RVT) transistors
– 12-track cell height (1.68 µm)
– Typical operating point: 1.0 V, 25 °C
– We use the worst-case slow-slow, 0.9 V, 125 °C library
  • Use the Maxsol (mxs) version for worst-case history effect
– 1X inverter INV_X1B_A12TR:
  • Width = 0.38 µm
  • Cin = 1.6 fF
  • Intrinsic delay: 16.6 ps rise / 14.1 ps fall / 15.3 ps average
  • Kload: 1.46 ps/fF rise / 1.17 ps/fF fall / 1.3 ps/fF average
  • FO4 delay = 15.3 ps + 1.3 ps/fF × 1.6 fF × 4 ≈ 24 ps
    – But the .lib data for a 21 ps slew rate and 7.9 fF load suggests
      » tpdf = 17 ps, tpdr = 23 ps, tpd = 20 ps, tf = 13 ps, tr = 23 ps
  • Switching energy: 0.00078 µW/MHz ≈ 0.8 fJ
    – equals $0.5\, C_{in} V_{DD}^2$
  • Leakage power: 0.1 µW (very high!)
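The FO4 and switching-energy figures above can be reproduced with a few lines of arithmetic. This Python sketch treats Kload as ps/fF and the library energy figure as µW/MHz (our reading; these are the units that make the numbers consistent) and uses the 1.0 V typical supply for $0.5\, C_{in} V_{DD}^2$.

```python
# Values from the INV_X1B_A12TR bullets (averages of rise and fall)
t_intrinsic_ps = 15.3    # intrinsic delay
k_load_ps_per_fF = 1.3   # load-dependent delay, read as ps per fF (assumption)
c_in_fF = 1.6            # input capacitance of the 1X inverter
vdd = 1.0                # typical-corner supply voltage (V)

# FO4: the inverter drives four copies of itself
fo4_ps = t_intrinsic_ps + k_load_ps_per_fF * (4 * c_in_fF)
# Switching energy per transition: 0.5 C V^2, and fF * V^2 = fJ
e_switch_fJ = 0.5 * c_in_fF * vdd ** 2
# Library figure 0.00078 uW/MHz: watts divided by hertz gives joules, then convert to fJ
e_lib_fJ = 0.00078e-6 / 1e6 * 1e15

print(f"FO4 = {fo4_ps:.1f} ps, 0.5*Cin*VDD^2 = {e_switch_fJ:.2f} fJ, lib = {e_lib_fJ:.2f} fJ")
```

Both energy estimates land near 0.8 fJ, which is why the library value can be read as a half-swing $0.5\, C V^2$ per transition.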


Summary

Jackson adders appear to offer potential benefits
– Logical effort
– Arithmetica results
– Burgess results

Preliminary synthesis results don't yet demonstrate the advantages

HMC 2010-11 Clay-Wolkin research goals
– Understand the Jackson design space
– Logical effort analysis of the critical path
– Develop Jackson adders superior to conventional designs

[Figure: Design Compiler results]


References

[Burgess09] N. Burgess, "Implementation of recursive Ling adders in CMOS VLSI," Proc. Asilomar Conf. Signals, Systems and Computers, 2009, pp. 1777-1781.

[Jackson04] R. Jackson and S. Talwar, "High speed binary addition," Proc. Asilomar Conf. Signals, Systems and Computers, 2004, pp. 1350-1353.

[Jackson08] R. Jackson, "Data detection algorithms for perpendicular magnetic recording in the presence of strong media noise," Ph.D. thesis, Department of Mathematics, University of Warwick, 2008.

[Ling81] H. Ling, "High-speed binary adder," IBM J. Research and Development, vol. 25, no. 3, May 1981, pp. 156-166.

[Patil07] D. Patil, O. Azizi, M. Horowitz, R. Ho, and R. Ananthraman, "Robust energy-efficient adder topologies," Proc. Computer Arithmetic Symp., Jun. 2007, pp. 16-28.

[Weste10] N. Weste and D. Money Harris, CMOS VLSI Design, 4th Ed., Boston: Addison-Wesley, 2010.

[Zlatanovici09] R. Zlatanovici, S. Kao, and B. Nikolic, "Energy-delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm CMOS design example," IEEE J. Solid-State Circuits, vol. 44, no. 2, Feb. 2009, pp. 569-583.