mwe/PHD/1 Critical ALU Path Optimization and Implementation in a BiCMOS Process for Gigahertz Range Processors Matthew W. Ernest Electrical, Computer and Systems Engineering Dept. Rensselaer Polytechnic Institute


Page 1: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/1

Critical ALU Path Optimization and Implementation in a

BiCMOS Process for Gigahertz Range Processors

Matthew W. Ernest

Electrical, Computer and Systems Engineering Dept.

Rensselaer Polytechnic Institute

Page 2: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/2

Overview

• Motivation

• Parallel Prefixes and Carry Types

• HBT Digital Circuits

• Pseudo-carry Adder

• Future Directions

Page 3: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/3

Motivation

“Speed has always been important otherwise one wouldn't need the computer.” - Seymour Cray

• Ubiquity

• Simplicity

• Complexity

Page 4: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/4

Parallel Prefixes

• The set of problems in which each term of a sequence is combined, in order, with the result of the previous operation

• Carry computation is an application of parallel prefix theory

Given: $x_0, x_1, x_2, \ldots, x_k$

Find: $x_0,\; x_0 \circ x_1,\; x_0 \circ x_1 \circ x_2,\; \ldots,\; x_0 \circ x_1 \circ x_2 \circ \cdots \circ x_k$
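
As a minimal sketch of the prefix problem stated on this slide, the Python below computes every prefix of a sequence under a generic associative operator. The name `prefix_scan` and the choice of `operator.add` are illustrative assumptions, not taken from the slides. The loop is the sequential form; parallel-prefix adders produce the same results with a tree of operators.

```python
from operator import add

def prefix_scan(items, op):
    """Return [x0, x0 op x1, x0 op x1 op x2, ...] for an associative op."""
    results = []
    for x in items:
        # Each new result combines the previous result with the next term.
        results.append(x if not results else op(results[-1], x))
    return results

print(prefix_scan([1, 2, 3, 4], add))  # [1, 3, 6, 10]
```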

Page 5: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/5

Carry Types: Carry Select

• Compute possible results in parallel

• Select when actual carry-in available

• Requires internal carry for blocks, e.g. ripple

• Delay: $O(f(n/b) + b)$, min. $O(n^{1/2})$

• Area: $O(f(n/b)\,b + b)$, approx. $2n$

• Affected by block sizing

[Figure: carry-select blocks with multiplexer inputs labeled 0 and 1]
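
The carry-select idea on this slide can be sketched in Python as follows. This is an illustrative model only: `ripple_block` and `carry_select_add` are hypothetical names, integer addition stands in for the internal ripple carry of each block, and `n` is assumed to be a multiple of `block`.

```python
def ripple_block(a, b, width, cin):
    """Add two width-bit values with a carry-in; return (sum, carry_out)."""
    total = a + b + cin  # stand-in for the block's internal ripple carry
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n, block):
    """n-bit add built from carry-select blocks of `block` bits each."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, n, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        # Both candidate results are computed before the real carry is known.
        candidates = (ripple_block(a_blk, b_blk, block, 0),
                      ripple_block(a_blk, b_blk, block, 1))
        # Select the candidate that matches the actual carry-in.
        s_blk, carry = candidates[carry]
        result |= s_blk << shift
    return result, carry
```

In hardware the two candidates of every block are evaluated concurrently, so only the selection step lies on the carry path; the loop here serializes that for clarity.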

Page 6: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/6

Carry Types: Carry look-ahead

• Carry-out can be “generated” at current position or carry-in “propagated”

• Delay: $O(1)$

• Area: $O(n^2)$

• High fan-in/fan-out

Page 7: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/7

Carry Types: Block carry look-ahead

• A block propagates a carry if all bits in the block propagate a carry

• A block generates a carry if a bit generates a carry and all succeeding bits propagate

• Delay: O(log n)

• Area: O(n log n)
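
The block generate and propagate rules on this slide can be sketched as a prefix operator on (generate, propagate) pairs. The Python below is a minimal illustration that assumes a carry-in of 0 and uses one particular log-depth arrangement, in which every position merges with the block a power-of-two distance below it; the slides do not say which tree arrangement is used. The names `combine` and `lookahead_carries` are hypothetical.

```python
def combine(hi, lo):
    """Merge (generate, propagate) of a high block with the block below it."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    # Block generates if the high part generates, or propagates a low generate.
    # Block propagates only if both parts propagate.
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def lookahead_carries(a, b, n):
    """Carry out of every bit position (carry-in 0) in about log2(n) merge levels."""
    gp = [(((a >> k) & (b >> k)) & 1, ((a >> k) | (b >> k)) & 1) for k in range(n)]
    span = 1
    while span < n:
        # Position k absorbs the block that ends `span` bits lower.
        gp = [combine(gp[k], gp[k - span]) if k >= span else gp[k]
              for k in range(n)]
        span *= 2
    return [g for g, _ in gp]
```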

Page 8: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/8

Block carry look-ahead trees

Page 9: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/9

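Carry vs. Pseudo-carry

$C_{out} = G_n + P_n \cdot G_{n-1} + \cdots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$

If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so

$C_{out} = P_n \cdot G_n + P_n \cdot G_{n-1} + \cdots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$

$C_{out} = P_n (G_n + G_{n-1} + \cdots + P_{n-1} \cdots P_0 \cdot C_{in})$

$C_{out} = P_n \cdot H_n$

$H_n = G_n + G_{n-1} + \cdots + P_{n-1} \cdots P_0 \cdot C_{in}$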
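
The factorization on this slide can be checked exhaustively for small widths. The sketch below assumes bit tuples ordered least-significant bit first and uses hypothetical function names; it relies on the observation that the pseudo-carry of the top bit equals its generate ORed with the carry into that bit.

```python
from itertools import product

def carry_out(a_bits, b_bits, cin):
    """Ordinary carry out of a bit string, rippled from the LSB."""
    c = cin
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a | b) & c)  # G + P*C with P = A OR B
    return c

def pseudo_carry(a_bits, b_bits, cin):
    """H for the top bit: its generate ORed with the carry into that bit."""
    carry_into_top = carry_out(a_bits[:-1], b_bits[:-1], cin)
    return (a_bits[-1] & b_bits[-1]) | carry_into_top

def pseudo_carry_identity_holds(n):
    """Check Cout == P_top & H_top for every n-bit operand pair and carry-in."""
    for a_bits in product((0, 1), repeat=n):
        for b_bits in product((0, 1), repeat=n):
            for cin in (0, 1):
                p_top = a_bits[-1] | b_bits[-1]
                if carry_out(a_bits, b_bits, cin) != p_top & pseudo_carry(a_bits, b_bits, cin):
                    return False
    return True
```

The identity depends on P being the inclusive OR of the operands; with an exclusive-OR propagate, $G = G \cdot P$ no longer holds.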

Page 10: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/10

Carry vs. Pseudo-carry

• Redundant terms create factorization opportunities

• Factorization moves terms from critical paths to non-critical paths

• Multiple paths can be parallelized

• Products with fewer terms lead to implementations with smaller, faster gates

Page 11: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/11

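Deriving Block Pseudo-carry from Block Carry Look-ahead Terms

Block Generate:

$G^{i \cdot j}_0 = G^i_j + P^i_j G^i_{j-i} + \cdots + P^i_j P^i_{j-i} P^i_{j-2i} \cdots G^i_0$

If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so

$G^{i \cdot j}_0 = P^i_j G^i_j + P^i_j G^i_{j-i} + \cdots + P^i_j P^i_{j-i} P^i_{j-2i} \cdots G^i_0$

$G^{i \cdot j}_0 = P^i_j \left( G^i_j + G^i_{j-i} + \cdots + P^i_{j-i} P^i_{j-2i} \cdots G^i_0 \right)$

$H^{i \cdot j}_0 = G^i_j + G^i_{j-i} + \cdots + P^i_{j-i} P^i_{j-2i} \cdots G^i_0$

• Pseudo-carries can be generated in blocks like carries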

Page 12: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/12

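Generalized Pseudocarry Equations

$H^{2}_{s} = G^{1}_{s+1} + G^{1}_{s}$

$H^{i+j}_{s} = H^{j}_{s+i} + I^{j}_{s+i-1} \cdot H^{i}_{s}$

$H^{i+j+k}_{s} = H^{k}_{s+i+j} + I^{k}_{s+i+j-1} \cdot H^{j}_{s+i} + I^{k}_{s+i+j-1} \cdot I^{j}_{s+i-1} \cdot H^{i}_{s}$

$I^{p+q}_{t} = I^{q}_{t+p} \cdot I^{p}_{t}$

$I^{p+q+r}_{t} = I^{r}_{t+q+p} \cdot I^{q}_{t+p} \cdot I^{p}_{t}$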
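
The two-block merge rule among the generalized equations can be spot-checked numerically. The sketch below rests on stated assumptions rather than on definitions given in the slides: a superscript is read as block width and a subscript as starting bit, a block pseudo-carry is taken to be the top bit's generate ORed with the ordinary block generate of the bits below it, and a block I term is taken to be the AND of the bit propagates (A OR B) over its span. All function names are hypothetical.

```python
import random

def bits(x, n):
    return [(x >> k) & 1 for k in range(n)]

def block_h_direct(g, p, w, s):
    """H of width w starting at bit s: g[top] OR block generate of bits s..top-1."""
    top = s + w - 1
    carry = 0
    for k in range(s, top):  # ripple over the lower bits with carry-in 0
        carry = g[k] | (p[k] & carry)
    return g[top] | carry

def block_i(p, w, t):
    """I of width w starting at bit t: AND of p[t..t+w-1]."""
    out = 1
    for k in range(t, t + w):
        out &= p[k]
    return out

def block_h_merged(g, p, i, j, s):
    """Right-hand side of the merge rule: H(j, s+i) + I(j, s+i-1) * H(i, s)."""
    return block_h_direct(g, p, j, s + i) | (
        block_i(p, j, s + i - 1) & block_h_direct(g, p, i, s))

def check_merge(n=8, trials=500, seed=0):
    """Compare direct and merged block pseudo-carries on random operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = bits(rng.getrandbits(n), n), bits(rng.getrandbits(n), n)
        g = [x & y for x, y in zip(a, b)]
        p = [x | y for x, y in zip(a, b)]
        for s in range(n):
            for i in range(1, n - s):
                for j in range(1, n - s - i + 1):
                    if block_h_direct(g, p, i + j, s) != block_h_merged(g, p, i, j, s):
                        return False
    return True
```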

Page 13: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/13

Generating Sums Using Pseudocarry

$S_n = A_n \oplus B_n \oplus C_{n-1}$

If $T_n = A_n \oplus B_n$ and $C_m = P_m \cdot H_m$, then

$S_n = T_n \oplus P_{n-1} \cdot H_{n-1}$

• Sum with pseudo-carry no more complex than sum with carry

• Other look-ahead features still apply, e.g. Han-Carlson “every other carry”
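
A bit-serial sketch of sum generation from pseudo-carries follows. It is illustrative only: the pseudo-carries are produced here by a simple ripple, not by a look-ahead tree, and the function name is hypothetical. Each sum bit is formed from the half-sum T and the product of the propagate and pseudo-carry of the bit below.

```python
def add_with_pseudo_carry(a, b, n, cin=0):
    """n-bit add in which each carry is formed as P_k & H_k."""
    total = 0
    carry_prev = cin  # carry into bit 0
    for k in range(n):
        a_k, b_k = (a >> k) & 1, (b >> k) & 1
        t_k = a_k ^ b_k            # half-sum T_k
        g_k = a_k & b_k            # generate
        p_k = a_k | b_k            # propagate (inclusive OR)
        total |= (t_k ^ carry_prev) << k  # S_k = T_k xor (P_{k-1} & H_{k-1})
        h_k = g_k | carry_prev     # pseudo-carry H_k = G_k + C_{k-1}
        carry_prev = p_k & h_k     # C_k = P_k & H_k
    return total, carry_prev

print(add_with_pseudo_carry(0b1011, 0b0110, 4))  # (1, 1), i.e. 11 + 6 = 17
```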

Page 14: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/14

Adder comparison

Bits   Ripple   CSel A   CSel B   CSel C   CLA   PCLA
32     32       12       12       9        6     5
64     64       20       16       12       7     6

Page 15: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/15

HBT Digital Circuits

• Exponential I/V relationship leads to high gain and fast switching

• Vertical arrangement allows critical dimensions to be smaller with tighter tolerances

• Traditionally high DC power consumption: compare increasing leakage and switching currents for FETs

Page 16: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/16

Current Steering Logic

• Constant current source equals combined emitter currents

• Ratio of current through each transistor is exp. function of base voltage

• Difference in currents at collector converted to difference in voltage on pull-up resistors.
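
The current-steering behavior described on this slide can be illustrated with the ideal matched differential pair, in which the ratio of the two collector currents is exp(ΔV/V_T). The sketch below assumes ideal transistors (base current neglected), a thermal voltage of about 25.85 mV near room temperature, and hypothetical function names; any component values passed in are the caller's, not the design's.

```python
import math

def steer_currents(v_diff, i_tail, v_t=0.02585):
    """Split i_tail between two matched transistors for base-voltage difference v_diff."""
    i1 = i_tail / (1.0 + math.exp(-v_diff / v_t))
    return i1, i_tail - i1  # the two currents always sum to the tail current

def output_swing(v_diff, i_tail, r_pullup, v_t=0.02585):
    """Differential output voltage developed across equal pull-up resistors."""
    i1, i2 = steer_currents(v_diff, i_tail, v_t)
    return r_pullup * (i1 - i2)
```

Because the split is exponential in the input difference, an input swing of a few multiples of V_T is enough to steer nearly all of the tail current to one side.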

Page 17: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/17

Single-ended vs. Double-ended

Single-ended:

• Limited to simple functions

• Large fan-in

Double-ended:

• Any function of inputs

• Fan-in limited by supply voltage

Page 18: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/18

Look-ahead gate w/ fully differential logic

[Figure: look-ahead gate schematic with differential inputs Hn, In, Hn-1, In-1, and Hn-2]

Page 19: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/19

Mixed input look-ahead gates

[Figure: gate schematic with inputs Hn, Hn-1, In and reference Vr]

• $I_n (H_n + H_{n-1}) + \overline{I_n} \cdot H_n$

• $H_n + I_n \cdot H_{n-1}$

• Two series-gated levels for three inputs

Page 20: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/20

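Mixed input look-ahead gates

[Figure: gate schematic with inputs Hn, Hn-1, Hn-2, In, and In-1]

• $I_n I_{n-1} (H_n + H_{n-1} + H_{n-2}) + I_n \overline{I_{n-1}} (H_n + H_{n-1}) + \overline{I_n} \cdot I_{n-1} \cdot H_n$

• $H_n + I_n \cdot H_{n-1} + I_n \cdot I_{n-1} \cdot H_{n-2}$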

• Three series-gated levels for five inputs

Page 21: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/21

Pseudocarry Blocks

[Figure: pseudo-carry block tree. A row of $H^{2}_{s}$ blocks feeds a row of $H^{6}_{s}$ blocks, which feed $H^{18}_{s}$ and $H^{14}_{s}$ blocks, which feed the final $H^{32}_{s}$ blocks.]

Page 22: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/22

Pseudocarry Tree Oscillator

[Figure: block diagram of the oscillator; signals A, B, Cin, Select, and Cout]

Page 23: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/23

Carry Tree High-speed Output

2 x 165 ps

Page 24: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/24

Breakdown of measured delay

Devices: 71%

Wire C: 12%

Temperature: 6%

Resistor model: 11%

Total measured delay = 165 ps

Page 25: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/25

Loaded vs. unloaded toggling

• At design time, fT peak at 1.2 mA/µm² but limit at 2 mA/µm²

• For some devices, max. frequency when driving load can occur above fT peak current

• Models supported this, no reason at time to not believe them

• However, models are never qualified above fT peak current!

Page 26: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/26

Loaded vs. unloaded toggling

[Plot: buffer delay vs. tail current]

Page 27: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/27

Resistor Model Effects

           9805A (Simulated)   99B (Fabricated)
Pull-up    444                 528
Tail       1000                1091

Page 28: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/28

Model parameter variation

[Chart: parameter value (ohms) for RB, RE, and RC across design kits 9708A, 9802, 9805, and 1999B v2.3; DARPA02 design and DARPA02 fabrication are marked]

Page 29: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/29

Cadence internal parasitic methods

• Approximates all capacitance as polynomial function of distance between conductors

• Cannot extract RC and capacitance between conductors at the same time: killer for differential wiring!

• Convenient, but window of usability small and shrinking

Page 30: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/30

QuickCap capacitance extraction

• Field solving with floating random walk method

• Accuracy almost wholly a function of run time: 4x run time gives ½ error
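
The "4x run time, ½ error" rule on this slide matches the usual Monte Carlo scaling, in which statistical error falls as one over the square root of the number of independent samples. The sketch below assumes that scaling and assumes the sample count is proportional to run time; the function names are hypothetical and nothing here calls QuickCap itself.

```python
import math

def error_scale(runtime_factor):
    """Factor by which statistical error changes when run time is multiplied."""
    # Assumes error ~ 1/sqrt(samples) and samples ~ run time.
    return 1.0 / math.sqrt(runtime_factor)

def runtime_factor_for(target_error_scale):
    """Run-time multiplier needed to scale the error by the given factor."""
    return 1.0 / target_error_scale ** 2

print(error_scale(4))  # 0.5
```

Under the same assumptions, independent walks split across N processors cut wall-clock time by about N for a fixed error target.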

• Random walks independent, near perfect parallelization

Page 31: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/31

Comparing parasitic extraction

[Plot: delay (ps) vs. length (um) for Qcap RC, RCNET, PCAP, and Calc RC]

Page 32: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/32

Cadence/QuickCap Design Flow

• Extract physical data from layout

• Compute RC with QuickCap

• Extract netlist from schematic

• Combine to simulate with Spectre

Page 33: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/33

Partial manual extraction with QuickCap

• Identify main wires of oscillation paths: approx. dozen pairs

• QuickCap extraction for each wire-ground cap. and cap. between pair

• Add RC-ladder for each pair by hand to schematic and simulate
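
To give a feel for what a hand-added RC ladder contributes, the sketch below computes the Elmore delay at the far end of a uniform ladder. It is a first-order estimate under stated assumptions: the wire is split into equal sections, `c_total` lumps whatever wire capacitance the designer assigns to ground, and all names and values are hypothetical, not taken from the design.

```python
def elmore_delay(r_total, c_total, segments, c_load=0.0):
    """Elmore delay (seconds) at the far end of a uniform RC ladder."""
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    for k in range(1, segments + 1):
        # Node k has k series resistors between it and the driver.
        c_node = c_seg + (c_load if k == segments else 0.0)
        delay += k * r_seg * c_node
    return delay
```

With one section the result is simply R times C; as the number of sections grows, the unloaded value approaches half of that, which is why a single lumped RC overestimates distributed wire delay.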

Page 34: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/34

Simulation with Parasitic Extraction

Feedback path   w/o parasitics (ps)   QuickCap parasitic cap. (ps)   COEFGEN parasitic cap. (ps)   Raphael parasitic cap. (ps)   QuickCap parasitic RC (ps)
Cin             100                   121                            128                           131                           135
A1              103                   123                            130                           129                           137
A31             108                   127                            129                           132                           141

Page 35: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/35

Pseudo-carry Tree configured as Ring Oscillator

[Figure: block diagram of the ring oscillator; signals A, B, Cin, Sel0, Sel1, and Cout, with constant inputs 00...00 and 11...11]

Page 36: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/36

SMI00 Test Structure Layout

Page 37: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/37

SMI00 Test Structure

Page 38: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/38

Carry Tree High-speed Outputs

16 x 146 ps

Page 39: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/39

Comparisons of published adders

Reference   Type    Size       Gate Del.   Time
ZIMM96      Carry   32         5           -
STEL96      Adder   64(32)     12.5(12?)   -
WANG97      Adder   32         3           2.7 ns
CHAN98      Adder   64(32)     27(19.5)    -
SILB98      Fixed   64         -           550 ps
AIPP99      Adder   64         -           660 ps
SAGE01      Adder   32[16x2]   -           <500 ps
MATH01      Adder   64         -           482 ps
STAS01      Adder   64         -           440 ps
LEE02       Adder   64                     900 ps
VANA02      ALU     32         8           <200 ps

Page 40: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/40

Cascode Output Stage

• Eliminates Miller capacitance between input and output

• Reduces Cjc and Cjs on outputs

• Shortens rise time, but increases delay

Page 41: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/41

Dotted Emitter/Collector

Page 42: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/42

“Wide/Short” gate with dotted emitter/collector

Page 43: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/43

“Wide/Short” gate with dotted emitter/collector

• Shorter trees lead to lower supply voltages

• Wider trees reduce ratio of emitter-followers to terms computed, lowering total current

• More inputs per look-ahead gate means fewer look-ahead levels

• Elimination of single-ended inputs on critical H signals allows faster switching with reduced swing

Page 44: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/44

Even wider look-ahead gate

Width limited by

• Accumulated Cjc and Cjs of dotted-and node

• Saturation vs. breakdown

• Fan-out loading from inputs and interconnect

Page 45: Matthew W. Ernest Electrical, Computer and Systems Engineering Dept

mwe/PHD/45

Conclusions

• 32-bit addition depth reduced to 5 gates fabricated. 4 and 3 gate depth circuits designed.

• Gate to compute 3-way look-ahead fabricated. Up to 8-way look-ahead designed.

• Carry delay for 32-bit addition measured at 146 ps.

• QuickCap technology file for 5HP brings simulated results within 11% of measured.