
Digital Integrated Circuits 2e: Chapter 11.1-11.3 Copyright 2002 Prentice Hall PTR, Adapted by Yunsi Fei

ECE 300 Advanced VLSI Design, Fall 2006

Lecture 17: Datapath Design & Adders

Yunsi Fei

[Adapted from Jan Rabaey et al.’s Digital Integrated Circuits ©2002, PSU Irwin & Vijay ©2002, and Princeton Wayne Wolf’s Modern VLSI Design ©2002]


Major Components of a Computer

[Figure: block diagram of a computer: Processor (Control, Datapath), Memory, Devices (Input, Output)]

Modern processor architecture styles
– Pipelined, single issue (e.g., ARM)
– Pipelined, hardware controlled multiple issue – superscalar
– Pipelined, software controlled multiple issue – VLIW
– Pipelined, multiple issue from multiple process threads – multithreaded


Basic Building Blocks

Datapath
– Execution units
  » Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders

Control
– Finite state machines (PLA, ROM, random logic)

Interconnect
– Switches, arbiters, buses

Memory
– Caches, TLBs, DRAM, buffers


MIPS 5-Stage Pipelined (Single Issue) Datapath

[Figure: five-stage pipelined datapath with stages Fetch, Decode, Execute, Memory, WriteBack. Components shown: PC, I$ (Read Address), two Add units (one with the constant 4), Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (16 to 32), Shift left 2, ALU, D$ (Address, Write Data, Read Data), and 0/1 multiplexers. Pipeline stage isolation registers: IF/Dec, Dec/Exec, Exec/Mem, Mem/WB. Control signals: clk, Icache precharge, Dcache precharge, RegWrite.]


Datapath Bit-Sliced Organization

Tile identical bit-slice elements

[Figure: bit-sliced datapath layout with slices Bit 0 to Bit 3. Control flow and data flow run along different axes. Blocks along the data flow: Register File, Pipeline Registers, Multiplexers, Adder, Shifter. Data arrives from I$ and goes to/from D$.]


Adders

– Carry-ripple
– Manchester carry chain
– Carry skip
– Carry select
– Carry look ahead


The 1-bit Binary Adder

1-bit Full Adder (FA): inputs A, B, Cin; outputs S, Cout

S = A ⊕ B ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function)

How can we use it to build a 64-bit adder?
How can we modify it easily to build an adder/subtractor?
How can we make it better (faster, lower power, smaller)?

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

G = A&B
P = A ⊕ B
K = !A & !B

S = P ⊕ Cin
Cout = G | P&Cin
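A minimal Python sketch of the full-adder equations above, for checking them against the truth table. The function names `full_adder` and `carry_status` are illustrative and do not come from the slides.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (s, cout) for one bit, computed from generate/propagate."""
    g = a & b              # generate: carry out is 1 regardless of cin
    p = a ^ b              # propagate: carry out equals cin
    s = p ^ cin
    cout = g | (p & cin)
    return s, cout

def carry_status(a, b):
    """Classify an (A, B) input pair as in the truth table."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"

# Walk the whole truth table and cross-check against the majority function.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert cout == (a & b) | (a & cin) | (b & cin)
    assert s == (a + b + cin) % 2
    print(a, b, cin, cout, s, carry_status(a, b))
```

The printed rows should match the table row for row, including the carry status column.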


Delay Balanced FA

Identical Delays for Carry and Sum

[Figure: full adder schematic in three parts: signal set-up (forms P and !P from A, B, !B), carry generation (produces !Cout from A, Cin, P, !P), and sum generation (produces S from Cin, P, !P)]

20+2 transistors


A 64-bit Adder/Subtractor

[Figure: chain of 64 1-bit FAs with inputs A0..A63 and B0..B63, sums S0..S63, carries C0 = Cin, C1, C2, C3, ..., C63, C64 = Cout, and an add/subt control line]

Ripple Carry Adder (RCA) built out of 64 FAs

Subtraction – complement all subtrahend bits (XOR gates) and set the low-order carry-in

RCA
– advantage: simple logic, so small (low cost)
– disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
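The adder/subtractor described above can be sketched bit by bit in Python. This is a behavioral model under stated assumptions, not a circuit description: the name `ripple_add_sub` and the `subtract` flag (standing in for the add/subt control line) are illustrative.

```python
def ripple_add_sub(a, b, subtract=False, n=64):
    """N-bit ripple-carry adder/subtractor; returns (result, carry_out)."""
    ctrl = 1 if subtract else 0
    carry = ctrl                          # add/subt also sets the low-order carry-in
    result = 0
    for i in range(n):                    # one full adder per bit, LSB first
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl        # XOR gate complements B when subtracting
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry

print(ripple_add_sub(5, 3))                        # addition
print(ripple_add_sub(5, 3, subtract=True))         # 5 - 3
print(ripple_add_sub(3, 5, subtract=True, n=8))    # 3 - 5 wraps modulo 2**8
```

The loop makes the O(N) delay visible: bit i cannot finish until the carry from bit i-1 is known.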


Ripple Carry Adder (RCA)

[Figure: four FAs in a chain with inputs A0 B0 to A3 B3, sums S0 to S3, C0 = Cin entering the first FA and Cout = C4 leaving the last]

T = O(N) worst case delay

Tadder ≈ TFA(A,B→Cout) + (N-2)·TFA(Cin→Cout) + TFA(Cin→S)

Real Goal: Make the fastest possible carry path


Inversion Property

[Figure: two FA blocks, each with inputs A, B, Cin and outputs S, Cout; the second has all inputs and outputs inverted]

!S(A, B, Cin) = S(!A, !B, !Cin)

!Cout(A, B, Cin) = Cout(!A, !B, !Cin)

Inverting all inputs to a FA results in inverted values for all outputs
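The inversion property is small enough to verify exhaustively. A short Python check, with illustrative function names:

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (s, cout) for one bit."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def inversion_property_holds():
    """True if inverting every input inverts both outputs, for all 8 input rows."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if s_inv != 1 - s or cout_inv != 1 - cout:
            return False
    return True

print(inversion_property_holds())
```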


Exploiting the Inversion Property

[Figure: four-bit ripple chain of FA’ cells with inputs A0 B0 to A3 B3, sums S0 to S3, C0 = Cin and Cout = C4; regular and inverted cells alternate]

Now need two “flavors” of FAs: regular cell and inverted cell

Minimizes the critical path (the carry chain) by eliminating inverters between the FAs


Fast Carry Chain Design

The key to fast addition is a low latency carry network

What matters is whether in a given position a carry is
– generated: Gi = Ai & Bi = AiBi
– propagated: Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
– annihilated (killed): Ki = !Ai & !Bi

Giving a carry recurrence of Ci+1 = Gi | PiCi

C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0C0
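To see that the expanded expression for C4 agrees with the recurrence, the sketch below evaluates both for every 4-bit input pair. The function names are illustrative; lists are indexed with bit 0 first.

```python
from itertools import product

def carries_recurrence(g, p, c0):
    """Apply Ci+1 = Gi | Pi&Ci bit by bit; returns [C0, C1, ..., CN]."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c

def c4_expanded(g, p, c0):
    """Fully expanded two-level expression for C4."""
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c0))

def expansion_matches_recurrence():
    """Compare both forms over all A, B (4 bits each) and C0."""
    for bits in product((0, 1), repeat=9):
        a, b, c0 = bits[:4], bits[4:8], bits[8]
        g = [x & y for x, y in zip(a, b)]
        p = [x ^ y for x, y in zip(a, b)]
        if carries_recurrence(g, p, c0)[4] != c4_expanded(g, p, c0):
            return False
    return True

print(expansion_matches_recurrence())
```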


Manchester Carry Chain

Switches controlled by Gi and Pi

Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the worst case

[Figure: one Manchester carry chain stage between !Ci and !Ci+1, with switches driven by Gi and Pi, clocked by clk]


4-bit Sliced MCC Adder

[Figure: four G/P bit slices with inputs A0 B0 to A3 B3 and sums S0 to S3, connected by a clocked carry chain from !C0 through !C1, !C2, !C3 to !C4]


Domino Manchester Carry Chain Circuit

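[Figure: four-bit domino Manchester carry chain from Ci,0 to Ci,4, clocked by clk, with pass transistors P0 to P3 and pull-down transistors G0 to G3; numbers on the schematic mark nodes and transistor sizes]

Chain node values after evaluation:

!(G0 | P0Ci,0)
!(G1 | P1G0 | P1P0Ci,0)
!(G2 | P2G1 | P2P1G0 | P2P1P0Ci,0)
!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0Ci,0)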


Binary Adder Landscape

synchronous word parallel adders
– ripple carry adders (RCA)
– carry prop min adders
  » signed-digit adders
  » fast carry prop adders: Manchester carry chain, carry select, parallel prefix, conditional sum, carry skip
  » residue adders

Complexity annotations in the figure (T = time, A = area):
T = O(N), A = O(N)
T = O(1), A = O(N)
T = O(log N), A = O(N log N)
T = O(N), A = O(N)
T = O(N), A = O(N)


Carry-Skip (Carry-Bypass) Adder

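If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the block itself kills or generates the carry internally

[Figure: four FAs with inputs A0 B0 to A3 B3 and sums S0 to S3, carry in Ci,0; a multiplexer controlled by BP selects between Ci,0 and the rippled carry to form Co,3]

BP = P0 & P1 & P2 & P3 (“Block Propagate”)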


Carry-Skip Chain Implementation

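[Figure: carry chain with Cin, pass transistors P0 to P3, and pull-downs G0 to G3 producing !Cout; the block propagate signal BP selects between the block carry-in and the block carry-out to form the carry-out]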


4-bit Block Carry-Skip Adder

Worst-case delay, carry from bit 0 to bit 15 = carry generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), ripples in the last group from bit 12 to bit 15

[Figure: four 4-bit groups (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, Carry Propagation, and Sum stages; Ci,0 enters the first group]

Tadd = tsetup + B·tcarry + ((N/B) - 1)·tskip + B·tcarry + tsum


Optimal Block Size and Time

Assuming one stage of ripple (tcarry) has the same delay as one skip logic stage (tskip) and both are 1

TCSkA = 1 + B + (N/B - 1) + B + 1
      = 2B + N/B + 1

where the five terms are, in order: tsetup, ripple in block 0, skips, ripple in last block, tsum

So the optimal block size, B, is

dTCSkA/dB = 0  ⇒  Bopt = sqrt(N/2)

And the optimal time is

Optimal TCSkA = 2·sqrt(2N) + 1
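A quick numeric check of the optimum, using the same unit-delay assumption (tcarry = tskip = tsetup = tsum = 1). The function names are illustrative, and N = 32 is chosen only because it gives an integer block size.

```python
import math

def t_cska(n, b):
    """Unit-delay estimate: setup + ripple in block 0 + skips + ripple in last block + sum."""
    return 1 + b + (n / b - 1) + b + 1

def optimal_block(n):
    """Closed-form optimum from setting dT/dB = 2 - N/B**2 to zero."""
    b_opt = math.sqrt(n / 2)
    t_opt = 2 * math.sqrt(2 * n) + 1
    return b_opt, t_opt

print(optimal_block(32))                   # (4.0, 17.0)
print(t_cska(32, 4))                       # 17.0, same as the closed form
print(min(range(1, 33), key=lambda b: t_cska(32, b)))   # 4
```

For widths where sqrt(N/2) is not an integer, the nearest integer block sizes would have to be compared directly with `t_cska`.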


Carry-Skip Adder Extensions

Variable block sizes
– A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay

[Figure: carry-skip chain from Cin to Cout with variable block sizes]

Multiple levels of skip logic
– skip level 1
– skip level 2: AND of the first level skip signals (BP’s)

[Figure: carry-skip chain from Cin to Cout with two levels of skip logic]


Carry-Skip Adder Comparisons

[Plot: RCA, CSkA, and VSkA compared at adder widths of 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70; CSkA points labeled with block sizes B=2, B=3, B=4, B=5, B=6]


Parallel Prefix Adders (PPAs)

Define carry operator € on (G,P) signal pairs

(G’’,P’’) € (G’,P’) = (G,P), where G = G’’ | P’’G’ and P = P’’P’

– € is associative, i.e.,

[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]

[Figure: € cell symbol, and a gate-level cell with inputs G’, G’’, P’’ and output !G]


PPA General Structure

Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order, but note that it is not commutative

Measures to consider
– number of € cells
– tree cell depth (time)
– tree cell area
– cell fan-in and fan-out
– max wiring length
– wiring congestion
– delay path variation (glitching)

[Figure: three-layer adder structure: Pi, Gi logic (1 unit delay); Ci parallel prefix logic tree (1 unit delay per level); Si logic (1 unit delay)]
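The carry operator and its two algebraic properties can be checked by brute force. In this Python sketch the operator is called `op`, and its first argument is taken to be the more significant (G’’, P’’) pair; that argument order, the function names, and the trick of feeding Cin in as a (Cin, 0) pair below bit 0 are assumptions of the sketch.

```python
from itertools import product

def op(hi, lo):
    """Carry operator: (G'', P'') combined with (G', P') gives (G'' | P''&G', P''&P')."""
    g2, p2 = hi
    g1, p1 = lo
    return (g2 | (p2 & g1), p2 & p1)

PAIRS = list(product((0, 1), repeat=2))

def is_associative():
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x in PAIRS for y in PAIRS for z in PAIRS)

def is_commutative():
    return all(op(x, y) == op(y, x) for x in PAIRS for y in PAIRS)

def prefix_carries(g, p, cin=0):
    """Serial prefix scan, bit 0 first; returns [C1, ..., CN]."""
    acc = (cin, 0)                 # carry-in acts like a generate term below bit 0
    carries = []
    for gi, pi in zip(g, p):
        acc = op((gi, pi), acc)    # extend the prefix by one more significant bit
        carries.append(acc[0])
    return carries

print(is_associative(), is_commutative())
print(prefix_carries([0, 1, 0, 0], [1, 0, 1, 1]))   # G, P for 11 + 6, bit 0 first
```

Associativity is what lets a tree regroup the same scan into log-depth levels; the serial loop here is only the reference ordering.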


Brent-Kung PPA

[Figure: 16-bit Brent-Kung parallel prefix computation. Inputs (G0,P0) to (G15,P15) and Cin feed a tree of € cells that produces carries C1 to C16. Annotations on the figure: T = log2N, T = log2N - 2, A = 2log2N, A = N/2.]


Kogge-Stone PPF Adder

[Figure: 16-bit Kogge-Stone parallel prefix computation. Inputs (G0,P0) to (G15,P15) and Cin feed levels of € cells that produce carries C1 to C16. Annotations on the figure: T = log2N, A = log2N, A = N.]

Tadd = tsetup + log2N·t€ + tsum
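A behavioral Python sketch of a Kogge-Stone style prefix network, to make the log2N level count concrete. It assumes Cin is folded into bit 0's generate term, which is one common way to handle the carry-in and may differ from the wiring in the figure. The name `kogge_stone_add` and the returned level count are illustrative.

```python
def kogge_stone_add(a, b, n=16, cin=0):
    """Add two n-bit integers; returns (sum, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]        # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]      # bit propagate
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)       # assumption: fold Cin into bit 0
    dist, levels = 1, 0
    while dist < n:
        # One prefix level: every position i >= dist combines with position i - dist.
        new_g, new_p = G[:], P[:]
        for i in range(dist, n):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        levels += 1
    carries = [cin] + G              # carries[i] is the carry into bit i
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i                  # Si = Pi xor Ci
    return total, carries[n], levels

print(kogge_stone_add(0xFFFF, 1))     # wraps to 0 with carry out
print(kogge_stone_add(1234, 4321))
```

For n = 16 the loop runs four times, matching the log2N term in Tadd; all cells within one level are independent and would evaluate in parallel in hardware.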


More Adder Comparisons

[Plot: RCA, CSkA, VSkA, and KS PPA compared at adder widths of 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70]


Topics

Adders and ALUs (§6.4, §6.5)
– Carry-ripple
– Carry look ahead
– Manchester carry chain
– Carry skip
– Carry select

Multipliers (§6.6)

Subsystem design principles (§6.2)


Adders

1-bit full adder
– Si = ai ⊕ bi ⊕ ci
– ci+1 = aibi + aici + bici

Carry-ripple adder
– n-bit adder built from full adders

Adder delay is dominated by carry chain
– Naming: Carry- … adder


1-bit Full Adder: the Mirror Adder

[Figure: mirror adder transistor schematic with inputs A, B, Ci, supply VDD, and outputs S and Co]

24 transistors


Carry-lookahead Adder

First compute carry propagate, generate:
– Pi = ai + bi
– Gi = aibi

Compute sum and carry from P and G:
– Si = ci ⊕ Pi ⊕ Gi = ai ⊕ bi ⊕ ci
– ci+1 = Gi + Pici
       = Gi + PiGi-1 + PiPi-1Gi-2 + … + Pi…Pjcj


Depth-4 Carry-lookahead

C1= G0 + P0Cin

C2= G1 + P1 G0 + P1P0Cin

C3= G2 + P2G1+P2P1 G0 + P2P1P0Cin

C4 = G3 + P3G2 + P3P2G1+ P3P2P1 G0 + P3P2P1P0Cin


Analysis

Deepest carry expansion requires gates with large fanin: large, slow
– Generally use 4-bit groups
– Domino logic implementation

Carry look ahead tree
– C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Cin
  » G* = G3 + P3G2 + P3P2G1 + P3P2P1G0
  » P* = P3P2P1P0
  » C4 = G* + P*Cin
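The group signals G* and P* can be exercised in a small 16-bit model built from four 4-bit groups. This is a sketch under stated assumptions: it uses Pi = ai + bi (OR) and Si = ci ⊕ Pi ⊕ Gi as in the carry-lookahead definitions, the function names are illustrative, and the group carries are chained in a plain loop where hardware would use a second lookahead level.

```python
def group_gp(g, p):
    """Group generate G* and propagate P* for a 4-bit block (index 0 = LSB)."""
    g_star = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    p_star = p[3] & p[2] & p[1] & p[0]
    return g_star, p_star

def add16_grouped(a, b, cin=0):
    """16-bit add using 4-bit groups; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]       # Gi = ai bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(16)]     # Pi = ai + bi (OR)
    total, c_group = 0, cin
    for base in range(0, 16, 4):
        gs, ps = g[base:base + 4], p[base:base + 4]
        c = c_group
        for i in range(4):                                 # sums inside the group
            total |= (c ^ ps[i] ^ gs[i]) << (base + i)     # Si = ci xor Pi xor Gi
            c = gs[i] | (ps[i] & c)
        g_star, p_star = group_gp(gs, ps)
        c_group = g_star | (p_star & c_group)              # group carry: G* + P*Cin
    return total, c_group

print(add16_grouped(0xFFFF, 1))
print(add16_grouped(1234, 4321))
```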


Manchester Carry Chain Circuit

[Figure: two Manchester carry chain stages, stage i-1 (controlled by Gi-1, Pi-1) and stage i (controlled by Gi, Pi), linking Ci-1, Ci, and Ci+1]


Manchester Carry Chain

Precharged/evaluate carry chain

Principles
– If Gi = aibi = 1, Pi = ai ⊕ bi = 0, then Ci+1 = 1
– If Gi = aibi = 0, Pi = ai ⊕ bi = 0, then Ci+1 = 0
– If Gi = aibi = 0, Pi = ai ⊕ bi = 1, then Ci+1 = Ci

Worst-case discharge path goes through entire carry chain.


Carry-skip Adder

For an m-bit stage, Cout can be
– Inherited from Cin
  » ai ⊕ bi = 1 (i.e., ai ≠ bi) for every bit in the stage
– Generated locally within the m bits
  » i.e., the Cout when Cin = 0

Optimum group size: m = sqrt(n/2)

Longest path:
– Similar to Manchester chain


Two-bit Carry-skip Structure

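[Figure: two-bit carry-skip stage with inputs ai, bi, ai+1, bi+1 and Ci, producing Ci+2; the locally generated carry is ai+1bi+1 + (ai+1 + bi+1)aibi; the selection can also be done using a mux]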


Carry-skip Group Structure

[Figure: two M-bit FA groups connected by carry-skip logic]


Carry-select Adder

Computes two results in parallel, each for different carry input assumptions.

Uses actual carry in to select correct result.

Reduces delay to multiplexer.
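A behavioral Python sketch of the carry-select idea. The block width of 4, the function names, and the use of integer addition to stand in for each block's adder are assumptions made for illustration; the sketch also assumes the word width is a multiple of the block width.

```python
def block_add(a, b, cin, width):
    """Stand-in for a small block adder; returns (sum, carry_out) of the low `width` bits."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + cin
    return total & mask, total >> width

def carry_select_add(a, b, n=16, block=4, cin=0):
    """n-bit carry-select add; returns (sum, carry_out)."""
    result, carry = 0, cin
    for base in range(0, n, block):
        a_blk, b_blk = a >> base, b >> base
        # Both candidates are computed without waiting for the incoming carry.
        sum0, cout0 = block_add(a_blk, b_blk, 0, block)
        sum1, cout1 = block_add(a_blk, b_blk, 1, block)
        # The actual carry only drives the multiplexer.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << base
    return result, carry

print(carry_select_add(0xFFFF, 1))
print(carry_select_add(1234, 4321))
```

The cost of the speedup is visible in the loop body: every block's adder is duplicated.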


Carry-select Structure


DEC “alpha” 21064 Adder

64-bit adder, 0.75 µm technology, 5 ns delay

– On the 8-bit level: Manchester chain
– On the 32-bit sub-block: Carry look ahead
– On the 64-bit block: Carry select


Serial Adder

May be used in signal-processing arithmetic where fast computation is important but latency is unimportant.

Data format (LSB first): bit 0, bit 1, bit 2, bit 3, ...


Serial adder Structure

LSB control signal clears the carry shift register:
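A behavioral sketch of a bit-serial adder with an LSB control stream, written in Python. The stream representation (lists of bits, LSB first, with a parallel list of LSB flags) and the name `serial_adder` are assumptions made for illustration.

```python
def serial_adder(a_stream, b_stream, lsb_stream):
    """Add two LSB-first bit streams one bit per step; returns the sum bit stream."""
    carry = 0
    out = []
    for ai, bi, lsb in zip(a_stream, b_stream, lsb_stream):
        if lsb:
            carry = 0                     # LSB control clears the stored carry
        out.append(ai ^ bi ^ carry)       # one full adder, reused every cycle
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return out

# Two 4-bit words back to back: 11 + 6, then 3 + 1 (bit 0 first in each word).
a   = [1, 1, 0, 1,  1, 1, 0, 0]
b   = [0, 1, 1, 0,  1, 0, 0, 0]
lsb = [1, 0, 0, 0,  1, 0, 0, 0]
print(serial_adder(a, b, lsb))
```

The first word overflows four bits, so its carry out is discarded when the LSB flag of the second word clears the register.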


Subtraction

a – b = a + (-b)

For an n-bit number b, how do we get its complement?
– (-b) = !b + 1, where !b inverts every bit of b
– a + (-b) = a + !b + 1
  » Using “1” as the carry-in to avoid two additions
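A word-level Python check that folding the "+ 1" into the carry-in gives the same result as negating first and adding second. The 8-bit width and the function names are illustrative.

```python
def negate(b, n=8):
    """Two's complement of an n-bit value: invert every bit, then add 1."""
    mask = (1 << n) - 1
    return ((~b & mask) + 1) & mask

def subtract_two_adds(a, b, n=8):
    """a - b as two additions: form -b first, then add it to a."""
    mask = (1 << n) - 1
    return (a + negate(b, n)) & mask

def subtract_one_add(a, b, n=8):
    """a - b as one addition: a + !b with the carry-in set to 1."""
    mask = (1 << n) - 1
    carry_in = 1
    return (a + (~b & mask) + carry_in) & mask

print(subtract_one_add(9, 4))      # 5
print(subtract_one_add(4, 9))      # 251, the 8-bit pattern for -5
```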


ALUs

ALU computes a variety of logical and arithmetic functions based on opcode.
– Shift
  » Arithmetic/logical shift left, shift right
– Logic operations
  » AND, OR, NOT, …
– Add/subtract
  » Signed/unsigned, …


Opcodes

The control bits that determine the datapath
– Whether it is a shift, add, subtract …

Must be carefully designed to ease decoding
– Use decoder/de-multiplexer to select the correct datapath


An ALU Adder Structure