44
CMOS VLSI For Computer Engineering Lecture 9: SRAM Parts from http://www3.hmc.edu/~harris/c mosvlsi/4e/index.html

Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

  • View
    225

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

Lecture 9: SRAMParts from

http://www3.hmc.edu/~harris/cmosvlsi/4e/index.html

Page 2: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 2

Outline

• Memory Arrays

• SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports

• Serial Access Memories

Page 3: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 3

Memory Arrays

Memory Arrays

Random Access Memory Serial Access Memory Content Addressable Memory(CAM)

Read/Write Memory(RAM)

(Volatile)

Read Only Memory(ROM)

(Nonvolatile)

Static RAM(SRAM)

Dynamic RAM(DRAM)

Shift Registers Queues

First InFirst Out(FIFO)

Last InFirst Out(LIFO)

Serial InParallel Out

(SIPO)

Parallel InSerial Out

(PISO)

Mask ROM ProgrammableROM

(PROM)

ErasableProgrammable

ROM(EPROM)

ElectricallyErasable

ProgrammableROM

(EEPROM)

Flash ROM

Page 4: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 4

Array Architecture

• 2n words of 2m bits each

• If n >> m, fold by 2k into fewer rows of more columns

• Good regularity – easy to design

• Very high density if good cells are used

Page 5: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 5

12T SRAM Cell

• Basic building block: SRAM Cell Holds one bit of information, like a latch Must be read and written

• 12-transistor (12T) SRAM cell Use a simple latch connected to bitline 46 x 75 l unit cell

bit

write

write_b

read

read_b

Page 6: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 6

6T SRAM Cell

• Cell size accounts for most of array size Reduce cell size at expense of complexity

• 6T SRAM Cell Used in most commercial chips Data stored in cross-coupled inverters

• Read: Precharge bit, bit_b Raise wordline

• Write: Drive data onto bit, bit_b Raise wordline

bit bit_b

word

Page 7: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 7

SRAM Read

• Precharge both bitlines high

• Then turn on wordline

• One of the two bitlines will be pulled down by the cell

• Ex: A = 0, A_b = 1 bit discharges, bit_b stays high But A bumps up slightly

• Read stability A must not flip N1 >> N2

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600

time (ps)

word bit

A

A_b bit_b

Page 8: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 8

SRAM Write

• Drive one bitline high, the other low

• Then turn on wordline

• Bitlines overpower cell with new value

• Ex: A = 0, A_b = 1, bit = 1, bit_b = 0 Force A_b low, then A rises high

• Writability Must overpower feedback inverter N2 >> P1

time (ps)

word

A

A_b

bit_b

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600 700

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

Page 9: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 9

SRAM Sizing

• High bitlines must not overpower inverters during reads

• But low bitlines must write new value into cellbit bit_b

med

A

weak

strong

med

A_b

word

Page 10: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 10

SRAM Column Example

Read Write

H H

SRAM Cell

word_q1

bit_v1f

bit_b_v1f

out_v1rout_b_v1r

1

2

word_q1

bit_v1f

out_v1r

2

MoreCells

Bitline Conditioning

2

MoreCells

SRAM Cell

word_q1

bit_v1f

bit_b_v1f

data_s1

write_q1

Bitline Conditioning

Page 11: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

WL

BL

VDD

M 5

M 6

M 4

M1VDDVDD VDD

BL

Q = 1Q = 0

Cbit Cbit

CMOS SRAM Analysis (Read)

Page 12: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

0

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.2 1.5 2

Cell Ratio (CR)

2.5 3

Vol

tage

Ris

e (V

)CMOS SRAM Analysis (Read)

Page 13: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

BL = 1 BL = 0

Q = 0Q = 1

M1

M4

M5

M6

VDD

VDD

WL

CMOS SRAM Analysis (Write)

Page 14: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

CMOS SRAM Analysis (Write)

Page 15: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 15

SRAM Layout

• Cell size is critical: 26 x 45 l (even smaller in industry)

• Tile cells sharing VDD, GND, bitline contacts

VDD

GND GNDBIT BIT_B

WORD

Cell boundary

Page 16: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 16

Thin Cell

• In nanometer CMOS Avoid bends in polysilicon and diffusion Orient all transistors in one direction

• Lithographically friendly or thin cell layout fixes this Also reduces length and capacitance of bitlines

Page 17: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 17

Commercial SRAMs

• Five generations of Intel SRAM cell micrographs Transition to thin cell at 65 nm Steady scaling of cell area

Page 18: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 18

Voltage Scaling and the 6T cell• Feedback circuits such as back-to-back

inverters, especially in SRAM’s, do not scale well with voltage due large threshold fluctuations due to the minimum size devices used in these designs

•Current microprocessor architectures such as Nehalem alleviate this by adopting the 8T-SRAM cell vs. the conventional 6T cell.

Chang et al., IEEE JOURNAL OF SOLID-STATE CIRCUITS, pp. 956-963, VOL. 43, NO. 4, APRIL 2008

On average 8T cell array is 16% larger than a 6T cell array

Page 19: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 19

Decoders

• n:2n decoder consists of 2n n-input AND gates One needed for each row of memory Build AND from NAND or NOR gates

Static CMOS Pseudo-nMOS

word0

word1

word2

word3

A0A1

A1word

A0 1 1

1/2

2

4

8

16word

A0

A1

1

1

11

4

8word0

word1

word2

word3

A0A1

Page 20: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 20

Decoder Layout

• Decoders must be pitch-matched to SRAM cell Requires very skinny gates

GND

VDD

word

buffer inverterNAND gate

A0A0A1A2A3 A2A3 A1

Page 21: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 21

Large Decoders

• For n > 4, NAND gates become slow Break large gates into multiple smaller gates

word0

word1

word2

word3

word15

A0A1A2A3

Page 22: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 22

Predecoding

• Many of these gates are redundant Factor out common

gates into predecoder Saves area Same path effort A0

A1

A2

A3

word1

word2

word3

word15

word0

1 of 4 hotpredecoded lines

predecoders

Page 23: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 23

Column Circuitry

• Some circuitry is required for each column Bitline conditioning Sense amplifiers Column multiplexing

Page 24: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 24

Bitline Conditioning

• Precharge bitlines high before reads

• Equalize bitlines to minimize voltage difference when using sense amplifiers

bit bit_b

bit bit_b

Page 25: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 25

Sense Amplifiers

• Bitlines have many cells attached Ex: 32-kbit SRAM has 128 rows x 256 cols 128 cells on each bitline

• tpd (C/I) DV Even with shared diffusion contacts, 64C of diffusion

capacitance (big C) Discharged slowly through small transistors (small I)

• Sense amplifiers are triggered on small voltage swing (reduce DV)

Page 26: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 26

Differential Pair Amp

• Differential pair requires no clock

• But always dissipates static power

bit bit_bsense_b sense

N1 N2

N3

P1 P2

Page 27: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 27

Clocked Sense Amp

• Clocked sense amp saves power

• Requires sense_clk after enough bitline swing

• Isolation transistors cut off large bitline capacitance

bit_bbit

sense sense_b

sense_clk isolationtransistors

regenerativefeedback

Page 28: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 28

Twisted Bitlines

• Sense amplifiers also amplify noise Coupling noise is severe in modern processes Try to couple equally onto bit and bit_b Done by twisting bitlines

b0 b0_b b1 b1_b b2 b2_b b3 b3_b

Page 29: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 29

Column Multiplexing

• Recall that array may be folded for good aspect ratio

• Ex: 2 kword x 16 folded into 256 rows x 128 columns Must select 16 output bits from the 128 columns Requires 16 8:1 column multiplexers

Page 30: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 30

Tree Decoder Mux

• Column mux can use pass transistors Use nMOS only, precharge outputs

• One design is to use k series transistors for 2k:1 mux No external decoder logic needed

B0 B1 B2 B3 B4 B5 B6 B7 B0 B1 B2 B3 B4 B5 B6 B7A0

A0

A1

A1

A2

A2

Y Yto sense amps and write circuits

Page 31: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 31

Single Pass-Gate Mux

• Or eliminate series transistors with separate decoder

A0A1

B0 B1 B2 B3

Y

Page 32: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 32

Ex: 2-way Muxed SRAM

MoreCells

word_q1

write0_q1

2

MoreCells

A0

A0

2

data_v1

write1_q1

Page 33: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 33

Multiple Ports

• We have considered single-ported SRAM One read or one write on each cycle

• Multiported SRAM are needed for register files

• Examples: Multicycle MIPS must read two sources or write a result on

some cycles Pipelined MIPS must read two sources and write a third

result each cycle Superscalar MIPS must read and write many sources and

results each cycle

Page 34: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 34

Dual-Ported SRAM

• Simple dual-ported SRAM Two independent single-ended reads Or one differential write

• Do two reads and one write by time multiplexing Read during ph1, write during ph2

bit bit_b

wordBwordA

Page 35: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 35

Multi-Ported SRAM

• Adding more access transistors hurts read stability

• Multiported SRAM isolates reads from state node

• Single-ended bitlines save area

Page 36: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 36

Large SRAMs

• Large SRAMs are split into subarrays for speed

• Ex: UltraSparc 512KB cache 4 128 KB subarrays Each have 16 8KB banks 256 rows x 256 cols / bank 60% subarray area efficiency Also space for tags & control

[Shin05]

Page 37: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 37

Serial Access Memories

• Serial access memories do not use an address Shift Registers Tapped Delay Lines Serial In Parallel Out (SIPO) Parallel In Serial Out (PISO) Queues (FIFO, LIFO)

Page 38: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 38

Shift Register

• Shift registers store and delay data

• Simple design: cascade of registers Watch your hold times!

clk

Din Dout8

Page 39: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 39

Denser Shift Registers

• Flip-flops aren’t very area-efficient

• For large shift registers, keep data in SRAM instead

• Move read/write pointers to RAM rather than data Initialize read address to first entry, write to last Increment address on each cycle

Din

Dout

clk

counter counter

reset

00...00

11...11

readaddr

writeaddr

dual-portedSRAM

Page 40: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 40

Tapped Delay Line

• A tapped delay line is a shift register with a programmable number of stages

• Set number of stages with delay controls to mux Ex: 0 – 63 stages of delay

SR

32

clk

Din

delay5

SR

16

delay4

SR

8

delay3

SR

4

delay2

SR

2

delay1

SR

1

delay0

Dout

Page 41: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 41

Serial In Parallel Out

• 1-bit shift register reads in serial data After N steps, presents N-bit parallel output

clk

P0 P1 P2 P3

Sin

Page 42: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 42

Parallel In Serial Out

• Load all N bits in parallel when shift = 0 Then shift one bit out per cycle

clkshift/load

P0 P1 P2 P3

Sout

Page 43: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 43

Queues

• Queues allow data to be read and written at different rates.

• Read and write each use their own clock, data

• Queue indicates whether it is full or empty

• Build with SRAM and read/write counters (pointers)

Queue

WriteClk

WriteData

FULL

ReadClk

ReadData

EMPTY

Page 44: Lecture 9: SRAM Parts from harris/cmosvlsi/ 4e/index.html

CMOS VLSI for Computer Engineering

19: SRAM 44

FIFO, LIFO Queues

• First In First Out (FIFO) Initialize read and write pointers to first element Queue is EMPTY On write, increment write pointer If write almost catches read, Queue is FULL On read, increment read pointer

• Last In First Out (LIFO) Also called a stack Use a single stack pointer for read and write