Foil # 1 / 58 The University of Texas at AustinEE 382M Class Notes
Early Planning for Memory Array Design
EE-382M VLSI–II
Steven C. Sullivan, Gian Gerosa
Foil # 2 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 3 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Hierarchy
Level            Access Time   Capacity
Register File    0.25-1 ns     0.5-1 KB
Level 1 Cache    1-4 ns        8-64 KB
Level 2 Cache    5-20 ns       256 KB-2 MB
Main Memory      35-50 ns      128-256 MB
Hard Drive       5-10 ms       10-50 GB
Levels are listed in order of increasing distance from the Processor.
Memory hierarchy gives the appearance of large capacity and fast access time.
Foil # 4 / 58 The University of Texas at AustinEE 382M Class Notes
Processor-Memory Performance Gap
[Figure: performance (log scale, 1 to 10000) versus year (1980 to 2007). Curve labels: µProc 60%/yr; DRAM 7%/yr; 1.35X/yr; 1.55X/yr. Performance gap grows 50% per year.]
The need for memory hierarchy is steadily increasing.
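The "grows 50% per year" figure can be checked from the two curve labels: the ratio of two exponentially improving quantities is itself exponential, with a yearly factor equal to the ratio of the two growth factors. The following is a minimal Python sketch, assuming the 60%/yr and 7%/yr labels compound annually; the function name is illustrative and not from the notes.

```python
def gap_growth_per_year(cpu_rate=0.60, dram_rate=0.07):
    """Yearly growth factor of the processor/DRAM performance ratio.

    Assumes both rates compound once per year (an assumption, not stated
    on the foil).
    """
    return (1.0 + cpu_rate) / (1.0 + dram_rate)


factor = gap_growth_per_year()
print(f"gap grows by about {100 * (factor - 1):.1f}% per year")

# Gap accumulated over 20 years of compounding at that rate
print(f"gap after 20 years: about {factor ** 20:.0f}x")
```

The yearly factor comes out just under 1.5, which is consistent with the 50% per year quoted on the plot.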
Foil # 5 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Hierarchy Evolution
• 386: No on-die cache. L1 cache on motherboard.
[Figure: CPU, Chipset, Cache, DRAM]
• 486: Level 1 cache on-die. Level 2 on motherboard.
[Figure: CPU with L1, Chipset, L2, DRAM]
• Pentium: Separate Instruction and Data Caches.
[Figure: CPU with I and D caches, Chipset, L2, DRAM]
Foil # 6 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Hierarchy Evolution
• Pentium II: Separate bus to L2 cache in same package
[Figure: CPU with I and D caches, L2, Chipset, DRAM]
• Pentium III: L2 cache on-die
[Figure: CPU with I and D caches and L2, Chipset, DRAM]
• Pentium 4 (Foster): L3 cache on-die
[Figure: CPU with I and D caches, L2 and L3, Chipset, DRAM]
Recent development: 3-D packaging allows more integration
Foil # 7 / 58 The University of Texas at AustinEE 382M Class Notes
P4
Foil # 8 / 58 The University of Texas at AustinEE 382M Class Notes
Functional Block Diagram
[Figure: block diagram of a memory array. Labels: Row Address (N bits), Row Decoder, Word Lines (2^N, “1-hot” select), Cell Array (2^N x 2^M), Column Address, Column Decoder, Multiplexors and Sense Amplifiers, Read/Write Buffer, Data; bus widths 2^M, 2^K, 2^(M-K) and (M-K).]
Foil # 9 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 10 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Cell Overview
• A memory cell array has the following capabilities:
– A means of storing bits of information (storage elements)
– A means of selecting the stored information (wordlines)
– A means of transferring data to/from storage elements (bitlines)
• 1T/1C memory cell is the simplest implementation
– Only requires 1 W/L and 1 B/L metallization
• 6T SRAM cell consumes more area and requires true & complement bitlines, but is more stable and develops a sensing voltage faster than DRAM cell
• Register File cells allow multiple entries to be accessed or written simultaneously
– However, this requires multiple wordlines and bitlines and becomes metal-limited
– Used for integer/floating point registers, single & multiple-cycle queues and buffers
Foil # 11 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Cell Types
• Schematic of 1-T DRAM cell, 6T dual ended SRAM cell
[Figure: 1-transistor DRAM schematic with WL, BL and storage cap; 6-transistor SRAM schematic with WL, BL and #BL.]
1-transistor DRAM
• Industry standard DRAM cell
• Smallest area per bit
• Explicit storage capacitor
• Destructive READ
6-transistor SRAM
• Industry standard SRAM cell
• Used for FAST static arrays
• Cross-coupled inverters
• Non-destructive READ with proper stability analysis
Foil # 12 / 58 The University of Texas at AustinEE 382M Class Notes
6-transistor SRAM cell
[Figure: schematic and layout of the 6T cell. Labels: WL, BL, #BL, VDD, GND, PFET, NFET, PASSGATE; layout dimensions 1.0 μm x 0.68 μm (65nm).]
In 65nm CMOS, a typical 6T bitcell area = 0.68 μm²
Foil # 13 / 58 The University of Texas at AustinEE 382M Class Notes
Multi-Port Memory Cell Types
[Figure: cell schematic. Labels: WWL, RWL, WBL, #WBL, RBL, #RBL, storage nodes D and #D.]
1 Read (DE), 1 Write (DE)
Foil # 14 / 58 The University of Texas at AustinEE 382M Class Notes
Multi-Port Memory Cell Types
[Figure: cell schematic. Labels: WWL, RWL, WBL, #WBL, RBL, storage nodes D and #D.]
1 Write (DE), 1 Read (SE)
Foil # 15 / 58 The University of Texas at AustinEE 382M Class Notes
Register File Multi-Ported Bitcell
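[Figure: layout and schematic of the register file bitcell. Layout labels: VDD, GND, rwl, wl0, wl1, bl0, bl0b, bl1, bl1b, rbl. Schematic labels: RWL, WL0, WL1, BL0, BL0B, BL1, BL1B, RBL, storage nodes D and #D.]
2 Write (DE), 1 Read (SE)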
Foil # 16 / 58 The University of Texas at AustinEE 382M Class Notes
Multi-Port Memory Cell Types
[Figure: cell schematic. Labels: WWL, RWL, WBL, RBL, #RBL, storage nodes D and #D.]
1 Write (SE), 1 Read (DE)
Foil # 17 / 58 The University of Texas at AustinEE 382M Class Notes
Multi-Port Memory Cell Types
[Figure: cell schematic. Labels: WWL, RWL, WBL, RBL, #RBL, storage nodes D and #D.]
1 Write (SE), 1 Read (DE), slight modification
Foil # 18 / 58 The University of Texas at AustinEE 382M Class Notes
Multi-Port Memory Cell Types
[Figure: cell schematic. Labels: WWL, WBL, RWL0, RWL1, RBL0, RBL1, storage nodes D and #D.]
1 Write (SE), 2 Read (SE)
Foil # 19 / 58 The University of Texas at AustinEE 382M Class Notes
Relative Memory Cell Sizes
Dimensions in M1 pitches (assume M1 same).
Cell     WL Dir   BL Dir   Area
1T       1        1.5      1.5
4T       3        4        12
6T       4        6        24
4R/2W    9        9        81
Foil # 20 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 21 / 58 The University of Texas at AustinEE 382M Class Notes
Array Design Choices
• Decoders
– Predecoder & Banked WL Drivers - for large number of rows
– Hierarchical WL & WL Repeaters - for large number of cols
• Cells
– Differential - for few ports and large array size
– Single Ended - for many ports or small array size
• Bitlines
– Hierarchical - for many rows & available higher metal
– Serial - for large number of rows & no higher metal
• Column Muxing
– Differential - group by bit
– Single Ended - group by entry
Foil # 22 / 58 The University of Texas at AustinEE 382M Class Notes
Basic Array Characteristics
• Array Size
– Number of entries
– Bits per entry
• Number of Ports
– Number of simultaneous reads
– Number of simultaneous writes
• Latency
– Cycles from address to read data
– Cycles from address to write completed
Foil # 23 / 58 The University of Texas at AustinEE 382M Class Notes
Basic Array Layout
[Figure: array of Cells arranged in Rows and Columns. Labels: Address, Pre-Dec, Decoder, WordLine, BitLine, Precharge, Bitline Receivers, Write Buffers, Read Data, Write Data.]
Foil # 24 / 58 The University of Texas at AustinEE 382M Class Notes
Large Signal vs Small Signal Arrays
Small Signal Arrays
• Differential bitlines
• Dual-ended sense amplifier
[Figure: Cell on a WordLine driving Bit and Bit# into a Sense Amp that produces Data.]
Large Signal Arrays
• Single-ended bitline
• Inverter threshold sense
[Figure: Cell on a WordLine driving Bit# into an inverter that produces Data.]
Foil # 25 / 58 The University of Texas at AustinEE 382M Class Notes
Large Signal vs Small Signal Arrays
• Small Signal Arrays:
– DRAM and SRAM chips
– Processor D-cache and I-cache
• Large Signal Arrays:
– Processor register files
– Multi-ported data structures
• Small Signal Arrays are less common because:
– Sense amps require special characterization
– More sensitive to noise
– Area and timing overhead of differential sense amp
– May not scale well to low supply voltage
Foil # 26 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 27 / 58 The University of Texas at AustinEE 382M Class Notes
Register File Bitline Segmentation
• Problem: In general, long bitlines cause very slow edge rates
– May consider converting to a small signal array (SSA) design approach
• However, very short bitlines cause overall area to increase
– Array efficiency goes down; wastes valuable silicon area
• Solution: Break up bitline depth to determine optimal design point
– Divide up into smaller sections & recombine with “wire-OR”
• Example #1 shows 16 memory cells on a bitline which drives a dynamic “wire OR” global bitline
• Example #2 shows a “serial” global bitline structure
– The lower global bitline is in series with the upper global bitline with a receiver and NMOS pulldown device in the center (acts like a “repeater”)
Foil # 28 / 58 The University of Texas at AustinEE 382M Class Notes
Register File Segmentation Example #1
[Figure: schematic. Labels: Memory cell (16 cells), Local BL, Global BL receiver, Global BL, #pc, Dynamic latch.]
Global bitline acts as a dynamic “wire-OR”
Foil # 29 / 58 The University of Texas at AustinEE 382M Class Notes
Register File Segmentation Example #2
• Serial global bitline
[Figure: schematic. Labels: Memory cell, Local BL, Global BL receiver, Global BL, two dynamic “wire OR” segments each with #pc, Dynamic latch.]
Foil # 30 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 31 / 58 The University of Texas at AustinEE 382M Class Notes
Array Area Estimation
• Cell Area
– 6T bitcell dimensions strongly dependent on technology
  • Need an actual layout study to determine area
– Multiported cells are wire limited and can be easily calculated
  • Cell Height is a function of {MH_Pitch*(Wordlines + Shields)}
  • Cell Width is a function of {MV_Pitch*(Bitlines + Datalines + Shields)}
• Local Bitline Receivers and Dataline drivers
– Height of array is increased by local bitline receivers
  • NumReadPorts*NumEntries/CellPerLBL
– Height of array is increased by local dataline drivers
  • NumWritePorts*NumEntries/CellPerLBL
Foil # 32 / 58 The University of Texas at AustinEE 382M Class Notes
Array Area Estimation
• Decoder & Wordline Repeaters
– Width of array is increased by the decoder
• Decoder width is a function of number of ports
• 20% of total array width is a reasonable estimate
– Width of array is increased by wordline repeaters
• Typically no more than 32 to 64 bitcells on a single wordline (limits rise/fall time of selected row)
Foil # 33 / 58 The University of Texas at AustinEE 382M Class Notes
Array Area Estimation
Cell Height & Width Calculation
Recall
Cell Height = MH_Pitch*(Wordlines + Shields)
= MH_Pitch*[(#R + #W) + WL_shield*(#R + #W + 1)]
Cell Width = MV_Pitch*(Bitlines + Datalines + Shields)
= MV_Pitch*[(#R + Rd_shield*#R + 1) + (#W + Wr_shield*#W + 1)]
Where
#R          Number of Read Ports
#W          Number of Write Ports
WL_shield   Read wordline shield factor
Rd_shield   Read bitline shield factor
Wr_shield   Write dataline shield factor
MH_Pitch    Wordline Pitch
MV_Pitch    Bitline Pitch
Foil # 34 / 58 The University of Texas at AustinEE 382M Class Notes
Array Area Estimation
Consider: 3 read ports & 2 write ports, 16-bits, 64-entry
Cell Height = MH_Pitch*(Wordlines + Shields)
= MH_Pitch*[(#R + #W) + WL_shield*(#R + #W + 1)]
= 0.2um * [(3 + 2) + (5 shields + 1)] = 2.20um
Cell Width = MV_Pitch*(Bitlines + Datalines + Shields)
= MV_Pitch*[(#R + Rd_shield*#R + 1) + (#W + Wr_shield*#W + 1)]
= 0.2um * [(3 + 0.5*3 + 1) + (2 + 0.5*2 + 1)] = 1.90um
• Sub-array dimensions are:
X = 16 * (Cell_Width) = 16 * 1.90um = 30.4um
Y = 64 * (Cell_Height) = 64 * 2.20um = 140.8um
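The worked example above can be reproduced with a short Python sketch of the cell height and width formulas. The function and parameter names are illustrative, not from the notes. WL_shield = 1.0 is inferred from the "(5 shields + 1)" term in the example, which is an assumption. The 0.5 read and write shield factors and the 0.2um pitches are the values used in the example.

```python
def cell_height(mh_pitch, n_read, n_write, wl_shield):
    """Cell height = MH_Pitch * [(#R + #W) + WL_shield * (#R + #W + 1)]."""
    ports = n_read + n_write
    return mh_pitch * (ports + wl_shield * (ports + 1))


def cell_width(mv_pitch, n_read, n_write, rd_shield, wr_shield):
    """Cell width = MV_Pitch * [(#R + Rd_shield*#R + 1) + (#W + Wr_shield*#W + 1)]."""
    read_tracks = n_read + rd_shield * n_read + 1
    write_tracks = n_write + wr_shield * n_write + 1
    return mv_pitch * (read_tracks + write_tracks)


# 3 read ports, 2 write ports, 16 bits per entry, 64 entries, 0.2um pitches.
# wl_shield=1.0 is inferred from the example, not stated explicitly.
height_um = cell_height(0.2, 3, 2, wl_shield=1.0)
width_um = cell_width(0.2, 3, 2, rd_shield=0.5, wr_shield=0.5)

x_um = 16 * width_um    # sub-array width: 16 bits across
y_um = 64 * height_um   # sub-array height: 64 entries tall

print(f"cell   = {width_um:.2f}um x {height_um:.2f}um")
print(f"array  = {x_um:.1f}um x {y_um:.1f}um")
```

Keeping the shield factors as parameters makes it easy to see how much of the cell is shielding: with no shields at all, the same 3R/2W cell would need only 5 wordline tracks instead of 11.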
Foil # 35 / 58 The University of Texas at AustinEE 382M Class Notes
SRAM Array Area Estimation
Estimate subarray first:
1. # 6T bitcells * bitcell area + wordline & column decoders + sense-amp + read/write sequentials.
2. The decoders + sense-amps + sequentials are typically 15% of the subarray bitcell area.
3. Use an ‘array efficiency’ factor to calculate the total SRAM array area; this includes clock buffers, address decoders, control logic, repeaters, routing, etc.; typical numbers are in the range of ~60%.
EXAMPLE:
• A 16KB L1 cache with four 4KB subarrays; each subarray is comprised of 128 bitcells/column and 256 bitcells/wordline; the 6T bitcell area in this 65 nm CMOS technology is 0.68 μm².
Bitcell subarray = 0.68 μm² * 128 * 256 = 22,282 μm²
Subarray = 1.15 * 22,282 = 25,624 μm²
4 subarrays = 4 * 25,624 = ~102,500 μm²
16KB L1 cache = 102,500 / 0.60 = 170,833 μm² or ~ 0.17 mm²
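The three estimation steps above can be sketched as one small Python function. The names are illustrative, not from the notes. The 15% periphery overhead and 60% array efficiency are the typical values quoted above, and 0.68 μm² is the bitcell area used in the arithmetic.

```python
def sram_area_um2(bitcell_um2, rows, cols, n_subarrays,
                  periphery_overhead=0.15, array_efficiency=0.60):
    """Estimate total SRAM array area in um^2.

    periphery_overhead: decoders + sense amps + sequentials, as a fraction
        of the subarray bitcell area.
    array_efficiency: fraction of the final area occupied by the subarrays;
        the rest is clock buffers, control logic, repeaters, routing, etc.
    """
    bitcell_area = bitcell_um2 * rows * cols
    subarray_area = (1.0 + periphery_overhead) * bitcell_area
    return n_subarrays * subarray_area / array_efficiency


# 16KB L1 cache: four 4KB subarrays of 128 x 256 bitcells each
area_um2 = sram_area_um2(0.68, rows=128, cols=256, n_subarrays=4)
print(f"{area_um2:,.0f} um^2 (about {area_um2 / 1e6:.2f} mm^2)")
```

The result differs from the hand calculation by a few μm² only because the hand calculation rounds the intermediate values.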
Foil # 36 / 58 The University of Texas at AustinEE 382M Class Notes
Floorplan Options
[Figure: three possible floorplans built from DECODE, CTL, Pchg, Sub Array, Rd Block and Wrt Driver blocks.]
Possible Large-Signal Array Floorplans
• Array Area Calculator provides dimensions for these blocks
Foil # 37 / 58 The University of Texas at AustinEE 382M Class Notes
Floorplanning Tool
Structured Datapath
Foil # 38 / 58 The University of Texas at AustinEE 382M Class Notes
Sample Floorplan
Generated from a floorplanning CAD tool
[Figure: floorplan with blocks labeled bitslices, rwldrv, wwldrv, decode, mergelogic.]
Foil # 39 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 40 / 58 The University of Texas at AustinEE 382M Class Notes
Access Time Estimation
Break into components
= wordline driver + wordline RC delay + column fall time + colmux + setup
Wordline RC delay (example): 128 bitcells in a row
• RT = Σ Ri = 140 mΩ/□ * 348μm/0.1μm = 487.2 Ω
• CT = Σ Ci = CM1 + Cgate
= 348μm * 0.23fF/μm + 128*(2*0.5μm)*2.0fF/μm
= 80fF + 256fF = 336fF
• trow = 0.38 * RT * CT = 62ps (50% point of rising wave)
[Figure: distributed RC ladder R1, R2, ..., R128 with C1, C2, ..., C128, driven from clk; far-end node V128.]
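The wordline RC example can be reproduced with a minimal Python sketch. The function and parameter names are illustrative; the numbers are the ones quoted in the example. The resistivity is read as a sheet resistance in Ω per square, which is what the length/width ratio in the calculation implies, and each bitcell is taken to load the wordline with two 0.5μm gates.

```python
def wordline_rc_delay(length_um, width_um, sheet_res_ohm_sq,
                      wire_cap_ff_per_um, n_cells, gate_width_um,
                      gate_cap_ff_per_um, gates_per_cell=2, k=0.38):
    """Return (R_total in ohms, C_total in fF, 50% delay in ps)
    for a distributed RC wordline."""
    r_total = sheet_res_ohm_sq * length_um / width_um
    c_wire = length_um * wire_cap_ff_per_um
    c_gate = n_cells * gates_per_cell * gate_width_um * gate_cap_ff_per_um
    c_total = c_wire + c_gate
    # ohm * fF = 1e-15 s = 1e-3 ps
    delay_ps = k * r_total * c_total * 1e-3
    return r_total, c_total, delay_ps


r_t, c_t, t_row = wordline_rc_delay(
    length_um=348, width_um=0.1, sheet_res_ohm_sq=0.140,
    wire_cap_ff_per_um=0.23, n_cells=128,
    gate_width_um=0.5, gate_cap_ff_per_um=2.0)

print(f"RT = {r_t:.1f} ohm, CT = {c_t:.0f} fF, trow = {t_row:.0f} ps")
```

Because both RT and the wire part of CT scale with wordline length, the delay grows roughly with the square of the number of cells in a row, which is one reason the number of bitcells on a single wordline is kept limited.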
Foil # 41 / 58 The University of Texas at AustinEE 382M Class Notes
Access Time Estimation
Column Fall Time
• Assume bitline is discharged linearly, then we can use:
• dV/dt = Iread/CBL = (0.5μm * 600μA/μm) / 68fF = 4.41 V/ns
• Bitline falls to VDD/2 = 1.0V/2 in 113ps
[Figure: bitcell with WL = VDD and a 0.5μm device sinking Iread from CBL = 68fF (CJ = 1.25fF/μm²); plot of BL voltage V versus t {ns} falling from 1.0V at dV/dt = 4.41 V/ns and crossing VDD/2 (50%) at 113ps.]
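The column fall time follows directly from the linear-discharge assumption, t = CBL * ΔV / Iread. A minimal Python sketch with illustrative names, using the values from this foil:

```python
def bitline_fall(width_um, i_per_um_ua, c_bl_ff, delta_v):
    """Return (slew in V/ns, time in ps) for a linearly discharging bitline.

    Assumes a constant read current, so dV/dt = Iread / CBL.
    """
    i_read_ua = width_um * i_per_um_ua
    # uA / fF = 1e-6 A / 1e-15 F = 1e9 V/s = 1 V/ns
    slew_v_per_ns = i_read_ua / c_bl_ff
    time_ps = 1e3 * delta_v / slew_v_per_ns
    return slew_v_per_ns, time_ps


# 0.5um device at 600uA/um, CBL = 68fF, swing from VDD = 1.0V to VDD/2
slew, t_fall = bitline_fall(0.5, 600, 68, delta_v=0.5)
print(f"dV/dt = {slew:.2f} V/ns, fall time to VDD/2 = {t_fall:.0f} ps")
```

The same function gives the time to develop a small sensing swing if `delta_v` is set to the required differential instead of VDD/2.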
Foil # 42 / 58 The University of Texas at AustinEE 382M Class Notes
Access Time Estimation
Sum up components of delay; assume inverter delay is 40ps and nand2 is about 60ps delay and setup into latch is 30ps;
Taccess = Wordline driver + wordline delay + column delay + column mux + setup
= (60ps + 40ps) + 62ps + 113ps + 60ps + 30ps
= 365ps
Should easily meet machine cycle time since low frequency … however, the above calculated value of 365ps is only the READ-ACCESS time …
Wire routing and data capture budgets have not been factored yet.
May be able to use a “high Vt” device if it is available from Fab
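The delay budget above can be tabulated in a few lines of Python to show where the read-access time goes. The mapping of the "(60ps + 40ps)" term to a nand2 plus an inverter, and of the 60ps column mux to a nand2, is inferred from the stated gate delays and is an assumption. The dictionary keys are illustrative labels.

```python
INV_PS = 40     # inverter delay
NAND2_PS = 60   # nand2 delay
SETUP_PS = 30   # setup into latch

components_ps = {
    "wordline driver": NAND2_PS + INV_PS,  # assumed nand2 + inverter
    "wordline RC delay": 62,
    "column fall time": 113,
    "column mux": NAND2_PS,                # assumed one nand2 stage
    "latch setup": SETUP_PS,
}

t_access_ps = sum(components_ps.values())

for name, t in components_ps.items():
    print(f"{name:18s} {t:4d} ps  {100 * t / t_access_ps:5.1f}%")
print(f"{'total':18s} {t_access_ps:4d} ps")
```

In this budget the column fall time is the largest single contributor, ahead of the wordline driver.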
Foil # 43 / 58 The University of Texas at AustinEE 382M Class Notes
Preliminary Power Estimation
• Most power dissipation for an array occurs in bitlines and sense amplifiers
• Calculate total bitline capacitance
– {Metal2 bitline cap} + {junction cap} X {number of bitcells}
• Calculate sense node capacitive load to include in power dissipation
• For power dissipation, use the approximation:
Pdiss = α * Ctotal * (Vsupply)^2 * frequency
Where α is the “Activity Factor”, 0 < α < 1
• Memory cells can contribute significant D.C. power due to leakage from many cells in standby; be sure to take that into account
Pstatic = Ileakage * VDD
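A minimal Python sketch of the two power formulas above. All numeric inputs are hypothetical placeholders chosen only to exercise the formulas; none of them comes from the notes. The bitline capacitance is read as the metal capacitance of the whole bitline plus one junction capacitance per bitcell, which is one interpretation of the expression above.

```python
def bitline_cap_f(metal_cap_f, junction_cap_per_cell_f, n_bitcells):
    """Total bitline capacitance: metal cap + per-cell junction cap * cells."""
    return metal_cap_f + junction_cap_per_cell_f * n_bitcells


def dynamic_power_w(activity, c_total_f, v_supply, freq_hz):
    """Pdiss = alpha * Ctotal * Vsupply^2 * frequency."""
    return activity * c_total_f * v_supply ** 2 * freq_hz


def static_power_w(i_leakage_a, vdd):
    """Pstatic = Ileakage * VDD."""
    return i_leakage_a * vdd


# Hypothetical inputs, for illustration only
c_bl = bitline_cap_f(metal_cap_f=30e-15, junction_cap_per_cell_f=0.3e-15,
                     n_bitcells=128)
n_bitlines = 256          # hypothetical number of switching bitlines
p_dyn = dynamic_power_w(activity=0.5, c_total_f=n_bitlines * c_bl,
                        v_supply=1.0, freq_hz=1e9)
p_stat = static_power_w(i_leakage_a=32768 * 10e-9, vdd=1.0)  # 10nA per cell

print(f"bitline cap   = {c_bl * 1e15:.1f} fF")
print(f"dynamic power = {p_dyn * 1e3:.2f} mW")
print(f"static power  = {p_stat * 1e3:.2f} mW")
```

Since the dynamic term scales with the square of the supply voltage while the static term scales with the cell count, the balance between the two shifts toward leakage as arrays grow and supplies drop.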
Foil # 44 / 58 The University of Texas at AustinEE 382M Class Notes
Class Agenda
• Memory Hierarchy (6 foils)
• Memory Cell Types (9 foils)
• Basic Array Structure (5 foils)
• Bitline Segmentation (3 foils)
• Area Estimation (7 foils)
• Access Time & Power Estimation (4 foils)
• Clock & Power Distribution (4 foils)
Foil # 45 / 58 The University of Texas at AustinEE 382M Class Notes
Local Clock Distribution
• At high frequencies, clock uncertainties become a significant portion of the cycle time (10-15% of cycle time or more)
• Important to define the overall clocking scheme and distribution before implementation begins
• Clock inaccuracy is composed of 2 major sources:
– Clock jitter: due to PLL, DLL, etc.
– Clock skew: mismatches in clock buffer tree, load, inductance or variances due to process (Leff is not constant), VDD (it is not constant), and local temperature.
• A global clock grid that distributes to local clock buffers requires large overhead but helps minimize clock skew
– LCBs are evenly distributed within array block and tap off of global clock grid with minimum route
Foil # 46 / 58 The University of Texas at AustinEE 382M Class Notes
LCB Placement
[Figure: floorplan of a two-port array. Blocks: Bitcell Arrays, Port0 Decoder, Port1 Decoder, Port0/Port1 Input Data Latches, Port0/Port1 Read/Write Ckts, Port0/Port1 Output Latches, with LCBs distributed alongside each block.]
Large number of LCBs minimizes wire load from LCB to sequentials, thus reducing skew variance.
Foil # 47 / 58 The University of Texas at AustinEE 382M Class Notes
SAMPLE Power/Ground GRID
[Figure: cross-section of signal (Sig) wires shielded by VSS and VDD wires (Full Shielding, MCF = 1.0); widths and spaces labeled λ, 2λ and 4λ, with a 48λ overall span.]
* Where λ is minimum critical dimension for width/space
Shielding takes up significant routing resources.
Global M6 routes over the array should have minimal coupling noise to array bitlines.
Foil # 48 / 58 The University of Texas at AustinEE 382M Class Notes
Power/Clock Grid
• Clock grid is interleaved between VDD and VSS on metal6
[Figure: floorplan of a two-port array. Blocks: Bitcell Arrays, Port0 Decoder, Port1 Decoder, Port0/Port1 Input Data Latches, Port0/Port1 Read/Write Ckts, Port0/Port1 Output Latches, with LCBs distributed alongside each block.]
Foil # 49 / 58 The University of Texas at AustinEE 382M Class Notes
BACKUP
Foil # 50 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Array Performance
• Optimization of memory arrays and caches requires careful analysis of:
– Size and speed of the array which impacts:
  • Power: static and dynamic
  • Latency: number of clocks to access the memory cell
  • Area and aspect ratios
  • Redundancy
– Hit rate (caches): requires additional logic and tag arrays.
– Architecture: How many levels of caching?
• In addition need to account for array BIST. This requires additional logic and impacts performance.
Foil # 51 / 58 The University of Texas at AustinEE 382M Class Notes
Memory Array Performance
Foil # 52 / 58 The University of Texas at AustinEE 382M Class Notes
Array Redundant Elements
[Figure: basic array layout (Address, Pre-Dec, Decoder, WordLine, BitLine, Precharge, Cells in Rows and Columns, Bitline Receivers, Write Buffers, Read Data, Write Data) extended with redundant elements: Redundant Address & enable, Redundant Wordline & Driver, Redundant Column & Bitslice.]
Account for area overhead if redundancy is used for repair
Foil # 53 / 58 The University of Texas at AustinEE 382M Class Notes
Trade-offs
Large Signal Arrays
• Simplest sense scheme: single-ended bitlines
• Good noise margin: Vdd/2 threshold
• Lower bitcell density (used for small queues & register files, 8 ~ 32 cells on a bitline)
• Static timing analysis works
• Multi-ported: usually single-ended; many READ/WRITE ports
Small Signal Arrays
• Need sense-amplifier: dual-ended bitlines
• Noise-sensitive: few hundred millivolts ΔV
• Highest bitcell density (used for large 1st & 2nd level cache arrays, 64, 128, 256 or more cells on a bitline)
• Static timing analysis difficult
• Single ported: usually dual-ended; 1 ~ 3 ports
Foil # 54 / 58 The University of Texas at AustinEE 382M Class Notes
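Dual-Ended Cell Column Muxing
[Figure: two organizations of the same array. One: Addr[6:0] into the Read Decoder and Write Decoder, 128 Rows x 2 Cols of Cells, Data[1:0]. Other: Addr[6:2] into the Read Dec and Write Dec, 32 Rows x 8 Cols split into Bit 0 and Bit 1 cell groups, each with a 4:1 column mux selected by Addr[1:0], producing Data[0] and Data[1].]
For minimum delay cell array should be roughly square.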
Foil # 55 / 58 The University of Texas at AustinEE 382M Class Notes
Single Ended Cell Column Muxing
Single ended arrays must group bits of the same entry together, to write wordlines only on cells of one entry.
[Figure: two organizations of the same array. One: Addr[6:0] into the Read Decoder and Write Decoder, 128 Rows x 2 Cols of Cells, Data[1:0]. Other: Addr[6:2] into the Read Dec, 32 Rows x 8 Cols grouped as cells of Entry A, Entry B, Entry C and Entry D with Write Decs, and 4:1 column muxes selected by Addr[1:0] producing Data[0] and Data[1].]
Foil # 56 / 58 The University of Texas at AustinEE 382M Class Notes
Dual Ended vs Single Ended Column Muxing
Dual-Ended Cells
• Same bit of different entries grouped together.
• Write data driven only on some columns.
• Write wordline “on” for entire row.
• Column order: A0 B0 C0 D0 A1 B1 C1 D1, with one 4:1 mux for Data[0] and one for Data[1]; one Read WL and one Write WL per row.
Single-Ended Cells
• Different bits of same entry grouped together.
• Write data can be driven on every column.
• Write wordline “on” for only 1 entry.
• Column order: A0 A1 B0 B1 C0 C1 D0 D1, with one 4:1 mux for Data[0] and one for Data[1]; one Read WL and several Write WLs per row.
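The two column orderings can be made concrete with a small Python sketch that lays out one physical row holding four entries (A to D) of two bits each and lists which columns feed each 4:1 read mux. The function names are illustrative, not from the notes.

```python
def column_order(entries, n_bits, dual_ended):
    """Return cell labels, left to right, along one physical row."""
    if dual_ended:
        # Same bit of different entries grouped together
        return [f"{e}{b}" for b in range(n_bits) for e in entries]
    # Different bits of the same entry grouped together
    return [f"{e}{b}" for e in entries for b in range(n_bits)]


def mux_inputs(bit, n_entries, n_bits, dual_ended):
    """Column indices feeding the n_entries:1 mux that produces Data[bit]."""
    if dual_ended:
        return [bit * n_entries + e for e in range(n_entries)]
    return [e * n_bits + bit for e in range(n_entries)]


entries = ["A", "B", "C", "D"]
dual = column_order(entries, 2, dual_ended=True)
single = column_order(entries, 2, dual_ended=False)

print(" ".join(dual))     # A0 B0 C0 D0 A1 B1 C1 D1
print(" ".join(single))   # A0 A1 B0 B1 C0 C1 D0 D1
print(mux_inputs(1, 4, 2, dual_ended=True))    # [4, 5, 6, 7]
print(mux_inputs(1, 4, 2, dual_ended=False))   # [1, 3, 5, 7]
```

In the dual-ended ordering each mux reads adjacent columns, while in the single-ended ordering each entry occupies adjacent columns, which is what allows a write wordline to be turned on for only one entry.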
Foil # 57 / 58 The University of Texas at AustinEE 382M Class Notes
Segmentation Guidelines
• Design considerations for segmenting the bitlines are based on variables such as:
– Number of entries
– Number of ports
– Number of bits
• Processor architecture and manufacturing technology also contribute to design decisions
– For example, a high-leakage process may limit the number of cells on a bitline before losing state
• The following table is a guideline to help determine how to divide up the bitlines for optimum performance
– The final decision will be based on careful HSPICE simulations of the different options over PVT variations
Foil # 58 / 58 The University of Texas at AustinEE 382M Class Notes
Table of Guidelines
(Rows: number of PORTS; columns: number of ENTRIES)

PORTS 1--7
• ENTRIES <=64: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to NAND2 receiver followed by a latch; GBL to the input of latch at the bottom in M4; 1-cycle latency is assumed.
• ENTRIES <=128: Split into 2 sub-arrays with 64 entries each; LBL and GBL should follow the guidelines for similar ports; output of GBL to NAND2 between sub-arrays; single-cycle latency is assumed.
• ENTRIES <=256: LBL and GBL guidelines are the same as < 64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; at least two-cycle latency is required.

PORTS 8--16
• ENTRIES <=64: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to a NAND2 receiver followed by latch; split GBLs are routed in M4 to NAND2; dynamic latch in the middle; latched outputs to destination drivers in M4 (or M3).
• ENTRIES <=128: Split into 2 sub-arrays with up to 64 entries each; LBL and GBL should follow the guidelines for entries with similar ports; output of GBL to dynamic latch followed by latches; two-cycle latency is assumed.
• ENTRIES <=256: LBL and GBL guidelines are the same as < 64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; more than 2-cycle latency is required.

PORTS 17--21
• ENTRIES <=64: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to NAND2 receiver followed by dynamic wire-OR; split GBLs are routed in M4 (or M2) to NAND2; latch in the middle; latched outputs to destination drivers in M4 (or M3); maximum of 48 entries can be supported for this many ports.
• ENTRIES <=128: Split into 2 sub-arrays with up to 48 entries each; LBL and GBL should follow the guidelines for similar ports; output of GBL to dynamic latch followed by latches; at least 2-cycle latency assumed.
• ENTRIES <=256: LBL and GBL guidelines are the same as < 64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; more than 2-cycle latency is required.