Cache Physical Implementation
Panayiotis Charalambous
Xi Research Group
Contents
Cache Logical View
Physical View
Case Study – Power 4 L2 Cache
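Logical Cache Structure
n-way associative cache
n elements per set
2^m sets
Address (32 bits): Tag (32 – m – k bits) | Index (m bits) | Offset (k bits)
[Figure: the index selects a set; each stored tag in the set is compared (=) with the address tag; the comparator outputs are ORed to give Hit and to select Data]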
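The address split above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `split_address` and the example values of m and k are assumptions chosen for the demonstration.

```python
def split_address(addr: int, m: int, k: int, width: int = 32):
    """Split an address into (tag, index, offset).

    k offset bits (low), m index bits (middle), width - m - k tag bits (high).
    """
    addr &= (1 << width) - 1                 # keep only `width` address bits
    offset = addr & ((1 << k) - 1)           # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)     # next m bits select the set
    tag = addr >> (k + m)                    # remaining high bits
    return tag, index, offset


# Hypothetical example: 2^7 sets and 2^5-byte blocks
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)
```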
Cache Structure
Cache Access Steps
1. Decode address
2. Enable the word line
3. Raise the bit lines to high
4. Get the tag value from the tag array
5. Check for tag match
6. Select data output
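The six access steps can be mimicked in software. The sketch below is a behavioural toy under stated assumptions: the class `SetAssociativeCache`, its `fill` helper, and the chosen sizes are all hypothetical, and electrical steps such as precharging the bit lines are reduced to a comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


class SetAssociativeCache:
    """Toy n-way cache with 2**m sets and 2**k-byte blocks."""

    def __init__(self, m: int, k: int, ways: int) -> None:
        self.m, self.k = m, k
        self.sets: List[List[Line]] = [
            [Line() for _ in range(ways)] for _ in range(1 << m)
        ]

    def _decode(self, addr: int):
        offset = addr & ((1 << self.k) - 1)
        index = (addr >> self.k) & ((1 << self.m) - 1)
        tag = addr >> (self.k + self.m)
        return tag, index, offset

    def fill(self, addr: int, block: bytes, way: int) -> None:
        """Place a block in a chosen way (no replacement policy modelled)."""
        tag, index, _ = self._decode(addr)
        self.sets[index][way] = Line(True, tag, block)

    def read_byte(self, addr: int) -> Optional[int]:
        tag, index, offset = self._decode(addr)   # 1. decode address
        row = self.sets[index]                    # 2-3. word line selects one set;
                                                  #      bit lines are precharged
        for line in row:                          # 4. read the stored tags
            if line.valid and line.tag == tag:    # 5. tag match
                return line.data[offset]          # 6. select data output
        return None                               # miss


cache = SetAssociativeCache(m=2, k=4, ways=2)
cache.fill(0x1230, bytes(range(16)), way=0)
hit = cache.read_byte(0x1235)    # same block, offset 5
miss = cache.read_byte(0x5235)   # same set, different tag
```

In hardware the tag comparisons for all ways happen in parallel; the loop here is only a sequential stand-in.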
Conventional Cache Organization
Memory Cell
Bit lines: bit and bit´
Read: Set bit and bit´ high. If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged.
Write: Set bit´ to 0. This forces 1 into the latch.
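The read and write behaviour of the memory cell can be modelled at the logic level. The following sketch is an assumption-laden abstraction: the function names are hypothetical, bit-line voltages are reduced to 0 and 1, and writing a 0 by pulling bit low is the symmetric case of the write described above rather than something the slide states.

```python
def read_cell(stored: int):
    """Return the (bit, bit_bar) levels after a read of a cell holding `stored`."""
    bit, bit_bar = 1, 1          # both bit lines are precharged high
    if stored == 1:
        bit_bar = 0              # a stored 1 discharges bit'
    else:
        bit = 0                  # a stored 0 discharges bit
    return bit, bit_bar


def sense(bit: int, bit_bar: int) -> int:
    """Recover the stored value from the pair of bit-line levels."""
    return 1 if bit > bit_bar else 0


def write_cell(bit: int, bit_bar: int) -> int:
    """Return the value latched when one bit line is driven low."""
    if (bit, bit_bar) == (1, 0):
        return 1                 # bit' = 0 forces 1 into the latch
    if (bit, bit_bar) == (0, 1):
        return 0                 # symmetric case (assumed): bit = 0 forces 0
    raise ValueError("exactly one bit line must be driven low")


value = sense(*read_cell(write_cell(1, 0)))
```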
Decoder with Driver
Various Components
Comparator: XOR logic
Multiplexer: hierarchy for the offset. First get the block (from the output driver), then the word, then the byte
Output driver (inputs I0, I1, …, I7): at most one input bit is high. If the input is 0, the output is high impedance
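Two of these components translate directly into code. The sketch below shows an XOR-based tag comparison and a block, word, byte selection hierarchy; the function names and the 4-byte word size are assumptions made for illustration and are not given in the slides.

```python
def tags_match(stored_tag: int, address_tag: int) -> bool:
    """XOR comparator: the tags are equal when no bit position differs."""
    return (stored_tag ^ address_tag) == 0


def select_byte(block: bytes, offset: int, word_bytes: int = 4) -> int:
    """Multiplexer hierarchy: pick the word inside the block, then the byte."""
    word_index, byte_index = divmod(offset, word_bytes)
    start = word_index * word_bytes
    word = block[start:start + word_bytes]   # first-level select: word
    return word[byte_index]                  # second-level select: byte


block = bytes(range(128))                    # a 128-byte block with data[i] == i
selected = select_byte(block, 77)
```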
Banking
Idea: Support multiple cache accesses
Solutions:
  Use multiporting on bit cells (cost is high)
  Divide the cache into independent banks
Cache Search Steps:
1. Find the bank (bank index)
2. Find the set in the bank (index)
3. Check if the data is valid and in the cache (tag match)
4. If all OK, return the data (block and byte offset); else check lower-level memory
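The first two search steps, and the reason banking allows multiple accesses, can be illustrated as follows. This is a sketch under assumptions: the bank index is taken from the bits directly above the set index, the field widths (7 offset bits, 6 set bits, 3 bank bits) are example values, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bank_and_set(addr: int, offset_bits: int, set_bits: int,
                 bank_bits: int) -> Tuple[int, int]:
    """Return (bank, set_index) for an address (assumed bit layout)."""
    index = addr >> offset_bits                       # drop the block offset
    set_index = index & ((1 << set_bits) - 1)         # low index bits: set in bank
    bank = (index >> set_bits) & ((1 << bank_bits) - 1)
    return bank, set_index


def bank_conflicts(addrs: Iterable[int], offset_bits: int = 7,
                   set_bits: int = 6, bank_bits: int = 3) -> Dict[int, List[int]]:
    """Group simultaneous requests by bank; keep banks hit more than once."""
    by_bank: Dict[int, List[int]] = defaultdict(list)
    for addr in addrs:
        bank, _ = bank_and_set(addr, offset_bits, set_bits, bank_bits)
        by_bank[bank].append(addr)
    return {bank: reqs for bank, reqs in by_bank.items() if len(reqs) > 1}


# 0x0000 maps to bank 0; 0x2000 and 0x2080 both map to bank 1
conflicts = bank_conflicts([0x0000, 0x2000, 0x2080])
```

Requests that fall in different banks can proceed independently, while requests in the same bank (the returned groups) would have to be serialized.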
Case Study - Power 4
Dual-core, 64-bit processors
32 KB L1 D-Cache (per processor)
  2-way associative
  128-byte line
64 KB L1 I-Cache (per processor)
  Direct mapped
  128-byte line (4 sectors x 32 B)
~1.5 MB L2 Cache
  8-way set associative
  128-byte line
Power4 Floorplan
Power4 L2 Logical View
Cache split into 3 parts, 0.5 MB each
Controlled by 4 coherency processors
One 64 B store queue per processor
Power4 L2U
~512 KB
8 banks
128 B block size
8-way associative
[Figure labels: word lines, bit lines, decoders, address bus]
Power4 L2 Cache Block
Size C = 512 KB = 2^19 B
Block size = 128 B = 2^7 B
8-way associative
8 banks per cache block
Therefore:
  Set size is 2^3 * 2^7 B = 2^10 B
  Sets in cache are 2^19 / 2^10 = 2^9 sets
  Sets per bank are 2^9 / 2^3 = 2^6 sets
64-bit address: tag | index (9 bits) | offset (7 bits)
Index: bank index (3 bits) | set index (6 bits)
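The arithmetic above can be checked mechanically. The sketch below recomputes the set counts and field widths from the four stated parameters; the constant names are illustrative, and the 48-bit tag width is simply what remains of a 64-bit address after the index and offset, which the slide does not state explicitly.

```python
CACHE_BYTES = 512 * 1024   # C = 2^19 B
BLOCK_BYTES = 128          # 2^7 B
WAYS = 8                   # 8-way associative
BANKS = 8                  # 8 banks per cache block
ADDRESS_BITS = 64


def log2_exact(n: int) -> int:
    """Base-2 logarithm of a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


set_bytes = WAYS * BLOCK_BYTES              # 2^3 * 2^7 = 2^10 B
total_sets = CACHE_BYTES // set_bytes       # 2^19 / 2^10 = 2^9
sets_per_bank = total_sets // BANKS         # 2^9 / 2^3 = 2^6

offset_bits = log2_exact(BLOCK_BYTES)       # 7
index_bits = log2_exact(total_sets)         # 9
bank_bits = log2_exact(BANKS)               # 3
set_bits = log2_exact(sets_per_bank)        # 6
tag_bits = ADDRESS_BITS - index_bits - offset_bits   # derived, not from the slide

print(total_sets, sets_per_bank, offset_bits, index_bits, tag_bits)
```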
Power4: CACTI Results
cacti 524288 128 8 0.8um 8
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 8
  Total Cache Size: 524288
  Size in bytes of Subbank: 65536
  Number of sets: 64
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.3473
Cycle Time (wave pipelined) (ns): 4.97337
Total Power all Banks (nJ): 418.337
Total Power Without Routing (nJ): 198.563
Total Routing Power (nJ): 219.774
Maximum Bank Power (nJ): 63.5175
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
cacti 524288 128 8 0.8um 16
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 16
  Total Cache Size: 524288
  Size in bytes of Subbank: 32768
  Number of sets: 32
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.434
Cycle Time (wave pipelined) (ns): 4.85483
Total Power all Banks (nJ): 793.381
Total Power Without Routing (nJ): 341.424
Total Routing Power (nJ): 451.957
Maximum Bank Power (nJ): 63.1382
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
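A few of the reported numbers can be cross-checked against each other. The sketch below only re-derives values from the two runs shown above: it suggests that "Number of sets" is counted per subbank, and it computes the share of the reported total (labelled nJ by CACTI) that goes to routing. The dictionary keys and the `derive` helper are hypothetical names.

```python
runs = [
    {"subbanks": 8, "total_bytes": 524288, "assoc": 8, "block": 128,
     "reported_sets": 64, "total_nj": 418.337,
     "no_routing_nj": 198.563, "routing_nj": 219.774},
    {"subbanks": 16, "total_bytes": 524288, "assoc": 8, "block": 128,
     "reported_sets": 32, "total_nj": 793.381,
     "no_routing_nj": 341.424, "routing_nj": 451.957},
]


def derive(run: dict) -> dict:
    """Recompute per-subbank geometry and the routing share for one run."""
    subbank_bytes = run["total_bytes"] // run["subbanks"]
    sets_per_subbank = subbank_bytes // (run["assoc"] * run["block"])
    routing_share = run["routing_nj"] / run["total_nj"]
    return {
        "subbank_bytes": subbank_bytes,
        "sets_per_subbank": sets_per_subbank,
        "routing_share": routing_share,
    }


results = [derive(run) for run in runs]
for run, res in zip(runs, results):
    print(run["subbanks"], res["subbank_bytes"], res["sets_per_subbank"],
          round(res["routing_share"], 3))
```

In both runs the routing term exceeds half of the reported total, and doubling the subbank count roughly doubles the total while leaving the access time nearly unchanged.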
CACTI
Data Array
  Ndwl: Word line split factor
  Ndbl: Bit line split factor
  Nspd: Number of sets mapped to a single word line (sectors)
Tag Array
  Ntwl: Word line split factor
  Ntbl: Bit line split factor
  Ntspd: Number of sets mapped to a single word line (sectors)
Increasing Ndbl, Nspd, Ntbl, or Ntspd requires more sense amplifiers
Increasing Ndwl or Ntwl increases the number of word line drivers
Thank You