Cache Physical Implementation
Panayiotis Charalambous, Xi Research Group


Contents

Cache Logical View
Physical View
Case Study – Power 4 L2 Cache

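Logical Cache Structure

n-way set associative cache: n elements per set, $2^m$ sets

Address (32 bits): Tag ($32 - m - k$ bits) | Index ($m$ bits) | Offset ($k$ bits)

[Figure: the index selects a set, one comparator (=) per way checks the stored tag against the address tag, the comparator outputs are ORed to form Hit, and the matching way drives Data.]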
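The address split above can be sketched in Python as follows, assuming the offset occupies the low k bits, the index the next m bits and the tag the remaining 32 − m − k bits. The function name `split_address` is hypothetical.

```python
from typing import Tuple


def split_address(addr: int, m: int, k: int) -> Tuple[int, int, int]:
    """Split a 32-bit address into (tag, index, offset).

    m: index bits (the cache has 2**m sets)
    k: offset bits (each block holds 2**k bytes)
    """
    if not 0 <= addr < 2**32:
        raise ValueError("address must fit in 32 bits")
    offset = addr & ((1 << k) - 1)        # low k bits: byte within the block
    index = (addr >> k) & ((1 << m) - 1)  # next m bits: which set
    tag = addr >> (m + k)                 # remaining 32 - m - k bits
    return tag, index, offset


# Example: 64 sets (m = 6) of 32-byte blocks (k = 5) leave a 21-bit tag
tag, index, offset = split_address(0x12345678, m=6, k=5)
```

Recombining `(tag << (m + k)) | (index << k) | offset` gives back the original address, which is a quick way to check the field widths.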

Cache Structure

Cache Access Steps

1. Decode the address.
2. Enable the word line.
3. Raise (precharge) the bit lines high.
4. Read the tag value from the tag array.
5. Check for a tag match.
6. Select the data output.
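A behavioral sketch of the six access steps, assuming a toy n-way cache whose tag and data arrays are Python lists. The electrical steps (word line, bit-line precharge) have no software equivalent and appear only as comments. `SetAssociativeCache`, `fill` and `read_byte` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SetAssociativeCache:
    """Toy n-way cache with separate tag and data arrays (hypothetical)."""
    ways: int         # n elements per set
    index_bits: int   # m: the cache has 2**m sets
    offset_bits: int  # k: each block holds 2**k bytes
    tags: List[List[Optional[int]]] = field(init=False)
    data: List[List[bytes]] = field(init=False)

    def __post_init__(self) -> None:
        sets = 1 << self.index_bits
        empty = bytes(1 << self.offset_bits)
        self.tags = [[None] * self.ways for _ in range(sets)]  # None = invalid
        self.data = [[empty] * self.ways for _ in range(sets)]

    def _decode(self, addr: int):
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.index_bits + self.offset_bits)
        return tag, index, offset

    def fill(self, addr: int, block: bytes, way: int) -> None:
        tag, index, _ = self._decode(addr)
        self.tags[index][way] = tag
        self.data[index][way] = block

    def read_byte(self, addr: int) -> Optional[int]:
        tag, index, offset = self._decode(addr)         # 1. decode address
        # 2-3. In hardware the word line of row `index` is enabled and the
        #      precharged bit lines are sensed; here we simply pick that row
        tag_row, data_row = self.tags[index], self.data[index]
        for way, stored in enumerate(tag_row):          # 4. read the tags
            if stored == tag:                           # 5. tag match
                return data_row[way][offset]            # 6. select output
        return None                                     # miss


cache = SetAssociativeCache(ways=2, index_bits=4, offset_bits=3)
cache.fill(0x1238, bytes(range(8)), way=1)
hit = cache.read_byte(0x123B)   # same block, byte offset 3
miss = cache.read_byte(0x2238)  # same set, different tag
```

The tag check loops over the ways sequentially only for readability; in hardware all n comparisons happen in parallel.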

Conventional Cache Organization

[Figure: conventional cache organization; label: Memory Cell]

Memory Cell

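[Figure: memory cell connected to the complementary bit lines bit and bit´]

Read: set (precharge) bit and bit´ high. If the value in the cell is 1, bit´ is discharged; if the value is 0, bit is discharged.

Write: set bit´ to 0. This forces a 1 into the latch. (Writing a 0 is symmetric: set bit to 0.)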
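The read and write conventions above can be modeled at the logic level as below. This is an idealized sketch: it tracks only the 0/1 level on bit and bit´ (written `bit_bar`), not voltages, timing or the sense amplifier circuit, and `MemoryCell` is a hypothetical name.

```python
class MemoryCell:
    """Idealized latch cell on a complementary bit-line pair (hypothetical)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value  # logic value held in the latch

    def read(self) -> int:
        bit, bit_bar = 1, 1          # both bit lines start high
        if self.value == 1:
            bit_bar = 0              # a stored 1 discharges bit'
        else:
            bit = 0                  # a stored 0 discharges bit
        # A sense amplifier turns the difference between the lines into a value
        return 1 if bit > bit_bar else 0

    def write(self, bit: int, bit_bar: int) -> None:
        if (bit, bit_bar) == (1, 0):
            self.value = 1           # bit' driven to 0 forces a 1 into the latch
        elif (bit, bit_bar) == (0, 1):
            self.value = 0           # symmetric case: bit driven to 0 stores a 0
        else:
            raise ValueError("a write drives exactly one bit line low")


cell = MemoryCell(0)
cell.write(bit=1, bit_bar=0)
```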

Decoder with Driver

Various Components

Comparator: XOR logic.

Multiplexer hierarchy for the offset: first select the block (from the output driver), then the word, then the byte.

Output driver: at most one of the inputs I0, I1, …, I7 is high at a time; a driver whose input is 0 presents a high-impedance output.
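A functional sketch of these three components, assuming 8 ways (inputs I0 to I7), 128-byte blocks and 8-byte words; the word size is an assumption, as the slide does not give one. `None` stands in for the high-impedance state, and all function names are hypothetical.

```python
from typing import List, Optional


def tags_equal(a: int, b: int) -> bool:
    # XOR yields 1 wherever the two tags differ; all zeros means a match
    return (a ^ b) == 0


def output_driver(enables: List[int], inputs: List[bytes]) -> Optional[bytes]:
    """Tri-state output stage for the 8 ways (enables I0..I7).

    At most one enable may be high. Returns the enabled block, or None to
    model the high-impedance state when every enable is 0.
    """
    if sum(enables) > 1:
        raise ValueError("more than one driver enabled (bus contention)")
    for enable, block in zip(enables, inputs):
        if enable:
            return block
    return None


def select_byte(block: bytes, offset: int, word_bytes: int = 8) -> int:
    # Multiplexer hierarchy: the block comes from the output driver,
    # then the word is selected, then the byte within that word
    word_index, byte_in_word = divmod(offset, word_bytes)
    word = block[word_index * word_bytes:(word_index + 1) * word_bytes]
    return word[byte_in_word]


# Usage: compare the address tag with the 8 stored tags, then drive and select
stored_tags = [5, 9, 12, 7, 1, 0, 3, 2]
blocks = [bytes((16 * way + i) % 256 for i in range(128)) for way in range(8)]
enables = [1 if tags_equal(t, 12) else 0 for t in stored_tags]
value = select_byte(output_driver(enables, blocks), offset=19)
```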

Banking

Idea: support multiple simultaneous cache accesses.

Solution: use multiporting on the bit cells (high cost), or divide the cache into independent banks so that accesses to different banks can proceed in parallel.
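To illustrate why banking helps, the sketch below serves at most one request per bank per cycle and stalls the rest. It assumes single-ported banks and takes the bank index from the address bits just above the set index, following the field order in the Power4 L2 case study later in this deck (7 offset bits, 6 set-index bits, 3 bank bits); `schedule_cycle` is a hypothetical helper.

```python
from typing import List, Tuple


def schedule_cycle(addrs: List[int], num_banks: int = 8, offset_bits: int = 7,
                   set_bits: int = 6) -> Tuple[List[int], List[int]]:
    """Split one cycle's requests into (served, stalled).

    Each bank is assumed single-ported: it serves at most one request per
    cycle, while requests to different banks proceed in parallel.
    """
    busy = set()
    served: List[int] = []
    stalled: List[int] = []
    for addr in addrs:
        bank = (addr >> (offset_bits + set_bits)) % num_banks
        if bank in busy:
            stalled.append(addr)  # bank conflict: retry in a later cycle
        else:
            busy.add(bank)
            served.append(addr)
    return served, stalled
```

For example, requests to 0x0000 and 0x2000 fall in banks 0 and 1 and are served together, whereas 0x10000 maps back to bank 0 and must wait. A multiported array would serve all three, at the higher cost noted above.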

Cache Search Steps:

1. Find the bank (bank index).
2. Find the set in the bank (set index).
3. Check that the data is valid and in the cache (tag match).
4. If both hold, return the data (using the block and byte offset); otherwise, check the lower-level memory.
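The four search steps map onto a lookup like the one below. It is a sketch under stated assumptions: the address layout is `tag | bank index | set index | offset`, each way stores a valid bit, and the lower level is any Python callable that returns a byte. `BankedCache`, `install` and `read_byte` are hypothetical names.

```python
from typing import Callable, List, Tuple

Entry = Tuple[bool, int, bytes]  # (valid, tag, block)


class BankedCache:
    """Toy banked, set-associative cache (hypothetical model).

    Assumed address layout: | tag | bank index | set index | offset |
    """

    def __init__(self, bank_bits: int, set_bits: int, offset_bits: int,
                 ways: int, lower_level: Callable[[int], int]) -> None:
        self.bank_bits = bank_bits
        self.set_bits = set_bits
        self.offset_bits = offset_bits
        self.lower_level = lower_level  # consulted on a miss
        # banks[bank][set] holds `ways` entries, all invalid at start
        self.banks: List[List[List[Entry]]] = [
            [[(False, 0, b"")] * ways for _ in range(1 << set_bits)]
            for _ in range(1 << bank_bits)
        ]

    def _fields(self, addr: int) -> Tuple[int, int, int, int]:
        offset = addr & ((1 << self.offset_bits) - 1)
        set_index = (addr >> self.offset_bits) & ((1 << self.set_bits) - 1)
        bank = (addr >> (self.offset_bits + self.set_bits)) & ((1 << self.bank_bits) - 1)
        tag = addr >> (self.offset_bits + self.set_bits + self.bank_bits)
        return tag, bank, set_index, offset

    def install(self, addr: int, block: bytes, way: int) -> None:
        tag, bank, set_index, _ = self._fields(addr)
        self.banks[bank][set_index][way] = (True, tag, block)

    def read_byte(self, addr: int) -> int:
        tag, bank, set_index, offset = self._fields(addr)
        entries = self.banks[bank][set_index]        # steps 1 and 2
        for valid, stored_tag, block in entries:
            if valid and stored_tag == tag:          # step 3
                return block[offset]                 # step 4: hit
        return self.lower_level(addr)                # step 4: miss


# Usage: Power4-like geometry (8 banks, 64 sets per bank, 128 B blocks, 8 ways)
cache = BankedCache(3, 6, 7, ways=8, lower_level=lambda a: a & 0xFF)
cache.install(0x12345, bytes([0xAA]) * 128, way=0)
```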

Case Study - Power 4

Dual-core, 64-bit processors
32 KB L1 D-cache (per processor): 2-way set associative, 128-byte lines
64 KB L1 I-cache (per processor): direct mapped, 128-byte lines (4 sectors x 32 B)
~1.5 MB L2 cache: 8-way set associative, 128-byte lines
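The set counts implied by these parameters follow from sets = size / (ways × line size). The sketch below computes them, assuming 1 KB = 1024 B and treating the "~1.5MB" L2 as exactly 1.5 MB; the helper name `geometry` is hypothetical.

```python
import math

KB = 1024


def geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Number of sets and offset/index bit widths of a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": int(math.log2(line_bytes)),
        # index bits only make sense when the set count is a power of two
        "index_bits": int(math.log2(sets)) if sets & (sets - 1) == 0 else None,
    }


l1d = geometry(32 * KB, ways=2, line_bytes=128)   # 128 sets
l1i = geometry(64 * KB, ways=1, line_bytes=128)   # direct mapped: 512 sets
l2 = geometry(1536 * KB, ways=8, line_bytes=128)  # 1536 sets
```

The L2's 1536 sets are not a power of two, which is consistent with the L2 being built from three 512 KB parts of $2^9$ sets each, as the following slides describe.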

Power4 Floorplan

Power4 L2 Logical View

Cache split into 3 parts, 0.5 MB each
Controlled by 4 coherency processors
One 64 B store queue per processor

Power4 L2U: ~512 KB, 8 banks, 128 B block size, 8-way set associative

[Figure labels: word lines, bit lines, decoders, address bus]

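Power4 L2 Cache

Cache size $C = 512\text{ KB} = 2^{19}$ B
Block size $= 128\text{ B} = 2^{7}$ B
8-way set associative
8 banks

Therefore:
Set size $= 2^{3} \cdot 2^{7}\text{ B} = 2^{10}$ B
Sets in cache $= 2^{19} / 2^{10} = 2^{9}$ sets
Sets per bank $= 2^{9} / 2^{3} = 2^{6}$ sets

64-bit address (bit 63 is the most significant):
tag (48 bits) | index (9 bits: 3-bit bank index + 6-bit set index) | offset (7 bits)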
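The arithmetic above, and the resulting 64-bit address split, can be checked with a few lines of Python. The constants come from the slide; `decompose` is a hypothetical helper, and placing the bank index above the set index follows the slide's left-to-right field order rather than any documented Power4 address mapping.

```python
ADDRESS_BITS = 64
CACHE_BYTES = 2**19      # 512 KB
BLOCK_BYTES = 2**7       # 128 B
WAYS = 2**3              # 8-way set associative
BANKS = 2**3             # 8 banks

set_bytes = WAYS * BLOCK_BYTES               # 2**10 B
sets = CACHE_BYTES // set_bytes              # 2**9 sets
sets_per_bank = sets // BANKS                # 2**6 sets

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 7
SET_BITS = sets_per_bank.bit_length() - 1    # 6
BANK_BITS = BANKS.bit_length() - 1           # 3
TAG_BITS = ADDRESS_BITS - BANK_BITS - SET_BITS - OFFSET_BITS  # 48


def decompose(addr: int):
    """Split a 64-bit address into (tag, bank index, set index, offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (sets_per_bank - 1)
    bank = (addr >> (OFFSET_BITS + SET_BITS)) & (BANKS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS + BANK_BITS)
    return tag, bank, set_index, offset
```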

Power4: CACTI Results

cacti 524288 128 8 0.8um 8

---------- CACTI version 3.2 ----------

Cache Parameters:
Number of Subbanks: 8
Total Cache Size: 524288
Size in bytes of Subbank: 65536
Number of sets: 64
Associativity: 8
Block Size (bytes): 128
Read/Write Ports: 1
Read Ports: 0
Write Ports: 0
Technology Size: 0.80um
Vdd: 4.5V

Access Time (ns): 12.3473
Cycle Time (wave pipelined) (ns): 4.97337
Total Power all Banks (nJ): 418.337
Total Power Without Routing (nJ): 198.563
Total Routing Power (nJ): 219.774
Maximum Bank Power (nJ): 63.5175

Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2

cacti 524288 128 8 0.8um 16

---------- CACTI version 3.2 ----------

Cache Parameters:
Number of Subbanks: 16
Total Cache Size: 524288
Size in bytes of Subbank: 32768
Number of sets: 32
Associativity: 8
Block Size (bytes): 128
Read/Write Ports: 1
Read Ports: 0
Write Ports: 0
Technology Size: 0.80um
Vdd: 4.5V

Access Time (ns): 12.434
Cycle Time (wave pipelined) (ns): 4.85483
Total Power all Banks (nJ): 793.381
Total Power Without Routing (nJ): 341.424
Total Routing Power (nJ): 451.957
Maximum Bank Power (nJ): 63.1382

Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
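Two fields of the CACTI reports follow directly from the command-line parameters: the subbank size is the total size divided by the number of subbanks, and the reported number of sets equals the sets per subbank. The sketch below reproduces them and compares the two runs; `cacti_geometry` is a hypothetical helper, and the numbers in `runs` are copied from the output above.

```python
def cacti_geometry(cache_bytes: int, block_bytes: int, assoc: int, subbanks: int):
    """Return (subbank size in bytes, sets per subbank) for a CACTI run."""
    subbank_bytes = cache_bytes // subbanks
    sets_per_subbank = subbank_bytes // (block_bytes * assoc)
    return subbank_bytes, sets_per_subbank


# Values copied from the two CACTI 3.2 reports above, keyed by subbank count
runs = {
    8: {"access_ns": 12.3473, "total_nj": 418.337, "routing_nj": 219.774},
    16: {"access_ns": 12.434, "total_nj": 793.381, "routing_nj": 451.957},
}

for subbanks, r in runs.items():
    size, n_sets = cacti_geometry(524288, 128, 8, subbanks)
    print(f"{subbanks} subbanks: {size} B/subbank, {n_sets} sets, "
          f"access {r['access_ns']} ns, total {r['total_nj']} nJ")

power_ratio = runs[16]["total_nj"] / runs[8]["total_nj"]
```

With these values, going from 8 to 16 subbanks changes the access time only slightly (12.3473 ns to 12.434 ns), while the reported total power rises by about 1.9x, most of the increase coming from routing (219.774 nJ to 451.957 nJ).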

CACTI Data Array

Ndwl: word line split factor
Ndbl: bit line split factor
Nspd: number of sets mapped to a single word line (sectors)

Tag Array

Ntwl: word line split factor
Ntbl: bit line split factor
Ntspd: number of sets mapped to a single word line (sectors)

Increasing Ndbl, Nspd, Ntbl or Ntspd requires more sense amplifiers. Increasing Ndwl or Ntwl increases the number of word line drivers.
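The effect of the split factors on sense amplifiers and word line drivers can be illustrated with a simple counting model. The sketch assumes the data array is cut into Ndwl × Ndbl subarrays of C/(B·A·Ndbl·Nspd) rows and 8·B·A·Nspd/Ndwl columns (C = cache size in bytes, B = block size, A = associativity), the organization described in the CACTI technical reports, with one driver per row and one sense amplifier per column; column multiplexing is ignored, and `data_array_counts` is a hypothetical name.

```python
def data_array_counts(C: int, B: int, A: int, Ndwl: int, Ndbl: int, Nspd: int) -> dict:
    """Subarray geometry and rough component counts for the data array.

    Assumed model: Ndwl x Ndbl subarrays, each with C/(B*A*Ndbl*Nspd) rows
    and 8*B*A*Nspd/Ndwl columns; one word line driver per row and one sense
    amplifier per column of every subarray.
    """
    rows = C // (B * A * Ndbl * Nspd)
    cols = 8 * B * A * Nspd // Ndwl
    subarrays = Ndwl * Ndbl
    return {
        "rows": rows,
        "cols": cols,
        "word_line_drivers": rows * subarrays,
        "sense_amps": cols * subarrays,
    }
```

In this model, doubling Ndbl or Nspd doubles `sense_amps`, while doubling Ndwl doubles `word_line_drivers`, matching the slide. The total bit count `rows * cols * subarrays` always equals 8·C.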

Thank You