Non-Uniform Cache Architecture
Prof. Hsien-Hsin S. Lee
School of Electrical and Computer Engineering
Georgia Tech
Guest lecture for ECE4100/6100 for Prof. Yalamanchili
Non-Uniform Cache Architecture
• Proposed by UT-Austin at ASPLOS 2002
• Facts
– Large shared on-die L2
– Wire delay dominates on-die cache access latency
[Figure: L2 capacity and access latency by technology node]
– 180nm, 1999: 1MB, 3 cycles
– 90nm, 2004: 4MB, 11 cycles
– 50nm, 2010: 16MB, 24 cycles
Multi-banked L2 cache
• 2MB @ 130nm, bank = 128KB
• Bank access time = 3 cycles
• Interconnect delay = 8 cycles
• Access latency = 11 cycles (3 + 8)
Multi-banked L2 cache
• 16MB @ 50nm, bank = 64KB
• Bank access time = 3 cycles
• Interconnect delay = 44 cycles
• Access latency = 47 cycles (3 + 44)
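The two multi-banked configurations above split total latency into bank access time plus interconnect delay. A minimal sketch of that arithmetic follows, using only the cycle counts from the slides; the function names are illustrative and not from the lecture.

```python
# Latency decomposition for the two multi-banked L2 configurations.
# Cycle counts come from the slides; function names are illustrative.

def total_latency(bank_cycles, interconnect_cycles):
    """Total access latency = bank access time + interconnect delay."""
    return bank_cycles + interconnect_cycles

def interconnect_share(bank_cycles, interconnect_cycles):
    """Fraction of the total latency spent on the interconnect."""
    return interconnect_cycles / total_latency(bank_cycles, interconnect_cycles)

configs = {
    "2MB @ 130nm": (3, 8),    # (bank access, interconnect delay) in cycles
    "16MB @ 50nm": (3, 44),
}

for name, (bank, wire) in configs.items():
    share = interconnect_share(bank, wire)
    print(f"{name}: {total_latency(bank, wire)} cycles, {share:.0%} interconnect")
```

The bank access time stays at 3 cycles in both cases, so the growth from 11 to 47 cycles comes entirely from the interconnect.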
Static NUCA-1
• Uses a private channel per bank
• Each bank has its own distinct access latency
• Data location is decided statically from the address
• Average access latency = 34.2 cycles
• Wire area overhead = 20.9%, which is an issue
[Figure: S-NUCA-1 bank organization, showing tag array, data bus, address bus, bank, sub-bank, predecoder, sense amplifier, and wordline driver and decoder]
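Static placement means the bank is a pure function of the address, and each bank has its own fixed latency. The sketch below illustrates the idea under loud assumptions: the 64-byte line size, the 32-bank count, the low-order interleaving, and the linear latency model are all hypothetical choices, not values from the slides.

```python
# Static address-to-bank mapping, in the spirit of S-NUCA.
# LINE_BYTES, NUM_BANKS, and the latency model are assumptions for illustration.

LINE_BYTES = 64   # assumed cache line size
NUM_BANKS = 32    # assumed number of banks

def bank_for_address(addr):
    """Pick the bank from the low-order bits of the line number (assumed scheme)."""
    line_number = addr // LINE_BYTES
    return line_number % NUM_BANKS

def bank_latency(bank, base_cycles=3, cycles_per_step=1):
    """Hypothetical model: latency grows linearly with the bank index,
    standing in for physical distance from the cache controller."""
    return base_cycles + cycles_per_step * bank

addr = 0x1240
bank = bank_for_address(addr)
print(f"address {addr:#x} -> bank {bank}, {bank_latency(bank)} cycles")
```

Because the mapping never changes, a given address always sees the same latency, whether or not it is frequently used.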
Static NUCA-2
• Uses a 2D switched network to alleviate wire area overhead
• Average access latency = 24.2 cycles
• Wire overhead = 5.9%
[Figure: S-NUCA-2 bank organization, showing bank, data bus, switch, tag array, wordline driver and decoder, and predecoder]
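With a 2D switched network, the latency to a bank depends on how many switches a request crosses. The sketch below assumes a grid of banks, a controller at one corner, dimension-ordered routing, and a fixed per-hop cost; all of these parameters are hypothetical, and the sketch is not meant to reproduce the 24.2-cycle average from the slide.

```python
# Per-bank latency on a 2D switched network (illustrative model only).
# Grid shape, controller position, and per-hop cost are assumptions.

def hops(bank_xy, controller_xy=(0, 0)):
    """Manhattan distance between a bank and the cache controller."""
    (bx, by), (cx, cy) = bank_xy, controller_xy
    return abs(bx - cx) + abs(by - cy)

def access_latency(bank_xy, bank_cycles=3, cycles_per_hop=1, controller_xy=(0, 0)):
    """Bank access time plus a fixed cost per network hop (one-way, assumed)."""
    return bank_cycles + cycles_per_hop * hops(bank_xy, controller_xy)

def average_latency(cols, rows, **kwargs):
    """Mean latency over all banks, assuming uniformly distributed accesses."""
    latencies = [access_latency((x, y), **kwargs)
                 for x in range(cols) for y in range(rows)]
    return sum(latencies) / len(latencies)

print(access_latency((3, 2)))
print(average_latency(8, 4))
```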
Dynamic NUCA
• Data can migrate dynamically
• Frequently used cache lines move closer to the CPU
Dynamic NUCA
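• Simple Mapping
• All 4 ways of each bank set need to be searched
• Farther bank sets have longer access times
[Figure: simple mapping with 8 bank sets; each bank set spans way 0 to way 3, one bank per way; one set is marked across the banks of a bank set]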
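Simple mapping can be sketched as: the address selects one of the 8 bank sets, and the lookup then has to check the bank holding each of the 4 ways. The bank-set and way counts come from the slide; the 64-byte line size and the choice of address bits are assumptions.

```python
# Simple mapping: one bank set per address, one bank per way.
# NUM_BANK_SETS and WAYS follow the slide; LINE_BYTES and the bit choice are assumed.

NUM_BANK_SETS = 8
WAYS = 4
LINE_BYTES = 64  # assumed cache line size

def bank_set_for_address(addr):
    """Select the bank set from the low-order bits of the line number (assumed)."""
    return (addr // LINE_BYTES) % NUM_BANK_SETS

def banks_to_search(addr):
    """All (bank_set, way) banks that may hold the line; every way must be checked."""
    bank_set = bank_set_for_address(addr)
    return [(bank_set, way) for way in range(WAYS)]

print(banks_to_search(0x40))
```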
Dynamic NUCA
• Fair Mapping
• Average access times across all bank sets are equal
[Figure: fair mapping with 8 bank sets; each bank set spans way 0 to way 3, one bank per way; one set is marked]
Dynamic NUCA
• Shared Mapping
• The closest banks are shared among the farther banks
[Figure: shared mapping with 8 bank sets; each bank set spans way 0 to way 3, one bank per way]