Caches in Systems
COMP 252 - Lecture 4
Antoniu Pop
antoniu.pop@manchester.ac.uk
8 February 2019
Antoniu Pop – Caches in Systems 1 / 18
Previous Lecture: practical and performance considerations
- Control bits
  - Valid & Dirty bits
- Locality tradeoffs and compromises
  - Impact of cache line size
  - Spatial vs. temporal locality
  - Separating Instruction & Data caches
- Multiple-level caches
  - Why, how
  - Performance model in a cache hierarchy
- First lab
Today’s Lecture: Learning Objectives
- “3 x C’s” model of cache performance
- Time penalties for starting with empty cache
- Systems interconnect issues with caching and solutions!
- Caching and Virtual Memory
Describing Cache Misses
- Compulsory Misses
  - Cold start
- Capacity Misses
  - Even with full associativity, cache cannot contain all the blocks of the program
- Conflict Misses
  - Multiple blocks compete for the same set.
  - This would not happen in fully associative cache
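
The three categories above can be told apart mechanically. The following is a minimal sketch, not the lecture's own method: it assumes LRU replacement and compares the real cache against a fully associative cache with the same number of lines. The function name `classify_misses` and its parameters are made up for illustration.

```python
from collections import OrderedDict

def classify_misses(block_trace, num_lines, ways=1):
    """Count hits and the three miss types for a trace of block numbers."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # real cache, LRU within each set
    full = OrderedDict()   # reference cache: fully associative LRU, same capacity
    seen = set()           # blocks referenced at least once so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Update the fully associative reference cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) == num_lines:
                full.popitem(last=False)      # evict least recently used
            full[block] = True

        # Update the real cache: a block can only live in one set.
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)
            counts["hit"] += 1
        else:
            if len(s) == ways:
                s.popitem(last=False)
            s[block] = True
            if block not in seen:
                counts["compulsory"] += 1     # cold start: first reference ever
            elif full_hit:
                counts["conflict"] += 1       # full associativity would have hit
            else:
                counts["capacity"] += 1       # even full associativity misses
        seen.add(block)
    return counts
```

With a 4-line direct-mapped cache, the trace `[0, 4, 0, 4]` gives two compulsory and two conflict misses (blocks 0 and 4 compete for set 0), while `[0, 1, 2, 3, 4, 0]` gives five compulsory misses and one capacity miss.
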
Cache Performance
With today’s caches, how long does it take:
- To fill the L3 cache? (8 MB)
- To fill the L2 cache? (256 KB)
- To fill the L1 D cache? (32 KB)
  - Number of lines = (cache size) / (line size)
  - Number of lines = 32K / 64 = 512
  - 512 memory accesses at 20 ns each ≈ 10 µs
  - About 20,000 clock cycles at 2 GHz
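
The same arithmetic can be repeated for the other two cache sizes. This is a rough sketch under simplifying assumptions that go beyond the slide: 64-byte lines at every level, each line costing one full 20 ns memory access with no overlap, and every level filled straight from memory. The function name `fill_time` is illustrative.

```python
def fill_time(cache_bytes, line_bytes=64, mem_access_ns=20.0, clock_ghz=2.0):
    """Return (lines, fill time in ns, fill time in clock cycles)."""
    lines = cache_bytes // line_bytes
    time_ns = lines * mem_access_ns      # one memory access per line, no overlap
    cycles = time_ns * clock_ghz         # 1 GHz = 1 cycle per ns
    return lines, time_ns, cycles

for name, size in [("L1 D", 32 * 1024), ("L2", 256 * 1024), ("L3", 8 * 1024 * 1024)]:
    lines, time_ns, cycles = fill_time(size)
    print(f"{name}: {lines} lines, {time_ns / 1000:.2f} us, {cycles:.0f} cycles")
```

For the L1 D cache this gives 512 lines, 10.24 µs and 20,480 cycles, which the slide rounds to 10 µs and 20,000 cycles. Under the same assumptions the L2 takes about 82 µs and the L3 about 2.6 ms.
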
Caches in Systems
[Figure: on-chip CPU with registers, L1 instruction cache (fetch), L1 data cache (data) and L2; an interconnect links the chip to RAM memory and I/O (e.g., network, disk, ...)]
- Typical I/O bandwidth?
- What could go wrong?
Cache Consistency Problem (1)
[Figure: the same system diagram, with copies of one location labelled 3, 3 and 5]
- Problem
  - I/O writes to memory
  - Cache data is no longer up-to-date
Cache Consistency Problem (2)
[Figure: the same system diagram, with copies of one location labelled 3, 5 and 5]
- Problem
  - I/O reads from memory
  - But the cache has a new, updated value
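
Both consistency problems can be reproduced with a toy model. This is a minimal sketch assuming a write-back cache and a device that accesses memory without going through the cache. The addresses and helper names are hypothetical. The values 3 (old) and 5 (new) echo the numbers in the diagrams, although which copy holds which value there is an assumption.

```python
memory = {0x100: 3, 0x200: 3}   # main memory: address -> value
cache = {}                      # CPU cache: address -> (value, dirty)

def cpu_read(addr):
    if addr not in cache:                     # miss: fetch from memory
        cache[addr] = (memory[addr], False)
    return cache[addr][0]

def cpu_write(addr, value):
    cache[addr] = (value, True)               # write-back: memory not updated yet

def io_write(addr, value):
    memory[addr] = value                      # device writes memory, bypassing the cache

def io_read(addr):
    return memory[addr]                       # device reads memory, bypassing the cache

# Problem (1): I/O writes to memory, so the cached copy goes stale.
cpu_read(0x100)                     # cache now holds 3
io_write(0x100, 5)                  # memory now holds 5
stale_cpu_value = cpu_read(0x100)   # the CPU still sees 3

# Problem (2): the CPU updates its cached copy, but I/O reads memory.
cpu_read(0x200)
cpu_write(0x200, 5)                 # new value lives only in the cache
stale_io_value = io_read(0x200)     # the device still sees 3
```
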
Cache Consistency Software Solutions
- O/S knows where I/O takes place in memory
  - Mark I/O areas as non-cacheable (how?)
- O/S knows when I/O starts and finishes
  - Clear caches before & after I/O?
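
One way to read "clear caches before & after I/O" is sketched below: write dirty lines back before a device reads a buffer, and discard cached copies after a device has written one. This is an illustrative model only. The class `WriteBackCache` and the methods `write_back_range` and `invalidate_range` are hypothetical names, not an operating-system or hardware API.

```python
class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                       # address -> [value, dirty]

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = [value, True]      # memory is updated only on write-back

    def write_back_range(self, addrs):
        """Before the device reads: copy dirty lines of the buffer out to memory."""
        for a in addrs:
            line = self.lines.get(a)
            if line and line[1]:
                self.memory[a] = line[0]
                line[1] = False

    def invalidate_range(self, addrs):
        """After the device writes: drop cached copies so reads refetch from memory."""
        for a in addrs:
            self.lines.pop(a, None)


memory = {0x100: 3, 0x200: 3}
cache = WriteBackCache(memory)

# Output: the CPU prepares data, then the device reads it from memory.
cache.write(0x200, 5)
cache.write_back_range([0x200])
device_saw = memory[0x200]

# Input: the device writes memory, then the CPU reads the buffer.
cache.read(0x100)                  # the CPU had an old copy cached
memory[0x100] = 5                  # device write
cache.invalidate_range([0x100])
cpu_saw = cache.read(0x100)
```

Both `device_saw` and `cpu_saw` end up as 5. In this model, the non-cacheable alternative would correspond to `read` and `write` bypassing `lines` entirely for addresses inside the I/O area.
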
Hardware Solution 1
[Figure: the system diagram for this solution, with copies of one location labelled 5, 5 and 5]
- Disadvantage
  - Slows down the cache
  - “Pollutes” the cache (replaces potentially useful data)
Hardware Solution 2 (Snooping)
[Figure: the system diagram with snoop logic attached to the cache; copies of one location labelled 5, 5 and 5]
- Snoop logic in the cache
  - Observes every memory cycle
  - Scalability issues
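
A snooping cache can be modelled as an observer of every bus transaction. The sketch below assumes a write-invalidate policy chosen only for illustration (the slide does not name one): a snooped access to a dirty line forces a write-back, and a snooped write also invalidates the line. The class names `Bus` and `SnoopingCache` are hypothetical.

```python
class SnoopingCache:
    def __init__(self, bus):
        self.lines = {}                        # address -> [value, dirty]
        self.bus = bus
        bus.snoopers.append(self)              # register to observe bus traffic

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch from memory
            self.lines[addr] = [self.bus.memory[addr], False]
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = [value, True]

    def snoop(self, op, addr):
        """Called for every memory cycle started by another bus master."""
        line = self.lines.get(addr)
        if line is None:
            return
        if line[1]:                            # dirty: memory is out of date
            self.bus.memory[addr] = line[0]
            line[1] = False
        if op == "write":                      # our copy is about to become stale
            del self.lines[addr]


class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.snoopers = []

    def io_read(self, addr):
        for c in self.snoopers:                # every cache checks every transaction
            c.snoop("read", addr)
        return self.memory[addr]

    def io_write(self, addr, value):
        for c in self.snoopers:
            c.snoop("write", addr)
        self.memory[addr] = value


bus = Bus({0x100: 3, 0x200: 3})
cache = SnoopingCache(bus)

cache.read(0x100)
bus.io_write(0x100, 5)             # snoop invalidates the cached 3
cpu_saw = cache.read(0x100)        # refetched from memory

cache.write(0x200, 5)
device_saw = bus.io_read(0x200)    # snoop forces a write-back first
```

The loops over `snoopers` hint at the scalability issue: each transaction costs a lookup in every cache attached to the bus.
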
Caches and Virtual Addresses
- CPU addresses: virtual
- Memory addresses: physical
- Recap...
  - Use Translation-Lookaside Buffer (TLB) to translate V-to-P
- What addresses in cache?
Option 1: Cache by Physical Addresses
[Figure: on-chip CPU (registers), TLB and cache ($), with address and data paths to RAM memory]
- Slow
  - Address translation is in series with cache
Option 2: Cache by Virtual Addresses
[Figure: on-chip CPU (registers), cache ($) and TLB, with address and data paths to RAM memory]
- More functional difficulties
  - Snooping
  - Aliasing
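
Aliasing arises when two virtual addresses map to the same physical location: a cache looked up purely by virtual address then holds two independent copies. The sketch below is a toy model under stated assumptions, namely a 4 KiB page size and a hypothetical two-entry page table. Snooping is harder for a related reason: addresses observed on the memory bus are physical, while the cache entries here are keyed by virtual address.

```python
PAGE = 4096  # assumed page size in bytes

# Hypothetical page table: two virtual pages mapped to the same physical page.
page_table = {0x10: 0x7, 0x20: 0x7}

def translate(vaddr):
    return page_table[vaddr // PAGE] * PAGE + vaddr % PAGE

memory = {}   # physical address -> value
vcache = {}   # virtually addressed cache: virtual address -> value

def cpu_read(vaddr):
    if vaddr not in vcache:                   # miss: translate, then fetch
        vcache[vaddr] = memory.get(translate(vaddr), 0)
    return vcache[vaddr]

def cpu_write(vaddr, value):
    vcache[vaddr] = value                     # write-back: stays in the cache for now

va1 = 0x10 * PAGE + 0x40    # two different virtual addresses...
va2 = 0x20 * PAGE + 0x40    # ...for one physical location

memory[translate(va1)] = 3
cpu_read(va1)
cpu_read(va2)               # both aliases are now cached as separate entries
cpu_write(va1, 5)
alias_value = cpu_read(va2) # the other alias still returns the old value
```
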
Option 3: Translate in Parallel with Cache Lookup
- Translation only affects high-order bits of address
  - Address within page remains unchanged
- Low-order bits of Physical Address = low-order bits of Virtual Address
- Select “index” field of cache address from within low-order bits
  - Only “Tag” bits changed by translation
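
Whether the index really fits inside the untranslated low-order bits depends on the cache geometry and the page size. The sketch below checks this for given parameters and splits an address into its fields. The 4 KiB page size, the 8-way associativity and the function names are assumptions for illustration; all sizes are taken to be powers of two.

```python
def vipt_split(cache_bytes, line_bytes, ways, page_bytes):
    """Return (offset_bits, index_bits, fits) for a set-associative cache.

    fits is True when offset + index bits lie within the page offset,
    i.e. the index is unchanged by address translation.
    """
    sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1      # log2 for powers of two
    index_bits = sets.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return offset_bits, index_bits, offset_bits + index_bits <= page_offset_bits

def fields(addr, offset_bits, index_bits):
    """Split an address into (tag, index, within-line offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 32 KiB, 64-byte lines, 8 ways, 4 KiB pages: 6 offset + 6 index bits = 12 bits.
print(vipt_split(32 * 1024, 64, 8, 4096))

# A virtual and a physical address sharing the same 12 low-order bits
# select the same set and byte; only the tag differs.
print(fields(0x12345ABC, 6, 6))
print(fields(0x00077ABC, 6, 6))
```

With these numbers the index and offset together use exactly the 12 page-offset bits, so the set can be selected while the TLB translates. A direct-mapped 32 KiB cache would need 9 index bits and would not fit.
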
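Option 3 in operation
[Figure: datapath for Option 3. Labels: virtual address split into virtual page address, index and within-line offset; TLB producing the high-order bits of the physical address; cache tag and data line; compare giving hit / miss; multiplexer giving data]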
Summary
- “3 x C’s” model of cache performance
- Systems interconnect issues with caching and solutions!
  - Non-cacheable areas
  - Cache flushing
  - Snooping
- Caching and Virtual Memory
  - Virtual-to-physical translation (TLB)
  - Cache architectures to support V-to-P translation
Caches Module Summary
- Why are caches essential?
  - Speed imbalance CPU vs. RAM
  - Performance
- How do caches work?
  - Locality
  - Associativity
  - Replacement policy
  - Line size
  - Cache lookup: how is data found (address splitting in tag, index, word ID, alignment)
  - Cache hierarchy
  - Cache misses (3 Cs)
- What is the impact of caches?
  - Performance model (including in a cache hierarchy)
  - Interaction with other system components
- Does this matter to you?