LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE
OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE
By Hongbin Sun, Nanning Zheng, and Tong Zhang
Joseph Schneider, March 23, 2010
The Problem
As CMOS technology shrinks, random defects increase
Traditionally, these defects are handled with redundant rows, columns, and words that replace the defective ones
As random defects increase, the traditional defect-tolerance strategy may no longer be sufficient
The Solution
Extend the role of Error-Correcting Codes to compensate for defects
Error-Correcting Codes (ECC) also used to compensate for transient soft errors
Find a method that allows ECCs to be used for both defects and soft errors
Multi-bit ECC
Multi-bit ECC – ECC that can correct multiple errors in one codeword
Suffers from higher latency and higher coding redundancy than single error correction
Therefore unusable in L1 cache without suffering major performance issues
Overall Goal
Implement multi-bit ECC in L2 cache design to correct L2 cache defects without causing significant IPC degradation, area use, or energy cost
Steps to Success
1. Apply multi-bit ECC only to cache blocks that require it
2. Implement buffers to limit repeated use of multi-bit ECC
3. Ensure data integrity against soft errors where ECC alone can no longer compensate for them
Limited multi-bit ECC
Cache blocks with one or more defective cells are identified during memory testing; multi-bit ECC is then applied selectively
Content-Addressable Memory (CAM) then used to identify blocks requiring multi-bit ECC (referred to as m-blocks)
ISSUE: CAM requires large energy consumption
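The selection step can be sketched as a small classifier over a defect map produced by memory testing. This is an illustrative sketch, not the authors' implementation: the function name, the defect map, and the per-subblock granularity are assumptions, and the thresholds (SEC-DED for at most one defect, multi-bit ECC for two, redundancy beyond that) are taken from the evaluation setup described later in these slides.

```python
from collections import Counter


def protection_for(defect_count: int, max_correctable: int = 2) -> str:
    """Pick a protection scheme for one subblock from its defect count.

    max_correctable=2 assumes a double-error-correcting code (DEC-TED).
    """
    if defect_count < 0:
        raise ValueError("defect_count must be non-negative")
    if defect_count <= 1:
        return "SEC-DED"      # the default code already covers zero or one defect
    if defect_count <= max_correctable:
        return "M-ECC"        # m-block: needs multi-bit ECC check bits
    return "REDUNDANCY"       # beyond the ECC capability: repair with a spare


# Hypothetical defect map from memory testing: subblock index -> defect count
defect_map = {0: 0, 1: 1, 2: 2, 3: 0, 4: 3}
plan = {idx: protection_for(n) for idx, n in defect_map.items()}
summary = Counter(plan.values())
print(plan)
print(summary)
```

Only the subblocks mapped to "M-ECC" would need an entry in the structure that tracks m-blocks, which is what keeps that structure small.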
Proposed Architecture
Standard L2 cache core protecting all subblocks with single error correction, double error detection (SEC-DED) codes
Multi-bit ECC core using fully associative multi-bit ECC cache (M-ECC cache), ECC encoder/decoder, and two buffers. M-ECC cache contains location tags and corresponding check bits
Dirty Replication Cache to ensure soft error tolerance
Multi-bit ECC Core
In case of write, subblock data encoded and check bits stored
In case of read, check bits fetched and decoded
ISSUE: Constant use of multi-bit ECC will increase latency and energy consumption at higher defect densities
Solution: Two additional buffers
Multi-bit ECC Core Buffers
Pre-decoding Buffer: Small cache that keeps copies of the most recently accessed m-blocks; searched before accessing the M-ECC cache
Employs least recently used (LRU) policy for replacement when full; Successful due to cache access temporal locality
Reduces large amount of ECC decoding and some M-ECC cache access
Multi-bit ECC Core Buffers
FLU buffer – small CAM that keeps addresses of recently accessed cache blocks that are NOT m-blocks
Also employs LRU policy
Further reduces M-ECC cache access
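The read path through the two buffers can be sketched as follows. This is a minimal software model under stated assumptions, not the hardware design: the class and method names are hypothetical, the buffers are probed one after the other (real hardware may probe them in parallel), the decoder is passed in as a stand-in function, and write handling (updating or invalidating buffer entries) is omitted.

```python
from collections import OrderedDict


class LRU:
    """Tiny fixed-capacity map with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used


class MEccCore:
    """Hypothetical model of the multi-bit ECC core's read path."""

    def __init__(self, m_ecc_cache, pbuf_size=4, flu_size=4):
        self.m_ecc_cache = m_ecc_cache       # addr -> check bits (m-blocks only)
        self.pbuf = LRU(pbuf_size)           # addr -> already-decoded data
        self.flu = LRU(flu_size)             # addr -> True for known non-m-blocks
        self.stats = {"pbuf_hit": 0, "flu_hit": 0, "m_ecc_lookup": 0, "decode": 0}

    def read(self, addr, raw_data, decode):
        cached = self.pbuf.get(addr)
        if cached is not None:               # recently decoded m-block
            self.stats["pbuf_hit"] += 1
            return cached
        if self.flu.get(addr):               # known not to be an m-block
            self.stats["flu_hit"] += 1
            return raw_data
        self.stats["m_ecc_lookup"] += 1      # costly fully associative search
        check_bits = self.m_ecc_cache.get(addr)
        if check_bits is None:
            self.flu.put(addr, True)         # remember: not an m-block
            return raw_data
        self.stats["decode"] += 1            # slow multi-bit ECC decode
        data = decode(raw_data, check_bits)
        self.pbuf.put(addr, data)
        return data


# Usage: address 0x10 is an m-block, 0x20 is not; each is read twice.
core = MEccCore(m_ecc_cache={0x10: "chk"})
fake_decode = lambda raw, chk: raw.upper()   # stand-in for the real decoder
for addr in [0x10, 0x10, 0x20, 0x20]:
    core.read(addr, "data", fake_decode)
print(core.stats)
```

In this model the second read of each address is served by a buffer, so the M-ECC cache is searched twice and the decoder runs once for four reads.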
Soft Error Tolerance
ISSUE: When ECC devoted to defect tolerance, defective subblock is vulnerable to soft errors
Only necessary for blocks containing defects (including blocks with single defects protected by SEC-DED rather than multi-bit ECC)
Further, only necessary when the cache block is dirty; clean blocks can be re-fetched from memory when a soft error is detected
Dirty Replication Cache
Use of Dirty Replication (DR) cache
When cache block made dirty, data is also kept in this cache
When data leaves this cache, a write is performed to main memory
Ensures a backup is always available
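The behavior described above can be modeled in a few lines. This is a sketch under assumptions: the class name, the dictionary standing in for main memory, and the oldest-written-first replacement policy are all illustrative choices, since the slides do not specify how the DR cache picks a victim.

```python
from collections import OrderedDict


class DirtyReplicationCache:
    """Hypothetical model: keeps a replica of every recently dirtied block."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory                 # dict standing in for main memory
        self.replicas = OrderedDict()        # addr -> replica of the dirty data

    def on_dirty_write(self, addr, data):
        """Called whenever an L2 block is made dirty."""
        self.replicas[addr] = data
        self.replicas.move_to_end(addr)
        if len(self.replicas) > self.capacity:
            # Assumed policy: evict the entry written longest ago.
            old_addr, old_data = self.replicas.popitem(last=False)
            self.memory[old_addr] = old_data  # write to memory on the way out

    def recover(self, addr):
        """Return a good copy after an uncorrectable soft error in L2."""
        if addr in self.replicas:
            return self.replicas[addr]
        return self.memory[addr]             # evicted entries were written back


# Usage: a two-entry DR cache receiving three dirty writes.
memory = {}
dr = DirtyReplicationCache(capacity=2, memory=memory)
dr.on_dirty_write(1, "a")
dr.on_dirty_write(2, "b")
dr.on_dirty_write(3, "c")                    # evicts address 1 and writes it back
print(memory, dr.recover(1), dr.recover(3))
```

Either the replica or main memory always holds the latest data for a dirtied block, which is the backup guarantee the slide refers to.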
Evaluation
Cache defect density set at 0.5%
Multi-ECC: BCH-based DEC-TED code (double error correction, triple error detection); Subblocks with more than two errors repaired by redundancy
Cache subblocks contain 64 bits
BCH DEC-TED decoder has parallelism of 2, uses PGZ decoding algorithm, resulting in a latency of 82 cycles
CACTI 5 used to model caches; a Verilog implementation showed the extra logic is 0.2% of the area of the L2 cache core
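To give a feel for the coding redundancy of the two codes on 64-bit subblocks, the check-bit counts can be estimated from textbook bounds. The numbers below are not taken from the slides: the function names are hypothetical, the SEC-DED count uses the Hamming bound plus one overall parity bit, and the DEC-TED count assumes a shortened binary BCH code with at most m·t parity bits plus one overall parity bit.

```python
def sec_ded_check_bits(k: int) -> int:
    """Check bits for an extended Hamming (SEC-DED) code over k data bits."""
    r = 1
    while 2 ** r < k + r + 1:      # Hamming condition for single-error correction
        r += 1
    return r + 1                   # one extra parity bit for double-error detection


def bch_dec_ted_check_bits(k: int, t: int = 2) -> int:
    """Check bits for a shortened BCH code correcting t errors, plus parity.

    Uses the standard m*t bound on BCH parity bits (an assumption here).
    """
    m = 1
    while 2 ** m - 1 < k + m * t:  # code length must fit in 2^m - 1
        m += 1
    return m * t + 1               # one extra parity bit for triple-error detection


k = 64                             # subblock size from the evaluation setup
sec_ded = sec_ded_check_bits(k)
dec_ted = bch_dec_ted_check_bits(k)
print(f"SEC-DED: {sec_ded} check bits ({sec_ded / k:.1%} overhead)")
print(f"DEC-TED: {dec_ted} check bits ({dec_ted / k:.1%} overhead)")
```

Under these assumptions the multi-bit code needs nearly twice the check bits of SEC-DED per subblock, which is why storing them only for m-blocks matters.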
Evaluation
Compared on four bases:
Base: Defect-free L2 cache with no defect-tolerant functions
M-ECC only; No buffers
M-ECC-pbuf: Use of pre-decoding buffer
M-ECC-pfbuf: Use of pre-decoding and FLU buffers
First, determine best size of buffers for use; Then compare performance of IPC and power consumption
Results
Similar IPC performance; M-ECC core power consumption is 30% of that of the L2 cache core, which itself is about 10% of the entire system cache
Conclusions
Goal was to effectively use multi-bit ECC for L2 cache defect tolerance at minimal performance and implementation cost
Multi-bit ECC implemented only where more than one defect found
Two small buffers included to reduce performance impact of multi-bit ECC
Dirty Replication Cache included to ensure soft error tolerance