Cost-Efficient Memory Architecture Design of NAND Flash Memory Embedded Systems
Chanik Park et al.
Proceedings of the ICCD 2003
Introduction
Objective: a cost-efficient NAND flash memory architecture for XIP (execute-in-place)
Why NAND flash memory? NAND offers:
- extremely high cell density
- high capacity
- fast write and erase rates
- low cost
Background: NAND vs. NOR

Property        NOR                       NAND
Capacity        1MB-32MB                  16MB-512MB
XIP             Yes                       No
Performance     Very slow erase,          Fast erase/write/read
                slow write/fast read      (long initial latency, fast serial read)
Life span       Less than 10% of NAND     Over 10 times more than NOR
Interface       Full memory interface     I/O (CLE, ALE, OLE signal toggle)
Access mode     Random access             Sequential access
Ideal usage     Code storage              Data storage
Erase block     64KB-128KB                8KB-64KB
Price           High                      Low
Background: Memory Device Characteristics
- Mobile SDRAM: good performance and price, but high power consumption
- Low-power SRAM and fast SRAM: very good performance, but high cost
- NOR and NAND: trade-offs in cost, power consumption, and read/write/erase performance
Background: Mobile Embedded System Architecture
- Voice-centric 2G: appropriate for low-end phones, which require medium performance and cost
- Cannot accommodate multimedia applications' need for high performance and large storage
Background: Data-centric 2.5G
- NOR for code storage and NAND for data storage
- Still insufficient for 3G real-time applications
- The increased number of components raises system cost
3G & Smartphones
- NAND flash for both code and data storage, using a shadowing technique
- The code image is copied into system RAM for execution at boot time
- High performance, but a slow boot process and high power consumption (SDRAM)
- Adopting demand paging is needed, but it is not applicable to low- or mid-end systems
- NAND XIP itself is needed!
NAND XIP
NAND flash characteristics
- Structure: a fixed number of blocks with 32 pages in each block; each page consists of 512 bytes of data and 16 bytes of spare data for auxiliary information (bad block identification or ECC data)
- Read/Write/Erase: reads and writes are performed in page units; erases are performed in block units
- Reliability: bad block management; EDC/ECC for bit flipping
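To make the page/block geometry concrete, here is a minimal sketch that maps a linear code address (counting data bytes only) to a block, page, and column, using the 32-page, 512+16-byte layout above. The function and constant names are hypothetical, chosen for illustration.

```python
# Hypothetical helper based on the geometry on the slide:
# 32 pages per block, 512 data bytes + 16 spare bytes per page.
PAGES_PER_BLOCK = 32
DATA_BYTES = 512
SPARE_BYTES = 16
PAGE_BYTES = DATA_BYTES + SPARE_BYTES  # 528 bytes physically stored per page


def split_address(addr: int) -> tuple[int, int, int]:
    """Map a linear code address (data bytes only) to (block, page, column)."""
    page_index, column = divmod(addr, DATA_BYTES)
    block, page = divmod(page_index, PAGES_PER_BLOCK)
    return block, page, column


def block_data_size() -> int:
    """Erase granularity in data bytes: 32 pages x 512 bytes."""
    return PAGES_PER_BLOCK * DATA_BYTES
```

For example, `split_address(0x4321)` gives `(1, 1, 289)`, and `block_data_size()` is 16384 bytes (16KB). This falls inside the 8KB-64KB NAND erase-block range in the comparison table, and it shows why an erase always touches 32 pages at once.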
NAND XIP: Basic Implementation
- NAND XIP is implemented using a small buffer and a conversion from the NAND I/O interface to a memory interface
- Limitations: poor average access performance; currently the basic XIP area is limited to boot code
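The basic scheme can be sketched as a single page-sized buffer that turns byte-addressed CPU reads into whole-page NAND reads. This is a simplified model, not the paper's hardware; `NandStub` and `BasicXip` are hypothetical names, and the NAND chip is simulated by a byte string.

```python
# Minimal sketch of "basic" NAND XIP: one page-sized buffer sits between the
# CPU's memory-style reads and the NAND page-read I/O interface.
# NandStub and BasicXip are hypothetical names for illustration only.
DATA_BYTES = 512


class NandStub:
    """Stands in for a NAND chip: it can only return whole pages."""

    def __init__(self, image: bytes):
        self.image = image
        self.page_reads = 0  # counts slow page transfers

    def read_page(self, page_index: int) -> bytes:
        self.page_reads += 1
        start = page_index * DATA_BYTES
        return self.image[start:start + DATA_BYTES]


class BasicXip:
    """Converts byte-addressed reads into page reads through one buffer."""

    def __init__(self, nand: NandStub):
        self.nand = nand
        self.buffered_page = None  # index of the page currently buffered
        self.buffer = b""

    def read_byte(self, addr: int) -> int:
        page_index, offset = divmod(addr, DATA_BYTES)
        if page_index != self.buffered_page:
            # Buffer miss: the whole page must be read from NAND.
            self.buffer = self.nand.read_page(page_index)
            self.buffered_page = page_index
        return self.buffer[offset]
```

With a 4-page image, reading addresses 0, 1, 600, 2 costs three page reads: code that jumps back and forth between pages keeps refilling the single buffer, which is one way to see the poor average access performance listed above.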
NAND XIP: Obstacles of General NAND XIP
- Average memory access time: the average access time of NAND flash should be comparable to that of other memories
- Worst-case handling: cache-miss handling is a critical problem in a real-time environment
- Bad block management: the memory space discontinuity caused by bad blocks must be hidden
Approach of this paper: Intelligent Caching
- Maximize the cache hit ratio with priority-based caching
- Reduce access latency with a profile-based prefetching technique
- Manage bad blocks using a PAT (page address translation)
Intelligent Caching Architecture
Profile-guided static analysis
- The profiling process statically gathers the following information: access pattern and prefetching information
- Code pages are divided into:
  - High priority: OS code, system libraries, real-time applications
  - Mid priority: normal application code
  - Low priority: sequential or bootstrapping code
- Page priority and prefetching information are stored in the spare area and used by the cache controller
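One way to picture "stored in the spare area" is a small record packed into each page's 16 spare bytes. The sketch below is a hypothetical layout: the byte offset, field widths, and "no hint" marker are invented for illustration, and a real spare area also holds bad-block marks and ECC bytes, which this sketch leaves zeroed.

```python
# Hypothetical sketch: pack a page's priority and a profile-derived prefetch
# hint into its 16-byte spare area. The offsets and field widths are invented
# for illustration; real spare areas also carry bad-block marks and ECC.
import struct
from enum import IntEnum
from typing import Optional

SPARE_BYTES = 16
META_OFFSET = 8        # hypothetical location of the metadata fields
NO_PREFETCH = 0xFFFF   # hypothetical "no hint" marker (erased flash reads as 0xFF)


class Priority(IntEnum):
    LOW = 0   # sequential or bootstrapping code
    MID = 1   # normal application code
    HIGH = 2  # OS code, system libraries, real-time applications


def pack_spare(priority: Priority, prefetch_page: int = NO_PREFETCH) -> bytes:
    spare = bytearray(SPARE_BYTES)
    # "<BH": 1-byte priority followed by a 2-byte little-endian page hint.
    struct.pack_into("<BH", spare, META_OFFSET, priority, prefetch_page)
    return bytes(spare)


def unpack_spare(spare: bytes) -> tuple[Priority, Optional[int]]:
    prio, hint = struct.unpack_from("<BH", spare, META_OFFSET)
    return Priority(prio), (None if hint == NO_PREFETCH else hint)
```

Under this assumption, a cache controller loading a page would read the same 16 bytes it already fetches for ECC, decide placement from the priority field, and start fetching the hinted page if one is present.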
Intelligent Caching Architecture
Victim cache
- A small, fully associative cache that stores blocks replaced from the main cache (swapping operation)
- Prevents unnecessary conflict misses
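The victim-cache idea can be sketched with a direct-mapped main cache whose evicted pages go into a small fully associative buffer. On a victim hit, the two pages swap places. The sizes, the oldest-first eviction in the victim cache, and the class name are assumptions for illustration, not the prototype's parameters.

```python
# Minimal sketch of a direct-mapped main cache backed by a small, fully
# associative victim cache with swapping. Sizes, oldest-first eviction in the
# victim cache, and the class name are assumptions for illustration.
from collections import OrderedDict


class CacheWithVictim:
    def __init__(self, main_lines: int, victim_lines: int):
        self.main_lines = main_lines
        self.main = {}               # set index -> page number
        self.victim = OrderedDict()  # page number -> None, oldest entry first
        self.victim_lines = victim_lines
        self.misses = 0              # accesses that must go to NAND

    def access(self, page: int) -> str:
        idx = page % self.main_lines
        if self.main.get(idx) == page:
            return "main hit"
        evicted = self.main.get(idx)
        if page in self.victim:
            # Swap: the victim entry moves into the main cache and the
            # displaced main-cache page takes its place in the victim cache.
            del self.victim[page]
            result = "victim hit"
        else:
            self.misses += 1
            result = "miss"
        self.main[idx] = page
        if evicted is not None:
            self.victim[evicted] = None
            if len(self.victim) > self.victim_lines:
                self.victim.popitem(last=False)  # drop the oldest entry
        return result
```

With 4 main lines, pages 0 and 4 map to the same line. The access sequence 0, 4, 0, 4 then costs only two NAND reads instead of four, because each conflicting page is recovered from the victim cache rather than from flash.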
PAT (page address translation)
- Bad block management: remaps pages in bad blocks to pages in good blocks
- Assists low-priority page management by remapping requested pages to pages swapped into system memory
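A PAT can be sketched as a lookup table from the logical page numbers the CPU sees to either a physical NAND page or a slot in system memory. The "one reserved good block per bad block" policy and all names below are hypothetical, chosen to show how the table hides the address discontinuity.

```python
# Hypothetical sketch of a page address translation (PAT) table. Logical page
# numbers seen by the CPU translate either to a physical NAND page (skipping
# bad blocks) or to a copy in system memory. Names and the "one reserved good
# block per bad block" policy are assumptions.
PAGES_PER_BLOCK = 32


class Pat:
    def __init__(self, bad_blocks: set[int], spare_blocks: list[int]):
        # logical page -> ("nand", physical page) or ("ram", slot)
        self.remap = {}
        free = list(spare_blocks)
        for bad in sorted(bad_blocks):
            good = free.pop(0)  # raises IndexError if spare blocks run out
            for p in range(PAGES_PER_BLOCK):
                self.remap[bad * PAGES_PER_BLOCK + p] = ("nand", good * PAGES_PER_BLOCK + p)

    def map_to_ram(self, logical_page: int, slot: int) -> None:
        """Record that a (low-priority) page now lives in system memory."""
        self.remap[logical_page] = ("ram", slot)

    def translate(self, logical_page: int) -> tuple[str, int]:
        # Pages without an entry are stored at their own NAND location.
        return self.remap.get(logical_page, ("nand", logical_page))
```

If block 3 is bad and block 1000 is reserved, logical page 101 (block 3, page 5) translates to NAND page 32005, while untouched pages map to themselves. The CPU therefore sees a contiguous address space.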
Intelligent Caching Architecture
Scenario
- Request A: A is cached in the main cache
- Request B (conflicts with A): B is moved to system memory, and the PAT is updated to remap B
- Request C (conflicts with A): C replaces A in the main cache, and A is swapped to the victim cache
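The scenario can be replayed under a hypothetical placement rule that reproduces its three steps. A low-priority page that conflicts with a resident page is copied to system memory and given a PAT entry rather than evicting the resident page. Any other conflicting page replaces the resident one, which moves to the victim cache. The priorities assigned to A, B, and C are assumptions, not values from the paper.

```python
# Replays the three-request scenario under a hypothetical placement rule.
# A low-priority page that conflicts with a resident page goes to system
# memory (with a PAT entry); any other conflicting page replaces the resident
# one, which is swapped to the victim cache. All pages share one cache line.
LOW, MID, HIGH = 0, 1, 2


def replay(requests):
    """Replay (page, priority) requests that all map to one main-cache line."""
    main_line = None
    victim, system_memory, pat, log = [], [], {}, []
    for page, prio in requests:
        if page == main_line:
            log.append(f"{page} hits in main cache")
        elif main_line is None:
            main_line = page
            log.append(f"{page} cached in main cache")
        elif prio == LOW:
            system_memory.append(page)
            pat[page] = ("ram", len(system_memory) - 1)
            log.append(f"{page} moved to system memory, PAT remaps {page}")
        else:
            victim.append(main_line)
            log.append(f"{page} replaces {main_line}, {main_line} swapped to victim cache")
            main_line = page
    return log, main_line, victim, pat
```

Calling `replay([("A", HIGH), ("B", LOW), ("C", MID)])` yields the three log lines of the scenario, leaving C in the main cache, A in the victim cache, and a PAT entry for the page placed in system memory.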
Experimental Setup
Prototype NAND XIP board
- 32MB NAND flash
- 256KB main cache
- 4KB victim cache
- 10KB SRAM for tag data
NAND miss penalty (one page): about 35 µs = latency (10 µs) + page read (512 × 50 ns = 25.6 µs)
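The miss-penalty figure can be checked with a little arithmetic. The standard average-memory-access-time relation then shows why the hit ratio dominates NAND XIP performance. The hit time and miss rates used below are hypothetical inputs. The assumption that cache lines are one page (512 bytes) is also only an assumption, based on the per-page miss penalty.

```python
# Worked arithmetic for the setup above. The latency, byte cycle, and cache
# sizes come from the slide; page-sized cache lines and the hit times and miss
# rates passed to amat_ns() are hypothetical.
PAGE_BYTES = 512
READ_LATENCY_NS = 10_000  # 10 us initial latency
BYTE_CYCLE_NS = 50        # 50 ns per byte of serial read

MISS_PENALTY_NS = READ_LATENCY_NS + PAGE_BYTES * BYTE_CYCLE_NS  # 35,600 ns

MAIN_CACHE_PAGES = 256 * 1024 // PAGE_BYTES  # 512 page-sized lines (assumed)
VICTIM_CACHE_PAGES = 4 * 1024 // PAGE_BYTES  # 8 page-sized lines (assumed)


def amat_ns(hit_time_ns: float, miss_rate: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time_ns + miss_rate * MISS_PENALTY_NS
```

With a hypothetical 10 ns hit time, a 0.1% miss rate gives roughly 45.6 ns, but a 1% miss rate already gives roughly 366 ns. Because each NAND miss costs about 35.6 µs, even small gains in hit ratio matter.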
Experimental Results: Average Memory Access Time
[Figure: average memory access time of SDRAM shadowing, NAND XIP (priority, 32KB cache), NOR XIP, and NAND XIP (basic, 32KB cache)]
Experimental Results: Energy Consumption
[Figure: energy consumption of NOR XIP, NAND XIP (priority, 32KB cache), NAND XIP (basic, 32KB cache), and SDRAM shadowing]
Experimental Results: Booting Time & Cost
- NAND XIP shows reasonable booting time at low cost
Conclusion
- NAND XIP is feasible
  - Experiments show the feasibility of the proposed architecture in a real-life mobile embedded environment
  - This is achieved by applying highly optimized caching techniques geared to the specific features of NAND flash and its applications
- Yet a more system-wide approach is needed
  - Worst-case handling is still not easy
  - A new task-scheduling algorithm that takes NAND flash operations into account would help