BPLRU: A Buffer Management Scheme for
Improving Random Writes in Flash Storage
Original work of Hyojun Kim and Seongjun Ahn
Software Laboratory of Samsung Electronics, Korea
Presented at FAST '08, March 2008
Neha Sahay and Sreeram Potluri
Flash!! Flash!!
• The speed of traditional hard disks is bounded by the speed of their mechanical parts.
• Decreasing flash costs (about 50% per year) present us with an alternative.
• Advantages of flash:
– High random read performance
– Very low power consumption
– Small and portable
– Shock resistant and robust
• Disadvantages of flash:
– Very poor random write performance
– Limited lifetime (100,000 erases for SLC NAND, 10,000 for MLC NAND)
Outline
• Characteristics of Flash
• Flash Translation Layer
• Existing Techniques and Related Work
• BPLRU
• Implementation Details
• Evaluation
• Conclusion
Characteristics of Flash
• Organized into planes, blocks, and pages.
• A block must be erased before it is programmed; random rewrites are not allowed.
• Reads and writes operate on pages, but erases operate on whole blocks.
• Effectively, pages must be written sequentially within a block boundary.
• The erase operation takes much longer than a read or a write.
• Requires wear-leveling.
• An FTL masks these properties and emulates a normal hard disk.
Flash memory has poor performance for random writes while it has good read and sequential write performance.
Flash Translation Layer
• Emulates a hard disk and provides logical sector updates.
• Mapping types:
– Page mapping
• Maintains mapping information at the page level.
• Requires a large amount of memory for mapping information.
– Block mapping
• Maintains mapping information at the block level.
• A single page update requires a whole block update.
– Hybrid mapping
• Maintains block-level mapping, but page position is not fixed inside a block.
• Requires additional offset-level information.
– Other mapping techniques
• Exploit write locality using some reserved locations: effective algorithms can be applied to the reserved locations, while simple block mapping handles the rest.
• Log-block FTL
– Writes go to a log block that uses a fine-grained mapping policy.
– Once the log block is full, it is merged with the old data block and written to a new block; the old data block and the log block then become free blocks.
– Merges are either full merges or switch merges.
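The merge decision above can be sketched in a few lines. This is an illustrative model only, not the paper's implementation: the block geometry (`PAGES_PER_BLOCK = 4`) and the representation of blocks as lists of page offsets are assumptions.

```python
from typing import List, Tuple

PAGES_PER_BLOCK = 4  # assumed block geometry for illustration

def merge(data_pages: List[int], log_pages: List[int]) -> Tuple[str, List[int]]:
    """Decide how a full log block is merged with its data block.

    `data_pages` / `log_pages` list the page offsets each block holds.
    A switch merge is possible only when the log block was written fully
    and in sequential order: it simply becomes the new data block.
    Otherwise the latest copy of every page must be copied into a
    freshly erased block (a full merge).
    """
    if log_pages == list(range(PAGES_PER_BLOCK)):
        return "switch", log_pages  # log block becomes the data block as-is
    merged = []
    for page in range(PAGES_PER_BLOCK):
        # Take the page from either block; the log block holds the newer copy.
        if page in log_pages or page in data_pages:
            merged.append(page)
    return "full", merged
```

A switch merge costs one erase (the old data block), while a full merge additionally pays for reading and rewriting every valid page, which is why BPLRU later works to make switch merges the common case.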
Flash Translation Layer
[Diagram: log-block merge. The latest valid copies of pages P0–P4, drawn from the data block and the log block (each page marked valid or invalid), are combined into a new block; the old data block and the log block are then freed.]
Flash Aware Caches
• Use of a RAM buffer inside SSDs.
• Clean First LRU (CFLRU)
– Chooses a clean page as a victim rather than a dirty page.
• Flash Aware Buffer policy (FAB)
– Buffered pages that belong to the same erasable block are grouped together.
– The block with the maximum number of buffered pages is evicted.
– Works well for sequential writes; more effective than LRU.
• Related work: DULO, proposed by Zhang et al.
– Exploits both temporal and spatial locality.
– Dual-locality caching.
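FAB's victim choice reduces to counting buffered pages per erasable block. A minimal sketch, assuming pages are identified by number and `PAGES_PER_BLOCK = 4` (the function name and geometry are illustrative, not from FAB's implementation):

```python
from typing import List

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def fab_victim(buffered_pages: List[int]) -> int:
    """Return the block number holding the most buffered pages.

    FAB evicts the fullest block first, so sequential writes (which
    fill blocks completely) are flushed as whole blocks.
    """
    counts = {}
    for page in buffered_pages:
        block = page // PAGES_PER_BLOCK
        counts[block] = counts.get(block, 0) + 1
    return max(counts, key=counts.get)
```

Note that recency plays no role here, which is why FAB shines on sequential workloads but can evict a hot, nearly full block under random writes.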
[Diagram: buffered pages grouped by erasable block, e.g. P11, P12, P13 in one block; P21 in another; P31, P32 in a third.]
BPLRU – Block Padding LRU
• Applied to the write buffer inside SSDs.
• Reads are simply redirected to the FTL.
• Converts random writes to sequential writes.
• Three-pronged approach:
– Block-level LRU
– Page padding
– LRU compensation
Block-Level LRU
• RAM buffers are grouped into blocks of the same size as the erasable block size in NAND.
• All pages in the same erasable-block range are grouped into one buffer block.
• The least recently used block, rather than a single page, is selected as the victim.
[Diagram: an LRU list of buffer blocks ordered from the MRU block to the LRU block. When any page (e.g. page 6) is referenced, its entire block moves to the MRU position.]
Block-Level LRU
• Example write sequence: 0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10.
• 2 log blocks in the FTL; 2 pages can reside in the write buffer.
• 12 merges in the plain FTL versus only 7 merges with block-level LRU.
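The block-level LRU list can be sketched with an ordered map. This is a simplified model, not the paper's code: the class name is invented, and the buffer capacity is counted in whole blocks (`BUFFER_BLOCKS = 2`) rather than pages, purely for illustration.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 4  # assumed geometry for illustration
BUFFER_BLOCKS = 2    # assumed capacity, in blocks

class BlockLevelLRU:
    """LRU over whole blocks: a write moves its entire block to MRU."""

    def __init__(self):
        # block number -> set of buffered page numbers; insertion order
        # runs from LRU (front) to MRU (back).
        self.blocks = OrderedDict()
        self.evicted = []  # (block, sorted pages) flush history

    def write(self, page: int) -> None:
        block = page // PAGES_PER_BLOCK
        pages = self.blocks.pop(block, set())
        pages.add(page)
        self.blocks[block] = pages  # reinsert at the MRU end
        if len(self.blocks) > BUFFER_BLOCKS:
            # Evict the least recently used *block* with all its pages.
            victim, victim_pages = self.blocks.popitem(last=False)
            self.evicted.append((victim, sorted(victim_pages)))
```

Because a victim is a whole block, all of its buffered pages reach the FTL together, which is what lets one merge service several page writes.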
Page Padding
• Replaces an expensive full merge with a switch merge.
• Pages missing from a victim block are read from flash to fill (pad) the block, so the whole block can be written sequentially.
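The padding step can be sketched as follows. The function name, the dictionary of buffered sectors, and the flash-read callback are hypothetical names for illustration; only the idea (fill the holes, then write the block sequentially) is from the paper.

```python
from typing import Callable, Dict, List

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def pad_block(block: int,
              buffered: Dict[int, bytes],
              read_from_flash: Callable[[int], bytes]) -> List[bytes]:
    """Return the complete, sequential page contents for `block`.

    Buffered (dirty) pages are taken from the write buffer; every page
    the buffer is missing is read back from flash as padding, so the
    FTL sees one fully sequential block write (a switch merge).
    """
    start = block * PAGES_PER_BLOCK
    out = []
    for page in range(start, start + PAGES_PER_BLOCK):
        if page in buffered:
            out.append(buffered[page])          # dirty data from the buffer
        else:
            out.append(read_from_flash(page))   # padding read
    return out
```

The extra padding reads are cheap relative to the full merge they avoid, since random reads on flash are fast while erases are slow.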
LRU Compensation
• Compensates for sequential writes: a block filled by a fully sequential write is moved to the LRU position, since it is unlikely to be rewritten soon.
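A minimal sketch of the compensation rule, again using an ordered map from LRU (front) to MRU (back); the function name and geometry are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Set

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def reinsert(blocks: "OrderedDict[int, Set[int]]",
             block: int, pages: Set[int]) -> None:
    """Reinsert `block` after a write; `blocks` is ordered LRU -> MRU."""
    blocks.pop(block, None)
    full_range = set(range(block * PAGES_PER_BLOCK,
                           (block + 1) * PAGES_PER_BLOCK))
    blocks[block] = pages
    if pages == full_range:
        # Sequentially filled block: demote it to the LRU position so it
        # is flushed early instead of crowding out random-write blocks.
        blocks.move_to_end(block, last=False)
    # Otherwise it stays at the MRU end, as in plain block-level LRU.
```

Without this rule, a long sequential stream would occupy the MRU end of the list and push genuinely hot random-write blocks out of the buffer.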
Implementation
• Two-level indexing using two sets of nodes: block header nodes and sector nodes.
• Each node has two link pointers for the LRU list (nPrev, nNext), a block number (nLbn), the number of sectors in the block (nNumOfSct), and a sector buffer (aBuffer).
• For sector nodes, aBuffer[] contains the contents of the written sector.
• For block header nodes, it contains a secondary index table pointing to the block's child sector nodes.
• Enables faster searching of sector nodes; memory overhead is the cost.
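The two-level index can be sketched as below. The field names (`nLbn`, `nNumOfSct`, `aBuffer`) follow the slide; everything else (method names, `SECTORS_PER_BLOCK`) is an assumed simplification, and the LRU link pointers (nPrev, nNext) are omitted for brevity.

```python
SECTORS_PER_BLOCK = 4  # assumed geometry for illustration

class SectorNode:
    def __init__(self, data):
        self.aBuffer = data  # contents of the written sector

class BlockHeaderNode:
    def __init__(self, lbn: int):
        self.nLbn = lbn           # logical block number
        self.nNumOfSct = 0        # sectors currently buffered in this block
        # Secondary index table: one slot per sector offset, pointing to
        # the child sector node (or None). This gives O(1) sector lookup
        # once the block header is found; the table is the memory cost.
        self.aBuffer = [None] * SECTORS_PER_BLOCK

    def put(self, offset: int, data) -> None:
        if self.aBuffer[offset] is None:
            self.nNumOfSct += 1   # first write to this sector offset
        self.aBuffer[offset] = SectorNode(data)

    def get(self, offset: int):
        node = self.aBuffer[offset]
        return None if node is None else node.aBuffer
```

Searching then costs one block-header lookup plus one array index, instead of walking a flat list of all buffered sectors.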
Evaluation – MS Office Installation Task (NTFS)
• 43% faster throughput than FAB for 16-MB buffer.
• 41% lower erase count than FAB for 16-MB buffer.
Evaluation – Temporary Internet Files of Internet Explorer (NTFS)
• Performance slightly worse than FAB for buffers of size less than 8 MB.
• For buffer size greater than 8MB, performance improves.
• Erase count always less than FAB.
Evaluation – HDD Test of PCMark 05 (NTFS)
• Performance and erase count very similar to the previous Temporary Internet Files test.
Evaluation – Random Writes by Iometer (NTFS)
• No locality exists in the Iometer workload.
• FAB shows better write performance, improving with bigger buffer sizes.
• BPLRU shows better erase counts due to page padding.
Evaluation – Copying MP3 Files (FAT16)
• 90 MP3 files with an average size of 4.8 MB.
• Sequential write pattern.
Evaluation – P2P File Download of a 634-MB File (FAT16)
• A peer-to-peer program randomly writes small parts of a file as different parts are downloaded concurrently from numerous peers.
• This graph illustrates the poor performance of flash storage for random writes.
• FAB requires more RAM for better performance.
• Performance improves significantly by BPLRU.
Evaluation – Untarring Linux Source Files
• From linux-2.6.21.tar.gz (EXT3).
• BPLRU shows 39% better throughput than FAB.
Evaluation – Kernel Compile
• With Linux 2.6.21 sources (EXT3).
• BPLRU shows 23% better performance than FAB.
Evaluation – Postmark
• Evaluates the performance of I/O subsystems.
• One of file creation, deletion, read, or write is executed at random.
[Results shown for NTFS, FAT16, and EXT3.]
Evaluation – Buffer Flushing Effect
• File systems use the buffer flush command to ensure data integrity.
• Flushing reduces the effect of write buffering.
• With a 16-MB buffer, flushing reduces throughput by approximately 23%.
Conclusion
• The proposed BPLRU scheme is more effective than the two previous methods, LRU and FAB.
• Two important issues still remain:
– When a RAM buffer is used, file system integrity may be damaged by sudden power failures.
– Frequent buffer flush commands from the host computer degrade BPLRU performance.
• Future research:
– Hardware such as a small battery or capacitor, or non-volatile magnetoresistive RAM or ferroelectric RAM.
– A host-side buffer cache policy similar to the one in the storage device.
– Handling read requests with a much bigger RAM capacity and an asymmetrically weighted buffer management policy.