BPLRU: A Buffer Management Scheme for
Improving Random Writes in Flash Storage
Original work of Hyojun Kim and Seongjun Ahn
Software Laboratory of Samsung Electronics, Korea
Presented at FAST '08, March 2008
Neha Sahay and Sreeram Potluri
Flash!! Flash!!
• The speed of traditional hard disks is bounded by the speed of their mechanical parts.
• Decreasing flash costs (about 50% per year) present us with an alternative.
• Advantages of flash:
– High random read performance
– Very low power consumption
– Small and portable
– Shock resistant and robust
• Disadvantages of flash:
– Very poor random write performance
– Limited lifetime (100,000 erases for SLC NAND, 10,000 for MLC NAND)
Outline
• Characteristics of Flash
• Flash Translation Layer
• Existing Techniques and Related Work
• BPLRU
• Implementation Details
• Evaluation
• Conclusion
Characteristics of Flash
• Organized into planes, blocks, and pages.
• A block must be erased before it is programmed; random rewrites are not allowed.
• Reads and writes operate on pages, but erases operate on whole blocks.
• Effectively, pages must be written sequentially within a block boundary.
• The erase operation takes much longer than a read or a write.
• Requires wear-leveling.
• An FTL masks these properties and emulates a normal hard disk.
Flash memory has poor performance for random writes while it has good read and sequential write performance.
Flash Translation Layer
• Emulates a hard disk and provides logical sector updates.
• Mapping types:
– Page mapping
• Maintains mapping information at the page level.
• Requires a large amount of memory for mapping information.
– Block mapping
• Maintains mapping information at the block level.
• A single page update requires a whole block update.
– Hybrid mapping
• Maintains block-level mapping, but page position is not fixed inside a block.
• Requires additional offset-level information.
– Other mapping techniques
• Exploit write locality using some reserved locations: effective algorithms can be applied to the reserved locations, while simple block mapping handles the rest.
• Log-block FTL
– Writes go to a log block that uses a fine-grained mapping policy.
– Once the log block is full, it is merged with the old data block and written to a new block; the old data block and the log block then become free blocks.
– Merges are either full merges or switch merges.
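The merge decision above can be sketched in a few lines. This is an illustrative model only, not the paper's implementation: the block geometry (`PAGES_PER_BLOCK = 4`) and the representation of blocks as lists of page offsets are assumptions.

```python
from typing import List, Tuple

PAGES_PER_BLOCK = 4  # assumed block geometry for illustration

def merge(data_pages: List[int], log_pages: List[int]) -> Tuple[str, List[int]]:
    """Decide how a full log block is merged with its data block.

    `data_pages` / `log_pages` list the page offsets each block holds.
    A switch merge is possible only when the log block was written fully
    and in sequential order: it simply becomes the new data block.
    Otherwise the latest copy of every page must be copied into a
    freshly erased block (a full merge).
    """
    if log_pages == list(range(PAGES_PER_BLOCK)):
        return "switch", log_pages  # log block becomes the data block as-is
    merged = []
    for page in range(PAGES_PER_BLOCK):
        # Take the page from either block; the log block holds the newer copy.
        if page in log_pages or page in data_pages:
            merged.append(page)
    return "full", merged
```

A switch merge costs one erase (the old data block), while a full merge additionally pays for reading and rewriting every valid page, which is why BPLRU later works to make switch merges the common case.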
Flash Translation Layer
[Diagram: log-block merge. The latest valid copies of pages P0–P4, drawn from the data block and the log block (each page marked valid or invalid), are combined into a new block; the old data block and the log block are then freed.]
Flash Aware Caches
• Use of a RAM buffer inside SSDs.
• Clean First LRU (CFLRU)
– Chooses a clean page as a victim rather than a dirty page.
• Flash Aware Buffer policy (FAB)
– Buffered pages that belong to the same erasable block are grouped together.
– The block with the maximum number of buffered pages is evicted.
– Works well for sequential writes; more effective than LRU.
• Related work: DULO, proposed by Zhang et al.
– Exploits both temporal and spatial locality.
– Dual-locality caching.
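FAB's victim choice reduces to counting buffered pages per erasable block. A minimal sketch, assuming pages are identified by number and `PAGES_PER_BLOCK = 4` (the function name and geometry are illustrative, not from FAB's implementation):

```python
from typing import List

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def fab_victim(buffered_pages: List[int]) -> int:
    """Return the block number holding the most buffered pages.

    FAB evicts the fullest block first, so sequential writes (which
    fill blocks completely) are flushed as whole blocks.
    """
    counts = {}
    for page in buffered_pages:
        block = page // PAGES_PER_BLOCK
        counts[block] = counts.get(block, 0) + 1
    return max(counts, key=counts.get)
```

Note that recency plays no role here, which is why FAB shines on sequential workloads but can evict a hot, nearly full block under random writes.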
[Diagram: buffered pages grouped by erasable block, e.g. P11, P12, P13 in one block; P21 in another; P31, P32 in a third.]
BPLRU – Block Padding LRU
• Applied to the write buffer inside SSDs.
• Reads are simply redirected to the FTL.
• Converts random writes to sequential writes.
• Three-pronged approach:
– Block-level LRU
– Page padding
– LRU compensation
Block-Level LRU
• RAM buffers are grouped into blocks of the same size as the erasable block size in NAND.
• All pages in the same erasable-block range are grouped into one buffer block.
• The least recently used block, rather than a single page, is selected as the victim.
[Diagram: an LRU list of buffer blocks ordered from the MRU block to the LRU block. When any page (e.g. page 6) is referenced, its entire block moves to the MRU position.]
Block-Level LRU
• Example write sequence: 0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10.
• 2 log blocks in the FTL; 2 pages can reside in the write buffer.
• 12 merges in the plain FTL versus only 7 merges with block-level LRU.
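The block-level LRU list can be sketched with an ordered map. This is a simplified model, not the paper's code: the class name is invented, and the buffer capacity is counted in whole blocks (`BUFFER_BLOCKS = 2`) rather than pages, purely for illustration.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 4  # assumed geometry for illustration
BUFFER_BLOCKS = 2    # assumed capacity, in blocks

class BlockLevelLRU:
    """LRU over whole blocks: a write moves its entire block to MRU."""

    def __init__(self):
        # block number -> set of buffered page numbers; insertion order
        # runs from LRU (front) to MRU (back).
        self.blocks = OrderedDict()
        self.evicted = []  # (block, sorted pages) flush history

    def write(self, page: int) -> None:
        block = page // PAGES_PER_BLOCK
        pages = self.blocks.pop(block, set())
        pages.add(page)
        self.blocks[block] = pages  # reinsert at the MRU end
        if len(self.blocks) > BUFFER_BLOCKS:
            # Evict the least recently used *block* with all its pages.
            victim, victim_pages = self.blocks.popitem(last=False)
            self.evicted.append((victim, sorted(victim_pages)))
```

Because a victim is a whole block, all of its buffered pages reach the FTL together, which is what lets one merge service several page writes.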
Page Padding
• Replaces an expensive full merge with a switch merge.
• Pages missing from a victim block are read from flash to fill (pad) the block, so the whole block can be written sequentially.
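The padding step can be sketched as follows. The function name, the dictionary of buffered sectors, and the flash-read callback are hypothetical names for illustration; only the idea (fill the holes, then write the block sequentially) is from the paper.

```python
from typing import Callable, Dict, List

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def pad_block(block: int,
              buffered: Dict[int, bytes],
              read_from_flash: Callable[[int], bytes]) -> List[bytes]:
    """Return the complete, sequential page contents for `block`.

    Buffered (dirty) pages are taken from the write buffer; every page
    the buffer is missing is read back from flash as padding, so the
    FTL sees one fully sequential block write (a switch merge).
    """
    start = block * PAGES_PER_BLOCK
    out = []
    for page in range(start, start + PAGES_PER_BLOCK):
        if page in buffered:
            out.append(buffered[page])          # dirty data from the buffer
        else:
            out.append(read_from_flash(page))   # padding read
    return out
```

The extra padding reads are cheap relative to the full merge they avoid, since random reads on flash are fast while erases are slow.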
LRU Compensation
• Compensates for sequential writes: a block filled by a fully sequential write is moved to the LRU position, since it is unlikely to be rewritten soon.
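A minimal sketch of the compensation rule, again using an ordered map from LRU (front) to MRU (back); the function name and geometry are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Set

PAGES_PER_BLOCK = 4  # assumed geometry for illustration

def reinsert(blocks: "OrderedDict[int, Set[int]]",
             block: int, pages: Set[int]) -> None:
    """Reinsert `block` after a write; `blocks` is ordered LRU -> MRU."""
    blocks.pop(block, None)
    full_range = set(range(block * PAGES_PER_BLOCK,
                           (block + 1) * PAGES_PER_BLOCK))
    blocks[block] = pages
    if pages == full_range:
        # Sequentially filled block: demote it to the LRU position so it
        # is flushed early instead of crowding out random-write blocks.
        blocks.move_to_end(block, last=False)
    # Otherwise it stays at the MRU end, as in plain block-level LRU.
```

Without this rule, a long sequential stream would occupy the MRU end of the list and push genuinely hot random-write blocks out of the buffer.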
Implementation
• Two-level indexing using two sets of nodes: block header nodes and sector nodes.
• Each node has two link pointers for the LRU list (nPrev, nNext), a block number (nLbn), the number of sectors in the block (nNumOfSct), and a sector buffer (aBuffer).
• For sector nodes, aBuffer[] contains the contents of the written sector.
• For block header nodes, it contains a secondary index table pointing to the block's child sector nodes.
• Enables faster searching of sector nodes; memory overhead is the cost.
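The two-level index can be sketched as below. The field names (`nLbn`, `nNumOfSct`, `aBuffer`) follow the slide; everything else (method names, `SECTORS_PER_BLOCK`) is an assumed simplification, and the LRU link pointers (nPrev, nNext) are omitted for brevity.

```python
SECTORS_PER_BLOCK = 4  # assumed geometry for illustration

class SectorNode:
    def __init__(self, data):
        self.aBuffer = data  # contents of the written sector

class BlockHeaderNode:
    def __init__(self, lbn: int):
        self.nLbn = lbn           # logical block number
        self.nNumOfSct = 0        # sectors currently buffered in this block
        # Secondary index table: one slot per sector offset, pointing to
        # the child sector node (or None). This gives O(1) sector lookup
        # once the block header is found; the table is the memory cost.
        self.aBuffer = [None] * SECTORS_PER_BLOCK

    def put(self, offset: int, data) -> None:
        if self.aBuffer[offset] is None:
            self.nNumOfSct += 1   # first write to this sector offset
        self.aBuffer[offset] = SectorNode(data)

    def get(self, offset: int):
        node = self.aBuffer[offset]
        return None if node is None else node.aBuffer
```

Searching then costs one block-header lookup plus one array index, instead of walking a flat list of all buffered sectors.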
Evaluation – MS Office Installation Task (NTFS)
• 43% faster throughput than FAB for 16-MB buffer.
• 41% lower erase count than FAB for 16-MB buffer.
Evaluation – Temporary Internet Files of Internet Explorer (NTFS)
• Performance slightly worse than FAB for buffers of size less than 8 MB.
• For buffer size greater than 8MB, performance improves.
• Erase count always less than FAB.
Evaluation – HDD Test of PCMark 05 (NTFS)
• Performance and erase count very similar to the previous Temporary Internet Files test.
Evaluation – Random Writes by Iometer (NTFS)
• No locality exists in the Iometer workload.
• FAB shows better write performance, improving with bigger buffer sizes.
• BPLRU shows better erase counts due to page padding.
Evaluation – Copying MP3 Files (FAT16)
• 90 MP3 files with an average size of 4.8 MB.
• Sequential write pattern.
Evaluation – P2P File Download of a 634-MB File (FAT16)
• A peer-to-peer program randomly writes small parts of a file as different parts are downloaded concurrently from numerous peers.
• This graph illustrates the poor performance of flash storage for random writes.
• FAB requires more RAM for better performance.
• Performance improves significantly by BPLRU.
Evaluation – Untarring Linux Source Files
• From linux-2.6.21.tar.gz (EXT3).
• BPLRU shows 39% better throughput than FAB.
Evaluation – Kernel Compile
• With Linux 2.6.21 sources (EXT3).
• BPLRU shows 23% better performance than FAB.
Evaluation – Postmark
• Evaluates the performance of I/O subsystems.
• One of file creation, deletion, read, or write is executed at random.
[Results shown for NTFS, FAT16, and EXT3.]
Evaluation – Buffer Flushing Effect
• File systems use the buffer flush command to ensure data integrity.
• Flushing reduces the effect of write buffering.
• With a 16-MB buffer, flushing reduces throughput by approximately 23%.
Conclusion
• The proposed BPLRU scheme is more effective than the two previous methods, LRU and FAB.
• Two important issues still remain:
– When a RAM buffer is used, file system integrity may be damaged by sudden power failures.
– Frequent buffer flush commands from the host computer degrade BPLRU performance.
• Future research:
– Hardware such as a small battery or capacitor, or non-volatile magnetoresistive RAM or ferroelectric RAM.
– A host-side buffer cache policy similar to the one in the storage device.
– Handling read requests with a much bigger RAM capacity and an asymmetrically weighted buffer management policy.