NAND Flash-based Storage
Jin-Soo Kim ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Today’s Topics
NAND flash memory
Flash Translation Layer (FTL)
OS implications
CSE3008: Operating Systems | Fall 2009 | Jin-Soo Kim ([email protected])
Flash Memory Characteristics
Flash memory
• Non-volatile, updateable, high-density
• Low cost, low power consumption, high reliability
Erase-before-write
• Read
• Write or program: 1 → 0
• Erase: 0 → 1
[Figure: 1 1 1 1 1 1 1 1 → write (program) → 1 1 0 1 1 0 1 0 → erase → 1 1 1 1 1 1 1 1]
Read faster than write/erase
Bulk erase
• Erase unit: block
• Program unit: byte or word (NOR), page (NAND)
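
The erase-before-write rule can be illustrated with a few lines of Python. This is only a sketch of the bit-level behavior, not device code: the 8-bit width and the `program`/`erase` helper names are assumptions chosen to match the 8-bit pattern shown on the slide.

```python
PAGE_BITS = 8  # illustrative width only (assumption), matching the slide's 8-bit pattern


def program(cell: int, data: int) -> int:
    """Programming can only change bits from 1 to 0, so the result is cell AND data."""
    return cell & data


def erase() -> int:
    """Erasing sets every bit back to 1."""
    return (1 << PAGE_BITS) - 1


cell = erase()                           # all ones
cell = program(cell, 0b11011010)         # zeros are written where data has 0
retry = program(cell, 0b11111111)        # cannot turn 0 back into 1 without an erase
print(f"{cell:08b} {retry:08b}")
```

Because a program operation behaves like a bitwise AND, restoring any 0 bit to 1 requires erasing the whole block first.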
NOR Flash
NOR flash
• Random, direct access interface
• Fast random reads
• Slow erase and write
• Mainly for code storage
• Intel, Spansion, STMicro, ...
NAND Flash
NAND flash
• I/O mapped access
• Smaller cell size
• Lower cost
• Smaller size erase blocks
• Better performance for erase and write
• Mainly for data storage
• Samsung, Toshiba, Hynix, ...
NAND Flash Architecture
2Gb NAND flash device organization
[Figure: 2Gb NAND flash device organization. Source: Micron Technology, Inc.]
NAND Flash Types
Parameter | SLC NAND¹ (small block) | SLC NAND² (large block) | MLC NAND³
Page size (bytes) | 512+16 | 2,048+64 | 4,096+128
Pages / block | 32 | 64 | 128
Block size | 16KB | 128KB | 512KB
tR (read) | 15 μs (max) | 20 μs (max) | 50 μs (max)
tPROG (program) | 200 μs (typ), 500 μs (max) | 200 μs (typ), 700 μs (max) | 600 μs (typ), 1,200 μs (max)
tBERS (erase) | 2 ms (typ), 3 ms (max) | 1.5 ms (typ), 2 ms (max) | 3 ms (typ)
NOP | 1 (main), 2 (spare) | 4 | 1
Endurance cycles | 100K | 100K | 10K
ECC (per 512 bytes) | 1 bit ECC, 2 bits EDC | 1 bit ECC, 2 bits EDC | 4 bits ECC, 5 bits EDC
¹ Samsung K9F1208X0C (512Mb)
² Samsung K9K8G08U0A (8Gb)
³ Micron Technology Inc.
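
The block sizes in the table follow directly from page size times pages per block. The sketch below recomputes them and also derives a rough per-chip program-rate ceiling from the typical tPROG. The `NandType` class and its field names are hypothetical, and the rate ignores bus transfer time and any interleaving, so it is an upper bound for a single plane rather than a measured figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandType:
    name: str
    page_bytes: int        # main area only; the spare area (+16, +64, +128) is excluded
    pages_per_block: int
    t_prog_typ_us: int     # typical page program time in microseconds

    def block_bytes(self) -> int:
        return self.page_bytes * self.pages_per_block

    def program_mb_per_s(self) -> float:
        # bytes per microsecond equals 10^6 bytes per second
        return self.page_bytes / self.t_prog_typ_us


TYPES = [
    NandType("SLC small block", 512, 32, 200),
    NandType("SLC large block", 2048, 64, 200),
    NandType("MLC", 4096, 128, 600),
]

for t in TYPES:
    print(f"{t.name}: {t.block_bytes() // 1024}KB block, "
          f"{t.program_mb_per_s():.2f} MB/s program ceiling")
```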
NAND Applications
Universal Flash Drives (UFDs)
Flash cards
• CompactFlash, MMC, SD, Memory Stick, …
Embedded devices
• Cell phones, MP3 players, PMPs, PDAs, digital TVs, set-top boxes, car navigators, …
Hybrid HDDs
Intel Turbo Memory
SSDs (Solid-State Disks)
SSDs (1)
HDDs vs. SSDs
[Figure: 1.8” HDD and flash SSD (78.5x54x4.15mm); 2.5” HDD and flash SSD (101x70x9.3mm); top and bottom views]
SSDs (2)
Feature | SSD (Samsung) | HDD (Seagate)
Model | MMDOE56G5MXP (PM800) | ST9500420AS (Momentus 7200.4)
Capacity | 256GB (16Gb MLC x 128, 8 channels) | 500GB (2 discs, 4 heads, 7200RPM)
Form factor | 2.5”, weight: 84g | 2.5”, weight: 110g
Host interface | Serial ATA-2 (3.0 Gbps), host transfer rate: 300MB/s | Serial ATA-2 (3.0 Gbps), host transfer rate: 300MB/s
Power consumption | Active: 0.26W; idle/standby/sleep: 0.15W | Active: 2.1W (read), 2.2W (write); idle: 0.69W; standby/sleep: 0.2W
Performance | Sequential read: up to 220 MB/s; sequential write: up to 185 MB/s | Power-on to ready: 4.5 sec; average latency: 4.17 msec
Measured sequential read¹ | 176.73 MB/s | 86.07 MB/s
Measured sequential write¹ | 159.98 MB/s | 84.64 MB/s
Measured random read¹ | 10.56 MB/s | 0.61 MB/s
Measured random write¹ | 2.93 MB/s | 1.28 MB/s
Price² | 795,000 won | 183,840 won
¹ Measured on a MacBook Pro, 256KB requests for sequential, 4KB for random. Source: http://forums.macrumors.com/showthread.php?t=658571
² Source: http://www.danawa.com (As of Nov. 24, 2009)
NAND Constraints (1)
No in-place update
• Require sector remapping (or address translation)
Bit errors
• Require the use of error correction codes (ECC)
Bad blocks
• Factory-marked & run-time bad blocks
• Require bad block remapping
Limited program/erase cycles
• < 100K for SLCs
• < 10K for MLCs
• Require wear-leveling
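
One simple way to combine bad-block handling with wear-leveling is to keep an erase counter per block and always hand out the least-worn free block. The sketch below assumes that policy; the `BlockPool` class, its method names, and the retirement rule are hypothetical and are not taken from any particular FTL.

```python
class BlockPool:
    """Tracks usable blocks, skipping factory-marked bad blocks (illustrative sketch)."""

    def __init__(self, num_blocks, factory_bad=(), max_pe_cycles=10_000):
        bad = set(factory_bad)
        self.erase_count = {b: 0 for b in range(num_blocks) if b not in bad}
        self.free = set(self.erase_count)
        self.max_pe_cycles = max_pe_cycles

    def allocate(self):
        """Return the free block with the fewest erases (ties broken by block number).

        Raises ValueError when no free block is left.
        """
        block = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(block)
        return block

    def erase_and_release(self, block):
        """Count the erase; retire the block as run-time bad once it reaches the limit."""
        self.erase_count[block] += 1
        if self.erase_count[block] >= self.max_pe_cycles:
            del self.erase_count[block]
        else:
            self.free.add(block)


pool = BlockPool(4, factory_bad=[1], max_pe_cycles=2)
first = pool.allocate()          # block 0
pool.erase_and_release(first)
second = pool.allocate()         # block 2: block 0 now has one erase, block 1 is bad
```

Real devices also level wear across blocks holding cold data (static wear-leveling), which this sketch does not attempt.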
NAND Constraints (2)
Limited NOP (Number of Programming)
• 1 / sector for most SLCs (4 for 2KB page)
• 1 / page for most MLCs
Sequential page programming
• For large block SLCs and MLCs
Pair-page programming in MLCs
• Two pages inside a block are linked together
• Performance difference
• Interference
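
The NOP limit and the sequential page programming rule can be expressed as two checks made before each program operation. The `Block` class below is a hypothetical model for illustration; it assumes pages must be programmed in ascending order within a block and that a page may be programmed at most `nop` times between erases.

```python
class Block:
    """Models the two programming constraints of one erase block (illustrative)."""

    def __init__(self, pages_per_block=64, nop=1):
        self.programs = [0] * pages_per_block  # program count per page since last erase
        self.nop = nop
        self.highest = -1                      # highest page programmed so far

    def program(self, page):
        if page < self.highest:
            raise ValueError("pages must be programmed in ascending order")
        if self.programs[page] >= self.nop:
            raise ValueError("NOP limit reached; erase the block first")
        self.programs[page] += 1
        self.highest = page

    def erase(self):
        self.programs = [0] * len(self.programs)
        self.highest = -1


blk = Block(pages_per_block=4, nop=1)
blk.program(0)
blk.program(2)      # skipping page 1 is allowed, going back to it is not
```

With `nop=4`, as for a 2KB-page SLC, the same page could be programmed four times (for example one 512-byte sector at a time) before an erase is required.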
FTL (1)
Flash card internals
[Figure: flash card internals]
FTL (2)
SSD internals
[Figure: SSD internals. Source: Mtron Technology]
FTL (3)
Flash Translation Layer (FTL)
• A software layer to make NAND flash fully emulate traditional block devices (e.g., disks).
Why FTL?
• No in-place update
• Bulk erase
[Figure: without an FTL, the file system issues Read Sectors / Write Sectors while the device driver and flash memory offer Read / Write / Erase (mismatch); with an FTL placed on top of the device driver, the file system keeps issuing Read Sectors / Write Sectors]
FTL (4)
For performance
• Sector mapping (or address translation)
• Garbage collection
• Interleaving over multiple channels & flash chips
• Request scheduling
• Buffer management
For reliability
• Power-off recovery
• Wear-leveling
• Bad block management
• Error correction code (ECC)
Sector Mapping (1)
General page mapping
• Most flexible
• Efficient handling of small writes
• Large memory footprint
  – One mapping entry per page: 32MB for 32GB MLC (4KB page)
  – Bitmap for page validity
  – Per-block invalid page counter
• Sensitive to the amount of reserved blocks
• Performance affected as the system ages
[Figure: page mapping table for LPN 0 to LPN 15 pointing into data blocks and log blocks]
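
The 32MB figure can be reproduced by counting pages, and the out-of-place update itself fits in a few lines. The sketch below assumes 4-byte mapping entries (an assumption that happens to match the slide's number) and uses a hypothetical `PageMapFTL` class that appends every write to the next free physical page; garbage collection is left out.

```python
def page_map_bytes(capacity_bytes, page_bytes, entry_bytes=4):
    """Mapping table size with one entry per page (entry size is an assumption)."""
    return capacity_bytes // page_bytes * entry_bytes


class PageMapFTL:
    """Minimal page-level mapping with a validity bitmap (illustrative, no GC)."""

    def __init__(self, num_pages):
        self.l2p = {}                      # logical page number -> physical page number
        self.valid = [False] * num_pages   # bitmap for page validity
        self.next_free = 0

    def write(self, lpn):
        if self.next_free >= len(self.valid):
            raise RuntimeError("no free page; garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False        # the old copy becomes garbage
        ppn = self.next_free
        self.next_free += 1
        self.valid[ppn] = True
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.l2p[lpn]


print(page_map_bytes(32 * 2**30, 4 * 2**10) // 2**20, "MB")
ftl = PageMapFTL(num_pages=4)
ftl.write(1)
ftl.write(2)
ftl.write(1)    # the update goes to a new page; physical page 0 is invalidated
```

As invalid pages accumulate, fewer free pages remain, which is why performance depends on the amount of reserved blocks and on the age of the system.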
Sector Mapping (2)
Naïve block mapping
• Each table entry maps one block
• Small RAM usage
• Inefficient handling of small writes
[Figure: block mapping table for LBN 0 to LBN 3; updated pages 4’ to 7’ are written to a new block]
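
With block mapping, a logical page number is split into a logical block number and an offset, and only the block number is translated. The sketch below assumes 4 pages per block, as in the 16-page figure; the `BlockMapFTL` class is hypothetical and, to stay short, never recycles old blocks.

```python
PAGES_PER_BLOCK = 4  # matches the 16-page, 4-block example in the figure


class BlockMapFTL:
    """One table entry per logical block (illustrative; old blocks are not recycled)."""

    def __init__(self, num_logical_blocks):
        self.l2p_block = {lbn: lbn for lbn in range(num_logical_blocks)}
        self.next_free = num_logical_blocks   # next unused physical block
        self.page_copies = 0                  # extra page writes caused by updates

    def translate(self, lpn):
        lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
        return self.l2p_block[lbn] * PAGES_PER_BLOCK + offset

    def update(self, lpn):
        """Updating one page moves the whole logical block to a fresh physical block."""
        lbn, _ = divmod(lpn, PAGES_PER_BLOCK)
        self.l2p_block[lbn] = self.next_free
        self.next_free += 1
        self.page_copies += PAGES_PER_BLOCK - 1   # the untouched pages must be copied
        return self.translate(lpn)


ftl = BlockMapFTL(num_logical_blocks=4)
before = ftl.translate(5)    # LBN 1, offset 1
after = ftl.update(5)        # LBN 1 now lives in physical block 4
```

The table needs only one entry per block, but every single-page update costs a block's worth of copying, which is the small-write inefficiency noted above.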
Sector Mapping (3)
Log block scheme [IEEE TOCE 2002]
• A small number of log blocks
• 1+ log block(s) per data block
• Page mapping for log blocks
• Full/partial/switch merge
• Switch merge for sequential updates
• Low log block utilization
[Figure: data blocks for LBN 0 to LBN 3 with per-block log blocks holding 1’, 2’, 1’’, 2’’ and 8’]
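
Which merge applies depends on the order in which pages were appended to a log block. The function below is a rough sketch of that decision under simplifying assumptions: every page of the data block is valid, and the hypothetical `classify_merge` helper sees only the page offsets in append order.

```python
def classify_merge(log_offsets, pages_per_block):
    """Return (merge kind, pages to copy) for one log block.

    log_offsets lists the in-block page offsets in the order they were
    appended to the log block.
    """
    if not log_offsets:
        return ("none", 0)
    # "In place" means slot i of the log block holds page offset i.
    in_place = all(slot == off for slot, off in enumerate(log_offsets))
    if in_place and len(log_offsets) == pages_per_block:
        return ("switch", 0)          # the log block simply becomes the data block
    if in_place:
        # copy the remaining pages from the data block, then switch
        return ("partial", pages_per_block - len(log_offsets))
    # otherwise gather the newest copy of every page into a fresh block
    return ("full", pages_per_block)


print(classify_merge([0, 1, 2, 3], 4))
print(classify_merge([0, 1], 4))
print(classify_merge([1, 2, 1, 2], 4))   # the 1', 2', 1'', 2'' pattern from the figure
```

A purely sequential rewrite ends in a switch merge with no copying, whereas repeated updates to the same pages force the expensive full merge.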
Sector Mapping (4)
FAST [ACM TECS 2007]
• Log blocks shared by all data blocks
• Sequential/random log blocks
• Improved log block utilization
• Increased merge time
[Figure: data blocks for LBN 0 to LBN 3 with shared log blocks holding 1’, 2’, 8’, 1’’, 2’’, 12’, 13’, 9’]
Sector Mapping (5)
Superblock FTL [ACM EMSOFT 2006]
• Superblock = logically adjacent N blocks
• A superblock shares log blocks
• Up to M log blocks per superblock
• Page mapping within a superblock
• Hot/cold pages separation
• The amount of mapping information increased
[Figure: LPN 0 to LPN 15 grouped into superblocks LSBN 0 and LSBN 1, each with its own log blocks]
Sector Mapping (6)
μ-FTL [ACM EMSOFT 2008]
• Page mapping
• Multiple mapping granularities
  – Based on extents
  – Reduce the amount of mapping information
• Requires more sophisticated index structure
  – μ-Tree is used to store the mapping information
• Tunable memory footprint
  – Frequently accessed mapping information cached in memory
[Figure: extent entries (start LPN, length): (0, 1), (1, 1), (2, 1), (3, 1), (4, 4), (8, 1), (9, 1), (10, 2), (12, 2), (14, 2)]
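
Extent-based mapping stores one entry per run of consecutive pages instead of one per page. The sketch below uses a sorted Python list with `bisect` as a stand-in index; it is not the μ-Tree, and the `ExtentMap` class, the physical page numbers, and the lack of overlap handling are all simplifying assumptions.

```python
import bisect


class ExtentMap:
    """Maps (start LPN, length) extents to physical pages (illustrative sketch)."""

    def __init__(self):
        self.starts = []     # sorted start LPNs
        self.extents = []    # parallel list of (start_lpn, length, start_ppn)

    def insert(self, start_lpn, length, start_ppn):
        # Assumes the new extent does not overlap an existing one.
        i = bisect.bisect_left(self.starts, start_lpn)
        self.starts.insert(i, start_lpn)
        self.extents.insert(i, (start_lpn, length, start_ppn))

    def lookup(self, lpn):
        """Return the physical page for lpn, or None if no extent covers it."""
        i = bisect.bisect_right(self.starts, lpn) - 1
        if i < 0:
            return None
        start, length, ppn = self.extents[i]
        return ppn + (lpn - start) if lpn < start + length else None


m = ExtentMap()
m.insert(4, 4, 100)   # LPN 4..7 stored contiguously; physical numbers are made up
m.insert(0, 1, 7)
m.insert(10, 2, 40)
print(m.lookup(6), m.lookup(11), m.lookup(8))
```

The extent (4, 4) covers four pages with a single entry, which is how the scheme reduces mapping information when data is written in longer runs.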
Performance (1)
Simulation environment
• 4GB flash memory
  – Large block SLC NAND (2KB page, 128KB block)
• FTL schemes
  – Naïve block mapping
  – Replacement block
  – Log block
  – Superblock
• Workload
  – Trace from PC using NTFS
Performance (2)
Extra erase and write operations
• 256 extra blocks
[Figure: two bar charts comparing Naive, Replacement, Logblock, and Superblock; left panel: Erase (axis 0 to 350,000), right panel: Write (GC) (axis 0 to 18,000,000)]
OS Implications (1)
NAND flash has different characteristics compared to disks
• No seek time
• Asymmetric read/write access times
• No in-place update
• Good sequential read/sequential write/random read performance, but bad random write performance
• Wear-leveling
• …
Traditional operating systems have been optimized for disks. What should be changed?
OS Implications (2)
SSD support in Microsoft Windows 7
• Turn off “defragmentation” for SSDs
• New “TRIM” command
  – Remove-on-delete
• Align file system partition with SSD layout
• Larger block size proposal (4KB)
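
Partition alignment comes down to a modulo check on the partition's starting byte offset. The sketch below assumes 512-byte sectors and a 4KB alignment unit (the MLC page size from the earlier table); both defaults and the helper names are assumptions, and a real drive may prefer a larger unit.

```python
def is_aligned(start_sector, sector_bytes=512, unit_bytes=4096):
    """True if the partition's first byte falls on an alignment-unit boundary."""
    return (start_sector * sector_bytes) % unit_bytes == 0


def align_up(start_sector, sector_bytes=512, unit_bytes=4096):
    """Round a starting sector up to the next aligned sector."""
    sectors_per_unit = unit_bytes // sector_bytes
    return -(-start_sector // sectors_per_unit) * sectors_per_unit


# A legacy layout that starts the first partition at sector 63 is misaligned.
print(is_aligned(63), align_up(63), is_aligned(2048))
```

When the partition is misaligned, a 4KB file system block straddles two flash pages, so a single small write touches both.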
Summary
Software support for NAND flash memory is essential for improving performance & reliability of the system
We need to revisit OS policies and mechanisms which are optimized for disks