NAND Flash-based Storage
Jin-Soo Kim ([email protected])
Computer Systems Laboratory, Sungkyunkwan University
http://csl.skku.edu



  • NAND Flash-based Storage

    Jin-Soo Kim ([email protected])
    Computer Systems Laboratory
    Sungkyunkwan University
    http://csl.skku.edu

  • Today's Topics

    • NAND flash memory
    • Flash Translation Layer (FTL)
    • OS implications

    CSE3008: Operating Systems | Fall 2009 | Jin-Soo Kim ([email protected])

  • Flash Memory Characteristics

    Flash memory
    • Non-volatile, updateable, high-density
    • Low cost, low power consumption, high reliability

    Erase-before-write
    • Read
    • Write or Program: 1 → 0
    • Erase: 0 → 1

    Read faster than write/erase

    Bulk erase
    • Erase unit: block
    • Program unit: byte or word (NOR), page (NAND)

    Example:
    1 1 1 1 1 1 1 1
      ↓ write (program)
    1 1 0 1 1 0 1 0
      ↓ erase
    1 1 1 1 1 1 1 1
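
The erase-before-write rule can be modeled in a few lines. The sketch below is only an illustration, assuming one byte per cell group and treating programming as a bitwise AND; the names `program`, `erase_block`, and `ERASED` are hypothetical and not from the slides.

```python
ERASED = 0xFF  # every bit of an erased byte reads as 1


def program(cell: int, data: int) -> int:
    """Program a byte: bits can only change 1 -> 0, so the result is cell AND data."""
    return cell & data


def erase_block(block: list) -> list:
    """Erase works on a whole block and sets every bit back to 1."""
    return [ERASED] * len(block)


block = [ERASED] * 4                        # freshly erased block of four bytes
block[0] = program(block[0], 0b11011010)    # the bit pattern used on the slide
retry = program(block[0], 0b11111111)       # programming cannot turn 0 back into 1
block = erase_block(block)                  # only a bulk erase restores the 1s
```

Because `program` can only clear bits, updating data in place would require erasing the whole block first, which is why the later slides remap writes to other pages instead.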

  • NOR Flash

    • Random, direct access interface
    • Fast random reads
    • Slow erase and write
    • Mainly for code storage
    • Intel, Spansion, STMicro, ...

  • NAND Flash

    • I/O mapped access
    • Smaller cell size
    • Lower cost
    • Smaller size erase blocks
    • Better performance for erase and write
    • Mainly for data storage
    • Samsung, Toshiba, Hynix, ...

  • NAND Flash Architecture

    2Gb NAND flash device organization

    [Figure: 2Gb NAND flash device organization]

    Source: Micron Technology, Inc.

  • NAND Flash Types

                          SLC NAND¹                  SLC NAND²                  MLC NAND³
                          (small block)              (large block)
    Page size (Bytes)     512+16                     2,048+64                   4,096+128
    Pages / Block         32                         64                         128
    Block size            16KB                       128KB                      512KB
    tR (read)             15 μs (max)                20 μs (max)                50 μs (max)
    tPROG (program)       200 μs (typ)               200 μs (typ)               600 μs (typ)
                          500 μs (max)               700 μs (max)               1,200 μs (max)
    tBERS (erase)         2 ms (typ)                 1.5 ms (typ)               3 ms (typ)
                          3 ms (max)                 2 ms (max)
    NOP                   1 (main), 2 (spare)        4                          1
    Endurance Cycles      100K                       100K                       10K
    ECC (per 512 Bytes)   1 bit ECC                  1 bit ECC                  4 bits ECC
                          2 bits EDC                 2 bits EDC                 5 bits EDC

    ¹ Samsung K9F1208X0C (512Mb)
    ² Samsung K9K8G08U0A (8Gb)
    ³ Micron Technology Inc.
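
The block sizes in the table follow directly from the page size and the pages-per-block count. The short check below recomputes them and adds a rough estimate of how long it takes to program a full block; the helper names are hypothetical, and the estimate assumes back-to-back page programs at the typical tPROG with bus transfer time ignored.

```python
# (data bytes per page, pages per block, typical tPROG in microseconds), from the table
DEVICES = {
    "SLC small block": (512, 32, 200),
    "SLC large block": (2048, 64, 200),
    "MLC": (4096, 128, 600),
}


def block_size_kb(page_bytes, pages_per_block):
    """Block size in KB, counting only the main (non-spare) area of each page."""
    return page_bytes * pages_per_block // 1024


def block_program_ms(pages_per_block, tprog_us):
    """Time to program every page of a block back to back, ignoring bus transfer."""
    return pages_per_block * tprog_us / 1000


for name, (page, pages, tprog) in DEVICES.items():
    print(name, block_size_kb(page, pages), "KB,", block_program_ms(pages, tprog), "ms")
```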

  • NAND Applications

    Universal Flash Drives (UFDs)

    Flash cards
    • CompactFlash, MMC, SD, Memory stick, …

    Embedded devices
    • Cell phones, MP3 players, PMPs, PDAs, Digital TVs, Set-top boxes, Car navigators, …

    Hybrid HDDs

    Intel Turbo Memory

    SSDs (Solid-State Disks)

  • SSDs (1)

    HDDs vs. SSDs
    • 1.8” HDD and flash SSD (78.5x54x4.15mm)
    • 2.5” HDD and flash SSD (101x70x9.3mm)

    [Figure: photos of the drives, top and bottom views]

  • SSDs (2)

    Feature                 SSD (Samsung)                       HDD (Seagate)
    Model                   MMDOE56G5MXP (PM800)                ST9500420AS (Momentus 7200.4)
    Capacity                256GB                               500GB
                            (16Gb MLC x 128, 8 channels)        (2 Discs, 4 Heads, 7200RPM)
    Form factor             2.5”                                2.5”
                            Weight: 84g                         Weight: 110g
    Host interface          Serial ATA-2 (3.0 Gbps)             Serial ATA-2 (3.0 Gbps)
                            Host transfer rate: 300 MB/s        Host transfer rate: 300 MB/s
    Power consumption       Active: 0.26W                       Active: 2.1W (Read), 2.2W (Write)
                            Idle/Standby/Sleep: 0.15W           Idle: 0.69W, Standby/Sleep: 0.2W
    Performance             Sequential read: Up to 220 MB/s     Power-on to ready: 4.5 sec
                            Sequential write: Up to 185 MB/s    Average latency: 4.17 msec
    Measured performance¹   Sequential read: 176.73 MB/s        Sequential read: 86.07 MB/s
                            Sequential write: 159.98 MB/s       Sequential write: 84.64 MB/s
                            Random read: 10.56 MB/s             Random read: 0.61 MB/s
                            Random write: 2.93 MB/s             Random write: 1.28 MB/s
    Price²                  795,000 won                         183,840 won

    Measured on a MacBook Pro, with 256KB requests for sequential and 4KB requests for random access.

    ¹ Source: http://forums.macrumors.com/showthread.php?t=658571
    ² Source: http://www.danawa.com (As of Nov. 24, 2009)
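
Two of the table's figures can be related with simple arithmetic. The sketch below derives the HDD's average latency from its spindle speed (half a revolution on average) and converts the measured 4KB random throughput into requests per second. The function names are hypothetical, and the conversion assumes 1 MB = 1024 KB, which the source does not state.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency is the time for half a revolution."""
    return 60_000 / rpm / 2


def iops(mb_per_s, request_kb=4):
    """Requests per second for a given throughput (assumes 1 MB = 1024 KB)."""
    return mb_per_s * 1024 / request_kb


print(round(avg_rotational_latency_ms(7200), 2))   # 4.17, matching the table
print(round(iops(10.56)), round(iops(0.61)))       # SSD vs. HDD random reads per second
```

Under these assumptions the SSD serves roughly 17 times as many random reads per second as the HDD, while its measured random write rate is only about twice the HDD's.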

  • NAND Constraints (1)

    No in-place update
    • Require sector remapping (or address translation)

    Bit errors
    • Require the use of error correction codes (ECC)

    Bad blocks
    • Factory-marked & run-time bad blocks
    • Require bad block remapping

    Limited program/erase cycles
    • < 100K for SLCs
    • < 10K for MLCs
    • Require wear-leveling
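
As an illustration of the last two constraints, here is a minimal sketch of a block allocator that skips known bad blocks and always hands out the free block with the fewest erases. This is one simple policy chosen for the example, not the method of any particular FTL; the class and method names are hypothetical.

```python
class WearLevelingAllocator:
    """Hands out the free block with the fewest erases and never uses bad blocks."""

    def __init__(self, num_blocks, bad_blocks=()):
        # Bad blocks are left out of the table, so they can never be allocated.
        self.erase_count = {b: 0 for b in range(num_blocks) if b not in bad_blocks}
        self.free = set(self.erase_count)

    def allocate(self):
        # Ties are broken by block number to keep the behavior deterministic.
        block = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(block)
        return block

    def erase_and_release(self, block):
        self.erase_count[block] += 1
        self.free.add(block)


alloc = WearLevelingAllocator(4, bad_blocks={2})
first = alloc.allocate()          # block 0
alloc.erase_and_release(first)    # block 0 now has one erase
second = alloc.allocate()         # block 1, because it is less worn than block 0
```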

  • NAND Constraints (2)

    Limited NOP (Number of Programming)
    • 1 / sector for most SLCs (4 for 2KB page)
    • 1 / page for most MLCs

    Sequential page programming
    • For large block SLCs and MLCs

    Pair-page programming in MLCs
    • Two pages inside a block are linked together
    • Performance difference
    • Interference
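
The NOP and sequential-programming limits can be expressed as checks on a block model. The sketch below assumes NOP = 1 per page and that pages must be programmed in ascending order within a block; real devices differ in the details, and the `Block` class is hypothetical.

```python
class Block:
    """A block whose pages must be programmed in ascending order, once each."""

    def __init__(self, pages_per_block=64):
        self.pages_per_block = pages_per_block
        self.next_page = 0   # lowest page number that may still be programmed

    def program(self, page):
        if not 0 <= page < self.pages_per_block:
            raise ValueError("page outside block")
        if page < self.next_page:
            # Either NOP = 1 was already used up or the page was skipped earlier.
            raise ValueError("page no longer programmable; erase the block first")
        self.next_page = page + 1

    def erase(self):
        self.next_page = 0
```

With this model, rewriting page 0 after page 1 has been programmed is rejected until the whole block is erased.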

  • FTL (1)

    Flash cards internals

    [Figure: internal structure of a flash card]

  • FTL (2)

    SSDs internals

    [Figure: internal structure of an SSD]

    Source: Mtron Technology

  • FTL (3)

    Flash Translation Layer (FTL)
    • A software layer to make NAND flash fully emulate traditional block devices (e.g., disks).

    Why FTL?
    • No in-place-update
    • Bulk erase

    [Figure: two software stacks. Without an FTL, the file system issues Read Sectors / Write Sectors while the device driver and flash memory offer Read / Write / Erase (mismatch). With an FTL inside the device driver, the file system's Read Sectors / Write Sectors are translated to flash Read / Write / Erase.]

  • FTL (4)

    For performance
    • Sector mapping (or address translation)
    • Garbage collection
    • Interleaving over multiple channels & flash chips
    • Request scheduling
    • Buffer management

    For reliability
    • Power-off recovery
    • Wear-leveling
    • Bad block management
    • Error correction code (ECC)
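
Of the tasks listed, garbage collection is easy to sketch. The example below uses a greedy policy (reclaim the block with the most invalid pages), which is one common choice and is assumed here for illustration; the counters and function names are hypothetical.

```python
def pick_victim(invalid_pages):
    """Greedy policy: reclaim the block with the most invalid pages."""
    # Ties go to the lower block number so the choice is deterministic.
    return max(invalid_pages, key=lambda block: (invalid_pages[block], -block))


def pages_to_copy(block, invalid_pages, pages_per_block):
    """Valid pages that must be copied elsewhere before the victim is erased."""
    return pages_per_block - invalid_pages[block]


invalid = {0: 3, 1: 60, 2: 17}     # hypothetical per-block invalid page counters
victim = pick_victim(invalid)
cost = pages_to_copy(victim, invalid, pages_per_block=64)
```

Here block 1 is chosen and only 4 valid pages need copying, which is why the greedy choice keeps the extra write traffic low.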

  • Sector Mapping (1)

    General page mapping
    • Most flexible
    • Efficient handling of small writes
    • Large memory footprint
      – One mapping entry per page: 32MB for 32GB MLC (4KB page)
      – Bitmap for page validity
      – Per-block invalid page counter
    • Sensitive to the amount of reserved blocks
    • Performance affected as the system ages

    [Figure: page mapping table (LPN 0 to LPN 15) with data blocks and log blocks for a write sequence W; updated pages are marked with primes]
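
The 32MB figure can be reproduced, and the out-of-place update that page mapping enables can be sketched, as follows. The 4-byte entry size is an assumption that happens to match the slide's number, and `PageMapFTL` is a hypothetical, heavily simplified model with no garbage collection.

```python
def mapping_table_bytes(capacity_bytes, page_bytes, entry_bytes=4):
    """One mapping entry per page; the 4-byte entry size is an assumption."""
    return capacity_bytes // page_bytes * entry_bytes


class PageMapFTL:
    def __init__(self, num_pages):
        self.l2p = {}                       # logical page number -> physical page number
        self.valid = [False] * num_pages    # page validity bitmap
        self.next_free = 0                  # next erased physical page (append-only)

    def write(self, lpn):
        if self.next_free >= len(self.valid):
            raise RuntimeError("no free pages; garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False         # the old copy becomes garbage
        ppn = self.next_free
        self.next_free += 1
        self.l2p[lpn] = ppn
        self.valid[ppn] = True
        return ppn


print(mapping_table_bytes(32 * 2**30, 4 * 2**10) // 2**20, "MB")
```

Each small write costs a single page program, but invalid pages accumulate until a garbage collector reclaims them, which ties in with the slide's remark that performance depends on reserved blocks and system age.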

  • Sector Mapping (2)

    Naïve block mapping
    • Each table entry maps one block
    • Small RAM usage
    • Inefficient handling of small writes

    [Figure: block mapping table (LBN 0 to LBN 3) with data blocks and log blocks for a write sequence W; updated pages are marked with primes]
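
A minimal sketch of block-level translation, assuming four pages per block as drawn in the slide's figure. The table contents and function names are hypothetical.

```python
PAGES_PER_BLOCK = 4   # the figure draws four pages per block


def translate(lpn, block_map):
    """Split the logical page number into (block, offset) and remap only the block."""
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
    return block_map[lbn] * PAGES_PER_BLOCK + offset


def update_cost(pages_updated):
    """Page programs needed to update part of a block.

    The page offset is fixed, so new data goes to a fresh block and every
    untouched page has to be copied along with it.
    """
    return PAGES_PER_BLOCK if pages_updated > 0 else 0


block_map = {0: 2, 1: 0, 2: 3, 3: 1}   # hypothetical LBN -> PBN table
```

The table needs only one entry per block, but updating a single page still costs a full block's worth of page programs plus an erase of the old block.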

  • Sector Mapping (3)

    Log block scheme [IEEE TOCE 2002]
    • A small number of log blocks
    • 1+ log block(s) per data block
    • Page mapping for log blocks
    • Full/partial/switch merge
    • Switch merge for sequential updates
    • Low log block utilization

    [Figure: block mapping table (LBN 0 to LBN 3) with data blocks and per-data-block log blocks for a write sequence W; updated pages are marked with primes]
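
The three merge types can be told apart from the contents of a log block. The sketch below is a simplified reading of the scheme, assuming a merge is a switch when the log block holds every page in place, partial when it holds an in-place prefix, and full otherwise; the function name and the list encoding are hypothetical.

```python
def classify_merge(log_offsets, pages_per_block):
    """log_offsets[i] is the logical page offset stored in slot i of the log block."""
    if not log_offsets:
        return "none"      # nothing was logged, so nothing to merge
    in_place = all(off == slot for slot, off in enumerate(log_offsets))
    if in_place and len(log_offsets) == pages_per_block:
        return "switch"    # the log block simply becomes the new data block
    if in_place:
        return "partial"   # copy the remaining pages into the log block, then switch
    return "full"          # gather the newest copy of every page into a fresh block
```

For example, `classify_merge([0, 1, 2, 3], 4)` gives `"switch"`, while a log block filled by repeated updates of pages 1 and 2, `classify_merge([1, 2, 1, 2], 4)`, gives `"full"`.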

  • Sector Mapping (4)

    FAST [ACM TECS 2007]
    • Log blocks shared by all data blocks
    • Sequential/random log blocks
    • Improved log block utilization
    • Increased merge time

    [Figure: block mapping table (LBN 0 to LBN 3) with data blocks and shared log blocks for a write sequence W; updated pages are marked with primes]

  • Sector Mapping (5)

    Superblock FTL [ACM EMSOFT 2006]
    • Superblock = logically adjacent N blocks
    • A superblock shares log blocks
    • Up to M log blocks per superblock
    • Page mapping within a superblock
    • Hot/cold pages separation
    • The amount of mapping information increased

    [Figure: superblocks LSBN 0 and LSBN 1 with page mapping (LPN 0 to LPN 15), data blocks, and log blocks for a write sequence W; updated pages are marked with primes]

  • Sector Mapping (6)

    μ-FTL [ACM EMSOFT 2008]
    • Page mapping
    • Multiple mapping granularities
      – Based on extents
      – Reduce the amount of mapping information
    • Requires more sophisticated index structure
      – μ-Tree is used to store the mapping information
    • Tunable memory footprint
      – Frequently accessed mapping information cached in memory

    [Figure: extent mapping entries as (start LPN, length): (0, 1), (1, 1), (2, 1), (3, 1), (4, 4), (8, 1), (9, 1), (10, 2), (12, 2), (14, 2), with data blocks and log blocks for a write sequence W]
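
Extent-based mapping can be illustrated with a sorted list and binary search. This is a stand-in for the μ-Tree, not the structure the scheme actually uses; the (start LPN, length) pairs follow the slide's figure, while the physical page numbers are made up for the example.

```python
from bisect import bisect_right

# (start LPN, length, start PPN); physical page numbers are hypothetical
EXTENTS = [(0, 1, 0), (1, 1, 20), (2, 1, 21), (3, 1, 3), (4, 4, 4),
           (8, 1, 22), (9, 1, 30), (10, 2, 12), (12, 2, 28), (14, 2, 14)]
STARTS = [start for start, _, _ in EXTENTS]


def lookup(lpn):
    """Return the physical page for lpn, or None if no extent covers it."""
    i = bisect_right(STARTS, lpn) - 1      # last extent starting at or before lpn
    if i < 0:
        return None
    start, length, ppn = EXTENTS[i]
    if lpn >= start + length:
        return None
    return ppn + (lpn - start)
```

Ten entries cover all sixteen logical pages here, where a plain page map would need sixteen; the saving grows with the length of sequentially written runs.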

  • Performance (1)

    Simulation environment
    • 4GB flash memory
      – Large block SLC NAND (2KB page, 128KB block)
    • FTL schemes
      – Naïve block mapping
      – Replacement block
      – Log block
      – Superblock
    • Workload
      – Trace from PC using NTFS

  • Performance (2)

    Extra erase and write operations
    • 256 extra blocks

    [Figure: bar charts of the number of extra Erase and Write (GC) operations for the Naive, Replacement, Logblock, and Superblock schemes]

  • OS Implications (1)

    NAND flash has different characteristics compared to disks
    • No seek time
    • Asymmetric read/write access times
    • No in-place-update
    • Good sequential read/sequential write/random read performance, but bad random write performance
    • Wear-leveling
    • …

    Traditional operating systems have been optimized for disks. What should be changed?

  • OS Implications (2)

    SSD support in Microsoft Windows 7
    • Turn off “defragmentation” for SSDs
    • New “TRIM” command
      – Remove-on-delete
    • Align file system partition with SSD layout
    • Larger block size proposal (4KB)
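
Partition alignment comes down to a divisibility check on the partition's starting byte offset. The sketch below assumes 512-byte sectors; the start sectors 63 and 2048 are used only as illustrative values, and the function name is hypothetical.

```python
SECTOR_BYTES = 512   # assumed logical sector size


def is_aligned(start_sector, unit_bytes=4096):
    """True if the partition starts on a multiple of the flash page (or block) size."""
    return start_sector * SECTOR_BYTES % unit_bytes == 0


print(is_aligned(63))                  # False: 63 * 512 is not a multiple of 4096
print(is_aligned(2048))                # True: starts at exactly 1 MB
print(is_aligned(2048, 128 * 1024))    # True: also aligned to a 128KB block
```

With a misaligned start, a 4KB file system block straddles two flash pages, so each small write touches both.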

  • Summary

    Software support for NAND flash memory is essential for improving performance & reliability of the system

    We need to revisit OS policies and mechanisms which are optimized for disks