A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems
Jin Kyu Kim, Hyung Gyu Lee, Shinho Choi, Kyoung Il Bahng (Samsung Electronics)
2
Why a Hybrid Approach?

Gasoline engine
[-] Gasoline price is high
[-] Pollution
[-] Fuel efficiency depends on traffic: good on the highway, bad in heavy traffic
[+] Fuel capacity is large

Electric motor
[+] Electricity price is reasonable
[+] No pollution
[+] Good fuel efficiency
[-] Fuel capacity is very limited
3
Era of the Hybrid Approach

NAND Flash
[+] Very high density
[+] Good for large sequential access
[-] Short life span
[-] Write performance depends on the access pattern: bad for small random access
[-] No support for byte access

NVRAM
[+] Long life span
[+] Good performance
[+] Supports byte access
[-] Low density

Hybrid storage: NVRAM holds the metadata (FTL and file system metadata), NAND holds the user data.
4
Characteristics of NVRAM (PRAM)

[Figure: write performance (s) vs. capacity (bit) for SRAM, DRAM, MRAM, FeRAM, PRAM, NOR flash, and NAND flash. Working memories (SRAM, DRAM) are volatile; NGNVMs (MRAM, FeRAM, PRAM) need no erase operation; flash storage memories require erase.]

• Flash memory devices
  - Erase before program
  - Different granularity for erase and program operations
  - Limited lifetime
  - High density (NAND)
• NVRAM
  - Fast read/write
  - Byte operation (random access)
  - No erase operation
  - Capacity (cost) problem
5
Memory Components

PRAM: no erase operation, byte accessible, high performance
NAND: high density, low cost
Synergy effect: a high-performance, low-cost NAND-based storage solution (a high-performance MLC NAND based storage system)

[Figure: conventional architecture (CPU; M-SDRAM working RAM; NOR code storage with XIP; NAND data storage) vs. PRAM+NAND hybrid architecture (CPU; M-SDRAM working RAM; PRAM holding code for XIP plus metadata; NAND holding user data). A NAND device consists of blocks, each containing pages 0 through 63.]
6
Overall Storage Hierarchy

[Figure: Applications → File system → Block device driver (FTL, translating logical to physical addresses) → LLD (low-level driver) → Physical NAND devices. The file system issues (1) user data and (2) FS metadata I/O; the FTL adds (3) its own FTL metadata I/O.]

Performance problems!

(1) User data (clusters)
  - Large data
  - Sequential access pattern
  - Sector-unit I/O
  ⇒ Appropriate for NAND properties

(2) Metadata (clusters)
  - Small data
  - Frequently updated
  ⇒ Appropriate for PRAM properties

Mixing patterns (1) and (2) yields a random pattern, the main reason for FTL performance degradation, and hence performance degradation of the overall storage.
7
Motivations

Motivation 1: file system level
• In addition to user data, the file system must manage metadata.
• The properties of metadata degrade FTL performance, which in turn degrades file system throughput.
⇒ If FS metadata is stored in NVRAM instead of NAND:
  - The overhead of handling metadata I/O decreases.
  - Random-access I/O to the FTL decreases.

Motivation 2: FTL level
• Performance and performance stability depend on the mapping algorithm.
• FTL performance depends on the user's (file system's) access pattern.
  - Page mapping: high performance, high resource consumption
  - Block mapping: low performance, low resource consumption
  - As the degree of random access rises, fine mapping granularity keeps performance high while coarse granularity degrades it.
⇒ If the FTL is implemented with NVRAM:
  - Page mapping becomes feasible without a large amount of RAM.
  - Fine granularity makes performance stable against random access.
8
Motivations

[Figure: resource vs. performance. A coarse-grained mapping FTL has low resource use but low performance; a fine-grained mapping FTL has high performance but high resource use; a fine-grained mapping FTL with NVRAM combines high performance with low RAM use.]
9
Overall Scheme – FSMS, hFTL

(a) Conventional storage architecture: FAT-based file system → general block device driver → FTL (flash translation layer) → LLD (low-level driver) → physical NAND devices. Metadata, user data, and FTL metadata all travel the same path to NAND (logical addresses above the FTL, physical addresses below).

(b) Hybrid storage architecture: FAT-based file system → hybrid block device driver, which separates the traffic. FS metadata goes through the PRAM wrapper to the PRAM device; user data goes through hFTL and the LLD to the NAND devices, with the FTL metadata itself kept in PRAM.
10
Approach in File System Level
FSMS (File System Metadata Separation)
11
File System Metadata Separation Scheme

• Separate file system metadata (FAT, directories, log) from NAND flash
  - Store FS metadata on PRAM ⇒ decrease random access to the FTL
  - Only user data is stored on NAND flash
  - Fewer random accesses ⇒ lower FTL overhead ⇒ higher performance and longer NAND flash life span

[Figure: the logical address space is split in two. The metadata area (FAT, directory, log; the low LSNs) is served by the PRAM wrapper and PRAM device; the data cluster area (data blocks #0 to #n) is served by the FTL and NAND device.]
12
Approach in Block Device Level
hFTL (PRAM+NAND hybrid storage FTL)
13
Need for hFTL

• Complex applications and multi-tasking systems increase random access to the FTL.
• 3rd-party (non-FAT) file systems: FSMS is not applicable, since it needs file system modification.

Goal: an FTL that
- supports large-volume NAND flash,
- avoids using a large amount of DRAM,
- keeps performance stable against random writes.

Basic ideas:
- apply page mapping,
- store FTL metadata on NVRAM,
- store user data (FS data) on NAND flash.
14
Implementation of the Hybrid FTL

• Page mapping table entry (in PRAM): an 8-bit history of the logical page (LP) plus a 24-bit LP-to-PP (physical page) mapping; 24 bits allow an 8 GB storage space with 2 KB pages.
• NAND page: a 2 KB main area plus a spare area.
• Physical page status bitmap (in PRAM): the status of each physical page, encoded as 0x00 free, 0x01 valid, 0x11 invalid.

[Figure: the page mapping table (PP for LP0 ... LPN) and the physical page status bitmap (status for PP0 ... PPN) reside in PRAM; data blocks containing valid and invalid pages reside in NAND.]
15
Implementation of the Hybrid FTL

• PRAM and NAND layout

[Figure: the NAND device holds blocks #0 to #n: data blocks (mixing valid and invalid pages), garbage blocks (mostly invalid pages), and a buffer block (valid pages followed by free pages). PRAM holds the page mapping table (PP for each LP), the physical page status bitmap (status of PP#0 to PP#m), per-block information (BI#0 to BI#n), the buffer block context (BC), a garbage queue of block numbers, and the PRAM context. DRAM holds only a small reverse map (the LP for each page offset of the buffer block) and block info.]
16
Performance Optimizations (1)

• Filtering between the file system and the PRAM device
• On average, only 16% of a metadata block is changed per write

[Figure: a 512-byte write request from the file system passes through the PRAM filter, which forwards only the changed words (much less than 512 bytes) to the PRAM device; 512-byte read requests bypass the filter.]

[Figure: distribution of changed words in metadata write requests; frequency (0 to 20K) vs. number of changed words (0 to 250), heavily skewed toward few changed words.]
17
Performance Optimizations (2)

• Functionality of logical delete
  - When a file is deleted, the sectors it occupied are marked invalid in the PSN status bitmap kept on the PRAM device.
  - This removes the overhead of managing the space that the deleted file occupied.

[Figure: before and after logical delete. The file system's logical addresses (LSN 701 to LSN 450000) map to physical sector numbers (PSNs) on the NAND device; after deleting file A, its PSNs are marked invalid in the PRAM-resident status bitmap, so the FTL no longer manages that space.]
18
Evaluations

• Evaluation setting
  - S3C2413 CPU board (1 channel, 1 NAND device supported; MLC and SLC)
  - 64 MB PRAM (KPS1215EZM)
  - 1 GB MLC NAND (K9G8G08U0M)
• Evaluations
  - Comparison target: log-block FTL
  - Criteria
    • Performance at the LLD layer (low-level device driver), the FTL layer (block device driver), and the file system layer (TFS4)
    • Life span: erase count
    • Cost: code and data size

[Figure: photo of the evaluation board with the PRAM and NAND devices marked.]
19
LLD and FTL-Level Evaluation

• LLD (low-level device driver): read 4.25 MB/s, write 1.7 MB/s
• FTL
  - Sequential read: 4.16 MB/s (LB-FTL and hFTL similar)
  - Random read: 4.09 MB/s (LB-FTL and hFTL similar)
  - Sequential write: 1.62 MB/s (LB-FTL and hFTL similar)
  - Random write: see figure

[Figure: throughput (KB/s, 0 to 4,500) vs. request unit size (8 to 2048 KB) for hFTL_Write, LBFTL_Write, hFTL_Read, and LBFTL_Read under (a) a sequential access pattern and (b) a random access pattern, with LLD read and write shown for reference.]
20
File System Level Evaluations

• IOzone benchmark for sequential I/O

[Figure: throughput (KB/s) vs. request unit size (8 to 2048 KB) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS. (a) Sequential read, 0 to 4,500 KB/s; (b) sequential write, 0 to 1,600 KB/s.]
21
File System Level Evaluations

• IOzone benchmark for random I/O

[Figure: throughput (KB/s) vs. request unit size (8 to 2048 KB) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS. (c) Random read, 0 to 4,000 KB/s; (d) random write, 0 to 1,600 KB/s.]
22
Life Span and Implementation Cost

• Life span
[Figure: number of erase operations: LBFTL 20,274; LBFTL+FSMS 11,991; hFTL 13,746; hFTL+FSMS 10,426.]

• Implementation cost
[Figure: code size and RW data size (KB, 0 to 60) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS.]

Required PRAM for 1 GB of NAND: 2 MB for FSMS, 2.5 MB for hFTL.
23
Contribution

• Summary
  - Proposed software schemes for efficient use of NAND/PRAM hybrid storage: FSMS and hFTL.
  - Both enhance performance compared with NAND-only storage, and their effects are orthogonal.
    • For sequential writes, the effect of FSMS is larger than that of hFTL.
    • For random writes, the effect of hFTL is larger than that of FSMS; hFTL+FSMS outperforms LBFTL by up to 290%, and hFTL alone outperforms LBFTL+FSMS.
• Future work
  - Aggressive optimization using NGNVMs at the file system level.
  - Scalability to large NAND flash storage.
24
NAND vs. PRAM

[Figure: read and program performance, bytes transferred vs. time, comparing PRAM (sync and async), NAND SLC, and NAND MLC; PRAM is fastest, especially for small transfers.]

Features         | PRAM                       | NAND (SLC)               | NAND (MLC)
Access           | Byte access, no erase op.  | Page access, needs erase | Page access, needs erase
Access time      | Sync: 10 ns; Async: 80 ns  | Serial access: 25 ns     | Serial access: 30 ns
Read time        | (see access time)          | 20 us (page)             | 60 us (page)
Program time     | 10 us (word)               | 200 us (page)            | 800 us (page)
Block erase time | N/A                        | 1.4 ms                   | 1.4 ms
Life span        | 10^6                       | 10^5                     | 10^4

With current technology it is hard to replace NAND flash with a PRAM device; however, PRAM is already sufficient to replace NOR flash.