
Page 1: slide

A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems

Jin Kyu Kim, Samsung Electronics; Hyung Gyu Lee, Samsung Electronics

Shinho Choi, Samsung Electronics; Kyoung Il Bahng, Samsung Electronics

Page 2: slide

2

Why a Hybrid Approach?

Gasoline Engine
[-] Gasoline price is high
[-] Pollution
[-] Fuel efficiency depends on traffic
[+] Good for the highway
[-] Bad for heavy traffic
[+] Large fuel capacity

Electric Motor
[+] Electricity price is reasonable
[+] No pollution
[+] Good fuel efficiency
[-] Fuel (battery) capacity is very limited

Page 3: slide

3

Era of Hybrid Approach

NAND Flash
[+] Good for large sequential access
[-] Bad for small random access
[-] Short life span
[-] Write performance depends on access pattern
[-] No support for byte access
[+] Very high density

NVRAM
[+] Long life span
[+] Good performance
[+] Support for byte access
[-] Low density

Hybrid Storage (NVRAM + NAND)
- NVRAM: file system metadata, FTL metadata
- NAND: user data

Page 4: slide

4

Characteristics of NVRAM(PRAM)

[Chart: write performance (s) vs. capacity (bits) for SRAM, MRAM, FeRAM, DRAM, PRAM, NOR flash, and NAND flash. Working memory (volatile memory and next-generation NVMs with no erase operation) occupies the low-latency, low-capacity region; storage memory (flash memory, which requires erase) occupies the high-capacity, high-latency region.]

• Flash memory device
- Erase before program
- Different granularity for erase and program operations
- Limited lifetime
- High density (NAND)

• NVRAM
- Fast read/write
- Byte operation (random access)
- No erase operation
- Capacity (cost) problem

Page 5: slide

5

Memory Components

PRAM
- No erase operation
- Byte accessible
- High performance

NAND
- High density
- Low cost

Synergy effect: a high-performance, low-cost NAND-based storage solution
(a high-performance MLC-NAND-based storage system)

[Diagram: conventional architecture vs. PRAM+NAND hybrid architecture. Conventional: CPU with NOR flash for code XIP, mobile SDRAM as working RAM, and NAND as storage for user data, metadata, and code. Hybrid: CPU with PRAM holding code (XIP) plus metadata, mobile SDRAM as working RAM, and NAND holding user data only. A NAND device is organized as blocks, each containing pages 0 through 63.]

Page 6: slide

6

Overall Storage Hierarchy

Applications → File System → block device driver (logical address) → FTL (logical → physical address) → LLD (low-level driver) → physical NAND devices

(1) User data (clusters)
- Large data
- Sequential access pattern
- Sector-unit I/O
⇒ Appropriate for NAND properties

(2) Metadata
- Small data
- Frequently updated
⇒ Appropriate for PRAM properties

Performance problem: at the FTL, FS metadata (2) interleaved with user data (1), plus the FTL's own metadata (3), turns the mixed pattern into a random pattern; this is the main reason for FTL performance degradation, and hence for degradation of the overall storage.

Page 7: slide

7

Motivations

Motivation 1: file system level
• In addition to user data, the file system must manage metadata
• The properties of metadata degrade FTL performance, which in turn degrades file system throughput
⇒ If FS metadata is stored in NVRAM instead of NAND:
- Less overhead for handling metadata I/O
- Less random-access I/O reaching the FTL

Motivation 2: FTL level
• Performance and performance stability depend on the mapping algorithm
• FTL performance depends on the user's (file system's) access pattern
- Page mapping: high performance, high resource consumption
- Block mapping: low performance, low resource consumption
• Fine mapping granularity gives high performance, coarse granularity gives low performance, and the gap widens as the degree of random access grows
⇒ If the FTL is implemented with NVRAM:
- Page mapping becomes possible without a large amount of working memory
- Fine granularity keeps performance stable against random access
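To make the granularity trade-off concrete, here is a minimal sketch (not from the slides) of the mapping-table memory cost at page vs. block granularity; the geometry values are assumptions matching the 1 GB MLC NAND with 2 KB pages and 64 pages per block used later in the deck.

```python
PAGE_SIZE = 2 * 1024        # assumed: 2 KB pages
PAGES_PER_BLOCK = 64        # assumed: 64 pages per block
DEVICE_BYTES = 1024 ** 3    # assumed: 1 GB device
ENTRY_BYTES = 4             # one 32-bit table entry per mapped unit

def mapping_table_bytes(granularity_bytes):
    # One entry per mapping unit: finer units mean more entries.
    return (DEVICE_BYTES // granularity_bytes) * ENTRY_BYTES

page_map_kb = mapping_table_bytes(PAGE_SIZE) // 1024
block_map_kb = mapping_table_bytes(PAGE_SIZE * PAGES_PER_BLOCK) // 1024
print(page_map_kb, "KB for page mapping")    # 2048 KB
print(block_map_kb, "KB for block mapping")  # 32 KB
```

The 2 MB page-mapping table is why pure page mapping is too expensive for DRAM in an embedded device, but affordable once it lives in PRAM.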

Page 8: slide

8

Motivations

Motivation 2: FTL level
• Performance and performance stability depend on the mapping algorithm

[Chart: resource consumption vs. performance for three FTL designs. Coarse-grained mapping FTL: low resource, low performance. Fine-grained mapping FTL: high resource, high performance. Fine-grained mapping FTL with NVRAM: fine-grained performance without the large working-memory cost.]

Page 9: slide

9

Overall Scheme – FSMS, hFTL

(a) Conventional storage architecture: FAT-based file system → general block device driver → FTL (flash translation layer; logical → physical address; FTL metadata, FS metadata, and user data all on NAND) → LLD (low-level driver) → physical NAND devices.

(b) Hybrid storage architecture: FAT-based file system → hybrid block device driver, which splits the traffic. Metadata goes through the PRAM wrapper to the PRAM device; user data goes through hFTL (whose FTL metadata resides in PRAM) and the LLD to the NAND devices.

Page 10: slide

10

Approach in File System Level

FSMS(File System Metadata Separation)

Page 11: slide

11

File System Metadata Separation Scheme

• Separate file system metadata (FAT, directories, log) from NAND flash
- Store FS metadata on PRAM ⇒ decreased random access to the FTL
- Only user data is stored on NAND flash
- Fewer random accesses → lower FTL overhead → better performance and longer NAND flash life span

[Diagram: the PRAM wrapper maps the logical metadata area (FAT, log, and directory sectors, e.g. LSN 0 through 699) onto the PRAM device, while the FTL maps the logical data cluster area (DataBlk #0 .. DataBlk #n) onto the NAND device. FAT cluster chains (1 → 2 → 3 → 4 → EOF) live entirely in PRAM.]
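The FSMS routing can be sketched as a small dispatcher; this is a hypothetical illustration, not the deck's code, and the 700-sector metadata boundary and `SectorStore` interface are assumptions for the example.

```python
class SectorStore:
    """Toy backing store standing in for the PRAM wrapper or the FTL path."""
    def __init__(self):
        self.sectors = {}
    def write_sector(self, lsn, data):
        self.sectors[lsn] = data

# Assumed boundary for illustration: LSN 0..699 hold FAT/DIR/log.
META_AREA_SECTORS = 700

def route_write(lsn, data, pram, ftl):
    # Metadata sectors bypass the FTL entirely, which is the point of
    # FSMS: their small, frequent random writes never reach NAND.
    if lsn < META_AREA_SECTORS:
        pram.write_sector(lsn, data)
    else:
        ftl.write_sector(lsn - META_AREA_SECTORS, data)
```

The read path would mirror this split, so the FTL only ever sees the large, mostly sequential cluster traffic.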

Page 12: slide

12

Approach in Block Device Level

hFTL(PRAM+NAND hybrid storage FTL)

Page 13: slide

13

Need for hFTL

Complex apps
- Complex applications and multitasking systems increase random access to the FTL

3rd-party FS
- For non-FAT file systems, FSMS is not applicable without modifying the file system

Goal: a new FTL that
- supports large-volume NAND flash
- does not use a large amount of DRAM
- keeps stable performance under random writes

Basic ideas
- Apply page mapping
- Store FTL metadata on NVRAM
- Store user data (FS data) on NAND flash

Page 14: slide

14

Implementation of hybrid FTL

Page mapping table entry (one per logical page, stored in PRAM)
- History of LP: 8 bits
- Page mapping (LP → PP): 24 bits; 24 bits allow an 8 GB storage space with 2 KB pages
- Each entry is therefore 4 bytes (8 + 24 bits)

NAND page: 2 KB main area plus spare area.

Physical page status bitmap (in PRAM), one 2-bit code per physical page:
- 00: free
- 01: valid
- 11: invalid

[Diagram: data blocks on NAND hold valid and invalid pages; the page mapping table (PP for LP0 .. LP N) and the physical page status bitmap (status for PP0 .. PP N) reside in PRAM.]
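The 32-bit entry layout above can be sketched as bit packing; this is an illustrative sketch of the described format, with the field names being assumptions.

```python
PP_BITS = 24                 # physical page number, per the slide
PP_MASK = (1 << PP_BITS) - 1

def pack_entry(history, pp):
    """Pack an 8-bit history field and a 24-bit physical page number."""
    assert 0 <= pp <= PP_MASK
    return ((history & 0xFF) << PP_BITS) | pp

def unpack_entry(entry):
    """Return (history, physical_page) from a packed 32-bit entry."""
    return entry >> PP_BITS, entry & PP_MASK
```

With 2 KB pages, an 8 GB volume needs 4 M entries, so 24 bits per physical page number is comfortably sufficient.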

Page 15: slide

15

Implementation of hybrid FTL

• PRAM and NAND layout

[Diagram:
- NAND device: a buffer block plus blocks Blk #0 .. Blk #n. Data blocks hold valid and invalid pages; garbage blocks hold mostly invalid pages; the buffer block accumulates new writes in its free pages.
- PRAM: page mapping table (PP for each LP), physical page status bitmap (status of PP #0 .. PP #m), block information (BI #0 .. BI #n), buffer block context, and PRAM context.
- DRAM: reverse map for the buffer block (LP for each physical page offset in the block).
- Garbage queue: block numbers awaiting reclamation.]
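A block reclamation step over these structures might look like the following sketch; the slides only show the structures (garbage queue, valid/invalid pages), so the FIFO victim choice and the callback interface here are assumptions.

```python
def reclaim(garbage_queue, valid_pages_of, erase, copy_page):
    """Reclaim the first queued garbage block; returns the freed block number."""
    blk = garbage_queue.pop(0)            # take the first queued victim
    for pp in valid_pages_of.get(blk, []):
        copy_page(pp)                     # relocate still-valid pages first
    erase(blk)                            # then erase the whole block on NAND
    return blk
```

Because the status bitmap lives in PRAM, the valid-page scan needs no NAND reads of spare areas.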

Page 16: slide

16

Performance Optimizations - 1

• Filtering between the file system and the PRAM device
• On average, only 16% of one metadata block is changed per write

The file system issues 512-byte write and read requests; the PRAM filter compares each incoming write against the current PRAM contents and writes back far less than 512 bytes (only the changed words).

[Chart: frequency distribution of the number of changed words (0 to 250) in metadata write requests.]
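The filter's core comparison can be sketched as a word-wise diff write; this is a minimal illustration, with the 4-byte word size and the `write_word` callback being assumptions.

```python
WORD = 4  # assumed PRAM write granularity in bytes

def filtered_write(old, new, write_word):
    """Write back only the words of `new` that differ from `old`; returns count."""
    written = 0
    for off in range(0, len(new), WORD):
        if new[off:off + WORD] != old[off:off + WORD]:
            write_word(off, new[off:off + WORD])
            written += 1
    return written
```

If ~16% of a block changes on average, the filter cuts PRAM write traffic (and wear) roughly sixfold for metadata updates.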

Page 17: slide

17

Performance Optimizations-2

• Functionality of logical delete

Without logical delete, the space a deleted file occupied still looks valid to the storage layer, which keeps managing (and, during garbage collection, copying) those physical sectors.

With logical delete, the file system notifies the storage layer, and the corresponding entries in the PSN status bitmap on the PRAM device are marked invalid (sector marking). This removes the management overhead of the space occupied by deleted files.

[Diagram: logical address space (LSN 701 .. LSN 450000 for data); the physical sectors of deleted file A (e.g. PSN 120 .. PSN 127) are flagged invalid in the PRAM bitmap rather than carried along by the FTL.]
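Logical delete reduces to flipping bitmap entries; the sketch below is an assumed toy interface, with only the 00/01/11 status codes taken from the earlier slide.

```python
FREE, VALID, INVALID = 0b00, 0b01, 0b11  # status codes from the bitmap slide

def logical_delete(lsns, page_map, status):
    """Mark the physical pages backing deleted logical sectors as invalid."""
    for lsn in lsns:
        pp = page_map.pop(lsn, None)  # drop the logical->physical mapping
        if pp is not None:
            status[pp] = INVALID      # garbage collection skips copying this page
```

No NAND operation happens at delete time; the pages simply stop counting as live data for future garbage collection.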

Page 18: slide

18

Evaluations

• Evaluation setting
- S3C2413 CPU board
  • 1 channel, 1 NAND device supported (MLC and SLC)
- 64 MB PRAM (KPS1215EZM)
- 1 GB MLC NAND (K9G8G08U0M)

• Evaluations
- Comparison target: log-block FTL
- Criteria
  • Performance, measured at the LLD layer (low-level device driver), the FTL layer (block device driver), and the file system layer (TFS4)
  • Life span: erase count
  • Cost: code and data size

Page 19: slide

19

LLD, FTL-Level Evaluation

• LLD (low-level device driver)
- Read: 4.25 MB/s, write: 1.7 MB/s

• FTL
- Sequential read: 4.16 MB/s (LB-FTL and hFTL similar)
- Random read: 4.09 MB/s (LB-FTL and hFTL similar)
- Sequential write: 1.62 MB/s (LB-FTL and hFTL similar)
- Random write: see the charts

[Charts: throughput (KB/s, 0 to 4,500) vs. request unit size (8 KB to 2,048 KB) for hFTL_Write, LBFTL_Write, hFTL_Read, and LBFTL_Read under (a) sequential and (b) random access patterns, with LLD read and write as reference lines.]

Page 20: slide

20

File System Level Evaluations

• IOzone benchmark for Sequential I/O

[Charts: throughput (KB/s) vs. request unit size (8 KB to 2,048 KB) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS. (a) Sequential read, 0 to 4,500 KB/s; (b) sequential write, 0 to 1,600 KB/s.]

Page 21: slide

21

File System Level Evaluations

• IOzone benchmark for Random I/O

[Charts: throughput (KB/s) vs. request unit size (8 KB to 2,048 KB) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS. (c) Random read, 0 to 4,000 KB/s; (d) random write, 0 to 1,600 KB/s.]

Page 22: slide

22

Life Span and Implementation Cost

• Life span (number of erase operations):
- LBFTL: 20,274
- LBFTL+FSMS: 11,991
- hFTL: 13,746
- hFTL+FSMS: 10,426

• Implementation cost:

[Chart: code size and RW data size (KB, 0 to 60) for LBFTL, LBFTL+FSMS, hFTL, and hFTL+FSMS.]

Required PRAM for 1 GB of NAND:
- For FSMS: 2 MB
- For hFTL: 2.5 MB

Page 23: slide

23

Contribution

• Summary
- Suggested SW schemes for efficient use of NAND/PRAM hybrid storage: FSMS and hFTL
- Enhanced performance compared to NAND-only storage
  • FSMS and hFTL take effect orthogonally
  • For sequential writes, the effect of FSMS is larger than that of hFTL
  • For random writes, the effect of hFTL is larger than that of FSMS
  • hFTL+FSMS outperforms LBFTL by up to 290%; hFTL alone outperforms LBFTL+FSMS

• Future work
- Aggressive optimization using NGNVMs at the file system level
- Scalability for large NAND flash storage

Page 24: slide

24

NAND vs. PRAM

[Charts: read performance and program performance, time (us) vs. bytes transferred (up to 4 KB), for PRAM (sync and async), NAND SLC, and NAND MLC. PRAM operates at byte/word granularity (e.g. 500 ns to 10 us for small program units), while NAND programs whole 2 KB pages (800 to 1,600 us for MLC), so PRAM dominates for small transfers and NAND amortizes its page latency only over large ones.]

Features            PRAM                        NAND (SLC)                  NAND (MLC)
Access time         Sync: 10 ns / Async: 80 ns  20 us (page), serial 25 ns  60 us (page), serial 30 ns
Program time        10 us (word)                200 us (page)               800 us (page)
Block erase time    N/A                         1.4 ms                      1.4 ms
Life span           10^6                        10^5                        10^4
Access / erase      Byte access, no erase       Page access, needs erase    Page access, needs erase

With current technology it is hard to replace NAND flash with a PRAM device; however, PRAM is already sufficient to replace NOR flash.
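As a back-of-envelope check on the table above, the effective time to read one 2 KB page is the page-access latency plus the per-byte serial transfer time; the helper below is an illustrative calculation, not a figure from the deck.

```python
def page_read_us(access_us, serial_ns_per_byte, page_bytes=2048):
    """Raw device-level page read time in microseconds."""
    return access_us + serial_ns_per_byte * page_bytes / 1000.0

slc = page_read_us(20, 25)   # SLC: 20us page access + 2048 * 25ns serial
mlc = page_read_us(60, 30)   # MLC: 60us page access + 2048 * 30ns serial
print(round(slc, 2), round(mlc, 2))  # 71.2 121.44
```

These raw figures bound the LLD throughput measured earlier; driver and bus overheads account for the gap down to 4.25 MB/s.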