Upload
doanthuan
View
219
Download
0
Embed Size (px)
Citation preview
An Empirical Study of File Systems on NVMPriya Sehgal, Sourav Basu, Kiran Srinivasan, Kaladhar Voruganti
NetApp Inc.
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 1
Motivation
Need a thin software stack to access data from fast NVM
Persistent memory abstractions
NVM optimized file systems
What characteristics of traditional block based file systems are good for NVM?
Can traditional file systems be fine tuned using mount and format options?
Can it be optimized with minor changes?
How does the performance of traditional file systems compare with NVM-optimized one?
What file system features help improve performance on NVM?
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 2
Overview
Related Work
Experimental Methodology
Experimental Results
Recommendation and Conclusion
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 3
Overview
Related Work
Experimental Methodology
Experimental Results
Recommendation and Conclusion
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 4
Related Work
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 5
POSIX interface, re-designing the file system for
NVM
POSIX library interposers or
bypass file system
New Programming Models and
Data-Structures
Adjust operating system,
storage stack or file system configurations
Related Work
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 6
POSIX interface, re-designing the file system for
NVM
POSIX library interposers or
bypass file system
New Programming Models and
Data-Structures
Adjust operating system,
storage stack or file system configurations
• PMFS
• SCMFS
• BPFS
Related Work
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 7
POSIX interface, re-designing the file system for
NVM
POSIX library interposers or
bypass file system
New Programming Models and
Data-Structures
Adjust operating system,
storage stack or file system configurations
• PMFS
• SCMFS
• BPFS
• Moneta-D
• Bankshot
• Aerie
Related Work
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 8
POSIX interface, re-designing the file system for
NVM
POSIX library interposers or
bypass file system
New Programming Models and
Data-Structures
Adjust operating system,
storage stack or file system configurations
• PMFS
• SCMFS
• BPFS
• Moneta-D
• Bankshot
• Aerie
• Mnemosyne
• PMem-Lib
• NV-Heaps
• NV-Tree
• CDDS
Related Work
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 9
POSIX interface, re-designing the file system for
NVM
POSIX library interposers or
bypass file system
New Programming Models and
Data-Structures
Adjust operating system,
storage stack or file system configurations
• PMFS
• SCMFS
• BPFS
• Moneta-D
• Bankshot
• Aerie
• Mnemosyne
• PMem-Lib
• NV-Heaps
• NV-Tree
• CDDS
• Lee et al
• Our work.
Overview
Related Work
Experimental Methodology
Experimental Results
Recommendation and Conclusion
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 10
Experimental MethodologyWorkloads
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 11
WORKLOADS
FileServer :Moderate directory depth, moderate no. of large files, data and metadata operations
Experimental MethodologyWorkloads
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 12
FileServer :Moderate directory depth, moderate no. of large files, data and metadata operations
WebProxy:Flat namespace, large no. of small files, data and metadata operations
WORKLOADS
Experimental MethodologyWorkloads
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 13
WORKLOADSOLTP: data intensive
WebProxy:Flat namespace, large no. of small files, data and metadata operations
FileServer :Moderate directory depth, moderate no. of large files, data and metadata operations
Experimental MethodologyWorkloads
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 14
Key-Value Store: Uses Memory-Mapped Semantics (load/stores)
WORKLOADSOLTP: data intensive
WebProxy:Flat namespace, large no. of small files, data and metadata operations
FileServer :Moderate directory depth, moderate no. of large files, data and metadata operations
Experimental MethodologyFile System Characteristics (varied using mount and format options)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 15
Inode Structure: Linear vs B+ Tree
Block Size: Fixed vs Variable sized extent
Layout/Update: In-place vs Log-structured vs Hybrid
Allocation Strategy: Immediate vs Delayed
Parallel Allocation (Concepts like Allocation/Block group)
Journal: Ordered vs Write-Back vs Data
Execute-in-place(XIP)
Experimental MethodologyFile System Characteristics (varied using mount and format options)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 16
Inode Structure: Linear vs B+ Tree
Block Size: Fixed vs Variable sized extent
Layout/Update: In-place vs Log-structured vs Hybrid
Allocation Strategy: Immediate vs Delayed
Parallel Allocation (Concepts like Allocation/Block group)
Journal: Ordered vs Write-Back vs Data
Execute-in-place(XIP)
File Systems Evaluated
Ext2, Ext3, Ext4, XFS, F2FS, NILFS2, PMFS
Experimental MethodologyExperimental Setup
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 17
RamDisk(NVM
Simulated)
File System
Workload/ Benchmark
DRAM
User Space
Kernel
Block Driver
XIP
Limitations:
• NVM characteristics not simulated
• Not considered recoverabiliy
Overview
Related Work
Experimental Methodology
Experimental Results
Recommendation and Conclusion
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 18
FileServer (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 19
0
20
40
60
80
100
120
140
160
op
s/s
ec (
x 1
000)
1.5x2x
1.3x
XIP support, atomic updates and fine grained
logging
Higher is better100K files of size 128KB
0
20
40
60
80
100
120
140
160
op
s/s
ec (
x 1
000)
FileServer (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 20
1.4x 3x
XIP support only for data in
Ext2/3
Higher is better
0
20
40
60
80
100
120
140
160
op
s/s
ec (
x 1
000)
FileServer (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 21
31%
Increased AG count increased parallelism
for meta-data intensive workloads
Higher is better
0
20
40
60
80
100
120
140
160
op
s/s
ec (
x 1
000)
FileServer (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 22
F2FS solves following problems of NILFS2:• Wandering tree • Garbage collection
3.1x
Higher is better
0
20
40
60
80
100
120
140
160
op
s/s
ec
(x
100
0)
100x
41x
45x
WebProxy (Throughput)At directory depth 0.7 – Flat namespace
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 23
Hash tree (indexed directory)
missing in ext2 and pmfs
500K files of size 32KB Higher is better
WebProxy (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 24
0
20
40
60
80
100
120
140
160
180
200
op
s/s
ec
(x
10
00
)
First Column: 0.7 Directory Depth
Second Column: 2.1 Directory Depth
Lookup is fast for increased
directory depth
PMFS scalability affected by
allocation from single linked list
Higher is better
WebProxy (Throughput)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 25
0
20
40
60
80
100
120
140
160
180
200
op
s/s
ec
(x
10
00
)
First Column: 0.7 Directory Depth
Second Column: 2.1 Directory Depth1.6x
Delayed Allocation not good for small
files
Higher is better
OLTP Database(TPC-C on MySQL)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 26
0
0.2
0.4
0.6
0.8
1
1.2
No
rmalized
tp
mC
XIP-FS performs best
400 warehouse, total size 38GB Higher is better
0
0.2
0.4
0.6
0.8
1
1.2
No
rmalized
tp
mC
OLTP Database(TPC-C on MySQL)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 27
Next Level are buffered FS:13-22%
worse than XIP
400 warehouse, total size 38GB Higher is better
XIP-FS performs best
0
0.2
0.4
0.6
0.8
1
1.2
No
rmalized
tp
mC
OLTP Database(TPC-C on MySQL)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 28
Next Level are buffered FS:13-22%
worse than XIP
Worst Performing FS: 32-42% worse compared to XIP
400 warehouse, total size 38GB Higher is better
XIP-FS performs best
0
0.2
0.4
0.6
0.8
1
1.2
No
rmalized
tp
mC
OLTP Database(TPC-C on MySQL)
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 29
10%
Increased AG count increases indirections
for large files
Higher is better
Key-Value Stores (YCSB on MongoDB) - LatencyWORKLOAD C
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 30
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rmalized
Late
ncy
READ
Reduced page faults blurs
performance difference
10 million records, total size 36GB
READ=100%
Lower is better
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
No
rmalized
Late
ncy
READ UPDATE
Key-Value Stores (YCSB on MongoDB) - LatencyWORKLOAD A
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 31
Msync and Fsync is costly in non-xip file
systems
Lower is better
READ=50% UPDATE=50%
Lower performing
workload affects other workloads
Key-Value Stores (YCSB on MongoDB) - LatencyWORKLOAD E
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 32
0
0.5
1
1.5
2
2.5
3
No
rmalized
Late
ncy
SCAN INSERT
Effect of page-faults shows-up in case of
loads(SCAN)
SCAN=95% INSERT=5%
Lower is better
Overview
Related Work
Experimental Methodology
Experimental Results
Recommendation and Conclusion
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 33
Recommendation and Conclusion
Recommendation for traditional and new file systems
In-place update layout
Execute In Place
Simple and parallel allocation strategy
Fixed sized data blocks
© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use 34