Lustre/ldiskfs Metadata Performance Boost 2017/10/05 DataDirect Networks Japan, Inc. Shuichi Ihara Shilong Wang


ddn.com © 2017 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.



Why is metadata performance important?

▶  Lustre is a general-purpose filesystem for big data
  •  1 million files per job is quite common in life science applications.
  •  AI/machine learning workloads require small-file access with low latency. Metadata performance is one of the key factors.
  •  Lustre metadata performance has been good so far.

▶  Vertical and horizontal scale
  •  28 (and 32) CPU cores per socket are available today.
  •  DNE helps scale out metadata horizontally, but you need to understand your single-MDS metadata performance first.


Challenges on Lustre metadata performance

▶  Lustre metadata is complex and very sensitive
  •  Latency in many layers (CPU, network, disks, kernel, ...) affects Lustre metadata performance.
  •  Deep analysis and maximizing single metadata server performance are important for an efficient MDS configuration.

▶  Consistent metadata benchmarking is not simple
  •  There are many dependencies for each Lustre version (kernel support changes, OFED, hardware configuration, etc.).
  •  mdtest is a common benchmark tool. mds-survey is useful and a good start, but it tests only a limited layer. Eventually, end-to-end testing is required to understand full Lustre metadata performance.


What’s Lustre metadata performance today? Benchmark configuration

Focus on single MDS and MDT testing

▶  1 x MDS
  •  1 x CPU socket
      o  Intel Skylake processor (Platinum 8160, 2.1GHz, 24 CPU cores)
      o  96GB DDR4 memory
      o  FDR InfiniBand
  •  Lustre-2.10.1
▶  1 x MDT
  •  4 x Toshiba RI SSD (RAID10)
  •  DDN SFA7700X
▶  4 x OSS and 40 x OST with DDN SFA14KXE
▶  32 x client
  •  2 x E5-2650v4
  •  128GB DDR4 memory
  •  FDR InfiniBand
  •  Lustre-2.10.1


MDS-Survey RHEL7.3/Lustre-2.10/ldiskfs

[Figure: MDS-Survey (File Creation and Unlink), RHEL7.3/Lustre-2.10.1RC/ldiskfs (Quota Enabled). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 250,000). Series: File Creation, File Removal.]


Metadata performance impact of enabling quota (RHEL7.3/Lustre-2.10.1)

[Figure: mds-survey (creation). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 300,000). Series: no quota; user quota; user, project quota.]

[Figure: mds-survey (unlink). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 300,000). Series: no quota; user quota; user, project quota.]


New metadata performance limit on Lustre/ldiskfs

Lustre/ldiskfs has delivered good metadata rates so far, but new high-end CPUs expose the next level of performance limits.

▶  File creation under heavy concurrency
  •  Many threads create files on one MDT simultaneously
  •  Scalability problem on systems with many CPU cores

▶  Quota scalability
  •  Lustre quota scalability was hidden by other limitations
  •  Quota scalability issues will surface as Lustre metadata performance improves
  •  New quota accounting (e.g. project quota) introduced additional performance impact


A problem on File creation under concurrency

▶  Profiled with perf-tools while running mdtest against ldiskfs/ext4
  •  Collected CPU costs for all functions in ext4 and jbd2
  •  Found heavy lock contention on the group spinlock
  •  The same contention exists in the upstream kernel

FUNC                           TOTAL_TIME(us)  COUNT      AVG(us)
ext4_create                    1707443399      1440000    1185.72
_raw_spin_lock                 1317641501      180899929  7.28
jbd2__journal_start            287821030       1453950    197.96
jbd2_journal_get_write_access  33441470        73077185   0.46
ext4_add_nondir                29435963        1440000    20.44
ext4_add_entry                 26015166        1440049    18.07
ext4_dx_add_entry              25729337        1432814    17.96
ext4_mark_inode_dirty          12302433        5774407    2.13
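To make the profile table easier to read, the following sketch recomputes the per-call averages and two rough ratios from the numbers on the slide. The function names `average_us` and `summarize` are illustrative and not part of any profiling tool. The ratios are only indicative: the slide does not say whether the times are inclusive, or whether every `_raw_spin_lock` call originates from `ext4_create`.

```python
# Rows copied from the profile table: (function, total time in us, call count).
PROFILE = [
    ("ext4_create", 1707443399, 1440000),
    ("_raw_spin_lock", 1317641501, 180899929),
    ("jbd2__journal_start", 287821030, 1453950),
    ("jbd2_journal_get_write_access", 33441470, 73077185),
    ("ext4_add_nondir", 29435963, 1440000),
    ("ext4_add_entry", 26015166, 1440049),
    ("ext4_dx_add_entry", 25729337, 1432814),
    ("ext4_mark_inode_dirty", 12302433, 5774407),
]


def average_us(total_us, count):
    """Average time per call in microseconds (the AVG(us) column)."""
    return total_us / count


def summarize(profile):
    """Relate spinlock cost to ext4_create cost (rough indicators only)."""
    by_name = {name: (total, count) for name, total, count in profile}
    create_total, create_count = by_name["ext4_create"]
    lock_total, lock_count = by_name["_raw_spin_lock"]
    return {
        # Spinlock time as a fraction of time accounted to ext4_create.
        "lock_time_vs_create": lock_total / create_total,
        # Spinlock acquisitions per ext4_create call.
        "lock_calls_per_create": lock_count / create_count,
    }


if __name__ == "__main__":
    for name, total, count in PROFILE:
        print(f"{name:32s} {average_us(total, count):10.2f} us/call")
    print(summarize(PROFILE))
```

With the slide's numbers, the time spent in `_raw_spin_lock` is about 0.77 of the time accounted to `ext4_create`, and there are roughly 126 spinlock acquisitions per created file, which is consistent with the contention finding above.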


Fix lock contentions in upstream kernel

▶  Fixed and merged into the upstream kernel (4.14)
      Wang Shilong (2):
        ext4: cleanup goto next group
        ext4: reduce lock contention in __ext4_new_inode

▶  13x performance improvement on file creation
  •  Ran mdtest against ext4 directly
  •  Unique directory operations
  •  Quota disabled

[Figure: mdtest to ext4 (linux-4.13-rc5), File Creation and File Removal in ops/sec (0 to 1,200,000). Series: default ext4, patched ext4. Annotation: 13x on File Creation.]
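As a general illustration of how contention on an allocation lock can be reduced, the toy sketch below searches a bitmap without the lock, takes the lock once to claim the bit, and, if another thread won the race, repeats the search while still holding the lock instead of releasing and retrying. This is not the kernel patch: `ToyInodeBitmap`, `alloc`, and `run_workers` are hypothetical names, and a Python `threading.Lock` only stands in for the group spinlock.

```python
import threading


class ToyInodeBitmap:
    """Toy stand-in for one group's inode bitmap (illustrative only)."""

    def __init__(self, size):
        self.bits = bytearray(size)   # 0 = free, 1 = in use
        self.lock = threading.Lock()  # stands in for the group spinlock

    def _find_free(self, start=0):
        """Index of the first free bit at or after start, or -1 if none."""
        return self.bits.find(0, start)

    def alloc(self):
        # Step 1: optimistic search without the lock; the result may be stale.
        candidate = self._find_free()
        if candidate < 0:
            return None
        # Step 2: a single lock acquisition to verify and claim the bit.
        with self.lock:
            if self.bits[candidate]:
                # Lost the race: search again while still holding the lock,
                # rather than dropping it and contending for it again.
                candidate = self._find_free()
                if candidate < 0:
                    return None
            self.bits[candidate] = 1
            return candidate


def run_workers(threads=8, per_thread=500):
    """Allocate threads * per_thread bits concurrently and return them."""
    bitmap = ToyInodeBitmap(threads * per_thread)
    results = [[] for _ in range(threads)]

    def worker(out):
        for _ in range(per_thread):
            out.append(bitmap.alloc())

    pool = [threading.Thread(target=worker, args=(results[i],))
            for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return [index for chunk in results for index in chunk]


if __name__ == "__main__":
    allocated = run_workers()
    print(len(allocated), "allocations,", len(set(allocated)), "unique")
```

The point of the design is that each allocation acquires the lock exactly once, regardless of how many threads race for the same free bit.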


mds-survey on patched ldiskfs

▶  LU-9796: speedup file creation under heavy concurrency
▶  Ported patches to ldiskfs for RHEL7 kernel

[Figure: File Creation, mds-survey on ldiskfs, 1 x MDS and 1 x MDT (2 x RAID1 SSD). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 300,000). Series: default, patch.]


Quota scalability problem

▶  File creation/unlink performance degrades when quota is enabled
  •  Same behavior on RHEL7 and the upstream kernel
  •  Project quota introduced an additional performance penalty

[Figure: mdtest to ext4 (linux-4.13-rc5), File Creation and File Removal (axis 0 to 1,200,000). Series: noquota; quota; quota,project.]


Quota scalability improvements in Ext4

▶  New quota scaling patches introduced in the upstream kernel
  •  Tested Jan Kara’s new quota scaling patches (merged in 4.14)
  •  Huge performance gains when quota is enabled

[Figure: File Creation (axis 0 to 1,200,000) for noquota; quota; quota,project. Series: default, quota_scaling. Annotation: +88%.]

[Figure: File Removal (axis 0 to 600,000) for noquota; quota; quota,project. Series: default, quota_scaling. Annotation: +88%.]


Performance results Quota scaling for Lustre

▶  Experimentally ported quota patches to RHEL7 kernel for Lustre server (LU-10034: Quota scaling for Lustre)

▶  User/Group and Project quota enabled

[Figure: mds-survey (File Creation). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 300,000). Series: default; ldiskfs patch; ldiskfs+quota patch.]

[Figure: mds-survey (File removal). X axis: number of threads (4, 8, 16, 32, 64, 128). Y axis: ops/sec (0 to 300,000). Series: default; ldiskfs patch; ldiskfs+quota patch.]

Chart annotations: 2.7x, +68%
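The slides mix two notations for gains: multipliers (13x, 2.7x) and percentage increases (+88%, +68%). The small sketch below converts between them so the annotations can be compared on one scale. The helper names are illustrative, and only the annotation values come from the slides.

```python
def speedup_to_percent(multiplier):
    """Convert an 'Nx' speedup into a '+P%' increase over the baseline."""
    return (multiplier - 1.0) * 100.0


def percent_to_speedup(percent):
    """Convert a '+P%' increase into an 'Nx' multiplier of the baseline."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    # Annotation values taken from the slides.
    for multiplier in (13.0, 2.7):
        print(f"{multiplier}x is +{speedup_to_percent(multiplier):.0f}%")
    for percent in (88.0, 68.0):
        print(f"+{percent:.0f}% is {percent_to_speedup(percent):.2f}x")
```

Read this way, a 2.7x gain corresponds to +170%, while +68% corresponds to 1.68x.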


End-to-End Lustre Metadata Performance
mdtest (1.28M files, unique directories)

▶  Applied ext4’s contention fix and quota scaling patch against RHEL7 kernel

▶  mdtest from 32 clients to single MDS/MDT

[Figure: mdtest (File creation). X axis: number of threads (32, 64, 128, 256, 512). Y axis: ops/sec (0 to 250,000). Series: user quota; user,project quota.]

[Figure: mdtest (File removal). X axis: number of threads (32, 64, 128, 256, 512). Y axis: ops/sec (0 to 250,000). Series: user quota; user,project quota.]
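A sketch of how such a thread sweep could be sized follows. It assumes that the 1.28M total is held constant across thread counts, which the slides do not state. The `mpirun` launcher and the `/mnt/lustre/mdtest` path are placeholders. The mdtest options used are `-F` (files only), `-u` (unique working directory per task), `-n` (items per task), and `-d` (target directory).

```python
TOTAL_FILES = 1_280_000                  # "1.28M files" from the slide title
THREAD_COUNTS = [32, 64, 128, 256, 512]  # x axis of the mdtest charts


def files_per_process(total_files, nprocs):
    """Files each mdtest task must create so the total stays constant."""
    if total_files % nprocs:
        raise ValueError("total_files must divide evenly across tasks")
    return total_files // nprocs


def mdtest_command(nprocs, target_dir="/mnt/lustre/mdtest"):
    """Build an mdtest command line (launcher and path are assumptions)."""
    per_task = files_per_process(TOTAL_FILES, nprocs)
    return ["mpirun", "-np", str(nprocs),
            "mdtest", "-F", "-u", "-n", str(per_task), "-d", target_dir]


if __name__ == "__main__":
    for nprocs in THREAD_COUNTS:
        print(" ".join(mdtest_command(nprocs)))
```

Under that assumption each task creates 40,000 files at 32 threads and 2,500 files at 512 threads.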


Additional metadata performance efforts

▶  LU-7251 osp: do not assign commit callback to every handle
  •  Reduction and optimization of cancel RPCs
  •  Improved unlink operations

▶  LU-9840 lod: add ldo_dir_stripe_loaded
  •  Performance improvements on file creation to single shared directory

▶  LU-9972 performance regression on rmdir

▶  LU-10005 osp: cache non-exist EA
  •  Performance regression for non-root MDT


Conclusions

▶  Evaluated Lustre-2.10 metadata performance on new hardware and exposed new performance limits.

▶  Fixed contention problem at inode allocation and tested ext4 quota scaling patch.

▶  Demonstrated 200K file creations and 150K unlinks per second on a single MDT with full quota accounting.

▶  Need further investigation on Lustre unlink performance.

▶  Will test even more CPU cores to maximize single metadata performance.


Thank you!