LINUX IO performance tuning
for IBM System Storage
Location of this document:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102584
Markus Fehling
Certified IT specialist cross systems
Version 1.5 – August 2018
LINUX IO performance tuning for IBM System Storage
© Copyright IBM Corporation 2018 Page - 2
Table of Contents
1 Abstract
2 Introduction
3 Description of the test environment
4 LINUX IO stack layers
5 Recommendations
6 Measurements V7000
7 Measurements FS9100
8 Resources
9 Trademarks
10 Disclaimers
1 Abstract
Applications such as databases, backup servers, e-mail servers, and video streaming need
high IO performance. In addition, every application has its own IO characteristics: small
or large block size; random IO, sequential IO, or a mix of both; mostly read or mostly write.
LINUX® provides several parameters that affect IO performance significantly.
This paper provides guidelines and recommendations on how to set these parameters for
IBM® storage systems within a SAN environment, using LINUX LVM2.
This document has been written for IT technical specialists with advanced skill levels on
LINUX and IBM storage systems.
2 Introduction
As mentioned above, different applications have different IO patterns:
OLTP: Online Transaction Processing is a typical workload pattern of Enterprise Resource
Planning applications like SAP® ERP™. Basically, the IO pattern is mostly read,
random IO, with a small block size: 32K with IBM DB2®, 8K with Oracle® database,
64K with Microsoft® SQL Server™.
OLAP: Online Analytical Processing is used, for example, by business warehouse (BW)
applications; it is mostly read, with some streaming aspects.
SAP HANA: During startup, all data needs to be read from storage into memory (RAM);
this workload is random read with large block sizes, from 64 KB up to several MB. During
normal operation, the workload shifts to mostly write: streaming write with a small
(ERP) or large (BW) block size for LOG, and random write with a 64 KB up to 1 MB
block size for DATA.
Video broadcasting: The broadcasting server creates a streaming read IO process, typically
with large block sizes.
3 Description of the test environment
The major goal was to identify the influence of the parameters of LINUX IO stack layers
two, three, and four (see Figure 1: LINUX IO stack).
In 2016, tests were run on SLES™ 11.4 and RedHat® Enterprise LINUX™ 6.5,
on Intel® and IBM POWER™.
An 8 Gbit/s SAN with 8 ports was used, with the following storage systems:
• IBM Storwize™ V7000 with 120 HDDs (10k rpm)
• IBM FlashSystem™ 900 with 12 Flash modules of 2.7 TB each, behind Storwize.
The IO workload was generated with the tool xdisk. All IO was performed with the open()
flag O_DIRECT to eliminate the effects of the file system cache.
The xdisk tool is available on IBM TechDocs; see the Resources section.
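The effect of O_DIRECT can be reproduced with standard tools. The following is a minimal sketch that uses GNU dd as a stand-in; it is not the xdisk tool used for the measurements. The function names, the run helper, and the DRY_RUN switch are assumptions made for illustration. By default the commands are only printed, because a direct write overwrites its target.

```shell
#!/bin/sh
# Sketch: O_DIRECT reads and writes with GNU dd (a stand-in, not xdisk).
# Function names and the DRY_RUN switch are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sequential read of COUNT blocks of size BS, bypassing the page cache.
# usage: direct_read SOURCE BS COUNT
direct_read() {
    run dd if="$1" of=/dev/null bs="$2" count="$3" iflag=direct
}

# Sequential write of COUNT blocks of size BS, bypassing the page cache.
# This overwrites TARGET, so use a scratch file or a scratch LUN only.
# usage: direct_write TARGET BS COUNT
direct_write() {
    run dd if=/dev/zero of="$1" bs="$2" count="$3" oflag=direct
}
```

For example, direct_read /dev/mapper/mpatha 1M 1024 prints the dd command for a 1 GiB streaming read with a 1 MB block size; the device name is hypothetical. Set DRY_RUN=0 to execute the command.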
In July 2018 a new test was performed on IBM FlashSystem™ 9100 with 24 NVMe drives,
SLES 12.2, on IBM Power 8.
4 LINUX IO stack layers
Figure 1 shows the simplified LINUX IO stack with the five major IO layers.
Figure 1: LINUX IO stack
The following paragraphs describe the different parameters and their performance impacts.
4.1 File system
Today, xfs is the most common LINUX file system; therefore, all tests were performed
with xfs only.
4.2 LVM2
i. physical volume
During our tests we used a --dataalignment of 1 MB.
It turned out that four PVs per VG deliver good performance.
There is some performance difference between two and four PVs, but only a minor
difference between four and eight.
ii. volume group
We used only non-mirrored VGs; the most significant parameter is
--physicalextentsize.
During our tests we covered an extent size range from 64 KB up to 1 GB.
Overall, an extent size of 1 MB delivers good performance.
iii. logical volume
We used striped volumes only, with --stripes equal to the number of PVs.
During our tests we covered a stripe size range from 64 KB up to 1 GB.
Overall, a stripe size of 256 KB delivers good performance.
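The LVM2 settings described in this section can be sketched as a short shell function. This is a minimal sketch, not the exact command sequence used in the tests: the function name, the run helper, the DRY_RUN switch, and all volume and device names are assumptions chosen for illustration. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: build a striped LVM2 volume with the settings from section 4.2.
# All names (function, VG, LV, device paths) are hypothetical examples.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: create_striped_lv VG LV PV [PV...]
create_striped_lv() {
    vg=$1; lv=$2; shift 2
    npv=$#                      # one stripe per physical volume

    for pv in "$@"; do
        run pvcreate --dataalignment 1m "$pv"
    done
    run vgcreate --physicalextentsize 1m "$vg" "$@"
    # -l 100%FREE gives the whole volume group to this logical volume
    run lvcreate --stripes "$npv" --stripesize 256k -l 100%FREE -n "$lv" "$vg"
    run mkfs.xfs "/dev/$vg/$lv"
}
```

A call such as create_striped_lv vg_data lv_data /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc /dev/mapper/mpathd prints the commands for four PVs. With four stripes of 256 KB, one full stripe spans 1 MB, which matches the extent size and the data alignment. Set DRY_RUN=0 and run as root to execute the commands.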
4.3 IO scheduler
The default scheduler cfq delivers low IO performance for multiple (parallel) streaming
read processes: the total throughput (MB/s) collapses to a fraction of that of a single
process. We recommend not using the cfq scheduler.
It seems that deadline delivers slightly better performance than noop, especially during
multiple (parallel) IO operations, but the difference is negligible.
SAP and SUSE recommend noop as the scheduler, as does IBM for storage type 2145.
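The scheduler is selected per block device through the sysfs attribute queue/scheduler. The following is a minimal sketch of how it could be set and checked; the function names, the restriction to sd* devices, and the optional sysfs directory argument (useful for trying the functions on a copy) are assumptions for illustration. On newer multi-queue kernels the corresponding scheduler names are none and mq-deadline instead of noop and deadline.

```shell
#!/bin/sh
# Sketch: select and display the IO scheduler of every SCSI disk (sd*).
# Function names and the optional directory argument are illustrative.

# usage: set_io_scheduler SCHEDULER [SYSFS_BLOCK_DIR]
set_io_scheduler() {
    sched=$1
    root=${2:-/sys/block}
    for f in "$root"/sd*/queue/scheduler; do
        [ -e "$f" ] || continue           # glob matched nothing
        echo "$sched" > "$f" || return 1  # needs root on a real system
    done
}

# usage: show_io_scheduler [SYSFS_BLOCK_DIR]
# On a real system the active scheduler is shown in square brackets.
show_io_scheduler() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/queue/scheduler; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Running set_io_scheduler noop as root followed by show_io_scheduler applies and verifies the setting. The change does not survive a reboot; making it persistent depends on the distribution.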
4.4 Number of physical FC connections
We ran two scenarios:
1. 4 FC connections between FS 900 & Storwize, and 4 connections between Storwize
and server (IBM Power)
2. 8 FC connections between FS 900 & Storwize, and 8 connections between Storwize
and server (IBM Power)
There is a measurable performance difference, especially if IO processes are working in
parallel. More importantly, if only 4 FC connections are used, the performance is more
sensitive to the configuration of the /etc/multipath.conf parameters.
4.5 multipath setting
A list of all /etc/multipath.conf settings can be found here:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/DM_Multipath/#mpio_setup
These four parameters impact IO performance the most:
Parameter                     Value
path_grouping_policy (pgp)    multibus (multi), group_by_serial (gbs),
                              group_by_prio (gbp), group_by_node_name (gbnn)
prio                          const, alua
rr_weight (rr)                uniform (uni), priorities (prio)
path_selector (path)          round-robin (rr), queue-length (ql), service-time (st)
Table 1: multipath parameters
To identify the best multipath settings, 4 PVs and 8 PVs were used, with a PV extent
size of 1 MB and an LV stripe size of 256 KB. The graphics show the performance
dependencies based on 8 FC connections.
The following table shows the best match for a given workload:
█ recommended configuration
█ provides good performance
█ do not use this configuration
Table 2: Multipath configuration matrix for Storwize family
[Color-coded matrix not reproduced. One panel per workload: streaming read, streaming
write, random read, random write, OLTP, SAP HANA DATA. Rows: path_grouping_policy
(multibus, gbs, gbp, gbnn; additionally failover in the OLTP and SAP HANA DATA panels),
each combined with prio (alua, const). Columns: rr_weight (pri, uni), each combined with
path_selector (rr, ql, st).]
4.6 SCSI setting
The only parameter we adjusted was the queue depth, set through the request queue size
(nr_requests) of each block device. For all tests we set it to 256:
find -H /sys/block/*/queue -name nr_requests | while read -r i; do echo 256 > "$i"; done
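Note that nr_requests is the size of the block layer request queue, while the per-device SCSI queue depth is exposed separately in sysfs as device/queue_depth. The following minimal sketch lists both values per disk so that the result of the command above can be checked; the function name, the restriction to sd* devices, and the optional sysfs directory argument are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list request queue size and SCSI queue depth for every sd* disk.
# The function name and the optional directory argument are illustrative.

# usage: show_queue_settings [SYSFS_BLOCK_DIR]
show_queue_settings() {
    root=${1:-/sys/block}
    for dev in "$root"/sd*; do
        [ -d "$dev" ] || continue         # glob matched nothing
        # Print "n/a" when an attribute does not exist for this device.
        nr=$(cat "$dev/queue/nr_requests" 2>/dev/null || echo "n/a")
        qd=$(cat "$dev/device/queue_depth" 2>/dev/null || echo "n/a")
        printf '%s nr_requests=%s queue_depth=%s\n' "${dev##*/}" "$nr" "$qd"
    done
}
```

Calling show_queue_settings without arguments reads from /sys/block and prints one line per disk.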
4.7 IBM Storwize RAID settings
Typically, HDDs or SSDs are configured as RAID 5 or as RAID 10 (RAID 5 provides ~85%
usable capacity, while RAID 10 delivers 50%).
Figure 2 shows the RAID 5 performance relative to RAID 10; both arrays had an equal
number of HDDs. OLTP and sequential operations benefit from RAID 5, while RAID 10
delivers better performance for random writes, such as SAP HANA DATA.
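The capacity figures follow from the number of parity or mirror drives per array. The following is a minimal sketch of that arithmetic, assuming a single array without spare drives; the function name is hypothetical, and the 7-drive RAID 5 array in the usage note is an assumption, since the array width used in the tests is not stated.

```shell
#!/bin/sh
# Sketch: usable capacity in percent (rounded down) of one RAID array.
# Assumes no spare drives; the function name is illustrative.

# usage: raid_usable_percent LEVEL DRIVES
raid_usable_percent() {
    level=$1; n=$2
    case $level in
        5)  echo $(( 100 * (n - 1) / n )) ;;   # one drive's worth of parity
        6)  echo $(( 100 * (n - 2) / n )) ;;   # two drives' worth of parity
        10) echo 50 ;;                         # every drive is mirrored
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}
```

For example, raid_usable_percent 5 7 prints 85, which matches the ~85% quoted above, and raid_usable_percent 6 12 prints 83 for a 10+2 RAID 6 layout as used in section 4.9.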
Table 3: Multipath configuration matrix for FlashSystem (FS9000), prio: alua
[Color-coded matrix not reproduced. One panel per workload: streaming read, streaming
write, random read, random write, OLTP, SAP HANA. Rows: path_grouping_policy (failover,
gbnn, gbp, gbs, multibus). Columns: rr_weight (pri, uni), each combined with
path_selector (rr, ql, st).]
Figure 2: Proportional RAID performance
4.8 FlashSystem settings
The number of volumes mapped as managed disks to IBM Storwize or IBM Spectrum
Virtualize (SVC) is not critical, provided their number and size are reasonable.
4.9 FS9100 settings
Two RAID 6 arrays were used, each configured as 10+2, in one storage pool.
5 Recommendations
For IBM storage type 2145, which covers the IBM Spectrum Virtualize family including
SVC, Storwize, and FS9100, use the following recommendations to gain maximum IO
performance for LINUX on Intel or IBM POWER, applicable to RedHat LINUX as well as SLES:
1) Use 4 FC connections per canister / RAID controller:
   8 FC connections per SVC/Storwize/FlashSystem
2) IO scheduler: noop
3) LVM2 settings:
   use 4 PV for 1 VG
   set data alignment to 1 MB
   set PV extent size to 1 MB
   set LV stripe size to 256 KB
4) Multipath settings:
   path_grouping_policy  group_by_prio
   prio                  alua
   rr_weight             uniform     # for HDD
                         priorities  # for Flash / SSD
   path_selector         "service-time 0"
5) RAID settings:
   use RAID 6; it was not used for the V7000 tests in 2016, but it is best practice
   now.
Sample /etc/multipath.conf file:
devices {
    device {
        vendor "IBM"
        product "2145"
        path_grouping_policy "group_by_prio"
        prio "alua"
        path_checker "tur"
        path_selector "service-time 0"
        failback "immediate"
        rr_weight "priorities"
        no_path_retry "fail"
        rr_min_io_rq 10
        dev_loss_tmo 600
    }
}
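To review which of the four performance-relevant parameters a configuration file sets, the file can be filtered with standard tools. The following is a minimal sketch; the function name is hypothetical, and the filter only looks at the file itself, not at the defaults built into multipath-tools.

```shell
#!/bin/sh
# Sketch: print the four performance-relevant settings of a multipath.conf.
# The function name is illustrative; built-in defaults are not shown.

# usage: show_mpath_tuning [CONF_FILE]
show_mpath_tuning() {
    conf=${1:-/etc/multipath.conf}
    # Keep only the four parameters, then strip the indentation and
    # collapse the gap between parameter name and value to one space.
    grep -E '^[[:space:]]*(path_grouping_policy|prio|rr_weight|path_selector)[[:space:]]' "$conf" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]\{1,\}/ /'
}
```

After the file has been changed, multipathd has to re-read its configuration, for example by reloading the service; multipath -ll then shows the resulting path groups.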
6 Measurements V7000
This section shows the different performance results. Several thousand measurements
were taken; the charts shown here, with aggregated data, are a small subset that
supports the recommendations.
6.1 IO Scheduler
Figure 3: Performance of IO scheduler
1/8: number of IO processes, R/S: Random/Sequential, r/w: read/write, 64k/1M: block size
Figure 3 shows that with cfq, streaming with a large block size does not scale: eight
streams deliver just 200 MB/s in total, while a single stream delivers 900 MB/s.
6.2 LVM2 IO performance matrix: PV extent size X LV stripe size
Figure 4: Performance depending on PV extent and LV stripe size
Figure 4 shows that the performance does not depend much on the PV extent size and LV
stripe size, but the combinations 1 MB x 1 MB and 1 GB x 1 MB show performance drops.
Figure 5: Performance depending on PV extent and LV stripe size, full matrix
Figure 5 shows that every LV stripe size other than 64 KB or 128 KB delivers equal
performance for random writes with a large block size.
Figure 6: Performance depending on the number of PVs
Figure 6 shows that almost maximum performance can be achieved with two PVs.
6.3 Performance matrix of dm-multipath
Hundreds of permutations are possible; only some interesting ones have been picked.
8 FC SAN connections, deadline scheduler, 4 PVs per VG, 256 KB stripe size,
1 MB extent size:
7 Measurements FS9100
Some best practices from the V7000 testing were used as a base:
LVM2 settings:
   4 PVs per VG
   data alignment: 1 MB
   PV extent size: 1 MB
   LV stripe size: 256 KB
multipath settings: prio "alua"
scheduler: deadline
8 Resources
IO elevators (scheduler):
https://www.suse.com/documentation/sled11/book_sle_tuning/data/cha_tuning_io_schedulers.html
https://www.suse.com/documentation/sles-12/book_sle_tuning/data/cha_tuning_io_switch.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/main-io.html
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/index
https://www.thomas-krenn.com/de/wiki/Linux_I/O_Scheduler
dm-multipath:
https://www.thomas-krenn.com/de/wiki/Linux_Storage_Stack_Diagramm
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/dm_multipath/index
https://www.suse.com/documentation/sles-12/stor_admin/data/cha_multipath.html
IBM xdisk tool:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS5304
9 Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. These and
other IBM trademarked terms are marked on their first occurrence in this information with
the appropriate symbol (® or ™), indicating US registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be
registered or common law trademarks in other countries. A current list of IBM trademarks
is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation
in the United States, other countries, or both:
IBM®, POWER®, Redpaper™, Redbooks (logo) ®, System Storage®
The following terms are trademarks of other companies:
SAP, SAP NetWeaver, SAP HANA as well as their respective logos are trademarks or
registered trademarks of SAP AG in Germany or an SAP affiliate company.
Intel Xeon, Intel, Itanium, Intel logo, Intel Inside logo, and Intel Centrino logo are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United
States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
SUSE and the SUSE logo are registered trademarks of SUSE LLC in the United States and
other countries.
Red Hat and the Red Hat logo are registered trademarks of Red Hat, Inc. in the United
States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
10 Disclaimers
This information was developed for products and services offered in Germany.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and
services currently available in your area. Any reference to an IBM product, program, or
service is not intended to state or imply that only that IBM product, program, or service
may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described
in this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM
Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties
in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes
are periodically made to the information herein; these changes will be incorporated in
new editions of the publication. IBM may make improvements and/or changes in the
product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only
and do not in any manner serve as an endorsement of those websites. The materials at
those websites are not part of the materials for this IBM product and use of those
websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly.
Some measurements may have been made on development-level systems and there is
no guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation.
Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM has not
tested those products and cannot confirm the accuracy of performance, compatibility or
any other claims related to non-IBM products. Questions on the capabilities of non-IBM
products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations.
To illustrate them as completely as possible, the examples include the names of
individuals, companies, brands, and products. All of these names are fictitious and any
similarity to the names and addresses used by an actual business enterprise is entirely
coincidental.