© 2008 IBM Corporation
AIX 6.1 Performance Differences
Speaker: Steve Nasypany
Session ID: pAI09
IBM Power Systems Technical University
featuring IBM AIX and Linux, September 8–12, 2008, Chicago, IL
IBM Training
© 2008 IBM Corporation
Introduction
• VMM page replacement
  – New defaults reduce the requirement for basic performance tuning
• VMM file I/O pacing enabled by default
• Performance tunables
  – Tunables are now categorized as restricted and non-restricted
• AIO
  – Dynamic AIO tuning
  – AIO fast path for CIO
• JFS2
  – Read-only access to files opened with CIO
• NFS
  – Changes to TCP window scaling, read/write size, and number of biod daemons
• Enhanced JFS "no-log" option
• MPSS support
Review – AIX page replacement algorithm

• When page replacement begins to run, it selects a page type to steal based on:
  – If the amount of file pages is above maxclient/maxperm, file pages are chosen
  – If the number of file pages is between minperm and maxclient, the type is chosen based on re-paging history
  – If the amount of file pages is below minperm, working storage and file pages are chosen without checking the re-paging history
• Re-paging history indicates whether individual pages have been written to disk and read back recently
  – Re-paging history adds a degree of uncertainty to the selection process
  – If re-paging history decides to pick working storage pages, system paging may begin
    • This was intended as a "safety valve": if we are too aggressive in stealing file pages, stop
    • But sometimes it is triggered by bad luck
  – If re-paging history decides to pick file pages and many file pages are "dirty", heavy writes to disk can occur
    • This would probably happen eventually anyway due to "sync"
[Diagram: contents of system memory vs. how much memory is caching files, scale 0% to 100%. Above maxclient = maxperm = 80%: pick file pages. Between minperm = 20% and maxclient: pick file pages or working storage pages based on recent re-paging history. Below minperm: pick any pages.]
AIX v5 vs v6 VMM page replacement tuning

• AIX 5.2/5.3
  – minperm% = 20
  – maxperm% = 80
  – maxclient% = 80
  – strict_maxperm = 0
  – strict_maxclient = 1
  – lru_file_repage = 1
  – page_steal_method = 0

• AIX 6.1
  – minperm% = 3
  – maxperm% = 90
  – maxclient% = 90
  – strict_maxperm = 0
  – strict_maxclient = 1
  – lru_file_repage = 0
  – page_steal_method = 1

• On AIX 6.1, no paging to the paging space will occur unless system memory is overcommitted (AVM > 97%)
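As a hedged sketch (AIX-specific commands, for illustration only), the values above can be inspected with vmo, and a legacy setting reset to its 6.1 default:

```shell
# Display the page-replacement tunables discussed above (AIX only).
vmo -o minperm% -o maxperm% -o maxclient%
vmo -o lru_file_repage -o page_steal_method

# On a system upgraded from 5.3 that still carries legacy values,
# -d resets a tunable to its default; -p also persists across reboots.
vmo -p -d lru_file_repage
```

Note that on AIX 6.1 several of these are restricted tunables, so changing them (rather than resetting to default) triggers the confirmation behavior described later in this deck.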
Legacy page_steal_method=0

• Partition memory is broken up into page pools
  – A page pool is a set of physical pages organized into a list
• One lrud per memory pool
• Inside each memory pool is a mix of working storage and file pages
• When the free list is depleted, lrud scans its page pool one scan bucket (default 128k pages) at a time
• The scan can be targeted for working storage pages, file pages, or either
• If scanning for file pages and the number of file pages is small (e.g. maxclient = 10%), the ratio of scanned pages to freed pages will be high (e.g. 10:1)
• This reduces performance in two ways:
  – CPU time in lrud
  – Fragmentation of memory, which can make I/O coalescing less effective
[Diagram: system memory split into Page Pool 0 and Page Pool 1; each pool holds a single list of pages, and one page scan covers either working storage or file pages.]
List-based LRU page_steal_method=1

• Partition memory is broken up into page pools
  – A page pool is a set of physical pages
  – There are two lists per page pool: one of working storage pages and another of file pages
• One lrud per memory pool
• When the free list is depleted, lrud scans the appropriate list for the type of pages it wants, one scan bucket (128k pages) at a time
• If scanning for file pages and the number of file pages is small (e.g. maxclient = 10%), the ratio of scanned pages to freed pages should be low (e.g. 2:1 to 1:1)
• This improves performance in two ways:
  – CPU time in lrud is reduced due to less scanning
  – I/O coalescing is better preserved for reading and writing files larger than memory
[Diagram: Page Pool 0 with separate lists of working storage pages and file pages; each list has its own targeted page scan.]
VMM File IO Pacing Enabled By Default
• I/O pacing enabled by default
  – Prevents system responsiveness issues caused by large quantities of writes
  – Limits the maximum number of pages of I/O outstanding to a file
    • Without I/O pacing, a program can fill large amounts of memory with written pages; those "queued" I/Os can result in long waits for other programs using the storage
    • A better solution than the file system write-behind techniques
  – New defaults
    • Not very aggressive: intended to limit one or a few programs from impacting system responsiveness, with values high enough not to impact sequential write performance
    • maxpout = 8193
    • minpout = 4096
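A hedged sketch of inspecting or adjusting the pacing thresholds (AIX-only commands; the mount point shown is hypothetical):

```shell
# View the system-wide I/O pacing high/low water marks (AIX only).
lsattr -El sys0 -a maxpout -a minpout

# Adjust them system-wide (values shown are the AIX 6.1 defaults),
# or override per file system at mount time.
chdev -l sys0 -a maxpout=8193 -a minpout=4096
mount -o maxpout=8193,minpout=4096 /somefs    # /somefs is hypothetical
```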
Performance Tunables
• Tunables are now in two categories
• Restricted tunables
  – Should not be changed unless recommended by AIX development or development support
  – Are not shown by tuning commands unless the -F flag is used
  – A dynamic change will show a warning message
  – A permanent change must be confirmed
  – Permanent changes will cause an error log entry at boot time
• Non-restricted tunables
  – Can have restricted tunables as dependencies
Changing restricted tunables

• Changing a restricted tunable dynamically

  > ioo -o aio_sample_rate=6
  Warning: a restricted tunable has been modified

  A dynamic change of a restricted tunable will inform the user.

• Changing a restricted tunable permanently

  > ioo -po aio_sample_rate=6
  Modification to restricted tunable aio_sample_rate, confirmation yes/no

  A permanent change of a restricted tunable requires a confirmation from the user.

Note: The system will log changes to restricted tunables in the system error log at boot time.
List restricted tunables
> ioo -aF
aio_active = 0
aio_maxreqs = 65536
...
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
j2_maxUsableMaxTransfer = 512
j2_nBufferPerPagerDevice = 512
j2_nonFatalCrashesSystem = 0
j2_syncModifiedMapped = 1
j2_syncdLogSyncInterval = 1
TUNE_RESTRICTED error log entry

LABEL:           TUNE_RESTRICTED
IDENTIFIER:      D221BD55
Date/Time:       Thu May 24 15:05:48 2007
Sequence Number: 637
Machine Id:      000AB14D4C00
Node Id:         quake
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   perftune

Description
RESTRICTED TUNABLES MODIFIED AT REBOOT

Probable Causes
SYSTEM TUNING

User Causes
TUNABLE PARAMETER OF TYPE RESTRICTED HAS BEEN MODIFIED

Recommended Actions
REVIEW TUNABLE LISTS IN DETAILED DATA

Detail Data
LIST OF TUNABLE COMMANDS CONTROLLING MODIFIED RESTRICTED TUNABLES AT REBOOT, SEE FILE /etc/tunables/lastboot.log
Why, you ask?

• The number of tunables in AIX had grown ridiculously large
  – 5.3 TL06: vmo 61, ioo 27, schedo 42, no 135, plus a few others
  – 6.1: vmo 29, ioo 21, schedo 15, no 133, plus a few others
• The potential combinations are far too numerous to effectively test and document
• Many of the tunables were created to deal with very specific customers or situations that don't apply often
• This wasn't done in a vacuum: a survey of support and recent situations was used to identify the commonly used tunables (which remain unrestricted)
• If a restricted tunable must be changed, a PMR should be opened to identify the issue
General trend toward file system I/O with concurrent I/O
• Concurrent I/O (CIO) has been a feature of AIX since AIX 5.2
  – http://www.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.pdf
• Concurrent I/O gives applications that do their own internal buffering of disk I/O and locking a means of bypassing operating system caching and i-node file locking
  – This improves the CPU efficiency of I/O to very near that of raw logical volumes
  – It also improves scalability by eliminating operating system i-node locking in the read/write paths
• Concurrent I/O is not for all applications
  – Some applications require operating system i-node locking to function correctly
  – Other applications do not do sophisticated storage buffering and benefit from caching in the operating system, or from the read-ahead/write-behind mechanisms that the AIX virtual memory management subsystem provides to improve sequential file performance
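Besides the per-open flag, JFS2 also accepts CIO as a mount option; a hedged sketch (AIX mount options; the mount point is hypothetical):

```shell
# Mount a JFS2 file system so that all opens use concurrent I/O
# (AIX only; /db/data is a hypothetical mount point).
mount -o cio /db/data

# Direct I/O is the related option: it bypasses caching without
# changing the i-node locking behavior.
mount -o dio /db/data
```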
CIO and Applications
• DB2 Version 9.5 implements CIO as the DEFAULT mechanism for table spaces on AIX
  – NO FILE SYSTEM CACHING / FILE SYSTEM CACHING clauses on CREATE TABLESPACE or ALTER TABLESPACE
  – View the caching setting with DB2 GET SNAPSHOT FOR TABLES ON db
  – DB2 has supported CIO since V8.1
  – http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/
• Oracle 10g/11g have support, but it is not a default
  – Requires filesystemio_options set to SETALL or DIRECTIO
  – CIO is the recommended deployment solution for JFS2; however, some 3rd-party tools have issues
CIO and Applications
• If you use legacy VMM tuning (e.g. AIX 5.2/5.3 defaults) and you switch an application from non-CIO to CIO operation, you will likely need to retune
  – The amount and distribution of memory may change quite radically
  – Usually, switching file usage to CIO reduces the memory required, as the operating system no longer buffers file pages for those files
  – Upgrading from DB2 9.1 (non-CIO) to DB2 9.5 may require some tuning preparation
• With AIX 6.1 default tuning, it should not be necessary to change tuning when converting from non-CIO to CIO operation
AIX 6.1 AIO Support
• Interface changes
  – All the AIO entries in the ODM and the AIO smit panels have been removed
  – The aioo command will no longer be shipped
  – All the AIO tunables have current, default, minimum, and maximum values that can be viewed with ioo
• AIO kernel extension loaded at system boot
  – Applications no longer fail to run because you forgot to load the kernel extension (you may applaud here)
  – No AIO servers are active until requests are present
  – Extremely low impact on memory requirements with this implementation
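A quick way to confirm the new behavior, sketched with the ioo tunables shown later in this deck (AIX 6.1 only):

```shell
# aio_active reports whether the AIO subsystem has been used since
# boot; it becomes 1 after the first AIO request. There is no aio0
# device to configure or enable on AIX 6.1.
ioo -o aio_active -o posix_aio_active
```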
Improvements to AIO CIO
• AIO fast path for CIO enabled by default
  – With the fast path, the AIO server threads no longer participate in the I/O path
  – By removing the AIO servers from the path, we get three things:
    • The removal of AIO servers as a potential resource bottleneck
    • A reduction in path length for AIO read/write services, as less dispatching is required
    • Potentially better coalescing of sequential I/O requests initiated through AIO or LISTIO services
• The fast path has been enabled for LVs and PVs for a long time
  – No change in behavior for environments such as Oracle 10g/ASM on raw hdisks
[Diagram: I/O paths compared. File system without fast path: Application → AIO Server → File System → LVM → Device Driver. CIO fast path: Application → File System → LVM → Device Driver.]
General improvements to AIO
• The number of AIO servers varies between minservers and maxservers (times the number of CPUs), based on workload
  – AIO servers stay active as long as they service requests
  – The number of AIO servers is dynamically increased or reduced based on the demand of the workload
  – aio_server_inactivity defines how many seconds of idle time before an AIO server exits
  – Do not confuse having no active servers with the kernel extension not being loaded; the kernel extension is always loaded
• Changes to AIO tunables are dynamic through ioo
  – Changes do not require a system reboot
  – minservers is changed to a per-CPU tunable
  – maxservers is changed to 30
  – maxreqs is changed to 65536
• Benefit
  – No longer necessary to tune minservers/maxservers/maxreqs as in the past
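Since the AIO tunables are now ordinary ioo parameters, a hedged example of adjusting one dynamically (AIX 6.1 only):

```shell
# View and change AIO tunables without a reboot or smit panel.
ioo -o aio_maxservers             # current per-CPU maximum
ioo -o aio_maxservers=60          # dynamic change, takes effect now
ioo -p -o aio_maxservers=60       # -p also records it for the next boot
```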
AIO Tunables
> ioo -a
aio_active = 0
aio_maxreqs = 65536
aio_maxservers = 30
aio_minservers = 3
aio_server_inactivity = 300
posix_aio_active = 0
posix_aio_maxreqs = 65536
posix_aio_maxservers = 30
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
AIO Restricted Tunables

> ioo -aF
...
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
posix_aio_fastpath = 1
posix_aio_fsfastpath = 1
posix_aio_kprocprio = 39
posix_aio_sample_rate = 5
posix_aio_samples_per_cycle = 6
CIO Read Mode Flag
• Allows an application to open a file for CIO such that subsequent opens without CIO avoid demotion
  – In the past, a 2nd opening of a file without CIO would cause "demotion", which removes many of the benefits of CIO
  – The 2nd read-only opening without CIO will still result in that opening having uncached reads to the file; such programs should ensure that their I/O sizes are large enough to achieve I/O efficiency
• Example: a backup application can access database files in read-only mode while the database has the file opened in concurrent I/O mode
• The open() flag is O_CIOR
• procfiles does not currently reflect O_CIO/O_CIOR
  – kdb 'u <slotnumber>', then for each file listed 'file <filepointer>', gives some info
NFS Performance Improvements
• RFC 1323 enabled by default
  – Allows TCP window scaling beyond 64K, so more one-way packets in flight are allowed between ACKs for large sequential transfers. The nfs_rfc1323 tunable existed before; it just wasn't enabled by default.
• Increased default number of biod daemons
  – 32 biod daemons per NFS V3 mount point
  – Very slight increase in memory (<2MB) over the previous default of 4
  – Enables more I/Os to be outstanding at the same time; doesn't speed sequential operations much, but helps random access (e.g. OLTP)
• Default read/write size increased to 64k for TCP connections
  – Was 32k previously
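These defaults can still be overridden per mount; a hedged sketch using standard AIX NFS mount options (server and mount-point names are hypothetical):

```shell
# Explicitly request the AIX 6.1 defaults on an NFS v3 TCP mount.
# nfsserver:/export and /mnt/data are hypothetical names.
mount -o vers=3,proto=tcp,rsize=65536,wsize=65536,biods=32 \
      nfsserver:/export /mnt/data

# RFC 1323 window scaling for NFS and for TCP generally:
nfso -o nfs_rfc1323
no -o rfc1323
```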
NFS biod changes
• Having more biods allows better read-ahead and write-behind
• However, measured on a single-process basis, the new defaults don't show huge performance differences over the AIX 5.3 defaults
• Results should improve in tests with multiple processes/threads operating over NFS
• NFS client tests: p5 520 on 1 Gb Ethernet with 64kB I/Os (next slide)
NFS biod changes
[Chart: NFS single-process throughput over a 256MB file, 32 biods vs 4 biods, across six cases: sequential read server-uncached, sequential read server-cached, random read server-uncached, sequential overwrite, sequential write create, random write create.]
NFS biod change with Kerberos krbp5
• The increase in biods has a much more positive impact when using Kerberos DES security
• Overlapping more compute with network traffic through more biods greatly improves throughput
• Same model as the previous chart, with the krbp5 (full packet encryption) mount option
[Chart: NFS biod changes with Kerberos — throughput, 32 biods vs 4 biods, for the same six read/write cases as the previous chart.]
Enhanced JFS “nolog” option
• JFS2 standard metadata logging for file system integrity can be disabled via a mount option
  – Similar to the "legacy" JFS "nointegrity" option
• Meant to enable faster migration of data to new storage
  – File system operation with heavy file create/delete activity can create log bottlenecks
  – Potentially useful for temporary file systems where the file system can be easily recreated or fsck'ed
• mount -o log=NULL during the data migration phase, then unmount and mount with standard logging
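A hedged sketch of the migration workflow described above (the device, mount point, and source path are all hypothetical):

```shell
# Phase 1: mount without metadata logging for the bulk copy (AIX 6.1).
# /dev/fslv00, /migrate, and /olddata are hypothetical names.
mount -o log=NULL /dev/fslv00 /migrate
cp -Rph /olddata/. /migrate/     # heavy create activity, no log writes

# Phase 2: return to normal integrity guarantees.
umount /migrate
mount /dev/fslv00 /migrate       # standard logging per /etc/filesystems
```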
Enhanced JFS “nolog” option - example
• 4-way POWER5 p550, PHP test "Wikibench"
• The test makes heavy use of file metadata
• With a single-disk setup, the bottleneck is disk writes to the JFS2 log
• With "nolog", the log bottleneck is avoided
[Charts: disk utilization over time (% disk busy, 0–100, default log vs nolog) and PHP Wikibench throughput (default log vs nolog).]
Multiple Page Size Segment (MPSS) Support
• POWER6 provides hardware support for mixing 4kB pages and 64kB pages in the same hardware segment
• This allows the AIX operating system to transparently promote an application's small pages to medium pages
  – This typically improves performance by reducing stress on hardware translation mechanisms
  – It is controlled with the vmo vmm_default_pspa parameter (-1 turns it off)
• This behavior is enabled by default on AIX 6.1 on POWER6 hardware
  – Since it is not supported on POWER5, systems running identical application conditions on POWER5 and POWER6 may differ in exact memory page usage
  – In general, no increase in memory consumption should be noticed; however, the usage of 64kB pages may increase on POWER6
  – System paging activity may result in 64kB pages being broken into 4kB pages
  – 64kB pages that are broken by paging won't usually be reconstituted into 64kB pages later
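A hedged sketch of inspecting or disabling the page-promotion behavior via the vmo parameter named above (AIX 6.1 on POWER6 only):

```shell
# View the current page-size promotion setting.
vmo -o vmm_default_pspa

# Disable transparent 4kB -> 64kB promotion; -1 turns it off.
# -p also persists the change across reboots.
vmo -p -o vmm_default_pspa=-1
```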
MPSS – Using svmon to see MPSS segments

svmon -P 553068
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
553068 java 44652 8388 37623 73342 N Y N
PageSize Inuse Pin Pgsp Virtual
s 4 KB 1132 244 4055 4798
m 64 KB 2720 509 2098 4284
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
51b10 3 work working storage m 1879 0 1946 3068
0 0 work kernel segment m 520 507 47 561
3c02d d work text or shared-lib code seg m 297 0 85 612
d3a7 e work shared memory segment sm 582 0 3744 4096
61adc - work s 549 244 311 702
65add f work working storage m 20 0 17 36
51ad0 2 work process private m 3 2 2 5
75ad9 1 work code m 1 0 1 2
MPSS – Using svmon to detail MPSS segments
svmon -D d3a7
Segid: d3a7
Type: working
PSize: sm (4 KB - 64 KB)
Address Range: 0..4095
Size of page space allocation: 3744 pages ( 14.6 MB)
Virtual: 4096 frames (16.0 MB)
Inuse: 582 frames ( 2.3 MB)
Page Psize Frame Pin ExtSegid ExtPage
0 m 442176 Y - -
1 m 442177 Y - -
2 m 442178 Y - -
382 s 362140 N - -
435 s 430534 N - -
Implementation Considerations

• AIX 5.2/5.3 to AIX 6.1 migration example (DB2 performance tuning)

• AIX 5.2/5.3
  – VMM page replacement tuning
    • reduce minperm, maxperm, maxclient
    • turn off strict_maxclient
    • increase minfree, maxfree
  – AIO tuning
    • Enable AIO
    • Tune minservers, maxservers and reboot
  – DB2 tuning
    • Enable CIO

• AIX 6.1
  – VMM page replacement tuning
    • NO TUNING REQUIRED
  – AIO tuning
    • NO TUNING REQUIRED
  – DB2 tuning
    • Enable CIO
Implementation Considerations (Cont’d)
• Best practices
  – Do not apply legacy tuning, since some tunables may now be restricted
  – If you do an upgrade install, your old tunings will be preserved
    • You may wish to undo them, but we won't make you
  – This level of tuning has been applied to numerous AIX 5.3 customers through field support
    • We are confident this was a good thing
    • However, we try never to change defaults in the service stream, so AIX 5.3 remains as it was
  – Change restricted tunables only if recommended by AIX support
Implementation Considerations (Cont’d)
• Problem determination
  – Common problems seen in the field or lab:
    • Legacy VMM tuning results in error log entries (TUNE_RESTRICTED)
    • Tuning scripts fail due to the required confirmation for permanent changes of restricted tunables
    • Install/tuning scripts fail due to the missing aio0 device
  – Diagnostics:
    • Check AIX errpt for TUNE_RESTRICTED
    • Check /etc/tunables/lastboot.log
    • PERFPMR
Trademarks

The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
The following are trademarks or registered trademarks of other companies.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:
*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA, WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market.
Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.