
Oracle9i / AIX5L Installation FAQ
30-May-2003
Reviewed & approved by IBM’s Oracle Competency Center
Douglas R. Ranz; [email protected]

1. Install AIX v5.2 at ML01 [or later] and PTFs for HIPER (High-Impact, PERvasive) APARs. If the customer is not ready for AIX v5.2 and must use AIX v5.1, install AIX v5.1 at ML03 [or later].

To get a list of HIPER APARs, please visit:

https://techsupport.services.ibm.com/server/aix.CAPARdb

• Enter “YesHIPER” in the box at the top of the window
• Change Display by from Proximity to Date
• Click the Exact Words button
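
To verify what is already installed before planning updates, the standard AIX commands below can help (a hedged sketch; IY12345 is only a placeholder APAR number):

$oslevel -r [reports the installed maintenance level, e.g. 5200-01]
$instfix -ik IY12345 [reports whether the fix for a given APAR is installed]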

2. You can use either the 32-bit or 64-bit AIX kernel. All Oracle products [even the 64-bit products] work just fine on the 32-bit AIX kernel - this is not a typo. Unlike other UNIX operating systems, the 32-bit AIX kernel can simultaneously and natively execute both 32-bit and 64-bit user-mode applications. Please do everything you can to correct this misunderstanding if it is shared by your customer[s].

The 64-bit AIX kernel is [only] required when the system [or LPAR] has more than 96GB of physical memory.

Please note that some Oracle components require Oracle's pw-syscall kernel extension. Some old[er] Oracle products ship with only the 32-bit edition. An Oracle patch (2896876) is available that provides the 64-bit edition of pw-syscall for the 64-bit AIX kernel.

Example: Oracle8i, 11i, and iAS 1.0.2.2.2 require a patch to run on the 64-bit AIX kernel, whereas more recent products such as Oracle9i Release 2 (9.2) or 9iAS 9.0.2 support both the 32-bit and 64-bit AIX kernels [without additional patches].

Please note that Oracle's Metalink web site documents which Oracle patches are required for 64-bit AIX kernel support:

http://metalink.oracle.com/
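
As a quick, hedged check (the exact name under which Oracle's extension appears can vary by release), the currently loaded kernel extensions can be listed with genkex and searched for pw-syscall:

$genkex | grep -i pw [lists loaded kernel extensions whose names contain "pw"]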

3. If using Oracle9i, enable 64-bit application mode using the fast path: 'smitty load64bit'. Please note that the bos.64bit fileset (Base Operating System 64-bit Runtime) must be installed before 64-bit application mode can be enabled. Enabling 64-bit application mode allows 64-bit applications (such as Oracle9i) to run. The mode setting has no effect whatsoever on 32-bit applications.

Please note that Oracle9i requires 64-bit pSeries hardware. You can use either the 32-bit or 64-bit AIX kernel. The following commands will report the hardware and kernel details:

$bootinfo -y [reports the hardware type]
$bootinfo -K [reports the kernel type]
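
To double-check the prerequisites above, the following can also help (a hedged sketch; the fileset and kernel image names shown are as commonly found on AIX 5L, so verify them on your system):

$lslpp -l bos.64bit [confirms the 64-bit application runtime fileset is installed]
$ls -l /unix [shows which kernel image is linked, e.g. unix_mp (32-bit) or unix_64 (64-bit)]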

Reminder: Oracle9i ships with both 32-bit and 64-bit Oracle clients. Although the database is 64-bit software, it interoperates with both 32-bit and 64-bit client applications & servers. All run fine on both the 32-bit and 64-bit AIX kernels. Additionally, Oracle9i R2 is also certified on AIX 4.3.3 and AIX 5.1.


4. Best performance is achieved using AIX raw logical volumes for Oracle datafiles. If DBAs insist on using filesystems, please use Concurrent I/O [CIO], which allows filesystem performance to approach that of raw devices. CIO is only available on JFS2 filesystems and requires AIX v5.2 ML01 [or later].
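
A minimal sketch of carving out a raw logical volume for an Oracle datafile (the volume group, LV name, and size below are placeholders):

$mklv -y ora_data01 -t raw datavg 64 [creates a 64-partition logical volume in volume group datavg]
$chown oracle:dba /dev/rora_data01 [Oracle must own the character (raw) device it will open]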

If using JFS [not JFS2], use Direct I/O [DIO]. Reminder: CIO is only available on JFS2, AIX v5.2 ML01 [or later].

Never disable Asynchronous I/O in the init.ora.
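
For illustration only (parameter names as documented for Oracle9i; confirm them against your release before use), the corresponding init.ora entries might look like:

disk_asynch_io = TRUE [leave asynchronous I/O enabled]
filesystemio_options = SETALL [when datafiles live on filesystems, requests both async and direct/concurrent I/O]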

Do not allocate a JFS with the large-file-enabled (bf) attribute. The big-file attribute increases the minimum DIO transfer size (diocapbuf.dio_min, returned by the finfo system call) from 4K to 128K, forcing Oracle to read and write a minimum of 128K bytes to exploit DIO. Please note that 2GB is the largest file size applications (including Oracle) can use in a JFS without the bf attribute.
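
A hedged way to check an existing JFS for the bf attribute (output format differs slightly between AIX levels; /oradata is a placeholder mount point):

$lsfs -q /oradata [the detailed output for a JFS includes a bf: true/false field]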

Do not allocate a compressed JFS; it essentially defeats DIO.

Certain AIX installation and tuning documents recommend using the 64-bit AIX kernel with JFS2 due to memory usage concerns. This does not apply when using CIO and JFS2, as no file system buffer caching is occurring.

If using Oracle8i or Oracle9i Release 1 on JFS/JFS2, make sure to have the fsync bug fix (Oracle patch 2024097). Oracle9i Release 2 and later have the fix built in.

Use a log logical volume rather than an INLINE log. Pick a block size (agblksize attribute) that is an integral divisor of the Oracle DB_BLOCK_SIZE. For Oracle redo logs, use a raw device or a dedicated JFS2 filesystem with agblksize set to 512 bytes.
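
As a sketch only (volume group, mount points, and sizes are placeholders, and older AIX levels may want the size expressed in 512-byte blocks), JFS2 filesystems following these block-size guidelines could be created with:

$crfs -v jfs2 -g datavg -m /oradata -A yes -a agblksize=4096 -a size=20G [datafiles; 4096 is an integral divisor of a 4K, 8K, or 16K DB_BLOCK_SIZE]
$crfs -v jfs2 -g datavg -m /oralog -A yes -a agblksize=512 -a size=4G [dedicated redo log filesystem]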

Use the CIO mount option for JFS2 filesystems containing Oracle data files and logs. [Reminder: CIO on JFS2 requires AIX v5.2 ML01 or later.] Because the CIO mount option was not in the AIX v5.2 base, it may not yet be fully documented. Additional CIO information is available in the whitepaper “Improving Database Performance With AIX Concurrent I/O: A Case Study with the Oracle9i Database on AIX 5L Version 5.2”. This whitepaper is available at:

http://www-1.ibm.com/servers/aix/whitepapers/db_perf_aix.html
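
A hedged example of mounting with the CIO option recommended above and of making it persistent (logical volume, log device, and mount point names are placeholders):

$mount -o cio /oradata [mounts the JFS2 filesystem with Concurrent I/O]

The matching /etc/filesystems stanza would carry the option so it survives reboots:

/oradata:
        dev     = /dev/fslv00
        vfs     = jfs2
        log     = /dev/loglv00
        mount   = true
        options = cio,rw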

Additional information about DIO is available in the "Tuning Direct I/O" article in the "Performance Management Guide" in the AIX v5.1 web-based documentation at:

http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/2365c813.htm

5. If any of the Oracle components need a C compiler (e.g., Oracle Pro*C/C++), order and install the VisualAge C++ for AIX v5 (5765-E26) or C for AIX v5 (5765-E32) compiler (depending on whether you need C & C++ or C alone).

6. Install the latest compiler maintenance (currently the PTF for APAR IY40943 or IY40944, respectively). The latest maintenance is crucial, as a critical bug introduced in the July 2002 maintenance is only fixed in the latest level.


7. Please note that the IBM VisualAge C++ Professional for AIX v6 (5765-F56) compiler is being tested by IBM and Oracle. There are still open issues, so v5 is currently the recommended version. Official support of VAC/C++ v6 is planned for Summer 2003, at which point Oracle Metalink will be updated.


Improving Database Performance With AIX Concurrent I/O
A case study with Oracle9i Database on AIX 5L version 5.2

Authors: Sujatha Kashyap, Bret Olszewski, Richard Hendrickson

{skashyap, breto, richhend}@us.ibm.com


1 Introduction

A new file system feature called "Concurrent I/O" (CIO) was introduced in the Enhanced Journaled File System (JFS2) in AIX 5L™ version 5.2.0.10, also known as maintenance level 01 (announced May 27, 2003). This new feature improves performance for many environments, particularly commercial relational databases. In many cases, the database performance achieved using Concurrent I/O with JFS2 is comparable to that obtained by using raw logical volumes. This paper details the implementation and operational characteristics of Concurrent I/O, and presents the results of our performance evaluation of Concurrent I/O with Oracle9i Database.

The file system has long been the heart of UNIX® storage management. Commands and interfaces for manipulating and managing data stored on files are commonly used throughout the UNIX world by users of all skill levels. Managing persistent data via such universally understood mechanics is key to application portability. File systems thus provide a very useful and desirable abstraction for data storage. As is often the case with any method of abstraction, however, the use of file systems results in some tradeoffs between performance and ease of use.

The fastest means of transferring data between an application and permanent storage media such as disks is to directly access more primitive interfaces such as raw logical volumes. The use of files for data storage involves overheads due to serialization, buffering and data copying, which impact I/O performance. Using raw logical volumes for I/O eliminates the overheads of serialization and buffering, but also requires a higher level of skill and training on the part of the user since data management becomes more application-specific. Also, while file system commands do not require system administrator privileges, commands for manipulating raw logical volumes do. However, due to this superior performance, database applications have traditionally preferred raw logical volumes to file systems for data storage. With the Concurrent I/O feature now available in JFS2, database performance on file systems rivals the performance achievable with raw logical volumes.

2 Using File Systems for Database Applications

For database applications, the superior performance of raw logical volumes compared to file systems arises from certain features of the file system:

• The file buffer cache
• The per-file write lock, or inode lock
• The sync daemon

These file system features help ensure data integrity, improve fault tolerance, and in fact improve application performance in many cases. However, these features often pose performance bottlenecks for database applications. This section explains the role of these features in a file system, how they impact database performance, and the options provided by JFS2 to help reduce their performance impact.

2.1 File Buffer Cache

At the most basic level, a file is simply a collection of bits stored on persistent media. When a process wants to access data from a file, the operating system brings the data into main memory, where the process can examine it, alter it, and then request that the data be saved to disk. The operating system could read and write data directly to and from the disk for each request, but the response time and throughput would be poor due to slow disk access times. The operating system therefore attempts to minimize the frequency of disk accesses by buffering data in main memory, within a structure called the file buffer cache.

On a file read request, the file system first attempts to read the requested data from the buffer cache. If the data is not already present in the buffer cache, it is read from disk and cached in the buffer cache. Figures 1 and 2 show the sequence of actions that take place when a read request is issued under this caching policy. Similarly, writes to a file are cached so that future reads can be satisfied without necessitating a disk access, and to reduce the frequency of disk writes. The use of a file buffer cache can be extremely effective when the cache hit rate is high. It also enables the use of sequential read-ahead and write-behind policies to reduce the frequency of physical disk I/Os.

Page 6: Oracle9i / AIX5L Installation FAQusers.ca.astound.net/.../Oracle-9i-Install-tip.pdf · Please note that some Oracle components require Oracle's pw-syscall kernel extension. Some old[er]

AIX CIO Implementation and Performance

wpAIXcio052703.doc Page 2 of 12

Another benefit is in making file writes asynchronous, since the application can continue execution without waiting for the disk write to complete. Figure 3 shows the sequence of actions for a write request under cached I/O.

While the file buffer cache improves I/O performance, it also consumes a significant portion of system memory. AIX's Enhanced JFS, also known as JFS2, allows the system administrator to control the maximum amount of memory that can be used by the file system for caching. JFS2 uses a certain percentage of real memory for its file buffer cache, specified by the maxclient% parameter. The value of maxclient% can be tuned via the vmo command. By default it is set to 80, which implies that JFS2 can use up to 80% of real memory for its file buffer cache. The range of acceptable values for maxclient% is from 1 to 100. For example, the following command will reduce the maximum amount of memory that can be used for the file buffer cache to 50% of real memory: vmo -o maxclient%=50. In contrast, raw logical volumes do not use a system-level cache to cache application data, so there is neither duplication nor double-copying of data.
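
As a hedged aside (flags as documented for the AIX 5.2 vmo command; verify on your level), the current setting can be displayed and a change made persistent across reboots with:

$vmo -a | grep maxclient [displays the current maxclient% value]
$vmo -p -o maxclient%=50 [example value; -p also records the change for the next boot]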

Figure 1: Reads under cached I/O – buffer cache hit

Figure 2: Reads under cached I/O - buffer cache miss

Figure 3: Writes under cached I/O

2.1.1 Direct I/O

Certain classes of applications derive no benefit from the file buffer cache. Some technical workloads, for instance, never reuse data due to the sequential nature of their data accesses, resulting in poor buffer cache hit rates. Databases normally manage data caching at the application level, so they do not need the file system to implement this service for them. The use of a file buffer cache results in undesirable overheads in such cases, since data is first moved from the disk to the file buffer cache and from there to the application buffer. This "double-copying" of data results in additional CPU consumption. Also, the duplication of application data within the file buffer cache increases the amount of memory used for the same data, making less memory available for the application, and resulting in additional system overheads due to memory management. For applications that wish to bypass the buffering of memory within the file system cache, Direct I/O is provided as an option in JFS2.

[Diagrams for Figures 1-3 not reproduced. They depict the application buffer, the kernel's file buffer cache, and the disk: on a cached read hit the requested data is copied from the file buffer cache to the application buffer; on a cached read miss the kernel first reads the data from disk into the file buffer cache and then copies it to the application buffer; on a cached write the kernel copies data from the application buffer to the file buffer cache, the application continues without waiting, and syncd periodically flushes the dirty pages to disk.]


When Direct I/O is used for a file, data is transferred directly from the disk to the application buffer, without the use of the file buffer cache. Figures 4 and 5 depict the sequence of actions that occur for reads and writes under Direct I/O.

Figure 4: Reads under Direct I/O

Figure 5: Writes under Direct I/O

2.1.1.1 Direct I/O Usage

Direct I/O can be used for a file either by mounting the corresponding file system with the mount -o dio option, or by opening the file with the O_DIRECT flag specified in the open() system call. When a file system is mounted with the -o dio option, all files in the file system use Direct I/O by default. Direct I/O can be restricted to a subset of the files in a file system by placing the files that require Direct I/O in a separate subdirectory and using namefs to mount this subdirectory over the file system. For example, if a file system somefs contains some files that prefer to use Direct I/O and others that do not, we can create a subdirectory, subsomefs, in which we place all the files that require Direct I/O. We can mount somefs without specifying -o dio, and then mount subsomefs as a namefs file system with the -o dio option using the command: mount -v namefs -o dio /somefs/subsomefs /somefs.

The use of Direct I/O requires that certain alignment and length restrictions be met by the application's I/O requests. Table 1 lists these requirements for JFS2. Failure to meet these requirements causes reads and writes to be done using normal cached I/O, but after the data is transferred to the application buffer, the cached copy is discarded. File system read-ahead does not occur for files that use Direct I/O.

To avoid consistency issues, if multiple processes open a file and one or more of them did not specify O_DIRECT while others did, the file stays in the normal cached I/O mode. Similarly, if the file is mapped in memory through the shmat() or mmap() system calls, it stays in normal cached mode. Once the last conflicting, non-direct access is eliminated (by using the close(), munmap(), or shmdt() system calls), the file is moved into Direct I/O mode. The change from caching mode to Direct I/O mode can be expensive, because all modified pages in memory will have to be flushed to disk at that point.

Table 1: JFS2 restrictions for Direct I/O

File system format          Buffer alignment                              Buffer length increment
JFS2 before AIX 5.2 ML01    4K bytes                                      4K bytes
JFS2 as of AIX 5.2 ML01     agblksize specified at file system creation   agblksize specified at file system creation

2.1.1.2 Performance Considerations Under Direct I/O

Direct I/O benefits applications by reducing CPU consumption and eliminating the overhead of copying data twice: first between the disk and the file buffer cache, and then from the file buffer cache to the application's buffer.

[Diagrams for Figures 4 and 5 not reproduced. Under Direct I/O the kernel moves data directly between the disk and the application buffer: a read is transferred straight from disk into the application buffer, and a write goes straight from the application buffer to disk, with the application continuing only after the disk write completes.]


However, several factors can impact application performance when Direct I/O is used. Every Direct I/O read causes a synchronous read from disk, unlike the normal cached I/O policy where the read may be satisfied from the file buffer cache (refer to Figures 2 and 5). This can result in poor performance if the data was likely to be in memory under the normal caching policy.

Direct I/O also bypasses JFS2 read-ahead. File system read-ahead can provide a significant performance boost for sequentially accessed files. When read-ahead is employed, the operating system tries to anticipate future need for pages of a sequential file by observing the pattern in which an application accesses the file. When the application accesses two successive pages of the file, the operating system assumes that the program will continue to access the file sequentially, and schedules additional sequential reads of the file. These reads are overlapped with application processing, and will make the data available to the application sooner than if the operating system had waited for the program to access the next page before initiating the I/O. The number of pages to be read ahead is determined by two parameters:

• j2_minPageReadAhead: the number of pages read ahead when the operating system first detects sequential access. If the program continues to access the file sequentially, the next read-ahead is twice j2_minPageReadAhead, the next four times j2_minPageReadAhead, and so on until the number of pages reaches j2_maxPageReadAhead. Default value is 2.

• j2_maxPageReadAhead: the maximum number of pages the operating system will read ahead in a sequential file. Default value is 8.
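
A hedged example of displaying and adjusting these tunables with the ioo command (the value shown is illustrative only):

$ioo -a | grep j2_ [displays the current JFS2 read-ahead settings]
$ioo -o j2_maxPageReadAhead=32 [example value; raises the read-ahead ceiling for large sequential reads]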

Both parameters are tunable and can be set via the ioo command, as shown above. Table 2 compares the performance of Direct I/O versus cached I/O for three different read scenarios. The file block size used in these experiments was 4K bytes, and the default values of j2_minPageReadAhead=2 and j2_maxPageReadAhead=8 were used.

The first row in Table 2 corresponds to the case where the application reads a 1MB file sequentially, byte by byte. When Direct I/O is used in this case, the alignment restrictions are violated. Consequently, normal cached I/O is used to read a 4K page into the file buffer cache, the requested byte is copied from the file buffer cache to the application buffer, and the 4K page is discarded from the file buffer cache. This results in a 4K page being read for every byte requested by the application, while also incurring the costs of double-copying of data. Cached I/O in this case enjoys two advantages: the 4K page that is brought into the file buffer cache when a single byte is read can be re-used to return 4K bytes of data to the application upon subsequent read requests. Additionally, read-ahead would occur with cached I/O, further reducing the latency of future read requests.

The second row in Table 2 corresponds to the case where a 1GB file is read sequentially in 4KB increments. Although this case satisfies the alignment restrictions for Direct I/O, read-ahead will not occur when Direct I/O is used. Cached I/O again outperforms Direct I/O in this case due to file system read-ahead. Note that the total amount of data read in this case is the same for both Direct and cached I/O (although cached I/O reads one additional page, due to read-ahead).

The third row in Table 2 corresponds to the case where a 1GB file is read sequentially in 10MB increments. Direct I/O significantly outperforms cached I/O in this case for two reasons. First, the overhead of double-copying is eliminated with Direct I/O. Second, cached I/O does not see the benefit of read-ahead in this case, because at most eight 4K pages can be read ahead (since j2_maxPageReadAhead=8), while the read increment in this case is 2560 4K pages.

These examples show that applications do not uniformly benefit from Direct I/O. However, applications that see performance benefits when using raw logical volumes for storage are likely to benefit from the use of Direct I/O. Raw logical volumes also impose alignment and length restrictions on I/O: they require that the application buffer be 512-byte aligned, and that lengths be in 512-byte increments. Thus, applications that use raw logical volumes for I/O already implement these alignment and length restrictions. By creating file systems with an appropriate block size (e.g., by specifying agblksize=512 at file system creation), such applications can benefit from the use of Direct I/O without any modification.


Table 2: Direct I/O vs. cached I/O performance

Read Increment   Total File Size   Cached I/O Elapsed Time (sec)   Cached I/O Total KB Read   Direct I/O Elapsed Time (sec)   Direct I/O Total KB Read
1 byte           1 MB              1.59                            1,036                      185.27                          4,194,320
4 KB             1 GB              21.18                           1,045,982                  104.31                          1,045,986
10 MB            1 GB              20.59                           1,048,592                  6.81                            1,048,596

2.2 Inode Locking

While an application views a file as a contiguous stream of data, this is not actually how a file is stored on disk. In reality, a file is stored as a set of (possibly non-contiguous) blocks of data on disk. Each file has a data structure associated with it, called an inode. The inode contains all the information necessary for a process to access the file, such as file ownership, access rights, file size, time of last access or modification, and the location of the file's data on disk. Since a file's data is spread across disk blocks, the inode contains a "table of contents" to help locate this data.

It is important to note the distinction between changing the contents of an inode and changing the contents of a file. The contents of a file only change on a write operation. The contents of an inode change when the contents of the corresponding file change, or when its owner, permissions, or any of the other information that is maintained as part of the inode changes. Thus, changing the contents of a file automatically implies a change to the inode, whereas a change to the inode does not imply that the contents of the file have changed.

Since multiple threads may attempt to change the contents of an inode simultaneously, this could result in an inconsistent state of the inode. In order to avoid such race conditions, the inode is protected by a lock, called the inode lock. This lock is used for any access that could result in a change to the contents of the inode, preventing other processes from accessing the inode while it is in a possibly inconsistent state. When a file is accessed for reading, the contents of the inode do not change, whereas writes to a file do change the contents of the inode (and the contents of the file). JFS2 uses a read-shared, write-exclusive inode lock which allows multiple readers to access the file simultaneously, but requires that the lock be held in exclusive mode when a write access is made. This means that when the lock is held in write-exclusive mode by a process, no other process may access the file for either reads or writes. However, when the lock is held in read-shared mode by a process, other processes can concurrently read data from the file.

Figure 6 depicts the serialization enforced by the inode lock in JFS2. In the figure, threads 1 and 2 simultaneously read data from a shared file. When thread 2 performs a write on the file, it takes the inode lock in write-exclusive mode, preventing thread 1 from performing reads or writes on the file for the duration that thread 2 holds the lock.

Figure 6: Read-shared, write-exclusive inode locking in JFS2

2.2.1 Concurrent I/O

The inode lock imposes write serialization at the file level. Serializing write accesses ensures that data inconsistencies due to overlapping writes do not occur. Serializing reads with respect to writes ensures that the application does not read stale data.

[Diagram for Figure 6 not reproduced. It shows two threads sharing a file: while one thread holds the inode lock in write-exclusive mode for a write, the other thread blocks on its reads and writes; concurrent reads proceed under the read-shared lock.]


Sophisticated database applications implement their own data serialization, usually at a finer level of granularity than the file. Such applications implement serialization mechanisms at the application level to ensure that data inconsistencies do not occur, and that stale data is not read. Consequently, they do not need the file system to implement this serialization for them. The inode lock actually hinders performance in such cases, by unnecessarily serializing non-competing data accesses.

For such applications, AIX 5L v5.2 ML01 offers the Concurrent I/O (CIO) option. Under Concurrent I/O, multiple threads can simultaneously perform reads and writes on a shared file. This option is intended primarily for relational database applications, most of which will operate under Concurrent I/O without any modification. Applications that do not enforce serialization for accesses to shared files should not use Concurrent I/O, as this could result in data corruption due to competing accesses.

Figure 7: Inode serialization under Concurrent I/O on JFS2

2.2.1.1 Concurrent I/O Usage

Concurrent I/O can be specified for a file either through the mount command (mount -o cio), or via the open() system call (by using O_CIO as the OFlag parameter). When a file system is mounted with the -o cio option, all files in the file system use Concurrent I/O by default. Just as with Direct I/O, Concurrent I/O can be restricted to a subset of the files in the file system by placing the files that use Concurrent I/O in a separate subdirectory and using namefs to mount this subdirectory over the file system. For example, if a file system somefs contains some files that prefer to use Concurrent I/O and others that do not, we can create a subdirectory, subsomefs, containing all the files that use Concurrent I/O. We can mount somefs without the -o cio option, and then mount subsomefs as a namefs file system with the -o cio option: mount -v namefs -o cio /somefs/subsomefs /somefs.

The use of Direct I/O is implicit with Concurrent I/O, and files that use Concurrent I/O automatically use the Direct I/O path. Thus, applications using Concurrent I/O are subject to the same alignment and length restrictions as Direct I/O, specified in Table 1. As with Direct I/O, if there are multiple outstanding opens to a file and one or more of the calls did not specify O_CIO, then Concurrent I/O is not enabled for the file. Once the last conflicting access is eliminated, the file begins to use Concurrent I/O. Since Concurrent I/O implicitly uses Direct I/O, it overrides the O_DIO flag for a file.

Under Concurrent I/O, the inode lock is acquired in read-shared mode for both read and write accesses. However, in situations where the contents of the inode may change for reasons other than a change to the contents of the file (writes), the inode lock is acquired in write-exclusive mode. One such situation occurs when a file is extended or truncated. Extending a file may require allocation of new disk blocks for the file, and consequently requires an update to the "table of contents" of the corresponding inode. In this case, the read-shared inode lock is upgraded to the write-exclusive mode for the duration of the extend operation. Similarly, when a file is truncated, allocated disk blocks might be freed and the inode's table of contents needs to be updated. Upon completion of the extend or truncate operation, the inode lock reverts to read-shared mode. This is a very powerful feature, since it allows files using Concurrent I/O to grow or shrink in a manner that is transparent to the application, without having to close or reopen files after a resize. Figure 7 shows the behavior of the inode lock under Concurrent I/O.

Another situation that results in the inode lock being acquired in write-exclusive mode is when an I/O request on the file violates the alignment or length restrictions of Direct I/O. Alignment violations result in normal cached I/O being used for the file, and the inode lock reverts to the read-shared, write-exclusive mode of operation depicted in Figure 6.

[Diagram for Figure 7 not reproduced. It shows two threads sharing a file under Concurrent I/O: reads and writes proceed concurrently, and a thread blocks only while the other thread extends or truncates the file.]



2.2.1.2 Performance Considerations under Concurrent I/O

Since Concurrent I/O implicitly invokes Direct I/O, all the performance considerations for Direct I/O mentioned in Section 2.1.1.2 hold for Concurrent I/O as well. Thus, applications that benefit from file system read-ahead, or have a high file system buffer cache hit rate, would probably see their performance deteriorate with Concurrent I/O, just as it would with Direct I/O. Concurrent I/O will also provide no benefit for applications in which the vast majority of data accesses are reads. In such environments, read-shared, write-exclusive inode locking will already provide most of the benefits of Concurrent I/O. Applications that use raw logical volumes for data storage don’t encounter inode lock contention since they don’t access files.

2.3 The Sync Daemon

The sync daemon (/usr/sbin/syncd) forces a write of dirty (modified) pages in the file buffer cache out to disk. By default, the sync daemon runs at 60-second intervals. On systems with large amounts of memory and large numbers of pages getting modified, this can result in high peaks of I/O activity when the sync daemon runs. Since Direct I/O bypasses the file buffer cache and directly writes data to disk, the use of Direct I/O results in a reduction in the number of dirty pages that need to be flushed by the sync daemon. The same holds true for raw logical volumes.

3 Performance Test Environment

In order to evaluate Concurrent I/O performance, we measured the throughput of an online transaction processing (OLTP) workload under different database storage configurations. We used Oracle9i Database for this study. This workload uses a client/server model, where a client system drives the database server with a mix of transactions intended to simulate a user environment. Database throughput was measured in terms of the number of transactions completed per second (tps) at the client. The client also measured the response time characteristics of the transactions executed by the database server.

We measured the performance of our workload under three different configurations for database storage:

• Raw logical volumes
• JFS2 filesystems with Direct I/O
• JFS2 filesystems with Concurrent I/O

The workload was update intensive, and consisted of a mix of transaction types on multiple tables. The system configuration used in these tests is specified in Table 3.

Table 3: System configuration

Attribute                 Value
System Type               pSeries™ 680
Number of CPUs            4
Processor Type            RS64-IV
Processor Clock Speed     600 MHz
System Memory             48 GB
O.S. Level                AIX 5L version 5.2, 64-bit kernel
Database                  Oracle9i Database release 2 v9.2.0.1 (64-bit)
Database size             500 GB
Configured Disks          >400 SSA

The OLTP workload consists of a ramp-up phase, followed by the steady-state phase, and ending with a brief ramp-down phase. On our measurement configuration, the steady state performance was reached within about five minutes of execution. The performance measurement interval lasts about thirty minutes after reaching steady state. Database performance benchmarks typically use raw logical volumes for database storage. Raw logical volumes do not suffer from the overheads described in section 2 for file systems, and usually provide the best performance for database applications. We measured the performance of raw logical volumes to serve as the performance goal to be attained when database storage is done on file systems.


Table 4: Comparison of storage configurations

Configuration          Total no. of LVs/files   Total no. of disks
Raw LV configuration   193 LVs                  420
JFS2 configuration     51 files                 420

3.1 Raw Logical Volume Configuration

The physical configuration for the OLTP run with raw logical volumes was designed to reduce I/O wait to as near to zero as possible. The objective was to maximize throughput and minimize I/O bottlenecks by spreading data across multiple logical volumes and disks. In all, 193 raw logical volumes were used in this configuration. All of the raw logical volumes (LVs) were created in volume groups defined with 32MB partitions, with the exception of the LVs for the database logs, which were defined with 128MB partitions. Previous exercises using this workload have shown that disk I/O performance is best when raw LVs are spanned (without striping) on the outer edge of the disks. For the database logs, SSA RAID-5 arrays were created consisting of 4 physical disks (pdisk) for each logical disk (hdisk). The LVs and redo log files were 20GB in size, so as to eliminate log switches during the OLTP run. Unlike the other LVs, the LVs for the redo log files were striped across the two logical hdisks, with a stripe size of 128K.

3.2 JFS2 File System Configuration

As previously mentioned, Direct I/O (DIO) eliminates the overheads associated with the file buffer cache. However, the write-exclusive inode lock continues to pose a performance bottleneck with Direct I/O. In order to minimize the effects of spinning on the inode lock, file-system-based database applications tend to maximize the number of files in their configuration in order to increase the number of concurrent writes in progress. In previous experiments involving JFS2 file systems with the Direct I/O option, we usually created as many file systems as the number of raw logical volumes used in the raw LV configuration. However, using a large number of files increases the complexity of database administration. A smaller number of files results in a more easily manageable configuration from a database administrator's perspective.

For this study, we chose to limit the number of files to a manageable number, rather than matching the number of raw logical volumes. This choice eliminated a direct one-to-one performance comparison between raw and JFS2 physical layouts and implied a potential performance loss due to the use of very large data files. In addition, we treaded on performance waters by striping the JFS2 volumes on raw devices, as opposed to the traditionally preferred method of spanning volumes on raw devices. However, as the very positive JFS2 results show, as long as the number of disks remains the same, DBAs can consolidate raw LVs into JFS2 LVs without risking loss in performance.

For the Concurrent I/O run, the filesystems were mounted with the -o cio mount option, and for the Direct I/O run, they were mounted with the -o dio option. We used a file system block size (agblksize) of 4KB. In general, the file system block size used should match the database block size, in order to satisfy Direct I/O alignment restrictions. For the measurements reported in this section, the database transaction logs (redo logs) were stored on raw logical volumes. Using file systems for the redo logs, with agblksize=512 bytes, resulted in about a 4% reduction in throughput. Using agblksize>512 bytes results in much worse performance, as this violates the Direct I/O alignment restrictions that need to be satisfied in order for Concurrent I/O to be used for the file. Database control and configuration files, which are frequently accessed but do not usually satisfy Direct I/O alignment constraints, would exhibit better performance under cached I/O rather than Direct or Concurrent I/O.

The thread scheduling policy used in our experiments was the default SCHED_OTHER policy. The OLTP workload measured in our experiments used asynchronous I/O for writing database buffers to disk. Asynchronous I/O (AIO) in JFS2 is handled by kernel processes (kprocs), called aioservers. AIO requests from the application are queued into an AIO queue.


An aioserver picks requests off the AIO queue one at a time, and is unable to process any more requests until I/O has completed for the request it is currently servicing. Thus, the number of aioservers in the system limits the number of asynchronous I/O operations that can be in progress simultaneously. The maximum number of aioservers that can be created is controlled by the maxservers attribute, which has a default value of 10 per processor. For our experiments, we used a maxservers value of 400 per processor.
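
As a hedged illustration (device name and flags as commonly used for the legacy AIO subsystem on AIX 5.2; smit aio offers the same controls), the aioserver limits can be examined and changed with:

$lsattr -El aio0 [displays the current minservers/maxservers settings]
$chdev -l aio0 -a maxservers=400 [example value; use -P to defer the change to the next boot if aio0 is busy]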

4 Performance Test Results

In this section, we compare various performance metrics for each of the three storage configurations measured. For ease of comparison, this data is presented through a series of graphs. For each graph, we present relevant observations to help explain the depicted behavior. In the graphs presented in this section, DIO stands for Direct I/O, and CIO for Concurrent I/O.

4.1 Throughput

Table 5 lists the average throughput and response times for each storage configuration, measured over the steady-state interval of each benchmark run. While the database throughput with Direct I/O was 70% lower than with raw LVs, the throughput with Concurrent I/O was only 8% lower than the raw LV case. As our graphs will corroborate, the poor Direct I/O performance can be largely attributed to severe lock contention for the inode locks. The Direct I/O performance could have been improved by using a larger number of files, such as by creating one file per disk, as this would have reduced inode lock contention.

Figure 8 plots the throughput measured at 30-second intervals over the duration of the run. The two dotted vertical lines demarcate the ramp-up, steady-state and ramp-down phases of the runs. The difference in performance between Concurrent I/O and raw LVs stems from two factors. The first is simply the additional pathlength due to I/O requests having to go through the file system path. The second factor has to do with the way asynchronous I/O is handled through a file system, as opposed to when raw LVs are used. Since the AIO server threads are kernel threads, a context switch occurs for each AIO request to a file. On the other hand, AIO requests to raw logical volumes use a "fast path" to the logical volume manager that avoids the use of AIO server threads.

Table 5: Average throughput and response times

Configuration    Avg Throughput (tps)   Avg Response Time (sec)
Raw LVs          710.19                 0.06
Direct I/O       219.69                 0.12
Concurrent I/O   652.12                 0.07

Apart from this difference in performance, the behavior of the Concurrent I/O run is remarkably similar to the raw LV run. Both exhibit very low variation in throughput once steady-state is reached. The slight variation in throughput is merely because of normal load variations in the benchmark. The Direct I/O run displays wide variations in throughput, which is symptomatic of severe lock contention.

4.2 Disk I/O

Figure 9 shows the disk I/O rates for each of our runs. The I/O rate stays fairly constant throughout the run, and mirrors the throughputs presented in Figure 8. This is expected, since the amount of I/O per transaction is constant, so a lower throughput number would also correspond to a lower I/O rate. It is interesting to note that file system journaling does not appear to increase the disk I/O rate in the file system runs. This is because of the OLTP workload behavior. Files are created and populated when the database is created. The run only uses pre-initialized files, and there is virtually no file creation, deletion, allocation or truncation during the run. Thus, there is little journaling during the runs.

The disk farm used in our experimental setup did not pose a bottleneck in any of our runs. Figure 10 plots the highest disk utilization for any disk within each 30-second measurement interval, over the duration of the benchmark run. As shown in the graph, the highest disk utilization by any disk at any point did not exceed 70% during any of our runs. CPU usage statistics showed no I/O wait times in any of our runs.

4.3 CPU Usage

CPU usage statistics are a commonly used metric for discussing system performance. We used the AIX vmstat command to collect CPU usage statistics at 30-second intervals for the entire duration of each run.


The vmstat command breaks down CPU usage into four components: %user, %system, %idle, and %wait. %user is the percentage of CPU time spent executing user-level instructions. %system is the percentage of CPU time spent executing at the system (supervisor) level. %idle is the percentage of time that the CPU is available for additional work. %wait is the percentage of time that the CPU spends waiting for I/O to complete, with no runnable processes.

Figures 11, 12 and 13 respectively show the %user, %system and %idle times for each run, as reported by vmstat. None of our benchmark runs displayed any %wait time. The high CPU utilization (%user + %system ≈ 100%) is typical of the OLTP benchmark, as there is little or no disk I/O wait. The Concurrent I/O run has a higher %system and lower %user than the raw LV run because of the additional context switches due to traversing down the file system path for I/O, and due to the use of AIO server threads for servicing AIO requests. This results in a greater amount of work being done at the system level. Even so, the OLTP workload is not system-intensive. Neither the Concurrent I/O nor the raw LV runs show any idle time, whereas the Direct I/O run shows a high percentage of idle time. This, again, is indicative of severe lock contention in the Direct I/O run.

Figure 8: Benchmark throughput over run duration. Higher tps indicates better performance. [Chart not reproduced; axes: tps vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]

Figure 9: Disk I/O throughput (in kilobytes of data transferred per second) over run duration. [Chart not reproduced; axes: KBps vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]

Figure 10: Highest disk utilization over run duration. [Chart not reproduced; axes: % disk utilization vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]

Figure 11: %user over run duration. [Chart not reproduced; axes: %user vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]


Figure 12: %system over run duration. [Chart not reproduced; axes: %system vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]

Figure 13: %idle over run duration. [Chart not reproduced; axes: %idle vs. elapsed time in minutes, with RawLV, DIO, and CIO curves.]

4.4 Lock Statistics

The study of the locking behavior exhibited by the three storage models reveals the most information about their performance characteristics. The lock statistics reported here were gathered using the AIX trace facility. The traces were gathered over a five-second interval during the steady state phase of each run.

To better understand the locking statistics presented here, a brief explanation of the AIX 5L v5.2 locking process is in order. When a thread first attempts to acquire a lock and fails, it may spin around the lock for at most maxspin number of times. The variable maxspin can be set via the AIX schedo command. For our runs, we used the default maxspin value of 16384. If the thread fails to acquire the lock after maxspin attempts, it goes into wait state and is undispatched. When the lock is released by the owning thread, it wakes up one or more of the threads waiting to acquire that lock, and the cycle repeats itself. Heavy lock contention thus manifests itself as a large number of threads going into the wait state, or blocking, while trying to acquire the lock.

Figure 14 plots the number of blocks per second for the most contended lock classes in our tests, i.e., the number of times any thread was driven into wait state while waiting for a lock during the measurement interval, divided by the measurement interval. By "lock class", we mean all the locks that fall under the same functional category. For example, the lock class "SSA" includes all the different locks used for serializing accesses to SSA devices.

Figure 14: Blocks/second for major lock classes. [Chart not reproduced; bar chart of blocks per second for the AIOQ, inode, and SSA lock classes in the CIO, DIO, and Raw runs.]

The three major lock classes observed in our traces were:

• AIOQ lock – serializes insertion and removal of AIO requests from the AIO queues.

• Inode lock – the per-file write-exclusive lock used by the file system to serialize write accesses to a file. This lock is not taken in Concurrent I/O, except under the conditions listed in Section 2.2.1.1.

• SSA lock – used for serializing accesses to an SSA device. Each device has its own lock.

As seen in Figure 14, the Direct I/O run shows excessive amounts of blocking due to the inode lock, which results in a lot of context switching. Since a large number of threads went into the wait state in the Direct I/O run, there was a significant amount of CPU idle time when there were no runnable threads, resulting in lower throughput.


The Concurrent I/O and raw LV runs do not acquire write-exclusive inode locks.

5 Conclusion

The performance experiments described in this paper show that using JFS2 Concurrent I/O for databases results in performance comparable to that achieved through the use of raw logical volumes for database storage, while providing greater flexibility and ease of administration. The throughput of the database application under Concurrent I/O was three times the throughput achieved with Direct I/O (a 200% improvement), and lagged the performance on raw logical volumes by only 8%. We have also shown that the performance equivalence between Concurrent I/O and raw logical volumes holds at a detailed level, with many system metrics showing remarkably similar behavior in both cases. Thus, Concurrent I/O combines all the performance advantages of using raw logical volumes, while greatly simplifying the task of database administration. This makes Concurrent I/O a very attractive option for database storage.

© Copyright IBM Corporation 2003

IBM Corporation Marketing Communications Server Group Route 100 Somers, New York 10589 Produced in the United States of America 05-03 All Rights Reserved This publication was developed for products and/or services offered in the United States. IBM may not offer the products, features, or services discussed in this publication in other countries. The information may be subject to change without notice. Consult your local IBM business contact for information on the products, features and services available in your area. This equipment is subject to FCC rules. It will comply with the appropriate FCC rules before final delivery to the buyer. IBM hardware products are manufactured from new parts, or new and used parts. Regardless, our warranty terms apply. All statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information concerning non-IBM products was obtained from the suppliers of these products. Questions on the capabilities of the non-IBM products should be addressed with the suppliers. All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM. IBM, the IBM logo, AIX, AIX 5L, pSeries are trademarks or registered trademarks of International Business Machines Corporation in the United States or other countries or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, and service names may be trademarks or service marks of others. The IBM home page on the Internet can be found at http://www.ibm.com The pSeries home page on the Internet can be found at http://www.ibm.com/servers/eserver/pseries/


Performance Management Guide

Tuning Direct I/O

When you are processing normal I/O to JFS files, the I/O goes from the application buffer to the VMM and from there to the JFS. The contents of the buffer could get cached in RAM through the VMM's use of real memory as a file buffer cache. If the file cache hit rate is high, then this type of cached I/O is very effective, improving the performance of JFS I/O. But applications that have poor cache hit rates or applications that do very large I/Os may not get much benefit from the use of normal cached I/O. In operating system version 4.3, direct I/O was introduced as an alternative I/O method for JFS files.

Direct I/O is only supported for program working storage (local persistent files). The main benefit of direct I/O is to reduce CPU utilization for file reads and writes by eliminating the copy from the VMM file cache to the user buffer. If the cache hit rate is low, then most read requests have to go to the disk. Writes are faster with normal cached I/O in most cases. But if a file is opened with O_SYNC or O_DSYNC (see Using sync/fsync Calls), then the writes have to go to disk. In these cases, direct I/O can benefit applications because the data copy is eliminated.

Another benefit is that direct I/O allows applications to avoid diluting the effectiveness of caching for other files. When a file is read or written, it competes for space in the file cache, which can push other file data out of the cache. If an application developer knows that certain files have poor cache-utilization characteristics, then only those files could be opened with O_DIRECT.
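As an illustration of that last point, the following sketch opens a single, hypothetical file with O_DIRECT while all other files remain in normal cached mode. The path, the 4 KB alignment, and the 128 KB request size are illustrative assumptions, not values from this guide.

/* Sketch: open one file with poor cache characteristics in direct I/O
 * mode, leaving every other file in normal cached mode.  The path and
 * the 4 KB alignment are assumptions; query finfo() or ffinfo() for the
 * real alignment requirements of the file system in use. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/data01/bigscan.dat";   /* hypothetical data file */

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open O_DIRECT");
        return 1;
    }

    /* Direct I/O buffers must be suitably aligned; posix_memalign() is
     * used here (older systems may need valloc() instead). */
    void *buf;
    if (posix_memalign(&buf, 4096, 128 * 1024) != 0) {
        close(fd);
        return 1;
    }

    ssize_t nread = read(fd, buf, 128 * 1024);  /* one large, aligned read */
    printf("read %ld bytes\n", (long)nread);

    free(buf);
    close(fd);
    return 0;
}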

For direct I/O to work efficiently, the I/O request should be appropriate for the type of file system being used. The finfo() and ffinfo() subroutines can be used to query the offset, length, and address alignment requirements for fixed block size file systems, fragmented file systems, and bigfile file systems (direct I/O is not supported on compressed file systems). The queried information is contained in the diocapbuf structure, as described in /usr/include/sys/finfo.h.

To avoid consistency issues, if there are multiple calls to open a file and one or more of the calls did not specify O_DIRECT while another open did, the file stays in the normal cached I/O mode. Similarly, if the file is mapped into memory through the shmat() or mmap() system calls, it stays in normal cached mode. If the last conflicting, non-direct access is eliminated (through the close(), munmap(), or shmdt() subroutines), the JFS moves the file into direct I/O mode. Changing from normal mode to direct I/O mode can be expensive, because all modified pages in memory have to be flushed to disk at that point.

Direct I/O does not match raw I/O performance: raw I/O is still slightly faster than direct I/O. But if you want the benefits of a JFS together with improved performance, direct I/O is highly recommended.

Performance of Direct I/O Reads

Even though the use of direct I/O can reduce CPU usage, it typically results in longer elapsed times, especially for small I/O requests, because the requests would not be cached in memory.

Direct I/O reads cause synchronous reads from the disk, whereas with normal cached policy, the reads may be satisfied from the cache. This can result in poor performance if the data was likely to be in memory under the normal caching policy. Direct I/O also bypasses the VMM read-ahead algorithm because the I/Os do not go through the VMM. The read-ahead algorithm is very useful for sequential access to files because the VMM can initiate disk requests and have the pages already be resident in memory before the application has requested the pages. Applications can compensate for the loss of this read-ahead by using one of the following methods:

● Issuing larger read requests (minimum of 128 KB)
● Issuing asynchronous direct I/O read-ahead through the use of multiple threads
● Using the asynchronous I/O facilities, such as aio_read() or lio_listio() (see the sketch after this list)
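A minimal sketch of the asynchronous approach follows, using the aio_read() interface named above. The file name, 256 KB chunk size, 4 KB alignment, and queue depth of four are assumptions; on AIX the asynchronous I/O subsystem must also be enabled (for example, through SMIT) before these calls will succeed.

/* Sketch: application-driven read-ahead with POSIX aio_read(), to
 * compensate for the VMM read-ahead that direct I/O bypasses.
 * Sizes and queue depth are assumptions, not values from this guide. */
#include <aio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RA_DEPTH  4                 /* requests kept in flight          */
#define RA_CHUNK  (256 * 1024)      /* large reads, well above 128 KB   */

int main(void)
{
    int fd = open("/data01/bigscan.dat", O_RDONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    struct aiocb cb[RA_DEPTH];
    memset(cb, 0, sizeof(cb));

    /* Queue RA_DEPTH aligned reads at consecutive offsets. */
    for (int i = 0; i < RA_DEPTH; i++) {
        void *buf;
        if (posix_memalign(&buf, 4096, RA_CHUNK) != 0)
            return 1;
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf;
        cb[i].aio_nbytes = RA_CHUNK;
        cb[i].aio_offset = (off_t)i * RA_CHUNK;
        if (aio_read(&cb[i]) != 0)
            return 1;
    }

    /* Consume the data as each request completes; a real application
     * would re-queue the slot for the next chunk (omitted for brevity). */
    for (int i = 0; i < RA_DEPTH; i++) {
        const struct aiocb *list[1] = { &cb[i] };
        aio_suspend(list, 1, NULL);          /* wait for completion      */
        if (aio_error(&cb[i]) == 0) {
            ssize_t n = aio_return(&cb[i]);  /* bytes actually read      */
            (void)n;                         /* process the chunk here   */
        }
        free((void *)cb[i].aio_buf);
    }

    close(fd);
    return 0;
}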

Performance of Direct I/O Writes

Direct I/O writes bypass the VMM and go directly to the disk, so that there can be a significant performance penalty; in normal cached I/O, the writes can go to memory and then be flushed to disk later (write-behind). Because direct I/O writes do not get copied into memory, when a sync operation is performed, it will not have to flush these pages to disk, thus reducing the amount of work the syncd daemon has to perform.

Performance Example

In the following example, performance is measured on an RS/6000 E30, 233 MHz, with AIX 4.3.1 (your performance will vary, depending on system configuration). KBPS is the throughput in kilobytes per second, and %CPU is CPU usage in percent.

# of 2.2 GB SSA Disks      1       2       4       6       8
# of PCI SSA Adapters      1       1       1       1       1

Sequential read throughput, using normal I/O
  KBPS                  7108   14170   18725   18519   17892
  %CPU                  23.9    56.1    92.1    97.0    98.3

Sequential read throughput, using direct I/O
  KBPS                  7098   14150   22035   27588   30062
  %CPU                   4.4     9.1    22.0    39.2    54.4

Sequential read throughput, using raw I/O
  KBPS                  7258   14499   28504   30946   32165
  %CPU                   1.6     3.2    10.0    20.9    24.5

Summary

Direct I/O requires substantially fewer CPU cycles than regular I/O. I/O-intensive applications that do not get much benefit from the caching provided by regular I/O can enhance performance by using direct I/O. The benefits of direct I/O will grow in the future as increases in CPU speeds continue to outpace increases in memory speeds.

What types of programs are good candidates for direct I/O? Programs that are typically CPU-limited and perform lots of disk I/O. "Technical" codes that have large sequential I/Os are good candidates. Applications that do numerous small I/Os will typically see less performance benefit, because direct I/O cannot do read ahead or write behind. Applications that have benefited from striping are also good candidates.


Oracle 9i init.ora Guideline (Bruce Spencer)

INITIALIZATION PARAMETER            RECOMMENDED VALUE        DEFAULT VALUE

_always_anti_join <DEFAULT> CHOOSE

_always_semi_join <DEFAULT> CHOOSE

_b_tree_bitmap_plans <DEFAULT> TRUE

_complex_view_merging <DEFAULT> TRUE

_fast_full_scan_enabled FALSE TRUE

_index_join_enabled <DEFAULT> TRUE

_like_with_bind_as_equality TRUE FALSE

_new_initial_join_orders <DEFAULT> TRUE

_optimizer_mode_force <DEFAULT> FALSE

_optimizer_undo_changes <DEFAULT> FALSE

_or_expand_nvl_predicate <DEFAULT> TRUE

_ordered_nested_loop <DEFAULT> TRUE

_push_join_predicate <DEFAULT> TRUE

_push_join_union_view <DEFAULT> TRUE

_shared_pool_reserved_min_alloc 4100 5000

_sort_elimination_cost_ratio 5 0

_sortmerge_inequality_join_off <DEFAULT> FALSE

_sqlexec_progression_cost 0 1000

_system_trig_enabled <DEFAULT> TRUE

_table_scan_cost_plus_one TRUE FALSE

_trace_files_public TRUE FALSE

_unnest_subquery <DEFAULT> TRUE

_use_column_stats_for_function <DEFAULT> TRUE

aq_tm_processes 1 0

compatible 9.2.0 none

cursor_sharing <DEFAULT> EXACT

cursor_space_for_time <DEFAULT> FALSE

db_block_checking <DEFAULT> FALSE

db_block_checksum <DEFAULT> TRUE

db_block_buffers 300+ MB 48 MB

db_block_size 8192 2048

db_file_multiblock_read_count 8 8

db_files max no. of datafiles 200

dml_locks 10000 or more 4 x transactions

enqueue_resources 32000 or more derived

hash_area_size <DEFAULT> 2 x sort_area_size

java_pool_size 150000000 20000

job_queue_interval <DEFAULT> 60

job_queue_processes 2 0


log_buffer 10485760 524288

log_checkpoint_interval 999999999 os dependent

log_checkpoint_timeout 999999999 900

log_checkpoints_to_alert TRUE FALSE

max_dump_file_size <DEFAULT> UNLIMITED

max_enabled_roles 100 20

nls_comp <DEFAULT> BINARY

nls_date_format DD-MON-RR derived

nls_language AMERICAN derived

nls_length_semantics <DEFAULT> BYTE

nls_numeric_characters ".," derived

nls_sort BINARY derived

nls_territory AMERICA os dependent

o7_dictionary_accessibility TRUE FALSE

open_cursors 500 50

optimizer_features_enable 9.2.0 none

optimizer_index_caching <DEFAULT> 0

optimizer_index_cost_adj <DEFAULT> 100

optimizer_max_permutations <DEFAULT> 2000

optimizer_mode <DEFAULT> CHOOSE

optimizer_percent_parallel <DEFAULT> 0

parallel_max_servers 2 x cpu_count derived

parallel_min_percent <DEFAULT> 0

parallel_min_servers <DEFAULT> 0

parallel_threads_per_cpu <DEFAULT> 2

pga_aggregate_target 500M 0

processes max no. of users derived

query_rewrite_enabled TRUE FALSE

row_locking <DEFAULT> ALWAYS

session_cached_cursors 200 0

sessions 2 x processes derived

shared_pool_reserved_size 10% shared_pool 5% shared_pool

shared_pool_size 314572800 or more 16 or 64 MB

sort_area_size <DEFAULT> 65536

sql_trace <DEFAULT> FALSE

timed_statistics TRUE FALSE

undo_management AUTO MANUAL

undo_retention 1800 900

undo_suppress_errors <DEFAULT> FALSE

undo_tablespace Undo tablespace name first available

workarea_size_policy AUTO derived
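As a partial sketch of how a few of the recommended values above translate into an actual init.ora parameter file (the file name is an assumption; parameters marked <DEFAULT> are simply omitted so the built-in default applies):

# Excerpt of a hypothetical initORCL.ora built from the table above
compatible                     = 9.2.0
db_block_size                  = 8192
db_file_multiblock_read_count  = 8
open_cursors                   = 500
pga_aggregate_target           = 500M
session_cached_cursors         = 200
shared_pool_size               = 314572800
timed_statistics               = TRUE
undo_management                = AUTO
undo_retention                 = 1800
workarea_size_policy           = AUTO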


Parameter Name    Development/Test Instance    11 - 100 Users    101 - 500 Users    501 - 1,000 Users    1,001 - 2,000 Users (*)

processes 200 200 800 1200 2500

sessions 400 400 1600 2400 5000

db_block_buffers 20000 50000 150000 250000 400000

undo_retention 1800 3600 7200 10800 14400

shared_pool_size 300M 600M 800M 1000M 2000M

shared_pool_reserved_size 30M 60M 80M 100M 100M

pga_aggregate_target 500M 1000M 2000M 4000M 8000M

Total Memory Required (**) ~ 1 GB ~ 2 GB ~ 4 GB ~ 8 GB ~ 14 GB

The range of user counts provided above refers to active users, not total or named users.

Note (*): For instances supporting a minimum of 1,000 Oracle users, you should use the Oracle 64-bit Server for your platform in order to support large SGAs.

Note (**): The total memory required refers to the amount of memory required for the data server instance and associated memory, including the SGA and the PGA. You should ensure that your system has sufficient available memory in order to support the values provided above. The values provided above should be adjusted based on available memory so as to prevent swapping and paging.
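As a rough cross-check of these totals (an illustrative calculation, assuming the 8 KB db_block_size recommended earlier): for the 101 - 500 user column, 150000 db_block_buffers x 8 KB is roughly 1.2 GB of buffer cache; adding the 800 MB shared pool and the 2000 MB pga_aggregate_target gives approximately the "~ 4 GB" shown, with the remaining pools (for example, java_pool_size and log_buffer) adding a further hundred megabytes or so.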