Running Oracle Database 10g or 11g on
an HP-UX ccNUMA-based server
Updated for Oracle 11gR2 and HP-UX 11i v3
Technical white paper
Table of contents
Executive summary
ccNUMA: an introduction
   ccNUMA infrastructure
   HP-UX 11i v3 and “LORA” mode
   Virtualization, CLM, and “LORA” mode
Oracle's NUMA optimizations
   Enabling or disabling the Oracle NUMA optimizations
   Determining the ccNUMA configuration
   How the optimizations work
   Configuring the server and Oracle to match one another
   A clear case for NUMA optimization: small vPar in large nPar
   Dynamic reconfiguration considerations with ccNUMA optimizations
Summary
Appendix: Oracle versions, default NUMA enablement
For more information
Executive summary
Many HP servers are based on a ccNUMA (cache-coherent Non-Uniform Memory Access)
architecture; these servers provide optional features which allow programmers to optimize application
performance by tailoring their application to the non-uniform nature of the underlying host server.
Depending on the application and workload, taking advantage of these features to reduce
higher-latency memory references can yield modest performance gains.
Starting with release 10g, the Oracle® database includes such NUMA-optimization features; these
can be enabled or disabled at database startup time, and they are enabled by default on some
Oracle versions and disabled on others. To be effective, these optimizations require that the server's
memory is allocated mainly to the individual locality domains (cell-local memory or socket-local
memory) rather than to the system as a whole (interleaved memory); newer HP ccNUMA servers
default to such a configuration, but older ccNUMA servers do not. It is very important for performance
reasons that the setting of Oracle's NUMA optimizations matches the configuration of the server:
never rely on the default settings and configurations to produce an optimal result.
Furthermore, dynamic resource reconfiguration will have significant consequences to any running
NUMA-optimized Oracle Database instances. Because the optimizations are configured by Oracle
only at the time the instance starts up, subsequent changes to the structure (number of locality
domains) of the underlying host will, at best, result in the instance being optimized for the wrong
structure (which will incur sub-optimal performance); at worst, they may cause the database instance
to fail.
Thus, if dynamic reconfiguration is to be undertaken, either:
– disable Oracle's NUMA optimizations, or
– manage any dynamic reconfiguration very carefully, following the recommendations spelled out
in a companion white paper, “Dynamic server resource allocation with Oracle Database 10g or
11g on an HP-UX ccNUMA-based server”
Table 1. Summary of rules governing Oracle NUMA optimization with dynamic system reconfiguration

Oracle NUMA optimization setting                                           | Dynamic Server Reconfiguration                                                  | No Dynamic Server Reconfiguration
disabled or off (ensure adequate interleaved memory for Oracle SGA)        | OK                                                                              | OK
enabled (ensure adequate cell-local memory in each domain for Oracle SGA)* | OK with restrictions (see companion white paper) but generally not recommended. | OK

* (default on older HP servers is: 100% ILM, 0% CLM)
Applicable servers: All ccNUMA-based HP-UX servers (all cell-based servers and all servers based on
the Intel® Itanium® 9300 processor) which are running Oracle Database 10g or later.
Target audience: Oracle DBAs and those who administer ccNUMA HP-UX systems which will host
Oracle databases.
Support: This whitepaper makes recommendations based on HP's best knowledge of the stated
configuration options at this time (June 2010). We do not and cannot imply Oracle support for any of
these configuration options – statements of Oracle support can only be made by Oracle.
Figure 1. Traditional Uniform-Memory-Access (UMA) design. All processors, and all processes (P1, P2, etc.), have equal latency
to their objects in any part of memory. The server is scaled up by connecting more processors to the bus (which eventually
causes the bus to be a bottleneck and reduce scalability).
ccNUMA: an introduction
Cache-coherent non-uniform memory access (ccNUMA) is an architectural technique which has been
used successfully to build large, multi-processor servers in a way that avoids the bus bottlenecks which
characterize traditional uniform server design (see Figure 1). HP's first generation of ccNUMA servers
are cell-based: they are comprised of multiple smaller cells of CPUs, memory, and I/O cards linked
together by at least one low-latency cross-bar switch. These individual cells can be operated
independently or in groups of two or more cells; each such operating unit is known as an nPar or
hardware partition. When multiple cells are incorporated into the same hardware partition, processes
running in any cell can access memory owned by any other cell (Figure 2). Accessing memory from a
remote cell results in slightly greater latency than an access to memory in a process's own cell. Since
memory-access time is a component of CPU-busy time, an increase in memory latency equals an
increase in CPU-busy time.
Table 2. HP servers based on ccNUMA architecture

Cell-based ccNUMA servers: rp7410, rp7420, rp7440, rx7620, rx7640, rp8400, rp8420, rp8440,
rx8620, rx8640, HP 9000 Superdome, HP Integrity Superdome

Itanium 9300-based ccNUMA servers: rx2800 i2, BL860c i2, BL870c i2, BL890c i2, Superdome 2
In 2010, HP introduced new Integrity servers based on the latest Intel Itanium processor (the four-
core/eight-hyper-thread Itanium 9300 series). Like their predecessors, these servers are modular in
nature: the blade-based two-socket BL860c i2 can be used by itself, doubled to form a four-socket
server (BL870c i2), or quadrupled to form an eight-socket server (BL890c i2). A non-modular rack-
mounted version of the two-socket server (rx2800 i2) is also available. The Itanium 9300 is designed
so that each processor socket has its own memory; all memory in these servers is allocated (more or
less) evenly among the sockets. In a two-socket server, a process will have faster access time to
objects in the memory on the same socket as the core where the process is running than it would to
objects in the memory of the other socket. When multiple two-socket blades are lashed together to
form larger servers, in addition to the latency differential between the sockets in the same blade, there
are further increases in latency when accessing memory objects on the other blades which comprise
the server. So while previous ccNUMA servers featured locality domains based on the cell concept,
these newest ccNUMA servers' locality domains are based on their socket layout.
ccNUMA-based servers provide excellent performance and scalability without requiring that
workloads be adapted in any way for the ccNUMA architecture. However, certain application
workloads may realize small or even modest performance gains if they are made aware of the
server‟s hierarchical memory layout and are able to adapt themselves to minimize interactions
between the locality domains (the cells or sockets) that comprise the server. By restricting a process to
a certain domain and assigning its memory objects to the memory of that same domain, a process
will minimize memory latency. Processes which do large amounts of data manipulation in memory are
thus candidates for ccNUMA optimization.
Figure 2. Non-Uniform Memory Access. Small UMA servers (cells) are connected together through one or more high-speed
switches to form a larger server. Any process has access to any region of memory, but latency times are lower if the object is
closer to the accessing process. In this example, P1's access to its object is faster than P2's, which is faster than P3's.
ccNUMA infrastructure
While the memory in a ccNUMA server is organized in a hierarchical manner, applications are
traditionally designed with the assumption that all parts of memory are equal. Thus, some or all
memory in a ccNUMA server can be designated as “interleaved memory” (ILM), which is designed to
emulate, as much as possible, a uniform architecture. Interleaved memory is managed as a single unit
across all locality domains in the server or partition; an object placed in ILM will be striped across all
locality domains, and access times, on average, will be the same regardless of where (in which
locality domain) the accessing process is located.
The alternative to configuring memory as ILM is to configure it in its native ccNUMA state, which is
called Cell-Local Memory (CLM), or sometimes Socket-Local Memory (SLM). CLM/SLM is configurable
on a per-cell/per-socket basis; an object placed in CLM (SLM) in a cell or socket will be completely
contained within that cell or socket.
In general, one would only place objects in CLM if the applications which access those objects have
been optimized for ccNUMA (i.e., if the application's processes can be localized to a single locality
domain); for non-optimized applications, ILM would usually be the best choice.
The memory in HP ccNUMA servers is configurable as ILM or CLM; this configuration is set at power-
up and is fixed while the server is running. The server management interface can be used to view and
modify the ILM/CLM configuration, with any changes taking effect at the next reboot. The default
configuration depends on the particular server model: for Itanium 9300-based servers, the default is
87.5% CLM, but for all other servers, the default is 100% ILM. It is important to know the ILM/CLM
configuration of your server.
A note about terminology: the earliest ccNUMA servers were all cell-based, so it's not unusual to refer
to the individual locality domains as “cells”, which is why we have Cell-Local Memory as opposed to
“Domain-Local Memory” (though the term “Socket-Local Memory” is sometimes used). Today, we use
the term “locality domain” (or LDOM, or sometimes just locality or domain) to mean cells or sockets –
any set of resources which have equal latency to a region of memory. Furthermore, we might refer
specifically to a locality domain by its number: LDOM 0 through LDOM n.
Figure 3. Interleaved Memory (ILM) is spread across the Locality Domains (LDOMs); Cell-Local Memory (CLM) is contained
within each LDOM. An object placed in ILM will thus be striped across the LDOMs; a process accessing it will find some
accesses to be local (same LDOM), others remote. An object placed in CLM will be local for all processes in that same LDOM
(but all accesses from remote LDOMs will have higher latency). In this example, P1 is accessing an object in ILM (with varying
access times since one part of the object is in the same LDOM, but most of it is in remote LDOMs) and an object in the CLM in
its own LDOM (where all accesses are local). When P2 accesses the object in LDOM 0, all accesses are remote.
HP-UX 11i v3 and “LORA” mode
HP-UX 11i v3 has been enhanced to provide tools and features which facilitate application
optimization for the ccNUMA architecture. The most evident of these is a new system variable,
“LORA_MODE”, which was introduced with HP-UX 11i v3 update 3 (September 2008). The term
LORA comes from Locality-Optimized Resource Allocation, the science of optimizing for ccNUMA.
LORA_MODE is intended to indicate whether the server is configured adequately to permit locality
optimization (ccNUMA optimization) to be effective. A LORA_MODE value of 1 (one) indicates a
sufficient configuration; currently, HP-UX will set this variable to 1 if 86-90% of the server's memory is
configured as CLM, and if this CLM is evenly distributed across the locality domains.
LORA_MODE cannot be changed directly, but a kernel tunable parameter (numa_mode) can be used
to control LORA_MODE. The kernel‟s default value for numa_mode is 0 (zero), which allows HP-UX to
set LORA_MODE based solely on the hardware configuration. Setting numa_mode to 1 (one) will
force LORA_MODE to 1, while setting numa_mode to -1 (minus one) will force LORA_MODE to zero.
If numa_mode is set to: | Then LORA_MODE will be:
0 (default)             | Set by HP-UX to 1 if adequate CLM, 0 otherwise
1                       | 1
-1                      | 0
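The mapping above can be condensed into a small shell function. This is a hypothetical illustration, not an HP-UX interface; the 86-90% CLM figure is the threshold cited above, and the even-distribution check is omitted for brevity.

```shell
# Hypothetical sketch of how numa_mode and the CLM percentage
# determine LORA_MODE, per the table above. Not an HP-UX interface.
lora_mode() {
  numa_mode=$1    # kernel tunable: -1, 0 (default), or 1
  clm_pct=$2      # percent of memory configured as CLM
  case "$numa_mode" in
    1)  echo 1 ;;                       # force LORA_MODE to 1
    -1) echo 0 ;;                       # force LORA_MODE to 0
    0)  # autodetect: adequate CLM is 86-90% of memory
        if [ "$clm_pct" -ge 86 ] && [ "$clm_pct" -le 90 ]; then
          echo 1
        else
          echo 0
        fi ;;
  esac
}

lora_mode 0 87    # default tunable on an 87.5%-CLM server
lora_mode 0 0     # default tunable on a 100%-ILM server
lora_mode -1 87   # forced off regardless of CLM
```

On a real system, of course, the tunable is read and set with kctune and the result observed with getconf, as shown in the surrounding text.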
The purpose of LORA_MODE is to provide a way of indicating to any optimized applications that the
server has sufficient CLM to support those optimizations. (An application that is optimized for
ccNUMA requires sufficient CLM in which to place its memory objects; without enough CLM, the
optimizations would have no – or perhaps even a detrimental – effect on performance.)
To determine the value of LORA_MODE, use the HP-UX “getconf” command:
getconf LORA_MODE
As will be discussed in more detail below, Oracle's most recent database version, 11gR2, checks
LORA_MODE before engaging its optimizations.
An additional HP-UX 11i v3 parameter which affects ccNUMA behavior is numa_policy, which
governs the allocation of memory (in CLM or ILM). The valid settings of numa_policy are:
0: (default): autosense the right policy based on the (LORA) mode in which HP-UX is
operating
1: (default in LORA mode 1): honor requests from the application, otherwise place the object
in the CLM of the domain where the process is most likely to run
2: override requests; always allocate in ILM if possible
3: honor requests by the application, otherwise allocate in the CLM of the closest domain,
except for text/library objects, which will be placed in ILM if possible
4: (default for LORA mode 0): allocate “non-LORA-intelligently” (i.e., favor ILM for shared
objects and CLM for private objects, but honor requests made by the application)
In short, if the application specifies the location of a newly created object, that location will generally
be used unless numa_policy=2, or unless there's not enough CLM or ILM available, in which case the
object will be placed in the next-best location. If the application does not specify the location of the
object, it will default according to the setting of numa_policy.
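The decision just summarized can be sketched as follows. This is a loose, hypothetical illustration only – the real HP-UX allocator also weighs available ILM/CLM, object type (shared vs. private), and the per-policy defaults described above.

```shell
# Loose sketch of the numa_policy placement summary above.
# Hypothetical helper; not the actual kernel logic.
place_object() {
  policy=$1      # numa_policy value (0-4)
  requested=$2   # application's request: "ILM", "CLM", or "" (none)
  if [ "$policy" -eq 2 ]; then
    echo ILM                 # policy 2 overrides all requests
  elif [ -n "$requested" ]; then
    echo "$requested"        # otherwise honor the application
  else
    echo "policy-default"    # no request: fall back to the policy default
  fi
}
```

For example, `place_object 2 CLM` yields ILM (the request is overridden), while `place_object 1 CLM` honors the request.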
The HP-UX command kctune can be used to determine the current setting of either numa_mode or
numa_policy:
kctune numa_mode
kctune numa_policy
Virtualization, CLM, and “LORA” mode
Most virtualized servers take on the ccNUMA characteristics of their constituent components, and HP-
UX, and any applications running on these servers, will act accordingly. An nPar or vPar which is
comprised of multiple locality domains will exhibit the same ccNUMA behavior as a physical server
constructed from the same hardware components; HP-UX 11i v3 will apply the same LORA_MODE
tests, and applications will derive the same benefit from NUMA optimization.
Just as it‟s important to properly configure a physical server with sufficient CLM for application
optimizations to be effective, it is important to properly configure a multi-domain vPar or nPar which
will host an optimized application.
To verify the amount of cell-local and interleaved memory configured in an nPar, use the HP-UX
“parstatus” command as follows:
parstatus -w (to determine the nPartition-number of the system)
parstatus -V -p nPartition-number
When configuring a vPar, be aware that some of the memory you specify will be used for vPar
management overhead, so be sure to configure your vPar with more memory than you require,
keeping in mind that LORA_MODE will be set to 1 only if HP-UX sees that 86-90% of the vPar's
memory is configured as CLM. When you start the vPar for the first time, verify your CLM/ILM
amounts, and (for HP-UX 11i v3) the state of the LORA_MODE variable (using “getconf” as described
above). The HP-UX “machinfo” command will tell you how much total memory and interleaved
memory HP-UX sees:
machinfo -m
PSETs will also affect the ccNUMA characteristics seen by any applications running within them. A
PSET that is wholly within a single locality domain will cause any applications running within it to
behave as if they were running on a non-ccNUMA server. Likewise, if an nPar or vPar does not
span multiple domains, then it will be considered a non-ccNUMA server, even if that partition is part
of a larger server that itself comprises multiple domains.
HP Integrity Virtual Machines are the lone exception: VM guests will always appear as a single
locality domain, regardless of the layout of the underlying physical infrastructure.
One last fact about CLM and ILM with vPars: while CLM for each locality in the vPar is obviously
contained within those localities, ILM will be interleaved across ALL the localities that constitute the
server (or nPar) of which the vPar is a part. An example will help illustrate this fact: imagine an nPar
of eight cells that contains a two-cell vPar (see Figure 4). You might expect that the ILM for this vPar
would only span cells 0 and 1, but actually, it will span all eight cells – so when a process in our vPar
needs to access an ILM-based object, it will wind up accessing memory in cells which are not even
part of our vPar. (And as you can see, some of the memory in cells 0 and 1 is allocated to ILM that
will be available to objects belonging to other vPars, which means that we will have to share some of
our memory bandwidth with processes from outside our own vPar.)
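A quick back-of-the-envelope calculation makes the point, assuming (as described above) that ILM is striped evenly across all cells of the nPar:

```shell
# Eight-cell nPar hosting a two-cell vPar: how many of the vPar's
# ILM accesses land outside the vPar? (Assumes an even stripe.)
npar_cells=8
vpar_cells=2
remote_pct=$(( (npar_cells - vpar_cells) * 100 / npar_cells ))
echo "${remote_pct}% of the vPar's ILM accesses leave the vPar"
```

Three quarters of the vPar's ILM traffic crosses into cells it does not own – which is exactly why the next paragraph matters.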
This last fact may surprise you – if you thought that by setting up a vPar, your computing activity
would be completely isolated within the vPar, you've just learned that you were wrong – activity on
every vPar DOES affect performance on all vPars because of this ILM characteristic. The performance
advantages of using NUMA optimization to reduce the use of ILM (or, at the very least, of forcing
objects to be placed in CLM instead of ILM) should thus be very clear.
We will discuss this issue and how Oracle is affected once we've discussed Oracle's NUMA
optimizations in general.
Figure 4. An example of an eight-cell nPar (or server) with a vPar consisting of cells 0 and 1. Note that while the vPar's CLM is
contained completely within cells 0 and 1, the vPar's ILM is a portion of the overall ILM allocated to the entire nPar/server – it is
interleaved across all the cells.
Oracle's NUMA optimizations
To exploit the characteristics of ccNUMA to improve performance, Oracle Database (version 10g and
later) has been enhanced to accommodate Oracle processes and their corresponding memory objects
within the same domains. Such enhanced behaviors are known as Oracle's NUMA optimizations
(deliberately without the “cc”), and are engaged only on multi-domain servers and only when they are
enabled via the appropriate Oracle initialization parameter. Each time a NUMA-optimized Oracle
database instance is started, it will modify its configuration in order to best conform to the ccNUMA
characteristics of the host under which it is running.
Enabling or disabling the Oracle NUMA optimizations
Oracle's NUMA optimizations are controlled through an initialization parameter which is only
checked at instance startup time. The parameter can be changed at any time but it will have no effect
until the next time the database instance is started.
The name of this parameter depends on the version of Oracle, and the default state of the parameter
depends on the specific release and patch level (which is why HP recommends that this parameter
always be explicitly set!). (The specific default behavior of each Oracle version is included as an
appendix to this paper.)
Oracle versions 10gR2 and 11gR1 use a hidden (so-called “underbar”) parameter to control the
ccNUMA optimizations: _enable_NUMA_optimization. Oracle typically does not support these
“underbar” parameters and this one is no exception; however, certain Oracle documentation does
refer to changing this parameter in order to enable or disable the optimizations.
Note
The use of “underbar” parameters to control certain Oracle behaviors is
unsupported by Oracle. Furthermore, since such parameters are
unsupported and undocumented, they may change from one release of
Oracle to the next. We therefore recommend that you consult your Oracle
support resources before making any such changes to a supported Oracle
database instance.
Oracle version 11gR2 uses a different “underbar” parameter: _enable_NUMA_support. This
parameter is FULLY supported by Oracle (even though it starts with an underbar!).
NOTE: when upgrading to 11gR2 from a previous release of Oracle, make sure to remove any
references to the old parameter, _enable_NUMA_optimization. This parameter is unfortunately not
ignored by Oracle 11gR2 – it still affects some aspects of the optimizations and should thus be
avoided.
When a NUMA-optimized Oracle 11gR2 instance is started up, a message indicating the use of
NUMA optimizations (“NUMA system found and support enabled”) is displayed on the console and
in the instance's alert log. (No such indication is given for older Oracle versions.)
A complete description of the particular NUMA behaviors associated with each version of Oracle
Database is included as Table 3.
For Oracle Database version 10.2.0.4, the optimizations are not implemented correctly and should
never be used: _enable_NUMA_optimization should always be set to false (and the server should
thus be configured ILM-heavy) for Oracle 10.2.0.4.
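For example, the parameter can be pinned explicitly in the spfile from SQL*Plus. This is a sketch only – the double quotes are required because the parameter name begins with an underscore, and (as the note above says) you should consult your Oracle support resources before changing any “underbar” parameter:

```shell
# Sketch: explicitly disable the optimizations on 10gR2/11gR1.
# Takes effect at the next instance startup, not immediately.
# Quoting the name is required for parameters beginning with "_".
sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET "_enable_NUMA_optimization" = FALSE SCOPE = SPFILE;
EOF
```

Instances using a traditional pfile can instead carry the equivalent `_enable_NUMA_optimization = FALSE` line directly in the init.ora file.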
Table 3. Oracle NUMA behavior. Binding of “none” indicates that Oracle does not restrict the location of the
process(es). Parallel Query Slaves are created and bound one per LDOM in a round-robin (RR) fashion when in NUMA mode.
With NUMA optimizations off, the location of the SGA (including the DB cache) is not specified by Oracle, so it defaults to the
location implied by the HP-UX parameter numa_policy. In addition, the optimizations will not be engaged if Oracle detects only
one locality domain (cell or socket) when the database is started. See “HP-UX 11i v3 and “LORA” mode”, above, for definition
and default behavior of numa_policy.

NUMA parameter ¹ | LORA_MODE    | optimizations are | db_cache location | fixed_SGA location | # dbwr    | dbwr binding | parallel query binding | lgwr, log buffer binding | other binding
_eNo=false       | 0, 1, or n/a | disabled          | default ²         | default ²          | lcpu ³ /8 | none         | none                   | none                     | none
_eNo=true *      | 0, 1, or n/a | engaged           | CLM (even)        | default ²          | # LDOMs   | each LDOM    | LDOM RR                | LDOM 1                   | LDOM 0
_eNs=false       | 0 or 1       | disabled          | default ²         | default ²          | lcpu ³ /8 | none         | none                   | none                     | none
_eNs=true        | 0            | unengaged         | default ²         | default ²          | # LDOMs   | none         | none                   | none                     | none
_eNs=true        | 1            | engaged           | CLM (even)        | default ²          | # LDOMs   | each LDOM    | LDOM RR                | LDOM 1                   | LDOM 0

¹ _eNo = _enable_NUMA_optimization (10gR2 and 11gR1); _eNs = _enable_NUMA_support (11gR2)
² depends on numa_policy: either ILM, or CLM of a single LDOM
³ lcpu = logical CPU count (= # cores, times 2 if hyperthreading enabled)
* 10.2.0.4 only: optimizations are broken. _eNo should ALWAYS be set to false
Determining the ccNUMA configuration
At instance startup, if the initialization parameter governing Oracle's NUMA optimizations has been
enabled, Oracle performs a check to determine the ccNUMA state of the host server (which could be
a physical server, nPar, vPar, Integrity Virtual Machine, or other partition). The nature of this check
depends on the release of Oracle:
– For Oracle 11gR2, the server's LORA_MODE variable is checked – if it is zero, the
optimizations will be disabled for that execution of the instance. If LORA_MODE is one,
Oracle still has other checks to perform:
– For all Oracle versions, Oracle counts the number of active locality domains (i.e., domains
containing one or more CPUs) present on the host server. If Oracle detects only one locality
domain, NUMA optimizations will be disabled for that execution of the instance; if multiple
domains are detected, the optimizations will be enabled.
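The two checks above can be condensed into a small sketch. This is hypothetical logic mirroring the description – it is not Oracle's actual code:

```shell
# Hypothetical condensation of Oracle's startup checks described above.
# "engaged"/"disabled" applies to this execution of the instance only.
numa_engaged() {
  version=$1     # e.g. 10gR2, 11gR1, 11gR2
  lora_mode=$2   # value of LORA_MODE (checked only by 11gR2)
  ldoms=$3       # active locality domains seen at startup
  if [ "$version" = "11gR2" ] && [ "$lora_mode" -eq 0 ]; then
    echo disabled; return
  fi
  if [ "$ldoms" -le 1 ]; then echo disabled; else echo engaged; fi
}
```

Note the asymmetry the sketch captures: 10gR2/11gR1 would report "engaged" on a multi-domain host even with LORA_MODE at zero, while 11gR2 would not.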
Thus, while all versions of Oracle do check to make sure that the server has more than one locality
domain available before engaging optimized behavior, only version 11gR2 checks whether the
server is configured with adequate CLM (LORA_MODE is 1) to render the optimizations effective.
If the server has no CLM and the optimizations are enabled anyway, Oracle will function normally but
the optimizations will be useless – see below. Recall that the default configuration for many HP servers
is zero CLM, so it is important to explicitly configure your server with enough CLM if you plan to use
Oracle's NUMA optimizations.
How the optimizations work
When Oracle's NUMA optimizations are engaged, the following actions are taken as the instance is
started:
– The shared data area (System Global Area, or SGA) is split into multiple pools of memory, one
pool for each locality domain in the server. (The “fixed SGA” is not included – see below.)
– Multiple database writer processes are launched, one in each locality domain, in order to more
efficiently write out the dirty database buffers contained in that domain‟s SGA pool.
– The log-writer process is bound to locality domain 1, and all log buffers are allocated from CLM
in that domain.
– Background processes that are instantiated as multiple copies (e.g., the parallel processing
processes ora_pNNN, and query processing processes ora_qNNN) are distributed across the
locality domains in a round-robin fashion.
– All other Oracle background processes (pmon, smon, ckpt, etc.) are bound to the locality domain
in which they are initially placed by HP-UX to ensure that their memory accesses remain local as
much as possible.
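As a simple illustration of the first point (the split is assumed to be even here; the actual pool sizes are Oracle's to decide):

```shell
# A 64 GB SGA on a four-domain server, split into one pool per LDOM.
sga_mb=65536
ldoms=4
pool_mb=$(( sga_mb / ldoms ))
echo "each LDOM's CLM holds a ${pool_mb} MB SGA pool"
```

Each domain's database writer then works primarily against the dirty buffers of its own local pool.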
A small portion of the SGA, known as the “fixed SGA”, is not explicitly assigned by Oracle to any
particular location. It is thus governed by numa_policy – when LORA_MODE=0 (or when
numa_policy is set to favor ILM), it will go into ILM, and when LORA_MODE=1 (or when numa_policy
is set to favor CLM), it will be placed in the CLM of a single domain, usually LDOM 0. (Note that in
previous versions of this white paper we reported that the fixed SGA would go into ILM when the
NUMA optimizations are engaged – such was our understanding of Oracle's intent, but empirical
evidence proves otherwise.) Thus, when LORA_MODE=1 (unless numa_policy is explicitly set to 2,
forcing everything into ILM), the only ILM that will be used by a NUMA-optimized Oracle instance will
be its shared text – i.e., the Oracle code itself, which is currently on the order of 350 MB.
Note
Oracle's shadow client processes are not NUMA optimized, because the
TNS listener process, tnslsnr (which spawns the Oracle shadow client
processes for incoming database connections), is not ccNUMA-aware.
Without explicit customization of the listener process, allocation of shadow
processes (and their private objects) will follow the same algorithm that's
long been used for shadow process distribution. As a best practice, we
continue to recommend the use of “mpsched -P LL -p <lsnr-pid>” as a
means of enforcing a least-loaded scheduling policy on the spawned
shadow processes. Such enforcement ensures that the shadow processes
will be distributed evenly across the available CPU resources on the host
server and that their allocation of private data will be optimized for their
resident LDOM.
Since the NUMA configuration is determined only at instance startup, Oracle cannot respond to
dynamic system reconfiguration. See “Dynamic reconfiguration considerations with ccNUMA
optimizations”, below, for further discussion.
Configuring the server and Oracle to match one another
It is important to be sure that Oracle and its host server are configured compatibly, either for NUMA-
optimized behavior or for non-optimized behavior. The default configuration for some server models
is exactly mismatched with the default Oracle configuration of certain Oracle versions (older servers
default to 100% ILM, while NUMA optimizations are enabled by default on some older Oracle
versions; Itanium 9300-based servers default to 87.5% CLM, but the NUMA optimizations are
disabled by default for the latest Oracle versions). We strongly recommend that the defaults never be
taken (or at least that the configurations be explicitly examined to make sure they are well aligned).
For NUMA-optimized behavior
When Oracle's NUMA optimizations are desired, it is critical to make sure that sufficient CLM is
allocated on the system.
The default configuration for HP cell-based servers – 0% cell-local memory – should be changed on
any multi-cell server that will host a NUMA-optimized Oracle database. To take advantage of
Oracle's NUMA optimizations, configure the system with both cell-local memory and interleaved
memory (ILM) – Oracle uses a tiny amount of ILM even when the NUMA optimizations are engaged
(and HP-UX requires some ILM as well).
When insufficient cell-local memory is available in a locality domain, the Oracle memory structures
are created in interleaved memory (across all domains) and/or CLM in other localities, where they
will still be accessible by Oracle's processes, but where most memory accesses will be non-local. So
while the Oracle instance will function completely normally, it will still pay the overhead of trying to
eliminate inter-domain accesses without gaining the benefit: the overhead will be wasted, and a
performance penalty will result.
How much cell-local memory should be configured on a system? The answer depends largely on two
factors:
HP-UX requirements. Starting with HP-UX version 11.31 (11i v3), the operating system itself is
optimized to use cell-local rather than interleaved memory, so it is important to follow HP-UX
standard recommendations for configuring cell-local memory. If you are upgrading from an earlier
version of HP-UX, be sure to take into account the higher cell-local memory recommendations for
11.31.
Combined memory requirements of all Oracle instances. As noted above, Oracle instances with
NUMA optimizations enabled will request CLM in the amount of the SGA (System Global Area) for
the instance. A given server should be configured with CLM at least as large as the sum of all
NUMA-enabled Oracle instances that will be run on that server.
Additional information about CLM and ILM memory recommendations can be found in the white
paper “Locality-Optimized Resource Alignment” (see For more information section, below). Typically,
HP recommends 87.5% (7/8) of all memory be configured as CLM, to account for both OS and
application needs; this is the default value for the new Itanium 9300-based Integrity servers.
Moreover, the 87.5% setting is the level at which HP-UX 11.31 will set the LORA_MODE
configuration variable to a value of 1, which will allow Oracle 11gR2 to engage its NUMA
optimizations.
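The 7/8 guideline translates into a simple split; a minimal sketch, assuming a hypothetical 512 GB server with four locality domains:

```python
# Sketch of the 87.5% (7/8) CLM guideline described above.
# Total memory and domain count are hypothetical examples.
def clm_ilm_split(total_gb: float, clm_fraction: float = 0.875):
    """Return (CLM, ILM) sizes in GB for a given total memory size."""
    clm = total_gb * clm_fraction
    return clm, total_gb - clm

total, domains = 512, 4
clm, ilm = clm_ilm_split(total)
print(f"CLM: {clm:.0f} GB ({clm / domains:.0f} GB per domain), ILM: {ilm:.0f} GB")
```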
In general, when setting up partitions (nPars, vPars, PSETs) which will host an Oracle database,
include the minimum number of domains required to satisfy Oracle's resource requirements, and keep
the domains relatively well balanced (equalize memory and processors). Be sure to allocate CLM and
ILM per the instructions above. When configuring nPars/vPars, consider the physical location of the
I/O cards used to access that partition‟s devices: it would make sense to configure your partition with
dedicated cores from the locality domain(s) associated with the I/O bay(s) containing those cards, in
order to avoid cross-locality accesses when processing I/O interrupts.
For non-NUMA-optimized behavior
If your Oracle instance will be run with its NUMA optimizations disabled, Oracle will not make use of
CLM at all, and will need sufficient interleaved memory (ILM) for all its data structures. In this case, be
sure to configure your system to be ILM-heavy, and, if you are running an Oracle version prior to
11gR2, set _enable_NUMA_optimization to false to ensure that the optimizations are disabled.
Oracle 11gR2 will properly detect ILM-heavy configurations (by detecting LORA_MODE equal to 0)
and automatically disable NUMA optimizations regardless of the value of the
_enable_NUMA_support parameter.
When an unoptimized Oracle instance requests space for its SGA, the space will be allocated
according to the setting of numa_policy. If numa_policy is set to favor ILM (the default for
LORA_MODE 0) but the system doesn't have enough ILM, the space will be allocated from the
cell-local memory of the first LDOM that has it available. Likewise, if numa_policy is set to favor CLM
(the default for LORA_MODE 1), the SGA will be placed in the CLM of a single LDOM. In either case,
this will result in an imbalance in the available memory distribution across the LDOMs, and HP-UX
will favor the remaining LDOMs whenever new processes are created. Thus, the SGA will be in one
LDOM, while most of Oracle's processes will be in the other LDOMs, practically guaranteeing that
most memory accesses will be non-local. Clearly, when running Oracle with its NUMA optimizations
disabled, it's important to make sure that there's plenty of ILM (and that numa_policy is set to favor
ILM).
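For such an ILM-heavy deployment on a pre-11gR2 release, the relevant init.ora entry might look like the following sketch (underscore parameters are unsupported and should normally be changed only with Oracle's guidance):

```text
# init.ora fragment (sketch) for a non-NUMA-optimized 10gR2/11gR1 instance
_enable_NUMA_optimization = FALSE
```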
Configuring your server and your Oracle instance: summary
While a CLM/ILM configuration that does not match the NUMA optimization state of the Oracle
instance will not impede Oracle's proper operation, it will result in sub-optimal performance. This
implies a clear best practice when deploying Oracle instances on HP ccNUMA servers: make sure
that the system's configuration and Oracle's configuration match one another. Configure BOTH the
server and Oracle for ccNUMA (lots of CLM; enable Oracle's optimizations) or configure them both
to operate without the optimizations (lots of ILM; disable Oracle's optimizations). The best practice for
Oracle Database 11gR2 is somewhat simpler: set _enable_NUMA_support to TRUE and then the
instance will properly enable or disable the optimizations based on HP-UX's LORA_MODE setting. (As
previously mentioned, do not assume that the default settings are optimal!)
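Following that best practice, an 11gR2 init.ora might contain the sketch below; Oracle will then enable or disable the optimizations at startup based on the host's LORA_MODE:

```text
# init.ora fragment (sketch) for Oracle Database 11gR2
_enable_NUMA_support = TRUE
```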
A clear case for NUMA optimization: small vPar in large nPar
We've already discussed the situation (see Figure 4 and the paragraphs above it) of the small vPar
carved out of a large nPar; now let's consider the specific case of using such a vPar as an Oracle
Database server.
Recall that our example vPar consists of two of the eight cells which comprise the nPar; it has a fair
amount of CLM in each cell, and a certain amount of ILM which is interleaved across all eight cells.
Assume that the remaining six cells are allocated to one or more other vPars which will be running
their own workloads.
First, let's consider a non-NUMA-optimized Oracle Database instance in our vPar (see Figure 5). All
of Oracle's SGA will be placed in ILM (and will thus be interleaved across all eight cells) – an Oracle
process in either cell of our vPar will need to perform, on average, seven memory accesses from other
cells for every one memory access within the local cell. Not only is this bad for the performance of our
database instance, but all those inter-cell memory accesses will also affect the workloads running in
whatever vPars are assigned those other cells! Likewise, heavy workload activity in those other vPars
can and will affect the performance of our database server.
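The cost of that seven-in-eight remote-access pattern can be illustrated with a back-of-the-envelope estimate. The latency figures below are hypothetical, chosen only to show the shape of the penalty:

```python
# Sketch: average memory latency when an SGA is interleaved across
# 8 cells but the accessing process runs in one cell.
# Latency numbers are hypothetical illustrations, not measurements.
cells = 8
local_fraction = 1 / cells              # 1 access in 8 is local
remote_fraction = 1 - local_fraction    # the other 7 cross cells

local_ns, remote_ns = 150, 400          # assumed latencies (ns)
avg_ns = local_fraction * local_ns + remote_fraction * remote_ns
print(f"{remote_fraction:.0%} of accesses are remote; "
      f"average latency ~{avg_ns:.0f} ns vs {local_ns} ns local")
```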
Figure 5. An eight-cell nPar (or server) with a vPar, consisting of cells 0 and 1, running a non-NUMA-optimized Oracle
instance. The SGA will be located in ILM, interleaved across all eight cells (including six cells that are otherwise not part of the
vPar).
Performance of both our database and of the other applications in other cells would clearly benefit
from turning on our Oracle Database's NUMA optimizations, because Oracle's SGA would then be
placed in the CLM of the two cells in our vPar.
Dynamic reconfiguration considerations with ccNUMA optimizations
HP server virtualization capabilities (nPars, vPars, etc.) allow you to define virtualized servers as a
subset of available physical resources, and to re-define those servers dynamically; CPUs and, in some
cases, memory can be added or deleted without shutting down the operating system. When an
Oracle instance is open and running on a virtualized server with NUMA optimizations disabled, the
operating system manages the locality of all of Oracle's processes and memory allocations. Thus,
dynamic changes to the locality domains of the underlying host are transparent to a non-NUMA-
optimized Oracle instance. HP-UX takes care of any necessary process or memory migration, and the
overall number of inter-domain references may increase, decrease, or stay the same after a dynamic
change.
But for a NUMA-optimized Oracle instance, dynamic reconfiguration of the underlying host can have
a significant impact. In general, dynamic reconfiguration of a server or partition that hosts a NUMA-
optimized Oracle instance will increase the likelihood of processes migrating to remote localities, thus
reducing the effectiveness of Oracle's NUMA optimizations. Furthermore, dynamic removal of
resources can cause an Oracle instance to crash under certain circumstances. Therefore, employing
Oracle's NUMA optimizations on servers/partitions which will be dynamically reconfigured is not
recommended, even though it is possible under some conditions. If dynamic reconfiguration is
desirable, the recommendation is to disable Oracle's NUMA optimizations and let HP-UX handle
process locality.
Nevertheless, dynamic reconfiguration IS supported for NUMA-optimized Oracle databases, and it
can be done safely (without risk of Oracle database crashes) if certain rules are followed. These rules,
and the reasons for them, are discussed fully in a companion white paper, “Dynamic server resource
allocation with Oracle Database 10g or 11g on an HP-UX ccNUMA-based server”. If you are
considering the use of dynamic resource allocation with a NUMA-optimized Oracle database, it is
critical to read and understand the contents of that white paper.
Summary
Oracle's NUMA enhancements can improve performance when running database instances on HP
ccNUMA servers. Since the memory configuration of the underlying server is critical to the
performance of an Oracle instance, care must be exercised to ensure that the memory configuration
matches the NUMA mode of the instance (sufficient CLM must be configured for a NUMA-optimized
instance, whereas a non-NUMA instance requires sufficient ILM). Make it a point to override the
defaults (if necessary) and match the server configuration with the Oracle configuration: decide
whether you wish your Oracle instance to run with its NUMA optimizations on or off, then configure
BOTH Oracle AND your server accordingly!
Appendix: Oracle versions, default NUMA enablement
The initialization parameter that governs the enablement of Oracle's NUMA optimizations, and its
default setting, depend on the Oracle release, version, and certain patches. Below is a complete list.
Because of the confusion surrounding the default optimization state, HP highly recommends that the
appropriate initialization parameter ALWAYS be set explicitly.
In all cases, the optimizations will not be used if only one domain (cell or socket) is visible to Oracle
when the database is started.
− 10gR2 was the first to support NUMA optimizations on HP-UX. The optimizations are
controlled by the (unsupported) _enable_NUMA_optimization parameter
− 10.2.0.3 : optimizations enabled by default
− 10.2.0.4 : optimizations enabled by default BUT DO NOT WORK (bug 9668940).
Optimizations should be explicitly disabled.
− 10.2.0.5 (future) : optimizations off by default; 9668940 will be fixed.
− Oracle recommends patch 8199533 to switch optimizations off by default for
10.2.0.3 and 10.2.0.4.
− 11gR1 (11.1.0.6 and 11.1.0.7) : optimizations (_enable_NUMA_optimization) enabled by
default but Oracle recommends patch 8199533 to switch optimizations off by default
− 11gR2: optimizations controlled by the (supported) parameter _enable_NUMA_support (but if
the underlying host's LORA_MODE parameter is set to 0, the optimizations will not be used)
− 11.2.0.1 : optimizations disabled by default
− When upgrading to 11gR2 from an earlier version, make sure to DELETE any
references to the obsolete parameter _enable_NUMA_optimization to avoid potential
issues.
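The version-by-version defaults above can be summarized as a small lookup table; a sketch that simply restates the list (the table layout itself is illustrative):

```python
# Sketch: default NUMA-optimization state per Oracle version,
# restating the appendix list above:
# (init parameter, enabled by default, note).
DEFAULTS = {
    "10.2.0.3": ("_enable_NUMA_optimization", True,  "patch 8199533 turns it off"),
    "10.2.0.4": ("_enable_NUMA_optimization", True,  "broken (bug 9668940); disable"),
    "10.2.0.5": ("_enable_NUMA_optimization", False, "bug 9668940 fixed"),
    "11.1.0.6": ("_enable_NUMA_optimization", True,  "patch 8199533 turns it off"),
    "11.1.0.7": ("_enable_NUMA_optimization", True,  "patch 8199533 turns it off"),
    "11.2.0.1": ("_enable_NUMA_support",      False, "needs LORA_MODE 1 to engage"),
}

param, default_on, note = DEFAULTS["11.2.0.1"]
print(f"11.2.0.1: {param}, default on: {default_on} ({note})")
```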
For more information
For an excellent discussion of the art of tuning applications to optimize ccNUMA performance, see
the white paper “Locality-Optimized Resource Alignment” at http://docs.hp.com/en/14655/ENW-LORA-TW.pdf.
For best practices and considerations regarding the use of dynamic resource configuration with
Oracle Database, see the white paper “Dynamic server resource allocation with Oracle Database
10g or 11g on an HP-UX ccNUMA-based server”.
For specific rules and the latest information on Oracle support of dynamic reconfiguration and/or
ccNUMA optimizations, see the note entitled “Oracle Database ccNUMA support and dynamic partitioning
on HP-UX” published on Oracle's “MySupport” site (http://metalink.oracle.com; search for Document
ID 761065.1. Note: this site has membership requirements).
To help us improve our documents, please provide feedback at
http://h20219.www2.hp.com/ActiveAnswers/us/en/solutions/technical_tools_feedback.html.
© Copyright 2009 – 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Oracle is a U.S. registered trademark of Oracle Corporation. Intel and Itanium are trademarks of Intel Corporation in the U.S. and other countries.
4AA2-4194ENW, Created January 2009; Updated September 2010, Rev. #1