

Running Oracle Database 10g or 11g on

an HP-UX ccNUMA-based server

Updated for Oracle 11gR2 and HP-UX 11i v3

Technical white paper

Table of contents

Executive summary
ccNUMA: an introduction
    ccNUMA infrastructure
    HP-UX 11i v3 and “LORA” mode
    Virtualization, CLM, and “LORA” mode
Oracle's NUMA optimizations
    Enabling or disabling the Oracle NUMA optimizations
    Determining the ccNUMA configuration
    How the optimizations work
    Configuring the server and Oracle to match one another
    A clear case for NUMA optimization: small vPar in large nPar
    Dynamic reconfiguration considerations with ccNUMA optimizations
Summary
Appendix: Oracle versions, default NUMA enablement
For more information


Executive summary

Many HP servers are based on a ccNUMA (cache-coherent Non-Uniform Memory Access)

architecture; these servers provide optional features which allow programmers to optimize application

performance by tailoring their application to the non-uniform nature of the underlying host server.

By taking advantage of these features to reduce higher-latency memory references, modest performance gains can be achieved, depending on the application and workload.

Starting with release 10g, the Oracle® database includes such NUMA-optimization features; these

can be enabled or disabled at database startup time, and they are enabled by default on some

Oracle versions, and disabled on others. To be effective, these optimizations require that the server's memory is allocated mainly to the individual locality domains (cell-local memory or socket-local memory) rather than to the system as a whole (interleaved memory); newer HP ccNUMA servers default to such a configuration, but older ccNUMA servers do not. It is very important for performance reasons that the setting of Oracle's NUMA optimizations matches the configuration of the server: never rely on the default settings and configurations to produce an optimal result.

Furthermore, dynamic resource reconfiguration will have significant consequences to any running

NUMA-optimized Oracle Database instances. Because the optimizations are configured by Oracle

only at the time the instance starts up, subsequent changes to the structure (number of locality

domains) of the underlying host will, at best, result in the instance being optimized for the wrong

structure (which will incur sub-optimal performance); at worst, they may cause the database instance

to fail.

Thus, if dynamic reconfiguration is to be undertaken, either:

– disable Oracle's NUMA optimizations, or

– manage any dynamic reconfiguration very carefully, following the recommendations spelled out in a companion white paper, “Dynamic server resource allocation with Oracle Database 10g or 11g on an HP-UX ccNUMA-based server”.

Table 1. Summary of rules governing Oracle NUMA optimization with dynamic system reconfiguration

Oracle NUMA optimization: disabled or off
(ensure adequate interleaved memory for Oracle SGA)
    With dynamic server reconfiguration:    OK
    Without dynamic server reconfiguration: OK

Oracle NUMA optimization: enabled
(ensure adequate cell-local memory in each domain for Oracle SGA)*
    With dynamic server reconfiguration:    OK with restrictions (see companion white paper) but generally not recommended
    Without dynamic server reconfiguration: OK

* (default on older HP servers is: 100% ILM, 0% CLM)

Applicable servers: All ccNUMA-based HP-UX servers (all cell-based servers and all servers based on

the Intel® Itanium® 9300 processor) which are running Oracle Database 10g or later.

Target audience: Oracle DBAs and those who administer ccNUMA HP-UX systems which will host

Oracle databases.


Support: This white paper makes recommendations based on HP's best knowledge of the stated

configuration options at this time (June 2010). We do not and cannot imply Oracle support for any of

these configuration options – statements of Oracle support can only be made by Oracle.

Figure 1. Traditional Uniform-Memory-Access (UMA) design. All processors, and all processes (P1, P2, etc.), have equal latency

to their objects in any part of memory. The server is scaled up by connecting more processors to the bus (which eventually

causes the bus to be a bottleneck and reduce scalability).


ccNUMA: an introduction

Cache-coherent non-uniform memory access (ccNUMA) is an architectural technique which has been

used successfully to build large, multi-processor servers in a way that avoids the bus bottlenecks which

characterize traditional uniform server design (see Figure 1). HP's first generation of ccNUMA servers

are cell-based: they are comprised of multiple smaller cells of CPUs, memory, and I/O cards linked

together by at least one low-latency cross-bar switch. These individual cells can be operated

independently or in groups of two or more cells; each such operating unit is known as an nPar or

hardware partition. When multiple cells are incorporated into the same hardware partition, processes

running in any cell can access memory owned by any other cell (Figure 2). Accessing memory from a

remote cell results in slightly greater latency than an access to memory in a process's own cell. Since

memory-access time is a component of CPU-busy time, an increase in memory latency equals an

increase in CPU-busy time.

Table 2. HP servers based on ccNUMA architecture

Cell-based ccNUMA servers: rp7410, rp7420, rp7440, rx7620, rx7640, rp8400, rp8420, rp8440, rx8620, rx8640, HP 9000 Superdome, HP Integrity Superdome

Itanium 9300-based ccNUMA servers: rx2800 i2, BL860c i2, BL870c i2, BL890c i2, Superdome 2

In 2010, HP introduced new Integrity servers based on the latest Intel Itanium processor (the four-

core/eight-hyper-thread Itanium 9300 series). Like their predecessors, these servers are modular in

nature: the blade-based two-socket BL860c i2 can be used by itself, doubled to form a four-socket

server (BL870c i2), or quadrupled to form an eight-socket server (BL890c i2). A non-modular rack-

mounted version of the two-socket server (rx2800 i2) is also available. The Itanium 9300 is designed

so that each processor socket has its own memory; all memory in these servers is allocated (more or

less) evenly among the sockets. In a two-socket server, a process will have faster access time to

objects in the memory on the same socket as the core where the process is running than it would to

objects in the memory of the other socket. When multiple two-socket blades are lashed together to

form larger servers, in addition to the latency differential between the sockets in the same blade, there

are further increases in latency when accessing memory objects on the other blades which comprise

the server. So while previous ccNUMA servers featured locality domains based on the cell concept,

these newest ccNUMA servers' locality domains are based on their socket layout.

ccNUMA-based servers provide excellent performance and scalability without requiring that

workloads be adapted in any way for the ccNUMA architecture. However, certain application

workloads may realize small or even modest performance gains if they are made aware of the

server's hierarchical memory layout and are able to adapt themselves to minimize interactions

between the locality domains (the cells or sockets) that comprise the server. By restricting a process to

a certain domain and assigning its memory objects to the memory of that same domain, a process

will minimize memory latency. Processes which do large amounts of data manipulation in memory are

thus candidates for ccNUMA optimization.


Figure 2. Non-Uniform Memory Access. Small UMA servers (cells) are connected together through one or more high-speed

switches to form a larger server. Any process has access to any region of memory, but latency times are lower if the object is

closer to the accessing process. In this example, P1's access to its object is faster than P2's, which is faster than P3's.

ccNUMA infrastructure

While the memory in a ccNUMA server is organized in a hierarchical manner, applications are

traditionally designed with the assumption that all parts of memory are equal. Thus, some or all

memory in a ccNUMA server can be designated as “interleaved memory” (ILM), which is designed to

emulate, as much as possible, a uniform architecture. Interleaved memory is managed as a single unit

across all locality domains in the server or partition; an object placed in ILM will be striped across all

locality domains, and access times, on average, will be the same regardless of where (in which

locality domain) the accessing process is located.

The alternative to configuring memory as ILM is to configure it in its native ccNUMA state, which is

called Cell-Local Memory (CLM), or sometimes Socket-Local Memory (SLM). CLM/SLM is configurable

on a per-cell/per-socket basis; an object placed in CLM (SLM) in a cell or socket will be completely

contained within that cell or socket.

In general, one would only place objects in CLM if the applications which access those objects have

been optimized for ccNUMA (i.e., if the application's processes can be localized to a single locality

domain); for non-optimized applications, ILM would usually be the best choice.

The memory in HP ccNUMA servers is configurable as ILM or CLM; this configuration is set at power-

up and is fixed while the server is running. The server management interface can be used to view and

modify the ILM/CLM configuration, with any changes taking effect at the next reboot. The default

configuration depends on the particular server model: for Itanium 9300-based servers, the default is

87.5% CLM, but for all other servers, the default is 100% ILM. It is important to know the ILM/CLM

configuration of your server.


A note about terminology: the earliest ccNUMA servers were all cell-based, so it's not unusual to refer

to the individual locality domains as “cells”, which is why we have Cell-Local Memory as opposed to

“Domain-Local Memory” (though the term “Socket-Local Memory” is sometimes used). Today, we use

the term “locality domain” (or LDOM, or sometimes just locality or domain) to mean cells or sockets –

any set of resources which have equal latency to a region of memory. Furthermore, we might refer

specifically to a locality domain by its number: LDOM 0 through LDOM n.

Figure 3. Interleaved Memory (ILM) is spread across the Locality Domains (LDOMs); Cell-Local Memory (CLM) is contained

within each LDOM. An object placed in ILM will thus be striped across the LDOMs; a process accessing it will find some

accesses to be local (same LDOM), others remote. An object placed in CLM will be local for all processes in that same LDOM

(but all accesses from remote LDOMs will have higher latency). In this example, P1 is accessing an object in ILM (with varying

access times since one part of the object is in the same LDOM, but most of it is in remote LDOMs) and an object in the CLM in

its own LDOM (where all accesses are local). When P2 accesses the object in LDOM 0, all accesses are remote.

HP-UX 11i v3 and “LORA” mode

HP-UX 11i v3 has been enhanced to provide tools and features which facilitate application

optimization for the ccNUMA architecture. The most evident of these is a new system variable,

“LORA_MODE”, which was introduced with HP-UX 11i v3 update 3 (September 2008). The term

LORA comes from Locality-Optimized Resource Allocation, the science of optimizing for ccNUMA.

LORA_MODE is intended to indicate whether the server is configured adequately to permit locality

optimization (ccNUMA optimization) to be effective. A LORA_MODE value of 1 (one) indicates a

sufficient configuration; currently, HP-UX will set this variable to 1 if 86-90% of the server's memory is

configured as CLM, and if this CLM is evenly distributed across the locality domains.

LORA_MODE cannot be changed directly, but a kernel tunable parameter (numa_mode) can be used

to control LORA_MODE. The kernel‟s default value for numa_mode is 0 (zero), which allows HP-UX to

set LORA_MODE based solely on the hardware configuration. Setting numa_mode to 1 (one) will

force LORA_MODE to 1, while setting numa_mode to -1 (minus one) will force LORA_MODE to zero.

If numa_mode is set to:    Then LORA_MODE will be:
0 (default)                set by HP-UX to 1 if adequate CLM, 0 otherwise
1                          1
-1                         0
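The mapping above can be sketched as a small shell function. This is purely illustrative (our reading of the documented rules, not HP-UX code); "adequate_clm" stands in for HP-UX's own hardware check.

```shell
# Illustrative only: reproduces the documented numa_mode -> LORA_MODE
# mapping. adequate_clm is 1 if roughly 86-90% of memory is evenly
# distributed CLM (the check HP-UX itself performs), else 0.
lora_mode() {
    numa_mode=$1     # kernel tunable: -1, 0, or 1
    adequate_clm=$2  # result of the hardware adequacy check
    case "$numa_mode" in
        1)  echo 1 ;;                # forced on
        -1) echo 0 ;;                # forced off
        *)  echo "$adequate_clm" ;;  # default: hardware decides
    esac
}

lora_mode 0 1    # prints 1: default tunable, adequate CLM
lora_mode -1 1   # prints 0: forced off regardless of CLM
```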


The purpose of LORA_MODE is to provide a way of indicating to any optimized applications that the

server has sufficient CLM to support those optimizations. (An application that is optimized for

ccNUMA requires sufficient CLM in which to place its memory objects; without enough CLM, the

optimizations would have no – or perhaps even a detrimental – effect on performance.)

To determine the value of LORA_MODE, use the HP-UX “getconf” command:

getconf LORA_MODE

As will be discussed in more detail below, Oracle's most recent database version, 11gR2, checks

LORA_MODE before engaging its optimizations.

An additional HP-UX 11i v3 parameter which affects ccNUMA behavior is numa_policy, which

governs the allocation of memory (in CLM or ILM). The valid settings of numa_policy are:

0: (default) autosense the right policy based on the (LORA) mode in which HP-UX is operating

1: (default in LORA mode 1) honor requests from the application; otherwise place the object in the CLM of the domain where the process is most likely to run

2: override requests; always allocate in ILM if possible

3: honor requests by the application; otherwise allocate in the CLM of the closest domain, except for text/library objects, which will be placed in ILM if possible

4: (default for LORA mode 0) allocate “non-LORA-intelligently” (i.e., favor ILM for shared objects and CLM for private objects, but honor requests made by the application)

In short, if the application specifies the location of a newly created object, that location will generally

be used unless numa_policy=2, or unless there‟s not enough CLM or ILM available, in which case the

object will be placed in the next-best location. If the application does not specify the location of the

object, it will default according to the setting of numa_policy.
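The placement rules just described can be summarized in a short decision sketch. This is our summary of the documented settings, not the kernel's algorithm; policy 0 is assumed to have been resolved to 1 or 4 by the LORA mode beforehand.

```shell
# Where does a new object land? Arguments: numa_policy (1-4, after
# policy 0 has been resolved), the location requested by the
# application ("CLM", "ILM", or "-" for no request), and the object
# kind ("shared", "private", or "text").
place_object() {
    policy=$1; requested=$2; kind=$3
    if [ "$policy" -eq 2 ]; then echo ILM; return; fi    # override: force ILM
    if [ "$requested" != "-" ]; then echo "$requested"; return; fi
    case "$policy" in
        1) echo CLM ;;                                             # nearest domain's CLM
        3) if [ "$kind" = "text" ]; then echo ILM; else echo CLM; fi ;;
        4) if [ "$kind" = "shared" ]; then echo ILM; else echo CLM; fi ;;
    esac
}

place_object 2 CLM shared   # prints ILM (requests are overridden)
place_object 4 - shared     # prints ILM (shared object, no request)
place_object 4 - private    # prints CLM (private object, no request)
```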

The HP-UX command kctune can be used to determine the current setting of either numa_mode or

numa_policy:

kctune numa_mode

kctune numa_policy

Virtualization, CLM, and “LORA” mode

Most virtualized servers take on the ccNUMA characteristics of their constituent components, and HP-

UX, and any applications running on these servers, will act accordingly. An nPar or vPar which is

comprised of multiple locality domains will exhibit the same ccNUMA behavior as a physical server

constructed from the same hardware components; HP-UX 11i v3 will apply the same LORA_MODE

tests, and applications will derive the same benefit from NUMA optimization.

Just as it's important to properly configure a physical server with sufficient CLM for application

optimizations to be effective, it is important to properly configure a multi-domain vPar or nPar which

will host an optimized application.

To verify the amount of cell-local and interleaved memory configured in an nPar, use the HP-UX

“parstatus” command as follows:

parstatus -w (to determine the nPartition-number of the system)

parstatus -V -p nPartition-number

When configuring a vPar, be aware that some of the memory you specify will be used for vPar

management overhead, so be sure to configure your vPar with more memory than you require,

keeping in mind that LORA_MODE will be set to 1 only if HP-UX sees that 86-90% of the vPar's


memory is configured as CLM. When you start the vPar for the first time, verify your CLM/ILM

amounts, and (for HP-UX 11i v3) the state of the LORA_MODE variable (using “getconf” as described

above). The HP-UX “machinfo” command will tell you how much total memory and interleaved

memory that HP-UX sees:

machinfo -m
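From the total and interleaved figures that machinfo reports, you can compute the CLM percentage and compare it against the 86-90% window that LORA mode requires. A sketch with hypothetical sample numbers (integer arithmetic, inputs in MB):

```shell
# Decide whether a given total/ILM split falls inside the 86-90% CLM
# window described above. Inputs in MB (or any consistent unit).
clm_adequate() {
    total=$1; ilm=$2
    clm_pct=$(( (total - ilm) * 100 / total ))
    if [ "$clm_pct" -ge 86 ] && [ "$clm_pct" -le 90 ]; then
        echo "yes ($clm_pct% CLM)"
    else
        echo "no ($clm_pct% CLM)"
    fi
}

clm_adequate 65536 8192    # 87.5% CLM -> prints "yes (87% CLM)"
clm_adequate 65536 32768   # 50% CLM   -> prints "no (50% CLM)"
```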

PSETs will also affect the ccNUMA characteristics seen by any applications running within them. A

PSET that is wholly within a single locality domain will cause any applications running within it to

behave as if they were running on a non-ccNUMA server. Likewise, if an nPar or vPar does not

span multiple domains, then it will be considered a non-ccNUMA server, even if that partition is part

of a larger server that itself comprises multiple domains.

HP Integrity Virtual Machines are the lone exception: VM guests will always appear as a single

locality domain, regardless of the layout of the underlying physical infrastructure.

One last fact about CLM and ILM with vPars: while CLM for each locality in the vPar is obviously

contained within those localities, ILM will be interleaved across ALL the localities that constitute the

server (or nPar) of which the vPar is a part. An example will help illustrate this fact: imagine an nPar

of eight cells that contains a two-cell vPar (see Figure 4). You might expect that the ILM for this vPar

would only span cells 0 and 1, but actually, it will span all eight cells – so when a process in our vPar

needs to access an ILM-based object, it will wind up accessing memory in cells which are not even

part of our vPar. (And as you can see, some of the memory in cells 0 and 1 is allocated to ILM that

will be available to objects belonging to other vPars, which means that we will have to share some of

our memory bandwidth with processes from outside our own vPar.)

This last fact may surprise you – if you thought that by setting up a vPar, your computing activity

would be completely isolated within the vPar, you've just learned that you were wrong – activity on

every vPar DOES affect performance on all vPars because of this ILM characteristic. The performance

advantages of using NUMA optimization to reduce the use of ILM (or, at the very least, of forcing

objects to be placed in CLM instead of ILM) should thus be very clear.

We will discuss this issue and how Oracle is affected once we've discussed Oracle's NUMA

optimizations in general.

Figure 4. An example of an eight-cell nPar (or server) with a vPar consisting of cells 0 and 1. Note that while the vPar's CLM is contained completely within cells 0 and 1, the vPar's ILM is a portion of the overall ILM allocated to the entire nPar/server – it is interleaved across all the cells.


Oracle's NUMA optimizations

To exploit the characteristics of ccNUMA to improve performance, Oracle Database (version 10g and

later) has been enhanced to accommodate Oracle processes and their corresponding memory objects

within the same domains. Such enhanced behaviors are known as Oracle's NUMA optimizations

(deliberately without the “cc”), and are engaged only on multi-domain servers and only when they are

enabled via the appropriate Oracle initialization parameter. Each time a NUMA-optimized Oracle

database instance is started, it will modify its configuration in order to best conform to the ccNUMA

characteristics of the host under which it is running.

Enabling or disabling the Oracle NUMA optimizations

Oracle's NUMA optimizations are controlled through an initialization parameter which is only

checked at instance startup time. The parameter can be changed at any time but it will have no effect

until the next time the database instance is started.

The name of this parameter depends on the version of Oracle, and the default state of the parameter

depends on the specific release and patch level (which is why HP recommends that this parameter

always be explicitly set!). (The specific default behavior of each Oracle version is included as an

appendix to this paper.)

Oracle versions 10gR2 and 11gR1 use a hidden (so-called “underbar”) parameter to control the

ccNUMA optimizations: _enable_NUMA_optimization. Oracle typically does not support these

“underbar” parameters and this one is no exception; however, certain Oracle documentation does

refer to changing this parameter in order to enable or disable the optimizations.

Note

The use of “underbar” parameters to control certain Oracle behaviors is

unsupported by Oracle. Furthermore, since such parameters are

unsupported and undocumented, they may change from one release of

Oracle to the next. We therefore recommend that you consult your Oracle

support resources before making any such changes to a supported Oracle

database instance.
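With that caveat in mind, if you do set the parameter, do so explicitly in the instance's initialization file rather than relying on the default. A minimal pfile fragment (illustrative only; the value shown is an example, not a recommendation):

```
# init.ora fragment for 10gR2/11gR1 -- always set explicitly
_enable_NUMA_optimization = FALSE   # or TRUE on a server configured with adequate CLM
```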

Oracle version 11gR2 uses a different “underbar” parameter: _enable_NUMA_support. This

parameter is FULLY supported by Oracle (even though it starts with an underbar!).

NOTE: when upgrading to 11gR2 from a previous release of Oracle, make sure to remove any

references to the old parameter, _enable_NUMA_optimization. This parameter is unfortunately not

ignored by Oracle 11gR2 – it still affects some aspects of the optimizations and should thus be

avoided.
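For 11gR2, the supported parameter can be set with ALTER SYSTEM; note that a parameter name beginning with an underscore must be double-quoted, and the change takes effect only at the next instance startup. A sketch (the RESET statement removes the obsolete 10gR2/11gR1 parameter, as advised above):

```sql
-- Enable 11gR2 NUMA support explicitly (takes effect at next startup)
ALTER SYSTEM SET "_enable_NUMA_support" = TRUE SCOPE = SPFILE;

-- When upgrading from 10gR2/11gR1, remove the old parameter
ALTER SYSTEM RESET "_enable_NUMA_optimization" SCOPE = SPFILE SID = '*';
```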

When a NUMA-optimized Oracle 11gR2 instance is started up, a message indicating the use of

NUMA optimizations (“NUMA system found and support enabled”) is displayed on the console and

in the instance's alert log. (No such indication is given for older Oracle versions.)

A complete description of the particular NUMA behaviors associated with each version of Oracle

Database is included as Table 3.

For Oracle Database version 10.2.0.4, the optimizations are not implemented correctly and should

never be used: _enable_NUMA_optimization should always be set to false (and the server should

thus be configured ILM-heavy) for Oracle 10.2.0.4.


Table 3. Oracle NUMA behavior. Binding of “none” indicates that Oracle does not restrict the location of the process(es). Parallel Query Slaves are created and bound one per LDOM in a round-robin (RR) fashion when in NUMA mode. With NUMA optimizations off, the location of the SGA (including the DB cache) is not specified by Oracle, so it defaults to the location implied by the HP-UX parameter numa_policy. In addition, the optimizations will not be engaged if Oracle detects only one locality domain (cell or socket) when the database is started. See “HP-UX 11i v3 and ‘LORA’ mode”, above, for the definition and default behavior of numa_policy.

NUMA         LORA_MODE     optimizations  db_cache    fixed SGA  # dbwr     dbwr       parallel query  lgwr, log buffer  other
parameter¹                 are            location    location              binding    binding         binding           binding

_eNo=false   0, 1, or n/a  disabled       default²    default²   lcpu³ / 8  none       none            none              none
_eNo=true    0, 1, or n/a  engaged        CLM (even)  default²   # LDOMs    each LDOM  LDOM RR         LDOM 1            LDOM 0
_eNs=false   0 or 1        disabled       default²    default²   lcpu³ / 8  none       none            none              none
_eNs=true    0             unengaged      default²    default²   # LDOMs    none       none            none              none
_eNs=true    1             engaged        CLM (even)  default²   # LDOMs    each LDOM  LDOM RR         LDOM 1            LDOM 0

¹ _eNo = _enable_NUMA_optimization (10gR2 and 11gR1); _eNs = _enable_NUMA_support (11gR2)
² depends on numa_policy: either ILM, or CLM of a single LDOM
³ lcpu = logical CPU count (= # cores, times 2 if hyperthreading enabled)
* 10.2.0.4 only: optimizations are broken. _eNo should ALWAYS be set to false.

Determining the ccNUMA configuration

At instance startup, if the initialization parameter governing Oracle's NUMA optimizations has been

enabled, Oracle performs a check to determine the ccNUMA state of the host server (which could be

a physical server, nPar, vPar, Integrity Virtual Machine, or other partition). The nature of this check

depends on the release of Oracle:

– For Oracle 11gR2, the server's LORA_MODE variable is checked: if it is zero, the optimizations will be disabled for that execution of the instance. If LORA_MODE is one, Oracle still has other checks to perform.

– For all Oracle versions, Oracle counts the number of active locality domains (i.e., domains containing one or more CPUs) present on the host server. If Oracle detects only one locality domain, NUMA optimizations will be disabled for that execution of the instance; if multiple domains are detected, the optimizations will be enabled.

Thus, while all versions of Oracle do check to make sure that the server has more than one locality

domain available before engaging optimized behavior, only version 11gR2 checks whether the

server is configured with adequate CLM (LORA_MODE is 1) to render the optimizations effective.
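The version-dependent startup check just described can be condensed into a few lines of shell. This is a sketch of the documented behavior, not Oracle code:

```shell
# Will the NUMA optimizations engage at startup? Arguments: Oracle
# version string, LORA_MODE (0/1, only consulted by 11gR2), and the
# count of active locality domains.
numa_engaged() {
    version=$1; lora_mode=$2; ldoms=$3
    if [ "$version" = "11gR2" ] && [ "$lora_mode" -eq 0 ]; then
        echo no; return              # 11gR2 insists on adequate CLM
    fi
    if [ "$ldoms" -gt 1 ]; then echo yes; else echo no; fi
}

numa_engaged 11gR2 0 8   # prints no  (LORA_MODE gate)
numa_engaged 10gR2 0 8   # prints yes (older versions skip that check)
numa_engaged 11gR2 1 1   # prints no  (single locality domain)
```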

If the server has no CLM and the optimizations are enabled anyway, Oracle will function normally but

the optimizations will be useless – see below. Recall that the default configuration for many HP servers

is zero CLM, so it is important to explicitly configure your server with enough CLM if you plan to use

Oracle's NUMA optimizations.


How the optimizations work

When Oracle's NUMA optimizations are engaged, the following actions are taken as the instance is

started:

– The shared data area (System Global Area, or SGA) is split into multiple pools of memory, one

pool for each locality domain in the server. (The “fixed SGA” is not included – see below.)

– Multiple database writer processes are launched, one in each locality domain, in order to more

efficiently write out the dirty database buffers contained in that domain‟s SGA pool.

– The log-writer process is bound to locality domain 1, and all log buffers are allocated from CLM

in that domain.

– Background processes that are instantiated as multiple copies (e.g., the parallel processing

processes ora_pNNN, and query processing processes ora_qNNN) are distributed across the

locality domains in a round-robin fashion.

– All other Oracle background processes (pmon, smon, ckpt, etc.) are bound to the locality domain

in which they are initially placed by HP-UX to ensure that their memory accesses remain local as

much as possible.
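The round-robin distribution of the multi-copy background processes can be pictured with a few lines of shell (purely illustrative; the process names follow Oracle's pNNN convention):

```shell
# Distribute N slave processes across M locality domains round-robin,
# as described above for Oracle's ora_pNNN/ora_qNNN processes.
rr_bind() {
    slaves=$1; ldoms=$2; i=0
    while [ "$i" -lt "$slaves" ]; do
        printf 'ora_p%03d -> LDOM %d\n' "$i" $(( i % ldoms ))
        i=$(( i + 1 ))
    done
}

rr_bind 4 2
# ora_p000 -> LDOM 0
# ora_p001 -> LDOM 1
# ora_p002 -> LDOM 0
# ora_p003 -> LDOM 1
```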

A small portion of the SGA, known as the “fixed SGA”, is not explicitly assigned by Oracle to any

particular location. It is thus governed by numa_policy – when LORA_MODE=0 (or when

numa_policy is set to favor ILM), it will go into ILM, and when LORA_MODE=1 (or when numa_policy

is set to favor CLM), it will be placed in the CLM of a single domain, usually LDOM 0. (Note that in

previous versions of this white paper we reported that the fixed SGA would go into ILM when the

NUMA optimizations are engaged – such was our understanding of Oracle's intent, but empirical

evidence proves otherwise.) Thus, when LORA_MODE=1 (unless numa_policy is explicitly set to 2,

forcing everything into ILM), the only ILM that will be used by a NUMA-optimized Oracle instance will

be its shared text – i.e., the Oracle code itself, which is currently on the order of 350 MB.

Note

Oracle's shadow client processes are not NUMA optimized, because the

TNS listener process, tnslsnr (which spawns the Oracle shadow client

processes for incoming database connections), is not ccNUMA-aware.

Without explicit customization of the listener process, allocation of shadow

processes (and their private objects) will follow the same algorithm that's long been used for shadow process distribution. As a best practice, we continue to recommend the use of “mpsched -P LL -p <lsnr-pid>” as a

means of enforcing a least-loaded scheduling policy on the spawned

shadow processes. Such enforcement ensures that the shadow processes

will be distributed evenly across the available CPU resources on the host

server and that their allocation of private data will be optimized for their

resident LDOM.

Since the NUMA configuration is determined only at instance startup, Oracle cannot respond to

dynamic system reconfiguration. See “Dynamic reconfiguration considerations with ccNUMA

optimizations”, below, for further discussion.

Configuring the server and Oracle to match one another

It is important to be sure that Oracle and its host server are configured compatibly, either for NUMA-

optimized behavior or for non-optimized behavior. The default configuration for some server models is perfectly mismatched with the default Oracle configuration of certain Oracle versions (older servers

default to 100% ILM, while NUMA optimizations are enabled by default on some older Oracle

versions; Itanium 9300-based servers default to 87.5% CLM, but the NUMA optimizations are

Page 12: Numa Ldom Clm Explanation

12

disabled by default for the latest Oracle versions). We strongly recommend that the defaults never be

taken (or at least that the configurations be explicitly examined to make sure they are well aligned).

For NUMA-optimized behavior

When Oracle‟s NUMA optimizations are desired, it is critical to make sure that sufficient CLM is

allocated on the system.

The default configuration for HP cell-based servers – 0% cell-local memory – should be changed on any multi-cell server that will host a NUMA-optimized Oracle database. To take advantage of Oracle's NUMA optimizations, configure the system with both cell-local memory and interleaved memory (ILM) – Oracle uses a tiny amount of ILM even when the NUMA optimizations are engaged (and HP-UX requires some ILM as well).

When insufficient cell-local memory is available in a locality domain, the Oracle memory structures are created in interleaved memory (across all domains) and/or CLM in other localities, where they will still be accessible by Oracle's processes, but where most memory accesses will be non-local. So while the Oracle instance will function completely normally, it will still incur the overhead of its attempts to eliminate inter-domain accesses – attempts that cannot succeed in this configuration: the overhead will be wasted, and a performance penalty will result.

How much cell-local memory should be configured on a system? The answer depends largely on two

factors:

− HP-UX requirements. Starting with HP-UX version 11.31 (11i v3), the operating system itself is optimized to use cell-local rather than interleaved memory, so it is important to follow HP-UX standard recommendations for configuring cell-local memory. If you are upgrading from an earlier version of HP-UX, be sure to take into account the higher cell-local memory recommendations for 11.31.

− Combined memory requirements of all Oracle instances. As noted above, Oracle instances with NUMA optimizations enabled will request CLM in the amount of the SGA (System Global Area) for the instance. A given server should be configured with CLM at least as large as the sum of the SGAs of all NUMA-enabled Oracle instances that will be run on that server.

Additional information about CLM and ILM memory recommendations can be found in the white paper "Locality-Optimized Resource Alignment" (see the For more information section, below). Typically, HP recommends that 87.5% (7/8) of all memory be configured as CLM, to account for both OS and application needs; this is the default value for the new Itanium 9300-based Integrity servers. Moreover, the 87.5% setting is the level at which HP-UX 11.31 will set the LORA_MODE configuration variable to a value of 1, which will allow Oracle 11gR2 to engage its NUMA optimizations.
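To make the 7/8 guideline concrete, here is a small sizing sketch; all memory sizes are invented example values, not recommendations:

```shell
# Sketch: verify that CLM sized at 7/8 of physical memory can hold the
# combined SGAs of all NUMA-enabled instances. All sizes are examples.
TOTAL_MB=131072                   # 128 GB physical memory (example)
CLM_MB=$(( TOTAL_MB * 7 / 8 ))    # 87.5% CLM per the recommendation above
SGA_SUM_MB=$(( 40960 + 20480 ))   # two instances: 40 GB + 20 GB SGAs
if [ "$SGA_SUM_MB" -le "$CLM_MB" ]; then
  echo "OK: ${SGA_SUM_MB} MB of SGA fits in ${CLM_MB} MB of CLM"
else
  echo "insufficient CLM: need ${SGA_SUM_MB} MB, have ${CLM_MB} MB"
fi
```

Remember that the CLM total must also leave headroom for HP-UX's own cell-local allocations, per the HP-UX requirements above.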

In general, when setting up partitions (nPars, vPars, PSETs) which will host an Oracle database, include the minimum number of domains required to satisfy Oracle's resource requirements, and keep the domains relatively well balanced (equalize memory and processors). Be sure to allocate CLM and ILM per the instructions above. When configuring nPars/vPars, consider the physical location of the I/O cards used to access that partition's devices: it would make sense to configure your partition with dedicated cores from the locality domain(s) associated with the I/O bay(s) containing those cards, in order to avoid cross-locality accesses when processing I/O interrupts.

For non-NUMA-optimized behavior

If your Oracle instance will be run with its NUMA optimizations disabled, Oracle will not make use of

CLM at all, and will need sufficient interleaved memory (ILM) for all its data structures. In this case, be

sure to configure your system to be ILM-heavy, and, if you are running an Oracle version prior to

11gR2, set _enable_NUMA_optimization to false to ensure that the optimizations are disabled.

Oracle 11gR2 will properly detect ILM-heavy configurations (by detecting LORA_MODE equal to 0)


and automatically disable NUMA optimizations regardless of the value of the

_enable_NUMA_support parameter.

When an unoptimized Oracle instance requests space for its SGA, the space will be allocated according to the setting of numa_policy. If numa_policy is set to favor ILM (the default for LORA_MODE 0) but the system doesn't have enough ILM, the space will be allocated from the cell-local memory of the first LDOM that has it available. Likewise, if numa_policy is set to favor CLM (the default for LORA_MODE 1), the SGA will be placed in the CLM of a single LDOM. In either case, this will result in an imbalance in the available memory distribution across the LDOMs, and HP-UX will favor the remaining LDOMs whenever new processes are created. Thus, the SGA will be in one LDOM, while most of Oracle's processes will be in the other LDOMs, practically guaranteeing that most memory accesses will be non-local. Clearly, when running Oracle with its NUMA optimizations disabled, it's important to make sure that there's plenty of ILM (and that numa_policy is set to favor ILM).

Configuring your server and your Oracle instance: summary

While a CLM/ILM configuration that does not match the NUMA optimization state of the Oracle instance will not impede Oracle's proper operation, it will result in sub-optimal performance. This implies a clear best practice when deploying Oracle instances on HP ccNUMA servers: make sure that the system's configuration and Oracle's configuration match one another. Configure BOTH the server and Oracle for ccNUMA (lots of CLM; enable Oracle's optimizations) or configure them both to operate without the optimizations (lots of ILM; disable Oracle's optimizations). The best practice for Oracle Database 11gR2 is somewhat simpler: set _enable_NUMA_support to TRUE and then the instance will properly enable or disable the optimizations based on HP-UX's LORA_MODE setting. (As previously mentioned, do not assume that the default settings are optimal!)
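A minimal init.ora sketch of that 11gR2 best practice (the parameter name is taken from this paper; as an underscore parameter, it should normally be changed only with Oracle Support's concurrence):

```text
# init.ora sketch for Oracle 11gR2 on HP-UX, per this paper's best practice.
# With LORA_MODE=1 on the host, the NUMA optimizations will engage;
# with LORA_MODE=0, the instance will disable them automatically.
_enable_NUMA_support = TRUE
```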

A clear case for NUMA optimization: small vPar in large nPar

We've already discussed the situation (see Figure 4 and the paragraphs above it) of the small vPar carved out of a large nPar; now let's consider the specific case of using such a vPar as an Oracle Database server.

Recall that our example vPar consists of two of the eight cells which comprise the nPar; it's got a fair amount of CLM in each cell, and a certain amount of ILM which is interleaved across all eight cells. Assume that the remaining six cells are allocated to one or more other vPars which will be running their own workloads.

First, let's consider a non-NUMA-optimized Oracle Database instance in our vPar (see Figure 5). All of Oracle's SGA will be placed in ILM (and will thus be interleaved across all eight cells) – an Oracle process in either cell of our vPar will need to perform, on average, seven memory accesses from other cells for every one memory access within the local cell. Not only is this bad for the performance of our database instance, but all those inter-cell memory accesses will affect the workloads running in whatever vPars are assigned those other cells! Likewise, heavy workload activity in those other vPars can and will affect the performance of our database server.
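The 7:1 remote-to-local ratio follows directly from uniform interleaving: with the SGA spread evenly across eight cells, each access lands in the local cell with probability 1/8. A one-line sanity check (plain awk arithmetic, nothing HP-UX-specific):

```shell
# Sketch: expected remote-access fraction for an SGA interleaved across
# N cells -- each access is local with probability 1/N, remote otherwise.
N=8
awk -v n="$N" 'BEGIN { printf "%.1f%% of accesses are remote\n", 100*(n-1)/n }'
```

Note that the remote fraction, 87.5% for eight cells, is the same figure as HP's recommended CLM percentage – both are simply 7/8.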


Figure 5. An eight-cell nPar (or server) with a vPar, consisting of cells 0 and 1, running a non-NUMA-optimized Oracle

instance. The SGA will be located in ILM, interleaved across all eight cells (including six cells that are otherwise not part of the

vPar).

Performance of both our database and of the other applications in other cells would clearly benefit from turning our Oracle Database's NUMA optimizations on, because Oracle's SGA will be placed in the CLM of the two cells in our vPar.

Dynamic reconfiguration considerations with ccNUMA optimizations

HP server virtualization capabilities (nPars, vPars, etc) allow you to define virtualized servers as a

subset of available physical resources, and to re-define those servers dynamically; CPUs and, in some

cases, memory can be added or deleted without shutting down the operating system. When an

Oracle instance is open and running on a virtualized server with NUMA optimizations disabled, the

operating system manages the locality of all Oracle's processes and memory allocations. Thus,

dynamic changes to the locality domains of the underlying host are transparent to a non-NUMA-

optimized Oracle instance. HP-UX takes care of any necessary process or memory migration, and the

overall number of inter-domain references may be better, worse, or the same after a dynamic change.

But for a NUMA-optimized Oracle instance, dynamic reconfiguration of the underlying host can have

a significant impact. In general, dynamic reconfiguration of a server or partition that hosts a NUMA-

optimized Oracle instance will increase the likelihood of processes migrating to remote localities, thus

reducing the effectiveness of Oracle‟s NUMA optimizations. Furthermore, dynamic removal of

resources can cause an Oracle instance to crash under certain circumstances. Therefore, employing

Oracle’s NUMA optimizations on servers/partitions which will be dynamically reconfigured is not

recommended, even though it is possible under some conditions. If dynamic reconfiguration is

desirable, the recommendation is to disable Oracle's NUMA optimizations and let HP-UX handle

process locality.

Nevertheless, dynamic reconfiguration IS supported for NUMA-optimized Oracle databases, and it

can be done safely (without risk of Oracle database crashes) if certain rules are followed. These rules,

and the reasons for them, are discussed fully in a companion white paper, “Dynamic server resource

allocation with Oracle Database 10g or 11g on an HP-UX ccNUMA-based server”. If you are

considering the use of dynamic resource allocation with a NUMA-optimized Oracle database, it is

critical to read and understand the contents of that white paper.


Summary

Oracle's NUMA enhancements can improve performance when running database instances on HP

ccNUMA servers. Since the memory configuration of the underlying server is critical to the

performance of an Oracle instance, care must be exercised to ensure that the memory configuration

matches the NUMA mode of the instance (sufficient CLM must be configured for a NUMA-optimized

instance, whereas a non-NUMA instance requires sufficient ILM). Make it a point to override the

defaults (if necessary) and match the server configuration with the Oracle configuration: decide

whether you wish your Oracle instance to run with its NUMA optimizations on or off, then configure

BOTH Oracle AND your server accordingly!


Appendix: Oracle versions, default NUMA enablement

The initialization parameter which governs the enablement of Oracle's NUMA optimizations, and its default setting, depend on the Oracle release, version, and certain patches. Below is a complete list. Because of the confusion surrounding the default optimization state, HP highly recommends that the appropriate initialization parameter ALWAYS be set explicitly.

In all cases, the optimizations will not be used if only one domain (cell or socket) is visible to Oracle

when the database is started.

− 10gR2 was the first to support NUMA optimizations on HP-UX. The optimizations are controlled by the (unsupported) _enable_NUMA_optimization parameter
  − 10.2.0.3: optimizations enabled by default
  − 10.2.0.4: optimizations enabled by default BUT DO NOT WORK (bug 9668940). Optimizations should be explicitly disabled.
  − 10.2.0.5 (future): optimizations off by default; 9668940 will be fixed.
  − Oracle recommends patch 8199533 to switch optimizations off by default for 10.2.0.3 and 10.2.0.4.
− 11gR1 (11.1.0.6 and 11.1.0.7): optimizations (_enable_NUMA_optimization) enabled by default, but Oracle recommends patch 8199533 to switch optimizations off by default
− 11gR2: optimizations controlled by the (supported) parameter _enable_NUMA_support (but if the underlying host's LORA_MODE parameter is set to 0, the optimizations will not be used)
  − 11.2.0.1: optimizations disabled by default
  − When upgrading to 11gR2 from an earlier version, make sure to DELETE any references to the obsolete parameter _enable_NUMA_optimization to avoid potential issues.
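As a convenience, the list above can be collapsed into a small helper that echoes which initialization parameter applies to a given version string. The version patterns and parameter names are exactly those listed above; anything unlisted is reported as unknown:

```shell
# Sketch: map an Oracle version string to the NUMA initialization
# parameter that should be set explicitly, per the list above.
numa_param_for() {
  case "$1" in
    10.2.*) echo "_enable_NUMA_optimization" ;;   # 10gR2 (unsupported param)
    11.1.*) echo "_enable_NUMA_optimization" ;;   # 11gR1 (unsupported param)
    11.2.*) echo "_enable_NUMA_support" ;;        # 11gR2 (supported param)
    *)      echo "unknown" ;;
  esac
}
numa_param_for 11.2.0.1
```

Whichever parameter applies, the recommendation above still stands: set it explicitly rather than relying on the release's default.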


For more information

For an excellent discussion of the art of tuning applications to optimize ccNUMA performance, see the white paper "Locality-Optimized Resource Alignment" at http://docs.hp.com/en/14655/ENW-LORA-TW.pdf.

For best practices and considerations regarding the use of dynamic resource configuration with

Oracle Database, see the white paper “Dynamic server resource allocation with Oracle Database

10g or 11g on an HP-UX ccNUMA-based server”.

For specific rules and the latest information on Oracle support of dynamic reconfiguration and/or

ccNUMA optimizations, see the note entitled “Oracle Database ccNUMA support and dynamic partitioning

on HP-UX" published on Oracle's "MySupport" site (http://metalink.oracle.com; search for Document

ID 761065.1. Note: this site has membership requirements).

To help us improve our documents, please provide feedback at

http://h20219.www2.hp.com/ActiveAnswers/us/en/solutions/technical_tools_feedback.html.

© Copyright 2009 – 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Oracle is a U.S. registered trademark of Oracle Corporation. Intel and Itanium are trademarks of Intel Corporation in the U.S. and other countries.

4AA2-4194ENW, Created January 2009; Updated September 2010, Rev. #1