Memory Management and Latching

integrid.info 1/34Tanel Põder

Memory Management and Latching Improvements in

Oracle9i and 10g

Tanel Põder

independent consultant

http://integrid.info

Hotsos Symposium 2005


Introduction & About• Name: Tanel Põder

• Oracle experience: 7 years

• Occupation: independent consultant

• Company: integrid.info

• Member of Oaktable Network

• Oracle Certified Master DBA

• Europe, Middle-East & Africa Oracle User Group BoD member


Performance and Scalability• Performance

– Avoiding unneccessary work

• Scalability– Avoiding blocking others doing the same

operation (serialization)

• Sometimes sacrifices are needed in one area to gain in other– Direct-path insert

– Redo copy latches


Speed of Data Access• CPU registers

• CPU cache– Caches code, data, also TLB for virtual memory

management

• RAM– Directly accessible for local processes

• External storage / network– May have cache on controller or disk array

– Access requires system calls


Memory As Intermediate Workarea• Need to use it as efficiently as possible

– Despite it is “cheap”

– Allocating precisely required amounts – heap management

– Reusing unused memory – LRU and free lists

• Protect memory from corruptions and inconsistent reads in multiuser env.– Some kind of atomic locking/serializationis

needed – Oracle uses latches for it


Oracle Memory Management• Kernel Shared Memory - KSM

– KSMG – kernel shared memory granule allocation

• Generic Heap Manager – KGH– Shared Pool

• Is shared so concurrently accessible for many processes – latching is needed

• Contains library cache, which is managed by KGL

• PGA– No shared access, usually private memory

– No latching needed


Dynamic SGA• SGA_MAX_SIZE defines max size of SGA

– All virtual memory is allocated immediately (not physical memory)

– Beware when using ISM, lock_sga, _lock_sga_areas or pre_page_sga

– Problems with non-functioning DISM

• Memory allocated in granules– 4 or 16 MB depending whether sga_max_size

> 128MB on Unix

– 4 or 8 MB


Automatic SGA Tuning• SGA_TARGET parameter for specifying total

SGA size

• MMAN process acts as SGA Memory broker

• May resize various components of SGA

• DBA should be very cautious with this feature– Does it give any benefit?

– What MMAN decides to reduce (and flush) half of buffer cache during heavy operations?

– Even more dangerous with shared pool...


Realfree PGA Heap Management• _use_realfree_heap=true makes Oracle to

allocate CGA and UGA heaps independently (as top-level heaps), not to PGA heap– 9.2 new feature– Is set automatically to true when

pga_aggregate_target is set– Default = true in 10g

• mmap() is used instead malloc() and brk()• _realfree_heap_pagesize_hint (10g)

– Bytes to preallocate at a time• _use_ism_for_pga (10g)

– Use ISM for large PGA extent allocation


Realfree Heap Mgmt 2• ksmarfg() and ksmfrfg() internal functions

are used for realfree heap allocation• strace (truss) output of PGA mem allocationmmap2(NULL,1048576,..,MAP_PRIVATE|MAP_NORESERVE, 8,0xb6)= 0xb68c6000mmap2(0xb68c6000, 65536,.., MAP_PRIVATE|MAP_FIXED, 8, 0) = 0xb68c6000mmap2(0xb68d6000, 65536,.., MAP_PRIVATE|MAP_FIXED, 8, 0) = 0xb68d6000mmap2(0xb68e6000, 65536,.., MAP_PRIVATE|MAP_FIXED, 8, 0) = 0xb68e6000mmap2(0xb68f6000, 65536,.., MAP_PRIVATE|MAP_FIXED, 8, 0) = 0xb68f6000mmap2(0xb6906000, 65536,.., MAP_PRIVATE|MAP_FIXED, 8, 0) = 0xb6906000 ^^^^^^^ ^^^ ^^^^^^^ ^^^^ ^ map to address length private copy fix addr filedescr to map

$ ls -l /proc/13786/fd/8lr-x------ 1 oracle dba 64 Nov 3 01:41 /proc/13786/fd/8 -> /dev/zero• _pga_large_extent_size, _uga_cga_large_extent_size – set

large extent size for initial mmap()


_use_realfree_heap=true• Heapdump level 1 (PGA)HEAP DUMP heap name="top call heap" desc=0xbb56660 extent sz=0x213c alt=96 het=32767 rec=0 flg=2 opc=2 parent=(nil) owner=(nil) nex=(nil) xsz=0x7fffcEXTENT 0 addr=0xb6946004 Chunk b694600c sz= 948 freeable "callheap "

ds=0xbb56f80 Chunk b69463c0 sz= 208 perm "perm " alo=208 Chunk b6946490 sz= 129904 free " "EXTENT 1 addr=0xb6a80004 Chunk b6a8000c sz= 424 perm "perm " alo=424 Chunk b6a801b4 sz= 107268 free " " Chunk b6a9a4b8 sz= 4164 freeable "callheap "

ds=0xbb56f80 Chunk b6a9b4fc sz= 4164 freeable "callheap "

ds=0xbb56f80 Chunk b6a9c540 sz= 1352 freeable "callheap "

ds=0xbb56f80 Chunk b6a9ca88 sz= 1072 freeable "callheap "

ds=0xbb56f80


_use_realfree_heap=false• Heapdump level 1HEAP DUMP heap name="pga heap" desc=0xbb58c80 extent sz=0x213c alt=88 het=32767 rec=0 flg=3 opc=2 parent=(nil) owner=(nil) nex=(nil) xsz=0xe23cEXTENT 0 addr=0xbc87c80...EXTENT 11 addr=0xbc30d88 Chunk bc30d90 sz= 3140 freeable "session heap "

ds=0xbbd3884 Chunk bc319d4 sz= 124 free " " Chunk bc31a50 sz= 1072 freeable "callheap "

ds=0xbb56f80 Chunk bc31e80 sz= 4164 freeable "callheap "

ds=0xbb56f80...EXTENT 16 addr=0xbbf2e60 Chunk bbf2e68 sz= 4336 free " " Chunk bbf3f58 sz= 4164 freeable "callheap "

ds=0xbb56f80


Automatic PGA Management• workarea_size_policy

– Set to manual for sessions needing manual workarea control (*_area_size parameters)

• pga_aggregate_target

• _smm_max_size 5% of total aggregate PGA

• _smm_px_max_size 30% of total agg. PGA

• _smm_retain_size (in kB)– wrkarea mem to retain for shared server sessions

• _smm_freeable_retain (in kB)– workarea memory to retain for dedicated servers


V$PGASTATSQL> select * from v$pgastat;NAME VALUE UNIT----------------------------------------- ---------- --------aggregate PGA target parameter 25165824 bytesaggregate PGA auto target 4194304 bytesglobal memory bound 1257472 bytestotal PGA inuse 36829184 bytestotal PGA allocated 85028864 bytesmaximum PGA allocated 86201344 bytestotal freeable PGA memory 3276800 bytesPGA memory freed back to OS 524288 bytestotal PGA used for auto workareas 0 bytesmaximum PGA used for auto workareas 423936 bytestotal PGA used for manual workareas 0 bytesmaximum PGA used for manual workareas 0 bytesover allocation count 1136bytes processed 15126528 bytesextra bytes read/written 0 bytescache hit percentage 100 percent


X$QESMMSGASQL> select qesmmsganm name, qesmmsgamu multiplier, 2 qesmmsgavl value, qesmmsgaun unit 3 from x$qesmmsga where qesmmsgavs = 0;NAME MULTIPLIER VALUE UNIT----------------------------------------- ---------- ---------- ----------total expected memory 1024 0 0drifting from expected memory 1024 0 0PGA max size parameter 1024 204800 0workarea SGA retain size 1024 40 0active processes 1 19 2max active processes 1 20 2percentage freeable memory to be released 100 0 3recompute count (queries) 1 0 2recompute count (total) 1 715 2last recompute time 1 131 1maximum recompute time 1 208 1cumulated IMM deamon time 1 89482 1maximum IMM deamon time 1 296 1cumulated v$pga_advice simulation time 1 2731 1workarea simulated count 1 0 2interrupt v$pga_advice simulation count 1 0 2skip v$pga_advice simulation count 1 0 2BUG count in v$pga_advice 1 0 2


ISM in Solaris• Intimate Shared Memory

• Allocated in 4MB pages, less PTEs needed

• Locked into physical memory– Therefore virtual <-> physical memory mapping

information – pagetable entries can be shared among all processes using shared segment

– Be careful with SGA_MAX_SIZE (or use DISM)

• VM mappings are likely usable in CPU TLB cache also after context switch

• Statistic: OS Other system trap CPU time ?


Hugepages In Linux• Used with use_indirect_data_buffers=true• VM is allocated in 2-4MB pages• Less PTEs/TLB usage – better CPU cache

utilization possible• Can't be paged out• Requires physically contiguous memory for

pages$ cat /proc/meminfoHugePages_Total: 0HugePages_Free: 0Hugepagesize: 4096 kB

Kernel 2.6 and RHAS 3.0's 2.4.21


Relieving Shared Pool Contention• If all else fails...

• Shared pool can be split into multiple separate areas, each protected by a different latch– If shared pool > 256M and cpu_count >= 4

– Can be set using _kghdsidx_count parameter (max. value 7)

– x$kghlu shows one row per shared pool area (+ 1 for java pool, if defined)

– Every shared pool area is protected by it's own shared pool child latch (max 7 latches)


Relieving Shared Pool Contention• Streams pool

– streams_pool_size

– If not defined, streams objects stored in shared pool • may take up to total 10% of shared pool

• and max 50% of a single shared pool area (9.2.0.5)

• Use Large pool if needed (naturally)– _large_pool_min_alloc defaults to 16000

– Fragmentation is relieved by aligning allocations to fixed boundaries dependent on above parameter


Redo Logging• Old feature: redo copy latches

– _log_simultaneous_copies

– Got in immediate mode

– Cycled through all latches if needed

• Multiple redolog strands for 16+ CPUs– log_parallelism (9i, in 10g _log_parallelism)

– Single log buffer, but...

– Multiple redo allocation child latches

– Got in immediate mode

– For log switch, all redo allocation latches have to be got


Redo Logging 2• Dynamic parallelism in 10g

– _log_parallelism_max

– _log_parallelism_dynamic = true | false

SQL> select name, gets,misses, immediate_gets ig, immediate_misses im 2 from v$latch_children where name = 'redo allocation';NAME GETS MISSES IG IM-------------------- ---------- ---------- ---------- ----------redo allocation 2799 8 398127 4redo allocation 136 0 0 0redo allocation 77 0 0 0redo allocation 207 0 0 0redo allocation 57 0 0 0redo allocation 57 0 0 0...redo allocation 24 0 0 0

31 rows selected.


Redo Logging 3• Private redolog strands (zero-copy redo)

• No latching is needed for redo entry writing since log space is preallocated– _log_private_parallelism = true | false

– _log_private_parallelism_mul (10% of sessions ?)

– _log_private_mul (5% of logbuffer is private)

• Issues– Incompatibilities with redolog dependent tools

– Logminer, Logical Standby, Streams

– Currently used for benchmarks only


In-Memory Undo (IMU)• Saving undo information in memory areas

protected by separate In-memory undo latches, not in undo segments–No expensive buffer gets and changes–Can be downgraded to a regular

transaction (flushed to undo segments)–Can also be used in conjunction with

private redo strands• Could be set for recursive transactions as

well – using hidden parameter–_recursive_imu_transactions (false in 10g)


In-Memory undo 2• _in_memory_undo – 10g NF, true by default

• _imu_pools – In-memory undo pools

• _db_writer_flush_imu – if false DBWR will not downgrade IMU txns

SQL> select name, gets, misses, immediate_gets 2 from v$latch_children where name like '%undo%';

NAME GETS MISSES IMMEDIATE_GETS---------------------- ---------- ---------- --------------undo global data 4618 0 0In memory undo latch 2568 0 302In memory undo latch 656 0 18In memory undo latch 19 0 4In memory undo latch 4 0 0In memory undo latch 3 0 0


In-Memory Undo 3SQL> select * from v$sysstat;NAME CLASS VALUE----------------------------------------------- ---------- ----------user commits 1 100039commit cleanouts 8 100217commit cleanouts successfully completed 8 100215IMU commits 128 97625...IMU Flushes 128 8IMU contention 128 7IMU recursive-transaction flush 128 0IMU undo retention flush 128 0IMU ktichg flush 128 0IMU bind flushes 128 0IMU mbu flush 128 0IMU pool not allocated 128 2407IMU CR rollbacks 128 0IMU undo allocation size 128 102861548IMU Redo allocation size 128 7136IMU- failed to get a private strand 128 2407


Latching• A latch is a memory area which is used for

atomic locking and concurrency control

• Latches are more than just one-bit switches– They contain additional bits storing various

control information

• Latches are aligned to match hardware cache line– A latch takes 32 bytes (platform specific)

– Important on SMP systems – changing one latch state wont invalidate any other latch status in other CPU's cache


Shared Latches• Feature actually available since 8.0

• Used in few locations (AQ)

• Maintained using compare-and-swap (CAS) CPU instruction (or a separate CAS latch)– Zero holders – latch lock value is zero L = 0

– First shared mode get increments by 1 L = 1

– Second shared get increments by 1 L = 2

– Latch free decrements by 1 L = 1

• Exclusive gets have to wait until L = 0

• Principle is similar to Unix semaphores


Shared CBC Latches• Reduce cache buffers chains latch contention

Locate hash bucket using ROWID & Block type

Get CBC latch in shared mode

Scan through hash chain to find best suitable buffer hdr.

Upgrade CBC latch to exclusive mode

Pin buffer header in shared mode

Release CBC latch

Copy buffer contents into PGA

Get CBC latch in exclusive mode

Unpin buffer header

Release CBC latch

Read and evaluate rows from buffer's consistent copy in PGA


Shared CBC Latches 2• Exclusive CBC latch gets are required despite

“read-only” operationSQL> select * from v$sysstat where name like 'shared hash

latch%';STATISTIC# NAME VALUE---------- ------------------------------------ ----------- 103 shared hash latch upgrades - no wait 2565999715 104 shared hash latch upgrades - wait 0

• Compare-and-swap latching can be emulatedSQL> select latch#, name, gets, immediate_gets from v$latch 2 where name like '%cas latch%' 3 / LATCH# NAME GETS IMMEDIATE_GETS---------- ---------------------- ---------- -------------- 207 cas latch 0 0 208 rm cas latch 0 0


SCN Updating• Normal SCN consists of 6 bytes

– 4 bytes base

– 2 bytes wrap#

• In 32bit systems it allowed latchless SCN incrementation (CAS update)– When 4 bytes capacity has been used, a latch is

got, wrap is updated and base is zeroed

• Modern Intel 32bit chips allow latchless 64-bit atomic updates– Probably other platforms as well


SCN Updating 2• Latchless updates not so latchless... :(

– A 10000 single-row updates with commits in loop

– One of the reasons is when > 255 changes are made within a block under same SCN, the SCN has to be increased (seq# in block header)

NAME GETS MISSES SLEEPS------------------------------ ---------- ---------- ----------mostly latch-free SCN 237937 14 14lgwr LWN SCN 37566 0 0redo on-disk SCN 117268 0 0

• redo on-disk SCN latch gets remains zero if– _disable_latch_free_SCN_writes_via_64cas = false

(default)


New Wait Events In 10gSQL> select name from v$event_name where name like 'latch:%'...NAME----------------------------------------latch: In memory undo latchlatch: KCL gc element parent latchlatch: MQL Tracking Latchlatch: cache buffer handleslatch: cache buffers chainslatch: cache buffers lru chainlatch: checkpoint queue latchlatch: enqueue hash chainslatch: gcs resource hashlatch: ges resource hash listlatch: latch wait listlatch: library cachelatch: library cache locklatch: library cache pinlatch: messages...26 rows selected. New events also for enqueues...


Summary• Lots of new features get implemented each

new release– Without our knowing– Undocumented, but here to stay

• Usage dependent on hardware and OS facilities– So, raw CPU speed might not be most important

after all in some cases• Oracle and Unix provide diagnostics utilities

for testing each features impact out– v$views, dumps, truss, switchable params, etc..


Questions?

Thank you!

http://integrid.info

[email protected]

Memory Management and Latching Improvements In Oracle9i and 10g

Documents

Memory Management and Latching