Matthew Kilner, IBM Java L3 Service Core team lead
23rd September 2013
Debugging Native Heap OOM - Tools & Techniques
Important Disclaimers
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES. ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE. IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE. IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: - CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
About me
Matthew Kilner
Worked at IBM for 13 years on IBM Java:
Memory Management
Class Sharing
RAS
Currently leading the Core customer support team.
Contact info [email protected]
Twitter: @IBMJTC
Youtube: IBM_JTC
Visit the IBM booth #5112 and meet other IBM developers at JavaOne 2013
An understanding of what we mean by the Native Heap.
A clear problem determination path for Native Heap OOM errors:
How to determine that you have a native heap OOM.
An outline process for determining what is causing it.
What should you get from this talk?
All applications run within the bounds of an operating system process; Java is no exception.
The JVM is subject to the same restrictions as any other application, the most pertinent being:
Addressing
OS Memory Model
The Java process
Every process has a finite address space which is dictated by the architecture it runs on.
A 32-bit architecture has an addressable range of 2^32 bytes:
0x00000000 to 0xFFFFFFFF
which is 4 GiB
A 64-bit architecture has an addressable range of 2^64 bytes:
0x0000000000000000 to 0xFFFFFFFFFFFFFFFF
which is 16 EiB
What do we mean by addressing restrictions?
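The arithmetic behind these ranges can be checked with a minimal sketch. The class and method names are illustrative only and are not part of any IBM tooling.

```java
import java.math.BigInteger;

final class AddressSpace {
    /** Number of addressable bytes for a given pointer width: 2^bits. */
    static BigInteger addressableBytes(int pointerBits) {
        return BigInteger.ONE.shiftLeft(pointerBits);
    }

    /** Highest address in the range, formatted as zero-padded hex. */
    static String highestAddress(int pointerBits) {
        BigInteger max = addressableBytes(pointerBits).subtract(BigInteger.ONE);
        return String.format("0x%0" + (pointerBits / 4) + "X", max);
    }

    public static void main(String[] args) {
        BigInteger gib = BigInteger.ONE.shiftLeft(30); // 2^30 bytes
        BigInteger eib = BigInteger.ONE.shiftLeft(60); // 2^60 bytes
        System.out.println("32-bit: " + addressableBytes(32).divide(gib)
                + " GiB, top address " + highestAddress(32));
        System.out.println("64-bit: " + addressableBytes(64).divide(eib)
                + " EiB, top address " + highestAddress(64));
    }
}
```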
Not all addressable memory is available to a process.
The operating system has its own requirements such as:
The kernel
The runtime support libraries
Requirements vary by operating system both in terms of:
How much memory is needed, and
Where that memory is located
The addressable memory remaining is often referred to as User Space
What do we mean by Memory Model Restrictions?
A view by platform
The chart shows default maximum user space available on common 32-bit platforms
Java Heap
Just In Time (JIT) Data
Runtime data & executable code
Virtual Machine Resources
RAS engines & GC Infrastructure
Native & JNI Allocations
Resources to underpin Java Objects
Classes and ClassLoaders
Threads
Direct java.nio.ByteBuffers
Sockets
What goes in the User Space
[Diagram: the process address space split into Kernel Space and User Space; User Space holds the Java Heap, VM Resources, Native & JNI Allocations, Java Libraries and JIT Data]
We define the native heap as:
Native Heap = User Space - Maximum Heap Size
It is the total User Space not reserved for backing the Java Heap.
The Native heap
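The definition above is a simple subtraction, sketched here. The 2 GiB user space and 1536 MiB maximum heap are hypothetical values chosen for illustration, not measurements from a real system.

```java
final class NativeHeapEstimate {
    static final long MIB = 1L << 20;
    static final long GIB = 1L << 30;

    /** Native Heap = User Space - Maximum Heap Size (all values in bytes). */
    static long nativeHeapBytes(long userSpaceBytes, long maxHeapBytes) {
        if (maxHeapBytes > userSpaceBytes) {
            throw new IllegalArgumentException("maximum heap cannot exceed user space");
        }
        return userSpaceBytes - maxHeapBytes;
    }

    public static void main(String[] args) {
        long userSpace = 2 * GIB;   // assumption: a 32-bit process with 2 GiB of user space
        long maxHeap = 1536 * MIB;  // assumption: a maximum heap size of 1536 MiB
        System.out.println("Native heap: "
                + nativeHeapBytes(userSpace, maxHeap) / MIB + " MiB");
    }
}
```

The larger the maximum heap setting, the less address space remains for everything else listed under User Space.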
User Space availability is not our only consideration when looking at native memory shortage.
Machines have to be able to back addressable memory with physical memory.
The total memory available to back it on a machine is:
Physical RAM + Swap Space
Problem symptoms vary based on which resource runs out.
Not quite the whole story
The chart shows some of the symptoms you see when a particular resource is exhausted
Failure Symptoms
Resource                           Symptoms
Address Space                      OutOfMemoryError; Console Messages; Crash/Other?
Physical RAM                       Memory Pages; Unresponsive apps
Physical + Swap (Virtual Memory)   OutOfMemoryError; Linux: OOM Killer; Win/Sol: alloc's fail
Detecting a problem is easy if you hit one of the symptoms described previously.
OutOfMemoryErrors should be fatal to your application.
Paging will cause obvious unresponsiveness or slowdown.
Early detection is possible if you monitor the size of your process.
Monitoring is also an important part of understanding any native memory issue.
How do I know if I have a problem?
Process sizes are reported in two ways across all platforms:
Resident Size
Virtual Size
Each platform has its own methods for obtaining this information:
Windows: Performance Monitor
Linux & z/OS: ps
AIX: svmon
The IBM Garbage Collection & Memory Visualizer (GCMV) tool provides scripts and instructions in its help documentation for gathering the necessary data
Monitoring the size of your process
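As a Linux-only illustration, the sketch below reads the same two figures from /proc/self/status rather than from ps. This is a swapped-in technique, not the GCMV scripts; VmSize (virtual size) and VmRSS (resident size) are the kernel's field names, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProcessSize {
    /** Extracts VmSize and VmRSS (both in kB) from the text of /proc/<pid>/status. */
    static Map<String, Long> parseStatus(List<String> lines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : lines) {
            if (line.startsWith("VmSize:") || line.startsWith("VmRSS:")) {
                String[] parts = line.trim().split("\\s+"); // e.g. ["VmSize:", "2048000", "kB"]
                String name = parts[0].substring(0, parts[0].length() - 1); // drop the colon
                sizes.put(name, Long.parseLong(parts[1]));
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status is not available on this platform");
            return;
        }
        Map<String, Long> sizes = parseStatus(Files.readAllLines(status));
        System.out.println("Virtual size (kB):  " + sizes.get("VmSize"));
        System.out.println("Resident size (kB): " + sizes.get("VmRSS"));
    }
}
```

Logging these two values at a fixed interval gives the raw data for the plots discussed next.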
Analysis of the process size is best done visually.
GCMV and Performance Monitor will plot the raw data for you.
A persistent growth in the virtual size of your process may indicate an issue.
Plotting the size of your process
When any OOME occurs the IBM JVM's default configuration will write a javacore file.
This file provides several pointers to the fact you have an OOM related to the native heap:
A header that includes information on which resource cannot be allocated
Details of the current memory usage on the Java heap
A record of recent Garbage Collection activity
The stack of the thread encountering the problem
How do I identify a Native OOM from a javacore?
At the top of each javacore is its header which tells you what event caused the file to be written.
Under certain conditions additional information is written at the head of the javacore when the OOME is triggered:
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
On a Java Heap OOM you will see:
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received
The javacore header
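A first-pass check of the header line might look like the following sketch. It assumes the only detail string that positively identifies a Java heap OOM is "Java heap space", as in the example above; any other OutOfMemoryError detail is treated as a possible native OOM that needs the further checks described in this talk.

```java
final class JavacoreHeader {
    enum Cause { JAVA_HEAP, POSSIBLE_NATIVE, NOT_OOM }

    /** Classifies a 1TISIGINFO line from the top of a javacore. */
    static Cause classify(String sigInfoLine) {
        boolean isOom = sigInfoLine.startsWith("1TISIGINFO")
                && sigInfoLine.contains("java/lang/OutOfMemoryError");
        if (!isOom) {
            return Cause.NOT_OOM;
        }
        // Assumption: "Java heap space" is the detail string written for a Java heap OOM.
        return sigInfoLine.contains("\"Java heap space\"")
                ? Cause.JAVA_HEAP
                : Cause.POSSIBLE_NATIVE;
    }

    public static void main(String[] args) {
        String threadFailure = "1TISIGINFO Dump Event \"systhrow\" (00040000) Detail "
                + "\"java/lang/OutOfMemoryError\" "
                + "\"Failed to create a thread: retVal -1073741830, errno 11\" received";
        System.out.println(classify(threadFailure));
    }
}
```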
Within the javacore you will find the MEMINFO section.
0SECTION MEMINFO subcomponent dump routine
NULL =================================
1STHEAPFREE Bytes of Heap Space Free: 3B3AE8
1STHEAPALLOC Bytes of Heap Space Allocated: 400000
If the bytes free value is large relative to the bytes allocated (here 0x3B3AE8 of 0x400000), it is a good indicator that you are experiencing a native heap OOME.
The javacore heap usage summary
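The two values are hexadecimal, so it helps to convert them before judging how much of the heap is free. This is a minimal sketch that assumes the line layout shown in the example; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeapSummary {
    private static final Pattern FREE =
            Pattern.compile("^1STHEAPFREE\\s+Bytes of Heap Space Free:\\s*([0-9A-Fa-f]+)");
    private static final Pattern ALLOC =
            Pattern.compile("^1STHEAPALLOC\\s+Bytes of Heap Space Allocated:\\s*([0-9A-Fa-f]+)");

    private static long hexField(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        return Long.parseLong(m.group(1), 16); // javacore reports these values in hex
    }

    static long freeBytes(String line) { return hexField(FREE, line); }

    static long allocatedBytes(String line) { return hexField(ALLOC, line); }

    /** Percentage of the allocated Java heap that is free. */
    static double percentFree(String freeLine, String allocLine) {
        return 100.0 * freeBytes(freeLine) / allocatedBytes(allocLine);
    }

    public static void main(String[] args) {
        System.out.printf("%.1f%% of the Java heap is free%n", percentFree(
                "1STHEAPFREE Bytes of Heap Space Free: 3B3AE8",
                "1STHEAPALLOC Bytes of Heap Space Allocated: 400000"));
    }
}
```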
The javacore file also contains a snapshot of the most recent GC activity.
Where an OOM is due to a java heap allocation failing you will see this in the data:
1STGCHTYPE GC History
3STHSTTYPE 14:12:41:476340000 GMT j9mm.101 - J9AllocateIndexableObject() returning NULL! 8000024 bytes requested for object of class 00007FD4801D6E10 from memory space 'Flat' id=00007FD480046EE8
If this entry is not present it is another good indicator you are experiencing a native heap OOME.
The javacore GC history
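Scanning the GC history for such an entry could be sketched as below. Matching on the text "returning NULL!" is an assumption based on the single example above; other allocation-failure tracepoints may be worded differently.

```java
import java.util.Arrays;
import java.util.List;

final class GcHistoryCheck {
    /** True if any GC history entry records a Java heap allocation returning NULL. */
    static boolean hasJavaHeapAllocationFailure(List<String> javacoreLines) {
        for (String line : javacoreLines) {
            // Assumption: failure entries carry the text "returning NULL!" as in the sample.
            if (line.startsWith("3STHSTTYPE") && line.contains("returning NULL!")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList(
                "1STGCHTYPE GC History",
                "3STHSTTYPE 14:12:41:476340000 GMT j9mm.101 - J9AllocateIndexableObject() "
                        + "returning NULL! 8000024 bytes requested for object of class "
                        + "00007FD4801D6E10 from memory space 'Flat' id=00007FD480046EE8");
        System.out.println("Java heap allocation failure recorded: "
                + hasJavaHeapAllocationFailure(history));
    }
}
```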
The current thread within the javacore is the thread which triggered the OOME.
The top stack frame contains the interesting data.
You can identify whether the frame is native:
3XMTHREADINFO "main" J9VMThread:0xB8D1C600, j9thread_t:0xB8D019E4, java/lang/Thread:0x98B01960, state:R, prio=5
...............
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/lang/Thread.startImpl(Native Method)
4XESTACKTRACE at java/lang/Thread.start(Thread.java:891)
Or Java:
3XMTHREADINFO "main" J9VMThread:0x00007FD480043D00, j9thread_t:0x00007FD4800079B0, ...............
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at StringOOM.main(StringOOM.java:11)
If it is native then this is another good indicator you are experiencing a native heap OOME.
The javacore current thread
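The top-frame check can be automated with a small sketch like this one. It assumes the caller has already extracted the current thread's lines from the javacore, and that native frames end with "(Native Method)" as in the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class CurrentThreadFrame {
    /** Returns the first 4XESTACKTRACE line of a thread section, if there is one. */
    static Optional<String> topFrame(List<String> threadSection) {
        for (String line : threadSection) {
            if (line.startsWith("4XESTACKTRACE")) {
                return Optional.of(line.trim());
            }
        }
        return Optional.empty();
    }

    /** True if the top Java frame of the thread is a native method. */
    static boolean topFrameIsNative(List<String> threadSection) {
        return topFrame(threadSection)
                .map(frame -> frame.endsWith("(Native Method)"))
                .orElse(false);
    }

    public static void main(String[] args) {
        List<String> thread = Arrays.asList(
                "3XMTHREADINFO3 Java callstack:",
                "4XESTACKTRACE at java/lang/Thread.startImpl(Native Method)",
                "4XESTACKTRACE at java/lang/Thread.start(Thread.java:891)");
        System.out.println("Top frame is native: " + topFrameIsNative(thread));
    }
}
```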
It can be tricky to attribute a root cause to a Native OOME.
Debug capabilities vary by platform.
The fundamental approach is the same irrespective of platform:
1) Understand the rate of Native Memory Growth
2) Capture multiple snapshots of data over time.
3) Compare the snapshots and attribute growth to components.
How do I find out what is causing my native OOME?
The rate of memory growth can be determined from the size of your process.
Calculate the delta in virtual size of your process between data snapshots.
Other data snapshots will identify different areas of native memory growth. Understanding the proportion each area contributes to the total growth is key to identifying a root cause.
Understanding the rate of memory growth
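The calculation is simple enough to sketch. The names and sample figures below are illustrative only; real values come from your own process-size snapshots.

```java
import java.time.Duration;

final class GrowthRate {
    /** Delta in virtual size between two snapshots, in bytes. */
    static long delta(long earlierBytes, long laterBytes) {
        return laterBytes - earlierBytes;
    }

    /** Average growth in bytes per hour over the interval between two snapshots. */
    static double bytesPerHour(long earlierBytes, long laterBytes, Duration interval) {
        return delta(earlierBytes, laterBytes) * 3600.0 / interval.getSeconds();
    }

    /** Fraction of the total process growth that one area accounts for. */
    static double share(long areaDelta, long processDelta) {
        if (processDelta <= 0) {
            throw new IllegalArgumentException("process did not grow between snapshots");
        }
        return (double) areaDelta / processDelta;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        // Hypothetical snapshots taken 30 minutes apart.
        double rate = bytesPerHour(900 * mib, 960 * mib, Duration.ofMinutes(30));
        System.out.println("Growth: " + rate / mib + " MiB/hour");
        System.out.println("Share of one area: " + share(45 * mib, 60 * mib));
    }
}
```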
Some data is common across platforms, other data is platform specific
Common data:
Javacore files taken at regular intervals
Core files taken at regular intervals (optional)
Platform specific data:
Windows: UMDH tracing, DebugDiag tracing, VMMAP tracing
Linux: No recommended tools
AIX: Debug malloc tracing
What other data is needed?
From the J9 2.4 JVM the javacore file contains a NATIVEMEMINFO section.
NATIVEMEMINFO subcomponent dump routine
=======================================
JRE: 555,698,264 bytes / 1208 allocations
+--VM: 552,977,664 bytes / 856 allocations
| +--Classes: 1,949,664 bytes / 92 allocations
| +--Memory Manager (GC): 547,705,848 bytes / 146 allocations
| | +--Java Heap: 536,875,008 bytes / 1 allocation
| | +--Other: 10,830,840 bytes / 145 allocations
| +--Threads: 2,660,804 bytes / 104 allocations
| | +--Java Stack: 64,944 bytes / 9 allocations
| | +--Native Stack: 2,523,136 bytes / 11 allocations
| | +--Other: 72,724 bytes / 84 allocations
| +--Trace: 92,464 bytes / 208 allocations
| +--JVMTI: 17,328 bytes / 13 allocations
| +--JNI: 15,944 bytes / 32 allocations
| +--Port Library: 6,824 bytes / 56 allocations
| +--Other: 528,788 bytes / 205 allocations
+--JIT: 1,748,808 bytes / 82 allocations
| +--JIT Code Cache: 524,320 bytes / 1 allocation
| +--JIT Data Cache: 524,336 bytes / 1 allocation
| +--Other: 700,152 bytes / 80 allocations
+--Class Libraries: 971,792 bytes / 270 allocations
| +--Harmony Class Libraries: 1,024 bytes / 1 allocation
| +--VM Class Libraries: 970,768 bytes / 276 allocations
| | +--sun.misc.Unsafe: 69,688 bytes / 1 allocation
| | +--Other: 901,080 bytes / 275 allocations
Comparing the output from multiple javacores can identify areas of growth in the JVM.
If an identified area is a significant portion of the total memory growth it is likely the cause of the problem.
The javacore NATIVEMEMINFO section
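Comparing NATIVEMEMINFO output from two javacores can be sketched as follows. The parser assumes the simplified layout shown on the slide, with the javacore line tags already stripped; the class name and the path-style keys such as JRE/VM/Classes are illustrative choices, not a javacore convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NativeMemInfoDiff {
    // Optional "| " indentation, optional "+--" branch marker, a name, then the byte count.
    private static final Pattern ENTRY =
            Pattern.compile("^([| ]*)(\\+--)?([^:+|]+):\\s*([\\d,]+) bytes");

    /** Maps each category path (for example "JRE/VM/Classes") to its size in bytes. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        List<String> path = new ArrayList<>();
        List<Integer> indents = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.find()) {
                continue; // headings and separators
            }
            // The root entry has no branch marker; treat it as shallower than any branch.
            int indent = (m.group(2) == null) ? -1 : m.group(1).length();
            // Pop entries that are siblings of, or deeper than, the current one.
            while (!indents.isEmpty() && indents.get(indents.size() - 1) >= indent) {
                indents.remove(indents.size() - 1);
                path.remove(path.size() - 1);
            }
            indents.add(indent);
            path.add(m.group(3).trim());
            sizes.put(String.join("/", path), Long.parseLong(m.group(4).replace(",", "")));
        }
        return sizes;
    }

    /** Growth in bytes of every category present in the later snapshot. */
    static Map<String, Long> growth(Map<String, Long> earlier, Map<String, Long> later) {
        Map<String, Long> deltas = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : later.entrySet()) {
            deltas.put(e.getKey(), e.getValue() - earlier.getOrDefault(e.getKey(), 0L));
        }
        return deltas;
    }
}
```

Calling growth(parse(first), parse(second)) on the NATIVEMEMINFO lines of two javacores yields a per-category delta that can be set against the growth in process size.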
If your JDK is based on a JVM earlier than the J9 2.4 JVM, the javacore file does not have a NATIVEMEMINFO section.
These javacores still contain valuable insight, although a little more work is required to obtain it:
The MEMINFO subcomponent dump routine lists a summary of memory blocks the JDK has allocated for various purposes.
The Classes subcomponent dump routine lists a summary of classloaders and loaded classes.
By parsing and comparing this information across multiple javacore files you can determine whether there are any signs of growth in a memory area or of a classloader leak.
Javacores from earlier JDK versions
Binary core files provide the same view as the javacore but require processing with external tools:
Interactive Diagnostic Dump Explorer (IDDE).
On earlier JDK versions they provide a more accurate summary of JDK memory allocations than the javacore file.
They also provide a complete image of the process, which means:
You can inspect the content of memory
You can inspect free memory blocks (subject to platform)
You can inspect allocated memory blocks (subject to platform)
These advantages are offset by the size of the files and additional overhead of processing them.
What core files offer
Windows offers three excellent options for understanding your native memory growth:
UMDH
DebugDiag
VMMAP
Each has distinct usage characteristics:
UMDH is command line driven.
DebugDiag injects a tracking library into the process and parses core dumps.
VMMAP launches the application you wish to track and is GUI based.
Windows tooling
Commercial tools are available but carry a license fee
Free tools also exist but carry a large performance overhead
We have custom-built tooling that logs all calls to allocate and free memory.
Building your own is possible.
Linux tooling
AIX provides a debug extension built directly into the malloc subsystem: MALLOCDEBUG
Enables tracing in the allocation subroutines
At termination of the process a report is generated detailing all allocations that were not freed
Some additional parsing is needed
AIX tooling
While each platform has different tools, the end result from them is largely the same
The tools give you one or more stack traces that relate to memory allocations that have not been freed.
You are looking for the stacks that demonstrate the same or similar rates of growth as the total process size between snapshots of data.
What the platform tooling tells you
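Once the tool output has been reduced to outstanding bytes per allocation stack, picking out the stacks that track the process growth can be sketched as below. The input format (a map from a stack identifier to unfreed bytes) and the minimum-share threshold are assumptions made for illustration; each platform tool needs its own parsing step first.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class StackGrowth {
    /**
     * Returns the stacks whose growth between two snapshots is at least
     * minShare of the process growth, largest growth first.
     */
    static List<String> suspects(Map<String, Long> earlier, Map<String, Long> later,
                                 long processDelta, double minShare) {
        List<Map.Entry<String, Long>> growing = new ArrayList<>();
        for (Map.Entry<String, Long> e : later.entrySet()) {
            long delta = e.getValue() - earlier.getOrDefault(e.getKey(), 0L);
            if (processDelta > 0 && (double) delta / processDelta >= minShare) {
                growing.add(new AbstractMap.SimpleEntry<>(e.getKey(), delta));
            }
        }
        growing.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> entry : growing) {
            names.add(entry.getKey());
        }
        return names;
    }
}
```

A stack that accounts for most of the process growth over several snapshot pairs is the one to take into the next step.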
The next step depends on the stack that has been identified as the root cause.
If it is native code you own:
Check that you are releasing the memory you are allocating
If it is native code relating to a Java class:
Check that you don't have an on-heap leak of the related object type
Check for known issues
Contact the JDK vendor for assistance
If it is third-party native code:
Check for any known issues
Contact the vendor for assistance
What next?
The process for diagnosis and root cause determination for a native OOME is as follows:
1) Understand the limitations of the platform
2) Monitor the size of the process to understand the rate of memory growth
3) Use a combination of JDK and platform diagnostics to determine the area or stack driving the growth
In Summary
Visit the IBM booth #5112
BOF 4159 - The Most Useful Tools for Debugging on Windows Today: 9/23/13 (Monday) 7:30 PM - Hilton - Continental Ballroom 6
Thanks for the memory http://www.ibm.com/developerworks/java/library/j-nativememory-linux/
https://www.ibm.com/developerworks/java/library/j-nativememory-aix/
I would like to know more
Questions?
IBM@JavaOne
http://ibm.co/JavaOne2013
2013 IBM Corporation
Default maximum user space on common 32-bit platforms (chart data):
Operating System        User Space (GiB)   Kernel Space (GiB)
Windows 32              2                  2
Windows 32 /3GB         3                  1
Linux 32 bit            3                  1
Linux 32 bit Hugemem    4                  4
zLinux 31               2                  2
AIX 32                  3.25               0.75
zOS                     1.7                0.3