© C2B2 Consulting Limited 2013 All Rights Reserved
Java Middleware Surgery
Andy Overton &
Mike Croft
Expert Support Team
Introduction
• We will look at two scenarios
• One involves server slowdown and Out Of Memory errors
• The other involves consuming JMS messages from a remote queue, with messages disappearing
Scenario 1
• A customer has to restart their servers regularly because they slow down and become unresponsive, and OutOfMemoryError appears in the logs
• Restarting the server fixes the problem
Out Of Memory Errors
• Two types
– Catastrophic: a rapid rise in memory usage, an OOME occurs and the server crashes. Often happens daily.
– Long running: a gradual slowdown over time (days), eventually causing an OOME.
What to do?
• Gather information
• Analyse the information
• Diagnose issues
• Resolve the issues
Information Gathering
• Verbose GC output
• Heap dumps
• Server Logs
• Stack Traces
• Details of system changes
Gathering verbose GC data
-verbose:gc
-Xloggc:path_to_log/gc.log
-XX:+PrintGCDetails - causes additional information about the collections to be printed
-XX:+PrintGCTimeStamps - will add a time stamp at the start of each collection. This is useful to see how frequently garbage collections occur
Gathering Heap Dump data
• Make sure the JVM is set to write a heap dump on OutOfMemoryError
• This is not a default setting on Sun’s JVM!
• This can be done by adding the following JVM params:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=path_to_dump_files/java_pid<pid>.hprof
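One way to confirm that the heap dump options are active is to ask the running JVM itself. The following is a minimal sketch, assuming a HotSpot-based JVM (it relies on the `com.sun.management` API, which other JVMs may not provide); the class name `HeapDumpFlagCheck` is purely illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name); assumes a HotSpot-based JVM.
final class HeapDumpFlagCheck {

    // Returns the current value of a -XX option as reported by the JVM.
    // Throws IllegalArgumentException if the JVM does not know the option.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

If the first value prints as `false`, the flag was not picked up (for example, because it was written with `-` instead of `+`).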
Gathering Heap Dump Data manually
• Get the process ID of the running server:
jps -lv
• You should see something similar to this: 3171 weblogic.Server -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m .........
• Use jmap to take a snapshot
jmap -dump:format=b,file=dump1.bin 3171
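Where attaching jmap to the process is not practical, a similar snapshot can be requested from inside the JVM. This is a sketch only, assuming a HotSpot-based JVM; the class name `HeapDumper` and the output file name are illustrative and not part of the original procedure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (hypothetical name); assumes a HotSpot-based JVM.
final class HeapDumper {

    // Writes a heap dump of the current JVM to `target`.
    // liveOnly = true dumps only reachable objects.
    // The target file must not already exist; recent JDKs also expect
    // the file name to end in ".hprof".
    static Path dump(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path written = dump(Paths.get("dump1.hprof"), true);
        System.out.println("Heap dump written to " + written.toAbsolutePath());
    }
}
```

The resulting file can be opened with the same heap analysis tools as a jmap dump.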
Gathering stack trace data
• Again, retrieve the process id using jps
• Basic command for getting a stack trace and outputting it to a file
jstack -l <pid> > jstack-output.txt
• Take a series of snapshots, one per second for at least a minute, while the slowdown is occurring
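The idea of a timed series of snapshots can also be sketched in-process using the standard `java.lang.management` API. This is an illustration of the sampling pattern rather than a replacement for jstack; the class name `ThreadSnapshots` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): samples the threads of the current JVM.
final class ThreadSnapshots {

    // Takes `count` thread dumps, pausing `intervalMillis` between consecutive dumps.
    static List<ThreadInfo[]> capture(int count, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<ThreadInfo[]> snapshots = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Request lock details only where the JVM supports them.
            snapshots.add(threads.dumpAllThreads(
                    threads.isObjectMonitorUsageSupported(),
                    threads.isSynchronizerUsageSupported()));
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop sampling early if interrupted
                }
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // 60 snapshots, one per second.
        for (ThreadInfo[] snapshot : capture(60, 1000)) {
            for (ThreadInfo thread : snapshot) {
                // ThreadInfo.toString() prints a shortened stack trace.
                System.out.print(thread);
            }
        }
    }
}
```

Comparing consecutive snapshots shows which threads stay stuck in the same place over time, which a single dump cannot reveal.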
Analysing the data – GC Logs
• The GC logs show details of every garbage collection since the server started
• The files are human readable
• Example:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
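Each line in the example reads as: heap occupancy before the collection, occupancy after it, the total heap size in parentheses, and the pause time. The sketch below parses that simple format; it assumes lines exactly like the three shown (output produced with additional detail flags has more fields and would need a different pattern), and the class name `GcLogLine` is illustrative.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser (hypothetical name) for the simple GC log format shown above.
final class GcLogLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    final boolean fullGc;    // true for "Full GC" lines
    final long beforeKb;     // heap occupancy before the collection
    final long afterKb;      // heap occupancy after the collection
    final long heapKb;       // total heap size
    final double pauseSecs;  // time the collection took

    private GcLogLine(boolean fullGc, long beforeKb, long afterKb,
                      long heapKb, double pauseSecs) {
        this.fullGc = fullGc;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
        this.pauseSecs = pauseSecs;
    }

    // Returns an empty Optional for lines that do not match the simple format.
    static Optional<GcLogLine> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new GcLogLine(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5))));
    }

    // Memory freed by this collection.
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        parse("[Full GC 267628K->83769K(776768K), 1.8479984 secs]").ifPresent(gc ->
                System.out.println((gc.fullGc ? "Full" : "Non-full") + " collection reclaimed "
                        + gc.reclaimedKb() + "K in " + gc.pauseSecs + " secs"));
    }
}
```

Tracking `afterKb` across successive full collections is a simple way to spot a leak: if the occupancy after each Full GC keeps rising, less memory is being recovered each time.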
GCViewer - Standard Behaviour
GCViewer - Heap Exhaustion
Analysing the data – Heap Dump
• A heap dump contains information about all Java objects alive at a given point in time
• Not human readable
• Eclipse Memory Analyzer Tool
• Helps in finding memory leaks and discovering which objects are taking up the most memory
Eclipse MAT - Overview
MAT – Histogram View
MAT – Dominator View
Analysing the data – Stack Trace
• ThreadLogic
• Gives a quick view of thread health levels and details about each thread
• Thread groups help bunch related threads together
ThreadLogic – Summary View
ThreadLogic – Advisory Map
ThreadLogic – Details View
System changes
• Have you deployed any new applications to the server?
• Any increased load to the system?
• Any updates to the system?
• Are there any fixes or patches related to memory or performance that you are missing?
Prevention
• Audit all system changes and be prepared to roll back if necessary
• Ensure you log everything if an OOME occurs
• Use monitoring tools to monitor system behaviour and set up alerts so you’re forewarned of any anomalous behaviour
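As a rough illustration of the alerting idea, the JVM exposes its own heap usage through the standard `java.lang.management` API, which monitoring tools can poll. The sketch below is not a substitute for a real monitoring product; the class name `HeapWatch` and the 80% threshold are assumptions chosen for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative helper (hypothetical name): a very small heap usage check.
final class HeapWatch {

    // Fraction of the heap currently in use, between 0.0 and 1.0.
    static double usedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to committed memory.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    // True when usage has reached the alert threshold.
    static boolean shouldAlert(double usedFraction, double threshold) {
        return usedFraction >= threshold;
    }

    public static void main(String[] args) {
        double used = usedFraction();
        System.out.printf("Heap in use: %.1f%%%n", used * 100);
        if (shouldAlert(used, 0.80)) { // 0.80 is an example threshold, not a recommendation
            System.out.println("WARNING: heap usage above threshold");
        }
    }
}
```

A single reading says little on its own, since usage naturally climbs between collections; a sustained upward trend over many readings is the signal worth alerting on.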
Scenario 2: Problematic JMS
• Consuming messages from a remote queue
• Messages getting lost
• Network exceptions in logs
Problematic JMS
• Do you care if messages get lost?
• Can the remote producer be trusted?
• How many (physical) network hops?
Problematic JMS
• Use a message bridge
– More reliable than you can code yourself
– Makes adding reliability much easier
Problematic JMS
• How complex is your scenario?
– Do you process single units of work over multiple messages?
– Do you need to load balance JMS across multiple servers?
Problematic JMS
• Which provider should you use?
– Apache ActiveMQ
– Apache Camel
– WebLogic