
© C2B2 Consulting Limited 2013 All Rights Reserved

Java Middleware Surgery

Andy Overton & Mike Croft

Expert Support Team


Introduction

• Going to look at 2 scenarios or problems

• One related to issues with server slowdown and Out Of Memory errors

• One related to consuming JMS messages from a remote queue and messages disappearing


Scenario 1

• A customer has to restart their servers regularly because they slow down and become unresponsive, and OutOfMemoryErrors appear in the logs

• Restarting the server fixes the problem


Out Of Memory Errors

• Two types

– Catastrophic – Rapid rise in memory usage, OOME occurs and server crashes. Often daily.

– Long running – Gradual slowdown over time (days), eventually causing an OOME.


What to do?

• Gather information

• Analyse the information

• Diagnose issues

• Resolve the issues


Information Gathering

• Verbose GC output

• Heap dumps

• Server Logs

• Stack Traces

• Details of system changes


Gathering verbose GC data

-verbose:gc

-Xloggc:path_to_log/gc.log

-XX:+PrintGCDetails - causes additional information about the collections to be printed

-XX:+PrintGCTimeStamps - will add a time stamp at the start of each collection. This is useful to see how frequently garbage collections occur
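The collection counts and times that these flags write to the log can also be sampled from inside the JVM through the standard `java.lang.management` API. The following is a minimal sketch, not a replacement for the GC log; the class name `GcStats` is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarises GC activity for the current JVM.
class GcStats {

    // Returns one line per collector: name, number of collections, total time in ms.
    static List<String> summarise() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            lines.add(String.format("%s: count=%d, timeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        summarise().forEach(System.out::println);
    }
}
```

Sampling these figures periodically gives a rough view of how often collections occur, but the GC log remains the more detailed source.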


Gathering Heap Dump data

• Make sure the JVM is set to produce a heap dump on OutOfMemoryError

• This is not a default setting on Sun’s JVM!

• This can be done by adding the following JVM params:

-XX:+HeapDumpOnOutOfMemoryError

-XX:HeapDumpPath=path_to_dump_files/java_pid<pid>.hprof
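To confirm that a running server really has the heap dump option switched on, the flag can be read from inside the JVM. This is a sketch that assumes a HotSpot-based JVM, since `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific; the class name `HeapDumpFlagCheck` is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: inspects (and optionally enables) HeapDumpOnOutOfMemoryError.
class HeapDumpFlagCheck {

    private static final String FLAG = "HeapDumpOnOutOfMemoryError";

    private static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // Returns "true" or "false" as reported by the JVM.
    static String currentValue() {
        return bean().getVMOption(FLAG).getValue();
    }

    // Tries to switch the flag on at runtime; returns false if the JVM
    // does not allow this option to be changed after start-up.
    static boolean enable() {
        VMOption option = bean().getVMOption(FLAG);
        if (!option.isWriteable()) {
            return false;
        }
        bean().setVMOption(FLAG, "true");
        return Boolean.parseBoolean(currentValue());
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + currentValue());
    }
}
```

Setting the flag on the command line is still the safer choice, because it then applies from the moment the server starts.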


Gathering Heap Dump Data manually

• Get the process ID of the running server:

jps -lv

• You should see something similar to this: 3171 weblogic.Server -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m .........

• Use jmap to take a snapshot

jmap -dump:format=b,file=dump1.bin 3171
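Where running `jmap` against the server is not possible, a similar snapshot can be triggered from code inside the JVM. This is a sketch that assumes a HotSpot-based JVM; the class name `HeapDumper` and the temporary directory are illustrative choices only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a heap dump of the current JVM to a temp directory.
class HeapDumper {

    // liveOnly = true dumps only reachable objects, which keeps the file smaller.
    static Path dumpToTempDir(boolean liveOnly) {
        try {
            Path dir = Files.createTempDirectory("heapdumps");
            // The target file must not exist yet; newer JDKs also require the .hprof suffix.
            Path file = dir.resolve("dump-" + System.currentTimeMillis() + ".hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written to " + dumpToTempDir(true));
    }
}
```

As with `jmap`, taking a dump pauses the application while the heap is written, so it is best done when the problem is actually occurring.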


Gathering stack trace data

• Again, retrieve the process id using jps

• Basic command for getting a stack trace and outputting it to a file

jstack -l <pid> > jstack-output.txt

• Best to take a series of snapshots, once per second for at least a minute when slowdown occurs
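The idea of taking a series of snapshots at a fixed interval can also be sketched in-process with `ThreadMXBean`. This is an illustration only and not a substitute for `jstack`; the class name `ThreadDumpSeries` and the output layout are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects several thread dumps of the current JVM.
class ThreadDumpSeries {

    // One snapshot of every live thread, including full stack traces.
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true = also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Takes 'count' snapshots, sleeping 'intervalMillis' between them.
    static List<String> series(int count, long intervalMillis) {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(snapshot());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        // Short default for a quick demo; pass 60 to cover a full minute.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        List<String> dumps = series(count, 1000);
        System.out.println("Collected " + dumps.size() + " snapshots");
        System.out.println(dumps.get(0));
    }
}
```

Comparing consecutive snapshots shows which threads stay stuck in the same frames while the slowdown is in progress.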


Analysing the data – GC Logs

• The GC logs show details of every garbage collection since the server started

• The files are human readable

• Example:

[GC 325407K->83000K(776768K), 0.2300771 secs]

[GC 325816K->83372K(776768K), 0.2454258 secs]

[Full GC 267628K->83769K(776768K), 1.8479984 secs]
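Each line reads as: heap used before the collection, heap used after it, total heap size in brackets, then the pause time. A small parser makes this concrete. It is a sketch that handles only the simple format shown above (output produced with `-XX:+PrintGCDetails` is richer), and the class name `GcLogLine` is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines such as:
// [GC 325407K->83000K(776768K), 0.2300771 secs]
class GcLogLine {

    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;    // true for a "Full GC" line
    final long beforeKb;   // heap used before the collection
    final long afterKb;    // heap used after the collection
    final long heapKb;     // total heap size
    final double seconds;  // pause time

    private GcLogLine(boolean full, long beforeKb, long afterKb, long heapKb, double seconds) {
        this.full = full;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
        this.seconds = seconds;
    }

    // Returns null when the line is not in the simple format.
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new GcLogLine("Full GC".equals(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), Double.parseDouble(m.group(5)));
    }

    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        GcLogLine gc = parse("[Full GC 267628K->83769K(776768K), 1.8479984 secs]");
        System.out.println("full=" + gc.full + ", reclaimedKb=" + gc.reclaimedKb()
                + ", seconds=" + gc.seconds);
    }
}
```

If the "after" figure keeps climbing from one Full GC to the next, the heap is filling with objects that cannot be collected, which points towards a leak.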


GCViewer - Standard Behaviour


GCViewer - Heap Exhaustion


Analysing the data – Heap Dump

• A heap dump contains information about all Java objects alive at a given point in time

• Not human readable

• Eclipse Memory Analyzer Tool

• Helps in finding memory leaks and discovering which objects are taking up the most memory


Eclipse MAT - Overview


MAT – Histogram View


MAT – Dominator View


Analysing the data – Stack Trace

• Threadlogic

• Quickly understand the health levels and get details about threads

• Thread groups help in bunching together related threads


Threadlogic – Summary View


Threadlogic – Advisory Map


Threadlogic – Details View


System changes

• Have you deployed any new applications to the server?

• Any increased load to the system?

• Any updates to the system?

• Are there any fixes or patches related to memory or performance that you are missing?


Prevention

• Audit all system changes and be prepared to roll back if necessary

• Ensure you log everything if an OOME occurs

• Use monitoring tools to monitor system behaviour and set up alerts so you’re forewarned of any anomalous behaviour
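As a very small illustration of the kind of check a monitoring tool performs, heap usage can be compared against a threshold. This is a sketch only: the class name `HeapWatch` and the 80% threshold are arbitrary assumptions, and a real setup would use a dedicated monitoring product.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how full the heap is and whether to raise an alert.
class HeapWatch {

    // Fraction of the heap currently in use, between 0.0 and 1.0.
    static double usedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when no maximum is defined
        long limit = max > 0 ? max : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    static boolean shouldAlert(double usedFraction, double threshold) {
        return usedFraction >= threshold;
    }

    public static void main(String[] args) {
        double used = usedFraction();
        // 0.8 is an illustrative threshold, not a recommendation.
        System.out.printf("Heap used: %.1f%%, alert: %b%n", used * 100, shouldAlert(used, 0.8));
    }
}
```

A single reading says little on its own; the trend over time, especially the level the heap returns to after each Full GC, is what gives early warning.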


Problematic JMS

• Consuming messages from a remote queue

• Messages getting lost

• Network exceptions in logs


Problematic JMS

• Do you care if messages get lost?

• Can the remote producer be trusted?

• How many (physical) network hops?


Problematic JMS

• Use a message bridge

– More reliable than you can code yourself

– Makes adding reliability much easier


Problematic JMS

• How complex is your scenario?

– Do you process single units of work over multiple messages?

– Do you need to load balance JMS across multiple servers?


Problematic JMS

• Which provider should you use?

– Apache ActiveMQ

– Apache Camel

– WebLogic