On the way to low latency
Artem Orobets, Smartling Inc
You mostly care about throughput
Java for low latency?
• Increasingly Java is being used to build applications with low latency requirements
• Developers should have a deeper understanding of the JVM
What is low latency?
Latency is the time interval between a stimulus and the response
What is latency?
total response time = service time + time waiting for service
Those guys consider 10µs latencies slow
We are not a trading company
Latencies of about 50 ms are barely noticeable to a human
Requirements
• We have a latency limit of 100 ms
• After this time the request is considered failed
Is that what you call low latency?
Storage
* where latency is 99th percentile
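Since the figures are quoted as 99th percentiles, here is a minimal sketch of how such a number can be derived from raw samples. It uses the nearest-rank method; the class name and the sample values are made up for illustration and are not from the talk.

```java
import java.util.Arrays;

public class Percentile {
    /** Nearest-rank percentile: the smallest sample such that at least p percent of samples are <= it. */
    public static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds.
        long[] latenciesMs = {3, 4, 4, 5, 5, 6, 7, 8, 40, 120};
        System.out.println("p50 = " + percentile(latenciesMs, 50) + " ms");
        System.out.println("p99 = " + percentile(latenciesMs, 99) + " ms");
    }
}
```

With these samples the median is small while the 99th percentile is dominated by the single slowest request, which is why tail percentiles are the more useful number under a hard latency limit.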
Context switch problem
In production we have about 4k connections opened simultaneously
Context switch problem
• Thread per request doesn't work
• Too much overhead on context switching
• Too much overhead on memory
A thread usually takes from 256 KB to 1 MB of memory for its stack space!
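A rough back-of-the-envelope calculation of the memory overhead, combining the roughly 4k connections with the 256 KB to 1 MB stack size quoted above. This is a sketch of reserved stack space only; how much of it is actually resident depends on the OS and JVM settings.

```java
public class ThreadStackEstimate {
    /** Total stack space reserved if every connection gets its own thread. */
    public static long stackBytes(int threads, long stackSizeBytes) {
        return threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        int connections = 4_000;                 // "about 4k connections" from the slides
        long low = stackBytes(connections, 256L * 1024);
        long high = stackBytes(connections, 1024L * 1024);
        long mb = 1024L * 1024;
        System.out.println("Stack space: " + low / mb + " MB to " + high / mb + " MB");
    }
}
```

That is about 1 GB to 4 GB set aside for stacks alone, before any request data is allocated.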
Great architecture in theory
But in practice it is not enough!
We fixed a lot of things that we believed were the most problematic parts.
But they weren’t.
Find evidence that supports your hypothesis
A good tool can give you a clue
• Proper logging and a log analysis tool
• Performance tests
• Monitoring
A good tool can give you a clue
KPIs are a necessity
A large number of wrappers
Significant allocation pressure
Intensive use of lazy initialization
First requests are very slow
Smoke tests
• A good practice when you have continuous delivery
• They get all your code initialized by the time real load comes in
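A minimal sketch of the warm-up idea: push synthetic requests through the handler before real traffic arrives so that lazy initialization has already been paid for. `WarmUp`, `LazyHandler` and the request string are hypothetical names for illustration, not the talk's actual code.

```java
import java.util.function.Function;

public class WarmUp {
    /** Sends synthetic requests through the handler and returns how many completed. */
    public static <Q, R> int warmUp(Function<Q, R> handler, Q syntheticRequest, int iterations) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            handler.apply(syntheticRequest);
            completed++;
        }
        return completed;
    }

    /** Hypothetical handler whose first call pays a lazy-initialization cost. */
    public static class LazyHandler implements Function<String, String> {
        private StringBuilder resource;      // stands in for any lazily created resource
        public int initializations = 0;

        @Override
        public String apply(String request) {
            if (resource == null) {          // only the first request takes this branch
                resource = new StringBuilder();
                initializations++;
            }
            return "ok:" + request;
        }
    }

    public static void main(String[] args) {
        LazyHandler handler = new LazyHandler();
        warmUp(handler, "smoke-test", 1_000);
        System.out.println("initializations before real traffic: " + handler.initializations);
    }
}
```

In a continuous delivery pipeline the smoke tests that run right after deployment can play the role of `warmUp`, provided they exercise the same code paths as production requests.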
Logging
Synchronous logging is not appropriate for an asynchronous application
Synchronous logging
82.83% <= 8 milliseconds
99.90% <= 19 milliseconds
99.94% <= 34 milliseconds
99.97% <= 39 milliseconds
99.98% <= 43 milliseconds
99.99% <= 48 milliseconds
100.00% <= 53 milliseconds
251.59 requests per second
Asynchronous logging
99.86% <= 5 milliseconds
99.91% <= 6 milliseconds
99.96% <= 7 milliseconds
99.98% <= 11 milliseconds
99.99% <= 13 milliseconds
100.00% <= 14 milliseconds
1657.28 requests per second
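The difference between the two modes can be sketched as follows: instead of writing to the log on the request thread, the request thread only enqueues the message and a dedicated writer thread does the slow I/O. This is an illustrative sketch with hypothetical names, not the setup used for the measurements above; in practice a logging framework's asynchronous appender (for example Logback's `AsyncAppender` or Log4j 2 async loggers) would usually be used instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class AsyncLogSketch implements AutoCloseable {
    // Sentinel compared by identity; tells the writer thread to stop.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue;
    private final Thread writer;

    public AsyncLogSketch(int capacity, Consumer<String> sink) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        this.writer = new Thread(() -> {
            try {
                for (String line = q.take(); line != STOP; line = q.take()) {
                    sink.accept(line);   // slow I/O happens here, off the request thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Never blocks the caller; returns false if the queue is full and the message was dropped. */
    public boolean log(String message) {
        return queue.offer(message);
    }

    /** Flushes everything queued so far and stops the writer thread. */
    @Override
    public void close() {
        try {
            queue.put(STOP);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        try (AsyncLogSketch log = new AsyncLogSketch(1024, written::add)) {
            log.log("request handled");
        }
        System.out.println(written);
    }
}
```

Dropping messages when the queue is full is a deliberate trade-off in this sketch: it keeps request latency bounded at the cost of possibly losing log lines under overload.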
Another prod issue
• Long pauses that happened quite often
• We couldn't reproduce the issue in a local setup
DNS lookups
• After hours of looking through TCP dumps
• We found that DNS lookups sometimes take more than 100 ms
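One inexpensive way to look for this kind of problem is to time name resolution directly from the JVM. The sketch below is an assumption about how that could be done, not the method used in the talk (which relied on TCP dumps); the class name and the default host are placeholders.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsLookupTimer {
    /** Returns the time spent resolving the host in nanoseconds, or -1 if it cannot be resolved. */
    public static long timeLookupNanos(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (int i = 0; i < 5; i++) {
            long micros = timeLookupNanos(host) / 1_000;
            System.out.println(host + " resolved in " + micros + " us");
        }
    }
}
```

Note that the JVM caches successful lookups (controlled by the `networkaddress.cache.ttl` security property), so repeated calls in one process mostly measure the cache; the first call and calls after the cache entry expires are the interesting ones.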
Network configuration
TCP_NODELAY
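`TCP_NODELAY` disables Nagle's algorithm, which otherwise may hold back small writes in order to batch them. In Java the option is set per socket; the helper names below are hypothetical and the sketch uses an unconnected socket so that it runs without a server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class TcpNoDelayExample {
    /** Creates an unconnected socket with Nagle's algorithm disabled. */
    public static Socket newNoDelaySocket() {
        try {
            Socket socket = new Socket();      // not connected yet
            socket.setTcpNoDelay(true);        // small writes are sent without batching delay
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether TCP_NODELAY is enabled on the given socket. */
    public static boolean isNoDelay(Socket socket) {
        try {
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newNoDelaySocket()) {
            System.out.println("TCP_NODELAY enabled: " + isNoDelay(socket));
        }
    }
}
```

Networking frameworks usually expose the same option through their own channel configuration, so in a real service it would be set there rather than on raw sockets.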
GC logging
• -Xloggc:path_to_log_file
• -XX:+PrintGCDetails
• -XX:+PrintGCDateStamps
• -XX:+PrintHeapAtGC
• -XX:+PrintTenuringDistribution
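These are JDK 8 style flags (newer JDKs use unified logging via `-Xlog:gc*` instead). To confirm that a running JVM was really started with them, one option is to inspect its input arguments at startup. This is a sketch with a hypothetical class name, not something shown in the talk.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcFlagCheck {
    // The GC logging flags listed on the slide; -Xloggc: is matched as a prefix
    // because the log path after the colon differs per deployment.
    static final List<String> EXPECTED = Arrays.asList(
            "-Xloggc:",
            "-XX:+PrintGCDetails",
            "-XX:+PrintGCDateStamps",
            "-XX:+PrintHeapAtGC",
            "-XX:+PrintTenuringDistribution");

    /** Returns the expected flags that do not appear in the given JVM arguments. */
    public static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String expected : EXPECTED) {
            boolean present = false;
            for (String arg : jvmArgs) {
                if (arg.startsWith(expected)) {
                    present = true;
                    break;
                }
            }
            if (!present) {
                missing.add(expected);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```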
-XX:+PrintGCDetails
[GC (Allocation Failure) 260526.491: [ParNew
…
[Times: user=0.02 sys=0.00, real=0.01 secs]
-XX:+PrintHeapAtGC
Heap after GC invocations=43363 (full 3):
par new generation total 59008K, used 1335K
eden space 52480K, 0% used
from space 6528K, 20% used
to space 6528K, 0% used
concurrent mark-sweep generation total 2031616K, used 1830227K
-XX:+PrintTenuringDistribution
Desired survivor size 3342336 bytes, new threshold 2 (max 2)
- age 1: 878568 bytes, 878568 total
- age 2: 1616 bytes, 880184 total
: 53829K->1380K(59008K), 0.0083140 secs] 1884058K->1831609K(2090624K), 0.0084006 secs]
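The last line of the log entry can be read as simple arithmetic: the first pair of numbers is young generation occupancy before and after the collection, the second pair is whole-heap occupancy. Whatever left the young generation without leaving the heap was promoted to the old generation. A small sketch of that calculation, using the numbers from the log above (the class and method names are illustrative):

```java
public class GcLogArithmetic {
    /** KB promoted to the old generation during one young collection. */
    public static long promotedKb(long youngBefore, long youngAfter, long heapBefore, long heapAfter) {
        long leftYoung = youngBefore - youngAfter;   // reclaimed + promoted
        long leftHeap = heapBefore - heapAfter;      // reclaimed only
        return leftYoung - leftHeap;
    }

    public static void main(String[] args) {
        // 53829K->1380K(59008K) and 1884058K->1831609K(2090624K) from the log line above
        long promoted = promotedKb(53829, 1380, 1884058, 1831609);
        System.out.println("left young gen: " + (53829 - 1380) + "K, promoted: " + promoted + "K");
    }
}
```

For this particular collection both differences are 52449K, so nothing was promoted.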
Too many alive objects during young gen GC
• Minimize survivors
• Watch the tenuring threshold; you might need to tune it to tenure long-lived objects faster
• Reduce NewSize
• Reduce survivor spaces
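To make the tenuring numbers in the earlier log less opaque, here is a rough sketch of how HotSpot arrives at them, as far as I understand it: the desired survivor size is the survivor space multiplied by `TargetSurvivorRatio` (50% by default), and the new threshold is the first age at which the cumulative size of surviving objects exceeds that value, capped by the maximum tenuring threshold. Treat it as an approximation for reading logs, not as the exact JVM implementation.

```java
public class TenuringSketch {
    /** Desired survivor size in bytes for a survivor space and a target ratio in percent. */
    public static long desiredSurvivorBytes(long survivorSpaceBytes, int targetSurvivorRatioPercent) {
        return survivorSpaceBytes * targetSurvivorRatioPercent / 100;
    }

    /** bytesByAge[0] holds the bytes of age 1 objects, bytesByAge[1] age 2, and so on. */
    public static int newThreshold(long[] bytesByAge, long desiredSurvivorBytes, int maxThreshold) {
        long total = 0;
        int age = 1;
        for (long bytes : bytesByAge) {
            total += bytes;
            if (total > desiredSurvivorBytes) {
                break;                       // survivors up to this age no longer fit
            }
            age++;
        }
        return Math.min(age, maxThreshold);
    }

    public static void main(String[] args) {
        long desired = desiredSurvivorBytes(6528L * 1024, 50);   // 6528K survivor space from the log
        int threshold = newThreshold(new long[]{878568, 1616}, desired, 2);
        System.out.println("desired survivor size = " + desired + " bytes, new threshold = " + threshold);
    }
}
```

With the values from the log this reproduces "Desired survivor size 3342336 bytes, new threshold 2 (max 2)". The related HotSpot options are `-XX:MaxTenuringThreshold`, `-XX:NewSize` and `-XX:SurvivorRatio`.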
Watch your GC
*time span is 2h
Watch your GC
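Besides GC logs, collection counts and accumulated pause time can be read from inside the JVM through the standard management beans and fed into whatever monitoring is already in place. A minimal sketch, with a hypothetical class name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {
    /** One line per collector: its name, number of collections and accumulated time in ms. */
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Sampling this periodically and plotting the deltas gives a picture of GC activity over time without parsing log files.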