Artem Orobets, «On the way to low latency»

On the way to low latency

Artem Orobets, Smartling Inc

You mostly care about throughput

Java for low latency?

• Increasingly Java is being used to build applications with low latency requirements

• Developers should have a deeper understanding of the JVM

What is low latency?

Latency is the time interval between a stimulus and the response to it

What is latency?

total response time = service time + time waiting for service

Those guys consider 10µs latencies slow

We are not a trading company

Latencies of about 50 ms are barely noticeable to a human

Requirements

• We have a latency limit of 100 ms

• After this time the request is considered failed

Is that what you call low latency?

Storage

* where latency is 99th percentile
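Since latency here is reported as the 99th percentile, a minimal sketch of how such a figure can be computed from raw samples may help. It uses the nearest-rank definition, which is only one of several common percentile definitions; the class and method names are illustrative and not from the talk.

```java
import java.util.Arrays;

final class LatencyPercentile {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With 100 samples, `percentile(samples, 99)` returns the 99th smallest value, so a single extreme outlier does not move the reported number.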

Context switch problem

In production we have about 4k connections open simultaneously

Context switch problem

• Thread per request doesn’t work

• Too much overhead on context switching

• Too much overhead on memory

Usually a thread takes from 256 KB to 1 MB of memory for its stack space!
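A rough back-of-the-envelope check of the memory point, assuming exactly 4,000 threads (the slides say "about 4k" connections) and the 256 KB to 1 MB stack range quoted above. The class and method names are illustrative, and the figure is the space set aside for stacks, not necessarily memory that is actually touched.

```java
final class ThreadStackBudget {
    /** Total stack space for the given number of threads, in bytes. */
    static long stackBytes(int threads, long stackSizeBytes) {
        return (long) threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        int threads = 4_000;            // assumption: one thread per open connection
        long low = 256L * 1024;         // 256 KB stack
        long high = 1024L * 1024;       // 1 MB stack
        long mib = 1024L * 1024;
        System.out.println("low:  " + stackBytes(threads, low) / mib + " MB");   // 1000 MB
        System.out.println("high: " + stackBytes(threads, high) / mib + " MB");  // 4000 MB
    }
}
```

So thread-per-request at this connection count would set aside roughly 1 to 4 GB for stacks alone, before any heap is used.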

Great architecture in theory

But in practice it is not enough!

We fixed a lot of things that we believed were the most problematic parts.

But they weren’t.

Find evidence that proves your hypothesis

A good tool can give you a clue

• Proper logging and log analysis tool

• Performance tests

• Monitoring

A good tool can give you a clue

KPIs are a necessity

A large number of wrappers

Significant allocation pressure

Intensive use of lazy initialization

First requests are very slow

Smoke tests

• A good practice when you have continuous delivery

• It ensures all your code is initialized by the time real load comes in
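A minimal sketch of the idea, under the assumption that a smoke test simply pushes synthetic requests through the same code path that real traffic will use. `LazyResource` is a hypothetical stand-in for any lazily initialized dependency; none of these names come from the talk.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SmokeWarmUp {
    /** Hypothetical lazily initialized dependency; counts how often the expensive init runs. */
    static final class LazyResource {
        final AtomicInteger initCount = new AtomicInteger();
        private volatile String value;

        String get() {
            String v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        initCount.incrementAndGet(); // stands in for slow first-use work
                        v = "ready";
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    /** Sends synthetic requests through the handler before real traffic arrives. */
    static void warmUp(Supplier<String> handler, int requests) {
        for (int i = 0; i < requests; i++) {
            handler.get();
        }
    }
}
```

After `warmUp(resource::get, 100)` the initialization has already run once, so the first real request does not pay for it.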

Logging

Synchronous logging is not appropriate for an asynchronous application

Synchronous logging

82.83% <= 8 milliseconds
99.90% <= 19 milliseconds
99.94% <= 34 milliseconds
99.97% <= 39 milliseconds
99.98% <= 43 milliseconds
99.99% <= 48 milliseconds
100.00% <= 53 milliseconds

251.59 requests per second

Asynchronous logging

99.86% <= 5 milliseconds
99.91% <= 6 milliseconds
99.96% <= 7 milliseconds
99.98% <= 11 milliseconds
99.99% <= 13 milliseconds
100.00% <= 14 milliseconds

1657.28 requests per second
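The difference between the two runs comes from taking log I/O off the request path. The talk does not say which logging library was used, so the following is only a sketch of the general technique: request threads enqueue messages and a single background thread writes them. `AsyncLogger` and its `sink` parameter are illustrative names, and dropping messages when the queue is full is an assumed policy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class AsyncLogger implements AutoCloseable {
    // Unique sentinel instance, compared by identity, that tells the writer to stop.
    private static final String POISON = new String("__stop__");

    private final BlockingQueue<String> queue;
    private final Thread writer;

    AsyncLogger(int capacity, Consumer<String> sink) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        this.writer = new Thread(() -> {
            try {
                for (String msg = q.take(); msg != POISON; msg = q.take()) {
                    sink.accept(msg); // slow I/O happens here, off the request thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Never blocks the caller; returns false (message dropped) when the queue is full. */
    boolean log(String message) {
        return queue.offer(message);
    }

    /** Lets the writer drain everything queued so far, then stops it. */
    @Override
    public void close() {
        try {
            queue.put(POISON);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The bounded queue is a deliberate trade-off: under overload the logger loses messages instead of stalling request threads.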

Another prod issue

• Long pauses that happened quite often

• We couldn’t reproduce the issue in a local setup

DNS lookups

• After hours of looking through TCP dumps

• We found that DNS lookups sometimes take more than 100 ms
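One way to get this kind of evidence without TCP dumps is to time lookups directly against the latency budget. This is a sketch, not what the team actually did: `DnsLookupTimer` is a hypothetical helper, and the resolver is passed in as a function so that it can be exercised without a network. `InetAddress.getByName` is the standard JDK lookup call.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

final class DnsLookupTimer {
    /** Result of one timed lookup. */
    static final class Timed {
        final String host;
        final long elapsedMillis;
        final boolean overBudget;

        Timed(String host, long elapsedMillis, boolean overBudget) {
            this.host = host;
            this.elapsedMillis = elapsedMillis;
            this.overBudget = overBudget;
        }
    }

    /** Runs the resolver once and reports whether it exceeded the budget. */
    static <T> Timed time(String host, Function<String, T> resolver, long budgetMillis) {
        long start = System.nanoTime();
        resolver.apply(host);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed(host, elapsedMillis, elapsedMillis > budgetMillis);
    }

    /** Resolver backed by the JDK; wraps the checked exception for use as a Function. */
    static InetAddress jdkResolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `DnsLookupTimer.time(host, DnsLookupTimer::jdkResolve, 100)` flags any lookup that alone would consume the whole 100 ms budget.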

Network configuration

TCP_NODELAY
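TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being held back to be coalesced. In Java the option is exposed as `Socket.setTcpNoDelay`. The helper class below is an illustrative wrapper, not code from the talk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class NoDelaySockets {
    /** Enables TCP_NODELAY on the socket and returns it for chaining. */
    static Socket withNoDelay(Socket socket) {
        try {
            socket.setTcpNoDelay(true);
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the option back, which is useful for verifying the configuration. */
    static boolean isNoDelay(Socket socket) {
        try {
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is more, smaller packets on the wire in exchange for lower per-message latency.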

GC logging

• -Xloggc:path_to_log_file

• -XX:+PrintGCDetails

• -XX:+PrintGCDateStamps

• -XX:+PrintHeapAtGC

• -XX:+PrintTenuringDistribution

-XX:+PrintGCDetails

[GC (Allocation Failure) 260526.491: [ParNew

…

[Times: user=0.02 sys=0.00, real=0.01 secs]

-XX:+PrintHeapAtGC

Heap after GC invocations=43363 (full 3):
par new generation total 59008K, used 1335K
eden space 52480K, 0% used
from space 6528K, 20% used
to space 6528K, 0% used
concurrent mark-sweep generation total 2031616K, used 1830227K

-XX:+PrintTenuringDistribution

Desired survivor size 3342336 bytes, new threshold 2 (max 2)

- age 1: 878568 bytes, 878568 total

- age 2: 1616 bytes, 880184 total

: 53829K->1380K(59008K), 0.0083140 secs] 1884058K->1831609K(2090624K), 0.0084006 secs]
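The two `before->after(capacity)` pairs in the ParNew line are enough to estimate how much was promoted to the old generation: whatever left the young generation without leaving the heap. The parser below is a sketch that assumes exactly this line layout; `ParNewLine` is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParNewLine {
    // Matches occurrences such as "53829K->1380K(59008K)".
    private static final Pattern USAGE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    final long youngBeforeK, youngAfterK, heapBeforeK, heapAfterK;

    private ParNewLine(long yb, long ya, long hb, long ha) {
        youngBeforeK = yb;
        youngAfterK = ya;
        heapBeforeK = hb;
        heapAfterK = ha;
    }

    /** First match is taken as the young generation, second as the whole heap. */
    static ParNewLine parse(String line) {
        Matcher m = USAGE.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("no young generation figures");
        long yb = Long.parseLong(m.group(1));
        long ya = Long.parseLong(m.group(2));
        if (!m.find()) throw new IllegalArgumentException("no heap figures");
        long hb = Long.parseLong(m.group(1));
        long ha = Long.parseLong(m.group(2));
        return new ParNewLine(yb, ya, hb, ha);
    }

    long youngFreedK() { return youngBeforeK - youngAfterK; }

    long heapFreedK() { return heapBeforeK - heapAfterK; }

    /** Left the young generation but stayed in the heap, i.e. promoted. */
    long promotedK() { return youngFreedK() - heapFreedK(); }
}
```

For the line above, the young generation and the whole heap both shrank by 52449K, so this particular collection promoted nothing.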

Too many live objects during young gen GC

• Minimize survivors

• Watch the tenuring threshold; you might need to tune it to tenure long-lived objects faster

• Reduce NewSize

• Reduce survivor spaces

Watch your GC

*time span is 2h

Watch your GC
