Living with Garbage by Gregg Donovan at Lucene/Solr Revolution 2013


DESCRIPTION

Understanding the impact of garbage collection, both at a single-node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM. At a single-node level, we will explore GC monitoring -- how to understand GC logs, how to monitor what percentage of your Solr request time is spent on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success. At a cluster level, we will review how to design for partial availability -- how to avoid sending requests to a GCing node and how to be resilient to mid-request GC pauses. For application development, we will review common memory leak scenarios in custom Solr and Lucene application code and how to detect them.


Senior Software Engineer, Etsy.com

LIVING WITH GARBAGE

Gregg Donovan

3.5 Years Solr & Lucene at Etsy.com

3 years Solr & Lucene at TheLadders.com

8+ million members

20 million items

800k+ active sellers

8+ billion pageviews per month

CodeAsCraft.etsy.com

Understanding GC

Monitoring GC

Debugging Memory Leaks

Design for Partial Availability

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;
        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}

New():
    ref <- allocate()
    if ref = null              /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null          /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)

From Garbage Collection Handbook

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
    mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)        /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)

From Garbage Collection Handbook

Trivia: Who invented the first GC and Mark-and-Sweep?

Weak Generational Hypothesis

Where do objects in common Solr application live?

AtomicReaderContext?

SolrIndexSearcher?

SolrRequest?

GC Terminology: Concurrent vs Parallel

JVM Collectors

Serial

Trivia: How does System.identityHashCode() work?

Throughput

CMS

Garbage First (G1)

Continuously Concurrent Compacting Collector (C4)

IBM, Dalvik, etc.?

Why Throughput?

Monitoring

GC time per Solr request

...
import java.lang.management.*;
...

public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean :
            ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}

Available via JMX
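To turn the JVM-wide counter above into a per-request metric, you can sample it before and after handling a request and report the delta. A minimal sketch, assuming a hypothetical request-handling hook (the class and method names here are illustrative, not Solr API):

```java
// Sketch: attribute GC pause time to a single request by sampling the
// cumulative collection time (milliseconds since JVM start) around the work.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPerRequest {
    static long getCollectionTime() {
        long collectionTime = 0;
        for (GarbageCollectorMXBean mbean :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            collectionTime += mbean.getCollectionTime();
        }
        return collectionTime;
    }

    public static void main(String[] args) {
        long gcBefore = getCollectionTime();
        long wallBefore = System.currentTimeMillis();

        // ... handle the Solr request here ...

        long gcMillis = getCollectionTime() - gcBefore;
        long wallMillis = System.currentTimeMillis() - wallBefore;
        System.out.println("GC ms during request: " + gcMillis
                + " of " + wallMillis + " total ms");
    }
}
```

Note the delta is JVM-wide, so concurrent requests on the same node share the blame for any pause that lands in their window.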

Visual GC

export GC_DEBUG="-verbose:gc \
  -XX:+PrintGCDateStamps \
  -XX:+PrintHeapAtGC \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintGCApplicationConcurrentTime \
  -XX:+PrintAdaptiveSizePolicy \
  -XX:AdaptiveSizePolicyOutputInterval=1 \
  -XX:+PrintTenuringDistribution \
  -XX:+PrintGCDetails \
  -XX:+PrintCommandLineFlags \
  -XX:+PrintSafepointStatistics \
  -Xloggc:/var/log/search/gc.log"

2013-04-08T20:14:00.162+0000: 4197.791: [Full GC
AdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
[PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen      total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to   space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen       total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen       total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}

GC Log Analyzers?

GCHisto

GCViewer

garbagecat

Graphing with Logster

github.com/etsy/logster

GC Dashboard

github.com/etsy/dashboard

YourKit.com

Designing for Partial Availability

JVMTI GC Hook?

How can a client ignore GC-ing hosts?

Server lies to clients about availability

TCP socket receive buffer

TCP write buffer

“Banner” protocol

1. Connect via TCP

2. Wait ~1-10ms

3. Either receive magic four byte header or try another host

4. Only send query after receiving header from server

0xC0DEA5CF
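The client side of the banner check can be sketched in a few lines: connect, wait briefly for the four-byte magic header, and move on to another host if it never arrives (a silent host is likely mid-GC). The host names, port, and timeout below are illustrative assumptions, not the actual Etsy implementation:

```java
// Hypothetical client-side sketch of the "banner" protocol.
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BannerClient {
    static final int MAGIC = 0xC0DEA5CF;

    // Returns a connected socket whose server has sent the banner, or null.
    static Socket tryHost(String host, int port, int timeoutMs) {
        try {
            Socket s = new Socket();
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            s.setSoTimeout(timeoutMs); // wait ~1-10ms for the banner
            int banner = new DataInputStream(s.getInputStream()).readInt();
            if (banner == MAGIC) {
                return s; // server is responsive; safe to send the query
            }
            s.close();
        } catch (IOException e) {
            // connect failed or banner never arrived: caller tries another host
        }
        return null;
    }

    public static void main(String[] args) {
        for (String host : new String[] { "shard-a", "shard-b" }) {
            Socket s = tryHost(host, 8983, 10);
            if (s != null) {
                // ... send the query on s ...
                break;
            }
        }
    }
}
```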

What if GC happens mid-request?

Backup requests

Jeff Dean: Achieving Rapid Response Time in Large

Online Services
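The backup-request idea can be sketched with standard executors: send the query to one replica, and if it has not answered within a short deadline, send the same query to a second replica and take whichever finishes first. The replica calls and the 50ms hedge delay below are stand-in assumptions:

```java
// Sketch of hedged "backup requests" using ExecutorCompletionService.
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackupRequest {
    static String queryWithBackup(Callable<String> primary,
                                  Callable<String> backup,
                                  long backupAfterMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            ExecutorCompletionService<String> ecs =
                    new ExecutorCompletionService<>(pool);
            ecs.submit(primary);
            // Wait briefly for the primary before hedging.
            Future<String> done = ecs.poll(backupAfterMs, TimeUnit.MILLISECONDS);
            if (done == null) {
                ecs.submit(backup); // primary is slow (maybe GCing): hedge
                done = ecs.take();  // first of the two to finish wins
            }
            return done.get();
        } finally {
            pool.shutdownNow();     // cancel the losing request
        }
    }

    public static void main(String[] args) throws Exception {
        String result = queryWithBackup(
                () -> { Thread.sleep(500); return "slow replica"; },
                () -> "backup replica",
                50);
        System.out.println(result);
    }
}
```

The hedge delay is typically set near a high request-latency percentile so backups fire only for stragglers, keeping the extra load small.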

Solr sharding?

Right now, only as fast as the slowest shard.

“Make a reliable whole out of unreliable parts.”

Memory Leaks

Solr API hooks for custom code

QParserPlugin

SearchComponent

SolrRequestHandler

SolrEventListener

SolrCache

ValueSourceParser

FieldType

etc.

PSA: Are you sure you need custom code?

CoreContainer#getCore()

RefCounted<SolrIndexSearcher>
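A classic leak in custom Solr code is taking a reference-counted resource (a core from CoreContainer#getCore(), a searcher via RefCounted<SolrIndexSearcher>) and never releasing it, which pins an old searcher generation and its caches in memory. Below is a minimal self-contained sketch of the idiom, modeled on but not identical to Solr's actual RefCounted class:

```java
// Illustrative RefCounted idiom: always pair the acquire with a
// decref() in a finally block, or the resource can never be closed.
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedDemo {
    static class RefCounted<T> {
        private final T resource;
        final AtomicInteger refCount = new AtomicInteger(1);

        RefCounted(T resource) { this.resource = resource; }

        T get() { return resource; }

        void incref() { refCount.incrementAndGet(); }

        void decref() {
            if (refCount.decrementAndGet() == 0) {
                // Last reference released: safe to close the resource here.
            }
        }
    }

    public static void main(String[] args) {
        RefCounted<String> searcher = new RefCounted<>("searcher-gen-1");
        searcher.incref();          // caller takes a reference
        try {
            // ... use searcher.get() to execute the query ...
        } finally {
            searcher.decref();      // always release, even on exception
        }
    }
}
```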

SolrIndexSearcher generation marking with YourKit triggers

Miscellaneous Topics

System.gc()?

-XX:+UseCompressedOops

-XX:+UseNUMA

Paging

#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults

epoch_s=$(date +%s)

echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003

Solution 1: Buy more RAM

Ideally enough RAM to:

Keep the index in OS file buffers

AND ensure no paging of JVM memory

AND whatever else happens on the box

~$5-10/GB

echo "0" > /proc/sys/vm/swappiness

mlock()/mlockall()

echo "-17" > /proc/$PID/oom_adj

Mercy from the OOM Killer

Huge Pages

-XX:+AlwaysPreTouch

Possible Future Directions

Many small VMs instead of one large VM

microsharding

In-memory Lucene codecs

E.g., a custom DirectPostingsFormat

Off-heap memory with sun.misc.Unsafe?

Try G1 again

Try C4 again

Resources

gchandbook.org

bit.ly/mmgcb

Mark Miller’s GC Bootcamp

bit.ly/giltene

Gil Tene: Understanding Java Garbage Collection

bit.ly/cpumemory

Ulrich Drepper: What Every Programmer Should Know About Memory

github.com/pingtimeout/jvm-options

Read the JVM Source (Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7

Mechanical Sympathy Google Group

bit.ly/mechsym

CONTACT

Gregg Donovan gregg@etsy.com