LIVING WITH GARBAGE
Gregg Donovan
Senior Software Engineer, Etsy.com
3.5 Years Solr & Lucene at Etsy.com
3 years Solr & Lucene at TheLadders.com
8+ million members
20 million items
800k+ active sellers
8+ billion pageviews per month
CodeAsCraft.etsy.com
Understanding GC
Monitoring GC
Debugging Memory Leaks
Design for Partial Availability
public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;
        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}
New():
    ref <- allocate()
    if ref = null                /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null            /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)
From Garbage Collection Handbook
markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)        /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)
From Garbage Collection Handbook
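The pseudocode above can be sketched in plain Java to make the worklist mechanics concrete. This is a toy model, not how the JVM works internally: HeapNode, its marked flag, and the list standing in for the heap are all hypothetical names invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy heap object (hypothetical): a mark bit plus outgoing pointers. */
final class HeapNode {
    final String name;
    final List<HeapNode> pointers = new ArrayList<>();
    boolean marked;

    HeapNode(String name) { this.name = name; }
}

final class MarkSweepSketch {
    /** markFromRoots(): mark each unmarked root, then trace everything it reaches. */
    static void markFromRoots(List<HeapNode> roots) {
        Deque<HeapNode> worklist = new ArrayDeque<>();
        for (HeapNode ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;
                worklist.add(ref);
                mark(worklist);
            }
        }
    }

    /** mark(): drain the worklist, marking and queueing unmarked children. */
    static void mark(Deque<HeapNode> worklist) {
        while (!worklist.isEmpty()) {
            HeapNode ref = worklist.remove();   // ref is already marked
            for (HeapNode child : ref.pointers) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** sweep(): keep marked objects and clear their mark bits for the next cycle. */
    static List<HeapNode> sweep(List<HeapNode> heap) {
        List<HeapNode> live = new ArrayList<>();
        for (HeapNode node : heap) {
            if (node.marked) {
                node.marked = false;
                live.add(node);
            }
        }
        return live;
    }
}
```

An object that points at live data but is not reachable from any root is still garbage: only the direction root -> object matters.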
Trivia: Who invented the first GC and Mark-and-Sweep?
Weak Generational Hypothesis
Where do objects in common Solr application live?
AtomicReaderContext?
SolrIndexSearcher?
SolrRequest?
GC Terminology: Concurrent vs Parallel
JVM Collectors
Serial
Trivia: How does System.identityHashCode() work?
Throughput
CMS
Garbage First (G1)
Continuously Concurrent Compacting Collector (C4)
IBM, Dalvik, etc.?
Why Throughput?
Monitoring
GC time per Solr request
...import java.lang.management.*;...
public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}
Available via JMX
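One way to turn the cumulative counter into "GC time per request" is to sample it before and after the request and report the difference. The sketch below assumes a hypothetical wrapper, gcMillisDuring, that is not part of Solr. The counter is JVM-wide, so with concurrent requests the delta is the GC time that elapsed during the request rather than GC time caused by it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcTimePerRequest {
    /** Cumulative GC time in milliseconds, summed over all collectors. */
    static long getCollectionTime() {
        long collectionTime = 0;
        for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = mbean.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                collectionTime += millis;
            }
        }
        return collectionTime;
    }

    /** Hypothetical helper: GC milliseconds that elapsed while the request ran. */
    static long gcMillisDuring(Runnable request) {
        long before = getCollectionTime();
        request.run();
        return getCollectionTime() - before;
    }
}
```

Logging this number next to the request's total latency makes it easy to see which slow requests were slow because of a collection.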
Visual GC
export GC_DEBUG="-verbose:gc \
-XX:+PrintGCDateStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCApplicationConcurrentTime \
-XX:+PrintAdaptiveSizePolicy \
-XX:AdaptiveSizePolicyOutputInterval=1 \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCDetails \
-XX:+PrintCommandLineFlags \
-XX:+PrintSafepointStatistics \
-Xloggc:/var/log/search/gc.log"
2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
[PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}
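The number that usually matters most in a log entry like the one above is the wall-clock pause, the real= value in the [Times: ...] section. A minimal sketch of pulling it out with a regular expression follows; GcPauseParser is a hypothetical helper written for this illustration, and it assumes the -XX:+PrintGCDetails line format shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcPauseParser {
    // Matches the wall-clock figure in "[Times: user=137.36 sys=0.03, real=8.77 secs]".
    private static final Pattern REAL_TIME = Pattern.compile("real=(\\d+\\.\\d+) secs");

    /** Returns the wall-clock pause in seconds, or -1.0 if the text has no [Times: ...] section. */
    static double realPauseSeconds(String logText) {
        Matcher matcher = REAL_TIME.matcher(logText);
        return matcher.find() ? Double.parseDouble(matcher.group(1)) : -1.0;
    }

    /** True if the entry records a full (stop-the-world, whole-heap) collection. */
    static boolean isFullGc(String logText) {
        return logText.contains("[Full GC");
    }
}
```

In the entry above, user=137.36 against real=8.77 shows many GC threads working in parallel, yet ParOldGen stays at 21670588K: the full collection reclaimed nothing from the old generation.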
GC Log Analyzers?
GCHisto
GCViewer
garbagecat
Graphing with Logster
github.com/etsy/logster
YourKit.com
Designing for Partial Availability
JVMTI GC Hook?
How can a client ignore GC-ing hosts?
Server lies to clients about availability
TCP socket receive buffer
TCP write buffer
“Banner” protocol
1. Connect via TCP
2. Wait ~1-10ms
3. Either receive magic four byte header or try another host
4. Only send query after receiving header from server
0xC0DEA5CF
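The client side of the four steps above could look roughly like the following. This is a sketch under stated assumptions, not Etsy's implementation: the class and method names are hypothetical, the header is assumed to arrive as a big-endian 32-bit integer, and one timeout value is used for both the connect and the banner wait.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

final class BannerClient {
    static final int MAGIC = 0xC0DEA5CF;

    /**
     * Returns a socket connected to the first host that sends the banner within
     * bannerTimeoutMs, or null if none does. The kernel can complete the TCP
     * handshake while the server JVM is paused, so only the banner shows that
     * the server process is actually running.
     */
    static Socket connectToResponsiveHost(List<InetSocketAddress> hosts, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, bannerTimeoutMs);     // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);      // 2. bound the wait for the banner
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == MAGIC) {                     // 3. banner received
                    socket.setSoTimeout(0);                // caller sets its own query timeout
                    return socket;                         // 4. safe to send the query now
                }
            } catch (IOException e) {
                // Connect failure or banner timeout: fall through and try the next host.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }
}
```

The caller owns the returned socket and is responsible for closing it after the query completes.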
What if GC happens mid-request?
Backup requests
Jeff Dean: Achieving Rapid Response Times in Large Online Services
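A backup request can be sketched with java.util.concurrent: send the query to one replica, and if no answer arrives within a short delay, send the same query to a second replica and take whichever answer comes back first. The names below (BackupRequests, query, backupDelayMs) are hypothetical, and the sketch simply returns the first task to finish, even if it failed; a fuller version would wait for the other replica in that case.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BackupRequests {
    /**
     * Runs primary; if it has not finished after backupDelayMs, also runs backup.
     * Returns the result of whichever task completes first and cancels the other.
     */
    static <T> T query(Callable<T> primary, Callable<T> backup, long backupDelayMs,
                       ExecutorService pool) throws InterruptedException, ExecutionException {
        CompletionService<T> completions = new ExecutorCompletionService<>(pool);
        Future<T> first = completions.submit(primary);
        Future<T> second = null;

        // Give the primary a head start before spending capacity on a backup.
        Future<T> done = completions.poll(backupDelayMs, TimeUnit.MILLISECONDS);
        if (done == null) {
            second = completions.submit(backup);
            done = completions.take();          // first of the two to complete
        }
        try {
            return done.get();
        } finally {
            first.cancel(true);                 // no-op if already completed
            if (second != null) {
                second.cancel(true);
            }
        }
    }
}
```

The delay is the tuning knob: set it near the tail of the normal latency distribution so that backups are sent only for the small fraction of requests that are already slow.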
Solr sharding?
Right now, only as fast as the slowest shard.
“Make a reliable whole out of unreliable parts.”
Memory Leaks
Solr API hooks for custom code
QParserPlugin
SolrRequestHandler
SolrCache
SearchComponent
SolrEventListener
ValueSourceParser
FieldType
etc.
PSA: Are you sure you need custom code?
CoreContainer#getCore()
RefCounted<SolrIndexSearcher>
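Reference-counted handles leak when custom code takes a reference and an exception or early return skips the release. The sketch below uses a stand-in class, CountedRef, written for this illustration; it is not Solr's RefCounted, only a minimal model of the same incref/decref idea. The point is the try/finally shape: every acquired reference is released on every path.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a reference-counted holder (hypothetical, not Solr's class). */
abstract class CountedRef<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();

    CountedRef(T resource) { this.resource = resource; }

    T get() { return resource; }

    CountedRef<T> incref() {
        refcount.incrementAndGet();
        return this;
    }

    /** Releases one reference; the resource is closed when the count reaches zero. */
    void decref() {
        if (refcount.decrementAndGet() == 0) {
            close();
        }
    }

    int count() { return refcount.get(); }

    protected abstract void close();
}

final class RefCountUsage {
    /** Borrows the resource for the duration of the call and always gives it back. */
    static int useSearcher(CountedRef<String> holder) {
        CountedRef<String> ref = holder.incref();
        try {
            return ref.get().length();   // placeholder for real work with the resource
        } finally {
            ref.decref();                // skipping this pins the resource in memory forever
        }
    }
}
```

A forgotten decref on a searcher keeps that searcher and everything it references reachable, so old index generations pile up in the old generation until the heap fills.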
SolrIndexSearcher generation marking with YourKit triggers
Miscellaneous Topics
System.gc()?
-XX:+UseCompressedOops
-XX:+UseNUMA
Paging
#!/usr/bin/env bash
# This script is designed to be run every minute by cron.
host=$(hostname -s)
psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults
epoch_s=$(date +%s)
echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
Solution 1: Buy more RAM
Ideally enough RAM to:
Keep index in OS file buffers
AND ensure no paging of VM memory
AND whatever else happens on the box
~$5-10/GB
echo "0" > /proc/sys/vm/swappiness
mlock()/mlockall()
echo "-17" > /proc/$PID/oom_adj
Mercy from the OOM Killer
Huge Pages
-XX:+AlwaysPreTouch
Possible Future Directions
Many small VMs instead of one large VM
microsharding
In-memory Lucene codecs
I.e. custom DirectPostingsFormat
Off-heap memory with sun.misc.Unsafe?
Try G1 again
Try C4 again
Resources
gchandbook.org
bit.ly/giltene
Gil Tene: Understanding Java Garbage Collection
bit.ly/cpumemory
Ulrich Drepper: What Every Programmer Should Know About Memory
github.com/pingtimeout/jvm-options
Read the JVM Source (Not as scary as it sounds.)
hg.openjdk.java.net/jdk7/jdk7
Mechanical Sympathy Google Group
bit.ly/mechsym