High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ahead: Presented by Darren Spehr, MapQuest

High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ahead

Darren Spehr – System Architect

MapQuest

Going strong … since 1967
• Maps
• Directions
• Routing
• Geocoding
• Mobile
• B2B

Every Adventure has a Beginning

Our mobile client needs an overhaul … Oh, and we need an auto-correct feature … well, auto-complete … actually, search ahead

… Top Secret Meeting Minutes

• How do we use auto-complete today?
• What are we searching over?
• How fast can a person type?
• What are we going to say in response?
• When do we have to launch this?

Characteristics

• Searches march from left to right
• Expect the first term to be highly relevant
• Term order and proximity are clues
• Spaces are now really important
• Expect mixed query types
• Abbreviations and misspellings are common
• People can type really fast (but generally less than 10 keystrokes per sec)
• Users frequently want to browse

Requirements

Fast, Like really fast

140 milliseconds maximum response time

Methodology

Some optimizations can be planned

Others need to be discovered

Test alternatives – optimize low hanging fruit early

Finally: Take it to task

Multiple Types Possible

The Data

Categories

Franchises

Locations
• Neighborhoods to Countries

Points of Interest
• Airports
• Businesses
• Landmarks

Addresses
• Individual
• Block (Interpolated)

In all – over 10 Billion unique documents

Architecture

[Diagram: mobile client and mobile app, API (API-East, API-West), Solr clusters (Targeted, Location, Business, Address 1, Address 2)]

• Targeted: 4 VMs, 1 shard, 283,000 docs. Frequent low volume updates.
• Location: 3 VMs, 1 shard, 4.3 million docs. Frequent low volume updates.
• Business: 5 VMs, 1 shard, 13.4 million docs. Heavy updates.
• Address: 30 VMs, 10 shards, 100 million docs. No updates.
• Interpolated Address: 30 VMs, 10 shards, 10 billion docs*. No updates.

Special Cases

Business data
• Complex synonyms
• Stemming needs
• The memory factor
• Complex query patterns

Addresses
• So many!
• Nested structure
• Interpolated positions
• Updates an issue

Airports
• Airport codes
• International issues

Locations
• International issues
• Relevance

Move Analysis to the ETL

A typical job includes:
• Basic text processing / cleansing
• Stemming
• Synonyms and substitution
• Cloning
• Filtering
• Various permutations
• Regionalization
• Pre-calculating relevance

Custom Doc Routing

Address data won’t fit in memory or perform well …

Both collections are sharded so the size on disk is around 6-8 GB

Initial, naïve balancing wasn’t nearly good enough

Optimization problem that accounts for:
- Size on disk
- Predicted query volumes
- FST load (entropy)
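
The slide names the three factors but not the algorithm used. As a rough illustration only, the sketch below assumes a greedy heuristic: score each routing group by a weighted sum of the three factors, then place the most expensive groups first on whichever shard is currently lightest. The `Group` type, the weights, and the scoring are all hypothetical and are not taken from the talk.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """A hypothetical routing unit, e.g. all addresses in one region."""
    name: str
    size_gb: float       # estimated size on disk
    query_share: float   # predicted share of query volume
    fst_load: float      # relative FST cost (illustrative score)


def cost(g, w_size=1.0, w_query=1.0, w_fst=1.0):
    # Weighted sum of the three factors; the weights are assumptions.
    return w_size * g.size_gb + w_query * g.query_share + w_fst * g.fst_load


def balance(groups, shard_count):
    """Greedy assignment: most expensive group first, lightest shard first."""
    heap = [(0.0, shard) for shard in range(shard_count)]
    heapq.heapify(heap)
    assignment = {shard: [] for shard in range(shard_count)}
    for g in sorted(groups, key=cost, reverse=True):
        load, shard = heapq.heappop(heap)   # least-loaded shard so far
        assignment[shard].append(g.name)
        heapq.heappush(heap, (load + cost(g), shard))
    return assignment
```

A greedy pass like this is only a starting point; a real solution to the optimization problem would likely need measured query volumes and iteration.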

Setting Up the Indexes

Clean up schema.xml and solrconfig.xml

Exact and Fuzzy queries tested – String fields WIN! (Thank you FST and prefix queries!)

Geo-sensitivity made easy using Spatial4J (Thank you David Smiley!)

Optimization required

No NRT functionality needed
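
To make the string-field prefix idea concrete, here is a minimal sketch of the request parameters for one keystroke. It uses Solr's `{!prefix}` query parser, which applies no analysis to the typed text. The field name `name_s`, the row count, and the lowercasing (which assumes values were lowercased before indexing) are illustrative assumptions, not details from the talk.

```python
from urllib.parse import urlencode


def prefix_query_params(typed, field="name_s", rows=5):
    """Build /select parameters for a prefix match on a string field.

    `field` and `rows` are placeholders. Lowercasing assumes the indexed
    values were lowercased upstream, since {!prefix} does no analysis.
    """
    return {
        "q": "{!prefix f=%s}%s" % (field, typed.lower()),
        "fl": field,      # return only the field needed for display
        "rows": rows,
        "wt": "json",
    }


print(urlencode(prefix_query_params("Denv")))
```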

Query-Time Considerations

Jetty
- <New class="java.util.concurrent.ArrayBlockingQueue">
- Limit thread pool based on projected need

Filters used judiciously

Pull in a single field from the indexes for display

Shard/route aware clients used for Addresses

Estimate caching needs

The API has to be Fast Too

Pool as many resources as makes sense

A Note on connection pools:
- The DefaultHttpClient avoids key registry overhead
- Ask for keep-alive support
- Balance pool according to use

Thread level caching used to avoid ClassLoader overhead

Take out some insurance with TTLs

[Diagram: Executors, HttpClient, Solr Query]
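
The thread-level caching and TTL points can be combined in one small pattern. The sketch below is a language-neutral illustration written in Python, not the Java code behind the talk: each thread keeps its own copy of an expensive object and rebuilds it once it is older than a time-to-live. The class name and the injectable `clock` are assumptions made for the example.

```python
import threading
import time


class ThreadLocalTTL:
    """Cache one expensive object per thread; rebuild it after `ttl` seconds."""

    def __init__(self, factory, ttl, clock=time.monotonic):
        self._factory = factory
        self._ttl = ttl
        self._clock = clock
        self._local = threading.local()

    def get(self):
        now = self._clock()
        entry = getattr(self._local, "entry", None)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._factory())   # (created_at, value)
            self._local.entry = entry
        return entry[1]
```

The TTL acts as the "insurance": a stale or broken pooled object is replaced eventually even if nothing detects the problem directly.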

Keep Queries Simple

Federate a larger number of queries

Break queries out by type and expectation

Use custom search handlers to move the burden of “tough” queries to Solr

Special case:
• Interpolated Addresses
• Business Names

Collection          Query Count
Category            3
Franchise           3
Airport             1
Critical Address    1
Locations           4
Businesses          3
Addresses (both)    2 each
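
A rough sketch of what federating these queries might look like: one small query per (collection, variant) pair, all submitted to a thread pool and gathered in order. The query counts come from the table; `run_query` is a stand-in that returns a label instead of calling Solr, and the collection keys and pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Query counts per collection from the table; "Addresses (both)" is
# expanded into the two address collections at 2 queries each.
QUERY_COUNTS = {
    "category": 3,
    "franchise": 3,
    "airport": 1,
    "critical_address": 1,
    "location": 4,
    "business": 3,
    "address": 2,
    "interpolated_address": 2,
}


def run_query(collection, variant, text):
    # Placeholder for a real Solr request; returns a label, not hits.
    return f"{collection}#{variant}:{text}"


def federate(text, max_workers=8):
    """Fan out every simple query in parallel and collect the results."""
    jobs = [(c, i) for c, n in QUERY_COUNTS.items() for i in range(n)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_query, c, i, text) for c, i in jobs]
        return [f.result() for f in futures]
```

With these counts a single keystroke fans out into 19 simple queries, which is why each one needs to stay cheap.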

At this point the service is up and running … but the fun has only begun

Getting Ready to Test

Choose your tool set …
• Test Suite (JMeter)
• Application Monitoring (VisualVM)
• GC Monitoring (VisualGC)
• On Host tools (top, pidstat)
• Runtime exposure (JMX, jsvc)
• Offline analysis (JMeter, GCHisto)

Set Boundary Conditions

Production Query Volume
• What is the expected peak QPS?
• Estimate 50th, 75th percentiles

Know what success looks like:
• What availability are you looking for?
• What about latency?
• Caching success?

Know what failure looks like:
• When do you consider a machine maxed out?
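
One simple way to get the peak and percentile figures is to compute them from per-second QPS samples pulled from production logs. The sketch below assumes such samples are already available as a list of numbers; the function name and the choice of the inclusive quantile method are illustrative.

```python
import statistics


def qps_summary(samples):
    """Summarize per-second QPS observations into boundary figures."""
    # quantiles(n=100) returns the 99 cut points between percentiles,
    # so index 49 is the 50th percentile and index 74 is the 75th.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p75": cuts[74], "peak": max(samples)}
```

The same summary can be run over latency samples from a test run to check them against the response-time target.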

So, Let’s Talk About …

Memory Settings

Max Heap = anticipated index size in memory + delta for new gen
Min Heap = Max Heap to limit HotSpot optimizations

-Xmx = -Xms

Sizing the new generation (-Xmn):
• Start with around 1/3 of your heap size
• Set the Survivor space (-XX:SurvivorRatio=15)

Determine the Eden Space. SurvivorRatio is the ratio of eden to one survivor space, so each survivor space is -Xmn / (SurvivorRatio + 2):
eden = -Xmn - 2 * ( -Xmn / 17 )

Example

7 GB Index + 3 GB for new generation:
-Xms10G -Xmx10G -Xmn3G -XX:SurvivorRatio=15 -XX:PermSize=64m -XX:MaxPermSize=64m

Survivor Size: 3 GB / 17 ≈ 181 MB
Eden Space: 3 GB – 2 * 181 MB ≈ 2.65 GB
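
The young generation arithmetic can be checked with a few lines of code. HotSpot defines -XX:SurvivorRatio as the ratio of eden to one survivor space, so the young generation divides into SurvivorRatio + 2 equal parts: one part for each of the two survivor spaces and the rest for eden. The function name below is illustrative.

```python
def young_gen_layout(xmn_mb, survivor_ratio):
    """Return (survivor_mb, eden_mb) for a given -Xmn and -XX:SurvivorRatio."""
    survivor = xmn_mb / (survivor_ratio + 2)   # each of the two survivor spaces
    eden = xmn_mb - 2 * survivor
    return survivor, eden


survivor_mb, eden_mb = young_gen_layout(3 * 1024, 15)
print(f"survivor: {survivor_mb:.0f} MB each, eden: {eden_mb:.0f} MB")
```

For -Xmn3G and SurvivorRatio=15 this gives roughly 181 MB per survivor space and about 2.65 GB of eden.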

Baseline JVM Settings

Simple and verbose

-verbose:gc
-XX:HeapDumpPath=/logs/solr_heap.hprof
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/logs/gc.log
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Other Settings We Use

TargetSurvivorRatio=70
MaxTenuringThreshold=5
PretenureSizeThreshold=64m
CMSFullGCsBeforeCompaction=1
CMSInitiatingOccupancyFraction=70
CMSTriggerPermRatio=80
CMSMaxAbortablePrecleanTime=6000

+CMSScavengeBeforeRemark
+UseCMSInitiatingOccupancyOnly
+CMSParallelRemarkEnabled
+ParallelRefProcEnabled
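
The slide lists these settings in shorthand, without the -XX: prefix. As a small convenience sketch, the snippet below expands the shorthand into command-line form: valued settings become -XX:Name=value and boolean settings become -XX:+Name. The variable and function names are illustrative, and the values are copied from the slide as-is.

```python
VALUE_SETTINGS = {
    "TargetSurvivorRatio": "70",
    "MaxTenuringThreshold": "5",
    "PretenureSizeThreshold": "64m",
    "CMSFullGCsBeforeCompaction": "1",
    "CMSInitiatingOccupancyFraction": "70",
    "CMSTriggerPermRatio": "80",
    "CMSMaxAbortablePrecleanTime": "6000",
}

BOOLEAN_SETTINGS = [
    "CMSScavengeBeforeRemark",
    "UseCMSInitiatingOccupancyOnly",
    "CMSParallelRemarkEnabled",
    "ParallelRefProcEnabled",
]


def to_jvm_flags(values, booleans):
    """Expand shorthand settings into -XX: command-line flags."""
    flags = [f"-XX:{name}={value}" for name, value in values.items()]
    flags += [f"-XX:+{name}" for name in booleans]
    return flags


print(" ".join(to_jvm_flags(VALUE_SETTINGS, BOOLEAN_SETTINGS)))
```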

Establish Single Test Thread Settings

<ConstantThroughputTimer guiclass="TestBeanGUI" testclass="ConstantThroughputTimer" testname="Constant Throughput Timer" enabled="true">
  <intProp name="calcMode">0</intProp>
  <doubleProp>
    <name>throughput</name>
    <value>1600.0</value>
    <savedValue>0.0</savedValue>
  </doubleProp>
</ConstantThroughputTimer>
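
JMeter's Constant Throughput Timer expresses its target in samples per minute, and calcMode 0 applies that target to each thread individually. The short sketch below converts the per-thread setting into requests per second and estimates how many threads a given load would need; the target of 100 QPS used in the usage line is a made-up figure, not one from the talk.

```python
import math


def per_thread_rps(samples_per_minute):
    """Convert a Constant Throughput Timer target to requests per second."""
    return samples_per_minute / 60.0


def threads_for(target_qps, samples_per_minute=1600.0):
    """Threads needed to reach target_qps if every thread holds its rate."""
    return math.ceil(target_qps / per_thread_rps(samples_per_minute))


print(round(per_thread_rps(1600.0), 1), threads_for(100))
```

At 1600 samples per minute each thread issues a little under 27 requests per second, so load can then be raised in known increments by adding threads.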

Test Cycle

[Diagram: a test cycle of Monitor, Record, Evaluate. Items shown: JVM, page faults, CPU, GC rates, threading, context switches, locks, swapping, network traffic, availability, throughput, latency, thread count. Decision point: "Have I met my exit conditions?" Next step: add more threads.]
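
The cycle can be read as a simple loop: run at a given thread count, record what was observed, evaluate against the exit conditions, and add threads if they have not been met. The sketch below is one possible reading of the diagram; `measure` and `meets_exit` are hypothetical callbacks that a real harness would back with JMeter runs and monitoring data.

```python
def ramp(measure, meets_exit, start=1, step=1, max_threads=64):
    """Increase the thread count until the exit conditions are met.

    measure(threads) runs one test pass and returns recorded stats;
    meets_exit(stats) decides whether to stop. Both are placeholders.
    """
    history = []
    threads = start
    while threads <= max_threads:
        stats = measure(threads)            # monitor and record
        history.append((threads, stats))
        if meets_exit(stats):               # evaluate
            break
        threads += step                     # add more threads
    return history
```

Keeping the full history makes it possible to compare runs offline rather than relying on the final pass alone.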

Monitoring the JVM

Watch your application come to life!

Memory Steady States:
• Old Generation: ⅓ to ¼ the size of your settings
• Permanent Generation: ½ its size

Tenure histogram sizes should drop off … this is your ideal level

[Chart: Tenure Size (axis from 0 to 14000) plotted for values 1 through 5]

Monitoring Solr Caches

The UI is a wealth of information!

Cache Strategy
• Size
• Type

Look at the hit and eviction statistics

Use “binary sizing” to walk the sizes up until there are diminishing returns

<filterCache class="solr.LRUCache" size="8384" initialSize="8384" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="8384" initialSize="8384" autowarmCount="0"/>
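
The slide does not define "binary sizing". The sketch below assumes it means doubling the cache size on each test pass and stopping once the hit ratio improves by less than some threshold. `hit_ratio_for` is a hypothetical callback that would reconfigure the cache, rerun the load test, and read the hit ratio from the Solr admin statistics; the starting size and threshold are arbitrary.

```python
def binary_sizing(hit_ratio_for, start=512, min_gain=0.01, max_size=1 << 20):
    """Double the cache size until the hit ratio stops improving.

    Returns the last size that still produced a gain of at least
    min_gain, together with its hit ratio.
    """
    size = start
    best = hit_ratio_for(size)
    while size * 2 <= max_size:
        candidate = hit_ratio_for(size * 2)
        if candidate - best < min_gain:     # diminishing returns
            break
        size, best = size * 2, candidate
    return size, best
```

Eviction counts are worth checking alongside the hit ratio, since a cache that never evicts is probably larger than it needs to be.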

JVM Tuning Strategies

Smaller eden spaces result in:
• more frequent minor GCs
• a higher probability of premature promotion
• the best performance

Watch out for too much eager promotion and lengthening major GCs

Mitigate major GC STW pauses by:
• Keeping the old generation as small as possible
• Maybe even a little smaller
• Turn off swapping
• Consider explicit GC

Bookkeeping Demo …
• VisualVM
• GCHisto
• Performance Stats

Planning for the Future

What we used to do predictive expansion:
1) Target max VM capacity
2) Matching QPS
3) Breakdown of traffic load
4) Scaling factor
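
The slide lists the inputs but not how they were combined. One plausible reading, offered only as an assumption, is a capacity estimate of this form: take the projected total QPS, apply the share of traffic that reaches a given cluster, multiply by a scaling factor for headroom, and divide by the QPS a single VM sustains at its target maximum capacity. All parameter names and the sample figures are hypothetical.

```python
import math


def vms_needed(projected_qps, traffic_share, per_vm_qps, scaling_factor=1.0):
    """Estimate the VM count for one cluster from the four planning inputs."""
    cluster_qps = projected_qps * traffic_share * scaling_factor
    return math.ceil(cluster_qps / per_vm_qps)


# Example with made-up numbers: 1000 QPS overall, half of it reaching
# this cluster, 120 QPS per VM at target capacity, 1.5x headroom.
print(vms_needed(1000, 0.5, 120, 1.5))
```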

Conclusions

7 Habits of Highly Effective Tuners
1. Know where you’re going
2. Know where you’re starting from
3. Test incrementally
4. Monitor with intent
5. Make small changes
6. Know when to stop
7. Plan ahead

Questions?

Darren Spehr

Resources

VisualVM
VisualGC
GCHisto
Java Performance – Hunt and John
The Garbage Collection Handbook – Jones, Hosking and Moss
Solr In Action – Grainger and Potter