Solr Anti-Patterns: Presented by Rafał Kuć, Sematext

Solr Anti-patterns

Rafał Kuć, Sematext Group, Inc.

@kucrafal @sematext

http://sematext.com

About me

Sematext consultant & engineer
Solr.pl co-founder
Father & husband

The (not so) perfect migration

http://en.wikipedia.org/wiki/Bird_migration

http://www.likesbooks.com/aarafterhours/?p=750

From 3.1 to 4.10 (and hopefully not back)

March 2011 (3.1), September 2014 (4.10)

The lonely solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />

<luceneMatchVersion>LUCENE_31</luceneMatchVersion>

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

And faulty indexing

[Diagram: DOC, DOC, DOC; EXCEPTIONS :)]

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="error">
    <str name="msg">missing content stream</str>
    <int name="code">400</int>
  </lst>
</response>

109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)

Let’s make that right

<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/json</str>
  </lst>
</requestHandler>

<luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion>

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<nrtMode>true</nrtMode>

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

The old schema.xml

<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

The new schema.xml

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

Threads? What threads?

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">200</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>

I see deadlocks

OK, so now we can actually run queries

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>

The ZooKeeper

The ZooKeeper – production

-DzkHost=zk1:2181,zk2:2181,zk3:2181
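A quick way to see why the production example lists three ZooKeeper hosts: an ensemble stays available only while a strict majority of its servers is reachable. The sketch below is illustrative arithmetic only; the function names are made up for this example and are not part of Solr or ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a single node, a pair, and the three-node ensemble from the slide.
    for n in (1, 2, 3, 5):
        print(n, "servers:", "quorum", quorum_size(n),
              "tolerated failures", tolerated_failures(n))
```

With one or two servers no failure is tolerated, which is why three is the usual minimum for production.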

Let’s cache everything

<filterCache class="solr.LRUCache"
             size="1048576"
             initialSize="1048576"
             autowarmCount="524288"/>

<queryResultCache class="solr.LRUCache"
                  size="1048576"
                  initialSize="1048576"
                  autowarmCount="524288"/>

<documentCache class="solr.LRUCache"
               size="1048576"
               initialSize="1048576"
               autowarmCount="0"/>

And now let’s look at the warmup times

OK, show us the way “Mr. Consultant”

<filterCache class="solr.FastLRUCache"
             size="1024"
             initialSize="1024"
             autowarmCount="512"/>

<queryResultCache class="solr.LRUCache"
                  size="16000"
                  initialSize="16000"
                  autowarmCount="8000"/>

<documentCache class="solr.LRUCache"
               size="16384"
               initialSize="16384"
               autowarmCount="0"/>
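To get a feel for why a million-entry filterCache is risky, here is a rough upper-bound estimate. It assumes the worst case in which every cached filter is stored as a bitset with one bit per document; Solr can store small result sets more compactly, so real usage is usually lower. The document count is the numFound value that appears in the sample responses later in this deck, and the helper name is hypothetical.

```python
def filter_cache_upper_bound_bytes(max_doc: int, entries: int) -> int:
    """Worst-case size, assuming every entry is a bitset of max_doc bits."""
    bytes_per_entry = (max_doc + 7) // 8  # one bit per document, rounded up
    return bytes_per_entry * entries


if __name__ == "__main__":
    max_doc = 3_284_000  # numFound from the sample responses in this deck
    for entries in (1_048_576, 1_024):
        size = filter_cache_upper_bound_bytes(max_doc, entries)
        print(f"{entries} entries: up to {size / 2**30:.1f} GiB")
```

Under these assumptions the bound drops from roughly 400 GiB to roughly 0.4 GiB when the cache shrinks from 1048576 to 1024 entries. The autowarmCount matters too, since each autowarmed entry is re-executed whenever a new searcher is opened.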

Let’s look at the warmup times again

Bulks are for noobs

[Diagram: Application, Application, Application, each sending one Doc]

But let’s use bulks, just in case
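One way to move from one-document requests to bulks is to group documents on the client before sending them. The sketch below only builds the JSON payloads and sends nothing. The batch size, the field names and the helper names are assumptions for illustration; each payload would then be posted to an update handler such as /update/json with Content-Type application/json.

```python
import json
from typing import Dict, Iterable, Iterator, List


def batches(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def bulk_payloads(docs: Iterable[Dict], size: int = 1000) -> Iterator[str]:
    """Serialize each batch as a JSON array of documents, one per request."""
    for batch in batches(docs, size):
        yield json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical documents; a real application would read them from its own source.
    docs = ({"id": str(i), "tag": "example"} for i in range(2500))
    for payload in bulk_payloads(docs, size=1000):
        print(len(json.loads(payload)), "documents in this request")
```

A suitable batch size depends on document size and hardware, so the value of 1000 is only a starting point to measure against.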

We need to refresh and hard commit

<autoCommit>
  <maxTime>1000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Maybe we should only refresh?

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

OK, let’s go easy with refreshing

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>

But I really need all that data

curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=5'

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">9418</int>
  <lst name="params">
    <str name="start">3000000</str>
    <str name="q">*:*</str>
    <str name="rows">5</str>
  </lst>
</lst>
<result name="response" numFound="3284000" start="3000000">
  .
  .
  .
</result>
</response>

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error">
  <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
  <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
    .
    .
    .
Caused by: java.lang.OutOfMemoryError: Java heap space
    .
    .
    .
  </str>
  <int name="code">500</int>
</lst>
</response>

[Diagram labels: Query, Response]

Use the scroll Luke

curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">189</int>
  <lst name="params">
    <str name="sort">score desc,id desc</str>
    <str name="q">*:*</str>
    <str name="cursorMark">*</str>
  </lst>
</lst>
<result name="response" numFound="3284000" start="0">
  <doc>
    ...
  </doc>
  .
  .
  .
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
</response>

curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">184</int>
  <lst name="params">
    <str name="sort">score desc,id desc</str>
    <str name="q">*:*</str>
    <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
  </lst>
</lst>
<result name="response" numFound="3284000" start="0">
  <doc>
    ...
  </doc>
  .
  .
  .
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
</response>
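The two requests above generalize to a loop: start with cursorMark=*, pass each nextCursorMark back in, and stop once the returned mark equals the one that was sent. The sketch below keeps the HTTP call out of the picture by taking a fetch_page callable. That callable, the rows value and the JSON-style dictionary shape of a page are assumptions for illustration and do not come from the slides.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(
    fetch_page: Callable[[Dict[str, str]], Dict],
    query: str = "*:*",
    sort: str = "score desc,id desc",
    rows: int = 100,
) -> Iterator[Dict]:
    """Yield every matching document by following nextCursorMark."""
    cursor = "*"  # the initial cursor, as in the first request above
    while True:
        params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}
        page = fetch_page(params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: nothing left to read
            break
        cursor = next_cursor


if __name__ == "__main__":
    # In-memory stand-in for Solr so the loop can be run without a server.
    data = [{"id": str(i)} for i in range(5)]

    def fake_fetch(params: Dict[str, str]) -> Dict:
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        docs = data[start:start + int(params["rows"])]
        mark = str(start + len(docs)) if docs else params["cursorMark"]
        return {"response": {"docs": docs}, "nextCursorMark": mark}

    print([d["id"] for d in iterate_with_cursor(fake_fetch, rows=2)])
```

The sort has to end with the unique key field (id here) as a tie-breaker, which is why both requests above sort by score and then by id.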

Limiting faceting, why bother?

curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
    facet.limit=-1&facet.mincount=0'

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">9967</int>
  <lst name="params">
    ...
  </lst>
</lst>
<result name="response" numFound="3284000" start="0">
  .
  .
  .
</result>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="tag">
      ...
    </lst>
  </lst>
</lst>
</response>

<?xml version="1.0" encoding="UTF-8"?>
<response>
  .
  .
  .
  <lst name="error">
    <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
    .
    .
    .
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
    .
    .
    .
    </str>
    <int name="code">500</int>
  </lst>
</response>

Now let’s look at performance

Magic happens with small changes

curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
    facet.limit=100&facet.mincount=1'
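The same limits can be set from client code. The sketch below only builds the query string with Python's standard library and sends nothing; the helper name is hypothetical and the defaults simply mirror the request above.

```python
from urllib.parse import urlencode


def facet_query_string(field: str, limit: int = 100, mincount: int = 1) -> str:
    """Build a select query string that caps the facet values returned."""
    if limit < 0:
        # A negative facet.limit asks Solr for every value, which is the
        # anti-pattern this helper is meant to prevent.
        raise ValueError("facet.limit must not be negative")
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    return urlencode(params)


if __name__ == "__main__":
    print("localhost:8983/solr/select?" + facet_query_string("tag"))
```

Rejecting a negative limit in the helper is a design choice for this example: it forces callers to state how many facet values they actually need.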

Monitoring in production

http://sematext.com/spm/index.html

And remember…

Release Notes Solr 4.10.0

<luceneMatchVersion>3.1</luceneMatchVersion>

Quick summary

http://www.soothetube.com/2013/12/29/thats-all-folks/

We are hiring!

Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open-source?
We’re hiring worldwide!

http://sematext.com/about/jobs.html

Thank you! Rafał Kuć

@kucrafal

Sematext

@sematext http://sematext.com http://blog.sematext.com