Solr Anti-patterns

DESCRIPTION

Solr Anti-patterns talk given during Lucene Revolution 2014 in Washington DC.

Page 2: Solr Anti - patterns

Solr Anti-patterns

Rafał Kuć, Sematext Group, Inc.

@kucrafal

@sematext

http://sematext.com

Page 3: Solr Anti - patterns

About me

Sematext consultant & engineer

Solr.pl co-founder

Father & husband

Page 4: Solr Anti - patterns

The (not so) perfect migration

http://en.wikipedia.org/wiki/Bird_migration

http://www.likesbooks.com/aarafterhours/?p=750

Page 5: Solr Anti - patterns

From 3.1 to 4.10 (and hopefully not back)

March 2011 September 2014

Page 6: Solr Anti - patterns

The lonely solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />

<luceneMatchVersion>LUCENE_31</luceneMatchVersion>

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

Page 7: Solr Anti - patterns

DOC

DOC

DOC

And faulty indexing

EXCEPTIONS :)

Page 8: Solr Anti - patterns

And faulty indexing

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="error">
    <str name="msg">missing content stream</str>
    <int name="code">400</int>
  </lst>
</response>

109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)

Page 9: Solr Anti - patterns

Let’s make that right

<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/json</str>
  </lst>
</requestHandler>

<luceneMatchVersion>4.10.0</luceneMatchVersion>

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<nrtMode>true</nrtMode>

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
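With a single /update handler, the request body and its Content-Type header tell Solr how to parse the documents. The following is a minimal sketch, not the talk's own code, of building such a request in Python so that it always carries a body and an explicit content type. The helper name `build_update_request` is hypothetical, and the base URL is assumed from the `localhost:8983/solr` address used later in this deck.

```python
import json
import urllib.request


def build_update_request(docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a JSON update request for the /update handler.

    base_url is an assumption; adjust it to the core or collection in use.
    """
    if not docs:
        # An update without a body is exactly what we want to avoid sending.
        raise ValueError("refusing to build an update request with no documents")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. Building it separately makes the body and headers easy to inspect before anything reaches Solr.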

Page 10: Solr Anti - patterns

The old schema.xml

<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

Page 12: Solr Anti - patterns

The new schema.xml

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
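The precisionStep attribute controls how many terms a Trie field indexes per value: the value is indexed at several precisions, which speeds up range queries at the cost of a larger index, while precisionStep="0" indexes one term only. The sketch below estimates that term count. The function name `trie_term_count` is hypothetical and the arithmetic is a simplified model of the behaviour, not code from Solr.

```python
def trie_term_count(value_bits, precision_step):
    """Approximate number of terms indexed per value for a Trie field.

    value_bits is 32 for int/float and 64 for long/double/date.
    A precision_step of 0 (or one at least as large as the value) means
    only the full-precision term is indexed.
    """
    if precision_step <= 0 or precision_step >= value_bits:
        return 1
    # One term per shift: 0, step, 2*step, ... while shift < value_bits.
    return -(-value_bits // precision_step)  # ceiling division
```

Under this model the "tint" type above (32 bits, step 8) indexes 4 terms per value and "tdate" (64 bits, step 6) indexes 11, whereas the plain "int" and "date" types index one.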

Page 13: Solr Anti - patterns

Threads? What threads?

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">200</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>

Page 14: Solr Anti - patterns

I see deadlocks

Page 16: Solr Anti - patterns

OK, so now we can actually run queries

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
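Why such a high maxThreads? A distributed query holds one thread on the node that receives it and needs further threads for the per-shard sub-requests it fans out. If every thread in a small pool is busy waiting on sub-requests that cannot get a thread themselves, the cluster can deadlock. The sketch below is a rough back-of-the-envelope model of that demand. Both the formula and the name `worst_case_threads` are illustrative assumptions, not something taken from the talk.

```python
def worst_case_threads(concurrent_queries, shards):
    """Rough upper bound on threads busy across the cluster at one moment.

    Each top-level query occupies one coordinating thread plus one thread
    per shard sub-request. Simplified model: ignores replicas and retries.
    """
    return concurrent_queries * (1 + shards)
```

For example, 50 concurrent queries against 8 shards can occupy 450 threads in this model, well past a limit of 200.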

Page 17: Solr Anti - patterns

The ZooKeeper

Page 22: Solr Anti - patterns

The ZooKeeper – production

Page 23: Solr Anti - patterns

The ZooKeeper – production

-DzkHost=zk1:2181,zk2:2181,zk3:2181
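The -DzkHost value above lists three ZooKeeper nodes. ZooKeeper needs a majority of the ensemble to stay available, so the ensemble size determines how many node failures it survives. The sketch below builds the startup parameter and computes that tolerance. The helper names are hypothetical, and the host names are the ones from the slide.

```python
def zk_host_param(hosts, port=2181):
    """Build the -DzkHost startup parameter from a list of ZooKeeper hosts."""
    return "-DzkHost=" + ",".join(f"{host}:{port}" for host in hosts)


def tolerated_failures(ensemble_size):
    """Number of ZooKeeper nodes that can fail while a majority survives."""
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum
```

A single node tolerates no failures, three nodes tolerate one, and a fourth node adds nothing over three, which is why odd ensemble sizes are the usual choice.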

Page 27: Solr Anti - patterns

Let’s cache everything

<filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
<queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
<documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>

Page 28: Solr Anti - patterns

And now let’s look at the warmup times

Page 30: Solr Anti - patterns

OK, show us the way, “Mr. Consultant”

<filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/>
<queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
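To see why the filterCache size matters so much, note that a cached filter can be stored as a bitset with one bit per document in the index. The sketch below gives a worst-case memory estimate under that assumption. The function name is hypothetical, the model ignores object overhead and the more compact form used for small result sets, and the 3,284,000 document count is borrowed from the numFound value shown in the query responses in this deck.

```python
def filter_cache_worst_case_bytes(entries, max_doc):
    """Worst-case filterCache footprint if every entry is a full bitset."""
    bytes_per_bitset = (max_doc + 7) // 8  # one bit per document, rounded up
    return entries * bytes_per_bitset


# Assumed index size: numFound reported elsewhere in this deck.
MAX_DOC = 3_284_000
small = filter_cache_worst_case_bytes(1024, MAX_DOC)
huge = filter_cache_worst_case_bytes(1_048_576, MAX_DOC)
```

Under these assumptions 1024 entries stay around 420 MB in the worst case, while 1,048,576 entries could in principle ask for roughly 430 GB, and autowarming half of them on every new searcher costs time as well as memory.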

Page 31: Solr Anti - patterns

Let’s look at the warmup times again

Page 33: Solr Anti - patterns

Bulks are for noobs

Application Application Application

Doc Doc Doc

Page 35: Solr Anti - patterns

But let’s use bulks, just in case
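Sending one document per request pays the HTTP and update overhead for every document, while a bulk request amortizes it. A minimal sketch of client-side batching follows. The helper names and the default batch size of 1000 are assumptions chosen for illustration and should be tuned against the real documents.

```python
import json
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_bodies(docs, size=1000):
    """Yield one JSON-encoded request body per batch of documents."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk).encode("utf-8")
```

Each yielded body can be posted to the update handler as a single request, so 2500 documents become three requests instead of 2500.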

Page 37: Solr Anti - patterns

We need to refresh and hard commit

<autoCommit>
  <maxTime>1000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Page 38: Solr Anti - patterns

Maybe we should only refresh?

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Page 39: Solr Anti - patterns

OK, let’s go easy with refreshing

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>

Page 40: Solr Anti - patterns

But I really need all that data

curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'

Page 41: Solr Anti - patterns

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9418</int>
    <lst name="params">
      <str name="start">3000000</str>
      <str name="q">*:*</str>
      <str name="rows">100</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="3000000">
    . . .
  </result>
</response>

But I really need all that data

curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'

Page 42: Solr Anti - patterns

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9418</int>
    <lst name="params">
      <str name="start">3000000</str>
      <str name="q">*:*</str>
      <str name="rows">5</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="3000000">
    . . .
  </result>
</response>

But I really need all that data

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="error">
    <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
      at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
      . . .
      Caused by: java.lang.OutOfMemoryError: Java heap space
      . . .
    </str>
    <int name="code">500</int>
  </lst>
</response>

curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
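The cost of this request comes from the start parameter: to return rows 3,000,000 to 3,000,099, Solr has to collect and sort the top start + rows documents, and in a distributed setup each shard has to report that many candidates to the coordinating node. The sketch below does the arithmetic. The function name is hypothetical and the model is deliberately simplified.

```python
def docs_collected(start, rows, shards=1):
    """Candidates that must be collected to serve one page.

    Each shard contributes its own top (start + rows) entries, which the
    coordinating node then merges.
    """
    per_shard = start + rows
    return per_shard * shards
```

With start=3000000 and rows=100 that is 3,000,100 entries on a single core and 12,000,400 across four shards, all to return 100 documents.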

Page 43: Solr Anti - patterns

But I really need all that data

Query

Page 46: Solr Anti - patterns

But I really need all that data

Response

Page 47: Solr Anti - patterns

Use the scroll, Luke

curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">189</int>
    <lst name="params">
      <str name="sort">score desc,id desc</str>
      <str name="q">*:*</str>
      <str name="cursorMark">*</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    <doc> ... </doc>
    . . .
  </result>
  <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
</response>

Page 49: Solr Anti - patterns

Use the scroll, Luke

curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">184</int>
    <lst name="params">
      <str name="sort">score desc,id desc</str>
      <str name="q">*:*</str>
      <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    <doc> ... </doc>
    . . .
  </result>
  <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
</response>
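The two requests above show the pattern: start with cursorMark=*, then pass each response's nextCursorMark into the next request, keeping the same sort (which must include the unique key, here id, as a tie-breaker). Iteration is finished when the returned nextCursorMark equals the one that was sent. The sketch below wraps that loop. The `fetch_page` callable and the shape of the dictionary it returns are assumptions for illustration; in practice it would issue the HTTP request and parse the response.

```python
def iterate_with_cursor(fetch_page, rows=100):
    """Yield every document of a result set using cursorMark paging.

    fetch_page(cursor_mark, rows) is a caller-supplied (hypothetical) function
    returning a dict with "docs" (a list) and "nextCursorMark" (a string).
    """
    cursor = "*"  # the initial cursorMark, as in the first request above
    while True:
        page = fetch_page(cursor, rows)
        yield from page["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # Same mark returned as sent: there is nothing left to read.
            return
        cursor = next_cursor
```

Because each request only carries a cursor rather than a growing start offset, the per-page cost stays flat, which matches the steady QTime values in the two responses above.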

Page 51: Solr Anti - patterns

Limiting faceting, why bother?

curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9967</int>
    <lst name="params"> ... </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    . . .
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="tag"> ... </lst>
    </lst>
  </lst>
</response>

<?xml version="1.0" encoding="UTF-8"?>
<response>
  . . .
  <lst name="error">
    <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
      . . .
      Caused by: java.lang.OutOfMemoryError: Java heap space
      at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
      . . .
    </str>
    <int name="code">500</int>
  </lst>
</response>

Page 54: Solr Anti - patterns

Now let’s look at performance

Page 59: Solr Anti - patterns

Magic happens with small changes

curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
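The change is only two parameters: facet.limit=100 caps the number of returned values and facet.mincount=1 drops values with no matching documents. One way to keep clients from sending the unbounded variant is to build the query through a small helper that rejects it. The sketch below does that. The function name and the rejection policy are assumptions, and it covers only the parameters visible on the slide, not the ones elided there.

```python
from urllib.parse import urlencode


def facet_query(field, limit=100, mincount=1):
    """Build a bounded field-facet query string for the select handler."""
    if limit < 0:
        # facet.limit=-1 means "return every value", the pattern to avoid.
        raise ValueError("unbounded facet.limit is not allowed")
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    return "/solr/select?" + urlencode(params)
```

Calling `facet_query("tag")` yields the same parameters as the curl command above, with the query value percent-encoded.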

Page 67: Solr Anti - patterns

Monitoring in production

http://sematext.com/spm/index.html

Page 68: Solr Anti - patterns

And remember…

<luceneMatchVersion>3.1</luceneMatchVersion>

Page 69: Solr Anti - patterns

Quick summary

http://www.soothetube.com/2013/12/29/thats-all-folks/

Page 70: Solr Anti - patterns

We are hiring!

Dig Search?

Dig Analytics?

Dig Big Data?

Dig Performance?

Dig Logging?

Dig working with and in open source?

We’re hiring worldwide!

http://sematext.com/about/jobs.html