The (not so) perfect migration
http://en.wikipedia.org/wiki/Bird_migration
http://www.likesbooks.com/aarafterhours/?p=750
The lonely solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
And faulty indexing
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="error">
    <str name="msg">missing content stream</str>
    <int name="code">400</int>
  </lst>
</response>
109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Let’s make that right
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/json</str>
  </lst>
</requestHandler>
<luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
<nrtMode>true</nrtMode>
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
The old schema.xml
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
The new schema.xml
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
Threads? What threads?
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">200</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
OK, so now we can actually run queries
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
Let’s cache everything
<filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
<queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
<documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
OK, show us the way, "Mr. Consultant"
<filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/>
<queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
We need to refresh and hard commit
<autoCommit>
  <maxTime>1000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
Maybe we should only refresh?
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
OK, let’s go easy with refreshing
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>
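To compare the three commit configurations above, this rough sketch computes an upper bound on how many new searchers each one can open per hour. It assumes updates arrive continuously, so every interval triggers a commit. The function name and that indexing assumption are illustrative and not taken from the slides.

```python
def max_searchers_per_hour(soft_commit_ms, hard_commit_ms, hard_opens_searcher):
    """Upper bound on new searchers per hour, assuming continuous indexing."""
    hour_ms = 60 * 60 * 1000
    # Every soft commit makes new documents visible by opening a searcher.
    total = hour_ms // soft_commit_ms
    # A hard commit only opens a searcher when openSearcher is true.
    if hard_opens_searcher:
        total += hour_ms // hard_commit_ms
    return total


first = max_searchers_per_hour(1000, 1000, hard_opens_searcher=True)
second = max_searchers_per_hour(1000, 60000, hard_opens_searcher=False)
third = max_searchers_per_hour(30000, 60000, hard_opens_searcher=False)
print(first, second, third)
```

Each new searcher also starts cache autowarming, so a lower number here should mean less time spent replaying cached queries.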
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=5'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9418</int>
    <lst name="params">
      <str name="start">3000000</str>
      <str name="q">*:*</str>
      <str name="rows">5</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="3000000">
    . . .
  </result>
</response>
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=5'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="error">
    <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
    . . .
Caused by: java.lang.OutOfMemoryError: Java heap space
    . . .
    </str>
    <int name="code">500</int>
  </lst>
</response>
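The cost of the request above comes from its offset. To return rows 3,000,000 to 3,000,004 in sorted order, the top start + rows = 3,000,005 candidates have to be collected first, and only the last five are returned. The sketch below shows that effect on a small in-memory list. The `page` function and the random scores are illustrative assumptions, not Solr internals.

```python
import heapq
import random


def page(scores, start, rows):
    """Return one page of (score, doc_id) pairs plus the number of entries collected."""
    needed = start + rows
    # The best start+rows entries must be ranked before the page can be cut out.
    top = heapq.nlargest(needed, ((s, i) for i, s in enumerate(scores)))
    return top[start:], needed


random.seed(0)
scores = [random.random() for _ in range(10_000)]

first_page, kept_first = page(scores, start=0, rows=5)
deep_page, kept_deep = page(scores, start=9_000, rows=5)
print(kept_first, kept_deep)
```

Both calls return five documents, but the deep page has to rank 9,005 entries to find them.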
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">189</int>
    <lst name="params">
      <str name="sort">score desc,id desc</str>
      <str name="q">*:*</str>
      <str name="cursorMark">*</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    <doc> ... </doc>
    . . .
  </result>
  <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
</response>
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">184</int>
    <lst name="params">
      <str name="sort">score desc,id desc</str>
      <str name="q">*:*</str>
      <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
    </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    <doc> ... </doc>
    . . .
  </result>
  <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
</response>
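The two requests above form a loop: send `cursorMark=*`, then keep sending the `nextCursorMark` from each response until the server returns the same mark you sent. A minimal sketch of that loop follows. `fetch_page` is a hypothetical callable standing in for the HTTP request, and `make_fake_backend` is an in-memory stand-in used only so the example runs without Solr. Real cursor marks are opaque strings.

```python
def iterate_with_cursor(fetch_page):
    """Yield every document by following nextCursorMark until it stops changing.

    fetch_page(cursor_mark) is a hypothetical callable that performs the
    request and returns (docs, next_cursor_mark).
    """
    cursor = "*"  # initial mark, as in the first request
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # same mark returned: no more results
            break
        cursor = next_cursor


def make_fake_backend(doc_ids, rows):
    """In-memory stand-in for the server, sorted by id descending (illustration only)."""
    ordered = sorted(doc_ids, reverse=True)

    def fetch_page(cursor_mark):
        # The fake mark is simply the last id returned on the previous page.
        if cursor_mark == "*":
            remaining = ordered
        else:
            remaining = [d for d in ordered if d < cursor_mark]
        docs = remaining[:rows]
        next_mark = docs[-1] if docs else cursor_mark
        return docs, next_mark

    return fetch_page


all_ids = [f"{n:04d}" for n in range(23)]
seen = list(iterate_with_cursor(make_fake_backend(all_ids, rows=5)))
print(len(seen))
```

The sort in the requests ends with `id desc` because cursor paging needs the unique key as a tie-breaker, so every document has a well-defined position.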
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9967</int>
    <lst name="params"> ... </lst>
  </lst>
  <result name="response" numFound="3284000" start="0">
    . . .
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="tag"> ... </lst>
    </lst>
  </lst>
</response>
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  . . .
  <lst name="error">
    <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
    . . .
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
    . . .
    </str>
    <int name="code">500</int>
  </lst>
</response>
Magic happens with small changes
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
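The two parameters changed above control how many facet entries come back. `facet.limit=-1` returns every term and `facet.mincount=0` includes terms that match no document, while `facet.limit=100` with `facet.mincount=1` returns at most the 100 most frequent terms that actually occur. The sketch below mimics that on a plain list of tags. The `facet_counts` function and the sample data are illustrative assumptions, not Solr code.

```python
from collections import Counter


def facet_counts(values, all_terms, limit, mincount):
    """Mimic facet.limit / facet.mincount for a single field (illustration only)."""
    counts = Counter({term: 0 for term in all_terms})  # every indexed term starts at 0
    counts.update(values)                              # count terms in matching documents
    kept = [(t, c) for t, c in counts.most_common() if c >= mincount]
    return kept if limit < 0 else kept[:limit]


all_terms = [f"tag{n}" for n in range(10_000)]      # terms present in the index
values = [f"tag{n % 500}" for n in range(5_000)]    # terms in the matching documents

unlimited = facet_counts(values, all_terms, limit=-1, mincount=0)
limited = facet_counts(values, all_terms, limit=100, mincount=1)
print(len(unlimited), len(limited))
```

With these sample values the unlimited call returns one entry per indexed term, and the limited call returns 100.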
We are hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open-source? We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank you! Rafał Kuć
@kucrafal
Sematext
@sematext http://sematext.com http://blog.sematext.com