Solr Anti Patterns

  • 1. Solr Anti-patterns. Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext http://sematext.com

  • 2. About me
    Sematext consultant & engineer
    Solr.pl co-founder
    Father & husband
  • 3. The (not so) perfect migration
    http://en.wikipedia.org/wiki/Bird_migration
    http://www.likesbooks.com/aarafterhours/?p=750
  • 4. From 3.1 to 4.10 (and hopefully not back)
    March 2011 to September 2014
  • 5. The lonely solrconfig.xml
    LUCENE_31
  • 6. And faulty indexing
    DOC DOC DOC
    EXCEPTIONS :)
  • 7. And faulty indexing
    400 0 missing content stream 400
    109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: missing content stream
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)
  • 8. Let's make that right
    application/json
    LUCENE_4.10.0
    true
    ${solr.ulog.dir:}
  • 9-10. The old schema.xml
  • 11. The new schema.xml
  • 12. Threads? What threads?
    10 200 false
  • 13. I see deadlocks
  • 14. Threads? What threads?
    10 200 false
  • 15. OK, so now we can actually run queries
    10 10000 false
  • 16-20. The ZooKeeper
  • 21. The ZooKeeper production
  • 22-25. The ZooKeeper production
    -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 26. Let's cache everything
  • 27-28. And now let's look at the warmup times
  • 29. OK, show us the way, Mr. Consultant
  • 30-31. Let's look at the warmup times again
  • 32-33. Bulks are for noobs
    Application Application Application
    Doc Doc Doc
  • 34-35. But let's use bulks, just in case
  • 36. We need to refresh and hard commit
    1000 true 1000
  • 37. Maybe we should only refresh?
    60000 false 1000
  • 38. OK, let's go easy with refreshing
    60000 false 30000
  • 39. But I really need all that data
    curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 40. But I really need all that data
    curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
    0 9418 3000000 *:* 100 . . .
  • 41. But I really need all that data
    curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
    0 9418 3000000 *:* 5 . . .
    java.lang.OutOfMemoryError: Java heap space
    java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
        . . .
    Caused by: java.lang.OutOfMemoryError: Java heap space
    . . .
    500
  • 42. But I really need all that data
    Query
  • 43-44. But I really need all that data
  • 45. But I really need all that data
    Response
  • 46. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  • 47. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
    0 189 score desc,id desc *:* * ... . . . AoIIP4AAACY5OTk5OTA=
  • 48. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
  • 49. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
    0 184 score desc,id desc *:* AoIIP4AAACY5OTk5OTA= ... . . . AoIIP4AAACY5OTk5ODE=
  • 50. Limiting faceting, why bother?
    curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&facet.limit=-1&facet.mincount=0'
  • 51. Limiting faceting, why bother?
    curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&facet.limit=-1&facet.mincount=0'
    0 9967 ... . . . ...
  • 52. Limiting faceting, why bother?
    curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&facet.limit=-1&facet.mincount=0'
    . . .
    Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
    org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
    . . .
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
    . . .
    500
  • 53-57. Now let's look at performance
  • 58. Magic happens with small changes
    curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&facet.limit=100&facet.mincount=1'
  • 59-65. Magic happens with small changes
  • 66. Monitoring in production
    http://sematext.com/spm/index.html
  • 67. And remember
    3.1
  • 68. Quick summary
    http://www.soothetube.com/2013/12/29/thats-all-folks/
  • 69. We are hiring!
    Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open source?
    We're hiring worldwide!
    http://sematext.com/about/jobs.html
  • 70. Thank you!
    Rafał Kuć @kucrafal rafal.kuc@sematext.com
    Sematext @sematext http://sematext.com http://blog.sematext.com
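
The cursor-based paging from slides 46 to 49 can be sketched as a small client loop. This is a minimal Python sketch, not code from the presentation: the function names `fetch_page` and `iter_all_docs` are hypothetical, the endpoint is the one used in the curl examples, and it assumes that `id` is the collection's unique key and that the response is requested as JSON via `wt=json`. The loop sends `cursorMark=*` first, then feeds each `nextCursorMark` back into the next request, and stops when Solr returns the same cursor it was given.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one select request and return the parsed JSON body.

    Hypothetical helper: assumes a Solr select handler is reachable at base_url.
    """
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{base_url}?{query}") as resp:
        return json.load(resp)


def iter_all_docs(fetch, base_url, query="*:*", sort="score desc,id desc", rows=100):
    """Yield every matching document using cursorMark instead of start=N.

    `fetch` is any callable (base_url, params) -> parsed response, which keeps
    the loop testable without a running Solr. The sort must include the
    unique key field (assumed here to be `id`) as a tie-breaker.
    """
    cursor = "*"  # initial cursor, as in the slide 46 example
    while True:
        params = {
            "q": query,
            "sort": sort,
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        body = fetch(base_url, params)
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returned the cursor it was sent: nothing left to read.
            return
        cursor = next_cursor
```

Against a live instance this might be called as `for doc in iter_all_docs(fetch_page, "http://localhost:8983/solr/select"): ...`. Each request then only carries the current cursor rather than an ever-growing `start` offset, which is the point the slides make by contrasting it with `start=3000000`.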