Scaling Solr with Solr Cloud

DESCRIPTION

How to make the most of your Solr Cloud clusters.

1. Scaling Solr with SolrCloud
  Rafał Kuć, Sematext Group, Inc.
  @kucrafal @sematext sematext.com

2. Tá mé (I am)
  • Sematext consultant & engineer
  • Solr.pl co-founder
  • Father and husband

3. Solr History
  • Solr 4.1 and counting
  • Solr 4.0 released
  • Lucene / Solr merge
  • Solr 1.4 released
  • Solr 1.3 released
  • Incubator graduation
  • Solr donated to ASF
  • Y. Seeley creates Solr

4. The Past

5. Master Slave Deployment
  [Diagram: Application, one Solr Master, four Solr Slaves]

6. Master as SPOF
  [Diagram: Application, one Solr Master, four Solr Slaves]

7. Replication Time
  [Diagram: Indexing App, Solr Master, three Solr Slaves, Querying App]

8. Too Much for a Single Shard
  [Diagram: Application, one Solr Master, two Solr Slaves]

9. Too Much for a Single Shard
  [Diagram: Application, three Solr Masters, six Solr Slaves]

10. Querying in Multi Master Deployment
  [Diagram: Application and Solr Slaves for Shard 1, Shard 2, and Shard 3; labels: "Shard1, shard2, shard3", "Doc", "Response"]

11. SolrCloud Comes Into Play

12. Basic Glossary
  • Cluster
  • Node
  • Collection
  • Shard
  • Leader & Replica
  • Overseer
  https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary

13. Apache ZooKeeper
  • Quorum is required
  • Sample configuration:
      clientPort=2181
      dataDir=/usr/share/zookeeper/data
      tickTime=2000
      initLimit=10
      syncLimit=5
      server.1=192.168.1.1:2888:3888
      server.2=192.168.1.2:2888:3888
      server.3=192.168.1.3:2888:3888
  [Diagram: three ZooKeeper nodes]

14. Solr Instances
  [Diagram: four Solr Servers and three ZooKeeper nodes; each Solr Server is started with one of the flags below]
  -DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
  -DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181
  -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
  -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
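
The "quorum is required" point and the -DzkHost flags can be illustrated with a small Python sketch. It is not part of the original deck: the helper names quorum_size, tolerated_failures, and zk_host_flag are hypothetical, the addresses are the ones from the sample configuration, and the code assumes the usual simple-majority rule for a ZooKeeper ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble (assumes simple-majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def zk_host_flag(hosts, client_port=2181):
    """Build the -DzkHost JVM flag from a list of ZooKeeper host addresses."""
    return "-DzkHost=" + ",".join(f"{host}:{client_port}" for host in hosts)


# Addresses taken from the sample configuration; clientPort=2181 as in the slide.
ZK_SERVERS = ["192.168.1.1", "192.168.1.2", "192.168.1.3"]

if __name__ == "__main__":
    n = len(ZK_SERVERS)
    print("quorum:", quorum_size(n))
    print("tolerated failures:", tolerated_failures(n))
    print(zk_host_flag(ZK_SERVERS))
```

Under that majority assumption, a four-server ensemble tolerates the same single failure as a three-server one, which is one reason odd ensemble sizes are the common choice.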
15. Collection Creation
  [Diagram: four Solr Servers, three ZooKeeper nodes]
  $ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution
  $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'

16. Single Collection Deployment
  [Diagram: Application, four Solr Servers, Shard1 and Shard2]

17. Collection with Replica
  [Diagram: four Solr Servers, three ZooKeeper nodes]
  $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'

18. Collection with Replicas
  [Diagram: Application and four Solr Servers hosting Shard1, Shard2, Shard1 Replica, and Shard2 Replica]

19. Querying
  [Diagram: Application sends QUERY to a Solr Server; Shard1 and Shard2 return id, score]

20. Querying
  [Diagram: Shard1 and Shard2 return docs; a Solr Server returns Results to the Application]

21. Shard and Replica Number
  • How your data looks
  • Expected data growth
  • Target performance
  • Target node number
  Max number of nodes = number of shards * (number of replicas + 1)

22. What should I go for?
  • More data? [Diagram: three Shards]
  • More queries? [Diagram: six Replicas]

23. Custom Routing
  • Default (numShards present, pre 4.5)
  • Implicit (numShards not present, pre 4.5)

24. Custom Routing Example
  [Diagram: two Solr Servers hosting Shard1 and Shard2; documents id=userB!3, id=userA!1, id=userA!2]

25. Querying Solr Default Routing
  [Diagram: Application and a Solr Collection with Shard 1 through Shard 8]

26. Querying Solr Custom Routing
  [Diagram: Application and a Solr Collection with Shard 1 through Shard 8]
  q=revolution&_route_=userA!

27. Collection Manipulation Commands
  • Create
  • Delete
  • Reload
  • Split
  • Create Alias
  • Delete Alias
  • Shard Creation/Deletion
  http://wiki.apache.org/solr/SolrCloud

28. Collection Creation
  • name
  • numShards
  • replicationFactor
  • maxShardsPerNode
  • createNodeSet
  • collection.configName
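
The node-count formula from the "Shard and Replica Number" slide and the CREATE parameters listed on the "Collection Creation" slide can be combined in a short Python sketch. It is an illustration only: max_nodes and create_collection_url are hypothetical helper names, the base URL is the one used in the deck's curl examples, and the code assumes (as the replicationFactor=1 and replicationFactor=2 examples in the deck suggest) that replicationFactor counts every copy of a shard, so it equals the number of replicas plus one.

```python
from urllib.parse import urlencode


def max_nodes(num_shards: int, replicas_per_shard: int) -> int:
    """Formula from the slide: shards * (replicas + 1).

    replicas_per_shard counts copies in addition to the leader.
    """
    return num_shards * (replicas_per_shard + 1)


def create_collection_url(base_url, name, num_shards, replication_factor, extra=None):
    """Build a Collections API CREATE URL; nothing is sent over the network.

    extra is an optional dict of further parameters from the slide,
    for example {"maxShardsPerNode": 2}.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if extra:
        params.update(extra)
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    shards, replicas = 2, 1
    print("max nodes:", max_nodes(shards, replicas))
    # Assumption: replicationFactor = replicas + 1 (leader included).
    print(create_collection_url("http://solr1:8983/solr", "revolution",
                                shards, replicas + 1))
```

With two shards and one replica each, the sketch reports four nodes and produces the same URL as the replicationFactor=2 example in the deck.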
29. Collection Split Example
  $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'

30. Collection Split Example
  $ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'

31. Collection Aliasing
  $ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'
  $ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
  $ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'

32. Caches
  • Refreshed with IndexSearcher
  • Configurable
  • Different purposes
  • Different implementations
  [Diagram: Solr Cache]

33. Filter Cache
  q=lucene+revolution+city:Dublin
  q=lucene+revolution&fq=city:Dublin
  q=*:*&fq={!cache=false}city:Dublin
  q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)

34. Document Cache

35. Query Result Cache
  q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10
  q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10
  [Configuration snippet; only the values survive: 20200]

36. Warming
  [Configuration snippet; only the values survive, listed twice: "*:*" with "date desc", "keywords:* OR tags:*", "*:*" with "active:*"; followed by "false"]

37. The Right Directory
  • StandardDirectory
  • SimpleFSDirectory
  • NIOFSDirectory
  • MMapDirectory
  • NRTCachingDirectory
  • RAMDirectory
  [Diagram: index files _0.fdt, _0.fdx, _0.fnm, _0.nvd, _1.fdt, _1.fdx, _1.fnm, _1.nvd]

38. Column oriented fields - DocValues
  • NRT compatible
  • Better compression than field cache
  • Can store data outside of JVM heap
  • Can improve things for dynamic indices

39. Segment Merge
  [Diagram: Level 0: abf; Level 1: ccdeg]

40. Segment Merge Under Control
  • Merge policy
  • Merge scheduler
  • Merge factor
  • Merge policy configuration

41. Configuring Segment Merge
  [Configuration snippet; only the values survive: 101010]

42. Indexing Throughput Tuning
  • Maximum indexing threads
  • RAM buffer size
  • Maximum buffered documents
  • Bulk, bulks and bulks
  • CloudSolrServer
  • Autocommit
  • Cutting off unnecessary stuff

43. TransactionLog
  • Updates durability
  • Recovering peer replay
  • Performant Realtime Get
  [Configuration snippet; only this value survives: ${solr.ulog.dir:}]

44. Autocommit or Not?
  • Automatic data flush
  • Automatic index view refresh
  [Configuration snippet; only the values survive: 150001000false1000]

45. Autocommit & openSearcher=true
  [Configuration snippet; only the values survive: 10true]

46. AutoSoftCommit & openSearcher=false
  [Configuration snippet; only the values survive: 1000false10]

47. Postings Formats to the Rescue
  • Lucene >= 4.0: Flexible Indexing
  • Postings == docs, positions, payloads
  • Different postings formats available:
      Bloom
      Pulsing
      Simple text
      Direct
      Memory

48. Monitoring
  • Cluster state
  • Nodes utilization
  • Memory usage
  • Cache utilization
  • Query response time
  • Warmup times
  • Garbage collector work

49. JMX and Solr

50. JMX and Solr

51. Administration Panel

52. Administration Panel

53. Monitoring with SPM

54. Monitoring with SPM

55. Other Monitoring Tools
  • Ganglia http://ganglia.sourceforge.net/
  • New Relic http://www.newrelic.com/
  • Opsview http://www.opsview.com

56. We Are Hiring!
  • Dig Search?
  • Dig Analytics?
  • Dig Big Data?
  • Dig Performance?
  • Dig working with and in open source?
  We're hiring worldwide!
  http://sematext.com/about/jobs.html

57. Thank You!
  Rafał Kuć @kucrafal rafal.kuc@sematext.com
  Sematext @sematext http://sematext.com http://blog.sematext.com
  SPM discount code: LR2013SPM20 @ Sematext booth ;)