Side by Side with Elasticsearch & Solr, Part 2

Side by Side withElasticsearch & Solr

Part 2 - Performance & Scalability

Radu GheorgheRafał Kuć

Who are we?Radu Rafał

Logsenecftweia

Last year

Last year conclusions

One year progress

Facet by functionhttps://issues.apache.org/jira/browse/SOLR-1581

Analytics componenthttps://issues.apache.org/jira/browse/SOLR-5302

Solr as standalone applicationhttps://issues.apache.org/jira/browse/SOLR-4792

top_hits aggregationhttps://github.com/elastic/elasticsearch/pull/6124

minimum_should_matchon has_childhttps://github.com/elastic/elasticsearch/pull/6019

filters aggregationhttps://github.com/elastic/elasticsearch/pull/6118

That's not all

JSON facetshttps://issues.apache.org/jira/browse/SOLR-7214

Backup + restorehttps://issues.apache.org/jira/browse/SOLR-5750

Streaming aggregationshttps://issues.apache.org/jira/browse/SOLR-7082

Moving averages aggregationhttps://github.com/elastic/elasticsearch/pull/10002

Computation on aggregationshttps://github.com/elastic/elasticsearch/pull/9876

Cluster state diff supporthttps://github.com/elastic/elasticsearch/pull/6295

Cross data center replicationhttps://issues.apache.org/jira/browse/SOLR-6273

Shadow replicashttps://github.com/elastic/elasticsearch/pull/8976

This year’s agenda

Horizontal scaling

Products use-case

Logs use-case

Horizontal scaling in theory

shard shard

Horizontal scaling in theory

shard shard

replica

shard shard

replica

shard shard

replica

Horizontal scaling - the API

Create / remove replicas on the fly -Collections API

Moving shards around the cluster using add / delete replica

Create / remove replicas on the fly -Update Indices Settings

Moving shards around the cluster using Cluster Reroute API

Shard splitting using Collections API

Automatic shard balancingby default

Migrating data with a given routing keyto another collection using API

Shard allocation awareness & rule based shard placement

The products - assumptions

Steady data growth

Common, known data structure

Large QPS

Spikes in traffic

Hardware and data

2 x EC2 c3.large instances(2vCPU, 3.5GB RAM,2x16GB SSD in RAID0)

Wikipedia

Test requests

One, common query

https://github.com/sematext/berlin-buzzwords-samples/tree/master/2015

Dictionary of common and uncommon terms

JMeter hammering

Product search @ 5 threads

The products - summary

preparefor highQPS

plan for data growth

Both are fast, but with wikipedia

The products - summary

preparefor highQPS

Both are fast, but with wikipedia

Hardware and data (2nd try)

Videosearch

video video

Real product search @ 20 threads

The products – real summary

preparefor highQPS

With video both are fast and configuration matters

The logs - assumptions

LogsLogs

Logs Logs

Lots of data

No updates

Low query count

Time oriented data

Hardware and data

vsLogs

LogsLogs

Logs Logs

Apache logs

Tuning

refresh_interval: 5s

doc_values: truedocValues: true

soft autocommit: 5s

catch all field: on _all: enabled

hard autocommit: 200mb flush_threshold_size: 200mb

Test requestsFilters Aggregations/Facetsfilter by client IP date histogram

filter by word in user agent

top 10 response codes

wildcard filter on domain # of unique IPs

top IPs per response per time

https://github.com/sematext/berlin-buzzwords-samples/tree/master/2015

Test runs

1.Write throughput

2. Capacity of a single index

3.Capacity with time-based indices on

hot/cold setup

Single index write throughput

Single index @ max EPS

Single index @ 400 EPS

Time-based indices

smaller indiceslighter indexingeasier to isolate hot data from cold dataeasier to relocate

bigger indicesless RAMless management overheadsmaller cluster state

Hot / Cold in practice

cron job does everything

creates indices in advance, on the hot nodes (createNodeSet property)

Uses node properties (e.g. node.tag)

index templates + shard allocationawareness = new indices go to hot

nodes automagically

Optimizes old indices, creates N newreplicas of each shard on the cold nodes

Removes all the replicas from the hot nodes

Cron job optimizes old indices and changes shard allocation attributes => shards get moved and Do on the cold

Time-based: 2 hot and 2 cold

Before

AfterAfter

Before

Time-based: 2 hot and 2 cold

The logs - summary

Faceting

use time based indices

use hot / cold nodes policy

Filtering

One summary to rule them all

Differences in configuration often mattermore than differences in products

Before the tests we expected results to be totally opposite in both use cases

Do your own tests with your data and queries before jumping into conclusions

We are hiringDig Search?Dig Analytics?Dig Big Data?Dig Performance?Dig Logging?Dig working with and in open – source?We’re hiring world – wide!

http://sematext.com/about/jobs.html

Call me, maybe?

Radu Gheorghe@radu0gheorgheradu.gheorghe@sematext.com

Rafał Kuć@kucrafalrafal.kuc@sematext.com

Sematext@sematexthttp://sematext.com

Side by Side with Elasticsearch & Solr, Part 2

Documents

Search Evolution - Von Lucene zu Solr und ElasticSearch (Majug 20.06.2013)

Solr vs ElasticSearch

Elasticsearch and Solr for Logs

Combining Solr and Elasticsearch to Improve Autosuggestion ... · Apache: Big Data 2015 Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search Toan Vinh

in Lucene, Solr and ElasticSearch and the eco-system...Language support and linguistics in Lucene, Solr and ElasticSearch and the eco-system Christian Moen cm@atilika.com June 3rd,

Oak / Solr integration Tommaso Teofili · Oak / Solr integration Tommaso Teofili . adaptTo() 2012 ! Why ! Search on Oak with Solr ! Solr based QueryIndex ! Solr based MK ! Benchmarks

Elasticsearch: Accelerating the Django Admin · Elasticsearch Service elastic cloud Elasticsearch Reference + + + + + + + Elasticsearch Reference: 6.4 (current) Getting Started Set

Combining Solr and Elasticsearch to Improve Autosuggestion on

Solr and ElasticSearch demo and speaker feb 2014

Improving Drupal search experience with Apache Solr and ... · Improving Drupal search experience with Apache Solrand Elasticsearch When to use Elasticsearch 33 • The Elastic Stack

Solr JDBC - Lucene/Solr Revolution 2016

Battle of the Giants - ApacheConarchive.apachecon.com/eu2012/.../aceu-2012...solr-4_0-vs-elasticsea… · Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20 Rafał Kuć –

Battle of the giants: Apache Solr vs ElasticSearch

Indexing and Searching Logs with Elasticsearch/Solr by Radu Gheorghe from Sematext

ÁìÇb AIF GP=C · Cassandra, MongoDB, Redis, MySQL, ElasticSearch/solr Reporting, Visualization (Tableau, Zepplin, Hue…) Storm/Heron Spark Streaming Flink Spark MapReduce HDFS

Search Evolution – von Lucene zu Solr und ElasticSearchTerm Document Id Such 1 Evolution 1 Von 1 Lucene 1 zu 1 Solr 1 und 1 ElasticSearch 1 Verteiltes 2 Suchen 2 mit 2 Elasticsearch

Elasticsearch 5 in Amazon Elasticsearch Service

Release 1.1.0 Tuplejump - Read the Docs · 2019. 4. 2. · •Stargate-search - A search server like Solr/ElasticSearch (Work in progress.) 1.1Stargate-core Features 1.Add lucene

Scaling Solr with Solr Cloud

Solr/Elasticsearch for CF Developers (and others)