39
Side by Side with Elasticsearch & Solr Part 2 Performance & Scalability Radu Gheorghe Rafał Kuć

Side by Side with Elasticsearch & Solr, Part 2

Embed Size (px)

Citation preview

Page 1: Side by Side with Elasticsearch & Solr, Part 2

Side  by  Side  withElasticsearch &  Solr

Part  2  -­ Performance  &  Scalability

Radu  GheorgheRafał  Kuć

Page 2: Side by Side with Elasticsearch & Solr, Part 2

Who  are  we?Radu Rafał

Logsenecftweia

Page 3: Side by Side with Elasticsearch & Solr, Part 2

Last  year

Page 4: Side by Side with Elasticsearch & Solr, Part 2

Last  year  conclusions

Page 5: Side by Side with Elasticsearch & Solr, Part 2

One  year  progress

Facet  by  functionhttps://issues.apache.org/jira/browse/SOLR-­1581

Analytics  componenthttps://issues.apache.org/jira/browse/SOLR-­5302

Solr  as  standalone  applicationhttps://issues.apache.org/jira/browse/SOLR-­4792

top_hits  aggregationhttps://github.com/elastic/elasticsearch/pull/6124

minimum_should_matchon  has_childhttps://github.com/elastic/elasticsearch/pull/6019

filters  aggregationhttps://github.com/elastic/elasticsearch/pull/6118

Page 6: Side by Side with Elasticsearch & Solr, Part 2

That's  not  all

JSON  facetshttps://issues.apache.org/jira/browse/SOLR-­7214

Backup  +  restorehttps://issues.apache.org/jira/browse/SOLR-­5750

Streaming  aggregationshttps://issues.apache.org/jira/browse/SOLR-­7082

Moving  averages  aggregationhttps://github.com/elastic/elasticsearch/pull/10002

Computation  on  aggregationshttps://github.com/elastic/elasticsearch/pull/9876

Cluster  state  diff  supporthttps://github.com/elastic/elasticsearch/pull/6295

Cross  data  center  replicationhttps://issues.apache.org/jira/browse/SOLR-­6273

Shadow  replicashttps://github.com/elastic/elasticsearch/pull/8976

Page 7: Side by Side with Elasticsearch & Solr, Part 2

This year’s agenda

Horizontal scaling

Products use-­case

Logs use-­case

Page 8: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling in  theory

Node

shard

Page 9: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling  in  theory

Node

shard

Node

shard

Page 10: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling  in  theory

Node

shard

Node

shard

Node

shard

Page 11: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling  in  theory

Node

shard shard

shard shard

Node

shard shard

shard shard

Node

shard shard

shard shard

Page 12: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling  in  theory

Node

shard shard

shard shard

replica

replica

replica

replica

Node

shard shard

shard shard

replica

replica

replica

replica

Node

shard shard

shard shard

replica

replica

replica

replica

Page 13: Side by Side with Elasticsearch & Solr, Part 2

Horizontal scaling  -­ the  API

Create /  remove replicas  on  the  fly  -­Collections  API

Moving  shards  around the  cluster  using  add  /  delete  replica

Create /  remove replicas  on  the  fly  -­Update  Indices  Settings

Moving  shards  around the  cluster  using  Cluster  Reroute  API

Shard  splitting using  Collections  API

Automatic  shard  balancingby  default

Migrating  data with  a  given  routing  keyto  another  collection  using  API

Shard  allocation  awareness &  rule  based  shard  placement

Page 14: Side by Side with Elasticsearch & Solr, Part 2

The  products  -­ assumptions

Steady  data  growth

Common,  known  data  structure

Large  QPS

Spikes  in  traffic

Page 15: Side by Side with Elasticsearch & Solr, Part 2

Hardware  and  data

2  x  EC2  c3.large  instances(2vCPU,  3.5GB  RAM,2x16GB  SSD  in  RAID0)

vs

Wikipedia

Page 16: Side by Side with Elasticsearch & Solr, Part 2

Test  requests

One,  common query

https://github.com/sematext/berlin-­buzzwords-­samples/tree/master/2015

Dictionary  of  common and  uncommon terms

JMeter hammering

Page 17: Side by Side with Elasticsearch & Solr, Part 2

Product  search  @  5 threads

Page 18: Side by Side with Elasticsearch & Solr, Part 2

The  products  -­ summary

preparefor  highQPS

plan  for  data  growth

Both are fast,  but  with  wikipedia

<

Page 19: Side by Side with Elasticsearch & Solr, Part 2

The  products  -­ summary

preparefor  highQPS

plan  for  data  growth

Both are fast,  but  with  wikipedia

<

Page 20: Side by Side with Elasticsearch & Solr, Part 2

Hardware  and  data (2nd  try)

2  x  EC2  c3.large  instances(2vCPU,  3.5GB  RAM,2x16GB  SSD  in  RAID0)

vs

Videosearch

video video

video video

video video

video

video

Page 21: Side by Side with Elasticsearch & Solr, Part 2

Real product search  @  20 threads

Page 22: Side by Side with Elasticsearch & Solr, Part 2

Real product search  @  20 threads

Page 23: Side by Side with Elasticsearch & Solr, Part 2

The  products  – real summary

preparefor  highQPS

plan  for  data  growth

With  video  both are fast and  configuration matters

Page 24: Side by Side with Elasticsearch & Solr, Part 2

The  logs  -­ assumptions

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

LogsLogs

Logs Logs

Logs

Lots  of  data

No  updates

Low  query  count

Time  oriented  data

Page 25: Side by Side with Elasticsearch & Solr, Part 2

Hardware  and  data

2  x  EC2  c3.large  instances(2vCPU,  3.5GB  RAM,2x16GB  SSD  in  RAID0)

vsLogs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

Logs

LogsLogs

Logs Logs

Logs

Apache  logs

Page 26: Side by Side with Elasticsearch & Solr, Part 2

Tuning

refresh_interval:  5s

doc_values:   truedocValues:   true

soft  autocommit: 5s

catch  all  field:  on _all:  enabled

hard autocommit:  200mb flush_threshold_size:   200mb

Page 27: Side by Side with Elasticsearch & Solr, Part 2

Test  requestsFilters Aggregations/Facetsfilter  by  client    IP date  histogram

filter  by  word  in  user  agent

top  10  response  codes

wildcard  filter  on  domain #  of  unique  IPs

top  IPs  per  response  per  time

https://github.com/sematext/berlin-­buzzwords-­samples/tree/master/2015

Page 28: Side by Side with Elasticsearch & Solr, Part 2

Test  runs

1.Write  throughput

2. Capacity of  a  single  index

3.Capacity with  time-­based indices on  

hot/cold setup

Page 29: Side by Side with Elasticsearch & Solr, Part 2

Single  index  write  throughput

Page 30: Side by Side with Elasticsearch & Solr, Part 2

Single  index  @  max EPS

Page 31: Side by Side with Elasticsearch & Solr, Part 2

Single  index  @  400 EPS

Page 32: Side by Side with Elasticsearch & Solr, Part 2

Time-­based indices

smaller  indiceslighter  indexingeasier  to  isolate  hot  data  from  cold  dataeasier  to  relocate

bigger  indicesless  RAMless  management  overheadsmaller  cluster  state

Page 33: Side by Side with Elasticsearch & Solr, Part 2

Hot  /  Cold in  practice

cron job does everything

creates indices in  advance,  on  the  hot  nodes (createNodeSet property)

Uses node properties (e.g.  node.tag)

index templates +  shard allocationawareness =  new indices go  to  hot  

nodes automagically

Optimizes old indices,  creates N  newreplicas of  each shard on  the  cold nodes

Removes all the  replicas from  the  hot  nodes

Cron job optimizes old indices and  changes shard allocation attributes =>  shards get moved and  Do on  the  cold

nodes

Page 34: Side by Side with Elasticsearch & Solr, Part 2

Time-­based:  2  hot and  2  cold

Before

AfterAfter

Before

Page 35: Side by Side with Elasticsearch & Solr, Part 2

Time-­based:  2  hot and  2  cold

Page 36: Side by Side with Elasticsearch & Solr, Part 2

The  logs  -­ summary

Faceting

use  time  based  indices

use  hot  /  cold  nodes  policy

Filtering

Page 37: Side by Side with Elasticsearch & Solr, Part 2

One  summary to  rule them all

Differences in  configuration often mattermore than differences in  products

Before the  tests we  expected results to  be  totally opposite in  both use cases

Do  your own tests with  your data and  queries before jumping   into conclusions

Page 38: Side by Side with Elasticsearch & Solr, Part 2

We  are  hiringDig  Search?Dig  Analytics?Dig  Big  Data?Dig  Performance?Dig  Logging?Dig  working  with  and  in  open  – source?We’re  hiring  world  – wide!

http://sematext.com/about/jobs.html

Page 39: Side by Side with Elasticsearch & Solr, Part 2

Call  me,  maybe?

Radu [email protected]@sematext.com

Rafał Kuć@[email protected]

Sematext@sematexthttp://sematext.com