Elasticsearch in production
Igor Motov
[email protected]
twitter: @imotov
github: imotov

Boston elasticsearch meetup October 2012



Sonian Inc.
• Cloud-based email archiving
• Founded in 2007
• Headquarters: Newton, MA

Small team of about 15 developers distributed from Campinas, Brazil to Vancouver, Canada


Using elasticsearch since June 2010, v0.8.0

We have about 6 billion records indexed in elasticsearch

100,000 Netflix DVD titles

3,000,000 Pages in en.wikipedia.org

22,000,000 Books in Library of Congress catalog

150,000,000 LinkedIn profiles

3,000,000,000 Estimated bing.com index size

6,000,000,000 Sonian Inc. index size

50,000,000,000 Estimated google.com index size

Infrastructure


http://www.sonian.com/awssonian-technical-diagram/

Ingestion (safe): Clojure
Search Engine: elasticsearch
Web App: Ruby on Rails
Deployment: Chef
Monitoring: Sensu

10 clusters
6 AWS Regions
2-17 nodes in each cluster

Custom version of elasticsearch based on 0.19.9 with several plugins

jetty plugin

• jetty-based http transport
• SSL support
• Authentication
• Request logging (json, plain)


Request logs are also indexed in elasticsearch

Open source: https://github.com/sonian/elasticsearch-jetty

Zookeeper plugin

• Zookeeper-based discovery
• Replacement for zen discovery
• Experimental!

Open source: https://github.com/sonian/elasticsearch-zookeeper

Valve plugin

• Custom jetty plugin filter
• Rejects bulk indexing requests if cluster is overloaded
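
The slides do not say how the valve decides that the cluster is overloaded. As a rough illustration only, the sketch below assumes a hypothetical rule: count bulk requests currently in flight and reject new ones with HTTP 503 once a limit is reached. The names Valve, handle and the "/_bulk" path check are illustrative and are not taken from the actual plugin.

```python
import threading


class Valve:
    """Hypothetical admission gate: tracks in-flight bulk requests."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_enter(self):
        # Admit the request only if the limit has not been reached.
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def leave(self):
        with self._lock:
            self._in_flight -= 1


def handle(valve, path, do_request):
    """Return (status, body). Only bulk indexing paths are gated."""
    if not path.endswith("/_bulk"):
        return 200, do_request()
    if not valve.try_enter():
        # Assumed behaviour: tell the client to back off and retry.
        return 503, "cluster overloaded, retry later"
    try:
        return 200, do_request()
    finally:
        valve.leave()
```

Searches and other requests pass through untouched in this sketch; only bulk indexing is shed, which matches the behaviour described on the slide.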


Lessons learned in the last two years

or

Proper Care and Feeding of Elasticsearch Nodes

Rule 1: Give nodes plenty of space

Running out of disk space or memory is the simplest way to corrupt your index.

Make sure elasticsearch doesn’t swap

It reduces performance and causes nodes to leave clusters


elasticsearch.yml

bootstrap.mlockall: true


Increase the number of open file descriptors to 64k.
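
One way to verify the descriptor limit is a small check like the sketch below, which uses Python's standard resource module (Unix only). It inspects the limit of the process that runs the script, so it is only meaningful when run as the same user and from the same environment that launches elasticsearch. The helper names are illustrative.

```python
import resource

REQUIRED_FDS = 64 * 1024  # 64k, the value suggested above


def limit_is_sufficient(soft_limit, required=REQUIRED_FDS):
    """True if the soft limit is unlimited or at least `required`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


def check_current_process():
    # getrlimit returns a (soft, hard) tuple for the calling process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard, limit_is_sufficient(soft)


if __name__ == "__main__":
    soft, hard, ok = check_current_process()
    print(f"soft={soft} hard={hard} sufficient={ok}")
```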

Rule 2: Distributed but well connected

All nodes should be able to talk to each other all the time

Otherwise your cluster might get split-brain syndrome

Consider setting discovery.zen.minimum_master_nodes
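
The slide does not give a value for this setting. A commonly recommended choice is a strict majority of master-eligible nodes, that is (N / 2) + 1 with integer division. The sketch below computes that value and checks, under a simplified two-way partition model, whether both sides of a network split could still elect a master. The function names are illustrative.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_split_brain(master_eligible, min_masters):
    """True if some two-way partition lets both sides elect a master."""
    return any(
        a >= min_masters and master_eligible - a >= min_masters
        for a in range(1, master_eligible)
    )


if __name__ == "__main__":
    for n in (3, 5, 17):
        q = minimum_master_nodes(n)
        print(n, q, can_split_brain(n, q))
```

With a majority quorum no partition can leave two sides that both meet the threshold, which is why a value of 1 (any node can elect itself) is risky on clusters of more than one node.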

Rule 3: Throttle the bulk indexing load

Asynchronous architecture makes es scalable and fast, but susceptible to running out of memory under excessive bulk indexing load.
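
Throttling can also be done on the client side. The sketch below is one possible approach, not the method used in the talk: it splits documents into batches and uses a semaphore so that no more than max_in_flight batches are outstanding at once. send_bulk is a hypothetical callable standing in for whatever client actually issues the bulk request, and the default batch size and concurrency are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def throttled_bulk(docs, send_bulk, batch_size=500, max_in_flight=2):
    """Send docs in batches with a bounded number of outstanding requests."""
    slots = threading.BoundedSemaphore(max_in_flight)

    def run(batch):
        try:
            return send_bulk(batch)
        finally:
            slots.release()  # free the slot even if the request fails

    futures = []
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for batch in chunks(docs, batch_size):
            slots.acquire()  # blocks the producer while all slots are busy
            futures.append(pool.submit(run, batch))
    # Results come back in submission order; exceptions are re-raised here.
    return [f.result() for f in futures]
```

Because the producer blocks on the semaphore, batches are not queued up faster than they complete, which keeps the amount of pending work on both client and cluster bounded.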

Rule 4: Try to make all shards approximately the same size

Elasticsearch allocates shards based on the number of shards. It doesn’t consider shard sizes or available disk space.
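
To see why uneven shards hurt, here is a deliberately simplified model of count-based placement. It is not the actual elasticsearch allocator: each shard just goes to the node that currently holds the fewest shards, and its size is ignored. The shard sizes in the example are made up.

```python
def allocate_by_count(shard_sizes_gb, node_count):
    """Place each shard on the node with the fewest shards, ignoring size."""
    nodes = [[] for _ in range(node_count)]
    for size in shard_sizes_gb:
        # Ties go to the lowest-numbered node.
        target = min(range(node_count), key=lambda i: len(nodes[i]))
        nodes[target].append(size)
    return nodes


def disk_usage(nodes):
    """Total size stored on each node."""
    return [sum(shards) for shards in nodes]


if __name__ == "__main__":
    uneven = allocate_by_count([100, 1, 100, 1, 100, 1], node_count=2)
    even = allocate_by_count([34] * 6, node_count=2)
    print(disk_usage(uneven))  # [300, 3]
    print(disk_usage(even))    # [102, 102]
```

Both layouts put three shards on each node, but with uneven shards one node carries nearly all of the data and is the first to run out of disk.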

4 rules for happy elasticsearch

1. Give nodes plenty of space
2. Distributed but well connected
3. Throttle the load
4. Make all shards the same size


Questions?


More Information

Latest stable release: 0.19.10

Web Site: http://www.elasticsearch.org/

Follow @elasticsearch on twitter

IRC: #elasticsearch on irc.freenode.net

GitHub: https://github.com/elasticsearch/elasticsearch

Mailing list: elasticsearch on http://groups.google.com/

Stackoverflow tag: elasticsearch