Boston elasticsearch meetup October 2012

Elasticsearch in production

Igor Motov

igor@motovs.org

twitter: @imotov

github: imotov

Sonian Inc.
• Cloud-based email archiving
• Founded in 2007
• Headquarters: Newton, MA

Small team of about 15 developers, distributed from Campinas, Brazil to Vancouver, Canada

Using elasticsearch since June 2010, v0.8.0

6 billion records indexed in elasticsearch

We have about 6,000,000,000 records. For comparison:

• 100,000 Netflix DVD titles
• 3,000,000 pages in en.wikipedia.org
• 22,000,000 books in the Library of Congress catalog
• 150,000,000 LinkedIn profiles
• 3,000,000,000 estimated bing.com index size
• 6,000,000,000 Sonian Inc. index size
• 50,000,000,000 estimated google.com index size

Infrastructure

http://www.sonian.com/awssonian-technical-diagram/

• Ingestion (safe): Clojure
• Search engine: elasticsearch
• Web app: Ruby on Rails
• Deployment: Chef
• Monitoring: Sensu

10 clusters, 6 AWS regions

2-17 nodes in each cluster

Custom version of elasticsearch based on 0.19.9 with several plugins

Jetty plugin

• Jetty-based HTTP transport
• SSL support
• Authentication
• Request logging (JSON, plain text)

Request logs are also indexed in elasticsearch

Open source: https://github.com/sonian/elasticsearch-jetty
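Since the Jetty plugin can write request logs as JSON, and those logs are themselves indexed in elasticsearch, here is a minimal sketch of turning such log records into a bulk API request body. The index name, type name and record fields are hypothetical; the talk does not describe the plugin's log schema or how the logs are shipped.

```python
import json


def build_bulk_body(records, index, doc_type):
    """Serialize log records into a bulk API request body.

    Each record becomes two newline-terminated lines: an action line
    naming the target index and type, then the record itself as the
    document source. index and doc_type are caller-chosen (hypothetical).
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(record))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n" if lines else ""


# Example with made-up request-log fields:
body = build_bulk_body(
    [{"method": "GET", "uri": "/_search", "status": 200}],
    index="request-log",
    doc_type="request",
)
```

The resulting string would be POSTed to the `_bulk` endpoint.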

ZooKeeper plugin

• ZooKeeper-based discovery
• Replacement for zen discovery

Experimental!

Open source: https://github.com/sonian/elasticsearch-zookeeper

Valve plugin

• Custom Jetty plugin filter
• Rejects bulk indexing requests if the cluster is overloaded
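The talk does not say how Valve decides that the cluster is overloaded. The sketch below assumes, purely for illustration, a cap on the number of in-flight bulk requests (`ValveSketch` and `max_pending` are hypothetical names). It shows the shape of the idea: only bulk requests are gated, and rejected ones get an HTTP 503 so the client can retry later.

```python
from dataclasses import dataclass


@dataclass
class ValveSketch:
    """Hypothetical load-shedding check in the spirit of the Valve plugin.

    max_pending is an assumed overload signal: the maximum number of
    bulk requests allowed in flight at once.
    """

    max_pending: int
    pending: int = 0

    def admit(self, method: str, path: str) -> int:
        """Return an HTTP status: 200 to pass the request on, 503 to reject it."""
        is_bulk = method in ("POST", "PUT") and path.rstrip("/").endswith("/_bulk")
        if not is_bulk:
            return 200  # only bulk indexing is throttled; searches always pass
        if self.pending >= self.max_pending:
            return 503  # overloaded: the client should back off and retry
        self.pending += 1
        return 200

    def done(self) -> None:
        """Call when a previously admitted bulk request finishes."""
        self.pending = max(0, self.pending - 1)
```

Rejecting at the HTTP layer keeps the excess bulk payload from ever being queued inside the node.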

Lessons learned in the last two years

Proper Care and Feeding of Elasticsearch Nodes

Rule 1: Give nodes plenty of space

Running out of disk space or memory is the simplest way to corrupt your index.

Make sure elasticsearch doesn’t swap

Swapping reduces performance and causes nodes to leave the cluster.

elasticsearch.yml

bootstrap.mlockall: true

Increase the number of open file descriptors to 64k.
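One way to check both settings before starting a node is to inspect the resource limits of the shell or service account that launches elasticsearch, since child processes inherit them. This Unix-only sketch uses Python's `resource` module; 65536 stands in for "64k", and treating an unlimited locked-memory limit as the goal for `bootstrap.mlockall` is an assumption based on common practice (the limit must at least cover the heap).

```python
import resource


def check_node_limits(min_nofile=65536):
    """Return a list of problems with this process's limits (empty if fine).

    Checks the soft open-file limit against min_nofile, and whether the
    soft locked-memory limit is unlimited, so that mlockall can lock the
    whole JVM heap.
    """
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    problems = []
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < min_nofile:
        problems.append(f"open files limit {nofile_soft} < {min_nofile}")
    if memlock_soft != resource.RLIM_INFINITY:
        problems.append(f"memlock limit {memlock_soft} is not unlimited")
    return problems


for problem in check_node_limits():
    print("WARNING:", problem)
```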

Rule 2: Distributed but well connected

All nodes should be able to talk to each other all the time. Otherwise your cluster might get split-brain syndrome.

Consider setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes (N/2 + 1).
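The reasoning behind a majority: if a master can only be elected by a strict majority of master-eligible nodes, the two sides of a network partition can never both elect one. A minimal sketch of the arithmetic, with a brute-force check over every possible two-way split (function names are illustrative):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: N // 2 + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def split_is_safe(n: int) -> bool:
    """True if no split of n nodes into two sides lets both sides reach quorum."""
    q = minimum_master_nodes(n)
    return all(not (a >= q and n - a >= q) for a in range(n + 1))
```

With 3 master-eligible nodes the quorum is 2. With 2 it is also 2, so losing either node blocks master election, which is one reason to prefer an odd number of master-eligible nodes.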

Rule 3: Throttle the bulk indexing load

The asynchronous architecture makes elasticsearch scalable and fast, but susceptible to running out of memory under excessive bulk indexing.
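On the client side, one way to throttle is to cap the size of each bulk request and back off exponentially whenever the cluster pushes back. In this sketch `send` is a hypothetical callable standing in for whatever issues the bulk request; the batch size, retry count and delays are placeholder values, not tuned recommendations.

```python
import time


def chunks(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def throttled_bulk(docs, send, batch_size=500, max_retries=5,
                   base_delay=0.5, sleep=time.sleep):
    """Send docs in bounded batches, backing off when a batch is rejected.

    send(batch) is hypothetical: it performs one bulk request and returns
    True on success, False if the cluster rejected it (e.g. HTTP 503).
    Returns the number of batches sent.
    """
    sent = 0
    for batch in chunks(docs, batch_size):
        for attempt in range(max_retries + 1):
            if send(batch):
                sent += 1
                break
            if attempt == max_retries:
                raise RuntimeError("bulk batch still rejected after retries")
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return sent
```

Bounding the batch size caps how much work a single request can queue on a node; the backoff gives an overloaded cluster time to drain.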

Rule 4: Try to make all shards approximately the same size

Elasticsearch balances shards across nodes by shard count. It doesn’t consider shard sizes or available disk space.
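To see why shard count alone is not enough, here is a deliberately simplified model of count-based placement (not elasticsearch's actual allocator): each shard goes to the node that currently holds the fewest shards, and sizes are tracked only to show the resulting disk usage.

```python
def allocate_by_count(shard_sizes, nodes):
    """Place each shard on the node currently holding the fewest shards.

    Sizes never influence placement; they are summed per node only so the
    resulting disk usage can be inspected.
    """
    counts = {node: 0 for node in nodes}
    used = {node: 0 for node in nodes}
    for size in shard_sizes:
        # min() returns the first node with the lowest count, so ties are
        # broken by the order of the nodes list.
        target = min(nodes, key=lambda n: counts[n])
        counts[target] += 1
        used[target] += size
    return counts, used


counts, used = allocate_by_count([100, 1, 100, 1], ["a", "b"])
```

With shards of 100, 1, 100 and 1 GB on nodes a and b, both nodes end up with two shards, but a holds 200 GB and b holds 2 GB. With equally sized shards, the same rule also balances disk usage.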

4 rules for happy elasticsearch

1. Give nodes plenty of space
2. Distributed but well connected
3. Throttle the load
4. Make all shards the same size

Questions?

More Information

Latest stable release: 0.19.10

Web Site: http://www.elasticsearch.org/

Follow @elasticsearch on twitter

IRC: #elasticsearch on irc.freenode.net

GitHub: https://github.com/elasticsearch/elasticsearch

Mailing list: elasticsearch on http://groups.google.com/

Stackoverflow tag: elasticsearch
