Introduction to Elasticsearch Cliff James /[email protected] 1

Intro to Elasticsearch

DESCRIPTION

Technical overview of Elasticsearch.

Page 1: Intro to Elasticsearch

Introduction to Elasticsearch

Cliff James /[email protected]

1

Page 2: Intro to Elasticsearch

What is Elasticsearch?

Elasticsearch: An open-source, distributed, real-time document indexer with support for online analytics

2

Page 3: Intro to Elasticsearch

Features at a Glance

Extremely elegant and powerful REST API
•Almost all search engine features are accessible over plain HTTP
•JSON formatted queries and results
•Can test/experiment/debug with simple tools like curl

Schema-Less Data Model
•Allows great flexibility for application designer
•Can index arbitrary documents right away with no schema metadata
•Can also tweak type/field mappings for indexes as needed

Fully Distributed and Highly-Available
•Tunable index-level write-path (index) and read-path (query) distribution policies
•P2P node operations with recoverable master node, multicast auto-discovery (configurable)
•Plays well in VM/Cloud provisioned environments
•Indexes scale horizontally as new nodes are added
•Search Cluster performs automatic failover and recovery

Advanced Search Features
•Full-Text search, autocomplete, facets, real-time search analytics
•Powerful Query DSL
•Multi-Language Support
•Built-in Tokenizers, Filters and Analyzers for most common search needs

3

Page 4: Intro to Elasticsearch

Concepts

Clusters/Nodes
ES is deployed as a cluster of individual nodes with a single master node. Each node can have many indexes hosted on it.

Documents
In ES you index documents. Document indexing is a distributed atomic operation with versioning support and transaction logs. Every document is associated with an index and has at least a type and an id.

Indexes
Similar to a database in traditional relational stores. Indexes are a logical namespace and have one or more primary shards and zero or more replica shards in the cluster. A single index has mappings which may define several types stored in the index. Indexes store a mapping between terms and documents.

Mappings
Mappings are like schemas in a relational database. Mappings define a type within an index along with some index-wide settings. Unlike a traditional database, in ES types do not have to be explicitly defined ahead of time. Indexes can be created without explicit mappings at all, in which case ES infers a mapping from the source documents being indexed.

4

Page 5: Intro to Elasticsearch

Concepts

Types
Types are like tables in a database. A type defines fields along with optional information about how each field should be indexed. If a request is made to index a document with fields that don’t have explicit type information, ES will attempt to guess an appropriate type based on the indexed data.

Queries
A query is a request to retrieve matching documents (“hits”) from one or more indexes. ES can query for exact term matches or more sophisticated full-text searches across several fields or indexes at once. The query options are also quite powerful and support things like sorting, filtering, aggregate statistics, facet counts and much more.

Analysis
Analysis is the process of converting unstructured text into terms. It includes things like ignoring punctuation, removing common stop words (‘the’, ‘a’, ‘on’, ‘and’), normalizing case, breaking a word into ngrams (smaller pieces based on substrings), etc. to support full-text search. In ES, analysis happens at index time and at query time.

5

Page 6: Intro to Elasticsearch

Index Layout

[Diagram: a node hosts Index 1 and Index 2. Each index contains types (Type 1, Type 2, Type 3), and each type contains documents.]

6

Page 7: Intro to Elasticsearch

Shards and Replicas

curl -XPUT localhost:9200/test -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'

Node

test(1)

7

Page 8: Intro to Elasticsearch

Shards and Replicas

curl -XPUT localhost:9200/test -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'

[Diagram: four nodes holding the shards and replicas of two indexes, test and other. The copies are labelled test(1) to test(3) and other(1) to other(3) and are spread across the nodes.]

8

Page 9: Intro to Elasticsearch

Shard Placement

By default, documents in ES are assigned to shards by taking the hash of the document id modulo the number of shards in the destination index.
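The placement rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pick_shard` is hypothetical, and `zlib.crc32` stands in for whatever hash function ES uses internally.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Return the shard number for a document id (hash modulo shard count)."""
    # zlib.crc32 is a stand-in for the hash ES uses internally (assumption).
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    # The same id always lands on the same shard for a fixed shard count.
    for doc_id in ["1", "2", "9wrADN4eS8uXm3gNpDvEJw"]:
        print(doc_id, "->", pick_shard(doc_id, 4))
```

Because the result depends on the shard count, changing the number of shards would move existing ids to different shards, which is one reason the shard count is chosen when the index is created.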

[Diagram: an index request for a document arrives over REST and is routed to one of four nodes, which hold shards test(1) through test(4).]

9

Page 10: Intro to Elasticsearch

Shard Placement

Querying is more complex. Potential search hits are generally spread across all the shards of an index, so the query is distributed to all shards and the per-shard results are merged into a single result list before being returned (scatter/gather architecture).
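The gather step can be sketched as a merge of per-shard hit lists by score. This is a simplified illustration, not how ES implements it: the function name `gather_top_hits` and the sample scores are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def gather_top_hits(per_shard_hits: Dict[str, List[Hit]], size: int) -> List[Hit]:
    """Merge the hits returned by each shard into one global top-`size` list."""
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    # Highest score first, truncated to the requested page size.
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    # Hypothetical per-shard results for a single query.
    scattered = {
        "test(1)": [(1.5, "a"), (0.2, "b")],
        "test(2)": [(2.0, "c")],
        "test(3)": [],
        "test(4)": [(0.9, "d")],
    }
    print(gather_top_hits(scattered, size=2))
```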

[Diagram: a user's query arrives over REST and is sent to all four nodes, which hold shards test(1) through test(4).]

10

Page 11: Intro to Elasticsearch

Routing

Routing can be used to control which shards (and therefore which nodes) receive requests to index or search for a document. When routing is used, the user specifies a routing value at index time and at query time, and that value determines which shard is used for indexing/querying. The same routing value is always routed to the same shard for a given index.

[Diagram: a user's query arrives over REST at a cluster of four nodes, which hold shards test(1) through test(4).]

curl -XGET 'localhost:9200/test/product/_search?routing=electronics'

11

Page 12: Intro to Elasticsearch

ES Document Model

•Documents are first broken down into terms to create an inverted index back to the original source (more on this later)

•Document content is up to you and can be:

✴ unstructured (articles/tweets)

✴ semi-structured (log entries/emails)

✴ structured (patient records/employee records) or

✴ any combination thereof

•Queries can look for exact term matches (e.g. productCategory == entertainment) or “best match” based on scoring each document against search criteria

•All documents in ES have an associated index, type and id.

12

Page 13: Intro to Elasticsearch

Analyzers

• In ES, analysis is the process of breaking down raw document text into terms that will be indexed in a single Lucene index.

• The role of analysis is performed by Analyzers. Analyzers themselves are broken into logical parts:

✴CharFilter: An optional component that directly modifies the underlying char stream for example to remove HTML tags or convert characters

✴Tokenizer: Component that extracts multiple terms from a single text string

✴TokenFilters: Component that modifies, adds or removes tokens for example to convert all characters to uppercase or remove common stopwords

• Can be index-specific or shared globally.

• ES ships with several common analyzers. You can also create a custom analyzer with a single logical name by specifying the CharFilter, Tokenizer and TokenFilters that comprise it.

13

Page 14: Intro to Elasticsearch

Analyzer Example

Input:
"<p>The quick brown Fox jumps over the Lazy dog</p>"

CharFilter (HTML stripper):
"The quick brown Fox jumps over the Lazy dog"

Tokenizer (standard):
["The", "quick", "brown", "Fox", "jumps", "over", "the", "Lazy", "dog"]

TokenFilter (stopwords):
["quick", "brown", "Fox", "jumps", "over", "Lazy", "dog"]

TokenFilter (lowercase), producing the index terms:
["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
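The analyzer chain on this slide can be imitated with plain string processing. The sketch below is not Lucene code: the regular expressions, the stopword set and all function names are simplifying assumptions chosen to reproduce the slide's example.

```python
import re
from typing import List

# Small illustrative stopword set (assumption), taken from the words named earlier.
STOPWORDS = {"the", "a", "on", "and"}


def strip_html(text: str) -> str:
    """CharFilter step: remove HTML tags from the raw character stream."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text: str) -> List[str]:
    """Tokenizer step: split the text into word tokens."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens: List[str]) -> List[str]:
    """TokenFilter step: drop stopwords, ignoring case as in the slide."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase(tokens: List[str]) -> List[str]:
    """TokenFilter step: normalize every token to lowercase."""
    return [t.lower() for t in tokens]


def analyze(text: str) -> List[str]:
    """Run the full chain: char filter, tokenizer, then both token filters."""
    return lowercase(remove_stopwords(tokenize(strip_html(text))))


if __name__ == "__main__":
    print(analyze("<p>The quick brown Fox jumps over the Lazy dog</p>"))
```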

14

Page 15: Intro to Elasticsearch

Testing Analyzers

•ES has several built-in analyzers and analyzer components (which are highly configurable)

•You can mix-and-match analyzer components to build custom analyzers and use the Analysis REST API to test your analyzers.

•Here is an example of the standard analyzer (the default if you don’t explicitly define a mapping) being applied to a sample text string. Notice that several common English words (the, is, this, a) were removed and the case was normalized to lowercase

curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty&format=text' -d 'this Is a tESt'

{ "tokens" : [ { "token" : "test", "start_offset" : 12, "end_offset" : 16, "type" : "<ALPHANUM>", "position" : 4 } ]}

15

Page 16: Intro to Elasticsearch

Testing Analyzers

•We can also test tokenizers and tokenFilters by themselves.

curl -XGET 'localhost:9200/_analyze?tokenizer=standard&pretty' -d 'this Is A tESt'

{ "tokens" : [ { "token" : "this", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "Is", "start_offset" : 5, "end_offset" : 7, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "A", "start_offset" : 8, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "tESt", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 4 } ]}

16

Page 17: Intro to Elasticsearch

E-Commerce Example

•Suppose we run an E-commerce site similar to Amazon and have several “products” that we would like to be able to search for easily and quickly.

•Customers need to be able to search with a variety of complex criteria. Although all products have some common criteria, we don’t know all possible product attributes for all possible products ahead of time (dynamic schema).

•Our Simple JSON Data Model:

{ "category": "electronics", "price": 129.99, "name": "ipod" }

17

Page 18: Intro to Elasticsearch

Indexing a Document

curl -XPUT localhost:9200/test/product/1 -d '{"category": "electronics", "price": 129.99, "id": 1, "name": "ipod"}'
---
{"ok":true,"_index":"test","_type":"product","_id":"1","_version":1}

•In the URL path, test is the index, product is the type and 1 is the id; the request body is the document.

•This will index all the fields of our document in the index named test with a type mapping of product and an id of 1

•Notice that we did not create any indexes ahead of time or define any information about the schema of the document we just indexed!

•ES returns a response JSON object acknowledging our operation
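The same index request can be assembled from Python's standard library. This is a minimal sketch: the helper name `build_index_request` is hypothetical, and the request is only constructed here, not sent, so no running cluster is needed.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document, host="http://localhost:9200"):
    """Build (but do not send) a PUT request for /<index>/<type>/<id>."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


# The document from the slide, addressed to index "test", type "product", id 1.
request = build_index_request(
    "test", "product", 1,
    {"category": "electronics", "price": 129.99, "id": 1, "name": "ipod"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a node listening on localhost:9200.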

18

Page 19: Intro to Elasticsearch

Indexing a Document

•This time we use the POST method instead of PUT

•No explicit id is provided to ES, so it auto-generates one for us. The id is returned in the _id field of the JSON response.

•Notice the _version field in the response. ES keeps a version number for every indexed document. The same document can be updated or re-indexed with different attributes and the version will be automatically incremented by ES.

curl -XPOST localhost:9200/test/product -d '{"category": "electronics", "price": 129.99, "name": "ipod"}'
---
{"ok":true,"_index":"test","_type":"product","_id":"9wrADN4eS8uXm3gNpDvEJw","_version":1}

19

Page 20: Intro to Elasticsearch

Introspecting Indexes

curl -XGET 'localhost:9200/test/_mapping?format=yaml'
---
test:
  product:
    properties:
      category:
        type: "string"
      name:
        type: "string"
      price:
        type: "double"

curl -XGET 'localhost:9200/test/_status?format=yaml'
---
ok: true
_shards:
  total: 1
  successful: 1
  failed: 0
indices:
  test:
    index:
      primary_size: "2.2kb"
      primary_size_in_bytes: 2282
      size: "2.2kb"
      size_in_bytes: 2282
    translog:
      operations: 1
    docs:
      num_docs: 1
      max_doc: 1
      deleted_docs: 0
...

•The mapping API lets us see how ES mapped our document fields

•ES determined that the price field was of type double based on the first document indexed

•Using the ‘format=yaml’ parameter in API Get requests formats the response as YAML which is sometimes easier to read than JSON (the default)

• The _status path lets us examine lots of interesting facts about an index.

•Here we see that a new index ‘test’ was created after our document PUT call and that it is 2.2KB in size and contains a single document

20

Page 21: Intro to Elasticsearch

Index Design: Date Bounded Indexes

• A very common pattern for user-generated data (e.g. tweets/emails) and machine-generated data (log events, system metrics) is to segregate data by date or timestamp.

• ES makes it easy to create a separate index at whatever interval makes sense for your application (daily/weekly/monthly). For example, if we are indexing log data by day our indexes might look like:

logs-2013-10-01 logs-2013-10-02 logs-2013-10-03

• Now we can query for all the logs in October and November 2013 with the following URI form:

http://localhost:9200/logs-2013-10*,logs-2013-11*/
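The naming scheme and the wildcard URI above are easy to generate programmatically. The helper names below are hypothetical, and appending `_search` to the wildcard path is an assumption based on the search examples elsewhere in this deck.

```python
from datetime import date, timedelta
from typing import List


def daily_index_names(prefix: str, start: date, days: int) -> List[str]:
    """Return one index name per day, e.g. logs-2013-10-01."""
    return [f"{prefix}-{(start + timedelta(days=i)).isoformat()}" for i in range(days)]


def monthly_search_path(prefix: str, months: List[str]) -> str:
    """Build a search path with one wildcard pattern per 'YYYY-MM' month."""
    patterns = ",".join(f"{prefix}-{month}*" for month in months)
    return f"/{patterns}/_search"


if __name__ == "__main__":
    print(daily_index_names("logs", date(2013, 10, 1), 3))
    print(monthly_search_path("logs", ["2013-10", "2013-11"]))
```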

21

Page 22: Intro to Elasticsearch

Index Aliases

•Index aliases allow us to manage one or more individual indexes under a single logical name.

•This is perfect for things like creating an index alias to hold a sliding window of indexes or providing a filtered “view” on a subset of an index’s actual data.

•Like other aspects of ES, a REST API is exposed that allows complete programmatic management of aliases

curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions" : [
        { "add" : { "index" : "logs-2013-10", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-09", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-08", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-07", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-06", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-05", "alias" : "logs_last_6months" } }
    ]
}'
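Writing the alias actions by hand gets tedious for a sliding window, so the request body can be generated instead. A minimal sketch, assuming monthly indexes named `<prefix>-YYYY-MM`; both helper names are hypothetical.

```python
import json
from typing import Dict, List


def last_n_months(year: int, month: int, n: int) -> List[str]:
    """Return the n most recent 'YYYY-MM' strings, newest first."""
    months = []
    for _ in range(n):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # roll back into December of the previous year
            year, month = year - 1, 12
    return months


def alias_actions(alias: str, prefix: str, months: List[str]) -> Dict:
    """Build the body for the _aliases endpoint: one 'add' action per index."""
    return {
        "actions": [
            {"add": {"index": f"{prefix}-{m}", "alias": alias}} for m in months
        ]
    }


if __name__ == "__main__":
    body = alias_actions("logs_last_6months", "logs", last_n_months(2013, 10, 6))
    print(json.dumps(body, indent=2))
```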

22

Page 23: Intro to Elasticsearch

Retrieving Documents

The primary purpose of setting up an ES cluster is to support full-text or complex querying across documents. However, you can also retrieve a specific document if you happen to know its id (similar to KV stores).

curl -XGET 'localhost:9200/test/product/1?pretty'
---
{
  "_index" : "test",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "exists" : true,
  "_source" : {"category": "electronics", "price": 129.99, "name": "ipod"}
}

23

Page 24: Intro to Elasticsearch

Manual Index Creation

•For indexes that are created “lazily” in ES, a mapping is created “on-the-fly” by introspecting the documents being indexed.

•You can specify mappings at index creation time or in a config file stored on each node.

curl -XPOST 'http://localhost:9200/test' -d '{
  "mappings": {
    "product": {
      "properties": {
        "name": {"type": "string"},
        "price": {"type": "float"},
        "category": {"type": "string"}
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}'

•test in the URL is the name of the logical index that we are creating.

•mappings holds the mappings for types within the index.

•settings holds the index shard settings (overrides global defaults).

24

Page 25: Intro to Elasticsearch

Mappings

•Mappings can also define the underlying analyzer that is used on indexed field values. Field mappings can specify both an index analyzer and a query analyzer, or opt out of analysis completely.

mappings:
  product:
    properties:
      category:
        type: "string"
      name:
        fields:
          bare:
            index: "not_analyzed"
            type: "string"
          name:
            index: "analyzed"
            index_analyzer: "partial_word"
            search_analyzer: "full_word"
            type: "string"
        type: "multi_field"
      price:
        type: "float"

settings:
  analysis:
    analyzer:
      full_word:
        filter:
          - "standard"
          - "lowercase"
          - "asciifolding"
        tokenizer: "standard"
        type: "custom"
      partial_word:
        filter:
          - "standard"
          - "lowercase"
          - "contains"
          - "asciifolding"
        tokenizer: "standard"
        type: "custom"
    filter:
      contains:
        max_gram: 20
        min_gram: 2
        type: "nGram"

•A single document field can actually have multiple settings (index settings, type, etc) applied simultaneously using the multi_field type, see reference guide for a full description.

25

Page 26: Intro to Elasticsearch

Dynamic Field Mappings

•Sometimes we want to control how certain fields get mapped dynamically in our indexes, but we don’t know every possible field ahead of time. Dynamic mapping templates help with this.

•A dynamic mapping template allows us to use pattern matching to control how new fields get mapped dynamically

•Within the template spec {dynamic_type} is a placeholder for the type that ES automatically infers for a given field and {name} is the original name of the field in the source document

{
  "mappings" : {
    "logs" : {
      "dynamic_templates" : [
        {
          "logs": {
            "match" : "*",
            "mapping" : {
              "type" : "multi_field",
              "fields" : {
                "{name}": { "type" : "{dynamic_type}", "index_analyzer": "keyword" },
                "str": { "type" : "string" }
              }
            }
          }
        }
      ]
    }
  }
}
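To make the two placeholders concrete, the sketch below imitates the substitution on the `mapping` section of the template above. ES performs this step internally; the function `expand_template` and the sample field (`status`, inferred as `long`) are hypothetical and only illustrate the idea.

```python
import json
from typing import Any

# The "mapping" section of the dynamic template shown above.
TEMPLATE_MAPPING = {
    "type": "multi_field",
    "fields": {
        "{name}": {"type": "{dynamic_type}", "index_analyzer": "keyword"},
        "str": {"type": "string"},
    },
}


def expand_template(mapping: Any, name: str, dynamic_type: str) -> Any:
    """Recursively replace {name} and {dynamic_type} in keys and string values."""
    def sub(text: str) -> str:
        return text.replace("{name}", name).replace("{dynamic_type}", dynamic_type)

    if isinstance(mapping, dict):
        return {sub(k): expand_template(v, name, dynamic_type) for k, v in mapping.items()}
    if isinstance(mapping, list):
        return [expand_template(v, name, dynamic_type) for v in mapping]
    if isinstance(mapping, str):
        return sub(mapping)
    return mapping


if __name__ == "__main__":
    # Hypothetical new field "status" whose inferred type is "long".
    print(json.dumps(expand_template(TEMPLATE_MAPPING, "status", "long"), indent=2))
```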

26

Page 27: Intro to Elasticsearch

Index Templates

•Index templates allow you to create templates that will automatically be applied to new indexes

•Very handy when using a temporal index design strategy like ‘index per day’ or similar

•Templates use an index name matching strategy to decide if they apply to a newly created index. If there is a match, the contents of the template are copied into the new index settings.

•Multiple templates can match a single index. Unless the order parameter is given, templates are applied in the order they are defined.

curl -XPUT localhost:9200/_template/logtemplate -d '{
    "template" : "logs*",
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "logevent" : {
            "_source" : { "enabled" : false }
        }
    }
}'
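The matching and merging behaviour described above can be imitated in a few lines. This is a sketch of the idea rather than ES code: `merged_settings` is a hypothetical helper, shell-style matching via `fnmatchcase` stands in for ES's pattern matching, and only the first template in the usage example comes from the slide.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def merged_settings(index_name: str, templates: List[Dict]) -> Dict:
    """Merge the settings of every template whose pattern matches the index name.

    Templates are applied in ascending `order` (definition order breaks ties),
    so a template applied later overrides keys set by an earlier one.
    """
    settings: Dict = {}
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        settings.update(template.get("settings", {}))
    return settings


if __name__ == "__main__":
    templates = [
        {"template": "logs*", "settings": {"number_of_shards": 5, "number_of_replicas": 1}},
        # Hypothetical extra templates, added only to show overriding and non-matching.
        {"template": "logs-2013*", "order": 1, "settings": {"number_of_replicas": 2}},
        {"template": "metrics*", "settings": {"number_of_shards": 1}},
    ]
    print(merged_settings("logs-2013-10-01", templates))
```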

27

Page 28: Intro to Elasticsearch

Performing Queries

•The _search path is the standard way to query an ES index

•Using the q=<query> form performs a full-text search by parsing the query string value. While this is convenient for some queries, ES offers a much richer query API via its JSON Query object and query DSL

•Normally a search query also returns the _source field for every search hit which contains the document as it was originally indexed

curl -XGET 'localhost:9200/test/product/_search?q="ipod"&format=yaml'
---
took: 104
timed_out: false
_shards:
  total: 1
  successful: 1
  failed: 0
hits:
  total: 1
  max_score: 0.15342641
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 0.15342641
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"

28

Page 29: Intro to Elasticsearch

Multi-Index / Multi-Type API Conventions

URI Format, followed by its Meaning:

curl -XGET 'localhost:9200/test/_search'
Searches all documents of any type under the test index

curl -XGET 'localhost:9200/test/product,sale/_search'
Searches inside documents of type product or sale in the test index

curl -XGET 'localhost:9200/test,production/product/_search'
Searches for documents of type product in the test or production indexes

curl -XGET 'localhost:9200/_all/product/_search'
Searches for documents of type product across all indexes

curl -XGET 'localhost:9200/_search'
Searches across all indexes and all types within them
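The URI conventions above follow one pattern, which a small helper can capture. The function name `search_path` is hypothetical; it only builds the path portion of the URL.

```python
from typing import Sequence


def search_path(indexes: Sequence[str] = (), types: Sequence[str] = ()) -> str:
    """Build a _search path for any combination of indexes and types."""
    parts = []
    if indexes or types:
        # With types but no indexes, _all stands in for the index list.
        parts.append(",".join(indexes) if indexes else "_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


if __name__ == "__main__":
    print(search_path(["test"]))
    print(search_path(["test", "production"], ["product"]))
    print(search_path(types=["product"]))
    print(search_path())
```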

29

Page 30: Intro to Elasticsearch

The ES Query Object

{

size: number of results to return (defaults to 10) ...

from: offset into results (defaults to 0) ...

fields: ... specific fields that should be returned ...

sort: ... sort options ...

query: {

... "query" object following the Query DSL ...

},

filter: {

...a filter spec that will be used to eliminate documents from results

note that this filter only filters on the returned results, not from the index

itself...

},

facets: {

...facets specifications...

}

}

•By providing a “Query Object” (JSON blob) to ES during a search operation, you can form very complex search queries

• The size, from, and sort attributes affect how many results are returned and in what order

• The query, filter, and facets attributes are used to control the content of search results

• The query attribute is very customizable and has its own flexible DSL
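A filled-in query object might look like the sketch below. The attribute names come from the outline above; the concrete values (the sort field, the term, the price bound and the facet name) are illustrative assumptions, not taken from the slides.

```python
import json

# Illustrative query object: every value here is an example, not a required setting.
query_object = {
    "size": 10,                                        # number of results to return
    "from": 0,                                         # offset into the results
    "sort": [{"price": "asc"}],                        # sort options
    "query": {"term": {"category": "electronics"}},    # Query DSL clause
    "filter": {"range": {"price": {"to": 300}}},       # applied to the returned hits
    "facets": {"price_stats": {"statistical": {"field": "price"}}},
}

# Serialize to the JSON blob that would be sent as the search request body.
body = json.dumps(query_object, indent=2)
print(body)
```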

30

Page 31: Intro to Elasticsearch

Queries vs. Filters

•Since both queries and filters can return similar search results, it can be confusing to know which one to use for a given search scenario

•The ES Guide offers some general advice for when to use queries vs. filters:

Use queries when:

•Full text search is needed
•The results of the search depend on a relevance score

Use filters when:

•Results of search are binary (yes/no)
•Querying against exact values

31

Page 32: Intro to Elasticsearch

Simple Term Filter

•Matches all documents that have a field containing the search term.

•Search terms are not analyzed

•Scores all matched documents the same (1.0 by default)

000-term-filter.json:
{ "query": { "constant_score": { "filter": { "term": { "category": "electronics" } } } } }

curl -XGET 'localhost:9200/test/product/_search?format=yaml' \
  -d @000-term-filter.json
---
...
hits:
  total: 7
  max_score: 1.0
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.0
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.0
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
...

32

Page 33: Intro to Elasticsearch

Simple Term Query

•This is the same as the previous query but uses a query instead of a filter

•Matches all documents that have a field containing the search term.

•Search terms are not analyzed

•Performs document relevancy scoring on hits

001-basic-term-query.json:
{ "query": { "term": { "category": "electronics" } } }

curl -XGET 'localhost:9200/test/product/_search?format=yaml' \
  -d @001-basic-term-query.json
---
...
hits:
  total: 7
  max_score: 1.8109303
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
  - _index: "test"
    _type: "product"
    _id: "3"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 499.0
      name: "ipad"
...

33

Page 34: Intro to Elasticsearch

Prefix Queries

•Matches all documents that have fields that start with the specified prefix

•Search terms are not analyzed

003-prefix-query.json:
{ "query": { "prefix": { "name": "ip" } } }

curl -XGET 'localhost:9200/test/product/_search?format=yaml' \
  -d @003-prefix-query.json
---
hits:
  total: 3
  max_score: 1.0
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.0
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.0
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
...

34

Page 35: Intro to Elasticsearch

Complex Queries

•This query finds all electronics products costing less than 300 dollars

•A bool query lets us compose individual query pieces with must, must_not and should clauses

006-complex-bool-query.json:
{ "query": { "bool": { "must": { "term": { "category": "electronics" } }, "must_not": { "range": { "price": { "from": 300 } } } } } }

curl -XGET 'localhost:9200/test/product/_search?format=yaml' \
  -d @006-complex-bool-query.json
---
hits:
  total: 5
  max_score: 1.8109303
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
  - _index: "test"
    _type: "product"
    _id: "5"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 139.99
      name: "beats audio headphones"
...
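The logic of the bool query can be checked against a handful of in-memory documents. The sketch below emulates only the must/must_not matching, not scoring, and does not talk to ES; the helper names are hypothetical, and the "soccer ball" document is invented to show a non-matching category.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def term(field: str, value: object) -> Clause:
    """Match documents whose field equals the given value exactly."""
    return lambda doc: doc.get(field) == value


def range_from(field: str, lower: float) -> Clause:
    """Match documents whose field is greater than or equal to `lower`."""
    return lambda doc: doc[field] >= lower


def bool_match(docs: List[Doc], must: List[Clause], must_not: List[Clause]) -> List[Doc]:
    """Keep documents that satisfy every must clause and no must_not clause."""
    return [
        d for d in docs
        if all(clause(d) for clause in must) and not any(clause(d) for clause in must_not)
    ]


DOCS: List[Doc] = [
    {"category": "electronics", "price": 129.99, "name": "ipod"},
    {"category": "electronics", "price": 299.99, "name": "iPhone"},
    {"category": "electronics", "price": 499.0, "name": "ipad"},
    {"category": "electronics", "price": 139.99, "name": "beats audio headphones"},
    {"category": "sports", "price": 25.0, "name": "soccer ball"},  # hypothetical document
]

if __name__ == "__main__":
    hits = bool_match(DOCS, must=[term("category", "electronics")],
                      must_not=[range_from("price", 300)])
    print([d["name"] for d in hits])
```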

35

Page 36: Intro to Elasticsearch

Facet Support in ES

• Facet queries allow for faceted navigation whereby users of a search-enabled application can see aggregate stats relevant to their search results

• Example: Querying a product catalog for all “electronics” products and then getting back a list of the Top 10 sub-categories under that section with the total count of “number of items” per sub-category

• By default facet counts returned are scoped to the query being performed. This can be altered by using the scope: global attribute on your search request

• In addition to TopN on arbitrary fields, ES also supports facets for:

✴ Documents within user-defined ranges (e.g. price)

✴ Histogram counts with user-defined bin sizes

✴ Date Histograms with user-defined interval sizes

✴ Statistical Field Faceting (min, max, variance, std deviation, sum of squares)

✴ Geographical distance from an arbitrary lat-lon

• The true power of facets lies in the fact that they allow you to combine aggregate calculations with arbitrary search-driven drill-down and get real-time results. This creates a powerful platform for complex online analytics.

36

Page 37: Intro to Elasticsearch

Facet Queries

•Due to the search scope settings the categories are global, but the price statistics are local to the search results

007-complex-with-facets.json:
{
  "query": { ... same as previous complex query ... },
  "facets": {
    "category_breakdown": {
      "terms" : { "field" : "category", "size" : 10 },
      "global": true
    },
    "price_stats" : {
      "statistical": { "field": "price" }
    }
  }
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' \
  -d @007-complex-with-facets.json
---
...
facets:
  category_breakdown:
    _type: "terms"
    missing: 0
    total: 18
    other: 0
    terms:
    - term: "electronics"
      count: 7
    - term: "sports"
      count: 3
    - term: "entertainment"
      count: 3
    - term: "clothing"
      count: 3
    - term: "housewares"
      count: 2
  price_stats:
    _type: "statistical"
    count: 5
    total: 1169.95
    min: 129.99
    max: 299.99
    mean: 233.99
    sum_of_squares: 306476.6005
    variance: 6543.999999999988
    std_deviation: 80.89499366462667
...
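The derived fields in the statistical facet follow from count, total and sum_of_squares alone. The sketch below recomputes them from the values in the response; the function name `derive_stats` is hypothetical, and the population-variance formula is inferred from the fact that it reproduces the reported numbers.

```python
import math
from typing import Tuple


def derive_stats(count: int, total: float, sum_of_squares: float) -> Tuple[float, float, float]:
    """Return (mean, variance, std_deviation) from the three running aggregates."""
    mean = total / count
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_of_squares / count - mean ** 2
    return mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    # count, total and sum_of_squares as reported in the price_stats facet.
    mean, variance, std_deviation = derive_stats(5, 1169.95, 306476.6005)
    print(mean, variance, std_deviation)
```

Only these three running aggregates need to be tracked to report the mean, variance and standard deviation for a result set.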

37

Page 38: Intro to Elasticsearch

Performance Tips

•Use filters instead of queries when possible. Doing so leverages underlying efficiencies and cache opportunities from Lucene. From the ES documentation:

Filters are very handy since they perform an order of magnitude better than plain queries since no scoring is performed and they are automatically cached.

Filters can be a great candidate for caching. Caching the result of a filter does not require a lot of memory, and will cause other queries executing against the same filter (same parameters) to be blazingly fast.

• Don’t store implicit fields unless they are needed.

_source: This field stores the entire source document by default. If you don’t need it, disabling it saves significant storage space.

_all: By default this field combines the values of all fields into a single field. If you don’t need to search for values across all fields for a given index and type, you can leave it off.

38

Page 39: Intro to Elasticsearch

Security Considerations

• By default, access to ES, including its management APIs, is over unauthenticated REST-based APIs over plain HTTP. These APIs can be used for various tasks, such as dropping the index or modifying the index definition to store more data.

• In a production setting you should ensure:

✴ ES is only accessible from behind a firewall. Don’t expose HTTP endpoints outside of a firewall!

✴ Set http.enabled = false to disable Netty and HTTP access on nodes that do not need to expose it. Alternatively, you can use the ES Jetty plugin (https://github.com/sonian/elasticsearch-jetty) to implement authentication and encryption.

• If you have more stringent security requirements consider the following:

✴ By default ES uses multicast auto-discovery with an auto-join capability for new nodes. Use unicast whitelisting instead to ensure that new “rogue” nodes can’t be started nefariously.

✴ The Lucene index data is stored on the node-local filesystem by default in unencrypted files. At a minimum, set proper file system access controls to prevent unauthorized access. You may also want to consider using an encrypted filesystem for your data directories to protect the data while it is stored.

39

Page 40: Intro to Elasticsearch

Cluster Load Balancing

•ES nodes can have up to three roles:

✴Master Nodes - Master nodes are eligible for being declared the master for the whole cluster. The master acts as coordinator for the entire cluster. Only a single master is active at one time, and if it fails a new one is automatically selected from the master election pool

✴Data Nodes - Data nodes hold the Lucene index shards that make up ES distributed indexes

✴Client Nodes - Client nodes handle incoming client REST requests and coordinate data to satisfy them from the cluster’s data nodes

•The default mode of operation for ES is to have each node take on all 3 roles within the cluster but you can tweak this in elasticsearch.yml and opt out of being a master or data node.

40

Page 41: Intro to Elasticsearch

Cluster Load Balancing

Example Architecture

[Diagram: a five-node cluster. Nodes 1 to 3 are data nodes; nodes 4 and 5 are master-eligible client nodes.]

•Data nodes are the workhorses of the cluster, so they are not configured to be master eligible.

•Client nodes handle incoming REST client requests and are also both eligible master nodes in this cluster topology.

•If we had more nodes we could have configured dedicated master nodes as well.

41

Page 42: Intro to Elasticsearch

Plugins

•Plugins extend the core ES capability and provide extended or optional functionality.

•Plugins can also have a “site” associated with them. This allows plugin writers to create third-party web-based UIs for interacting with their plugins

•Some common plugins provide additional transport capabilities or integrations with other data technologies like NoSQL databases, relational databases, JMS, etc.

42

Page 43: Intro to Elasticsearch

Recommended Plugins

Two management plugins are especially useful:

Elasticsearch Head
A plugin that provides a very nice UI for visualizing the state of an entire ES cluster. Also includes a query UI with a tabular results grid

BigDesk
A plugin that shows the last hour’s heap, thread, disk and CPU utilization, index request stats and much more across the cluster.

43

Page 44: Intro to Elasticsearch

References

✴ Official Elasticsearch Reference Guide http://bit.ly/1kx8g4R

✴ Elasticsearch Query Tutorial http://bit.ly/1cfaTVj

✴ ES Index Design Patterns and Analytics (By ES creator) http://bit.ly/1kx8s3X

✴ More complicated mapping in Elasticsearch http://bit.ly/1bZNoPd

✴ Using Elasticsearch to speed up filtering http://bit.ly/JSWtj7

✴ On Elasticsearch Performance http://bit.ly/J84j8o

44