Elasticsearch - key features

Alan Hardy, Solutions Architect

Source: files.meetup.com/4046992/Elastic-key-features_2015(Alan).pdf


www.elastic.co Copyright Elastic 2015 Copying, publishing and/or distributing without written permission is strictly prohibited

Elasticsearch

• Distributed, scalable, and resilient: designed for scale-out; high availability

• Developer friendly: API-first; schemaless, native JSON, client libraries for any language

• Real-time search & analytics: real-time aggregations, geospatial, full-text search; query structured and unstructured data

Store, Search and Analyze

Terminology

“node”: a running instance of Elasticsearch (≈ one server)

Terminology

“shard”: physical worker unit (a single Lucene instance)

• holds just a slice of the data

• lives on one node

Terminology

“index”: logical namespace

points to one or more shards

shard = hash(_id) % no_of_shards
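
The routing formula above can be sketched in a few lines. This is only an illustration: `pick_shard` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash function Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document id to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash chosen for illustration only (assumption).
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


ids = ["log-1", "log-2", "log-3", "log-4"]
for doc_id in ids:
    # The same id always lands on the same shard for a fixed shard count.
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the shard count is part of the formula, changing it would send existing ids to different shards, which is one way to see why the number of primary shards is chosen up front.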

Terminology

[Diagram: one index (I) → many shards (s); one shard (s) → many segments]

scale out, not up

Create an Index

curl -XPUT 'http://localhost:9200/logs' -d '{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}'

• To add data we need an index (one or more shards)

• A shard can be either a primary shard or a replica shard

• A document belongs to a single primary shard
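
As a rough sketch of what these settings imply, the snippet below builds the same settings body and counts how many shard copies the cluster will try to place. The helper names `index_settings` and `total_shard_copies` are made up for this example; nothing is sent to a server.

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build the request body used when creating an index."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(body: dict) -> int:
    """Each primary shard has `number_of_replicas` additional copies."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(3, 1)
print(json.dumps(body))
print(total_shard_copies(body))  # 3 primaries + 3 replicas = 6
```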

Single node cluster

• one node with three primary shards

• creates a cluster of one node

• node is elected to master role within the cluster

• replica shards not allocated

Add Resiliency

• second node started with same cluster.name

• node joins cluster (discovery unicast/multicast)

• replica shards automatically allocated to second node
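
The behaviour described so far (replicas stay unallocated on a single node, then move to the second node once it joins) can be modelled with a toy allocator. This is a simplified sketch, not Elasticsearch's allocation logic: `create_index` and `join` are hypothetical helpers, and the only rule modelled is that a replica is never placed on a node that already holds a copy of the same shard.

```python
def create_index(first_node, number_of_shards, number_of_replicas):
    """All primaries (P0, P1, ...) start on the only node; replicas wait."""
    assignments = {first_node: [f"P{s}" for s in range(number_of_shards)]}
    unassigned = [
        f"R{s}"
        for s in range(number_of_shards)
        for _ in range(number_of_replicas)
    ]
    return assignments, unassigned


def join(assignments, unassigned, new_node):
    """Place waiting replicas on a newly joined node where allowed."""
    assignments[new_node] = []
    still_unassigned = []
    for replica in unassigned:
        shard = replica[1:]
        # A node must not hold two copies of the same shard.
        holds_copy = any(copy[1:] == shard for copy in assignments[new_node])
        if holds_copy:
            still_unassigned.append(replica)
        else:
            assignments[new_node].append(replica)
    return assignments, still_unassigned


cluster, waiting = create_index("node1", 3, 1)
print(cluster, waiting)   # replicas cannot be placed yet
cluster, waiting = join(cluster, waiting, "node2")
print(cluster, waiting)   # replicas now live on node2
```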

Scale Horizontally

• add another node

• Elasticsearch automatically balances data

Scaling out more (number_of_replicas: n)

• number of primary shards is fixed at index creation

• can dynamically increase the number of replica shards

• more copies of your data means higher read throughput
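
Changing the replica count on a live index goes through the update index settings endpoint (`PUT /<index>/_settings`). The sketch below only builds the request; `replica_update_request` is a hypothetical helper and the index name `logs` is borrowed from the earlier example.

```python
import json


def replica_update_request(index: str, number_of_replicas: int):
    """Return (method, path, body) for a replica-count change."""
    path = f"/{index}/_settings"
    body = {"index": {"number_of_replicas": number_of_replicas}}
    return "PUT", path, json.dumps(body)


method, path, body = replica_update_request("logs", 2)
print(method, path)
print(body)
```

With three primaries and two replicas each, the cluster would then try to place nine shard copies in total.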

Coping with failure

• previous master node fails

• triggers a new master node election

• new master instantly promotes replicas to primary
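
The promotion step can be pictured with a small model. This is a sketch under strong simplifications: it ignores master election entirely and only shows surviving replicas taking over for primaries that were lost with the failed node. `fail_node` and the `P0`/`R0` naming are assumptions made for the example.

```python
def fail_node(assignments, failed_node):
    """Drop a node and promote surviving replicas of its lost primaries."""
    lost = assignments[failed_node]
    survivors = {
        node: list(copies)
        for node, copies in assignments.items()
        if node != failed_node
    }
    for copy in lost:
        if not copy.startswith("P"):
            continue  # losing a replica needs no promotion
        shard = copy[1:]
        for copies in survivors.values():
            if f"R{shard}" in copies:
                # Promote the first surviving replica found.
                copies[copies.index(f"R{shard}")] = f"P{shard}"
                break
    return survivors


cluster = {"node1": ["P0", "P1", "P2"], "node2": ["R0", "R1", "R2"]}
print(fail_node(cluster, "node1"))
```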

Distributed

• Replication: data duplication
    • read scalability
    • high availability

• Sharding: data partitioning
    • split logical data over several machines
    • write scalability
    • control data flow

Search

• mapping

• analysis

• query dsl

flexible, powerful query language

query dsl

query dsl

queries
    • relevance
    • full text
    • not cached
    • slower

filters
    • boolean yes/no
    • exact values
    • cached
    • faster

Filter first, then query remaining docs
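
A toy in-memory version of "filter first, then query" might look like the following. The documents, the `passes_filter` predicate, and the term-count `score` are all invented for illustration; real relevance scoring is considerably more involved.

```python
docs = [
    {"id": 1, "title": "full text search", "status": "active"},
    {"id": 2, "title": "search basics", "status": "archived"},
    {"id": 3, "title": "cooking at home", "status": "active"},
]


def passes_filter(doc):
    """Cheap yes/no check on an exact value."""
    return doc["status"] == "active"


def score(doc, text):
    """Toy relevance: how many query words appear in the title."""
    words = doc["title"].split()
    return sum(1 for term in text.split() if term in words)


def search(docs, text):
    # Step 1: the filter discards documents without scoring them.
    candidates = [d for d in docs if passes_filter(d)]
    # Step 2: only the remaining documents are scored and ranked.
    scored = [(score(d, text), d["id"]) for d in candidates]
    return sorted((s, i) for s, i in scored if s > 0)[::-1]


print(search(docs, "text search"))  # [(2, 1)]
```

Document 2 would have matched the text but is never scored, because the filter removed it first.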

query dsl: basic query

GET /_search
{ "query": {...} }

query dsl: basic query

GET /_search
{
  "query": {
    "match": { "title": "search" }
  }
}

query dsl: filtered query

GET /_search
{
  "query": {
    "filtered": {
      "query":  {...},
      "filter": {...}
    }
  }
}

query dsl: filtered query

GET /_search
{
  "query": {
    "filtered": {
      "query":  { "match": { "title": "search" }},
      "filter": { "term": { "status": "active" }}
    }
  }
}
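
The `filtered` query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0; on later versions the same intent is usually written as a `bool` query with `must` and `filter` clauses. A minimal sketch of that rewrite follows, where `filtered_to_bool` is a hypothetical helper written for this example rather than part of any client library.

```python
def filtered_to_bool(filtered_query: dict) -> dict:
    """Rewrite {"filtered": {...}} as an equivalent {"bool": {...}}."""
    inner = filtered_query["filtered"]
    bool_body = {}
    if "query" in inner:
        bool_body["must"] = inner["query"]     # scored part
    if "filter" in inner:
        bool_body["filter"] = inner["filter"]  # yes/no part, not scored
    return {"bool": bool_body}


old_style = {
    "filtered": {
        "query": {"match": {"title": "search"}},
        "filter": {"term": {"status": "active"}},
    }
}
print(filtered_to_bool(old_style))
```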

other filter types

term filter: WHERE field CONTAINS "value"

"term": { "title": "brown" }

terms filter: WHERE field IN ["val",…]

"terms": { "title": ["quick", "pets"] }

other filter types

range filter: WHERE field >= x AND field < y

"range": { "content":{ "gte": 10, "lt": 80 } }

"range": { "date":{ "gte": "2014-01-01", "lt": "2041-02-01" } }

boolean filter types

"bool": {
  "must":     [ <filters> ],    (AND)
  "should":   [ <filters> ],    (OR)
  "must_not": [ <filters> ]     (NOT)
}
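
The term, terms, range and bool filters described above can be sketched as plain yes/no predicates over an in-memory document. This is an illustration of the semantics only: the function names are made up, a text field is represented as a list of already-analysed tokens, and `should` is treated as "at least one must match when any are given", which is how the OR label reads here (the exact default can depend on context and version).

```python
from datetime import date


def _values(doc, field):
    """Return the field's values as a list (single values are wrapped)."""
    value = doc.get(field, [])
    return value if isinstance(value, list) else [value]


def term(field, value):
    """Match when the field contains exactly `value` (no analysis)."""
    return lambda doc: value in _values(doc, field)


def terms(field, values):
    """Match when the field contains any of `values`."""
    return lambda doc: any(v in _values(doc, field) for v in values)


def range_(field, gte=None, lt=None):
    """Match when gte <= value < lt; either bound may be omitted."""
    def check(doc):
        value = doc.get(field)
        if value is None:
            return False
        if gte is not None and value < gte:
            return False
        if lt is not None and value >= lt:
            return False
        return True
    return check


def bool_(must=(), should=(), must_not=()):
    """Combine filters: must = AND, should = OR, must_not = NOT."""
    def check(doc):
        if not all(f(doc) for f in must):
            return False
        if should and not any(f(doc) for f in should):
            return False
        return not any(f(doc) for f in must_not)
    return check


doc = {
    "title": ["the", "quick", "brown", "fox"],
    "content": 42,
    "date": date(2014, 1, 15),
}

keep = bool_(
    must=[term("title", "brown"), range_("content", gte=10, lt=80)],
    should=[terms("title", ["quick", "pets"])],
    must_not=[term("title", "lazy")],
)
print(keep(doc))  # True
```

Because each predicate answers only yes or no for a document, its result can be remembered and reused independently of the others.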

query dsl: full example

{
  "filtered": {
    "query": { "match": { "title": "full text search" }},
    "filter": {
      "bool": {
        "must": { "range": { "created": { "gte": "now-1d/d" }}},
        "should": [
          { "term": { "featured": true }},
          { "term": { "starred": true }}
        ],
        "must_not": { "term": { "deleted": false }}
      }
    }
  }
}
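
The date-math expression in the `created` range reads as "now, minus one day, rounded down to the start of that day". A small sketch of the arithmetic, assuming UTC and using a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def now_minus_1d_rounded_to_day(now: datetime) -> datetime:
    """Subtract one day, then round down to midnight of that day."""
    shifted = now - timedelta(days=1)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2015, 3, 10, 14, 30, tzinfo=timezone.utc)
print(now_minus_1d_rounded_to_day(now).isoformat())
# 2015-03-09T00:00:00+00:00
```

Rounding to the day means every request made on the same day produces the same lower bound, which makes the resulting filter easier to reuse.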

query dsl: filters cached individually

{
  "filtered": {
    "query": { "match": { "title": "full text search" }},
    "filter": {
      "bool": {
        "must": { "range": { "created": { "gte": "now-1d/d" }}},
        "should": [
          { "term": { "featured": true }},
          { "term": { "starred": true }}
        ],
        "must_not": { "term": { "deleted": false }}
      }
    }
  }
}

analytics (aggregations dsl)

Types of Aggregations

Buckets
    • Terms
    • Date Histogram
    • Filter
    • Range
    • Nested
    • Children
    • …

Metrics
    • Stats
    • Percentile
    • Cardinality
    • Top hits
    • Scripted
    • Max | Min | Avg
    • …

aggs = buckets + calculated metric

[Figure: documents grouped into buckets labelled CA, TX, MA, CO, AZ, each with a calculated metric]
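
The "buckets + calculated metric" idea can be shown on a handful of in-memory documents. The `state` and `price` fields and the `aggregate` helper are invented for this sketch; it groups documents into buckets by one field and computes an average over another.

```python
from collections import defaultdict

docs = [
    {"state": "CA", "price": 10.0},
    {"state": "TX", "price": 20.0},
    {"state": "CA", "price": 30.0},
]


def aggregate(docs, bucket_field, metric_field):
    """Group docs into buckets, then compute a metric per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


print(aggregate(docs, "state", "price"))
```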

How do aggs work?

• ‘inline’ with search query

• execute in isolation on each shard

• 4 phases
    • parse
    • collect
    • combine
    • reduce

[Diagram: a coordinating node in front of several data nodes]

Phase 1 : Parse

• coordinating node splits the request into shard requests

• shards parse aggregation and initialize data structures

Phase 2 + 3: Collect & Combine

• shards process all matching documents

• once done, they combine the aggregated data into an aggregation

Phase 4: Reduce

• shards send their aggregations to the coordinating node

• coordinating node reduces them into a single aggregation
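
The collect, combine and reduce phases can be sketched for a "maximum temperature per day" aggregation. This is a simplified model rather than the engine's implementation: each shard is a plain list of documents, `collect_and_combine` plays the shard-side role, and `reduce_partials` plays the coordinating node. All names and sample values are assumptions made for the example.

```python
def collect_and_combine(shard_docs):
    """Shard side: one pass over matching docs, one partial result."""
    partial = {}
    for doc in shard_docs:
        day = doc["timestamp"][:10]  # e.g. "2015-01-01"
        temperature = doc["temperature"]
        if day not in partial or temperature > partial[day]:
            partial[day] = temperature
    return partial


def reduce_partials(partials):
    """Coordinating node: merge per-shard partials into one result."""
    final = {}
    for partial in partials:
        for day, temperature in partial.items():
            final[day] = max(temperature, final.get(day, temperature))
    return final


shards = [
    [
        {"timestamp": "2015-01-01T03:00:00Z", "temperature": 18},
        {"timestamp": "2015-01-01T15:00:00Z", "temperature": 23},
    ],
    [
        {"timestamp": "2015-01-01T09:00:00Z", "temperature": 21},
        {"timestamp": "2015-01-02T09:00:00Z", "temperature": 19},
    ],
]

partials = [collect_and_combine(shard) for shard in shards]
print(reduce_partials(partials))
```

Only the small per-shard partial results cross the network, not the documents themselves.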

Aggregation DSL Example

Request

..
"aggs": {
  "by_date": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "day"
    },
    "aggs": {
      "max_temperature": {
        "max": { "field": "temperature" }
      }
    }
  }
}

Response

..
"aggregations": {
  "by_date": {
    "buckets": [
      {
        "key": "2015-01-01T00:00:00.000Z",
        "doc_count": 24,
        "max_temperature": { "value": 23 }
      }
    ]
  }
}
…
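
For readers driving this from code, the request and response above map naturally onto nested dictionaries. The sketch below builds the aggregation body and walks a response shaped like the one shown; the bucket type is spelled `date_histogram` and the response key `aggregations`. The helper names are hypothetical and the response is a hand-written sample, not output from a server.

```python
def max_per_day_aggs(date_field: str, value_field: str) -> dict:
    """Build a date histogram with a nested max metric."""
    return {
        "aggs": {
            "by_date": {
                "date_histogram": {"field": date_field, "interval": "day"},
                "aggs": {
                    "max_temperature": {"max": {"field": value_field}}
                },
            }
        }
    }


def read_buckets(response: dict):
    """Return (key, doc_count, max value) for each bucket."""
    buckets = response["aggregations"]["by_date"]["buckets"]
    return [
        (b["key"], b["doc_count"], b["max_temperature"]["value"])
        for b in buckets
    ]


sample_response = {
    "aggregations": {
        "by_date": {
            "buckets": [
                {
                    "key": "2015-01-01T00:00:00.000Z",
                    "doc_count": 24,
                    "max_temperature": {"value": 23},
                }
            ]
        }
    }
}

print(max_per_day_aggs("timestamp", "temperature"))
print(read_buckets(sample_response))
```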

Designed for speed and scale

• Single network round-trip

• Single pass through the data on shards

• Aggregates are computed in-memory

• Trades accuracy for speed in some use cases

• Aggregations can be composed

• Near real-time response times
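
One way such an accuracy-for-speed trade-off can arise is when each shard returns only its own top terms and the coordinating node merges those short lists. The sketch below is a toy model of that effect, not the engine's algorithm; the data and function names are invented.

```python
from collections import Counter


def shard_top(terms, size):
    """Each shard reports only its `size` most frequent terms."""
    return Counter(terms).most_common(size)


def approximate_top(shards, size):
    """Merge the per-shard short lists (cheap, possibly inaccurate)."""
    merged = Counter()
    for shard in shards:
        for term, count in shard_top(shard, size):
            merged[term] += count
    return merged.most_common(size)


def exact_top(shards, size):
    """Count every term on every shard (accurate, more data moved)."""
    merged = Counter()
    for shard in shards:
        merged.update(shard)
    return merged.most_common(size)


shards = [
    ["a"] * 7 + ["b"] * 5 + ["c"] * 1,
    ["c"] * 6 + ["b"] * 5 + ["a"] * 1,
]

print(approximate_top(shards, 1))  # [('a', 7)]
print(exact_top(shards, 1))        # [('b', 10)]
```

Term `b` is the true winner overall but is never first on any single shard, so the cheap merge misses it. Asking each shard for a longer list narrows the gap at the cost of more data per shard.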

Q & A