Elasticsearch in Production Alex Brasetvik [email protected] @alexbrasetvik

Elasticsearch in Production (London version)

Elasticsearch in production, or an overview of things you want to know about before happening upon them in production.

Who?

Co-founder of Found AS

8+ years search, 3+ Elasticsearch

Herding hundreds of Elasticsearch clusters

Agenda

• Anti-patterns

• Memory / Resource Usage

• Distributed problems

• Security

• Client concerns

• Changing a cluster

found.no/foundation

• Elasticsearch in Production

• Elasticsearch as a NoSQL Database

• Intro to Function Scoring

• All About Analyzers

• Securing your Elasticsearch Cluster

Snapshot / Restore

Circuit breakers

Document values

Aggregations

Distributed percolation

Suggesters

Anti-Patterns

Arbitrary Keys

• “Schema Free”

• One field per value

• Ever-growing cluster state

acls:
  1234: READ
  42: WRITE
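One way to avoid a new field per key is to move the keys into values, so the mapping keeps a fixed set of fields however many ids appear. The sketch below assumes the `acls` object maps an id to a permission; the field names `id` and `permission` are hypothetical, and such a list would typically be mapped as a nested type.

```python
def acls_to_entries(acls):
    """Turn {1234: "READ", 42: "WRITE"} into a list of fixed-shape entries.

    The field names "id" and "permission" are illustrative assumptions.
    """
    return [
        {"id": str(key), "permission": value}
        for key, value in sorted(acls.items(), key=lambda item: str(item[0]))
    ]


# The document now has two known fields under "acls" instead of one field per id.
doc = {"acls": acls_to_entries({1234: "READ", 42: "WRITE"})}
```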

Heavy Updating

• Update = Delete + Reindex

• Be careful with counters
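Because every update rewrites the whole document, one possible mitigation for counters is to accumulate increments in the client and send a single update per document at intervals. This is a minimal sketch, not something the slides prescribe; `CounterBuffer` is a hypothetical helper, and buffered increments are lost if the process dies before a flush.

```python
from collections import Counter


class CounterBuffer:
    """Accumulate increments in memory and emit one update per document."""

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        self._pending[doc_id] += amount

    def flush(self):
        """Return the pending (doc_id, total) pairs and reset the buffer."""
        pending, self._pending = self._pending, Counter()
        return sorted(pending.items())


buffer = CounterBuffer()
for _ in range(1000):
    buffer.increment("page-1")
buffer.increment("page-2", 5)
updates = buffer.flush()  # two updates to send instead of 1001
```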

Slow queries

• WHERE foo ILIKE '%bar%'

• {"query_string": {"query": "foo:*bar*"}}

• Don’t ask for 3300 results :)

Arbitrary searches

query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user’s query here]
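The slide sits under anti-patterns, presumably because the `user_id` filter alone does not make an arbitrary user-supplied query safe or cheap. A hedged sketch of one alternative: accept only free text from the user and build the query clause on the server. The field name `body`, the function name and the size cap are assumptions for illustration; `filtered` is the Elasticsearch 1.x form shown above.

```python
def build_user_search(user_id, text, size=10, max_size=100):
    """Build an Elasticsearch 1.x `filtered` query scoped to one user.

    Only free text is accepted from the caller; the query clause is built here.
    The field name "body" and the size cap are illustrative assumptions.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a plain string, not query DSL")
    return {
        "size": min(int(size), max_size),
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": {"match": {"body": text}},
            }
        },
    }


# A request for 3300 hits is capped at max_size.
request_body = build_user_search(42, "foo bar", size=3300)
```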

Memory

• Field caches

• Filter caches

• Page caches

• Aggregations

• Index building

Page Cache

• Keeping index pages in memory

• Can’t have too much

• Outgrow: Gradual slowdown

Heap Space

• Memory used by Elasticsearch process

• Field / Filter caches

• Aggregations

Time Bomb

OutOfMemoryError

Woah there I ate all the memories

Your cluster may or may not work any more

OutOfMemory

• Growing too big

• Selecting too big timespan in Kibana

• Document ingestion peak

Preventing OOMs

• Have enough memory :-)

• Understand your search’s memory profile

• Bulk / Circuit breaker settings

• Monitoring

• Document values
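For the bulk side of this list, a client can bound how much data each bulk request carries, so an ingestion peak does not arrive as one huge request. The sketch below is one way to do that; `bulk_batches` is a hypothetical helper and the 5 MB default is an arbitrary assumption, not a figure from the slides.

```python
import json


def bulk_batches(docs, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents whose serialized size stays under max_bytes.

    A single document larger than max_bytes is yielded alone.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


docs = [{"n": i, "text": "x" * 100} for i in range(10)]
batches = list(bulk_batches(docs, max_bytes=400))
```

Each batch would then be sent as its own bulk request, ideally with a pause or back-off when the cluster starts rejecting requests.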

Marvel ( /_stats )

"my_field": { "type": "string", "fielddata": { "format": "doc_values" } }

Document Values

• Rely on page cache

• Only caches doc values actually used

Sizing

• Test, don’t guess

• Start big, scale down

• Index, search, monitor

Glitch Meltdown

• Tie-breaker can be a cheap master-node

• Applies to data centers / availability zones too
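These bullets presumably refer to majority-based master election, where a master can only be elected by a strict majority of master-eligible nodes (in Elasticsearch 1.x this is what `discovery.zen.minimum_master_nodes` is normally set to). The arithmetic behind the tie-breaker can be sketched as follows; the function names are illustrative.

```python
def majority(master_eligible_nodes):
    """Smallest node count that is a strict majority of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def survives_partition(master_eligible_nodes, largest_side):
    """True if the largest side of a network split can still elect a master."""
    return largest_side >= majority(master_eligible_nodes)


# Two nodes split 1/1: neither side has a majority.
two_nodes_ok = survives_partition(2, 1)
# Add a cheap tie-breaker: three nodes split 2/1, and the side with two carries on.
three_nodes_ok = survives_partition(3, 2)
```

The same counting applies when the units are data centers or availability zones rather than single nodes.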

Data-only nodes

Master-only nodes

Jepsen

• Kyle Kingsbury’s series on distributed systems

• Distributed systems are hard

• aphyr.com

Security

• “Not my job!” – Elasticsearch

• That’s fine!

Dynamic Scripts

• Scoring

• Aggregations

• Updating

Runtime.getRuntime().exec(…)

<script src=“http://127.0.0.1:9200/_search?callback=capture&…

Security

• Disable dynamic scripts (On by default in ≤1.1)

• Mind index patterns

• Even then, don’t accept arbitrary requests
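One reading of "mind index patterns" is that an index name taken from a request may contain wildcards, comma-separated lists or `_all`, and so reach indices the caller should not see. A minimal sketch of a check a proxy in front of Elasticsearch could apply, assuming a hypothetical naming scheme where each customer's indices start with `customer-<id>-`:

```python
import re

# Hypothetical naming scheme: each customer only touches "customer-<id>-..." indices.
_INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def allowed_index(index, customer_id):
    """Accept a single concrete index name owned by customer_id, nothing else."""
    if not _INDEX_NAME.fullmatch(index):
        return False  # rejects wildcards, comma-separated lists and "_all"
    return index.startswith("customer-{}-".format(customer_id))
```

A check like this only covers the index name; as the last bullet says, the request body still should not be passed through unchecked.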

Client Concerns

• Connection pools

• Idempotent requests

• Have sane syncing/indexing strategies
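Indexing a document under an explicit id overwrites any existing document with that id, so a retried request does not create a duplicate the way an auto-generated id would. A sketch of deriving a stable id from the source record; `source` and `natural_key` are hypothetical names for where a record comes from and its primary key there.

```python
import hashlib
import json


def document_id(source, natural_key):
    """Derive a stable document id from the record's origin and natural key.

    `source` and `natural_key` are illustrative assumptions.
    """
    raw = json.dumps([source, natural_key], sort_keys=True).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


first = document_id("orders", 1001)
retry = document_id("orders", 1001)  # same id, so a retry overwrites
```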

# BOOM !

Cluster changes

• Make new nodes join existing cluster

• No rolling restarts

• Easy rollback if things go bad

v1.0.0 v1.0.1

Cluster changes

• Test first

• Mind recover_*-settings

Multi-Cluster Workflows

• Snapshot/Restore

• Operations across clusters

• Swap clusters!

• Works well with good syncing strategy

• Rolling restarts: Risky, fast

• Grow and shrink: Less risky, copies lots of data

• Multiple clusters: Least risky, copies lots of data

Misc

• Same JVM

• ulimits

• Unicast

• Kernel-settings like IO-scheduler

?

@foundsays