Elasticsearch Cluster deep dive


Page 1: Elasticsearch  cluster deep dive

Elasticsearch Cluster deep dive

Page 2: Elasticsearch  cluster deep dive

NoSQL: Text Search and Document

Page 3: Elasticsearch  cluster deep dive

Elasticsearch cluster

Page 4: Elasticsearch  cluster deep dive

Cluster documentation

Page 5: Elasticsearch  cluster deep dive

Distributed

Client Nodes

Data Nodes

Master Nodes

Ingest Nodes

Page 6: Elasticsearch  cluster deep dive

Today's view of the cluster

Other Nodes

Master Nodes

Page 7: Elasticsearch  cluster deep dive

What happens when a node starts?

Starting

Page 10: Elasticsearch  cluster deep dive

What happens when a node starts?

[Diagram: cluster of nodes A, B, C, D and E; C is the current master]

Starting
1. Get a list of nodes to ping from config
2. Each response contains:
   a. cluster name
   b. node details
   c. master node details
   d. cluster state version
3. Only keeps master-eligible responses, based on discovery.zen.master_election.ignore_non_master_pings
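The ping-and-filter steps above can be sketched as follows. This is a minimal illustration, not the Elasticsearch implementation: `PingResponse` and `filter_ping_responses` are hypothetical names, and dropping responses from a different cluster name is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PingResponse:
    """Hypothetical container for the fields listed on the slide."""
    cluster_name: str
    node_id: str                    # stands in for "node details"
    master_eligible: bool
    master_id: Optional[str]        # master the responder currently follows
    cluster_state_version: int


def filter_ping_responses(responses: List[PingResponse],
                          cluster_name: str,
                          ignore_non_master_pings: bool) -> List[PingResponse]:
    # Assumption: responses from another cluster are ignored.
    kept = [r for r in responses if r.cluster_name == cluster_name]
    # Mirrors discovery.zen.master_election.ignore_non_master_pings.
    if ignore_non_master_pings:
        kept = [r for r in kept if r.master_eligible]
    return kept


responses = [
    PingResponse("prod", "A", True, "C", 18),
    PingResponse("prod", "B", True, "C", 18),
    PingResponse("prod", "C", True, "C", 18),
    PingResponse("prod", "D", False, "C", 18),
]
kept = filter_ping_responses(responses, "prod", ignore_non_master_pings=True)
print([r.node_id for r in kept])  # ['A', 'B', 'C']
```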

Page 11: Elasticsearch  cluster deep dive

What happens when a node starts?

[Diagram: cluster of nodes A, B, C, D and E]

Starting
● List of master nodes: [C, C]
● List of eligible master nodes: [A, B, C]

Page 13: Elasticsearch  cluster deep dive

What happens when a node starts?

[Diagram: cluster of nodes A, B, C, D and E]

Starting
1. Join master node (C) sending:
   internal:discovery/zen/join
2. Master validates join sending:
   internal:discovery/zen/join/validate

Page 14: Elasticsearch  cluster deep dive

Cluster state update

Page 18: Elasticsearch  cluster deep dive

What happens when a node starts?

[Diagram: cluster of nodes A, B, C, D and E]

Starting
1. Join master node (C) sending:
   internal:discovery/zen/join
2. Master validates join sending:
   internal:discovery/zen/join/validate
3. Master updates the cluster state with the new node
4. Master waits for discovery.zen.minimum_master_nodes master-eligible nodes to respond
5. Change committed and confirmation sent
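Steps 4 and 5 amount to a quorum check on acknowledgements. The sketch below is only an illustration: `can_commit` is a hypothetical helper, and whether the master counts its own acknowledgement is an assumption (here it does).

```python
from typing import Iterable, Set


def can_commit(acked_by: Iterable[str],
               master_eligible: Set[str],
               minimum_master_nodes: int) -> bool:
    """True once enough master-eligible nodes acknowledged the new state."""
    # Only acknowledgements from master-eligible nodes count toward the quorum.
    eligible_acks = {node for node in acked_by if node in master_eligible}
    return len(eligible_acks) >= minimum_master_nodes


master_eligible = {"A", "B", "C"}

# Only C (the master itself) is master-eligible among the responders.
print(can_commit(["C", "D", "E"], master_eligible, minimum_master_nodes=2))  # False
# A and C responded, so the change can be committed.
print(can_commit(["C", "A", "D"], master_eligible, minimum_master_nodes=2))  # True
```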

Page 19: Elasticsearch  cluster deep dive

What happens when a node starts?

[Diagram: cluster of nodes A, B, C, D and E]

Starting
1. New node checks the received state for:
   a. a new master node
   b. no master node in the state

Page 20: Elasticsearch  cluster deep dive

Master fault detection

[Diagram: cluster of nodes A, B, C, D, E and F]

Started
● Every discovery.zen.fd.ping_interval (default 1s), nodes ping the master
● Timeout is discovery.zen.fd.ping_timeout (default 30s)
● Retry is discovery.zen.fd.ping_retries (default 3)

Page 21: Elasticsearch  cluster deep dive

Node fault detection

[Diagram: cluster of nodes A, B, C, D, E and F]

Started
● Every discovery.zen.fd.ping_interval (default 1s), the master pings every other node
● Timeout is discovery.zen.fd.ping_timeout (default 30s)
● Retry is discovery.zen.fd.ping_retries (default 3)
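Both fault-detection loops use the same three settings. A minimal sketch of the retry logic, assuming a hypothetical `ping` callable that returns True when the peer answers within the timeout (the real transport layer is not modelled):

```python
from typing import Callable, Iterable


def is_alive(ping: Callable[[float], bool],
             ping_timeout: float = 30.0,
             ping_retries: int = 3) -> bool:
    """Declare the peer failed only after ping_retries consecutive failed pings."""
    for _ in range(ping_retries):
        if ping(ping_timeout):
            return True
    return False


def scripted_ping(results: Iterable[bool]) -> Callable[[float], bool]:
    """Test helper (hypothetical): replays a fixed sequence of ping outcomes."""
    it = iter(results)
    return lambda timeout: next(it)


print(is_alive(scripted_ping([False, False, True])))   # True: third try succeeds
print(is_alive(scripted_ping([False, False, False])))  # False: all retries failed
```

With the defaults above, a peer that never answers is declared failed after up to 3 x 30s of waiting in this sketch.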

Page 22: Elasticsearch  cluster deep dive

Master election

Page 23: Elasticsearch  cluster deep dive

Minimum number of candidates required

Page 24: Elasticsearch  cluster deep dive

Master election

[Diagram: cluster of nodes A, B, C, D, E and F]

Page 25: Elasticsearch  cluster deep dive

Network Partition

[Diagram: network partition in the cluster of nodes A, B, C, D, E and F]

Master election cannot happen, master steps down

Page 26: Elasticsearch  cluster deep dive

Network Partition

[Diagram: network partition in the cluster of nodes A, B, C, D, E and F]

Master fault detection triggers new master election
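The two partition outcomes above follow from a quorum rule. A small sketch, assuming discovery.zen.minimum_master_nodes is set to the commonly recommended value of (master-eligible nodes / 2) + 1; `quorum` and `can_elect_master` are hypothetical helper names:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Recommended minimum_master_nodes for a given number of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(reachable_master_eligible: int,
                     minimum_master_nodes: int) -> bool:
    """A partition side can elect (or keep) a master only if it sees a quorum."""
    return reachable_master_eligible >= minimum_master_nodes


# Three master-eligible nodes (A, B, C) split 1 / 2 by a network partition.
minimum_master_nodes = quorum(3)
print(minimum_master_nodes)                       # 2
print(can_elect_master(1, minimum_master_nodes))  # False: this side steps down
print(can_elect_master(2, minimum_master_nodes))  # True: this side elects a master
```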

Page 27: Elasticsearch  cluster deep dive

Master election

1. Based on the list of master-eligible nodes, it chooses in priority:
   a. The node with the highest cluster state version (part of the ping response)
   b. A master-eligible node
   c. Sort the ids of the remaining nodes alphabetically and take the first
2. Sends a join to this new master. In the meantime, it accumulates join requests.

If the current node elected itself as master, it waits for the minimum number of join requests (discovery.zen.minimum_master_nodes) before declaring itself master.

In case of master failure detection, each node removes the failed master from the candidates.
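The selection order in step 1 can be expressed as a sort key. This is a sketch under stated assumptions, not the actual election code: `Candidate` and `elect_master` are hypothetical names, and master eligibility is applied as a filter before ordering by version and id.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    node_id: str
    master_eligible: bool
    cluster_state_version: int


def elect_master(candidates: List[Candidate]) -> Candidate:
    # Only master-eligible nodes may be chosen.
    eligible = [c for c in candidates if c.master_eligible]
    if not eligible:
        raise ValueError("no master-eligible candidate")
    # Highest cluster state version first, then the alphabetically smallest id.
    return min(eligible, key=lambda c: (-c.cluster_state_version, c.node_id))


candidates = [
    Candidate("C", True, 19),   # stale cluster state
    Candidate("B", True, 20),
    Candidate("A", True, 20),
    Candidate("D", False, 20),  # data-only node, never chosen
]
print(elect_master(candidates).node_id)  # A
```

Ordering by cluster state version first means a node holding a stale state, such as C here, loses to any candidate with a newer one.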

Page 28: Elasticsearch  cluster deep dive

Latest cluster version

Page 29: Elasticsearch  cluster deep dive

Lost update, found by a Jepsen test and partially fixed in 5.0

[Diagram: nodes A to F; cluster state versions shown: v18 (beside C), v18, v18]

Page 30: Elasticsearch  cluster deep dive

Lost update, found by a Jepsen test and partially fixed in 5.0

[Diagram: nodes A to F; cluster state versions shown: v18 (beside C), v19, v19]

Page 31: Elasticsearch  cluster deep dive

Lost update, found by a Jepsen test and partially fixed in 5.0

[Diagram: nodes A to F; cluster state versions shown: v18 (beside C), v20, v20]

Page 35: Elasticsearch  cluster deep dive

Lost update, found by a Jepsen test and partially fixed in 5.0

[Diagram: nodes A to F; cluster state versions shown: v19 (beside C), v20, v20]

Cannot become the master

Page 36: Elasticsearch  cluster deep dive

Shard allocation

Page 37: Elasticsearch  cluster deep dive

Shard assigned to new node

1. Master will rebalance shard allocation to have:
   a. the same average number of shards per node
   b. the same average number of shards per index per node, avoiding 2 shards with the same id on the same node
2. Uses deciders to decide which shard goes where, based on:
   a. Hot/Warm setup (time-based indices)
   b. Disk usage allocation (low watermark and high watermark)
   c. Throttling (node is already recovering, master might try again later)
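The decider idea can be sketched as a chain of yes/no checks that must all pass. Everything here is illustrative: the `Node` class and decider functions are hypothetical, only three of the checks above are modelled, and the 85% low watermark is assumed as the threshold.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

ShardKey = Tuple[str, int]  # (index name, shard id)


@dataclass
class Node:
    name: str
    disk_used_pct: float
    recovering: bool = False
    shards: Set[ShardKey] = field(default_factory=set)


def same_shard_decider(node: Node, shard: ShardKey) -> bool:
    # Never place two copies of the same shard id on one node.
    return shard not in node.shards


def disk_decider(node: Node, shard: ShardKey,
                 low_watermark_pct: float = 85.0) -> bool:
    # Assumed threshold: no new shards above the low watermark.
    return node.disk_used_pct < low_watermark_pct


def throttling_decider(node: Node, shard: ShardKey) -> bool:
    # A node that is already recovering is skipped for now.
    return not node.recovering


DECIDERS = (same_shard_decider, disk_decider, throttling_decider)


def can_allocate(node: Node, shard: ShardKey) -> bool:
    return all(decider(node, shard) for decider in DECIDERS)


shard = ("logs", 0)
print(can_allocate(Node("D", 40.0), shard))                    # True
print(can_allocate(Node("E", 40.0, shards={shard}), shard))    # False: same shard id
print(can_allocate(Node("F", 92.0), shard))                    # False: disk too full
```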

Page 38: Elasticsearch  cluster deep dive

Shard initialization (Primary)

1. Master communicates a new shard assignment through the cluster state
2. Node initializes an empty shard
3. Node notifies the master
4. Master marks the shard as started
5. If this is the first shard with a specific id, it is marked as primary and receives requests

Page 39: Elasticsearch  cluster deep dive

Shard initialization (Replica)

1. Master communicates a new shard assignment through the cluster state
2. Node initializes recovery from the primary
3. Node notifies the master
4. Master marks the replica as started
5. Node activates the replica
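The primary and replica flows above share the same two state transitions on the master side. A toy model, assuming hypothetical `RoutingTable` and `ShardCopy` classes and covering only the INITIALIZING and STARTED states:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ShardState(Enum):
    INITIALIZING = "INITIALIZING"
    STARTED = "STARTED"


@dataclass
class ShardCopy:
    index: str
    shard_id: int
    node: str
    primary: bool
    state: ShardState = ShardState.INITIALIZING


class RoutingTable:
    """Master-side view of shard assignments (simplified)."""

    def __init__(self) -> None:
        self.copies: List[ShardCopy] = []

    def assign(self, index: str, shard_id: int, node: str) -> ShardCopy:
        # The first copy with a given id becomes the primary; later ones are replicas.
        is_first = not any(c.index == index and c.shard_id == shard_id
                           for c in self.copies)
        copy = ShardCopy(index, shard_id, node, primary=is_first)
        self.copies.append(copy)
        return copy

    def shard_started(self, copy: ShardCopy) -> None:
        # Called when the node notifies the master that initialization finished.
        copy.state = ShardState.STARTED


table = RoutingTable()
primary = table.assign("logs", 0, node="D")
replica = table.assign("logs", 0, node="E")
table.shard_started(primary)
print(primary.primary, primary.state.value)  # True STARTED
print(replica.primary, replica.state.value)  # False INITIALIZING
```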

Page 40: Elasticsearch  cluster deep dive

Shard recovery

Page 41: Elasticsearch  cluster deep dive

Shard

[Diagram: anatomy of a shard. Labels: Disk, Memory, segments S1, S2 and S3, commit point, in-memory buffer, translog]

Page 45: Elasticsearch  cluster deep dive

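Recovery from primary

[Diagram: message flow between the node with the primary and the node with the replica]

Replica node -> primary node: Start Recovery
Primary node:
1. Validate request
2. Prevent translog from deletion
3. Snapshot Lucene
Primary node -> replica node: Segments
Primary node -> replica node: Translog
Replica node: Notifies master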
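The order of the recovery phases can be sketched with plain lists. This is a simplified model under assumptions: `ShardData` and `recover_from_primary` are hypothetical, segments and translog operations are represented as strings, and request validation and translog retention are reduced to comments.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ShardData:
    segments: List[str] = field(default_factory=list)  # committed Lucene segments
    translog: List[str] = field(default_factory=list)  # operations not yet committed


def recover_from_primary(primary: ShardData,
                         notify_master: Callable[[str], None]) -> ShardData:
    replica = ShardData()
    # Steps 1-3 on the primary: validate the request, keep the translog from
    # being deleted, and take a snapshot of the current segment list.
    snapshot = list(primary.segments)
    # Segments phase: copy the snapshotted segments to the replica.
    replica.segments.extend(snapshot)
    # Translog phase: replay the operations retained during the copy.
    replica.translog.extend(primary.translog)
    # Finally the replica's node tells the master the shard is ready.
    notify_master("shard started")
    return replica


events: List[str] = []
primary = ShardData(segments=["S1", "S2", "S3"], translog=["index doc 4"])
replica = recover_from_primary(primary, events.append)
print(replica.segments, replica.translog, events)
```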

Page 46: Elasticsearch  cluster deep dive

Thank you!