34
Untangling Cluster Management with Helix 1 Helix team @ LinkedIn Kishore Gopalakrishna http://www.linkedin.com/in/kgopalak @kishoreg1980

Apache Helix presentation at SOCC 2012

Embed Size (px)

DESCRIPTION

Untangling Cluster Management with Helix SOCC 2012 presentation

Citation preview

Page 1: Apache Helix presentation at SOCC 2012

1Recruiting SolutionsRecruiting SolutionsRecruiting Solutions

Untangling Cluster Management with Helix

Helix team @ LinkedIn

Kishore Gopalakrishna

http://www.linkedin.com/in/kgopalak

@kishoreg1980

Page 2: Apache Helix presentation at SOCC 2012

2

Outline

What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A

Page 3: Apache Helix presentation at SOCC 2012

3

What is Helix

Cluster management framework for distributed systems using declarative state model

Page 4: Apache Helix presentation at SOCC 2012

4

Distributed system examples

Page 5: Apache Helix presentation at SOCC 2012

5

Motivation

A system starts out simple… …but gets complex in the real world …as you address real requirements

Application

client library

SystemCall Routing

Replica 1

Replica 2

Scale Failover Bootstrapping

Page 6: Apache Helix presentation at SOCC 2012

6

Motivation

These are cluster management problems Helix solves them once… …so you can focus on your system

Scale Failover Bootstrapping

Page 7: Apache Helix presentation at SOCC 2012

7

Outline

What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A

Page 8: Apache Helix presentation at SOCC 2012

8

Use-Case: Distributed Data Store

Distributed

Node 1 Node 3

P.1

Node 2

Page 9: Apache Helix presentation at SOCC 2012

9

Use-Case: Distributed Data Store

Distributed Partitioned

Node 1 Node 3Node 2

P.4

P.9 P.10

P.11

P.12

P.1 P.2 P.3 P.7P.5 P.6

P.8

Page 10: Apache Helix presentation at SOCC 2012

10

Use-Case: Distributed Data Store

Distributed Partitioned Replicated

Node 1 Node 3Node 2

P.4

P.9 P.10

P.11

P.12

P.1 P.2 P.3 P.7P.5 P.6

P.8 P.1P.5 P.6

P.9 P.10

P.4P.3

P.7 P.8P.11 P.12

P.2P.1

Page 11: Apache Helix presentation at SOCC 2012

11

Partition Layout

Highly Available Master accepts writes Balanced distribution

Node 1 Node 3Node 2

P.4

P.9 P.10

P.11

P.12

P.1 P.2 P.3 P.7P.5 P.6

P.8 P.1P.5 P.6

P.9 P.10

P.4P.3

P.7 P.8P.11 P.12

P.2P.1

Master

Slave

Page 12: Apache Helix presentation at SOCC 2012

Failover

Node 1

P.5 P.6

P.9 P.10

P.4

Node 3

P.9 P.10

P.11

P.4P.3P.12

P.7 P.8

P.1 P.2 P.3

P.1

Node 2

P.7

P.11 P.12

P.2

P.5 P.6

P.8 P.1

Master

Slave

P.1 P.2 P.3 P.4

Page 13: Apache Helix presentation at SOCC 2012

Add Capacity

Node 1

P.5 P.6

P.9

P.4

Node 3

P.10

P.11

P.4P.3P.12

P.7

P.2 P.3

P.1

Node 2

P.7

P.11P.10

P.8P.12

P.2

P.9P.1 P.5 P.6

P.8 P.1

Node 4

P.10

P.8P.12 Master

Slave

P.1 P.5 P.9

Page 14: Apache Helix presentation at SOCC 2012

14

Use-case requirements

• Partition constraints• 1 master per partition• Balance partitions across cluster• No single-point-of-failure: replicas on different nodes

• Handle failures: transfer mastership

• Elasticity• Distribute workload across added nodes Minimize partition movement

• Meet SLAs Throttle concurrent data movement

Page 15: Apache Helix presentation at SOCC 2012

15

State machine– States

offline, slave, master

– Transitions O-S, S-O, S-M, M-S

COUNT=2

COUNT=1

minimize(maxnj N ∈ S(nj) )t1≤ 5

Declarative Problem Statement

Constraints– States– Transitions

Objective – Partition placement

S

MO

t1 t2

t3 t4

minimize(maxnj N ∈ M(nj) )

Page 16: Apache Helix presentation at SOCC 2012

16

Generalizing cluster management

STATE MACHINE

CONSTRAINTS OBJECTIVE

HELIX

Page 17: Apache Helix presentation at SOCC 2012

17

Outline

What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A

Page 18: Apache Helix presentation at SOCC 2012

18

Helix Based System Roles

Node 1 Node 3Node 2

P.4

P.9 P.10 P.11

P.12

P.1 P.2 P.3 P.7P.5 P.6

P.8 P.1P.5 P.6

P.9 P.10

P.4P.3

P.7 P.8P.11 P.12

P.2P.1

RESPONSE COMMAND

Page 19: Apache Helix presentation at SOCC 2012

Controller Execution Flow

P1:OSP1:SM

Page 20: Apache Helix presentation at SOCC 2012

20

Controller fault tolerance

Page 21: Apache Helix presentation at SOCC 2012

21

Controller fault tolerance

Page 22: Apache Helix presentation at SOCC 2012

22

Participant Plug-in code

Page 23: Apache Helix presentation at SOCC 2012

23

Spectator Plug-in code

Page 24: Apache Helix presentation at SOCC 2012

24

Benefits

Cluster operations “just work”– Bootstrapping– Failover– Add nodes

Global vs Local– Helix Controller

Global knowledge Makes cluster decisions

– Participant Local knowledge Follows orders

Page 25: Apache Helix presentation at SOCC 2012

25

Outline

What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A

Page 26: Apache Helix presentation at SOCC 2012

26

consumer group

Page 27: Apache Helix presentation at SOCC 2012

27

Consumer group: Scaling

Page 28: Apache Helix presentation at SOCC 2012

28

Consumer group: Fault tolerance

Page 29: Apache Helix presentation at SOCC 2012

29

Consumer group: state model

Page 30: Apache Helix presentation at SOCC 2012

30

Outline

What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A

Page 31: Apache Helix presentation at SOCC 2012

31

Helix usage at LinkedIn (Pictures)

Espresso– a timeline-consistent, distributed data store

Databus– a change data capture service

Search as a Service– a multi-tenant service for multiple search applications

More planned

Page 32: Apache Helix presentation at SOCC 2012

32

Summary

Building Distributed Data Systems is hard– Abstraction and modularity is key

Helix: A Generic framework for Cluster Management Simple programming model: declarative state machine

Page 33: Apache Helix presentation at SOCC 2012

Helix: Future Roadmap

• Features• Span multiple data centers• Load balancing

• Announcement • Open source: https://github.com/linkedin/helix• Apache incubation• New contributors

Page 34: Apache Helix presentation at SOCC 2012

34

Questions?