Apache Helix presentation at ApacheCon 2013



Building distributed systems using Helix


  • 1. Building distributed systems using Helix
    http://helix.incubator.apache.org
    Apache Incubation Oct, 2012
    @apachehelix
    Kishore Gopalakrishna, @kishoreg1980, http://www.linkedin.com/in/kgopalak

2. Outline
   Introduction
   Architecture
   How to use Helix
   Tools
   Helix usage

3. Examples of distributed data systems

4. Lifecycle
   Single node: Partitioning, Discovery, Co-location
   Multi-node, fault tolerance: Replication, Fault detection, Recovery
   Cluster expansion: Throttle data movement, Re-distribution

5. Typical Architecture
   Diagram labels: App (x4), Network, Cluster manager, Node (x4)

6. Distributed search service
   Diagram: INDEX SHARD partitions P.1 to P.6, each with a REPLICA, spread over Node 1, Node 2, Node 3
   Partition management: Multiple replicas; Even distribution; Rack aware placement
   Fault tolerance: Fault detection; Auto create replicas; Controlled creation of replicas
   Elasticity: Re-distribute partitions; Minimize movement; Throttle data movement

7. Distributed data store
   Diagram: partitions P.1 to P.12 placed as MASTER and SLAVE replicas across Node 1, Node 2, Node 3
   Partition management: Multiple replicas; 1 designated master; Even distribution
   Fault tolerance: Fault detection; Promote slave to master; Even distribution; No SPOF
   Elasticity: Minimize downtime; Minimize data movement; Throttle data movement

8. Message consumer group
   Similar to Message groups in ActiveMQ:
   guaranteed ordering of the processing of related messages across a single queue
   load balancing of the processing of messages across multiple consumers
   high availability / auto-failover to other consumers if a JVM goes down
   Applicable to many messaging pub/sub systems like Kafka, RabbitMQ etc.

9. Message consumer group
   Diagram panels: ASSIGNMENT, SCALING, FAULT TOLERANCE

10. Zookeeper provides low level primitives. We need high level primitives.
    Zookeeper primitives: File system, Lock, Ephemeral
    High level primitives: Node, Partition, Replica, State Transition
    Layers: Application, Framework, Consensus System (Zookeeper)

11.

12. Outline
    Introduction
    Architecture
    How to use Helix
    Tools
    Helix usage

13. Terminologies
    Node        A single machine
    Cluster     Set of Nodes
    Resource    A logical entity, e.g. database, index, task
    Partition   Subset of the resource
    Replica     Copy of a partition
    State       Status of a partition replica, e.g. Master, Slave
    Transition  Action that lets replicas change status, e.g. Slave -> Master

14. Core concept
    State Machine
      States: Offline, Slave, Master
      Transitions: O->S, S->O, S->M, M->S
    Constraints
      States: M=1, S=2
      Transitions: concurrent(O->S) < 5
    Objectives
      Partition Placement
      Failure semantics
    Diagram: states O, S (COUNT=2), M (COUNT=1); transitions t1, t2, t3, t4 with t1 ≤ 5
    minimize(max_{nj ∈ N} S(nj))
    minimize(max_{nj ∈ N} M(nj))

15. Helix solution
    States: Offline, Online
    Message consumer group: Start consumption, Stop consumption; MAX=1
    Distributed search: MAX per node=5; MAX=3 (number of replicas)

16. IDEALSTATE
    Configuration: 3 nodes, 3 partitions, 2 replicas, StateMachine
    Constraints: 1 Master, 1 Slave, Even distribution
    Replica placement and replica state:
          P1     P2     P3
          N1:M   N2:M   N3:M
          N2:S   N3:S   N1:S

17. CURRENT STATE
    N1   P1:OFFLINE   P3:OFFLINE
    N2   P2:MASTER    P1:MASTER
    N3   P3:MASTER    P2:SLAVE

18. EXTERNAL VIEW
          P1     P2     P3
          N1:O   N2:M   N3:M
          N2:M   N3:S   N1:O

19. Helix Based System Roles
    Diagram labels: PARTICIPANT, SPECTATOR, Controller, IDEAL STATE, CURRENT STATE, COMMAND, RESPONSE, Partition routing logic
    Partitions P.1 to P.12 across Node 1, Node 2, Node 3

20. Logical deployment

21. Outline
    Introduction
    Architecture
    How to use Helix
    Tools
    Helix usage

22. Helix based solution
    1. Define
    2. Configure
    3. Run

23. Define: State model definition
    e.g. MasterSlave (states O, S, M)
    States: All possible states; Priority
    Transitions: Legal transitions; Priority
    Applicable to each partition of a resource

24. Define: state model
    StateModelDefinition.Builder builder = new StateModelDefinition.Builder(MASTERSLAVE);
    // Add states and their rank to indicate priority.
    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(OFFLINE);

    // Set the initial state when the node starts.
    builder.initialState(OFFLINE);

    // Add transitions between the states.
    builder.addTransition(OFFLINE, SLAVE);
    builder.addTransition(SLAVE, OFFLINE);
    builder.addTransition(SLAVE, MASTER);
    builder.addTransition(MASTER, SLAVE);

25. Define: constraints
    Scope at which each constraint type can be set:
                State   Transition
    Partition   Y       Y
    Resource    -       Y
    Node        Y       Y
    Cluster     -       Y
    Example (states O, S with COUNT=2, M with COUNT=1):
                State       Transition
    Partition   M=1, S=2    -

26. Define: constraints
    // static constraint
    builder.upperBound(MASTER, 1);

    // dynamic constraint
    builder.dynamicUpperBound(SLAVE, "R");

    // unconstrained
    builder.upperBound(OFFLINE, -1);

27. Define: participant plug-in code

28. Step 2: configure
    helix-admin --zkSvr
    CREATE CLUSTER                --addCluster
    ADD NODE                      --addNode
    CONFIGURE RESOURCE            --addResource
    REBALANCE (SET IDEALSTATE)    --rebalance

29. zookeeper view
    IDEALSTATE

30. Step 3: Run
    START CONTROLLER
    run-helix-controller --zkSvr localhost:2181 --cluster MyCluster
    START PARTICIPANT

31. zookeeper view

32. Znode content
    CURRENT STATE
    EXTERNAL VIEW

33. Spectator Plug-in code

34. Helix Execution modes

35. IDEALSTATE
    Configuration: 3 nodes, 3 partitions, 2 replicas, StateMachine
    Constraints: 1 Master, 1 Slave, Even distribution
    Replica placement and replica state:
          P1     P2     P3
          N1:M   N2:M   N3:M
          N2:S   N3:S   N1:S

36. Execution modes
    Who controls what:
                        AUTO REBALANCE   AUTO    CUSTOM
    Replica placement   Helix            App     App
    Replica State       Helix            Helix   App

37. Auto rebalance v/s Auto
    Diagram panels: AUTO REBALANCE, AUTO

38. In action
    Auto rebalance: MasterSlave p=3 r=2 N=3
          Node 1   Node 2   Node 3
          P1:M     P2:M     P3:M
          P2:S     P3:S     P1:S
    On failure: Auto create replica and assign state
          Node 1   Node 2   Node 3
          P1:O     P2:M     P3:M
          P2:O     P3:S     P1:S
                   P1:M     P2:S
    Auto: MasterSlave p=3 r=2 N=3
          Node 1   Node 2   Node 3
          P1:M     P2:M     P3:M
          P2:S     P3:S     P1:S
    On failure: Only change states to satisfy constraint
          Node 1   Node 2   Node 3
          P1:M     P2:M     P3:M
          P2:S     P3:S     P1:M

39. Custom mode: example

40. Custom mode: handling failure
    Custom code invoker
      Code that lives on all nodes, but is active in one place
      Invoked when a node joins/leaves the cluster
      Computes the new idealstate
      Helix controller fires the transitions without violating constraints
    Before:
          P1     P2     P3
          N1:M   N2:M   N3:M
          N2:S   N3:S   N1:S
    After:
          P1     P2     P3
          N1:S   N2:M   N3:M
          N2:M   N3:S   N1:S
    Transitions:
      1. N1: M -> S
      2. N2: S -> M
    1 & 2 in parallel violate the single master constraint
    Helix sends 2 after 1 is finished

41. Outline
    Introduction
    Architecture
    How to use Helix
    Tools
    Helix usage

42. Tools
    Chaos monkey
    Data driven testing and debugging
    Rolling upgrade
    On demand task scheduling and intra-cluster messaging
    Health monitoring and alerts

43. Data driven testing
    Instrument: Zookeeper, controller, participant logs
    Simulate: Chaos monkey
    Analyze: Invariants are
      Respect state transition constraints
      Respect state count constraints
      And so on
    Debugging made easy: Reproduce exact sequence of events

44. Structured Log File - sample
    timestamp      partition   instanceName       sessionId                            state
    1323312236368  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236426  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236530  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236530  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236561  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
    1323312236561  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236685  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
    1323312236685  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236685  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236719  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
    1323312236719  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
    1323312236719  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
    1323312236814  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE

45. No more than R=2 slaves
    Time    State     Number Slaves   Instance
    42632   OFFLINE   0               10.117.58.247_12918
    42796   SLAVE     1               10.117.58.247_12918
    43124   OFFLINE   1               10.202.187.155_12918
    43131   OFFLINE   1               10.220.225.153_12918
    43275   SLAVE     2               10.220.225.153_12918
    43323   SLAVE     3               10.202.187.155_12918
    85795   MASTER

46. How long was it out of whack?
    Number of Slaves   Time        Percentage
    0                  1082319     0.5
    1                  35578388    16.46
    2                  179417802   82.99
    3                  118863      0.05
    83% of the time, there were 2 slaves to a partition
    93% of the time, there was 1 master to a partition
    Number of Masters  Time        Percentage
    0                  15490456    7.164960359
    1                  200706916   92.83503964

47. Invariant 2: State Transitions
    FROM      TO        COUNT
    MASTER    SLAVE     55
    OFFLINE   DROPPED   0
    OFFLINE   SLAVE     298
    SLAVE     MASTER    155
    SLAVE     OFFLINE   0

48. Outline
    Introduction
    Architecture
    How to use Helix
    Tools
    Helix usage

49. Helix usage at LinkedIn
    Espresso

50. In flight
    Apache S4
      Partitioning, co-location
      Dynamic cluster expansion
    Archiva
      Partitioned replicated file store
      Rsync based replication
    Others in evaluation
      Bigtop

51. Auto scaling software deployment tool
    States: Offline, Download, Configure, Start, Active, Standby
    Constraint for each state: Download < 100, Active 1000, Standby 100

52. Summary
    Helix: A generic framework for building distributed systems
    Modifying/enhancing system behavior is easy
    Abstraction and modularity is key
    Simple programming model: declarative state machine

53. Roadmap
    Features
      Span multiple data centers
      Automatic load balancing
      Distributed health monitoring
      YARN generic application master for real time apps
      Stand alone Helix agent

54. website  http://helix.incubator.apache.org
    user     user@helix.incubator.apache.org
    dev      dev@helix.incubator.apache.org
    twitter  @apachehelix, @kishoreg1980
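Slide 16 shows an ideal state for 3 nodes, 3 partitions and 2 replicas in which every node holds exactly one master and one slave. A minimal sketch of one way to arrive at such an even placement is a round-robin assignment, shown here in plain Java. The class `IdealStateSketch` and its `compute` method are hypothetical names introduced for illustration; they are not part of the Helix API, and Helix's own rebalancer may place replicas differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Helix API. It only illustrates
// the even-distribution idea behind the IDEALSTATE slide.
class IdealStateSketch {

    // Returns partition -> (node -> state). Replica r of partition p is
    // placed on node (p + r) mod N; replica 0 is the MASTER, the rest SLAVE.
    static Map<String, Map<String, String>> compute(
            List<String> nodes, int partitions, int replicas) {
        if (replicas > nodes.size()) {
            throw new IllegalArgumentException("more replicas than nodes");
        }
        Map<String, Map<String, String>> ideal = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            Map<String, String> placement = new LinkedHashMap<>();
            for (int r = 0; r < replicas; r++) {
                String node = nodes.get((p + r) % nodes.size());
                placement.put(node, r == 0 ? "MASTER" : "SLAVE");
            }
            ideal.put("P" + (p + 1), placement);
        }
        return ideal;
    }

    public static void main(String[] args) {
        // Same configuration as the slide: 3 nodes, 3 partitions, 2 replicas.
        System.out.println(compute(List.of("N1", "N2", "N3"), 3, 2));
    }
}
```

With the slide's configuration this yields P1 on N1 (master) and N2 (slave), P2 on N2 and N3, and P3 on N3 and N1, which is the layout drawn on slide 16.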

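Slide 40 notes that swapping the master of a partition takes two transitions, and that running them in parallel would briefly produce two masters. The sketch below models only the ordering rule (demote first, then promote) in plain Java. `TransitionOrderSketch`, `plan` and `maxMasters` are hypothetical names for illustration; the real controller sends the second transition only after the first has completed, which this sketch approximates by replaying steps one at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the ordering rule, not Helix code.
class TransitionOrderSketch {

    // Orders the state changes for one partition so that every step that
    // leaves MASTER comes before any step that enters MASTER.
    // Nodes absent from `current` or already in the target state are skipped.
    static List<String> plan(Map<String, String> current, Map<String, String> target) {
        List<String> demotions = new ArrayList<>();
        List<String> promotions = new ArrayList<>();
        for (Map.Entry<String, String> e : target.entrySet()) {
            String node = e.getKey();
            String from = current.get(node);
            String to = e.getValue();
            if (from == null || from.equals(to)) {
                continue;
            }
            String step = node + ":" + from + "->" + to;
            if (to.equals("MASTER")) {
                promotions.add(step);
            } else {
                demotions.add(step);
            }
        }
        List<String> ordered = new ArrayList<>(demotions);
        ordered.addAll(promotions);
        return ordered;
    }

    // Replays the steps sequentially and returns the largest number of
    // simultaneous masters seen at any point.
    static int maxMasters(Map<String, String> current, List<String> steps) {
        Map<String, String> state = new LinkedHashMap<>(current);
        int max = countMasters(state);
        for (String step : steps) {
            String node = step.substring(0, step.indexOf(':'));
            String to = step.substring(step.indexOf("->") + 2);
            state.put(node, to);
            max = Math.max(max, countMasters(state));
        }
        return max;
    }

    private static int countMasters(Map<String, String> state) {
        int n = 0;
        for (String s : state.values()) {
            if (s.equals("MASTER")) {
                n++;
            }
        }
        return n;
    }
}
```

For partition P1 on slide 40 (N1:M, N2:S moving to N1:S, N2:M), `plan` puts the N1 demotion ahead of the N2 promotion, so `maxMasters` stays at 1; replaying the same two steps in the opposite order reaches 2.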

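The percentages on slide 46 follow directly from the time spent at each slave count divided by the total observed time. As a worked check of that arithmetic, the sketch below recomputes them from the slide's own figures. `SlaveCountShareSketch` and `percentages` are hypothetical names; how the original analysis tool was implemented is not stated in the deck.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for checking the slide's arithmetic.
class SlaveCountShareSketch {

    // Converts time spent at each replica count into a percentage of total time.
    static Map<Integer, Double> percentages(Map<Integer, Long> timeByCount) {
        long total = 0;
        for (long t : timeByCount.values()) {
            total += t;
        }
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : timeByCount.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Time values copied from the "Number of Slaves" table on slide 46.
        Map<Integer, Long> slaves = new LinkedHashMap<>();
        slaves.put(0, 1082319L);
        slaves.put(1, 35578388L);
        slaves.put(2, 179417802L);
        slaves.put(3, 118863L);
        percentages(slaves).forEach((count, pct) ->
            System.out.println(String.format(Locale.ROOT, "%d slaves: %.2f%%", count, pct)));
    }
}
```

The four times sum to 216197372, the same total as the two rows of the masters table (15490456 + 200706916), and the computed shares round to the 0.5, 16.46, 82.99 and 0.05 shown on the slide.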