Building applications on YARN with Helix
Multi-Tenant Data Cloud with YARN & Helix

Kishore Gopalakrishna (@kishore_b_g)
LinkedIn - Data infra: Helix, Espresso
Yahoo - Ads infra: S4

Thursday, June 5, 2014
What is YARN? Next Generation Compute Platform

Hadoop 1.0: MapReduce (resource management + processing) running directly on HDFS.

Hadoop 2.0: YARN (cluster resource management) on HDFS, with MapReduce and others (Batch, Interactive, Online, Streaming) running on top.
The same stack, now with containers from many applications (A1-A3, B1-B5, C1-C5) packed across the cluster: YARN enables multiple application types to share resources.
YARN Architecture

- Client submits the job to the Resource Manager
- The app package is staged in the HDFS/common area
- Node Managers report node status; the Resource Manager grants container requests
- Node Managers launch the Application Master and the containers
So, let’s build something
Example System

- Generate data in Hadoop: an M/R job writes partitioned data to HDFS
- Use it for serving: a set of Redis servers serves the data
Example System: Requirements

- Big Data :-)
- Partitioned, replicated
- Fault tolerant, scalable
- Efficient resource utilization
Example System: Application Master

To meet those requirements, the Application Master must:

- Request containers
- Assign work
- Handle failures
- Handle workload changes
Allocation + Assignment

The M/R job generates partitioned data (p1-p6) in HDFS; multiple servers serve it, e.g. Server 1: p1, p2, p5, p4; Server 2: p3, p4, p1, p6; Server 3: p5, p6, p3, p2.

- Container Allocation: data affinity, rack-aware placement
- Partition Assignment: affinity, even distribution
- Replica Placement: on different physical machines
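As a hypothetical sketch of the placement rules above (not Helix's actual implementation), round-robin replica placement gives even distribution and keeps a partition's replicas on different machines; the server and partition names are taken from the slide:

```python
from collections import Counter

def assign(partitions, replicas, nodes):
    """Map each (partition, replica) to a node, round-robin style."""
    assignment = {}  # (partition, replica_index) -> node
    for i, p in enumerate(partitions):
        for r in range(replicas):
            # offset each replica so copies land on different nodes
            assignment[(p, r)] = nodes[(i + r) % len(nodes)]
    return assignment

partitions = ["p1", "p2", "p3", "p4", "p5", "p6"]
nodes = ["server1", "server2", "server3"]
placement = assign(partitions, 2, nodes)

# replicas of each partition live on different machines
for p in partitions:
    assert placement[(p, 0)] != placement[(p, 1)]

# load is even: 6 partitions x 2 replicas / 3 nodes = 4 each
load = Counter(placement.values())
assert all(count == 4 for count in load.values())
```

This simple scheme assumes the replication factor does not exceed the node count; a real rebalancer would additionally weigh data affinity and rack placement.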
Failure Handling

- On failure: distribute the failed server's partitions evenly while waiting for a new container
- Acquire a new container close to the data if possible
- Assign the failed partitions to the new container (e.g. Server 4 takes over Server 3's p5, p6, p3, p2)
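A hypothetical sketch of the interim failure-handling policy above: when a server dies, spread its partitions across the survivors by always picking the least-loaded one, so load stays even while a replacement container is acquired. The function and server names are illustrative, not Helix API:

```python
def redistribute(assignment, failed):
    """assignment: node -> list of partitions. Moves the failed node's
    partitions onto the least-loaded survivors, one at a time."""
    orphans = assignment.pop(failed)
    for p in orphans:
        target = min(assignment, key=lambda n: len(assignment[n]))
        assignment[target].append(p)
    return assignment

assignment = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
after = redistribute(assignment, "server3")
# server3's four partitions split evenly: 6 replicas on each survivor
assert sorted(len(ps) for ps in after.values()) == [6, 6]
```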
Workload Changes

- Monitor: CPU, memory, latency, TPS
- Workload change: acquire/release containers
- Container change: redistribute work (e.g. a new Server 4 takes p4, p6, p2, one replica shed by each existing server)
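A hypothetical sketch of "minimize movement" on expansion: when a container is added, move only the excess replicas from over-quota servers onto the new one, leaving everything else in place (a simplified stand-in for what a rebalancer would do):

```python
def expand(assignment, new_node):
    """Add new_node and move only enough replicas to level the load."""
    nodes = list(assignment) + [new_node]
    total = sum(len(ps) for ps in assignment.values())
    quota = total // len(nodes)  # ideal replicas per node
    assignment[new_node] = []
    moves = 0
    for node in list(assignment):
        # shed replicas only from nodes that are above quota
        while node != new_node and len(assignment[node]) > quota:
            assignment[new_node].append(assignment[node].pop())
            moves += 1
    return assignment, moves

assignment = {
    "server1": ["p1", "p2", "p5", "p4"],
    "server2": ["p3", "p4", "p1", "p6"],
    "server3": ["p5", "p6", "p3", "p2"],
}
after, moves = expand(assignment, "server4")
assert moves == 3                 # exactly one replica moved per server
assert len(after["server4"]) == 3  # 12 replicas / 4 nodes = 3 each
```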
Service Discovery

- Discover everything: what is running where
- Dynamically updated on changes
- Clients query the service discovery layer to route requests
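A hypothetical sketch of service discovery: a routing table is updated whenever the cluster state changes (in Helix the analogous data lives in ZooKeeper), and clients ask it where a partition currently lives. Class and method names here are invented for illustration:

```python
class RoutingTable:
    """Partition -> servers map, pushed updates, pulled lookups."""

    def __init__(self):
        self._where = {}  # partition -> set of servers

    def on_change(self, partition, servers):
        # invoked when the cluster management layer detects a change
        self._where[partition] = set(servers)

    def lookup(self, partition):
        return self._where.get(partition, set())

table = RoutingTable()
table.on_change("p1", ["server1", "server2"])
assert table.lookup("p1") == {"server1", "server2"}

# a failure is pushed dynamically; clients see it on the next lookup
table.on_change("p1", ["server1"])
assert table.lookup("p1") == {"server1"}
```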
Building YARN Application

Writing an AM is hard and error prone; handling faults and workload changes is non-trivial and often overlooked:

- Request containers: how many, where
- Assign work: place partitions & replicas, affinity
- Workload changes: acquire/release containers, minimize movement
- Fault handling: detect non-trivial failures, new vs. reused containers
- Other: service discovery, monitoring

Is there something that can make this easy?
Apache Helix
What is Helix?

- Generic cluster management framework
- Decouples cluster management from core functionality
- Built at LinkedIn, 2+ years in production
- Contributed to Apache, now a TLP: helix.apache.org
Helix at LinkedIn: In Production

User writes land in the DB (Oracle); change capture feeds change consumers and keeps the search index up to date; a data replicator and ETL pipelines move data to HDFS for analytics.
Helix at LinkedIn: In Production

- Over 1000 instances covering over 30,000 partitions
- Over 1000 instances for change capture consumers
- As many as 500 instances in a single Helix cluster
- (all numbers are per-datacenter)
Others Using Helix
Helix Concepts

- Resource: the logical entity being managed (Database, Index, Topic, Task)
- Partitions: the resource is split into partitions p1-p6
- Replicas: each partition has replicas r1, r2, r3
- Container processes host the replicas
- Assignment: which replica runs in which container? That is what Helix decides.
Helix Concepts: State Model and Constraints

State model: bootstrap → serve → stop, with StateCount = replication factor: 3.

State constraints:
- Partition: Serve: 3, bootstrap: 0
- Node: no more than 10 replicas

Transition constraints:
- Partition: max T1 transitions in parallel
- Resource: max T2 transitions in parallel
- Node: max T3 transitions in parallel
- Cluster: max T4 transitions in parallel
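A hypothetical sketch of how a state model plus constraints can be checked, assuming an initial "offline" state and the Serve: 3 partition-level constraint from the slide (this is not Helix's Java state model API, just the idea in miniature):

```python
# legal edges of the state machine: offline -> bootstrap -> serve -> stop
ALLOWED = {("offline", "bootstrap"), ("bootstrap", "serve"),
           ("serve", "stop")}
MAX_SERVE = 3  # state constraint: at most 3 replicas serving a partition

def can_transition(replica_states, replica, to_state):
    """replica_states: replica -> current state, for one partition."""
    frm = replica_states[replica]
    if (frm, to_state) not in ALLOWED:
        return False  # not an edge of the state model
    if to_state == "serve":
        serving = sum(1 for s in replica_states.values() if s == "serve")
        if serving >= MAX_SERVE:
            return False  # would violate the Serve: 3 constraint
    return True

states = {"r1": "serve", "r2": "serve", "r3": "bootstrap"}
assert can_transition(states, "r3", "serve")          # 2 serving < 3
assert not can_transition(states, "r1", "bootstrap")  # illegal edge
```

Transition constraints (max T1..T4 transitions in parallel) would be enforced the same way, by counting in-flight transitions at each scope before firing a new one.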
Helix Architecture

- Controller: runs the Target Provider, Provisioner, and Rebalancer; assigns work to participants via callbacks
- Participants: host the partitions (P1-P8), execute the state machine transitions (bootstrap/serve/stop), and report metrics
- Spectators: clients that use Helix for service discovery
Helix Controller: High-Level Overview

Inputs: resource config, constraints, objectives.

- TargetProvider: decides the number of containers
- Provisioner: talks to the YARN RM to get them
- Rebalancer: computes the task-to-container mapping
Helix Controller: Target Provider

Determines how many containers are required, along with their spec. Default implementations: Fixed, CPU, Memory, Bin Packing (a monitoring system provides usage information; Bin Packing can be customized further).

Inputs:
- Resources: p1, p2 .. pn
- Existing containers: c1, c2 .. cn
- Health of tasks and containers: CPU, memory, health
- Allocation constraints: affinity, rack locality
- SLA, e.g. Fixed: 10 containers; CPU headroom: 30%; memory usage: 70%; time: 5h

Outputs:
- Number of containers: acquire list, release list
- Container spec: cpu: x, memory: y, location: L
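A hypothetical sketch of a TPS-driven target provider in the spirit of the RedisTargetProvider configured later in the deck: given a TPS goal and a measured per-container capacity, it decides how many containers to acquire or release, clamped to configured min/max. All names and numbers here are illustrative assumptions:

```python
import math

def target_containers(goal_tps, tps_per_container, current,
                      min_c=1, max_c=25):
    """Return the acquire/release plan to hit goal_tps."""
    needed = math.ceil(goal_tps / tps_per_container)
    needed = max(min_c, min(max_c, needed))  # clamp to configured bounds
    if needed > current:
        return {"acquire": needed - current, "release": 0}
    return {"acquire": 0, "release": current - needed}

# 1M TPS goal, each container measured at 60k TPS, 10 running now:
plan = target_containers(1_000_000, 60_000, current=10)
assert plan == {"acquire": 7, "release": 0}  # ceil(16.67) = 17 containers
```

When the monitored load drops, the same function produces a release list, which is how non-linear scaling from 0 to 1M TPS and back can be driven.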
Helix Controller: Provisioner

Given the container spec, interacts with the YARN RM to acquire/release containers and with the NMs to start/stop them; subscribes to YARN notifications.
Helix Controller: Rebalancer

Based on the current nodes in the cluster and the constraints, finds an assignment of tasks to nodes. Modes: Auto, Semi-Auto, Static, User defined.

Inputs:
- Tasks: t1, t2 .. tn
- Existing containers: c1, c2 .. cn
- Allocation constraints & objectives: affinity, rack locality, even distribution of tasks, minimize movement while expanding

Output:
- Assignment, e.g. C1: t1, t2; C2: t3, t4

Based on the FSM, it computes and fires the transitions to the Participants.
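A hypothetical sketch of that last step: diff the current assignment against the target and emit the transitions to fire on each participant. For brevity this jumps states directly; a real controller would walk the FSM edge by edge (offline → bootstrap → serve) and honor the parallelism constraints:

```python
def compute_transitions(current, target):
    """current/target: (partition, container) -> state.
    Returns (key, from_state, to_state) for every mismatch."""
    transitions = []
    for key, want in target.items():
        have = current.get(key, "offline")  # assumed initial state
        if have != want:
            transitions.append((key, have, want))
    return transitions

current = {("p1", "c1"): "serve", ("p2", "c1"): "offline"}
target = {("p1", "c1"): "serve", ("p2", "c1"): "serve",
          ("p3", "c2"): "serve"}
fired = compute_transitions(current, target)
assert (("p2", "c1"), "offline", "serve") in fired
assert (("p3", "c2"), "offline", "serve") in fired
assert len(fired) == 2  # p1 is already in its target state
```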
Example System: Helix-Based Solution

Solution: configure the app, the Target Provider, the Provisioner, and the Rebalancer in app_config_spec.yaml.

Configure App:
- App name: Partitioned Data Server
- App Master package: /path/to/GenericHelixAppMaster.tar
- App package: /path/to/RedisServerLauncher.tar
- App config: DataDirectory: hdfs:/path/to/data

Configure Target Provider:
- TargetProvider: RedisTargetProvider
- Goal: target TPS: 1 million
- Min containers: 1
- Max containers: 25

Configure Provisioner:
- YARN RM: host:port

Configure Rebalancer:
- Partitions: 6
- Replicas: 2
- Max partitions per container: 4
- Rebalancer mode: AUTO
- Placement: data affinity
- Failure handling: even distribution
- Scaling: minimize movement
Launch Application

yarn_app_launcher.sh app_config_spec.yaml
Helix + YARN

1. Client submits the job to the YARN Resource Manager
2. The RM launches the Application Master, which embeds the Helix Controller (Target Provider, Provisioner, Rebalancer)
3. The AM requests containers from the RM
4. Node Managers launch the containers, which join as Helix participants
5. The Controller assigns work: participant 1: p1, p2, p5, p4; participant 2: p3, p4, p1, p6; participant 3: p5, p6, p3, p2
Auto Scaling

Non-linear scaling from 0 to 1M TPS and back.

Failure Handling: Random Faults

Recovering from faults at 1M TPS (5%, 10%, 20% failures/min).
Summary

Stack: HDFS at the bottom; YARN (cluster resource management); Helix (container + task management); others (Batch, Interactive, Online, Streaming) on top.

- Fault tolerance and expansion handled transparently
- Generic Application Master
- Efficient resource utilization via the task model
Questions?

We love helping & being helped.

Website: helix.apache.org, #apachehelix, @apachehelix
Team: Kishore Gopalakrishna (@kishore_b_g), Kanak Biscuitwala, Zhen Zhang