till-rohrmann
1001 Deployment Scenarios
Many different deployment scenarios
• Yarn
• Mesos
• Docker/Kubernetes
• Standalone
• Etc.
Different Usage Patterns
Few long-running vs. many short-running jobs
• Overhead of starting a Flink cluster
Job isolation vs. sharing resources
• Isolation allows per-job credentials & secrets
• Efficient resource utilization by sharing
Job & Session Mode
Job mode
• Dedicated cluster for a single job
Session mode
• Shared cluster for multiple jobs
• Resources can be shared across jobs
As-Is State (Standalone)
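Standalone Flink Cluster (diagram: Client, JobManager, three TaskManagers)
• (1) Register: TaskManagers register at the JobManager
• (2) Submit Job: Client submits the job to the JobManager
• (3) Deploy Tasks: JobManager deploys tasks to the TaskManagers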
As-Is State (YARN)
YARN Cluster (diagram: Client, YARN ResourceManager, Application Master hosting the JobManager, three TaskManagers)
• (1) Submit YARN App. (FLINK): Client to YARN ResourceManager
• (2) Spawn Application Master
• (3) Poll status
• (4) Start TaskManagers
• (5) Register: TaskManagers register at the JobManager
• (6) All TaskManagers started
• (7) Submit Job
• (8) Deploy Tasks
Problems
No clear separation of concerns
No dynamic resource allocation
No heterogeneous resources
Not well suited for containerized execution
Flink Improvement Proposal 6
Introduce generic building blocks
Compose blocks for different scenarios
Mainly driven by:
Flip-6 design document:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
The Building Blocks
ResourceManager
• ClusterManager-specific
• May live across jobs
• Manages available Containers/TaskManagers
• Used to acquire / release resources
TaskManager
• Registers at ResourceManager
• Gets tasks from one or more JobManagers
JobManager
• Single job only, started per job
• Thinks in terms of "task slots"
• Deploys and monitors job/task execution
Dispatcher
• Lives across jobs
• Touch-point for job submissions
• Spawns JobManagers
• May spawn ResourceManager
The Building Blocks
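Interaction (diagram: Client, Dispatcher, JobManager, ResourceManager, TaskManager)
• (1) Submit Job: Client to Dispatcher
• (2) Start JobManager: Dispatcher starts a JobManager for the job
• (3) Request slots: JobManager to ResourceManager
• (4) Start TaskManager
• (5) Register: TaskManager registers at the ResourceManager
• (6) Offer slots: TaskManager to JobManager
• (7) Deploy Tasks: JobManager to TaskManager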
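The seven-step interaction between the building blocks can be sketched as a small in-memory simulation. This is an illustrative model only, not Flink code: the class and method names, the fixed two slots per TaskManager, and the log list are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    name: str
    slots: int                                   # task slots this worker offers
    tasks: list = field(default_factory=list)    # tasks deployed to this worker


class ResourceManager:
    """Starts TaskManagers on demand (hypothetical, simplified model)."""

    def __init__(self, log, slots_per_tm=2):
        self.log = log
        self.slots_per_tm = slots_per_tm         # assumed fixed container size
        self.task_managers = []

    def request_slots(self, job_manager, needed):
        self.log.append("(3) request slots")
        started = []
        while len(started) * self.slots_per_tm < needed:
            tm = TaskManager(f"tm-{len(self.task_managers)}", self.slots_per_tm)
            self.log.append("(4) start TaskManager")
            self.task_managers.append(tm)        # the new worker registers
            self.log.append("(5) register")
            started.append(tm)
        for tm in started:
            self.log.append("(6) offer slots")
            job_manager.offer_slots(tm)


class JobManager:
    """Handles exactly one job and thinks in terms of task slots."""

    def __init__(self, job, log):
        self.job = job                           # list of task names
        self.log = log
        self.offered = []                        # one entry per offered slot

    def offer_slots(self, tm):
        self.offered.extend([tm] * tm.slots)

    def run(self, resource_manager):
        resource_manager.request_slots(self, len(self.job))
        self.log.append("(7) deploy tasks")
        for task, tm in zip(self.job, self.offered):
            tm.tasks.append(task)


class Dispatcher:
    """Touch-point for submissions; spawns one JobManager per job."""

    def __init__(self, resource_manager, log):
        self.resource_manager = resource_manager
        self.log = log

    def submit(self, job):
        self.log.append("(1) submit job")
        self.log.append("(2) start JobManager")
        job_manager = JobManager(job, self.log)
        job_manager.run(self.resource_manager)
        return job_manager


log = []
rm = ResourceManager(log)
dispatcher = Dispatcher(rm, log)
dispatcher.submit(["source", "map", "sink"])
```

With three tasks and two slots per worker, the sketch starts two TaskManagers and leaves one slot unused.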
Building Flink-on-YARN
YARN Cluster (diagram: Client, YARN ResourceManager, Application Master hosting the Flink-YARN ResourceManager and the JobManager, three TaskManagers)
• (1) Submit YARN App. (JobGraph / JARs): Client to YARN ResourceManager
• (2) Spawn Application Master
• (3) Request slots: JobManager to Flink-YARN ResourceManager
• (4) Start TaskManagers
• (5) Register
• (6) Deploy Tasks
Differences to old YARN mode
JARs in classpath of all components
Dynamic resource allocation
No two-phase job submission
Building Flink-on-Mesos
Mesos Cluster (diagram: Client, Mesos Master, Flink Mesos Dispatcher, Master Container running the Flink Master Process with the Flink Mesos ResourceManager and the JobManager, three TaskManagers)
• (1) HTTP POST JobGraph/Jars: Client to Flink Mesos Dispatcher
• (2) Allocate container for Flink master
• (3) Start Process (and supervise)
• (4) Request slots: JobManager to Flink Mesos ResourceManager
• (5) Start TaskManagers
• (6) Register
• (7) Deploy Tasks
Building Flink-on-Docker/K8S
Containers (diagram: one Flink-Container with ResourceManager, JobManager and Program Runner; three Worker Containers, each with a TaskManager)
• (1) Container framework starts Master & Worker Containers
• (2) Run & Start
• (3) Register
• (4) Deploy Tasks
Containerized Execution
Single dedicated ResourceManager and JobManager container plus multiple TaskManager containers
Generalization
• Start N containers
• Use leader election to determine the JobManager role; the remaining containers take the TaskManager role
Enabling auto-scaling groups by rescaling job to fill all available slots
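The generalization can be sketched as follows. This is a minimal illustration under stated assumptions: the "smallest id wins" rule is only a stand-in for a real leader election (a real deployment would rely on a coordination service), and the function names and the slots-per-TaskManager parameter are hypothetical.

```python
def assign_roles(container_ids):
    """Give one container the JobManager role; all others become TaskManagers.

    Stand-in election rule (assumption): the smallest id wins.
    """
    if not container_ids:
        raise ValueError("need at least one container")
    leader = min(container_ids)
    return {
        cid: ("JobManager" if cid == leader else "TaskManager")
        for cid in container_ids
    }


def target_parallelism(roles, slots_per_task_manager):
    """Parallelism that fills every slot of every TaskManager container."""
    workers = sum(1 for role in roles.values() if role == "TaskManager")
    return workers * slots_per_task_manager


# Three identical containers are started; roles are derived afterwards.
roles = assign_roles(["c-3", "c-1", "c-2"])
parallelism = target_parallelism(roles, slots_per_task_manager=2)

# An auto-scaling group adds two containers; the job is rescaled to fill them.
scaled_roles = assign_roles(["c-3", "c-1", "c-2", "c-4", "c-5"])
scaled_parallelism = target_parallelism(scaled_roles, slots_per_task_manager=2)
```

Because every container runs the same image and learns its role only at startup, adding containers requires no per-role configuration; the job is simply rescaled to the new slot count.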
Building Standalone
Standalone Cluster (diagram: Client, Flink Master Process with Standalone ResourceManager, Dispatcher and two JobManagers, two Standby Master Processes, three TaskManagers)
• (1) Register: TaskManagers register at the Standalone ResourceManager
• (2) Submit JobGraph/Jars: Client to Dispatcher
• (3) Start JobManager
• (4) Request slots
• (5) Deploy Tasks
YARN Session
YARN Cluster (diagram: Client, YARN ResourceManager, ApplicationMaster hosting the Flink-YARN ResourceManager, the Dispatcher, JobManager (A) and JobManager (B), three TaskManagers)
• (1) Submit YARN App. (FLINK session): Client to YARN ResourceManager
• (2) Spawn Application Master
• (3) Submit Job A
• (4) Start JobManager (A)
• (5) Request slots
• (6) Start TaskManagers
• (7) Register
• (8) Deploy Tasks (Job A)
• (9) Submit Job B
• (10) Start JobManager (B)
• (11) Request slots
• (12) Deploy Tasks (Job B)
Multi Job Sessions
Dispatcher spawns a dedicated JobManager for each job
Jobs run under session user credentials
ResourceManager holds on to resources
• Reuse of allocated resources
• Quicker response for successive jobs
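The effect of a ResourceManager that keeps its resources between jobs can be sketched with a simple slot pool. This is a hypothetical model, not Flink's API: the class names, the one-slot-per-container simplification, and the counters are assumptions for illustration.

```python
class SessionResourceManager:
    """Keeps slots after a job finishes instead of returning them."""

    def __init__(self):
        self.idle_slots = 0            # slots retained from finished jobs
        self.containers_started = 0    # requests made to the cluster manager

    def request_slots(self, needed):
        reused = min(needed, self.idle_slots)
        self.idle_slots -= reused
        fresh = needed - reused
        self.containers_started += fresh   # assumption: one slot per container
        return reused, fresh

    def job_finished(self, slots):
        self.idle_slots += slots


class SessionDispatcher:
    """Spawns one (simulated) JobManager record per submitted job."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_managers = {}

    def submit(self, job_name, parallelism):
        reused, fresh = self.resource_manager.request_slots(parallelism)
        record = {"slots": parallelism, "reused": reused, "fresh": fresh}
        self.job_managers[job_name] = record
        return record

    def finish(self, job_name):
        record = self.job_managers.pop(job_name)
        self.resource_manager.job_finished(record["slots"])


session_rm = SessionResourceManager()
session = SessionDispatcher(session_rm)

job_a = session.submit("A", parallelism=4)   # cold start: all slots are fresh
session.finish("A")
job_b = session.submit("B", parallelism=3)   # served entirely from idle slots
```

Job B needs no new containers here, which is the quicker response for successive jobs that the slide refers to.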
Miscellaneous
Resource profiles
• Specify CPU & memory requirements for individual operators
• ResourceManager allocates containers according to resource profiles
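One way to picture allocation by resource profile is a best-fit match between an operator's requirements and the available container sizes. The sketch below is an assumption-laden illustration: `ResourceProfile`, its fields, `pick_container`, and the smallest-fitting-container policy are hypothetical and are not taken from the Flip-6 design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    cpu_cores: float
    memory_mb: int

    def fits_into(self, other):
        """True if a container of size `other` satisfies this profile."""
        return (self.cpu_cores <= other.cpu_cores
                and self.memory_mb <= other.memory_mb)


def pick_container(required, container_sizes):
    """Return the smallest container size that satisfies `required`, or None."""
    candidates = [size for size in container_sizes if required.fits_into(size)]
    return min(candidates, key=lambda s: (s.memory_mb, s.cpu_cores), default=None)


# Hypothetical container sizes offered by the cluster manager.
sizes = [
    ResourceProfile(1.0, 1024),
    ResourceProfile(2.0, 4096),
    ResourceProfile(4.0, 8192),
]

# Hypothetical per-operator requirements.
source_container = pick_container(ResourceProfile(1.0, 512), sizes)
window_container = pick_container(ResourceProfile(2.0, 4096), sizes)
too_large = pick_container(ResourceProfile(8.0, 1024), sizes)
```

A lightweight source operator lands in the smallest container while a memory-heavy window operator gets a larger one; this is the heterogeneous allocation that the as-is architecture could not express.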
New RPC abstraction similar to Akka’s typed actors
• Properly defined interface eases development
• No longer locked in on Akka
Conclusion
Different cluster environments have different deployment paradigms
Support for “Job” as well as “Session” mode in various environments necessary
Flip-6 architecture provides necessary flexibility to achieve both