Till Rohrmann [email protected] @stsffap Redesigning Apache Flink’s Distributed Architecture

Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017

Till Rohrmann

[email protected]

@stsffap

Redesigning Apache Flink’s Distributed Architecture

1001 Deployment Scenarios

Many different deployment scenarios

• Yarn

• Mesos

• Docker/Kubernetes

• Standalone

• Etc.

Different Usage Patterns

Few long running vs. many short running jobs

• Overhead of starting a Flink cluster

Job isolation vs. sharing resources

• Isolation allows per-job credentials & secrets

• Efficient resource utilization by sharing

Job & Session Mode

Job mode

• Dedicated cluster for a single job

Session mode

• Shared cluster for multiple jobs

• Resources can be shared across jobs

Flink’s Current State

As-Is State (Standalone)

[Diagram: Standalone Flink Cluster]

Components: Client, JobManager, TaskManager (×3)

• (1) Register
• (2) Submit Job
• (3) Deploy Tasks

As-Is State (YARN)

[Diagram: YARN Cluster]

Components: Client, YARN ResourceManager, Application Master (hosting the JobManager), TaskManager (×3)

• (1) Submit YARN App. (FLINK)
• (2) Spawn Application Master
• (3) Poll status
• (4) Start TaskManagers
• (5) Register
• (6) All TaskManagers started
• (7) Submit Job
• (8) Deploy Tasks

Problems

No clear separation of concerns

No dynamic resource allocation

No heterogeneous resources

Not well suited for containerized execution

Flink’s New Distributed Architecture

Flink Improvement Proposal 6

Introduce generic building blocks

Compose blocks for different scenarios

Mainly driven by:

Flip-6 design document:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

The Building Blocks

ResourceManager

• ClusterManager-specific

• May live across jobs

• Manages available Containers/TaskManagers

• Used to acquire / release resources

TaskManager

• Registers at ResourceManager

• Gets tasks from one or more JobManagers

JobManager

• Single job only, started per job

• Thinks in terms of "task slots"

• Deploys and monitors job/task execution

Dispatcher

• Lives across jobs

• Touch-point for job submissions

• Spawns JobManagers

• May spawn ResourceManager

The Building Blocks

[Diagram: interaction of the building blocks]

Components: Client, Dispatcher, JobManager, ResourceManager, TaskManager

• (1) Submit Job
• (2) Start JobManager
• (3) Request slots
• (4) Start TaskManager
• (5) Register
• (6) Offer slots
• (7) Deploy Tasks
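The seven steps of the building-block diagram can be traced with a small in-process sketch. This is an illustration only, not Flink code: the class and method names (`submit`, `request_slots`, `offer_slots`), the fixed number of slots per TaskManager, and the log format are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    name: str
    slots: int


@dataclass
class ResourceManager:
    # Hypothetical setting: every started TaskManager has this many slots.
    slots_per_task_manager: int = 2
    task_managers: list = field(default_factory=list)

    def request_slots(self, job_manager, needed, log):
        # Start enough TaskManagers to cover the request (ceiling division).
        count = -(-needed // self.slots_per_task_manager)
        for _ in range(count):
            tm = TaskManager(f"tm-{len(self.task_managers)}",
                             self.slots_per_task_manager)
            log.append(f"(4) start {tm.name}")
            self.task_managers.append(tm)
            log.append(f"(5) {tm.name} registers")
            job_manager.offer_slots(tm, log)


@dataclass
class JobManager:
    job: str
    parallelism: int
    slots: list = field(default_factory=list)

    def offer_slots(self, tm, log):
        log.append(f"(6) {tm.name} offers {tm.slots} slots")
        self.slots.extend((tm.name, i) for i in range(tm.slots))

    def run(self, rm, log):
        log.append(f"(3) request {self.parallelism} slots")
        rm.request_slots(self, self.parallelism, log)
        assigned = self.slots[: self.parallelism]
        log.append(f"(7) deploy {len(assigned)} tasks")
        return assigned


class Dispatcher:
    def __init__(self, rm):
        self.rm = rm
        self.log = []

    def submit(self, job, parallelism):
        self.log.append(f"(1) submit {job}")
        jm = JobManager(job, parallelism)  # one JobManager per job
        self.log.append(f"(2) start JobManager for {job}")
        return jm.run(self.rm, self.log)
```

Calling `Dispatcher(ResourceManager()).submit("wordcount", 3)` records the steps in the order shown on the slide, with steps (4) to (6) repeated once per started TaskManager.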

Building Flink-on-YARN

[Diagram: Flink on YARN]

Components: Client, YARN ResourceManager, Application Master (hosting the Flink-YARN ResourceManager and the JobManager), TaskManager (×3)

• (1) Submit YARN App. (JobGraph / JARs)
• (2) Spawn Application Master
• (3) Request slots
• (4) Start TaskManagers
• (5) Register
• (6) Deploy Tasks

Differences to old YARN mode

JARs in classpath of all components

Dynamic resource allocation

No two-phase job submission

Building Flink-on-Mesos

[Diagram: Flink on Mesos]

Components: Client, Mesos Master, Flink Mesos Dispatcher, Master Container (Flink Master Process hosting the Flink Mesos ResourceManager and the JobManager), TaskManager (×3)

• (1) HTTP POST JobGraph/Jars
• (2) Allocate container for Flink master
• (3) Start Process (and supervise)
• (4) Request slots
• (5) Start TaskManagers
• (6) Register
• (7) Deploy Tasks

Building Flink-on-Docker/K8S

[Diagram: Flink on Docker/K8S]

Components: Flink-Container (ResourceManager, JobManager, Program Runner), Worker Container (×3, each running a TaskManager)

• (1) Container framework starts Master & Worker Containers
• (2) Run & Start
• (3) Register
• (4) Deploy Tasks

Containerized Execution

Single dedicated ResourceManager and JobManager container plus multiple TaskManager containers

Generalization

• Start N containers

• Use leader election to determine the JobManager role; the remaining containers take the TaskManager role

Enables auto-scaling groups by rescaling the job to fill all available slots
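The generalization above can be sketched in a few lines. This is a toy model under stated assumptions: the "election" simply picks the smallest container id, whereas a real deployment would rely on a coordination service that the slides do not name, and `slots_per_task_manager` is a hypothetical setting.

```python
def assign_roles(container_ids):
    """Toy election: the smallest id becomes JobManager, the rest TaskManagers."""
    if not container_ids:
        raise ValueError("need at least one container")
    leader = min(container_ids)
    return {
        cid: ("JobManager" if cid == leader else "TaskManager")
        for cid in container_ids
    }


def rescaled_parallelism(roles, slots_per_task_manager):
    """Parallelism that fills every slot of the current TaskManagers."""
    workers = sum(1 for role in roles.values() if role == "TaskManager")
    return workers * slots_per_task_manager
```

When an auto-scaling group adds a container, recomputing `assign_roles` and `rescaled_parallelism` gives the new target parallelism for the job.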

Multi Job Sessions

Building Standalone

[Diagram: Standalone Cluster]

Components: Client, Flink Master Process (hosting the Dispatcher, the Standalone ResourceManager and two JobManagers), Standby Master Process (×2), TaskManager (×3)

• (1) Register
• (2) Submit JobGraph/Jars
• (3) Start JobManager
• (4) Request slots
• (5) Deploy Tasks

YARN Session

[Diagram: YARN Session]

Components: Client, YARN ResourceManager, ApplicationMaster (hosting the Flink-YARN ResourceManager, the Dispatcher, JobManager (A) and JobManager (B)), TaskManager (×3)

• (1) Submit YARN App. (FLINK – session)
• (2) Spawn Application Master
• (3) Submit Job A
• (4) Start JobManager (A)
• (5) Request slots
• (6) Start TaskManagers
• (7) Register
• (8) Deploy Tasks (Job A)
• (9) Submit Job B
• (10) Start JobManager (B)
• (11) Request slots
• (12) Deploy Tasks (Job B)

Multi Job Sessions

Dispatcher spawns a dedicated JobManager for each job

Jobs run under session user credentials

ResourceManager holds on to resources

• Reuse of allocated resources

• Quicker response for successive jobs
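Why holding on to resources speeds up successive jobs can be shown with a minimal sketch. The `SessionResourceManager` class and its `acquire`/`release` methods are hypothetical names for this example, not Flink's API; the point is only that released containers go back into an idle pool instead of being torn down.

```python
class SessionResourceManager:
    def __init__(self):
        self.idle = []     # containers kept alive after a job finished
        self.started = 0   # total number of containers ever started

    def acquire(self, n):
        # Serve the request from idle containers first, then start new ones.
        reused = [self.idle.pop() for _ in range(min(n, len(self.idle)))]
        fresh = [f"container-{self.started + i}"
                 for i in range(n - len(reused))]
        self.started += len(fresh)
        return reused + fresh

    def release(self, containers):
        # In session mode the containers are kept for the next job.
        self.idle.extend(containers)
```

If job A acquires three containers and releases them, a following job B that needs two containers starts no new ones, so it skips the container start-up delay.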

Miscellaneous

Resource profiles

• Specify CPU & memory requirements for individual operators

• ResourceManager allocates containers according to resource profiles
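A rough sketch of profile-based allocation follows. The field names `cpu_cores` and `memory_mb` and the smallest-fit selection rule are assumptions for illustration; the slides only state that profiles carry CPU and memory requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    cpu_cores: float
    memory_mb: int

    def fits_into(self, other):
        """True if a container with profile `other` can host this request."""
        return (self.cpu_cores <= other.cpu_cores
                and self.memory_mb <= other.memory_mb)


def pick_container(required, available):
    """Return the smallest available profile that satisfies `required`, or None."""
    candidates = [c for c in available if required.fits_into(c)]
    return min(candidates,
               key=lambda c: (c.memory_mb, c.cpu_cores),
               default=None)
```

Choosing the smallest fitting container is one simple policy for supporting heterogeneous resources; other policies are equally possible.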

New RPC abstraction similar to Akka’s typed actors

• Properly defined interface eases development

• No longer locked in on Akka
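The idea of a typed RPC interface can be illustrated as follows. This is not Flink's RPC layer: `JobManagerGateway`, `JobManagerEndpoint`, and `LocalRpcService` are hypothetical names, and a single-threaded executor stands in for an actor mailbox. Callers only see the gateway interface, so the transport behind it could be swapped.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class JobManagerGateway(ABC):
    """Hypothetical typed interface that remote callers program against."""

    @abstractmethod
    def offer_slots(self, task_manager: str, slots: int) -> int:
        ...


class JobManagerEndpoint(JobManagerGateway):
    def __init__(self):
        self.total_slots = 0

    def offer_slots(self, task_manager, slots):
        # Runs only on the endpoint's single thread, so no locking is needed.
        self.total_slots += slots
        return self.total_slots


class LocalRpcService:
    """Runs every call on one thread, similar to an actor mailbox."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)

    def connect(self, endpoint, gateway_type):
        executor = self._executor

        class Proxy:
            def __getattr__(self, name):
                # Expose only the methods declared on the gateway interface.
                if name not in gateway_type.__abstractmethods__:
                    raise AttributeError(name)
                method = getattr(endpoint, name)
                return lambda *a, **kw: executor.submit(method, *a, **kw)

        return Proxy()

    def shutdown(self):
        self._executor.shutdown()
```

Each call on the proxy returns a `Future`, so `gateway.offer_slots("tm-0", 2).result()` blocks until the endpoint has processed the message.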

Conclusion

Different cluster environments have different deployment paradigms

Support for both “Job” and “Session” mode is necessary in the various environments

The FLIP-6 architecture provides the flexibility needed to support both

Thank you!

@stsffap

@ApacheFlink @dataArtisans

We are hiring!

data-artisans.com/careers