Elastic Resource Scheduling with Apache Mesos
Sharma Podila
Aug 31st, MesosCon Europe 2016


Page 2: Podila mesos con europe keynote aug sep 2016

Computing Resources are Finite. Let’s use them Wisely.

Page 3: Podila mesos con europe keynote aug sep 2016

Computing Resources are Finite. Let’s use them Wisely. Let’s schedule them optimally.

Page 4: Podila mesos con europe keynote aug sep 2016

About Me

● Software engineer
○ Netflix Edge Engineering
○ Sun Microsystems + Oracle Corp.
○ Resource scheduling, stream processing, distributed systems

● Author of Fenzo scheduling library

Page 5: Podila mesos con europe keynote aug sep 2016

Let’s address a few questions

● Why Apache Mesos?
● Why focus on scheduling?
● How to guarantee capacity for various apps?
● What’s needed from the container executor?

Page 6: Podila mesos con europe keynote aug sep 2016

Source: https://www.sandvine.com/news/global_broadband_trends.asp

81 Million subscribers worldwide and growing!

Page 7: Podila mesos con europe keynote aug sep 2016

Microservices architecture on EC2

Page 8: Podila mesos con europe keynote aug sep 2016

Why Apache Mesos?


Page 10: Podila mesos con europe keynote aug sep 2016

Needed to build these...

Needle in a haystack anomaly detection

Container deployment service for a mix of batch and service workloads

Page 11: Podila mesos con europe keynote aug sep 2016

Reactive stream processing: Mantis

[Diagram labels: Zuul Cluster, API Cluster, Mantis Stream processing, Anomaly Detection]

Cloud native service
● Configurable message delivery guarantees
● Heterogeneous workloads
○ Real-time dashboarding, alerting
○ Anomaly detection, metric generation
○ Interactive exploration of streaming data

Page 12: Podila mesos con europe keynote aug sep 2016

Container deployment: Titus

[Architecture diagram labels: EC2, VPC, VMs, Titus Job Control, Containers, App, Cloud Platform (metrics, IPC, health), Batch, Containers, Eureka, Edda, Atlas & Insight]

Page 13: Podila mesos con europe keynote aug sep 2016

A few common themes

Large variation in peak to trough resource requirements

Mantis events/sec: between 2M and 8M
Titus concurrent jobs: between 10s and 1000s

Page 14: Podila mesos con europe keynote aug sep 2016

A few common themes

Heterogeneous mix of jobs and resources

Resource            Task request      Agent sizes
CPU                 1 - 32 CPUs       8 - 32 CPUs
Memory              2 - 200+ GB       32 - 244 GB
Network bandwidth   10 - 1024 Mbps    1024 - 10240 Mbps

Resource affinity based on task type
Task locality

Page 15: Podila mesos con europe keynote aug sep 2016

A few common themes

Jobs needing high availability of tasks across ephemeral cloud resources

Host1 (ec2 zone=d)
Host2 (ec2 zone=e)
Host3 (ec2 zone=f)

Job with N tasks
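One way to read the zone picture above is that each new task of a job goes to the zone that currently holds the fewest of that job's tasks, so that losing any single zone takes out only a fraction of the job. The following is a minimal Python sketch under that assumption. The function name `spread_across_zones` and the host map are hypothetical and used only for illustration; they are not Fenzo or Mesos APIs, and host capacity is ignored to keep the example short.

```python
from typing import Dict, List, Tuple


def spread_across_zones(
    num_tasks: int, host_zones: Dict[str, str]
) -> Tuple[List[Tuple[int, str]], Dict[str, int]]:
    """Place tasks one at a time on the host whose zone has the fewest tasks."""
    # Tasks of this job placed so far, per zone.
    zone_count = {zone: 0 for zone in host_zones.values()}
    placement = []
    for task in range(num_tasks):
        # Ties are broken by host name so the result is deterministic.
        host = min(sorted(host_zones), key=lambda h: zone_count[host_zones[h]])
        zone_count[host_zones[host]] += 1
        placement.append((task, host))
    return placement, zone_count


# Hypothetical cluster matching the slide: three hosts in three zones.
hosts = {"Host1": "d", "Host2": "e", "Host3": "f"}
placement, per_zone = spread_across_zones(5, hosts)
print(placement)
print(per_zone)
```

With this rule the per-zone task counts never differ by more than one, which is the property a zone-balancing constraint would aim for.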

Page 16: Podila mesos con europe keynote aug sep 2016

What kind of scheduler do I need?

Scheduler

Cluster wide optimizations: #servers, heterogeneous mix, security

User centric optimizations: resource affinity, task locality

Assignments

Achieve multiple scheduling objectives

Page 17: Podila mesos con europe keynote aug sep 2016

Functions of a framework

Framework

API

Resource Scheduling

Persistence

Domain specific

Environment specific

Potentially common

Page 18: Podila mesos con europe keynote aug sep 2016

NetflixOSS Fenzo scheduling library
https://github.com/Netflix/Fenzo

● Heterogeneous mix of task and resource sizes
● Autoscaling of Mesos agent clusters
● Customizable scheduling objectives

Page 19: Podila mesos con europe keynote aug sep 2016

Scheduling optimizations

Speed: first fit assignment
Accuracy: optimal assignment

Real world tradeoffs

Page 20: Podila mesos con europe keynote aug sep 2016

Fenzo scheduling strategy

For each task
    On each host
        Validate hard constraints
        Eval fitness and soft constraints
    Until fitness “good enough”, and a minimum #hosts evaluated

Legend: highlighted steps = plugins

Sample plugins: bin packing fitness function and soft/hard constraint evaluators for resource affinity and task locality
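The strategy above can be sketched in a few lines of Python. This is an illustration of the loop as the slide describes it, not Fenzo's actual (Java) API: the names `Host`, `Task`, `has_capacity`, `cpu_bin_packing` and `assign`, the averaging of scorer results, and the default thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    cpus: float
    used_cpus: float = 0.0


@dataclass
class Task:
    name: str
    cpus: float


def has_capacity(task: Task, host: Host) -> bool:
    """Hard constraint plugin: the task must fit in the host's free CPUs."""
    return host.used_cpus + task.cpus <= host.cpus


def cpu_bin_packing(task: Task, host: Host) -> float:
    """Fitness plugin in [0, 1]: higher means the host ends up fuller."""
    return (host.used_cpus + task.cpus) / host.cpus


def assign(
    task: Task,
    hosts: List[Host],
    hard_constraints: List[Callable[[Task, Host], bool]],
    scorers: List[Callable[[Task, Host], float]],  # fitness and soft constraints
    good_enough: float = 0.9,
    min_hosts: int = 2,
) -> Optional[Host]:
    best, best_score, evaluated = None, -1.0, 0
    for host in hosts:
        evaluated += 1
        if all(ok(task, host) for ok in hard_constraints):
            # Assumption: combine fitness and soft constraints by averaging.
            score = sum(s(task, host) for s in scorers) / len(scorers)
            if score > best_score:
                best, best_score = host, score
        # Stop early once the best host is "good enough" and enough hosts were looked at.
        if best is not None and best_score >= good_enough and evaluated >= min_hosts:
            break
    return best


hosts = [Host("h1", 8, 0), Host("h2", 8, 6), Host("h3", 8, 4)]
chosen = assign(Task("t1", 2), hosts, [has_capacity], [cpu_bin_packing])
print(chosen.name if chosen else None)
```

Lowering `good_enough` and `min_hosts` moves the loop toward first fit (fast), while raising them forces more hosts to be evaluated (closer to optimal), which is the speed versus accuracy tradeoff from the previous slide.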

Page 21: Podila mesos con europe keynote aug sep 2016

Fenzo agent cluster autoscaling

● Scaling up is relatively easy
● Scaling down requires bin packing
○ By resource footprint, runtime, etc.

[Diagram: task placement on Host 1, Host 2, Host 3, Host 4 vs. an alternative placement on the same four hosts]
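To see why scaling down depends on bin packing, the sketch below places the same tasks on four hosts twice: once spreading load and once packing it. It is a simplified Python illustration with a single CPU dimension; `place`, `spread`, `pack` and `idle_hosts` are hypothetical helper names, not part of Fenzo.

```python
from typing import Callable, List


def spread(candidates: List[int], load: List[int]) -> int:
    """Pick the least loaded host that can take the task."""
    return min(candidates, key=lambda i: load[i])


def pack(candidates: List[int], load: List[int]) -> int:
    """Pick the most loaded host that can still take the task (bin packing)."""
    return max(candidates, key=lambda i: load[i])


def place(
    tasks: List[int],
    num_hosts: int,
    capacity: int,
    pick: Callable[[List[int], List[int]], int],
) -> List[int]:
    """Return the CPU load per host after placing every task with `pick`."""
    load = [0] * num_hosts
    for cpus in tasks:
        candidates = [i for i in range(num_hosts) if load[i] + cpus <= capacity]
        if not candidates:
            raise ValueError("no host has room for the task")
        load[pick(candidates, load)] += cpus
    return load


def idle_hosts(load: List[int]) -> List[int]:
    """Hosts with no tasks are the only ones safe to terminate."""
    return [i for i, used in enumerate(load) if used == 0]


tasks = [2, 2, 2, 2]  # CPUs requested by each task
print(place(tasks, 4, 8, spread), idle_hosts(place(tasks, 4, 8, spread)))
print(place(tasks, 4, 8, pack), idle_hosts(place(tasks, 4, 8, pack)))
```

With spreading, every host carries one task and none can be terminated; with packing, one host is full and the other three are idle and can be scaled down.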

Page 22: Podila mesos con europe keynote aug sep 2016

Capacity Guarantees


Page 25: Podila mesos con europe keynote aug sep 2016

Capacity guarantees

Guarantee (agreed upon) capacity for timely job starts
Mesos support for quotas, etc. evolving

Generally, optimize throughput for batch jobs and start latency for service jobs

Some service style jobs may be less important

Categorize by expected behavior instead: Critical versus Flex (flexible scheduling requirements)


Page 27: Podila mesos con europe keynote aug sep 2016

Capacity guarantees

Quotas vs. Priorities

[Diagram labels: Critical and Flex shown under each approach; Resource Allocation Order]

Page 28: Podila mesos con europe keynote aug sep 2016

Capacity guarantees: hybrid view

[Diagram labels: Critical tier with AppC1, AppC2, AppC3, ... AppCN; Flex tier with AppF1, AppF2, AppF3, ... AppFN; Resource Allocation Order]


Page 32: Podila mesos con europe keynote aug sep 2016

Capacity guarantees: hybrid view

Head of line blocking
What if ‘Critical’ task isn’t satisfied? Or, it isn’t ready?

Dynamic scheduling
Automatic advance reservation: Task T2

[Timeline diagram labels: T1 and T2 on HostA over Time; a region marked Underutilization; tiers Critical and Flex]

Page 33: Podila mesos con europe keynote aug sep 2016

Capacity guarantees: hybrid view

Head of line blocking
What if ‘Critical’ task isn’t satisfied? Or, it isn’t ready?

Dynamic scheduling
Automatic advance reservation: Task T2
Back filling improves utilization: Task T3

[Timeline diagram labels: T1, T2 and T3 on HostA over Time; tiers Critical and Flex]
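A rough Python sketch of the advance reservation and back filling idea follows. It assumes a single host with one resource (CPUs) and that task runtimes can be estimated; the types `Running` and `Reservation`, the function `can_backfill`, and the numbers are hypothetical and are not taken from Fenzo or Mesos.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Running:
    name: str
    cpus: int
    ends_at: int  # estimated completion time


@dataclass
class Reservation:
    task: str
    cpus: int
    starts_at: int  # time at which the reserved task is expected to start


def can_backfill(
    now: int,
    host_cpus: int,
    running: List[Running],
    reservation: Reservation,
    cpus: int,
    duration: int,
) -> bool:
    """True if a Flex task can start now without delaying the reserved task."""
    free_now = host_cpus - sum(r.cpus for r in running if r.ends_at > now)
    if cpus > free_now:
        return False  # does not fit in the currently idle capacity
    if now + duration <= reservation.starts_at:
        return True  # finishes before the reservation begins
    # Otherwise it overlaps the reservation, so both must fit side by side.
    still_running = sum(r.cpus for r in running if r.ends_at > reservation.starts_at)
    return still_running + reservation.cpus + cpus <= host_cpus


# HostA has 8 CPUs. T1 uses 4 until t=10; T2 (Critical) has reserved all 8 from t=10.
running = [Running("T1", 4, 10)]
t2 = Reservation("T2", 8, 10)
print(can_backfill(0, 8, running, t2, cpus=2, duration=5))   # short T3 fits in the gap
print(can_backfill(0, 8, running, t2, cpus=2, duration=20))  # long T3 would delay T2
```

The reservation removes head of line blocking for T2 without holding the host empty, and the back fill check is what lets a short Flex task use the otherwise idle capacity.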


Page 35: Podila mesos con europe keynote aug sep 2016

Capacity guarantees: “utilization”

What if ‘Critical’ is underutilizing? Let Flex use it, but …

Preemptions
“Fairness” via composable functions

[Diagram labels: Critical, Flex]
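The slide does not say how the composable functions are defined, so the following Python sketch is only one possible reading: when Critical needs its capacity back, Flex tasks are ranked by a weighted sum of small scoring functions and preempted in that order. Everything here (`compose`, `pick_victims`, the two scoring functions, the task fields and the weights) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

FlexTask = Dict[str, float]
Scorer = Callable[[FlexTask], float]


def over_share(task: FlexTask) -> float:
    """Higher when the task's app is using more than its assumed fair share."""
    return task["over_share"]


def little_work_lost(task: FlexTask) -> float:
    """Higher for tasks that started recently, so less work is thrown away."""
    return 1.0 / (1.0 + task["runtime_min"])


def compose(*weighted: Tuple[float, Scorer]) -> Scorer:
    """Combine scoring functions into one by weighted sum."""
    def score(task: FlexTask) -> float:
        return sum(weight * fn(task) for weight, fn in weighted)
    return score


def pick_victims(flex_tasks: List[dict], cpus_needed: float, score: Scorer) -> List[dict]:
    """Preempt the highest scoring Flex tasks until enough CPUs are freed."""
    victims, freed = [], 0.0
    for task in sorted(flex_tasks, key=score, reverse=True):
        if freed >= cpus_needed:
            break
        victims.append(task)
        freed += task["cpus"]
    # If even preempting everything is not enough, preempt nothing.
    return victims if freed >= cpus_needed else []


flex = [
    {"name": "a", "cpus": 2, "runtime_min": 100, "over_share": 0.0},
    {"name": "b", "cpus": 2, "runtime_min": 1, "over_share": 0.5},
    {"name": "c", "cpus": 4, "runtime_min": 10, "over_share": 1.0},
]
fairness = compose((1.0, over_share), (1.0, little_work_lost))
print([t["name"] for t in pick_victims(flex, 5, fairness)])
```

Because the policy is a plain function built from smaller ones, a different notion of fairness can be swapped in by changing the weights or adding another scorer, without touching the preemption loop.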

Page 36: Podila mesos con europe keynote aug sep 2016

Container Executor


Page 38: Podila mesos con europe keynote aug sep 2016

Container executor

[logo] + [logo] < MULTI-TENANT

Augment missing pieces:
IP per container
Security - Security Groups, IAM roles
Isolation for networking bandwidth, disk I/O

Page 39: Podila mesos con europe keynote aug sep 2016

Plumbing VPC Networking into Docker

[Network diagram labels: EC2 VM; eth0 / eni0 (SG=Titus Agent); eth1 / eni1 (SecGrp=X); eth2 / eni2 (SG=Y); IP 1, IP 2, IP 3; docker0; Task 0 (No IP Needed); Task 1, Task 2, Task 3; pod root, veth<id> and app (repeated per task); SecGrp X, SecGrp Y; Linux Policy Based Routing; EC2 Metadata Proxy; 169.254.169.254; IPTables NAT]

Page 40: Podila mesos con europe keynote aug sep 2016

In Summary...


Page 42: Podila mesos con europe keynote aug sep 2016

Computing Resources are Finite. Let’s use them Wisely. Let’s schedule them optimally.

And, let’s collaborate.

Page 43: Podila mesos con europe keynote aug sep 2016

Questions?

Elastic Resource Scheduling with Apache Mesos
Sharma Podila
spodila@netflix.com
@podila
linkedin.com/in/spodila