Why the Datacenter needs an Operating System

Dr. Bernd Mathiske, Senior Software Architect, Mesosphere




Bringing Google-Scale Computing to Everybody

A Slice of Google Tech Transfer History

2005: MapReduce -> Hadoop (Yahoo)

2007: Linux cgroups for lightweight isolation (Google)

2009: BigTable -> MongoDB

2009: “The Datacenter as a Computer” - Barroso, Hölzle (Google)

2009: Mesos - a distributed operating system kernel (UC Berkeley)

2010: Large scale production Mesos deployment (Twitter)

since 2010: Many more frameworks and quite a few meta-frameworks

Notable Operating System Developments

Single-something => multi-something: user, tasking, threading, core, …

More: bits, memory, storage, bandwidth…

OS virtualization => lightweight virtualization (cgroups, LXCs, jails, …)

Packaging => containers (docker, rkt, lmctfy, …)

Static libraries => dynamic libraries => static libraries


Cluster Operating Systems (Hardware Clustering)

Researched since the 1980s

Trying to provide (the illusion of) a single system image

Aiming at HA, load balancing, location transparency (e.g. for storage)

Many systems: Amoeba, ChorusOS, GLUnix, Hurricane, MOSIX, Plan9, RHCS, Spring, Sprite, Sumo, QNX, Solaris MC, UnixWare, VAXclusters, …

Relatively low scale (up to 100s of nodes)

Complicated to manage, less dynamic than software clustering


From HPC Grid to Enterprise Cloud

Condor, LSF, Maui, Moab, Quartz, SLURM, …

Typically for batch jobs

Also cover services => SOA => more job schedulers

=> grid computing => grid middleware … => cloud stacks


From Server Virtualization to App Aggregation

Client-Server Era: small apps, big servers => Server Virtualization

Cloud Era: big apps, small servers => App Aggregation

[Diagram: virtualization slices each big server among several small apps; aggregation pools many small servers to serve each big app]

Cloud Computing

SaaS: Salesforce demonstrated success, then many followed

PaaS: Deis, Dotcloud, OpenShift, Heroku, Pivotal, Stackato, …

IaaS: AWS, Azure, DigitalOcean, GCE…

Private cloud stacks including IaaS: Eucalyptus, CloudStack, Joyent, OpenStack, SmartCloud, vSphere, …


Datacenter

✴ A facility used to house computer systems and associated components (e.g. networking, storage, cooling, sensors)

✴ In this talk we focus on how to manage and use a single production cluster of networked computers in a datacenter

✴ Such clusters range in size from 10s to 10000s of nodes

✴ Why should we and how can we end up with just one production cluster?


Datacenter Services

✴ LAMP (Linux, Apache, MySQL, PHP) or similar

✴ MEAN (MongoDB, Express.js, Angular.js, Node.js) or similar

✴ Cassandra, ElasticSearch, Exelixi, Hadoop, Hypertable, Jenkins, Kafka, MPI, Spark, Storm, SSSP, Torque, …

✴ Private PaaS: Deis, …

✴ …


Operate your Laptop like your Datacenter?

From Static Partitioning to Elastic Sharing

Static Partitioning

[Diagram: three statically partitioned sub-clusters for Web, Cache, and Hadoop, each running well below 100%, with the unused share of every partition wasted]

Elastic Sharing

[Diagram: one shared cluster in which Web, Cache, and Hadoop draw from a common pool, leaving unused capacity free for any workload instead of wasted]
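The waste argument can be made concrete with a back-of-the-envelope calculation (the workload numbers below are illustrative, not from the talk):

```java
// Compares cluster sizing under static partitioning vs. elastic sharing.
// All figures are hypothetical: three workloads whose demand peaks do not coincide.
public class Utilization {

    // Static partitioning: each workload must be sized for its own peak,
    // so the cluster holds the sum of all peaks.
    static int staticClusterSize(int[] peakDemands) {
        int total = 0;
        for (int peak : peakDemands) total += peak;
        return total;
    }

    // Elastic sharing: workloads borrow idle nodes from each other, so the
    // cluster only needs to cover the highest *combined* demand at any time.
    static int elasticClusterSize(int[][] demandOverTime) {
        int worst = 0;
        for (int[] momentDemands : demandOverTime) {
            int combined = 0;
            for (int d : momentDemands) combined += d;
            worst = Math.max(worst, combined);
        }
        return worst;
    }

    public static void main(String[] args) {
        int[] peaks = {100, 100, 100}; // web, cache, hadoop peaks (hypothetical)
        // Demand per workload at three points in time; the peaks never coincide.
        int[][] timeline = {
            {100, 20, 10}, // daytime: web peaks
            {20, 100, 10}, // evening: cache peaks
            {10, 20, 100}, // night: hadoop batch peaks
        };
        System.out.println("static:  " + staticClusterSize(peaks));    // 300 nodes
        System.out.println("elastic: " + elasticClusterSize(timeline)); // 130 nodes
    }
}
```

Under these (made-up) numbers, elastic sharing serves the same load with well under half the hardware.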

Software Clustering

Layer between node OS and application frameworks

Scale

Multi-tenancy

High availability

Available Open Source Components

✴ 2-level scheduler: Apache Mesos

✴ Meta-frameworks / schedulers: Aurora, Chronos, Marathon, Kubernetes, Swarm, …

✴ Service discovery: Consul, HAProxy, Mesos DNS, …

✴ Highly available configuration: zk, etcd, …

✴ Storage: HDFS, Ceph, …

✴ Node OSs: lots of Linux variants

✴ Lots of app frameworks: Spark, Storm, Cassandra, Kafka, …

2-Level Scheduling

Scale: from 1 node to at least 10000s of nodes

Optimizing resource management

End-to-end principle: “application-specific functions ought to reside in the end nodes of a network rather than intermediary nodes”

-> Requirement for general multi-tenancy

-> Requirement for having only one production cluster

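At the framework level, 2-level scheduling means the framework itself decides what to run on the resources the kernel offers it. A self-contained sketch of that decision (plain Java, not the Mesos API; the class names here are made up for illustration):

```java
// Illustrates the framework half of 2-level scheduling: given an offered
// bundle of resources, the framework decides how many of its tasks fit.
// ResourceOffer and TaskSpec are hypothetical stand-ins, not Mesos types.
public class OfferFitting {

    static class ResourceOffer {
        final double cpus, memGb;
        ResourceOffer(double cpus, double memGb) { this.cpus = cpus; this.memGb = memGb; }
    }

    static class TaskSpec {
        final double cpus, memGb;
        TaskSpec(double cpus, double memGb) { this.cpus = cpus; this.memGb = memGb; }
    }

    // Greedily packs identical tasks into the offer; leftover resources
    // would be declined so the kernel can offer them to other frameworks.
    static int tasksThatFit(ResourceOffer offer, TaskSpec task) {
        int byCpu = (int) (offer.cpus / task.cpus);
        int byMem = (int) (offer.memGb / task.memGb);
        return Math.min(byCpu, byMem);
    }

    public static void main(String[] args) {
        ResourceOffer offer = new ResourceOffer(8.0, 14.0);
        TaskSpec task = new TaskSpec(2.0, 4.0); // each task wants 2 CPUs, 4 GB
        System.out.println(tasksThatFit(offer, task)); // 3: memory is the bottleneck
    }
}
```

This application-specific fitting logic is exactly what the end-to-end principle says should live in the framework, not in the kernel.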

How Mesos Works

[Architecture diagram: a framework's Scheduler registers with the leading Mesos Master; multiple Masters coordinate through zk/etcd for high availability; the Master forwards resource offers from Slaves to the Scheduler, and Executors on the Slaves run the framework's Tasks]

Ways to Run an Application

1. Vanilla job

• Employ meta-framework for invocation: Chronos, Aurora, Kubernetes, …

2. Application of an adapted framework

• Hadoop, Spark, Storm, ElasticSearch, Cassandra, Kafka, many more…

3. Non-adapted services

• Employ meta-framework for invocation: Marathon, Aurora, Kubernetes, …

• Provide (select) a service discovery solution

4. Program your own scheduler (and executor)

The Mesos Framework API

✴ Currently like internal Mesos communication:

• protobuf messages over HTTP

✴ Soon:

• JSON messages over HTTP (stream)

=> no need to link against the binary Mesos library, and less to reimplement per language

ca. a dozen programming languages => any language

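To give a feel for the announced JSON-over-HTTP direction, here is a sketch of building a subscribe-style registration message by hand (the exact field names and endpoint of the new API are assumptions here, not a spec):

```java
// Sketches what a framework registration message for a JSON-over-HTTP
// scheduler API might look like. The field names ("type", "framework_info",
// "SUBSCRIBE") mirror the protobuf message style but are assumptions here.
public class SubscribeMessage {

    static String buildSubscribe(String frameworkName, String user) {
        return String.format(
            "{\"type\": \"SUBSCRIBE\", "
          + "\"framework_info\": {\"name\": \"%s\", \"user\": \"%s\"}}",
            frameworkName, user);
    }

    public static void main(String[] args) {
        // A framework would POST this body to the master's scheduler endpoint
        // and keep the HTTP connection open to stream events back.
        System.out.println(buildSubscribe("my-framework", "mesos"));
    }
}
```

Because the message is plain JSON over HTTP, any language with an HTTP client can implement a framework, which is the "ca. a dozen programming languages => any language" point above.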

How to implement a framework

✴ Scheduler interface: 1 half of 2-level scheduling

• The framework knows best when to do what with what kind of resources

• About a dozen callbacks, main functionality in 2 of them:

- receive resource offers

- receive task status updates

✴ Executor interface: task life-cycle management and monitoring

• Command line executor included in Mesos

• Docker executor included in Mesos

• Custom executors often not needed

Scheduler SPI (implemented by Framework)


public interface Scheduler {
  void registered(SchedulerDriver driver, FrameworkID frameworkId, MasterInfo masterInfo);
  void reregistered(SchedulerDriver driver, MasterInfo masterInfo);
  void resourceOffers(SchedulerDriver driver, List<Offer> offers);
  void offerRescinded(SchedulerDriver driver, OfferID offerId);
  void statusUpdate(SchedulerDriver driver, TaskStatus status);
  void frameworkMessage(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, byte[] data);
  void disconnected(SchedulerDriver driver);
  void slaveLost(SchedulerDriver driver, SlaveID slaveId);
  void executorLost(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, int status);
  void error(SchedulerDriver driver, String message);
}

Minimal Scheduler Implementation

class MyFrameworkScheduler implements Scheduler {
  …
  private TaskGenerator _taskGen;

  public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
    if (_taskGen.doneCreatingTasks()) {
      for (Offer offer : offers) {
        driver.declineOffer(offer.getId());
      }
    } else {
      for (Offer offer : offers) {
        List<TaskInfo> taskInfos = _taskGen.generateTaskInfos(offer);
        driver.launchTasks(offer.getId(), taskInfos, _filters);
      }
    }
  }

  public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
    _taskGen.observeTaskStatusUpdate(status);
    if (_taskGen.done()) {
      driver.stop();
    }
  }
  …
}


The Developer’s Perspective

✴ Focus on application logic, not datacenter structure

✴ Avoid networking-related code

✴ Reuse of built-in fault-tolerance and high availability

✴ Reuse distributed (infrastructure) frameworks (e.g., storage)

=> API, SDK for datacenter services


The Operations Engineer’s Perspective

✴ Ease of deployment/management

✴ Uniformity of deployment/management

✴ Hardware utilization rate

✴ Scaling up as business grows

✴ Scaling out sporadically

✴ Cost and time for moving to a different datacenter

✴ High availability and fault-tolerance of system services

✴ Monitoring

✴ Troubleshooting


Necessary Multi-Tenancy Features

Task containerization

Resource isolation

Resource and task attributes

Static and dynamic resource reservations

Reservation levels

Meta-frameworks

Dynamic scheduler update and reconfiguration

Security

Desirable Multi-Tenancy Features

Optimistic offers

Oversubscription

Task preemption, migration, resizing, reconfiguration

Rate limiting

Auto-scaling => hybrid cloud

Infrastructure frameworks


Using Docker Containers in Mesos


Mesos Master Server

init
 ├─ mesos-master
 └─ marathon

Mesos Slave Server

init
 ├─ docker
 │   └─ lxc
 │       └─ (user task, under container init system)
 └─ mesos-slave
     └─ /var/lib/mesos/executors/docker
         └─ docker run …

DockerRegistry

When a user requests a container, Mesos, LXC, and Docker are tied together for launch.


Other Schedulers as Meta-Frameworks in a 2-level Scheduler

YARN => https://github.com/mesos/myriad

Kubernetes => https://github.com/mesosphere/kubernetes-mesos

Swarm => Swarm on Mesos (new project)

=> run everything in one cluster


Myriad : Virtual YARN Clusters on Mesos


◦ POST /api/clusters: Registers a new YARN cluster
◦ GET /api/clusters: Lists all registered clusters
◦ GET /api/clusters/{clusterId}: Lists the cluster with {clusterId}
◦ PUT /api/clusters/{clusterId}/flexup: Expands the size of cluster with {clusterId}
◦ PUT /api/clusters/{clusterId}/flexdown: Shrinks the size of cluster with {clusterId}
◦ DELETE /api/clusters/{clusterId}: Unregisters YARN cluster with {clusterId}. Also kills all the nodes.
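A client call against these endpoints could look as follows (host, port, and the JSON payload shape are assumptions for illustration; only the paths and verbs come from the API listing above):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketches a flexup call against the Myriad REST API listed above.
// The host, port, and "instances" payload field are hypothetical; only the
// /api/clusters/{clusterId}/flexup path and the PUT verb are from the slide.
public class MyriadFlexUp {

    // Builds the flexup URL for a given cluster id.
    static String flexUpUrl(String host, int port, String clusterId) {
        return String.format("http://%s:%d/api/clusters/%s/flexup", host, port, clusterId);
    }

    // Issues the PUT request and returns the HTTP status code.
    static int flexUp(String host, int port, String clusterId, int instances) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(flexUpUrl(host, port, clusterId)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        byte[] body = String.format("{\"instances\": %d}", instances)
                            .getBytes(StandardCharsets.UTF_8);
        conn.getOutputStream().write(body);
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Print the target URL only; no Myriad server is assumed to be running.
        System.out.println(flexUpUrl("myriad-host", 8192, "cluster-1"));
    }
}
```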

[Diagram: the Myriad Scheduler runs alongside the YARN ResourceManager as a Mesos framework; on flexup, a Myriad Executor on a Mesos slave launches a YARN NodeManager (step 1) with the requested resources (e.g. 2.5 CPU / 2.5 GB), of which 2.0 CPU / 2.0 GB are available to YARN containers C1 and C2]

Kubernetes in Mesos

Portability

[Diagram: Mesos as a common layer across Public Cloud, Managed Cloud, and Your Own DC, hosting framework apps, meta-frameworks, vanilla apps, and infrastructure frameworks]

The Application User’s Perspective

✴ Focus on apps, services, parameters, results

✴ Avoid dealing with datacenter operations/management

✴ Avoid adjusting system settings

✴ High availability

✴ Throughput

✴ Responsiveness

✴ Predictability

✴ Run everything I need

✴ Return on and safety of investment

The Datacenter is the new form factor

✴ 2-level scheduler => single production cluster

✴ scalability and portability => avoiding hardware/cloud lock-in

✴ built-in container support => running containers at scale

✴ automation => operator efficiency

✴ repositories => apps/services readily available

✴ API and SDK => productive/quick app/service development



Above the Clouds with Open Source!