Nomad - GOTO London

Nomad

HASHICORP

Armon Dadgar @armon

Nomad

Cluster Manager

Scheduler

Schedulers map a set of work to a set of resources

CPU Scheduler

Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1

Resources: CPU - Core 1, CPU - Core 2
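The CPU scheduler picture can be sketched in a few lines of code. This is only an illustration of "mapping work to resources", assuming a simple least-loaded policy; the function name `assign_least_loaded` is hypothetical and real CPU schedulers are far more involved.

```python
def assign_least_loaded(work, resources):
    """Map each work item to the resource currently holding the fewest items."""
    placement = {r: [] for r in resources}
    for item in work:
        # min() returns the first least-loaded resource, so ties go to the earlier one.
        target = min(resources, key=lambda r: len(placement[r]))
        placement[target].append(item)
    return placement


# Work and resources taken from the slide.
threads = [
    "Web Server - Thread 1",
    "Web Server - Thread 2",
    "Redis - Thread 1",
    "Kernel - Thread 1",
]
cores = ["CPU - Core 1", "CPU - Core 2"]

placement = assign_least_loaded(threads, cores)
for core, assigned in placement.items():
    print(core, "<-", assigned)
```

With four threads and two cores, each core ends up with two threads under this policy.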

Schedulers In the Wild

Type                        Work               Resources
CPU Scheduler               Threads            Physical Cores
AWS EC2 / OpenStack Nova    Virtual Machines   Hypervisors
Hadoop YARN                 MapReduce Jobs     Client Nodes
Cluster Scheduler           Applications       Servers

Advantages

Higher Resource Utilization

Decouple Work from Resources

Better Quality of Service

Advantages: Higher Resource Utilization

Bin Packing

Over-Subscription

Job Queueing
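Bin packing is the easiest of these to show in code. The sketch below uses the classic first-fit-decreasing heuristic as a stand-in; it is not a description of Nomad's actual placement logic, and the task names and memory sizes are made up for illustration.

```python
def first_fit_decreasing(tasks, capacity):
    """Pack (name, size) tasks onto equal-capacity nodes, largest task first."""
    nodes = []  # each node: {"free": remaining capacity, "tasks": [names]}
    for name, size in sorted(tasks, key=lambda t: t[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} does not fit on an empty node")
        for node in nodes:
            if node["free"] >= size:
                break  # first node with enough room wins
        else:
            # No existing node has room, so open a new one.
            node = {"free": capacity, "tasks": []}
            nodes.append(node)
        node["free"] -= size
        node["tasks"].append(name)
    return nodes


# Hypothetical memory asks in MB, packed onto 1024 MB nodes.
tasks = [("redis", 256), ("web", 512), ("api", 512), ("cache", 256), ("batch", 768)]
nodes = first_fit_decreasing(tasks, capacity=1024)
for i, node in enumerate(nodes):
    print(f"node {i}: {node['tasks']} (free {node['free']} MB)")
```

Five tasks totalling 2304 MB land on three nodes here, where a one-task-per-node layout would need five.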

Advantages: Decouple Work from Resources

Abstraction

API Contracts

Standardization

Advantages: Better Quality of Service

Priorities

Resource Isolation

Pre-emption
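Priorities and pre-emption can be illustrated with a small priority queue. This is a generic sketch of the idea, assuming a fixed number of run slots and that a higher number means higher priority; the job names and the function `run_with_preemption` are hypothetical and do not reflect how any particular scheduler implements it.

```python
import heapq


def run_with_preemption(jobs, slots):
    """Decide which jobs run when only `slots` jobs can run at once.

    jobs: list of (name, priority) in arrival order; higher priority wins.
    Returns (running names, names that were pre-empted or never started).
    """
    running = []  # min-heap of (priority, arrival index, name)
    displaced = []
    for index, (name, priority) in enumerate(jobs):
        entry = (priority, index, name)
        if len(running) < slots:
            heapq.heappush(running, entry)
        elif running[0][0] < priority:
            # The lowest-priority running job is pre-empted by the newcomer.
            evicted = heapq.heapreplace(running, entry)
            displaced.append(evicted[2])
        else:
            displaced.append(name)
    return sorted(e[2] for e in running), displaced


running, displaced = run_with_preemption(
    [("batch-report", 20), ("cache", 50), ("web", 80), ("billing", 90)],
    slots=2,
)
print("running:", running)
print("displaced:", displaced)
```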

Nomad

Cluster Scheduler

Easily Deploy Applications

Operationally Simple

Built for Scale

job "redis" {
  datacenters = ["us-east-1"]
  task "redis" {
    driver = "docker"
    config {
      image = "redis:latest"
    }
    resources {
      cpu    = 500 # MHz
      memory = 256 # MB
      network {
        mbits         = 10
        dynamic_ports = ["redis"]
      }
    }
  }
}

example.nomad

Job Specification

Declares what to run

Nomad determines where and manages how to run

Nomad abstracts work from resources

Containerized: Docker, Rkt, Windows Server Containers

Virtualized: Qemu / KVM, Hyper-V, Xen

Standalone: Java Jar, Static Binaries, C#

Nomad

Declarative Job Specification

Infrastructure-As-Code

Removes Imperative Logic

External Dependencies?

Service Discovery?

Health Monitoring?

Application Secrets?

Stateful Applications?

job "my-app" {
  …
  task "my-app" {
    service {
      port = "http"
      check {
        type     = "http"
        path     = "/health"
        interval = "5s"
      }
    }
  }
}

example.nomad

Diagram: Nomad Server and Consul Server; a Client running Nomad, Consul, and App 1 through App N

Flow: Schedule App, Register Service, Monitor Health

Nomad

Secret Distribution:

API Keys

DB Credentials

SSL/TLS Certificates

job "my-app" {
  …
  task "my-app" {
    env {
      DB_USERPASS = "foo:bar"
    }
  }
}

example.nomad

Vault

Secure secret storage

Dynamic secrets

Leasing, renewal, and revocation

Auditing

Rich ACLs

Multiple client authentication methods

Diagram: the client sends Login and receives a Vault Token; it then sends Vault Token + Operation and receives the Op Response

job "my-app" {
  …
  task "my-app" {
    env {
      VAULT_TOKEN = "b6a10b96-9060-11e6-9c6f-67a52bc6b8d3"
    }
  }
}

example.nomad

job "my-app" {
  …
  task "my-app" {
    vault {
      policies = ["my-app-role"]
    }
  }
}

example.nomad

Diagram: Nomad Server; a Client running Nomad and App 1 through App N

Flow: Submit Job + Vault Token, Verify Vault Token, Schedule App, Generate + Renew Vault Token

Nomad

Native Vault Integration

No Secrets in Jobs

No Secrets on Client Disk

Minimize Trust

Stateful Applications

Stateless to Stateful

EASY: API, Web, Cache

MEDIUM: HDFS, Cassandra, MongoDB

HARD: *SQL

job "my-app" {
  …
  task "my-app" {
    ephemeral_disk {
      sticky = true
    }
  }
}

example.nomad

Moves data between tasks on the same machine

Copies data between tasks on different machines

Nomad

Easily Deploy Apps:

Declarative Jobs

Flexible Workloads

Consul Integration

Vault Integration

Sticky Volumes

Operationally Simple

Client Server

Built on Experience

GOSSIP CONSENSUS

Serf

Cluster Management

Gossip Based (P2P)

Membership

Failure Detection

Event System

Serf

Large Scale

Production Hardened

Simple Clustering and Federation

Consul

Service Discovery

Configuration

Coordination (Locking)

Central Servers + Distributed Clients

Consul

Multi-Datacenter

Raft Consensus

Large Scale

Production Hardened

Nomad

Operational Simplicity:

Single Binary

No Dependencies

Highly Available

Built for Scale

Built on Experience

GOSSIP CONSENSUS

Mature Libraries

Proven Design Patterns

Lacking Scheduling Logic

Built on Research

GOSSIP CONSENSUS

Single Region Architecture

Diagram: three SERVERs, one LEADER and two FOLLOWERs, linked by REPLICATION and FORWARDING; CLIENTs in DC1, DC2, and DC3 reach the servers over RPC
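One consequence of a leader/follower server group is worth making concrete. Assuming the servers in a region form a Raft-style consensus group that needs a majority to make progress (an assumption here, based on the consensus slides), the number of server failures a region can survive is simple arithmetic. The function names are hypothetical.

```python
def quorum(servers):
    """Smallest majority of a server set."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """Servers that can fail while a majority still remains."""
    return servers - quorum(servers)


for n in (1, 3, 5):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, the three-server layout in the diagram keeps working with one server down, and an even server count adds no extra fault tolerance over the odd count below it.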

Multi Region Architecture

Diagram: REGION A and REGION B, each with three SERVERs (one LEADER, two FOLLOWERs) linked by REPLICATION and FORWARDING; the two regions are connected by GOSSIP and REGION FORWARDING

Nomad

Region is Isolation Domain

1-N Datacenters Per Region

Flexibility to do 1:1 (Consul)

Scheduling Boundary

Hundreds of regions

Tens of thousands of clients per region

Thousands of jobs per region

Nomad

Inspired by Google Omega

Optimistic Concurrency

State Coordination

Service & Batch workloads

Pluggable Architecture

Data Model

ALLOCATION

JOB

EVALUATION

NODE

Evaluation ~= State Change

Evaluations

Create / Update / Delete Job

Node Up / Node Down

Allocation Failed / Finished

Evaluations

SCHEDULER

func(Evaluation) => []AllocationUpdates

Service, Batch, System
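The "scheduler is a function from an evaluation to allocation updates" idea can be sketched as follows. The `Evaluation` and `AllocationUpdate` fields and the reconcile-by-count logic are assumptions made for illustration; they are not Nomad's actual types or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    job: str
    desired_count: int  # hypothetical field: how many instances should run


@dataclass(frozen=True)
class AllocationUpdate:
    action: str  # "place" or "stop"
    job: str
    node: str


def schedule(evaluation, running_nodes, free_nodes):
    """Return the updates that move a job toward its desired instance count.

    running_nodes: nodes already running the job.
    free_nodes: candidate nodes for new placements.
    """
    diff = evaluation.desired_count - len(running_nodes)
    if diff > 0:
        # May return fewer than `diff` placements if free nodes run out.
        return [AllocationUpdate("place", evaluation.job, n) for n in free_nodes[:diff]]
    if diff < 0:
        # Stop the most recently listed surplus instances.
        return [AllocationUpdate("stop", evaluation.job, n) for n in running_nodes[diff:]]
    return []


scale_up = schedule(Evaluation("redis", 3), ["node-1"], ["node-2", "node-3", "node-4"])
scale_down = schedule(Evaluation("redis", 1), ["node-1", "node-2"], [])
print(scale_up)
print(scale_down)
```

Because the function only returns proposed updates and changes nothing itself, many such functions can run side by side, with a separate step deciding which proposals to accept.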

External Event

Evaluation Creation

Evaluation Queuing

Evaluation Processing

Optimistic Coordination

State Updates

Nomad

Omega Architecture

Optimistically Schedule

100’s of Jobs in Parallel

Controls for Correctness
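Optimistic scheduling with a correctness check can be sketched like this: schedulers plan from a snapshot in parallel, and a single commit step rejects any plan built on stale state. The `ClusterState` class and its per-node version counter are assumptions for illustration, not Nomad's implementation.

```python
class ClusterState:
    """Authoritative free capacity per node, with a version counter per node."""

    def __init__(self, free):
        self.free = dict(free)
        self.version = {node: 0 for node in free}

    def snapshot(self):
        """Copy of (free capacity, versions) for a scheduler to plan against."""
        return dict(self.free), dict(self.version)

    def apply_plan(self, node, amount, seen_version):
        """Commit a placement only if the node is unchanged since the snapshot."""
        if self.version[node] != seen_version or self.free[node] < amount:
            return False  # conflict: the scheduler must re-plan from fresh state
        self.free[node] -= amount
        self.version[node] += 1
        return True


state = ClusterState({"node-1": 1000})

# Two schedulers plan in parallel from the same snapshot.
_, versions_a = state.snapshot()
_, versions_b = state.snapshot()
first = state.apply_plan("node-1", 600, versions_a["node-1"])
second = state.apply_plan("node-1", 600, versions_b["node-1"])

# The rejected scheduler takes a fresh snapshot and places a task that fits.
_, versions_retry = state.snapshot()
retry = state.apply_plan("node-1", 400, versions_retry["node-1"])
print(first, second, retry, state.free)
```

The first plan commits, the second is rejected because the node changed underneath it, and the retry succeeds against current state, so the node is never over-committed.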

Nomad

Million Container Challenge

1,000 Jobs

1,000 Tasks per Job

5,000 Hosts on GCE

1,000,000 Containers
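The numbers on this slide multiply out directly. The per-host figure below assumes each task is one container and that containers are spread evenly across hosts, which the slide does not state.

```python
jobs = 1_000
tasks_per_job = 1_000
hosts = 5_000

# Assumption: one container per task.
containers = jobs * tasks_per_job

# Assumption: an even spread across all hosts.
containers_per_host = containers / hosts

print(f"{containers:,} containers, about {containers_per_host:.0f} per host")
```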

"640 KB ought to be enough for anybody." - Bill Gates

2nd Largest Hedge Fund

18K Cores

5 Hours

2,200 Containers/second

Nomad

Cluster Scheduler

Easily Deploy Applications

Operationally Simple

Built for Scale

Thanks! Q/A
