HASHICORP Deploying and Discovering at Scale with Consul & Nomad

SF HashiCorp User Group at GitHub

Page 1: SF HashiCorp User Group at GitHub

Deploying and Discovering at Scale with Consul & Nomad

Page 2: SF HashiCorp User Group at GitHub

Jon Benson @jm_benson

Page 3: SF HashiCorp User Group at GitHub

Agenda

• Nomad Overview

• Nomad Architecture

Page 4: SF HashiCorp User Group at GitHub

Agenda

• Consul Overview

• Consul Architecture

• Prepared Queries

Page 5: SF HashiCorp User Group at GitHub

Agenda

• Demo

• Questions!

Page 6: SF HashiCorp User Group at GitHub

Advantages of a Scheduler

Higher Resource Utilization

Decouple Work from Resources

Better Quality of Service

Page 7: SF HashiCorp User Group at GitHub

Advantages of a Scheduler

Higher Resource Utilization

• Bin Packing

• Over-Subscription

• Job Queueing

Decouple Work from Resources

Better Quality of Service
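
Bin packing can be illustrated with a small best-fit placement loop. This is only a sketch under stated assumptions: the `Node` class, the `place` function, and the scoring rule (prefer the feasible node left with the least free capacity) are hypothetical and are not taken from the talk or from any particular scheduler.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical machine with a fixed CPU and memory capacity."""
    name: str
    cpu_mhz: int
    memory_mb: int
    used_cpu: int = 0
    used_mem: int = 0

    def fits(self, cpu, mem):
        # A task fits only if both dimensions stay within capacity.
        return (self.used_cpu + cpu <= self.cpu_mhz
                and self.used_mem + mem <= self.memory_mb)

    def free_after(self, cpu, mem):
        # Sum of the fractions of CPU and memory left free after placement.
        cpu_free = (self.cpu_mhz - self.used_cpu - cpu) / self.cpu_mhz
        mem_free = (self.memory_mb - self.used_mem - mem) / self.memory_mb
        return cpu_free + mem_free


def place(nodes, cpu, mem):
    """Best fit: pick the feasible node that ends up with the least free room."""
    candidates = [n for n in nodes if n.fits(cpu, mem)]
    if not candidates:
        return None  # nothing fits; a real scheduler might queue the job
    best = min(candidates, key=lambda n: n.free_after(cpu, mem))
    best.used_cpu += cpu
    best.used_mem += mem
    return best


nodes = [Node("a", 4000, 8192), Node("b", 2000, 4096)]
# Place five identical tasks (500 MHz, 256 MB each).
placements = [place(nodes, 500, 256).name for _ in range(5)]
print(placements)
```

The smaller node fills up first and the larger node is only used once nothing else fits, which is the effect bin packing aims for: fewer, fuller machines.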

Page 8: SF HashiCorp User Group at GitHub

Advantages of a Scheduler

Higher Resource Utilization

Decouple Work from Resources

• Abstraction

• API Contracts

• Standardization

Better Quality of Service

Page 9: SF HashiCorp User Group at GitHub

Advantages of a Scheduler

Higher Resource Utilization

Decouple Work from Resources

Better Quality of Service

• Priorities

• Resource Isolation

• Pre-emption
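
Priorities combined with job queueing can be sketched as a priority queue. The `JobQueue` class and the job names below are hypothetical, and the rule that a larger number means a more important job is an assumption made for this illustration.

```python
import heapq
import itertools


class JobQueue:
    """Hypothetical queue that hands out the highest-priority job first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: first in, first out

    def enqueue(self, name, priority):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.enqueue("nightly-report", priority=20)
queue.enqueue("web-frontend", priority=80)
queue.enqueue("log-compaction", priority=20)

drained = [queue.dequeue() for _ in range(3)]
print(drained)
```

Pre-emption goes one step further than this sketch: instead of only ordering pending work, a scheduler may evict running low-priority work to make room for high-priority work.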

Page 11: SF HashiCorp User Group at GitHub

Nomad

Cluster Scheduler

Easily Deploy Applications

Job Specification

Page 12: SF HashiCorp User Group at GitHub

example.nomad

# Define our simple redis job
job "redis" {

    # Run only in us-east-1
    datacenters = ["us-east-1"]

    # Define the single redis task using Docker
    task "redis" {
        driver = "docker"

        config {
            image = "redis:latest"
        }

        resources {
            cpu = 500 # MHz
            memory = 256 # MB
            network {
                mbits = 10
                dynamic_ports = ["redis"]
            }
        }
    }
}

Page 13: SF HashiCorp User Group at GitHub

Job Specification

Declares what to run

Page 14: SF HashiCorp User Group at GitHub

Job Specification

Nomad determines where and manages how to run

Page 15: SF HashiCorp User Group at GitHub

Job Specification

Abstract work from resources

Page 16: SF HashiCorp User Group at GitHub

Nomad

Higher Resource Utilization

Decouple Work from Resources

Better Quality of Service

Page 17: SF HashiCorp User Group at GitHub

Nomad

Multi-Datacenter

Multi-Region

Flexible Workloads

Job Priorities

Bin Packing

Large Scale

Operationally Simple

Page 18: SF HashiCorp User Group at GitHub

Thousands of regions

Tens of thousands of clients per region

Thousands of jobs per region

Page 19: SF HashiCorp User Group at GitHub

Built on Experience

gossip consensus

Page 20: SF HashiCorp User Group at GitHub

• Cluster Management

• Gossip Based (P2P)

• Membership

• Failure Detection

• Event System
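
To give a feel for why gossip-based membership scales, here is a toy simulation of push-style rumor spreading. It is not the actual protocol used by any HashiCorp tool: the function name, the fanout of 3, and the round-based model are all assumptions chosen to keep the sketch short.

```python
import random


def gossip_rounds(num_nodes, fanout=3, seed=0):
    """Count rounds until every node has heard a rumor started by node 0.

    Each round, every informed node pushes the rumor to `fanout` peers
    chosen uniformly at random (a deliberately simplified model).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for _ in informed:
            newly_informed.update(rng.sample(range(num_nodes), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds


rounds_needed = gossip_rounds(1000)
print(f"1000 nodes informed after {rounds_needed} rounds")
```

Because the informed set can multiply each round, the number of rounds grows roughly with the logarithm of the cluster size rather than linearly, which is what makes peer-to-peer dissemination practical for large clusters.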

Page 21: SF HashiCorp User Group at GitHub

• Gossip Protocol

• Large Scale

• Production Hardened

• Operationally Simple

Page 22: SF HashiCorp User Group at GitHub

• Service configuration and discovery

• Monitoring at scale

• High-availability

Page 27: SF HashiCorp User Group at GitHub

• Multi-Datacenter

• Raft Consensus

• Large Scale

• Production Hardened

• Coordination (Locking)

• Central Servers + Distributed Clients

• Network Tomography

• Prepared Queries

Page 28: SF HashiCorp User Group at GitHub

Prepared Queries

• Multiple instances of a given service exist in multiple datacenters

• Clients can talk to any of them, and always prefer the instances with the lowest latency

• Policies can change, and clients should not need to know the details of how to locate a healthy service

Page 29: SF HashiCorp User Group at GitHub

Prepared Queries

• New query namespace, similar to services

• Register queries to answer for parts of this namespace

• Clients use APIs, or “.query.consul” DNS lookups to run queries
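
As a rough illustration, the sketch below builds the JSON body for registering a prepared query and the DNS name a client would then look up. The query name `redis-nearest`, the helper functions, and the choice of `NearestN` failover are assumptions made for this example. The field names follow the Consul prepared query HTTP API (`POST /v1/query`) as I understand it, so check them against the Consul documentation for your version. No request is actually sent.

```python
import json


def build_prepared_query(name, service, nearest_n=3):
    """Build a prepared query definition (hypothetical helper).

    The query returns only healthy instances of `service`. If none are
    available locally, it fails over to the `nearest_n` closest
    datacenters by estimated round-trip time.
    """
    return {
        "Name": name,
        "Service": {
            "Service": service,
            "OnlyPassing": True,
            "Failover": {"NearestN": nearest_n},
        },
    }


def query_dns_name(query_name):
    """DNS name clients resolve to run the query."""
    return f"{query_name}.query.consul"


payload = build_prepared_query("redis-nearest", "redis")
body = json.dumps(payload, indent=2)  # would be sent as the request body
lookup = query_dns_name("redis-nearest")
print(body)
print(lookup)
```

The point of the indirection is that the failover policy lives in the registered query, so it can change without touching clients: they keep resolving the same name.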

Page 30: SF HashiCorp User Group at GitHub

gossip consensus

Mature Libraries

Design Patterns

No Scheduling Logic

Page 31: SF HashiCorp User Group at GitHub

Built on Research

gossip consensus

Page 32: SF HashiCorp User Group at GitHub

Nomad

Inspired by Google Omega

Optimistic Concurrency

Internal State and Coordination

Service and Batch workloads

Pluggable Architecture

Page 33: SF HashiCorp User Group at GitHub

Single Region Architecture

[Diagram: three servers (one leader, two followers) linked by replication and forwarding; clients in DC1, DC2, and DC3 talk to the servers over RPC.]

Page 34: SF HashiCorp User Group at GitHub

Multi Region Architecture

[Diagram: Region A and Region B, each with three servers (one leader, two followers) linked by replication and forwarding; the two regions are connected by gossip and region forwarding.]

Page 35: SF HashiCorp User Group at GitHub

Nomad

Region is Isolation Domain

1-N Datacenters Per Region

Flexibility to do 1:1 (Consul)

Scheduling Boundary

Page 36: SF HashiCorp User Group at GitHub

Data Model

Page 37: SF HashiCorp User Group at GitHub

Evaluations ~= State Change Event

Page 38: SF HashiCorp User Group at GitHub

Create / Update / Delete Job

Node Up / Node Down

Allocation Failed

Page 39: SF HashiCorp User Group at GitHub

External Event

Evaluation Creation

Evaluation Queuing

Evaluation Processing

Optimistic Coordination

State Updates
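
The optimistic step in this flow can be sketched as follows: schedulers plan against a snapshot of cluster state, and a plan is only committed if it still fits the current state. Everything here (`ClusterState`, `Plan`, `schedule`, the memory-only resource model) is a hypothetical simplification for illustration, not Nomad's implementation.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A proposed placement produced by a scheduler."""
    node: str
    memory_mb: int


class ClusterState:
    """Hypothetical authoritative state: free memory per node."""

    def __init__(self, free_mb):
        self.free = dict(free_mb)

    def snapshot(self):
        # Schedulers work on a copy, so they never block each other.
        return dict(self.free)

    def apply(self, plan):
        # Verify against the current state, not the possibly stale snapshot.
        if self.free.get(plan.node, 0) < plan.memory_mb:
            return False  # conflict: the scheduler must re-plan
        self.free[plan.node] -= plan.memory_mb
        return True


def schedule(state, memory_mb):
    """Plan a placement on the node with the most free memory."""
    free = state.snapshot()
    node = max(free, key=free.get)
    return Plan(node, memory_mb)


state = ClusterState({"n1": 512, "n2": 400})

# Two schedulers plan concurrently from the same view of the cluster.
plan_a = schedule(state, 300)
plan_b = schedule(state, 300)

ok_a = state.apply(plan_a)   # first plan wins
ok_b = state.apply(plan_b)   # second plan no longer fits on n1

retry = schedule(state, 300)  # re-plan against fresh state
ok_retry = state.apply(retry)
print(ok_a, ok_b, retry.node, ok_retry)
```

The trade-off is that schedulers run in parallel without locks, at the cost of occasionally redoing work when two plans collide.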

Page 40: SF HashiCorp User Group at GitHub

Server Architecture

Omega Class Scheduler

Pluggable Logic

Internal Coordination and State

Multi-Region / Multi-Datacenter

Page 41: SF HashiCorp User Group at GitHub

Client Architecture

Broad OS Support

Host Fingerprinting

Pluggable Drivers

Page 42: SF HashiCorp User Group at GitHub

Fingerprinting

Type                Examples

Operating System    Kernel, OS, Versions

Hardware            CPU, Memory, Disk

Applications        Java, Docker, Consul

Environment         AWS, GCE

Page 43: SF HashiCorp User Group at GitHub

Fingerprinting

Constrain Placement and Bin Pack

Page 44: SF HashiCorp User Group at GitHub

Fingerprinting

“Task Requires Linux, Docker, and PCI-Compliant Hardware” expressed as Constraints

Page 45: SF HashiCorp User Group at GitHub

Fingerprinting

“Task needs 512MB RAM and 1 Core” expressed as Resource Ask
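
The two examples above, constraints and a resource ask, can be combined into a simple feasibility check over fingerprinted nodes. This is a sketch only: the attribute keys (`os`, `docker`, `pci_compliant`), the node records, and the `is_feasible` helper are hypothetical and do not reflect the attribute names a real client reports.

```python
def is_feasible(node, constraints, ask):
    """Return True if the node matches every constraint and can fit the ask."""
    attrs = node["attributes"]
    free = node["free"]
    meets_constraints = all(
        attrs.get(key) == wanted for key, wanted in constraints.items()
    )
    fits_resources = all(
        free.get(resource, 0) >= amount for resource, amount in ask.items()
    )
    return meets_constraints and fits_resources


# Hypothetical fingerprint results for three clients.
nodes = [
    {"name": "node-1",
     "attributes": {"os": "linux", "docker": True, "pci_compliant": True},
     "free": {"memory_mb": 2048, "cores": 4}},
    {"name": "node-2",
     "attributes": {"os": "linux", "docker": True, "pci_compliant": False},
     "free": {"memory_mb": 4096, "cores": 8}},
    {"name": "node-3",
     "attributes": {"os": "linux", "docker": True, "pci_compliant": True},
     "free": {"memory_mb": 256, "cores": 1}},
]

# "Task requires Linux, Docker, and PCI-compliant hardware"
constraints = {"os": "linux", "docker": True, "pci_compliant": True}
# "Task needs 512MB RAM and 1 core"
ask = {"memory_mb": 512, "cores": 1}

eligible = [n["name"] for n in nodes if is_feasible(n, constraints, ask)]
print(eligible)
```

Here node-2 is rejected by a constraint and node-3 by the resource ask. Bin packing would then choose among whatever nodes remain eligible.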

Page 46: SF HashiCorp User Group at GitHub

Drivers

Execute Tasks

Provide Resource Isolation

Page 47: SF HashiCorp User Group at GitHub

Containerized    Docker, Rocket

Virtualized      Qemu / KVM

Standalone       Java Jar, Static Binaries

Page 48: SF HashiCorp User Group at GitHub

Containerized    Docker, Rocket, Windows Server Containers

Virtualized      Qemu / KVM, Hyper-V, Xen

Standalone       Java Jar, Static Binaries, C#

Page 49: SF HashiCorp User Group at GitHub

Nomad

Workload Flexibility:

Schedulers

Fingerprints

Drivers

Job Specification

Page 50: SF HashiCorp User Group at GitHub

Nomad

Operational Simplicity:

Single Binary

No Dependencies

Highly Available

Page 51: SF HashiCorp User Group at GitHub

Nomad

Cluster Scheduler

Easily Deploy Applications

Job Specification

Page 52: SF HashiCorp User Group at GitHub

Nomad

Higher Resource Utilization

Decouple Work from Resources

Better Quality of Service

Page 53: SF HashiCorp User Group at GitHub

Nomad

• Million Container Challenge

• hashicorp.com/c1m.html

• github.com/hashicorp/c1m

• Nomad 0.4

• Volume support across drivers

• Advanced networking

Page 54: SF HashiCorp User Group at GitHub

Thanks! We’ll do a quick demo, then answer questions…

Jon Benson @jm_benson