Characterizing and Contrasting Kuhn-tey-ner Awr-kuh-streyt-ors
http://calcotestudios.com/alldaydevops2016
All Day DevOps, November 2016
Lee Calcote
linkedin.com/in/leecalcote
@lcalcote
blog.gingergeek.com
clouds, containers, infrastructure, applications and their management
[kuh n-tey-ner] [awr-kuh-streyt-or]
Definition:
Fleet
Nomad
Swarm
Kubernetes
Mesos+Marathon
CaaS
Joyent Triton
Docker Datacenter
AWS ECS
Azure Container Service
Rackspace Carina
(Stay tuned for updates to presentation and book)
One size does not fit all.
A strict apples-to-apples comparison is inappropriate and not the objective, hence characterizing and contrasting.
Let's not go here today.
Container orchestrators may be intermixed.
Categorically Speaking
Genesis & Purpose
Support & Momentum
Host & Service Discovery
Scheduling
Modularity & Extensibility
Updates & Maintenance
Health Monitoring
Networking & Load-Balancing
Secrets Management
High Availability & Scale
Hypervisor Manager Elements
Compute
Network
Storage
≈
Container Orchestrator Elements
Cluster
Host (Node)
Task
Job
Pod
Container
Application
Service
Virtual IP
Secret / Config
Volume
Core Capabilities
Cluster Management
Host Discovery
Host Health Monitoring
Scheduling
Orchestrator Updates and Host Maintenance
Service Discovery
Networking and Load-Balancing
Multi-tenant, multi-region
Additional Key Capabilities
Application Health Monitoring
Application Deployments
Application Performance Monitoring
Nomad
Genesis & Purpose
designed for both long-lived services and short-lived batch processing workloads
cluster manager with declarative job specifications
ensures constraints are satisfied and resource utilization is optimized by efficient task packing
supports all major operating systems and virtualized, containerized or standalone workloads
written in Go and with a Unix philosophy
Support & Momentum
Project began June 2015; has 113 contributors over 16 months
Current release v0.4
v0.5 to be released any day now
Nomad Enterprise offering aimed for Q1-Q2 next year
Supported and governed by HashiCorp
HashiConf US '15 had ~300 attendees
HashiConf EU '16 had ~320 attendees
HashiConf US '16 had ~500 attendees
Nomad Architecture
Nomad is a single binary, both for clients and servers, and requires no external services for coordination or storage.
Host & Service Discovery
Host Discovery
Gossip protocol - Serf is used
Docker multi-host networking and Swarmkit use Serf, too
Servers advertise full set of Nomad servers to clients
heartbeats every 30 seconds
Creating federated clusters is simple
Service Discovery
Nomad integrates with Consul to provide service discovery and monitoring.
Scheduling
two distinct phases: feasibility checking and ranking
optimistically concurrent
enabling all servers to participate in scheduling decisions, which increases the total throughput and reduces latency
three scheduler types used when creating jobs: service, batch and system
`nomad plan` gives a point-in-time view of what Nomad will do
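The two scheduling phases (feasibility checking, then ranking) can be sketched roughly as below. This is a minimal illustration, not Nomad's actual implementation: the `Node` fields, the resource units, and the averaging bin-pack score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model; units (MHz, MB) are illustrative only.
    name: str
    cpu_total: int
    mem_total: int
    cpu_free: int
    mem_free: int


def feasible(node: Node, cpu: int, mem: int) -> bool:
    """Phase 1: keep only nodes that can satisfy the task's resource ask."""
    return node.cpu_free >= cpu and node.mem_free >= mem


def binpack_score(node: Node, cpu: int, mem: int) -> float:
    """Phase 2: score by how full the node would be after placement."""
    cpu_used = (node.cpu_total - node.cpu_free + cpu) / node.cpu_total
    mem_used = (node.mem_total - node.mem_free + mem) / node.mem_total
    return (cpu_used + mem_used) / 2


def place(nodes, cpu, mem):
    """Return the best-ranked feasible node, or None if nothing fits."""
    candidates = [n for n in nodes if feasible(n, cpu, mem)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: binpack_score(n, cpu, mem))
```

Ranking the fullest feasible node first is what "efficient task packing" amounts to in this sketch: idle nodes stay free for larger tasks.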
Modularity & Extensibility
Task drivers
Used by Nomad clients to execute a task and provide resource isolation.
Extensible task drivers are important for flexibility to support a broad set of workloads.
Does not currently support pluggable task drivers; you have to implement the task driver interface and compile the Nomad binary.
Updates & Maintenance
Nodes
Drain allocations on a running node.
Integrates with tools like Packer, Consul, and Terraform to support building artifacts, service discovery, monitoring and capacity management.
Applications
Log rotation (stderr and stdout)
no log forwarding support, yet
Rolling updates (via the `update` block in the job specification).
Health Monitoring
Nodes
Node health monitoring is done via heartbeats, so Nomad can detect failed nodes and migrate the allocations to other healthy clients.
Applications
Health checks: currently http, tcp and script
In the future Nomad will add support for more Consul checks.
`nomad alloc-status` reports actual resource utilization
Networking & Load-Balancing
Networking
Dynamic ports are allocated in a range from 20000 to 60000.
Shared IP address with Node
Load-Balancing
Consul provides DNS-based load-balancing
Secrets Management
Nomad agents provide secure integration with Vault for all tasks and containers it spins up
gives secure access to Vault secrets through a workflow which minimizes risk of secret exposure during bootstrapping.
High Availability & Scale
distributed and highly available, using both leader election and state replication to provide availability in the face of failures
shared-state optimistic scheduler
only open source implementation
1,000,000 containers across 5,000 hosts, scheduled in 5 min.
Built for managing multiple clusters / cluster federation.
easier to use
a single binary for both clients and servers
supports different non-containerized tasks
arguably the most advanced scheduler design
upfront consideration of federation / hybrid cloud
broad OS support
Outside of scheduler, comparatively less sophisticated
Young project
Less relative momentum
Less relative adoption
Less extensible / pluggable
Docker Swarm
Docker Swarm 1.12, aka Swarmkit or Swarm mode
Genesis & Purpose
Swarm is simple and easy to set up.
Responsible for the clustering and scheduling aspects of orchestration.
Originally an imperative system, now declarative
Swarm’s architecture is not as complex as those of Kubernetes and Mesos
Written in Go, Swarm is lightweight, modular and extensible
Docker Swarm 1.11 (Standalone)
Docker Swarm Mode 1.12 (Swarmkit)
Support & Momentum
Contributions:
Standalone: ~3,000 commits, 12 core maintainers (140 contributors)
Swarmkit: ~2,000 commits, 12 core maintainers (40 contributors)
~250 Docker meetups worldwide
Production-ready:
Standalone announced ~12 months ago (Nov 2015)
Swarmkit announced ~3 months ago (July 2016)
Host & Service Discovery
Host Discovery
used in the formation of clusters by the Manager to discover Nodes (hosts)
Like Nomad, uses HashiCorp's Go MemDB for storing cluster state
Pull model - the worker checks in with the Manager
Rate control - the rate of check-ins may be controlled at the Manager (add jitter)
Workers don't need to know which Manager is active; Follower Managers will redirect Workers to the Leader
Service Discovery
Embedded DNS and round-robin load-balancing
Services are a new concept
Scheduling
Swarm’s scheduler is pluggable
Swarm scheduling is a combination of strategies and filters/constraints:
Strategies
Random, Binpack, Spread*
Plugin?
Filters
container constraints (affinity, dependency, port) are defined as environment variables in the specification file
node constraints (health, constraint) must be specified when starting the docker daemon and define which nodes a container may be scheduled on.
* Swarm Mode only supports Spread
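To make the difference between the three strategies concrete, here is a small sketch. It assumes nodes are ranked only by how many containers they already run; the real ranking also weighs available CPU and RAM, and `pick_node` is a hypothetical helper, not a Swarm API.

```python
import random


def pick_node(nodes, strategy="spread", rng=random):
    """Choose a node name from `nodes` (name -> running container count)."""
    if strategy == "spread":
        # Spread: prefer the least-loaded node.
        return min(nodes, key=nodes.get)
    if strategy == "binpack":
        # Binpack: prefer the most-loaded node, leaving others empty.
        return max(nodes, key=nodes.get)
    if strategy == "random":
        return rng.choice(sorted(nodes))
    raise ValueError(f"unknown strategy: {strategy}")
```

Filters would run before this step, narrowing `nodes` to those that satisfy the container and node constraints.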
Modularity & Extensibility
Ability to remove batteries is a strength for Swarm:
Pluggable scheduler
Pluggable network driver
Pluggable distributed K/V store
Docker container engine runtime-only
Pluggable authorization (in docker engine)*
Updates & Maintenance
Nodes
Nodes may be Active, Drained or Paused
Manager weights are used to drain or pause Managers
Manual swarm manager and worker updates
Applications
Rolling updates now supported
--update-delay
--update-parallelism
--update-failure-action
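How the three rolling-update flags interact can be sketched as follows. This is a simplified model, not Docker's code: `rolling_update` and `update_one` are hypothetical names, and the "pause" / "continue" failure actions are assumed to behave as their names suggest.

```python
import time


def rolling_update(tasks, update_one, parallelism=1, delay=0.0,
                   failure_action="pause", sleep=time.sleep):
    """Update tasks in batches; return (updated, failed, paused)."""
    updated, failed = [], []
    for start in range(0, len(tasks), parallelism):
        batch = tasks[start:start + parallelism]
        batch_failed = False
        for task in batch:
            if update_one(task):      # update_one returns True on success
                updated.append(task)
            else:
                failed.append(task)
                batch_failed = True
        if batch_failed and failure_action == "pause":
            return updated, failed, True   # stop the rollout here
        if start + parallelism < len(tasks):
            sleep(delay)                   # wait between batches only
    return updated, failed, False
```

With `parallelism=2` and `delay=10`, five tasks are updated in three batches with two 10-second waits in between.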
Health Monitoring
Nodes
Swarm monitors the availability and resource usage of nodes within the cluster
Applications
One health check per container may be run
check container health by running a command inside the container
--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--retries=N (default: 3)
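The role of `--retries` can be illustrated with a small state tracker. This is a sketch under simplifying assumptions: each probe result is already reduced to pass/fail (a probe exceeding `--timeout` would count as a fail), the initial "starting" state is ignored, and `health_status` is a hypothetical helper.

```python
def health_status(results, retries=3):
    """Return 'unhealthy' once `retries` consecutive probes fail.

    `results` is an iterable of booleans in probe order (True = passed).
    """
    consecutive_failures = 0
    status = "healthy"  # simplification: skips the "starting" state
    for ok in results:
        if ok:
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status
```

A single passing probe resets the failure count, so intermittent failures below the retry threshold never flip the container to unhealthy.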
Networking & Load-Balancing
Swarm and Docker’s multi-host networking are simpatico
provides for user-defined overlay networks that are micro-segmentable
uses a gossip protocol for quick convergence of neighbor table
facilitates container name resolution via embedded DNS server (previously via /etc/hosts)
You may bring your own network driver
Load-balancing based on IPVS
expose Service's port externally
L4 load-balancer; cluster-wide port publishing
Mesh routing
send a request to any one of the nodes and it will be routed automatically
send a request to any one of the nodes and it will be internally load balanced
Secrets Management
Not yet... tracking toward 1.13
High Availability & Scale
Managers may be deployed in a highly-available configuration
Active/Standby - only one active Leader at a time
Maintain odd number of managers
Rescheduling upon node failure
No rebalancing upon node addition to the cluster
Does not support multiple failure isolation regions or federation
although, with caveats, federation is possible
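The advice to keep an odd number of managers follows from majority-based consensus (Swarm mode managers use Raft). A quick arithmetic sketch, with hypothetical helper names:

```python
def quorum(managers: int) -> int:
    """Smallest majority of the manager set."""
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can fail while a majority still remains."""
    return (managers - 1) // 2
```

Going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure, so the even-numbered configuration adds cost without adding resilience.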
Scaling swarm to 1,000 AWS nodes and 50,000 containers
Suitable for orchestrating a combination of infrastructure containers
Has only recently added capabilities falling into the application bucket
Swarm is a young project
advanced features forthcoming
natural expectation of caveats in functionality
No rebalancing, autoscaling or monitoring, yet
Only schedules Docker containers, not containers using other specifications.
Does not schedule VMs or non-containerized processes
Does not provide support for batch jobs
Need separate load-balancer for overlapping ingress ports
While dependency and affinity filters are available, Swarm does not provide the ability to enforce scheduling of two containers onto the same host or not at all.
Filters facilitate sidecar pattern. No “pod” concept.
Swarm works. Swarm is simple and easy to deploy.
1.12 eliminated need for much, but not all third-party software
Facilitates earlier stages of adoption by organizations viewing containers as faster VMs
now with built-in functionality for applications
Swarm is easy to extend; if you already know the Docker APIs, you can customize Swarm
Still modular, but has stepped back here.
Moving very fast; eliminating gaps quickly.
Kubernetes
Genesis & Purpose
an opinionated framework for building distributed systems
or as its tagline states "an open source system for automating deployment, scaling, and operations of applications."
Written in Go, Kubernetes is lightweight, modular and extensible
considered a third generation container orchestrator led by Google, Red Hat and others.
bakes in load-balancing, scale, volumes, deployments, secret management and cross-cluster federated services among other features.
Declarative and opinionated, with many key features included
Support & Momentum
Kubernetes is young (about two years old)
Announced as production-ready 16 months ago (July 2015)
Project currently has over 1,000 commits per month (~38,000 total)
made by about 100 (862 total) Kubernauts (Kubernetes enthusiasts)
~5,000 commits made in 1.3 release (1.4 is latest)
Under the governance of the Cloud Native Computing Foundation
Robust set of documentation and ~90 meetups
Host & Service Discovery
Host Discovery
by default, the node agent (kubelet) is configured to register itself with the master (API server)
automating the joining of new hosts to the cluster
Service Discovery
Two primary modes of finding a Service
DNS
SkyDNS is deployed as a cluster add-on
environment variables
environment variables are used as a simple way of providing compatibility with Docker links-style networking
Scheduling
By default, scheduling is handled by kube-scheduler.
Pluggable
Selection criteria used by kube-scheduler to identify the best-fit node are defined by policy:
Predicates (node resources and characteristics):
PodFitsPorts, PodFitsResources, NoDiskConflict, MatchNodeSelector, HostName, ServiceAffinity, LabelsPresence
Priorities (weighted strategies used to identify “best fit” node):
LeastRequestedPriority, BalancedResourceAllocation, ServiceSpreadingPriority, EqualPriority
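The predicate-then-priority flow can be sketched as below. It is a simplified approximation and not kube-scheduler's code: the dictionary-based node and pod shapes are hypothetical, only two predicates and two priorities are modeled, and the scoring formulas are rough stand-ins scaled to 0..10.

```python
def pod_fits_resources(node, pod):
    # Predicate: the node must have enough unrequested CPU and memory.
    return (node["cpu_cap"] - node["cpu_req"] >= pod["cpu"]
            and node["mem_cap"] - node["mem_req"] >= pod["mem"])


def match_node_selector(node, pod):
    # Predicate: every selector label on the pod must match the node.
    selector = pod.get("node_selector", {})
    return all(node["labels"].get(k) == v for k, v in selector.items())


def least_requested(node, pod):
    # Priority: favor nodes with the most free capacity after placement.
    cpu = (node["cpu_cap"] - node["cpu_req"] - pod["cpu"]) / node["cpu_cap"]
    mem = (node["mem_cap"] - node["mem_req"] - pod["mem"]) / node["mem_cap"]
    return 10 * (cpu + mem) / 2


def balanced_allocation(node, pod):
    # Priority: favor nodes whose CPU and memory usage stay in proportion.
    cpu_frac = (node["cpu_req"] + pod["cpu"]) / node["cpu_cap"]
    mem_frac = (node["mem_req"] + pod["mem"]) / node["mem_cap"]
    return 10 * (1 - abs(cpu_frac - mem_frac))


PREDICATES = [pod_fits_resources, match_node_selector]
PRIORITIES = [(least_requested, 1), (balanced_allocation, 1)]  # (fn, weight)


def schedule(nodes, pod):
    """Filter with predicates, then pick the highest weighted score."""
    fits = [n for n in nodes if all(p(n, pod) for p in PREDICATES)]
    if not fits:
        return None
    best = max(fits, key=lambda n: sum(w * f(n, pod) for f, w in PRIORITIES))
    return best["name"]
```

Changing the weights in `PRIORITIES` is the sketch's analogue of adjusting scheduler policy.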
Modularity & Extensibility
One of Kubernetes' strengths is its pluggable architecture and its being an extensible platform
Choice of:
database for service discovery or network driver
container runtime
users may choose to run Docker or Rocket (rkt) containers
Cluster add-ons
optional system components that implement a cluster feature (e.g. DNS, logging, etc.)
shipped with the Kubernetes binaries and are considered an inherent part of the Kubernetes clusters
Updates & Maintenance
Applications
Deployment objects automate deploying and rolling updating applications.
Support for rolling back deployments
Kubernetes Components
Consistently backwards compatible
Upgrading the Kubernetes components and hosts is done via shell script
Host maintenance - mark the node as unschedulable.
existing pods are vacated from the node
prevents new pods from being scheduled on the node
Health Monitoring
Nodes
Failures - actively monitors the health of nodes within the cluster via Node Controller
Resources - usage monitoring leverages a combination of open source components:
cAdvisor, Heapster, InfluxDB, Grafana
Applications
three types of user-defined application health-checks, with the Kubelet agent as the health check monitor
HTTP Health Checks, Container Exec, TCP Socket
Cluster-level Logging
collect logs which persist beyond the lifetime of the pod’s container images or the lifetime of the pod or even cluster
standard output and standard error output of each container can be ingested using a Fluentd agent running on each node
Networking & Load-Balancing
…enter the Pod
atomic unit of scheduling
flat networking with each pod receiving an IP address
no NAT required, port conflicts localized
intra-pod communication via localhost
Load-Balancing
Services provide inherent load-balancing via kube-proxy:
runs on each node of a Kubernetes cluster
reflects services as defined in the Kubernetes API
supports simple TCP/UDP forwarding and round-robin and Docker-links-based service IP:PORT mapping.
Secrets Management
encrypted and stored in etcd
used by containers in a pod either:
1. mounted as data volumes
2. exposed as environment variables
None of the pod’s containers will start until all the pod's volumes are mounted.
Individual secrets are limited to 1MB in size.
Secrets are created and accessible within a given namespace, not cross-namespace.
High Availability & Scale
Each master component may be deployed in a highly-available configuration.
Active/Standby configuration
Federated clusters / multi-region deployments
Scale
v1.2 supports 1,000 node clusters
v1.3 supports 2,000 node clusters
Horizontal Pod Autoscaling (via Replication Controllers).
Cluster Autoscaling (if you're running on GCE; AWS support is coming soon).
Only runs containerized applications
For those familiar with Docker-only, Kubernetes requires understanding of new concepts
Powerful frameworks with more moving pieces beget complicated cluster deployment and management.
Lightweight graphical user interface
Does not provide as sophisticated techniques for resource utilization as Mesos
Kubernetes can schedule docker or rkt containers
Inherently opinionated w/functionality built-in.
relatively easy to change its opinion
little to no third-party software needed
builds in many application-level concepts and services (petsets, jobsets, daemonsets, application packages / charts, etc.)
advanced storage/volume management
project has most momentum
project is arguably most extensible
thorough project documentation
Supports multi-tenancy
Multi-master, cross-cluster federation, robust logging & metrics aggregation
Mesos+Marathon
Genesis & Purpose
Mesos is a distributed systems kernel
stitches together many different machines into a logical computer
Mesos has been around the longest (launched in 2009) and is arguably the most stable, with highest (proven) scale currently
Mesos is written in C++ with Java, Python and C++ APIs
Marathon as a Framework
Marathon is one of a number of frameworks (Chronos and Aurora are other examples) that may be run on top of Mesos
Frameworks have a scheduler and executor. Schedulers get resource offers. Executors run tasks.
Marathon is written in Scala
Support & Momentum
MesosCon 2015 in Seattle had 700 attendees
up from 262 attendees in 2014
Mesos has 78 contributors
Marathon has 219 contributors
Mesos under the governance of Apache Foundation
Marathon under governance of Mesosphere
Mesos is used by Twitter, AirBnb, eBay, Apple, Cisco, Yodle
Marathon is used by Verizon and Samsung
Host & Service Discovery
Mesos-DNS generates an SRV record for each Mesos task
including Marathon application instances
Marathon will ensure that all dynamically assigned service ports are unique
Mesos-DNS is particularly useful when:
apps are launched through multiple frameworks (not just Marathon)
you are using an IP-per-container solution like Project Calico
you use random host port assignments in Marathon
Scheduling
Two-level scheduler
First-level scheduling happens at Mesos master based on allocation policy, which decides which framework gets resources.
Second-level scheduling happens at Framework scheduler, which decides what tasks to execute.
Provides reservations, over-subscription and preemption.
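A toy model of the two levels might look like the following. It is only a sketch: the `Framework` class, CPU-only offers, and the "least allocated goes first" ordering are assumptions standing in for a real allocation policy, and none of this is the Mesos API.

```python
class Framework:
    """Hypothetical framework scheduler (e.g. Marathon or Chronos)."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)  # [(task_name, cpus), ...]
        self.allocated = 0.0          # CPUs this framework has launched

    def resource_offer(self, agent, cpus):
        """Second level: decide which pending tasks to run on the offer."""
        launched = []
        for task in list(self.pending):
            task_name, need = task
            if need <= cpus:
                cpus -= need
                self.pending.remove(task)
                launched.append((task_name, agent, need))
        self.allocated += sum(need for _, _, need in launched)
        return launched


def master_allocate(frameworks, agents):
    """First level: decide which framework is offered each agent's CPUs."""
    placements = []
    for agent, cpus in agents.items():
        # Crude fair-share stand-in: least-allocated framework goes first.
        for fw in sorted(frameworks, key=lambda f: f.allocated):
            launched = fw.resource_offer(agent, cpus)
            cpus -= sum(need for _, _, need in launched)
            placements += [(fw.name, *t) for t in launched]
    return placements
```

The master never sees task definitions here; it only hands out resources, and each framework decides what to run with them.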
Modularity & Extensibility
Frameworks
multiple available
may run multiple frameworks concurrently
Modules
extend inner-workings of Mesos by creating and using shared libraries that are loaded on demand
many types of Modules
Replacement, Isolator, Allocator, Authentication, Hook, Anonymous
Updates & Maintenance
Nodes
Mesos has maintenance mode
Mesos backwards compatible from v1.0 forward
Marathon ?
Applications
Marathon can be instructed to deploy containers based on that component using a blue/green strategy
where old and new versions co-exist for a time.
Health Monitoring
Nodes
Master tracks a set of statistics and metrics to monitor resource usage
Applications
support for health checks (HTTP and TCP)
an event stream that can be integrated with load-balancers or for analyzing metrics
Networking & Load-Balancing
Networking
An IP per Container
No longer share the node's IP
Helps remove port conflicts
Enables 3rd party network drivers
Container Network Interface (CNI) isolator with MesosContainerizer
Load-Balancing
Marathon offers two TCP/HTTP proxies
A simple shell script and a more complex one called marathon-lb that has more features.
Pluggable (e.g. Traefik for load-balancing)
Secrets Management
Not yet.
Only supported by Enterprise DC/OS
Stored in ZooKeeper, exposed as ENV variables in Marathon
Secrets shorter than eight characters may not be accepted by Marathon.
By default, you cannot store a secret larger than 1MB.
High Availability & Scale
A strength of Mesos’s architecture
requires masters to form a quorum using ZooKeeper (point of failure)
only one Active (Leader) master at a time in Mesos and Marathon
Scale is a strong suit for Mesos. TBD for Marathon.
Autoscale
`marathon-autoscale.py` - autoscales application based on the utilization metrics from Mesos
`marathon-lb-autoscale` - request rate-based autoscaling with Marathon.
Great at short-lived jobs.
High availability built-in.
Referred to as the “golden standard” by Solomon Hykes, Docker CTO.
Still needs 3rd party tools
Marathon interface could be more Docker friendly (hard to get at volumes and registry)
May need a dedicated infrastructure IT team
an overly complex solution for small deployments
Universal Containerizer
abstract away from docker, rkt, kurma?, runc, appc
Can run multiple frameworks, including Kubernetes and Swarm.
Supports multi-tenancy.
Good for Big Data shops and job / task-oriented workloads.
Good for mixed workloads and with data-locality policies
Mesos is powerful and scalable, battle-tested
Good for multiple large things you need to do; 10,000+ node cluster system
Marathon UI is young, but promising.
Summary
A high-level perspective of the container orchestrator spectrum.
Thank you. Questions?