CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution* * All unlicensed or borrowed works retain their original licenses Pets vs. Cattle: The Elastic Cloud Story DevOps Chicago Meetup February 26, 2014 @randybias

Pets vs. Cattle: The Elastic Cloud Story


DESCRIPTION

My recent presentation to the Chicago DevOps Meetup explains how we're moving from a servers-as-Pets world to a servers-as-Cattle world. Understanding this change is critical to success in cloud, DevOps, and delivering new value to the enterprise.


Page 1: Pets vs. Cattle: The Elastic Cloud Story

CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses

Pets vs. Cattle: The Elastic Cloud Story
DevOps Chicago Meetup
February 26, 2014

@randybias

Page 2: Pets vs. Cattle: The Elastic Cloud Story

A Tale of Two Clouds

Page 3: Pets vs. Cattle: The Elastic Cloud Story

Enterprise Computing Approach

GUI Driven
Ticket-Based
Hand-Crafted
Reserved
Scale-up
Smart Hardware
Proprietary
Traditional Dev
…

Page 4: Pets vs. Cattle: The Elastic Cloud Story

Cloud Computing Approach

API Driven
Self-Service
Automated
On-demand
Scale-out
Smart Apps
Open Source
Agile DevOps

Page 5: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud Shifts Uptime Responsibility

Enterprise Model
• Applications: 99.9% (8h46m down per year)
• Infrastructure: 99.999% ($$$$)

Cloud Model
• Applications: 99.999% (5m down per year)
• Infrastructure: 99% ($$)
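The downtime figures on this slide follow from simple arithmetic on the availability percentage. A minimal sketch, assuming an average year of 365.25 days (the slide does not state the year length it used); the function names are illustrative only:

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of allowed downtime per year at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def format_downtime(minutes: float) -> str:
    """Render a number of minutes as e.g. '8h46m' (rounded to the minute)."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins:02d}m"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.999):
        print(pct, format_downtime(downtime_minutes_per_year(pct)))
```

Under that assumption, 99.9% works out to about 8h46m of downtime per year and 99.999% to about 5 minutes, matching the numbers on the slide.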

Page 6: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud Origins

Elastic & Virtualization 2.0 Clouds are very different.
Different workloads. Different architectures. Different skills. Different economics.

Enterprise Virtualization Private Cloud = Virtual Infrastructure + Standardization, Automation, Chargeback, Self-Service
• Designed for Server Consolidation
• IT Admins manage Infrastructure
• Ticket-based manual provisioning
• Improves virtualization value

Elastic Private Cloud = Elastic Public Cloud + On-premise Deployment
• Designed for Agility
• Cloud Admins manage Services
• Self-service automated provisioning
• Delivers cloud value on-premise

Page 7: Pets vs. Cattle: The Elastic Cloud Story

What Companies Care About?

ACCELERATING TIME TO VALUE

Cloud Computing
Agile Development
Business Agility
Operational Discipline

Continuous Integration
Continuous Testing & Delivery
Agile Methodologies
IaaS / PaaS
Public / Private / Hybrid
Big Data / Analytics
Public APIs
Continuous Deployment
DevOps Data Center & App Automation
Line of Business Enablement
New App Initiatives (Mobile, SaaS, etc.)
Data Center Modernization

Page 8: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud is a Mindset Change

Attribution: Bill Baker, Distinguished Engineer, Microsoft

bowzer.company.com (scale-up)

web001.company.com (scale-out)

(Virtual) Servers *are* cattle

Page 9: Pets vs. Cattle: The Elastic Cloud Story

Pets vs. Cattle Takes Off

Microsoft
Cloudscaling
CERN
IBM
Scalr
Rackspace
Red Hat

Scale-out, not UP in Cloud

Page 10: Pets vs. Cattle: The Elastic Cloud Story

(Some) Elastic Cloud Patterns

What follows are *some* Elastic Cloud Patterns
There are many more, but these are mine
Input, ideas, & other thoughts welcome via twitter / email

Page 11: Pets vs. Cattle: The Elastic Cloud Story

Big Failure Domains Make Big Craters

Page 12: Pets vs. Cattle: The Elastic Cloud Story

Big Failure Domains Make Big Craters

Anti-Pattern

Page 13: Pets vs. Cattle: The Elastic Cloud Story

Smaller Failure Domains

Would you rather have the whole cloud down or just a small bit of it for a short time?

Page 14: Pets vs. Cattle: The Elastic Cloud Story

Loose Coupling

Synchronous, blocking calls mean cascading failures.

Async, non-blocking calls mean failure in isolation.
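One way to picture failure in isolation is a fan-out of non-blocking calls, each with its own timeout, so a slow dependency degrades only its own result. This is a minimal sketch using Python's `asyncio`; the service names, delays, and timeout value are hypothetical and stand in for real network calls:

```python
import asyncio


async def call_service(name: str, delay: float) -> str:
    """Hypothetical remote call; the sleep stands in for network I/O."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def call_with_timeout(name: str, delay: float, timeout: float = 0.2) -> str:
    """Bound the wait so one slow dependency cannot stall the caller."""
    try:
        return await asyncio.wait_for(call_service(name, delay), timeout)
    except asyncio.TimeoutError:
        return f"{name}: timed out"


async def fan_out() -> list:
    """Issue all calls concurrently; each one fails or succeeds on its own."""
    return await asyncio.gather(
        call_with_timeout("catalog", 0.001),
        call_with_timeout("recommendations", 5.0),  # the slow dependency
        call_with_timeout("cart", 0.001),
    )


if __name__ == "__main__":
    print(asyncio.run(fan_out()))
```

The slow "recommendations" call times out while "catalog" and "cart" still return normally. With synchronous, blocking calls made one after another, the caller would have waited on the slow service before reaching the next one.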

Page 15: Pets vs. Cattle: The Elastic Cloud Story

Open Source Software

Excessive software taxation is the past.

Black boxes create lock-in.

You can always fork.

Page 16: Pets vs. Cattle: The Elastic Cloud Story

Uptime in Software Self-management

Hardware fails.
Software fails.
People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere.
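To see why an application can target 99.999% on infrastructure that is only 99% available, it helps to work through the redundancy arithmetic. This is a rough sketch that assumes replica failures are independent and failover is instantaneous; neither assumption comes from the slides, and real systems fall short of both:

```python
def combined_availability(node_availability: float, replicas: int) -> float:
    """Availability of a service that is up if at least one replica is up.

    Assumes independent failures, which is an idealization.
    """
    return 1 - (1 - node_availability) ** replicas


def replicas_needed(node_availability: float, target: float) -> int:
    """Smallest replica count whose combined availability meets the target."""
    if not (0 < node_availability < 1 and 0 < target < 1):
        raise ValueError("availabilities must be strictly between 0 and 1")
    replicas = 1
    while combined_availability(node_availability, replicas) < target:
        replicas += 1
    return replicas


if __name__ == "__main__":
    print(replicas_needed(0.99, 0.99999))
```

Under these idealized assumptions, three independent replicas on 99% infrastructure are enough to exceed 99.999%, since two replicas only reach 99.99%.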

Page 17: Pets vs. Cattle: The Elastic Cloud Story

Scale Out vs Scale up

Vertical Scaling: Make boxes bigger (usually an HA pair)

Horizontal Scaling: Make more boxes

(Diagram: vertical scaling grows boxes A and B; horizontal scaling adds boxes A, B, C ... N.)

Page 18: Pets vs. Cattle: The Elastic Cloud Story

Circuit Breaker Pattern

Fallback mechanisms (e.g. cached data) ensure uninterrupted service while giving the failing service time to recover.

When a failing service is detected, stop calling that API and serve fallback responses.
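The pattern described on this slide can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name, the failure threshold, and the reset interval are all assumptions chosen for the example, and the `clock` parameter exists only so the behavior can be tested without waiting:

```python
import time


class CircuitBreaker:
    """Stop calling a failing service for a while and serve a fallback."""

    def __init__(self, call, fallback, max_failures=3, reset_after=30.0,
                 clock=time.monotonic):
        self.call = call                  # the real (possibly failing) API call
        self.fallback = fallback          # e.g. returns cached data
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # seconds to leave the circuit open
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def request(self, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: do not touch the failing service at all.
                return self.fallback(*args)
            # Reset interval elapsed: allow one trial call (half-open).
            self.opened_at = None
        try:
            result = self.call(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return self.fallback(*args)
        self.failures = 0  # a success closes the circuit fully
        return result
```

A caller would wrap each remote dependency in its own breaker, for example `CircuitBreaker(fetch_profile, cached_profile)` where both functions are hypothetical, and always go through `request`. While the circuit is open the failing service receives no traffic, which gives it time to recover.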

Page 19: Pets vs. Cattle: The Elastic Cloud Story

Buy from ODMs

ODMs operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct without a middleman.

Only a few enterprise vendors are pivoting to compete.

Page 20: Pets vs. Cattle: The Elastic Cloud Story

Less Enterprise “Value” in x86 Servers

Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP

Page 21: Pets vs. Cattle: The Elastic Cloud Story

Fully Routed (L3) Networking

The largest cloud operators all run layer-3 routed networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on elastic clouds using Software-defined Networking (SDN).

Page 22: Pets vs. Cattle: The Elastic Cloud Story

Software-defined Networking (SDN)

• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine

“Network Virtualization”

Page 23: Pets vs. Cattle: The Elastic Cloud Story

Flat Networking + SDNs

Flat + SDN co-exist & thrive together

(Diagram labels: Availability Zone; Physical Node; VMs; Standard Security Group; VPC Security Group; Virtual L2 Network; Virtual Private Cloud Networking; VPC Gateway; Internet.)

Page 24: Pets vs. Cattle: The Elastic Cloud Story

RAIS instead of HA Pairs/Clusters

Redundant arrays of inexpensive services (RAIS)
• Load balanced with no state sharing
• Active … active … active … active …
• On failure, connections are lost, but failures are rare
• Rolling upgrades are easier, because each server is an island
• Think: scale-out + fault isolation (sharding)

Ridiculously simple & scalable
• Hardware failures are infrequent & impact subset of traffic
• (N-F)/N, where N = total, F = failed
• 10 RAIS servers - 1 failure == 90% capacity
• Most things retry anyway

Cascade failures are unlikely and failure domains are small
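The (N-F)/N capacity formula on this slide is easy to check directly. A small sketch, with an illustrative function name, showing how the impact of a single failure shrinks as the array grows:

```python
def remaining_capacity(total: int, failed: int) -> float:
    """Fraction of capacity left after failures: (N - F) / N."""
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("need total > 0 and 0 <= failed <= total")
    return (total - failed) / total


if __name__ == "__main__":
    # One failure in arrays of increasing size.
    for n in (2, 10, 100):
        print(n, remaining_capacity(n, 1))
```

With 10 servers and 1 failure the result is 0.9, the 90% capacity quoted on the slide. With 100 servers a single failure costs only 1% of capacity.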


Page 25: Pets vs. Cattle: The Elastic Cloud Story

Service Array (RAIS) Example

(Diagram labels: Backbone Routers; Cloud Access Switches; AZ (Spine) Switches; RAIS (NAT, LB, VPN); OSPF Route Announcements; Return Traffic (default or source NAT); API; Public IP Blocks; Cloud Control Plane.)

Page 26: Pets vs. Cattle: The Elastic Cloud Story

Lots of Inexpensive 1RU Switches

Simple spine-and-leaf flat routed network

1RU: 6K-30K VMs / AZ

Modular: 40K-200K VMs / AZ

(Diagram labels: Rack 1, Rack 2, Rack 3; Multiple Racks.)

Page 27: Pets vs. Cattle: The Elastic Cloud Story

Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.

Page 28: Pets vs. Cattle: The Elastic Cloud Story

Elastic Block Device Services

EBS/EBD is a crutch

Bigger failure domains (AWS outage anyone?), complex, sets high expectations

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

AWS EBS Outage: http://aws.amazon.com/message/65648/

Page 29: Pets vs. Cattle: The Elastic Cloud Story

More Servers == More Storage I/O

>1M writes/second, triple-redundancy w/ Cassandra on AWS

Linear scale-out == linear costs for performance
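The claim that linear scale-out means linear cost can be made concrete with a back-of-the-envelope model. The per-node throughput and per-node cost below are hypothetical placeholders, not figures from the talk or from any benchmark, and the model assumes throughput scales perfectly linearly with node count:

```python
def nodes_needed(target_writes_per_sec: int, writes_per_node_per_sec: int) -> int:
    """Nodes required for a target write rate, assuming perfectly linear scaling."""
    if target_writes_per_sec <= 0 or writes_per_node_per_sec <= 0:
        raise ValueError("rates must be positive")
    # Integer ceiling division.
    return (target_writes_per_sec + writes_per_node_per_sec - 1) // writes_per_node_per_sec


def cluster_cost(nodes: int, cost_per_node: float) -> float:
    """Total cost grows in direct proportion to the node count."""
    return nodes * cost_per_node


if __name__ == "__main__":
    PER_NODE = 4_000     # hypothetical writes/second per node
    NODE_COST = 1.0      # hypothetical cost unit per node
    for target in (1_000_000, 2_000_000):
        n = nodes_needed(target, PER_NODE)
        print(target, n, cluster_cost(n, NODE_COST))
```

In this model, doubling the target write rate doubles both the node count and the cost, so the cost per unit of performance stays constant.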

Page 30: Pets vs. Cattle: The Elastic Cloud Story

Hypervisors are a Commodity

Cloud end-users want OS of choice, not HVs.

Level up! Managing iron is for mainframe operators.
… hypervisors are bare metal APIs

Hypervisor of the future is open source, easily modifiable, & extensible.

Page 31: Pets vs. Cattle: The Elastic Cloud Story

The Hypervisor of the Future May Be NO Hypervisor

LXC

ironic

Bare Metal Cloud

Page 32: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets or Cattle? (Pages 33-48 build this list one item at a time, asking about each item and then placing it in a column; the placements are not recoverable from the extracted text.)

• LACP
• Managing a Server at a Time
• Auto-scaling
• Design-for-Failure
• 100% Uptime Goals
• HA pairs for redundancy
• Shared Nothing Architecture
• Persistent Block Storage

Page 49: Pets vs. Cattle: The Elastic Cloud Story

Q & A

Randy Bias
Founder & CEO, Cloudscaling
Director, OpenStack Foundation
@randybias