Usage Compute Time Average Inactivity Period Compute Time Average Usage Compute Time Compute Time Average Usage

Windows Azure ComputeBrad Calder

General ManagerWindows Azure

Usage

Com

pu

te

Time

Average

Inactivity

Period

“On and Off “ RiskMetrics

On and off workloads (e.g. batch job)Over provisioned capacity is wasted

Com

pu

te

Time

“Unpredictable Bursting“

Twitter

Average Usage

Unexpected/unplanned peak in demand Sudden spike impacts performance Hard/costly to over provision for extreme cases

Average Usage

Com

pu

te

Time

“Growing Fast“Docs.com on Facebook

Successful services needs to grow/scale Keeping up w/growth is big IT challenge Can be hard to predict growth

Com

pu

te

Time

Average Usage

“Predictable Bursting“Walmart

Services with micro seasonality trends Peaks due to periodic increased demandWasted capacity off season

Large-Scale Workload Patterns

12:00 AM1:54 AM 3:48 AM 5:42 AM 7:36 AM 9:30 AM11:24 AM 1:18 PM 3:12 PM 5:06 PM 7:00 PM 8:54 PM 10:48 PM

Japan Great Britain

BING SEARCHES – JAPAN VS. UK

Source: Microsoft

Computing Demand Daily FluctuationQ

uery

Volu

me

• turbotax.com • taxcut.com

• hrblock.com • taxact.com

Source: Alexa

~4x normal load(Holiday shopping)

~10x normal load(Tax season)

• target.com • walmart.com

• toysrus.com • barnesandnoble.com

Jan 2009 Jan 2010 Jan 2009 Jan 2010

Source: Alexa

Computing Demand Yearly Variability

Time

Dem

an

dWhat is a “Cloud”?

• Cloud: on-demand, scalable, compute and storage resources

TimeD

em

and

Self Server Provisioning Cloud Provisioning

OverprovisionedUnderprovisioned

What is Under the Covers of a Service

Business logic

Datacenter (Power and Cooling)

Respond to hardware failures

Monitoring and alerting infrastructureReliable/Secure storage and computation

Metering and billing infrastructureLive upgrades and OS patches

Add compute/storage capacity on the flyOverprovision for peak traffic

Service “glue”

…

Buy and provision hardware

What is Windows Azure?An operating system for the cloud:

….Service 1 Service 2 Service NService 3

……

Cloud Terminology

• Infrastructure as a Service (IaaS): basic compute and storage resources• On-demand servers• Amazon EC2, VMWare vCloud, etc

• Platform as a Service (PaaS): cloud application infrastructure• On-demand application-hosting environment• Google AppEngine, Salesforce.com, Windows Azure, etc

• Software as a Service (SaaS): cloud applications• On-demand applications• Office 365, GMail, etc

Operating System

Operating System

VM

WebServer

Operating System

VM

DBMS

2) Choose image, then create and configure VM(s) for

application

1) Choose image, then

create VM for DBMS and

configure DBMS

IaaS

Library

VM Images

Developer/Ops

ApplicationDataLoad

Balancer

5) Config

ure load

balancer

6) Manage VMs and DBMS (e.g.,

deploying new OS images in VMs)

3) Provision database,

then create tables and add data

4) Install

application

Operating System

Operating System

VM

Operating System

VM

DBMS

PaaS Developer/Ops

ApplicationDataLoad

Balancer

2) Deploy applicati

on w/ service model

WebServer

1) Provision database,

then create tables and add data

3) Automated Service Managem

ent

Windows Azure

• Windows Azure is an OS for the data center• Handles resource management, provisioning, and

monitoring• Manages application lifecycle• Allows developers to concentrate on business logic

• Provides common building blocks for distributed applications• Reliable queuing• Simple unstructured and structured storage• SQL storage• Application services like access control, caching, and

connectivity

Windows Azure Platform

Fabric ControllerWindows Azure

Networking

AppFabric Caching

AppFabric Access Control Server

SQL Azure

AppFabric Service Bus

WindowsAzure

Compute

WindowsAzure

Middleware Services

Windows Azure Applications

Windows Azure Storage

Windows Azure CDN

WindowsAzure

Data Services

• Owns all the hardware in the data center• Uses the inventory to host services• Similar to what a per machine operating system

does with applications• Provisions the hardware as necessary• Maintains the health of the hardware• Deploys applications to free resources• Maintains the health of those applications

Fabric Controller

Windows Azure Fabric Controller

Highly-availableFabric Controller

Hardware control Software control

WS08 Hypervisor

VMVM

VM

Fabric

Agent

Switches

Load-balancers

Scaling with the Fabric Controller Service Model

Scaling

• There are two basic scaling models:

Compute

Compute

Compute

Compute

Scale Up Scale Out

Compute

Scaling Lessons

• Use few, well-defined scaling units• Define scaling boundaries• Scale out those units as needed

Scale-Out Applications

Network Load Balancer

Stateless ‘Worker’

Stateless Front End

Shared Filesystem

(Azure Blobs)

Partitioned RDBMS

(SQL Azure)

Key/ValueDatastore

(Azure Tables)

AzureQueues

Scale Out

Scale Out

AlreadyProvided ScalableStorage

The Windows Azure Service Model• A Windows Azure application is called a “service”• Definition information• Configuration information• At least one “role”

• A role is the scaling boundary withina service• Roles are like DLLs in the service “process”• Collection of code with an entry point

that runs in its own virtual machine• Virtual machine is scale unit • Role code runs in a virtual machine • Role scales by instances of a virtual machine size

LB

Durable

Store

Front End

Middle Tier

Multi-Tier Cloud Application• A cloud application is typically made up of different

components• Front end: e.g. load-balanced stateless web servers• Middle worker tier: e.g. order processing, encoding• Backend storage: e.g. Azure Blobs, Azure Tables, SQL

Azure• Multiple instances of each for scalability and availability• Requires at least 2 instances of each to achieve the SLA

Front-End

Cloud Application

Front-End

HTTP/HTTPS

Windows

AzureStorag

e,SQL

Azure

Load Balancer Middle-

Tier

Service Model and Role Contents

• Definition: • Role name• Role type • VM size (e.g. small, medium, etc.)• Network endpoints

• Configuration:• Number of instances• Number of update and fault domains

• Code: • Web/Worker Role: Hosted DLL

and other executables• VM Role: VHD

Service ModelRole: Front-End

DefinitionType: WebVM Size: SmallEndpoints: External-1ConfigurationInstances: 2Update Domains: 3Fault Domains: 2

Role: Middle-Tier

DefinitionType: WorkerVM Size: LargeEndpoints: Internal-1ConfigurationInstances: 3Update Domains: 3Fault Domains: 2

Service Model Files

• Service definition is in ServiceDefinition.csdef

• Service configuration is in ServiceConfiguration.cscfg

• CSPack program Zips service binaries and definition into Service Package File (service.cscfg)

ServiceDefinition.csdef<?xml version="1.0" encoding="utf-8"?><ServiceDefinition name="Sample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" upgradeDomainCount=“3"> <WorkerRole name="Middle-Tier" vmsize="Large"> <Endpoints> <InternalEndpoint name="Internal-1" protocol="tcp" /> </Endpoints> </WorkerRole> <WebRole name="Front-End" vmsize="Small"> <Sites> <Site name="Web"> <Bindings> <Binding name="Endpoint1" endpointName="External-1" /> </Bindings> </Site> </Sites> <Endpoints> <InputEndpoint name="External-1" protocol="http" port="80" /> </Endpoints> <Imports> <Import moduleName="Diagnostics" /> </Imports> </WebRole></ServiceDefinition>

ServiceConfiguration.cscfg<?xml version="1.0" encoding="utf-8"?><ServiceConfiguration serviceName="Sample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily=“2"

osVersion="*"> <Role name="Middle-Tier"> <Instances count="3" /> </Role> <Role name="Front-End"> <Instances count="2" /> <ConfigurationSettings> <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=key" /> </ConfigurationSettings> </Role></ServiceConfiguration>

Windows Azure Push-button Deployment• Step 1: Allocate VMs/nodes,

VDIPs/VIPs• Across fault domains• Across update domains

• Step 2: Place role images on nodes

• Step 3: Start roles in VM instances

• Step 4: Configure load-balancers• Step 5: Maintain desired number

of role instances• Failed roles automatically

restarted• Node failure results in new VMs

automatically allocated

Allocation across fault and update domains

Load-balancers

• Windows Azure FC monitors the health of roles• FC detects if a role dies• Restart the role to bring it back to a healthy state

• If a failed node can’t be recovered, FC migrates role instances to a new node• A suitable replacement location is found• Existing role instances are notified of the

configuration change

FC Automated Management

Availability andFault/Upgrade Domains

Availability Service Level Agreements (SLA)

• Windows Azure Platform SLAs:• Compute External Connectivity: 99.95% (2 or more

instances)• Storage Availability: 99.9%• SQL Azure Availability: 99.9%

Availability % Downtime per yearDowntime per month*

Downtime per week

99% ("two nines") 3.65 days 7.20 hours 1.68 hours

99.9% ("three nines")

8.76 hours 43.2 minutes 10.1 minutes

99.99% ("four nines")

52.56 minutes 4.32 minutes 1.01 minutes

99.999% ("five nines")

5.26 minutes 25.9 seconds 6.05 seconds

Maintaining Availability: Assume Failure

• Hardware fails • 3-5% of servers experience failures annually

• Software fails• Inevitable in any evolving, complex system

• Tolerating failure means:• Redundancy where possible• Need to build in retries and backoff• Fast recovery• Big red buttons

Hardware Redundancy

TOR

LB LBAgg

PDU

LB LBAgg

LB LBAgg

LB LBAgg

LB LBAgg

LB LBAgg

Racks

Datacenter

Routers

Aggregation Routers and

Load Balancers

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

TOR

PDU

……… … …

Top of RackSwitches

Power Distribution

Units

…

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Top of Rack Switch is a Single Point of Failure

Maintaining Availability: Fault Domains

• Avoid single points of failure• Unit of failure based on

data center topology is a rack• E.g. top-of-rack switch on a rack of

machines

• Windows Azure considers fault domains when allocating service roles• At least 2 fault domains per service• Will try and spread roles out across

more

Front-End-1

Fault Domain 1

Fault Domain

2

Front-End-2

Middle Tier-2

Middle Tier-1

Fault Domain 3

Middle Tier-3

Front-End-1

Middle Tier-1

Front-End-

2

Middle

Tier-2

Middle

Tier-3

• Update domains specifies what percentage of your service you will take offline for an upgrade• Specify the # of update domains for

your service• Default is 5 and max is 20

• Roles are evenly assigned an update domain

• Used to update only one domain at a time• Rolling update

Update Domains

Upgrade domains

allocated across fault domains

Fault domains

Service Deployment and Maintenance

Containing Failure: Datacenter Clusters

• Datacenters are divided into “clusters”• Approximately 1000 rack-mounted servers (we call them “nodes”)• Provides a unit of fault isolation

• Each cluster is managed by a Fabric Controller (FC)

Cluster1

Cluster2

Clustern

…

Datacenter network

FC FC FC

The Fabric Controller (FC)

• The “kernel” of the cloud operating system• Manages datacenter hardware• Manages Windows Azure services

• Four main responsibilities:• Datacenter resource allocation• Datacenter resource provisioning• Service lifecycle management• Service health management

• Inputs:• Description of the hardware and network resources it will

control• Service model and binaries for cloud applications

Service Deployment Steps• Process service model files

• Determine resource requirements• Create role images

• Allocate compute and network resources• Prepare nodes

• Place role images on nodes• Create virtual machines• Start virtual machines and roles

• Configure networking• Dynamic IP addresses (DIPs) assigned to nodes• Virtual IP addresses (VIPs) + ports allocated and mapped to sets of

DIPs• Configure packet filter for VM to VM traffic within service• Program load balancers to allow traffic to external endpoints

Service Resource Allocation• Goal: allocate service components to available resources

while satisfying all hard constraints • Size of VM

• HW requirements: CPU, Memory, Storage, Network• Upgrade domains• Fault domains

• Secondary goal: Satisfy soft constraints • Optimize network proximity: pack different roles into same node

Deploying a ServiceRole B

Middle-Tier RoleCount: 3

Update Domains: 3Size: Large

Role AFront-End Role

(Front End)Count: 2

Update Domains: 3Size: Medium

LoadBalance

r

10.100.0.36

10.100.0.122

www.mycloudapp.net

www.mycloudapp.net

Fault domain

Upgrade domain

http://www.mycloudapp.net/

http://www.mycloudapp.net/

Inside a Deployed Node

Fabric Controller (Primary)

FC Host Agent

Host Partition

Guest Partitio

nGuest Agent

Guest Partitio

nGuest Agent

Guest Partitio

nGuest Agent

Guest Partitio

nGuest Agent

Physical Node

Fabric Controller (Replica)

Fabric Controller (Replica)…

Role Instance

Role Instance

Role Instance

Role Instance

Trust boundary Image Repository

(OS VHDs, role ZIP files)

Detection: Load Balancer Operation

• FC programs load balancers (LB) to “probe” guest agent (GA) every 15 seconds• If the guest misses two probes, the LB stops forwarding

traffic• The role can report “busy” status to the GA • GA stops responding to probes

• LB keeps an idle connection open for 60s• Use keep-alive commands if the connection needs to be

open longer

Recovery: Server and Role Health

• FC maintains service availability by monitoring the software and hardware health• Based primarily on heartbeats • Automatically “heals” affected rolesProblem Fabric Detection Fabric Response

Role instance crashes FC guest agent monitors role termination FC restarts role

Guest VM or agent crashes FC host agent notices missing guest agent heartbeats

FC restarts VM and hosted role

Host OS or agent crashes FC notices missing host agent heartbeat Tries to recover nodeFC reallocates roles to other nodes

Detected node hardware issue Host agent informs FC FC migrates roles to other nodesMarks node “out for repair”

Updating Your Service• There are two update types:• In-place: used for large scale services and used to

updated services with local state• VIP swap: for ease of testing and fail-back for smaller

services• In-place (rolling) update:• Role instances updated one update domain at a time• Two modes: automatic and manual

In-Place Update

• Purpose: Ensure service stays up while updating• Used by Windows Azure OS updates

• System considers update domains when upgrading a service• 1/Update domains = percent of

service that will be offline• Default is 5 and max is 20

• The Windows Azure SLA is based on at least two update domains and two role instances of each role

Front-End-

1

Front-End-

2

Update Domain 1

Update Domain

2

Middle

Tier-1

Middle

Tier-2

Middle

Tier-3

Update Domain

3

Middle Tier-

3

Front-End-2

Front-End-

1

Middle Tier-

2

Middle

Tier-1

Windows Azure Compute Summary• Platform as a Service is all about reducing

management and operations overhead• The Windows Azure Fabric Controller is the

foundation for Windows Azure’s PaaS• Provisions machines• Deploys services• Configures hardware for services• Monitors service and hardware health

Documents

Usage Compute Time Average Inactivity Period Compute Time Average Usage Compute Time Compute Time Average Usage