ACCELERATE INNOVATIONS USING CLOUD DIFFERENTIATE WITH DESIGN AND USER EXPERIENCE DELIVER SCALE AND AGILITY TO THE CLOUD. THE RIGHT WAY. What we

Windows Azure from the Pulpit to the Whiteboard
Ryan Dunn & Wade Wegner

WAD-B351

What we will cover today
Lessons Learned (the hard way)
• Building for Scale
• Automation, Testing
• Deployment Patterns in Windows Azure
• Handling Disaster/Downtime

ACCELERATE INNOVATIONS USING CLOUD

DIFFERENTIATE WITH DESIGN AND USER EXPERIENCE

DELIVER SCALE AND AGILITY TO THE CLOUD. THE RIGHT WAY.

What we do at Aditi

Our clients are technology leaders …

AzureOps.com
• Monitors deployments in Windows Azure
• At peak, monitored ~3,000 VMs in 6 datacenters
• Processed TBs of trace data per month and GBs of performance counters
• Consumed half a billion storage transactions per month
• Ran on 2 Small, 4 Medium + (2 Medium per datacenter), and 12 Extra Small instances
• Auto-scales based on custom metrics
• Alerts based on custom rules

Scheduler

Scheduler
• Time-based job scheduling
• Features
  • Webhooks (GET, POST) and Windows Azure Queues
  • Authentication: Basic Auth or none
  • NuGet
• 500,000 job executions
• Live API documentation
• Four plans available in store

Aditi’s High Level Architecture (CQRS)

[Figure: architecture diagram with components Web Scheduler, Service Client, Query Svc, Command Handlers, Domain Model, Events, Event Handlers, Event Bus, Denormalizers, Data Access Layer, Event Data, and View Data, connected by Commands and Queries flows]

Why did we choose this architecture?
• Allowed us to easily scale our backend (async) while keeping the front end very responsive.
• Compartmentalized logic in handlers that could be independently developed/tested.
• Event sourcing gave us not only the what, but the how. Audit history came along for the ride.
• Flexibility to add or modify views at any time and regenerate them.
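A minimal sketch of the last two points, assuming an in-memory list stands in for the event store: every change is kept as an event, so a view can be regenerated at any time by replaying the log through a denormalizer. The event kinds (`JobCreated`, `JobRenamed`), the `rebuild_view` helper, and the view shape are hypothetical, not taken from the Aditi codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

View = Dict[str, dict]


@dataclass(frozen=True)
class Event:
    """One immutable fact in the event log (hypothetical shape)."""
    aggregate_id: str
    kind: str   # illustrative event names: "JobCreated", "JobRenamed"
    data: dict


def job_list_denormalizer(view: View, event: Event) -> None:
    """Folds one event into a read-optimized 'job list' view keyed by job id."""
    if event.kind == "JobCreated":
        view[event.aggregate_id] = {"name": event.data["name"], "changes": 1}
    elif event.kind == "JobRenamed":
        row = view[event.aggregate_id]
        row["name"] = event.data["name"]
        row["changes"] += 1


def rebuild_view(log: Iterable[Event],
                 denormalizer: Callable[[View, Event], None]) -> View:
    """Regenerates a view from scratch by replaying every event in order."""
    view: View = {}
    for event in log:
        denormalizer(view, event)
    return view


log: List[Event] = [
    Event("job-1", "JobCreated", {"name": "nightly-backup"}),
    Event("job-2", "JobCreated", {"name": "ping"}),
    Event("job-1", "JobRenamed", {"name": "nightly-backup-v2"}),
]
view = rebuild_view(log, job_list_denormalizer)
print(view["job-1"])  # {'name': 'nightly-backup-v2', 'changes': 2}
```

Because the log is never modified, changing the denormalizer (or adding another one) and replaying produces the new view, and the log still holds the intermediate name: that is the audit history.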

Example
• Customer with > 300 VMs deployed and hundreds of SQL Azure databases.
• Error in DB connection logic and a tight-loop retry. Each error is traced with a full stack trace.
• Over 2 GB of trace data per minute being generated
• Table storage data format is verbose to begin with, but…
• 1 GB NIC completely saturated on 16 workers trying to keep up
• Timeout during read due to too much data, which caused RETRY
• Autoscale noticed high queue levels stacking and scaled to max

Solution
• Throttle! Protect your service.
• Read the first 50,000 traces in 5 minutes, then raised a Throttled event
• Be very cautious with retry policies: virtuous cycles can bite
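The throttle above can be sketched as a fixed-window limiter: accept up to 50,000 traces per 5-minute window, and raise a throttled signal once per window when the limit is hit instead of reading everything. `TraceThrottle` and its `on_throttled` callback are hypothetical names; the clock is injected so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class TraceThrottle:
    """Fixed-window limiter: accepts at most `limit` traces per `window_s` seconds."""

    def __init__(self, limit: int = 50_000, window_s: float = 300.0,
                 on_throttled: Optional[Callable[[int], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        # Hypothetical hook for the "Throttled" event; receives the accepted count.
        self.on_throttled = on_throttled or (lambda accepted: None)
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.throttled = False

    def accept(self) -> bool:
        """Returns True if the trace may be read/stored, False if it is dropped."""
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # A new window begins: reset the counter and the one-shot flag.
            self.window_start, self.count, self.throttled = now, 0, False
        if self.count < self.limit:
            self.count += 1
            return True
        if not self.throttled:
            self.throttled = True          # raise the event once per window
            self.on_throttled(self.count)
        return False


fake_now = [0.0]
raised: list = []
throttle = TraceThrottle(limit=3, window_s=300.0, on_throttled=raised.append,
                         clock=lambda: fake_now[0])
first = [throttle.accept() for _ in range(5)]
print(first, raised)      # [True, True, True, False, False] [3]
fake_now[0] = 301.0       # the next 5-minute window starts
print(throttle.accept())  # True
```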

Protect your services
• Always assume the worst will happen
  • It will, guaranteed.
  • Users will find ways to crash your services, guaranteed.
• Plan to throttle
  • Timeouts, max result size, # of queries per minute
• Be wary of retries
  • Automatic retries rarely work the way you think they do
  • Retry policies can lead to even bigger failures
• Learn from your mistakes
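One common way to keep retries from becoming the bigger failure is to cap the number of attempts and back off exponentially with jitter instead of retrying in a tight loop. This is a general technique, not necessarily the policy the authors adopted; `retry_with_backoff` and its parameters are hypothetical, and `sleep` and `rng` are injected so the delays can be inspected.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(op: Callable[[], T], max_attempts: int = 4,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Calls `op`, retrying failures with capped exponential backoff and full jitter.

    After `max_attempts` calls the last error is re-raised, so a persistent
    fault cannot turn into an unbounded tight retry loop.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of looping
            # Ceiling grows base, 2*base, 4*base, ... but never exceeds max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, ceiling))  # jitter keeps workers from retrying in lockstep
    raise AssertionError("unreachable")


delays: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append, rng=random.Random(42))
print(result, calls["n"], len(delays))  # ok 3 2
```

A production policy would normally retry only errors known to be transient; catching every `Exception` keeps the sketch short.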

Example
• Autoscale system would periodically kick up 4 new instances based on queue length
• Queues would bleed down, and the pattern repeated every 2 hours (as 2 instances would scale away)
• Unknown reason why command queue length was increasing on a 2-hour cycle

Solution
• Added custom counters by type of command
• Found that calls to SMAPI (the Service Management API) averaged 60 seconds and took as long as 240 seconds (the fastest was 30)
• Routed SMAPI refresh commands to their own queue and put 12 XS workers on it
  • Same cost as 2 S workers
  • 6x as many commands processed
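Counters by command type can be as simple as recording each command's duration under its type and reporting count/average/min/max, slowest first. This is a sketch, not the AzureOps counters: `CommandStats` and the command type names are hypothetical, and the durations are illustrative inputs, not measurements from the talk.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class CommandStats:
    """Per-command-type duration counters (a stand-in for custom perf counters)."""

    def __init__(self) -> None:
        self.durations: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, command_type: str, seconds: float) -> None:
        self.durations[command_type].append(seconds)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Returns count/avg/min/max per command type, slowest average first."""
        rows = {
            kind: {"count": len(ds), "avg": sum(ds) / len(ds),
                   "min": min(ds), "max": max(ds)}
            for kind, ds in self.durations.items()
        }
        return dict(sorted(rows.items(), key=lambda kv: kv[1]["avg"], reverse=True))


stats = CommandStats()
# Illustrative durations in seconds, not measurements from the talk.
samples = [("RefreshDeployment", 30.0), ("UpdateSettings", 0.2),
           ("RefreshDeployment", 90.0), ("UpdateSettings", 0.4)]
for command_type, seconds in samples:
    stats.record(command_type, seconds)
report = stats.summary()
print(list(report))  # ['RefreshDeployment', 'UpdateSettings']
```

Once one type stands out this way, it becomes a candidate for its own queue and workers.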

Example
• Customers would update settings and the UI would not reflect the change.
• Eventually the change would appear, but the history would often show multiple updates of the exact same data.
• Customers contacted support to complain

Solution
• Found that commands from customers were being routed to the same processing queue as other, much longer-running commands
• Re-prioritized commands coming from the UI onto their own queue with dedicated resources (VIP queues)

Push your hard work to queues
• Applies almost universally in the CQRS world
• Allows for asynchronous dispatch and eventual handling

Prioritize queues
• E.g. VIP queues for UI commands
• Prevents stacking of high-priority messages that update views

Alleviates the front end from blocking calls
• HTTP requests become highly efficient and scalable when combined with read optimization
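A minimal sketch of routing plus prioritized consumption, assuming in-process `deque`s stand in for Windows Azure Queues: UI-originated commands go to a "vip" queue, and the worker drains it before touching bulk work. `PriorityRouter`, the queue names, and the `source` field are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PriorityRouter:
    """Routes commands to a 'vip' or 'bulk' queue and serves VIP first."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = {"vip": deque(), "bulk": deque()}

    def route(self, command: dict) -> str:
        """UI-originated commands go to the VIP queue; everything else to bulk."""
        name = "vip" if command.get("source") == "ui" else "bulk"
        self.queues[name].append(command)
        return name

    def next_command(self) -> Optional[dict]:
        """Strict priority: drain the VIP queue before touching bulk work."""
        for name in ("vip", "bulk"):
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


router = PriorityRouter()
router.route({"type": "RefreshDeployment", "source": "scheduler"})
router.route({"type": "UpdateSettings", "source": "ui"})
order: List[str] = []
while (cmd := router.next_command()) is not None:
    order.append(cmd["type"])
print(order)  # ['UpdateSettings', 'RefreshDeployment']
```

Strict priority inside a single worker can starve bulk work under sustained UI load; giving each queue its own dedicated workers, as the slides describe, avoids that.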

Snapshot 1

Snapshot 2

Solution
• Found that deserialization time in event store processing grew almost exponentially with the size and number of events
• Fixed with an implementation of snapshotting
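Snapshotting bounds rehydration cost: load the latest snapshot and replay only the events recorded after it, instead of deserializing an aggregate's whole history. A minimal sketch, assuming events are numbered consecutively and a snapshot is taken every N events; `Snapshot`, `SNAPSHOT_EVERY`, and the running-total aggregate are hypothetical. A real event store would query "events after version V" rather than slice a list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SNAPSHOT_EVERY = 100  # hypothetical: snapshot after this many new events


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `state`
    state: int    # aggregate state (a running total, for illustration)


def rehydrate(events: List[int], snapshot: Optional[Snapshot]) -> Tuple[Snapshot, int]:
    """Rebuilds the aggregate from an optional snapshot plus newer events.

    `events[i]` is the payload of event version i + 1. Only events after the
    snapshot's version are read. Returns (current state, events replayed).
    """
    base = snapshot or Snapshot(version=0, state=0)
    newer = events[base.version:]
    state = base.state
    for amount in newer:
        state += amount  # "apply" the event
    return Snapshot(version=base.version + len(newer), state=state), len(newer)


def maybe_snapshot(current: Snapshot, last: Optional[Snapshot]) -> Optional[Snapshot]:
    """Keeps `current` as the new snapshot once enough events have accrued."""
    last_version = last.version if last else 0
    return current if current.version - last_version >= SNAPSHOT_EVERY else last


events = [1] * 250                      # illustrative events, each adding 1
current, replayed = rehydrate(events, None)
print(current, replayed)                # Snapshot(version=250, state=250) 250
snap = maybe_snapshot(current, None)    # 250 new events >= SNAPSHOT_EVERY
events += [1] * 5                       # five more events arrive later
current, replayed = rehydrate(events, snap)
print(current, replayed)                # Snapshot(version=255, state=255) 5
```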

Snapshot 3

Example
• Queue length in North Europe was consistently longer than in North America despite having fewer tenants (less work to do).
• The autoscaler was frequently and aggressively scaling up to handle the queue.
• But… processing time for aggregation was the same as or shorter than in other geographies

Solution
• One North Europe tenant had used the same storage account for both its dev/test load-testing environment and production.
• It was the equivalent of finding a needle in a haystack
• Tuned our timeouts for aggregation scheduling (before aggregation).

Identifying Bottlenecks
• Instrumentation is key
  • Custom performance counters are nice
  • A simple timer that traces long commands works well too
• Watch for trends over time
  • Snapshots might not help until you look at them over time
• Profile your code when data indicates a problem
• Tune and then verify changes
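The "simple timer that traces long commands" can be a context manager that measures elapsed time and reports only commands over a threshold. A sketch using the standard `logging` and `time.perf_counter`; the `trace_if_slow` name, the 5-second default threshold, and the injectable `clock`/`report` parameters are hypothetical.

```python
import logging
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional

log = logging.getLogger("commands")


@contextmanager
def trace_if_slow(command_type: str, threshold_s: float = 5.0,
                  clock: Callable[[], float] = time.perf_counter,
                  report: Optional[Callable[[str], None]] = None) -> Iterator[None]:
    """Times the enclosed block and reports it only if it ran longer than threshold_s."""
    emit = report or log.warning
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed >= threshold_s:
            emit(f"slow command {command_type} took {elapsed:.2f}s")


slow: List[str] = []
fake_clock = iter([0.0, 7.5, 10.0, 10.1]).__next__  # scripted timestamps
with trace_if_slow("RefreshDeployment", clock=fake_clock, report=slow.append):
    pass  # the command's work would run here
with trace_if_slow("UpdateSettings", clock=fake_clock, report=slow.append):
    pass
print(slow)  # ['slow command RefreshDeployment took 7.50s']
```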

Optimize for reads

[Figure: Event Bus flow with RegisterUser, UserRegistrationHandler, UserRegistered, UserProfileHandler, UserQuotaHandler, User Profile, Quota View, and Storage]
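The figure's flow can be sketched in-process: a `RegisterUser` command is handled by a registration handler that publishes `UserRegistered`; the event bus fans the event out to a profile handler and a quota handler, each writing its own pre-calculated view. The component names follow the figure (as snake_case functions); the synchronous `EventBus`, the `storage` dict standing in for blob/table storage, and the view fields, including the quota default, are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class RegisterUser:       # command
    user_id: str
    name: str


@dataclass(frozen=True)
class UserRegistered:     # event
    user_id: str
    name: str


class EventBus:
    """Synchronous in-memory bus; a real deployment would dispatch via queues."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[type, List[Callable[..., None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[..., None]) -> None:
        self.subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self.subscribers[type(event)]:
            handler(event)


storage: Dict[str, dict] = {}  # stand-in for blob/table storage


def user_profile_handler(e: UserRegistered) -> None:
    storage[f"profile/{e.user_id}"] = {"name": e.name}


def user_quota_handler(e: UserRegistered) -> None:
    storage[f"quota/{e.user_id}"] = {"jobs_used": 0, "jobs_allowed": 10}  # assumed default


bus = EventBus()
bus.subscribe(UserRegistered, user_profile_handler)
bus.subscribe(UserRegistered, user_quota_handler)


def user_registration_handler(cmd: RegisterUser) -> None:
    # Validation and domain logic would run here before the fact is published.
    bus.publish(UserRegistered(cmd.user_id, cmd.name))


user_registration_handler(RegisterUser("u1", "Ada"))
print(sorted(storage))  # ['profile/u1', 'quota/u1']
```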

Optimize for reads

• Pre-calculate and store each view that will be displayed to your users.
• It’s OK if data is slightly stale. Really.

Optimize for reads

• Storage is cheap
• Create a new view for each ‘task’ on the UI
• It’s OK to have the same data in multiple views

Optimize for reads

• Don’t make your web servers work hard
• Serve JSON files directly from storage.
• Cache & use ETags
• Light transformations on dynamic data are OK.
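The ETag part is plain HTTP: the client sends back the ETag it last saw in `If-None-Match`, and the server answers `304 Not Modified` with no body when the view has not changed. A minimal sketch for a pre-calculated JSON view; the ETag here is a hash of the serialized bytes (an assumption, since blob storage assigns its own ETags), `publish_view`/`get_view` are hypothetical helpers over an in-memory dict, and the single-value `If-None-Match` comparison is a simplification.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

blobs: Dict[str, Tuple[bytes, str]] = {}  # name -> (JSON body, ETag); stand-in for blob storage


def publish_view(name: str, view: dict) -> str:
    """Serializes a pre-calculated view once, at write time, and stores it with an ETag."""
    body = json.dumps(view, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'  # ETags are quoted strings
    blobs[name] = (body, etag)
    return etag


def get_view(name: str, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Returns (status, headers, body); 304 with an empty body if the client copy is current."""
    body, etag = blobs[name]
    headers = {"ETag": etag, "Content-Type": "application/json",
               "Cache-Control": "max-age=60"}  # assumed cache lifetime
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


etag = publish_view("accounts/42/settings.json", {"plan": "basic", "timezone": "UTC"})
status, _, body = get_view("accounts/42/settings.json")
print(status, body)  # 200 b'{"plan":"basic","timezone":"UTC"}'
status, _, body = get_view("accounts/42/settings.json", if_none_match=etag)
print(status, body)  # 304 b''
```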

Learn to live with stale data

• Pre-calculated views are by definition already stale
• In CQRS, events raised from commands are dispatched to denormalizers from the event bus

Learn to live with stale data

• Create a view for each ‘task’ on the UI
• Static views versus dynamic views

Learn to live with stale data

• Dynamic view == queryable data
• E.g. last X hours of trace information with Y level
• Served efficiently from table storage.

Learn to live with stale data

• Static data, e.g. account details, current balance, or settings
• Ideal for JSON files sitting in blob storage

Work close to your data
• Bandwidth becomes the limiting factor
  • Between datacenters
  • Between VM and NIC
• Cost goes down, performance goes up. Win, win!

Employ a message routing strategy
• Messages can get routed to geo-specific queues
• Messages can get prioritized
• Messages can get quarantined
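A routing strategy can be a small pure function from message to queue name. A sketch under assumed conventions: each message carries a `region`, a `priority`, and a delivery count; queue names follow a hypothetical `<region>-<lane>` pattern; and a message dequeued too many times goes to a quarantine queue instead of being retried forever. None of these names come from the talk.

```python
from dataclasses import dataclass

MAX_DELIVERIES = 5  # hypothetical poison-message threshold


@dataclass(frozen=True)
class Message:
    region: str               # datacenter that holds the tenant's data, e.g. "northeurope"
    priority: str = "normal"  # "vip" for UI-originated commands
    delivery_count: int = 1   # how many times this message has been dequeued


def route(msg: Message) -> str:
    """Picks a queue name: quarantine first, then geo-specific priority lanes."""
    if msg.delivery_count > MAX_DELIVERIES:
        return "quarantine"               # park it for a human to inspect
    lane = "vip" if msg.priority == "vip" else "work"
    return f"{msg.region}-{lane}"         # process close to the data


print(route(Message("northeurope", "vip")))        # northeurope-vip
print(route(Message("westus")))                    # westus-work
print(route(Message("westus", delivery_count=9)))  # quarantine
```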

Building for Scale Recap
• Optimize for reads
• Learn to live with stale data
• Work close to your data
• Protect your services
• Push your hard work to queues
• Instrument

Why automate?
• Reproducibility, reproducibility
• Takes the ‘I forgot to…’ out of it.

Automate
• Data migrations tie it together.
• Continuous builds raise the quality bar.
• Deploying from Visual Studio is verboten.
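One way to make data migrations part of the automated pipeline is to number them, record the highest applied version alongside the data, and apply only pending migrations in order, so re-running a deployment is harmless. The `MIGRATIONS` registry, the `schema_version` key, and the two sample migrations are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Store = Dict[str, object]


def add_default_timezone(store: Store) -> None:
    store.setdefault("timezone", "UTC")


def rename_tier_to_plan(store: Store) -> None:
    if "tier" in store:
        store["plan"] = store.pop("tier")


# Ordered registry of (version, migration); versions only ever increase.
MIGRATIONS: List[Tuple[int, Callable[[Store], None]]] = [
    (1, add_default_timezone),
    (2, rename_tier_to_plan),
]


def migrate(store: Store) -> List[int]:
    """Applies pending migrations in version order; returns the versions applied."""
    current = int(store.get("schema_version", 0))
    applied: List[int] = []
    for version, step in sorted(MIGRATIONS, key=lambda m: m[0]):
        if version <= current:
            continue  # already applied by an earlier deployment
        step(store)
        store["schema_version"] = version  # record progress after each step
        applied.append(version)
    return applied


store: Store = {"tier": "basic"}
first_run = migrate(store)
second_run = migrate(store)
print(first_run, second_run)  # [1, 2] []
```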

Build Demo

Automation Recap
• Build Automation
• Deployment Automation
• Data Migrations

PaaS vs IaaS
• Large scale (> 50 instances) requires PaaS today
• PaaS is a much easier deployment, upgrade, and maintenance model.
• Requires architecting differently: stateless, idempotent, etc.
• IaaS is wonderful for stateful apps
• PaaS FTW
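Idempotency matters on PaaS because a queue message can be delivered more than once, for example when a worker is recycled after doing the work but before deleting the message. A minimal sketch: the handler records processed message IDs and skips repeats. The in-memory `processed` set stands in for durable shared storage (on stateless instances it cannot live in local memory), and the message shape is hypothetical.

```python
from typing import Dict, List, Set, Tuple


class IdempotentSettingsHandler:
    """Applies 'update setting' messages at most once per message id."""

    def __init__(self) -> None:
        self.processed: Set[str] = set()  # durable shared storage in practice
        self.settings: Dict[str, str] = {}
        self.history: List[Tuple[str, str]] = []  # audit trail of applied changes

    def handle(self, message_id: str, key: str, value: str) -> bool:
        """Returns True if the change was applied, False for a duplicate delivery."""
        if message_id in self.processed:
            return False  # redelivered message: acknowledge it but change nothing
        self.settings[key] = value
        self.history.append((key, value))
        # Recorded after the work: a crash in between leads to a re-run, not a lost update.
        self.processed.add(message_id)
        return True


handler = IdempotentSettingsHandler()
outcomes = [handler.handle("msg-1", "timezone", "UTC"),
            handler.handle("msg-1", "timezone", "UTC")]  # same message delivered twice
print(outcomes, len(handler.history))  # [True, False] 1
```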

Extra Small VMs
• Hidden gem among the instance sizes
• 4x cost advantage
• If you can live within the bandwidth and memory constraints, big bang for the buck.

Websites versus WebRoles
• Git deployment on Windows Azure Websites is very nice
• The ability to use RoleEntryPoint on WebRoles can be more important than ease of deployment
  • Ability to control dependencies
  • Scale beyond Websites and VMs
  • SSL support is free
• WebRoles FTW

Single vs Many Deployments
• Many deployments are required for geo-redundancy.
• Coordinating upgrades becomes challenging
• The ability to dynamically route messages works to your advantage
  • Just ‘turn off’ a geo-route until the upgrade completes
  • If a datacenter is having ‘issues’, you can remove it from routing
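Turning a geo-route off can be as simple as a routing table with an enabled flag per datacenter and an ordered list of alternates. A sketch under assumed names (`RouteTable`, the region names, and the fallback order are hypothetical): while a region is disabled for an upgrade or an incident, its messages go to the next enabled alternate, or are held in the queue if there is none. Falling back to another datacenter trades away "work close to your data", so the fallback list is a deliberate choice.

```python
from typing import Dict, List, Optional


class RouteTable:
    """Maps a tenant's home region to the datacenter that should process its messages."""

    def __init__(self, fallbacks: Dict[str, List[str]]) -> None:
        self.fallbacks = fallbacks                  # region -> ordered alternates
        self.enabled = {region: True for region in fallbacks}

    def disable(self, region: str) -> None:
        self.enabled[region] = False                # e.g. during an upgrade

    def enable(self, region: str) -> None:
        self.enabled[region] = True

    def target(self, home_region: str) -> Optional[str]:
        """Home region if enabled, else the first enabled alternate, else None (hold)."""
        for region in [home_region, *self.fallbacks.get(home_region, [])]:
            if self.enabled.get(region, False):
                return region
        return None


routes = RouteTable({"northeurope": ["westeurope"], "westeurope": ["northeurope"],
                     "westus": []})
routes.disable("northeurope")                  # upgrade in progress
during_upgrade = routes.target("northeurope")
routes.disable("westus")
held = routes.target("westus")                 # no alternate: leave the message queued
routes.enable("northeurope")
after_upgrade = routes.target("northeurope")
print(during_upgrade, held, after_upgrade)     # westeurope None northeurope
```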

Deployment Patterns
• PaaS vs IaaS
• Extra Small VMs
• Azure Websites vs WebRoles
• Single Deployment vs many

Handling outages and disaster
• Outages are extremely common
  • Trust me
  • Service degradations
• Every service you use has its own SLA
• The best you can do is the product of the individual SLAs
  • E.g. 99.95% × 99.9% × 99.9% ≈ 99.75%
• Your service will go down
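The composite-SLA arithmetic, worked through: availabilities multiply as fractions (assuming every service is required and they fail independently), and the result translates into expected downtime over a year of 8,760 hours.

```latex
\[
A = 0.9995 \times 0.999 \times 0.999 = 0.9975019995 \approx 99.75\%
\]
\[
(1 - A) \times 8760\ \text{h} \approx 0.002498 \times 8760\ \text{h} \approx 21.9\ \text{h of downtime per year}
\]
```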

Remediation Strategies
• Weigh your risk
  • Trade-offs abound: what is your single point of failure?
• Queues can help
  • Distributed queues might be necessary
• Geo-distribution
  • Fault domains
  • Multi-datacenter
  • Multi-cloud

Recap
• Building for Scale
• Automation and Testing
• Deployment Patterns
• Disaster and Recovery

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.