From Development to Deployment (ESaaS §12.1) © 2013 Armando Fox & David Patterson, all rights reserved

From Development to Deployment · 2014. 1. 11.


Outline of topics

•  Continuous integration & continuous deployment
•  Upgrades & feature flags
•  Availability & responsiveness
•  Monitoring
•  Relieving pressure on the database
•  Defending customer data

Development vs. Deployment

Development:
•  Testing to make sure your app works as designed

Deployment:
•  Testing to make sure your app works when used in ways it was not designed to be used

Bad News

•  “Users are a terrible thing”
•  some bugs only appear under stress
•  production environment != development environment
•  the world is full of evil forces
•  and idiots

Good News: PaaS makes deployment way easier

•  get Virtual Private Server (VPS), maybe in cloud
•  install & configure Linux, Rails, Apache, mysqld, openssl, sshd, ipchains, squid, qmail, logrotate…
•  fix almost-weekly security vulnerabilities
•  find yourself in Library Hell
•  tune all moving parts to get most bang for buck
•  figure out how to automate horizontal scaling

Our goal: stick with PaaS!

Is this really feasible?
•  Pivotal Tracker & Basecamp each run on a single DB (128GB commodity box <$10K)
•  Many SaaS apps are not world-facing (internal or otherwise limited interest)

PaaS handles…                        | We handle…
“Easy” tiers of horizontal scaling   | Minimize load on database
Component-level performance tuning   | Application-level performance tuning (e.g. caching)
Infrastructure-level security        | Application-level security

“Performance & security” defined

Performance / Stability:
•  Availability or Uptime
– What % of time is site up & accessible?
•  Responsiveness
– How long after a click does user get response?
•  Scalability
– As # users increases, can you maintain responsiveness without increasing cost/user?

Security:
•  Privacy
– Is data access limited to the appropriate users?
•  Authentication
– Can we trust that user is who s/he claims to be?
•  Data integrity
– Is users’ sensitive data tamper-evident?

Let R = RottenPotatoes app's availability, H = Heroku's availability, C = Internet connection availability, P = Armando's perception of RP availability. Which relationship among these quantities holds?

•  P ≥ min(C, H, R)
•  P ≤ C ≤ min(H, R)
•  Can’t tell without additional information
•  P ≤ C ≤ H ≤ R

Quantifying Availability and Responsiveness (ESaaS §12.2)

Availability and Response time

•  Gold standard: US public phone system, 99.999% uptime (“five nines”)
– Rule of thumb: 5 nines ~ 5 minutes/year
– Since each nine is an order of magnitude, 4 nines ~ 50 minutes/year, etc.
– Good Internet services get 3-4 nines
•  Response time: how long after I interact with site do I perceive response?
– For small content on fast network, dominated by latency (not bandwidth)
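
The "nines" rule of thumb above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name is made up for illustration and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Three nines works out to roughly 526 minutes (a little under 9 hours) per year, four nines to roughly 53 minutes, and five nines to roughly 5 minutes.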

Is response time important?

•  How important is response time?*
– Amazon: +100ms => 1% drop in sales
– Yahoo!: +400ms => 5-9% drop in traffic
– Google: +500ms => 20% fewer searches
•  Classic studies (Miller 1968, Bhatti 2000)
– <100 ms is “instantaneous”
– >7 sec is abandonment time
•  http://code.google.com/speed
•  “Speed is a feature” (Jeff Dean, Google Fellow)

* Source: Nicole Sullivan (Yahoo! Inc.), Design Fast Websites, http://www.slideshare.net/stubbornella/designing-fast-websites-presentation

Simplified (& false) view of performance

•  For a normal distribution of response times around the mean: ±2 standard deviations around the mean is the 95% confidence interval
•  Average response time T means:
– 95%ile users are getting T+2σ
– 99.7% users get T+3σ

A real example

[Figure: distribution of measured response times, with the 25th, 50th (median), 75th, and 95th percentiles and the mean marked]

Courtesy Bill Kayser, Distinguished Engineer, New Relic. http://blog.newrelic.com/breaking-down-apdex Used with permission of the author.
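
Why the normal-distribution view misleads can be sketched with simulated data. The lognormal model and its parameters below are assumptions chosen only to produce a long tail; they are not the New Relic measurements.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Assumed long-tailed model of response times in seconds (not real data).
samples = sorted(random.lognormvariate(0.0, 1.0) for _ in range(10_000))


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


mean = statistics.fmean(samples)
median = percentile(samples, 50)
p95 = percentile(samples, 95)
print(f"median={median:.2f}s  mean={mean:.2f}s  95th percentile={p95:.2f}s")
```

With a long tail the mean sits well above the median, so "average response time" says little about what a typical user or the slowest 5% of users experience.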

Service Level Objective (SLO)

•  Time to satisfy user request (“latency” or “response time”)
•  SLO: Instead of worst case or average: what % of users get acceptable performance
•  Specify %ile, target response time, time window
– e.g., 99% < 1 sec, over a 5 minute window
– why is time window important?
•  Service level agreement (SLA) is an SLO to which provider is contractually obligated
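
One way to see why the time window matters is to evaluate the same SLO over a short and a long window. This is a sketch with made-up latencies; the function name and the rule that an empty window counts as compliant are assumptions.

```python
def meets_slo(latencies, target_seconds=1.0, required_fraction=0.99):
    """True if at least `required_fraction` of requests beat the target."""
    if not latencies:
        return True  # assumption: a window with no requests is compliant
    fast = sum(1 for t in latencies if t < target_seconds)
    return fast / len(latencies) >= required_fraction


# Hypothetical hour of traffic: twelve 5-minute windows of 100 requests each.
good_window = [0.2] * 100
bad_window = [0.2] * 90 + [3.0] * 10  # 10% of requests are slow
hour = good_window * 11 + bad_window

print("whole hour meets SLO:", meets_slo(hour))        # 1190/1200 fast
print("bad 5-minute window: ", meets_slo(bad_window))  # 90/100 fast
```

Measured over the whole hour the site looks compliant, yet for five minutes one user in ten waited 3 seconds. A long window averages such episodes away.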

Apdex: simplified SLO

•  Given a threshold latency T for user satisfaction:
– Satisfactory requests take t ≤ T
– Tolerable requests take T < t ≤ 4T
– Apdex = (#satisfactory + 0.5(#tolerable)) / #reqs
– 0.85 to 0.93 generally “good”
•  Warning! Can hide systematic outliers if not used carefully!
– e.g. critical action occurs once in every 15 clicks but takes 10x as long => (14+0)/15 > 0.9
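
The Apdex formula and the outlier warning above can be worked through directly. A minimal sketch; the latencies are invented to match the "1 click in 15 takes 10x as long" scenario.

```python
def apdex(latencies, threshold):
    """Apdex = (#satisfactory + 0.5 * #tolerable) / #requests."""
    satisfactory = sum(1 for t in latencies if t <= threshold)
    tolerable = sum(1 for t in latencies if threshold < t <= 4 * threshold)
    return (satisfactory + 0.5 * tolerable) / len(latencies)


T = 1.0  # assumed satisfaction threshold, in seconds
# 14 ordinary clicks at 0.5 s, plus one critical action taking 10x as long.
clicks = [0.5] * 14 + [5.0]
print(f"Apdex = {apdex(clicks, T):.3f}")
```

The slow action exceeds 4T, so it contributes nothing to the numerator, and the score is 14/15 ≈ 0.933. That still reads as "good" even though the critical action is slow every single time.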

Apdex Visualization

[Figure: same response-time data scored with two thresholds]
•  T = 1500 ms, Apdex = 0.7
•  T = 1000 ms, Apdex = 0.49

What to do if site is slow?

•  Small site: overprovision
– applies to presentation & logic tier
– before cloud computing, this was painful
– today, it’s largely automatic (e.g. Rightscale)
•  Large site: worry
– Overprovisioning a 1,000-computer site by 10% = 100 idle computers
•  Insight: same problems that push us out of PaaS-friendly tier are the ones that will dog us when larger!

RottenPotatoes’ target uptime is 99.9%. Yesterday there was a one hour outage. Which statement is true?

•  RottenPotatoes can still meet its uptime goal if there are no further outages this year
•  If no users actually tried to get to the site during the outage, uptime wasn’t hurt
•  There isn’t enough information to determine whether RottenPotatoes can meet its user-perceived uptime goal
•  Because of the outage, RottenPotatoes has no hope of meeting its uptime goal this year

Continuous Integration & Continuous Deployment (ESaaS §12.3)

Releases Then and Now: Windows 95 Launch Party

Releases Then and Now

•  Facebook: master branch pushed once a week, aiming for once a day (Bobby Johnson, Dir. of Eng., in late 2011)
•  Amazon: several deploys per week
•  StackOverflow: multiple deploys per day (Jeff Atwood, co-founder)
•  GitHub: tens of deploys per day (Zach Holman)
•  Rationale: risk == # of engineer-hours invested in product since last deploy

Like development and feature check-in, deployment should be a non-event that happens all the time

Successful Deployment

•  Automation: consistent deploy process
– PaaS sites like Heroku, CloudFoundry already do this
– Use tools like Capistrano for self-hosted sites
•  Continuous integration: integration-testing the app beyond what each developer does
– Pre-release code checkin triggers CI
– Since checkins are frequent, CI is always running
– Common strategy: integrate with GitHub

https://github.com/saasbook/hw2_rottenpotatoes/admin/hooks

Why CI?

•  Differences between dev & production envs
•  Cross-browser or cross-version testing
•  Testing SOA integration when remote services act wonky
•  Hardening: protection against attacks
•  Stress testing/longevity testing of new features/code paths
•  Example: Salesforce CI runs 150K+ tests and automatically opens bug report when test fails

Continuous Deployment

•  Push => CI => deploy several times per day
– deploy may be auto-integrated with CI runs
•  So are releases meaningless?
– Still useful as customer-visible milestones
– “Tag” specific commits with release names:
    git tag 'happy-hippo' HEAD
    git push --tags
– Or just use Git commit ID to identify release

RottenPotatoes just got some new AJAX features. Where does it make sense to test these features?

•  In CI
•  In the staging environment
•  All of these
•  Using autotest with RSpec+Cucumber

Upgrades & Feature Flags (ESaaS §12.4)
Armando Fox

The trouble with upgrades

•  What if upgraded code is rolled out to many servers?
– During rollout, some will have version n and others version n+1…will that work?
•  What if upgraded code goes with schema migration?
– Schema version n+1 breaks current code
– New code won’t work with current schema

Naïve update

1.  Take service offline
2.  Apply destructive migration, including data copying
3.  Deploy new code
4.  Bring service back online

•  May result in unacceptable downtime

http://pastebin.com/5dj9k1cj

Incremental Upgrades with Feature Flags

1.  Do nondestructive migration
2.  Deploy method protected by feature flag
3.  Flip feature flag on; if disaster, flip it back
4.  Once all records moved, deploy new code without feature flag
5.  Apply migration to remove old columns

http://pastebin.com/TYx5qaSB

http://pastebin.com/qqrLfuQh
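
Steps 1 to 3 above can be sketched in a few lines. This is an illustration only, not the code behind the pastebin links: the record layout (`name` split into `first_name`/`last_name`), the `migrated` marker, and the in-memory `flags` dict standing in for a database-backed flag are all assumptions.

```python
flags = {"split_name": False}  # hypothetical flag store; stands in for a DB table


def add_new_columns(record):
    """Step 1: nondestructive migration. The old 'name' column is left intact."""
    record.setdefault("first_name", None)
    record.setdefault("last_name", None)
    record.setdefault("migrated", False)


def display_name(record):
    """Step 2: flag-protected method that moves a record on first use."""
    if not flags["split_name"]:
        return record["name"]  # old code path against the old column
    if not record["migrated"]:
        first, _, last = record["name"].partition(" ")
        record.update(first_name=first, last_name=last, migrated=True)
    return f'{record["first_name"]} {record["last_name"]}'.strip()


user = {"name": "Ada Lovelace"}
add_new_columns(user)
print(display_name(user), user["migrated"])  # flag off: old path, not migrated

flags["split_name"] = True   # Step 3: flip the flag on
print(display_name(user), user["migrated"])  # new path, migrated lazily

flags["split_name"] = False  # disaster? flip it back; old column still there
print(display_name(user))
```

Because the old column is never touched until step 5, turning the flag off restores the old behavior without a down-migration.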

“Undoing” an upgrade

•  Disaster strikes…use down-migration?
– is it thoroughly tested?
– is migration reversible?
– are you sure someone else didn’t apply an irreversible migration?
•  Use feature flags instead
– down-migrations are primarily for development

Other uses for feature flags

•  Preflight checking: gradual rollout of feature to increasing numbers of users
– to scope for performance problems, e.g.
•  A/B testing
•  Complex feature whose code spans multiple deploys
•  rollout gem (on GitHub) covers these cases and more
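
Gradual rollout is often done by hashing each user into a stable bucket and enabling the feature for buckets below a percentage. The sketch below shows that general idea only; it is not the rollout gem's API, and the function and feature names are hypothetical.

```python
import zlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` of 100 buckets."""
    # CRC32 is stable across runs, so a user keeps the same bucket.
    bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
    return bucket < percent


users = [f"user{i}" for i in range(1000)]
for percent in (0, 10, 50, 100):
    enabled = sum(in_rollout(u, "new_checkout", percent) for u in users)
    print(f"{percent:3d}% rollout -> {enabled} of {len(users)} users enabled")
```

Raising the percentage only ever adds users, so nobody sees the feature flicker on and off as the rollout widens.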

Which one, if any, is a POOR place to store the value (e.g. true/false) of a feature flag?

•  A column in an existing database table
•  A separate database table
•  These are all good places to store feature-flag values
•  A YAML file in config/ directory of app

Monitoring (ESaaS §12.5)
Armando Fox

Kinds of monitoring

•  “If you’re not monitoring it, it’s probably broken”
•  At development time (profiling)
– Identify possible performance/stability problems before they get to production
•  In production
– Internal: instrumentation embedded in app and/or framework (Rails, Rack, etc.)
– External: active probing by other site(s)

Why use external monitoring?

•  Detect if site is down
•  Detect if site is slow for reasons outside measurement boundary of internal monitoring
•  Get user’s view from many different places on the Internet
•  Example: Pingdom
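
An external probe is conceptually simple: request a page from outside and classify the result. The sketch below is an assumption-laden illustration, not how Pingdom works; the function names, the 1-second "slow" cutoff, and the stub fetcher are invented, and the fetcher is injectable so the logic can be exercised without network access.

```python
import time
import urllib.request


def fetch_status(url, timeout):
    """Default fetcher: HTTP GET that returns the response status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def probe(url, fetch=fetch_status, timeout=5.0, slow_after=1.0):
    """Classify a site as 'up', 'slow', or 'down' as seen from outside."""
    start = time.perf_counter()
    try:
        status = fetch(url, timeout)
    except Exception:  # DNS failure, refused connection, timeout, HTTP error
        return "down"
    elapsed = time.perf_counter() - start
    if status >= 500:
        return "down"
    return "slow" if elapsed > slow_after else "up"


# Stub fetcher standing in for a real site, so this runs offline.
print(probe("http://example.invalid/", fetch=lambda url, timeout: 200))
```

A real service would run such probes on a schedule from several locations, which is what gives the "user's view from many different places".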

Internal monitoring

•  pre-SaaS/PaaS: local
– Info collected & stored locally, e.g. Nagios
•  Today: hosted
– Info collected in your app but stored centrally
– Info available even when app is down
•  Example: New Relic
– conveniently, has both a development mode and production mode
– basic level of service is free for Heroku apps


Sampling of monitoring tools

What is monitored                                        | Level   | Example tool                      | Hosted
Availability                                             | site    | pingdom.com                       | Yes
Unhandled exceptions                                     | site    | airbrake.com                      | Yes
Slow controller actions or DB queries                    | app     | newrelic.com (also has dev mode)  | Yes
Clicks, think times                                      | app     | Google Analytics                  | Yes
Process health & telemetry (MySQL server, Apache, etc.)  | process | god, monit, nagios                | No

•  Interesting: Customer-readable monitoring features with cucumber-newrelic
   http://pastebin.com/TaecHfND

What to measure?

•  Stress testing or load testing: how far can I push my system...
– ...before performance becomes unacceptable?
– ...before it gasps and dies?
•  Usually, one component will be bottleneck
– a particular view, action, query, …
•  Load testers can be simple or sophisticated
– bang on a single URI over and over
– do a fixed sequence of URIs over and over
– play back a log file
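
The "fixed sequence of URIs over and over" style of load tester can be sketched as below. It is a single-threaded toy: the `request` callable, the URIs, and the simulated delays are assumptions, and a real tester would issue requests concurrently against a real server.

```python
import time


def run_load_test(request, uris, repetitions):
    """Replay a fixed sequence of URIs; return observed latencies per URI."""
    latencies = {uri: [] for uri in uris}
    for _ in range(repetitions):
        for uri in uris:
            start = time.perf_counter()
            request(uri)
            latencies[uri].append(time.perf_counter() - start)
    return latencies


def slowest_uri(latencies):
    """URI with the largest mean latency: the likely bottleneck."""
    return max(latencies, key=lambda uri: sum(latencies[uri]) / len(latencies[uri]))


def fake_request(uri):
    """Stand-in for an HTTP call; pretends the search action is slow."""
    time.sleep(0.02 if uri == "/movies/search" else 0.001)


results = run_load_test(fake_request, ["/movies", "/movies/search"], repetitions=5)
print("likely bottleneck:", slowest_uri(results))
```

Grouping latencies by URI is what surfaces the single view, action, or query that dominates response time.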

Longevity Bugs

•  Resource leak (RAM, file buffers, sessions table) is classic example
•  Some infrastructure software such as Apache already does rejuvenation
– aka “rolling reboot”
•  Related: running out of sessions
– Solution: store whole session[] in cookie (Rails 3 does this by default)

Which is probably not a metric of high interest to you, the app operator?

•  Maximum CPU utilization
•  99%ile response time
•  Rendering time of 3 slowest views
•  Slowest queries