
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent 2013


DESCRIPTION

Parse is a backend as a service (BaaS) for mobile developers that is built entirely on AWS. With over 150,000 mobile apps hosted on Parse, the stability of the platform is our primary concern, but it coexists with rapid growth and a demanding release schedule. This session is a technical discussion of the current architecture and the design decisions that went into scaling the platform rapidly and robustly over the past year and a half. We talk about some of the lessons learned managing and scaling MongoDB, Cassandra, Redis, and MySQL in the cloud. We also discuss how Parse went from launching individual instances using Chef to managing clusters of hosts with Auto Scaling groups, with instance discovery and registry handled by ZooKeeper, thus enabling us to manage vastly larger sets of services with fewer human resources. This session is useful to anyone who is trying to scale up from startup to established platform without sacrificing agility.


© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Charity Majors

November 14, 2013

How Parse built a mobile backend as a service

Friday, November 15, 13


What is Parse?
• Platform for mobile developers
  • iOS, Android, WinRT
  • API and native SDKs
  • Scales automatically to handle traffic
  • Analytics, cloud code, file storage, push notifications, hosting


Parse is magic.


Parse is built on AWS
• Parse has never touched bare metal
• Recently acquired by Facebook
• Current plan is to stay on AWS
• We love AWS!


Parse is growing fast
• Developers
• Apps
• API usage
• Nodes and compute resources
• Connected devices


[Chart: Developers, monthly from 6/11/11 through 9/11/13, with an annotation marking when Parse was acquired]


Top left: Parse Grid Load last year

Top Right: Number of Hits last year

Bottom Left: Active PPNS Connections last year


1.5 years ago


Now


Parse ops philosophy
• Work smarter, not harder
• Small team, full stack generalists
• Automate, automate, automate
• Our goal:
  • 80% time working on things we want to do
  • 20% time working on things we have to do


Past & Present

October 2012
• 60% time spent on must-do’s
• 40% time spent on want-to-do’s
• ~400 event alerts
• Very sleepy opsen

October 2013
• 20% time spent on must-do’s
• 80% time spent on want-to-do’s
• ~100 event alerts (mostly daytime)
• Infra complexity has 5x’d but time to manage it has dropped
• We have shifted a lot of work from ourselves to AWS


Takeaways
• ASGs are your best friend
• Automation should be reusable
• Choose your source of truth carefully


Parse stack


Infrastructure design choices
• Chef
• Amazon Route 53
• Use real hostnames
• Distribute evenly across 3 AZs
• Fail over automatically
• Single source of truth


Amazon EC2 design choices
• Standardize on a few instance types
  • Makes reserved instances more efficient
  • We use m1.large, m1.xlarge, m2.4xlarge (multi-core is a must). Prefer many small disposable instances for stateless services.
• Security groups
  • One group per role
  • Verify working set with expected set using git/nagios
• All inbound requests come through Elastic Load Balancing
  • Nothing talks directly to Amazon EC2 instances
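
The "verify working set with expected set" check above can be sketched as a plain set comparison. This is a minimal illustration, not Parse's actual tooling: the rule tuple shape, the group names, and the idea that the expected rules are loaded from a file kept in git are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# A rule is (protocol, port, source); this shape is an assumption for the sketch.
Rule = Tuple[str, int, str]
Groups = Dict[str, Set[Rule]]


def diff_security_groups(expected: Groups, actual: Groups) -> Dict[str, Dict[str, Set[Rule]]]:
    """Return, per group, the rules that are missing from or unexpected in the live state."""
    report = {}
    for name in sorted(set(expected) | set(actual)):
        want = expected.get(name, set())
        have = actual.get(name, set())
        missing = want - have
        unexpected = have - want
        if missing or unexpected:
            report[name] = {"missing": missing, "unexpected": unexpected}
    return report


# Hypothetical data: "expected" would come from the repo, "actual" from the cloud API.
expected = {
    "api": {("tcp", 8080, "sg-elb")},
    "mongo": {("tcp", 27017, "sg-api")},
}
actual = {
    "api": {("tcp", 8080, "sg-elb"), ("tcp", 22, "0.0.0.0/0")},
    "mongo": {("tcp", 27017, "sg-api")},
}

drift = diff_security_groups(expected, actual)
```

A monitoring check could then alert whenever the report is non-empty, which here would flag the stray SSH rule on the hypothetical "api" group.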


API path
• Elastic Load Balancing
• nginx
• haproxy
• Ruby app servers (unicorns)
• Go API servers (Go rewrite from the ground up)
• Go logging servers to FB endpoint


Hosting
• Elastic Load Balancing
• Elastic IPs for apex domain redirect service
• Go service that wraps cloud code and Amazon S3


Cloud code
• Server-side JavaScript in V8 virtual machine
• Third-party modules for partners (Stripe, Twilio, etc.)
• Restrictive security groups
• Scrub IPs with squid


Push
• Resque on redis
• Billions of pushes per month
• 700/sec steady state
• Spikes to 10k/sec (15x burst)
• PPNS holds sockets open to all android devices
• PDNS to serve android phone-home IPs
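
As a rough sanity check on the push figures above, the steady-state rate can be converted to a monthly volume and compared with the burst rate. The 30-day month and the constant steady-state rate are simplifying assumptions, not numbers from the talk.

```python
STEADY_PER_SEC = 700       # steady-state pushes per second, from the slide
BURST_PER_SEC = 10_000     # peak pushes per second, from the slide
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumes a 30-day month

# Monthly volume if the steady-state rate held all month.
pushes_per_month = STEADY_PER_SEC * SECONDS_PER_MONTH
# How many times larger the burst rate is than the steady-state rate.
burst_ratio = BURST_PER_SEC / STEADY_PER_SEC

print(f"{pushes_per_month:,} pushes/month at steady state")
print(f"burst is about {burst_ratio:.1f}x steady state")
```

Under these assumptions the steady-state rate alone gives roughly 1.8 billion pushes a month, and the burst is a little over 14 times the steady rate, both in line with the rounded figures on the slide.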


MongoDB
• 12 replica sets, ~50 nodes, 2-4 TB per replica set
• Over 1M collections
• Over 170k schemas
• Autoindexing of keys based on entropy
• Compute compound indexes from real traffic analysis
• Implemented our own app-level sharding
• PIOPS (striped RAID, 2000-8000 PIOPS/vol)
  • totally saved our bacon. Amazon EBS was a killer.
• Fully instrumented provisioning with chef
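
The slide does not say how entropy is measured for autoindexing. One plausible reading, shown purely as an illustration, is Shannon entropy over sampled values of a key: a key whose values are nearly all the same carries little information and makes a poor index, while a key with many distinct values scores high. The threshold, function names, and sample documents below are invented for the sketch.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable, List


def shannon_entropy(values: Iterable[Any]) -> float:
    """Entropy in bits of the empirical distribution of the given values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def index_candidates(docs: List[Dict[str, Any]], min_bits: float = 1.5) -> List[str]:
    """Keys whose sampled values reach min_bits of entropy (hypothetical threshold)."""
    keys = sorted({k for d in docs for k in d})
    return [k for k in keys if shannon_entropy(d.get(k) for d in docs) >= min_bits]


# Hypothetical sample: "user" is distinct per document, "active" is mostly True.
sample = [
    {"user": "a", "active": True},
    {"user": "b", "active": True},
    {"user": "c", "active": True},
    {"user": "d", "active": False},
]
candidates = index_candidates(sample)
```

With this sample only "user" clears the threshold, since "active" splits 3 to 1 and carries well under one bit.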


Memcache
• Pool of memcaches with consistent hash
• I would use ElastiCache instead next time
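
For readers unfamiliar with the technique, a consistent hash ring can be sketched in a few lines. This is a generic textbook version with virtual nodes, not Parse's client code; the class name, the node names, and the choice of MD5 and 100 virtual nodes per host are assumptions for the example.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node owns several points on the ring to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key: str) -> str:
        if not self._ring:
            raise LookupError("empty ring")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache1", "cache2", "cache3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.remove("cache2")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in `moved`: when a cache host leaves the pool, only the keys that lived on that host are remapped, so the rest of the cache stays warm.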


Redis
• Queueing using resque
• Android outboxes
• Single-threaded
• Just started playing with ElastiCache redis


MySQL
• Trivially tiny and we would love to get rid of it
  • ... but rails
• Considered Amazon RDS
  • No chained replication
  • Visibility is challenging
  • Even tiny periodic blips impact the API
  • ... but AZ failover would be sooo nice


Cassandra
• Powers the front-end Parse Analytics
• Super fast writes and increments
• 12 node cluster of m2.4xlarge
• Ephemeral storage
  • Cheap & won our benchmarks


Cassandra + Priam
• Initial token assignments
• Incremental backups to Amazon S3
• Uses Auto Scaling groups
• Amazon SimpleDB for tracking tokens, instance identities
• Non-trivial to set up but WORTH IT


Infrastructure


First-generation infrastructure

Characteristics
• Ruby on Rails everywhere
• Chef to build AMIs
• Chef role per service
• Capistrano to deploy code
• Source of truth: git

Effects
• Sooo much hand-editing
• Make the same change in many places
• Full deploy and restart any time a single host is added or removed
• Fine for small static host sets


How to deploy 20 new servers:
• Run 20 knife-ec2 commands to launch 20 hosts
• Edit the cap deploy file
• Edit the yml files, push to git
• Do a cap cold deploy to new hosts
• Do a full deploy/restart to all the services that need to talk to the new hosts

Total time elapsed: 1.5–2.5 hours

OMG not ok.


PROBLEMS
• Babysitting
• Maintaining machine lists by hand
• No consistent human readable host naming
• Requires full code deploy to add single node
• Humans have to know things and make decisions


Second-generation infrastructure

Characteristics
• Ruby on Rails everywhere
• Chef to configure systems
• Chef to generate host lists
• Capistrano to deploy code
• Source of truth: chef

Effects
• YML files, haproxy configs, etc. generated every chef run
• No longer need to do full deploys to affected services, just restart
• Only one set of files to maintain by hand (capistrano)


How to deploy 20 new servers:
• Run 20 knife-ec2 commands to launch 20 hosts
• Edit the cap deploy file
• Do a cap cold deploy to new hosts
• Let chef-client run to generate YML files
• Restart services that need to talk to the new hosts

Total time elapsed: 30-60 minutes

STILL not ok!


What are our primary goals?
• Scale up any class of service in < 5 minutes
• Automatically detect new nodes
• Automatically remove downed nodes from service
• No hand maintained lists ANYWHERE (ugh)
• Deploy fast: no time to build AMIs
• Option of deploying from master
• Design a new deploy process for go binaries


Putting together a solution

Auto Scaling Groups
• Each service lives in an ASG
• Same AMI used for most services
• Base AMI generated by chef
• System state managed by chef
• ASG named after chef role

Jenkins + Amazon S3
• Runs unit tests
• Generates a tarball artifact for each successful build
• Uploads to Amazon S3, tagged with the build # and role


Autoification

auto-bootstrap
• Runs on first boot
• Infers chef role from ASG name
• Generates a client.rb and initial runlist
• Registers DNS with Amazon Route 53
• Grabs a lock from zookeeper, so DNS is atomic
• Bootstraps chef
• Auto-deploy

auto-deploy
• Infers the chef role from ASG name
• Pulls build artifact from Amazon S3
• Unpacks tarball, restarts
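
The first two bootstrap steps, inferring the chef role from the ASG name and generating a client.rb plus an initial run list, can be sketched as below. This is a hedged illustration only: the function names, the node name, and the server URL are invented, and the sketch assumes the ASG is named exactly after the role. DNS registration and the zookeeper lock are not shown.

```python
import json


def role_from_asg_name(asg_name: str) -> str:
    """Assumes the ASG is named exactly after the chef role, as the slide suggests."""
    role = asg_name.strip()
    if not role:
        raise ValueError("empty ASG name")
    return role


def render_client_rb(node_name: str, chef_server_url: str) -> str:
    """Render a minimal client.rb; the values passed in are placeholders."""
    return (
        f'chef_server_url "{chef_server_url}"\n'
        f'node_name "{node_name}"\n'
        "log_level :info\n"
    )


def render_first_boot(role: str) -> str:
    """JSON attributes file carrying the initial run list for the first chef run."""
    return json.dumps({"run_list": [f"role[{role}]"]})


# Hypothetical values; a real script would read the ASG name from instance metadata or tags.
role = role_from_asg_name("api")
client_rb = render_client_rb("api-i-0abc123", "https://chef.example.com")
first_boot = render_first_boot(role)
```

Because the role is derived from the group name, the same base AMI can boot into any service without a per-host list being edited by hand.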


A better source of truth

zookeeper
• We LOVE zookeeper!!
• Service registration, service discovery
• Distributed locking
• Coordinated actions, unique ids

How it works
• zkwatcher detects the service is up, creates an ephemeral node in zk
• Or the service registers itself
• Ephemeral node goes away, service gets deregistered
• Capistrano asks zookeeper for the list of alive servers to deploy to
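
The ephemeral-node pattern can be illustrated without a ZooKeeper cluster. The toy registry below only models the semantics described above: a node tied to a session disappears when that session ends, and a deploy tool asks for whatever is currently alive. The class, method names, and paths are invented; a real deployment would use a ZooKeeper client library instead.

```python
from typing import Dict, List


class ToyRegistry:
    """In-memory stand-in for ephemeral znodes; not a ZooKeeper client."""

    def __init__(self) -> None:
        self._nodes: Dict[str, int] = {}  # path -> owning session id
        self._next_session = 1

    def open_session(self) -> int:
        session = self._next_session
        self._next_session += 1
        return session

    def register(self, session: int, service: str, host: str) -> str:
        path = f"/services/{service}/{host}"
        self._nodes[path] = session
        return path

    def close_session(self, session: int) -> None:
        # Ephemeral nodes vanish together with the session that created them.
        self._nodes = {p: s for p, s in self._nodes.items() if s != session}

    def alive(self, service: str) -> List[str]:
        prefix = f"/services/{service}/"
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


zk = ToyRegistry()
s1, s2 = zk.open_session(), zk.open_session()
zk.register(s1, "api", "api1")
zk.register(s2, "api", "api2")
before = zk.alive("api")   # what a deploy tool would see with both hosts up
zk.close_session(s1)       # e.g. host api1 dies and its session expires
after = zk.alive("api")    # api1 is deregistered without anyone editing a list
```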


Third-generation infrastructure

Characteristics
• Some go, some ruby
• Chef to maintain state
• ASG per chef role
• Capistrano + zk + jenkins + Amazon S3
• Source of truth: zookeeper

Effects
• No lists of hosts
• No manual labor
• Happy opsen


Deploy 20 new servers:
• Adjust the size of the ASG
• Have a cocktail

Total time elapsed: 5-10 minutes

YAY!


ASG caveats
• Amazon CloudWatch triggers are minimally useful for us
  • Our bursts are usually too short and sharp
  • No periodicity to our traffic patterns
  • ... but we are lazy so we would like to add them anyway
• Need more tooling around downsizing ASGs gracefully
• Initial chef run may take 5-7 minutes
  • Could someday optimize this
  • Or eat the overhead of building AMIs with each successful jenkins build


Remaining issues
• When we get rid of ruby, get rid of cap
  • Just use auto-deploy for everything
  • Trigger a deploy by updating build version # in zookeeper
• Automatic failover for mysql and redis
• Move everything into VPC
  • ASGs will really help with this!
  • Then we can use internal load balancers instead of haproxy. Want badly.


Takeaways
• Single source of truth, or multiple sources of lies
• The more real-time your source of truth, the faster your response time can be
• ASGs are amazing <3 <3


Q&A
