The Road to the White House with Puppet & AWS



Learn how the Obama campaign leveraged Amazon Web Services (AWS) and Puppet to rapidly and sustainably scale its infrastructure to meet the needs of the election. Using the automation that AWS and Puppet enabled, the Obama campaign built a significant AWS infrastructure (http://awsofa.info) with a lean DevOps team, tight deadlines, and applications that needed to be highly available. Learn about bootstrapping Puppet on Amazon EC2 instances with CloudInit, using it with Auto Scaling groups, and handling credentials securely in manifests. Find out how to scale Puppet masters and take advantage of Amazon S3-backed RPM/Debian repos with them.

Leo Zhadanovsky, Senior Solutions Architect, Amazon Web Services

Leo Zhadanovsky is a Senior Solutions Architect at Amazon Web Services. He helps customers make the best use of AWS services so they can build highly available, scalable, and elastic architectures for their business needs. He was previously the Director of Systems Engineering at the Democratic National Committee. From 2009 to early 2013, he ran the DNC's physical server and cloud footprint and supported infrastructure used by the Obama campaign and by state and local Democratic parties. In 2010, the DNC successfully deployed and ran many applications, such as a call tool and a voter registration website, that were written in Ruby and ran on AWS. In 2012, the DNC supported the Obama campaign with various backend APIs, websites, voter file databases, and a large data warehouse.


The Road to the White House with Puppet & AWS

Leo Zhadanovsky – Solutions Architect – leo@amazon.com @leozh

What am I talking about today?

What was OFA Tech?
• Who did it?
• What did they build?

How did they do that?
• Technologies and Tradeoffs
• Services vs. Software

How did they leverage puppet?

What did they learn from building something so big?

Who Am I?

I work for AWS
I worked for the DNC, 2009-2012

I was embedded at OFA

AWS does not endorse political candidates

I love Star Trek (TNG is the best)

So here’s the Idea

~30th biggest E-commerce operation, globally
~200 distinct new applications, many mobile
Hundreds of new, untested analytical approaches
Processing hundreds of TB of data on thousands of servers
Spikes of hundreds of thousands of concurrent users

FUN FUN FUN

a few constraints…

~30th biggest E-commerce operation, globally
~200 distinct applications, many mobile
Hundreds of new, untested analytical approaches
Processing hundreds of TB of data on thousands of servers
Spikes of hundreds of thousands of concurrent users

Critically compressed budget
Less than a year to execute
Volunteer and near-volunteer development team
Core systems will be used for a single critical day
Constitutionally-mandated completion date

NOT FUN NOT FUN NOT FUN

Built by guys and gals like these: Obama For America

Business as usual..

…for a technology startup

Election Day – OFA Headquarters

So they built it all, and it worked

Typical Charts

How?

The old approach, even from Amazon

The old approach… might have some problems…

No Up-Front Capital Expense

Pay Only for What You Use

Self-Service Infrastructure

Easily Scale Up and Down

Improve Agility & Time-to-Market

Low Cost

Cloud Computing Benefits

Deploy

OFA’s Infrastructure

awsofa.info

Web-Scale Applications

500k+ IOPS DB Systems

Services API

Ingredients

Ubuntu, nginx, Boundary, Unity, jQuery, SQL Server, HBase, New Relic, EC2, node.js, CyberSource, Hive, ElasticSearch, Ruby, Twilio, EE, S3, ELB, boto, Magento, PHP, EMR, SES, Route53, SimpleDB, Campfire, Nagios, PayPal, CentOS, CloudSearch, LevelDB, MongoDB, Python, security groups, Ushahidi, PostgreSQL, GitHub, Apache, Bootstrap, SNS, CloudFormation, Jekyll, RoR, EBS, FPS, VPC, Mashery, Vertica, RDS, Optimizely, MySQL, Puppet, tsunami-UDP, R, Asgard, CloudWatch, ElastiCache, CloudOpt, SQS, cloud-init, Direct Connect, BSD, rsync, STS, Objective-C, DynamoDB

Data Stores


Development Frameworks


Infrastructure, Configuration Management & Monitoring


Configuration Management: Puppet

In mid-2011, we looked at options for configuration management and chose Puppet
We needed to make it scale, and to get it to work with stateless, horizontally scalable infrastructure
How did we do this?

Bootstrapping Puppet with CloudInit

CloudInit is built into Ubuntu and Amazon Linux
• Allows you to pass bootstrap parameters in the Amazon EC2 user-data field, in YAML format
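For illustration (not the campaign's actual user-data), a #cloud-config payload of the kind the slide describes might look like this; the master hostname and certname prefix are placeholders:

    #cloud-config
    # CloudInit's built-in puppet module writes puppet.conf and starts the agent
    puppet:
      conf:
        agent:
          server: "puppetmaster.example.internal"   # placeholder master hostname
          # %i is replaced with the EC2 instance id (and %f with the FQDN)
          certname: "web-%i.example.internal"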

Bootstrapping Puppet with CloudInit

Don’t store creds in Puppet manifests, store them in private Amazon S3 buckets
• Either pass Amazon S3 creds through CloudInit…
• Even better – avoid this by using AWS Identity and Access Management (IAM) roles and the version of s3cmd on GitHub
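As a hedged example of the IAM-role approach, a manifest can fetch a secrets file at apply time instead of embedding it; the bucket, paths, and file names below are hypothetical:

    # Assumes an IAM-role-aware s3cmd build is installed; the instance role's
    # temporary credentials are used, so nothing sensitive lives in the manifest.
    file { '/etc/myapp':
      ensure => directory,
    }

    exec { 'fetch-app-secrets':
      command => 's3cmd get s3://example-secrets-bucket/myapp/secrets.yml /etc/myapp/secrets.yml',
      path    => ['/bin', '/usr/bin', '/usr/local/bin'],
      creates => '/etc/myapp/secrets.yml',
      require => File['/etc/myapp'],
    }

    file { '/etc/myapp/secrets.yml':
      owner   => 'root',
      mode    => '0600',
      require => Exec['fetch-app-secrets'],
    }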

Bootstrapping Puppet with CloudInit

Built-in puppet support

• Use certname with %i for instance id to name the node
• Puppetmaster must have autosign turned on
• Use security groups and/or NACLs for network-level security

In nodes.pp, use regex to match node names
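A minimal nodes.pp sketch to go with an instance-id-based certname (the name pattern and the role class are assumptions, not OFA's actual layout):

    # Matches certnames such as web-i-0a1b2c3d produced by a certname
    # template like "web-%i"; role::web is a hypothetical class.
    node /^web-i-[0-9a-f]+/ {
      include base
      include role::web
    }

    # Fallback for anything that does not match a pattern
    node default {
      include base
    }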

Puppet Tips

Use a base class to define your standard install

Use run stages (both sketched below)

Don’t store credentials in Puppet, store them in private Amazon S3 buckets
• Use AWS IAM to secure the credentials bucket/folders within that bucket
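A minimal sketch of the base-class and run-stage tips together (class, package, and service names are illustrative):

    # site.pp-style sketch: an extra run stage so repositories are configured
    # before any packages in the default 'main' stage are installed
    stage { 'pre':
      before => Stage['main'],
    }

    class repos {
      # S3-backed yum/apt repository definitions go here (see the next tip)
    }

    class base {
      # the standard install every node gets
      package { ['ntp', 'rsyslog']: ensure => installed }

      service { 'ntp':
        ensure  => running,
        enable  => true,
        require => Package['ntp'],
      }
    }

    node default {
      class { 'repos': stage => 'pre' }
      include base
    }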

Puppet Tips

Puppet Tips

Use Puppet only for configuration files and what makes your apps unique
For undifferentiated parts of apps, use Amazon S3-backed RPM/Debian repositories
• Can be either public or private repos, depending on your needs
• Amazon S3 Private RPM Repos: http://git.io/YAcsbg
• Amazon S3 Private Debian Repos: http://git.io/ecCjWQ
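For the public-repo case, a yumrepo resource can point straight at the bucket's HTTP endpoint (the bucket name and layout are made up; the private variants need the plugins linked above):

    yumrepo { 'internal-s3':
      descr    => 'S3-backed RPM repository',
      baseurl  => 'https://example-rpm-repo.s3.amazonaws.com/centos/6/x86_64/',
      enabled  => 1,
      gpgcheck => 0,
    }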

Puppet Tips

By using packages for application deploys, you can set ensure => latest, and just bump the package version in the repo to update
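In manifest terms, a deploy then reduces to pushing a new package build to the repo and letting Puppet converge (the package, service, and repo names are placeholders):

    package { 'myapp':
      ensure  => latest,
      require => Yumrepo['internal-s3'],
      notify  => Service['myapp'],
    }

    service { 'myapp':
      ensure => running,
      enable => true,
    }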

Log everything with rsyslog/graylog/loggly/NewRelic/splunk

Scaling the Puppet Masters

Use an Auto Scaling group for Puppet masters
• Min size => 2, use multiple Availability Zones
• Either have them build themselves off of existing Puppet masters in the group, or off packages stored in Amazon S3 and bootstrapped through user-data
• Autosign must be on
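A hedged sketch of user-data for such a self-building master, assuming Ubuntu's puppetmaster package and an S3 bucket holding the manifests (bucket, domain, and the autosign whitelist are placeholders):

    #cloud-config
    packages:
      - puppetmaster
    runcmd:
      # autosign agents from the internal domain (placeholder whitelist)
      - [ sh, -c, "echo '*.example.internal' > /etc/puppet/autosign.conf" ]
      # pull manifests and modules from a private S3 bucket via the instance's IAM role
      - [ sh, -c, "s3cmd sync s3://example-puppet-bucket/puppet/ /etc/puppet/" ]
      - [ service, puppetmaster, restart ]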

Sites, Communications, Ad Targeting, Ops Tools, Analytics, Apps, Micro-targeting, Micro-listening, Reporting, Registrations, Volunteer Coordination, etc., etc., etc.

Technology Choice → Expected Tradeoff
• Polyglot Development → More Complex Ops
• Cloud Hosting → Less Infra Control, performance
• Diverse, App-centered Databases → More Complex Ops, Fragility, Data Corruption
• SOA, queue-based system integrations → Dev Complexity, slower system performance

Technology Choice → Expected Tradeoff → Upside
• Polyglot Development → More Complex Ops → Build as little as possible, rev-1 faster, reuse dev skills
• Cloud Hosting → Less Infra Control, performance → Scale, Speed, Cost
• Diverse, App-centered Databases → More Complex Ops, Fragility, Data Corruption → Heterogeneous Resilience, right tools for the job
• SOA, queue-based system integrations → Dev Complexity, slower system performance → Scalability, serviceability, operational flexibility, and substantially faster in aggregate

Amazon.com in 2003: a $5.2B retail business, 7,800 employees, and a whole lot of servers

Every day in 2012, AWS adds enough server capacity to power this entire $5B enterprise


Amazon Simple Queue Service (SQS), 2006-2008: thousands of customers, a whole lot of servers, over 5 billion queued events

In 2012, OFA alone produced 8.4 billion Amazon SQS queued events


And that was just the last month of the campaign


No time to waste

This applies to lots of services!

Elastic Load Balancing
Amazon ElastiCache
Amazon RDS
Amazon CloudSearch
Amazon Route53
Amazon S3
Amazon CloudFront
Amazon DynamoDB

You can mostly do these on your own…

But do you have extra: focus, expertise, time, research, money, risk-tolerance, staff, dedication to innovate, operations coverage, scalability in design…

Looks pretty simple.

Inserts 7.5 million records into Amazon DynamoDB in 8 minutes

One thing that is difficult to prepare for…

No pressure…

They had this built for the previous 3 months, all on the East Coast.

We built this part in 9 hours, to be safe.

AWS + Puppet + Netflix Asgard + CloudOpt + DevOps =

Cross-Continent Fault-Tolerance On-Demand

Replication across the continent..

http://tsunami-udp.sourceforge.net/

478.18 Mbps cross-continental data transit rate for a single cc2.8xlarge instance

1.72 Tb an hour

27 Tb of data to move

3.92 Hours required to move the data across the continent with four cc2.8xlarge instances
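The arithmetic behind those figures, as a quick check:

    478.18 Mbps × 3,600 s ≈ 1.72 Tb per instance-hour
    4 instances × 1.72 Tb/hour ≈ 6.88 Tb/hour
    27 Tb ÷ 6.88 Tb/hour ≈ 3.92 hours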

So what did they learn?

HA in Depth: Amazon S3 static pages, de-coupled UI, jekyll/hyde

Game Day: Practice failures so you know what to do. (http://www.awsgameday.com)

Loose-Coupling: Ops easy, scale easy, test easy, fix easy…

Fail-Forward: features, quality, and focus are all critical.

Cloud works.

We showed it to the world at re:Invent 2012

together with the OFA DevOps crew

We presented in Tokyo…

Born from the Campaign

What will you do next?

Maybe look at some of their Ruby code?

Register Now! reinvent.awsevents.com
$200 Off Discount Code: Zoltan2013

Gain New Skills & Knowledge: Choose from 175+ technical sessions, training bootcamps, hands-on labs, and hackathons.

Dive Deeper into AWS: Dive deep into foundational AWS services and learn about the latest services and features.

Get Your Questions Answered: Get your technical questions answered by AWS architects, engineers, and product leads.

Learn Best Practices: Discover best practices, tips and tricks, and lessons learned from expert customers.

Thank you!

Questions? • Come talk to an AWS Solutions Architect at Table 22

Contact me!• @leozh• leo@amazon.com
