Cookpad AWS Seminar


Cookpad’s Migration Path to AWS

Cookpad Inc. Genki Sugawara

About Me

•  My work at Cookpad
   o  Head of Infrastructure
   o  Mission: Building and implementing Cookpad’s infrastructure, always working to improve speed, scalability, availability, backup, and security.
•  Open source work
   o  Development of AWS tools
      •  elasticfox-ec2tag, IAM Fox, R53 Fox
   o  Ruby library development
      •  Zipruby, libarchive, rua, etc.

Contents

•  About Cookpad
•  Why AWS?
•  AWS server and network configuration
•  Migration of service

About Cookpad

•  Recipe website used by over 15 million people
•  Over 1 million recipes
•  490 million monthly PVs
•  Ruby on Rails + MySQL
•  PC site
   o  cookpad.com
•  Mobile site
   o  m.cookpad.com
•  iPhone
•  Android

[Figure: PV variation during a single day (x-axis: hour of day, 0:00 to 23:00; y-axis: PV)]

[Figure: Variation in PVs across the year (x-axis: month, April to March; y-axis: PV)]

Why move to AWS?

Why AWS?

1.  Speed
2.  Distribution of Work
3.  Cost

Speed

o  Development speed
o  New servers currently require several weeks or more to prepare
o  We lack some of the know-how to build our own servers
o  Getting caught up in infrastructure issues causes large delays in releases
o  With AWS, it takes less than 10 minutes to start up an instance.

Distribution of Work

o  Ability to distribute work

[Figure: Before AWS: App Engineer → Request → Infra Engineer → Prep]

[Figure: After AWS: App Engineer → Prep]

o  Without AWS, distributing work is difficult:
   •  Need infrastructure skills/knowledge
   •  Problems with security & stability
o  With AWS, distribution of work is made possible
   •  Very little specialized skill needed
   •  Security/stability issues can be solved by giving authority where needed

Cost

o  EC2 seems a little too costly
   •  For example, here’s an unexpected “surprise” in my EC2 monthly statement…
o  iDC: Charged according to peak (greatest) bandwidth
o  AWS: Charged by data transmitted (less cost for sites like Cookpad, which have peak and non-peak times)
   •  Less costly when the difference between peak & non-peak times is especially large.
o  Do away with excess investment into servers
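The billing difference can be made concrete with a small sketch. The traffic profiles and both prices below are hypothetical values chosen only for illustration; they are not Cookpad's figures or actual iDC or AWS rates.

```python
# Compare peak-bandwidth billing (iDC style) with per-volume billing (AWS style).
# All traffic numbers and prices here are made-up illustrations.

def peak_billed_cost(hourly_mbps, price_per_mbps):
    """Cost when the bill depends only on the highest bandwidth used."""
    return max(hourly_mbps) * price_per_mbps


def transferred_gb(hourly_mbps):
    """Total data sent, treating each value as an average rate held for one hour."""
    # Mbit/s * 3600 s = Mbit; / 8 = MB; / 1000 = GB
    return sum(mbps * 3600 / 8 / 1000 for mbps in hourly_mbps)


def volume_billed_cost(hourly_mbps, price_per_gb):
    """Cost when the bill depends on the amount of data transmitted."""
    return transferred_gb(hourly_mbps) * price_per_gb


# Two hypothetical sites with the same 100 Mbit/s peak.
flat_site = [100] * 24               # busy all day
peaky_site = [10] * 20 + [100] * 4   # busy only a few hours a day

for name, profile in (("flat", flat_site), ("peaky", peaky_site)):
    print(name,
          peak_billed_cost(profile, price_per_mbps=50.0),
          round(volume_billed_cost(profile, price_per_gb=0.2), 2))
```

Both sites pay the same under peak billing, while the peaky site pays far less under volume billing, which is the case the slide describes.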

Server & Network Configuration

Topics: Network, Security, DNS, AMI, Monitoring, Redundancy, MySQL

Network

[Figure: Current network]

o  Simple 3-layer structure
o  Networks are partitioned at each layer

[Figure: EC2’s network]

o  All servers located in same segment
o  Instead of partitioned networks, security groups are used

Security

o  Two types of security groups set for instances
   •  Basic
   •  Security groups for each role

[Figure: Security group organization/structure]

o  Basic allows mutual communication on basic ports
   •  ping (icmp)
   •  http
o  Basic also allows access from specific security groups
   •  Health monitoring tools (Nagios, etc.)
   •  Performance monitoring tools (Munin, etc.)
o  Security groups for each role
   •  Enable communication between roles themselves
   •  Enable communication between each role and Basic
o  Enable access from App groups to DB groups
o  Allow queries from Basic to DNS
o  IP addresses are not specified for general access.
o  One exception is roles accessed from Elastic Load Balancing, for which access from 10.0.0.0/8 is allowed
   •  Cannot specify source IP
   •  Cannot specify security group
o  Start iptables on all servers
   •  Helps eliminate human error
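The "Basic plus role" layout can be sketched as a small lookup table. This is a hypothetical model, not the actual EC2 configuration: the group names and the service labels (such as `nrpe` and `munin-node`) are assumptions made for illustration.

```python
# Hypothetical model of the "Basic + role" security group layout.
# Group names and service labels are illustrative assumptions.

# Each key is a destination group; each value is the set of
# (source group, service) pairs that the group accepts.
INGRESS_RULES = {
    "basic": {
        ("basic", "icmp"),        # ping between all instances
        ("basic", "http"),
        ("nagios", "nrpe"),       # health monitoring
        ("munin", "munin-node"),  # performance monitoring
    },
    "db": {("app", "mysql")},     # App group may reach DB group
    "dns": {("basic", "dns")},    # every instance may query DNS
}


def is_allowed(src_groups, dst_groups, service):
    """True if any group of the source is accepted by any group of the destination."""
    return any(
        (src, service) in INGRESS_RULES.get(dst, set())
        for dst in dst_groups
        for src in src_groups
    )


# Every instance carries Basic plus the group for its role.
app_server = {"basic", "app"}
db_server = {"basic", "db"}
dns_server = {"basic", "dns"}

print(is_allowed(app_server, db_server, "mysql"))  # True
print(is_allowed(dns_server, db_server, "mysql"))  # False
print(is_allowed(app_server, dns_server, "dns"))   # True
```

Because rules name source groups rather than IP addresses, an instance that changes its internal IP keeps the same access.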

DNS

o  With EC2, internal IP addresses cannot be fixed
   •  Internal IP addresses change when an instance is stopped and restarted
o  Use internal DNS to hide IP addresses
o  DNS is organized into a 2-server Active-Active configuration
   •  Each is assigned an Elastic IP
o  Each server references DNS with resolv.conf
o  DNS obtains Name tag information and configures domain information
   Ex.) Name:dev → dev.ap-northeast-1.compute.internal
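The Name tag to hostname mapping can be sketched as follows. The input format is a hypothetical stand-in for instance data fetched from the EC2 API, and the function names are invented for this example; only the `Name:dev → dev.ap-northeast-1.compute.internal` convention comes from the slide.

```python
# Sketch: turn instance Name tags into internal DNS records.
# The input dictionaries are a hypothetical stand-in for EC2 API results.

INTERNAL_DOMAIN = "ap-northeast-1.compute.internal"


def records_from_name_tags(instances):
    """Map '<Name>.<domain>' to the instance's current internal IP."""
    records = {}
    for instance in instances:
        name = instance.get("tags", {}).get("Name")
        if name:  # untagged instances get no record
            records[f"{name}.{INTERNAL_DOMAIN}"] = instance["private_ip"]
    return records


def to_hosts_lines(records):
    """Render records as 'IP hostname' lines, sorted by hostname."""
    return [f"{ip} {host}" for host, ip in sorted(records.items())]


instances = [
    {"tags": {"Name": "dev"}, "private_ip": "10.0.1.15"},
    {"tags": {}, "private_ip": "10.0.1.16"},
]
print(to_hosts_lines(records_from_name_tags(instances)))
```

Regenerating the records on a schedule would let a hostname follow the instance when its internal IP changes.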

o  resolv.conf is periodically regenerated by cron
   •  When a DNS server’s internal IP address changes, resolv.conf is updated
   •  If one DNS server stops, it is removed from resolv.conf
o  Cron looks up each DNS server’s Public DNS Name (the Public DNS Name is fixed by the Elastic IP assignment)
o  The DNS server’s internal IP is obtained as the IP address associated with the Public DNS Name
o  The obtained internal IP is written into resolv.conf
o  If the lookup gets no answer, the server is removed from resolv.conf

[Figure: Request Public DNS Name → Acquire Public DNS Name → Write internal IP]
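The cron job described above could look roughly like this. It is a minimal sketch: the function names and the hostnames in the example are hypothetical, and the real job would write the result to `/etc/resolv.conf` instead of printing it.

```python
# Sketch of the periodic resolv.conf refresh.
# Function names and example hostnames are hypothetical.
import socket


def lookup_internal_ips(dns_public_names, resolve=socket.gethostbyname):
    """Resolve each DNS server's Public DNS Name, skipping servers that do not answer."""
    ips = []
    for name in dns_public_names:
        try:
            ips.append(resolve(name))
        except OSError:  # socket.gaierror is a subclass of OSError
            continue     # no answer: leave this DNS server out
    return ips


def render_resolv_conf(nameserver_ips):
    """Build the nameserver lines for resolv.conf."""
    return "".join(f"nameserver {ip}\n" for ip in nameserver_ips)


# A fake resolver stands in for the real lookup so the example runs offline.
def fake_resolve(name):
    table = {"dns1.example.invalid": "10.0.0.10"}
    if name not in table:
        raise OSError("no answer")
    return table[name]


names = ["dns1.example.invalid", "dns2.example.invalid"]
print(render_resolv_conf(lookup_internal_ips(names, resolve=fake_resolve)), end="")
```

Here the second server does not answer, so only the first one ends up in the generated file.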

AMI

o  Clean installation of CentOS 5.5
o  Root Device = EBS
o  Currently, a mix of 32-bit and 64-bit, but will move to 64-bit only in the future.
o  AMI for each role is created from the base AMI
o  Each AMI is given its own version
o  Also implement system management tools such as Chef

Monitoring

o  System and network health monitoring
   •  Nagios + nrpe
o  Performance monitoring
   •  Munin
o  Nagios monitors server health status
o  Munin monitors and records server performance data (e.g. CPU usage, load average)
o  Started instances are automatically monitored by Nagios and Munin
o  Each instance is given a tag so the appropriate type of monitoring can be identified.
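One way to picture the tag-driven monitoring is a script that renders Nagios host definitions from instance tags. This is only a sketch under stated assumptions: the slides do not say how the registration is done, and the `role` field and the `generic-host` template name are hypothetical.

```python
# Sketch: generate Nagios host definitions from tagged instances.
# The "role" field and the "generic-host" template are assumptions.

def nagios_host_definition(name, address, role):
    """Render one Nagios host object; the role selects the host group."""
    return (
        "define host {\n"
        "    use         generic-host\n"
        f"    host_name   {name}\n"
        f"    address     {address}\n"
        f"    hostgroups  {role}\n"
        "}\n"
    )


def render_hosts(instances):
    """Render definitions for every instance that carries a role tag."""
    blocks = []
    for inst in sorted(instances, key=lambda i: i["name"]):
        role = inst.get("role")
        if role:  # instances without a role tag are not monitored
            blocks.append(nagios_host_definition(inst["name"], inst["address"], role))
    return "\n".join(blocks)


instances = [
    {"name": "app01", "address": "10.0.2.11", "role": "app"},
    {"name": "scratch", "address": "10.0.2.99"},
]
print(render_hosts(instances))
```

Checks attached to the host group would then apply to every new instance with that role.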

Redundancy

o  Increasing availability
   •  Mutual monitoring using Elastic IP
   •  Restoration from AMI using Nagios

Mutual monitoring using Elastic IP

o  Used in Nagios & LDAP redundancy
o  Monitor the Public DNS Name of each Elastic IP
o  Health check is not performed if the returned internal IP address is that of the server itself.
o  If the address differs from the server’s own, then the health check is carried out
o  → The backup always performs a health check on the master
o  If the master health check fails, then the backup assigns the Elastic IP to itself
o  The Elastic IP is moved from the master to the backup, completing the failover

[Figure: Backup monitors the Public DNS Name and performs the master health check; on failure, the Elastic IP is moved to the backup]
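The decision logic of the mutual monitoring can be sketched as one function that each server runs periodically. All names here are hypothetical; the lookup, health check, and Elastic IP reassignment are passed in as callables because the slides do not show how they are implemented.

```python
# Sketch of one round of Elastic IP mutual monitoring.
# The three callables are placeholders for the real lookup,
# health check, and Elastic IP reassignment.

def failover_step(my_internal_ip, resolve_eip_name, is_healthy, take_over_elastic_ip):
    """Run one monitoring round and report what this server did."""
    holder_ip = resolve_eip_name()  # internal IP behind the Elastic IP's Public DNS Name
    if holder_ip == my_internal_ip:
        return "holding"            # we are the master: no health check needed
    if is_healthy(holder_ip):
        return "standby"            # master is fine: stay as backup
    take_over_elastic_ip()          # master failed: move the Elastic IP to ourselves
    return "took-over"


# Example: the backup (10.0.0.2) finds the master (10.0.0.1) unhealthy.
actions = []
result = failover_step(
    my_internal_ip="10.0.0.2",
    resolve_eip_name=lambda: "10.0.0.1",
    is_healthy=lambda ip: False,
    take_over_elastic_ip=lambda: actions.append("associate"),
)
print(result, actions)
```

Running the same code on both servers gives the asymmetry the slide describes: only the server that does not hold the Elastic IP performs the health check.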

Restoration from AMI using Nagios

o  When a server fails its Nagios health check, it is restored from AMI
o  Used in Munin, etc.

[Figure: Monitor → Starts instance → Server (new instance)]
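A restore trigger of this kind could be wired up as a Nagios event handler. That wiring is an assumption, since the slides do not name the mechanism; the sketch assumes the handler receives the `$SERVICESTATE$` and `$SERVICESTATETYPE$` macro values, and `launch_from_ami` is a hypothetical callable that starts a replacement instance from the role's AMI.

```python
# Sketch of a restore decision driven by Nagios state values.
# launch_from_ami is a hypothetical placeholder for the code that
# starts a new instance from the role's AMI.

def should_restore(state, state_type):
    """Restore only on a confirmed (HARD) CRITICAL state, not on a first failure."""
    return state == "CRITICAL" and state_type == "HARD"


def handle_event(state, state_type, role, launch_from_ami):
    """Start a replacement instance when the check has definitely failed."""
    if should_restore(state, state_type):
        return launch_from_ami(role)
    return None


launched = []

def fake_launch(role):
    launched.append(role)
    return f"new-{role}-instance"

print(handle_event("CRITICAL", "SOFT", "munin", fake_launch))  # None
print(handle_event("CRITICAL", "HARD", "munin", fake_launch))  # new-munin-instance
```

Waiting for the HARD state avoids launching a new instance on a single transient failure.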

o  Mutual monitoring using Elastic IP
   •  Applied to the servers whose downtime we most want to minimize
o  Restoration from AMI using Nagios
   •  Applied to servers that can tolerate 5 to 10 minutes of downtime
o  Downtime is longer compared to keepalived, etc.
o  Currently looking into redundancy using Heartbeat

MySQL

o  EC2 used only for slaves
o  Data in EBS
o  Snapshots of data taken daily
o  New slave created from snapshots
o  Data created from snapshot has same replication position
o  Simplification of slave failover

[Figure: Data → (Daily) snapshot → Restore → New Data (EBS) → Start up New DB]

Service Migration

iDC & EC2 Hybrid

[Figure: iDC & EC2 hybrid configuration]

o  Service access is divided up between EC2 & iDC using round robin
o  Reads from the DB are served from EC2
o  Writes to the DB take place in iDC
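The hybrid stage can be sketched as two small rules: requests alternate between the two sites, and database access is split by read or write. The endpoint names `ec2-slave` and `idc-master` are hypothetical labels, and the slides do not say at which layer the round robin was implemented.

```python
# Sketch of the hybrid routing rules.
# Site and endpoint names are illustrative labels only.
import itertools


def round_robin(sites):
    """Yield sites in turn, forever."""
    return itertools.cycle(sites)


def db_endpoint(is_write):
    """Writes go to the master in the iDC; reads are served from EC2."""
    return "idc-master" if is_write else "ec2-slave"


sites = round_robin(["iDC", "EC2"])
first_four = [next(sites) for _ in range(4)]
print(first_four)                  # ['iDC', 'EC2', 'iDC', 'EC2']
print(db_endpoint(is_write=True))  # idc-master
print(db_endpoint(is_write=False)) # ec2-slave
```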

Moving the master DB to EC2

[Figure: Moving the master DB to EC2]

o  The master DB is moved to EC2
o  Before the move, iDC access is gradually stopped
o  Finally, iDC is completely removed.

Thank you!
