Cookpad’s Migration Path to AWS
Cookpad Inc. Genki Sugawara
About Me
• My work at Cookpad
  o Head of Infrastructure
  o Mission: Building and implementing Cookpad’s infrastructure, always working to improve speed, scalability, availability, backup, and security.
• Open source work
  o Development of AWS tools
    • elasticfox-ec2tag, IAM Fox, R53 Fox
  o Ruby Library Development
    • Zipruby, libarchive, rua, etc.
Contents
• About Cookpad
• Why AWS?
• AWS server and network configuration
• Migration of service
About Cookpad
• Recipe website used by over 15 million people
• Over 1 million recipes
• 490 million monthly PVs
• Ruby on Rails + MySQL
• PC site
  o cookpad.com
• Mobile site
  o m.cookpad.com
• iPhone
• Android
[Figure: PV variation during a single day (x-axis: hour of day, 0:00 to 23:00; y-axis: PV)]
[Figure: Variation in PVs across the year (x-axis: month, April through March; y-axis: PV)]
Why move to AWS?
Why AWS?
1. Speed
2. Distribution of Work
3. Cost
o Development speed
o New servers currently require several weeks or more to prepare
o We lack some of the know-how to build our own servers
o Getting caught up in infrastructure issues causes large delays in releases
o With AWS, it takes less than 10 minutes to start up an instance.
o Ability to distribute work
[Figure: Before AWS: App Engineer → Request → Infra Engineer → Prep]
[Figure: After AWS: App Engineer → Prep]
o Without AWS, distributing work is difficult:
  • Need infrastructure skills/knowledge
  • Problems with security & stability
o With AWS, distribution of work is made possible
  • Very little specialized skill needed
  • Security/stability issues can be solved by giving authority where needed
o EC2 seems a little too costly
For example, here’s an unexpected “surprise” in my EC2 monthly statement…
iDC: Charged according to peak bandwidth
AWS: Charged by data transmitted (lower cost for sites like Cookpad, which have peak and non-peak times)
o Charged by amount of data transmitted
  • Less costly when the difference between peak & non-peak times is especially large.
o Do away with excess investment into servers
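To make the billing difference concrete, here is a minimal sketch that compares the two pricing models on two made-up daily traffic profiles. Everything in it is a hypothetical illustration: the hourly traffic figures, the prices, and the function names are assumptions, not Cookpad's or AWS's actual numbers.

```python
# Hypothetical comparison of peak-bandwidth billing (iDC style) with
# per-data-transferred billing (AWS style). All figures are illustrative.

def peak_bandwidth_cost(hourly_mbps, price_per_mbps):
    """Bill driven by the single busiest hour of the day."""
    return max(hourly_mbps) * price_per_mbps


def transferred_gb(hourly_mbps):
    """Total volume sent in a day, given the average Mbps for each hour."""
    # Mbps sustained for one hour: * 3600 seconds, / 8 bits per byte, / 1000 MB per GB
    return sum(mbps * 3600 / 8 / 1000 for mbps in hourly_mbps)


def transfer_cost(hourly_mbps, price_per_gb):
    """Bill driven by the volume actually transferred."""
    return transferred_gb(hourly_mbps) * price_per_gb


# Two profiles with the same peak: one flat, one with a short daily peak.
flat = [100] * 24
peaky = [10] * 20 + [100] * 4
```

With these made-up profiles the peak-based bill is identical for both, while the peaky profile transfers 270 GB against 1080 GB for the flat one, so its transfer-based bill is a quarter as large.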
Server & Network Configuration
Network, Security, DNS, AMI, Monitoring, Redundancy, MySQL
Current Network
o Simple 3-layer structure
o Networks are partitioned at each layer
EC2’s Network
o All servers located in the same segment
o Instead of partitioned networks, security groups are used
Security
o Two types of security groups set for instances
  • Basic
  • Security groups for each role
[Figure: Security group organization/structure]
o Basic allows for mutual communication on basic ports
  • ping (icmp)
  • http
o Allows access from specific security groups
  • Health monitoring tools (Nagios, etc.)
  • Performance monitoring tools (Munin, etc.)
o Security groups for each role
  • Enables communication between roles themselves
  • Enables communication between each role and basic.
o Enable access from App groups to DB groups
o Allows queries from Basic to DNS
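The group-to-group rules above can be sketched as plain data plus a lookup. This is a simplified model for illustration, not AWS API calls and not Cookpad's actual rule set: the group names (`basic`, `app`, `db`, `dns`, `nagios`, `munin`), the service labels, and the function `is_allowed` are all assumptions.

```python
# Simplified model of "Basic + per-role" security groups as plain data.
# Key: (source group, destination group). Value: services the source may use.
# Group names, service labels and the rule list are illustrative assumptions.

RULES = {
    ("basic", "basic"): {"icmp", "http"},  # mutual communication on basic ports
    ("nagios", "basic"): {"nrpe"},         # health monitoring
    ("munin", "basic"): {"munin-node"},    # performance monitoring
    ("basic", "dns"): {"dns"},             # every server may query DNS
    ("app", "app"): {"any"},               # members of one role talk to each other
    ("db", "db"): {"any"},
    ("app", "db"): {"mysql"},              # App groups may reach DB groups
}


def is_allowed(src_groups, dst_groups, service):
    """True if any group of the source may use `service` on any group of the destination."""
    for src in src_groups:
        for dst in dst_groups:
            allowed = RULES.get((src, dst), set())
            if service in allowed or "any" in allowed:
                return True
    return False


# Every instance carries the basic group plus the group for its role.
app_server = {"basic", "app"}
db_server = {"basic", "db"}
monitor = {"basic", "nagios"}
```

Because rules name groups rather than IP addresses, a new App instance gains access to the DB servers as soon as it is launched into the `app` group, with no rule changes.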
o IP addresses are not specified for general access.
o One exception is roles accessed from Elastic Load Balancing, for which access from 10.0.0.0/8 is allowed
  • Cannot specify source IP
  • Cannot specify security group
o Start iptables on all servers
  • Helps eliminate human error
DNS
o With EC2, internal IP addresses cannot be fixed
  • Internal IP addresses change when an instance is stopped and restarted
o Use internal DNS to abstract away IP addresses
o DNS is organized as two servers in an Active-Active configuration
  • Each is assigned an Elastic IP
o Each server references DNS with resolv.conf
o DNS obtains Name tag information and configures domain information
  Ex.) Name:dev → dev.ap-northeast-1.compute.internal
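The tag-to-name mapping in the example above can be sketched as follows. Only the name pattern comes from the slide; the function names, the shape of the instance list, and the step that reads Name tags from EC2 (left out here) are assumptions.

```python
# Sketch of the Name tag -> internal host name mapping from the example above.
# How the Name tags and internal IPs are fetched from EC2 is not shown.

def internal_fqdn(name_tag, region="ap-northeast-1"):
    """Build the internal name for an instance from its Name tag."""
    return f"{name_tag}.{region}.compute.internal"


def build_records(instances):
    """instances: iterable of (name_tag, internal_ip) pairs.

    Returns a sorted list of (fqdn, internal_ip) pairs that a DNS server
    could load as A records.
    """
    return sorted((internal_fqdn(name), ip) for name, ip in instances)
```

For the slide's example, `internal_fqdn("dev")` gives `dev.ap-northeast-1.compute.internal`.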
o resolv.conf is periodically reset by cron
  • When internal IP address changes, resolv.conf is reset
  • If one DNS server stops, it is removed from resolv.conf
o Cron looks up each DNS server’s Public DNS Name (the Public DNS Name is fixed by the Elastic IP assignment)
o The DNS server’s internal IP is acquired as the IP address associated with the Public DNS Name
o The acquired internal IP is written into resolv.conf
o If the request isn’t returned, then that DNS server is removed from resolv.conf
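The cron-driven refresh described above might look roughly like this. It is a sketch under assumptions: the slides do not say how a stopped DNS server is detected, so a failed name lookup stands in for "the request isn't returned", and the function names are hypothetical. Writing the result to /etc/resolv.conf is left to the caller.

```python
import socket


def resolve_internal_ip(public_dns_name):
    """Return the IP address the name resolves to, or None if the lookup fails."""
    try:
        return socket.gethostbyname(public_dns_name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def build_resolv_conf(dns_public_names, resolve=resolve_internal_ip):
    """Build resolv.conf contents from the DNS servers' Public DNS Names.

    Servers whose lookup fails are left out, so clients stop using them.
    Returns an empty string if no server could be resolved.
    """
    ips = [resolve(name) for name in dns_public_names]
    return "".join(f"nameserver {ip}\n" for ip in ips if ip is not None)
```

The `resolve` parameter exists so the logic can be exercised without network access. A real job would probably also keep the previous file when the result is empty, rather than leave a server with no resolver at all.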
AMI
o Clean installation of CentOS 5.5
o Root Device = EBS
o Currently, a mix of 32bit and 64bit, but will move to 64bit only in the future.
o AMI for each role is created from the base AMI
o Each AMI is given its own version
o Also implement system management tools such as Chef
Monitoring
o System network health monitoring
  • Nagios + nrpe
o Performance monitoring
  • Munin
o Nagios monitors server health status
o Munin monitors and records server performance data (e.g. CPU usage, load average, etc.)
o Started instances are automatically monitored by Nagios and Munin
o Each instance is given a tag so the appropriate type of monitoring can be identified.
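One way tag-driven monitoring could work is to render a Nagios host definition for each instance from its tags. The sketch below is an assumption about the mechanism, not the setup shown in the slides: the tag key `Role`, the `generic-host` template, and the function names are hypothetical.

```python
# Sketch: turn instance tags into Nagios host definitions.
# The "Role" tag key and the "generic-host" template are assumptions.

def nagios_host_definition(name, address, role):
    """Render one Nagios host object; the role tag selects the hostgroup."""
    return (
        "define host {\n"
        "    use         generic-host\n"
        f"    host_name   {name}\n"
        f"    address     {address}\n"
        f"    hostgroups  {role}\n"
        "}\n"
    )


def render_hosts(instances):
    """instances: list of dicts with 'Name', 'Role' and 'ip' keys."""
    return "\n".join(
        nagios_host_definition(i["Name"], i["ip"], i["Role"]) for i in instances
    )
```

Checks attached to each hostgroup then decide which type of monitoring an instance receives, so launching an instance with the right tag is enough to place it under the right checks.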
Redundancy
o Increasing availability
  • Mutual monitoring using Elastic IP
  • Restoration from AMI using Nagios
Mutual monitoring using Elastic IP
o Used in Nagios & LDAP redundancy
o Monitor the Public DNS Name of each Elastic IP
o Health check is not performed if the returned internal IP address is that of the server itself.
o If the address differs from the server’s own, then the health check is carried out
o → The backup always performs a health check on the master
o If the master health check fails, then the backup assigns itself the Elastic IP
o The Elastic IP is moved from the master to the backup, completing the failover
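The mutual-monitoring steps above can be sketched as a single check that every node in the pair runs periodically. The four parameters are hypothetical hooks rather than real AWS calls: how the name is resolved, how health is checked, and how the Elastic IP is reassigned are not specified in the slides.

```python
# Sketch of one monitoring pass for Elastic IP based failover.
# All four parameters are hypothetical hooks supplied by the caller.

def failover_step(my_ip, resolve_eip_name, is_healthy, take_over_eip):
    """Run one pass and return 'holder', 'ok' or 'took_over'."""
    # Internal IP currently behind the Elastic IP's Public DNS Name.
    holder_ip = resolve_eip_name()
    if holder_ip == my_ip:
        return "holder"      # this node is the master: no check against itself
    if is_healthy(holder_ip):
        return "ok"          # master is alive: the backup stays on standby
    take_over_eip()          # master failed: the backup claims the Elastic IP
    return "took_over"
```

Since the same code runs on both nodes, the role of each node follows from where the Elastic IP currently points, and no separate master/backup configuration is needed.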
Restoration from AMI using Nagios
o When a server fails its Nagios health check, it is restored from AMI
o Used in Munin, etc.
[Figure: Monitor → Starts instance → Server (new instance)]
o Mutual monitoring using Elastic IP
  • Applied to the servers whose downtime we most want to minimize
o Restoration from AMI using Nagios
  • Applied to servers that can tolerate 5 to 10 minutes of downtime
o Downtime is longer compared to keepalived, etc.
o Currently looking into redundancy using Heartbeat
MySQL
o EC2 used only for slaves
o Data in EBS
o Snapshots of data taken daily
o New slave created from snapshots
o Data created from a snapshot has the same replication position
o Simplification of slave failover
Service Migration
iDC & EC2 Hybrid
o Service access is divided up between EC2 & iDC using round robin
o Reads from the DB are served from EC2
o Writes to the DB take place in iDC
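The hybrid routing can be sketched as two small decisions: which site serves a request, and which database a statement goes to. The slides do not say how round robin or the read/write split was implemented, so the function names, the site labels, and the rule that only `SELECT` counts as a read are assumptions.

```python
import itertools


def make_site_picker(sites=("EC2", "iDC")):
    """Return a function that hands out sites in round-robin order."""
    cycle = itertools.cycle(sites)
    return lambda: next(cycle)


def pick_database(sql):
    """Send reads to the EC2 slaves and everything else to the iDC master."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    return "EC2 slave" if verb == "SELECT" else "iDC master"
```

Keeping all writes on the single master in the iDC means the EC2 slaves only need to keep up with replication, which is what lets the master be moved later as a separate step.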
Moving the master DB to EC2
o The master DB is moved to EC2
o Before the move, iDC access is gradually stopped
o Finally, iDC is completely removed.
Thank you!