Corley scalability

SCALABLE APPLICATIONS

HOW TO MAKE A WEB APP SCALABLE ON THE CLOUD

CORLEY SRL – WWW.CORLEY.IT

A TYPICAL SCALABLE INFRASTRUCTURE

RDBMS – START FROM THE END

• MySQL
  • RDBMS (Relational Database Management System)

• How does it scale?
  • Read replicas

• Pros (in terms of scalability)
  • Simple to set up
  • Simple to manage

• Cons
  • Only read operations scale
  • The master instance has to handle all write operations (writes are the bottleneck)
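
A back-of-the-envelope sketch of why writes become the bottleneck. The model is hypothetical. Every instance, master or replica, handles the same `capacity` in queries per second. The master serves only writes, and reads are spread evenly over the replicas. Replaying writes on each replica is ignored, and in reality that cost makes things worse.

```python
import math


def max_throughput(capacity: float, write_fraction: float, replicas: int) -> float:
    """Highest total queries/second the cluster can absorb.

    Hypothetical model: the master serves all writes, the read replicas
    share all reads evenly, and replication overhead is ignored.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one read replica")
    # Writes only go to the master: write_fraction * Q <= capacity
    write_limit = math.inf if write_fraction == 0 else capacity / write_fraction
    # Reads are split across replicas: (1 - write_fraction) * Q / replicas <= capacity
    read_fraction = 1.0 - write_fraction
    read_limit = math.inf if read_fraction == 0 else replicas * capacity / read_fraction
    return min(write_limit, read_limit)


if __name__ == "__main__":
    # 25% writes, 1000 queries/s per instance
    for n in (1, 3, 6, 12):
        print(n, max_throughput(1000, 0.25, n))
```

Assume 25% writes and 1000 queries/s per instance. Going from one to three replicas raises the total from about 1333 to 4000 queries/s. Beyond three replicas nothing changes, because the master alone caps the cluster at 4000 queries/s.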

READ REPLICA ON AWS

• From the RDS tab of the AWS console, right-click a running DB instance and choose "Create Read Replica"

• Configure the read replica and create it through the graphical console.
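
The same operation can be scripted instead of clicked. This is a minimal sketch assuming the boto3 RDS client and its `create_db_instance_read_replica` call. The instance identifiers and the region are hypothetical. The client is passed in as a parameter, so the function can be exercised without touching AWS.

```python
from typing import Any, Optional


def create_read_replica(rds: Any, source_id: str, replica_id: str,
                        instance_class: Optional[str] = None) -> dict:
    """Ask RDS to create a read replica of `source_id`.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    params = {
        "DBInstanceIdentifier": replica_id,        # name of the new replica
        "SourceDBInstanceIdentifier": source_id,   # the running master
    }
    if instance_class is not None:
        # Optional: the replica may use a different size than the master
        params["DBInstanceClass"] = instance_class
    return rds.create_db_instance_read_replica(**params)


if __name__ == "__main__":
    import boto3  # needs AWS credentials and an existing DB instance

    rds = boto3.client("rds", region_name="eu-west-1")
    create_read_replica(rds, "wordpress-master", "wordpress-replica-1")
```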

HOW TO PROMOTE A SLAVE TO MASTER?

Similar to creating a replica:
• Select a read replica
• Right-click it and choose "Promote Read Replica"

Discover more on RDS:
• http://aws.typepad.com/aws/amazon-rds/
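
Promotion can be scripted the same way. This is a sketch assuming the boto3 RDS client's `promote_read_replica` call. The default backup retention of one day is an arbitrary, hypothetical choice. Promotion is one-way: the promoted instance stops replicating from its old master.

```python
from typing import Any


def promote_replica(rds: Any, replica_id: str, backup_retention_days: int = 1) -> dict:
    """Promote an RDS read replica to a standalone (master) instance.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    return rds.promote_read_replica(
        DBInstanceIdentifier=replica_id,
        # Keep automated backups on the newly promoted master
        BackupRetentionPeriod=backup_retention_days,
    )
```

For example, `promote_replica(boto3.client("rds"), "wordpress-replica-1")` promotes the hypothetical replica created earlier. After promotion, the application must point its writes at the new master.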

NOW LET'S LOOK AT THE WEB INSTANCES

• All web instances scale out instead of scaling up
  • Scale out? What does it mean?
  • Instead of increasing VM resources (more RAM, more CPU, more I/O, etc.), start new VMs and serve requests from those instances

• A load balancer routes incoming connections to the VMs using common algorithms
  • Round robin
  • Based on the VMs' average load
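
The two routing strategies can be sketched in a few lines. This is a toy model only; real load balancers such as ELB implement their own algorithms. The host names and load figures are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in a fixed, rotating order."""
    return itertools.cycle(backends)


def least_loaded(load: Dict[str, float]) -> str:
    """Pick the backend with the lowest current load (ties: first in dict order)."""
    return min(load, key=load.get)


if __name__ == "__main__":
    rr = round_robin(["web-1", "web-2", "web-3"])
    print([next(rr) for _ in range(5)])  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
    print(least_loaded({"web-1": 0.9, "web-2": 0.2, "web-3": 0.5}))  # web-2
```

Round robin ignores how busy each VM is. The load-based choice needs fresh metrics from every VM, but it avoids piling requests onto an instance that is already struggling.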

PROBLEMS… WE NEVER TALK ABOUT…

• Session management
  • If we start and stop servers at runtime, we have to preserve PHP sessions in order to handle user logins and other session-based features

• Database connections
  • Each MySQL connection points to a single server: the connectors offer no built-in way to use "x" RDBMS servers at the same time

• Software and plugin maintenance
  • How can we keep the same version of WordPress and its plugins on every VM if VMs start and stop continuously? How do we handle software updates?

• What about logs? How can we centralize log management?

DELEGATE SESSION MANAGEMENT TO MEMCACHE

• Memcache(d) servers are not only useful as distributed in-RAM caching servers: they can also manage PHP sessions for us.

• A Memcache infrastructure is simple to create and maintain
  • ElastiCache service on AWS

• No software modification
  • We just have to configure the PHP interpreter (built with memcache/memcached support):

    session.save_handler = memcache
    session.save_path = "tcp://1.cache.group.domain.tld:11211"

  • With the memcached extension, use session.save_handler = memcached and drop the "tcp://" prefix from the path
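
This is a minimal model of why a shared session store lets web VMs come and go. It is an assumption-laden stand-in: a Python dict with expiry plays the role of the memcached pool, loosely like PHP's session.gc_maxlifetime. `WebNode`, `login` and `current_user` are hypothetical names, not PHP or memcached APIs.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class SharedSessionStore:
    """In-memory stand-in for a memcached pool shared by all web VMs."""

    def __init__(self, ttl: float = 1440, clock: Callable[[], float] = time.time):
        self._data: Dict[str, Tuple[float, dict]] = {}
        self._ttl = ttl          # seconds before a session expires
        self._clock = clock

    def save(self, session_id: str, data: dict) -> None:
        self._data[session_id] = (self._clock() + self._ttl, dict(data))

    def load(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(session_id, None)  # unknown or expired
            return None
        return dict(entry[1])


class WebNode:
    """A stateless web VM: it keeps no sessions of its own."""

    def __init__(self, name: str, store: SharedSessionStore):
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id  # sent back to the browser as a cookie

    def current_user(self, session_id: str) -> Optional[str]:
        data = self.store.load(session_id)
        return data["user"] if data else None


if __name__ == "__main__":
    store = SharedSessionStore()
    web1, web2 = WebNode("web-1", store), WebNode("web-2", store)
    sid = web1.login("walter")
    print(web2.current_user(sid))  # walter: web-2 sees the login made on web-1
```

Because no VM keeps session state locally, the load balancer can send the next request to any instance. Terminating a VM does not log anyone out.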

DELEGATE CONNECTIONS TO MYSQL NATIVE DRIVER

• MySQL native driver?
  • Available from PHP 5.3 onward
  • Compile PHP with mysqlnd support:

    --with-mysqli=mysqlnd --with-pdo-mysql=mysqlnd --with-mysql=mysqlnd

• WARNING: the mysql extension is deprecated as of PHP 5.5.0 (and removed in PHP 7.0)

• Delegate master/slave management to "mysqlnd_ms"
  • http://www.php.net/manual/en/book.mysqlnd-ms.php

DELEGATE CONNECTIONS TO MYSQL NATIVE DRIVER

{
    "myapp": {
        "master": {
            "master_0": { "host": "localhost", "port": "3306" }
        },
        "slave": {
            "slave_0": { "host": "192.168.2.27", "port": "3306" }
        }
    }
}

This simple JSON configuration is divided into two main sections:

• Master
• Slaves

"myapp" is the hostname we use instead of the real MySQL host address. The plugin reads this file through the mysqlnd_ms.config_file php.ini directive (with mysqlnd_ms.enable=1).

E.g.:
• mysql_connect("myapp", "user", "passwd");
• new mysqli("myapp", "user", "passwd");
• new PDO("mysql:dbname=testdb;host=myapp", "user", "passwd");
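
This is a toy model of the routing that mysqlnd_ms performs behind the "myapp" alias, using the JSON configuration above. It loosely follows the plugin's default read/write split: statements starting with SELECT go to a slave, and everything else goes to the master. Two behaviours are assumptions of this sketch: the fallback to the master when no slave is configured, and the random pick among slaves. The real plugin offers much more, such as SQL hints that force a specific server.

```python
import json
import random
from typing import Optional, Tuple

CONFIG = """
{ "myapp": { "master": { "master_0": { "host": "localhost", "port": "3306" } },
             "slave":  { "slave_0":  { "host": "192.168.2.27", "port": "3306" } } } }
"""


def pick_server(config: dict, section: str, query: str,
                rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return (host, port) that should run `query` for the `section` alias."""
    rng = rng or random.Random()
    cluster = config[section]
    is_read = query.lstrip().upper().startswith("SELECT")
    # Reads go to the slaves when there are any; everything else to the master
    role = "slave" if is_read and cluster.get("slave") else "master"
    server = rng.choice(list(cluster[role].values()))
    return server["host"], int(server["port"])


if __name__ == "__main__":
    config = json.loads(CONFIG)
    print(pick_server(config, "myapp", "SELECT * FROM wp_posts"))   # ('192.168.2.27', 3306)
    print(pick_server(config, "myapp", "UPDATE wp_options SET x=1"))  # ('localhost', 3306)
```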

START TALKING ABOUT ELASTIC COMPUTE CLOUD

• ELB – Elastic Load Balancer
  • A load balancer distributed across the Availability Zones of an AWS region (e.g. eu-west-1a, eu-west-1b, eu-west-1c; you choose in how many zones you are available)
  • Watches EC2 status with a health check ("ping" strategy)
    • Requests a page every x seconds/minutes

• Turn EC2 instances on/off automatically through alarms (CloudWatch raises the alarms)
  • Auto Scaling receives alarms from CloudWatch and performs the scale operations
  • You can raise CPU alarms, network alarms, VM status alarms and many others in order to increase or decrease the current number of EC2 instances

• A scaling strategy is not simple: you have to understand how your application works
  • CPU is the simplest metric, but remember that bandwidth is limited by the network interfaces: a bottleneck there can keep CPU usage low so the CPU alarm never fires, and your application gets stuck in weird and strange situations.
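
The scaling decision behind a simple CPU alarm can be sketched in plain Python. This is not the AWS API. The thresholds and instance limits are hypothetical, and the model adds or removes one instance at a time, clamped to a minimum and a maximum.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds; tune them to how your application behaves."""
    scale_out_cpu: float = 70.0   # average CPU % that adds an instance
    scale_in_cpu: float = 30.0    # average CPU % below which one is removed
    min_instances: int = 2
    max_instances: int = 10


def desired_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Step scaling by one instance, clamped to [min_instances, max_instances]."""
    if avg_cpu >= policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu <= policy.scale_in_cpu:
        target = current - 1
    else:
        target = current
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(desired_instances(4, 85.0, policy))  # 5
    print(desired_instances(4, 10.0, policy))  # 3
    print(desired_instances(2, 10.0, policy))  # 2 (already at the minimum)
```

This policy sees only CPU. If the network interfaces are saturated and the CPUs sit idle waiting, the policy removes instances exactly when the application is struggling. That is the trap described on this slide.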

AUTOSCALING WITH ELB + EC2 + CLOUDWATCH

• If servers start and stop continuously, we have to find ways to keep their software fresh and up to date

• When a server starts, it has to create a valid environment in order to serve web pages. Strategies?
  • Compile and bundle all the software into one instance image
    • It is very simple, but the software becomes outdated very quickly: to release an update you have to build a new image and update the launch configuration used by the load balancer's autoscaling group. It is a long and complex operation
  • Use the EC2 user data feature provided by AWS
    • You can run a shell script when your instance bootstraps. It is more flexible because you can create a skeleton image (PHP + libraries) and download all the application software at boot time
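
This sketches the user data approach with boto3's `run_instances`, whose `UserData` parameter boto3 base64-encodes automatically. The bootstrap script, the image ID and the instance type are hypothetical. The script assumes the image already contains the PHP skeleton and fetches WordPress at boot. With Auto Scaling, the same script would go into the launch configuration's user data; a single `run_instances` call keeps the sketch short.

```python
from typing import Any

# Hypothetical bootstrap script: the image already holds PHP + libraries,
# the application itself is downloaded when the VM boots.
BOOTSTRAP = """#!/bin/bash
set -e
svn checkout http://core.svn.wordpress.org/tags/{wp_version}/ /mnt/wordpress
# download wp-config.php and the plugins here
/etc/init.d/apache2 reload
"""


def launch_web_instance(ec2: Any, image_id: str, instance_type: str,
                        wp_version: str) -> dict:
    """Start one web VM whose user data bootstraps the application.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    return ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=BOOTSTRAP.format(wp_version=wp_version),
    )
```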

THE PROBLEM WITH SOFTWARE MANAGEMENT

Use SVN (Subversion) to download the latest version of WordPress.

Using "trunk" is probably not a good idea, but you can use tags in order to keep all VMs aligned:

    svn checkout http://core.svn.wordpress.org/tags/3.5.1/ mywebsite

http://codex.wordpress.org/Installing/Updating_WordPress_with_Subversion

Use SVN externals to download your plugins:

    cd mywebsite/wp-content/plugins/
    svn propset svn:externals "akismet http://plugins.svn.wordpress.org/akismet/tags/2.5.7/" .
    svn up

Create or download your WordPress configuration file during the VM bootstrap.
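
When many plugins are pinned, it helps to generate these commands from a version list. This is a minimal sketch. The `plugins` mapping is a hypothetical input, and actually running the commands requires the svn client on the VM. Each plugin becomes one "localdir URL" line of the svn:externals property.

```python
import subprocess
from typing import Dict, List

WP_SVN = "http://core.svn.wordpress.org/tags/{version}/"
PLUGIN_SVN = "http://plugins.svn.wordpress.org/{name}/tags/{version}/"


def checkout_commands(target: str, wp_version: str,
                      plugins: Dict[str, str]) -> List[List[str]]:
    """Build the svn commands that pin WordPress and its plugins to tags."""
    plugins_dir = f"{target}/wp-content/plugins"
    # One "localdir URL" line per plugin in the svn:externals property
    externals = "\n".join(
        f"{name} {PLUGIN_SVN.format(name=name, version=version)}"
        for name, version in sorted(plugins.items())
    )
    return [
        ["svn", "checkout", WP_SVN.format(version=wp_version), target],
        ["svn", "propset", "svn:externals", externals, plugins_dir],
        ["svn", "update", plugins_dir],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(checkout_commands("mywebsite", "3.5.1", {"akismet": "2.5.7"}))` reproduces the manual steps above. Every VM that runs it ends up with identical versions.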

HOW CAN WE DOWNLOAD WP AND PLUGINS?

• If you run 10 servers, executing commands on each of them by hand gets hard. You can use tools that run commands on a list of servers:
  • Capistrano (Ruby)
    • https://github.com/capistrano/capistrano
  • Fabric (Python)
    • https://github.com/fabric/fabric
    • Use Cloth for AWS EC2 instances
      • https://github.com/garethr/cloth

HOW TO UPDATE CONFIGURATIONS AT RUNTIME?

#! /usr/bin/env python
# fabfile.py: Fabric 1.x tasks, with Cloth providing EC2 host discovery
from __future__ import with_statement

from fabric.api import *
from fabric.contrib.console import confirm
from cloth.tasks import *

# Connection settings shared by every task
env.user = "root"
env.directory = '/mnt/wordpress'
env.key_filename = ['/home/walter/Amazon/wp-cms.pem']

@task
def reload():
    "Reload Apache configuration"
    run('/etc/init.d/apache2 reload')

@task
def tail():
    "Tail the system log"
    run('tail /var/log/syslog')

EC2 instances are dynamic and their addresses are not known in advance; for that reason we can use the tagging system (instance names) to execute commands on a group of instances:

    fab nodes:"^production.*" tail

This executes the "tail" task on all instances whose name starts with "production", e.g.:
• production.web-1
• production.log
• production.mongodb
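
The following sketch shows how a name pattern like "^production.*" can be resolved to host names. This is not Cloth's code. It assumes the boto3 EC2 client's `describe_instances` call and matches the regular expression against each running instance's Name tag. Only the first page of results is read; a paginator would be needed for large accounts.

```python
import re
from typing import Any, List


def hosts_matching(ec2: Any, pattern: str) -> List[str]:
    """Public DNS names of running instances whose Name tag matches `pattern`.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    regex = re.compile(pattern)
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    hosts = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            # Tags come back as a list of {"Key": ..., "Value": ...} dicts
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if regex.search(tags.get("Name", "")):
                hosts.append(instance["PublicDnsName"])
    return hosts
```

In a Fabric 1.x fabfile, the result could be assigned to `env.hosts` so that subsequent tasks run on exactly those machines.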

EXAMPLE OF FABRIC – USAGE WITH CLOTH

• We create and destroy instances through alarms, but when we terminate an instance we immediately lose all its Apache logs (or equivalent)

• How can we manage logs?
  • The simplest way is to use an Rsyslog cluster
  • Rsyslog is open-source software that forwards log messages over an IP network
  • Rsyslog implements the basic syslog protocol
    • That means we can configure Apache to log to "syslog" instead of plain text files
    • This way we can collect all the logs on one group of VMs and process these files later with other technologies.
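
Any process, not only Apache, can ship its log lines to the central collector over the syslog protocol. This is a minimal sketch using Python's standard `logging.handlers.SysLogHandler` over UDP. The collector address and the local0 facility are assumptions. Apache itself can be pointed at syslog with a directive such as `ErrorLog syslog:local1`.

```python
import logging
import logging.handlers


def syslog_logger(host: str, port: int = 514) -> logging.Logger:
    """Logger that sends every record to a remote syslog (e.g. rsyslog) server over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),                            # a (host, port) tuple means UDP
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    logger = logging.getLogger(f"web.{host}:{port}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

For example, `syslog_logger("logs.internal").info("GET /index.php 200")` sends the line to a hypothetical collector host. The message survives even if this VM is terminated a minute later.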

ALSO LOG MANAGEMENT IS NOT SIMPLE…

• Collecting logs is not the last step: you have to analyse them and reduce the information
  • Move logs to an S3 bucket (time based)
  • Analyze the logs with Hadoop
    • MapReduce in the cloud with the Elastic MapReduce service (EMR)
  • Use scripting languages on top of Hadoop in order to simplify log analysis
    • Hive – data warehouse infrastructure (data summarization)
    • Pig – high-level platform for creating MapReduce programs
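
This sketches the shape of a MapReduce job over the collected Apache logs, counting requests per HTTP status code. It is written in the spirit of Hadoop Streaming, but with plain generators. A real Streaming job would read stdin and write tab-separated lines, and Hadoop would do the sort between the two phases. The regular expression assumes the Apache common/combined log format.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# Status code in an Apache common/combined log line: the field right after
# the quoted request, e.g. ... "GET / HTTP/1.1" 200 512 ...
STATUS = re.compile(r'"[^"]*" (\d{3}) ')


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for every parsable log line."""
    for line in lines:
        match = STATUS.search(line)
        if match:
            yield match.group(1), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum the counts per key; input must be sorted by key (Hadoop does this)."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_locally(lines: Iterable[str]) -> dict:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))
```

Testing the mapper and reducer locally on a small log sample is cheap. Once they give the expected counts, the same logic can be submitted to EMR, or expressed as a one-line GROUP BY in Hive.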