Caching, sharding, distributing - Scaling best practices

DESCRIPTION

The German travel meta search engine Swoodoo was hit by heavy load spikes caused by TV advertisements. Learn which caching, hosting and database strategies we implemented successfully, and which ones did not work well. The talk covers file-based caching, APC, memcached and sharded database layouts, through to our experiences with fully virtualized hosting.

18.11.2009

Lars Jankowfsky

CTO swoodoo AG

Wednesday, 18 November 2009

About me:

PHP, C++, Developer, Software Architect since 1992

PHP since 1998

Many successful projects from 2 to 20 developers

Currently running three projects using eXtreme Programming

CTO and (Co-)Founder swoodoo AG

(Co-)Founder OXID eSales AG

LOAD?

Average 17, Maximum 138

Scaling?

Scaling, Distributing, Caching, Sharding

Scaling

SOA Scaling

[Diagram: "Your App" replicated six times]

SOA Scaling

Your App

GUI/Frontend

Database

API

Engine

SOA Scaling

[Diagram: the services scaled independently: GUI/Frontend ×3, API ×2, Engine ×1, Database ×1]

SOA: PRO

Scalable!

You can add servers where you need them

Easier to maintain

More robust

Easy to introduce high availability (HA)

Cloud...

SOA: CON

A lot of work...

Difficult to test when doing TDD

Complex deployment

Distributing

Virtual Machines

[Diagram: two physical servers (Server 1, Server 2) hosting the virtual machines GUI ×3, API ×3, DB and Engine]

Virtual Machines

[Diagram: the same virtual machines spread over more physical servers; labels shown: Server 1, Server 2, GUI, API, Engine, DB]

Virtual Machines: PRO

Easy to distribute onto new hardware as needed

Isolated, separated services even on one machine

Easy to install when using templates (DB, GUI...)

Very good for testing and staging

Virtual Machines: CON

Hardware failure...

Costs (at least for VMware)

Performance penalty (15%)

Limitations (VMware only 4 CPUs, vSphere 8...)

Some resources can't be virtualized (disk I/O)

Caching

Caching

GUI/Frontend

Database

API

Engine

Files: PRO

Simple, easy to start with

Good for a "shared-nothing" architecture

Files: CON

Hits the HDD

Consumes memory (file system cache)

Local cache, can't be reused by different servers

Manual handling of expiration

Serialization penalty
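
The file-cache trade-offs above (simple to start with, but expiration and serialization are handled by hand) can be sketched as follows. This is a minimal Python illustration, not the code used at swoodoo; the class name `FileCache`, the key format and the TTL value are assumptions made for the example.

```python
import hashlib
import os
import pickle
import tempfile
import time


class FileCache:
    """Tiny file-based cache: one pickle file per key, expiry checked by hand."""

    def __init__(self, directory, ttl_seconds):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so arbitrary strings become safe file names.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def set(self, key, value):
        # Write to a temp file and rename it, so readers never see a partial entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as handle:
            # Serialization penalty: every write pickles the value.
            pickle.dump((time.time() + self.ttl_seconds, value), handle)
        os.replace(tmp_path, self._path(key))

    def get(self, key, default=None):
        try:
            with open(self._path(key), "rb") as handle:
                expires_at, value = pickle.load(handle)
        except FileNotFoundError:
            return default
        if time.time() >= expires_at:
            # Manual expiration: nothing removes stale files for us.
            try:
                os.remove(self._path(key))
            except FileNotFoundError:
                pass
            return default
        return value


cache = FileCache(tempfile.mkdtemp(), ttl_seconds=60)
cache.set("flights:MUC-LHR", [{"price": 99}])
print(cache.get("flights:MUC-LHR"))
```

The cache directory lives on one machine, which is the "local cache" drawback from the slide: a second web server would need its own copy of every entry.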

APC: PRO

Opcode cache

Invalidation and size limits are handled automatically

Good for a "shared-nothing" architecture

APC: CON

Bloats web server (Apache) process memory

Local cache, can't be reused by different servers
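
APC itself is a PHP extension, so the following Python sketch is only an analogy for its data-cache side: a cache that lives inside the server process, with a size limit and expiry enforced automatically. The class `LocalCache` is hypothetical, and the least-recently-used eviction is an assumption chosen for the example, not a description of how APC evicts entries.

```python
import time
from collections import OrderedDict


class LocalCache:
    """In-process cache with automatic expiry and a hard entry limit."""

    def __init__(self, max_entries, ttl_seconds):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (time.monotonic() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            # Size limit handled automatically: drop the least recently used entry.
            self._entries.popitem(last=False)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries are invalidated on access.
            del self._entries[key]
            return default
        self._entries.move_to_end(key)
        return value


airports = LocalCache(max_entries=1000, ttl_seconds=3600)
airports.set("MUC", "Munich")
print(airports.get("MUC"))
```

No serialization or network hop is involved, but the entries occupy the memory of the process that holds them and are invisible to other servers, matching the two drawbacks listed on the slide.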

memcached: PRO

Can be used by several servers

Invalidation and size limits are handled automatically

memcached: CON

Network roundtrip penalty

Serialization penalty
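
The two memcached costs named above can be made visible with a small cache-aside sketch. To stay runnable without a server, it uses `FakeMemcached`, a hypothetical in-memory stand-in that only counts calls; it is not the API of any real memcached client. The function name `cached_lookup` is likewise an assumption for the example.

```python
import pickle


class FakeMemcached:
    """In-memory stand-in for a shared cache server; stores bytes only."""

    def __init__(self):
        self._store = {}
        self.round_trips = 0  # each call would be one network roundtrip

    def get(self, key):
        self.round_trips += 1
        return self._store.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self._store[key] = value


def cached_lookup(client, key, compute):
    """Cache-aside: try the cache first, compute and store on a miss."""
    raw = client.get(key)
    if raw is not None:
        return pickle.loads(raw)  # deserialization cost on every hit
    value = compute()
    client.set(key, pickle.dumps(value))  # serialization cost on every miss
    return value


client = FakeMemcached()
result = cached_lookup(client, "flights:MUC-LHR", lambda: [{"price": 99}])
result = cached_lookup(client, "flights:MUC-LHR", lambda: [{"price": 99}])
print(result, client.round_trips)
```

Because the store sits outside the web server process, any number of servers can share the same entries, which is the advantage that pays for the extra roundtrips.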

Conclusion: Caching

memcached, File System, APC

Conclusion: Caching

memcached: cache

APC: opcode, rarely used local data

Sharding

Single Table Database

Data

Single Table: PRO

Simple, easy to start with

Single Table: CON

Slow

Read/write locking is problematic

Doesn't scale properly

Offline/Online Table

[Diagram: one database with an offline, write-only table (InnoDB) from which an online, read-only table (MyISAM) is generated once per hour]

Offline/Online Table: PRO

Simple architecture

Separation between read and write access

Very fast reads

Offline/Online Table: CON

Writes are not scalable

The generation process takes longer as the data grows

"Stale" data might occur in the read table, no "live" feeling

After regeneration the read table is "cold" again. Slow!
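
The offline/online pattern described above can be sketched as a periodic rebuild-and-swap. The example uses Python's built-in SQLite as a stand-in for the InnoDB and MyISAM tables on the slides; the table names, the columns and the aggregation query are assumptions made for illustration.

```python
import sqlite3


def regenerate_online_table(conn):
    """Rebuild the read table from the write table, then swap it in."""
    with conn:  # commits on success, rolls back on error
        conn.execute("BEGIN")  # explicit transaction so the swap is all-or-nothing
        conn.execute("DROP TABLE IF EXISTS flights_online_new")
        conn.execute(
            "CREATE TABLE flights_online_new AS "
            "SELECT route, MIN(price) AS price FROM flights_offline GROUP BY route"
        )
        conn.execute("DROP TABLE IF EXISTS flights_online")
        conn.execute("ALTER TABLE flights_online_new RENAME TO flights_online")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights_offline (route TEXT, price INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO flights_offline VALUES (?, ?)",
        [("MUC-LHR", 120), ("MUC-LHR", 99)],
    )

regenerate_online_table(conn)  # in production this would run once per hour
print(conn.execute("SELECT route, price FROM flights_online").fetchall())
```

Any row written to `flights_offline` after a rebuild stays invisible to readers until the next run, which is the "stale" data drawback, and each rebuild scans the whole write table, which is why generation slows down as the data grows.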

Sharding #1 Generation

[Diagram: the Flight Server in front of the sharded database: four MEMORY masters and four MyISAM slaves]

Sharding #1 Generation: PRO

Scalable!

Still fast with hundreds of millions of records

Separates database logic from the system, easily scalable

Moving, adding and deleting shards on the fly

A query can run on several machines in parallel -> fast!

Sharding #1 Generation: CON

Queries are limited to a shard, you can't join across all shards

Complex to develop, a special "protocol" is needed for the queries

Custom queries are not possible, no SQL any more in your app

Difficult to maintain data (import, export, purge...)

After a failure or power loss it takes a while to rebuild the tables

Memory table leak

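
The sharding trade-offs above (parallel queries, but no joins across shards) can be sketched with a few in-memory SQLite databases standing in for the shard servers. Everything named here is an assumption for illustration: the class `ShardedFlights`, routing by a CRC32 hash of the route, and the table layout are not taken from the swoodoo flight server.

```python
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor


class ShardedFlights:
    def __init__(self, shard_count):
        # check_same_thread=False lets worker threads use these connections.
        self.shards = [
            sqlite3.connect(":memory:", check_same_thread=False)
            for _ in range(shard_count)
        ]
        for shard in self.shards:
            shard.execute("CREATE TABLE flights (route TEXT, price INTEGER)")

    def _shard_for(self, route):
        # A stable hash of the key decides which shard owns the record.
        return self.shards[zlib.crc32(route.encode("utf-8")) % len(self.shards)]

    def add(self, route, price):
        shard = self._shard_for(route)
        with shard:
            shard.execute("INSERT INTO flights VALUES (?, ?)", (route, price))

    def cheapest(self, max_price, limit):
        def query(shard):
            return shard.execute(
                "SELECT route, price FROM flights WHERE price <= ? "
                "ORDER BY price LIMIT ?",
                (max_price, limit),
            ).fetchall()

        # Scatter: run the same query on every shard in parallel.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            rows = [row for part in pool.map(query, self.shards) for row in part]
        # Gather: the application merges the partial results itself.
        return sorted(rows, key=lambda row: (row[1], row[0]))[:limit]


db = ShardedFlights(shard_count=4)
for route, price in [("MUC-LHR", 99), ("BER-CDG", 59), ("HAM-FCO", 149)]:
    db.add(route, price)
print(db.cheapest(max_price=200, limit=2))
```

The merge step in `cheapest` is where the "special protocol" cost shows up: sorting, limiting and any join-like logic across shards has to be written in the application instead of in SQL.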

Sharding #2 Generation

[Diagram: the Flight Server in front of the sharded database: four InnoDB masters and four MyISAM slaves]

Sharding #2 Generation: PRO

More stable (InnoDB vs. MEMORY)

Fast failover

Slave hardware can be used for production shards

Sharding #2 Generation: CON

Slower (MEMORY is faster than InnoDB)

But that's OK, we got additional machines (slaves...)

"Questions?"
