Caching, sharding, distributing - Scaling best practices


18.11.2009

Lars Jankowfsky

CTO swoodoo AG

Wednesday, 18 November 2009

About me:

PHP, C++, Developer, Software Architect since 1992

PHP since 1998

Many successful projects with teams of 2 to 20 developers

Currently running three projects using eXtreme Programming

CTO and (Co-)Founder swoodoo AG

(Co-)Founder OXID eSales AG


LOAD?

Average 17, Maximum 138


Scaling?

Scaling Distributing

Caching Sharding


(c) istockphoto


Scaling


SOA Scaling

[Diagram: six copies of "Your App" side by side]


SOA Scaling

Your App

GUI/Frontend

Database

API

Engine


SOA Scaling

GUI/Frontend

Database

API

Engine

GUI/Frontend  GUI/Frontend

API


SOA

Scalable!

You can add Servers where you need them

Easier to maintain

More robust

Easy to introduce HA (high availability)

Cloud...

PRO


SOA

A lot of work....

Difficult to test when doing TDD

Complex deployment

CON


Distributing


Virtual Machines Distributing

[Diagram: Server 1 and Server 2 host virtual machines for GUI (x3), API (x3), DB and Engine]


Virtual Machines Distributing

[Diagram: GUI, API, Engine and DB virtual machines distributed across several servers]


Virtual Machines

Easy to distribute on new hardware as needed

Isolated, separated services even on one machine

Easy to install when using templates (DB, GUI...)

Very good for testing, staging

PRO


Virtual Machines

Hardware failure....

Costs (at least for VMware)

Performance penalty (15%)

Limitations (VMware only 4 CPUs, vSphere 8...)

Some resources can't be virtualized (disk I/O)

CON


Caching



Caching

GUI/Frontend

Database

API

Engine


Files

simple, easy to start with

good for a "share nothing" architecture

PRO


Files

hits the HDD

consumes memory (file system cache)

local cache, can't be reused by different servers

manual handling of expiration

serialization penalty

CON
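The file cache trade-offs above can be made concrete with a minimal sketch. It assumes one pickle file per key and uses the file's modification time for expiry; the class name `FileCache`, the `.cache` suffix and the TTL parameter are hypothetical choices for illustration, not something taken from the talk.

```python
import os
import pickle
import tempfile
import time


class FileCache:
    """Tiny file-based cache: one pickle file per key, expiry checked by hand."""

    def __init__(self, directory, ttl_seconds):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hypothetical naming scheme; assumes keys are safe file names.
        return os.path.join(self.directory, key + ".cache")

    def set(self, key, value):
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(value, handle)  # serialization penalty
        os.replace(tmp_path, self._path(key))

    def get(self, key, default=None):
        path = self._path(key)
        try:
            age = time.time() - os.path.getmtime(path)
            if age > self.ttl_seconds:
                os.remove(path)  # manual handling of expiration
                return default
            with open(path, "rb") as handle:
                return pickle.load(handle)  # read from disk or file system cache
        except FileNotFoundError:
            return default
```

Each web server would keep its own directory, which is why this kind of cache fits a "share nothing" setup but cannot be reused across machines.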


APC

Opcode cache

Invalidation and size limits are handled automatically

good for a "share nothing" architecture

PRO


APC

bloats web server (Apache) process memory

local cache, can't be reused by different servers

CON


memcached

can be used by several servers

Invalidation and size limits are automatically handled

PRO


memcached

network roundtrip penalty

serialization penalty

CON
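A shared cache is typically used in a cache-aside fashion: look in the cache, fall back to the slow source, store the result. The sketch below shows that pattern and where the serialization cost appears. `FakeMemcached` is a hypothetical in-memory stand-in so the example runs without a server; a real deployment would use an actual memcached client and pay a network round trip on every `get` and `set`.

```python
import json
import time


class FakeMemcached:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, expire):
        # Store the value together with its absolute expiry time.
        self._items[key] = (value, time.monotonic() + expire)

    def get(self, key):
        item = self._items.get(key)
        if item is None or item[1] < time.monotonic():
            return None  # missing or expired
        return item[0]


def cached_lookup(cache, key, load, expire=300):
    """Cache-aside: try the shared cache first, fall back to the slow loader."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)  # deserialization cost on every hit
    value = load()
    cache.set(key, json.dumps(value).encode("utf-8"), expire)  # serialization cost
    return value
```

Because every server talks to the same cache object, a value loaded by one server is a hit for all the others.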


Conclusion

memcached

File System APC

Caching


Conclusion

memcached  APC

Caching

cache  opcode  rarely used local data


Sharding


Single Table Database

Data


Single Table

simple, easy to start with

PRO


Single Table

slow

read/write locks are problematic

doesn't scale properly

CON


Offline/Online Table Database

Offline, write only: InnoDB

Once per hour

Online, read only: MyISAM


Offline/Online Table

simple architecture

separation between read & write access

very fast reads

PRO


Offline/Online Table

writes not scalable

generation process will take longer with more data

"stale" data might occur in the read table, no "live" feeling

after the read table is regenerated, it is "cold" again. Slow!

CON
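The offline/online split can be sketched as a periodic job that rebuilds the read table from the write table and then swaps it in. The example uses SQLite from the Python standard library only so that it runs anywhere; the table names `flights_offline` and `flights_online` and the `MIN(price)` aggregation are assumptions for illustration, not the schema from the talk.

```python
import sqlite3


def regenerate_online_table(conn):
    """Rebuild the read-only table from the write table, then swap it in.

    Assumes `conn` was opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below control the transaction.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS flights_online_new")
        # The generation step: this is the part that gets slower with more data.
        conn.execute(
            "CREATE TABLE flights_online_new AS "
            "SELECT route, MIN(price) AS best_price "
            "FROM flights_offline GROUP BY route"
        )
        conn.execute("DROP TABLE IF EXISTS flights_online")
        conn.execute("ALTER TABLE flights_online_new RENAME TO flights_online")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Anything written to the offline table after a run stays invisible to readers until the next run, which is the "stale" data problem listed above.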


Flight Server

Sharding #1 Generation Database

[Diagram: four masters (MEMORY), four slaves (MyISAM)]


Sharding #1 Generation

Scalable!

Still fast with hundreds of millions of records

Separates database logic from the system, easily scalable

Moving, Adding, Deleting shards on the fly

query can be run on various machines in parallel -> Fast!

PRO
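Running one query on several shards in parallel is often called scatter-gather. A minimal sketch follows, assuming each shard can be modelled as a list of `(price, route)` rows and that keys are routed with a CRC32 hash; the function names, the routing rule and the row layout are all hypothetical stand-ins for real database shards.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import zlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (assumed routing rule)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def query_shard(shard, max_price):
    """Stand-in for a per-shard query; a shard is a list of (price, route) rows."""
    return sorted(row for row in shard if row[0] <= max_price)


def query_all_shards(shards, max_price, limit):
    """Run the same query on every shard in parallel and merge the sorted results."""
    workers = max(1, len(shards))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(
            pool.map(lambda shard: query_shard(shard, max_price), shards)
        )
    # Each partial result is already sorted, so a k-way merge keeps the order.
    merged = heapq.merge(*partial_results)
    return list(merged)[:limit]
```

The merge step lives in application code, not in the database, which is the price paid for the parallelism.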



Sharding #1 Generation

Queries are limited by shards, you can't join all shards

Complex to develop, special "protocol" needed for the queries

Custom queries not possible, no SQL any more in your app.

Difficult to maintain data (import, export, purge...)

After failure or power loss it takes a while to rebuild tables

Memory table leak

CON


Flight Server

Sharding #2 Generation Database

[Diagram: four masters (InnoDB), four slaves (MyISAM)]



Sharding #2 Generation

More stable (InnoDB vs. MEMORY)

Fast failover

Slave hardware can be used for production shards

PRO
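Failover between a shard's master and its slave can be sketched as a simple fallback on the read path. This is only an illustration under assumptions: nodes are modelled as plain callables, and `ShardUnavailable` is a hypothetical error type, not part of any real database driver.

```python
class ShardUnavailable(Exception):
    """Raised by a node that cannot answer (hypothetical error type)."""


def read_with_failover(master, slave, query):
    """Ask the master first; if it is down, answer from the slave instead."""
    try:
        return master(query)
    except ShardUnavailable:
        return slave(query)
```

A real setup would also need health checks and a way to promote the slave for writes; the sketch covers reads only.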



Sharding #2 Generation

Slower (MEMORY is faster than InnoDB)

but that's ok, we got additional machines (slaves...)

CON


Questions?

