Caching, sharding, distributing - Scaling best practices

Lars Jankowfsky, CTO swoodoo AG. Wednesday, 18 November 2009

DESCRIPTION

The German travel meta-search engine Swoodoo was hit by heavy load spikes caused by TV advertisements. Learn which caching, hosting and database strategies we implemented successfully, and which did not work well. The talk covers file-based caching, APC, memcached and sharded database layouts, as well as our experiences with fully virtualized hosting.

Page 1: Caching, sharding, distributing - Scaling best practices

Caching, sharding, distributing - Scaling best practices.

18.11.2009

Lars Jankowfsky

CTO swoodoo AG

Wednesday, 18 November 2009

Page 2: Caching, sharding, distributing - Scaling best practices

About me:

PHP, C++, Developer, Software Architect since 1992

PHP since 1998

Many successful projects from 2 to 20 developers

Currently running three projects using eXtreme Programming

CTO and (Co-)Founder swoodoo AG

(Co-)Founder OXID eSales AG

Page 3: Caching, sharding, distributing - Scaling best practices

LOAD?

Average 17, Maximum 138

Page 4: Caching, sharding, distributing - Scaling best practices

Scaling?

Scaling

Distributing

Caching

Sharding

Page 5: Caching, sharding, distributing - Scaling best practices

(c) istockphoto

Page 6: Caching, sharding, distributing - Scaling best practices

Scaling

Page 7: Caching, sharding, distributing - Scaling best practices

SOA Scaling

[Diagram: six copies of "Your App"]

Page 8: Caching, sharding, distributing - Scaling best practices

SOA Scaling

Your App

GUI/Frontend

Database

API

Engine

Page 9: Caching, sharding, distributing - Scaling best practices

SOA Scaling

GUI/Frontend (x3)

API (x2)

Engine

Database

Page 10: Caching, sharding, distributing - Scaling best practices

SOA: PRO

Scalable!

You can add servers where you need them

Easier to maintain

More robust

Easy to introduce high availability (HA)

Cloud...

Page 11: Caching, sharding, distributing - Scaling best practices

SOA: CON

A lot of work...

Difficult to test when doing TDD

Complex deployment

Page 12: Caching, sharding, distributing - Scaling best practices

Distributing

Page 13: Caching, sharding, distributing - Scaling best practices

Distributing: Virtual Machines

[Diagram: GUI (x3), API (x3), DB and Engine virtual machines on Server 1 and Server 2]

Page 14: Caching, sharding, distributing - Scaling best practices

Distributing: Virtual Machines

[Diagram: the GUI (x3), API (x3), Engine and DB virtual machines spread over additional servers]

Page 15: Caching, sharding, distributing - Scaling best practices

Virtual Machines: PRO

Easy to distribute on new hardware as needed

Isolated, separated services even on one machine

Easy to install when using templates (DB, GUI...)

Very good for testing, staging

Page 16: Caching, sharding, distributing - Scaling best practices

Virtual Machines: CON

Hardware failure...

Costs (at least for VMware)

Performance penalty (15%)

Limitations (VMware: only 4 CPUs, vSphere: 8...)

Some resources can't be virtualized (disk I/O)

Page 17: Caching, sharding, distributing - Scaling best practices

Caching

Page 18: Caching, sharding, distributing - Scaling best practices

Page 19: Caching, sharding, distributing - Scaling best practices

Caching

GUI/Frontend

Database

API

Engine

Page 20: Caching, sharding, distributing - Scaling best practices

Files: PRO

simple, easy to start with

good for a "share nothing" architecture

Page 21: Caching, sharding, distributing - Scaling best practices

Files: CON

hits the HDD

consumes memory (file system cache)

local cache, can't be reused by different servers

manual handling of expiration

serialization penalty
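
A minimal sketch of a file-based cache, to make the trade-offs above concrete. It is not the swoodoo implementation (which was PHP); the class name `FileCache`, the one-pickle-file-per-key layout and the mtime-based TTL are all assumptions chosen for illustration.

```python
import os
import pickle
import tempfile
import time


class FileCache:
    """Tiny file-based cache: one pickle file per key (illustrative only)."""

    def __init__(self, directory, ttl_seconds):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Keys are assumed to be safe file names in this sketch.
        return os.path.join(self.directory, key + ".cache")

    def set(self, key, value):
        # Serialization penalty: the value is pickled on every write.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(value, fh)
        # Atomic swap, so readers never see a half-written file.
        os.replace(tmp, self._path(key))

    def get(self, key, default=None):
        path = self._path(key)
        try:
            age = time.time() - os.path.getmtime(path)
            if age > self.ttl_seconds:
                # Manual expiration: nothing else cleans up stale files.
                os.remove(path)
                return default
            with open(path, "rb") as fh:
                return pickle.load(fh)
        except FileNotFoundError:
            return default
```

Every `get` touches the disk (or the file system cache), and the cache directory is local to one server, which is exactly the limitation the slide lists.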

Page 22: Caching, sharding, distributing - Scaling best practices

APC: PRO

Opcode cache

Invalidation and size limits are handled automatically

good for a "share nothing" architecture

Page 23: Caching, sharding, distributing - Scaling best practices

APC: CON

bloats web server (Apache) process memory

local cache, can't be reused by different servers

Page 24: Caching, sharding, distributing - Scaling best practices

memcached: PRO

can be used by several servers

Invalidation and size limits are handled automatically

Page 25: Caching, sharding, distributing - Scaling best practices

memcached: CON

network roundtrip penalty

serialization penalty
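
A rough sketch of the usual "look in the cache first, compute on a miss" pattern against a shared cache. `FakeSharedCache` is a hypothetical in-process stand-in for a memcached client so that the example runs without a server; the function name `cached` and the JSON encoding are likewise assumptions, not the talk's code.

```python
import json
import time


class FakeSharedCache:
    """In-process stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def set(self, key, raw_bytes, ttl_seconds):
        self._data[key] = (raw_bytes, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            # Expired entries disappear without help from the caller.
            self._data.pop(key, None)
            return None
        return entry[0]


def cached(cache, key, ttl_seconds, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    raw = cache.get(key)          # a network roundtrip with a real server
    if raw is not None:
        return json.loads(raw)    # deserialization cost on every hit
    value = compute()
    cache.set(key, json.dumps(value).encode("utf-8"), ttl_seconds)
    return value
```

With a real shared cache, every web server calling `cached` with the same key would reuse the entry written by whichever server computed it first; the price is the roundtrip and the encode/decode step on each access.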

Page 26: Caching, sharding, distributing - Scaling best practices

Conclusion: Caching

memcached

File System

APC

Page 27: Caching, sharding, distributing - Scaling best practices

Conclusion: Caching

memcached: cache

APC: opcode, rarely used local data

Page 28: Caching, sharding, distributing - Scaling best practices

Sharding

Page 29: Caching, sharding, distributing - Scaling best practices

Single Table Database

Data

Page 30: Caching, sharding, distributing - Scaling best practices

Single Table: PRO

simple, easy to start with

Page 31: Caching, sharding, distributing - Scaling best practices

Single Table: CON

slow

read/write locking is problematic

doesn't scale properly

Page 32: Caching, sharding, distributing - Scaling best practices

Offline/Online Table

Database

Offline table: write only, InnoDB

Online table: read only, MyISAM

Online table is regenerated once per hour

Page 33: Caching, sharding, distributing - Scaling best practices

Offline/Online Table: PRO

simple architecture

separation between read & write access

very fast reads

Page 34: Caching, sharding, distributing - Scaling best practices

Offline/Online Table: CON

writes not scalable

generation process will take longer with more data

"stale" data might occur in the read table, no "live" feeling

after each regeneration the read table is "cold" again. Slow!
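
The offline/online pattern can be sketched roughly as follows. SQLite is used purely as a stand-in so the example is self-contained (the talk used MySQL with InnoDB and MyISAM tables); the table names `flights_offline` and `flights_online` and the rebuild-then-rename approach are assumptions for illustration.

```python
import sqlite3


def rebuild_online_table(conn):
    """Copy the write table into a fresh read table, then swap it in."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS flights_online_new")
        conn.execute(
            "CREATE TABLE flights_online_new AS SELECT * FROM flights_offline"
        )
        conn.execute("DROP TABLE IF EXISTS flights_online")
        conn.execute("ALTER TABLE flights_online_new RENAME TO flights_online")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: autocommit mode, transactions are opened explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE flights_offline (route TEXT, price INTEGER)")
conn.execute("INSERT INTO flights_offline VALUES ('MUC-JFK', 420)")
rebuild_online_table(conn)

# A write that arrives after the rebuild is invisible to readers ("stale" data)...
conn.execute("INSERT INTO flights_offline VALUES ('MUC-LHR', 99)")
before = conn.execute("SELECT COUNT(*) FROM flights_online").fetchone()[0]

# ...until the next scheduled rebuild (once per hour on the slide).
rebuild_online_table(conn)
after = conn.execute("SELECT COUNT(*) FROM flights_online").fetchone()[0]
```

The full copy in `rebuild_online_table` is why generation time grows with the data volume, and a freshly built table starts with nothing in the database's caches.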

Page 35: Caching, sharding, distributing - Scaling best practices

Sharding #1 Generation

Flight Server

Database

master (MEMORY) x4

slave (MyISAM) x4

Page 36: Caching, sharding, distributing - Scaling best practices

Sharding #1 Generation: PRO

Scalable!

Still fast with hundreds of millions of records

Separates database logic from the system, easily scalable

Moving, adding, deleting shards on the fly

A query can be run on various machines in parallel -> Fast!
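
A small sketch of running one query on all shards in parallel and merging the results in the application. The shards here are plain in-memory lists standing in for separate database servers; the shard key (`route`), the CRC32-based placement and the function names are assumptions, not details from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

# Four in-memory "shards"; in production each would be a separate database server.
SHARDS = [[] for _ in range(4)]


def shard_for(route):
    """Pick a shard from a stable hash of the route (the shard key is an assumption)."""
    return zlib.crc32(route.encode("utf-8")) % len(SHARDS)


def insert(flight):
    SHARDS[shard_for(flight["route"])].append(flight)


def query_shard(shard, max_price):
    # Stand-in for a per-shard query; each shard only sees its own rows.
    return [f for f in shard if f["price"] <= max_price]


def cheapest(max_price, limit):
    """Scatter the query to every shard in parallel, then merge in the application."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, max_price), SHARDS)
    merged = [f for part in partials for f in part]
    return sorted(merged, key=lambda f: f["price"])[:limit]
```

Because sorting and limiting happen in `cheapest` rather than in SQL, the application needs its own query layer on top of the shards.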

Page 37: Caching, sharding, distributing - Scaling best practices

Sharding #1 Generation: CON

Queries are limited by shards, you can't join across all shards

Complex to develop, special "protocol" needed for the queries

Custom queries not possible, no SQL any more in your app

Difficult to maintain data (import, export, purge...)

After failure or power loss it takes a while to rebuild tables

Memory table leak

Page 38: Caching, sharding, distributing - Scaling best practices

Sharding #2 Generation

Flight Server

Database

master (InnoDB) x4

slave (MyISAM) x4

Page 39: Caching, sharding, distributing - Scaling best practices

Sharding #2 Generation: PRO

More stable (InnoDB vs. MEMORY)

Fast failover

Slave hardware can be used for production shards

Page 40: Caching, sharding, distributing - Scaling best practices

Sharding #2 Generation: CON

Slower (MEMORY is faster than InnoDB)

but that's OK, we got additional machines (slaves...)

Page 41: Caching, sharding, distributing - Scaling best practices

Questions?
