Capacity Planning for LAMP Architectures

John Allspaw, Manager, Operations, Flickr.com
Web Builder 2.0, Las Vegas

Capacity planning:

“the ability to make snap decisions to spend millions of dollars with not enough information”

- Kevin Murphy

It is NOT:

• Performance tuning
• Tips and tricks to be “scalable”

It IS:

• What comes after you’ve made it all “scalable”
• Making sure that you have enough equipment to handle gradual and bursty traffic

Questions to answer

How many of each server “class” should you add as you grow?

Hint: Don’t add too much (too much $$! Ahh!)

Hint: Don’t add too little (too much traffic! Ahh!)

How to make it easier to predict the future*?

How to make it easier to justify those predictions?

How to make it easier to predict the future… in the future?!

*You can’t predict the future, but you can try.

The OLD way of doing things was easier

• A.k.a. “web1.0”

• Small number of content producers

• Control over the content

• Capacity was dictated by the demand for that static/small content

• Even bbs/communities/ecommerce had relatively predictable growth

Today’s way of doing things is harder/fun

• No control over content (users have control)

• No control over usage (users have control)

• Network effect, nonlinear growth (more users/content/contacts/activity mean much more usage)

• Event-related growth (press, news event, social trend, etc. can affect usage and content)

Example: London bombing, tsunami, holidays, etc.

Considerations for “social” applications

• User behavior should guide you in defining capacity metrics (not just server stats)

• Usage can accelerate, not just grow
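
The difference between growing and accelerating can be made concrete with first and second differences of a usage series. This is a minimal sketch; the weekly upload counts are invented for illustration and are not Flickr data.

```python
def differences(series):
    """Return the successive differences of a numeric series."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical weekly photo uploads (illustrative numbers only).
weekly_uploads = [1000, 1100, 1250, 1450, 1700]

growth = differences(weekly_uploads)   # change from one week to the next
acceleration = differences(growth)     # change in that weekly change

# Positive growth means usage is rising. Positive acceleration means the
# rise itself is getting steeper, so a straight-line forecast would
# underestimate future load.
is_accelerating = all(a > 0 for a in acceleration)
```

If `is_accelerating` is true over a meaningful window, a linear forecast is probably too optimistic and the planning horizon should be shortened.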

How we do it at Flickr

Gathering Usage

Application-level information (users, photos, activity, etc.)

Server-level information (cpu, disk I/O, memory, etc.)

We tie the two together

BEFORE we start collecting server stats

What resources are peak-driven? (concurrent use)
– Ex: photos processed/sec, pages/sec, images/sec, DB qps

What resources are permanently consumable?
– Ex: database space, storage (GB/day), etc.
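
The two kinds of resource lead to two different calculations: peak-driven resources are sized against a per-server rate limit, while consumable resources are tracked as remaining headroom over time. A rough sketch follows; the function names and all numbers are hypothetical.

```python
import math

def servers_needed(peak_rate, per_server_ceiling):
    """Peak-driven: servers required to carry a peak rate, rounded up."""
    if per_server_ceiling <= 0:
        raise ValueError("per-server ceiling must be positive")
    return math.ceil(peak_rate / per_server_ceiling)

def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Consumable: days of headroom left at the current growth rate."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day

# Hypothetical figures for illustration only.
web_servers = servers_needed(peak_rate=1000, per_server_ceiling=300)
storage_days = days_until_full(capacity_gb=10000, used_gb=7000,
                               growth_gb_per_day=150)
```

Both functions assume the rate stays constant, which is only a first approximation.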

What is an average user:consumption ratio? (example: user:photo)

What are the highs and lows of the ratios?

Is the average ratio changing over time?
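
These three questions can be answered from periodic application-level snapshots. The sketch below assumes hypothetical monthly totals of users and photos. The high and low here are taken across months; per-user extremes would need per-user data.

```python
from statistics import mean

# Hypothetical monthly snapshots: (month, total_users, total_photos).
snapshots = [
    ("2006-01", 1000, 20000),
    ("2006-02", 1500, 33000),
    ("2006-03", 2000, 50000),
]

def photos_per_user(rows):
    """Return (month, photos-per-user ratio) for each snapshot."""
    return [(month, photos / users) for month, users, photos in rows]

ratios = photos_per_user(snapshots)
values = [ratio for _, ratio in ratios]

ratio_summary = {
    "average": mean(values),
    "low": min(values),
    "high": max(values),
    # A crude trend check: is the latest ratio above the earliest one?
    "rising": values[-1] > values[0],
}
```

A rising ratio means each new user costs more storage than the last, so capacity forecasts based only on user counts would fall short.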

Non-linear growth

Linear relationships, though

Server and Network statistics

Ganglia (we love Ganglia!)
– Multicast-y goodness
– SUPER simple to make a graph from any stat
– Clustering

Other custom rrdtool-based stuff

MRTG
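
One way to get an application-level stat into Ganglia is its `gmetric` command-line tool. The sketch below only builds the command line; the metric name and value are hypothetical, and how Flickr actually fed its stats into Ganglia is not stated in these slides.

```python
import shlex

def gmetric_command(name, value, units, metric_type="uint32"):
    """Build an argv list for Ganglia's gmetric tool for one custom metric."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
        "--units", units,
    ]

# Hypothetical metric name and value, for illustration only.
cmd = gmetric_command("photos_processed_per_min", 42, "photos/min")
print(shlex.join(cmd))
```

On a host running `gmond`, passing `cmd` to `subprocess.run(cmd, check=True)` would publish the value. It is left out here so the example runs without Ganglia installed.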

[Graphs: photos uploaded/processed per minute; average processing time per minute; average CPU per minute]

Gather and record statistics

Accept the ‘observer effect’ (it’s worth it)

Aggregate your stats across clusters

– Stacked graphs
– Totals and averages
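
Aggregating across a cluster amounts to a few sums. This is a minimal sketch, assuming hypothetical per-host request rates; it computes the cluster total and average, plus the cumulative tops a stacked graph would draw.

```python
from itertools import accumulate

# Hypothetical per-host request rates (req/sec) for one cluster.
cluster_req_per_sec = {"www1": 120.0, "www2": 135.0, "www3": 105.0}

def aggregate(stats):
    """Return the cluster-wide total and per-host average."""
    values = list(stats.values())
    total = sum(values)
    return {"total": total, "average": total / len(values)}

def stacked_layers(stats):
    """Return (host, cumulative top) pairs, as a stacked graph would draw them."""
    hosts = sorted(stats)
    tops = accumulate(stats[host] for host in hosts)
    return list(zip(hosts, tops))

cluster_summary = aggregate(cluster_req_per_sec)
layers = stacked_layers(cluster_req_per_sec)
```

The top of the last layer equals the cluster total, which is why a stacked graph shows both per-host share and overall load at once.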

[Graph: Squid client requests over 24 hours. Y axis is req/sec.]

[Graph: Squid LRU reference age over 24 hours. Y axis is in days, so the peak reads as 3.6 hours.]

Find the ceiling of each class/function/server

Maximum allowable somethings

– MySQL : queries/sec before slave lag sets in

– Apache/php : page requests/sec before total meltdown

– Squid/memcached : cache churn rate, request rate

– Storage : disk I/O utilization, storage limit(!)

– Etc.
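
A ceiling of this kind can be read off recorded production samples. The sketch below uses the MySQL case and assumes hypothetical (queries/sec, slave lag in seconds) observations plus an arbitrary lag tolerance. It returns the highest observed rate below the point where lag first exceeds that tolerance.

```python
def find_ceiling(samples, max_lag_seconds):
    """Return the highest qps seen before lag first exceeds the tolerance.

    samples: iterable of (qps, lag_seconds) observations.
    Returns None if even the lowest observed rate exceeds the tolerance.
    """
    ceiling = None
    for qps, lag in sorted(samples):
        if lag > max_lag_seconds:
            break
        ceiling = qps
    return ceiling

# Hypothetical observations from one database slave (illustrative only).
observed = [(500, 0), (900, 0), (1200, 1), (1500, 4), (1800, 30), (1600, 12)]
ceiling_qps = find_ceiling(observed, max_lag_seconds=5)
```

The same pattern applies to the other server classes by swapping in the relevant load and degradation metrics.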

Forget benchmarking, use real load

– Make sure you have an easy mechanism to take servers in and out of production

– Pull machines from a balanced pool during medium-level traffic (very carefully)

– Watch and record
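
Before pulling machines, it helps to estimate what each remaining server will see. This sketch assumes the load balancer spreads traffic evenly and that total traffic stays constant during the test; the pool size and request rate are hypothetical.

```python
def per_server_load(total_rate, servers_in_pool):
    """Expected per-server rate under an evenly balanced pool."""
    if servers_in_pool <= 0:
        raise ValueError("pool must contain at least one server")
    return total_rate / servers_in_pool

total_req_per_sec = 2400.0  # hypothetical pool-wide rate during the test
pool_size = 8               # hypothetical number of servers in the pool

# Per-server load after removing 0, 1, 2 or 3 machines from the pool.
loads = {
    removed: per_server_load(total_req_per_sec, pool_size - removed)
    for removed in range(4)
}
```

Comparing these estimates with what the remaining servers actually record shows how close each step brings them to their ceiling.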

Build the infrastructure to make it EASY to measure

Obvious things to help this:

• Load balancing
• Network segmentation
• Carve up functions into clusters
– Don’t let a machine do more than one primary thing (if you can help it)
– This isn’t for performance! If it makes it faster/better, then bonus!

For graphs you don’t have raw data for

GraphClick

http://www.arizona-software.ch/applications/graphclick/en/

- graph digitizer package
- $8 US
- you pick points on a calibrated image, it spits out tabular data

Once you have:

1. Time history of metrics
2. “Ideal” peak levels (ceiling)

Then you can:

3. Predict the future!*
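
With a metric history and a ceiling in hand, the simplest forecast is a straight-line fit extended until it crosses the ceiling. The sketch below uses an ordinary least-squares line over hypothetical weekly peak values. It assumes growth stays linear, which the earlier slides warn is often not the case, so treat the result as a lower bound on urgency.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def x_at_ceiling(xs, ys, ceiling):
    """Return the x value where the fitted line reaches the ceiling."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None  # flat or shrinking metric never reaches the ceiling
    return (ceiling - intercept) / slope

# Hypothetical weekly peak queries/sec and a hypothetical ceiling.
weeks = [0, 1, 2, 3, 4]
peak_qps = [1000, 1100, 1200, 1300, 1400]
week_ceiling_hit = x_at_ceiling(weeks, peak_qps, ceiling=2000)
```

Subtracting hardware lead time from `week_ceiling_hit` gives the latest point at which an order can be placed.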

Example:

Photo Processing

[Graphs: photos uploaded/processed per minute; average processing time per minute; average CPU per minute]

Dirty linear math

25% CPU @ 40 photos/min
40% CPU @ 60 photos/min

So... take a “ceiling”:

75% CPU @ 112 photos/min = 6720 photos/hour
(but double-check the process time)
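
The 112 photos/min figure appears to come from scaling the second data point proportionally (60 × 75 / 40 = 112.5, rounded down). The sketch below reproduces that and, for comparison, extends a straight line through both data points, which gives a slightly lower ceiling. The function names are illustrative.

```python
def rate_at_cpu_proportional(cpu_pct, photos_per_min, ceiling_pct):
    """Scale one observation proportionally (assumes 0 photos = 0% CPU)."""
    return photos_per_min * ceiling_pct / cpu_pct

def rate_at_cpu_two_point(point_a, point_b, ceiling_pct):
    """Extend the straight line through two (cpu_pct, photos_per_min) points."""
    (cpu_a, rate_a), (cpu_b, rate_b) = point_a, point_b
    slope = (rate_b - rate_a) / (cpu_b - cpu_a)  # photos/min per CPU percent
    return rate_a + slope * (ceiling_pct - cpu_a)

# The two observations and the 75% CPU ceiling from the slide.
proportional = rate_at_cpu_proportional(40, 60, 75)
two_point = rate_at_cpu_two_point((25, 40), (40, 60), 75)

photos_per_hour = int(proportional) * 60
```

The gap between the two estimates (112.5 versus about 106.7 photos/min) is a reminder of how rough the math is, which is one more reason to double-check against the processing time.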

Conclusion

• Know your machines and their limits
• Measure how the site is being used with application-level stats
• Tie real-world observations to server stats

Some Flickr statistics

• 300M photos, 4 or 5 different sizes
• Keep ~25M images in cache at any time, ~1M from RAM
• 2B MySQL queries/day
• 21k req/sec to memcached
• 1.2 PB raw disk storage
