Performance Optimization in Apache 2.0 Development:
How we made Apache faster, and what we learned from the experience

[email protected]

O’Reilly Open Source Convention, San Diego, CA, July 24, 2002




Agenda

• Introductions

• Performance optimization approach
  – Specific optimizations in Apache 2.0
  – General strategy for open-source software performance improvement

• Results and next steps


Goals for Apache 2.0 Performance

• Make the httpd faster

• But what does that mean?
  – How will we measure speed?
  – What are we willing to sacrifice for speed?
  – And why does performance matter?


Optimization Strategy: Part 1

Know your project’s priorities:
• Metrics that matter
• Rules of the game


Performance Guidelines

• Metrics that matter for Apache:
  – Throughput: HTTP requests per second
  – Resource utilization: CPU, memory

• Rules of the game for Apache:
  – Keep the server portable, reliable, configurable, maintainable, and compatible


Making Strategic Tradeoffs

• Use these metrics and rules to make effective tradeoffs

• Example: table data structures
  – Slow, O(n)-time lookups; a significant bottleneck
  – But 3rd party code depended upon the array-based implementation (it wasn’t well abstracted)
  – Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
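One way to speed up an O(n) lookup without changing the array layout that outside code depends on can be sketched as follows. Each entry stores a cheap precomputed checksum of its key, so most non-matching entries are rejected with a single integer comparison before any case-insensitive string compare. The names `table_entry`, `key_checksum`, `make_entry`, and `table_get` are hypothetical; this illustrates the idea and is not the actual APR table code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical entry layout: the table stays a flat array, so existing
   code that walks the elements directly keeps working. */
struct table_entry {
    const char *key;
    const char *val;
    unsigned long checksum;   /* derived from the key's first 4 chars */
};

/* Pack up to four case-folded leading key bytes into one integer.
   Keys that are equal ignoring case always get the same checksum. */
static unsigned long key_checksum(const char *key)
{
    unsigned long sum = 0;
    for (int i = 0; i < 4 && key[i] != '\0'; i++)
        sum = (sum << 8) | (unsigned char)toupper((unsigned char)key[i]);
    return sum;
}

/* Case-insensitive equality test (HTTP header names ignore case). */
static int keys_equal(const char *a, const char *b)
{
    while (*a != '\0' &&
           toupper((unsigned char)*a) == toupper((unsigned char)*b)) {
        a++;
        b++;
    }
    return toupper((unsigned char)*a) == toupper((unsigned char)*b);
}

/* Hypothetical constructor: computes the checksum once, at insert time. */
struct table_entry make_entry(const char *key, const char *val)
{
    struct table_entry e = { key, val, key_checksum(key) };
    return e;
}

/* Still an O(n) linear scan, but most mismatches cost one integer compare. */
const char *table_get(const struct table_entry *elts, size_t n, const char *key)
{
    unsigned long want = key_checksum(key);
    for (size_t i = 0; i < n; i++) {
        if (elts[i].checksum == want && keys_equal(elts[i].key, key))
            return elts[i].val;
    }
    return NULL;
}
```

Keys that share a prefix, such as `Content-Type` and `Content-Length`, get the same checksum, so the full comparison is still required; the checksum only skips the obvious misses, and the asymptotic cost stays O(n).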


Optimization Strategy: Part 2

Profile early, profile often


Profiling Tools

• We used traditional code profiling tools to find the slow functions and basic blocks:
  – gprof
  – Quantify
  – OProfile

• Plus tracing tools to profile system calls:
  – truss
  – strace

• And occasional custom instrumentation
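Custom instrumentation can be as simple as accumulating processor time around a suspect code region. The sketch below relies only on standard C `clock()`; the `prof_counter` type and the `prof_begin`, `prof_end`, and `prof_report` helpers are hypothetical and are not part of Apache or APR.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical counter for one instrumented code region. */
struct prof_counter {
    const char *name;
    unsigned long calls;
    double seconds;          /* accumulated processor time */
};

/* Mark the start of the region. */
static clock_t prof_begin(void)
{
    return clock();
}

/* Mark the end of the region and add its processor time to the counter. */
static void prof_end(struct prof_counter *c, clock_t start)
{
    clock_t end = clock();
    c->calls++;
    if (start != (clock_t)-1 && end != (clock_t)-1)   /* clock() may be unavailable */
        c->seconds += (double)(end - start) / CLOCKS_PER_SEC;
}

/* Print the totals, e.g. at server shutdown. */
static void prof_report(const struct prof_counter *c, FILE *out)
{
    fprintf(out, "%s: %lu calls, %.6f s CPU\n", c->name, c->calls, c->seconds);
}
```

Call `prof_begin()` before the region, pass its result to `prof_end()` afterwards, and dump the counters with `prof_report()`. Unlike a sampling profiler, this measures exactly the region you choose, at the cost of editing the code.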


Profile-Driven Optimization

• Profiling helps to create an informal roadmap:
  – Small problems: fix the code now
  – Medium problems: phase in API changes & faster code
  – Large problems: rearchitect


Profile-Driven Optimization

Apache 2.0 optimizations due to profiling, throughout the entire request processing flow:

• Faster accept(2) serialization
• Less buffer copying
• More scalable, multi-threaded memory allocator
• Faster MIME-type mapper and config merge
• Less string manipulation
• Complete rewrite of server-side-include parser
• Platform-specific socket I/O speedups
• Timestamp caching in access logger

(Figure: request processing flow: Accept Connection → Read Request → Create Request Data Structures → Map URL to File → Determine Content-Type → Open File → Stream Output Through Filters → Send Response to Client → Log Request)
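Timestamp caching pays off because many requests are logged within the same second: the formatted date string changes at most once per second, so it can be reused instead of being rebuilt for every request. This single-threaded sketch assumes a UTC timestamp in the Common Log Format style; `log_timestamp` and `format_calls` are hypothetical names. A multi-threaded server such as Apache 2.0 would need a thread-safe variant (for example `gmtime_r` plus lock-protected or per-thread cache slots).

```c
#include <time.h>

/* Hypothetical single-threaded cache for the access-log timestamp. */
static time_t cached_sec;
static int cache_valid;
static char cached_str[40];
static unsigned long format_calls;   /* how many times we really formatted */

const char *log_timestamp(time_t now)
{
    /* Only reformat when the second has changed since the last call. */
    if (!cache_valid || now != cached_sec) {
        const struct tm *tm = gmtime(&now);   /* not thread-safe */
        if (tm == NULL)
            return "[-]";
        strftime(cached_str, sizeof cached_str,
                 "[%d/%b/%Y:%H:%M:%S +0000]", tm);
        cached_sec = now;
        cache_valid = 1;
        format_calls++;
    }
    return cached_str;
}
```

Every request logged in the same second costs one integer comparison instead of a `gmtime` plus `strftime` call.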


Optimization Strategy: Part 3

Take advantage of improvements in the platform


Platform Optimizations

• 2.0 uses fast platform features if available:
  – sendfile(2)
  – Unserialized or pthread-mutex-serialized accept(2)
  – Atomic operations
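The "use the fast feature if the platform has it" pattern can be sketched for sendfile(2). On Linux the sketch calls `sendfile()` from `<sys/sendfile.h>`, which copies file data inside the kernel; its signature differs on other systems (FreeBSD's, for example), so everywhere else, or when Linux rejects the descriptor pair, it falls back to a `pread`/`write` loop through a user buffer. The wrapper name `send_file_range` and its helper are hypothetical and are not APR's sendfile API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/sendfile.h>
#endif

/* Emulation: copy through a user-space buffer. Like sendfile() with a
   non-NULL offset, it leaves in_fd's file position unchanged. */
static ssize_t copy_with_buffer(int out_fd, int in_fd, off_t *offset, size_t count)
{
    char buf[8192];
    size_t done = 0;

    while (done < count) {
        size_t want = count - done < sizeof buf ? count - done : sizeof buf;
        ssize_t got = pread(in_fd, buf, want, *offset);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (got == 0)
            break;                       /* end of file */
        for (ssize_t sent = 0; sent < got; ) {
            ssize_t w = write(out_fd, buf + sent, (size_t)(got - sent));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            sent += w;
        }
        *offset += got;
        done += (size_t)got;
    }
    return (ssize_t)done;
}

/* Hypothetical wrapper: zero-copy path where known to exist, else emulate. */
ssize_t send_file_range(int out_fd, int in_fd, off_t *offset, size_t count)
{
#ifdef __linux__
    ssize_t n = sendfile(out_fd, in_fd, offset, count);
    if (n >= 0)
        return n;                        /* may be a short count; caller loops */
    if (errno != EINVAL && errno != ENOSYS)
        return -1;
    /* Descriptor pair not supported by sendfile(): fall back. */
#endif
    return copy_with_buffer(out_fd, in_fd, offset, count);
}
```

Callers see one function with one set of semantics, which is the property that lets application code be written once against the fast path.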


Platform Optimizations

• Apache Portable Runtime (APR) library abstracts the OS specifics
  – “Greatest common denominator” approach
  – Write your application code to use efficient OS features
  – On platforms where those features are not available, APR will emulate them

• In 2.0, the concurrency model is a plug-in
  – We can add better threading models for specific platforms
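A plug-in concurrency model comes down to the server core calling through a small table of function pointers, so a platform can supply a better implementation without touching the rest of the code. In this sketch the `struct concurrency_model` interface, the two model names, the runtime lookup by name, and `demo_handler` are all hypothetical. Apache 2.0's real plug-ins are its MPMs, which have a different interface and are chosen when the server is built.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef int (*conn_handler)(int conn);

/* Hypothetical plug-in interface: the core calls only through this table. */
struct concurrency_model {
    const char *name;
    /* Serve n connections; returns how many handlers succeeded, or -1. */
    int (*serve)(const int *conns, int n, conn_handler handle);
};

/* Model 1: one thread handles every connection in turn. */
static int serve_sequential(const int *conns, int n, conn_handler handle)
{
    int ok = 0;
    for (int i = 0; i < n; i++)
        ok += handle(conns[i]) == 0;
    return ok;
}

struct job {
    int conn;
    conn_handler handle;
    int result;
    int spawned;
};

static void *run_job(void *arg)
{
    struct job *j = arg;
    j->result = j->handle(j->conn);
    return NULL;
}

/* Model 2: one POSIX thread per connection. */
static int serve_thread_per_conn(const int *conns, int n, conn_handler handle)
{
    if (n <= 0)
        return 0;
    struct job *jobs = calloc((size_t)n, sizeof *jobs);
    pthread_t *tids = calloc((size_t)n, sizeof *tids);
    if (jobs == NULL || tids == NULL) {
        free(jobs);
        free(tids);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        jobs[i].conn = conns[i];
        jobs[i].handle = handle;
        jobs[i].spawned = pthread_create(&tids[i], NULL, run_job, &jobs[i]) == 0;
        if (!jobs[i].spawned)
            run_job(&jobs[i]);           /* could not spawn: run inline */
    }
    int ok = 0;
    for (int i = 0; i < n; i++) {
        if (jobs[i].spawned)
            pthread_join(tids[i], NULL);
        ok += jobs[i].result == 0;
    }
    free(jobs);
    free(tids);
    return ok;
}

/* The registry: adding a platform-specific model means adding one entry. */
static const struct concurrency_model models[] = {
    { "sequential", serve_sequential },
    { "thread-per-connection", serve_thread_per_conn },
};

const struct concurrency_model *find_model(const char *name)
{
    for (size_t i = 0; i < sizeof models / sizeof models[0]; i++)
        if (strcmp(models[i].name, name) == 0)
            return &models[i];
    return NULL;
}

/* Hypothetical handler: treats a negative descriptor as a failure. */
static int demo_handler(int conn)
{
    return conn >= 0 ? 0 : -1;
}
```

Code that serves connections only ever calls `model->serve(...)`, so swapping in a model tuned for a particular platform does not change its callers.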


Optimization Strategy: Part 4

Use the power of distributed development


Distributed Development

• Just like open-source debugging, open-source performance tuning scales well as more people work on a problem

• “Redundant” coding has worked well:
  – Multiple people implementing different approaches to the same problem
  – Share ideas, compare results, pick the best implementation


Distributed Optimization Example: SSI Parser

From: Brian Pane
Date: 2001-09-05 3:00:35
Subject: remaining CPU bottlenecks in 2.0

…Here are the top 30 functions, ranked according to their CPU utilization. :

                       CPU time
function               (% of total)
--------               ------------
find_start_sequence    23.9
…

* find_start_sequence() is the main scanning function within mod_include. …
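As a point of reference for the algorithms discussed in the thread that follows, a straightforward scan for the directive start sequence `<!--#` can be sketched like this. `find_start_naive` is a hypothetical baseline, not the original `find_start_sequence()`; the relevant property is that it examines every byte of the page, so its cost grows with the full size of each .shtml file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline: returns the offset of the first "<!--#" in
   buf[0..len), or -1. Checks every offset, one byte at a time. */
long find_start_naive(const char *buf, size_t len)
{
    static const char start[] = "<!--#";
    const size_t m = sizeof start - 1;

    for (size_t i = 0; i + m <= len; i++) {
        /* Cheap first-byte test, then compare the whole sequence. */
        if (buf[i] == '<' && memcmp(buf + i, start, m) == 0)
            return (long)i;
    }
    return -1;
}
```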


Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 8:42:46
Subject: [PATCH] Potential replacement for find_start_sequence

…Basically, replace the inner search with a Rabin-Karp search…

From: Sander Striker
Date: 2001-09-05 8:47:59
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'…

From: Sascha Schumann
Date: 2001-09-05 10:51:53
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…I'd suggest looking at BNDM which combines the advantages of bit-parallelism (shift-and/-or algorithms) and suffix automata…

From: Ian Holsman
Date: 2001-09-05 16:18:11
Subject: [PATCH] Potential replacement for find_start_sequence..--skip5

…I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see a lower CPU utilization than the standard mod-includes parser…


Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 19:08:31
Subject: [PATCH] Round 2 of mod_include/find_start_sequence...

…Replaced Rabin-Karp with the bndm algorithm as implemented by Sascha. Seems to work. Can people please test/review?…

• SSI parser performance improvement:
  – Before: 23.9% of total usr CPU time
  – After: 4.8%

• Greater than 4x improvement in 48 hours
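A compact sketch of BNDM (Backward Nondeterministic DAWG Matching) searching for `<!--#` shows why it beats a byte-at-a-time scan. Each pattern byte gets a bit mask; the search reads each window right to left, keeping in one machine word the set of pattern positions that could still match, and stops as soon as that set is empty. It then shifts the window to the start of the last pattern prefix it recognized, so many bytes of the page are never examined at all. `bndm_find` is a hypothetical name and this is not Sascha's or mod_include's code; a real streaming parser also has to handle a start sequence split across two buffers, which this version does not.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical BNDM search: offset of the first occurrence of pat[0..m)
   in text[0..n), or -1. Patterns with m >= bits in an unsigned long are
   rejected with -1 in this sketch. */
long bndm_find(const char *text, size_t n, const char *pat, size_t m)
{
    unsigned long B[UCHAR_MAX + 1] = { 0 };
    if (m == 0)
        return 0;
    if (m > n || m >= sizeof(unsigned long) * CHAR_BIT)
        return -1;

    /* B[c] has bit (m-1-i) set for every position i where pat[i] == c. */
    for (size_t i = 0; i < m; i++)
        B[(unsigned char)pat[i]] |= 1UL << (m - 1 - i);

    const unsigned long mask = (1UL << m) - 1;
    const unsigned long high = 1UL << (m - 1);
    size_t pos = 0;

    while (pos <= n - m) {
        size_t j = m, last = m;
        unsigned long D = mask;      /* every pattern position still possible */

        /* Read the window right to left while the bytes read so far
           still occur somewhere in the pattern (D != 0). */
        while (D != 0 && j > 0) {
            D &= B[(unsigned char)text[pos + j - 1]];
            j--;
            if (D & high) {          /* bytes read so far are a pattern prefix */
                if (j > 0)
                    last = j;        /* remember where that prefix starts */
                else
                    return (long)pos;   /* the whole window matched */
            }
            D = (D << 1) & mask;
        }
        pos += last;                 /* shift to the last prefix seen */
    }
    return -1;
}
```

For a five-byte pattern like `<!--#`, a window of ordinary HTML text usually dies after one or two reads and the window jumps ahead by several bytes, which is where the savings over a full scan come from.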


Results


Results

Performance on a simple file delivery test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections requesting a 10KB non-parsed file over 100Mb/s switched network

httpd     Requests/sec   CPU utilization
1.3.24    777            61%
2.0.36    912            77%
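Expressed as a ratio, the file delivery results work out as follows (arithmetic on the table above only; no additional measurements):

```latex
\frac{912\ \text{requests/s}}{777\ \text{requests/s}} \approx 1.17
\quad\Rightarrow\quad \text{about } 17\% \text{ higher throughput for 2.0.36},
\qquad 77\% - 61\% = 16 \text{ percentage points more CPU utilization}
```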


Results

Performance on a server-parsed (.shtml) file test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections over 100Mb/s switched network
  – .shtml file with virtual includes of five 2KB files

httpd     Requests/sec   CPU utilization
1.3.24    389            94%
2.0.37    712            93%
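Expressed as a ratio, the server-parsed results work out as follows (arithmetic on the table above only):

```latex
\frac{712\ \text{requests/s}}{389\ \text{requests/s}} \approx 1.83
\quad\Rightarrow\quad \text{about } 83\% \text{ higher throughput for 2.0.37 at roughly the same CPU utilization (93\% vs. 94\%)}
```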


Conclusion

Next steps for Apache:

• Continue incremental performance improvements

• Explore highly scalable concurrency models (multiple connections per thread)
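Serving multiple connections per thread usually means an event loop. One thread asks the kernel which connections are ready and services only those, instead of dedicating a blocked thread to each connection. This is a minimal sketch using POSIX `poll()`; `serve_ready_once` is a hypothetical name, and a production loop would also handle writes, partial requests, and closed connections.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* One pass of a hypothetical single-threaded event loop: wait until any of
   the n descriptors is readable, then read from each ready one.
   Returns the number of descriptors serviced, 0 on timeout, -1 on error. */
int serve_ready_once(struct pollfd *fds, nfds_t n, int timeout_ms)
{
    int ready = poll(fds, n, timeout_ms);
    if (ready <= 0)
        return ready;

    int serviced = 0;
    for (nfds_t i = 0; i < n; i++) {
        if (fds[i].revents & (POLLIN | POLLHUP)) {
            char buf[512];
            ssize_t got = read(fds[i].fd, buf, sizeof buf);
            if (got >= 0)
                serviced++;      /* a real server would parse the request here */
        }
    }
    return serviced;
}
```

Idle connections cost only a slot in the `pollfd` array rather than a whole thread, which is what makes this style attractive for large numbers of mostly idle keep-alive connections.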


Conclusion

Recommendations for other projects:

1. Know your project’s priorities:
   • Metrics that matter
   • Rules of the game
2. Profile early, profile often
3. Take advantage of platform improvements
4. Use the power of distributed development