Performance Optimization in Apache 2.0 Development:
How we made Apache faster, and what we learned from the experience
O’Reilly Open Source Convention, San Diego, CA July 24, 2002
Agenda
• Introductions
• Performance optimization approach
  – Specific optimizations in Apache 2.0
  – General strategy for open-source software performance improvement
• Results and Next Steps
Goals for Apache 2.0 Performance
• Make the httpd faster
• But what does that mean?
  – How will we measure speed?
  – What are we willing to sacrifice for speed?
  – And why does performance matter?
Optimization Strategy: Part 1
Know your project’s priorities:
• Metrics that matter
• Rules of the game
Performance Guidelines
• Metrics that matter for Apache:
  – Throughput
    • HTTP requests per second
  – Resource utilization
    • CPU, memory
• Rules of the game for Apache:
  – Keep the server portable, reliable, configurable, maintainable, and compatible
Making Strategic Tradeoffs
• Use these metrics and rules to make effective tradeoffs
• Example: Table data structures
  – Slow, O(n)-time lookups; a significant bottleneck
  – But 3rd party code depended upon the array-based implementation (wasn’t well abstracted)
  – Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
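
The table tradeoff above can be illustrated with a minimal sketch. It keeps the ordered, array-style layout and the O(n) scan, but stores a small precomputed checksum next to each key so that most non-matching entries are rejected by a single integer comparison. The class name `LinearTable`, its methods, and the checksum scheme are assumptions made for illustration in Python; the slides do not say which specific optimizations were applied to the Apache tables.

```python
class LinearTable:
    """Ordered, case-insensitive key/value table with O(n) lookups.

    Hypothetical illustration: entries stay in one insertion-ordered list
    (so code that walks the array keeps working), and each entry carries a
    checksum of the first four bytes of its case-folded key.
    """

    def __init__(self):
        self._entries = []  # list of (checksum, folded_key, key, value)

    @staticmethod
    def _checksum(folded_key):
        # Pack up to four leading bytes of the key into one integer.
        prefix = folded_key.encode("utf-8")[:4].ljust(4, b"\0")
        return int.from_bytes(prefix, "big")

    def add(self, key, value):
        folded = key.lower()
        self._entries.append((self._checksum(folded), folded, key, value))

    def get(self, key, default=None):
        folded = key.lower()
        wanted = self._checksum(folded)
        # Still a linear scan, but the cheap integer test runs first and the
        # full string comparison only happens when the checksums agree.
        for checksum, entry_key, _original, value in self._entries:
            if checksum == wanted and entry_key == folded:
                return value
        return default

    def items(self):
        # Insertion order is preserved, as with a plain array of pairs.
        return [(key, value) for _cs, _folded, key, value in self._entries]


headers = LinearTable()
headers.add("Content-Type", "text/html")
headers.add("Content-Length", "10240")
print(headers.get("content-length"))
```

Note that "Content-Type" and "Content-Length" share the same four-byte prefix, so the full comparison is still needed after the checksum test. In Python the integer prefilter saves little; the idea pays off in a language like C, where it replaces many string comparisons with word comparisons.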
Optimization Strategy: Part 2
Profile early, profile often
Profiling Tools
• We used traditional code profiling tools to find the slow functions and basic blocks
  – gprof
  – Quantify
  – OProfile
• Plus tracing tools to profile system calls
  – truss
  – strace
• And occasional custom instrumentation
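
As a rough idea of what "custom instrumentation" can look like, here is a minimal sketch of a timer that accumulates wall-clock time per named section and reports each section's share of the total. The class `SectionTimer` and the section names are hypothetical; the slides do not describe the instrumentation that was actually added to the httpd.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulate elapsed time per named code section (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable so tests can use a fake clock
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def section(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start
            self.counts[name] += 1

    def report(self):
        """Return (name, percent of measured time) pairs, largest first."""
        grand_total = sum(self.totals.values()) or 1.0
        shares = [(name, 100.0 * spent / grand_total)
                  for name, spent in self.totals.items()]
        return sorted(shares, key=lambda pair: pair[1], reverse=True)


timer = SectionTimer()
with timer.section("parse"):
    sum(range(100000))
with timer.section("log"):
    sum(range(1000))
for name, percent in timer.report():
    print(f"{name}: {percent:.1f}% of measured time")
```

Sections are assumed not to be nested; nested sections would count the inner time twice in the total.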
Profile-Driven Optimization
• Profiling helps to create an informal roadmap:
  – Small problems: fix the code now
  – Medium problems: phase in API changes & faster code
  – Large problems: rearchitect
Profile-Driven Optimization
Apache 2.0 optimizations due to profiling, throughout the entire request processing flow:
• Faster accept(2) serialization
• Less buffer copying
• More scalable, multi-threaded memory allocator
• Faster MIME-type mapper and config merge
• Less string manipulation
• Complete rewrite of server-side-include parser
• Platform-specific socket I/O speedups
• Timestamp caching in access logger
Request processing flow (diagram): Accept Connection, Read Request, Create Request Data Structures, Map URL to File, Determine Content-Type, Open File, Stream Output Through Filters, Send Response To Client, Log Request
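
One of the optimizations listed above, timestamp caching in the access logger, can be sketched as follows. The idea is that a busy server logs many requests within the same second, so the formatted timestamp string only needs to be rebuilt when the second changes. The class name, the `format_calls` counter, and the format string are assumptions for illustration; this is not the httpd's implementation, and it ignores thread safety.

```python
import time


class CachedTimestampFormatter:
    """Format log timestamps, reusing the string within the same second."""

    def __init__(self, fmt="%Y-%m-%d %H:%M:%S"):
        self._fmt = fmt
        self._cached_second = None
        self._cached_text = None
        self.format_calls = 0  # how often the expensive formatting ran

    def format(self, now=None):
        if now is None:
            now = time.time()
        second = int(now)
        if second != self._cached_second:
            # Cache miss: the clock moved to a new second, so reformat (UTC).
            self._cached_text = time.strftime(self._fmt, time.gmtime(second))
            self._cached_second = second
            self.format_calls += 1
        return self._cached_text


formatter = CachedTimestampFormatter()
for _ in range(1000):
    formatter.format()
print("strftime ran", formatter.format_calls, "time(s) for 1000 log entries")
```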
Optimization Strategy: Part 3
Take advantage of improvements in the platform
Platform Optimizations
• 2.0 uses fast platform features if available:
  – sendfile(2)
  – unserialized or pthread-mutex-serialized accept(2)
  – Atomic operations
Platform Optimizations
• Apache Portable Runtime (APR) library abstracts the OS specifics
  – “Greatest common denominator” approach
  – Write your application code to use efficient OS features
  – On platforms where those features are not available, APR will emulate them
• In 2.0, the concurrency model is a plug-in
  – We can add better threading models for specific platforms
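
The "use the fast feature if available, otherwise emulate it" pattern described above can be sketched with sendfile(2). The example below is written in Python rather than against the APR API, and the function names `send_file` and `demo` are hypothetical. It assumes a blocking, connected socket: the caller always invokes one function, which uses `os.sendfile` where the platform provides it and falls back to a plain read/send loop elsewhere.

```python
import os
import socket
import tempfile


def send_file(sock, path, chunk_size=65536):
    """Send the whole file at `path` over a blocking socket; return bytes sent."""
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if hasattr(os, "sendfile"):
            try:
                # Fast path: the kernel copies file data straight to the socket.
                while sent_total < size:
                    sent = os.sendfile(sock.fileno(), f.fileno(),
                                       sent_total, size - sent_total)
                    if sent == 0:
                        break
                    sent_total += sent
                return sent_total
            except OSError:
                # Feature exists but is unusable here; continue with emulation.
                pass
        # Emulation: read/send loop, starting where the fast path stopped.
        f.seek(sent_total)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent_total += len(chunk)
    return sent_total


def demo():
    """Send a small temporary file through a local socket pair."""
    payload = b"x" * 4096
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    left, right = socket.socketpair()
    try:
        sent = send_file(left, tmp.name)
        left.close()  # signal end-of-stream to the reader
        received = b""
        while True:
            chunk = right.recv(65536)
            if not chunk:
                break
            received += chunk
    finally:
        left.close()
        right.close()
        os.unlink(tmp.name)
    return sent, received == payload


print(demo())
```

The calling code never checks which path was taken, which is the point of putting the platform test behind one portable function.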
Optimization Strategy: Part 4
Use the power of distributed development
Distributed Development
• Just like open-source debugging, open-source performance tuning scales well as more people work on a problem
• “Redundant” coding has worked well:
  – Multiple people implementing different approaches to the same problem
  – Share ideas, compare results, pick the best implementation
Distributed Optimization Example: SSI Parser
From: Brian Pane
Date: 2001-09-05 3:00:35
Subject: remaining CPU bottlenecks in 2.0
…Here are the top 30 functions, ranked according to their CPU utilization:
function              CPU time (% of total)
--------              ------------
find_start_sequence   23.9
…
* find_start_sequence() is the main scanning function within mod_include. …
Distributed Optimization Example: SSI Parser
From: Justin Erenkrantz
Date: 2001-09-05 8:42:46
Subject: [PATCH] Potential replacement for find_start_sequence
…Basically, replace the inner search with a Rabin-Karp search…

From: Sander Striker
Date: 2001-09-05 8:47:59
Subject: Re: [PATCH] Potential replacement for find_start_sequence
…Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'…

From: Sascha Schumann
Date: 2001-09-05 10:51:53
Subject: Re: [PATCH] Potential replacement for find_start_sequence
…I'd suggest looking at BNDM which combines the advantages of bit-parallelism (shift-and/-or algorithms) and suffix automata…

From: Ian Holsman
Date: 2001-09-05 16:18:11
Subject: [PATCH] Potential replacement for find_start_sequence..--skip5
…I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see a lower CPU utilization than the standard mod-includes parser…
Distributed Optimization Example: SSI Parser
From: Justin Erenkrantz
Date: 2001-09-05 19:08:31
Subject: [PATCH] Round 2 of mod_include/find_start_sequence...
…Replaced Rabin-Karp with the bndm algorithm as implemented by Sascha. Seems to work. Can people please test/review?…
• SSI parser performance improvement:
  – Before: 23.9% of total usr CPU time
  – After: 4.8%
• Greater than 4x improvement in 48 hours
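
For readers unfamiliar with BNDM (Backward Nondeterministic DAWG Matching), the algorithm named in the thread above, here is a minimal textbook-style sketch in Python. It is not the code that went into mod_include; the function name `bndm_find_all` and the sample page are illustrative assumptions. The pattern is compiled into one bitmask per character, each text window is scanned right to left, and the bit state tells the search how far it can safely skip ahead.

```python
def bndm_find_all(text, pattern):
    """Return the start index of every occurrence of `pattern` in `text`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    n = len(text)
    full = (1 << m) - 1        # m one-bits: every pattern position still alive
    high = 1 << (m - 1)        # set when a prefix of the pattern was recognized

    # Bit (m - 1 - i) of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))

    matches = []
    pos = 0
    while pos <= n - m:
        j = m                  # characters of the window not yet read
        shift = m              # default: skip the whole window
        state = full
        while state:
            # Read the window backwards, one character per iteration.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    # The suffix read so far is a pattern prefix: a match
                    # could start at pos + j, so do not skip past it.
                    shift = j
                else:
                    matches.append(pos)
            state = (state << 1) & full
        pos += shift
    return matches


page = "<p><!--#include virtual='a'--><!--#echo-->"
print(bndm_find_all(page, "<!--#"))
```

When a window contains a character that does not occur in the pattern, the state drops to zero immediately and the search skips several characters at once, which is why this family of algorithms examines fewer characters than a byte-by-byte scan on typical input.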
Results
Performance on a simple file delivery test:
Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections requesting 10KB non-parsed file over 100Mb/s switched network
httpd     Requests/sec   CPU Utilization
1.3.24    777            61%
2.0.36    912            77%
Results
Performance on a server-parsed (.shtml) file test:
Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections over 100Mb/s switched network
  – .shtml file with virtual includes of five 2KB files
httpd     Requests/sec   CPU Utilization
1.3.24    389            94%
2.0.37    712            93%
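
The two result tables can be compared with a few lines of arithmetic. The sketch below uses only the figures reported above; the "requests per second per CPU percentage point" ratio is a derived number introduced here for illustration, not a metric from the slides.

```python
# (requests per second, CPU utilization in percent), as reported in the tables
results = {
    "simple file delivery": [("1.3.24", 777, 61), ("2.0.36", 912, 77)],
    "server-parsed (.shtml)": [("1.3.24", 389, 94), ("2.0.37", 712, 93)],
}


def throughput_ratio(old, new):
    """How many times more requests per second the newer server delivered."""
    return new[1] / old[1]


def requests_per_cpu_point(sample):
    """Requests per second divided by CPU utilization (derived, illustrative)."""
    _version, requests_per_sec, cpu_percent = sample
    return requests_per_sec / cpu_percent


for test_name, (old, new) in results.items():
    print(test_name)
    print(f"  throughput: {old[0]} -> {new[0]} = {throughput_ratio(old, new):.2f}x")
    for sample in (old, new):
        print(f"  {sample[0]}: {requests_per_cpu_point(sample):.1f} req/s per CPU point")
```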
Conclusion
Next steps for Apache:
• Continue incremental performance improvements
• Explore highly scalable concurrency models (multiple connections per thread)
Conclusion
Recommendations for other projects:
1. Know your project’s priorities:
   • Metrics that matter
   • Rules of the game
2. Profile early, profile often
3. Take advantage of platform improvements
4. Use the power of distributed development