Make it scale! Tools and techniques for analyzing
the performance of eZ Publish websites
Gaetano Giunta | eZ Summer Camp | Sept. 6 2012
6/9/2012 SLIDE 3
Synopsis
How to make sure a website can survive go-live and cope with ever-increasing
traffic and amounts of data: knowing what to measure and log, during both
development and production phases; load testing; identifying bottlenecks;
preventing disasters
PRESENTER: GAETANO GIUNTA
Table of contents
The workshop consists of two parts:
• Theory
• As you might guess, it’s all about slides
• Can we skip this or do you want it really detailed? Raise hands!
• Practice
• Part 1: load testing
• Part 2: performance logging
Requirements:
• a working eZ Publish 4 installation on Linux (Debian/Ubuntu preferred)
[a VirtualBox image is available if you don’t have this]
• Shell access, root access
• Internet access
• LibreOffice (or any other spreadsheet software)
1. A scalable web site
• It is impossible to go for infinite scalability
• Expected traffic figures should ideally be known beforehand
• If not, a round of load testing before go-live is highly recommended
2. “Fast enough” pages
• The definition of “enough” has to be agreed upon: for a webshop the threshold is lower than for an institutional site
• Page load times experienced by the user depend on user bandwidth as well as html/js optimization (but that takes a dedicated workshop of its own)
scaling > faster pages
• Typical developer mistake: testing pages on your own laptop (concurrency = 1)
• The fast page becomes extremely slow when concurrency increases
• If traffic never increases, your career as a web developer is on the wrong path
GOALS: finishing the workshop early to go bathing does not count ;-)
• Developers measure traffic in PVS (page views per second/minute/hour/day)
• It is easy to relate to server load
• PV != Hits
• But serving static content should never pose a problem anyway
• Customers measure traffic in concurrent users
• It is a good idea to agree on metrics when defining goals
• Analytics packages generally measure user session length and average page
impressions per session => average page views per second per user
What is “scalable” anyway? Lies, damn lies and statistics (W. Churchill)
A few useful formulas:
• Apache MaxClients x max memory per web page = server memory - OS memory
(assuming you are not running other stuff on the webserver, which you shouldn’t)
• Max PVs per minute = Apache MaxClients x 60 / (page generation + delivery time, in seconds)
• PVs = max concurrent users x avg. page impressions per session / avg. session length
• Tips
• User session != webserver process
• Using a reverse proxy is almost always a good idea
• Apache processes never release memory until they are recycled
• Clients with low bandwidth keep a webserver process occupied for a long time; the reverse proxy acts as a “buffer”
• Avoiding server swapping gives better results under peak traffic
• Limiting traffic at the webserver preserves the rest of the server farm from meltdown
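As a quick sanity check, the formulas above can be run with hypothetical numbers (all the memory sizes, page times and session figures below are invented for illustration):

```shell
# Back-of-the-napkin capacity estimate; every number here is hypothetical
server_mem=4096   # total server RAM, MB
os_mem=512        # RAM reserved for the OS, MB
page_mem=64       # max memory per Apache/PHP process, MB
max_clients=$(( (server_mem - os_mem) / page_mem ))

page_time=2       # page generation + delivery time, seconds
max_pv_min=$(( max_clients * 60 / page_time ))

conc_users=500    # max concurrent users
sess_pages=10     # average page impressions per session
sess_len=600      # average session length, seconds
pv_sec=$(( conc_users * sess_pages / sess_len ))

echo "MaxClients=$max_clients maxPV/min=$max_pv_min PV/sec=$pv_sec"
```

With these figures the server sustains about 1680 page views per minute, comfortably above the roughly 8 PV/s the 500 concurrent users would generate.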
Math! Back-of-the-napkin type
WE NEED TO MEASURE RESOURCE CONSUMPTION
TO FIND AND REMOVE SCALABILITY BOTTLENECKS
• Many resources are involved in serving web pages
• Network
• Server hardware
• OS
• Apache / PHP / APC / eZ Publish
• Database
• Solr / external services / more…
• Scalability is determined by the most scarce resource (bottleneck)
• …which is generally not known beforehand
• Improving response time for a resource which is not loaded can have the perverse effect of overloading the bottleneck resource and actually decrease performance!
• eZ Publish does many things “behind the back” of the developer
• developers suck anyway*
Nosce te ipsum (“know thyself”)
The more you measure, the slower the system (Heisenberg principle)
The more you measure, the harder it is to grasp the overall system state
For eZ Publish applications, start with:
• Ram, CPU, IO (disk), DB requests
• Can be measured either globally on the server (BLACK BOX) or “per page” (WHITE BOX)
• “per page” numbers will usually not vary between environments
• Time taken to generate web pages
• will vary depending on many factors (dev != prod)
• Split between the time eZ does “computation” and access to external resources
• Other?
• Number of active user sessions
The art of measurement, I: what to measure
Information overload: can you spot the problem?
[Screenshots: eZ Publish debug output, up to eZ 4.6 vs eZ 4.7 and later]
[Screenshot: Oracle AWR report; this is the “summary”, the report actually goes on for 10 pages]
1. During development - to avoid nasty surprises when it’s too late
• measured data should be easily understandable by developer
• it should in fact always be right in the developer’s face
• it should be easy to drill down on specific problems
• all the way down to profiling every php function call
2. Before go-live - to validate production HW and architecture
• Never assume that production hw will magically solve all problems
• Sysadmins are morons anyway*
• This is a good time for some load testing
3. In real-life usage - for post-mortem analysis, troubleshooting and more
• A small percentage of users could be getting slow pages without overall stats being
impacted
• Things always change over time
The art of measurement, II: when to measure
• Black Box: measure load of the (web)server
• CLI tools: vmstat, free, iostat, top, ps, atop, dstat, etc…
• PHP: APC control panel
• Apache: mod_status
• Mysql: mtop, innotop, Percona Toolkit, mysqli_get_client_stats, MONyog, MySQL Enterprise Monitor
• Monitoring systems: munin, cacti, zenoss, etc…
(nb: availability monitoring != performance monitoring)
• Need to correlate data with web traffic
• Need to execute load testing to simulate real-life traffic
• White Box: measure load per page
• eZ debug output is good - but it is too detailed
• It does slow down the site a bit
• Reporting needed to compare evolution over time: have to log data somewhere
• Enter ezperformancelogger (but also ezsnpd, ezmunin, etc…)
The art of measurement, III: how to measure
• Always check error logs if there is something unexpected in measured data
• Testing should be done using a realistic data set (e.g. 10,000 users, not 10)
• The clock of all servers should be in sync to allow correlation (no, really)
• Measuring VMs: time is generally a lie
The art of measurement, IV: tips
1. Baseline test
a) Test downloading a small static file, then a big static file
b) Test executing the simplest possible php page which executes a db query
c) Increase the number of concurrent users until you get no increase in hits/second
• Useful to uncover configuration errors in the network / db / AMP stack
• This is the “idealistic” goal for your dynamic pages
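Step a) needs two static test targets; creating them could look like this (the file names and sizes are arbitrary choices):

```shell
# Create a small (1 KB) and a big (5 MB) static file for the baseline download test;
# drop them into the webserver document root before running the client
dd if=/dev/zero of=small.bin bs=1024 count=1 2>/dev/null
dd if=/dev/zero of=big.bin bs=1048576 count=5 2>/dev/null
ls -l small.bin big.bin
```

Downloading the small file stresses request handling overhead, the big one stresses bandwidth; comparing the two helps separate the two effects.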
2. Bruteforce test
a) Hit the homepage N times in a row
b) Increase the number of concurrent users until you get no increase in hits/second
• Can be run on the other most-visited pages of the site as well
• Make sure you are not testing 404 pages (or redirects)
• Reset user sessions between runs if they are auto-generated
• Keep vmstat and iostat open while the test runs to quickly identify the bottleneck
• Does not really correlate to concurrent users
• Results are difficult to communicate to the customer
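The concurrency ramp from step b) could be scripted; this sketch only generates the ApacheBench command lines (the host name and the request count per run are placeholders):

```shell
# Generate one ab invocation per concurrency level into a runnable script
# (ezpublish4.ezsc is a placeholder host; 1000 requests per run is arbitrary)
for c in 1 2 4 8 16 32 64 128; do
  echo "ab -n 1000 -c $c http://ezpublish4.ezsc/ > ab_c$c.txt"
done > run_bruteforce.sh
cat run_bruteforce.sh
```

Keeping one output file per concurrency level makes it easy to extract the “Requests per second” lines into a spreadsheet afterwards.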
Load testing strategies I
3. Full site navigation test
a) Enable logging of interesting data
b) Use wget or httrack to navigate the whole site
c) Get the log file data into a spreadsheet
• Useful to uncover pages with bad resource usage
• Can be run with both cold and warm caches to gauge cache efficiency
4. Scenario testing
• Need support from end user to determine most likely/useful scenarios
• Takes time to configure in load-testing tool
• Do not believe tools that promise to automagically generate a scenario by “sniffing”
browser sessions: manual intervention will be needed
• Always validate first each single response before running the whole test
• The one test which is closest to real life…
• …but also the one which is easiest to manipulate (many knobs to tweak)
Load testing strategies II
• Do not run the load-testing client on the webserver itself (to avoid impacting its cpu)
• Do not measure routers, firewalls or network card performance either (by testing from
remote network), unless what you want is real-world measures
• Always write down complete hw and sw specs – some of it will have changed next
time you want to run the test for comparison (a good idea: zip and save complete
apache and php config files, write down command line used for client in the report)
• Automate tasks to avoid human error / getting bored
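The record-keeping tip above can itself be automated; a sketch (the archived paths, file names and the example client command are assumptions, typical of a Debian setup):

```shell
# Snapshot the stack configuration and the exact client command next to the
# test results, so a later run can be compared against known settings
ts=$(date +%Y%m%d-%H%M%S)
mkdir -p test_logs
dirs=""
for d in /etc/apache2 /etc/php5; do          # keep only the dirs that exist here
  [ -d "$d" ] && dirs="$dirs $d" || true
done
tar czf "test_logs/config-$ts.tar.gz" /etc/hosts $dirs 2>/dev/null || true
{
  echo "hw/sw: $(uname -a)"
  echo "client cmd: ab -n 1000 -c 16 http://ezpublish4.ezsc/"
} > "test_logs/specs-$ts.txt"
ls test_logs/
```

The timestamp in the file names keeps snapshots from successive runs side by side, which is exactly what makes before/after comparisons possible months later.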
Load testing tips
• Apache Bench (ab)
• Good: always available
• Bad: not very flexible; limited support for advanced http features
• Siege
• Good: better than Apache Bench; some support for scenario testing
• Bad: not available by default in many linux distros (or windows)
• Jmeter
• Good: allows complex scenario testing; can run tests from a farm of machines
• Bad: has a learning curve; needs Java
• Httperf, Web Polygraph, …
• Web-based tools
• Good: can test from many locations across the world; easy to use
• Bad: usually do not offer much flexibility; $$$
• Roll-your-own (php) script
• Good: flexible; can be used on servers where you cannot install any other software
• Bad: cannot be compared with other measures; no guarantee of correctness
• Excellent: it is there for you to use! https://github.com/gggeek/ezab
Load testing tools (the ones I know about)
Comes with two scripts:
• ezab.php: a replacement for ApacheBench, for when ab is not available
• abrunner.php: runs ab many times in a row and produces reports
Useful for strategies I and II
Example: testing the VirtualBox VM from the host OS – baseline data
Load testing tools: ezab https://github.com/gggeek/ezab
[Chart: phpinfo.php baseline; requests per second and time per request (mean, 90%, min, max, median, in ms) vs 1 to 128 concurrent clients]
[Chart: favicon.ico baseline; requests per second and time per request (mean, 90%, min, max, median, in ms) vs 1 to 128 concurrent clients]
Testing performance of the eZ Publish 4 installation (homepage) on VirtualBox
1. Install abrunner:
wget https://raw.github.com/gggeek/ezab/master/abrunner.php
2. Execute: test the homepage of the installed site
php abrunner.php -s ezpublish4.ezsc -u / -c "1 2 4 8 16" -a
3. Import the resulting csv file, test_logs/_.csv, into LibreOffice
4. Create a graph by selecting the first 7 columns
• Chart type: line
• Data range: “1st column as label”
5. Icing on the cake: set a separate Y axis for the number of requests/second
6. Stop the VM, add cpus, reboot and repeat steps 1-5; compare the graphs
• Use the -l option of ezab to get different file names for the reports
Load testing tools: ezab (this is an exercise you are expected to carry out)
Load testing: eZ Publish homepage
[Chart: requests per second and time per request (mean, 90%, min, max, median, in ms) vs 1 to 16 concurrent clients]
See anything strange? NB: this VM has 4 VCPUs
• Performance is way too low: 6 page views per second
• RPS does not increase going from 1 to 16 concurrent clients
The fix:
• Xdebug is ON, APC is OFF!
sudo apt-get install php-apc
sudo mv /etc/php5/apache2/conf.d/xdebug.ini /etc/php5/apache2/conf.d/xdbg.ini.bak
sudo service apache2 restart
• Test again: we get 150 rps at concurrency 4 and 8 :-)
Load testing: eZ Publish homepage, after the fix
[Chart: requests per second and time per request (mean, 90%, min, max, median, in ms) vs 1 to 16 concurrent clients, after enabling APC]
Q: Is the site CPU or memory bound?
A: CPU bound: at 16 concurrent clients cpu usage is near 100% (no cpu idle time left), RAM is more than enough (no swap), and RPS decreases.
• Website copier
• GUI app on windows; web-based (or command-line) on linux
• Used for further exercises later on
• Alternative: wget -r
• Install and launch:
sudo apt-get install webhttrack
sudo /usr/lib/httrack/htsserver /usr/share/httrack/
• Connect to http://192.168.56.101:8080/
Tips
• Make sure the server can send requests to itself: add to /etc/hosts the
ezpublish4.ezsc hostname
• If a robots.txt file is present, it will be respected by default
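The wget alternative mentioned above could look like this; a dry run that only writes the command to a script (the host name is the one assumed throughout this workshop, and -e robots=off addresses the robots.txt tip):

```shell
# Crawl the whole site without keeping the downloaded files on disk
# (dry run: the command is written to a script instead of being executed)
echo 'wget --mirror --no-parent --delete-after -e robots=off http://ezpublish4.ezsc/' > crawl.sh
chmod +x crawl.sh
cat crawl.sh
```

--delete-after makes wget discard each page once fetched, which is all that is needed when the point of the crawl is warming caches and filling server-side logs.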
Load testing tools: HTTrack
• Allows the developer to define a set of variables (KPI) to be measured
• Each variable is measured for every page view (rest / ajax pages as well)
• Measured KPIs can be sent to multiple logging systems
• A csv-formatted log is probably the easiest to parse later
• Apache’s own access log is probably the best suited to avoid any performance hit
• Supports logging directly to Google Analytics or Piwik via rewriting of html pages
• Common KPIs are available (eg. db queries, db time), custom ones can be added
• Integrates with Munin to visualize the measured data
• Throws in full integration with XHProf profiler as bonus
• According to facebook “good enough” to keep enabled in production
• Does NOT come with a nice GUI of its own
eZ Performance Logger http://projects.ez.no/ezperformancelogger
Requirements
• eZ Publish 4.x
• Apache webserver recommended
• Optional: Xhprof
• Optional: a Google Analytics account or Piwik
• Optional: Munin
The extension comes preinstalled in the Virtual Machine for the Workshop
To install by hand, follow the standard procedure – no need to touch the database
For advanced users: in classes/tracers, alternative connectors are provided for the mysqli database and the ezdfs cluster handler, which allow measuring performance data even in production environments (where ezdebug is turned off).
Installation
1. Unzip and activate extension
2. Set logging format to csv-formatted file:
create file settings/override/ezperformancelogger.ini.append.php
[GeneralSettings]
LogMethods[]
LogMethods[]=csv
[csvSettings]
FileName=var/log/ezperflog.csv
3. Set performance indicators to be logged, eg: memory used, execution time
[GeneralSettings]
TrackVariables[]
TrackVariables[]=mem_usage
TrackVariables[]=execution_time
4. Test that it is working:
1. Browse to the homepage
2. Check for presence of var/log/ezperflog.csv
First steps
The eZPublish database connector measures all the commands sent to the database:
• Number of queries executed
• Time taken
This is only done when the debug output is enabled.
eZPerformanceLogger allows logging any existing “timing point”
1. Enable debug output (in settings/override/site.ini.append.php)
[DebugSettings]
DebugOutput=enabled
2. Add number of queries and time taken to the performance indicators logged
TrackVariables[]=accumulators/mysqli_query/count
TrackVariables[]=accumulators/mysqli_query
3. Rotate the csv file, since it will now have a different number of columns
php extension/ezperformancelogger/bin/php/rotateperflogs.php
4. Reload homepage, check log file
Measuring database queries per page
By default ajax calls and requests which end up in a redirect are not logged.
How to fix:
• Edit index.php, on line 198 add
eZExecution::addCleanupHandler( array( 'eZPerfLogger', 'cleanup' ) );
• Browse content in the Admin interface (which uses ezjscore)
• Look for calls to ezjscore/call in var/log/ezperflog.csv
• Other frontend controllers have to be patched as well
• index_ajax.php (removed in recent versions)
• index_treemenu.php, index_treemenu_tags.php, index_soap.php
• index_cluster.php currently not supported
Making sure all requests are measured
Q: is the site database-bound ?
1. Use httrack to navigate the whole site (cache warmup)
Tip: exclude from files to be downloaded all images, css, js, m4v
2. Rotate the log file:
php extension/ezperformancelogger/bin/php/rotateperflogs.php
3. Use httrack to navigate the whole site again
4. Rotate log file
5. Import log file into LibreOffice
6. Graph db queries per page, db time per page as % of page time
Q: how effective is the view cache?
i. Disable the view cache
ii. Clear all caches
iii. Execute steps 1 to 6 again
iv. Compare the number of queries per page
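Step 6 can also be done straight from the shell; a sketch, assuming a csv layout of url, execution time, memory, db query count, db time (the real column order depends on the TrackVariables you configured, and the two rows are fabricated):

```shell
# Two fake log rows in the assumed layout: url,exec_time,mem_mb,db_queries,db_time
cat > ezperflog.sample.csv <<'EOF'
/,0.42,16.5,35,0.12
/news,0.30,15.1,22,0.05
EOF

# Report db queries per page and db time as a percentage of total page time
awk -F, '{ printf "%s: %d queries, db = %.0f%% of page time\n", $1, $4, 100 * $5 / $2 }' \
  ezperflog.sample.csv
```

A page where db time is a large share of the total points at query tuning or caching; a page with many queries but a small db share points elsewhere (template execution, external calls).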
Visualization of data: spreadsheets
• Munin is an open source monitoring tool
• It generates daily and weekly graphs for collected data
• It collects a lot of data from the operating system
• It comes with a wide set of plugins for existing software, such as Apache
and MySql
• Creating plugins for new software is relatively easy
• Agent-based architecture: a munin “master” server can collect and display
data from multiple “node” servers
• For our scenario, the webserver acts as both master and node
• Master: runs a cronjob that generates reports by querying nodes and stores
them in /var/cache/munin; the reports are made available via Apache
• Node: runs a daemon, munin-node, listening on port 4949
Visualization of data: Munin I http://www.munin-monitoring.org/
• The default interval for collecting data is 5 minutes (it should be more flexible in
version 2)
• All ezperformancelogger KPIs can be shown in a Munin graph
• By default, the “per page” value of the KPI is shown
• In every graph, the Average, Maximum and Minimum value are shown
• Via eZ Publish settings, appearance of those graphs can be tuned
• Note: the timestamp of the last time the munin plugin has collected data from
ezperformancelogger for any specific KPI is stored in var/<vardir>/log
Visualization of data: Munin II (integration of eZ Performance Logger)
1. make sure you have a valid munin-node installation on your webserver
Connect to http://192.168.56.101/munin
If you get an access denied error, edit /etc/apache2/conf.d/munin and, inside the <Directory /var/cache/munin/www> block, set
Allow from all
2. Symlink the file bin/scripts/ezmuninperflogger_ into /usr/share/munin/plugins/ and make it executable
cd extension/ezperformancelogger/bin/scripts
chmod 755 ezmuninperflogger_
sudo ln -s /var/www/ezpublish4/extension/ezperformancelogger/bin/scripts/ezmuninperflogger_ /usr/share/munin/plugins
Fix an error in the script ezmuninperflogger_: on line 1 put
#!/bin/bash
instead of
#!/bin/sh
Integrating with Munin I
3. Create a configuration file for the munin plugin:
sudo vi /etc/munin/plugin-conf.d/ezmuninperflogger
[ezmuninperflogger_*]
env.php /usr/bin/php
env.ezpublishroot /var/www/ezpublish4
4. Restart the munin node service
sudo service munin-node restart
5. check if the configuration works: sudo munin-node-configure --suggest
If it does, you should see in the output a line similar to:
ezmuninperflogger_ | no | yes (+execution_time +mem_usage)
The "yes" in the second column is important. In parentheses you get the list of
variables which can be graphed.
Integrating with Munin II
6. activate the plugin:
sudo munin-node-configure --suggest --shell
You should get 3 lines with "ln -s ..." commands. Execute them (nb: as root)
7. test that it works: run: sudo munin-run ezmuninperflogger_<$varname>
8. restart munin-node again: sudo service munin-node restart
9. navigate the site, wait 5 minutes, connect to Munin again.
Troubleshooting tip: munin logs are available in /var/log/munin
10. Integrate Munin in the eZ administration interface: edit ezperformancelogger.ini
[MuninSettings]
MuninURL=http://192.168.56.101/munin/
11. Optionally, you can customize how the variables recorded will show up in Munin graphs by editing more ini settings in section [MuninSettings]
Integrating with Munin III
XHProf
• http://pecl.php.net/package/xhprof
• Profiler from Facebook
• Designed to be fast enough to be used in production (at least faster than Xdebug ;-)
• Comes with its own web-based GUI
• Installation:
sudo apt-get install graphviz
sudo pecl config-set preferred_state beta
sudo pecl install xhprof
sudo vi /etc/php5/apache2/conf.d/xhprof.ini => add extension=xhprof.so
sudo service apache2 restart
• While at it, disable apc (???)
Drilling down on hot code paths
• edit your config.php file and add the following lines at the top:
(if you miss the config.php file, copy config.php-RECOMMENDED into config.php)
include( 'extension/ezperformancelogger/classes/ezxhproflogger.php' );
eZXHProfLogger::start();
• Log in to admin interface, go to Setup tab, bottom-left menu item: XHProf Profiling
• You can see the data recorded for the pages you have just browsed to
• Click on the name of a run to get profiling information in all its gory detail
Tips
• To avoid profiling every page, you can instead start the profiler at any place in the code
• Links to profiling runs will be displayed in the debug output as well...
• ...but enabling debug output does have an impact on profiling
• A cronjob is available to periodically remove old profiling data
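The first tip above, sketched as a config.php fragment; only the include path and the eZXHProfLogger::start() call come from this workshop, while the sampling condition and the 1% rate are assumptions to illustrate selective profiling:

```php
// config.php: profile only a random ~1% sample of requests instead of all of them
// (the 1% rate is an arbitrary choice; tune it to your traffic)
include( 'extension/ezperformancelogger/classes/ezxhproflogger.php' );
if ( mt_rand( 1, 100 ) === 1 )
{
    eZXHProfLogger::start();
}
```

Sampling keeps the profiling overhead negligible on a production site while still collecting enough runs to spot hot code paths.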
Activating XHProf
Thanks for participating!
These slides https://dl.dropbox.com/u/520168/eZ%20Performance%20Measurement.pdf
Source code, command snippets https://gist.github.com/gggeek (look for gists numbered 1 to 9)
About me
Consultant for eZ Systems since 2007 [email protected]
@gggeek http://share.ez.no/blogs/gaetano-giunta
http://projects.ez.no/users/community/gaetano_giunta
With helpful support from Yannick Modah Gouez! ( [email protected] )
* = I hope you were not offended by the jokes about developers and sysadmins.
I consider myself a devop: someone embodying the worst aspects of both ;-)
Questions?