8/9/2019 What Performance Metrics Do I Measure
1/22
Results from Metrics Research
Metrics definition
Objective
Methodology
Desktop Site Measurements
onLoad
Analysis
Action
Value Distribution
SpeedIndex
Analysis
Action
Value Distribution
Time to First Byte (TTFB)
Analysis
Action
Value Distribution
Total number of Requests
Analysis
Action
Value Distribution
PageSpeed
Analysis
Action
Value Distribution
VisualComplete
Analysis
Action
Value Distribution
Total Bytes
Analysis
Action
Value Distribution
Number of Domains
Analysis
Action
Value Distribution
Section Conclusion
Mobile
onLoad
Analysis
Action
Value Distribution
SpeedIndex
Analysis
Action
Value Distribution
Total number of Requests
Analysis
Action
Value Distribution
VisualComplete
Analysis
Action
Value Distribution
Section Conclusion
Conclusion
Appendix
Further Reading
People to Follow for Performance
R Program code
By: Akshay Ranganath, Enterprise Architect
Metrics definition
I’ve used the metrics as defined on the WebPageTest website. Here’s a brief
summary of the metrics.
Load Time
The Load Time is measured as the time from the start of the initial navigation until the beginning of
the window load event (onload).
Fully Loaded
The Fully Loaded time is measured as the time from the start of the initial navigation until there was
2 seconds of no network activity after Document Complete. This will usually include any activity
that is triggered by javascript after the main page loads.
First Byte
The First Byte time (often abbreviated as TTFB) is measured as the time from the start of the initial
navigation until the first byte of the base page is received by the browser (after following redirects).
Start Render
The Start Render time is measured as the time from the start of the initial navigation until the first
non-white content is painted to the browser display.
Speed Index
The Speed Index is a calculated metric that represents how quickly the page rendered the
user-visible content (lower is better). More information on how it is calculated is available here.
DOM Elements
The DOM Elements metric is the count of the DOM elements on the tested page as measured at
the end of the test.
Objective
The purpose of this research is to help website stakeholders arrive at the right combination
of metrics for measuring and recording performance details. By “right
combination”, I mean the metrics that provide value from different perspectives on
performance.
Another objective is to identify metrics that are relatively rich and independent of other
metrics. Identifying such specific metrics helps in defining, measuring and
recording a performance budget.
Do note that each business is different and the critical metrics will vary with the business
objectives. For example, Twitter has described how they define effectiveness by “time to first
tweet”. WebPageTest allows you to define and track custom metrics. For a more in-depth
look at custom metrics, do watch this webinar. The presentation from the webinar is posted
here.
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index
http://timkadlec.com/2013/01/setting-a-performance-budget/
https://blog.twitter.com/2012/improving-performance-on-twittercom
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/custom-metrics
This study is for those who are relatively new to the world of performance budgets and are
looking for an answer to the question: “I have limited time and budget for measuring performance.
What are the top 3-4 measurements that will give the most bang for the buck?”
Methodology
For computing the results, the HTTPArchive database was used. From the data, all the non-null
values were extracted and compared for correlation. On the desktop result set, there were
no null values for the metrics that were used for the study. However, for the mobile site
crawl, the data set is sparse and had null values that varied by metric. The idea here is to
look at patterns; hopefully we can revisit the study once HTTPArchive starts to gather
more data for the mobile sites.
Correlation between each pair of metrics was computed using 2 measurements:
● Pearson Correlation: The Pearson product-moment correlation coefficient (sometimes
referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation
(dependence) between two variables X and Y, giving a value between +1 and −1
inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total
negative correlation.
● Spearman Correlation: Spearman's rank correlation coefficient or Spearman's rho,
named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as
r_s, is a nonparametric measure of statistical dependence between two variables. It
assesses how well the relationship between two variables can be described using a
monotonic function. If there are no repeated data values, a perfect Spearman
correlation of +1 or −1 occurs when each of the variables is a perfect monotone
function of the other.
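The two coefficients described above can be sketched in a few lines of Python. This is only an illustration, not the program used for the study: the function names and the five sample pages are hypothetical, and Spearman's rho is computed here as Pearson's r over average ranks, which is one standard way to handle ties.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical sample: request counts and onLoad times (ms) for five pages.
requests = [20, 45, 75, 120, 300]
onload_ms = [900, 2500, 4100, 9000, 30000]

r = pearson(requests, onload_ms)
rho = spearman(requests, onload_ms)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the sample is strictly increasing in both columns, the rank correlation is perfect even though the relationship is not exactly linear.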
In my analysis, a correlation beyond +/-0.7 is considered significant and a value
within +/-0.4 is considered not significant. I have chosen
+/-0.7 and +/-0.4 as thresholds to keep the analysis simple. Many of the metrics exhibit their
highest correlation at the level of +/-0.7. Empirically, metrics exhibiting values less than
+/-0.4 are the ones that we tend to consider independent. For example, the relationship
between onLoad and the number of elements in the DOM is relatively independent.
If two values are not significantly correlated, it implies relative independence between
the two variables.
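As a rough sketch, the thresholds above can be applied to a coefficient like this. The function name is hypothetical, and the "moderate" label for values between the two cut-offs is my own assumption, since the text does not name that middle band.

```python
def classify_correlation(coefficient, strong=0.7, weak=0.4):
    """Label a correlation coefficient using the +/-0.7 and +/-0.4 cut-offs."""
    magnitude = abs(coefficient)  # the sign only gives the direction
    if magnitude > strong:
        return "significant"
    if magnitude < weak:
        return "independent"
    return "moderate"  # assumed label for the band between the cut-offs


for value in (0.85, -0.75, 0.55, 0.2):
    print(value, classify_correlation(value))
```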
http://www.oreilly.com/pub/e/3390
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
Desktop Site Measurements
The following discussion is based on the HTTPArchive database run for March 15, 2015.
onLoad
Analysis
This is the event that is typically measured by most 3rd party synthetic testing tools. Since it
is so widespread, it makes sense to measure this metric.
onLoad is closely correlated to visualComplete and SpeedIndex. There is a decent
correlation between onLoad and total requests, indicating that a site slows down as the
number of requests increases. It will be an interesting number to measure during the transition
from HTTP/1.1 to HTTP/2. HTTP/2 (or H2) can multiplex many requests and responses over a
single TCP connection, and that could help reduce the total number of round-trips.
Action
Always measure onLoad since it is one of the most widely used metrics and can provide
performance comparison across different measuring resources like synthetic tests, RUM and
WebPageTest.
Do note that this is considered a very old metric and is not representative of the user’s perceived
performance. Over time, reduce the emphasis on this metric and start to adopt newer metrics
that are closer to the performance that makes sense for your site. (See notes for more
details).
Value Distribution
All values in milliseconds (ms)
Min. 1st Quartile Median 3rd Quartile Max.
268 8442 14310 27510 102800
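The Value Distribution tables in this document are five-number summaries. A minimal sketch of how such a row can be computed with Python's standard library follows; the function name and the sample values are hypothetical, and the "inclusive" quantile method (linear interpolation between data points) is an assumption about how the quartiles were derived.

```python
import statistics


def five_number_summary(samples):
    """Return min, 1st quartile, median, 3rd quartile and max of the samples."""
    # n=4 asks for the three cut points that split the data into quartiles.
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "min": min(samples),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(samples),
    }


# Hypothetical onLoad samples in milliseconds.
onload_samples = [900, 300, 1100, 500, 700]
summary = five_number_summary(onload_samples)
print(summary)
```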
SpeedIndex
Analysis
SpeedIndex is closely correlated to visualComplete, renderStart and onLoad time, in that
order. It is loosely correlated to TTFB and PageSpeed.
Action
Measure SpeedIndex as it is closely related to the rendering of content, especially above-the-fold
content. Being a number, it is easy to compare and leaves little to subjective interpretation.
The biggest drawback is that this metric is not available across all testing products.
Value Distribution
For extremely performance-oriented sites, the ideal target is ~1000. Refer to this blog post by
Lara Hogan, where she explains the design and use of a performance budget very well.
Here’s the distribution of the SpeedIndex:
All values are in units and are not based on time.
Min 1st Quartile Median 3rd Quartile Max
200 3956 6163 11330 104200
https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/
Time to First Byte (TTFB)
Analysis
TTFB appears to have no strong correlation with the other metrics. At most, it can impact the
start render time. All other metrics are relatively independent of this value.
Action
If this metric is being collected for a static resource that uses a CDN, then it can help measure
the performance of the CDN to some extent. If the page is dynamic, then it will help
determine the health of the connection and the time spent on the back-end.
Since this is the only metric that can expose the time spent on the back-end or the relative
efficiency of the CDN, it should be part of the performance budget.
Value Distribution
All values in milliseconds (ms)
Min. 1st Quartile Median 3rd Quartile Max.
67 545 943 1867 60920
Total number of Requests
Analysis
Total number of requests appears to be correlated with non-performance metrics like 3rd
party domains. However, it does show a decent correlation with fullyLoaded, visualComplete
and onLoad.
Action
If other metrics like onLoad are already being measured, then this metric may be of limited
value. This metric would be helpful when a customer appears to be relying on too many 3rd
party tags and we have reason to believe that there is a performance lag being caused by
these 3rd parties. WebPageTest has an option to test front-end SPOF. More information is
available here and here.
http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html
http://www.stevesouders.com/blog/2010/06/01/frontend-spof/
Value Distribution
All values in units
Min. 1st Quartile Median 3rd Quartile Max.
1 44 75 119 1715
PageSpeed
Analysis
Google PageSpeed is supposed to measure “the network-independent aspects of page
performance: the server configuration, the HTML structure of a page, and its use of external
resources such as images, JavaScript, and CSS”. This is clearly borne out by its very low
correlation with other metrics. It is also important to note that the correlation is mostly
negative, indicating that a lower value of a time metric corresponds to a higher PageSpeed
score.
https://developers.google.com/speed/docs/insights/about
Action
Google PageSpeed values are relatively independent of other metrics and yet impact the site
structure. Since these are measures that need to be implemented by the developers of the
website, it is a very important metric and should be part of the performance budget toolkit.
Apart from providing a number, PageSpeed can also identify issues in the page design, like
blocking JavaScript and stylesheets, that would be harder to identify with other metrics.
Value Distribution
All values in units between 0-100.
Min. 1st Quartile Median 3rd Quartile Max.
0 71 82 89 100
VisualComplete
Analysis
visualComplete tries to measure the time taken to render “above the fold” (ATF) content.
visualComplete is closely related to fullyLoaded (when everything is
loaded), onLoad and SpeedIndex. This makes sense empirically as well. Unless a page has a
lot of lazy-loaded / deferred content, visualComplete will be close to the fullyLoaded time.
Action
Since the value of this metric relates to SpeedIndex and onLoad, measuring it separately
would be of limited value. However, if you want to compare the performance of a page
before and after lazy loading, then use the pair visualComplete and fullyLoaded
to measure the effectiveness of your implementation.
Value Distribution
All values in milliseconds (ms)
Min. 1st Quartile Median 3rd Quartile Max.
0 6700 11900 21400 104200
Total Bytes
Analysis
Total bytes downloaded is not very strongly associated with any metric. However, it does
have a decent correlation with fullyLoaded, visualComplete and onLoad.
It is interesting to note that total bytes has a non-linear correlation (Spearman correlation) with
fullyLoaded, visualComplete and onLoad. Empirically, it would mean that these timings tend to
rise as total bytes rises, but not at a constant rate: in one range a 2 unit increase in
total bytes could come with a 1 unit increase in fullyLoaded, while in another a 1 unit
increase in total bytes could come with a 2 unit increase in fullyLoaded.
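To illustrate what a non-linear but monotonic relationship looks like in the two coefficients, here is a small sketch. The byte sizes and timings are made-up values chosen so that load time always rises with page weight but at a diminishing rate; the helper names are hypothetical and the simple ranking assumes no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson's r for two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes there are no tied values, for brevity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical pages: weight doubles each time, load time grows ever more slowly.
total_bytes = [100_000, 200_000, 400_000, 800_000, 1_600_000, 3_200_000]
fully_loaded_ms = [1000, 1500, 2100, 2600, 3000, 3300]

linear = pearson(total_bytes, fully_loaded_ms)
monotonic = pearson(ranks(total_bytes), ranks(fully_loaded_ms))  # Spearman's rho
print(f"Pearson r = {linear:.2f}, Spearman rho = {monotonic:.2f}")
```

In this sample the Spearman value is a perfect 1 while the Pearson value is noticeably lower, which is the pattern to look for when a relationship is consistent in direction but not a straight line.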
Action
This metric could help uncover sudden bloat in size, especially due to images or new
JavaScript libraries. It would be a good catch-all metric to track, if your performance budget
allows for an extra metric to be used. Scott Jehl has an excellent article that talks about the
fact that a heavy page need not mean bad user experience.
Value Distribution
All values in bytes
Min. 1st Quartile Median 3rd Quartile Max.
0 608200 1275000 2005000 36770000
Number of Domains
Analysis
A higher number of domains appears to indicate a heavier website. Similarly, a higher number
of domains also indicates a slightly higher fullyLoaded time. However, the correlation is not
very strong.
http://www.filamentgroup.com/lab/weight-wait.html
Action
This metric would be helpful to track the number of shards and third parties. Generally
speaking, the number of 3rd parties must be controlled through a strict testing process.
Enforcing a policy of always loading the 3rd parties asynchronously or deferring them until after
onLoad should ensure that the number of 3rd parties has minimal impact on perceived
performance. Catchpoint has an article on the impact of 3rd parties and the issue with SPOF
when 3rd party tags aren’t optimally placed.
It would be a good metric to track for ensuring compliance with a limit on 3rd parties. However,
measuring this metric will not provide any useful information on the perceived
performance for the user.
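A minimal sketch of how the number of domains and the number of 3rd party domains could be counted from a list of request URLs follows. The function name, the sample URLs and the rule that anything outside the first-party domain counts as a 3rd party are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def domain_counts(request_urls, first_party_domain):
    """Return (distinct hosts, distinct hosts outside the first-party domain)."""
    hosts = {urlsplit(url).hostname for url in request_urls}
    hosts.discard(None)  # ignore entries without a host part
    third_party = {
        host
        for host in hosts
        if host != first_party_domain
        and not host.endswith("." + first_party_domain)
    }
    return len(hosts), len(third_party)


# Hypothetical requests made by one page load.
urls = [
    "https://www.example.com/",
    "https://static.example.com/app.js",
    "https://static.example.com/site.css",
    "https://cdn.example.net/library.js",
    "https://tags.example.org/tracker.js",
]
total_domains, third_party_domains = domain_counts(urls, "example.com")
print(total_domains, third_party_domains)
```

Here the shards (www and static) count towards the total but not towards the 3rd party number, which keeps the two concerns separate when reporting against a limit.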
Value Distribution
All values in units
Min. 1st Quartile Median 3rd Quartile Max.
1 5 11 20 395
Section Conclusion
HTTPArchive has a lot of data captured for desktop websites. SpeedIndex clearly has a lot of
correlation to perceived performance metrics like pageLoad, startRender and visualComplete.
There are a lot of metrics associated with the number of domains, number of requests and
number of DOM elements. However, keeping in mind that we have a restricted budget, the
recommendation is to measure SpeedIndex, onLoad and PageSpeed scores.
http://blog.catchpoint.com/2015/03/12/truth-behind-effect-third-party-tags-web-performance/
If there is a lot of push to add more 3rd party metrics, then please measure the number of
domains and total number of requests. The impact these metrics have on performance can
then be documented and shown to the right business owners. This will provide good
discussion points on rationalizing the use of 3rd parties and using the services of only those that
really matter.
Mobile
Analyzing the metrics for mobile is a bit hard. HTTPArchive does not collect many measures
like PageSpeed, time to first byte (TTFB) and DOM-related numbers. The number of sites
crawled is also much smaller (4000+) compared to desktop (400,000+).
Just because the numbers aren’t available in HTTPArchive does not mean they are not
measurable or unimportant. The missing metrics would definitely be very important for mobile
devices as well.
onLoad
Analysis
onLoad is the granddad of the performance metrics. As Steve Souders mentions in his blog,
it is not very effective for a lazy-loading, AJAX-based, Web 2.0 application. However, it is the
metric that is supported by almost everyone. It is closely related to the fullyLoaded metric
and has a good relationship to SpeedIndex.
Action
As this is a metric that is universally available and reported, it makes sense to continue to
track it and have specific performance budgets for it. However, care should be taken when
defining the target value for this metric. Spending too much time optimizing it may come at
the cost of the end-user experience.
http://www.stevesouders.com/blog/2013/05/13/moving-beyond-window-onload/
Here’s a slide highlighting this issue from one of the SpeedCurve+Soasta presentations. “ATF”
stands for above-the-fold. In the case of Amazon, the page is quite usable by 2s whereas onLoad fires
only at 9s. At the other extreme, in the case of Gmail, onLoad has fired at 3.9s whereas emails
become visible only a second later.
Value Distribution
Based on the explanation above, onLoad cannot really have a fixed target value. The measure will
vary with the website implementation. If the site is relatively static and has very few
lazy-loading or AJAX-based features, then it should aim for a low value. If there is a lot of
dynamic content, with clever logic handling below-the-fold content through lazy-loading
and other techniques, this metric can have a higher value.
All values in milliseconds (ms)
Min. 1st Quartile Median 3rd Quartile Max.
584 9916 15700 23700 61580
SpeedIndex
Analysis
This metric is closely related to the visual elements like renderStart and visualComplete.
There is a more than linear relationship between this metric and visualComplete and
fullyLoaded.
Action
Since this metric ties together different visual aspects like the loading of “above-the-fold” content
and the delivery of an actionable site, it should always be part of the metrics
collection set.
Value Distribution
All values are in units and are not based on time.
Min 1st Quartile Median 3rd Quartile Max
1000 6210 9220 10650 91860
Total number of Requests
Analysis
Compared to desktop results, total number of requests has a more direct correlation to
metrics like total bytes and visual metrics like fullyLoaded and visualComplete. However, if
onLoad is being measured then this metric may not be too important.
One interesting use case for this metric would be during the adoption of H2. Due to
better management of a single TCP connection, the number of requests (from a single domain)
is not supposed to have a major impact on page performance. However, this assertion may not
entirely hold true for mobile devices. Until better studies are available, tracking this metric
would provide insight for early adopters.
Action
Track this metric during the H2 adoption. Beyond this use case, it may not be a very valuable
metric to focus on.
Value Distribution
All values in milliseconds (ms)
Min 1st Quartile Median 3rd Quartile Max
1000 6210 9220 10650 91860
VisualComplete
Analysis
visualComplete appears to be closely related to SpeedIndex and onLoad as well.
Action
Since the recommendation is to measure both SpeedIndex and onLoad, this metric by itself
will not add value and can be ignored in the performance budget.
Value Distribution
All values in milliseconds (ms)
Min 1st Quartile Median 3rd Quartile Max
0 9000 15000 23000 97000
Section Conclusion
The crawl data from HTTPArchive for mobile websites is relatively less rich. PageSpeed
scores are not available for mobile devices, but that does not reduce their importance.
From just the available data, the best metrics to measure are SpeedIndex and onLoad (for
compatibility). Apart from these, the number of compressed objects (numCompressed) and the
number of domains (numDomains) would be useful to measure since opening connections to
different domains is always expensive for a mobile device.
With the growing importance of mobile devices, I am sure future crawls will improve and
start to have much better reporting. Once this is available, I hope to redo this part of the
research.
Conclusion
Based on the study, the following metrics appear to stand out in terms of richness and the
ability to provide different perspectives on the data:
● SpeedIndex (perceived performance)
● onLoad (for backward compatibility)
● Google Page Speed (network independent optimization)
● TTFB (backend effectiveness, CDN efficiency)
● Total domains (3rd party bloat)
Depending on your appetite for data, consider measuring at least the
Do note that each website is different and has a special purpose built around some critical
action. The best metric is one that measures the effectiveness of this critical action. If none of
these metrics suits your needs, do consider developing a custom metric that helps your business.
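As a rough illustration of how the metrics listed above could be checked against a performance budget, here is a short Python sketch. All names and threshold values are hypothetical, apart from the SpeedIndex target of about 1000 mentioned earlier for extremely performance-oriented sites; PageSpeed is treated as a minimum because a higher score is better.

```python
# Each entry maps a metric to ("max" or "min", threshold).
BUDGET = {
    "SpeedIndex": ("max", 1000),  # target cited earlier for very fast sites
    "onLoad_ms": ("max", 5000),   # hypothetical limit
    "TTFB_ms": ("max", 500),      # hypothetical limit
    "PageSpeed": ("min", 85),     # hypothetical floor, higher is better
    "numDomains": ("max", 10),    # hypothetical limit
}


def check_budget(measured, budget):
    """Return {metric: (value, threshold)} for every metric that breaks the budget."""
    violations = {}
    for metric, (kind, threshold) in budget.items():
        value = measured.get(metric)
        if value is None:
            continue  # metric was not collected in this run
        over = kind == "max" and value > threshold
        under = kind == "min" and value < threshold
        if over or under:
            violations[metric] = (value, threshold)
    return violations


# Hypothetical measurements from one test run.
run = {"SpeedIndex": 1400, "onLoad_ms": 4200, "TTFB_ms": 650,
       "PageSpeed": 90, "numDomains": 8}
violations = check_budget(run, BUDGET)
for metric, (value, threshold) in violations.items():
    print(f"{metric}: measured {value}, budget {threshold}")
```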
Appendix
Further Reading
● Raw speed score correlation spreadsheet:
https://docs.google.com/a/akamai.com/spreadsheets/d/1yUvYlJmt2DBrmO0DIxO9ywXEyz_8CmoesWHAYpRQmeM/edit?usp=sharing
● WebPageTest definition of metrics:
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics
● General concept of performance budgeting:
https://en.wikipedia.org/wiki/Performance-based_budgeting
● Performance budget blog by Tim Kadlec:
http://timkadlec.com/2013/01/setting-a-performance-budget/
● Performance budget at Etsy by Lara Callendar Hogan:
https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/
● Grunt task for performance budgeting by Tim Kadlec:
https://github.com/tkadlec/grunt-perfbudget
● Performance budgeting using the Grunt task explained by Tim Kadlec:
http://timkadlec.com/2014/05/performance-budgeting-with-grunt/
● An easy to understand overview of Performance Budget by Catherine Farman:
http://www.sitepoint.com/automate-performance-testing-grunt-js/
● Collection of tools to help in performance tuning: http://perf-tooling.today/tools
● Webinar “Creating Meaningful Metrics That Get Your Users to do the Things You
Want” - http://www.oreilly.com/pub/e/3390
● Lara Hogan’s blog post on the importance of a performance budget:
https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/
● Chris Coyier’s summary of Tim Kadlec’s performance budget blog:
https://css-tricks.com/fast-fast-enough/
● A nice comment from Paul Irish on Performance Budget:
http://timkadlec.com/2014/01/fast-enough/#comment-1200946500
● A huge collection of articles, tools and videos related to performance:
http://perf.rocks/
● Testing for Front-End SPOF by Patrick Meenan:
http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html
● Frontend SPOF by Steve Souders:
http://www.stevesouders.com/blog/2010/06/01/frontend-spof/
● Metrics reporting:
○ Catchpoint: http://www.catchpoint.com/
○ Keynote: http://www.keynote.com/
○ SpeedCurve: http://speedcurve.com/
○ Sitespeed.io Free Dashboard: http://dashboard.sitespeed.io/
People to Follow for Performance
● Steve Souders: @souders
● Scott Jehl: @scottjehl
● Tim Kadlec: @tkadlec
● Lara Hogan: @lara_hogan
● Guy Podjarny: @guypo
● Paul Irish: @paul_irish
● Ilya Grigorik: @igrigorik
● PerfPlanet: @perfplanet
● Hashtags: #webperf #perfmatters
R Program code
Sample R program code to compute the correlation metric
data