What Performance Metrics Do I Measure



    Results from Metrics Research

Metrics definition

Objective

    Methodology

    Desktop Site Measurements

    onLoad

     Analysis

     Action

     Value Distribution

    SpeedIndex

     Analysis

 Action

 Value Distribution

    Time to First Byte (TTFB)

     Analysis

     Action

     Value Distribution

    Total number of Requests

     Analysis

     Action

     Value Distribution

    PageSpeed

     Analysis

 Action

     Value Distribution

     VisualComplete

     Analysis

     Action

     Value Distribution

    Total Bytes

     Analysis

     Action

     Value Distribution

    Number of Domains

     Analysis

     Action

     Value Distribution

    Section Conclusion

    Mobile


    onLoad

     Analysis

     Action

     Value Distribution

    SpeedIndex

 Analysis

 Action

     Value Distribution

    Total number of Requests

     Analysis

     Action

     Value Distribution

     VisualComplete

     Analysis

     Action

 Value Distribution

Section Conclusion

    Conclusion

     Appendix

    Further Reading

    People to Follow for Performance

    R Program code

    By: Akshay Ranganath, Enterprise Architect 


    Metrics definition

I’ve used the various metrics as defined on the WebPageTest website. Here’s a brief summary of the metrics.

Load Time

The Load Time is measured as the time from the start of the initial navigation until the beginning of the window load event (onload).

    Fully Loaded

    The Fully Loaded time is measured as the time from the start of the initial navigation until there was

    2 seconds of no network activity after Document Complete. This will usually include any activity

that is triggered by JavaScript after the main page loads.

    First Byte

    The First Byte time (often abbreviated as TTFB) is measured as the time from the start of the initial

    navigation until the first byte of the base page is received by the browser (after following redirects).

    Start Render

    The Start Render time is measured as the time from the start of the initial navigation until the first

    non-white content is painted to the browser display.

    Speed Index

    The Speed Index is a calculated metric that represents how quickly the page rendered the

user-visible content (lower is better). More information on how it is calculated is available here.
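Intuitively, Speed Index is the area above the visual-progress curve: the integral over time of (1 − visual completeness). Here is a minimal sketch in R, where visual_complete is a hypothetical vector of visual-completeness percentages sampled at a fixed interval (not part of the WebPageTest tooling itself):

speed_index <- function(visual_complete, interval_ms = 100) {
  # Area above the visual-progress curve: sum of (1 - completeness) per sample.
  sum((1 - visual_complete / 100) * interval_ms)
}

# Example: a page sampled every 100 ms that finishes rendering by the fifth sample.
speed_index(c(0, 10, 40, 80, 100))   # 270 "units"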

    DOM Elements

    The DOM Elements metric is the count of the DOM elements on the tested page as measured at

    the end of the test.

    Objective

The purpose of this research is to help website stakeholders arrive at the right combination of metrics to measure and record performance details. By “right combination”, I mean the metrics that provide value from the different perspectives related to performance.

Another objective is to identify metrics that are relatively rich and independent of other metrics. The purpose of identifying specific metrics is to optimize and aid in measuring and recording a performance budget.

Do note that each business is different, and the critical metrics will vary by business objective. For example, Twitter has described defining effectiveness by “time to first tweet”. WebPageTest provides the ability to define and track custom metrics. For a more in-depth look at custom metrics, do watch this webinar. The presentation from the webinar is posted here.

https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/custom-metrics
https://blog.twitter.com/2012/improving-performance-on-twittercom
http://timkadlec.com/2013/01/setting-a-performance-budget/
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics

This study is for those who are relatively new to the world of performance budgets and are looking for an answer to the question: “I have limited time and budget for measuring performance. What are the top 3-4 measurements that will give the most bang for the buck?”

    Methodology

For computing the results, the HTTPArchive database was used. From the data, all the non-null values were extracted and compared for correlation. In the desktop result set there were no null values for the metrics used in the study. However, for the mobile site crawl, the data set is sparse and had null values that varied by metric. The idea here is to look at patterns, and hopefully we can revisit the study once HTTPArchive starts to gather more data for mobile sites.

Correlation between each pair of values was computed using two measures:

    ● Pearson Correlation : The Pearson product-moment correlation coefficient (sometimes

    referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation

    (dependence) between two variables X and Y, giving a value between +1 and −1

    inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total

    negative correlation.

    ● Spearman Correlation : Spearman's rank correlation coefficient or Spearman's rho,

named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as

    r_s, is a nonparametric measure of statistical dependence between two variables. It

    assesses how well the relationship between two variables can be described using a

    monotonic function. If there are no repeated data values, a perfect Spearman

    correlation of +1 or −1 occurs when each of the variables is a perfect monotone

    function of the other.

In my analysis, a correlation above +/-0.7 is considered significant, and a value below +/-0.4 is considered not significant. I have chosen +/-0.7 and +/-0.4 as thresholds to keep the analysis simple. Many of the metrics exhibit their highest correlations at around the +/-0.7 level, while, empirically, metrics exhibiting values below +/-0.4 are the ones we tend to consider independent. For example, the relationship between onLoad and the number of DOM elements is relatively independent. To differentiate such metrics, I have chosen the cut-offs of +/-0.7 and +/-0.4.

    If two values are not significantly correlated, it would imply relative independence between

    the two variables.
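As a concrete illustration of the method, here is a minimal R sketch that computes both coefficients and applies the +/-0.7 and +/-0.4 cut-offs described above; the data frame, its column names and its values are synthetic stand-ins, not the actual HTTPArchive extract:

# Two metrics with synthetic illustrative values.
metrics <- data.frame(
  onLoad     = c(268, 8442, 14310, 27510, 102800),
  SpeedIndex = c(200, 3956, 6163, 11330, 104200)
)

pearson  <- cor(metrics$onLoad, metrics$SpeedIndex, method = "pearson")
spearman <- cor(metrics$onLoad, metrics$SpeedIndex, method = "spearman")

# Classify a coefficient using the thresholds chosen for this study.
classify <- function(r) {
  if (abs(r) >= 0.7) {
    "significantly correlated"
  } else if (abs(r) <= 0.4) {
    "not significantly correlated"
  } else {
    "in between"
  }
}
classify(pearson)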

https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
http://www.oreilly.com/pub/e/3390


    Desktop Site Measurements

    The following discussion is based on the HTTPArchive database run for March 15, 2015.

    onLoad

 Analysis

This is the event that is typically measured by most 3rd-party synthetic testing tools. Since it is so widespread, it makes sense to measure this metric.

onLoad is closely correlated to visualComplete and SpeedIndex. There is a decent correlation between onLoad and total requests, indicating that a site slows down as the number of requests increases. It will be an interesting number to measure during the transition from HTTP/1.1 to HTTP/2. HTTP/2 (or H2) multiplexes responses over a single TCP connection, which could help reduce the total number of round trips.

     Action

Always measure onLoad, since it is one of the most widely used metrics and can provide performance comparisons across different measurement sources like synthetic tests, RUM and WebPageTest.

Do note that this is considered a very old metric and is not representative of the user’s perceived performance. Over time, reduce the stress on this metric and start to adopt newer metrics that are closer to the performance that makes sense for your site. (See notes for more details.)


     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    268 8442 14310 27510 102800
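The distribution rows in this document have the shape of R’s quantile output. A sketch of how such a row can be produced, assuming onload_ms holds the non-null onLoad values (a synthetic, hypothetical vector here):

# Synthetic stand-in for the extracted onLoad values.
onload_ms <- c(268, 8442, 14310, 27510, 102800)

# Min, quartiles and max, matching the table layout above.
quantile(onload_ms, probs = c(0, 0.25, 0.5, 0.75, 1))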

    SpeedIndex

     Analysis

SpeedIndex is most closely correlated to visualComplete, renderStart and onLoad, in that order. It is loosely correlated to TTFB and PageSpeed.

     Action

Measure SpeedIndex, as it is closely related to the rendering of content, especially above-the-fold content. Being a single number, it is easy to compare and leaves little to subjective interpretation. The biggest drawback is that this metric is not available across all testing products.

     Value Distribution

For an extremely performance-oriented site, the ideal target is ~1000. Refer to this blog post by Lara Hogan, where she explains the design and use of a performance budget very well.

    Here’s the distribution of the SpeedIndex:

All values are in units, not time.

    Min 1st Quartile Median 3rd Quartile Max

    200 3956 6163 11330 104200

    https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/


    Time to First Byte (TTFB)

     Analysis

    TTFB appears to have no strong correlation with the other metrics. At most, it can impact the

    start render time. All other metrics are relatively independent of this value.

     Action

If this metric is collected for a static resource that is served through a CDN, it can help measure the performance of the CDN to some extent. If the page is dynamic, it will help determine the health of the connection and the time spent on the back-end.

Since this is the only metric that can expose the time spent on the back-end, or the relative effectiveness of the CDN, it should be part of the performance budget.

     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    67 545 943 1867 60920


    Total number of Requests

     Analysis

The total number of requests appears to be correlated with non-performance metrics like the number of 3rd-party domains. However, it does show a decent correlation with fullyLoaded, visualComplete and onLoad.

     Action

If other metrics like onLoad are already being measured, this metric may be of limited value. It would be helpful when a customer appears to rely on too many 3rd-party tags and we have reason to believe that these 3rd parties are causing a performance lag. WebPageTest has an option to test for front-end SPOF; more information is available here and here.

http://www.stevesouders.com/blog/2010/06/01/frontend-spof/
http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html


     Value Distribution

     All values in units

    Min. 1st Quartile Median 3rd Quartile Max.

    1 44 75 119 1715

    PageSpeed

     Analysis

Google PageSpeed is supposed to measure “the network-independent aspects of page performance: the server configuration, the HTML structure of a page, and its use of external resources such as images, JavaScript, and CSS”. This is clearly borne out by its very low correlation with the other metrics. It is also important to note that the correlation is mostly negative, indicating that a lower value of a time metric corresponds to a higher PageSpeed score.

    https://developers.google.com/speed/docs/insights/about
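To make the sign convention concrete, here is a toy R example, with purely synthetic values, in which a higher PageSpeed score accompanies a lower load time and the correlation comes out negative:

pagespeed <- c(95, 85, 75, 60, 40)              # higher is better
onload_ms <- c(2000, 5000, 9000, 15000, 30000)  # lower is better
cor(pagespeed, onload_ms, method = "spearman")  # -1: score rises as time falls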


     Action

Google PageSpeed values are relatively independent of the other metrics and yet reflect the site structure. Since these are measures that need to be implemented by the developers of the website, it is a very important metric and should be part of the performance budget toolkit.

Beyond just a number, PageSpeed can also identify issues in the page design, like blocking JavaScript and stylesheets, that would be harder to identify with other metrics.

     Value Distribution

All values are in units between 0 and 100.

    Min. 1st Quartile Median 3rd Quartile Max.

    0 71 82 89 100

     VisualComplete

     Analysis

visualComplete tries to measure the time taken to render “above the fold” (ATF) content.

VisualComplete is closely related to fullyLoaded (when everything is loaded), onLoad and SpeedIndex. This makes sense empirically as well: unless a page has a lot of lazy-loaded / deferred content, visualComplete will be close to the fullyLoaded time.


     Action

Since the value of this metric relates to SpeedIndex and onLoad, measuring it separately would be of limited value. However, if you want to compare the performance of a page before and after lazy loading, then use the pair visualComplete and fullyLoaded to measure the effectiveness of your implementation.

     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    0 6700 11900 21400 104200

    Total Bytes

     Analysis

Total bytes downloaded is not very strongly associated with any single metric. However, it does have a decent correlation with fullyLoaded, visualComplete and onLoad.

It is interesting to note that total bytes has a non-linear, monotonic correlation (Spearman) with fullyLoaded, visualComplete and onLoad. Empirically, this means the two quantities rise together but not in a fixed ratio: a 2-unit increase in total bytes might produce only a 1-unit increase in fullyLoaded, or a 1-unit increase in total bytes might produce a 2-unit increase in fullyLoaded.
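A small R illustration of the point above: for a monotonic but non-linear relationship, the Spearman correlation is a perfect 1 while the Pearson correlation stays below 1 (synthetic data, not the HTTPArchive values):

bytes <- seq(1e5, 2e6, length.out = 50)
fully_loaded <- sqrt(bytes)                     # monotonic, but not linear
cor(bytes, fully_loaded, method = "pearson")    # slightly below 1
cor(bytes, fully_loaded, method = "spearman")   # exactly 1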


     Action

This metric could help uncover sudden bloat in page size, especially due to images or new JavaScript libraries. It would be a good catch-all metric to track if your performance budget allows for an extra metric. Scott Jehl has an excellent article on the fact that a heavy page need not mean a bad user experience.

     Value Distribution

     All values in bytes

    Min. 1st Quartile Median 3rd Quartile Max.

    0 608200 1275000 2005000 36770000

    Number of Domains

     Analysis

A higher number of domains appears to indicate a heavier website. Similarly, a higher number of domains also indicates a slightly higher fullyLoaded time. However, the correlation is not very strong.

    http://www.filamentgroup.com/lab/weight-wait.html


     Action

This metric would be helpful for tracking the number of shards and third parties. Generally speaking, the number of 3rd parties should be controlled through a strict testing process. Enforcing a policy of always loading 3rd parties asynchronously, or deferring them until after onLoad, should ensure that the number of 3rd parties has minimal impact on perceived performance. Catchpoint has an article on the impact of 3rd parties and the SPOF issues that arise when 3rd-party tags aren’t optimally placed.

It would be a good metric to track for ensuring compliance with a limit on 3rd parties. However, measuring this metric will not provide much useful information on the perceived performance for the user.

     Value Distribution

     All values in units

    Min. 1st Quartile Median 3rd Quartile Max.

    1 5 11 20 395

    Section Conclusion

HTTPArchive has a lot of data captured for desktop websites. SpeedIndex clearly correlates strongly with perceived-performance metrics like pageLoad, startRender and visualComplete. There are also many metrics associated with the number of domains, number of requests and number of DOM elements. However, keeping in mind that we have a restricted budget, the recommendation is to measure SpeedIndex, onLoad and the PageSpeed score.

    http://blog.catchpoint.com/2015/03/12/truth-behind-effect-third-party-tags-web-performance/


If there is a strong push to add more metrics around 3rd parties, then measure the number of domains and the total number of requests. The impact these metrics have on performance can then be documented and shown to the right business owners. This will provide good discussion points for rationalizing the use of 3rd parties and keeping only the services that really matter.


    Mobile

Analyzing the metrics for mobile is a bit hard. HTTPArchive does not collect many measures for mobile, like PageSpeed, time to first byte (TTFB) and DOM-related numbers. The number of sites crawled is also much smaller (4,000+) compared to desktop (400,000+).

Just because the numbers aren’t available in HTTPArchive does not mean they are unmeasurable or unimportant. The missing metrics would be just as important for mobile devices.

    onLoad

     Analysis

onLoad is the granddad of performance metrics. As Steve Souders mentions in his blog, it is not very effective for a lazy-loading, AJAX-based, Web 2.0 application. However, it is the metric that is supported by almost everyone. It is closely related to the fullyLoaded metric and has a good relationship to SpeedIndex.

     Action

As this is a metric that is universally available and reported, it makes sense to continue to track it and have specific performance budgets for it. However, care should be taken in defining the target value for this metric: spending too much time optimizing it may come at the cost of the end-user experience.

    http://www.stevesouders.com/blog/2013/05/13/moving-beyond-window-onload/


Here’s a slide highlighting this issue from one of the SpeedCurve + SOASTA presentations (“ATF” stands for above-the-fold). For Amazon, the page is quite usable by 2s whereas onLoad fires only at 9s. At the other extreme, in the case of Gmail, onLoad has fired at 3.9s whereas emails become visible only about a second later.

     Value Distribution

Based on the explanation above, onLoad cannot really have a fixed target value. The measure will vary with the website’s implementation. If the site is relatively static and has very few lazy-loading or AJAX-based features, then it should aim for a low value. If a lot of dynamic content is handled with clever logic, lazy-loading below-the-fold content and similar techniques, this metric can have a higher value.

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    584 9916 15700 23700 61580

    SpeedIndex

     Analysis

This metric is closely related to visual metrics like renderStart and visualComplete. There is a more-than-linear relationship between this metric and both visualComplete and fullyLoaded.


     Action

Because this metric ties together different visual aspects, like the loading of “above-the-fold” content and delivering an actionable site, it should always be part of the metrics collection set.

     Value Distribution

All values are in units, not time.

    Min 1st Quartile Median 3rd Quartile Max

    1000 6210 9220 10650 91860

    Total number of Requests

     Analysis

Compared to the desktop results, the total number of requests has a more direct correlation to metrics like total bytes and to visual metrics like fullyLoaded and visualComplete. However, if onLoad is being measured, then this metric may not be too important.

One interesting use case for this metric would be during the adoption of H2. Due to better management of a single TCP connection, the number of requests (from a single domain) is not supposed to have a major impact on page performance. However, this assertion may not hold entirely true for mobile devices. Until better studies are available, tracking this metric would provide insight for early adopters.


     Action

Track this metric during H2 adoption. Beyond this use case, it may not be a very valuable metric to focus on.

     Value Distribution

All values in units

    Min 1st Quartile Median 3rd Quartile Max

    1000 6210 9220 10650 91860

     VisualComplete

     Analysis

    visualComplete appears to be closely related to SpeedIndex and onLoad as well.


     Action

    Since the recommendation is to measure both SpeedIndex and onLoad, this metric by itself

    will not add value and can be ignored in the performance budget.

     Value Distribution

All values in milliseconds (ms)

Min. 1st Quartile Median 3rd Quartile Max.

    0 9000 15000 23000 97000

    Section Conclusion 

The crawl data from HTTPArchive for mobile websites is less rich. PageSpeed scores are not available for mobile devices, but that does not reduce their importance.

From just the available data, the best metrics to measure are SpeedIndex and onLoad (for compatibility). Apart from these, the number of compressed objects (numCompressed) and the number of domains (numDomains) would be useful to measure, since opening connections to different domains is always expensive for a mobile device.

With the growing importance of mobile devices, I am sure future crawls will improve and start to have much better reporting. Once that data is available, I hope to re-do this part of the research.


    Conclusion

Based on the study, the following metrics stand out in terms of richness and the ability to provide different perspectives on the data:

● SpeedIndex (perceived performance)

● onLoad (for backward compatibility)

● Google PageSpeed (network-independent optimization)

● TTFB (back-end effectiveness, CDN efficiency)

● Total domains (3rd-party bloat)

Depending on your appetite for data, consider measuring at least the first three.

Do note that each website is different and has a special purpose. The best metric is one that measures the effectiveness of that site’s critical action. If none of these metrics suits your needs, do consider developing a custom metric that helps your business.
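As a final sketch, here is one way the recommended metrics could be checked against budget targets in R; every target and measured value below is a placeholder for illustration, not a recommendation from this study:

budget   <- c(SpeedIndex = 3000, onLoad = 8000, PageSpeed = 85, TTFB = 500, numDomains = 15)
measured <- c(SpeedIndex = 4500, onLoad = 7200, PageSpeed = 88, TTFB = 420, numDomains = 22)

# PageSpeed is "higher is better"; all the other metrics are "lower is better".
higher_is_better <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

within_budget <- ifelse(higher_is_better, measured >= budget, measured <= budget)
setNames(within_budget, names(budget))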

     Appendix

    Further Reading

● Raw speed score correlation spreadsheet:

https://docs.google.com/a/akamai.com/spreadsheets/d/1yUvYlJmt2DBrmO0DIxO9ywXEyz_8CmoesWHAYpRQmeM/edit?usp=sharing

    ● WebPageTest definition of metrics:

    https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics 

    ● General concept of performance budgeting:

https://en.wikipedia.org/wiki/Performance-based_budgeting

● Performance budget blog by Tim Kadlec:

http://timkadlec.com/2013/01/setting-a-performance-budget/

● Performance budget at Etsy by Lara Callender Hogan:

    https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/   

● Grunt task for performance budgeting by Tim Kadlec:

    https://github.com/tkadlec/grunt-perfbudget 

● Performance budgeting using the Grunt task, explained by Tim Kadlec:

    http://timkadlec.com/2014/05/performance-budgeting-with-grunt/  

● An easy-to-understand overview of Performance Budget by Catherine Farman:

http://www.sitepoint.com/automate-performance-testing-grunt-js/

● Collection of tools to help in performance tuning: http://perf-tooling.today/tools

    ● Webinar “Creating Meaningful Metrics That Get Your Users to do the Things You

    Want” - http://www.oreilly.com/pub/e/3390  




● Chris Coyier’s summary of Tim Kadlec’s performance budget blog:

    https://css-tricks.com/fast-fast-enough/  

    ●  A nice comment from Paul Irish on Performance Budget:

    http://timkadlec.com/2014/01/fast-enough/#comment-1200946500  

    ●  A huge collection of articles, tools and videos related to performance:

http://perf.rocks/

● Testing for Front-End SPOF by Patrick Meenan:

    http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html  

    ● Frontend SPOF by Steve Souders:

    http://www.stevesouders.com/blog/2010/06/01/frontend-spof/  

    ● Metrics reporting:

    ○ Catchpoint: http://www.catchpoint.com/  

    ○ Keynote: http://www.keynote.com/  

    ○ SpeedCurve: http://speedcurve.com/  

○ Sitespeed.io Free Dashboard: http://dashboard.sitespeed.io/

    People to Follow for Performance

    ● Steve Souders: @souders

    ● Scott Jehl: @scottjehl

    ● Tim Kadlec: @tkadlec

    ● Lara Hogan: @lara_hogan

    ● Guy Podjarny: @guypo

    ● Paul Irish: @paul_irish

    ● Ilya Grigorik: @igrigorik

    ● PerfPlanet: @perfplanet

● Hashtags: #webperf #perfmatters

    R Program code

    Sample R program code to compute the correlation metric

data <- read.csv("httparchive_metrics.csv")  # hypothetical file name for the HTTPArchive extract
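A minimal completion of the listing, assuming the data frame holds the non-null numeric metric columns described in the Methodology section:

# Drop rows with null values, as described in Methodology.
data <- na.omit(data)

# Pairwise correlation matrices over the numeric metric columns.
cor(data, method = "pearson")
cor(data, method = "spearman")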