What Performance Metrics Do I Measure



    Results from Metrics Research

Metrics definition

Objective

    Methodology

    Desktop Site Measurements

    onLoad

     Analysis

     Action

     Value Distribution

    SpeedIndex

     Analysis

 Action

 Value Distribution

    Time to First Byte (TTFB)

     Analysis

     Action

     Value Distribution

    Total number of Requests

     Analysis

     Action

     Value Distribution

    PageSpeed

     Analysis

 Action

     Value Distribution

     VisualComplete

     Analysis

     Action

     Value Distribution

    Total Bytes

     Analysis

     Action

     Value Distribution

    Number of Domains

     Analysis

     Action

     Value Distribution

    Section Conclusion

    Mobile


    onLoad

     Analysis

     Action

     Value Distribution

    SpeedIndex

 Analysis

 Action

     Value Distribution

    Total number of Requests

     Analysis

     Action

     Value Distribution

     VisualComplete

     Analysis

     Action

 Value Distribution

Section Conclusion

    Conclusion

     Appendix

    Further Reading

    People to Follow for Performance

    R Program code

    By: Akshay Ranganath, Enterprise Architect 


    Metrics definition

I’ve used the various metrics as defined on the WebPageTest website. Here’s a brief summary of the metrics.

Load Time

The Load Time is measured as the time from the start of the initial navigation until the beginning of the window load event (onload).

    Fully Loaded

    The Fully Loaded time is measured as the time from the start of the initial navigation until there was

    2 seconds of no network activity after Document Complete. This will usually include any activity

that is triggered by JavaScript after the main page loads.

    First Byte

    The First Byte time (often abbreviated as TTFB) is measured as the time from the start of the initial

    navigation until the first byte of the base page is received by the browser (after following redirects).

    Start Render

    The Start Render time is measured as the time from the start of the initial navigation until the first

    non-white content is painted to the browser display.

    Speed Index

    The Speed Index is a calculated metric that represents how quickly the page rendered the

user-visible content (lower is better). More information on how it is calculated is available here.
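Intuitively, Speed Index is the area above the visual-progress curve: the integral over time of (1 − visual completeness). Here is a minimal sketch in R, where visual_complete is a hypothetical vector of visual-completeness percentages sampled at a fixed interval (not part of the WebPageTest tooling itself):

speed_index <- function(visual_complete, interval_ms = 100) {
  # Area above the visual-progress curve: sum of (1 - completeness) per sample.
  sum((1 - visual_complete / 100) * interval_ms)
}

# Example: a page sampled every 100 ms that finishes rendering by the fifth sample.
speed_index(c(0, 10, 40, 80, 100))   # 270 "units"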

    DOM Elements

    The DOM Elements metric is the count of the DOM elements on the tested page as measured at

    the end of the test.

    Objective

The purpose of this research is to help website stakeholders arrive at the right combination of metrics to measure and record performance details. By “right combination”, I mean the metrics that provide value from the different perspectives related to performance.

Another objective is to identify metrics that are relatively rich and independent of other metrics. The purpose of identifying specific metrics is to optimize and aid in measuring and recording a performance budget.

Do note that each business is different, and the critical metrics will vary by business objective. For example, Twitter has described defining effectiveness by “time to first tweet”. WebPageTest provides the ability to define and track custom metrics. For a more in-depth look at custom metrics, do watch this webinar. The presentation from the webinar is posted here.

https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/custom-metrics
https://blog.twitter.com/2012/improving-performance-on-twittercom
http://timkadlec.com/2013/01/setting-a-performance-budget/
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics

This study is for those who are relatively new to the world of performance budgets and are looking for an answer to the question: “I have limited time and budget for measuring performance. What are the top 3-4 measurements that will give the most bang for the buck?”

    Methodology

For computing the results, the HTTPArchive database was used. From the data, all the non-null values were extracted and compared for correlation. In the desktop result set there were no null values for the metrics used in the study. However, for the mobile site crawl, the data set is sparse and had null values that varied by metric. The idea here is to look at patterns, and hopefully we can revisit the study once HTTPArchive starts to gather more data for mobile sites.

Correlation between each pair of values was computed using two measures:

    ● Pearson Correlation : The Pearson product-moment correlation coefficient (sometimes

    referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation

    (dependence) between two variables X and Y, giving a value between +1 and −1

    inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total

    negative correlation.

    ● Spearman Correlation : Spearman's rank correlation coefficient or Spearman's rho,

named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as

    r_s, is a nonparametric measure of statistical dependence between two variables. It

    assesses how well the relationship between two variables can be described using a

    monotonic function. If there are no repeated data values, a perfect Spearman

    correlation of +1 or −1 occurs when each of the variables is a perfect monotone

    function of the other.

In my analysis, a correlation above +/-0.7 is considered significant, and a value below +/-0.4 is considered not significant. I have chosen +/-0.7 and +/-0.4 as thresholds to keep the analysis simple. Many of the metrics exhibit their highest correlations at around the +/-0.7 level, while, empirically, metrics exhibiting values below +/-0.4 are the ones we tend to consider independent. For example, the relationship between onLoad and the number of DOM elements is relatively independent. To differentiate such metrics, I have chosen the cut-offs of +/-0.7 and +/-0.4.

    If two values are not significantly correlated, it would imply relative independence between

    the two variables.
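As a concrete illustration of the method, here is a minimal R sketch that computes both coefficients and applies the +/-0.7 and +/-0.4 cut-offs described above; the data frame, its column names and its values are synthetic stand-ins, not the actual HTTPArchive extract:

# Two metrics with synthetic illustrative values.
metrics <- data.frame(
  onLoad     = c(268, 8442, 14310, 27510, 102800),
  SpeedIndex = c(200, 3956, 6163, 11330, 104200)
)

pearson  <- cor(metrics$onLoad, metrics$SpeedIndex, method = "pearson")
spearman <- cor(metrics$onLoad, metrics$SpeedIndex, method = "spearman")

# Classify a coefficient using the thresholds chosen for this study.
classify <- function(r) {
  if (abs(r) >= 0.7) {
    "significantly correlated"
  } else if (abs(r) <= 0.4) {
    "not significantly correlated"
  } else {
    "in between"
  }
}
classify(pearson)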

https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
http://www.oreilly.com/pub/e/3390


    Desktop Site Measurements

    The following discussion is based on the HTTPArchive database run for March 15, 2015.

    onLoad

 Analysis

This is the event that is typically measured by most 3rd-party synthetic testing tools. Since it is so widespread, it makes sense to measure this metric.

onLoad is closely correlated to visualComplete and SpeedIndex. There is a decent correlation between onLoad and total requests, indicating that a site slows down as the number of requests increases. It will be an interesting number to measure during the transition from HTTP/1.1 to HTTP/2. HTTP/2 (or H2) multiplexes responses over a single TCP connection, which could help reduce the total number of round trips.

     Action

Always measure onLoad, since it is one of the most widely used metrics and can provide performance comparisons across different measurement sources like synthetic tests, RUM and WebPageTest.

Do note that this is considered a very old metric and is not representative of the user’s perceived performance. Over time, reduce the stress on this metric and start to adopt newer metrics that are closer to the performance that makes sense for your site. (See notes for more details.)


     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    268 8442 14310 27510 102800
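The distribution rows in this document have the shape of R’s quantile output. A sketch of how such a row can be produced, assuming onload_ms holds the non-null onLoad values (a synthetic, hypothetical vector here):

# Synthetic stand-in for the extracted onLoad values.
onload_ms <- c(268, 8442, 14310, 27510, 102800)

# Min, quartiles and max, matching the table layout above.
quantile(onload_ms, probs = c(0, 0.25, 0.5, 0.75, 1))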

    SpeedIndex

     Analysis

SpeedIndex is most closely correlated to visualComplete, renderStart and onLoad, in that order. It is loosely correlated to TTFB and PageSpeed.

     Action

Measure SpeedIndex, as it is closely related to the rendering of content, especially above-the-fold content. Being a single number, it is easy to compare and leaves little to subjective interpretation. The biggest drawback is that this metric is not available across all testing products.

     Value Distribution

For an extremely performance-oriented site, the ideal target is ~1000. Refer to this blog post by Lara Hogan, where she explains the design and use of a performance budget very well.

    Here’s the distribution of the SpeedIndex:

All values are in units, not time.

    Min 1st Quartile Median 3rd Quartile Max

    200 3956 6163 11330 104200

    https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/


    Time to First Byte (TTFB)

     Analysis

    TTFB appears to have no strong correlation with the other metrics. At most, it can impact the

    start render time. All other metrics are relatively independent of this value.

     Action

If this metric is collected for a static resource that is served through a CDN, it can help measure the performance of the CDN to some extent. If the page is dynamic, it will help determine the health of the connection and the time spent on the back-end.

Since this is the only metric that can expose the time spent on the back-end, or the relative effectiveness of the CDN, it should be part of the performance budget.

     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    67 545 943 1867 60920


    Total number of Requests

     Analysis

The total number of requests appears to be correlated with non-performance metrics like the number of 3rd-party domains. However, it does show a decent correlation with fullyLoaded, visualComplete and onLoad.

     Action

If other metrics like onLoad are already being measured, this metric may be of limited value. It would be helpful when a customer appears to rely on too many 3rd-party tags and we have reason to believe that these 3rd parties are causing a performance lag. WebPageTest has an option to test for front-end SPOF; more information is available here and here.

http://www.stevesouders.com/blog/2010/06/01/frontend-spof/
http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html


     Value Distribution

     All values in units

    Min. 1st Quartile Median 3rd Quartile Max.

    1 44 75 119 1715

    PageSpeed

     Analysis

Google PageSpeed is supposed to measure “the network-independent aspects of page performance: the server configuration, the HTML structure of a page, and its use of external resources such as images, JavaScript, and CSS”. This is clearly borne out by its very low correlation with the other metrics. It is also important to note that the correlation is mostly negative, indicating that a lower value of a time metric corresponds to a higher PageSpeed score.

    https://developers.google.com/speed/docs/insights/about
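To make the sign convention concrete, here is a toy R example, with purely synthetic values, in which a higher PageSpeed score accompanies a lower load time and the correlation comes out negative:

pagespeed <- c(95, 85, 75, 60, 40)              # higher is better
onload_ms <- c(2000, 5000, 9000, 15000, 30000)  # lower is better
cor(pagespeed, onload_ms, method = "spearman")  # -1: score rises as time falls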


     Action

Google PageSpeed values are relatively independent of the other metrics and yet reflect the site structure. Since these are measures that need to be implemented by the developers of the website, it is a very important metric and should be part of the performance budget toolkit.

Beyond just a number, PageSpeed can also identify issues in the page design, like blocking JavaScript and stylesheets, that would be harder to identify with other metrics.

     Value Distribution

All values are in units between 0 and 100.

    Min. 1st Quartile Median 3rd Quartile Max.

    0 71 82 89 100

     VisualComplete

     Analysis

visualComplete tries to measure the time taken to render “above the fold” (ATF) content.

VisualComplete is closely related to fullyLoaded (when everything is loaded), onLoad and SpeedIndex. This makes sense empirically as well: unless a page has a lot of lazy-loaded / deferred content, visualComplete will be close to the fullyLoaded time.


     Action

Since the value of this metric relates to SpeedIndex and onLoad, measuring it separately would be of limited value. However, if you want to compare the performance of a page before and after lazy loading, then use the pair visualComplete and fullyLoaded to measure the effectiveness of your implementation.

     Value Distribution

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    0 6700 11900 21400 104200

    Total Bytes

     Analysis

Total bytes downloaded is not very strongly associated with any single metric. However, it does have a decent correlation with fullyLoaded, visualComplete and onLoad.

It is interesting to note that total bytes has a non-linear, monotonic correlation (Spearman) with fullyLoaded, visualComplete and onLoad. Empirically, this means the two quantities rise together but not in a fixed ratio: a 2-unit increase in total bytes might produce only a 1-unit increase in fullyLoaded, or a 1-unit increase in total bytes might produce a 2-unit increase in fullyLoaded.
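A small R illustration of the point above: for a monotonic but non-linear relationship, the Spearman correlation is a perfect 1 while the Pearson correlation stays below 1 (synthetic data, not the HTTPArchive values):

bytes <- seq(1e5, 2e6, length.out = 50)
fully_loaded <- sqrt(bytes)                     # monotonic, but not linear
cor(bytes, fully_loaded, method = "pearson")    # slightly below 1
cor(bytes, fully_loaded, method = "spearman")   # exactly 1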


     Action

This metric could help uncover sudden bloat in page size, especially due to images or new JavaScript libraries. It would be a good catch-all metric to track if your performance budget allows for an extra metric. Scott Jehl has an excellent article on the fact that a heavy page need not mean a bad user experience.

     Value Distribution

     All values in bytes

    Min. 1st Quartile Median 3rd Quartile Max.

    0 608200 1275000 2005000 36770000

    Number of Domains

     Analysis

A higher number of domains appears to indicate a heavier website. Similarly, a higher number of domains also indicates a slightly higher fullyLoaded time. However, the correlation is not very strong.

    http://www.filamentgroup.com/lab/weight-wait.html


     Action

This metric would be helpful for tracking the number of shards and third parties. Generally speaking, the number of 3rd parties should be controlled through a strict testing process. Enforcing a policy of always loading 3rd parties asynchronously, or deferring them until after onLoad, should ensure that the number of 3rd parties has minimal impact on perceived performance. Catchpoint has an article on the impact of 3rd parties and the SPOF issues that arise when 3rd-party tags aren’t optimally placed.

It would be a good metric to track for ensuring compliance with a limit on 3rd parties. However, measuring this metric will not provide much useful information on the perceived performance for the user.

     Value Distribution

     All values in units

    Min. 1st Quartile Median 3rd Quartile Max.

    1 5 11 20 395

    Section Conclusion

HTTPArchive has a lot of data captured for desktop websites. SpeedIndex clearly correlates strongly with perceived-performance metrics like pageLoad, startRender and visualComplete. There are also many metrics associated with the number of domains, number of requests and number of DOM elements. However, keeping in mind that we have a restricted budget, the recommendation is to measure SpeedIndex, onLoad and the PageSpeed score.

    http://blog.catchpoint.com/2015/03/12/truth-behind-effect-third-party-tags-web-performance/


If there is a strong push to add more metrics around 3rd parties, then measure the number of domains and the total number of requests. The impact these metrics have on performance can then be documented and shown to the right business owners. This will provide good discussion points for rationalizing the use of 3rd parties and keeping only the services that really matter.


    Mobile

Analyzing the metrics for mobile is a bit hard. HTTPArchive does not collect many measures for mobile, like PageSpeed, time to first byte (TTFB) and DOM-related numbers. The number of sites crawled is also much smaller (4,000+) compared to desktop (400,000+).

Just because the numbers aren’t available in HTTPArchive does not mean they are unmeasurable or unimportant. The missing metrics would be just as important for mobile devices.

    onLoad

     Analysis

onLoad is the granddad of performance metrics. As Steve Souders mentions in his blog, it is not very effective for a lazy-loading, AJAX-based, Web 2.0 application. However, it is the metric that is supported by almost everyone. It is closely related to the fullyLoaded metric and has a good relationship to SpeedIndex.

     Action

As this is a metric that is universally available and reported, it makes sense to continue to track it and have specific performance budgets for it. However, care should be taken in defining the target value for this metric: spending too much time optimizing it may come at the cost of the end-user experience.

    http://www.stevesouders.com/blog/2013/05/13/moving-beyond-window-onload/


Here’s a slide highlighting this issue from one of the SpeedCurve + SOASTA presentations (“ATF” stands for above-the-fold). For Amazon, the page is quite usable by 2s whereas onLoad fires only at 9s. At the other extreme, in the case of Gmail, onLoad has fired at 3.9s whereas emails become visible only about a second later.

     Value Distribution

Based on the explanation above, onLoad cannot really have a fixed target value. The measure will vary with the website’s implementation. If the site is relatively static and has very few lazy-loading or AJAX-based features, then it should aim for a low value. If a lot of dynamic content is handled with clever logic, lazy-loading below-the-fold content and similar techniques, this metric can have a higher value.

     All values in milliseconds (ms)

    Min. 1st Quartile Median 3rd Quartile Max.

    584 9916 15700 23700 61580

    SpeedIndex

     Analysis

This metric is closely related to visual metrics like renderStart and visualComplete. There is a more-than-linear relationship between this metric and both visualComplete and fullyLoaded.


     Action

Because this metric ties together different visual aspects, like the loading of “above-the-fold” content and delivering an actionable site, it should always be part of the metrics collection set.

     Value Distribution

All values are in units, not time.

    Min 1st Quartile Median 3rd Quartile Max

    1000 6210 9220 10650 91860

    Total number of Requests

     Analysis

Compared to the desktop results, the total number of requests has a more direct correlation to metrics like total bytes and to visual metrics like fullyLoaded and visualComplete. However, if onLoad is being measured, then this metric may not be too important.

One interesting use case for this metric would be during the adoption of H2. Due to better management of a single TCP connection, the number of requests (from a single domain) is not supposed to have a major impact on page performance. However, this assertion may not hold entirely true for mobile devices. Until better studies are available, tracking this metric would provide insight for early adopters.


     Action

Track this metric during H2 adoption. Beyond this use case, it may not be a very valuable metric to focus on.

     Value Distribution

All values in units

    Min 1st Quartile Median 3rd Quartile Max

    1000 6210 9220 10650 91860

     VisualComplete

     Analysis

    visualComplete appears to be closely related to SpeedIndex and onLoad as well.


     Action

    Since the recommendation is to measure both SpeedIndex and onLoad, this metric by itself

    will not add value and can be ignored in the performance budget.

     Value Distribution

All values in milliseconds (ms)

Min. 1st Quartile Median 3rd Quartile Max.

    0 9000 15000 23000 97000

    Section Conclusion 

The crawl data from HTTPArchive for mobile websites is less rich. PageSpeed scores are not available for mobile devices, but that does not reduce their importance.

From just the available data, the best metrics to measure are SpeedIndex and onLoad (for compatibility). Apart from these, the number of compressed objects (numCompressed) and the number of domains (numDomains) would be useful to measure, since opening connections to different domains is always expensive for a mobile device.

With the growing importance of mobile devices, I am sure future crawls will improve and start to have much better reporting. Once that data is available, I hope to re-do this part of the research.


    Conclusion

Based on the study, the following metrics stand out in terms of richness and the ability to provide different perspectives on the data:

● SpeedIndex (perceived performance)

● onLoad (for backward compatibility)

● Google PageSpeed (network-independent optimization)

● TTFB (back-end effectiveness, CDN efficiency)

● Total domains (3rd-party bloat)

Depending on your appetite for data, consider measuring at least the first three.

Do note that each website is different and has a special purpose. The best metric is one that measures the effectiveness of that site’s critical action. If none of these metrics suits your needs, do consider developing a custom metric that helps your business.
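As a final sketch, here is one way the recommended metrics could be checked against budget targets in R; every target and measured value below is a placeholder for illustration, not a recommendation from this study:

budget   <- c(SpeedIndex = 3000, onLoad = 8000, PageSpeed = 85, TTFB = 500, numDomains = 15)
measured <- c(SpeedIndex = 4500, onLoad = 7200, PageSpeed = 88, TTFB = 420, numDomains = 22)

# PageSpeed is "higher is better"; all the other metrics are "lower is better".
higher_is_better <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

within_budget <- ifelse(higher_is_better, measured >= budget, measured <= budget)
setNames(within_budget, names(budget))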

     Appendix

    Further Reading

● Raw speed score correlation spreadsheet:

https://docs.google.com/a/akamai.com/spreadsheets/d/1yUvYlJmt2DBrmO0DIxO9ywXEyz_8CmoesWHAYpRQmeM/edit?usp=sharing

    ● WebPageTest definition of metrics:

    https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics 

    ● General concept of performance budgeting:

https://en.wikipedia.org/wiki/Performance-based_budgeting

● Performance budget blog by Tim Kadlec:

http://timkadlec.com/2013/01/setting-a-performance-budget/

● Performance budget at Etsy by Lara Callender Hogan:

    https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/   

● Grunt task for performance budgeting by Tim Kadlec:

    https://github.com/tkadlec/grunt-perfbudget 

● Performance budgeting using the Grunt task, explained by Tim Kadlec:

    http://timkadlec.com/2014/05/performance-budgeting-with-grunt/  

● An easy-to-understand overview of Performance Budget by Catherine Farman:

http://www.sitepoint.com/automate-performance-testing-grunt-js/

● Collection of tools to help in performance tuning: http://perf-tooling.today/tools

    ● Webinar “Creating Meaningful Metrics That Get Your Users to do the Things You

    Want” - http://www.oreilly.com/pub/e/3390  




● Chris Coyier’s summary of Tim Kadlec’s performance budget blog:

    https://css-tricks.com/fast-fast-enough/  

    ●  A nice comment from Paul Irish on Performance Budget:

    http://timkadlec.com/2014/01/fast-enough/#comment-1200946500  

    ●  A huge collection of articles, tools and videos related to performance:

http://perf.rocks/

● Testing for Front-End SPOF by Patrick Meenan:

    http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html  

    ● Frontend SPOF by Steve Souders:

    http://www.stevesouders.com/blog/2010/06/01/frontend-spof/  

    ● Metrics reporting:

    ○ Catchpoint: http://www.catchpoint.com/  

    ○ Keynote: http://www.keynote.com/  

    ○ SpeedCurve: http://speedcurve.com/  

○ Sitespeed.io Free Dashboard: http://dashboard.sitespeed.io/

    People to Follow for Performance

    ● Steve Souders: @souders

    ● Scott Jehl: @scottjehl

    ● Tim Kadlec: @tkadlec

    ● Lara Hogan: @lara_hogan

    ● Guy Podjarny: @guypo

    ● Paul Irish: @paul_irish

    ● Ilya Grigorik: @igrigorik

    ● PerfPlanet: @perfplanet

● Hashtags: #webperf #perfmatters

    R Program code

    Sample R program code to compute the correlation metric

data <- read.csv("httparchive_metrics.csv")  # hypothetical file name for the HTTPArchive extract
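A minimal completion of the listing, assuming the data frame holds the non-null numeric metric columns described in the Methodology section:

# Drop rows with null values, as described in Methodology.
data <- na.omit(data)

# Pairwise correlation matrices over the numeric metric columns.
cor(data, method = "pearson")
cor(data, method = "spearman")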