Predictive Performance Testing: Integrating Statistical Tests into Agile Development Lifecycles


This presentation was delivered by Tom Kleingarn at HP Software Universe 2010 in Washington DC. It describes basic statistical tests that can be applied to any performance engineering practice to improve accuracy and confidence in your test results.


©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Predictive Performance Testing: Integrating Statistical Tests into Agile Development Lifecycles

Tom Kleingarn

Lead, Performance Engineering

Digital River

http://www.linkedin.com/in/tomkleingarn

http://www.perftom.com

Agenda

> Introduction

> Performance engineering

> Agile

> Outputs from LoadRunner

> Basic statistics

> Advanced statistics

> Summary

> Practical application

About Me

> Tom Kleingarn

> Lead, Performance Engineering - Digital River

> 4 years in performance engineering

> Tested over 100 systems/applications

> Hundreds of performance tests

> Tools:

> LoadRunner

> JMeter

> Webmetrics, Keynote, Gomez

> ‘R’ and Excel

> Quality Center

> QuickTest Professional

About Digital River

> Leading provider of global e-commerce solutions

> Builds and manages online businesses for software and game publishers, consumer electronics manufacturers, distributors, online retailers and affiliates.

> Comprehensive platform offers:

> Site development and hosting

> Order management

> Fraud management

> Export control

> Tax management

> Physical and digital product fulfillment

> Multi-lingual customer service

> Advanced reporting and strategic marketing

Performance Engineering

> The process of experimental design, test execution, and results analysis, utilized to validate system performance as part of the Software Development Lifecycle (SDLC).

> Performance requirements – measurable targets of speed, reliability, and/or capacity used in performance validation.

> Latency < 10ms, measured at the 99th percentile

> 99.95% uptime

> Throughput of 1,000 requests per second
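
Requirements like these can be checked mechanically in ‘R’ against a vector of response-time samples. A minimal sketch; the data and counts below are simulated stand-ins, not output from a real test:

# Hypothetical latency samples in seconds (stand-in for real LoadRunner output)
latencies <- rexp(10000, rate = 1000)   # simulated, roughly 1 ms average

# Requirement: latency < 10ms, measured at the 99th percentile
quantile(latencies, 0.99) < 0.010

# Requirement: throughput of 1,000 requests per second
requests_completed <- 61000             # assumed count over a 60-second window
(requests_completed / 60) >= 1000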

Performance Testing Cycle

1. Requirements Analysis

2. Create test plan

3. Create automated scripts

4. Define workload model

5. Execute scenarios

6. Analyze results

> Rinse and repeat if…

> Defects identified

> Change in requirements

> Setup or environment issues

> Performance requirement not met

Digital River Test Automation

Agile

> A software development paradigm that emphasizes rapid process cycles, cross-functional teams, frequent examination of progress, and adaptability.

Scrum

[Diagram: Scrum development cycle, from initial plan through iterative sprints to deploy]

Agile Performance Engineering

> Clear and constant communication

> Involvement in initial requirements and design phase

> Identify key business processes before they are built

> Coordinate with analysts and development to build key business processes first

> Integrate load generation requirements into project schedule

> Test immediately with v1.0

> Schedule tests to auto-start, run independently

> Identify invalid test results before deep analysis

LoadRunner Results

> Measures of central tendency

> Average = ∑(all samples) / (sample size)

> Median = 50th percentile

> Mode – highest frequency, the value that occurred the most

> Measures of variability

> Min, max

> Standard Deviation = √( ∑(sample − average)² / (sample size − 1) )

> 90th percentile
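
All of these measures are easy to reproduce in ‘R’. A minimal sketch, assuming latency is a numeric vector of response times (loaded as shown later in this deck); base R has no built-in mode function, so one is sketched by hand:

mean(latency)              # average
median(latency)            # 50th percentile
min(latency)               # minimum
max(latency)               # maximum
sd(latency)                # standard deviation (n - 1 denominator)
quantile(latency, 0.90)    # 90th percentile

# Mode: the value that occurs most frequently
as.numeric(names(which.max(table(latency))))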

LoadRunner Results

[Chart: the median splits the response-time distribution 50% / 50%; the 90th percentile splits it 90% / 10%]

Basic Statistics – Sample vs. Population

> Performance requirement: average latency < 3 seconds

> What if you ran 50 rounds? 100 rounds?

Basic Statistics – Sample vs. Population

> Sample – set of values, subset of population

> Population – all potentially observable values

> Measurements

> Statistic – the estimated value from a collection of samples

> Parameter – the “true” value you are attempting to estimate

Not a representative sample!

Basic Statistics – Sample vs. Population

> Sampling distribution – the probability distribution of a given statistic based on a random sample of size n

> Dependent on the underlying population

> How do you know the system under test met the performance requirement?

Basic Statistics – Normal Distribution

> With larger samples, data tend to cluster around the mean

Basic Statistics – Normal Distribution

Sir Francis Galton’s “Bean Machine”
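
The bean machine is easy to simulate in ‘R’: each ball takes a fixed number of random left/right bounces, and the bin counts trace out a bell curve. A minimal sketch (the peg and ball counts are arbitrary):

# 10,000 balls, each bouncing left/right at 20 pegs;
# the landing bin is the number of rightward bounces
bins <- rbinom(10000, size = 20, prob = 0.5)
hist(bins, breaks = 0:20)   # approximately normal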

Confidence Intervals

> The probability that an interval made up of two endpoints will contain the true mean parameter μ

> 95% confidence interval: x̄ ± 1.96 × s / √n

> … where 1.96 is the score from the normal distribution associated with 95% probability
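
Applied in ‘R’, again assuming latency holds the samples, the same interval can be computed directly (a sketch of the formula above):

n <- length(latency)
margin <- 1.96 * sd(latency) / sqrt(n)   # 95% margin of error
c(lower = mean(latency) - margin, upper = mean(latency) + margin)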

Confidence Intervals

> In repeated rounds of testing, a confidence interval will contain the true mean parameter with a certain probability:

[Chart: confidence intervals from repeated test rounds, most containing the true average]

Confidence Intervals in Excel

> 95% confidence - true average latency 3.273 to 3.527 seconds

> 99% confidence - true average latency 3.233 to 3.567 seconds

> Our range is wider at 99% compared to 95%, 0.334 sec vs. 0.254 sec

Statistic            95% Value   99% Value   Formula

Average              3.40        3.40

Standard Deviation   1.45        1.45

Sample size          500         500

Confidence Level     0.95        0.99

Significance Level   0.05        0.01        =1-(Confidence Level)

Margin of Error      0.127       0.167       =CONFIDENCE(Sig. Level, Std Dev, Sample Size)

Lower Bound          3.273       3.233       =Average - Margin of Error

Upper Bound          3.527       3.567       =Average + Margin of Error
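
The same margins can be reproduced in ‘R’ from the slide's summary numbers; qnorm() supplies the score that Excel's CONFIDENCE() uses internally. A sketch:

avg <- 3.40; s <- 1.45; n <- 500
for (conf in c(0.95, 0.99)) {
  z <- qnorm(1 - (1 - conf) / 2)   # 1.96 for 95%, 2.576 for 99%
  margin <- z * s / sqrt(n)
  cat(conf, "confidence:", avg - margin, "to", avg + margin, "\n")
}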

The T-test

> Test that your sample mean is greater than/less than a certain value

> Performance requirement:

Mean latency < 3 seconds

> Null hypothesis:

Mean latency >= 3 seconds

> Alternative hypothesis:

Mean latency < 3 seconds


T-test – Raw Data from LoadRunner

n = 500

T-test in ‘R’

> ‘R’ for statistical analysis

> http://www.r-project.org/

Load test data from a file:

> datafile <- read.table("C:\\Data\\test.data", header = FALSE, col.names = c("latency"))

Attach the dataframe:

> attach(datafile)

Create a “vector” from the dataframe:

> latency <- datafile$latency

T-test in ‘R’

> t.test(latency, alternative="less", mu=3)

One Sample t-test 

data: latency

t = -2.9968, df = 499, p-value = 0.001432

alternative hypothesis: true mean is less than 3

> The p-value of 0.0014 means that, if the true average latency were actually 3 seconds or more, a sample mean this low would be observed only about 0.14% of the time. We reject the null hypothesis.

> We can therefore conclude, with high confidence, that the true average latency is less than 3 seconds
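
For readers without LoadRunner data handy, the whole test can be tried end-to-end on simulated data. A minimal sketch; the rnorm() parameters are assumptions chosen only to roughly reproduce the slide's t statistic:

set.seed(1)                                      # reproducible stand-in data
latency <- rnorm(500, mean = 2.8, sd = 1.5)      # simulated response times
t.test(latency, alternative = "less", mu = 3)    # one-sided, one-sample t-test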

T-test – Number of Samples Required

> power.t.test(sd=sd(latency), sig.level=0.05, power=0.90, delta=mean(latency)*0.01, type="one.sample")

One-sample t test power calculation

n = 215.5319

delta = 0.03241267

sd = 0.1461401

sig.level = 0.05

power = 0.9

alternative = two.sided

> We need at least 216 samples

> Our sample size is 500, so we have enough samples to proceed

Test for Normality

> Test that the data is “normal”

> Clustered around a central value, no outliers

> Roughly fits the normal distribution

> shapiro.test(latency)

Shapiro-Wilk normality test

data: latency

p-value = 0.8943

> Our sample distribution is approximately normal

> A p-value < 0.05 would indicate the distribution is not normal
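
A visual companion to the Shapiro-Wilk test (not shown in the original slides, but a common complement) is the normal Q-Q plot; points that hug the reference line suggest normality:

qqnorm(latency)   # sample quantiles vs. theoretical normal quantiles
qqline(latency)   # reference line; strong curvature suggests non-normality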

Review

> Sample vs. Population

> Normal distribution

> Confidence intervals

> T-test

> Sample size

> Test for normality

> Practical application

> Performance requirements

> Compare two code builds

> Compare system infrastructure changes

Case Study

> Engaged in a new web service project

> Average latency < 25ms

> Applied statistical analysis

> System did not meet requirement

> Identified problem transaction

> Development fix applied

> Additional test, requirement met

> Prevented a failure in production

Implementation in Agile Projects

> Involvement in early design stages

> Identify performance requirements

> Build key business processes first

> Calculate required sample size

> Apply statistical analysis

> Run fewer tests with greater confidence in your results

> Prevent performance defects from entering production

> Prevent SLA violations in production
