MG
PM Tutorial
4/7/2014 1:00 PM

“Measurement and Metrics for Test Managers”

Presented by:
Rick Craig
Software Quality Engineering

Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073
888‐268‐8770 ∙ 904‐278‐0524 ∙ [email protected] ∙ www.sqe.com
Rick Craig
Software Quality Engineering
A consultant, lecturer, author, and test manager, Rick Craig has led numerous teams of testers on both large and small projects. In his twenty-five years of consulting worldwide, Rick has advised and supported a diverse group of organizations on many testing and test management issues. From large insurance providers and telecommunications companies to smaller software services companies, he has mentored senior software managers and helped test teams improve their effectiveness. Rick is coauthor of Systematic Software Testing and is a frequent speaker at testing conferences, including every STAR conference since its inception.
© 2014 SQE Training V3.2 1
MEASUREMENT AND METRICS FOR TEST MANAGERS
Introduction
Administrivia
• Course timing
• Meals
• Electronic devices
• Facilities
• Smoking
• Breaks
Course Agenda
1. Introduction to Software Measurement
2. Metrics--Rules of Thumb
3. A Tester’s Dashboard
4. Estimation (Optional)

1 INTRODUCTION TO SOFTWARE MEASUREMENT
What is software measurement?
“It’s easy to get numbers, what is hard is to know they are right and understand what they mean”
— Bill Hetzel
What is software measurement?
“Quantified observations” about any aspect of software (product, process, or project)
Lord Kelvin
“To measure is to know”
“If you cannot measure it, you cannot improve it”
“The more you understand what is wrong with a figure, the more valuable that figure becomes”
There Are Lots and Lots of Measures
Primitive:
– Aspirins consumed this week
– Number of staff assigned to project A
– Pages of requirements specifications
– Hours worked to accomplish change request X
– Number of operational failures in system Y this year
– Lines of code in program Z
Computed:
– Defects per 1000 lines of code in program A
– Productivity in function points delivered by person B
– Quality Score for project C
– Average coffee consumption per line of code
– Accuracy of hours worked per week is ± 20%
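To make the primitive/computed distinction concrete, here is a minimal Python sketch that turns two primitive measures (a defect count and a line count) into the computed metric "defects per 1000 lines of code". The function name and the sample numbers are hypothetical.

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Computed metric: defects per 1,000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical primitive measures for "program A"
defects_found = 42
loc = 12_000
print(f"{defects_per_kloc(defects_found, loc):.1f} defects/KLOC")  # 3.5 defects/KLOC
```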
Common Metrics
• Test defects
• Defects after release
• Open problems
• Open issues
• Schedule performance
• Process compliance (e.g., ISO)
• Test results
• Reliability
• Time fixing problems
• Defects from fixes
• Lines of code
• Plan and schedule changes
Uncommon Metrics
• Code coverage
• Complexity
• Cost of rework
• Cost of quality
• Defect age
Basic Definitions
The four Ms:
• Measure: primitive (raw data)
• Metric: computed (information)
• Meter
• Meta‐measure
What Makes a Good Measure?
• Simple
• Objective
• Easily collected
• Robust
• Valid
What Can Measures Do for You?
• Facilitate estimation
• Identify risky areas
• Measure testing status
• Measure/predict product quality
• Measure test effectiveness
• Identify training opportunities
• Identify process improvement opportunities
• Provide “meters” to flag actions
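One way to read "provide meters to flag actions" is a metric paired with an expected range, in line with the later advice to establish a range of expected values. This Python sketch assumes that interpretation; the `Meter` class, its threshold, and the sample readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Meter:
    """A metric paired with an expected range; readings outside it call for action."""
    name: str
    low: float
    high: float

    def check(self, value: float) -> str:
        # Return "below", "above", or "ok" relative to the expected range.
        if value < self.low:
            return "below"
        if value > self.high:
            return "above"
        return "ok"


open_defects = Meter("Open defects", low=0, high=25)  # hypothetical threshold
for reading in (12, 31):
    print(open_defects.name, reading, open_defects.check(reading))
```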
2 METRICS--RULES OF THUMB
Metrics--Rules of Thumb
• The Human Element
• The Basics
• KISS
• And a Myth or Two
The Human Element
• Without buy‐in, metrics may be falsified
• Without buy‐in, metrics may be ignored
Buy‐in is key
Class Discussion
How do you obtain buy‐in?
Ways to Obtain Buy-in
• Training
• Metrics
• Feedback loops
• Reviews
• Participation
The Human Element
• Measure processes and products instead of people, if possible
• Beware of the dark side of the Hawthorne Effect
Two Sides of Measurement
…the information may be used against me.
…the information will help me understand what is going on and do a better job.
The Hawthorne Effect
Measuring people improves their productivity
The Human Element
Tailor metrics to the audience
Users, managers, and practitioners all have different languages
Set the appropriate level of detail
How you present the material matters
Who is your audience?
Users
Developers
Testers
% of Red Cars Soars
[Chart: % of red cars by year, 2008-2010; data labels 25.4, 25.5, and 26.0; y-axis spans only about 25.1 to 26.1]
% of Red Cars Soars?
[Chart: the same data (25.4, 25.5, 26.0) for 2008-2010, plotted on a y-axis with ticks at 25, 50, 75, and 100]
The Human Factor
Training is required
Metrics are not second nature
Your metrics are affected by how they are collected
Establish range of expected values
Publish historical values
The Basics
• Use a metric to validate a metric
• Use meta‐measures
• Use meters when possible
• Consistency sometimes trumps accuracy
• Subjective is good; objective is better
KISS ― Keep It Simple Sir
• More is not always better
• All metrics are not forever
  – Consider temporary metrics
  – Consider sampling
• Automate collection when possible
3 A TESTER’S DASHBOARD
A Dashboard
Establish a Dashboard
• Easy to use/understand at a glance:
Quality of product
Status
Test effectiveness
Resources
Issues
* Remember you need at least two metrics per “instrument”
Measures of Quality
• It is difficult to develop practical measures of quality
• The cost to achieve various quality levels must be taken into account
• Many quality metrics are relatively subjective
• Quality goals will be affected by the industry and corporate culture
What Is Quality?
• Meeting requirements (stated and/or implied)
Sample Quality Factors and Criteria
• Correctness
• Reliability
• Testability
• Flexibility
• Usability
• Portability
• Interoperability
• Efficiency
• Integrity
• Maintainability
• Revisability
• Survivability
• Correctability
Defect Density/Clustering
[Chart: number of defects per 1,000 lines of code by module name; modules shown in the order D, B, A, C, E, F]
Defect Density
Issues
Coverage of tests
Weighting of defects
Weighting by relative risk
What to use as the denominator
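The weighting and denominator issues above can be made explicit in code. This Python sketch weights each defect by severity, divides by module size in KLOC, and ranks modules to expose clustering. The severity weights, the choice of KLOC as the denominator, and the module data are all hypothetical; function points or another size measure could be swapped in.

```python
# Hypothetical severity weights; each organization would set its own.
SEVERITY_WEIGHT = {"critical": 10, "major": 3, "minor": 1}


def weighted_density(defect_severities, kloc):
    """Severity-weighted defects per KLOC for one module."""
    return sum(SEVERITY_WEIGHT[s] for s in defect_severities) / kloc


modules = {
    # module: (severities of defects found, size in KLOC); sample data only
    "A": (["minor", "minor", "major"], 2.0),
    "B": (["critical", "major"], 1.0),
    "C": (["minor"], 4.0),
}

# Highest weighted density first: where defects cluster.
ranking = sorted(modules, key=lambda m: weighted_density(*modules[m]), reverse=True)
print(ranking)  # ['B', 'A', 'C']
```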
Effect of Complexity on Quality
[Chart: probability of post-release defect (y-axis) plotted against complexity (x-axis)]
Other Measures of Product Quality
• Customer satisfaction
• Repeat customers?
• Referrals?
• Calls to the help desk?
• Timeliness?
• Defect age?
• Complexity?
• Rework?
• Reliability?
Quality of Product
• Record any current measures of product quality that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument
Establish a Dashboard
• Easy to use/understand at a glance:
Quality of product
Status
Test effectiveness
Resources
Issues
* Remember you need at least two metrics per instrument
Status Reporting
• The Master Test Plan should specify:
  – What to report
  – How often
  – To whom
Common Test Status Metrics
% of Test Cases Executed
Issues:
• Weighting of TC by coverage metrics
• Weighting of TC by risk
• Weighting of TC by execution effort
• Weighting of TC by time to execute
What do you really want to know?
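The weighting issues above can change the answer considerably. This Python sketch compares a plain "% of test cases executed" with a weighted version; the weights stand in for risk, coverage, or effort, and both the weights and the test data are hypothetical.

```python
def weighted_percent_executed(test_cases):
    """test_cases: list of (weight, executed) pairs; weight might reflect risk, coverage, or effort."""
    total = sum(w for w, _ in test_cases)
    done = sum(w for w, executed in test_cases if executed)
    return 100.0 * done / total


tests = [(5, True), (1, True), (1, True), (3, False)]  # hypothetical risk weights
unweighted = 100.0 * sum(executed for _, executed in tests) / len(tests)
weighted = weighted_percent_executed(tests)
print(unweighted, weighted)  # 75.0 70.0
```

Here three of four test cases ran, but the unexecuted one carries 30% of the risk weight, so the weighted figure is lower.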
Sample Test Status Report (raw data)
Project: Online-Trade    Date: 4/23/2009

Feature Tested   Total Tests   # Complete   % Complete   # Success   % Success
Open Acct        46            46           100          41          89
Sell Order       36            25           69           25          69
Buy Order        19            17           89           12          63
…..
Totals           395           320          81           311         79
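The percentage columns in the sample report are derived from the raw counts, with % Success measured against total tests rather than completed tests. This Python sketch reproduces them from the rows shown (the features elided on the slide are not invented, so the Totals line is checked from the slide's own counts).

```python
rows = {
    # feature: (total tests, tests complete, tests successful)
    "Open Acct": (46, 46, 41),
    "Sell Order": (36, 25, 25),
    "Buy Order": (19, 17, 12),
}


def pct(part, whole):
    """Whole-number percentage, as shown in the report."""
    return round(100 * part / whole)


for feature, (total, complete, success) in rows.items():
    print(f"{feature:<11}{total:>6}{complete:>6}{pct(complete, total):>5}"
          f"{success:>6}{pct(success, total):>5}")

# Totals line from the slide (other features were elided there)
print(pct(320, 395), pct(311, 395))  # 81 79
```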
Open and Closed Over Time
[Charts: (1) defects per week over 20 weeks (y-axis 0-40), with series labeled Incoming, Fixed, Open, and Released; (2) cumulative defects detected over days]
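Open-versus-closed charts like these are usually built from two running totals: the open count at any point is cumulative defects reported minus cumulative defects fixed. A minimal Python sketch with hypothetical weekly counts:

```python
from itertools import accumulate

incoming = [5, 8, 12, 9, 6, 4, 2]  # hypothetical defects reported per week
fixed = [1, 4, 7, 10, 8, 6, 4]     # hypothetical defects closed per week

cum_incoming = list(accumulate(incoming))
cum_fixed = list(accumulate(fixed))

# Open defects at the end of each week = reported so far - fixed so far
open_defects = [i - f for i, f in zip(cum_incoming, cum_fixed)]
print(open_defects)  # [4, 8, 13, 12, 10, 8, 6]
```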
When Is the Software “Good Enough”?
When to stop testing:
• Test exit criteria met
• Return On Investment (ROI) not sufficient
• Defect arrival rate
• Resources exhausted
  – Time
  – Money
• Profiles (based on failures encountered using profiles of real data)
• Project cancelled!
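"Defect arrival rate" as a stopping signal needs a concrete rule before it can be checked. This Python sketch assumes one possible rule: the average number of new defects over the last few weeks falls below a threshold. The window, threshold, and data are hypothetical and would need calibrating against the other criteria listed.

```python
def arrival_rate_low(weekly_new_defects, window=3, threshold=2.0):
    """True when the mean number of new defects over the last `window` weeks is below `threshold`."""
    if len(weekly_new_defects) < window:
        return False  # not enough history to judge
    recent = weekly_new_defects[-window:]
    return sum(recent) / window < threshold


print(arrival_rate_low([12, 9, 6, 3, 1, 1]))  # True: last three weeks average about 1.7
print(arrival_rate_low([12, 9, 6]))           # False: last three weeks average 9
```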
Software Psychology
What is “good enough”?
[Chart: # of bugs over time]
Economics of Test and Failure
Source: IBM Systems Sciences Institute
Stopping Criteria ― Revisited
Normal
• Test set exit criteria
• Remaining defects estimation criteria
  – Defect history of past software
  – Defect history of current item
  – Software complexity
  – Combination of these
• Diminishing return criteria
  – Cost to Detect Next Defect
• Combined criteria
Abnormal
• Resource exhaustion
  – Schedule
  – Budget
  – System access
  – Patience
• Project redirection

“There is no single, valid, rational criterion for stopping. Furthermore, given any set of applicable criteria, how each is weighed depends very much on the product, the environment, the culture, and the attitude to risk.”
— Boris Beizer
Test Summary Report
• Report identifier
• Summary
  – Test items (with revision #s)
  – Environments
  – References
• Variances (deviations)
  – From test plan or requirements
  – Reasons for deviations
• Adequacy assessment
  – Evaluation of coverage
  – Identify uncovered attributes
• Summary of incidents
  – Resolved incidents
  – Defect patterns
  – Unresolved incidents
• Software evaluation
  – Limitations
  – Failure likelihood
• Summary of activities
  – System/CPU usage
  – Staff time
  – Elapsed time
• Approvals
Status
• Record any current test status measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument
Establish a Dashboard
• Easy to use/understand at a glance:
Quality of product
Status
Test effectiveness
Resources
Issues
* Remember you need at least two metrics per instrument
How Do You Measure Test Effectiveness?
A Common Answer
– Coverage
– Defect age (phase or product version)
– # of bugs
– Defect density
– Defect removal efficiency
– Defect seeding
– Mutation analysis
– Customer complaints
Three Major Categories
Customer Satisfaction Measures
Issues
Who to ask
“After the fact”
Difficulty in measuring
Doesn’t differentiate between the effectiveness of development and testing
Customer Satisfaction Measures
• Subjective is good
• Objective is better
Defect Measures
• Why is it important to track defects?
• What are some ways to analyze defects?
• DDP
• Defect density
• Defect age
Why is it important to track defects?
• Identify process improvement
• Identify training needs
• Identify problematic (high‐risk) areas
• Determine test status
Defect Analysis ― Example
• Phase
• Type
• Severity
• Priority
• Author
• Age
• Module
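Analyzing defects by the attributes above often starts with simple tallies: counting defects per module or per type, most frequent first, is a quick Pareto view of where problems cluster. A minimal Python sketch; the defect records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical defect records using some of the attributes listed above.
defects = [
    {"module": "Billing", "type": "logic", "severity": 2, "phase": "coding"},
    {"module": "Billing", "type": "interface", "severity": 1, "phase": "design"},
    {"module": "Reports", "type": "logic", "severity": 3, "phase": "coding"},
    {"module": "Billing", "type": "logic", "severity": 2, "phase": "requirements"},
]


def tally(records, attribute):
    """Count defects by one attribute, most frequent first."""
    return Counter(r[attribute] for r in records).most_common()


print(tally(defects, "module"))  # [('Billing', 3), ('Reports', 1)]
print(tally(defects, "type"))    # [('logic', 3), ('interface', 1)]
```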
Defect Detection Percentage (DDP)
$$\mathrm{DDP} = \frac{\text{Defects Discovered}}{\text{Defects at Start}} \times 100\%$$
85% is the average DRE for US software projects greater than 1,000 function points in size.
— Capers Jones
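"Defects at Start" cannot be counted directly. A common practical approximation, assumed in this Python sketch, is the defects found by the stage being measured plus the defects found after it (for example, in production). That is also why DDP is an "after the fact" measure. The function name and counts are hypothetical.

```python
def ddp(found_by_stage, found_afterwards):
    """Defect Detection Percentage, approximating 'defects at start' as
    defects found by this stage plus defects found after it."""
    defects_at_start = found_by_stage + found_afterwards
    if defects_at_start == 0:
        raise ValueError("no defects recorded")
    return 100.0 * found_by_stage / defects_at_start


# Hypothetical: system test found 170 defects; 30 more surfaced after release.
print(ddp(170, 30))  # 85.0
```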
Defect Detection Percentage (DDP)
Issues
Severity and distribution of defects
How to know when all bugs are found
“After the fact”
What constitutes bug‐finding activities?
Some bugs cannot be found in testing
Defect “Value” (Cost Avoidance)
When discovered           Typical hours to rework/fix
Requirements              1
High level design         1
Detailed design           1
Code                      1
Unit test                 3 – 5
Integration test          5 – 10
System/acceptance test    10 – 30
Production                20 – 60+
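The "value" of finding a defect early can be expressed as the rework hours it avoids compared with fixing it in production. This Python sketch uses the ranges from the table above and compares low ends and high ends separately, giving a rough range rather than a single number. It is a quick comparison, not a statistical interval; treating "60+" as 60 and the function name are assumptions.

```python
# Typical hours to rework/fix, from the table above, as (low, high).
FIX_HOURS = {
    "requirements": (1, 1),
    "high level design": (1, 1),
    "detailed design": (1, 1),
    "code": (1, 1),
    "unit test": (3, 5),
    "integration test": (5, 10),
    "system/acceptance test": (10, 30),
    "production": (20, 60),  # table says 60+; 60 is used here (assumption)
}


def hours_avoided(phase_found, defect_count=1):
    """Rough (low, high) range of rework hours avoided versus fixing in production."""
    lo, hi = FIX_HOURS[phase_found]
    prod_lo, prod_hi = FIX_HOURS["production"]
    return ((prod_lo - lo) * defect_count, (prod_hi - hi) * defect_count)


print(hours_avoided("unit test", 12))  # (204, 660)
```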
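Defect Age (PhAge)
[Matrix: phase created (rows: Requirements, High level design, Detailed design, Coding) versus phase discovered (columns). Each cell is the defect’s age in phases: 0 when the defect is discovered in the phase where it was created, increasing by 1 for each later phase; the Requirements row runs from 0 to 9.]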
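Phase age is the distance between the phase that created a defect and the phase that discovered it, as in the matrix above. This Python sketch computes it from an ordered phase list. The list is hypothetical because the slide's column headings were not recoverable, and each organization would substitute its own life cycle.

```python
# Hypothetical ordered list of phases.
PHASES = ["requirements", "high level design", "detailed design", "coding",
          "unit test", "integration test", "system test", "acceptance test", "production"]


def phase_age(created, discovered):
    """Number of phases a defect survived: 0 if found in the phase that created it."""
    age = PHASES.index(discovered) - PHASES.index(created)
    if age < 0:
        raise ValueError("a defect cannot be discovered before it is created")
    return age


defects = [("requirements", "system test"), ("coding", "unit test"),
           ("detailed design", "coding")]
ages = [phase_age(c, d) for c, d in defects]
print(ages)  # [6, 1, 1]
print(sum(ages) / len(ages))  # average phase age
```

Determining the "phase created" column is the root-cause step that the next slide lists as difficult.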
Defect Age
Issues
Difficult to do root cause analysis
Requires weighting of defects
How to handle latent/masked defects
Coverage Measures
Discussion
Requirements vs. design vs. code coverage
Completeness/accuracy of test basis
Coverage of test set vs. coverage of tests executed (e.g., we don’t always run every test)
Coverage vs. actual results (DDP)
Mapping Test Cases to Requirements
[Figure: requirements in the requirements spec. (3.5.1.3.2, 3.5.1.4.7, 3.6.4.2.1, 3.8.2.7.1, …) mapped to test cases in the test plan (Test Case #3, #5, #12, #19, …)]
Requirements/Design Coverage
Conceptual model of requirements/design coverage: test cases 1-3 are marked (X) against each requirement, feature, and design element they exercise; an item is covered (Y) if at least one test case exercises it.
Requirements: A Y, B N, C Y
Features: A Y, B Y
Design: A Y, B Y, C N, D Y
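The coverage matrix above can be held as a traceability map from each requirement to the test cases that exercise it. This Python sketch computes the percent covered and lists uncovered items, optionally counting only test cases actually executed (the "test set vs. tests executed" distinction raised earlier). The requirement IDs and mappings are hypothetical.

```python
# Hypothetical traceability: requirement -> test cases that exercise it.
trace = {
    "A": {"TC1", "TC2"},
    "B": set(),
    "C": {"TC1", "TC3"},
}


def requirements_coverage(trace, executed=None):
    """Return (percent covered, uncovered requirements).
    If `executed` is given, only test cases actually run count."""
    covered = [req for req, tcs in trace.items()
               if (tcs if executed is None else tcs & executed)]
    uncovered = sorted(set(trace) - set(covered))
    return 100.0 * len(covered) / len(trace), uncovered


print(requirements_coverage(trace))          # test set: about 66.7%, ['B'] uncovered
print(requirements_coverage(trace, {"TC2"})) # only TC2 run: about 33.3%, ['B', 'C']
```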
Requirements/Design Coverage
Issues
Only as good as test basis
Relatively low coverage of code
Code coverage achieved with requirements tests:
Major bank (20 apps)       20%
Major DBMS vendor          47%
Major h/w s/w vendor       60%
— Source: Bender and Associates
Code Coverage
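Conceptual model of code coverage: statements A-E versus test runs 1-3.
Statement A is executed by all three runs, B by two runs, and C and E by one run each; D is never executed (not covered).
Coverage: run 1 60%, run 2 20%, run 3 60%; all runs combined 80%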
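Per-run and combined statement coverage follow from set arithmetic over the statements each run executed. This Python sketch reproduces the figures above. Run 2 must be statement A alone, and B must be reached by runs 1 and 3; which of those two runs reached C versus E is an assumption.

```python
STATEMENTS = {"A", "B", "C", "D", "E"}

# Statements each run executed; the C/E split between runs 1 and 3 is assumed.
runs = {
    1: {"A", "B", "C"},
    2: {"A"},
    3: {"A", "B", "E"},
}


def coverage(executed):
    """Percent of statements executed."""
    return 100.0 * len(executed & STATEMENTS) / len(STATEMENTS)


for run, executed in runs.items():
    print(f"run {run}: {coverage(executed):.0f}%")  # 60%, 20%, 60%

combined = set().union(*runs.values())
print(f"combined: {coverage(combined):.0f}%, never executed: {sorted(STATEMENTS - combined)}")
# combined: 80%, never executed: ['D']
```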
Code Coverage
Issues
Requires a tool
Doesn’t prove the code actually “works” correctly
Did we test the “right code”?
Statement vs. branch vs. path
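"Statement vs. branch vs. path" matters because the criteria differ in strength. In this toy Python sketch (a hypothetical function, with branch outcomes recorded by hand rather than by a coverage tool), one test executes every statement yet exercises only one of the two outcomes of the `if`. Statement coverage is therefore 100% while branch coverage is 50%.

```python
branches_taken = set()
ALL_BRANCHES = {("is_member", True), ("is_member", False)}


def apply_discount(price, is_member):
    # Hand instrumentation: record which outcome of the decision was taken.
    branches_taken.add(("is_member", is_member))
    if is_member:
        price = price - 5
    return price


# A single test with is_member=True executes every statement above...
result = apply_discount(100, True)
# ...but the False outcome of the `if` is never exercised.
branch_coverage = 100 * len(branches_taken) / len(ALL_BRANCHES)
print(result, f"branch coverage: {branch_coverage:.0f}%")  # 95 branch coverage: 50%
```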
Test Effectiveness
• Record any current test effectiveness measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument
Establish a Dashboard
• Easy to use/understand at a glance:
Quality of product
Status
Test effectiveness
Resources
Issues
* Remember you need at least two metrics per instrument
Resources
• Resource estimates and consumption are necessary in order to do test planning, estimation, budgeting, and staffing
• You must consider the level of granularity in the collection of these metrics based on the accuracy of the required metrics and your ability to validate them
• Some people choose to exclude the resources instrument from the dashboard because they feel it is not a “day-to-day” metric
Resources
Resource metrics are normally collected in terms of:
Actual/expected budget
Actual/expected engineering hours
Test environment utilization/availability
Staffing levels
Contractor availability
Other hardware/software resources
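The actual/expected pairs above are commonly reported as a percent variance, positive when over plan and negative when under. A minimal Python sketch; the resource names and figures are hypothetical.

```python
def variance_pct(actual, expected):
    """Percent over (+) or under (-) the expected value."""
    return 100.0 * (actual - expected) / expected


resources = {
    # metric: (actual, expected); hypothetical mid-project figures
    "budget ($K)": (118, 100),
    "engineering hours": (640, 800),
}

for name, (actual, expected) in resources.items():
    print(f"{name}: {variance_pct(actual, expected):+.0f}%")
# budget ($K): +18%
# engineering hours: -20%
```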
Resources
• Record any current resource measures that you are using here. Give them a grade for effectiveness (A, B, C, etc.)
• Any new metrics you would use?
* Remember you need at least two metrics per instrument
Establish a Dashboard
• Easy to use/understand at a glance:
Quality of Product
Status
Test Effectiveness
Resources
Issues
* Remember you need at least two metrics per instrument
Issues
• This is included to address any important items not otherwise included on the dashboard. These are normally subjective and not necessarily conducive to systematic analysis
• Issues could involve training, installation of new hardware/software, politics, even the weather
A Sample Tester’s Dashboard
Status
• % completion
• Defect info
Product Quality
• Defect density
• Performance, etc.
Test Effectiveness
• DDP
• Coverage
Resources
• Engineering hours
• Money
Issues
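The slides repeat that each instrument needs at least two metrics. This Python sketch holds the sample dashboard above as a simple mapping and flags instruments that fall short. The data structure is hypothetical, and exempting Issues is an assumption based on the slides describing issues as normally subjective.

```python
# Instruments and metrics follow the sample dashboard; the structure is hypothetical.
dashboard = {
    "Status": ["% completion", "defect info"],
    "Product Quality": ["defect density", "performance"],
    "Test Effectiveness": ["DDP", "coverage"],
    "Resources": ["engineering hours", "money"],
    "Issues": [],  # subjective items, tracked as free text
}


def thin_instruments(dashboard, minimum=2, exempt=("Issues",)):
    """Instruments with fewer than `minimum` metrics, ignoring exempt (subjective) ones."""
    return [name for name, metrics in dashboard.items()
            if name not in exempt and len(metrics) < minimum]


print(thin_instruments(dashboard))  # []
```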
Avoiding Dysfunction
• Measure processes and products—not people!
• Beware of the dark side of the Hawthorne Effect
• Remember that more is not always better
• Avoid the exclusive use of top‐down metrics
• Provide training—not all metrics are intuitive
• Consider temporary metrics
Avoiding Dysfunction
• Define each metric, its use, who will see it, expected ranges, etc.
• Remember your audience and tailor to their needs
• Always seek multiple interpretations
• Ask your audience what their interpretation of a metric is before you offer yours
• Sell, sell, sell, sell
One Truth and One Myth in Closing
The Truth
Gilb’s Law
“Anything you need to quantify can be measured in some way that is superior to not measuring it at all”
—Tom Gilb
The Myth
“Some metrics are always better than no metrics …”
4 ESTIMATION (OPTIONAL)
Estimation
Estimate:
1. A tentative evaluation or rough calculation
2. A preliminary calculation of the cost of a project
3. A judgment based upon one’s impressions; opinion
—The American Heritage Dictionary
“It is very difficult to make a vigorous, plausible, and job‐risking defense of an estimate that is derived by no quantitative method, supported by little data, and certified chiefly by the hunches of the managers.”
— Fred Brooks
Test Estimation
Estimation: the creation of an approximate target for costs and completion dates
The best estimates
Represent the collective wisdom of practitioners and have their buy‐in
Provide specific, detailed catalogs of the costs, resources, tasks, and people involved
Present, for each activity estimated, the most likely cost, effort, and duration
Test Estimation (cont.)
Factors that can influence cost, effort, and duration include:
Required level of quality of the system
Size of the system to be tested
Historical data
Process factors (process maturity, etc.)
Material factors (tools, data, etc.)
People factors (skills, experience, managers, etc.)
Test Estimation (cont.)
• Delivery of estimates should include justification
• Negotiation and re‐work of estimates is normal
• Final estimates represent a balance of organizational and project goals in the areas of quality, schedule, budget, and features
How Good Is Our Industry (at Estimating)?
• Tata: 62% of projects fail to meet schedule; 49% have budget overruns
• Moløkken and Jørgensen: 30-40% overruns
Class Discussion
Why is estimating not done well?
Your top five reasons:
1) Too many variables____________________
2) ____________________________________
3) ____________________________________
4) ____________________________________
5) ____________________________________
Why Estimates Are Inaccurate ― Part I
• Lack of estimating experience
• Lack of historical data on which to base estimates
• Lack of systematic estimation process, sound techniques, or models suited to the project
• Failure to include essential activities and products within the scope of the estimates
• Unrealistic expectations or assumptions
• Failure to recognize and address the uncertainty inherent in project estimates
Practical Software Measurement, Addison‐Wesley, 2001
Why Estimates Are Inaccurate ― Part II
• Lack of education and training
• Confusing the target with the estimate
• Hope‐based planning
• Inability to communicate and support estimates
• Incomplete, changing, and creeping requirements
• Quality surprises (test and re‐fix)
Adapted from Linda M. Laird, The Limitations of Estimation
Boehm’s Cone of Uncertainty
NHC Track Forecast Cone
“Testing” Track Forecast Cone
(or why it is important to constantly re-estimate)
[Chart: tasks (y-axis) versus time (x-axis)]
The Fantasy Factor
[Chart: timeline in weeks (0-9) marking Today and the 1st, 2nd, and 3rd estimates]
What would have to happen to deliver this in four weeks?
What should the estimate have been?
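The deck gives no formula for the fantasy factor. One hedged reading, assumed in this Python sketch, is the historical ratio of actual to estimated duration, which ties in with the tip to collect metrics. Because the correction comes from measured history rather than an arbitrary allowance, it is arguably distinct from padding. The function names and history data are hypothetical.

```python
from statistics import mean

# Hypothetical history: (estimated weeks, actual weeks) for past test efforts.
history = [(4, 6), (5, 7), (3, 4.5)]


def fantasy_factor(history):
    """Average ratio of actual to estimated duration on past projects."""
    return mean(actual / estimated for estimated, actual in history)


def adjusted_estimate(raw_weeks, history):
    """Scale a raw estimate by the historical fantasy factor."""
    return raw_weeks * fantasy_factor(history)


print(f"factor {fantasy_factor(history):.2f}, 4-week estimate becomes "
      f"{adjusted_estimate(4, history):.1f} weeks")  # factor 1.47, ... 5.9 weeks
```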
Estimation
1, 2, 3, or 4 Variables + Many Modifiers:
Time, Size, Resources
If it’s not variable, then it’s fixed.
Time vs. Resources
Test Estimation Techniques ― Examples
• Intuition and guess
• Work breakdown structures
• Three‐point estimates
• Company standards and norms
• % of project effort or staffing
• Industry averages and predictive models (e.g., FP, TPA)
• Team estimation sessions
  – Wideband Delphi
  – Story point sizing
  – Poker estimation
  – T‐shirt sizing
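The list names three‐point estimates without giving a formula. A common variant, assumed in this Python sketch, is the PERT weighting (O + 4M + P) / 6, with (P - O) / 6 as a rough standard deviation. Reporting the result as a range also fits the advice to avoid single point estimates. The function name and the sample figures are hypothetical.

```python
def three_point(optimistic, most_likely, pessimistic):
    """PERT-style three-point estimate: returns (expected, standard deviation)."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most likely <= pessimistic")
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


e, sd = three_point(10, 15, 32)  # test-execution effort in days (hypothetical)
print(f"expected {e:.1f} days, range roughly {e - sd:.1f} to {e + sd:.1f}")
# expected 17.0 days, range roughly 13.3 to 20.7
```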
Karl Wiegers’s Estimation Safety Tips
• A goal is not an estimate
• The estimate you produce should be unrelated to what you think the requester wants to hear
• The correct answer to any request for an estimate is “Let me get back to you on that”
• Avoid giving single point estimates
• Incorporate contingency buffers into estimates
Rick Craig’s Tips for Better Estimates
• Do it!
• Collect metrics
• Remember the “fantasy” factor
• Don’t “pad” your estimates*
• Don’t spend a ton of time
• Estimates don’t have to be perfect
  – Estimates are just estimates
  – They will change constantly as you re‐estimate
  – Remember planning risks and contingencies
  – Remember Brooks’s Law
• If the date is fixed, estimate something else
• Use tools
• Use ranges of value instead of discrete numbers
Recommended