
[Paul Holland] Bad Metrics and What You Can Do About It


DESCRIPTION

Software testing metrics are used extensively by many organizations to determine the status of their projects and whether or not their products are ready to ship. Unfortunately, most, if not all, of the metrics in use are so flawed that they are not merely useless but possibly dangerous—misleading decision makers, inadvertently encouraging unwanted behavior, or providing overly simplistic summaries out of context. Paul Holland identifies four characteristics that will enable you to recognize the bad metrics in your organization. Although the majority of metrics used today are “bad,” all is not lost: Paul shares the collection of information he has developed that is more effective. Learn how to create a status report that provides the details upper management seeks while avoiding the problems that bad metrics cause.


Page 1: [Paul Holland] Bad Metrics and What You Can Do About It

Bad Metrics and What You Can Do About It

Paul Holland, Managing Director of Testing Practice at Doran Jones, Inc.

Page 2

My Background

• Independent S/W testing consultant since April 2012

• 16+ years testing telecommunications equipment and reworking test methodologies at Alcatel-Lucent

• 10+ years as a test manager

• Presenter at STAREast, STARWest, Let’s Test, EuroSTAR, and CAST

• Keynote at KWSQA conference in 2012

• Facilitator at 35+ peer conferences and workshops

• Teacher of S/W testing for the past 5 years

• Teacher of Rapid Software Testing
– through Satisfice (James Bach): www.satisfice.com

• Military helicopter pilot – Canadian Sea Kings

April, 2013 ©2013 Testing Thoughts

Page 3

Attributions

• Over the past 10 years I have spoken with many people regarding metrics. I cannot directly attribute any specific aspects of this talk to any individual but all of these people (and more) have influenced my opinions and thoughts on metrics:

– Cem Kaner, James Bach, Michael Bolton, Ross Collard, Doug Hoffman, Scott Barber, John Hazel, Eric Proegler, Dan Downing, Greg McNelly, Ben Yaroch

Page 4

Definitions of METRIC (from http://www.merriam-webster.com, April 2012)

• 1 plural : a part of prosody that deals with metrical structure

• 2 : a standard of measurement <no metric exists that can be applied directly to happiness — Scientific Monthly>

• 3 : a mathematical function that associates a real nonnegative number analogous to distance with each pair of elements in a set such that the number is zero only if the two elements are identical, the number is the same regardless of the order in which the two elements are taken, and the number associated with one pair of elements plus that associated with one member of the pair and a third element is equal to or greater than the number associated with the other member of the pair and the third element
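The third definition is the mathematician’s distance function. As an illustrative sketch (function and variable names are invented, not from the talk), its three axioms can be checked directly in Python:

```python
# Hypothetical sketch: the three axioms from definition 3, verified for a
# candidate distance function over a finite set of elements.
def satisfies_metric_axioms(d, elements):
    """Return True if d behaves like a metric on the given elements."""
    for x in elements:
        for y in elements:
            # Axiom 1: the number is zero only if the two elements are identical.
            if (d(x, y) == 0) != (x == y):
                return False
            # Axiom 2: the same number regardless of the order of the pair.
            if d(x, y) != d(y, x):
                return False
            for z in elements:
                # Axiom 3 (triangle inequality): d(x, z) <= d(x, y) + d(y, z).
                if d(x, z) > d(x, y) + d(y, z):
                    return False
    return True

# Absolute difference is a metric on numbers; squared difference is not
# (it violates the triangle inequality).
print(satisfies_metric_axioms(lambda a, b: abs(a - b), [0, 1, 5]))       # True
print(satisfies_metric_axioms(lambda a, b: (a - b) ** 2, [0, 1, 2]))     # False
```

The point of quoting this definition in the talk is the contrast: almost nothing called a “metric” in software testing satisfies anything this rigorous.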

Page 5

Sample Metrics

• Number of Test Cases Planned (per release or feature)

• Number of Test Cases Executed vs. Plan

• Number of Bugs Found per Tester

• Number of Bugs Found per Feature

• Number of Bugs Found in the Field

• Number of Open Bugs

• Lab Equipment Usage

Page 6

Sample Metrics

• Hours between crashes in the Field

• Percentage Behind Plan

• Percentage of Automated Test Cases

• Percentage of Tests Passed vs. Failed (pass rate)

• Number of Test Steps

• Code Coverage / Path Coverage

• Requirements Coverage

Page 7

Goodhart’s Law

• In 1975, Charles Goodhart, a former advisor to the Bank of England and Emeritus Professor at the London School of Economics, stated:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Goodhart, C.A.E. (1975a) ‘Problems of Monetary Management: The UK Experience’ in Papers in Monetary Economics, Volume I, Reserve Bank of Australia, 1975

Page 8

Goodhart’s Law

• Professor Marilyn Strathern FBA has restated Goodhart’s Law more succinctly and more generally:

“When a measure becomes a target, it ceases to be a good measure.”

Page 9

Elements of Bad Metrics

1. Measure and/or compare elements that are inconsistent in size or composition

– Impossible to use effectively for comparison

– How many containers do you need for your possessions?

– Test Cases and Test Steps
• Greatly vary in time required and complexity

– Bugs
• Can differ in severity and likelihood, i.e., risk

Page 10

Elements of Bad Metrics

2. Create competition between individuals and/or teams

– They typically do not result in friendly competition

– Inhibits sharing of information and teamwork

– Especially damaging if compensation is impacted

– Number of xxxx per tester

– Number of xxxx per feature

Page 11

Elements of Bad Metrics

3. Easy to “game” or circumvent the desired intention

– Easy to improve through undesirable behaviour

– Pass rate (percentage): execute more simple tests that will pass, or break up a long test case into many smaller ones

– Number of bugs raised: raising two similar bug reports instead of combining them
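As a hedged illustration with invented numbers, the pass-rate gaming above is simple arithmetic: splitting each passing test into many trivially reported test cases inflates the percentage without changing what was actually tested.

```python
# Illustrative numbers only: how test-case splitting "games" a pass rate.
def pass_rate(passed, total):
    """Pass rate as a percentage of executed test cases."""
    return 100.0 * passed / total

# Honest view: 10 substantial tests, 4 of them fail.
print(pass_rate(6, 10))              # 60.0

# Gamed view: same coverage, but each passing test is split into ten
# trivial steps reported as separate test cases; the 4 failures are unchanged.
print(pass_rate(6 * 10, 6 * 10 + 4)) # 93.75
```

Nothing about the product improved between the two numbers; only the reporting changed.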

Page 12

Elements of Bad Metrics

4. Contain misleading information or give a false sense of completeness

– Summarizing a large amount of information into one or two numbers out of context

– Coverage (Code, Path)
• Misleading information based on touching the code only once

– Pass rate and number of test cases

Page 13

Impact of Using Bad Metrics

• Gives executives a false sense of test coverage

– All they see is numbers out of context

– The larger the numbers, the better the testing

– The difficulty of good testing is hidden by large “fake” numbers

• Dangerous messages to executives

– Our pass rate is at 96%, so our product is in good shape

– Code coverage is at 100% – our code is completely tested

– Feature specification coverage is at 100% – ship it!!!

• What could possibly go wrong?

Page 14

Sample Metrics

• Number of Test Cases Planned (per release or feature)

• Number of Test Cases Executed vs. Plan

• Number of Bugs Found per Tester

• Number of Bugs Found per Feature

• Number of Bugs Found in the Field – A list of Bugs

• Number of Open Bugs – A list of Open Bugs

• Lab Equipment Usage

Page 15

Sample Metrics

• Hours between crashes in the Field

• Percentage Behind Plan – depends on whether the plan is flexible

• Percentage of Automated Test Cases

• Percentage of Tests Passed vs. Failed (pass rate)

• Number of Test Steps

• Code Coverage / Path Coverage – depends on usage

• Requirements Coverage – depends on usage

Page 16

So … Now what?

• I have to stop counting everything. I feel naked and exposed.

• Track expected effort instead of test cases, using:

– Whiteboard

– Excel spreadsheet

Page 17

Whiteboard

• Used for planning and tracking of test execution

• Suitable for use in waterfall or agile (as long as you have control over your own team’s process)

• Use colours to track:

– Features, or

– Main Areas, or

– Test styles (performance, robustness, system)

Page 18

Whiteboard

• Divide the board into four areas:

– Work to be done

– Work in Progress

– Cancelled or work not being done

– Completed work

• Red stickies indicate issues (not just bugs)

• Create a sticky note for each half day of work (or mark # of half days expected on the sticky note)

• Prioritize stickies daily (or at least twice/wk)

• Finish “on time” with low-priority work incomplete

Page 19

Whiteboard Example

[Whiteboard photo: end of week 1 of 7]

Page 20

Whiteboard Example

[Whiteboard photo: end of week 6 of 7]

Page 21

Reporting

• An Excel spreadsheet with:

– List of Charters
– Area
– Estimated Effort
– Expended Effort
– Remaining Effort
– Tester(s)
– Start Date
– Completed Date
– Issues
– Comments

• Does NOT include pass/fail percentage or number of test cases
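As a minimal sketch of how such a spreadsheet can be rolled up, assuming hypothetical field names, here is the effort summary for a few rows taken from the sample report (effort in person half-days):

```python
# Hypothetical row structure patterned on the charter spreadsheet; the three
# rows below are real values from the sample report slide.
charters = [
    {"charter": "INP vs. SHINE", "area": "ARQ", "estimated": 6, "expended": 6, "remaining": 0},
    {"charter": "INP vs. REIN", "area": "ARQ", "estimated": 6, "expended": 7, "remaining": 5},
    {"charter": "Attainable Throughput", "area": "ARQ", "estimated": 1, "expended": 4, "remaining": 0},
]

def area_summary(rows, area):
    """Total the effort columns (person half-days) for one feature area."""
    sel = [r for r in rows if r["area"] == area]
    return {
        "estimated": sum(r["estimated"] for r in sel),
        "expended": sum(r["expended"] for r in sel),
        "remaining": sum(r["remaining"] for r in sel),
    }

print(area_summary(charters, "ARQ"))
# {'estimated': 13, 'expended': 17, 'remaining': 5}
```

Note that nothing here counts test cases or computes a pass rate; the roll-up stays in units of effort, which is what the weekly progress chart reports.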

Page 22

Sample Report

Charter | Area | Estimated Effort | Expended Effort | Remaining Effort | Tester | Date Started | Date Completed | Issues Found | Comments
Investigation for high QLN spikes on EVLT | H/W Performance | 0 | 20 | 0 | acode | 12/10/2011 | 01/14/2012 | ALU01617032 | Lots of investigation. Problem was on 2-3 out of 48 ports, which just happened to be 2 of the 6 ports I tested.
ARQ Verification under different RA Modes | ARQ | 2 | 2 | 0 | ncowan | 12/14/2011 | 12/15/2011 | |
POTS interference | ARQ | 2 | 0 | 0 | --- | 01/08/2012 | 01/08/2012 | | Decided not to test as the H/W team already tested this functionality and time was tight.
Expected throughput testing | ARQ | 5 | 5 | 0 | acode | 01/10/2012 | 01/14/2012 | |
INP vs. SHINE | ARQ | 6 | 6 | 0 | ncowan | 12/01/2011 | 12/04/2011 | |
INP vs. REIN | ARQ | 6 | 7 | 5 | jbright | 01/06/2012 | 01/10/2012 | | To translate the files properly, had to install Python solution from Antwerp. Some overhead to begin testing (installation, config test) but was fairly quick to execute afterwards.
INP vs. REIN + SHINE | ARQ | 12 | 12 | | | | | |
Traffic delay and jitter from RTX | ARQ | 2 | 2 | 0 | ncowan | 12/05/2011 | 12/05/2011 | |
Attainable Throughput | ARQ | 1 | 4 | 0 | jbright | 01/05/2012 | 01/08/2012 | | Took longer because it was not behaving as expected and I had to make sure I was testing correctly. My expectations were wrong based on virtual noise not being exact.

Page 23

Weekly Report

• A PowerPoint slide indicating the important issues (not a count but a list):

– “Show stopping” bugs

– New bugs found since last report

– Important issues with testing (blocking bugs, equipment issues, people issues, etc.)

– Risks (updates and newly discovered)

– Tester concerns (if different from above)

– The slide on the next page indicating progress

Page 24

Sample Report

[Bar chart: “Awesome Product” Test Progress as of 02/01/2012. Series: Original Planned Effort, Expended Effort, and Total Expected Effort, in person half-days, per feature: ARQ, SRA, Vectoring, Regression, H/W Performance]

Direction of lines indicates effort trend since last report

Solid centre bar = finished; Green: no concerns; Yellow: some concerns; Red: major concerns
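Where no charting tool is handy, the same planned-vs-expended comparison can be approximated in text. This is a rough sketch only: the feature names come from the slide, but the effort numbers are invented placeholders, not the talk’s real data.

```python
# Text rendering of a per-feature progress chart. Each row draws a '#' bar
# for expended effort and a '|' marker at the originally planned effort.
# Feature names are from the slide; the numbers are invented examples.
features = [
    ("ARQ", 36, 38),             # (name, planned, expended) in person half-days
    ("SRA", 20, 12),
    ("Vectoring", 30, 25),
    ("Regression", 15, 15),
    ("H/W Performance", 10, 20),
]

def progress_line(name, planned, expended, scale=2):
    """One chart row; scale = half-days per '#' character."""
    bar = "#" * (expended // scale)
    marker = " " * max(0, planned // scale - len(bar)) + "|"
    return f"{name:16s} {bar}{marker} {expended}/{planned} half-days"

for name, planned, expended in features:
    print(progress_line(name, planned, expended))
```

A row whose bar has passed its "|" marker (like ARQ or H/W Performance above) is over its original plan, which is exactly the trend the weekly slide is meant to surface.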
