Paper: Predicting Post-release Defects Using Pre-release Field Testing Results
Authors: Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha and Dave Dietz
Session: Research Track Session 9: Reliability and Quality (ICSM 2011)
PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS
Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz
FIELD TESTING CYCLE
Field testing is important to improve the quality of an application before release.
MEAN TIME BETWEEN FAILURES (MTBF)
Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application. Applications with a low MTBF are undesirable, since they would have a higher number of defects.
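As a minimal sketch (the helper function and the session numbers below are illustrative, not from the paper), MTBF can be computed as the total active usage time divided by the number of failures observed in that time:

```python
# Illustrative sketch of MTBF: total active usage time divided by the
# number of failures observed during that time. Numbers are made up.

def mtbf(failure_times, total_usage_hours):
    """Mean Time Between Failures over one field-testing period.
    Only the count of failures matters here, not when they occurred."""
    if not failure_times:
        return float("inf")  # no failures observed in the period
    return total_usage_hours / len(failure_times)

# Example: 3 failures observed over 120 hours of active use.
print(mtbf([10.0, 55.0, 90.0], 120.0))  # 40.0 hours
```

A lower value means failures arrive more often, which is why a low MTBF is read as a sign of more defects.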
AVERAGE USAGE TIME
• AVT is the average time that a user actively uses the application.
• The AVT can be longer than the period of field testing.
A longer AVT indicates that the application is reliable and that users tend to use it longer.
PROBLEM STATEMENT
• MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.
[Chart: failure occurrences over the field-testing period for two versions, A and B.]
The reliability of A and B is very different.
METRICS
• We propose three metrics that capture additional patterns of failure occurrences:
• TTFF: the average length of usage time before the occurrence of the first failure,
• FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and
• OFR: the overall failure ratio, which captures daily rates of failures.
AVERAGE TIME TO FIRST FAILURE (TTFF)
[Chart: % of users reporting failures on each of 14 days of field testing, for Version A and Version B.]
TTFF produces high scores for applications where the majority of users experience the first failure late.
TTFF_A = 6.11, TTFF_B = 3.56
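Following the definition above — the average length of usage time before each user's first failure — TTFF can be sketched as a plain average; the per-user days below are hypothetical, chosen only to mirror the Version A/B contrast, and the paper's exact formula may weight users differently:

```python
# Hedged sketch of TTFF: average the day of first failure over the users
# who reported at least one failure. The per-user values are hypothetical.

def ttff(first_failure_days):
    """Average usage time (in days) before a user's first failure."""
    if not first_failure_days:
        return float("inf")  # no user failed during testing
    return sum(first_failure_days) / len(first_failure_days)

version_a = [4, 6, 5, 8, 7]  # most users fail late  -> higher TTFF
version_b = [1, 3, 2, 5, 4]  # most users fail early -> lower TTFF
print(ttff(version_a))  # 6.0
print(ttff(version_b))  # 3.0
```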
FAILURE ACCUMULATION RATING (FAR)
[Chart: % of users reporting versus number of unique failures, for Version A and Version B.]
The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
FAR_A = 6.97, FAR_B = 4.97
OVERALL FAILURE RATING (OFR)
[Chart: % of users reporting failures on each of 14 days of field testing, for Version A and Version B.]
The OFR metric produces high scores for applications with fewer users reporting failures overall.
OFR_A = 0.93, OFR_B = 0.78
CASE STUDY
We analyze 18 versions of an enterprise software application.
• Overall, 2,546 users were involved in the field testing.
• The testing period lasted 30 days.
SPEARMAN CORRELATION OF THE METRICS
       TTFF    FAR     OFR     AVT     MTBF
TTFF    1      0.09   -0.08   -0.31   -0.08
FAR     0.09   1       0.07    0.33   -0.24
OFR    -0.08   0.07    1       0.39   -0.54
AVT    -0.31   0.33    0.39    1      -0.3
MTBF   -0.08  -0.24   -0.54   -0.3     1
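The coefficients above are pairwise Spearman rank correlations. A self-contained sketch of how each one is computed — rank each vector (average ranks for ties), then take the Pearson correlation of the ranks — using illustrative per-version values rather than the paper's data:

```python
from math import sqrt

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical OFR and MTBF values for six versions; the strong negative
# correlation echoes the -0.54 reported between OFR and MTBF above.
ofr_vals  = [0.93, 0.78, 0.85, 0.60, 0.71, 0.88]
mtbf_vals = [12.0, 30.0, 40.0, 45.0, 33.0, 15.0]
print(round(spearman(ofr_vals, mtbf_vals), 2))  # -0.83
```

In practice this is what `scipy.stats.spearmanr` computes; the pure-Python version is shown only to make the ranking step explicit.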
INDEPENDENCY AMONG PROPOSED METRICS
[Chart: loadings of TTFF, FAR, OFR, and MTBF on principal components PC1 through PC4.]
PREDICTIVE POWER FOR POST-RELEASE DEFECTS
[Chart: marginal R-square of TTFF, FAR, OFR, AVT, and MTBF when predicting post-release defects over 6-month, 1-year, and 2-year periods.]
PRECISION OF PREDICTIONS WITH ALL FIVE METRICS
[Chart: precision (%) versus number of testing days (5 to 30), for 6-month, 1-year, and 2-year prediction periods.]
CONCLUSION
• TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.
• They provide faster predictions of the number of post-release defects with good precision within just 5 days of a pre-release testing period.
• It takes MTBF up to 25 days to predict the number of post-release defects.