Paper: Predicting Post-release Defects Using Pre-release Field Testing Results
Authors: Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha and Dave Dietz
Session: Research Track Session 9: Reliability and Quality (ICSM 2011)
PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS
Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz
FIELD TESTING CYCLE
Field testing is important to improve the quality of an application before release.
MEAN TIME BETWEEN FAILURES (MTBF)
Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application. Applications with a low MTBF are undesirable, since they would have a higher number of defects.
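As a minimal sketch (the helper function and the session numbers below are illustrative, not from the paper), MTBF can be computed as the total active usage time divided by the number of failures observed in that time:

```python
# Illustrative sketch of MTBF: total active usage time divided by the
# number of failures observed during that time. Numbers are made up.

def mtbf(failure_times, total_usage_hours):
    """Mean Time Between Failures over one field-testing period.
    Only the count of failures matters here, not when they occurred."""
    if not failure_times:
        return float("inf")  # no failures observed in the period
    return total_usage_hours / len(failure_times)

# Example: 3 failures observed over 120 hours of active use.
print(mtbf([10.0, 55.0, 90.0], 120.0))  # 40.0 hours
```

A lower value means failures arrive more often, which is why a low MTBF is read as a sign of more defects.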
AVERAGE USAGE TIME
• AVT is the average time that a user actively uses the application.
• The AVT can be longer than the period of field testing.
A longer AVT indicates that the application is reliable and that users tend to use it longer.
PROBLEM STATEMENT
• MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.
[Chart: failure occurrences over the field-testing period for two versions, A and B.]
The reliability of A and B is very different.
METRICS
• We propose three metrics that capture additional patterns of failure occurrences:
• TTFF: the average length of usage time before the occurrence of the first failure,
• FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and
• OFR: the overall failure ratio, which captures daily rates of failures.
AVERAGE TIME TO FIRST FAILURE (TTFF)
[Chart: % of users reporting failures on each of 14 days of field testing, for Version A and Version B.]
TTFF produces high scores for applications where the majority of users experience the first failure late.
TTFF_A = 6.11, TTFF_B = 3.56
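Following the definition above — the average length of usage time before each user's first failure — TTFF can be sketched as a plain average; the per-user days below are hypothetical, chosen only to mirror the Version A/B contrast, and the paper's exact formula may weight users differently:

```python
# Hedged sketch of TTFF: average the day of first failure over the users
# who reported at least one failure. The per-user values are hypothetical.

def ttff(first_failure_days):
    """Average usage time (in days) before a user's first failure."""
    if not first_failure_days:
        return float("inf")  # no user failed during testing
    return sum(first_failure_days) / len(first_failure_days)

version_a = [4, 6, 5, 8, 7]  # most users fail late  -> higher TTFF
version_b = [1, 3, 2, 5, 4]  # most users fail early -> lower TTFF
print(ttff(version_a))  # 6.0
print(ttff(version_b))  # 3.0
```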
FAILURE ACCUMULATION RATING (FAR)
[Chart: % of users reporting versus number of unique failures, for Version A and Version B.]
The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
FAR_A = 6.97, FAR_B = 4.97
OVERALL FAILURE RATING (OFR)
[Chart: % of users reporting failures on each of 14 days of field testing, for Version A and Version B.]
The OFR metric produces high scores for applications with fewer users reporting failures overall.
OFR_A = 0.93, OFR_B = 0.78
CASE STUDY
We analyze 18 versions of an enterprise software application.
• Overall, 2,546 users were involved in the field testing.
• The testing period lasted 30 days.
SPEARMAN CORRELATION OF THE METRICS
       TTFF    FAR     OFR     AVT     MTBF
TTFF    1      0.09   -0.08   -0.31   -0.08
FAR     0.09   1       0.07    0.33   -0.24
OFR    -0.08   0.07    1       0.39   -0.54
AVT    -0.31   0.33    0.39    1      -0.3
MTBF   -0.08  -0.24   -0.54   -0.3     1
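The coefficients above are pairwise Spearman rank correlations. A self-contained sketch of how each one is computed — rank each vector (average ranks for ties), then take the Pearson correlation of the ranks — using illustrative per-version values rather than the paper's data:

```python
from math import sqrt

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical OFR and MTBF values for six versions; the strong negative
# correlation echoes the -0.54 reported between OFR and MTBF above.
ofr_vals  = [0.93, 0.78, 0.85, 0.60, 0.71, 0.88]
mtbf_vals = [12.0, 30.0, 40.0, 45.0, 33.0, 15.0]
print(round(spearman(ofr_vals, mtbf_vals), 2))  # -0.83
```

In practice this is what `scipy.stats.spearmanr` computes; the pure-Python version is shown only to make the ranking step explicit.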
INDEPENDENCY AMONG PROPOSED METRICS
[Chart: loadings of TTFF, FAR, OFR, and MTBF on principal components PC1 through PC4.]
PREDICTIVE POWER FOR POST-RELEASE DEFECTS
[Chart: marginal R-square of TTFF, FAR, OFR, AVT, and MTBF when predicting post-release defects over 6-month, 1-year, and 2-year periods.]
PRECISION OF PREDICTIONS WITH ALL FIVE METRICS
[Chart: precision (%) versus number of testing days (5 to 30), for 6-month, 1-year, and 2-year prediction periods.]
CONCLUSION
• TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.
• They provide faster predictions of the number of post-release defects with good precision within just 5 days of a pre-release testing period.
• It takes MTBF up to 25 days to predict the number of post-release defects.