Research Heaven, West Virginia
A Framework for Early Reliability Assessment
Bojan Cukic, Erdogan Gunel, Harshinder Singh,
Lan Guo, Dejan Desovski
West Virginia University
Carol Smidts, Ming Li
University of Maryland
(WVU UI: Integrating Formal Methods and Testing in a Quantitative Software Reliability Assessment Framework 2003)
Overview
• Introduction and Motivation
• Software Reliability Corroboration Approach
• Case Studies
• Applying Dempster-Shafer Inference to NASA Datasets
• Summary and Further Work
Introduction
• Quantification of the effects of V&V activities is always desirable.
• Is software reliability quantification practical for safety/mission-critical systems?
  – Time and cost considerations may limit its appeal.
• Reliability growth models apply only to integration testing, the tail end of V&V.
• Estimation of operational usage profiles is rare.
Is SRE Impractical for NASA IV&V?
• Most IV&V techniques are qualitative in nature.
  – Mature software reliability estimation methods are based exclusively on testing.
• Can IV&V techniques be utilized for reliability assessment?
  – Requirements readings, inspections, problem reports and tracking, unit-level tests…
[Figure: development lifecycle timeline (Requirements, Design, Code; Unit, Integration, and Acceptance Test as Verification & Validation) contrasting life-cycle-long IV&V with traditional software reliability assessment techniques, which cover only the tail-end test phases.]
Contribution
• Develop software reliability assessment methods that build on:
  – Stable and mature development environments.
  – Life-cycle-long IV&V activities.
  – All relevant available information:
    • Static (SIAT), dynamic, requirements problems, severities.
  – Qualitative (formal and informal) IV&V methods.
• Strengthen the case for IV&V across the NASA enterprise:
  – Accurate, stable reliability measurement and tracking.
  – Available throughout the development lifecycle.
Assessment vs. Corroboration
• Current thinking
  – Software reliability is "tested into" the product through integration and acceptance testing.
• Our thinking
  – Why "waste" the results of all the qualitative IV&V activities?
  – Testing should corroborate that the life-cycle-long IV&V techniques are giving the "usual" results, i.e., that the project follows the usual quality patterns.
Approach
[Figure: the assessment pipeline, spanning the software development lifecycle. Software quality measures (SQM1 … SQMj) feed reliability prediction systems (RPS1, RPS2, … RPSk, … RPSm); the RPS outputs are combined (by experience, learning, Dempster-Shafer, …); the combined prediction supplies the null hypothesis H0 (against alternative Ha) for BHT software reliability corroboration testing, yielding a trustworthy software reliability measure.]
Software Quality Measures (roots)
• The following measures were used in the experiments:
  – Lines of code
  – Defect density
    • Number of defects that remain unresolved after testing, divided by the LOC.
  – Test coverage
    • LOC_tested / LOC_total.
  – Requirements traceability
    • RT = #requirements_implemented / #original_requirements.
  – Function points
  – …
• In principle, any available measure could/should be taken into account, by defining an appropriate Reliability Prediction System (RPS).
Reliability Prediction Systems
• An RPS is a complete set of measures from which software reliability can be predicted.
• The bridge between an RPS and software reliability is a MODEL.
• Therefore, select (and collect) those measures that have the highest relevance to reliability.
  – Relevance to reliability is ranked from expert opinions [Smidts 2002].
RPS for Test Coverage
RPS model:

  Test coverage:       C1 = LOC_T / (LOC_I + k · FP_M)
  Defect coverage:     C0 = a0 · ln(1 + a1 · (e^(a2 · C1) − 1))
  Remaining defects:   N = N0 · (1 − C0) / C0
  Failure probability: p_s = 1 − e^(−K · N · τ / T_L)

Root measure: test coverage.
Support measures: implemented LOC (LOC_I), tested LOC (LOC_T), the number of defects found by test (N0), missing function points (FP_M), backfiring coefficient (k), defects found by test (D_T), linear execution time (T_L), execution time per demand (τ), fault exposure ratio (K).

Notation: C0 = defect coverage; C1 = test coverage (statement coverage); a0, a1, a2 = coefficients; N0 = number of defects found by test; N = number of defects remaining; K = fault exposure ratio; T_L = linear execution time; τ = average execution time per demand.
Approach
[Figure: the approach pipeline repeated (SQMs, RPSs, RPS combination, BHT corroboration), here highlighting the combined output as the software reliability measure.]
Reliability “worthiness” of different RPS
  Measure/RPS                                 Relevance to reliability
  Failure rate                                0.98
  Test coverage                               0.83
  Mutation score                              0.75
  Fault density                               0.73
  Requirement specification change request    0.64
  Class coupling                              0.45
  No. of class methods                        0.45
  Man-hours per defect detected               0.45
  Function point analysis                     0.00

(32 measures ranked by five experts)
Combining RPS
• Weighted sums were used in initial experiments.
  – RPS results are weighted by the expert-opinion index.
  – Inherent dependencies/correlations are removed.
• A Dempster-Shafer (D-S) belief-network approach has been developed.
  – The network is automatically built from datasets by the Induction Algorithm.
• Do suitable NASA datasets exist?
  – Pursuing leads with several CMM level 5 companies.
Approach
[Figure: the approach pipeline repeated (SQMs, RPSs, RPS combination, BHT corroboration), here highlighting the combined RPS output as the software reliability prediction entering BHT corroboration.]
Bayesian Inference
• Allows for the inclusion of imprecise (subjective) probability of failure.
• The subjective estimate reflects beliefs.
• The hypothesis on the event occurrence probability is combined with new evidence, which may change the degree of belief.
Bayesian Hypothesis Testing (BHT)
• The hypothesized reliability H0 comes as a result of the RPS combination.
• Based on the level of (in)experience, a degree of belief P(H0) is assigned.
• Corroboration testing then looks for evidence in favor of the hypothesized reliability:
  – H0 : λ ≤ λ0   (null hypothesis)
  – H1 : λ > λ0   (alternative hypothesis)
The number of corroboration tests according to BHT theory:

  λ0       P(H0)   n0     n1     n2
  0.01     0.01    457    476    497
  0.001    0.01    2378   2671   2975
  0.0001   0.01    6831   10648  14501
  0.01     0.1     228    258    289
  0.001    0.1     636    1017   1402
  0.0001   0.1     853    3157   6150
  0.01     0.4     90     128    167
  0.001    0.4     138    411    739
  0.0001   0.4     146    1251   3260
  0.01     0.6     50     87     126
  0.001    0.6     63     269    552
  0.0001   0.6     65     827    2458
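The flavor of such a table can be reproduced with a deliberately simplified two-point model: assume the per-demand failure probability is λ0 under H0 and, by assumption, 2·λ0 under H1, and count the failure-free demands needed for the posterior belief in H0 to reach a target. This is an illustrative sketch, not the exact BHT formulation behind the columns n0, n1, n2:

```python
import math

# Illustrative Bayesian corroboration-test sizing under a simplified
# two-point model (NOT the exact BHT formulation behind the table):
# failure probability is lambda0 under H0 and lambda1_factor*lambda0
# under H1; each failure-free demand multiplies the odds for H0 by
# (1 - lambda0) / (1 - lambda1).

def corroboration_tests(lambda0, prior_h0, posterior_target=0.99,
                        lambda1_factor=2.0):
    lambda1 = lambda1_factor * lambda0
    prior_odds = prior_h0 / (1.0 - prior_h0)
    target_odds = posterior_target / (1.0 - posterior_target)
    per_test = (1.0 - lambda0) / (1.0 - lambda1)
    # Smallest n with prior_odds * per_test**n >= target_odds.
    return math.ceil(math.log(target_odds / prior_odds) / math.log(per_test))

n_half = corroboration_tests(0.01, prior_h0=0.5)   # a few hundred tests
n_low = corroboration_tests(0.01, prior_h0=0.01)   # weaker prior: more tests
```

As in the table, a stronger prior belief P(H0) sharply reduces the number of corroboration tests required.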
Controlled Experiments
• Two independently developed versions of PACS (a smart-card-based access control system).
  – Controlled requirements document (NSA specs).
[Figure: UML class diagram of PACS. The PACS controller class (fields val : Validator, driver : Comm, userLCD : Display, officerLCD : Display, audit : Auditor, name : char[21], ssn : char[10], pin : char[5]; methods run(), init(), failure(), seeOfficer(), openDoor(), readCard(), readPIN()) with supporting classes Validator (getPIN()), Display (write(), clear()), Comm (readRegister(), writeRegister(), readCardData(), waitForOne(), waitForZero()), Auditor (writeEntry()), and an Exception hierarchy with IOException and TimeOutException.]
RPS Experimentation
RPS predictions of system failure rates:

  Measure                           Relevance to reliability   Predicted failure rate
  Code defect density               0.85                       0.078
  Test coverage                     0.83                       0.092
  Requirements traceability         0.45                       0.078
  Function point analysis           0.00                       0.0020
  Bugs per line of code (Gaffney)   0.00                       0.000028

  Predicted failure rate (combined): 0.084
  Actual failure rate:               0.09
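A simple relevance-weighted average of the individual predictions is one plausible reading of the combination step (the study's scheme, which also removes inter-measure correlations, may differ), and it lands close to the reported 0.084:

```python
# Relevance-weighted average of the individual RPS predictions from the
# table above; an illustrative reading of the combination step, not
# necessarily the study's exact scheme.

predictions = {                        # measure: (relevance, predicted rate)
    "code defect density":             (0.85, 0.078),
    "test coverage":                   (0.83, 0.092),
    "requirements traceability":       (0.45, 0.078),
    "function point analysis":         (0.00, 0.0020),
    "bugs per line of code (Gaffney)": (0.00, 0.000028),
}

combined = (sum(w * p for w, p in predictions.values())
            / sum(w for w, _ in predictions.values()))
print(round(combined, 3))  # 0.083, close to the reported 0.084
```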
Reliability Corroboration
• The predictors appear accurate, but:
  – Low levels of trust in the prediction accuracy.
  – No experience with repeatability at this point in time.

  λ0     P(H0)   # corroboration tests
  0.09   0.01    72
  0.09   0.1     47
  0.09   0.2     39
  0.09   0.5     25
  0.09   0.6     20
“Research Side Products”
• A significant amount of time was spent studying and developing Dempster-Shafer inference networks.
• "No hope" of demonstrating this work within the scope of integrating RPS results.
  – Availability of suitable datasets.
• But some datasets are available, so use them for a D-S demo!
  – Predicting fault-prone modules in two NASA projects (KC2, JM1).
  – KC2 contains over 3,000 modules; 520 modules are of research interest:
    • 106 modules have errors, ranging from 1 to 13.
    • 414 modules are error-free.
  – JM1 contains 10,883 modules:
    • 2,105 modules have errors, ranging from 1 to 26.
    • 8,778 modules are error-free.
  – Each dataset contains 21 software metrics, mainly McCabe and Halstead.
How D-S Networks Work
• Distinct sources of evidence are combined by the D-S scheme.
• D-S networks are built by prediction logic:
  – Nodes are connected by implication rules.
  – Each implication rule is assigned a specific weight.
• Belief is updated for the corresponding nodes and propagated to the neighboring nodes, and throughout the entire network.
• A D-S network can be tuned for a wide range of verification requirements.
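The evidence-combination step can be sketched with Dempster's rule over a two-element frame {fault-prone, not fault-prone}. The mass functions below are invented for illustration, not taken from the study's networks:

```python
from itertools import product

# Dempster's rule of combination over the frame {fault_prone, not_fault_prone};
# the mass assignments below are invented for illustration.

FP, OK = "fault_prone", "not_fault_prone"
THETA = frozenset({FP, OK})  # the full frame, i.e., total ignorance

def combine(m1, m2):
    """Normalized sum of mass products over non-empty focal intersections."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb          # mass assigned to contradictions
    k = 1.0 - conflict                   # renormalization constant
    return {s: v / k for s, v in combined.items()}

# Two sources of evidence (e.g., two weighted implication rules firing):
m1 = {frozenset({FP}): 0.6, THETA: 0.4}
m2 = {frozenset({FP}): 0.5, frozenset({OK}): 0.2, THETA: 0.3}
fused = combine(m1, m2)
print(fused[frozenset({FP})])  # ~0.773: the two sources reinforce each other
```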
D-S Networks vs. ROCKY
[Figure: two charts (left: KC2, right: JM1) plotting Percent (%) against Experiment No. 1 to 5 for six series: Effort-DS, Effort-Ro, Acc-DS, Acc-Ro, PD-DS, PD-Ro, comparing the D-S networks with ROCKY.]
D-S Networks vs. See5
[Figure: two charts (left: KC2, right: JM1) plotting Percent (%) for the See5 classifiers (Decision Tree, Rule Set, Boosting) across four series: PD-C5, PD-DS, Acc-C5, Acc-DS.]
D-S Networks vs. WEKA
[Figure: chart for the KC2 dataset plotting Percent (%) across WEKA classifiers for four series: PD-WEKA, PD-DS, Acc-WEKA, Acc-DS.]
D-S Networks vs. WEKA
[Figure: chart for the JM1 dataset plotting Percent (%) across the WEKA classifiers (Logistic, KernelDensity, NaiveBayesSimple, J48, IBK, IB1, VotedPerceptron, VFI, HyperPipes) for four series: PD-WEKA, PD-DS, Acc-WEKA, Acc-DS.]
Status and Perspectives
• Software reliability corroboration allows:
  – Inclusion of IV&V quality measures and activities into the reliability assessment.
  – A significant reduction in the number of (corroboration) tests.
  – Software reliability of safety/mission-critical systems to be assessed with reasonable effort.
• Research directions:
  – Further experimentation (datasets, measures, repeatability).
  – Defining RPS based on the "formality" of the IV&V methods.