A Large-Scale Industrial Case Study on Architecture-based Software Reliability Analysis

Talk from ISSRE 2010

© ABB Group April 9, 2023 | Slide 1

Heiko Koziolek, Bastian Schlich, Carlos Bilich, ABB Corporate Research, 2010-11-01

Architecture-based Software Reliability Analysis (ABSRA)

What?

Typical questions of software architects concerning reliability

“What is the reliability (probability of failure) of my system?”

“How do individual components contribute to the system reliability?”

“Which architectural alternative is best for reliability?”

“Where should I introduce fault-tolerance mechanisms?”

“How should I distribute my limited testing effort among components?”

Additional questions by ABB

“How much more reliable is a new architecture than a former one?”

“Does ABSRA work on large-scale systems?”

Architecture-based Software Reliability Analysis (ABSRA)

How?

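Figure: ABSRA overview. Software components, their control flow, and their reliabilities (e.g., R=0.995, R=0.982, R=0.937) are combined and transformed into a Markov model; solving the Markov model yields the predicted system reliability (R = 0.9923), which is then used to improve the architecture.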

Related work: Existing empirical studies

“… very little effort has been devoted to the validation of architecture-based software reliability techniques.”

[Gokhale2007, IEEE Transactions on Dependable and Secure Computing, Vol. 4, No. 1]

Source                      Name    Year  Lang.  LOC         # Components
[Gokhale2004, Perf. Eval.]  SHARPE  1998  C      35,000      30
[Goseva2001, ISSRE]         ESA     2001  C      10,000      3
[Goseva2005, ISSRE]         GCC     2005  C      350,000     13
[Wang2005, JSS]             SMS     2006  C/C++  13,000      15
[Goseva2006, ISSRE]         IDN     2006  C      11,000      6
Our paper                   ABB     2010  C++    >3,000,000  8 (>100)

System under study: Process control system

System under study: Process control system (topology)

Figure: System topology (Internet, firewall, remote workplaces, plant/office network, network isolation device, workplaces, servers, redundant network, controllers, fieldbus, remote I/O and field devices).

System under study: Process control system (subsystems within the servers)

Which steps are required for ABSRA?

Estimate component failure probabilities

Estimate transition probabilities

Construct the Markov model

Exploit the results

Estimate component failure probabilities: Existing methods

Code metrics [Nagappan2006]

• Validity debated

Reliability growth modeling [IEEE Std 1633-2008]

• Requires component failure reports

Random/statistical testing [Miller1992]

• Does not scale; difficult to apply to components

Fault injection [Gokhale2004]

• Does not determine the current reliability

Explicit failure modeling [Cheung2008]

• Accuracy unknown

Reliability growth modeling: General principle

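Littlewood/Verrall model: the i-th time between failures is exponentially distributed with failure rate λ_i, and λ_i itself follows a Gamma distribution

$$g(\lambda_i;\, \alpha, \psi(i)) = \frac{[\psi(i)]^{\alpha}\, \lambda_i^{\alpha-1}\, \exp\!\left(-\psi(i)\,\lambda_i\right)}{\Gamma(\alpha)}, \qquad \lambda_i > 0$$

where ψ(i) is an increasing function of i (e.g., ψ(i) = β0 + β1·i), so that failure rates tend to decrease as bugs are fixed.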
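To make the model concrete, the following Python sketch evaluates the Gamma density above and the probability that the i-th time between failures exceeds t. Averaging exp(−λ_i·t) over the Gamma density gives the closed form (ψ(i)/(ψ(i)+t))^α. The linear choice ψ(i) = β0 + β1·i, the function names, the time unit (hours), and all parameter values are illustrative assumptions, not values fitted in the study.

```python
import math

def lv_psi(i, beta0, beta1):
    """Linear scale function psi(i) = beta0 + beta1 * i (one common LV variant; assumed here)."""
    return beta0 + beta1 * i

def lv_gamma_density(lam, alpha, psi):
    """Gamma density g(lambda_i; alpha, psi(i)) of the failure rate in interval i."""
    if lam <= 0:
        return 0.0
    return psi ** alpha * lam ** (alpha - 1) * math.exp(-psi * lam) / math.gamma(alpha)

def lv_reliability(t, i, alpha, beta0, beta1):
    """P(T_i > t): probability that the i-th time between failures exceeds t.

    Averaging exp(-lambda_i * t) over the Gamma density above gives the
    closed form (psi(i) / (psi(i) + t)) ** alpha.
    """
    psi = lv_psi(i, beta0, beta1)
    return (psi / (psi + t)) ** alpha

# Hypothetical parameters (not fitted to any ABB data); t is assumed to be in hours.
ALPHA, BETA0, BETA1 = 2.0, 10.0, 5.0
for i in (1, 10, 50):
    r = lv_reliability(24.0, i, ALPHA, BETA0, BETA1)
    print(f"interval {i}: P(no failure within 24 h) = {r:.4f}")
```

Because ψ(i) grows with i, the printed probabilities increase for later intervals, which is how the model expresses reliability growth.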

Reliability growth modeling: Using the Littlewood/Verrall model on one subsystem

Filtered subsystem bug list; release dates

Curve fitting in CASRE 3.0 (http://www.openchannelsoftware.com/projects/CASRE_3.0/)
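Reliability growth models such as Littlewood/Verrall work on times between failures, so a filtered bug list first has to be turned into inter-failure times. A minimal Python sketch, assuming a hypothetical record layout of (report date, subsystem) and using the release date as the start of observation; the actual filtering rules and data format used in the study are not described in the slides.

```python
from datetime import date

# Hypothetical, already-filtered bug reports: (report date, subsystem).
bug_reports = [
    (date(2008, 3, 1), "SubsystemA"),
    (date(2008, 3, 4), "SubsystemA"),
    (date(2008, 3, 15), "SubsystemA"),
    (date(2008, 4, 2), "SubsystemA"),
]
release_date = date(2008, 2, 20)  # hypothetical release date of the analyzed version

def inter_failure_days(reports, subsystem, start):
    """Return the days between consecutive failures of `subsystem`, counted from `start`.

    Reports dated before `start` are ignored.
    """
    days = sorted(d for d, s in reports if s == subsystem and d >= start)
    times, previous = [], start
    for d in days:
        times.append((d - previous).days)
        previous = d
    return times

print(inter_failure_days(bug_reports, "SubsystemA", release_date))
```

The resulting list of inter-failure times is the kind of input a reliability growth tool is fitted to.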

Reliability growth modeling: Result

R1 = ..., R2 = ..., R3 = ..., R4 = ..., R5 = ..., R6 = ..., R7 = ..., R8 = ...

Which steps are required for ABSRA?

Estimate component failure probabilities

Estimate transition probabilities

Construct the Markov model

Exploit the results

Estimate component transition probabilities: Existing methods

Exploiting design documents [Gokhale2007]

• Captures only static dependencies in the SW architecture

Profiling [Goseva2005]

• Complicated filtering of data required

Manual code instrumentation

• Can be time-consuming
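Manual instrumentation means adding probes to the code that log which component is entered. The sketch below shows the idea with a hypothetical Python decorator `component(name)` and made-up components `UI`, `Calc`, and `DB`; the system under study is written in C++, so this only illustrates the principle.

```python
import functools

TRACE = []  # hypothetical in-memory trace of entered components

def component(name):
    """Decorator that appends `name` to TRACE each time the decorated function is entered."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            TRACE.append(name)
            return func(*args, **kwargs)
        return inner
    return wrap

@component("UI")
def handle_request(x):
    return compute(x) + 1

@component("Calc")
def compute(x):
    return store(x * 2)

@component("DB")
def store(v):
    return v

handle_request(3)
print(TRACE)  # ['UI', 'Calc', 'DB']
```

The decorator records only entries; if returns to the calling component should count as transitions, a real probe would also have to log them.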

Estimate component transition probabilities: Profiling with proprietary tools

Set up and ran the system

Example trace from profiling

Self-coded script
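Once a trace is available, transition probabilities can be estimated by counting how often control passes from one component to the next and normalizing per source component. A minimal Python sketch, assuming each trace is a hypothetical list of component names (one per control transfer) terminated by an `END` marker; the output format of the proprietary profiler and the logic of the self-coded script are not given in the slides.

```python
from collections import Counter, defaultdict

def transition_probabilities(traces):
    """Estimate P(i -> j) = count(i -> j) / count(i -> any) from consecutive trace entries.

    Each trace is counted separately so that no spurious transition is created
    between the end of one run and the start of the next.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs

# Hypothetical traces; "END" marks successful termination of a run.
traces = [
    ["UI", "Calc", "DB", "Calc", "UI", "END"],
    ["UI", "Calc", "DB", "Calc", "UI", "Calc", "UI", "END"],
]
for src, row in sorted(transition_probabilities(traces).items()):
    print(src, row)
```

The estimated probability of moving to `END` plays the role of a component's exit probability in the Markov model.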

Which steps are required for ABSRA?

Estimate component failure probabilities

Estimate transition probabilities

Construct the Markov model

Exploit the results

Construct the Markov model: Existing state-based methods

[Littlewood1979]

[Cheung1980]

[Laprie1984]

[Kubat1989]

[Gokhale1998]

[Ledoux1999]

[Gokhale1998-2]

[Goseva-Popstojanova2001]

Cheung model: Add failure and end states, then compute reliability

[Cheung1980]
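The Cheung model turns components into states of a discrete-time Markov chain and adds two absorbing states, a correct end state C and a failure state F. The Python sketch below, with hypothetical function names and example values, computes the probability of reaching C by solving (I − Q)x = b with Q[i][j] = R_i·P(i→j) and b[i] = R_i·P(i→exit). When only the last component n exits, x at the start component equals Cheung's S(1,n)·R_n with S = (I − Q)^−1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def cheung_reliability(rel, p, p_exit, start=0):
    """Probability of reaching the correct end state C from component `start`.

    rel[i]    : probability that component i executes correctly when visited
                (otherwise control moves to the failure state F)
    p[i][j]   : probability that control moves from component i to component j
    p_exit[i] : probability that execution ends after component i (move to C)
    x[i] = rel[i] * (p_exit[i] + sum_j p[i][j] * x[j]), i.e. (I - Q) x = b.
    """
    n = len(rel)
    a = [[(1.0 if i == j else 0.0) - rel[i] * p[i][j] for j in range(n)] for i in range(n)]
    b = [rel[i] * p_exit[i] for i in range(n)]
    return solve_linear(a, b)[start]

# Hypothetical 3-component model (not ABB data); each row of p plus p_exit sums to 1.
rel = [0.999, 0.99, 0.98]
p = [[0.0, 0.7, 0.3],
     [0.2, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
p_exit = [0.0, 0.3, 1.0]
print(f"predicted system reliability: {cheung_reliability(rel, p, p_exit):.4f}")
```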

Which steps are required for ABSRA?

Estimate component failure probabilities

Estimate transition probabilities

Construct the Markov model

Exploit the results

Exploit the results: Possibilities

Estimate system reliability [Cheung1980]

• Hard to validate against the reliability experienced by customers

Conduct sensitivity analysis [Gokhale2002]

• Study system reliability for varying component failure rates

Assess costs of bugs [Cheung1980]

• Quantify the effect of an error in a component

Evaluate design alternatives [Goseva2001]

• Values for new components need to be guessed

Allocate test budgets efficiently [Pietrantuono2010]

• Test critical components more often

Sensitivity Analysis: Impact of varying subsystem failure rates

PRISM probabilistic model checker: http://www.prismmodelchecker.org/
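The sensitivity analysis slide references the PRISM probabilistic model checker. As a stand-in (not the study's tooling), the Python sketch below recomputes a Cheung-style system reliability by fixed-point iteration while one component reliability is varied, and ranks components by a finite-difference slope dR_sys/dR_i, one way to decide where additional testing would matter most. Model values and function names are hypothetical.

```python
def system_reliability(rel, p, p_exit, start=0, tol=1e-13, max_iter=100_000):
    """Cheung-style system reliability via fixed-point iteration x <- b + Q x.

    x[i] = rel[i] * (p_exit[i] + sum_j p[i][j] * x[j]) is the probability of
    reaching the correct end state from component i; the iteration converges
    when an exit is reachable from every component.
    """
    n = len(rel)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [rel[i] * (p_exit[i] + sum(p[i][j] * x[j] for j in range(n))) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new[start]
        x = new
    raise RuntimeError("no convergence; check that every component can reach an exit")

def sensitivity(rel, p, p_exit, component, values):
    """System reliability while one component's reliability is varied and the rest stay fixed."""
    out = []
    for v in values:
        varied = list(rel)
        varied[component] = v
        out.append((v, system_reliability(varied, p, p_exit)))
    return out

def importance_ranking(rel, p, p_exit, h=1e-4):
    """Rank components by the finite-difference slope dR_sys/dR_i (backward step keeps R_i <= 1)."""
    base = system_reliability(rel, p, p_exit)
    slopes = []
    for i in range(len(rel)):
        lowered = list(rel)
        lowered[i] -= h
        slopes.append((i, (base - system_reliability(lowered, p, p_exit)) / h))
    return sorted(slopes, key=lambda item: item[1], reverse=True)

# Hypothetical 3-component model (not ABB data).
rel = [0.999, 0.99, 0.98]
p = [[0.0, 0.7, 0.3], [0.2, 0.0, 0.5], [0.0, 0.0, 0.0]]
p_exit = [0.0, 0.3, 1.0]
for v, r in sensitivity(rel, p, p_exit, component=2, values=[0.90, 0.95, 0.99]):
    print(f"R_2 = {v:.2f} -> R_sys = {r:.4f}")
print("test priority (most influential first):", [i for i, _ in importance_ranking(rel, p, p_exit)])
```

A component whose slope is largest is where a given improvement in reliability raises the system reliability most, which connects the sensitivity analysis to allocating test budgets.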

Evaluation: Cost estimates in person-hours (best/worst case)

Conclusions: Lessons learned

Getting failure and transition probabilities is hard

Time-consuming, error-prone, limited automation

Main obstacle for ABSRA is data collection

Currently rather simple models

No modeling of technologies, concurrency, or hardware

Difficult to evaluate architecture alternatives

Limited decision support from the predictions

Lack of empirical studies in literature

Predominantly small systems

Often dubious techniques for estimating failure rates

Replicated case studies needed
