BENEFITS OF SOFTWARE RELIABILITY ASSESSMENT AND SOFTWARE FMEAS

Ann Marie Neufelder, SoftRel, LLC, 2020
[email protected]
http://www.softrel.com
321-514-4659

Copyright SoftRel, LLC 2020. This material may not be reprinted in part or in whole without written permission from Ann Marie Neufelder.



Why do people do software reliability assessments? Software FMEAs?

• Software and firmware are growing in size by an average of 10-12% per year according to the General Accounting Office [1]

• With software, you only need one catastrophic failure to escape the testing cycle to affect the ROI of the entire system. Software FMEAs can identify the failure modes that are difficult to see in testing but are catastrophic in operation.

• Leading causes of late software deliveries are [2]:
  • Defect pileup from previous releases resulted in unplanned maintenance
  • Maintaining the previous release required unexpected personnel from the current release, which in turn caused it to be late
  • Both of these can be predicted and managed with SRE

• Contrary to popular belief, the organizations that (legitimately) deliver on time also deliver with fewer defects
  • Does not apply to organizations that are on time via half-baked deliveries

• Almost any development practice that keeps the schedule on track, especially early in development, has the potential to also reduce the defects found in operation


Software reliability


Quantitative assessment used for planning releases, staffing, and minimizing project risk.

Qualitative software failure modes effects analysis used for identifying what can go wrong with the software early enough to affect the design.

Ann Marie Neufelder is a recognized leader in both quantitative and qualitative software reliability

IEEE 1633 Recommended Practices for Software Reliability recommends both quantitative and qualitative approaches discussed herein

• Approved on first ballot with 100% approval by DoD, NASA, NRC, medical devices, energy systems, manufacturing


Softrel, LLC currently has largest database of software development factors versus reliability versus on time delivery

Since 1993, Softrel, LLC has been benchmarking actual operational and test data from 150+ real software systems based on
• Reliability/defect density of deployed software
• Probability of on time delivery
• Magnitude of schedule slip when not on time
• 689 development practices and inherent risks associated with the software release
• Reliability growth

Every 12-18 months predictive models are recalibrated based on new data

Every 4-5 years models are rerun for new development and testing methods

Software projects span many industries and sizes and range from seriously distressed to world class success


Softrel Database has Industry Coverage

Our benchmarking is mainly on engineering systems that contain software

Industry                 Share of database
Defense                  30%
Space                    4%
Medical                  8%
Commercial electronics   10%
Commercial software      4%
Energy                   5%
Machinery                39%


Softrel Database has SW Release Outcome Coverage

Our data shows…
• Contrary to popular myth, the probability and magnitude of late deliveries decrease as defect density decreases (on time is with respect to software engineering estimates)
• No successful SW project had more than 2 major risks
• No distressed SW project had fewer than 1 major risk
• Inherent risks typically can’t be avoided and include new technology, new product, new personnel, and target hardware that doesn’t yet exist when software is being developed
• Most distressed project had 612 times as many defects in operation as most successful project when normalized by effective code size

                                                       Successful    Mediocre     Distressed
                                                       release       release      release
Probability of late delivery (based on SE estimates)   10-25%        25%-85%      100%
Magnitude of late delivery as % of original schedule   12-25%        25%-67%      67%-200%
Defect removal upon operational deployment             >= 75%        40-74%       < 40%
Fielded defect density per normalized effective size   0.04          0.31         1.63
Range of defect density                                .0056-.089    .090-.870    .880 to 3.4
No major risks                                         78%           27%          0%
Exactly one inherent risk                              11%           64%          50%
Exactly two inherent risks                             11%           6%           30%
Exactly three inherent risks                           0%            0%           10%
Four or more inherent risks                            0%            3%           10%


Softrel Database covers software projects of nearly every size too

Effective size (EKSLOC): amount of new and modified code plus de-rated amount of reused code

Database includes projects covering entire range of size
• 875 to 1,587,000 lines of new or modified code for a specific software release. Reused code is typically an additional 100,000 to 10,000,000 SLOC
• 1 month project to 7 year project
• 1 month of labor to 290 years of labor

Normalized EKSLOC: normalized to one base language so as to compare projects developed in different languages
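As a rough illustration of the effective size definition above, the sketch below adds new and modified code to a de-rated share of reused code. The function name and the 0.1 de-rating factor are assumptions made for illustration only; the slides give neither the actual de-rating factor nor the language normalization table, so normalization is left out.

```python
def effective_ksloc(new_sloc, modified_sloc, reused_sloc, reuse_derating=0.1):
    """Effective size in KSLOC: new + modified + de-rated reused code.

    reuse_derating is a hypothetical placeholder, not a SoftRel value.
    """
    if not 0.0 <= reuse_derating <= 1.0:
        raise ValueError("reuse_derating must be between 0 and 1")
    effective_sloc = new_sloc + modified_sloc + reuse_derating * reused_sloc
    return effective_sloc / 1000.0


# Hypothetical release: 40,000 new, 10,000 modified and 500,000 reused lines.
print(effective_ksloc(40_000, 10_000, 500_000))
```

With a small de-rating factor, a large reused code base still contributes to effective size, but far less than the same amount of new code would.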


How the SRE Assessment Model Works


1. Complete assessment. The score identifies the predicted group.

2. Defect density and probability of late delivery are identified from the corresponding row.

Predicted group   Predicted normalized fielded defect density   Predicted probability of late delivery
World class       .011                                          10%
Very good         .060                                          20%
Above average     .112                                          25%
Average           .205                                          36%
Fair              .608                                          85%
Impaired          1.111                                         100%
Distressed        2.069                                         100%

Predicted operational defects = Defect density x normalized effective size in KSLOC

Predicted failure rate = Predicted defects per month / expected duty cycle.
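The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not SoftRel's implementation: the table values are transcribed from the slide (the row pairing is our reading of its layout), the duty cycle is assumed to be in operating hours per month, and the caller supplies the defects expected per month because the slide does not say how total defects are spread over time.

```python
# Predicted group -> (normalized fielded defect density, probability of late delivery).
# Values transcribed from the slide; the row pairing is an assumption.
MODEL_ROWS = {
    "World class": (0.011, 0.10),
    "Very good": (0.060, 0.20),
    "Above average": (0.112, 0.25),
    "Average": (0.205, 0.36),
    "Fair": (0.608, 0.85),
    "Impaired": (1.111, 1.00),
    "Distressed": (2.069, 1.00),
}


def predicted_operational_defects(group, normalized_eksloc):
    """Predicted operational defects = defect density x normalized effective size in KSLOC."""
    density, _probability_late = MODEL_ROWS[group]
    return density * normalized_eksloc


def predicted_failure_rate(defects_per_month, duty_cycle_hours_per_month):
    """Predicted failure rate = predicted defects per month / expected duty cycle."""
    if duty_cycle_hours_per_month <= 0:
        raise ValueError("duty cycle must be positive")
    return defects_per_month / duty_cycle_hours_per_month


# Hypothetical project: "Average" group, 100 normalized EKSLOC.
defects = predicted_operational_defects("Average", 100.0)
rate = predicted_failure_rate(defects_per_month=2.0, duty_cycle_hours_per_month=400.0)
print(f"{defects:.1f} predicted operational defects, {rate:.4f} failures per hour")
```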


Key benefits of SRE quantitative assessment

• Identify failed project early: Identify a failed project before it becomes failed.
  • Basic Softrel models predict distressed, mediocre and successful before code is written.
  • Detailed models allow for sensitivity analysis to identify cheapest and fastest way to get back on track
• Predict: Predict defect pileup, the #1 cause of distressed programs. Predict the best way to schedule releases so as to avoid defect pileup.
• Identify ineffective practices: Identify effective and ineffective development practices
• Select alternatives: Commercial off the shelf/vendor supplied, reinvent, or reuse to reduce effective code size and hence operational defects; replacing ineffective with effective development practices; planning for reliability growth
• Benchmark to industry: Benchmark defect density of components to each other or to others
• Assess vendors: Assess vendors, subcontractors


IDENTIFY A FAILED SW PROJECT WHEN IT’S EARLY ENOUGH TO MITIGATE

Reason #1 for SRE assessment


SRE predicts this defect profile, so you can manage defects, size and on time delivery and identify a failed project

Height and width are a function of new/modified code (effective size) and of development techniques and risk (defect density). Incremental testing can cause multiple peaks.

No distressed project in our DB was aware that its SW failure rate was increasing upon deployment.

[Figure: defect profile. Y axis: Defects. X axis: Normalized usage time. Markers at 40% defects removed and 75% defects removed.]

40 years of history show that all software releases eventually experience a Rayleigh curve. The only difference is height, width and number of peaks. [3]

Successful deployments release SW from 75% removal onwards

Mediocre programs release SW after peak and before 75% removal

Distressed programs deploy before peak observed
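The release thresholds above can be illustrated with a single-peak Rayleigh profile. The sketch below assumes the standard Rayleigh cumulative form; the function names are hypothetical, and real profiles may have several peaks. For this form, roughly 39% of defects have been found at the peak, which is consistent with the 40% marker on the chart.

```python
import math


def rayleigh_fraction_removed(t, t_peak):
    """Cumulative fraction of defects found by time t for a Rayleigh profile peaking at t_peak."""
    return 1.0 - math.exp(-(t * t) / (2.0 * t_peak * t_peak))


def release_category(t_deploy, t_peak):
    """Classify a deployment time using the thresholds from the slide."""
    if t_deploy < t_peak:
        return "distressed"   # deployed before the peak is observed
    if rayleigh_fraction_removed(t_deploy, t_peak) < 0.75:
        return "mediocre"     # after the peak but before 75% removal
    return "successful"       # 75% removal onwards


print(round(rayleigh_fraction_removed(1.0, 1.0), 3))
for t in (0.8, 1.2, 2.0):
    print(t, release_category(t, t_peak=1.0))
```

Time is expressed relative to the peak, so the same function applies whatever unit is used for normalized usage time.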


Projects are often late and unreliable when SRE isn’t used because of underestimates of scope and defect potential

No one sets out to release software with increasing failure rate

It happens when SRE metrics aren’t used early in project when there is time to do something about it

Team is expecting a small number of defects when the larger number could have been predicted and managed before code was even written


Real world application of SRE assessment

• Company X had long history of successful software projects
• Until one, which compared to national average was mediocre, but compared to their past history was a failure
• Company X wanted to know why so
  • They don’t spend money fixing the wrong root cause
  • To ensure that history doesn’t repeat itself
• Root cause was identified from SRE assessment
  • They tried to tackle 4 inherent risks in one release
  • They learned how to
    • Identify the inherent risks that derail the project
    • Schedule them such that there are no more than 2 in any one SW release
    • 2 smaller releases with 2 risks each is better than 1 large release with 4 risks with regards to both schedule and defects
• Testing longer and adding more people did not solve this problem; breaking the releases into smaller chunks did.


AVOID DEFECT PILEUP WHICH IS #1 CAUSE OF LATE RELEASE

Reason #2 for SRE assessment


Defect density/Defect prediction can be used to plan release sizes/frequency to avoid defect pileup

Superimpose predicted defects from current and future releases together

[Figure: two charts, "Defects from release #1" and "Defects from Release #2", with defects per period on the Y axis (0 to 12). Legend: Red = Operational, Yellow = Formal test, Grey = Developer testing.]

In this example, defects are piling up from release to release

Solutions to pileup:
1) Split features up into more, smaller releases
2) Keep the same spacing but less new code in each release
3) Keep the same code size but greater spacing
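Superimposing release profiles can be sketched as follows. This is an illustrative model only: each release is assumed to follow a Rayleigh-shaped discovery curve, and the release sizes, peak months and spacings are made-up numbers rather than data from the Softrel database.

```python
import math


def fraction_found(t, peak_month):
    """Cumulative fraction of a release's defects found t months after it starts (Rayleigh shape)."""
    return 1.0 - math.exp(-(t * t) / (2.0 * peak_month * peak_month))


def monthly_defects(total_defects, peak_month, months):
    """Defects found in each month: difference of the cumulative curve at month ends."""
    return [
        total_defects * (fraction_found(m + 1, peak_month) - fraction_found(m, peak_month))
        for m in range(months)
    ]


def superimpose(releases, horizon):
    """Sum per-release profiles; each release is (start_month, total_defects, peak_month)."""
    combined = [0.0] * horizon
    for start, total, peak in releases:
        for offset, found in enumerate(monthly_defects(total, peak, horizon - start)):
            combined[start + offset] += found
    return combined


# Hypothetical plans: two releases of 100 predicted defects each, peaking 6 months after start.
close_together = superimpose([(0, 100, 6), (4, 100, 6)], horizon=36)
spaced_apart = superimpose([(0, 100, 6), (12, 100, 6)], horizon=36)
print(round(max(close_together), 1), round(max(spaced_apart), 1))
```

Comparing the worst month of each plan shows the effect of solution 3: the same code size with greater spacing lowers the peak monthly defect load.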


Real example of avoiding defect pileup

In the below real example, “kicking the can” was predicted to cause defect pileup

Releases are too far apart initially and too close together at the end

SRE predictions allowed for leveling of features before code was even written

[Figure: Total defects predicted (nominal case) from releases 3 to 7, predicted for each month. Y axis: 0 to 900 defects. Average per month = 132]


IDENTIFY EFFECTIVE AND INEFFECTIVE DEVELOPMENT PRACTICES

Reason #3 for SRE Assessment


Identify which software development characteristics and practices have biggest effect on software reliability

Softrel, LLC has mathematically correlated 689 development characteristics related to the below categories to operational defect density

23 characteristics can identify distressed, mediocre and successful

Top 155 characteristics comprise detailed model which supports sensitivity analysis

Assessment identifies gaps as well as predicted improvement when addressing a gap

Category of questions: Examples
• Avoiding big blobs (decomposition): Code a little, test a little philosophy. Release development/test time < 18 months long. Each developer has a schedule that is granular to day or week.
• Domain expertise: Expertise of software engineers as end user or with industry
• Inherent risks: Government regulations, safety, cyber, untrained end users, etc.
• Execution of project: Monitoring software progress daily or weekly, identifying risks early, etc.
• Personnel: Small team sizes, software managers who don’t try to manage people and write code
• Planning ahead: Planning the scope, personnel, equipment, risks before they become problematic
• Visualization: A picture is worth 1000 words. Specifications with diagrams/pictures/tables are associated with fewer defects than text.
• Requirements: Developing requirements that aren’t missing anything important
• System testing: Testing the requirements, design, stresses, lines of code, operational profile
• Unit testing: Unit testing by every software engineer is mandatory and as per a defined template. Branch coverage tools and metrics.


SRE Assessment considers Neufelder’s Law

• In Softrel DB there have been no successful releases when engineering cycle exceeds 18 months
• All successful releases have <= 18 month engineering cycle
• When the engineering cycle time is <= 8.5 months few SW projects fail

[Figure: Engineering cycle time versus defect density. X axis: months of development/test time for release (0 to 90). Y axis: deployed defect density (0 to 1.4). Series: Successful, Mediocre, Distressed.]


Identify practices that don’t affect defects as much as people think

• Twenty five years of research by Softrel, LLC shows…

Practices that don’t always reduce defects as much as people think, and why:

• Code reviews
  • Engineers rarely look at what is “missing” from the code
  • Agenda isn’t necessarily related to defects.
  • Action items aren’t followed up on.
  • Too much time spent on unimportant code.
  • Too much time spent on things that could be easily identified with an automated tool.
  • Too much time on style, not enough on substance
• SEI CMMi assessment: ROI plateaus at level 3
• Too much focus on formal validation and not enough on developer testing: Organizations forget to do unit and integration testing and focus only on requirements testing, which covers < 40% of code
• Waiting until the code is done to write the test plan: Test plan is based on what the code does as opposed to what it is required to do. Missing code falls through the cracks.
• RM tools such as DOORS: It’s hard to have pictures/diagrams in DOORS. The problem isn’t the tool, it’s the “text” approach to requirements tracing.


ABILITY TO SELECT ALTERNATIVES WHILE ALTERNATIVES ARE STILL FEASIBLE

Reason #4 for SRE Assessment


Use SRE assessment to perform sensitivity analysis


1. Complete assessment

2. Find defect density and probability of late delivery from corresponding row

Predicted percentile group   Predicted normalized fielded defect density   Predicted probability of late delivery
World class                  .011                                          10%
Successful                   .060                                          20%
Above average                .112                                          25%
Average                      .205                                          36%
Below average                .608                                          85%
Impaired                     1.111                                         100%
Distressed                   2.069                                         100%

3. If project can improve to next group before code is written then…
• Average defect reduction = 55%
• Average probability late reduction = 25%
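The "improve to next group" step can be checked with a short calculation. The sketch below uses the defect densities transcribed from the slide (the pairing of values to groups is our reading of the slide layout) and computes the drop in defect density for each one-group improvement; the simple average of those drops comes out close to the 55% quoted above. How the slide derives its 25% figure for late delivery is not stated, so that is not reproduced here.

```python
# Predicted normalized fielded defect density per group, best to worst,
# as transcribed from the slide (row pairing is an assumption).
DENSITY_BY_GROUP = [
    ("World class", 0.011),
    ("Successful", 0.060),
    ("Above average", 0.112),
    ("Average", 0.205),
    ("Below average", 0.608),
    ("Impaired", 1.111),
    ("Distressed", 2.069),
]


def reduction_to_next_group(index):
    """Fractional drop in defect density when moving from group `index` to the next better group."""
    better = DENSITY_BY_GROUP[index - 1][1]
    current = DENSITY_BY_GROUP[index][1]
    return 1.0 - better / current


reductions = [reduction_to_next_group(i) for i in range(1, len(DENSITY_BY_GROUP))]
for (name, _density), drop in zip(DENSITY_BY_GROUP[1:], reductions):
    print(f"{name:>14} -> next group: {drop:.0%} fewer defects")
average = sum(reductions) / len(reductions)
print(f"average: {average:.0%}")
```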


Identify alternatives

• Software defects found in operation are related to the below things, which can be traded off before code is written.

• Some of these things can be changed early in development.

• However, once code is written, testing longer and delaying schedule or deploying and living with field support is typically the only alternative.

Parameter / Sensitivity / Resolution

• EFFECTIVE size
  • Sensitivity: Cutting the EFFECTIVE size in half will double the MTTF.
  • Resolution: Avoid reinventing the wheel with reuse, COTS, FOSS when possible.
• Defect density prediction (assessment of practices and risks)
  • Sensitivity: Cutting the defect density in half will double the MTTF. Problem is that this may not be possible in the short term. Generally not possible to reduce > 50% in one release.
  • Resolution: Replace ineffective practices with effective practices.
• Reliability growth
  • Sensitivity: Increasing test time on target hardware, removing defects and not adding new features during growth has exponential effect.
  • Resolution: Deploy smaller releases and grow reliability while next release is under development.
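The first two sensitivities follow from the prediction formulas: if the failure rate is proportional to predicted defects (defect density times effective size) and MTTF is the reciprocal of the failure rate, halving either input doubles the MTTF. The sketch below assumes that proportional relationship; the constant `failures_per_defect_hour` is a hypothetical placeholder, and it cancels out of the ratios.

```python
def predicted_mttf(defect_density, effective_ksloc, failures_per_defect_hour=0.001):
    """MTTF = 1 / failure rate, with failure rate assumed proportional to predicted defects.

    failures_per_defect_hour is a hypothetical proportionality constant.
    """
    predicted_defects = defect_density * effective_ksloc
    failure_rate = failures_per_defect_hour * predicted_defects
    return 1.0 / failure_rate


baseline = predicted_mttf(0.205, 100.0)
half_size = predicted_mttf(0.205, 50.0)        # e.g. via reuse, COTS, FOSS
half_density = predicted_mttf(0.1025, 100.0)   # e.g. via more effective practices
print(half_size / baseline, half_density / baseline)
```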


BENCHMARK DEFECT DENSITY OF COMPONENTS TO EACH OTHER OR TO OTHERS

Reason #5 for SRE Assessment


Average defect density by industry and application type

Industry                    Fielded defect density per normalized EKSLOC*   95% confidence
Defense                     0.0899                                          0.0357
Space                       0.2292
Medical                     1.0608                                          0.3946
Commercial electronics      0.2373                                          0.1407
Commercial transportation   0.0355
Commercial software         0.1681                                          0.0832
Energy                      0.6573
Machinery                   0.7365                                          0.2675

Application     Fielded defect density per normalized EKSLOC*   95% confidence
Vehicle         0.0956                                          0.0111
Satellite       0.1023                                          0.0565
Missiles        0.0108
Software only   0.2477                                          0.2183
Equipment       0.7037                                          0.2481
Sensor or FW    0.2292
Device          0.3377                                          0.2591
Aircraft        0.0355

*This means only defects found in operation (after testing).


Average Defect density by SEI CMMi level

CMMi level   Predicted fielded defect density   95% confidence   Predicted testing defect density   95% confidence
1            0.548                              0.208            3.563                              3.14
2            0.182                              0.086            3.554                              2.755
3, 4 or 5    0.1005                             0.081            1.356                              .351

Note that the Softrel, LLC database did not identify any measurable difference in fielded defect density beyond level 3


The demonstrated accuracy of the models

One parameter models (that don’t employ an assessment) are more accurate than guessing but not as accurate as with an SRE assessment

Model                         Demonstrated relative error when used before code is written
Guessing                      800%
Industry/application lookup   284% (RSQ = 9.9)
SEI CMMi model overall        450% (RSQ = 6.6)
SEI CMMi level 1              706%
SEI CMMi level 2              49% *
SEI CMMi level >= 2           155% *

All relative error demonstrations depend on accurate and complete inputs

*If, and only if, the SEI CMMi assessment is recent and the organization developing the software is working at that level consistently and throughout


The demonstrated accuracy of the Softrel SRE Assessment models

• Relative error for world class is high because of the small number of defects predicted.
  • Example: If model predicts 1 defect and 2 are found, the relative error is 100%.
• Shortcut model is relatively accurate but provides very little sensitivity analysis
• All relative error demonstrations depend on accurate and complete inputs

Demonstrated relative error by percentile group when prediction performed before code is written

Model        # of parameters   Overall   World class   Very good   Above average   Average   Below average   Impaired   Distressed
Full-scale   100               83%       188%          35%         71%             63%       26%             66%        96%
Full-scale   208               113%      245%          50%         104%            83%       48%             73%        98%
Full-scale   361               131%      302%          82%         103%            102%      23%             79%        81%
Shortcut     22                90%       747%          60%         29%             21%       26%             42%        68%
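The worked example on this slide (1 defect predicted, 2 found, 100% error) implies that relative error is measured against the prediction. A minimal sketch of that calculation, with a hypothetical function name:

```python
def relative_error(predicted, actual):
    """Relative error with respect to the prediction, matching the slide's 1-predicted, 2-found example."""
    if predicted == 0:
        raise ValueError("predicted must be non-zero")
    return abs(actual - predicted) / predicted


print(f"{relative_error(1, 2):.0%}")      # small prediction: one extra defect is a large error
print(f"{relative_error(100, 101):.0%}")  # large prediction: one extra defect barely matters
```

This is why the world class column shows the largest percentages: missing by a single defect dominates when very few defects are predicted.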


ASSESS VENDORS, SUBCONTRACTORS

Reason #6 for SRE Assessments

All models can be used to select and assess vendors/subcontractors

Assessment has one of seven outcomes which can be used relatively to compare one contractor or vendor to another

Several large organizations in industry and Government have used SRE assessment for that purpose


SOFTWARE FMEA

Qualitative methods


Software and firmware FMEAs are used to

• Identify failure modes for systems that are difficult or expensive to test (e.g. missiles, spacecraft)

• Identify a small number of catastrophic failures that would be difficult or expensive to identify during testing

• Identify a small number of catastrophic failures that span across multiple systems (e.g. mass produced systems)

• > 50% of operational failures are due to what was not specified and should have been. SFMEA is one of few tools that can identify this.

• Focus on failure space with regards to requirements, design, code, installation scripts, use cases, user manuals. (Reviews rarely focus on anything other than success space)

• Identify alternative processing, fault tolerance, health monitoring systems (HMS)

• Develop test plans that cover both off nominal and nominal cases


A few software and firmware failure modes

Failure mode category: Description
• Faulty functionality: The software provides the incorrect functionality, fails to provide required functionality, or provides extraneous functionality
• Faulty timing: The software or parts of it execute too early or too late, or the software responds too quickly or too sluggishly
• Faulty sequence/order: A particular event is initiated in the incorrect order or not at all.
• Faulty data: Data is corrupted, incorrect, in the incorrect units, etc.
• Faulty error detection and/or recovery: Software fails to detect or recover from a failure in the system
• False alarm: Software detects a failure when there is none
• Faulty synchronization: The parts of the system aren’t synchronized or communicating.
• Faulty logic: There is complex logic and the software executes the incorrect response for a certain set of conditions
• Faulty processing: The software behaves improperly after an unexpected shutdown
• Faulty algorithms/computations: A formula or set of formulas does not work for all possible inputs
• Faulty usability: Software engineers have faulty assumptions about end users. End users can’t recover from mistakes they make. User manuals are incorrect, missing or not useful.


Process for performing a Software Failure Modes Effects Analyses is similar to hardware FMEA except for failure modes and viewpoint

[Figure: SFMEA process flowchart. Phases: Prepare Software FMEA; Analyze failure modes and root causes; Identify consequences; Mitigate; Generate CIL.]

Steps shown in the flowchart:
• Define scope
• Identify resources
• Tailor the SFMEA
• Identify boundary
• Set ground rules
• Select viewpoints
• Identify riskiest functions
• Gather artifacts
• Define likelihood and severity
• Select template and tools
• Identify what can go wrong
• Identify equivalent failure modes
• Analyze applicable failure modes
• Identify root cause(s) for each failure mode
• Identify local/subsystem/system failure effects
• Identify severity and likelihood
• Identify preventive measures
• Identify corrective actions
• Identify compensating provisions
• Revise RPN
• Generate a Critical Items List (CIL)

The analysis is performed for each use case, use case step, requirement, interface, detailed design, user manual, installation script … (as applicable based on selected viewpoint)


SFMEA template

Template columns, left to right:

Software under analysis | Description of use case, requirement, interface, etc. | Failure mode | Root cause | Local effect | Effect on subsystem | Effect on system | Preventive measures | Severity | Likelihood | RPN = severity * likelihood | Corrective action | Compensating provisions | Revised RPN | Test procedure? | Action item? | Fault tolerance?

Failure analysis contains information pertinent to the selected viewpoint:
1. Use case
2. SRS statement
3. Interface definition
4. Function
5. User instructions
6. Installation scripts

Consequences section. There can be more effects, such as effects on the manufacturer, effects on the user, etc.

Severity ratings can use the same scale as hardware.

Likelihood = f(development risk * visibility * past history * install base)

Mitigation section. Some SFMEA rows will feed the test procedures. Some will result in action items. Some will result in fault tolerance.
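As a rough illustration of how the template columns fit together, the sketch below computes RPN = severity * likelihood for a few rows and pulls the highest-risk rows into a candidate Critical Items List. The 1 to 5 rating scales, the threshold of 10, the example rows, the assumption that mitigation lowers only likelihood, and all names (`SfmeaRow`, `critical_items`) are hypothetical choices made for this example. They are not taken from the SFMEA toolkit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SfmeaRow:
    """One row of an SFMEA worksheet (a subset of the template columns)."""
    item: str                  # use case, requirement, interface, etc.
    failure_mode: str
    root_cause: str
    severity: int              # assumed scale: 1 (negligible) to 5 (catastrophic)
    likelihood: int            # assumed scale: 1 (remote) to 5 (frequent)
    revised_likelihood: Optional[int] = None  # after mitigation, if any

    @property
    def rpn(self) -> int:
        # RPN = severity * likelihood, as in the template
        return self.severity * self.likelihood

    @property
    def revised_rpn(self) -> int:
        # Assumption: mitigation changes likelihood only, not severity
        if self.revised_likelihood is None:
            return self.rpn
        return self.severity * self.revised_likelihood


def critical_items(rows: List[SfmeaRow], threshold: int) -> List[SfmeaRow]:
    """Return rows whose RPN meets the threshold, riskiest first."""
    flagged = [row for row in rows if row.rpn >= threshold]
    return sorted(flagged, key=lambda row: row.rpn, reverse=True)


# Hypothetical example rows
rows = [
    SfmeaRow("Dose entry use case", "Faulty sequence/order",
             "Edit accepted while setup is still in progress", 5, 3,
             revised_likelihood=1),
    SfmeaRow("Thrust command interface", "Faulty data",
             "Units not converted at the interface", 5, 2),
    SfmeaRow("Status display", "Faulty usability",
             "Ambiguous error message", 2, 3),
]

cil = critical_items(rows, threshold=10)
for row in cil:
    print(f"{row.item}: RPN {row.rpn}, revised RPN {row.revised_rpn}")
```

In practice the threshold and scales would come from the ground rules set when the SFMEA is prepared, and the likelihood rating would come from whatever likelihood function the team has defined.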


Failure mode identification is the most critical part of SFMEA

• The peel-the-onion approach typically works best

• What can go wrong with entire software? What’s missing altogether? What happens if a commonly executed function fails? What if software loses track of system state?

• What can go wrong with one use case or feature? What’s missing within the use case? What if steps execute out of order? What if timing is off? What if use case inadvertently executes? Does it conflict with other use cases?

• What can go wrong with one step in a use case or feature? What happens if software shuts down while executing this step of the use case? What happens if data is faulty?

[Figure: onion layers, from outermost to innermost: entire software system; one use case or feature; one software requirement/specification]


Summary


SRE quantitative assessments available via

• Training class (open session, online, on site) – Software Reliability Toolkit provided to every student. It has basic capabilities in a macro-enabled spreadsheet

• Frestimate software – Has a graphical user interface for the toolkit and has more features for planning and sensitivity analysis

• Services – Ann Marie Neufelder can perform predictions for you until employees are trained. She can also review SRE assessments once employees are trained.

SRE qualitative SFMEA available via

• Practical Applications of Software Reliability, 2014 available on website and Amazon

• SFMEA toolkit automates 400+ failure mode and root cause pairs. SFMEA toolkit and book bundle - $525

• SFMEA training (open session, online, on site)

• SFMEA services – Ann Marie Neufelder can perform SFMEAs for you until employees are trained. Ann Marie can also review the SFMEAs once employees are trained.


BACKUP MATERIAL

Annex


SFMEA viewpoints

Software viewpoint | Level of architecture applicable for viewpoint | Failure modes
Functional | The use cases, system and software requirements | The system does not do its required function or does the wrong function
Interface | The interface design | The system components aren’t synchronized or compatible
Detailed | The detailed design or code | The design and/or code isn’t implemented to the requirements or design
Maintenance | A change to the design or code | The change to the design or code will cause a new fault in the software
Usability | The ability for the software to be consistent and user friendly | The end user causes a system failure because of the software interface
Serviceability | The ability for the software to be installed or updated without a software engineer | The software doesn’t operate because it isn’t installed or updated properly
Vulnerability | The ability for the software to protect the system from hackers | The software is performing the wrong functions because it is being controlled externally, or sensitive information has been leaked to the wrong people
Software production process | The ability for the software engineering process to uncover software faults prior to operational failure events | The software system has faults that could have been found and corrected prior to operation


A few examples of big disasters caused by little SW faults

Failure event | Associated software fault
Several patients suffered radiation overdose from the Therac 25 equipment in the mid-1980s. [THERAC] | A race condition combined with ambiguous error messages and missing hardware overrides.
AT&T long distance service was down for 9 hours in January 1991. [AT&T] | An improperly placed “break” statement was introduced into the code while making another change.
Ariane 5 explosion in 1996. [ARIAN5] | An unhandled mismatch between 64 bit and 16 bit format.
NASA Mars Climate Orbiter crash in 1999. [MARS] | Metric/English unit mismatch. Mars Climate Orbiter was written to take thrust instructions using the metric unit Newton (N), while the software on the ground that generated those instructions used the Imperial measure pound-force (lbf).
28 cancer patients were over-radiated in Panama City in 2000. [PANAMA] | The software was reconfigured in a manner that had not been tested by the manufacturer.
On October 8th, 2005, the European Space Agency's CryoSat-1 satellite was lost shortly after launching. [CRYOSAT] | Flight Control System code was missing a required command from the on-board flight control system to the main engine.
A rail car fire in a major underground metro system in April 2007. [RAILCAR] | Missing error detection and recovery by the software.
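Two of the faults above (unit mismatch, and a 64 bit value that does not fit a 16 bit format) are instances of the "faulty data" failure mode, and both are small in code terms. The sketch below is illustrative only and is not the code from either mission. The function names are hypothetical. It shows an explicit pound-force to Newton conversion at an interface, and a narrowing conversion that raises an error instead of silently producing a wrong value.

```python
# 1 lbf is defined as exactly 4.4482216152605 N
LBF_TO_NEWTON = 4.4482216152605

# Range of a signed 16-bit integer
INT16_MIN, INT16_MAX = -32768, 32767


def lbf_to_newton(value_lbf: float) -> float:
    """Convert a force in pound-force to Newtons at the interface boundary."""
    return value_lbf * LBF_TO_NEWTON


def to_int16_checked(value: float) -> int:
    """Narrow a float to a signed 16-bit integer, failing loudly if it does not fit."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


# Usage: the caller decides how to recover instead of receiving corrupted data
print(lbf_to_newton(10.0))
try:
    to_int16_checked(40000.0)
except OverflowError as error:
    print(f"Detected: {error}")
```

An SFMEA row for either interface would ask what happens when the check fails, which is where the error detection and recovery failure mode comes in.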


Ann Marie Neufelder authored current guidance for SFMEA

Guidance | Comments
Mil-Std 1629A Procedures for Performing a Failure Mode, Effects and Criticality Analysis, November 24, 1980. Cancelled in August 1998. | Defines how FMEAs are performed but doesn’t discuss software components.
MIL-HDBK-338B, Military Handbook: Electronic Reliability Design Handbook, October 1, 1998. | Adapted in 1988 to apply to software. However, the guidance provides only a few failure modes and a limited example. There is no discussion of the software related viewpoints.
“SAE ARP 5580 Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications”, July, 2001, Society of Automotive Engineers. | Introduced the concepts of the various software viewpoints. Introduced a few failure modes but examples and guidance are limited.
“Effective Application of Software Failure Modes Effects Analysis”, November, 2014, AM Neufelder, produced for Quanterion, Inc. | Identifies hundreds of software specific failure modes and root causes, 8 possible viewpoints and dozens of real world examples.
IEEE 1633 Recommended Practices for Software Reliability, 2016 | Based on AM Neufelder 2014 publication.


GUIDANCE RELATED TO SOFTWARE RELIABILITY

Guidance | Comments
IEEE 1633 Recommended Practices for Software Reliability | 2016 document is comprehensive and practical. 2008 document is not.
SAE JA 1002 and 1003 Software Reliability Program Implementation Guide | Useful for developing a software reliability plan. The techniques, however, are discussed elsewhere such as IEEE 1633.
DO178C Software Considerations in Airborne Systems and Equipment Certification | Probably the best software standard for ultra high reliable software.
Rome Laboratory TR-92-52: Software Reliability, Measurement, and Testing, 1992. | Great document but outdated.
DACS Software Reliability Sourcebook | Nice overview but doesn’t discuss predictions.
The Handbook of Software Reliability Engineering | Encyclopedia type document. Parts are outdated.
System and Software Assurance Notebook | If combining software and hardware predictions, this is a must have document.
A Survey of Software Reliability Modeling and Estimation by Naval Surface Weapons Center | Contains the theory behind nearly every software reliability growth model.


References

[1] US General Accounting Office, GAO report number GAO-10-706T, “Defense Acquisitions: Observations on Weapon Program Performance and Acquisition Reforms”, released May 19, 2010. http://www.gao.gov/products/GAO-10-706T

[2] A. Neufelder, “The Cold Hard Truth About Reliable Software”, Published by Softrel, LLC, 2016. http://www.softrel.com/truth.htm

[3] Some references include:
a) J. McCall, W. Randell, J. Dunham, L. Lauterbach, Software Reliability, Measurement, and Testing Software Reliability and Test Integration, RL-TR-92-52, Rome Laboratory, Rome, NY, 1992.
b) “System and Software Reliability Assurance Notebook”, P. Lakey, Boeing Corp., A. Neufelder, produced for Rome Laboratory, 1997.
c) Keene, Dr. Samuel, Cole, G.F. “Gerry”, “Reliability Growth of Fielded Software”, Reliability Review, Vol 14, March 1994.

© Softrel, LLC 2014 This presentation may not be copied in part or in whole without written permission from AM Neufelder.
