SOFTWARE RELIABILITY AND QUALITY
By Nur ISLAM
AIM
Categorising and specifying the reliability of software systems
Discussing various issues associated with Software Quality Assurance (SQA)
SOFTWARE RELIABILITY
WHAT IS SOFTWARE RELIABILITY?
Probability of failure-free operation for a specified time in a specified environment for a given purpose
This means quite different things depending on the system and the users of that system
Informally, reliability is a measure of how well system users think it provides the services they require
Contd…
Cannot be defined objectively
Reliability measurements quoted out of context are not meaningful
Requires an operational profile for its definition
The operational profile defines the expected pattern of software usage
Must consider fault consequences
Not all faults are equally serious; a system is perceived as more unreliable when its faults are more serious
SOFTWARE RELIABILITY
What matters most: assessing the core of the program (running 90% of the time) or the non-core sections (running 10% of the time)?
Program profilers (e.g. prof on Unix) show where execution time is spent
Removing defects/bugs does not by itself indicate how effectively the reliability of the product has increased
Removing defects from a non-core section does not have the same effect as removing those in the core section
SOFTWARE RELIABILITY
Perceived software reliability is observer-dependent: if you do not face a problem, you do not report it
Different users have different views of the system and thus make different quality and reliability assessments
Software reliability keeps changing as defects are detected and fixed
HARDWARE Vs SOFTWARE RELIABILITY
An important characteristic feature that sets hardware and software reliability issues apart is the difference between their failure patterns
Hardware components fail for very different reasons than software components do.
Hardware components fail mostly due to wear and tear, whereas software components fail due to bugs
Reliability Metrics
Hardware metrics are not really suitable for software, as they are based on component failures and the need to repair or replace a component once it has failed; the design is assumed to be correct
Software failures are always design failures; often the system remains available even though a failure has occurred
Reliability Metrics
Probability of failure on demand (POFOD)
A measure of the likelihood that the system will fail when a service request is made
POFOD = 0.001 means 1 out of 1000 service requests results in failure
Relevant for safety-critical or non-stop systems
Rate of occurrence of failures (ROCOF)
Frequency of occurrence of unexpected behaviour
ROCOF of 0.02 means 2 failures are likely in each 100 operational time units
Relevant for operating systems, transaction processing systems
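Both metrics above are simple ratios over observed behaviour. A minimal sketch, using hypothetical test records (the function names and figures are illustrative, not from the slides):

```python
# Sketch: computing POFOD and ROCOF from hypothetical observation data.

def pofod(failed_requests, total_requests):
    """Probability of failure on demand: fraction of service requests that fail."""
    return failed_requests / total_requests

def rocof(failure_count, operational_time_units):
    """Rate of occurrence of failures per operational time unit."""
    return failure_count / operational_time_units

# 1 failure in 1000 service requests -> POFOD = 0.001
print(pofod(1, 1000))   # 0.001
# 2 failures in 100 operational time units -> ROCOF = 0.02
print(rocof(2, 100))    # 0.02
```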
Reliability Metrics
Mean time to failure (MTTF)
A measure of the time between observed failures
MTTF of 500 means that the average time between failures is 500 time units
Relevant for systems with long transactions, e.g. CAD systems
Availability
A measure of how likely the system is to be available for use; takes repair/restart time into account
Availability of 0.998 means the software is available for 998 out of 1000 time units
Relevant for continuously running systems, e.g. telephone switching systems
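These two metrics can likewise be computed directly from observation logs. A minimal sketch with hypothetical inter-failure times and uptime/downtime figures:

```python
# Sketch: MTTF from observed inter-failure times, and availability
# from uptime vs total (uptime + repair/restart) time.

def mttf(inter_failure_times):
    """Mean time to failure: average of observed times between failures."""
    return sum(inter_failure_times) / len(inter_failure_times)

def availability(uptime, downtime):
    """Fraction of total time the system is available for use."""
    return uptime / (uptime + downtime)

print(mttf([480, 510, 500, 490, 520]))  # 500.0 time units
print(availability(998, 2))             # 0.998
```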
Failure Consequences
Reliability does not take consequences into account
Transient faults have no real consequences but other faults might cause data loss or corruption
May be worthwhile to identify different classes of failure, and use different metrics for each
Failure Consequences
When specifying reliability both the number of failures and the consequences of each matter
Failures with serious consequences are more damaging than those where repair and recovery are straightforward
In some cases, different reliability specifications may be defined for different failure types
Failure Classification
Transient - only occurs with certain inputs
Permanent - occurs on all inputs
Recoverable - system can recover without operator help
Unrecoverable - operator has to help the system recover
Non-corrupting - failure does not corrupt system state or data
Corrupting - system state or data are altered
Reliability Growth Modelling
Growth model is a mathematical model of the system reliability change as it is tested and faults are removed
Used as a means of reliability prediction by extrapolating from current data
Depends on the use of statistical testing to measure the reliability of a system version
Reliability Growth Modelling: Step Function Model
The simplest reliability growth model is a step function model
The basic assumption: reliability increases by a constant amount each time an error is detected and repaired
Assumes all errors contribute equally to reliability growth
Highly unrealistic: we already know that different errors contribute differently to reliability growth
Reliability Growth Modelling
Jelinski and Moranda Model
Recognizes that each time an error is repaired, reliability does not increase by a constant amount
The reliability improvement due to fixing an error is assumed to be proportional to the number of errors present in the system at that time
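In the standard formulation of this model, the failure intensity before the i-th failure is proportional to the number of faults still present: lambda_i = phi * (N - i + 1), where N is the initial fault count and phi a per-fault hazard rate. A sketch with assumed parameter values (N and phi are hypothetical):

```python
# Sketch of the Jelinski-Moranda model with assumed parameters.
# Before the i-th failure the intensity is lambda_i = phi * (N - i + 1),
# so each repair improves reliability in proportion to the number of
# faults remaining, not by a constant amount.

def jm_failure_intensity(N, phi, i):
    """Failure intensity before the i-th failure (i = 1..N)."""
    return phi * (N - i + 1)

def jm_expected_interfailure_times(N, phi):
    """Expected time between successive failures: 1 / lambda_i."""
    return [1.0 / jm_failure_intensity(N, phi, i) for i in range(1, N + 1)]

# Hypothetical system with N=5 residual faults, per-fault hazard phi=0.01
times = jm_expected_interfailure_times(5, 0.01)
print([round(t, 1) for t in times])  # [20.0, 25.0, 33.3, 50.0, 100.0]
```

Note how the expected inter-failure times grow as faults are removed, and how the improvement from each fix shrinks relative to the remaining fault count.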
Reliability Growth Modelling: Littlewood and Verrall's Model
Assumes different faults have different sizes, thereby contributing unequally to failures
Allows for negative reliability growth
Large faults tend to be detected and fixed earlier
As the number of errors is driven down by the progress of testing, so is the average error size, causing a law of diminishing returns in debugging
Reliability Growth Modelling
Applicability of models:
There is no universally applicable reliability growth model
Reliability growth is not independent of the application
Fit the observed data to several growth models and take the one that best fits the data
Statistical Testing
The objective is to determine reliability rather than discover errors.
Uses data different from defect testing.
Statistical Testing
Different users have different operational profiles, i.e. they use the system in different ways
Formally, an operational profile is a probability distribution over the inputs
Divide the input data into a number of input classes, e.g. create, edit, print, file operations, etc.
Assign a probability value to each input class: the probability that an input value from that class is selected
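The steps above can be sketched as an operational profile over the example input classes, sampled to generate test inputs in realistic proportions (the probability values are assumed for illustration):

```python
import random

# Sketch: an operational profile as a probability distribution over
# input classes, sampled to produce statistically representative tests.

operational_profile = {   # assumed probabilities, not measured ones
    "create": 0.15,
    "edit":   0.50,
    "print":  0.10,
    "file":   0.25,
}

def sample_inputs(profile, n, seed=42):
    """Draw n input classes with frequencies matching the profile."""
    rng = random.Random(seed)
    classes = list(profile)
    weights = [profile[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)

tests = sample_inputs(operational_profile, 1000)
print(tests.count("edit") / 1000)  # roughly 0.5
```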
Statistical Testing
Determine the operational profile of the software: this can be found by analyzing the usage pattern
Manually select or automatically generate a set of test data corresponding to the operational profile
Apply the test cases to the program, recording the execution time between each failure (it may not be appropriate to use raw execution time)
After a statistically significant number of failures has been observed, the reliability can be computed
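The procedure above can be sketched as a small harness: apply test cases to a hypothetical system under test, record the time between failures, and only report a reliability estimate once enough failures have been observed (the system, time unit, and threshold are all assumptions for illustration):

```python
# Sketch of statistical testing: run test cases, record inter-failure
# times, and estimate MTTF once enough failures are observed.

def run_statistical_test(system, test_cases, min_failures=5):
    inter_failure_times = []
    since_last_failure = 0
    for case in test_cases:
        since_last_failure += 1          # one time unit per test case
        if not system(case):             # False means the case failed
            inter_failure_times.append(since_last_failure)
            since_last_failure = 0
    if len(inter_failure_times) < min_failures:
        return None                      # not statistically significant yet
    return sum(inter_failure_times) / len(inter_failure_times)

# Hypothetical system that fails on every 100th input
flaky = lambda case: case % 100 != 0
print(run_statistical_test(flaky, range(1, 1001)))  # 100.0
```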
Statistical Testing
Relies on using a large test data set
Assumes that only a small percentage of test inputs is likely to cause system failure
It is straightforward to generate tests corresponding to the most common inputs, but a statistically significant percentage of unlikely inputs should also be included
Creating these may be difficult, especially if test generators are used
Statistical Testing
Pros and cons - consider these for yourself
SOFTWARE QUALITY
What is Software Quality?
Software quality is:
- The degree to which a system, component, or process meets specified requirements
- The degree to which a system, component, or process meets customer or user needs or expectations
Software Quality Criteria
Correctness Efficiency Flexibility Robustness Interoperability Maintainability Performance Portability Reliability Reusability Testability Usability Availability Understandability
Software Quality Management System
A quality management system is a principal methodology used by organizations to ensure that the products they develop have the desired quality
A quality system is the responsibility of the organization as a whole, and the full support of top management is a must
A good quality system must be well documented
Software Quality Management System
The quality system activities encompass the following:
Auditing of projects
Review of the quality system
Development of standards, procedures and guidelines, etc.
Production of reports for top management summarizing the effectiveness of the quality system in the organization
Evolution of Quality Systems
Before WWII, the usual way to produce quality products was to inspect the final product and remove defective units
Then the Quality Control (QC) principle emerged: it focuses not only on detecting defective products and eliminating them, but also on determining the causes behind the defects (and fixing them), so that the product rejection rate can be reduced
Evolution of Quality Systems
Quality Assurance (QA): the basic premise of modern quality assurance is that if an organization's processes are good and are followed rigorously, then the products are bound to be of good quality
Total Quality Management (TQM) advocates that the process followed by an organization must be continuously improved through process management
Evolution of Quality Systems
TQM requires continuous process improvement (more than just documenting processes and optimizing them through redesign)
Over the last six decades, quality management has shifted from inspection to Total Quality Management, and quality assurance has shifted from product assurance to process assurance
Evolution of Quality Systems
Quality Paradigm                   Quality Assurance Method
Inspection                         Product Assurance
Quality Control (QC)               Product Assurance
Quality Assurance (QA)             Process Assurance
Total Quality Management (TQM)     Process Assurance
Product Metrics versus Process Metrics
Product metrics help measure the characteristics of the product being developed, whereas process metrics help measure how a process is performing
Product metrics: LOC, FP, PM, time to develop the product, the complexity of the system, etc.
Process metrics: review effectiveness, inspection efficiency, etc.
THANK YOU