A Survey Of Fault Prediction Using Machine Learning Algorithms Presented by: Ahmed Magdy Ezzeldin

Reliability is concerned with decreasing faults and their impact. The earlier the faults are detected the better. That's why this presentation talks about automated techniques using machine learning to detect faults as early as possible.

Page 1: A survey of fault prediction using machine learning algorithms

A Survey Of Fault Prediction Using Machine Learning Algorithms

Presented by: Ahmed Magdy Ezzeldin

Page 2: A survey of fault prediction using machine learning algorithms

Introduction

The world now relies heavily on software, so software must be reliable.

Software reliability is the probability that a software system or component performs its intended function under the specified operating conditions over the specified period of time [1]

In other words, the fewer faults a piece of software contains, the more reliable it is.

Page 3: A survey of fault prediction using machine learning algorithms

What are Fault Proneness and Fault Prediction?

A fault is a defect in software that causes a failure when the faulty code is executed.

Fault proneness is the likelihood that a piece of software contains faults.

Fault prediction estimates the probability that a piece of software contains faults.

We will survey 4 papers that use machine learning to predict faults as early as possible.

Page 4: A survey of fault prediction using machine learning algorithms

[1]

A Fuzzy Model for Early Software Fault Prediction Using Process Maturity and Software Metrics

Page 5: A survey of fault prediction using machine learning algorithms

What is Fuzzy Logic

Fuzzy logic is a form of logic that deals with reasoning that is approximate rather than fixed and exact. Its variables may have a truth value that ranges in degree between 0 and 1.

It works in three steps: the inputs are expressed as ranges (fuzzy sets), rules define how these inputs are combined into a fuzzy output, and defuzzification then derives a crisp value from that fuzzy output set.
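As a minimal sketch of a truth value that ranges between 0 and 1, the snippet below fuzzifies a crisp input with triangular membership functions. The helper names (`triangular`, `fuzzify`) and the low/medium/high breakpoints are illustrative assumptions, not values taken from the paper.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over a metric normalized to the range [0, 1].
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x):
    """Map a crisp value to its degree of membership in every fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in FUZZY_SETS.items()}


# A value of 0.25 is partly "low" and partly "medium" at the same time.
degrees = fuzzify(0.25)
print(degrees)
```

The same crisp input belongs to several sets at once, each to a different degree, which is what lets the rules reason approximately.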

Page 6: A survey of fault prediction using machine learning algorithms

The Model

The model combines the two most significant factors, software metrics and process maturity, for fault prediction.

Input: Reliability Relevant Metric List (RRML)

Output:

• Faults at the end of Requirements Phase (FRP)
• Faults at the end of Design Phase (FDP)
• Faults at the end of Coding Phase (FCP)

Page 7: A survey of fault prediction using machine learning algorithms

RRML

Reliability Relevant Metric List (RRML)

Requirements Metrics (RM)
• Requirements Change Request (RCR)
• Review, Inspection and Walkthrough (RIW)
• Process Maturity (PM)

Design Metrics (DM)
• Design Defect Density (DDD)
• Fault Days Number (FDN)
• Data Flow Complexity (DC)

Coding Metrics (CM)
• Code Defect Density (CDD)
• Cyclomatic Complexity (CC)

Page 8: A survey of fault prediction using machine learning algorithms

Proposed Model

Early Fault Prediction Model

Page 9: A survey of fault prediction using machine learning algorithms

(1) Early information gathering phase

a) Identify the Input and Output Variables according to subjective knowledge & expert opinion

b) Develop Fuzzy Profile of Identified Variables

Define the membership function using expert’s opinion, user’s expectations, and previous data

Page 10: A survey of fault prediction using machine learning algorithms

Inputs

Fuzzy Profile of RCR

Fuzzy Profile of RIW

Fuzzy Profile of PM

Fuzzy Profile of DDD

Page 11: A survey of fault prediction using machine learning algorithms

Fuzzy Profile of FDN

Fuzzy Profile of DC

Fuzzy Profile of CC

Fuzzy Profile of CDD

Page 12: A survey of fault prediction using machine learning algorithms

Outputs

Fuzzy Profile of FRP

Fuzzy Profile of FDP

Fuzzy Profile of FCP

Page 13: A survey of fault prediction using machine learning algorithms

Fuzzy Rule Base

c) Develop Fuzzy Rule Base

The rules come from domain experts, historical data analysis of similar or earlier systems, and engineering knowledge from the existing literature.

Rules take the form ‘If A then B’

Page 14: A survey of fault prediction using machine learning algorithms

Fuzzy Rule Base

Page 15: A survey of fault prediction using machine learning algorithms

(2) Information processing phase

Map the inputs onto the outputs (the fuzzy inference process, or fuzzy reasoning)

Defuzzification is the process of deriving a crisp value from a fuzzy set using a defuzzification method.
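The two steps of this phase can be sketched as follows, assuming Mamdani-style inference (clip each output set by its rule strength, combine with max) and centroid defuzzification. The slides do not say which inference or defuzzification method the paper uses, so both are assumptions, as are the two rules, the set boundaries and the 0 to 100 output scale.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy profiles: one input (complexity in [0, 1]), one output (faults in [0, 100]).
INPUT_SETS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}
OUTPUT_SETS = {"few": (-50.0, 0.0, 50.0), "many": (50.0, 100.0, 150.0)}

# Hypothetical rule base in the 'If A then B' form.
RULES = [("low", "few"), ("high", "many")]


def predict_faults(complexity, steps=1000):
    """Fuzzy inference followed by centroid defuzzification."""
    # Rule strength = degree to which the rule's 'If' part holds.
    strengths = {then: triangular(complexity, *INPUT_SETS[if_]) for if_, then in RULES}
    numerator = 0.0
    denominator = 0.0
    for i in range(steps + 1):
        y = 100.0 * i / steps
        # Clip each output set at its rule strength, then take the maximum.
        mu = max(min(s, triangular(y, *OUTPUT_SETS[name])) for name, s in strengths.items())
        numerator += y * mu
        denominator += mu
    # The centroid of the aggregated fuzzy set is the crisp output.
    return numerator / denominator if denominator else 0.0


for value in (0.0, 0.5, 1.0):
    print(value, round(predict_faults(value), 2))
```

In this sketch the crisp output grows with the input, and an input that fires both rules equally lands in the middle of the output scale.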

Page 16: A survey of fault prediction using machine learning algorithms

Results

The model outputs the number of faults at the end of each phase.

The model can only predict between 0 and 85 defects.

In my opinion, this output should be multiplied by a metric that reflects the size of the software (such as function points or object points) to predict the number of faults in it.

Page 17: A survey of fault prediction using machine learning algorithms

Results [continued]

Page 18: A survey of fault prediction using machine learning algorithms

[2]

Software Fault Proneness Prediction Using Support Vector Machines

Page 19: A survey of fault prediction using machine learning algorithms

What is SVM?

A support vector machine (SVM) is a supervised learning method that analyzes data and recognizes patterns. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to.

The approach uses an SVM model to find the relationship between object-oriented metrics and fault proneness. It is evaluated empirically on the NASA KC1 data set, which comes from a storage management system for ground data written in C++, with 145 classes, 2107 methods and 40 KLOC.
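To make the two-class prediction concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is an illustrative stand-in, not the paper's setup: the slides do not state the kernel or parameters used, and the four toy points (imagined as two centered metric values per class) are hypothetical, not KC1 data.

```python
def train_linear_svm(points, labels, epochs=200, learning_rate=0.1, regularization=0.01):
    """Sub-gradient descent on the regularized hinge loss; labels must be +1 or -1."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    shrink = 1.0 - learning_rate * regularization
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # The regularization term always pulls the weights toward zero.
            weights = [w * shrink for w in weights]
            # Points inside the margin (y * score < 1) push the boundary away.
            if y * score < 1:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def svm_predict(model, x):
    """Return +1 (fault-prone) or -1 (not fault-prone) for one input."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical, centered metric values: +1 = fault-prone class, -1 = not fault-prone.
points = [(1.0, 1.0), (2.0, 1.0), (-1.0, -1.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]

model = train_linear_svm(points, labels)
print([svm_predict(model, p) for p in points])
```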

Page 20: A survey of fault prediction using machine learning algorithms

Metrics Studied

Page 21: A survey of fault prediction using machine learning algorithms

Some Measures

Sensitivity is the probability that a module which contains a fault is correctly classified. [7]

Specificity is the proportion of correctly identified fault-free modules. [7]

Probability of False Alarm (PF) is the proportion of fault-free modules that are erroneously classified as faulty. PF = 1 - specificity [7]

Precision is the probability of correctly predicting faulty modules among the modules classified as fault-prone. [7]

Completeness is the number of faults in the classes predicted as faulty divided by the number of faults in all classes. [8]
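The measures above can be computed directly from a confusion matrix. The sketch below assumes the usual convention that "positive" means faulty; the function names and the example counts are made up for illustration, and no guard against empty denominators is included.

```python
def classification_measures(tp, fn, fp, tn):
    """tp/fn: faulty modules classified as faulty/fault-free.
    fp/tn: fault-free modules classified as faulty/fault-free."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pf": 1 - specificity,
        "precision": tp / (tp + fp),
    }


def completeness(fault_counts, predicted_faulty):
    """Faults located in classes predicted as faulty, divided by all faults."""
    found = sum(n for n, flagged in zip(fault_counts, predicted_faulty) if flagged)
    return found / sum(fault_counts)


# Hypothetical counts for 8 modules.
measures = classification_measures(tp=3, fn=1, fp=1, tn=3)
print(measures)

# Hypothetical fault counts per class and whether each class was predicted faulty.
print(completeness([5, 0, 2, 3], [True, False, False, True]))
```

Completeness differs from sensitivity because it weights each class by how many faults it contains, so missing one heavily faulty class costs more than missing a class with a single fault.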

Page 22: A survey of fault prediction using machine learning algorithms

Results

Page 23: A survey of fault prediction using machine learning algorithms

Results [continued]

Page 24: A survey of fault prediction using machine learning algorithms

Results [continued]

Page 25: A survey of fault prediction using machine learning algorithms

Results [continued]

Sensitivity and Completeness of the model

Page 26: A survey of fault prediction using machine learning algorithms

[3]

A Genetic Algorithm Based Classification Approach for Finding Fault Prone Classes

Page 27: A survey of fault prediction using machine learning algorithms

What is GA?

A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems.

The developed system finds fault-prone classes with a measured accuracy of 80.14%

Page 28: A survey of fault prediction using machine learning algorithms

How does it work?

Start with a large “population” of randomly generated “attempted solutions” to a problem, then repeatedly do the following:

• Evaluate each of the attempted solutions

• Keep a subset of these solutions (the “best” ones)

• Use these solutions to generate a new population

• Quit when you have a satisfactory solution (or you run out of time)

The genetic algorithm is then used to classify software components as faulty or fault-free.
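The loop described on this slide can be sketched as below. This is a generic genetic algorithm on bit strings, not the paper's classifier: the slides do not show how candidate solutions are encoded or scored, so the genome layout, the toy fitness function (count of 1 bits), the single-point crossover and all parameter values are assumptions.

```python
import random


def genetic_search(fitness, genome_length, target, population_size=30,
                   generations=100, mutation_rate=0.05, seed=0):
    """Evolve bit-string genomes until one reaches the target fitness or time runs out."""
    rng = random.Random(seed)
    # Start with a population of randomly generated attempted solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Evaluate each attempted solution.
        population.sort(key=fitness, reverse=True)
        # Quit when a satisfactory solution has been found.
        if fitness(population[0]) >= target:
            break
        # Keep the best half unchanged.
        survivors = population[: population_size // 2]
        # Use the survivors to generate the rest of the new population.
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


# Toy fitness: the number of 1 bits in the genome.
best = genetic_search(sum, genome_length=12, target=12)
print(best, sum(best))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.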

Page 29: A survey of fault prediction using machine learning algorithms

Used Metrics

●Coupling between Objects (CBO)
●Lack of Cohesion (LCOM)
●Number of Children (NOC)
●Depth of Inheritance (DIT)
●Weighted Methods per Class (WMC)
●Response for a Class (RFC)
●Number of Public Methods (NPM)
●Lines Of Code (LOC)

Page 30: A survey of fault prediction using machine learning algorithms

Flowchart of GA based approach

Page 31: A survey of fault prediction using machine learning algorithms

[4]

Comparing The Effectiveness Of Machine Learning Algorithms For Defect Prediction

Page 32: A survey of fault prediction using machine learning algorithms

Machine Learning Algorithms used

3 machine learning algorithms:
• J48
• OneR
• Naïve Bayes

Used 29 metrics

Applied to 2 small pieces of embedded software written in C:
• 121 modules, 9 of them defective
• 101 modules, 15 of them defective

Page 33: A survey of fault prediction using machine learning algorithms

J48

J48: Java implementation of Quinlan’s C4.5 algorithm

C4.5 recursively splits a data set according to checks on attribute values

C4.5 uses a greedy top-down construction technique to build classification decision trees using information theory
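The information-theoretic check behind each split can be sketched as below. C4.5 ranks candidate attributes by gain ratio (information gain divided by split information); the sketch covers only that scoring step for discrete attributes, not tree construction or pruning, and the attribute names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on an attribute, divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info else 0.0


# Hypothetical training data: two discrete attributes for four modules.
labels = ["faulty", "faulty", "clean", "clean"]
attributes = {
    "complexity": ["high", "high", "low", "low"],
    "language": ["c", "cpp", "c", "cpp"],
}

# The greedy step: split on the attribute with the highest gain ratio.
best_attribute = max(attributes, key=lambda name: gain_ratio(attributes[name], labels))
print(best_attribute)
```

Applying the same selection recursively to each resulting subset is what produces the top-down tree.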

Page 34: A survey of fault prediction using machine learning algorithms

OneR

OneR induces simple rules based on a single attribute

OneR creates one rule for each attribute in the training data, then selects the rule with the smallest error rate as its single rule.

For each attribute value, it determines the class that appears most often.

A rule is simply a set of attribute values bound to their majority class.

The error rate counts the training instances whose class does not agree with the class that the rule binds to their attribute value. [4]
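The OneR procedure is short enough to sketch in full. The version below assumes discrete attribute values and breaks ties by taking the first candidate encountered; the function name, attribute names and toy data are illustrative, not from the paper.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute, rule, errors) for the single attribute with the fewest errors.

    rows is a list of dicts mapping attribute name to a discrete value.
    """
    best = None
    for attribute in rows[0]:
        # Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attribute]][label] += 1
        # Bind each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: training instances that disagree with the bound class.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical modules described by two discrete attributes.
rows = [
    {"complexity": "high", "size": "large"},
    {"complexity": "high", "size": "small"},
    {"complexity": "low", "size": "large"},
    {"complexity": "low", "size": "small"},
]
labels = ["faulty", "faulty", "clean", "clean"]

attribute, rule, errors = one_r(rows, labels)
print(attribute, rule, errors)
```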

Page 35: A survey of fault prediction using machine learning algorithms

Naïve Bayes

Naïve Bayes: based on Bayes' theorem of posterior probability

Naïve Bayes assumes that all attributes are conditionally independent given the class, i.e. there is no dependence relationship among the attributes.

The Naïve Bayes classifier estimates the probability of each attribute value for each class from the training set by counting the frequency of each discrete attribute value. [4]
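A frequency-counting Naïve Bayes classifier for discrete attributes can be sketched as below. It uses raw frequencies with no smoothing, so an attribute value never seen with a class drives that class's score to zero; the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per (attribute, class), attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(attribute, label)][value] += 1
    return class_counts, value_counts


def nb_predict(model, row):
    """Return the class with the highest score P(class) * product of P(value | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior probability of the class
        for attribute, value in row.items():
            # Independence assumption: multiply the per-attribute likelihoods.
            score *= value_counts[(attribute, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical modules described by two discrete attributes.
rows = [
    {"complexity": "high", "size": "large"},
    {"complexity": "high", "size": "small"},
    {"complexity": "low", "size": "large"},
    {"complexity": "low", "size": "small"},
]
labels = ["faulty", "faulty", "clean", "clean"]

nb_model = train_naive_bayes(rows, labels)
print(nb_predict(nb_model, {"complexity": "high", "size": "large"}))
```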

Page 36: A survey of fault prediction using machine learning algorithms

Results

Page 37: A survey of fault prediction using machine learning algorithms

Results [continued]

J48 and OneR performed better than Naïve Bayes.

J48, OneR and Naïve Bayes correctly classified 90.086%, 89.2562% and 85.124% of the instances, respectively. [4]

Page 38: A survey of fault prediction using machine learning algorithms

Conclusion

Early fault prediction helps projects avoid budget overruns and reduces risk.

We discussed 4 approaches to fault prediction that apply machine learning algorithms to different reliability-relevant software metrics and to Capability Maturity Model (CMM) level.

The surveyed results show that machine learning algorithms achieve good accuracy, ranging from about 80% to 90%.

Machine learning approaches can also help software maintenance developers classify software modules as faulty or non-faulty.

Page 39: A survey of fault prediction using machine learning algorithms

References

[1] A Fuzzy Model for Early Software Fault Prediction Using Process Maturity and Software Metrics (Ajeet Kumar Pandey & N. K. Goyal, Reliability Engineering Centre, IIT Kharagpur, INDIA)

[2] Software Fault Proneness Prediction Using Support Vector Machines (Yogesh Singh, Arvinder Kaur, Ruchika Malhotra)

[3] A Genetic Algorithm Based Classification Approach for Finding Fault Prone Classes (Parvinder S. Sandhu, Satish Kumar Dhiman, Anmol Goyal)

[4] Comparing The Effectiveness Of Machine Learning Algorithms For Defect Prediction (Pradeep Singh)

Page 40: A survey of fault prediction using machine learning algorithms

References [continued]

[5] Mining Metrics to Predict Component Failures (Nachiappan Nagappan, Thomas Ball, and Andreas Zeller)

[6] Data Mining Static Code Attributes to Learn Defect Predictors (Tim Menzies, and Jeremy Greenwald)

[7] Techniques for evaluating fault prediction models (Yue Jiang & Bojan Cukic & Yan Ma)

[8] Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction (Tibor Gyimothy, Rudolf Ferenc, and Istvan Siket)

Page 41: A survey of fault prediction using machine learning algorithms

Thank You

Questions?