Institute of Information Technology, University of Dhaka
SELECTION AND REPRESENTATION OF ATTRIBUTES FOR SOFTWARE DEFECT PREDICTION
Supervised by Dr. Mohammad Shoyaib, Associate Professor
Presented by Sadia Sharmin, BSSE-0426
May 1, 2023
CONTENTS
Background
Motivation
Problem Specification
Objectives of Research
Literature Review
Methodology
Result Analysis and Discussion
Future Work
BACKGROUND
Software Defect
Any flaw or imperfection in a software work product or software process
Software Defect Prediction
An approach to finding the defective parts of the software before testing or releasing the product
AN OVERVIEW OF SOFTWARE DEFECT PREDICTION PROCESS
[Process diagram. Stages shown: Data Set, Pre-processing, Attribute Selection, Training Data, Testing Data, Prediction Model Training, Prediction Result]
MOTIVATION
Identifying the software bugs in an early stage
Allocating the test resources efficiently
Minimizing the cost of software development
Improving the quality and productivity of software
WHY PRE-PROCESSING IS NEEDED
Noisy data
Outliers
Missing or conflicting values
Inconsistency
WHY ATTRIBUTE SELECTION IS NEEDED
Attributes are not equally important
No standard set of attributes
OBJECTIVES OF RESEARCH
To find out how existing pre-processing techniques can be combined with attribute selection methods more efficiently
To survey the existing methods and propose a suitable attribute selection method
A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1]
Defect prediction framework:
Data pre-processor: Log-filtering
Feature selector: Forward Selection, Backward Elimination
Learning algorithms: Naïve Bayes, J48, OneR
A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1]
Small changes to data representation can have a major impact
Selecting features one attribute at a time is not a practical solution for large datasets
Learning schemes should be chosen carefully for each dataset
There is no clear indication of which combination should be used for a particular dataset
HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION? [2]
Five filter-based feature ranking techniques
Methodology
Min-max normalization
Pair each independent attribute with the class attribute
Rank the attributes
Subset selection (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20)
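The normalization and subset-selection steps listed above can be sketched as follows. This is a minimal illustration, not the code used in [2]: the function names are hypothetical, and the ranking itself is assumed to be produced elsewhere by one of the filter-based rankers.

```python
# Subset sizes evaluated in [2], as listed on the slide.
SUBSET_SIZES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]


def min_max_normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_k_subsets(ranked_attributes, sizes=SUBSET_SIZES):
    """Return the top-k prefix of a ranked attribute list for each size k.

    Sizes larger than the number of available attributes are skipped.
    """
    return {k: ranked_attributes[:k] for k in sizes if k <= len(ranked_attributes)}
```

Each returned subset would then be used to train and evaluate a separate prediction model.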
HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION? [2]
Three metrics on average can be enough for building an effective prediction model
Eliminating 98.5% of the available metrics improves the result
It is not confirmed that this will work with all datasets
CHOOSING SOFTWARE METRICS FOR DEFECT PREDICTION: AN INVESTIGATION ON FEATURE SELECTION TECHNIQUES [3]
Hybrid attribute selection approach
Feature ranking
Feature subset selection
Removing 85% of the metrics can enhance the performance of the prediction model
METHODOLOGY
SAL: Selection of Attribute with Log filtering
Pre-process the data with a logarithmic filter
Rank the attributes
Select the best set of attributes
Build the predictor
PRE-PROCESSING
$\ln(n + \varepsilon)$, where $\varepsilon = 0.01$
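A minimal sketch of this log filter, assuming every attribute value n is numeric and non-negative and that the small offset 0.01 from the slide is added so that zero values remain defined. The function name and the row-of-lists data layout are illustrative only.

```python
import math

OFFSET = 0.01  # small constant from the slide, added before taking the logarithm


def log_filter(rows, offset=OFFSET):
    """Replace every numeric value n in a table of metric values with ln(n + offset)."""
    return [[math.log(value + offset) for value in row] for row in rows]


# Example: two modules described by two metrics each (made-up values).
filtered = log_filter([[0.0, 12.0], [3.0, 250.0]])
```

The filter compresses the long right tail that size and complexity metrics typically have, which is why it is applied before ranking.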
ATTRIBUTE RANKING
Attributes: A1, A2, A3, A4, A5, …, An
Individual Balance value (one value per attribute)
A1 0.034, A2 0.034, A3 0.456, A4 0.348, A5 0.784, …, An 0.789
Pair-wise combination
A1A2, A1A3, …, A3A1, A3A2, …, AmAn
Pair-wise Balance value
A1A2 0.896, A1A3 0.734, …, A3A1 0.587, A3A2 0.669, …, AmAn 0.897
Average Balance value for each attribute
Average Balance Value = (Individual value + Average value of n pairs) / 2
A1 0.765, A2 0.534, A3 0.679, A4 0.869, A5 0.887, …, An 0.897
Sorted Balance value in decreasing order
A5 0.887, A4 0.869, A10 0.765, A8 0.750, A9 0.696, …, An 0.523
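The ranking procedure above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: `balance_of` is a hypothetical callable that trains and evaluates a classifier on the given attributes and returns its balance value, pairs are treated as unordered (A1A3 and A3A1 give the same value), and the numbers in the toy table are made up.

```python
def rank_attributes(attributes, balance_of):
    """Rank attributes by (individual balance + mean pair-wise balance) / 2."""
    scores = {}
    for a in attributes:
        individual = balance_of(frozenset([a]))
        partners = [b for b in attributes if b != a]
        if partners:
            pair_mean = sum(balance_of(frozenset([a, b])) for b in partners) / len(partners)
            scores[a] = (individual + pair_mean) / 2
        else:
            # Only one attribute exists, so there are no pairs to average.
            scores[a] = individual
    # Highest average balance value first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy balance values (made up for illustration only).
TOY_BALANCE = {
    frozenset(["A1"]): 0.5,
    frozenset(["A2"]): 0.7,
    frozenset(["A3"]): 0.6,
    frozenset(["A1", "A2"]): 0.8,
    frozenset(["A1", "A3"]): 0.6,
    frozenset(["A2", "A3"]): 0.9,
}

ranking = rank_attributes(["A1", "A2", "A3"], lambda attrs: TOY_BALANCE[attrs])
```

With n attributes this needs n individual evaluations plus one evaluation per pair, so a real implementation would probably cache the pair-wise results.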
SELECT BEST SET OF ATTRIBUTES
Ranking of Attributes: A5, A4, A10, A8, A9, …, An
Best Set of Attributes: initially empty
Take A5 (1st ranked, Balance value 0.887) into the best set
Take A4 (2nd ranked) and compute the Combined Balance value of A5A4: 0.891 (new) against 0.887 (previous)
new value > previous value, so A4 is added; the best set becomes A5, A4 (0.891)
Take A10 (3rd ranked) and compute the Combined Balance value of A5A4A10: 0.856 (new) against 0.891 (previous)
new value < previous value, so A10 is discarded; the best set stays A5, A4 (0.891)
Continue this process with the remaining ranked attributes (A8, A9, …, An)
Best Set of Attributes: A5, A4, A9, A12, A7
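The selection walk-through above amounts to a greedy pass over the ranked list, sketched below. As before, `balance_of` is a hypothetical callable that returns the balance value of a classifier built on the given attributes. The slides do not say what happens when the new value equals the previous one; this sketch assumes the attribute is discarded in that case. The toy table reuses the three values shown on the slides.

```python
def select_best_set(ranked_attributes, balance_of):
    """Greedily grow the best set, keeping an attribute only if balance improves."""
    best_set = []
    best_score = float("-inf")
    for attribute in ranked_attributes:
        candidate = best_set + [attribute]
        score = balance_of(tuple(candidate))
        if score > best_score:
            # new value > previous value: keep the attribute
            best_set, best_score = candidate, score
        # otherwise the attribute is discarded and the best set is unchanged
    return best_set, best_score


# Balance values taken from the slides' walk-through.
SLIDE_BALANCE = {
    ("A5",): 0.887,
    ("A5", "A4"): 0.891,
    ("A5", "A4", "A10"): 0.856,
}

best_set, best_score = select_best_set(["A5", "A4", "A10"], lambda attrs: SLIDE_BALANCE[attrs])
```

Because each attribute is visited once in rank order, the pass needs only one model evaluation per attribute.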
PERFORMANCE MEASUREMENT SCALES
Confusion Matrix
                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN
Area Under the ROC Curve (AUC)
[ROC plot: x-axis False Positive rate (0 to 1), y-axis True Positive rate (0 to 1)]
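The slides do not spell out how Balance is computed. The sketch below assumes the definition commonly used in the defect prediction literature: the normalized Euclidean distance from the ROC point (pf, pd) to the ideal point (0, 1), subtracted from 1. AUC is computed here as the fraction of (defective, non-defective) pairs that the classifier scores in the right order, with ties counted as one half. All function names are illustrative.

```python
import math


def rates(tp, fn, fp, tn):
    """Return (pd, pf): true positive rate and false positive rate."""
    pd = tp / (tp + fn)  # probability of detection
    pf = fp / (fp + tn)  # probability of false alarm
    return pd, pf


def balance(tp, fn, fp, tn):
    """Assumed definition: 1 - distance from (pf, pd) to (0, 1), scaled by sqrt(2)."""
    pd, pf = rates(tp, fn, fp, tn)
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of predicted scores."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Both `rates` and `auc` divide by class counts, so they assume the test data contains at least one defective and one non-defective module.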
RESULT AND DISCUSSIONS
Data set: NASA MDP repository and PROMISE repository
Classifier: Naïve Bayes
Performance Metrics: Balance, AUC (Area Under the ROC Curve)
Programming Language: Java
Machine Learning Tool: WEKA
RESULT AND DISCUSSIONS
Comparison of AUC values of different methods
Dataset   Wahono [4]   Abaei [5]   Ren [6] Lowest   Ren [6] Highest   SAL
CM1       0.702        0.723       0.550            0.724             0.7946
KC1       0.79         0.790       0.592            0.800             0.8006
KC2       -            -           0.591            0.796             0.8449
KC3       0.677        -           0.569            0.713             0.8322
KC4       -            -           -                -                 0.8059
MC1       -            -           -                -                 0.8110
MC2       0.739        -           -                -                 0.7340
MW1       0.724        -           0.534            0.725             0.7340
PC1       0.799        -           0.692            0.882             0.8369
PC2       0.805        -           -                -                 0.8668
PC3       0.78         0.795       -                -                 0.8068
PC4       0.861        -           -                -                 0.9049
PC5       -            -           -                -                 0.9624
JM1       -            0.717       -                -                 0.7167
AR1       -            -           -                -                 0.8167
AR3       -            -           0.580            0.699             0.8590
AR4       -            -           0.555            0.671             0.8681
AR5       -            -           0.614            0.722             0.925
AR6       -            -           -                -                 0.7566
RESULT AND DISCUSSIONS
Comparison of Balance values of different methods
Dataset   Song [1]   Wang [7]   Jobaer [8]   SAL
CM1       0.695      0.663      0.5500       0.680
JM1       0.585      0.678      -            0.6152
KC1       0.707      0.718      -            0.7244
KC2       -          0.753      -            0.7835
KC3       0.708      0.693      0.6037       0.7529
KC4       0.691      -          -            0.7036
MC1       0.793      -          -            0.6904
MC2       0.614      0.620      -            0.6847
MW1       0.661      0.636      0.7202       0.6577
PC1       0.668      0.688      0.5719       0.7040
PC2       -          -          0.7046       0.7468
PC3       0.711      0.749      0.7114       0.7232
PC4       0.821      0.854      0.7450       0.8272
PC5       0.904      -          -            0.9046
AR1       0.411      -          -            0.6651
AR3       0.661      -          -            0.8238
AR4       0.683      -          -            0.7051
AR6       0.492      -          -            0.5471
FUTURE WORK
Cross-project defect prediction
Using other publicly available datasets
REFERENCES
[1] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general software defect-proneness prediction framework." Software Engineering, IEEE Transactions on 37, no. 3 (2011): 356-370.
[2] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics should be selected for defect prediction?" In FLAIRS Conference. 2011.
[3] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of filter attribute selection techniques for software quality classification." In Information Reuse & Integration, 2009. IRI'09. IEEE International Conference on, pp. 272-277. IEEE, 2009.
[4] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic Feature Selection for Software Defect Prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.
[5] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.
[6] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using machine learning." Journal of Applied Mathematics 2014 (2014).
[7] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction." Reliability, IEEE Transactions on 62, no. 2 (2013): 434-443.
[8] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled, and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, pp. 1-4. IEEE, 2014.