
Thesis Final Presentation


Institute of Information Technology, University of Dhaka

SELECTION AND REPRESENTATION OF ATTRIBUTES FOR SOFTWARE DEFECT PREDICTION

Supervised by: Dr. Mohammad Shoyaib, Associate Professor

Presented by: Sadia Sharmin, BSSE-0426

May 1, 2023

CONTENTS

Background

Motivation

Problem Specification

Objectives of Research

Literature Review

Methodology

Result Analysis and Discussion

Future Work


BACKGROUND

Software Defect: Any flaw or imperfection in a software work product or software process.

Software Defect Prediction: An approach for finding the defective parts of a software product early, before testing or releasing it.


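AN OVERVIEW OF THE SOFTWARE DEFECT PREDICTION PROCESS

Figure: Data Set → Pre-processing → Attribute Selection → Training Data and Testing Data. The Training Data is used for Prediction Model Training, and the trained model is applied to the Testing Data to produce the Prediction Result.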


MOTIVATION

Identifying software bugs at an early stage

Allocating test resources efficiently

Minimizing the cost of software development

Improving the quality and productivity of software


WHY PRE-PROCESSING IS NEEDED

Noisy data

Outliers

Missing or conflicting values

Inconsistency


WHY ATTRIBUTE SELECTION IS NEEDED

Attributes are not equally important

There is no standard set of attributes


OBJECTIVES OF RESEARCH

To find out how existing pre-processing techniques can be used more efficiently together with attribute selection methods.

To survey existing methods and propose a suitable attribute selection method.


A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1]

Defect prediction framework:

Data pre-processor: Log filtering

Feature selector: Forward Selection, Backward Elimination

Learning algorithms: Naïve Bayes, J48, OneR
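Forward selection and backward elimination are both greedy wrapper searches over attribute subsets. As a rough illustration only (not the implementation from [1]), a minimal Java sketch of backward elimination follows. Attributes are assumed to be integer indices, and `evaluate` is a hypothetical callback standing in for "train the learner (e.g., Naïve Bayes) on this subset and return its score".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Greedy backward elimination over attribute indices (illustrative sketch). */
final class BackwardElimination {

    /**
     * Starts from all attributes and repeatedly removes the single attribute whose
     * removal gives the highest score, stopping when no removal improves the score.
     * evaluate is a hypothetical stand-in for training and scoring a learner.
     */
    static List<Integer> select(int numAttributes, ToDoubleFunction<List<Integer>> evaluate) {
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            current.add(i);
        }
        double bestScore = evaluate.applyAsDouble(current);

        while (current.size() > 1) {
            List<Integer> bestCandidate = null;
            double bestCandidateScore = Double.NEGATIVE_INFINITY;
            // Try removing each remaining attribute in turn.
            for (int k = 0; k < current.size(); k++) {
                List<Integer> candidate = new ArrayList<>(current);
                candidate.remove(k); // int argument: removes by position
                double score = evaluate.applyAsDouble(candidate);
                if (score > bestCandidateScore) {
                    bestCandidateScore = score;
                    bestCandidate = candidate;
                }
            }
            if (bestCandidateScore <= bestScore) {
                break; // no single removal improves the score
            }
            current = bestCandidate;
            bestScore = bestCandidateScore;
        }
        return current;
    }
}
```

Forward selection is the mirror image: start from the empty set and add, at each step, the attribute that most improves the score. Either way, every step retrains the learner once per remaining attribute, so the cost grows quickly with the number of attributes.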


A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1]

Small changes to the data representation can have a major impact.

Feature selection that adds or removes one attribute at a time is not a practical solution for large datasets.

Different learning schemes should be chosen carefully for different datasets.

There is no clear indication of which combination should be used for a particular dataset.


HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION? [2]

Five filter-based feature ranking techniques

Methodology:

Min-max normalization

Pairing each independent attribute with the class attribute

Ranking the attributes

Subset selection (top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20 ranked attributes)
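Min-max normalization rescales each attribute linearly to [0, 1] via x' = (x − min) / (max − min). A minimal Java sketch for a single attribute column, assuming the column is held as a `double[]`; mapping a constant column to all zeros is this sketch's own choice, not something [2] specifies.

```java
/** Min-max normalization of one attribute column (illustrative sketch). */
final class MinMaxNormalizer {

    /**
     * Rescales values so the smallest becomes 0 and the largest becomes 1:
     * x' = (x - min) / (max - min). A constant column is mapped to zeros here
     * (an assumption of this sketch; the formula would otherwise divide by 0).
     */
    static double[] normalize(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }
}
```

For example, the column 2, 4, 6 becomes 0, 0.5, 1.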


HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION? [2]

On average, three metrics can be enough to build an effective prediction model.

Eliminating 98.5% of the available metrics improves the result.

It is not confirmed that this holds for all datasets.


CHOOSING SOFTWARE METRICS FOR DEFECT PREDICTION: AN INVESTIGATION ON FEATURE SELECTION TECHNIQUES [3]

Hybrid attribute selection approach:

Feature ranking

Feature subset selection

Removing 85% of the metrics can enhance the performance of the prediction model.


METHODOLOGY

SAL: Selection of Attribute with Log filtering

Pre-process the data with a logarithmic filter → Rank the attributes → Select the best set of attributes → Build the predictor


PRE-PROCESSING

Each numeric attribute value $n$ is replaced by $\ln(n + \varepsilon)$, where $\varepsilon = 0.01$.
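A minimal Java sketch of this filter applied to one attribute column. It assumes attribute values are non-negative (as static code metrics such as line counts usually are), so that n + 0.01 is always positive; the class and method names are illustrative, not taken from the thesis implementation.

```java
/** Logarithmic filter for the SAL pre-processing step (illustrative sketch). */
final class LogFilter {

    /** Offset added before taking the logarithm (0.01, as on the slide). */
    static final double EPSILON = 0.01;

    /** Returns ln(n + EPSILON); assumes n >= 0, typical for static code metrics. */
    static double apply(double n) {
        if (n < 0) {
            throw new IllegalArgumentException("log filter expects non-negative values: " + n);
        }
        return Math.log(n + EPSILON);
    }

    /** Applies the filter element-wise to one attribute column. */
    static double[] apply(double[] column) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = apply(column[i]);
        }
        return out;
    }
}
```

The offset keeps zero-valued metrics finite: a value of 0 maps to ln(0.01) instead of negative infinity.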

ATTRIBUTE RANKING

Attributes: A1, A2, A3, A4, A5, ..., An

Step 1. Compute the individual Balance value of each attribute:

| Attribute | Individual Balance value |
|---|---|
| A1 | 0.034 |
| A2 | 0.034 |
| A3 | 0.456 |
| A4 | 0.348 |
| A5 | 0.784 |
| ... | ... |
| An | 0.789 |

Step 2. Form the pairwise combinations of attributes (A1A2, A1A3, ..., A3A1, A3A2, ..., AmAn) and compute the Balance value of each pair:

| Attribute pair | Pairwise Balance value |
|---|---|
| A1A2 | 0.896 |
| A1A3 | 0.734 |
| ... | ... |
| A3A1 | 0.587 |
| A3A2 | 0.669 |
| ... | ... |
| AmAn | 0.897 |

Step 3. Compute the average Balance value for each attribute:

Average Balance value = (Individual Balance value + average of the attribute's pairwise Balance values) / 2

| Attribute | Average Balance value |
|---|---|
| A1 | 0.765 |
| A2 | 0.534 |
| A3 | 0.679 |
| A5 | 0.887 |
| A4 | 0.869 |
| ... | ... |
| An | 0.897 |

Step 4. Sort the attributes by average Balance value in decreasing order:

| Attribute | Average Balance value |
|---|---|
| A5 | 0.887 |
| A4 | 0.869 |
| A10 | 0.765 |
| A8 | 0.750 |
| A9 | 0.696 |
| ... | ... |
| An | 0.523 |
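The ranking step can be sketched in Java as follows, assuming the individual and pairwise Balance values have already been obtained (for example, by training the classifier on each attribute alone and on each pair). Two readings are assumptions of this sketch: an attribute's pairs are taken to be its pairings with every other attribute (n − 1 of them), and `pairwise[i][j]` is not required to be symmetric, since the example lists different values for A1A3 and A3A1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Attribute ranking by average Balance value (illustrative sketch). */
final class AttributeRanking {

    /**
     * individual[i] is the Balance value of attribute i used alone.
     * pairwise[i][j] (i != j) is the Balance value of the pair (Ai, Aj).
     * average[i] = (individual[i] + mean over j != i of pairwise[i][j]) / 2.
     */
    static double[] averageBalance(double[] individual, double[][] pairwise) {
        int n = individual.length;
        double[] avg = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sum += pairwise[i][j];
                }
            }
            // With a single attribute there are no pairs; falling back to the
            // individual value is an assumption of this sketch.
            double pairMean = n > 1 ? sum / (n - 1) : individual[i];
            avg[i] = (individual[i] + pairMean) / 2.0;
        }
        return avg;
    }

    /** Returns attribute indices sorted by average Balance value, highest first. */
    static List<Integer> rank(double[] avg) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < avg.length; i++) {
            order.add(i);
        }
        // List.sort is stable, so ties keep their original index order.
        order.sort(Comparator.comparingDouble((Integer i) -> avg[i]).reversed());
        return order;
    }
}
```

Note that Step 2 requires n(n − 1)/2 classifier runs for unordered pairs, which is quadratic but still far cheaper than an exhaustive subset search.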

SELECT BEST SET OF ATTRIBUTES

Ranking of attributes: A5, A4, A10, A8, A9, ..., An

The best set of attributes is built by walking down the ranking:

A5 (1st ranked, Balance value 0.887) starts the best set. Best set: A5, Balance value 0.887.

A4 (2nd ranked) is added. The combined Balance value of A5A4 is 0.891 (new) > 0.887 (previous), so A4 is kept. Best set: A5, A4, Balance value 0.891.

A10 (3rd ranked) is added. The combined Balance value of A5A4A10 is 0.856 (new) < 0.891 (previous), so A10 is discarded. Best set remains A5, A4, Balance value 0.891.

This process continues for the remaining ranked attributes (A8, A9, ..., An).

Final best set of attributes: A5, A4, A9, A12, A7
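The selection walk can be sketched in Java as below. `combinedBalance` is a hypothetical callback standing in for training Naïve Bayes on a subset and computing its Balance value. Treating a tie (new value equal to previous value) as "discard" is an assumption; the slides only show the strictly greater and strictly smaller cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Greedy selection of the best attribute set from a ranking (illustrative sketch). */
final class BestSetSelection {

    /**
     * Walks the ranked attributes in order. The top-ranked attribute starts the set;
     * each following attribute is added tentatively and kept only if the combined
     * Balance value of the enlarged set is strictly greater than the current one.
     */
    static List<Integer> select(List<Integer> ranking, ToDoubleFunction<List<Integer>> combinedBalance) {
        List<Integer> best = new ArrayList<>();
        if (ranking.isEmpty()) {
            return best;
        }
        best.add(ranking.get(0));
        double bestValue = combinedBalance.applyAsDouble(best);

        for (int k = 1; k < ranking.size(); k++) {
            List<Integer> candidate = new ArrayList<>(best);
            candidate.add(ranking.get(k));
            double value = combinedBalance.applyAsDouble(candidate);
            if (value > bestValue) {
                // Improvement (e.g. A5A4: 0.891 > 0.887): keep the new attribute.
                best = candidate;
                bestValue = value;
            }
            // Otherwise (e.g. A5A4A10: 0.856 < 0.891) the attribute is discarded.
        }
        return best;
    }
}
```

Feeding it the ranking A5, A4, A10 with the combined Balance values from the example (0.887, 0.891, 0.856) returns the set A5, A4. Unlike full forward selection, each ranked attribute is tried only once, so the walk needs just one classifier run per attribute.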


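PERFORMANCE MEASUREMENT SCALES

Confusion matrix (rows: actual class, columns: predicted class):

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |

Figure: ROC curve plotting the True Positive rate (y-axis, 0 to 1) against the False Positive rate (x-axis, 0 to 1). The performance measure is the Area Under the ROC curve (AUC).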
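The rates on the ROC axes come straight from the confusion matrix, and AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive (defective) instance is scored above a randomly chosen negative one, with ties counting one half. A minimal Java sketch of both, assuming the classifier outputs a numeric score per instance; this is the standard rank-based formulation, not necessarily how WEKA computes AUC internally.

```java
/** ROC quantities from a confusion matrix and AUC from scores (illustrative sketch). */
final class RocMeasures {

    /** True positive rate = TP / (TP + FN); assumes at least one actual positive. */
    static double truePositiveRate(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    /** False positive rate = FP / (FP + TN); assumes at least one actual negative. */
    static double falsePositiveRate(int fp, int tn) {
        return (double) fp / (fp + tn);
    }

    /**
     * AUC as the fraction of (defective, non-defective) pairs in which the
     * defective instance gets the higher score; ties count 1/2.
     */
    static double auc(double[] scores, boolean[] defective) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!defective[i]) {
                continue;
            }
            for (int j = 0; j < scores.length; j++) {
                if (defective[j]) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one defective and one non-defective instance");
        }
        return wins / pairs;
    }
}
```

For scores 0.9, 0.8, 0.3, 0.1 with the first and third instances defective, three of the four defective/non-defective pairs are ordered correctly, so the AUC is 0.75.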


RESULTS AND DISCUSSION

Data sets: NASA MDP repository and PROMISE repository

Classifier: Naïve Bayes

Performance metrics: Balance, AUC (Area Under the ROC Curve)

Programming language: Java

Machine learning tool: WEKA
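The slides do not define Balance. A common definition in the defect-prediction literature measures the normalized distance from the ideal ROC point (false positive rate pf = 0, true positive rate pd = 1): balance = 1 − sqrt(pf² + (1 − pd)²) / √2. Assuming that definition, a minimal Java sketch computing it from the confusion-matrix counts:

```java
/** Balance performance measure (sketch of an assumed, commonly used definition). */
final class BalanceMeasure {

    /**
     * balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2), where
     * pd = TP / (TP + FN) (probability of detection) and
     * pf = FP / (FP + TN) (probability of false alarm).
     * It is 1 at the ideal point (pf = 0, pd = 1) and 0 at the worst (pf = 1, pd = 0).
     * Assumes at least one actual positive and one actual negative instance.
     */
    static double balance(int tp, int fn, int fp, int tn) {
        double pd = (double) tp / (tp + fn);
        double pf = (double) fp / (fp + tn);
        return 1.0 - Math.sqrt(pf * pf + (1.0 - pd) * (1.0 - pd)) / Math.sqrt(2.0);
    }
}
```

Because it penalizes both missed defects and false alarms, Balance stays low for a classifier that simply predicts "non-defective" everywhere on an imbalanced dataset, even when its accuracy looks high.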


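RESULTS AND DISCUSSION

Comparison of AUC values of different methods

| Data set | Wahono [4] | Abaei [5] | Ren [6] (lowest) | Ren [6] (highest) | SAL (proposed) |
|---|---|---|---|---|---|
| CM1 | 0.702 | 0.723 | 0.550 | 0.724 | 0.7946 |
| KC1 | 0.79 | 0.790 | 0.592 | 0.800 | 0.8006 |
| KC2 | - | - | 0.591 | 0.796 | 0.8449 |
| KC3 | 0.677 | - | 0.569 | 0.713 | 0.8322 |
| KC4 | - | - | - | - | 0.8059 |
| MC1 | - | - | - | - | 0.8110 |
| MC2 | 0.739 | - | - | - | 0.7340 |
| MW1 | 0.724 | - | 0.534 | 0.725 | 0.7340 |
| PC1 | 0.799 | - | 0.692 | 0.882 | 0.8369 |
| PC2 | 0.805 | - | - | - | 0.8668 |
| PC3 | 0.78 | 0.795 | - | - | 0.8068 |
| PC4 | 0.861 | - | - | - | 0.9049 |
| PC5 | - | - | - | - | 0.9624 |
| JM1 | - | 0.717 | - | - | 0.7167 |
| AR1 | - | - | - | - | 0.8167 |
| AR3 | - | - | 0.580 | 0.699 | 0.8590 |
| AR4 | - | - | 0.555 | 0.671 | 0.8681 |
| AR5 | - | - | 0.614 | 0.722 | 0.925 |
| AR6 | - | - | - | - | 0.7566 |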


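RESULTS AND DISCUSSION

Comparison of Balance values of different methods

| Data set | Song [1] | Wang [7] | Jobaer [8] | SAL (proposed) |
|---|---|---|---|---|
| CM1 | 0.695 | 0.663 | 0.5500 | 0.680 |
| JM1 | 0.585 | 0.678 | - | 0.6152 |
| KC1 | 0.707 | 0.718 | - | 0.7244 |
| KC2 | - | 0.753 | - | 0.7835 |
| KC3 | 0.708 | 0.693 | 0.6037 | 0.7529 |
| KC4 | 0.691 | - | - | 0.7036 |
| MC1 | 0.793 | - | - | 0.6904 |
| MC2 | 0.614 | 0.620 | - | 0.6847 |
| MW1 | 0.661 | 0.636 | 0.7202 | 0.6577 |
| PC1 | 0.668 | 0.688 | 0.5719 | 0.7040 |
| PC2 | - | - | 0.7046 | 0.7468 |
| PC3 | 0.711 | 0.749 | 0.7114 | 0.7232 |
| PC4 | 0.821 | 0.854 | 0.7450 | 0.8272 |
| PC5 | 0.904 | - | - | 0.9046 |
| AR1 | 0.411 | - | - | 0.6651 |
| AR3 | 0.661 | - | - | 0.8238 |
| AR4 | 0.683 | - | - | 0.7051 |
| AR6 | 0.492 | - | - | 0.5471 |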


FUTURE WORK

Cross-project defect prediction

Using other publicly available datasets


REFERENCES

[1] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general software defect-proneness prediction framework." IEEE Transactions on Software Engineering 37, no. 3 (2011): 356-370.

[2] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics should be selected for defect prediction?" In FLAIRS Conference, 2011.

[3] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of filter attribute selection techniques for software quality classification." In IEEE International Conference on Information Reuse & Integration (IRI '09), pp. 272-277. IEEE, 2009.

[4] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic feature selection for software defect prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.

[5] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.

[6] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using machine learning." Journal of Applied Mathematics 2014 (2014).


[7] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction." IEEE Transactions on Reliability 62, no. 2 (2013): 434-443.

[8] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled, and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In International Conference on Informatics, Electronics & Vision (ICIEV 2014), pp. 1-4. IEEE, 2014.
