

Nigerian Journal of Science Vol 51 - No 2 (2017) 77-91

Date Received: 27/09/2017
Corrected & Acceptance Date: 14/12/2017

Abstract

Computer programming is central to Computer Science and Computer Science Education. However, many students who take programming courses do not find it easy to master. This study evaluates the factors that most influence students' proficiency in programming and builds data mining models for predicting student competence in computer programming using three data mining classification techniques. Socio-economic and personal data, as well as final grades in a computer programming course, were obtained using a structured questionnaire from students of Computer Science in three Universities and one Polytechnic, all in Oyo State, Nigeria. These data were subjected to Decision Tree, Artificial Neural Network and Bayesian Network classification tasks using the WEKA tool. Results from this study indicate that the factors that most influence students' proficiency in programming are the student's interest level, self-efficacy, motivation, previous knowledge of programming and the lecturer's performance. Furthermore, the Decision Tree J48, Multilayer Perceptron and Naïve Bayes techniques/algorithms created models whose overall accuracies were 71.57%, 80.30% and 64.09%, respectively. The study serves as a guide to predicting students' performance, especially for those at risk of failing computer programming. Advisors, lecturers and management bodies could thus single out such students and provide appropriate advice, recommendations and resources.

Introduction

Computer programming is defined as “the art and science of creating (developing) computer programs (instructions) for the computer hardware to perform” (Akinola, 2011). Undoubtedly, computer programming is the area that connects the theoretical foundations of computer science, the source of problem-solving algorithms, to modern computer architectures on which the corresponding programs produce solutions.

Programming is thus central to computer science education. It behooves any tertiary institution of learning that offers Computer Science as a discipline to make programming an essential course or series of courses in its curriculum. Moreover, the ubiquity of computers in everyday life in the 21st century justifies the centrality of programming languages to Computer Science Education. To this end, many naturally assume that students with a degree in computer science have been well grounded in computer programming and hence should be quite proficient in one or more of the many existing programming languages.


A Comparative Analysis of Classification Techniques in Educational Data Mining Using Computer Programming Proficiency Indicators

S.O. AKINOLA AND B.G. ABRAHAM
Department of Computer Science, University of Ibadan, Ibadan, Oyo State, Nigeria

Corresponding Author Email: [email protected] and [email protected]

Keywords: Computer Programming, Naïve Bayes, Multilayer Perceptron, C4.5, Decision tree, Artificial Neural Network


Remarkably though, learning to program is difficult for many students. While some students learn to program easily and quickly, others flounder (Wiedenbeck et al., 2004). Research on the root causes of this disparity has identified several contributing factors, including self-efficacy, a student's mental model, previous computer programming experience and a good background in mathematics or science (Wilson & Shrock, 2001; Byrne & Lyons, 2001). Although there is substantial literature on the factors affecting students' competence in their initial programming course, Wiedenbeck et al. (2004) opined that “more research was needed to determine which were the key factors, how they interact with each other, and how they combine to affect course outcomes”.

Data mining, as a field of study and a process, has been successfully used for “extracting nontrivial, implicit, previously unknown and potentially useful information (such as knowledge, rules, constraints, regularities) from data in databases” in different domains (Chen, Han, & Yu, 1997). In recent years, the research area of Educational Data Mining (EDM) has evolved (Asanbe et al., 2016). Data mining has thus been used on student databases to extract useful information and provide knowledge hitherto unknown.

Related Works

Success has been recorded in the application of data mining in different fields of endeavour: medicine, crime, sports, commerce and government. In education, though, its application is relatively new. Since the first “Educational Data Mining” workshop in 2005, the field of Educational Data Mining (EDM) has grown substantially.

The website for this emerging field (www.educationaldatamining.org) defined educational data mining as “an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.”

Romero et al. (2008) compared twenty-five (25) different data mining techniques and methods for the classification of students in a web-based educational environment. The main objective was to classify students with equal final marks into different groups depending on the activities carried out in a web-based course. After testing, the best algorithms using the original numerical data were CART, GAP, GGP and NNEP (all recorded global percentages of correctly classified instances above 65%). Using categorical data, CART and C4.5 were the best (over 65% global PCC). Notably, none of the algorithms exceeded 70% global PCC.

Abu Tair and El-Halees (2012) investigated using educational data mining to discover knowledge from educational data, study the main attributes that may affect students' performance, improve graduate students' performance and ultimately overcome the problem of low grades among graduate students. Rule Induction and the Naïve Bayesian classifier were adopted as the classification methods of choice. The Rule Induction model yielded an accuracy of 71.25%, and the attributes that influence the category of the target class were “Secondary_School_Type”, “Matriculation_GPA”, “City”, “Gender” and “Speciality”. Applying the Naïve Bayesian classifier gave a model with an accuracy of 67.50%.

Akinola et al. (2012) investigated the relative importance of a student's pre-qualification ordinary-level subject results to proficiency in a particular programming course taken at the 200 level in the Computer Science Department of the University of Ibadan, Nigeria. In addition to the programming course result and the ordinary-level results, the results obtained in the first-year (100-level) physics and mathematics courses formed part of the data for the research. The researchers made use of the Multi-Layer Perceptron Feed-Forward Back-Propagation Neural Network technique for the data mining study. The results showed that candidates with a sound background in Physics and Mathematics (also in Further Mathematics) will generally perform efficiently in Computer Programming and could possibly go on to achieve excellent programmer status beyond the school years.

Olamiti and Osofisan (2015) asserted that prospective students of the Computer Science programme should be encouraged to put their all into all their subjects, especially Mathematics, at secondary school (junior and senior), and that they should be encouraged to take Further Mathematics at the Senior Secondary Certificate Examination (SSCE) level. Adeyemo and Kuyoro (2013), using a Decision Tree algorithm, identified parental conditions (education, occupation and marital status) as socio-economic factors affecting students' performance.

Kabakchieva (2013) set about revealing the high potential of data mining applications for university management by implementing data mining techniques on records collected from the University of National and World Economy (UNWE). This was part of a data mining project to seek patterns that would be invaluable for predicting students' performance. Based on the results achieved, the researcher emphasized that the decision tree classifier (J48), which had the overall highest accuracy, performed best, followed by the rule learner (JRip) and the kNN classifier. The Bayesian classifiers were the worst. However, the researcher noted that all the tested classifiers' overall accuracies were below 70%; thus the error rates were high and the predictions unreliable.

Pandey and Taruna (2014) presented a multilevel classification model using four classification techniques, namely Decision Tree, Lazy Learner (KNN), Neural Network and Naïve Bayes Tree (a hybrid classification technique between Decision Tree and Naïve Bayes classification). The researchers concluded that the Decision Tree J48 algorithm (multi-level classification model) is the most suitable classifier for predicting the grades of students.

Methodology / Data Mining Process

The methodology for this research closely follows the steps in the Knowledge Discovery in Databases (KDD) process model proposed by Fayyad et al. (1996). This process, which is both interactive and iterative, attempts to complete a particular discovery task.

The Data Set and Preprocessing

Data for this research were collected from databases of Computer Science students in four tertiary institutions in Oyo State: the University of Ibadan (Federal), The Polytechnic Ibadan (State Polytechnic), Ajayi Crowther University, Oyo, and Lead City University, Ibadan (Private Universities). The data were collected in two formats: the first was a structured questionnaire, and the second was the students' results in their very first computer programming course. For the four institutions, the data collected spanned the 2014/2015, 2013/2014 and 2012/2013 sessions. The data set had a total of 401 records and 15 attributes. Table 1 presents the attributes, descriptions and possible values.

Table 1: The Dataset Attributes, Description and Values

No. | Attribute | Data type | Description | Possible Values
0 | Name/Matric | alphanumeric | The name of the student and/or unique ID number | (unique per student)
1 | Gender | nominal | The gender of the student, male or female | {M, F}
2 | Age | numeric | The current age of the student | {A: integer value}
3 | SecSchType | nominal | The secondary school type the student attended | {public, private}
4 | SecToughtProg | nominal | Whether the secondary school taught a programming subject | {yes, no}
5 | InstitutionType | nominal | The institution type the student currently attends | {federal, state, private}
6 | PrevKnowledge | nominal | The level of the student's prior knowledge of programming concepts | {none, basic, good, very good}
7 | Lect.Performance | nominal | The student's assessment of the lecturer who taught their first programming course | {excellent, good, average, poor}
8 | InterestLevel | nominal | The student's self-assessed interest level while taking the course | {high, medium, low}
9 | Self-efficacy | nominal | The student's confidence level of succeeding in the course | {high, medium, low}
10 | Motivation | nominal | The student's primary motivation while undertaking the course | {challenging, to pass, love for programming}
11 | PersonalPC | nominal | Whether the student has a personal computer or laptop | {yes, no}
12 | Teamwork | nominal | Whether the student learned the programming course as part of a team | {yes, no}
13 | PracticeHours | nominal | The student's assessment of hours spent on programming weekly | {very much (15-21), much (8-14), fair (0-7)}
14 | Income/Allowance | nominal | The student's view of the allowance or income received while taking the course | {very sufficient, sufficient, insufficient}
15 | Grade | nominal | A discretized variable obtained from the student's score in the programming course | {success (50-100), failure (0-49)}

As part of data preprocessing, all but the 'Name/Matric' attribute were selected for use in the data mining task. This is because the student name or matric number presents personal information about the students and has a large variance (a duplicity of information), making it irrelevant to the data mining classification task. The attribute Grade (15), which is the target class to be predicted, was discretized and labelled as either success or failure based on the score of a student in the programming course. The label success covered scores from 50-100, while failure covered 0-49. In reality, students who scored 40-49 passed the course under the normal grading scheme, but only just. Since the goal of the research is predicting those who fail or are at risk of failing, these students who passed by a whisker were deemed to be at risk of failing. The software tool used for the implementation of this study is the Waikato Environment for Knowledge Analysis (WEKA), version 3.6.9.
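The discretization of raw scores into the binary Grade class described above can be sketched as follows (a minimal illustration using the stated 50-mark cut-off; the function name is ours, not part of the study):

```python
def discretize_grade(score: float) -> str:
    """Map a raw course score (0-100) to the binary Grade class.

    Scores of 50 and above are labelled 'success'; everything below,
    including the 40-49 'bare pass' band treated as at-risk, is 'failure'.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return "success" if score >= 50 else "failure"


# A few hypothetical scores for illustration:
print(discretize_grade(65))  # success
print(discretize_grade(45))  # failure -- passed the course, but deemed at risk
```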

Data Transformation

The dataset was subjected to four different attribute selection techniques to determine subsets of the attributes that could produce more accurate and compact prediction models. These attribute evaluator techniques are ChiSquaredAttributeEval, GainRatioAttributeEval, InfoGainAttributeEval and ReliefFAttributeEval. The Ranker search method was adopted for all four techniques, and default parameters were used in the evaluation. The attribute selection model used was stratified 10-fold cross-validation with a seed value of 1. To address the class imbalance in the data set, we employed the “weka.filters.supervised.instance.Resample” filter as provided in the WEKA software. This filter samples with replacement and produces a random subsample of a dataset, taking the class distribution into account in generating the sample. The default parameters for the filter were used.

Results and Discussion

In this section, we describe the results of applying three (3) data mining classification techniques to the preprocessed data of our case study: the C4.5 Decision Tree (implemented in WEKA as the J48 algorithm), the Multilayer Perceptron Artificial Neural Network and the Naïve Bayes Bayesian Network algorithm. Figure 1 shows a screenshot of the attributes evaluated with the ChiSquared AttributeEval technique; it displays the average merits and average ranks of the attributes.

Figure 1: Screenshot displaying the result of the ChiSquared AttributeEval

Table 2 shows the results obtained for all the evaluator techniques, juxtaposing the top-ranking feature subsets based on the four attribute evaluators. The best ranked is first and the least ranked is last.

Table 2: Attribute Evaluators and Ranked Attribute

Taking a holistic view of Table 2, the consistently top-ranked attributes across the four attribute evaluators, in order of merit or importance, are attributes 8, 9, 10, 5, 6, 7, 11 and 13: Interest Level, Self-efficacy, Motivation, Institution Type, Previous Knowledge, Lecturer Performance, Having a Personal PC and Practice Hours, respectively.

Figure 2: A chart visualizing the detailed performance measure of J48 Algorithm


This subset of the attributes, together with the attribute Grade (the response/target class), was selected for developing the prediction models.

C4.5 Decision Tree Model

Figure 2 and Table 3 show that the decision tree learning algorithm took 0.01 seconds to build the model, and the model generated an un-pruned tree of size 96 with 65 leaves. The model correctly classified 287 instances (71.5711%), while 114 instances (28.4289%) were incorrectly classified. The confusion matrix shows that class success was predicted as success in 219 instances (True Positives, TP), whereas 78 instances (False Positives, FP) were predicted as success instead of failure. This gives a class success precision of 0.737, meaning that of all the examples the classifier labeled as success, 73.7% were correct predictions, which speaks rather well of the classifier's exactness in classifying the success class.

As seen in Figure 2 and Tables 3, 4 and 5, the recall or TP Rate of 0.859 for class success indicates that 85.9% of the samples in class success were correctly classified as success. The recall for class failure, though, was 0.466. This gives a weighted average recall value of 0.716 for the model. The high success recall is due to the fact that there were few (36) False Negatives (FN). The F-Measure shows the balance between precision and recall; its value for the success class was 0.793 and for the failure class 0.544, giving a weighted average F-Measure of 0.703. Since the F-Measure reaches its best value at 1 and its worst at 0, there is quite a good balance between precision and recall for the J48 tree learner model. The ROC Area highlights the trade-off between the TP rate and the FP rate (how often the classifier incorrectly predicts success when the actual classification is failure). The J48 decision learner gave a ROC Area of 0.741, a fair performance.
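The J48 figures above can be reproduced directly from the reported confusion-matrix counts (TP = 219, FP = 78, FN = 36; the 68 true negatives are inferred from the 287 correct classifications, since 287 − 219 = 68):

```python
# Counts for the J48 model, with 'success' as the positive class.
# TN is inferred: 287 correct classifications minus 219 TP = 68.
TP, FP, FN, TN = 219, 78, 36, 68

accuracy = (TP + TN) / (TP + FP + FN + TN)
precision_success = TP / (TP + FP)
recall_success = TP / (TP + FN)
recall_failure = TN / (TN + FP)
f1_success = (2 * precision_success * recall_success
              / (precision_success + recall_success))

print(round(accuracy, 4))           # 0.7157
print(round(precision_success, 3))  # 0.737
print(round(recall_success, 3))     # 0.859
print(round(recall_failure, 3))     # 0.466
print(round(f1_success, 3))         # 0.793
```

Each printed value matches the corresponding figure reported for the J48 model.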

Table 3: Results of Modeling Student Data Using J48 Decision Tree Algorithm


Table 4: Confusion Matrix for C4.5 (J48) Algorithm

Table 5: Detailed performance measures for C4.5 (J48) Algorithm

Specific Rules Extraction

The J48 tree produced some interesting IF-THEN rules, a selection of which is presented:

IF InterestLevel = high and Self-efficacy = high THEN Grade = success.

IF InterestLevel = high and Self-efficacy = medium and InstitutionType = federal and practiceHours = very much THEN Grade = success.

IF InterestLevel = high and Self-efficacy = medium and InstitutionType = state and lect.Performance = excellent and personalPC = yes THEN Grade = success.

IF InterestLevel = high and Self-efficacy = medium and InstitutionType = state and lect.Performance = good and prevKnowledge = basic and PracticeHours = much THEN Grade = success.

IF InterestLevel = high and Self-efficacy = medium and InstitutionType = private THEN Grade = success.

IF InterestLevel = high and Self-efficacy = low and Motivation = to pass THEN Grade = failure.

…

IF InterestLevel = medium and Lect.Performance = excellent and prevKnowledge = none and Self-efficacy = low THEN Grade = failure.

IF InterestLevel = medium and Lect.Performance = good and Self-efficacy = high and InstitutionType = federal THEN Grade = success.

IF InterestLevel = medium and Lect.Performance = good and Self-efficacy = low and prevKnowledge = none THEN Grade = failure.

IF InterestLevel = medium and Lect.Performance = average and InstitutionType = state and PersonalPC = no and Motivation = challenging THEN Grade = failure.

…

IF InterestLevel = low and InstitutionType = federal or private THEN Grade = failure.
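A few of the extracted rules can be rendered as an executable rule set (a sketch only; just a subset of the tree's rules is encoded, the function name is ours, and cases no encoded rule covers fall through to None):

```python
def predict_grade(interest, self_efficacy=None, institution=None,
                  motivation=None, lect_performance=None, prev_knowledge=None):
    """Apply a subset of the J48 IF-THEN rules listed above.

    Returns 'success', 'failure', or None when no encoded rule fires.
    """
    if interest == "high":
        if self_efficacy == "high":
            return "success"
        if self_efficacy == "medium" and institution == "private":
            return "success"
        if self_efficacy == "low" and motivation == "to pass":
            return "failure"
    if interest == "medium":
        if (lect_performance == "good" and self_efficacy == "low"
                and prev_knowledge == "none"):
            return "failure"
    if interest == "low" and institution in ("federal", "private"):
        return "failure"
    return None


print(predict_grade("high", self_efficacy="high"))  # success
print(predict_grade("low", institution="federal"))  # failure
```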

Multilayer Perceptron Model

As captured in Figure 3 and Tables 6 to 8, the Multilayer Perceptron (MLP) model had True Positive (TP) and False Positive (FP) counts of 219 and 43 respectively, while the True Negative (TN) and False Negative (FN) counts were 103 and 36 respectively. The multilayer perceptron classifier took 6.9 seconds to build the model. It had an overall classification accuracy of 80.30%, having correctly classified 322 instances and incorrectly classified 79 instances.


Figure 3: A chart visualizing the detailed performance measure of Multilayer Perceptron Algorithm

The TP rate of 0.859 shows that 85.9% of the instances that were actually success were predicted as success, a good recall value for the class success. Similarly, the recall value for the failure class was 0.705 (70.5%), giving an overall weighted recall value of 0.803 (80.3%) for the MLP model. This is attributed to the low False Negative (FN) count. The precision of the classifier for classes success and failure (0.836 and 0.741 respectively) indicates that, of all the examples the classifier labeled as success, 83.6% were correct, and of the examples labeled as failure, 74.1% were predicted rightly. The F-measure values for classes success and failure were 0.847 and 0.723 respectively, a weighted average of 0.802. As this value tends toward 1, it indicates a good balance between precision and recall for the multilayer perceptron model.

Table 6: Results of Modeling Student Data Using Multilayer Perceptron Algorithm


Table 7: Confusion Matrix for Multilayer perceptron Algorithm

Table 8: Detailed performance measures for multilayer perceptron Algorithm

Naïve Bayes Model

From Figure 4 and Tables 9-11, the Naïve Bayes model, with an overall classification accuracy of 64.1%, took 0.01 seconds to build. Of the 401 instances, it correctly classified 257 and misclassified 144 (35.9% of the instances). The True Positive (TP) and True Negative (TN) counts were 182 and 74 respectively, while the False Positives (FP) and False Negatives (FN) had identical counts of 72 instances.

Figure 4: A chart visualizing the detailed performance measure of the Naive Bayes Algorithm


The detailed performance shows the class success precision to be 0.718, signifying that of all the instances the classifier labeled as success, 71.8% were actually success. Similarly, the class success recall (0.718) indicates that the classifier picked up 71.8% of all actual success instances. This results from the identical False Positive (FP) and False Negative (FN) counts of 72. The class failure had a precision value of 0.507 (50.7%), and likewise for its recall. The weighted averages of the precision and recall for both classes were thus 0.641 (64.1%). The ROC Area value was 0.701, a poor performance from the Naïve Bayes model.

Table 9: Results of modeling student data using Naïve Bayes Algorithm

Table 10: Confusion Matrix for Naïve Bayes Algorithm

Table 11: Detailed performance measures for Naïve Bayes Algorithm

Models Comparison

Table 12 shows the predicted accuracy of the classifiers relative to the class distribution. The MLP model had the highest prediction accuracy for class success at 83.59%, with the J48 measuring 73.73% and the Naïve Bayes the poorest at 71.76% for class success. The MLP was also the best predictor of class failure, getting 103 instances right, a percentage of 74.10%.


Table 12: Confusion Matrix of the Classifiers

As Table 13 shows, the J48 and Naïve Bayes classifiers were the fastest to build their models, using 0.01 seconds each, whereas the MLP used 6.9 seconds. The MLP's greater execution time is somewhat justified, as it had the highest overall accuracy rate of 80.30%, correctly classifying 322 instances of the total 401 in the dataset. The next best accuracy was the J48 model with 71.57%, having correctly classified 287 samples. The Naïve Bayes performed worst with 64.09% accuracy, as it classified 144 instances incorrectly. Figure 5 compares and contrasts the numbers of correctly and incorrectly classified instances with a stacked cylinder chart.

Table 13: Experimental Statistics Summary of the Classifiers


Figure 5: A chart representation of Correctly and Incorrectly classified instances of the dataset by the Classifiers

A bar chart of the Kappa Statistics for the three classifiers, demonstrated in Figure 6, clearly reveals the MLP as having the highest value of 0.5701. This indicates a moderate agreement between predicted and observed categorization, whereas the J48 and Naïve Bayes (NB) values show only a fair agreement.
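The reported MLP Kappa of 0.5701 can be checked from its confusion-matrix counts (TP = 219, FP = 43, FN = 36, TN = 103), since Kappa compares the observed agreement with the agreement expected by chance from the marginal totals:

```python
# MLP confusion-matrix counts, with 'success' as the positive class.
TP, FP, FN, TN = 219, 43, 36, 103
n = TP + FP + FN + TN  # 401 instances

observed = (TP + TN) / n  # overall accuracy, p_o

# Chance agreement p_e from the marginal totals of the confusion matrix.
pred_success, pred_failure = TP + FP, FN + TN
actual_success, actual_failure = TP + FN, FP + TN
expected = (pred_success * actual_success
            + pred_failure * actual_failure) / n**2

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 4))  # 0.5701
```

The result reproduces the Kappa statistic shown for the MLP in Figure 6.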

Figure 6: A bar chart of the Kappa Statistics of the classifiers


A focus on the TP rate values shows that while Naïve Bayes had the lowest TP rate for class success (0.718), the TP rate for class success is highest in both the J48 and MLP algorithms (0.859). Whereas the TP rate for class failure is highest for the MLP algorithm (0.705), the J48 TP rate for class failure, on the other hand, was the worst (0.466). This made the MLP the standout performer in terms of TP rate, with the best weighted average of 0.803. In a nutshell, the MLP performed better at picking up class success instances among all the class success instances, and far better at picking up class failure instances among all the class failure instances in the dataset.

The F-measure shows the level of balance between precision and recall in a model. The MLP model therefore gave the best balance of precision and recall for both classes success and failure (0.847 and 0.723 respectively).

The ROC Area highlights the trade-off between the TP rate and the FP rate. A classifier making random predictions has an expected area of 0.5. The J48 classifier gave a ROC Area value of 0.741, the MLP 0.865 and the NB 0.701. Since the area of a good to perfect ROC tends toward 1, the MLP gave the most systematic prediction, followed by the J48 and then the NB.
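The ROC area can equivalently be read as the probability that a randomly chosen success instance is scored above a randomly chosen failure instance; a minimal rank-based computation illustrates this (the toy labels and scores are ours, not taken from the study):

```python
def roc_auc(labels, scores):
    """ROC area via pairwise ranking: the fraction of (positive, negative)
    pairs in which the positive instance receives the higher score.
    Tied scores count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two failures (0) and two successes (1).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this reading, the MLP's 0.865 means a randomly drawn success instance outscores a randomly drawn failure instance 86.5% of the time.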

Precision answers the question: of all the instances a classifier labelled as success, what fraction was correct? The MLP classifier had the highest precision for class success (0.836); that is, about 84% of the instances it predicted as success were indeed success. Its precision for class failure was also the highest (0.741). The J48 performed next best, getting 73.7% (0.737) of its class success predictions right, with a precision of 0.654 for class failure. The NB had precisions of 0.718 and 0.507 for class success and class failure respectively, making it the weakest classifier on this measure. The weighted average precisions for the MLP, J48 and NB were thus 0.801, 0.707 and 0.641 respectively.
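Per-class precision is the diagonal count of the confusion matrix divided by the column total (the predicted-class total), and WEKA's weighted value again weights each class by its prevalence. A sketch with illustrative counts only:

```python
def precisions(matrix):
    """Per-class precision: diagonal count / column total (cols = predicted class)."""
    cols = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    return [matrix[j][j] / cols[j] for j in range(len(matrix))]

# illustrative counts only: rows = actual, cols = predicted, classes = [success, failure]
cm = [[110, 18],
      [25, 45]]
n = sum(sum(row) for row in cm)
per_class = precisions(cm)
weighted = sum(p * sum(cm[i]) / n for i, p in enumerate(per_class))
print([round(p, 3) for p in per_class], round(weighted, 3))
```

Unlike the weighted TP rate, the weighted precision does not reduce to the overall accuracy, which is why the two weighted figures reported for each classifier differ slightly.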

Conclusions

Results from this study indicate that the factors that most influence students' proficiency in programming were the students' interest level, self-efficacy, motivation, previous knowledge of programming and the lecturer's performance. This result confirms our earlier work using a Fuzzy Set Operations approach (Akinola and Nosiru, 2014). Furthermore, the C4.5 Decision Tree (J48), Multilayer Perceptron (MLP) and Naïve Bayes techniques created models whose overall accuracies were 71.57%, 80.30% and 64.09% respectively. The Multilayer Perceptron algorithm gave the best model relative to the J48 and Naïve Bayes algorithms on several performance measures, including precision, recall, ROC area and the Kappa statistic. Although it took the longest time to build its model, the MLP recorded the highest overall accuracy (80.30%), the lowest weighted average FP rate (0.239) and the highest weighted average TP rate and precision. Its ability to identify class failure, mirrored by its TN rate of 0.859, was matched only by the J48. Put succinctly, it gave the best predictive model in this study and can hence be used as a base for developing a student performance prediction system.

This study focused on students' personal, social and economic factors in building a predictive model of student competence in computer programming. The dataset used, however, was obtained from tertiary institutions in a single state in Nigeria. For future work, it is recommended that the dataset be extended to cover a whole region, e.g. the South-West, Western or South-South regions, so that predictive models on these same factors can be built and evaluated for a given region.

References

Abu Tair, M. M., & El-Halees, A. M. (2012). Mining educational data to improve students' performance: A case study. International Journal of Information and Communication Technology Research, 2(2), 140-146.

Adeyemo, A. B., & Kuyoro, S. O. (2013). Investigating the effect of students' socio-economic/family background on students' performance in tertiary institutions using decision tree algorithm. Journal of Life and Physical Sciences, 4(2), 61-78.

Akinola, S. O. (2011). Java Companion for Beginners. Olakem Educational Books.

Akinola, S. O., Akinkunmi, B. O., & Alo, T. S. (2012). A Data Mining Model for Predicting Computer Programming Proficiency of Computer Science Undergraduate Students. Africa Journal of Comp & ICTs, 5(1), 43-52.

Akinola, S.O. & Nosiru, K.A (2014). Factors Influencing Students' Performance in Computer Programming: A Fuzzy Set Operations Approach. International Journal of Advances in Engineering & Technology, Sept., 2014. ISSN: 22311963.

Asanbe M. O., Osofisan A. O. and William W. F. (2016). Teachers' Performance Evaluation in Higher Educational Institution using Data Mining Technique. International Journal of Applied Information Systems 10(7):10-15.

Byrne, P. & Lyons, G. (2001). The effect of student attributes on success in programming. ITiCSE: Proceedings of the 6th annual conference on Innovation and technology in computer science education. ACM Press, NY, 49-52.

Chen, M.S., Han, J., & Yu, P. S. (1997). Data Mining: An Overview from Database Perspective. National Taiwan University, Taipei, Taiwan.

Romero, C., Ventura, S., Espejo, P. G., & Hervas, C. (2008). Data mining algorithms to classify students. In Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, 8-17.

Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine (Fall 1996), 37-53.

Kabakchieva, D. (2013). Predicting student performance by using data mining methods for classification. Cybernetics and Information Technologies, 13(1), 61-72. Online ISSN: 1314-4081.

Olamiti, A. O. and Osofisan, A. O. (2015). Academic Background of Students and Performance in a Computer Science Programme in a Nigerian University, accessed January 2018 from http://sci.ui.edu.ng/sites/default/files/Academic%20Background%20of%20Students%20and%20Performance%20in%20a%20Computer%20Science%20Programme%20in%20a%20Nigerian%20University.pdf

Pandey, M., & Taruna, S. (2014). A multi-level classification model pertaining to the student's academic performance prediction. International Journal of Advances in Engineering & Technology (IJAET) Sept., 2014. ISSN: 22311963.

Romero, C., Ventura, S., Pechenizkiy, M., & Baker, R. S. J. D. (2010). Handbook of Educational Data Mining. Data Mining and Knowledge Discovery Series. CRC Press.

Wiedenbeck, S., Labelle, D., & Kain, V. N. R. (2004). Factors affecting course outcomes in introductory programming. In 16th Workshop of the Psychology of Programming Interest Group, Carlow, Ireland, April 2004.

Wilson, B.C. & Shrock, S. (2001). Contributing to success in an introductory computer science course: a study of twelve factors. In Proceedings of the 32nd SIGCSE Technical Symposium on Computer Science Education, ACM Press, NY, pp. 184-188.
