Single and Ensemble Classification for Predicting …...Zarqa, Jordan Ala’a Al-shdaifat2 Department of Computer Science The Hashemite University Zarqa, Jordan Abstract—Classiﬁcation

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 7, 2020

Single and Ensemble Classification for PredictingUser’s Restaurant Preference

Esra’a Alshdaifat1Department of Computer Information System

The Hashemite UniversityZarqa, Jordan

Ala’a Al-shdaifat2Department of Computer Science

The Hashemite UniversityZarqa, Jordan

Abstract—Classification is one of the most attractive andpowerful data mining functionalities. Classification algorithms areapplied to real-world problems to produce intelligent predictionmodels. Two main categories of classification algorithms can beadopted for generating prediction models: Single and Ensembleclassification algorithms. In this paper, both categories are utilizedto generate a novel prediction model to predict restaurantcategory preferences. More specifically, the central idea espousedin this paper is to construct an effective prediction model,using Single and Ensemble classification algorithms, to assistpeople to determine the best relevant place to go based on theirdemographic data, income level and place preferences. Therefore,this paper introduces a new application of classification task.According to the reported experimental results, an effectiveRestaurant Category Preferences Prediction Model (RCPPM)could be generated using classification algorithms. In addition,Bagging Homogeneous Ensemble classification produced the mosteffective RCPPM.

Keywords—Classification; data mining; ensemble algorithms;restaurant preferences

I. INTRODUCTION

With the increasing accessibility of innumerable data col-lections, the extraction of interesting patterns from such databecomes a necessity. Data mining involves extracting interest-ing and helpful patterns from enormous amount of data [1].Classification is a well-known data mining functionality thatrefers to the process of generating a prediction model and usingit to predict categories for new unseen samples. More specifi-cally, classification can be considered as a three-step process.The first step commences with generating the prediction modelusing the “training” dataset that comprises a set of samples,where each sample is associated with a categorical class label.The classification problems can be differentiated according to:(i) the number of class labels featured in the dataset and (ii)the number of the class labels associated with each samplein the dataset. With respect to the number of labels featuredin the dataset two kinds of classification problems can berecognized: binary and multi-class classification problems. Inbinary classification problems, the considered dataset includesonly two labels, while more than two labels featured inthe multi-class classification problems. Regarding the numberof labels associated with each sample in the dataset, alsotwo types of classification problems can be distinguished:single-label and multi-label classification problems. When eachsample in the dataset is associated with exactly one labelthen we have a single-label classification process. Whilst, ifseveral labels can be associated with one sample then we

have a multi-label classification process. Several classificationalgorithms can be utilized to produce the prediction modelfor each classification problem. After generating the predictionmodel, the next step is the evaluation in which the performanceof the generated prediction model is assessed to determine itsapplicability to be used for predicting class labels for newsamples. Several measures can be used to evaluate predictionmodels effectiveness; accuracy and Area Under the ROC Curve(AUC) are the most widely used measures [2], [3]. Based onthe values obtained from evaluation measures, a decision canbe drawn regarding whether or not to utilize the model forfuture prediction. The last step in the classification processis the model usage, where the prediction model is utilizedto predict class labels for new unseen data. Classificationhas been employed in many application domains, examplesof application domains include: text categorization [4], bioin-formatics [5], manufacturing [6], e-learning evaluation sys-tem [7], medical diagnosis [8], data management [9], musiccategorization [10] and movie genre prediction [11]. Amongthese music categorization and movie genre predictions orgenre preferences prediction [12], [13] could be considered asentertainment applications of classification. To the best of ourknowledge, no previous work utilized classification algorithmsfor predicting restaurant category preferences.

In this paper, a novel application of classification is in-troduced. Classification algorithms are utilized to generateRestaurant Category Preferences Prediction Model (RCPPM).RCPPM could be considered as an entertainment applicationof data mining. Using RCPPM the category of the preferredrestaurant could be predicted for the user relying on his/herdemographic data, income level and place preferences. Thiswould help people to know the most suitable restaurant cat-egory for them without wasting time trying several places orsearching among a huge amount of the available options. Tothis end: (i) a novel dataset was collected, using a survey, inorder to build the desired prediction model and (ii) severalclassification styles, i.e. single and ensemble classificationalgorithms were utilized. The RCPPM is a single-label multi-class classification. More specifically, each sample (user) is as-signed with a single class label (preferred restaurant category)from several available categories. It is interesting to note herethat RCPPM could be utilized as a “recommender system”that suggests a set of real places to the user. More specifically,RCPPM could be linked with a database comprising realplaces, in a specific country, that combined with categories(class labels). The recommendation process commences withacquiring features from the user, and then the RCPPM predicts

www.ijacsa.thesai.org 682 | P a g e


the category of the preferred place relying on the givenfeatures. After that, all the real places stored in the databaseand categorized as the predicted category will be presented tothe user.

The remainder of this paper is organized as follows. SectionII supplies the reader with the essential background to the workpresented in this study. Section III shows the methodology thathas been followed to generate the RCPPM. Section IV presentsan overview of the main characteristics of the dataset used togenerate the RCPPM. Section V presents the obtained resultsfollowed by Section VI with the conclusion of the presentedwork and directions for future research.

II. BACKGROUND

Classification is an interesting and challenging researcharea. Several researchers directed their research work on ap-plying classification algorithms to real-word problems due tothe potential benefits that can be summarized by producingprediction models that can predict a solution to each instancein the considered problem. As noted in the introduction tothis paper, much research work has been conducted on variousdomains such as medical, biological, social and entertainmentdomains. In order to apply classification algorithms to real-world problems, the researcher should be knowledgeable aboutthe available classification algorithms. In this section, thenecessary background regarding classification algorithms isprovided to the reader. Classification algorithms can be dividedinto two main categories: (i) “Single” classification algorithmsand (ii) “Ensemble” classification algorithms. Commencingwith Single classification algorithms, where only one classifier,that generated using one classification algorithm, is used forpredicting output (class label). Several algorithms are availablefor this purpose, the most vastly used algorithms are:

• Naıve Bayes (NB) algorithms, which generate proba-bilistic classifiers relying on Bayes’ theorem.

• Decision Tree (DT) algorithms, which produce deci-sion tree classifiers where none-leaf nodes representfeatures (input) and leaf-nodes represent class labels(output).

• Rule-Based (RB) algorithms, which generate classi-fiers comprised of a set of “If-Then” rules. Features(input) are presented at the If side, while class labels(output) at the Then side.

• k-Nearest Neighbor (kNN) algorithms, in which thegenerated classifiers are referred to as lazy classifiers,because no classification models are generated. Classlabels (output) are predicted based on similarity.

• Artificial Neural Network (ANN) algorithms, whichproduce sophisticated mathematical classifiers thatcomprised of connected input/output units (neurodes)and communication channels (connections).

• Support Vector Machine (SVM) algorithms, whichgenerate classifiers by finding a “hyperplane” thatdistinctly distinguishes the two classes featured in thedataset.

With respect to Ensemble classification, several classifierscooperate together to output a more effective prediction than

what can be acquired from using a single classifier. If thebase classifiers within the Ensemble are generated using oneclassification algorithm, then the Ensemble is referred to as“Homogeneous”. While if the base classifiers are produced us-ing more than one classification algorithm, then the ensembleis called “Heterogeneous” [14]. Any classification algorithm,such as DT, NB and SVM could be used to construct the baseclassifiers within the Ensemble. Three fundamental methodsare usually used to combine the results produced by theindividual classifiers: weighted averaging, majority voting andaveraging [15]. Numerous researchers provided theoretical andpractical evidences that Ensemble generally produces moreeffective prediction than their base classifiers when they areused alone (single classification) [14], [16], [17]. The mostwidely used Ensemble classification algorithms are:

• Bagging, in which several classifiers are constructedin parallel, using different variations of the considereddataset. To output prediction, voting is adopted tocombine results from the trained classifiers [18], [19].

• Boosting, in which several classifiers are generated se-quentially, the importance of the sequential connectionis to use the information acquired by one classifier toenhance the training process of the next classifier [19],[20].

In this paper, several Single and Ensemble classification algo-rithms are utilized to generate the desired RCPPM.

III. THE ADOPTED EXPERIMENTAL METHODOLOGY

This section presents the followed methodology to producethe desired RCPPM. The first and the main step in the adoptedmethodology is obtaining and preparing the dataset that willbe used to train the classifier. The next section describes themain characteristics of the collected dataset and the consideredpreprocessing. Once the dataset is preprocessed, it will be fedto one of the classification algorithms to produce the predictionmodel. In this study, several Single and Ensemble classificationalgorithms have been utilized and this will be explained in theexperiment section. The last step in the adopted methodology isto evaluate the effectiveness of the generated models, in orderto decide the “best” model and its applicability to be used forfuture prediction. In this work, accuracy and Area Under theROC Curve (AUC) metrics have been utilized for assessingthe performance of the constructed prediction models. Theaccuracy is a simple metric that measures the percentage of thesamples correctly predicted by the prediction model. While theAUC is a robust measure to evaluate the overall effectivenessof the prediction model by measuring the area under the ROCcurve which plots true positive rate and false positive rate [1].

IV. DATASET DESCRIPTION

This section presents an overview of the main characteris-tics of the dataset that were used to generate the RCPPM. Theconsidered dataset was collected using a survey that coversperson demographic data, income level and place preferences.Table I presents the extracted features, with a brief descriptionof each. The main goal is to build a prediction model to predictthe user-preferred restaurant category.



TABLE I. THE EVALUATION DATASET DESCRIPTION

Feature Brief Description Type Values/Range

Age The age of the person Nominal {>18, 18-25, 26-35, >35}

Education Level The educational level of the person Nominal {School, Collage, B.S,Master and PhD}

Work Indicates whether the person works or not Nominal/Binary {yes, no}

Income level The income level of the person per month in Jordanian Dinar Nominal {<50, 50-100, 100-300, 300-500, 500-1000,>1000}

Gender The gender of the person Nominal/Binary {female, male}

Place Design The design of the place preferred by the person Nominal {traditional, classic, mod-ern}

Atmosphere The preferred atmosphere for the person in terms of quiet or loud Nominal/Binary { quiet, loud}

City The city that the person prefers when he/she wants to go to arestaurant

Nominal {Amman, Zarqa, Irbid,Jerash}

Average Spending The average amount of money that the person spends, in JordanianDinar, when going to restaurants

Nominal {<5, 5-10, 10-20, >20}

Hang-out reason Indicates the usual reason(s) for going to restaurants with respectto the person

Nominal * {Reading, Dating, Meet-ing, Parties, Studying}

Music Kind Indicates the preferred person’s music kind in the place he/shewould like to go to

Nominal {Background, DJ, No mu-sic, Live music}

Service Indicates whether the person prefers table-service or self-servicerestaurants

Nominal/Binary {Table-service, Self-service}

Go with Indicates with whom the person prefers to go to restaurants Nominal * {family, friends, co-workers, nobody}

Food preferences Refers to the person’s preferred food kind(s) Nominal * {fast food, American, Ital-ian, Middle East, Chinese}

Meal Refers to the usual meal or food category the person prefers to eatat restaurants

Nominal * {Breakfast, Lunch, Din-nar, Deserts, Drinks}

Sitting Preferences Indicates whether the person prefers to sit in or out in the restaurant Nominal {Inside, Outside}

Seating Preferences Refers to the kind of the furniture that available in the restaurantthe person prefers to go to

Nominal {Chairs, Couches, Both}

Parking Indicates the availability of a parking service in the restaurant Nominal/Binary {yes, no}

Pay Method Refers to the preferred payment method for the person Nominal {Cash, Card, Both}

Free Wi-Fi Indicates if the person prefers free Wi-Fi to be available in therestaurant

Nominal/Binary {yes, no}

Table Reservation Indicates if the person can reserve a table before going to therestaurant


Open After Midnight Indicates whether the person prefers restaurants that open aftermidnight


Speed Indicates whether the speed of offering service is important to theperson


Children seat Indicates if the person prefers a children seat to be available in therestaurant


Wheelchair Indicates if the person prefers a wheelchair seat to be available inthe restaurant


Place Category The category of the person’s preferred restaurant (Class Label) Nominal {Fine Dining, NationalDishes, Cafe Shop(Hookah), Cafe Shop(Study), Picnic, JordanFolklore, Fast Food}

*the attribute is decomposed into a set of binary attributes during the preprocessing, because several options can be selected



Restaurants are categorized into seven categories (classlabels): (i) Fine Dining, (ii) National Dishes, (iii) Cafe Shop(Hookah), (iv) Cafe Shop (Study), (v) Picnic, (vi) Jordan Folk-lore and (vii) Fast Food. Fig. 1 presents labels distribution inthe considered dataset. As shown in the figure, the distributionof the labels is imbalanced, thus a preprocessing is required toresolve this issue and generate an effective prediction model.The well-known Minority Oversampling TEchnique (SMOTE)[21] was adopted. SMOTE is considered as an oversamplingtechnique that produces artificial minority class samples.

17%

5%

16%

8%12%

4%

38%

Original Labels Distribution

Fine Dining

National Dishes

Cafe Shop (Hookah)

Cafe Shop (Study)

Picnic

Jordan Folklore

Fast Food

Fig. 1. Original Labels Distribution in the Considered Dataset

Fig. 2 represents labels distribution after applying SMOTE.In addition to SMOTE preprocessing, handling missing values,solving inconsistency and removing redundancy were alsoapplied to the considered dataset. After preprocessing, thedataset features 25 dimensions and 344 data samples.

14%

14%

15%

14%

14%

14%

15%

Labels Distribution after Applying SMOT

Fine Dining

National Dishes

Cafe Shop (Hookah)

Cafe Shop (Study)

Picnic

Jordan Folklore

Fast Food

Fig. 2. Labels Distribution after Employing SMOTE

V. EXPERIMENTS AND RESULTS

In this section, the obtained results from the undertakenexperiments are presented. As noted earlier in the introduc-tion to this paper, two categories of classification algorithmswere utilized to generate the desired RCPPM: (i) Singleclassification and (ii) Ensemble classification. With respect tothe first classification category; six well-known classificationalgorithms were used to produce the RCPPM: (i) Naıve Bayes(NB), (ii) Decision Tree (DT), (iii) Rule-Based (RB), (iv)k-Nearest Neighbor (kNN), (v) Artificial Neural Network(ANN) and (vi) Support Vector Machine (SVM). Regarding the

second classification category, three algorithms were utilizedto generate the RCPPM: (i) Bagging Ensemble Classification,(ii) Boosting Ensemble Classification, (iii) Heterogeneous En-semble Classification. The well-known 10-fold cross validationtechnique was adopted to divide the dataset into trainingand testing sets and to obtain more accurate classificationresults. All classification experiments founded in this workwere performed using the WEKA data mining tool [22].

Commencing with the results obtained from using singleclassification algorithms to construct the RCPPM. Table IIpresents the obtained results when using the six well-knownclassification algorithms. From the table it can be observedthat DT and NB classifiers generated the same and the highestclassification accuracy (Accuracy= 86.92 and AUC = 0.98).

TABLE II. AVERAGE ACCURACY AND AUC RESULTS OBTAINED WHENUSING SINGLE CLASSIFICATION ALGORITHMS TO GENERATE THE

RCPPM

Classification Algorithm Accuracy AUC

Simple Naıve Bayes(Naıve Bayes) 86.9186 0.979Decision Tree(Hoeffding Tree) 86.9186 0.979Rule-Based(Decision Table) 72.6744 0.917k-nearest neighbor(IBK) 86.0465 0.976Support Vector Machine(SMO) 84.0116 0.944Artificial Neural Network(Multilayer Perceptron) 86.6279 0.961

Because the Ensemble model effectiveness is highly af-fected by the base classifiers [11], the Ensemble classificationexperiments were only conducted using DT and Naıve Bayesclassifiers as base classifiers. Table III presents the obtainedresults from using ensemble classification to generate theRCPPM. Note here that Bagging (DT) refers to utilizing a setof DT classifiers as the base classifiers within the BaggingEnsemble to generate the RCPPM model. While Bagging(NB) refers to using Bagging Ensemble classification withNB classifiers as the base classifiers. Boosting (DT) refersto using Boosting Ensemble classification with DT classifiersas the base classifiers, while Boosting (NB) considers usingNB classifiers as the base classifiers. Regarding HeterogeneousEnsemble classification, a combination of DT and NB classi-fiers were utilized to generate the model. Two Heterogeneousclassification approaches were utilized, the first one adopts“Majority Voting” to combine results from the base classifiers,while the second one considers “Average Probability” to outputthe final prediction result. From the table, Bagging Ensem-ble classification outperforms Boosting and HeterogeneousEnsemble classification, in terms of average accuracy andAUC, for generating the RCPPM. The worst results obtainedwhen using Boosting Ensemble classification to generate theRCPPM.

Fig. 3 presents a comparison between the performance ofSingle classification and Ensemble classification for generatingRCPPM. From the figure, it is clearly observed that BaggingEnsemble classification outperforms Single classification algo-



TABLE III. AVERAGE ACCURACY AND AUC RESULTS OBTAINED WHENUSING ENSEMBLE CLASSIFICATION ALGORITHMS TO GENERATE THE

RCPPM

Classification Algorithm Accuracy AUC

Bagging (NB) 87.2093 0.979

Bagging (DT) 87.2093 0.979

Boosting (NB) 83.1395 0.952

Boosting (DT) 83.1395 0.943

Heterogeneous Ensemble(Average Probability) 86.9186 0.979Heterogonous Ensemble(Majority Voting) 86.9186 0.923

rithms and other forms of Ensemble classification (Boostingand Heterogeneous). The reason behind the superiority ofBagging over Single classification and Boosting is the size ofthe considered dataset. More specifically, Bagging adopts the“Sampling with Replacement” technique to generate differentvariations of the dataset with the same size [1], and thistechnique works very well with small size datasets such as thedataset considered in this research. While the reason behind thesuperiority of Bagging over Heterogeneous Ensemble returnsto the homogeneity of the base classifiers that can reduceprediction conflicts.

70.0072.0074.0076.0078.0080.0082.0084.0086.0088.0090.0092.0094.0096.0098.00

100.00

72.67

83.14 83.14 84.0186.05 86.63 86.92 86.92 86.92 86.92 87.21 87.21

Accuracy

Fig. 3. Comparison between the Performance of Single Classification andEnsemble Classification for Generating RCPPM

VI. CONCLUSION

In this paper, Single and Ensemble classification algorithmshave been utilized to generate a prediction model that aimsto predict restaurant category preferences. The RCPPM is anintelligent prediction model that helps users to decide thebest suitable place to go. The experiments have been accom-plished using a novel dataset that covers person demographicdata, income level and place preferences. From the reportedexperiments, supervised machine learning could be utilizedto generate a high-performance RCPPM. Using ensembleof classifiers enhanced the classification effectiveness of theRCPPM. Moreover, Bagging Homogeneous Ensemble clas-sification outperformed Single and Heterogeneous Ensembleclassification. Although Heterogeneous Ensemble classifica-tion could be utilized to improve classification accuracy byusing the power of completely different classifiers, it did notenhance the effectiveness of the RCPPM. The reason behindthat could be the predictions conflict that generated by differentkinds of classifiers. In the future, the authors plan to investigatethe effect of using different features on predicting restaurantcategory preferences.

ACKNOWLEDGMENT

The authors would like to thank the Hashemite Universityfor the continuous support. The authors would also like tothank Lina, Wasn, Alaa and Hala for distributing the surveyand collecting the dataset.

REFERENCES

[1] H. Jiawei, K. Micheline, and P. Jian, Data Mining: Concepts andTechniques. USA: Morgan Kaufmann, 2011.

[2] J. Demsar, “Statistical comparisons of classifiers over multiple datasets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.

[3] J. Huang and C. X. Ling, “Using auc and accuracy in evaluatinglearning algorithms,” IEEE Trans. on Knowl. and Data Eng.,vol. 17, no. 3, pp. 299–310, Mar. 2005. [Online]. Available:https://doi.org/10.1109/TKDE.2005.50

[4] L. Moreira-Matias, J. Moreira, J. Gama, and P. Brazdil, “Text catego-rization using an ensemble classifier based on a mean co-associationmatrix,” vol. 7376, 07 2012.

[5] P. Yang, J. Yang, B. Zhou, and A. Zomaya, “A review of ensemblemethods in bioinformatics,” Current Bioinformatics, vol. 5, 12 2010.

[6] O. Maimon and L. Rokach, “Ensemble of decision trees for miningmanufacturing data sets,” Machine Engineering, vol. 4, 01 2004.

[7] L. Kai and Z. Zhiping, “Using an ensemble classifier on learningevaluation for e-learning system,” in 2012 International Conference onComputer Science and Service System, 2012, pp. 538–541.

[8] P. Srimani and M. Koti, “Medical diagnosis using ensemble classifiers- a novel machine-learning approach,” J Adv Comput, vol. 1, pp. 9–27,01 2013.

[9] S. Kamatkar, A. Tayade, A. Viloria, and A. Hernandez, Applicationof Classification Technique of Data Mining for Employee ManagementSystem, 06 2018, pp. 434–444.

[10] A. Elbir, H. Bilal Cam, M. Emre Iyican, B. Ozturk, and N. Aydin,“Music genre classification and recommendation by using machinelearning techniques,” in 2018 Innovations in Intelligent Systems andApplications Conference (ASYU), 2018, pp. 1–5.

[11] H. Wang and H. Zhang, “Movie genre preference prediction usingmachine learning for customer-based information,” InternationalJournal of Computer and Information Engineering, vol. 11,no. 12, pp. 1329 – 1336, 2017. [Online]. Available:https://publications.waset.org/vol/132



[12] M. S. H. Mukta, E. M. Khan, M. E. Ali, and J. Mahmud, “Predictingmovie genre preferences from personality and values of social mediausers,” in ICWSM, 2017.

[13] H. Wang and H. Zhang, “Movie genre preference prediction usingmachine learning for customer-based information,” in 2018 IEEE 8thAnnual Computing and Communication Workshop and Conference(CCWC), 2018, pp. 110–116.

[14] Z.-H. Zhou, Ensemble Learning. Boston, MA: Springer US, 2009, pp.270–273.

[15] I. Nti, A. Adekoya, and B. Weyori, “A comprehensive evaluation ofensemble learning for stock-market prediction,” Journal of Big Data,vol. 7, pp. 1–40, 2020.

[16] T. G. Dietterich, “Ensemble methods in machine learning,” in MULTI-PLE CLASSIFIER SYSTEMS, LBCS-1857. Springer, 2000, pp. 1–15.

[17] N. Oza and K. Tumer, “Classifier ensembles: Select real-world appli-cations,” Information Fusion, vol. 9, pp. 4–20, 01 2008.

[18] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2,pp. 123–140, 1996.

[19] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, DataMining: Practical Machine Learning Tools and Techniques,4th ed. Amsterdam: Morgan Kaufmann, 2017. [Online]. Available:http://www.sciencedirect.com/science/book/9780128042915

[20] Y. Freund and R. E. Schapire, “A short introduction to boosting,” in InProceedings of the Sixteenth International Joint Conference on ArtificialIntelligence. Morgan Kaufmann, 1999, pp. 1401–1406.

[21] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote:Synthetic minority over-sampling technique,” Journal of Artificial In-telligence Research, vol. 16, pp. 321–357, 2002.

[22] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H.Witten, “The weka data mining software: An update,” SIGKDD Explor.Newsl., vol. 11, no. 1, pp. 10–18, Nov. 2009. [Online]. Available:http://doi.acm.org/10.1145/1656274.1656278


Documents

Single and Ensemble Classification for Predicting …...Zarqa, Jordan Ala’a Al-shdaifat2 Department of Computer Science The Hashemite University Zarqa, Jordan Abstract—Classiﬁcation