
A comprehensive study on data clustering for breast cancer prognosis and risk exposure

NIHARIKA SAXENA1, Prof. MEETA KUMAR2

1,2Department of Computer Science & Engineering, Symbiosis Institute of Technology

Pune, Maharashtra, INDIA
[email protected] [email protected]

May 25, 2018

International Journal of Pure and Applied Mathematics
Volume 118 No. 24 2018
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue http://www.acadpubl.eu/hub/

Abstract

Breast cancer is among the most perilous of all cancer types and is the second leading cause of death among women throughout the world. It is preventable and curable if detected at an early stage, but because it cannot be diagnosed early through laboratory tests and is detected late, the death rate is increasing enormously. Diagnosis and prediction of breast cancer has therefore always been a challenging task, which has made it an attractive subject for researchers over the past few years. Work in this field is not limited to traditional methods: recent advances in technology have allowed researchers to broaden their field of research, and various data mining and machine learning techniques have since been applied to various breast cancer datasets. Data mining extracts logical patterns from a given database using machine intelligence. Clustering is the technique of automatically partitioning unstructured data into clusters. In this paper, a brief survey of different techniques and models in the field of breast cancer is presented. The paper consists of two parts. The first part is a comprehensive study of different clinical and non-clinical approaches used for diagnosis and prediction of breast cancer on different datasets. The second part gives an idea of our proposed hybridized evolutionary model.

Key Words: Breast Cancer, Cohort Intelligence Algorithm (CI), Fuzzy c-means (FCM).

1 INTRODUCTION

Breast cancer is a persistent disease and the second leading cause of death among women worldwide. It generally develops from a single tissue, usually the inner lining of the lobules that supply milk to the ducts, and it first appears as a lump in the breast. According to research, breast cancer is a cause of mortality not only in women but also in men: 1 in 28 women is affected by breast cancer, whereas the proportion of men is much lower but cannot be neglected. If breast cancer is identified at an early stage, it is much easier to identify the root cause and the type of tumor [1]. However, early prognosis of a disease outcome is one of the most demanding tasks for physicians. Thus, over the past decade, rather than relying on a single method, researchers have carried out a number of studies on the classification and prediction of breast cancer. The scope of these studies is not limited to clinical approaches; recent advances in technology have opened the way to using various machine learning methods for data classification, which is initially a quite time-consuming approach.

The clinical methods most commonly used to predict and diagnose breast cancer are mammography, surgical biopsy, magnetic resonance imaging (MRI) and fine needle aspiration (FNA) with visual interpretation. These methods do not provide high accuracy, because doctors need to inspect the MRI scans or mammograms manually, which increases the chance of dangerous errors if they lack experience [4]. This has led to a high demand for computerized tools, algorithms and models for cancer prediction and diagnosis. In recent years, researchers have combined data mining and/or machine learning techniques with clinical methods to obtain better results.

Non-clinical methods generally consist of data mining and machine learning approaches [15]. Data mining is a technique for analyzing very large amounts of data in search of logical patterns and precise relations among variables in order to find useful information. More generally, it is the process of analyzing data and summarizing the results to obtain meaningful information, and it is the most important part of the knowledge discovery cycle. It has become one of the most useful methods in many research domains, most importantly medicine, where it helps to identify patterns and relations among variables and to predict the root cause of a disease from historical datasets. Data mining is further classified into methods such as association, classification, clustering and prediction, among which clustering is widely popular.

Machine learning is the branch of Artificial Intelligence in which a computer acquires the ability to learn on its own without being explicitly programmed. Diverse effective machine learning techniques, including Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Artificial Neural Network (ANN), Bayesian Networks (BNs) and Genetic Algorithm (GA), have been extensively applied in cancer prognosis and diagnosis [5]. Machine learning algorithms use data mining techniques along with learning algorithms to build models that predict the future outcome for a given dataset.

Clustering is the process of partitioning a set of objects in such a manner that objects with higher similarity, measured on a predefined scale, are grouped together in a single cluster, while less similar objects fall into different clusters [9]. Clustering is unsupervised in nature and has been used in many research areas such as medicine, marketing and biology, and in a wide variety of applications such as machine learning, image processing and pattern recognition. A clustering algorithm can be either hierarchical or partitional. Hierarchical clustering forms a hierarchy of clusters. It generally uses the concept of a tree, in which the child-node clusters are unique and, at the parent node, all items belong to a single cluster.
Hierarchical clustering algorithms can be further divided into top-down and bottom-up approaches. Partitional clustering is widely used because its computational time is low compared with hierarchical clustering. In early years, k-means was the algorithm used for breast cancer diagnosis and prediction, as it is very fast and simple. However, its performance is reduced by shortcomings such as convergence to local minima, sensitivity to the initial cluster centers selected, and sensitivity to outliers and noise [23] [26]. To overcome these issues, another algorithm that is used is fuzzy c-means.

Fuzzy c-means (FCM) [32] is an unsupervised clustering method in which every data point is assigned a membership value between 0 and 1 for every cluster, which makes an object's cluster membership uncertain in nature. In other words, a data point can belong to one or more clusters, with a different degree of membership for each. FCM is widely used in the field of pattern recognition. It is preferred over k-means because of its better outcomes and because it allows nonlinear, complex systems to be modeled easily by incorporating human experience and knowledge. However, FCM takes a long computational time and is sensitive to local minima and noise.

Many other heuristic algorithms, such as Genetic Algorithm (GA) and Outlier Detection Algorithm (ODA), are discussed in the literature surveyed. In the past few years, researchers have focused on nature- and society-inspired optimization methods, and algorithms such as Ant Colony Optimization (ACO), Gbest-Guided Artificial Bee Colony [27] and Swarm Intelligence (SI) have been proposed. These algorithms are inspired by social behavior. Later, in 2013, the Cohort Intelligence (CI) algorithm [17] was developed, based on the self-supervised learning behavior of candidates. CI is a novel optimization algorithm in which each candidate tries to improve its behavior by either competing or interacting with the other candidates in a cohort, which improves clustering quality and computational efficiency. This natural and social tendency to learn from one another helps the candidates improve the overall behavior of the cohort. Although CI is computationally efficient and gives good-quality results, it can get stuck in local minima and converges slowly.
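The membership idea behind fuzzy c-means described in this section can be made concrete with a short sketch. The Python code below illustrates the standard FCM iteration (alternating a weighted center update and a membership update); it is not taken from any of the surveyed papers. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance and the random initialisation are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j. All names and defaults here are assumptions for the sketch.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; every row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        change = 0.0
        for i in range(n):
            dist = [
                max(1e-12, sum((points[i][d] - ctr[d]) ** 2
                               for d in range(dim)) ** 0.5)
                for ctr in centers
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership moved by more than the tolerance.
        if change < tol:
            break
    return centers, u
```

Calling `fuzzy_c_means(points, c=2)` on feature vectors returns one membership row per point; taking the index of the largest entry in a row gives a hard label, while the row itself keeps the graded membership that distinguishes FCM from k-means.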

2 LITERATURE REVIEW

The review of literature presents a brief description of different techniques with respect to breast cancer. This literature survey is divided into two parts: Clinical Methods and Non Clinical Methods. The algorithms used are described along with their results and limitations.

A. Clinical Methods

Clinical methods consist of methods like mammography, surgical biopsy, magnetic resonance imaging (MRI) and fine needle aspiration (FNA). Banu, G.S., [5] propose a diagnostic algorithm for classification of mammograms, in which the 1D Continuous Wavelet Transformation (CWT) is applied for feature selection and a Support Vector Machine is used as the classifier. The algorithm achieved 100% [...] algorithm like CWT with fuzzy c-means. Also, SVM shows no minimum squared error as compared to fuzzy c-means and other classification algorithms. Chang, Y.H., [7] investigated CAD schemes that use an artificial neural network to determine the probability of breast cancer. The network was trained first by Back Propagation (BP) and then by Genetic Algorithm (GA). BP and GA were applied to both the training and testing sets, and their performance was compared using Receiver Operating Characteristic (ROC) curves. The results show that GA performed better than BP, but not significantly so.

Gayathri, B.K., [11] give a summary of various image segmentation methods used in breast cancer and state that only very few methods are valuable for segmentation on all types of images. They also discuss how the seeded region growing algorithm is the most appropriate method for dynamic image segmentation. Kamalakannan, J., [12] focus on abnormality detection from the mammogram. For pre-processing, Gaussian and Laplacian filters are used, and the difference between the two filters is measured for segmentation. The processed image is then segmented using the Otsu method. Kashyap, K.L., [13] proposed an algorithm for the detection of abnormalities in mammograms. The mammogram image is enhanced, segmented and classified using unsharp masking, median filtering and fuzzy c-means to find the region of interest (ROI). The proposed algorithm gives 91% accuracy on the mini-MIAS database.

Mina, L.M., [20] propose a system using an Artificial Neural Network (ANN) based on the two-dimensional wavelet transformation. The proposed system is divided into three phases: pre-processing, wavelet decomposition analysis and the ANN approach, and it gives a 91.64% success rate when tested on the MIAS database. Ngayarkanni, S.P., [22] proposed a system for automatic segmentation of mammogram images for classification of cancer using a three-layered artificial neural network whose weights are adjusted using the ID3 algorithm. The proposed method was compared with existing algorithms on the basis of sensitivity, specificity, and positive and negative predictive values, which amount to 99.78%, 99.9%, 94% and 95% respectively, higher than those of existing algorithms.

Ramani, R., [28] carried out a survey of various clustering techniques used for image segmentation. They found that these clustering algorithms may increase the efficiency of the image recovery process, but no clustering algorithm gives 100% [...] recurrence prediction models that use structural clinical data. This survey helps to find out how these models are used to determine the risk a patient can have after potentially curative treatment like surgery. S. Saheb Basha, [31] present research on how breast cancer masses are automatically detected in mammograms. They used morphological operators to distinguish the affected masses from background tissue and fuzzy c-means clustering for intensity-based image segmentation. These methods detect the affected region and the cause of cancer at an early stage that can be missed by pathologists. Verma, A., [33] surveyed various image processing techniques for tumor detection in mammograms. The survey concludes that mammography is the best practice for tumor detection.
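Threshold-based segmentation, such as the Otsu method mentioned for [12], can be illustrated with a short sketch. The code below is a generic implementation of Otsu's between-class-variance criterion on a flat list of grey levels; it does not reproduce the pipeline of [12], whose details are not given here. The function name `otsu_threshold` and the convention that pixels above the threshold form the foreground are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    `pixels` is a flat iterable of integer intensities in [0, levels).
    Pixels with value <= t form the low class, pixels > t the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class so far
    sum0 = 0  # sum of intensities in the low class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue      # low class still empty
        w1 = total - w0
        if w1 == 0:
            break         # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment(pixels, threshold):
    """Binary mask: 1 where the pixel is brighter than the threshold."""
    return [1 if p > threshold else 0 for p in pixels]
```

On a mammogram, the input would be the flattened, pre-processed image; the resulting mask marks the brighter candidate region, which would still need the kind of post-processing and classification described in the surveyed work.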

Table 1 summarizes clinical methodologies with their contributions in the field of breast cancer. The literature also summarizes the data sets that are used: [5,11,12,20,31,33] use the Mammographic Image Analysis Society (MIAS) dataset, whereas [26] uses the Wisconsin Breast Cancer Database (WBCD) and [10,26] use clinical databases.

TABLE I. COMPARATIVE STUDY OF VARIOUS CLINICAL METHODS


B. Non Clinical Methods

Non-clinical methods generally consist of machine learning approaches. The algorithms and methods most commonly used are reviewed below. Ahmad, L.G., [1] develop predictive models using machine learning techniques such as SVM, decision tree (C4.5) and ANN, and compare their performance on the basis of sensitivity, specificity and accuracy. The results show that SVM predicts with high accuracy (95.7% [...]). Aljawad, D.A., [2] evaluate the performance of SVM and Bayesian Network for predicting the survival state of breast cancer patients. The results show that SVM outperforms the Bayesian Network, achieving 74.44% [...] Bayesian network achieved 67.56% [...] Genetic Algorithm (GA) and Mutual Information (MI) for breast cancer diagnosis. This hybrid approach uses GA to select the optimal set of features, which is provided as input to a Support Vector Machine (SVM) and k-Nearest Neighbour (k-NN). MI is used to increase the mutual information between features. The experimental results indicate that the proposed system is highly effective in cancer prognosis and can be useful for physicians.

Asri, H., [4] compare different machine learning algorithms, namely Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB) and k-Nearest Neighbors (k-NN), on the Wisconsin Breast Cancer (original) dataset to determine the correctness of data classification with respect to the efficiency and effectiveness of each algorithm in terms of accuracy, precision, sensitivity and specificity. The results show that SVM gives the highest accuracy (97.13%) with the lowest error rate. Bethapudi, P., [6] propose a diagnosis technique using Genetic Algorithm (GA). Evaluated with 3-fold cross-validation, it gives a classification accuracy of 97.7%, which is more accurate than existing algorithms. Devi, R.D.H., [8] focus on automatic diagnosis of breast cancer using machine learning approaches. The proposed model is carried out in three phases: first, cluster formation using the Farthest First clustering algorithm; second, outlier detection using an outlier detection algorithm; and last, classification of the cancer as benign or malignant using the J48 classification algorithm. The model is validated by testing it on WBCD and WDBC. The numerical analysis shows that the proposed model achieves the highest accuracy, 99.9% for WBCD and 99.6% for WDBC, compared with existing algorithms.
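Several of the studies above compare classifiers by accuracy, sensitivity, specificity and precision. As a reminder of how these measures relate to one another, the following sketch derives them from the four confusion-matrix counts. The function names and the choice of "malignant" as the positive class are assumptions for illustration and are not taken from the surveyed papers.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Undefined ratios (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard measures computed from the confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # recall on the positive class
        "specificity": _ratio(tn, tn + fp),   # recall on the negative class
        "precision": _ratio(tp, tp + fp),
    }
```

Reporting sensitivity and specificity alongside accuracy matters for datasets in which benign and malignant cases are unevenly represented, since a classifier can reach high accuracy while missing many malignant cases.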

Elouedi, H., [9] proposed a hybrid diagnosis approach using Decision Tree (C4.5) for classification and clustering (k-means) to improve the classification of malignant cases. Subdividing the malignant instances into two clusters improves the cancer prediction results compared with the initial results. Elshazly, H.I., [10] present a generic classification model based on the rough set method and decision rules. Using this method, a comparative study of three classifiers (Decision Tree (DT), k-Nearest Neighbour (k-NN) and Naive Bayes (NB)) over three discretization techniques (equal binning, entropy and Boolean reasoning) is carried out. Genetic Algorithm is used to find the best attributes, whereas rough set reduction techniques are applied to find the reducts of the data for prediction. Decision rules are used as a classifier to evaluate the performance of the predicted results and classes. Kermani, B.G., [14] present a hybrid approach using Genetic Algorithm with Neural Network (GANN) for effective feature extraction. The neural network works better when the selected features are effective and relevant.

Kourou, K., [15] describe how effective ML tools detect the key features from complex datasets. They discuss techniques such as Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs) that are widely used as predictive models in cancer research.


Krishnasamy, G., [16] present a hybrid data clustering algorithm known as K-MCI, which combines k-means with a modified cohort intelligence algorithm. The proposed algorithm is tested on different UCI datasets and its performance is compared with that of well-known algorithms. Lafta, H.A., [18] propose a system that combines fuzzy concepts with Genetic Algorithm. The Genetic Fuzzy Rule Based System (GFRBS) attains high performance and provides interpretability. Medjahed, S.A., [19] study the k-Nearest Neighbour (k-NN) algorithm as a classification algorithm in the field of breast cancer, using different distance measures and classification rules. The experimental results support the use of Euclidean and Manhattan distances with k-NN. These distances are effective in terms of classification performance but are time-consuming, and they give the best results even as the value of k varies from 1 to 50.
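The role of the distance measure in k-NN, as studied in [19], can be seen in a minimal sketch. The code below implements plain majority-vote k-NN with interchangeable Euclidean and Manhattan distances; it is a generic illustration and does not reproduce the classification rules or experimental setup of [19]. The function names and the toy feature vectors are hypothetical.

```python
from collections import Counter


def euclidean(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up feature vectors used only to exercise the functions.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["benign"] * 3 + ["malignant"] * 3
prediction = knn_predict(train_X, train_y, (2, 2), k=3, distance=manhattan)
```

Because every query is compared against the whole training set, the cost per prediction grows with the dataset size, which is consistent with the observation above that k-NN is time-consuming.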

Muhic, I., [21] proposed an approach using the fuzzy c-means algorithm (FCM) and a pattern recognition method, applied to the Wisconsin breast cancer dataset. FCM is used to classify the dataset into two groups, benign and malignant, whereas the pattern recognition method is used for cluster formation. The proposed system shows high accuracy compared with surgical approaches like Fine Needle Aspiration (FNA). Odajima, K., [23] provide a detailed description of the k-NN method used in breast cancer diagnosis, showing how the accuracy of k-NN depends on the number of neighbors and the percentage of data used. Ojha, U., [24] aim to determine the effectiveness of data mining algorithms in predicting the probability of recurrence of a disease among patients. The data mining algorithms considered are divided into classification algorithms (C5.0, k-NN, Naive Bayes, SVM) and clustering algorithms (k-means, EM, PAM, fuzzy c-means). The experimental results show that classification algorithms are better predictors than clustering algorithms, among which SVM and Decision Tree (C5.0) give 81% [...] accuracy and fuzzy c-means gives 37% [...] among all.

Pawlovsky, A.P., [25] presented how Genetic Algorithm (GA) can be used to increase the accuracy of the k-NN method. GA selects the best components for the prognosis of breast cancer, which is performed by the k-NN algorithm. This reduces the dimensionality of the data and uses six distances as similarity metrics. Initially 76% [...] accuracy is achieved, but using this method accuracy increases to 79%.

Pawlovsky, A.P., [25] presented a method for evaluating different settings of the k-NN method and selecting the best one using the results on the test set. The range of its results depends on the values set for some of its parameters. The average accuracy achieved is 76%, with a maximum of 90% and a minimum of 62%. Pourmandia, M., [27] present a diagnostic system using a hybrid of fuzzy c-means and an Optimized Neural Network (ONN). Fuzzy c-means extracts the efficient features and ONN is used for intelligent classification. The Gbest-guided Artificial Bee Colony (GABC) algorithm is used to select the most efficient features for classification. The complexity is minimal compared with other methods.

Rashmi, G.D., [29] use Naive Bayes classification and Naive Bayes prediction algorithms to classify and predict whether the cancer is benign or malignant. Numerical analysis shows that both algorithms have the same success and error rates. The algorithms can be improved to increase the success rate, but only for small instances. Tintu, P.B., [32] use fuzzy c-means for breast cancer diagnosis, applied to a UCI data set, focusing on diagnosis using a fuzzy approach. The results show that this approach classifies the cancer cases with high accuracy. Wang, Z., [34] proposed a system based on rough sets, a SOM neural network and Genetic Algorithm for extracting knowledge from data for early diagnosis and prediction. The data are discretized using SOM, relevant attributes are selected using Genetic Algorithm (GA), and diagnosis rules are formed from rough sets for decision making. The rules are used as a classifier to evaluate the performance of the system.

Table 2 summarizes non-clinical methodologies with their contributions in the field of breast cancer. The literature also summarizes the data sets that are used. The research papers numbered [3,4,6,8,9,10,14,15,16,18,19,20,25,26,27,29,34] use the Wisconsin Breast Cancer Data Set (WBCD), whereas [1] uses the Iranian Centre for Breast Cancer dataset, [2] uses Haberman's survival dataset and [24] uses the Wisconsin Prognosis Breast Cancer (WPBC) dataset.

TABLE II. COMPARATIVE STUDY OF VARIOUS NON CLINICAL METHODS

3 PROPOSED IDEA

Our proposed model is a hybridized data clustering model that uses fuzzy c-means and the Cohort Intelligence (CI) algorithm to form highly optimized clusters for cancer prediction. Fuzzy c-means is chosen over k-means because of its better outcomes and because it allows non-linear, complex systems to be modeled easily by incorporating human experience and knowledge. CI is a novel optimization algorithm in which candidates show self-supervised learning behavior in a cohort, which improves clustering quality and computational efficiency. Fig. 1 gives an overview of our system. The proposed system consists of the following stages:


a) Dataset Acquisition: Understanding the input data and transforming it into the required format.

b) Attribute Selection: Selecting the attributes related to breast cancer prognosis.

c) Data Preprocessing: Cleaning the data by removing missing values and noisy data.

d) Data Clustering: Clusters are formed using fuzzy c-means and are then further processed using CI.

e) Cluster Optimization: Optimizing the clusters until the behavior of each candidate saturates.

f) Analysis: Comparison of the results with existing methodologies.

Fig 1: Overview of Proposed System
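To give a feel for the cluster optimization stage, the following sketch shows a simplified Cohort Intelligence loop in the spirit of [17]: each candidate picks another candidate to follow by roulette-wheel selection (better behaviors are more likely to be followed), shrinks its sampling interval around the followed behavior, samples a few variations and keeps the best one. This is an illustrative simplification and not the proposed model itself. The function name `cohort_intelligence`, all parameter names and default values, and the assumption that the objective is non-negative are choices made for this sketch only.

```python
import random


def cohort_intelligence(objective, bounds, candidates=5, variations=10,
                        shrink=0.9, iterations=100, seed=0):
    """Simplified Cohort Intelligence minimiser (illustrative sketch).

    `objective` maps a list of floats to a non-negative cost and `bounds`
    is a list of (low, high) pairs, one per variable.
    Returns (best_x, best_f, history of the best cost so far).
    """
    rng = random.Random(seed)
    cohort = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(candidates)]
    # Every candidate starts with the full range as its sampling interval.
    widths = [[hi - lo for lo, hi in bounds] for _ in range(candidates)]

    best_x = min(cohort, key=objective)
    best_f = objective(best_x)
    history = [best_f]

    for _ in range(iterations):
        costs = [objective(x) for x in cohort]
        # Lower cost means a higher chance of being followed.
        weights = [1.0 / (cost + 1e-12) for cost in costs]

        new_cohort = []
        for c in range(candidates):
            followed = rng.choices(range(candidates), weights=weights)[0]
            # Shrink the interval and centre it on the followed behavior.
            widths[c] = [w * shrink for w in widths[c]]
            trials = []
            for _ in range(variations):
                trial = []
                for d, (lo, hi) in enumerate(bounds):
                    centre = cohort[followed][d]
                    half = widths[c][d] / 2.0
                    trial.append(rng.uniform(max(lo, centre - half),
                                             min(hi, centre + half)))
                trials.append(trial)
            new_cohort.append(min(trials, key=objective))
        cohort = new_cohort

        leader = min(cohort, key=objective)
        if objective(leader) < best_f:
            best_x, best_f = leader, objective(leader)
        history.append(best_f)

    return best_x, best_f, history


# Hypothetical usage: tune two 1-D cluster centres on made-up data.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]


def sse(centres):
    """Sum of squared distances from each value to its nearest centre."""
    return sum(min((x - c) ** 2 for c in centres) for x in data)


best_centres, best_cost, trace = cohort_intelligence(
    sse, bounds=[(0.0, 6.0), (0.0, 6.0)])
```

In a hybrid of the kind proposed here, the objective would more plausibly be a fuzzy clustering criterion evaluated on the cluster centers produced by fuzzy c-means, with CI refining those centers; the sum-of-squares objective above is only a stand-in.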

4 CONCLUSION

Breast cancer is a leading cause of death among women worldwide. Early diagnosis and prediction of breast cancer has always been a challenging task. This paper presented a review of previous work in the field of breast cancer. Research methods in this field can be divided into two categories: clinical methods and non-clinical methods. Clinical methods include image processing and segmentation techniques that use mammograms for breast cancer diagnosis and prediction, whereas non-clinical methods include emerging techniques like machine learning and data mining. A comprehensive study of this area has been given to encourage further research. Taking into account the drawbacks and future directions identified in the literature, we proposed a hybridized evolutionary model that uses fuzzy c-means with Cohort Intelligence to improve the efficiency of data clusters for prediction.


References

[1] Ahmad, L.G., Eshlaghy, A.T., Poorebrahimi, A., Ebrahimi, M. and Razavi, A.R., Using three machine learning techniques for predicting breast cancer recurrence. J Health Med Inform, 4(124), 2013, p.3.

[2] Aljawad, D.A., Alqahtani, E., Ghaidaa, A.K., Qamhan, N., Alghamdi, N., Alrashed, S., Alhiyafi, J. and Olatunji, S.O., Breast cancer surgery survivability prediction using bayesian network and support vector machines. IEEE In Informatics, Health & Technology (ICIHT), 2017, February, pp. 1-6.

[3] Alzubaidi, A., Cosma, G., Brown, D. and Pockley, A.G., Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information. IEEE In Interactive Technologies and Games (iTAG), 2016, October, pp. 70-76.

[4] Asri, H., Mousannif, H., Al Moatassime, H. and Noel, T., Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis. Procedia Computer Science, Vol. 83, 2016, pp. 1064-1069.

[5] Banu, G.S., Fareeth, A. and Hundewale, N., Prediction of breast cancer in mammogram image using support vector machine and fuzzy C-means. IEEE-EMBS In Biomedical and Health Informatics (BHI), 2012, January, pp. 573-576.

[6] Bethapudi, P., SreenivasaReddy, E. and Sitamahalakshmi, T., A Computational Approach for Better Classification of Breast Cancer using Genetic Algorithm.

[7] Chang, Y.H., Zheng, B., Wang, X.H. and Good, W.F., Computer-aided diagnosis of breast cancer using artificial neural networks: comparison of backpropagation and genetic algorithms. IEEE International Joint Conference on Neural Networks, IJCNN'99. Vol. 5, 1999, pp. 3674-3679.

[8] Devi, R.D.H. and Devi, M.I., Outlier detection algorithm combined with decision tree classifier for early diagnosis of breast cancer. Int J Adv Engg Tech/Vol. VII/Issue II/April-June, 2016, 93, p.98.

[9] Elouedi, H., Meliani, W., Elouedi, Z. and Amor, N.B., A hy-brid approach based on decision trees and clustering for breastcancer classification. IEEE 6th International Conference of SoftComputing and Pattern Recognition (SoCPaR), 2014, August,pp. 226-231.

[10] Elshazly, H.I., Ghali, N.I., Korany, A.M.E. and Hassanien,A.E., Rough sets and genetic algorithms: A hybrid approachto breast cancer classification. IEEE In Information and Com-munication Technologies (WICT), 2012, October, pp. 260-265.

[11] Gayathri, B.K. and Raajan, P., A survey of breast cancer de-tection based on image segmentation techniques. IEEE Inter-national Conference on Computing Technologies and Intelli-gent Data Engineering (ICCTIDE), 2016, January, pp. 1-5.

[12] Kamalakannan, J., Krishna, P.V., Babu, M.R. and Mukeshb-hai, K.D., Identification of abnormality from digital mammo-gram to detect breast cancer. IEEE International Conferenceon Circuit, Power and Computing Technologies (ICCPCT),2015, March, pp. 1-5.

[13] Kashyap, K.L., Bajpai, M.K. and Khanna, P., Breast cancerdetection in digital mammograms. IEEE International Confer-ence on Imaging Systems and Techniques (IST), 2015, Septem-ber, pp. 1-6.

[14] Kermani, B.G., White, M.W. and Nagle, H.T., Feature ex-traction by genetic algorithms for neural networks in breastcancer classification. IEEE 17th Annual Conference Engineer-ing in Medicine and Biology Society, 1995, September, Vol. 1,pp. 831-832.

[15] Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V. and Fotiadis, D.I., Machine learning applications in cancer prognosis and prediction. Computational and Structural Biotechnology Journal, Vol. 13, 2015, pp. 8-17.


[16] Krishnasamy, G., Kulkarni, A.J. and Paramesran, R., A hybrid approach for data clustering based on modified cohort intelligence and K-means. Expert Systems with Applications, Vol. 41(13), 2014, pp. 6009-6016.

[17] Kulkarni, A.J., Durugkar, I.P. and Kumar, M., Cohort intelligence: a self supervised learning behavior. IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2013, October, pp. 1396-1400.

[18] Lafta, H.A. and Ayoob, N.K., Breast Cancer Diagnosis Using Genetic Fuzzy Rule Based System.

[19] Medjahed, S.A., Saadi, T.A. and Benyettou, A., Breast cancer diagnosis by using k-nearest neighbor with different distances and classification rules. International Journal of Computer Applications, Vol. 62(1), 2013.

[20] Mina, L.M. and Isa, N.A.M., Breast abnormality detection in mammograms using Artificial Neural Network. IEEE International Conference on Computer, Communications, and Control Technology (I4CT), 2015, April, pp. 258-263.

[21] Muhic, I., Fuzzy analysis of breast cancer disease using Fuzzy C-Means and pattern recognition. Southeast Europe Journal of Soft Computing, Vol. 2(1), 2013.

[22] Ngayarkanni, S.P., Kamal, N.B. and Thavavel, V., Automatic detection and classification of cancerous masses in mammogram. IEEE 3rd International Conference on Computing Communication & Networking Technologies (ICCCNT), 2012, July, pp. 1-10.

[23] Odajima, K. and Pawlovsky, A.P., A detailed description of the use of the kNN method for breast cancer diagnosis. IEEE 7th International Conference on Biomedical Engineering and Informatics (BMEI), 2014, October, pp. 688-692.

[24] Ojha, U. and Goel, S., A study on prediction of breast cancer recurrence using data mining techniques. IEEE 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence, 2017, January, pp. 527-530.


[25] Pawlovsky, A.P. and Matsuhashi, H., The use of a novel genetic algorithm in component selection for a kNN method for breast cancer prognosis. IEEE In Global Medical Engineering Physics Exchanges/Pan American Health Care Exchanges (GMEPE/PAHCE), 2017, March, pp. 1-5.

[26] Pawlovsky, A.P. and Nagahashi, M., A method to select a good setting for the kNN algorithm when using it for breast cancer prognosis. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2014, June, pp. 189-192.

[27] Pourmandia, M. and Addeh, J., Breast cancer diagnosis using fuzzy feature and optimized neural network via the Gbest-guided artificial bee colony algorithm. Comput Res Prog Appl Sci Eng, 1(04), 2015, pp. 152-159.

[28] Ramani, R., Valarmathy, S. and Vanitha, N.S., Breast cancer detection in mammograms based on clustering techniques-A survey. International Journal of Computer Applications, 2013, Vol. 62(11).

[29] Rashmi, G.D., Lekha, A. and Bawane, N., Analysis of efficiency of classification and prediction algorithms (Naïve Bayes) for Breast Cancer dataset. IEEE International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), 2015, December, pp. 108-113.

[30] Richter, A.N. and Khoshgoftaar, T.M., Predicting Cancer Relapse with Clinical Data: A Survey of Current Techniques. IEEE 17th International Conference on Information Reuse and Integration (IRI), 2016, July, pp. 369-376.

[31] S. Saheb Basha, Satya Prasad, Automatic detection of breast cancer mass in mammograms using morphological operators and fuzzy c-means clustering. Journal of Theoretical and Applied Information Technology, pp. 704-709.

[32] Tintu, P.B., Detect Breast Cancer using Fuzzy C means Techniques in Wisconsin Prognostic Breast Cancer (WPBC) Data Sets. International Journal of Computer Applications Technology and Research, Vol. 2(5), 2013, pp.614-meta.


[33] Verma, A. and Khanna, G., A survey on image processing techniques for tumor detection in mammograms. IEEE International Conference on Computing for Sustainable Global Development (INDIACom), 2016, March, pp. 988-993.

[34] Wang, Z., Zhang, X. and Yang, W., Rule induction of breast cancer medical diagnose based on combination of rough sets, artificial neutral network and genetic algorithm. IEEE In Control and Decision Conference (CCDC), 2016, May, pp. 5707-5711.
