AN EFFECTIVE FEATURE SUBSET SELECTION
BASED DATA CLASSIFICATION MODEL USING
STOCHASTIC ANT COLONY OPTIMIZATION IN
HEALTHCARE INDUSTRY
M. Arasakumar1, Dr. P. Sudhakar2
1Assistant Professor, Department of Information Technology, Faculty of Engineering and Technology, Annamalai University
2Associate Professor, Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University
[email protected], [email protected]
ABSTRACT
At present, the growing medical sector generates massive quantities of data related to patient demographics, medications, payments, and insurance coverage, which has attracted physicians and academicians to a great extent. Several studies have addressed various aspects of data mining applications in the medical sector. This paper proposes a feature selection (FS) based classification model for data mining in the healthcare industry. A set of four FS approaches is employed, namely ant colony optimization (ACO) based FS, genetic algorithm (GA) based FS, gain ratio, and principal component analysis (PCA). Besides, to prevent the local optima problem of ACO, an improved version named the stochastic ACO algorithm is derived by incorporating periodic partial reinitialization of the population. The presented model is validated using a set of three benchmark medical datasets, namely Chronic Kidney Disease (CKD), Indian Liver Patient (ILP), and Wisconsin Breast Cancer (WBC). An extensive comparative analysis verifies the superiority of the presented model on all the applied datasets.
Keywords: Healthcare, Data mining, Feature selection, Data classification
1. INTRODUCTION
At present, the healthcare domain has gained importance in several aspects all over the globe [1]. With its growth, various issues exist, namely cost, inefficiency, poor quality, and high complexity [2]. The expense of healthcare in the US rose by 123% between 2010 and 2015 [3]. Ineffective and non-value-added processes account for 21-47% of these massive expenses [4]. A study showed that around 200,000 patients die in the US because of medical errors [5]. Effective decision making using the available details can resolve these issues and offer better treatment to patients. Nowadays, healthcare sectors adopt information technology in their management systems [6]. A massive quantity of data gets gathered by these systems in a periodic manner. Analytics offers different models for extracting details from the complex and massive data and transforming it into the needed information for the
assistance of decision making in the healthcare sector. Over the past decade, several studies have been presented on the development of data classification models for the diagnosis of diseases using patient data.
Journal of Information and Computational Science, Volume 9 Issue 12 - 2019, ISSN: 1548-7741, www.joics.org
FS is an important part of the data classification process and helps to obtain a small collection of rules from the training set. Different models, such as machine learning algorithms and biologically inspired algorithms, have been utilized for feature selection.
In [7], a hybrid genetic algorithm with support vector machine (GA-SVM) method is
proposed for efficient selection of attributes. The elimination of repeated attributes by hybrid
GA-SVM enhances the classification accuracy. The GA-SVM method operates on two levels.
In the first level, the attributes are selected by evolutionary algorithms and they are applied to
SVM to obtain a fitness measure for every attribute set in the next level. The obtained fitness
values are applied to select the best set of attributes using GA. SVM and GA are easily
incorporated using a wrapper method. Finally, the hybrid method is enhanced by the use of a
correlation measure among attributes in place of fitness measure. It substitutes the weaker
members of the population with recently created chromosomes. It produces good diversity
and improves the total fitness of the population. The hybrid GA-SVM is tested on five datasets (iris, diabetes, breast cancer, heart disease, and hepatitis) from the UCI repository. The outcomes obtained by the GA-SVM indicate that it performs well and produces significantly better results.
[8] proposed a method to identify and classify CKD and non-CKD patients. The proposed
method involves three steps: 1) a framework is created to mine data, 2) Wrapper subset
attribute evaluator and best first search method are applied for attribute selection and 3) three
classifiers are used to classify the CKD and non-CKD patients. The attribute selector
eliminates the unwanted attributes to reduce the size of the dataset. The attribute evaluator model achieves better attribute selection by decreasing the number of attributes. These observations reveal that the accuracy is higher for the reduced dataset when compared to the
original dataset. In [9], a new method is developed to enhance the diagnostic quality of CKD. The proposed method comprises three steps: feature selection, ensemble learning, and classification. The original dataset includes 400 instances with 25 attributes, which are reduced using Correlation-based Feature Selection (CFS). Classifiers such as k-nearest neighbor (kNN), SVM, and naive Bayes (NB) were used as base classifiers, and AdaBoost is employed for ensemble learning to enhance the detection of CKD. The results show that the ensemble with the kNN base classifier achieves the highest accuracy of 0.981.
In [10], several data mining techniques, such as SVM, decision tree (DT), NB, and the kNN algorithm, are used to investigate the CKD dataset gathered from the UCI repository to predict kidney disease. The performance of the classifiers is measured in terms of accuracy, Root Mean Squared Error, Mean Absolute Error, and the Receiver Operating Characteristic curve. A ranking algorithm provides vital improvements in classification with a proper number of attributes. The experimental results prove that the DT accomplishes better results with an accuracy of 99%, while SVM achieves an accuracy of 97.75%.
[11] proposed an artificial neural network (ANN) to increase the performance of heart disease identification in terms of cost, response time, and accuracy. The approach utilizes the ANN to select essential features from the input layer of the network; an MLP-NN method is employed to select features from the Ischemic Heart Disease (IHD) dataset. Initially, there are 17 attributes for 712 patient records, and the number of attributes is reduced to 12 after feature selection. With 12 features, the prediction accuracy for the training and testing processes is 89.4% and 82.2% respectively. When the number of features is reduced further, the performance degrades, so the number of features is set to 12 for the IHD dataset.
[12] presented a method to identify CKD using two kinds of feature selection techniques, namely the wrapper and filter approaches. It is observed that a reduction in the number of features does not guarantee higher classification accuracy. For example, the accuracy of SVM before and after attribute selection is found to be 98.5% and 98% respectively.
In [13], a multilayer perceptron (MLP) with back-propagation learning is integrated with a feature selection algorithm to predict heart disease (HD). Information gain is used for feature selection to minimize the number of attributes; initially, 13 attributes are included to classify HD, and the MLP is used for data classification. Without reducing the number of attributes, the accuracy on the training and validation datasets is 88.46% and 80.17% respectively. When feature selection is involved and the number of attributes is decreased from 13 to 8, the accuracy on the training and validation datasets is 89.56% and 80.99% respectively. Thus, the accuracy is improved by 1.1% and 0.82% for the training and validation datasets.
As these studies show, data mining has been applied to various aspects of the medical sector. This paper proposes a feature selection based classification model for data mining in the healthcare industry. A set of four FS approaches is employed, namely ant colony optimization (ACO) based FS, genetic algorithm (GA) based FS, gain ratio, and principal component analysis (PCA). Besides, to prevent the local optima problem of ACO, an improved version named the stochastic ACO algorithm is derived by incorporating periodic partial reinitialization of the population. The presented model is validated using a set of three benchmark medical datasets, namely Chronic Kidney Disease (CKD), Indian Liver Patient (ILP), and Wisconsin Breast Cancer (WBC). An extensive comparative analysis verifies the superiority of the presented model on all the applied datasets.
2. PROPOSED MODEL
The entire working operation of the proposed model is shown in Fig. 1. Initially, the medical data undergo pre-processing and an FS process to choose the needed features and remove the unwanted ones. Then, the stochastic ACO algorithm is applied to classify the medical dataset according to the occurrence or absence of the disease.
2.1. FEATURE SELECTION
2.1.1. ACO Based feature selection (ACO-FS)
Given a feature set of size n, the FS problem is to identify a minimal feature subset of size s (s < n) while retaining a suitably high accuracy in representing the original features. A partial solution does not imply any ordering among its features, and the next feature to be selected is not necessarily influenced by the last feature added to the partial solution. Moreover, solutions to the FS problem need not be of the same size. Mapping the FS problem to the ACO technique involves the following stages:
Graph representation,
Heuristic desirability,
Pheromone update, and
Solution construction.
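As an illustration of these stages, the sketch below implements a wrapper-style ACO feature selector. It is a minimal sketch rather than the paper's exact formulation: pheromone is kept per feature instead of on graph edges, heuristic desirability is folded into the pheromone weights, and the `evaluate` callback (a stand-in for a classifier's hold-out accuracy) is an assumption.

```python
import random

def aco_feature_selection(n_features, evaluate, n_ants=10, n_iters=30,
                          subset_size=5, rho=0.1, seed=0):
    """Minimal wrapper-style ACO-FS: ants build feature subsets with
    probability proportional to pheromone; good subsets reinforce the
    pheromone of the features they contain."""
    rng = random.Random(seed)
    tau = [1.0] * n_features                      # uniform initial pheromone
    best_subset, best_score = [], -1.0
    candidates = list(range(n_features))
    for _ in range(n_iters):
        for _ in range(n_ants):
            weights = tau[:]                      # construct one subset
            subset = []
            for _ in range(subset_size):
                f = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(f)
                weights[f] = 0.0                  # sample without replacement
            score = evaluate(subset)              # e.g. hold-out accuracy
            if score > best_score:
                best_subset, best_score = sorted(subset), score
            for f in range(n_features):
                tau[f] *= (1.0 - rho)             # pheromone evaporation
            for f in subset:
                tau[f] += score                   # reinforcement
    return best_subset, best_score
```

With a toy `evaluate` that rewards a known informative pair of features, the search concentrates pheromone on them within a few iterations.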
Fig. 1. Overall process of proposed work
2.1.2. GA Based FS
The essential EA for FS is the GA, a stochastic optimization method inspired by natural genetics and biological evolution. Genes tend to evolve over successive generations to adapt to their surroundings. The GA operates on a population of individuals to produce progressively better approximations of the optimum. In each generation, a new population is created by selecting individuals according to their fitness in the problem domain and recombining them using operators borrowed from natural genetics. The offspring may also undergo mutation, resulting in populations of individuals that are better adapted to their surroundings than their parents, just as in natural adaptation. Each individual in the population specifies a predictive model: the number of genes equals the number of available features in the dataset, and each gene is a binary value denoting the inclusion or exclusion of the corresponding feature. The population size, which can be chosen per application, is set to 10 by default (N = 10, where N is the population size).
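The binary-encoded GA described above can be sketched as follows. The fitness callback, tournament size, and mutation rate are illustrative assumptions; the population size follows the paper's default of N = 10.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=10, n_gens=30,
                         p_mut=0.05, seed=0):
    """Binary-encoded GA feature selection sketch.
    Each individual is a bit list: 1 = feature included, 0 = excluded.
    fitness(bits) -> higher is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(n_gens):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]                      # elitism: keep best two
        while len(next_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b  # bit-flip mutation
                     for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

A usage example: with a fitness that counts bits matching a known good mask, the population converges toward that mask over a few dozen generations.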
2.1.3. PCA Based FS
The PCA-based FS approach to machine condition monitoring is based on the observation that the vibration amplitude of defective machine components increases as the severity of the fault increases.
Commonly, the PCA method transforms n vectors (p_1, p_2, ..., p_u, ..., p_n) from a d-dimensional space to n vectors (p'_1, p'_2, ..., p'_u, ..., p'_n) in a new d'-dimensional space as

p'_u = \sum_{z=1}^{d'} a_{z,u} e_z, \quad d' \le d   (1)

where e_z are the eigenvectors corresponding to the d' leading eigenvalues of the scatter matrix S, and a_{z,u} is the projection of the original vector p_u onto the eigenvector e_z. These projections are known as the principal components of the original data set. Both d and d' are positive integers, and the dimension d' cannot be larger than d. The d-by-d scatter matrix S of the original data set (p_1, p_2, ..., p_u, ..., p_n) is defined as

S = E[p_u p_u^T], \quad for\ u = 1\ to\ n   (2)

where E[p_u p_u^T] is the statistical expectation operator applied to the outer product of p_u and its transpose. The representation in (1) minimizes the error between the original and transformed vectors. This can be shown by considering the variance of the principal components a_{z,u}, given as

\sigma^2(e_z) = E[a_{z,u}^2] = e_z^T S e_z   (3)

where e_z denotes the d-by-1 vector e_z = [e_{1,z}, e_{2,z}, ..., e_{d,z}]^T. It is apparent that the variance of the principal components is a function of the magnitudes of the components of the vectors e_z. At a local maximum or minimum of the variance function in (3), the following relation holds:

\sigma^2(e_z + \delta e_z) = \sigma^2(e_z)   (4)

Equation (4) is satisfied when

(\delta e_z)^T S e_z - \lambda (\delta e_z)^T e_z = 0   (5)

where \lambda is a scaling factor. This leads to

S e_z = \lambda e_z   (6)

This equation can be recognized as an eigenvalue problem, with nontrivial solutions only when \lambda is an eigenvalue of the scatter matrix S. Hence, the corresponding vectors e_z (z = 1 to d') are the eigenvectors. If the condition d' < d is fulfilled, this representation also reduces the dimensionality of the vectors.
Since features transformed by the principal components are not directly related to the physical nature of the fault, the classification method proposed in this study is based on the original features themselves. The eigenvectors of the transformed data are used only to select the most responsive features from the original feature set.
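A compact sketch of this idea follows: original features are ranked by the magnitude of their loadings on the d' leading eigenvectors. The eigenvalue weighting of the loadings is an assumption, since the paper does not fix a specific scoring rule.

```python
import numpy as np

def pca_feature_ranking(X, d_prime=2):
    """Rank original features by their loading magnitudes on the d'
    leading eigenvectors of the scatter (covariance) matrix, so that
    original features, not transformed ones, are selected."""
    Xc = X - X.mean(axis=0)                  # center the data
    S = np.cov(Xc, rowvar=False)             # d-by-d scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending order
    lead = eigvecs[:, -d_prime:]             # d' leading eigenvectors
    weights = eigvals[-d_prime:]             # weight by leading eigenvalues
    scores = (np.abs(lead) * weights).sum(axis=1)
    return list(np.argsort(scores)[::-1])    # most responsive feature first
```

For instance, if one feature carries far more variance than the rest, it dominates the leading eigenvector and is ranked first.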
2.1.4. Gain Ratio Based FS
GR is a modification of the information gain (IG) that reduces its bias. GR takes the number and size of branches into account when selecting an attribute: it corrects the IG by taking the intrinsic information of a split into account. Intrinsic information is the entropy of the distribution of instances into branches (i.e., how much information is needed to tell which branch an instance belongs to). The attribute's value decreases as its intrinsic information grows larger.

Gain Ratio(Attribute) = Gain(Attribute) / Intrinsic_info(Attribute)   (7)

Ranking procedures are used to choose the subset of required attributes from the full attribute set of the original dataset.
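To make Eq. (7) concrete, the following sketch computes the gain ratio of a discrete attribute from scratch; the toy data in the usage note are illustrative only.

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute, per Eq. (7): information
    gain divided by the intrinsic (split) information."""
    n = len(labels)
    branches = {v: [l for v2, l in zip(values, labels) if v2 == v]
                for v in set(values)}
    gain = entropy(labels) - sum(len(b) / n * entropy(b)
                                 for b in branches.values())
    intrinsic = -sum(len(b) / n * log2(len(b) / n)
                     for b in branches.values())
    return gain / intrinsic if intrinsic > 0 else 0.0
```

On a two-valued attribute that perfectly separates two classes the gain ratio is 1.0, whereas a four-valued attribute with the same information gain is penalized by its larger intrinsic information (gain 1 over intrinsic 2 gives 0.5), which is exactly the bias correction described above.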
2.2. STOCHASTIC ACO FOR DATA CLASSIFICATION
Here, the ACO model is applied to extract classifier rules from the data, based on the behavior of ants and data mining approaches. The intention of this model is to assign every individual instance to a class label from the available classes by the use of specific parameters. Generally, in the classification process, the discovered knowledge can be defined in the form of if-then rules, as mentioned below.
IF <conditions> THEN <class>   (8)

The rule antecedent (IF part) holds a set of terms that are normally connected by an AND operator. The rule consequent (THEN part) specifies the class predicted for cases whose predictor attributes satisfy all the terms in the rule antecedent. The application of the ACO method for disease prediction comprises the following processes:
Represent the structure
Create rules
Heuristic function
Prune rules
Pheromone update
Use explored rules
2.2.1. Represent the structure
The structure of the ACO model is represented in Fig. 2. A variable is defined by variable_k, where k is the index of the variable, and Val_{kl} defines the l-th discrete value of variable k. The consequent part corresponds to the class, defined by C_z, where z is the index of the class. An ant begins its traversal from the source node and selects a value for every variable. Once it completes the traversal of all the parameters, it chooses a value for the class. As depicted in Fig. 2, the selected route is indicated by solid lines: Val_{1,2}, Val_{2,1}, Val_{3,3}, C_3, destination. To discover the rules, a certain number of ants must traverse individual paths, as defined in the following subsections.
Fig. 2. Structural representation of ACO method
2.2.2. Create rules
For exploring the sequence of classifier rules, a sequential covering technique is applied. Initially, the list of explored rules is empty and the training set contains all the training instances. As classifier rules are explored in each round, they are moved to the classifier rule list and the instances they cover are discarded from the training set. Rule exploration is carried out subject to the following conditions.
i. Any term added to the rule must not make the rule cover fewer cases than a fixed threshold, known as minimum_cases_per_rule.
ii. When every ant has exploited all the parameters, the rule creation procedure stops. The ants use a probability function Pro_{uv}, provided in Eq. (9), for selecting a variable value during rule creation:

Pro_{uv} = \frac{\eta_{uv} \cdot \tau_{uv}(l)}{\sum_{u=1}^{a} \sum_{v=1}^{b_u} \eta_{uv} \cdot \tau_{uv}(l)}   (9)

where \eta_{uv} is the problem-dependent heuristic function and \tau_{uv} is the quantity of pheromone.
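The selection rule of Eq. (9) amounts to a roulette-wheel draw over all (attribute, value) nodes, weighted by the product of heuristic desirability and pheromone. A sketch, with the η and τ tables as illustrative inputs:

```python
import random

def select_term(eta, tau, rng=None):
    """Roulette-wheel choice of a (variable u, value v) pair with
    probability proportional to eta[u][v] * tau[u][v], per Eq. (9).
    eta, tau: nested lists indexed [attribute][value]."""
    rng = rng or random.Random(0)
    pairs, weights = [], []
    for u, row in enumerate(eta):
        for v in range(len(row)):
            pairs.append((u, v))
            weights.append(eta[u][v] * tau[u][v])
    return rng.choices(pairs, weights=weights, k=1)[0]
```

The explicit normalization in Eq. (9) is unnecessary in code, since `random.choices` normalizes the weights internally.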
2.2.3. Heuristic function
For each term_{uv} that can be appended to the current rule, the ACO model determines \eta_{uv}, indicating the quality of the term in terms of its capability to improve the predictive accuracy. Specifically, the value of \eta_{uv} for term_{uv} involves a measure of the entropy (or amount of information) associated with the term. For each term_{uv}, the entropy is determined by

H(W | A_u = V_{uv}) = -\sum_{w=1}^{z} P(w | A_u = V_{uv}) \cdot \log_2 P(w | A_u = V_{uv})   (10)

where W denotes the class variable, z represents the class count, and P(w | A_u = V_{uv}) is the empirical probability of observing class w conditional on having observed A_u = V_{uv}.
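Eq. (10) can be computed directly from the class counts of the training cases matching a term. The normalization of the entropy into a desirability η shown here (log₂ z − H, as in Ant-Miner-style systems) is an assumption, since the paper stops at the entropy itself.

```python
from math import log2

def term_entropy(class_counts):
    """H(W | A_u = V_uv) from the class counts of the training cases
    that satisfy A_u = V_uv, per Eq. (10)."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total)
                for c in class_counts if c > 0)

def term_heuristic(class_counts, n_classes):
    """One common Ant-Miner-style choice (an assumption here):
    eta = log2(z) - H, so purer terms get higher desirability."""
    return log2(n_classes) - term_entropy(class_counts)
```

A term whose matching cases split evenly between two classes has entropy 1 bit and desirability 0, while a perfectly pure term has entropy 0 and the maximum desirability.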
2.2.4. Prune Rules
Pruning intends to discard irrelevant terms that exist in the rule. It mainly enhances the predictive ability of the rule and helps to resolve the issue of overfitting the training data. Moreover, it improves the simplicity of the rules, as a short rule is easier to interpret than a long one. As soon as an ant generates a rule, the pruning task begins; it eliminates unwanted terms from the generated rules, increasing the rule quality. The rule quality Q lies in the range 0 \le Q \le 1 and is computed by Eq. (11):

Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN}   (11)

where TP denotes true positives, TN true negatives, FN false negatives, and FP false positives.
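Eq. (11) is simply the rule's sensitivity multiplied by its specificity, computed from the confusion counts; a small helper (the zero-denominator guards are an added convention, not specified by the paper):

```python
def rule_quality(tp, fn, tn, fp):
    """Rule quality per Eq. (11): sensitivity times specificity,
    bounded in [0, 1]."""
    sens = tp / (tp + fn) if tp + fn else 0.0   # TP / (TP + FN)
    spec = tn / (tn + fp) if tn + fp else 0.0   # TN / (FP + TN)
    return sens * spec
```

For example, a rule with sensitivity 0.9 and specificity 0.8 has quality 0.72.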
2.2.5. Update pheromones
This step models the volatile nature of the pheromone laid by ants in real time. Due to the positive feedback procedure, errors in the heuristic parameter can be corrected, leading to an enhanced classifier outcome. The ants make use of this process to explore simple and effective classifier rules. At the beginning, the trails are provided with an identical quantity of pheromone, as defined by

\tau_{uv}(l = 0) = \frac{1}{\sum_{u=1}^{a} b_u}   (12)

where a denotes the attribute count and b_u is the number of possible values of attribute u. The quantity of pheromone is updated as the ants lay pheromone while exploring the paths; on the other hand, the volatility of the pheromone has to be modeled. As a result, an iterative update takes place using Eq. (13).
\tau_{uv}(l) = (1 - \rho)\,\tau_{uv}(l-1) + \left(1 - \frac{1}{1+Q}\right)\tau_{uv}(l-1)   (13)

where \rho is the pheromone evaporation rate, Q is the rule quality as given in Eq. (11), and l is the iteration index. On the other hand, nodes that are not used by the current rule undergo only pheromone evaporation, represented as

\tau_{uv}(l) = \frac{\tau_{uv}(l-1)}{\sum_{u=1}^{a} \sum_{v=1}^{b_u} \tau_{uv}(l-1)}   (14)

where a is the number of attributes, b_u is the number of values of attribute u, and l represents the iteration index. This indicates that the pheromone quantity of the unvisited nodes decreases over time.
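Eqs. (12)-(14) can be sketched as below. For simplicity the normalization of Eq. (14) is applied to every node here (the paper applies it only to unvisited nodes), and the table shapes are illustrative.

```python
def init_pheromone(b):
    """Eq. (12): every (attribute, value) node starts with the same
    pheromone 1 / sum(b), where b[u] is the value count of attribute u."""
    t0 = 1.0 / sum(b)
    return [[t0] * b_u for b_u in b]

def update_pheromone(tau, used_terms, q, rho=0.1):
    """Eq. (13) on the nodes of the constructed rule (evaporation plus a
    reward growing with rule quality q), then a global renormalization in
    the spirit of Eq. (14)."""
    for u, v in used_terms:
        tau[u][v] = (1 - rho) * tau[u][v] + (1 - 1 / (1 + q)) * tau[u][v]
    total = sum(sum(row) for row in tau)
    return [[t / total for t in row] for row in tau]
```

After an update with a positive-quality rule, the nodes on the rule's path gain pheromone share while unused nodes lose share, which is the behavior the two equations describe.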
2.2.6. Use explored rules
To classify fresh testing instances, the explored rules are applied in the order of exploration, as they are saved in an ordered list. The first rule that covers the testing instance fires, and the instance is assigned to the class predicted by that rule. In some cases, no rule in the list covers a testing instance; such instances are classified using a default rule that predicts the majority class among the uncovered training cases.
2.2.7. Stochastic process
Though the ACO algorithm provides several benefits, such as parallelism, self-learning, and efficient information feedback, it shows poor convergence due to the absence of information at the earlier stages. To resolve this issue, a new stochastic ACO algorithm is presented by incorporating periodic partial reinitialization of the population into ACO to improve its overall efficiency. The global convergence of the stochastic ACO algorithm is ensured by the periodic restart process, which helps to avoid premature convergence. Once the learning agents get trapped in a local optimum, they learn without observations, i.e., with complete randomness via a stochastic searching process. This improves the global ergodicity of the knowledge library and eliminates premature convergence. Under the population reinitialization strategy, the average fitness oscillates between a very low value just after reinitialization and a maximum value when convergence takes place. The maximum fitness advances in steps between reinitializations, which correspond to better solutions evolving through the recombination of the best individuals with the randomly generated ones.
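A minimal sketch of the periodic partial reinitialization step, assuming a bit-string population and a caller that invokes it every fixed number of iterations; the kept/replaced fraction is an illustrative choice:

```python
import random

def partial_reinit(population, fitness, fraction=0.5, rng=None):
    """Replace the worst `fraction` of the population with freshly
    randomized bit strings, keeping the best individuals intact, so the
    search regains diversity after getting trapped in a local optimum."""
    rng = rng or random.Random(0)
    n_features = len(population[0])
    ranked = sorted(population, key=fitness, reverse=True)
    n_keep = int(len(population) * (1 - fraction))
    fresh = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(len(population) - n_keep)]
    return ranked[:n_keep] + fresh
```

The surviving elite individuals carry the best solutions forward, while the random replacements produce the oscillation in average fitness described above.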
3. PERFORMANCE VALIDATION
Table 1 provides a brief description of the datasets employed in this study. For experimentation, a set of three datasets, namely CKD [14], ILP [15], and WBC [16], is used. The CKD dataset contains 400 instances with 24 attributes under two classes: 250 instances belong to class 1 and the remaining 150 to class 2. The ILP dataset contains 583 instances with 10 attributes under two classes.
Table 1 Dataset Description
Dataset Source # of instances # of attributes # of class Class 1/Class 2
CKD UCI 400 24 2 250/150
ILP UCI 583 10 2 416/167
WBC UCI 699 10 2 458/241
Table 2 Dataset Description of CKD Dataset
Table 3 Dataset Description of ILP Dataset
Table 4 Dataset Description of WBC Dataset
A set of 416 instances belongs to class 1 and the remaining 167 to class 2. The WBC dataset contains 699 instances with 10 attributes under two classes: 458 instances belong to class 1 and the remaining 241 to class 2. Tables 2-4 describe the attributes present in each dataset.
3.1. Results analysis under CKD dataset
Table 5 provides a comparison of the results offered by different FS models on the applied
CKD dataset. The table values indicated that the gain ratio model selects a total of 10 features
with the best cost of 0.104 which is significantly higher than other methods. Similarly, the
PCA model also selects a total of 10 features, with a best cost of 0.041. Though the best cost of PCA is superior to that of gain ratio, it does not outperform ACO-FS and GA-FS. Next, GA-FS shows slightly better performance than the other methods except ACO-FS, attaining a moderate best cost of 0.0187653 with the selection of 9 features. However, the ACO-FS model offers superior results by attaining the least best cost of 0.0084956 with the selection of 16 features. These values indicate the effective FS performance of the presented ACO-FS model on the applied CKD dataset.
Table 5 Comparative analysis of state of art with proposed method for CKD Dataset
Methods Best Cost Selected Features
ACO 0.0084956 15, 4, 1, 12, 18, 10, 13, 16, 2, 6, 7, 23, 11, 22, 20, 8
GA-FS 0.0187653 13, 4, 8, 9, 10, 12, 11, 22, 20
PCA 0.0410000 1,5,6,4,8,9,10,12,15,17
Gain Ratio 0.1040000 2,5,4,3,8,7,10,14,21,24
Fig. 3. Best cost analysis of presented model on CKD dataset
Table 6 Selected Features of CKD using ACO
Features I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9 I-10 I-11 I-12 I-13 I-14 I-15 I-16 I-17 I-18 I-19 I-20
Age
Blood Pressure
Specific Gravity - - - - - - - - -
Albumin
Sugar - - - - - - - - -
Red Blood Cells
Pus Cell - - - - - - - - - - -
Pus Cell clumps
Bacteria - - - - - - - - - - - - - - - - - - - -
Blood Glucose Random
Blood Urea
Serum Creatinine - - - - - - - - - - -
Sodium
Potassium - - - - - - - - -
Haemoglobin
Packed Cell Volume
White Blood Cell Count - - - - - - - - -
Red Blood Cell Count
Hypertension - - - - - - - - - - - - - - - - - - - -
Diabetes Mellitus
Coronary Artery Disease - - - - - - - - - - - - - - - - - - - -
Appetite - - - - - - - - - - -
Pedal Edema - - - - - - - - - - -
Anaemia - - - - - - - - - - - - - - - - - - - -
Fig. 3 shows the best cost analysis of the ACO-FS model on the applied CKD dataset. The sets of features chosen by the ACO-FS model over 20 iterations are provided in Table 6.
Table 7 and Fig. 4 compare the results offered by the presented model under several measures on the CKD dataset. It is noted that the RF classifier offers the least performance, with a sensitivity of 95.14, specificity of 90.17, accuracy of 93.86, F-score of 94.19, MCC of 0.87, and kappa value of 89.92. The DT classifier manages well, with a sensitivity of 96.45, specificity of 91.43, accuracy of 95.18, F-score of 95.23, MCC of 0.89, and kappa value of 91.46. Similarly, the LR model offers a moderate outcome, with a sensitivity of 98.24, specificity of 92.96, accuracy of 95.89, F-score of 96.32, MCC of 0.91, and kappa value of 91.84. Likewise, the RBFNetwork model shows a competitive result relative to the stochastic ACO algorithm, attaining a sensitivity of 98.35, specificity of 92.99, accuracy of 96.25, F-score of 96.95, MCC of 0.92, and kappa value of 92.07. Finally, the stochastic ACO model shows a superior outcome, with the maximum sensitivity of 98.81, specificity of 93.16, accuracy of 96.84, F-score of 97.13, MCC of 0.93, and kappa value of 92.14.
Table 7 Performance of CKD using Proposed method with various classifiers
Classifier Sensitivity Specificity Accuracy F-score MCC Kappa
Stochastic-ACO 98.81 93.16 96.84 97.13 0.93 92.14
RBFNetwork 98.35 92.99 96.25 96.95 0.92 92.07
LR 98.24 92.96 95.89 96.32 0.91 91.84
DT 96.45 91.43 95.18 95.23 0.89 91.46
RF 95.14 90.17 93.86 94.19 0.87 89.92
Fig. 4. Classifier results analysis of diverse models on CKD dataset
For further verification of the presented model on the applied CKD dataset, a comparison is made with recent methods in terms of accuracy, as shown in Table 8 and Fig. 5. As illustrated in the figure, the MLP offers the worst classification, attaining a minimum accuracy of 51.50. The SVM model shows somewhat better results than the MLP, with a slightly higher accuracy of 60.70. The NN model exhibits a manageable outcome over the SVM and MLP models, attaining an accuracy of 87.00. The PNN model shows an effective outcome, offering a near-optimal accuracy of 96.70. Finally, the stochastic ACO model offers the optimal result by attaining a maximum accuracy of 96.84. These values prove that the stochastic ACO model is an effective tool for the classification of the medical CKD dataset.
Table 8 Performance of stochastic ACO with recent methods on CKD dataset
Classifier Accuracy
Stochastic-ACO 96.84
PNN 96.70
SVM 60.70
NN 87.00
MLP 51.50
Fig. 5. Accuracy analysis of different recent methods on CKD dataset
3.2. Results analysis under ILP dataset
Table 9 provides a comparison of the results offered by the different FS models on the applied ILP dataset. The table values indicate that the gain ratio model selects 6 features with a best cost of 0.17382, and the GA-FS model likewise selects 6 features with a best cost of 0.19837. The PCA model selects 7 features with a best cost of 0.03850, while the ACO-FS model selects 8 features with a best cost of 0.16279. These values indicate the competitive FS performance of the presented ACO-FS model on the applied ILP dataset.
Table 9 Comparative analysis of state of art with proposed method for ILP Dataset
Methods Best Cost Selected Features
ACO 0.16279 9, 7, 10, 5, 8, 1, 3, 6
GA-FS 0.19837 8, 3, 7, 2, 6, 9
PCA 0.03850 1,2,3,4,5,6,7
Gain Ratio 0.17382 1,5,4,3,6,8
Table 10 Selected Features of ILP using ACO
Features I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9 I-10 I-11 I-12 I-13 I-14 I-15 I-16 I-17 I-18 I-19 I-20
Age of Patient
Gender of Patient - - - - - - - - - - - - - - - - -
Total Bilirubin (TB) - - - -
Direct Bilirubin (DB) - - - - - - - - - - - - - - - -
Alkphos - -
SGPT Alamine
SGOT Aspartate
Total Proteins (TP)
ALB Albumin -
A/G Ratio
Fig. 6. Best cost analysis of presented model on ILP dataset
Fig. 6 shows the best cost analysis of the ACO-FS model on the applied ILP dataset. Then,
the set of features chosen by the ACO-FS model under a set of 20 iterations are provided in
Table 10.
Table 11 and Fig. 7 compare the results offered by the presented model under several measures on the ILP dataset. Among the classifiers, the RBFNetwork model offers the poorest performance, with a sensitivity of 71.35, specificity of 0, accuracy of 71.35, F-score of 83.28, MCC of 0 and kappa value of 0, a degenerate result consistent with predicting every instance as positive. The DT classifier manages somewhat better, with a sensitivity of 75.65, specificity of 44.09, accuracy of 68.78, F-score of 79.13, MCC of 0.18 and kappa value of 17.74. Similarly, the LR model offers a moderate outcome, with a sensitivity of 75.91, specificity of 53.93, accuracy of 72.55, F-score of 82.41, MCC of 0.23 and kappa value of 21.95. Likewise, the RF model shows a competitive result, attaining a sensitivity of 76.33, specificity of 49.12, accuracy of 71.01, F-score of 80.90, MCC of 0.22 and kappa value of 21.64. Finally, the stochastic ACO model shows a superior outcome, with the maximum sensitivity of 92.19, specificity of 83.75, accuracy of 89.88, F-score of 92.97, MCC of 0.74 and kappa value of 74.93.
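The measures reported in the tables below are all standard derivations from the binary confusion matrix; as a reference, a minimal sketch of how they are computed (the function name and dictionary layout are our own, not from the paper):

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute the evaluation measures of Tables 11 and 15 from a binary
    confusion matrix. All measures except MCC are returned as percentages,
    matching the tables."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when any margin is empty
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / den if den else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"sensitivity": 100 * sensitivity, "specificity": 100 * specificity,
            "accuracy": 100 * accuracy, "f_score": 100 * f_score,
            "mcc": mcc, "kappa": 100 * kappa}
```

Note that specificity, MCC and kappa all equal to 0 with sensitivity equal to accuracy, as in the RBFNetwork row on the ILP dataset, is exactly the signature of a classifier that assigns every sample to the positive class.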
Table 11 Performance of ILP using Proposed method with various classifiers
Classifier Sensitivity Specificity Accuracy F-score MCC Kappa
Stochastic-ACO 92.19 83.75 89.88 92.97 0.74 74.93
RBFNetwork 71.35 0 71.35 83.28 0 0
LR 75.91 53.93 72.55 82.41 0.23 21.95
DT 75.65 44.09 68.78 79.13 0.18 17.74
RF 76.33 49.12 71.01 80.90 0.22 21.64
Fig. 7. Classifier results analysis of diverse models on ILP dataset
For further verification of the presented model on the applied ILP dataset, a comparison is made with recent methods in terms of accuracy, as shown in Table 12 and Fig. 8. As illustrated in the figure, the Bayesian Network offers the worst classification, attaining a minimum accuracy of 66.09, with NBTree performing only slightly better at 67.01. The KStar model improves upon both with an accuracy of 73.07, and the SVM model exhibits a further gain with an accuracy of 75.10. At last, the stochastic ACO model offers the optimal result, attaining a maximum accuracy of 89.88. These values prove that the stochastic ACO model is an effective tool for the classification of the medicinal ILP dataset.
Table 12 Performance of stochastic ACO with recent methods on ILP dataset
Classifier Accuracy
Stochastic-ACO 89.88
KStar 73.07
NBTree 67.01
SVM 75.10
Bayesian Network 66.09
Fig. 8. Accuracy analysis of different recent methods on ILP dataset
3.3. Results analysis under WBC dataset
Table 13 compares the results offered by different FS models on the applied WBC dataset. The table values indicate that the gain ratio model selects a total of 6 features with a best cost of 0.128392, which is significantly higher than that of the other methods. The PCA model selects a total of 7 features with a best cost of 0.099000; though the best cost of PCA is superior to that of gain ratio, it does not outperform the ACO-FS and GA-FS models. Next, GA-FS shows slightly better performance than the other methods except ACO-FS, attaining a moderate best cost of 0.098372 with the selection of 5 features. However, the ACO-FS model offers superior results, attaining the lowest best cost of 0.068607 with the selection of 6 features. These values indicate the effective FS performance of the presented ACO-FS model on the applied WBC dataset.
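The "best cost" values compared throughout are produced by a wrapper-style FS objective. The paper does not print its exact cost function, so the sketch below shows only a common illustrative form, with a hypothetical weight alpha trading classification error against subset size:

```python
def subset_cost(error_rate, n_selected, n_total, alpha=0.99):
    """Hypothetical wrapper-FS objective (lower is better): a weighted sum of
    the classifier's error rate on the candidate subset and the relative
    subset size, so that smaller, more accurate subsets score best."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# e.g. a candidate subset with 10% error using 6 of 9 WBC features:
# subset_cost(0.10, 6, 9)
```

Under such an objective, two subsets with equal error are ranked by size, which is why methods are compared above on both best cost and the number of selected features.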
Table 13 Comparative analysis of state of art with proposed method for WBC Dataset
Methods Best Cost Selected Features
ACO 0.068607 7, 8, 1, 6, 2, 4
GA-FS 0.098372 1, 3, 2, 7, 6
PCA 0.099000 1, 2, 3, 4, 5, 6, 7
Gain Ratio 0.128392 9, 3, 2, 1, 6, 8
Fig. 9. Best cost analysis of presented model on WBC dataset
Fig. 9 shows the best cost analysis of the ACO-FS model on the applied WBC dataset. The set of features chosen by the ACO-FS model over the 20 iterations is provided in Table 14.
Table 14 Selected Features of WBC using ACO (number of the 20 iterations, I-1 to I-20, in which each feature was selected)
Sample ID Number: 20
Clump Thickness: 20
Uniformity of Cell Size: 1
Uniformity of Cell Shape: 19
Marginal Adhesion: 4
Single Epithelial Cell Size: 20
Bare Nuclei: 16
Bland Chromatin: 19
Normal Nucleoli: 1
Mitoses: 0
Table 15 and Fig. 10 compare the results offered by the presented model under several measures on the WBC dataset. Among the classifiers, the DT model offers the poorest performance, with a sensitivity of 96.05, specificity of 91.76, accuracy of 94.56, F-score of 95.84, MCC of 0.87 and kappa value of 87.99. The LR classifier manages somewhat better, with a sensitivity of 97.37, specificity of 95.02, accuracy of 96.56, F-score of 97.37, MCC of 0.92 and kappa value of 92.40. Similarly, the RBFNetwork model offers a moderate outcome, with a sensitivity of 98.20, specificity of 91.73, accuracy of 95.85, F-score of 96.79, MCC of 0.91 and kappa value of 90.95. Likewise, the RF model shows a competitive result, attaining a sensitivity of 98.23, specificity of 94.33, accuracy of 96.85, F-score of 97.58, MCC of 0.93 and kappa value of 93.07. Finally, the stochastic ACO model shows a superior outcome, with the maximum sensitivity of 98.02, specificity of 95.47, accuracy of 97.14, F-score of 97.81, MCC of 0.93 and kappa value of 93.67.
Table 15 Performance of WBC using Proposed method with various classifiers
Classifier Sensitivity Specificity Accuracy F-score MCC Kappa
Stochastic-ACO 98.02 95.47 97.14 97.81 0.93 93.67
RBFNetwork 98.20 91.73 95.85 96.79 0.91 90.95
LR 97.37 95.02 96.56 97.37 0.92 92.40
DT 96.05 91.76 94.56 95.84 0.87 87.99
RF 98.23 94.33 96.85 97.58 0.93 93.07
Fig. 10. Classifier results analysis of diverse models on WBC dataset
For further verification of the presented model on the applied WBC dataset, a comparison is made with recent methods in terms of accuracy, as shown in Table 16 and Fig. 11. As illustrated in the figure, the BP model offers the worst classification, attaining a minimum accuracy of 94.40. The SVM model shows somewhat better results than BP, with a slightly higher accuracy of 94.50. The GAW+BP and GAW+CSSSVM models exhibit a manageable outcome over the SVM and BP models, attaining an identical accuracy of 95.00. The IGSAGAW+CSSSVM and IGSAGAW+BP models show effective outcomes, offering near-optimal accuracies of 95.80 and 96.30 respectively. At last, the stochastic ACO model offers the optimal result, attaining a maximum accuracy of 97.14. These values prove that the stochastic ACO model is an effective tool for the classification of the medicinal WBC dataset.
Table 16 Performance of stochastic ACO with recent methods on WBC dataset
Classifier Accuracy
Stochastic-ACO 97.14
SVM 94.50
GAW+CSSSVM 95.00
IGSAGAW+CSSSVM 95.80
BP 94.40
GAW+BP 95.00
IGSAGAW+BP 96.30
Fig.11. Accuracy analysis of different recent methods on WBC dataset
4. CONCLUSION
This paper has presented an optimal FS based classification model for data mining in the healthcare industry. Several studies have employed the ACO algorithm for data classification. Though the ACO algorithm provides several benefits such as parallelism, self-learning and efficient information feedback, it shows poor convergence due to the absence of information at the earlier stages. To resolve this issue, a new stochastic ACO algorithm is presented by incorporating periodic partial reinitialization of the population into ACO to improve its overall efficiency. A detailed experimentation takes place using three benchmark datasets, namely the CKD, ILP and WBC datasets. The experimental results clearly showcase the superior performance of the proposed method over the compared methods.
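The periodic partial reinitialization described above can be sketched as follows. This is an illustrative outline only, not the paper's exact procedure: the period, the reset fraction, the initial pheromone level tau0 and the ant representation are all assumed for the example.

```python
import random

def stochastic_aco_step(pheromone, ants, iteration, period=10, frac=0.3, tau0=1.0):
    """One maintenance step of the stochastic ACO sketch: every `period`
    iterations, replace the worst `frac` of the ant population with random
    feature subsets and pull the pheromone trail part-way back toward its
    initial level tau0, so the colony can escape local optima."""
    if iteration > 0 and iteration % period == 0:
        n_features = len(pheromone)
        ants.sort(key=lambda a: a["cost"])        # ascending: best ants first
        n_reset = int(frac * len(ants))
        for ant in (ants[-n_reset:] if n_reset else []):
            ant["subset"] = [i for i in range(n_features) if random.random() < 0.5]
            ant["cost"] = None                    # marks the ant for re-evaluation
        for i in range(n_features):
            pheromone[i] = (1 - frac) * pheromone[i] + frac * tau0
    return pheromone, ants
```

On non-reinitialization iterations the step is a no-op, so it can simply be called once per ACO iteration after the usual pheromone update.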
References
[1] Koh, H.C. and Tan, G., 2011. Data mining applications in healthcare. Journal of
healthcare information management, 19(2), p.65.
[2] Tomar, D. and Agarwal, S., 2013. A survey on Data Mining approaches for
Healthcare. International Journal of Bio-Science and Bio-Technology, 5(5), pp.241-
266.
[3] Yoo, I., Alafaireet, P., Marinov, M., Pena-Hernandez, K., Gopidi, R., Chang, J.F.
and Hua, L., 2012. Data mining in healthcare and biomedicine: a survey of the
literature. Journal of medical systems, 36(4), pp.2431-2448.
[4] Srinivas, K., Rani, B.K. and Govrdhan, A., 2010. Applications of data mining
techniques in healthcare and prediction of heart attacks. International Journal on
Computer Science and Engineering (IJCSE), 2(02), pp.250-255.
[5] Herland, M., Khoshgoftaar, T.M. and Wald, R., 2014. A review of data mining using
big data in health informatics. Journal of Big data, 1(1), pp.1-35.
[6] Banaee, H., Ahmed, M.U. and Loutfi, A., 2013. Data mining for wearable sensors in
health monitoring systems: a review of recent trends and
challenges. Sensors, 13(12), pp.17472-17500.
[7] Tan, K. C., Teoh, E. J., Yu, Q., & Goh, K. C. (2009). A hybrid evolutionary algorithm
for attribute selection in data mining. Expert Systems with Applications, 36(4),
8616-8630.
[8] Chetty, N., Vaisla, K. S., & Sudarsan, S. D. (2015, December). Role of attributes
selection in classification of chronic kidney disease patients. In Computing,
Communication and Security (ICCCS), 2015 International Conference on (pp. 1-6). IEEE.
[9] Wibawa, M. S., Maysanjaya, I. M. D., & Putra, I. M. A. W. (2017, August). Boosted
classifier and features selection for enhancing chronic kidney disease diagnosis.
In Cyber and IT Service Management (CITSM), 2017 5th International Conference
on (pp. 1-6). IEEE.
[10] Tazin, N., Sabab, S. A., & Chowdhury, M. T. (2016, December). Diagnosis of chronic
kidney disease using effective classification and FS technique. In Medical
Engineering, Health Informatics and Technology (MediTec), 2016 International
Conference on (pp. 1-6). IEEE.
[11] Rajeswari, K., Vaithiyanathan, V., & Neelakantan, T. R. (2012). FS in ischemic heart
disease identification using feed forward neural networks. Procedia
Engineering, 41, 1818-1823.
[12] Polat, H., Mehr, H. D., & Cetin, A. (2017). Diagnosis of chronic kidney disease
based on support vector machine by FS methods. Journal of Medical
Systems, 41(4), 55.
[13] Khemphila, A., & Boonjing, V. (2011, August). Heart disease classification using
neural network and FS. In Systems Engineering (ICSEng), 2011 21st International
Conference on (pp. 406-409). IEEE.
[14] https://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease
[15] https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset)
[16] https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)