Classification of Abnormalities in Mammograms using
Adaptive Approach
Sreedevi S*1, Terry Jacob Mathew2 and Srikripa V3
1Associate Professor, Dept. of Computer Science, Sree Ayyappa College, Eramallikkara, Chengannur, Kerala, India
2School of Computer Sciences, Mahatma Gandhi University, Kottayam, Kerala, India and MACFAST
3Academic Assistant, IIITM-K, Kerala, India
1 [email protected], [email protected], [email protected]
Abstract
Breast cancer is one of the deadliest diseases today, and its early detection is vital to
reducing mortality. This paper proposes an automated diagnosis of mammograms by
categorizing them as benign, malignant or normal. The proposed method concentrates on
the algorithmic development of automated noise removal, contrast enhancement, pectoral
muscle removal, segmentation of the Region of Interest (ROI) containing
micro-calcification clusters, feature extraction, feature selection and classification of
mammograms. Fourteen textural and statistical features are extracted from the segmented
micro-calcification clusters using the Gray Level Co-occurrence Matrix (GLCM) for 0° angle
and 3 pixel distances. Seven of the 14 extracted features are finally used for
classification. Classification using the Naïve Bayes and Support Vector Machine
classifiers resulted in accuracies of 94.89% and 87.35% respectively. The proposed method
also reduces dimensionality, memory usage and processing time, resulting in an overall
performance enhancement.
Keywords: Breast Cancer, Digital Mammography, H-Domes transformation, CLAHE,
SVM, Naïve Bayes.
1. Introduction
Breast cancer is one of the deadliest diseases today and one of the leading causes of
mortality among women around the world. Mammography has emerged as a major
diagnostic procedure in the detection and screening of breast cancer [1]. Among the
large number of imaging modalities available today, high-quality mammography is
considered the most cost-effective and sensitive method for detecting breast cancers at
1 *Corresponding Author
Journal of Information and Computational Science
Volume 10 Issue 2 - 2020
ISSN: 1548-7741
www.joics.org1389
an early stage [2]. The detection of breast cancer at its early stage is important to reduce
mortality, because treatment in the early stages is found to significantly increase the
survival rate of patients [3]. Early detection of breast cancer can be achieved with the
timely screening of mammograms; however, this is heavily reliant on the correct
interpretation of mammograms by an experienced radiologist. Clustered micro-
calcifications present in mammographic X-ray images are an important indicator for
early detection of breast cancer [4]. These micro-calcifications are tiny granule-like
deposits of calcium with diameters from about 0.1 mm and an average diameter of
0.3 mm. A micro-calcification cluster is indicated by the presence of three or more
noticeable micro-calcifications within a square centimeter region of the mammogram [5].
Different methods with highly sophisticated algorithms have been developed for the
automatic detection of breast cancer in digital mammograms. Studies report that relevant
features extracted from the individual micro-calcifications [6] or from the Region of
Interest (ROI), which contains micro-calcification clusters [7] can detect breast cancers
accurately [8].
Pelin et al. [9] developed a wavelet-based Support Vector Machine (SVM) method for
capturing information on micro-calcifications in digital mammograms and for the
classification of mammographic masses as benign or malignant. The masses were
segmented manually by radiologists and wavelet-based features were extracted from the
ROI. The final decision was taken by the classifier trained on the extracted features,
resulting in a total classification accuracy of 84.8%.
Bose et al. [10] presented a new method for the detection and classification of
micro-calcifications in mammogram images. This approach included four stages:
preprocessing, segmentation, feature extraction and classification. In preprocessing,
adaptive median filtering was used to remove noise from the image. Segmentation of
pectoral muscles and micro-calcifications was done with the Fuzzy c-Means clustering
(FCM) algorithm. Nine features were extracted from the Low Low (LL) band of the wavelet
transform and an Artificial Neural Network (ANN) was used for classification. This work
used only conventional methods for detection and classification.
Deeba et al. [11] developed a new classification method for identifying
abnormalities in digital mammograms using a Particle Swarm Optimized Wavelet Neural
Network (PSOWNN). They developed a detection algorithm based on texture energy
measures from mammograms. They implemented the algorithm with a real clinical
database of 216 mammograms, and the result gave an area under the Receiver Operating
Characteristic (ROC) curve of 0.96853, with a sensitivity and specificity of 94.167%
and 92.105% respectively.
Boulehmi et al. [12] proposed a micro-calcification (MC) detection system in
which they developed automatic methods for the enhancement of a mammogram by
using the method of galactophorous tree interpolation, segmentation of micro-
calcifications using Generalized Gaussian Density (GGD) estimation and a Bayesian
back-propagation neural network. Micro-calcifications were further classified using a
neuro-fuzzy system.
Nashid Alam and Reyer Zwiggelaar [13] proposed a system for automatic
differentiation between benign and malignant MC clusters based on their morphology,
texture, and the distribution of individual and global features using an ensemble
classifier. The relevant features are fed into an ensemble classifier to classify the MC
clusters. The validity of the proposed method was investigated using Mammographic
Image Analysis Society (MIAS) and Digital Database for Screening Mammography
(DDSM) databases. The results indicate that the approach in [13] outperforms the current
state-of-the-art methods.
Birmohan Singh and Manpreet Kaur [14] proposed an approach which
enhances the region of interest, using morphological operations. They extracted two
types of features, related to cluster shape and cluster texture and applied SVM for
classification. A new set of shape features based on the recursive subsampling method is
added to the feature set, which improved the classification accuracy of the system. These
features are capable of differentiating malignant and benign tissue regions. To examine
the performance of the proposed approach, images are taken from DDSM database and
an accuracy of 94.25% was recorded.
The disadvantage of these methods is that they used generic existing methods
for removing noise, resulting in incomplete noise removal. In addition, these methods
extracted features from the whole segmented ROI for classification. To overcome these
limitations of the related works in mammogram classification, a novel approach for
removing noise and pectoral muscles is introduced here. For classification, the
micro-calcification clusters present in the ROI are segmented further, and features are
taken from these clusters in order to obtain accurate results.
The proposed work concentrates on the algorithmic development of automated
noise removal, contrast enhancement, pectoral muscle removal, finding the ROI,
segmentation of micro-calcification clusters from the segmented ROI, feature extraction,
feature selection and classification of mammograms. The mammogram image is
denoised by a detection and filtering mechanism. Contrast Limited Adaptive Histogram
Equalization (CLAHE) is applied to enhance the contrast of the image, while a
modified tracking algorithm is used for removing pectoral muscles. The method for
segmenting the ROI in the mammograms uses hierarchical fuzzy c-means clustering,
incorporated with a feature vector containing 14 statistical and textural features
extracted from the pre-processed image. An adaptive H-Dome transformation with a
threshold is used to segment the micro-calcification clusters from the segmented ROI.
Fourteen textural and statistical features are extracted from the segmented
micro-calcification clusters using the Gray Level Co-occurrence Matrix (GLCM). Out of
the 14 extracted features, 7 are used for classification with the Naïve Bayes and SVM
classifiers.
2. Proposed Algorithm
2.1. Algorithm Steps
The algorithm for computer-aided detection of breast cancer can be explained in the
following steps.
Step 1: Read the image.
Step 2: Flip the image if it is left oriented.
Step 3: Remove noise using the MROR-ENLM algorithm.
Step 4: Remove pectoral muscles and background objects.
Step 5: Enhance the image using Contrast Limited Adaptive Histogram Equalization.
Step 6: Segment the ROIs that contain micro-calcification clusters using hierarchical
fuzzy c-means clustering with 14 features.
Step 7: Apply the adaptive h-domes transformation, threshold the image and apply
morphological operations to identify micro-calcification clusters.
Step 8: Extract 14 features from the micro-calcification clusters.
Step 9: Select 7 relevant features using the information gain attribute evaluator and
ranker search method.
Step 10: Perform the classification using the Naïve Bayes and SVM classifiers.
2.2. Pre-processing
Mammogram images often contain noise, pectoral muscles and unwanted background
objects like name tags and other identification marks. The preprocessing stage deals
with the removal of noise, pectoral muscles and background objects. For noise removal, a
technique named Modified Robust Outlyingness Ratio with Extended Non Local Means
filter (MROR-ENLM) is used, and for removing pectoral muscles and background objects, a
new tracking algorithm integrated with connected component labeling is employed.
Contrast-Limited Adaptive Histogram Equalization (CLAHE) is used to enhance the
contrast of the image. CLAHE operates on small regions within the image,
called tiles, instead of the whole image.
2.3. Segmentation
A novel Feature Based Spatial Fuzzy c-means clustering Method (FBSFCM) is
implemented to segment the regions of interest for further processing. Fourteen textural
and statistical features are extracted from the preprocessed mammogram images using the
Gray Level Co-occurrence Matrix (GLCM) for 0° angle and 3 pixel distances. The FBSFCM
method is implemented by incorporating both the features extracted from the pre-processed
image and the spatial information to segment the correct regions of interest for further
processing. The extracted features used for segmentation are contrast, correlation,
energy, homogeneity, entropy, dissimilarity, autocorrelation, cluster prominence, cluster
shade, sum average, sum entropy, variance, information measure of correlation 1
(correlation1) and difference entropy.
After segmenting the ROI, an adaptive h-domes transformation and a threshold based on
the intensity classification of the image are used to segment micro-calcification
clusters in the ROI, based on the local maxima. The algorithm selects all regional
maxima in the ROI and is independent of any size or shape criterion. Fig. 1 represents
the h-domes transformation. A regional maximum M of a gray scale image I is a connected
component of pixels with a given value h (plateau at altitude h), such that every pixel
in the neighborhood of M has a strictly lower value [15]. To extract the "domes" of a
given height, an arbitrary gray-level constant h is subtracted from I; the parts of I
that rise above the reconstruction obtained from I − h are called the h-domes. The value
of h is not constant for all the images: it depends on the average intensity value of
the image and is higher for higher-intensity images. The brightness variations in
mammograms due to breast density differences can therefore be nullified after
transformation. The h-dome image D_h(I) of a gray scale image I is given by
$$D_h(I) = I - \rho_I(I - h) \tag{1}$$
where ρ_I(I − h) denotes the grayscale reconstruction of I from I − h [15]. The value of
ρ is between 0 and 1. The transformed image is converted to a binary image by applying
Otsu's method [17]. The binary image is then used as a mask on the original gray scale
image; the resulting image is free from background intensities and contains only
intensity peaks.
Figure 1. H-dome Transformation of Gray Scale Image I
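The h-dome transformation of Eq. (1) can be illustrated with a minimal sketch. It assumes that the reconstruction term is the morphological grayscale reconstruction by dilation described in [15], computed here by simple iteration with a 3x3 neighborhood. The function names `dilate3x3`, `reconstruct` and `h_dome` and the toy image are hypothetical, and the adaptive choice of h from the average intensity is left to the caller because its exact rule is not given in the text.

```python
import numpy as np

def dilate3x3(img):
    """Grayscale dilation with a flat 3x3 (8-connected) structuring element."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    # Stack the nine shifted views and take the pixel-wise maximum.
    shifts = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.max(shifts, axis=0)

def reconstruct(marker, mask):
    """Grayscale reconstruction of `mask` from `marker` by iterated dilation."""
    current = np.minimum(marker, mask)
    while True:
        grown = np.minimum(dilate3x3(current), mask)
        if np.array_equal(grown, current):
            return current
        current = grown

def h_dome(image, h):
    """Return I - reconstruction(I - h under I): peaks of height at most h."""
    image = image.astype(np.float64)
    return image - reconstruct(image - h, image)

# Toy image (made-up values): flat background of 10 with one bright pixel of 50.
image = np.full((5, 5), 10.0)
image[2, 2] = 50.0
domes = h_dome(image, h=20.0)
```

In this toy case the background is suppressed to zero and the isolated peak keeps a height of h, which is the behavior the text relies on to remove background intensities before thresholding.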
2.4. Feature Selection
After segmenting micro-calcification clusters from the ROI obtained from the
pre-processed image, the next step is to extract the same fourteen features from the
clusters using GLCM for 0° angle and three pixel distance. This is followed by feature
selection, which is the process of removing irrelevant features and helps to reduce the
dimensionality of the
feature set. In this work, we used the information gain attribute evaluator with the
ranker search method to select attributes and rank them, based on the filter and wrapper
methods. Entropy values vary from no information to maximum information. Attributes
that contribute more information have a higher information gain value and may be
selected, whereas attributes without much information have a lower score and can be
removed. This selection process selects 7 relevant features out of the fourteen features
extracted using GLCM for classification. The selected features are
contrast, correlation, energy, homogeneity, autocorrelation, dissimilarity and entropy.
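The ranking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each numeric feature is discretized into equal-width bins before the information gain H(class) − H(class | feature) is computed, and the function names, the bin count and the toy feature values are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, bins=2):
    """Information gain of one numeric feature after equal-width binning (assumed)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    gain = entropy(labels)
    for b in set(binned):
        subset = [c for v, c in zip(binned, labels) if v == b]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def rank_features(columns, labels, bins=2):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(vals, labels, bins) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up toy values: "contrast" separates the classes, "cluster_shade" does not.
labels = ["benign", "benign", "malignant", "malignant"]
columns = {
    "contrast": [0.1, 0.2, 0.8, 0.9],
    "cluster_shade": [0.1, 0.9, 0.2, 0.8],
}
ranked = rank_features(columns, labels)
```

Keeping only the top-ranked entries of `ranked` corresponds to the reduction from fourteen to seven features described above.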
2.5. Classification
Classification is the most important step in an automatic breast cancer detection
system. Various measurements based on co-occurrence matrix features are given as
inputs to the classifier. Here, SVM and Naïve Bayes classifiers are used for
classification.
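A minimal sketch of how such co-occurrence matrix measurements might be computed is given here, assuming a 0° angle and a distance of 3 pixels as stated for the GLCM earlier. The function names, the symmetric normalization and the exact feature definitions (energy as the sum of squared entries, homogeneity with a squared-difference denominator) are assumptions for illustration, since these definitions vary between toolkits and are not specified in the text. Only six of the fourteen features are shown.

```python
import numpy as np

def glcm_0deg(image, levels, distance=3):
    """Normalized, symmetric GLCM for horizontal pixel pairs `distance` apart.

    `image` must hold integer gray levels in the range 0 .. levels - 1.
    """
    left = image[:, :-distance].ravel()
    right = image[:, distance:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)  # count each (left, right) pair
    glcm += glcm.T                       # count every pair in both directions
    return glcm / glcm.sum()

def glcm_features(p):
    """A few texture features of a normalized GLCM `p` (definitions assumed)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    nonzero = p[p > 0]
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "dissimilarity": (np.abs(i - j) * p).sum(),
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
        "energy": (p ** 2).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
        # Undefined (division by zero) for a perfectly uniform image.
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
    }

# Toy two-level image (made-up): a dark half next to a bright half.
image = np.array([[0, 0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1, 1]])
features = glcm_features(glcm_0deg(image, levels=2, distance=3))
```

Real mammogram patches would first be quantized to a fixed number of gray levels, and the resulting feature dictionary would form one row of the classifier input.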
Naïve Bayes
Naïve Bayes is one of the simplest density estimation methods, from which one of the
standard classification methods in machine learning can be formed. It works on the
basis of Bayes' theorem [18]. Compared to other classification algorithms, Naïve Bayes
is well suited to combining multiple prior probabilities from the training set.
According to Bayes' theorem, the posterior probability P(c|x) is calculated from P(c),
P(x) and P(x|c). The Naïve Bayes classifier works on the assumption of conditional
independence, which says that the effect of the value of a predictor (x) on a given
class (c) is independent of the values of other predictors.
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
where P(c|x) is the posterior probability,
P(c) is the class prior probability,
P(x|c) is the likelihood, which is the probability of the predictor given the class, and
P(x) is the predictor prior probability.
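The rule above can be sketched in a few lines. This is an illustrative sketch only: it assumes Gaussian class-conditional densities for each feature (the text does not state which density model was used), and the class name `GaussianNaiveBayes` and the toy feature values are hypothetical. P(x) is omitted because it is the same for every class and does not change which class has the highest posterior.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with an assumed Gaussian likelihood per feature and class."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # log P(c): class priors estimated from class frequencies.
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps the variance strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.mean_[None, :, :]
        # log P(x | c): sum of per-feature log densities (conditional independence).
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]

# Made-up two-feature toy data for two classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
y_train = ["benign"] * 3 + ["malignant"] * 3
model = GaussianNaiveBayes().fit(X_train, y_train)
predicted = model.predict([[1.1, 2.0], [3.1, 0.5]])
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.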
Support Vector Machine (SVM)
The SVM is a supervised learning algorithm that infers a function from a set of labelled
examples. The function takes new examples as input and produces predicted labels as
output. The output of the algorithm is a mathematical function that is defined on the
space from which the examples are taken, and takes one of two values at all points in
the space, corresponding to the two class labels that are considered in binary
classification. SVM is a classification and regression prediction tool that uses machine
learning theory to maximize predictive accuracy while limiting over-fitting of the
data. Suppose each point in a set of data points belongs to one of two classes, and the
goal is to decide the class to which a new data point belongs. In SVM, a data point is
viewed as a p-dimensional vector (a list of p numbers), and we want to know whether we
can separate such points with a (p − 1)-dimensional hyperplane. This is called a linear
classifier. There are many hyperplanes that might classify the data. One reasonable
choice as the best hyperplane is the one that represents the largest separation, or
margin, between the two classes.
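The notion of margin can be made concrete with a small sketch. It does not train an SVM; it only compares a few hand-picked candidate hyperplanes on made-up two-dimensional points and keeps the one with the largest geometric margin, which is the selection criterion described above. The function name `geometric_margin`, the candidate names and all numeric values are hypothetical.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    The value is positive only if every point lies on the side given by its
    label (+1 or -1), i.e. the hyperplane separates the two classes.
    """
    w = np.asarray(w, dtype=float)
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Made-up points: three per class, with p = 2 so a hyperplane is a line.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)

# Three separating lines, each given as (w, b).
candidates = {
    "diagonal": ([1.0, 1.0], -5.5),
    "vertical": ([1.0, 0.0], -3.0),
    "horizontal": ([0.0, 1.0], -3.5),
}
margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

All three candidates separate the toy classes, but they differ in how far the nearest point is from the boundary; an SVM solver searches over all hyperplanes for the one that maximizes this quantity.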
3. Experiment and Results
For testing, the MIAS database [16] of 322 images, which contains left and right breast
images of 161 patients, was employed. It includes three types of images: normal, benign
and malignant. Figure 2 shows an original mammogram image taken from the MIAS database
(a), the image corrupted with 30% noise (b) and the image restored using
MROR-ENLM (c). For removing pectoral muscles, right MLO mammograms are
flipped to left MLO, as shown in Figure 3. Figure 4 shows the ground truth
marked by an expert radiologist, the enhanced image, the pectoral identified image and
the pectoral removed image of mdb007 and mdb008. For segmenting the ROI that contains
micro-calcification clusters, the Feature Based Spatial Fuzzy C-Means clustering
(FBSFCM) method is used, integrating both spatial and feature information with
conventional FCM. The result obtained by applying FBSFCM and the corresponding
benchmark marked by a radiologist are given in Figure 5. The experiment is performed
with 322 images and the resulting accuracy measures are given in Table 1. Table 2 shows
a comparison of the accuracy measures of the two methods, and its graphical
representation is given in Figure 6.
Figure 2. (a) Original image mdb058. (b) mdb058 corrupted with 30% noise. (c) Restored image of mdb058 using
MROR-ENLM
Figure 3. (a) Original image mdb007 (b) Flipped image. (c) Enhanced image
Figure 4. The Ground Truth marked by an Expert Radiologist, Enhanced Image, Pectoral
Identified Image and Pectoral Removed Image (columns: Image Id, Ground truth, Enhanced
image, Pectoral identified image, Pectoral muscle removed; rows: mdb007, mdb008)
Figure 5. Benchmark and the result of segmentation using FBSFCM
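The performance of each classifier is measured using the following formulas, where TP,
TN, FP and FN denote the numbers of true positives, true negatives, false positives and
false negatives:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{4}$$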
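As a quick check of Eqs. (2) to (4), the following sketch applies them to the confusion counts reported for the two classifiers in Table 1. The function name `classification_metrics` and the dictionary layout are hypothetical; the counts themselves are taken from the table.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy in percent, as in Eqs. (2)-(4)."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
    }

# Confusion counts as reported in Table 1 (322 images per classifier).
table_1 = {
    "SVM": dict(tp=42, tn=226, fp=44, fn=10),
    "NB": dict(tp=46, tn=229, fp=39, fn=8),
}
results = {name: classification_metrics(**counts) for name, counts in table_1.items()}

for name, metrics in results.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

The computed values agree with the percentages listed in Table 2 to within rounding in the last digit.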
(Figure 5 panels: labelled image; segmented ROI using FBSFCM; micro-calcification in
FBSFCM segmented ROI.)

Table 1. Experiment Results
Classifier   No. of images   TP   TN    FP   FN
SVM          322             42   226   44   10
NB           322             46   229   39   8

Table 2. Comparison of accuracy measures of the two methods
Method                   Sensitivity (%)   Specificity (%)   Accuracy (%)
SVM Classifier           80.76             83.70             83.22
Naïve Bayes Classifier   85.18             85.44             85.40
Figure 6: Graphical representation of the accuracy comparison
4. Conclusion
The development of an adaptive Computer Aided Detection technique for segmenting
mammograms is highly desirable in order to assist radiologists in the interpretation of
abnormalities and to improve diagnostic accuracy. Noise removal, pectoral muscle
removal and segmentation of micro-calcification clusters play an important role in the
detection of abnormalities in digital mammograms. Every suspicious object can be
detected using a binary image, which is used as a mask for object extraction from the
original image. Feature extraction methods are applied to select relevant features for
classification. Two classifiers are used to detect abnormalities in mammograms, and the
results showed that the Naïve Bayes classifier gave better results than the SVM
classifier.
[Figure 6 chart: sensitivity, specificity and accuracy in percentage (vertical axis from 78.00 to 86.00) for SVM and NB; horizontal axis: Quality Measures.]
References
[1] R. J. Ferrari, R. M. Rangayyan, J. E. Desautels, R. A. Borges and A. F. Frere,
“Automatic identification of the pectoral muscle in mammograms”,
IEEE Trans. on Medical Imaging, Vol. 23, no. 2, pp. 232-245.
[2] R. Takiar, D. Nadayil and A. Nandakumar, “Projections of number of cancer cases in
India (2010-2020) by cancer groups”, Asian Pac J Cancer Prev, Vol. 11, no. 4,
(2010).
[3] B. Verma and J. Zakos, “A Computer-Aided Diagnosis System for Digital
Mammograms Based on Fuzzy-Neural and Feature Extraction Techniques”, IEEE
Transactions on Information Technology in Biomedicine, Vol. 5, no. 1, (2001).
[4] D. H. Davies and D. R. Dance, “Automatic computer detection of clustered
calcifications in digital mammograms”, Phys. Med. Biol., Vol. 35, no. 8, (1990),
pp. 1111-1118.
[5] B. N. Beena Ullala Mata and M. Meenakshi, “A Novel Approach for Automatic
Detection of Abnormalities in Mammograms”, Recent Advances in Intelligent
Computational Systems (RAICS), IEEE, (2011), Print ISBN: 978-1-4244-9478-1.
[6] N. Mudigonda and R. Rangayyan, “Detection of Breast Masses in
Mammograms by Density Slicing and Texture Flow-Field Analysis”, IEEE
Transactions on Medical Imaging, Vol. 20, no. 12, (2001).
[7] M. P. Sampat, M. K. Markey and A. C. Bovik, “Computer-Aided Detection and
Diagnosis in Mammography”, Handbook of Image and Video Processing, (2005),
pp. 1195-1217.
[8] M. Wirth, “Nonrigid approach to medical image registration matching images
of the breast”, RMIT University, Melbourne, Australia, (2000).
[9] G. Pelin, A. Serbas and G. Pelin, “Mammographic mass calculation using wavelet
based support vector machine”, Journal of Electrical and Electronics
Engineering, Vol. 9, no. 1, (2009), pp. 867-875.
[10] S. C. Bose, K. R. Shankar Kumar and M. Karnan, “Detection of
Microcalcification in Mammograms using Soft Computing Techniques”, European
Journal of Scientific Research, Vol. 86, no. 1, pp. 103-122.
[11] J. Deeba, N. Albert Singh and S. Tamil Selvi, “Computer-aided detection of
breast cancer on mammograms: A swarm intelligence optimized wavelet neural
network approach”, Journal of Biomedical Informatics, Vol. 49, (2014), pp. 45-52.
[12] H. Boulehmi, H. Mahersia and K. Hamrouni, “A New CAD System for Breast
Microcalcifications Diagnosis”, International Journal of Advanced Computer
Science and Applications, Vol. 7, no. 4, (2016), pp. 133-143.
[13] N. Alam and R. Zwiggelaar, “Automatic classification of clustered
microcalcification in digitized mammogram using ensemble learning”, The
Fourteenth International Workshop on Breast Imaging, (2018), Atlanta, Georgia,
United States.
[14] B. Singh and M. Kaur, “An approach for classification of malignant and benign
microcalcification clusters”, Indian Academy of Sciences, Vol. 8, no. 2, (2018), pp.
39-43.
[15] L. Vincent, “Morphological grayscale reconstruction in image analysis:
Applications and efficient algorithms”, IEEE Trans. Image Process., Vol. 2,
(1993), pp. 176-201.
[16] J. Suckling, J. Parker, D. R. Dance, S. Astley, I. Hutt and C. Boggis, “The
mammographic image analysis society digital mammogram database”, 2nd
International Workshop on Digital Mammography, (1994).
[17] N. Suresh Chandra Satapathy, N. Sri Madhava Raja, V. Rajinikanth,
Amira S. Ashour and Nilanjan Dey, “Multi-level image thresholding using Otsu
and chaotic bat algorithm”, Neural Computing and Applications, Vol. 29, no.
12, (2018), pp. 1285-1307.
[18] V. Priya and N. Sathya, “Classification and Prediction of Dermatitis Dataset Using
Naïve Bayes And Value Weighted Naïve Bayes Algorithms”, International
Research Journal of Engineering and Technology (IRJET), Vol. 06, no. 02,
(2019), pp. 1077-1081.