[IEEE 2011 7th Iranian Conference on Machine Vision and Image Processing (MVIP) - Tehran, Iran (2011.11.16-2011.11.17)] 2011 7th Iranian Conference on Machine Vision and Image Processing

ReliefF-based Feature Selection for Automatic Tumor Classification of Mammogram Images

Abed Heshmati Computer Department,

Tarbiat Moallem University Tehran, Iran

E-mail: [email protected]

Roya Amjadifard Computer Department,



Jamshid Shanbehzadeh Computer Department,



Abstract: Mammography is a powerful, manually interpreted method for detecting breast cancer at an early stage. This paper examines a wavelet-based machine vision method that automates breast cancer diagnosis from digital mammogram images, and improves its performance through feature selection. The machine vision scheme consists of three steps. The first step is preprocessing, in which a three-level wavelet decomposition is applied to the mammogram images. The second step is analysis, which consists of feature extraction and selection; the wavelet transform coefficients are the extracted features. Feature selection is the focus of this paper: we employ ReliefF to remove redundant features and retain the most informative ones, yielding an optimum subset of the wavelet coefficients. The third step is recognition, which takes this subset as its input. Because feature selection reduces the number of features while retaining the most informative ones, it improves the performance of the recognition phase. A support vector machine serves as the classifier that distinguishes benign masses from malignant tumors. Experimental results verify the performance of the proposed method.

Keywords: Breast Cancer, Mammography, Wavelet Transform, Feature Extraction, Feature Selection, Support Vector Machine

I. INTRODUCTION

Breast cancer (BC) is a significant public health problem worldwide. In 2009, the American Cancer Society reported approximately 269,800 deaths among women, 15% of which were caused by BC [1]. Of the 713,220 diagnosed cancer cases, 27% were BC [1]. The likelihood of curing BC completely is high if it is detected at an early stage: detecting BC in its initial stage improves the patient survival rate by 30% [2].

Digital mammography (DM) is one of the most suitable methods for early detection of BC [3], and it is a convenient and easy tool for detecting tumors. The literature has shown the effectiveness of DM in BC diagnosis [4]. A mammogram is an X-ray image used to diagnose BC. Breast tumors and masses appear as dense regions in mammograms. Benign masses generally possess smooth, round and well-circumscribed boundaries, whereas malignant tumors possess spiculated, rough and blurry boundaries. In addition, subtle texture differences have been observed between benign and malignant masses: the former are mostly homogeneous, while the latter show heterogeneous textures [5].

Feature extraction is an important part of the analysis step in medical image processing. Image texture features are sensitive to scale and therefore vary, which makes extracting robust features complicated. Features can be extracted directly from the spatial data or from transformed data. Transforms such as the Fourier or wavelet transform map the data into a different space in which specific characteristics carry particular meaning and the information is compacted into less space [6].

Wavelet transform (WT) provides an efficient image representation, and several WT-based schemes for mammogram analysis have recently been introduced [7]. Boccignone et al. presented a wavelet-based algorithm to detect microcalcification clusters in digital mammograms [8]. Liu et al. demonstrated that multiresolution analysis of mammograms based on WT improves the efficiency of BC diagnosis [9].

Rashed et al. [6] employed multiresolution mammogram analysis in multilevel decomposition to extract a fraction of the biggest coefficients, and showed that these coefficients improved BC diagnosis. Mousa et al. proposed a WT-based analysis scheme that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) in the detection step to distinguish benign masses from malignant tumors, and reported a successful classification rate [10]. Sakka et al. [11] performed a comparative study of wavelet functions widely used in feature extraction and microcalcification detection. They decomposed the mammograms into different sub-bands and reconstructed them from the high-frequency sub-bands, because microcalcifications correspond to the high-frequency sub-bands of the image. Their experimental results showed that wavelets yield significant detection results.

The focus of this paper is on feature selection (FS) to improve the speed and recognition rate of BC detection. This paper employs wavelet-based features because of the successful results reported in the literature, and improves performance by selecting an optimum subset of wavelet coefficients, a step that has received less attention in other BC detection papers.

This paper uses the mammogram images of the Mammographic Image Analysis Society (MIAS) dataset [12]. From each image we choose a region of interest (ROI) of 64×64 pixels that contains the abnormality approximately at its center. Each ROI undergoes WT decomposition, and we extract the corresponding set of coefficients. To increase classification efficiency and reduce the feature dimension, this paper employs the ReliefF algorithm to select the most relevant and non-redundant features. Finally, a support vector machine (SVM) classifies the mammogram images.

The rest of this paper is organized as follows. Section II gives a brief introduction to WT. Section III discusses the methodology of the proposed method with a focus on feature selection. Section IV presents the experimental results and discussion.

II. WAVELET TRANSFORM

The wavelet transform represents the frequency content of an image and its spatial location simultaneously. A wavelet is a waveform of limited duration with an average value of zero [13]. The multiresolution capability of WT makes it possible to examine image features at different resolutions, depending on the properties of the objects in the image. An interesting property of WT is its ability to derive coarser resolutions from finer ones; Figs. 1 and 2 illustrate this property. The output of each stage generates the coefficients that construct a coarser resolution, and each resolution corresponds to a frequency band of the signal or image. Fig. 1 presents the one-dimensional WT and Fig. 2 the two-dimensional WT. More detailed information on WT can be found in [14].
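As a minimal sketch of the one-dimensional filter bank in Fig. 1, the following plain-Python code performs the decomposition for the Haar (db1) wavelet only, assuming an even-length signal at every level. The function names `haar_step` and `haar_decompose` are illustrative and do not come from the paper.

```python
import math


def haar_step(signal):
    """One analysis step of the Haar (db1) filter bank.

    Low-pass and high-pass filtering, each followed by downsampling by 2.
    Returns (approximation, detail), each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("this sketch assumes an even-length signal")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeat the step on the approximation to obtain coarser resolutions."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)  # details[0] is the finest level
    return approx, details


approx, details = haar_decompose([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
print(len(approx), [len(d) for d in details])
```

Each level halves the number of approximation coefficients, which is the "coarser resolution from finer ones" property described above. Because the Haar filters are orthonormal, the total energy of the coefficients equals that of the input signal.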

III. MATERIALS AND METHODS

Fig. 3 shows the stages in the design of a BC recognition system. The image is first acquired and preprocessed, then features are extracted and selected, and finally the selected features are fed into the classifier [15]. This section is organized as follows. Section A introduces the data set employed in the simulation, Section B explains feature extraction and selection, and Section C discusses SVM classification.

Figure 1. A wavelet decomposition of a signal (filter bank with filters h and g, each followed by downsampling by 2, producing approximation A f and detail D f outputs)

Figure 2. Wavelet multiresolution decomposition for three levels (sub-bands A, H, V and D at each level)

TABLE I. The Distribution of MIAS Data Set

A. Data Set

The proposed algorithm is validated using the freely available digital mammogram images of MIAS. Table I shows the distribution of the images in the MIAS data set. The images were digitized at a resolution of 50 microns per pixel, with a size of 1024×1024 pixels and 256 gray levels, and they were examined and labeled by an expert on the basis of technical experience and biopsy. The dataset was selected because of the variety of cases it includes, and it is widely used in similar research [2, 3, 6, 7]. It is composed of 322 mammograms of the right and left breasts of 161 patients, of which 51 were diagnosed as malignant, 64 as benign and 207 as normal. The abnormalities are classified into microcalcification, circumscribed mass, spiculated mass, ill-defined mass, architectural distortion, and asymmetry. Almost 50% of each original 1024×1024 mammogram consists of background with a lot of noise. Therefore, a manual cropping operation is applied to extract a 64×64 image as the ROI, with the center of the abnormality area selected as the center of the ROI.
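The cropping step can be sketched as follows, assuming the image is a list of pixel rows and the abnormality center is already expressed as a (row, column) index into it. The function name `crop_roi` and the border handling are assumptions for illustration; the paper crops manually and does not say how abnormalities near the image border are treated.

```python
def crop_roi(image, center_row, center_col, size=64):
    """Crop a size x size ROI centred on the abnormality.

    Assumption: if the window would leave the image, it is shifted back
    inside, so the abnormality is then only approximately centred.
    """
    rows, cols = len(image), len(image[0])
    if size > rows or size > cols:
        raise ValueError("ROI is larger than the image")
    half = size // 2
    top = min(max(center_row - half, 0), rows - size)
    left = min(max(center_col - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Synthetic 100x100 "image" whose pixel value encodes its position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
roi = crop_roi(image, center_row=50, center_col=50)
print(len(roi), len(roi[0]), roi[32][32])
```

Converting the annotation coordinates shipped with a dataset into row and column indices is left out here, since it depends on the coordinate convention of the annotation file.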

B. Feature Extraction and Selection

Feature extraction and selection are important steps in BC detection and classification. An optimum feature set should contain effective and discriminating features while reducing the redundancy of the feature space as far as possible, to avoid the "curse of dimensionality" problem [16]. The "curse of dimensionality" means that the sampling density of the training data is too low to promise a meaningful estimate of a high-dimensional classification function from the finite number of training samples available. For some advanced classification methods, such as artificial neural networks and support vector machines, the dimension of the feature vectors not only strongly affects classification performance but also determines the training time of the algorithm. Thus, extracting useful features and selecting suitable ones is a crucial task for Computer Aided Diagnosis (CAD) systems.

The most important step in a classification task is to extract features capable of distinguishing between the classes. As discussed in the introduction, multi-scale representations have proved useful in image processing, and wavelet analysis is one way to generate such a representation.

Once the images are cropped as described, the wavelet transform is applied to each ROI and the feature vectors are formed from the transform coefficients. This work uses the Daubechies-1 (db1), Daubechies-4 (db4) and Daubechies-8 (db8) wavelet functions with three-level decomposition.

Feature extraction consists of decomposing a set of images and constructing an M×N matrix, where M is the number of images and N is the number of coefficients per image.
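The construction of the M×N feature matrix can be sketched in plain Python for the Haar (db1) case only, assuming square ROIs whose side is divisible by 2 at every level. The helper names and the ordering of the flattened coefficients are illustrative choices, not the paper's; db4 and db8 would need longer filters and boundary handling that this sketch omits.

```python
import math


def _haar_pairs(values):
    """Low-pass and high-pass Haar outputs of a 1-D sequence, downsampled by 2."""
    s = 1.0 / math.sqrt(2.0)
    low = [(values[i] + values[i + 1]) * s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) * s for i in range(0, len(values), 2)]
    return low, high


def _transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def _filter_columns(matrix):
    """Apply the Haar pair along every column; returns (low, high) matrices."""
    low_cols, high_cols = [], []
    for col in _transpose(matrix):
        low, high = _haar_pairs(col)
        low_cols.append(low)
        high_cols.append(high)
    return _transpose(low_cols), _transpose(high_cols)


def haar2d_step(image):
    """One 2-D level: returns the sub-bands (A, H, V, D)."""
    low_rows, high_rows = [], []
    for row in image:
        low, high = _haar_pairs(row)
        low_rows.append(low)
        high_rows.append(high)
    a, h = _filter_columns(low_rows)   # low along rows, then low / high along columns
    v, d = _filter_columns(high_rows)  # high along rows, then low / high along columns
    return a, h, v, d


def wavelet_features(image, levels=3):
    """Flatten all coefficients of a multi-level decomposition into one vector."""
    if len(image) % (2 ** levels) or len(image[0]) % (2 ** levels):
        raise ValueError("this sketch assumes sides divisible by 2**levels")
    approx, details = image, []
    for _ in range(levels):
        approx, h, v, d = haar2d_step(approx)
        details.append((h, v, d))
    flat = [x for row in approx for x in row]
    for bands in reversed(details):  # coarsest detail bands first
        for band in bands:
            flat.extend(x for row in band for x in row)
    return flat


def feature_matrix(images, levels=3):
    """M x N matrix: one row of N coefficients per image."""
    return [wavelet_features(img, levels) for img in images]


rois = [[[float(r + c) for c in range(64)] for r in range(64)] for _ in range(2)]
matrix = feature_matrix(rois)
print(len(matrix), len(matrix[0]))  # M and N for two 64x64 ROIs
```

For a 64×64 ROI and three Haar levels, N equals the number of pixels, because the Haar decomposition is non-redundant.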

FS algorithms may be classified into two categories based on their evaluation procedure. If an algorithm performs FS independently of any learning algorithm (i.e., it is a completely separate preprocessor), it is a filter approach: irrelevant attributes are filtered out before induction. Filters tend to be applicable to most domains because they are not tied to any particular induction algorithm. If the evaluation procedure is tied to the learning algorithm, the FS algorithm is called a wrapper. A wrapper searches through the feature subset space using the estimated accuracy of an induction algorithm as the measure of subset suitability. Although wrapper schemes may produce better results, they are expensive to run and can break down with very large numbers of features, because the learning algorithms used to evaluate the subsets can themselves encounter problems when the datasets are large [16].

This study uses a filter approach based on ReliefF, an efficient heuristic estimator of attribute quality. It can deal with data sets containing both conditionally dependent and independent attributes, and its extensions over the original Relief algorithm enable it to handle noisy, incomplete, and multi-class data sets [17]. Individual image features may contribute differently to classification. Therefore, rather than treating all features equally, we weight them, applying the ReliefF algorithm to measure each feature's weight.
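A compact sketch of ReliefF weighting for numeric features is given below. It is an illustration under stated assumptions, not the authors' implementation: every instance is visited once instead of being sampled at random, distances are sums of range-normalized absolute differences, missing values are not handled, and the number of neighbours `k` is a free parameter that the paper does not report.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Return one ReliefF weight per feature (higher means more relevant)."""
    n, p = len(X), len(X[0])
    # Feature ranges normalise differences to [0, 1]; constant features get span 1.
    spans = []
    for a in range(p):
        column = [row[a] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / spans[a]

    def distance(i, j):
        return sum(diff(a, i, j) for a in range(p))

    prior = {c: count / n for c, count in Counter(y).items()}
    weights = [0.0] * p
    for i in range(n):
        for c in prior:
            # k nearest neighbours of instance i inside class c
            candidates = [j for j in range(n) if j != i and y[j] == c]
            candidates.sort(key=lambda j: distance(i, j))
            near = candidates[:k]
            if not near:
                continue
            if c == y[i]:
                scale = -1.0  # nearest hits lower the weight
            else:
                scale = prior[c] / (1.0 - prior[y[i]])  # nearest misses raise it
            for a in range(p):
                total = sum(diff(a, i, j) for j in near)
                weights[a] += scale * total / (n * len(near))
    return weights


def top_features(weights, count):
    """Indices of the `count` highest-weighted features."""
    order = sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)
    return order[:count]


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y, k=2)
print(top_features(w, 1))
```

A feature gains weight when it differs between an instance and its nearest neighbours from other classes, and loses weight when it differs between the instance and its nearest neighbours of the same class.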

C. SVM Classification

The SVM proposed by Vapnik [18] has been studied extensively for classification, regression and density estimation. It has been applied in a wide variety of domains such as pattern recognition, function estimation, and image processing. Applied to the detection of microcalcifications, the SVM approach has been reported to achieve better detection performance [19]. Many learning techniques attempt to minimize the classification error in the training phase; however, this does not guarantee a low error rate in the testing phase. In statistical learning theory, the SVM is claimed to address this issue efficiently [20].

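| Class | Benign | Malignant | Total |
|---|---|---|---|
| Microcalcification | 12 | 13 | 25 |
| Circumscribed | 19 | 4 | 23 |
| Ill-defined | 7 | 7 | 14 |
| Spiculated | 11 | 8 | 19 |
| Architectural | 9 | 10 | 19 |
| Asymmetry | 6 | 9 | 15 |
| Normal tissue | - | - | 207 |
| Total | 64 | 51 | 322 |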

Figure 3. The various stages for the design of a BC recognition system: (1) Input Image and Preprocessing, (2) Extract ROI, (3) Wavelet Transform for Feature Extraction, (4) ReliefF-based Feature Selection, (5) SVM Classification Based on RBF Kernel

The SVM first maps the input points into a high-dimensional feature space and finds a separating hyperplane that maximizes the margin between the two classes. Margin maximization is a quadratic programming (QP) problem that can be solved through its dual problem using Lagrange multipliers. The SVM finds the optimal hyperplane without explicit knowledge of the mapping, by using dot-product functions in the feature space, which are called kernels [18].
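For reference, the QP problem and its dual can be written in the standard soft-margin form found in the general SVM literature. The paper does not spell this out, so the symbols below are assumptions: w and b define the hyperplane, φ is the feature-space mapping, ξ_i are slack variables, C is the penalty parameter, α_i are the Lagrange multipliers, y_i ∈ {−1, +1} are the class labels, and k(x_i, x_j) = φ(x_i)ᵀφ(x_j) is the kernel.

```latex
% Primal problem: maximise the margin while penalising violations
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% Dual problem: depends on the data only through the kernel
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0

% Resulting decision function
f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{n} \alpha_i \, y_i \, k(x_i, x) + b \Bigr)
```

The dual involves the inputs only through k(x_i, x_j), which is why the hyperplane can be found without evaluating the mapping itself.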

When the data points are not linearly separable, a nonlinear transformation maps the data vectors into a higher-dimensional space (called the feature space) before the linear maximum-margin classifier is applied. To work in this space without computing the mapping explicitly, the SVM uses a kernel function in which the nonlinear mapping is implicitly embedded. By Cover's theorem, better separability can be achieved in the higher-dimensional feature space, and a function can serve as a kernel provided that it satisfies Mercer's conditions. This paper uses the Gaussian radial basis function (GRBF) kernel because of its generalization property; it is given in (1):

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \qquad (1)$$

where k(x_i, x_j) is the SVM kernel and σ > 0 is a constant giving the kernel width [18].
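The kernel in (1) can be evaluated directly; the sketch below is a plain-Python illustration with the assumed function names `rbf_kernel` and `gram_matrix`, taking feature vectors as sequences of numbers.

```python
import math


def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(X, sigma=1.0):
    """Kernel values for every pair of rows in X."""
    return [[rbf_kernel(u, v, sigma) for v in X] for u in X]


print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0))  # exp(-1)
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, with σ controlling how fast. Libraries that parameterize the RBF kernel as exp(−γ‖x_i − x_j‖²) correspond to γ = 1/(2σ²).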

The last step of the proposed approach uses the SVM for training and testing. This procedure is repeated until the best classification rate is reached with the minimum number of coefficients.
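One way to read this repetition is as a search over the number of top-ranked features. The sketch below makes that reading concrete under explicit assumptions: `ranked_features` is a list of feature indices sorted by decreasing ReliefF weight, `candidate_counts` is a hypothetical list of subset sizes to try, and `evaluate` is a hypothetical callback that trains and tests the SVM on a given subset and returns its accuracy. The paper does not specify the actual search schedule.

```python
def select_feature_count(ranked_features, evaluate, candidate_counts):
    """Return (count, score) with the best score; ties keep the smaller count."""
    best_count, best_score = None, float("-inf")
    for count in sorted(candidate_counts):
        score = evaluate(ranked_features[:count])
        if score > best_score:  # strict comparison favours fewer features
            best_count, best_score = count, score
    return best_count, best_score


# Stand-in evaluator with made-up accuracies, only to show the control flow.
made_up_scores = {5: 0.70, 10: 0.80, 15: 0.80, 20: 0.75}
best = select_feature_count(
    ranked_features=list(range(20)),
    evaluate=lambda subset: made_up_scores[len(subset)],
    candidate_counts=[5, 10, 15, 20],
)
print(best)
```

Scanning the counts in increasing order with a strict comparison returns the smallest subset that reaches the best score, which matches the stated goal of the best classification rate with the minimum number of coefficients.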

IV. RESULTS AND DISCUSSIONS

This section presents and evaluates the results of the experiments, which follow the two stages of mammogram examination introduced in this paper: (1) ROI feature extraction and selection and (2) mass classification. The method was applied to a set of 284 mammograms (207 normal, 47 benign and 30 malignant cases) taken from the MIAS dataset.

First, each ROI was represented in a multiresolution way by applying the wavelet transform. We chose the Daubechies wavelet family and applied the Daubechies-1, -4 and -8 filters to all extracted ROIs, using three levels of decomposition. Together, the three wavelets (db1, db4, db8) produced a total of 18392 coefficients. Then, to increase classification efficiency and reduce the number of features (coefficients), the ReliefF algorithm was applied to select the most relevant and non-redundant ones.
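The figure of 18392 coefficients can be reproduced with a short calculation under two assumptions that the paper does not state: the db1, db4 and db8 filters have 2, 8 and 16 taps, and the boundary is extended so that a side of length n becomes floor((n + L − 1) / 2) after one level with a filter of length L. The function name `coeff_count` is illustrative.

```python
def coeff_count(size, filter_len, levels):
    """Number of 2-D DWT coefficients for a size x size image."""
    total = 0
    n = size
    for _ in range(levels):
        n = (n + filter_len - 1) // 2  # side length of each band after one level
        total += 3 * n * n             # H, V and D detail bands
    return total + n * n               # final approximation band


FILTER_LENGTHS = {"db1": 2, "db4": 8, "db8": 16}  # assumed tap counts
counts = {name: coeff_count(64, length, 3) for name, length in FILTER_LENGTHS.items()}
print(counts)                # {'db1': 4096, 'db4': 5782, 'db8': 8514}
print(sum(counts.values()))  # 18392
```

Under these assumptions the three decompositions of one 64×64 ROI contribute 4096, 5782 and 8514 coefficients, which sum to the reported total.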

In the second stage, the data set was divided into two groups: 75% for training and 25% for testing. Tables II and III show the performance of the SVM classifier in distinguishing the classes for the different feature sets, before and after feature selection. The maximum accuracy obtained is 84.51%, with 25 features. Table IV compares the performance of two feature selection methods, ReliefF and a genetic algorithm.

TABLE II. Performance Comparison of SVM Classifier Using Different Features in Detecting Benign and Malignant Masses (Before Feature Selection)

| Measure / Features | Daubechies-1 | Daubechies-4 | Daubechies-8 | All wavelet coefficients |
|---|---|---|---|---|
| Sensitivity mean | 0.1111 | 0.3529 | 0.2105 | 0.5625 |
| Specificity mean | 0.8868 | 0.8333 | 0.9423 | 0.8909 |
| Accuracy mean | 0.6901 | 0.7183 | 0.7465 | 0.8169 |

TABLE III. Performance Comparison of SVM Classifier Using Different Features in Detecting Benign and Malignant Masses (After Feature Selection)

| Measure / Features | Daubechies-1 | Daubechies-4 | Daubechies-8 | All wavelet coefficients |
|---|---|---|---|---|
| Optimal number of features | 55 | 30 | 40 | 25 |
| Sensitivity mean | 0.2857 | 0.40 | 0.4211 | 0.3125 |
| Specificity mean | 0.9649 | 0.9107 | 0.9423 | 0.9818 |
| Accuracy mean | 0.8169 | 0.8028 | 0.8028 | 0.8451 |

TABLE IV. Performance Comparison of Feature Selection Methods, ReliefF and Genetic Algorithm

| Measure / Feature selection method | ReliefF | Genetic algorithm |
|---|---|---|
| Sensitivity mean | 0.3125 | 0.4503 |
| Specificity mean | 0.9818 | 0.9601 |
| Accuracy mean | 0.8451 | 0.8257 |
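The three measures in the tables can be computed from confusion-matrix counts using the standard definitions, which the paper does not restate. In the sketch below the counts are hypothetical: the paper reports no confusion matrix, and these values were chosen only because they add up to 71 test cases (25% of 284) and reproduce the "All wavelet coefficients" column of Table II.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # fraction of positives detected
    specificity = tn / (tn + fp)               # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, not taken from the paper.
sens, spec, acc = classification_metrics(tp=9, fn=7, tn=49, fp=6)
print(round(sens, 4), round(spec, 4), round(acc, 4))  # 0.5625 0.8909 0.8169
```

Reading the tables this way shows why accuracy alone can mislead here: a classifier can reach high accuracy through high specificity on the larger class while its sensitivity stays low.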

REFERENCES

[1] American Cancer Society, available online: http://www.cancer.org/downloads/PRO/Cancer_Statistic_2009_Slides_rev.ppt.

[2] D. Sankar, T. Thomas, “Analysis of mammograms using fractal features,” in: World Congress on Nature and Biologically Inspired Computing (NaBIC), Coimbatore, Dec. 2009, pp. 936–941.

[3] F. Moayedi, Z. Azimifar, R. Boostani, S. Katebi, “Contourlet based mammography mass classification,” in: Lecture Notes in Computer Science, Image Analysis and Recognition, vol. 4633, Springer, Berlin, 2007, pp. 923–934.

[4] H.D. Cheng, X. Cai, X. Chen, L. Hu, X. Lou, “Computer-aided detection and classification of microcalcifications in mammograms: a survey,” Pattern Recognition 36, 2003, pp. 2967–2991.

[5] T. Mu, A.K. Nandi, R.M. Rangayyan, “Classification of breast masses using selected shape, edge-sharpness, and texture features with linear and kernel-based classifiers,” Journal of Digital Imaging, Vol. 21, No. 2, June 2008, pp. 153–169.

[6] E.A. Rashed, I.A. Ismail, S.I. Zaki, “Multiresolution mammogram analysis in multilevel decomposition,” Pattern Recognition Letters 28, 2007, pp. 286–292.

[7] M. Meselhy Eltoukhy, I. Faye, B. Belhaouari Samir, “A comparison of wavelet and curvelet for breast cancer diagnosis in digital mammogram,” Computers in Biology and Medicine, Vol. 40, April 2010, pp. 384–391.

[8] G. Boccignone, A. Chianese, A. Picariello, “Computer aided detection of microcalcifications in digital mammograms,” Computers in Biology and Medicine 30, 2000, pp. 267–286.

[9] S. Liu, C.F. Babbs, E.J. Delp, “Multiresolution detection of spiculated lesions in digital mammograms,” IEEE Transactions on Image Processing 10 (6), 2001, pp. 874–884.

[10] R. Mousa, Q. Munib, A. Moussa, “Breast cancer diagnosis system based on wavelet analysis and fuzzy-neural,” Expert Systems with Applications 28, 2005, pp. 713–723.

[11] E. Sakka, A. Prentza, I.E. Lamprinos, D. Koutsouris, “Microcalcification detection using multiresolution analysis based on wavelet transform,” in: Proceedings of the International Special Topic Conference on Information Technology in Biomedicine (IEEE-ITAB2006), Ioannina, Epirus, Greece, October 26–28, 2006.

[12] http://peipa.essex.ac.uk/ipa/pix/mias.

[13] G. Braz Junior, E. Correa da Silva, A. Cardoso de Paiva, A. Correa Silva, “Breast tissues classification based on the application of geostatistical features and wavelet transform,” 6th International Special Topic Conference, Tokyo, 2007, pp. 227–230.

[14] S.G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, 1989, pp. 674–693.

[15] N.H. Sweilam, A.A. Tharwat, N.K. Abdel Moniem, “Support vector machine for diagnosis cancer disease: A comparative study,” Egyptian Informatics Journal, Cairo, 2010, pp. 81–92.

[16] R. Jensen, Q. Shen, Computational Intelligence and Feature Selection, IEEE Computational Intelligence Society, Sponsor, 2008, pp. 113–133.

[17] I. Kononenko, “Estimating attributes: Analysis and extensions of RELIEF,” Lecture Notes in Computer Science, Vol. 784, 1994, pp. 171–182.

[18] V.N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.

[19] J. Ye, S. Zheng, C. Yang, “SVM-based microcalcification detection in digital mammograms,” IEEE International Conference on Computer Science and Software Engineering, 2008, pp. 89–92.

[20] C. Lin, C. Yeh, Sh. Liang, J. Chung, N. Kumar, “Support vector based fuzzy neural network for pattern classification,” IEEE Transactions on Fuzzy Systems 14 (1), 2006, pp. 31–41.
