9
Unsupervised Method to Localize Masses in Mammograms Bilal Ahmed Lodhi * Korea University Seoul Republic of Korea [email protected] ABSTRACT Breast cancer is one of the most common and prevalent type of cancer that mainly affects the women population. chances of effective treatment increases with early diagnosis. Mam- mography is considered one of the effective and proven tech- niques for early diagnosis of breast cancer. Tissues around masses look identical in mammogram, which makes auto- matic detection process a very challenging task. They are indistinguishable from the surrounding parenchyma. In this paper, we present an efficient and automated approach to segment masses in mammograms. The proposed method uses hierarchical clustering to isolate the salient area, and then features are extracted to reject false detection. We ap- plied our method on two popular publicly available datasets (mini-MIAS and DDSM). A total of 56 images from mini- mias database, and 76 images from DDSM were randomly selected . Results are explained in-terms of ROC (Receiver Operating Characteristics) curves and compared with the other techniques. Experimental results demonstrate the ef- ficiency and advantages of the proposed system in automatic mass identification in mammograms. CCS Concepts Computing methodologies Cluster analysis; Keywords Breast Mass Detection; Automatic Mammogram Segmenta- tion; Mass Classification; 1. INTRODUCTION Breast cancer is the most common cause of cancer-related deaths among women worldwide. With more than 450, 000 deaths each year, breast cancer accounts for about 14% of all female cancer deaths ([11]). Recent statistics says that * PhD candidate in Korea University Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. c 2019 ACM. ISBN 978-1-4503-2138-9. DOI: 10.1145/1235 1 out of 10 women is affected by breast cancer in their life- time. According to GLOBOCAN 2012, 1.7 million Women were diagnosed with breast cancer and there were 6.3 million women alive who had been diagnosed with breast cancer in the previous five years ([3]). Although the breast cancer rate is increasing in many parts of the world, however the mortal- ity rate is much higher in less developed countries, because of insufficient facilities available for diagnosis and treatment. Therefore, there is an urgent need of reliable and affordable approches for early diagnosis and treatment of breast cancer in less developed countries.It can have significant impact on cancer treatment, faster recovery and reducing mortality. Mammography is considered most effective technique as it can detect 8590% percent of all breast cancers ([3]). A mass is an uncontrolled grown tumor and we classify them into malignant and benign by their size, shape and other fea- tures. As described earlier that early diagnosis is a key for effective treatment. Therefore the job radiologist becomes very important, who can interpret mammograms for early diagnosis. Mammogram does not have so much information imprinted on the film. Cancer diagnosis in this scenario becomes a subjective criteria. Radiologist opinion depends on their experience. [22] states that radiologist’s diagnosis inter-observer variation rate is 65 75% ). He can miss a significant proportion of abnormalities and in addition a large number of mass come out to be benign after biopsy ([22]). [12] states that Computer aided diagnosis (CAD) systems is helpful for the radiologists in diagnosis. ([26]) claims that detection accuracy improved by combining the expert knowledge with CAD scheme.We proposed an algo- rithm to address the previously described problem for breast cancer diagnosis. Proposed scheme is novel in the following ways: Scope of the detection algorithm is wide. It can de- tect different type of cancers in malignant and benign categories. Proposed algorithm was tested on many ill-defined masses also. A method is proposed to identify masses, irrespective of their size and shape. We proposed an efficient and unsupervised approach to detect masses in mammogram images. It segments the breast region and finds the candidate regions of interests (ROIs). Generalization of algorithm is tested by experimenting cross validation across two different datasets. arXiv:1904.06044v1 [cs.CV] 12 Apr 2019

Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea [email protected] ABSTRACT Breast cancer is one of the most common

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

Unsupervised Method to Localize Masses in Mammograms

Bilal Ahmed Lodhi∗

Korea UniversitySeoul

Republic of [email protected]

ABSTRACTBreast cancer is one of the most common and prevalent typeof cancer that mainly affects the women population. chancesof effective treatment increases with early diagnosis. Mam-mography is considered one of the effective and proven tech-niques for early diagnosis of breast cancer. Tissues aroundmasses look identical in mammogram, which makes auto-matic detection process a very challenging task. They areindistinguishable from the surrounding parenchyma. In thispaper, we present an efficient and automated approach tosegment masses in mammograms. The proposed methoduses hierarchical clustering to isolate the salient area, andthen features are extracted to reject false detection. We ap-plied our method on two popular publicly available datasets(mini-MIAS and DDSM). A total of 56 images from mini-mias database, and 76 images from DDSM were randomlyselected . Results are explained in-terms of ROC (ReceiverOperating Characteristics) curves and compared with theother techniques. Experimental results demonstrate the ef-ficiency and advantages of the proposed system in automaticmass identification in mammograms.

CCS Concepts•Computing methodologies → Cluster analysis;

KeywordsBreast Mass Detection; Automatic Mammogram Segmenta-tion; Mass Classification;

1. INTRODUCTIONBreast cancer is the most common cause of cancer-related

deaths among women worldwide. With more than 450, 000deaths each year, breast cancer accounts for about 14% ofall female cancer deaths ([11]). Recent statistics says that

∗PhD candidate in Korea University

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full cita-tion on the first page. Copyrights for components of this work owned by others thanACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-publish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

c© 2019 ACM. ISBN 978-1-4503-2138-9.

DOI: 10.1145/1235

1 out of 10 women is affected by breast cancer in their life-time. According to GLOBOCAN 2012, 1.7 million Womenwere diagnosed with breast cancer and there were 6.3 millionwomen alive who had been diagnosed with breast cancer inthe previous five years ([3]). Although the breast cancer rateis increasing in many parts of the world, however the mortal-ity rate is much higher in less developed countries, becauseof insufficient facilities available for diagnosis and treatment.Therefore, there is an urgent need of reliable and affordableapproches for early diagnosis and treatment of breast cancerin less developed countries.It can have significant impact oncancer treatment, faster recovery and reducing mortality.

Mammography is considered most effective technique asit can detect 85∼90% percent of all breast cancers ([3]). Amass is an uncontrolled grown tumor and we classify theminto malignant and benign by their size, shape and other fea-tures. As described earlier that early diagnosis is a key foreffective treatment. Therefore the job radiologist becomesvery important, who can interpret mammograms for earlydiagnosis. Mammogram does not have so much informationimprinted on the film. Cancer diagnosis in this scenariobecomes a subjective criteria. Radiologist opinion dependson their experience. [22] states that radiologist’s diagnosisinter-observer variation rate is 65 ∼ 75% ). He can missa significant proportion of abnormalities and in addition alarge number of mass come out to be benign after biopsy([22]). [12] states that Computer aided diagnosis (CAD)systems is helpful for the radiologists in diagnosis. ([26])claims that detection accuracy improved by combining theexpert knowledge with CAD scheme.We proposed an algo-rithm to address the previously described problem for breastcancer diagnosis. Proposed scheme is novel in the followingways:

• Scope of the detection algorithm is wide. It can de-tect different type of cancers in malignant and benigncategories. Proposed algorithm was tested on manyill-defined masses also.

• A method is proposed to identify masses, irrespectiveof their size and shape.

• We proposed an efficient and unsupervised approachto detect masses in mammogram images. It segmentsthe breast region and finds the candidate regions ofinterests (ROIs).

• Generalization of algorithm is tested by experimentingcross validation across two different datasets.

arX

iv:1

904.

0604

4v1

[cs

.CV

] 1

2 A

pr 2

019

Page 2: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

The organization of paper is as follows. Section I presentsintroduction and significance of the work. Section II dis-cusses previous and related work. Section III briefly de-scribes the proposed method for preprocessing. Section IVanalyses the results and finally, Section V concludes the ar-ticle.

2. RELATED WORKIn order to develop computer aided breast cancer detec-

tion tools, researchers have used several approaches. [10]proposes a Particle Swarm Optimized Wavelet Neural Net-work (PSOWNN) based classification approach for detectionof masses in digital mammograms. Their method is based onextracting Laws Texture Energy Measures from the mam-mograms and classifies the suspicious regions by PSOWNN.Their method does not have any noise removal algorithmand also they do not propose any intelligent method ROIdetection. In ([25], [24], authors used Latent Dirichlet Al-location (LDA) to mine the feature set of mammogram im-ages. They presented the modified Morphological Compo-nent Analysis method to identify the mass region and thenextracted morphological features. Finally, LDA is used toclassify the masses. Simple Morphological approaches aresensitive to noise. They also did not presented any prepro-cessing for collection of ROIs.

In [21], authors proposed the modified Fuzzy c-means clus-tering to cluster the masses, extracted morphological, tex-tual and spatial features and classified the features usingSVM (Support Vector Machine). Their method lacks thenoise removal and intelligent ROI segmentation. [16] pre-sented a set of tools to aid segmentation and detection ofmammograms that contained mass. After the top-hat mor-phological operator, de-noising is applied. Image gray-levelwas enhanced by wavelet transform and wiener filter. Fi-nally, segmentation method was employed using multiplethresholding, wavelet transform and genetic algorithm. Theyused manual process to reduce the false positives generatedby genetic algorithm. Authors also did not do the auto-matic classification of the ROIs. [1] proposed a method formass detection based on saliency map. After the creation ofsaliency map, a threshold is used to obtain the ROI. A num-ber of features were extracted and classified by SVM. Auto-mated detection of malignant masses in screening mammog-raphy has been discussed in [19]. It developed a techniquethat used presence of concentric layers which surrounds afocal area in the breast region, that has suspicious morpho-logical characteristics and low relative incidence. Segmenta-tion process in both of the earlier described algorithms arefocused on the bright or salient parts of the image, which isalways mis-leaded by the blood vessels resulting in the wholebreast parenchyma as a ROI. [13] work is based on applyingone-dimensional recursive median filter to different numberof angles to each pixel. It becomes difficult to detect whenstructure of the mass and a normal glandular looks similar.It can only be detected if there were asymmetry between theleft and right breasts.

[14] proposed method is based on the analysis of ISO-intensity contour groups to segment skeptical masses. Falsepositives are then removed using features based on flow ori-entation in adaptive ribbons of pixels across the margins ofmasses. The procedure is tested on 56 images from the Mini-MIAS database and got a sensitivity at the rate of 81% at2.2 false positives per image. Furthermore, based on gray-

ImageStandardizat-ion

Preproce-ssing

FeaturesExtraction

Classific-ation

System Input

ROI

Textual Informa-

tion

Phase IInput Image

Phase IIPre-Processing

Phase IIIFeature Extraction

Phase IVClassification

Artifacts RemovalContrast Enhancement

Segmentation. . .

TextureShape

Statistical. . .

Support Vector MachineNeural Network

Linear Discriminant Analysis. . .

Figure 1: General phases of Detection Algorithm

level co-occurrence matrices (GCM) and using features on alogistic regression method, the classification of masses wereperformed as benign or malignant using five texture features.An accuracy of 0.79 is achieved as a result of this classifi-cation, with 19 benign and 13 malignant lesions. Authorsused the hard thresholds to get the contours of objects inthe image. Contour are very sensitive to noise resulting inincrease of false positives and bad segmentation. Algorithmwill fail to detect the mass if the boundary is ill-defined oreven the mammogram is much denser.

[5] proposed a method for diagnosis of breast lesions (di-agnosis). Masses using the wavelet transform to obtain amulti-resolution representation of the original image at eachresolution, a set of features is extracted which serves as inputto a binary tree classifier. Algorithm achieved 91.9% truepositive detection accuracy. ROIs were manually croppedin the proposed system. Their proposed system is based onwavelet and curvelet coefficients, which is very high in num-bers. Selecting best coefficients is an optimization prob-lem and also it is very sensitive to noise. [27] proposeda method combines several artificial intelligence techniqueswith the discrete wavelet transform (DWT). ROI’s are deter-mined through dimensional analysis using a multi-resolutionMarkov random field algorithm, the segmentation is per-formed that leads to the application of tree type classifica-tion strategy. The algorithm was tested in the Mini-MIASdatabase and has a sensitivity of 97.3% with 3.9 false pos-itives per image. Their proposed method works well withwell-defined masses, but ill-defined masses are difficult to beclassified by this method.

3. METHODOLOGYFemale breast parenchyma is a multiplex biological struc-

ture and is composed of glandular, fatty, and lymphatic tis-sues (lymphovascular structures). Mammography imprintsthe texture information of breast tissue in image. Thoughthe composing components may be complicated, the massregions are characterized of high intensity and high texture.Figure 1 shows the process of a typical analysis system.

We propose an efficient and unsupervised approach toidentify the suspicious regions in mammogram images. Pro-posed algorithm isolates the spatially interconnected struc-tures in the image, which are concentrated around salientintensities. As a result, it is possible to extract high-levelinformation to analyze further, to characterize the physicalproperties of mass regions and to prepare a short-list of skep-tical ROIs. Figure 2 shows our proposed algorithm. Furtherexplanation of the algorithm is explained in the followingsubsections.

3.1 Image StandardizationData from different sources should be converted to one for-

mat. Proposed algorithm was tested on two datasets: Digi-

Page 3: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

Hierarchical Clustering

Hierarchical Clustering

(b) ROIGeneration

Grouping and Elimination

(c) Feature Extraction (d) Pattern

Classification (SVM)

Co-occurrence Matrix

Co-occurrence Matrix

Features for Benign vs Malignant

Features for Benign vs Malignant

Features for Mass vs Normal

Features for Mass vs Normal Mass vs.

NormalMass vs. Normal

Image

Input

Image Standardization

Pre-processing

Figure 2: Overview of the Proposed Algorithm

Figure 3: Original Image from MIAS dataset

tal Database for Screening Mammography (DDSM) ([8]:[9])and Mammographic Image Analysis Society Database (mini-MIAS) ([20]). MIAS dataset is in Portable Gray Map (PGM)format while DDSM dataset contains images in LJPEG for-mat. We converted the DDSM dataset TO 16-bit PortableNetwork Graphics (PNG) format by a wrapper program de-veloped by us 1.

3.2 ROI Detection PhaseOne of the main tasks is to get mass-candidate regions.

Following subsections describe the way to get those regions.

3.2.1 SmoothingIt is assumed that malignant masses typically cause dis-

tortion to the surrounding tissues. So, segmentation processcan over-segment the image and it can’t get those masses in asingle entity. To overcome this problem, prior smoothing ofthe image is necessary. In the present work, Gaussian pyra-mid is used to uniformly highlight the salient regions. Sub-sampling to many levels results in over smoothing the imagewhich converts the image regions as blobs. However, someresearchers ([17]) have performed mass detection on reducedresolutions of 800m. Regions of mass are hyper-densed. Weneed to get the full mass area to extract meaningful featuresfrom the ROI. Abrupt changes in the intensity of the objectspresent in the image effect the segmentation process. Peaksin the image objects are smoothed by the above describedpreprocessing.

3.2.2 Hierarchical Clustering with GLCM (Gray levelCo-occurrence Matrix) data

We applied hierarchical clustering with GLCM data tosegment the salient regions of image. Before segmentationof the image, its contrast was enhanced by CLAHE (Con-trast Limited Adaptive Histogram Equalization). Further,

1Utilities at http://microserf.org.uk/academic/Software.htmlwere used to write a wrapper program.

we calculate the gray-level co-occurrence matrix from im-age. GLCM is created with distance one and 4 directions [01; -1 1; -1 0; -1 -1] (0◦, 45◦, 90◦, 135◦). Other angles werenot computed due to redundancy of the data. GLCM datafrom all directions are summed up and normalized. Figure4 depicts the explanation of co-occurrence matrix. Intensi-ties in mass exhibit the glowing effect (intensities are propa-gated from the center of the masses). Hierarchical clusteringcan cluster image according to propagated intensities whilehaving a family structure of concentric objects. At eachhierarchical level a measure of dissimilarity is defined to dif-ferentiate clusters and object are merged together as one,if their dissimilarity is less than or equal to the acceptabledissimilarity measure.

Many researchers have proposed methods for multilevelthresholding by discriminant analysis ([15], [18], and [2]).They thresholded the image by the cluster analysis irrespec-tive of the physical location of the cluster. This idea worksbetter if the image is multi-modal and we divide it into twoclusters (background and foreground). However, it does notgive fine results on low-level x-rays images which are mostlyunimodal. In this case, multi-thresholding does not givecompact objects for ROI. We incorporated the discriminantanalysis ([2]) with GLCM data to get compact objects. Theproposed method clusters the image intensities in a hierar-chy, according to their co-occurrence and similarity measure.Number of thresholds are found by cutting the dendrogramat desired level.Initially, each gray-level is designated to adifferent cluster i.e. g gray-levels in image will generate qnumber of clusters and each cluster has its own threshold Ti

. Family hierarchy of clustering process can be viewed as adendrogram. The estimated thresholds for the image to seg-ment can be obtained by cutting the branch in dendrogram.Clustering algorithm is defined in algorithm 1.

Result: Return n thresholdsGiven: A set of gray-levels {x1, x2, ...., xq};A distance function dist(c1,c2);m number of threshold levels;for i=1 to q do

ci = {xi};ti = {xi};

endC = {c1,.....,cq};T = {t1,.....,tq};for k=1 to q-m do

- make adjacent cluster pairs;- ( cmin1,cmin2) = minimum dist(ci,cj) for all ci,cjin C;

- remove cmin1 and cmin2 from C;- remove tmin1 and cmin2 from T;- add {cmin1,cmin2} to C;- add {tmin1,cmin2} to T;

endAlgorithm 1: Clustering Algorithm

Distance Metric.The distance measure between two clusters in the pro-

posed algorithm is defined as ratio between the measure ofobserved dispersement and the expected dispersement. it iscalculated as:

Page 4: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

10 5 8 1 8 5 8

5 10 8 1 10 8 1

1 8 5 5 1 5 1

1 5 5 8 10 8 8

10 8 5 8 8 1 10

8 1 1 10 1 8 8

5 8 5 8 8 5 10

0° 45° 90°

135° All Directions1 5 8 10

1 1 2 3 3

5 2 2 6 2

8 5 5 4 1

10 1 1 4 0

1 5 8 10

1 1 1 5 1

5 3 3 2 2

8 5 1 5 2

10 0 3 2 0

1 5 8 10

1 3 2 4 1

5 2 2 5 1

8 4 4 2 5

10 2 1 4 0

1 5 8 10

1 1 1 4 2

5 3 1 2 2

8 4 5 4 1

10 1 2 2 1

1 5 8 10

1 6 6 16 7

5 10 8 15 7

8 18 15 15 9

10 4 7 12 1

j1 2 Ng

1 2

Ng

M1,1 M1,2 M1,Nj

M2,1 M2,2

MNi,1 MNi,Nj

i

Figure 4: Process of co-occurrence matrix

dist(qi,qj) =(1− CP qiqj )(Pqi − Pqj )2

[Xqi −Xqj

]2σ2qiqj

(1)

where q is the total number of clusters, Pq is probabilitydensity function of image histogram and it can be calcu-lated as equation 2. CPi,j represents the normalized co-occurrence frequency of the cluster pair being merged. Itis defined in equation 3. X is the mean value of the clusterand defined in equation 5. σ2 is the variance of both clusterswhich are being merged. It is defined in equation 7.

Pq =

Tq∑l=Tq−1+1

h(l) (2)

where l represents the gray-level in image (value: [0 255])such

thatq∑

i=1

Pi = 1.

CPi,j =

Tqj∑t=Tqj−1+1

Tqi∑s=Tqi−1+1

CMs,t

Tqi − Tqi−1

(3)

where CMs,t is the co-occurrence probability of gray-level sand t.

Mean is also called as the expectation of the cluster andcan be represented as:

µ = E(qi) =

q∑i=1

liP (li) (4)

so we calculated the mean as:

Xq =1

Pq

Tq∑l=Tq1+1

lh(l) (5)

Variance of the distribution is defined as:

σ2 =

q∑i=1

(li − µ)2P (li) (6)

This formulates the variance into the following equation.

σ2qiqj =

Tq2∑l=Tq−1+1

[l − CXqiqj

]2h(l) (7)

where CX is defined as average mean of the cluster pair.It is calculated as the weighted average between the clustermeans of the pair being merged:

CXqiqj =PqiXqi + PqjXqj

Pqi + Pqj(8)

We imposed a restriction that only the adjacent clus-ters are allowed to merge. The similarity measurement isadapted by [15]. Pair having the minimum distance value isthe best candidate to merge.

The saliency of a region is measured by the nesting depthof hierarchical clustering which identifies nested objects. Onestatistical parameter LevelParameter is introduced that rep-resents the levels in hierarchical clustering. LevelParametervalue of 5 is used in the study. Figure 5a shows the numberof objects found in mammogram by segmentation process.

3.2.3 Grouping and EliminationSegmentation process described in previous section results

in a large number segmented objects. We devised an algo-rithm to reduce the number of objects and extract only therelevant data for analysis. Our first step in this process isgrouping and elimination. As previously described, massesexhibits the glowing effect, therefore, we first find the dense-core portions and then go to the next threshold level to findobjects which encircles the previously detected object. Theidea of prestige in link analysis is used and also the hierarchi-cal clustering nodal relation is considered. Every possible re-gions are given a prestige score of 1. When they are encircledby other immediate lower density parent they forward theirprestige score to the parent. Sum of euclidean distance be-tween the higher density objects and lower density objects.Lower density object should cover at least 80% of higherdensity object. Algorithm 2 describes the process of mergescore. This process is repeated for all the segmented regionsat every selected hierarchical level. Hierarchical clusteringgives a parent-child relationship of clusters also, we can usethis relationship to avoid unacceptable merging of objects.

Page 5: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

Objects having at least 3 prestige score from each level areup-sampled to full resolution image. Result of merging pro-cess is shown in Figure 5, where 5a represents the detectedROIs and 5b shows the merged objects.

Result: Merge ScoreGiven;Labels = {L1, L2,.....,Ln};for i = 1 to n do

currentLabel = {Li};Objects = Object by current current label;numObjects = number of Objects by current label;for j = 1 to numObjects do

mergeScore[i][j] = 1;;enddist = distanceL2 (Objects[i], Objects[i-1] );if dist <0.2 then

mergeScore[i][j] += mergeScore [i-1][j] ;end

endAlgorithm 2: Merge Score

3.3 Features for False Positive (FP) AnalysisFollowing set of features are extracted to classify objects

into true mass and breast tissue (false positive). These fea-tures are well-established statistical features and finalizedby radiologist too after analyzing the prominent patterns ofmasses on mammograms.

Region Contrast:Generally, mass is imprinted on mammogram as a denseobject as compared to its surroundings, having at least auniform density. We used this property for classificationbetween true mass and breast tissue. Region Contrast iscomputed as a difference between mean intensities of fore-ground and background in ROI. Foreground area is the se-lected mass or object while background represents the back-ground area surrounding this object. Regions which resultsin negative values of region contrast are rejected for furtherprocessing.

Mean Gradient:Gradient monitors the directional change in intensity. Gra-dient magnitude describes that how quick the image is chang-ing. We calculated the mean gradient of the boundary pixelswhich strengthens the compactness of the region( describedlater).

Entropy:The concept of entropy is in information theory which statesthe probabilistic behavior of the information sources. Thisstatistical measure is a measure of randomness that is usedto characterize the texture of image.

Standard Deviation:It is popular term in statistics which gives a measure ofspread of data. This represents the measure, that how muchclose the points are in the given region of the image.

Compactness:The value of compactness gives the ratio of contour whichencloses an area. it is defined as:

compactness = 1− 4 ∗ pi ∗AP 2

(9)

where A is Area of object enclosed by perimeter P. UsuallyBenign masses have higher value of compactness, because itdefines that small perimeter is enclosing a bigger area. Wehave used this feature in benign vs malignant classificationtoo.

3.4 Classification ModelSVM(Support Vector Machine) was used to classify the

masses. We selected support vector machine as it gives goodresults for binary classification. The basic idea behind SVMis to separate the input data by optimal method. As ourdata is not linearly separable, we used Gaussian RBF (Ra-dial basis function) kernel. Sigma and C are two importantfactors for RBF kernel. optimal values for RBF were grid-searched between 10−3 to 103. Harmonic Mean (HM) iscalculated to compare the C and sigma pairs. HarmonicMean is defined as:

HM =2 ∗ sens ∗ specsens+ spec

(10)

where sens is sensitivity and spec represents specificity ofthe system. We adopted a 10-fold cross validation techniqueto train,test and validate the data.

4. RESULTS AND DISCUSSION

4.1 Image DatabaseThis study was carried out on images from two databases.

We selected 56 images from mini-MIAS database ([20]). Itincludes 13 normal, 13 malignant and 30 benign cases. Thedataset include all types of masses from both classes (benignand malignant). Table 1 shows the overview of number ofcases used in experiments from MIAS-dataset. We also se-lected 76 cases from DDSM database ([8]:[9]). Table 2 showsthe summary of DDSM database

4.2 Detection of ROIsOur proposed preprocessing steps detected almost all masses

in the dataset. Through careful examination of ROIs, wefound that our algorithm missed two cases in MIAS database.One from Malignant and the other from Benign case (mdb179and mdb191), Dense-glandular and Fatty Glandular. Thecontrast in these two images was very high and distributed,making it difficult to detect isolated regions. All other masseswere successfully detected. This results in the detection ac-curacy of 95.3%. The detection accuracy on DDSM datasetwas 97.3%. We missed 2 cases. Detected ROIs were care-fully compared with the given ground truth data.

4.3 Normal and Mass DifferentiationOur algorithm detected all the malignant masses except

one (mdb0186) on MIAS dataset. However we did not get soprominent success on benign masses. 30 cases were testedbut Algorithm failed to detect 6 masses. Three of thesemissed masses were Fatty (mdb069, mdb080 and mdb195),two were Dense-glandular (mdb193 and mdb290) and onewas Fatty-glandular (mdb190). The total accuracy of the

Page 6: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

(a) Detected ROIs (Objects) (b) Merged ROIs (Objects)

Figure 5: Detected objects and their merging process

Table 1: MIAS (Mammographic Image Analysis Society) dataset

Benign MalignantDense Fatty-Glandular Glandular Dense Fatty-Glandular Glandular

8 12 10 2 6 5

Table 2: DDSM (Digital Database for Screening Mammog-raphy) dataset information

Property Description Image Count

Density

1 142 433 344 6

Shape

Fine Linear Branch 2Irregular 16Irregular Architecture 8Lobulated 9Oval 8Pleomorphic 4Round 3

Margin

Circumscribed 10Circumscribed ill Defined 1ill Defined 9ill Defined Spiculated 1Microlobulated 1Obscured 3Obscured ill defined 2Obscured ill defined spiculated 3Spiculated 15

PathologyBenign 14Malignant 38Normal 46

system was 83.43%. Figure 7 shows the example ROI whichis classified as mass.

We further investigated the missed cases and found thefollowing observations. In the first missed case (mdb069),the margin and boundary with wide transition zone, if wecompare with opposite side breast, the lesion could be de-tectable, and in clinical practice, we describe it as archi-tectural distortion. In case of mdb080, the tumor lesion issubtle ill margined, non-mass like parenchymal asymmetricpattern. In case of mdb195, the malignant lesion is almostisodensed to the normal breast fatty parenchyma. So thedetection is not feasible. In mdb186 we found that the masshas poor contrast and also it lacks the dense region. Itscontrast with respect to the surrounding was very poor. Be-nign cases, where algorithm was unable to classify masses,we observed that, in three fatty and one fatty glandularcase (mdb069, mdb080, mdb190 and mdb195) the masseswere not clear. They do not have center core region andtheir contrast with respect to their surrounding was poor

too. We are confident that if we add some good contrastenhancement technique, our algorithm performance will beimproved by classifying above described cases as well. Theremaining two dense-glandular cases (mdb193 and mdb290)do not follow the assumption we made in this paper (they donot have glowing effect), so features values were not good inthese cases to classify them. To successfully detect massesin these cases, it may require additional methods or includemore features. In the present work, we did not reject anyregion because of its size, this results in generating a largenumber of false positives. Although our classification phasereduces the number of FPs, but we aim to reduce the numberFPs by improved algorithm in future work. We also believethat automatic breast density assessment before applyingour method will improve the performance ([11]).

We validated the results by plotting the receiver operat-ing characteristic (ROC) curve, which illustrates the per-formance of binary classifier system as its discriminationthreshold is varied. Figure 6 shows the ROC curve of clas-sification between normal and mass data, which is obtainedby varying the threshold on the probabilities by classifier(SVM). AUC refers to the Area Under Curve. Table 3 showsthe classification results in terms of specificity and sensi-tivity.In medical domain, only sensitivity is not important,algorithm should yield good specificity results also. As pre-viously described, we used harmonic mean (equation 10) toget the best pair of specificity and sensitivity

Algorithm missed 2 cases from malignant category and6 from benign category of DDSM dataset. The maximumsensitivity and specificity pair we achieved is 91.32% and85.05% respectively. Average sensitivity and specificity is76.19% and 87.05% respectively.

We also tested our algorithm for its generality by trainingit on one dataset and testing on the other. Algorithm wastrained on MIAS dataset, tested on DDSM and vice versa.Algorithm results in table 3 confirm our claim that proposedalgorithm is not limited to some limited type of masses orabnormalities. It covers a wide spectrum of masses. Distri-bution of the dataset is uneven, which degrades the perfor-mance of learning algorithm.

Investigation of the missed cases confirms the reasons de-

Page 7: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

ROC for classification by SVM

Az =0.93769

(a) ROC of Normal and Mass (MIAS dataset)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

ROC for classification by SVM

AUC =0.91446

(b) ROC of Normal and Mass (DDSM Dataset)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

ROC for classification by SVM

AUC =0.84474

(c) ROC of training on MIAS and Test on DDSM

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

ROC for classification by SVM

AUC =0.82292

(d) ROC of training on DDSM and Test on MIAS

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

ROC for classification by SVM

AUC =0.91614

(e) ROC of MIAS and DDSM combined

Figure 6: Breast tissue vs Mass classification results (ROC plots)

Page 8: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

Table 3: Specificity and Sensitivity of Mass vs Normal clas-sification

Training TestingSenstivity(%)

Specificity(%)

MIAS MIAS 90.47 82.95DDSM DDSM 78.03 87.02MIAS+DDSM MIAS+DDSM 75.51 84.12MIAS DDSM 87.80 64.88DDSM MIAS 75.51 84.12

Figure 7: Detected Mass ROI

scribed earlier. Case0004 from DDSM shows poor contrastaround the mass, making it difficult to be detected. Case0005,case0006, and case008 does not follow the assumption wemade in the paper. More features may be required to detectthose masses. We also calculated the number of false posi-tives per image which was 4.67 FP/Image. This number iscalculated only on Normal Images to give the fair view ofthe system.

4.4 Comparison with existing algorithms[19] stated their results of mass detection phase they achieved

84.4% detection accuracy. Their algorithm is based on im-age enhancement and after that Gaussian Markov RandomField (MRF) is used for mass segmentation. They did notclassified the ROIs into mass and non-mass regions. [10]also reported their detection accuracy as 94.44%. They pre-sented a particle swarm optimization (PSO) based detectiontechnique. Our algorithm outperformed previously reporteddetection accuracies.

Work presented by [7], [23], [5], and [4] can be consideras the baseline in recent work on this domain. [4] imple-mented a fully automated system. They extracted localbinary pattern LBP features and the classification is doneby SVM. They also proposed a feature selection technique.[4] reported their performance in terms of sensitivity and75.86% is reported for overall CAD performance on MIASdatabase. [6] reported their results on already selected 305ROIs and achieved a sensitivity of 76.53%. They extractedfeatures from Grey-level co-occurrence matrices (GLCM)and then classify features into mass and non-mass regions.[5] proposed the technique of curvelet transformation, fea-ture selection and then classification by SVM. They man-ually cropped the ROIs and then applied their algorithm.Their reported accuracy is higher 90%, but their algorithmis not fully automated, they lack mass detection phase. Allmethods were tested on separate dataset, cross validationbetween the datasets was never performed.

5. CONCLUSIONThis paper proposes a new mass detection in mammogram

images. The proposed method is fully automated. It findsthe candidate regions by segmenting the salient regions inmammogram and then extract features to differentiate be-

tween breast tissue and mass.Promising results are obtainedin mass identification and normal vs mass tissue classifica-tion. Classification results confirms that the segmentationprocess extracts enough information to find masses and lo-calize it in mammogram. Experiments were performed onmini-MIAS and DDSM databases to show the usefulness andgeneralization of the proposed algorithm. Correlating thefull image set (CC and MLO) is considered as future workthat can help to identify the architectural distorted mam-mograms also.

6. REFERENCES[1] P. Agrawal, M. Vatsa, and R. Singh. Saliency based

mass detection from screening mammograms. SignalProcessing, 99:29–47, 2014.

[2] A. Z. Arifin and A. Asano. Image thresholding byhistogram segmentation using discriminant analysis.In Proceedings of Indonesia–Japan Joint ScientificSymposium, pages 169–174, 2004.

[3] Brigham and F. H. Women’s Hospital. Breast careand you, 2015.

[4] J. Y. Choi and Y. M. Ro. Multiresolution local binarypattern texture analysis combined with variableselection for application to false-positive reduction incomputer-aided detection of breast masses onmammograms. Physics in medicine and biology,57(21):7029, 2012.

[5] M. M. Eltoukhy and I. Faye. An optimized featureselection method for breast cancer diagnosis in digitalmammogram using multiresolution representation.Appl. Math, 8(6):2921–2928, 2014.

[6] S. J. S. Gardezi, I. Faye, and M. M. Eltoukhy.Analysis of mammogram images based on texturefeatures of curvelet sub-bands. In Fifth InternationalConference on Graphic and Image Processing, pages906924–906924. International Society for Optics andPhotonics, 2014.

[7] F. B. Garma, M. A. E. Almoon, M. M. Bakry, M. E.Mohamed, and E. Osman. Detection of breast cancercells by using texture analysis. Journal of ClinicalEngineering, 38(2):79–83, 2013.

[8] M. Heath, K. Bowyer, D. Kopans, P. Kegelmeyer Jr,R. Moore, K. Chang, and S. Munishkumaran. Currentstatus of the digital database for screeningmammography. In Digital mammography, pages457–460. Springer, 1998.

[9] M. Heath, K. Bowyer, D. Kopans, R. Moore, andW. P. Kegelmeyer. The digital database for screeningmammography. In Proceedings of the 5th internationalworkshop on digital mammography, pages 212–218.Citeseer, 2000.

[10] Y.-C. Kuo, W.-C. Lin, S.-C. Hsu, and A.-C. Cheng.Mass detection in digital mammograms system basedon pso algorithm. In Computer, Consumer andControl (IS3C), 2014 International Symposium on,pages 662–668. IEEE, 2014.

[11] Y. N. Law, M. K. Lieng, J. Li, and D. A.-A. Khoo.Automated breast tissue density assessment using highorder regional texture descriptors in mammography.In SPIE Medical Imaging, pages 90351Q–90351Q.International Society for Optics and Photonics, 2014.

[12] J. Lesniak, R. Hupse, M. Kallenberg, M. Samulski,

Page 9: Unsupervised Method to Localize Masses in Mammograms · Bilal Ahmed Lodhi Korea University Seoul Republic of Korea balodhi@gmail.com ABSTRACT Breast cancer is one of the most common

R. Blanc, N. Karssemeijer, and G. Szekely. Computeraided detection of breast masses in mammographyusing support vector machine classification. In SPIEMedical Imaging, pages 79631K–79631K. InternationalSociety for Optics and Photonics, 2011.

[13] M. Mavroforakis, H. Georgiou, N. Dimitropoulos,D. Cavouras, and S. Theodoridis. Significance analysisof qualitative mammographic features, using linearclassifiers, neural networks and support vectormachines. European Journal of Radiology, 54(1):80–89,2005.

[14] N. R. Mudigonda, R. M. Rangayyan, and J. L.Desautels. Detection of breast masses in mammogramsby density slicing and texture flow-field analysis.Medical Imaging, IEEE Transactions on,20(12):1215–1227, 2001.

[15] N. Otsu. A threshold selection method from gray-levelhistograms. Automatica, 11(285-296):23–27, 1975.

[16] D. C. Pereira, R. P. Ramos, and M. Z. do Nascimento.Segmentation and detection of breast cancer inmammograms combining wavelet analysis and geneticalgorithm. Computer methods and programs inbiomedicine, 114(1):88–101, 2014.

[17] N. Petrick, H.-P. Chan, B. Sahiner, and D. Wei. Anadaptive density-weighted contrast enhancement filterfor mammographic breast mass detection. MedicalImaging, IEEE Transactions on, 15(1):59–67, 1996.

[18] R. Rodrigues, R. Braz, M. Pereira, J. Moutinho, andA. M. Pinheiro. A two-step segmentation method forbreast ultrasound masses based on multi-resolutionanalysis. Ultrasound in medicine & biology,41(6):1737–1748, 2015.

[19] V. Rodrıguez-Lopez, R. Miranda-Luna, and J. A.Arias-Aguilar. Detection of masses in mammogramimages using morphological operators and markovrandom fields. In Advances in Artificial Intelligenceand Its Applications, pages 558–569. Springer, 2013.

[20] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt,C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz,S. Kok, et al. The mammographic image analysissociety digital mammogram database. In ExerptaMedica. International Congress Series, volume 1069,pages 375–378, 1994.

[21] W. Sun, B. Zheng, F. Lure, T. Wu, J. Zhang, B. Y.Wang, E. C. Saltzstein, and W. Qian. Prediction ofnear-term risk of developing breast cancer usingcomputerized features from bilateral mammograms.Computerized Medical Imaging and Graphics,38(5):348–357, 2014.

[22] B. Surendiran, A. Vadivel, and H. Selvaraj. Asoft-decision approach for microcalcification massidentification from digital mammogram. Sat, 1:2, 2008.

[23] J. Torrents-Barrena, D. Puig, M. Ferre, J. Melendez,L. Diez-Presa, M. Arenas, and J. Marti. Breast massesidentification through pixel-based textureclassification. In Breast Imaging, pages 581–588.Springer, 2014.

[24] N. Vallez, G. Bueno, O. Deniz, J. Dorado, J. A.Seoane, A. Pazos, and C. Pastor. Breast densityclassification to reduce false positives in cade systems.Computer methods and programs in biomedicine,113(2):569–584, 2014.

[25] Y. Wang, J. Li, and X. Gao. Latent feature mining ofspatial and marginal characteristics formammographic mass classification. Neurocomputing,144:107–118, 2014.

[26] L. Wei, Y. Yang, R. M. Nishikawa, and Y. Jiang. Astudy on several machine-learning methods forclassification of malignant and benign clusteredmicrocalcifications. Medical Imaging, IEEETransactions on, 24(3):371–380, 2005.

[27] L. Zheng, A. K. Chan, G. McCord, S. Wu, and J. S.Liu. Detection of cancerous masses for screeningmammography using discrete wavelet transform-basedmultiresolution markov random field. Journal ofdigital imaging, 12(1):18–23, 1999.