[IEEE 2012 25th IEEE International Symposium on Computer-Based Medical Systems (CBMS) - Rome, Italy (2012.06.20-2012.06.22)] 2012 25th IEEE International Symposium on Computer-Based

k-Gabor: A New Feature Extraction Method for Medical Images Providing Internal Analysis

Gabriel Humpire-Mamani

University of Sao Paulo

Computer Science Dept.

Sao Carlos, Brazil

[email protected]

Agma J. M. Traina



Sao Carlos, Brazil

[email protected]

Caetano Traina Jr.



Sao Carlos, Brazil

[email protected]

Abstract

This paper proposes the k-Gabor method, a new image feature extractor that captures texture information from medical image regions without the costly segmentation usually associated with texture extractors. Because it employs Gabor filters, the k-Gabor method can quantify texture information from specific regions, tissues and internal structures of the images, providing a succinct representation for a richer image analysis. The feature vectors generated describe the images more precisely than other methods from the literature, as shown in the experiments. Besides providing meaningful information from the images, the method is cheap to compute: the total time to extract the k-Gabor features was always a fraction of a second.

1 Introduction

The problem of managing and extracting useful informa-

tion from the massive amount of medical images daily col-

lected in hospitals and medical centers has motivated the re-

search for new methods and algorithms supporting Content-

based Image Retrieval (CBIR). The quest for effective and

efficient image processing algorithms and methods that get

the essence of the image regarding specific criteria has mo-

tivated many works in the CBIR field, since it is the core

technique to provide effective similarity query processing.

Image processing methods are employed to extract from each image a succinct representation in the form of a feature vector, which is used in place of the image to process queries.

Searching by image content supports the processing of similarity queries. For example, given a database of chest X-ray images, two usual queries are the k-nearest neighbor query, “Given the John Doe X-ray obtained on Jan 5th, 2012, return the 5 images from the database that are most similar to it”, and the range query, “Given the John Doe X-ray obtained on Jan 5th, 2012, return the images from the database that differ by up to 10 units from it”.
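The two query types above can be sketched over feature vectors as follows. This is only an illustrative sketch: the image identifiers, the toy two-dimensional vectors and the function names are assumptions made for the example, and the Euclidean distance is used because it is the distance adopted later in the experiments.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(database, query, k):
    """Return the ids of the k feature vectors closest to the query."""
    ranked = sorted(database, key=lambda image_id: euclidean(database[image_id], query))
    return ranked[:k]

def range_query(database, query, radius):
    """Return the ids of all feature vectors within `radius` of the query."""
    return [image_id for image_id, vector in database.items()
            if euclidean(vector, query) <= radius]

# Toy database: image id -> feature vector (hypothetical values).
database = {
    "img_a": [0.0, 0.0],
    "img_b": [3.0, 4.0],
    "img_c": [6.0, 8.0],
    "img_d": [1.0, 1.0],
}
query = [0.0, 0.0]

nearest = knn_query(database, query, 2)     # "the 2 most similar images"
within = range_query(database, query, 5.0)  # "images that differ by up to 5 units"
```

Both queries operate on the feature vectors only, which is why the quality of the feature extractor determines the quality of the answer.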

Being able to retrieve images similar to the one under analysis, together with their associated reports, is a valuable asset for radiologists during the image analysis procedure: it adds new sources of relevant information to the process, and it improves the physician's confidence by considering similar cases already treated or confirmed by other exams. This procedure is also employed to train students and residents in the field, allowing the comparison of a resident's analysis with the analysis of a more experienced radiologist.

In this paper we present a new method based on Gabor

filters applied on specific image regions, which can properly

represent medical images considering the texture of their re-

gions. When compared to other well-known methods from

the literature, our proposed k-Gabor method provides better

results regarding the retrieval precision of the images.

This paper is organized as follows: Section 2 discusses related work and previous research in the area. Section 3 details our proposed method and describes how it works. The experiments, their discussion and interpretation are presented in Section 4. Finally, conclusions are provided in Section 5.

2 Background and Related Work

Image processing algorithms applied to medical images

usually employ pre-processing steps, such as histogram

equalization or segmentation in order to highlight the inter-

nal structures in the image. Therefore, the feature extraction

algorithms can provide more representative features. This

approach is performed in several works, such as [4, 5, 6].

Our proposed k-Gabor method is also based on a pre-processing stage, but one that provides a fast clustering of the pixels pertaining to each internal structure (region) of the images. Therefore, k-Gabor integrates a broad separation of the image regions (given by the clustering step) with the texture of such regions (given by the Gabor filters).

There are many feature extraction algorithms for medical images in the literature that focus on the shape, texture and color of the image regions. Fractal analysis has been widely used in medical image diagnosis, frequently taking advantage of the image shape characteristics. For instance, the Triangular Prism Surface Area helps to detect brain diseases [9], and is independent of geometric transformations of the images. The Fast Fractal Stack (FFS) [4] method finds the image borders and computes the fractal dimension of several levels of the image, which are generated by different thresholds. The dimensionality of the feature vector depends on the number of threshold levels.

Color-based feature extraction algorithms, such as Histograms, offer global information about the pixel intensity distribution and are very fast to compute. Thus, Histograms are frequently present in CBIR systems. Usually, when Histograms are employed as feature vectors, the number of bins is reduced (from hundreds to dozens of elements) in order to reduce the dimensionality of the feature vectors. Texture feature extractors commonly employ co-occurrence matrices [1], such as the Gray Level Co-occurrence Matrix (GLCM) [10] and the Haralick moments [8]. They need two parameters, the displacements δx and δy along the X and Y axes, which define the pairs of pixel values compared to build the co-occurrence matrices, from which statistical features are computed. Feature extractors based on co-occurrence matrices are widely used on many types of image datasets, not just medical images, and are frequently employed as a baseline method regarding image texture.
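As an illustration of how the displacements δx and δy drive a co-occurrence matrix, the sketch below counts pairs of gray levels on a tiny 4-level image and derives one statistic (contrast) from the counts. It is a minimal sketch under stated assumptions: the image values, the number of levels and the function names are chosen for the example, and the matrix is neither symmetrized nor normalized as some GLCM variants are.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Count pairs (a, b) where b is the pixel at offset (dx, dy) from a."""
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs leaving the image
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """Contrast statistic: mean squared gray-level difference over all pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(((a - b) ** 2) * count
                   for a, row in enumerate(matrix)
                   for b, count in enumerate(row))
    return weighted / total

# Toy image with gray levels 0..3 (illustrative values only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, dx=1, dy=0)  # horizontal neighbor, as with (δx, δy) = (1, 0)
texture_contrast = contrast(matrix)
```

Changing (dx, dy) changes which neighbor is compared, so each displacement yields a different matrix and a different set of statistics.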

Another texture approach employs Gabor filters [7], which can be represented as a Gaussian function modulated by a complex sinusoidal signal. Let θ be the filter orientation, σ the standard deviation that determines the effective size of the neighborhood of a pixel in which the convolution is applied along each axis (σx and σy, respectively), and λ the wavelength of the sinusoid. Formally, a Gabor filter g(x, y) can be expressed as:

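$$g(x, y) = \left(\frac{1}{2\pi\sigma_x\sigma_y}\right) \exp\left[-\frac{1}{2}A\right] \exp\left(\frac{2\pi i R_1}{\lambda}\right) \tag{1}$$

where,

$$A = \left(\frac{R_1^2}{\sigma_x^2} + \frac{R_2^2}{\sigma_y^2}\right) \tag{2}$$

$$\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{3}$$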

The pixel position in the new filter is represented by x and y. Some filters generated by Equation 1 are shown in Figure 1. These filters are applied over the input image to obtain features of the filtered images. The variables σ, λ and θ can take many values in their respective ranges. It was shown in [3] that good values for these variables can be obtained by relating them to the number of scales m and orientations n required by the Gabor feature extractor, such as: $\sigma = 1.5 \cdot 1.2^{m-1}$, $\lambda = \sigma\pi/2$ and $\theta = n\pi/16$. Thus, we have used these proposed values to build the filters employed by our method to obtain the feature vectors.
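A Gabor filter built from Equations 1 to 3 with the parameter rules above can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the paper: σx and σy are both set to σ, the kernel spans three standard deviations on each side, the scale index m starts at 1 and the orientation index n starts at 0.

```python
import cmath
import math

def gabor_params(m, n):
    """Parameter rules quoted from [3] for scale index m and orientation index n."""
    sigma = 1.5 * 1.2 ** (m - 1)
    lam = sigma * math.pi / 2
    theta = n * math.pi / 16
    return sigma, lam, theta

def gabor_value(x, y, sigma_x, sigma_y, lam, theta):
    """Evaluate Equation 1 at position (x, y)."""
    r1 = x * math.cos(theta) + y * math.sin(theta)    # Equation 3, first row
    r2 = -x * math.sin(theta) + y * math.cos(theta)   # Equation 3, second row
    a = r1 ** 2 / sigma_x ** 2 + r2 ** 2 / sigma_y ** 2  # Equation 2
    envelope = math.exp(-0.5 * a) / (2 * math.pi * sigma_x * sigma_y)
    carrier = cmath.exp(2j * math.pi * r1 / lam)
    return envelope * carrier

def gabor_kernel(m, n):
    """Complex kernel as a list of rows; the size is an assumption (3 sigma per side)."""
    sigma, lam, theta = gabor_params(m, n)
    half = math.ceil(3 * sigma)
    return [[gabor_value(x, y, sigma, sigma, lam, theta)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

# A bank with 4 scales and 6 orientations, the configuration used in this paper.
bank = {(m, n): gabor_kernel(m, n) for m in range(1, 5) for n in range(6)}
kernel = bank[(1, 0)]
```

Convolving an image with each kernel of the bank and summarizing each response (for instance by its mean and standard deviation) gives one group of features per scale and orientation.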

Figure 1. Samples of Gabor filters

3 Proposed Method: k-Gabor

The proposed k-Gabor feature extractor comprises two stages: (i) each original grayscale image is clustered on its pixel values using the k-Means algorithm, so that each original image yields a set of k clustered images, and every clustered image generates a new image for analysis; (ii) Gabor features are extracted from each image generated in the first stage. Finally, all the features extracted from the set of clustered images compose the final feature vector, building the k-Gabor elements. Adequate separation of the gray-scale levels of the images helps to highlight the relevant information of the query image, since specific levels or image clusters can identify tissues and characteristics of regions of interest, and/or discard redundant information, such as the background. The Gabor feature extractor was selected in order to take advantage of the image texture.

Recall that Gabor filters work with rotations and scales, making the analysis of the internal structures in any level of the original image stronger and more robust. Also, by clustering image regions, our method takes advantage of the shape characteristics of the regions. Thus, our method integrates two of the main characteristics analyzed in images (texture and shape), spotting the most salient components of each image. As a pre-processing step of our method, histogram equalization of the grayscale image is employed to improve the contrast of the internal structures and to highlight their features.

Figure 2. The k-Gabor process for k = 3

3.1 k-Means clustering

Methods for clustering data can also be applied to images to cluster their pixels. Such methods group the image pixels into k groups considering their grayscale values. The k-Means algorithm is used as a segmentation process [6] to help other methods improve their results. The k-Means algorithm needs k centroids to group the nearest pixel values. Equation 4 shows how to compute the centroid $\mu_k$ for each group. The original grayscale image $I$ has pixels represented by $I_{i,j}$, where $I_{i,j} \in \{0, 1, \ldots, 255\}$. In this work we considered 2 ≤ k ≤ 10 in order to work with feature vectors of at most thousands of elements. The value k defines the number of output images, where every output image is a mask $M_k$ with the same size as the input (original) image $I$, but with only two colors: white or black.

White pixels represent the current cluster selected by the k-Means algorithm, and black pixels the background. This mask is then matched against the original image, keeping the original values under the white pixels. Formally, Equations 4 and 5 show how to obtain the mask $M_k$.

$$\mu_k = \frac{k \cdot \eta}{k + 1} \tag{4}$$

$$M_k = \arg\min \sum_i \sum_j \lVert I_{i,j} - \mu_k \rVert \tag{5}$$

where $\mu_k$ represents the centroid of cluster k and η is the maximum pixel value in the image, which must be lower than or equal to 255.
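The clustering step can be sketched as a one-dimensional k-Means over pixel intensities that returns one binary mask per cluster. This is a hedged sketch, not the authors' implementation: it reads Equation 4 as evenly spaced starting centroids (the j-th centroid at j·η/(k + 1)), runs a fixed number of iterations, and encodes white as 1 and black as 0; the function names and the toy image are assumptions.

```python
def initial_centroids(k, eta):
    """Assumed reading of Equation 4: the j-th centroid starts at j * eta / (k + 1)."""
    return [j * eta / (k + 1) for j in range(1, k + 1)]

def nearest(value, centroids):
    """Index of the centroid closest to a pixel value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))

def kmeans_masks(image, k, iterations=10):
    """Cluster pixel intensities and return (masks, centroids); 1 = white, 0 = black."""
    pixels = [v for row in image for v in row]
    centroids = initial_centroids(k, max(pixels))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in pixels:
            groups[nearest(v, centroids)].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    labels = [[nearest(v, centroids) for v in row] for row in image]
    masks = [[[1 if label == j else 0 for label in row] for row in labels]
             for j in range(k)]
    return masks, centroids

# Toy 3x3 grayscale image with dark, medium and bright pixels (illustrative values).
image = [
    [10, 12, 200],
    [11, 120, 210],
    [115, 125, 205],
]
masks, centroids = kmeans_masks(image, 3)
```

Every pixel is white in exactly one mask, so the k masks partition the image by intensity level.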

3.2 Obtaining the feature vector

The Gabor method generates a feature vector from every clustered image created by the k-Means algorithm, with the following configuration: 6 orientations and 4 scales. Since we have k masked (clustered) images, the size of the feature vector is k × 6 × 4 × (number of features captured from each Gabor subspace). We compute the mean and the standard deviation of each subspace (number of features = 2), so each masked image contributes 6 × 4 × 2 = 48 features and the final vector has 48k elements.

Figure 3. Matching between a mask image $M_k$ and an input (original) image $I$ to obtain the output image $I^*_k$.

The matching performed between the original image and every mask $M_k$ generates the new image $I^*_k$. In the $I^*_k$ image, each white pixel of the mask is replaced by the respective pixel value from the original image, as illustrated in Figure 3. This process is summarized in Equation 6.

$$I^*_k(i,j) = \begin{cases} 0, & \text{if } M_k(i,j) = \text{Black} \\ I_{i,j}, & \text{if } M_k(i,j) = \text{White} \end{cases} \tag{6}$$

The feature vector dimensionality is k times the number of Gabor features, and the vector is obtained following Equation 7.

$$\mathit{FeatVect} = \bigcup_{i=1}^{k} \mathrm{Gabor}(I^*_i) \tag{7}$$

where $\mathrm{Gabor}(I^*_i) \to \{f_1, f_2, \ldots, f_n\}$ is the function that obtains the Gabor features and n is the dimensionality of the Gabor feature extractor.
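Equations 6 and 7 can be sketched as a masking step followed by a concatenation. To keep the example short, the Gabor extractor is replaced by a stand-in that returns only the mean and the standard deviation of the masked image; this stand-in, the function names and the toy data are assumptions, and with the real extractor each masked image would contribute 48 features instead of 2.

```python
import math

def apply_mask(image, mask):
    """Equation 6: keep the original value under white (1) pixels, 0 elsewhere."""
    return [[value if white else 0 for value, white in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

def mean_std(image):
    """Stand-in for the Gabor extractor (assumption): global mean and std only."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [mean, math.sqrt(variance)]

def k_feature_vector(image, masks, extractor):
    """Equation 7: concatenate the features extracted from every masked image."""
    features = []
    for mask in masks:
        features.extend(extractor(apply_mask(image, mask)))
    return features

# Toy image and two masks (k = 2), illustrative values only.
image = [
    [10, 200],
    [20, 220],
]
masks = [
    [[1, 0], [1, 0]],  # dark cluster
    [[0, 1], [0, 1]],  # bright cluster
]
vector = k_feature_vector(image, masks, mean_std)
```

The length of the result is k times the length of what the extractor returns, which matches the dimensionality rule stated above.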

4 Experiments

We compared the efficacy of our proposed k-Gabor method to other well-known feature extractors from the literature: the traditional gray-level Histograms with 256 bins, the FFS method, the conventional Gabor, GLCM and Haralick methods. The dimensions of their feature vectors are 256, 8, 48, 22 and 13, respectively. The dimensionality of the feature vector generated by our k-Gabor method depends on the value of k, which is detailed in the discussion of the results. To evaluate the feature extractors we used Precision and Recall (P&R) graphs [2] as the efficacy measure, since they are a well-known way to measure the retrieval ability of a method. Precision and recall are defined as $\text{precision} = |R_a| / |A|$ and $\text{recall} = |R_a| / |R|$, where $|R_a|$ is the number of relevant images retrieved, $|A|$ is the total number of images in the query result, and $|R|$ is the number of relevant images in the dataset. For each query, the κ most similar images were retrieved using κ-nearest neighbor (κNN) queries.
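The precision and recall definitions can be sketched for κNN queries as follows, with every image taken in turn as the query. This is an illustrative sketch under stated assumptions: an image counts as relevant when it has the same class label as the query, the query itself is excluded from the answer, every class has at least two images, and the toy vectors and labels are invented for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def precision_recall(vectors, labels, query, kappa):
    """Precision and recall of the kappa-NN answer for one query image."""
    others = [i for i in range(len(vectors)) if i != query]
    others.sort(key=lambda i: euclidean(vectors[i], vectors[query]))
    answer = others[:kappa]                                       # A
    relevant = [i for i in others if labels[i] == labels[query]]  # R
    hits = [i for i in answer if labels[i] == labels[query]]      # Ra
    return len(hits) / len(answer), len(hits) / len(relevant)

def average_precision_recall(vectors, labels, kappa):
    """Average the measures using every image in turn as the query."""
    results = [precision_recall(vectors, labels, q, kappa)
               for q in range(len(vectors))]
    count = len(results)
    return (sum(p for p, _ in results) / count,
            sum(r for _, r in results) / count)

# Toy dataset: two well-separated classes of three images each.
vectors = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = ["a", "a", "a", "b", "b", "b"]

precision_2, recall_2 = average_precision_recall(vectors, labels, 2)
precision_3, recall_3 = average_precision_recall(vectors, labels, 3)
```

Repeating the computation for growing κ gives one (recall, precision) point per κ, which is how a P&R curve is traced.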

Thus, using the leave-one-out cross-validation [11] approach, each image in the dataset was used in turn as the query center and compared with the remaining images using the Euclidean distance function. All the features were normalized to avoid data distortions.

Figure 4. (a): Samples of medical images of lung; (b)(c)(d): the outputs of the k-Means algorithm with k=3.

Figure 5. P&R graphs of the “Lung ROI-3258” dataset using k=3 to build the feature vector extracted by k-Gabor.

In order to have a fair comparison among the methods, the feature extractors were configured with their standard values: for GLCM and Haralick, the displacement parameters δx and δy were set to 1 and 0, respectively.

For FFS, the number of stacks is s = 8. For the conventional Gabor we used 4 scales and 6 orientations, the same configuration as our k-Gabor descriptor. Our proposed method has one additional parameter, k, whose value depends on the specific dataset and can vary according to the internal structure of the images. Values in the range 2 ≤ k ≤ 10 were empirically tested to find the adequate value for each dataset, that is, the one giving the best performance of the method. Finally, all the feature extractors were implemented on the same platform, in Matlab.

4.1 Experiment 1: Lung ROI-3258 image dataset

The first image base, “Lung ROI-3258”, was provided by the Ribeirao Preto Hospital of the University of Sao Paulo in Brazil. The image base consists of 3,258 ROIs, each with 64x64 pixels. Each ROI was classified over the segmented lung region by a specialist as one of the following classes: consolidation, emphysema, thickening, honeycombing, normal and ground glass, with 451, 502, 590, 530, 590 and 595 images, respectively. Hence, the dataset has six classes, including the normal lung class. Samples of the “Lung ROI-3258” image base are shown in Figure 4, where the images in Figure 4(a) are the originals and the images in Figure 4(b), (c) and (d) are the outputs of the first, second and third clusters generated by the k-Means algorithm, respectively.

Figure 5 shows the P&R graphs comparing our proposed method (k-Gabor) with other representative feature extractors from the literature. A rule of thumb to read P&R graphs is: the higher the curve, the better the technique. Using k = 3 (the number of clusters given to the k-Means algorithm), our k-Gabor method obtains the best precision among all the methods and remains stable across all recall levels. The second-best extractor is the Histogram which, in spite of its simplicity, presents good results on this dataset.

The next extractor is Gabor. The remaining extractors, FFS, Haralick and GLCM, behave similarly, and all of them give lower precision than our k-Gabor. As we used k = 3 and the dimensionality of the Gabor extractor is 48 (6 orientations, 4 scales and 2 features: mean and standard deviation), the dimensionality of our k-Gabor feature vector in this experiment is 3 × 48 = 144.

4.2 Experiment 2: Body-Image dataset

This dataset contains 944 images collected at the Ribeirao Preto Hospital of the University of Sao Paulo, Brazil. It has 6 classes: abdomen coronal, angiography, axial head, coronal head, sagittal head and sagittal spine, with 23, 36, 200, 236, 293 and 155 images, respectively.

Figure 6. Sample images of the “Body-Image” dataset used in experiment #2.

Figure 7. P&R graphs of the “Body-Image” dataset using k=2 to build the feature vector extracted by k-Gabor.

One sample of each class of the “Body-Image” dataset is shown in Figure 6. Figure 7 shows the results of the feature extractors as P&R graphs. With k = 2, our k-Gabor method obtains the best precision among all the methods and stays on top across all recall levels. The Gabor feature extractor initially obtains results comparable with our proposal, but its precision decays more than that of k-Gabor as the recall increases. The dimensionality of the feature vector extracted by the k-Gabor method for this dataset is 48 × 2 = 96, because we used k = 2.

4.3 Experiment 3: Digital Database for Screening Mammography (DDSM)

This dataset contains 250 mammogram ROIs taken from the Digital Database for Screening Mammography (DDSM) at the University of South Florida (http://marathon.csee.usf.edu/Mammography). It has two classes: benign and malignant masses, with 99 and 151 images, respectively.

Figure 8 shows the P&R graphs with the comparative results of the feature extractors. In this experiment, our k-Gabor method is still on top of the graph, although it almost ties with the FFS method. It is important to highlight that both k-Gabor and FFS are built considering information of the image regions. The dimensionality of k-Gabor for this dataset is 480, using k = 10. This is a rather difficult dataset to classify, due to the nature of the images. In fact, in this case the traditional Gabor feature extractor does not stand out from the other methods as it does on the other datasets, providing results comparable to the Histogram feature extractor.

Figure 8. P&R graphs of the “Digital Database for Screening Mammography” dataset, using k=10 to build the feature vector extracted by k-Gabor.

4.4 System Results

Figure 9 shows a snapshot of our prototype system running a similarity query over the “Lung ROI-3258” image dataset. The example is representative of the results achieved, which are summarized in the graphs of the Experiments section. The prototype employs all the feature extractors evaluated in this paper. It is an interactive system, and the screen presented also shows the potential of the proposed k-Gabor method. Table 1 summarizes the average precision of each feature extractor on all the datasets employed in the experiments. We used the results of the P&R graphs presented in Figures 5, 7 and 8 to average the precision of each feature extractor over each dataset. The table shows that k-Gabor presents higher precision than any of the other methods on all three datasets.

Figure 9. Snapshot of the system using dataset #1, “Lung ROI-3258”.

5 Conclusions

In this paper we have presented a new image feature extraction method named k-Gabor, which employs the k-Means clustering algorithm to group the image data into regions of interest, based on tissues or internal structures of the image, generating k new clustered images. Gabor filters are applied to every new clustered image in order to extract texture information from such clusters and compose the feature vector. The value of k can vary according to the type of the images stored in the dataset and the degree of detail demanded by the application. In this work, k was empirically set to 2, 3 and 10 for three different datasets of medical images. We compared the proposed k-Gabor method to several well-known feature extractors available in the literature, and the results reveal that our method presented the best retrieval precision when answering similarity queries. The total time spent to compute the k-Gabor features was always a fraction of a second, although this time can grow as k gets larger. If larger values of k are needed, the method can be easily parallelized, since the Gabor filters are independently applied to each clustered image. Further work includes designing approaches to select the most meaningful features present in the feature vector, in order to drive its use in indexing methods from database management systems, aiming at speeding up similarity query processing.

Table 1. Summarization of the precision results (average) for the feature extractors considering all the datasets.

| Feature extractor | Lung | Body | DDSM |
|---|---|---|---|
| FFS | 0.4443 | 0.6093 | 0.5848 |
| GLCM | 0.4627 | 0.6202 | 0.4738 |
| Gabor | 0.4919 | 0.7867 | 0.5043 |
| Haralick | 0.4205 | 0.6011 | 0.4485 |
| Histograms | 0.5407 | 0.5153 | 0.4860 |
| k-Gabor | 0.5809 | 0.8292 | 0.6057 |

6 Acknowledgements

This research was supported by Brazilian National Re-

search Council (CNPq), Sao Paulo State Research Founda-

tion (FAPESP) and Brazilian Federal Funding Agency for

Graduate Education Improvement (CAPES).

References

[1] F. Albregtsen and others. Statistical Texture Measures Computed from Gray Level Coocurrence Matrices. Image Processing Laboratory, Department of Informatics, University of Oslo, pages 1–14, 1995.

[2] R. Baeza-Yates, B. Ribeiro-Neto, and others. Modern Information Retrieval, volume 463. ACM Press, New York, 1999.

[3] J. Cope, P. Remagnino, and S. Barman. Plant texture classification using Gabor co-occurrences. Advances in Visual Computing, 2010.

[4] A. F. Costa, J. Tekli, and A. J. Traina. Fast Fractal Stack: Fractal Analysis of Computed Tomography Scans of the Lung. In International ACM Workshop on Medical Multimedia Analysis and Retrieval, 2011.

[5] P. Ghosh, S. Antani, L. Long, and G. Thoma. Review of medical image retrieval systems and future directions. Computer-Based Medical Systems, pages 1–6, June 2011.

[6] L. He, L. R. Long, S. Antani, and G. R. Thoma. Multiphase Level Set Model with Local K-means Energy for Histology Image Segmentation. 2011 IEEE First International Conference on Healthcare Informatics, pages 32–39, July 2011.

[7] B. Manjunath. Texture features for browsing and retrieval of image data. Pattern Analysis and Machine, pages 837–842, 1996.

[8] E. Miyamoto and T. Merryman. Fast calculation of Haralick texture features. Human Computer Interaction Institute, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, 2011.

[9] M. Tang and H. Nan Wang. Feature analysis of brain MRI images based on fractal dimension. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jan. 2005.

[10] C.-h. Wei and C. Li. A content-based approach to medical image database retrieval. Database Modeling for Industrial Data, pages 258–291, 2006.

[11] J. Yuan, Y. Li, C. Liu, and X. Zha. Leave-One-Out Cross-Validation Based Model Selection for Manifold Regularization. Advances in Neural Networks-ISNN 2010, 2010.
