k-Gabor: A New Feature Extraction Method for Medical Images Providing
Internal Analysis
Gabriel Humpire-Mamani
University of Sao Paulo
Computer Science Dept.
Sao Carlos, Brazil
Agma J. M. Traina
University of Sao Paulo
Computer Science Dept.
Sao Carlos, Brazil
Caetano Traina Jr.
University of Sao Paulo
Computer Science Dept.
Sao Carlos, Brazil
Abstract
This paper proposes the k-Gabor method, a new image feature extractor that captures texture information from medical image regions without the costly segmentation usually associated with texture extractors. It employs Gabor filters, so the k-Gabor method can quantify texture information from specific regions, tissues and internal structures of the images, providing a succinct representation for a richer image analysis. The feature vectors generated describe the images more precisely than other methods from the literature, as shown in the experiments. Besides providing meaningful information from the images, the cost to obtain it is very small: the total time to extract the k-Gabor features was always a fraction of a second.
1 Introduction
The problem of managing and extracting useful information from the massive amount of medical images collected daily in hospitals and medical centers has motivated research on new methods and algorithms supporting Content-based Image Retrieval (CBIR). The quest for effective and efficient image processing algorithms that capture the essence of an image with respect to specific criteria has motivated many works in the CBIR field, since such algorithms are the core technique for effective similarity query processing. Image processing methods are employed to extract from each image a succinct representation in the form of a feature vector, which is used in place of the image to process queries.
Searching by image content supports the processing of similarity queries. For example, considering a database of chest X-ray images, a usual query could be: “Given the John Doe X-ray obtained on Jan 5th, 2012, return the 5 images from the database that are most similar to it”, or “Given the John Doe X-ray obtained on Jan 5th, 2012, return the images from the database that differ by up to 10 units from it”.
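The two query types in this example (a k-nearest-neighbor query and a range query) can be sketched over feature vectors as follows. This is a minimal illustration, not the system described in the paper: the function names, the toy vectors and the use of NumPy are assumptions, and the Euclidean distance is the one used later in the experiments.

```python
import numpy as np

def knn_query(features, query, k):
    """Return the indices of the k feature vectors closest to `query`."""
    dists = np.linalg.norm(features - query, axis=1)  # Euclidean distances
    return np.argsort(dists, kind="stable")[:k]

def range_query(features, query, radius):
    """Return the indices of all feature vectors within `radius` of `query`."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.flatnonzero(dists <= radius)

# Toy database of 2-D feature vectors (hypothetical values).
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
query = np.array([0.0, 0.0])

nearest = knn_query(database, query, k=2)          # "the 2 most similar images"
within = range_query(database, query, radius=2.0)  # "images that differ up to 2 units"
```

Both queries operate only on the feature vectors, which is why a compact and discriminative representation matters for retrieval quality.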
Being able to receive images similar to the one under analysis, together with their associated reports, is a valuable asset for radiologists during the image analysis procedure. It adds new sources of relevant information to the process and improves the physician's confidence by bringing in similar cases already treated or confirmed by other exams. This procedure is also employed to train students and residents in the field, allowing the comparison of the residents' analysis with the analysis of a more experienced radiologist.
In this paper we present a new method based on Gabor
filters applied on specific image regions, which can properly
represent medical images considering the texture of their re-
gions. When compared to other well-known methods from
the literature, our proposed k-Gabor method provides better
results regarding the retrieval precision of the images.
This paper is organized as follows: Section 2 discusses related work and previous research in the area. Section 3 details our proposed method and describes how it works. The experiments, their discussion and interpretation are shown in Section 4. Finally, conclusions are provided in Section 5.
2 Background and Related Work
Image processing algorithms applied to medical images usually employ pre-processing steps, such as histogram equalization or segmentation, in order to highlight the internal structures in the image. As a result, the feature extraction algorithms can provide more representative features. This approach is followed in several works, such as [4, 5, 6].
Our proposed k-Gabor method is also based on a pre-processing stage, but one that provides a fast clustering of the pixels pertaining to each internal structure (region) of the images. Therefore, k-Gabor integrates a broad separation of the image regions (given by the clustering step) with the texture of such regions (given by the Gabor filters).
There are many feature extraction algorithms for medical images in the literature that focus on the shape, texture and color of the image regions. Fractal analysis has been widely used in medical image diagnosis, frequently taking advantage of the image shape characteristics. For instance, the Triangular Prism Surface Area helps to detect brain diseases [9], and is independent of geometric transformations of the images. The Fast Fractal Stack (FFS) [4] method finds the image borders and computes the fractal dimension of several levels of the image, which are generated by different thresholds. The dimensionality of the feature vector depends on the number of threshold levels.
Color-based feature extraction algorithms, such as Histograms, offer global information about the pixel intensity distribution and are very fast to compute. Thus, Histograms are frequently present in CBIR systems. Usually, when Histograms are employed as feature vectors, the number of bins is reduced (from hundreds to dozens of elements) in order to reduce the dimensionality of the feature vectors. Texture feature extractors commonly employ co-occurrence matrices [1], such as the Gray Level Co-occurrence Matrix (GLCM) [10] and the Haralick moments [8]. They need two parameters, the displacements δx and δy along the X and Y axes, which define the pairs of pixel values compared in order to build the co-occurrence matrices and to compute statistical features from them. The feature extractors based on co-occurrence matrices are widely used in many types of image datasets, not just in medical images, and are frequently employed as a baseline method regarding image texture.
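To make the role of the displacements δx and δy concrete, the following minimal sketch counts co-occurring gray-level pairs and derives one statistical feature (contrast) from the normalized matrix. The function name, the tiny three-level image and the choice of contrast as the example statistic are illustrative assumptions, not the implementations cited above.

```python
import numpy as np

def cooccurrence_matrix(image, dx=1, dy=0, levels=256):
    """Count pairs (image[r, c], image[r + dy, c + dx]) of gray levels."""
    image = np.asarray(image)
    rows, cols = image.shape
    glcm = np.zeros((levels, levels), dtype=np.int64)
    # Restrict to positions whose displaced neighbor is inside the image.
    r0, r1 = max(0, -dy), min(rows, rows - dy)
    c0, c1 = max(0, -dx), min(cols, cols - dx)
    src = image[r0:r1, c0:c1]
    dst = image[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1)  # accumulate repeated pairs
    return glcm

# Tiny image with three gray levels (hypothetical values).
tiny = np.array([[0, 0, 1],
                 [1, 2, 2]])
glcm = cooccurrence_matrix(tiny, dx=1, dy=0, levels=3)

# Contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix.
p = glcm / glcm.sum()
i, j = np.indices(glcm.shape)
contrast = float(np.sum(p * (i - j) ** 2))
```

With δx = 1 and δy = 0, each pixel is paired with its right-hand neighbor, which is the setting used for GLCM and Haralick in the experiments.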
Another texture approach employs Gabor filters [7], which can be represented as a Gaussian function modulated by a complex sinusoidal signal. Let θ be the filter orientation, σ the standard deviation that determines the effective size of the neighborhood of a pixel in which the convolution is applied (σx and σy along the two axes, respectively), and λ the wavelength of the sinusoid. Formally, a Gabor filter g(x, y) can be expressed as:
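$$g(x, y) = \left(\frac{1}{2\pi\sigma_x\sigma_y}\right) \exp\left[-\frac{1}{2}A\right] \exp\left(\frac{2\pi i R_1}{\lambda}\right) \qquad (1)$$
where,
$$A = \left(\frac{R_1^2}{\sigma_x^2} + \frac{R_2^2}{\sigma_y^2}\right) \qquad (2)$$
$$\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (3)$$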
The pixel position in the new filter is represented by x and y. Some filters generated by Equation 1 are shown in Figure 1. These filters are applied over the input image to obtain features of the filtered images. The variables σ, λ and θ can take many values within their respective ranges. It was shown in [3] that good values for these variables can be obtained if they are related to the number of scales m and orientations n required by the Gabor feature extractor, such as: $\sigma = 1.5 \cdot 1.2^{m-1}$, $\lambda = \sigma\pi/2$ and $\theta = n\pi/16$. Thus, we have used these proposed values to build the filters employed by our method to obtain the feature vectors.
Figure 1. Samples of Gabor filters
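The filter of Equations 1 to 3, with the parameter choices quoted from [3], can be sketched as below. Several details are assumptions made only for illustration: an isotropic envelope (σx = σy = σ), a kernel window of about three standard deviations, scale indices m = 1..4 and orientation indices n = 0..5. The function names are hypothetical.

```python
import numpy as np

def gabor_kernel(sigma, lam, theta):
    """Complex Gabor kernel following Equations 1-3 with sigma_x = sigma_y."""
    half = int(np.ceil(3 * sigma))  # assumed window: about 3 sigma per side
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Equation 3: rotate the coordinates by theta.
    r1 = x * np.cos(theta) + y * np.sin(theta)
    r2 = -x * np.sin(theta) + y * np.cos(theta)
    # Equation 2 with sigma_x = sigma_y = sigma.
    a = (r1 ** 2 + r2 ** 2) / sigma ** 2
    # Equation 1: Gaussian envelope times complex sinusoid.
    return np.exp(-0.5 * a) * np.exp(2j * np.pi * r1 / lam) / (2 * np.pi * sigma ** 2)

def gabor_bank(scales=4, orientations=6):
    """Build one kernel per (scale, orientation) pair."""
    bank = []
    for m in range(1, scales + 1):          # assumed scale indices 1..scales
        sigma = 1.5 * 1.2 ** (m - 1)
        lam = sigma * np.pi / 2
        for n in range(orientations):       # assumed orientation indices 0..n-1
            theta = n * np.pi / 16
            bank.append(gabor_kernel(sigma, lam, theta))
    return bank

bank = gabor_bank()
```

The bank holds 4 × 6 = 24 complex kernels, matching the configuration of scales and orientations used in the experiments.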
3 Proposed Method: k-Gabor
The proposed k-Gabor feature extractor comprises two stages: (i) each original grayscale image is clustered according to its pixel values using the k-Means algorithm, so that each original image yields a set of k clustered images, and every clustered image then generates a new image for analysis; (ii) Gabor features are extracted from each image generated in the first stage. Finally, the features extracted from the whole set of clustered images are combined to compose the final k-Gabor feature vector. An adequate separation of the gray levels of the images helps to highlight the relevant information of the query image, since specific levels or image clusters can identify tissues and characteristics of regions of interest, and/or discard redundant information such as the background. The Gabor feature extractor was selected in order to take advantage of the image texture.
Recall that Gabor filters work with rotations and scales, which makes the analysis of the internal structures at any level of the original image more robust. Also, by clustering image regions, our method takes advantage of the shape characteristics of the regions. Thus, our method integrates two of the main characteristics analyzed in images (texture and shape), spotting the most salient components of each image. As a pre-processing step of our method, histogram equalization of the grayscale image is employed to improve the contrast of the internal structures and to highlight their features.
Figure 2. The k-Gabor process for k = 3
3.1 k-Means clustering
Methods for clustering data are also applied to images to cluster their pixels. Such methods group the image pixels into k groups according to their grayscale values. The k-Means algorithm has been used as a segmentation process [6] to help other methods improve their results. The k-Means algorithm needs k centroids, around which it groups the nearest pixel values. Equation 4 shows how to compute the centroid µk for each group. The original grayscale image I has pixels represented by Ii,j, where Ii,j ∈ {0, 1, . . . , 255}. In this work we considered 2 ≤ k ≤ 10 in order to keep the feature vectors small (at most 480 elements for k = 10 with the configuration of Section 3.2). The value k defines the number of output images, where every output image is a mask Mk with the same size as the input (original) image I, but with only two colors: white or black.
White pixels represent the cluster currently selected by the k-Means algorithm, and black pixels the background. This mask is matched against the original image to select the original pixels at the white positions. Formally, Equations 4 and 5 show how to obtain the mask Mk.
$$\mu_k = \frac{k\,\eta}{k + 1} \qquad (4)$$
$$M_k = \arg\min \sum_{i} \sum_{j} \lVert I_{i,j} - \mu_k \rVert \qquad (5)$$
where µk represents the centroid of cluster k, η is the maxi-
mum pixel value in the image, which must be lower than or
equal to 255.
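One possible reading of Equations 4 and 5 is sketched below. Because the text uses k both for the number of clusters and for the cluster index, this sketch assumes that the centroid of cluster c (c = 1..k) starts at c·η/(k + 1) and is then refined by standard Lloyd iterations; the iteration count, the function name and the toy image are likewise assumptions and not details given in the paper.

```python
import numpy as np

def kmeans_masks(image, k, iterations=10):
    """Cluster gray levels into k groups and return k boolean masks."""
    image = np.asarray(image, dtype=float)
    eta = image.max()  # maximum pixel value in the image
    # Equation 4, read as evenly spaced initial centroids (assumption).
    centroids = np.arange(1, k + 1) * eta / (k + 1)
    for _ in range(iterations):
        # Equation 5: assign each pixel to its nearest centroid.
        labels = np.argmin(np.abs(image[..., None] - centroids), axis=-1)
        for c in range(k):
            members = image[labels == c]
            if members.size:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean()
    labels = np.argmin(np.abs(image[..., None] - centroids), axis=-1)
    # True plays the role of a white pixel, False of a black (background) pixel.
    return [labels == c for c in range(k)]

# Toy 2x3 grayscale image (hypothetical values).
image = np.array([[0, 10, 200],
                  [5, 210, 255]])
masks = kmeans_masks(image, k=2)
```

Every pixel belongs to exactly one mask, so the k masks partition the image into disjoint regions.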
3.2 Obtaining the feature vector
The Gabor method generates a feature vector from every clustered image created by the k-Means algorithm, with the following configuration: 6 orientations and 4 scales. Since we have k masked (clustered) images, the size of the feature vector is k × 6 × 4 × (number of features captured from each Gabor subspace). We compute the mean and standard deviation of each subspace (number of features = 2).
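As a worked instance of this count, writing d for the dimensionality of the feature vector (a symbol introduced here only for convenience) and taking the values of k reported in the experiments of Section 4:

$$d = k \times 6 \times 4 \times 2 = 48k, \qquad d\big|_{k=2} = 96, \quad d\big|_{k=3} = 144, \quad d\big|_{k=10} = 480.$$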
Figure 3. Matching between a mask image Mk
and an input (original) image I to obtain the
output image I∗k .
The matching performed between the original image and
every mask Mk generates the new image I∗k . In the I∗k im-
age, each white pixel value is replaced by its respective orig-
inal pixel value from the original image, as illustrated in
Figure 3. This process is summarized in equation 6.
$$I^*_k(i,j) = \begin{cases} 0, & \text{if } M_k(i,j) = \text{Black} \\ I_{i,j}, & \text{if } M_k(i,j) = \text{White} \end{cases} \qquad (6)$$
The feature vector dimensionality is k times the number of
Gabor features and it is obtained following equation 7.
$$\mathit{FeatVect} = \bigcup_{i=1}^{k} \mathit{Gabor}(I^*_i) \qquad (7)$$
where Gabor(I∗i )→ {f1, f2, . . . , fn} is the function to ob-
tain the Gabor features; n is the dimensionality of the Gabor
feature extractor.
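Equations 6 and 7 can be put together in a short sketch. Several choices here are assumptions made for illustration and are not stated in the paper: filtering is done as a circular convolution through the FFT, the mean and standard deviation are taken over the magnitude of the complex response, the envelope is isotropic, and the masks in the usage example come from a simple threshold instead of k-Means.

```python
import numpy as np

def gabor_kernel(sigma, lam, theta):
    """Complex Gabor kernel (Equations 1-3) with sigma_x = sigma_y = sigma."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r1 = x * np.cos(theta) + y * np.sin(theta)
    r2 = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (r1 ** 2 + r2 ** 2) / sigma ** 2)
    return envelope * np.exp(2j * np.pi * r1 / lam) / (2 * np.pi * sigma ** 2)

def gabor_features(image, scales=4, orientations=6):
    """Mean and standard deviation of each filter response (2 per filter)."""
    spectrum = np.fft.fft2(image)
    feats = []
    for m in range(1, scales + 1):
        sigma = 1.5 * 1.2 ** (m - 1)
        for n in range(orientations):
            kernel = gabor_kernel(sigma, sigma * np.pi / 2, n * np.pi / 16)
            # Circular convolution via the FFT (assumed boundary handling).
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=image.shape))
            magnitude = np.abs(response)
            feats += [magnitude.mean(), magnitude.std()]
    return np.array(feats)

def k_gabor(image, masks):
    """Equation 6 (masking) followed by Equation 7 (concatenation)."""
    image = np.asarray(image, dtype=float)
    parts = [gabor_features(np.where(mask, image, 0.0)) for mask in masks]
    return np.concatenate(parts)

# Synthetic 64x64 image and two threshold masks (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(float)
masks = [image < 128, image >= 128]
features = k_gabor(image, masks)  # 2 masks x 24 filters x 2 statistics
```

The union in Equation 7 is implemented as an ordered concatenation, so that the features of each cluster always occupy the same positions in the vector and vectors from different images stay comparable.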
4 Experiments
We compared the efficacy of our proposed k-Gabor method to that of other well-known feature extractors from the literature: the traditional gray-level Histograms with 256 bins, the FFS method, the conventional Gabor, GLCM and Haralick methods. The dimensions of their feature vectors are 256, 8, 48, 22 and 13, respectively. The dimensionality of the feature vector generated by our k-Gabor method depends on the value of k, which is detailed in the discussion of the results achieved. To evaluate the feature extractors we used Precision&Recall (P&R) graphs [2] as the efficacy measure, since it is a well-known approach to measure the retrieval ability of a method. Precision and recall are defined as $\text{precision} = |R_a| / |A|$ and $\text{recall} = |R_a| / |R|$, where |Ra| is the number of relevant images retrieved, |A| is the total number of images in the query result, and |R| is the number of relevant images in the dataset. For each query, the κ most similar images were retrieved using κ-nearest neighbor (κNN) queries.
Using the leave-one-out cross-validation [11] approach, each image in the dataset was used as a query center and compared to the others with the Euclidean distance function. All the features were normalized to avoid data distortions.
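The evaluation protocol described above can be sketched as follows. The paper does not say which normalization was used, so min-max scaling per feature is an assumption here, as are the function names, the toy data and the convention of excluding the query image from its own set of relevant images.

```python
import numpy as np

def normalize(features):
    """Min-max scale each feature dimension to [0, 1] (assumed scheme)."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (features - lo) / span

def precision_recall(features, labels, kappa):
    """Average precision and recall of leave-one-out kappa-NN queries."""
    features = normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)
    precisions, recalls = [], []
    for q in range(len(features)):
        dists = np.linalg.norm(features - features[q], axis=1)
        dists[q] = np.inf  # leave the query image itself out
        answer = np.argsort(dists, kind="stable")[:kappa]
        relevant_retrieved = np.sum(labels[answer] == labels[q])  # |Ra|
        relevant_total = np.sum(labels == labels[q]) - 1          # |R|
        precisions.append(relevant_retrieved / kappa)             # |Ra| / |A|
        recalls.append(relevant_retrieved / relevant_total if relevant_total else 0.0)
    return float(np.mean(precisions)), float(np.mean(recalls))

# Two well-separated classes of 1-D feature vectors (hypothetical data).
features = [[0], [1], [2], [10], [11], [12]]
labels = [0, 0, 0, 1, 1, 1]
p2, r2 = precision_recall(features, labels, kappa=2)
```

Sweeping κ from 1 up to the size of the dataset and plotting the resulting (recall, precision) pairs gives one curve of a P&R graph.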
Figure 4. (a): Samples of medical images of lung; (b)(c)(d): the outputs of the k-Means algorithm with k=3.
Figure 5. P&R graphs of the “Lung ROI-3258” dataset using k=3 to build the feature vector extracted by k-Gabor.
In order to have a fair comparison among the methods, the configuration of the feature extractors was set following their standard values: for GLCM and Haralick, the displacement parameters δx and δy were set to 1 and 0, respectively.
For FFS, the number of stacks is s = 8. For the conventional Gabor we used 4 scales and 6 orientations, the same configuration as our k-Gabor descriptor. Additionally, our proposed method has one more parameter: the value of k, which depends on the specific dataset and can vary according to the internal structure of the images. Values in the range 2 ≤ k ≤ 10 were empirically tested to set an adequate value for each dataset. A proper k value gives the best performance of the method. Finally, all the feature extractors were implemented on the same platform, employing Matlab.
4.1 Experiment 1: Lung ROI-3258 image dataset
The first image base, “Lung ROI-3258”, was provided by the Ribeirao Preto Hospital of the University of Sao Paulo in Brazil. The image base consists of 3,258 ROIs, each with 64x64 pixels. Each ROI was classified by a specialist, over the segmented lung region, as one of the following classes: consolidation, emphysema, thickening, honeycombing, normal and ground glass, with 451, 502, 590, 530, 590 and 595 images, respectively. Hence, the dataset has six classes including the normal lung class. Samples of the medical image base “Lung ROI-3258” are shown in Figure 4, where the images in Figure 4(a) are the original ones and the images in Figure 4(b)(c)(d) are the outputs of the first, second and third clusters generated by the k-Means algorithm, respectively.
Figure 5 shows the P&R graphs comparing our proposed method (k-Gabor) with other representative feature extractors from the literature. A rule of thumb to read P&R graphs is: the higher the curve, the better the technique. Our k-Gabor method, using k = 3 (the number of clusters given to the k-Means algorithm), obtains the best precision among all the methods and remains stable over the whole recall range. The second best extractor is the Histogram which, in spite of its simplicity, presents good results on this dataset.
The next best feature extractor is Gabor. The remaining extractors, FFS, Haralick and GLCM, have similar behavior, and all of them give lower precision than our k-Gabor. As we used k = 3 and the dimensionality of the Gabor extractor is 48 (6 orientations, 4 scales and 2 features: mean and standard deviation), the dimensionality of our k-Gabor feature vector in this experiment is 144.
4.2 Experiment 2: Body-Image dataset
This dataset contains 944 images collected at the Ribeirao Preto Hospital of the University of Sao Paulo, Brazil. The dataset has 6 classes: abdomen coronal, angiography, axial head, coronal head, sagittal head and sagittal spine, with 23, 36, 200, 236, 293 and 155 images, respectively.
Figure 6. Sample images of the “Body-Image” dataset used in the experiment #2.
Figure 7. P&R graphs of the “Body-Image” dataset using k=2 to build the feature vector extracted by k-Gabor.
One sample of each class of the Body-Image dataset is shown in Figure 6. Figure 7 shows the results of the feature extractors as P&R graphs. Our k-Gabor method, with k = 2, obtains the best precision among all the methods and stays on top over the whole recall range. The Gabor feature extractor initially obtains results comparable to our proposal, but it decays more than k-Gabor as the recall increases. The dimensionality of the feature vector extracted by the k-Gabor method for this dataset is 48 × 2 = 96, because we used k = 2.
4.3 Experiment 3: Digital Database for Screening Mammography (DDSM)
This dataset contains 250 mammogram ROIs taken from the Digital Database for Screening Mammography (DDSM) at the University of South Florida (http://marathon.csee.usf.edu/Mammography). The dataset has two classes: benign and malignant masses, with 99 and 151 images, respectively.
Figure 8 shows the P&R graphs with the comparative results of the feature extractors. In this experiment, our k-Gabor method is still on top of the graph, although it almost ties with the FFS method. It is important to highlight that both k-Gabor and FFS are built considering information from the image regions. The dimensionality of the k-Gabor feature vector for this dataset is 480, using k = 10. This is a rather difficult dataset to classify, due to the nature of the images. In fact, in this case the traditional Gabor feature extractor does not stand out from the other methods as it does on the other datasets, and provides results comparable to those of the Histogram feature extractor.
Figure 8. P&R graphs of the “Digital Database for Screening Mammography” dataset, using k=10 to build the feature vector extracted by k-Gabor.
4.4 System Results
Figure 9 shows a snapshot of our prototype system running a similarity query over the “Lung ROI-3258” image dataset. The example is representative of the results achieved, which are summarized in the graphs of the experiments section. The prototype employs all the feature extractors evaluated in this paper. It is an interactive system, and the screen presented also shows the potential of the proposed k-Gabor method. Table 1 summarizes the average precision of each feature extractor on all the datasets employed in the experiments. We used the results of the P&R graphs presented in Figures 5, 7 and 8 to average the precision of each feature extractor over each dataset. The table shows that k-Gabor achieves the highest average precision on all three datasets.
Figure 9. Snapshot of the system using the dataset #1 “Lung ROI-3258”.
5 Conclusions
In this paper we have presented a new image feature extraction method named k-Gabor, which employs the k-Means clustering algorithm to group the image data into regions of interest, based on tissues or internal structures of the image, generating k new clustered images. Gabor filters are applied to every new clustered image in order to extract texture information from such clusters and compose the feature vector. The value of k can vary according to the type of the images stored in the dataset and the degree of detail demanded by the application. In this work, the k value was empirically set to k = 2, 3 and 10 for three different datasets of medical images. We compared the proposed k-Gabor method to several well-known feature extractors available in the literature, and the results reveal that our method presented the best retrieval precision when answering similarity queries. The total time spent to compute the k-Gabor features was always a fraction of a second, although this time can grow as k gets larger. If larger values of k are needed, the method can be easily parallelized, since the Gabor filters are independently applied to each clustered image. Further work includes designing approaches to select the most meaningful features present in the feature vector, in order to drive its use in indexing methods from database management systems, aiming at speeding up similarity query processing.
Table 1. Summarization of the precision results (average) for the feature extractors considering all the datasets.
             Lung     Body     DDSM
FFS          0.4443   0.6093   0.5848
GLCM         0.4627   0.6202   0.4738
Gabor        0.4919   0.7867   0.5043
Haralick     0.4205   0.6011   0.4485
Histograms   0.5407   0.5153   0.4860
k-Gabor      0.5809   0.8292   0.6057
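The parallelization noted in the conclusions can be sketched as follows: since each masked image is processed independently, the per-cluster extraction can be mapped over a pool of workers. The use of `concurrent.futures` is an assumption, and `region_features` is a hypothetical stand-in for the per-cluster Gabor extractor so that the example stays short.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def region_features(masked_image):
    """Hypothetical stand-in for the Gabor features of one masked image."""
    return np.array([masked_image.mean(), masked_image.std()])

def parallel_features(image, masks, workers=4):
    """Extract features from every masked image concurrently."""
    masked = [np.where(mask, image, 0.0) for mask in masks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the layout of the
        # final vector does not depend on which worker finishes first.
        parts = list(pool.map(region_features, masked))
    return np.concatenate(parts)

# Toy 4x4 image and two threshold masks (hypothetical data).
image = np.arange(16.0).reshape(4, 4)
masks = [image < 8, image >= 8]
vector = parallel_features(image, masks)
```

For CPU-bound extractors written in pure Python, a process pool may be a better fit than threads; the structure of the code stays the same.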
6 Acknowledgements
This research was supported by the Brazilian National Research Council (CNPq), the Sao Paulo State Research Foundation (FAPESP) and the Brazilian Federal Funding Agency for Graduate Education Improvement (CAPES).
References
[1] F. Albregtsen and Others. Statistical Texture Measures Computed from Gray Level Coocurrence Matrices. Image Processing Laboratory, Department of Informatics, University of Oslo, pages 1–14, 1995.
[2] R. Baeza-Yates, B. Ribeiro-Neto, and Others. Modern information retrieval, volume 463. ACM press New York, 1999.
[3] J. Cope, P. Remagnino, and S. Barman. Plant texture classification using gabor co-occurrences. Advances in Visual Computing, 2010.
[4] A. F. Costa, J. Tekli, and A. J. Traina. Fast Fractal Stack: Fractal Analysis of Computed Tomography Scans of the Lung. In International ACM Workshop on Medical Multimedia Analysis and Retrieval, 2011.
[5] P. Ghosh, S. Antani, L. Long, and G. Thoma. Review of medical image retrieval systems and future directions. Computer-Based Medical Systems, pages 1–6, June 2011.
[6] L. He, L. R. Long, S. Antani, and G. R. Thoma. Multiphase Level Set Model with Local K-means Energy for Histology Image Segmentation. 2011 IEEE First International Conference on Healthcare Informatics, pages 32–39, July 2011.
[7] B. Manjunath. Texture features for browsing and retrieval of image data. Pattern Analysis and Machine, pages 837–842, 1996.
[8] E. Miyamoto and T. Merryman. Fast calculation of Haralick texture features. Human Computer Interaction Institute, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, 2011.
[9] M. Tang and H. Nan Wang. Feature analysis of brain MRI images based on fractal dimension. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jan. 2005.
[10] C.-h. Wei and C. Li. A content-based approach to medical image database retrieval. Database Modeling for Industrial Data, pages 258–291, 2006.
[11] J. Yuan, Y. Li, C. Liu, and X. Zha. Leave-One-Out Cross-Validation Based Model Selection for Manifold Regularization. Advances in Neural Networks-ISNN 2010, 2010.