CHAPTER 5
WBC IMAGE SEGMENTATION AND CLASSIFICATION
USING RELEVANCE VECTOR MACHINE
5.1 INTRODUCTION
Medical image segmentation is a vital step in the proper detection and
diagnosis of diseases. Accurate segmentation of White Blood Cells (WBCs) is
particularly important because differential counting plays a major role in
determining the disease, and the treatment given to the patient is based on
it. The Standard Modified Fuzzy Possibilistic C-Means is used for
segmentation. A WBC image classification method based on Relevance Vector
Machines (RVMs) is given by Alagappan et al (2012). This chapter proposes a
Fast RVM based approach for the segmentation of WBC images. The Modified
RVM (MRVM) has a much shorter testing time than standard RVM based
classification, which makes it more suitable for applications that require
low complexity and, possibly, real-time classification. The proposed
methodology of WBC classification using MRVM is shown in Figure 5.1.
Figure 5.1 Proposed Methodology (flow diagram, boxes in order: Blood Cell
Images; Segmentation of WBC; Feature Extraction; Modified RVM
Classification)
The Modified RVM has an identical functional form to the Fast
Relevance Vector Machine, but provides probabilistic classification. First,
the astonishingly sparse Relevance Vectors (RVs) are obtained while fitting
the 1-D histogram by MRVM. Finally, the entire connected WBC regions are
segmented from the original image. The method also has advantages such as
high computational efficiency and no extra parameter setting.
5.2 WBC DETECTION
The automatic detection of white blood cells (WBCs) remains an
unsolved problem in medical imaging. The analysis of WBC images has
engaged researchers from the fields of medicine and computer vision alike.
This work presents an algorithm for the automatic detection of WBCs
embedded in complicated and cluttered smear images that treats the complete
process as an image detection problem. The approach, which is based on the
proposed algorithm, transforms the detection task into an optimization
problem. Although detection algorithms based on optimization approaches
present several advantages over traditional approaches, they have scarcely
been applied to WBC detection.
5.3 FEATURE EXTRACTION
In image processing, feature extraction is a particular form of
dimensionality reduction. When the input data to an algorithm is too large
to be processed, it is transformed into a reduced representation set of
features. Transforming the input data into the set of features is called
feature extraction. The cell feature extraction is based on four main
groups. They are:
Textural
Colour
Geometrical
Shape
5.3.1 Texture Feature
Let a and b be any two pixels in the image, separated by the spatial
displacement $d = (\Delta x, \Delta y)$, and let $p_d(i,j)$ be the
probability that the gray levels of a and b are i and j respectively. After
all pixels in the image are traversed, the gray level co-occurrence matrix
(GLCM) of the image is obtained. The GLCM is thus a texture analysis method
based on estimating the second-order joint conditional probability density
function of the image. Contrast, entropy, angular second moment and deficit
moment (commonly called the inverse difference moment) are adopted as
texture features of the image. The mean value, variance, skewness and
kurtosis are also used as texture features.
Contrast reflects image clarity and the depth of the texture grooves:
$\mathrm{CON} = \sum_{i} \sum_{j} (i - j)^2 \, p_d(i,j)$ (5.1)
Entropy reflects the amount of information in the image; if there is no
defect in the surface, there is almost no texture information and the
entropy value is nearly zero:
$\mathrm{ENT} = -\sum_{i} \sum_{j} p_d(i,j) \log p_d(i,j)$ (5.2)
Angular second moment reflects the uniformity of the gray level
distribution and the coarseness of the texture:
$\mathrm{ASM} = \sum_{i} \sum_{j} p_d(i,j)^2$ (5.3)
Deficit moment reflects the homogeneity of the image texture and measures
local texture changes:
$\mathrm{IDM} = \sum_{i} \sum_{j} \frac{p_d(i,j)}{1 + (i - j)^2}$ (5.4)
After the gray level co-occurrence matrix of the image has been
generated, the above four parameters are calculated and the gray texture
feature vector of the image is obtained.
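The texture features above can be illustrated with a minimal sketch. It assumes the image is a small list of lists of integer gray levels and that a single displacement (dx, dy) is used; the function names `glcm` and `texture_features` are illustrative and not part of the original work.

```python
import math

def glcm(image, dx=1, dy=0, levels=4):
    """Normalised gray level co-occurrence matrix p_d(i, j) for displacement (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("no pixel pairs for this displacement")
    return [[n / total for n in row] for row in counts]

def texture_features(p):
    """Contrast, entropy, angular second moment and inverse difference moment."""
    contrast = entropy = asm = idm = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            asm += v * v
            idm += v / (1 + (i - j) ** 2)
            if v > 0:  # 0 * log(0) is taken as 0
                entropy -= v * math.log(v)
    return {"contrast": contrast, "entropy": entropy, "asm": asm, "idm": idm}
```

In practice the matrix is often accumulated over several displacements and directions; a single displacement is used here only to keep the sketch short.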
5.3.2 Colour Feature
The colour histogram method, the colour aggregation vector method and the
colour set method are common colour feature extraction methods, but the
dimension of the resulting feature vectors and the algorithm complexity are
high. The colour moment method is therefore adopted to describe the image
colour characteristics, because the colour distribution of an image is
concentrated in its low order moments: the first order moment describes
the average colour, the second order moment describes the colour variance
and the third order moment describes the colour shift, as in Equations
(5.5), (5.6) and (5.7):
$\mu_i = \frac{1}{N} \sum_{j=1}^{N} P_{ij}$ (5.5)
$\sigma_i = \left( \frac{1}{N} \sum_{j=1}^{N} (P_{ij} - \mu_i)^2 \right)^{1/2}$ (5.6)
$s_i = \left( \frac{1}{N} \sum_{j=1}^{N} (P_{ij} - \mu_i)^3 \right)^{1/3}$ (5.7)
where $P_{ij}$ represents the probability of the pixels of gray level j
emerging in the i-th colour channel component, and N represents the number
of image pixels. In this work the HSI colour space is adopted because it is
closer to human colour perception, and the image colour information is
mainly concentrated in the chrominance H and saturation S components, so
the first three colour moments of components H and S are computed, giving a
6-D feature vector.
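As a rough illustration of the colour moments, the sketch below computes the three moments of one channel and concatenates two channels into a 6-D vector. It assumes each channel is already available as a flat list of per-pixel values; the conversion to HSI is not shown, and the names `colour_moments` and `colour_feature_vector` are hypothetical.

```python
import math

def colour_moments(values):
    """First three colour moments (mean, standard deviation, shift) of one channel."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    std = math.sqrt(second)
    # Signed cube root, so a negative third moment gives a negative shift.
    shift = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, shift

def colour_feature_vector(channel_a, channel_b):
    """6-D feature vector: three moments for each of two colour channels."""
    return list(colour_moments(channel_a)) + list(colour_moments(channel_b))
```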
5.3.3 Geometrical Feature
Geometrical features are widely used, as various blood cells
differ greatly in size or nucleus shape. Geometrical features are
computed on the basis of the Region of Interest (ROI) R, which has a
well-defined closed boundary, and comprise features such as area,
perimeter, centroid, tortuosity (area/perimeter), compactness C, given by
the formula C = perimeter^2/area, and radius. In this work, a simple
step-by-step procedure for detection and segmentation of blood smear
particles is presented using the geometrical features.
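A minimal sketch of these region measurements is given below. It assumes the ROI is a binary mask (list of lists of 0/1) and approximates the perimeter by counting exposed pixel edges under 4-connectivity, which is one of several possible conventions; the function name is illustrative.

```python
def geometric_features(mask):
    """Area, perimeter, centroid, tortuosity and compactness of a binary region."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    sum_r = sum_c = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            sum_r += r
            sum_c += c
            # Each side that faces background (or the image border) adds one edge.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or not mask[rr][cc]:
                    perimeter += 1
    if area == 0:
        raise ValueError("empty region")
    return {
        "area": area,
        "perimeter": perimeter,
        "centroid": (sum_r / area, sum_c / area),
        "tortuosity": area / perimeter,          # area / perimeter
        "compactness": perimeter ** 2 / area,    # perimeter^2 / area
    }
```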
5.3.4 Shape Feature
Shape does not refer to the shape of an image but to the shape of a
particular region that is being sought. Shapes are often determined by
first applying segmentation or edge detection to an image. Other methods use
shape filters to identify given shapes in an image. In some cases accurate
shape detection will require human intervention, because methods like
segmentation are very difficult to automate completely.
5.4 SUPPORT VECTOR MACHINE (SVM)
Support Vector Machines are a set of supervised learning methods
used for classification and regression.
The advantages of support vector machines are:
Effective in high dimensional spaces.
Still effective in cases where the number of dimensions is greater
than the number of samples.
Memory efficient: uses a subset of training points in the
decision function (called support vectors).
Versatile: different kernel functions can be specified for the
decision function. Common kernels are provided, but it is also
possible to specify custom kernels.
The disadvantages of support vector machines include:
If the number of features is much greater than the number of
samples, the method is likely to give poor performance.
SVMs do not directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation.
5.4.1 SVM Application
SVMs are currently among the best performers for a number of
classification tasks ranging from text to genomic data. SVMs can be applied
to complex data types beyond feature vectors (e.g. graphs, sequences, and
relational data) by designing kernel functions for such data. SVM
techniques have been extended to a number of tasks such as regression and
principal component analysis. The most popular optimization algorithms for
SVMs use decomposition to hill-climb over a subset of the $\alpha_i$'s at
a time, e.g. SMO. Tuning SVMs remains a black art: selecting a specific
kernel and parameters is usually done in a try-and-see manner.
Today, research on image surface defects is focused on defect
detection, and rarely on defect classification based on correctly
determining the defect area; the thesis focuses on surface defect
classification in order to meet the requirements of grading standards and
then achieve automatic image classification.
5.5 RELEVANCE VECTOR MACHINE (RVM)
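Given a training data set of input-target pairs
$D = \{(x_i, t_i)\}_{i=1}^{l}$, RVM follows the standard probabilistic
formulation and assumes that the targets are samples from the model with
additive noise:
$t_i = y(x_i; w) + \varepsilon_i$ (5.8)
where $\varepsilon_i$ are error terms which are generally assumed to be
independent, identically distributed Gaussian variables with zero mean and
variance $\sigma^2$. The likelihood function can be written as:
$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-l/2} \exp\left\{ -\frac{1}{2\sigma^2} \| t - \Phi w \|^2 \right\}$ (5.9)
where $\Phi$ is the $l \times (l+1)$ design matrix with
$\Phi = [\phi(x_1), \ldots, \phi(x_l)]^T$, in which
$\phi(x_i) = [1, k(x_i, x_1), \ldots, k(x_i, x_l)]^T$ and k is a kernel.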
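Maximum likelihood estimation of $w$ and $\sigma^2$ from (5.9) will
generally lead to severe over-fitting, so RVM encodes a preference for
smoother functions by defining an automatic relevance determination
Gaussian prior over the weights:
$p(w \mid \alpha) = \prod_{i=0}^{l} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$ (5.10)
The posterior over the weights is then obtained from Bayes' rule:
$p(w \mid t, \alpha, \sigma^2) = \mathcal{N}(w \mid \mu, \Sigma)$ (5.11)
where $\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}$,
$\mu = \sigma^{-2} \Sigma \Phi^T t$ and
$A = \mathrm{diag}(\alpha_0, \ldots, \alpha_l)$.
By integrating the weights out of the product
$p(t \mid w, \sigma^2)\, p(w \mid \alpha)$, RVM obtains the marginal
likelihood for the hyperparameters:
$p(t \mid \alpha, \sigma^2) = (2\pi)^{-l/2} \, |\sigma^2 I + \Phi A^{-1} \Phi^T|^{-1/2} \exp\left\{ -\frac{1}{2} t^T (\sigma^2 I + \Phi A^{-1} \Phi^T)^{-1} t \right\}$ (5.12)
Because the values of $\alpha$ and $\sigma^2$ that maximize the function
defined in equation (5.12) cannot be obtained in closed form, RVM uses
formulas for the iterative re-estimation of $\alpha$ and $\sigma^2$:
$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \quad (\sigma^2)^{new} = \frac{\| t - \Phi \mu \|^2}{l - \sum_i \gamma_i}, \quad \gamma_i = 1 - \alpha_i \Sigma_{ii}$ (5.13)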
During re-estimation, many of the $\alpha_i$ approach infinity and
the corresponding weights approach zero, implying that the corresponding
basis functions can be pruned, while the remaining $\alpha_i$ finally
stabilize at some finite values. The corresponding $x_i$ are called
relevance vectors (RVs). It has been demonstrated that the solution
obtained by RVM is astonishingly sparse, which helps to improve the
efficiency of the algorithm. That is one of the reasons why RVM rather than
SVM is chosen as the fitting tool in this work. Another important reason is
that RVM needs no extra parameter setting, which makes the method more
convenient. In contrast, for SVM it is necessary to estimate the
error/margin trade-off parameter C, which generally entails a
cross-validation procedure that is wasteful both of data and computation.
In mathematics, a Relevance Vector Machine (RVM) is a machine
learning technique that uses Bayesian inference to obtain parsimonious
solutions for regression and probabilistic classification. The RVM has an
identical functional form to the support vector machine, but provides
probabilistic classification.
It is actually equivalent to a Gaussian process model
with covariance function:
$k(x, x') = \sum_{j=1}^{N} \frac{1}{\alpha_j} \varphi(x, x_j) \varphi(x', x_j)$ (5.14)
where $\varphi$ is the kernel function (usually Gaussian), $\alpha_j$ are
the hyperparameters of the weight prior, and $x_1, \ldots, x_N$ are the
input vectors of the training set.
Compared with support vector machines (SVM), the Bayesian
formulation of the RVM avoids the set of free parameters of the SVM (which
usually require cross-validation-based post-optimization). However, RVMs
use an Expectation Maximization (EM)-like learning method and are
therefore at risk of local minima. This is unlike the standard sequential
minimal optimization (SMO)-based algorithms employed by SVMs, which
are guaranteed to find a global optimum (of the convex problem).
However, despite its success, a number of significant and practical
disadvantages of the support vector learning methodology can be
identified:
Although relatively sparse, SVMs make unnecessarily liberal
use of basis functions, since the number of support vectors
required typically grows linearly with the size of the training
set. Some form of post-processing is often required to reduce
computational complexity.
Predictions are not probabilistic. In regression the SVM outputs
a point estimate, and in classification, a 'hard' binary decision.
Ideally, one wishes to estimate the conditional distribution
$p(t \mid x)$ in order to capture uncertainty in the prediction. In
regression this may take the form of 'error-bars', but it is
particularly crucial in classification, where posterior probabilities
of class membership are necessary to adapt to varying class priors and
asymmetric misclassification costs. Posterior probability
estimates have been coerced from SVMs via post-processing,
although it has been argued that these estimates are unreliable.
It is necessary to estimate the error/margin trade-off parameter
'C' (and in regression, the insensitivity parameter '$\epsilon$' too).
This generally entails a cross-validation procedure, which is wasteful
both of data and computation.
The kernel function $K(x, x_i)$ must satisfy Mercer's condition.
That is, it must be the continuous symmetric kernel of a positive
integral operator.
The 'relevance vector machine' (RVM) is a Bayesian treatment
which does not suffer from any of the above limitations. Specifically, it
adopts a fully probabilistic framework and introduces a prior over the
model weights governed by a set of hyperparameters, one associated with
each weight, whose most probable values are iteratively estimated from the
data. Sparsity is achieved because in practice the posterior
distributions of many of the weights are sharply (indeed infinitely) peaked
around zero. The training vectors associated with the remaining
non-zero weights are termed 'relevance' vectors, in deference to the
principle of automatic relevance determination which motivates the
approach. The most compelling feature of the RVM is that, while capable of
generalization performance comparable to an equivalent SVM, it typically
utilizes dramatically fewer kernel functions.
5.5.1 RVM Theory
The methods and functions used in RVM theory are discussed by
Tzikas et al (2006).
A) Multi-kernel Relevance Vector Machine
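The relevance vector machine (RVM) is a special case of a sparse
linear model, where the basis functions are formed by a kernel function
$\phi$ centred at the different training points:
$y(x) = \sum_{i=1}^{N} w_i \, \phi(x - x_i)$ (5.15)
While this model is similar in form to the support vector machine (SVM),
the kernel function here need not satisfy Mercer's condition.
Multi-kernel RVM is an extension of the simple RVM model. It consists of
several different types of kernels $\phi_m$:
$y(x) = \sum_{m=1}^{M} \sum_{i=1}^{N} w_{mi} \, \phi_m(x - x_i)$ (5.16)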
The sparseness property enables automatic selection of the proper
kernel at each location by pruning all irrelevant kernels, though it is possible
that two different kernels remain on the same location.
B) Sparse Bayesian Prior
A sparse weight prior distribution can be obtained by modifying the
commonly used Gaussian prior, such that a different variance parameter is
assigned to each weight:
$p(w \mid \alpha) = \prod_{i=1}^{M} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$ (5.17)
where $\alpha = (\alpha_1, \ldots, \alpha_M)$ is a vector consisting of M
hyperparameters, which are treated as independent random variables. A
Gamma prior distribution is assigned to these hyperparameters:
$p(\alpha_i) = \mathrm{Gamma}(a, b)$ (5.18)
where a and b are constants and are usually set to zero, which results in a
flat Gamma distribution. By integrating over the hyperparameters, the
'true' weight prior $p(w) = \int p(w \mid \alpha)\, p(\alpha)\, d\alpha$ is
obtained. The above integral gives a Student-t prior, which is known to
enforce sparse representations, owing to the fact that its mass is mostly
concentrated near the origin and the axes of definition.
C) Bayesian Inference
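Assuming independent, zero-mean Gaussian noise with variance
$\beta^{-1}$, i.e., $\varepsilon \sim \mathcal{N}(0, \beta^{-1} I)$, the
likelihood of the observed data is:
$p(t \mid w, \beta) = \mathcal{N}(t \mid \Phi w, \beta^{-1} I)$ (5.19)
where $\Phi$ is the design matrix, of size $N \times N$ or
$N \times (N \cdot M)$ in the single-kernel and multi-kernel cases
respectively. This matrix is formed by all the basis functions evaluated at
all the training points, i.e., $\Phi = [\phi(x_1), \ldots, \phi(x_N)]^T$
where
$\phi(x_i) = [\phi_1(x_i - x_1), \ldots, \phi_1(x_i - x_N), \ldots, \phi_M(x_i - x_1), \ldots, \phi_M(x_i - x_N)]^T$. (5.20)
In order to make predictions using the Bayesian model, the
parameter posterior distribution $p(w, \alpha, \beta \mid t)$ needs to be
computed. Unfortunately, it cannot be computed analytically due to its
complexity, and approximations have to be made. The parameter posterior is
decomposed as:
$p(w, \alpha, \beta \mid t) = p(w \mid t, \alpha, \beta)\, p(\alpha, \beta \mid t)$
where the posterior of the weights is Gaussian,
$p(w \mid t, \alpha, \beta) = \mathcal{N}(w \mid \mu, \Sigma)$, with
$\Sigma = (\beta \Phi^T \Phi + A)^{-1}$ (5.21)
$\mu = \beta \Sigma \Phi^T t$ (5.22)
and
$A = \mathrm{diag}(\alpha_1, \ldots, \alpha_M)$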
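The posterior of the hyperparameters $p(\alpha, \beta \mid t)$ cannot be
computed analytically and is approximated by a delta function at its mode:
$p(\alpha, \beta \mid t) \approx \delta(\alpha_{MP}, \beta_{MP})$ (5.23)
$\alpha_{MP}$ and $\beta_{MP}$ can be found by maximizing:
$p(\alpha, \beta \mid t) \propto p(t \mid \alpha, \beta)\, p(\alpha)\, p(\beta)$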
Written as:
(5.24)
And
(5.25)
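The term $p(t \mid \alpha, \beta)$ is known as the marginal (type-II)
likelihood and is computed by marginalizing the weights:
$p(t \mid \alpha, \beta) = \int p(t \mid w, \beta)\, p(w \mid \alpha)\, dw$,
which yields,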
(5.26)
(5.27)
An alternative approach is to follow the variational Bayesian
methodology to obtain an approximation to the posterior parameter
distribution $p(w \mid t)$. This has been demonstrated, but it is concluded
that the method achieves only slightly improved results at significant
additional computational cost.
D) Marginal Likelihood Optimization for Fast RVM
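The value $\alpha_{MP}$ cannot be obtained analytically and
an iterative method has to be used. Instead of maximizing the
hyperparameter posterior, it is equivalent, and more convenient, to
minimize its negative log likelihood, which for the multi-kernel case is:
$L = \frac{1}{2} \left[ N \log 2\pi + \log |C| + t^T C^{-1} t \right]$ (5.28)
where $C = \beta^{-1} I + \Phi A^{-1} \Phi^T$. This equation with M = 1
gives the single kernel case.
Setting the derivative of L with respect to $\alpha_{mi}$ to zero gives the
update:
$\alpha_{mi} = \frac{1 - \alpha_{mi} \Sigma_{(mi)(mi)}}{\mu_{mi}^2}$ (5.29)
where $\mu_{mi}$ is the mi-th element of the posterior weight mean and
$\Sigma_{(mi)(mi)}$ is the mi-th diagonal element of the posterior weight
covariance. At each iteration, $\mu_{mi}$ and $\Sigma_{(mi)(mi)}$ are
recomputed and the update is repeated until convergence to $\alpha_{MP}$.
Similarly, the following formula can be obtained for the variance
parameter:
$\beta^{-1} = \frac{\| t - \Phi \mu \|^2}{N - \sum_{m,i} \left( 1 - \alpha_{mi} \Sigma_{(mi)(mi)} \right)}$ (5.30)
Each iteration requires $O((NM)^3)$ computations, which can be
very demanding for models with many basis functions. During the training
process, basis functions whose corresponding weights are estimated to be
zero can be pruned, so the matrix becomes smaller and its inversion will be
easier. However, all the basis functions are present initially, so the
first iterations remain expensive. Note that the iterative updates for the
hyperparameters can also be derived using an expectation-maximization (EM)
algorithm by treating the weights w as hidden variables and the
observations t as observed variables.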
E) Incremental Optimization
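A more efficient approach is the incremental algorithm, which is
used in this process. The model is initially assumed to contain only one
basis function, and basis functions are incrementally added or deleted
subsequently. For the case of a flat prior on the hyperparameters
$\alpha$, maximization of the marginal likelihood is equivalent to
maximizing:
$\mathcal{L}(\alpha) = \log p(t \mid \alpha, \beta) = -\frac{1}{2} \left[ N \log 2\pi + \log |C| + t^T C^{-1} t \right]$ (5.31)
Given a single hyperparameter $\alpha_i$, $\mathcal{L}(\alpha)$ can be
decomposed as:
$\mathcal{L}(\alpha) = \mathcal{L}(\alpha_{-i}) + \ell(\alpha_i)$ (5.32)
where $\mathcal{L}(\alpha_{-i})$ does not depend on $\alpha_i$ and
$\ell(\alpha_i) = \frac{1}{2} \left[ \log \alpha_i - \log(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i} \right]$ (5.33)
with $s_i = \phi_i^T C_{-i}^{-1} \phi_i$ and
$q_i = \phi_i^T C_{-i}^{-1} t$, where $\phi_i$ is the i-th column of
$\Phi$, while $C_{-i}$ is matrix C with the contribution of basis function
i removed, i.e., $C_{-i} = C - \alpha_i^{-1} \phi_i \phi_i^T$. Analysis of
$\ell(\alpha_i)$ shows that $\mathcal{L}(\alpha)$ has a unique maximum with
respect to $\alpha_i$:
$\alpha_i = \frac{s_i^2}{q_i^2 - s_i}$ if $q_i^2 > s_i$, and $\alpha_i = \infty$ if $q_i^2 \le s_i$ (5.34)
Thus, $\alpha_{MP}$ can be found by iteratively:
adding a basis function i with $q_i^2 > s_i$,
re-estimating $\alpha_i$ for a basis function already in
the model, or
deleting a basis function i with $q_i^2 \le s_i$.
When adding a basis function or re-estimating the value of its
hyperparameter, set
$\alpha_i = \frac{s_i^2}{q_i^2 - s_i}$ (5.35)
which maximizes $\mathcal{L}(\alpha)$.
Vectors s and q are calculated using an iterative algorithm that utilizes
their values from the previous iteration; the details of these
calculations can be found in the literature.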
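The add / re-estimate / delete rule can be sketched as a small decision function. This is only an illustration under the assumption that the quantities q_i and s_i have already been computed for the candidate basis function; the name `update_alpha` and the action labels are hypothetical.

```python
import math

def update_alpha(q, s, in_model):
    """Decide what to do with one basis function given its q_i and s_i.

    Returns (action, alpha): alpha = s^2 / (q^2 - s) when q^2 > s,
    otherwise alpha is infinite and the basis function is excluded.
    """
    if q * q > s:
        alpha = s * s / (q * q - s)
        return ("re-estimate" if in_model else "add", alpha)
    return ("delete" if in_model else "skip", math.inf)
```

A full incremental trainer would call such a rule on one candidate at a time and then refresh s, q, the posterior mean and the covariance, which is not shown here.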
This incremental algorithm successfully overcomes the major
computational limitation of the non-incremental algorithm. However,
because only one basis function can be modified at each iteration,
significantly more iterations are required to reach convergence.
Convergence could be faster by choosing at each step to modify the basis
function that leads to the largest increase of the marginal likelihood.
However, this requires evaluating the marginal likelihood increase for all
the basis functions at each step and is computationally expensive.
Overall, the incremental algorithm is a major improvement over the initial
non-incremental algorithm. However, it is still computationally demanding
for very large datasets.
5.5.2 Application for RVM
The RVM process is an iterative one and involves repeatedly
re-estimating $\alpha$ and $\beta$ until a stopping condition is met.
Algorithm for RVM
The steps are as follows:
1. Select a suitable kernel function for the data set and relevant
parameters. Use this kernel function to create the design
matrix $\Phi$.
2. Establish suitable convergence criteria for $\alpha$ and $\beta$, e.g.
a threshold value $\delta_{Thresh}$ for the change between one
iteration's estimate of $\alpha$ and the next,
$\delta = \sum_i |\alpha_i^{(n+1)} - \alpha_i^{(n)}|$, where n indexes
the iteration, so that re-estimation stops when
$\delta < \delta_{Thresh}$.
3. Establish a threshold value $\alpha_{Thresh}$ beyond which an
$\alpha_i$ is assumed to be tending to infinity.
4. Choose starting values for $\alpha$ and $\beta$.
5. Calculate $m = \beta \Sigma \Phi^T t$ and
$\Sigma = (A + \beta \Phi^T \Phi)^{-1}$, where
$A = \mathrm{diag}(\alpha)$.
6. Update $\alpha_i = \gamma_i / m_i^2$ and
$\beta = (N - \sum_i \gamma_i) / \| t - \Phi m \|^2$, where
$\gamma_i = 1 - \alpha_i \Sigma_{ii}$.
7. Prune the $\alpha_i$ and corresponding basis functions where
$\alpha_i > \alpha_{Thresh}$.
8. Repeat steps (5) to (7) until the convergence criteria are met.
The hyperparameter values $\alpha$ and $\beta$ which result from the above
procedure are those that maximize the marginal likelihood, and hence they
are used when making a new estimate of a target value $t_*$ for a new
input $x_*$:
$t_* = m^T \phi(x_*)$ (5.37)
The variance relating to the confidence in this estimate is given by:
$\sigma^2(x_*) = \beta^{-1} + \phi(x_*)^T \Sigma \, \phi(x_*)$ (5.38)
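Steps 6 and 7 and the prediction equations (5.37) and (5.38) can be sketched as follows. The sketch assumes that the posterior mean m and covariance of step 5 have already been computed elsewhere (the matrix inversion is not shown); the function names and the default threshold value are illustrative assumptions.

```python
def reestimate(alpha, m, sigma_diag, residual_sq, n_samples, alpha_thresh=1e9):
    """One pass of steps 6 and 7: update alpha and beta, list the kept indices.

    sigma_diag holds the diagonal of the posterior covariance and
    residual_sq is ||t - Phi m||^2 for the current posterior mean m.
    """
    gamma = [1.0 - a * s for a, s in zip(alpha, sigma_diag)]
    new_alpha = [g / (mi * mi) if mi != 0.0 else float("inf")
                 for g, mi in zip(gamma, m)]
    new_beta = (n_samples - sum(gamma)) / residual_sq
    keep = [i for i, a in enumerate(new_alpha) if a <= alpha_thresh]
    return new_alpha, new_beta, keep

def predict(m, sigma, beta, phi_star):
    """Predictive mean (5.37) and variance (5.38) for basis vector phi(x*)."""
    mean = sum(mi * p for mi, p in zip(m, phi_star))
    size = len(phi_star)
    quad = sum(phi_star[i] * sigma[i][j] * phi_star[j]
               for i in range(size) for j in range(size))
    return mean, 1.0 / beta + quad
```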
The segmentation algorithm is summarized below:
Step 1: Form a compact histogram from a given microscopic image;
Step 2: Use the RVM to approximate the above compact histogram and
obtain all the RVs;
Step 3: Seek the threshold from the RV set such that it lies in the
deepest concavity of the histogram;
Step 4: Use the threshold so obtained to segment the given image;
Step 5: Apply morphological operations to the segmented image in order to
obtain the entire connected WBC region.
Given a training data set of input-target pairs
$D = \{(x_i, t_i)\}_{i=1}^{l}$, RVM follows the standard probabilistic
formulation and assumes that the targets are samples from the model with
additive noise:
$t_i = y(x_i; w) + \varepsilon_i$ (5.39)
where $\varepsilon_i$ are error terms which are generally assumed to be
independent, identically distributed Gaussian variables with zero mean and
variance $\sigma^2$.
5.5.3 Decision Function in RVM
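The first part presents the RVM algorithm and the way to apply a
multiple kernel strategy. The relevance vector machine is a probabilistic
sparse kernel model. The aim is to reveal the underlying distribution of a
set of data $\{x_i, y_i\}_{i=1 \ldots n}$, where $x \in R^d$. The
conditional distribution $p(y \mid x)$ is modelled as a Gaussian whose
standard deviation $\sigma$ comes from the addition of a Gaussian noise
$\varepsilon \sim N(0, \sigma^2)$:
$y_i = \sum_{j=1}^{n} w_j \, k(x_i, x_j) + w_0 + \varepsilon_i$ (5.40)
where $w_j$ is the coefficient associated with each support vector.
The probability of the data can then be rewritten according to the
parameters:
$p(y \mid w, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{\| y - \Phi w \|^2}{2\sigma^2} \right\}$ (5.41)
with $\Phi$ an $n \times (n + 1)$ matrix containing the kernel and a bias:
$\Phi = [\mathbf{1}, K]$.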
The key to this approach is to define a prior on each coefficient
$w_i$. According to the Automatic Relevance Determination mechanism, all
coefficients which are unnecessary are pruned. This mechanism explains the
sparsity of the solution, since it prunes all parameters that add
complexity to the probabilistic model. By pruning coefficients, the
likelihood is then maximized with respect to the input data. The following
presents the adaptation of this algorithm to the classification case.
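Like regression, RVM has also been used for classification.
Consider a two-class problem with training points $X = \{x_1, \ldots, x_N\}$
and corresponding class labels $t = \{t_1, \ldots, t_N\}$ with
$t_i \in \{0, 1\}$. Based on the Bernoulli distribution, the likelihood
(the target conditional distribution) is expressed as:
$p(t \mid w) = \prod_{i=1}^{N} \sigma(y(x_i; w))^{t_i} \left[ 1 - \sigma(y(x_i; w)) \right]^{1 - t_i}$ (5.42)
where $\sigma(y)$ is the logistic sigmoid function:
$\sigma(y) = \frac{1}{1 + e^{-y}}$ (5.43)
Unlike the regression case, however, the marginal likelihood
$p(t \mid \alpha)$ can no longer be obtained analytically by integrating
the weights from (5.42), and an iterative procedure has to be used.
Let $\alpha_i^*$ denote the maximum a posteriori (MAP) estimate of the
hyperparameter $\alpha_i$. The MAP estimate for the weights, denoted by
$w_{MAP}$, can be obtained by maximizing the posterior distribution of the
class labels given the input vectors.
This is equivalent to maximizing the following objective function:
$J(w_1, \ldots, w_N) = \sum_{i=1}^{N} \log p(t_i \mid w_i) + \sum_{i=1}^{N} \log p(w_i \mid \alpha_i^*)$ (5.44)
where the first summation term corresponds to the likelihood of the class
labels, and the second term corresponds to the prior on the parameters
$w_i$. In the resulting solution, only those samples associated with
nonzero coefficients $w_i$ (called relevance vectors) will contribute to
the decision function.
The gradient of the objective function J with respect to w is:
$\nabla J = -A^* w - \Phi^T (f - t)$ (5.45)
where $f = [\sigma(y(x_1)), \ldots, \sigma(y(x_N))]^T$,
$A^* = \mathrm{diag}(\alpha_1^*, \ldots, \alpha_N^*)$ and the matrix $\Phi$
has elements $\phi_{i,j} = K(x_i, x_j)$.
The Hessian of J is:
$H = \nabla^2 J = -(\Phi^T B \Phi + A^*)$ (5.46)
where $B = \mathrm{diag}(\beta_1, \ldots, \beta_N)$ is a diagonal matrix
with $\beta_i = \sigma(y(x_i)) \left[ 1 - \sigma(y(x_i)) \right]$.
The posterior is approximated around $w_{MAP}$ by a Gaussian
approximation with covariance,
$\Sigma = -\left( H \big|_{w_{MAP}} \right)^{-1}$ (5.47)
and mean
$\mu = \Sigma \Phi^T B t$ (5.48)
These results are identical to the regression case, and the
hyperparameters $\alpha_i$ are updated iteratively in the same manner as
for the regression case.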
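The classification objective and its gradient can be checked numerically with a short sketch. It assumes a small dense design matrix given as a list of rows and a zero-mean Gaussian prior with precision alpha_j on each weight; the function names are illustrative and no optimizer is included.

```python
import math

def sigmoid(y):
    """Logistic sigmoid of equation (5.43)."""
    return 1.0 / (1.0 + math.exp(-y))

def objective_and_gradient(w, alpha, Phi, t):
    """Objective J(w) (log-likelihood plus log-prior) and its gradient."""
    n, k = len(t), len(w)
    y = [sum(Phi[i][j] * w[j] for j in range(k)) for i in range(n)]
    f = [sigmoid(v) for v in y]
    log_lik = sum(ti * math.log(fi) + (1 - ti) * math.log(1.0 - fi)
                  for ti, fi in zip(t, f))
    log_prior = sum(0.5 * math.log(a / (2.0 * math.pi)) - 0.5 * a * wj * wj
                    for a, wj in zip(alpha, w))
    # Gradient: Phi^T (t - f) - A w, i.e. -A w - Phi^T (f - t).
    grad = [sum(Phi[i][j] * (t[i] - f[i]) for i in range(n)) - alpha[j] * w[j]
            for j in range(k)]
    return log_lik + log_prior, grad
```

A Newton or gradient-ascent loop over this function would give the MAP weights; that loop is left out to keep the sketch short.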
The Relevance Vector Machine (RVM) technique has been applied
in many different areas of pattern recognition, including communication
channel equalization, head model retrieval, feature optimization,
functional neuroimage analysis and facial expression recognition. In this
thesis, two applications are discussed: the first concerns the application
of large scale multi-kernel RVM for object detection in large scale images,
while the second deals with computer-aided diagnosis of
microcalcifications in digitized mammograms.
Computational techniques are now routinely applied in the
field of turbomachinery. In the improvement of turbomachinery blades, a
variety of dedicated computer codes are required to assess and modify a
design before the prototype is made. Consequently, the availability of good
analysis and design codes is vital for manufacturers to stay at the
forefront of a very competitive field of engineering. Inverse methodologies
and practical automatic optimization procedures provide a systematic means
of design, reducing the considerable time and cost frequently incurred in
the conventional technique of iterating between analyzing the design and
modifying the blade shape manually. Many 2-D inverse design methods are
available and are commonly used in the preliminary stage of the design
process for axial turbomachines. In recent years, 3-D inverse design
methods have emerged and have been applied successfully to a wide
assortment of designs, involving both turbomachinery blades and wings.
5.6 CLASSIFICATION OF WHITE BLOOD CELLS
To classify each WBC into its respective subtype, features are used
that describe the characteristics of the cytoplasm and the nucleus. This
work chooses a set of features such as area, equivdiameter, circularity,
perimeter, convex area, solidity, orientation, eccentricity, ratio of the
areas of nucleus and cytoplasm, and majoraxislength/minoraxislength,
evaluated separately for the nucleus and the WBC. The result obtained from
the previous step gives information about the broad nucleus type (segmented
or nonsegmented). This result is a novel binary feature added to the
classifier. In addition, features like "circularity" (ratio between the
perimeter of the tightest bounding circle and the nuclear perimeter) of the
nucleus, nucleus to cytoplasm ratio, ratio of nucleus area to area of WBC,
entropy of the cytoplasm, and mean gray-level intensity of the cytoplasm
(all three colour channels) are computed.
A linear discriminant is used to reduce the multidimensional dataset to
six dimensions. A linear discriminant in this six-dimensional space is then
used to classify the data into their respective types.
Linear Discriminant Analysis (LDA) is used to find a linear
combination of the features which characterizes or separates these five
classes of WBCs. The classifier is biased using the number of samples in
each class. The system is evaluated using 10-fold cross-validation.
Cross-validation is a technique for assessing how the results of a
statistical analysis will generalize to an independent dataset. It is
mainly used in settings where the goal is prediction, and one wants to
estimate how accurately a predictive model will perform in practice. One
round of cross-validation involves partitioning a sample of data into
complementary subsets, performing the analysis on one subset (called the
training set), and validating the analysis on the other subset (called the
validation set or testing set). To reduce variability, multiple rounds of
cross-validation are performed using different partitions, and the
validation results are averaged over the rounds. The functions from the
Statistics Toolbox in MATLAB have been used to analyze the data.
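The partitioning described above can be sketched as follows. This is not the MATLAB routine used in the work; it is a plain illustration that assumes contiguous, unshuffled folds and a user-supplied `evaluate(training, validation)` callback returning a score, both of which are hypothetical names.

```python
def k_fold_splits(n_samples, k=10):
    """Partition sample indices into k (training, validation) pairs."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        splits.append((training, validation))
        start += size
    return splits

def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by evaluate(training, validation) over k rounds."""
    scores = [evaluate(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)
```

In practice the indices would normally be shuffled (or stratified by class) before splitting.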
5.7 IMAGE ANALYSIS
Automatic recognition of white blood cells in light microscopic
images usually consists of four major steps: pre-processing, image
segmentation, feature extraction, and classification, as shown by
Rezatofighi et al (2009). The pre-processing stage usually includes
enhancement of the acquired image and is essentially performed in order to
prepare the image for the vital segmentation stage. Individual objects of
interest are separated from the background in the segmentation process.
This is followed by a labelling operation (post-processing) in which
segmented objects of interest are tagged with unique labels that can be
used to count the number of objects in the image. These labels, along with
spatial information of the segmented objects, are used for the subsequent
feature extraction procedure. The geometrical features are used to identify
and classify the leukocyte cells, namely, lymphocyte, monocyte, and
neutrophil. The proposed method for the segmentation and classification of
blood cells (leukocytes) is given below.
Algorithm 1: Training phase
Step 1: Input the leukocyte colour cell image.
Step 2: Convert the colour image into a grayscale image.
Step 3: Apply histogram equalization to the grayscale image.
Step 4: Perform pre-processing using morphological operations,
namely, erosion, reconstruction and dilation.
Step 5: Segment the image of Step 4 by global thresholding and obtain
the resulting binary image.
Step 6: Remove the border-touching cells from the binary image and
then label the segmented binary image.
Step 7: For each labelled segment, compute the geometric shape features
(area, MajorAxisLength/MinorAxisLength, perimeter, circularity) and
store them.
Let $a_{ik}$ be the value of the ith parameter for the kth class, where
i=1,2,3,4 correspond to area, MajorAxisLength/MinorAxisLength,
perimeter and circularity, respectively, and k=1,2,3 correspond to
lymphocyte, monocyte and neutrophil, respectively.
Step 8: Repeat Steps 1 to 7 for all the training images.
Step 9: Compute the minimum and maximum values of the features of
leukocyte cells, denoted by $a_{ik}^{min}$ and $a_{ik}^{max}$, for all i
and k, and store them as the knowledge base.
Algorithm 2: Testing phase
The image processing techniques are applied to extract the needed
features (e.g. size, colour) of the different types of WBC.
Step 1: Input the leukocyte colour cell image.
Step 2: Convert the colour image into a grayscale image.
Step 3: Apply histogram equalization to the grayscale image.
Step 4: Perform pre-processing using morphological operations,
namely, erosion, reconstruction, and dilation.
Step 5: Segment the image of Step 4 using global thresholding and obtain
the resulting binary image.
Step 6: Remove the border-touching cells from the binary image and
then label the segmented binary image.
Step 7: For each labelled segment, compute the geometric shape features
$a_i$, i=1,2,3,4.
Step 8: Apply the rule for classification of the leukocyte cells: if
$a_i$ lies in the range [$a_{ik}^{min}$, $a_{ik}^{max}$] for i=1,2,3,4,
then the cell (labelled segment) belongs to the kth class, where k=1,2,3
corresponds to lymphocyte, monocyte, and neutrophil respectively.
Step 9: Repeat Steps 7 and 8 for all labelled segments and output the
classification of the identified leukocyte cells.
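The knowledge base of Algorithm 1 (Step 9) and the range rule of Algorithm 2 (Step 8) can be sketched as below. The sketch assumes the four features of each labelled segment have already been measured; the function names are illustrative, and a segment that matches no class range is simply returned as `None`, which is an assumption rather than something the algorithms above specify.

```python
def build_knowledge_base(training_samples):
    """Store [min, max] of every feature for every class.

    training_samples is a list of (class_name, [a1, a2, a3, a4]) pairs.
    """
    kb = {}
    for label, features in training_samples:
        if label not in kb:
            kb[label] = [[v, v] for v in features]
        else:
            for bounds, v in zip(kb[label], features):
                bounds[0] = min(bounds[0], v)
                bounds[1] = max(bounds[1], v)
    return kb

def classify(features, kb):
    """Return the first class whose ranges contain every feature, else None."""
    for label, ranges in kb.items():
        if all(lo <= v <= hi for v, (lo, hi) in zip(features, ranges)):
            return label
    return None
```

Because the rule is a plain interval test, overlapping class ranges are resolved here by the order in which classes were first seen during training.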
An improved algorithm has also been proposed for identification and
classification of white blood cells in digital microscopic images using a
colour image segmentation method. The ratio of the areas of the nucleus and
cytoplasm of a cell is presented as a prominent feature. For a given input
image, the colour image analysis is carried out based on the HSV model. The
experimental results are compared with the manual results obtained by a
pathologist. The performance of the proposed algorithm is analyzed for four
different feature sets.
The input RGB image of the leukocyte cell is converted into HSV
colour space and then only the hue is considered. From observation, the
hue value for the cell lies between 0.7 and 0.85. So, if the hue is
between 0.7 and 0.85, the cell portion is extracted, which contains
spurious regions along with it. These spurious regions can be eliminated by
removing regions whose total number of pixels is less than TA=1000. The
actual cell region is obtained after removing the spurious regions. Next,
the nucleus has to be extracted from the already extracted cell region.
Here, only the saturation is considered to extract the nucleus.
Empirically, it is observed that the nucleus has high saturation, above
0.45. Applying the above thresholds yields binarized images of the nucleus
and of the cell region.
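The hue and saturation thresholding can be sketched with the standard-library `colorsys` module. The sketch assumes the image is a list of rows of 8-bit (R, G, B) tuples and that spurious regions are 4-connected components smaller than the area threshold; the function name and the connectivity choice are assumptions, not taken from the original method.

```python
import colorsys
from collections import deque

HUE_MIN, HUE_MAX = 0.7, 0.85   # hue range observed for the cell
SAT_MIN = 0.45                 # nucleus saturation threshold
MIN_AREA = 1000                # TA: smaller regions are treated as spurious

def cell_and_nucleus_masks(rgb_image, min_area=MIN_AREA):
    """Return (cell_mask, nucleus_mask) as lists of lists of booleans."""
    rows, cols = len(rgb_image), len(rgb_image[0])
    hsv = [[colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for (r, g, b) in row] for row in rgb_image]
    cell = [[HUE_MIN <= hsv[r][c][0] <= HUE_MAX for c in range(cols)]
            for r in range(rows)]
    # Remove connected components with fewer than min_area pixels.
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not cell[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and cell[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(component) < min_area:
                for pr, pc in component:
                    cell[pr][pc] = False
    # Nucleus: pixels of the cell region with high saturation.
    nucleus = [[cell[r][c] and hsv[r][c][1] > SAT_MIN for c in range(cols)]
               for r in range(rows)]
    return cell, nucleus
```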
Finally, the cytoplasm region is obtained by subtracting the binary
image of the nucleus from that of the cell. For the experiments, three
feature sets, namely F2, F3, and F4, are used for classification and compared
with the results obtained for feature set F1 of the previous method:
F2 = (area, eccentricity, equivdiameter, perimeter, circularity)
F3 = (area, eccentricity, equivdiameter, perimeter, circularity,
ratio of areas of nucleus and cytoplasm)
F4 = (area, majoraxislength/minoraxislength, perimeter,
circularity, ratio of areas of nucleus and cytoplasm)
The feature set F1 = (area, majoraxislength/minoraxislength,
perimeter, circularity) is extended to F4, and F2 is extended to F3, by
adding one extra feature, namely the ratio of nucleus area to cytoplasm
area.
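The four feature sets can be assembled from a handful of region measurements, as in the sketch below. The formulas are assumptions based on common region-property definitions and are not stated in this section: circularity is taken as 4πA/P², equivalent diameter as sqrt(4A/π), and eccentricity as that of an ellipse with the given axes. The class `CellMeasurements` and function `feature_sets` are hypothetical names. The assumed circularity formula gives approximately 0.677 for an area of 883 and a perimeter of 128.

```python
import math
from dataclasses import dataclass


@dataclass
class CellMeasurements:
    cell_area: float     # pixels in the binary cell mask
    nucleus_area: float  # pixels in the binary nucleus mask
    perimeter: float     # boundary length of the cell region
    major_axis: float    # major axis length of the fitted ellipse
    minor_axis: float    # minor axis length of the fitted ellipse


def feature_sets(m: CellMeasurements) -> dict:
    """Build F1..F4 as tuples, using the assumed formulas described above."""
    axis_ratio = m.major_axis / m.minor_axis
    circularity = 4.0 * math.pi * m.cell_area / m.perimeter ** 2
    eccentricity = math.sqrt(1.0 - (m.minor_axis / m.major_axis) ** 2)
    equiv_diameter = math.sqrt(4.0 * m.cell_area / math.pi)
    # Cytoplasm = cell minus nucleus, so its area is the difference of areas.
    cytoplasm_area = m.cell_area - m.nucleus_area
    nc_ratio = m.nucleus_area / cytoplasm_area

    f1 = (m.cell_area, axis_ratio, m.perimeter, circularity)
    f2 = (m.cell_area, eccentricity, equiv_diameter, m.perimeter, circularity)
    f3 = f2 + (nc_ratio,)  # F2 extended by the nucleus/cytoplasm ratio
    f4 = f1 + (nc_ratio,)  # F1 extended by the nucleus/cytoplasm ratio
    return {"F1": f1, "F2": f2, "F3": f3, "F4": f4}


if __name__ == "__main__":
    # The nucleus area of 400 is a made-up value for illustration only.
    example = CellMeasurements(883, 400, 128, 1.0605, 1.0)
    for name, values in feature_sets(example).items():
        print(name, [round(v, 4) for v in values])
```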
5.8 EXPERIMENTAL RESULTS
To evaluate the techniques, experiments are conducted on various
blood cell images. Each blood cell image contains RBCs, WBCs, and platelets.
Only the WBCs are segmented, and the number of WBCs detected by each
technique is compared with the number actually present in the image, which
is obtained manually. Out of the 85 images, 62 samples are used as training
data and the remaining 23 samples as testing data. The ground truth for the
complete dataset was collected from a pathologist.
A Relevance Vector Machine classifier was compared with its
Support Vector Machine counterpart, using the same Gaussian kernel. A value
of the regularization parameter C was selected for the SVM.
Table 5.1 presents the geometric feature values computed for
the segmented leukocyte cells, namely lymphocytes, monocytes, and
neutrophils, of the image.
Table 5.1 The geometric feature values (F1) of the cell regions of the images in Figure 1(c), (f) and (i)
Cell Type     Area   MajorAxisLength/MinorAxisLength   Perimeter   Circularity
Lymphocyte    883    1.0605                            128         0.67734
Monocyte      1340   1.2850                            220         0.34796
Neutrophil    1806   1.3428                            314         0.34796
5.8.1 Accuracy and Processing Time of RVM
The accuracy and processing time for the SVM, the RVM, and the proposed
Fast RVM are shown in Table 5.2. For every feature set, the proposed
approach gives the highest accuracy and the lowest processing time of the
three classifiers.
Table 5.2 Accuracy and Processing Time for Fast RVM
Feature Set   Accuracy (%)                 Processing Time (Seconds)
              SVM     RVM     Fast RVM     SVM   RVM   Fast RVM
F1            80.23   85.12   92.53        59    34    20
F2            78.69   81.43   86.33        52    41    28
F3            73.32   80.54   83.78        48    35    19
F4            80.63   87.28   97.56        45    28    16
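The comparison in Table 5.2 can be re-derived with a short script. The sketch below only transcribes the table values and computes the accuracy gain (in percentage points) and the speed-up factor of the Fast RVM over the other two classifiers. The names `ACCURACY`, `TIME_S`, and `summarize` are hypothetical and are not part of the original work.

```python
# Values transcribed from Table 5.2, ordered (SVM, RVM, Fast RVM).
ACCURACY = {
    "F1": (80.23, 85.12, 92.53),
    "F2": (78.69, 81.43, 86.33),
    "F3": (73.32, 80.54, 83.78),
    "F4": (80.63, 87.28, 97.56),
}
TIME_S = {
    "F1": (59, 34, 20),
    "F2": (52, 41, 28),
    "F3": (48, 35, 19),
    "F4": (45, 28, 16),
}


def summarize(accuracy=ACCURACY, time_s=TIME_S):
    """Return per-feature-set accuracy gains and speed-ups of the Fast RVM."""
    rows = {}
    for fs in accuracy:
        svm_acc, rvm_acc, fast_acc = accuracy[fs]
        svm_t, rvm_t, fast_t = time_s[fs]
        rows[fs] = {
            "gain_vs_svm": round(fast_acc - svm_acc, 2),   # percentage points
            "gain_vs_rvm": round(fast_acc - rvm_acc, 2),   # percentage points
            "speedup_vs_svm": round(svm_t / fast_t, 2),    # ratio of times
            "speedup_vs_rvm": round(rvm_t / fast_t, 2),    # ratio of times
        }
    return rows


if __name__ == "__main__":
    for fs, row in summarize().items():
        print(fs, row)
```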
The accuracy of the SVM, RVM, and Fast RVM for each feature set is shown in Figure 5.2. The proposed Fast RVM has higher accuracy than the other two methods.
Figure 5.2 Accuracy for proposed Fast RVM (bar chart: x-axis Feature Set F1 to F4, y-axis Accuracy (%), series SVM, RVM, Fast RVM)
Figure 5.3 compares the processing time of the existing SVM and RVM with that of the proposed Fast RVM. The Fast RVM has the lowest processing time for every feature set while also giving the most accurate results.
Figure 5.3 Processing Time for proposed Fast RVM (bar chart: x-axis Feature Set F1 to F4, y-axis Processing Time (Seconds), series SVM, RVM, Fast RVM)
During the testing phase, the test values are compared with the
manual knowledge base for each type of blood cell. The classification
efficiency is measured for three feature sets, namely F1, F2, and F3. As
before, 62 of the 85 samples are used as training data and the remaining 23
as testing data.
The proposed Fast RVM is compared with the standard RVM in terms of
testing efficiency. Table 5.3 gives this comparison for the three feature
sets F1, F2, and F3.
Table 5.3 Testing efficiency Comparison
Feature   Activation Function   Testing Efficiency
                                RVM    Fast RVM
F1        Unipolar sigmoid      67%    85%
          Bipolar sigmoid       65%    83%
          Radial basis kernel   63%    81%
F2        Unipolar sigmoid      69%    89%
          Bipolar sigmoid       67%    88%
          Radial basis kernel   68%    86%
F3        Unipolar sigmoid      68%    90%
          Bipolar sigmoid       70%    91%
          Radial basis kernel   61%    89%
These feature sets are based on area, perimeter, convex length,
diameter, and number of lobes. Three activation functions are used in this
comparison: unipolar sigmoid, bipolar sigmoid, and radial basis kernel. For
the proposed Fast RVM, a testing efficiency of up to 85% is obtained for
feature set F1 and up to 89% for feature set F2. Compared with F1 and F2, F3
gives the maximum efficiency, up to 91%. The existing standard RVM reaches a
testing efficiency of up to 67% for feature set F1 and up to 69% for feature
set F2, and F3 again gives its maximum, up to 70%. Table 5.3 therefore shows
that the proposed Fast RVM gives better results than the standard RVM for
every feature set and activation function, with the best overall testing
efficiency obtained for F3.
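The per-feature-set maxima quoted above can be recomputed directly from Table 5.3. The following sketch simply transcribes the table and takes the maximum over the three activation functions. The names `TABLE_5_3` and `best_efficiency` are hypothetical.

```python
# Testing efficiency (%) transcribed from Table 5.3 as (RVM, Fast RVM).
TABLE_5_3 = {
    "F1": {
        "Unipolar sigmoid": (67, 85),
        "Bipolar sigmoid": (65, 83),
        "Radial basis kernel": (63, 81),
    },
    "F2": {
        "Unipolar sigmoid": (69, 89),
        "Bipolar sigmoid": (67, 88),
        "Radial basis kernel": (68, 86),
    },
    "F3": {
        "Unipolar sigmoid": (68, 90),
        "Bipolar sigmoid": (70, 91),
        "Radial basis kernel": (61, 89),
    },
}


def best_efficiency(table=TABLE_5_3):
    """Return {feature set: (best RVM %, best Fast RVM %)}."""
    best = {}
    for feature_set, rows in table.items():
        best_rvm = max(rvm for rvm, _ in rows.values())
        best_fast = max(fast for _, fast in rows.values())
        best[feature_set] = (best_rvm, best_fast)
    return best


if __name__ == "__main__":
    for feature_set, (rvm, fast) in best_efficiency().items():
        print(f"{feature_set}: RVM up to {rvm}%, Fast RVM up to {fast}%")
```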
5.9 SUMMARY
A medical decision support system known as Leuko has been
developed for leukemia diagnosis using a Naive Bayes classifier. The scheme
is able to distinguish six types of white blood cells (WBCs), including a
malignancy. This research examines the use of Fast Relevance Vector
Machine (FRVM) classifiers to identify WBCs for future leukemia
diagnosis. Since RVMs were initially designed for the solution of two-class
problems, a number of strategies for their extension to this multiclass task
are examined and compared. The proposed method uses discriminative shape,
colour, and texture features, which evidently contain information for better
discrimination of bone marrow cells. Further, feature selection methods
based on mutual information distribution and recursive feature elimination
are used along with the FRVM for effective classification. The results are
analyzed, and a classification accuracy of 93% has been achieved.