

ORIGINAL PAPER

Computer Vision Approach to Morphometric Feature Analysis of Basal Cell Nuclei for Evaluating Malignant Potentiality of Oral Submucous Fibrosis

M. Muthu Rama Krishnan & Mousumi Pal & Ranjan Rashmi Paul & Chandan Chakraborty & Jyotirmoy Chatterjee & Ajoy K. Ray

Received: 1 October 2010 / Accepted: 24 November 2010 / Published online: 9 December 2010
© Springer Science+Business Media, LLC 2010

Abstract This work presents a quantitative approach for analysing histomorphometric features of basal cell nuclei (size, shape and staining intensity) in the surface epithelium of Oral Submucous Fibrosis showing dysplasia (OSFD), compared with Normal Oral Mucosa (NOM). The basal cells of the surface epithelium form the proliferative compartment, so their morphometric changes reflect the biological behavior underlying normal cellular function as well as premalignant and malignant status. Accordingly, changes in the shape, size and staining intensity of the nuclei in the basal cell layer of NOM and OSFD have been studied. Geometric, Zernike moment and Fourier descriptor (FD) based features, together with intensity based features, are extracted for histomorphometric pattern analysis of the nuclei. All these features are statistically analyzed, along with 3D visualization, in order to discriminate the groups. The results show an increase in dimensions (area and perimeter) and shape parameters, and a decrease in mean nuclear intensity, in OSFD with respect to NOM. Further, the selected features are fed to a Bayesian classifier to discriminate normal tissue from OSFD. The morphometric and intensity features provide a sensitivity of 100%, a specificity of 98.53% and a positive predictive accuracy of 97.35%. This comparative quantitative characterization of basal cell nuclei should help oral onco-pathologists, researchers and clinicians to assess the biological behavior of OSFD, especially its premalignant and malignant potentiality. As a future direction, a more extensive study involving a larger number of disease subjects is envisaged.

Keywords Oral submucous fibrosis . Dysplasia . Cellular pleomorphism . Nuclear pleomorphism . Microscopic image analysis . Feature extraction . Zernike moments . Parabola fitting . Color deconvolution . Fuzzy divergence

M. Muthu Rama Krishnan : C. Chakraborty (*) : J. Chatterjee
School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
e-mail: [email protected]

M. Pal : R. R. Paul
Department of Oral and Maxillofacial Pathology, Guru Nanak Institute of Dental Sciences and Research, Kolkata, India

A. K. Ray
Department of Electronics & Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India

J Med Syst (2012) 36:1745–1756
DOI 10.1007/s10916-010-9634-5

Introduction

Globally, oral and pharyngeal cancer is the sixth leading epithelial malignancy [1]. In recent years the prevalence of cancers of the oral cavity has increased, and each year more than 0.3 million new cases of oral cancer are reported [2]. This high incidence is associated with late diagnosis of potential oral precancerous lesions and conditions [3]. Unlike cancers of other parts of the body, oral cancer usually develops from preexisting precancerous oral lesions. The common oral precancerous lesions and conditions include leukoplakia, erythroplakia and oral submucous fibrosis (OSF). Oral leukoplakia affects people across the globe, while OSF primarily affects people of the Indian subcontinent because of their habit of chewing betel nut and allied tobacco products. OSF is an insidious, chronic, progressive, scarring precancerous condition of the oral cavity and oropharynx [4]. The usual characteristic features of the disease are discomfort, burning sensation, pain, firm and coarse oral mucosa, depapillation of the tongue, recurrent patchy ulceration and varying degrees of trismus [8, 9]. The malignant potentiality of OSF is also high: epithelial dysplasia is seen in approximately 15% of cases, while cancer is found in at least 6% of patients [9]. In the recent past it has been reported that the prevalence of OSF has risen considerably in India [1].

Histopathologists, especially oral histopathologists, have been using light microscopic images as the hallmark for diagnosing oral precancer and cancer. This laboratory evaluation rests primarily on the personal histopathological acumen and experience of the oral histopathologist. As a result of this individual based analytical process, variability in the reported diagnosis may occur [5–7]. The heterogeneous, complex nature of the disease and the qualitative character of the microscopic analysis of cellular and nuclear pleomorphism contribute to this inter-observer variability [10–12]. Moreover, it is extremely difficult to predict the malignant potentiality of a precancerous lesion on the basis of qualitative microscopic observation alone. Consequently, researchers have worked over the past three decades to develop an alternative, more accurate diagnostic modality through quantitative computer aided diagnosis of microscopic images. These quantitative modalities of microscopic image analysis will help the pathologist to overcome individual limitations in diagnostic onco-pathology to a considerable extent and will make the process of cancer diagnosis easier [13, 14].

The cells of the oral epithelium consist of two functional populations: a progenitor population (whose function is to divide and provide new cells) and a maturing population (whose cells continuously undergo differentiation or maturation to form a protective surface layer). The progenitor cell population usually lies in the basal layer of the oral epithelium [15]. Biological changes in the basal cells may therefore have strong implications for future cell behavior in disease, especially in precancer and cancer. Pronounced morphometric changes involving the size and shape of the basal cells in general, and of their nuclei in particular, are recorded in OSF. A quantitative rather than qualitative assessment of these nuclear morphometric changes may be very helpful in the diagnostic and prognostic evaluation of the malignant potentiality of OSF.

Interestingly, very few studies have addressed basal cell nuclear changes in a definite quantitative manner [16, 17]. Therefore, the present study aims to develop a computer aided quantitative methodology for assessing the diagnostic and prognostic aspects of oral precancer and cancer. In this study, the size, shape and staining intensity features of the basal cell nuclei of normal and OSF tissues are extracted from the respective histopathological light microscopic images. The statistical significance of these features is validated using the independent sample t-test, box plots and density estimation, in order to compare and corroborate the quantitative features of basal cell nuclei of normal tissue and OSFD.

Materials and methods

Histopathology

Initially 45 study subjects were clinically diagnosed as suffering from OSF. All these patients were evaluated from the medical and surgical viewpoint and were advised to undergo biopsy. Three of them declined, while the remaining 42 agreed. Subsequently, incisional biopsies were performed under local anesthesia from the affected buccal mucosa of these 42 patients, with their informed consent, at the Department of Oral and Maxillofacial Pathology, Guru Nanak Institute of Dental Sciences and Research, Kolkata, India. Normal study samples were also collected, with prior written consent, from the buccal mucosa of 10 healthy volunteers without any oral habits or any other known systemic diseases. All the study subjects were of similar age (21–40 years) and food habits. This study was duly approved by the ethics review committee of Guru Nanak Institute of Dental Sciences and Research, Kolkata. All the biopsy samples were processed for histopathological examination: paraffin embedded tissue sections of 5 μm thickness were prepared, stained with haematoxylin and eosin (H&E) and subsequently evaluated. Of the 42 OSF subjects, only 12 revealed various grades of epithelial dysplasia.

Image acquisition

Representative light microscopic images (microphotographs), especially of the basal layer of cells, were optically grabbed from all 10 healthy volunteers and 12 OSFD cases with a Zeiss Observer.Z1 microscope under a 100× objective (N.A. 1.4) at the School of Medical Science & Technology, IIT Kharagpur, India. The image database for this analysis consists of 1355 cells, which were extracted from 885 normal and 470 OSFD images. The grabbed images were digitized at 1388 × 1040 pixels and stored in a computer.

Image processing

The histopathological images grabbed by the Carl Zeiss microscope contain randomly distributed white and black pixels (noise). A median filter has been used to remove this noise [15]. The block diagram of the proposed methodology for quantitative evaluation of basal cell nuclei is shown in Fig. 1.
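The noise removal step can be illustrated with a minimal sketch of a median filter. The 3 × 3 window, the border handling and the function name `median_filter_3x3` are assumptions made for illustration; the text does not state the window size actually used.

```python
from statistics import median

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    `image` is a list of equal-length rows of gray values. Border pixels
    use only the neighbours that exist (an assumed border policy).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = median(window)
    return out

# Hypothetical 4x4 patch with one isolated white and one isolated black pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
```

Because the median ignores isolated extreme values, both outlier pixels are replaced by the surrounding gray level while the uniform background is left unchanged.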


Basal cell nuclei segmentation

The approach for basal cell nuclei analysis in the present study consists of three stages: 1) basal cell nuclei extraction, 2) feature extraction of the nuclei and 3) comparative evaluation and interpretation.

The basal nuclei extraction was performed in three steps: a) delineation of the epithelio-mesenchymal (EM) junction, b) parabola fitting and the watershed algorithm to segment the basal layer of cells and c) extraction of the nuclei of the basal cells.

The delineation of the EM junction was achieved through morphological operations on H&E stained light microscopic images by diminishing the overlying and underlying local maxima, viz. epithelial cell boundaries and variation in collagen fibers, followed by generating a binary image using fuzzy divergence [19, 20].

Fuzzy divergence based threshold selection Fan and Xie proposed fuzzy divergence from fuzzy exponential entropy [21]. Here the divergence concept of Fan and Xie is extended to an image, represented by a matrix. In an image of size M × M with L distinct gray levels having probabilities $(p_0, p_1, p_2, \ldots, p_{L-1})$, the exponential entropy is defined as $H = \sum_{i=0}^{L-1} p_i e^{1-p_i}$. The fuzzy entropy for an image A of size M × M is defined as

$$H(A) = \frac{1}{n(\sqrt{e}-1)} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ \mu_A(f_{ij})\, e^{1-\mu_A(f_{ij})} + \left(1-\mu_A(f_{ij})\right) e^{\mu_A(f_{ij})} - 1 \right] \tag{1}$$

Here $n = M^2$ and $i, j = 0, 1, 2, \ldots, M-1$; $\mu_A(f_{ij})$ is the membership value of the pixel and $f_{ij}$ is the (i, j)th pixel of image A. For two images A and B, at the (i, j)th pixel, the information of discrimination between $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ is given by [19, 20]

$$e^{\mu_A(a_{ij})} / e^{\mu_B(b_{ij})} = e^{\mu_A(a_{ij}) - \mu_B(b_{ij})} \tag{2}$$

where $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ are the membership values of the (i, j)th pixels in images A and B, respectively, and $i, j = 0, 1, 2, \ldots, M-1$. The discrimination of image A against image B may be given as

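$$D_1(A,B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 1 - \left(1-\mu_A(a_{ij})\right) e^{\mu_A(a_{ij})-\mu_B(b_{ij})} - \mu_A(a_{ij})\, e^{\mu_B(b_{ij})-\mu_A(a_{ij})} \right] \tag{3}$$

Likewise the discrimination of B against A is

$$D_2(B,A) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 1 - \left(1-\mu_B(b_{ij})\right) e^{\mu_B(b_{ij})-\mu_A(a_{ij})} - \mu_B(b_{ij})\, e^{\mu_A(a_{ij})-\mu_B(b_{ij})} \right] \tag{4}$$

So, the total fuzzy divergence between images A and B is obtained from Eqs. 3 and 4:

$$D(A,B) = D_1(A,B) + D_2(B,A) \tag{5}$$

$$D(A,B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 2 - \left(1-\mu_A(a_{ij})+\mu_B(b_{ij})\right) e^{\mu_A(a_{ij})-\mu_B(b_{ij})} - \left(1-\mu_B(b_{ij})+\mu_A(a_{ij})\right) e^{\mu_B(b_{ij})-\mu_A(a_{ij})} \right] \tag{6}$$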

In the method, image A is an original image and image B is an ideally segmented image. An ideally segmented image is one that is perfectly thresholded, so that each pixel belongs exactly either to the object or to the background region. In such a situation, the membership value of each pixel of the ideally segmented image, in the object or background region to which it belongs, is equal to one. Hence Eq. 6 becomes

$$D(A,B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 2 - \left(2-\mu_A(a_{ij})\right) e^{\mu_A(a_{ij})-1} - \mu_A(a_{ij})\, e^{1-\mu_A(a_{ij})} \right] \tag{7}$$

Fig. 1 Block diagram of the proposed methodology. Blocks shown: input from normal and diseased surface epithelium; basal layer extraction (fuzzy divergence, morphological operations, parabola fitting); basal cell nuclei segmentation (color deconvolution, watershed, morphological operations); feature extraction; statistical analysis (t-test, box plot, density estimation); Bayesian classifier; output classes normal and OSF with dysplasia


In this way the divergence is computed over the whole image for each candidate gray level. The gray value corresponding to the minimum divergence (Fig. 2(b)) is chosen as the initial threshold for segmenting the object (epithelium) and background (rest of the image) regions. The minimum divergence indicates the maximum belongingness of each object pixel to the object region (epithelium) and of each background pixel to the background region (connective tissue).
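A minimal sketch of this threshold search is given below, working on a gray level histogram. The membership function is not specified in this section, so the sketch assumes an exponential membership $\mu = e^{-c\,|g - m|}$, where m is the mean gray level of the region a pixel falls in and c is the reciprocal of the gray level range; this choice and the name `fuzzy_divergence_threshold` are illustrative assumptions only.

```python
import math

def fuzzy_divergence_threshold(hist):
    """Return (t, divergence) minimising Eq. 7 over candidate thresholds.

    hist[g] is the number of pixels with gray level g. Levels <= t form one
    region and levels > t the other.
    """
    levels = [g for g, count in enumerate(hist) if count > 0]
    c = 1.0 / max(1, levels[-1] - levels[0])  # assumed normalising constant
    best_t, best_d = None, math.inf
    for t in range(levels[0], levels[-1]):
        low = [(g, hist[g]) for g in levels if g <= t]
        high = [(g, hist[g]) for g in levels if g > t]
        m_low = sum(g * n for g, n in low) / sum(n for _, n in low)
        m_high = sum(g * n for g, n in high) / sum(n for _, n in high)
        d = 0.0
        for region, mean in ((low, m_low), (high, m_high)):
            for g, n in region:
                mu = math.exp(-c * abs(g - mean))  # assumed membership function
                # Per-pixel term of Eq. 7, weighted by the pixel count n.
                d += n * (2 - (2 - mu) * math.exp(mu - 1) - mu * math.exp(1 - mu))
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d

# Hypothetical bimodal histogram: a dark cluster near 20, a bright one near 200.
hist = [0] * 256
for level, count in ((18, 10), (20, 50), (22, 10), (198, 10), (200, 50), (202, 10)):
    hist[level] = count
threshold, divergence = fuzzy_divergence_threshold(hist)
```

The per-pixel term vanishes when the membership equals one, so the divergence is smallest when every gray level lies close to the mean of its own region, which happens when the threshold falls between the two clusters.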

Next, the edges and boundaries are extracted from this binary image using the Canny edge detector. The longest edge present in this image is the EM junction, which is extracted by connected component labeling. Abrupt variation in this edge is reduced by filtering it with a band-pass filter. The shape and orientation of the EM junction at 100× magnification can be approximated by a parabola (Fig. 3(a-b)). Parabola fitting is performed by linear regression, as the parabola equation is a linear model in its coefficients [22].
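The selection of the longest edge by connected component labeling can be sketched as follows. The sketch assumes a binary edge map given as a list of rows and 8-connectivity, and it measures edge length by pixel count; the function name `largest_component` is hypothetical.

```python
from collections import deque

def largest_component(binary):
    """Return the set of (row, col) pixels of the largest 8-connected
    component of non-zero pixels in `binary` (a list of rows)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited edge pixel.
            component, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best

# Hypothetical edge map: a short fragment on top and a longer curved edge below.
edges = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 1, 0],
]
junction = largest_component(edges)
```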

Fig. 2 (a) Normal gray scale image; (b) plot of gray level against fuzzy divergence, for selection of the threshold value; (c) thresholded image of (a); (d) morphological operation to remove small objects within the epithelium; (e) larger white area extracted using connected component labeling; (f) extracted lower boundary using connected component labeling

Parabola fitting The generalized equation for a parabola is $Y = aX^2 + bX + C$. If the straight line model is inadequate for a given data set, a polynomial of degree 2, i.e. a parabola, may be a good choice, as higher order polynomials are unstable. A polynomial equation is a linear model in its coefficients, so the generalized model can be used to obtain a linear regression. The model for the (n+1)th order, or nth degree, polynomial is

$$y(t) = \sum_{i=1}^{n+1} a_i X^{i-1} \tag{8}$$

In matrix form,

$$\begin{bmatrix} y(t_1) \\ y(t_2) \\ y(t_3) \\ \vdots \\ y(t_m) \end{bmatrix} = \begin{bmatrix} 1 & X_{t_1} & X_{t_1}^2 & \cdots & X_{t_1}^n \\ 1 & X_{t_2} & X_{t_2}^2 & \cdots & X_{t_2}^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{t_m} & X_{t_m}^2 & \cdots & X_{t_m}^n \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n+1} \end{bmatrix}$$

or, in shorter form, $Y = X\alpha$, where X is an m × (n+1) matrix (n+1 ≤ m) and α is an (n+1) column vector. Let us write the objective function for the least squares estimation as

$$L(\alpha) = \sum_{i=1}^{m} \left[ Y - X\alpha \right]_i^2 = \left[ Y - X\alpha \right]^T \left[ Y - X\alpha \right] \tag{9}$$

which we can expand to give

$$L(\alpha) = Y^T Y - 2\alpha^T X^T Y + \alpha^T X^T X \alpha \tag{10}$$

Geometrically, the objective function defines a quadratic hyper-surface over the (n+1)-dimensional α-space, sometimes called the response surface, whose level curves correspond to concentric ellipsoids in the α-space. It has a unique global minimum that we can find by differentiating $L(\alpha)$ with respect to α and equating the result to the zero vector,

$$\frac{\partial L}{\partial \alpha} = -2X^T Y + 2X^T X \alpha = 0$$

Thus, the minimizing α must satisfy the (n+1) × (n+1) system of linear equations

$$X^T X \alpha = X^T Y \tag{11}$$

which are often called the normal equations. Because the columns of X are linearly independent, the matrix product on the left side is nonsingular, so the unique solution is

$$\alpha = \left( X^T X \right)^{-1} X^T Y \tag{12}$$

If relatively small perturbations in the data produce relatively large perturbations in the solution, we can get a more numerically stable algorithm by computing an orthogonal factorization of the form

$$X = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$

where Q is an m × m orthogonal matrix ($Q^T Q = I = Q Q^T$), R is an (n+1) × (n+1) upper triangular matrix, and 0 is an (m − n − 1) × (n+1) matrix of zeros. By substituting this factorization into the normal equations, we can easily verify that α satisfies the (n+1) × (n+1) upper triangular system

$$R\alpha = Q_1^T Y \tag{13}$$

where $Q_1$ is the m × (n+1) matrix formed by the first n+1 columns of Q [22].

Assuming the model fitted to the data is correct, the residuals approximate the random errors. The residuals are defined as $r_i = Y_i - X_i\alpha$, for $i = t_1, t_2, \ldots, t_m$. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. The parabola is fitted over the EM junction.
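The least squares fit and the residual check can be sketched in a few lines. This is a minimal illustration that solves the normal equations for a degree 2 polynomial by Gaussian elimination with partial pivoting; the QR route described above would be preferred for ill-conditioned data. The function names and the sample points are hypothetical.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a0 + a1*x + a2*x**2 by solving the
    normal equations (X^T X) alpha = X^T Y."""
    rows = [[1.0, x, x * x] for x in xs]  # design matrix X, one row per point
    n = 3
    # Augmented matrix [X^T X | X^T Y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[k][j] -= factor * aug[col][j]
    # Back substitution.
    alpha = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * alpha[j] for j in range(i + 1, n))
        alpha[i] = (aug[i][n] - s) / aug[i][i]
    return alpha

def residuals(xs, ys, alpha):
    """r_i = y_i - (a0 + a1*x_i + a2*x_i**2)."""
    return [y - (alpha[0] + alpha[1] * x + alpha[2] * x * x)
            for x, y in zip(xs, ys)]

# Hypothetical boundary points lying exactly on y = 1 - 2x + 0.5x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [0.5 * x * x - 2.0 * x + 1.0 for x in xs]
alpha = fit_parabola(xs, ys)
res = residuals(xs, ys, alpha)
```

For noise-free points the residuals are zero up to rounding; for a real EM junction they would be inspected for systematic structure, as described above.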

The next step is to generate n parabolas parallel to the parabola fitted to the EM junction (basement membrane region), such that the mask generated from these parallel parabolas overlays the basal layer completely (Fig. 3(c)). The effective distance between two parallel parabolas is not the same at the distal ends and at the center. This property generates an image mask that is thicker at the center than at the distal ends, which leads to over-segmentation at the center. Moreover, epithelial cell borders cannot be isolated accurately in H&E stained sections; they can be estimated statistically using a space partition procedure. Initially, the haematoxylin plane, which has high contrast between nuclei and cytoplasm, is extracted using color deconvolution [23]. The enhanced nuclei, after morphological operations, serve as markers in the watershed algorithm that segments the basal layer of cells. Not all partitions contain exactly one basal cell, as some contain suprabasal cells or clumps of basal cells. The following approach is adopted to classify each partition, called a pseudo cell, as a basal cell or a non-basal cell.

The first step is to find the neighbors of all pseudo cells and then to evaluate the area of each pseudo cell. If the area is not within the threshold, the pseudo cell is merged or ignored depending on whether it is part of a cell or of the background, respectively; a pseudo cell that is to be merged is named a 'to be merged cell'. Further, the shape parameter compactness and the variance are evaluated for the 'to be merged cell' and its respective neighbors. These features are fuzzy in nature and are evaluated by a trapezoid membership function. The 'to be merged cell' is then merged with the neighbor having the highest membership value. Finally, the suprabasal layer is eliminated by extracting the lowest cell in the basal layer from the image.
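A trapezoid membership function of the kind mentioned here can be sketched as below. The breakpoints a, b, c, d and the example compactness values are hypothetical; the text does not give the values used for compactness or variance.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoid membership: 0 outside (a, d), 1 on [b, c],
    linear on the two slopes. Assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical use: score candidate merges by the compactness of the merged
# region and pick the neighbour with the highest membership value.
candidates = {"neighbour_1": 0.35, "neighbour_2": 0.72}
scores = {name: trapezoid(value, 0.2, 0.6, 0.9, 1.0)
          for name, value in candidates.items()}
best_neighbour = max(scores, key=scores.get)
```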

Fig. 3 (a) Normal basal layer image; (b) extracted epithelial lower boundary and fitted parabola; (c) extracted basal layer


Feature extraction

Geometric, moment and transformation based features of basal cell nuclei are extracted to analyze the changes that occur during OSF, as described below.

Morphometric features

The nucleus is separated out using the color deconvolution algorithm. The following features are evaluated for each nucleus: a) area, b) perimeter, c) compactness, d) eccentricity, e) area equivalent diameter, f) perimeter equivalent diameter, g) convex area, h) Zernike moments, i) Fourier descriptors and j) mean nuclei intensity. The area is obtained by counting the number of pixels in the binary image of the nucleus, whereas the perimeter is obtained by counting the number of boundary pixels of the nucleus. Compactness is proportional to the area of each nucleus divided by the square of its perimeter [18]. It is mathematically defined as

$$\text{Compactness} = \frac{4\pi \times \text{area}}{(\text{perimeter})^2} \tag{14}$$

The area equivalent diameter is mathematically defined as

$$\text{Area equivalent diameter} = \sqrt{\frac{4}{\pi} \times \text{area}} \tag{15}$$

The perimeter equivalent diameter is mathematically defined as

$$\text{Perimeter equivalent diameter} = \frac{\text{perimeter}}{\pi} \tag{16}$$
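These size and shape measures can be sketched on a binary nucleus mask as follows. The sketch assumes that a boundary pixel is a nucleus pixel with at least one 4-neighbour in the background, and it takes the perimeter equivalent diameter as perimeter/π, the diameter of a circle with the same perimeter. The function name and the test mask are hypothetical.

```python
import math

def morphometric_features(mask):
    """Area, perimeter, compactness and equivalent diameters of the
    non-zero region of `mask` (a list of rows of 0/1 values)."""
    rows, cols = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Boundary pixel: at least one 4-neighbour is background.
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if not all(inside(nr, nc) for nr, nc in neighbours):
                perimeter += 1
    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": 4 * math.pi * area / perimeter ** 2,
        "area_equivalent_diameter": math.sqrt(4 * area / math.pi),
        "perimeter_equivalent_diameter": perimeter / math.pi,
    }

# Hypothetical mask: a 4x4 square "nucleus" inside a 6x6 image.
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
        for r in range(6)]
features = morphometric_features(mask)
```

Counting boundary pixels underestimates the true contour length of slanted edges, so on small digital shapes the compactness can exceed 1; this is a property of the pixel counting definition, not of the formula.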

Elliptical approximation

Each cell nucleus has been approximated by an ellipse. The ellipse approximation is done by first fitting a minimum bounding rectangle to the cell and then circumscribing the minimum bounding ellipse over the rectangle. It can be proved that the minimum bounding ellipse has major and minor axes equal to 1.414 times the length and width of the rectangle. The algorithm [24] for the minimum bounding rectangle is given below.

1. The centroid of each cell is found. It is obtained as the mean of all the pixels in the cell.

2. The principal axis of the cell is obtained next. The principal axis, or major axis, is the line that passes through the centre and for which the sum of squares of the perpendicular distances of all the boundary points is minimum. Using this property of the principal axis, its orientation is obtained. Let θ be the angle (see Fig. 4) of the major axis with the horizontal axis X. The orientation θ is given by

$$\tan 2\theta = \frac{2 \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} \left[ (X_i - \bar{X})^2 - (Y_i - \bar{Y})^2 \right]} \tag{17}$$

where $(X_i, Y_i)$ are the n boundary points and $(\bar{X}, \bar{Y})$ is the centroid. The orientation gives the equation of the major axis; the minor axis is perpendicular to the major axis.

3. From the equation of the major axis we can obtain the width of the rectangle by measuring the perpendicular distance of each boundary point of the cell from it. The width of the rectangle is equal to the maximum perpendicular distance obtained. The length of the rectangle is similarly obtained from the equation of the minor axis.

Thus, the length and width of the rectangle are obtained, from which the major and minor axes follow and the approximating ellipse is drawn. The approximating ellipse can be used to find the eccentricity feature. Eccentricity is calculated by the following equation

$$\text{Eccentricity} = \frac{\sqrt{a^2 - b^2}}{a} \tag{18}$$

where a and b indicate the major and minor axes.
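The orientation and eccentricity computations can be sketched as follows. The sketch evaluates the two sums of Eq. 17 and uses `atan2` instead of a plain arctangent so that a zero denominator (an axis at 45°) is handled; this is an implementation choice, not something the text specifies. The function names and sample points are hypothetical.

```python
import math

def orientation(points):
    """Angle theta (radians) of the principal axis with the X axis,
    from the numerator and denominator of Eq. 17."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    num = 2 * sum((x - x_mean) * (y - y_mean) for x, y in points)
    den = sum((x - x_mean) ** 2 - (y - y_mean) ** 2 for x, y in points)
    # tan(2*theta) = num / den; atan2 resolves the quadrant and den == 0.
    return 0.5 * math.atan2(num, den)

def eccentricity(a, b):
    """Eq. 18 with a >= b > 0 the major and minor axes."""
    return math.sqrt(a * a - b * b) / a

# Hypothetical boundary points along the diagonal y = x.
theta = orientation([(0, 0), (1, 1), (2, 2), (3, 3)])
ecc = eccentricity(5.0, 3.0)
```

Points spread along the diagonal give θ = π/4, and an ellipse with axes 5 and 3 gives an eccentricity of 0.8; a circle (a = b) gives 0.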

Fig. 4 Angle made by the major axis with the horizontal axis

Zernike moment

The Zernike polynomials were first proposed in 1934 by Zernike. Their moment formulation appears to be one of the most popular, outperforming alternatives in terms of noise resilience, information redundancy and reconstruction capability. Complex Zernike moments are constructed using a set of complex polynomials which form a complete orthogonal basis set defined on the unit disc. The two dimensional Zernike moment is expressed as [25]:

$$A_{mn} = \frac{m+1}{\pi} \int_x \int_y f(x,y) \left[ V_{mn}(x,y) \right]^{*} \, dx \, dy \tag{19}$$

where m = 0, 1, 2, ... defines the order of the moment and f(x, y) is the function being described. Here n is an integer describing the angular dependence, or rotation, subject to the following conditions:

$$m - |n| = \text{even}, \quad |n| \le m$$

The expression of $V_{mn}$ in polar coordinates is

$$V_{mn}(r,\theta) = R_{mn}(r) \exp(jn\theta) \tag{20}$$

Here $R_{mn}$ is the orthogonal radial polynomial, defined as

$$R_{mn}(r) = \sum_{s=0}^{\frac{m-|n|}{2}} (-1)^s F(m,n,s,r) \tag{21}$$

and

$$F(m,n,s,r) = \frac{(m-s)!}{s! \left( \frac{m+|n|}{2} - s \right)! \left( \frac{m-|n|}{2} - s \right)!} \, r^{m-2s} \tag{22}$$

For a discrete case such as an image, if p(x, y) is the current pixel, the expression of the Zernike moment becomes

$$A_{mn} = \frac{m+1}{\pi} \sum_x \sum_y p(x,y) \left[ V_{mn}(x,y) \right]^{*} \tag{23}$$
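A minimal sketch of the discrete Zernike moment is given below. It uses the standard form of the radial polynomial, with factors ((m+|n|)/2 − s)! and ((m−|n|)/2 − s)! in the denominator, maps a square image onto the unit disc with the image centre as origin, and follows Eq. 23 literally, without a pixel area factor. The mapping convention and the function names are assumptions made for illustration.

```python
import cmath
import math
from math import factorial

def radial_poly(m, n, r):
    """Zernike radial polynomial R_mn(r) for m - |n| even and |n| <= m."""
    n = abs(n)
    total = 0.0
    for s in range((m - n) // 2 + 1):
        coeff = ((-1) ** s * factorial(m - s)
                 / (factorial(s)
                    * factorial((m + n) // 2 - s)
                    * factorial((m - n) // 2 - s)))
        total += coeff * r ** (m - 2 * s)
    return total

def zernike_moment(image, m, n):
    """Discrete Zernike moment A_mn of a square image (list of rows)."""
    size = len(image)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    total = 0j
    for row in range(size):
        for col in range(size):
            x = (col - centre) / radius
            y = (row - centre) / radius
            r = math.hypot(x, y)
            if r > 1.0:
                continue  # pixels outside the unit disc are not used
            theta = math.atan2(y, x)
            # Complex conjugate of V_mn(r, theta) = R_mn(r) * exp(j*n*theta).
            v_conj = radial_poly(m, n, r) * cmath.exp(-1j * n * theta)
            total += image[row][col] * v_conj
    return (m + 1) / math.pi * total

# Hypothetical 4x4 patch and its 90-degree rotation.
patch = [[0, 1, 2, 0], [3, 4, 5, 1], [0, 2, 6, 2], [1, 0, 3, 0]]
rotated = [list(r) for r in zip(*patch[::-1])]
a22 = zernike_moment(patch, 2, 2)
a22_rot = zernike_moment(rotated, 2, 2)
```

Rotating the patch changes only the phase of the moment, so the magnitudes of `a22` and `a22_rot` agree; this is the rotation invariance that makes the moment magnitudes usable as shape features.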

To calculate the Zernike moment, the image is first mapped to the unit disc using polar coordinates, where the centre of the image is the origin of the unit disc. Pixels falling outside the unit disc are not used in the calculation. The coordinates are then described by the length r of the vector from the origin to the coordinate point and the angle θ from the x-axis to that vector. The magnitudes of Zernike moments are rotation invariant but not invariant to scaling and translation. Scaling and translation invariance can be achieved by transforming the pixel coordinates using the following rule before applying the Zernike moment:

$$h(x,y) = f\!\left( \frac{x}{a} + \bar{x}, \; \frac{y}{a} + \bar{y} \right), \quad \text{where } a = \sqrt{\frac{\beta}{m_{00}}}, \quad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}} \tag{24}$$

Here $m_{00}$, $m_{10}$ and $m_{01}$ are the regular moments

$$m_{pq} = \sum_x \sum_y x^p y^q f(x,y) \tag{25}$$

Translation invariance is achieved by moving the origin to the image object center, causing $m_{01} = m_{10} = 0$. Following this, scale invariance is produced by altering each object so that its area (or pixel count for a binary image) is $m_{00} = \beta$, where β is a predetermined value [22].

Fourier descriptors

In any image, let $(x_k, y_k)$, where k = 0, 1, ..., K−1, represent the edge points of an object. The Fourier descriptors of that edge can be obtained as follows. Each point can be treated as a complex number [18] so that

$$s(k) = x_k + j y_k \tag{26}$$

Now the DFT of s(k) is

$$a(u) = \sum_{k=0}^{K-1} s(k) \, e^{-j 2\pi u k / K} \tag{27}$$

Since the DFT of a sequence has the same length as the sequence itself, the total number of descriptors varies with the length of the edge. Here the AC power of the Fourier descriptor is computed as follows

$$P_{AC} = \sum_{u \neq 0, \, v \neq 0} \left[ F_R^2(u,v) + F_I^2(u,v) \right] \tag{28}$$

where $F_R(u,v)$ and $F_I(u,v)$ are the real and imaginary parts of the Fourier transform of the image, respectively, and u and v are the frequencies along the x and y axes of the image, respectively. Fourier descriptors are not invariant to scaling and translation. Scaling and translation invariance can be achieved using Eqs. 24 and 25.
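The boundary descriptors of Eqs. 26 and 27 can be sketched with a direct DFT. The `ac_power` helper below is a one dimensional analogue of Eq. 28 applied to the boundary descriptors (all terms except u = 0); that adaptation, the function names and the sample boundary are assumptions for illustration.

```python
import cmath

def fourier_descriptors(boundary):
    """DFT a(u), u = 0..K-1, of the boundary treated as s(k) = x_k + j*y_k."""
    K = len(boundary)
    s = [complex(x, y) for x, y in boundary]
    return [
        sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K) for k in range(K))
        for u in range(K)
    ]

def ac_power(descriptors):
    """Sum of squared magnitudes of all descriptors except the DC term a(0)."""
    return sum(abs(a) ** 2 for a in descriptors[1:])

# Hypothetical rectangular boundary and a translated copy of it.
boundary = [(0, 0), (2, 0), (2, 1), (0, 1)]
shifted = [(x + 5, y + 3) for x, y in boundary]
fd = fourier_descriptors(boundary)
fd_shifted = fourier_descriptors(shifted)
```

A translation of the boundary changes only the DC term a(0), so the AC power of `fd` and `fd_shifted` is the same; scaling the boundary, by contrast, scales every descriptor.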

Hyperchromatism

Hyperchromatism represents intense staining of the nuclei [26]. It is an important characteristic feature of a malignant cell. In the case of severe epithelial dysplasia, chromatin abnormality results in increased staining of the nuclei, which appear darker than their normal counterparts. To find the degree of hyperchromatism, the mean intensity of the nuclear staining is calculated as follows

$$\text{Mean intensity of nuclei} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{\lVert N_i \rVert} \sum_{\forall (x,y) \in N_i} N_i(x,y) \right) \tag{29}$$

where n is the total number of nuclei, $N_i$ is the ith nucleus in the image (1 ≤ i ≤ n) and $\lVert N_i \rVert$ is the number of pixels in $N_i$.
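Eq. 29 is a mean of per-nucleus means, which the following sketch makes explicit. Each nucleus is represented simply as a list of its pixel intensities; the function name and the values are hypothetical.

```python
def mean_nuclear_intensity(nuclei):
    """Average, over all nuclei, of the mean pixel intensity of each nucleus."""
    per_nucleus = [sum(pixels) / len(pixels) for pixels in nuclei]
    return sum(per_nucleus) / len(per_nucleus)

# Two hypothetical nuclei with 3 and 2 pixels respectively.
value = mean_nuclear_intensity([[100, 110, 120], [60, 80]])
```

Each nucleus contributes equally regardless of its size, so a large nucleus does not dominate the average; darker (more intensely stained) nuclei lower the value.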

Statistical analysis

a) Independent sample t-test
It is necessary to verify whether a feature or a set of features has the capability to discriminate among the labeled classes. For this purpose, classical statistical inference provides a well-established test, the independent samples t-test, which is used for comparing the population means of two classes. For large samples, the procedure often performs well even for non-normal populations [27]. The procedure also produces a confidence interval estimate for the difference of the two means. A large difference between the two sample means should lead us to reject the null hypothesis $H_0: \mu_1 = \mu_2$. In addition, it can be meaningful to study the distribution pattern of each of the features over the classes.
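The test statistic can be sketched as follows for the equal variance (pooled) form of the independent samples t-test. The text does not say whether the pooled or the unequal variance form was used, so the pooled form is an assumption, and the sample values are hypothetical rather than taken from the study.

```python
import math
from statistics import mean, variance

def independent_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for H0: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    pooled = (((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2))
              / (n1 + n2 - 2))
    t = (mean(sample1) - mean(sample2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical feature values for two groups.
t_stat, dof = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The magnitude of the statistic is then compared with the critical value of the t distribution for the given degrees of freedom; a large magnitude leads to rejection of the null hypothesis.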

b) Kernel density estimation

Kernel probability density estimation methods are based on the premise that continuous, differentiable functions can be exactly modelled by the infinite sum of some other, appropriately chosen, 'kernel' function [28]. If x1, x2, ..., xn ~ f is an independent and identically distributed sample of a random variable, then the kernel density approximation of its probability density function is

$$
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right) \tag{30}
$$

where K is some kernel and h is the bandwidth (smoothing parameter). Quite often K is taken to be a standard Gaussian function with mean zero and variance 1:

$$
K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2} \tag{31}
$$
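Equations 30 and 31 translate almost directly into code. The sketch below is illustrative only: the bandwidth h = 0.8 and the sample values are arbitrary assumptions, since the paper does not report the bandwidth it used.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density (Eq. 31)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """Kernel density estimate of Eq. 30 evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # (x - x_i) / h for every pair of evaluation point and sample.
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

samples = np.array([7.1, 8.3, 6.9, 9.0, 7.7])    # illustrative values only
grid = np.linspace(0.0, 16.0, 1601)
density = kde(grid, samples, h=0.8)              # h chosen arbitrarily
# The estimate should integrate to roughly 1 when the grid covers the mass.
print(density.sum() * (grid[1] - grid[0]))
```

A smaller h gives a spikier estimate and a larger h a smoother one, so the visual separation of two class densities depends on this choice.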

Statistical classification using Bayesian approach

Bayesian classification [29] is basically a probabilistic approach to any pattern classification problem with prior knowledge. Here the Bayesian classification algorithm is briefly described.

Let us consider a d-dimensional feature vector or pattern $X = [x_1, x_2, \ldots, x_d]^T$ that is to be classified into one of the $K$ classes. The Bayesian approach mainly deals with the computation of the posterior probability that a pattern $X$ belongs to class $w_k$, denoted by $P(w_k \mid X)$, using Bayes' rule:

$$
P(w_k \mid X) = \frac{p(X \mid w_k)\, P(w_k)}{P(X)} \tag{32}
$$

where $p(X \mid w_k)$ is the likelihood function for $w_k$, $1 \le k \le 2$, indicating the distribution of the feature vector over a particular class, viz. healthy tissue and OSFD. $P(w_k)$ denotes the a priori probability, i.e. the probability of the class before any features are measured. If the prior probabilities are not actually known, they are estimated by the relative occurrences of the classes. The divisor is a scaling factor that ensures the posterior probabilities are real probabilities, i.e., that their sum is 1.

$$
P(X) = \sum_{i=1}^{K} p(X \mid w_i)\, P(w_i) \tag{33}
$$

It can be shown that choosing the class with the highest posterior probability produces the minimum error probability. The key issue in the Bayesian classifier is the class-conditional probability density function $p(X \mid w_k)$. In practice it is always unknown, except in some artificial classification tasks. The distribution can be estimated from the training set with a range of methods.

If the patterns $X$ from each class can be approximated by a normal distribution, the class-conditional distribution $p(X \mid w_k)$ has the form

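$$
p(X \mid w_k) = \frac{1}{(2\pi)^{d/2} \lvert W_k \rvert^{1/2}} \exp\left( -\frac{1}{2} (X - \mu_k)^T W_k^{-1} (X - \mu_k) \right) \tag{34}
$$

where $\lvert W_k \rvert$, $d$ and $\mu_k$ denote the determinant of the covariance matrix, the dimension of the feature vector and the mean vector of class $w_k$, respectively. The covariance matrix $W_k$, estimated from the $M$ patterns of the class, is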

$$
W_k = \frac{1}{M} \sum_{i=1}^{M} (X_i - \mu_k)(X_i - \mu_k)^T \tag{35}
$$

where the mean vector μ can be calculated as

$$
\mu_k = \frac{1}{M} \sum_{i=1}^{M} X_i \tag{36}
$$

For a two-class classification problem, Bayes' decision can be made based on the following comparison:

$$
\text{If } p(X \mid w_1) \geq p(X \mid w_2) \text{ then } X \in w_1 \text{ else } X \in w_2 \tag{37}
$$

In this application $w_1$ and $w_2$ are the normal and OSFD classes, respectively.
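The two-class decision described above can be sketched as follows. This is not the authors' implementation. It assumes patterns are stored as rows of a NumPy array, uses the feature dimension d in the exponent of 2π (the standard multivariate normal form), works with log-likelihoods for numerical stability, and runs on synthetic two-feature data. All function names are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Mean vector (Eq. 36) and covariance matrix (Eq. 35) of one class.

    Rows of X are the M training patterns of the class.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centred = X - mu
    W = centred.T @ centred / X.shape[0]    # 1/M normalisation as in Eq. 35
    return mu, W

def log_likelihood(x, mu, W):
    """Logarithm of the normal class-conditional density (Eq. 34)."""
    d = mu.size
    diff = np.asarray(x, dtype=float) - mu
    _, logdet = np.linalg.slogdet(W)
    maha = diff @ np.linalg.solve(W, diff)  # (x - mu)^T W^{-1} (x - mu)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def classify(x, params):
    """Eq. 37: return 1 if p(x|w1) >= p(x|w2), otherwise 2."""
    l1 = log_likelihood(x, *params[0])
    l2 = log_likelihood(x, *params[1])
    return 1 if l1 >= l2 else 2

# Synthetic training data (illustrative only, not the paper's measurements).
rng = np.random.default_rng(0)
normal = rng.normal([7.8, 9.2], [1.0, 1.0], size=(50, 2))
osfd = rng.normal([13.8, 12.5], [1.0, 1.0], size=(50, 2))
params = [fit_gaussian(normal), fit_gaussian(osfd)]
print(classify([8.0, 9.0], params), classify([14.0, 12.0], params))
```

Comparing likelihoods alone, as in Eq. 37, is equivalent to comparing posteriors when the two prior probabilities are equal. With unequal priors, the logarithm of each prior would be added to the corresponding log-likelihood.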

Results

The basal cell boundaries are extracted (see Fig. 5) from the basal layer of the H&E stained image shown in Fig. 3(c). Fig. 6(a) and (c) show one of the cells extracted after fuzzy classification for normal mucosa and OSFD, respectively. Finally, the nuclei of these cells, shown in Fig. 6(b) and (d) respectively, are used to extract the desired features.

The features of the normal and OSFD groups are summarized as mean, standard deviation and p-values (Table 1) using the independent sample t-test. The results suggest that all features except eccentricity are significant at the 5% level of significance. Further, it is observed from Table 1 that most of the features from normal and OSFD occupy disjoint ranges with non-overlapping spreads. For eccentricity, however, the ranges of the two classes overlap, which is consistent with its higher p-value.

The nucleus in severe dysplasia usually appears darker than the normal nucleus (Fig. 7(b) and (d)). This can be inferred from the results: the mean intensity (gray) value is 28.30 for the normal case and 19.42 for OSFD.

Figure 8(a) shows the box plot for one of the features, the area of the nucleus. The median of the feature is almost the same as the mean, so outliers are unlikely to account for the large difference between the two classes. Figure 8(b) shows the bivariate plot for the perimeter of normal and OSFD cases, which shows a distinct discrimination between the two groups, and the 3D scatter plot in Fig. 8(c) shows that the features, i.e. Zernike moments, Fourier descriptors and area equivalent diameter, are quite separable. In addition, the probability density functions are estimated in order to show the likelihood of area for the normal and OSFD groups (Fig. 8(d)) from the discrimination point of view. Accordingly, it may be inferred that a simple linear classifier can achieve high accuracy. In this work we have used a Bayesian classifier to classify the normal and OSFD groups.

Considering the dataset used in this work, the ten-fold cross validation [30] technique is employed to test the classifier with the nine features. That is, the whole dataset is divided into ten parts such that each part contains approximately the same proportion of class samples as the original dataset. Nine parts of the data (training set) are used for classifier development and the built classifier is evaluated using the remaining part (test set). This procedure is repeated ten times using a different part for testing in each case. The ten test classification performance metrics are then averaged, and the average test performance is declared as the estimate of the true generalization performance. The performance metrics used in this work are sensitivity, specificity, positive predictive value and accuracy. The classification results are shown in Table 2 for morphometric and intensity features.
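The fold construction described above can be illustrated with a short sketch. It assumes class labels are held in a NumPy array and deals the members of each class round-robin into the folds so that class proportions stay similar. The function name, the seed and the class sizes are hypothetical, and the paper does not say how its folds were actually drawn.

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index 0..k-1 with similar class proportions per fold."""
    labels = np.asarray(labels)
    fold = np.empty(labels.size, dtype=int)
    rng = np.random.default_rng(seed)
    for c in np.unique(labels):
        # Shuffle the members of class c, then deal them round-robin.
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold[idx] = np.arange(idx.size) % k
    return fold

labels = np.array([0] * 60 + [1] * 40)    # hypothetical class labels
fold = stratified_folds(labels)
for f in range(10):
    test_idx = np.flatnonzero(fold == f)
    train_idx = np.flatnonzero(fold != f)
    # A classifier would be trained on train_idx and scored on test_idx here;
    # the ten scores are then averaged.
    print(f, train_idx.size, test_idx.size)
```

Each sample is used for testing exactly once, so the averaged score uses the whole dataset while never testing on a pattern that was seen during training in that round.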

We have evaluated the performance of the system using measures such as sensitivity, specificity and positive predictive accuracy. The proposed system yields a promising 99.04% diagnostic accuracy, with an average sensitivity of 100%, average specificity of 98.53% and average positive predictive value of 97.35% for morphometric and intensity features. Moreover, the sensitivity of the system is consistently 100% in all ten folds, possibly because the data are well separated.
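For reference, the four measures follow from the counts of a two-class confusion matrix. The sketch below assumes OSFD is treated as the positive class, which the paper does not state explicitly, and the counts are invented for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, positive predictive value and accuracy, in percent."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),   # positives correctly detected
        "specificity": 100.0 * tn / (tn + fp),   # negatives correctly rejected
        "ppv": 100.0 * tp / (tp + fp),           # predicted positives that are real
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical fold: all 40 positive nuclei detected, 2 of 60 negatives misclassified.
m = diagnostic_metrics(tp=40, fp=2, tn=58, fn=0)
print({name: round(value, 2) for name, value in m.items()})
```

A sensitivity of 100% with a specificity below 100% means that every error in a fold is a false positive, which is also why the positive predictive value falls below 100% in those folds.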

Discussion

At present, the gold standard for assessing the malignant potentiality of oral precancerous lesions and conditions is a qualitative method based primarily on light microscopic features of the dysplastic surface epithelium, especially the overall architecture of the epithelium. This qualitative assessment is purely based on the clinico-pathological acumen and expertise of the oral onco-pathologist concerned and in turn leads to inter-observer variations. To overcome this limitation, automatic detection of OSF with the help of classifiers has been studied over the years. The general procedure is to extract various features from the basal cell nuclei and to allow a classifier to make the diagnostic prediction based on these features.

Fig. 5 Segmented boundaries of basal cells superimposed on the extracted basal layer

Fig. 6 a Segmented normal basal cell; b Extracted normal basal cell nucleus; c Segmented dysplastic basal cell; d Extracted dysplastic basal cell nucleus

In this work, the results (Table 1) indicate that the nuclear areas of the dysplastic cells are nearly twice as large as those of the normal cells (mean area 13.78 versus 7.81). This increase in nuclear area may be biologically indicative of the malignant potentiality of the tumor cells, which in turn may also be correlated with their increased and abnormal metabolic activity [13]. Moreover, the stain taken up by the nucleus is higher in OSFD than in its normal counterpart (because the gray value of a darker region of the image is always lower than that of a brighter region). This is due to hyperchromatism, i.e., intense staining of the basal cell nuclei, which is an important characteristic of the malignant cell. These quantitative features are useful for the oral onco-pathologist to discriminate the normal and OSFD groups.

Moreover, very few studies have addressed the basal cell nuclei changes in a definite quantitative manner. Shabana et al. (1987) [16] proposed a method for morphometric analysis of the basal cell layer in oral premalignant white lesions and squamous cell carcinoma. Satheesh et al. (2007) [17] proposed a method for modeling the epithelial layer cells. They studied how the features vary from normal to abnormal, but they did not perform any automated classification.

Thus, based on the above facts, we felt the necessity for a better technique that can assist the oral onco-pathologist during OSF screening. In this work, we proposed a CAD system for the detection of normal and OSFD groups from histopathological images. We have used the morphometric and intensity features in classifiers and concluded that a simple Bayesian classifier can be used for automated classification. Our proposed system is able to identify the unknown class with an accuracy of 99.04%. The proposed system can help in faster, easier and more objective detection of normal and OSFD groups, which is very helpful to onco-pathologists in reaching the final decision on their patients. By using such an efficient tool, they can make very accurate decisions.

Conclusion

The results of this maiden study, which assesses the quantitative morphometric features of basal cell nuclei of dysplastic oral submucous fibrosis, especially those relating to nuclear atypia (size, shape and intensity of staining), indicate remarkable quantitative changes that can be correlated to the malignant potentiality of this precancerous condition.

At present, the gold standard qualitative method for assessing the malignant potentiality of oral precancerous lesions and conditions is primarily based on light microscopic features of the dysplastic surface epithelium, especially of the atypical basal cells. This qualitative assessment is purely based on the clinico-pathological acumen and expertise of the oral onco-pathologist concerned and in turn leads to inter-observer variations. The present quantitative method of assessment will help to overcome these limitations to a great extent and will be of enormous biological importance in predicting the malignant potentiality of oral submucous fibrosis more accurately. However, a more extensive study involving a larger number of disease subjects is awaited.

Table 1 Features extracted from nucleus of normal and OSF basal cells

| Nucleus features | Normal (μ±σ) | OSF with Dysplasia (μ±σ) | p-value |
|---|---|---|---|
| Area | 7.81±1.84 | 13.78±2.07 | 0.000* |
| Perimeter | 9.24±1.22 | 12.51±1.12 | 0.000* |
| Compactness | 11.12±0.87 | 11.43±1.01 | 0.000* |
| Eccentricity | 0.89±0.12 | 0.88±0.14 | 0.095 |
| Fourier descriptors | 9.77e+012±7.484e+012 | 8.191e+013±5.18e+013 | 0.000* |
| Area equivalent diameter | 3.13±0.39 | 4.18±0.31 | 0.000* |
| Perimeter equivalent diameter | 2.48±0.59 | 4.39±0.66 | 0.000* |
| Zernike moments** | 2.39±0.32 | 2.45±0.43 | 0.000* |
| Convex area | 8.10±1.91 | 14.34±2.18 | 0.000* |
| Mean nuclei intensity | 28.30±6.64 | 19.42±6.13 | 0.000* |

*p<0.05: statistical significance
** (m=1; n=3)

Fig. 7 a Segmented normal basal cell; b Less intense normal basal cell nucleus; c Segmented dysplastic basal cell; d High intense dysplastic basal cell nucleus

Fig. 8 a Box plot for area; b 3D plot for features of Normal and OSF with dysplasia cases; c Bivariate plot for perimeter of Normal and OSF with dysplasia; d Density functions of convex area of basal cell nuclei for Normal and OSF with dysplasia

Table 2 Sensitivity, specificity, positive predictive accuracy and classification accuracy obtained using the Bayesian classifier

| Ten-Fold | Sensitivity (%) | Specificity (%) | Positive predictive accuracy (%) | Classification accuracy (%) |
|---|---|---|---|---|
| Fold#1 | 100 | 98.86 | 97.91 | 99.25 |
| Fold#2 | 100 | 98.87 | 97.91 | 99.26 |
| Fold#3 | 100 | 98.87 | 97.91 | 99.26 |
| Fold#4 | 100 | 98.87 | 97.91 | 99.26 |
| Fold#5 | 100 | 97.75 | 95.91 | 98.52 |
| Fold#6 | 100 | 96.62 | 94.00 | 97.79 |
| Fold#7 | 100 | 98.86 | 97.91 | 99.25 |
| Fold#8 | 100 | 96.59 | 94.00 | 97.77 |
| Fold#9 | 100 | 100 | 100 | 100 |
| Fold#10 | 100 | 100 | 100 | 100 |
| Average | 100 | 98.53 | 97.35 | 99.04 |


Acknowledgement The authors are extremely grateful to Texas Instruments (I) Pvt. Ltd., Bangalore, INDIA for funding this work.

References

1. Gupta, P. C., Sinor, P. N., Bhonsle, R. B., Pawar, V. S., and Mehta, H. C., Oral submucous fibrosis in India: A new epidemic? Natl. Med. J. India II(3):113–116, 1998.

2. Wingo, P. A., Tong, T., and Bolden, S., Cancer statistics. CA Cancer J. Clin. 45:8–30, 1995.

3. Aziz, S. R., Oral submucous fibrosis: An unusual disease. J. N. J. Dent. Assoc. 68:17–19, 1997.

4. Cannif, J. P., Harvey, W., and Harris, M., Oral submucous fibrosis: Its pathogenesis and management. Br. Dent. J. 160:429–433, 1986.

5. Krishnan, M. M. R., Shah, P., Chakraborty, C., and Ray, A. K., Statistical analysis of textural features for improved classification of oral histopathological images. J. Med. Syst., doi:10.1007/s10916-010-9550-8 (Manuscript accepted, available online).

6. Krishnan, M. M. R., Pal, M., Bomminayuni, S. K., Chakraborty, C., Paul, R. R., Chatterjee, J., and Ray, A. K., Automated classification of cells in sub-epithelial connective tissue of oral sub-mucous fibrosis: An SVM based approach. Comput. Biol. Med. 39(12):1096–1104, 2009.

7. Krishnan, M. M. R., Shah, P., Pal, M., Chakraborty, C., Paul, R. R., Chatterjee, J., and Ray, A. K., Structural markers for normal oral mucosa and oral sub-mucous fibrosis. Micron. 41(4):312–320, 2010.

8. Pindborg, J. J., Oral cancer and precancer. John Wright and Sons Ltd, Bristol, UK, pp. 15–16, 1980.

9. Neville, B. W., Damm, D. D., Allen, C. M., and Bouquot, J. E., Oral and maxillofacial pathology, 3rd edition. Elsevier, India, pp. 401–402, 2009.

10. Gilles, F. H., Tavare, C. J., Becker, L. E., Burger, P. C., Yates, A. J., Pollack, I. F., and Finlay, J. L., Pathologist interobserver variability of histologic features in childhood brain tumors: Results from the CCG-945 study. Pediatr. Dev. Pathol. 11:08–117, 2008.

11. Grootscholten, C., Bajema, I. M., Florquin, S., Steenbergen, E. J., Peutz-Kootstra, C. J., Goldschmeding, R., Bijl, M., Hagen, E. C., Van Houwelingen, H. C., Derksen, R., and Berden, J. H. M., Interobserver agreement of scoring of histopathological characteristics and classification of lupus nephritis. Nephrol. Dial. Transplant. 23:223–230, 2008.

12. Shuttleworth, J., Todman, A., Norrish, M., and Bennett, M., Learning histopathological microscopy. Pattern Recognition and Image Analysis, Pt 2, Proceedings 3687:764–772, 2005.

13. Duncan, J. S., and Ayache, N., Medical image analysis: Progress over two decades and the challenges ahead. IEEE Trans Pattern Anal. Mach. Intell. 22:85–106, 2000.

14. Kramer, I. R. H., Lucas, R. B., El-Labban, N. G., and Lister, L., A computer aided study on the tissue changes in oral keratoses and lichen planus, and an analysis of case groupings by subjective and objective criteria. Br. J. Cancer 24:407–426, 1970.

15. Ten Cate, A. R., Oral histology: Development, structure and function, 5th Edn, 351-353, 2001.

16. Shabana, A. H., el-Labban, N. G., and Lee, K. W., Morphometric analysis of basal cell layer in oral premalignant white lesions and squamous cell carcinoma. J. Clin. Pathol. 40(4):454–458, 1987.

17. Satheesh, M., Paul, M., and Hammond, S. P., Modeling epithelial cell behavior and organization. IEEE Trans. Nanobioscience 6(1):77–85, 2007.

18. Gonzalez, R. C., and Woods, R. E., Digital image processing, 2nd edition. Prentice Hall, New York, pp. 655–659, 2002.

19. Chaira, T., and Ray, A. K., Segmentation using fuzzy divergence. Pattern Recogn Lett 24(12):1837–1844, 2003.

20. Chaira, T., and Ray, A. K., Fuzzy image processing and applications with MATLAB. CRC Press, New York, pp. 80–81, 2009.

21. Fan, J., and Xie, W., Distance measure and induced fuzzy entropy. Fuzzy Sets Syst 104:305–314, 1999.

22. Rust, B. W., Fitting nature's basic functions part I: Polynomials and linear least squares. Comput. Sci. Eng. 84-89, 2001.

23. Ruifrok, A. C., and Johnston, D. A., Quantification of histochemical staining by color deconvolution. Anal. Quant. Cytol. Histol. 291-299, 2001.

24. Chaudari, D., and Samal, A., A simple method for fitting of bounding rectangle to closed regions. Pattern Recogn. 40:1981–1989, 2007.

25. Khotanzad, A., and Hong, Y. H., Invariant Image Recognition by zernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 12(5):489–497, 1990.

26. Huang, P. W., and Lai, Y. H., Effective segmentation and classification for HCC biopsy images. Pattern Recogn 43(4):1550–1563, 2010.

27. Gun, A. M., Gupta, M. K., and Dasgupta, B., Fundamentals of statistics (Volume one), 5th Edn. The World press Pvt. Ltd, 2005.

28. Towers, S., Kernel probability density estimation methods. Proceedings of the Advanced Statistical Techniques in Particle Physics. 107-111, 2002.

29. Duda, R., Hart, P., and Stork, D., Pattern classification, 2nd edition. Wiley, India, 2007.

30. http://www.cs.cmu.edu/~schneide/tut5/node42.html last accessed August 2010.
