A Hybrid Approach for Automated Detection of Lung Nodules in CT Images


  • 7/30/2019 A Hybrid Approach for Automated Detection of Lung Nodules in Ct Images


    A HYBRID APPROACH FOR AUTOMATED DETECTION OF LUNG NODULES IN CT IMAGES

    J. DEHMESHKI, X. YE, M. V. CASIQUE, XY. LIN

    Medicsight PLC, 46 Berkeley Square, London, W1J 5AT, UK

    Abstract

    This paper presents a novel shape-based Genetic Algorithm Template Matching (GATM) method for the automated detection of lung nodules. The GA process is employed as an optimisation method to effectively search for the location of nodule candidates within the lung area. To define the fitness function for GATM, a 3D geometric shape feature is calculated at each voxel and then combined into the global nodule intensity distribution. Lung nodule phantom images are used as reference images for template matching. The proposed method has been validated on 70 clinical thoracic CT scans that contain 178 nodules as a gold standard. 151 nodules were detected by the proposed method, a detection rate of 85%, with the number of False Positives (FP) at approximately 14.0/scan. This high detection performance provides a good basis for a Computer-Aided Detection (CAD) system for lung nodules.

    1. INTRODUCTION

    Lung cancer is the most common cause of cancer death [1]. Early detection and treatment of lung cancer can significantly improve the long-term health of those afflicted with it. Nodules can be missed due to low relative contrast, small size, or location of the nodule within an area of complicated anatomy. Recently, researchers have developed a number of computer-aided lung nodule detection methods to aid radiologists in identifying nodule candidates from CT images. The approaches can be divided into two groups: intensity-based [2, 3] and model-based detection methods [4, 5, 6].

    Although much effort has been devoted to the Computer-Aided Detection (CAD) of lung nodules, lung CAD remains an ongoing research task and should be improved further [7]. One of the major difficulties is detecting nodules that are adjacent to anatomical structures such as blood vessels or the chest wall when they have very similar X-ray attenuation and appearance in individual cross-sectional CT images, or detecting nodules that have non-spherical shapes.

    To tackle this problem, a new hybrid approach has been developed which is based on shape-based Genetic Algorithm Template Matching (GATM). 3D local shape information is combined into the global nodule intensity distribution for the fitness calculation of the GA process. Furthermore, a new definition of the chromosome is proposed which includes directional information. Lung nodule phantom images are used as references for template matching instead of the synthetic Gaussian templates suggested in [4]. The experimental results in section 3 show that the proposed method is robust to the choice of templates and is also able to detect non-spherical nodules with local spherical elements. Details of the proposed method are described in the following sections.

    2. METHOD

    As is well known, a typical CAD system for lung nodule detection consists of three major phases. The first phase deals with the detection of all potential nodules (objects). Important features of each object are then extracted in the second phase. In the third phase, the extracted features are incorporated into a classifier to reduce FP objects (normal tissues). The overall performance of a CAD system depends on the performance of each individual phase. Typically, the most challenging aspect of a CAD system is the first phase, object detection. In this paper, we focus on the first phase, aiming to develop a method that detects most of the nodule candidates while introducing only a few FP objects into the next phases.

    Figure 1 provides a flow diagram outlining the key steps in the proposed approach. The lung area is first extracted using an adaptive thresholding method followed by a rolling ball algorithm [8]. Further processing for nodule candidate detection is carried out in the segmented lung region. Rules-based filtering is then used to remove easily dismissible FP such as vessel junctions. The main focus of this study, shape-based GATM, is shown in boldface in the figure.

    Figure 1 Flow diagram of proposed nodule detection system. Steps, in order: CT Lung Image; Lung Extraction; Shape-based GATM Process; Rules-based Filtering for FP Reduction; Nodule candidates.

    1.1. Shape based GA template matching

    The GATM process is used as an optimisation method to determine the target position of the nodule candidates within the lung area. Compared to linear searching, the advantage of the GA search is its stochastic optimisation characteristic, which simulates evolutionary processes such as natural selection and genetic modification.

    Three key issues in the proposed shape-based GATM are: (a) how to define the fitness function considering the shape information and the nodule intensity distribution; (b) how to design the chromosome; and (c) how to create the template images. In the following sections, new definitions are proposed to obtain high performance in lung nodule detection.

    0-7803-9577-8/06/$20.00 ©2006 IEEE, ISBI 2006

    1.1.1. Lung phantom images as references for template matching

    In [4], Gaussian templates are used as reference images for the template matching. This rests on the assumption that the CT value distribution of a nodule can be approximated by a Gaussian image model. However, this may not hold in some cases because of the partial volume effect in CT imaging. Also, in our experience, this method is very sensitive to the parameters used for the models. In our work, QRM lung nodule phantom images are used as reference images. The QRM Lung Phantom (QRM, Moehrendorf, Germany) [9] is a standard synthetic device created to mimic the human lung for CT scanning. It includes spherical objects similar to lung nodules, with known dimensions, in various positions. The plastics used in this semi-anthropomorphic phantom mimic the tissues in the lung with respect to density and attenuation characteristics.

    The QRM lung phantom was scanned at a 0 degree angle using a 16-slice MDCT (GE LightSpeed) scanner with a slice thickness of 1.25mm and a reconstruction pitch of 0.562. Eight nodule models with sizes ranging from 3mm to 20mm were created from the CT lung phantom image. Figure 2 shows the largest cross-section of each model. The advantage of using lung phantom nodule images as references is that they simulate real nodules better than the synthetic Gaussian models suggested in [4].

    Figure 2 Eight lung nodule phantom images (ranging from 3mm to 20mm) as reference images (largest cross-section)

    1.1.2. Definition of chromosome

    Each chromosome consists of 6 genes in the form of bit-strings. The first three genes represent the geometric location (x, y, z) of one nodule candidate. The fourth gene indicates the moving direction of the nodule candidate in the search space. The fifth gene represents the moving distance along the direction specified by the fourth gene. The new nodule candidate location obtained from the movement specified by these genes is used for fitness evaluation. The last gene chooses one of the template images as the reference for template matching.

    Figure 3 illustrates an example of a chromosome. A synthetic 2D image is given in Figure 3 (a), which has an object consisting of 8 pixels shown in black. The chromosome shown in Figure 3 (e) represents the pixel (3,3) highlighted by a circle in Figure 3 (a): the first two genes, 011 and 011, represent the x and y location (in this example, the z coordinate is always 000 since the synthetic image is only 2D). Assuming there are 6 possible search directions, 8 different moving distances and 8 templates, as depicted in Figure 3 (b), (c) and (d), respectively, the fourth gene is encoded as 000, which means the North direction is chosen, while the moving distance is 2 pixels as gene 5 is encoded as 001. Consequently, the pixel highlighted with a triangle is selected as a candidate for fitness evaluation. In this example, the third template is selected, as the code for gene 6 is 010.

    Incorporating direction information into the chromosome design improves the intensification property of the proposed GA method. In contrast to chromosomes whose genes represent static locations, a chromosome with the embedded directional feature converges to the optimal solution more efficiently.

    Figure 3 (a) Synthetic image; (b) coding of gene for direction; (c) coding of gene for moving distance; (d) coding of template; (e) an example of a chromosome.

    (a) Synthetic 2D image on a 5 x 5 grid (X and Y axes labelled 0 to 4).

    (b) Direction codes: North 000; East 001; South 010; West 011; Back (next slice) 100; Front (previous slice) 101.

    (c) Distance codes: 0 pixels 000; 2 pixels 001; 8 pixels 010; 12 pixels 011; 20 pixels 100; 26 pixels 101; 32 pixels 110; 40 pixels 111.

    (d) Template codes: T1 000; T2 001; T3 010; T4 011; T5 100; T6 101; T7 110; T8 111.

    (e) Example chromosome:

    Gene      1             2             3             4          5         6
    Meaning   x coordinate  y coordinate  z coordinate  Direction  Distance  Template
    Bits      011           011           000           000        001       010
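The chromosome layout can be illustrated with a short decoding sketch. This is not the authors' implementation: the function name `decode_chromosome`, the axis offsets attached to each direction (y decreasing towards North, z increasing towards the next slice) and the handling of the two unused direction codes (110 and 111) are assumptions made for illustration. The direction, distance and template codes are taken from Figure 3.

```python
from typing import NamedTuple, Tuple

# Direction codes from Figure 3 (b). The (dx, dy, dz) offsets are an ASSUMPTION:
# image convention with y increasing downwards and z increasing with slice index.
DIRECTIONS = {
    0b000: ("North", (0, -1, 0)),
    0b001: ("East", (1, 0, 0)),
    0b010: ("South", (0, 1, 0)),
    0b011: ("West", (-1, 0, 0)),
    0b100: ("Back", (0, 0, 1)),    # next slice
    0b101: ("Front", (0, 0, -1)),  # previous slice
}
# Distance codes from Figure 3 (c), in pixels, indexed by the 3-bit code.
DISTANCES = [0, 2, 8, 12, 20, 26, 32, 40]


class Decoded(NamedTuple):
    start: Tuple[int, int, int]      # location stored in genes 1 to 3
    direction: str                   # gene 4
    distance: int                    # gene 5, in pixels
    template: int                    # gene 6, 1-based (T1..T8)
    candidate: Tuple[int, int, int]  # location actually evaluated


def decode_chromosome(bits: str, coord_bits: int = 3) -> Decoded:
    """Decode a bit-string chromosome; spaces between genes are ignored."""
    bits = bits.replace(" ", "")
    if len(bits) != 3 * coord_bits + 9:
        raise ValueError("unexpected chromosome length")
    x, y, z = (int(bits[i * coord_bits:(i + 1) * coord_bits], 2) for i in range(3))
    tail = bits[3 * coord_bits:]
    dir_code, dist_code, tmpl_code = (int(tail[i:i + 3], 2) for i in (0, 3, 6))
    if dir_code not in DIRECTIONS:
        # The paper does not say how codes 110 and 111 are treated; this sketch rejects them.
        raise ValueError("unused direction code")
    name, (dx, dy, dz) = DIRECTIONS[dir_code]
    step = DISTANCES[dist_code]
    candidate = (x + dx * step, y + dy * step, z + dz * step)
    return Decoded((x, y, z), name, step, tmpl_code + 1, candidate)


example = decode_chromosome("011 011 000 000 001 010")
print(example)
```

For the example chromosome of Figure 3 (e), the sketch yields the start pixel (3, 3, 0), direction North, distance 2 and template T3. A real search would also need to clamp or reject candidate locations that fall outside the lung region, which is not shown here.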

    1.1.3. Definition of fitness function

    Because an isolated nodule or a nodule attached to a blood vessel is either depicted as a sphere or has some spherical elements, while a blood vessel appears oblong, a 3D geometric feature can be used to distinguish nodules from adjoining blood vessels.

    The volumetric shape index is a measure of local shape, which is based on the two principal curvatures defined in Equation 1.

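    \[ k_1(p) = H(p) + \sqrt{H^2(p) - K(p)}, \qquad k_2(p) = H(p) - \sqrt{H^2(p) - K(p)} \tag{1} \]

    where K(p) and H(p) are the Gaussian and mean curvatures [10].

    Based on these principal curvatures, the volumetric shape index SI(p) for each voxel p is defined as [11]:

    \[ SI(p) = \frac{1}{2} - \frac{1}{\pi} \arctan\left( \frac{k_1(p) + k_2(p)}{k_1(p) - k_2(p)} \right) \tag{2} \]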

    Every distinct shape, except for the plane, corresponds to a unique value of SI; for example, SI is 1 for the sphere-like shape and 0.75 for the cylinder-like shape. By definition, the shape index directly characterizes the topological shape of an iso-surface in the vicinity of each voxel without explicitly calculating the iso-surface.
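A minimal numerical sketch of the shape index follows, assuming the definition SI = 1/2 - (1/pi) arctan((k1 + k2) / (k1 - k2)) with k1 >= k2 obtained from the mean curvature H and Gaussian curvature K. The function names are hypothetical, and the sign convention (a bright sphere-like blob has two equal negative principal curvatures) is an assumption chosen so that the sketch reproduces the values quoted in the text, 1 for a sphere and 0.75 for a cylinder. How H and K are estimated from the CT volume is not shown.

```python
import numpy as np


def principal_curvatures(H, K):
    """Return (k1, k2) with k1 >= k2 from mean curvature H and Gaussian curvature K."""
    H = np.asarray(H, dtype=float)
    K = np.asarray(K, dtype=float)
    # Clip tiny negative values caused by rounding before taking the square root.
    root = np.sqrt(np.maximum(H * H - K, 0.0))
    return H + root, H - root


def shape_index(H, K):
    """Volumetric shape index in [0, 1], evaluated element-wise."""
    k1, k2 = principal_curvatures(H, K)
    # arctan2 handles k1 == k2 (sphere-like points) without dividing by zero.
    # For a plane (k1 == k2 == 0) the index is undefined; arctan2 returns 0 there,
    # so the sketch reports 0.5, which should not be interpreted as a saddle.
    return 0.5 - np.arctan2(k1 + k2, k1 - k2) / np.pi


# Sphere-like point: k1 = k2 = -1, so H = -1 and K = 1.
print(shape_index(-1.0, 1.0))
# Cylinder-like point: k1 = 0, k2 = -1, so H = -0.5 and K = 0.
print(shape_index(-0.5, 0.0))
```

Because the functions operate element-wise on arrays, the same call can be applied to whole curvature volumes to obtain a shape index map such as the one in Figure 4 (c).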

    The shape index encodes 3D local shape information at each voxel, which makes it an attractive feature for the fitness calculation of the GA process, since it helps to separate nodules from blood vessels. The fitness of the chromosome is then defined as the similarity measurement between the selected reference image and the extracted sub-image whose centre is determined by the chromosome and whose size is the same as that of the selected reference image:

    \[ f_{a,b} = \frac{\sum_{i=0}^{n-1} S_i \, (a_i - m_a)(b_i - m_b)}{\sqrt{\sum_{i=0}^{n-1} (a_i - m_a)^2 \, \sum_{i=0}^{n-1} (b_i - m_b)^2}}, \qquad m_a = \frac{1}{n} \sum_{i=0}^{n-1} a_i, \qquad m_b = \frac{1}{n} \sum_{i=0}^{n-1} b_i \tag{3} \]

    where n is the total number of pixels in the image; a_i and b_i are the intensity values of the i-th pixel in the sub-image and the selected reference image, respectively; S_i is the shape index value of the i-th pixel of the sub-image, which is obtained from Equation 2.

    It can be seen that without the shape index S_i, Equation 3 becomes the normal cross-correlation similarity measurement. As the shape index measures a 3D local feature, different shape index values are given for the sphere-like nodule and the cylinder-like blood vessel. By combining this feature into the similarity calculation, the fitness function depends not only on the global CT intensity distribution but also on the 3D local geometric feature. A high value of the shape index (such as S_i = 1 for the sphere-like shape) gives a higher weighting to the similarity measurement, while an elongated shape (such as S_i = 0.75 for the cylinder-like shape) has a lower weighting factor.

    Figure 4 (a) shows one sub-image of a nodule attached to a blood vessel. Both the nodule and the blood vessel have very similar image intensity. Figure 4 (c) and (d) show the shape index map and its highlighted shape index values; the shape index values in the nodule region are much higher than those in the blood vessel area. By assigning these shape index values to the fitness function, the nodule receives a higher weighting than the blood vessel. Figure 4 (b) shows the fitness values calculated with the cross-correlation and with the shape index weighted cross-correlation at 5 sampled points (A-E) indicated in (a). With the shape index as weighting, the difference in fitness between the blood vessel and the nodule is larger than with plain cross-correlation. As a result, the nodule can be distinguished more easily from the adjoining blood vessel by using the shape index weighted fitness function.
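The shape index weighted fitness can be sketched as below. The sketch reads Equation 3 as a normalised cross-correlation whose numerator terms are each multiplied by the shape index S_i; the square root in the denominator and the zero-variance guard are assumptions of this sketch, and the function name `si_weighted_fitness` is hypothetical.

```python
import numpy as np


def si_weighted_fitness(sub_image, reference, shape_index):
    """Shape index weighted cross-correlation between a sub-image and a template.

    All three inputs must have the same shape; they are flattened internally.
    """
    a = np.asarray(sub_image, dtype=float).ravel()
    b = np.asarray(reference, dtype=float).ravel()
    s = np.asarray(shape_index, dtype=float).ravel()
    if not (a.shape == b.shape == s.shape):
        raise ValueError("sub-image, reference and shape index must match in size")
    da = a - a.mean()  # a_i - m_a
    db = b - b.mean()  # b_i - m_b
    denom = np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))
    if denom == 0.0:
        # Flat sub-image or template: correlation is undefined, treated as no match here.
        return 0.0
    return float(np.sum(s * da * db) / denom)


template = np.arange(27.0).reshape(3, 3, 3)
# A perfect match with S_i = 1 everywhere behaves like plain cross-correlation.
print(si_weighted_fitness(template, template, np.ones_like(template)))
# The same match in a cylinder-like region (S_i = 0.75) is down-weighted.
print(si_weighted_fitness(template, template, np.full_like(template, 0.75)))
```

With a uniform weight the score is simply scaled, which mirrors the point made in the text: a sphere-like region keeps the full correlation score while an elongated region is penalised.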

    1.1.4. GA processing on CT lung nodule detection

    During the initialisation step, the genes related to the nodule candidate location (x, y, z) and the template image are randomly generated for each chromosome in the initial population. The genes responsible for directional information are preset to zero, which means no movement is required. The first evaluation of chromosome fitness is therefore based on the randomly generated initial location. In the proposed GA process, we use 6 possible search directions and 8 different moving distances, as shown in Figure 3 (b) and (c). The eight lung nodule phantom images shown in Figure 2 are used as reference images, resulting in a length of 3 bits for the sixth gene. The population size is set to 1% of the total number of voxels within the lung area. The maximum number of generations is set to 200.

    After initialisation, the GA evolving process proceeds through consecutive generations. For each generation, the fitness of the whole population is evaluated using the shape-based modified cross-correlation defined in Equation 3. GA operations are then applied, including selection, crossover and mutation. Roulette wheel selection is used to select the parents for the crossover operation. The 70% of the population with lower fitness is replaced by new individuals that are produced by the crossover and mutation operations. One-point crossover is employed in the proposed system, while the mutation rate is 5%. Finally, after the process reaches the maximum generation, the chromosomes whose fitness values are greater than a pre-defined value are considered as nodule candidates. The pre-defined value is decided experimentally as the one that provides the best overall performance.

    The input images are partitioned within the minimum rectangle encompassing the lung area, and the GATM process introduced above is performed on each partition individually. The final result is the union of the nodule candidates from each partition. In our experiments, partitioning the lung area and applying the GA to each partition individually improved both the sensitivity and the specificity of the detection system compared with applying the GA to the whole lung area as the search space.
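The generation loop described above can be sketched with a generic bit-string GA. This is an illustration under stated assumptions rather than the authors' code: the 5% mutation rate is applied per bit, negative fitness values are clipped to zero for the roulette wheel, all bits are initialised randomly (the paper presets the direction genes to zero), and the fitness function is passed in as a parameter so that a toy function can stand in for Equation 3. All function names are hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # list of 0/1 bits


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]


def one_point_crossover(p1, p2, rng):
    """Child takes the head of p1 and the tail of p2, split at one random cut point."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate` (per-bit rate is an assumption)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]


def evolve(fitness: Callable[[Chromosome], float], n_bits: int = 18, pop_size: int = 50,
           generations: int = 200, replace_fraction: float = 0.7,
           mutation_rate: float = 0.05, seed: int = 0) -> List[Chromosome]:
    """Run the GA and return the final population sorted by decreasing fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    n_replace = round(replace_fraction * pop_size)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        weights = [max(fitness(c), 0.0) for c in ranked]  # clipping is an assumption
        survivors = ranked[:pop_size - n_replace]         # fitter part is kept unchanged
        children = []
        for _ in range(n_replace):                         # lower-fitness part is replaced
            p1 = roulette_select(ranked, weights, rng)
            p2 = roulette_select(ranked, weights, rng)
            children.append(mutate(one_point_crossover(p1, p2, rng), mutation_rate, rng))
        population = survivors + children
    return sorted(population, key=fitness, reverse=True)


# Toy stand-in for Equation 3: fraction of bits set to 1.
toy_fitness = lambda chrom: sum(chrom) / len(chrom)
final = evolve(toy_fitness, generations=30)
print(toy_fitness(final[0]))
```

After the last generation, candidates would be selected with a comprehension such as `[c for c in final if fitness(c) > threshold]`, where `threshold` is the experimentally chosen pre-defined value. In the paper's setting the population size would be 1% of the lung voxels in each partition rather than the fixed 50 used here.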

    Figure 4 (a) Sub-image of a nodule adjoining a blood vessel with similar intensity; (b) fitness curves obtained using the cross-correlation and the shape index (SI) weighted cross-correlation on 5 sampled points; (c) shape index map; (d) shape index values for nodule and blood vessel.

    Panel (b), titled "SI Effect", plots fitness (axis from 0 to 0.7) against point label (A to E) for two curves, "Without SI Weighting" and "With SI Weighting", with the vessel and nodule points marked.

    Panel (d) shape index values:

    0.74 0.75 0.76 0.77 0.85 0.66 0.55 0.51 0.52 0.54
    0.74 0.76 0.76 0.77 0.77 0.76 0.67 0.59 0.60 0.64
    0.73 0.74 0.75 0.76 0.74 0.70 0.68 0.67 0.67 0.67
    0.69 0.72 0.73 0.72 0.60 0.68 0.75 0.76 0.80 0.83
    0.36 0.64 0.69 0.69 0.67 0.88 0.87 0.82 0.85 0.85
    0.18 0.25 0.63 0.69 0.72 0.83 0.95 0.90 0.88 0.91
    0.22 0.23 0.34 0.68 0.75 0.82 0.88 0.93 0.92 0.95
    0.36 0.32 0.36 0.50 0.81 0.85 0.89 0.92 0.96 0.96

    1.2. Rules based filtering to eliminate FP

    After the shape-based GA template matching process, the chromosomes with fitness values higher than a pre-defined threshold are kept as potential nodule candidates. Simple rules-based filtering is used to remove easily dismissible FP.

    For each 3D nodule candidate point, a spherical mask is constructed based on the template image selected by the corresponding chromosome (gene 6). The average Hounsfield Units (HU) value is calculated as the threshold for segmentation within the spherical mask. Six shape features of the segmented objects are then calculated, namely effective diameter, elongation, sphericity, compactness, maximum HU and minimum HU. Rules-based filtering is applied based on these calculated features. The parameters are determined experimentally. As this study focuses on detecting most of the nodules while introducing only a few FP objects into the next phases for feature calculation and classification, more advanced filtering methods are being investigated.
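The first steps of this filtering stage (spherical mask, mean-HU threshold, simple features) can be sketched as follows. Several points are assumptions of the sketch and are not stated in the paper: the mask radius is passed in directly instead of being derived from the selected template, voxels are treated as isotropic, and the effective diameter is taken as the diameter of a sphere with the same volume as the segmented object. Elongation, sphericity and compactness are omitted because the paper does not give their definitions, and no rule thresholds are shown because the paper only says they were set experimentally. Function names are hypothetical.

```python
import numpy as np


def spherical_mask(shape, centre, radius):
    """Boolean mask of voxels within `radius` (in voxels) of `centre` = (z, y, x)."""
    zz, yy, xx = np.ogrid[:shape[0], :shape[1], :shape[2]]
    cz, cy, cx = centre
    return (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def candidate_features(volume_hu, centre, radius, voxel_volume_mm3=1.0):
    """Segment a candidate inside a spherical mask and compute simple features."""
    volume_hu = np.asarray(volume_hu, dtype=float)
    mask = spherical_mask(volume_hu.shape, centre, radius)
    threshold = volume_hu[mask].mean()            # average HU inside the mask
    segmented = mask & (volume_hu >= threshold)   # voxels at or above the average
    values = volume_hu[segmented]
    volume_mm3 = segmented.sum() * voxel_volume_mm3
    # ASSUMPTION: effective diameter = diameter of the equal-volume sphere.
    effective_diameter = (6.0 * volume_mm3 / np.pi) ** (1.0 / 3.0)
    return {
        "threshold_hu": float(threshold),
        "n_voxels": int(segmented.sum()),
        "effective_diameter_mm": float(effective_diameter),
        "max_hu": float(values.max()),
        "min_hu": float(values.min()),
    }


# Synthetic example: a 40 HU blob of radius 3 in a -800 HU background.
volume = np.full((21, 21, 21), -800.0)
centre = (10, 10, 10)
volume[spherical_mask(volume.shape, centre, 3)] = 40.0
print(candidate_features(volume, centre, radius=6))
```

A rule set would then compare these features against experimentally chosen limits, for example rejecting objects whose effective diameter falls outside the 3mm to 20mm range covered by the templates; such limits are illustrative only.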

    3. EXPERIMENTAL RESULTS

    The proposed shape-based GATM approach was applied to a database of 70 thoracic CT scans from 3 different hospitals. Each scan was read by three thoracic radiologists to produce a gold standard of 178 nodules. Slice thickness varied from 0.5mm to 1.25mm and the total slice number for each scan varied from 79 to 433, with an average of 240 per scan. The dose setting (tube current) ranged from 60mA to 325mA. Table 1 shows the experimental results based on different fitness functions for GATM. Using the proposed method, 151 of the 178 (85%) nodules were detected. The average number of FP is 14.0/scan (0.06/slice). The performance of nodule detection is significantly improved by combining the local shape feature calculation into the global cross-correlation framework for GATM. Compared with the results obtained using only the cross-correlation coefficient as the similarity measurement, the sensitivity is improved by 13.3% in relative terms (from 75% to 85%), while FP is reduced by 51%. Figure 5 shows examples of the detected nodules.

    Figure 6 shows examples of nodules detected by using the shape index as a weighting factor for the fitness function. These nodules, which have non-spherical shapes or are attached to vessels with similar intensity, were missed when the fitness function was defined by the cross-correlation coefficient only. Figure 7 shows normal vessels eliminated from the nodule candidates by the proposed shape-based GATM, which were wrongly identified as nodules by the cross-correlation GATM method.

    As mentioned before, the shape index characterizes the local geometric feature and favors regions with high spherical elements, so a higher fitness value is obtained when the local spherical elements match one of the templates. This is the main reason that the proposed algorithm is able to detect nodules that are non-spherical but have high spherical local elements. Nodules can still be missed if there are no spherical local elements or if the size of the elements does not match any of the templates. Examples of missed nodules are shown in Figure 8. Most of these missed nodules either have irregular shapes close to the chest wall or are Ground Glass Opacity (GGO) nodules with very low contrast.

    4. CONCLUSIONS

    By combining the shape index as a weighting factor with the nodule intensity distribution in the fitness function for GATM, the proposed algorithm significantly improved the detection sensitivity and the FP reduction performance compared to GATM using cross-correlation as the similarity measurement. The experimental results indicate a nodule detection rate of 85%, with approximately 14.0 FP/scan. The new definition of the GA chromosome and the use of lung nodule phantom templates make the GA process converge to the optimal solutions more efficiently. Some challenging nodules, such as non-spherical nodules or nodules attached to pulmonary vessels with similar intensity, can be identified with a lower rate of FP. Most of the normal tissues such as blood vessels, sternum type, apical scarring, etc. can be eliminated from the nodule candidates.

    5. REFERENCES

    [1] R. Greenlee, T. Murray, S. Bolden and P. Wingo, Cancer statistics 2000, CA: A Cancer Journal for Clinicians, vol.50, pp.7-33, 2000.

    [2] S. G. Armato, M. L. Giger and H. MacMahon, Automated detection of lung nodules in CT scans: Preliminary results, Medical Physics, vol.28, pp.1552-1561, 2001.

    [3] B. Zhao, G. Gamsu, M. S. Ginsberg, L. Jiang, et al. Automatic detection of small lung nodules on CT utilizing a local density maximum algorithm, Journal of Applied Clinical Medical Physics, vol.4, no.3, 2003.

    [4] Y. Lee, T. Hara, H. Fujita, et al. Automated detection of pulmonary nodules in helical CT images based on an improved template-matching technique, IEEE Transactions on Medical Imaging, vol.20, pp.595-604, 2001.

    [5] Z.Y. Ge, B. Sahiner, H. P. Chan, et al. Computer-aided detection of lung nodules: False positive reduction using a 3D gradient field method and 3D ellipsoid fitting, Medical Physics, vol.32, pp.2443, 2005.

    [6] M. S. Brown, M. F. McNitt-Gray, J. G. Goldin, et al. Patient-Specific Models for Lung Nodule Detection and Surveillance in CT Images, IEEE Trans. Medical Imaging, vol.20, no.12, pp.1242-1250, 2001.

    [7] J. M. Goo, Computer-Aided Detection of Lung Nodule on Chest CT: Issues to be Solved before Clinical Use, Journal of Radiology, vol.6, pp.62-63, 2005.

    [8] R. Gonzalez, Digital Image Processing, Prentice Hall, 2003.

    [9] http://www.qrm.de

    [10] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, Cambridge, MA: MIT Press, 1993.

    [11] J. Dehmeshki, X. Ye, J. Costello, Shape based region growing using derivatives of 3D medical images: application to semi-automated detection of nodules, ICIP, pp.1085-1088, 2003.

    Method                                                       Nodules missed   Detection rate   FP per scan
    GATM based on fitness function with cross-correlation only   44               75%              29.0
    Proposed shape-based GATM                                    27               85%              14.0

    Table 1 Experiment results based on different fitness functions on a database of 70 CT scans with 178 nodules and an average slice number of 240 per scan
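The figures quoted in the results can be re-derived from Table 1 with a few lines of arithmetic. The sketch below uses only the numbers reported in the paper; the helper name `detection_summary` is hypothetical. The relative sensitivity gain is computed from the rounded rates (75% and 85%), which appears to be how the 13.3% figure was obtained.

```python
def detection_summary(total_nodules, missed, fp_per_scan, slices_per_scan):
    """Derive detected count, sensitivity and FP per slice from Table 1 style inputs."""
    detected = total_nodules - missed
    return {
        "detected": detected,
        "sensitivity": detected / total_nodules,
        "fp_per_slice": fp_per_scan / slices_per_scan,
    }


baseline = detection_summary(178, 44, 29.0, 240)  # cross-correlation only
proposed = detection_summary(178, 27, 14.0, 240)  # shape-based GATM

# Relative gain from the rounded detection rates reported in the paper.
relative_sensitivity_gain = (0.85 - 0.75) / 0.75
# Relative reduction in false positives per scan.
fp_reduction = (29.0 - 14.0) / 29.0

print(baseline["detected"], round(baseline["sensitivity"], 3))
print(proposed["detected"], round(proposed["sensitivity"], 3), round(proposed["fp_per_slice"], 3))
print(round(relative_sensitivity_gain, 3), round(fp_reduction, 3))
```

The detected counts come out as 134 and 151 nodules, matching the 75% and 85% rates after rounding, and 14.0 FP per scan over an average of 240 slices corresponds to about 0.06 FP per slice. The FP reduction evaluates to roughly 51.7%, which the paper reports as 51%.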

    Figure 5 Examples of nodules detected by the proposed method

    Figure 6 Non-spherical nodules with spherical elements detected by using shape index in fitness function for GATM

    Figure 7 Normal tissues (FP) eliminated from nodule candidates by using shape index in fitness function for GATM

    Figure 8 Nodules still missed by the proposed method
