13
Atrial Scar Quantification via Multi-scale CNN in the Graph-cuts Framework Lei Li a,b , Fuping Wu c , Guang Yang d,e , Lingchao Xu f , Tom Wong e , Raad Mohiaddin d,e , David Firmin d,e , Jennifer Keegan d,e , Xiahai Zhuang* b a School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China b School of Data Science, Fudan University, Shanghai, China c Dept of Statistics, School of Management, Fudan University, Shanghai, China d National Heart and Lung Institute, Imperial College London, London, UK e Cardiovascular Research Center, Royal Brompton Hospital, London, UK f School of NAOCE, Shanghai Jiao Tong University, Shanghai, China Abstract Late gadolinium enhancement magnetic resonance imaging (LGE MRI) appears to be a promising alternative for scar assessment in patients with atrial fibrillation (AF). Automating the quantification and analysis of atrial scars can be challenging due to the low image quality. In this work, we propose a fully automated method based on the graph-cuts framework, where the potentials of the graph are learned on a surface mesh of the left atrium (LA) using a multi-scale convolutional neural network (MS-CNN). For validation, we have employed fifty-eight images with manual delineations. MS-CNN, which can efficiently incorporate both the local and global texture information of the images, has been shown to evidently improve the segmentation accuracy of the proposed graph-cuts based method. The segmentation could be further improved when the contribution between the t-link and n-link weights of the graph is balanced. The proposed method achieves a mean accuracy of 0.856 ± 0.033 and mean Dice score of 0.702 ± 0.071 for LA scar quantification. Compared with the conventional methods, which are based on the manual delineation of LA for initialization, our method is fully automatic and has demonstrated significantly better Dice score and accuracy (p< 0.01). The method is promising and can be useful in diagnosis and prognosis of AF. Keywords: Atrial fibrillation, Left atrium, LGE MRI, Scar segmentation, Graph-cuts, Multi-scale patch 1. Introduction Atrial fibrillation (AF) is the most common arrhyth- mia observed in clinical practice, occurring in up to 1% of the population and rising fast with advancing age (Chugh et al., 2013). Radiofrequency catheter ablation using the pulmonary vein (PV) isolation technique has emerged as one of the most common methods for the treatment of AF patients (Wilber et al., 2010; Calkins et al., 2012). Quan- tification of atrial scars is potentially beneficial in selecting candidates and guiding ablation treatment. Late gadolin- ium enhancement magnetic resonance imaging (LGE MRI) is a promising technique to visualize and quantify the atrial scars (Vergara and Marrouche, 2011). Many clinical stud- ies mainly focus on the location and extent of scarring areas of the left atrium (LA) myocardium (McGann et al., 2008; Vergara et al., 2011; Badger et al., 2010). Automatic delineation of scars from LGE MRI is still challenging due to various reasons. First, the image quality of LGE MRI is generally poor. Second, the prior model of scars is hard to construct on account of the various LA shapes, the thin wall (mean thickness of 1.89 ± 0.48 mm reported by Beinart et al. (2011)), the surrounding URL: [email protected] (Xiahai Zhuang*) enhanced regions and the complex patterns of scars in AF patients. Fig. 1 illustrates and explains the challenges in more details. To the best of our knowledge, little work has been reported in the literature to achieve the fully automatic quantification of LA scars from LGE MRI. The most widespread methods for atrial scar segmenta- tion are mainly based on thresholding (Badger et al., 2010; Karim et al., 2013). Pontecorboli et al. (2016) provided an overall review of scar segmentation using various threshold techniques. For these methods, an appropriate threshold value is decisive, but setting this value can be subjective, eventually limiting the applicability and reproducibility. Perry et al. (2012) proposed to use k-means clustering to classify the normal and fibrosis tissue from manually seg- mented LA walls. Karim et al. (2014) combined the scar intensity priors and Gaussian mixture model (GMM) to construct a cost function for scar segmentation, which was achieved by an optimization using the graph-cuts frame- work. Yang et al. (2018) employed the super-pixel method and support vector machine (SVM) to segment the atrial scars. Most of the reported methods rely on manual segmen- tation of the LA or LA walls to provide an accurate ini- tialization. In ISBI 2012 challenge (Karim et al., 2013), manual segmentation of LA was provided. There was large Preprint submitted to Medical Image Analysis February 22, 2019 arXiv:1902.07877v1 [cs.CV] 21 Feb 2019

arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

Atrial Scar Quantification via Multi-scale CNN in the Graph-cuts Framework

Lei Lia,b, Fuping Wuc, Guang Yangd,e, Lingchao Xuf, Tom Wonge, Raad Mohiaddind,e, David Firmind,e, JenniferKeegand,e, Xiahai Zhuang*b

aSchool of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, ChinabSchool of Data Science, Fudan University, Shanghai, China

cDept of Statistics, School of Management, Fudan University, Shanghai, ChinadNational Heart and Lung Institute, Imperial College London, London, UKeCardiovascular Research Center, Royal Brompton Hospital, London, UK

fSchool of NAOCE, Shanghai Jiao Tong University, Shanghai, China

Abstract

Late gadolinium enhancement magnetic resonance imaging (LGE MRI) appears to be a promising alternative for scarassessment in patients with atrial fibrillation (AF). Automating the quantification and analysis of atrial scars can bechallenging due to the low image quality. In this work, we propose a fully automated method based on the graph-cutsframework, where the potentials of the graph are learned on a surface mesh of the left atrium (LA) using a multi-scaleconvolutional neural network (MS-CNN). For validation, we have employed fifty-eight images with manual delineations.MS-CNN, which can efficiently incorporate both the local and global texture information of the images, has been shownto evidently improve the segmentation accuracy of the proposed graph-cuts based method. The segmentation could befurther improved when the contribution between the t-link and n-link weights of the graph is balanced. The proposedmethod achieves a mean accuracy of 0.856 ± 0.033 and mean Dice score of 0.702 ± 0.071 for LA scar quantification.Compared with the conventional methods, which are based on the manual delineation of LA for initialization, ourmethod is fully automatic and has demonstrated significantly better Dice score and accuracy (p < 0.01). The method ispromising and can be useful in diagnosis and prognosis of AF.

Keywords: Atrial fibrillation, Left atrium, LGE MRI, Scar segmentation, Graph-cuts, Multi-scale patch

1. Introduction

Atrial fibrillation (AF) is the most common arrhyth-mia observed in clinical practice, occurring in up to 1% ofthe population and rising fast with advancing age (Chughet al., 2013). Radiofrequency catheter ablation using thepulmonary vein (PV) isolation technique has emerged asone of the most common methods for the treatment of AFpatients (Wilber et al., 2010; Calkins et al., 2012). Quan-tification of atrial scars is potentially beneficial in selectingcandidates and guiding ablation treatment. Late gadolin-ium enhancement magnetic resonance imaging (LGE MRI)is a promising technique to visualize and quantify the atrialscars (Vergara and Marrouche, 2011). Many clinical stud-ies mainly focus on the location and extent of scarringareas of the left atrium (LA) myocardium (McGann et al.,2008; Vergara et al., 2011; Badger et al., 2010).

Automatic delineation of scars from LGE MRI is stillchallenging due to various reasons. First, the image qualityof LGE MRI is generally poor. Second, the prior modelof scars is hard to construct on account of the variousLA shapes, the thin wall (mean thickness of 1.89 ± 0.48mm reported by Beinart et al. (2011)), the surrounding

URL: [email protected] (Xiahai Zhuang*)

enhanced regions and the complex patterns of scars in AFpatients. Fig. 1 illustrates and explains the challenges inmore details. To the best of our knowledge, little workhas been reported in the literature to achieve the fullyautomatic quantification of LA scars from LGE MRI.

The most widespread methods for atrial scar segmenta-tion are mainly based on thresholding (Badger et al., 2010;Karim et al., 2013). Pontecorboli et al. (2016) provided anoverall review of scar segmentation using various thresholdtechniques. For these methods, an appropriate thresholdvalue is decisive, but setting this value can be subjective,eventually limiting the applicability and reproducibility.Perry et al. (2012) proposed to use k-means clustering toclassify the normal and fibrosis tissue from manually seg-mented LA walls. Karim et al. (2014) combined the scarintensity priors and Gaussian mixture model (GMM) toconstruct a cost function for scar segmentation, which wasachieved by an optimization using the graph-cuts frame-work. Yang et al. (2018) employed the super-pixel methodand support vector machine (SVM) to segment the atrialscars.

Most of the reported methods rely on manual segmen-tation of the LA or LA walls to provide an accurate ini-tialization. In ISBI 2012 challenge (Karim et al., 2013),manual segmentation of LA was provided. There was large

Preprint submitted to Medical Image Analysis February 22, 2019

arX

iv:1

902.

0787

7v1

[cs

.CV

] 2

1 Fe

b 20

19

Page 2: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

LALA

(a)

LA LA

(b)

LALA

RA

RA

(1)

(2)(3)

(c)

Figure 1: The challenges of automatic delineation of scars from LGE MRI: (a) two typical LGE MRIs with poor quality; (b) thin atrial wallshighlighted using bright white color in the figure; (c) surrounding enhanced regions pointed out by the arrows, where (1) and (2) respectivelyindicate the enhanced walls of descending and ascending aorta, (3) denotes the enhanced walls of right atrium.

variance in terms of segmentation accuracy, especially forthe pre-ablation cases, and the teams using manually de-lineated LA walls generally obtained much better perfor-mance than those using fully automatic approaches in thechallenge. Their benchmark study emphasizes the impor-tance of an accurate initialization.

For LA segmentation, Ravanelli et al. (2014) proposeda method using threshold for initialization, followed by the3D fast marching for segmentation. They required man-ual correction from the clinicians to achieve reliable per-formance. Tao et al. (2016) combined LGE MRI with an-other MRI sequence with better anatomical information tosegment the LA, and the combined segmentation achievedbetter results than the method solely using LGE MRI.Xiong et al. (2018) proposed a dual fully convolutionalneural network for LA segmentation from LGE MRI withpromising results. Later, they organized a LA segmenta-tion challenge in MICCAI 2018 (Zhao and Xiong, 2018).For LA wall segmentation, Veni et al. (2017) proposed analgorithm named ShapeCut, combining a shape-based sys-tem and the graph-cuts approach to make a Bayesian dualsurface estimation. Ji et al. (2018) applied the advancedtwo-layer level set with a soft distance constraint for dualsurface segmentation of LA and LV walls. Their methodwas 2D-based and required an manual initialization of theendocardial boundaries.

In summary, in previous works scar quantification relieson an accurate segmentation of the LA or LA walls forinitialization, but automating this segmentation is still anopen question. In this work, we propose a fully automaticmethod for LA scar quantification and analysis, withoutthe requirement of an accurate LA segmentation.

Firstly, we propose to perform scar quantification ona surface, onto which the LA endocardium is projected.We neglect the thickness of LA walls, because the clini-cal studies are generally performed by projecting the scarsonto the LA endocardial surface for visualization (Peterset al., 2007; Knowles et al., 2010; Ravanelli et al., 2014).In this framework, we represent the surface using a graph,and formulate the classification as an energy minimizationproblem which can be solved by graph-cuts. We furtherpropose to explicitly learn the edge weights of the graph,

i.e., n-link and t-link potentials (Boykov and Jolly, 2001).This is achieved by a convolutional neural network (CNN),which learns features from the images (Krizhevsky et al.,2012). Here, we do not directly compute these weightssolely based on the intensity similarity, as the conventionalgraph-cuts methods do. This is because the enhancementpatterns in LGE MRI are complex and can vary greatlyacross different patients, leading to inconsistent intensitypatterns. Also, currently the automatic methods couldhave a few millimeters under or over segmentation, leadingto the estimated endocardial surface being misaligned tothe ground truth. The proposed CNN scheme can exploitboth image features and spatial context by means of neigh-borhood information, to provide more accurate estimationof the graph weights. We finally obtain the classificationbased on the graph-cuts framework, which mitigates theeffect of misalignments of the endocardial surface due toautomatic LA segmentation errors.

Furthermore, we propose to employ the multi-scale patch(MSP) strategy (Zhuang and Shen, 2016), and combine itwith the CNN for graph potential learning. This is be-cause distinguishing scars can be challenging solely basedon local texture information, particularly in the area wherethe LA wall is surrounded by other enhanced regions, suchas the fibrosis of mitral valve, aortic wall and right atrial(RA) wall. The MSP method is developed from the imagescale space theory, which can handle information at dif-ferent levels within a limited window and has been widelyapplied to the tasks of feature extraction, detection andimage matching (Lorensen and Cline, 1987). The MSPstrategy can incorporate both the local fine texture fea-tures and the global structural information into the CNNarchitecture. We refer to such MSP-based CNN as multi-scale CNN, i.e. MS-CNN. In addition, the MSPs are ex-tracted with random offsets along the perpendicular di-rection of the LA endocardial surface, simulating the mis-alignments between the automatically segmented LA sur-face and the ground truth. Therefore, such patches notonly can model the multi-scale texture patterns of the im-ages, but also can further improve the robustness of theproposed method against the LA segmentation errors.

The remainder of the paper is organized as follows: the

2

Page 3: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

LGE

MRI

Ana-

MRI

Graph-cuts

MAS

LA&PV

Atlas pool

GraphProjection &

Generate Mesh

Extract

PatchMS-CNN

Predict edge

weights Result

Figure 2: Flowchart of the proposed framework for LA scar quantification and analysis.

detailed framework of the proposed algorithm is presentedin Section 2. Section 3 presents the experiments and re-sults. Discussion and conclusion are given in Section 4.

2. Method

Fig. 2 provides an overview of the proposed framework.First, we use a well-developed multi-atlas whole heart seg-mentation (MA-WHS) to obtain an initial segmentationof the LA (see Section 2.1). Then, we project the LAendocardium to generate a surface mesh, where the quan-tification is performed (see Section 2.2). The labeling ofscars is achieved by optimizing a cost function based onthe graph-cuts framework (see Section 2.3), whose poten-tials for edge weights are explicitly learned by the proposedMS-CNN (see Section 2.4). Note that the graph-cuts basedclassification is performed on the surface mesh. This canavoid the challenging segmentation of thin LA wall andalso greatly reduce the computational cost. At the sametime, both the texture and anatomical features of the LAmyocardium can be adequately extracted by employingthe MSP strategy. Thus, the features of the nodes in thegraph are represented by a set of MSPs, and the potentialsare learned and predicted by the MS-CNN.

2.1. Initialization of atrial endocardium and pulmonaryveins

We use MA-WHS, which is based on multi-atlas seg-mentation (MAS), to obtain the geometrical informationof the LA. This is because the LGE MRI covers the wholeheart, and MA-WHS has been well developed and ap-plied in recent years (Zhuang and Shen, 2016; Yang et al.,2018). MAS algorithm segments an unknown target im-age by propagating and fusing the labels from multipleannotated atlases using registration. As the LGE MRIimages could have relatively poor image quality, we firstapply MA-WHS on the anatomical MRI (Ana-MRI), andthen propagate the segmentation using affine registrationfrom the Ana-MRI to LGE MRI. The Ana-MRI image isnormally acquired in the same MRI examination as LGEMRI, using the b-SSFP sequence, which generates higherquality images for atlas-based segmentation.

Having finished the WHS for LA and PV delineation,the marching cubes algorithm (Lorensen and Cline, 1987)is then used to obtain a surface mesh of the LA endo-cardium which excludes the mitral valve. Note that the LAsegmentation is generally reliable, but still contains errorsleading to misalignments between the extracted surfacemesh and the ground truth. For example, the mean Dicescore of our MA-WHS for LA is 0.898± 0.044 (please c.f.Section 3.3 for details). However, the effect of inaccurateLA segmentation can be minimized thanks to the projec-tion strategy and the MS-CNN learning coupled with therandomly shifted MSP sampling strategy. The reader isreferred to Fig. 3 for illustration and following methodol-ogy sections for details.

2.2. Projection of the atrial endocardium

We project the LA endocardium onto a surface mesh,and then the atrial scars can be classified on a graphicalsurface. This is because the clinical demands for scar quan-tification in AF patients mainly concern the location andextent of scarring areas (Ravanelli et al., 2014). Williamset al. (2017) proposed a method to simultaneously repre-sent multiple parameters on a surface model based on thetemplate of an average LA mesh. By projection, both theerrors due to LA wall thickness and misregistration of theWHS can be mitigated. At the same time, the computa-tional complexity of the algorithm can be reduced dramat-ically.

The endocardial surface is generated from the volu-metric binary segmentation result of the LA cavity usingthe marching cubes algorithm (Lorensen and Cline, 1987).The resolution of the surface mesh is denser than the res-olution of the image, which protects the small scars. Theprojection from the LA geometry to the surface mesh canpreserve the geodesic distances between two nodes. Thisequidistant projection is required due to the definition ofn-link weights in the proposed graph-cuts framework. Inthis formulation, each vertex on the surface, i.e., node ofthe graph, should include a profile that represents the tex-ture information of the corresponding location in the LGEMRI. Here, we represent this profile using MSPs, which

3

Page 4: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

z

y

x

Ground TruthMA-WHS

LA&PVWHS Project Extract Patch Sample

Patches withrandom shift

(a) (b) (c) (d) (e) (f)

Figure 3: Pipeline of projection and patch extraction phases. Please refer to the text for more details.

can incorporate both global structural features and localtexture information.

2.3. Graph formulation for scar segmentation

Classification and quantification of scars on the LA sur-face can be formulated as an energy minimization prob-lem solved via graph-cuts. The weights of the graph comefrom two parts, i.e., the regional term ER and the bound-ary term EB (Boykov and Jolly, 2001). The regionalterm encodes the intensity distributions of different classes,and the boundary term maintains the continuity betweenneighbors.

LetG = {X ,N} denotes a graph, where X = {xi} indi-cates the set of graph nodes, and N = {< xi, xj >} is theset of edges. Here, the weights of edges connecting graphnodes to the terminals are known as t-link weight, andthe weights of edges connecting neighboring nodes are re-ferred to as n-link weight. The two terminals respectivelydenote the scars and normal myocardium in our problem,analogous to the foreground and background of the gen-eral image segmentation task. Let lxi

∈ {0, 1} be the labelassigned to xi, and l = {lxi

|xi ∈ X} be the label vectorthat defines a segmentation. The segmentation energy isdefined as follows,

E(l) = ER(l) + λEB(l)

=∑xi∈X

W t−linkxi

(lxi) + λ∑

(xi,xj)∈N

Wn−link{xi,xj} (lxi , lxj ).

(1)where W t−link

xiand Wn−link

{xi,xj} are respectively the t-link and

n-link weight.In conventional graph-based segmentation, the regional

term is generally obtained by optimizing based on a man-ual defined initial model. For example, Boykov and Jolly(2001) manually selected a number of seed points to con-struct such model, referred to as graph cuts method, andRother et al. (2004) manually defined a bounding box forinteractive segmentation, known as GrabCut approach.The boundary term in these works was normally definedaccording to the dissimilarity of intensity and distance be-tween two connected nodes. Veni et al. (2017) designed aregional term based on a generative image model incorpo-rating both local and global shape priors. The boundary

term was defined for regularizing the smoothness of the es-timated surface, i.e., minimizing the squared difference ofthe offsets between neighboring vertices. Lu et al. (2017)estimated a regional term combining three maps, includinga probability map, a thresholding map and a local appear-ance map. The boundary term they defined was relatedto the intensity difference and distance of two connectednodes.

In this work, we propose to directly learn and pre-dict the t/n-link potentials for the regional and bound-ary terms. This is different from the conventional means,where the profile of a graph node is commonly representedby the intensity of a single pixel or its local texture, whichconsists of limited information. Here, we combine theprofile representation of graph nodes with the MSP strat-egy, and learn the potentials using the proposed MS-CNN.Fig. 4 illustrates the flowchart of constructing the graph.

2.4. Explicit learning of graph potentials using MS-CNN

Fig. 4 illustrates the computation of graph potentialsfor the graph-cuts based classification of LA scars.

2.4.1. Multi-scale patch and patch extraction

We propose to extract MSPs from LGE MRI to repre-sent the profile of the graph node, and to feed the MS-CNNfor training and prediction. MSP can represent differentlevels of structural information at a location in an im-age, with low scale capturing local fine details and highscale providing global structural information of the image(Zhuang and Shen, 2016).

Each graph node xi has its associated MSPs, denotedas Pi = {p0xi

, p1xi, ..., pNs−1

xi}, where Ns indicates the num-

ber of scales. They are extracted from the correspondingvolumetric region in the LGE MRI, by back projectingthe node to the position in the image. These patches areelongate-shaped and are defined along the normal direc-tion of the LA endocardial surface, as Fig. 3 (e) shows,and their local orientations are maximally aligned to thecommon world coordinate system of the LGE MRI. Themulti-scale strategy is implemented by adjusting the sam-ple spacing to generate patches with different scales, corre-sponding to different resolutions of the LGE MRI. We em-ploy parallel convolutional pathways for multi-scale pro-cessing, to feed the different scale information of imagesto the neural network simultaneously, as Fig. 5 (a) shows.

4

Page 5: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

Target MSP and its adjacent MSP pairsTarget Image

T-Net

N-Net

MS-CNN

𝐸 𝑙 =

𝑥𝑖𝜖𝒳

𝑊𝑥𝑖𝑡−𝑙𝑖𝑛𝑘 + λ

൫𝑥𝑖,𝑥𝑗)∈𝒩

𝑊 𝑥𝑖,𝑥𝑗

𝑛−𝑙𝑖𝑛𝑘

T

S

graph

mesh

scar

Extract

Patch

𝑾𝒙𝒊𝒕−𝒍𝒊𝒏𝒌

𝑾 𝒙𝒊,𝒙𝒋

𝐧−𝒍𝒊𝒏𝒌

Figure 4: Construction of the graph and the explicit learning of the graph potentials by MS-CNN (MSPs are integrated and represented usingred cuboid in this work).

2.4.2. Multi-scale convolutional neural network

We have two neural networks, i.e., T -NET andN -NET.T -NET learns and predicts the t-link potentials, i.e., theprobabilities of a node belonging to scars and normal wallsrespectively, as Fig. 5 (a) shows. N -NET calculates the n-link potential between two connected nodes, as Fig. 5 (b)shows.

For training of the t-link potentials, we define a sam-ple for each node of a graph constructed from LGE MRI.The sample is composed of the MSPs associated to thenode xi, and its ground truth label Li, i.e., Li equalsto 1 if it is scar, and 0 otherwise. As Fig. 5 (a) shows,the training data of T -NET can be represented as DT =[(P1, L1), ..., (PN , LN )], i.e., N nodes with correspondinglabels. Thus, the T -NET can be parameterized by θT asfollows,

θT = arg minθT

N∑i=1

(L(Pi; θT)− Li)2, (2)

where Pi = {p0xi, p1xi

, p2xi}, and L is the estimated t-link

weight.For training of the n-link, we define a sample for each

pair of two neighboring nodes {xi, xj}, consisting of threeelements, i.e., (1) the pair of the two sets of MSPs associ-ated with the two nodes, i.e. {Pi,Pj}, (2) the geodesic dis-tance between them, denoted as dij , and (3) their groundtruth label similarity Mij .

As Fig. 5 (b) shows, the training data of N -NET can

be represented as DN = [(Pi,Pj , dij ,Mij)]i,j=Ni,j=1 . The dis-

tance dij is viewed as an additional similarity feature,namely the labeling of two nodes can be more similar ifthey are closer. To this end, we design a sub-network, de-noted as ~F, to extract high-level and dense features, i.e.

~F(P). We then obtain a new feature vector from Pi andPj , as follows,

~Gij = ~F(Pi)× ~F(Pj) + (1− ~F(Pi))× (1− ~F(Pj)). (3)

Each element of ~Gij can be considered as a similarity met-

ric in the feature space. Finally, we combine ~Gij and dij ,and feed them to another sub-network for computing thelabel similarity, i.e., the n-link weight. Thus, the N -NETcan be parameterized by θN as follows,

θN = arg minθN

N∑i,j=1

(M(~Gij , dij ; θN)−Mij)2, (4)

where M is the estimated n-link weight.

2.4.3. Training and testing strategy

In the training phase, we use weighted sampling tomitigate the problem of class imbalance in the trainingset, where the number of the nodes belonging to normalmyocardium in a subject could be tens or even hundredstimes more than that of scars. In addition, we add a ran-dom shift, along with the normal direction, to the centerof the MSPs, to mitigate the effects from the inaccuratedelineation of the LA boundaries due to over or under seg-mentation. This is illustrated in Fig. 3 (d-f). This shiftshould be large enough to overcome potential segmenta-tion errors, while at the same time be small enough toavoid being too distant and cannot capture the textureprofile of the LA wall. We propose to assign this randomvalue in a given range, i.e. γ ∈ (−R,+R), to a node inthe training phase, where γ is the shift value, − and +represent being inside and outside of the LA blood cavity,respectively.

5

Page 6: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

...

... ...

...

... ......

...

... ...

...

... ......

...

... ...

...

... ......

𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7 𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7

𝟔𝟒×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟏𝟐𝟖×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟏𝟐×𝟐

𝟓𝟏𝟐×𝟏𝟐×𝟐

25

6

𝟏𝟑𝟐×𝟏

7

64

12

8

𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7 𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7

𝟔𝟒×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟏𝟐𝟖×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟏𝟐×𝟐

𝟓𝟏𝟐×𝟏𝟐×𝟐

25

6

𝟏𝟑𝟐×𝟏

7

64

12

8

𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7 𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7

𝟔𝟒×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟏𝟐𝟖×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟏𝟐×𝟐

𝟓𝟏𝟐×𝟏𝟐×𝟐

25

6

𝟏𝟑𝟐×𝟏

7

64

12

8

Scale 0

Res

am

ple

Res

am

ple

Scale 1

Scale 2

64 32

Ou

tpu

t

FC

FC

FC

FC

(a)

65 32

Ou

tpu

t

𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7 𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7

𝟔𝟒×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟏𝟐𝟖×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟏𝟐×𝟐

𝟓𝟏𝟐×𝟏𝟐×𝟐

25

6

𝟏𝟑𝟐×𝟏

7

64

12

8

16

FC

...

...

FC

...

...

...

𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7 𝟔𝟒 × 𝟏𝟑𝟐 × 𝟏7

𝟔𝟒×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟏𝟐𝟖×𝟔𝟐×𝟖

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟏𝟐𝟖×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟑𝟐×𝟒

𝟐𝟓𝟔×𝟏𝟐×𝟐

𝟓𝟏𝟐×𝟏𝟐×𝟐

25

6

𝟏𝟑𝟐×𝟏

7

64

12

8

...

...

FC

...

...

... Conv 3×3×3, Relu

Max Pool 2×2×2

Parameter Sharing

+Distance

......

......

(b)

Figure 5: The hierarchical architecture of the networks: (a) T -NET; (b) N -NET.

In the testing phase, one can compute the t-link and n-link potentials of the graph, and the classification of scarson the LA surface can be achieved by embedding theseestimated weights into the graph-cuts framework, i.e.,

W t−linkxi

= L(Pγi ; θT), (5)

and,

Wn−link{xi,xj} = M(Pγi ,P

γj , dij ; θ

N) = M(~Gij , dij ; θN). (6)

Note that the two normalized t-link weights of a node,indicating potentials to the foreground and backgroundrespectively, can also be viewed as the probabilities of thisnode belonging to scars and normal tissues.

3. Experiments and results

3.1. Data acquisition and experimental setup

We collected fifty-eight post-ablation LGE MRI datafrom patients with longstanding persistent AF for exper-iments. Transverse navigator-gated 3D LGE MRI wasperformed on a 1.5T Siemens Magnetom Avanto scanner(Siemens Medical Systems, Erlangen, Germany), whichused an inversion prepared segmented gradient echo se-quence (TE/TR 2.2 ms/5.2 ms) 15 minutes after gadolin-ium administration. The LGE MRI data were acquiredat resolution of (1.4-1.5) × (1.4-1.5) × 4 mm, and recon-structed to (0.7-0.75)×(0.7-0.75)×2 mm. For each patient,prior to contrast agent administration, coronal navigator-gated 3D b-SSFP (TE/TR 1 ms/2.3 ms) data were scanned,with acquisition resolution of (1.6-1.8) × (1.6-1.8) × 3.2mm, and reconstructed to (0.8-0.9)× (0.8-0.9)× 1.6 mm.Both LGE MRI and b-SSFP data were acquired duringfree breathing with respiratory motion control (Keeganet al., 2014).

The available data were randomly divided into two sets,one for training (31 images) and the other for testing (27

images). T -NET was trained using stochastic gradient de-scent optimizer, with following hyper-parameters: momen-tum = 0.9, batch size=50, weight decay=10−4, number ofepochs=15. The learning rate was initially set to 0.01, andhad a stepped decay rate of 0.8 every 1000 iterations. Asimilar configuration was designed for N -NET.

We first evaluated the accuracy of automatic segmen-tation of LA in Section 3.3. Then, we performed four pa-rameter studies to verify the effects of the parameters andexplore their optimal values. In Section 3.4.1, we investi-gated the influence of different patch sizes to the proposedframework using the single-scale CNN, and then comparedthe results with that of MS-CNN. In Section 3.4.2, westudied the proposed method with different values of thebalancing parameter λ. Section 3.4.3 and Section 3.4.4present the studies of random shift and multi-scale learn-ing, respectively. The optimal parameters concluded fromthese studies were used for the proposed method, in com-parisons with other methods, in Section 3.5. Finally, Sec-tion 3.6 reports the performance of the proposed methodand results of the inter-observer study.

3.2. Gold standard and evaluation

All the LGE MRIs were manually segmented by an ex-perienced cardiologist specialized in cardiac MRI, to labelthe enhanced atrial scarring regions, which are consideredas ground truth in this work. To assess the scar classifi-cation results, we generated the ground truth reference byprojecting the manually segmented scars onto the LA sur-face. With regard to different initializations, i.e., the man-ual (abbreviated as LAM ) and automatic (abbreviated asLAauto) delineation of LA, two different ground truths, re-spectively referred to as GTM and GTauto, were generatedfor evaluation. In the comparison studies in Section 3.5,the fully automatic methods were evaluated using GTauto,while the semi-automatic algorithms based on LAM wereevaluated using GTM .

For evaluation, we computed the statistical measures,Dice score of scars, referred to as Dice (scar), and the

6

Page 7: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

generalized Dice score, denoted as GDice. The statisti-cal measures include accuracy, sensitivity and specificity.GDice is a weighted Dice score by evaluating the segmen-tation of all labels (Crum et al., 2006; Zhuang, 2013), andis formulated as follows,

GDice =2∑Nk−1

k=0 |Sautok ∩Smanual

k |∑Nk−1

k=0 (|Sautok |)+(|Smanual

k |) , (7)

where Sautok and Smanualk indicate the segmentation resultsof label k from the automatic method and manual delin-eation, respectively, and Nk is the number of labels. Allthe metrics are computed on the projected LA surface.

3.3. Automatic segmentation of LA and correlation anal-ysis

To obtain an initialization of LA for scar segmentation,we developed the MA-WHS method using 30 b-SSFP MRIatlases. The 30 high resolution atlases were constructedfrom the Left Atrial Segmentation Challenge (STACOM2013) (Tobon-Gomez et al., 2015). The manual delineationof LA was regarded as the gold standard for this exper-iment. The MA-WHS results of Ana-MRI were mappedto LGE MRI from the same subject, and then generatedthe initial LA labels. The average Dice score of this LAsegmentation to the manual delineation was 0.898±0.044.

To analyze the relation between the LA segmentationerror and the scar quantification accuracy by the proposedmethod, we plotted these two values for each of the 27 testsubjects as two dimension scatter points in Fig. 6. One cansee that the plot shows little direct relationship betweenthem. We further performed linear regression, Pearsoncorrelation and Spearman’s rank correlation. The R2 andPearson coefficient were respectively 0.0199 and 0.1412,indicating low linear correlation between Dice (scar) andDice (LA); and the rank correlation coefficient was 0.0110,meaning hardly monotonic relationship between them ei-ther. To conclude, the result illustrates the low correlationbetween the scar quantification accuracy and the LA seg-mentation accuracy by the proposed method.

3.4. Parameter Studies

3.4.1. Study of patch sizes

We used one-scale MSP, namely only the original image(scale 0) was used and the CNN was a single-scale network,for studying the proposed method with different sizes ofpatches. The patch sizes ranged from 7× 7× 11 to 17×17× 21 voxel, where the voxel size is 1× 1× 1 mm. Then,we implemented the three-scale MSP and CNN with patchsize 13 × 13 × 17 voxel, for comparisons with the single-scale CNNs. The balancing parameter λ in this study wasset to 0.6, and the random shift range R was set to half ofthe patch length.

Fig. 7 (a) shows that the average Dice score increasesdramatically at first with respect to the increased sizes ofpatches, then starts to converge after the patch size reach-ing 13 × 13 × 17 voxel. This is reasonable, as the larger

0.80 0.85 0.90 0.95 1Dice (LA)

0.45

0.50

0.55

0.60

0.65

0.70

0.75

0.80

0.85

0.90

Dic

e (s

car)

R2 = 0.0199

Figure 6: Scatter point plot for analyzing the correlation between theLA segmentation performance and scar quantification accuracy, bothindicated by Dice scores. The Pearson coefficient and Spearman’srank coefficient are respectively 0.1412 and 0.0110.

size is used, the richer intensity profile is included for fea-ture training and detection. However, the increase of patchsize generally requires more complex networks, either morekernels or more convolutional layers, which increases com-putation load and memory requirements. This also ratio-nalizes our proposal to use MSP and MS-CNN. As Fig. 7(a) presents, our MS-CNN drastically increases the accu-racy of the classification results, thanks to the usage of theMSP strategy which incorporates both local and global in-formation of the images. In the following experiments, weadopted this three-scale setting (except for Section 3.4.4)and patch size of 13× 13× 17 voxel.

3.4.2. Study of balancing parameter λ

In this study, we compared the results of the proposedscheme using different values for the balancing parameterλ, λ ∈ [0,∞), to demonstrate the effect of graph-cuts.Here, we set the values ranging from 0 to 2. The patchstrategy was as follows, number of scales was three, patchsize was 13× 13× 17 voxel, and the random shift range Rwas set to a maximum of 8 mm.

Fig. 7 (b) presents the results. One can see that thebest performance in terms of Dice score is obtained whenλ is set to 0.4. This indicates that the inter-node relation(n-link) is important, and the weighting between the t-linkand n-link terms should be balanced to achieve optimalperformance. In the following experiments, λ was set to0.4 for the proposed method.

3.4.3. Study of random shift range

To demonstrate the effect of random shift, we com-pared the performance of the proposed method with dif-ferent random shift ranges R, for γ ∈ (−R,+R). Here,we set R ranging from 0 to 12 mm. The patch size was 13× 13× 17 voxel, and λ was set to 0.4.

Fig. 7 (c) provides the results of this study. The bestDice score is obtained when R is set to 8 mm, i.e., half of

7

Page 8: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

0.63

0.64

0.65

0.66

0.67

0.68

0.69

0.70

0.71

Dic

e sc

ore

with

diff

eren

t pat

ch si

zes

0.640±0.085

0.655±0.080

0.665±0.077

0.677±0.070 0.680±0.071

0.675±0.072

0.700±0.073

72 ×9 9

2 ×13 112

×15 132

×17 152

×19 172 ×21 13

2 ×17

Patch size (MSP)

(a)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Weight term λ

0.680

0.685

0.690

0.695

0.700

0.705

Dic

e sc

ore

with

diff

eren

t wei

ght t

erms

0.692±0.069

0.698±0.069

0.700±0.070

0.701±0.070

0.702±0.0710.701±0.072

0.700±0.073

0.699±0.074

0.697±0.075

1.5 2

0.692±0.077

0.684±0.088

(b)

0 2 4 6 8 10 12Random shift range R

0.63

0.64

0.65

0.66

0.67

0.68

0.69

0.70

0.71

0.72

Dic

e sc

ore

with

diff

eren

t ran

dom

shift

ran

ges

0.644±0.083

0.668±0.078

0.698±0.064 0.700±0.0710.702±0.071

0.690±0.061

0.664±0.072

(mm)

(c)

1 30.670

0.675

0.680

0.685

0.690

0.695

0.700

0.705

0.710

Dic

e sc

ore

with

diff

eren

t num

bers

of

scal

es

0.677±0.070

0.690±0.067

0.702±0.071

2 Scale number Ns

(d)

Figure 7: Dice scores of the proposed method with different parameterizations: (a) performance against different patch sizes (λ=0.6); (b)performance against different values of the balancing parameter λ to weight the t-link and n-link terms in the graph-cuts framework; (c)performance against different random shift ranges R; (d) performance against different numbers of scales Ns.

the patch length in the long-axis direction, and the per-formance of the proposed method deteriorates drasticallywhen the shift range becomes larger than 8 mm. This isrational, because the shift range should cover all the poten-tial misalignments of the constructed surface to the groundtruth. When the random shift range R is greater than 8mm, the patch may not cover the regions which includethe important features for training and classification.

3.4.4. Study of scales

To study the effect of multi-scale learning, we com-pared the results using different numbers of scales, i.e.Ns = {1, 2, 3}. The patch size of MSP was set to 13 ×13×17 voxel, λ was set to 0.4, and the random shift rangeR was set to a maximum of 8 mm.

Fig. 7 (d) presents the mean Dice scores of the method.This study demonstrates that the effectiveness of the multi-scale learning. It indicates that the more scales we used thebetter accuracy we obtained. It should be noted that when

we tried to use more scales, the training session failed, dueto the limited computation capacity of our computer.

3.5. Comparison with other methods

In this study, we implemented eight segmentation ap-proaches, including the proposed method, for comparisons.Here, LAM indicates the methods adopt the manual seg-mentation of LA for initialization, and LAauto denotesthe methods employ the automatic segmentation from theMA-WHS approach described in Section 2.1.

(1) LAM + 2SD: This is one of the most widespreadthresholding methods to detect atrial scars. It cal-culates a specific number of standard deviation (SD)above a reference value. The reference value is gener-ally set to the mean intensity from the blood pool orLA wall. It is however generally patient-specific andslice-specific, and different numbers of SD have beenused (Karim et al., 2013). In our study, we obtained

8

Page 9: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

Table 1: Summary of the quantitative evaluation results. GDice denotes the generalized Dice score. Here, the asterisk (∗) in column Dice(scar) indicates the methods obtained statistically poorer (p < 0.01) results compared to the proposed LAauto + LearnGC. The p value ofthe Dice (scar) between LAM +MS-CNN0 and LAauto +MS-CNN0 is 0.225.

Method Accuracy Sensitivity Specificity Dice (scar) GDice

LAM + 2SD 0.809 ± 0.074 0.168 ± 0.067 0.994± 0.005 0.275 ± 0.091∗ 0.758 ± 0.098LAM + Otsu 0.763 ± 0.188 0.346 ± 0.214 0.880 ± 0.289 0.396 ± 0.090∗ 0.726 ± 0.207LAM + MGMM 0.708 ± 0.160 0.781 ± 0.127 0.690 ± 0.236 0.545 ± 0.101∗ 0.716 ± 0.190LAM + MGMM + GC 0.716 ± 0.162 0.799± 0.124 0.694 ± 0.240 0.562 ± 0.102∗ 0.721 ± 0.192LAM + MS-CNN0 0.798 ± 0.051 0.775 ± 0.099 0.805 ± 0.078 0.615 ± 0.083∗ 0.811 ± 0.047LAauto + MS-CNN0 0.806 ± 0.052 0.743 ± 0.126 0.824 ± 0.088 0.631 ± 0.080∗ 0.814 ± 0.047LAauto + MS-CNN 0.846 ± 0.032 0.786 ± 0.118 0.886 ± 0.057 0.692 ± 0.069∗ 0.851 ± 0.030LAauto + LearnGC 0.856± 0.033 0.773 ± 0.132 0.883 ± 0.058 0.702± 0.071 0.859± 0.031

the optimal performance by setting the thresholdvalue to 2 SD above the mean intensity of LA walls.Here, we constructed the LA wall from a manual seg-mentation of the LA with a morphological dilation,which was also used for the following experimentswhen the LA wall was needed from LAM .

(2) LAM +Otsu: This method uses the Otsu algorithm(Otsu, 1979) for automatic thresholding of the scar-ring tissues from the LA wall obtained from LAM .

(3) LAM + MGMM : This method adopts the multi-component Gaussian mixture model (MGMM) forscar segmentation from the LA wall (Liu et al., 2017).MGMM can deal with the intensity heterogeneity ofmyocardium caused by the infarcts, and has beenproven to be effective in myocardium segmentation.

(4) LAM + MGMM + GC: This method further reg-ularizes the spatial continuity using the graph-cutsframework, based on the result of MGMM. Here, wedefined the boundary weight using the intensity dif-ference between neighboring points, and the regionalweight was computed from the posterior probabilitymap of scars generated from MGMM.

(5) LAM + MS-CNN0: This learning based methodonly uses the two t-link weights estimated from T -NET to classify scars. The two weights, i.e. respec-tively linked to the foreground scar and backgroundnormal tissue, are normalized and considered as theposterior probability of the two labels. Here, bothtraining data and test data were initialized usingmanually segmented LA, so the random shift in thetraining phase was set to zero, i.e. γ=0.

(6) LAauto + MS-CNN0: This method uses the esti-mated t-link weights from T -NET, similar to LAM +MS-CNN0, to classify scars. However, the LA herewas automatically segmented using MA-WHS. Forcomparisons with LAM + MS-CNN0, here we alsoset the random shift to zero (γ=0).

(7) LAauto + MS-CNN : Similarly, this method usesthe estimated t-link weights from T -NET to classifyscars, and the LA was automatically segmented us-ing MA-WHS. However, in the training phase we setthe random shift accordingly based on the parameterstudy in Section 3.4.3.

LAM+2SDLAM+Otsu

LAM+MGMM

LAM+MGMM+GC

LAM+MS-CNN0

LAauto+MS-CNN0

LAauto+MS-CNN

LAauto+LearnGC

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Dic

e sc

ores

of d

iffer

ent m

etho

ds

Figure 8: Boxplots of the Dice scores of scars by the eight methods.

(8) LAauto + LearnGC: This is the proposed methodin which the LA was initialized by MA-WHS andthe weights of the graph were learned and predictedusing MS-CNN. Here, the balancing parameter λwas set to 0.4. Noted that when λ = 0, LAauto +LearnGC becomes LAauto +MS-CNN .

Table 1 presents all the quantitative results of the eightmethods, and Fig. 8 provides their boxplots of Dice scoresof scars. The proposed learning graph-cuts method, i.e.LAauto + LearnGC, obtained evidently better scar seg-mentation (Dice of scars) than the conventional methodsbased on LAM . It also performed statistically better thanall the other seven methods in terms of Dice scores ofscars (p < 0.01). Note that LAauto + MS-CNN has aslightly better Dice (scar) than LAM + MS-CNN butwithout statistical significance (p=0.255), even though theformer is based on automatic segmentation of LA andthe latter uses manual segmentations. When combinedwith the random shift strategy, LAauto + LearnGC andLAauto+MS-CNN obtained evidently and statistical bet-ter Dice (scar) than the other methods (p < 0.01). Forthem, LAauto+LearnGC is generally better, but the gainis marginal, due to the fact that the graph-cuts is consid-

9

Page 10: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

𝑳𝑨𝑴 + 𝟐𝑺𝑫 𝑳𝑨𝑴 + 𝑶𝒕𝒔𝒖

𝑳𝑨𝑴 +𝑴𝑮𝑴𝑴 𝑳𝑨𝑴 +𝑴𝑮𝑴𝑴 + 𝑮𝑪 𝑳𝑨𝑴 +𝑴𝑺-𝑪𝑵𝑵0

𝑳𝑨auto +𝑴𝑺-𝑪𝑵𝑵0 𝑳𝑨auto +𝑴𝑺-𝑪𝑵𝑵 𝑳𝑨auto + 𝑳𝒆𝒂𝒓𝒏𝑮𝑪

Dice=0.696 Dice=0.713

Dice=0.363 Dice=0.442

Dice=0.618 Dice=0.636 Dice=0.652

Dice=0.620

Scars projected ontomanually segmentedLA surface (𝑳𝑨𝑴)

𝐆𝐫𝐨𝐮𝐧𝐝 𝐓𝐫𝐮𝐭𝐡

𝐆𝐫𝐨𝐮𝐧𝐝 𝐓𝐫𝐮𝐭𝐡

Scars projected ontoautomatically segmentedLA surface (𝑳𝑨𝒂𝒖𝒕𝒐)

Figure 9: 3D visualization of the LA scar classification results usingthe eight methods. This is the median case selected from the testset in terms of Dice score of scars by the proposed method. Thescarring areas are red-colored on the LA surface mesh, which can beconstructed either from LAM (LA surface in white) or from LAauto

(LA surface in light yellow).

ered as a built-in smoothness constraint to generate lesspatchy results. In this study, LAauto + LearnGC did notobtain the best figures in sensitivity or specificity metrics.Sensitivity measures the proportion of actual scarring re-gions that are correctly identified, and specificity measuresthe proportion of actual normal wall regions that are cor-rectly segmented. One can see the misleading of these twometrics in evaluating the performance of a method fromTable 1, where LAM + 2SD and LAM + MGMM + GCachieved the best specificity or sensitivity, though theirperformance was actually poor in our visual assessment.

In addition, we chose a representative case, the medianin terms of Dice (scar) from the test set by the proposedLAauto + LearnGC. Fig. 9 visualizes the 3D results bythe eight methods. One can observe that the 3D visu-alization agrees well with the quantitative analysis resultusing Dice (scar). Though the manually segmented scarsin LAM and LAauto are projected onto two different ref-erence surfaces, GTM and GTauto visually appear similarwhen we compare the location and extent of scars. Boththe two threshold algorithms, 2SD and Otsu, tended tounder estimate (segment) the scars, though Otsu gener-ally performed better. The results of LAM + MGMMand LAM + MGMM + GC were acceptable, but the ac-curacy and automation needed improving. The leaning-based methods, from LAM +MS-CNN0, LAauto +MS-CNN0 and LAauto + MS-CNN , to LAauto + LearnGC,improved the performance when the new methodologieswere introduced. Particularly, LAauto +LearnGC furtherreduced the noise and patchy segmentation results, andit obtained full automation and best Dice score of scarquantification.

3.6. Performance of the proposed method and inter-observerstudy

This study analyzes the performance of the proposedmethod in detail. To provide a reference for the quanti-tative evaluation metrics, we conducted a study of inter-observation variation from two manual delineations. Werandomly selected ten cases from the available data, andasked two experts to manually label the scars separately.For each case, the two labelling results of scars were pro-jected onto the LAM surface. The Dice (scar), generalizedDice, and accuracy of inter-observer variation were respec-tively 0.695± 0.049, 0.868± 0.027 and 0.867± 0.026.

Table 1 summarizes the quantitative evaluation resultsof the proposed method, i.e. LAauto + LearnGC. Theaverage Dice of scar is 0.702± 0.071, which is comparableto the inter-observer variation (0.695 ± 0.049), and thedifference is not significant (p=0.7783). This conclusionalso applies when we compare them using accuracy andGDice evaluation metrics.

Fig. 10 provides 2D visualization of the axial view fromthree examples. These three cases were the first quarter,median and third quarter cases from the test set in termsof Dice (scar) by the proposed method. This illustratesthat the method could provide promising performance forlocalizing and quantifying atrial scars of LA. In the medianand third quarter cases, we highlight the errors, particu-larly due to the enhanced adjacent regions, pointed out byarrow (1), (2) and (3). These mis-classifications, represent-ing the main challenges of this task, contributed to the ma-jor errors of scar quantification by the proposed method.Another type of error was caused by the misalignmentsof the automatic LA segmentation, as arrow (4) pointedout. This happened in some local areas where the errorsoccurred because of the different shapes of LA after recon-struction from the automatic segmentation. One can alsosee that even there existed large LA segmentation errors,indicated by arrow (5) in Fig. 10, the proposed methodstill could identify the scars at the corresponding locationof the projected surface. This is mainly attributed to theeffective training of the MS-CNN, which assigns randomshifts along the perpendicular direction of the surface whenextracting the training patches. The multi-scale learningalso contributes to the less demanding of accuracy fromthe automatic LA segmentation, thus enables to achievefully automated LA scar quantification.

4. Discussion and conclusion

In this work, we have proposed a fully automatic frame-work for segmentation and quantification of LA scars. Twomajor methodological contributions have been introduced.One is the formulation of quantifying the LA scarringbased on a surface mesh. The classification and quantifi-cation are achieved via the surface projection and graph-cuts framework. The other is the adoption of the multi-scale learning combined with CNN, i.e. MS-CNN. The

10

Page 11: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

Table 2: Overview of previous methods for scar quantification and segmentation in LA. Abbreviations: segmentation (seg); inter-observervariation in terms of Dice (Inter-ob); Society of Photo-Optical Instrumentation Engineers (SPIE), IEEE Journal of Translational Engineeringin Health and Medicine (TEHM), IEEE transactions on medical imaging (TMI), Medical physics (MP), Medical Image Computing andComputer-Assisted Intervention (MICCAI).

Work No. subjects LA (wall) seg Scar seg method Result (Dice) Inter-ob

Perry et al. (2012), SPIE 34 manual k-means 0.807 ± 0.106 0.786 ± 0.072Karim et al. (2014), TEHM 15 semi-auto GMM + Graph-cuts > 0.8 N/ARavanelli et al. (2014), TMI 10 semi-auto NVI + Manual correction 0.850 ± 0.070 N/A

10 auto NVI 0.600 ± 0.210 N/AWu et al. (2018), MICCAI 36 auto Multivariate mixture model 0.556 ± 0.187 N/AChen et al. (2018), MICCAI 100 auto Dilated Attention Network 0.776 ± 0.146 N/AYang et al. (2018), MP 37 auto Super-pixels + SVM 0.790 ± 0.050 N/A

Original Image Ground Truth Proposed Method

first

qu

art

erM

ed

ian

3rd

qu

art

er

LA

RA

LR

LA

RA

LR

LA

RA

LR

Dice=0.664

Dice=0.713

Dice=0.742

(1)

(2)

(5)

(4)

(3)

(1)

Figure 10: Axial view of the images, the ground truth scar segmen-tation and the results by the proposed method. The red and whitecolor labels represent the scar and normal wall, respectively. Ar-row (1), (2) and (3) indicate the major classification errors of theproposed method caused by the surrounding enhanced regions, re-spectively from the right atrium wall, ascending aorta wall and de-scending aorta wall; arrow (4) shows an error from the misalignmentbetween the automatic LA segmentation and the ground truth; ar-row (5) illustrates that the proposed method can still perform well,even the automatic LA segmentation contains obvious errors.

multi-scale learning is implemented using the MSP strat-egy, which extracts the features from both the local andglobal intensity profiles of LGE MRI. The MS-CNN learnsboth the label probability of each nodes and the relationsbetween connected nodes in the graph. The surface pro-jection in the proposed framework avoids the difficulty ofproviding an accurate and demanding LA wall segmenta-tion, and the multi-scale patch-based learning, with therandom shift training strategy, further mitigates the effectof less accurate LA initialization from a fully automaticapproach, as demonstrated in Section 3.4.3. We employedfifty-eight images with manual delineation for experiments.The proposed method performs better when the size of ex-

tracted patches increases, but the performance convergeswhen the size is larger than a certain value (see Section3.4.1). The multi-scaling learning further improves theperformance compared to the method with single-scalelearning, as demonstrated in Section 3.4.4. Finally, theproposed learning graph-cuts based method demonstratesevidently better performance compared to the conventionalapproaches, and the mean accuracy and Dice (scar) forquantifying LA scars are respectively 0.856 and 0.702, whichare comparable to those of inter-observer variation (accu-racy=0.867, Dice =0.695).

Table 2 summarizes the related works from literature.Perry et al. (2012) evaluated their method on a datasetconsisting of 34 images. The mean Dice score was 0.807±0.106, and the inter-observation Dice was 0.786 ± 0.072.Their method required an accurate initialization of LAwalls from manual segmentation, followed by a k-meanclassification. Karim et al. (2014) employed GMM to modelthe enhancement of scar region, and used the graph-cutsmethod to consider neighbouring regions. This methodused LA segmentation for initialization, which was achievedfrom a semi-automatic method with manual correction.They evaluated the method using numerical phantoms aswell as using 15 in vivo images. They obtained more than0.8 Dice scores on the two datasets. Ravanelli et al. (2014)adopted a threshold based approach, where the normalizedvoxel intensity (NVI) of LA walls was applied. The thresh-old value, NV I = 4, was assigned according to previousstudies and visual validation by experts, base on whichthey used a 2-D skeletonization algorithm to quantify theatrial fibrosis. The authors evaluated both the fully au-tomatic method and the semi-automatic approach withmanual correction. The mean Dice scores of LA scar quan-tification increased from 0.60 ± 0.21 to 0.85 ± 0.07 whenthe manual correction was included. Wu et al. (2018) pro-posed a fully automatic method for LA fibrosis quantifi-cation. They formulated the joint distribution of imagesbased on the multivariate mixture model, and optimizedmodel parameters using the iterated conditional mode al-gorithm. They tested the method on 36 cases and reporteda mean Dice score of 0.556±0.187 and average accuracy of0.809 ± 0.150. Chen et al. (2018) developed a multi-viewtwo-task recursive attention model for simultaneous seg-

11

Page 12: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

mentation of LA and scars. The mean Dice score of LAsegmentation was 0.908± 0.031, which was similar to theresult (Dice=0.898 ± 0.044) from our study, though theiraverage Dice score of scar quantification was 0.776±0.146.Yang et al. (2018) employed the super-pixel algorithm andSVM to segment the scars on 37 subjects. They obtained0.790± 0.050 Dice score, 0.87 segmentation accuracy, 0.89sensitivity and 0.79 specificity by using the leave-one-outcross-validation strategy. This study yielded better Dicescore than ours in this work, but there was no evidentdifference in terms of the accuracy, sensitivity and speci-ficity between these two works. It should be noted thatamong these six works, only one, i.e., Perry et al. (2012),reported the details of inter-observer variation. Also notethat it can be difficult to pursue an objective cross-studycomparison due to the difference of datasets, initializationmethods, and evaluation metrics.

One of the challenges of LA scar quantification is to dis-tinguish artifacts from the boundary regions, such as fromthe RA wall and aorta wall, as we discussed above andshowed in Fig. 1 and Fig. 10. Conventionally, providingaccurate LA walls is the crucial step (Karim et al., 2013;Perry et al., 2012). In this work, we propose to use multi-scale deep learning technology, with specifically designedtraining strategy, to tackle this challenge. However, due tothe limited training data, the errors caused by this problemcould still happen. Secondly, the quantification of scars inour work is performed on the surface mesh projected fromthe LA endocardium. Karim et al. (2018) discussed theimportance of wall thickness, particularly considering thepotential that the ectopic activity can prevail in scars thatare non-transmural. However, they also emphasized thatthe relationship between the AF and the changes in wallthickness was not clear, and the thickness was difficult tomeasure based on current MRI data. In clinical practice,the location and extent of scarring areas are considered tohave greater clinical significance, which is however ardu-ous to represent and to perform quantitative cross-subjectcomparisons. In the future work, visual assessment willbe considered. Finally, a limitation of this work is thatthe gold standard was constructed from the manual seg-mentation of only one cardiologist. In the future, we cancombine the delineations from multiple experts to obtainan average and consensus gold standard.

Acknowledgement

This work was supported by the Science and Technol-ogy Commission of Shanghai Municipality (17JC1401600)and the NSFC (81301283). This study was also fundedby the British Heart Foundation Project Grant (ProjectNumber: PG/16/78/32402).

References

References

Badger, T.J., Daccarett, M., Akoum, N.W., Adjei-Poku, Y.A., Bur-gon, N.S., Haslam, T.S., Kalvaitis, S., Kuppahally, S., Vergara, G.,McMullen, L., et al., 2010. Evaluation of left atrial lesions afterinitial and repeat atrial fibrillation ablation: Lessons learned fromdelayed-enhancement MRI in repeat ablation procedures. Circu-lation: Arrhythmia and Electrophysiology 3, 249–259.

Beinart, R., Abbara, S., Blum, A., Ferencik, M., Heist, K., Ruskin,J., Mansour, M., 2011. Left atrial wall thickness variability mea-sured by ct scans in patients undergoing pulmonary vein isolation.Journal of cardiovascular electrophysiology 22, 1232–1236.

Boykov, Y.Y., Jolly, M.P., 2001. Interactive graph cuts for optimalboundary & region segmentation of objects in nd images, in: IEEEInternational Conference on Computer Vision, pp. 105–112.

Calkins, H., Kuck, K.H., Cappato, R., Camm, A.J., Chen, S.A.,Crijns, H.J.G., Jr, R.J.D., Davies, D.W., Dimarco, J., Edgerton,J., 2012. 2012 HRS/EHRA/ECAS expert consensus statementon catheter and surgical ablation of atrial fibrillation: recommen-dations for patient selection, procedural techniques, patient man-agement and follow-up, definitions, endpoints, and research trialdesign. Heart Rhythm 9, 632–696.

Chen, J., Yang, G., Gao, Z., Ni, H., Angelini, E., Mohiaddin, R.,Wong, T., Zhang, Y., Du, X., Zhang, H., et al., 2018. Multi-view two-task recursive attention model for left atrium and atrialscars segmentation, in: International Conference on Medical Im-age Computing and Computer-Assisted Intervention.

Chugh, S.S., Havmoeller, R., Narayanan, K., Singh, D., Rienstra,M., Benjamin, E.J., Gillum, R.F., Kim, Y.H., McAnulty, J.H.,Zheng, Z.J., et al., 2013. Worldwide epidemiology of atrial fib-rillation: a global burden of disease 2010 study. Circulation ,CIRCULATIONAHA–113.

Crum, W.R., Camara, O., Hill, D.L., 2006. Generalized overlapmeasures for evaluation and validation in medical image analysis.IEEE transactions on medical imaging 25, 1451–1461.

Ji, Y., van der Geest, R.J., Nazarian, S., Lelieveldt, B.P., Tao, Q.,2018. Advanced two-layer level set with a soft distance constraintfor dual surfaces segmentation in medical images, in: MedicalImaging 2018: Image Processing, p. 105743B.

Karim, R., Arujuna, A., Housden, R.J., Gill, J., Cliffe, H., Matharu,K., Gill, J., Rindaldi, C.A., ONeill, M., Rueckert, D., Razavi, R.,2014. A method to standardize quantification of left atrial scarfrom delayed-enhancement MR images. IEEE J Transl Eng HealthMed 2, 1–15.

Karim, R., Blake, L.E., Inoue, J., Tao, Q., Jia, S., Housden, R.J.,Bhagirath, P., Duval, J.L., Varela, M., Behar, J., et al., 2018. Al-gorithms for left atrial wall segmentation and thickness-evaluationon an open-source CT and MRI image database. Medical ImageAnalysis 50, 36–53.

Karim, R., Housden, R.J., Balasubramaniam, M., Chen, Z., Perry,D., Uddin, A., Al-Beyatti, Y., Palkhi, E., Acheampong, P., Obom,S., et al., 2013. Evaluation of current algorithms for segmentationof scar tissue from late gadolinium enhancement cardiovascularmagnetic resonance of the left atrium: an open-access grand chal-lenge. Journal of Cardiovascular Magnetic Resonance 15, 105–105.

Keegan, J., Jhooti, P., Babu-Narayan, S.V., Drivas, P., Ernst, S.,Firmin, D.N., 2014. Improved respiratory efficiency of 3D lategadolinium enhancement imaging using the continuously adaptivewindowing strategy (CLAWS). Magnetic resonance in medicine71, 1064–1074.

Knowles, B.R., Caulfield, D., Cooklin, M., Rinaldi, C.A., Gill, J.,Bostock, J., Razavi, R., Schaeffter, T., Rhode, K.S., 2010. 3-D visualization of acute RF ablation lesions using MRI for thesimultaneous determination of the patterns of necrosis and edema.IEEE transactions on biomedical engineering 57, 1467–1475.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classifi-cation with deep convolutional neural networks, in: Advances inneural information processing systems, pp. 1097–1105.

Liu, J., Zhuang, X., Wu, L., An, D., Xu, J., Peters, T., Gu, L., 2017.Myocardium segmentation from DE MRI using multicomponent

12

Page 13: arXiv:1902.07877v1 [cs.CV] 21 Feb 2019 · Atlas pool Graph Projection & Generate Mesh Extract Patch MS -CNN Predict edge weights Result Figure 2: Flowchart of the proposed framework

gaussian mixture model and coupled level set. IEEE Transactionson Biomedical Engineering 64, 2650–2661.

Lorensen, W.E., Cline, H.E., 1987. Marching cubes: A high resolu-tion 3D surface construction algorithm, in: ACM siggraph com-puter graphics, pp. 163–169.

Lu, F., Wu, F., Hu, P., Peng, Z., Kong, D., 2017. Automatic 3D liverlocation and segmentation via convolutional neural network andgraph cut. International journal of computer assisted radiologyand surgery 12, 171–182.

McGann, C.J., Kholmovski, E.G., Oakes, R.S., Blauer, J.J., Dac-carett, M., Segerson, N., Airey, K.J., Akoum, N., Fish, E., Bad-ger, T.J., et al., 2008. New magnetic resonance imaging-basedmethod for defining the extent of left atrial wall injury after theablation of atrial fibrillation. Journal of the American College ofCardiology 52, 1263–1271.

Otsu, N., 1979. A threshold selection method from gray-level his-tograms. IEEE transactions on systems, man, and cybernetics 9,62–66.

Perry, D., Morris, A., Burgon, N., McGann, C., MacLeod, R., Cates,J., 2012. Automatic classification of scar tissue in late gadoliniumenhancement cardiac MRI for the assessment of left-atrial wallinjury after radiofrequency ablation, in: Medical Imaging 2012:Computer-Aided Diagnosis, International Society for Optics andPhotonics. p. 83151D.

Peters, D.C., Wylie, J.V., Hauser, T.H., Kissinger, K.V., Botnar,R.M., Essebag, V., Josephson, M.E., Manning, W.J., 2007. Detec-tion of pulmonary vein and left atrial scar after catheter ablationwith three-dimensional navigator-gated delayed enhancement MRimaging: initial experience. Radiology 243, 690–695.

Pontecorboli, G., Figueras i Ventura, R.M., Carlosena, A., Ben-ito, E., Prat-Gonzales, S., Padeletti, L., Mont, L., 2016. Useof delayed-enhancement magnetic resonance imaging for fibrosisdetection in the atria: a review. EP Europace 19, 180–189.

Ravanelli, D., dal Piaz, E.C., Centonze, M., Casagranda, G., Marini,M., Del Greco, M., Karim, R., Rhode, K., Valentini, A., 2014.A novel skeleton based quantification and 3-D volumetric visual-ization of left atrium fibrosis using late gadolinium enhancementmagnetic resonance imaging. IEEE transactions on medical imag-ing 33, 566–576.

Rother, C., Kolmogorov, V., Blake, A., 2004. Grabcut: Interactiveforeground extraction using iterated graph cuts, in: ACM trans-actions on graphics (TOG), pp. 309–314.

Tao, Q., Ipek, E.G., Shahzad, R., Berendsen, F.F., Nazarian, S.,van der Geest, R.J., 2016. Fully automatic segmentation of leftatrium and pulmonary veins in late gadolinium-enhanced MRI:Towards objective atrial scar assessment. Journal of magneticresonance imaging 44, 346–354.

Tobon-Gomez, C., Geers, A.J., Peters, J., Weese, J., Pinto, K.,Karim, R., Ammar, M., Daoudi, A., Margeta, J., Sandoval, Z.,et al., 2015. Benchmark for algorithms segmenting the left atriumfrom 3d ct and mri datasets. IEEE transactions on medical imag-ing 34, 1460–1473.

Veni, G., Elhabian, S.Y., Whitaker, R.T., 2017. Shapecut: Bayesiansurface estimation using shape-driven graph. Medical image anal-ysis 40, 11–29.

Vergara, G.R., Marrouche, N.F., 2011. Tailored management of atrialfibrillation using a LGE-MRI based model: From the clinic to theelectrophysiology laboratory. Journal of cardiovascular electro-physiology 22, 481–487.

Vergara, G.R., Vijayakumar, S., Kholmovski, E.G., Blauer, J.J.,Guttman, M.A., Gloschat, C., Payne, G., Vij, K., Akoum,N.W., Daccarett, M., et al., 2011. Real-time magnetic resonanceimaging–guided radiofrequency atrial ablation and visualizationof lesion formation at 3 Tesla. Heart Rhythm 8, 295–303.

Wilber, D.J., Pappone, C., Neuzil, P., De Paola, A., Marchlinski,F., Natale, A., Macle, L., Daoud, E.G., Calkins, H., Hall, B.,et al., 2010. Comparison of antiarrhythmic drug therapy and ra-diofrequency catheter ablation in patients with paroxysmal atrialfibrillation: A randomized controlled trial. Jama 303, 333–340.

Williams, S.E., Tobon-Gomez, C., Zuluaga, M.A., Chubb, H., Bu-takoff, C., Karim, R., Ahmed, E., Camara, O., Rhode, K.S., 2017.

Standardized unfold mapping: a technique to permit left atrial re-gional data display and analysis. Journal of Interventional CardiacElectrophysiology 50, 125–131.

Wu, F., Li, L., Yang, G., Wong, T., Mohiaddin, R., Firmin, D.,Keegan, J., Xu, L., Zhuang, X., 2018. Atrial fibrosis quantificationbased on maximum likelihood estimator of multivariate images,in: International Conference on Medical Image Computing andComputer-Assisted Intervention, Springer. pp. 604–612.

Xiong, Z., Fedorov, V.V., Fu, X., Cheng, E., Macleod, R., Zhao, J.,2018. Fully automatic left atrium segmentation from late gadolin-ium enhanced magnetic resonance imaging using a dual fully con-volutional neural network. IEEE Transactions on Medical ImagingPP, 1–10.

Yang, G., Zhuang, X., Khan, H., Haldar, S., Nyktari, E., Li, L.,Wage, R., Ye, X., Slabaugh, G., Mohiaddin, R., et al., 2018. Fullyautomatic segmentation and objective assessment of atrial scarsfor long-standing persistent atrial fibrillation patients using lategadolinium-enhanced MRI. Medical physics 45, 1562–1576.

Zhao, J., Xiong, Z., 2018. 2018 Atrial Segmentation Challenge. http://atriaseg2018.cardiacatlas.org/.

Zhuang, X., 2013. Challenges and methodologies of fully automaticwhole heart segmentation: a review. Journal of healthcare engi-neering 4, 371–407.

Zhuang, X., Shen, J., 2016. Multi-scale patch and multi-modality at-lases for whole heart segmentation of MRI. Medical image analysis31, 77–87.

13