
Motion Artifact Recognition and Quantification in Coronary CT Angiography using Convolutional Neural Networks

T. Lossau (née Elss)^{a,b}, H. Nickisch^{a}, T. Wissel^{a}, R. Bippus^{a}, H. Schmitt^{a}, M. Morlock^{b}, M. Grass^{a}

^{a} Philips Research, Hamburg, Germany; ^{b} Hamburg University of Technology, Germany

Abstract

Excellent image quality is a primary prerequisite for diagnostic non-invasive coronary CT angiography. Artifacts due to cardiac motion may interfere with detection and diagnosis of coronary artery disease and render subsequent treatment decisions more difficult. We propose deep-learning-based measures for coronary motion artifact recognition and quantification in order to assess the diagnostic reliability and image quality of coronary CT angiography images. More specifically, the application, steering and evaluation of motion compensation algorithms can be triggered by these measures. A Coronary Motion Forward Artifact model for CT data (CoMoFACT) is developed and applied to clinical cases with excellent image quality to introduce motion artifacts using simulated motion vector fields. The data required for supervised learning is generated by the CoMoFACT from 17 prospectively ECG-triggered clinical cases with controlled motion levels on a scale of 0 to 10. Convolutional neural networks achieve an accuracy of 93.3% ± 1.8% for the classification task of separating motion-free from motion-perturbed coronary cross-sectional image patches. The target motion level is predicted by a corresponding regression network with a mean absolute error of 1.12 ± 0.07. Transferability and generalization capabilities are demonstrated by motion artifact measurements on eight additional CCTA cases with real motion artifacts.

Keywords: Cardiac CT, Motion Artifact Measure, Coronary Angiography, Convolutional Neural Network

1. Introduction

Non-invasive coronary computed tomography angiography (CCTA) has become a preferred technique for the detection and diagnosis of coronary artery disease (CAD) (Budoff et al., 2017; Foy et al., 2017; Camargo et al., 2017; Liu et al., 2017), but high quality imaging for small and moving vessels is still challenging. ECG-controlled acquisition is used to enable the reconstruction of heart phases with small motion level, and gating windows are limited to the temporal projection range required for back-projection. However, hardware constraints restrict the temporal resolution of the reconstructed CT image volumes. Despite ECG-triggering and -gating, cardiac motion frequently leads to artifacts in the reconstructed CT image volumes (Ghekiere et al., 2017). These artifacts manifest in typical patterns containing intensity undershoots and arc-shaped blurring due to the CT reconstruction geometry (see Figure 1) and potentially limit or even preclude the evaluation of parts of coronary arteries or cause misinterpretations.

Thus, motion correction algorithms to improve image quality of the coronary arteries have been an important research area for years. Several approaches have been developed which are based on motion estimation via 3-D/3-D registration of multiple heart phases and subsequent motion-compensated filtered back-projection (MC-FBP) (van Stevendaal et al., 2008; Isola et al., 2010; Bhagalia et al., 2012). An iterative motion compensation approach dealing with motion vector field (MVF) estimation by minimization of handcrafted motion artifact measures (MAMs) has been introduced by Rohkohl et al. (2013).

Due to possible failure modes and their substantial computational footprint, motion correction methods can benefit from a reliable measure of motion artifacts. First, the recognition and quantification of motion artifacts in the coronary artery tree during CCTA could decide whether and where motion correction is required to enable the diagnosis of CAD and prevent misinterpretations. Second, measures for motion artifacts could be used to verify the success of a motion compensation method. Finally, an integration of motion artifact measures in the motion compensation process as shown by Rohkohl et al. (2013) is conceivable.

Preprint submitted to ELSEVIER, November 5, 2018

Furthermore, assessment of the scan quality might be useful in automatic analysis of the coronary arteries, e.g. by reporting on the reliability of the coronary artery calcium score. A deep-learning-based system for the identification of coronary artery calcifications which are strongly affected by cardiac motion artifacts has been introduced by Sprem et al. (2017).

Most handcrafted measures for motion artifacts (Rohkohl et al., 2013; McGee et al., 2000) such as entropy and positivity are best suited for relative assessment, i.e. for the comparison of the same image region at different motion states. An absolute measure for consistent artifact quantification across patients and vessel segments has to be robust to possible variations in noise level, background intensity, vessel structure and contrast agent enhancement. The ability of five handcrafted metrics to quantify absolute motion artifact levels at the coronary arteries has been investigated by Ma et al. (2018). Besides the MAMs entropy and positivity, the three metrics normalized circularity, Fold Overlap Ratio and Low-Intensity Region Score are considered. These rely on a prior segmentation of the blurring artifacts or the intensity undershoot areas.

We propose a deep-learning-based method for the quantification of absolute motion artifact levels directly from coronary cross-sectional image patches. By a data-driven methodology, machine learning holds the promise of circumventing the challenging task of designing an appropriate handcrafted measure. Over the past few years, Convolutional Neural Networks (CNNs) have been driving advances in many image-related tasks such as pattern recognition, classification, segmentation, generation, synthesis, style transfer, and translation (Krizhevsky et al., 2012; Chen et al., 2016; Gatys et al., 2016). Also in the medical domain, CNN-based predictive models have shown great results (Litjens et al., 2017; Zreik et al., 2018; Ronneberger et al., 2015).

In preliminary work (Elss et al., 2018a,b), clinical data with synthetic motion artifacts is generated and used for a subsequent supervised learning process. In (Elss et al., 2018b), the feasibility of motion artifact recognition using CNNs has already been demonstrated. The most recent work (Elss et al., 2018a) furthermore deals with single-slice motion estimation based on the coronary artifact appearance and the angular reconstruction range.

(a) Axial plane of a step-and-shoot case reconstructed at diastolic rest phase. Severe motion artifacts at the right coronary artery (RCA) are highlighted in red. (b) Axial plane of a helical case reconstructed at diastolic rest phase. Severe motion artifacts at the right coronary artery (RCA) are highlighted in red.

Figure 1: Cardiac motion leads to differently shaped artifacts in helical and step-and-shoot CT scans. Motion artifacts in step-and-shoot CT scans are less complex as merely one coherent angular segment is used for the reconstruction of each voxel. Due to table movement and multi-cycle reconstruction, motion artifacts have a different appearance in CT scans with helical acquisition mode.

Extending this work, we propose a Coronary Motion Forward Artifact model for CT data (CoMoFACT). Beyond motion artifact recognition, we quantify motion artifact levels with CNNs. Our deep-learning-based motion artifact measures are created by the following steps:

1. The CoMoFACT generates the required input and label data for supervised learning by introducing simulated and hence controlled motion to clinical cases with excellent image quality, see Section 3.1.

2. Following Elss et al. (2018b), CNNs are trained to classify motion-free and motion-perturbed coronary cross-sectional image patches, see Sections 3.2 & 3.2.3.

3. CNNs are trained to predict the artifact level of coronary cross-sectional image patches, see Sections 3.2 & 3.2.4.

Finally, several experiments are performed to investigate the generalization capabilities of the resultant deep-learning-based motion artifact measures to clinical data with real artifacts (see Sections 4.2 & 4.3.2).

2. Material

2.1. Reference data

Contrast-enhanced cardiac CT data sets with excellent image quality are the basis of the proposed deep learning procedure. In addition to the reconstructed CT image volumes, we require the corresponding coronary artery trees and the raw projection data. Slice-by-slice visual inspection is performed to gather CCTA data sets which exhibit no coronary motion artifacts in the reconstructed cardiac CT image volume. In total, 17 prospectively ECG-triggered clinical data sets from different patients are selected as the reference point determining the no-motion state.

In all reference cases, acquisition was performed with a 256-slice CT scanner (Brilliance iCT, Philips Healthcare, Cleveland, OH, USA) using a step-and-shoot protocol and a gantry rotation speed of 0.272 s per turn. The restriction to step-and-shoot cases offers the advantage of generating artifacts in a well-controlled situation without table movement or multi-cycle reconstruction (see Figure 1). The mean heart rates of the patients, $\mathrm{HR}_{\mathrm{mean}}$, ranged from 45.2 bpm to 66.0 bpm during the acquisition. The cardiac CT image volumes are reconstructed at the mid-diastolic quiescent cardiac phase. The center of the cardiac gating window for the aperture-weighted cardiac reconstruction (AWCR) (Koken and Grass, 2006; van Stevendaal et al., 2007), hereafter called the reference cardiac phase $r$, is chosen between 70% and 80% of the R-R interval.

The coronary artery tree of each case is segmented using the Comprehensive Cardiac Analysis Software (IntelliSpace Portal 9.0, Philips Healthcare, Cleveland, OH, USA). It includes a set of centerline points $\vec{c} \in C$ with associated information on the corresponding cross-section, such as lumen contour and normal vector $\vec{n}_{\vec{c}}$ (centerline direction). As illustrated in Figure 2, the required input and label data for supervised learning is generated by applying the forward model presented in Section 3.1 to these 17 reference cases.

2.2. Test data

We collect eight clinical cases from different patients which exhibit real motion artifacts for testing purposes, to complement the artifact-free reference cases. Step-and-shoot data (five vessels) as well as helical data (three vessels) from the Brilliance iCT are considered, and corresponding centerlines are extracted by the Comprehensive Cardiac Analysis Software. The helical CCTA scans have an extended temporal scan range which enables retrospectively ECG-gated reconstruction of multiple heart phases. For these cases, the centerlines are extracted from the CT image volume reconstructed with 75% R-R as reference gating phase.

3. Methods

3.1. CoMoFACT

The Coronary Motion Forward Artifact model for CT data (CoMoFACT) takes a reconstructed CT image volume with corresponding raw projection data and segmented coronary artery tree as input and delivers locally motion-perturbed CT image volumes as output. The introduced motion level is determined by the control parameter $s \in \mathbb{R}^{+}$, hereafter called the target motion strength.

The CoMoFACT introduces simulated motion by applying the motion-compensated filtered back-projection (MC-FBP) algorithm (van Stevendaal et al., 2008; Schafer et al., 2006), which is briefly explained in Section 3.1.1. For each centerline point $\vec{c} \in C$ in the coronary artery tree, a continuous MVF $\vec{d}_{\vec{c}}$ with pre-selected target motion strength $s$ is created. Subsections 3.1.2, 3.1.3 and 3.1.4 detail the design of the synthetic MVF and underlying motion models. Subsequent MC-FBP delivers a CT image volume which is locally motion-perturbed around the corresponding centerline point. The reversed motion trajectory $\vec{d}_{\vec{c}}^{\,-1}$ corresponds to the simulated heart motion during acquisition.

After application of the CoMoFACT, one cross-sectional image patch is sampled perpendicular to the centerline (see Section 3.2.1) and finally added to the input data of the supervised learning process. As described in Sections 3.2.3 and 3.2.4, ground truth labels are defined by means of the utilized target motion strength $s$. In this way, the CoMoFACT enables the generation of the required input and label data for supervised learning.
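The data generation described above can be sketched as a loop over centerline points and target motion strengths. This is a minimal illustrative skeleton, not the authors' implementation: `create_mvf`, `mc_fbp` and `sample_patch` are hypothetical stubs standing in for the steps detailed in Sections 3.1.2–3.1.5 and 3.2.1.

```python
# Hypothetical sketch of the CoMoFACT data-generation loop (Section 3.1).
# The three helper functions are placeholder stubs, not the paper's code.
import numpy as np

def create_mvf(c, s, rng):
    """Stand-in for the synthetic MVF around centerline point c (Sec. 3.1.2)."""
    return {"center": c, "strength": s}

def mc_fbp(projections, mvf):
    """Stand-in for motion-compensated filtered back-projection (Sec. 3.1.1)."""
    return np.zeros((60, 60, 1)) + mvf["strength"]  # dummy sub-volume

def sample_patch(volume, c):
    """Stand-in for cross-sectional patch sampling at point c (Sec. 3.2.1)."""
    return volume

def generate_dataset(projections, centerline_points, strengths, seed=0):
    """Produce (patch, label) pairs; the target motion strength is the label."""
    rng = np.random.default_rng(seed)
    data = []
    for c in centerline_points:
        for s in strengths:                  # controlled motion levels, e.g. 0..10
            mvf = create_mvf(c, s, rng)
            sub_volume = mc_fbp(projections, mvf)
            patch = sample_patch(sub_volume, c)
            data.append((patch, s))
    return data
```

One (patch, label) pair per centerline point and motion strength is emitted, mirroring the paper's labeling by target motion strength.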

3.1.1. Motion-compensated filtered back-projection

The MC-FBP algorithm is an extension of the AWCR method. Both concepts are compared in Figure 3. In the AWCR, the attenuation coefficient


Figure 2: A Coronary Motion Forward Artifact model for CT data (CoMoFACT) is developed which enables the transformation of cardiac CT data sets with excellent image quality into locally motion-perturbed CT image volumes. Motion is introduced around each centerline point of the coronary artery tree by respectively creating a synthetic continuous MVF and applying the MC-FBP algorithm. The target motion strength $s$ is a control parameter which scales the length of the displacement vectors in the CoMoFACT. A coronary cross-sectional patch (highlighted in blue) is sampled perpendicular to the centerline from the locally motion-perturbed image sub-volume and used as input data for supervised learning. Corresponding ground truth labels are defined by means of the target motion strength $s$. Finally, CNNs are trained for motion artifact classification and artifact level regression on randomly rotated, mirrored and cropped cross-sectional patches (highlighted in green, gray and red).

$\mu(\vec{\nu})$ of each voxel $\vec{\nu} \in \Omega$ in the field of view $\Omega \subset \mathbb{R}^3$ is calculated by:

$$\mu(\vec{\nu}) = \int_{t_{\mathrm{start}}}^{t_{\mathrm{end}}} w_{\mathrm{AWCR}}(t, \vec{\nu})\, p_{\mathrm{filt}}(t, \vec{\nu})\, \mathrm{d}t \qquad (1)$$

The weighting function $w_{\mathrm{AWCR}}$ includes aperture weighting for avoidance of cone-beam artifacts, angular weighting for gated reconstruction and pi-partner normalization. The projection integral $p_{\mathrm{filt}}(t, \vec{\nu})$, which passes through the voxel $\vec{\nu}$ at time point $t \in [t_{\mathrm{start}}, t_{\mathrm{end}}]$, is re-binned to wedge geometry and high-pass filtered with a ramp filter. The variables $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$ denote the start time and the end time of the CT scan. The MC-FBP furthermore takes into account the estimated displacements $\vec{d}(t, \vec{\nu})$ of each voxel during acquisition:

$$\mu(\vec{\nu}) = \int_{t_{\mathrm{start}}}^{t_{\mathrm{end}}} w_{\mathrm{AWCR}}(t, \vec{\nu} + \vec{d}(t, \vec{\nu}))\, p_{\mathrm{filt}}(t, \vec{\nu} + \vec{d}(t, \vec{\nu}))\, \mathrm{d}t \qquad (2)$$

For this purpose, a reference motion state has to be chosen. MVFs are usually calculated by registration and interpolation to approximate the motion each image voxel has undergone between the reference time $t_0$ and the time each specific projection was acquired. Each voxel is moved accordingly before back-projection is actually done. Thus, MC-FBP leads to a compensation of correctly estimated motion, whereas the application of the MC-FBP with an artificial MVF on high quality cases induces motion artifacts.
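The difference between Eqs. (1) and (2) can be illustrated with a toy voxel-driven accumulation loop. This is a deliberately simplified sketch: `p_filt` is a dummy analytic stand-in for the filtered projection data, and the AWCR weighting details, wedge rebinning and ramp filtering are omitted.

```python
# Toy illustration of Eqs. (1) and (2): voxel-driven accumulation with and
# without motion compensation. Not the paper's reconstruction code.
import numpy as np

def p_filt(t, v):
    """Dummy filtered projection value seen by voxel position v at time t."""
    return np.cos(t) * v[0] + np.sin(t) * v[1]

def backproject(v, times, w, d=None):
    """Accumulate weighted projection values for one voxel v over the scan.
    d(t, v) is the estimated displacement; d=None reduces Eq. (2) to Eq. (1)."""
    total = 0.0
    dt = times[1] - times[0]
    for t in times:
        v_t = v if d is None else v + d(t, v)  # Eq. (2): shift voxel by d(t, v)
        total += w(t, v_t) * p_filt(t, v_t) * dt
    return total
```

With a zero displacement field the motion-compensated accumulation coincides with the static one, matching the remark that $s = 0$ (and hence no motion) reproduces the input reconstruction.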

Figure 3: Schematic drawing of voxel-driven back-projection without and with motion compensation. (a) The simple back-projection procedure of AWCR relies on the assumption that the object is static during acquisition. Inconsistent projection data caused by motion leads to artifacts in the reconstructed CT image volume. (b) In case of MC-FBP, moving voxel positions are considered and line integrals are spatially corrected in the back-projection procedure.


3.1.2. Synthetic motion vector field

The continuous MVF $\vec{d}_{\vec{c}} : [0\%, 100\%] \times \Omega \rightarrow \mathbb{R}^3$ describes the displacement of each voxel coordinate $\vec{\nu} \in \Omega \subset \mathbb{R}^3$ in the CT volume at each time point $t_{cc} \in [0\%, 100\%]$ in millimeters. Due to the simulation of periodic motion, time is measured in percent cardiac cycle here. The artificial MVF is defined by three separable components:

$$\vec{d}_{\vec{c}}(t_{cc}, \vec{\nu}) = s \cdot m_{\vec{c}}(\vec{\nu}) \cdot \vec{\delta}_{\vec{c}}(t_{cc}) \qquad (3)$$

The first component is the pre-selected target motion strength $s$, which scales the length of each displacement vector in the continuous MVF. The role of $s$ as motion level regulator is illustrated in Figure 2. Whenever $s = 0$, no motion is introduced and the CoMoFACT delivers the input CT image volume without motion artifacts as output.

The second component is the location-dependent weighting mask $m_{\vec{c}} : \Omega \rightarrow [0, 1]$, which restricts the motion to a limited area around the currently processed centerline point $\vec{c}$ and additionally forces the MVF to be spatially smooth. It is defined as a 3D trapezoidal function generated by binary dilation of the centerline point and subsequent uniform filtering. A kernel radius of 15 mm for dilation and a uniform filter size of 12.4 mm × 12.4 mm are chosen in the following experiments. The limitation of the motion area is required to prevent undesired motion artifacts from peripheral structures like bones. The smoothing is necessary to avoid reconstruction artifacts, as elastic tissue structure forbids abrupt changes of motion in a local neighborhood.

The third component $\vec{\delta}_{\vec{c}} : [0\%, 100\%] \rightarrow \mathbb{R}^3$ defines the motion direction for each point in time. It is obtained by piecewise linear interpolation between five sample vectors $\vec{\delta}_i \in \mathbb{R}^3$, $i \in \{-2, -1, 0, +1, +2\}$. The corresponding phase points $t_i \in \{r - 10\%, r - 5\%, r, r + 5\%, r + 10\%\}$ are assigned around the reference heart phase $r$ of the input CT volume with a temporal distance of 5% cardiac cycle. The temporal projection range required for reconstruction depends on the heart rate and the gantry rotation speed. For the given data sets, the angular weighting window is narrower than 20% cardiac cycle, so no extrapolation has to be performed.

A schematic drawing of the artificial MVF is given in Figure 2. The displacement vectors (light red arrows) are linearly interpolated in the time domain from the sample vectors (dark red arrows) to obtain the motion state at some $t_{cc} \in [r - 10\%, r + 10\%]$. For a phase point $t_{cc}$, the motion directions are spatially constant, while the displacement length decreases with increasing distance to the currently processed centerline point $\vec{c}$ (highlighted in blue).
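The piecewise linear interpolation of the five sample vectors might look as follows; `np.interp` is applied per coordinate, and the function name and the sample vectors in the usage example are purely illustrative.

```python
# Sketch of the temporal component delta_c(t) (Section 3.1.2): piecewise
# linear interpolation between five sample vectors at phases
# r-10%, r-5%, r, r+5%, r+10% of the cardiac cycle.
import numpy as np

def delta(t_cc, r, sample_vectors):
    """sample_vectors: (5, 3) array for the phases r-10% ... r+10%."""
    phases = r + np.array([-10.0, -5.0, 0.0, 5.0, 10.0])   # percent cardiac cycle
    sv = np.asarray(sample_vectors, dtype=float)
    # interpolate each spatial coordinate independently
    return np.array([np.interp(t_cc, phases, sv[:, k]) for k in range(3)])
```

At the reference phase the middle sample vector (zero for both sub-models) is returned, so the reference state is motion-free.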

Two model variants are presented in the following Subsections 3.1.3 and 3.1.4, which differ in terms of the sample vector definition. Both concepts are compared in Figure 4.

Figure 4: Schematic drawing of the time-dependent motion trajectories (dashed blue lines) determined by the sample vectors $\vec{\delta}_i$ (red arrows) for both sub-models. (a) The piecewise linear motion model comprises random displacement directions and varying velocities. (b) The constant linear motion model is restricted to a predefined motion direction and equidistant sample vectors.

3.1.3. Piecewise linear motion model

The first model variant of piecewise linear motion has been introduced in Elss et al. (2018b) and was developed for the classification task of separating no-artifact and artifact coronary cross-sectional patches. The sample vectors are calculated by:

$$\vec{\delta}_i = \frac{\vec{\rho}_i}{\max_{j,k} \|\vec{\rho}_j - \vec{\rho}_k\|_2} \qquad (4)$$

The motion directions are given by random uniform vectors $\vec{\rho}_i \sim \mathcal{U}[-1, 1]^3$ for $i \in \{-2, -1, +1, +2\}$, and the center heart phase defines the reference state of no motion, i.e. $\vec{\rho}_0 = \vec{0}$. Normalization of the random uniform vectors $\vec{\rho}_i$ is performed so that the target motion strength $s$ finally corresponds to the maximal displacement during 20% of the R-R interval in millimeters (see Figure 4a).
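Eq. (4) can be sketched as follows. The function name is hypothetical; array index 2 plays the role of $i = 0$, and the normalization divides by the largest pairwise distance between sample vectors so that $s$ scales to the maximal displacement.

```python
# Sketch of Eq. (4): random uniform sample vectors, normalized by the
# maximal pairwise distance; rho_0 = 0 fixes the reference state of no motion.
import numpy as np

def piecewise_sample_vectors(rng):
    rho = rng.uniform(-1.0, 1.0, size=(5, 3))      # rho_i for i = -2 ... +2
    rho[2] = 0.0                                   # reference phase: no motion
    diffs = rho[:, None, :] - rho[None, :, :]      # all pairwise rho_j - rho_k
    max_dist = np.linalg.norm(diffs, axis=-1).max()
    return rho / max_dist                          # delta_i as in Eq. (4)
```

After normalization the maximal pairwise distance is exactly one, so multiplying by $s$ in Eq. (3) yields a maximal displacement of $s$ millimeters.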

The choice of random uniform vectors for $\vec{\rho}_i$ enables complex motion trajectories, and application of the CoMoFACT leads to realistic-looking motion artifact patterns with differently shaped blurring and intensity undershoots (see Figure 5a). The target motion strength $s$ can be observed as a weak surrogate of the visual artifact level. For deep-learning-based motion artifact quantification, the motion model is adapted with the objective of increasing the correlation between the selected target motion strength $s$ and the resulting visual artifact level. Compared to (Elss et al., 2018b), the CoMoFACT is extended for cross-section-wise motion corruption under consideration of the mean heart rate and the angular reconstruction range.

3.1.4. Constant linear motion model

The second model variant is developed for the regression task of predicting the artifact level in coronary cross-sectional patches. Multiple factors besides the motion level during acquisition have an impact on the artifact level. The phantom study in Figure 6 shows that the relation between motion direction and the angular reconstruction range is essential. The most severe artifacts occur in case of motion which is orthogonal to the mean reconstruction direction (highlighted in gray). In addition, the visual artifact level depends on surrounding background intensities, the temporal resolution required for reconstruction and the relation between motion direction and vessel orientation.

In contrast to the classification model variant presented in Section 3.1.3, severe restrictions are made to consider each of the aforementioned influencing factors except for surrounding background intensities. The sample vectors are now calculated by the following formula:

$$\vec{\delta}_i = \frac{60\ \mathrm{bpm}}{\mathrm{HR}_{\mathrm{mean}}} \cdot \frac{i}{4} \cdot \frac{\vec{\rho}_{\mathrm{orth}}}{\|\vec{\rho}_{\mathrm{orth}}\|_2} \qquad (5)$$

As illustrated in Figure 4b, the regression model is limited to constant linear motion. It takes the mean heart rate $\mathrm{HR}_{\mathrm{mean}}$ of each data set during acquisition into account to force homogeneous velocities among the clinical input cases. In contrast to the previous classification model, the motion direction now depends on the currently processed centerline point $\vec{c}$.

The motion direction determined by $\vec{\rho}_{\mathrm{orth}}$ is defined as the cross product of the normal vector $\vec{n}_{\vec{c}}$ of the corresponding centerline segment and the mean reconstruction direction in the axial plane (see Figure 6, right). The mean reconstruction direction is computed by means of the gantry rotation angle at the center of the cardiac gating window and is constant for each voxel reconstructed by the same circular scanning shot. In case of helical acquisition trajectories, voxels are not necessarily reconstructed by one coherent angular segment. So, in contrast to the classification model, the regression model with its orthogonal displacement directions is not directly transferable to helical cases.
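Eq. (5) and the cross-product construction of $\vec{\rho}_{\mathrm{orth}}$ can be sketched as follows, assuming the centerline normal vector and the mean reconstruction direction are available as 3-D vectors; the function name is hypothetical.

```python
# Sketch of Eq. (5): equidistant sample vectors along the direction
# orthogonal to the mean reconstruction direction, with velocities
# homogenized via the mean heart rate.
import numpy as np

def constant_sample_vectors(n_c, mean_recon_dir, hr_mean):
    """n_c: centerline normal vector; mean_recon_dir: in-plane direction."""
    rho = np.cross(n_c, mean_recon_dir)            # rho_orth
    rho = rho / np.linalg.norm(rho)                # unit motion direction
    scale = 60.0 / hr_mean                         # heart-rate normalization
    return np.array([scale * (i / 4.0) * rho for i in (-2, -1, 0, 1, 2)])
```

For a vessel running along the z-axis and an in-plane reconstruction direction, the resulting motion lies in the axial plane and is symmetric around the motion-free reference phase.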

Figure 5b shows coronary cross-sectional images with varying target motion strength $s$ generated by the constant linear motion model. In comparison to the corresponding outputs of the piecewise linear model in Figure 5a, the data looks more consistent and the target motion strength $s$ can be interpreted as an approximate measure for the artifact level. However, the severely restricted constant linear motion model merely allows for a specific artifact appearance (banana-shaped blurring), whereas the more complex trajectories of the piecewise linear motion model also generate bird-shaped artifacts. Therefore, the risk of overfitting should be considered in the evaluation of the regression networks.

3.1.5. Sub-volume reconstruction

For the final extraction of one cross-sectional image patch per motion-perturbed image volume, merely a limited area around the currently processed centerline point $\vec{c}$ is of interest. Therefore, the FOV $\Omega = \Omega(\vec{c})$ is restricted in the CoMoFACT in order to speed up the data generation process. The selected centerline point $\vec{c}$ defines the center of the restricted FOV, and the FOV size is determined by the patch size during the subsequent sampling process (see Section 3.2.1). In addition, the reconstruction of a sub-volume instead of the full input CT image geometry is reasonable in terms of memory requirements.

3.2. Supervised learning

The proposed CoMoFACT enables the generation of multiple motion-perturbed CT image (sub-)volumes $I^{\vec{c},s}_{\Omega}$ with controlled motion level $s$ at specific coronary centerline points $\vec{c} \in C$. On the basis of velocity measurements at the coronary arteries by Vembar et al. (2003), the data generation process for the supervised learning task is limited in the following experiments to maximal displacements of 10 millimeters during 20% of the cardiac cycle.

3.2.1. Patch sampling

The sampling process of the input data is illus-trated in Figure 2. One cross-sectional image patchI~c,s100 of size 100× 100× k voxels (blue box) is sam-pled by trilinear interpolation from each output CTimage volume I~c,sΩ of the CoMoFACT with a resolu-tion of 0.4×0.4×0.4 millimeters per voxel. The first

6

Page 7: Motion Artifact Recognition and Quanti cation in Coronary CT ...Keywords: Cardiac CT, Motion Artifact Measure, Coronary Angiography, Convolutional Neural Network 1. Introduction Non-invasive

s= 0 s= 1 s= 2 s= 3 s= 4 s= 5 s= 6 s= 7 s= 8 s= 9 s= 10

(a) Output patches of the piecewise linear motion model from Section 3.1.3.

s= 0 s= 1 s= 2 s= 3 s= 4 s= 5 s= 6 s= 7 s= 8 s= 9 s= 10

(b) Output patches of the constant linear motion model from Section 3.1.4.

Figure 5: Coronary cross-sectional image patches are sampled from motion-perturbed CT image volumes, which are generatedby means of the CoMoFACT. Each row shows the same cross-section of size 60 × 60 pixels in different motion states. Bothsub-models of the CoMoFACT are compared and reveal the trade-off between motion model complexity and suitability of s asartifact level surrogate. (a) The piecewise linear motion model from Section 3.1.3 leads to a wide range of coronary artifactappearances, but the visual coronary artifact level is not monotonically increasing with the underlying target motion strengths. Patches highlighted in green and red are assigned to the classes no artifact or artifact respectively. Non-highlighted patcheswith a target motion strength s between two and five are excluded from the learning process. (b) The constant linear motionmodel from Section 3.1.4 leads to a limited range of coronary artifact appearances, but the underlying target motion strength shighly correlates with the visual artifact level at the coronary arteries. All patches are included as input data in the regressionlearning process with s as their corresponding label.

7

Page 8: Motion Artifact Recognition and Quanti cation in Coronary CT ...Keywords: Cardiac CT, Motion Artifact Measure, Coronary Angiography, Convolutional Neural Network 1. Introduction Non-invasive

RMSE = 3.39RMSE = 3.37RMSE = 3.05RMSE = 2.50no motion

Figure 6: Constant linear motion is introduced in the axial plane of a vessel phantom using the CoMoFACT. Depending on therelation of motion direction (red arrow) and mean reconstruction direction (gray dashed arrow), motion artifacts with varyinglevel occur. The artifact level is measured by the root mean squared error (RMSE) with respect to the motion-free phantomplane (left). Orthogonal motion (right) leads to most severe motion artifacts in the reconstructed image.

two patch dimensions are aligned perpendicular to the centerline segment while the third dimension is oriented along the normal vector ~n_~c. The center slice of each image patch covers the processed centerline point ~c. Single-slice (k = 1) and multi-slice (k > 1) patches are tested as input data in the following experiments. The grey values of each patch are clipped to the relevant intensity range with a window/level setting of 900/200 HU and additionally normalized (from [−250, 650] to [−1, 1]).
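The clipping and normalization step can be sketched as follows (a minimal numpy sketch; the function name is illustrative, and the [−250, 650] HU range is derived from the 900/200 window/level setting stated above):

```python
import numpy as np

def normalize_patch(patch_hu, window=900.0, level=200.0):
    """Clip a patch to the CT display window and rescale to [-1, 1].

    A window/level of 900/200 HU corresponds to the intensity
    range [-250, 650] HU, matching the setting in the text.
    """
    lo, hi = level - window / 2.0, level + window / 2.0   # [-250, 650]
    clipped = np.clip(patch_hu, lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0         # map to [-1, 1]
```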

3.2.2. Data augmentation

Due to the patch similarity of adjacent centerline points, the data for training, validation and testing are case-wise separated with a ratio of 9 : 4 : 4. During the training process, the database is extended by online data augmentation. Motion artifacts are variable in shape, orientation and position. In order to build this invariance into the neural network, the following transformations are performed in the cross-sectional plane, i.e. limited to the first two dimensions:

Cropping: The CoMoFACT may cause small vessel shifts compared to the original coronary centerline position. Image translation by cropping is necessary to avoid a bias from the in-plane coronary position. Therefore, sub-patches of the size 85 × 85 × k are randomly cropped from I^100_~c,s.

Rotating: The sub-patches of the size 85 × 85 × k are randomly rotated by 0 to 360 degrees. The center patch of the size 60 × 60 × k is finally cropped to ensure full image contents.

Mirroring: Horizontal mirroring is performed with a probability of 0.5.

The final image patches I^60_~c,s of size 60 × 60 × k voxels are used as input data for supervised learning. During validation and testing merely center cropping is performed.
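The augmentation chain can be sketched as follows (assuming, as the notation I^100_~c,s suggests, that the sampled patch is 100 × 100 × k; scipy's `rotate` stands in for whatever interpolation the original pipeline used). Note that 85 ≥ 60·√2 ≈ 84.85, so the rotated 85 × 85 sub-patch always fully covers the final 60 × 60 center crop:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_patch(patch100, rng):
    """Online augmentation sketch for one 100 x 100 x k input patch,
    following the crop -> rotate -> center-crop -> mirror scheme."""
    # Random 85 x 85 sub-patch removes the in-plane position bias.
    y0, x0 = rng.integers(0, 100 - 85 + 1, size=2)
    sub = patch100[y0:y0 + 85, x0:x0 + 85, :]
    # Random in-plane rotation (first two dimensions only).
    sub = rotate(sub, angle=rng.uniform(0, 360), axes=(0, 1),
                 reshape=False, order=1, mode="nearest")
    # Center crop to 60 x 60; the rotated content fully covers this region.
    off = (85 - 60) // 2
    out = sub[off:off + 60, off:off + 60, :]
    # Horizontal mirroring with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    return out
```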

3.2.3. Classification

Following Elss et al. (2018b), the database for the classification task of separating artifact and no artifact cross-sectional patches is generated by applying the proposed piecewise linear motion model of Section 3.1.3 seven times (with different target motion strengths s) per coronary centerline point. The target label y_class (0: no artifact, 1: artifact) of each image patch I^60_~c,s is defined by the corresponding utilized target motion strength:

y_class =  0, if s ∈ {0, 1};
           1, if s ∈ {6, 7, 8, 9, 10}.          (6)

The gap in s is chosen to ensure better class separation. Only a subset of two fifths of the samples from the class artifact is randomly selected and included in the learning process, in order to enforce balanced classes. By this procedure, a total amount of 14 724 samples is collected as the classification database.
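The labeling and class-balancing rule of Eq. (6) can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def make_classification_labels(strengths, rng):
    """Map target motion strengths s to binary labels and balance classes.

    s in {0, 1}        -> class 0 (no artifact)
    s in {6, ..., 10}  -> class 1 (artifact), subsampled to two fifths
    s in {2, ..., 5}   -> excluded (margin for better class separation)
    """
    strengths = np.asarray(strengths)
    keep_neg = np.flatnonzero(strengths <= 1)
    pos = np.flatnonzero(strengths >= 6)
    # Two strengths feed class 0 but five feed class 1, so keep 2/5 of class 1.
    keep_pos = rng.choice(pos, size=int(len(pos) * 2 / 5), replace=False)
    idx = np.concatenate([keep_neg, keep_pos])
    labels = (strengths[idx] >= 6).astype(int)
    return idx, labels
```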

Multiple hyperparameter settings and network architectures, including the ResNet (He et al., 2016) and VGG-inspired networks (Simonyan and Zisserman, 2015), were tested by extensive cross-validation. The highest validation results are achieved by a feed-forward 20-layer ResNet, which is employed in all subsequent experiments. Figure 7 illustrates the network architecture. The learning process is driven by the cross-entropy loss. The Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 0.05, a minibatch size of 32 and a momentum of 0.8 is defined as the learning setup.



[Figure 7 diagram: Input: 60 × 60 × k → Conv: 3 × 3, 32 + BN + ReLU → 3 × Basic_block: 32 → Stride_block: 64 → 2 × Basic_block: 64 → Stride_block: 128 → 2 × Basic_block: 128 → Global avg pool → Dense: 2 → Softmax → Output: 2. A Basic_block contains two 3 × 3 convolutions with BN and ReLU and an identity shortcut; a Stride_block halves the resolution with a 3 × 3, /2 convolution followed by a 3 × 3 convolution, plus a 1 × 1, /2 projection shortcut. Notation: kernel size, number of filters, stride; BN = Batch Normalization, ReLU = Rectified Linear Unit.]

Figure 7: The 20-layer ResNet takes coronary cross-sectional image patches of the size 60 × 60 × k as input. In the last layer the soft-max function is used as the activation function of the two (positive and negative) output nodes. The projection shortcuts are realized as 1 × 1 convolutions. In case k > 1 the convolutional kernel sizes are enlarged from (3 × 3 × 1) to (3 × 3 × 3).
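The residual blocks of the architecture can be sketched in plain numpy as follows (a minimal forward-pass sketch; batch normalization is omitted for brevity, the function names are illustrative, and the actual implementation used CNTK):

```python
import numpy as np

def conv3x3(x, w, stride=1):
    """3 x 3 convolution with zero padding 1. x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    Ho, Wo = (H + stride - 1) // stride, (W + stride - 1) // stride
    y = np.zeros((Ho, Wo, w.shape[-1]))
    for i in range(Ho):
        for j in range(Wo):
            window = xp[i * stride:i * stride + 3, j * stride:j * stride + 3, :]
            y[i, j] = np.tensordot(window, w, axes=([0, 1, 2], [0, 1, 2]))
    return y

def basic_block(x, w1, w2):
    """Residual block with identity shortcut: ReLU(x + conv(ReLU(conv(x))))."""
    h = np.maximum(conv3x3(x, w1), 0)
    return np.maximum(x + conv3x3(h, w2), 0)

def stride_block(x, w1, w2, w_proj):
    """Downsampling block: stride-2 conv path plus a 1 x 1, stride-2 projection shortcut."""
    h = np.maximum(conv3x3(x, w1, stride=2), 0)
    proj = x[::2, ::2, :] @ w_proj  # 1 x 1 convolution with stride 2; w_proj: (C_in, C_out)
    return np.maximum(proj + conv3x3(h, w2), 0)
```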

Finally, the trained neural network NNclass takes a cross-sectional patch I^60_~c,s as input and delivers a predicted artifact probability as output.

3.2.4. Regression

The database for the regression task of predicting the artifact level in cross-sectional patches is generated by applying the proposed constant linear motion model of Section 3.1.4 eleven times per coronary centerline point. The target label y_regr = s ∈ {0, 1, . . . , 10} of each image patch I^60_~c,s is set equal to the corresponding utilized target motion strength. By this discrete equidistant labeling procedure, a total amount of 40 491 samples is collected as the regression database.

Except for the reduction of the output neurons in the last layer from two to one and the replacement of the soft-max function by a simple linear activation, no adaptation of the network architecture is made compared to Figure 7. The initial learning rate is changed to 5 · 10^−4 while the remaining hyperparameters remain unchanged.

The neural network NNregr takes a cross-sectional patch I^60_~c,s as input and delivers a prediction y ∈ R as output. The learning process is driven by a piecewise L1 loss function:

l(y_regr, y) =  max(0, y),       if y_regr = 0;
                |y_regr − y|,    if 0 < y_regr < 10;
                max(0, 10 − y),  if y_regr = 10.        (7)

This loss function adapts the penalization at the boundaries, so that predictions outside the target interval [0, 10] are not penalized, in order to avoid overly conservative predictions. In comparison to network training with the simple L1 loss, the regressors more often dare to output values near zero or ten. Clipping of the network output finally delivers the predicted artifact level s = min(max(0, y), 10) ∈ [0, 10], which is used for the following evaluation.
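Eq. (7) and the final clipping step translate directly into code (a minimal sketch; function names are illustrative):

```python
def piecewise_l1(y_true, y_pred):
    """Boundary-aware L1 loss from Eq. (7): no penalty for predictions
    below 0 when the target is 0, or above 10 when the target is 10."""
    if y_true == 0:
        return max(0.0, y_pred)
    if y_true == 10:
        return max(0.0, 10.0 - y_pred)
    return abs(y_true - y_pred)

def predicted_level(y_pred):
    """Clip the raw network output to the valid artifact range [0, 10]."""
    return min(max(0.0, y_pred), 10.0)
```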

4. Experiments and Results

4.1. Quantitative error analysis

For all experiments, the Microsoft Cognitive Toolkit (CNTK v2.5, Microsoft Research, Redmond, WA, USA) is used as the deep learning framework. A bagging approach is applied for quantitative evaluation which comprises the following steps:

1. Four validation cases and four test cases are randomly sampled.

2. Network training is performed based on the remaining nine clinical cases.

3. After every epoch of the learning process the generalization capability is examined by means of the validation set.

4. The model with the highest performance on the validation set during 60 epochs of training is selected for calculation of the test accuracy (or test error).

5. Steps 1.-4. are performed five times in total.

6. The mean test accuracy (or the mean test error) over the five splits is calculated.

7. Steps 5.-6. are performed for k ∈ {1, 3, 5, 7} (with the same separations in training, validation and testing for comparability).
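The protocol above can be sketched as follows, where `train_fn` and `score_fn` are hypothetical stand-ins for the actual CNTK training and scoring routines (`train_fn(train, epochs)` is assumed to return one model per epoch):

```python
import random

def bagging_evaluation(cases, train_fn, score_fn, n_splits=5, n_epochs=60):
    """Sketch of the bagging evaluation protocol (steps 1-6)."""
    rng = random.Random(0)
    test_scores = []
    for _ in range(n_splits):
        # Steps 1-2: random 4/4 validation/test split, train on the remaining 9.
        shuffled = rng.sample(cases, len(cases))
        val, test, train = shuffled[:4], shuffled[4:8], shuffled[8:]
        # Steps 3-4: keep the per-epoch model with the best validation score.
        best_model = max(train_fn(train, n_epochs),
                         key=lambda model: score_fn(model, val))
        test_scores.append(score_fn(best_model, test))
    # Steps 5-6: average the test scores over the splits.
    return sum(test_scores) / n_splits
```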

The test results of the classification and the regression networks are summarized in Table 1. The networks' performances increase with the number of



Table 1: Test results including mean and standard deviation of the classification accuracy and the absolute regression error for single-slice and multi-slice input data.

number of slices | classification accuracy | regression error
k = 1            | 91.64% ± 1.63%          | 1.38 ± 0.17
k = 3            | 92.08% ± 2.12%          | 1.16 ± 0.06
k = 5            | 92.70% ± 2.18%          | 1.14 ± 0.07
k = 7            | 93.26% ± 1.82%          | 1.12 ± 0.07

input slices. The additional information in multi-slice patches seems to provide a benefit, e.g. in the differentiation between bifurcations and blurring artifacts. However, the higher memory requirements and execution time have to be set against it. In case of k = 7, the classification result splits into a ratio of 46.90% : 46.36% : 3.64% : 3.10% for the rates TN : TP : FN : FP, where positive refers to the class artifact. Figure 8 shows the confusion matrix of the corresponding regression network. A clear diagonal structure with little scattering of the predicted labels is observable.

[Figure 8 data, row-normalized; rows: true label s = 0, ..., 10, columns: predicted label 0, ..., 10:
 0:  0.65 0.25 0.06 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00
 1:  0.42 0.35 0.15 0.04 0.02 0.01 0.00 0.00 0.00 0.00 0.00
 2:  0.10 0.24 0.31 0.20 0.08 0.03 0.02 0.01 0.00 0.00 0.00
 3:  0.01 0.07 0.19 0.31 0.24 0.10 0.05 0.02 0.01 0.00 0.00
 4:  0.00 0.02 0.06 0.19 0.31 0.23 0.11 0.05 0.02 0.01 0.00
 5:  0.00 0.01 0.02 0.08 0.18 0.30 0.22 0.11 0.05 0.01 0.01
 6:  0.00 0.01 0.01 0.03 0.09 0.19 0.27 0.23 0.12 0.04 0.01
 7:  0.00 0.00 0.01 0.02 0.04 0.10 0.19 0.27 0.22 0.11 0.04
 8:  0.00 0.00 0.01 0.01 0.03 0.05 0.11 0.20 0.26 0.21 0.11
 9:  0.00 0.00 0.00 0.01 0.02 0.04 0.07 0.14 0.21 0.25 0.25
 10: 0.00 0.00 0.00 0.01 0.02 0.03 0.06 0.10 0.17 0.22 0.40
 mean abs. error: 1.12 ± 0.07]

Figure 8: Confusion matrix of the regression network for multi-slice input data (k = 7) with rd denoting the rounding operator.

The presented quantitative results show that CNNs are able to identify artifact patterns from synthetically introduced motion. To further evaluate the generalization capabilities and the performance of the learned motion artifact measures in clinical practice, additional qualitative experiments are performed. In comparison with the handcrafted MAMs entropy and positivity from Rohkohl et al. (2013), the abilities for measuring relative and absolute levels of motion artifacts are verified.

4.2. Relative artifact measurement

In the following, it is investigated whether the deep-learning-based artifact measures are able to identify the cardiac phase of a clinical data set with the least motion artifacts. A quantitative study by Vembar et al. (2003) has shown that minimum velocities at the right coronary artery (RCA) can be observed in the mid-diastolic cardiac phase (between 70% and 80% R-R). Therefore, increasing temporal distance to this cardiac phase should go along with increasing artifact levels.

The three helical test cases (see Section 2.2) are reconstructed at multiple cardiac phases using the AWCR algorithm. The selected cardiac phases are arranged around the 75% R-R interval (mid-diastole) with a temporal distance of 2% R-R interval. Cross-sectional patches of size 60 × 60 × 7 are sampled along the RCA as input for the deep-learning-based and the handcrafted motion artifact measures. Finally, the mean motion artifact measures across the entire vessel are computed for each reconstructed phase image. In this experiment, the bagging ensembles of the five classification and the five regression networks with k = 7 are selected. The calculated motion artifact measures are scaled to the value range [0, 1] to provide comparability. Therefore, the predicted artifact levels s ∈ [0, 10] of the regression network NNregr are down-scaled by a factor of ten. The handcrafted MAMs are normalized to the interval [0, 1] by the minimal and maximal output over all motion states. As the classification network NNclass already delivers predicted artifact probabilities, these remain unchanged.
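The min-max scaling of the handcrafted MAM outputs and the subsequent best-phase selection can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def best_phase(phase_scores):
    """Scale per-phase, vessel-averaged measure outputs to [0, 1] by the
    minimal and maximal output over all motion states, then pick the phase
    with the lowest predicted artifact level.

    `phase_scores` maps a cardiac phase (% R-R) to the mean measure output.
    """
    phases = sorted(phase_scores)
    vals = np.array([phase_scores[p] for p in phases], dtype=float)
    scaled = (vals - vals.min()) / (vals.max() - vals.min())
    return phases[int(np.argmin(scaled))], dict(zip(phases, scaled))
```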

Figure 9 shows the results of the proposed multi-phase experiment. In the first case (Figure 9a), all motion artifact measures (handcrafted and deep-learning-based) provide similar results. The predicted best cardiac phases around 72−76% R-R comply with the visual inspection. In the second case (Figure 9b), only the positivity and the deep-learning-based measures deliver predictions of the best cardiac phase which concur with the visual impression. However, a weakness of the neural networks can be discerned. The modulation of the radiation dose leads to lower signal-to-noise ratios (SNRs) at the marginal cardiac phases around 60% R-R. The trained neural networks seem to be more



[Figure 9 plots: mean scaled measure outputs of NNclass, NNregr, entropy and positivity over the cardiac phases 60%-90% R-R (2% steps) for (a) Case 1, (b) Case 2 and (c) Case 3, each with one axial slice per phase shown below.]

Figure 9: The mean motion artifact level of the RCA is analyzed in three cardiac CT cases with helical acquisition mode by the deep-learning-based and the handcrafted motion artifact measures. For each reconstructed CT image volume within the phase interval [60%, 90%], cross-sectional image patches are sampled based on the centerline segmented at the reference phase 75% R-R. The outputs of the motion artifact measures are averaged, scaled to the interval [0, 1] and visualized in the upper row. The predictions of the best cardiac phases are indicated by colored dots, respectively. One axial slice per selected cardiac phase is given below for visual inspection.

vulnerable to such SNR fluctuations than the handcrafted measures. In the third case (Figure 9c), the regression network is in agreement with both handcrafted MAMs with respect to the predicted best cardiac phase at 76% R-R. The classification network selects an earlier phase around 66% R-R in which hardly any artifacts occur either. A temporally extended rest phase is observable which is discovered by the trained neural networks.

The results of this multi-phase experiment are promising given the fact that the deep-learning-based measures are solely trained on step-and-shoot data which is perturbed by constant or piecewise linear motion. The generalization capabilities of the CNNs and the transferability to helical data sets with real motion artifacts are demonstrated by this experiment.

4.3. Absolute artifact measurement

In the following, it is investigated whether the deep-learning-based artifact measures are able to detect a region of motion, given the approximate location of the coronary artery.

4.3.1. Evaluation on synthetic motion artifacts

Local motion is introduced to test reference cases at arbitrary points in the coronary tree by means of the piecewise linear motion model proposed in Section 3.1.3. A target motion strength of s = 8 is selected in the following experiments. For each centerline point of the corresponding vessel, a cross-sectional patch of size 60 × 60 × 7 is sampled from the locally motion-perturbed CT image volume I^Ω_~c,s and the corresponding motion artifact measures are calculated. The classification and regression networks are selected so that the currently processed



[Figure 10 plots: section-wise outputs of m~c, NNclass, NNregr, entropy and positivity along the curved MPR for (a) Case 1, vessel = LAD; (b) Case 1, vessel = RCA; (c) Case 2, vessel = LAD; (d) Case 2, vessel = RCA.]

Figure 10: Section-wise outputs of the deep-learning-based and the handcrafted motion artifact measures are calculated for four vessels without and with synthetic motion artifacts. In each subplot, the no motion state is given as reference above the local motion state. Stack transition artifacts (a)&(d), bifurcations (b) and severe calcifications (c) are indicated by white arrows in the MPRs. The weighting mask value m~c, marked in red, corresponds to the true relative displacement width of each centerline point and determines the area of introduced motion. Corresponding cross-sectional image patches are given below for visual inspection. In contrast to the handcrafted MAMs, high predictions made by the neural networks are mostly correctly located at the regions of motion influence.



test case has neither been used for training nor for validation.

Figure 10 shows the results of this local motion experiment for two test cases. The left anterior descending artery (LAD) and the right coronary artery (RCA) are processed, respectively. In each subplot, the original no motion state is visualized as reference above the local motion state. The x-axis corresponds to the spatial coordinate along the centerline. The value range [0, 1] of the y-axis is determined by the weighting mask values m~c ∈ [0, 1] which correspond to the level of introduced motion. In each subplot, the calculated motion artifact measures are scaled accordingly to provide comparability again.

As expected, the handcrafted MAMs are not suitable for section-wise motion artifact quantification, due to their limited robustness regarding variations in background intensities. The deep-learning-based measures, by contrast, accurately detect the regions of motion with few exceptions. Both networks are robust towards vessel shifts at stack transitions (see Figure 10a). Bifurcations (see Figure 10b), calcifications (see Figure 10c) and varying contrast levels between scanning sequences (see Figure 10d) do not affect the deep-learning-based measures either. The aforementioned image areas are highlighted in the multi-planar reformations (MPRs) by white arrows. The lowest artifact level in the motion area is predicted by the regression network in Figure 10d, which is also confirmed by visual inspection of the four motion-perturbed MPRs. This experiment already demonstrates the generalization capability of the regression network which is trained on perturbed data with constant linear motion and tested on data with more complex piecewise linear motion trajectories.

4.3.2. Evaluation on real motion artifacts

The ability for absolute motion artifact measurement is additionally tested on eight clinical cases with real motion artifacts (see Section 2.2). Ensemble averaging (k = 7) is performed for the evaluation. Figure 11 shows the resulting artifact measurements and the corresponding cross-sectional patches. The vessels are sorted by the maximum artifact level predicted by the regression network. Artifact areas identified by the classification networks (with running average for outlier removal) are highlighted in red.
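The running-average outlier removal can be sketched as follows (window width and threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def running_average(values, width=5):
    """Centered moving average along the centerline; window shrinks at the edges."""
    values = np.asarray(values, dtype=float)
    half = width // 2
    return np.array([values[max(0, i - half):i + half + 1].mean()
                     for i in range(len(values))])

def artifact_areas(probabilities, threshold=0.5, width=5):
    """Mark centerline points whose smoothed artifact probability is high,
    suppressing isolated outlier predictions."""
    return running_average(probabilities, width) > threshold
```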

Four separate observer studies were performed to rate the 120 cross-sectional image patches visualized in Figure 11 with respect to diagnostic reliability on a five-point Likert scale (excellent, good, mixed, strong artifact, non-diagnostic; color-coded in the figure). The eight vessels were presented in random order without indication of the motion artifact measures to the readers. It has to be noted that the readers were not radiologists, but research scientists with a high level of expertise in reading cardiac CT images. The resulting annotations are visualized as color bars in Figure 11.

In contrast to the handcrafted MAMs, the deep-learning-based measures deliver sensible results. The classification networks enable the rough detection of artifact areas and the regression network additionally allows one to assess the artifact severity. Noise (see Figure 11e) and vessel segments with tiny lumen (see Figure 11g) can be identified as potential sources of uncertainty. The human observer ratings show disagreements in these areas as well. Furthermore, confusions between bifurcations and blurring artifacts are observable in the reader scores (see the first patch of Figure 11b and Figure 11h). Image patches with consistent artifact labels (red or orange) are correctly identified as motion-perturbed by the neural networks. This also holds for image patches with consistent artifact-free labels (light green and dark green). Hence, transferability from synthetic to real motion artifacts is demonstrated. The proposed experiments reveal the potential and the current limitations of deep-learning-based artifact measurement in clinical practice.

5. Discussion

The paper demonstrates the feasibility of accurate motion artifact quantification in the coronary arteries using deep learning. The results of the quantitative error analysis in Section 4.1 merely provide indications, as the target labels do not always constitute exact ground truth. Figure 5a shows multiple image patches with false positive target labels, for instance originating from motion along the vessel orientation. The thresholds which define the margin in s are crucial parameters which might affect the classification performance. The reduced complexity of the constant linear motion model presented in Section 3.1.4 enables more consistent target labels, but these might also exhibit slight inaccuracies. Some label noise originates from approximations and simplifications in the CoMoFACT. The centerline and its normal vectors are



[Figure 11 plots: section-wise outputs of NNclass, NNregr, entropy and positivity with predicted artifact and artifact-free areas for eight vessels: (a) RCA, seq; (b) RCA, hel; (c) RCA, hel; (d) RCA, seq; (e) RCA, hel; (f) RCA, seq; (g) RCA, seq; (h) LAD, seq.]

Figure 11: Section-wise outputs of the deep-learning-based and the handcrafted motion artifact measures are calculated for eight vessels with real motion artifacts. The predicted artifact level s of NNregr, the entropy and the positivity are down-scaled to [0, 1]. Corresponding cross-sectional image patches with four human observer ratings each (color-coded from excellent to non-diagnostic) are given below for visual inspection. The vessels are sorted by the maximum artifact level predicted by the regression network. Vessel type (RCA or LAD) and acquisition mode (seq: step-and-shoot or hel: helical) are specified.



merely estimates, and the mean reconstruction direction is limited to the axial plane, i.e. the z-axis is not considered. Furthermore, motion in the axial plane has different effects on the reconstructed image data than motion in the z-direction. Other factors like background intensities and image noise also influence the visual artifact level. The majority of the generated image and corresponding label data constitute good approximations of reality. For network training purposes, the label quality is sufficient, but the exact test performance cannot be determined. As a next step, quantitative comparison studies with hand-labeled data from radiologists should be performed.

For the quality assessment of CCTA images using the proposed motion artifact measures, the approximate locations of the coronary arteries have to be known. Since motion artifacts frequently inhibit fully automatic centerline segmentation, alternative approaches are required. Many tools for coronary centerline extraction are semi-automatic, i.e. they allow the user to guide the segmentation process. Furthermore, the creation of a coronary artery atlas which involves the probability density for the position of each vessel segment with respect to the heart segmentation, and the deployment of deep-learning-based centerline extraction, are options which should be investigated. The proposed CoMoFACT enables one to evaluate the robustness of centerline extraction methods with regard to motion artifacts. In general, motion introduction by the CoMoFACT might be useful as a data augmentation strategy in several other CT data-driven learning tasks.

So far, the proposed measures are based on merely 17 clinical data sets. CT images for non-invasive coronary angiography are acquired with a wide variety of scanner types, imaging protocols and reconstruction algorithms. In order to increase the robustness of the CNNs, collection of more data and network fine-tuning should be performed. Especially in order to increase the networks' insensitivity to noise, an extension of the training database by including clinical cases with lower SNR or by synthetic noise introduction would be required. Robustness might additionally be improved by smoothing the measurements of adjacent centerline points in order to reduce scatter and to avoid outliers.

6. Conclusions

Application of the MC-FBP algorithm, using high quality cardiac CT cases and artificial MVFs as inputs, enabled the generation of CT image data with controlled motion levels at the coronary arteries. Subsequent supervised learning of CNNs delivered measures for motion artifact recognition and quantification. High predictive accuracy was achieved on data with synthetically introduced motion, thus demonstrating the advantage of deep-learning-based measures for motion artifact localization over existing handcrafted MAMs. Furthermore, the generalization capabilities of our measures have been shown on clinical data with real artifacts and helical acquisition mode. Quantitative validation studies are required to assess the transferability of these promising initial results to motion artifact prediction in clinical practice.

References

Bhagalia, R., Pack, J.D., Miller, J.V., Iatrou, M., 2012. Nonrigid registration-based coronary artery motion correction for cardiac computed tomography. Medical Physics 39, 4245–4254.

Budoff, M.J., Li, D., Kazerooni, E.A., Thomas, G.S., Mieres, J.H., Shaw, L.J., 2017. Diagnostic accuracy of noninvasive 64-row computed tomographic coronary angiography (CCTA) compared with myocardial perfusion imaging (MPI): the PICTURE study, a prospective multicenter trial. Academic Radiology 24, 22–29.

Camargo, G.C., Peclat, T., Souza, A.C., Lima, R.d.S.L., Gottlieb, I., 2017. Prognostic performance of coronary computed tomography angiography in asymptomatic individuals as compared to symptomatic patients with an appropriate indication. Journal of Cardiovascular Computed Tomography 11, 148–152.

Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2016. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2017.2699184.

Elss, T., Nickisch, H., Wissel, T., Bippus, R., Morlock, M., Grass, M., 2018a. Motion estimation in coronary CT angiography images using convolutional neural networks. Medical Imaging with Deep Learning (MIDL).

Elss, T., Nickisch, H., Wissel, T., Schmitt, H., Vembar, M., Morlock, M., Grass, M., 2018b. Deep-learning-based CT motion artifact recognition in coronary arteries, in: Medical Imaging 2018: Image Processing, International Society for Optics and Photonics. p. 1057416. doi:10.1117/12.2292882.

Foy, A.J., Dhruva, S.S., Peterson, B., Mandrola, J.M., Morgan, D.J., Redberg, R.F., 2017. Coronary computed tomography angiography vs functional stress testing for patients with suspected coronary artery disease: A systematic review and meta-analysis. JAMA Internal Medicine 177, 1623–1631. doi:10.1001/jamainternmed.2017.4772.



Gatys, L.A., Ecker, A.S., Bethge, M., 2016. Image style transfer using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2414–2423.

Ghekiere, O., Salgado, R., Buls, N., Leiner, T., Mancini, I., Vanhoenacker, P., Dendale, P., Nchimi, A., 2017. Image quality in coronary CT angiography: challenges and technical solutions. The British Journal of Radiology 90, 20160567.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.

Isola, A.A., Grass, M., Niessen, W.J., 2010. Fully automatic nonrigid registration-based local motion estimation for motion-corrected iterative cardiac CT reconstruction. Medical Physics 37, 1093–1109.

Kingma, D., Ba, J., 2015. Adam: A method for stochastic optimization, in: International Conference on Learning Representations (ICLR).

Koken, P., Grass, M., 2006. Aperture weighted cardiac reconstruction for cone-beam CT. Physics in Medicine and Biology 51, 3433.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105.

Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sanchez, C.I., 2017. A survey on deep learning in medical image analysis. Medical Image Analysis 42, 60–88.

Liu, T., Maurovich-Horvat, P., Mayrhofer, T., Puchner, S.B., Lu, M.T., Ghemigian, K., Kitslaar, P.H., Broersen, A., Pursnani, A., Hoffmann, U., et al., 2017. Quantitative coronary plaque analysis predicts high-risk plaque morphology on coronary computed tomography angiography: results from the ROMICAT II trial. The International Journal of Cardiovascular Imaging, 1–9.

Ma, H., Gros, E., Szabo, A., Baginski, S.G., Laste, Z.R., Kulkarni, N.M., Okerlund, D., Schmidt, T.G., 2018. Evaluation of motion artifact metrics for coronary CT angiography. Medical Physics 45, 687–702.

McGee, K.P., Manduca, A., Felmlee, J.P., Riederer, S.J., Ehman, R.L., 2000. Image metric-based correction (autocorrection) of motion effects: analysis of image metrics. Journal of Magnetic Resonance Imaging 11, 174–181.

Rohkohl, C., Bruder, H., Stierstorfer, K., Flohr, T., 2013. Improving best-phase image quality in cardiac CT by motion correction with MAM optimization. Medical Physics 40.

Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer. pp. 234–241.

Schafer, D., Borgert, J., Rasche, V., Grass, M., 2006. Motion-compensated and gated cone beam filtered back-projection for 3-D rotational X-ray angiography. IEEE Transactions on Medical Imaging 25, 898–906.

Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR).

van Stevendaal, U., von Berg, J., Lorenz, C., Grass, M., 2008. A motion-compensated scheme for helical cone-beam reconstruction in cardiac CT angiography. Medical Physics 35, 3239–3251.

van Stevendaal, U., Koken, P., Begemann, P.G., Koester, R., Adam, G., Grass, M., 2007. ECG gated circular cone-beam multi-cycle short-scan reconstruction algorithm, in: Medical Imaging 2007: Physics of Medical Imaging, International Society for Optics and Photonics. p. 65105P.

Vembar, M., Garcia, M., Heuscher, D., Haberl, R., Matthews, D., Bohme, G., Greenberg, N., 2003. A dynamic approach to identifying desired physiological phases for cardiac imaging using multislice spiral CT. Medical Physics 30, 1683–1693.

Sprem, J., de Vos, B.D., de Jong, P.A., Viergever, M.A., Isgum, I., 2017. Classification of coronary artery calcifications according to motion artifacts in chest CT using a convolutional neural network, in: Styner, M.A., Angelini, E.D. (Eds.), Medical Imaging 2018: Image Processing, International Society for Optics and Photonics. p. 101330R. doi:10.1117/12.2253669.

Zreik, M., Lessmann, N., van Hamersvelt, R.W., Wolterink, J.M., Voskuil, M., Viergever, M.A., Leiner, T., Isgum, I., 2018. Deep learning analysis of the myocardium in coronary CT angiography for identification of patients with functionally significant coronary artery stenosis. Medical Image Analysis 44, 72–85.
