
Qiao Zheng et al. 1

Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow

Qiao Zheng^a,∗, Hervé Delingette^a, Nicholas Ayache^a

^a Université Côte d'Azur, Inria, France

ARTICLE INFO

Keywords: Cardiac pathology, Classification, Cine MRI, Motion, Deep learning, Semi-supervised learning, Neural network, Apparent flow

ABSTRACT

We propose a method to classify cardiac pathology based on a novel approach to extract image-derived features to characterize the shape and motion of the heart. An original semi-supervised learning procedure, which makes efficient use of a large amount of non-segmented images and a small amount of images segmented manually by experts, is developed to generate pixel-wise apparent flow between two time points of a 2D+t cine MRI image sequence. Combining the apparent flow maps and cardiac segmentation masks, we obtain a local apparent flow corresponding to the 2D motion of myocardium and ventricular cavities. This leads to the generation of time series of the radius and thickness of myocardial segments to represent cardiac motion. These time series of motion features are reliable and explainable characteristics of pathological cardiac motion. Furthermore, they are combined with shape-related features to classify cardiac pathologies. Using only nine feature values as input, we propose an explainable, simple and flexible model for pathology classification. On the ACDC training set and testing set, the model achieves a classification accuracy of 95% and 94% respectively. Its performance is hence comparable to that of the state-of-the-art. A comparison with various other models is performed to outline some advantages of our model.

1. Introduction

Cine magnetic resonance imaging (cine MRI) is widely used in the clinic as an approach to identify cardiac pathology. For both the patients and the clinicians, there is hence a great need for automated, accurate cardiac pathology identification and classification based on MRI images, as mentioned in Rueckert et al. (2016) and Comaniciu et al. (2016), as well as in the myocardial infarct classification challenge run at the STACOM workshop in 2015 (Suinesiaputra et al. (2018)). Recently, state-of-the-art cardiac pathology classification methods extract various features from MRI images and perform classification based on these features. Despite the great results achieved so far, there are still some aspects that need to be further explored.

First, most classification models, including the state-of-the-art models, take many feature values together as input to a single machine learning classifier or a group of them (e.g. Khened et al. (2017), Khened et al. (2018), Wolterink et al. (2017), Cetin et al. (2017), Isensee et al. (2017)), and output the predicted probability distribution over several classes. Like many other machine learning methods, or more specifically like most deep learning methods, these classification models are not easy to interpret. On the one hand, most of the models contain at least hundreds of parameters and it is impractical to examine and explain the role of each parameter. On the other hand, as many features are used simultaneously, it is hard to tell in a straightforward manner which feature value contributes to the identification of which category. This drawback in explainability causes many problems, as pointed out in Holzinger et al. (2017). For instance, the lack of explainability is a significant hurdle for the widespread adoption of such models in the clinic despite their performance. Moreover, under the new European General Data Protection Regulation, it may also generate legal issues in business, as companies are required to be able to explain, upon demand, why decisions have been made by their models. Hence we propose a simple classification model with 9 input features and 14 parameters in total, such that the role and contribution of each feature or parameter are clear and explainable.

∗ Corresponding author: Tel.: +33-492385024; fax: +33-492387669; Inria Epione, 2004 route des Lucioles BP 93, 06902 Sophia Antipolis Cedex, France. E-mail: [email protected] (Qiao Zheng)

arXiv:1811.03433v2 [cs.CV] 27 Mar 2019

Second, in terms of data availability in medical image analysis, we usually have access to a large amount of unlabeled data and a small amount of labeled data. How to make good use of the available data to train automatic methods remains an open question (Weese and Lorenz (2016)). Semi-supervised learning appears to be a powerful approach to tackle this challenge in general (Bai et al. (2017), Gu et al. (2017), Cheplygina et al. (2018)). In this paper, while cardiac motion is estimated in a flow-based manner as in many other methods (Gao et al. (2016), Parajuli et al. (2017)), we extend it as a semi-supervised learning method to train a network for apparent flow generation, using the dataset of the Automatic Cardiac Diagnosis Challenge (ACDC) of MICCAI 2017 (Bernard et al. (2018)), for which the ground-truth segmentation mask is only available for 2 time frames. Although the percentage of segmented frames in the dataset is small, making efficient use of their segmentation masks in training is essential for the generated flow to have better consistency. In particular, we show that with the supervision of the masks in training, cardiac structures are better preserved after warping by the generated flow.

Third, the state-of-the-art classification methods almost exclusively focus on features extracted at two instants: end-diastole (ED) and end-systole (ES). The other instants are often ignored in pathology classification. For example, in the ACDC challenge, 3 out of the 4 cardiac pathology classification methods, including Khened et al. (2017) (as well as its updated version Khened et al. (2018)), Wolterink et al. (2017) and Cetin et al. (2017), use only features based on ED and ES. The authors of Isensee et al. (2017) propose the only method in the ACDC challenge which explores the instants other than ED and ES, by quantifying the volume change and by measuring the LV-RV dyssynchrony. Yet much information about cardiac motion (e.g. how individual myocardial segments move) is still excluded from the extracted features. While more and more research efforts are put into cardiac motion estimation (e.g. Qin et al. (2018a), Qin et al. (2018b), Xue et al. (2018), Yang et al. (2017), Yan et al. (2018)) and cardiac disease assessment via motion analysis (e.g. Gilbert et al. (2017), Dawes et al. (2017), Lu et al. (2018)), we propose to explore the impact of specific motion features on learning to detect cardiac pathologies, by extracting useful time series of simple and straightforward features from cine MRI image sequences. Ideally, the resulting time series should be both informative enough to be used for classification and intuitive enough to be understood by a physician.

In this paper, we propose a novel and explainable method to classify a subset of cardiac pathologies using deep learning of cardiac motion (in the form of apparent flows) and shape. Our main contribution is threefold:
• Semi-supervised learning of flow: a novel semi-supervised learning method is applied to train a neural network model, which outputs apparent flows given two MRI images from the same 2D+t cine MRI image sequence. This allows the motion, in the form of apparent flows, to be learned efficiently from both segmented and non-segmented image data.
• Motion-characteristic features: combining the apparent flows across time with cardiac segmentation, time series of the radius and thickness of myocardial segments are extracted to describe cardiac motion. As features, they are easy to interpret and make it possible to characterize the different shapes and motions of cardiac pathologies.
• Explainable classification model: we train a set of 4 simple classifiers to perform binary classifications. Each classifier performs a logistic regression and takes no more than 3 feature values as input, which makes it very simple and easy to interpret. On the ACDC challenge training set and testing set, our model achieves a classification accuracy of 95% and 94% respectively, which is comparable to the state-of-the-art.
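To make the third contribution concrete, a cascade of small logistic classifiers can be sketched as follows. This is a minimal illustration, not the trained model: the first-YES-wins reading of the cascade, the feature names and all weights below are assumptions for illustration only; the actual features and coefficients are those of the paper.

```python
import math

def logistic(z):
    # Standard logistic function used by each binary classifier.
    return 1.0 / (1.0 + math.exp(-z))

def cascade_classify(features, classifiers, order=("RVA", "HCM", "DCM", "MINF")):
    """Apply the binary classifiers in sequence; the first classifier whose
    predicted probability exceeds 0.5 assigns its class, otherwise NOR.
    `classifiers` maps a class name to (weights, bias, feature_names),
    with at most 3 features per classifier."""
    for label in order:
        weights, bias, names = classifiers[label]
        z = bias + sum(w * features[n] for w, n in zip(weights, names))
        if logistic(z) > 0.5:
            return label
    return "NOR"

# Toy illustration with made-up single-feature classifiers
toy = {
    "RVA":  ([2.0], -1.0, ["f_rva"]),
    "HCM":  ([2.0], -1.0, ["f_hcm"]),
    "DCM":  ([2.0], -1.0, ["f_dcm"]),
    "MINF": ([2.0], -1.0, ["f_minf"]),
}
print(cascade_classify({"f_rva": 1.0, "f_hcm": 0.0, "f_dcm": 0.0, "f_minf": 0.0}, toy))  # prints "RVA"
```

With 14 parameters spread over 4 such regressions, the contribution of each weight to each decision can be read off directly.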

2. Data

2.1. Dataset

The proposed method is trained and evaluated on the ACDC challenge dataset, which consists of a training set of 100 cases and a testing set of 50 cases. The cine MRIs were acquired with a conventional SSFP sequence (Bernard et al. (2018)). Most of the cases contain about 10 slices of short-axis MRIs, and the number of frames per case varies between 12 and 35. The ACDC training set and testing set are each divided into 5 pathological groups of equal size (we cite below the properties of each group as provided on the challenge website, though they are only roughly exact according to our measures and observations):
• dilated cardiomyopathy (DCM): left ventricle cavity (LVC) volume at ED larger than 100 mL/m2 and LVC ejection fraction lower than 40%
• hypertrophic cardiomyopathy (HCM): left ventricle (LV) cardiac mass higher than 110 g/m2, several myocardial segments with a thickness higher than 15 mm at ED, and a normal ejection fraction
• myocardial infarction (MINF): LVC ejection fraction lower than 40% and several myocardial segments with abnormal contraction
• RV abnormality (RVA): right ventricle cavity (RVC) volume higher than 110 mL/m2 or RVC ejection fraction lower than 40%
• normal subjects (NOR)
Please note that the abnormal contraction mentioned in the characteristics of MINF is quite vague as a property. In addition, both MINF and DCM cases have low LVC ejection fractions, and sometimes a myocardial infarction causes a dilated LVC (in which case we should classify the case as MINF instead of DCM, according to the ACDC challenge). As we will present later, it is indeed a challenge to distinguish them.

For the cases of the ACDC training set, expert manual segmentation of LVC, RVC and the left ventricular myocardium (LVM) is provided as ground-truth for all slices at the ED and ES phases; all other structures in the image are considered as background. For the cases of the ACDC testing set, no ground-truth information about classification or segmentation is available. For performance evaluation on the testing set, the predicted results of a model need to be submitted online.

2.2. Notation

In this paper, slices in image stacks are indexed in spatial order from the basal part to the apical part of the heart. Given an image stack S, we denote by NS the number of its slices. Given two values a and b between 0 and NS − 1, we note S[a, b] the sub-stack consisting of the slices whose indexes lie in the interval [round(a), round(b)[ (round(a) is included while round(b) is excluded).


Step | Operation | Input | Output
1 | Apparent Flow Generation by ApparentFlow-net | Frame ED, Frame t | Apparent Flow
2 | Segmentation by LVRV-net; 6-Segment Division of the ED Mask | Frame ED or ES | Mask at ED or ES; 6-Segment Myocardial Mask at ED
3 | Extraction of Shape-Related Features | Segmentation Masks of the Stacks at ED and ES | Volumes at ED and ES, Volume Ratios, Myocardial Thickness
4 | Extraction of Motion-Characteristic Features | Apparent Flow along Time; 6-Segment Myocardial Mask at ED | Time Series of Segment Radius and Thickness, Segment Motion Disparity Indices

Fig. 1. Overview of the feature extraction method: 1. Apparent flow generation given the ED frame and another frame on the same slice; 2. Cardiac segmentation of the ED and ES frames and division of the ED myocardium mask into 6 segments; 3. Extraction of the shape-related features, including the calculation of the volumes, volume ratios and myocardial thickness of a heart given the segmentation masks; 4. Extraction of the motion-characteristic features, including the creation of segment radius and thickness time series given a slice with the corresponding apparent flow maps and segmentation mask.

[Flowchart: the Input Feature Values feed the RVA, HCM, DCM and MINF classifiers in sequence, with YES/NO branches leading to the labels RVA, HCM, NOR, MINF and DCM.]

Fig. 2. Overview of the classification method: the 4 binary classifiers are applied in sequence to classify a case as RVA, HCM, DCM, MINF or NOR.

3. Methods

Our method mainly consists of two parts: feature extraction (Fig. 1) and classification based on features (Fig. 2). But the region of interest (ROI) needs to be determined first.

3.1. Preprocessing: Region of Interest (ROI) Determination

As a preprocessing step, the ROI needs to be determined on the original MRI images. Short-axis MRI images usually cover a zone much larger than that of the heart. To save memory and to increase the speed of the apparent flow and segmentation methods, it is better to work on an appropriate ROI instead. For this purpose, we directly apply an existing ROI method: we use the trained ROI-net exactly as described in Zheng et al. (2018) to define an ROI. Briefly speaking, the ROI-net is a variant of U-net (Ronneberger et al. (2015)) for heart/background binary segmentation. It is applied on several middle slices of the ED image stack. As shown in Zheng et al. (2018), this ROI determination method is very robust and succeeds in all cases of the ACDC dataset. In the remainder of this paper, we only refer to the automatically cropped ROI of the images.

3.2. Feature Extraction Step 1: Apparent Flow Generation

As shown in Fig. 1, there are four steps for feature extraction. In this first step, we propose the ApparentFlow-net, a variant of U-net (Ronneberger et al. (2015)) as shown in Fig. 3. U-net, with an encoder-decoder structure consisting of layers with various sizes of receptive fields, can effectively integrate local and global information, which is necessary for the analysis of the shape and motion of the heart on MRIs. Previously, we successfully used variants of U-net for cardiac segmentation (Zheng et al. (2018)), so we expect a similar structure to also work for the estimation of cardiac motion. The ApparentFlow-net generates a pixel-wise apparent flow given a pair of image frames on the same slice as input: the ED frame and another frame of index t on the same slice. In other words, the generated apparent flow map is a displacement field of the slice between ED and instant t. In a later step, combined with the segmentation mask, we will extract cardiac motion features from the sequences of apparent flow maps on


[Figure: ApparentFlow-net architecture with N = 24, O = 2 for apparent flow generation. Inputs: Frame ED and Frame t; feature map sizes per row: 128, 64, 32, 16, 8; numbers of feature maps ranging from N to 24N along the encoder-decoder path. Legend: (Conv 3x3, BN, ReLU) x 2; Conv 1x1; copy and concatenation; max pooling 2x2; upsampling 2x2; upsampling 2x2 with element-wise addition.]

Fig. 3. ApparentFlow-net, for apparent flow generation. The output is a map of pixel-wise flow F_t.

a slice. The details of this extraction are given in subsection 3.5. While there exists some research exploring image registration (or equivalently, apparent flow) using unsupervised learning (e.g. Balakrishnan et al. (2018), Krebs et al. (2018), de Vos et al. (2017), Li and Fan (2017)), we propose a semi-supervised learning approach to make efficient use of a large amount of non-segmented images and a small amount of images segmented manually by experts.

In general, the idea of representing motion by apparent flow is based on two assumptions. First, we assume that the pixel intensities of an object do not change much between the two frames. Second, it is assumed that neighboring pixels have similar motion. By observation, we find that these assumptions usually hold on the slices located below the base and above the apex with some margin, due to the limited out-of-plane motion on these slices (this is less the case for the slices around the base and the apex). Hence the ApparentFlow-net is trained and applied on the middle slices only.

If we note I_ED(P) and I_t(P) the pixel intensities of the two input frames of the ApparentFlow-net at position P = (x, y), then according to the first assumption above, the ApparentFlow-net should generate an apparent flow map F_t, with F_t(P) = (F^x_t(P), F^y_t(P)), between ED and t enabling image reconstruction such that the following intensity discrepancy is minimized:

L_IMG(F_t) = \sum_P ( I_ED(P) − I_t(P + F_t(P)) )^2    (1)
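Equation (1) can be sketched numerically as follows. This is a simplified sketch with our own function name: nearest-neighbour sampling of the warped position stands in for the differentiable bilinear interpolation a network would actually be trained with.

```python
import numpy as np

def l_img(i_ed, i_t, flow):
    """Intensity discrepancy of Eq. (1): sum over pixels P of
    (I_ED(P) - I_t(P + F_t(P)))^2.  flow[..., 0] and flow[..., 1] are the
    x- and y-components of F_t.  Nearest-neighbour sampling and clipping
    to the image domain are simplifications for illustration."""
    h, w = i_ed.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xw = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    yw = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return float(np.sum((i_ed - i_t[yw, xw]) ** 2))
```

With a null flow and identical frames, the discrepancy is zero, as expected from the first assumption.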

Meanwhile, the flow should also preserve the regularity of the motion of neighboring pixels, according to the second assumption above. While there are already some methods in the community to impose diffeomorphisms (e.g. the demons algorithm as in Pennec et al. (1999), LDDMM as in Hernandez et al. (2008)), we propose a simple one which only discourages the occurrence of extreme situations such as the crossing of two adjacent pixels or rotations greater than 90° (Fig. 4). As long as these unrealistic motion patterns do not appear, there is no penalty on the regularity at all, and the network is free to generate whatever flow it wants without being influenced by the regularity constraint. More precisely, let us note W_{F_t} the warping function such that W_{F_t}(P) = P + F_t(P). For two adjacent pixels P = (x, y) and P_{x+} = (x + 1, y) in a row, we want the warped pixel W_{F_t}(P_{x+}) to stay on the right of the warped pixel W_{F_t}(P) (and similarly for the adjacent pixels P and P_{y+} = (x, y + 1) in a column) (see Fig. 4). Otherwise, we say that a crossing of the x-components (y-components) of the flow pair occurs and a penalty applies. This translates into the following criterion to be minimized (more details about the derivation are available in Appendix A):

L_CROSS(F_t) = \sum_P min(1 + ∂F^x_t(P)/∂x, 0)^2 + min(1 + ∂F^y_t(P)/∂y, 0)^2    (2)
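A minimal discrete version of Eq. (2), approximating the partial derivatives by forward differences between adjacent pixels (the function name is ours):

```python
import numpy as np

def l_cross(flow):
    """Crossing penalty of Eq. (2): penalise forward differences of the
    flow steeper than -1, i.e. pairs of adjacent pixels whose warped
    positions swap order along a row (x) or a column (y)."""
    fx, fy = flow[..., 0], flow[..., 1]
    dfx_dx = np.diff(fx, axis=1)   # forward difference along x
    dfy_dy = np.diff(fy, axis=0)   # forward difference along y
    return float(np.sum(np.minimum(1 + dfx_dx, 0) ** 2)
                 + np.sum(np.minimum(1 + dfy_dy, 0) ** 2))
```

Note that any flow whose finite differences stay above −1 incurs no penalty at all, matching the text: the constraint is inactive except in the unrealistic crossing cases.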

Moreover, we further encourage the flow to preserve the segmentation masks of the cardiac structures S ∈ {LVC, LVM, RVC}: the warped segmentation masks of these structures should approximately match the ground-truth masks on the corresponding frame. Let us note M^S_ED and M^S_ES the binary ground-truth segmentation masks (of pixel intensity value 0 or 1) of S at the instants ED and ES (the only instants for which the ground-truth is available in the ACDC training set). This constraint on the flow between ED and ES is based on the Dice coefficient:

L_GT(F_ES) = \sum_{S ∈ {LVC, LVM, RVC}} Dice(M^S_ED, M^S_ES ∘ W_{F_ES})    (3)
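For reference, a common form of the Dice coefficient between two binary masks is sketched below; the paper's exact formula is given in its Appendix A, so the epsilon smoothing here is an assumption of ours.

```python
import numpy as np

def dice(a, b, eps=1e-6):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|).
    eps avoids division by zero for empty masks (our assumption; the
    paper's exact formula is in its Appendix A)."""
    inter = np.sum(a * b)
    return float(2.0 * inter / (np.sum(a) + np.sum(b) + eps))
```

Identical masks score (close to) 1, disjoint masks score 0, so maximizing Eq. (3) pulls the warped ES masks onto the ED masks.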

The formula of the Dice function is provided in Appendix A.

Finally, the overall loss function for training the ApparentFlow-net is a linear combination of the terms L_IMG, L_CROSS and potentially L_GT. We adopt a semi-supervised approach for which L_GT is applied when the ground-truth segmentation is available:

L_flow(F_t) = L_IMG(F_t) + p_1 L_CROSS(F_t) + p_2 1_{t=ES} L_GT(F_t)    (4)

where 1_{t=ES} is the indicator function of the event t = ES. 1_{t=ES} is necessary since, for the instants t other than ED and ES, the ground-truth segmentation is not provided in ACDC. Please note that this is a typical semi-supervised learning method: it makes use of a small amount of labeled data (the images with ground-truth segmentation) and a large amount of unlabeled data (the images without ground-truth).
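The combination in Eq. (4) with the indicator 1_{t=ES} then reduces to a one-line rule (function name ours; p1 and p2 are the weighting hyperparameters):

```python
def l_flow(l_img, l_cross, l_gt, t_is_es, p1, p2):
    """Semi-supervised loss of Eq. (4): the ground-truth term L_GT only
    contributes when the target frame is ES (indicator 1_{t=ES})."""
    return l_img + p1 * l_cross + (p2 * l_gt if t_is_es else 0.0)

print(l_flow(1.0, 2.0, 3.0, True, 0.5, 0.5))   # 3.5
print(l_flow(1.0, 2.0, 3.0, False, 0.5, 0.5))  # 2.0
```

For non-ES frames the network is thus trained purely from the unlabeled terms L_IMG and L_CROSS.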

3.3. Feature Extraction Step 2: Segmentation

In this step, an existing segmentation model proposed in Zheng et al. (2018), the LVRV-net, is applied to segment the MRI image stacks. With its concept of propagation along the long axis, this method was proven to be robust, as the results achieved on several different datasets are all comparable to or even better than the state-of-the-art. For more details about the structure, training and application of the LVRV-net, please refer to Zheng et al. (2018). When we train


[Figure: eight examples of adjacent pixel pairs and their transformed positions by apparent flow; for each, whether a crossing (and hence a penalty) occurs on the x-components or on the y-components of the flow pair.]

Fig. 4. Examples of adjacent pixel pairs transformed by apparent flow for which the crossing penalty applies or not.

and evaluate our method on the ACDC training set (100 cases), in each fold of a 5-fold cross-validation, the trained LVRV-net as given by Zheng et al. (2018) is finetuned with the 80 cases used for training before being applied to the remaining 20 cases; for the evaluation of our method on the ACDC testing set (50 cases), the trained LVRV-net is first finetuned with the 100 cases of the ACDC training set.

In fact, in Zheng et al. (2018), the LVRV-net was trained to start the segmentation propagation from a given slice on which the ventricle cavities are supposed to be present. In other words, it was only trained to identify LV and RV labels on the slices below the base. So it might not work well if the basal slice is not determined in a stack and if the top slice in the volumetric image is located above the base. In this case, if we apply the original LVRV-net starting from the top slice, it might make false positive predictions. With finetuning on ACDC, we find that this issue is solved. In general, the finetuned LVRV-net successfully learns from the ground-truth segmentation masks of ACDC that no foreground pixel is present on the slices above the base (i.e. it predicts everything there to be background) and starts the segmentation propagation only when the base is reached. So it is no longer necessary to determine the basal slice manually. From the resulting sets of segmentation masks, we can hence also determine the location of the base, which is necessary for the calculation of volumes and the determination of sub-stacks for motion extraction, as we will present later.

With the segmentation mask, we determine B_L and B_R, the barycenters of LVC and RVC respectively. Then all the pixels P labeled as LVM on the segmentation mask are divided into 6 segments, depending on the interval [kπ/3, (k + 1)π/3[ for k in [0, 5] in which the angle between the vectors B_L P and B_L B_R falls. An example of the resulting 6 segments is shown in Fig. 1. This division into segments is inspired by the 17-segment system recommended by the American Heart Association (AHA) in Cerqueira et al. (2002). Indeed, in the AHA system, on all short-axis slices around the base and at the level of the mid-cavity, the myocardium is divided into 6 segments.
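The angular binning of an LVM pixel into one of the 6 segments can be sketched as follows (our helper; we assume the angle is taken in [0, 2π) via atan2, which is one natural reading of the interval convention above):

```python
import math

def segment_index(p, b_l, b_r):
    """Assign an LVM pixel P to one of 6 segments by the angle between
    the vectors B_L->P and B_L->B_R, taken in [0, 2*pi) and binned into
    the intervals [k*pi/3, (k+1)*pi/3[ for k in [0, 5]."""
    a_p = math.atan2(p[1] - b_l[1], p[0] - b_l[0])
    a_r = math.atan2(b_r[1] - b_l[1], b_r[0] - b_l[0])
    angle = (a_p - a_r) % (2 * math.pi)
    return int(angle // (math.pi / 3))
```

Anchoring the angle on B_L B_R makes the segment labels consistent across slices and cases, regardless of the in-plane orientation of the heart.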

3.4. Feature Extraction Step 3: Shape-Related Features

Based on the segmentation masks generated in the previous step, we estimate the volumes of LVC, LVM and RVC of a case at ED and ES. For each of the two phases, the volume of LVC is calculated by approximating the LVC between two adjacent slices as a truncated cone and summing up all the truncated cone

Table 1. The extracted features used by our classification model

Feature | Notation (and Definition)
RVC volume at ED | V_RVC,ED
LVC volume at ES | V_LVC,ES
RVC ejection fraction | EF_RVC (= 1 − V_RVC,ES / V_RVC,ED)
LVC ejection fraction | EF_LVC (= 1 − V_LVC,ES / V_LVC,ED)
Ratio between RVC and LV volumes at ED | R_RVCLV,ED (= V_RVC,ED / (V_LVC,ED + V_LVM,ED))
Ratio between LVM and LVC volumes at ED | R_LVMLVC,ED (= V_LVM,ED / V_LVC,ED)
Maximal LVM thickness in all the slices at ED | MT_LVM,ED
Radius motion disparity | RMD
Thickness motion disparity | TMD

volumes:

V_LVC = \sum_i (S_i + S_{i+1} + \sqrt{S_i S_{i+1}}) (L_{i+1} − L_i)/3    (5)

where S_i is the area of LVC on slice i and L_i is the slice position along the long axis. The volumes of LVM and RVC are calculated in a similar way. Then we normalize all the volumes by the corresponding body surface area (BSA) of the subject, which is a traditional practice based on the assumption that BSA is related to the metabolic rate. BSA can be computed from the height and the weight provided in ACDC (using the Mosteller formula BSA = \sqrt{height × weight}/60).
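Equation (5) and the BSA normalization can be sketched as follows (function names ours; height in cm, weight in kg):

```python
import math

def lvc_volume(areas, positions):
    """Eq. (5): approximate the cavity between each pair of adjacent
    slices as a truncated cone and sum the cone volumes.  areas[i] is
    the LVC area S_i on slice i; positions[i] is the slice position L_i
    along the long axis."""
    return sum((s0 + s1 + math.sqrt(s0 * s1)) * (l1 - l0) / 3.0
               for s0, s1, l0, l1 in zip(areas, areas[1:],
                                         positions, positions[1:]))

def mosteller_bsa(height_cm, weight_kg):
    """Mosteller formula: BSA = sqrt(height * weight) / 60, i.e.
    sqrt(height * weight / 3600), giving BSA in m^2."""
    return math.sqrt(height_cm * weight_kg) / 60.0
```

As a sanity check, constant slice areas reduce Eq. (5) to a cylinder volume, and a slice area dropping to zero reduces it to the exact cone volume S·h/3.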

With the segmentation masks and volumes at ED and ES, we then compute the 7 shape-related features listed in the first 7 rows of Table 1.

3.5. Feature Extraction Step 4: Motion-Characteristic Features

3.5.1. Slice Selection
For each case, let S be the image stack at ED (following the Notation part in the previous section). Given the segmentation masks of each slice generated in Step 2, we note i1 the index of the first slice on which the RVC mask is present (roughly the first slice below the base), and i2 the index of the last slice on which the LVC mask is present (roughly the last slice above the



Fig. 5. Definitions of B_i, I_{k,i}, O_{k,i}, RA_{k,i} and T_{k,i}, for the extraction of the motion-characteristic time series. The first row shows the definitions at t_0; the second row presents the definitions at t_i for i ∈ [1, 9]. 1st column: frames at t_0 and t_i, based on which the apparent flow is generated. 2nd column: B_i is the barycenter of the warped LVC (segmented at t_0) at t_i. 3rd column: I_{k,i} is the barycenter of the warped inner boundary of segment S_k at t_i. 4th column: O_{k,i} is the barycenter of the warped outer boundary of segment S_k at t_i. 5th column: RA_{k,i} = |B_i I_{k,i}|/BSA, T_{k,i} = |B_i O_{k,i}|/BSA − RA_{k,i}.

apex), and h = i2 − i1 + 1. Then we focus on extracting motion information from the sub-stack S_mid = S[i1 + 0.1h, i2 + 1 − 0.2h]. Please note that among the slices between the base and the apex, we exclude the top 10% and the bottom 20% and consider the remaining 70% in the middle, since the out-of-plane motion is particularly large in the slices close to the base or the apex.
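A sketch of this selection (hypothetical helper; the inputs are per-slice booleans indicating whether the RVC or LVC mask is present on each slice):

```python
def mid_sub_stack_bounds(has_rvc, has_lvc):
    """Return the half-open index range of S_mid = S[i1 + 0.1h, i2 + 1 - 0.2h]:
    i1 is the first slice with an RVC mask, i2 the last slice with an LVC
    mask, and h = i2 - i1 + 1.  Python's round-half-to-even may differ
    from the paper's rounding exactly at .5 values."""
    i1 = has_rvc.index(True)
    i2 = len(has_lvc) - 1 - has_lvc[::-1].index(True)
    h = i2 - i1 + 1
    return round(i1 + 0.1 * h), round(i2 + 1 - 0.2 * h)
```

For a 10-slice stack fully covered by both masks, this keeps slices 1 through 7, i.e. roughly the middle 70% of the base-to-apex range.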

3.5.2. Frame Sampling
As presented in Fig. 1, for each slice in S_mid, let us note f the number of frames available (all the frames together form a cardiac cycle). We sample 10 frames of instants t_i for i in [0, 9], such that t_0 is the instant of ED and t_i = round(t_0 + i ∗ f/10) mod f for i in [1, 9]. The 10 sampled frames hence cover the whole cardiac cycle. Applying the ApparentFlow-net of Step 1 to all 9 pairs of frames (t_0, t_i), we obtain 9 apparent flow maps F_{t_i}. Hence for each pixel P, we get its warped position W_{F_{t_i}}(P) for i in [1, 9].
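The sampled instants can be computed directly from the formula above (helper name ours):

```python
def sampled_frames(t0, f):
    """Sample 10 instants covering the cardiac cycle:
    t_0 = ED and t_i = round(t0 + i*f/10) mod f for i in [1, 9]."""
    return [t0] + [round(t0 + i * f / 10) % f for i in range(1, 10)]

print(sampled_frames(0, 20))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The modulo wraps the sampling past the end of the sequence, so the 10 instants always span a full cycle even when ED is not the first stored frame.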

3.5.3. Time Series Extraction
Then, with the convention that F_{t_0} is the null apparent flow (and hence W_{F_{t_0}} is the identity function), the barycenter of LVC at t_i for i ∈ [0, 9], B_i, is defined as the average of W_{F_{t_i}}(P) over all the pixels P labeled as LVC on the segmentation mask M_ED at t_0 (the 2nd column of Fig. 5):

B_i = average({W_{F_{t_i}}(P) | P ∈ LVC on M_ED})    (6)

In a similar way, for each myocardial segment Sk (k ∈ [0, 5]) and each instant ti (i ∈ [0, 9]), we define Ik,i, the barycenter of the inner boundary of the myocardial segment Sk at ti (the 3rd column of Fig.5):

Ik,i = average({WFti(P) | P ∈ LVC on MED & P has neighboring pixel(s) ∈ Sk})   (7)

and Ok,i, the barycenter of the outer boundary of the myocardial segment Sk at ti (the 4th column of Fig.5):

Ok,i = average({WFti(P) | P ∈ Sk & P has neighboring pixel(s) ∈ background on MED})   (8)

Finally, as shown in the 5th column of Fig.5, we define the radius of Sk at ti normalized by BSA:

RAk,i = |BiIk,i|/BSA   (9)

and the thickness of Sk at ti normalized by BSA:

Tk,i = |BiOk,i|/BSA − RAk,i   (10)

We hence generate two time series {RAk,i : i ∈ [0, 9]} and {Tk,i : i ∈ [0, 9]} to represent the contraction and the thickening of Sk.
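Given the warped pixel positions produced by the apparent flow, Eqs. (6)–(10) reduce to barycenter averages and Euclidean distances. A minimal NumPy sketch; the array layout and argument names are assumptions for illustration:

```python
import numpy as np

def radius_thickness(warped_lvc, warped_inner, warped_outer, bsa):
    """RA_{k,i} and T_{k,i} for one segment S_k at one instant t_i.

    warped_lvc:   (N, 2) warped positions of all LVC pixels      -> B_i     (Eq. 6)
    warped_inner: (M, 2) warped positions of the LVC pixels
                  neighboring S_k (inner boundary)               -> I_{k,i} (Eq. 7)
    warped_outer: (P, 2) warped positions of the S_k pixels
                  neighboring the background (outer boundary)    -> O_{k,i} (Eq. 8)
    bsa: body surface area used for normalization
    """
    b = warped_lvc.mean(axis=0)                # barycenter B_i
    i_k = warped_inner.mean(axis=0)            # barycenter I_{k,i}
    o_k = warped_outer.mean(axis=0)            # barycenter O_{k,i}
    ra = np.linalg.norm(b - i_k) / bsa         # Eq. (9): normalized radius
    t = np.linalg.norm(b - o_k) / bsa - ra     # Eq. (10): normalized thickness
    return ra, t
```

Evaluating this for i ∈ [0, 9] yields the two ten-point time series per segment.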

3.5.4. Visual Correspondence between Time Series and Pathologies

We compute the two time series introduced above for all the slices in Smid of all the cases in ACDC. For the majority of the cases, we manage to visually identify several typical slices whose time series characterize the motion of the corresponding category. Examples of such typical slices are presented in Fig.6. To sum up our observations on the typical slices of each category as shown in Fig.6:
• NOR: all segments have similar radius and thickness at all instants; their contraction and thickening are synchronous and of comparable scale.
• HCM: the segments not only look proportionally thicker at ED, but also thicken more and contract more strongly in the radial direction.
• DCM: the radii are quite large; the segments move so little that neither contraction nor thickening is obvious.
• MINF: the radii are quite large; some segments are clearly much more active than others.


Fig. 6. Examples of typical slices from 4 of the 5 pathological categories in ACDC. 1st column: the segmentation of the 6 myocardial segments (the boundaries of the segmentation masks are marked by lighter colors). 2nd column: time series of the radius (solid lines) and the thickness (dotted lines) of the 6 segments. 3rd column: a visualization of the motion information. For each segment, the radius connecting the LVC barycenter B0 and the segment inner boundary barycenter Ik,0 (marked by the light green arrow) at ED is plotted. The segment inner boundary barycenter at ES is marked by the light orange arrow. The radius of the circle is proportional to the difference of segment thicknesses at ES and ED (∆Tk,i).


3.5.5. Motion-Characteristic Feature Values

To better distinguish DCM and MINF cases, we define two additional feature values which often indicate the abnormal contraction described in the definition of MINF.

The first one is "radius motion disparity" (RMD). Given a case, we consider the set of radius series {RAk,i : i ∈ [0, 9]} of all the segments Sk on all the slices in the sub-stack Smid (e.g. if there are 4 slices in Smid, we consider a set of 6 × 4 = 24 time series). We first define the disparity of motion over all the segments in Smid at the instant ti as the difference between the maximum and minimum contraction at ti:

RDi = max_{Sk∈Smid} RAk,i/RAk,0 − min_{Sk∈Smid} RAk,i/RAk,0   (11)

Then RMD is defined as the maximum disparity along the cardiac cycle:

RMD = max_{i∈[0,9]} RDi   (12)
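Eqs. (11)–(12) can be computed in a few lines on an array of radius series. A minimal sketch; the row layout is an assumption:

```python
import numpy as np

def radius_motion_disparity(ra):
    """RMD over a sub-stack: ra has shape (n_segments, n_instants), one row
    per segment of every slice in S_mid (e.g. 6 x 4 = 24 rows for 4 slices),
    columns indexed by t_i with t_0 first."""
    contraction = ra / ra[:, :1]                             # RA_{k,i} / RA_{k,0}
    rd = contraction.max(axis=0) - contraction.min(axis=0)   # Eq. (11)
    return rd.max()                                          # Eq. (12)
```

If one segment halves its radius while another does not move at all, RMD is 0.5.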

The second motion-characteristic feature value is named "thickness motion disparity" (TMD). For each slice s in Smid and each ti, we define the thickness motion disparity of the slice s at ti as

TDs,i = (max_{k∈[0,5]} Tk,i − min_{k∈[0,5]} Tk,i) / min_{k∈[0,5]} Tk,0   (13)

where we normalize the thicknesses by the minimum segment thickness at t0 on slice s, taking into account that myocardial thickness may vary across slices.

Finally, TMD is defined as

TMD = max_{s∈Smid, i∈[0,9]} TDs,i   (14)
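The TMD computation of Eqs. (13)–(14) follows the same pattern, now per slice. A minimal sketch; the data layout is an assumption:

```python
import numpy as np

def thickness_motion_disparity(t_by_slice):
    """TMD from Eqs. (13)-(14): t_by_slice is a list with one array per
    slice in S_mid, each of shape (6, n_instants) holding T_{k,i}."""
    tmd = 0.0
    for t in t_by_slice:
        td = (t.max(axis=0) - t.min(axis=0)) / t[:, 0].min()  # Eq. (13)
        tmd = max(tmd, td.max())                              # Eq. (14)
    return tmd
```

For a slice where one segment doubles its thickness (2 to 4) while another stays at 2, TMD is (4 − 2)/2 = 1.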

3.6. Classification

3.6.1. 4-Classifier Classification Model

Using the 7 shape-related features and the 2 motion-characteristic features as input, a classification model is proposed (Fig.2) to classify the 5 pathological categories of ACDC. It consists of 4 binary classifiers:
• RVA classifier: RVA cases vs. all the other cases.
• HCM classifier: HCM cases vs. MINF, DCM and NOR cases.
• DCM classifier: DCM cases vs. MINF and NOR cases.
• MINF classifier: MINF cases vs. NOR cases.

The 4 binary classifications are arranged in increasing order of difficulty. RVA and HCM cases can be identified based on the commonly used shape-related features, so they are classified first. DCM and MINF cases are somewhat similar in terms of size and ejection fraction; we use the novel motion-characteristic features to better distinguish them. Hence this more difficult classification is performed last.
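The cascade described above can be sketched as follows, where each `is_*` argument stands for a trained binary classifier (hypothetical callables; the ordering from easiest to hardest task is the point being illustrated):

```python
def classify(case, is_rva, is_hcm, is_dcm, is_minf):
    """Apply the 4 binary classifiers in order, from the easiest
    task to the hardest; the first positive decision wins."""
    if is_rva(case):
        return "RVA"
    if is_hcm(case):
        return "HCM"
    if is_dcm(case):
        return "DCM"
    if is_minf(case):
        return "MINF"
    return "NOR"
```

A case rejected by all four classifiers is labeled NOR, so the NOR category needs no classifier of its own.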

3.6.2. Explainable Manual Feature Selection

To keep the classifiers simple, limit their risk of overfitting and increase their explainability, we choose no more than 3 features for each classifier, as shown in Table 2:

Table 2. The input features of the 4 binary classifiers

RVA Classifier:  VRVC,ED, EFRVC, RRVCLV,ED
HCM Classifier:  EFLVC, RLVMLVC,ED, MTLVM,ED
DCM Classifier:  VLVC,ES, RMD, TMD
MINF Classifier: EFLVC

• For the RVA classifier, according to the definition provided by ACDC, the RVC volume at ED and the RVC ejection fraction are the most relevant features. We add one more feature, the ratio between the RVC and LV volumes at ED, as we find that most RVA cases have a disproportionately large RVC.
• For the HCM classifier, the LVC ejection fraction and the maximal LVM thickness are selected according to the definition of HCM. The ratio between the LVM and LVC volumes at ED is added because for most HCM cases this ratio is exceptionally high due to the exceptional myocardial thickness.
• For the DCM classifier, as DCM cases are usually dilated at ED and inactive from ED to ES, their LVC volumes at ES can be exceptionally large, so this feature is used. In addition, we also use the radius motion disparity and the thickness motion disparity.
• For the MINF classifier, by definition, the LVC ejection fraction is enough to distinguish MINF cases from NOR cases.

3.6.3. Model of Classifiers

Each of the 4 classifiers is simply a ridge logistic regression model. For a training case of index m, if we note fm,i the i-th feature value used as input of the classifier and ym (−1 or 1, corresponding to no or yes) the binary ground-truth of the case, then the classifier is trained by minimizing

Lclassifier({pi}, b) = (1/2) Σi pi² + C Σm log(exp(−ym(Σi pi fm,i + b)) + 1)   (15)

with respect to the parameters {pi} and b. C is the inverse of the regularization strength. After the training is done, given a case of index l with feature values fl,i, the prediction is determined by the sign of Σi pi fl,i + b: if it is non-negative, the prediction of the trained classifier is yes; otherwise it is no.
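Eq. (15) is the standard L2-penalized logistic regression objective, the one minimized by Scikit-learn's LogisticRegression. A minimal gradient-descent sketch on synthetic data; the feature values, learning rate and iteration count are made up for illustration:

```python
import numpy as np

def train_ridge_logistic(f, y, C=50.0, lr=0.01, n_iter=10000):
    """Minimize 0.5*sum_i p_i^2 + C*sum_m log(1 + exp(-y_m*(p.f_m + b)))
    (Eq. 15) by plain gradient descent; y entries are -1 or 1."""
    p, b = np.zeros(f.shape[1]), 0.0
    for _ in range(n_iter):
        z = y * (f @ p + b)
        g = -y / (1.0 + np.exp(z))     # derivative of log(1+exp(-z)) w.r.t. the score
        p -= lr * (p + C * (f.T @ g))  # ridge term + data term
        b -= lr * C * g.sum()
    return p, b

def predict(p, b, case):
    """Yes iff sum_i p_i * f_i + b is non-negative."""
    return case @ p + b >= 0

# Toy example: one feature (e.g. an ejection fraction) separating the classes.
f = np.array([[0.30], [0.40], [0.60], [0.70]])
y = np.array([1.0, 1.0, -1.0, -1.0])
p, b = train_ridge_logistic(f, y)
```

On this symmetric toy data the learned decision boundary lies near 0.5, so low values are predicted as yes and high values as no.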

3.6.4. Flexibility and Versatility of the Model

Finally, we would also like to point out that the 4 classifiers function independently. While they are grouped together to form the proposed classification model in this paper, they can certainly be applied separately or grouped in a different manner in other situations if appropriate. The proposed classification model is hence very flexible and versatile.

4. Experiments and Results

We evaluate our method in two different ways. On the one hand, the model is trained with ACDC training set and then tested on ACDC testing set. On the other hand, a 5-fold cross-validation is performed on ACDC training set. For the latter, the 100 cases of ACDC training set are partitioned into 5 subsets of


Table 3. The mean (standard deviation) of Dice coefficients achieved by comparing MES◦WFES and MED for 3 cardiac structures in the 5-fold cross-validation on ACDC training set.

Training Method              LVM         LVC         RVC
semi-supervised (proposed)   0.84(0.07)  0.94(0.07)  0.87(0.19)
unsupervised                 0.76(0.08)  0.93(0.06)  0.83(0.22)

20 cases, such that in each subset there are exactly 4 cases of each of the 5 categories.

In addition, we also analyze the proposed model by comparing it with various other models. Since the ground-truth category is only available for the cases in the training set (and not for those in the testing set), this analysis is based on the results on the training set.

4.1. Training ApparentFlow-net

4.1.1. Parameters and Data

In the training process with the whole ACDC training set, as well as in each of the 5 training processes of the 5-fold cross-validation, the ApparentFlow-net is trained using the loss function Lflow(Ft) introduced in the Method section for 50 epochs with batch size 16. In terms of loss function parameters, we empirically find that p1 = 10³ and p2 = 10⁵ work well. These values are hence used for training. In terms of training data, for each case in the corresponding training set, we use the slices in the sub-stack S[i1 + 0.2h, i2 + 1 − 0.2h] (with the notation introduced in sub-section 3.5.1). In other words, we approximately exclude the top 20% and the bottom 20% of all the slices covering the LV cavity, and select the remaining 60% in the middle. This slice selection for training (middle 60%) is slightly more conservative than that for the application of the method (middle 70%). This design is aimed at further reducing the impact of the out-of-plane motion in training. For each selected slice, the frame pairs of indices (ED, t) for every frame index t are used to train the ApparentFlow-net. Only when t = ES is the term LGT(Ft) in Lflow(Ft) using the segmentation ground truth applied. With our automatic slice selection approach, in total there are 13672 frame pairs used for training in the ACDC training set. Among them, only 515 pairs (3.77%) come with segmentation ground-truth such that the term LGT(Ft) applies.

4.1.2. Performance

To measure its performance, in each evaluation of the 5-fold cross-validation, for all the slices in the sub-stack S[i1 + 0.2h, i2 + 1 − 0.2h] of all the 20 evaluation cases, we apply the trained ApparentFlow-net to generate FES. Then we use it to warp the ground-truth segmentation mask at ES, noted MES, to obtain MES ◦ WFES. MES ◦ WFES is then compared with MED, the corresponding ground-truth mask at ED, using the Dice coefficient (2D version) on LVM, LVC and RVC. The means (standard deviations) of Dice coefficients achieved on LVM, LVC and RVC in the 5-fold cross-validation are reported in Table 3.
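The 2D Dice coefficient used here is the standard overlap measure between two binary masks; a minimal sketch:

```python
import numpy as np

def dice(mask_a, mask_b):
    """2D Dice coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```

A value of 1 means perfect overlap between the warped ES mask and the ED mask; 0 means no overlap.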

Additionally, we also visually evaluate the apparent flow generated by the ApparentFlow-net. We find that the apparent flow is indeed good enough to characterize the cardiac motion of the typical cases in the pathological categories. Several examples are given in Appendix D.

4.1.3. Importance of Supervision in Training

In order to understand the importance of the small amount of segmentation ground-truth used in the proposed semi-supervised learning method, we also train a variant of ApparentFlow-net using only unsupervised learning. The only modification is the removal of the term LGT(Ft) from Lflow(Ft), such that the variant is trained without any ground-truth supervision. As reported in Table 3, the means of Dice coefficients on LVM, LVC and RVC are all lower than the corresponding results achieved by the semi-supervised learning method. In particular, there is a large drop from 0.84 to 0.76 in the mean Dice coefficient on LVM. So the proposed semi-supervised learning method is indeed better than its unsupervised learning variant, by making efficient use of the small amount of segmented images.

4.2. Finetuning LVRV-net

LVRV-net was already trained in Zheng et al. (2018) for 80 epochs on a subset of about 3000 cases of UK Biobank (Petersen et al. (2016)). In the training process with the whole ACDC training set, as well as in each of the 5 training processes of the 5-fold cross-validation, LVRV-net is finetuned for 920 epochs on the corresponding training data, with exactly the same loss function and training parameters as given in Zheng et al. (2018). With the finetuning, the means (standard deviations) of 3D Dice coefficients achieved on LVC, LVM and RVC segmentation volumes in the 5-fold cross-validation are 0.94(0.06), 0.90(0.03) and 0.89(0.12).

4.3. Proposed Classification Model

Apparent flows and segmentation masks are generated by the ApparentFlow-net and the finetuned LVRV-net, from which the 7 shape-related features and the 2 motion-characteristic features are extracted. Then the 4 ridge logistic regression binary classifiers are implemented using Scikit-learn (Pedregosa et al. (2011)) and trained on the cases of the categories they are supposed to classify. For example, the DCM classifier is trained on the cases of NOR, MINF and DCM; the cases of RVA or HCM are not used to train it. In terms of classifier parameter, we empirically find that C = 50 works well and use it in this paper. The performances of some variants with different values of C are provided in Appendix B.

4.3.1. Classification Performance

As presented in Table 4, on the testing set, the accuracy of our model is 94%. In the 5-fold cross-validation on the training set, our method achieves an accuracy of 95%. Hence our model achieves performances that are comparable to those of the state-of-the-art on both the training set and the testing set. This is quite remarkable because, in contrast to the state-of-the-art, each classifier in our model uses only up to three features and


Table 4. The classification performance on the testing set (50 cases) and training set (100 cases) of ACDC by different models

Model                     Testing Set Accuracy   Training Set Accuracy   Evaluation Method on Training Set
proposed model            94%                    95%                     5-fold cross-validation
Isensee et al. (2017)     92%                    94%                     5-fold cross-validation
Wolterink et al. (2017)   86%                    91%                     4-fold cross-validation
Cetin et al. (2017)       92%                    100%                    forward feature selection and leave-one-out cross-validation
Khened et al. (2017)      96%                    90%                     70 cases for training, 20 for validation, 10 for evaluation
Khened et al. (2018)      100%                   N.A.                    N.A.

Predicted:        NOR   RVA   HCM   DCM   MINF
Ground-Truth
NOR                20     0     0     0     0
RVA                 2    18     0     0     0
HCM                 1     0    19     0     0
DCM                 0     0     0    20     0
MINF                0     0     0     2    18

Fig. 7. The confusion matrix of the predictions by the proposed classification model in the 5-fold cross-validation on the training set of ACDC.

has only up to 4 parameters. In total, our model uses 9 features and has 14 parameters. And each feature is selected in a clearly explainable manner. On the testing set, among the two methods with performance better than ours, Khened et al. (2017) uses a random forest of 100 trees and Khened et al. (2018) applies a more sophisticated ensemble system. Those classification models are therefore less straightforward to interpret than ours. Furthermore, since our model has very similar performances on the training and testing sets, there seems to be little overfitting.

Based on the confusion matrix of the predictions in the 5-fold cross-validation on the ACDC training set (Fig.7), for the binary classification of NOR, RVA, HCM, DCM and MINF, we find that the precision values are 0.87, 1.00, 1.00, 0.91 and 1.00, and the recall values are 1.00, 0.90, 0.95, 1.00 and 0.90.
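These values follow directly from the confusion matrix. Reproducing the computation (rows are ground truth, columns are predictions, in the order NOR, RVA, HCM, DCM, MINF):

```python
import numpy as np

# Confusion matrix from Fig. 7.
cm = np.array([
    [20,  0,  0,  0,  0],   # NOR
    [ 2, 18,  0,  0,  0],   # RVA
    [ 1,  0, 19,  0,  0],   # HCM
    [ 0,  0,  0, 20,  0],   # DCM
    [ 0,  0,  0,  2, 18],   # MINF
])
precision = cm.diagonal() / cm.sum(axis=0)  # per predicted class
recall = cm.diagonal() / cm.sum(axis=1)     # per ground-truth class
```

For instance, 23 cases are predicted NOR but only 20 are truly NOR, giving a NOR precision of 20/23 ≈ 0.87.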

4.3.2. Interpretation of Mis-Classification

As our classifiers can be interpreted easily, we figure out for each of the 5 misclassified cases (Fig.7) why the prediction differs from the ground-truth. In fact, they all seem to be somewhat ambiguous in terms of pathological category:
• Patients 082 and 088 are of ground-truth RVA but are classified as NOR. According to our segmentation, they have VRVC,ED, EFRVC and RRVCLV,ED values all very similar to those of the NOR cases. For instance, they have the third and the first lowest RRVCLV,ED values (0.755 and 0.691) among all the RVA cases, which are well within the range of the NOR cases.
• Patient 022 is of ground-truth HCM but is predicted as NOR. Unlike all the other HCM cases, patient 022 has both EFLVC (0.622) and MTLVM,ED (14.7mm) in the normal ranges, which makes it look like a NOR case.
• Patients 050 and 060 are of ground-truth MINF but are predicted as DCM. Their values of VLVC,ES (118.0mL/m2 and 83.5mL/m2) are the two highest among all the non-DCM cases

Table 5. The parameters of the 4 logistic regression binary classifiers trained on the training set of ACDC

RVA Classifier:  0.010 VRVC,ED − 4.695 EFRVC + 14.012 RRVCLV,ED − 9.906
HCM Classifier:  8.434 EFLVC + 4.614 RLVMLVC,ED + 0.420 MTLVM,ED − 16.580
DCM Classifier:  0.104 VLVC,ES − 0.918 RMD − 7.758 TMD − 0.321
MINF Classifier: −17.122 EFLVC + 7.994

and well within the range of the DCM cases. In terms of motion disparity, on RMD and TMD, unlike the majority of the MINF cases, their values (0.245 and 1.173 for patient 050, 0.316 and 1.246 for patient 060) are also within the ranges of the DCM cases. For these reasons, the DCM classifier predicts them to be DCM cases.

4.3.3. Explaining the Classifiers

The 4 binary classifiers are just logistic regression models. As presented in the previous section, their prediction depends on the sign of the sum Σi pi fl,i + b. To understand what is learned from the data by the trained classifiers, in Table 5 we show the coefficients of the classifiers trained with all the relevant cases in ACDC. We find that the signs of the parameters pi all correspond to the positive or negative correlation between the feature and the binary classification task. For instance, in the trained RVA classifier, the signs of the coefficients of VRVC,ED and RRVCLV,ED are both positive, as a large RVC volume and a high ratio between the RVC and LV volumes are both indicators of RV abnormality; on the other hand, since a low RVC ejection fraction usually signifies RV abnormality, the coefficient of EFRVC is negative. Such a correspondence applies to all the coefficients of the 3 other trained classifiers as well. In particular, for the MINF classifier, the learned threshold on EFLVC to distinguish MINF cases from NOR cases is 7.994/17.122 = 0.467, which separates them well according to their definitions.
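The MINF threshold follows directly from the trained coefficients in Table 5: the score −17.122·EFLVC + 7.994 is non-negative exactly when EFLVC does not exceed 7.994/17.122:

```python
p, b = -17.122, 7.994   # trained MINF classifier coefficients (Table 5)
threshold = b / -p      # decision boundary on EF_LVC
# A case is predicted MINF iff p*EF + b >= 0, i.e. iff EF <= threshold (~0.467).
```

So an ejection fraction of 0.40 falls on the MINF side of the boundary, while 0.55 falls on the NOR side.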

4.4. Variants of the Proposed Classification Model

We compare the proposed classification model with its variants to justify our design and gain a more comprehensive understanding of the model.


Table 6. The performance of the variants of DCM classifier on the training set of ACDC

DCM Classifier Input            # of Mis-Classifications on the 60 DCM, MINF and NOR cases
VLVC,ES, RMD, TMD (proposed)    2
VLVC,ES, TMD                    2
VLVC,ES, RMD                    3
VLVC,ES                         4

Fig. 8. The motion-characteristic features (RMD and TMD) of the DCM and MINF cases in the training set of ACDC. The majority of the cases are well separable with these two features.

4.4.1. Importance of Motion-Characteristic Features

To better understand the value of the two proposed motion-characteristic features, we further train three variants of the DCM classifier which use zero or one motion-characteristic feature as input. The set of input features is the only difference between these models. As shown in Table 6, on the 60 cases of NOR, MINF and DCM, while the DCM classifier makes only two errors, the variant using only the shape-related feature VLVC,ES misclassifies 4 cases. Improvements can be made by using at least one motion-characteristic feature. As can be visualized in Fig.8, the motion-characteristic features RMD and TMD allow the separation of the majority of the DCM and MINF cases. Combining them with the shape-related feature VLVC,ES as input, the DCM classifier can make more accurate classifications.

4.4.2. Proposed Model on Non-Normalized Features

We test whether BSA normalization is required for our model to achieve high performance. Among the 9 proposed features, only the values of VRVC,ED and VLVC,ES would be different without BSA-normalization. And only the RVA and DCM classifiers, which use these two features as input, would be affected. As presented in Table 7, without BSA-normalization of the features, the 5-fold cross-validation accuracy on ACDC training

Table 7. The 5-fold cross-validation accuracy on the training set of ACDC of the variants of the proposed classification model

Method                                                         BSA-Normalized   Non-Normalized
                                                               Features         Features
logistic regression classifiers (proposed model)               95%              94%
logistic regression classifiers in inverted order              63%              64%
Lasso classifiers                                              89%              91%
LassoCV classifiers                                            80%              81%
random forest classifiers                                      85%              87%
logistic regression classifiers w/o manual feature selection   88%              88%
SVM classifiers w/o manual feature selection                   87%              84%
RVM classifiers w/o manual feature selection                   88%              72%
Lasso classifiers w/o manual feature selection                 85%              86%
LassoCV classifiers w/o manual feature selection               84%              87%
random forest classifiers w/o manual feature selection         86%              88%
HDDA classifiers w/o manual feature selection                  49%              46%
one single random forest w/o binary classification             87%              88%
one single MLP w/o binary classification                       84%              84%

set only drops a little, to 94%. The proposed model hence remains accurate.

4.4.3. Proposed Model with Inverted Classifier Order

As presented previously, the 4 classifiers in the proposed model are arranged according to the estimated difficulties of the corresponding classification tasks. To confirm the importance of this order, we create another model by inverting the order of the classifiers. So, unlike the proposed model shown in Fig.2, in this variant a case goes through the MINF, DCM, HCM and RVA classifiers successively instead. As shown in Table 7, the accuracy of this variant is quite low. Hence the proposed order of the classifiers is indeed important.

4.4.4. Variants with Other Classifier Models

We replace the proposed logistic regression classifiers with 3 other types of classifiers on the same sets of input features: Lasso, LassoCV (Lasso with model selection by cross-validation) and random forest. Details of these models are available in Appendix C. As reported in Table 7, their performances are clearly below that of the proposed model. Our choice of logistic regression as the classifier model is hence justified.


4.4.5. Variants without Manual Input Feature Selection

To evaluate whether the manual feature selection is useful for the model's accuracy, we train several modified versions of the proposed model without manual feature selection. They all consist of 4 classifiers performing the same binary classification tasks as in the proposed model, but each of the 4 classifiers of these variants takes all 9 features together as input. In total, we implement 6 models with the following models as their classifiers respectively (details of these models are available in Appendix C): support vector machine (SVM), relevance vector machine (RVM), Lasso, LassoCV, random forest and high dimensional discriminant analysis (HDDA).

As reported in Table 7, on the BSA-normalized features as well as on the non-normalized features, they all have an accuracy lower than that of the proposed model by at least 6%. This justifies the necessity of manual feature selection, at least on a relatively small dataset like ACDC. We are not yet able to examine the importance of manual feature selection on large datasets.

To better understand the roles of the features, we further examine the variant with random forest classifiers without manual feature selection, trained on the 100 cases of ACDC training set. For each of the 4 classifiers, we compute the feature importance of each of the 9 features to determine the most important one. The importance of a feature is defined as the total reduction of the entropy brought by that feature over all the trees in the random forest. We find that for the RVA, HCM and DCM classifiers, the most important features are RRVCLV,ED, RLVMLVC,ED and VLVC,ES respectively, which are among the features manually selected for the corresponding proposed classifiers. For the MINF classifier, the two most important features are RRVCLV,ED and EFLVC, which have roughly the same importance (0.35 and 0.32). Only EFLVC is used in the proposed model, according to its direct relevance. These observations provide further support for our manual feature selection.

4.4.6. Variants without Binary Classification

The proposed model divides the 5-category classification task into 4 binary classification sub-tasks. In order to understand whether this design contributes to the achieved high accuracy, we train and evaluate 2 variants on the same set of 9 features. A random forest and a multi-layer perceptron (MLP) are respectively trained to predict a case's category directly, without binary classification (details of these models are available in Appendix C). As reported in Table 7, their performances are not as good as that of the proposed model. Hence the strategy of performing a series of binary classifications makes sense.

5. Conclusion and Discussion

We propose a method for cardiac pathology classification based on originally designed and trained neural networks and classifiers¹. A novel semi-supervised training method is applied to train ApparentFlow-net, which provides pixel-wise motion information. Combining the apparent flow generated by ApparentFlow-net and the segmentation masks predicted by LVRV-net, we introduce two novel features that characterize the motion of myocardial segments. These motion-characteristic features are not only intuitive for visualization but also very valuable for classification. The proposed classification model consists of 4 small binary classifiers. Each classifier works independently and takes up to 3 features with clearly explainable relevance as input. On ACDC training set and testing set, the proposed model achieves 95% and 94% classification accuracy respectively. Its performance is hence comparable to that of the state-of-the-art. To justify the design of the proposed classification model, we also quantitatively compare it with other models.

¹The code and the model will be available in this repository: https://github.com/julien-zheng/CardiacMotionFlow

The apparent flow generated by ApparentFlow-net and the originally designed time series of myocardial segment motion are not only straightforward to understand but also useful for classification. We believe that making automatic methods more understandable and explainable is important: it not only helps to bring medical image analysis research into clinical implementation and application, but is also useful to improve transparency and to gain trust in medical practice (Holzinger et al. (2017), Rueckert et al. (2016)).

Furthermore, the motion information we extract from the apparent flow is fairly rich. We believe that ApparentFlow-net may be a powerful tool for motion extraction for the community. The way we extract the time series and the motion-characteristic features from the flow maps is just one of many possible applications. Also, ApparentFlow-net is trained in a semi-supervised manner. This training approach is highly relevant to the current situation of data availability in medical image analysis, as we usually have access to a relatively large amount of unlabeled data and a relatively small amount of labeled data. In short, many more potential applications of apparent flow in various circumstances are yet to be explored.

Regarding the extraction of the motion-characteristic features, one could use the segmentation network to segment all frames and then derive the motion-characteristic features from the segmentation masks. However, we find that the resulting time series characterizing the cardiac motion (e.g. the time series of the radius and thickness as shown in the second column of Fig.6) by this approach are not as temporally consistent as we would expect. In fact, the segmentation network was trained to segment the frames at ED and ES only, and no constraint has ever been imposed to make the segmentation masks temporally consistent. The problem becomes clearer if we look at the two frames in the first column of Fig.5. While the ED frame (upper image) is easy to segment, the other frame (lower image) appears to be more challenging due to the presence of massive trabeculations. Moreover, as the segmentation network segments the two frames independently, it is not obvious how to ensure the consistency of the segmentation masks. This problem can be solved by using the ApparentFlow-net instead to extract motion. As shown in the second column of Fig.6, with the ApparentFlow-net, the extracted time series of the radius and thickness of the segments are reasonably smooth, which


reflects the enforced temporal consistency.

We could have used existing traditional registration models to supervise the training of ApparentFlow-net, or even replaced ApparentFlow-net by a deformable registration algorithm (e.g. LDDMM, LCC-Demons, etc.). However, we notice that in order to make the traditional registration models work reasonably well on an unseen dataset like ACDC, the estimation and finetuning of key parameters in these models are necessary. For instance, the authors of Krebs et al. (2019) empirically estimate the key parameters of LCC-Demons (Lorenzi et al. (2013)) and SyN (Avants et al. (2008)) before applying them on ACDC. Our method is simpler in the sense that it learns everything from data and requires no manual model or parameter estimation and adjustment. Hence, on the one hand, our method is easier and more convenient to apply to various datasets that are reasonably similar to the training set. On the other hand, it allows us to take advantage of the increasing amount of data available in the community. We believe a method with these advantages is very interesting and worth trying. Moreover, as far as we observe, our registration method is accurate enough to characterize the cardiac motion. Some examples are provided in Fig.D.9 to show that the generated apparent flow characterizes the motion patterns of typical cases in several pathological categories. And as shown in the second column of Fig.6, with the apparent flow generated by the ApparentFlow-net, the extracted time series of the radius and thickness of the segments enable us to easily distinguish the typical cases of different cardiac pathologies.

A straightforward comparison with prior works on 2D registration methods on the ACDC dataset shows that the ApparentFlow-net performs rather well. Indeed, when looking at the Dice coefficients achieved on LVC and RVC, our approach leads to 0.94 and 0.87 respectively (see Table 3). In Hering et al. (2019), the authors describe a learning-based method leading to Dice coefficients at best equal to 0.90 on the same structures (based on Fig.3 of Hering et al. (2019)), as well as the performance of a non-learning-based method similar to Ruhaak et al. (2013) with a Dice of at most 0.80. Note however that in this comparison, differences in cross-validation (5-fold vs. 10-fold), slice selection and ROI determination may hinder the analysis.

While analyzing and extracting the cardiac motion, we adopt a 2D slice-by-slice processing method, without taking the motion on neighboring slices into account. The reason behind this choice is that the inter-slice distance of the short-axis MRIs in ACDC is quite large: usually, the distance between two adjacent slices in MRI stacks is 5 to 10 mm. The heart may hence have markedly different shape and motion even on two adjacent slices. In this case, ignoring the neighboring slices for motion estimation might be reasonable. However, if our method is to be applied to volumetric images with small inter-slice distance, a modification of the approach to take neighboring slices into account might be beneficial.

An issue that would hinder the generalization of pathology classification models like ours is the lack of a standard and universal definition of pathological categories (Suinesiaputra et al. (2016)). For instance, there is another public dataset made available for the MICCAI 2009 challenge on automated LV segmentation (Radau et al. (2009)), also known as the Sunnybrook dataset (SD), containing pathological cases. The 4 categories of SD are heart failure with infarction, heart failure without infarction, LV hypertrophy and healthy. While a hypertrophic case in ACDC has, by definition, a LV cardiac mass over 110 g/m2 and several myocardial segments of thickness over 15 mm at ED, hypertrophic cases according to the SD definition only need to have a LV cardiac mass over 83 g/m2, and no threshold is proposed for the myocardial segment thickness. In fact, we find multiple cases in SD with a LV cardiac mass between 83 g/m2 and 110 g/m2 and maximal segment thicknesses well below 15 mm. They are identified as hypertrophic cases in SD, but would not be considered hypertrophic at all according to ACDC. Similarly, the category of infarction is defined differently in SD and ACDC. In SD, infarction is determined by the evidence of late gadolinium enhancement; abnormal cardiac motion might not be observable. Yet in ACDC, the infarction category is defined by the presence of abnormal motion. With such discrepancies between the definitions in different datasets, it is difficult for the community to train a classification model on one dataset such that it generalizes well to the others. We hence appeal for more attention to this issue.
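To make the discrepancy concrete, the two definitions can be sketched as simple rules. This is a hypothetical illustration: the function and argument names are ours, and the ACDC "several segments" criterion is simplified here to the maximal segment thickness.

```python
def hypertrophic_acdc(lv_mass_per_bsa, max_segment_thickness):
    """ACDC-style rule (simplified): LV mass over 110 g/m^2 AND
    myocardial segments thicker than 15 mm at ED."""
    return lv_mass_per_bsa > 110 and max_segment_thickness > 15

def hypertrophic_sd(lv_mass_per_bsa):
    """Sunnybrook-style rule: LV mass over 83 g/m^2, no thickness criterion."""
    return lv_mass_per_bsa > 83

# A case with 95 g/m^2 and 12 mm segments is hypertrophic under SD
# but not under ACDC.
```

Such a case would be labeled differently depending on which dataset's convention the model was trained with.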

Another issue that may limit the generalization of our classification model is the small size of the ACDC dataset used for training. The ACDC training set has only 100 cases over 5 pathological categories, i.e. only 20 cases per category. Consequently, on the one hand, many pathological categories are not represented in ACDC at all; on the other hand, for each of the 5 categories that are, 20 cases might not be enough to represent all cases in the category. In order to achieve good generalization, we may need larger datasets with more pathological categories to train the model.

Also, notice that the proposed simple classification model of only 14 parameters is somewhat specific to the ACDC dataset. If we need to adapt our model to perform classification on a larger dataset with more pathological categories, it may be necessary to increase the size, and hence the number of parameters, of the model.

Finally, we would like to point out that although some single-value hand-crafted motion-characteristic features (e.g. RMD and TMD) are used in this paper, we believe that for some pathologies it would be better to use the whole time series of segment radius or thickness as input to a classification model. For instance, if we aim to discover subtler motion-related characteristics (e.g. dyssynchrony, septal flash) from a larger dataset, doing so might become appropriate and necessary. We expect to carry out research on this topic in the future.

Acknowledgments

The authors acknowledge the partial support from the European Research Council (MedYMA ERC-AdG-2011-291080).


References

Avants, B., Epstein, C., Grossman, M., Gee, J., 2008. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal 12(1), 26–41.

Bai, W., Oktay, O., Sinclair, M., Suzuki, M., Rajchl, M., Tarroni, G., Glocker, B., King, A., Matthews, P., Rueckert, D., 2017. Semi-supervised learning for network-based cardiac MR image segmentation. MICCAI, 253–260.

Balakrishnan, G., Zhao, A., Sabuncu, M., Guttag, J., Dalca, A., 2018. VoxelMorph: a learning framework for deformable medical image registration. arXiv preprint arXiv:1809.05231.

Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P., Cetin, I., Lekadir, K., Camara, O., Ballester, M., Sanroma, G., Napel, S., Petersen, S., Tziritas, G., Grinias, E., Khened, M., Kollerathu, V., Krishnamurthi, G., Rohe, M., Pennec, X., Sermesant, M., Isensee, F., Jager, P., Maier-Hein, K., Baumgartner, C., Koch, L., Wolterink, J., Isgum, I., Jang, Y., Hong, Y., Patravali, J., Jain, S., Humbert, O., Jodoin, P., 2018. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging 37(11), 2514–2525.

Bouveyron, C., Girard, S., Schmid, C., 2007. High dimensional data clustering. Computational Statistics and Data Analysis 52, 502–519.

Cerqueira, M., Weissman, N., Dilsizian, V., Jacobs, A., Kaul, S., Laskey, W., Pennell, D., Rumberger, J., Ryan, T., Verani, M., 2002. Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart: a statement for healthcare professionals from the Cardiac Imaging Committee of the Council on Clinical Cardiology of the American Heart Association. Circulation.

Cetin, I., Sanroma, G., Petersen, S., Napel, S., Camara, O., Ballester, M., Lekadir, K., 2017. A radiomics approach to computer-aided diagnosis with cardiac cine-MRI, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.

Cheplygina, V., de Bruijne, M., Pluim, J., 2018. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. arXiv preprint arXiv:1804.06353.

Comaniciu, D., Engel, K., Georgescu, B., Mansi, T., 2016. Shaping the future through innovations: from medical imaging to precision medicine. Med Image Anal 33, 19–26.

Dawes, T., de Marvao, A., Shi, W., Fletcher, T., Watson, G., Wharton, W., Rhodes, C., Howard, L., Gibbs, J., Rueckert, D., Cook, S., Wilkins, M., O'Regan, D., 2017. Machine learning of three-dimensional right ventricular motion enables outcome prediction in pulmonary hypertension: a cardiac MR imaging study. Radiology 283, 381–390.

Gao, B., Liu, W., Wang, L., Liu, P., Croisille, P., Delachartre, P., Clarysse, P., 2016. Estimation of cardiac motion in cine-MRI sequences by correlation transform optical flow of monogenic features distance. Phys Med Biol 61, 8640–8663.

Gilbert, K., Pontre, B., Occleshaw, C., Cowan, B., Suinesiaputra, A., Young, A., 2017. 4D modelling for rapid assessment of biventricular function in congenital heart disease. The International Journal of Cardiovascular Imaging 34(3), 407–417.

Gu, L., Zheng, Y., Bise, R., Sato, I., Imanishi, N., Aiso, S., 2017. Semi-supervised learning for biomedical image segmentation via forest oriented super pixels (voxels). MICCAI, 702–710.

Hering, A., Kuckertz, S., Heldmann, S., Heinrich, M., 2019. Enhancing label-driven deep deformable image registration with local distance metrics for state-of-the-art cardiac motion tracking. Bildverarbeitung für die Medizin 2019, 309–314.

Hernandez, M., Olmos, S., Pennec, X., 2008. Comparing algorithms for diffeomorphic registration: stationary LDDMM and diffeomorphic demons. Proc. MFCA 2008, 24–35.

Holzinger, A., Biemann, C., Pattichis, C., Kell, D., 2017. What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923.

Isensee, F., Jaeger, P., Full, P., Wolf, I., Engelhardt, S., Maier-Hein, K., 2017. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.

Khened, M., Alex, V., Krishnamurthi, G., 2017. Densely connected fully convolutional network for short-axis cardiac cine MR image segmentation and heart diagnosis using random forest, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.

Khened, M., Alex, V., Krishnamurthi, G., 2018. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. arXiv preprint arXiv:1801.05173.

Krebs, J., Delingette, H., Mailhe, B., Ayache, N., Mansi, T., 2019. Learning a probabilistic model for diffeomorphic registration. IEEE Transactions on Medical Imaging.

Krebs, J., Mansi, T., Mailhe, B., Ayache, N., Delingette, H., 2018. Unsupervised probabilistic deformation modeling for robust diffeomorphic registration, in: Proc. Deep Learning in Medical Image Analysis (DLMIA), MICCAI'18 Workshop.

Li, H., Fan, Y., 2017. Non-rigid image registration using fully convolutional networks with deep self-supervision. arXiv preprint arXiv:1709.00799.

Lorenzi, M., Ayache, N., Frisoni, G., Pennec, X., 2013. LCC-Demons: a robust and accurate symmetric diffeomorphic registration algorithm. NeuroImage 81, 470–483.

Lu, A., Parajuli, N., Zontak, M., Stendahl, J., Ta, K., Liu, Z., Boutagy, N., Jeng, G., Alkhalil, I., Staib, L., O'Donnell, M., Sinusas, A., Duncan, J., 2018. Learning-based regularization for cardiac strain analysis with ability for domain adaptation. arXiv preprint arXiv:1807.04807.

Orlhac, F., Mattei, P., Bouveyron, C., Ayache, N., 2018. Class-specific variable selection in high-dimensional discriminant analysis through Bayesian sparsity. Preprint HAL 01811514.

Parajuli, N., Lu, A., Stendahl, J., Zontak, M., Boutagy, N., Alkhalil, I., Eberle, M., Lin, B., O'Donnell, M., Sinusas, A., Duncan, J., 2017. Flow network based cardiac motion tracking leveraging learned feature matching. MICCAI, 279–286.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

Pennec, X., Cachier, P., Ayache, N., 1999. Understanding the "demon's algorithm": 3D non-rigid registration by gradient descent. MICCAI'99, 597–605.

Petersen, S., Matthews, P., Francis, J., Robson, M., Zemrak, F., Boubertakh, R., Young, A., Hudson, S., Weale, P., Garratt, S., Collins, R., Piechnik, S., Neubauer, S., 2016. UK Biobank's cardiovascular magnetic resonance protocol. J Cardiovasc Magn Reson 18:8, 8+.

Qin, C., Bai, W., Schlemper, J., Petersen, S., Piechnik, S., Neubauer, S., Rueckert, D., 2018a. Joint learning of motion estimation and segmentation for cardiac MR image sequences. MICCAI, 472–480.

Qin, C., Bai, W., Schlemper, J., Petersen, S., Piechnik, S., Neubauer, S., Rueckert, D., 2018b. Joint motion estimation and segmentation from undersampled cardiac MR image. MLMIR 2018, 55–63.

Radau, P., Lu, Y., Connelly, K., Paul, G., Dick, A., Wright, G., 2009. Evaluation framework for algorithms segmenting short axis cardiac MRI. The MIDAS Journal - Cardiac MR Left Ventricle Segmentation Challenge, http://hdl.handle.net/10380/3070.

Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. MICCAI 9351, 234–241.

Rueckert, D., Glocker, B., Kainz, B., 2016. Learning clinically useful information from images: past, present and future. Medical Image Analysis 33, 13–18.

Ruhaak, J., Heldmann, S., Kipshagen, T., Fischer, B., 2013. Highly accurate fast lung CT registration. Medical Imaging 2013: Image Processing 8669.

Suinesiaputra, A., Ablin, P., Alba, X., Alessandrini, M., Allen, J., Bai, W., Cimen, S., Claes, P., Cowan, B., D'hooge, J., Duchateau, N., Ehrhardt, J., Frangi, A., Gooya, A., Grau, V., Lekadir, K., Lu, A., Mukhopadhyay, A., Oksuz, I., Parajali, N., Pennec, X., Pereanez, M., Pinto, C., Piras, P., Rohe, M., Rueckert, D., Saring, D., Sermesant, M., Siddiqi, K., Tabassian, M., Teresi, L., Tsaftaris, S., Wilms, M., Young, A., Zhang, X., Medrano-Gracia, P., 2018. Statistical shape modeling of the left ventricle: myocardial infarct classification challenge. IEEE J Biomed Health Inform 22(2), 503–515.

Suinesiaputra, A., McCulloch, A., Nash, M., Pontre, B., Young, A., 2016. Cardiac image modelling: breadth and depth in heart disease. Medical Image Analysis 33, 38–43.

Tipping, M., Faul, A., 2003. Fast marginal likelihood maximisation for sparse Bayesian models. AISTATS.

de Vos, B., Berendsen, F., Viergever, M., Staring, M., Isgum, I., 2017. End-to-end unsupervised deformable image registration with a convolutional neural network. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, Proceedings, 204–212.

Weese, J., Lorenz, C., 2016. Four challenges in medical image analysis from an industrial perspective. Medical Image Analysis 33, 44–49.

Wolterink, J., Leiner, T., Viergever, M., Isgum, I., 2017. Automatic segmentation and disease classification using cardiac cine MR images, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.

Xue, W., Brahm, G., Pandey, S., Leung, S., Li, S., 2018. Full left ventricle quantification via deep multitask relationships learning. Med Image Anal 43, 54–65.

Yan, W., Wang, Y., Li, Z., van der Geest, R., Tao, Q., 2018. Left ventricle segmentation via optical-flow-net from short-axis cine MRI: preserving the temporal coherence of cardiac motion. MICCAI, 613–621.

Yang, D., Wu, P., Tan, C., Pohl, K., Axel, L., Metaxas, D., 2017. 3D motion modeling and reconstruction of left ventricle wall in cardiac MRI. International Conference on Functional Imaging and Modeling of the Heart, 481–492.

Zheng, Q., Delingette, H., Duchateau, N., Ayache, N., 2018. 3D consistent and robust segmentation of cardiac images by deep learning with spatial propagation. IEEE Trans Med Imaging 37(9), 2137–2148.

Appendix A. Loss Function for Training ApparentFlow-Net

To penalize the crossing or large rotations of flows, we compute the difference between the warped x-components (resp. y-components) of each pair of horizontally (resp. vertically) adjacent pixels $P_{x+}$ and $P$ (resp. $P_{y+}$ and $P$). There is a crossing if and only if this difference is smaller than 0, in which case a penalty equal to the square of this difference applies; otherwise no penalty applies. Hence we come up with the term $L_{\mathrm{CROSS}}(F_t)$ to penalize the crossing of flows:

$$
\begin{aligned}
L_{\mathrm{CROSS}}(F_t)
&= \sum_P \min\Big(\big(x + 1 + F^x_t(P_{x+})\big) - \big(x + F^x_t(P)\big),\, 0\Big)^2 \\
&\quad + \sum_P \min\Big(\big(y + 1 + F^y_t(P_{y+})\big) - \big(y + F^y_t(P)\big),\, 0\Big)^2 \\
&= \sum_P \min\Big(1 + \frac{\partial F^x_t(P)}{\partial x},\, 0\Big)^2 + \min\Big(1 + \frac{\partial F^y_t(P)}{\partial y},\, 0\Big)^2
\end{aligned}
\tag{A.1}
$$

in which $\partial F^x_t(P)/\partial x$ is computed by finite differences as $F^x_t(P_{x+}) - F^x_t(P)$ (and similarly for $\partial F^y_t(P)/\partial y$).

The Dice function in the term $L_{GT}(F_{ES})$ is defined on two images U and V as described in Zheng et al. (2018):

$$
\mathrm{Dice}(U, V) = -\,\frac{2\sum_P U(P)\,V(P) + \varepsilon}{\sum_P U(P) + \sum_P V(P) + \varepsilon}
\tag{A.2}
$$

with $\varepsilon = 1$ a term for better numerical stability in training.
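Eq. (A.2) corresponds to the following soft Dice loss, shown here as a minimal sketch assuming U and V are arrays of pixel-wise probabilities:

```python
import numpy as np

def dice_loss(u, v, eps=1.0):
    """Negative soft Dice of Eq. (A.2): -1 for a perfect overlap,
    approaching 0 for disjoint masks; eps stabilizes the empty case."""
    return -(2.0 * (u * v).sum() + eps) / (u.sum() + v.sum() + eps)
```

Minimizing this loss hence drives the warped segmentation toward full overlap with the ground truth.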

Appendix B. Variants of the Proposed Classification Model with Different Values of Parameter C

We also perform a 5-fold cross-validation on the ACDC training set for variants of the proposed classification model, obtained by varying the parameter C in the 4 logistic regression classifiers. Their performances are reported in Table B.8.

Table B.8. The 5-fold cross-validation performance on the ACDC training set of some variants of the proposed classification model with various values of parameter C

C       Training Set Accuracy
1       76%
5       88%
10      92%
50      95%
100     95%
500     93%
1000    93%
5000    93%
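The sweep of Table B.8 could be reproduced along the following lines. This is a sketch using Scikit-learn: `X` and `y` stand for the 9 input features and the labels of one of the binary tasks, which are not reproduced here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def accuracy_for_c(X, y, C):
    """Mean 5-fold CV accuracy of an L2 (ridge) logistic regression.

    Larger C means weaker regularization; Table B.8 sweeps
    C over {1, 5, 10, 50, 100, 500, 1000, 5000}.
    """
    clf = LogisticRegression(C=C, penalty="l2", solver="lbfgs", max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
```

The table suggests the accuracy plateaus around C = 50–100, which is consistent with a moderate amount of regularization being helpful on a 100-case training set.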

Appendix C. Variants of the Proposed Classification Model with Different Classifiers and Input Features

Appendix C.1. Variants with Other Classifier Models

We replace the proposed ridge logistic regression classifiers by other types of classifiers on the same sets of input features:
• Lasso classifiers: each of the 4 classifiers is a least absolute shrinkage and selection operator (Lasso). The constant alpha that multiplies the L1 term is empirically chosen to be 10^-4.
• LassoCV classifiers: each of the 4 classifiers is a Lasso model with model selection by cross-validation (LassoCV). The optimal constant alpha is searched in the range [10^-4, 10^-0.5] in a 4-fold cross-validation on the training data.
• Random forest classifiers: each of the 4 classifiers is a random forest of 1000 trees which expand their nodes in training until all leaves are pure or contain less than 2 samples. Entropy is used to measure the quality of a split in training.
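Under our reading of the hyper-parameters above, the three variants could be instantiated with Scikit-learn as follows (a sketch; the exact settings of the original experiments may differ):

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.ensemble import RandomForestClassifier

def make_variant_classifier(kind):
    """Instantiate one of the three variant classifiers of Appendix C.1."""
    if kind == "lasso":
        return Lasso(alpha=1e-4)  # L1 constant empirically set to 10^-4
    if kind == "lasso_cv":
        # alpha searched in [10^-4, 10^-0.5] by 4-fold cross-validation
        return LassoCV(alphas=np.logspace(-4, -0.5, 30), cv=4)
    if kind == "random_forest":
        # 1000 trees grown until leaves are pure, entropy split criterion
        return RandomForestClassifier(n_estimators=1000, criterion="entropy")
    raise ValueError(f"unknown variant: {kind}")
```

Each instance would then be fitted on the same binary task and feature subset as the logistic regression classifier it replaces.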

Appendix C.2. Variants without Manual Input Feature Selection

We train several variants of the proposed model without manual feature selection. They all consist of 4 classifiers arranged in the same order to perform the same binary classification tasks as in the proposed model, but each of the 4 classifiers in these variants takes all 9 features together as input. In total, we implement and examine 6 variants with the following models as their classifiers respectively:
• Variant with SVM classifiers: each of the 4 binary classifiers is a support vector machine (SVM) with linear kernel and penalty parameter C=50.
• Variant with RVM classifiers: each of the 4 binary classifiers is a relevance vector machine (RVM) with linear kernel, as introduced in Tipping and Faul (2003).
• Variant with Lasso classifiers: each of the 4 binary classifiers is a Lasso, a model capable of performing both variable selection and regularization. The constant alpha that multiplies the L1 term is empirically set to 10^-4.
• Variant with LassoCV classifiers: each of the 4 binary classifiers is a Lasso with model selection in a 4-fold cross-validation on the training data. The optimal constant alpha is searched in the range [10^-4, 10^-0.5].


• Variant with random forest classifiers: each of the 4 binary classifiers is a random forest of 1000 trees which expand their nodes in training until all leaves are pure or contain less than 2 samples. Entropy is used to measure the quality of a split in training.
• Variant with HDDA classifiers: each of the 4 binary classifiers is a high dimensional discriminant analysis (HDDA) model, an expectation-maximization algorithm designed for high-dimensional data clustering based on the ideas of dimension reduction and parsimonious modeling (Bouveyron et al. (2007), Orlhac et al. (2018)). Though the 9-feature space in this paper is not high dimensional, we show the performance of such a sophisticated method for comparison.

Appendix C.3. Variants without Binary Classification

We train and evaluate the following 2 variants on all 9 input features. These variants are obtained by replacing the 4 binary classifiers with a single multi-class classifier:
• Variant using random forest: a random forest of 1000 trees which expand their nodes in training until all leaves are pure or contain less than 2 samples. Entropy is used to measure the quality of a split in training.
• Variant using MLP: a multi-layer perceptron (MLP) with 2 hidden layers of 32 neurons and tanh activation. The Adam optimizer is used to train it for 10^5 epochs with learning rate 0.001.
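The MLP variant above maps directly onto Scikit-learn's MLPClassifier. The sketch below reflects our reading of the stated hyper-parameters (the original implementation may differ in details such as batch size):

```python
from sklearn.neural_network import MLPClassifier

# 2 hidden layers of 32 tanh neurons, trained with Adam at learning
# rate 0.001; max_iter stands in for the 10^5 training epochs.
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), activation="tanh",
                    solver="adam", learning_rate_init=0.001,
                    max_iter=100_000)
```

Calling `mlp.fit(X, y)` with the 9 features and the 5-class labels then yields the single multi-class classifier of this variant.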

Appendix C.4. Implementation of the Variants

Among the above variants of the proposed classification model with different classifiers and input features, the HDDA classifiers are implemented using the HDDA Python toolbox downloaded from the GitHub page https://github.com/mfauvel/HDDA, the RVM classifiers are implemented in Python according to the method described in Tipping and Faul (2003), and all the other variants are implemented with Scikit-learn.

Appendix D. Examples of Apparent Flow Generated by the ApparentFlow-net

We provide 4 examples of the apparent flow generated by the ApparentFlow-net for 4 ACDC training set cases in different pathological categories. In Fig.D.9, given the frames at ED (first column) and the frames around ES (second column), we apply the trained ApparentFlow-net to generate the apparent flow maps (third column).

Visually, we find that the apparent flow can indeed characterize the cardiac motion of typical cases in the pathological categories. As expected, the apparent flow on the LVM of a NOR case is oriented along the gradient of the image intensity and has roughly the same amplitude throughout the left ventricle, signifying the synchronous and uniform contraction and thickening of the LVM of the NOR case. For an HCM case, the flow on the LVM is excessively large, which means that the contraction and thickening are excessive, a typical phenomenon of HCM cases. Conversely, the flow on the LVM of a DCM case is small, since the hearts of DCM cases usually do not contract or thicken enough. Finally, the flow on the LVM of a MINF case is not uniform: some myocardial segments contract and thicken much less than the others, a typical symptom of MINF cases.


Fig. D.9. Four examples of the apparent flow generated by the ApparentFlow-net for 4 ACDC training set cases in different pathological categories (rows: NOR, HCM, DCM, MINF). The images in the first column are the frames at ED. The images in the second column are the frames around ES. The apparent flow maps corresponding to the pairs of frames in the first and second columns are shown in the third column. The apparent flow can indeed characterize the cardiac motion of typical cases in the pathological categories. NOR: synchronous and uniform flow on LVM; HCM: excessively large flow on LVM; DCM: very small flow on LVM; MINF: asynchronous and non-uniform flow on LVM.