



Classifying Mixed Patterns of Proteins in High-Throughput Microscopy Images

Using Deep Neural Networks

Enze Zhang 1,2, Boheng Zhang 3, Shaohan Hu 4, Fa Zhang 1(✉), and Xiaohua Wan 1

1 High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
{zhangenze,zhangfa,wanxiaohua}@ict.ac.cn

2 University of Chinese Academy of Sciences, Beijing, China
3 Department of Automation, Tsinghua University, Beijing, China

[email protected]
4 School of Software, Tsinghua University, Beijing, China

[email protected]

Abstract. Proteins contribute significantly to most bodily functions within cells and are essential to the physiological activities of every creature. Microscopy imaging is a remarkable technique used to observe and identify proteins in different kinds of cells, and the resulting analyses are critical to biomedical studies. However, with the development of high-throughput microscopy imaging, protein microscopy images are generated at an ever faster pace, making it harder for experts to identify them manually. To better mine and understand the information about proteins in these huge numbers of images, methods that automatically and accurately identify mixed-patterned proteins within various cells are urgently needed. In this paper, we design novel and effective data preparation and preprocessing methods for high-throughput microscopy protein datasets. We propose the ACP layer and "buffering" layers, using them to design customized architectures for several typical CNN classifiers with new input and head parts. These modifications make the models more adaptive and accurate for our task. We train the models with more effective and efficient optimization strategies that we design, e.g., cycle learning with learning rate scheduling. Besides, greedy selection of thresholds and ensembling of multi-sized models in the post-processing stage are proposed to further improve prediction accuracy. Our experimental results on Human Protein Atlas datasets demonstrate that the proposed methods show excellent performance in mixed-patterned protein classification, exceeding the state-of-the-art architecture GapNet-PL by 0.02 to 0.03 in F1 score. The whole work reveals the usefulness of our methods for high-throughput microscopy protein image identification.

Keywords: Protein classification · Deep learning · Mixed patterns of proteins · High-throughput microscopy images

© Springer Nature Switzerland AG 2019
D.-S. Huang et al. (Eds.): ICIC 2019, LNCS 11643, pp. 448–459, 2019.
https://doi.org/10.1007/978-3-030-26763-6_43


1 Introduction

Proteins perform their functions in different types and forms within distinct cells. Understanding various protein structures in different cell types of highly different morphology is fundamental to understanding the processes of the human body. Historically, protein identification was restricted to one pattern in certain types of cells; however, models are now expected to recognize multiple patterns that are mixed together among various types of cells. The identification of specific patterns of proteins may reveal their contacts and relations, and help us better understand biological features of ourselves, such as diseases or evolution.

Recently, the Human Protein Atlas (HPA) set out to create a smart-microscopy system that classifies proteins and localizes their positions in high-throughput images, using methods capable of classifying mixed patterns of proteins in cellular microscopy images (see Fig. 1). Generally, human experts can identify proteins from these high-throughput microscopy images [1]. However, this is costly and time-consuming because the images contain many extremely similar and confusing patterns. Deep learning methods therefore promise high classification performance, since the HPA project has provided plenty of annotated protein microscopy images as training data for neural network models [7].

DNN methods have become popular in image analysis and other tasks, as they can learn features automatically. In particular, CNNs have become strong tools for image classification, localization, detection, segmentation, and so on, e.g., FCNs [2], VggNet [6], InceptionNet [3], ResNet [4], DenseNet [5]. CNNs have also been applied to analyze biological, medical, and microscopy images. However, these classification networks cannot be applied to high-throughput microscopy data directly due to certain characteristics of the data, such as complex patterns, relatively high resolution with varying sizes, and high noise.

GapNet-PL [7] is a state-of-the-art CNN architecture designed to tackle the characteristics of high-throughput fluorescence microscopy imaging data; it uses global averages of filters at different abstraction levels. The architecture achieves excellent performance on datasets provided by the HPA project. For better accuracy, we came up with some novel dataset preprocessing methods, which helped improve performance. We utilized several typical CNN feature extractors and built our networks by combining these standard encoders with novel customized head parts.

Fig. 1. Some random protein microscopy images from HPA Cell Atlas datasets.



Moreover, we came up with new optimization strategies. Greedy selection of thresholds and multi-sized model ensembling in postprocessing are designed to further improve the scores of the models. Our experimental results show that the proposed methods make our models achieve higher accuracy than all baseline approaches. In the following, we describe the datasets used in our experiments and the proposed methods in detail.

2 Methods

2.1 Dataset Preparation and Preprocessing

All experiments were conducted on datasets released for the "Human Protein Atlas Image Classification" challenge by the Human Protein Atlas [8]. The main dataset contains around 30,000 samples for training and 11,500 samples for testing, drawn from part of the HPA Cell Atlas led by Dr. Emma Lundberg. We also adopt around 70,000 external samples from the HPA Cell Atlas [9]. Therefore, we have around 100,000 samples for training and validation, and about 30,000 for testing. We removed about 6,000 duplicated samples that appear extremely similar, identified via image hashing, to avoid the label distribution shifting problem.
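The paper does not specify the hashing scheme used for deduplication. Below is a minimal sketch of near-duplicate removal based on a perceptual hash, assuming the third-party imagehash library; the distance cutoff max_dist is a hypothetical parameter:

```python
import imagehash
from PIL import Image

def dedup_by_hash(paths, max_dist=2):
    """Keep only images whose perceptual hash differs from all kept ones.
    O(n^2) pairwise comparison; fine as an offline, one-off cleaning step."""
    kept, hashes = [], []
    for p in paths:
        h = imagehash.phash(Image.open(p))  # 64-bit perceptual hash
        if all(h - h2 > max_dist for h2 in hashes):  # '-' is Hamming distance
            kept.append(p)
            hashes.append(h)
    return kept
```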

There are 28 distinct classes (or types) of proteins in the dataset. The data is collected via confocal microscopy. However, the dataset includes 27 cell types of highly different morphology, which can influence the protein patterns of the distinct organelles. Every sample in the dataset consists of 4 image channels, red, green, blue, and yellow, with each channel stored as an individual file. The red channel represents the microtubules, green the protein of interest, blue the nucleus, and yellow the endoplasmic reticulum. The green channel thus corresponds to the labels, while the other channels can serve as references; we use all filter information for our inputs. Moreover, there is extreme label imbalance in the dataset

Fig. 2. The categories and calculated distribution of 28 protein classes (Color figure online).



(see Fig. 2). Some protein types account for most of the dataset, e.g., Nucleoplasm and Cytosol. Rare classes, like Rods & rings, are hard to train and predict but play an important role in the score. Therefore, we adopted multilabel stratification to balance the otherwise inconsistent label distributions of the training and validation data when splitting the whole training dataset. The training samples were randomly split into a training set (90%, or 80% with 5-fold CV) and a validation set (10%, or 20% with 5-fold CV) using different random seeds (Fig. 3).
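As a sketch of the multilabel-stratified split, one option is the scikit-multilearn library; the use of iterative_train_test_split here is our assumption, not necessarily the paper's implementation:

```python
import numpy as np
from skmultilearn.model_selection import iterative_train_test_split

# Y: binary label matrix of shape (N, 28); X holds the sample indices.
X = np.arange(Y.shape[0]).reshape(-1, 1)
# 90% train / 10% validation, stratified over the 28-label combinations.
X_train, Y_train, X_val, Y_val = iterative_train_test_split(X, Y, test_size=0.1)
```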

The data was provided as two versions of the same images: a scaled set of 512 × 512 PNG files and the full-sized originals (a mix of 2048 × 2048 and 3072 × 3072 TIFF files). However, only full-sized images are available for the external data. To obtain more accurate predictions, the full-sized original images were adopted. Although the original images are of high quality, we must find a balance between model efficiency and accuracy. Therefore, we resized every image to 768 × 768 or 1024 × 1024 depending on its original size, and we randomly crop 512 × 512 patches from the 768 × 768 images (or 768 × 768 patches from the 1024 × 1024 images) as a training-time augmentation before feeding images to the models.
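A minimal sketch of this "resize and crop" step for one channel file, assuming PIL and NumPy; the bilinear interpolation mode is our choice:

```python
import random
import numpy as np
from PIL import Image

def resize_and_random_crop(img, resize_to, crop_to):
    """Resize a full-sized channel image, then take a random square crop,
    e.g. 1024 -> 768 crop, or 768 -> 512 crop, depending on original size."""
    img = img.resize((resize_to, resize_to), Image.BILINEAR)
    arr = np.asarray(img)
    top = random.randint(0, resize_to - crop_to)
    left = random.randint(0, resize_to - crop_to)
    return arr[top:top + crop_to, left:left + crop_to]
```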

2.2 Customized Architectures

We design our architectures inspired by several typical classification networks, improving them with more targeted, accurate, and adaptive head parts, as well as new input layers. We implement these architectures in two parts, encoders and heads. We adopt encoders from widely used classifiers, as they have proved advanced and effective in many situations. We change the input layers to ones with 4 channels because all four filters (red, green, blue, yellow) of each image are utilized.
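One common way to realize such a 4-channel input layer, sketched here for a torchvision ResNet50 encoder, is to widen the first convolution while reusing the pretrained RGB weights; initializing the yellow channel from the red one is our assumption, not something the paper specifies:

```python
import torch
import torch.nn as nn
import torchvision.models as models

def make_4ch_resnet50():
    """Replace the 3-channel input conv with a 4-channel (RGBY) one."""
    m = models.resnet50(pretrained=True)
    old = m.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
    new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding, bias=False)
    with torch.no_grad():
        new.weight[:, :3] = old.weight       # keep pretrained RGB filters
        new.weight[:, 3] = old.weight[:, 0]  # yellow <- red (initialization choice)
    m.conv1 = new
    return m
```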

Fig. 3. The figure illustrates some protein images (green) related to Endosomes together with the other three filter channels. The first row shows a sample that contains only Endosomes proteins. The second row displays another sample that contains two mixed types of protein. (Color figure online)



The head part is key to the task. High-throughput microscopy images of proteins often come with varying sizes, high resolution, and complex patterns. Therefore, we drop the GAP (global average pooling) layer at the beginning of typical heads (see Fig. 4(a)). Instead, we propose an ACP (adaptive concatenate pooling) layer, built by concatenating two kinds of pooling layers, an adaptive average pooling layer and an adaptive max pooling layer, along the channel dimension (see Fig. 4(b)). An adaptive average pooling layer applies adaptive average pooling over an input signal composed of several input planes, while an adaptive max pooling layer applies adaptive max pooling. The output is of size H × W for any input size, and the number of output features equals the number of input planes. This lets us decide what output dimensions we want, instead of choosing the input's dimensions to fit a desired output size. We therefore set H and W to 1: whatever the sizes of the input images, this layer acts like global pooling, adaptively. We concatenate both types of pooling because it provides the model with the information of both methods and improves performance.
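A minimal PyTorch sketch of the ACP layer as described, concatenating adaptive max and average pooling along the channel dimension with output spatial size 1 × 1:

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    """ACP: concat adaptive max- and average-pooled features channel-wise."""
    def __init__(self, output_size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(output_size)
        self.max = nn.AdaptiveMaxPool2d(output_size)

    def forward(self, x):  # x: (B, C, H, W) for any H, W
        return torch.cat([self.max(x), self.avg(x)], dim=1)  # (B, 2C, 1, 1)
```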

We assume that the number of channels output by a general encoder is C. After our adaptive concatenate pooling, the channel number becomes 2C, usually a relatively large number. Instead of directly cutting or shrinking this down to the target channel number as usual classifiers do, which would lose much of the information in the encoded features, we add one or two intermediate linear layers, which we call "buffering layers", with channel numbers distributed from C to C/4, to maintain the original feature information as much as possible (see Fig. 4(b)).

Fig. 4. The comparison between (a) a typical CNN classifier with normal input and head parts, and (b) the proposed architecture with customized head and input layers (Color figure online).



Both the adaptive concatenate pooling and the buffering layers help make the classifiers more accurate on this specific task. The modification of the input layers also matters.
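Putting the pieces together, here is a sketch of a customized head with one 1024-unit buffering layer, reusing the AdaptiveConcatPool2d module sketched above; the exact placement of batch normalization and dropout is our assumption:

```python
import torch.nn as nn

class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

def make_head(c_enc, n_classes=28, drop=0.5):
    """Customized head: ACP doubles c_enc, then one 'buffering' linear layer."""
    return nn.Sequential(
        AdaptiveConcatPool2d(1),
        Flatten(),
        nn.BatchNorm1d(2 * c_enc),
        nn.Dropout(drop / 2),
        nn.Linear(2 * c_enc, 1024),  # buffering layer
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(1024),
        nn.Dropout(drop),
        nn.Linear(1024, n_classes),  # 28 protein classes
    )
```

For a ResNet50 encoder (C = 2048), the flattened ACP output has 4096 units, matching the configuration reported in Sect. 3.1.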

2.3 Optimization Strategies

Although we may have designed good architectures, we found it rather hard and inefficient to train networks on this dataset, so we could not get ideal results directly. Therefore, we design more effective and efficient optimization strategies. While training our models, we divide each network into 2 layer groups (see Fig. 5(a)): the first group consists of the encoder layers and the second of the head layers. We can then apply different learning rates to the different layer groups. It has been shown that building CNNs on pre-trained architectures usually performs better than training from scratch. When adapting to a new task, the weights of the top layers need the most change, since they are newly initialized and represent higher-level object features, while the deeper (encoder) layers need less change, since they are already well trained to recognize primary features like lines and corners. Since the images in the HPA datasets are quite different from ImageNet, which contains everyday natural pictures, we decrease the learning rate for the first layer group (encoder part) only by a factor of 2 to 3. Models are therefore trained with learning rates [lr/2, lr], where lr denotes the base learning rate.
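In PyTorch this amounts to two optimizer parameter groups; the attribute names model.encoder and model.head and the base rate of 3e−4 below are illustrative assumptions:

```python
import torch

# model is assumed to expose its two layer groups as .encoder and .head
lr = 3e-4  # example base learning rate
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": lr / 2},  # pre-trained encoder
    {"params": model.head.parameters(),    "lr": lr},      # newly initialized head
])
```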

Fig. 5. (a) The network is divided into 2 layer groups with separate learning rates. (b) The batch loss as a function of learning rate when finding the optimal starting lr (the best lr is around 1e−3 to 1e−2). (c) A condition where the loss might be stuck at a local minimum as the learning rate keeps dropping during training. (d) An example of cycle learning with cosine annealing and cycle repetition (lr restart).



As for choosing the starting learning rate, we no longer simply try learning rates from large to small, or conversely. Instead, we begin by finding the optimal initial learning rate. In this method, we first train the network with a relatively low learning rate (around 1e−6). We then increase it exponentially with each batch, recording the loss for each learning rate (see Fig. 5(b)). The current optimal learning rate is the highest value at which the loss is still dropping.
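A sketch of this learning rate range test; the exponential growth factor and the divergence stopping rule are conventional choices rather than details given in the paper:

```python
def find_lr(model, loader, optimizer, criterion, init_lr=1e-6, final_lr=1.0):
    """Increase lr exponentially each batch and record the loss; the optimal
    starting lr is the largest value at which the loss is still dropping."""
    mult = (final_lr / init_lr) ** (1.0 / (len(loader) - 1))
    lr, lrs, losses = init_lr, [], []
    for xb, yb in loader:
        for g in optimizer.param_groups:
            g["lr"] = lr
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):  # stop once the loss diverges
            break
        lr *= mult
    return lrs, losses
```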

After choosing the optimal starting learning rate, we adopt cycle learning with learning rate scheduling to train our models (see Fig. 5(d)). Cycle learning, inspired by Leslie Smith's work [10], contains two key components: cosine annealing and cycle repetition. As training proceeds, the total loss of the architecture should get closer and closer to a minimum (local or global); however, convergence is often hard as the loss approaches its minimum value. Cosine annealing addresses this by decreasing the learning rate along a cosine curve with each batch of data: we start at a high learning rate and drop it according to the cosine function. We found this schedule works very well for the convergence problem in this task. Moreover, the loss is likely to get trapped in a local minimum rather than the global one during training (see Fig. 5(c)). If we then increase the learning rate suddenly, the loss may find its way toward the global minimum by "jumping" out of the local minimum with a bigger step. We therefore reset the learning rate each time it drops to its minimum value under cosine annealing, and call that a "cycle". We repeat the process every time one cycle is done, until the loss hardly decreases.
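A sketch of the resulting per-batch schedule: cosine annealing within each cycle, restarting when the cycle ends; decaying max_lr between cycles (as done for ResNet18 in Sect. 3.1) is left to the caller:

```python
import math

def cycle_lr(batch_idx, cycle_len, max_lr, min_lr=0.0):
    """Cosine annealing from max_lr to min_lr over one cycle of cycle_len
    batches, then restart ('lr restart') at the next cycle."""
    t = (batch_idx % cycle_len) / cycle_len  # position within the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```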

2.4 Threshold Selection and Model Ensembling with Multi-sized Inputs

For convenience, we use a single value of 0.5 as the threshold for F1 computation across all classes during training. At post-processing time, instead of 0.5, we use a separate threshold for each class, with each class's threshold tuned on the validation set. The model responds much more strongly to common classes, since they have more training samples than the others; the probability scores of these classes are thus likely to be higher than 0.5, even approaching 1.0. Rare classes, on the contrary, tend to obtain much smaller scores, even close to 0. Considering the extreme label imbalance of our dataset, a greedy threshold selection method is proposed to reduce the impact of sample imbalance on the model and, ultimately, to boost the final scores by increasing the scores on the validation data.

We search from 0 to 1.0 in steps of 0.001 to find the best threshold for each class. The starting threshold for every class is set to 0.3, which is about the best single threshold we obtained when using the same threshold for all classes. In the greedy selection process, we start with the first class and, holding the thresholds of all other classes fixed, find the local optimal value that achieves the highest score on the validation set. We then move on to the second class, again fixing the thresholds of the other classes, and explore its local optimal value. The search continues until the last class is done. In the end we obtain an array of "greedy optimal" thresholds for the 28 classes, and this strategy works well in our experiments.



optimal” thresholds for 28 classes using the threshold strategy, and it works well in theexperiments.

Apart from ensembling the N folds from CV (cross-validation), we ensemble models with different architectures and input sizes (512 × 512, 768 × 768) at prediction time. Generally, every kind of network has its advantages; by adopting a few networks and leveraging the strengths of each, the ensemble becomes more general and robust. Meanwhile, models with larger inputs may classify fine details better, while models with smaller inputs are more stable on a dataset and easier to train. By averaging the prediction scores of various models with different input sizes, we achieve a much better final performance. The experimental and evaluation results are shown and analyzed in the next section.
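The ensembling step then reduces to averaging probabilities before thresholding; all_model_probs and greedy_thr below are hypothetical names:

```python
import numpy as np

# Average sigmoid outputs over folds, architectures, and input sizes,
# then apply the greedily selected per-class thresholds.
final_probs = np.mean(np.stack(all_model_probs), axis=0)  # (N, 28)
predictions = (final_probs >= greedy_thr).astype(int)
```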

3 Experiments and Results

3.1 Model Design and Training Settings

All experiments are implemented in Python 3.6 with the PyTorch 1.0 library [11]. To conduct a fair comparison, we re-implemented all methods, including all baselines and ours, and optimized the relevant hyperparameters, and even structures, for each method. As this is a multi-class, multi-label classification task with 28 classes, the final output layer of every network contains 28 units. Apart from GapNet, all models are optimized with focal loss with γ set to 2. The batch size for each model depends on its memory consumption; batch sizes are set as large as possible to fit on 4 NVIDIA GTX 1080 Ti GPUs with around 11 GB of memory each. We use Adam with default settings as the optimizer for all our models except GapNet.
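A common multi-label formulation of focal loss with γ = 2 is sketched below; the paper does not state whether a class-weighting factor (α) was used, so none is applied:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Sigmoid focal loss for multi-label outputs of shape (B, 28)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true outcome
    return ((1.0 - p_t) ** gamma * bce).mean()
```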

VGG19. A modified version is adopted in which a BN (batch normalization) layer follows every convolutional layer of the original VGG, which makes training easier.

ResNet18. The optimal learning rate is found to be around 3e−4, and it grows slightly, to 4e−4, with the new head part. We use 3e−4 with learning rate scheduling to train the original ResNet18, after replacing its old GAP layer to fit the new input size. For the modified ResNet18, we use cycle learning with a cycle length of 5 epochs, optimized with learning rates of [2e−4, 4e−4], and we halve the learning rate every cycle. Training stops after about 20 epochs. The number of buffering layers is set to one, with 1024 units. We feed the networks large inputs of 768 × 768; since this is a small network, a batch size of 128 is adopted.

ResNet50. For the modified ResNet50, cycle learning with a cycle length of 4 epochs is used, and we apply learning rates of [1e−4, 2e−4] for 3 or 4 cycles, where we find performance is best. One buffering layer of 1024 units with a 0.5 dropout ratio follows the flattened ACP layer of 4096 units. We feed the networks 512 × 512 crops and, after several attempts, adopt a batch size of 74.

ResNet101. Since we have enough data to feed this large network, the model should be solid. The alterations to ResNet101 are the same as for ResNet50, except that the ACP layer is followed by two buffering layers, one with 4096 units and a 0.25 dropout ratio, and one with 1024 units and a 0.5 dropout ratio.




InceptionV4. InceptionV4 was proposed recently and has been widely applied. Though it improves on the memory consumption problem of InceptionV3, it is still space-consuming. Therefore, we use a batch size of 48 with an input size of 512 × 512. The channel number of the encoder outputs is 1536, which becomes 3072 after the ACP layer. We add one buffering layer with 1024 units to preserve feature information, followed by 50% dropout.

GapNet-PL. This architecture has a relatively simple structure and a low number of parameters. To achieve its best performance, we add some convolutional blocks in its first step, thus passing more features to its second stage. A large input size of 768 × 768 is fed, since we want to test its best capability on the dataset. Stochastic gradient descent (SGD) with momentum 0.9 is kept, as is its original initial learning rate of 0.01. To avoid overfitting, the following regularization is applied: an L1 norm of 1e−7 and an L2 norm of 1e−5, the same as the original settings. The dropout rate in the fully connected layers remains 30%, and we use a bigger batch size of 128.

3.2 Evaluation and Results

For performance evaluation, we use the F1 score as our metric. As this is a multilabel classification task, the macro F1 score is adopted: it calculates the F1 score for each label and then takes their unweighted mean. The computation is as follows:

$$R = \frac{TP}{TP + FN} \quad (1)$$

$$P = \frac{TP}{TP + FP} \quad (2)$$

$$F_1 = \frac{2PR}{P + R} \quad (3)$$

$$F_1^{\mathrm{macro}} = \frac{\sum_{i=1}^{N} F_{1,i}}{N} \quad (4)$$

where R denotes the recall of a certain class and P denotes its precision. TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively. The per-class F1 score is the harmonic mean of recall and precision. Finally, the macro F1 score, which measures the classification accuracy across all classes, is obtained as the unweighted average of the per-class F1 scores.
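Equations (1)–(4) translate directly into a few lines of NumPy; the small eps guarding against classes with no positives is our addition:

```python
import numpy as np

def macro_f1(y_true, y_pred, eps=1e-8):
    """y_true, y_pred: binary indicator arrays of shape (N, n_classes)."""
    tp = (y_true * y_pred).sum(axis=0)
    fp = ((1 - y_true) * y_pred).sum(axis=0)
    fn = (y_true * (1 - y_pred)).sum(axis=0)
    p = tp / (tp + fp + eps)        # Eq. (2), per class
    r = tp / (tp + fn + eps)        # Eq. (1), per class
    f1 = 2 * p * r / (p + r + eps)  # Eq. (3)
    return f1.mean()                # Eq. (4), unweighted mean over classes
```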

Usually, a confusion matrix helps when evaluating a classification task. However, as this is a multi-class classification with multiple labels per sample, it is hard to say which class is mistaken for another when a sample's label contains two or more classes. Therefore, we use three tables to present the evaluation results of the proposed methods.



In Table 1, we can see that both the proposed customized architectures (Customized) and our optimization strategies (Opt) help improve the performance of identifying mixed patterns in protein microscopy images, compared with the original classifiers (Original). Some customized models show a relatively large F1 increase of 0.02 to 0.03. Besides the proposed ACP and buffering layers, the new input layers, which let the networks utilize all provided cellular landmarks, also matter.

In Table 2, we list the performance of all customized single-fold models trained with the proposed strategies and applied with threshold selection, along with the F1 score of GapNet-PL for comparison. Our threshold optimization algorithm contributes significantly to model performance: F1 increases by 0.01 to 0.02 after threshold selection compared with the scores in Table 1.

Table 1. Comparison of model performance (F1) with various method combinations

Model                     | Original | Original+Opt | Customized | Customized+Opt
--------------------------|----------|--------------|------------|---------------
ResNet50 (fold 0)         | 0.737    | 0.742        | 0.750      | 0.759
ResNet50 (fold 1)         | 0.729    | 0.738        | 0.734      | 0.747
ResNet50 (fold 2)         | 0.735    | 0.737        | 0.746      | 0.751
ResNet50 (fold 3)         | 0.731    | 0.733        | 0.741      | 0.746
ResNet50 (fold 4)         | 0.726    | 0.731        | 0.730      | 0.735
InceptionV4 (single fold) | 0.736    | 0.743        | 0.749      | 0.758
VGG19_BN (single fold)    | 0.721    | 0.727        | 0.736      | 0.745
ResNet18 (random fold 1)  | 0.725    | 0.728        | 0.737      | 0.743
ResNet18 (random fold 2)  | 0.719    | 0.730        | 0.725      | 0.734
ResNet18 (random fold 3)  | 0.728    | 0.734        | 0.741      | 0.748
ResNet101 (single fold)   | 0.742    | 0.749        | 0.758      | 0.766

Table 2. Scores of models applied with the methods above and threshold selection (except GapNet)

Model                     | Macro F1
--------------------------|---------
ResNet50 (fold 0)         | 0.773
ResNet50 (fold 1)         | 0.766
ResNet50 (fold 2)         | 0.769
ResNet50 (fold 3)         | 0.762
ResNet50 (fold 4)         | 0.754
InceptionV4 (single fold) | 0.776
VGG19_BN (single fold)    | 0.761
ResNet18 (random fold 1)  | 0.758
ResNet18 (random fold 2)  | 0.747
ResNet18 (random fold 3)  | 0.763
ResNet101 (single fold)   | 0.780
GapNet-PL                 | 0.763



Moreover, many single-fold models equipped with the proposed methods outperform GapNet-PL. In particular, ResNet101 achieves scores around 0.02 higher than GapNet-PL, and it still performs slightly better than GapNet-PL even without threshold selection, according to Tables 1 and 2.

In Table 3, we present the performance of 6 ensembled models for further comparison. Although the scores of VGG and ResNet18 are relatively low, performance improves slightly when they are ensembled with the models using 512 × 512 inputs. For example, there is a 0.002 drop in F1, from 0.791 to 0.789, when VGG is left out of the ensemble, according to the third and fourth rows of the table. Since VGG19_BN and ResNet18 are fed 768 × 768 inputs while the others use 512 × 512 inputs, we can conclude that ensembling models with multi-sized inputs does help improve accuracy.

Finally, based on all the proposed methods, we obtain models with excellent performance, as seen in the tables above. The best model reaches a macro F1 of 0.791, 0.028 higher than the state-of-the-art model GapNet-PL.

4 Conclusion and Discussion

In this paper, we propose effective methods for identifying proteins with mixed patterns in high-throughput microscopy images, based on the datasets provided by the Human Protein Atlas. We design customized versions of typical CNN architectures with new input layers and novel top parts. Several data preparation and preprocessing methods are proposed to solve data distribution problems and improve overall performance. By proposing the "resize and crop" method, which maps the original huge high-throughput microscopy images to several smaller sizes, we find a balance between efficiency and accuracy in processing this kind of data. Meanwhile, optimization strategies, implemented via layer group division, optimal learning rate probing, and cycle learning, are proposed to improve the training procedure and the accuracy of the models.

Table 3. Performance comparison of models with different ensembles and GapNet-PL

Model                                                                     | Input size | Macro F1
--------------------------------------------------------------------------|------------|---------
ResNet50 (5 folds)                                                        | 512        | 0.783
ResNet18 (3 random folds)                                                 | 768        | 0.768
5 (ResNet50) + 3 (ResNet18) + 1 (ResNet101) + 1 (InceptionV4) + 1 (VGG19) | 512, 768   | 0.791
5 (ResNet50) + 3 (ResNet18) + 1 (ResNet101) + 1 (InceptionV4)             | 512, 768   | 0.789
5 (ResNet50) + 1 (ResNet101) + 1 (InceptionV4) + 1 (VGG19)                | 512, 768   | 0.788
5 (ResNet50) + 3 (ResNet18)                                               | 512, 768   | 0.784
GapNet-PL                                                                 | 768        | 0.763



Some post-processing strategies have also been designed to further improve the scores of our models.

The evaluation results of our experiments demonstrate that our methods do improve the performance of regular CNNs, and that the best-performing models based on our methods outperform all baselines, including the state-of-the-art architecture with its best settings, on the HPA datasets. The work reveals the usefulness of our methods for high-throughput microscopy protein image identification.

In future work, we may try different combinations of input channels instead of using all provided channel filters, and we would adopt larger input crops if more computing resources become available.

Acknowledgments. This research is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA19020400), the National Key Research and Development Program of China (Nos. 2017YFE0103900, 2017YFA0504702, and 2017YFE0100500), the Beijing Municipal Natural Science Foundation (Grant No. L182053), and NSFC projects (Grants No. U1611263, U1611261, and 61672493).

References

1. Swamidoss, I.N., et al.: Automated classification of immunostaining patterns in breast tissue from the human protein atlas. J. Pathol. Inform. 4(Suppl) (2013)
2. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440 (2015)
3. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261 (2016)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
5. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269. IEEE (2017)
6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
7. Rumetshofer, E., Hofmarcher, M., Röhrl, C., Hochreiter, S., Klambauer, G.: Human-level protein localization with convolutional neural networks. In: ICLR (2019)
8. Human Protein Atlas Image Classification Challenge. https://www.kaggle.com/c/human-protein-atlas-image-classification
9. The Human Protein Atlas. http://www.proteinatlas.org/
10. Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE (2017)
11. PyTorch 1.0 library. https://pytorch.org/. Accessed 23 Feb 2019
