
This may be the author's version of a work that was submitted/accepted for publication in the following source:

Nguyen Thanh, Kien, Fookes, Clinton, Ross, Arun, & Sridharan, Sridha (2017) Iris recognition with off-the-shelf CNN features: A deep learning perspective. IEEE Access, 6, pp. 18848-18855.

This file was downloaded from: https://eprints.qut.edu.au/116718/

© Consult author(s) regarding copyright matters

This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons Licence (or other specified licence) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to [email protected]

Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.

https://doi.org/10.1109/ACCESS.2017.2784352


Iris Recognition with Off-the-Shelf CNN Features: A Deep Learning Perspective

Kien Nguyen, Member, IEEE, Clinton Fookes, Senior Member, IEEE, Arun Ross, Senior Member, IEEE, Sridha Sridharan, Senior Member, IEEE

(Invited Paper)

Abstract—Iris recognition refers to the automated process of recognizing individuals based on their iris patterns. The seemingly stochastic nature of the iris stroma makes it a distinctive cue for biometric recognition. The textural nuances of an individual's iris pattern can be effectively extracted and encoded by projecting them onto Gabor wavelets and transforming the ensuing phasor response into a binary code - a technique pioneered by Daugman. This descriptor has been observed to be robust, yielding very low false match rates at low computational complexity. However, recent advancements in deep learning and computer vision indicate that generic descriptors extracted using convolutional neural networks (CNNs) are able to represent complex image characteristics. Given the superior performance of CNNs on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and a large number of other computer vision tasks, in this paper, we explore the performance of state-of-the-art pre-trained CNNs on iris recognition. We show that off-the-shelf CNN features, while originally trained for classifying generic objects, are also extremely good at representing iris images, effectively extracting discriminative visual features and achieving promising recognition results on two iris datasets: ND-CrossSensor-2013 and CASIA-Iris-Thousand. We also discuss the challenges and future research directions in leveraging deep learning methods for the problem of iris recognition.

Index Terms—Iris Recognition, Biometrics, Deep Learning, Convolutional Neural Network.

I. INTRODUCTION

IRIS recognition refers to the automated process of recognizing individuals based on their iris patterns. Iris recognition algorithms have demonstrated very low false match rates and very high matching efficiency in large databases. This is not entirely surprising given (a) the complex textural pattern of the iris stroma that varies significantly across individuals, (b) the perceived permanence of its distinguishing attributes, and (c) its limited genetic penetrance [1]–[5]. A large-scale evaluation conducted by the National Institute of Standards and Technology (NIST) has further highlighted the impressive recognition accuracy of iris recognition in operational scenarios [6], [7]. According to a report from 2014 [8], over one billion people worldwide have had their iris images electronically enrolled in various databases across the world. This includes about 1 billion people in the Unique Identification Authority of India (UIDAI) program, 160 million people from the national ID program of Indonesia, and 10 million people from the US Department of Defense program. Thus, the iris is likely to play a critical role in next-generation large-scale identification systems.

Kien Nguyen, Clinton Fookes and Sridha Sridharan are with the Image and Video Lab, Queensland University of Technology, Brisbane, QLD 4000, Australia. E-mail: {k.nguyenthanh,c.fookes,s.sridharan}@qut.edu.au

Arun Ross is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA. Email: [email protected]

Manuscript received Sept 15, 2017


State-of-the-art literature: The success of iris recognition - besides its attractive physical characteristics - is rooted in the development of efficient feature descriptors, especially the Gabor phase-quadrant feature descriptor introduced in Daugman's pioneering work [3], [5], [9]. This Gabor phase-quadrant feature descriptor (often referred to as the iriscode) has dominated the iris recognition field, exhibiting very low false match rates and high matching efficiency. Researchers have also proposed a wide range of other descriptors for the iris based on Discrete Cosine Transforms (DCT) [10], Discrete Fourier Transforms (DFT) [11], ordinal measures [12], class-specific weight maps [13], compressive sensing and sparse coding [14], hierarchical visual codebooks [15], multi-scale Taylor expansion [16], [17], etc. Readers are referred to [18]–[21] for an extensive list of methods that have been appropriated for iris recognition.

Gap: Given the widespread use of classical texture descriptors for iris recognition, including the Gabor phase-quadrant feature descriptor, it is instructive to take a step back and answer the following question: how do we know that these hand-crafted feature descriptors proposed in the literature are actually the best representations for the iris? Furthermore, can we achieve better performance (compared to the Gabor-based approach) by designing a novel feature representation scheme that can perhaps attain the upper bound on iris recognition accuracy with low computational complexity?

Potential solution: One possible solution is to leverage the recent advances in Deep Learning to discover a feature representation scheme that is primarily data-driven. By automatically learning the feature representation from the iris data, an optimal representation scheme can potentially be deduced, leading to high recognition results for the iris recognition task. Deep learning techniques often use hierarchical multi-layer networks to elicit feature maps that optimize performance on the training data [22]. These networks allow for the feature representation scheme to be learned and discovered directly from data, and avoid some of the pitfalls in developing hand-crafted features. Deep learning has completely transformed the performance of many computer vision tasks [23], [24]. Therefore, we hypothesize that deep learning techniques, as embodied by Convolutional Neural Networks (CNNs), can be used to design alternate feature descriptors for the problem of iris recognition.



Why are deep iris methods not widely used as yet? There have been a few attempts to appropriate the principles of deep learning to the task of iris recognition [25], [26]. The limited application of deep learning methods to the problem of iris recognition is due to the fact that deep learning requires a huge amount of training data, which is not available to most iris researchers at this time. In addition, deep learning is also very computationally expensive and requires the power of multiple Graphics Processing Units (GPUs). This is a deterrent to the physical implementation of such deep learning approaches. Most importantly, to date, there has been no insight into why deep learning should work for iris recognition, and no systematic analysis has been conducted to ascertain how best to capitalize on modern deep approaches to design an optimal architecture of deep networks to achieve high accuracy and low computational complexity. Simply stacking multiple layers to design a CNN for iris recognition without intuitive insights would be infeasible (due to the lack of large-scale iris datasets in the public domain), non-optimal (due to ad hoc choices for CNN architecture, number of layers, configuration of layers...) and inefficient (due to redundant layers).

We argue that rather than designing and training new CNNs for iris recognition, using CNNs whose architectures have been proven to be successful in large-scale computer vision challenges should yield good performance without the time-consuming architecture design step. A major source of state-of-the-art CNNs is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [27], organized annually to evaluate state-of-the-art algorithms for large-scale object detection and image classification. The networks developed as part of this challenge are generally made available in the public domain for extracting deep features from images. Researchers have shown that these off-the-shelf CNN features are extremely effective for various computer vision tasks, such as facial expression classification, action recognition and visual instance retrieval, and are not restricted to just the object detection and image classification tasks for which they were designed [28]. In this paper, we will investigate the performance of those CNNs that won the ILSVRC challenge since 2012 (before 2012, the winners were non-CNN methods that did not perform as well as CNN-based approaches).

The main contributions of this paper are as follows:

• First, we analyze the deep architectures which have been proposed in the literature for iris recognition.

• Second, we appropriate off-the-shelf CNNs to the problem of iris recognition and present our preliminary results using them.

• Third, we discuss the challenges and the future of deep learning for iris recognition.

The remainder of the paper is organized as follows: Section II briefly introduces CNNs: Section II-A discusses related work in general CNNs, while Section II-B discusses CNNs that have been proposed for iris recognition. Section III describes the off-the-shelf CNNs that were used in this work, and our proposed framework for investigating the performance of these CNNs. Section IV presents the experimental results. Section V concludes the paper with a discussion on future work.

II. RELATED WORK

A. CNNs - Convolutional Neural Networks

Deep learning methods, especially convolutional neural networks (CNNs), have recently led to breakthroughs in many computer vision tasks such as object detection and recognition, and image segmentation and captioning [22]–[24]. By attempting to mimic the structure and operation of the neurons in the human visual cortex through the use of hierarchical multi-layer networks, deep learning has been shown to be extremely effective in automating the process of learning feature-representation schemes from the training data, thereby eliminating the laborious feature engineering task. CNNs belong to a specific category of deep learning methods designed to process images and videos. By using repetitive blocks of neurons in the form of a convolution layer that is applied hierarchically across images, CNNs have not only been able to automatically learn image feature representations, but they have also outperformed many conventional hand-crafted feature techniques [29].

In the 1960s, Hubel and Wiesel found that cells in the animal visual cortex were responsible for detecting light in receptive fields and constructing an image [30]. Further, they demonstrated that this visual field could be represented using a topographic map. Later, Fukushima proposed the Neocognitron, which could be regarded as the predecessor of the CNN [31]. The seminal modern CNN was introduced by Yann LeCun et al. in the 1990s for handwritten digit recognition with an architecture called LeNet [32]. Many features of modern deep networks are derived from the LeNet, where convolutional connections were introduced and a backpropagation algorithm was used to train the network. CNNs became exceptionally popular in 2012 when Krizhevsky et al. introduced a CNN called AlexNet, which significantly outperformed previous methods on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [33]. AlexNet is simply a scaled version of the LeNet with a deeper structure, but is trained on a much larger dataset (ImageNet, with 14 million images) with a much more powerful computational resource (GPUs).

Since then, many novel architectures and efficient learning techniques have been introduced to make CNNs deeper and more powerful [34]–[37], achieving revolutionary performance in a wide range of computer vision applications. The annual ILSVRC event has become an important venue to recognize the performance of new CNN architectures, especially with the participation of technology giants like Google, Microsoft and Facebook. The depth of the "winning" CNNs has progressively increased from 8 layers in 2012 to 152 layers in 2015, while the recognition error rate has significantly dropped from 16.4% in 2012 to 3.57% in 2015. This phenomenal progress is illustrated in Figure 1.

Pre-trained CNNs have been open-sourced and widely used in other applications and show very promising performance [28].


Fig. 1. The evolution of the winning entries on the ImageNet Large Scale Visual Recognition Challenge from 2010 to 2015. Since 2012, CNNs have outperformed hand-crafted descriptors and shallow networks by a large margin. Image re-printed with permission [36].

B. CNNs for iris recognition in the literature

A number of deep networks have been proposed for improving the performance of iris recognition. Liu et al. proposed a DeepIris network of 9 layers consisting of one pairwise filter layer, one convolutional layer, two pooling layers, two normalization layers, two local layers and one fully-connected layer [38]. This deep network achieved a very promising recognition rate on both the Q-FIRE [39] and CASIA [40] datasets. Gangwar et al. employed more advanced layers to create two DeepIrisNets for the iris recognition task [25]. The first network, DeepIrisNet-A, contained 8 convolutional layers (each followed by a batch normalization layer), 4 pooling layers, 3 fully connected layers and two drop-out layers. The second network, DeepIrisNet-B, added two inception layers to increase the modeling capability. These two networks exhibited superior performance on the ND-IRIS-0405 [41] and ND-CrossSensor-Iris-2013 [41] datasets. It is worth mentioning that CNNs have also been used in the iris biometrics community for iris segmentation [42], [43], spoof detection [44], [45] and gender classification [46]. While self-designed CNNs such as DeepIris [38] and DeepIrisNet [25] have shown promising results, their major limitation lies in the design of the network, since the choice of the number of layers is constrained by the number of training samples. The largest public dataset that is currently available is the ND-CrossSensor-2013 dataset, which contains only 116,564 iris images. This number is far from sufficient for the millions of parameters that embody any substantially deep neural network.

To deal with the absence of a large iris dataset, transfer learning can be used. Here, CNNs that have been trained on other large datasets, such as ImageNet [47], can be appropriated directly to the iris recognition domain. In fact, CNN models pre-trained on ImageNet have been successfully transferred to many computer vision tasks [28]. Minaee et al. showed that the VGG model, even though pre-trained on ImageNet to classify objects from different categories, works reasonably well for the task of iris recognition [26]. However, since the release of the VGG model in 2014, many other advanced architectures have been proposed in the literature. In this paper, we will harness such CNN architectures - primarily those that have won the ImageNet challenge - for the iris recognition task.


III. METHODS - OFF-THE-SHELF CNN FEATURES FOR IRIS RECOGNITION

Considering the dominance of CNNs in the computer vision field and inspired by recent research [28] which has shown that Off-the-Shelf CNN Features work very well for multiple classification and recognition tasks, we investigate the performance of state-of-the-art CNNs pre-trained on the ImageNet dataset for the iris recognition task. We first review some popular CNN architectures and then present our framework for iris recognition using these CNN Features.

A. CNNs in use

We will now analyze each architecture in detail and highlight their notable properties.

AlexNet: ILSVRC 2012 winner: In 2012, Krizhevsky et al. [33] achieved a breakthrough in the large-scale ILSVRC challenge by utilizing a deep CNN that significantly outperformed other hand-crafted features, resulting in a top-5 error rate of 16.4%. AlexNet is actually a scaled version of the conventional LeNet, and takes advantage of a large-scale training dataset (ImageNet) and more computational power (GPUs that allow for a 10x speed-up in training). Tuning the hyperparameters of AlexNet was observed to result in better performance, subsequently winning the ILSVRC 2013 challenge [48]. The detailed architecture of AlexNet is presented in the Appendix. In this paper, we extract the outputs of all convolutional layers (5) and all fully connected layers (2) to generate the CNN Features for the iris recognition task.
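For illustration, the following is a minimal PyTorch sketch (not the exact code used in this work; the input tensor is a random stand-in for a pre-processed iris image) of how the output of each convolutional layer of a pre-trained AlexNet can be read out as a feature vector:

import torch
import torchvision.models as models

alexnet = models.alexnet(pretrained=True).eval()
layer_outputs = {}

def save_output(name):
    def hook(module, inp, out):
        layer_outputs[name] = out.flatten(1)   # one feature vector per image
    return hook

# register a hook on every convolutional layer of the feature stack
for idx, layer in enumerate(alexnet.features):
    if isinstance(layer, torch.nn.Conv2d):
        layer.register_forward_hook(save_output("conv%d" % idx))

x = torch.randn(1, 3, 224, 224)   # stand-in for a normalized, resized iris image
with torch.no_grad():
    alexnet(x)
for name, feat in layer_outputs.items():
    print(name, feat.shape)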

VGG: ILSVRC 2014 runner-up: In 2014, Simonyan and Zisserman from Oxford showed that using smaller filters (3×3) in each convolutional layer leads to improved performance. The intuition is that multiple small filters in sequence can emulate the effects of larger ones. The simplicity of using small-sized filters throughout the network leads to very good generalization performance. Based on these observations, they introduced a network called VGG, which is still widely used today due to its simplicity and good generalization performance [34]. Multiple versions of VGG have been introduced, but the two most popular ones are VGG-16 and VGG-19, which contain 16 and 19 layers, respectively. The detailed architecture of VGG is presented in the Appendix. In this paper, we extract the outputs of all convolutional layers (16) and all fully connected layers (2) to generate the CNN Features for the iris recognition task.
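As a quick sanity check of this intuition (an illustrative calculation, not from the paper; bias terms are ignored), two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution while using fewer parameters:

channels = 64
params_one_5x5 = 5 * 5 * channels * channels         # a single 5x5 layer
params_two_3x3 = 2 * (3 * 3 * channels * channels)   # two stacked 3x3 layers
print(params_one_5x5, params_two_3x3)                # 102400 vs 73728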

GoogLeNet and Inception: ILSVRC 2014 winner: In 2014, Szegedy et al. from Google introduced the Inception v1 architecture that was implemented in the winning ILSVRC 2014 submission called GoogLeNet, with a top-5 error rate of 6.7%. The main innovation is the introduction of an inception module, which functions as a small network inside a bigger network [35]. The new insight was the use of 1×1 convolutional blocks to aggregate and reduce the number of features before invoking the expensive parallel blocks. This helps in combining convolutional features in a better way that is not possible by simply stacking more convolutional layers.


Fig. 2. The framework for iris recognition using Off-the-shelf CNN Features. The iris image is segmented using two circular contours and then geometrically normalized using a pseudo-polar transformation, resulting in a fixed rectangular image. Features are next extracted using off-the-shelf CNNs, and then classified using an SVM.

Later, the authors introduced some improvements in terms of batch normalization, and re-designed the filter arrangement in the inception module to create Inception v2 and v3 [49]. Most recently, they added residual connections to improve the gradient flows in Inception v4 [50]. The detailed architecture of Inception v3 is presented in the Appendix. In this paper, we extract the outputs of all convolutional layers (5) and all inception layers (12) to generate the CNN Features for the iris recognition task.
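As an illustration of the inception idea (a simplified sketch with arbitrary channel counts, not the GoogLeNet/Inception v3 code), 1×1 convolutions first reduce the channel dimension before the more expensive 3×3 and 5×5 branches, and the branch outputs are concatenated:

import torch
import torch.nn as nn

class MiniInception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),           # 1x1 channel reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),            # 1x1 channel reduction
            nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # concatenate the four parallel branches along the channel axis
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

print(MiniInception(64)(torch.randn(1, 64, 32, 32)).shape)  # 32+32+16+16 = 96 channels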

ResNet: ILSVRC 2015 winner: In 2015, He et al. from Microsoft introduced the notion of a residual connection, or skip connection, in which the input of a block bypasses two successive convolutional layers and is added to their output [36]. This residual connection improves the gradient flow in the network, allowing the network to become very deep, with 152 layers. This network won the ILSVRC 2015 challenge with a top-5 error rate of 3.57%. The detailed architecture of ResNet-152 is presented in the Appendix. In this paper, we extract the outputs of all convolutional layers (1) and all bottleneck layers (17) to generate the CNN Features for the iris recognition task.
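A minimal sketch of such a residual block (illustrative only; the actual ResNet-152 uses three-layer bottleneck blocks) shows the skip connection explicitly:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # the input skips the two conv layers

print(ResidualBlock(64)(torch.randn(1, 64, 16, 16)).shape)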

DenseNet: In 2016, Huang et al. from Facebook proposed DenseNet, which connects each layer of a CNN to every other layer in a feed-forward fashion [37]. Using densely connected architectures leads to several advantages as pointed out by the authors: "alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and substantially reducing the number of parameters". The detailed architecture of DenseNet-201 is presented in the Appendix. In this paper, we extract the outputs of a selected number of dense layers (15) to generate the CNN Features for the iris recognition task.
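The dense connectivity can be sketched as follows (an illustrative toy block with an arbitrary growth rate, not the DenseNet-201 code): each layer receives the concatenation of all preceding feature maps.

import torch
import torch.nn as nn

class MiniDenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # every layer sees the concatenation of all earlier feature maps
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

print(MiniDenseBlock(32)(torch.randn(1, 32, 16, 16)).shape)  # 32 + 3*16 = 80 channels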

It is worth noting that there are several other powerful CNN architectures in the literature [29], [51]. However, we have only chosen the above architectures for illustrating the performance of pre-trained CNNs on the iris recognition task.

B. Iris recognition framework using CNN Features

The framework we employ to investigate the performance of off-the-shelf CNN Features for iris recognition is summarized in Figure 2.

Segmentation: The iris is first localized by extracting two circular contours pertaining to the inner and outer boundaries of the iris region. The integro-differential operator, one of the most commonly used circle detectors, can be mathematically expressed as,

max(r, x0, y0) | Gσ(r) ∗ ∂/∂r ∮(r, x0, y0) I(x, y)/(2πr) ds |,   (1)

where I(x, y) and Gσ denote the input image and a Gaussian blurring filter, respectively. The symbol ∗ denotes a convolution operation and r represents the radius of the circular arc ds, centered at the location (x0, y0). The described operation detects circular edges by iteratively searching for the maximum responses of a contour defined by the parameters (x0, y0, r). In most cases, the iris region can be obscured by the upper and lower eyelids and eyelashes. In such images, the eyelids can be localized using the above operator with the path of contour integration changed from a circle to an arc. Noise masks distinguish the iris-pixels from the non-iris pixels (e.g., eyelashes, eyelids, etc.) in a given image. Such noise masks, corresponding to each input image, are generated during the segmentation stage and used in the subsequent steps.
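A minimal NumPy sketch of Eq. (1) (illustrative only, assuming a grayscale iris image and a coarse search grid over candidate centres and radii) averages the image intensity on each candidate circle, differentiates the resulting radial profile, smooths it with a Gaussian and keeps the strongest response:

import numpy as np
from scipy.ndimage import gaussian_filter1d

def circle_mean(img, x0, y0, r, n=180):
    # mean intensity along a circle of radius r centred at (x0, y0)
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    xs = np.clip((x0 + r * np.cos(theta)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip((y0 + r * np.sin(theta)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].mean()

def integro_differential(img, centers, radii, sigma=2.0):
    best_response, best_circle = 0.0, None
    for (x0, y0) in centers:
        profile = np.array([circle_mean(img, x0, y0, r) for r in radii])
        response = np.abs(gaussian_filter1d(np.gradient(profile), sigma))
        idx = int(response.argmax())
        if response[idx] > best_response:
            best_response, best_circle = response[idx], (x0, y0, radii[idx])
    return best_circle   # (x0, y0, r) of the strongest circular edge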

Normalization: The area enclosed by the inner and outer boundaries of an iris can vary due to the dilation and contraction of the pupil. The effect of such variations needs to be minimized before comparing different iris images. To this end, the segmented iris region is typically mapped to a region of fixed dimension. Daugman proposed the usage of a rubber-sheet model to transform the segmented iris to a fixed rectangular region. This process is carried out by re-mapping the iris region, I(x, y), from the raw Cartesian coordinates (x, y) to the dimensionless polar coordinates (r, θ), and can be mathematically expressed as,

I(x(r, θ), y(r, θ)) → I(r, θ),   (2)

where r is in the unit interval [0, 1], and θ is an angle in the range [0, 2π]. x(r, θ) and y(r, θ) are defined as the linear combination of both pupillary boundary points (xp(θ), yp(θ)) and limbic boundary points (xs(θ), ys(θ)) as,

x(r, θ) = (1 − r)xp(θ) + rxs(θ),   (3)
y(r, θ) = (1 − r)yp(θ) + rys(θ).   (4)

An additional benefit of normalization is that the rotations of the eye (e.g., due to the movement of the head) are reduced to simple translations during matching. The corresponding noise mask is also normalized to facilitate easier matching in the later stages.
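A minimal sketch of the rubber-sheet mapping of Eqs. (2)-(4) (illustrative only, assuming the pupillary and limbic boundaries have already been found as circles) samples each (r, θ) location between the two boundaries to build the 64 × 256 normalized strip:

import numpy as np

def rubber_sheet(img, pupil, limbus, height=64, width=256):
    """pupil and limbus are (xc, yc, radius) circles; returns a height x width strip."""
    out = np.zeros((height, width), dtype=img.dtype)
    for j in range(width):
        theta = 2 * np.pi * j / width
        xp = pupil[0] + pupil[2] * np.cos(theta)     # pupillary boundary point
        yp = pupil[1] + pupil[2] * np.sin(theta)
        xs = limbus[0] + limbus[2] * np.cos(theta)   # limbic boundary point
        ys = limbus[1] + limbus[2] * np.sin(theta)
        for i in range(height):
            r = i / (height - 1)
            x = int(round((1 - r) * xp + r * xs))
            y = int(round((1 - r) * yp + r * ys))
            out[i, j] = img[np.clip(y, 0, img.shape[0] - 1),
                            np.clip(x, 0, img.shape[1] - 1)]
    return out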


Fig. 3. Sample images from the LG2200 (first row) and CASIA-Iris-Thousand (second row) datasets.

CNN feature extraction: The normalized iris image is then fed into the CNN feature extraction module. As discussed earlier, five state-of-the-art and off-the-shelf CNNs (AlexNet, VGG, Google Inception, ResNet and DenseNet) are used in this work to extract features from the normalized iris images. Note that there are multiple layers in each CNN. Every layer models different levels of visual content in the image, with later layers encoding finer and more abstract information, and earlier layers retaining coarser information. One of the key reasons why CNNs work very well on computer vision tasks is that these deep networks with tens or hundreds of layers and millions of parameters are extremely good at capturing and encoding complex features of the images, leading to superior performance. To investigate the representation capability of each layer for the iris recognition task, we employ the output of each layer as a feature descriptor and report the corresponding recognition accuracy.

SVM classification: The extracted CNN feature vector is then fed into the classification module. We use a simple multi-class Support Vector Machine (SVM) [52] due to its popularity and efficiency in image classification. The multi-class SVM for N classes is implemented as a one-against-all strategy, which is equivalent to combining N binary SVM classifiers, with every classifier discriminating a single class against all other classes. The test sample is assigned to the class with the largest margin [52].
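Putting the pieces together, a minimal sketch of this feature-extraction-plus-SVM pipeline could look as follows (illustrative only; load_normalized_iris_dataset is a hypothetical helper, the choice of vgg16 and of layer index 10 is arbitrary, and inputs are assumed to be normalized iris strips resized to 224 × 224 RGB tensors):

import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

vgg = models.vgg16(pretrained=True).eval()

def cnn_features(batch, layer_idx=10):
    # run the images through the VGG feature stack up to layer_idx and flatten
    with torch.no_grad():
        out = batch
        for i, layer in enumerate(vgg.features):
            out = layer(out)
            if i == layer_idx:
                break
    return out.flatten(1).numpy()

# X_*: float tensors of shape (N, 3, 224, 224); y_*: integer class labels
X_train, y_train, X_test, y_test = load_normalized_iris_dataset()  # hypothetical helper
clf = LinearSVC().fit(cnn_features(X_train), y_train)              # one-vs-rest linear SVM
print("test accuracy:", clf.score(cnn_features(X_test), y_test))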

IV. EXPERIMENTAL RESULTS

A. Datasets:

We conducted our experiments on two large iris datasets: 1) LG2200 dataset: ND-CrossSensor-Iris-2013 is the largest public iris dataset in the literature in terms of the number of images [41]. The ND-CrossSensor-2013 dataset contains 116,564 iris images captured by the LG2200 iris camera from 676 subjects. 2) CASIA-Iris-Thousand: This contains 20,000 iris images from 1,000 subjects, which were collected using the IKEMB-100 camera from IrisKing [40]. Some sample images from the two datasets are depicted in Figure 3.

B. Performance metric and baseline method:

To report the performance, we rely on the Recognition Rate. The Recognition Rate is calculated as the proportion of correctly classified samples at a pre-defined False Acceptance Rate (FAR). In this work, we choose to report the Recognition Rate at FAR = 0.1%.

The baseline feature descriptor we used for comparison is the Gabor phase-quadrant feature [3]. The popular matching operator coupled with this descriptor is the Hamming distance [3]. This baseline achieved recognition accuracies of 91.1% and 90.7% on the LG2200 and CASIA-Iris-Thousand datasets, respectively.

C. Experimental setup:

The left and right iris images of every subject are treated as two different classes. Thus, the LG2200 dataset has 1,352 classes and the CASIA-Iris-Thousand dataset has 2,000 classes. We randomly choose 70% of the data corresponding to each class for training and the rest (30%) for testing. It must be noted that the training images were used only to train the multi-class SVMs; the pre-trained CNNs were not modified at all using the training data. This is one of the main advantages of using pre-trained CNNs.

For iris segmentation and normalization, we used the open-source USIT v2.2 software from the University of Salzburg [53]. This software takes each iris image as input, segments it using inner and outer circles, and normalizes the segmented region into a rectangle of size 64 × 256 pixels.

For CNN feature extraction, we implemented our approach using PyTorch [54]. PyTorch is a recently released deep learning framework from Facebook that combines the advantages of both Torch and Python. The two most advanced features of this framework are dynamic graph computation and imperative programming, which make deep network coding more flexible and powerful [54]. With regard to our experiments, PyTorch supplies a wide range of pre-trained off-the-shelf CNNs, making our feature extraction task that much more convenient.
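For instance, the pre-trained networks considered here can be obtained directly from torchvision (a sketch; the exact model identifiers may differ slightly between PyTorch releases):

import torchvision.models as models

nets = {
    "AlexNet":      models.alexnet(pretrained=True),
    "VGG19":        models.vgg19(pretrained=True),
    "Inception v3": models.inception_v3(pretrained=True),
    "ResNet152":    models.resnet152(pretrained=True),
    "DenseNet201":  models.densenet201(pretrained=True),
}
for name, net in nets.items():
    net.eval()   # inference mode only; the pre-trained weights are never updated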

For classification, we used the LIBSVM library [55] with a Python wrapper implemented in the scikit-learn library [56], which allows for ease of integration with the feature extraction step.

D. Performance analysis

As stated earlier, different layers encode different levels of visual content. To investigate the performance due to each layer, we estimate the recognition accuracy after using the output from each layer as a feature vector to represent the iris. The recognition accuracies are illustrated in Figure 4 for the two datasets: LG2200 and CASIA-Iris-Thousand.

Interestingly, the recognition accuracy peaks in some middle layers for all CNNs. On the LG2200 dataset: layer 10 for VGG, layer 10 for Inception, layer 11 for ResNet and layer 6 for DenseNet. On the CASIA-Iris-Thousand dataset: layer 9 for VGG, layer 10 for Inception, layer 12 for ResNet and layer 5 for DenseNet. The difference in the "peak layers" can be explained by the properties of each CNN. Since Inception uses intricate inception layers (each layer is actually a network inside a larger network), it converges to its peak more quickly than the others. In contrast, ResNet, with its skip connections, is very good at allowing the gradient to flow through the network, making the network perform well at a greater depth, leading to a later peak in the iris recognition accuracy.


Fig. 4. Recognition accuracy of different layers in the CNNs on the two datasets: (a) LG2200 and (b) CASIA-Iris-Thousand.

DenseNet, with its rich dense connections, allows neurons to interact easily, leading to the best recognition accuracy among all CNNs for the iris recognition task.

As can be seen, peak results do not occur toward the later layers of the CNNs. This can be explained by the fact that the normalized iris image is not as complex as the images in the ImageNet dataset, where large structural variations are present in a wide range of objects. Hence, it is not necessary to have a large number of layers to encode the normalized iris. Thus, peak accuracies are achieved in the middle layers.

Among all five CNNs, DenseNet achieves the highest peak recognition accuracy of 98.7% at layer 6 on the LG2200 dataset and 98.8% at layer 5 on the CASIA-Iris-Thousand dataset. ResNet and Inception achieve similar peak recognition accuracies of 98.0% and 98.2% at layers 11 and 10, respectively, on the LG2200 dataset; and 98.5% and 98.3% at layers 12 and 10, respectively, on the CASIA-Iris-Thousand dataset. VGG, with its simple architecture, only achieves 92.7% and 93.1% recognition accuracy, both at layer 9, on the LG2200 and CASIA-Iris-Thousand datasets, respectively. The continually increasing recognition accuracy of AlexNet indicates that the number of layers considered in its architecture may not fully capture the discriminative visual features in iris images.

V. DISCUSSIONS AND CONCLUSION

In this paper, we have approached the task of iris recognition from a deep learning point of view. Our experiments have shown that off-the-shelf pre-trained CNN features, even though originally trained for the problem of object recognition, can be appropriated to the iris recognition task. By harnessing state-of-the-art CNNs from the ILSVRC challenge and applying them on the iris recognition task, we achieve state-of-the-art recognition accuracy in two large iris datasets, viz., ND-CrossSensor-2013 and CASIA-Iris-Thousand. These preliminary results show that off-the-shelf CNN features can be successfully transferred to the iris recognition problem, thereby effectively extracting discriminative visual features in iris images and eliminating the laborious feature-engineering task. The benefit of CNNs in automated feature engineering is critical in learning new iris encoding schemes that can benefit large-scale applications.

Open problems: This work, together with previous work on DeepIris [38] and DeepIrisNet [25], indicates that CNNs are effective in encoding discriminative features for iris recognition. Notwithstanding this observation, there are several challenges and open questions when applying deep learning to the problem of iris recognition.

• Computational complexity: The computational complexity of CNNs during the training phase is very high due to the millions of parameters used in the network. In fact, powerful GPUs are needed to accomplish training. This compares unfavorably with hand-crafted iris features, especially Daugman's Gabor features, where thousands of iriscodes can be extracted and compared on a general-purpose CPU within a second. Model reduction techniques such as pruning and compressing may, therefore, be needed to eliminate redundant neurons and layers, and to reduce the size of the network.

• Domain adaptation and fine-tuning: Another approach to using off-the-shelf CNNs is fine-tuning, which would entail freezing the early layers and only re-training a few selected later layers to adapt the representation capability of the CNNs to iris images. Fine-tuning is expected to learn and encode iris-specific features, as opposed to generic image features. In addition, domain adaptation can be used to transform the representation from the ImageNet domain to the iris image domain.

• Few-shot learning: If the network for iris recognition has to be designed and trained from scratch, the problem of a limited number of training images can be partially solved with a technique called few-shot learning, which would allow the network to perform well after seeing very few samples from each class.

• Architecture evolution: Recent advances in Evolution Theory and Deep Reinforcement Learning allow the network to change itself and generate better instances for the problem at hand. Such an approach can be used to evolve off-the-shelf CNNs in order to generate powerful networks that are more suited for iris recognition.

• Other architectures: In the field of deep learning, there are other architectures such as the unsupervised Deep Belief Network (DBN), Stacked Auto-Encoder (SAE) and Recurrent Neural Network (RNN). These architectures have their own advantages and can be used to extract features for iris images. They can be used either separately or in combination with classical CNNs to improve the representation capacity of iris templates.



REFERENCES

[1] A. Muron and J. Pospisil, "The human iris structure and its usages," in Acta Univ Palacki Physica, 2000, vol. 39, pp. 87–95.
[2] A. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 4–20, Jan 2004.
[3] J. Daugman, "How iris recognition works," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 21–30, 2004.
[4] J. Daugman and C. Downing, "Searching for doppelgangers: assessing the universality of the iriscode impostors distribution," IET Biometrics, vol. 5, pp. 65–75, 2016.
[5] J. Daugman, "Information theory and the iriscode," IEEE Transactions on Information Forensics and Security, vol. 11, pp. 400–409, Feb 2016.
[6] NIST, "IREX III - performance of iris identification algorithms," National Institute of Standards and Technology, USA, Tech. Rep. NIST Interagency Report 7836, 2012.
[7] ——, "IREX IV - evaluation of iris identification algorithms," National Institute of Standards and Technology, USA, Tech. Rep. NIST Interagency Report 7949, 2013.
[8] J. Daugman, "Major international deployments of the iris recognition algorithms: a billion persons," Dec 2014.
[9] J. Daugman, "New methods in iris recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 37, pp. 1167–75, 2007.
[10] D. Monro, S. Rakshit, and D. Zhang, "DCT-based iris recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 586–595, Apr 2007.
[11] K. Miyazawa, K. Ito, T. Aoki, K. Kobayashi, and H. Nakajima, "An effective approach for iris recognition using phase-based image matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 1741–1756, Oct 2008.
[12] Z. Sun and T. Tan, "Ordinal measures for iris recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, pp. 2211–2226, Dec 2009.
[13] W. Dong, Z. Sun, and T. Tan, "Iris matching based on personalized weight map," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1744–1757, Sep 2011.
[14] J. K. Pillai, V. M. Patel, R. Chellappa, and N. K. Ratha, "Secure and robust iris recognition using random projections and sparse representations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1877–1893, Sep 2011.
[15] Z. Sun, H. Zhang, T. Tan, and J. Wang, "Iris image classification based on hierarchical visual codebook," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, pp. 1120–1133, Jun 2014.
[16] A. Bastys, J. Kranauskas, and R. Masiulis, "Iris recognition by local extremum points of multiscale Taylor expansion," Pattern Recognition, vol. 42, pp. 1869–77, 2009.
[17] A. Bastys, J. Kranauskas, and V. Kruger, "Iris recognition by fusing different representations of multi-scale Taylor expansion," Computer Vision and Image Understanding, vol. 115, pp. 804–16, 2011.
[18] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, "Image understanding for iris biometrics: A survey," Computer Vision and Image Understanding, vol. 110, pp. 281–307, 2008.
[19] ——, Handbook of Iris Recognition. UK: Springer-Verlag, 2013, ch. 2, A Survey of Iris Biometrics Research: 2008-2010.
[20] I. Nigam, M. Vatsa, and R. Singh, "Ocular biometrics: A survey of modalities and fusion approaches," Information Fusion, vol. 26, pp. 1–35, 2015.
[21] K. Nguyen, C. Fookes, R. Jillela, S. Sridharan, and A. Ross, "Long range iris recognition: A survey," Pattern Recognition, vol. 72, pp. 123–143, 2017.
[22] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[23] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[24] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug 2013.
[25] A. Gangwar and A. Joshi, "DeepIrisNet: Deep iris representation with applications in iris recognition and cross-sensor iris recognition," in 2016 IEEE International Conference on Image Processing (ICIP), Sep 2016, pp. 2301–2305.
[26] S. Minaee, A. Abdolrashidiy, and Y. Wang, "An experimental study of deep convolutional features for iris recognition," in 2016 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), Dec 2016, pp. 1–6.
[27] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, Dec 2015.
[28] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Jun 2014, pp. 512–519.
[29] A. Canziani, A. Paszke, and E. Culurciello, "An analysis of deep neural network models for practical applications," CoRR, vol. abs/1605.07678, 2016.
[30] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," Journal of Physiology (London), vol. 195, pp. 215–243, 1968.
[31] K. Fukushima and S. Miyake, Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition. Berlin, Heidelberg: Springer Berlin Heidelberg, 1982, pp. 267–285.
[32] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems. Morgan-Kaufmann, 1990, pp. 396–404.
[33] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," vol. 2, US, 2012, pp. 1097–1105.
[34] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2015, pp. 1–9.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016, pp. 770–778.
[37] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2017, pp. 4700–4708.
[38] N. Liu, M. Zhang, H. Li, Z. Sun, and T. Tan, "DeepIris: Learning pairwise filter bank for heterogeneous iris verification," Pattern Recognition Letters, vol. 82, pp. 154–161, 2016.
[39] P. A. Johnson, P. Lopez-Meyer, N. Sazonova, F. Hua, and S. Schuckers, "Quality in face and iris research ensemble (Q-FIRE)," in 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), Sep 2010, pp. 1–6.
[40] Chinese Academy of Sciences Institute of Automation, "CASIA iris image database," http://biometrics.idealtest.org/, Aug 2017.
[41] P. J. Phillips, W. T. Scruggs, A. J. O'Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe, "FRVT 2006 and ICE 2006 large-scale experimental results," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 831–846, May 2010.
[42] N. Liu, H. Li, M. Zhang, J. Liu, Z. Sun, and T. Tan, "Accurate iris segmentation in non-cooperative environments using fully convolutional networks," in 2016 International Conference on Biometrics (ICB), Jun 2016, pp. 1–8.
[43] E. Jalilian and A. Uhl, Iris Segmentation Using Fully Convolutional Encoder-Decoder Networks. Cham: Springer International Publishing, 2017, pp. 133–155.
[44] L. He, H. Li, F. Liu, N. Liu, Z. Sun, and Z. He, "Multi-patch convolution neural network for iris liveness detection," in 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), Sep 2016, pp. 1–7.
[45] D. Menotti, G. Chiachia, A. Pinto, W. R. Schwartz, H. Pedrini, A. X. Falcão, and A. Rocha, "Deep representations for iris, face, and fingerprint spoofing detection," IEEE Transactions on Information Forensics and Security, vol. 10, no. 4, pp. 864–879, Apr 2015.
[46] J. Tapia and C. Aravena, Gender Classification from NIR Iris Images Using Deep Learning. Cham: Springer International Publishing, 2017, pp. 219–239.
[47] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[48] M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks. Springer International Publishing, 2014, pp. 818–833.
[49] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016, pp. 2818–2826.
[50] ——, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in AAAI Conference on Artificial Intelligence, Feb 2017, pp. 4278–4284.
[51] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, and G. Wang, "Recent advances in convolutional neural networks," CoRR, vol. abs/1512.07108v5, 2017.
[52] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. USA: MIT Press, 2001.
[53] C. Rathgeb, A. Uhl, P. Wild, and H. Hofbauer, "Design decisions for an iris recognition SDK," in Handbook of Iris Recognition, 2nd ed., ser. Advances in Computer Vision and Pattern Recognition, K. Bowyer and M. J. Burge, Eds. Springer, 2016.
[54] "PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration," http://pytorch.org/, Sep 2017.
[55] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 1–27, 2011.
[56] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

Kien Nguyen is a Postdoctoral Research Fellow at Queensland University of Technology (QUT), Australia. He has been researching iris recognition for 7 years, including 4 years during his PhD at QUT. His research interests are applications of computer vision and deep learning techniques for biometrics, surveillance and scene understanding.

Clinton Fookes is a Professor in Vision and Signal Processing at the Queensland University of Technology. He holds a BEng (Aerospace/Avionics), an MBA, and a PhD in computer vision. He is a Senior Member of the IEEE, an AIPS Young Tall Poppy, an Australian Museum Eureka Prize winner, and a Senior Fulbright Scholar.

Arun Ross is a Professor at Michigan State University and the Director of the iPRoBe Lab. He is the co-author of the books Handbook of Multibiometrics and Introduction to Biometrics. Arun is a recipient of the IAPR JK Aggarwal Prize, the IAPR Young Biometrics Investigator Award and the NSF CAREER Award.

Sridha Sridharan obtained his MSc degree from the University of Manchester, UK, and his PhD degree from the University of New South Wales, Australia. He is currently a Professor at Queensland University of Technology (QUT), where he leads the research program in Speech, Audio, Image and Video Technologies (SAIVT).


Appe

ndix.)Five)state/of/th

e/art)C

onvolutio

nal)N

eural)N

etworks)(C

NNs))used)in)th

is)pa

per:)AlexNe

t,)VG

G,)ResNe

t,)Inception)an

d)De

nseN

et.)

Note%that%th

ere%are%a%nu

mbe

r%of%variants%o

f%VGG,%R

esNet,%Incep

tion%an

d%De

nseN

et.%W

e%choo

se%th

e%de

epest%v

ariants%for%each,%i.e.%VGG19

,%ResNet15

2,%In

ception%v3%and

%Den

seNet20

1.%

The%layers%used%to%extract%CNN%fe

atures%fo

r%iris%re

cogn

ition

%are%highlighted

%in%re

d.%%

% AlexNet)[1

])VG

G)[2])

ResN

et)[3

])Inception)[4])

DenseN

et)[5

])AlexNet%(%

%%(features):%Sequ

entia

l%(%

%%%%(1

):%Co

nv2d

(3,%6

4,%k=(11

,%11),%st=(4,%4

),%p=

(2,%2

))%%%%%

%%%%%%%%%%%R

eLU%(inp

lace)%%%%%

%%%%%%%%%%%M

axPo

ol2d

%(s=(3,%3),%s=(2,%2

),%dilatio

n=(1,%1

))%%%%%(2

):%Co

nv2d

(64,%192

,%k=(5,%5),%st=(1,%1),%p=

(2,%2

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(s=(3,%3),%st=(2,%2),%dilatio

n=(1,%1

))%%%%%(3

):%Co

nv2d

(192

,%384

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(4

):%Co

nv2d

(384

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(5

):%Co

nv2d

(256

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(s=(3,%3),%st=(2,%2),%dilatio

n=(1,%1

))%%%)%

%%(classifie

r):%Seq

uential%(%

%%%%%%%%%%%D

ropo

ut%(p

%=%0.5)%

%%%%(6

):%Line

ar%(9

216%W>%409

6)%

%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%D

ropo

ut%(p

%=%0.5)%

%%%%(7

):%Line

ar%(4

096%W>%409

6)%

%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%Linea

r%(40

96%W>

%100

0)%

%%)%

)%

VGG19

%(%%%(features):%Sequ

entia

l%(%

%%%%(1

):%Co

nv2d

(3,%6

4,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(2

):%Co

nv2d

(64,%64,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(size=

(2,%2

),%st=(2,%2),%dilatio

n=(1,%1

))%%%%%(3

):%Co

nv2d

(64,%128

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(4

):%Co

nv2d

(128

,%128

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(size=

(2,%2

),%st=(2,%2),%dilatio

n=(1,%1

))%%%%%(5

):%Co

nv2d

(128

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(6

):%Co

nv2d

(256

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(7

):%Co

nv2d

(256

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(8

):%Co

nv2d

(256

,%256

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(size=

(2,%2

),%st=(2,%2),%dilatio

n=(1,%1

))%%%%%(9

):%Co

nv2d

(256

,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

0):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

1):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

2):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(size=

(2,%2

),%st=(2,%2),%dilatio

n=(1,%1

))%%%%%(1

3):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

4):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

5):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%(1

6):%C

onv2d(51

2,%512

,%k=(3,%3),%st=(1,%1),%p=

(1,%1

))%%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%M

axPo

ol2d

%(size=

(2,%2

),%st=(2,%2),%dilatio

n=(1,%1

))%%%)%

%%(classifie

r):%Seq

uential%(%

%%%%(1

7):%Linea

r%(25

088%W>%409

6)%

%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%D

ropo

ut%(p

%=%0.5)%

%%%%(1

8):%Linea

r%(40

96%W>

%409

6)%

%%%%%%%%%%%R

eLU%(inp

lace)%

%%%%%%%%%%%D

ropo

ut%(p

%=%0.5)%

%%%%%%%%%%%Linea

r%(40

96%W>

%100

0)%

%%)%

)% %

ResNet 152 (
  (1) (conv1): Conv2d(3, 64, k=(7, 7), st=(2, 2), p=(3, 3), bias=False)
  (bn1): BatchNorm2d
  (relu): ReLU
  (maxpool): MaxPool2d
  (layer1): Sequential (
    (2) (0): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU
      (downsample): Sequential ((0): Conv2d   (1): BatchNorm2d)   )
    (3) (1): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU  )
    (4) (2): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU  )
  )
  (layer2): Sequential (
    (5) (0): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU
      (downsample): Sequential ((0): Conv2d   (1): BatchNorm2d) )
    (6) (1): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (7) (2): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (8) (3): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (9) (4): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (10) (5): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (11) (6): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (12) (7): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
  )
  (layer3): Sequential (
    (13) (0): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU
      (downsample): Sequential ((0): Conv2d   (1): BatchNorm2d) )
    (1): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (2): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (3): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (4): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (5-16): Bottleneck
    (14) (17): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
    (18-34): Bottleneck
    (15) (35): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
  )
  (layer4): Sequential (
    (16) (0): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU
      (downsample): Sequential ( (0): Conv2d   (1): BatchNorm2d ) )
    (17) (1): Bottleneck
    (18) (2): Bottleneck (
      Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - Conv2d - BatchNorm2d - ReLU )
  )
  (avgpool): AvgPool2d ( )
  (fc): Linear (2048 -> 1000)
)
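For deep models such as ResNet-152, intermediate activations are most conveniently read with forward hooks rather than by slicing the module tree. The snippet below is a minimal sketch under the assumption that torchvision's pretrained resnet152 is used; the name layer3 refers to the third group of Bottleneck blocks in the listing above, not to the bold layer numbers.

import torch
from torchvision import models

resnet152 = models.resnet152(pretrained=True).eval()

captured = {}
def save_output(module, inputs, output):
    # Store the activation produced during the forward pass.
    captured['layer3'] = output.detach()

hook = resnet152.layer3.register_forward_hook(save_output)
with torch.no_grad():
    resnet152(torch.randn(1, 3, 224, 224))    # placeholder input image
hook.remove()

print(captured['layer3'].shape)                # torch.Size([1, 1024, 14, 14])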

Inception3 (
  (1) (Conv2d_1a_3x3): BasicConv2d (  (conv): Conv2d   (bn): BatchNorm2d  )
  (2) (Conv2d_2a_3x3): BasicConv2d (  (conv): Conv2d   (bn): BatchNorm2d  )
  (3) (Conv2d_2b_3x3): BasicConv2d (  (conv): Conv2d   (bn): BatchNorm2d  )
  (4) (Conv2d_3b_1x1): BasicConv2d (  (conv): Conv2d   (bn): BatchNorm2d  )
  (5) (Conv2d_4a_3x3): BasicConv2d (  (conv): Conv2d   (bn): BatchNorm2d  )
  (6) (Mixed_5b): InceptionA (  (branch1x1):   (branch5x5_1):   (branch5x5_2):
      (branch3x3dbl_1):   (branch3x3dbl_2):   (branch3x3dbl_3):   (branch_pool):  )
  (7) (Mixed_5c): InceptionA (  (branch1x1):   (branch5x5_1):   (branch5x5_2):
      (branch3x3dbl_1):   (branch3x3dbl_2):   (branch3x3dbl_3):   (branch_pool):  )
  (8) (Mixed_5d): InceptionA (  (branch1x1):   (branch5x5_1):   (branch5x5_2):
      (branch3x3dbl_1):   (branch3x3dbl_2):   (branch3x3dbl_3):   (branch_pool):  )
  (9) (Mixed_6a): InceptionB (  (branch3x3):   (branch3x3dbl_1):   (branch3x3dbl_2):   (branch3x3dbl_3):  )
  (10) (Mixed_6b): InceptionC (  (branch1x1):   (branch7x7_1):   (branch7x7_2):   (branch7x7_3):
      (branch7x7dbl_1):   (branch7x7dbl_2):   (branch7x7dbl_3):
      (branch7x7dbl_4):   (branch7x7dbl_5):   (branch_pool):  )
  (11) (Mixed_6c): InceptionC (  (branch1x1):   (branch7x7_1):   (branch7x7_2):   (branch7x7_3):
      (branch7x7dbl_1):   (branch7x7dbl_2):   (branch7x7dbl_3):
      (branch7x7dbl_4):   (branch7x7dbl_5):   (branch_pool):  )
  (12) (Mixed_6d): InceptionC (  (branch1x1):   (branch7x7_1):   (branch7x7_2):   (branch7x7_3):
      (branch7x7dbl_1):   (branch7x7dbl_2):   (branch7x7dbl_3):
      (branch7x7dbl_4):   (branch7x7dbl_5):   (branch_pool):  )
  (13) (Mixed_6e): InceptionC (  (branch1x1):   (branch7x7_1):   (branch7x7_2):   (branch7x7_3):
      (branch7x7dbl_1):   (branch7x7dbl_2):   (branch7x7dbl_3):
      (branch7x7dbl_4):   (branch7x7dbl_5):   (branch_pool):  )
  (14) (AuxLogits): InceptionAux (
      (conv0): BasicConv2d ( (conv): Conv2d   (bn): BatchNorm2d  )
      (conv1): BasicConv2d ( (conv): Conv2d   (bn): BatchNorm2d  )
      (fc): Linear (768 -> 1000)  )
  (15) (Mixed_7a): InceptionD (  (branch3x3_1):   (branch3x3_2):
      (branch7x7x3_1):   (branch7x7x3_2):   (branch7x7x3_3):   (branch7x7x3_4):  )
  (16) (Mixed_7b): InceptionE (  (branch1x1):   (branch3x3_1):   (branch3x3_2a):   (branch3x3_2b):
      (branch3x3dbl_1):   (branch3x3dbl_2):
      (branch3x3dbl_3a):   (branch3x3dbl_3b):   (branch_pool):  )
  (17) (Mixed_7c): InceptionE (  (branch1x1):   (branch3x3_1):   (branch3x3_2a):   (branch3x3_2b):
      (branch3x3dbl_1):   (branch3x3dbl_2):
      (branch3x3dbl_3a):   (branch3x3dbl_3b):   (branch_pool):  )
  (fc): Linear (2048 -> 1000)
)
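Inception-v3 differs from the other networks in that it expects 299x299 inputs and carries an auxiliary classifier (AuxLogits) used only during training. As an assumed, minimal sketch (again not the authors' code), the 2048-dimensional vector that feeds the final fc layer in the listing above can be exposed by replacing that layer with an identity mapping:

import torch
from torchvision import models

inception = models.inception_v3(pretrained=True).eval()
inception.fc = torch.nn.Identity()   # keep the 2048-D pooled features instead of 1000-way logits

with torch.no_grad():
    descriptor = inception(torch.randn(1, 3, 299, 299))   # placeholder 299x299 input
print(descriptor.shape)                                    # torch.Size([1, 2048])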

DenseNet (
  (features): Sequential (
    (1) (conv0): Conv2d   (norm0): BatchNorm2d   (relu0): ReLU
        (pool0): MaxPool2d
    (2) (denseblock1): _DenseBlock (
        (denselayer1): _DenseLayer (
            (norm.1): BatchNorm2d  (relu.1): ReLU  (conv.1): Conv2d
            (norm.2): BatchNorm2d  (relu.2): ReLU  (conv.2): Conv2d )
        (denselayer2): _DenseLayer
        (denselayer3): _DenseLayer
        (denselayer4): _DenseLayer
        (denselayer5): _DenseLayer
        (denselayer6): _DenseLayer
        )
    (3) (transition1): _Transition (
        (norm): BatchNorm2d   (relu): ReLU   (conv): Conv2d
        (pool): AvgPool2d )
    (denseblock2): _DenseBlock (
        (denselayer1-5): _DenseLayer
        (4) (denselayer6): _DenseLayer
        (denselayer7-12): _DenseLayer )
    (5) (transition2): _Transition (
        (norm): BatchNorm2d   (relu): ReLU   (conv): Conv2d
        (pool): AvgPool2d )
    (denseblock3): _DenseBlock (
        (denselayer1-10): _DenseLayer
        (6) (denselayer11): _DenseLayer
        (denselayer12-19): _DenseLayer
        (7) (denselayer20): _DenseLayer
        (denselayer21-29): _DenseLayer
        (8) (denselayer30): _DenseLayer
        (denselayer31-39): _DenseLayer
        (9) (denselayer40): _DenseLayer
        (denselayer41-47): _DenseLayer
        (10) (denselayer48): _DenseLayer )
    (11) (transition3): _Transition (
        (norm): BatchNorm2d   (relu): ReLU   (conv): Conv2d
        (pool): AvgPool2d )
    (denseblock4): _DenseBlock (
        (denselayer1-10): _DenseLayer
        (12) (denselayer11): _DenseLayer
        (denselayer12-19): _DenseLayer
        (13) (denselayer20): _DenseLayer
        (denselayer21-27): _DenseLayer
        (14) (denselayer28): _DenseLayer
        (denselayer29-31): _DenseLayer
        (15) (denselayer32): _DenseLayer )
    (norm5): BatchNorm2d
  )
  (classifier): Linear (1920 -> 1000)
)
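DenseNet exposes its convolutional trunk as a single features module, so the 1920-channel output listed above (just before the classifier) can be pooled into one global descriptor. A minimal sketch, assuming torchvision's pretrained densenet201 and a 224x224 input:

import torch
import torch.nn.functional as F
from torchvision import models

densenet = models.densenet201(pretrained=True).eval()

with torch.no_grad():
    maps = densenet.features(torch.randn(1, 3, 224, 224))       # 1 x 1920 x 7 x 7 feature maps
    vec = F.adaptive_avg_pool2d(F.relu(maps), 1).flatten(1)      # 1 x 1920 global descriptor
print(vec.shape)                                                 # torch.Size([1, 1920])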

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, vol. 2, US, 2012, pp. 1097-1105.
[2] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016, pp. 770-778.
[4] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016, pp. 2818-2826.
[5] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017, pp. 4700-4708.
