13
Research Article Deep Convolutional Neural Networks for Hyperspectral Image Classification Wei Hu, 1 Yangyu Huang, 1 Li Wei, 1 Fan Zhang, 1 and Hengchao Li 2,3 1 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 10029, China 2 Sichuan Provincial Key Laboratory of Information Coding and Transmission, Southwest Jiaotong University, Chengdu 610031, China 3 Department of Aerospace Engineering Sciences, University of Colorado, Boulder, CO 80309, USA Correspondence should be addressed to Wei Hu; [email protected] Received 23 November 2014; Accepted 22 January 2015 Academic Editor: Tianfu Wu Copyright © 2015 Wei Hu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Recently, convolutional neural networks have demonstrated excellent performance on various visual tasks, including the classification of common two-dimensional images. In this paper, deep convolutional neural networks are employed to classify hyperspectral images directly in spectral domain. More specifically, the architecture of the proposed classifier contains five layers with weights which are the input layer, the convolutional layer, the max pooling layer, the full connection layer, and the output layer. ese five layers are implemented on each spectral signature to discriminate against others. Experimental results based on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than some traditional methods, such as support vector machines and the conventional deep learning-based methods. 1. Introduction e hyperspectral imagery (HSI) [1] is acquired by remote sensers, which are characterized in hundreds of observa- tion channels with high spectral resolution. Taking advan- tages of the rich spectral information, numerous traditional classification methods, such as -nearest-neighbors (-nn), minimum distance, and logistic regression [2], have been developed. Recently, some more effective feature extraction methods as well as advanced classifiers were proposed, such as spectral-spatial classification [3] and local Fisher discrim- inant analysis [4]. In the current literatures, support vector machine (SVM) [5, 6] has been viewed as an efficient and stable method for hyperspectral classification tasks, especially for the small training sample sizes. SVM seeks to separate two-class data by learning an optimal decision hyperplane which best separates the training samples in a kernel- included high-dimensional feature space. Some extensions of SVM in hyperspectral image classification were presented to improve the classification performance [3, 7, 8]. Neural networks (NN), such as multilayer perceptron (MLP) [9] and radial basis function (RBF) [10] neural net- works, have already been investigated for classification of remote sensing data. In [11], the authors proposed a semi- supervised neural network framework for large-scale HSI classification. Actually, in remote sensing classification tasks, SVM is superior to the traditional NN in terms of classi- fication accuracy as well as computational cost. In [12], a deeper architecture of NN has been considered a powerful model for classification, whose classification performance is competitive to SVM. Deep learning-based methods achieve promising perfor- mance in many fields. In deep learning, the convolutional neural networks (CNNs) [12] play a dominant role for pro- cessing visual-related problems. CNNs are biologically- inspired and multilayer classes of deep learning models that use a single neural network trained end to end from raw image pixel values to classifier outputs. e idea of CNNs was firstly introduced in [13], improved in [14], and refined and simplified in [15, 16]. With the large-scale sources Hindawi Publishing Corporation Journal of Sensors Volume 2015, Article ID 258619, 12 pages http://dx.doi.org/10.1155/2015/258619

Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Research ArticleDeep Convolutional Neural Networks forHyperspectral Image Classification

Wei Hu1 Yangyu Huang1 Li Wei1 Fan Zhang1 and Hengchao Li23

1College of Information Science and Technology Beijing University of Chemical Technology Beijing 10029 China2Sichuan Provincial Key Laboratory of Information Coding and Transmission Southwest Jiaotong University Chengdu 610031 China3Department of Aerospace Engineering Sciences University of Colorado Boulder CO 80309 USA

Correspondence should be addressed to Wei Hu thuweihuqqcom

Received 23 November 2014 Accepted 22 January 2015

Academic Editor Tianfu Wu

Copyright copy 2015 Wei Hu et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Recently convolutional neural networks have demonstrated excellent performance on various visual tasks including theclassification of common two-dimensional images In this paper deep convolutional neural networks are employed to classifyhyperspectral images directly in spectral domain More specifically the architecture of the proposed classifier contains five layerswith weights which are the input layer the convolutional layer the max pooling layer the full connection layer and the outputlayer These five layers are implemented on each spectral signature to discriminate against others Experimental results based onseveral hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance thansome traditional methods such as support vector machines and the conventional deep learning-based methods

1 Introduction

The hyperspectral imagery (HSI) [1] is acquired by remotesensers which are characterized in hundreds of observa-tion channels with high spectral resolution Taking advan-tages of the rich spectral information numerous traditionalclassification methods such as 119896-nearest-neighbors (119896-nn)minimum distance and logistic regression [2] have beendeveloped Recently some more effective feature extractionmethods as well as advanced classifiers were proposed suchas spectral-spatial classification [3] and local Fisher discrim-inant analysis [4] In the current literatures support vectormachine (SVM) [5 6] has been viewed as an efficient andstablemethod for hyperspectral classification tasks especiallyfor the small training sample sizes SVM seeks to separatetwo-class data by learning an optimal decision hyperplanewhich best separates the training samples in a kernel-included high-dimensional feature space Some extensions ofSVM in hyperspectral image classification were presented toimprove the classification performance [3 7 8]

Neural networks (NN) such as multilayer perceptron(MLP) [9] and radial basis function (RBF) [10] neural net-works have already been investigated for classification ofremote sensing data In [11] the authors proposed a semi-supervised neural network framework for large-scale HSIclassification Actually in remote sensing classification tasksSVM is superior to the traditional NN in terms of classi-fication accuracy as well as computational cost In [12] adeeper architecture of NN has been considered a powerfulmodel for classification whose classification performance iscompetitive to SVM

Deep learning-based methods achieve promising perfor-mance in many fields In deep learning the convolutionalneural networks (CNNs) [12] play a dominant role for pro-cessing visual-related problems CNNs are biologically-inspired and multilayer classes of deep learning models thatuse a single neural network trained end to end from rawimage pixel values to classifier outputs The idea of CNNswas firstly introduced in [13] improved in [14] and refinedand simplified in [15 16] With the large-scale sources

Hindawi Publishing CorporationJournal of SensorsVolume 2015 Article ID 258619 12 pageshttpdxdoiorg1011552015258619

2 Journal of Sensors

of training data and efficient implementation on GPUsCNNs have recently outperformed some other conventionalmethods even human performance [17] on many vision-related tasks including image classification [18 19] objectdetection [20] scene labeling [21] house number digitclassification [22] and face recognition [23] Besides visiontasks CNNs have been also applied to other areas such asspeech recognition [24 25] The technique has been verifiedas an effective class of models for understanding visualimage content giving some state-of-the-art results on visualimage classification and other visual-related problems In[26] the authors presented DNN for HSI classification inwhich stacked autoencoders (SAEs) were employed to extractdiscriminative features

CNNs have been demonstrated to provide even betterclassification performance than the traditional SVM classi-fiers [27] and the conventional deep neural networks (DNNs)[18] in visual-related area However since CNNs have beenonly considered on visual-related problems there are rareliteratures on the technique with multiple layers for HSIclassification In this paper we have found that CNNs canbe effectively employed to classify hyperspectral data afterbuilding appropriate layer architecture According to ourexperiments we observe that the typical CNNs such asLeNet-5 [14] with two convolutional layers are actually notapplicable for hyperspectral data Alternatively we present asimple but effective CNN architecture containing five layerswith weights for supervised HSI classification Several exper-iments demonstrate excellent performance of our proposedmethod compared to the classic SVM and the conventionaldeep learning architecture As far as we know it is the firsttime to employ the CNN with multiple layers for HSI classi-fication

The paper is organized as follows In Section 2 we givea brief introduction to CNNs In Section 3 the typicalCNN architecture and the corresponding training processare presented In Section 4 we experimentally compare theperformance of our method with SVM and some neuralnetworks with different architectures Finally we conclude bysummarizing our results in Section 5

2 CNNs

CNNs represent feed-forward neural networks which consistof various combinations of the convolutional layers maxpooling layers and fully connected layers and exploit spatiallylocal correlation by enforcing a local connectivity patternbetween neurons of adjacent layers Convolutional layersalternate with max pooling layers mimicking the nature ofcomplex and simple cells in mammalian visual cortex [28]A CNN consists of one or more pairs of convolution andmaxpooling layers and finally ends with a fully connected neuralnetworks A typical convolutional network architecture isshown in Figure 1 [24]

In ordinary deep neural networks a neuron is connectedto all neurons in the next layer CNNs are different fromordinary NN in that neurons in convolutional layer are onlysparsely connected to the neurons in the next layer basedon their relative location That is to say in a fully connected

Filter size

Pooling size

Input

Shared weights

Convolutional layer

Max pooling layer

Fully connected layer

h1 h2 h3 h4 h5 h6

W

V 1 2 3 4 5 6

Figure 1 A typical CNN architecture consisting of a convolutionallayer a max pooling layer and a fully connected layer

DNNs each hidden activation ℎ119894is computed by multiplying

the entire input V by weights W in that layer However inCNNs each hidden activation is computed by multiplyinga small local input against the weights W The weights Ware then shared across the entire input space as shown inFigure 1 Neurons that belong to the same layer share the sameweights Weight sharing is a critical principle in CNNs sinceit helps reduce the total number of trainable parameters andleads to more efficient training and more effective model Aconvolutional layer is usually followed by amax pooling layer

Due to the replication of weights in a CNN a feature maybe detected across the input data If an input image is shiftedthe neuron detecting the feature is shifted as much Pooling isused to make the features invariant from the location and itsummarizes the output of multiple neurons in convolutionallayers through a pooling function Typical pooling functionis maximum A max pooling function basically returns themaximum value from the input Max pooling partitions theinput data into a set of nonoverlapping windows and outputsthe maximum value for each subregion and reduces thecomputational complexity for upper layers and provides aform of translation invariance To be used for classificationthe computation chain of a CNN ends in a fully connectednetwork that integrates information across all locations in allthe feature maps of the layer below

Most of CNNs working in image recognition have thelower layers composed to alternate convolutional and maxpooling layers while the upper layers are fully connectedtraditional MLP NNs For example LeNet-5 is such a CNNarchitecture presented for handwritten digit recognition [14]firstly and then it is successfully used for solving other visual-related problems However LeNet-5 might not be directlyemployed for HSI classification especially for small-size datasets according to our experiments in Section 4 In this paperwe will explore what is the suitable architecture and strategyfor CNN-based HSI classification

Journal of Sensors 3

600050004000300020001000

00 50 100

(a) Asphalt

600050004000300020001000

00 50 100

(b) Meadows

600050004000300020001000

00 50 100

(c) Gravel

600050004000300020001000

00 50 100

(d) Trees

600050004000300020001000

00 50 100

(e) Sheets

600050004000300020001000

00 50 100

(f) Bare soil

600050004000300020001000

00 50 100

(g) Bitumen

600050004000300020001000

00 50 100

(h) Bricks

600050004000300020001000

00 50 100

(i) Shadows

Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands

3 CNN-Based HSI Classification

31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures

32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in

C1 feature maps M2 feature maps OutputF3

ConvolutionMax pooling Full connection Full connection

Input

Sharing same weights

1n1 times 1 20n2 times 1 20n3 times 1 n4 n5

middot middot middot

middot middot middot

Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network

Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight

4 Journal of Sensors

values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579

119894is the parameter

set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D

image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899

1 1) and 119899

1is the number of bands The first hidden

convolutional layer C1 filters the 1198991times 1 input data with 20

kernels of size 1198961times1 Layer C1 contains 20times119899

2times1 nodes and

1198992= 1198991minus 1198961+ 1 There are 20 times (119896

1+ 1) trainable parameters

between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896

2 1)

Layer M2 contains 20 times 1198993times 1 nodes and 119899

3= 11989921198962

There is no parameter in this layer The fully connected layerF3 has 119899

4nodes and there are (20 times 119899

3+ 1) times 119899

4trainable

parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899

4+ 1) times 119899

5trainable parameters

between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896

1+ 1) + (20 times

1198993+ 1) times 119899

4+ (1198994+ 1) times 119899

5trainable parameters

Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899

1and

1198995are the spectral channel size and the number of output

classes of the data set respectively In our experiments 1198961is

better to be lceil11989919rceil and 119899

2= 1198991minus1198961+1 1198993can be any number

between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These

choices might not be the best but are effective for general HSIdata

In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899

3

dimensionsOur architecture has some similarities to architectures

that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data

33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible

331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899

1input units in layer

INPUT 1198995output units in layer OUTPUT and several so-

called hidden units in layers C2 M3 and F4 Assuming x119894is

the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x

119894+1as

x119894+1

= 119891119894(u119894) (1)

where

u119894= W119879119894x119894+ b119894 (2)

and W119879119894is a weight matrix of the 119894th layer acting on the

input data and b119894is an additive bias vector for the 119894th

layer 119891119894(sdot) is the activation function of the 119894th layer In

our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899

5way softmax function

which produces a distribution over the 1198995class labels and the

softmax regression model is defined as

y = 1

sum1198995

119896=1119890W119879119871119896

x119871+b119871119896

[[[[[[[[

[

119890W1198791198711

x119871+b1198711

119890W1198791198712

x119871+b1198712

119890W1198791198711198995

x119871+b1198711198995

]]]]]]]]

]

(3)

The output vector y = x119871+1

of the layer OUTPUT denotesthe final probability of all the classes in the current iteration

332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as

119869 (120579) = minus1

119898

119898

sum

119894=1

1198995

sum

119895=1

1 119895 = Y(119894) log (y(119894)119895) (4)

where 119898 is the number of training samples Y is the desiredoutput y(119894)

119895is the 119895th value of the actual output y(119894) (see (3)) of

the 119894th training sample and is a vector whose size is 1198995 In the

desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient

The derivative of the loss function with respect to u119894is

120575119894=

120597119869

120597u119894

=

minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871

(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871

(5)

where ∘ denotes element-wise multiplication 1198911015840(u119894) can be

easily represented as

1198911015840(u119894) =

(1 minus 119891 (u119894)) ∘ (1 + 119891 (u

119894)) 119894 = 1 3

null 119894 = 2

119891 (u119894) ∘ (1 minus 119891 (u

119894)) 119894 = 4

(6)

Journal of Sensors 5

⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899

1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899

119894

layerneurons = new Neuron [119899119894+1]

layerparams = 120579119894

modeladdLayer(layer)end forreturnmodel

end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899

2 1198993 1198994 1198961 1198962 according to 119899

1and 119899

5

Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899

1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)

Update 120579 using (7)err = err + mean(119869(120579))

end forerr = errBATCHEStrainingiter++

end whileSave parameters 120579 of the CNN

Algorithm 1 Our CNN-based method

Therefore on each iteration we would perform the update

120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)

for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and

nabla120579119869 (120579) =

120597119869

1205971205791

120597119869

1205971205792

120597119869

120597120579119871

(8)

We know that 120579119894containsW

119894and b

119894 and

120597119869

120597120579119894

= 120597119869

120597W119894

120597119869

120597b119894

(9)

where

120597119869

120597W119894

=120597119869

120597u119894

∘120597u119894

120597W119894

=120597119869

120597u119894

∘ x119894= 120575119894∘ x119894

120597119869

120597b119894

=120597119869

120597u119894

∘120597u119894

120597b119894

=120597119869

120597u119894

= 120575119894

(10)

With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1

34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)

4 Experiments

All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

2 Journal of Sensors

of training data and efficient implementation on GPUsCNNs have recently outperformed some other conventionalmethods even human performance [17] on many vision-related tasks including image classification [18 19] objectdetection [20] scene labeling [21] house number digitclassification [22] and face recognition [23] Besides visiontasks CNNs have been also applied to other areas such asspeech recognition [24 25] The technique has been verifiedas an effective class of models for understanding visualimage content giving some state-of-the-art results on visualimage classification and other visual-related problems In[26] the authors presented DNN for HSI classification inwhich stacked autoencoders (SAEs) were employed to extractdiscriminative features

CNNs have been demonstrated to provide even betterclassification performance than the traditional SVM classi-fiers [27] and the conventional deep neural networks (DNNs)[18] in visual-related area However since CNNs have beenonly considered on visual-related problems there are rareliteratures on the technique with multiple layers for HSIclassification In this paper we have found that CNNs canbe effectively employed to classify hyperspectral data afterbuilding appropriate layer architecture According to ourexperiments we observe that the typical CNNs such asLeNet-5 [14] with two convolutional layers are actually notapplicable for hyperspectral data Alternatively we present asimple but effective CNN architecture containing five layerswith weights for supervised HSI classification Several exper-iments demonstrate excellent performance of our proposedmethod compared to the classic SVM and the conventionaldeep learning architecture As far as we know it is the firsttime to employ the CNN with multiple layers for HSI classi-fication

The paper is organized as follows In Section 2 we givea brief introduction to CNNs In Section 3 the typicalCNN architecture and the corresponding training processare presented In Section 4 we experimentally compare theperformance of our method with SVM and some neuralnetworks with different architectures Finally we conclude bysummarizing our results in Section 5

2 CNNs

CNNs represent feed-forward neural networks which consistof various combinations of the convolutional layers maxpooling layers and fully connected layers and exploit spatiallylocal correlation by enforcing a local connectivity patternbetween neurons of adjacent layers Convolutional layersalternate with max pooling layers mimicking the nature ofcomplex and simple cells in mammalian visual cortex [28]A CNN consists of one or more pairs of convolution andmaxpooling layers and finally ends with a fully connected neuralnetworks A typical convolutional network architecture isshown in Figure 1 [24]

In ordinary deep neural networks a neuron is connectedto all neurons in the next layer CNNs are different fromordinary NN in that neurons in convolutional layer are onlysparsely connected to the neurons in the next layer basedon their relative location That is to say in a fully connected

Filter size

Pooling size

Input

Shared weights

Convolutional layer

Max pooling layer

Fully connected layer

h1 h2 h3 h4 h5 h6

W

V 1 2 3 4 5 6

Figure 1 A typical CNN architecture consisting of a convolutionallayer a max pooling layer and a fully connected layer

DNNs each hidden activation ℎ119894is computed by multiplying

the entire input V by weights W in that layer However inCNNs each hidden activation is computed by multiplyinga small local input against the weights W The weights Ware then shared across the entire input space as shown inFigure 1 Neurons that belong to the same layer share the sameweights Weight sharing is a critical principle in CNNs sinceit helps reduce the total number of trainable parameters andleads to more efficient training and more effective model Aconvolutional layer is usually followed by amax pooling layer

Due to the replication of weights in a CNN a feature maybe detected across the input data If an input image is shiftedthe neuron detecting the feature is shifted as much Pooling isused to make the features invariant from the location and itsummarizes the output of multiple neurons in convolutionallayers through a pooling function Typical pooling functionis maximum A max pooling function basically returns themaximum value from the input Max pooling partitions theinput data into a set of nonoverlapping windows and outputsthe maximum value for each subregion and reduces thecomputational complexity for upper layers and provides aform of translation invariance To be used for classificationthe computation chain of a CNN ends in a fully connectednetwork that integrates information across all locations in allthe feature maps of the layer below

Most of CNNs working in image recognition have thelower layers composed to alternate convolutional and maxpooling layers while the upper layers are fully connectedtraditional MLP NNs For example LeNet-5 is such a CNNarchitecture presented for handwritten digit recognition [14]firstly and then it is successfully used for solving other visual-related problems However LeNet-5 might not be directlyemployed for HSI classification especially for small-size datasets according to our experiments in Section 4 In this paperwe will explore what is the suitable architecture and strategyfor CNN-based HSI classification

Journal of Sensors 3

600050004000300020001000

00 50 100

(a) Asphalt

600050004000300020001000

00 50 100

(b) Meadows

600050004000300020001000

00 50 100

(c) Gravel

600050004000300020001000

00 50 100

(d) Trees

600050004000300020001000

00 50 100

(e) Sheets

600050004000300020001000

00 50 100

(f) Bare soil

600050004000300020001000

00 50 100

(g) Bitumen

600050004000300020001000

00 50 100

(h) Bricks

600050004000300020001000

00 50 100

(i) Shadows

Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands

3 CNN-Based HSI Classification

31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures

32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in

C1 feature maps M2 feature maps OutputF3

ConvolutionMax pooling Full connection Full connection

Input

Sharing same weights

1n1 times 1 20n2 times 1 20n3 times 1 n4 n5

middot middot middot

middot middot middot

Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network

Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight

4 Journal of Sensors

values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579

119894is the parameter

set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D

image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899

1 1) and 119899

1is the number of bands The first hidden

convolutional layer C1 filters the 1198991times 1 input data with 20

kernels of size 1198961times1 Layer C1 contains 20times119899

2times1 nodes and

1198992= 1198991minus 1198961+ 1 There are 20 times (119896

1+ 1) trainable parameters

between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896

2 1)

Layer M2 contains 20 times 1198993times 1 nodes and 119899

3= 11989921198962

There is no parameter in this layer The fully connected layerF3 has 119899

4nodes and there are (20 times 119899

3+ 1) times 119899

4trainable

parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899

4+ 1) times 119899

5trainable parameters

between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896

1+ 1) + (20 times

1198993+ 1) times 119899

4+ (1198994+ 1) times 119899

5trainable parameters

Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899

1and

1198995are the spectral channel size and the number of output

classes of the data set respectively In our experiments 1198961is

better to be lceil11989919rceil and 119899

2= 1198991minus1198961+1 1198993can be any number

between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These

choices might not be the best but are effective for general HSIdata

In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899

3

dimensionsOur architecture has some similarities to architectures

that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data

33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible

331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899

1input units in layer

INPUT 1198995output units in layer OUTPUT and several so-

called hidden units in layers C2 M3 and F4 Assuming x119894is

the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x

119894+1as

x119894+1

= 119891119894(u119894) (1)

where

u119894= W119879119894x119894+ b119894 (2)

and W119879119894is a weight matrix of the 119894th layer acting on the

input data and b119894is an additive bias vector for the 119894th

layer 119891119894(sdot) is the activation function of the 119894th layer In

our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899

5way softmax function

which produces a distribution over the 1198995class labels and the

softmax regression model is defined as

y = 1

sum1198995

119896=1119890W119879119871119896

x119871+b119871119896

[[[[[[[[

[

119890W1198791198711

x119871+b1198711

119890W1198791198712

x119871+b1198712

119890W1198791198711198995

x119871+b1198711198995

]]]]]]]]

]

(3)

The output vector y = x119871+1

of the layer OUTPUT denotesthe final probability of all the classes in the current iteration

332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as

119869 (120579) = minus1

119898

119898

sum

119894=1

1198995

sum

119895=1

1 119895 = Y(119894) log (y(119894)119895) (4)

where 119898 is the number of training samples Y is the desiredoutput y(119894)

119895is the 119895th value of the actual output y(119894) (see (3)) of

the 119894th training sample and is a vector whose size is 1198995 In the

desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient

The derivative of the loss function with respect to u119894is

120575119894=

120597119869

120597u119894

=

minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871

(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871

(5)

where ∘ denotes element-wise multiplication 1198911015840(u119894) can be

easily represented as

1198911015840(u119894) =

(1 minus 119891 (u119894)) ∘ (1 + 119891 (u

119894)) 119894 = 1 3

null 119894 = 2

119891 (u119894) ∘ (1 minus 119891 (u

119894)) 119894 = 4

(6)

Journal of Sensors 5

⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899

1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899

119894

layerneurons = new Neuron [119899119894+1]

layerparams = 120579119894

modeladdLayer(layer)end forreturnmodel

end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899

2 1198993 1198994 1198961 1198962 according to 119899

1and 119899

5

Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899

1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)

Update 120579 using (7)err = err + mean(119869(120579))

end forerr = errBATCHEStrainingiter++

end whileSave parameters 120579 of the CNN

Algorithm 1 Our CNN-based method

Therefore on each iteration we would perform the update

120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)

for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and

nabla120579119869 (120579) =

120597119869

1205971205791

120597119869

1205971205792

120597119869

120597120579119871

(8)

We know that 120579119894containsW

119894and b

119894 and

120597119869

120597120579119894

= 120597119869

120597W119894

120597119869

120597b119894

(9)

where

120597119869

120597W119894

=120597119869

120597u119894

∘120597u119894

120597W119894

=120597119869

120597u119894

∘ x119894= 120575119894∘ x119894

120597119869

120597b119894

=120597119869

120597u119894

∘120597u119894

120597b119894

=120597119869

120597u119894

= 120575119894

(10)

With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1

34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)

4 Experiments

All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Journal of Sensors 3

600050004000300020001000

00 50 100

(a) Asphalt

600050004000300020001000

00 50 100

(b) Meadows

600050004000300020001000

00 50 100

(c) Gravel

600050004000300020001000

00 50 100

(d) Trees

600050004000300020001000

00 50 100

(e) Sheets

600050004000300020001000

00 50 100

(f) Bare soil

600050004000300020001000

00 50 100

(g) Bitumen

600050004000300020001000

00 50 100

(h) Bricks

600050004000300020001000

00 50 100

(i) Shadows

Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands

3 CNN-Based HSI Classification

31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures

32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in

C1 feature maps M2 feature maps OutputF3

ConvolutionMax pooling Full connection Full connection

Input

Sharing same weights

1n1 times 1 20n2 times 1 20n3 times 1 n4 n5

middot middot middot

middot middot middot

Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network

Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight

4 Journal of Sensors

values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579

119894is the parameter

set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D

image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899

1 1) and 119899

1is the number of bands The first hidden

convolutional layer C1 filters the 1198991times 1 input data with 20

kernels of size 1198961times1 Layer C1 contains 20times119899

2times1 nodes and

1198992= 1198991minus 1198961+ 1 There are 20 times (119896

1+ 1) trainable parameters

between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896

2 1)

Layer M2 contains 20 times 1198993times 1 nodes and 119899

3= 11989921198962

There is no parameter in this layer The fully connected layerF3 has 119899

4nodes and there are (20 times 119899

3+ 1) times 119899

4trainable

parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899

4+ 1) times 119899

5trainable parameters

between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896

1+ 1) + (20 times

1198993+ 1) times 119899

4+ (1198994+ 1) times 119899

5trainable parameters

Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899

1and

1198995are the spectral channel size and the number of output

classes of the data set respectively In our experiments 1198961is

better to be lceil11989919rceil and 119899

2= 1198991minus1198961+1 1198993can be any number

between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These

choices might not be the best but are effective for general HSIdata

In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899

3

dimensionsOur architecture has some similarities to architectures

that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data

33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible

331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899

1input units in layer

INPUT 1198995output units in layer OUTPUT and several so-

called hidden units in layers C2 M3 and F4 Assuming x119894is

the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x

119894+1as

x119894+1

= 119891119894(u119894) (1)

where

u119894= W119879119894x119894+ b119894 (2)

and W119879119894is a weight matrix of the 119894th layer acting on the

input data and b119894is an additive bias vector for the 119894th

layer 119891119894(sdot) is the activation function of the 119894th layer In

our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899

5way softmax function

which produces a distribution over the 1198995class labels and the

softmax regression model is defined as

y = 1

sum1198995

119896=1119890W119879119871119896

x119871+b119871119896

[[[[[[[[

[

119890W1198791198711

x119871+b1198711

119890W1198791198712

x119871+b1198712

119890W1198791198711198995

x119871+b1198711198995

]]]]]]]]

]

(3)

The output vector y = x119871+1

of the layer OUTPUT denotesthe final probability of all the classes in the current iteration

332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as

119869 (120579) = minus1

119898

119898

sum

119894=1

1198995

sum

119895=1

1 119895 = Y(119894) log (y(119894)119895) (4)

where 119898 is the number of training samples Y is the desiredoutput y(119894)

119895is the 119895th value of the actual output y(119894) (see (3)) of

the 119894th training sample and is a vector whose size is 1198995 In the

desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient

The derivative of the loss function with respect to u119894is

120575119894=

120597119869

120597u119894

=

minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871

(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871

(5)

where ∘ denotes element-wise multiplication 1198911015840(u119894) can be

easily represented as

1198911015840(u119894) =

(1 minus 119891 (u119894)) ∘ (1 + 119891 (u

119894)) 119894 = 1 3

null 119894 = 2

119891 (u119894) ∘ (1 minus 119891 (u

119894)) 119894 = 4

(6)

Journal of Sensors 5

⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899

1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899

119894

layerneurons = new Neuron [119899119894+1]

layerparams = 120579119894

modeladdLayer(layer)end forreturnmodel

end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899

2 1198993 1198994 1198961 1198962 according to 119899

1and 119899

5

Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899

1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)

Update 120579 using (7)err = err + mean(119869(120579))

end forerr = errBATCHEStrainingiter++

end whileSave parameters 120579 of the CNN

Algorithm 1 Our CNN-based method

Therefore on each iteration we would perform the update

120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)

for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and

nabla120579119869 (120579) =

120597119869

1205971205791

120597119869

1205971205792

120597119869

120597120579119871

(8)

We know that 120579119894containsW

119894and b

119894 and

120597119869

120597120579119894

= 120597119869

120597W119894

120597119869

120597b119894

(9)

where

120597119869

120597W119894

=120597119869

120597u119894

∘120597u119894

120597W119894

=120597119869

120597u119894

∘ x119894= 120575119894∘ x119894

120597119869

120597b119894

=120597119869

120597u119894

∘120597u119894

120597b119894

=120597119869

120597u119894

= 120575119894

(10)

With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1

34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)

4 Experiments

All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

4 Journal of Sensors

values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579

119894is the parameter

set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D

image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899

1 1) and 119899

1is the number of bands The first hidden

convolutional layer C1 filters the 1198991times 1 input data with 20

kernels of size 1198961times1 Layer C1 contains 20times119899

2times1 nodes and

1198992= 1198991minus 1198961+ 1 There are 20 times (119896

1+ 1) trainable parameters

between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896

2 1)

Layer M2 contains 20 times 1198993times 1 nodes and 119899

3= 11989921198962

There is no parameter in this layer The fully connected layerF3 has 119899

4nodes and there are (20 times 119899

3+ 1) times 119899

4trainable

parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899

4+ 1) times 119899

5trainable parameters

between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896

1+ 1) + (20 times

1198993+ 1) times 119899

4+ (1198994+ 1) times 119899

5trainable parameters

Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899

1and

1198995are the spectral channel size and the number of output

classes of the data set respectively In our experiments 1198961is

better to be lceil11989919rceil and 119899

2= 1198991minus1198961+1 1198993can be any number

between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These

choices might not be the best but are effective for general HSIdata

In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899

3

dimensionsOur architecture has some similarities to architectures

that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data

33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible

331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899

1input units in layer

INPUT 1198995output units in layer OUTPUT and several so-

called hidden units in layers C2 M3 and F4 Assuming x119894is

the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x

119894+1as

x119894+1

= 119891119894(u119894) (1)

where

u119894= W119879119894x119894+ b119894 (2)

and W119879119894is a weight matrix of the 119894th layer acting on the

input data and b119894is an additive bias vector for the 119894th

layer 119891119894(sdot) is the activation function of the 119894th layer In

our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899

5way softmax function

which produces a distribution over the 1198995class labels and the

softmax regression model is defined as

y = 1

sum1198995

119896=1119890W119879119871119896

x119871+b119871119896

[[[[[[[[

[

119890W1198791198711

x119871+b1198711

119890W1198791198712

x119871+b1198712

119890W1198791198711198995

x119871+b1198711198995

]]]]]]]]

]

(3)

The output vector y = x119871+1

of the layer OUTPUT denotesthe final probability of all the classes in the current iteration

332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as

119869 (120579) = minus1

119898

119898

sum

119894=1

1198995

sum

119895=1

1 119895 = Y(119894) log (y(119894)119895) (4)

where 119898 is the number of training samples Y is the desiredoutput y(119894)

119895is the 119895th value of the actual output y(119894) (see (3)) of

the 119894th training sample and is a vector whose size is 1198995 In the

desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient

The derivative of the loss function with respect to u119894is

120575119894=

120597119869

120597u119894

=

minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871

(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871

(5)

where ∘ denotes element-wise multiplication 1198911015840(u119894) can be

easily represented as

1198911015840(u119894) =

(1 minus 119891 (u119894)) ∘ (1 + 119891 (u

119894)) 119894 = 1 3

null 119894 = 2

119891 (u119894) ∘ (1 minus 119891 (u

119894)) 119894 = 4

(6)

Journal of Sensors 5

⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899

1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899

119894

layerneurons = new Neuron [119899119894+1]

layerparams = 120579119894

modeladdLayer(layer)end forreturnmodel

end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899

2 1198993 1198994 1198961 1198962 according to 119899

1and 119899

5

Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899

1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)

Update 120579 using (7)err = err + mean(119869(120579))

end forerr = errBATCHEStrainingiter++

end whileSave parameters 120579 of the CNN

Algorithm 1 Our CNN-based method

Therefore on each iteration we would perform the update

120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)

for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and

nabla120579119869 (120579) =

120597119869

1205971205791

120597119869

1205971205792

120597119869

120597120579119871

(8)

We know that 120579119894containsW

119894and b

119894 and

120597119869

120597120579119894

= 120597119869

120597W119894

120597119869

120597b119894

(9)

where

120597119869

120597W119894

=120597119869

120597u119894

∘120597u119894

120597W119894

=120597119869

120597u119894

∘ x119894= 120575119894∘ x119894

120597119869

120597b119894

=120597119869

120597u119894

∘120597u119894

120597b119894

=120597119869

120597u119894

= 120575119894

(10)

With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1

34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)

4 Experiments

All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Journal of Sensors 5

⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899

1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899

119894

layerneurons = new Neuron [119899119894+1]

layerparams = 120579119894

modeladdLayer(layer)end forreturnmodel

end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899

2 1198993 1198994 1198961 1198962 according to 119899

1and 119899

5

Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899

1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)

Update 120579 using (7)err = err + mean(119869(120579))

end forerr = errBATCHEStrainingiter++

end whileSave parameters 120579 of the CNN

Algorithm 1 Our CNN-based method

Therefore on each iteration we would perform the update

120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)

for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and

nabla120579119869 (120579) =

120597119869

1205971205791

120597119869

1205971205792

120597119869

120597120579119871

(8)

We know that 120579119894containsW

119894and b

119894 and

120597119869

120597120579119894

= 120597119869

120597W119894

120597119869

120597b119894

(9)

where

120597119869

120597W119894

=120597119869

120597u119894

∘120597u119894

120597W119894

=120597119869

120597u119894

∘ x119894= 120575119894∘ x119894

120597119869

120597b119894

=120597119869

120597u119894

∘120597u119894

120597b119894

=120597119869

120597u119894

= 120575119894

(10)

With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1

34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)

4 Experiments

All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

6 Journal of Sensors

Table 1 Number of training and test samples used in the IndianPines data set

Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065

Total 1600 6904

equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card

41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]

uniformlyThe Indian Pines data set was gathered by Airborne

VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899

1= 220 119896

1= 24 119899

2= 197 119896

2= 5 119899

3= 40 119899

4=

100 and 1198995= 8 and the number of total trainable parameters

in the data set is 81408The second data employed was also collected by the

AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899

1= 224 119896

1= 24

1198992= 201 119896

2= 5 119899

3= 40 119899

4= 100 and 119899

5= 16 and the

number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the

Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340

pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water

Table 2 Number of training and test samples used in the Salinasdata set

Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607

Total 3200 50929

Table 3 Number of training and test samples used in University ofPavia data set

Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747

Total 1800 40976

band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896

1= 11 119899

2= 93 119896

2= 3 119899

3= 30 119899

4= 100 and

1198995= 9 and the number of total trainable parameters in the

data set is 61249

42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Journal of Sensors 7

Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods

Hay-windrowed

Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method

FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds

Vineyard untrainedVineyard vertical trellis

Broccoli green weeds 1Broccoli green weeds 2

Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk

Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method

Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows

Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method

(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7

Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is

Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets

Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052

Table 5 Results of comparison with different neural networks onthe Indian Pines data set

Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016

also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training

lenovo
高亮

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

8 Journal of Sensors

75

80

85

90

95

100

1 2 3 4 5 6 7 8

Accu

racy

()

Class number

CNNSVM

(a)

70

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Accu

racy

()

Class number

CNNSVM

(b)

75

80

85

90

95

100

1 2 3 4 5 6 7 8 9

Accu

racy

()

Class number

CNNSVM

(c)

Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3

and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework

Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Journal of Sensors 9

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70

Ove

rall

accu

racy

()

Training time (min)

(a)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90

Ove

rall

accu

racy

()

Training time (min)

(b)

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9

Ove

rall

accu

racy

()

Training time (min)

(c)

Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time

the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network

To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides

higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders

To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

10 Journal of Sensors

Table 6 The detailed classification accuracies of all the classes for University of Pavia data set

Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000

0

02

04

06

08

1

12

14

16

18

2

0 1 2 3 4 5 6 7 8 9

Cos

t val

ue

Training time (min)

Figure 9 Cost value versus the training time for University of Paviadata sets

truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes

Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is

83

84

85

86

87

88

89

90

91

92

93

94

60 80 100 120 140 160 180 200

Ove

rall

accu

racy

()

Number of training samples of each class

SVMCNN

Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets

a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)

5 Conclusion and Future Work

In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Journal of Sensors 11

Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples

Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively

References

[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002

[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004

[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009

[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012

[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004

[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000

[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011

[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013

[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997

[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999

[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010

[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988

[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011

[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003

[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011

[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012

[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

12 Journal of Sensors

[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013

[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012

[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014

[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013

[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012

[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014

[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008

[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968

[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012

[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011

[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist

[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005

[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Deep Convolutional Neural Networks for Hyperspectral Image Classification · 2017-10-17 · ResearchArticle Deep Convolutional Neural Networks for Hyperspectral Image Classification

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Antennas andPropagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014