
http://www.iaeme.com/IJECET/index.asp 18 [email protected]

International Journal of Electronics and Communication Engineering and Technology (IJECET) Volume 8, Issue 1, January - February 2017, pp. 18–31, Article ID: IJECET_08_01_003

Available online at

http://www.iaeme.com/IJECET/issues.asp?JType=IJECET&VType=8&IType=1

ISSN Print: 0976-6464 and ISSN Online: 0976-6472

© IAEME Publication

HYPERSPECTRAL IMAGERY CLASSIFICATION

USING TECHNOLOGIES OF COMPUTATIONAL

INTELLIGENCE

Priya G. Deshmukh

Electronics Department, Amrutvahini College of Engineering,

Sangamner, Maharashtra, India

Prof. M. P. Dongare

Assistant Professor, Electronics Department, Amrutvahini College of Engineering,

Sangamner, Maharashtra, India

ABSTRACT

Texture information is exploited for the classification of hyperspectral imagery (HSI) at high spatial resolution. For this purpose, the framework employs local binary patterns (LBPs) to extract local image features such as edges, corners, and spots. After LBP feature extraction, two levels of fusion are applied together with Gabor and spectral features: feature-level fusion and decision-level fusion. In feature-level fusion, multiple features are concatenated before pattern classification. Decision-level fusion instead operates on the probability outputs of each individual classification pipeline and combines the distinct decisions into a final one, using either a hard fusion method (majority voting) or a soft fusion method (a logarithmic opinion pool at the probability level, LOGP). In addition, an extreme learning machine (ELM) classifier, which is more efficient than a support vector machine (SVM), is used to provide probabilistic classification outputs. It has a simple structure with one hidden layer and one linear output layer, and it trains much faster than an SVM.

Key words: Decision fusion, extreme learning machine (ELM), Gabor filter, hyperspectral imagery

(HSI), local binary patterns (LBPs), pattern classification.

Cite this Article: Priya G. Deshmukh and Prof. M.P. Dongare, Hyperspectral Imagery

Classification using Technologies of Computational Intelligence, International Journal of

Electronics and Communication Engineering and Technology, 8(1), 2017, pp. 18–31.

http://www.iaeme.com/IJECET/issues.asp?JType=IJECET&VType=8&IType=1

1. INTRODUCTION

The aim is to develop an innovative technique to classify hyperspectral images using tools of computational intelligence. Classification of hyperspectral imagery (HSI) at high spatial resolution is performed by exploiting texture information: local binary patterns extract local features, and an efficient extreme learning machine with a very simple structure is employed as the classifier. Many algorithms have been proposed to improve local image features for hyperspectral image classification. Currently, feature-level


fusion simply concatenates different features (i.e., Gabor features, LBP features, and spectral features) in the feature space.

A) LBP Features. LBP for HSI classification works on a grayscale image with a single spectral band. In this method, linear prediction error (LPE) is used for unsupervised band selection: LPE is first applied to obtain a set of distinctive and informative bands. For each band, an LBP code is generated for every pixel in the entire image, producing an LBP code image. From the LBP code image, a local LBP image patch is extracted and its histogram is calculated. The performance of LPE is better than that of principal component analysis.

After LBP feature extraction, two levels of fusion are applied together with Gabor and spectral features: feature-level fusion, in which multiple features are concatenated before pattern classification, and decision-level fusion, which operates on the probability outputs of each individual classification pipeline and combines the distinct decisions into a final one, either by hard fusion (majority voting) or by soft fusion (a logarithmic opinion pool at the probability level, LOGP).
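Feature-level fusion, as described above, amounts to stacking the different feature vectors for a pixel into one longer vector before classification. A minimal sketch in Python, with purely illustrative feature names and sizes:

```python
import numpy as np

# Hypothetical per-pixel feature vectors (names and sizes are illustrative):
# an LBP histogram, Gabor magnitude responses, and the raw spectral signature.
lbp_hist = np.array([0.1, 0.4, 0.3, 0.2])   # e.g., 4-bin LBP histogram
gabor_feat = np.array([2.5, 1.7])           # e.g., 2 Gabor magnitudes
spectral = np.array([0.8, 0.6, 0.9])        # e.g., 3 selected bands

# Feature-level fusion: concatenate the features before classification.
fused = np.concatenate([lbp_hist, gabor_feat, spectral])
```

The fused vector (here of length 9) is then fed to a single classifier, in contrast with decision-level fusion, where each feature type gets its own classifier and only the outputs are combined.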

B) ELM. An extreme learning machine (ELM) classifier is used to provide probabilistic classification outputs from the LBP features. ELM is a neural-network-based method with a very simple structure consisting of one hidden layer and one linear output layer. It is a much faster technique because the input weights are generated randomly and the output weights are computed analytically (by a least-squares solution), which reduces computational cost.

2. LITERATURE REVIEW

There is great interest in exploiting spatial information to improve HSI classification. In previous systems, an SVM classifier with composite kernels was employed to combine spectral and spatial information, referred to as SVM-CK [9]. Further research includes SVM-MRF [10], based on a segmentation map obtained by a pixel-wise SVM classifier, and a Gaussian mixture model classifier (MRF-GMM). Moreover, the morphological profile (MP) [7], generated by certain morphological operators, is widely used for modeling structural information. MPs are extracted from principal components (PCs), but fine structures tend to be present in minor PCs rather than in major PCs. Before SVM-CK [8], kernel discriminant analysis was employed, but it suffers from heavy computation. Later research includes Gabor texture features, Gabor texture features combined with the gray-level co-occurrence matrix, different MPs, the urban complexity index, and Gabor features with band selection.

3. PROBLEM STATEMENT

Hyperspectral image processing has been a very dynamic area in remote sensing and other applications in recent years. Hyperspectral images provide ample spectral information to identify and distinguish spectrally similar materials for more accurate and detailed information extraction. A wide range of advanced classification techniques is available based on spectral and spatial information. To improve classification accuracy, it is essential to identify and reduce uncertainties in the image processing chain. A large number of high-spatial-resolution images are available through advances in sensor technology. In conventional HSI classification systems, classifiers consider only spectral signatures and ignore the spatial information at neighboring locations. We therefore focus on the classification of hyperspectral images using local binary patterns and technologies of computational intelligence.

4. PROPOSED SOLUTION

This method has two primary stages: effective texture feature extraction, and fusion of the extracted local LBP features, global Gabor features, and original spectral features. First, LPE is applied to the image for band selection, generating grayscale band images. Then an LBP code is generated for each pixel in the image; a local LBP image patch is extracted, and its histogram is calculated. After this, the extracted local LBP features are fused with the Gabor and spectral features; in this process, LOGP plays a vital role in merging the probability outputs of the multiple texture and spectral features. The Gabor filter is used as a global operator to capture global texture properties such as orientation and scale, while LBP can


characterize local spatial textures such as edges, corners, and knots. Finally, the Gabor and LBP texture features are combined to improve HSI classification.

5. SYSTEM OVERVIEW

5.1. Hyper Spectral Image Classification Approaches

The prefix "hyper" means "over", i.e., "too many", and refers to the large number of measured wavelength bands. Hyperspectral images are spectrally over-determined, which provides ample spectral information to recognize and distinguish spectrally unique materials. Hyperspectral imagery offers the potential for more accurate and detailed information extraction than is possible with any other type of remotely sensed data [1]. Hyperspectral images are 3D data, with a spectral signature for the scene spread over several bands. Generally, the high-dimensional spectral information is used to perform operations such as pixel-by-pixel classification of the scene. Band selection and feature extraction methods have been developed to improve the performance of parametric classifiers such as maximum likelihood (ML), distance classifiers, and clustering methods. However, the classification accuracies of these methods do not match those achieved for grayscale/color images.

An important part of image analysis, called classification, is to identify groups of pixels that have similar spectral characteristics and to determine the various features represented by these groups. One option is visual classification based on the analyst's ability to use visual elements (tone, contrast, shape, etc.). Digital classification instead operates on the spectral information used to create the image: each individual pixel is classified based on its spectral characteristics, and all pixels in the image are then assigned to particular classes (e.g., water, coniferous forest, deciduous forest, corn, wheat). The classified image is called a thematic map of the original image. Classification is thus performed to observe land-use patterns, geology, vegetation types, or rainfall. In image classification we must distinguish between spectral classes and information classes; spectral classes are groups of pixels with approximately uniform spectral characteristics. The main objective of image classification procedures is to automatically categorize all pixels in the image into land-cover classes.

Based on pixel information, classifiers can be grouped into per-pixel, sub-pixel, per-field, knowledge-based, contextual, and multiple-classifier approaches. Per-pixel classifiers are parametric or non-parametric. Depending on the use of training samples, images can be classified by supervised or unsupervised classification. Unsupervised classification is the identification of natural groups. Supervised classification is the method of using samples of known identity to assign unclassified pixels to one of several informational classes; it follows steps such as feature extraction, training, and labeling. In the first step, the image is transformed into a feature image to reduce the data dimensionality and improve interpretability; this phase is optional and comprises techniques such as the IHS transformation, principal component analysis, and the linear mixture model. In the training phase, a set of training samples in the image is selected to specify each class. The training samples train the classifiers to identify the classes and are used to determine the 'rules' that allow assignment of a class label to each pixel in the image. Hyperspectral image classification approaches are categorized as shown in Figure 1.


Figure 1 Hyperspectral image classification

On the basis of pixel information, classifiers can be grouped into per-pixel, sub-pixel, per-field, knowledge-based, contextual, and multiple-classifier approaches. A per-pixel classifier processes the entire scene pixel by pixel, which is referred to as pixel-based classification; in many applications per-pixel classifiers are not suitable because they handle only spectral information. A sub-pixel classifier decomposes each pixel into fractional memberships of several categories, which addresses the mixed-pixel problem. A per-field classifier divides the scene into homogeneous image segments, for example using an extended version of the Gaussian maximum likelihood (GML) algorithm. A contextual classifier predicts the class of a pixel using the spectral information at that pixel together with observations at other pixels; that is, it utilizes information from neighboring pixels.

5.2. Local Binary Pattern (LBP)

The LBP operator is an image operator that transforms an image into an array of labels describing the small-scale appearance of the image. These labels, most often in the form of a histogram, are then used for further image analysis. The basic version of the local binary pattern operator works on a 3×3 pixel block of an image. Each pixel in this block is thresholded by the center pixel value; the resulting bits are multiplied by powers of two and summed to obtain a label for the center pixel. Since the neighborhood consists of 8 pixels, a total of 2^8 = 256 different labels can be obtained based on the relative gray values of the center and neighborhood pixels. An example of an LBP image and histogram is shown in Figure 2.

Figure 2 Example of an input image, the corresponding LBP image and histogram
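The basic 3×3 operator described above can be sketched in a few lines of Python. This is a minimal illustration (not the paper's implementation); following equation (1) later in the paper, a bit is set to 1 when the neighbor is greater than the center:

```python
import numpy as np

def lbp_3x3(block):
    """Basic LBP for a 3x3 block: threshold the 8 neighbors against the
    center pixel, then weight the resulting bits by powers of two."""
    center = block[1, 1]
    # Neighbors in a fixed clockwise order starting at the top-left corner.
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    bits = [1 if n > center else 0 for n in neighbors]
    return sum(b << i for i, b in enumerate(bits))

block = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
label = lbp_3x3(block)   # → 240, a label in [0, 255]
```

Applying `lbp_3x3` to every 3×3 window of an image yields the LBP image whose histogram is shown in Figure 2.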


Figure 3 The circular (8,1), (16,2) and (8,2) neighborhoods. The pixel values are bilinearly interpolated whenever the sampling point is not in the center of a pixel

5.3. Mappings of the LBP Labels: Uniform Patterns

In many texture analysis applications it is desirable to have features that are invariant to rotations of the input image. Rotation of the input image has two effects: each local neighborhood is rotated to another pixel location, and, within each neighborhood, the sampling points on the circle surrounding the center point are rotated into a different orientation, since the LBP(P, R) patterns are obtained by circularly sampling around the center pixel. Another refinement of the original operator is the use of uniform patterns [5]. For this, a uniformity measure U("pattern") is defined, with the bit pattern considered circular: U is the number of bitwise transitions from 0 to 1 or vice versa. A local binary pattern is called uniform if its uniformity measure is at most 2. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are uniform, whereas the patterns 11001001 (4 transitions) and 01010011 (6 transitions) are not. In uniform LBP mapping, each uniform pattern receives a separate output label and all non-uniform patterns are assigned to a single shared label. Hence, the number of different output labels for patterns of P bits is P(P − 1) + 3: the uniform mapping produces 59 output labels for neighborhoods of 8 sampling points and 243 labels for neighborhoods of 16 sampling points.

The reasons for discarding the non-uniform patterns are twofold. First, most local binary patterns in natural images are uniform. In experiments with texture images, uniform patterns account for a bit less than 90% of all patterns when using the (8, 1) neighborhood and around 70% in the (16, 2) neighborhood. In experiments with facial images [1], 90.6% of the patterns in the (8, 1) neighborhood and 85.2% of the patterns in the (8, 2) neighborhood were found to be uniform. The second reason for considering only uniform patterns is statistical robustness. Using uniform patterns produces better recognition results in many applications: uniform patterns themselves are more stable, i.e., less affected by noise, and considering only uniform patterns makes the number of possible LBP labels significantly lower, so that reliable estimation of their distribution requires fewer samples.

The uniform patterns allow the LBP method to be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis [5]. Each pixel is labeled with the code of the texture primitive that best matches the local neighborhood, so each LBP code can be regarded as a micro-texton. Local primitives detected by the LBP include spots, flat areas, edges, edge ends, and curves. Some examples are shown in Figure 4 with the LBP(8, R) operator; in the figure, 1s are represented as bold black circles and 0s are white. The LBP distribution therefore has both properties of a structural analysis method, texture primitives and placement rules, while at the same time the distribution is simply a statistic of a non-linearly filtered image, which clearly makes the method a statistical one.


Figure 4 Different texture primitives detected by the LBP

For these reasons, the LBP distribution can be successfully used in recognizing a wide variety of

different textures, to which statistical and structural methods have normally been applied separately.
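The uniformity measure U and the resulting label count can be checked with a short Python sketch (an illustration, not the paper's code):

```python
def transitions(pattern, bits=8):
    """U("pattern"): number of 0->1 or 1->0 transitions, with the
    bit string treated as circular."""
    b = [(pattern >> i) & 1 for i in range(bits)]
    return sum(b[i] != b[(i + 1) % bits] for i in range(bits))

def is_uniform(pattern, bits=8):
    """A pattern is uniform if its uniformity measure U is at most 2."""
    return transitions(pattern, bits) <= 2

assert transitions(0b00000000) == 0      # uniform
assert transitions(0b01110000) == 2      # uniform
assert transitions(0b11001001) == 4      # not uniform
assert transitions(0b01010011) == 6      # not uniform

# 56 two-transition patterns + 2 constant patterns = 58 uniform patterns;
# one extra shared label for all non-uniform patterns gives P(P-1)+3 = 59.
uniform_count = sum(is_uniform(p) for p in range(256))
```

This confirms the label count quoted above: for P = 8 sampling points there are 58 uniform patterns plus one shared non-uniform label, i.e., 59 output labels.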

5.4. Decision Fusion

Flowchart 1 Decision Fusion Approach

A decision fusion approach is developed to combine the results from supervised and unsupervised classifiers. The final output takes advantage of the class-separation power of a support-vector-machine-based supervised classification and of the capability of an unsupervised classifier, such as K-means clustering, to reduce the impact of trivial spectral variation in homogeneous regions. Three decision-level fusion methods and four input-data schemes are applied to hyperspectral remote sensing image classification. In the first and most common scheme, the original hyperspectral dataset is used by the different classifiers. The second scheme is an improved one in which all classifiers still use an identical input dataset, but the dataset consists of both the original data and texture features derived from it. In the third, all wavebands are divided into different groups based on inter-band correlation analysis; each group of data, including texture features, is fed to a specific classifier, so the inputs to the multiple classifiers differ, but every group should be a representative subset of the original data. In the fourth, the first ten components derived by applying the MNF transformation to the original data and texture features are used as the input of the different classifiers. The minimum noise fraction (MNF) transform is a modification of principal component analysis that normalizes each band of the hyperspectral image by its noise level prior to processing; this reduces the influence of noise in the transformed images, as the noisier bands are de-emphasized. Noise is generally estimated using


"shift-difference" statistics, in which the difference between adjacent pixels is taken as an estimate of the noise.
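The two decision-level fusion rules used in this framework can be sketched as follows. This is a minimal Python illustration with made-up posterior values; the LOGP is implemented as a weighted geometric mean of the classifiers' probability vectors:

```python
import numpy as np

def hard_fusion(labels):
    """Hard fusion: majority vote over the class labels produced by
    several independent classification pipelines."""
    vals, counts = np.unique(labels, return_counts=True)
    return int(vals[np.argmax(counts)])

def logp_fusion(probs, weights=None):
    """Soft fusion via a logarithmic opinion pool (LOGP): a weighted
    geometric mean of the per-classifier posterior probability vectors."""
    probs = np.asarray(probs, dtype=float)   # (n_classifiers, n_classes)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    fused = np.exp(weights @ np.log(probs + 1e-12))
    return fused / fused.sum()               # renormalize to a distribution

# Hypothetical posteriors from three pipelines over three classes.
p = [[0.7, 0.2, 0.1],
     [0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3]]
majority = hard_fusion([0, 0, 1])
fused = logp_fusion(p)
```

The small epsilon added before the logarithm guards against zero probabilities; equal weights reduce LOGP to a plain geometric mean of the pipelines' outputs.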

5.5. Support Vector Machine

Support vector machines (SVMs) have recently been used with success for the classification of hyperspectral images. The method appears to be a robust alternative for pattern recognition with hyperspectral data: since it is based on a geometric point of view, no statistical estimation has to be performed. SVM outperforms classical supervised classification algorithms, such as maximum likelihood, as the number of spectral bands increases or when the number of training samples remains limited. The technique consists of finding the optimal separating surface between classes; the training samples that lie closest to this surface are called support vectors. If the training data set is not linearly separable, a kernel method is used to simulate a non-linear projection of the data into a higher-dimensional space where the classes are linearly separable. Moreover, because no statistical estimation is involved, a small number of training samples (provided they are representative) is enough to find the support vectors. Such a classifier therefore has very interesting properties for hyperspectral image processing: it is not affected by the Hughes phenomenon (for a limited number of training samples, the classification rate decreases as the dimension increases), and it may separate classes even with a small number of training samples spaced very close to each other. This separability remains quite difficult even with techniques dedicated to hyperspectral data such as spectral angle mapping or spectral unmixing. However, such separability measures are based on the dot product or the geometric distance between vectors and do not take spectral meaning and behavior into consideration. The spectral signature of an object keeps the same shape even when it is observed under several illumination conditions, and it should be classified in the same way. Therefore, for processing hyperspectral data cubes, it is proposed to integrate spectral knowledge into SVM classifiers, which improves the results for thematic classification of hyperspectral data cubes. The process has been applied to hyperspectral images from the CASI sensor.

5.6. Extreme Learning Machine

The extreme learning machine (ELM) [4] belongs to the class of single-hidden-layer feed-forward neural networks (SLFNs). Traditionally, such networks are trained with gradient-based methods such as the back-propagation algorithm. ELM instead randomly generates the hidden-node parameters and analytically determines the output weights, rather than tuning them iteratively, which makes learning extremely fast. ELM is computationally efficient and tends to achieve similar or even better generalization performance than SVMs. However, ELM can produce a large variation in classification accuracy for the same number of hidden nodes because of the randomly assigned input weights and biases. In previous works, ELM was employed as a pixel-wise classifier, meaning that only the spectral signature was exploited and the spatial information at neighboring locations was ignored. Yet for HSI it is highly probable that two adjacent pixels belong to the same class, and using both spectral and spatial information has been verified to improve HSI classification accuracy significantly [1]. There are two main categories of methods that utilize spatial information: extracting some type of spatial feature (e.g., texture, morphological profiles, or wavelet features), and directly using the pixels in a small neighborhood for joint classification, assuming that these pixels usually share the same class membership. In the first category (which increases feature dimensionality), Gabor features have been successfully used for hyperspectral image classification [1] owing to their ability to represent useful spatial information.
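The ELM training procedure described above (random hidden weights, analytic least-squares output weights) can be sketched in a few lines of Python with NumPy. This is a minimal illustration on synthetic data, not the paper's implementation:

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    """Minimal ELM: random input weights and biases, a sigmoid hidden
    layer, and output weights solved analytically by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, never tuned
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden activations
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)     # analytic solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                  # linear output layer

# Toy two-class problem with one-hot targets (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 4)), rng.normal(3.0, 1.0, (20, 4))])
Y = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```

Because only `beta` is learned, and in closed form, there is no iterative tuning at all, which is exactly why ELM trains much faster than back-propagation or SVM optimization.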

6. SYSTEM ANALYSIS

6.1. Band Selection

Hyperspectral images consist of a large number of spectral bands, many of which contain redundant information. Band selection, such as LPE [1], reduces the dimensionality by selecting a subset of spectral bands with distinctive and informative features. Linear projections, such as PCA, can also transform the high-dimensional data into a lower-dimensional subspace. In previous studies [1], [5], an investigation of


both LPE and PCA for spatial-feature-based hyperspectral image classification found that the classification performance of LPE was superior to that of PCA; the reason may be that fine spatial structures tend to be present in minor PCs rather than in major PCs. Thus, band selection (i.e., LPE) is employed in this research. LPE [1] is a simple, efficient band selection method based on a band-similarity measurement. Assume there are two initial bands B1 and B2. Every other band B can be approximated as

B' = a0 + a1 B1 + a2 B2,

where a0, a1, a2 are the parameters that minimize the linear prediction error e = ||B − B'||^2. Let the parameter vector be a = (a0, a1, a2)^T. A least-squares solution is employed to obtain

a = (XB1B2^T XB1B2)^(−1) XB1B2^T XB,

where XB1B2 is an N×3 matrix whose first column is all 1s, whose second column is the B1 band, and whose third column is the B2 band; N is the total number of pixels, and XB is the B spectral band. The band that produces the maximum error e is considered the most dissimilar to B1 and B2, and it is selected. Using these three bands, a fourth band can be found with the same strategy, and so on. More implementation details can be found in [5].
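The LPE selection step above can be sketched directly with NumPy's least-squares solver. This is an illustrative sketch on synthetic bands, generalized to any number of already-selected predictor bands:

```python
import numpy as np

def lpe_error(B, predictors):
    """Linear prediction error: fit B ~= a0 + a1*B1 + a2*B2 + ... by least
    squares over the already-selected bands and return e = ||B - B'||^2."""
    X = np.column_stack([np.ones(len(B))] + list(predictors))  # N x (k+1)
    a, *_ = np.linalg.lstsq(X, B, rcond=None)
    return float(np.sum((B - X @ a) ** 2))

def select_next_band(bands, selected):
    """Select the band with maximum prediction error, i.e. the band most
    dissimilar to (least predictable from) the bands chosen so far."""
    predictors = [bands[i] for i in selected]
    errors = [lpe_error(b, predictors) if i not in selected else -np.inf
              for i, b in enumerate(bands)]
    return int(np.argmax(errors))

# Illustrative data: band 2 is an exact linear mix of bands 0 and 1,
# while band 3 is independent noise and should be selected next.
rng = np.random.default_rng(0)
B1, B2 = rng.normal(size=100), rng.normal(size=100)
bands = [B1, B2, 0.5 * B1 - 2.0 * B2 + 3.0, rng.normal(size=100)]
next_band = select_next_band(bands, [0, 1])
```

Repeating `select_next_band` with the growing `selected` list reproduces the greedy procedure described above: each new band is the one that the current selection predicts worst.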

6.2. Feature Extraction

LBP is a grayscale- and rotation-invariant texture operator, and it provides an effective texture feature extraction approach well suited to HSI. First, LPE-based band selection is employed to find a set of distinctive and informative bands. For each band, the LBP code is computed for every pixel in the entire image to form an LBP code image, and then, for each local patch centered at a pixel of interest, the LBP histogram is generated. Second, in the fusion of the extracted local LBP features, global Gabor features, and original spectral features, LOGP plays a vital role in merging the probability outputs of the multiple texture and spectral features.

Figure 5 Pixel orientation

Figure 6 Example of LBP binary thresholding. (a) Center pixel t_c and its eight circular neighbors {t_i}, i = 0, ..., 7, with radius r = 1. (b) 3×3 sample block. (c) Binary labels of the eight neighbors

Each neighbor t_0 to t_7 of the center pixel t_c is assigned a binary label, either 0 or 1, depending on the intensity value of the center pixel t_c. All samples are equispaced on a circle of radius r, where r is the distance between a neighbor and the center pixel. For m neighbors {t_i}, i = 0, ..., m − 1, the LBP code for t_c is given by


LBP_{m,r}(t_c) = Σ_{i=0}^{m−1} s(t_i − t_c) 2^i        (1)

where s(t_i − t_c) = 1 if t_i > t_c, and s(t_i − t_c) = 0 if t_i ≤ t_c.

Figure 6 shows an example of the binary thresholding process for (m, r) = (8, 1). LBP divides the examined window into cells (e.g., 16×16 pixels per cell). Each pixel in a cell is compared with each of its eight neighbors, following the pixels along the circle clockwise or counter-clockwise. Where the center pixel's value is greater than the neighbor's value, write "0"; otherwise write "1". This gives an 8-digit binary number, which is converted to a decimal label (e.g., 01010011 in binary equals 83). Assuming that the coordinate of t_c is (0, 0), each neighbor t_i has coordinates (r sin(2πi/m), r cos(2πi/m)). In practice, the parameter set (m, r) may change, e.g., (4, 1), (8, 2), etc. The values at circular neighbor locations that do not fall exactly on the image grid are estimated by bilinear interpolation [5]. The output of the LBP operator in (1) indicates that the binary labels in a neighborhood, represented as an m-bit binary number (with 2^m distinct values), reflect texture orientation and smoothness in a local region. After the LBP code is obtained, an occurrence histogram, as a nonparametric statistical estimate, is computed over a local patch. A binning procedure is required to guarantee that the histogram features have the same dimension.

Figure 7 Implementation of LBP feature extraction

After band selection, LBP feature extraction or Gabor filtering is applied to each selected band image. Figure 7 illustrates the implementation of LBP feature extraction: the LBP code is first calculated for the entire image to form an LBP image, and the LBP features are then generated for the pixel of interest from its corresponding local LBP image patch. Note that the patch size is a user-defined parameter.

6.3. Gabor Filter

The Gabor filter is a band-pass filter whose response depends on orientation. A circularly symmetric Gabor filter is generally preferred so that all directions are covered by the pass band. The Gabor features consist of the magnitude of the signal power in the corresponding filter pass band of the Gabor-filtered image. The Gabor filter can be represented as

g_{δ,θ,ψ,σ,γ}(a, b) = exp(−(a'^2 + γ^2 b'^2) / (2σ^2)) · exp(i (2π a'/δ + ψ))

where a' = a cos θ + b sin θ and b' = −a sin θ + b cos θ.        (2)

In this equation, δ is the wavelength of the sinusoidal factor, θ the orientation separation angle (π/8, π/4, π/2, etc.), ψ the phase offset, σ the standard deviation of the Gaussian envelope, and γ the spatial aspect ratio. Setting ψ = 0 and ψ = π/2 returns the real and imaginary parts of the Gabor filter, respectively.

σ = (δ/π) · sqrt(ln 2 / 2) · (2^b + 1) / (2^b − 1)   (3)

where b is the half-response spatial-frequency bandwidth (in octaves) of the filter.
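Equation (2) can be rendered directly as code; the following sketch builds a complex Gabor kernel (the parameter values and the 31×31 support are illustrative choices, not values fixed by the paper):

```python
import numpy as np

def gabor_kernel(delta, theta, psi, sigma, gamma, size=31):
    """Complex Gabor kernel following Eq. (2).

    delta: wavelength of the sinusoidal factor; theta: orientation;
    psi: phase offset; sigma: std of the Gaussian envelope;
    gamma: spatial aspect ratio. The kernel size is an illustrative choice.
    """
    half = size // 2
    b, a = np.mgrid[-half:half + 1, -half:half + 1]  # b: rows, a: columns
    a_r = a * np.cos(theta) + b * np.sin(theta)      # a' in Eq. (2)
    b_r = -a * np.sin(theta) + b * np.cos(theta)     # b' in Eq. (2)
    envelope = np.exp(-(a_r**2 + (gamma**2) * b_r**2) / (2 * sigma**2))
    carrier = np.exp(1j * (2 * np.pi * a_r / delta + psi))
    return envelope * carrier

# With psi = 0 the real part of the filter is used; psi = pi/2 gives the imaginary part.
k = gabor_kernel(delta=8.0, theta=np.pi / 4, psi=0.0, sigma=4.0, gamma=0.5)
print(k.shape, k.dtype)   # (31, 31) complex128
```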


6.4. Comparison of Gabor & LBP

From the above description, it can be seen that Gabor is a global operator while LBP is a local one; the Gabor and LBP features therefore represent texture information from different perspectives.

Figure 8 Example (a) Input image (b) LBP-coded image (different intensities representing different codes) (c) – (f)

Filtered images obtained by the Gabor filter with different θ values. (c) Gabor feature image, θ=0. (d) Gabor feature

image, θ=π/4. (e) Gabor feature image, θ=π/2. (f) Gabor feature image, θ=3π/4

Figure 8 illustrates a comparison between LBP and Gabor features on a natural image (the boat image) of size 256×256. Figure 8(b) shows the LBP-coded image obtained using (1) with (m, r) = (8, 1), and Figure 8(c)–(f) illustrates the filtered images obtained with the Gabor filter for different θ (i.e., 0, π/4, π/2, and 3π/4). In Figure 8, the Gabor features, produced by the average magnitude response of each Gabor-filtered image, reflect the global signal power, while the LBP-coded image gives a better expression of detailed local spatial features such as edges, corners, and spots. Hence, the global Gabor filter is applied as a supplement to the local LBP operator, which does not consider distant pixel interactions, to obtain better results. As stated earlier, the Gabor filter captures the global texture information of an image, while LBP represents the local texture information. HSI data usually contain homogeneous regions whose pixels fall into the same class; Gabor features can reflect such global texture information because the Gabor filter effectively captures the orientation and scale of the physical structures in the scene. Hence, combining Gabor and LBP features can achieve better classification performance than using LBP features alone.
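The global Gabor feature used in Figure 8 — the average magnitude response of each filtered image — can be sketched as follows; the FFT-based circular convolution and all parameter values are simplifications for illustration:

```python
import numpy as np

def gabor_magnitude(image, theta, delta=8.0, sigma=4.0, gamma=0.5, size=31):
    """Average magnitude response of a Gabor filter at one orientation."""
    half = size // 2
    b, a = np.mgrid[-half:half + 1, -half:half + 1]
    a_r = a * np.cos(theta) + b * np.sin(theta)
    b_r = -a * np.sin(theta) + b * np.cos(theta)
    kernel = (np.exp(-(a_r**2 + gamma**2 * b_r**2) / (2 * sigma**2))
              * np.exp(1j * 2 * np.pi * a_r / delta))
    # circular convolution via FFT is adequate for a feature sketch
    fimage = np.fft.fft2(image, s=image.shape)
    fkernel = np.fft.fft2(kernel, s=image.shape)
    response = np.fft.ifft2(fimage * fkernel)
    return np.abs(response).mean()

image = np.random.default_rng(2).random((64, 64))
# one scalar feature per orientation, as in Figure 8(c)-(f)
features = [gabor_magnitude(image, t) for t in (0, np.pi/4, np.pi/2, 3*np.pi/4)]
print(len(features))   # 4
```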

6.5. Classifier

ELM [2], [4] is a neural-network classifier with only one hidden layer and one linear output layer. The weights between the input and hidden layers are randomly assigned, and the weights of the output layer are computed with a least-squares method, so the computational cost is much lower than that of other neural-network-based methods. For C classes, let the class labels be defined as y_k ∈ {1, −1} (1 ≤ k ≤ C). A constructed row vector y = [y_1, ..., y_k, ..., y_C] then indicates the class to which a sample belongs; for example, if y_k = 1 and the other elements of y are −1, the sample belongs to the kth class. With the training samples and corresponding labels represented as {x_i, y_i}_{i=1}^n, where x_i ∈ R^d and y_i ∈ R^C, the output function of an ELM with L hidden nodes can be expressed as


f_L(x_i) = Σ_{j=1}^{L} β_j h(w_j · x_i + b_j) = y_i,  i = 1, 2, ..., n   (4)

where h(·) is a nonlinear activation function (e.g., the sigmoid function), β_j ∈ R^C denotes the weight vector connecting the jth hidden node to the output nodes, w_j ∈ R^d represents the weight vector connecting the jth hidden node to the input nodes, and b_j is the bias of the jth hidden node. The term w_j · x_i denotes the inner product of w_j and x_i. If a value of 1 is padded to x_i to make it a (d+1)-dimensional vector, the bias can be treated as an element of the weight vector, which is also randomly assigned. The n equations in (4) can be written compactly as

Hβ = Y   (5)

where Y = [y_1; y_2; ...; y_n] ∈ R^{n×C}, β = [β_1; β_2; ...; β_L] ∈ R^{L×C}, and H is the hidden-layer output matrix of the neural network,

H = [h(x_1); ...; h(x_n)] =
    [ h(w_1 · x_1 + b_1)  ...  h(w_L · x_1 + b_L) ]
    [         ...         ...          ...        ]
    [ h(w_1 · x_n + b_1)  ...  h(w_L · x_n + b_L) ]   (6)

In (6), h(x_i) = [h(w_1 · x_i + b_1), ..., h(w_L · x_i + b_L)] is the output of the hidden nodes in response to the input x_i, which maps the data from the d-dimensional input space to an L-dimensional feature space. In most cases, the number of hidden neurons is much smaller than the number of training samples, i.e., L << n, and the least-squares solution of (5) described in [4] can be used:

β̂ = H†Y   (7)

where H† is the Moore–Penrose generalized inverse of the matrix H, H† = H^T(HH^T)^{−1}. For better stability and generalization, a positive value 1/ρ is normally added to each diagonal element of HH^T. As a result, the output function of the ELM classifier is expressed as

f_L(x_i) = h(x_i)β = h(x_i)H^T (I/ρ + HH^T)^{−1} Y   (8)
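A minimal sketch of Eqs. (4)-(8) — random input weights, a sigmoid hidden layer, and ridge-regularized least-squares output weights — might look like the following (the hidden-layer size L, the value of ρ, and the toy data are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, L=50, rho=100.0):
    """Ridge-regularized ELM per Eqs. (4)-(8).

    X: n x d samples; Y: n x C labels coded in {1, -1}.
    Returns the (untrained, random) input weights/biases and beta.
    """
    n, d = X.shape
    W = rng.standard_normal((L, d))   # random input weights, never trained
    b = rng.standard_normal(L)        # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))   # sigmoid hidden layer, Eq. (6)
    # beta = H^T (I/rho + H H^T)^{-1} Y, Eq. (8)
    beta = H.T @ np.linalg.solve(np.eye(n) / rho + H @ H.T, Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return (H @ beta).argmax(axis=1)  # index of the largest output node

# toy two-class problem with +/-1 label coding
X = np.vstack([rng.normal(-2, 0.5, (20, 3)), rng.normal(2, 0.5, (20, 3))])
Y = np.vstack([np.tile([1, -1], (20, 1)), np.tile([-1, 1], (20, 1))])
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta)
print((pred == np.r_[np.zeros(20, int), np.ones(20, int)]).mean())  # training accuracy
```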

In ELM, the feature mapping h(x_i) is assumed to be known. Recently, kernel-based ELM [4] has been proposed by extending the explicit activation functions in ELM to implicit mapping functions, which exhibit a better generalization capability. If the feature mapping is unknown, the kernel matrix of ELM can be considered as

Ω_ELM = HH^T,  Ω_ELM(i, j) = h(x_i) · h(x_j) = K(x_i, x_j)   (9)

Hence, the output function of KELM is given as

f_L(x) = [K(x, x_1); ...; K(x, x_n)]^T (I/ρ + Ω_ELM)^{−1} Y   (10)

The label of the input data is finally determined by the index of the output node with the largest value. In these experiments, the kernel version of ELM is implemented. Training an ELM requires only one analytical step, whereas a standard SVM must solve a large constrained optimization problem. The experiments will demonstrate that ELM can provide classification accuracy similar to, or even better than, that of SVM.


6.6. Feature-Level Fusion (FF)

Feature-level fusion is employed in the proposed classification framework, as shown in Figure 9.

Figure 9 Feature level fusion

Each feature reflects different properties and has its own special meaning: the Gabor feature provides spatial localization and orientation selectivity, the LBP feature reveals the local image texture (e.g., edges, corners, etc.), and the spectral feature represents the correlation among bands. For different classification tasks, these features have their own advantages and disadvantages, and it is therefore difficult to determine which one is always optimal [1]. Thus, it is straightforward to stack multiple features into a composite one. In this fusion strategy, feature normalization before stacking is a necessary pre-processing step used to adjust the scale of the feature values. A simple treatment is to perform a linear transformation on the data that preserves the relationships among the values; for instance, a min–max technique maps all of the values into the range [0, 1]. Here, the three aforementioned features, i.e., LBP features (local texture), Gabor features (global texture), and selected bands (spectral features), and their combinations, such as LBP + Gabor + spectral features, LBP + spectral features, Gabor + spectral features, etc., will be discussed. Note that feature-level fusion has at least two potential disadvantages: 1) the multiple feature sets to be stacked may be incompatible, which causes the induced feature space to be highly nonlinear, and 2) the induced feature space has a much larger dimensionality, which may deteriorate classification accuracy and processing efficiency.
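The normalize-then-stack strategy can be sketched as follows; the feature dimensions, the function names, and the handling of constant columns are illustrative choices:

```python
import numpy as np

def min_max(F):
    """Map each feature dimension into [0, 1] (the linear transformation
    mentioned above); constant dimensions are left at zero."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (F - lo) / span

def feature_level_fusion(*feature_sets):
    """Normalize each feature set, then stack into one composite vector per sample."""
    return np.hstack([min_max(F) for F in feature_sets])

rng = np.random.default_rng(4)
n = 10
lbp = rng.random((n, 256))        # local texture histograms
gabor = rng.random((n, 4)) * 50   # global texture, on a different scale
spectral = rng.random((n, 20))    # selected bands
stacked = feature_level_fusion(lbp, gabor, spectral)
print(stacked.shape)   # (10, 280)
```

Note how the Gabor features, on a much larger scale than the histograms, are brought into [0, 1] before stacking, so no single feature set dominates the composite vector.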

6.7. Decision-Level Fusion (DF)

Figure 10 Decision level fusion

Different from feature-level fusion, decision-level fusion [3], [5] merges the results from a classifier ensemble built on multiple features, as shown in Figure 10. This mechanism combines the distinct classification results into a final decision, improving on the accuracy of a single classifier that uses one type of feature. The main objective is to utilize the information in each type of feature, compute the probability outputs with ELM, and then combine them with the soft LOGP rule for the final decision. Since the output function of ELM [i.e., (10)] estimates the accuracy of the predicted label and reflects the classifier's confidence, a conditional class probability is derived from the decision function. The probability should be


higher for a larger output of the decision function. Following Platt's empirical analysis, a scaling function of the following form is adopted:

P_q(y_k|x) = 1 / (1 + exp(a_k f_L(x) + b_k))   (11)

where P_q(y_k|x) is the conditional class probability of the qth classifier, f_L(·) is the output decision function of each ELM, and (a_k, b_k) are parameters estimated for the ELM of class k (1 ≤ k ≤ C). The parameters a_k and b_k are found by minimizing the cross-entropy error over the validation data; note that a_k is negative. In the proposed framework, LOGP [3], [5] uses the conditional class probabilities to estimate a global membership function P(y_k|x) as a weighted product of these output probabilities. The final class label y is given by

y = arg max_{k = 1, ..., C} P(y_k|x)   (12)

where the global membership function is

P(y_k|x) = ∏_{q=1}^{Q} P_q(y_k|x)^{α_q}   (13)

log P(y_k|x) = Σ_{q=1}^{Q} α_q log P_q(y_k|x)   (14)

with {α_q}_{q=1}^{Q} being the classifier weights, uniformly distributed over all of the classifiers, and Q being the number of pipelines (classifiers) in Figure 10.
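Equations (11)-(14) amount to Platt-style probability calibration followed by a weighted product of the pipeline probabilities; a sketch (with made-up probability outputs standing in for the three ELM pipelines) is:

```python
import numpy as np

def platt_probability(f, a, b):
    """Eq. (11): sigmoid of the ELM decision value f with per-class
    parameters (a, b); a is negative, so larger f gives higher probability."""
    return 1.0 / (1.0 + np.exp(a * f + b))

def logp_fusion(prob_per_classifier, weights=None):
    """Eqs. (12)-(14): weighted product of the Q classifiers' class
    probabilities; the label is the argmax of the log global membership."""
    P = np.asarray(prob_per_classifier)          # shape (Q, C)
    Q = P.shape[0]
    w = np.full(Q, 1.0 / Q) if weights is None else np.asarray(weights)
    log_global = (w[:, None] * np.log(P + 1e-12)).sum(axis=0)   # Eq. (14)
    return int(log_global.argmax())              # Eq. (12)

print(round(platt_probability(f=2.0, a=-1.0, b=0.0), 3))   # 0.881

# three hypothetical pipelines (LBP, Gabor, spectral), four classes
probs = [[0.6, 0.2, 0.1, 0.1],
         [0.5, 0.3, 0.1, 0.1],
         [0.2, 0.5, 0.2, 0.1]]
print(logp_fusion(probs))   # class 0 wins the weighted product
```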

6.8. Comparison between FF & DF

The DF-based method is superior to the FF-based method because feature-level fusion cannot exploit the discriminative power of each individual feature image. In addition, the feature sets stacked in FF may be incompatible, and the resulting composite feature space has a large dimensionality.

7. CONCLUSION

In this paper, a framework based on LBP was proposed to extract local image features for the classification of HSI. Specifically, LBP was applied to a subset of the original bands selected by the LPE method. Two fusion levels (i.e., feature level and decision level) were defined on the extracted LBP features along with the Gabor features and the selected spectral bands. A soft decision-fusion process of ELM outputs using LOGP was proposed to merge the probability outputs of multiple texture and spectral features. The experimental results show that local LBP representations are effective for HSI spatial feature extraction, because they encode the image texture configuration while providing local structure patterns. Moreover, the decision-level fusion of kernel ELM provides effective classification and is superior to SVM-based methods. In feature-level fusion, the different features (i.e., Gabor features, LBP features, and spectral features) are simply concatenated in the feature space.

REFERENCES

[1] C. Chen, W. Li, H. Su, and K. Liu, "Spectral–spatial classification of hyperspectral image based on kernel extreme learning machine," Remote Sens., vol. 6, no. 6, pp. 5795–5814, Jun. 2014.

[2] R. Moreno, F. Corona, A. Lendasse, M. Grana, and L. S. Galvao, "Extreme learning machines for soybean classification in remote sensing hyperspectral images," Neurocomputing, vol. 128, no. 27, pp. 207–216, Mar. 2014.

[3] W. Li, S. Prasad, and J. E. Fowler, "Decision fusion in kernel-induced spaces for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 6, pp. 3399–3411, Jun. 2014.

[4] Y. Bazi et al., "Differential evolution extreme learning machine for the classification of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 11, no. 6, pp. 1066–1070, Jun. 2014.

[5] Z. Guo, L. Zhang, and D. Zhang, "Rotation invariant texture classification using LBP variance (LBPV) with global matching," Pattern Recogn., vol. 43, no. 3, pp. 706–719, Mar. 2010.


[6] X. Kang, S. Li, and J. A. Benediktsson, "Spectral–spatial hyperspectral image classification with edge-preserving filtering," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2666–2677, May 2014.

[7] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, "Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.

[8] C. Chen and J. E. Fowler, "Single image super-resolution using multihypothesis prediction," in Proc. 46th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2012, pp. 608–612.

[9] C. Chen et al., "Multihypothesis prediction for noise-robust hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1047–1059, Apr. 2014.

[10] X. Huang and L. Zhang, "An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 257–272, Jan. 2013.

[11] Sorna Percy G. and T. Arumuga Maria Devi, "An Efficiently Identify the Diabetic Foot Ulcer Based on Foot Anthropometry Using Hyper Spectral Imaging," International Journal of Information Technology & Management Information System, 7(2), 2016, pp. 36–44.

[12] Preethi N. Patil and G. G. Rajput, "Detection and Classification of Non Proliferative Diabetic Retinopathy Stages Using Morphological Operations and SVM Classifier," International Journal of Computer Engineering & Technology, 4(6), 2013, pp. 1–8.