Centre for Geo-Information
Thesis Report GIRS-2003-08
A Comparison Assessment Between Supervised and Unsupervised Neural
Network Image Classifiers
Author: Senait Dereje Senay
Supervisor: Dr. Monica Wachowicz
WAGENINGEN UR
January 2003
Center for Geo-Information
Thesis Report GIRS-2003-08
A Comparison Assessment Between Supervised and Unsupervised Neural
Network Image Classifiers
Senait Dereje Senay
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in
Geo-information Science at Wageningen University and Research Center
Supervisor: Dr. Monica Wachowicz
Examiners: Dr. Monica Wachowicz, Drs. A.J.W. de Wit, Dr. Ir. Ron van Lammeren
January 2003
Wageningen University
Center for Geo-Information and Remote Sensing Department of Environmental Sciences
To my parents: Lt.Col. Yeshihareg Chernet and Ato Dereje Senay,
And my brother: Daniel Dereje
Thank you for everything you have been to me.
Acknowledgements
I am indebted to my supervisor, Dr. Monica Wachowicz, who gave me continuous professional support during all stages of the thesis; I would like to sincerely thank her for the invaluable advice and support she gave me. I am very grateful to Dr. Gerrit Epema and Dr. Ir. Ron van Lammeren, who helped me in facilitating the field trip to the study area for ground control point collection, and of course for the continuous moral support I received from Dr. Gerrit Epema. I would sincerely like to thank Dr. Gete Zeleke and Mr. Meneberu Allebachew, who helped me by arranging a vehicle and other necessary data and support while I was in the study area for field data collection; without their help the field trip would not have been successful at all. I would also like to express my gratitude to Mr. Wubshet Haile and Mr. Getachew, who assisted me throughout the field work, enabling me to finish it within the very limited time I had. I would like to extend my heartfelt thanks to Mr. John Stuiver and Drs. Harm Bartholomeus, who supported me whenever I needed professional help in pre-processing the data; without their support the data preprocessing stage of my thesis would definitely have taken more time. I would also like to thank the cartography section of Alterra, who helped me in printing and scanning the maps used in producing the report as well as in the analysis. I would like to extend my heartfelt thanks to my friends Achilleas Psomas (Ευχαριστώ) and Krzysztof Kozłowski (Dziękuję) for all the friendly moral support and invaluable friendship; thanks for making my stressful days easier. I gratefully thank Dawit Girma for all the help I got whenever I needed it. I would also like to show my gratitude to my uncle, Mr. Tesfasilassie Senay, for providing me with a family atmosphere while I was on the field trip.
I would not pass without expressing my gratitude and sincere thanks to my friends Giuseppe Amatulli (Grazie), Mauricio Labrador-Garcia and Sonia Barranco-Borja (Gracias), Nicolas Dosselaere (Dank u wel), Izabela Witkowska (Dziękuję), Fanus Woldetsion (Yekenyeley), and Adrian Ducos (Merci) for creating a pleasant working atmosphere, and much more, which helped us during the difficult, and unforgettable, times of working on the thesis. Αντε, I wish you all the best in the future. Last but not least, I would like to extend my admiration to the whole GRS 2001 batch for the respect and friendship between us; I wish you all the best, nothing but the best. It has been an honor and a pleasure to know you. Finally, I would like to extend my heartfelt thanks to NUFFIC for covering my study costs and offering me this experience.
Abstract
Neural networks are a recently emerged technology that developed as part of artificial
intelligence, and they are used to solve complex problems in various disciplines. The
application of neural networks in remote sensing, particularly in image classification, has
become very popular in the last decade. The motivation to use neural networks arose from
the limitations of conventional parametric image classifiers, as the source, data structure,
scale, and amount of remotely sensed data became highly varied. Neural networks were
found to compensate for the drawbacks these conventional classifiers have in image
classification. Neural networks offer two kinds of image classification, supervised and
unsupervised. In this study, both kinds of neural networks were tested to evaluate which
yields a more accurate image classification and which method handles poor-quality data
better. Finally, a land cover map of the southern part of the Lake Tana area, situated in
the north-western part of Ethiopia, is produced with the best classifier.
Key words: Neural networks, neuron, ANN, KWTA, LVQ, BP, image classification
Abbreviations
ANN     Artificial Neural Networks
ASTER   Advanced Spaceborne Thermal Emission and Reflection Radiometer
BP      Back Propagation
KWTA    Kohonen's Winner Take All
LVQ     Learning Vector Quantization
MIR     Middle Infrared
MLNFF   Multi-Layer Normal Feed Forward
MLP     Multi-Layer Perceptron
NN      Neural Networks
NDVI    Normalized Difference Vegetation Index
SOFM    Self-Organizing Feature Maps
SWIR    Short Wave Infrared
TIR     Thermal Infrared
VNIR    Visible and Near Infrared
WTA     Winner-Take-All
Table of Contents
Acknowledgements
Abstract
Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Background
1.2 Study area
1.3 Objectives
1.4 Research questions
1.5 Research outline
2 Artificial Neural Networks (ANN)
2.1 Overview of the main concepts
2.1.1 Biological concepts
2.1.2 Historical development
2.1.3 Basic neural network processor
2.1.4 Neural networks and image classification
2.2 Types of neural networks
2.2.1 Supervised neural network classifiers
2.2.1.1 Description of supervised neural network classifiers
2.2.1.2 Architecture and algorithm
2.2.2 Unsupervised neural network classifiers
2.2.2.1 Description of unsupervised neural network classifiers
2.2.2.2 Architecture and algorithm
3 Methodology
3.1 Field data acquisition
3.2 Data preprocessing
3.2.1 Datasets
3.2.1.1 ASTER
3.2.1.2 Landsat TM
3.2.2 Datasets preparation
3.2.3 Training and test sets preparation
3.3 Supervised neural network classification
3.4 Unsupervised neural network classification
3.5 Accuracy assessment and validation
3.6 Sensitivity analysis
3.7 Implementation aspects
4 Results and Discussion
4.1 Accuracy of back propagation classifier trained with ASTER or Landsat TM datasets
4.2 Accuracy of back propagation classifier trained with ASTER and Landsat TM input dataset
4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat datasets
4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets
4.5 Validation of the results obtained from the back propagation supervised neural network classifier
4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier
4.7 Improving the training data quality
4.8 Sensitivity analysis
5 Conclusions
6 Recommendations
References
Appendices
Appendix 1: Dataset projection information
Appendix 2: Results of input sensitivity analysis
Appendix 3: Neural network parameters used
List of Figures
Figure 1: Map of Ethiopia
Figure 2: Overview of the study area
Figure 3: Signal path of a single human neuron
Figure 4: The basic neural networks processor; the neuron, and its functions
Figure 5: Design of the Multi-Layer Normal Feed Forward (MLNFF) architecture
Figure 6: A Kohonen self-organizing grid with a 2-dimensional output layer
Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer
Figure 8: Design of the Learning Vector Quantization architecture
Figure 9: Overview of the main procedures involved in the methodological process
Figure 10: ASTER bands superimposed on a model atmosphere
Figure 11: Landsat TM bands superimposed on a model atmosphere
Figure 12: Study area after Lake Tana is masked out of the image
Figure 13: Spectral signatures of the six classes (before ASTER image rescaling)
Figure 14: Spectral signatures of the six classes (after ASTER image rescaling)
Figure 15: Training/test data preparation procedure
Figure 16: Design of the Back Propagation neural network
Figure 17: The design of the KWTA/LVQ network
List of Tables
Table 1: Spectral range of bands and spatial resolution for the ASTER sensor
Table 2: Spectral range of bands and spatial resolution for the TM sensor
Table 3: Training and test sets for the ASTER dataset
Table 4: Training and test sets for the Landsat TM dataset
Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets
Table 6: Training data set-up for the Back Propagation neural network
Table 7: Training data set-up for the Kohonen Winner Take All/LVQ network
Table 8: Accuracy of the back propagation classifier using ASTER data
Table 9: Accuracy of the back propagation classifier trained with Landsat TM data
Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets
Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data
Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data
Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data
Table 14: Confusion matrix for the back propagation network classification using ASTER and Landsat TM images
Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data sources
Table 16: Confusion matrix for the unsupervised network
Table 17: Percentage accuracy of the six classes of the unsupervised classified image
Table 18: Classification result for ASTER-Landsat TM data into 5 classes
Table 19: Confusion matrix for the supervised classification with five output classes
Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)
Table 21: Result of the sensitivity analysis of the ASTER dataset
Table 22: Accuracy of supervised and unsupervised neural network classifiers
1 Introduction
1.1 Background
Artificial Neural Networks (ANNs) are systems that make use of some of the known or
expected organizing principles of the human brain. They consist of a number of
independent, simple processors, the neurons, which communicate with each other through
weighted connections (REF1, 2002)1. The study of neural networks is also referred to as
Artificial Neural Networks or connectionism (Roy, 2000). The use of artificial neural
networks for various applications is becoming common nowadays. The benefits of this
developing technology range from reduced subjectivity in our analyses to full automation
of processes, so that less manual intervention is needed.
Neural networks are a fairly new technology, though their basis dates back to the 1940s,
when Warren McCulloch, a neurophysiologist, and Walter Pitts, a young mathematician,
wrote a paper on how neurons might work, explaining their model of a simple neural
network built with electrical circuits (Anderson et al., 1992). Ever since, the science has
faced many obstacles before becoming as widely applied as it is today. The idea of
imitating the structure of the human brain, i.e. its neurons, in order to invent thinking
machines was raised as a moral issue in the 1970s. Much criticism was directed at the
development of this science, concerning how the development of neural networks would
affect human beings; people were concerned about what the world would look like with
machines doing everything man can do. These movements ended up reducing much of the
funding assigned to the development of the science, thereby slowing the pace of
development of neural networks (Anderson et al., 1992).
However, this did not last long, and interest was renewed when scientists showed that the
aim of neural networks is not simply to model brains, but to use them in a way that makes
our way of life easier, in terms of computation, analysis in different applications, and
less manual involvement in complex processes. This gave a promising lead to the neural
networks of today.

1 (Ref #) refers to references taken from the Internet; the path to the sites is listed in the reference section.
Many fields apart from artificial intelligence and computer science attempt to make use
of neural networks, including data warehousing, data mining, robotics, wave physics,
remote sensing, and GIS (Roy, 2000). Although remote sensing is not one of the primary
fields using ANNs for analysis, neural networks have recently been applied in several
areas of GIS and remote sensing. Some of the most common applications include data
fusion, land suitability evaluation, spatio-temporal analysis, and land cover
classification of satellite images. However, neural networks have not fully replaced the
conventional ways of analysis in these applications; they are still being evaluated,
since the technology has not been exhaustively tested on the whole range of remote
sensing data.
Neural networks are of special interest to today's remote sensing, where the problem is
no longer the absence or insufficiency of data, but the accumulation of multi-scale,
multi-source, and multi-temporal data. It is highly important to incorporate the
information found in these data, originating from different media and scales, in order to
achieve a better, higher-accuracy classification. There are some limitations in using
conventional parametric (statistical) classifiers like maximum likelihood, such as the
need for a normal (Gaussian) distribution in the data, the lack of flexibility in the
classification process, the inability to deal with multi-scale data without first
standardizing the data to the same scale, and the inefficiency of the image
classification process in terms of time. These limitations motivated scientists to look
for alternatives in which these drawbacks could be overcome. Neural networks were found
to be one of the soundest choices, since they are very appropriate for image
classification due to their processing speed, ease of dealing with high-dimensional
spaces, robustness, and ability to deal with a varied array of data regardless of
variation in statistical distribution, scale, and type (Vassilas et al., 2000).
Like parametric classifiers, neural classifiers offer two kinds of classification,
supervised and unsupervised; both have their own advantages and disadvantages with
regard to image classification. The advantage of one over the other depends on the type
of data available, on time, and on expertise.

In this study, ANNs are used in the processing and classification of multispectral
remote sensing data. The study aims at investigating the difference in accuracy between
supervised and unsupervised neural network classifiers, and at evaluating the
significance of the difference between the two classifiers.
1.2 Study area
The study area is located in the north-western part of Ethiopia. In an administrative
sense, the area lies in the Amhara Regional State, between the Gojam and Gonder
provinces. The area is of high importance in terms of irrigation, hydroelectric power,
and tourism. Its importance became indispensable especially after 1992, when Ethiopia
became landlocked; since then, all fish resources have come only from inland water
bodies. Wudneh (1998) stated that Lake Tana is the least exploited fish resource in the
country; he also explained that the reasons are the bad road connection with the capital
city, Addis Ababa, and the absence of the highly marketable fish species, Nile perch, in
this lake.
The study area has some patches of forest; although not very big, these forest areas
fulfill the fuel-wood demand of Bahir Dar, the second largest city of Ethiopia. Wild
coffee production is also an essential economic activity in the forested areas, where
considerable human encroachment has recently been noted. With the expanding fishing
industry, the high population growth rate, and the deforestation of the meager forest
resource remaining in the area, degradation of this resourceful area can easily be
forecast unless management intervention is employed. In order to manage the area in a
sustainable manner, basic geographic information about the area is very important; this
study will provide a basic land cover map for the area.
Figure 1: Map of Ethiopia
Map copyright by Hammond World Atlas Corp. #12576
Figure 2. Overview of the study area
The study area
Lake Tana
1.3 Objectives
The main goal of this study is three-fold:
- to investigate the advantages and disadvantages of supervised and unsupervised
neural network classifiers in the field of remote sensing, in particular for land cover
classification of multispectral and multi-scale satellite images;
- to evaluate the difference in accuracy between supervised and unsupervised neural
network image classification of multispectral and multi-scale satellite images;
- to produce a land cover map of the study area, located in the Amhara Regional
State, Ethiopia.
1.4 Research questions
Is there a significant difference in the accuracy of supervised neural network
classification and unsupervised neural network classification?

Which type of neural network classification, supervised or unsupervised, handles
poor-quality data better?
1.5 Research outline
Chapter 1: gives an introduction to the main theme of the thesis; it also describes the
study area, the objectives, and the research questions.
Chapter 2: describes the basic concepts of neural networks, including the biological
background, the historical development, and the basic neural network processor.
Chapter 3: covers the methodological aspects of the study; detailed procedures for the
neural network classification process are given.
Chapter 4: reports the results obtained from the data analysis and processing; it also
includes a discussion of the results.
Chapter 5: contains the conclusions drawn from the results obtained.
Chapter 6: gives recommendations on how the results from this study can be improved
and/or applied.
2 Artificial Neural Networks (ANN)
2.1 Overview of the main concepts
2.1.1 Biological concepts
The major source of inspiration for the creation of artificial neural networks is the
human brain, arguably the most powerful computing engine in the known universe. From
both a computational and an energy perspective, the brain has an enormously efficient
structure (Alavi, 2002). The most basic element of the human brain is a specialized cell
called the neuron. The brain consists of some 100 billion neurons that are massively
interconnected by 'synapses' (estimated at about 60 trillion) and that operate in
parallel.

In order to understand the basic operation of the brain, it is necessary to know the
neuron in detail. This was originally undertaken by neurobiologists, but has lately
become an interest of physicists, mathematicians, and engineers. As background to this
study, it is enough to note that each neuron consists of a nucleus surrounded by a cell
body (soma), from which extends a single long fiber (axon) that branches eventually into
a tree-like network of nerve endings, which connect to other neurons through further
synapses. This is illustrated in Figure 3.
Figure 3: Signal path of a single human neuron
Information is transmitted from one neuron to another by a complex chemical process,
based on sodium-potassium flow dynamics, whose net effect is to activate an electrical
impulse (action potential) that is transmitted down the axon to other cells. When this
happens, the neuron is said to have fired. Firing only occurs when the combined voltage
impulses from the preceding neurons add up to a certain 'threshold' value. After firing,
the cell needs to 'rest' for a short time (the refractory period) before it can fire
again (Alavi, 2002).

The brain is understood to use massively parallel computation, in which each computing
element in the system (the neuron) performs a very simple computation (Roy, 2000).
Artificial Neural Networks are based on this same understanding: each node (the analogue
of the neuron in our brain) performs a simple computation, but builds up complex
parallel computations together with the other neurons.
2.1.2 Historical development
The story of neural networks can be traced back to a scientific paper by McCulloch and
Pitts, published in 1943, that described a formal calculus of networks of simple
computing elements (Anderson et al., 1992). Many of the basic ideas they developed
survive to this day. The next big development in neural networks was the publication in
1949 of the book The Organization of Behavior by Donald Hebb (Alavi, 2002). Hebb argued
that if two connected neurons are simultaneously active, then the connection between
them should strengthen proportionally, which means that the more frequently a particular
neural connection is activated, the greater the weight between the neurons becomes. This
has implications for machine learning, since tasks that have been better learnt have a
much higher frequency (or probability) of being accessed. It gave a clear definition to
the learning process by indicating that learning occurs through the readjustment of the
weight connections between neurons.
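Hebb's rule can be illustrated with a short sketch. The update function, learning rate, and input pattern below are illustrative assumptions for this report, not part of the networks used in this study.

```python
# Illustrative sketch of Hebb's rule: the weight between two neurons
# grows in proportion to how often they are active together.
# The learning rate (eta) and the input pattern are arbitrary examples.

def hebbian_update(w, x, y, eta=0.1):
    """Strengthen each weight by eta * (pre-synaptic activity * post-synaptic activity)."""
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

weights = [0.0, 0.0, 0.0]
pattern = [1, 0, 1]          # pre-synaptic activities
post = 1                     # post-synaptic activity (the neuron fired)

for _ in range(3):           # the same pattern is presented three times
    weights = hebbian_update(weights, pattern, post)

print(weights)               # only the connections that were active together have grown
```

Each presentation of the pattern strengthens only the connections whose inputs were active together with the firing neuron, which is exactly the proportional strengthening Hebb described.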
In the late 1950s, Rosenblatt (Alavi, 2002) developed a class of neural networks called
the perceptron. He furthermore introduced the idea of 'layered' networks; a layer is
simply a one-dimensional array of artificial neurons. Most current problems to which
ANNs are applied use multi-layer networks with different kinds of interconnections
between the layers. The original perceptron, however, was simply a one-layer
architecture (Chen, 2000). As a result, this architecture could not deal with most of
the complex problems in the various fields of study that now use neural networks.
Furthermore, Rosenblatt developed a mathematical proof, the Perceptron Convergence
Theorem, which showed that the algorithms for learning (or weight adjustment) would not
lead to ever-increasing weight values under iteration (Alavi, 2002). However, this was
followed by a demonstration in 1969, by Minsky and Papert, of a class of problems that
the perceptron could not solve. This work led to a considerable downsizing of interest
in neural networks, which was to continue until the early 1980s (Alavi, 2002).
In 1982, John Hopfield, a Nobel Prize-winning Caltech physicist, developed the idea of
'recurrent' networks, i.e. networks that have self-feedback connections. The Hopfield
net, as it has come to be known, is capable of storing information in dynamically stable
networks and of solving constrained optimization problems. Shortly afterwards, the back
propagation algorithm showed that it was possible to train a multi-layer neural
architecture using a simple iterative procedure. These two events have proved to be the
ones most responsible for the revival of interest in neural networks in the 1980s, up to
the explosive growth of the field that is today shared between physicists, engineers,
computer scientists, mathematicians, and even psychologists and neurobiologists (Alavi,
2002; Anderson et al., 1992; Roy, 2000).
All in all, the science of neural networks faced a lot of ups and downs before evolving
to its present state; the fact that it involves modeling the human brain raised a lot of
moral issues that slowed the pace of its development. The whole historical development
of neural networks is given in the report "Artificial Neural Networks Technology"1.
2.1.3 Basic neural network processor
An artificial neuron is a simple computing element that sums the input from other
neurons. A network of neurons is interconnected by adaptive paths called 'weights'; each
neuron computes a linear sum of the weighted inputs acting upon it and gives an output
depending on whether this sum exceeds a preset threshold value or not (see Figure 4). A
positive value of a weight increases the chance of a 1 and is considered excitatory; a
negative value increases the chance of a zero and is considered inhibitory (real
biological neurons have this property too, but with analogue output values rather than
binary ones) (Alavi, 2002).
Figure 4: The basic neural networks processor; the neuron, and its functions.
The basic functions of each neuron in the network are to evaluate all the input vectors
directed towards the neuron, to calculate the sum of all the inputs, then to compare
that sum to the threshold1 value at the neuron (node), and lastly to determine the
output through the non-linear function provided at the neuron (Chen, 2000). The output
can be an input for the next node in the next layer, or it can simply be the final
output, depending on the architecture and learning rule of the network the neuron
belongs to. These four distinctive functions must be carried out at the neuron level for
the network to learn properly according to the specified function.

1 The report can be found at the site: http://www.dacs.dtic.mil/techs/neural/neural_ToC.html or as the PDF version: ftp://192.73.45.130/pub/dacs_reports/pdf/neural_nets.pdf
Even though the mechanism seems simple at the level of a single neuron, the way all the
neurons interact through weight adjustment in the process of learning makes the whole
set-up complex, which enables networks to solve real, complicated problems in an
organized way. The mathematical representation is given as follows:
( )zy iif= ....................................................................... (1)
θ iijiji xwz −= Σ * ………………………………….. (2)
Where:

i represents a single neuron processing a particular learning task

zi is assumed to be a real-valued input

yi is either a binary or a real-valued output of the ith neuron

f is a non-linear function, also called a node function

wij represents the connection weight, expressing the strength of the xij input

xij are a series of inputs to the ith neuron

θi is the threshold value of the ith neuron (Roy, 2000)
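Equations (1) and (2) can be sketched in a few lines of code. This is a minimal illustration only; the binary step node function below is one possible choice of f, and the weights and threshold are made-up values.

```python
# A minimal sketch of the basic neuron of equations (1)-(2): the output is
# f(z_i), where z_i is the weighted input sum minus the threshold theta_i.
# The binary step function used as f here is an illustrative choice.

def neuron_output(inputs, weights, theta):
    """Compute y_i = f(z_i) with z_i = sum_j(w_ij * x_ij) - theta_i."""
    z = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1 if z > 0 else 0  # fires (1) only when the sum exceeds the threshold

# Example: one excitatory (positive) and one inhibitory (negative) weight
print(neuron_output([1.0, 1.0], [0.8, -0.2], 0.5))  # sum 0.6 > 0.5: fires
print(neuron_output([1.0, 1.0], [0.2, -0.2], 0.5))  # sum 0.0 < 0.5: silent
```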
The most important issues in neural networks are training and network design.
Training involves determining the connection weights (wij) and the threshold values (θi)
from a set of training samples. Network design is concerned with determining the layers,
the number of neurons in each layer, the connectivity pattern between layers and neurons,
and the mode of operation, e.g. feedback vs. feed forward (Roy, 2000).
2.1.4 Neural networks and image classification
Applying ANN technology in remote sensing is a recent development. Among
other reasons, the generation of extremely varied remote sensing data is the main reason
for considering ANNs for image classification.
ANNs offer several advantages for image classification. First, neural
networks are characterized by much faster classification times compared to
conventional statistical image classifiers; although the time it takes to train a network
depends on the size of the training data presented to it, classifying the whole image with
an already trained network is generally much faster than with any other available image
classifier. Another very important feature of neural networks, which is very helpful in
image classification, is the ability to incorporate ancillary and GIS information along with
the spectral information in the classification process. This enables us to use all the
information we have about the area beyond the spectral information from the image,
which increases the accuracy of the classification result. A further major benefit of neural
networks is the possibility of using data of different scales together in a single
classification process. For instance, the thermal band of Landsat TM is not usually used
in maximum likelihood image classification because its ground resolution differs from
that of the visible and infrared bands. This is not a problem for neural networks, since
they have the ability to analyze multi-scale data: neural networks learn the patterns and
relations within the input vector, so the resolution or data structure of each individual
input does not affect the classification process. Similarly, multi-date or temporally
different data can easily be analyzed in a single classification process; the network
maximizes the amount of information used in the classification by learning patterns from
the different-date images. Last but not least, neural networks have the ability to deal with
data regardless of the statistical distribution of the dataset. This distribution-free nature of
neural networks allows us to deal with remote sensing data that do not have a normal
(Gaussian) distribution. Berberoglo et al (1999), Fauzi et al (2001), Kumar et al (2001)
and Luo et al (2000) provide a background to ANNs in a remote sensing context. These
powerful analytical properties of neural networks grant faster, more accurate and more
reliable land cover classifications from remote sensing data compared to the results
obtained from other image classifiers (Kumar et al, 2001).
There are many kinds of neural networks, ranging from the simple perceptron to more
developed multi-layer networks; many of them are used in image classification. Some of
the commonly used neural networks are: Adaptive Resonance Theory (ART), Multi-layer
Perceptron (MLP), Reduced Coulomb Energy (RCE), Radial Basis Function (RBF),
Delta Bar Delta (DBD), Extended Delta Bar Delta (EDBD), Directed Random Search
(DRS), Higher Order Neural Networks (HNN), Self Organizing Map (SOM), Learning
Vector Quantization, Counter-propagation, Probabilistic neural network, Hopfield,
Boltzmann Machine, Hamming network, Bi-directional associative memory,
Spatio-temporal pattern recognition, and many others (Roy, 2000; Anderson et al, 1992).
The uses and advantages of these networks depend on how and for what purpose we want
to use them. The most common application areas are prediction, data association, data
conceptualization, data filtering and classification.
2.2 Types of neural networks
Neural networks are commonly categorized in terms of their training algorithms.
Basically there are three types of neural nets: fixed-weight neural networks,
unsupervised neural networks and supervised neural networks (REF5, 2002). Fixed-weight
NNs are not very common, since no learning is involved. Supervised and
unsupervised NNs both involve training, but with completely different approaches.
Supervised training involves a mechanism of providing the network with the desired
output, either by "manually" grading the network's performance or by providing the
desired outputs together with the inputs (Anderson et al, 1992). Unsupervised training
deals with the NN training itself: the network must recognize the patterns within the
inputs and decide on an outcome without any outside help.
2.2.1 Supervised neural network classifiers
2.2.1.1 Description of supervised neural network classifiers
Supervised learning networks are the mainstream of neural network development;
most applications implement supervised neural networks. In
supervised training, the training data consist of many pairs of input/output training
patterns, i.e., both inputs and outputs are provided. The network analyses the inputs and
compares its result with the desired output given with the input; it can then
calculate the error by comparison. The error is propagated back through the
network and better training takes place. The larger the number of iterations, the smaller
the error of the network becomes. This is known as convergence: the output of
the network converges to resemble the desired output provided.

The network attempts to pick up the pattern by comparing the inputs and desired
outputs, and approximates its outputs to the learned pattern. However, this might not
always happen; there are factors that prevent a network from learning properly, the
most important being insufficient training data, or data that do not
contain the kind of information or pattern needed to solve the problem involved.

A test dataset, which has not been used in the training process, should always be set aside
in order to ascertain the accuracy of the network's learning. Apart from the quality and
quantity of the data, the design or architecture of the network and the learning rule used for
training affect the rate and extent of network learning. Finally, the type and size of the data
to be processed by the neural network is an important factor to consider
when choosing the architecture and learning rule.
2.2.1.2 Architecture and algorithm
The Multi-layer Normal Feed Forward (MLNFF) network, a kind of Multi-layer
Perceptron (MLP), is the most commonly used architecture in supervised neural networks,
though many other architectures can be used. The MLNFF architecture is
reviewed in this section because it is the architecture used in this study.

The MLNFF should have at least three layers: the input layer, the hidden layer and the
output layer. It is very common to have one hidden layer, but the architecture can have
more than one, according to the data size and type and the kind of application it is going
to be used for. There is no upper limit on the number of hidden layers; however, care
should be taken when structuring the network, since too many layers induce
over-learning or memorization of the training data, which makes the network useless
on new data. The basic network design of the MLNFF architecture is shown
below. The more complex the data and the relation between input and output classes, the
greater the number of layers needed to solve the problem.
Figure 5: Design of the Multi-layer Normal Feed Forward (MLNFF) architecture
Source: Carol E. Brown and Daniel E. O'Leary, 1995
Several algorithms can be used with this architecture to perform image classification.
One of the most used is the back propagation algorithm (Kulkarni et al, 1999).
The back propagation learning rule is the most popular, effective and easy-to-learn rule for
complex multi-layered networks; in fact this network is used more than all the other
networks combined (Anderson et al, 1992). Its greatest advantage is the ability to provide
non-linear solutions to problems. Back propagation has been developed by
many researchers over time, hence the representation of the algorithm differs slightly
between sources.
The learning procedure is given as follows.

For a network which has 3 layers, composed of the input, hidden and output layers, let:

L1 represent the input layer

L2 represent the hidden layer

L3 represent the output layer

The number of neurons (processing elements, units) in the input layer equals the
number of input data used. The number of neurons in the output layer equals the
number of land cover classes into which the image is to be classified. There is no clearly
set rule for the number of processing elements to be assigned to the hidden layer; this
study uses Kolmogorov's theorem, which states that the number of neurons in the hidden
layer should be 2N + 1, where N is the number of nodes in the input layer (Rangsaneri et
al, 1998).

The number of neurons assigned to the hidden layer directly affects the performance of
the network. Too many neurons might lead the network to memorize the
training set instead of learning. If memorization takes place, the network will not
generalize the pattern it learned but will only recognize patterns from the training
set, which makes it useless, since it will not be able to classify the whole image
(Anderson et al, 1992).
The net input and output for neurons in layers L2 and L3 are given by

neti = Σj wij outj ……………………………………….. (1)

where neti is the net input of unit i, outj is the output of unit j in the preceding layer, and
wij represents the weight between units i and j.

outi = 1 / [1 + exp(−(neti + ∅))] …………………………. (2)

where outi is the output of neuron i and ∅ is a constant. The network works in two
phases: the training phase and the decision-making phase. During the training phase, the
weights between layers L1–L2 and L2–L3 are adjusted so as to minimize the error
between the desired and the actual output.
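Equations (1) and (2) can be illustrated numerically. This is only a sketch; the weights, previous-layer outputs and the constant ∅ below are made-up values.

```python
# A numeric sketch of equations (1)-(2): the net input is the weighted sum of
# the previous layer's outputs, passed through a sigmoid node function.

import math

def net_input(weights, prev_outputs):
    """Eq (1): net_i = sum_j(w_ij * out_j)."""
    return sum(w * o for w, o in zip(weights, prev_outputs))

def node_output(net, const=0.0):
    """Eq (2): out_i = 1 / (1 + exp(-(net_i + const)))."""
    return 1.0 / (1.0 + math.exp(-(net + const)))

net = net_input([0.4, -0.1, 0.7], [1.0, 0.5, 0.25])  # 0.4 - 0.05 + 0.175
print(round(node_output(net), 3))  # sigmoid of 0.525
```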
The back propagation learning algorithm, adapted from Kulkarni (1999), is described
below.

1. Present a continuous-valued input vector X = (x1, x2, ….., xn)ᵗ to layer L1 and obtain
the output vector Y = (y1, y2, ….., ym)ᵗ at layer L3. In order to obtain the output vector Y,
calculation is done layer by layer from L1 to L3.

2. Calculate the change in weight. In order to do this, the output vector Y is compared
with the desired output vector or target vector d, and the error is then propagated
backward to obtain the change in weight ∆wij that is used to update the weights. ∆wij for
weights between layers L2 and L3 is given by:

∆wij = −∂E/∂wij ……………………………………….. (3)

This can be reduced to

∆wij = α δi oj …………………………………………… (4)

where α is a training rate coefficient (typically 0.01 to 1.0), oj is the output of neuron j in
layer L2, and δi is given by:
δi = [∂F/∂neti] (di − oi) ……………………………. (5)

   = oi (1 − oi)(di − oi)

In equation (5), oi represents the actual output of neuron i in layer L3, and di represents
the target or desired output at neuron i in layer L3. Layer L2 has no target vector, so
equation (5) cannot be used for layer L2. The back propagation algorithm trains hidden
layers by propagating the output error back, layer by layer, adjusting the weights at each
layer. The change in weights between layers L1 and L2 can be obtained as:
∆wij = β δHi oj ……………………. (6)

where β is a training rate coefficient for layer L2 (typically 0.01 to 1.0), oj is the output of
neuron j in layer L1, and δHi is:

δHi = oi (1 − oi) Σk δk wki …………………. (7)

In equation (7), oi is the output of neuron i in layer L2, and the summation term
represents the weighted sum of all δ values corresponding to neurons in layer L3 that are
obtained by using equation (5).
3. Update the weights:

wij(n + 1) = wij(n) + ∆wij ……………………………… (8)

where wij(n + 1) represents the value of the weight at iteration n + 1 (after adjustment),
and wij(n) represents the value of the weight at iteration n.

4. Obtain the error ε for neurons in layer L3:

ε = Σi (oi − di)² ………………………………… (9)

If the error is greater than some minimum εmin (user defined, depending on the accuracy
needed), then repeat steps 2 through 4; otherwise terminate the training process.
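The four steps above can be sketched as a small program. This is a minimal illustration, not the software implementation used in this study: the sigmoid node function, the XOR toy task, the learning rates and the network sizes are assumptions for demonstration, and the threshold terms are folded into a constant bias input.

```python
# A minimal sketch of back propagation steps 1-4 for a 3-layer network.
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, w2):
    """Step 1: propagate the input layer by layer (eqs 1-2, bias folded into x)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w1]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w2]
    return hidden, output

def train_step(x, d, w1, w2, alpha=0.5, beta=0.5):
    hidden, output = forward(x, w1, w2)
    # Step 2, eq (5): output-layer delta_i = o_i(1 - o_i)(d_i - o_i)
    d_out = [o * (1 - o) * (t - o) for o, t in zip(output, d)]
    # Eq (7): hidden-layer delta_Hi = o_i(1 - o_i) * sum_k delta_k w_ki
    d_hid = [h * (1 - h) * sum(dk * w2[k][i] for k, dk in enumerate(d_out))
             for i, h in enumerate(hidden)]
    # Step 3, eqs (4), (6), (8): update the weights
    for k, dk in enumerate(d_out):
        for i, h in enumerate(hidden):
            w2[k][i] += alpha * dk * h
    for i, dh in enumerate(d_hid):
        for j, xj in enumerate(x):
            w1[i][j] += beta * dh * xj

def total_error(w1, w2, data):
    """Step 4, eq (9): summed squared error over the training samples."""
    return sum(sum((o - t) ** 2 for o, t in zip(forward(x, w1, w2)[1], d))
               for x, d in data)

# Toy task: XOR with 2 inputs plus a constant bias input, 2N+1 = 5 hidden units
data = [([0, 0, 1], [0]), ([0, 1, 1], [1]), ([1, 0, 1], [1]), ([1, 1, 1], [0])]
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
w2 = [[random.uniform(-1, 1) for _ in range(5)]]

error_before = total_error(w1, w2, data)
for epoch in range(5000):
    for x, d in data:
        train_step(x, d, w1, w2)
error_after = total_error(w1, w2, data)  # error shrinks as the network converges
```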
2.2.2 Unsupervised neural network classifiers

2.2.2.1 Description of unsupervised neural network classifiers

Unsupervised neural networks organize the input vectors into similar groups by
self-learning the patterns of the input vector. This addresses a major limitation of
supervised neural networks, where training with many input/output examples is
necessary. Even though training a network with an existing example set gives reliable
results, it is not practical in cases where training data are not available; in such cases
unsupervised or adaptive neural networks are of great help. The learning rule of
unsupervised neural networks performs learning in an unsupervised or self-organizing
manner (Chen, 2000), leading to a relevant output by learning patterns from the
redundant training data. In unsupervised neural networks, only input vectors are
presented to the network, and the network adjusts its own weights without any additional
(external) help in deciding what particular output to assign to a given input. Usually
unsupervised neural networks classify input data into distinct or discrete groupings.
Unsupervised neural networks can be ideal where seemingly uncorrelated data have to be
classified and, most importantly, when no training (example) data are available (Alavi,
2002).
There are two major approaches to unsupervised learning: Competitive Learning
Networks and Self Organizing Feature Maps (SOFM). Competitive Learning Networks
involve a process in which output layer neurons compete among themselves to acquire
the ability to fire in response to given input vectors (patterns). The basic learning rules
used to perform unsupervised learning are the Hebbian and competitive rules, both
inspired by neurobiological considerations (Chen, 2000). When an input pattern is
presented, a winning output neuron k is selected and the activations are reset such that:

yk = 1 and yj = 0 for all j ≠ k

Such an output layer is referred to as Winner-Take-All. The other type of unsupervised
network, only slightly different from the competitive layer, is the Self-Organizing Feature
Map (SOFM), sometimes known as an auto-associative network (Anderson et al, 1992).
This network was developed by Teuvo Kohonen, the leading researcher in unsupervised
learning. It relies on competitive learning but with different emergent properties; its
unique feature is preserving the topology of the input vectors (patterns). SOFMs are
intended to map a continuous high-dimensional space into a discrete one- or two-
dimensional space (Chen, 2000).
2.2.2.2 Architecture and algorithm

In this study a SOFM network is used for the unsupervised neural network
classification; hence this review focuses on the architecture and learning rules of a SOFM
network. The typical architecture of an unsupervised neural network (SOFM) comprises
two layers: the input layer and the output layer (Figure 6).

Figure 6: A Kohonen Self-Organizing Grid with a 2-dimensional output layer

Source: (REF4, 2000).
In a SOFM network there are two kinds of weight connections (REF5, 2002): feed-forward
connections between the input layer and the output layer, and lateral feedback
weight connections within the output layer. The feed-forward connection is usually
excitatory: it activates the neurons in the output layer so that their weights get updated.
On the other hand, the lateral feedback, the weight connection within the output layer, is
inhibitory and prevents neurons from being activated. Therefore not all neurons get
updated; only the winner neuron of the lateral feedback (competitive) layer gets updated.
This learning rule of the output layer of unsupervised neural networks is called
Winner-Take-All (WTA).

The WTA learning rule is the most widely used learning rule for SOFMs. As its name
implies, only the winner neuron of the output layer is activated and gets its weights
updated. This rule is common to both SOFMs and other Competitive Learning Networks.
What makes the SOFM unique is that not only the winner neuron gets its weights
updated, but also its neighboring neurons; in this way, as mentioned earlier, the SOFM is
capable of preserving the topology of the output layer with respect to the input layer.
This is very important when dealing with geo-spatial data, where topology matters. The
winner is selected by looking at the distance between the weight vector of each neuron in
the output layer and the input vector. Many kinds of distances can be considered; the
most widely used is the Euclidean distance.
The WTA learning rule procedure can be described as follows:

1. Select the winner neuron, the one with the smallest Euclidean distance:

‖x − wj‖ ……………………………. (1)

where wj denotes the weight vector corresponding to the jth output neuron.

2. Let i* denote the index of the winner and let I* denote the set of indexes corresponding
to a defined neighborhood of winner i*. Then the weights associated with the winner and
its neighboring neurons are updated by:

∆wj = η (x − wj) ………………………………. (2)

for all indices j ∈ I*, where η is a small positive learning rate.
The amount of updating may be weighted according to a pre-assigned 'neighborhood
function' Λ(j, i*):

∆wj = η Λ(j, i*) (x − wj) …………………….. (3)

for all j. For example, the neighborhood function Λ(j, i*) may be chosen as

Λ(j, i*) = exp(−‖rj − ri*‖² / 2σ²)

where rj represents the position of neuron j in the output space. The convergence of the
feature map depends on a proper choice of η; one choice is η = 1/t. The size of the
neighborhood (σ) should decrease gradually, as shown in the next figure:
Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer.

3. The weight update should immediately be followed by normalization of wj.

The rate at which the weights of a winner neuron are updated depends on a small,
user-defined positive constant referred to as alpha (α). In some cases another constant,
theta (θ), is applied to the network to avoid neurons that never get their weights updated;
θ is the rate at which a neuron that did not win loses, i.e. it represents the losing rate.
When there is no losing rate, θ is set to zero.
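The WTA/SOFM procedure above can be sketched in code. This is a minimal illustration under several assumptions: a 1-dimensional output layer, the Gaussian neighborhood function of equation (3), η = 1/t, a simple shrinking-neighborhood schedule, and a made-up toy dataset; the normalization step and the losing rate θ are omitted.

```python
# A minimal sketch of the WTA/SOFM learning procedure for a 1-D output layer.
import math, random

random.seed(1)

def wta_train(data, n_units, epochs=50):
    """Train a tiny 1-D Kohonen map on a list of input vectors."""
    dim = len(data[0])
    w = [[random.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(1, epochs + 1):
        eta = 1.0 / t                         # learning rate, eta = 1/t
        sigma = max(n_units / 2.0 / t, 0.5)   # gradually shrinking neighborhood
        for x in data:
            # Step 1: winner = unit with the smallest Euclidean distance (eq 1)
            dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]
            win = dists.index(min(dists))
            # Step 2: update winner and neighbors, Gaussian Lambda(j, i*) (eq 3)
            for j, wj in enumerate(w):
                lam = math.exp(-((j - win) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    wj[k] += eta * lam * (x[k] - wj[k])
    return w

# Two well-separated 2-D clusters; the two units should settle near them
data = [[0.1, 0.1], [0.2, 0.15], [0.9, 0.9], [0.8, 0.85]]
weights = wta_train(data, n_units=2)
```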
Although a two-layer design is the most used architecture, it is also possible to
use the Kohonen layer or Self Organizing Map (SOFM) as a hidden layer by
providing an extra output layer, usually a Learning Vector Quantization (LVQ) layer,
which helps in unsupervised classification in the case of complex data. Teuvo Kohonen,
who developed the SOFM, also created this architecture. It is very similar to the SOFM;
it is a form of supervised learning adapted from Kohonen unsupervised learning. It uses
the Kohonen layer with the WTA transfer function, which is capable of sorting items into
similar categories (Anderson et al, 1992). However, some important modifications have
been added to this architecture, which make it more robust in handling classification and
image segmentation problems.

Figure 8: Design of the Learning Vector Quantization architecture

Source: Anderson et al, 1992

Learning Vector Quantization classifies its input data into classes that are determined by
the user. Essentially, it maps an n-dimensional space into an m-dimensional space; that
is, it takes n inputs and produces m outputs. We can refer to this learning rule as
semi-unsupervised, since it gives us the freedom of specifying the number of classes into
which we want to group the input data. The network can be trained to classify inputs
while preserving the topology of the training set. This occurs by preserving the
nearest-neighbor relationships in the training set, such that input patterns which have not
been previously learned are categorized by their nearest neighbors in the training data.
The training mechanism for the LVQ network is the same as for the Kohonen network:
the WTA transfer function is used to process the input data in the hidden Kohonen layer,
and there is only one winner in the layer for each input vector (for each iteration). The
only extra step in LVQ is the re-assignment of the output found from the Kohonen layer
to another output layer, for which the number of neurons is user defined (here, the
number of classes into which we want to classify the input vector).

The LVQ output layer re-assigns the Kohonen layer outputs by adjusting the connection
weights between the output neurons and the Kohonen layer: if a winner neuron from the
Kohonen layer is not assigned the appropriate class (the network learns after training in
which class the neuron should be classified), the connection weights entering the neuron
are moved away from the training vector, so that it does not get classified into the wrong
output class (Anderson et al, 1992). This network is of special interest for this study,
since its topology-preserving nature makes it the most appropriate network for image
classification.
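The re-assignment rule described above can be sketched as an LVQ1-style update: the winning prototype moves toward the input when its class label is correct and away from it when wrong. This is only an illustration; the class labels, learning rate and prototype values below are made-up, and the full two-layer Kohonen+LVQ pipeline of Figure 8 is not reproduced.

```python
# A minimal sketch of the LVQ class re-assignment rule (LVQ1-style update).

def lvq_step(x, label, prototypes, classes, lr=0.3):
    """prototypes: list of weight vectors; classes: their assigned class labels."""
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
    win = d.index(min(d))
    # Toward the input if the winner's class is right, away from it if wrong
    sign = 1.0 if classes[win] == label else -1.0
    prototypes[win] = [wi + sign * lr * (xi - wi)
                       for xi, wi in zip(x, prototypes[win])]
    return win

protos = [[0.2, 0.2], [0.8, 0.8]]
classes = ["water", "forest"]
lvq_step([0.1, 0.1], "water", protos, classes)   # correct class: pulled toward input
lvq_step([0.9, 0.9], "water", protos, classes)   # wrong class: pushed away
```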
3 Methodology
The whole methodological process in this study is divided into three main procedures.

The first procedure comprises:

Preparation of the data collected in the field into an appropriate format to be used as
input in the image classification process.

Preprocessing of the main datasets to be used, and standardization of data types so
as to make them compatible during the classification procedure.

Preparation of training and test sets from the field data and subsets of the input
dataset.

The second procedure deals with performing the supervised and unsupervised neural
network classifications, first by classifying the training dataset and then classifying the
whole image once the classification accuracy is satisfactory.

The last step of the methodology deals with the validation of the neural network
classifications. After the accuracy assessment, validation is carried out in order to draw a
concrete conclusion on the analysis performed.
To explain briefly how the whole process was executed: the primary datasets, the ASTER
and Landsat TM images, were pre-processed and standardized. Training/test sets were
prepared from the field data and the primary datasets, and then data transformation was
carried out; this part of the process took considerable time because of the large size of the
data used (the images). The image data were transformed into ASCII file format in order
to make them compatible with the neural network processor. Then both supervised and
unsupervised neural network classifications were carried out. After evaluating the
outcome on the training data, the whole image was classified by the networks trained on
the training set. Then the classification results from the supervised and unsupervised
classifiers were compared. The overall process is illustrated in Figure 9.
Figure 9: Overview of the main procedures involved in the methodological process

[Flowchart summary: the field data and primary datasets (ASTER image, Landsat TM, NDVI stack layer) are transformed into ASCII training and test sets; the supervised and unsupervised NN classifications are trained and run; accuracy assessment, sensitivity analysis and a comparison between the supervised and unsupervised classification results follow.]
3.1 Field data acquisition
The study area covers 1558 km²; it lies between 36.99°E and 37.40°E longitude and
11.48°N and 11.95°N latitude.

Representative ground truth samples were marked on the image for all the output classes.
The output classes are:

Arable land

Forest

Settlement

Shrub land and scrubland

Swampy area

Water

These land cover classes were chosen based on the major classes used in the available
topographic map of the area; the topographic map was also used as a source of additional
control data during the validation stage.

Ground truth sets were taken from the study area. Adequate ground truth was needed
both for training the network and for testing (validation) after the classification was
performed. A GPS was used to mark the geographical positions of the ground control
(ground truth) points.
3.2 Data preprocessing

3.2.1 Datasets

There are two primary datasets used for the study. These are:

Satellite image from the TERRA satellite (ASTER).

Satellite image from the Landsat 5 satellite (Landsat TM).
3.2.1.1 ASTER
ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) is an
imaging instrument aboard the TERRA satellite. ASTER is used to obtain detailed maps
of land surface temperature, emissivity, reflectance and elevation. It consists of three
high-performance optical radiometers with 14 spectral channels. Its spectral channels fall
in the visible and near infrared (VNIR), shortwave infrared (SWIR) and thermal infrared
(TIR) regions (REF3, 2002). The major features of ASTER are: simultaneous earth
surface images from the visible to the thermal infrared; higher geometric and radiometric
resolution in each band than earlier satellite sensors; near infrared stereoscopic image
pairs collected during the same orbit; optics that allow the instrument axis to point as
much as ±24 degrees cross-track from nadir; and highly reliable cryocoolers for the
SWIR and TIR sensors (Vani, 2000).
Table 1: Spectral range of bands and spatial resolution for the ASTER sensor
ASTER Bands Wavelength
(micrometers)
Resolution (meters)
Band 1 0.52 - 0.60 15
Band 2 0.63 - 0.69 15
Band 3 nadir looking 0.76 - 0.86 15
Band 3 backward looking 0.76 - 0.86 15
Band 4 1.600 - 1.700 30
Band 5 2.145 - 2.185 30
Band 6 2.185 - 2.225 30
Band 7 2.235 - 2.285 30
Band 8 2.295 - 2.365 30
Band 9 2.360 - 2.430 30
Band 10 8.125 - 8.475 90
Band 11 8.475 - 8.825 90
Band 12 8.925 - 9.275 90
Band 13 10.25 - 10.95 90
Band 14 10.95 - 11.65 90
Figure 10: ASTER bands superimposed on model Atmosphere.
Source: Jet Propulsion Laboratory (JPL), ASTER homepage.
3.2.1.2 Landsat TM
The Thematic Mapper (TM) sensor is an advanced, multispectral scanning, Earth
resources instrument designed to achieve higher image resolution, sharper spectral
separation, improved geometric fidelity, and greater radiometric accuracy and resolution
than the Multispectral Scanner (MSS) sensor. The TM data are scanned simultaneously in
seven spectral bands. Band 6 scans thermal (heat) infrared radiation. All TM bands are
quantized as 8 bit data (REF2, 1999) (Figure 11)
Table 2: Spectral range of bands and spatial resolution for the TM sensor
Landsat 5 Bands Wavelength (micrometers) Resolution (meters)
Band 1 0.45 - 0.52 30
Band 2 0.52 - 0.60 30
Band 3 0.63 - 0.69 30
Band 4 0.76 - 0.90 30
Band 5 1.55 - 1.75 30
Band 6 10.40- 12.50 120
Band 7 2.08 - 2.35 30
Figure 11: Landsat TM bands superimposed on model Atmosphere.
Background image source: Remote sensing Basics lecture note, Wageningen
University.
3.2.2 Datasets preparation
Both the ASTER and Landsat TM images were geo-referenced1 to the 1:50,000
topographic map of the study area. The large water body in the area, the southern part of
Lake Tana, was removed from both images, since it is a known feature (Figure 12);
keeping the lake area would have increased data processing and analysis time
significantly.

Figure 12: Study area after Lake Tana is masked out of the image.

The six shortwave infrared (SWIR) bands of the ASTER image have very low DN
values, which makes their variation difficult to detect. To solve this problem, rescaling
was performed over the whole range of bands from the visible to the SWIR (Figure 13
and Figure 14).
1 Projection and datum information can be found in appendix 1
Figure 13: Spectral signature of the six classes (before ASTER image rescaling)
Figure 14: Spectral signature of the six classes (After ASTER image rescaling)
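The rescaling step can be sketched as a simple linear min-max stretch. This is an assumption for illustration; the actual stretch applied by the image processing software may differ, and the DN values below are made-up.

```python
# A sketch of rescaling a low-DN band: a linear min-max stretch to 0-255 so
# that the small variation in the SWIR DN values becomes detectable.

def rescale_band(band, new_min=0, new_max=255):
    """Linearly stretch a band's DN values into [new_min, new_max]."""
    lo, hi = min(band), max(band)
    if hi == lo:  # constant band: nothing to stretch
        return [new_min for _ in band]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (dn - lo) * scale for dn in band]

# Example: a low-DN SWIR band stretched over the full 0-255 range
swir = [4, 5, 5, 6, 7, 9]
print(rescale_band(swir))  # spans 0..255 after the stretch
```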
Due to the large spatial extent of the study area, it was not possible to process the whole
image at once during the analysis and the preparation of training and test sets. As the
software used for the neural network analysis, ThinksPro®, accepts only ASCII files, it
was necessary to subset the image into four sub-study areas. This avoided extremely
large ASCII files, which could not have been edited with Notepad, WordPad or MS
Access for the preparation of the input dataset.
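The image-to-ASCII transformation can be sketched as follows: each pixel becomes one whitespace-separated row whose columns are the band values fed to the network as an input vector. The band layout, delimiter and file name are illustrative assumptions, not the exact ThinksPro input format.

```python
# A sketch of converting image bands to an ASCII input file: one row per
# pixel, one column per band (the network's input vector for that pixel).

def bands_to_ascii(bands, path):
    """bands: list of equally-sized band value lists; writes one pixel per line."""
    n_pixels = len(bands[0])
    with open(path, "w") as f:
        for p in range(n_pixels):
            f.write(" ".join(str(band[p]) for band in bands) + "\n")

# Example: 3 bands of a 4-pixel subset -> 4 rows of 3 columns
bands_to_ascii([[10, 20, 30, 40], [1, 2, 3, 4], [5, 6, 7, 8]], "subset1.txt")
```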
3.2.3 Training and test sets preparation

Ground control points taken from the fieldwork were merged with reference points taken
from the image and the 1:50,000 topographic maps by visual interpretation and expert
knowledge; this increased the number of ground control points sufficiently for adequate
training and test data. The technical procedure of training and test data preparation is
shown in Figure 15.

Figure 15: Training/test data preparation procedure

After the training and test data were prepared, different possible data combinations were
tested for both the ASTER and Landsat TM images to find out which combination of
bands could give better neural network classification accuracy. Although testing data
quality for neural network classification was not the primary objective of this study, it
was necessary to find the best band combination in order to get the most out of the
available information, since the classification was based only on spectral information.
Therefore three pre-classification training and test set evaluations were made: for the
ASTER image, for the Landsat TM image, and for the combination of ASTER and
Landsat TM images, respectively (see the tables below for more details).
Table 3: Training and test sets for ASTER dataset.
No No of bands
used
Type of bands used Additional
information
Total No of input
Visible Near infrared SWIR
1 9 2 1 6 NDVI 10
2 7 2 1 4 NDVI 8
Table 4: Training and test sets for Landsat TM dataset
No No of bands
used
Type of bands used Additional
information
Total No
of input Visible Near
infrared
Mid
infrared
Thermal
infrared
1 7 3 1 2 1 NDVI 8
2 6 3 1 2 - NDVI 7
Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets

No | No of bands | ASTER Visible | ASTER Near infrared | ASTER Shortwave infrared | Landsat TM Visible | Landsat TM Near infrared | Landsat TM Mid infrared | Landsat TM Thermal | Additional information | Total No of inputs
1 | 14 | 2 | 1 | 4 | 3 | 1 | 2 | 1 | 2 NDVI | 16
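The NDVI layer listed as additional information in Tables 3-5 is the standard normalized difference of the near-infrared and red bands. A minimal sketch follows; the band values are made up for illustration:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red)

# First pixel: (150 - 50) / (150 + 50) = 0.5; second pixel: 0.0.
values = ndvi([50.0, 80.0], [150.0, 80.0])
```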
The images were converted into ASCII files to make their format compatible with the neural network processors of ThinksPro, where the values from the spectral bands were fed as an input vector to the neural network. The neural network maps the feature space of the input (image data) into a category space, which in this study consists of land cover classes. The dimension of the feature space equals the number of spectral bands provided.
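The feature-space-to-category-space mapping described above can be sketched as follows; the image values, class means and nearest-mean rule are illustrative stand-ins for the trained network, not the ThinksPro implementation:

```python
import numpy as np

# Toy multispectral "image": 4 rows x 5 columns x 3 spectral bands.
rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))

# Each pixel becomes one input vector; the feature-space dimension
# equals the number of spectral bands (3 here).
vectors = image.reshape(-1, image.shape[2])          # shape (20, 3)

# Stand-in for the trained network: hypothetical class means in
# feature space, one per land cover class.
class_means = np.array([[0.2, 0.2, 0.2],
                        [0.5, 0.5, 0.5],
                        [0.8, 0.8, 0.8]])

# Map feature space into category space: each pixel gets the index
# of the nearest class mean.
dists = np.linalg.norm(vectors[:, None, :] - class_means[None, :, :], axis=2)
labels = dists.argmin(axis=1).reshape(image.shape[:2])
```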
3.3 Supervised neural network classification
The Multi-layer Normal Feed Forward (MNFF) architecture was used for the supervised classification. This architecture is a typical example of the Multi-Layer Perceptron (MLP). The network comprises three layers: an input layer, one hidden layer and an output layer. Both the hidden and output layers use a BP1 learning rule.
The network design is shown in Figure 16.
Figure 16: Design of Back Propagation Neural Network
The number of nodes in each layer of the architecture used for the Back Propagation neural network was assigned as follows:
• The nodes, transfer functions and input functions vary according to the input dataset tested.
• The number of nodes in the input layer equals the number of inputs.
• The number of nodes in the hidden layer is assigned according to the Kolmogorov theorem as 2N+1, where N is the number of input nodes (Rangsaneri et al., 1998).
1 All the network parameters used both for supervised and unsupervised NN classification are listed in Appendix 2
• The number of nodes in the output layer equals the number of output classes, which is 6 for this study.
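The sizing rules above can be collected into a small helper; the function name is ours, but the resulting layer sizes match Table 6:

```python
def bp_layer_sizes(n_inputs, n_classes=6):
    """Node counts (input, hidden, output) for the back propagation
    network, using the Kolmogorov 2N+1 rule for the hidden layer."""
    return n_inputs, 2 * n_inputs + 1, n_classes

# The five input vectors from Tables 3-5 give the layer sizes of Table 6:
sizes = [bp_layer_sizes(n) for n in (10, 8, 8, 7, 16)]
```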
Table 6 illustrates the five data sets created for classification using the back propagation neural network. Since the supervised classification was needed for comparison with the unsupervised classifier, a neural network that had proved to be a good image classifier was required, and back propagation was chosen because it fulfils this criterion. Both the hidden and the output layers of the supervised network were set to the back propagation learning rule. All the data sets described in Tables 3, 4 and 5 were used for the supervised neural network classification. The percentage accuracy of each data combination (input vector) was recorded in order to choose the best result for the final image classification.
Table 6. Training data set up for the Back Propagation neural network

Training set | Dataset | Architecture | Learning rule (hidden) | Learning rule (output) | No of hidden layers | No of inputs | No of hidden nodes | No of output nodes
1 | ASTER | Multi-Layer NFF | Back prop. | Back prop. | 1 | 10 | 21 | 6
2 | ASTER | Multi-Layer NFF | Back prop. | Back prop. | 1 | 8 | 17 | 6
3 | Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 8 | 17 | 6
4 | Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 7 | 15 | 6
5 | ASTER + Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 16 | 33 | 6

3.4 Unsupervised neural networks classification
The Multi-Layer Normal Feed Forward (MLNFF) architecture was used for the unsupervised classification, with the Kohonen Winner Take All (KWTA) and Learning Vector Quantization (LVQ) learning rules. The network comprises three layers: an input layer, one hidden layer with
the KWTA learning rule, and an output layer with the LVQ learning rule. The network design is shown in Figure 17.
Figure 17: The design of the KWTA/LVQ network.
The assignment of the number of nodes in each layer is similar to the supervised classification. Table 7 illustrates the five data sets created for the unsupervised image classification using the KWTA/LVQ network. The hidden layer of the unsupervised network uses Kohonen WTA, while the output layer uses LVQ. These learning rules were chosen because they are topology preserving in nature, which is very appropriate for the kind of data we are dealing with.
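As a rough illustration of the winner-take-all idea, a minimal sketch follows (the learning rate, weights and input are made up; this is not the ThinksPro implementation):

```python
import numpy as np

def kwta_step(weights, x, lr=0.1):
    """One Kohonen winner-take-all update: only the node whose weight
    vector is closest to the input x is moved towards x."""
    winner = int(np.linalg.norm(weights - x, axis=1).argmin())
    weights[winner] += lr * (x - weights[winner])
    return winner

rng = np.random.default_rng(1)
weights = rng.random((6, 3))     # 6 output nodes (classes), 3-band input
x = np.array([0.9, 0.1, 0.5])    # one pixel's spectral vector
winner = kwta_step(weights, x)
```

Repeated over the training set, the weight vectors drift towards cluster centres in feature space, without any desired output being supplied.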
Table 7. Training data set up for the Kohonen Winner Take All/LVQ network

Training set | Dataset | Architecture | Learning rule | No of hidden layers | No of inputs | No of hidden nodes | No of output nodes
1 | ASTER | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 10 | 21 | 6
2 | ASTER | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 8 | 17 | 6
3 | Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 8 | 17 | 6
4 | Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 7 | 15 | 6
5 | ASTER + Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 16 | 33 | 6
3.5 Accuracy assessment and validation
The accuracy, or performance, of the supervised neural network was evaluated with the built-in testing mechanism of ThinksPro. While training a network, a test set of input/desired-output pairs was given to the network to evaluate the correct learning percentage. Validation was then carried out by comparing a set of ground control points with the results of the neural network classifier. A confusion matrix and a table of accuracy percentages were generated based on the validation results.
3.6 Sensitivity analysis
Sensitivity analysis is a method that helps to determine the importance, or contribution, of each input towards the generation of the final output. This information enables us to determine which input is more important, or provides more information, to the overall image classification process. The ThinksPro guide states that eliminating inputs that have little effect can improve the performance of a neural network on test data, since lowering the input dimension can enhance generalization (Logical Designs, 1996). Sensitivity analysis can thus be used as a decision-making tool to separate useful inputs from noise.
For this study a built-in procedure of ThinksPro (the software used for neural network processing) was used to carry out the sensitivity analysis. There are many ways of calculating sensitivity; the method used in ThinksPro replaces each input by its average value over the training set and calculates the effect on the output. The magnitude of the output change is then averaged over the whole training set, and this is done for every input. Finally, the effect of each input is reported in the log file (Logical Designs, 1996).
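The mean-replacement procedure described above can be sketched as follows (a simplified illustration with a toy "network", not the actual ThinksPro code):

```python
import numpy as np

def sensitivity(predict, X):
    """Mean-replacement sensitivity: clamp each input to its average
    over the training set and measure the mean output change."""
    base = predict(X)
    effects = []
    for j in range(X.shape[1]):
        Xm = X.copy()
        Xm[:, j] = X[:, j].mean()          # replace input j by its mean
        effects.append(np.abs(predict(Xm) - base).mean())
    effects = np.array(effects)
    return effects, effects / effects.mean()   # raw and normalized effects

# Toy "network": depends strongly on input 0, weakly on input 1.
predict = lambda X: 5.0 * X[:, 0] + 0.1 * X[:, 1]
X = np.random.default_rng(2).random((100, 2))
effects, normalized = sensitivity(predict, X)
```

In this toy case the first input dominates the output, so its normalized effect comes out well above 1, while the second falls well below 1; the normalization is chosen so that equally important inputs would all score 1.0, as in the tables of section 4.8.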
3.7 Implementation Aspects
Five software packages were used for the implementation of the methodology described
in the previous sections. They are listed below:
Arc/Info: used to standardize the projection information of all the datasets, including preparation of the training and test sets;
ArcView: used for the production and visualization of the training/test sets and the land cover map;
ERDAS Imagine: used for image pre-processing and transformation of the images into the ASCII file format;
ThinksPro: used for the neural network processing;
GPS: used to retrieve the geographical position of ground control points during the fieldwork.
4 Results and Discussion
4.1 Accuracy of Back Propagation classifier trained with ASTER or
Landsat TM datasets
As explained in the previous chapter, different combinations of ASTER bands were tested in order to find the best input vector. The input vector plays an important role because it should provide the appropriate information to the neural network perceptrons, from which the classification is generated. Table 8 shows that the two classification cases carried out on the ASTER image using the back propagation classifier differed very significantly in accuracy. The first network, trained with 3 VNIR, 6 SWIR and 1 NDVI inputs, resulted in 66.60% correct training, whereas the second network, trained with 3 VNIR, 4 SWIR and 1 NDVI inputs (after the last 2 SWIR bands of the ASTER image were removed), resulted in 82.64% correct training.
Table 8: Accuracy of the back propagation classifier using ASTER data

Case | Data type: ASTER | Correct % training | Correct % test | Error training | Error test
1 | 3 VNIR, 6 SWIR and 1 NDVI | 66.60 | 65.78 | 0.134 | 0.137
2 | 3 VNIR, 4 SWIR and 1 NDVI | 82.64 | 74.67 | 0.223 | 0.258

The big leap in accuracy can be explained by the noise reduction in the input data. In other words, the last two SWIR bands, which were removed from the second neural network (case 2 in Table 8), can be considered noise, since the recorded (DN) values of these bands were very poor: most of the pixels of the image were zero for these bands (more than 75% were zero or blank). The presence of noise in the input vector affects the overall accuracy of the neural network's performance.
Table 9 shows the Landsat TM band combination datasets used to carry out the training and testing of the back propagation classifier. In the first case the back propagation
classifier was trained and tested with an input vector containing NDVI data and all the Landsat TM bands; the accuracy was 79.87% correct training and 73.73% correct testing. In the second case, the back propagation classifier was trained and tested with an input vector containing NDVI data and all the Landsat TM bands except the thermal band. The accuracy results were much lower: 57.54% correct training and 56.57% correct testing.
Table 9: Accuracy of the back propagation classifier trained with Landsat TM data

Test | Data source: Landsat TM | Correct % training | Correct % test | Error training | Error test
1 | 4 VNIR, 2 MIR, 1 TIR and 1 NDVI | 79.87 | 73.73 | 0.057 | 0.078
2 | 4 VNIR, 2 MIR and 1 NDVI | 57.54 | 56.57 | 0.146 | 0.151

The accuracy obtained from the network trained with the thermal band included is significantly higher than that of the same classifier trained without the thermal band. This confirms one of the most important advantages of using neural networks for image classification: the possibility of using multi-scale data in the classification process. The thermal bands are usually not used in many conventional parametric classification methods because their spatial resolution differs from that of the VNIR bands. The neural network overcomes this problem thanks to its multi-scale nature, whereby bands of different resolution can be used as input for image classification. This enables the use of information in the input vector that would otherwise have been lost with the thermal band.

4.2 Accuracy of Back propagation classifier trained with ASTER and Landsat TM input dataset
The back propagation network trained with the combination of the two datasets and their NDVI derivatives showed a better result (85.07%) than any of the results obtained from the networks trained with only one of the datasets, as shown in the table below.
Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets

Data source: both ASTER and Landsat TM | Correct % training | Correct % test | Error training | Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI | 85.07 | 77.14 | 0.034 | 0.069

The result shows that it was possible to improve on the already high training accuracies obtained from the ASTER and Landsat TM data individually. This demonstrates the ability of a neural network classifier to extract information from multi-source datasets. The advantage of this approach is twofold: it provides a quick and efficient mechanism for using different datasets in a multispectral image classification, while offering very high flexibility during the classification process. Regarding flexibility: any particular input within the input vector can easily be removed from the network if it is later considered noise, and, similarly, additional data can be added to the network if needed at a later stage of the classification. This avoids the time lost in preparing multiple input vectors with different input data sources, i.e. going through training and data set preparation every time we want to try a different number of inputs.

4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat datasets
Table 11 shows that the results of the Kohonen/LVQ classifier trained with ASTER data confirmed the previous results found using the back propagation classifier for the same dataset. The Kohonen/LVQ classifier trained with the noise-reduced ASTER dataset gave a better result than the one trained with all SWIR bands of ASTER. This indicates that the noise reduction affected the unsupervised classifier as well. A significant decrease in error and increase in correct test percentage was observed for the training done after the last two SWIR bands of ASTER were removed.
Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data

Test | Data source: ASTER | Correct % test | Error test
1 | 3 VNIR, 6 SWIR and 1 NDVI | 27.42 | 0.242
2 | 3 VNIR, 4 SWIR and 1 NDVI | 42.16 | 0.193
Table 12 shows the correct test percentage of the Kohonen/LVQ classifier trained with the Landsat TM data. Once again the unsupervised classifier confirmed the result found with the supervised one: the dataset without the thermal band gave a poorer result than the one with the thermal band.
Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data

Test | Data source: Landsat TM | Correct % test | Error test
1 | 4 VNIR, 2 MIR, 1 TIR and 1 NDVI | 37.03 | 0.210
2 | 4 VNIR, 2 MIR and 1 NDVI | 29.19 | 0.236
4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets
The Kohonen/LVQ classifier trained with the combination of bands from the ASTER and Landsat TM datasets gave a better result than those obtained from the datasets individually (Table 13). This classifier also returned the lowest error value of all the unsupervised trainings carried out, an indication that the network benefited from the merged input vector, which provided more information and helped it to better detect patterns in the input.
Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data

Data source: ASTER and Landsat TM | Correct % test | Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI | 47.15 | 0.176
For the unsupervised classification only the correct test set percentage is given; a correct training evaluation is not available, since no desired output is given during the training of an unsupervised classifier.
4.5 Validation of the results obtained from the back propagation supervised neural networks classifier
The validation was carried out using a set of test ground control points that were not used in the training process. Validation was carried out on the classification of the entire image that resulted in the highest correct training percentage (since the most accurate classifier would be used for the final land cover classification of the study area), in this case the classifier trained with the combined data from the ASTER and Landsat TM images. A total of 1264 points were used in the validation process.
The confusion matrix for the supervised back propagation neural network classification is given in the table below.
Table 14: Confusion matrix for the Back propagation network classification using ASTER and Landsat TM images

Ground control | Agriculture (ω1) | Forest (ω2) | Settlement (ω3) | Shrub (ω4) | Swamp (ω5) | Water (ω6) | ΣX
Agriculture (Ω1) | 372 | 0 | 0 | 25 | 3 | 0 | 400
Forest (Ω2) | 0 | 102 | 0 | 0 | 1 | 1 | 104
Settlement (Ω3) | 23 | 0 | 0 | 31 | 6 | 24 | 84
Shrub (Ω4) | 14 | 0 | 0 | 402 | 0 | 0 | 416
Swamp (Ω5) | 14 | 1 | 0 | 19 | 134 | 3 | 171
Water (Ω6) | 4 | 0 | 0 | 8 | 11 | 66 | 89
ΣY | 427 | 103 | 0 | 485 | 155 | 94 | ΣΣX = 1264
Accuracy assessment formulas, where (ωi, Ωi) is the number of correctly classified points of class i (example for class ω1, agriculture):
Class accuracy = (ω1, Ω1) × 100 / ΣX
Error of omission = 100 − class accuracy (producer's accuracy)
Error of commission = (Ω2 + Ω3 + Ω4 + Ω5 + Ω6) × 100 / ΣY, i.e. the off-diagonal entries of the class column over the column total (user's accuracy)
Overall accuracy = [(ω1, Ω1) + (ω2, Ω2) + (ω3, Ω3) + (ω4, Ω4) + (ω5, Ω5) + (ω6, Ω6)] × 100 / ΣΣX
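These formulas can be checked numerically against Table 14; the sketch below reproduces the agriculture figures and the overall accuracy reported in Table 15:

```python
import numpy as np

# Rows: ground control class; columns: classified as.
# Order: agriculture, forest, settlement, shrub, swamp, water (Table 14).
cm = np.array([
    [372,   0, 0,  25,   3,  0],
    [  0, 102, 0,   0,   1,  1],
    [ 23,   0, 0,  31,   6, 24],
    [ 14,   0, 0, 402,   0,  0],
    [ 14,   1, 0,  19, 134,  3],
    [  4,   0, 0,   8,  11, 66],
])

i = 0  # agriculture
class_acc = cm[i, i] * 100 / cm[i].sum()                 # diagonal / row total
omission = 100 - class_acc                               # error of omission
commission = (cm[:, i].sum() - cm[i, i]) * 100 / cm[:, i].sum()
overall = np.trace(cm) * 100 / cm.sum()
# class_acc = 93.0, commission ≈ 12.88, overall ≈ 85.13 (as in Table 15)
```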
The validation of the back propagation classifier results revealed that one of the classes, 'settlement', was not classified at all, while all the other classes were classified with very high accuracy (Table 15). 'Forest' is the best classified class, with 98% class accuracy and 0.97% error of commission; this indicates that the land cover map produced from this classification will be a useful basis for studies concerning forest cover in the study area. The limited number of 'settlement' ground control points used to train the network explains why the classifier could not recognize the pattern for this class. The result also indicates that the quality and size of the training data affect accuracy during training of the neural networks.
Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data sources

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 93 | 7.00 | 12.88
Forest | 98.1 | 1.92 | 0.97
Settlement | 0 | 100.00 | 0.00
Shrub | 96.6 | 3.37 | 17.11
Swamp | 78.4 | 21.64 | 13.55
Water | 74.2 | 25.84 | 29.79
Overall accuracy = 85.13%
4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier
For ease of comparison of the results from the two classifiers, the same set of test points used to validate the supervised classifier was used to validate the unsupervised classifier. The validation was carried out on the classification that gave the highest correct test percentage, i.e. the one trained on the dataset containing both ASTER and Landsat TM bands. The confusion matrix for the KWTA/LVQ network classification is given below.
Table 16: Confusion matrix for the unsupervised network

Ground control | Agriculture (ω1) | Forest (ω2) | Settlement (ω3) | Shrub (ω4) | Swamp (ω5) | Water (ω6) | ΣX
Agriculture (Ω1) | 232 | 0 | 54 | 88 | 39 | 15 | 428
Forest (Ω2) | 2 | 102 | 0 | 10 | 72 | 5 | 191
Settlement (Ω3) | 57 | 1 | 1 | 41 | 15 | 5 | 120
Shrub (Ω4) | 64 | 6 | 5 | 193 | 53 | 13 | 334
Swamp (Ω5) | 34 | 0 | 1 | 78 | 13 | 4 | 130
Water (Ω6) | 0 | 0 | 3 | 1 | 2 | 55 | 61
ΣY | 389 | 109 | 64 | 411 | 194 | 97 | ΣΣX = 1264
The accuracy assessment formulas are the same as those given in section 4.5.
Table 17 shows that the unsupervised network classified some 'settlement' points, unlike the back propagation network, which did not recognize the class at all. This indicates an absence of sufficient information on that particular class in the training data set. This is highly probable, since the areas labelled settlement are spatially very small: for example, a village of 10 to 15 cottages was labelled as settlement in order to obtain information on the amount of settlement in the area. Due to the very limited number of training data available for this class, the back propagation network could not learn or recognize its pattern.
Table 17: Percentage accuracy of the six classes of the unsupervised classified image

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 54.21 | 45.79 | 40.36
Forest | 53.40 | 46.60 | 6.42
Settlement | 0.83 | 99.17 | 98.44
Shrub | 57.78 | 42.22 | 53.04
Swamp | 10.00 | 90.00 | 93.30
Water | 90.16 | 9.84 | 43.30
Overall accuracy = 47.15%

The KWTA/LVQ network did not give a high overall accuracy.
4.7 Improving the training data quality
According to the validation of the previous results, the neural networks could not learn the pattern for the output class 'settlement'. This strongly suggests a lack of information in the training dataset for this particular class. If this was caused by the poor training points taken for the class, it would also have lowered the overall accuracy of the classification. To confirm whether the training data quality affected the classification accuracy of the neural networks, another classification was carried out with the entire sample data of the 'settlement' class removed from the training data. The classification was carried out using both supervised and unsupervised classifiers as described in the previous sections. The results are given below:
Table 18: Classification results for the ASTER-Landsat TM data with 5 classes

No | Data type: both ASTER and Landsat TM | Correct % training | Correct % test | Error training | Error test
1 | Supervised Back Propagation | 90.15 | 81.5 | 0.031 | 0.074
2 | Unsupervised KWTA | - | 46.5 | - | 0.176

With the removal of the 'settlement' class from the training set, the desired output set and the network output, the correct training percentage of the back propagation classifier increased, while the performance of the Kohonen/LVQ classifier was unaffected. The accuracy obtained was 46.5% with a 0.176 mean absolute error, indicating that the Kohonen/LVQ network was not affected by the removal of the 'settlement' training points.
The number of points used decreased to 1181, as the 84 points representing the 'settlement' class were removed (Table 19); points representing the 'settlement' class were likewise removed from the test set.

Table 19: Confusion matrix for the supervised classification with five output classes

Ground control | Agriculture (ω1) | Forest (ω2) | Shrub (ω3) | Swamp (ω4) | Water (ω5) | ΣX
Agriculture (Ω1) | 356 | 0 | 37 | 5 | 2 | 400
Forest (Ω2) | 0 | 104 | 0 | 0 | 0 | 104
Shrub (Ω3) | 15 | 1 | 394 | 6 | 0 | 416
Swamp (Ω4) | 13 | 2 | 21 | 133 | 2 | 171
Water (Ω5) | 1 | 0 | 4 | 1 | 84 | 90
ΣY | 385 | 107 | 456 | 145 | 88 | ΣΣX = 1181
The results obtained from this classification were very satisfactory, with 100% class accuracy for the class 'forest' and very reliable results for the classes 'shrub' and 'water' (Table 20). Although it will not be possible to obtain the geographical locations and distribution of settlements in the area, for other purposes that do not necessarily include settlements the land cover map produced from this classifier will be adequately accurate.
Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 89.00 | 11.00 | 7.53
Forest | 100.00 | 0.00 | 2.80
Shrub | 94.71 | 5.29 | 13.60
Swamp | 77.78 | 22.22 | 8.28
Water | 93.33 | 6.67 | 4.55
Overall accuracy = 90.69%
4.8 Sensitivity Analysis
The result of the sensitivity analysis is given in Table 21. The figures in the effect column show the average change in the output over the training set caused by the particular input being tested. The normalized effect column is scaled so that, if all inputs had equal effect, the normalized effect would be 1.0; inputs with a normalized effect larger than 1 contribute more than average to the network output.

Table 21: Result of the sensitivity analysis of ASTER dataset 1

Sensitivity analysis for the back propagation network
Input | Effect | Effect normalized
1 Visible green | 0.343842 | 1.264904
2 Visible red | 0.282187 | 1.038093
3 Infrared | 0.401328 | 1.476381
4 SWIR 1 | 0.32859 | 1.208798
5 SWIR 2 | 0.26671 | 0.981157
6 SWIR 3 | 0.277654 | 1.021417
7 SWIR 4 | 0.400053 | 1.471692
8 SWIR 5 | 0.114453 | 0.421041
9 SWIR 6 | 0 | 0
10 NDVI | 0.303505 | 1.116517

Sensitivity analysis for the KWTA/LVQ network
Input | Effect | Effect normalized
1 Visible green | 0.37995 | 1.505496
2 Visible red | 0.310228 | 1.229232
3 Infrared | 0.383452 | 1.519371
4 SWIR 1 | 0.23123 | 0.916215
5 SWIR 2 | 0.258523 | 1.02436
6 SWIR 3 | 0.278438 | 1.103269
7 SWIR 4 | 0.278438 | 1.103269
8 SWIR 5 | 0.115615 | 0.458108
9 SWIR 6 | 0 | 0
10 NDVI | 0.287879 | 1.140679

1 The results of the sensitivity analysis for the other inputs and networks are given in Appendix 2.

We can see that the 8th and 9th inputs, which are the last two SWIR bands of ASTER, did not contribute much in either the supervised or the unsupervised network; in particular the last SWIR band (the 9th input) returned 0 in the sensitivity analysis, meaning it did not count in the learning process at all. For the Landsat TM data the inputs have effect and normalized effect values very close to 1, which shows that all bands of Landsat TM and the
NDVI contributed in more or less the same proportion to the learning process, in both the supervised and unsupervised networks. This indicates that removing any input from this input vector would affect the accuracy of the output, since it would remove information used to classify the image.
5 Conclusions
Neural networks gave good results for land cover classification of the ASTER and Landsat TM images, considering that only spectral information was used for the classification. Both the supervised and unsupervised neural network classifiers confirmed that noise reduction in the input data affects image classification accuracy significantly. It is difficult to compromise between data quality and quantity, since the more quality we demand, the more data we have to remove as noise. Noise reduction might thus lead to the removal of valuable data or information that could have been useful in the classification process; therefore, careful examination of the input data is very important before deciding on noise reduction.
The power of neural networks to extract information from multi-scale and multi-spectral datasets, in order to produce a better classification result, was observed in both the supervised and unsupervised classifiers. It was possible to use the TIR band of Landsat TM and the SWIR bands of the ASTER image, which have a different ground resolution from the rest of the bands of the images. The information extracted from the thermal band increased the accuracy of the image classification significantly.
The back propagation supervised neural network proved to be a more accurate classifier than the Kohonen/LVQ unsupervised classifier. The comparison showed that the supervised neural network gave better class results for image classification; however, this is not sufficient ground to conclude that unsupervised neural networks are not useful for image classification, because neural networks may fail to converge to the pattern into which the data should be classified when there is not enough information in the input vector. In principle, the presence of more data, such as a DEM, soil type and other thematic information, increases the accuracy of unsupervised network image classification.
Research question: Is there a significant difference in the accuracy of supervised neural network classification and unsupervised neural network classification?
In summary, the unsupervised neural network gave the most inaccurate results, which cannot be used for image classification of this particular dataset and area. However, it would be premature to conclude that unsupervised classification fails in general, since many other factors, such as the availability of ancillary data, affect the accuracy of unsupervised neural network classification. Fauzi et al. (2001) showed that adding ancillary information, such as a digital elevation model (DEM), increases classification accuracy; this confirms that a DEM is a valuable input that provides additional information to improve the accuracy of a neural network image classifier. Unfortunately, due to the unavailability of DEM data, it was not possible to test the difference it would make in the classification process for our study area. The supervised neural network resulted in a highly accurate classification, which can successfully be used for making the land cover map of the study area. To answer the first research question of this study: the results of supervised and unsupervised neural network classification differ significantly. The supervised neural network classifier proved to be robust, generalizing and accurate, as illustrated in Table 22.
Table 22: Accuracy of supervised and unsupervised neural network classifiers

Class | Class accuracy (%) Supervised, six classes | Class accuracy (%) Unsupervised, six classes | Class accuracy (%) Supervised, five classes
Agriculture | 93 | 54.21 | 89.00
Forest | 98.1 | 53.40 | 100.00
Settlement | 0 | 0.83 | -
Shrub | 96.6 | 57.78 | 94.71
Swamp | 78.4 | 10.00 | 77.78
Water | 74.2 | 90.16 | 93.33
Overall accuracy | 85.13 | 47.15 | 90.69
Research Question: Which type of neural networks classification; supervised or
unsupervised, will handle poor quality data better?
Both networks indicated that less noisy data generalizes faster and with better accuracy. This was seen in the classification of the ASTER image: both the back propagation and the KWTA/LVQ networks gave an increased correct training percentage after the last two SWIR bands1 were removed from their input vector. The ability to utilize multi-scale data in order to extract information that maximizes classification accuracy was also seen in both networks with the Landsat TM training data: both networks gave a better correct training percentage for the input with the thermal band included than for the input without it.
When it comes to handling a poor quality dataset, the unsupervised neural network appears to have been less affected than the supervised network. The additional classification carried out without the settlement class showed this clearly: the result of the back propagation network improved after the settlement class was removed, while this did not change or affect the Kohonen/LVQ network, since it had already learned all the information it could in unsupervised mode and was not forced to recognize a pattern imposed by a desired output. As a result, it could not perform better after what were considered poor training pairs (the training points representing the 'settlement' class) were removed (see Table 18). This can be explained by the learning mechanisms of the two networks. In supervised classification a set of desired outputs is provided, which is supposed to correspond to the input data in a certain pattern. Supervised neural networks learn2 in such a way that their error is propagated back after every iteration, so that their output comes to resemble the desired output according to a transfer function pre-assigned to the network. This restricts the network from recognizing any existing pattern or relationship that is not supported by the provided desired output. Even though having training data generally helps to obtain a more accurate image classification, this does not always hold true, since the training data might be unavailable or unreliable. In the former case, unsupervised neural networks can be offered as an alternative; in the latter case, unsupervised neural networks can be used to test the reliability of the training data. This answers the second research question: unsupervised neural networks can utilize poor or less correlated data better than supervised networks.
1 Why the last two SWIR bands are considered noise is explained in Chapter 4.
2 The learning mechanisms of the supervised and unsupervised networks are given in detail in Chapter 2.
6 Recommendation
The neural network classification (both supervised and unsupervised) undertaken in this
study used only spectral information from satellite images. Since neural networks
can accommodate different GIS and ancillary data in the image classification process,
the classification accuracy for the datasets could be improved by using additional information,
such as a Digital Elevation Model (DEM), a soil map or a geology map, during the image
classification process.
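As a minimal sketch of this recommendation, ancillary GIS layers can simply be appended as extra columns of the network's input matrix. The band values and elevations below are made-up numbers; the mean/SD normalization mirrors the preprocessing applied to the spectral inputs in this study.

```python
# Hypothetical sketch: appending a DEM layer to spectral inputs before
# classification. All values are invented for illustration.
import numpy as np

bands = np.array([[0.34, 0.28, 0.40],   # spectral features per pixel
                  [0.31, 0.25, 0.38]])
dem = np.array([[1800.0],               # elevation per pixel (metres)
                [2450.0]])

# Append the DEM as one more input column, then normalise each feature
# to zero mean and unit standard deviation.
features = np.hstack([bands, dem])
features = (features - features.mean(axis=0)) / features.std(axis=0)
```

The same pattern would extend to rasterized soil or geology classes, at the cost of the longer data preparation time noted below.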
Another important point, which could not be covered in this study, is the incorporation of
the five thermal bands of ASTER into the neural network classification. It was not
possible to use all 14 bands of the ASTER image: because of the large size of the study
area, adding more input layers would have slowed down data preparation and
processing beyond the time available for this study. The thermal bands of ASTER
might yield higher classification accuracy in both the supervised and unsupervised cases.
The application of fuzzy logic in the neural network image classification process has
been shown to increase classification accuracy in other studies (Abuelgasim et al., 1999).
The application of such systems is therefore expected to increase the accuracy of the neural network
classifiers significantly. More research has to be done in this area to find out whether
fuzzy logic/systems would increase the accuracy of image classification for the datasets
used here.
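A minimal, hypothetical sketch of the fuzzy idea (not the method of Abuelgasim et al.): instead of assigning each pixel to a single hard winner class, the network's class activations are turned into membership grades that sum to one, so mixed pixels can be represented.

```python
# Illustrative only: convert made-up class activations into fuzzy
# membership grades per land-cover class.
def fuzzy_memberships(activations):
    """Normalise activations so they sum to 1 and can be read as grades."""
    total = sum(activations)
    return [a / total for a in activations]

# e.g. activations for three hypothetical classes: forest, water, cropland
m = fuzzy_memberships([0.9, 0.3, 0.6])
hard_class = max(range(len(m)), key=lambda i: m[i])  # the crisp fallback
```

A hard classifier keeps only `hard_class`; the fuzzy grades in `m` retain how ambiguous the pixel actually was.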
Finally, detailed land cover maps and other kinds of GIS data, such as soil and hydrology
maps, of the areas adjacent to the study area should be produced in order to have
sufficient geographic information, which can be used for sustainable resource
management and development planning of the area.
References
Abuelgasim, A.A., Ross, W.D., Gopal, S. and Woodcock, C.E. 1999. "Change Detection
Using Adaptive Fuzzy Neural Networks: Environmental Damage Assessment after the
Gulf War". Remote Sensing of Environment 70: 208-223. Elsevier Science Inc., NY.
Alavi, F. (2002). "A Survey of Neural Networks: Part I."
http://www.iopwe.org/jul97/neural1.html.
Anderson, D. and McNeill, G. (1992). Artificial Neural Networks Technology. Utica,
N.Y.: Kaman Sciences Corporation.
Badran, F., M., C. and Crepon, M. Remote sensing operations. Paris, France: CEDRIC.
Berberoglu, S., Lloyd, C.D., Atkinson, P.M. and Curran, P.J. 2000. "The Integration of
Spectral and Textural Information Using Neural Networks for Land Cover Mapping in the
Mediterranean".
Chen, Z. (2000). Data Mining and Uncertain Reasoning: an Integrated Approach. New
York: John Wiley & Sons, Inc.
Fauzi, A., Hussin, A.Y. and Weir, M. 2001. "A Comparison Between Neural Networks and
Maximum Likelihood Remotely Sensed Data Classifiers to Detect Tropical Rain Logged-over
Forest in Indonesia". 22nd Asian Conference on Remote Sensing, Singapore.
Gahegan, M., German, G. and West, G. (1999). "Improving Neural Network Performance on
the Classification of Complex Geographic Datasets." Journal of Geographical Systems 1: 3-22.
Kulkarni, A.D. and Lulla, K. 1999. "Fuzzy Neural Network Models for Supervised
Classification: Multispectral Image Analysis". Geocarto International, Vol. 14, No. 4.
Geocarto International Centre, Hong Kong.
Kumar, M. and Srinivas, S. 2001. "Unsupervised Image Classification by Radial Basis
Function Neural Network (RBFNN)". 22nd Asian Conference on Remote Sensing,
Singapore.
Luo, J. and Tseng, D. 2000. "Self-Organizing Feature Map for Multi-Spectral SPOT Land
Cover Classification". GISdevelopment.net, Taiwan.
Logical Designs. 1996. ThinksPro: Neural Networks for Windows, User's Guide.
REF1 (2002). Neuro-Fuzzy Systems.
http://www.cs.berkeley.edu/~anuernb/nfs/, University of California at Berkeley.
REF2 (1999). Landsat Thematic Mapper.
http://edc.usgs.gov/glis/hyper/guide/landsat_tm#tm8.
REF3 (2002). ASTER. http://asterweb.jpl.nasa.gov/.
REF4 (2000). Neusciences, Intelligent Technologies.
http://www.neusciences.com/technologies/intelligent_technologies.htm.
REF5 (2002). Supervised and Unsupervised Neural Networks.
http://www.gc.ssr.upm.es/inves/ann1/concepts/Suunsupm.htm.
Roy, A. (2000). "Artificial Neural Networks - A Science in Trouble." SIGKDD
Explorations 1(2): 33-38.
Vani, K. 2000. "Fusion of ASTER Image Data for Enhanced Mapping of Land Cover
Features". GISdevelopment.net.
http://www.gisdevelopment.net/application/environment/pp/envp0005pf.htm.
Vassilas, N., Perantonis, S., Charou, E., Varoufakis, S. and Moutsoulas, M. 2000. "Neural
Networks for Fast and Efficient Classification of Multispectral Remote Sensing Data". 5th
Hellenic Conference on Informatics, University of Athens, Greece.
Wudneh, T. (1998). Biology and Management of Fish Stocks in Bahir Dar Gulf, Lake
Tana, Ethiopia. Wageningen Institute of Animal Science. Wageningen: Wageningen
University. 144 pp.
Appendices
Appendix 1: Dataset Projection Information

All the datasets used in this study are map-registered to the Transverse Mercator projection.

Projection: Transverse Mercator
Datum: Adindan (30th Arc)
Spheroid: Clarke 1880 (Modified)
Unit of Measurement: Meter
Meridian of Origin: 39°00' East of Greenwich
Latitude of Origin: Equator
False Easting: 500,000 m
False Northing: 0 m (nil northing)
Scale Factor at Origin: 0.9996
Grid: U.T.M. Zone 37
Appendix 2: Results of Input Sensitivity Analysis
2.1 Sensitivity analysis of the ASTER dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible green           0.343842   1.264904
2  Visible Red             0.282187   1.038093
3  Infrared                0.401328   1.476381
4  SWIR 1                  0.32859    1.208798
5  SWIR 2                  0.26671    0.981157
6  SWIR 3                  0.277654   1.021417
7  SWIR 4                  0.400053   1.471692
8  SWIR 5                  0.114453   0.421041
9  SWIR 6                  0          0
10 NDVI                    0.303505   1.116517

2.2 Sensitivity analysis of the Landsat TM dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible blue            0.609463   0.958399
2  Visible green           0.624913   0.982694
3  Visible Red             0.740876   1.165049
4  Near infrared           0.514623   0.80926
5  Mid infrared            0.699164   1.099457
6  TIR                     0.67013    1.053799
7  Mid infrared            0.646624   1.016836
8  NDVI                    0.581552   0.914508

2.3 Sensitivity analysis of the combination of ASTER and Landsat TM dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible blue            0.488368   0.990814
2  Visible green           0.564402   1.145074
3  Visible Red             0.592826   1.202741
4  Near infrared           0.451129   0.915263
5  Mid infrared            0.517241   1.049392
6  TIR                     0.628186   1.27448
7  Mid infrared            0.558333   1.132761
8  NDVI (from Landsat TM)  0.395887   0.803186
9  Visible green           0.549178   1.114187
10 Visible Red             0.525436   1.066019
11 Infrared                0.396506   0.804442
12 SWIR 1                  0.492817   0.99984
13 SWIR 2                  0.431338   0.87511
14 SWIR 3                  0.442284   0.897318
15 SWIR 4                  0.407337   0.826416
16 NDVI (from ASTER)       0.445065   0.902959
2.4 Sensitivity analysis of the ASTER dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible green           0.37995    1.505496
2  Visible Red             0.310228   1.229232
3  Infrared                0.383452   1.519371
4  SWIR 1                  0.23123    0.916215
5  SWIR 2                  0.258523   1.02436
6  SWIR 3                  0.278438   1.103269
7  SWIR 4                  0.278438   1.103269
8  SWIR 5                  0.115615   0.458108
9  SWIR 6                  0          0
10 NDVI                    0.287879   1.140679

2.5 Sensitivity analysis of the Landsat TM dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible blue            0.370968   1.002597
2  Visible green           0.355568   0.960977
3  Visible Red             0.301042   0.81361
4  Near infrared           0.436122   1.178687
5  Mid infrared            0.341115   0.921915
6  TIR                     0.433549   1.171733
7  Mid infrared            0.387201   1.046469
8  NDVI                    0.334491   0.904012

2.6 Sensitivity analysis of the combination of ASTER and Landsat TM dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible blue            0.277849   1.085128
2  Visible green           0.245799   0.95996
3  Visible Red             0.19504    0.761721
4  Near infrared           0.275828   1.077236
5  Mid infrared            0.236521   0.923722
6  TIR                     0.229315   0.895582
7  Mid infrared            0.236521   0.923722
8  NDVI (from Landsat TM)  0.21155    0.826202
9  Visible green           0.302894   1.182941
10 Visible Red             0.252535   0.986265
11 Infrared                0.28774    1.123756
12 SWIR 1                  0.291602   1.138841
13 SWIR 2                  0.295414   1.153728
14 SWIR 3                  0.304735   1.190133
15 SWIR 4                  0.234143   0.914438
16 NDVI (from ASTER)       0.21934    0.856625
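The "Effect normalized" column in these tables appears to be each input's effect divided by the mean effect over all inputs of that dataset; the sketch below checks that relationship against the ASTER/Back Propagation figures in Table 2.1.

```python
# Check the apparent relationship: normalized effect = effect / mean(effects).
# Effect values are taken from Table 2.1 (ASTER inputs, Back Propagation).
effects = [0.343842, 0.282187, 0.401328, 0.328590, 0.266710,
           0.277654, 0.400053, 0.114453, 0.000000, 0.303505]

mean_effect = sum(effects) / len(effects)
normalized = [e / mean_effect for e in effects]
# e.g. normalized[0] comes out close to the tabulated 1.264904
```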
Appendix 3: Neural network parameters used
3.1 Parameters used for the Back Propagation neural network classifier

Case | Architecture | Error type | Batch size | Input layer | Input preprocessing | Sample arrangement | Hidden layers | Hidden nodes | Max nodes | Hidden layer rule/input/transfer | Output nodes | Output layer rule/input/transfer
1    | MNFF         | MAE        | 1          | 10          | (Mean/SD)           | Normal             | 1             | 21           | 21        | BPN/RBF/Sigmoid                  | 6            | BPN/DP/Sigmoid
2    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
3    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
4    | MNFF         | MAE        | 1          | 7           | (Mean/SD)           | Normal             | 1             | 15           | 15        | BPN/RBF/Sigmoid                  | 6            | BPN/DP/Sigmoid
5    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
6    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | BPN/DP/Sigmoid                   | 5            | BPN/DP/Sigmoid
3.2 Parameters used for the Kohonen/LVQ neural network classifier

Case | Architecture | Error type | Batch size | Input layer | Input preprocessing | Sample arrangement | Hidden layers | Hidden nodes | Max nodes | Hidden layer rule/input/transfer | Output nodes | Output layer rule/input/transfer
1    | MNFF         | MAE        | 1          | 10          | (Mean/SD)           | Normal             | 1             | 21           | 21        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
2    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | KWTA/RBF/WTA                     | 6            | LVQ/RBF/WTA
3    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
4    | MNFF         | MAE        | 1          | 7           | (Mean/SD)           | Normal             | 1             | 15           | 15        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
5    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
6    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | KWTA/L2/WTA                      | 5            | LVQ/RBF/WTA