Centre for Geo-Information
Thesis Report GIRS-2003-08
A Comparison Assessment Between Supervised and Unsupervised Neural
Network Image Classifiers
Author: Senait Dereje Senay
Supervisor: Dr. Monica Wachowicz
WAGENINGEN UR
January 2003
Center for Geo-Information
Thesis Report GIRS-2003-08
A Comparison Assessment Between Supervised and Unsupervised Neural
Network Image Classifiers
Senait Dereje Senay
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in
Geo-information Science at Wageningen University and Research Center
Supervisor: Dr. Monica Wachowicz
Examiners: Dr. Monica Wachowicz, Drs. A.J.W. de Wit, Dr. Ir. Ron van Lammeren
January 2003
Wageningen University
Center for Geo-Information and Remote Sensing Department of Environmental Sciences
To my parents: Lt.Col. Yeshihareg Chernet and Ato Dereje Senay,
And my brother: Daniel Dereje
Thank you for everything you have been to me.
Acknowledgements
I am indebted to my supervisor, Dr. Monica Wachowicz, who gave me continuous professional support during all stages of the thesis; I would like to sincerely thank her for the invaluable advice and support she gave me. I am very grateful to Dr. Gerrit Epema and Dr. Ir. Ron van Lammeren, who helped me in facilitating the field trip to the study area for ground control point collection, and of course for the continuous moral support I received from Dr. Gerrit Epema. I would sincerely like to thank Dr. Gete Zeleke and Mr. Meneberu Allebachew, who helped me by arranging a vehicle and other necessary data and support while I was in the study area for field data collection; without their help the field trip would not have been successful at all. I would also like to express my gratitude to Mr. Wubshet Haile and Mr. Getachew, who assisted me throughout the field work, enabling me to finish it within the very limited time I had. I would like to extend my heartfelt thanks to Mr. John Stuiver and Drs. Harm Bartholomeus, who supported me whenever I needed professional help in pre-processing the data; without their support the data preprocessing stage of my thesis would definitely have taken more time. I would also like to thank the cartography section of Alterra, who helped me in printing and scanning the maps used in producing the report as well as in the analysis. I would like to extend my heartfelt thanks to my friends Achilleas Psomas (Ευχαριστώ) and Krzysztof Kozłowski (Dziękuję) for all the friendly moral support and invaluable friendship; thanks for making my stressful days easier. I gratefully thank Dawit Girma for all the help I got whenever I needed it. I would also like to show my gratitude to my uncle, Mr. Tesfasilassie Senay, for providing me with a family atmosphere while I was on the field trip.
I would not pass without expressing my gratitude and sincere thanks to my friends Giuseppe Amatulli (Grazie), Mauricio Labrador-Garcia and Sonia Barranco-Borja (Gracias), Nicolas Dosselaere (Dank u wel), Izabela Witkowska (Dziękuję), Fanus Woldetsion (Yekenyeley), and Adrian Ducos (Merci) for creating a pleasant working atmosphere, and much more, which helped us during the difficult, and unforgettable, times of working on the thesis. Αντε, I wish you all the best in the future. Last but not least, I would like to extend my admiration to the whole GRS 2001 batch for the respect and friendship between us; I wish you all the best, nothing but the best. It has been an honor and a pleasure to know you. Finally, I would like to extend my heartfelt thanks to NUFFIC for covering my study costs and offering me this experience.
Abstract
Neural networks are a recently emerged technology that developed as part of artificial
intelligence, and they are used to solve complex problems in various disciplines. The
application of neural networks in remote sensing, particularly in image classification, has
become very popular in the last decade. The motivation to use neural networks arose from
the limitations of conventional parametric image classifiers, as the source, data structure,
scale, and amount of remotely sensed data became highly varied. Neural networks were
found to compensate for the drawbacks these conventional classifiers have in image
classification. Neural networks offer two kinds of image classification, supervised and
unsupervised. In this study, both kinds of neural networks were tested to evaluate which
yields a more accurate image classification and which method handles poor-quality data
better. Finally, a land cover map of the southern part of the Lake Tana area, situated in
the north-western part of Ethiopia, is produced with the best classifier.
Key words: Neural networks, neuron, ANN, KWTA, LVQ, BP, image classification
Abbreviations
ANN     Artificial Neural Networks
ASTER   Advanced Spaceborne Thermal Emission and Reflection Radiometer
BP      Back Propagation
KWTA    Kohonen's Winner Take All
LVQ     Learning Vector Quantization
MIR     Middle Infrared
MLNFF   Multi-Layer Normal Feed Forward
MLP     Multi-Layer Perceptron
NN      Neural Networks
NDVI    Normalized Difference Vegetation Index
SOFM    Self-Organizing Feature Maps
SWIR    Short Wave Infrared
TIR     Thermal Infrared
VNIR    Visible and Near Infrared
WTA     Winner-Take-All
Table of Contents
Acknowledgements
Abstract
Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Background
1.2 Study area
1.3 Objectives
1.4 Research questions
1.5 Research outline
2 Artificial Neural Networks (ANN)
2.1 Overview of the main concepts
2.1.1 Biological concepts
2.1.2 Historical development
2.1.3 Basic neural network processor
2.1.4 Neural networks and image classification
2.2 Types of neural networks
2.2.1 Supervised neural network classifiers
2.2.1.1 Description of supervised neural network classifiers
2.2.1.2 Architecture and algorithm
2.2.2 Unsupervised neural network classifiers
2.2.2.1 Description of unsupervised neural network classifiers
2.2.2.2 Architecture and algorithm
3 Methodology
3.1 Field data acquisition
3.2 Data preprocessing
3.2.1 Datasets
3.2.1.1 ASTER
3.2.1.2 Landsat TM
3.2.2 Datasets preparation
3.2.3 Training and test sets preparation
3.3 Supervised neural network classification
3.4 Unsupervised neural network classification
3.5 Accuracy assessment and validation
3.6 Sensitivity analysis
3.7 Implementation aspects
4 Results and Discussion
4.1 Accuracy of back propagation classifier trained with ASTER or Landsat TM datasets
4.2 Accuracy of back propagation classifier trained with ASTER and Landsat TM input dataset
4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat datasets
4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets
4.5 Validation of the results obtained from the back propagation supervised neural network classifier
4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier
4.7 Improving the training data quality
4.8 Sensitivity analysis
5 Conclusions
6 Recommendations
References
Appendices
Appendix 1: Dataset projection information
Appendix 2: Results of input sensitivity analysis
Appendix 3: Neural network parameters used
List of Figures
Figure 1: Map of Ethiopia
Figure 2: Overview of the study area
Figure 3: Signal path of a single human neuron
Figure 4: The basic neural networks processor; the neuron, and its functions
Figure 5: Design of the Multi-Layer Normal Feed Forward (MLNFF) architecture
Figure 6: A Kohonen self-organizing grid with a 2-dimensional output layer
Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer
Figure 8: Design of the Learning Vector Quantization architecture
Figure 9: Overview of the main procedures involved in the methodological process
Figure 10: ASTER bands superimposed on a model atmosphere
Figure 11: Landsat TM bands superimposed on a model atmosphere
Figure 12: Study area after Lake Tana is masked out of the image
Figure 13: Spectral signatures of the six classes (before ASTER image rescaling)
Figure 14: Spectral signatures of the six classes (after ASTER image rescaling)
Figure 15: Training/test data preparation procedure
Figure 16: Design of the Back Propagation neural network
Figure 17: The design of the KWTA/LVQ network
List of Tables
Table 1: Spectral range of bands and spatial resolution for the ASTER sensor
Table 2: Spectral range of bands and spatial resolution for the TM sensor
Table 3: Training and test sets for the ASTER dataset
Table 4: Training and test sets for the Landsat TM dataset
Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets
Table 6: Training data set-up for the Back Propagation neural network
Table 7: Training data set-up for the Kohonen Winner Take All/LVQ network
Table 8: Accuracy of the back propagation classifier using ASTER data
Table 9: Accuracy of the back propagation classifier trained with Landsat TM data
Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets
Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data
Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data
Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data
Table 14: Confusion matrix for the back propagation network classification using ASTER and Landsat TM images
Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data sources
Table 16: Confusion matrix for the unsupervised network
Table 17: Percentage accuracy of the six classes of the unsupervised classified image
Table 18: Classification result for ASTER-Landsat TM data into 5 classes
Table 19: Confusion matrix for the supervised classification with five output classes
Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)
Table 21: Result of the sensitivity analysis of the ASTER dataset
Table 22: Accuracy of supervised and unsupervised neural network classifiers
1 Introduction
1.1 Background
Artificial Neural Networks (ANNs) are systems that make use of some of the known or
expected organizing principles of the human brain. They consist of a number of
independent, simple processors, the neurons, which communicate with each other through
weighted connections (REF1, 2002)1. The study of neural networks is also referred to as
Artificial Neural Networks or connectionism (Roy, 2000). The use of artificial neural
networks for various applications is becoming common nowadays. The benefits of this
developing technology range from reduced subjectivity in our analyses to full automation
of processes, so that less manual intervention is needed.
Neural networks are a fairly new technology, though their basis dates back to the 1940s,
when Warren McCulloch, a neurophysiologist, and Walter Pitts, a young mathematician,
wrote a paper on how neurons might work, explaining their model of a simple neural
network built with electrical circuits (Anderson et al., 1992). Ever since, the science has
faced many obstacles before becoming as widely applied as it is today. The idea of
imitating the structure of the human brain, i.e. its neurons, in order to invent thinking
machines was raised as a moral issue in the 1970s. Much criticism was directed at the
development of this science, concerning how the development of neural networks would
affect human beings; people were concerned about what the world would look like with
machines doing everything man can do. These movements ended up reducing much of the
funding assigned to the development of the science, thereby slowing the pace of
development of neural networks (Anderson et al., 1992).
However, this did not last long, and interest was renewed when scientists showed that the
aim of neural networks is not simply to model brains, but to use them in a way that makes
our way of life easier, in terms of computation, analysis in different applications, and
less manual involvement in complex processes. This gave a promising lead to the neural
networks of today.

1 (Ref #) refers to references taken from the Internet; the path to the sites is listed in the reference section.
Many fields apart from artificial intelligence and computer science attempt to make use
of neural networks, including data warehousing, data mining, robotics, wave physics,
remote sensing, and GIS (Roy, 2000). Although remote sensing is not one of the primary
fields using ANNs for analysis, neural networks have recently been applied in several
areas of GIS and remote sensing. Some of the most common applications include data
fusion, land suitability evaluation, spatio-temporal analysis, and land cover
classification of satellite images. However, neural networks have not fully replaced the
conventional ways of analysis in these applications; they are still being evaluated,
since the technology has not been exhaustively tested on the whole range of remote
sensing data.
Neural networks are of special interest to today's remote sensing, where the problem is
no longer the absence or insufficiency of data, but the accumulation of multi-scale,
multi-source, and multi-temporal data. It is highly important to incorporate the
information found in these data, originating from different media and scales, in order to
achieve a better, higher-accuracy classification. There are some limitations in using
conventional parametric (statistical) classifiers like maximum likelihood, such as the
need for a normal (Gaussian) distribution in the data, the lack of flexibility in the
classification process, the inability to deal with multi-scale data without first
standardizing the data to the same scale, and the inefficiency of the image
classification process in terms of time. These limitations motivated scientists to look
for alternatives in which these drawbacks could be overcome. Neural networks were found
to be one of the soundest choices, since they are very appropriate for image
classification due to their processing speed, ease of dealing with high-dimensional
spaces, robustness, and ability to deal with a varied array of data regardless of
variation in statistical distribution, scale, and type (Vassilas et al., 2000).
Like parametric classifiers, neural classifiers offer two kinds of classification,
supervised and unsupervised; both have their own advantages and disadvantages with
regard to image classification. The advantage of one over the other depends on the type
of data available, on time, and on expertise.

In this study, ANNs are used in the processing and classification of multispectral
remote sensing data. The study aims at investigating the difference in accuracy between
supervised and unsupervised neural network classifiers, and at evaluating the
significance of the difference between the two classifiers.
1.2 Study area
The study area is located in the north-western part of Ethiopia. In an administrative
sense, the area lies in the Amhara Regional State, between the Gojam and Gonder
provinces. The area is of high importance in terms of irrigation, hydroelectric power,
and tourism. Its importance became indispensable especially after 1992, when Ethiopia
became landlocked; since then, all fish resources have come only from inland water
bodies. Wudneh (1998) stated that Lake Tana is the least exploited fish resource in the
country; he also explained that the reasons are the bad road connection with the capital
city, Addis Ababa, and the absence of the highly marketable fish species, Nile perch, in
this lake.
The study area has some patches of forest; although not very big, these forest areas
fulfill the fuel-wood demand of Bahir Dar, the second largest city of Ethiopia. Wild
coffee production is also an essential economic activity in the forested areas, where
considerable human encroachment has recently been noted. With the expanding fishing
industry, the high population growth rate, and the deforestation of the meager forest
resource remaining in the area, degradation of this resourceful area can easily be
forecast unless management intervention is employed. In order to manage the area in a
sustainable manner, basic geographic information about the area is very important; this
study will provide a basic land cover map for the area.
Figure 1: Map of Ethiopia
Map copyright by Hammond World Atlas Corp. #12576
Figure 2. Overview of the study area
The study area
Lake Tana
1.3 Objectives
The main goal of this study is three-fold:
- to investigate the advantages and disadvantages of supervised and unsupervised
neural network classifiers in the field of remote sensing, in particular for land cover
classification of multispectral and multi-scale satellite images;
- to evaluate the difference in accuracy between supervised and unsupervised neural
network image classification of multispectral and multi-scale satellite images;
- to produce a land cover map of the study area, located in the Amhara Regional
State, Ethiopia.
1.4 Research questions
Is there a significant difference in the accuracy of supervised neural network
classification and unsupervised neural network classification?

Which type of neural network classification, supervised or unsupervised, handles
poor-quality data better?
1.5 Research outline
Chapter 1: gives an introduction to the main theme of the thesis; it also describes the
study area, the objectives, and the research questions.
Chapter 2: describes the basic concepts of neural networks, including the biological
background, the historical development, and the basic neural network processor.
Chapter 3: covers the methodological aspects of the study; detailed procedures for the
neural network classification process are given.
Chapter 4: reports the results obtained from the data analysis and processing; it also
includes a discussion of the results.
Chapter 5: contains the conclusions drawn from the results obtained.
Chapter 6: gives recommendations on how the results from this study can be improved
and/or applied.
2 Artificial Neural Networks (ANN)
2.1 Overview of the main concepts
2.1.1 Biological concepts
The major source of inspiration for the creation of artificial neural networks is the
human brain, arguably the most powerful computing engine in the known universe. From
both a computational and an energy perspective, the brain has an enormously efficient
structure (Alavi, 2002). The most basic element of the human brain is a specialized cell
called the neuron. The brain consists of some 100 billion neurons that are massively
interconnected by 'synapses' (estimated at about 60 trillion) and that operate in
parallel.

In order to understand the basic operation of the brain, it is necessary to know the
neuron in detail. This was originally undertaken by neurobiologists, but has lately
become an interest of physicists, mathematicians, and engineers. As background to this
study, it is enough to note that each neuron consists of a nucleus surrounded by a cell
body (soma), from which extends a single long fiber (axon) that branches eventually into
a tree-like network of nerve endings, which connect to other neurons through further
synapses. This is illustrated in Figure 3.
Figure 3: Signal path of a single human neuron
Information is transmitted from one neuron to another by a complex chemical process,
based on sodium-potassium flow dynamics, whose net effect is to activate an electrical
impulse (action potential) that is transmitted down the axon to other cells. When this
happens, the neuron is said to have fired. Firing only occurs when the combined voltage
impulses from the preceding neurons add up to a certain 'threshold' value. After firing,
the cell needs to 'rest' for a short time (the refractory period) before it can fire
again (Alavi, 2002).

The brain is understood to use massively parallel computation, in which each computing
element in the system (the neuron) performs a very simple computation (Roy, 2000).
Artificial Neural Networks are based on this same understanding: each node (the analogue
of the neuron in our brain) performs a simple computation, but builds up complex
parallel computations together with the other neurons.
2.1.2 Historical development
The story of neural networks can be traced back to a scientific paper by McCulloch and
Pitts, published in 1943, that described a formal calculus of networks of simple
computing elements (Anderson et al., 1992). Many of the basic ideas they developed
survive to this day. The next big development in neural networks was the publication in
1949 of the book The Organization of Behavior by Donald Hebb (Alavi, 2002). Hebb argued
that if two connected neurons are simultaneously active, then the connection between
them should strengthen proportionally, which means that the more frequently a particular
neural connection is activated, the greater the weight between the neurons becomes. This
has implications for machine learning, since tasks that have been better learnt have a
much higher frequency (or probability) of being accessed. It gave a clear definition to
the learning process by indicating that learning occurs through the readjustment of the
weight connections between neurons.
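Hebb's rule can be illustrated with a short sketch. The update function, learning rate, and input pattern below are illustrative assumptions for this report, not part of the networks used in this study.

```python
# Illustrative sketch of Hebb's rule: the weight between two neurons
# grows in proportion to how often they are active together.
# The learning rate (eta) and the input pattern are arbitrary examples.

def hebbian_update(w, x, y, eta=0.1):
    """Strengthen each weight by eta * (pre-synaptic activity * post-synaptic activity)."""
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

weights = [0.0, 0.0, 0.0]
pattern = [1, 0, 1]          # pre-synaptic activities
post = 1                     # post-synaptic activity (the neuron fired)

for _ in range(3):           # the same pattern is presented three times
    weights = hebbian_update(weights, pattern, post)

print(weights)               # only the connections that were active together have grown
```

Each presentation of the pattern strengthens only the connections whose inputs were active together with the firing neuron, which is exactly the proportional strengthening Hebb described.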
In the late 1950s, Rosenblatt (Alavi, 2002) developed a class of neural networks called
the perceptron. He furthermore introduced the idea of 'layered' networks; a layer is
simply a one-dimensional array of artificial neurons. Most current problems to which
ANNs are applied use multi-layer networks with different kinds of interconnections
between the layers. The original perceptron, however, was simply a one-layer
architecture (Chen, 2000). As a result, this architecture could not deal with most of
the complex problems in the various fields of study that now use neural networks.
Furthermore, Rosenblatt developed a mathematical proof, the Perceptron Convergence
Theorem, which showed that the algorithms for learning (or weight adjustment) would not
lead to ever-increasing weight values under iteration (Alavi, 2002). However, this was
followed by a demonstration in 1969, by Minsky and Papert, of a class of problems that
the perceptron could not solve. This work led to a considerable downsizing of interest
in neural networks, which was to continue until the early 1980s (Alavi, 2002).
In 1982, John Hopfield, a Nobel Prize-winning Caltech physicist, developed the idea of
'recurrent' networks, i.e. networks that have self-feedback connections. The Hopfield
net, as it has come to be known, is capable of storing information in dynamically stable
networks and of solving constrained optimization problems. Shortly afterwards, the back
propagation algorithm showed that it was possible to train a multi-layer neural
architecture using a simple iterative procedure. These two events have proved to be the
ones most responsible for the revival of interest in neural networks in the 1980s, up to
the explosive growth of the field that is today shared between physicists, engineers,
computer scientists, mathematicians, and even psychologists and neurobiologists (Alavi,
2002; Anderson et al., 1992; Roy, 2000).
All in all, the science of neural networks faced a lot of ups and downs before evolving
to its present state; the fact that it involves modeling the human brain raised a lot of
moral issues that slowed the pace of its development. The whole historical development
of neural networks is given in the report "Artificial Neural Networks Technology"1.
2.1.3 Basic neural network processor
An artificial neuron is a simple computing element that sums the input from other
neurons. A network of neurons is interconnected by adaptive paths called 'weights'; each
neuron computes a linear sum of the weighted inputs acting upon it and gives an output
depending on whether this sum exceeds a preset threshold value or not (see Figure 4). A
positive value of a weight increases the chance of a 1 and is considered excitatory; a
negative value increases the chance of a zero and is considered inhibitory (real
biological neurons have this property too, but with analogue output values rather than
binary ones) (Alavi, 2002).
Figure 4: The basic neural networks processor; the neuron, and its functions.
The basic functions of each neuron in the network are to evaluate all the input vectors
directed towards the neuron, to calculate the sum of all the inputs, then to compare
that sum to the threshold1 value at the neuron (node), and lastly to determine the
output through the non-linear function provided at the neuron (Chen, 2000). The output
can be an input for the next node in the next layer, or it can simply be the final
output, depending on the architecture and learning rule of the network the neuron
belongs to. These four distinctive functions must be carried out at the neuron level for
the network to learn properly according to the specified function.

1 The report can be found at the site: http://www.dacs.dtic.mil/techs/neural/neural_ToC.html or as the PDF version: ftp://192.73.45.130/pub/dacs_reports/pdf/neural_nets.pdf
Even though the mechanism seems simple at the level of a single neuron, the way all the
neurons interact through weight adjustment in the process of learning makes the whole
set-up complex, which enables networks to solve real, complicated problems in an
organized way. The mathematical representation is given as follows:
( )zy iif= ....................................................................... (1)
θ iijiji xwz −= Σ * ………………………………….. (2)
Where:

i represents a single neuron processing a particular learning task

zi is assumed to be a real-valued input

yi is either a binary or a real-valued output of the ith neuron

f is a non-linear function, also called a node function

wij represents the connection weight, expressing the strength of the xij input

xij are a series of inputs to the ith neuron

θi is the threshold value of the ith neuron (Roy, 2000)
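Equations (1) and (2) can be sketched in a few lines of code. This is a minimal illustration only; the binary step node function below is one possible choice of f, and the weights and threshold are made-up values.

```python
# A minimal sketch of the basic neuron of equations (1)-(2): the output is
# f(z_i), where z_i is the weighted input sum minus the threshold theta_i.
# The binary step function used as f here is an illustrative choice.

def neuron_output(inputs, weights, theta):
    """Compute y_i = f(z_i) with z_i = sum_j(w_ij * x_ij) - theta_i."""
    z = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1 if z > 0 else 0  # fires (1) only when the sum exceeds the threshold

# Example: one excitatory (positive) and one inhibitory (negative) weight
print(neuron_output([1.0, 1.0], [0.8, -0.2], 0.5))  # sum 0.6 > 0.5: fires
print(neuron_output([1.0, 1.0], [0.2, -0.2], 0.5))  # sum 0.0 < 0.5: silent
```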
The most important issues in neural networks are training and network design.
Training involves determining the connection weights (wij) and the threshold values (θi)
from a set of training samples. Network design is concerned with determining the layers,
the number of neurons in each layer, the connectivity pattern between layers and neurons,
and the mode of operation, e.g. feedback vs. feed forward (Roy, 2000).
2.1.4 Neural networks and image classification
Applying ANN technology in remote sensing is a recent development. Among
other reasons, the generation of extremely varied remote sensing data is the main reason
for considering ANNs for image classification.
ANNs offer several advantages for image classification. First, neural
networks are characterized by much faster classification times compared to
conventional statistical image classifiers; although the time it takes to train a network
depends on the size of the training data presented to it, classifying the whole image with
an already trained network is generally much faster than with any other available image
classifier. Another very important feature of neural networks, which is very helpful in
image classification, is the ability to incorporate ancillary and GIS information along with
the spectral information in the classification process. This enables us to use all the
information we have about the area beyond the spectral information from the image,
which increases the accuracy of the classification result. A further major benefit of neural
networks is the possibility of using data of different scales together in a single
classification process. For instance, the thermal band of Landsat TM is not usually used
in maximum likelihood image classification because its ground resolution differs from
that of the visible and infrared bands. This is not a problem for neural networks, since
they have the ability to analyze multi-scale data: neural networks learn the patterns and
relations within the input vector, so the resolution or data structure of each individual
input does not affect the classification process. Similarly, multi-date or temporally
different data can easily be analyzed in a single classification process; the network
maximizes the amount of information used in the classification by learning patterns from
the different-date images. Last but not least, neural networks have the ability to deal with
data regardless of the statistical distribution of the dataset. This distribution-free nature of
neural networks allows us to deal with remote sensing data that do not have a normal
(Gaussian) distribution. Berberoglo et al (1999), Fauzi et al (2001), Kumar et al (2001)
and Luo et al (2000) provide a background to ANNs in a remote sensing context. These
powerful analytical properties of neural networks grant faster, more accurate and more
reliable land cover classifications from remote sensing data compared to the results
obtained from other image classifiers (Kumar et al, 2001).
There are many kinds of neural networks, ranging from the simple perceptron to more
developed multi-layer networks; many of them are used in image classification. Some of
the commonly used neural networks are: Adaptive Resonance Theory (ART), Multi-layer
Perceptron (MLP), Reduced Coulomb Energy (RCE), Radial Basis Function (RBF),
Delta Bar Delta (DBD), Extended Delta Bar Delta (EDBD), Directed Random Search
(DRS), Higher Order Neural Networks (HNN), Self Organizing Map (SOM), Learning
Vector Quantization, Counter-propagation, Probabilistic neural network, Hopfield,
Boltzmann Machine, Hamming network, Bi-directional associative memory,
Spatio-temporal pattern recognition, and many others (Roy, 2000; Anderson et al, 1992).
The uses and advantages of these networks depend on how and for what purpose we want
to use them. The most common application areas are prediction, data association, data
conceptualization, data filtering and classification.
2.2 Types of neural networks
Neural networks are commonly categorized in terms of their training algorithms.
Basically there are three types of neural nets: fixed-weight neural networks,
unsupervised neural networks and supervised neural networks (REF5, 2002). Fixed-weight
NNs are not very common, since no learning is involved. Supervised and
unsupervised NNs both involve training, but with completely different approaches.
Supervised training involves a mechanism of providing the network with the desired
output, either by "manually" grading the network's performance or by providing the
desired outputs together with the inputs (Anderson et al, 1992). Unsupervised training
deals with the NN training itself: the network must recognize the patterns within the
inputs and decide on an outcome without any outside help.
2.2.1 Supervised neural network classifiers
2.2.1.1 Description of supervised neural network classifiers
Supervised learning networks are the mainstream of neural network development;
most applications implement supervised neural networks. In
supervised training, the training data consist of many pairs of input/output training
patterns, i.e., both inputs and outputs are provided. The network analyses the inputs and
compares its result with the desired output given with the input; it can then
calculate the error by comparison. The error is propagated back through the
network and better training takes place. The larger the number of iterations, the smaller
the error of the network becomes. This is known as convergence: the output of
the network converges to resemble the desired output provided.

The network attempts to pick up the pattern by comparing the inputs and desired
outputs, and approximates its outputs to the learned pattern. However, this might not
always happen; there are factors that prevent a network from learning properly, the
most important being insufficient training data, or data that do not
contain the kind of information or pattern needed to solve the problem involved.

A test dataset, which has not been used in the training process, should always be set aside
in order to ascertain the accuracy of the network's learning. Apart from the quality and
quantity of the data, the design or architecture of the network and the learning rule used for
training affect the rate and extent of network learning. Finally, the type and size of the data
to be processed by the neural network is an important factor to consider
when choosing the architecture and learning rule.
2.2.1.2 Architecture and algorithm
The Multi-layer Normal Feed Forward (MLNFF) network, a kind of Multi-layer
Perceptron (MLP), is the most commonly used architecture in supervised neural networks,
though many other architectures can be used. The MLNFF architecture is
reviewed in this section because it is the architecture used in this study.

The MLNFF should have at least three layers: the input layer, the hidden layer and the
output layer. It is very common to have one hidden layer, but the architecture can have
more than one, according to the data size and type and the kind of application it is going
to be used for. There is no upper limit on the number of hidden layers; however, care
should be taken when structuring the network, since too many layers induce
over-learning or memorization of the training data, which makes the network useless
on new data. The basic network design of the MLNFF architecture is shown
below. The more complex the data and the relation between input and output classes, the
greater the number of layers needed to solve the problem.
Figure 5: Design of the Multi-layer Normal Feed Forward (MLNFF) architecture
Source: Carol E. Brown and Daniel E. O'Leary, 1995
Several algorithms can be used with this architecture to perform image classification.
One of the most used is the back propagation algorithm (Kulkarni et al, 1999).
The back propagation learning rule is the most popular, effective and easy-to-learn rule for
complex multi-layered networks; in fact this network is used more than all the other
networks combined (Anderson et al, 1992). Its greatest advantage is the ability to provide
non-linear solutions to problems. Back propagation has been developed by
many researchers over time, hence the representation of the algorithm differs slightly
between sources.
The learning procedure is given as follows.

For a network which has 3 layers, composed of the input, hidden and output layers, let:

L1 represent the input layer

L2 represent the hidden layer

L3 represent the output layer

The number of neurons (processing elements, units) in the input layer equals the
number of input data used. The number of neurons in the output layer equals the
number of land cover classes into which the image is to be classified. There is no clearly
set rule for the number of processing elements to be assigned to the hidden layer; this
study uses Kolmogorov's theorem, which states that the number of neurons in the hidden
layer should be 2N + 1, where N is the number of nodes in the input layer (Rangsaneri et
al, 1998).

The number of neurons assigned to the hidden layer directly affects the performance of
the network. Too many neurons might lead the network to memorize the
training set instead of learning. If memorization takes place, the network will not
generalize the pattern it learned but will only recognize patterns from the training
set, which makes it useless, since it will not be able to classify the whole image
(Anderson et al, 1992).
The net input and output for neurons in layers L2 and L3 are given by

neti = Σj wij outj ……………………………………….. (1)

where neti is the net input of unit i, outj is the output of unit j in the preceding layer, and
wij represents the weight between units i and j.

outi = 1 / [1 + exp(−(neti + ∅))] …………………………. (2)

where outi is the output of neuron i and ∅ is a constant. The network works in two
phases: the training phase and the decision-making phase. During the training phase, the
weights between layers L1–L2 and L2–L3 are adjusted so as to minimize the error
between the desired and the actual output.
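Equations (1) and (2) can be illustrated numerically. This is only a sketch; the weights, previous-layer outputs and the constant ∅ below are made-up values.

```python
# A numeric sketch of equations (1)-(2): the net input is the weighted sum of
# the previous layer's outputs, passed through a sigmoid node function.

import math

def net_input(weights, prev_outputs):
    """Eq (1): net_i = sum_j(w_ij * out_j)."""
    return sum(w * o for w, o in zip(weights, prev_outputs))

def node_output(net, const=0.0):
    """Eq (2): out_i = 1 / (1 + exp(-(net_i + const)))."""
    return 1.0 / (1.0 + math.exp(-(net + const)))

net = net_input([0.4, -0.1, 0.7], [1.0, 0.5, 0.25])  # 0.4 - 0.05 + 0.175
print(round(node_output(net), 3))  # sigmoid of 0.525
```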
The back propagation learning algorithm, adapted from Kulkarni (1999), is described
below.

1. Present a continuous-valued input vector X = (x1, x2, ….., xn)ᵗ to layer L1 and obtain
the output vector Y = (y1, y2, ….., ym)ᵗ at layer L3. In order to obtain the output vector Y,
calculation is done layer by layer from L1 to L3.

2. Calculate the change in weight. In order to do this, the output vector Y is compared
with the desired output vector or target vector d, and the error is then propagated
backward to obtain the change in weight ∆wij that is used to update the weights. ∆wij for
weights between layers L2 and L3 is given by:

∆wij = −∂E/∂wij ……………………………………….. (3)

This can be reduced to

∆wij = α δi oj …………………………………………… (4)

where α is a training rate coefficient (typically 0.01 to 1.0), oj is the output of neuron j in
layer L2, and δi is given by:
δi = [∂F/∂neti] (di − oi) ……………………………. (5)

   = oi (1 − oi)(di − oi)

In equation (5), oi represents the actual output of neuron i in layer L3, and di represents
the target or desired output at neuron i in layer L3. Layer L2 has no target vector, so
equation (5) cannot be used for layer L2. The back propagation algorithm trains hidden
layers by propagating the output error back, layer by layer, adjusting the weights at each
layer. The change in weights between layers L1 and L2 can be obtained as:
∆wij = β δHi oj ……………………. (6)

where β is a training rate coefficient for layer L2 (typically 0.01 to 1.0), oj is the output of
neuron j in layer L1, and δHi is:

δHi = oi (1 − oi) Σk δk wki …………………. (7)

In equation (7), oi is the output of neuron i in layer L2, and the summation term
represents the weighted sum of all δ values corresponding to neurons in layer L3 that are
obtained by using equation (5).
3. Update the weights:

wij(n + 1) = wij(n) + ∆wij ……………………………… (8)

where wij(n + 1) represents the value of the weight at iteration n + 1 (after adjustment),
and wij(n) represents the value of the weight at iteration n.

4. Obtain the error ε for neurons in layer L3:

ε = Σi (oi − di)² ………………………………… (9)

If the error is greater than some minimum εmin (user defined, depending on the accuracy
needed), then repeat steps 2 through 4; otherwise terminate the training process.
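The four steps above can be sketched as a small program. This is a minimal illustration, not the software implementation used in this study: the sigmoid node function, the XOR toy task, the learning rates and the network sizes are assumptions for demonstration, and the threshold terms are folded into a constant bias input.

```python
# A minimal sketch of back propagation steps 1-4 for a 3-layer network.
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, w2):
    """Step 1: propagate the input layer by layer (eqs 1-2, bias folded into x)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w1]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w2]
    return hidden, output

def train_step(x, d, w1, w2, alpha=0.5, beta=0.5):
    hidden, output = forward(x, w1, w2)
    # Step 2, eq (5): output-layer delta_i = o_i(1 - o_i)(d_i - o_i)
    d_out = [o * (1 - o) * (t - o) for o, t in zip(output, d)]
    # Eq (7): hidden-layer delta_Hi = o_i(1 - o_i) * sum_k delta_k w_ki
    d_hid = [h * (1 - h) * sum(dk * w2[k][i] for k, dk in enumerate(d_out))
             for i, h in enumerate(hidden)]
    # Step 3, eqs (4), (6), (8): update the weights
    for k, dk in enumerate(d_out):
        for i, h in enumerate(hidden):
            w2[k][i] += alpha * dk * h
    for i, dh in enumerate(d_hid):
        for j, xj in enumerate(x):
            w1[i][j] += beta * dh * xj

def total_error(w1, w2, data):
    """Step 4, eq (9): summed squared error over the training samples."""
    return sum(sum((o - t) ** 2 for o, t in zip(forward(x, w1, w2)[1], d))
               for x, d in data)

# Toy task: XOR with 2 inputs plus a constant bias input, 2N+1 = 5 hidden units
data = [([0, 0, 1], [0]), ([0, 1, 1], [1]), ([1, 0, 1], [1]), ([1, 1, 1], [0])]
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
w2 = [[random.uniform(-1, 1) for _ in range(5)]]

error_before = total_error(w1, w2, data)
for epoch in range(5000):
    for x, d in data:
        train_step(x, d, w1, w2)
error_after = total_error(w1, w2, data)  # error shrinks as the network converges
```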
2.2.2 Unsupervised neural network classifiers

2.2.2.1 Description of unsupervised neural network classifiers

Unsupervised neural networks organize the input vectors into similar groups by
self-learning the patterns of the input vector. This addresses a major limitation of
supervised neural networks, where training with many input/output examples is
necessary. Even though training a network with an existing example set gives reliable
results, it is not practical in cases where training data are not available; in such cases
unsupervised or adaptive neural networks are of great help. The learning rule of
unsupervised neural networks performs learning in an unsupervised or self-organizing
manner (Chen, 2000), leading to a relevant output by learning patterns from the
redundant training data. In unsupervised neural networks, only input vectors are
presented to the network, and the network adjusts its own weights without any additional
(external) help in deciding what particular output to assign to a given input. Usually
unsupervised neural networks classify input data into distinct or discrete groupings.
Unsupervised neural networks can be ideal where seemingly uncorrelated data have to be
classified and, most importantly, when no training (example) data are available (Alavi,
2002).
There are two major approaches to unsupervised learning: Competitive Learning
Networks and Self Organizing Feature Maps (SOFM). Competitive Learning Networks
involve a process in which output layer neurons compete among themselves to acquire
the ability to fire in response to given input vectors (patterns). The basic learning rules
used to perform unsupervised learning are the Hebbian and competitive rules, both
inspired by neurobiological considerations (Chen, 2000). When an input pattern is
presented, a winning output neuron k is selected and the activations are reset such that:

yk = 1 and yj = 0 for all j ≠ k

Such an output layer is referred to as Winner-Take-All. The other type of unsupervised
network, only slightly different from the competitive layer, is the Self-Organizing Feature
Map (SOFM), sometimes known as an auto-associative network (Anderson et al, 1992).
This network was developed by Teuvo Kohonen, the leading researcher in unsupervised
learning. It relies on competitive learning but with different emergent properties; its
unique feature is preserving the topology of the input vectors (patterns). SOFMs are
intended to map a continuous high-dimensional space into a discrete one- or two-
dimensional space (Chen, 2000).
2.2.2.2 Architecture and algorithm

In this study a SOFM network is used for the unsupervised neural network
classification; hence this review focuses on the architecture and learning rules of a SOFM
network. The typical architecture of an unsupervised neural network (SOFM) comprises
two layers: the input layer and the output layer (Figure 6).

Figure 6: A Kohonen Self-Organizing Grid with a 2-dimensional output layer

Source: (REF4, 2000).
In a SOFM network there are two kinds of weight connections (REF5, 2002): feed-forward
connections between the input layer and the output layer, and lateral feedback
weight connections within the output layer. The feed-forward connection is usually
excitatory: it activates the neurons in the output layer so that their weights get updated.
On the other hand, the lateral feedback, the weight connection within the output layer, is
inhibitory and prevents neurons from being activated. Therefore not all neurons get
updated; only the winner neuron of the lateral feedback (competitive) layer gets updated.
This learning rule of the output layer of unsupervised neural networks is called
Winner-Take-All (WTA).

The WTA learning rule is the most widely used learning rule for SOFMs. As its name
implies, only the winner neuron of the output layer is activated and gets its weights
updated. This rule is common to both SOFMs and other Competitive Learning Networks.
What makes the SOFM unique is that not only the winner neuron gets its weights
updated, but also its neighboring neurons; in this way, as mentioned earlier, the SOFM is
capable of preserving the topology of the output layer with respect to the input layer.
This is very important when dealing with geo-spatial data, where topology matters. The
winner is selected by looking at the distance between the weight vector of each neuron in
the output layer and the input vector. Many kinds of distances can be considered; the
most widely used is the Euclidean distance.
The WTA learning rule procedure can be described as follows:

1. Select the winner neuron, the one with the smallest Euclidean distance:

‖x − wj‖ ……………………………. (1)

where wj denotes the weight vector corresponding to the jth output neuron.

2. Let i* denote the index of the winner and let I* denote the set of indexes corresponding
to a defined neighborhood of winner i*. Then the weights associated with the winner and
its neighboring neurons are updated by:

∆wj = η (x − wj) ………………………………. (2)

for all indices j ∈ I*, where η is a small positive learning rate.
The amount of updating may be weighted according to a pre-assigned 'neighborhood
function' Λ(j, i*):

∆wj = η Λ(j, i*) (x − wj) …………………….. (3)

for all j. For example, the neighborhood function Λ(j, i*) may be chosen as

Λ(j, i*) = exp(−‖rj − ri*‖² / 2σ²)

where rj represents the position of neuron j in the output space. The convergence of the
feature map depends on a proper choice of η; one choice is η = 1/t. The size of the
neighborhood (σ) should decrease gradually, as shown in the next figure:
Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer.

3. The weight update should immediately be followed by normalization of wj.

The rate at which the weights of a winner neuron are updated depends on a small,
user-defined positive constant referred to as alpha (α). In some cases another constant,
theta (θ), is applied to the network to avoid neurons that never get their weights updated;
θ is the rate at which a neuron that did not win loses, i.e. it represents the losing rate.
When there is no losing rate, θ is set to zero.
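The WTA/SOFM procedure above can be sketched in code. This is a minimal illustration under several assumptions: a 1-dimensional output layer, the Gaussian neighborhood function of equation (3), η = 1/t, a simple shrinking-neighborhood schedule, and a made-up toy dataset; the normalization step and the losing rate θ are omitted.

```python
# A minimal sketch of the WTA/SOFM learning procedure for a 1-D output layer.
import math, random

random.seed(1)

def wta_train(data, n_units, epochs=50):
    """Train a tiny 1-D Kohonen map on a list of input vectors."""
    dim = len(data[0])
    w = [[random.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(1, epochs + 1):
        eta = 1.0 / t                         # learning rate, eta = 1/t
        sigma = max(n_units / 2.0 / t, 0.5)   # gradually shrinking neighborhood
        for x in data:
            # Step 1: winner = unit with the smallest Euclidean distance (eq 1)
            dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]
            win = dists.index(min(dists))
            # Step 2: update winner and neighbors, Gaussian Lambda(j, i*) (eq 3)
            for j, wj in enumerate(w):
                lam = math.exp(-((j - win) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    wj[k] += eta * lam * (x[k] - wj[k])
    return w

# Two well-separated 2-D clusters; the two units should settle near them
data = [[0.1, 0.1], [0.2, 0.15], [0.9, 0.9], [0.8, 0.85]]
weights = wta_train(data, n_units=2)
```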
Although a two-layer design is the most used architecture, it is also possible to
use the Kohonen layer or Self Organizing Map (SOFM) as a hidden layer by
providing an extra output layer, usually a Learning Vector Quantization (LVQ) layer,
which helps in unsupervised classification in the case of complex data. Teuvo Kohonen,
who developed the SOFM, also created this architecture. It is very similar to the SOFM;
it is a form of supervised learning adapted from Kohonen unsupervised learning. It uses
the Kohonen layer with the WTA transfer function, which is capable of sorting items into
similar categories (Anderson et al, 1992). However, some important modifications have
been added to this architecture, which make it more robust in handling classification and
image segmentation problems.

Figure 8: Design of the Learning Vector Quantization architecture

Source: Anderson et al, 1992

Learning Vector Quantization classifies its input data into classes that are determined by
the user. Essentially, it maps an n-dimensional space into an m-dimensional space; that
is, it takes n inputs and produces m outputs. We can refer to this learning rule as
semi-unsupervised, since it gives us the freedom of specifying the number of classes into
which we want to group the input data. The network can be trained to classify inputs
while preserving the topology of the training set. This occurs by preserving the
nearest-neighbor relationships in the training set, such that input patterns which have not
been previously learned are categorized by their nearest neighbors in the training data.
The training mechanism for the LVQ network is the same as for the Kohonen network:
the WTA transfer function is used to process the input data in the hidden Kohonen layer,
and there is only one winner in the layer for each input vector (for each iteration). The
only extra step in LVQ is the re-assignment of the output found from the Kohonen layer
to another output layer, for which the number of neurons is user defined (here, the
number of classes into which we want to classify the input vector).

The LVQ output layer re-assigns the Kohonen layer outputs by adjusting the connection
weights between the output neurons and the Kohonen layer: if a winner neuron from the
Kohonen layer is not assigned the appropriate class (the network learns after training in
which class the neuron should be classified), the connection weights entering the neuron
are moved away from the training vector, so that it does not get classified into the wrong
output class (Anderson et al, 1992). This network is of special interest for this study,
since its topology-preserving nature makes it the most appropriate network for image
classification.
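The re-assignment rule described above can be sketched as an LVQ1-style update: the winning prototype moves toward the input when its class label is correct and away from it when wrong. This is only an illustration; the class labels, learning rate and prototype values below are made-up, and the full two-layer Kohonen+LVQ pipeline of Figure 8 is not reproduced.

```python
# A minimal sketch of the LVQ class re-assignment rule (LVQ1-style update).

def lvq_step(x, label, prototypes, classes, lr=0.3):
    """prototypes: list of weight vectors; classes: their assigned class labels."""
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
    win = d.index(min(d))
    # Toward the input if the winner's class is right, away from it if wrong
    sign = 1.0 if classes[win] == label else -1.0
    prototypes[win] = [wi + sign * lr * (xi - wi)
                       for xi, wi in zip(x, prototypes[win])]
    return win

protos = [[0.2, 0.2], [0.8, 0.8]]
classes = ["water", "forest"]
lvq_step([0.1, 0.1], "water", protos, classes)   # correct class: pulled toward input
lvq_step([0.9, 0.9], "water", protos, classes)   # wrong class: pushed away
```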
3 Methodology
The whole methodological process in this study is divided into three main procedures.

The first procedure comprises:

Preparation of the data collected in the field into an appropriate format to be used as
input in the image classification process.

Preprocessing of the main datasets to be used, and standardization of data types so
as to make them compatible during the classification procedure.

Preparation of training and test sets from the field data and subsets of the input
dataset.

The second procedure deals with performing the supervised and unsupervised neural
network classifications, first by classifying the training dataset and then classifying the
whole image once the classification accuracy is satisfactory.

The last step of the methodology deals with the validation of the neural network
classifications. After the accuracy assessment, validation is carried out in order to draw a
concrete conclusion on the analysis performed.
To explain briefly how the whole process was executed: the primary datasets, the ASTER
and Landsat TM images, were pre-processed and standardized. Training/test sets were
prepared from the field data and the primary datasets, and then data transformation was
carried out; this part of the process took considerable time because of the large size of the
data used (the images). The image data were transformed into ASCII file format in order
to make them compatible with the neural network processor. Then both supervised and
unsupervised neural network classifications were carried out. After evaluating the
outcome on the training data, the whole image was classified by the networks trained on
the training set. Then the classification results from the supervised and unsupervised
classifiers were compared. The overall process is illustrated in Figure 9.
Figure 9: Overview of the main procedures involved in the methodological process

[Flowchart summary: the field data and primary datasets (ASTER image, Landsat TM, NDVI stack layer) are transformed into ASCII training and test sets; the supervised and unsupervised NN classifications are trained and run; accuracy assessment, sensitivity analysis and a comparison between the supervised and unsupervised classification results follow.]
3.1 Field data acquisition
The study area covers 1558 km²; it lies between 36.99°E and 37.40°E longitude and
11.48°N and 11.95°N latitude.

Representative ground truth samples were marked on the image for all the output classes.
The output classes are:

Arable land

Forest

Settlement

Shrub land and scrubland

Swampy area

Water

These land cover classes were chosen based on the major classes used in the available
topographic map of the area; the topographic map was also used as a source of additional
control data during the validation stage.

Ground truth sets were taken from the study area. Adequate ground truth was needed
both for training the network and for testing (validation) after the classification was
performed. A GPS was used to mark the geographical positions of the ground control
(ground truth) points.
3.2 Data preprocessing

3.2.1 Datasets

There are two primary datasets used for the study. These are:

Satellite image from the TERRA satellite (ASTER).

Satellite image from the Landsat 5 satellite (Landsat TM).
3.2.1.1 ASTER
ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) is an
imaging instrument aboard the TERRA satellite. ASTER is used to obtain detailed maps
of land surface temperature, emissivity, reflectance and elevation. It consists of three
high-performance optical radiometers with 14 spectral channels. Its spectral channels fall
in the visible and near infrared (VNIR), shortwave infrared (SWIR) and thermal infrared
(TIR) regions (REF3, 2002). The major features of ASTER are: simultaneous earth
surface images from the visible to the thermal infrared; higher geometric and radiometric
resolution in each band than earlier satellite sensors; near infrared stereoscopic image
pairs collected during the same orbit; optics that allow the instrument axis to point as
much as ±24 degrees cross-track from nadir; and highly reliable cryocoolers for the
SWIR and TIR sensors (Vani, 2000).
Table 1: Spectral range of bands and spatial resolution for the ASTER sensor
ASTER Bands Wavelength
(micrometers)
Resolution (meters)
Band 1 0.52 - 0.60 15
Band 2 0.63 - 0.69 15
Band 3 nadir looking 0.76 - 0.86 15
Band 3 backward looking 0.76 - 0.86 15
Band 4 1.600 - 1.700 30
Band 5 2.145 - 2.185 30
Band 6 2.185 - 2.225 30
Band 7 2.235 - 2.285 30
Band 8 2.295 - 2.365 30
Band 9 2.360 - 2.430 30
Band 10 8.125 - 8.475 90
Band 11 8.475 - 8.825 90
Band 12 8.925 - 9.275 90
Band 13 10.25 - 10.95 90
Band 14 10.95 - 11.65 90
Figure 10: ASTER bands superimposed on model Atmosphere.
Source: Jet Propulsion Laboratory (JPL), ASTER homepage.
3.2.1.2 Landsat TM
The Thematic Mapper (TM) sensor is an advanced, multispectral scanning, Earth
resources instrument designed to achieve higher image resolution, sharper spectral
separation, improved geometric fidelity, and greater radiometric accuracy and resolution
than the Multispectral Scanner (MSS) sensor. The TM data are scanned simultaneously in
seven spectral bands. Band 6 scans thermal (heat) infrared radiation. All TM bands are
quantized as 8 bit data (REF2, 1999) (Figure 11)
Table 2: Spectral range of bands and spatial resolution for the TM sensor
Landsat 5 Bands Wavelength (micrometers) Resolution (meters)
Band 1 0.45 - 0.52 30
Band 2 0.52 - 0.60 30
Band 3 0.63 - 0.69 30
Band 4 0.76 - 0.90 30
Band 5 1.55 - 1.75 30
Band 6 10.40- 12.50 120
Band 7 2.08 - 2.35 30
Figure 11: Landsat TM bands superimposed on model Atmosphere.
Background image source: Remote sensing Basics lecture note, Wageningen
University.
3.2.2 Datasets preparation
Both the ASTER and Landsat TM images were geo-referenced1 to the 1:50,000
topographic map of the study area. The large water body in the area, the southern part of
Lake Tana, was removed from both images, since it is a known feature (Figure 12);
keeping the lake area would have increased data processing and analysis time
significantly.

Figure 12: Study area after Lake Tana is masked out of the image.

The six shortwave infrared (SWIR) bands of the ASTER image have very low DN
values, which makes their variation difficult to detect. To solve this problem, rescaling
was performed over the whole range of bands from the visible to the SWIR (Figure 13
and Figure 14).
1 Projection and datum information can be found in appendix 1
Figure 13: Spectral signature of the six classes (before ASTER image rescaling)
Figure 14: Spectral signature of the six classes (After ASTER image rescaling)
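The rescaling step can be sketched as a simple linear min-max stretch. This is an assumption for illustration; the actual stretch applied by the image processing software may differ, and the DN values below are made-up.

```python
# A sketch of rescaling a low-DN band: a linear min-max stretch to 0-255 so
# that the small variation in the SWIR DN values becomes detectable.

def rescale_band(band, new_min=0, new_max=255):
    """Linearly stretch a band's DN values into [new_min, new_max]."""
    lo, hi = min(band), max(band)
    if hi == lo:  # constant band: nothing to stretch
        return [new_min for _ in band]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (dn - lo) * scale for dn in band]

# Example: a low-DN SWIR band stretched over the full 0-255 range
swir = [4, 5, 5, 6, 7, 9]
print(rescale_band(swir))  # spans 0..255 after the stretch
```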
Due to the large spatial extent of the study area, it was not possible to process the whole
image at once during the analysis and the preparation of training and test sets. As the
software used for the neural network analysis, ThinksPro®, accepts only ASCII files, it
was necessary to subset the image into four sub-study areas. This avoided extremely
large ASCII files, which could not have been edited with Notepad, WordPad or MS
Access for the preparation of the input dataset.
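The image-to-ASCII transformation can be sketched as follows: each pixel becomes one whitespace-separated row whose columns are the band values fed to the network as an input vector. The band layout, delimiter and file name are illustrative assumptions, not the exact ThinksPro input format.

```python
# A sketch of converting image bands to an ASCII input file: one row per
# pixel, one column per band (the network's input vector for that pixel).

def bands_to_ascii(bands, path):
    """bands: list of equally-sized band value lists; writes one pixel per line."""
    n_pixels = len(bands[0])
    with open(path, "w") as f:
        for p in range(n_pixels):
            f.write(" ".join(str(band[p]) for band in bands) + "\n")

# Example: 3 bands of a 4-pixel subset -> 4 rows of 3 columns
bands_to_ascii([[10, 20, 30, 40], [1, 2, 3, 4], [5, 6, 7, 8]], "subset1.txt")
```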
3.2.3 Training and test sets preparation

Ground control points taken from the fieldwork were merged with reference points taken
from the image and the 1:50,000 topographic maps by visual interpretation and expert
knowledge; this increased the number of ground control points sufficiently for adequate
training and test data. The technical procedure of training and test data preparation is
shown in Figure 15.

Figure 15: Training/test data preparation procedure

After the training and test data were prepared, different possible data combinations were
tested for both the ASTER and Landsat TM images to find out which combination of
bands could give better neural network classification accuracy. Although testing data
quality for neural network classification was not the primary objective of this study, it
was necessary to find the best band combination in order to get the most out of the
available information, since the classification was based only on spectral information.
Therefore three pre-classification training and test set evaluations were made: for the
ASTER image, for the Landsat TM image, and for the combination of ASTER and
Landsat TM images, respectively (see the tables below for more details).
Table 3: Training and test sets for ASTER dataset.
No No of bands
used
Type of bands used Additional
information
Total No of input
Visible Near infrared SWIR
1 9 2 1 6 NDVI 10
2 7 2 1 4 NDVI 8
Table 4: Training and test sets for Landsat TM dataset
No No of bands
used
Type of bands used Additional
information
Total No
of input Visible Near
infrared
Mid
infrared
Thermal
infrared
1 7 3 1 2 1 NDVI 8
2 6 3 1 2 - NDVI 7
Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets

No | No of bands | ASTER Visible | ASTER Near infrared | ASTER Shortwave infrared | Landsat TM Visible | Landsat TM Near infrared | Landsat TM Mid infrared | Landsat TM Thermal | Additional information | Total No of inputs
1 | 14 | 2 | 1 | 4 | 3 | 1 | 2 | 1 | 2 NDVI | 16
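The NDVI layer listed as additional information in Tables 3-5 is the standard normalized difference of the near-infrared and red bands. A minimal sketch follows; the band values are made up for illustration:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red)

# First pixel: (150 - 50) / (150 + 50) = 0.5; second pixel: 0.0.
values = ndvi([50.0, 80.0], [150.0, 80.0])
```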
The images were converted into ASCII files to make their format compatible with the neural network processors of ThinksPro, where the values from the spectral bands were fed as an input vector to the neural network. The neural network maps the feature space of the input (image data) into a category space, which in this study consists of land cover classes. The dimension of the feature space equals the number of spectral bands provided.
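The feature-space-to-category-space mapping described above can be sketched as follows; the image values, class means and nearest-mean rule are illustrative stand-ins for the trained network, not the ThinksPro implementation:

```python
import numpy as np

# Toy multispectral "image": 4 rows x 5 columns x 3 spectral bands.
rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))

# Each pixel becomes one input vector; the feature-space dimension
# equals the number of spectral bands (3 here).
vectors = image.reshape(-1, image.shape[2])          # shape (20, 3)

# Stand-in for the trained network: hypothetical class means in
# feature space, one per land cover class.
class_means = np.array([[0.2, 0.2, 0.2],
                        [0.5, 0.5, 0.5],
                        [0.8, 0.8, 0.8]])

# Map feature space into category space: each pixel gets the index
# of the nearest class mean.
dists = np.linalg.norm(vectors[:, None, :] - class_means[None, :, :], axis=2)
labels = dists.argmin(axis=1).reshape(image.shape[:2])
```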
3.3 Supervised neural network classification
The Multi-layer Normal Feed Forward (MNFF) architecture was used for the supervised classification. This architecture is a typical example of the Multi-Layer Perceptron (MLP). The network comprises three layers: an input layer, one hidden layer and an output layer. Both the hidden and output layers use a BP1 learning rule.
The network design is shown in Figure 16.
Figure 16: Design of Back Propagation Neural Network
The number of nodes in each layer of the architecture used for the Back Propagation neural network was assigned as follows:
• The nodes, transfer functions and input functions vary according to the input dataset tested.
• The number of nodes in the input layer equals the number of inputs.
• The number of nodes in the hidden layer is assigned according to the Kolmogorov theorem as 2N+1, where N is the number of input nodes (Rangsaneri et al., 1998).
1 All the network parameters used both for supervised and unsupervised NN classification are listed in Appendix 2
• The number of nodes in the output layer equals the number of output classes, which is 6 for this study.
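The sizing rules above can be collected into a small helper; the function name is ours, but the resulting layer sizes match Table 6:

```python
def bp_layer_sizes(n_inputs, n_classes=6):
    """Node counts (input, hidden, output) for the back propagation
    network, using the Kolmogorov 2N+1 rule for the hidden layer."""
    return n_inputs, 2 * n_inputs + 1, n_classes

# The five input vectors from Tables 3-5 give the layer sizes of Table 6:
sizes = [bp_layer_sizes(n) for n in (10, 8, 8, 7, 16)]
```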
Table 6 illustrates the five data sets created for classification using the back propagation neural network. Since the supervised classification was needed for comparison with the unsupervised classifier, a neural network that had proved to be a good image classifier was required, and back propagation was chosen because it fulfils this criterion. Both the hidden and the output layers of the supervised network were set to the back propagation learning rule. All the data sets described in Tables 3, 4 and 5 were used for the supervised neural network classification. The percentage accuracy of each data combination (input vector) was recorded in order to choose the best result for the final image classification.
Table 6. Training data set up for the Back Propagation neural network

Training set | Dataset | Architecture | Learning rule (hidden) | Learning rule (output) | No of hidden layers | No of inputs | No of hidden nodes | No of output nodes
1 | ASTER | Multi-Layer NFF | Back prop. | Back prop. | 1 | 10 | 21 | 6
2 | ASTER | Multi-Layer NFF | Back prop. | Back prop. | 1 | 8 | 17 | 6
3 | Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 8 | 17 | 6
4 | Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 7 | 15 | 6
5 | ASTER + Landsat TM | Multi-Layer NFF | Back prop. | Back prop. | 1 | 16 | 33 | 6

3.4 Unsupervised neural networks classification
The Multi-Layer Normal Feed Forward (MLNFF) architecture was used for the unsupervised classification, with the Kohonen Winner Take All (KWTA) and Learning Vector Quantization (LVQ) learning rules. The network comprises three layers: an input layer, one hidden layer with
the KWTA learning rule, and an output layer with the LVQ learning rule. The network design is shown in Figure 17.
Figure 17: The design of the KWTA/LVQ network.
The assignment of the number of nodes in each layer is similar to the supervised classification. Table 7 illustrates the five data sets created for the unsupervised image classification using the KWTA/LVQ network. The hidden layer of the unsupervised network uses Kohonen WTA, while the output layer uses LVQ. These learning rules were chosen because they are topology preserving in nature, which is very appropriate for the kind of data we are dealing with.
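As a rough illustration of the winner-take-all idea, a minimal sketch follows (the learning rate, weights and input are made up; this is not the ThinksPro implementation):

```python
import numpy as np

def kwta_step(weights, x, lr=0.1):
    """One Kohonen winner-take-all update: only the node whose weight
    vector is closest to the input x is moved towards x."""
    winner = int(np.linalg.norm(weights - x, axis=1).argmin())
    weights[winner] += lr * (x - weights[winner])
    return winner

rng = np.random.default_rng(1)
weights = rng.random((6, 3))     # 6 output nodes (classes), 3-band input
x = np.array([0.9, 0.1, 0.5])    # one pixel's spectral vector
winner = kwta_step(weights, x)
```

Repeated over the training set, the weight vectors drift towards cluster centres in feature space, without any desired output being supplied.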
Table 7. Training data set up for the Kohonen Winner Take All/LVQ network

Training set | Dataset | Architecture | Learning rule | No of hidden layers | No of inputs | No of hidden nodes | No of output nodes
1 | ASTER | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 10 | 21 | 6
2 | ASTER | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 8 | 17 | 6
3 | Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 8 | 17 | 6
4 | Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 7 | 15 | 6
5 | ASTER + Landsat TM | Multi-Layer NFF | Kohonen WTA and LVQ | 1 | 16 | 33 | 6
3.5 Accuracy assessment and validation
The accuracy, or performance, of the supervised neural network was evaluated with the built-in testing mechanism of ThinksPro. While training a network, a test set of input/desired-output pairs was given to the network to evaluate the correct learning percentage. Validation was then carried out by comparing a set of ground control points with the results of the neural network classifier. A confusion matrix and a table of accuracy percentages were generated based on the validation results.
3.6 Sensitivity analysis
Sensitivity analysis is a method that helps to determine the importance, or contribution, of each input towards the generation of the final output. This information enables us to determine which input is more important, or provides more information, to the overall image classification process. The ThinksPro guide states that eliminating inputs that have little effect can improve the performance of a neural network on test data, since lowering the input dimension can enhance generalization (Logical Designs, 1996). Sensitivity analysis can thus be used as a decision-making tool to separate useful inputs from noise.
For this study a built-in procedure of ThinksPro (the software used for neural network processing) was used to carry out the sensitivity analysis. There are many ways of calculating sensitivity; the method used in ThinksPro replaces each input by its average value over the training set and calculates the effect on the output. The magnitude of the output change is then averaged over the whole training set, and this is done for every input. Finally, the effect of each input is reported in the log file (Logical Designs, 1996).
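The mean-replacement procedure described above can be sketched as follows (a simplified illustration with a toy "network", not the actual ThinksPro code):

```python
import numpy as np

def sensitivity(predict, X):
    """Mean-replacement sensitivity: clamp each input to its average
    over the training set and measure the mean output change."""
    base = predict(X)
    effects = []
    for j in range(X.shape[1]):
        Xm = X.copy()
        Xm[:, j] = X[:, j].mean()          # replace input j by its mean
        effects.append(np.abs(predict(Xm) - base).mean())
    effects = np.array(effects)
    return effects, effects / effects.mean()   # raw and normalized effects

# Toy "network": depends strongly on input 0, weakly on input 1.
predict = lambda X: 5.0 * X[:, 0] + 0.1 * X[:, 1]
X = np.random.default_rng(2).random((100, 2))
effects, normalized = sensitivity(predict, X)
```

In this toy case the first input dominates the output, so its normalized effect comes out well above 1, while the second falls well below 1; the normalization is chosen so that equally important inputs would all score 1.0, as in the tables of section 4.8.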
3.7 Implementation Aspects
Five software packages were used for the implementation of the methodology described
in the previous sections. They are listed below:
Arc/Info: used to standardize the projection information of all the datasets, including preparation of the training and test sets;
ArcView: used for the production and visualization of the training/test sets and the land cover map;
ERDAS Imagine: used for image pre-processing and transformation of the images into the ASCII file format;
ThinksPro: used for the neural network processing;
GPS: used to retrieve the geographical position of ground control points during the fieldwork.
4 Results and Discussion
4.1 Accuracy of Back Propagation classifier trained with ASTER or
Landsat TM datasets
As explained in the previous chapter, different combinations of ASTER bands were tested in order to find the best input vector. The input vector plays an important role because it should provide the appropriate information to the neural network perceptrons, from which the classification is generated. Table 8 shows that the two classification cases carried out on the ASTER image using the back propagation classifier differed very significantly in accuracy. The first network, trained with 3 VNIR, 6 SWIR and 1 NDVI inputs, resulted in 66.60% correct training, whereas the second network, trained with 3 VNIR, 4 SWIR and 1 NDVI inputs (after the last 2 SWIR bands of the ASTER image were removed), resulted in 82.64% correct training.
Table 8: Accuracy of the back propagation classifier using ASTER data

Case | Data type: ASTER | Correct % training | Correct % test | Error training | Error test
1 | 3 VNIR, 6 SWIR and 1 NDVI | 66.60 | 65.78 | 0.134 | 0.137
2 | 3 VNIR, 4 SWIR and 1 NDVI | 82.64 | 74.67 | 0.223 | 0.258

The big leap in accuracy can be explained by the noise reduction in the input data. In other words, the last two SWIR bands, which were removed from the second neural network (case 2 in Table 8), can be considered noise, since the recorded (DN) values of these bands were very poor: most of the pixels of the image were zero for these bands (more than 75% were zero or blank). The presence of noise in the input vector affects the overall accuracy of the neural network's performance.
Table 9 shows the Landsat TM band combination datasets used to carry out the training and testing of the back propagation classifier. In the first case the back propagation
classifier was trained and tested with an input vector containing NDVI data and all the Landsat TM bands; the accuracy was 79.87% correct training and 73.73% correct testing. In the second case, the back propagation classifier was trained and tested with an input vector containing NDVI data and all the Landsat TM bands except the thermal band. The accuracy results were much lower: 57.54% correct training and 56.57% correct testing.
Table 9: Accuracy of the back propagation classifier trained with Landsat TM data

Test | Data source: Landsat TM | Correct % training | Correct % test | Error training | Error test
1 | 4 VNIR, 2 MIR, 1 TIR and 1 NDVI | 79.87 | 73.73 | 0.057 | 0.078
2 | 4 VNIR, 2 MIR and 1 NDVI | 57.54 | 56.57 | 0.146 | 0.151

The accuracy obtained from the network trained with the thermal band included is significantly higher than that of the same classifier trained without the thermal band. This confirms one of the most important advantages of using neural networks for image classification: the possibility of using multi-scale data in the classification process. The thermal bands are usually not used in many conventional parametric classification methods because their spatial resolution differs from that of the VNIR bands. The neural network overcomes this problem thanks to its multi-scale nature, whereby bands of different resolution can be used as input for image classification. This enables the use of information in the input vector that would otherwise have been lost with the thermal band.

4.2 Accuracy of Back propagation classifier trained with ASTER and Landsat TM input dataset
The back propagation network trained with the combination of the two datasets and their NDVI derivatives showed a better result (85.07%) than any of the results obtained from the networks trained with only one of the datasets, as shown in the table below.
Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets

Data source: both ASTER and Landsat TM | Correct % training | Correct % test | Error training | Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI | 85.07 | 77.14 | 0.034 | 0.069

The result shows that it was possible to improve on the already high training accuracies obtained from the ASTER and Landsat TM data individually. This demonstrates the ability of a neural network classifier to extract information from multi-source datasets. The advantage of this approach is twofold: it provides a quick and efficient mechanism for using different datasets in a multispectral image classification, while offering very high flexibility during the classification process. Regarding flexibility: any particular input within the input vector can easily be removed from the network if it is later considered noise, and, similarly, additional data can be added to the network if needed at a later stage of the classification. This avoids the time lost in preparing multiple input vectors with different input data sources, i.e. going through training and data set preparation every time we want to try a different number of inputs.

4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat datasets
Table 11 shows that the results of the Kohonen/LVQ classifier trained with ASTER data confirmed the previous results found using the back propagation classifier for the same dataset. The Kohonen/LVQ classifier trained with the noise-reduced ASTER dataset gave a better result than the one trained with all SWIR bands of ASTER. This indicates that the noise reduction affected the unsupervised classifier as well. A significant decrease in error and increase in correct test percentage was observed for the training done after the last two SWIR bands of ASTER were removed.
Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data

Test | Data source: ASTER | Correct % test | Error test
1 | 3 VNIR, 6 SWIR and 1 NDVI | 27.42 | 0.242
2 | 3 VNIR, 4 SWIR and 1 NDVI | 42.16 | 0.193
Table 12 shows the correct test percentage of the Kohonen/LVQ classifier trained with the Landsat TM data. Once again the unsupervised classifier confirmed the result found with the supervised one: the dataset without the thermal band gave a poorer result than the one with the thermal band.
Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data

Test | Data source: Landsat TM | Correct % test | Error test
1 | 4 VNIR, 2 MIR, 1 TIR and 1 NDVI | 37.03 | 0.210
2 | 4 VNIR, 2 MIR and 1 NDVI | 29.19 | 0.236
4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets
The Kohonen/LVQ classifier trained with the combination of bands from the ASTER and Landsat TM datasets gave a better result than those obtained from the datasets individually (Table 13). This classifier also returned the lowest error value of all the unsupervised trainings carried out, an indication that the network benefited from the merged input vector, which provided more information and helped it to better detect patterns in the input.
Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data

Data source: ASTER and Landsat TM | Correct % test | Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI | 47.15 | 0.176
For the unsupervised classification only the correct test set percentage is given; a correct training evaluation is not available, since no desired output is given during the training of an unsupervised classifier.
4.5 Validation of the results obtained from the back propagation supervised neural networks classifier
The validation was carried out using a set of test ground control points that were not used in the training process. Validation was carried out on the classification of the entire image that resulted in the highest correct training percentage (since the most accurate classifier would be used for the final land cover classification of the study area), in this case the classifier trained with the combined data from the ASTER and Landsat TM images. A total of 1264 points were used in the validation process.
The confusion matrix for the supervised back propagation neural network classification is given in the table below.
Table 14: Confusion matrix for the Back propagation network classification using ASTER and Landsat TM images

Ground control | Agriculture (ω1) | Forest (ω2) | Settlement (ω3) | Shrub (ω4) | Swamp (ω5) | Water (ω6) | ΣX
Agriculture (Ω1) | 372 | 0 | 0 | 25 | 3 | 0 | 400
Forest (Ω2) | 0 | 102 | 0 | 0 | 1 | 1 | 104
Settlement (Ω3) | 23 | 0 | 0 | 31 | 6 | 24 | 84
Shrub (Ω4) | 14 | 0 | 0 | 402 | 0 | 0 | 416
Swamp (Ω5) | 14 | 1 | 0 | 19 | 134 | 3 | 171
Water (Ω6) | 4 | 0 | 0 | 8 | 11 | 66 | 89
ΣY | 427 | 103 | 0 | 485 | 155 | 94 | ΣΣX = 1264
Accuracy assessment formulas, where (ωi, Ωi) is the number of correctly classified points of class i (example for class ω1, agriculture):
Class accuracy = (ω1, Ω1) × 100 / ΣX
Error of omission = 100 − class accuracy (producer's accuracy)
Error of commission = (Ω2 + Ω3 + Ω4 + Ω5 + Ω6) × 100 / ΣY, i.e. the off-diagonal entries of the class column over the column total (user's accuracy)
Overall accuracy = [(ω1, Ω1) + (ω2, Ω2) + (ω3, Ω3) + (ω4, Ω4) + (ω5, Ω5) + (ω6, Ω6)] × 100 / ΣΣX
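These formulas can be checked numerically against Table 14; the sketch below reproduces the agriculture figures and the overall accuracy reported in Table 15:

```python
import numpy as np

# Rows: ground control class; columns: classified as.
# Order: agriculture, forest, settlement, shrub, swamp, water (Table 14).
cm = np.array([
    [372,   0, 0,  25,   3,  0],
    [  0, 102, 0,   0,   1,  1],
    [ 23,   0, 0,  31,   6, 24],
    [ 14,   0, 0, 402,   0,  0],
    [ 14,   1, 0,  19, 134,  3],
    [  4,   0, 0,   8,  11, 66],
])

i = 0  # agriculture
class_acc = cm[i, i] * 100 / cm[i].sum()                 # diagonal / row total
omission = 100 - class_acc                               # error of omission
commission = (cm[:, i].sum() - cm[i, i]) * 100 / cm[:, i].sum()
overall = np.trace(cm) * 100 / cm.sum()
# class_acc = 93.0, commission ≈ 12.88, overall ≈ 85.13 (as in Table 15)
```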
The validation of the back propagation classifier results revealed that one of the classes, 'settlement', was not classified at all, while all the other classes were classified with very high accuracy (Table 15). 'Forest' is the best classified class, with 98% class accuracy and 0.97% error of commission; this indicates that the land cover map produced from this classification will be a useful basis for studies concerning forest cover in the study area. The limited number of 'settlement' ground control points used to train the network explains why the classifier could not recognize the pattern for this class. The result also indicates that the quality and size of the training data affect accuracy during training of the neural networks.
Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data sources

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 93 | 7.00 | 12.88
Forest | 98.1 | 1.92 | 0.97
Settlement | 0 | 100.00 | 0.00
Shrub | 96.6 | 3.37 | 17.11
Swamp | 78.4 | 21.64 | 13.55
Water | 74.2 | 25.84 | 29.79
Overall accuracy = 85.13%
4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier
For ease of comparison of the results from the two classifiers, the same set of test points used to validate the supervised classifier was used to validate the unsupervised classifier. The validation was carried out on the classification that gave the highest correct test percentage, i.e. the one trained on the dataset containing both ASTER and Landsat TM bands. The confusion matrix for the KWTA/LVQ network classification is given below.
Table 16: Confusion matrix for the unsupervised network

Ground control | Agriculture (ω1) | Forest (ω2) | Settlement (ω3) | Shrub (ω4) | Swamp (ω5) | Water (ω6) | ΣX
Agriculture (Ω1) | 232 | 0 | 54 | 88 | 39 | 15 | 428
Forest (Ω2) | 2 | 102 | 0 | 10 | 72 | 5 | 191
Settlement (Ω3) | 57 | 1 | 1 | 41 | 15 | 5 | 120
Shrub (Ω4) | 64 | 6 | 5 | 193 | 53 | 13 | 334
Swamp (Ω5) | 34 | 0 | 1 | 78 | 13 | 4 | 130
Water (Ω6) | 0 | 0 | 3 | 1 | 2 | 55 | 61
ΣY | 389 | 109 | 64 | 411 | 194 | 97 | ΣΣX = 1264
The accuracy assessment formulas are the same as those given in section 4.5.
Table 17 shows that the unsupervised network classified some 'settlement' points, unlike the back propagation network, which did not recognize the class at all. This indicates an absence of sufficient information on that particular class in the training data set. This is highly probable, since the areas labelled settlement are spatially very small: for example, a village of 10 to 15 cottages was labelled as settlement in order to obtain information on the amount of settlement in the area. Due to the very limited number of training data available for this class, the back propagation network could not learn or recognize its pattern.
Table 17: Percentage accuracy of the six classes of the unsupervised classified image

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 54.21 | 45.79 | 40.36
Forest | 53.40 | 46.60 | 6.42
Settlement | 0.83 | 99.17 | 98.44
Shrub | 57.78 | 42.22 | 53.04
Swamp | 10.00 | 90.00 | 93.30
Water | 90.16 | 9.84 | 43.30
Overall accuracy = 47.15%

The KWTA/LVQ network did not give a high overall accuracy.
4.7 Improving the training data quality
According to the validation of the previous results, the neural networks could not learn the pattern for the output class 'settlement'. This strongly suggests a lack of information in the training dataset for this particular class. If this was caused by the poor training points taken for the class, it would also have lowered the overall accuracy of the classification. To confirm whether the training data quality affected the classification accuracy of the neural networks, another classification was carried out with the entire sample data of the 'settlement' class removed from the training data. The classification was carried out using both supervised and unsupervised classifiers as described in the previous sections. The results are given below:
Table 18: Classification results for the ASTER-Landsat TM data with 5 classes

No | Data type: both ASTER and Landsat TM | Correct % training | Correct % test | Error training | Error test
1 | Supervised Back Propagation | 90.15 | 81.5 | 0.031 | 0.074
2 | Unsupervised KWTA | - | 46.5 | - | 0.176

With the removal of the 'settlement' class from the training set, the desired output set and the network output, the correct training percentage of the back propagation classifier increased, while the performance of the Kohonen/LVQ classifier was unaffected. The accuracy obtained was 46.5% with a 0.176 mean absolute error, indicating that the Kohonen/LVQ network was not affected by the removal of the 'settlement' training points.
The number of points used decreased to 1181, as the 84 points representing the 'settlement' class were removed (Table 19); points representing the 'settlement' class were likewise removed from the test set.

Table 19: Confusion matrix for the supervised classification with five output classes

Ground control | Agriculture (ω1) | Forest (ω2) | Shrub (ω3) | Swamp (ω4) | Water (ω5) | ΣX
Agriculture (Ω1) | 356 | 0 | 37 | 5 | 2 | 400
Forest (Ω2) | 0 | 104 | 0 | 0 | 0 | 104
Shrub (Ω3) | 15 | 1 | 394 | 6 | 0 | 416
Swamp (Ω4) | 13 | 2 | 21 | 133 | 2 | 171
Water (Ω5) | 1 | 0 | 4 | 1 | 84 | 90
ΣY | 385 | 107 | 456 | 145 | 88 | ΣΣX = 1181
The results obtained from this classification were very satisfactory, with 100% class accuracy for the class 'forest' and very reliable results for the classes 'shrub' and 'water' (Table 20). Although it will not be possible to obtain the geographical locations and distribution of settlements in the area, for other purposes that do not necessarily include settlements the land cover map produced from this classifier will be adequately accurate.
Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)

Class | Class accuracy | Error of omission (producer's accuracy) | Error of commission (user's accuracy)
Agriculture | 89.00 | 11.00 | 7.53
Forest | 100.00 | 0.00 | 2.80
Shrub | 94.71 | 5.29 | 13.60
Swamp | 77.78 | 22.22 | 8.28
Water | 93.33 | 6.67 | 4.55
Overall accuracy = 90.69%
4.8 Sensitivity Analysis
The result of the sensitivity analysis is given in Table 21. The figures in the effect column show the average change in the output over the training set caused by the particular input being tested. The normalized effect column is scaled so that, if all inputs had equal effect, the normalized effect would be 1.0; inputs with a normalized effect larger than 1 contribute more than average to the network output.

Table 21: Result of the sensitivity analysis of ASTER dataset 1

Sensitivity analysis for the back propagation network
Input | Effect | Effect normalized
1 Visible green | 0.343842 | 1.264904
2 Visible red | 0.282187 | 1.038093
3 Infrared | 0.401328 | 1.476381
4 SWIR 1 | 0.32859 | 1.208798
5 SWIR 2 | 0.26671 | 0.981157
6 SWIR 3 | 0.277654 | 1.021417
7 SWIR 4 | 0.400053 | 1.471692
8 SWIR 5 | 0.114453 | 0.421041
9 SWIR 6 | 0 | 0
10 NDVI | 0.303505 | 1.116517

Sensitivity analysis for the KWTA/LVQ network
Input | Effect | Effect normalized
1 Visible green | 0.37995 | 1.505496
2 Visible red | 0.310228 | 1.229232
3 Infrared | 0.383452 | 1.519371
4 SWIR 1 | 0.23123 | 0.916215
5 SWIR 2 | 0.258523 | 1.02436
6 SWIR 3 | 0.278438 | 1.103269
7 SWIR 4 | 0.278438 | 1.103269
8 SWIR 5 | 0.115615 | 0.458108
9 SWIR 6 | 0 | 0
10 NDVI | 0.287879 | 1.140679

1 The results of the sensitivity analysis for the other inputs and networks are given in Appendix 2.

We can see that the 8th and 9th inputs, which are the last two SWIR bands of ASTER, did not contribute much in either the supervised or the unsupervised network; in particular the last SWIR band (the 9th input) returned 0 in the sensitivity analysis, meaning it did not count in the learning process at all. For the Landsat TM data the inputs have effect and normalized effect values very close to 1, which shows that all bands of Landsat TM and the
NDVI contributed in more or less the same proportion to the learning process, in both the supervised and unsupervised networks. This indicates that removing any input from this input vector would affect the accuracy of the output, since it would remove information used to classify the image.
5 Conclusions
Neural networks gave good results for land cover classification of the ASTER and Landsat TM images, considering that only spectral information was used for the classification. Both the supervised and unsupervised neural network classifiers confirmed that noise reduction in the input data affects image classification accuracy significantly. It is difficult to compromise between data quality and quantity, since the more quality we demand, the more data we have to remove as noise. Noise reduction might thus lead to the removal of valuable data or information that could have been useful in the classification process; therefore, careful examination of the input data is very important before deciding on noise reduction.
The power of neural networks to extract information from multi-scale and multi-spectral datasets, in order to produce a better classification result, was observed in both the supervised and unsupervised classifiers. It was possible to use the TIR band of Landsat TM and the SWIR bands of the ASTER image, which have a different ground resolution from the rest of the bands of the images. The information extracted from the thermal band increased the accuracy of the image classification significantly.
The back propagation supervised neural network proved to be a more accurate classifier than the Kohonen/LVQ unsupervised classifier. The comparison showed that the supervised neural network gave better class results for image classification; however, this is not sufficient ground to conclude that unsupervised neural networks are not useful for image classification, because neural networks may fail to converge to the pattern into which the data should be classified when there is not enough information in the input vector. In principle, the presence of more data, such as a DEM, soil type and other thematic information, increases the accuracy of unsupervised network image classification.
Research question: Is there a significant difference in the accuracy of supervised neural network classification and unsupervised neural network classification?
In summary, the unsupervised neural network gave the most inaccurate results, which cannot be used for image classification of this particular dataset and area. However, it would be premature to conclude that unsupervised classification fails in general, since many other factors, such as the availability of ancillary data, affect the accuracy of unsupervised neural network classification. Fauzi et al. (2001) showed that adding ancillary information, such as a digital elevation model (DEM), increases classification accuracy; this confirms that a DEM is a valuable input that provides additional information to improve the accuracy of a neural network image classifier. Unfortunately, due to the unavailability of DEM data, it was not possible to test the difference it would make in the classification process for our study area. The supervised neural network resulted in a highly accurate classification, which can successfully be used for making the land cover map of the study area. To answer the first research question of this study: the results of supervised and unsupervised neural network classification differ significantly. The supervised neural network classifier proved to be robust, generalizing and accurate, as illustrated in Table 22.
Table 22: Accuracy of supervised and unsupervised neural network classifiers

Class | Class accuracy (%) Supervised, six classes | Class accuracy (%) Unsupervised, six classes | Class accuracy (%) Supervised, five classes
Agriculture | 93 | 54.21 | 89.00
Forest | 98.1 | 53.40 | 100.00
Settlement | 0 | 0.83 | -
Shrub | 96.6 | 57.78 | 94.71
Swamp | 78.4 | 10.00 | 77.78
Water | 74.2 | 90.16 | 93.33
Overall accuracy | 85.13 | 47.15 | 90.69
Research Question: Which type of neural networks classification; supervised or
unsupervised, will handle poor quality data better?
Both networks indicated that less noisy data generalizes faster and with better accuracy. This was seen in the classification of the ASTER image: both the back propagation and the KWTA/LVQ networks gave an increased correct training percentage after the last two SWIR bands1 were removed from their input vector. The ability to utilize multi-scale data in order to extract information that maximizes classification accuracy was also seen in both networks with the Landsat TM training data: both networks gave a better correct training percentage for the input with the thermal band included than for the input without it.
When it comes to handling a poor quality dataset, the unsupervised neural network appears to have been less affected than the supervised network. The additional classification carried out without the settlement class showed this clearly: the result of the back propagation network improved after the settlement class was removed, while this did not change or affect the Kohonen/LVQ network, since it had already learned all the information it could in unsupervised mode and was not forced to recognize a pattern imposed by a desired output. As a result, it could not perform better after what were considered poor training pairs (the training points representing the 'settlement' class) were removed (see Table 18). This can be explained by the learning mechanisms of the two networks. In supervised classification a set of desired outputs is provided, which is supposed to correspond to the input data in a certain pattern. Supervised neural networks learn2 in such a way that their error is propagated back after every iteration, so that their output comes to resemble the desired output according to a transfer function pre-assigned to the network. This restricts the network from recognizing any existing pattern or relationship that is not supported by the provided desired output. Even though having training data generally helps to obtain a more accurate image classification, this does not always hold true, since the training data might be unavailable or unreliable. In the former case, unsupervised neural networks can be offered as an alternative; in the latter case, unsupervised neural networks can be used to test the reliability of the training data. This answers the second research question: unsupervised neural networks can utilize poor or less correlated data better than supervised networks.
1 Why the last two SWIR bands are considered noise is explained in Chapter 4.
2 The learning mechanisms of the supervised and unsupervised networks are given in detail in Chapter 2.
6 Recommendation
The neural network classification (both supervised and unsupervised) undertaken in this
study used only spectral information from satellite images. Since neural networks
can accommodate different GIS and ancillary data in the image classification process,
the classification accuracy for the datasets could be improved by using additional information,
such as a Digital Elevation Model (DEM), a soil map or a geology map, during the image
classification process.
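As a minimal sketch of this recommendation, ancillary GIS layers can simply be appended as extra columns of the network's input matrix. The band values and elevations below are made-up numbers; the mean/SD normalization mirrors the preprocessing applied to the spectral inputs in this study.

```python
# Hypothetical sketch: appending a DEM layer to spectral inputs before
# classification. All values are invented for illustration.
import numpy as np

bands = np.array([[0.34, 0.28, 0.40],   # spectral features per pixel
                  [0.31, 0.25, 0.38]])
dem = np.array([[1800.0],               # elevation per pixel (metres)
                [2450.0]])

# Append the DEM as one more input column, then normalise each feature
# to zero mean and unit standard deviation.
features = np.hstack([bands, dem])
features = (features - features.mean(axis=0)) / features.std(axis=0)
```

The same pattern would extend to rasterized soil or geology classes, at the cost of the longer data preparation time noted below.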
Another important point, which could not be covered in this study, is the incorporation of
the five thermal bands of ASTER into the neural network classification. It was not
possible to use all 14 bands of the ASTER image: because of the large size of the study
area, adding more input layers would have slowed down data preparation and
processing beyond the time available for this study. The thermal bands of ASTER
might yield higher classification accuracy in both the supervised and unsupervised cases.
The application of fuzzy logic in the neural network image classification process has
been shown to increase classification accuracy in other studies (Abuelgasim et al., 1999).
The application of such systems is therefore expected to increase the accuracy of the neural network
classifiers significantly. More research has to be done in this area to find out whether
fuzzy logic/systems would increase the accuracy of image classification for the datasets
used here.
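A minimal, hypothetical sketch of the fuzzy idea (not the method of Abuelgasim et al.): instead of assigning each pixel to a single hard winner class, the network's class activations are turned into membership grades that sum to one, so mixed pixels can be represented.

```python
# Illustrative only: convert made-up class activations into fuzzy
# membership grades per land-cover class.
def fuzzy_memberships(activations):
    """Normalise activations so they sum to 1 and can be read as grades."""
    total = sum(activations)
    return [a / total for a in activations]

# e.g. activations for three hypothetical classes: forest, water, cropland
m = fuzzy_memberships([0.9, 0.3, 0.6])
hard_class = max(range(len(m)), key=lambda i: m[i])  # the crisp fallback
```

A hard classifier keeps only `hard_class`; the fuzzy grades in `m` retain how ambiguous the pixel actually was.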
Finally, detailed land cover maps and other kinds of GIS data, such as soil and hydrology
maps, of the areas adjacent to the study area should be produced in order to have
sufficient geographic information, which can be used for sustainable resource
management and development planning of the area.
References
Abuelgasim, A.A., Ross, W.D., Gopal, S. and Woodcock, C.E. 1999. "Change Detection
Using Adaptive Fuzzy Neural Networks: Environmental Damage Assessment after the
Gulf War". Remote Sensing of Environment 70: 208-223. Elsevier Science Inc., NY.
Alavi, F. (2002). "A Survey of Neural Networks: Part I."
http://www.iopwe.org/jul97/neural1.html.
Anderson, D. and McNeill, G. (1992). Artificial Neural Networks Technology. Utica,
N.Y.: Kaman Sciences Corporation.
Badran, F., M., C. and Crepon, M. Remote sensing operations. Paris, France: CEDRIC.
Berberoglu, S., Lloyd, C.D., Atkinson, P.M. and Curran, P.J. 2000. "The Integration of
Spectral and Textural Information Using Neural Networks for Land Cover Mapping in the
Mediterranean".
Chen, Z. (2000). Data Mining and Uncertain Reasoning: an Integrated Approach. New
York: John Wiley & Sons, Inc.
Fauzi, A., Hussin, A.Y. and Weir, M. 2001. "A Comparison Between Neural Networks and
Maximum Likelihood Remotely Sensed Data Classifiers to Detect Tropical Rain Logged-over
Forest in Indonesia". 22nd Asian Conference on Remote Sensing, Singapore.
Gahegan, M., German, G. and West, G. (1999). "Improving Neural Network Performance on
the Classification of Complex Geographic Datasets." Journal of Geographical Systems 1: 3-22.
Kulkarni, A.D. and Lulla, K. 1999. "Fuzzy Neural Network Models for Supervised
Classification: Multispectral Image Analysis". Geocarto International, Vol. 14, No. 4.
Geocarto International Centre, Hong Kong.
Kumar, M. and Srinivas, S. 2001. "Unsupervised Image Classification by Radial Basis
Function Neural Network (RBFNN)". 22nd Asian Conference on Remote Sensing,
Singapore.
Luo, J. and Tseng, D. 2000. "Self-Organizing Feature Map for Multi-Spectral SPOT Land
Cover Classification". GISdevelopment.net, Taiwan.
Logical Designs. 1996. ThinksPro: Neural Networks for Windows, User's Guide.
REF1 (2002). Neuro-Fuzzy Systems.
http://www.cs.berkeley.edu/~anuernb/nfs/, University of California at Berkeley.
REF2 (1999). Landsat Thematic Mapper.
http://edc.usgs.gov/glis/hyper/guide/landsat_tm#tm8.
REF3 (2002). ASTER. http://asterweb.jpl.nasa.gov/.
REF4 (2000). Neusciences, Intelligent Technologies.
http://www.neusciences.com/technologies/intelligent_technologies.htm.
REF5 (2002). Supervised and Unsupervised Neural Networks.
http://www.gc.ssr.upm.es/inves/ann1/concepts/Suunsupm.htm.
Roy, A. (2000). "Artificial Neural Networks - A Science in Trouble." SIGKDD
Explorations 1(2): 33-38.
Vani, K. 2000. "Fusion of ASTER Image Data for Enhanced Mapping of Land Cover
Features". GISdevelopment.net.
http://www.gisdevelopment.net/application/environment/pp/envp0005pf.htm.
Vassilas, N., Perantonis, S., Charou, E., Varoufakis, S. and Moutsoulas, M. 2000. "Neural
Networks for Fast and Efficient Classification of Multispectral Remote Sensing Data". 5th
Hellenic Conference on Informatics, University of Athens, Greece.
Wudneh, T. (1998). Biology and Management of Fish Stocks in Bahir Dar Gulf, Lake
Tana, Ethiopia. Wageningen Institute of Animal Science. Wageningen: Wageningen
University. 144 pp.
Appendices
Appendix 1: Dataset Projection Information

All the datasets used in this study are map-registered to the Transverse Mercator projection.

Projection: Transverse Mercator
Datum: Adindan (30th Arc)
Spheroid: Clarke 1880 (Modified)
Unit of Measurement: Meter
Meridian of Origin: 39°00' East of Greenwich
Latitude of Origin: Equator
False Easting: 500,000 m
False Northing: 0 m (nil northing)
Scale Factor at Origin: 0.9996
Grid: U.T.M. Zone 37
Appendix 2: Results of Input Sensitivity Analysis
2.1 Sensitivity analysis of the ASTER dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible green           0.343842   1.264904
2  Visible Red             0.282187   1.038093
3  Infrared                0.401328   1.476381
4  SWIR 1                  0.32859    1.208798
5  SWIR 2                  0.26671    0.981157
6  SWIR 3                  0.277654   1.021417
7  SWIR 4                  0.400053   1.471692
8  SWIR 5                  0.114453   0.421041
9  SWIR 6                  0          0
10 NDVI                    0.303505   1.116517

2.2 Sensitivity analysis of the Landsat TM dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible blue            0.609463   0.958399
2  Visible green           0.624913   0.982694
3  Visible Red             0.740876   1.165049
4  Near infrared           0.514623   0.80926
5  Mid infrared            0.699164   1.099457
6  TIR                     0.67013    1.053799
7  Mid infrared            0.646624   1.016836
8  NDVI                    0.581552   0.914508

2.3 Sensitivity analysis of the combination of ASTER and Landsat TM dataset inputs for the Back Propagation classifier

Input                      Effect     Effect normalized
1  Visible blue            0.488368   0.990814
2  Visible green           0.564402   1.145074
3  Visible Red             0.592826   1.202741
4  Near infrared           0.451129   0.915263
5  Mid infrared            0.517241   1.049392
6  TIR                     0.628186   1.27448
7  Mid infrared            0.558333   1.132761
8  NDVI (from Landsat TM)  0.395887   0.803186
9  Visible green           0.549178   1.114187
10 Visible Red             0.525436   1.066019
11 Infrared                0.396506   0.804442
12 SWIR 1                  0.492817   0.99984
13 SWIR 2                  0.431338   0.87511
14 SWIR 3                  0.442284   0.897318
15 SWIR 4                  0.407337   0.826416
16 NDVI (from ASTER)       0.445065   0.902959
2.4 Sensitivity analysis of the ASTER dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible green           0.37995    1.505496
2  Visible Red             0.310228   1.229232
3  Infrared                0.383452   1.519371
4  SWIR 1                  0.23123    0.916215
5  SWIR 2                  0.258523   1.02436
6  SWIR 3                  0.278438   1.103269
7  SWIR 4                  0.278438   1.103269
8  SWIR 5                  0.115615   0.458108
9  SWIR 6                  0          0
10 NDVI                    0.287879   1.140679

2.5 Sensitivity analysis of the Landsat TM dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible blue            0.370968   1.002597
2  Visible green           0.355568   0.960977
3  Visible Red             0.301042   0.81361
4  Near infrared           0.436122   1.178687
5  Mid infrared            0.341115   0.921915
6  TIR                     0.433549   1.171733
7  Mid infrared            0.387201   1.046469
8  NDVI                    0.334491   0.904012

2.6 Sensitivity analysis of the combination of ASTER and Landsat TM dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1  Visible blue            0.277849   1.085128
2  Visible green           0.245799   0.95996
3  Visible Red             0.19504    0.761721
4  Near infrared           0.275828   1.077236
5  Mid infrared            0.236521   0.923722
6  TIR                     0.229315   0.895582
7  Mid infrared            0.236521   0.923722
8  NDVI (from Landsat TM)  0.21155    0.826202
9  Visible green           0.302894   1.182941
10 Visible Red             0.252535   0.986265
11 Infrared                0.28774    1.123756
12 SWIR 1                  0.291602   1.138841
13 SWIR 2                  0.295414   1.153728
14 SWIR 3                  0.304735   1.190133
15 SWIR 4                  0.234143   0.914438
16 NDVI (from ASTER)       0.21934    0.856625
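The "Effect normalized" column in these tables appears to be each input's effect divided by the mean effect over all inputs of that dataset; the sketch below checks that relationship against the ASTER/Back Propagation figures in Table 2.1.

```python
# Check the apparent relationship: normalized effect = effect / mean(effects).
# Effect values are taken from Table 2.1 (ASTER inputs, Back Propagation).
effects = [0.343842, 0.282187, 0.401328, 0.328590, 0.266710,
           0.277654, 0.400053, 0.114453, 0.000000, 0.303505]

mean_effect = sum(effects) / len(effects)
normalized = [e / mean_effect for e in effects]
# e.g. normalized[0] comes out close to the tabulated 1.264904
```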
Appendix 3: Neural network parameters used
3.1 Parameters used for the Back Propagation neural network classifier

Case | Architecture | Error type | Batch size | Input layer | Input preprocessing | Sample arrangement | Hidden layers | Hidden nodes | Max nodes | Hidden layer rule/input/transfer | Output nodes | Output layer rule/input/transfer
1    | MNFF         | MAE        | 1          | 10          | (Mean/SD)           | Normal             | 1             | 21           | 21        | BPN/RBF/Sigmoid                  | 6            | BPN/DP/Sigmoid
2    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
3    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
4    | MNFF         | MAE        | 1          | 7           | (Mean/SD)           | Normal             | 1             | 15           | 15        | BPN/RBF/Sigmoid                  | 6            | BPN/DP/Sigmoid
5    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | BPN/DP/Sigmoid                   | 6            | BPN/DP/Sigmoid
6    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | BPN/DP/Sigmoid                   | 5            | BPN/DP/Sigmoid
3.2 Parameters used for the Kohonen/LVQ neural network classifier

Case | Architecture | Error type | Batch size | Input layer | Input preprocessing | Sample arrangement | Hidden layers | Hidden nodes | Max nodes | Hidden layer rule/input/transfer | Output nodes | Output layer rule/input/transfer
1    | MNFF         | MAE        | 1          | 10          | (Mean/SD)           | Normal             | 1             | 21           | 21        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
2    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | KWTA/RBF/WTA                     | 6            | LVQ/RBF/WTA
3    | MNFF         | MAE        | 1          | 8           | (Mean/SD)           | Normal             | 1             | 17           | 17        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
4    | MNFF         | MAE        | 1          | 7           | (Mean/SD)           | Normal             | 1             | 15           | 15        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
5    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | KWTA/L2/WTA                      | 6            | LVQ/RBF/WTA
6    | MNFF         | MAE        | 1          | 16          | (Mean/SD)           | Normal             | 1             | 33           | 33        | KWTA/L2/WTA                      | 5            | LVQ/RBF/WTA