FACULTY OF FOREST AND ENVIRONMENTAL SCIENCES
DEPARTMENT OF REMOTE SENSING AND LAND INFORMATION SYSTEMS (FELIS)
TENNENBACHERSTR.4. D79106 FREIBURG IM BREISGAU. GERMANY
________________________
Development of a method for forest type detection
________________________
Monograph submitted in partial fulfillment of the requirements for the degree of PhD
By
Juan Ygnacio López Hernández
Student ID 2729448
Freiburg, November 2012
Dean of the Faculty: Prof. Dr. Jurgen Bauhus
First supervisor: Prof. Dr. Barbara Koch
Second reviewer: Prof. Dr. Carsten F. Dormann
Adviser: Dr. Claus-Peter Gross
Date of defence: 14 November 2012
This document is freely available from the "FreiDok" server of the University of Freiburg.
[http://www.freidok.uni-freiburg.de/]
Printed with the support of the German Academic Exchange Service
DAAD
Acknowledgements Development of a method for forest type detection Juan Ygnacio López Hernández
Acknowledgements
I express my thanks first to God. He is the beginning and the end of all.
Prof. Dr. Barbara Koch supported me even before I started in Freiburg, dedicated more time to
me than she usually gives, and wisely advised on all of the directions the research should
take over the years. Her patience and mysticism inspired me always to strive for the
best results. With all my respect: Thank you!
To the person who was my first contact with Freiburg and the second supervisor of my thesis,
Prof. Dr. Dr. h.c. Albert Reif: thank you for also guiding my thesis with serious questions and
suggestions.
To the University of Los Andes, which granted me permission to study and funded
my scholarship in difficult Venezuelan times: Thank you!
Thanks to the German DAAD, which gave me more than support: the security of a serious
institution. It kept the ALEVEN agreement alive without equilibrium from Venezuela.
Special thanks to Frau Metje, who was the best official support and the voice of DAAD for me.
Thanks to the Graduate School "Environment, Society and Global Change" (ESGC) and to
the International Graduate Academy. Both institutions helped me on my personal path to
knowledge and science.
Dr. Mathias Dees, who made very useful contributions: Thank you.
Thanks to Dr. Claus-Peter Gross. He always asked serious questions and provided valuable
support.
Thanks to Dr. Hooman Latifi and Dr. Ahmad Yousef. Both shared an office, scientific questions,
procedures, ideas and jokes with me on a daily basis.
Thanks to Fabian Faßnacht, the first reader and proofreader of my work.
Dr. Johannes Heinzel, who engaged in discussions about SVM and classification problems:
Thanks.
Thanks to the staff and friends at FeLis, always ready to work, whether they hold
permanent positions or are exchange visitors: Thanks.
Thanks to Dr. Cristabel Duran, who was my personal adviser in Germany.
To Karl Heinz, who was always discussing methods and research questions, and to the
Spanish-speaking group, which provided secure and selfless support: thanks to all of
you. Pura vida, Victor, thank you very much. Thanks to the Portuñol group, whose members
changed over time and were always appreciated.
I cannot remember all of the people who supported me in finishing this research. To all of
those who should be here and whom I forgot: Thank you.
My thanks go to you (original in Spanish):
To Norma, Moncho, Yami, Oscar and Wilma: so far away, yet always sitting by my side. Thank you.
To Paito and Maita, in constant prayer that everything turns out well. Thank you.
To the Jiménez López family: thank you for looking after and watching over my interests in Mérida.
To Carmen Alicia, María del Pilar and Santiago Josué: for your loving, tender support every day
in distant lands, while learning a different language and customs. Thank you!
Contents
Contents
Executive summary
Resumen ejecutivo
Zusammenfassung
List of figures
List of tables
List of acronyms
1 Introduction
1.1 Objectives
2 Basic Information
2.1 Literature review
2.1.1 Definition of forest
2.1.2 Forest and type of forest mapping with LANDSAT data
2.1.3 Variable selection methods
2.1.4 Unsupervised classification
2.1.5 Supervised classification
2.1.6 Classification with machine learning tools
2.1.7 Tuning algorithms
2.1.8 Cross validation
2.1.9 Pixel-based image analysis (PBIA) of forest areas
2.1.10 Object based image analysis (OBIA)
2.2 Software used
3 Test site and data used
3.1 Test site
3.2 Satellite data
3.3 DEM
3.4 Reference data
3.4.1 Orthorectified colour aerial orthophotos
3.4.2 ATKIS forest layer
3.4.3 Forest stands from an aerial photo interpretation
3.4.4 Document about interpretation of aerial photographs of forest stands
3.4.5 Inventory data
3.4.6 Cartographic projection of the data
4 Methods
4.1 Pre-processing the national forest inventory (NFI) data
4.1.1 Query
4.1.2 PAM Clustering
4.1.3 Table to points conversion
4.1.4 Buffering
4.1.5 Automatic selection of training samples from the NFI plots
4.1.6 Calculating the weight of the classes
4.2 Pre-processing satellite scenes
4.2.1 Conversion of format
4.2.2 Pre-processing DEM
4.2.3 Orthorectification
4.2.4 Topographic normalisation
4.2.5 Cloud mask
4.2.6 Haze suppression
4.3 Creating synthetic layers for the classifications
4.4 Pixel based analysis of the satellite data
4.4.1 Detection of forest
4.4.2 Detection of forest types
4.4.3 Assessment data based on aerial photographs, satellite and NFI
4.4.1 Vectorization of pixel-based classification results
4.5 Model for detection of types of forest
4.6 Object Based Image Analysis (OBIA)
4.6.1 Variable selection methods
4.6.2 Segmentation with OBIA
4.7 Statistical approaches
4.7.1 Definition of parameters for grid search
4.7.2 Machine learning classification algorithms
5 Results
5.1 The digital layer of forest types of Bavaria
5.2 Variable selection process
5.2.1 Criterion based
5.2.2 Feature space optimization
5.2.3 Recursive feature selection
5.3 Evaluation of PBIA
5.3.1 Detection of forest areas
5.3.2 Detection of forest types for the scene in path 192 row 026
5.3.3 Evaluation of the pixel oriented classification of forest types for all of the scenes
5.4 Evaluation of OBIA
5.4.1 Subset of a scene
5.5 OBIA of the forest types for the whole scene
5.5.1 Classification of the forest types for the whole scene in path 192, row 026
5.5.2 Classification of 6 scenes
5.5.3 Results of the application of the machine learning classification algorithms
5.5.4 Comparing classifiers
5.5.5 Assessment of forest type classification
6 Discussion
6.1 Definition of forest
6.2 Regarding the satellite scenes
6.3 Preprocessing the scenes
6.4 Orthorectification
6.5 Clouds and haze
6.6 PBIA approach
6.7 OBIA approach
6.8 OBIA method with fast segmentation
6.9 Variable selection methods
6.10 The best classifier
6.11 Comparing kNN and SVM
6.12 PAM clustering the NFI
6.13 The border effect
6.14 Other predictors from the NFI
6.15 The stratified approach
6.16 Automatization of the processing chain
7 References
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Executive summary
The FAO (Food and Agriculture Organization of the United Nations) definition of forest was
used, but with a crown closure threshold of 40 %. A set of 7056 plots from the National
Forest Inventory (NFI) database of Bavaria, Germany, was used to train algorithms and
classify LANDSAT TM data acquired between 2006 and 2007. The inventory data were
filtered using a buffer around the boundary of the official forest cover layer of the state to
avoid the boundary effect. The NFI was then classified with an unsupervised method that took
into account the relative importance of species. Clustering based on Partitioning
Around Medoids (PAM) was used to find the number of clusters in the NFI. The images
were preprocessed by orthorectification and cloud removal. Some other anomalies at the
boundaries of the scenes were also removed. The algorithms tested for pixel-based image
analysis (PBIA) classification were Maximum Likelihood (MxL) and Minimum Distance (MD);
those tested for object-based image analysis (OBIA) classification were support vector
machines (SVM), k nearest neighbour (kNN), multinomial logistic regression (MNL), decision
trees (DT) and random forest (RF). Recursive backward variable
selection based on Random Forest (RF) was carried out to select the best predictors of forest
types with OBIA. Classification and regression machine learning algorithms were tuned
and applied with 10-fold cross-validation and 5 repetitions to the optical bands and to
synthetic bands of standard deviation (SD), tasseled cap (TC) and texture (TXT). The results
showed that the OBIA approach delineated forest more accurately than the PBIA approach. PBIA
was rejected based on differences with the official forest layer. Accounting for the boundary
effect in the NFI was decisive for the accuracy and kappa of the classification. The PAM
method made it possible to locate the physical plot corresponding to the medoid of each
cluster derived from the NFI. The accuracy of the classifications was mixed.
In general, SVM was the best classifier, with overall accuracies of
70.6%, 71.2%, 73.2% and 82.5% and kappa values of 0.559, 0.569, 0.898 and 0.738, respectively,
for 4 of the 6 images under evaluation. In another image, with less mixture of forest classes,
the best classifier was kNN, with 85.1% accuracy and a kappa of 0.778. RF was the best
classifier in one image, with 78.5% accuracy and a kappa of 0.678. The results suggest that
the unequal distribution of forest across the images can explain the errors in the
classification. The best models selected per image comprised 10 to 12 predictors when SVM was
the best classifier. For the two images where kNN or RF was the best classifier, the models
used only 1 or 9 predictors. Parts of the method are automated and can be applied
sequentially: clustering the NFI, then classification and regression with final evaluation.
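
The overall accuracy and kappa values quoted above are both derived from a confusion matrix. The following is a minimal sketch of that calculation, assuming a square matrix with reference classes in rows and predicted classes in columns; the function name and the example counts are hypothetical and are not taken from this study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that the classifier assigned to class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: share of samples on the main diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: sum over classes of (row total * column total) / total^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts for three forest type classes (illustration only).
example = [
    [50, 5, 5],
    [4, 30, 6],
    [6, 4, 40],
]
accuracy, kappa = accuracy_and_kappa(example)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy when class frequencies differ between images.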
Keywords: LANDSAT TM, forest type classification, machine learning, OBIA, forest
inventory.
Resumen ejecutivo
The FAO definition of forest, but with 40 % canopy cover, was used, and the information
from the National Forest Inventory (NFI) database of Bavaria, Germany,
containing almost 7056 plots, was used to classify LANDSAT TM data from
2006 and 2007. The inventory data were filtered using the official layer of the
state's forest boundaries. A buffer analysis around these boundaries was used to
exclude NFI points. The NFI was classified in an unsupervised manner, considering the
importance of the species groups. Clustering techniques based on
Partitioning Around Medoids (PAM) were used to find the number of clusters present in the
NFI and their definition based on medoids. The images were preprocessed by
orthorectification and by removal of clouds and some other anomalies present at the edges. Two
approaches were tested for analysing the images: pixel-based (PBIA) and
object-based (OBIA). Recursive selection by elimination based on Random Forest (RF) was
used to select the best predictors of the forest types. Classification and
regression techniques from machine learning were tuned and applied with
cross-validation based on 10 folds and 5 repetitions to the satellite images and
synthetic bands such as standard deviation (SD), the tasseled cap transformation
(TC) and texture (TXT). The algorithms tested for PBIA classification were
Maximum Likelihood (MxL) and Minimum Distance (MD); for OBIA classification,
support vector machines (SVM), k nearest neighbour (kNN), multinomial logistic regression
(MNL), decision trees (DT) and random forest (RF). The results show that the OBIA
approach was more accurate in delineation than PBIA. Mixed behaviour was
found in the accuracy of the classifications. In general, SVM was the best
classifier, with accuracies of 70.6%, 71.2%, 73.2% and 82.5% and kappa values of 0.559,
0.569, 0.898 and 0.738, respectively, for 4 of the 6 images evaluated. In the other
images, with less mixture of forest classes, the best classifier was kNN with 85.1%
accuracy and a kappa of 0.678. The edge effect was decisive in finding the classification
solution in terms of accuracy and kappa. The results suggest that the
distribution of forests is not the same in all images. Some imputation techniques
of the kNN algorithm achieved the best solution in one image and RF in the other. The
most important variables were NDVI, B7, B5 and B6. The best models selected
per image comprised 10 or 12 predictors when SVM was the best
classifier. Only when kNN and RF were the best classifiers were 1 or 9
variables selected, respectively. The method can be automated and applied sequentially in
two steps: clustering of the NFI, and classification and regression with
final evaluation.
Keywords: LANDSAT TM, forest type classification, machine learning, OBIA,
forest inventory.
Zusammenfassung:
The present study uses 7056 inventory plots of the National Forest Inventory
(Bundeswaldinventur) for the federal state of Bavaria to classify stand types from Landsat TM
satellite images from 2006 and 2007. The satellite images were
orthorectified, and clouds and other anomalies were removed. The inventory data
were intersected with official geodata of the forest administration to remove all
inventory plots outside the forest. The FAO definition of forest,
but with a minimum crown closure of 40%, was used as the criterion. In addition,
a buffer was used to exclude inventory plots near the forest boundaries as well.
The remaining inventory plots were subjected to a statistical clustering
procedure, with the basal area shares of the different
tree species serving as input variables. The Partitioning Around
Medoids (PAM) method was used for the clustering to derive an optimal number of distinct
stand types and their properties, which are defined by the medoids.
Two different approaches were tested for classifying the
satellite images: pixel-based and object-oriented. Recursive
backward selection based on Random Forest (RF) was carried out to determine the best
predictors for distinguishing the defined forest types. The
classification of the satellite images and the regression of the inventory plots on the
satellite data were carried out with different machine learning methods.
A 10-fold cross-validation with 5 repetitions was applied.
Both the original Landsat bands and synthetic derivatives
of the bands (standard deviations, tasseled cap transformations, texture information)
were used as input data. Maximum Likelihood and Minimum Distance
classifiers were used as pixel-based methods. To classify the segments defined in the
object-oriented approach, support vector machines (SVM), k
nearest neighbour (kNN), multinomial logistic regression (MNL) and decision-tree-based
methods (DT, RF) were used. The results show that the
object-oriented approach delivered better results than the pixel-based approach. In general,
SVM delivered the best accuracies, which for 4 of the 6 images lay between 70.6% and 82.5%
(kappa between 0.559 and 0.738). In the two remaining images, which showed less mixture
of different stand types, the best classifiers were kNN in one case,
with 85.1% (kappa 0.778), and RF in the other, with 78.5% (kappa 0.678). The results show
that the composition of stand types varies between the Landsat scenes
and that the optimal classification method depends on the composition of
stand types. For the 4 images where SVM classifications produced the best results,
the models contained between 10 and 12 predictors. The kNN and RF
models of the other two images that delivered the best accuracies contained one
and 9 predictors, respectively. In summary, the method presented
can be automated, and the three methodological steps (clustering of the inventory data,
classification, and finally regression and validation) can be carried out sequentially
to classify stand types from Landsat data in combination with National Forest Inventory
data.
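
The 10-fold cross-validation with 5 repetitions mentioned in this summary can be illustrated with a short index-splitting sketch. It is only an illustration under stated assumptions: the function name, the fixed seed and the sample count are hypothetical, and the study itself may have partitioned the data differently (for example with stratification by class).

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repetition reshuffles the sample indices, then every fold is used
    exactly once as the held-out test set.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index, starting at `fold`, is held out.
            test_idx = indices[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# Hypothetical number of training plots for one scene (illustration only).
splits = list(repeated_kfold(n_samples=103))
```

With 10 folds and 5 repetitions, each candidate model is fitted and evaluated 50 times, and the reported accuracy and kappa would typically be averages over those 50 held-out sets.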
Keywords: LANDSAT TM, stand type classification, machine learning,
forest inventory, object-based method.
List of figures
Figure 1: Contrasting maps of forest cover globally based on different definitions of canopy cover. Source: Hansen et al. (2003) and Kirkup (2001), cited in Achard (2009).
Figure 2: LANDSAT mission's timeline (USGS, 2012).
Figure 3: Minimum distance classifier. Source: Campbell and Wynne (2011).
Figure 4: Maximum likelihood classification. Source: Campbell and Wynne (2011).
Figure 5: k-nearest neighbours classifier. Source: Campbell and Wynne (2011).
Figure 6: In this example of classification with SVM, the groups can be separated by a line once the axes are transformed. Source: Verplancke et al. (2008).
Figure 7: Representation of a multilayer perceptron network with a hidden layer. From Witten et al. (2011).
Figure 8: Effect of modal filters after classification.
Figure 9: Example of the second classification using dual modal filtering (forest: green; water: blue; grey: other classes). The black lines are the forest from the ATKIS layer.
Figure 10: Illustration of the radiometric differences between two adjacent images.
Figure 11: Illustration of the generation of segment objects by the image segmentation method.
Figure 12: Subset of LANDSAT scene showing a false color composite (a) and the same place with a subdivision using chessboard segmentation (b).
Figure 13: View with the segment objects created by multiresolution segmentation.
Figure 14: Example of merging segment objects by dissolving lines that divide the same class.
Figure 15: The location of Bavaria relative to the other fifteen federal states in Germany and the position of Germany in the world.
Figure 16: Natural regions of Bavaria. Source: Bundesamt für Naturschutz (2012).
Figure 17: Overview of the best ESA satellite data ordered.
Figure 18: Illustration of the satellite scenes used. On the left, the path and row numbers; on the right, a mosaic overlapped to avoid the clouds.
Figure 19: An example of the reference data used.
Figure 20: General flow of the methods applied. The preprocessing is also presented.
Figure 21: Histogram of the number of species in the National Forest Inventory (NFI) of
Bavaria. The groups of species were sorted based on abundance from most
frequent to less frequent. ........................................................................................... 47
Figure 22: Silhouette of the groups built during the definition of the optimum number of
clusters based on the mean standardised intercluster distance (ave), known as
dissimilarity. ............................................................................................................. 48
Figure 23: Cluster plot of the three groups identified by the PAM method. Each group,
identified with numbers 1 to 3, represent the clusters in Table 10. .......................... 49
Figure 24: The 7056 inventory plots for Bavaria contained in the NFI database illustrated
graphically after import into GIS. ............................................................................. 50
Figure 25: A close-up of the centre of Bavaria showing individual groups of plots. ............... 51
Figure 26: A buffer of 1.5 times the pixel size (right) was applied to the boundaries of the
forest (left) and used to select the NFI plots for the classification and the
validation of the images. ........................................................................................... 52
Figure 27: Pre-processing steps applied to the scenes. ............................................................. 55
Figure 28: The DEM before (left) and after (right) the pre-processing step. ............................ 56
Figure 29: Example 1 highlighting the accuracy of the rectification (green lines are
vectors of ATKIS forest layers). ............................................................................... 58
Figure 30: Example 2 highlighting the accuracy of the rectification (green lines are
vectors of ATKIS forest layers). ............................................................................... 59
Figure 31: Test areas for the calculation of the C-factor taken from a subset of the image
in path 192, row 026. ................................................................................................ 60
Figure 32: Overcorrection effects of topographic normalization. ............................................. 61
Figure 33: Subset of the image before (left) and after (right) masking of the clouds. .............. 62
Figure 34: The haze uncorrected image (left) and the haze corrected image (right). ............... 63
Figure 35: Subset of the NDVI. ................................................................................................ 64
Figure 36: Subset of the tasselled cap 1. This band is also called Brightness of the objects
in the visual channels (Crist and Cicone, 1984). ...................................................... 65
Figure 37: Subset of the tasselled cap 2. Called Greenness, it shows in the whitest tones the
areas with the highest level of green for the human eye (Crist and Cicone, 1984). ...... 65
Figure 38: Subset of the tasselled cap 4. This layer is called Haze and it shows the hazy
areas with more intense levels of white. ................................................................... 66
Figure 39: Subset of the tasselled cap 6. This band is called sixth. ........................................... 66
Figure 40: Subset of the texture for band 1 of LANDSAT TM 5 data. .................................... 67
Figure 41: Subset of the texture for band 2 of LANDSAT TM 5 data. .................................... 67
Figure 42: Subset of the texture for band 3 of LANDSAT TM 5 data. .................................... 68
Figure 43: Subset of the texture for band 4 for the LANDSAT TM 5 data. ............................. 68
Figure 44: Subset of the texture for band 5 of LANDSAT TM 5 data. .................................... 69
Figure 45: Subset of the texture for band 7 of LANDSAT TM 5 data. .................................... 69
Figure 46: General description of the workflow. ...................................................................... 70
Figure 47: The pixel based image analysis applied to complete scenes. .................................. 70
Figure 48: The training process of the MxL classification involved adding points to the
image. ........................................................................................................................ 72
Figure 49: Example of the selection of reference points in two aerial photographs. ................ 73
Figure 50: Matrix values of a classified scene. ......................................................................... 75
Figure 51: Graphical representation of the legend. ................................................................... 78
Figure 52: The original ortho-scene in (a) is classified with kNN algorithm in (b), the
erroneous classification of forest in (c) and after the application of noHaze
algorithm. .................................................................................................................. 84
Figure 53: An example of the suppression of holes within the clouds is presented. ................ 84
Figure 54: Flowchart presenting the evaluation of the results in the methodology. ................. 87
Figure 55: Graphical representation of the map layer produced. .............................................. 88
Figure 56: Sectors for the statistical evaluation. ....................................................................... 89
Figure 57: The first classification using the original bands. ..................................................... 92
Figure 58: Image sections (in 4 illustrations) distributed over Bavaria visualising the
results of the classification. ....................................................................................... 94
Figure 59: General view of the results of the PBIA classification. ........................................... 96
Figure 60: Results of the tuning process with the accuracies obtained. ................................... 101
Figure 61: Levels of accuracies found in the tuning process applied with SVM algorithm . . 102
Figure 62: Example of one decision tree for the image in path 192, row 126. ....................... 103
Figure 63: Diagram with the output of the ML algorithms applied. ........................................ 106
Figure 64: Trend of the accuracy by changing the sigma and C values for the SVM
algorithm. ................................................................................................................ 108
List of tables
Table 1: Summary of the launch dates and the sensors on board the satellites. .......................... 8
Table 2: Selected algorithms and their tuning parameters. ....................................................... 22
Table 3: List of LANDSAT scenes ordered from ESA. ........................................................... 39
Table 4: LANDSAT TM 5 satellite images observed in the study. .......................................... 40
Table 5: The final 7 LANDSAT TM images used in the project. ............................................. 40
Table 6: Variables obtained from the NFI as they were stored in the files, including their
description. ................................................................................................................ 42
Table 7: Map projection spatial reference parameters used (Butler et al., 2010). .................... 43
Table 8: The original field names and their descriptions contained in the national forest
inventory (NFI) for Bavaria. ..................................................................................... 46
Table 9: Example of the table obtained after applying a query for the total basal area per
species in each plot of the National Forest Inventory (NFI) database for Bavaria. ... 46
Table 10: Characteristics of the clusters found by PAM algorithm. ......................................... 49
Table 11: Experiment on SVM for the selection of features and the threshold of the NDVI feature. 53
Table 12: List of the orthorectified LANDSAT 5 TM images acquired. .................................. 58
Table 13: Example of the signature editing process and the color assigned per class. ............. 73
Table 14: Minimum Distance Classification. List of codes used for every class. .................... 76
Table 15: Maximum Likelihood Classification. List of codes used for every class. ................ 77
Table 16: The classified raster data were recoded with these values. ..................................... 77
Table 17. Results of the variable selection using the Bayesian Information Criterion (BIC). .. 89
Table 18: Results from the FSO. Features selected for the classification with kNN. ............... 90
Table 19: Some of the results of the recursive feature selection. .............................................. 90
Table 20: Results of the evaluation of the classification for forest and non forest areas for
the image in path 192, row 126. ............................................................................... 91
Table 21: Results of the evaluation of the accuracy for forest and non forest areas with all
of the 6 images. ......................................................................................................... 91
Table 22. Contingency matrix comparing the inventory data with the result of the MxL
classification method. ............................................................................................... 95
Table 23. Contingency matrix comparing the inventory data with the result of the MD
classification method. ............................................................................................... 95
Table 24: Results of the accuracy assessment for the classification per LANDSAT TM
image. ........................................................................................................................ 96
Table 25: Contingency table showing the results of the classification using MxL compared
with the inventory data. ............................................................................................ 99
Table 26: Contingency table showing the results of the classification using kNN compared
with the inventory data. ............................................................................................ 99
Table 27: Accuracies obtained for the scene in path 192 row 026 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 108
Table 28: Contingency table for the scene in path 192, row 026 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 109
Table 29: Accuracies found applying the SVM algorithm for classification with 4 replications
and 10-fold LGOCV for the scene in path 192, row 026. The minimum, first
quartile, median, mean, third quartile and maximum accuracy are shown. The
figures are on a scale from 0 (worst accuracy) to 1 (best accuracy). ................. 110
Table 30: Differences in accuracy among classifiers applied to the scene in path 192, row
026. The accuracies and their differences are presented on a scale from 0 to 1. ..... 110
Table 31: Results of the assessment of the classifications for the image in path 192, row 026. 111
Table 32: Results of the assessment of the classifications for the image in path 192, row 027. 111
Table 33: Results of the assessment of the classifications for the image in path 193, row 025. 112
Table 34: Results of the assessment of the classifications for the image in path 193, row 026. 112
Table 35: Results of the assessment of the classifications for the image in path 193, row 027. 112
Table 36: Results of the assessment of the classifications for the image in path 194, row 025. 113
Table 37: Accuracies obtained for the scene in path 192, row 026 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 139
Table 38: Accuracies obtained for the scene in path 192, row 027 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 139
Table 39: Accuracies obtained for the scene in path 193, row 025 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 140
Table 40: Accuracies obtained for the scene in path 193, row 026 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 140
Table 41: Accuracies obtained for the scene in path 193, row 027 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 141
Table 42: Accuracies obtained for the scene in path 194, row 025 tuning the C and sigma
parameters for the support vector machines algorithm. .......................................... 141
Table 43: Confusion table for the scene in path 192, row 026 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 142
Table 44: Confusion table for the scene in path 192, row 027 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 142
Table 45: Confusion table for the scene in path 193, row 026 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 143
Table 46: Confusion table for the scene in path 193, row 025 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 143
Table 47: Confusion table for the scene in path 193, row 027 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 143
Table 48: Confusion table for the scene in path 194, row 025 comparing the NFI plots
with the group obtained by SVM classification algorithm. .................................... 144
List of acronyms
CLC2000 Corine Land Cover database 2000
DEM Digital elevation model
ETM+ Enhanced Thematic Mapper Plus
FSO Feature Space Optimization
GIS Geographic information systems
GLS Global land survey
JRC Joint Research Centre of the European Commission (EC)
kNN k Nearest neighbours
LAI Leaf area index
LFU Bayerische Landesamt für Umwelt (Bavarian State Office for the Environment)
MD Minimum distance to means
ML Machine learning
MNL Multinomial logistic regression
MxL Maximum likelihood
NFI National forest inventory (in German BWI, Bundeswaldinventur)
OA Overall accuracy
PCA Principal component analysis
PDF Portable document format
REDD+ Reducing emissions from deforestation in developing countries and approaches to
stimulate action
SVM Support vector machines
TM Thematic Mapper
1 Introduction
Concern over the correct use and conservation of natural resources is increasing globally.
One indication is the number of documents on conservation and natural resources recorded in
the ScienceDirect database: 7428 in 2009, 7703 in 2010, 9288 in 2011 and 11493 in 2012.
The planning and evaluation of the use of forest resources is largely based on map products
(Kennedy et al., 2009; MacAlister et al., 2009; Raymond et al., 2009; Hooftman and Bullock,
2012). The mapping process is carried out by means of inventories and the analysis of
satellite images (Brandt, J. et al., 2012; Ohmann, J. et al., 2012; Sedano, F. et al., 2012).
Satellite data, together with advances in mathematics, statistics and the computing sciences,
are applied in order to develop and standardise methods and to produce maps in a more
efficient way.
Evaluations carried out in the past often lacked an agreed standard definition of forest.
Although there is still no universal agreement, in the year 2000 the FAO published a standard
definition of forest that should be considered in every new study of forest cover carried out
anywhere across the globe (Achard, 2009). Many earlier studies of forest were carried out in
small areas or with limited coverage, and the statistical methods applied were based upon the
supposition of a normal probability distribution, an assumption not supported by the more
comprehensive data sets provided by modern remote sensing instruments (Camps-Valls, G. and
Bruzzone, L., 2009). Newer non-parametric statistics provide methods and tools whose results
approximate the data collected from field measurements better than parametric statistics do
(Herbrich, R., 2002).
The evaluation of forest areas requires broad areas to be covered in a short time. The most
common data source for such evaluations is satellite scenes. The remote sensing community is
constantly producing maps and assessments of natural resources. Newly developed remote
sensing instruments orbiting the Earth can be used to calibrate, and even validate, the results
of older Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) products or other
remote sensing data (Racoviteanu, A. et al., 2008). NASA, the US National Aeronautics and
Space Administration, has a programme to continue the acquisition of new imagery using new
and improved instruments, which will provide the scientific community with even more
up-to-date data: the LANDSAT Data Continuity Mission (LDCM) (NASA, 2012). The United
Nations (UN) is currently implementing a programme on Reducing Emissions from
Deforestation and Degradation (REDD+) focused on the evaluation of natural
resources and on accounting for the CO2 emissions produced by developing countries (FAO,
UNDP, UNEP, 2012). By introducing changes to land use practices aimed at reducing the
emissions of CO2 produced by deforestation and degradation, the countries could derive
profits and receive technical support. Most of the evaluations made as part of this programme
were based on the comparison of forest layer coverage at different dates. However, in the
absence of a standard method and a standard forest definition, the comparability of the data is
open to question and the results are ambiguous (Bock, M. et al., 2005).
For these reasons, the assessment of methods and their application to the evaluation of forest
cover is of considerable importance for coming generations. The process may reveal ways to
apply the results obtained to many activities; for example, in supporting the reduction of CO2
emissions, the steering of the current forest cover to an optimal level, the restoration of
damaged forest areas, the preservation of ecological diversity and the provision of services to
communities (Grainger, A. and M. Obersteiner, 2011).
Other policies focusing on conservation and sustainable development may be supported by
evaluations of natural resources. The process of mapping forest cover is an important part of
this analysis. In general, the management of forests is based on knowledge of the current
conditions so that they may be steered towards optimal productive and protective conditions
(Corona, P. et al., 2011; Hooftman, D. et al., 2012; Ohmann, J. et al., 2011; Zaho, Y. et al.,
2011).
With the standardised method developed in this work, forest mapping, including forest map
production, the evaluation of degradation and carbon stock, and the related conservation and
management, can be improved in terms of quality and use of resources, leading to
internationally valid products. The decision-making process is strengthened when the current
condition of forest areas is well known.
1.1 Objectives
The general aim of the research presented here comprised two main tasks:
1. The development and comparison of methods to obtain forest maps for large areas:
First, separate the forest areas from other land cover types. The inductive approach can
ensure quality control in the processing chain.
Second, within forest areas, classify forest types.
Third, evaluate the accuracy of the classical pixel based image analysis (PBIA) for
classification of forest areas against object based image analysis (OBIA).
2. The generation of a digital layer comprising the forests of Bavaria.
2 Basic Information
2.1 Literature review
2.1.1 Definition of forest
There is no broadly accepted basic definition of forest today. Rowntree (1984) stated that
different definitions of forest could be obtained simply by changing the percent canopy cover.
Another study from the USA stated that canopy cover might range from 18 to 36 % (Marotz
and Coiner, 1973, cited by Rowntree, 1984, p. 4). Also in the USA, other authors cited by
Rowntree (1984) used 7, 8, 17, 24, 34 and 39 % canopy cover to define forest from aerial
photos. This problem was reviewed by the United Nations Environment Programme (UNEP).
The authors of the study presented world maps of the forest cover and illustrated graphically
the variation in the global forest area depending on the different thresholds of canopy cover
(Achard, 2009), as shown in Figure 1.
Figure 1: Contrasting maps of forest cover globally based on different definitions of canopy cover. Source: Hansen et
al. (2003) and Kirkup (2001), cited in Achard (2009).
A working paper by the FAO indicated the need for a concerted global effort to standardise
the definition of forest (Neeff et al., 2006). Some authors have stated the need to incorporate
the ecological parameters of pioneer and non-pioneer species (Swaine and Whitmore, 1988).
However, this definition is useful only for tropical rain forest. Other authors have stressed the
need to consider the natural mixture of taxonomical classes. As the image processing chain
normally reduces the amount of information pertaining to ecosystems of a continuous nature,
this reduction results in uncertainties in terms of ecosystem mapping (Rocchini et al., in
press). Sasaki and Putz (2009) highlighted the need for a new definition of forest that is
universally acceptable. They maintained that, “great quantities of carbon and other
environmental values will be lost when natural forests are severely degraded or replaced by
plantations but technically remain forests.” They also suggested that the classification as
forest should be based on a canopy coverage higher than 40 %. In this study, the definition
was modified in terms of canopy cover. The optical data obtained using remote sensing
instruments does not allow forests of low canopy cover to be distinguished readily. The Food
and Agriculture Organisation of the United Nations (FAO) provided a definition of forest as,
“land spanning more than 0.5 hectares with trees higher than 5 meters and a canopy cover of
more than 10 percent, or trees able to reach these thresholds in situ. It does not include land
that is predominantly under agricultural or urban land use” (FAO, 2010).
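The effect of the canopy cover threshold on what counts as forest can be illustrated with a minimal sketch. It encodes the numeric thresholds quoted above from FAO (2010) as default arguments and lets the canopy threshold be changed, for instance to the 40 % suggested by Sasaki and Putz (2009). The class `LandUnit`, the function `is_forest` and the example stand are hypothetical names and values introduced only for illustration; the clause on trees "able to reach these thresholds in situ" is deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass
class LandUnit:
    """Hypothetical description of a mapped land unit (illustrative only)."""
    area_ha: float              # area of the unit in hectares
    tree_height_m: float        # height of the trees in metres
    canopy_cover_pct: float     # canopy cover in percent
    agricultural_or_urban: bool = False  # predominant land use flag


def is_forest(unit, min_area_ha=0.5, min_height_m=5.0, min_canopy_pct=10.0):
    """Apply area, height and canopy thresholds (defaults follow FAO, 2010).

    Assumption: the potential of young trees to reach the thresholds in situ
    is ignored in this sketch.
    """
    if unit.agricultural_or_urban:
        return False
    return (unit.area_ha > min_area_ha
            and unit.tree_height_m > min_height_m
            and unit.canopy_cover_pct > min_canopy_pct)


# A made-up open stand with 25 % canopy cover.
open_stand = LandUnit(area_ha=2.0, tree_height_m=12.0, canopy_cover_pct=25.0)

fao_result = is_forest(open_stand)                       # 10 % threshold
strict_result = is_forest(open_stand, min_canopy_pct=40.0)  # 40 % threshold
print(fao_result, strict_result)
```

The same stand is classified as forest under the 10 % threshold and as non-forest under the 40 % threshold, which is the kind of disagreement between definitions discussed in this section.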
Certain approaches were developed to counter the loss of information. These approaches use
the landform classification based on hydrology and topography. In subdividing the Earth into
natural areas, the soil scientists and geologists introduced the term geomorphometric units
(Hengl & Reuter, 2009). The calculation of geomorphometric units is based on standard
terrain analysis. The geoinformatics principles are based mainly on DEM data and were
documented by Li et al. (2005). The uniform regions can be used as a first subdivision of the
geographic space, then either subdividing again or by directly applying the classification of
the scenes. An example of this approach was published by Pert et al. (2012). They developed
an index for anthropogenic threats of fire at a regional scale in a bioregion of Australia. In
their method they deemed that, “each sub-region has a characteristic climate, pattern of
geology and landform, and associated soils and vegetation” (Pert et al., 2012). A similar
method to address the continuum found in nature was used for a study of natural and land use
history in a mountain region of the USA and how these related to patterns of plant invasions
(Parks et al., 2005). The final product is a map that should be evaluated in terms of the
accuracy of the units presented (Seabrook et al., 2007; Fraser et al., 2009; Mellin et al., 2010).
A cloud of points is placed over the area according to certain sampling criteria, for example,
randomly, and the value read from the map at each point is compared to a reference. The
evaluation of the map is reported in terms of percent accuracy and of two probabilities: the
probability that a point assigned to a given class on the map belongs to the same class in the
verification data (user accuracy), and the probability that a point in the verification data is also
present in the corresponding class in the classification (producer accuracy) (Stehman, 1997;
Foody, 2002). The maps produced using remote sensing data are usually validated with data
from one of three different sources: inventory data (Thessler et al. 2005; Gillespie et al.,
2006), data from field measurements (Powell et al., 2010; Avitabile et al., 2012) or using
images with a higher resolution (Huang et al., 2008; Kirui et al., 2011).
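The accuracy measures described above can be sketched as follows. The example assumes that each sample point has already been reduced to a pair (class on the map, class in the reference data); the function name `accuracy_report` and the point counts are hypothetical and serve only to illustrate how overall, user and producer accuracy are derived from the same contingency matrix.

```python
def accuracy_report(pairs, classes):
    """Compute overall, user and producer accuracy from (map, reference) pairs."""
    # Contingency matrix: rows are map classes, columns are reference classes.
    matrix = {m: {r: 0 for r in classes} for m in classes}
    for mapped, reference in pairs:
        matrix[mapped][reference] += 1

    total = len(pairs)
    correct = sum(matrix[c][c] for c in classes)
    overall = correct / total

    user, producer = {}, {}
    for c in classes:
        row_total = sum(matrix[c].values())               # points mapped as c
        col_total = sum(matrix[m][c] for m in classes)    # reference points of c
        user[c] = matrix[c][c] / row_total if row_total else None
        producer[c] = matrix[c][c] / col_total if col_total else None
    return overall, user, producer


# Made-up sample of 100 points, for illustration only.
sample = (
    [("forest", "forest")] * 40
    + [("forest", "other")] * 10
    + [("other", "forest")] * 5
    + [("other", "other")] * 45
)
overall, user, producer = accuracy_report(sample, ["forest", "other"])
print(round(overall, 3), round(user["forest"], 3), round(producer["forest"], 3))
```

User accuracy is computed along the rows of the matrix (what the map claims) and producer accuracy along the columns (what the reference contains), so the two can differ markedly for the same class.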
The evaluation of natural resources, and particularly of forest, is frequently carried out using
remote sensing instruments. By definition, these instruments need not touch the objects being
measured in order to produce a measurement. “The remote sensing sensors can be classified
in two types: passive or active. Passive sensors measure radiation that reaches a detector
without the sensor first transmitting a pulse of radiation. Active sensors emit a pulse and later
measure the energy returned or bounced back to a detector. Both passive and active sensors
record the intensity of a signal within a wavelength interval, known as a ‘band’ or ‘channel,’
of specified width within the electromagnetic spectrum” (Turner et al., 2003). Another
classification of the instruments used for remote sensing is based on the number of bands
from the electromagnetic spectrum. Black and white cameras are examples of monospectral
instruments. Multispectral remote sensors are those with more than one band, such as those
producing colour photographs. Hyperspectral instruments usually have bands with a narrow
spectral width. Some bands in the sensors are capable of measuring the temperature of the
objects, and are referred to as thermal (Shaw and Burke 2003).
2.1.2 Forest and type of forest mapping with LANDSAT data
One of the most popular remote sensing systems in civil applications is the LANDSAT series,
which began in July 1972 with the launch of LANDSAT 1. From that moment up to the
present, the world has been recorded in images that have similar characteristics in terms of
geometry and radiometry. Currently LANDSAT 5 TM is on standby, in preparation for
decommissioning by the USGS. Since May 31st 2003 LANDSAT 7 has been operating with a
fault in the scan line corrector, a condition known as SLC-off. This fault produces
banding at the borders of the image. A summary of the launch dates and sensors on board the
satellites is provided in Table 1, and the timeline of the LANDSAT missions in Figure 2.
Table 1: Summary of the launch dates and the sensors on board the satellites.
LANDSAT    Launch date         Sensors     Imaging up to
1          July 23, 1972       RBV, MSS    January 6, 1978
2          January 22, 1975    RBV, MSS    February 25, 1982
3          March 5, 1978       RBV, MSS    March 31, 1983
4          July 16, 1982       MSS, TM     December 14, 1993
5          March 1, 1984       MSS, TM     August 1995 (MSS); May 8, 2012 (TM)
6          October 5, 1993     ETM         Did not achieve orbit
7          April 15, 1999      ETM+        Present
8-LDCM     January 24, 2013    OLI         -
Figure 2: LANDSAT mission’s timeline (USGS, 2012).
Access to this historic archive is open and available through the internet without
restriction, under copyright of the USGS. New advances in mathematics and statistics have
opened broad avenues for applying grouping techniques to the classification of satellite
images. A branch of machine learning has produced techniques that classify numerical data
without distributional assumptions (Yang, 2010). The success of non-parametric methods is
frequently reported in remote sensing research papers (Tokola et al., 1996; Chirici et al.,
2008; Shimatani et al., 2008; Chang et al., 2010).
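As an illustration of a classifier that makes no distributional assumption, the following is a minimal sketch of the k nearest neighbours (kNN) rule: a pixel receives the majority class of the k closest training samples in feature space. The two-band feature vectors, the labels and the name `TRAINING_SAMPLES` are hypothetical values chosen for the example and are not data from this study.

```python
import math
from collections import Counter

# Hypothetical training samples: (red reflectance, near-infrared reflectance), label.
TRAINING_SAMPLES = [
    ((0.04, 0.45), "forest"),
    ((0.05, 0.38), "forest"),
    ((0.06, 0.42), "forest"),
    ((0.20, 0.25), "other"),
    ((0.22, 0.30), "other"),
    ((0.18, 0.22), "other"),
]


def knn_classify(sample, training, k=3):
    """Return the majority label among the k training samples nearest to `sample`."""
    # Sort training samples by Euclidean distance in feature space.
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((0.05, 0.40), TRAINING_SAMPLES))
print(knn_classify((0.21, 0.26), TRAINING_SAMPLES))
```

No mean vector or covariance matrix is estimated for the classes; the decision depends only on the local neighbourhood of the training data, which is what distinguishes this family of methods from classifiers based on a normal distribution.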
These characteristics make the LANDSAT satellite series the most popular reference for
mapping purposes. Even the new remote sensing instruments, using similar or contrasting
resolutions, are commonly used to validate or calibrate TM and ETM+ data (Cohen et al.,
2010; Huang et al., 2010; Propastin and Erasmi, 2010).
The accuracy reported for these maps has increased with time. Wolter et al. (1995) produced
maps containing a classification of forest in the USA. They reported 83.2 % OA based on
verification using aerial photos and maps. Foody and Hill (1996) reported 84 to 94 % OA.
They based their validation on contextual information and ancillary information, particularly
on topography. Hyyppä et al. (2000) based a validation of a forest map produced from remote
sensing data on forest inventory data. They used SAR, LANDSAT TM, ERS-1/2 SAR PRI
and SLC, JERS-1 SAR as a source of data and reported the accuracy in terms of error. The
random error obtained for the mean height of the trees was 2 m, with a standard
deviation of 0.57 m. The random error of the basal area was 3.7 m²/ha, with a
corresponding standard deviation of 0 m²/ha. For volume, the random error was
41.0 m³/ha, with a standard deviation of 19.3 m³/ha. Such a large number of
sources of remote sensing data is not available for locations all around the world, and would
cost too much for conservation programmes and for the evaluation of forest degradation.
Heikkilä et al. (2002) used LANDSAT TM and aerial photographs as sources of remote
sensing data for the purposes of evaluating defoliation. They ascertained a 60.1 % OA for
sample plots that changed due to defoliation. They also reported that the most useful variables
to estimate defoliation are TM channels 4 and 5. However, the OA was 56 % for plot and
stand level, with a kappa coefficient of 0.12. Dorren et al. (2003) fused LANDSAT TM, IRS
and DEM data for the purposes of forest mapping, used object-based classification and
compared the results against the pixel classification. They used colour-infrared orthophotos to
build ground truth polygons to be used in the evaluation. Their results showed that the pixel-
oriented classification was the best approach, with 73 % OA. The object-oriented
classification achieved only 70 % OA. The most important features used in the analysis were
DEM, TM4, TM5, TM4 - TM2. Jiang et al. (2004a) used digital ortho-quad images in the
verification of maps of seral forest made from LANDSAT ETM+ images and attained 90 %
OA.
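Several of the studies above report a kappa coefficient alongside the OA. The following minimal sketch shows how Cohen's kappa can be computed from a contingency matrix, assuming that this is the kappa statistic meant by the cited authors. The function name `cohen_kappa` and the matrix values are hypothetical and do not reproduce the figures of any cited study.

```python
def cohen_kappa(matrix):
    """Compute Cohen's kappa from a square contingency matrix (list of rows).

    Rows are the classes on the map, columns the classes in the reference data.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)

    # Observed agreement: proportion of points on the diagonal (equals the OA).
    observed = sum(matrix[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    return (observed - expected) / (1 - expected)


# Made-up two-class matrix with an OA of 0.85.
example_matrix = [
    [40, 10],
    [5, 45],
]
kappa = cohen_kappa(example_matrix)
print(round(kappa, 3))
```

Because kappa discounts the agreement expected by chance, a classification can show a moderate OA and still a kappa close to zero when one class dominates the sample.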
Salovaara et al. (2005) attained 85 % OA in the classification of Amazonian primary rain
forest vegetation. They used line transects of 500 m to make the floristic assessment. Kennedy
et al. (2007) obtained 90 % OA in trajectory-based change detection for the automated
characterisation of forest disturbance dynamics. The evaluation was carried out by comparing
delineations of the change in forest conditions made visually by the interpreter against the
findings of the automated approach. The same level of accuracy (90 %) was found by
Pekkarinen et al. (2007) in the pan-European forest map. They used supervised classification
and validated on the basis of three sources: field plots, visual interpretation of very high
resolution imagery and official national statistics. The highest error of 5 % was found in the
national statistics. In the USA, Triepke et al. (2008) used LANDSAT TM+ data to map forest
alliances and associations. They achieved 60 % accuracy using a regional database of plot
records. Thessler et al. (2008) evaluated two statistical grouping techniques to classify rain
forest types in Costa Rica with LANDSAT TM images. They achieved 91 % OA validating
using three different field data sets: 41 plots in an old growth forest, 11 permanent plots in an
old growth forest and 52 plots in a young forest. Pu et al. (2008) reported 71 % OA mapping
forest crown closure and leaf area index from LANDSAT ETM+ data. They also used data
from other sensors, such as Hyperion and ALI. Both of the latter provided better results than
ETM+. The results of these studies show that the classification accuracy of LANDSAT TM
images is not the same for all locations. Most of the recent research was conducted using
more sources of data or using data of a higher spatial resolution. Using IKONOS data, Huang
et al. (2008) applied the concept of the dark object to automate forest change analysis. They
evaluated the change in seven study areas representing different biomes across the world. The OA
was over 90 % for five areas, compared to 89.4 % and 89.6 % for the remaining two.
A forest map of the whole of Europe has been produced (Pekkarinen et al., 2007; 2009) by
the Joint Research Centre (JRC) of the European Commission (EC). They employed a fully
automated procedure, verified using the Corine Land Cover database 2000 (CLC2000). The
method is one of the most widely accepted in the region but the minimum mapping unit
(MMU) was 25 ha and the degree of canopy cover 30%. Given these characteristics,
especially the MMU, many forest areas were excluded from the map. They applied
segmentation of single LANDSAT TM and ETM+ images, and built an adaptive spectral
representativity analysis tool to identify representative combinations of spectral and
informational classes of interest based on the CLC2000 database. Seebach et al. (2011)
reported that the adaptive spectral representativity analysis is the core part of the method to
properly select the CLC2000 data as training data.
Today new methods are waiting to be applied in forest evaluations. More than 100 methods from
machine learning and information theory are available and have been programmed into the R
statistical language (Kuhn et al., 2012). However, only a dozen of these are reported
in the literature in the context of forest and land cover mapping applications (Moisen and
Frescino, 2002; Remm, 2004; Peters et al., 2007; Rogan et al., 2008; Sesnie et al., 2008;
Brenning, 2009; Ke et al., 2010; Chen et al., 2011). New methods such as robust linear
regression, quantile regression forests, quantile regression neural networks, stabilised linear
discriminant analysis and high dimensional discriminant analysis are just some of those
reported on and implemented in the statistical software (Kuhn et al., 2012). These methods are
waiting to be applied in forest mapping applications. All that is required for their application
is good data, some modelling foundations and careful analysis of the results.
2.1.3 Variable selection methods
The main interest in modelling is to represent the world with the smallest number of variables,
also known as features, predictors or descriptors. That principle is known as parsimony
(Marsh and Hau, 1998, cited by Raykov and Marcoulides, 1999). Three types of variable
selection methods were selected based on the availability of preprogrammed software
accessible for implementation.
2.1.3.1 Criterion based variable selection
Both calculation methods share some properties:
They are based on the amount of information obtained from a set of variables.
They do not depend on the normality of the input variables.
They can use either categorical or continuous variables.
They can use forward or backward stepwise variable selection.
They penalize some variables to promote others.
2.1.3.2 Feature space optimization (FSO)
According to Trimble (2011b), “the feature Space Optimization function offers a method to
mathematically calculate the best combination of features in the feature space.” When
classifying image objects using the Nearest Neighbor classifier, the recommended workflow
involves the following steps:
1. Load or create classes
2. Define the feature space
3. Define sample image objects
4. Classify, review the results and optimize the classification.
This option is available only in eCognition, and no other classifiers can be tested with it.
2.1.3.3 Recursive feature selection
A tool already implemented for variable selection was chosen from the caret package in R.
This method for variable selection is based on random forest. The algorithm uses backward
variable selection. The process starts with all of the variables as predictors in the model. The
predictors are ranked and the least important ones are sequentially eliminated prior to
modelling. The goal is to find a subset of predictors that can be used to produce an accurate
model. The metric used to select the predictors can be either accuracy or the root mean square
error (Kuhn et al., 2012). The documented steps include eliminating zero and near-zero
variance predictors, identifying and removing correlated predictors, removing linearly
dependent predictors, centering and scaling, imputation, transformation and data
splitting.
With the variables selected, a classification approach can be applied. For remote sensing
data, classification is divided into two types: unsupervised and supervised. A general
description of these two types of classification is given in the next sections.
2.1.4 Unsupervised classification
The classification algorithms applied can be divided into two types: supervised and
unsupervised. Both of them build groups called clusters. The principle is that each group
should contain elements with similar characteristics. In unsupervised classification, it is
not known ahead of time where the clusters are located or what they look like. This is the
difference from supervised learning or classification, which attempts to assign data points to
pre-existing classes (Janert 2011).
Clustering techniques can be used for classification, with the definition of the classes made at
the end of the process (Gan et al. 2007). The list of algorithms reported in the mathematical
and statistical sciences keeps growing. Two algorithms were selected for the
classification of forest and non-forest areas and for clustering the reference data. The main
reason is that they had previously been used for forest mapping or were recommended for
obtaining strong cluster structures in the data. Both methods build the clusters around central
points. Centroids are the objects located in the center of the cluster and are considered
representative of it, as in ISOCLASS. In partitioning, the other algorithm selected and
described in 2.1.4.2 (page 13), the objects in a cluster “show high degree of similarity, while
objects belonging to different clusters are as dissimilar as possible.” The dissimilarity is a
measure of distance in relative units from 0 to 1. A short description of both clustering
algorithms is presented next.
2.1.4.1 ISODATA
“The Iterative Self-Organizing Data Analysis Technique (ISODATA) is considered an
iterative technique because it may make many passes through the data, rather than just two, in
order to develop an adequate final set of clusters” (Khorram 2012, 49). This algorithm was
described by Kaufman and Rousseeuw (2005), who cited Ball and Hall (1965) and
presented the main features as follows:
“The method starts with a clustering into a given number k of clusters, according to the
method of Forgy
In a second step, outliers and very small clusters are eliminated; they are disaggregated for the
remainder of the method.
Then perform either a lumping (fusion) or splitting of one of the clusters. This is done
according to the following rules:
Perform a lumping if the current number of clusters is more than 2k.
Perform splitting if the current number of clusters is less than k/2.
Otherwise alternate between lumping and splitting.
Stop if the same clustering is obtained twice.
Return to the first step with the newly obtained number of clusters (which replaces k), unless
the user-specified maximum number of iterations is reached.” (Kaufman and Rousseeuw,
2005)
The ISODATA algorithm has proven useful in the separation of forest and non-forest
areas as well as in soil cover and crop evaluation (Gumma et al. 2011; Dheeravath et al. 2010;
Kim and Ellis 2009; Biradar et al. 2009; Dorrough and Moxham 2005; Verhoeye and De
Wulf 2002; Chen, Tateishi, and Wang 1999; Cihlar, Ly, and Xiao 1996; Pope, Rey-Benayas,
and Paris 1994).
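The lumping and splitting rules quoted above can be illustrated with a minimal Python sketch. It covers only the choice of the next operation, not the clustering itself. The function name and the `last_action` argument are assumptions introduced here for illustration and do not come from Kaufman and Rousseeuw (2005).

```python
def isodata_action(n_clusters, k, last_action):
    """Choose the next ISODATA operation from the quoted rules.

    n_clusters  : current number of clusters
    k           : number of clusters requested by the user
    last_action : "lump" or "split", the operation performed previously
                  (hypothetical bookkeeping used only for the alternation rule)
    """
    if n_clusters > 2 * k:
        return "lump"    # too many clusters: fuse two of them
    if n_clusters < k / 2:
        return "split"   # too few clusters: divide one of them
    # Otherwise alternate between lumping and splitting.
    return "split" if last_action == "lump" else "lump"
```

With k = 10, for example, 25 clusters would trigger a lumping step and 4 clusters a splitting step, while any number between 5 and 20 alternates between the two operations.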
2.1.4.2 Partitioning around medoids (PAM)
The PAM method is also known as k-medoids (Kaufman and Rousseeuw 2005). “The
algorithm works in the same way as the k-means algorithm except that, instead of calculating
the new centroid, search through all points in the cluster to find the data point (the medoid)
that has the smallest average distance to all other points in the cluster” (Venables and Ripley
2010). The k-means algorithm is linear in the number of data points, whereas the k-medoids
algorithm is quadratic in the number of points (Janert 2011), which makes the solution
computationally heavy for a single computer. PAM was selected for three main reasons: it can
process categorical data; k-means fails when strong clusters are present; and in k-means the
operator must set the number of clusters k without any tool to assist in this choice, whereas in
PAM the silhouette is built and can be used to assess the most adequate value of k (Janert 2011).
The PAM method is also “more robust to messy data, and will always return the same clusters”
(Hengl 2007). The documentation of the IDAMS statistical package developed by UNESCO (2008)
describes the algorithm and states that “It is more robust than k-means, because it minimizes a
sum of dissimilarities instead of a sum of squared Euclidean distances.”
As documented in Janert (2011), the algorithm selects initial positions in the feature space for
the cluster medoids, calculates the distance (dissimilarity) of each point from each cluster
medoid, assigns each point to the nearest cluster and repeats the process. All the points are
tried as medoids in this process. The procedure is run for each number of clusters from k down
to 2, recording the mean dissimilarities, building the silhouette and calculating the silhouette
coefficient (SC). The operator should select the value of k that gives the maximum SC. The
R implementation of PAM also includes other tools for the graphic visualization of clusters as a
standard cluster plot and a silhouette representation. When the dataset is large, another tool
called CLARA, an acronym for Clustering Large Applications, is also available. It makes it
possible to apply the PAM algorithm to large datasets.
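The assignment and medoid update steps, together with the selection of k by the silhouette coefficient, can be sketched in Python as follows. This is a simplified alternating k-medoids procedure, not the full swap-based PAM of Kaufman and Rousseeuw (2005) or the R implementation. The function names, the Euclidean distance and the naive choice of the first k points as starting medoids are assumptions made for illustration.

```python
import math

def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    medoids = list(range(k))  # naive start (assumption): the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:   # keep the old medoid if a cluster became empty
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(members, key=lambda i: sum(
                math.dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)

def silhouette_coefficient(points, labels):
    """Mean silhouette width; expects at least two non-empty clusters."""
    widths = []
    for i, p in enumerate(points):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:           # a singleton cluster has width 0 by convention
            widths.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in other) / len(other)
            for other in ([j for j, lab in enumerate(labels) if lab == c]
                          for c in set(labels) if c != labels[i]))
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)

def best_k(points, candidates):
    """Return the candidate k with the highest silhouette coefficient."""
    return max(candidates,
               key=lambda k: silhouette_coefficient(points, k_medoids(points, k)[1]))
```

For a data set made of two compact and well separated groups of points, calling `best_k(points, [2, 3, 4])` would be expected to favour k = 2, because splitting a compact group lowers the silhouette widths of its members.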
A short description presented by Kaufman and Rousseeuw (2005) states that “the clustering
of a set of objects with CLARA is carried out in two steps. First a sample is drawn from the
set of objects and clustered into k subsets using the k-medoid method, which also gives k
representative objects (this is done with the same algorithm as in PAM). Then, each object not
belonging to the sample is assigned to the nearest of the k representative objects. This yields a
clustering of the entire data set. A measure of the quality of this clustering is obtained by
computing the average distance between each object of the data set and its representative
object. After five samples have been drawn and clustered, the one is selected for which the
lowest average distance was obtained. The resulting clustering of the entire data set is then
analysed further. For each cluster, CLARA gives its size and its medoid and prints a complete
list of its objects. Also a graphical representation of the clustering is provided. CLARA can
deal with much larger data sets than can PAM in the same computer.”
Supervised classification is another statistical approach, different from clustering or
partitioning. The nature and the procedures of these methods are very different and are
presented in the next section.
2.1.5 Supervised classification
Supervised classification, also known as supervised learning, assigns each record to exactly
one of a set of predefined classes. In supervised classification the classes are known ahead of
time and do not need to be inferred from the data. Algorithms are judged on their ability to
assign records to the correct class (Janert 2011).
Janert (2011) continues: “first the user split the existing data set into a training set and a
test set. In the training phase, present each record from the training set to the classification
algorithm. Next compare the class label produced by the algorithm to the true class label of
the record in question; then adjust the algorithm’s “parameters” to achieve the greatest
possible accuracy or, equivalently, the lowest possible error rate. The results can be
summarized in a so-called confusion matrix whose entries are the number of records in each
category. Unfortunately, the error rate derived from the training set (the training error) is
typically way too optimistic as an indicator of the error rate the classifier would achieve on
new data—that is, on data that was not used during the learning phase. This is the purpose of
the test set: after we have optimized the algorithm using only the training data, we let the
classifier operate on the elements of the test set to see how well it classifies them. The error
rate obtained in this way is the generalization error and is a much more reliable indicator of
the accuracy of the classifier.”
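The confusion matrix described above, and the two summary figures used throughout this chapter (overall accuracy, OA, and the kappa coefficient), can be computed with a short Python sketch. The function names and the layout of the matrix (rows for true classes, columns for predicted classes) are assumptions made for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count records per (true class, predicted class) pair."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)
```

Applied to the labels of the test set, the OA obtained in this way estimates the generalization accuracy, whereas the same figure computed on the training set would typically be too optimistic.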
A set of five algorithms for supervised classification will be evaluated. They all share similar
properties. They are distribution free. The Bayesian basis of the analysis takes a
probabilistic (i.e., nondeterministic) view of classification (Janert 2011). All of them have
previously been applied to remote sensing data in small regions, with similar data, and were
reported to give good to very good accuracies in the validation process.
2.1.5.1 Minimum distance to means (MD) method
According to Navulur (2007), “the minimum distance classifier sets up clusters in
multidimensional space, each defining a distinct class. Each pixel within the image is then
assigned to that class it is closest to. This type of classifier determines the mean value of each
class in each band. It then assigns unknown pixels to the class whose means are most similar
to the value of the unknown pixel.” The method is presented as a robust estimator of unknown
distributions (Parr and Schucany, 1980). This method is “not widely used in remote sensing
work” as it “is not always accurate; there is no provision for accommodating differences in
variability of classes, and some classes may overlap at their edges” (Campbell and Wynne,
2011). See Figure 3.
Figure 3: Minimum distance classifier. Source: Campbell and Wynne (2011).
In Figure 3, the small dots represent pixels from training fields and the crosses represent
examples of large numbers of unassigned pixels from elsewhere on the image. Each of the
pixels is assigned to the closest group, as measured from the centroids (represented by the
larger dots) using the distance measures discussed in the text.
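The minimum distance rule can be written as a short Python sketch: the mean of each class is computed per band from the training pixels, and an unknown pixel receives the class of the nearest mean. The function names, the use of squared Euclidean distance and the data layout (a dictionary of training pixels per class) are assumptions made for illustration.

```python
def class_means(training):
    """Mean value of each class in each band.

    training maps a class label to a list of pixel vectors (one value per band).
    """
    return {label: [sum(band) / len(pixels) for band in zip(*pixels)]
            for label, pixels in training.items()}

def classify_min_distance(pixel, means):
    """Assign the pixel to the class whose mean vector is closest."""
    return min(means, key=lambda label: sum(
        (value - mean) ** 2 for value, mean in zip(pixel, means[label])))
```

Because only the class means are used, the variability of each class is ignored, which is the limitation pointed out by Campbell and Wynne (2011).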
2.1.5.2 Maximum likelihood (MxL)
The MxL method is “the most powerful classifier in common use” (Navulur, 2007). Navulur
(2007) stated that, “based on statistics mean, variance/covariance, a Bayesian probability
function is calculated from the inputs for classes established from training sites. Each pixel is
then judged as to the class to which it most probably belongs.” The MxL method is
implemented in many image processing programmes and is often reported to be a good
classifier for forest mapping (Foody and Hill, 1996; Bozdogan, 2000; Kokaly et al., 2007;
Ward, 2008; Couturier et al., 2009). As the classifier requires intensive calculations it has the
disadvantage of requiring more computer resources than most of the simpler techniques.
Another characteristic is that the method is sensitive to variations in the quality of training
data, even more so than most other supervised techniques. Computation of the estimated
probabilities is based on the assumption that both training data and the classes themselves
display multivariate normal (Gaussian) frequency distributions. That is the reason why the
training samples should present unimodal distribution, and often the data from remotely
sensed images do not strictly adhere to this rule (Campbell and Wynne, 2011). Figure 4
presents this principle.
Figure 4: Maximum likelihood classification. Source: Campbell and Wynne (2011).
These frequency distributions represent pixels from two training fields; the zone of overlap
depicts pixel values common to both categories. The relation of the pixels within the region of
overlap to the overall frequency distribution for each class defines the basis for assigning
pixels to classes. Here, the relationship between the two histograms indicates that the pixel
with the value ‘45’ is more likely to belong to the forest (‘F’) class rather than the crop (‘C’)
class.
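The decision illustrated in Figure 4 can be reproduced for a single band with a small Python sketch. It fits a normal distribution to the training values of each class and assigns a pixel to the class with the highest likelihood. Equal prior probabilities, the single-band simplification and all function names are assumptions made for illustration; the training values in the usage note are invented and are not taken from Campbell and Wynne (2011).

```python
import math

def fit_gaussian(values):
    """Mean and sample variance of the training values of one class."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance

def log_likelihood(x, mean, variance):
    """Logarithm of the normal density at x."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)

def classify_mxl(x, training):
    """Assign x to the class with the highest likelihood (equal priors assumed)."""
    stats = {label: fit_gaussian(values) for label, values in training.items()}
    return max(stats, key=lambda label: log_likelihood(x, *stats[label]))
```

With hypothetical training values such as `{"F": [40, 44, 46, 50], "C": [52, 58, 60, 66]}`, a pixel with the value 45 falls much closer to the centre of the forest distribution than to that of the crop distribution and would be assigned to the forest class.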
2.1.5.3 Fuzzy k nearest neighbours (kNN) method
Fuzzy kNN is a method for classifying objects based on the closest training samples in the
feature space. The k-nearest neighbour algorithm is a non-parametric classification algorithm
that considers an object classified by a majority vote of its feature space neighbours with the
object being assigned the most common class among its k nearest neighbours (k is a positive
integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest
neighbour (Shakhnarovich et al., 2006). Figure 5 presents this voting process.
Figure 5: k-nearest neighbours classifier. Source: Campbell and Wynne (2011).
kNN assigns candidate pixels according to a ‘vote’ of the k neighbouring pixels, with k
determined by the analyst. Mills (2011) compared the kNN to other newer methods, such as
learning vector quantisation and support vector machines. He found that kNN is slower in
both stages: training and classifying. Man et al. (2004) reported that, “k-nearest neighbours is
computationally efficient and is easy to visualise and understand.” The same kNN method
was used by Mäkelä and Pekkarinen (2004) to discriminate stand volumes from TM images
and field inventory data from stands. They concluded that, “the estimation results obtained are
not accurate enough for forest management purposes.” This conclusion may be inaccurate,
however, because they selected volume as a predicted variable, and the remote sensing data
does not include information about elevation of the canopy.
The parameter k is the one tuned when kNN is trained with the caret package.
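The voting process can be sketched in a few lines of Python. The sketch implements the plain majority vote described above, not the fuzzy variant, and the function name, the Euclidean distance and the data layout are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(sample, training, k):
    """Majority vote among the k training samples closest to `sample`.

    training is a list of (feature_vector, class_label) pairs.
    """
    neighbours = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With k = 1 the function reduces to the nearest neighbour rule. Larger values of k smooth the decision, which is why k is the parameter that needs tuning.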
2.1.5.4 Support vector machines (SVM)
The SVM is a supervised learning kernel method that generates input-output mapping
functions from a set of labelled training data to create theoretical areas called
maximum-margin hyperplanes. The hyperplane (see Figure 6b) separates the training samples in
an n+1 dimensional feature space. The new dimension is automatically calculated by the algorithm.
In cases when given classes cannot be linearly separated in the original input space, the SVM
first (non-linearly) transforms the original input space into a higher dimensional feature space
(Wang, 2005). The formulation of this process was also presented by Karatzoglou et al.
(2006) and Koutroumbas et al. (2006).
Figure 6: In the example of the classification with SVM, the groups can be separated by a line when the axis is
transformed. Source: Verplancke et al. (2008).
As presented in Figure 6, in (a) the true decision boundary $x_1^2 + x_2^2 \le 1$ is also shown. (b)
The same data after mapping into a three-dimensional input space
$(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2)$. The circular decision boundary in (a) becomes a linear
decision boundary in three dimensions (b).
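The effect of this mapping can be checked numerically with a short Python sketch. It assumes that the third coordinate of the mapping is the square root of 2 multiplied by x1 and x2, which is the usual form of this example; the function names are introduced here for illustration only.

```python
import math

def feature_map(x1, x2):
    """Map a 2-D point into the three-dimensional space of Figure 6b."""
    return (x1 ** 2, x2 ** 2, math.sqrt(2) * x1 * x2)

def inside_circle(x1, x2):
    """Circular decision boundary in the original space."""
    return x1 ** 2 + x2 ** 2 <= 1

def below_plane(z):
    """The same boundary in the mapped space is the plane z1 + z2 = 1."""
    return z[0] + z[1] <= 1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

A point lies inside the circle exactly when its image lies on or below the plane, so a linear classifier is sufficient after the mapping. In addition, the dot product of two mapped points equals the squared dot product of the original points, which is why a kernel function can replace the explicit mapping.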
“In searching for the best hyper-plane, SVMs find a set of data points which are most difficult
to classify. These data points are referred to as support vectors” (Yang, 2010). SVM finds a
linear separating hyperplane with the maximum margin in this higher dimensional space. The
penalty cost C and the gamma parameter (which some authors refer to as sigma) are unknown and
should be determined for every single problem (Hsu et al., 2010). That determination is referred
to as tuning in the modelling process. The grid search is an exhaustive tuning method that
searches through the parameter space, or a subset of it, for the best values of these parameters.
The cross-validation procedure can prevent the overfitting problem. By dividing the training
set into v subsets of equal size, sequentially one subset is tested using the classifier trained on
the remaining v-1 subsets. Thus, each instance of the whole training set is predicted once so
the cross-validation accuracy is the percentage of data correctly classified (Hsu et al., 2010).
“Training the SVM with a Gaussian radial basis function requires setting two parameters: C is
a regularization parameter that controls the trade-off between maximizing the margin and
minimizing the training error, while sigma describes the kernel width” (Knorn et al. 2009).
In this case, the main application of SVM is in the classification of sets of objects selected
from the output of the segmentation. Every segment comprises spectrally similar neighbouring
pixels, for which basic statistical measurements were calculated. The SVM parameters sigma
and C were tuned using the caret package.
2.1.5.5 Classification and decision trees (DT)
According to Janert (2011), “decision trees consist of a hierarchy of decision points (the
nodes of the tree). When using a decision tree to classify an unknown instance, a single
feature is examined at each node of the tree. Based on the value of that feature, the next node
is selected. Leaf nodes on the tree correspond to classes; once we have reached a leaf node,
the instance in question is assigned the corresponding class label.”
A single variable is used at every node. The algorithm can be adapted to search among a given
number of predictors at every node. This number is the only value tuned for the DT. In caret the
name of this parameter is mtry.
2.1.5.6 Classification with random forest (RF)
“RF apply specifically to decision trees. In this technique, randomness is introduced not by
sampling from the training set but by randomly choosing what features to use when building
the decision tree. Instead of examining all features at every node to find the feature that gives
the greatest gain ratio, only a subset of features is evaluated for each tree” (Janert, 2011).
Witten et al. (2011) state that RF builds a randomized decision tree in each iteration of the
algorithm, and often produces excellent predictors. The parameter mtry, as used in DT, was
used in caret for tuning the RF.
2.1.5.7 Classification with multinomial logistic regression (MNL)
MNL is a regression model that generalizes logistic regression by allowing more than two
discrete outcomes, that is, it extends logistic regression to multiple responses
(Wikipedia contributors, 2012). “Until the arrival of support
vector machines, it was the method of choice for many classification problems” (Janert,
2011). The logistic function takes on only values between 0 and 1.
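The relation between the logistic function and its multinomial generalization can be shown with a brief Python sketch. The multi-class form used here is the softmax function, which is the standard way of turning one linear score per class into probabilities; the function names are assumptions made for illustration and the sketch does not reproduce the nnet implementation.

```python
import math

def logistic(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multinomial generalization: one probability per class, summing to 1."""
    largest = max(scores)                       # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For two classes with scores z and 0, the first softmax probability equals `logistic(z)`, which shows in what sense the multinomial model extends the binary one.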
In the application of MNL regression, the nnet package is used to search for the best decay
parameter, which “adds to the error function a penalty term that consists of the squared sum of
all weights in the network, as in ridge regression. This attempts to limit the influence of
irrelevant connections on the network’s predictions by penalizing large weights that do not
contribute a correspondingly large reduction in the error” (Witten et al., 2011). The term
network comes from the field of machine learning, where research into the supervised learning
problem was motivated by biological analogies to the brain (Hastie et al.,
2009). This principle is presented in Figure 7.
Figure 7: Representation of a network from the multilayer perceptron with a hidden layer. From Witten et al. (2011).
Figure 7 presents the paradigm of the biological neural network in the brain, connecting input
data to produce the output based on the connections among ideas.
Each of the algorithms presented could be applied once, and the results would give a measure of
error and accuracy. With machine learning tools, the principle is to apply the same algorithm
many times under different conditions. A description of this process is presented in the
following section.
2.1.6 Classification with machine learning tools
The list of methods available for classification under the machine learning approach is
extensive. In the caret package in R this list features 122 methods for classification, regression
only or a combination of both (Kuhn et al., 2012). A set of five classification and regression
algorithms was used to classify forest types. The algorithms were selected based on their ability to
classify satellite images efficiently in forest mapping applications (Inglada, 2007; Balas,
2008; Brenning, 2009; Qiu et al., 2009; Chen et al., 2011). The algorithms selected and
compared were maximum likelihood (MxL), minimum distance to means (MD), k nearest
neighbours (kNN), multinomial logistic regression (MNL) and support vector machines
(SVM). The MxL and MD methods were applied for the classification of forest and non-forest
areas. The kNN, MNL and SVM methods were applied in classification and regression for
forest types in a subset of one scene only. All the supervised classification algorithms were
applied to the OBIA products. The classification of forest/non-forest areas was made using
the pixel-oriented approach and the classification of forest types was made using object-
oriented segmentation (Trimble, 2011a).
2.1.7 Tuning algorithms
The machine learning (ML) tools must be implemented in such a way that the selected model
is trained using the selected parameter values. The best value of a tuning parameter is not
known a priori. A measure of quality such as accuracy should be evaluated to find the best value
for each parameter. Each algorithm has its own tuning parameters. According to the manual for
caret (Kuhn et al., 2012), the list of parameters used is defined by the algorithm selected.
Table 2 presents the selected algorithms and their tuning parameters for classification and
regression.
Table 2: Selected algorithms and their tuning parameters.
Model family                     Method     Package       Tuning parameters
Random forest                    rf         randomForest  mtry
                                 cforest    party         mtry
Support vector machines          svmRadial  kernlab       sigma, C
k nearest neighbors              knn        caret         k
Multinomial logistic regression  multinom   nnet          decay
The random forest “is a substantial modification of bagging that builds a large collection of
de-correlated trees, and then averages them” (Hastie et al., 2009). “Each tree is learned using
a bootstrap sample obtained by randomly drawing N cases with replacement from the original
dataset, where N is the number of cases in that dataset. With each of these training sets, a
different tree is obtained. Each node of these trees is chosen considering only a random subset
of the predictors of the original problem. The size of these subsets should be much smaller
than the number of predictors in the dataset. The trees are fully grown, that is, they are
obtained without any post-pruning step” (Torgo, 2010). Witten et al. (2011) stated that this
“often produces excellent predictors”. For this reason, the random forest was used for variable
selection in this process. The parameter mtry, used both for decision trees and random forest,
is the only parameter that requires that some judgment be made. In the process of tuning, the
value that achieves the highest accuracy is the one to use.
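The two sources of randomness described by Torgo (2010) can be illustrated with a minimal Python sketch: a bootstrap sample of N cases drawn with replacement for each tree, and a random subset of mtry predictors considered at each node. The sketch does not grow any tree, and the function names are assumptions made for illustration rather than part of the randomForest package.

```python
import random

def bootstrap_sample(n_cases, rng):
    """Indices of N cases drawn with replacement from the original dataset."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]

def candidate_predictors(n_predictors, mtry, rng):
    """Random subset of mtry predictors examined at one node."""
    return rng.sample(range(n_predictors), mtry)

rng = random.Random(42)                       # fixed seed so the sketch is repeatable
tree_cases = bootstrap_sample(20, rng)        # some cases repeat, others are left out
node_predictors = candidate_predictors(10, 3, rng)
```

Small values of mtry make the trees less correlated with each other, whereas large values let each node choose among almost all predictors; this trade-off is the reason why mtry is tuned.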
The tuning parameters for SVM in the kernlab package are sigma and C. This is the only
algorithm that requires more than one parameter be tuned. In this case, the term grid search is
used for the selection of the optimum values of both parameters. Witten et al. (2011)
mentioned that it is performed over a pair of classifier options. Describing the same
technique, the same authors claimed, “it offers the ability to optimize parameters of a
classifier, a pre-processing filter, or one parameter from each.” In this process, according to
the same source, “the user specifies the lower and upper bounds for each parameter, and the
desired number of increments.” The process reports the best combination of sigma and C,
which corresponds to the highest accuracy and kappa values.
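The grid search described above can be sketched in Python as follows. The user gives lower and upper bounds and a number of increments for each parameter, and every combination is scored. The `score` argument is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with a given sigma and C; the function names and the evenly spaced grid are also assumptions made for illustration and do not reproduce the caret implementation.

```python
from itertools import product

def grid_values(lower, upper, steps):
    """Evenly spaced candidate values between the bounds (steps >= 2)."""
    width = (upper - lower) / (steps - 1)
    return [lower + i * width for i in range(steps)]

def grid_search(score, sigma_bounds, c_bounds, steps_sigma, steps_c):
    """Return the (sigma, C) pair with the highest score on the grid."""
    candidates = product(grid_values(*sigma_bounds, steps_sigma),
                         grid_values(*c_bounds, steps_c))
    return max(candidates, key=lambda pair: score(*pair))
```

In practice the grid is often spaced on a logarithmic scale, because useful values of sigma and C can span several orders of magnitude.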
When applying the kNN algorithm, the parameter to tune is k, the number of neighbors
considered when assigning the class.
2.1.8 Cross validation
The use of remote sensing data for mapping is based on statistical classification methods.
Some methods can be applied to either normally distributed or distribution-free data
(Kaufman & Rousseeuw, 2005). Information technology supports the numerical description
of many of the basic principles. This technology has also provided many tools, algorithms
and systems applied in biology, a field known today as bioinformatics (Yang, 2010). The
open source community has made tools accessible through libraries that extend open
statistical languages like R. The methods provided by these tools are grouped in task
views. The machine learning task view for R holds 12 groups of tools, ranging from
computational statistics methods to GUI interfaces (Hothorn, 2012). The machine learning
methods commonly perform the classification and can provide a re-sampling option. Normally
this option is provided under cross validation. Re-sampling can be made either from groups
of samples, called leave group out cross validation (LGOCV), or from each sample, called
leave one out cross validation (LOOCV). These re-sampling options and others can be found
in the caret package for R (Kuhn et al., 2012). During cross-validation the reference data
are used in such a way that all the records are used at least once as training samples and once as
validation data. Yang (2010, 102) says that “Data are randomly divided into k-folds. K
models are constructed. Each of them uses one fold of input vectors as test data while
the rest are used for model construction. The final model performance is estimated based on
these k sets of testing results.”
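The division into folds described by Yang (2010) can be sketched in Python. The generator below shuffles the record indices, splits them into k folds and yields one training/test split per fold, so that every record is used exactly once as test data. The function name and the fixed random seed are assumptions made for illustration.

```python
import random

def k_fold_indices(n_records, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)          # random division of the data
    folds = [indices[i::k] for i in range(k)]     # k folds of nearly equal size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Setting k equal to the number of records gives leave one out cross validation, in which each model is tested on a single record.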
The algorithms presented for supervised classification can be applied either to pixel-based or
object-based image analysis. In the next section, the classification of forest areas under PBIA
is presented.
2.1.9 Pixel-based image analysis (PBIA) of forest areas
“Multispectral satellite images that have been through preprocessing are then ready for
processing, which essentially means they are ready for image classification” (Khorram 2012,
p. 46). The classification of satellite scenes based on PBIA has been described by many authors
(Campbell and Wynne 2011, p. 335; Khorram 2012, p. 46; Purkis and Klemas 2011, p. 92).
“The objective of image analysis is to create an accurate map of an area viewed by satellite
sensors” (Purkis and Klemas 2011, p. 84).
2.1.9.1 Detection of forest
The statistical approach employed was developed to solve the problem in two parts: forest
delineation and forest type identification. For the delineation of forest (forest and non-forest
areas) the ISODATA algorithm was applied as described in section 2.1.4.1 (page 13). This algorithm
was successfully used in the mapping of different types of forest cover (Carman and Merickel,
1990; Jiang et al., 2004b; Lang et al., 2008; Walsh et al., 2008; Makkeasorn et al., 2009;
Wang and Niu, 2009).
Two different classifications of forest areas were performed. The first was
based only on the original LANDSAT bands and the second used artificial bands such as
NDVI, TC1 to TC4 and texture bands, as described in section 0 (page 24). For the second
classification, two results were generated with the modal filter described in section 2.1.9.2
(page 25), applied once and twice.
No other vegetation index was used, following Campbell and Wynne
(2011), who state that “in practice there are few differences between the many VIs that have
been proposed”.
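As an illustration of how an artificial band such as the NDVI is derived, the following minimal Python sketch applies the standard definition NDVI = (NIR - Red) / (NIR + Red). For LANDSAT TM the near infrared is band 4 and the red is band 3. The pixel values and function names are hypothetical, and the handling of pixels where both bands are zero is an assumption of this sketch.

```python
def ndvi(nir, red):
    """NDVI of one pixel; returns 0.0 when both bands are zero (assumed convention)."""
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Compute an NDVI raster from two rasters given as lists of rows."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2 x 2 subset: TM band 4 (NIR) and TM band 3 (red).
band4 = [[80, 60], [20, 0]]
band3 = [[20, 60], [60, 0]]
result = ndvi_band(band4, band3)
```

Dense vegetation reflects strongly in the near infrared and weakly in the red, so forest pixels tend towards high positive values.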
The detection of forest was based on an unsupervised approach. This approach was selected
on the basis of previous experience and reports in the literature (Holopainen et al., 2009;
Pekkarinen et al., 2009; Ren et al., 2009). The accuracy of the classification of forest areas
was greater than 85 % in most of the literature cited.
In preparation for the classification of each image, a statistical analysis of the candidate features for
the classification needs to be carried out. The Bayesian information criterion (BIC) was
chosen for the selection of features because the tests showed this criterion to be the most robust in
the selection of predictors in different images, and because it results in a smaller number of features
compared to other selection methods (López Hernández et al., 2010).
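The general form of the criterion, BIC = k ln(n) - 2 ln(L), can be sketched as follows, where k is the number of estimated parameters, n the number of observations and L the maximised likelihood. The candidate models, their log-likelihoods and the sample size below are invented for illustration only and are not results of this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: (maximised log-likelihood, number of parameters).
candidates = {
    "bands only": (-520.0, 7),
    "bands + NDVI": (-505.0, 8),
    "all features": (-503.0, 20),
}
n_obs = 300  # assumed number of training samples

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this invented example the largest model has the best likelihood but is penalised for its many parameters, which illustrates why the BIC tends to select a smaller number of features.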
2.1.9.2 Application of modal filter
After completion of the classification, a modal filter was used to reduce the salt and pepper
effect in the areas classified (Figure 8).
Figure 8: Effect of modal filters after classification.
In Figure 8, the original image is shown at the top, the forest class (green) without a
filter in the middle, and the forest with the stray pixels cleaned up by the 3 x 3 modal filter at the bottom.
The modal filter removes stray pixels, so the number of polygons at the end is more
manageable. However, the filtering also results in a slight increase in large areas and a slight
shrinkage of small areas. Some very small areas even disappear. This will affect the final area
in each class. Products were created in which the filtering process was performed either
once or twice (Figure 9).
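A minimal sketch of a 3 x 3 modal filter is given below to make the operation concrete. It is not the implementation of the software used in this study. The treatment of image edges (smaller windows) and of ties (the original class is kept when it is among the most frequent, otherwise the smallest class code is taken) are assumptions of this sketch.

```python
from collections import Counter


def modal_filter(grid):
    """Apply one pass of a 3 x 3 modal (majority) filter to a class raster."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect the classes in the 3 x 3 window, clipped at the image edges.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            top = max(counts.values())
            # Keep the original class if it is already (one of) the most frequent.
            if counts[grid[r][c]] < top:
                out[r][c] = min(cls for cls, n in counts.items() if n == top)
    return out


# Hypothetical class raster: 1 = forest, 0 = non-forest, with one stray pixel.
classified = [[1] * 5 for _ in range(5)]
classified[2][2] = 0
once = modal_filter(classified)
twice = modal_filter(once)
```

Applying the function to its own output corresponds to the dual filtering mentioned in the text.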
Figure 9: Example of the second classification using dual modal filtering (forest: green; water: blue; grey: other
classes). The black lines are the forest from the ATKIS layer.
The dual filtering resulted in a generalisation of forest areas and in an increase in already
large forest areas. To counteract this, a single pass of the modal filter was used in the final version.
After filtering, all of the areas classified were converted from raster to vector, a process
referred to as vectorization. This is a complex and computationally intensive process
during which all of the areas with common pixel values are delimited and stored as closed
polygons. This process was undertaken to facilitate further cartographic and GIS tasks. After
vectorization, all forest areas smaller than 1 ha were filtered out and added to the non-forest
areas. This filtering removes many small polygons inside and outside of the forest and
reduces the size of the final file. In ATKIS the corresponding size is 5 ha, so a second filter
needs to be applied before further comparisons are made.
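The minimum-area rule can be illustrated with the short sketch below. It assumes a pixel size of 30 m (LANDSAT TM multispectral bands), so that one pixel covers 0.09 ha and 1 ha corresponds to about 11 pixels. The polygon records and field names are hypothetical.

```python
PIXEL_SIZE_M = 30.0   # assumed LANDSAT TM pixel size in metres
M2_PER_HA = 10000.0


def area_ha(n_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Area in hectares of a polygon covering n_pixels raster cells."""
    return n_pixels * pixel_size_m ** 2 / M2_PER_HA


def reassign_small_forest(polygons, min_ha=1.0):
    """Move forest polygons smaller than min_ha to the non-forest class."""
    result = []
    for poly in polygons:
        too_small = poly["class"] == "forest" and area_ha(poly["pixels"]) < min_ha
        new_class = "non-forest" if too_small else poly["class"]
        result.append({"id": poly["id"], "class": new_class, "pixels": poly["pixels"]})
    return result


# Hypothetical polygons after vectorization.
polygons = [
    {"id": 1, "class": "forest", "pixels": 11},      # 0.99 ha, below the threshold
    {"id": 2, "class": "forest", "pixels": 12},      # 1.08 ha, kept as forest
    {"id": 3, "class": "non-forest", "pixels": 5},
]
filtered = reassign_small_forest(polygons)
```

Calling the same function with min_ha=5.0 would mimic the larger threshold mentioned for ATKIS.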
Because the different images had strong radiometric differences, each image was classified
individually and the mosaicking of the classifications from all images was carried out at the
end (Figure 10).
Figure 10: Illustration of the radiometric differences between two adjacent images.
The filtering of the product is the final step in the pixel based image analysis (PBIA). In the
next section the approach based on image objects is presented.
2.1.9.3 Detection of forest types
The classification of forest types was based on two methods. The first method tested was the
Minimum Distance to Means (MD), described in section 2.1.5.1 (page 15), and the second
was Maximum Likelihood (MxL), described in section 2.1.5.2 (page 16).
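The principle of the Minimum Distance to Means classifier can be sketched as follows: the mean feature vector of each class is computed from the training samples, and every pixel is assigned to the class whose mean is closest in Euclidean distance. The training values, the two-band feature space and the function names are hypothetical, and the sketch is not the implementation used in this study.

```python
import math


def class_means(training):
    """Return the mean feature vector of each class.

    training maps a class label to a list of feature vectors (one per sample).
    """
    means = {}
    for label, samples in training.items():
        n = len(samples)
        means[label] = [sum(values) / n for values in zip(*samples)]
    return means


def classify_md(pixel, means):
    """Assign the pixel to the class with the nearest mean (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


# Hypothetical training samples with two features (e.g. two spectral bands).
training = {
    "conifer": [[30, 60], [34, 64]],
    "broadleaved": [[50, 110], [54, 114]],
}
means = class_means(training)
label_a = classify_md([35, 70], means)
label_b = classify_md([50, 100], means)
```

Unlike Maximum Likelihood, this rule ignores the variance and covariance of the classes and uses only their means.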
2.1.10 Object based image analysis (OBIA)
The PBIA generated a map of forest. The next steps were conducted to obtain the
same map by other means so that the accuracies could be compared. Segmentation is the subdivision
of an area into zones (segments) with similar characteristics. The segments are also called
objects (Navulur, 2007). The same author says that “an object can be defined as a
grouping of pixels of similar spectral and spatial properties. Thus, applying the object-
oriented paradigm to image analysis refers to analyzing the image in object space rather than
in pixel space, and objects can be used as the primitives for image classification rather than
pixels”. The properties of an object, also called features, can be size, spectral value, indices,
textures, values of surrounding polygons, etc. Figure 11 presents the general idea of creating
objects from a satellite scene.
Figure 11: Illustration of the generation of segment objects by the image segmentation method.
Figure 11 shows (a) the original image as a false color composite, (b) the segment objects
generated, drawn with their boundaries in blue, and (c) the result of the classification of the
segments.
The segments are polygonal objects derived from more or less uniform areas within the
images. As stated by Trimble (2011), the “image is cut in pieces, which serve as building
blocks for further analysis.” The generation of segments is controlled by the
parameters roughness, scale and shape, calculated from the brightness values stored in the
image. This analysis can use anything from a single band to the complete set of bands obtained by
the remote sensing instruments (Navulur, 2007). For every segment object, descriptors can be
obtained and used as features for the learning process in the application of classifications.
More than 100 pre-programmed features are available in eCognition (Trimble, 2011a). The
analysis using such large numbers of descriptors per polygon is frequently referred to as
object-based image analysis (OBIA).
Of numerous German exercises in forest mapping, the most recent focused on the riparian
forest of the Danube Floodplain National Park (Suchenwirth et al., 2012). The authors
identified meadows, reed beds and hardwood and softwood tree species using remote sensing
data from Ikonos-2 and a DEM created using LIDAR data. A spectral and knowledge-based
classification was performed with object-based image analysis. The authors’ objective was to
classify floodplain habitats, using OBIA to improve the accuracy of the classification of
vegetation cover mapping in central European floodplain habitats and in the estimation of the
carbon stored in floodplains. Their overall accuracy (OA) was 70 % with a kappa value of
0.64.
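For reference, the two accuracy measures quoted above can be computed from a confusion matrix as in the following sketch. The matrix is a hypothetical two-class example and does not reproduce the data of Suchenwirth et al. (2012). Rows are taken as the classified classes and columns as the reference classes, which is an assumption of this sketch.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (rows: classified, columns: reference).
confusion = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Kappa is lower than the overall accuracy because part of the observed agreement would also be obtained by a random assignment of classes.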
The OBIA can be summarised in three steps. The first step is importing the image. In this step the
coordinate system is set and the bands to be included in the analysis are selected. The second step is the
application of the algorithm for segmentation. Various algorithms can be applied sequentially.
The last step is the classification of the segments and production of the final layer.
Every segmentation process generates a level, and all of the segments produced belong to that
level. A single segment can be subdivided following some rules, in which case a new level is
created below. Many different segments with the same attribute can also be merged into a new
level, which lies above the current one. Characteristics can be inherited from one level to
another, such as level, super-object or sub-object values obtained from the image.
Some of the segmentation algorithms available were selected to complete the forest mapping
process. Each segmentation algorithm has its own parameters to set; these are presented next.
2.1.10.1 Chessboard segmentation (size of squares)
According to Trimble (2011), “Chessboard segmentation cuts the scene into equal squares of
a given size”. The chessboard segmentation creates a regular division of the area based on the
number of pixels that will be used to build it. It is particularly useful for processing large
areas. It provides a framework before a multi-resolution segmentation is applied. This speeds
up the calculation of the second level segmentation. The chessboard segmentation method is
relatively new and no publications dealing with the optimal size of squares were found. A
subset of an image is presented with the result of the application of this segmentation
algorithm in Figure 12.
Figure 12: Subset of LANDSAT scene showing a false color composite (a) and the same place with a subdivision using
chessboard segmentation (b).
Due to the size of the scenes (from 6000 to 8000 pixels in rows and columns), the chessboard
segmentation algorithm was applied. The image was subdivided into small subsets of 600 x
600 pixels. With this subdivision, the next segmentation step, required to calculate the
final segments, was faster.
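The regular subdivision can be illustrated with the sketch below, which lists the pixel bounds of 600 x 600 squares over a scene. It only mimics the geometry of a chessboard subdivision and is not the eCognition algorithm. The scene size of 6000 x 8000 pixels is a hypothetical value within the range given above.

```python
def chessboard_tiles(n_rows, n_cols, tile_size):
    """Return (row_start, row_end, col_start, col_end) for each square tile.

    End indices are exclusive; tiles at the right and bottom edges are clipped.
    """
    tiles = []
    for row0 in range(0, n_rows, tile_size):
        for col0 in range(0, n_cols, tile_size):
            tiles.append((
                row0, min(row0 + tile_size, n_rows),
                col0, min(col0 + tile_size, n_cols),
            ))
    return tiles


# Hypothetical scene of 6000 rows x 8000 columns cut into 600 x 600 squares.
tiles = chessboard_tiles(6000, 8000, 600)
n_pixels = sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in tiles)
```

Each tile can then be processed independently by the second segmentation step, which keeps the amount of data handled at one time small.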
2.1.10.2 Multiresolution segmentation
Multi-resolution segmentation is an optimisation procedure that, for a given number of image
objects, minimises the average heterogeneity and maximises homogeneity (Trimble, 2011a).
The multi-resolution segmentation algorithm produces the segments that will be used for the
classification. It is called multi-resolution because the same algorithms can be used both to
subdivide and to merge segments.
Determining the parameter scale is one of the most important parts of this process. It is an
abstract definition of the size of the area to be used to select the information to create the
segments. The scale also depends on the size of the image, because large images can only
accommodate smaller scales. In Figure 13 an example of the output of this algorithm is
shown.
Figure 13: View with the segment objects created by multiresolution segmentation.
In Figure 13, some clouds and shadows are visible in white. The clouds and their
shadows can lie over forest and non-forest areas.
The scale of the multiresolution segmentation algorithm was set to 10. Bock et al. (2005)
presented a list of recommended scales to apply in REDD+ evaluations. When the algorithm
is set to a higher value, the segments tend to be bigger and include both forest and non-forest
classes based on the texture. When the value is smaller, the processing time tends to be longer
and the resulting segments are generally too small, producing a file too large for small
regions.
2.1.10.3 Merging segments
After segment objects are classified, some have the same class and share the same boundaries.
A rule can be applied to merge these segments. A copy of the first level is made and the
product is a new level of segmentation superior to the first, in which areas with similar
attributes are agglomerated. The resultant file requires less storage space and contains the
same spatial information as the first. Figure 14 shows an example of the merging of
segments.
Figure 14: Example of merging segment objects dissolving lines dividing the same class.
In this example, conifer forest and broad leaved forest are classified and merged, generating
polygons with two colors.
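The merging rule can be sketched with a small union-find structure: segments that share a boundary and carry the same class are joined into one merged object. The segment identifiers, classes and adjacency list are hypothetical, and the sketch does not reproduce the eCognition procedure.

```python
def merge_segments(classes, neighbours):
    """Group adjacent segments of the same class.

    classes maps segment id -> class label; neighbours is a list of
    (id_a, id_b) pairs of segments sharing a boundary. Returns a dict
    mapping each segment id to the id of its merged object.
    """
    parent = {seg: seg for seg in classes}

    def find(seg):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[seg] != seg:
            parent[seg] = parent[parent[seg]]
            seg = parent[seg]
        return seg

    for a, b in neighbours:
        if classes[a] == classes[b]:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Use the smaller id as representative of the merged object.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {seg: find(seg) for seg in classes}


# Hypothetical first-level segments arranged in a chain 1-2-3-4-5.
classes = {1: "conifer", 2: "conifer", 3: "broadleaved", 4: "broadleaved", 5: "conifer"}
neighbours = [(1, 2), (2, 3), (3, 4), (4, 5)]
merged = merge_segments(classes, neighbours)
```

Segment 5 stays separate although it is conifer, because it does not share a boundary with the other conifer segments.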
2.1.10.4 Object features
In the segmentation the image is subdivided into areas (segment objects) with similar
characteristics. Those segments are polygons enclosing groups of pixels. Different
segmentation programmes are able to record these polygons in either vector or raster format.
Calculations can be made for each polygon, such as the number of pixels in the polygon. Each
type of calculation produces a result called a feature.
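The idea of a feature as a calculation per polygon can be illustrated with the following sketch, which derives the number of pixels, mean, standard deviation, minimum and maximum of one band for every segment of a label raster. The rasters and names are hypothetical, and the features shown are only a small subset of those available in segmentation software.

```python
import math
from collections import defaultdict


def segment_features(labels, band):
    """Compute simple per-segment features from a label raster and one band.

    labels and band are lists of rows with identical shape; labels holds the
    segment id of every pixel.
    """
    values = defaultdict(list)
    for label_row, band_row in zip(labels, band):
        for seg_id, value in zip(label_row, band_row):
            values[seg_id].append(value)

    features = {}
    for seg_id, vals in values.items():
        n = len(vals)
        mean = sum(vals) / n
        variance = sum((v - mean) ** 2 for v in vals) / n  # population variance
        features[seg_id] = {
            "n_pixels": n,
            "mean": mean,
            "std": math.sqrt(variance),
            "min": min(vals),
            "max": max(vals),
        }
    return features


# Hypothetical 2 x 3 subset with two segments.
labels = [[1, 1, 2], [1, 2, 2]]
band = [[10, 20, 50], [30, 60, 70]]
features = segment_features(labels, band)
```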
In eCognition more than 100 features can be obtained for every segment or object. The
features per segment include, for example, the type of segment with two features (whether 3D
or connected), layer values with 20 features (mean, standard deviation, skewness, ratio,
minimum, maximum, mean of inner border, mean of outer border, border contrast, contrast to
neighbour, edge contrast, standard deviation of neighbours, circular mean, circular standard
deviation, circular standard deviation/mean), similar measurements to neighbours, to super
objects, to image, hue and saturation. There are another 18 features related to the class of
segment, such as eight descriptors of relations to neighbour, five related to sub-objects of
the segment, relation to super-object, and five relations to the classification. Another three
features relate to linked objects, 53 image features, seven process related features, 13 region
features, two image registration features, three metadata and as many user defined
variables as the user defines (Trimble, 2011a).
Some of these features are used in other fields, for instance, in microbiology (Li et al.,
2002; Kyan et al., 2005; Muthu Rama Krishnan et al., 2012), text analysis (Yang et al., 2011),
face recognition (Liau and Isa, 2011), etc. The features most commonly applied in the literature
to classify satellite images were incorporated in the model described previously
(Palubinskas et al., 1995; Triepke et al., 2008; Brenning, 2009; Heinl et al., 2009; Lier et al.,
2009; Petropoulos et al., 2011).
2.2 Software used
The following licensed software was used for the preprocessing steps, the reprojection and the
format conversion of the original data:
Erdas Imagine 9.1 (Intergraph Geospatial)
ArcGIS 10 (ESRI)
eCognition 8.64 (Trimble)
SAGA GIS 2.0.8 (saga-gis.org)
Quantum GIS 1.7.0 (qgis.org)
Two programming languages were used for data conversion and statistics.
Python 2.5 (python.org)
R 2.15. Extension packages: car, caret, cluster, DAAG, e1071, graphics, foreign and all of the
additional extensions required for each package. (cran.r-project.org)
RStudio 0.95.265 (rstudio.org)
3 Test site and data used
3.1 Test site
The chosen test site was Bavaria, as the availability of reference data was guaranteed. Bavaria
(Bayern in German) is located in the southeast of Germany (Figure 15). The total area is
70547.8 km² (Meschede, 2004). The southernmost limit is situated at 47° 16’ N and the north
eastern boundary at 50° 34’ N. The borders from west to east are situated at 8° 58’ E and 13° 50’ E.
The elevation, in meters above sea level, ranges from 100 m at the boundary on the river
Main up to 2962 m at the highest mountain in Germany. Bavaria shares borders with Austria,
the Czech Republic, Switzerland (across Lake Constance) and the neighbouring German
states Baden-Württemberg, Hessen, Thüringen and Sachsen. The Danube (known as the
Donau in German) and the Main are the two major rivers flowing through Bavaria. The
landscape of Bavaria has been subdivided into four major regions: a) the Alps, with the
Zugspitze the highest mountain in Germany at 2,962 metres; b) the Alpine foothills with their
numerous lakes; c) the eastern Bavarian central mountains, which host the first national park
established in Germany; and d) the Schwäbisch-Fränkische scarp landscape (BAYERN
TOURISMUS Marketing GmbH, 2012).
Figure 15: The location of Bavaria relative to the other fifteen federal states in Germany and the position of Germany
in the world.
The natural regions of Bavaria are presented in Figure 16.
Figure 16: Natural regions of Bavaria. Source: Bundesamt für Naturschutz (2012).
Figure 16 shows, as green lines, the boundaries of the natural regions of Bavaria according
to the Bundesamt für Naturschutz (BfN, 2012). This division of Germany into major
natural regions takes account primarily of geomorphological, geological, hydrological and
pedological criteria in order to divide the country into large physical units with a common
geographical basis. From south to north, the Alps and Alpine foreland cover the Iller-Lech
Plateau (D64), Lower Bavarian Uplands (D65) and Isar-Inn Gravel Plateau (05-06), Southern
Alpine Foreland (D66), Swabian-Bavarian Foreland (D67) and the Northern Limestone Alps
(D68). In the centre of the state, the Southwestern German Scarplands cover the Swabian
Jura (D60), Franconian Jura (D61), Upper Palatinate-Upper Main Hills (D62), Swabian
Keuper-Lias Plains (D58), Franconian Keuper-Lias Plains (D59), Mainfranken Plateau (D56),
Odenwald, Spessart and South Rhön (D55). A region of the Eastern Central Uplands is
covered by the Upper Palatine-Bavarian Forest (D63) and the Vogtland (D17) in the northernmost
part of the state. A small region of the Western Central Uplands is covered by the East
Hesse Highlands (D47).
The geology of Bavaria is referred to as the southern German scarplands, a geological and
geomorphological natural region or landscape characterising the southern German states
Bavaria and Baden-Württemberg and also Switzerland. The landscape is characterised by
escarpments. The wooded scarps drop sharply to the west towards the Rhine Rift Valley and
the Rhine-Main Plain, whilst the slopes fall comparatively gradually towards the northeast
into the depressions beyond which lie the Thüringer Wald, Thüringer Schiefergebirge,
Frankenwald, Fichtelgebirge, Oberpfälzer Wald and Bayerischer Wald. Similarly the
Schwäbische and Fränkische Jura descend gently to the southeast towards the Danube valley,
whilst the Schwäbische Jura, for example, drops very steeply to the north-northwest from the
Albtrauf, the top of the main scarp (Meynen, 1902; Dickinson, 1964; Geyer and Gwinner,
1986; Rothe, 2009).
The climate of Bavaria has been reported as a transition between Continental and Atlantic
conditions. The south eastern region is closer to continental conditions because it shows the
largest differences in temperature between the coldest and the warmest month. The annual mean
temperature is 7.5 ºC, with the maximum in July and the minimum in January. The annual mean
precipitation is around 850 mm (Meschede, 2004).
3.2 Satellite data
The first step undertaken as part of the study was to explore a data set consisting of many
freely available satellite images from the past few years covering the Bavarian region.
LANDSAT data was used as the application had to be low cost. Only free satellite data with a
very high geometric resolution were considered. In order to carry out the evaluation within a
short period of time, LANDSAT 4 and 5 data was acquired. There is a known problem with
LANDSAT 7 (ETM+), referred to as SLC-off, that appeared in May 2003. This reduced the
usable area of each scene by about 22 % (Roy et al., 2008). Only data
from the older LANDSAT TM 5 was selected for this method.
LANDSAT images from different sources provided the data used in the various treatment
stages. The three main sources of LANDSAT TM data at the time of writing are as follows:
1. Eurimage (www.eurimag.com). Eurimage is a company providing a wide range of
LANDSAT TM products. Additional information and products from many other satellite
sensors can also be purchased. However, the company works on a commercial basis, which
means that the data and products are not free of charge. Eurimage charges € 1500 for a
LANDSAT image. The nine images necessary to cover Bavaria would have cost € 13500. No
Eurimage data were used in this study.
2. Another source of data is the European Space Agency (ESA) (www.esa.int). The aim of the
ESA is to provide satellite data for the purposes of implementation. Therefore, it supports the
development of procedures for the application of satellite data in the form of free data. In
order to receive free information, however, a proposal must be formulated in advance and
submitted to a peer review process. This is likely to lead to significant delays in the start-up of
the research. Furthermore, ESA only provides LANDSAT data up to 2003 free of charge. All
data have cloud cover of less than 10 %. Table 3 provides an overview of
the data ordered from ESA.
3. The third organisation providing data is the United States Geological Survey (USGS)
(www.usgs.gov). The organisation offers worldwide LANDSAT data for free use. The data
can be ordered in various processing levels. Table 4 lists part of the data ordered from
USGS. The cloud cover was less than 10 % for all of the most recent USGS LANDSAT data
available. These are LANDSAT TM 5 images. Most of the images were taken in the months
June, July and August; only two images were from September. Consequently, all of the
images depicted similar seasonal vegetation and in each the sun was high and the areas well
illuminated.
LANDSAT TM data available (Figure 17)
Of the LANDSAT images available, only those with a higher sun elevation were selected
(Table 3).
Table 3: List of LANDSAT scenes ordered from ESA.
No Acquisition Path Row Azimuth Elevation zone
1 13.06.2006 192 26 142.357978 60.167360 33
2 13.06.2006 192 27 139.999279 61.080092 33
3 01.09.2006 192 27 150.716592 47.442644 33
4 22.07.2006 193 25 144.460102 55.967562 33
5 22.07.2006 193 26 142.432246 56.913072 32
6 24.09.2006 193 26 158.039958 38.507702 32
7 22.07.2006 193 26 142.432246 56.913072 32
8 22.07.2006 193 27 140.330468 57.826531 32
9 26.08.2007 193 27 148.779599 49.332557 32
10 11.06.2006 194 25 144.863185 59.141177 32
11 11.06.2006 194 26 142.618862 60.096287 32
12 11.06.2006 194 27 140.289196 61.008800 32
13 18.06.2006 195 25 144.051298 59.302303 32
14 20.07.2006 195 25 144.221858 56.322025 32
Another set of 14 LANDSAT scenes from GLS 2000 and 2005 were ordered to provide the
best available images (Table 4). The following LANDSAT TM 5 satellite images were used
in the study (Table 5).
Figure 17: Overview of the best ESA satellite data ordered.
Table 4: LANDSAT TM 5 satellite images observed in the study.
Id Mission Sensor Date File name Path Row
1 Landsat-7 ETM+ 18/06/2000 LN7_TM_6259_25.zip 194 25
2 Landsat-7 ETM+ 18/06/2000 LN7_TM_6259_26.zip 194 26
3 Landsat-7 ETM+ 18/06/2000 LN7_TM_6259_27.zip 194 27
4 Landsat-7 ETM+ 15/08/2001 LN7_TM_12419_25.zip 195 25
5 Landsat-7 ETM+ 15/08/2001 LN7_TM_12419_26.zip 195 26
6 Landsat-7 ETM+ 26/08/2001 LN7_TM_12579_26.zip 192 26
7 Landsat-7 ETM+ 26/08/2001 LN7_TM_12579_27.zip 192 27
8 Landsat-5 TM 18/03/1990 LN5_TM_32153_26.zip 195 26
9 Landsat-5 TM 23/10/1990 LN5_TM_35342_26.zip 192 26
10 Landsat-5 TM 23/10/1990 LN5_TM_35342_27.zip 192 27
11 Landsat-5 TM 15/04/1991 LN5_TM_37876_25.zip 194 25
12 Landsat-5 TM 02/06/1991 LN5_TM_38575_26.zip 194 26
13 Landsat-5 TM 07/08/1991 LN5_TM_39536_26.zip 192 26
14 Landsat-5 TM 30/08/1991 LN5_TM_39871_27.zip 193 27
15 Landsat-5 TM 19/05/1992 LN5_TM_43701_27.zip 194 27
16 Landsat-5 TM 17/09/1992 LN5_TM_45463_27.zip 193 27
17 Landsat-7 ETM+ 13/09/1999 LN7_TM_2196_25.zip 193 25
18 Landsat-7 ETM+ 13/09/1999 LN7_TM_2196_26.zip 193 26
19 Landsat-7 ETM+ 13/09/1999 LN7_TM_2196_27.zip 193 27
The images selected in this research are presented in Table 5.
Table 5: The final 7 LANDSAT TM images used in the project.
Path Row Acquisition date
192 026 13/06/2006
192 027 01/09/2006
193 025 22/07/2006
193 026 24/09/2006
193 027 27/08/2007
194 025 11/06/2006
194 026 11/06/2006
Figure 18 shows that the images supplied were only very slightly affected by clouds.
Figure 18: Illustration of the satellite scenes used. At left with the path and row numbers and at right a mosaic
overlapping to avoid the clouds
The quality of the scenes can vary due to different atmospheric conditions and sun position.
This was considered in the subsequent classification.
3.3 DEM
The digital elevation model (DEM) was supplied by the LFU (Bayerisches Landesamt für
Umwelt, “Bavarian State Office for the Environment”) and its use was permitted for the
purposes of this project. It was used in the orthorectification process documented in
section 4.2.3 (page 56).
A DEM in GeoTIFF format with a resolution of 50 m was used for the orthorectification.
Figure 19 presents an example of the reference data used.
Figure 19 shows a) the DEM with the boundaries of Bavaria superimposed; b) a
general overview of the aerial orthophotos (at each point there is at least one); c) a detail of one
orthophoto; and d) another orthophoto in detail.
3.4 Reference data
The LFU provided the author of the study with the following reference data:
3.4.1 Orthorectified colour aerial orthophotos
A collection of 286 digital colour orthophotos in MrSid format with a resolution of 40 cm per
pixel was obtained. These orthophotos were used to select ground control points and in the
definition of training and verification samples. A set of 97 orthorectified aerial photographs
was used to select ground control points in the orthorectification of the LANDSAT images and
a set of 32 in the sampling of data for the validation process.
Figure 19: An example of the reference data used.
3.4.2 ATKIS forest layer
A digital map of forest boundaries in ATKIS (authoritative topographic-cartographic
information system) vector shapefile format (ADV, 2003) was used to support the selection of
training samples and in the identification of clouds over forest areas.
3.4.3 Forest stands from an aerial photo interpretation
A file in vector format (shapefile) detailing forest stands in the foothills and alpine areas of
Bavaria from aerial photo interpretation was also used.
3.4.4 Document about interpretation of aerial photographs of forest stands
A file in .pdf format documenting the process of interpretation of aerial photographs for the
identification of forest stands in the foothills of the Alps was provided by the LFU. By
interpreting the aerial photographs, the precise forest types could be identified.
3.4.5 Inventory data
A data set from the German national forest inventory (NFI) (‘Bundeswaldinventur,’ BWI)
provided by the Federal Ministry of Food, Agriculture and Consumer Protection
(Bundesministerium für Ernährung, Landwirtschaft und Verbraucherschutz, BMELV) and a
description of the data (BMELV, 2010) was used to validate the results of the classifications.
Whereas the classification of forest areas was validated based on the inventory data, forest
types were assessed based on the species information contained within the NFI sampling
plots. The groups of species considered are presented in Table 6.
Table 6: Variables obtained from the NFI as they were stored in the files, including their description.
Variable Description
Ei [Oak] All species of oak (including red oak).
Bu [Beech] A genus (Fagus), broad leaved tree from the Fagaceae family.
ALH Other long-lived broad leaved trees: maple species, sycamore, sweet chestnut, ash,
lime species, walnut species, black locust, horse chestnut, wild service tree, holly,
elm species, white ash.
ALN Other short-lived broad leaved trees: birch species, service berry, alder species,
poplars, black cherry species, wild cherry, wild fruits, all other broad leaved tree
species not mentioned.
FI [Spruce] All spruce and other coniferous trees species except Douglas fir, pine,
larch, fir.
TA [Abies] White fir, grand fir and other fir trees.
DGL [Pseudotsuga] Douglas fir.
Ki All pine species.
LAE [Larix] Larch of all kinds.
3.4.6 Cartographic projection of the data
The original inventory data from the NFI were obtained in the coordinate reference system EPSG
3396 and converted into EPSG 3397 (Butler et al., 2010). The ATKIS forest layer was also
obtained in EPSG 3396 and re-projected into EPSG 3397 (Table 7).
Table 7: Map projection spatial reference parameters used (Butler et al., 2010).
EPSG code            3396                          3397
EPSG name            PD/83 / Gauss-Kruger zone 3   PD/83 / Gauss-Kruger zone 4
Datum                Potsdam 83                    Potsdam 83
Spheroid             Bessel 1841                   Bessel 1841
Projection           Transverse Mercator           Transverse Mercator
Latitude of origin   0                             0
Central_meridian     9                             12
Scale_factor         1                             1
False_easting        3500000                       4500000
False_northing       0                             0
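The parameters of Table 7 can be written as PROJ-style definition strings, as in the sketch below. The dictionary and function names are hypothetical. The strings contain only the projection parameters listed in the table; the datum shift between Potsdam 83 and other datums is not given in the table and is therefore not included, so the strings are an incomplete illustration rather than a full definition of the two reference systems.

```python
# Zone-specific values taken from Table 7 (central meridian and false easting).
GK_ZONES = {
    3396: {"lon_0": 9, "x_0": 3500000},
    3397: {"lon_0": 12, "x_0": 4500000},
}


def proj_string(epsg_code):
    """Build a PROJ-style Transverse Mercator string for one Gauss-Kruger zone."""
    zone = GK_ZONES[epsg_code]
    parts = [
        "+proj=tmerc",                 # Transverse Mercator
        "+lat_0=0",                    # latitude of origin
        "+lon_0=%d" % zone["lon_0"],   # central meridian
        "+k=1",                        # scale factor
        "+x_0=%d" % zone["x_0"],       # false easting
        "+y_0=0",                      # false northing
        "+ellps=bessel",               # Bessel 1841 spheroid
        "+units=m",
    ]
    return " ".join(parts)


source_crs = proj_string(3396)
target_crs = proj_string(3397)
```

The two zones differ only in the central meridian and the false easting, which is why data close to the zone boundary can be converted from one to the other.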
4 Methods
All methods applied, beginning with the original images and finishing with the map of forest
types, are presented in Figure 20.
Figure 20. General flow of the methods applied. The preprocessing is also presented.
The first part was the preprocessing. The NFI data and the scenes received a preliminary
processing treatment that is described next.
4.1 Pre-processing the national forest inventory (NFI) data
When classification methods are applied to the data, they must be evaluated based on
reference data independent of the method. In one approach the NFI database was used only
for validation; in the other it was used both for training and for validation of the results. This database
contains information relating to the plots assessed throughout the whole state and shows the
species, number of trees per hectare, basal area per hectare and the volume per hectare in each
plot.
The records containing information about gaps and open areas (BL > 0, or iBL > 0) were
excluded from sampling and validation.
For the evaluation of the PBIA classifications, the broadleaved and conifer forest classes were
obtained from the original records by summing the species records.
The proportion of broadleaved forest by area was obtained from the following equation:
BROAD = Ei + Bu + ALH + ALN
The proportion of conifers by area, in turn, was derived from the equation:
CONIFER = FI + TA + DGL + Ki + LAE
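The two sums above can be illustrated with a minimal Python sketch. The input layout (a dictionary that maps a species-group code to its share of the plot area) and the function name are assumptions made for illustration only; they are not taken from the NFI database. Codes are matched case-insensitively.

```python
# Species-group codes from the two equations (matched case-insensitively).
BROADLEAVED_GROUPS = ("EI", "BU", "ALH", "ALN")
CONIFER_GROUPS = ("FI", "TA", "DGL", "KI", "LAE")


def forest_type_proportions(shares):
    """Sum per-group area shares into broadleaved and conifer proportions.

    `shares` maps a species-group code to its share of the plot area
    (hypothetical input layout; groups that are missing count as 0).
    """
    norm = {code.upper(): value for code, value in shares.items()}
    broad = sum(norm.get(code, 0.0) for code in BROADLEAVED_GROUPS)
    conifer = sum(norm.get(code, 0.0) for code in CONIFER_GROUPS)
    return broad, conifer


# Illustrative plot with made-up shares: 50 % spruce, 20 % pine, 30 % beech.
example_plot = {"FI": 0.5, "KI": 0.2, "BU": 0.3}
broad, conifer = forest_type_proportions(example_plot)
```

For the made-up plot the broadleaved share is the beech share alone, and the conifer share is the sum of spruce and pine.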
The original database consisted of a single table in which every row holds the measurements
made in one plot for one group of species.
Table 8 lists the field names and their meaning according to the descriptions stored in the
original file.
Table 8: The original field names and their descriptions contained in the national forest inventory (NFI) for Bavaria.
Field     Description
BL        State (9 = Bavaria)
TNR       Tract no.
ENR       Corner no. (‘wing area’)
RW        Gauss-Krueger easting
HW        Gauss-Krueger northing
BAGRNR    Species group no.
BAGR      Tree species group
NHA       Number of stems per hectare (major and minor stand)
NHA_HB    Number of stems per hectare (main stand)
BAF       Stand area [ha] (main stand)
GHA       Basal area [m²] per hectare (major and minor stand)
GHA_HB    Basal area [m²] per hectare (main stand)
VHA       Stock [m³] per hectare (major and minor stand)
VHA_HB    Stock [m³] per hectare (main stand)
H_L       Lorey height (mean height weighted proportional to basal area) (major and minor stand trees)
          (Loetsch et al., 1973; Van Laar and Akça, 1997)
DG        Quadratic mean diameter [cm] (major and minor stand trees)
DG_HB     Quadratic mean diameter [cm] (main stand trees)
4.1.1 Query
The data were provided in an MS Access database and converted to ASCII format for use in
the statistical software.
Queries were made to extract the number of stems per ha, the basal area per ha and the
volume per ha, aggregated per stand and per group of trees.
Table 9: Example of the table obtained by querying the total basal area per species in each plot of the National
Forest Inventory (NFI) database for Bavaria.
The first two columns represent the coordinates of the plot in arbitrary units. The next two
columns (RW and HW) represent the Gauss-Krueger easting and northing on a map. The
remaining columns present the total basal area per hectare in each plot. This information was
converted into ASCII format and imported into a GIS database as a point layer. A total of
7056 points were imported into the database. The map coordinates of the database were in
Gauss-Krueger zone 3, as can be seen from the easting coordinates.
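How the zone can be read from an easting may be sketched as follows. The sketch assumes the convention visible in Table 7, where the false easting is the zone number times 1 000 000 m plus 500 000 m and the central meridian is three times the zone number in degrees east. The function names and the example easting are hypothetical.

```python
def gauss_krueger_zone(easting_m):
    """Return (zone, central meridian in degrees east) for a GK easting.

    Assumes the Table 7 convention: false easting = zone * 1,000,000 m
    + 500,000 m and central meridian = 3 * zone degrees.
    """
    zone = int(easting_m // 1_000_000)
    return zone, 3 * zone


def offset_from_central_meridian(easting_m):
    """Distance in metres east (+) or west (-) of the zone's central meridian."""
    zone, _ = gauss_krueger_zone(easting_m)
    return easting_m - (zone * 1_000_000 + 500_000)


# Made-up easting for illustration: leading digit 3 means zone 3 (meridian 9 E).
zone, meridian = gauss_krueger_zone(3_567_890.0)
offset = offset_from_central_meridian(3_567_890.0)
```

The leading digit of the easting therefore identifies the zone directly, which is why a point layer in zone 3 can be recognised without further metadata.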
4.1.2 PAM Clustering
The PAM algorithm was applied as an ordination approach to group the NFI plots into clusters
that represent the species found in the field. The information contained in the NFI can be used
to find groups of species in the area on the basis of their natural distribution. All of the
species are recorded in the inventory, but not all have the same ecological importance. This
grouping process is called ordination and is used for the analysis of forested areas (Guisan
and Zimmermann, 2000; Austin, 2002; Thessler et al., 2005; Proisy et al., 2007). The distances
between clusters were measured on normalised values of basal area per ha, number of stems
per ha and volume per ha.
The basal area per ha was chosen as the ordination variable because it has been found to be
related to the values recorded in satellite scenes (Holmgren et al., 2000; Reese et al., 2002;
Moisen et al., 2006). In addition, the number of stems per ha and the volume per ha gave the
same order of importance. The groups with the highest basal area were considered to be the
most important (Figure 21). The remaining groups had only a marginal share of the basal area
in the region. A histogram of frequencies was made to visualise the relative importance of
the groups.
Figure 21: Histogram of the number of species in the National Forest Inventory (NFI) of Bavaria. The groups of
species are sorted by abundance from most frequent to least frequent.
[Figure 21 labels: Groups of species; Number of species; Number of trees per group of species]
The names of the species were presented in Table 6, page 42. The groups FI, KI and BU,
which represent species of spruce, pine and beech respectively, were selected to define the
ordination clusters. The number of clusters present in the area was then explored. Using the
PAM package (Maechler, 2011), a silhouette graph of the mean distance between clusters was
made (Kaufman and Rousseeuw, 2005). This diagram shows the mean distance found after an
unsupervised classification using the PAM method. Where there was too much data for a
desktop computer to handle, the package CLARA was used (Maechler, 2012). In the search
for a tentative number of clusters, the first grouping was made by building a silhouette with
30 clusters. The result of the clustering is shown in Figure 22.
Figure 22: Silhouette of the groups built during the definition of the optimum number of clusters based on the mean
standardised intercluster distance (ave), known as dissimilarity.
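The idea behind partitioning around medoids can be illustrated with a minimal Python sketch. It is not the implementation of the package cited above: it uses a greedy initialisation and accepts the first improving swap, it assumes that the input values are already normalised, and the plot values are made up for illustration.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_cost(points, medoids):
    """Sum of the distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified partitioning around medoids; returns (medoid indices, labels)."""
    n = len(points)
    # BUILD step: add, one at a time, the point that lowers the cost most.
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP step: replace a medoid by a non-medoid while the cost decreases.
    current = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: euclidean(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Made-up basal areas per plot for three species groups (spruce, pine, beech).
plots = [
    (40, 2, 1), (38, 0, 3), (42, 1, 0),
    (2, 35, 0), (0, 30, 2), (1, 33, 1),
    (3, 1, 28), (0, 2, 30), (2, 0, 32),
]
medoids, labels = pam(plots, 3)
```

Each medoid is an actual plot, which is why the characteristics of a cluster can be reported through the record of one representative plot.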
The silhouette was defined in section 2.1.4.2 (page 13). The best condition is reached when
the separation among clusters is highest, which is indicated by the highest mean intercluster
dissimilarity (ave); the higher this value, the better the number of clusters found. That
property was found with fourteen clusters and the second best separation was with three
clusters. In Figure 22, the highest dissimilarity was found with ten clusters, but this number
was considered inadequate to characterise the forest, especially as only three groups of
species had been selected. The next best value, a mean dissimilarity of 0.91, corresponds to
three clusters.
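For orientation, the average silhouette width in the sense of Kaufman and Rousseeuw can be computed as in the following minimal Python sketch. Whether the value labelled "ave" in Figure 22 is defined in exactly this way is an assumption; the points and labels are made up, and at least two clusters are required.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points.

    a: mean distance of point i to the other members of its own cluster.
    b: smallest mean distance of point i to the members of another cluster.
    Requires at least two clusters; singletons get a width of 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Two tight, well-separated groups (made-up coordinates).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
```

A value close to 1 indicates compact, well-separated clusters, while a negative value indicates that points lie closer to another cluster than to their own.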
Only three main groups of species were selected from the NFI, so it was expected that no
more than three clusters would be found. A new classification was therefore performed with
PAM, building three clusters. The clusters had the properties presented in Table 10.
Table 10: Characteristics of the clusters found by the PAM algorithm.
                                Basal area per species
Cluster   Plot (Tract_Corner)   Oak    Pine   Beech
1         4187_4                6      0      4
2         2914_4                44     0      0
3         15318_1               4      32     0
A scatter plot of these clusters was made along two axes, principal components 1 and 2. This
diagram is presented in Figure 23.
Figure 23: Cluster plot of the three groups identified by the PAM method. Each group, identified with numbers 1 to 3,
represents one of the clusters in Table 10.
Cluster 1 is represented by similar proportions of oak and beech, and its total basal area is
relatively low. Cluster 2 is composed solely of oak, with no pine or beech. Cluster 3 is
composed mostly of pine, with some oak to a lesser extent and no beech. Other species may
or may not be present, but they were not considered for clustering. With the clusters built and
assigned to the records of the NFI, the next step was to convert the table into spatial points,
as described in the next section.
4.1.3 Table to points conversion
The NFI table with the records was imported into the GIS. The EPSG codes were used to
make the transformation from GK zone 3 (3396) to GK zone 4 (3397). A general overview of
the plots imported into the GIS can be seen in Figure 24.
Figure 24: The 7056 inventory plots for Bavaria contained in the NFI database illustrated graphically after import
into GIS.
A close-up of the centre of the area shown in Figure 25 reveals the plots individually.
Figure 25: A close-up of the centre of Bavaria showing individual groups of plots.
The distance from one group of plots to the next is 4 km, and the distance between plots in
the same group is 150 m. In each plot, the trees within a circle with a radius of 25 m are
measured. In Bavaria the inventory has a greater density than in the rest of the country. In
the western regions of the state only, a new group of four plots with the same characteristics
as the standard inventory was added at the centre of the diagram. This increased density,
called enrichment, can be seen on the left in Figure 24.
With all the plots imported, the next step was to filter out those plots that could affect the
evaluation of the classification because of irregularities compared with the rest of the
records. A spatial criterion was applied to avoid problems in the classification. This criterion
is described next.
4.1.4 Buffering
Forest covers continuous areas, and an inventory plot can lie on the boundary of the forest.
Such records could bring false information into the analysis, because parts of the plot area
may contain no forest at all. The ATKIS forest layer was used as the forest boundary, and all
of the plots that were too close to this boundary were filtered out. Only plots located farther
from the forest edge than the diagonal of one pixel were included. The distance applied was
45 m, that is 1.5 times the 30 m pixel size and slightly more than the pixel diagonal of about
42.4 m. A layer defining all areas within this distance of the forest boundary was created,
and an attribute was assigned to the NFI plots in those areas. All NFI plots located at a
distance greater than 45 m from the boundary were selected for further use. The forest
boundary and the buffer around it are presented in Figure 26.
Figure 26: A buffer of 1.5 times the pixel size (right) was applied to the boundaries of the forest (left) and used to
select the NFI plots for the classification and the validation of the images.
The result of this step is a subset of the NFI database that is ready to be used with the
satellite scenes. The preparation of the satellite scenes for the analysis is described in the
following steps.
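The selection of plots by their distance to the forest edge can be sketched in plain Python as follows. The sketch assumes that a forest polygon is given as a simple ring of vertices without holes; the square test polygon, the plot coordinates and the function names are made up for illustration and do not come from the ATKIS layer.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(p, ring):
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def inside(p, ring):
    """Ray-casting test: True if p lies inside the closed ring."""
    x, y = p
    result = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def select_interior_plots(plots, ring, min_distance=45.0):
    """Keep plots inside the forest and farther than min_distance from its edge."""
    return [
        p for p in plots
        if inside(p, ring) and distance_to_boundary(p, ring) > min_distance
    ]


# Made-up square forest of 1 km x 1 km and four made-up plot positions.
forest = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
plots = [(500, 500), (30, 500), (46, 500), (1200, 500)]
kept = select_interior_plots(plots, forest)
```

With a 30 m pixel the diagonal is about 42.4 m, so a threshold of 45 m excludes every plot whose pixel could touch the forest edge.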
4.1.5 Automatic selection of training samples from the NFI plots
An experiment to evaluate the source and the number of training samples, and the use of a
threshold for the NDVI, was carried out on the image in path 192, row 026. The SVM
classification was carried out and evaluated with the National Forest Inventory (NFI) data.
Table 11 lists the parameters gamma and cost of the SVM obtained with the grid search
method described in 2.1.5.4 (page 18), the best performance (BestP) achieved, the number of
support vectors (SupportVec) and the accuracy (Total Acc) on the training samples. The
overall accuracy (OA) and the respective kappa coefficient are shown in the last two rows.
Table 11: Experiment on SVM for the selection of features and the threshold of the NDVI feature.
Samples       a       b       c       d       e       f
Grid search
Gamma         1.024   0.101   1.040   0.128   0.101   0.301
C             0.100   9.601   1.020   0.100   8.701   1.201
BestP         0       0.5     0.57    0       0.53    0.27
SupportVec    52.00   76.00   40.00   33.00   37.00   31.00
Total Acc     94.23   59      53      100     60      75.5
OA            38.6    55      51      40      59      75
Kappa         0.15    0.23    0.24    0.12    0.28    0.2
The samples were selected in the following ways:
a Visual selection of training samples from orthophotos, with the variables B2, B3, B5 and SD7
  (B = band, SD = standard deviation).
b Selecting 20 random training samples per class from the NFI, all scene features.
c Selecting 10 random training samples per class from the NFI, all scene features.
d Using all scene features NDVI as predictor.
e Selecting 20 random training samples per class from the NFI, all scene features.
f Selecting only NFI plots with NDVI > 50 as training samples and balancing the weights per class.
As a result of the experiment, all of the features from the image were selected and the NDVI
threshold was set to 50. This option is in column f of Table 11. This result led to an
adjustment of the definition of forest.
The kappa coefficient was too low, indicating inconsistencies between the classes resulting
from the classification. In general, broadleaved forest is the class with the lowest user
accuracy. Data from the inventory were used in order to improve accuracy.
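The overall accuracy and the kappa coefficient are both derived from a confusion matrix. The following minimal Python sketch shows the standard calculation; the two-class matrix is made up for illustration and does not reproduce any result of this study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i that were
    assigned to class j. Kappa is undefined when chance agreement equals 1.
    """
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    observed = agree / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up confusion matrix (rows: reference, columns: classification).
example = [
    [40, 10],  # broadleaved
    [15, 35],  # conifer
]
oa, kappa = overall_accuracy_and_kappa(example)
```

In this example 75 of 100 samples agree, but half of that agreement is expected by chance, so kappa is only 0.5. This illustrates why a moderate overall accuracy can go together with a low kappa.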
4.1.6 Calculating the weight of the classes
The selection of training samples for the classification of the segments always yields an
unbalanced number of training samples per class. Two methods have been proposed for using
such unbalanced training samples in a classification: a) repeating the samples of the smaller
classes until they match the number in the largest class, or b) applying a weighting factor in
order to balance the training.
The factor was applied by using the following equation:
$$\mathrm{Factor}_c = \frac{\mathrm{samples}_c}{\max_{k}(\mathrm{samples}_k)}$$
Where
$\mathrm{Factor}_c$ is the weighting factor to be applied for each class $c$.
$\mathrm{samples}_c$ is the number of training samples in class $c$.
$\max()$ is a function that extracts the highest value over all classes.
The overall accuracy of the classification increased from 59 to 75 %. Most of the errors in
classification are related to conifer forest.
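The weighting factor can be computed as in the following minimal Python sketch, which follows the equation as it is written above. The class counts are made up. The optional `invert` argument is a hypothetical addition: some classifiers expect weights that grow as the class gets smaller, in which case the reciprocal of the factor would be passed on. Which of the two the software used in this study expects is not stated in the text.

```python
def class_factors(sample_counts, invert=False):
    """Factor per class: samples_c / max(samples), as in the equation.

    With invert=True (hypothetical option) the reciprocal is returned,
    so that the smallest class receives the largest value.
    """
    largest = max(sample_counts.values())
    factors = {name: count / largest for name, count in sample_counts.items()}
    if invert:
        factors = {name: 1.0 / value for name, value in factors.items()}
    return factors


# Made-up numbers of training samples per class.
counts = {"broadleaved": 20, "conifer": 80}
factors = class_factors(counts)
reciprocal = class_factors(counts, invert=True)
```

For the made-up counts the factor is 0.25 for the smaller class and 1.0 for the larger one.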
4.2 Pre-processing satellite scenes
A general flow chart of the pre-processing steps applied to the scenes is presented in
Figure 27.
Figure 27: Pre-processing steps applied to the scenes.
The data used were produced by different institutions, such as the USGS, BMELV and LFU.
Each has its own standards for the production of data, and these usually differ from those of
the final user. The files are, for example, optimised for transfer via the internet and are
usually in a standard exchange format, but not in a form appropriate for use in geographic
information systems (GIS) software. During pre-processing, the user normally converts the
data from the original digital format to the format required by the software in use. Among
other tasks, the calibration and the geodetic datum have to be handled.
4.2.1 Conversion of format
The images were provided by the USGS in GeoTIFF format, compressed as tar.gz. In this
format, the images were originally projected in UTM zones 32 and 33 (WGS84), depending
on the position of the centre of each image.
The geodetic reference system was converted to the Gauss-Krueger reference system for the
selection of aerial orthophotos as a basis for the orthorectification. This conversion was
performed because all of the reference maps found were in Gauss-Krueger, the official
projection of Germany. The original satellite images, which were referenced in the WGS84
system, were orthorectified based on control points selected from orthophotos.
The DEM was converted to a resolution of 30 x 30 m so that it could be registered with the
LANDSAT TM image.
In all processing steps, the Gauss-Krueger zone 4 projection system was used. The parameters
used for the projection transformation were:
Gauss-Krueger coordinate system
Datum: Potsdam
Spheroid: Bessel 1841
Projection: Transverse Mercator
Longitude of central meridian: 12 ° east
Latitude of origin: 0 ° north
Scale factor: 1
False easting: 4500000 m
False northing: 0 m
EPSG code: 3397
4.2.2 Pre-processing DEM
The DEM presented some abnormalities that could affect further steps such as the
orthorectification. The abnormalities found are presented in the following list:
- The null value was set to -255. This problem was introduced by the conversion of the
format from GeoTIFF to IMG.
- Some areas lay outside the test area and were clipped out from the raster.
A general view of the DEM before and after the pre-processing is presented in Figure 28.
Figure 28: The DEM before (left) and after (right) the pre-processing step.
4.2.3 Orthorectification
The satellite images to be orthorectified and the DEM were converted to compatible formats
so that they could be opened in a viewer of the image processing system. Orthorectified
USGS images were used to identify the location of each ground control point (GCP) provided
by the LFU. These images have a low spatial resolution, but because they are georeferenced
they could be used to locate the GCP and to relate each GCP to the page number of an aerial
photograph. The selected aerial photographs were then ordered from the LFU.
Every single point was then precisely identified in the satellite image and in the aerial
photographs. At least 20 GCP were used per satellite image. At least five points are required
to calculate a model based on the equation of co-linearity; the parameters of the external
orientation and the error (RMSE) can then be calculated. The best results are achieved when
the control points are distributed uniformly over the image. For a more precise
orthorectification in hilly and mountainous areas, the equation of co-linearity was used, as it
takes the altitude information from the DEM into account. Accuracy increases with the
number of GCP used. The coarse resolution of the satellite images complicated the
identification of suitable control points, especially in wooded rural areas.
The elimination of geometric distortion from an image is called orthorectification, which is a
special form of image georeferencing. The geometric distortion of the image is caused by
systematic and non-systematic deformations present in the original scenes. Non-systematic
deformations include uneven terrain, whereas systematic deformations refer to the central
perspective and the incorrect behaviour of the recording system.
To geocode an image one requires the transformation equation with which each pixel of the
input image can be transferred into the matrix of the output image. There are basically two
approaches to find this equation:
- The recording geometry of the sensor is modelled using parametric methods and the DEM.
The pre-requisite is that the location and movement of the sensor are known and that a DEM
is available.
- GCP are used to establish the relationship to the reference system.
The transformation equation is then applied to re-order the data of the input image into the
matrix of the output image. During re-sampling, the grey value of each output pixel is
calculated back from the input image. In this study the nearest neighbour re-sampling method
was used in order not to change the original grey values, which is important for the
subsequent classification.
Height errors in the terrain model affect the orthorectification in the form of position errors
in the orthorectified images, whereas the number of GCP plays a less dominant role. Studies
have shown that a greater number of GCP produces no substantial improvement in the
resultant image. The quality of the DEM, and the quality and distribution of the GCP,
improve the result considerably. The residuals expressed by the RMSE (root mean square
error) resulting from the orthorectification ultimately reflect the quality of the GCP (ERDAS
Inc., 2010).
For the orthorectification, the LANDSAT images were uniformly converted to a pixel size of
30 m x 30 m. The DEM was converted from 50 m x 50 m to 30 m x 30 m.
The following LANDSAT TM 5 images were orthorectified (Table 12):
Table 12: List of the orthorectified LANDSAT 5 TM images acquired.
Path/Row    Acquisition date
192/026     13/06/2006
192/027     01/09/2006
193/025     22/07/2006
193/026     24/09/2006
193/027     27/08/2007
194/025     11/06/2006
194/026     11/06/2006
The accuracy achieved ranged from 1 to 1.5 pixels. This corresponds to a maximum deviation
of 30 to 45 m.
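The relation between GCP residuals, the RMSE and the deviation in metres can be illustrated with a minimal Python sketch. The residuals are made up, and the RMSE is taken here as the root of the mean squared two-dimensional residual, which is an assumption about how the image processing software reports it.

```python
import math


def gcp_rmse(residuals):
    """RMSE of a list of (dx, dy) GCP residuals, in the units of the input."""
    squared = sum(dx * dx + dy * dy for dx, dy in residuals)
    return math.sqrt(squared / len(residuals))


def pixels_to_metres(value_px, pixel_size_m=30.0):
    """Convert a distance in pixels to metres for a square pixel."""
    return value_px * pixel_size_m


# Made-up residuals in pixels for four control points.
residuals_px = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
rmse_px = gcp_rmse(residuals_px)
rmse_m = pixels_to_metres(rmse_px)
```

With a pixel size of 30 m, an accuracy of 1 to 1.5 pixels translates directly into the 30 to 45 m stated above.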
The following two figures illustrate the level of accuracy of the rectification by means of
the superimposed ATKIS data.
Figure 29: Example 1 highlighting the accuracy of the rectification (green lines are vectors of ATKIS forest layers).
Figure 30: Example 2 highlighting the accuracy of the rectification (green lines are vectors of ATKIS forest layers).
In Figure 30 two adjacent images are presented. The differences in illumination conditions
can be seen at the centre of the image.
Once the images are orthorectified, data of known positional precision from different dates,
and even from different sensors, can be compared in one analysis.
4.2.4 Topographic normalisation
To compensate for slope-related differences in illumination, a topographic normalisation was
applied to each satellite image. The C-factor method was selected for the topographic
normalisation, as it has been used successfully in several forest inventories (Oehmichen,
2007).
The C-factor method balances the brightness value of every pixel in the scene according to
the local solar illumination angle, using a compensation factor C that also takes into account
the diffuse incident radiation at the sensor. The correction factor is calculated by means of a
regression between each band of the sensor and the local solar incidence angle derived from
the DEM. In this case the regression refers to training areas selected as homogeneous
coniferous forest through visual interpretation of the satellite data (Figure 31). The
topographic normalisation was performed using the SILVICS programme package ((c) Niall
McCormick, JRC-SAI, Ispra, 1995-1998) to determine the C-factor for all images.
To avoid an erroneous correction of sharp slope edges, the digital elevation model was first
smoothed with a 3 x 3 average filter.
Figure 31: Test areas for the calculation of the C-factor taken from a subset of the image in path 192, row 026.
The C-factor (correction factor) was calculated based on the homogeneous coniferous forest
areas. Once determined for all bands, the topographic normalisation was applied to the entire
image.
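The principle of the C-factor can be illustrated with a minimal Python sketch. It uses the form in which the C-correction is commonly written: a linear regression of the band values on the cosine of the local incidence angle gives a slope m and an intercept b, the factor is c = b / m, and each value is multiplied by (cos(sz) + c) / (cos(i) + c), where sz is the solar zenith angle. That the SILVICS package implements exactly this form is an assumption, and the training values are made up.

```python
def fit_c_factor(cos_i, values):
    """Fit values = m * cos_i + b by least squares and return c = b / m."""
    n = len(cos_i)
    mean_x = sum(cos_i) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in cos_i)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cos_i, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


def c_correct(value, cos_i, cos_sz, c):
    """Normalise one pixel value to the illumination of a horizontal surface."""
    return value * (cos_sz + c) / (cos_i + c)


# Made-up training pixels from homogeneous conifer stands in one band:
# cosine of the local incidence angle and the observed band value.
train_cos_i = [0.2, 0.5, 0.8, 1.0]
train_values = [28.0, 40.0, 52.0, 60.0]
c = fit_c_factor(train_cos_i, train_values)

# A shaded pixel (cos_i = 0.2) under a sun with cos(sz) = 0.8.
corrected = c_correct(28.0, 0.2, 0.8, c)
```

In this constructed example the shaded pixel is raised to the value that the same forest shows on a horizontal surface. Overcorrection can arise where cos(i) + c becomes very small.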
Contrary to the expectations for the topographic normalisation, however, artefacts of ridges
remained on corrected northern slopes (see Figure 32).
Therefore after a quality check the topographic normalization was discarded and not used for
the further processing.
Figure 32: Overcorrection effects of topographic normalization.
The standard false colour composition of the original image is shown on the left and the same
area with the overcorrections resulting from the topographic normalization on the right. The
topographic normalization was not applied because of the observed overcorrections. The
original orthorectified images were used for further processing to ensure a better result in
the classification.
4.2.5 Cloud mask
The clouds were identified by means of a band ratio and a threshold. For this, the value of
each pixel of LANDSAT TM band 1 was divided by the corresponding value of the thermal
band 6. If the result was greater than the threshold of 1, the corresponding pixel was deemed
a cloud and redefined as ‘not present’ (null value). The threshold was determined
interactively in multiple passes to ensure the best possible adaptation of the cloud mask to
the images.
At the same time, shadows were identified where band 4 was less than 25 and band 7 greater
than 15. In those locations a null value was assigned. Some water bodies were also assigned
a null value by this process, but for the objectives of this research this error was not
important.
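The two masking rules can be sketched in Python as follows. The pixel layout (a tuple holding the values of bands 1 to 7 in order), the use of None as the null value and the handling of a zero thermal value are assumptions made for illustration; the pixel values are made up.

```python
def classify_pixel(band1, band4, band6, band7, cloud_threshold=1.0):
    """Return 'cloud', 'shadow' or 'clear' for one pixel."""
    # Ratio of band 1 to the thermal band 6; a zero thermal value is treated
    # as an infinite ratio (assumption, not stated in the text).
    ratio = band1 / band6 if band6 else float("inf")
    if ratio > cloud_threshold:
        return "cloud"
    if band4 < 25 and band7 > 15:
        return "shadow"
    return "clear"


def apply_mask(pixels, cloud_threshold=1.0):
    """Replace cloud and shadow pixels by None; pixels hold bands 1 to 7."""
    masked = []
    for pixel in pixels:
        status = classify_pixel(pixel[0], pixel[3], pixel[5], pixel[6], cloud_threshold)
        masked.append(pixel if status == "clear" else None)
    return masked


# Made-up pixels (band 1, 2, 3, 4, 5, 6, 7).
scene = [
    (120, 0, 0, 80, 0, 100, 40),  # bright in band 1, cool: cloud
    (60, 0, 0, 20, 0, 130, 18),   # dark in band 4: shadow
    (60, 0, 0, 90, 0, 130, 30),   # clear
]
result = apply_mask(scene)
```

Because the threshold was tuned interactively, it is kept as a parameter rather than fixed in the function body.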
The resulting data sets included only those areas not in the shadow of clouds and only those
with directly illuminated pixel values. Only at the edge of the clouds, where the clouds were
translucent, were adverse effects evident (see Figure 33).
Figure 33: Subset of the image before (left) and after (right) masking of the clouds.
4.2.6 Haze suppression
After orthorectification a haze correction was performed to minimise the impact of haze, and
to enhance the spectral differences between classes. The ‘haze reduction’ method was based
on the ‘tasselled cap coefficients’ method. Here, the proportions of vapour components are
identified and corrected in each pixel (Lavreau, 1991).
The results of the haze correction were checked for spectrally homogeneous land units. There
was a significant reduction in the noise detected by the correction (Figure 34). The haze
correction was, therefore, calculated for all images.
Figure 34: The haze uncorrected image (left) and the haze corrected image (right).
A significant improvement in the spectral differences between homogeneous land units (water
and vegetation) is clearly visible in the highlighted part of the images in Figure 34.
With the set of scenes prepared in the pre-processing steps, some variable selection methods
could be applied to choose the best predictors of the forest and forest types in the area. The
next section is dedicated to the variable selection.
4.3 Creating synthetic layers for the classifications
Before the application of the PBIA, some synthetic layers were made. The NDVI is known to
saturate and is quite sensitive to seasonal changes (Huete et al., 2002; Moreau et al., 2003;
Inoue et al., 2008). In order to improve the classification result within the forest, the
following additional synthetic layers were created from the available data. The normalized
difference vegetation index (NDVI) was calculated. A subset of this index is shown in
Figure 35.
Figure 35: Subset of the NDVI.
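For LANDSAT TM the NDVI is computed from the red band (band 3) and the near-infrared band (band 4). The following minimal Python sketch shows the standard formula; the rescaling that leads to threshold values such as 50 is not described in the text and is therefore not reproduced. The pixel values are made up.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index, (NIR - red) / (NIR + red).

    Returns 0.0 where both bands are zero to avoid a division by zero
    (a convention chosen for this sketch).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Made-up digital numbers: dense vegetation reflects strongly in band 4.
vegetation = ndvi(red=30, nir=90)
bare_soil = ndvi(red=50, nir=50)
```

The index ranges from -1 to 1, and dense green vegetation gives values well above those of bare soil or water.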
The Tasselled Cap (TC) transformation was calculated, generating four components (Leica
Geosystems Geospatial Imaging, LLC 2010). The TC values were obtained from the following
equations:
TC1 = Band1 * 0.2909 + Band2 * 0.2493 + Band3 * 0.4806 + Band4 * 0.5568
      + Band5 * 0.4438 + Band7 * 0.1706 + 10.3695
TC2 = Band1 * -0.2728 + Band2 * -0.2174 + Band3 * -0.5508 + Band4 * 0.7220
      + Band5 * 0.0733 + Band7 * -0.1648 - 0.7310
TC4 = Band1 * 0.8461 + Band2 * -0.0731 + Band3 * -0.4640 + Band4 * -0.0032
      + Band5 * -0.0492 + Band7 * 0.0119 + 0.7879
TC6 = Band1 * 0.1186 + Band2 * -0.08069 + Band3 * 0.4094 + Band4 * 0.0571
      + Band5 * -0.0228 + Band7 * 0.0220 - 0.0336
A subset of each component is presented in Figures 36 to 39.
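The four equations are weighted sums of the six reflective bands plus a constant, which can be written compactly as in the following Python sketch. The coefficients are copied as they are printed above; the data layout and the function name are assumptions made for illustration.

```python
# Weights for bands 1, 2, 3, 4, 5 and 7, followed by the additive constant,
# copied from the equations as printed in the text.
TC_COEFFICIENTS = {
    "TC1": ((0.2909, 0.2493, 0.4806, 0.5568, 0.4438, 0.1706), 10.3695),
    "TC2": ((-0.2728, -0.2174, -0.5508, 0.7220, 0.0733, -0.1648), -0.7310),
    "TC4": ((0.8461, -0.0731, -0.4640, -0.0032, -0.0492, 0.0119), 0.7879),
    "TC6": ((0.1186, -0.08069, 0.4094, 0.0571, -0.0228, 0.0220), -0.0336),
}


def tasselled_cap(bands):
    """Return the four TC components for one pixel.

    `bands` holds the values of bands 1, 2, 3, 4, 5 and 7 in that order.
    """
    if len(bands) != 6:
        raise ValueError("expected the six reflective bands 1, 2, 3, 4, 5, 7")
    return {
        name: sum(w * v for w, v in zip(weights, bands)) + offset
        for name, (weights, offset) in TC_COEFFICIENTS.items()
    }


# A pixel with value 1 in every band gives the sum of weights plus the constant.
unit_pixel = tasselled_cap((1, 1, 1, 1, 1, 1))
zero_pixel = tasselled_cap((0, 0, 0, 0, 0, 0))
```

Keeping the coefficients in one table makes it easy to check them against the cited source.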
Figure 36: Subset of the tasselled cap 1. This band is also called Brightness of the objects in the visual channels (Crist
and Cicone, 1984).
Figure 37: Subset of the tasselled cap 2. Called Greenness, it shows in white the areas with the highest level of
green for the human eye (Crist and Cicone, 1984).
Figure 38: Subset of the tasselled cap 4. This layer is called Haze and it shows the hazy areas with more intense levels
of white.
Figure 39: Subset of the tasselled cap 6. This band is called sixth.
The texture of each band was calculated as the angular second moment of the grey level
co-occurrence matrix (Trimble, 2011b), which produced one additional feature per band. In
total, seven new features were obtained. A subset of the texture for every band is presented
in the following six figures.
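The angular second moment can be illustrated with a minimal Python sketch for one small window and one pixel offset. The symmetric counting of pairs, the single horizontal offset and the tiny windows are assumptions made for illustration; the software cited above may combine several directions and use a different quantisation.

```python
def glcm_asm(window, offset=(0, 1)):
    """Angular second moment of the grey level co-occurrence matrix.

    `window` is a list of rows of grey levels and `offset` is the
    (row, column) displacement between the two pixels of a pair.
    Pairs are counted symmetrically; the ASM is the sum of the squared
    relative frequencies of all grey-level pairs.
    """
    d_row, d_col = offset
    rows, cols = len(window), len(window[0])
    counts = {}
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                first, second = window[r][c], window[r2][c2]
                for pair in ((first, second), (second, first)):
                    counts[pair] = counts.get(pair, 0) + 1
                    total += 1
    return sum((n / total) ** 2 for n in counts.values())


# Made-up windows: a uniform patch and a checkerboard.
uniform = glcm_asm([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
checker = glcm_asm([[0, 1], [1, 0]])
```

A uniform patch reaches the maximum value of 1, while heterogeneous patches give lower values. This is why the measure is also described as a homogeneity feature.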
Figure 40: Subset of the texture for band 1 of LANDSAT TM 5 data.
Figure 41: Subset of the texture for band 2 of LANDSAT TM 5 data.
Figure 42: Subset of the texture for band 3 of LANDSAT TM 5 data.
Figure 43: Subset of the texture for band 4 of LANDSAT TM 5 data.
Figure 44: Subset of the texture for band 5 of LANDSAT TM 5 data.
Figure 45: Subset of the texture for band 7 of LANDSAT TM 5 data.
With the synthetic layers ready, the best predictor of forest and non-forest areas is selected in
the next step.
4.4 Pixel based analysis of the satellite data
A general description of the workflow is presented in Figure 46.
Figure 46: General description of the workflow.
The preprocessing described in section 4.2 (page 55) was followed by two kinds of image
analysis: pixel based image analysis (PBIA) and object based image analysis (OBIA) were
both tested. The processing steps are presented in the diagram in Figure 47.
Figure 47: The pixel based image analysis applied to complete scenes.
4.4.1 Detection of forest
Two different classifications of forest areas were performed. The first was based only on the
original LANDSAT bands, and the second also used artificial bands such as NDVI, TC1 to
TC4 and texture bands, as described in section 0 (page 24). For the second classification,
two results were generated by applying the modal filter described in section 2.1.9.2 (page 25)
once and twice.
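A modal filter replaces each pixel of a classified map by the most frequent class in its neighbourhood. The following minimal Python sketch assumes a 3 x 3 window that is clipped at the image border; the window size used in this study is not given here, and the small class map is made up.

```python
from collections import Counter


def modal_filter(grid, size=3):
    """Return a copy of `grid` with each cell set to its neighbourhood mode.

    The window is clipped at the border. Ties are resolved in favour of
    the class met first in the window (a choice made for this sketch).
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            neighbourhood = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(neighbourhood).most_common(1)[0][0]
    return out


# Made-up class map: one isolated non-forest pixel (0) inside forest (1).
class_map = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
once = modal_filter(class_map)
twice = modal_filter(once)
```

A single pass removes the isolated pixel; a second pass smooths the result further wherever small patches remain.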
No other vegetation index was used, based on the statement by Campbell and Wynne (2011)
that “in practice there are few differences between the many VIs that have been proposed”.
In preparation for the classification of each image, a statistical analysis of the features
used for the classification needs to be carried out.
An unsupervised classification method was selected for the classification of forest and
non-forest areas, because a supervised classification requires a variety of training areas,
which is time consuming and needs reference data. Selecting training areas on the basis of
existing aerial photographs would have added significantly to the costs.
The forest/non-forest unsupervised classification was performed on the orthorectified and
normalised data sets. An unsupervised classification is based solely on the statistical values
in the image, and classes are formed without thematic assignment. The thematic mapping
takes place after the classes are formed.
An ISODATA algorithm was used for the classification. This is one of the most widely used
unsupervised classifiers in image processing (Jiang et al., 2004b; Lang et al., 2008; Walsh et
al., 2008; Makkeasorn et al., 2009; Wang and Niu, 2009). As the result of an ISODATA
classification also depends on the initialisation parameters, these are described here. The
parameters were selected as follows:
- The first cluster is located on a diagonal axis of the first two principal components;
- The number of classes starts at 60, or at 250 in classification 2. The mixture of classes is
analysed by the operator, who merges the classes most similar to forest;
- The thresholds are 20 iterations and 95 % convergence of the means. The iterations ensure
that a stable classification can be reached, and the convergence of the new cluster centres
with the previous ones ensures that few clusters still change in the final stage;
- The classes are eventually grouped into six final classes for thematic mapping, because a
map with more classes could be confusing for the human eye.
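The role of the two thresholds can be illustrated with a simplified Python sketch of an iterative clustering of the ISODATA type. It is not the implementation of the image processing software: the splitting and merging of clusters is omitted, the start means are spread along the diagonal from the band-wise minimum to the band-wise maximum instead of along a principal component axis, and convergence is taken as the share of pixels that keep their class between two iterations. The pixel values are made up.

```python
def initial_means(pixels, k):
    """Spread k start means along the diagonal from band minima to maxima."""
    n_bands = len(pixels[0])
    low = [min(p[b] for p in pixels) for b in range(n_bands)]
    high = [max(p[b] for p in pixels) for b in range(n_bands)]
    if k == 1:
        return [tuple((lo + hi) / 2 for lo, hi in zip(low, high))]
    return [
        tuple(low[b] + (high[b] - low[b]) * i / (k - 1) for b in range(n_bands))
        for i in range(k)
    ]


def nearest(pixel, means):
    """Index of the mean closest to the pixel (squared Euclidean distance)."""
    return min(
        range(len(means)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(pixel, means[j])),
    )


def migrating_means(pixels, k, max_iterations=20, convergence=0.95):
    """Cluster pixels; stop after max_iterations or when the share of
    pixels with an unchanged class reaches the convergence threshold."""
    means = initial_means(pixels, k)
    labels = [nearest(p, means) for p in pixels]
    for _ in range(max_iterations):
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # empty clusters keep their previous mean
                means[j] = tuple(sum(v) / len(members) for v in zip(*members))
        new_labels = [nearest(p, means) for p in pixels]
        unchanged = sum(a == b for a, b in zip(labels, new_labels)) / len(pixels)
        labels = new_labels
        if unchanged >= convergence:
            break
    return labels, means


# Made-up two-band pixels forming a dark and a bright group.
pixels = [(10, 12), (11, 10), (12, 11), (80, 82), (81, 79), (79, 80)]
labels, means = migrating_means(pixels, k=2)
```

Starting with many more clusters than final classes, as in the list above, leaves the thematic grouping to the operator after the statistical clustering has finished.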
The forest/non-forest classification using the ISODATA classifier was performed twice. The
input features in the first approach were the six original bands of LANDSAT TM 5. In the
second approach, synthetic features were added to the six original bands: the NDVI, a
texture feature (homogeneity) and the tasselled cap features TC 1, TC 2, TC 4 and TC 6. This
approach explores the possibility of obtaining more information from the synthetic bands.
According to Healey et al. (2005), cited by Campbell and Wynne (2011, page 445), “recently
cleared forest exhibits high brightness and low greenness and wetness in relation to
undisturbed forest.”
4.4.2 Detection of forest types
The classification of forest types was based on two methods described in 2.1.9.3 (page 28).
The Minimum distance to means (MD) method and the Maximum likelihood (MxL) method
were applied as documented in ERDAS, Inc. (2010).
Some areas containing pure forest communities were selected as samples. Those segments
were plotted over the image and related to the feature space of the bands 3 and 7. The spectral
signatures of the forest types were selected. The plots of every polygon in the samples were
observed and delimited. Figure 48 is an illustration of this process.
Figure 48: The training process of the MxL classification involved adding points to the image.
As can be seen in Figure 48, the user points to some pure pixels in the broadleaved forest
(left), and the system presents the exact position in the feature space, in this case, bands 3 and
7 (right). By pointing to many pixels, a precise area can be found where the specific class of
cover is present.
The spectral signatures for conifer and broadleaved forest were obtained. These spectral
signatures were used as a non-parametric estimator for the supervised classification. An
example of the signature editing process is presented in Table 13.
Table 13: Example of the signature editing process and the color assigned per class.
Every class should have a similar sample size for the supervised classification. The value
column identifies the class trained and was re-labelled to 1 and 2 for broadleaved and conifer
forest respectively. The classes were either conifer or broadleaved forest. The MxL algorithm
for classification was applied, and the result is a map showing the cover of conifer and of
broadleaved forest in the area.
4.4.3 Assessment data based on aerial photographs, satellite and NFI
After classification of forest types a first evaluation took place to assess the classification
accuracy for forest and non-forest areas. The evaluation was based on a range of aerial
photographs covering the entire area. Points for verification were selected randomly in the
classified maps by the system (see Figure 49).
Figure 49: Example of the selection of reference points in two aerial photographs.
These verification points were visually assigned to aerial photographs and verified for the five
classes forest, cloud, shade, water and other surfaces. The classes cloud and shade were
relabelled ‘others’ in order to speed up the process.
4.4.4 Vectorization of pixel-based classification results
Each of the 7 scenes in BIL format (see below) was classified with the minimum distance
method, producing files named *_MD_40.bil, and with the maximum likelihood method,
producing files named *_MLN_40.bil, for subsequent vectorization.
192_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
192_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
192_27_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
192_27_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
193_25_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
193_25_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
193_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
193_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
193_27_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
193_27_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
194_25_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
194_25_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
194_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MD_40.bil
194_26_POTSDAM_TMGER4_ATKIS_Maske_klassifiziert_MLN.bil
An illustration of the pixel values obtained after the classification of the image in path 192,
row 027 is presented in Figure 50.
Figure 50: Matrix values of a classified scene.
These raster data sets were first imported into the ERDAS image processing system. Each
raster data set contains classes corresponding to the predetermined values 0 to 5.
To reduce the high variability within each classified scene and to generate a balanced picture
of the landscape in the vector data sets, the data were processed with a 3x3 pixel majority
filter. Non-forest (pixel value 0) was not taken into account.
The effect of the filtering can be assessed by comparing the scene before and after filtering
(see Figure 33).
Figure 33: Classified scene (left) and filtered scene (right): the right scene shows balanced polygons with edges,
isolated pixels are eliminated.
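The 3x3 majority filter can be sketched in a few lines. The version below is only an illustration and not the ERDAS routine: it assumes that "not taken into account" means that pixels with value 0 are left unchanged and do not vote, that the most frequent class in the window wins, and that a pixel keeps its value when the vote is tied. The function name `majority_filter_3x3` is hypothetical.

```python
from collections import Counter


def majority_filter_3x3(grid, ignore=0):
    """Replace each pixel by the most frequent class in its 3x3 window.

    grid: list of rows with integer class codes.
    Pixels equal to `ignore` are neither changed nor counted (assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == ignore:
                continue  # non-forest pixels are left untouched
            votes = Counter()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != ignore:
                        votes[grid[rr][cc]] += 1
            best = max(votes.values())
            winners = [value for value, n in votes.items() if n == best]
            # Assumed tie rule: only a unique winner replaces the original value.
            if len(winners) == 1:
                out[r][c] = winners[0]
    return out


# An isolated conifer pixel (5) inside broad leaved forest (1) is removed.
filtered = majority_filter_3x3([[1, 1, 1], [1, 5, 1], [1, 1, 0]])
```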
To assign the correct class name to the individual pixel values, the DBF tables associated
with the BIL files were used, in which the respective assignment is stored. The final
assignment of pixel values to classes can be found in the following two tables.
Table 14: Minimum Distance Classification. List of codes used for every class.
Scene    No forest   mixed   broad leaved   mixed conifers   mixed broad leaved   conifer
192_26   0           1       2              3                4                    5
192_27   0           1       2              3                4                    5
193_25   0           1       4              3                2                    5
193_26   0           1       4              3                2                    5
193_27   0           1       4              3                2                    5
194_25   0           3       4              1                2                    5
194_26   0           3       4              1                2                    5
In Table 14 and Table 15 every scene was classified and coded as mixed forest (mixed), broad
leaved forest (broad leaved), mixed forest dominated by conifers (mixed conifers), mixed
forest dominated by broad leaved species (mixed broad leaved) and conifer forest (conifer).
The value 0 corresponds to no forest.
Table 15: Maximum Likelihood Classification. List of codes used for every class.
Scene    No forest   mixed   broad leaved   mixed conifers   mixed broad leaved   conifer
192_26   0           1       2              3                4                    5
192_27   0           1       4              3                2                    5
193_25   0           2       4              3                1                    5
193_26   0           2       4              3                1                    5
193_27   0           3       4              1                2                    5
194_25   0           3       4              1                2                    5
194_26   0           3       4              1                2                    5
To proceed with the subsequent analysis, the classified scenes were recoded according to
Table 16.
Table 16: The classified raster data were recoded with these values.
Scene    NoForest   broad leaved   mixed: broad leaved dominated   mixed   mixed: coniferous dominated   coniferous
192_26   0          1              2                               3       4                             5
192_27   0          1              2                               3       4                             5
193_25   0          1              2                               3       4                             5
193_26   0          1              2                               3       4                             5
193_27   0          1              2                               3       4                             5
194_25   0          1              2                               3       4                             5
194_26   0          1              2                               3       4                             5
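The recoding from scene-specific codes to the common codes of Table 16 amounts to a lookup table per scene. The following sketch shows the idea for one scene; the scene codes are copied from the row for scene 193_25 in Table 14, while the dictionary layout and the helper names `build_lookup` and `recode` are assumptions for illustration and not the procedure actually used in ERDAS.

```python
# Harmonised codes, as listed in Table 16.
TARGET_CODES = {
    "no forest": 0,
    "broad leaved": 1,
    "mixed broad leaved": 2,
    "mixed": 3,
    "mixed conifers": 4,
    "conifer": 5,
}

# Scene-specific codes, copied from the row for scene 193_25 in Table 14.
SCENE_193_25_MD = {
    "no forest": 0,
    "mixed": 1,
    "broad leaved": 4,
    "mixed conifers": 3,
    "mixed broad leaved": 2,
    "conifer": 5,
}


def build_lookup(scene_codes, target_codes=TARGET_CODES):
    """Map every scene-specific pixel value to its harmonised value."""
    return {code: target_codes[name] for name, code in scene_codes.items()}


def recode(grid, lookup):
    """Apply the lookup table to every pixel of a classified raster."""
    return [[lookup[value] for value in row] for row in grid]


lookup_193_25 = build_lookup(SCENE_193_25_MD)
recoded = recode([[0, 1, 4], [3, 2, 5]], lookup_193_25)
```

Keeping the class names as dictionary keys makes it harder to mix up columns when the codes differ from scene to scene, as they do in Table 14 and Table 15.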
All raster files were converted into vector format with the vectorization procedure of the
GIS software (ESRI, 2010). In this process, files in vector format (shapefile) were created and
stored under the names listed below:
192_26_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
192_26_potsdam_tmger4_atkis_maske_klassifiziert_mln_modal.shp
192_27_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
192_27_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
193_25_potsdam_tmger4_atkis_maske_klassifiziert_md_40_modal.shp
193_25_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
193_26_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
193_26_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
193_27_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
193_27_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
194_25_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
194_25_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
194_26_potsdam_tmger4_atkis_maske_klassifiziert_md_40_bil_modal.shp
194_26_potsdam_tmger4_atkis_maske_klassifiziert_mln_bil_modal.shp
Color.mxd (ArcGIS Project data with color schema / Legend)
After the vectorization was done, the resulting vector data sets were provided with the legend
shown in the following list and Figure 51.
Figure 51: Graphical representation of the legend.
In addition, two mosaics in vector format from the individual scenes were generated.
The following tables show a comparison of the area distribution of the individual classes in
the various scenes for the Minimum Distance and the Maximum Likelihood classification.
The areas were computed on the basis of the classes requested by the LFU:
1 broad leaved forest
2 mixed forest: broad leaved forest dominated
3 mixed forest
4 mixed forest: coniferous forest dominated
5 conifer forest
The areas were then combined into three major classes:
1 broad leaved forest and mixed forest dominated by broad leaved forest
2 mixed forest
3 coniferous forest and mixed forest dominated by coniferous forest.
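The grouping of the five classes into three major classes, followed by an area summary, could look like the sketch below. It works on a recoded raster for simplicity, whereas the areas in this study were derived from the vector data sets. The pixel size of 30 m is the nominal LANDSAT TM resolution and is an assumption here, as are the names `MAJOR_CLASS` and `class_areas_ha`.

```python
from collections import Counter

# Five LFU classes grouped into the three major classes listed above.
MAJOR_CLASS = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def class_areas_ha(grid, pixel_size_m=30.0):
    """Return the area in hectares per major class; value 0 (no forest) is skipped."""
    counts = Counter(
        MAJOR_CLASS[value] for row in grid for value in row if value in MAJOR_CLASS
    )
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0
    return {cls: n * pixel_area_ha for cls, n in sorted(counts.items())}


# One pixel of every code 0 to 5, as a tiny hypothetical raster.
areas = class_areas_ha([[0, 1, 2], [3, 4, 5]])
```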
When evaluating the results, it must be kept in mind that the two classification approaches
differ fundamentally: the minimum distance method leaves pixels that fit into no category
unclassified, whereas the maximum likelihood approach in principle assigns a class to every
pixel. This means that with the maximum likelihood method more area is classified as
forest than with the minimum distance method.
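The effect of a distance threshold on the number of unclassified pixels can be shown with a small minimum distance to means classifier. This is a sketch only: the class means, the threshold value and the function name `minimum_distance` are hypothetical, and the Euclidean distance in feature space is assumed as the distance measure.

```python
import math


def minimum_distance(pixel, class_means, max_distance=None):
    """Assign a pixel to the class with the nearest mean.

    Returns None (unclassified) when a threshold is given and even the
    nearest class mean is farther away than that threshold.
    """
    best_class, best_dist = None, math.inf
    for name, mean in class_means.items():
        d = math.dist(pixel, mean)  # Euclidean distance in feature space
        if d < best_dist:
            best_class, best_dist = name, d
    if max_distance is not None and best_dist > max_distance:
        return None
    return best_class


# Hypothetical class means in a two-band feature space.
means = {"broadleaved": (30.0, 20.0), "conifer": (20.0, 10.0)}

typical = minimum_distance((29.0, 19.0), means, max_distance=15.0)
outlier_with_threshold = minimum_distance((80.0, 90.0), means, max_distance=15.0)
outlier_without_threshold = minimum_distance((80.0, 90.0), means)
```

Without a threshold every pixel receives the label of the nearest class, however far away it lies, which is the behaviour that leads to a larger classified forest area.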
4.5 Model for detection of types of forest
Some classification algorithms are based on models for predicting an output variable from
independent input variables. This process is called modelling, and the general model used is
described in this section.
In the case of the delineation within forest areas (types of forest), the model applied was:
$\mathrm{Forest} = f(B_i, \mathrm{NDVI}, SD_i, TC_j, Tx_k)$
where
Forest The presence of forest in one pixel of LANDSAT TM.
$B_i$ The value of each pixel in every band of LANDSAT TM, from i = 1 for the blue channel
up to i = 7 for the middle infra-red channel, including the thermal band (i = 6).
NDVI The normalised difference vegetation index value in one pixel, scaled from -100
to 100. The equation used was $\mathrm{NDVI} = 100\,(B_4 - B_3) / (B_4 + B_3)$, adapted from
Campbell and Wynne (2011).
$SD_i$ Standard deviation of the segment in every band i.
$TC_j$ The four channels produced by the tasselled cap transformation; j ranges
from 1 to 4 (Crist and Cicone, 1984; Kauth and Thomas, 1976, cited by
Campbell and Wynne, 2011).
$Tx_k$ Texture of each band k, based on the grey level co-occurrence matrix (GLCM) angular
second moment in all directions. This value is obtained from the following
parameters of the image: i the row number, j the column number, $P_{i,j}$ the
normalized value in the cell i, j, and N the number of rows or columns. The
formula used is given in Trimble (2011).
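Two of the model inputs can be illustrated with short functions. The NDVI follows the scaled equation stated above. For the texture, the sketch uses the usual definition of the angular second moment, the sum of the squared normalised GLCM entries $P_{i,j}$ over all cells; that this matches the Trimble formula in every detail is an assumption. The sketch also works on a small pixel grid with already quantised grey levels rather than on image objects, and the function names are hypothetical.

```python
def scaled_ndvi(b3, b4):
    """NDVI scaled to the range -100..100: 100 * (B4 - B3) / (B4 + B3)."""
    if b3 + b4 == 0:
        return 0.0  # assumed convention for pixels without signal in both bands
    return 100.0 * (b4 - b3) / (b4 + b3)


def glcm_asm(grid, levels):
    """Angular second moment of a symmetric GLCM built from all directions.

    grid: list of rows with integer grey levels in the range 0..levels-1.
    """
    rows, cols = len(grid), len(grid[0])
    counts = [[0] * levels for _ in range(levels)]
    # Eight neighbours: the four directions 0, 45, 90 and 135 degrees, both ways.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[grid[r][c]][grid[rr][cc]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return 0.0
    # Normalise the counts to probabilities P[i][j] and sum their squares.
    return sum((n / total) ** 2 for row in counts for n in row)


ndvi_example = scaled_ndvi(b3=50, b4=150)
asm_uniform = glcm_asm([[1, 1], [1, 1]], levels=2)
asm_checkerboard = glcm_asm([[0, 1], [1, 0]], levels=2)
```

A perfectly homogeneous window gives the maximum value of 1, and the value drops as the grey levels become more mixed, which is why this measure is read as an indicator of homogeneity.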
Two approaches were used in the delineation of forest types. One was based on pixel analysis,
making a supervised classification of areas within the forest. The other was the segmentation
approach. These approaches have been frequently reported in the literature and have
produced different results (Johansen et al., 2007; Gamanya et al., 2009; Watts et al., 2009;
Chirici et al., 2011). In the case of the segmentation approach, the classification was made by
applying machine learning algorithms with cross-validation. See section 4.7 on
page 85 regarding “Statistical approaches”.
With the model for the delineation of types of forest selected, the segmentation and
classification of forest was made under OBIA.
4.6 Object Based Image Analysis (OBIA)
The OBIA was first tested by classifying a subset of one image. When the result was
acceptable, the classification was applied to the complete scene, and then to the rest of the
scenes covering the complete area. For the classification of the OBIA product, the following
three feature selection methods were tested: criterion based, Feature Space Optimization
(FSO) and recursive feature selection.
The OBIA was applied in the order presented as follows:
4.6.1 Variable selection methods
Some classification methods achieve higher accuracy with a small number of predictors
(Bacauskiene et al., 2009; Ekbal and Saha, 2011; Laliberte et al., 2011; Liau and Isa, 2011;
Luukka, 2011; Yang et al., 2011). Three types of variable selection methods were tested.
4.6.1.1 Criterion based variable selection
Some methods for variable selection assume that the variables are normally distributed.
An analysis of normality showed that the variables were not normally distributed. Among the
distribution-free methods for variable selection, some were found to be too intensive in
terms of computing time.
The variable selection process followed these steps:
- Consider all the input variables.
- Apply an ISOCLASS algorithm.
- Obtain the vegetation map.
- Select sample pixels.
- Apply a statistical GLM selection (forward, backward, …):
  o Build a model (variables)
  o Find the maximum likelihood for prediction
  o Measure residuals
- Criteria for selecting the model of minimum residuals:
  o Akaike Information Criterion (AIC)
    $\mathrm{AIC}(p') = n \ln(SS(\mathrm{Res})_p) + 2p' - n \ln(n)$
  o Bayes Information Criterion (BIC)
    $\mathrm{BIC}(p') = n \ln(SS(\mathrm{Res})_p) + [\ln(n)]\,p' - n \ln(n)$
where:
n = number of data points
SS(Res) = residual sum of squares
p' = number of parameters
The implementation of the variable selection was based on Geyer (2010) and McLeod and Xu
(2010).
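The two criteria can be evaluated directly from the residual sum of squares of each candidate model. The sketch below implements the AIC and BIC expressions given above and picks the candidate with the smallest value. It is not the R code used in this study; the candidate models and their residual sums of squares are hypothetical numbers chosen only to show that BIC penalises additional parameters more strongly than AIC.

```python
import math


def aic(n, ss_res, p):
    """AIC(p') = n ln(SS(Res)) + 2 p' - n ln(n)."""
    return n * math.log(ss_res) + 2 * p - n * math.log(n)


def bic(n, ss_res, p):
    """BIC(p') = n ln(SS(Res)) + ln(n) p' - n ln(n)."""
    return n * math.log(ss_res) + math.log(n) * p - n * math.log(n)


def best_model(candidates, n, criterion=bic):
    """candidates: dict mapping model name -> (SS(Res), number of parameters)."""
    return min(candidates, key=lambda name: criterion(n, *candidates[name]))


# Hypothetical candidates fitted to n = 100 sample pixels.
candidates = {
    "B4": (50.0, 2),
    "B4 + NDVI": (40.0, 3),
    "B4 + NDVI + TXT1": (38.8, 4),
}
chosen_by_bic = best_model(candidates, n=100, criterion=bic)
chosen_by_aic = best_model(candidates, n=100, criterion=aic)
```

With these numbers AIC accepts the third variable while BIC rejects it, because the BIC penalty per parameter is ln(100), about 4.6, instead of 2.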
In preparation for the classification, a statistical analysis of each image containing features for
classification was initially carried out. The Bayesian information criterion (BIC) was chosen
for feature selection because tests have shown this criterion to be robust in selection using
different images, resulting in a smaller number of features compared to other selection
methods (López Hernández et al., 2010).
4.6.1.2 Feature space optimization (FSO)
According to Trimble (2011b) “the feature Space Optimization function offers a method to
mathematically calculate the best combination of features in the feature space.” When
classifying image objects using the Nearest Neighbor classifier, the recommended workflow
involves the following steps:
1. Load or create classes
2. Define the feature space
3. Define sample image objects
4. Classify, review the results and optimize the classification.
This option is available only in eCognition and no other classifiers can be tested.
4.6.1.3 Recursive feature selection
Another tool already implemented for variable selection was chosen from the caret package
in R. This method for variable selection is based on random forest. The algorithm uses
backwards variable selection. The process starts with all of the variables as predictors in the
model. The predictors are ranked and the less important ones are sequentially eliminated
prior to modelling. The goal is to find a subset of predictors that can be used to produce an
accurate model. The metric used to select the predictors can be set to the accuracy or the root
mean square error (Kuhn et al., 2012). The documented steps include eliminating zero
and near zero variance predictors, identifying and removing correlated predictors,
removing linearly dependent predictors, centering and scaling, imputation, transforming, and
data splitting.
The recursive variable selection implemented in caret was applied. The preprocessing steps
were data splitting, centering and scaling the data. Neither the removal of near zero variance
predictors nor the removal of linearly dependent predictors was applied. As the satellite
bands are correlated (Campbell and Wynne, 2011), there was no elimination of correlated
variables.
The variable selection was performed backwards stepwise, with the following options:
- The classes of the samples were weighted.
- A 10-fold cross-validation was set.
- The input values were preprocessed by centering and scaling.
- The metric for the selection of the models was the accuracy.
- All of the 19 predictors were used for the selection, and the iteration evaluated the
relative importance, removing predictors one by one.
This method resulted in acceptable accuracies and kappa coefficients. This variable selection
was used for all the classifications.
4.6.2 Segmentation with OBIA
The segmentation was first applied with the chessboard method, using an object size of 400
pixels. The next segmentation method applied was multiresolution segmentation with a scale
parameter of 10, so that the smallest polygons could contain 10 pixels. The brightness/shape
relation was set to 1, so that both brightness and shape were considered and the segments
followed the boundaries of the polygons.
The haze masking filter was applied to the objects as a procedure for removing the areas
surrounding the clouds, where it was normally difficult to distinguish the soil cover type, as
described in the next step.
4.6.2.1 Automatic suppression of haze (haze out) by segmentation
Despite the haze corrections applied to the images, some of them still showed the effect of
haze. An object based rule was designed to suppress segments with haze. Clouds and their
shadows are clearly visible in the scene. Sometimes haze is present around the clouds. This
haze was reported as transparent and classified as cirrus by Watmough et al. (2011). In those
areas, often no class is assigned or a misclassification occurs. An automated approach was set
up to eliminate the haze areas. Previous experiments on land cover and change analysis using
OBIA (Ernst et al. 2010; Malinverni et al. 2010) integrated automated haze suppression
techniques (Watmough et al. 2011). Although those techniques can solve the problem
concerning clouds, in some other areas algorithms like the automatic cloud cover assessment
(ACCA) (Irish et al. 2006) can affect the spectral response. Watmough et al. (2011) modified
the ACCA algorithm and applied filters to identify clouds. In the case of the scenes under
consideration, some points were found to be affected with artifacts. For this reason an OBIA
approach was selected to solve this problem. In the rule for segmentation, the presence of a
non-forest polygon sharing a boundary with a cloud polygon was the main condition for a
class change, irrespective of the size of the boundary. This rule was applied three times and it
was enough to classify the erroneous non-forest polygons in the vicinity of clouds as clouds.
An outcome of this procedure was the enlargement of the cloud class, but only in
the areas surrounding clouds, where a correct classification usually fails. It solved the problem
of selecting new sampling points in the vicinity of clouds, which can result in the
misclassification of areas elsewhere as forest. An example of this effect can be
seen in Figure 52.
Figure 52: The original ortho-scene in (a) is classified with kNN algorithm in (b), the erroneous classification of forest
in (c) and after the application of noHaze algorithm.
In Figure 52 the areas surrounded by clouds (1) are erroneously classified as non-forest.
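The neighbourhood rule described in this section (a non-forest segment that shares a boundary with a cloud segment becomes cloud, applied three times) can be sketched on a simple adjacency structure. The actual rule was implemented as an object based rule set; the dictionaries, the segment identifiers and the function name `grow_clouds` below are hypothetical.

```python
def grow_clouds(labels, neighbours, passes=3):
    """Relabel non-forest segments that touch a cloud segment as cloud.

    labels: dict mapping segment id -> class name.
    neighbours: dict mapping segment id -> set of adjacent segment ids.
    The rule is applied `passes` times; each pass works on the labels
    produced by the previous pass.
    """
    labels = dict(labels)  # do not modify the caller's dictionary
    for _ in range(passes):
        to_cloud = [
            seg
            for seg, cls in labels.items()
            if cls == "non-forest"
            and any(labels[n] == "cloud" for n in neighbours.get(seg, ()))
        ]
        for seg in to_cloud:
            labels[seg] = "cloud"
    return labels


# Hypothetical chain of segments: A (cloud) - B - C - D - E - F (forest).
segment_labels = {"A": "cloud", "B": "non-forest", "C": "non-forest",
                  "D": "non-forest", "E": "non-forest", "F": "forest"}
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
             "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"}}
result = grow_clouds(segment_labels, adjacency, passes=3)
```

In this example the cloud class grows by three segments (B, C and D), while segment E and the forest segment F keep their labels, which mirrors the limited enlargement of the cloud class described above.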
4.6.2.2 Suppression of island classes within clouds
In the preprocessing step described in section 4.2.5 (page 61), all clouds in the scenes were
detected and a mask was built. Some clouds contain small holes. In order to avoid
uncertainty, a segment object rule was set to classify all areas within clouds as cloud. Thus,
this error was eliminated. Figure 53 is an illustration of the effect of this rule.
Figure 53: An example of the suppression of holes within the clouds is presented.
In Figure 53, these areas show the classification of forest, in green, and non-forest in blue-
grey affected by clouds in white. On the left the classification product is presented and island
suppression within clouds is illustrated on the right.
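On a raster, the suppression of islands within clouds is equivalent to filling holes in the cloud mask. The sketch below does this with a flood fill from the image border: every non-cloud cell that cannot be reached from the border without crossing cloud is enclosed and is relabelled as cloud. This is an illustration on pixels with 4-connectivity, both of which are assumptions; the rule in this study operated on segments, and the function name `fill_cloud_holes` is hypothetical.

```python
from collections import deque


def fill_cloud_holes(grid, cloud="cloud"):
    """Relabel all cells completely enclosed by cloud cells as cloud."""
    rows, cols = len(grid), len(grid[0])
    reached = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Start the flood fill from every non-cloud cell on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r][c] != cloud:
                reached[r][c] = True
                queue.append((r, c))

    # Spread through non-cloud cells using 4-connectivity.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and not reached[rr][cc] and grid[rr][cc] != cloud):
                reached[rr][cc] = True
                queue.append((rr, cc))

    # Cells that were never reached are cloud or enclosed by cloud.
    return [
        [cell if reached[r][c] else cloud for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]


F, C = "forest", "cloud"
scene = [
    [F, F, F, F, F],
    [F, C, C, C, F],
    [F, C, F, C, F],
    [F, C, C, C, F],
    [F, F, F, F, F],
]
filled = fill_cloud_holes(scene)
```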
After the suppression of islands in clouds, only the segments of the forest areas were retained.
4.7 Statistical approaches
The classification of the scenes was made under the machine learning approach. Under this
approach all the reference data is divided randomly into training samples and verification
samples. This division is called a fold. Then, a new fold is created by selecting randomly again.
This process is called cross-validation; it was performed with 10 folds and repeated 4
times.
4.7.1 Definition of parameters for grid search
The technique for grid search and preprocessing suggested by Duro et al. (2012) and Kuhn et
al. (2012) was adapted to the area. At least 400 NFI plots were randomly selected per scene.
In the first step the feature selection was applied. Then leave-group-out cross-validation
(LGOCV) was configured as the validation method, with 10 folds and 4 replications. Although
some studies report leave-one-out cross-validation (LOOCV), that method is time consuming
and yields accuracies similar to those of LGOCV.
With 10 folds, the machine learning algorithm applies a resampling process that forms
10 groups of NFI (BWI) plots, with 1/10 of the plots in every group. In the first round, the
first group of plots is used as verification samples and the other groups are left as training
samples for the classification. The process is then repeated for the second group of plots,
using the rest as training samples, and so on, 10 times in total. In this way, the
LGOCV algorithm ensures that all of the plots are used as training samples and as validation
samples at least once in every repetition. Under this definition of machine learning, the
algorithms for support vector machines (SVM), random forests (RF), multinomial logistic
regression (MNL), k-nearest neighbor (kNN) and conditional inference trees (condInfTree)
were used for classification and regression.
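The resampling described in this section, in which each of 10 groups of plots serves once as verification set and the whole procedure is repeated, corresponds to what is usually called repeated k-fold cross-validation. The following sketch generates such splits for plot indices. It is an illustration in Python and not the caret code used in this study; the function name `repeated_kfold` and the fixed random seed are assumptions.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=4, seed=0):
    """Yield (repeat, fold, training_indices, validation_indices).

    In every repeat the sample indices are shuffled and divided into
    n_folds groups; each group is used exactly once for validation.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k, validation in enumerate(folds):
            training = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, training, validation


# 400 plots per scene, 10 folds, 4 repetitions: 40 train/validation splits.
splits = list(repeated_kfold(400, n_folds=10, n_repeats=4))
```

With 400 plots, every split trains on 360 plots and validates on the remaining 40, and within one repetition every plot is used for validation exactly once.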
4.7.2 Machine learning classification algorithms
In the first steps, the boundaries of the detected forest were used to clip the scenes. A feature
selection was applied to the set of features available from the segmentation process. The
machine learning algorithms were then implemented in the R language for statistics.
Following the vignettes of the caret package, the code was written to apply the five
algorithms tested. All the algorithms were set to use the leave-group-out cross-validation
approach with 10 folds and 5 replications. The training percentage of samples for leave-group-
out cross-validation was 2/3 of the total number of NFI plots in every scene. All the values
were centered and scaled from -1 to 1. All the resampled summary metrics were saved.
The specific parameters for every algorithm were set as follows:
- For the decision trees:
  o The method used in R was ctree2.
  o The maxdepth parameter was set from 3 to 11.
- For random forests:
  o The method used in R was cforest.
  o The tuning parameter mtry was set from 1 to the number of descriptors.
- For the multinomial logistic regression:
  o The method used in R was multinom.
  o The decay parameter was set from 1 to 2^-5.
- For the support vector machines:
  o The method used in R was svmRadial.
  o The parameters used for the grid search were the combinations of
    c with the values: 1.25, 2.50, 5, 10, 20, 40, 80, 160
    sigma with the values: 0.125, 0.250, 0.500, 1, 2, 4, 8, 16
- For the kNN method:
  o The method used in R was knn.
  o The k parameter was evaluated at the values 3, 5, 7, 9, 10 and 11.
  o The fuzzy classifier was set to a threshold that ensures the selection of only those
    classes assigned in the sampling process.
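The SVM grid listed above is a full combination of eight cost values and eight sigma values, each series doubling from one value to the next. A short sketch of how such a grid can be generated is given below; the variable names are hypothetical and the original grid was defined in R for caret.

```python
from itertools import product

# Both series double at every step: 1.25 ... 160 and 0.125 ... 16.
cost_values = [1.25 * 2 ** k for k in range(8)]
sigma_values = [0.125 * 2 ** k for k in range(8)]

# Every combination of cost and sigma is one candidate model.
svm_grid = [{"C": c, "sigma": s} for c, s in product(cost_values, sigma_values)]

# Candidate values of k for the kNN classifier, as listed above.
knn_grid = [{"k": k} for k in (3, 5, 7, 9, 10, 11)]
```

The SVM grid therefore contains 64 candidate models, each of which is evaluated on every resampling split.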
Using the features selected, the kNN classification was applied and resulted in a map of forest
types in the area selected. Contingency tables were created to evaluate the classification
results against the clusters derived from NFI data.
5 Results
The results of the classifications are presented separately for each image analysis type. The
flowchart in Figure 54 shows the steps of the evaluation of the results.
Figure 54: Flowchart presenting the evaluation of the results in the methodology.
5.1 The digital layer of forest types of Bavaria
A digital layer in shapefile format was produced comprising the forest types of the state of
Bavaria in Germany. A graphical view of this layer is presented in Figure 55.
Figure 55: Graphical representation of the map layer produced.
5.2 Variable selection process
5.2.1 Criterion based
The criterion based variable selection was made in every scene. Each scene was considered a
sector for the analysis, and the order of the sectors is presented in Figure 56.
Figure 56: Sectors for the statistical evaluation.
The final selection of variables is presented in the next table:
Table 17. Results of the variable selection using the Bayesian Information Criterion (BIC).
Sector Variables selected
1 B1 + B2 + B4 + B7 + NDVI + TC1 + TXT1 + TXT2 + TXT7
2 B2 + B4 + B5 + B7 + NDVI + TXT1 + TXT3
3 B1 + B2 + B3 + B4 + B7 + NDVI + TC3 + TXT1 + TXT4
4 B1 + B2 + B3 + B4 + B7 + NDVI + TC1 + TXT2 + TXT4
5 B1 + B2 + B3 + B4 + B7 + NDVI + TC1 + TXT1 + TXT2 + TXT5 + TXT7
6 B1 + B2 + B3 + B4 + B5 + B7 + NDVI + TXT2 + TXT4
7 B1 + B2 + B3 + B4 + B5 + B7 + NDVI + TXT2 + TXT4
For every scene, a minimum distance classification with a threshold value of 4 times the
standard deviation of basal area and a maximum likelihood classification were carried out.
For the set of variables selected in every scene, a classification based on the SVM algorithm
was applied. The resulting overall accuracies were below 30 %, and this procedure was rejected.
5.2.2 Feature space optimization
K-nearest neighbour (kNN) classification was applied using the variables recommended by
the feature space optimisation (FSO) process; sets of 4 and 5 variables were selected.
Table 18 presents the results of the FSO.
Table 18: Results from the FSO. Features selected for the classification with kNN.
Features Features selected
4 Bands 1, 3, 4, and 5
5 Bands 4 and 7, standard deviation of band 4 and maximum difference
With the five variables selected, the kNN classification was applied and resulted in a map of
forest types in the area selected. The evaluation by contingency matrix gave an overall
accuracy of 50.35 % and a kappa of 0.3589. This result was considered unacceptable.
5.2.3 Recursive feature selection
The results of the variable selection are presented in Table 19.
Table 19: Some of the results of the recursive feature selection.
Path   Row   Variables   Accuracy   Kappa    Predictors
192    026   12          0.7410     0.6116   NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 + SD4
192    027   11          0.8397     0.7595   B5 + B6 + B7 + NDVI + B4 + B1 + SD6 + SD4 + SD1 + SD7 + SD2
193    025   1           0.9022     0.8538   B2
193    026   12          0.7539     0.6308   NDVI + B5 + B4 + B7 + B2 + B6 + B1 + B3 + SD5 + SD7 + SD1 + SD4
193    027   9           0.7902     0.6852   B5 + B7 + NDVI + B4 + B2 + SD1 + B3 + B1 + B6
194    025   12          0.7576     0.6364   NDVI + B4 + B7 + B5 + SD5 + B2 + B3 + SD7 + B1 + B6 + SD4 + SD2
The predictors selected are presented in order of relative importance. The accuracy is
presented as a fraction from 0 to 1. All the classifications and evaluations were processed
with these variables selected by recursive feature selection.
5.3 Evaluation of PBIA
The classification of forest and non-forest areas showed very high accuracy when aerial
photographs were used as the reference data. However, a closer look at some areas revealed
that many places were misclassified.
5.3.1 Detection of forest areas
The contingency tables produced for the classification of PBIA are presented next. A total of
80 points were evaluated. Another selection of reference points was carried out to reinforce
the evaluation result. (Table 20)
Table 20: Results of the evaluation of the classification for forest and non-forest areas for the image in path 192,
row 026.
                    Reference
Classification      Forest   Water   Other   Total   User Accuracy
Forest              41       -       -       41      100
Water               4        33      1       38      86.8
Other               -        -       1       1       100
Total               45       33      2       80
Producer Accuracy   91.1     100     50              OA 93.75
The contingency table shows a comparison of the classes forest, water, and other, with the
total number of points and the accuracies. The acronyms are: UA for user's accuracy, PA for
producer's accuracy and OA for overall accuracy. A visual evaluation of the classification was
performed but was dismissed, because the reference was the satellite image itself and no real
verification would be provided. The evaluation should follow a standard statistical approach;
the NFI was used to evaluate the forest type map.
The goal was to have 800 reference points for the evaluation of the forest/non-forest
classification. The procedure used only ortho-photos and the evaluation was made only with
114 points. (Table 21)
Table 21: Results of the evaluation of the accuracy for forest and non forest areas with all of the 6 images.
                    Reference
Classification      Forest   Water   Other   Total   User Accuracy
Forest              65       -       1       66      98.484
Water               -        1       -       1       100
Other               5        -       42      47      89.364
Total               70       1       43      114
Producer Accuracy   92.86    100     97.67           OA 94.74
With this result, the classification of forest and non-forest areas was finished.
The visual evaluation of the forest areas is presented in the following figure:
Figure 57: The first classification using the original bands.
In Figure 57, the colours represent: green, forest; grey and light blue, other land. The yellow
lines are forest from the ATKIS layer. A simple analysis was made in order to verify the
forest and non-forest areas, and errors were found that were not quantified in the contingency
table. Figure 58 contains a series of illustrations outlining various results. From these, the
remaining problems become clear:
- There is a generalisation of forest land, whereby non-forest islands within forest are
erroneously classified as forest.
- An area classified as agricultural land is in fact forest.
- Misclassifications occur where the reflection of non-forest land is very similar to that
of forest areas.
- Water is often classified as forest.
- On the other hand, areas classified as forest in the ATKIS forest layer are shown as
non-forest, but forest or wooded areas are clearly present.
Illustration 1: ATKIS borders (left) and area classified as forest (right). The smaller forest areas delimited by
the classification at right are confusions with crop lands and are not present in ATKIS at left.
Illustration 2: ATKIS borders (left) and area classified as forest (right). Small crop areas in the right image were
erroneously classified as forest.
Illustration 3: ATKIS borders (left) and area classified as forest (right). The blue areas in the central upper part of
the scene were erroneously classified as forest in the right image.
Illustration 4: ATKIS borders (left) and area classified as forest (right). Areas with small polygons at the centre of
the image were erroneously classified as forest. They are only present in the classified image at right.
Figure 58: Image sections (in 4 illustrations) distributed over Bavaria visualising the results of the classification.
The class boundaries are shown only as white lines in order to provide a better estimation of
the results. To the left are the limits of the ATKIS forest layers and to the right the
classification.
5.3.2 Detection of forest types for the scene in path 192 row 026.
The classification of forest and non-forest areas was evaluated by comparing the inventory
NFI points and randomly generated points outside of ATKIS polygons. The results for both
Maximum Likelihood (MxL) and Minimum Distance (MD) methods are presented in the next
two tables.
Table 22. Contingency matrix comparing the inventory data with the result of the MxL classification method.
                  Inventory
MxL               Forest   NonForest   Total   User %
Forest               142           3     145       98
Non Forest             8         147     155       95
Total                150         150     300
Producer %            95          98   96.33
Overall Accuracy 96.33 kappa 0.93
Table 23. Contingency matrix comparing the inventory data with the result of the MD classification method.
                  Inventory
MD                Forest   NonForest   Total   User %
Forest               116           2     118       98
Non Forest            35         148     183       81
Total                151         150     301
Producer %            77          99      88
Overall Accuracy 87.71 kappa 0.75
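The total accuracy for the MxL method was about 8 percentage points higher than that of the MD
method. The user accuracy (User %) for forest and the producer accuracy (Producer %) for
non-forest were similar for both methods, whereas MD showed a lower user accuracy for non-forest
(81 % against 95 %) and a lower producer accuracy for forest (77 % against 95 %). A general
illustration of the results of PBIA classification is presented in Figure 59.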
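The measures reported in the contingency matrices can be recomputed from the cell counts. The following Python sketch is an illustration only and is not the software used in this study; the function name `accuracy_report` and the layout convention (rows = classification, columns = inventory) are assumptions made for the example. It uses the counts of the MD matrix (Table 23).

```python
def accuracy_report(matrix):
    """Accuracy measures for a square contingency matrix.

    Assumed layout: matrix[i][j] is the number of reference points
    classified as class i (rows) whose inventory class is j (columns).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall": overall,
        "kappa": kappa,
        "user": [d / r for d, r in zip(diagonal, row_totals)],
        "producer": [d / c for d, c in zip(diagonal, col_totals)],
    }


# Counts of the MD classification: rows Forest / Non Forest (classified),
# columns Forest / NonForest (inventory).
md = [[116, 2], [35, 148]]
report = accuracy_report(md)
print(round(100 * report["overall"], 2), round(report["kappa"], 2))
```

For these counts the sketch gives an overall accuracy of 87.71 % and a kappa of 0.75, in agreement with the values of the MD matrix.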
Figure 59: General view of the results of the PBIA classification.
5.3.3 Evaluation of the pixel oriented classification of forest types for all of the scenes
All NFI plots were used in the evaluation of the classification. The evaluation of the
classification based on pixel analysis provided the results presented in Table 24.
Table 24: Results of the accuracy assessment for the classification per LANDSAT TM image.
                          Complete
                      MxL              MD
Scenes id      n      OA      k       OA      k
192 026      694   27.52   0.09    22.33   0.08
192 027      469   38.38   0.09    47.33   0.17
193 025      147   21.09   0.04    29.93  -0.03
193 026     1047   21.20   0.07    13.75   0.04
193 027      909   26.95   0.05    57.65   0.18
194 025     1310   30.69   0.13    62.37   0.11
id  Image identification by path and row (ppp rrr) from LANDSAT standard worldwide reference system 2.
n   Number of reference points in forest used for the evaluation.
OA  Overall accuracy in percent.
k   kappa coefficient.
The accuracy of the classification of forest areas was deemed low. As the ATKIS data
contain the official boundaries of forest for Germany, it was decided to continue with the
classification of forest types using ATKIS as a mask to extract the values of the selected
features.
After analysis of the reference data, the ATKIS forest layer was therefore used to restrict the
further classification to forest only. The ATKIS areas were used because their boundaries are
considered ‘safe’ forest. Even where these boundaries do not correspond with the actual images,
they correspond to the official specification of forest.
5.4 Evaluation of OBIA
The evaluation was made based on the values of overall accuracy and kappa found for every
classifier.
5.4.1 Subset of a scene
A test of the classification made using OBIA was conducted for the subset of the scene in path
192 row 026. The objective was to evaluate whether this classification of segment objects
produced a higher level of accuracy than the pixel-based approach, as reported in the
literature.
From every classification algorithm a report of the procedure was obtained. The most
important information from those reports is presented here.
SVM
In the case of the SVM, the accuracy estimated by the e1071 library was recorded. The number
of support vectors used was 23 and the estimated accuracy was 91.4 % in the automatic
evaluation process. This level of accuracy relates to a 10-fold cross-validation process based
on bootstrapping of the training samples.
kNN
The training with kNN exhibited a misclassification rate of 52.75 %. The resulting accuracy was considered too low.
MNL
The MNL regression reported residuals for the classification. The residual deviance was
0.0001253361 and the AIC was 68. The pixel-based MxL classification provided better
results than the OBIA classification method.
Once the results of the classifications in a subset of the scenes were available, the
classification of the complete scenes could be made. Results of this classification are
presented in the following section.
Contingency matrices were made to evaluate the classification of forest types. The result per
classification algorithm is shown next.
5.5 OBIA of the forest types for the whole scene
A new classification was carried out within the forest. The forest boundary was the one detected
in section 4.4.1 (page 70). This classification is based on pixel analysis in one approach
and on OBIA tools in the other approach.
5.5.1 Classification of the forest types for the whole scene in path 192, row 026
The output of the PBIA Maximum Likelihood (MxL) classification of forest and non-forest
areas was used as a mask for the complete image. Within the masked forest area, two
discrimination methods were applied to separate the forest classes. MxL and kNN were used
because they provided the best results in the previous classification. The User and Producer
accuracies of each forest type classification are presented in Table 25 and Table 26.
Table 25: Contingency table showing the results of the classification using MxL compared with the inventory data.
                  Inventory
MxL               Broad   Conifer   Total   User %
Broad                 4        16      20    20.00
Conifer              14       128     142    90.14
Total                18       144     162
Producer %        22.22     88.89
Overall Accuracy 81.48 kappa 0.11
Table 26: Contingency table showing the results of the classification using kNN compared with the inventory data.
                  Inventory
kNN               Broad   Conifer   Total   User %
Broad                11         5      16    68.75
Conifer               7       139     146    95.21
Total                18       144     162
Producer %        61.11     96.53
Overall Accuracy 92.59 kappa 0.61
The number of evaluation samples per forest class was not identical. Some difference in the
number of evaluation samples per class is acceptable, but the numbers should be similar. In
this case the broadleaved forest amounts to only about 10 % of the area of conifer forest. With
such unbalanced data the more frequent class dominates the overall accuracy, which then gives
a misleading picture of the classification. There are two options available to solve this
problem: either reduce the more frequent class or increase the less frequent class. The first
option was selected because the number of pure broadleaved forest plots, the less frequent
class, was limited in most regions. A random sample of the more frequent class, of a size
similar to the less frequent class, was selected.
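The reduction of the more frequent class can be sketched as a random undersampling. The Python example below is only an illustration under stated assumptions: the function `undersample`, the fixed random seed and the tuple representation of the plots are hypothetical and do not reproduce the procedure used in this study. The class sizes (18 broadleaved, 144 conifer) are taken from the totals of Table 25.

```python
import random


def undersample(samples, label_of, seed=0):
    """Randomly reduce every class to the size of the least frequent class."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    by_class = {}
    for sample in samples:
        by_class.setdefault(label_of(sample), []).append(sample)

    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sampling without replacement keeps each plot at most once.
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical plot records: (forest type, plot number).
plots = [("broad", i) for i in range(18)] + [("conifer", i) for i in range(144)]
balanced = undersample(plots, label_of=lambda plot: plot[0])

counts = {}
for label, _ in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

After the reduction both classes contain the same number of evaluation samples, so the overall accuracy is no longer dominated by the conifer class.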
The algorithm for the segmentation of complete LANDSAT satellite images functions well
but requires large computer processing capabilities. In order to make it usable on smaller
systems, the FastWay segmentation approach was developed. The core aspect of this algorithm
is that, after the classification, a feature space optimisation process provides a list of the
layers that are really needed, and the kNN classification approach can be applied by class, or
by pairs of classes, with a nested interactive procedure. The most important classes are
classified first, with the remaining classes classified afterwards in order to illustrate the
surrounding areas not centrally important to the evaluation process. The product can be
exported as raster rather than vector polygons, which reduces the time required to export the
product from 50 minutes to 3 minutes. Using this method, a complete process consisting of
importing, segmenting and classifying a single LANDSAT image takes about two hours.
An adaptation of the method presented by Trimble (2011), a segmentation method using
training samples from ESRI shapefiles, was implemented. The training samples were stored as
a level in the project for segmentation and classification. Each training sample was assigned
by reading one attribute from a shapefile into the kNN classifier. This approach accelerates the
identification of the most suitable training samples for training the classifier using
Trimble’s eCognition modules. Some visually homogeneous training samples may contain
several polygons produced by the segmentation, due to differences in either spectral response
or texture. At this point the operator can choose which segments best represent the class to
train.
The FastWay method for segmentation and classification of forests was applied in the
discrimination of forest types from LANDSAT TM images. The kNN classifier was trained
using samples selected visually and applied to the whole image. The evaluation was made
using selected plots from the German NFI.
Another approach is to apply the set of machine learning algorithms for classification and
regression to every scene. That approach is described in the next section.
5.5.2 Classification of 6 scenes
The results of the tuning process, with the accuracies obtained, are presented in Figure 60.
Figure 60: Results of the tuning process with the accuracies obtained.
The accuracies found when tuning the parameters of the Decision Trees (DT), multinomial
logistic regression (MNL), k Nearest Neighbour (kNN) and Random Forest (RF) classifiers are
presented. The objective is to find the value of the tuning parameter (x axis) that reaches the
highest accuracy (y axis) in every case. A similar calculation was made for every scene.
The accuracy was below 70 % for each classification method presented in Figure 60. When the
Support Vector Machines (SVM) algorithm was tuned, the accuracy also remained very low
(Figure 61).
Figure 61: Levels of accuracy found in the tuning process applied with the SVM algorithm.
The kNN provided good results for one image, but was not successful overall. The threshold
value found for the fuzzy classifier was 0.60: each segment was assigned to a class only
when a fuzzy membership value of 60 % certainty was reached. This value should be set on a
per-image basis in order to ensure the correct classification of the segments. SVM
classification was implemented within forest areas. An SVM classifier with a radial kernel
function, tuned for the C (cost) and gamma (which some authors refer to as sigma) parameters,
was applied to the training samples and, as an example, the result for the image with path 193
and row 025 is reported. Using 10-fold cross-validation on the training data set, the total
accuracy was 100 % in the training phase, and the accuracy of each individual fold was also
100 %. This indicates that the classifier fitted the selected training samples very closely. A
similar output was found for the remaining images classified with SVM.
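The labelling rule with a fuzzy threshold can be sketched as follows. This Python example is only an illustration: the function `label_segment`, the dictionary of membership values and the class names are hypothetical and do not represent the interface of the software used in this study.

```python
def label_segment(memberships, threshold=0.60):
    """Return the class with the highest membership value, or None when
    that value does not reach the threshold (segment left unclassified)."""
    best_class = max(memberships, key=memberships.get)
    if memberships[best_class] >= threshold:
        return best_class
    return None


# Hypothetical membership values of two segments.
print(label_segment({"broadleaved": 0.72, "conifer": 0.20}))
print(label_segment({"broadleaved": 0.55, "conifer": 0.45}))
```

The first segment is labelled as broadleaved, while the second remains unclassified because no class reaches 60 % certainty. Setting `threshold` per image corresponds to the adjustment suggested in the text.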
As described in the section 4.7.2 (page 85), the classification algorithms were applied under a
machine learning approach. The result of this process is presented in the following section.
5.5.3 Results of the application of the machine learning classification algorithms
Every ML algorithm was applied 50 times to the data. The output is an object with the
specifications of the results for every step. A summary of one result for SVM in every scene
is presented in Appendix 1.
In the case of DT, an illustration of a decision tree is presented in Figure 62.
Figure 62: Example of one decision tree for the image in path 192, row 026.
The listing of one execution of DT follows. All the segments were classified with the trees
produced. After the validation, contingency tables were created for the DT algorithm.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 32 4 2
2 25 38 12
3 11 14 16
Overall Statistics
Accuracy : 0.5584
95% CI : (0.4763, 0.6383)
No Information Rate : 0.4416
P-Value [Acc > NIR] : 0.002339
Kappa : 0.3331
Mcnemar's Test P-Value : 7.933e-05
The result for RF is presented below:
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 47 8 4
2 16 41 5
3 5 7 21
Overall Statistics
Accuracy : 0.7078
95% CI : (0.6292, 0.7782)
No Information Rate : 0.4416
P-Value [Acc > NIR] : 2.173e-11
Kappa : 0.5453
Mcnemar's Test P-Value : 0.3748
The result for MNL is shown in the following listing.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 41 8 4
2 14 29 9
3 13 19 17
Overall Statistics
Accuracy : 0.5649
95% CI : (0.4828, 0.6445)
No Information Rate : 0.4416
P-Value [Acc > NIR] : 0.001395
Kappa : 0.3441
Mcnemar's Test P-Value : 0.018801
The result for the kNN algorithm is shown in the next listing.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 44 7 2
2 13 36 7
3 11 13 21
Overall Statistics
Accuracy : 0.6558
95% CI : (0.5751, 0.7304)
No Information Rate : 0.4416
P-Value [Acc > NIR] : 6.959e-08
Kappa : 0.4777
Mcnemar's Test P-Value : 0.02006
The result for the SVM algorithm is shown in the next listing.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 43 8 1
2 19 39 7
3 6 9 22
Overall Statistics
Accuracy : 0.6753
95% CI : (0.5953, 0.7485)
No Information Rate : 0.4416
P-Value [Acc > NIR] : 4.193e-09
Kappa : 0.501
Mcnemar's Test P-Value : 0.04015
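The entries "No Information Rate" and "P-Value [Acc > NIR]" of the listings above can be recomputed from the confusion matrix. The Python sketch below is an illustration only; it assumes that the no information rate is the proportion of the largest reference class and that the p-value comes from a one-sided exact binomial test of the accuracy against that rate. The function name `no_information_test` is hypothetical. The counts are those of the DT listing.

```python
from math import comb


def no_information_test(matrix):
    """Accuracy, no information rate (NIR) and one-sided exact binomial
    p-value for H0: accuracy <= NIR.

    Assumed layout: rows = prediction, columns = reference.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    reference_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    # Accuracy reached by always predicting the largest reference class.
    nir = max(reference_totals) / n
    # Probability of at least `correct` successes in n trials with rate NIR.
    p_value = sum(
        comb(n, x) * nir ** x * (1 - nir) ** (n - x)
        for x in range(correct, n + 1)
    )
    return correct / n, nir, p_value


dt = [[32, 4, 2], [25, 38, 12], [11, 14, 16]]
accuracy, nir, p_value = no_information_test(dt)
print(round(accuracy, 4), round(nir, 4), p_value)
```

For the DT matrix the sketch gives an accuracy of 0.5584 and a no information rate of 0.4416, the same values as in the listing; the p-value is of the same order as the one reported there.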
The grid search used to tune the SVM generated further statistical results, which were
analyzed and evaluated in order to find the combination of C and sigma giving the highest
accuracy. The results for the scene in path 192, row 026 are presented in Appendix 2
(page 142).
All the confusion matrices presented relate to the last run of the machine learning
method. A more robust evaluation can be made by considering all the executions of the
algorithm in order to gain an idea of the errors. The classifiers can be compared using all the
confusion matrices produced. The values of overall accuracy were plotted so that the
distribution of the errors can be evaluated. As an example, the overall accuracy (OA) values
obtained are presented in the next box plot diagram.
Figure 63: Diagram with the output of the ML algorithms applied.
In Figure 63 the mean accuracy and its variation across the 4 repetitions with 10-fold
LGOCV are shown. The accuracy is presented from 0 to 1 on the x axis. The ML algorithms
presented are svm (Support Vector Machines), rf (Random Forests), mnl (Multinomial
Logistic Regression), kNN (k Nearest Neighbour) and CondInfTree (Conditional Inference
Trees). For illustration purposes, the output with the results of the SVM algorithm is
presented next.
Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 + SD4
792 samples
12 predictors
3 classes: '1', '2', '3'
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Summary of sample sizes: 525, 525, 525, 525, 525, 525, ...
Resampling results across tuning parameters:
C sigma Accuracy Kappa Accuracy SD Kappa SD
1.25 2 0.685 0.528 0.0335 0.0502
1.25 3.5 0.704 0.557 0.0316 0.0473
1.25 5 0.709 0.563 0.0312 0.0469
1.25 6.5 0.707 0.56 0.0303 0.0454
1.25 8 0.707 0.561 0.0242 0.0363
2.19 2 0.7 0.549 0.0368 0.0552
2.19 3.5 0.71 0.565 0.0277 0.0416
2.19 5 0.709 0.563 0.0288 0.0433
2.19 6.5 0.704 0.556 0.0294 0.0441
2.19 8 0.707 0.561 0.0257 0.0385
3.12 2 0.7 0.549 0.0339 0.0508
3.12 3.5 0.712 0.569 0.0258 0.0386
3.12 5 0.708 0.562 0.0277 0.0416
3.12 6.5 0.704 0.557 0.0297 0.0446
3.12 8 0.707 0.561 0.0261 0.0391
4.06 2 0.7 0.55 0.0318 0.0477
4.06 3.5 0.712 0.567 0.0256 0.0384
4.06 5 0.708 0.562 0.0277 0.0416
4.06 6.5 0.704 0.557 0.0297 0.0446
4.06 8 0.707 0.561 0.0261 0.0391
5 2 0.704 0.556 0.0321 0.0481
5 3.5 0.711 0.566 0.0233 0.0349
5 5 0.709 0.563 0.0273 0.0409
5 6.5 0.704 0.557 0.0297 0.0446
5 8 0.707 0.561 0.0261 0.0391
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 3.12 and sigma = 3.5.
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 3.125
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 3.5
Number of Support Vectors : 529
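The resampling scheme named in the listing above (repeated train/test splits with about 66 % of the samples used for training) can be sketched in Python as follows. This is an illustration only and not the routine that produced the listing; the function `repeated_splits` and the fixed seed are hypothetical, and the sketch draws simple random splits without regard to class. The training size it yields (523) therefore differs slightly from the 525 in the listing, presumably because the original software splits within each class.

```python
import random


def repeated_splits(n_samples, train_fraction=0.66, repetitions=10, seed=0):
    """Yield (train, test) index lists for repeated random train/test splits."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    n_train = round(train_fraction * n_samples)
    indices = list(range(n_samples))
    for _ in range(repetitions):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first part trains the model, the remainder evaluates it.
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


splits = list(repeated_splits(792))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each classifier setting is fitted on the training part of every split and evaluated on the remaining part; the mean and standard deviation of the accuracies over the splits correspond to the columns of the listing.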
The final model selected for this scene was Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 +
SD7 + SD3 + SD6 + SD2 + SD4. Some of the predictors were rejected; the accepted ones are
presented in order of importance for every model. For this scene 792 samples were used in
total. Centering and scaling were used to standardize the input data. The grid search method
was applied in two passes to find the combination of the cost (C) and sigma parameters
(described in section 2.1.5.4, page 18) that gives the highest accuracy value. The main
problem is that both parameters, C and sigma, are unknown and have to be found for every
scene. When, for a given scene, the best combination of C and sigma found in the first pass lay
on the boundary of the search grid, the boundaries were readjusted and the search was
repeated. Every result is the best combination found by this process. The results show both
OA and kappa values and their standard deviations over the replications of the process.
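The grid search with readjustment of the boundaries can be sketched as follows. The Python example is only an illustration under stated assumptions: the function `grid_search`, the rule of doubling or halving the outermost grid value, and the artificial score function `toy_score` are hypothetical and stand in for the resampled accuracy or kappa of the fitted SVM; they are not the procedure used in this study.

```python
import math


def grid_search(score, c_values, sigma_values, max_rounds=5):
    """Return (best score, C, sigma); widen the grid while the best
    combination lies on its boundary."""
    c_values = sorted(c_values)
    sigma_values = sorted(sigma_values)
    best = None
    for _ in range(max_rounds):
        best = max((score(c, s), c, s) for c in c_values for s in sigma_values)
        _, best_c, best_sigma = best

        on_boundary = False
        # Assumed readjustment rule: extend the grid by one value beyond
        # the boundary on which the best combination was found.
        if best_c == c_values[-1]:
            c_values.append(2 * c_values[-1])
            on_boundary = True
        elif best_c == c_values[0]:
            c_values.insert(0, c_values[0] / 2)
            on_boundary = True
        if best_sigma == sigma_values[-1]:
            sigma_values.append(2 * sigma_values[-1])
            on_boundary = True
        elif best_sigma == sigma_values[0]:
            sigma_values.insert(0, sigma_values[0] / 2)
            on_boundary = True

        if not on_boundary:
            break
    return best


def toy_score(c, sigma):
    """Artificial score with its maximum at C = 20, sigma = 4."""
    return -(abs(math.log(c / 20.0)) + abs(math.log(sigma / 4.0)))


best = grid_search(toy_score, [1.25, 2.5, 5.0], [2.0, 4.0, 8.0])
print(best[1], best[2])
```

In this artificial case the first pass finds its best value at the upper boundary C = 5, so the grid is extended until the optimum at C = 20 lies inside the grid.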
The classification was set to the Radial Basis kernel function, which is the most appropriate
for nominal output.
The next tables show the results of the tuning process for all of the scenes. The figures were
rounded to 3 decimals.
Scene 192 026
Final Model Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 + SD4
Table 27: Accuracies obtained for the scene in path 192 row 026 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.19 3.12 4.06 5.00
2.00 0.685 0.700 0.700 0.700 0.704
3.50 0.704 0.710 0.712 0.712 0.711
5.00 0.709 0.709 0.708 0.708 0.709
6.50 0.707 0.704 0.704 0.704 0.704
8.00 0.707 0.707 0.707 0.707 0.707
This table shows the trend of the accuracy by changing the sigma and C values for the SVM
algorithm. The same data is presented in Figure 64.
Figure 64: Trend of the accuracy by changing the sigma and C values for the SVM algorithm.
Similar tables of accuracy were obtained for every scene under consideration. The complete
list of tables for SVM is presented in Appendix 1 (page 139).
Another set of BWI plots was used for verification of the results. With this set of plots a new
contingency table was built. The results obtained are presented next:
Table 28: Contingency table for the scene in path 192, row 026 comparing the NFI plots with the groups obtained by
the SVM classification algorithm.
192026        Inventory Clusters
SVM             1       2       3   Total     UA%
1              93      14       1     108   86.11
2              14      55       0      69   79.71
3               7      14      31      52   59.62
Total         114      83      32     229
PA%         81.58   66.27   96.88
OA%         78.17
kappa: 0.6503
Similar results were found for the rest of the scenes. The best classifier, its corresponding tuning parameters and the accuracy evaluation are presented in Appendix 5.
5.5.4 Comparing classifiers
With the best classification found, the next logical question is how good the other
classification algorithms are. In order to answer that question, the differences in accuracy
among the algorithms were calculated and tested, with the Bonferroni adjustment applied for
multiple tests.
The accuracy per method is presented in the next table:
Table 29: Accuracies found applying the ML algorithms for classification with 4 replications and 10-fold LGOCV for
the scene in path 192, row 026. The minimum, first quartile, median, mean, third quartile and maximum accuracy are
shown. The figures are on a scale from 0 (worst accuracy) to 1 (best accuracy).
Accuracy Min. 1stQu. Median Mean 3rdQu. Max.
svm 0.6292 0.6891 0.7041 0.7056 0.7303 0.764
rf 0.5356 0.6217 0.6386 0.6353 0.6554 0.6891
kNN 0.5131 0.5609 0.573 0.5737 0.5843 0.6217
CondInfTree 0.4831 0.5693 0.5843 0.5841 0.6105 0.6442
mnl 0.5206 0.5393 0.5618 0.5612 0.5777 0.603
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 891000
The box plot diagram showing the mean accuracy of the classifiers was presented in
Figure 63, on page 106.
In the next table, the differences in accuracy among classifiers, on a scale from 0 (worst) to
1 (best), are presented above the diagonal, and the p-values from the Bonferroni-adjusted
test are presented below the diagonal. The p-value estimates the probability that a difference
as large as the one found could arise by chance.
Table 30: Differences in accuracy among classifiers applied to the scene in path 192, row 026. The accuracies and
their differences are presented on the scale from 0 to 1.
              svm         rf          kNN         CondInfTree   mnl
svm                       0.07034     0.13193     0.1215        0.14441
rf            < 2.2e-16               0.06159     0.05116       0.07407
kNN           < 2.2e-16   < 2.2e-16               -0.01042      0.01248
CondInfTree   < 2.2e-16   < 2.2e-16   < 2.2e-16                 0.02291
mnl           < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Similar tables were made for every scene and can be found in Appendix 4 (page 150).
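The pairwise differences and the Bonferroni adjustment can be sketched as follows. The Python example is an illustration only: it uses the mean accuracies of Table 29, the function `bonferroni` is a hypothetical helper, and the raw p-values passed to it are invented for the example, since the unadjusted p-values are not reported here. Differences computed from the rounded means agree with Table 30 only to about four decimals.

```python
from itertools import combinations

# Mean accuracy per classifier (Table 29).
mean_accuracy = {
    "svm": 0.7056,
    "rf": 0.6353,
    "kNN": 0.5737,
    "CondInfTree": 0.5841,
    "mnl": 0.5612,
}

# Difference of the means for every pair of classifiers (row minus column).
differences = {
    (a, b): round(mean_accuracy[a] - mean_accuracy[b], 4)
    for a, b in combinations(mean_accuracy, 2)
}


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(differences[("svm", "rf")], differences[("kNN", "CondInfTree")])
print(bonferroni([0.01, 0.04, 0.5]))  # invented p-values, for illustration
```

With five classifiers there are ten pairwise comparisons, so each raw p-value would be multiplied by ten under this adjustment.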
5.5.5 Assessment of forest type classification
The assessment was made considering only those plots that are more than 45 m away from
forest boundaries. Under this condition, possible misregistration errors should have no
influence on the analysis.
Only plots with NDVI values higher than 0.50 were used for the evaluation. This threshold
ensures that the density of trees at every evaluation location is high. The influence of soil
reflectance is minimized in this way.
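The two selection conditions can be sketched as a simple filter. The Python example is only an illustration: the record fields (`distance_to_boundary_m`, `red`, `nir`), the reflectance values and the function names are hypothetical. NDVI is computed in the usual way from the red and near-infrared bands (bands 3 and 4 of LANDSAT TM).

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and near-infrared values."""
    return (nir - red) / (nir + red)


def keep_plot(plot, min_distance_m=45.0, min_ndvi=0.50):
    """True when the plot is far enough from the forest boundary and
    its NDVI exceeds the threshold."""
    return (
        plot["distance_to_boundary_m"] > min_distance_m
        and ndvi(plot["red"], plot["nir"]) > min_ndvi
    )


# Hypothetical inventory plots with reflectance values.
plots = [
    {"id": 1, "distance_to_boundary_m": 120.0, "red": 0.04, "nir": 0.40},
    {"id": 2, "distance_to_boundary_m": 30.0, "red": 0.04, "nir": 0.40},
    {"id": 3, "distance_to_boundary_m": 200.0, "red": 0.10, "nir": 0.20},
]
selected = [plot["id"] for plot in plots if keep_plot(plot)]
print(selected)
```

Only the first plot is retained: the second lies too close to the forest boundary and the third has an NDVI of about 0.33.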
This assessment generated new contingency tables based on the one third of the samples that
had been reserved. The complete output of the evaluation for the image in path 192, row 026
can be found in Appendix 3. In the following tables the results of the tuning process per image
are presented.
Table 31: Results of the assessment of the classifications for the image in path 192, row 026.
Scene 192 026
Algorithm OA kappa BestParameters
MxL 0.275 0.090
MD 0.223 0.079
SVM 0.704 0.569 C = 3.12; sigma = 3.5
kNN 0.573 0.375 k = 3
mnl 0.562 0.249 decay = 1
RF 0.639 0.483 mtry = 10
DT 0.584 0.404 maxdepth = 10
Model: Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 + SD4
Table 32: Results of the assessment of the classifications for the image in path 192, row 027.
Scene 192 027
Algorithm OA kappa BestParameters
MxL 0.384 0.095
MD 0.473 0.171
SVM 0.799 0.738 C = 1.25; sigma = 1
kNN 0.738 0.630 k = 5
mnl 0.685 0.535 decay = 0.0312
RF 0.783 0.693 mtry = 10
DT 0.741 0.630 maxdepth = 10
Model: Y ~ B5 + B6 + B7 + NDVI + B4 + B1 + SD6 + SD4 + SD1 + SD7 + SD2
Table 33: Results of the assessment of the classifications for the image in path 193, row 025.
Scene 193 025
Algorithm OA kappa BestParameters
MxL 0.211 0.043
MD 0.299 -0.030
SVM 0.567 0.569 C = 160; sigma = 16
kNN 0.748 0.778 k = 3
mnl 0.431 0.121 decay = 0.125
RF 0.684 0.527 mtry = 1
DT 0.702 0.683 maxdepth = 11
Model: Y ~ B2
Table 34: Results of the assessment of the classifications for the image in path 193, row 026.
Scene 193 026
Algorithm OA kappa BestParameters
MxL 0.212 0.073
MD 0.138 0.035
SVM 0.712 0.598 C = 1.25; sigma = 2
kNN 0.602 0.456 k = 3
mnl 0.612 0.424 decay = 0.25
RF 0.672 0.521 mtry = 5
DT 0.597 0.409 maxdepth = 7
Model: Y ~ NDVI + B5 + B4 + B7 + B2 + B6 + B1 + B3 + SD5 + SD7 + SD1 + SD4
Table 35: Results of the assessment of the classifications for the image in path 193, row 027.
Scene 193 027
Algorithm OA kappa BestParameters
MxL 0.270 0.053
MD 0.577 0.182
SVM 0.765 0.671 C = 1.25; sigma = 1.75
kNN 0.727 0.596 k = 3
mnl 0.632 0.465 decay = 0.0312
RF 0.779 0.678 mtry = 3
DT 0.734 0.622 maxdepth = 5
Model: Y ~ B5 + B7 + NDVI + B4 + B2 + SD1 + B3 + B1 + B6
Table 36: Results of the assessment of the classifications for the image in path 194, row 025.
Scene 194 025
Algorithm OA kappa BestParameters
MxL 0.307 0.130
MD 0.624 0.108
SVM 0.706 0.559 C = 3.12; sigma = 3.5
kNN 0.574 0.499 k = 9
mnl 0.561 0.547 decay = 0.0312
RF 0.635 0.557 mtry = 9
DT 0.584 0.467 maxdepth = 7
Model: Y ~ NDVI + B4 + B7 + B5 + SD5 + B2 + B3 + SD7 + B1 + B6 + SD4 + SD2
6 Discussion
The developed method is carried out in several steps. Because some steps performed better
than others, each step is discussed in turn.
6.1 Definition of forest
The definition of forest was adapted by considering NDVI > 0.50. Carreiras et al. (2006)
fitted linear regressions between NDVI and tree canopy cover for oak using aerial photographs
and LANDSAT TM images, obtaining high correlation values when only spectral bands were
the predictors. The same authors showed that the pattern of canopy cover against NDVI is
close to linear at high NDVI values. Knyazikhin et al. (1998) developed an algorithm for the
estimation of the vegetation canopy leaf area index and the fraction of absorbed
photosynthetically active radiation that was based on 70 % canopy closure. Carlson and
Ripley (1997) reported that NDVI and the fraction of canopy cover close to 100 % were well
predicted using a simple radiative transfer model. They also found that preprocessing with
scaling eliminated the need for rigorous atmospheric corrections. Although the method
developed does not measure leaf area index (LAI) and there is no fixed relationship between
NDVI and canopy closure in forest (Gamon et al., 1995), according to the linear regression
published by Larsson (1993) for acacia woodlands, considering NDVI > 0.50 ensures a canopy
closure of at least 40 %. When the lower NDVI limit was fixed at 0.50 in the selection of
inventory plots for classification and evaluation, classification accuracies improved by
10-20 % and kappa values rose above 0.5.
6.2 Regarding the satellite scenes
The use of Landsat TM 4 or 5 images in the classification of forest types is advisable for
reasons of both economy, given the free access to the data, and the low level of data
processing required.
Satellite data are readily available today and in the future will increasingly be available free of
charge. These data may also be included in this analysis in future. Much information can be
collated from a visual evaluation of the data, such as forests, their variation, and distribution
of broadleaf and conifer dominated areas. If, as in this study, the information is derived from
an automatic classification, it should be noted that the quality of the result depends not only
on the quality of satellite data, but also on the quality of the terrain model and the available
reference points. Only when a high resolution terrain model is available and a sufficient
number of reference points is defined for all classes, evenly distributed over all of the satellite
images available, can a good result be achieved. In addition, different classifiers should be
tested for large-scale approaches because the results revealed that, using the same input data,
different classifiers provide significantly different results.
6.3 Preprocessing the scenes
The conversion of formats ensures the quality of the input data. The format conversion is
controlled internally by the software used, and the user is not able to know how accurate this
step is. The import tasks were therefore evaluated visually for each file processed. The
quality was acceptable in every step.
The DEM provided by the LFU was in ASCII format, with a null value of -32000 and
coordinates in the Gauss-Krüger system, zone 4. The DEM extended beyond the boundary of
the state of Bavaria, which was convenient for the orthorectification process. Some small
areas had an elevation value of -89. A reclassification option was used to mask out values
with an elevation lower than 0. This mask ensured that noisy values could not affect the
orthorectification.
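The masking of the DEM can be sketched as follows. The Python example is only an illustration and not the reclassification tool used in this study; the function `mask_dem`, the use of `None` for masked cells and the small elevation grid are hypothetical, while the null value -32000 and the elevation -89 are taken from the description above.

```python
NODATA = -32000  # null value found in the ASCII DEM


def mask_dem(grid, nodata=NODATA):
    """Replace the null value and every elevation below 0 by None."""
    return [
        [None if (value == nodata or value < 0) else value for value in row]
        for row in grid
    ]


# Hypothetical elevation grid in metres.
dem = [[512, 498, -89], [-32000, 0, 730]]
masked = mask_dem(dem)
print(masked)
```

Cells with the null value or with the noisy elevation of -89 are masked, while valid elevations, including 0, are kept.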
6.4 Orthorectification
In the orthorectification, the use of at least 20 well-distributed GCPs per scene ensured that
the images were correctly orthorectified and that the systematic and non-systematic errors
were reduced to within the diagonal of the original pixel of the LANDSAT scenes, which is 45
m. A test was made by overlaying the ATKIS forest boundaries on the orthorectified scenes;
the visual evaluation showed a very good match between the ATKIS boundaries and the forest
boundaries visible in some forest areas of the scenes.
After analyzing the scenes and the effects of the topography, mainly in the Alpine region, the
need for topographic normalization was reconsidered. Instead of applying the correction to
the whole scene, only some hilly areas were processed. The results are not visible to the eye,
but the effect of the normalization on the values is evident in a profile of the image in those
sectors. The effect of the normalization was a homogenization of areas with different
illumination conditions caused by the topographic position. This processing ensured the
quality of the data for input into the classification algorithms selected.
A few displacements were found following the orthorectification of the images. The
evaluation of the classification was carried out using data from inventory plots selected
randomly or on the basis of an NDVI value. Using an NDVI greater than 0.50, the selection of
NFI points produced good evaluation results.
6.5 Clouds and haze
The main difference among scenes is the amount of clouds and cloud shadows. Some scenes
have almost no clouds, but others have scattered clouds all over the scene, or over the forest.
The adaptation of the ACCA algorithm covered all of the dense clouds in the area. That mask
was also used to remove the NFI plots covered by clouds, avoiding false results in the
subsequent evaluations of the classifications.
Although clouds can be masked out and are not a problem in themselves, they are usually
surrounded by haze. That haze cannot always be detected, and its treatment is limited without
per-pixel atmospheric records for the day of the scene. This may explain the low agreement
indicated by the kappa coefficient in some of the results.
The identification of haze in the scenes was useful for masking out areas with problems in the
values recorded by LANDSAT TM; those areas were removed from both the classification and
the evaluation of the classification. Other, non-visible haze could be present and could be a
source of errors in the classification.
Eliminating islands inside clouds with the object oriented approach helped to remove more
noisy areas and ensured the quality of the data used for the classification.
All those preprocessing steps were carried out appropriately. Only the validation of the
reflectance values was not carried out. USGS personnel advised in a telephone interview that
the scenes did not need to be modified, because the validation is made in their processing
chain.
6.6 PBIA approach
In the PBIA processing of the scenes, the evaluation for a sector of the scene in
path 192, row 026 yielded a sufficiently high accuracy, in this case even higher than that
of the Machine Learning algorithms applied to the OBIA data.
For the classification of complete scenes with the pixel-oriented approach, the training
samples were selected from areas in the original LANDSAT TM scene. This ensured
homogeneous training samples for the ISOCLASS, Maximum Likelihood
and Minimum Distance algorithms. However, the validation samples were selected by the
operator, which introduced a bias in the results. This effect was confirmed by reviewing the
forest areas against the ATKIS forest layer.
6.7 OBIA approach
For the classification of complete scenes with the OBIA approach, the training samples
were first selected from the orthophotos provided by LFU. The results were not successful,
so two tests were made: a) selecting the training samples visually from the
segments, fitting the algorithms and applying a regression to the remaining objects of the
scene; and b) training the Machine Learning (ML) algorithms with
information from the NFI plots. The second option gave the best results. One reason
why the OBIA approach resulted in lower accuracies than the pixel-oriented approach is that
some NFI plots report trees with very low crown closure. In some cases, the
values obtained from the scenes suggest non-forest even though an inventory plot is
present. Those points were not removed from the NFI database, but a filter of NDVI > 0.50
was applied and the accuracy increased to the values reported.
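The NDVI filter can be illustrated with a short sketch. For LANDSAT TM, NDVI is computed from the red band (band 3) and the near-infrared band (band 4) as (NIR - red) / (NIR + red). The plot records and reflectance values below are hypothetical; only the 0.50 threshold comes from the text.

```python
def ndvi(red, nir):
    """NDVI from red and near-infrared reflectance; 0.0 if both are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def select_plots(plots, threshold=0.5):
    """Keep the plots whose NDVI is strictly greater than the threshold."""
    return [p for p in plots if ndvi(p["red"], p["nir"]) > threshold]


# Hypothetical reflectance values sampled at three inventory plots.
plots = [
    {"id": "A", "red": 0.04, "nir": 0.40},  # dense canopy
    {"id": "B", "red": 0.10, "nir": 0.20},  # sparse canopy
    {"id": "C", "red": 0.06, "nir": 0.15},  # sparse canopy
]

selected_ids = [p["id"] for p in select_plots(plots)]
print(selected_ids)
```

Only plot A, with an NDVI of about 0.82, passes the filter; plots B and C stay in the database but are not used for the evaluation.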
When the kNN algorithm was used for the classification, the overall accuracy ranged from a
minimum of 58 % to a maximum of 84 %. This result was obtained using unbalanced
evaluation samples.
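Why unbalanced evaluation samples matter can be shown with a small worked example. The confusion matrix below is hypothetical (rows are reference classes, columns are predicted classes), and balanced accuracy is used only as an illustration; it is not the weighting procedure applied in this study.

```python
def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def balanced_accuracy(matrix):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    return sum(recalls) / len(recalls)


# Hypothetical evaluation set: 90 samples of class 1, only 10 of class 2.
confusion = [
    [90, 0],
    [8, 2],
]

oa = overall_accuracy(confusion)
ba = balanced_accuracy(confusion)
print(round(oa, 2), round(ba, 2))
```

Here the overall accuracy is 0.92 although only 2 of the 10 samples of the minority class are correct; the balanced accuracy of 0.60 makes that visible.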
The evaluation of the classifications behaved differently from scene to scene. This suggests
that steps are missing in the preprocessing or that the model applied is not the most adequate
for the data used. However, the kappa value of the best algorithm for each image was
greater than 0.5. The lowest kappa value was 0.559, for the scene in path 194, row 025. That
image is not greatly affected by clouds and is situated at the northern boundary of the state.
That region, corresponding to the northernmost part of the Southwestern Highlands Scarps and
the southernmost part of the Western Highlands, hosts the largest proportion of broadleaved
forest, and the clustering of the NFI data showed the worst agreement between the
classification and its evaluation in these areas.
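For reference, the kappa coefficient compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). The following sketch computes both measures from a confusion matrix; the matrix values are hypothetical and do not come from any of the scenes.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if expected == 1.0:
        # Degenerate case: every sample falls in one class on both sides.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical two-class matrix: rows are reference, columns are predicted.
confusion = [
    [40, 10],
    [5, 45],
]

accuracy, kappa = accuracy_and_kappa(confusion)
print(round(accuracy, 2), round(kappa, 2))
```

In this example the overall accuracy is 0.85 and the chance agreement is 0.50, which gives a kappa of 0.70.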
The ISODATA algorithm performed relatively well in delineating forest / non-forest
areas, but it was not fully evaluated. Its classification accuracy was not measured once it
became clear that the detected forest boundaries did
not coincide with the ATKIS forest layer. From that point on, the ATKIS forest layer was used
as the forest boundary for the evaluation of the other algorithms.
The best result in accuracy and kappa was found with the OBIA approach. The scene located
in path 192, row 025 obtained an accuracy of 85.1 % and a kappa of 0.778. This scene is located
just east of the scene with the worst classification results. These results are relatively low
compared with Knorn et al. (2009), who found more than 90 %, but they used other optical
scenes for verification, without forest inventory data. For larger areas, Gjertsen (2007) found
an overall accuracy of 79.6 % for spruce species only. That was the best accuracy found among
the studies with multisource forest inventory. Dees et al. (2000) found errors from 15 to 46.3 %
in the classification of satellite images based on forest inventory data for large stands. They
classified at stand level with 10 stands in the analysis. Those results cannot be compared with
the classification of more than 6000 plots.
With regard to automating the production of the forest cover layer, the OBIA
segmentation with eCognition lends itself to automation. The import of the original
scenes and the export of the segments in raster or vector format can be placed in a processing
chain that only asks for the name of the input data and the segmentation parameters, such as
scale and roughness, and writes the products with similar names.
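The naming part of such a processing chain can be sketched as below. This is only an illustration of deriving product names from the input name and the segmentation parameters; the function, the file name pattern and the output folder are hypothetical, and the sketch does not call eCognition.

```python
from pathlib import Path


def product_paths(scene, scale, roughness, out_dir="products"):
    """Derive raster and vector product names from the input scene name."""
    stem = Path(scene).stem
    tag = f"{stem}_s{scale}_r{roughness}"
    return {
        "raster": Path(out_dir) / f"{tag}.tif",
        "vector": Path(out_dir) / f"{tag}.shp",
    }


# Hypothetical input scene and segmentation parameters.
paths = product_paths("scene_192_025.tif", scale=30, roughness=5)
print(paths["raster"].name)
print(paths["vector"].name)
```

Keeping the parameters in the file name makes it possible to tell later which segmentation produced a given product.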
6.8 OBIA method with fast segmentation
A new FastWay method for the segmentation of complete images was implemented in this
process. The method reduced the computing time from 2 hours to 20 minutes in the worst
case. The time required to achieve a complete image classification was approximately 2
hours. A large part of this time (about 50 minutes) was spent converting the results to vector
shapefile format.
6.9 Variable selection methods
The criterion-based variable selection was initially developed for numerical output variables;
for that reason, its application to categorical variables is not fully accepted. Here
the intention was to obtain an indication of which variables to use in a classification
process, not for regression purposes.
The FSO in eCognition proposed a set of variables for the subset of the scene in path
192, row 026. When used, that set of variables did not improve the accuracy. For
that reason, recursive backward stepwise feature selection was applied.
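The idea of backward selection can be sketched with a simplified greedy elimination. This is not the procedure of the caret package; it is a minimal stand-in in which a scoring function, supplied by the user, rates each subset of predictors. The predictor names are taken from the text, but the usefulness points and the penalty per predictor are invented for the example.

```python
def backward_elimination(features, score):
    """Drop one feature at a time for as long as the score does not decrease."""
    current = list(features)
    best = score(current)
    while len(current) > 1:
        # Score every subset obtained by leaving exactly one feature out.
        candidates = [
            (score([f for f in current if f != drop]), drop) for drop in current
        ]
        candidate_score, drop = max(candidates)
        if candidate_score < best:
            break
        best = candidate_score
        current.remove(drop)
    return current, best


# Hypothetical usefulness of each predictor, in points.
USEFULNESS = {"NDVI": 30, "B5": 20, "B7": 18, "B1": 2, "B6": 1}


def toy_score(subset):
    """Toy criterion: summed usefulness minus 5 points per predictor kept."""
    return sum(USEFULNESS[f] for f in subset) - 5 * len(subset)


selected, final_score = backward_elimination(list(USEFULNESS), toy_score)
print(selected, final_score)
```

In a real application the toy criterion would be replaced by a cross-validated accuracy of the classifier fitted on the subset.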
In four of the six scenes under consideration, 11 or more predictors were selected. The NDVI
was among the four most important predictors. Bands 5 and 7 had roughly the same
importance. Bands 1, 3, 4 and 6 were sometimes selected among the four most important
variables across the whole set of scenes.
The feature space optimization (FSO) was combined with a visual selection of bands to
increase the accuracy of the classification of forest and non-forest areas from 80 % to more
than 92 % with unbalanced samples. That tool for variable selection proved very useful in the
training set selection of the classification employing the kNN method. Murtaugh (2009)
compared the performance of several variable selection methods, but the FSO was not
considered.
6.10 The best classifier
In four of the six images processed, the SVM was the best algorithm on the basis of the
accuracy and kappa values obtained. In each of these images, the best C value was below 2
and the best sigma value below 4. The kNN proved best for the image in path 193, row
025. The best k value was 3 and the accuracy was 15 % higher than with
the SVM. This was the image with the smallest model of all (Y ~ B2), with the classification
of forest depending only on the data in the green band. The literature reports that
kNN is a good classifier for data with a small number of predictors (Díaz-Uriarte and De Andres,
2006; Deegalla and Boström, 2007).
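A kNN classifier with k = 3 and a single predictor, as in the model Y ~ B2, can be sketched in a few lines. The reflectance values and class labels below are hypothetical training data, not values from the scene.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority vote among the k training samples closest to x."""
    neighbours = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical green-band (B2) reflectance values with class labels.
train = [
    (0.03, "forest"),
    (0.04, "forest"),
    (0.05, "forest"),
    (0.09, "non-forest"),
    (0.10, "non-forest"),
    (0.12, "non-forest"),
]

low = knn_predict(train, 0.045)
high = knn_predict(train, 0.11)
print(low, high)
```

With one predictor the distance reduces to an absolute difference, which is one reason why kNN remains cheap and stable for such small models.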
The SVM is a classifier that can be adapted to all of the restrictions in place, but in some
images the position of the NFI plots was not representative of all of the forest communities in
the image; for example, situated on steep slopes or covered by clouds.
In general, the basal area is not an optimal predictor of forest type for the data provided by
LANDSAT TM when applying simple linear models. However, the accuracy values obtained
suggest that it should be used when modelling forest inventory data using images. This
finding is similar to that of Meng et al. (2009), who found basal area to be a good predictor
in a geostatistical approach to the classification of Landsat ETM+ scenes.
The mixed effects models (MEM), generalised additive models (GAM) and Gaussian mixed
models (GMM) were not considered in this study.
When compared with one another, the SVM proved the best classifier. The mean overall
accuracy was 70.41 %, about 10 % higher than the closest alternative classifier, the RF. The
difference was 7 % with a high level of statistical significance (p < 2.2e-16).
Knorn et al. (2009) found the SVM to be a good classifier but recommended testing
other classifiers. Duro et al. (2012) did similar research with three classifiers and a smaller
number of scenes, using SPOT data and only multiresolution segmentation in their approach.
The conditional inference trees algorithm was the next best approach. The worst was MNL,
followed by the kNN.
Compared with MNL and SVM in the first variant of the methods applied, the MxL achieved
a high degree of accuracy in the discrimination of forest areas. It was found to be 81.48 %
accurate, but the map produced was not acceptable when evaluated against the ATKIS layer.
6.11 Comparing kNN and SVM
When the tests were performed on a subset of the image in path 192, row 026, the kNN
classifier proved best in terms of accuracy (92.59 %). The SVM classifier
tuned for the C and gamma parameters performed similarly to kNN. The accuracy of both
classifiers reached 75 % for all images with unbalanced samples. When the weighting process
was applied, the accuracy of the kNN classifier dropped below that of the SVM and RF. RF,
however, was found to be a good predictor only under certain conditions, although its
computational properties make it the best algorithm in terms of low resource requirements
and speed (Verikas et al., 2011).
When kNN is applied in eCognition, the fuzzy operator should be tuned per image to
a level at which the classes can be classified correctly. A single fuzzy threshold value for all
classes is enough for a good separation between classes in the training process. For the
evaluation phase, a fuzzy membership of 80 % was found to separate the classes well.
This procedure is useful for evaluation without NFI
data. The disadvantage of this approach is that the k parameter cannot be tuned
in this programme. Hu et al. (2008) reported SVM as the best classifier for
experimental data. Under diverse real conditions, the superiority of SVM is shown here not
to be universal.
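The effect of a single membership threshold can be illustrated as follows. This is a generic sketch, not eCognition's implementation: an object receives its best class only if the membership value of that class reaches the threshold, and is left unclassified otherwise. The membership values are hypothetical.

```python
def classify(memberships, threshold=0.8):
    """Return the class with the highest membership, if it reaches the threshold."""
    label, value = max(memberships.items(), key=lambda kv: kv[1])
    return label if value >= threshold else "unclassified"


# Hypothetical fuzzy membership values for two image objects.
clear_case = classify({"conifer": 0.86, "broadleaved": 0.40})
mixed_case = classify({"conifer": 0.55, "broadleaved": 0.62})
print(clear_case, mixed_case)
```

Raising the threshold leaves more objects unclassified but makes the remaining assignments more reliable for evaluation.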
6.12 PAM clustering the NFI
Using the PAM algorithm for the unsupervised classification of the NFI data identified the
exact plots that are the centroids of the clusters. Table 10 (page 49) presents the tract and
the corner for each cluster. Such plots cannot be found when other unsupervised clustering
methods are applied, such as ISOCLASS or k-means. With this information, the forest service
can use those plots for research with special purposes beyond the forest inventory. The
separation of the clusters was not perfect, but it is useful for classification purposes, as in
Seidl et al. (2007), who clustered forest inventory data to assess the trade-offs between
carbon sequestration and timber production. A similar approach was applied by Middleton et al.
(2012) in an ordination approach to classify biotopes with hyperspectral remote sensing data.
The overall accuracies they found are comparable with those found in this research.
Based on the large dissimilarity values obtained in the clusters from the NFI, two special
groups of mixtures in conifer forests were found and correctly identified in the images. Such
an identification is not easy to obtain by visual interpretation.
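The property that PAM returns real observations as cluster centres (medoids) can be seen in a simplified sketch. It follows the BUILD and SWAP idea of PAM but accepts the first improving swap instead of the best one, so it is a teaching version and not the implementation used in the statistical software. The data are hypothetical one-dimensional values, for example the conifer share of a plot.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum of the distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Simplified PAM: greedy BUILD, then SWAP until no swap lowers the cost."""
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        medoids.append(
            min(candidates, key=lambda i: total_cost(points, medoids + [i], dist))
        )
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            if total_cost(points, trial, dist) < total_cost(points, medoids, dist):
                medoids, improved = trial, True
    return sorted(medoids)


# Hypothetical plot values forming two groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
medoid_index = pam(values, k=2, dist=lambda a, b: abs(a - b))
medoid_values = [values[i] for i in medoid_index]
print(medoid_index, medoid_values)
```

The result is a list of indices into the input data, so each cluster centre can be traced back to a concrete plot, which is not possible with the mean-based centres of k-means.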
6.13 The border effect
The level of accuracy doubled, generally rising from 30 % to 60 %, when a buffer of 1.5
pixels around the forest boundary was used to exclude the NFI plots situated close to the
forest edge.
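The exclusion of plots near the forest edge can be sketched on a raster forest mask. The sketch assumes a mask with 1 for forest and 0 for non-forest, plots given as pixel rows and columns, and distances measured between pixel centres; pixels outside the image are treated as non-forest. The mask and the plot positions are hypothetical.

```python
import math


def near_boundary(mask, row, col, buffer_px=1.5):
    """True if a non-forest pixel lies within buffer_px pixels of (row, col)."""
    reach = math.floor(buffer_px)
    for dr in range(-reach, reach + 1):
        for dc in range(-reach, reach + 1):
            if math.hypot(dr, dc) > buffer_px:
                continue
            r, c = row + dr, col + dc
            outside = not (0 <= r < len(mask) and 0 <= c < len(mask[0]))
            if outside or mask[r][c] == 0:
                return True
    return False


# Hypothetical 7 x 7 mask: a 5 x 5 forest block surrounded by non-forest.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]

plots = [(3, 3), (1, 1), (2, 3), (1, 3)]
kept = [(r, c) for r, c in plots if not near_boundary(mask, r, c)]
print(kept)
```

With a buffer of 1.5 pixels the test covers the eight direct neighbours of a pixel, so only plots whose whole neighbourhood is forest are kept.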
6.14 Other predictors from the NFI
The number of individuals per plot was also considered; this information was available in
the data set obtained. However, using this feature in the validation process had no
appreciable effect on the accuracy. The general histograms of the number of trees per ha and
of the stock in m3/ha showed behaviour similar to that of the basal area (m2/ha) for this data set.
Other reports suggest that LANDSAT scenes from multiple
dates (scene stacks) should be used in order to identify the forest type precisely, independently
of the weather conditions (Huang, Goward, et al., 2009; Huang, Li, et al., 2009; Waske and
Braun, 2009; Huang, Goward, et al., 2010; Thomas et al., 2011; Pflugmacher et al., n.d.).
These authors also used other sources of data, such as SAR, Lidar and high-resolution optical
sensors.
Not all of the NFI data plots were suitable for the evaluation of the classification. Some were
located at the boundaries of forest classes and others were in areas where the degree of
canopy cover was very low. It was necessary to select those plots with enough canopy cover
to compare with LANDSAT TM data.
6.15 The stratified approach
Classifying the forest areas first and then classifying within the forest proved a good
way to avoid the typical confusion of some forest types with crops. Dees et al. (2000) found
similar accuracies in the predictions for forest stands when the grid size of the
sampling pattern was reduced. They also used multisource remote sensing data.
The definition of forest, when based only on LANDSAT TM data, should be based on a canopy
cover of 40 %. With lower canopy cover, the reflectance of the soil can affect
the accuracy of the classification. With another source of remote sensing data, such
as LIDAR, this definition could be changed to a lower canopy cover boundary and lower NDVI
values.
The species data for the state showed that some regions have 100 % of conifer plots, but the
broad leaved forest does not account for 100 % canopy cover for all of the scenes.
6.16 Automatization of the processing chain
The procedure described is based on the automated selection of NFI data plots through
clustering with the PAM algorithm and on machine learning classification. Many of the steps
described in the method for processing the NFI database can be automated, from the import
into the statistical software up to the production of the silhouette. These steps can be carried
out even on small desktop computers or laptops.
Other steps that could be automated are the classification based on NFI data with ML
algorithms, the evaluation of the classifications and the regression to the complete area. These
steps are available in the caret package in R and lend themselves to automation. Some authors
have already automated forest mapping processes (Pekkarinen et al., 2009), but their procedure
depends on databases from numerous countries and is not produced frequently.
7 References
Achard, F., 2009. Vital forest graphics. UNEP/Earthprint.
ADV, 2003. ATKIS [WWW Document]. URL http://www.atkis.de
Austin, M.P., 2002. Spatial prediction of species distribution: an interface between ecological
theory and statistical modelling 101 – 118.
Avitabile, V., Baccini, A., Friedl, M.A., Schmullius, C., 2012. Capabilities and limitations of
Landsat and land cover data for aboveground woody biomass estimation of Uganda.
Remote Sensing of Environment 117, 366–380.
Bacauskiene, M., Verikas, A., Gelzinis, A., Valincius, D., 2009. A feature selection technique
for generation of classification committees and its application to categorization of
laryngeal images 645 – 654.
Balas, B., 2008. Attentive texture similarity as a categorization task: Comparing texture
synthesis models 972 – 982.
BAYERN TOURISMUS Marketing GmbH, 2012. Germany Bavaria geography regions
people [WWW Document]. URL http://www.bavaria.by/bavaria-germany-geography-
travel-saison
BMELV, 2010. Bundeswaldinventur [WWW Document]. URL
http://www.bundeswaldinventur.de/enid/c07004d3dd6ad40388f73555f0f1d8da,0/6i.html
Biradar, Chandrashekhar M., Prasad S. Thenkabail, Praveen Noojipady, Yuanjie Li,
Venkateswarlu Dheeravath, Hugh Turral, Manohar Velpuri, et al. 2009. “A Global Map
of Rainfed Cropland Areas (GMRCA) at the End of Last Millennium Using Remote
Sensing.” International Journal of Applied Earth Observation and Geoinformation 11 (2):
114 – 129. doi:10.1016/j.jag.2008.11.002.
Bock, M, G. Rossner, M. Wissen, K. Remm, T. Langanke, S. Lang, H. Klug, T. Blaschke, and
B. Vrščaj. 2005. Spatial indicators for nature conservation from European to local scale.
Ecological Indicators 5 (4) (November): 322–338. doi:10.1016/j.ecolind.2005.03.018.
http://www.sciencedirect.com/science/article/pii/S1470160X05000294.
Bozdogan, H., 2000. Akaike’s Information Criterion and Recent Developments in
Information Complexity 62–91.
Brandt, Jodi S., Tobias Kuemmerle, Haomin Li, Guopeng Ren, Jianguo Zhu, and Volker C.
Radeloff. 2012. Using Landsat imagery to map forest change in southwest China in
response to the national logging ban and ecotourism development. Remote Sensing of
Environment 121 (0) (June): 358–369. doi:10.1016/j.rse.2012.02.010.
http://www.sciencedirect.com/science/article/pii/S0034425712001034.
Brenning, A., 2009. Benchmarking classifiers to optimally integrate terrain analysis and
multispectral remote sensing in automatic rock glacier detection 239 – 247.
Bundesamt für Naturschutz [WWW Document], 2012. URL
http://www.bfn.de/geoinfo/landschaften/
Butler, H., Schmidt, C., Springmeyer, D., Livni, J., 2010. Spatial Reference [WWW
Document]. Spatial Reference. URL http://www.spatialreference.org/ref/epsg/3396/
Campbell, J.B., Wynne, R.H., 2011. Introduction to remote sensing. Guilford Press, New
York.
Camps-Valls, Gustavo, and Lorenzo Bruzzone. 2009. Kernel methods for remote sensing
data analysis. Hoboken, NJ: Wiley.
Carlson, T.N., Ripley, D.A., 1997. On the relation between NDVI, fractional vegetation
cover, and leaf area index. Remote Sensing of Environment 62, 241–252.
Carman, C.S., Merickel, M.B., 1990. Supervising ISODATA with an information theoretic
stopping rule 185–197.
Carreiras, J.M.B., Pereira, J.M.C., Pereira, J.S., 2006. Estimation of tree canopy cover in
evergreen oak woodlands using remote sensing. Forest Ecology and Management 223,
45–53.
Chang, F., Qiu, W., Zamar, R.H., Lazarus, R., Wang, X., 2010. clues: An R Package for
Nonparametric Clustering Based on Local Shrinking 1–16.
Chen, G., Hay, G.J., St-Onge, B., 2011. A GEOBIA framework to estimate forest parameters
from lidar transects, Quickbird imagery and machine learning: A case study in Quebec,
Canada.
Chen, X., R. Tateishi, and C. Wang. 1999. Development of a 1-km landcover dataset of China
using AVHRR data. ISPRS journal of photogrammetry and remote sensing 54 (5): 305–
316. http://www.sciencedirect.com/science/article/pii/S0924271699000271.
Chirici, G., Barbati, A., Corona, P., Marchetti, M., Travaglini, D., Maselli, F., Bertini, R.,
2008. Non-parametric and parametric methods using satellite images for estimating
growing stock volume in alpine and Mediterranean forest ecosystems. Remote Sensing of
Environment 112, 2686–2700.
Chirici, G., Giuliarelli, D., Biscontini, D., Tonti, D., Mattioli, W., Marchetti, M., Corona, P.,
2011. Large-scale monitoring of coppice forest clearcuts by multitemporal very high
resolution satellite imagery. A case study from central Italy 1025 – 1033.
Cihlar, J., H. Ly, and Q. Xiao. 1996. Land cover classification with AVHRR multichannel
composites in northern environments. Remote Sensing of Environment 58 (1): 36–51.
http://www.sciencedirect.com/science/article/pii/0034425795002103.
Cohen, W.B., Yang, Z., Kennedy, R., 2010. Detecting trends in forest disturbance and
recovery using yearly Landsat time series: 2. TimeSync — Tools for calibration and
validation. Remote Sensing of Environment 114, 2911–2924.
Corona, P., G. Chirici, R. E. McRoberts, S. Winter, and A. Barbati. 2011. Contribution of
large-scale forest inventories to biodiversity assessment and monitoring. Forest Ecology
and Management 262 (11) (December 1): 2061–2069. doi:10.1016/j.foreco.2011.08.044.
http://www.sciencedirect.com/science/article/pii/S0378112711005366.
Couto, P., 2003. Assessing the accuracy of spatial simulation models. Ecological Modelling
167, 181–198.
Couturier, S., Gastellu-Etchegorry, J.-P., Patiño, P., Martin, E., 2009. A model-based
performance test for forest classifiers on remote-sensing imagery 23 – 37.
Crist, E.P., Cicone, R.C., 1984. A physically-based transformation of Thematic Mapper
data—The TM Tasseled Cap. Geoscience and Remote Sensing, IEEE Transactions on
256–263.
Deegalla, S., Boström, H., 2007. Classification of microarrays with kNN: comparison of
dimensionality reduction methods, in: Proceedings of the 8th International Conference on
Intelligent Data Engineering and Automated Learning, IDEAL’07. Springer-Verlag,
Berlin, Heidelberg, pp. 800–809.
Dees, M., J. Duvenhorst, C. P. Gross, and B. Koch. 2000. “Combining Remote Sensing Data
Sources and Terrestrial Sample-based Inventory Data for the Use in Forest Management
Inventories.” INTERNATIONAL ARCHIVES OF PHOTOGRAMMETRY AND
REMOTE SENSING 33 (B7/1; PART 7): 355–362.
Díaz-Uriarte, R., De Andres, S.A., 2006. Gene selection and classification of microarray data
using random forest. BMC bioinformatics 7, 3.
Dheeravath, V., P. S. Thenkabail, G. Chandrakantha, P. Noojipady, G. P. O. Reddy, C. M.
Biradar, M. K. Gumma, and M. Velpuri. 2010. “Irrigated Areas of India Derived Using
MODIS 500 m Time Series for the Years 2001–2003.” ISPRS Journal of
Photogrammetry and Remote Sensing 65 (1): 42 – 59.
doi:10.1016/j.isprsjprs.2009.08.004.
Dickinson, R.E., 1964. Germany: A General and Regional Geography. Taylor & Francis.
Dorren, L.K.A., Maier, B., Seijmonsbergen, A.C., 2003. Improved Landsat-based forest
mapping in steep mountainous terrain using object-based classification. Forest Ecology
and Management 183, 31–46.
Dorrough, J., and C. Moxham. 2005. Eucalypt establishment in agricultural landscapes and
implications for landscape-scale restoration. Biological Conservation 123 (1) (May):
55–66. doi:10.1016/j.biocon.2004.10.008.
Duro, D.C., Franklin, S.E., Dubé, M.G., 2012. A comparison of pixel-based and object-based
image analysis with selected machine learning algorithms for the classification of
agricultural landscapes using SPOT-5 HRG imagery. Remote Sensing of Environment
118, 259–272.
Ekbal, A., Saha, S., 2011. Multiobjective optimization for classifier ensemble and feature
selection: an application to named entity recognition. International Journal on Document
Analysis and Recognition (IJDAR) 15, 143–166.
ESRI, 2010. ArcGis help system. 380 New York Street, Redlands, CA 92373-8100, USA.
ERDAS, Inc., 2010. ERDAS Field Guide™, 2010th ed. ERDAS, Inc, Norcross, GA 30092-2500
USA.
FAO, 2010. Evaluacion de los recursos forestales mundiales 2010 / Global Forest Resources
2010 Assessment Informe Principal / Main Report. Food & Agriculture Org, Rome.
FAO, UNDP, UNEP. 2012. UN-REDD Programme - home –. http://www.un-redd.org/
Foody, G.M., Hill, R.A., 1996. Classification of tropical forest classes from Landsat TM
data. International Journal of Remote Sensing 17, 2353–2367.
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sensing
of Environment 80, 185–201.
Fraser, R.H., Olthof, I., Pouliot, D., 2009. Monitoring land cover change and ecological
integrity in Canada’s national parks. Remote Sensing of Environment 113, 1397–1409.
Gamanya, R., Maeyer, P.D., Dapper, M.D., 2009. Object-oriented change detection for the
city of Harare, Zimbabwe 571 – 588.
Gamon, J.A., Field, C.B., Goulden, M.L., Griffin, K.L., Hartley, A.E., Joel, G., Peñuelas, J.,
Valentini, R., 1995. Relationships Between NDVI, Canopy Structure, and Photosynthesis
in Three Californian Vegetation Types. Ecological Applications 5, 28–41.
Gan, G., C. Ma, and J. Wu. 2007. Data clustering: theory, algorithms, and applications.
Philadelphia, Pa.; Alexandria, Va.: SIAM, Society for Industrial and Applied
Mathematics; American Statistical Association.
Geyer, Charles J. 2010. Examples: Model Selection.
http://www.stat.umn.edu/geyer/5102/examp/select.html.
Geyer, O.F., Gwinner, M.P., 1986. Geologie von Baden-Württemberg.
Gjertsen, A. 2007. “Accuracy of Forest Mapping Based on Landsat TM Data and a kNN-
based Method.” Remote Sensing of Environment 110 (4) (October 30): 420–430.
doi:10.1016/j.rse.2006.08.018.
Grainger, A, and M. Obersteiner. 2011. A framework for structuring the global forest
monitoring landscape in the REDD+ era. Environmental Science & Policy 14 (2)
(March): 127–139. doi:10.1016/j.envsci.2010.10.006.
http://www.sciencedirect.com/science/article/pii/S146290111000136X.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology 147 –
186.
Gumma, M. K, Devendra G., Andrew N, Sushil P, and Arnel R. 2011. Temporal Changes in
Rice-growing Area and Their Impact on Livelihood over a Decade: A Case Study of
Nepal. Agriculture, Ecosystems & Environment 142 (3–4): 382 – 392.
doi:10.1016/j.agee.2011.06.010.
Hastie, T., Tibshirani, R., Friedman, J.H., 2009. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer.
Heikkilä, J., Nevalainen, S., Tokola, T., 2002. Estimating defoliation in boreal coniferous
forests by combining Landsat TM, aerial photographs and field data. Forest Ecology and
Management 158, 9–23.
Heinl, M., Walde, J., Tappeiner, G., Tappeiner, U., 2009. Classifiers vs. input variables—The
drivers in image classification for land cover mapping. International Journal of Applied
Earth Observation and Geoinformation 11, 423–430.
Hengl, T. 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables.
Luxembourg: Office for Official Publications of the European Communities.
http://spatial-analyst.net/book/system/files/Hengl_2009_GEOSTATe2c1w.pdf.
Hengl, T., Reuter, H.I., 2009. Geomorphometry. Elsevier, Amsterdam; Oxford.
Herbrich, Ralf. 2002. Learning kernel classifiers : theory and algorithms. Cambridge, Mass.:
MIT Press.
Hooftman, D. A. P., and J. M. Bullock. 2012. Mapping to inform conservation: A case study
of changes in semi-natural habitats and their connectivity over 70 years. Biological
Conservation 145 (1): 30 – 38. doi:10.1016/j.biocon.2011.09.015.
http://www.sciencedirect.com/science/article/pii/S0006320711003715.
Holopainen, M., Tuominen, S., Karjalainen, M., Hyyppä, J., Vastaranta, M., Hyyppä, H.,
2009. Accuracy of High-Resolution Radar Images in the Estimation of Plot-Level Forest
Variables 67–82.
Holmgren, Johan, Stephen Joyce, Mats Nilsson, and Håkan Olsson. 2000. “Estimating Stem
Volume and Basal Area in Forest Compartments by Combining Satellite Image Data
with Field Data.” Scandinavian Journal of Forest Research 15 (1) (January): 103–111.
doi:10.1080/02827580050160538.
Hothorn, T., 2012. CRAN Task View: Machine Learning & Statistical Learning, CRAN.
Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2010. A practical guide to support vector classification.
Hu, Q., Yu, D., Xie, Z., 2008. Neighborhood classifiers. Expert Systems with Applications
34, 866–876.
Huang, C., Goward, S.N., Masek, J.G., Thomas, N., Zhu, Z., Vogelmann, J.E., 2010. An
automated approach for reconstructing recent forest disturbance history using dense
Landsat time series stacks 183 – 198.
Huang, C., Goward, S.N., Schleeweis, K., Thomas, N., Masek, J.G., Zhu, Z., 2009. Dynamics
of national forests assessed using the Landsat record: Case studies in eastern United
States 1430 – 1442.
Huang, C., Li, A., Shi, H., Sun, G., Zhu, Z., Goward, S.N., Masek, J., 2009. Developing a
Fine Resolution Forest Height Map for Mississippi Using Landsat Time Series
Observations and GLAS Lidar Data F3+.
Huang, C., Song, K., Kim, S., Townshend, J.R.G., Davis, P., Masek, J.G., Goward, S.N.,
2008. Use of a dark object concept and support vector machines to automate forest cover
change analysis 970 – 985.
Huang, S., Potter, C., Crabtree, R.L., Hager, S., Gross, P., 2010. Fusing optical and radar data
to estimate sagebrush, herbaceous, and bare ground cover in Yellowstone 251 – 264.
Huete, A., Didan, K., Miura, T., Rodriguez, E.P., Gao, X., Ferreira, L.G., 2002. Overview of
the radiometric and biophysical performance of the MODIS vegetation indices 195 –
213.
Hyyppä, J., Hyyppä, H., Inkinen, M., Engdahl, M., Linko, S., Zhu, Y.-H., 2000. Accuracy
comparison of various remote sensing data sources in the retrieval of forest stand
attributes. Forest Ecology and Management 128, 109–120.
Inglada, J., 2007. Automatic recognition of man-made objects in high resolution optical
remote sensing images by SVM classification of geometric image features 236 – 248.
Inoue, Y., Peñuelas, J., Miyata, A., Mano, M., 2008. Normalized difference spectral indices
for estimating photosynthetic efficiency and capacity at a canopy scale derived from
hyperspectral and CO2 flux measurements in rice 156–172.
Janert, P.K., 2011. Data analysis with open source tools. O’Reilly, Sebastopol, CA.
Jiang, H., Strittholt, J.R., Frost, P.A., Slosser, N.C., 2004a. The classification of late seral
forests in the Pacific Northwest, USA using Landsat ETM+ imagery 320 – 331.
Jiang, H., Strittholt, J.R., Frost, P.A., Slosser, N.C., 2004b. The classification of late seral
forests in the Pacific Northwest, USA using Landsat ETM+ imagery. Remote Sensing of
Environment 91, 320–331.
Johansen, K., Coops, N.C., Gergel, S.E., Stange, Y., 2007. Application of high spatial
resolution satellite imagery for riparian and forest ecosystem classification 29 – 44.
Kaufman, L., Rousseeuw, P.J., 2005. Finding groups in data: an introduction to cluster
analysis. Wiley, Hoboken, N.J.
Ke, Y., Quackenbush, L.J., Im, J., 2010. Synergistic use of QuickBird multispectral imagery
and LIDAR data for object-based forest species classification 1141 – 1154.
Kennedy, R.E., Cohen, W.B., Schroeder, T.A., 2007. Trajectory-based change detection for
automated characterization of forest disturbance dynamics 370 – 386.
Kennedy, Robert E., Philip A. Townsend, John E. Gross, Warren B. Cohen, Paul Bolstad, Y.
Q. Wang, and Phyllis Adams. 2009. “Remote sensing change detection tools for natural
resource managers: Understanding concepts and tradeoffs in the design of landscape
monitoring projects.” Remote Sensing of Environment 113 (7): 1382 – 1396.
doi:10.1016/j.rse.2008.07.018.
http://www.sciencedirect.com/science/article/pii/S0034425709000601.
Khorram, Siamak. 2012. Remote sensing. New York: Springer.
Kim, J., and Christopher D. Ellis. 2009. Determining the Effects of Local Development
Regulations on Landscape Structure: Comparison of The Woodlands and North Houston,
TX. Landscape and Urban Planning 92 (3–4): 293 – 303.
doi:10.1016/j.landurbplan.2009.05.013.
Kirui, K.B., Kairo, J.G., Bosire, J., Viergever, K.M., Rudra, S., Huxham, M., Briers, R.A.,
2011. Mapping of mangrove forest land cover change along the Kenya coastline using
Landsat imagery. Ocean & Coastal Management.
Knorn, Jan, Andreas Rabe, Volker C. Radeloff, Tobias Kuemmerle, Jacek Kozak, and Patrick
Hostert. 2009. “Land Cover Mapping of Large Areas Using Chain Classification of
Neighboring Landsat Satellite Images.” Remote Sensing of Environment 113 (5) (May):
957–964. doi:10.1016/j.rse.2009.01.010.
Knyazikhin, Y., Martonchik, J.V., Myneni, R.B., Diner, D.J., Running, S.W., 1998.
Synergistic algorithm for estimating vegetation canopy leaf area index and fraction of
absorbed photosynthetically active radiation from MODIS and MISR data. Journal of
Geophysical Research 103, 257–275.
Kokaly, R.F., Rockwell, B.W., Haire, S.L., King, T.V.V., 2007. Characterization of post-fire
surface cover, soils, and burn severity at the Cerro Grande Fire, New Mexico, using
hyperspectral and multispectral remote sensing 305 – 325.
Kuhn, M., Wing, J., Weston, S., Williams, A., Keefer, C., Engelhardt, A., 2012. CRAN -
Package caret, CRAN. pfizer.com.
Kyan, M., Guan, L., Liss, S., 2005. Refining competition in the self-organising tree map for
unsupervised biofilm image segmentation. Neural Networks 18, 850–860.
Laliberte, A.S., Browning, D.M., Rango, A., 2011. A comparison of three feature selection
methods for object-based classification of sub-decimeter resolution UltraCam-L imagery.
Lang, R., Shao, G., Pijanowski, B.C., Farnsworth, R.L., 2008. Optimizing unsupervised
classifications of remotely sensed imagery with a data-assisted labeling approach 1877 –
1885.
Larsson, H., 1993. Linear regressions for canopy cover estimation in Acacia woodlands using
Landsat-TM, -MSS and SPOT HRV XS data. International Journal of Remote Sensing
14, 2129–2136.
Lavreau, J., 1991. De-Hazing Landsat Thematic Mapper Images. Photogrammetric
Engineering and Remote Sensing 57, 1297–1302.
Leica Geosystems Geospatial Imaging, LLC. 2010. ERDAS Tour Guides™. 2006th ed.
Norcross, GA 30092-2500 USA: ERDAS, Inc.
Li, W., Bernaola-Galván, P., Haghighi, F., Grosse, I., 2002. Applications of recursive
segmentation to the analysis of DNA sequences. Computers & Chemistry 26, 491–510.
Li, Z., Zhu, Q., Gold, C., 2005. Digital terrain modeling principles and
methodology. CRC Press, New York.
Liau, H.F., Isa, D., 2011. Feature selection for support vector machine-based face-iris
multimodal biometric system. Expert Systems with Applications 38, 11105–11111.
Lier, O.R. van, Fournier, R.A., Bradley, R.L., Thiffault, N., 2009. A multi-resolution satellite
imagery approach for large area mapping of ericaceous shrubs in Northern Quebec,
Canada 334 – 343.
Loetsch, F., Haller, K.E., Zöhrer, F., 1973. Forest inventory. BLV, München.
López Hernández, Juan Ygnacio, Koch, B., Ueffing, C., 2010. Criterion-based procedures
applied to Landsat TM data for variable selection in an automated classification schema
for Bavarian forests, in: Forests for the Future: Sustaining Society and the Environment.
Presented at the XXIII IUFRO World Congress, COMMONWEALTH FORESTRY
ASSOCIATION. The International Forestry Review, Seoul, Korea.
Luukka, P., 2011. Feature selection using fuzzy entropy measures with similarity classifier.
Expert Systems with Applications 38, 4600–4607.
MacAlister, Charlotte, and Manithaphone Mahaxay. 2009. Mapping wetlands in the Lower
Mekong Basin for wetland resource and conservation management using Landsat ETM
images and field survey data. Journal of Environmental Management 90 (7): 2130 –
2137. doi:10.1016/j.jenvman.2007.06.031.
http://www.sciencedirect.com/science/article/pii/S0301479708000339.
Maechler, M., 2011. Cluster Analysis Extended Rousseeuw et al.: Partitioning Around
Medoids [WWW Document]. URL http://127.0.0.1:21144/library/cluster/html/pam.html
Maechler, M., 2012. Cluster Analysis Extended Rousseeuw et al.
Mäkelä, H., Pekkarinen, A., 2004. Estimation of forest stand volumes by Landsat TM
imagery and stand-level field-inventory data. Forest Ecology and Management 196, 245–
255.
Makkeasorn, A., Chang, N.-B., Li, J., 2009. Seasonal change detection of riparian zones with
remote sensing images and genetic programming in a semi-arid watershed 1069 – 1080.
Man, M.Z., Dyson, G., Johnson, K., Liao, B., 2004. Evaluating Methods for Classifying
Expression Data. Journal of Biopharmaceutical Statistics 14, 1065–1084.
McLeod, A.I., and Changjiang Xu. 2010. Package ‘bestglm’. http://cran.r-
project.org/web/packages/bestglm/bestglm.pdf.
Mellin, C., Bradshaw, C.J.A., Meekan, M.G., Caley, M.J., 2010. Environmental and spatial
predictors of species richness and abundance in coral reef fishes. Global Ecology and
Biogeography 19, 212–222.
Meschede, A., 2004. Fledermäuse in Bayern / bearb. von Angelika Meschede und Bernd-
Ulrich Rudolph. Hrsg. vom Bayerischen Landesamt für Umweltschutz ... Ulmer,
Stuttgart (Hohenheim).
Meynen, E. (Hrsg.), 1902. Handbuch der naturräumlichen Gliederung Deutschlands.
Bundesanstalt für Landeskunde und Raumforschung, Selbstverl., Bad Godesberg.
Middleton, M., Närhi, P., Arkimaa, H., Hyvönen, E., Kuosmanen, V., Treitz, P., Sutinen, R.,
2012. Ordination and hyperspectral remote sensing approach to classify peatland
biotopes along soil moisture and fertility gradients. Remote Sensing of Environment 124,
596–609.
Mills, P., 2011. Efficient statistical classification of satellite measurements. International
Journal of Remote Sensing 32, 6109–6132.
Moisen, G.G., Frescino, T.S., 2002. Comparing five modelling techniques for predicting
forest characteristics 209 – 225.
Moisen, Gretchen G., Elizabeth A. Freeman, Jock A. Blackard, Tracey S. Frescino, Niklaus E.
Zimmermann, and Thomas C. Edwards. 2006. “Predicting Tree Species Presence and
Basal Area in Utah: A Comparison of Stochastic Gradient Boosting, Generalized
Additive Models, and Tree-based Methods.” Ecological Modelling 199 (2) (November):
176–187. doi:10.1016/j.ecolmodel.2006.05.021.
Moreau, S., Bosseno, R., Gu, X.F., Baret, F., 2003. Assessing the biomass dynamics of
Andean bofedal and totora high-protein wetland grasses from NOAA/AVHRR 516–529.
Muthu Rama Krishnan, M., Chakraborty, C., Paul, R.R., Ray, A.K., 2012. Hybrid
segmentation, characterization and classification of basal cell nuclei from
histopathological images of normal oral mucosa and oral submucous fibrosis. Expert
Systems with Applications 39, 1062–1077.
Navulur, K., 2007. Multispectral image analysis using the object-oriented paradigm. CRC
Press/Taylor & Francis, Boca Raton.
NASA. 2012. Landsat Data Continuity Mission. http://ldcm.nasa.gov/.
Neeff, T., von Luepke, H., Schoene, D., 2006. Choosing a forest definition for the Clean
Development Mechanism, Forests and Climate Change Working Paper. FAO, Rome.
Oehmichen, K., 2007. Satellitengestützte Waldflächenkartierung für die Bundeswaldinventur.
Ohmann, J. L., M. J. Gregory, H. M. Roberts, W. B. Cohen, R. E. Kennedy, and Z. Yang.
2012. Mapping change of older forest with nearest-neighbor imputation and Landsat
time-series. Forest Ecology and Management 272 (0) (May 15): 13–25.
doi:10.1016/j.foreco.2011.09.021.
http://www.sciencedirect.com/science/article/pii/S0378112711005809.
Palubinskas, G., Lucas, R.M., Foody, G.M., Curran, P.J., 1995. An evaluation of
fuzzy and texture-based classification approaches for mapping regenerating tropical
forest classes from Landsat-TM data. International Journal of Remote Sensing 16, 747–
759.
Parks, C.G., Radosevich, S.R., Endress, B.A., Naylor, B.J., Anzinger, D., Rew, L.J., Maxwell,
B.D., Dwire, K.A., 2005. Natural and land-use history of the Northwest mountain
ecoregions (USA) in relation to patterns of plant invasions. Perspectives in Plant
Ecology, Evolution and Systematics 7, 137–158.
Parr, W.C., Schucany, W.R., 1980. Minimum Distance and Robust Estimation. Journal of the
American Statistical Association 75, 616–624.
Pekkarinen, A., Reithmaier, L., Strobl, P., 2007. High resolution pan-European forest/non-
forest map based on Landsat data and CORINE Land Cover 2000.
Pekkarinen, A., Reithmaier, L., Strobl, P., 2009. Pan-European forest/non-forest mapping
with Landsat ETM+ and CORINE Land Cover 2000 data. ISPRS Journal of
Photogrammetry and Remote Sensing 643, 171 – 183.
Pert, P.L., Butler, J.R.A., Bruce, C., Metcalfe, D., 2012. A composite threat indicator
approach to monitor vegetation condition in the Wet Tropics, Queensland, Australia.
Ecological Indicators 18, 191–199.
Peters, J., Baets, B.D., Verhoest, N.E.C., Samson, R., Degroeve, S., Becker, P.D., Huybrechts,
W., 2007. Random forests as a tool for ecohydrological distribution modelling 304 – 318.
Petropoulos, G.P., Kontoes, C., Keramitsoglou, I., 2011. Burnt area delineation from a uni-
temporal perspective based on Landsat TM imagery classification using Support Vector
Machines. International Journal of Applied Earth Observation and Geoinformation 13,
70–80.
Pflugmacher, D., Cohen, W.B., Kennedy, R.E., n.d. Using Landsat-derived disturbance
history (1972–2010) to predict current forest structure. Remote Sensing of Environment.
Pope, K. O., J. M. Rey-Benayas, and J. F. Paris. 1994. Radar remote sensing of forest and
wetland ecosystems in the Central American tropics. Remote Sensing of Environment 48
(2): 205–219. http://www.sciencedirect.com/science/article/pii/0034425794901422.
Powell, S.L., Cohen, W.B., Healey, S.P., Kennedy, R.E., Moisen, G.G., Pierce, K.B.,
Ohmann, J.L., 2010. Quantification of live aboveground forest biomass dynamics with
Landsat time-series and field inventory data: A comparison of empirical modeling
approaches 1053 – 1068.
Proisy, C., Couteron, P., Fromard, F., 2007. Predicting and mapping mangrove biomass from
canopy grain analysis using Fourier-based textural ordination of IKONOS images 379 –
392.
Propastin, P., Erasmi, S., 2010. A physically based approach to model LAI from MODIS 250
m data in a tropical region 47 – 59.
Pu, R., Gong, P., Yu, Q., 2008. Comparative Analysis of EO-1 ALI and Hyperion, and
Landsat ETM+ Data for Mapping Forest Crown Closure and Leaf Area Index 3744–
3766.
Purkis, Samuel J, and V Klemas. 2011. Remote sensing and global environmental change.
Chichester, West Sussex, UK; Hoboken, N.J.: Wiley-Blackwell.
Qiu, R., Guo, N., Li, H., Wu, Z., Chakravarthy, V., Song, Y., Hu, Z., Zhang, P., Chen, Z.,
2009. A Unified Multi-Functional Dynamic Spectrum Access Framework: Tutorial,
Theory and Multi-GHz Wideband Testbed 6530–6603.
Racoviteanu, Adina E., Mark W. Williams, and Roger G. Barry. 2008. “Optical Remote Sensing
of Glacier Characteristics: A Review with Focus on the Himalaya.” Sensors 8 (5) (May
23): 3355–3383. doi:10.3390/s8053355. http://www.mdpi.com/1424-8220/8/5/3355/.
Raykov, Tenko, and George A. Marcoulides. 1999. “On Desirability of Parsimony in
Structural Equation Model Selection.” Structural Equation Modeling: A
Multidisciplinary Journal 6 (3) (January): 292–300. doi:10.1080/10705519909540135.
Raymond, Christopher M., Brett A. Bryan, Darla Hatton MacDonald, Andrea Cast, Sarah
Strathearn, Agnes Grandgirard, and Tina Kalivas. 2009. Mapping community values for
natural capital and ecosystem services. Ecological Economics 68 (5): 1301 – 1315.
doi:10.1016/j.ecolecon.2008.12.006.
http://www.sciencedirect.com/science/article/pii/S0921800908005326.
Reese, H., M. Nilsson, P. Sandström, and H. Olsson. 2002. “Applications Using Estimates of
Forest Parameters Derived from Satellite and Forest Inventory Data.” Computers and
Electronics in Agriculture 37 (1): 37–55.
Remm, K., 2004. Case-based predictions for species and habitat mapping 259 – 281.
Ren, G., Zhu, A.-X., Wang, W., Xiao, W., Huang, Y., Li, G., Li, D., Zhu, J., 2009. A
hierarchical approach coupled with coarse DEM information for improving the efficiency
and accuracy of forest mapping over very rugged terrains 26 – 34.
Rocchini, D., Foody, G.M., Nagendra, H., Ricotta, C., Anand, M., He, K.S., Amici, V.,
Kleinschmit, B., Förster, M., Schmidtlein, S., Feilhauer, H., Ghisla, A., Metz, M.,
Neteler, M., In Press. Uncertainty in ecosystem mapping by remote sensing. Computers
& Geosciences.
Rogan, J., Franklin, J., Stow, D., Miller, J., Woodcock, C., Roberts, D., 2008. Mapping land-
cover modifications over large areas: A comparison of machine learning algorithms 2272
– 2283.
Rothe, P., 2009. Die Geologie Deutschlands. WBG, Darmstadt.
Rowntree, R.A., 1984. Ecology of the urban forest—Introduction to part I. Urban Ecology 8,
1–11.
Roy, D., Ju, J., Lewis, P., Schaaf, C., Gao, F., Hansen, M., Lindquist, E., 2008. Multi-
temporal MODIS–Landsat data fusion for relative radiometric normalization, gap filling,
and prediction of Landsat data 3112–3130.
Salovaara, K.J., Thessler, S., Malik, R.N., Tuomisto, H., 2005. Classification of Amazonian
primary rain forest vegetation using Landsat ETM+ satellite imagery 39 – 51.
Sasaki, N., Putz, F.E., 2009. Critical need for new definitions of “forest” and “forest
degradation” in global climate change agreements 226–232.
Seabrook, L., McAlpine, C., Fensham, R., 2007. Spatial and temporal analysis of vegetation
change in agricultural landscapes: A case study of two brigalow (Acacia harpophylla)
landscapes in Queensland, Australia. Agriculture, Ecosystems & Environment 120, 211–
228.
Sedano, Fernando, Pieter Kempeneers, Jesús San Miguel, Peter Strobl, and Peter Vogt. 2012.
Towards a pan-European burnt scar mapping methodology based on single date medium
resolution optical remote sensing data. International Journal of Applied Earth
Observation and Geoinformation (0) (April 13). doi:10.1016/j.jag.2011.08.003.
http://www.sciencedirect.com/science/article/pii/S0303243411001115
Seebach, L.M., Strobl, P., San Miguel-Ayanz, J., Bastrup-Birk, A., 2011. Identifying strengths
and limitations of pan-European forest cover maps through spatial comparison.
International Journal of Geographical Information Science 25, 1865–1884.
Seidl, R., Rammer, W., Jäger, D., Currie, W.S., Lexer, M.J., 2007. Assessing trade-offs
between carbon sequestration and timber production within a framework of multi-
purpose forestry in Austria. Forest Ecology and Management 248, 64–79.
Sesnie, S.E., Gessler, P.E., Finegan, B., Thessler, S., 2008. Integrating Landsat TM and
SRTM-DEM derived variables with decision trees for habitat classification and change
detection in complex neotropical environments 2145 – 2159.
Shakhnarovich, G., Darrell, T., Indyk, P. (Eds.), 2006. Nearest-Neighbor Methods in Learning
and Vision: Theory and Practice. The MIT Press.
Shimatani, K., Kawarasaki, S., Manabe, T., 2008. Describing size-related mortality and size
distribution by nonparametric estimation and model selection using the Akaike Bayesian
Information Criterion. 289 – 297.
Stehman, S.V., 1997. Selecting and interpreting measures of thematic classification accuracy.
Remote Sensing of Environment 62, 77–89.
Suchenwirth, L., Förster, M., Cierjacks, A., Lang, F., Kleinschmit, B., 2012. Knowledge-
based classification of remote sensing data for the estimation of below- and above-
ground organic carbon stocks in riparian forests. Wetlands Ecology and Management 20,
151–163.
Swaine, M.D., Whitmore, T.C., 1988. On the definition of ecological species groups in
tropical rain forests. Plant Ecology 75, 81–86.
Thessler, S., Ruokolainen, K., Tuomisto, H., Tomppo, E., 2005. Mapping gradual landscape-
scale floristic changes in Amazonian primary rain forests by combining ordination and
remote sensing. Global Ecology and Biogeography 14, 315–325.
Thessler, S., Sesnie, S., Bendaña, Z.S.R., Ruokolainen, K., Tomppo, E., Finegan, B., 2008.
Using k-nn and discriminant analyses to classify rain forest types in a Landsat TM image
over northern Costa Rica 2485 – 2494.
Thomas, N.E., Huang, C., Goward, S.N., Powell, S., Rishmawi, K., Schleeweis, K., Hinds,
A., 2011. Validation of North American Forest Disturbance dynamics derived from
Landsat time series stacks. Remote Sensing of Environment 115, 19–32.
Tokola, T., Pitkänen, J., Partinen, S., Muinonen, E., 1996. Point accuracy of a
non-parametric method in estimation of forest characteristics with different satellite
materials. International Journal of Remote Sensing 17, 2333–2351.
Torgo, L., 2010. Data Mining with R: Learning with Case Studies, 1st ed. Chapman and
Hall/CRC.
Triepke, F.J., Brewer, C.K., Leavell, D.M., Novak, S.J., 2008. Mapping forest alliances and
associations using fuzzy systems and nearest neighbor classifiers 1037 – 1050.
Trimble, 2011a. eCognition Developer - User Guide. Trimble Germany GmbH.
Trimble, 2011b. eCognition® Developer 8.64.1 Reference Book. Trimble Germany GmbH,
Trappentreustr. 1, D-80339 München, Germany.
UNESCO. 2008. “IDAMS Statistical Software.” IDAMS Statistical Software.
http://portal.unesco.org/ci/en/ev.php-
URL_ID=2070&URL_DO=DO_TOPIC&URL_SECTION=201.html.
Van Laar, A., Akça, A., 1997. Forest mensuration. Cuvillier, Göttingen.
Venables, W. N, and Brian D Ripley. 2010. Modern applied statistics with S. New York:
Springer.
Verhoeye, J., and R. De Wulf. 2002. Land Cover Mapping at Sub-pixel Scales Using Linear
Optimization Techniques. Remote Sensing of Environment 79 (1): 96–104.
Verikas, A., Gelzinis, A., Bacauskiene, M., 2011. Mining data with random forests: A survey
and results of new tests. Pattern Recognition 44, 330–349.
Verplancke, T., Van Looy, S., Benoit, D., Vansteelandt, S., Depuydt, P., De Turck, F.,
Decruyenaere, J., 2008. Support vector machine versus logistic regression modeling for
prediction of hospital mortality in critically ill patients with haematological malignancies.
BMC medical informatics and decision making 8, 56.
Walsh, S.J., McCleary, A.L., Mena, C.F., Shao, Y., Tuttle, J.P., González, A., Atkinson, R.,
2008. QuickBird and Hyperion data analysis of an invasive plant species in the
Galapagos Islands of Ecuador: Implications for control and land use management.
Remote Sensing of Environment 112, 1927–1941.
Wang, X., Niu, R., 2009. Spatial Forecast of Landslides in Three Gorges Based On Spatial
Data Mining 2035–2061.
Ward, E.J., 2008. A review and comparison of four commonly used Bayesian and maximum
likelihood model selection tools 1–10.
Waske, B., Braun, M., 2009. Classifier ensembles for land cover mapping using
multitemporal SAR imagery 450 – 457.
Watts, J.D., Lawrence, R.L., Miller, P.R., Montagne, C., 2009. Monitoring of cropland
practices for carbon sequestration purposes in north central Montana by Landsat remote
sensing 1843 – 1852.
Wikipedia contributors, 2012. Multinomial logit. Wikipedia, the free encyclopedia.
Witten, I.H., Frank, E., Hall, M.A., 2011. Data mining : practical machine learning tools and
techniques. Morgan Kaufmann, Burlington, MA.
Wolter, P.T., Mladenoff, D.J., Host, G.E., Crow, T.R., 1995. Improved forest classification in
the Northern Lake States using multi-temporal Landsat imagery. Photogrammetric
Engineering and Remote Sensing 61, 1129–1143.
Yang, W., Li, D., Zhu, L., 2011. An improved genetic algorithm for optimal feature subset
selection from multi-character feature set. Expert Systems with Applications 38, 2733–
2740.
Yang, Z.R., 2010. Machine learning approaches to bioinformatics. World Scientific,
Singapore; Hackensack, NJ.
Zhao, Y., M. Tomita, K. Hara, M. Fujihara, Y. Yang, and L. Da. 2011. Effects of topography
on status and changes in land-cover patterns, Chongqing City, China. Landscape and
Ecological Engineering (March 23). doi:10.1007/s11355-011-0155-2.
http://www.springerlink.com/index/10.1007/s11355-011-0155-2.
Appendix 1
Results of the tuning process for all of the scenes. The figures were rounded to three
decimal places.
Scene 192 026
Final Model Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 + SD4
Table 37: Accuracies obtained for the scene in path 192, row 026 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.19 3.12 4.06 5.00
2.00 0.685 0.7 0.7 0.7 0.704
3.50 0.704 0.71 0.712 0.712 0.711
5.00 0.709 0.709 0.708 0.708 0.709
6.50 0.707 0.704 0.704 0.704 0.704
8.00 0.707 0.707 0.707 0.707 0.707
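The way a parameter pair can be read off such a tuning grid is sketched below in Python, using the values of Table 37. This is only an illustration and not the code used in this work: the function name `best_cell` and the tie-breaking rule (smallest sigma first, then smallest C) are assumptions. With that rule the sketch selects sigma = 3.5 and C = 3.12.

```python
# Accuracy grid copied from Table 37 (scene 192 026):
# one row per sigma value, one column per C value.
c_values = [1.25, 2.19, 3.12, 4.06, 5.00]
accuracy_by_sigma = {
    2.00: [0.685, 0.700, 0.700, 0.700, 0.704],
    3.50: [0.704, 0.710, 0.712, 0.712, 0.711],
    5.00: [0.709, 0.709, 0.708, 0.708, 0.709],
    6.50: [0.707, 0.704, 0.704, 0.704, 0.704],
    8.00: [0.707, 0.707, 0.707, 0.707, 0.707],
}


def best_cell(c_values, accuracy_by_sigma):
    """Return (accuracy, sigma, C) of the most accurate grid cell.

    Assumed tie-breaking rule: the first maximum found while scanning
    sigma and then C in increasing order is kept.
    """
    best = None
    for sigma in sorted(accuracy_by_sigma):
        for c, accuracy in zip(c_values, accuracy_by_sigma[sigma]):
            if best is None or accuracy > best[0]:
                best = (accuracy, sigma, c)
    return best


best_accuracy, best_sigma, best_c = best_cell(c_values, accuracy_by_sigma)
print(best_accuracy, best_sigma, best_c)
```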
Scene 192 027
Final Model Y ~ B5 + B6 + B7 + NDVI + B4 + B1 + SD6 + SD4 + SD1 + SD7 + SD2
Table 38: Accuracies obtained for the scene in path 192, row 027 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.5 5 10 20 40 80 160
0.125 0.757 0.773 0.778 0.79 0.795 0.802 0.805 0.804
0.250 0.776 0.794 0.805 0.809 0.812 0.81 0.81 0.81
0.500 0.804 0.821 0.812 0.812 0.813 0.813 0.813 0.813
1.000 0.825 0.82 0.819 0.819 0.819 0.819 0.819 0.819
2.000 0.807 0.809 0.809 0.809 0.809 0.809 0.809 0.809
4.000 0.789 0.789 0.789 0.789 0.789 0.789 0.789 0.789
8.000 0.785 0.785 0.785 0.785 0.785 0.785 0.785 0.785
16.000 0.782 0.782 0.782 0.782 0.782 0.782 0.782 0.782
Scene 193 025
Final Model Y ~ B2
Table 39: Accuracies obtained for the scene in path 193, row 025 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.5 5 10 20 40 80 160
0.125 0.453 0.468 0.474 0.481 0.497 0.504 0.507 0.503
0.250 0.497 0.502 0.507 0.507 0.507 0.511 0.51 0.512
0.500 0.508 0.51 0.512 0.513 0.518 0.526 0.534 0.547
1.000 0.523 0.53 0.542 0.554 0.555 0.56 0.561 0.561
2.000 0.559 0.557 0.56 0.557 0.546 0.547 0.555 0.561
4.000 0.552 0.546 0.563 0.595 0.622 0.634 0.648 0.66
8.000 0.594 0.618 0.638 0.635 0.645 0.645 0.669 0.682
16.000 0.639 0.642 0.664 0.671 0.684 0.696 0.701 0.709
Scene 193 026
Final Model Y ~ NDVI + B5 + B4 + B7 + B2 + B6 + B1 + B3 + SD5 + SD7 + SD1 + SD4
Table 40: Accuracies obtained for the scene in path 193, row 026 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.5 5 10 20 40 80 160
0.125 0.655 0.67 0.687 0.702 0.707 0.71 0.712 0.708
0.250 0.679 0.697 0.712 0.718 0.716 0.716 0.716 0.716
0.500 0.71 0.725 0.722 0.721 0.723 0.723 0.723 0.723
1.000 0.729 0.728 0.727 0.727 0.727 0.727 0.727 0.727
2.000 0.732 0.731 0.731 0.731 0.731 0.731 0.731 0.731
4.000 0.713 0.713 0.713 0.713 0.713 0.713 0.713 0.713
8.000 0.694 0.694 0.694 0.694 0.694 0.694 0.694 0.694
16.000 0.692 0.692 0.692 0.692 0.692 0.692 0.692 0.692
Scene 193 027
Final Model Y ~ B5 + B7 + NDVI + B4 + B2 + SD1 + B3 + B1 + B6
Table 41: Accuracies obtained for the scene in path 193, row 027 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 2.19 3.12 4.06 5
0.125 0.733 0.749 0.759 0.764 0.764
0.219 0.769 0.766 0.760 0.758 0.758
0.312 0.770 0.763 0.761 0.760 0.757
0.406 0.770 0.761 0.761 0.755 0.754
0.5 0.770 0.763 0.758 0.754 0.751
Scene 194 025
Final Model Y ~ NDVI + B4 + B7 + B5 + SD5 + B2 + B3 + SD7 + B1 + B6 + SD4 + SD2
Table 42: Accuracies obtained for the scene in path 194, row 025 tuning the C and sigma parameters for the support
vector machines algorithm.
Accuracy C
sigma 1.25 1.56 1.88 2.19 2.5
0.250 0.696 0.7 0.701 0.7 0.7
0.438 0.703 0.703 0.706 0.703 0.697
0.625 0.701 0.698 0.694 0.692 0.691
0.812 0.698 0.698 0.698 0.696 0.698
1.000 0.699 0.701 0.7 0.7 0.696
Appendix 2
Results of the verification of the classification with the SVM algorithm. The figures were
rounded to two decimal places. The output of the classification with Support Vector Machines
(SVM) is compared with the National Forest Inventory data (NFI, labelled Nfi in the tables),
which is used as reference. The overall accuracy (OA) and, per class, the user's accuracy (UA)
and the producer's accuracy (PA) were calculated in percent.
Table 43: Confusion table for the scene in path 192, row 026 comparing the NFI plots with the group obtained by
SVM classification algorithm.
192026
        Nfi
SVM     1      2      3      Tot    UA%
1       93     14     1      108    86.11
2       14     55     0      69     79.71
3       7      14     31     52     59.62
Tot     114    83     32     229
PA%     81.58  66.27  96.88
OA%     78.17
kappa: 0.6503
Table 44: Confusion table for the scene in path 192, row 027 comparing the NFI plots with the group obtained by
SVM classification algorithm.
192027
        Nfi
SVM     1      2      3      Tot    UA%
1       62     10     0      72     86.11
2       6      69     0      75     92.00
3       2      2      5      9      55.56
Tot     70     81     5      156
PA%     88.57  85.19  100
OA%     87.18
kappa: 0.7632
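How the accuracy measures of these tables follow from the cell counts is sketched below in Python for the scene in path 192, row 026 (Table 43). It is a minimal illustration, not the code used in this work (the verification itself was produced with R); the function name `accuracy_measures` is an assumption. Rows are taken as the SVM classes and columns as the NFI reference classes.

```python
# Confusion table of scene 192 026 (Table 43):
# rows = SVM classes 1..3, columns = NFI reference classes 1..3.
matrix = [
    [93, 14, 1],
    [14, 55, 0],
    [7, 14, 31],
]


def accuracy_measures(matrix):
    """Return overall accuracy, user's and producer's accuracies, and kappa."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / n
    users = [diagonal[i] / row_totals[i] for i in range(k)]      # UA per class
    producers = [diagonal[j] / col_totals[j] for j in range(k)]  # PA per class

    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa


oa, ua, pa, kappa = accuracy_measures(matrix)
print(round(100 * oa, 2), round(kappa, 4))
```

For this table the sketch reproduces OA = 78.17 % and kappa = 0.6503.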
Table 45: Confusion table for the scene in path 193, row 026 comparing the NFI plots with the group obtained by
SVM classification algorithm.
193026
        Nfi
SVM     1      2      3      Tot    UA%
1       179    32     0      211    84.83
2       29     74     0      103    71.84
3       13     3      31     47     65.96
Tot     221    109    31     361
PA%     81.00  67.89  100.00
OA%     78.67
kappa: 0.6085
Table 46: Confusion table for the scene in path 193, row 025 comparing the NFI plots with the group obtained by
SVM classification algorithm.
193025
        Nfi
SVM     1      2      3      Tot    UA%
1       14     0      0      14     100
2       4      11     1      16     68.75
3       11     1      4      16     25.00
Tot     29     12     5      46
PA%     48.28  91.67  80.00
OA%     63.04
kappa: 0.4562
Table 47: Confusion table for the scene in path 193, row 027 comparing the NFI plots with the group obtained by
SVM classification algorithm.
193027
        Nfi
SVM     1      2      3      Tot    UA%
1       125    43     0      168    74.40
2       22     120    0      142    84.51
3       0      0      3      3      100.00
Tot     147    163    3      313
PA%     85.03  73.62  100
OA%     79.23
kappa: 0.5941
Table 48: Confusion table for the scene in path 194, row 025 comparing the NFI plots with the group obtained by
SVM classification algorithm.
194025
        Nfi
SVM     1      2      3      Tot    UA%
1       93     14     1      108    86.11
2       14     55     0      69     79.71
3       7      14     31     52     59.62
Tot     114    83     32     229
PA%     81.58  66.27  96.88
OA%     78.17
kappa: 0.6503
Appendix 3
Messages and contingency tables printed by the R statistical language after the validation of
the SVM classification, for all of the scenes.
Scene path 192, row 026
SV type: C-svc (classification)
parameter : cost C = 3.125
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 3.5
Number of Support Vectors : 529
Objective Function Value : -423.0207 -180.4852 -270.6232
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 93 14 1
2 14 55 0
3 7 14 31
Overall Statistics
Accuracy : 0.7817
95% CI : (0.7225, 0.8334)
No Information Rate : 0.4978
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6503
Mcnemar's Test P-Value : 0.0003468
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8158 0.6627 0.9688
Specificity 0.8696 0.9041 0.8934
Pos Pred Value 0.8611 0.7971 0.5962
Neg Pred Value 0.8264 0.8250 0.9944
Prevalence 0.4978 0.3624 0.1397
Detection Rate 0.4061 0.2402 0.1354
Detection Prevalence 0.4716 0.3013 0.2271
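The per-class statistics printed above can be traced back to the contingency table with a one-versus-rest count of true and false positives and negatives. The Python sketch below illustrates this for scene path 192, row 026; it is not the R code that produced the output, and the function name `class_statistics` is an assumption. Note that "Sensitivity" corresponds to the producer's accuracy and "Pos Pred Value" to the user's accuracy of Appendix 2.

```python
# Contingency table of scene path 192, row 026:
# rows = prediction, columns = reference.
matrix = [
    [93, 14, 1],
    [14, 55, 0],
    [7, 14, 31],
]


def class_statistics(matrix, index):
    """One-versus-rest statistics for the class at position `index` (0-based)."""
    n = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    fp = sum(matrix[index]) - tp                 # predicted as the class, reference differs
    fn = sum(row[index] for row in matrix) - tp  # reference is the class, prediction differs
    tn = n - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
    }


class_1 = class_statistics(matrix, 0)
for name, value in class_1.items():
    print(name, round(value, 4))
```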
Scene path 192, row 027
SV type: C-svc (classification)
parameter : cost C = 1.25
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 1
Number of Support Vectors : 407
Objective Function Value : -240.2249 -30.996 -38.3275
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 62 10 0
2 6 69 0
3 2 2 5
Overall Statistics
Accuracy : 0.8718
95% CI : (0.809, 0.9199)
No Information Rate : 0.5192
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7632
Mcnemar's Test P-Value : 0.1718
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8857 0.8519 1.00000
Specificity 0.8837 0.9200 0.97351
Pos Pred Value 0.8611 0.9200 0.55556
Neg Pred Value 0.9048 0.8519 1.00000
Prevalence 0.4487 0.5192 0.03205
Detection Rate 0.3974 0.4423 0.03205
Detection Prevalence 0.4615 0.4808 0.05769
Scene path 193, row 025
SV type: C-svc (classification)
parameter : cost C = 160
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 16
Number of Support Vectors : 521
Objective Function Value : -39795.43 -36822.98 -18453.12
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 14 0 0
2 4 11 1
3 11 1 4
Overall Statistics
Accuracy : 0.6304
95% CI : (0.4755, 0.7679)
No Information Rate : 0.6304
P-Value [Acc > NIR] : 0.565738
Kappa : 0.4562
Mcnemar's Test P-Value : 0.001817
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.4828 0.9167 0.80000
Specificity 1.0000 0.8529 0.70732
Pos Pred Value 1.0000 0.6875 0.25000
Neg Pred Value 0.5312 0.9667 0.96667
Prevalence 0.6304 0.2609 0.10870
Detection Rate 0.3043 0.2391 0.08696
Detection Prevalence 0.3043 0.3478 0.34783
Scene path 193, row 026
SV type: C-svc (classification)
parameter : cost C = 1.25
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 2
Number of Support Vectors : 591
Objective Function Value : -248.9349 -138.705 -104.0795
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 179 32 0
2 29 74 0
3 13 3 31
Overall Statistics
Accuracy : 0.7867
95% CI : (0.7408, 0.8278)
No Information Rate : 0.6122
P-Value [Acc > NIR] : 1.009e-12
Kappa : 0.6085
Mcnemar's Test P-Value : 0.001058
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8100 0.6789 1.00000
Specificity 0.7714 0.8849 0.95152
Pos Pred Value 0.8483 0.7184 0.65957
Neg Pred Value 0.7200 0.8643 1.00000
Prevalence 0.6122 0.3019 0.08587
Detection Rate 0.4958 0.2050 0.08587
Detection Prevalence 0.5845 0.2853 0.13019
Scene path 193, row 027
SV type: C-svc (classification)
parameter : cost C = 1.25
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 1.75
Number of Support Vectors : 481
Objective Function Value : -284.3848 -27.0461 -27.8725
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 125 43 0
2 22 120 0
3 0 0 3
Overall Statistics
Accuracy : 0.7923
95% CI : (0.7431, 0.8359)
No Information Rate : 0.5208
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.5941
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8503 0.7362 1.000000
Specificity 0.7410 0.8533 1.000000
Pos Pred Value 0.7440 0.8451 1.000000
Neg Pred Value 0.8483 0.7485 1.000000
Prevalence 0.4696 0.5208 0.009585
Detection Rate 0.3994 0.3834 0.009585
Detection Prevalence 0.5367 0.4537 0.009585
Scene path 194, row 025
SV type: C-svc (classification)
parameter : cost C = 3.125
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 3.5
Number of Support Vectors : 529
Objective Function Value : -423.0207 -180.4852 -270.6232
Probability model included.
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 93 14 1
2 14 55 0
3 7 14 31
Overall Statistics
Accuracy : 0.7817
95% CI : (0.7225, 0.8334)
No Information Rate : 0.4978
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6503
Mcnemar's Test P-Value : 0.0003468
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8158 0.6627 0.9688
Specificity 0.8696 0.9041 0.8934
Pos Pred Value 0.8611 0.7971 0.5962
Neg Pred Value 0.8264 0.8250 0.9944
Prevalence 0.4978 0.3624 0.1397
Detection Rate 0.4061 0.2402 0.1354
Detection Prevalence 0.4716 0.3013 0.2271
Appendix 4
Differences in accuracy found among the classifications made with ML algorithms for the scenes
under consideration. The figures are presented as fractions on a scale from 0 (worst accuracy)
to 1 (best accuracy).
#######################
Scene path 192, row 026
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.07034    0.13193    0.12150      0.14441
rf          < 2.2e-16             0.06159    0.05116      0.07407
kNN         < 2.2e-16  < 2.2e-16             -0.01042     0.01248
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.02291
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
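The "p-value adjustment: bonferroni" line means that each pairwise p-value is multiplied by the number of comparisons and capped at 1. A minimal Python sketch of that adjustment follows; the raw p-values are hypothetical, and it is assumed that the adjustment runs over the 10 model pairs that 5 models produce.

```python
from math import comb


def bonferroni(p_values, n_comparisons=None):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]


n_models = 5                   # svm, rf, kNN, CondInfTree, mnl
n_pairs = comb(n_models, 2)    # number of pairwise comparisons

raw = [0.001, 0.02, 0.3]       # hypothetical unadjusted p-values
adjusted = bonferroni(raw, n_pairs)
print(n_pairs, adjusted)
```

Because the correction only ever enlarges a p-value, an adjusted value below 2.2e-16 implies an even smaller unadjusted one.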
#######################
Scene path 192, row 027
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.0242989  0.0615149  0.0613276    0.1121391
rf          < 2.2e-16             0.0372160  0.0370287    0.0878402
kNN         < 2.2e-16  < 2.2e-16             -0.0001873   0.0506242
CondInfTree < 2.2e-16  < 2.2e-16  3.283e-11               0.0508115
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
#######################
Scene path 193, row 025
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.11698   -0.18045   -0.13540     0.13594
rf          < 2.2e-16             -0.06347   -0.01842     0.25292
kNN         < 2.2e-16  < 2.2e-16             0.04504      0.31639
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.27134
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
#######################
Scene path 193, row 026
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.037946   0.108329   0.112928     0.097967
rf          < 2.2e-16             0.070384   0.074982     0.060022
kNN         < 2.2e-16  < 2.2e-16             0.004598     -0.010362
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               -0.014960
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
#######################
Scene path 193, row 027
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.019735  0.032851   0.024050     0.127233
rf          < 2.2e-16             0.052586   0.043785     0.146968
kNN         < 2.2e-16  < 2.2e-16             -0.008801    0.094382
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.103184
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
#######################
Scene path 194, row 025
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.0038379  0.0418302  0.0581640    0.0038152
rf          <2e-16                0.0379923  0.0543260    -0.0000227
kNN         <2e-16     <2e-16                0.0163337    -0.0380150
CondInfTree <2e-16     <2e-16     <2e-16                  -0.0543487
mnl         <2e-16     <2e-16     <2e-16     <2e-16
Appendix 5
Messages printed by R while processing the classifiers with
the best accuracy.
***************************************************************************
nfi_full_pam_z4_xyNg_img_192_26
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 891000
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.6292 0.6891 0.7041 0.7056 0.7303 0.7640
rf 0.5356 0.6217 0.6386 0.6353 0.6554 0.6891
kNN 0.5131 0.5609 0.5730 0.5737 0.5843 0.6217
CondInfTree 0.4831 0.5693 0.5843 0.5841 0.6105 0.6442
mnl 0.5206 0.5393 0.5618 0.5612 0.5777 0.6030
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.4438 0.5337 0.5562 0.5585 0.5955 0.6461
rf 0.3034 0.4326 0.4579 0.4530 0.4831 0.5337
kNN 0.2697 0.3413 0.3596 0.3606 0.3764 0.4326
CondInfTree 0.2247 0.3539 0.3764 0.3762 0.4157 0.4663
mnl 0.2809 0.3090 0.3427 0.3419 0.3666 0.4045
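Each row of the two tables above condenses the resampled values of one model into six figures. The following Python sketch shows how such a row can be computed; the input values are hypothetical, and the quartiles use linear interpolation between order statistics, which is assumed to match the default quantile definition in R.

```python
import statistics


def six_number_summary(values):
    """Min., quartiles, mean and Max. of a list of resampled metric values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical Kappa values from five resamples of one model.
resampled_kappa = [0.50, 0.52, 0.55, 0.58, 0.60]
summary = six_number_summary(resampled_kappa)
print(summary)
```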
******** Differences in Kappa
Call:
summary.diff.resamples(object = diferencias)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    0.10551    0.19789    0.18225      0.21662
rf          < 2.2e-16             0.09238    0.07674      0.11111
kNN         < 2.2e-16  < 2.2e-16             -0.01564     0.01873
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.03436
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
Call:
summary.diff.resamples(object = diferenciasA)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.07034    0.13193    0.12150      0.14441
rf          < 2.2e-16             0.06159    0.05116      0.07407
kNN         < 2.2e-16  < 2.2e-16             -0.01042     0.01248
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.02291
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
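The upper-diagonal estimates are the mean paired differences between models (row model minus column model), and the mean of paired differences equals the difference of the means. They can therefore be checked against the Mean column of the Accuracy summary for this scene. The Python sketch below does so using the rounded means printed earlier, so agreement is only expected to about four decimals.

```python
from itertools import combinations

# Mean accuracy per model, copied from the Accuracy summary of this scene.
mean_accuracy = {
    "svm": 0.7056,
    "rf": 0.6353,
    "kNN": 0.5737,
    "CondInfTree": 0.5841,
    "mnl": 0.5612,
}

# One estimate per model pair, in the row-minus-column order of the table.
differences = {
    (row, col): mean_accuracy[row] - mean_accuracy[col]
    for row, col in combinations(mean_accuracy, 2)
}

for (row, col), diff in differences.items():
    print(f"{row} - {col}: {diff:+.4f}")
```

A positive entry means that the row model was more accurate on average than the column model; here the SVM row is positive throughout.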
************ Tuning SVM ************
Y ~ NDVI + B7 + B1 + B3 + B5 + B4 + B6 + SD7 + SD3 + SD6 + SD2 +
SD4
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 3.12 and sigma = 3.5.
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 3.125
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 3.5
Number of Support Vectors : 529
Objective Function Value : -423.0207 -180.4852 -270.6232
Probability model included.
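The hyperparameter sigma controls how quickly the Gaussian radial basis kernel decays with the distance between two samples. The sketch below evaluates the kernel in the form k(x, y) = exp(-sigma * ||x - y||^2), which is the parameterisation used by kernlab; the two feature vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian radial basis kernel: exp(-sigma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)


# Two hypothetical samples after centering and scaling.
sample_a = [0.2, -0.5, 1.0]
sample_b = [0.1, -0.4, 0.7]

print(rbf_kernel(sample_a, sample_b, sigma=3.5))
print(rbf_kernel(sample_a, sample_a, sigma=3.5))  # identical samples give 1.0
```

A larger sigma makes the kernel more local, so the decision boundary follows individual training samples more closely.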
************* Regression *************
************* Contingency tables *************
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 93 14 1
2 14 55 0
3 7 14 31
Overall Statistics
Accuracy : 0.7817
95% CI : (0.7225, 0.8334)
No Information Rate : 0.4978
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6503
Mcnemar's Test P-Value : 0.0003468
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8158 0.6627 0.9688
Specificity 0.8696 0.9041 0.8934
Pos Pred Value 0.8611 0.7971 0.5962
Neg Pred Value 0.8264 0.8250 0.9944
Prevalence 0.4978 0.3624 0.1397
Detection Rate 0.4061 0.2402 0.1354
Detection Prevalence 0.4716 0.3013 0.2271
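For a three-class table, the McNemar's Test P-Value is understood here to come from the generalised symmetry test that R applies to square tables larger than 2 x 2: each pair of off-diagonal cells is compared and the statistic is referred to a chi-squared distribution with 3 degrees of freedom. The Python sketch below is written under that assumption and is not the R code used in this work. It also shows why NA is reported when a pair of classes is never confused in either direction.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def symmetry_test(m):
    """Symmetry test for a 3 x 3 table; returns (statistic, p-value)."""
    statistic = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            off_diagonal = m[i][j] + m[j][i]
            if off_diagonal == 0:
                # 0/0 term: the statistic is undefined, reported as NA in R.
                return float("nan"), float("nan")
            statistic += (m[i][j] - m[j][i]) ** 2 / off_diagonal
    return statistic, chi2_upper_tail_df3(statistic)


# Confusion matrix printed above (rows = prediction, columns = reference).
matrix = [[93, 14, 1], [14, 55, 0], [7, 14, 31]]
# Matrix from Appendix 3 for which the test is reported as NA.
matrix_with_empty_pair = [[125, 43, 0], [22, 120, 0], [0, 0, 3]]

statistic, p_value = symmetry_test(matrix)
print(statistic, p_value)
print(symmetry_test(matrix_with_empty_pair))
```

A small p-value indicates that the errors are not symmetric; in the matrix above, class 3 is predicted for reference classes 1 and 2 far more often than the reverse.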
***************************************************************************
nfi_ForestBoudBuf45_xy_img192027Segm
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 2073600
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.7228 0.7828 0.7978 0.7985 0.8165 0.8502
rf 0.6854 0.7556 0.7828 0.7742 0.8015 0.8240
kNN 0.6854 0.7219 0.7378 0.7370 0.7537 0.8127
CondInfTree 0.6704 0.7191 0.7416 0.7372 0.7603 0.7903
mnl 0.6517 0.6742 0.6854 0.6864 0.6938 0.7191
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.5843 0.6742 0.6966 0.6978 0.7247 0.7753
rf 0.5281 0.6334 0.6742 0.6613 0.7022 0.7360
kNN 0.5281 0.5829 0.6067 0.6055 0.6306 0.7191
CondInfTree 0.5056 0.5787 0.6124 0.6058 0.6404 0.6854
mnl 0.4775 0.5112 0.5281 0.5296 0.5407 0.5787
******** Differences in Kappa
Call:
summary.diff.resamples(object = diferencias)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    0.0364484  0.0922724  0.0919915    0.1682087
rf          < 2.2e-16             0.0558240  0.0555431    0.1317603
kNN         < 2.2e-16  < 2.2e-16             -0.0002809   0.0759363
CondInfTree < 2.2e-16  < 2.2e-16  3.283e-11               0.0762172
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
Call:
summary.diff.resamples(object = diferenciasA)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.0242989  0.0615149  0.0613276    0.1121391
rf          < 2.2e-16             0.0372160  0.0370287    0.0878402
kNN         < 2.2e-16  < 2.2e-16             -0.0001873   0.0506242
CondInfTree < 2.2e-16  < 2.2e-16  3.283e-11               0.0508115
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
************ Tuning SVM ************
Y ~ B5 + B6 + B7 + NDVI + B4 + B1 + SD6 + SD4 + SD1 + SD7 + SD2
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1.25 and sigma = 1.
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1.25
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 1
Number of Support Vectors : 407
Objective Function Value : -240.2249 -30.996 -38.3275
Probability model included.
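The tuning output above mentions two preparation steps: the predictors are centered and scaled, and the model is evaluated on 10 repeated train/test splits with a training share of 0.66. The Python sketch below illustrates both steps under simplifying assumptions: the split is a plain random split (a stratified split by class may have been used in practice), the seed and sample size are arbitrary, and the function names are hypothetical.

```python
import random
import statistics


def repeated_splits(n_samples, reps=10, train_fraction=0.66, seed=0):
    """Return `reps` random (train indices, test indices) pairs."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    splits = []
    for _ in range(reps):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits


def center_scale(train_values, values):
    """Standardise `values` with the mean and standard deviation of the training data."""
    mean = statistics.fmean(train_values)
    sd = statistics.stdev(train_values)
    return [(v - mean) / sd for v in values]


splits = repeated_splits(n_samples=50)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
print(center_scale([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Taking the centering and scaling constants from the training part only keeps the held-out samples from influencing the fitted model.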
************* Regression *************
************* Contingency tables *************
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 62 10 0
2 6 69 0
3 2 2 5
Overall Statistics
Accuracy : 0.8718
95% CI : (0.809, 0.9199)
No Information Rate : 0.5192
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7632
Mcnemar's Test P-Value : 0.1718
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8857 0.8519 1.00000
Specificity 0.8837 0.9200 0.97351
Pos Pred Value 0.8611 0.9200 0.55556
Neg Pred Value 0.9048 0.8519 1.00000
Prevalence 0.4487 0.5192 0.03205
Detection Rate 0.3974 0.4423 0.03205
Detection Prevalence 0.4615 0.4808 0.05769
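The statistics by class treat each class in turn as the positive class and all other classes as negative. The following Python sketch recomputes them from the confusion matrix above, assuming rows are predictions and columns are reference data; the function name is illustrative.

```python
# Confusion matrix printed above: rows = predicted class, columns = reference class.
matrix = [
    [62, 10, 0],
    [6, 69, 0],
    [2, 2, 5],
]


def per_class_statistics(m, k):
    """One-versus-rest statistics for class index k (0-based)."""
    n = len(m)
    total = sum(sum(row) for row in m)
    tp = m[k][k]                                # predicted k, reference k
    fp = sum(m[k]) - tp                         # predicted k, reference other
    fn = sum(m[i][k] for i in range(n)) - tp    # predicted other, reference k
    tn = total - tp - fp - fn
    return {
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Pos Pred Value": tp / (tp + fp),
        "Neg Pred Value": tn / (tn + fn),
        "Prevalence": (tp + fn) / total,
        "Detection Rate": tp / total,
        "Detection Prevalence": (tp + fp) / total,
    }


class_1 = per_class_statistics(matrix, 0)
class_3 = per_class_statistics(matrix, 2)
for name, value in class_1.items():
    print(f"{name}: {value:.4f}")
```

For class 3 the sensitivity is 1 because all five reference samples were found, while the low positive predictive value reflects the four samples of other classes that were also assigned to it.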
***************************************************************************
pam_BoudBuf45_img193025seg
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 414720
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.4175 0.5084 0.5556 0.5670 0.6263 0.7475
rf 0.6431 0.6726 0.6818 0.6840 0.7020 0.7407
kNN 0.6364 0.7029 0.7306 0.7475 0.7946 0.8788
CondInfTree 0.4882 0.6566 0.7189 0.7024 0.7677 0.8316
mnl 0.3872 0.4242 0.4293 0.4311 0.4478 0.4714
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.08539 0.2633 0.3364 0.3534 0.4442 0.6234
rf 0.45830 0.5016 0.5169 0.5208 0.5471 0.6071
kNN 0.46650 0.5603 0.6024 0.6267 0.6957 0.8194
CondInfTree 0.20500 0.4947 0.5750 0.5555 0.6515 0.7466
mnl 0.05321 0.1115 0.1192 0.1205 0.1453 0.1840
******** Differences in Kappa
Call:
summary.diff.resamples(object = diferencias)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.16738   -0.27330   -0.20204     0.23291
rf          < 2.2e-16             -0.10592   -0.03466     0.40029
kNN         < 2.2e-16  < 2.2e-16             0.07126      0.50621
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.43495
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.11698   -0.18045   -0.13540     0.13594
rf          < 2.2e-16             -0.06347   -0.01842     0.25292
kNN         < 2.2e-16  < 2.2e-16             0.04504      0.31639
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.27134
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
************ Tuning kNN ************
Y ~ B2
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 3.
3-nearest neighbor classification model
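A 3-nearest neighbor model assigns each sample the class that is most frequent among its three closest training samples. The Python sketch below illustrates this for a single predictor, as in the formula Y ~ B2; the training values are hypothetical, and ties are resolved by first occurrence, which may differ from the tie handling of the R implementation.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (standardised B2 value, class label) training pairs.
train = [
    (-1.2, 1), (-0.9, 1), (-0.8, 1),
    (0.1, 2), (0.3, 2), (0.4, 2),
    (1.5, 3), (1.7, 3),
]

print(knn_predict(train, -1.0))
print(knn_predict(train, 0.2))
print(knn_predict(train, 1.6))
```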
************* Regression *************
************* Contingency tables *************
Training set class distribution:
1 2 3
333 282 264
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 20 0 0
2 4 11 0
3 5 1 5
Overall Statistics
Accuracy : 0.7826
95% CI : (0.6364, 0.8905)
No Information Rate : 0.6304
P-Value [Acc > NIR] : 0.02062
Kappa : 0.6464
Mcnemar's Test P-Value : 0.01857
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.6897 0.9167 1.0000
Specificity 1.0000 0.8824 0.8537
Pos Pred Value 1.0000 0.7333 0.4545
Neg Pred Value 0.6538 0.9677 1.0000
Prevalence 0.6304 0.2609 0.1087
Detection Rate 0.4348 0.2391 0.1087
Detection Prevalence 0.4348 0.3261 0.2391
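The No Information Rate is the share of the most frequent reference class, that is, the accuracy obtained by always predicting that class. P-Value [Acc > NIR] is understood here as a one-sided exact binomial test of whether the observed accuracy exceeds that rate. The following Python sketch is written under that assumption and is not the R code used in this work.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Confusion matrix printed above: rows = predicted class, columns = reference class.
matrix = [
    [20, 0, 0],
    [4, 11, 0],
    [5, 1, 5],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(3))
reference_totals = [sum(matrix[i][j] for i in range(3)) for j in range(3)]

no_information_rate = max(reference_totals) / total
p_value = binomial_upper_tail(correct, total, no_information_rate)
print(round(no_information_rate, 4), p_value)
```

With only 46 validation samples the test is far less decisive than for the larger validation sets of the other scenes.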
***************************************************************************
nfi_Buf45_img193026
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 2280960
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.6180 0.6957 0.7116 0.7103 0.7303 0.7640
rf 0.5843 0.6554 0.6779 0.6723 0.7004 0.7191
kNN 0.5468 0.5730 0.5918 0.6019 0.6292 0.6667
CondInfTree 0.4944 0.5730 0.5993 0.5973 0.6217 0.6742
mnl 0.5618 0.5908 0.6105 0.6123 0.6339 0.6554
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.4270 0.5435 0.5674 0.5654 0.5955 0.6461
rf 0.3764 0.4831 0.5169 0.5085 0.5506 0.5787
kNN 0.3202 0.3596 0.3876 0.4029 0.4438 0.5000
CondInfTree 0.2416 0.3596 0.3989 0.3960 0.4326 0.5112
mnl 0.3427 0.3862 0.4157 0.4184 0.4508 0.4831
******** Differences in Kappa
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    0.056919   0.162494   0.169392     0.146951
rf          < 2.2e-16             0.105575   0.112473     0.090032
kNN         < 2.2e-16  < 2.2e-16             0.006898     -0.015543
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               -0.022441
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.037946   0.108329   0.112928     0.097967
rf          < 2.2e-16             0.070384   0.074982     0.060022
kNN         < 2.2e-16  < 2.2e-16             0.004598     -0.010362
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               -0.014960
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
************ Tuning SVM ************
Y ~ NDVI + B5 + B4 + B7 + B2 + B6 + B1 + B3 + SD5 + SD7 + SD1 +
SD4
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1.25 and sigma = 2.
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1.25
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 2
Number of Support Vectors : 591
Objective Function Value : -248.9349 -138.705 -104.0795
Probability model included.
************* Regression *************
************* Contingency tables *************
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 179 32 0
2 29 74 0
3 13 3 31
Overall Statistics
Accuracy : 0.7867
95% CI : (0.7408, 0.8278)
No Information Rate : 0.6122
P-Value [Acc > NIR] : 1.009e-12
Kappa : 0.6085
Mcnemar's Test P-Value : 0.001058
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8100 0.6789 1.00000
Specificity 0.7714 0.8849 0.95152
Pos Pred Value 0.8483 0.7184 0.65957
Neg Pred Value 0.7200 0.8643 1.00000
Prevalence 0.6122 0.3019 0.08587
Detection Rate 0.4958 0.2050 0.08587
Detection Prevalence 0.5845 0.2853 0.13019
***************************************************************************
nfi_BoudBuf45_img193027seg
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 648000
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.7266 0.7528 0.7678 0.7653 0.7790 0.8090
rf 0.7378 0.7640 0.7790 0.7787 0.7940 0.8127
kNN 0.6891 0.7116 0.7266 0.7267 0.7416 0.7790
CondInfTree 0.6367 0.7228 0.7341 0.7355 0.7566 0.7828
mnl 0.5918 0.6142 0.6273 0.6323 0.6517 0.6779
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.5899 0.6292 0.6517 0.6480 0.6685 0.7135
rf 0.6067 0.6461 0.6685 0.6680 0.6910 0.7191
kNN 0.5337 0.5674 0.5899 0.5900 0.6124 0.6685
CondInfTree 0.4551 0.5843 0.6011 0.6032 0.6348 0.6742
mnl 0.3876 0.4213 0.4410 0.4484 0.4775 0.5169
******** Differences in Kappa
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.01998   0.05802    0.04482      0.19959
rf          < 2.2e-16             0.07800    0.06479      0.21957
kNN         < 2.2e-16  < 2.2e-16             -0.01320     0.14157
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.15478
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    -0.013318  0.038679   0.029878     0.133061
rf          < 2.2e-16             0.051998   0.043196     0.146380
kNN         < 2.2e-16  < 2.2e-16             -0.008801    0.094382
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.103184
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
************ Tuning RF ************
Y ~ B5 + B7 + NDVI + B4 + B2 + SD1 + B3 + B1 + B6
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 3.
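The parameter mtry is the number of predictors drawn at random as split candidates at each node of each tree in the random forest. The Python sketch below illustrates that sampling step for the nine predictors of the formula above; the helper name and the seed are hypothetical. With nine predictors, the common default for classification, the floor of the square root of the number of predictors, is also 3.

```python
import math
import random

# Predictors taken from the model formula above.
predictors = ["B5", "B7", "NDVI", "B4", "B2", "SD1", "B3", "B1", "B6"]


def candidate_features(predictors, mtry, rng):
    """Draw the mtry predictors considered at one tree node (without replacement)."""
    return rng.sample(predictors, mtry)


default_mtry = math.floor(math.sqrt(len(predictors)))
rng = random.Random(0)  # arbitrary seed, for a reproducible illustration only
candidates = candidate_features(predictors, 3, rng)
print(default_mtry, candidates)
```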
************* Regression *************
************* Contingency tables *************
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 102 45 0
2 42 108 0
3 3 10 3
Overall Statistics
Accuracy : 0.6805
95% CI : (0.6257, 0.7318)
No Information Rate : 0.5208
P-Value [Acc > NIR] : 7.012e-09
Kappa : 0.3965
Mcnemar's Test P-Value : 0.004418
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.6939 0.6626 1.000000
Specificity 0.7289 0.7200 0.958065
Pos Pred Value 0.6939 0.7200 0.187500
Neg Pred Value 0.7289 0.6626 1.000000
Prevalence 0.4696 0.5208 0.009585
Detection Rate 0.3259 0.3450 0.009585
Detection Prevalence 0.4696 0.4792 0.051118
***************************************************************************
nfi_Buf45_img194025segm
***************************************************************************
******** Summary ********
Models: svm, rf, kNN, CondInfTree, mnl
Number of resamples: 891000
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.6292 0.6891 0.7041 0.7056 0.7303 0.7640
rf 0.5356 0.6217 0.6386 0.6353 0.6554 0.6891
kNN 0.5131 0.5609 0.5730 0.5737 0.5843 0.6217
CondInfTree 0.4831 0.5693 0.5843 0.5841 0.6105 0.6442
mnl 0.5206 0.5393 0.5618 0.5612 0.5777 0.6030
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max.
svm 0.4438 0.5337 0.5562 0.5585 0.5955 0.6461
rf 0.3034 0.4326 0.4579 0.4530 0.4831 0.5337
kNN 0.2697 0.3413 0.3596 0.3606 0.3764 0.4326
CondInfTree 0.2247 0.3539 0.3764 0.3762 0.4157 0.4663
mnl 0.2809 0.3090 0.3427 0.3419 0.3666 0.4045
******** Differences in Kappa
Call:
summary.diff.resamples(object = diferencias)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Kappa
            svm        rf         kNN        CondInfTree  mnl
svm                    0.10551    0.19789    0.18225      0.21662
rf          < 2.2e-16             0.09238    0.07674      0.11111
kNN         < 2.2e-16  < 2.2e-16             -0.01564     0.01873
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.03436
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
******** Differences in Accuracy
Call:
summary.diff.resamples(object = diferenciasA)
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
Accuracy
            svm        rf         kNN        CondInfTree  mnl
svm                    0.07034    0.13193    0.12150      0.14441
rf          < 2.2e-16             0.06159    0.05116      0.07407
kNN         < 2.2e-16  < 2.2e-16             -0.01042     0.01248
CondInfTree < 2.2e-16  < 2.2e-16  < 2.2e-16               0.02291
mnl         < 2.2e-16  < 2.2e-16  < 2.2e-16  < 2.2e-16
************ Tuning SVM ************
Pre-processing: centered, scaled
Resampling: Repeated Train/Test Splits (10 reps, 0.66%)
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 3.12 and sigma = 3.5.
************* Regression *************
************* Contingency tables *************
Confusion Matrix and Statistics
Reference
Prediction 1 2 3
1 93 14 1
2 14 55 0
3 7 14 31
Overall Statistics
Accuracy : 0.7817
95% CI : (0.7225, 0.8334)
No Information Rate : 0.4978
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6503
Mcnemar's Test P-Value : 0.0003468
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8158 0.6627 0.9688
Specificity 0.8696 0.9041 0.8934
Pos Pred Value 0.8611 0.7971 0.5962
Neg Pred Value 0.8264 0.8250 0.9944
Prevalence 0.4978 0.3624 0.1397
Detection Rate 0.4061 0.2402 0.1354
Detection Prevalence 0.4716 0.3013 0.2271