ANALYSIS OF SATELLITE IMAGES TO TRACK DEFORESTATION

A Degree Thesis Submitted to the Faculty of the Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona, Universitat Politècnica de Catalunya

by Irene Šimić de Torres

In partial fulfilment of the requirements for the degree in SCIENCE AND TELECOMMUNICATION TECHNOLOGIES ENGINEERING

Advisors: Philippe Salembier, Andres Pérez Uribe

Barcelona, June 2016
Abstract

Deforestation around the world, especially in the tropics, is a pressing problem that is not being monitored appropriately, leading to belated reactions by environmental organizations and governments and to new deforestation fronts every year.

A project called Terra-I was created a few years ago with the aim of changing this by carrying out a temporal analysis of satellite images from NASA's MODIS satellite, providing near-real-time information and accelerating the reaction to new deforestation fronts.

At first this was done only for the Amazonian forest and the whole of South America, but the aim is to expand it to the whole tropics.

Thanks to the launch of the Landsat 8 satellite, which provides improved satellite images, this project proposes adding a spatial analysis to improve the results of the deforestation detection carried out by Terra-I, with the aid of machine learning algorithms.
For those who were by my side during these last 5 years
Acknowledgements

I would like to thank my advisors, Philippe Salembier and Andres Pérez, who never hesitated to help me develop this project whenever I needed advice and who made it what it became, and Julien Rebetez, for guiding me along these last five months, always with a smile, and for helping me out of any trouble I ran into during the project.

I would also like to thank all the new friends I made in Switzerland, who were going through the same thing and with whom I spent countless afternoons working in the office. You made the work easier and I am really thankful for that.

Finally, I would like to thank my family and my partner, who were always there to cheer me up when everything was upside down.

Thank you all for these amazing five months that you have been there for me.
Revision history: document revisions dated 10/07/2016 and 15/07/2016.
Table of contents

List of Figures
List of Tables
1.3. Workplan
1.3.2. Gantt Diagram
2.1. Machine Learning
3.1.1. Libraries
3.2. Data
3.3.1. Optimal model
3.3.1.2. Classification quality estimation
3.3.1.3. Definition of the model
4. Results
4.1. Training with the final model and Logistic Regression model
4.1.1. Training with upper half
4.1.1.1. The model performed during this project
4.1.1.2. Logistic Regression
4.1.2.1. The model performed during this project
4.1.2.2. Logistic Regression
4.2.2. Detections
5. Budget
Bibliography

List of Figures

Figure 2.1: Example of dataset
Figure 2.2: Example of neural network for three classes
Figure 2.3: Example of convolutional neural network for CIFAR-10
Figure 3.1: Jupyter notebook sheet example
Figure 3.2: Image size tests, where 0 is forest, 1 is field and 2 is clouds
Figure 3.3: Example of labelling in a South American region
Figure 3.4: Example of a good learning process' loss and accuracy graphic
Figure 3.5: Representation of the model chosen
Figure 3.6: First convolutional layer filters' extracted features for both bands
Figure 3.7: Band 4 values for forest, field and clouds
Figure 3.8: Band 5 values for forest, field and clouds
Figure 3.9: Representation of the classification process with the trained model
Figure 3.10: Output values after the classification process on figure 3.9
Figure 4.1: South American satellite zone and image used for this experiment
Figure 4.2: Labelling for the region 227_65_290
Figure 4.3: Labelling for the region 227_65_172
Figure 4.4: Labelling for the region 1_86
Figure 4.5: Region classified in the experiment "training with the upper half"
Figure 4.6: Result of the classification with our model for the 1st training
Figure 4.7: Result of the classification with logistic regression for the 1st training
Figure 4.8: Region classified in the experiment "training with the lower half"
Figure 4.9: Result of the classification with our model for the 2nd training
Figure 4.10: Result of the classification with logistic regression for the 2nd training
Figure 4.11: Classification for 2013 image from wet season
Figure 4.12: Classification for 2015 image from wet season
Figure 4.13: Classification for 2013 image from dry season
Figure 4.14: Classification for 2015 image from dry season
Figure 4.15: Deforestation detection, wet season
Figure 4.16: Deforestation detection, dry season
Figure 4.17: Final detection on 2013 image for wet season
Figure 4.18: Final detection on 2015 image for wet season
Figure 4.19: Final detection on 2013 image for dry season
Figure 4.20: Final detection on 2015 image for dry season
Figure 4.21: Our detection
Figure 4.22: Tree cover loss detection

List of Tables

Table 3.1: Band designations for Landsat 8
Table 4.1: Classification report on our model for the 1st training
Table 4.2: Classification report on logistic regression for the 1st training
Table 4.3: Classification report on our model for the 2nd training
Table 4.4: Classification report on logistic regression for the 2nd training
1. Introduction
It is well known that forests are vital to the planet, providing critical ecosystem services, livelihoods to people and shelter to wildlife; despite this, human activities leading to deforestation are rapidly threatening them.
Half of the world's tropical forests (which represent about 47% of the world's forests) have been destroyed over the last century and, according to the WWF Living Forests1 model, up to 170 million hectares of additional deforestation will occur by 2030 if business as usual continues.
Figure 1.1 below shows a map from WWF with the current deforestation fronts.
Figure 1.1: WWF map with current deforestation fronts
Although, as stated previously, humans have clearly had profound impacts on the Earth's natural ecosystems, in many parts of the world the scale and pattern of habitat loss go unmonitored or are only roughly monitored.
As a consequence, conservation and sustainable development decisions intended to manage these impacts on ecosystem services are taken without a complete understanding of the current state and recent history of land cover and land use change.
In response to this, researchers at CIAT (the International Center for Tropical Agriculture) created a project called Terra-I2 [1] and composed a deforestation map by monitoring near-real-time information with machine learning algorithms.
Those algorithms allowed Terra-I to monitor the information only in a temporal way, by comparing, pixel by pixel, images of the same place taken at different points in time, which led to quite good results.
1 http://wwf.panda.org/wwf_news/?245370/Over-80-of-future-deforestation-confined-to-just-11-places
2 http://www.terra-i.org
The reason for monitoring only in a temporal way is that, although a spatial analysis was a very attractive option, at that time the best sensor for this kind of detection (and the one used by Terra-I), NASA's MODIS, could measure the greenness of the Earth's surface every 16 days, but only with a 250 m resolution, which is too poor to perform the detection in a spatial way.
However, the remote sensing of the Earth from satellites has made great progress since Terra-I started, and in 2013 a new satellite with a 30 m spatial resolution, Landsat 8, was launched.
This launch revived the idea of developing a spatial analysis of the images, since a 30 m spatial resolution could be enough for the algorithms to distinguish the type of soil in the pictures. Adding this analysis to the temporal one would greatly improve the detection, and that is how the main goal of this bachelor's thesis was set.
The following sections introduce the objectives of this work.
1.1. Statement of purpose
The main goal of this bachelor's thesis is to create a new algorithm based on spatial information to classify Landsat 8 satellite images into forest and field areas, in order to be able to track deforestation.
First, this implies making the algorithm able to distinguish between the different levels of greenness and shapes of the soil, and to relate those colours and structures to fields or forests. This leads to the creation of a model able to carry out the classification with a good trade-off between precision and complexity.
After this step, the model must be trained with a certain number of tropical forest images that allow classification over some tropical zones with quite good precision. As Terra-I has done until now, we give priority to the forests of South America, since it has been the world's most important deforestation front for a long time.
The final objective is to show, by analysing the results and experiments carried out at the end of the project, how this implementation improves deforestation detection, works as well as the previous algorithms and even corrects some of the failures found before.
The main tools required for this project are:
- The Python language, to develop the code.
- The Keras library, to create the model.
- Landsat 8 satellite images, to train and test the model.
Project specifications
- Create a classification model able to distinguish between the different levels of greenness of forests around the globe, depending on the season.
- Train the model with specific tropical forest images to extend the detection across the whole tropics.
- Detect the levels of greenness in the same zone in different years, to be able to track the deforestation that occurred in that period of time.
1.3. Workplan

WP#1: Project planning
Major constituent: Planning and specifications; first contact with the project.
Planned start date: 22/02/2016
Planned end date: 01/03/2016
Internal task T1: Definition of the project and the time plan.
Internal task T2: Elaboration of the Project Proposal and Workplan document.

WP#2: Image labelling
Short description: Manual classification of the images that will be used to train the classification program.
Planned start date: 01/03/2016
Planned end date: 09/03/2016
Internal task T2: Learn to download the images from Landsat 8.
Internal task T3: Use GRASS7 to colour the images.

WP#3: Study of neural networks and how they connect to Python and Keras
Short description: Get familiar with concepts such as neural networks, Keras and the libraries it can run on top of, such as Tensorflow and Theano.
Start event: 10/03/2016
End event: 08/04/2016
Internal task T2: Get familiar with the concept of neural networks.
Internal task T3: Study the use of Keras for neural networks.
Internal task T4: Get familiar with neural network examples such as CIFAR-10.

WP#4: Programming of the Neural Network
Major constituent: Creation of the model that will be used to classify the images.
Short description: Creation of the optimal neural network model by training it and testing its accuracy on the classification results, and refinement of the model as the number of classes increases.
Planned start date: 08/04/2016
Planned end date: 17/06/2016
Internal task T3: Classification prediction for new images.
Internal task T4: Elaboration of the Critical Review.
Deliverables: Critical Review (09/05/2016).

WP#5: Deforestation detection
Short description: Classification of two images of the same place taken at different times, and detection of the deforestation that occurred during that period.
Start event: 26/05/2016
End event: 12/06/2016
Internal task T1: Classify the two images with the model created before.
Internal task T2: Compare the classification levels of both images.

WP#6: Final experiments
Major constituent: Final experiments.
Short description: Carrying out the final experiments and preparing and writing the final report.
Planned start date: 17/06/2016
Planned end date: 08/07/2016
Deliverables: Final report.
1.4. Incidences
Two of the goals that we wanted to achieve could not be carried out because of lack of time: the extension of the deforestation detection range to the whole tropics (although it has been introduced) and the implementation of the algorithm in Terra-I.
The first turned out to take too much computation time, since it requires a large dataset of images that was not possible to manage in such a short period. As for the second, although the algorithm is planned to be integrated into Terra-I, this will not be done by the end of this period.
1.5. Content
The document starts with a background review of the relevant and recent research on the methodology used. In that section, machine learning and classification algorithms are introduced, as well as the data that Terra-I has used so far.
The next section exposes in more depth the machine learning methodologies chosen, together with the justification of the procedures followed during the project; the main algorithm chosen for the classification is presented and justified.
The following chapter shows that the implementation of this algorithm is functional and, relating it to the results, how it can improve the deforestation detection in the main project.
Finally, a conclusions section covers and relates all the sections and experiments of this project, justifying the usefulness of this research for the future of Terra-I.
2.1. Machine Learning
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science, statistics, artificial intelligence and data science [2].
To relate this field to the needs of this project, this chapter introduces machine learning applied to image recognition.
2.1.1. Introduction
As stated before, machine learning aims to "make computers learn" about something, much as humans do. Several learning approaches have been developed since the beginnings of machine learning, but they can all be divided into two main categories: supervised and unsupervised learning [2][3].
In the first case, supervised learning, a dataset with input/output examples is provided before the learning process, which means that the process improves with the aim of reaching a desired output that is already known. It therefore simulates learning with a teacher, who guides the process until it reaches the desired output.
In the second one, there is no desired output, which makes it possible to approach problems with little or no idea of what the results should look like. There is no feedback, so nobody supervises the process.
Supervised learning problems are categorized into regression and classification problems, whose outputs are continuous and discrete respectively.
In the case of this project, we want to determine whether a specific zone of the planet has suffered deforestation or not. To do so, knowing that the output can be of type forest or field is enough for the computation. Therefore, since we will have a dataset of examples and a discrete output known beforehand, we will use a classification approach.
2.1.2 Classification
In classification, the purpose is to determine what the input data is by separating it into some discrete, previously defined values.
In our case this means defining whether a specific zone of an image corresponds to the field or forest output type or, as it is called in this type of machine learning, class. To do so, the algorithm that will carry out the classification must have a dataset of examples that relate some kind of image section to some kind of output type.
For example, the dataset could contain the data shown in figure 2.1 (where 0 corresponds to "forest" and 1 to "field") before the learning process. This kind of data helps the algorithm learn what forest and field areas look like and how they differ, so that it can create a decision boundary and separate the data corresponding to each class.
Although classification has been introduced here for only two classes, it is also possible to use it for multiple outputs. In fact, as we kept working on the project we realized that it would be worth adding a new class to the algorithm: clouds. Since we work with satellite images, it is almost impossible to obtain a whole set of images without any clouds, so to improve the classification we ended up creating an algorithm able to classify data into those three outputs.
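To make the idea concrete, the snippet below trains a three-class classifier on a toy, made-up dataset using scikit-learn's logistic regression (a classifier also used later in this document as a baseline); the feature values and class separations are invented for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy illustration (not the thesis dataset): each sample is the mean
# reflectance of an image patch in two spectral bands, labelled
# 0 = forest, 1 = field, 2 = clouds.
rng = np.random.default_rng(0)
forest = rng.normal([0.05, 0.40], 0.02, size=(50, 2))  # dark red, bright NIR
field = rng.normal([0.15, 0.25], 0.02, size=(50, 2))
clouds = rng.normal([0.60, 0.60], 0.02, size=(50, 2))  # bright in both bands

X = np.vstack([forest, field, clouds])
y = np.repeat([0, 1, 2], 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# A very bright patch should be assigned to the clouds class
print(clf.predict([[0.60, 0.60]]))
```

The classifier learns a decision boundary between the three clusters, exactly the separation of classes described above.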
2.1.3 Neural networks
As introduced before, machine learning appeared with the aim of making machines able to learn, which meant creating some kind of "brain" out of a set of algorithms that could imitate an animal brain. That is why, in one of the most important supervised learning techniques, these sets of algorithms are called neural networks. Since neural networks are inspired by animal brains, the model of how they work is a very simplified version of how we know the brain works.
At a very simple level, we could say that neurons are computational units that turn inputs into outputs by weighting them and applying a so-called "activation function". A set of neurons forms a layer that is connected to other layers and can be an input, hidden or output layer:
- Input layer: the one feeding data to the neural network.
- Output layer: the one containing the set of final values computed along the learning process.
- Hidden layer(s): all the layers in between, guiding the learning process towards the desired output.
Every neural network must have one input layer, one output layer and as many hidden layers as needed. An example of a neural network for three classes is shown in figure 2.2.
Figure 2.2: Example of neural network for three classes
In the figure we can distinguish the different layers and their sets of nodes. Each node is activated following a mapping function from layer k to layer k+1 controlled by a set of parameters w called weights. During the learning process, these weights are updated and improved with the help of the dataset. The training goal is to produce the best weights: the ones giving the highest accuracy when performing the classification.
We can also differentiate the bias nodes, in pink, from the other nodes (called neurons) in the same layer. The value of a bias node is always 1. Its goal is to shift the linear combination represented by the other weights. Thinking of a 1D example, the bias b allows going from y = ax to y = ax + b: in the first case we are limited to lines that pass through the origin, while in the second one all possible lines in the plane can be described.
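As a sketch of the neuron model just described, the snippet below computes the output of a single neuron; the input, weight and bias values are arbitrary, and the sigmoid is only one possible choice of activation function:

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum of the inputs plus the bias,
    followed by a sigmoid activation function."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])  # inputs from the previous layer
w = np.array([2.0, 1.0])   # learned weights
b = 0.5                    # bias: shifts the linear combination

print(neuron(x, w, b))  # sigmoid(2*0.5 + 1*(-1) + 0.5) = sigmoid(0.5)
```

Setting b = 0 would force the pre-activation z to be a line through the origin, which is exactly the limitation the bias node removes.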
2.1.3.1. Convolutional neural networks
A convolutional neural network is a type of feed-forward artificial neural network in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. This is why this type of neural network is the one employed to create the classification algorithm.
When starting to use convolutional neural networks, it is very common to create simple networks for CIFAR-10 classification [5], an established computer-vision dataset with 10 object classes and 6000 images per class. There are three main types of layers used to build convolutional neural network architectures: the convolutional layer, the pooling layer and the fully-connected layer [4].
Figure 2.3 shows a simple example of a convolutional neural network for CIFAR-10. The network starts with an input layer containing the 3 channels (RGB) of a typical 32x32 CIFAR-10 image which, through the processing of the network, ends up assigned to one of the 10 classes of the model.
Before the output, this network has a fully-connected layer which, after the whole process, gives for each class the probability that the input image belongs to it. That is how, at the end, by using a specific function, we can decide the class that the input picture belongs to.
Figure 2.3: Example of convolutional neural network for CIFAR-10
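The final step just described, turning the fully-connected layer's outputs into class probabilities and picking the most likely class, is commonly done with the softmax function followed by an argmax. A minimal sketch, where the raw scores are made-up values rather than the output of any real network:

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Hypothetical raw scores from the fully-connected layer for the
# classes 0 (forest), 1 (field) and 2 (clouds)
scores = np.array([2.0, 0.5, -1.0])
probs = softmax(scores)              # probabilities summing to 1
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```

The softmax output is a probability for each class, and argmax selects the class with the highest probability as the final decision.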
The specifications of each layer used during this project will be explained in the next chapter.
2.2. Terra-I
To introduce where this project fits within Terra-I, this section briefly describes the data currently used by that project.
Until now, Terra-I has worked with data obtained by remote sensing from the 250 m resolution MODIS satellite, and has used methods that analyse the images temporally, since the resolution was too poor to develop a spatial analysis.
The data used for the analysis is determined by three different measurements [6][7] given by MODIS: NDVI, Quality and TRMM.
The point of collecting these measurements was to use them to predict the future NDVI value of a given point based on the current and previous NDVI and rainfall (TRMM) values.
NDVI
NDVI3 (Normalized Difference Vegetation Index) measures the vegetation index of a region by using Near Infrared (NIR) and Visible red (VIR) measurements of the same zone. The computation is as simple as:

NDVI = (NIR - VIR) / (NIR + VIR)

This measurement exploits the fact that live green plants strongly absorb visible light (from 0.4 to 0.7 µm) and strongly reflect near-infrared light (from 0.7 to 1.1 µm).
Although this index is very reliable, it has some problems relevant to our case of study: for example, it does not give good results if the sensed area is covered with clouds.
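NDVI can be computed pixel by pixel from the NIR and VIR bands with a few lines of NumPy; the reflectance values below are invented for illustration:

```python
import numpy as np

def ndvi(nir, vir):
    """Normalized Difference Vegetation Index, computed pixel by pixel
    from the near-infrared (NIR) and visible red (VIR) reflectances."""
    nir = nir.astype(float)
    vir = vir.astype(float)
    return (nir - vir) / (nir + vir)

# Dense vegetation reflects NIR strongly and absorbs visible red,
# so its NDVI is close to 1; bare soil gives a much lower value.
nir = np.array([[0.50, 0.30], [0.45, 0.10]])
vir = np.array([[0.05, 0.15], [0.06, 0.08]])
print(ndvi(nir, vir))
```

With real satellite rasters the same function applies unchanged, since NumPy evaluates the formula element-wise over whole band arrays.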
Quality
This value, also given by MODIS, indicates the level of precision of the sensor measurement on the sensed area. It evaluates the quality of the products with respect to their intended performance, pixel by pixel.
Aiming to discard the bad NDVI results, this index can be vital to determine whether the area may be cloudy or not. However, this measurement is not always reliable, so clouds are still a problem in the process; actually, the main problem.
In fact, this is the reason why, as briefly introduced in section 2.1.2, while developing this improvement of the project we decided to also detect clouds in the images, so that we could discard that information in a more reliable way. This will be explained in more detail later in the document.
TRMM
TRMM4 (Tropical Rainfall Measuring Mission) provides the last feature used in the process: the level of precipitation on Earth.
The next chapter presents the data now taken from the Landsat 8 satellite and its relation to these measurements.
3. Methodology
This chapter exposes the methods used during the project, as well as their development and improvement.
3.1. Programming language and work environment
The language chosen to program the algorithm was Python, and the environment used to program in Python was the IPython notebook5. This notebook, also known as Jupyter notebook, is an interactive computational environment that combines code execution, text, mathematics and plots. Figure 3.1 shows an example of a Jupyter notebook sheet.
5 https://ipython.org/notebook.html
3.1.1. Libraries
Among the large number of machine learning libraries existing nowadays, there are two main ones for the Python language: Scikit-learn6 and Keras7.
Scikit-learn is the most complete machine learning library and the most popular across all languages. It is built on top of NumPy and SciPy.
Keras, on the other hand, is a minimalist, highly modular neural network library, capable of running on top of either the Theano8 or the Tensorflow9 library.
For the development of this project, the library used to create and train the algorithm was Keras, used on top of both Theano and Tensorflow; this is an advantage because each of them is useful for different tasks in neural networks.
Keras is very easy to use, since its core data structure is a model that organizes layers. The main type of model, and the one used in this project, is the Sequential model. As a neural network library, it contains all types of layers, among them the ones used for convolutional neural networks: convolutional layers, MaxPooling layers, activation layers and Dense, which is the regular fully-connected layer seen before in this document.
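As a sketch of how such a model can be assembled, the snippet below stacks the layer types just listed into a Sequential model. It is written against the modern tensorflow.keras API, which differs from the 2016-era Keras used in this project; the input shape of 16x16 pixels with two bands and the three output classes match the setup of this project, but the filter counts and layer sizes are illustrative, not the actual model of this thesis:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Convolutional layer: learns 2D filters over the 16x16, 2-band patch
    Conv2D(16, (3, 3), padding='same', activation='relu',
           input_shape=(16, 16, 2)),
    # Pooling layer: downsamples the feature maps
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    # Fully-connected (Dense) layers; the last one outputs one
    # probability per class: forest, field, clouds
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For any 16x16 two-band patch, the model outputs three probabilities
probs = model.predict(np.zeros((1, 16, 16, 2)), verbose=0)
print(probs.shape)
```

Training then consists of a single call to model.fit with the labelled patches and their one-hot encoded classes.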
3.2. Data
The data used for the classification are Landsat Surface Reflectance High Level satellite images from Landsat 8, extracted from the USGS Earth Explorer10. These images have a 30 m pixel resolution and have been improved by computing the surface reflectance correction, which makes their analysis easier and more accurate.
The pictures extracted from this satellite are 11-band images [8][9], each band having the special features shown in table 3.1.
Band 1, Coastal aerosol: ultra-blue band, useful for coastal and aerosol studies.
Band 2, Blue: visible blue.
Band 3, Green: visible green.
Band 4, Red: visible red.
Band 5, Near Infrared (NIR): especially important for ecology because plants reflect it.
Band 6, SWIR 1: useful for telling wet earth from dry earth and for geology (strong contrast between soil and rocks).
Band 7, SWIR 2: as band 6.
Band 8, Panchromatic: combines all collected visible colours into one channel.
Band 9, Cirrus: useful for cirrus cloud detection.
Band 10, Thermal Infrared (TIRS) 1: useful for providing more accurate surface temperatures; collected at 100 metres.
Band 11, Thermal Infrared (TIRS) 2: as band 10.

Table 3.1: Band designations for Landsat 8
The previous chapter showed that Terra-I used the NDVI value, computed from the VIR and NIR values, to distinguish between forested and deforested zones. In the table above, those values are given by bands 4 and 5, so it makes sense, for the aim of this project, to use these two bands from the Landsat 8 satellite.
On the other hand, it was also described that a very important feature used in Terra-I was the Quality value, which was basically determined by the cloudiness of the image. To overcome the problem of wrong quality values, in this project a new class called clouds was added instead of using the quality value. Thus, whether the image is cloudy or not, the classification model is able to make a good detection of the deforestation by using the information about clouds.
3.2.1. Image size
To perform the detections, we wanted to classify each pixel of the Landsat images by taking into account a specific region around it, so that the analysis would be spatial. For this reason, a few tests with small sizes were performed in order to choose the best window size for the classification.
The tests used windows of 64x64, 32x32 and 16x16 pixels around the pixel to be classified. Figure 3.2 shows the classification attempts with these three sizes.
Even smaller sizes were not considered, because at those scales the colours and structures of field and forest are very similar and the model would not be able to differentiate them.
After all the attempts, the 16x16 pixel size seemed the best choice, since the model could recognize what was in the image with sufficient precision and the process was much quicker than with the other two sizes.
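The 16x16 window around a pixel can be cut out with plain array slicing; a small sketch with hypothetical names (win = 8 gives the 16x16 neighbourhood used here):

```python
import numpy as np

def patch_around(img, i, j, win=8):
    """Return the (2*win) x (2*win) neighbourhood of pixel (i, j).

    Assumes (i, j) is at least `win` pixels away from every border,
    mirroring the margin handling used in the classification loop.
    """
    return img[i - win:i + win, j - win:j + win]

img = np.arange(100 * 100).reshape(100, 100)
patch = patch_around(img, 50, 50)     # the 16x16 region around pixel (50, 50)
assert patch.shape == (16, 16)
```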
Figure 3.2: Image sizes tests, where 0 is forest, 1 is field and 2
is clouds
3.2.2. Image labelling
Working with classification requires a dataset of examples, together with the desired output for those examples, before the learning process is carried out. Therefore, before programming the algorithm, such a dataset must be created. In this case it must be composed of images whose regions are related to the values 0 (forest), 1 (field) and 2 (clouds).
To do so, there is a program, QGIS10, that makes this very easy: it allows manually labelling the images that will later be used to train and test the neural network model.
Also, a program called GRASS 711 can be used to turn the 11-band images into RGB by gathering the RGB bands together, and thus identify their features more easily.
In figure 3.3 we can see an example of a labelling from a South
American region.
10 http://www.qgistutorials.com/en/
11 https://grass.osgeo.org/grass7/
The results of the labelling are saved in a polygon file that records each selected region as forest, field or cloud. This file must be loaded, together with the image on which the manual classification was performed, so that the neural network model can be trained.
Below these lines there is an example of loading this file and both bands of the image.
After the loading, the data must be related: the image is declared as the input dataset and the polygon file as the desired output.
Thankfully, this step had already been implemented in Terra-I, so the corresponding function could be re-used in this project. This function, called rasterize_label_shp (see Appendix A), relates the shape layer (extracted from the polygon file) to the region of the globe it belongs to.
Thereafter, the algorithm must relate each output value (0, 1 or 2) from the shape file to the 4th and 5th band values at the same point of the picture.
import os
from osgeo import gdal, ogr

shape_dataset = ogr.Open(os.path.join('fields_polygon_227_290.shp'))
shape_layer = shape_dataset.GetLayer(0)
band_4 = gdal.Open(os.path.join('LC82270652015290LGN00_sr_band4.tif'))
band_5 = gdal.Open(os.path.join('LC82270652015290LGN00_sr_band5.tif'))
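Once opened, each GDAL band dataset can be turned into a numpy array with ReadAsArray() and the two bands stacked into the two-channel image the network consumes; the sketch below uses synthetic arrays in place of the real band rasters:

```python
import numpy as np

# In the real pipeline these would come from the GDAL datasets above:
#   b4 = band_4.ReadAsArray()
#   b5 = band_5.ReadAsArray()
# Here synthetic arrays of the same kind (uint16 reflectance values) stand in.
b4 = np.random.randint(0, 10000, size=(200, 200), dtype=np.uint16)
b5 = np.random.randint(0, 10000, size=(200, 200), dtype=np.uint16)

# Stack into a (height, width, 2) image: channel 0 = red, channel 1 = NIR.
landsat_img = np.dstack([b4, b5]).astype(np.float32)
assert landsat_img.shape == (200, 200, 2)
```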
Figure 3.3: Example of labelling in a South American region
3.3. Algorithm creation and development
The algorithm to train and validate the convolutional network that performs the desired classification must be defined, as well as the network itself. The process is divided in two main parts: the search for the optimal model (by creating, training and testing the neural network model) and the visual validation on images.
3.3.1. Optimal model
This section covers the creation of the model together with its training and testing, because it is from the results of these processes that the model is changed and improved little by little.
To start with the creation of the model, it is very common to use convolutional neural network examples made for the CIFAR-10 dataset (see section 2.1.1.1). We followed the same path, only adapting it to our dataset: for example, changing the input layer, since the images chosen for this project are 16x16 pixels unlike the CIFAR-10 ones, as well as the output, since this project only requires 3 classes: forest, field and clouds.
From there, the optimal option was chosen while training and testing the model.
3.3.1.1. Training of the model
The first thing to do before carrying out the classification, as mentioned before, is to have a dataset of examples from which the algorithm can learn.
Once this is done, the next step is to select from the dataset the sample points that will be used to train the model and the ones that will serve to test it. The idea is to prove that the model is learning how to classify each region as field, forest or clouds and is not memorizing the whole dataset. This learning process can be graded by the loss and accuracy at each iteration or "epoch" of the training.
Figure 3.4 shows an example of a good learning process: both training and validation loss decrease at each iteration while the accuracy, conversely, increases.
Figure 3.4: Example of a good learning process’ loss and accuracy
graphic
30
As long as the learning process is not good enough, the model must be improved until it is ensured that it is not memorizing the training data.
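The train/test split described above can be sketched as a simple shuffled hold-out; the arrays X and y are hypothetical stand-ins for the labelled 16x16 patches:

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical dataset: 1000 labelled two-band 16x16 patches.
X = rng.rand(1000, 2, 16, 16).astype(np.float32)
y = rng.randint(0, 3, size=1000)          # 0 = forest, 1 = field, 2 = clouds

# Shuffle once, then hold out 20% of the samples for testing.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_test, y_test = X[idx[split:]], y[idx[split:]]

assert len(X_train) == 800 and len(X_test) == 200
```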
3.3.1.2. Classification quality estimation
Once the training is done, to truly find the best classification model, a quality estimation of all candidate models must be performed [7], so that the model giving the best results can be chosen.
To compute this quality estimation, an image from the dataset (previously and manually classified) that has not been used in the training process must be processed.
With the manual classification of the image and the one made by the model, each pixel must be assigned, for each of the three classes, to one of the following 4 categories:
True positive (tp): a pixel classified by the trained model as the class under examination that matches its previous manual classification.
False positive (fp): a pixel classified as the class under examination that actually belongs to one of the other classes.
True negative (tn): a pixel not classified as the class under examination in either classification.
False negative (fn): a pixel not classified as the class under examination even though it belongs to that class according to the dataset.
Once the category of each pixel is determined for each class, the next step is to compute some measurements:
Precision = tp / (tp + fp): the probability that a pixel classified as forest, field or cloud actually belongs to that class.
Recall = tp / (tp + fn): the probability that a pixel of a given class is indeed classified as that class.
F1-score = 2 · precision · recall / (precision + recall): the harmonic mean of precision and recall.
The higher these measurements, the better the model. These are, therefore, the indexes we used to determine which model was the best choice for the classification we wanted to achieve.
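These three measurements follow directly from the four counts; a minimal sketch (the counts are made up for illustration):

```python
def scores(tp, fp, fn):
    """Precision, recall and F1-score from the per-class counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the "forest" class:
p, r, f1 = scores(tp=90, fp=10, fn=30)
# p = 0.9, r = 0.75, f1 = 2*0.9*0.75/1.65 ~= 0.818
```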
3.3.1.3. Definition of the model
After carrying out all the previous steps, training and evaluating several model patterns, the optimal model created for our purpose is described in the following lines.
As explained in section 3.1.1, Keras is structured as a model to which layers are added in a very easy way. In this model, those layers are:
Convolution 2D: 16 different randomly initialized 3x3 filters, convolved over the 2-dimensional input units in order to extract their main features.
MaxPooling 2x2: for each 2x2 region of the previous layer's output, it keeps the maximum value, halving the size of the images.
Dropout 0.25: sets a fraction of 0.25 of the input units to 0, which prevents overfitting.
Flatten: flattens the input without affecting the batch size.
Dense n: fully connects the input to n output units.
Activation ReLU: sets all negative values to 0 and leaves the positive ones unchanged. It does not affect the size or the batch size.
Activation Softmax: maps the input values to the range [0, 1] so that they sum to 1 and can be read as class probabilities. It does not affect the size or the batch size.
A more visual representation of this model is shown in figure 3.5.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import Adam

model = Sequential()
nb_classes = 3
input_shape = X_train.shape[1:]
model.add(Convolution2D(16, 3, 3, input_shape=input_shape, activation='relu', name='conv1'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(16, 3, 3, activation='relu', name='conv2'))
model.add(Convolution2D(16, 3, 3, activation='relu', name='conv3'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# Adam optimizer with the default Keras 1.x parameters
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
Following the figure, the process can be more easily understood: the 16x16 pixel input goes through layers that extract its best features, and the model then gives the probability of the image being of forest, field or clouds type. Finally, after the softmax activation, the model decides which class is the most likely.
As seen in section 3.3.1.1, once the model is created and chosen, it must be trained until the relations between the layers are optimal for extracting the most important information from the input image and thus classifying it as accurately as possible.
After that, the final weights of each layer, which achieve the best classification, must be saved by executing the following line:
Note: the model weights are saved in the HDF5 format, a hierarchical data format designed to store and organize large amounts of data.
3.3.1.3.1. Feature extraction
Once the model is trained and saved, it is possible to look at the input that maximizes each filter, i.e. the features it extracts, to understand how the model manages to classify each picture well [10].
This can also be useful for improving the model, by removing layers that turn out to be useless or changing them so that more features can be extracted.
Figure 3.6 shows a compilation of the features extracted by the first convolutional layer's filters.
model.save_weights('model_weights.hdf5')
Observing the image above, we can notice that the convolutional filters try to extract colours from the input units; we can conclude that colour is the main feature they extract.
In fact, if we look at the values of both bands for forest, field and clouds, we see that they are so different that this feature alone can be enough to distinguish between the three classes. Figures 3.7 and 3.8 show this for some random input data from a South American image.
Figure 3.6: First convolutional layer filter’s extracted features
for both bands
Figure 3.7: Band 4 values for forest, field and clouds
This is why we can say that our model mainly uses the colour of the images to carry out the classification.
3.3.1.3.2. Logistic Regression
Actually, when we looked at the extracted features, we realized that it was easier for the model to classify the images following a colour pattern rather than a structural one, which is what we were seeking.
Once we noticed that, we realized that such a complex model was not needed, so everything turned out to be much simpler.
scikit-learn provides a simple model called Logistic Regression that is ready to use for classification. Since we ended up with a very simple problem, we decided to run experiments with both our model and the logistic regression model, to compare the results and show that a convolutional neural network introduces a noticeable improvement.
This is further discussed in the next chapter of this document.
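A minimal sketch of such a scikit-learn baseline, using made-up per-pixel band values rather than the project's real patches:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Hypothetical features: mean band-4 (red) and band-5 (NIR) value of each patch.
# Forest: low red, high NIR; field: higher red; clouds: bright in both bands.
forest = rng.normal([0.05, 0.40], 0.02, size=(100, 2))
field  = rng.normal([0.20, 0.30], 0.02, size=(100, 2))
clouds = rng.normal([0.60, 0.60], 0.02, size=(100, 2))

X = np.vstack([forest, field, clouds])
y = np.repeat([0, 1, 2], 100)         # 0 = forest, 1 = field, 2 = clouds

clf = LogisticRegression().fit(X, y)
prediction = clf.predict([[0.05, 0.40]])   # a forest-like pixel
```

With clusters this well separated, the linear model alone classifies almost perfectly, which is why the comparison with the convolutional network is informative mainly on harder, cloudier regions.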
3.4. Visual validation on images
Once the model is chosen and saved, the next step is to write the code that carries out the classification process using the model.
To be truly useful, this classification must be performed on a brand new image that was not part of the dataset, which means that the model knows nothing about it.
First of all, since the satellite images tend to be very large (usually 7631x7781 pixels), classifying them fully at once could take a lot of computation time, so cutting them into, for example, 200x200 pixel pieces eases the process considerably.
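The tiling step can be sketched as a simple generator over tile origins (names are illustrative):

```python
def tiles(height, width, size=200):
    """Yield the (row, col) origin of every size x size tile covering an image."""
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield r, c

# A 7631x7781 scene is covered by 39 x 39 = 1521 tiles of 200x200 pixels
# (the tiles on the right and bottom edges are smaller).
origins = list(tiles(7631, 7781))
```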
Then, in each small 200x200 pixel image, every pixel is classified by taking into account its surroundings: the 16x16 pixel region (see section 3.2.1) around it is passed through the model to obtain its class. This process is repeated along the whole image until it is fully classified.
Figure 3.9 illustrates this process on a satellite image, without respecting the real size of the image.
Figure 3.8: Band 5 values for forest, field and clouds
The part of the code able to carry out this classification is
defined below these lines:
import sys

for l in range(win, x_final - win, it):
    for m in range(win, y_final - win, it):
        print l, m
        sys.stdout.flush()
        X_predict = []
        # Normal case: a full it x it block still fits in the image.
        if x_final - win - l >= it and y_final - win - m >= it:
            x_fin = l + it
            y_fin = m + it
        # Border case: clip the block to the image limits.
        elif x_final - win - l < it or y_final - win - m < it:
            x_fin = x_final - win
            y_fin = y_final - win
            if x_final - win - l > it:
                x_fin = l + it
            elif y_final - win - m > it:
                y_fin = m + it
        # Collect the 16x16 window around each pixel of the block.
        for i in range(l, x_fin):
            for j in range(m, y_fin):
                subimg = landsat_img[(i - win):(i + win), (j - win):(j + win)]
                X_predict.append(subimg.T)
        # Normalize the same way as the training data and classify.
        X_predict = np.array(X_predict).astype(np.float32)
        X_predict /= 255.0
        X_predict = (X_predict - X_mean) / X_std
        predict_label = model.predict_classes(X_predict, verbose=0)
        predict_label = predict_label.reshape(x_fin - l, y_fin - m)
        output[l:x_fin, m:y_fin] = predict_label
Figure 3.9: Representation of the classification process with the
trained model
In the previous code, x_final and y_final correspond to the height and width of the 2-dimensional image, landsat_img is the combination of the B4 and B5 bands of the satellite image to classify, and output is the final result of the classification for each pixel of the input image (at the end it has the same size as the input image).
Figure 3.10 shows how the output for the image of figure 3.9 should look after this process, where blue corresponds to forest, green to field and red to clouds.
Notice that the model has some trouble detecting clouds; this is discussed in the following section.
Figure 3.10: Output values after the classification process on
figure 3.9
4. Results
This chapter presents the results obtained in the experiments carried out to validate the project.
As introduced in section 1.1, one of the main goals of this project was to achieve deforestation detection in a spatial way, although section 3.3.1.3.2 introduced the problem of the simplicity of the model.
For these reasons, the experiments were designed to show the improvement that this project may bring to Terra-I.
4.1. Training with the final model and the Logistic Regression model
The first test was to show the improvement that our model achieves compared to the logistic regression model.
With this purpose, we performed two experiments on the satellite image shown in figure 4.1, from a region in Brazil, South America, which we had already manually classified.
Figure 4.1: South American satellite zone and image used for this
experiment
To perform the experiments, we used three different images from South America, among them the one shown above, as the dataset to train and test the models. These images had been manually classified beforehand, producing the following labellings:
Figure 4.2: labelling for the region 227_65_290 Figure 4.3:
labelling for the region 227_65_172
Figure 4.4: labelling for the region 1_86
4.1.1. Training with the upper half
The first experiment consisted of training both models separately with the upper half of the three images and then using them to classify a region in the lower half of the image shown in figure 4.1. The reason for choosing a region in the lower half is to make sure that the models could not see its classification during the training. This region is shown in figure 4.5.
The whole process can be found in the notebook LoadWeights Training final image WITH and WITHOUT LogisticRegression_up HTML.
4.1.1.1. The model created during this project
After training our model with random 16x16 pixel examples from the dataset of the three images, the classification report gave the results shown in table 4.1.
accuracy: 0.931111111111
Table 4.1: classification report of our model for the 1st training
The table shows the precision, recall and F1-score resulting from testing 3000 examples of forest, 3000 of field and 3000 of clouds. These measurements allow grading the accuracy of a model (see section 2.2).
Figure 4.5: Region classified on the experiment "training with the upper half"
After the validation on the region shown on figure 4.5, the result
was the one shown in
figure 4.6.
4.1.1.2. Logistic Regression
For logistic regression, after training with the same 9000 examples used in the previous section, the classification report gave the results shown in table 4.2.
accuracy: 0.879444444444
Table 4.2: classification report of logistic regression for the 1st training
After the validation on the same region of figure 4.5, the result was the one shown in figure 4.7.
Figure 4.6: Result of the classification with our model for the 1st training
Comparing the results of both cases, even though the difference is not large, the measurement values make clear that the model we created gives better results for this classification than logistic regression.
4.1.2. Training with the lower half
The second experiment consisted of trying the same thing the other way around: this time the training was performed with the lower half of the three images and the classification took place on a region in the upper half of the image. This region is shown in figure 4.8.
The whole process can be followed in the notebook LoadWeights Training final image WITH and WITHOUT LogisticRegression_down HTML.
Figure 4.7: Result of the classification with logistic regression for the 1st training
Figure 4.8: Region classified on the experiment "training with the lower half"
4.1.2.1. The model created during this project
After training our model with new random examples from the dataset of the three images, the classification report gave the results shown in table 4.3.
accuracy: 0.957222222222
Table 4.3: classification report of our model for the 2nd training
These results are even better than the ones obtained in the previous experiment, and so is the classification done with this model on the upper region shown in figure 4.8, as can be seen in figure 4.9.
4.1.2.2. Logistic Regression
For logistic regression, as seen in table 4.4, the results are also better than before, although they are still worse than our model's.
Figure 4.9: Result of the classification with our model for the 2nd training
Table 4.4: classification report of logistic regression for the 2nd training
The classification associated with this case is shown in figure 4.10.
As seen in that figure, compared with the classification performed by our model there is a great difference, worse than in the previous experiment, even though the measurement values have increased this time.
The reason may be that the testing (and therefore the measurements) is computed over the three images, while the validation only takes place on one of them, which happens to be the one with the most clouds and, therefore, the most likely to be wrongly classified.
4.1.3. Results
After performing these two experiments, we can conclude that the improvement our model brings to the classification is very noticeable, so using it instead of the logistic regression model to classify the images can make a difference.
accuracy: 0.896555555556 (value belonging to table 4.4)
Figure 4.10: Result of the classification with logistic regression for the 2nd training
4.2. Deforestation detection
Another main goal of this project, as stated in section 1.1, was to improve the deforestation detection in South America and later extend it to the whole tropics.
Since the project was short, the extension to the whole tropics could not be carried out, although the possibility of doing it in the future remains open. Nonetheless, the detection in South America could be performed, and this experiment is described here.
To achieve a good and accurate detection, we used 4 images of the same place at different points in time. They were chosen from two different years (2013 and 2015) and, in each year, from the two different tropical seasons11. The point of choosing images from both seasons was to achieve a good classification at any time of the year.
4.2.1. Classification of all the images
Since the region examined was the same as in the previous experiments, to perform the classification we chose the model that gave the best results in the tests, which is the one trained with the lower half of the input images (see section 4.1.2). We did this because the chosen model was trained with both dry and wet season images, so we could see whether it was enough to perform the classification for both seasons.
The classification, separated between wet season and dry season, gave the following results:
Figure 4.11: Classification for 2013 image from wet season
11 https://en.wikipedia.org/wiki/Tropics
Figure 4.12: Classification for 2015 image from wet season
Figure 4.13: Classification for 2013 image from dry season
Figure 4.14: Classification for 2015 image from dry season
4.2.2. Detections
As can be seen in the 4 classifications, the clouds are the hardest class to detect, even though we introduced it to avoid the quality problems previously existing in Terra-I. However, since these classifications by themselves will not show whether there was deforestation between 2013 and 2015, this might not be a problem.
More specifically, the reason why we decided to perform the detections on four images instead of two was to correct the possible mistakes in the classifications: if we computed only one detection (as the difference between both years), failures due to wrong classification would be much more likely than if we computed two.
For this reason, once the deforestation detection for each season was performed separately, obtaining the results shown in figures 4.15 and 4.16, the difference between the two was computed to reach the most accurate deforestation detection possible.
As can be seen in both figures, the detection consists of assigning the value 0 to all the places that have the same value in both years and to all the places hidden by clouds, and assigning 1 to the places where fields have appeared according to the difference between the two years' classifications.
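This per-pair detection rule can be sketched in a few lines of numpy, assuming the 0/1/2 class convention used throughout (the arrays are toy examples):

```python
import numpy as np

FOREST, FIELD, CLOUDS = 0, 1, 2

def detect_deforestation(before, after):
    """1 where forest turned into field, 0 elsewhere (cloudy pixels ignored)."""
    changed = (before == FOREST) & (after == FIELD)
    visible = (before != CLOUDS) & (after != CLOUDS)
    return (changed & visible).astype(np.uint8)

before = np.array([[0, 0], [2, 1]])
after  = np.array([[1, 0], [1, 1]])
det = detect_deforestation(before, after)
# Only the top-left pixel changed from forest to field without clouds.
assert det.tolist() == [[1, 0], [0, 0]]
```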
Notice that the detections contain some failures due to clouds and forests that were classified as field; if we only performed this first detection, its accuracy would be really low. Instead, computing the difference between the two seasons' detections, although it yields fewer deforestation fronts in the image, is more precise, even if, as we have seen, one of the images is full of clouds.
Figures 4.17 and 4.18 show this difference for the images from 2013 (top) and 2015 for the wet season; figures 4.19 and 4.20 show the same for the dry one.
Figure 4.15: Deforestation detection wet season Figure 4.16:
Deforestation detection dry season
Figure 4.17: Final detection on 2013 image for wet season
Figure 4.18: Final detection on 2015 image for wet season
Figure 4.19: Final detection on 2013 image for dry season
Figure 4.20: Final detection on 2015 image for dry season
After all the detections, as seen in the previous images, some of the real detections are very accurate and, although the clouds still introduce some failures, the improvement is very notable.
Notice that the 2013 image from the wet season is smaller than the other three. This is a problem of the satellite image itself, so it is normal that there are detections where the image should end.
The whole process can be found in the notebook LoadWeights Training final image 227_65_HTML.
4.2.3. Global Forest Watch
Global Forest Watch12 is an interactive online platform designed to monitor forests and raise alerts about new deforestation fronts. It works with information from many organizations (including Terra-I) that periodically update their deforestation detections.
This platform publishes a Tree Cover Loss map of the whole world every year with very high accuracy, but a year is too long to react to new deforestation fronts. This is why Terra-I's main goal is to achieve an accuracy as high as that one, but with an update every 16 days.
For this reason, to really observe the improvement that this project might bring to Terra-I, the last step of this experiment was to compare our detections (figure 4.21) with the detections registered by this yearly cover loss (figure 4.22), zoomed to the same region of the image.
As can be seen in the images, the detections are very much alike, despite some differences probably due to the clouds in the images used for our detections.
12http://www.globalforestwatch.org
Figure 4.21: Our detection Figure 4.22: Tree cover loss
detection
Notice that the region where the images differ the most is precisely the region covered by the cloud in the 2015 wet season image.
Therefore, although not perfect, the improvement in the detection despite the clouds is quite notable, so without the clouds it could be even better.
Thus, we can conclude that detection in South America is possible with this new algorithm, even with cloudy images, with pretty good results.
5. Budget
No physical prototype has been designed in this project, so no components were needed. Therefore, this budget has been estimated from the number of hours dedicated to the thesis in the different work packages. The budget is evaluated at the cost of a junior engineer, established at 8 €/hour.

Task                               Weeks  Hours/week  Cost/hour (€)  Cost (€)
Manual classification                  1      20            8           160
Study of Neural Networks               4      20            8           640
Programming of the Neural Network     11      20            8          1760
Results and Final Report               4      25            8           800
TOTAL                                 20      21            -         3360 €
6. Environmental Impact
The main theme of this project is directly related to environmental impact. As stated in section 1, the Terra-I project started as a solution to the coarse monitoring of habitat loss.
Other organizations try to do the same as Terra-I, but many of them update their information about forest loss over a very long period of time, which often leads to belated reactions to new alerts.
Therefore, Terra-I and this project aim to be useful for habitat loss alerts by updating their information every 16 days, so that reactions to increased deforestation by governments and environmental organizations can be sped up.
7. Conclusions and future development
Regarding the goals of the project, it started with the aim of improving the deforestation detection of Terra-I by exploiting the additional information that the new Landsat 8 satellite images could give: among it, for example, the 30 m resolution compared with the 250 m resolution of the satellite used until now. As a result, the process could be improved by also taking into account the spatial information around each pixel in the classification.
Also, a major goal was to find a solution to the clouds problem in Terra-I and, even though our solution is not perfect yet, we can say that this was also tackled in this project.
Therefore, with this project we could prove the usefulness of a new spatial analysis of the images, and we opened the possibility of carrying it out across South America and, later, the whole tropics.
Concerning the results of the experiments, the last point deserves special attention: the comparison with the Tree Cover Loss project. This is a very good detection that takes place every year, so seeing that the detections produced with our model are similar to it gives hope of one day achieving the same accuracy with this spatial analysis, in South America at least.
Regarding the extension to the whole tropics, during the project we realized that it was going to take much more time than expected, so in the end it could not be performed. This is because, due to the different vegetation present in Africa, South America and Indonesia, it may be very difficult to find a single model that can classify images from the three continents.
However, at some point in this project we thought about how to reduce the complexity of such a problem, which could involve great computation, and we actually found an option: since the vegetation patterns across the tropics are very similar to each other, it could be a good start to use a few images of all types of tropical vegetation and try to detect the deforestation along that whole zone.
Finally, regarding the implementation in Terra-I, it would take much more time than we had for this project, so it is something that can be done in future projects.
Bibliography
[1] L. Reymondin, A. Jarvis, A. Perez-Uribe, J. Touval, K. Argote, A. Coca, J. Rebetez, E. Guevara, M. Mulligan. "A methodology for near real-time monitoring of habitat change at continental scales using MODIS-NDVI and TRMM". Terra-I, an eye on habitat change, December 2012, pp. 4-9.
[2] M.I. Jordan, T.M. Mitchell. “Machine learning: Trends,
perspectives, and prospects”. Science, vol. 349,
no. 6245, pp. 255-260, July 15, 2015. DOI:
10.1126/science.aaa8415.
[3] Y. LeCun, Y. Bengio, G. Hinton. “Deep learning”. Nature, vol.
521, pp.436-444, May 28, 2015. DOI:
10.1038/nature14539.
[4] A. Karpathy. “Convolutional Neural Networks (CNNs/ConvNets)”.
[Online]. Available:
http://cs231n.github.io/convolutional-networks/
[5] A. Krizhevsky. “Convolutional Deep Belief Networks on
CIFAR-10”. August 2010.
[6] J. Rebetez. “Parasid, radar écologique pour l’Amazonie”.
Bachelor thesis, Haute Ecole d’Ingénierie et de Gestion du Canton
de Vaud, Switzerland, 2009.
[7] K. Georgy. “Terra-I on Earth Engine”. M.S. thesis, Haute Ecole
d’Ingénierie et de Gestion du Canton de Vaud, Switzerland,
2012.
[8] “Band designations for landsat satellites: Landsat 8
Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)”.
[Online]. Available:
http://landsat.usgs.gov/band_designations_landsat_satellites.php
[9] “Landsat 8 bands”. [Online]. Available:
http://landsat.gsfc.nasa.gov/?page_id=5377
[10] “How convolutional neural networks see the world”. [Online].
Available: http://blog.keras.io/how-
convolutional-neural-networks-see-the-world.html
# Appendix: label rasterization code. The fragments recovered here were
# damaged during extraction: the docstrings are original, while the
# function names, signatures and bodies are a reconstruction and may
# differ from the actual thesis code.

import numpy as np
from osgeo import gdal, ogr


def rasterize_layer(model_dataset, shape_layer, dtype=gdal.GDT_Byte,
                    burn_values=None, options=None):
    """
    Rasterizes `shape_layer` into an in-memory image with the same shape
    as `model_dataset`.
    `dtype` is a gdal type like gdal.GDT_Byte
    `options` should be a list that will be passed to GDALRasterizeLayers
    papszOptions, like ["ATTRIBUTE=vegetation"]
    """
    mem_drv = gdal.GetDriverByName('MEM')
    target = mem_drv.Create('', model_dataset.RasterXSize,
                            model_dataset.RasterYSize, 1, dtype)
    target.SetGeoTransform(model_dataset.GetGeoTransform())
    target.SetProjection(model_dataset.GetProjection())
    gdal.RasterizeLayer(target, [1], shape_layer,
                        burn_values=burn_values or [1],
                        options=options or [])
    return target


def load_shapefile(shapefile_fname, model_dataset, label_fieldname,
                   label2id=None):
    """
    Loads the given shapefile with labelled polygons and rasterizes it to
    an image.
    Args:
        model_dataset: the rasterized label image will have the same shape
            as model_dataset
        label_fieldname: the shapefile attribute that contains the label
            of each polygon
        label2id: if not None, a dict mapping label name to id. This
            should contain *all* the names that will be encountered when
            loading this file (and maybe some that will not)
    Returns:
        labels: an uint8 masked array containing the label of each pixel
            (unlabelled pixels are masked)
        label2id: a dict mapping label name to id (the same as label2id
            if it was passed)
    """
    shape_ds = ogr.Open(shapefile_fname)
    shape_layer = shape_ds.GetLayer(0)
    # Collect the labels actually present in the file
    loaded_labels = set(feature.GetField(label_fieldname)
                        for feature in shape_layer)
    shape_layer.ResetReading()
    if label2id is None:
        label2id = {}
        for lid, label in enumerate(sorted(loaded_labels), start=1):
            label2id[label] = lid
    assert len(np.setdiff1d(sorted(loaded_labels),
                            sorted(label2id.keys()))) == 0
    # Use attribute filters to rasterize shapes with a given label one
    # by one and assign a different lid to each
    labels = np.zeros((model_dataset.RasterYSize,
                       model_dataset.RasterXSize), dtype=np.uint8)
    for label, lid in label2id.items():
        shape_layer.SetAttributeFilter(
            "%s = '%s'" % (label_fieldname, label))
        target = rasterize_layer(model_dataset, shape_layer,
                                 burn_values=[lid],
                                 options=['ALL_TOUCHED=TRUE'])
        labels[target.ReadAsArray() == lid] = lid
    shape_layer.SetAttributeFilter(None)
    shape_layer.ResetReading()
    return np.ma.masked_equal(labels, 0), label2id
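The masked-array convention described in the Returns section above (unlabelled pixels are masked) can be illustrated with plain NumPy; the tiny raster below is invented for the example, with 0 marking pixels not covered by any polygon.

```python
import numpy as np

# Invented 3x3 label raster: 0 = pixel not covered by any polygon.
raster = np.array([[0, 1, 1],
                   [2, 2, 0],
                   [0, 0, 3]], dtype=np.uint8)

# Mask the nodata pixels so that later statistics ignore them.
labels = np.ma.masked_equal(raster, 0)
print(labels.count())  # 5 labelled pixels
```

Downstream code can then compute per-class statistics directly on `labels` without the unlabelled background biasing the counts.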