
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Predicting Tropical Thunderstorm Trajectories Using LSTM

ISAK NORDIN STENSÖ

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master in Computer Science
Date: June 14, 2018
Supervisor: Iman Sayyaddelshad
Examiner: Örjan Ekeberg
Swedish title: Att använda LSTM för att förutsäga tropiska åskväders banor
School of Computer Science and Communication


Abstract

Thunderstorms are both dangerous and important rain-bearing structures for large parts of the world. The prediction of thunderstorm trajectories is, however, difficult, especially in tropical regions. This is largely due to their smaller size and shorter lifespan.

To overcome this issue, this thesis investigates how well a neural network composed of long short-term memory (LSTM) units can predict the trajectories of thunderstorms, based on several years of lightning strike data. The data is first clustered, and important features are extracted from it. These are used to predict the mean position of the thunderstorms using an LSTM network. A random search is then carried out to identify optimal parameters for the LSTM model.

It is shown that the trajectories predicted by the LSTM are much closer to the true trajectories than those predicted by a linear model. This is especially true for predictions of more than 1 hour. Scores commonly used to measure forecast accuracy are applied to compare the LSTM and the linear model. It is found that the LSTM significantly improves forecast accuracy compared to the linear model.


Sammanfattning (Summary)

Thunderstorms are both dangerous and vital carriers of water for large parts of the world. It is, however, difficult to predict the paths of thunderstorm cells, above all in tropical regions. This is mostly due to their smaller size and shorter lifespan.

This degree project investigates how well a neural network consisting of long short-term memory (LSTM) layers can predict the paths of thunderstorms, based on several years of lightning strike data. The data is first clustered, and important features are extracted from it. These are used to predict the average position of the thunderstorms with an LSTM network. A random search is then carried out to identify optimal parameters for the LSTM model.

It is established that the paths predicted by the LSTM model are much closer to the true paths than those predicted by a linear model. This is particularly true for predictions more than 1 hour ahead. Scores commonly used to assess forecast accuracy are computed to compare the LSTM model and the linear one. It is shown that the LSTM model clearly improves forecast accuracy compared with the linear model.

Acknowledgements

I would like to thank my supervisor at KTH, Iman Sayyaddelshad, who has guided me through this project.

Thank you to all of Ignitia, for granting me the opportunity to work on this project, and for a very memorable visit to Ghana. I would especially like to thank Andreas Vallgren, Qiang Li and Daniel Salvador, who have been a huge support at all times.

Thank you to Marcus Wallberg as well, for the collaboration on the clustering, as well as the many talks about our studies and other topics.

A final thank you to Ezeddin Al Hakim, Linn Bergelid, Berk Gedik, Satucahaya Langit and Jonathan Ohlsson for their feedback.


Contents

1 Introduction
1.1 Research Question
1.2 Limitations
1.3 Ethical Concerns
1.4 Structure

2 Background
2.1 Thunderstorms
2.2 Clustering
2.2.1 DBSCAN
2.2.2 Merges and Splits
2.3 Thunderstorm Tracking and Nowcasting
2.4 Sequence Learning
2.4.1 RNN
2.4.2 LSTM
2.5 Measures

3 Method
3.1 Data Set
3.2 Clustering
3.3 Feature Extraction
3.4 LSTM
3.4.1 Random Search
3.4.2 Layers
3.5 Evaluation of LSTM
3.5.1 Linear Regression
3.5.2 Other Scores

4 Results
4.1 Clustering
4.2 Random Search
4.3 Final Model

5 Discussion
5.1 Clustering
5.2 LSTM
5.3 Future Research

6 Conclusion

Bibliography

List of Figures

2.1 The life cycle of a storm
2.2 Density-reachability and density-connectivity
2.3 A basic RNN unfolded across two time steps
2.4 A memory cell of an LSTM

3.1 Example of a storm grid

4.1 Clustering of storms, August 3 2012
4.2 Plots of random search loss for different parameter settings
4.3 Scatter plots of random search loss for different parameter settings
4.4 Storm visualizations
4.5 Scores for 15 minutes
4.6 Scores for 30 minutes
4.7 Scores for 1 hour
4.8 Scores for 2 hours

List of Tables

2.1 Contingency table for thunderstorm forecasts

3.1 Chosen parameter values for the DBSCAN algorithm
3.2 Number of sequences in data

4.1 Random search parameters with the smallest loss
4.2 Parameters used for the final LSTM
4.3 RMSE for Linear Regression and LSTM
4.4 Total scores for linear regression and LSTM

List of Abbreviations

ANN Artificial neural network

CSI Critical success index

DBSCAN Density-Based Spatial Clustering of Applications with Noise

FAR False alarm rate

HSS Heidke skill score

LSTM Long short-term memory

POD Probability of detection

RNN Recurrent neural network

TITAN Thunderstorm Identification, Tracking, Analysis and Nowcasting


Chapter 1

Introduction

Thunderstorms cause serious problems around the world. It is estimated that they cause 24,000 deaths and 240,000 injuries each year [20], and they can lead to further dangers such as wildfires [13]. At the same time, they are very important for redistributing the water of the Earth [40], as they generate large amounts of rainfall. Parts of West Africa get 90% of their rainfall from thunderstorms [25].

Thunderstorms are especially common in warmer climates, such as West Africa [39, 40, 42]. Sub-Saharan Africa is the place on Earth with the most thunderstorm days per year [8]. Accurate thunderstorm prediction in West Africa is therefore important. A notification that a storm will appear can mean the opportunity to get to safety, or the knowledge that watering crops that day is unnecessary. However, thunderstorm forecasting remains extremely difficult, as thunderstorms are short-lived and highly volatile. It is therefore only possible to forecast details at most 1 hour into the future, a task also called nowcasting, beyond which forecasts become probabilistic [9].

Today’s forecasting models are generally numerical, and often use ensemble methods in which the initial conditions and the numerical representation of the atmosphere are varied among the members of the ensemble [16]. Machine learning is, however, an increasingly active research area in this field. Yet there is still a lack of research into machine learning models for the prediction of thunderstorms. This is mainly because most research has been done on European and North American weather, where storms are larger and easier to predict. Tropical storms are smaller, and thus more susceptible to chaos [32].


Despite this, machine learning has been used with great success for trajectory prediction in other fields. Examples include hurricanes [2, 33] and people [1]. This is often done using recurrent neural networks (RNNs), which can learn and predict sequences, a capability that is very useful in forecasting trajectories [17].

The long short-term memory (LSTM) is an improvement on the RNN that consistently gives better results for sequence learning than standard RNNs [30]. It manages to remember information for longer, as well as to ignore noise.

1.1 Research Question

The question that this study will try to answer is:

How can LSTMs improve thunderstorm trajectory prediction in a 6-hour window?

This project sets out to explore to what degree machine learning can predict the trajectories of thunderstorms in the short term (also known as nowcasting). The World Meteorological Organization defines nowcasting as forecasting up to 6 hours ahead, so this is the horizon used in this study [41]. Reasonable predictions beyond 6 hours seem unlikely. A machine learning algorithm will be implemented and used to predict the trajectories up to 6 hours into the future.

The project will use a data set containing lightning strike data (see section 3.1). From this, thunderstorms will be identified, and a model will be developed using this data to forecast the trajectories of the storms. These trajectories are defined as the center point of the storm over time.

1.2 Limitations

This study focuses on predicting the trajectories of thunderstorms. Other aspects, such as the lifespan, size and shape of the storms, are not predicted. The trajectories are only predicted up to 6 hours ahead. Thunderstorm lifespans are generally short, but large storms (such as squall lines) can live for several hours or even days, which motivates extending the prediction period this far [11].


The only machine learning algorithm that will be studied is LSTM, as this is the model that has been used most extensively for trajectory prediction. Apart from LSTM, a linear regression model will also be implemented as a baseline against which the LSTM can be compared.

1.3 Ethical Concerns

There are no major ethical concerns associated with this study. The data was gathered from weather stations. How it was gathered there is not known, but there is no known cause for concern. The results of this study will not be usable for purposes other than predicting thunderstorms, and do not represent a concern either.

On the contrary, greater accuracy in tropical thunderstorm prediction should be beneficial to humanity. Farmers can plan their watering schedules, generating more food and avoiding starvation. People will have a greater opportunity to get to safety if they are in a dangerous position, such as in an open field or on water.

1.4 Structure

This thesis is structured as follows. In chapter 2, the relevant theory is explained and related studies are presented. In chapter 3, the data set is described, as well as the chosen algorithms and how the results are evaluated. In chapter 4, the results of the thunderstorm nowcasting are presented, and they are discussed in chapter 5. Finally, the work and results are summarized in chapter 6.

Chapter 2

Background

In this chapter, the theory behind thunderstorms is first presented in section 2.1. After that, different ways of clustering lightning strikes into storms are presented in section 2.2. In section 2.3, methods currently used for tracking and nowcasting thunderstorms are given, followed by methods commonly used for learning sequences in section 2.4. Finally, measures for verifying forecasts are presented in section 2.5.

2.1 Thunderstorms

Thunderstorms are most common during spring and summer, especially in the mid-afternoon. They occur most frequently in the tropics [39], where they tend to be smaller than in temperate zones [32].

Thunderstorms consist of one or several cells. A cell is a thunderstorm with one updraft. It starts from rising warm and moist air, which creates the updraft [39]. This is often due to sunlight heating the ground, which in turn heats the air. This creates a cumulus cloud, which later develops into a cumulonimbus cloud [8]. Because of this, the first stage of a thunderstorm is called the cumulus stage. The rising air cools, and then creates rain and a downdraft. This is the second stage: the mature stage. It is in the mature stage that lightning is primarily generated [8]. Eventually, the downdraft cuts off the supply of warm air for the updraft, and the cloud starts to dissipate. This is the third and last stage: the final stage. An illustration of the life cycle of a thunderstorm can be seen in Figure 2.1. In total, each cell lives for around 30 minutes [39]. Larger storms containing several cells, such as squall lines, can live for several hours or more [11].

At times, the cold air in the downdraft may force warm air to rise, which creates a new cell [8]. It is therefore common for cells to generate new cells around them, meaning that many storms are composed of several cells [39].

Figure 2.1: The life cycle of a storm. The first stage is the cumulus stage, the second is the mature stage, and the third is the final stage. [8]

Most lightning occurs between the oppositely charged regions within a cloud. This is called intracloud lightning. A similar phenomenon can occur between clouds, which is called intercloud lightning. The rest are ground strikes, of which 90% are negative, meaning that they originate in the negatively charged part of the cloud. A few of these fail to complete their path to the ground, becoming air discharges [8].

Sometimes, thunderstorms group together to form a long line. Such a line is called a squall line; it can be 300 to 500 km long, and tends to be oriented from north to south [25].

2.2 Clustering

By grouping lightning strikes, it is possible to get an approximation of the underlying thunderstorms. While there are several methods of doing this, one of the most prominent is clustering. Clustering is a method of identifying similarities in data sets, and is generally unsupervised, meaning that there is no need for already labelled data. It works by grouping the data into clusters, where the similarity within a cluster is maximized and the similarity between clusters is minimized [19].

Most clustering methods use the distance between the data points as a similarity measure. This tends to create spherical clusters, which is not always desired. Density-based clustering avoids this by growing clusters as long as the density is high enough, generally based on predefined values. This allows the clusters to take any shape.

Another type of clustering is grid-based clustering, where the data is divided into a grid and grid areas are connected to form clusters. Using a grid makes it possible to cluster much faster than otherwise. It is also possible to combine these methods; see e.g. [19].

Clustering is very common when grouping lightning strikes into thunderstorms. Kohn et al. [27] first created a grid from the lightning data, and then used a hierarchical k-means to cluster the areas in the grid. Every grid area was 100 km², and areas containing fewer than one flash every 15 minutes were set to be empty.

Lakshmanan and Smith [29] also created a grid from the lightning data, and then defined two constraints for the clustering. The first was that every area should be clustered with other areas that were similar in value space, and the second was that areas should have as many neighbors (areas within a predefined distance) in the same cluster as possible. A cost was calculated from these constraints and then minimized. This method was presented as a general-purpose method that could work with either radar data or lightning strike data.

Another grid-based solution can be found in the study by Betz et al. [4], who divided the data into grids and identified a cell where the number of strokes per area was above a given value. The border of a cell was thus defined where the number of strokes per area fell below this value.
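The grid-and-threshold step shared by Kohn et al. [27] and Betz et al. [4] can be illustrated with the minimal Python sketch below. It is not the method of either study. The planar (x, y) coordinates, the cell size, the per-cell count threshold and the joining of active cells that touch at an edge or a corner are all assumptions made for illustration.

```python
from collections import deque


def grid_clusters(strikes, cell_size, min_count):
    """Group (x, y) strikes into clusters of active grid cells.

    strikes:   iterable of (x, y) positions in a planar unit, e.g. km (assumed).
    cell_size: side length of one square grid cell, in the same unit.
    min_count: minimum number of strikes for a cell to count as active.
    Returns a list of clusters, each a sorted list of (row, col) cells.
    """
    # Count strikes per grid cell.
    counts = {}
    for x, y in strikes:
        cell = (int(y // cell_size), int(x // cell_size))
        counts[cell] = counts.get(cell, 0) + 1

    # Keep only cells whose count reaches the threshold.
    active = {cell for cell, n in counts.items() if n >= min_count}

    # Join active cells that share an edge or a corner (8-connectivity),
    # using a breadth-first search from each unvisited active cell.
    clusters, seen = [], set()
    for start in sorted(active):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            r, c = queue.popleft()
            component.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in active and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(component))
    return clusters
```

Suppose positions are given in kilometres and the call is `grid_clusters(strikes, 10, 2)`. Every 10 km × 10 km cell with at least two strikes then becomes part of a storm. Cells with fewer strikes are treated as empty.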

2.2.1 DBSCAN

While the previously mentioned studies present good results, most studies use density-based clustering for identifying thunderstorms.

Density-based methods have the advantage of being able to find arbitrarily shaped clusters in the data. They can also identify outliers and ignore them in the clustering. A third advantage is that they do not require a predefined number of clusters, but can find any number [19]. This makes them very interesting for identifying thunderstorms, as it is rarely known how many thunderstorms there are at any given moment.

The most popular density-based clustering method is DBSCAN. It was presented by Ester et al. [14], and has proven able to efficiently cluster very large databases [5].

DBSCAN works by identifying core points, which are points with enough points in their neighborhood. The neighborhood is defined by the ε-distance, a distance that is set before running the algorithm. A core point is defined as having at least MinPts points in its ε-neighborhood, where the neighborhood includes the point itself. MinPts is also set beforehand. Ester et al. [14] gave the following definitions of relations between points:

Definition 1. (Directly density-reachable) A point p is directly density-reachable from the core point q if p is within the ε-neighborhood of q.

Definition 2. (Density-reachable) A point q is density-reachable from the core point p if there is a chain of points $p_1, \dots, p_n$ with $p_1 = p$ and $p_n = q$ such that $p_{i+1}$ is directly density-reachable from $p_i$. See Figure 2.2, where m is directly density-reachable from p, and q is directly density-reachable from m. Thus q is density-reachable from p.

Definition 3. (Density-connected) Two points are density-connected if there is a core point o such that both points are density-reachable from o. See Figure 2.2, where s and r are density-connected, since both are density-reachable from the core point o.

Figure 2.2: Density-reachability and density-connectivity. [19]


Definition 4. (Cluster) A cluster C is a non-empty subset of the data set satisfying the following two conditions:

1. ∀p, q : if p ∈ C and q is density-reachable from p, then q ∈ C.

2. ∀p, q ∈ C : p is density-connected to q.

Definition 5. (Noise) Noise is the set of points that do not belong to any cluster.

The DBSCAN algorithm starts with a random point. If it has MinPts or more points in its ε-neighborhood, it is classified as a core point, and all points in its neighborhood are added to a new cluster. These points are then processed to see whether they are core points. If they are, their neighborhoods are also added. If not, they are border points of the cluster. When the cluster has grown to its maximal extent, the same process is repeated for the points not yet clustered. The algorithm is finished when all points have been processed [5, 14, 19].
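The procedure above can be sketched in plain Python as follows. It is a minimal, unoptimized illustration written for this text, not the implementation of Ester et al. [14]; each neighborhood query scans every point. It visits points in index order rather than randomly. A point first marked as noise is relabelled as a border point if a core point later reaches it.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand further
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so its neighborhood joins the cluster.
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels
```

If a border point lies within ε of core points from two different clusters, this sketch assigns it to whichever cluster reaches it first. The original algorithm has the same order dependence.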

There are thus two parameters that need to be set: MinPts and ε. These can, however, be quite difficult to set [19]. DBSCAN can also have difficulties clustering data when the clusters have different densities [5].

DBSCAN is likely the most popular algorithm for clustering lightning strikes, due to its ability to identify clusters of different shapes and to ignore noise. It is, however, often modified in some way to handle the particular constraints of the problem. This can be seen in the study of Juntian, Shanqiang, and Wanxing [24]. They calculated the relationship between every pair of points in a 30-minute window, and then clustered them. They tested various parameter values, and decided upon ε = 0.5 and MinPts = 3 for large data sets (> 10,000 points) and ε = 0.2 and MinPts = 4 for other data sets. This allowed them to track the storms using a sliding window.

A common solution is to use two ε-values. This allows two distances to be defined, most commonly a spatial and a temporal distance. This can be seen in the work of Hutchins, Holzworth, and Brundell [22]. They applied DBSCAN to lightning strike data using the latitude, longitude and time of every strike. The first ε-value defined the spatial distance, and the second the temporal distance. They arrived at ε = 0.12, ε_time = 18 minutes and MinPts = 2. Hutchins and Holzworth [21] built on this, used it to cluster lightning strikes from two different data sets, and presented good results.
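A neighborhood query with two ε-values, in the style of Hutchins, Holzworth, and Brundell [22], might look like the sketch below. Several details are assumptions and are not taken from [22]:

- the (lat, lon, t) layout of each strike,
- plain Euclidean distance in degrees,
- time measured in minutes.

Only the idea of separate spatial and temporal thresholds comes from the text. The second function is also an assumption. It shows one way to fold both thresholds into a single distance, so that an ordinary single-ε DBSCAN run with ε = 1 reproduces the two-ε test.

```python
import math


def st_neighbors(strikes, i, eps_space, eps_time):
    """Indices of strikes that are neighbors of strikes[i] under two ε-values.

    Each strike is (lat, lon, t). Two strikes are neighbors only if their
    spatial distance is at most eps_space AND their time difference is at
    most eps_time. Spatial distance is plain Euclidean distance in degrees,
    a simplifying assumption.
    """
    lat_i, lon_i, t_i = strikes[i]
    result = []
    for j, (lat, lon, t) in enumerate(strikes):
        close_in_space = math.hypot(lat - lat_i, lon - lon_i) <= eps_space
        close_in_time = abs(t - t_i) <= eps_time
        if close_in_space and close_in_time:
            result.append(j)
    return result


def scaled_distance(p, q, eps_space, eps_time):
    """Single distance that is <= 1 exactly when both ε-tests pass."""
    spatial = math.hypot(p[0] - q[0], p[1] - q[1])
    return max(spatial / eps_space, abs(p[2] - q[2]) / eps_time)
```

Take ε = 0.12 and ε_time = 18 minutes, the values reported above. Two strikes about 0.05 apart in space and 10 minutes apart in time are then neighbors. The same pair 40 minutes apart is not.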


Birant and Kut [5] presented their own version of DBSCAN, called ST-DBSCAN. It aimed to solve several problems with DBSCAN, including identifying clusters with differing densities. They also used two ε-values, one for the temporal distance and one for all other features. It was tested on a data set containing seawater characteristics, and was shown to perform well.

Another variant can be found in the study of Matthews and Trostel [31]. They applied DBSCAN to radar data on a polar grid, and then used the clustering to track the storms. This can thus be seen as a combination of grid-based and density-based methods.

2.2.2 Merges and Splits

One problem that many methods of identifying storms face, regardless of the method used, is handling merges and splits. Over time, storms can either split into smaller storms or merge into larger ones. This presents a problem when identifying and tracking storms, as it can be difficult to label a storm as merging or splitting. Betz et al. [4] state that their method had difficulties with this problem, and Juntian, Shanqiang, and Wanxing [24] do not present a way of handling it either.

One method that does address it, however, is TITAN [10]. TITAN is commonly used for forecasting thunderstorms based on gridded radar data. A storm is defined as a contiguous set of grid areas in an image where the radar reflectivity is above a certain threshold. These storms are then linked between radar images at different times by minimizing the path lengths and volume differences between storms. TITAN addresses merges by forecasting storms that terminate: if such a forecast lies within another storm in the next time step, TITAN defines it as a merge instead. Regarding splits, large storms are forecast, and if they cover newly created storms in the next time step, those storms are regarded as the result of a split.
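Linking storms between two time steps by minimizing path length and volume difference can be framed as an assignment problem. The sketch below is a simplified illustration, not TITAN's implementation. The following are all assumptions:

- the additive cost,
- the hypothetical `volume_weight` parameter,
- the requirement of equal storm counts,
- the brute-force search over permutations.

Merges and splits are not handled.

```python
import math
from itertools import permutations


def match_storms(prev, curr, volume_weight=1.0):
    """Link storms between two time steps by minimizing the total cost.

    prev, curr: lists of storms, each (x, y, volume); both lists must have
    the same length in this sketch. Linking two storms costs the distance
    between their centers plus volume_weight times the absolute volume
    difference. Brute force over permutations only suits a few storms.
    Returns a list of (prev_index, curr_index) pairs.
    """
    if len(prev) != len(curr):
        raise ValueError("this sketch only handles equal numbers of storms")

    def cost(a, b):
        path = math.hypot(a[0] - b[0], a[1] - b[1])
        return path + volume_weight * abs(a[2] - b[2])

    # Try every one-to-one assignment and keep the cheapest.
    best = min(
        permutations(range(len(curr))),
        key=lambda perm: sum(cost(prev[i], curr[j]) for i, j in enumerate(perm)),
    )
    return list(enumerate(best))
```

For more than a handful of storms, a polynomial-time assignment algorithm (such as the Hungarian method) would replace the permutation search.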

2.3 Thunderstorm Tracking and Nowcasting

Once storms have been identified, it is often of interest to nowcast them. This is a common problem, but most studies define their own methods [4], so it is done in several ways.


Radar data was the first type of data used [27] for nowcasting thunderstorm trajectories. Later, lightning strike data and satellite images have also become popular.

Approximating the trajectory of a thunderstorm, either for prediction or for identifying the storm's previous trajectory, is often done using linear regression. Kohn et al. [27] used lightning data to identify and track the storms, and then estimated the future motion from two consecutive time frames.

Another approach was presented by Lakshmanan, Herzog, and Kingfield [28]. They described a way of improving precalculated trajectories, consisting of clusters of cells, in which they calculated the Theil-Sen trajectory for every such cluster. This is a linear trajectory, chosen for its simplicity. They used the Euclidean distance and a maximum temporal distance of 10 minutes: if a cell was further away than that from all cells in its trajectory, it was labelled as a separate trajectory. They then went through every cell and calculated its Euclidean distance to the slopes of the closest trajectories. They updated the clusters by moving cells to their closest slopes, pruned the trajectories, and repeated. Pruning in this case means merging trajectories that were very similar. They showed that this method worked well on storms defined on small and mid-sized scales, but performed worse on larger scales (1000 km²).
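The Theil-Sen estimator fits a line whose slope is the median of the slopes between all pairs of points, which makes it robust to outliers. A minimal sketch for one coordinate of a track as a function of time follows. It does not reproduce how Lakshmanan, Herzog, and Kingfield [28] apply the estimator to clusters of cells. The intercept rule used here is one common choice, assumed for this sketch: the median of the offsets `value - slope * time`.

```python
from itertools import combinations
from statistics import median


def theil_sen(times, values):
    """Theil-Sen line fit: slope is the median of all pairwise slopes."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]  # skip pairs observed at the same time
    ]
    slope = median(slopes)
    # Intercept: median of the offsets of each point from the slope line.
    intercept = median(v - slope * t for t, v in zip(times, values))
    return slope, intercept
```

A single outlying position, such as one misassigned cell at the end of a track, leaves the median slope unchanged in this example. An ordinary least-squares fit would be pulled towards it.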

To forecast the storms, TITAN assumes that they move in a straight line, and that the growth or decay of a storm is linear [10]. It uses linear regression to find the slope of the trajectory, with older values weighted exponentially less the further back in time they are. It addresses merges by using the averages of the parents, weighted by their volume. Splits are addressed by using the history of the parent storm, translated so that it coincides with the child that is to be forecast.
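A straight-line forecast in which older observations are weighted exponentially less, as described for TITAN, can be sketched as weighted least squares. The decay factor, the function name and the assumption of evenly spaced observations (so that age is counted in steps) are all hypothetical. TITAN's actual weighting constants are not given here. Applying the function separately to the x and y coordinates of a storm centroid gives a linear trajectory forecast.

```python
def weighted_linear_forecast(times, values, t_future, decay=0.5):
    """Forecast the value at t_future from an exponentially weighted line fit.

    times must be sorted ascending and evenly spaced (an assumption). The
    observation k steps before the most recent one gets weight decay**k.
    """
    n = len(times)
    weights = [decay ** (n - 1 - k) for k in range(n)]
    w_sum = sum(weights)
    # Weighted means of time and value.
    t_mean = sum(w * t for w, t in zip(weights, times)) / w_sum
    v_mean = sum(w * v for w, v in zip(weights, values)) / w_sum
    # Weighted least-squares slope.
    cov = sum(w * (t - t_mean) * (v - v_mean) for w, t, v in zip(weights, times, values))
    var = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, times))
    slope = cov / var
    # Extrapolate along the fitted line.
    return v_mean + slope * (t_future - t_mean)
```

With `decay=1.0` every observation counts equally and the function reduces to an ordinary least-squares extrapolation.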

2.4 Sequence Learning

A subset of machine learning algorithms deals with sequence learning, which focuses on learning from sequences of data. This can be used for problems such as classification and next-step prediction. Next-step prediction is of particular interest to this study.

Bontempi, Le Borgne, and De Stefani [6] presented several types of multivariate time-series forecasting. One such type is vector autoregressive models, which model linear relationships. They are, however, incapable of modelling non-linearity, and they are unsuitable for large data. They are also designed to predict the same attributes as the input they receive. Another type presented is recurrent neural networks (RNNs). They can model non-linearity, and recent improvements such as long short-term memory (LSTM) have made them more viable. RNNs and LSTMs are described further below.
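A first-order vector autoregressive model, VAR(1), predicts the next vector of attributes as a linear function of the current one: x_t = c + A x_{t-1}. The sketch below fits c and A by least squares with NumPy. It is written for illustration and is not taken from Bontempi, Le Borgne, and De Stefani [6]. It makes two of the limitations above concrete: the forecast is linear in the input, and it has the same attributes as the input.

```python
import numpy as np


def fit_var1(series):
    """Least-squares fit of a VAR(1) model x_t = c + A x_{t-1}.

    series: array of shape (T, d), one row of d attributes per time step.
    Returns (c, A) with shapes (d,) and (d, d).
    """
    X_prev = series[:-1]
    X_next = series[1:]
    # Prepend a column of ones so the intercept c is estimated jointly.
    design = np.hstack([np.ones((len(X_prev), 1)), X_prev])
    coef, *_ = np.linalg.lstsq(design, X_next, rcond=None)
    c = coef[0]
    A = coef[1:].T  # transpose so that A @ x_prev gives the linear part
    return c, A


def forecast_var1(c, A, x_last):
    """One-step forecast; the output has the same attributes as the input."""
    return c + A @ x_last
```
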

2.4.1 RNN

An RNN is a variant of an artificial neural network (ANN). RNNs have been very successful for a variety of tasks, especially when it is difficult to understand the underlying features of the problem [30]. The standard ANN is the multilayer perceptron, which can only map input vectors to output vectors and loses its state after every run. An RNN, however, uses its entire history, as well as the current input, to generate output [17]. This is done with a hidden state, which is calculated from the input of the current time step and the hidden state of the previous time step. The output is calculated from the hidden states [30]. An RNN can therefore draw on the entire history for every output, since every hidden state contains information about all previous hidden states. A basic example can be seen in Figure 2.3.

Figure 2.3: A basic RNN unfolded across two time steps. [30]
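The hidden-state recurrence described above can be sketched as the forward pass of a basic RNN in NumPy. The weight names, the tanh activation and the linear output layer are illustrative assumptions. Training (backpropagation through time) is omitted.

```python
import numpy as np


def rnn_forward(xs, W_x, W_h, b_h, W_y, b_y):
    """Run a basic RNN over a sequence xs of shape (T, input_dim).

    The hidden state h_t = tanh(W_x x_t + W_h h_{t-1} + b_h) depends on the
    current input and the previous hidden state, so each output
    y_t = W_y h_t + b_y depends on the whole history x_1, ..., x_t.
    Returns the hidden states (T, hidden_dim) and outputs (T, output_dim).
    """
    h = np.zeros(W_h.shape[0])  # initial hidden state
    hs, ys = [], []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b_h)
        hs.append(h)
        ys.append(W_y @ h + b_y)
    return np.array(hs), np.array(ys)
```
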


An example can be seen in the study of Moradi Kordmahalleh, Gorji Sefidmazgi, and Homaifar [33]. They used a sparse RNN to forecast hurricane trajectories. This allowed the network to spend less time processing the data and to generalize better than a fully connected network. When predicting the trajectory of a hurricane, they selected the hurricanes most similar to the one being predicted and used them as input to the RNN. Similarity was calculated by comparing the locations of the first 10 observations of the hurricanes. The output was the sequence of locations that the hurricane would pass through. They tested this method on a group of large historical hurricanes, and showed that it performed well for both one-step-ahead and two-step-ahead prediction, with a low mean absolute error.

Alemany et al. [2] expanded on the above work. They wanted an RNN that could learn from all types of hurricanes, whereas the above work relied on several constraints, including that hurricanes must not turn back on themselves. The data they used were the longitude, latitude, wind speed and pressure of the hurricane at every observation. Instead of using the longitude and latitude directly, they calculated the distance and direction of movement. This allowed them to learn on relative values instead of absolute positions. They also normalized the data, as they claimed that this makes the model learn better. In addition, they plotted the locations of the hurricanes on a grid, in order to avoid small truncation errors that could potentially grow large in the network. They then compared their RNN to the sparse RNN of Moradi Kordmahalleh, Gorji Sefidmazgi, and Homaifar [33], and showed that their new RNN performed much better.
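Converting absolute positions into relative movement, as Alemany et al. [2] did, could look like the sketch below. The text does not state how distance and direction were computed. The haversine distance, the compass bearing and the min-max normalization are therefore assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def to_distance_direction(track):
    """Convert consecutive (lat, lon) positions into (distance_km, bearing_deg).

    Distance is the haversine great-circle distance; bearing is the initial
    compass bearing (0 = north, 90 = east) from one position to the next.
    """
    steps = []
    for (lat1, lon1), (lat2, lon2) in zip(track, track[1:]):
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = p2 - p1
        dlmb = math.radians(lon2 - lon1)
        # Haversine formula for the great-circle distance.
        a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        dist = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
        # Initial bearing, mapped from (-180, 180] to [0, 360).
        y = math.sin(dlmb) * math.cos(p2)
        x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
        bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
        steps.append((dist, bearing))
    return steps


def min_max_normalize(values):
    """Scale values linearly to [0, 1] (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```
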

There are a few problems with RNNs, however. One main problem is that RNNs can suffer from vanishing or exploding gradients. This means that the influence of a given input either grows very large or becomes very small as it propagates through many time steps of the network. There are several solutions, of which the most popular is the LSTM, which avoids this gradient problem and is capable of remembering the state for longer as well as ignoring noise better [17, 30].
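The vanishing and exploding gradient problem can be seen in the simplest possible recurrence, a scalar linear RNN h_t = w h_{t-1}. Backpropagating from step T to step 0 multiplies the gradient by w once per step, so the gradient equals w^T. This toy example is a simplification. In a real RNN, the factor at each step also involves the recurrent weight matrix and the derivative of the activation function.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}.

    Each step contributes a factor d h_t / d h_{t-1} = w by the chain rule,
    so the result is w**steps: it vanishes for |w| < 1 and explodes for |w| > 1.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad
```

Over 50 steps, w = 0.5 gives a gradient below 1e-14 and w = 1.5 gives one above 1e8. This is why plain RNNs struggle to learn dependencies across long sequences.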

2.4.2 LSTM

The LSTM functions much like a standard RNN, but uses so-called memory cells. These replace the ordinary hidden nodes of an RNN, and allow the LSTM network to store information over long periods.


A diagram of a memory cell can be seen in Figure 2.4. A memory cell contains a series of nodes and gates, each with its own weight parameters [30]:

Input node: The input node takes the input and the previous hidden state. Its activation function is usually tanh.

Input gate: The input gate uses a sigmoid activation function and takes the same input as the input node. Its output is then multiplied with the result of the input node. If the value of the input gate is 0, the input is cut off, and if it is 1, the input is unaffected.

Internal state: The internal state is calculated by adding the product of the input node and the input gate to the previous internal state.

Forget gate: The forget gate was added by Gers, Schmidhuber, and Cummins [15]. It allows the cell to forget the previous internal state. It works in a similar way to the input gate: it is multiplied with the previous internal state, potentially setting it to 0.

Output gate: The output gate is calculated in the same way as the other gates. It is then multiplied with the internal state, after the latter has been passed through a tanh function. The result is the hidden state at this time step.
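One time step of the memory cell described above can be sketched in NumPy as follows. It follows the node and gate descriptions in this section. The parameter layout (one (W, U, b) triple per node or gate) and the variable names are assumptions. Extensions such as peephole connections are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, s_prev, params):
    """One time step of an LSTM memory cell with a forget gate.

    params maps "g" (input node), "i" (input gate), "f" (forget gate) and
    "o" (output gate) to a tuple (W, U, b): W acts on the input x, U on the
    previous hidden state, and b is a bias vector.
    Returns the new hidden state h and internal state s.
    """
    def pre(name):
        W, U, b = params[name]
        return W @ x + U @ h_prev + b

    g = np.tanh(pre("g"))   # input node
    i = sigmoid(pre("i"))   # input gate: 0 blocks the input, 1 lets it through
    f = sigmoid(pre("f"))   # forget gate: 0 erases the previous internal state
    o = sigmoid(pre("o"))   # output gate
    s = g * i + s_prev * f  # new internal state
    h = np.tanh(s) * o      # hidden state for this time step
    return h, s
```

A large negative bias on the forget gate drives the gate towards 0, which erases the previous internal state. A large positive bias carries the state over unchanged when the input gate is closed.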

LSTMs can also be used for trajectory prediction. Alahi et al. [1] used LSTMs to predict human trajectories. The problem with a standard LSTM is that it can learn sequences, but cannot capture dependencies between correlated sequences. Yet objects often affect the trajectories of other simultaneous objects, such as people in a crowd. The study therefore used one LSTM per person, but learned the same weights for all. The hidden states of all nearby people were pooled together and used as an input in the next time step. They used what they call a social hidden-state tensor, which was embedded into vectors for use as input. This model outperformed both standard LSTMs and more hand-engineered social models for trajectory prediction in large crowds.
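The pooling of neighbors' hidden states used by Alahi et al. [1] can be sketched as follows. Each neighbor's previous hidden state is summed into the cell of a small grid centered on the person. The grid size, the cell width and the function name are hypothetical. The embedding of the tensor and the LSTM itself are not shown.

```python
import numpy as np


def social_pooling(positions, hidden, i, grid_size=4, cell=0.5):
    """Pool the hidden states of people near person i into a grid tensor.

    positions: array (N, 2) of current (x, y) positions.
    hidden:    array (N, D) of each person's previous LSTM hidden state.
    The neighborhood of person i is a grid_size x grid_size grid of square
    cells of side `cell`; hidden states of people in the same cell are
    summed. Returns an array of shape (grid_size, grid_size, D).
    """
    pooled = np.zeros((grid_size, grid_size, hidden.shape[1]))
    half = grid_size * cell / 2  # the grid spans [-half, half) in x and y
    for j in range(len(positions)):
        if j == i:
            continue  # a person is not their own neighbor
        dx, dy = positions[j] - positions[i]
        if -half <= dx < half and -half <= dy < half:
            m = int((dx + half) // cell)
            n = int((dy + half) // cell)
            pooled[m, n] += hidden[j]
    return pooled
```
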

LSTMs have also been used for predicting weather phenomena. Shi et al. [34] used LSTMs to predict precipitation from radar data. The map was divided into a 2D grid, with every grid area containing different measurements, so that every time step was represented by a 3D tensor. The goal was to predict the following K tensors. They proposed a convolutional LSTM, where every input, output and internal state was a 3D tensor. When tested, it performed much better than a fully connected LSTM.

Figure 2.4: A memory cell of an LSTM. [17]
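The key change in the convolutional LSTM of Shi et al. [34] is that the gates are computed with convolutions instead of matrix multiplications. Inputs, hidden states and internal states therefore remain spatial grids. The sketch below is deliberately simplified. It uses a single channel, no bias terms and a naive zero-padded convolution, and it omits other details of the published formulation.

```python
import numpy as np


def conv_same(image, kernel):
    """2D cross-correlation with zero padding; output has the input's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv_lstm_step(x, h_prev, s_prev, kernels):
    """One single-channel ConvLSTM step on 2D grids of shape (H, W).

    kernels maps "g", "i", "f" and "o" to a pair (K_x, K_h) of odd-sized
    kernels applied to the input and to the previous hidden state.
    """
    def pre(name):
        K_x, K_h = kernels[name]
        return conv_same(x, K_x) + conv_same(h_prev, K_h)

    g = np.tanh(pre("g"))   # input node, now a grid
    i = sigmoid(pre("i"))   # input gate
    f = sigmoid(pre("f"))   # forget gate
    o = sigmoid(pre("o"))   # output gate
    s = g * i + s_prev * f  # internal state keeps the spatial layout
    h = np.tanh(s) * o
    return h, s
```

Because each gate only sees a local window, a single step spreads information by at most half the kernel size in each direction.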

A problem can arise when trying to predict values several steps ahead in time. This is known as multistep-ahead prediction, and it can be done in several ways; Cheng et al. [7] compare different methods. One common method is to use a series of data to predict the next value, add this value to the known data to predict the following one, and continue until the target step is reached. This method is known as multi-stage prediction. Another approach is to build a separate model for each step that is to be predicted, which is known as independent value prediction. The study tests these methods using RNNs and argues that multi-stage prediction leads to error accumulation over time, thus giving worse results than independent models.
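The difference between the two strategies can be illustrated with a toy example. The sketch below is hypothetical and is not the setup of Cheng et al.: the "models" are hand-written extrapolation rules with a deliberate bias of 0.1, standing in for trained networks, so that the error accumulation of multi-stage prediction becomes visible.

```python
from typing import Callable, Dict, List, Sequence

# A "model" maps the most recent window of values to one predicted value.
Model = Callable[[Sequence[float]], float]
BIAS = 0.1  # hypothetical systematic error shared by all toy models


def multi_stage(one_step: Model, history: Sequence[float], steps: int, window: int = 5) -> float:
    """Multi-stage (recursive) prediction: feed each prediction back as input."""
    data: List[float] = list(history)
    for _ in range(steps):
        data.append(one_step(data[-window:]))
    return data[-1]


def independent(models: Dict[int, Model], history: Sequence[float], steps: int, window: int = 5) -> float:
    """Independent value prediction: a separate model for each number of steps ahead."""
    return models[steps](list(history)[-window:])


def one_step(window: Sequence[float]) -> float:
    """Biased one-step model: extrapolate the last difference, plus BIAS."""
    return window[-1] + (window[-1] - window[-2]) + BIAS


# Direct models: one per horizon, each with the same bias.
direct: Dict[int, Model] = {
    s: (lambda w, s=s: w[-1] + s * (w[-1] - w[-2]) + BIAS) for s in (1, 2, 3)
}

history = [0.0, 1.0, 2.0, 3.0, 4.0]   # true continuation: 5, 6, 7, ...
recursive_3 = multi_stage(one_step, history, steps=3)
direct_3 = independent(direct, history, steps=3)
```

With a true value of 7 three steps ahead, the recursive forecast ends at about 7.6, because each biased prediction is fed back and its error is amplified, whereas the direct model's error stays at its own bias (7.1).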


Table 2.1: Contingency table for thunderstorm forecasts.

                   Observed True   Observed False
Predicted True     a               b
Predicted False    c               d

2.5 Measures

There are several different measures that can be used to verify the correctness of a forecast. Many of these assume a binary (yes/no) forecast, so in order to use them, the prediction must be converted into such an answer. Using the contingency table in Table 2.1, these measures can be expressed as follows.

One such score is the probability of detection (POD), or hit rate (equation 2.1). This score measures the fraction of all observed thunderstorms that were correctly predicted. It can be compared to the false alarm rate (FAR), which measures the fraction of the predicted thunderstorms that were false alarms (equation 2.2); since it is normalised by the number of predicted events, this quantity is also known as the false alarm ratio. A measure similar to the POD is the critical success index (CSI), which, however, also takes the false alarms into account (equation 2.3) [12].

$$\mathrm{POD} = \frac{a}{a+c} \tag{2.1}$$

$$\mathrm{FAR} = \frac{b}{a+b} \tag{2.2}$$

$$\mathrm{CSI} = \frac{a}{a+b+c} \tag{2.3}$$

A different measure is the Heidke skill score (HSS) [23], which measures the accuracy of a forecast relative to the accuracy expected from a random forecast. A score of 1 means that the forecast is perfect, while 0 means that a random guess would achieve the same result (equation 2.4).

$$\mathrm{HSS} = \frac{2(ad - bc)}{(a+b)(b+d) + (a+c)(c+d)} \tag{2.4}$$
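Equations 2.1 to 2.4 translate directly into code. The following is a minimal sketch with a hypothetical function name; it returns NaN for a score whose denominator is zero, which is an assumption about how undefined scores should be reported.

```python
def _ratio(num: float, den: float) -> float:
    """Safe division: undefined scores (zero denominator) become NaN."""
    return num / den if den else float("nan")


def verification_scores(a: int, b: int, c: int, d: int) -> dict:
    """POD, FAR, CSI and HSS (equations 2.1 to 2.4) from the contingency table.

    a: hits, b: false alarms, c: misses, d: correct negatives (Table 2.1).
    """
    return {
        "POD": _ratio(a, a + c),
        "FAR": _ratio(b, a + b),
        "CSI": _ratio(a, a + b + c),
        "HSS": _ratio(2 * (a * d - b * c), (a + b) * (b + d) + (a + c) * (c + d)),
    }


# 50 hits, 50 false alarms, 50 misses and 850 correctly predicted calm pixels.
s = verification_scores(50, 50, 50, 850)
```

For this table the sketch gives POD = FAR = 0.5, CSI = 1/3 and HSS = 4/9. A perfect forecast gives HSS = 1, and a table in which the hits equal the number expected by chance, such as a = 1, b = c = 9, d = 81, gives HSS = 0.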

Chapter 3

Method

In this chapter the methodology used in this study is presented. First the data set is briefly explained in section 3.1. The clustering of lightning strikes is presented in section 3.2, followed by a description of how features were calculated from the clusters in section 3.3. The LSTM that was used for predicting the storm trajectories is explained in section 3.4, and finally the evaluation of the predictive model is presented in section 3.5.

3.1 Data Set

The data used in this thesis come from the GLD360 data set, which is provided by Vaisala and contains lightning strike data from across the world [38]. The subset used here contains 37,500,000 recorded lightning strikes in West Africa. Each strike is associated with a position in longitude and latitude as well as other features, some of which were used for the models and are described below.

The strikes were recorded by several sensors measuring the arrival time and angle of the impulses generated by lightning strikes. The location of each strike, together with other features, was then calculated using the data from all the sensors that recorded it.

3.2 Clustering

In order to identify the separate thunderstorms, the lightning data was clustered using the DBSCAN algorithm, which was described in section 2.2.1. As a result, every strike was assigned exactly one label, and a number of strikes were labelled as noise (not part of a storm).

The strikes were clustered based on spatial proximity, defined by their longitude and latitude, and on temporal proximity, defined by their timestamps, as in previous studies [21, 22]. Two separate distance parameters were therefore used, one spatial and one temporal. The chosen parameters can be seen in Table 3.1. The MinPts parameter was chosen so that the resulting storms are as realistic as possible. A value larger than 3 would miss storms as well as border points, while 1 or 2 would generate a large number of very small storms, which cannot be learned from; 3 therefore seems to be a good compromise. The other two values were found experimentally (see section 4.1).

Table 3.1: Chosen parameter values for the DBSCAN algorithm.

Parameter             Value
Spatial ε-distance    0.15 degrees
Temporal ε-distance   20 minutes
MinPts                3
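The combined spatial and temporal neighbourhood can be illustrated with a compact DBSCAN implementation. This is a minimal pure-Python sketch under stated assumptions, not the implementation used in the thesis: two strikes are treated as neighbours only if they are within 0.15 degrees of each other, measured as a plain Euclidean distance in (longitude, latitude), and within 20 minutes of each other in time; a strike counts itself towards MinPts; and the quadratic neighbour search is only suitable for small examples, not for 37.5 million strikes, which would need a spatial index.

```python
import math
from collections import deque

EPS_SPACE_DEG = 0.15  # spatial ε-distance in degrees (Table 3.1)
EPS_TIME_MIN = 20.0   # temporal ε-distance in minutes (Table 3.1)
MIN_PTS = 3           # MinPts (Table 3.1); assumed to count the strike itself
NOISE = -1


def neighbours(strikes, p):
    """Indices of strikes within both ε-distances of strike p (including p)."""
    lon0, lat0, t0 = strikes[p]
    return [
        q for q, (lon, lat, t) in enumerate(strikes)
        if math.hypot(lon - lon0, lat - lat0) <= EPS_SPACE_DEG
        and abs(t - t0) <= EPS_TIME_MIN
    ]


def dbscan(strikes):
    """Label each (lon, lat, t_minutes) strike with a storm id, or NOISE."""
    labels = [None] * len(strikes)
    storm = 0
    for p in range(len(strikes)):
        if labels[p] is not None:
            continue
        seeds = neighbours(strikes, p)
        if len(seeds) < MIN_PTS:
            labels[p] = NOISE            # may later be claimed as a border point
            continue
        labels[p] = storm                # p is a core point: start a new storm
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = storm        # border point: join, but do not expand
            if labels[q] is not None:
                continue
            labels[q] = storm
            q_neighbours = neighbours(strikes, q)
            if len(q_neighbours) >= MIN_PTS:
                queue.extend(q_neighbours)   # q is also a core point: keep growing
        storm += 1
    return labels


strikes = [
    (0.00, 0.00, 0), (0.05, 0.00, 5), (0.00, 0.05, 10),   # one compact storm
    (5.00, 5.00, 0),                                      # isolated strike
    (0.00, 0.00, 120),                                    # same place, 2 h later
    (1.00, 1.00, 0), (1.05, 1.00, 1), (1.00, 1.05, 2),    # a second storm
]
labels = dbscan(strikes)
```

The strike at the same location but two hours later is labelled as noise, because the temporal ε-distance keeps it out of the first storm's neighbourhood.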

The clustering was evaluated by studying several cases in the data, an approach similar to that of Kohn et al. [27] and Tuomi and Larjavaara [37]. The cases were selected to include both large and small storms, so that the resulting storms could be evaluated on whether they behaved like real thunderstorms. Care was taken to keep storms as separate as possible, while avoiding an excessive number of small storms that should have been merged into a larger one.

3.3 Feature Extraction

After every strike had been associated with a storm or defined as noise, the clusters were processed to find features describing each storm as a whole. To do this, a sliding time window was used for every storm. The window spanned 15 minutes and was moved 5 minutes at every step. Thus, for every 5 minutes of a storm's life, its features at that time were computed from the previous 15 minutes. The following features were extracted:

1. Mean longitude


2. Mean latitude

3. Number of strikes

4. Number of positively charged strikes

5. Area

6. Current life span of storm

7. Month

8. Hour

9. Covariance

Most of these features are self-explanatory, with two exceptions: the area is defined as the area of the convex hull of the lightning strikes, and the covariance is calculated from the longitude and latitude of the strikes.
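The sliding-window feature extraction can be sketched as follows. This is an illustrative pure-Python version under assumptions that the thesis does not state: strike times are given in minutes from an arbitrary origin, each window covers the half-open interval (t − 15, t], the first window ends 5 minutes after the storm's first strike, the area is the convex-hull area in square degrees (computed with Andrew's monotone chain and the shoelace formula), and the covariance is the sample covariance between longitude and latitude. Month and hour are omitted because they would require calendar timestamps.

```python
from statistics import mean


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; zero for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice = sum(
        vertices[k][0] * vertices[(k + 1) % n][1] - vertices[(k + 1) % n][0] * vertices[k][1]
        for k in range(n)
    )
    return abs(twice) / 2.0


def window_features(strikes, t_end, storm_start, window=15.0):
    """Features of one storm from its strikes in (t_end - window, t_end].

    strikes: list of (lon, lat, t_minutes, is_positive) tuples for one storm.
    """
    w = [s for s in strikes if t_end - window < s[2] <= t_end]
    if not w:
        return None
    lons, lats = [s[0] for s in w], [s[1] for s in w]
    m_lon, m_lat = mean(lons), mean(lats)
    cov = (sum((x - m_lon) * (y - m_lat) for x, y in zip(lons, lats)) / (len(w) - 1)
           if len(w) > 1 else 0.0)
    return {
        "mean_lon": m_lon,
        "mean_lat": m_lat,
        "n_strikes": len(w),
        "n_positive": sum(1 for s in w if s[3]),
        "area": polygon_area(convex_hull(list(zip(lons, lats)))),
        "lifespan_min": t_end - storm_start,
        "covariance": cov,
    }


def sliding_features(strikes, step=5.0, window=15.0):
    """One feature dictionary per 5-minute step while the storm is active."""
    start = min(s[2] for s in strikes)
    end = max(s[2] for s in strikes)
    rows, t_end = [], start + step
    while t_end - step < end:          # until a window has reached the last strike
        feats = window_features(strikes, t_end, start, window)
        if feats is not None:
            rows.append(feats)
        t_end += step
    return rows
```

For example, five strikes forming the corners and centre of a 1 × 1 degree square within five minutes yield a single feature row with an area of 1 square degree.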

This process produced a feature data set containing one or several instances of every storm, while the noise was ignored. After this, every storm that contained only one such instance, and thus lived only very briefly, was removed, since no trend could be learned from it.

In order to train and test the models on different data, the whole feature set was split into training, validation and test data sets: 60% of the storms were used for training, 20% for validation and 20% for testing. No storm was divided between data sets, in order to avoid training on storms that also appeared in the validation and test sets. The training data was used for training the models, the validation data for identifying the optimal parameters, and the test data for the final tests.
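Keeping every storm inside a single data set amounts to splitting the storm identifiers rather than the individual feature rows. A minimal sketch, assuming each feature row carries a storm id and that storms are assigned at random with a fixed seed (the thesis does not state how the storms were assigned); the function name is hypothetical:

```python
import random


def split_by_storm(storm_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole storms to training, validation and test sets (60/20/20)."""
    ids = sorted(set(storm_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))


# Feature rows tagged with hypothetical storm ids: 100 storms, 3 rows each.
rows = [{"storm": k // 3, "t": k % 3} for k in range(300)]
train_ids, val_ids, test_ids = split_by_storm(r["storm"] for r in rows)
train_rows = [r for r in rows if r["storm"] in train_ids]
```

Because the split is made on ids, all rows of a storm end up in the same set, so no sequence from a test storm can leak into training.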

The goal was to predict positions at several different points in time, so a separate model was trained for each prediction horizon. Every such model needed its own input and output data to train on. Input and output data was therefore collected for predicting the following periods into the future:

1. 15 minutes

2. 30 minutes

3. 1 hour


4. 2 hours

5. 3 hours

6. 4 hours

7. 5 hours

8. 6 hours

Since an LSTM needs input sequences of a fixed length, the length of the input sequence was set to 5 time steps. This was deemed long enough to learn from, while not so long as to limit the amount of data. Thus, for every storm, sequences of 5 time steps were extracted and used as input data. The output data, which was used in training and for comparison when testing, was the mean longitude and latitude of the time step corresponding to the value to be predicted. (This represents the trajectory, which in section 1.1 is defined as the center point of the storm over time.) As an example, if the first 5 values are used as input, then, since consecutive time steps are 5 minutes apart, the output for the 15-minute model is the eighth value, three steps after the last input. The 30-minute model would instead use the eleventh value, and so on.

In order to be able to predict values for young storms, every extracted storm sequence was zero-padded with four time steps at the beginning. This was done before the LSTM input and output sequences were collected. After this, a sliding window was used to extract all possible input sequences and their corresponding outputs. If a storm did not contain enough data to provide an output for a given model, no sequence was created for that model from this input; since all models were trained separately, this was not a problem. Finally, the data was normalised to lie between 0 and 1. The number of sequences in the training, validation and test data for the different prediction lengths can be seen in Table 3.2.
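The padding and windowing steps can be sketched in NumPy as below. The sketch rests on assumptions that should be checked against the actual pipeline: consecutive feature rows are 5 minutes apart, so a horizon of h minutes lies h/5 steps after the last input step; columns 0 and 1 hold the mean longitude and latitude; and the normalisation to [0, 1] is a per-column min-max scaling fitted on the training inputs. The function names are hypothetical.

```python
import numpy as np

SEQ_LEN = 5    # input time steps
STEP_MIN = 5   # minutes between consecutive feature rows


def make_pairs(features, horizon_min, seq_len=SEQ_LEN, step_min=STEP_MIN):
    """Zero-pad one storm's (T, F) feature array and cut it into training pairs.

    Returns X with shape (pairs, seq_len, F) and y with shape (pairs, 2),
    where y is the mean (lon, lat) horizon_min minutes after the last input.
    """
    n_feat = features.shape[1]
    padded = np.vstack([np.zeros((seq_len - 1, n_feat)), features])  # 4 leading zero rows
    ahead = horizon_min // step_min                                  # e.g. 15 min -> 3 steps
    X, y = [], []
    for start in range(len(padded) - seq_len + 1):
        target = start + seq_len - 1 + ahead
        if target >= len(padded):
            break                      # storm too short for this horizon: no pair
        X.append(padded[start:start + seq_len])
        y.append(padded[target, :2])
    return np.array(X).reshape(-1, seq_len, n_feat), np.array(y).reshape(-1, 2)


def minmax_fit(X):
    """Per-feature minimum and range over all time steps of the training inputs."""
    flat = X.reshape(-1, X.shape[-1])
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero


def minmax_apply(X, lo, span):
    return (X - lo) / span


storm = np.arange(30, dtype=float).reshape(10, 3)   # 10 time steps, 3 toy features
X15, y15 = make_pairs(storm, 15)
X30, y30 = make_pairs(storm, 30)
lo, span = minmax_fit(X15)
X15_scaled = minmax_apply(X15, lo, span)
```

A storm that is too short for a horizon simply yields no pairs for that model, which matches the separate training of each model described above.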

3.4 LSTM

For each prediction distance, a separate LSTM model was trained. According to Cheng et al. [7], this should give better results than training one model and feeding its predictions back in to predict further values. The LSTMs used the RMSProp optimizer [36]. In order to find the optimal network, the same experiments were carried out on every model examined to find the optimal parameters.

Table 3.2: Number of sequences in data.

Prediction   Training Data   Validation Data   Test Data
15 min       355396          131313            134218
30 min       291294          109153            111678
1 hour       213892          81414             83331
2 hours      132956          51705             52652
3 hours      90123           35859             35926
4 hours      64017           26194             25659
5 hours      46769           19772             18829
6 hours      34707           15399             14228
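RMSProp divides each gradient component by a running root-mean-square of its recent magnitudes, so that every weight gets its own effective step size. The sketch below shows one update step in NumPy; the decay rate `rho = 0.9` is a common choice, while `eps` and the toy objective are assumptions for illustration, and deep-learning libraries implement the same idea with their own defaults.

```python
import numpy as np


def rmsprop_step(w, grad, sq_avg, lr=1e-4, rho=0.9, eps=1e-7):
    """One RMSProp update.

    sq_avg is a running average of squared gradients; dividing by its square
    root normalises the step size separately for every weight.
    """
    sq_avg = rho * sq_avg + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg


# Toy problem: minimise sum((w - 3)^2), whose gradient is 2 * (w - 3).
w = np.zeros(2)
sq_avg = np.zeros_like(w)
for _ in range(2000):
    w, sq_avg = rmsprop_step(w, 2.0 * (w - 3.0), sq_avg, lr=0.01)
```

Since the step is roughly `lr` in size regardless of the gradient's scale, the learning rate chosen in the random search directly controls how far the weights move per update.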

3.4.1 Random Search

Bergstra and Bengio [3] argue for using random search to find the optimal parameters, as it covers the search space more efficiently than a grid search. Random search was also used by Greff et al. [18] to evaluate several different parameters. Accordingly, the following parameters were evaluated using random searches.

Number of nodes

For each layer in the network, the number of nodes was selected from a uniform distribution between 10 and 100.

Dropout

During training, dropout makes a network randomly ignore a certain fraction of its inputs, which helps it avoid overfitting to the training data [35]. Dropout was applied to the first layer of the network, with the rate selected from a uniform distribution between 0 and 0.5.

Learning Rate

The learning rate, which defines the rate at which the network weights are updated, was selected from a log-uniform distribution between $10^{-6}$ and $10^{-2}$, similarly to Greff et al. [18].


3.4.2 Layers

In order to determine the optimal number of LSTM layers, this parameter was also tested. Karpathy, Johnson, and Li [26] argue that using 2 or 3 layers gives the best results for LSTMs. On the other hand, Greff et al. [18] used a single layer and achieved very good results. The number of layers was therefore chosen from 1 to 3. The LSTM layers were followed by a fully connected layer with a two-dimensional output, representing the longitude and latitude.

For each layer setup, 25 random search trials were carried out on the data, each training for 10 epochs. Only the models predicting 15 minutes, 30 minutes and 1 hour were used. This was deemed reasonable, as three models should be enough to reveal trends and potential differences between the models.
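The search procedure described in sections 3.4.1 and 3.4.2 can be summarised in a short sketch. It is illustrative only: `evaluate` is a hypothetical callback standing in for training a network for 10 epochs and returning its validation loss, and the number of nodes is assumed to be an integer drawn once per configuration and used for every layer.

```python
import math
import random


def sample_config(rng, n_layers):
    """Draw one configuration from the distributions in section 3.4.1."""
    return {
        "layers": n_layers,
        "nodes": rng.randint(10, 100),                 # uniform integer in [10, 100]
        "dropout": rng.uniform(0.0, 0.5),              # applied to the first layer
        "learning_rate": 10 ** rng.uniform(-6, -2),    # log-uniform in [1e-6, 1e-2]
    }


def random_search(evaluate, trials_per_setup=25, layer_options=(1, 2, 3), seed=0):
    """Return the best (loss, config) pair for each number of layers."""
    rng = random.Random(seed)
    best = {}
    for n_layers in layer_options:
        for _ in range(trials_per_setup):
            cfg = sample_config(rng, n_layers)
            loss = evaluate(cfg)
            if n_layers not in best or loss < best[n_layers][0]:
                best[n_layers] = (loss, cfg)
    return best


def toy_loss(cfg):
    """Hypothetical stand-in for training: prefers no dropout and a rate near 1e-4."""
    return cfg["dropout"] + abs(math.log10(cfg["learning_rate"]) + 4)


best = random_search(toy_loss)
```

Sampling the learning rate in log space gives every order of magnitude between $10^{-6}$ and $10^{-2}$ the same probability, which a uniform draw on the raw scale would not.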

3.5 Evaluation of LSTM

When the random search was done, all models described in section 3.3 were trained on the training data, using the optimal parameters found in the random search. Training ran for 100 epochs. Each trained model was then evaluated on the test data, and the root-mean-square error (RMSE) was calculated from its forecasts after they had been scaled back to degrees. This measure was chosen because it reflects the typical distance between the predicted and the true trajectories.
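The error computation can be sketched as follows, assuming the targets were min-max scaled per coordinate and that the squared errors are averaged over both longitude and latitude, as a mean-squared-error loss over a two-dimensional output would do. Under that convention, the root-mean-square Euclidean distance between predicted and true positions is larger by a factor of √2, so the convention matters when the RMSE is read as a distance. The helper names are hypothetical.

```python
import math


def unscale(points, lo, hi):
    """Undo a per-column [0, 1] min-max scaling of (lon, lat) points."""
    return [tuple(v * (h - l) + l for v, l, h in zip(p, lo, hi)) for p in points]


def rmse_degrees(pred_scaled, true_scaled, lo, hi):
    """RMSE in degrees, averaging squared errors over both coordinates."""
    pred, true = unscale(pred_scaled, lo, hi), unscale(true_scaled, lo, hi)
    sq = [(p - t) ** 2 for pp, tt in zip(pred, true) for p, t in zip(pp, tt)]
    return math.sqrt(sum(sq) / len(sq))


def rms_distance_degrees(pred_scaled, true_scaled, lo, hi):
    """Root-mean-square Euclidean distance in degrees, for comparison."""
    pred, true = unscale(pred_scaled, lo, hi), unscale(true_scaled, lo, hi)
    d2 = [math.dist(pp, tt) ** 2 for pp, tt in zip(pred, true)]
    return math.sqrt(sum(d2) / len(d2))


# Hypothetical scaling bounds: 0 to 10 degrees in both coordinates.
lo, hi = (0.0, 0.0), (10.0, 10.0)
example_rmse = rmse_degrees([(0.1, 0.1)], [(0.0, 0.0)], lo, hi)
example_dist = rms_distance_degrees([(0.1, 0.1)], [(0.0, 0.0)], lo, hi)
```

Here a prediction that is off by 1 degree in both longitude and latitude has an RMSE of 1 degree but lies about 1.41 degrees from the true position.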

3.5.1 Linear Regression

As it is common to use a linear regression model for predicting thunderstorm trajectories (see for example Dixon and Wiener [10]), such a model was made for every forecast length in order to compare with the LSTMs. These linear regression models fitted a linear function to the longitude and latitude over the last 5 time steps, the same input window as the LSTM. This function was then used to predict future positions for the same time spans as the LSTM.
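A minimal sketch of such a baseline, assuming that fitting a linear function means an ordinary least-squares line of each coordinate against time over the last 5 steps (5 minutes apart), extrapolated to the forecast time. The function names are hypothetical, and the sketch assumes five real positions; how the baseline treated the zero-padded steps of young storms is not stated.

```python
def linear_extrapolate(times, values, t_future):
    """Least-squares line through (times, values), evaluated at t_future."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return v_mean + (sxy / sxx) * (t_future - t_mean)


def linear_forecast(track, horizon_min, step_min=5):
    """Forecast (lon, lat) horizon_min ahead from the last 5 positions of a track."""
    last5 = track[-5:]
    times = [k * step_min for k in range(len(last5))]
    t_future = times[-1] + horizon_min
    lon = linear_extrapolate(times, [p[0] for p in last5], t_future)
    lat = linear_extrapolate(times, [p[1] for p in last5], t_future)
    return lon, lat


# A storm moving 0.01 degrees east and 0.02 degrees north every 5 minutes.
track = [(0.01 * k, 0.02 * k) for k in range(5)]
lon_15, lat_15 = linear_forecast(track, 15)
```

Because the line is fitted to only five points and extrapolated unchanged, any curvature in the track or noise in the centroid grows linearly with the horizon, which is consistent with the rapidly growing RMSE of this baseline.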

3.5.2 Other Scores

In order to compare the forecasting potential of both models, the scores described in section 2.5 were used. To compute these measures, the prediction must be transformed into a grid. To test the predictions made from time t, all strikes in the 15-minute window ending at t that had not been labelled as noise were mapped onto a grid. Every pixel in the grid was given the value 1 if it contained at least one strike, and 0 otherwise. An example can be seen in Figure 3.1. The future positions of the storms were then calculated using the trained models. For every storm, the difference between its mean position at t and its predicted mean position was calculated and added to the position of each of its lightning strikes to obtain a predicted grid. This allowed the prediction to maintain the shape and size of the storms, as only the trajectories are of interest in this study.

Figure 3.1: An example of a storm grid for the prediction at 16:15, 2017-07-08. The black pixels contain at least one strike.

This grid was then compared with a grid in which the same storms had been mapped to their true positions at the predicted time. Using the same storms avoided the problem of newly created storms affecting the measures, and also removed the noise. The measures in section 2.5 were then used to compare the predictions from both the LSTM and the linear regression with the truth.
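The grid-based comparison can be sketched with sets of occupied cells, which avoids storing the full image. The grid origin, cell size and extent are hypothetical parameters, since the grid resolution is not stated; each predicted storm is shifted rigidly by the predicted displacement of its mean position, as described above, and with several storms the occupied cells of all of them would be combined.

```python
def to_grid(points, lon0, lat0, cell, nx, ny):
    """Set of (ix, iy) cells containing at least one strike (pixel value 1)."""
    cells = set()
    for lon, lat in points:
        ix = int((lon - lon0) // cell)
        iy = int((lat - lat0) // cell)
        if 0 <= ix < nx and 0 <= iy < ny:
            cells.add((ix, iy))
    return cells


def shift_storm(strikes, mean_now, mean_pred):
    """Move a storm's strikes by the predicted displacement of its mean position."""
    dlon = mean_pred[0] - mean_now[0]
    dlat = mean_pred[1] - mean_now[1]
    return [(lon + dlon, lat + dlat) for lon, lat in strikes]


def contingency(pred_cells, true_cells, n_cells):
    """Counts (a, b, c, d) of Table 2.1 from predicted and observed occupied cells."""
    a = len(pred_cells & true_cells)    # hits
    b = len(pred_cells - true_cells)    # false alarms
    c = len(true_cells - pred_cells)    # misses
    d = n_cells - a - b - c             # correct negatives
    return a, b, c, d


GRID = dict(lon0=0.0, lat0=0.0, cell=1.0, nx=10, ny=10)   # hypothetical grid
n_cells = GRID["nx"] * GRID["ny"]

storm_now = [(0.5, 0.5), (1.5, 0.5)]          # strikes in the window ending at t
pred = shift_storm(storm_now, mean_now=(1.0, 0.5), mean_pred=(2.0, 0.5))
truth = [(2.5, 0.5), (3.5, 0.5)]              # same storm at the predicted time
counts = contingency(to_grid(pred, **GRID), to_grid(truth, **GRID), n_cells)
```

In this example the predicted storm overlaps the true one in one cell, so it scores one hit, one false alarm and one miss even though its predicted centre is only one cell away.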

The measures chosen were POD, FAR, CSI and HSS. These scores capture different aspects of predictive power.

They all avoid the problem that arises when predicting a large area of which only a small part has active thunderstorms. In such cases, many areas will be correctly predicted as lacking a storm, which would give very positive results even for a very inaccurate prediction. POD, FAR and CSI do not use these correct negatives at all, and HSS discounts the agreement expected by chance.

These measures were evaluated for all days from 2017-07-08 to 2017-11-08, using all storms active at 16:00. Days without storms were ignored. The scores were calculated for 15-minute, 30-minute, 1-hour and 2-hour predictions for both the linear regression and the LSTM, in order to show how both models perform in the short term as well as over longer horizons.

Chapter 4

Results

The results of the clustering are first described in section 4.1. In section 4.2 the findings of the random search are presented, and finally in section 4.3 the loss of the final model on the test data is shown, together with storm visualizations and the scores mentioned in section 2.5.

4.1 Clustering

In order to determine the optimal parameters for the clustering, several cases are studied. A spatial ε-distance of 0.15 degrees is chosen as the optimal value. Larger values cause separate storm centres close to each other to be labelled as the same storm, while smaller values cause storms to be divided unnecessarily. A similar trend is seen for temporal ε-distances higher or lower than 20 minutes. The problem of too small values creating many small storm labels is especially pronounced at the beginning and end of a storm.

Figure 4.1 depicts the lightning strikes over Benin and Togo in the morning of 3 August 2012. The images are separated by one hour. Every point represents a lightning strike, and its colour represents its cluster; black points are labelled as noise. The two storms are also highlighted with red circles around the points associated with them. The clustering is done using the parameters given above. The algorithm manages to identify the majority of the strikes around each centre as belonging to the same storm. However, a few strikes are still labelled differently in the second image, just before the creation of the southern storm. When the northern storm dies out, several of its strikes are labelled as noise.


Figure 4.1: Clustering of storms, August 3 2012.


4.2 Random Search

The results of the random search for the different parameters can be seen in Figure 4.2. Here the losses for different values of the parameters specified in section 3.4.1 are presented in several charts, each showing the loss for three different models. The blue lines represent the first model, predicting 15 minutes into the future. The green lines represent the second model, predicting 30 minutes into the future. Similarly, the red lines correspond to the third model, which predicts 1 hour into the future.

As can be seen, the three models perform similarly for each parameter setup. Dropout is the parameter with the clearest trend: less dropout tends to give lower loss. For the number of nodes and the learning rate, there is no clear trend. For 1 layer, the best results were obtained with larger learning rates; however, this does not hold for 3 layers.

Figure 4.3 shows scatter plots of the random searches. All combinations of the three parameters are plotted, and the colour of each point represents the loss, with yellow representing low loss and blue representing high loss. Here it is also possible to see that low dropout gives lower loss. There is still no clear trend regarding the number of nodes or the learning rate, however.

In order to highlight the differences between using different numbers of layers, the best results for each model using every number of layers are presented in Table 4.1. There is no clear connection between the number of layers and which parameters perform best. Dropout is low, and the learning rate is generally between 0.0001 and 0.005. The number of nodes is around 50 when using more than 1 layer, while it is either high or low when using 1 layer. Regarding final loss, using 2 layers performs better than the other setups.

Table 4.1: Parameters for the runs with the smallest loss, for each model and number of layers. Model 1 predicts 15 minutes, model 2 predicts 30 minutes and model 3 predicts 1 hour into the future.

Layers   Model   Nodes   Dropout   Learning Rate   Loss
1        1       19      0.06393   0.00056         0.00103
1        2       93      0.00837   0.00563         0.00042
1        3       84      0.05413   0.00012         0.00108
2        1       40      0.00381   0.00002         0.00027
2        2       60      0.00230   0.00014         0.00036
2        3       60      0.00230   0.00014         0.00076
3        1       56      0.03197   0.00164         0.00036
3        2       56      0.03197   0.00164         0.00072
3        3       56      0.03197   0.00164         0.00171


(a) 1 Layer

(b) 2 Layers

(c) 3 Layers

Figure 4.2: Loss for different parameter settings, for different numbers of layers. Blue represents 15-minute predictions, green represents 30-minute predictions, and red represents 1-hour predictions.


(a) 1 Layer

(b) 2 Layers

(c) 3 Layers

Figure 4.3: Loss for different parameter settings, for different numbers of layers. Yellow means low loss, and blue means high loss.


4.3 Final Model

As the same trends can be seen for each of the three prediction models, the same set of parameters is used for all models when training the final ones. The chosen parameters can be seen in Table 4.2.

Table 4.3 shows the RMSE of the linear regression and the LSTM models for different prediction lengths. The linear regression starts out well, but its error grows rapidly: at 6 hours the error is very large, almost 3.5 degrees. The LSTM, on the other hand, consistently performs better than the linear model. Its error grows more slowly than that of the linear regression, and at 6 hours it is still below 1 degree.

Figure 4.4 shows visualizations of predictions for some storms. Long-lived storms are shown using the first 25 minutes as the basis for the prediction in Figure 4.4a, and other long-lived storms that have already lived for 50 minutes are shown in Figure 4.4b. Figure 4.4c shows storms with a shorter life span. The red markings show the trajectory that is used as input for the models, the yellow lines represent the true paths of the storms, the green lines represent the paths predicted by the linear model, and the blue lines represent the paths predicted by the LSTM. The LSTM tends to be closer to the true path, but in some cases the linear path is closer to the truth. The LSTM also tends to predict a more accurate path length, while the linear model occasionally predicts paths that are too long or too short.

Table 4.2: Parameters used for the final LSTM.

Parameter          Value
Layers             2
Nodes per layer    50
Dropout            0
Learning rate      0.0001

Table 4.3: RMSE (in degrees) for linear regression and LSTM for different prediction lengths.

Prediction   Linear Regression   LSTM
15 min       0.134               0.108
30 min       0.228               0.157
1 hour       0.431               0.245
2 hours      0.901               0.414
3 hours      1.447               0.608
4 hours      2.032               0.713
5 hours      2.688               0.798
6 hours      3.403               0.894

Figure 4.5 shows the scores mentioned in section 2.5 when predicting 15 minutes into the future, and the same scores for 30-minute, 1-hour and 2-hour predictions can be seen in Figures 4.6, 4.7 and 4.8, respectively. The scores cover a four-month period. Both models have similar performance when predicting 15 minutes and 2 hours ahead, but for 30-minute and 1-hour predictions the LSTM has a significantly higher POD, CSI and HSS, and a lower FAR. These results are supported by Table 4.4, where the scores are shown for the total of each model and prediction period, calculated using the sums of all true and false positives and negatives over all 4 months (see Table 2.1 for reference).
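The difference between averaging daily scores and computing a total score from summed counts can be seen with CSI in a small example. The counts below are invented for illustration and are not results from this thesis: a day with a single, correctly predicted storm pixel dominates the daily average but barely affects the pooled score.

```python
def csi(a, b, c):
    """Critical success index; NaN when there are no hits, misses or false alarms."""
    return a / (a + b + c) if (a + b + c) else float("nan")


def daily_and_pooled_csi(daily_counts):
    """daily_counts: one (a, b, c, d) tuple per day with storms."""
    daily = [csi(a, b, c) for a, b, c, _ in daily_counts]
    a_tot, b_tot, c_tot = (sum(day[k] for day in daily_counts) for k in range(3))
    return daily, csi(a_tot, b_tot, c_tot)


days = [(1, 0, 0, 999), (10, 30, 20, 940)]   # hypothetical counts for two days
daily, pooled = daily_and_pooled_csi(days)
mean_daily = sum(daily) / len(daily)
```

Here the daily values are 1.0 and about 0.17, averaging about 0.58, while the pooled CSI is 11/61, about 0.18; pooling weights each day by how much it actually contributes to the counts.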

Table 4.4: Scores for the total of all predictions using linear regression and LSTM, for different prediction periods.

Prediction Time   Model               POD     FAR     CSI     HSS
15 min            LSTM                0.510   0.502   0.337   0.500
15 min            Linear Regression   0.487   0.524   0.317   0.477
30 min            LSTM                0.356   0.675   0.205   0.334
30 min            Linear Regression   …       …       …       …
1 hour            LSTM                0.135   0.888   0.065   0.115
1 hour            Linear Regression   …       …       …       …
2 hours           LSTM                0.045   0.968   0.019   0.031
2 hours           Linear Regression   …       …       …       …



(a) 6 hour prediction

(b) 6 hour prediction, after 1 hour

(c) 3 hour prediction

Figure 4.4: Storm visualizations. Red represents the initial trajectory, yellow is the true trajectory, green is the linear prediction, and blue is the LSTM prediction.


Figure 4.5: Scores for 15-minute predictions for LSTM and linear regression.

Figure 4.6: Scores for 30-minute predictions for LSTM and linear regression.

Figure 4.7: Scores for 1-hour predictions for LSTM and linear regression.

Figure 4.8: Scores for 2-hour predictions for LSTM and linear regression.

Chapter 5

Discussion

The goal of this study was to use an LSTM to predict the trajectories of thunderstorms from lightning strike data, and to evaluate the accuracy of these forecasts. Compared to the linear regression model, the LSTM generates predictions that are closer to the true trajectory. It also produces forecasts with higher scores than the linear model. However, both models achieve similar scores for very short predictions (Figure 4.5) and for 2-hour predictions (Figure 4.8).

The results of the clustering are discussed in section 5.1. In section 5.2 the performance of the LSTM, as well as the random search, is evaluated. Finally, potential directions for future research are presented in section 5.3.

5.1 Clustering

As discussed in section 3.2, the lightning strike data was clustered so as to approximate real-world thunderstorms. The choice of DBSCAN was motivated by previous studies, and it worked as expected, both in grouping the strikes of each storm together and in separating different storms. Studying different thunderstorm scenarios showed how this clustering behaved compared to true thunderstorms (see Figure 4.1).

There is also the problem of merging and splitting storms. Several previous studies have discussed this problem, and few have managed to find a good solution. It was therefore not addressed in this study; since the clustering was not its focus, ignoring the problem was deemed acceptable. After studying the data, it seems that storms often have similar trajectories when splitting or merging occurs, which should mean that the effect on the trajectory prediction is limited. Larger storms, with longer lifespans, suffer from more splitting and merging than smaller storms. This could have affected the predictive power of the LSTM for longer predictions.

5.2 LSTM

The end goal of this study was to train an LSTM network that could predict thunderstorm trajectories more accurately than linear regression. This has been achieved, as shown in Table 4.3. The difference is especially clear for predictions further ahead in time, which indicates that LSTMs can be used with greater success for such predictions. In comparison, linear regression can work well for very short-term predictions, but for longer-term predictions its error increases rapidly. In their current state, LSTMs could potentially supplement existing technologies for storm prediction.

One problem, however, is the low number of sequences for long-term predictions, as shown in Table 3.2. This means that the models for longer horizons have much less data to train on, which likely affects the accuracy of their predictions. However, it is to be expected that there is less data for long-lived storms, especially in Africa, as described in chapter 1. Despite the lack of data, these models still clearly outperform the linear regression (Table 4.3).

The random search performed to find the optimal parameters failed to show clear trends for most of the parameters (see Figures 4.2 and 4.3). The only parameter with a clear trend was dropout, as the tests indicated that it only decreased performance. Table 4.1 indicates that using 2 layers is optimal, as the best models using 2 layers have lower loss than the other configurations. There was no clear indication of the optimal number of nodes in each layer, but the models with the lowest loss (for 2 or 3 layers) had around 50 nodes, so this value was used for the final model. The learning rate was set to 0.0001 for similar reasons.

Figure 4.4 shows that the LSTM can generally approximate the path of a thunderstorm much more accurately than the linear regression. It usually manages to approximate the direction and speed of the storm, where the linear regression often fails. However, the LSTM is often not very reliable or precise, and on occasion it fails completely. There are several possible reasons for this. First, the availability of data is a concern, and longer predictions suffer more from it: the prediction is often relatively close at first, but the paths diverge as time passes. Another reason can be the clustering and feature extraction. If the center point of a storm is very erratic, its path becomes more difficult to predict. The same effect occurs if the clustering does not correctly identify a storm. This might be what is happening in Figure 4.4c, where the true path (yellow) has an unrealistic trajectory in one example.

The problem can also lie in the storms themselves. As a storm generally consists of several cells in various phases, it is possible that the growth of one cell and the decay of another cause the centroid to move, independently of the movement direction and speed of the storm system as a whole.

Regarding the forecasting capabilities of the models, Figures 4.5 and 4.8 indicate that there is no clear difference for 15 minutes or 2 hours, while Figures 4.6 and 4.7 show that for 30 minutes and 1 hour the LSTM outperforms the linear model. These results are corroborated by Table 4.4. The daily scores fluctuate strongly, likely due to differences in the number of storms per day: days without storms were ignored, but days with few storms are more sensitive to errors in the predictions. This is not a problem for the total scores presented in Table 4.4, however, as they are computed from all predictions together.

One reason that the two models are closer in performance here than when measured in RMSE could be that these scores make no distinction between predicting a storm close to the truth and predicting it very far from it, as only true positives and true negatives count as successes. The POD and the FAR indicate that the models occasionally output somewhat correct predictions. The CSI, which can be seen as an accuracy measure that ignores the correctly predicted calm areas, indicates, however, that the performance is probably not good enough to be used in practice. The HSS also indicates that, although both models are better than a random guess, they fail to perform adequately for longer predictions, where they are only slightly better than random guessing.

It is not surprising that both models perform worse for predictions of more than 15 minutes. A storm cell generally lives for around 20 minutes, so a storm that continues for longer will consist of new cells. Since the models do not predict, and therefore do not change, the exact size and shape of a storm, accuracy decreases as this happens. For predictions of 2 hours or longer, this is not a usable approach. However, for shorter predictions the LSTM considerably improves upon the linear regression, despite the creation of new storm cells.

The decline in accuracy of the LSTM at 2 hours can probably be explained by the fact that neither model predicts the death of storms. Most storms have likely died after 2 hours, meaning that both models predict storms that no longer exist.

That the results of the LSTM are not better is not very surprising. As has been mentioned, predicting a trajectory that is close to the true trajectory but does not overlap with it still counts as a complete miss. Also, there is no prediction of the death of the storm. Furthermore, this study has only predicted the trajectories of the storms: the predicted storms keep their shape and size when these scores are calculated, whereas the true storms change.

Furthermore, it is possible that the clustering was unrepresentative of the actual storms. While the clustering managed to group strikes in close proximity into the same storm and to separate different storms, not all strikes were correctly labelled (see Figure 4.1). The problem of merges and splits, especially for larger storms, could also mean that different storms were clustered as the same one. In that case, the models have learned and predicted the trajectories of these clusters, not those of the actual storms. This would not show when examining the RMSE and the visualizations, but it would cause greater errors in the scores. However, it has been shown that the behaviour of the storms can be learned. Even if the trajectories of the clusters are not exactly the same as those of the storms, they should be close enough that the same (or very similar) LSTM models could be used with a better clustering algorithm. As a result, the predictive power of the models might become much better if the clustering is improved.

5.3 Future Research

The results of this study indicate that LSTMs can significantly improve thunderstorm trajectory prediction compared with linear regression. However, many parameters and questions remain to be considered as potential paths for future work.

One idea is to improve upon the current clustering algorithm. An improved clustering algorithm that also handles merges and splits would likely increase the performance of the models. As mentioned in section 5.1, long-lived storms have more merges and splits, so an improved clustering algorithm could significantly improve the accuracy of the longer predictions. As the LSTM learns from the clusters, it is important that they represent reality as closely as possible.

Further study into exactly which features affect the result the most would also be of interest. This was, for example, done by Greff et al. [18] for handwriting recognition. It is possible that some features do not contribute enough to motivate their use. It is also possible that replacing some features with new ones would improve the results. One idea is to use the geographical center of the storms instead of the mean position of the strikes. Another interesting idea could be to allow for internal variability of the storm, such as the polarity distribution of the lightning strikes, different lightning densities and a variable shape.

Further, it should be possible to achieve better results with more data, especially for long-lived storms. It is for these storms that the greatest improvement over linear models can be made, and with more data to train on, even better accuracy can hopefully be achieved. This could be combined with other types of data, such as wind speeds at different altitudes.

It would also be very interesting to adapt the network to predict other aspects of the thunderstorms, such as their intensity, lifespan, size and shape. The shape could potentially be approximated as an ellipse, and the intensity as the number of strikes within a set area. Such a study could make it possible to predict all of these aspects of a thunderstorm, allowing such models to generate far more complete and accurate forecasts.
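As an illustration of the ellipse idea, a storm's shape could be summarised from the covariance matrix of its strike positions: the eigenvectors give the ellipse's orientation and the square roots of the eigenvalues its axes. The sketch assumes strike positions already projected to a planar grid in kilometres; the 2-standard-deviation scaling, the `radius` used for intensity and the function names are hypothetical choices, not values from this thesis.

```python
import numpy as np

def fit_ellipse(xy, n_std=2.0):
    """Covariance ellipse of strike positions xy with shape (n, 2).

    Returns the centre, semi-major axis, semi-minor axis and the
    orientation of the major axis in degrees, in [0, 180).
    """
    center = xy.mean(axis=0)
    cov = np.cov(xy, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    major = eigvecs[:, 1]
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    return center, n_std * np.sqrt(eigvals[1]), n_std * np.sqrt(eigvals[0]), angle

def intensity(xy, center, radius):
    """Number of strikes within `radius` of `center`."""
    return int(np.sum(np.linalg.norm(xy - center, axis=1) <= radius))

# Strikes spread along a south-west to north-east line.
xy = np.array([[t, t] for t in range(-4, 5)] + [[1, -1], [-1, 1]], dtype=float)
center, a, b, angle = fit_ellipse(xy)
print(round(angle, 1), round(a, 2), round(b, 2))  # 45.0 6.93 1.26
print(intensity(xy, center, radius=1.5))          # 5
```

The centre, the two axes and the angle could then serve as additional prediction targets alongside the mean position.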

Chapter 6

Conclusion

It was shown that an LSTM-based model can predict thunderstorm trajectories with greater accuracy than linear models. The model predicts the mean position of the storm for the next 6 hours, using data consisting of individual lightning strikes.
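For comparison, a linear baseline in its simplest form fits each coordinate of a storm's observed mean positions as a straight line in time and extrapolates it over the forecast horizon. The sketch below assumes positions projected to kilometres and times in minutes; it is one possible form of a linear model, not necessarily the exact linear regression used in this thesis. The helper `rmse` measures the root-mean-square Euclidean distance between predicted and true positions.

```python
import numpy as np

def linear_forecast(times, positions, future_times):
    """Least-squares straight-line fit of x(t) and y(t), extrapolated.

    times: (n,), positions: (n, 2), future_times: (m,) -> (m, 2)
    """
    slope, intercept = np.polyfit(times, positions, deg=1)  # each has shape (2,)
    return np.outer(future_times, slope) + intercept

def rmse(pred, true):
    """Root-mean-square Euclidean distance between two (m, 2) tracks."""
    return float(np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=1))))

# A storm drifting steadily, observed every 10 minutes.
times = np.array([0.0, 10.0, 20.0, 30.0])
positions = np.column_stack([2 + 0.5 * times, -1 + 0.2 * times])
horizon = times[-1] + np.arange(30.0, 361.0, 30.0)  # 30 min to 6 h after the last observation
pred = linear_forecast(times, positions, horizon)
# Expected: about (32, 11) at t = 60 min and (197, 77) at t = 390 min.
print(pred[0], pred[-1])
print(rmse(pred, pred + np.array([3.0, 4.0])))  # 5.0
```

A straight-line fit cannot follow curving tracks, which is consistent with the gap between the models widening for longer predictions.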

The first task was to cluster the lightning strikes to identify separate thunderstorms. The DBSCAN algorithm was used for storm identification and tracking. The input features for the LSTM model were then calculated from the identified storms.
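The core of DBSCAN [14] can be sketched in a few lines. This version treats each strike as a point `(x_km, y_km, scaled_time)`, where time is multiplied by a hypothetical factor so that a single `eps` covers both space and time. The thesis's exact distance measure and parameter values are not reproduced here. Points in dense regions become cluster members, isolated strikes are labelled `-1` (noise), and the mean position of each cluster is the kind of per-storm feature passed on to the LSTM. The pairwise distance matrix makes this O(n²) in memory, so it only suits small batches. In practice a library implementation such as scikit-learn's `DBSCAN` would be used.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(points)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    # All pairwise distances: fine for a sketch, too memory-hungry for large data.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbours = np.flatnonzero(dists[i] <= eps)  # includes i itself
        if len(neighbours) < min_samples:
            continue  # noise for now; may later become a border point
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point of this cluster
            if visited[j]:
                continue
            visited[j] = True
            j_neighbours = np.flatnonzero(dists[j] <= eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
        cluster += 1
    return labels

def cluster_means(points, labels):
    """Mean position of each cluster, ignoring noise."""
    return {int(k): points[labels == k].mean(axis=0) for k in np.unique(labels) if k != -1}

# Two compact groups of strikes and one isolated strike (x_km, y_km, scaled time).
strikes = np.array([
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
    [50, 50, 0], [51, 50, 0], [50, 51, 0], [51, 51, 0],
    [100, 0, 0],
], dtype=float)
labels = dbscan(strikes, eps=1.5, min_samples=3)
print(labels.tolist())                 # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(cluster_means(strikes, labels))  # cluster 0 centred at (0.5, 0.5, 0)
```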

In order to identify the best parameters for the LSTM, a random search was carried out. It found that 2 layers, 50 nodes in each layer, a learning rate of 0.0001 and no dropout were the best-performing parameter settings. These were used in the final model, which was tested and compared to a linear regression model. Measuring the error of the predictions showed quantitatively that the LSTM provides more accurate trajectory predictions than the linear model, and that the difference between the models grows as the predictions become longer.
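Random search [3] samples each hyperparameter independently instead of walking a grid. A minimal sketch, assuming a hypothetical `evaluate(config)` that trains an LSTM with the given settings and returns its validation loss. The candidate values below are illustrative and are not the search space used in this thesis. The learning rate is sampled log-uniformly so that trials spread evenly across orders of magnitude, which is the usual choice for this parameter.

```python
import math
import random

def sample_config(rng):
    """Draw one hypothetical LSTM configuration."""
    return {
        "layers": rng.choice([1, 2, 3]),
        "nodes": rng.choice([25, 50, 100, 200]),
        "learning_rate": 10 ** rng.uniform(-5, -2),  # log-uniform in [1e-5, 1e-2]
        "dropout": rng.choice([0.0, 0.2, 0.5]),
    }

def random_search(evaluate, n_trials, seed=0):
    """Return the configuration with the lowest loss over n_trials samples."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

# Stand-in for training: a made-up loss that favours 2 layers,
# 50 nodes, a learning rate near 1e-4 and no dropout.
def toy_loss(c):
    return (abs(c["layers"] - 2) + abs(c["nodes"] - 50) / 50
            + abs(math.log10(c["learning_rate"]) + 4) + c["dropout"])

best, loss = random_search(toy_loss, n_trials=200)
print(best, round(loss, 3))
```

With a real `evaluate`, each trial is a full training run, so the number of trials is bounded by the available compute rather than by the size of the search space.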

Common forecast verification scores also show that the LSTM generates more accurate forecasts than the linear model, especially for timespans of 30 minutes to 1 hour. Table 4.4 shows that the POD at 30 minutes is 8 percentage points higher for the LSTM than for the linear model, an improvement of almost 30 %. Considerable improvements can be seen for the other scores as well, indicating that the LSTM is better at identifying the correct position of the storms and less dependent on chance. Predictions of 2 hours or more into the future concern storms that have likely dissipated by then, so both models obtain similar scores.
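POD and the other scores are computed from a 2×2 contingency table of forecast against observed events [12]. Assuming the forecast is verified cell by cell on a spatial grid (a hit is a cell where lightning was both forecast and observed), POD, FAR and HSS follow directly from the four counts. The counts in the example below are invented for illustration.

```python
def verification_scores(hits, false_alarms, misses, correct_negatives):
    """POD, FAR and Heidke skill score from a 2x2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    pod = a / (a + c)   # fraction of observed events that were forecast
    far = b / (a + b)   # fraction of forecast events that did not occur
    # HSS: accuracy relative to the accuracy expected by random chance
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, hss

pod, far, hss = verification_scores(hits=50, false_alarms=20, misses=30, correct_negatives=900)
print(round(pod, 3), round(far, 3), round(hss, 3))  # 0.625 0.286 0.64
```

Because HSS subtracts the accuracy a random forecast would achieve, an improved HSS is what supports the statement that the LSTM is less dependent on chance.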

Section 5.3 proposes further study into the clustering, to solve the problem of merging and splitting storms. Merges and splits are more common for long-lived storms and can cause separate storms to be identified as the same. An improved clustering algorithm would produce features that more closely represent the truth. Research should also be carried out to predict the shape, size and lifespan of thunderstorms, in order to generate a complete forecast. Approximating storms as ellipses would be interesting, as the axes and orientation of each ellipse could then be predicted. Predicting the shape and size of a thunderstorm makes it possible to generate even more accurate forecasts, and predicting its lifespan would avoid forecasts for storms that have already dissipated, lowering the FAR.

Bibliography

[1] Alexandre Alahi et al. “Social LSTM: Human trajectory prediction in crowded spaces”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 27–30, 2016). Las Vegas, Nevada, 2016, pp. 961–971. DOI: 10.1109/CVPR.2016.110.

[2] Sheila Alemany et al. “Predicting Hurricane Trajectories using a Recurrent Neural Network”. In: ArXiv e-prints (Feb. 2018). arXiv: 1802.02548.

[3] James Bergstra and Yoshua Bengio. “Random search for hyper-parameter optimization”. In: Journal of Machine Learning Research 13 (Feb. 2012), pp. 281–305.

[4] Hans-Dieter Betz et al. “Cell-tracking with lightning data from LINET”. In: Advances in Geosciences 17 (July 2008), pp. 55–61.

[5] Derya Birant and Alp Kut. “ST-DBSCAN: An algorithm for clustering spatial–temporal data”. In: Data & Knowledge Engineering 60.1 (2007), pp. 208–221. DOI: 10.1016/j.datak.2006.01.013.

[6] Gianluca Bontempi, Yann-Aël Le Borgne, and Jacopo De Stefani. “A Dynamic Factor Machine Learning Method for Multi-variate and Multi-step-Ahead Forecasting”. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). (Oct. 19–21, 2017). Tokyo, Japan, 2017, pp. 222–231. DOI: 10.1109/DSAA.2017.1.

[7] Haibin Cheng et al. “Multistep-Ahead Time Series Prediction”. In: Advances in Knowledge Discovery and Data Mining. 10th Pacific-Asia Conference, PAKDD. (Apr. 9–12, 2006). Ed. by Wee-Keong Ng et al. Singapore, 2006, pp. 765–774.

[8] Vernon Cooray. An Introduction to Lightning. Dordrecht: Springer Netherlands, 2015. ISBN: 978-94-017-8938-7. DOI: 10.1007/978-94-017-8938-7_1.

[9] Sandy Dance, Elizabeth Ebert, and David Scurrah. “Thunderstorm Strike Probability Nowcasting”. In: Journal of Atmospheric and Oceanic Technology 27.1 (2010), pp. 79–93. DOI: 10.1175/2009JTECHA1279.1.

[10] Michael Dixon and Gerry Wiener. “TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting—A Radar-based Methodology”. In: Journal of Atmospheric and Oceanic Technology 10.6 (1993), pp. 785–797. DOI: 10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

[11] Charles A. Doswell III. “MESOSCALE METEOROLOGY | Severe Storms”. In: Encyclopedia of Atmospheric Sciences (Second Edition). Ed. by Gerald R. North, John Pyle, and Fuqing Zhang. Second Edition. Oxford: Academic Press, 2015, pp. 361–368. ISBN: 978-0-12-382225-3. DOI: 10.1016/B978-0-12-382225-3.00366-2.

[12] Charles A. Doswell III, Robert Davies-Jones, and David L. Keller. “On Summary Measures of Skill in Rare Event Forecasting Based on Contingency Tables”. In: Weather and Forecasting 5.4 (1990), pp. 576–585. DOI: 10.1175/1520-0434(1990)005<0576:OSMOSI>2.0.CO;2.

[13] Andrew Dowdy and Graham A. Mills. “Characteristics of lightning-attributed fires in south-east Australia”. In: International Journal of Wildland Fire 21.5 (2012), pp. 521–524.

[14] Martin Ester et al. “A density-based algorithm for discovering clusters in large spatial databases with noise”. In: KDD’96. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. (Aug. 2–4, 1996). Portland, Oregon, 1996, pp. 226–231.

[15] Felix A. Gers, Jürgen Schmidhuber, and Fred A. Cummins. “Learning to forget: continual prediction with LSTM”. In: ICANN 99. Ninth International Conference on Artificial Neural Networks. (Sept. 7–10, 1999). Edinburgh, UK, 1999, pp. 850–855. DOI: 10.1049/cp:19991218.

[16] Tilmann Gneiting and Adrian E. Raftery. “Weather Forecasting with Ensemble Methods”. In: Science 310.5746 (2005), pp. 248–249. ISSN: 0036-8075. DOI: 10.1126/science.1115255.

[17] Alex Graves. “Supervised Sequence Labelling with Recurrent Neural Networks”. PhD thesis. Technical University of Munich, 2012.

[18] Klaus Greff et al. “LSTM: A Search Space Odyssey”. In: IEEE Transactions on Neural Networks and Learning Systems 28.10 (2017), pp. 2222–2232. ISSN: 2162-237X. DOI: 10.1109/TNNLS.2016.2582924.

[19] Jiawei Han, Jian Pei, and Micheline Kamber. Data mining: concepts and techniques. Third Edition. Elsevier, 2011.

[20] Ronald Holle. “Annual rates of lightning fatalities by country”. In: 20th International Lightning Detection Conference. (Apr. 21–23, 2008). Tucson, Arizona, 2008.

[21] Michael L. Hutchins and Robert H. Holzworth. “Thunderstorm characteristics from cluster analysis of lightning”. In: XV International Conference on Atmospheric Electricity. (June 15–20, 2014). Norman, Oklahoma, 2014.

[22] Michael L. Hutchins, Robert H. Holzworth, and James B. Brundell. “Diurnal variation of the global electric circuit from clustered thunderstorms”. In: Journal of Geophysical Research: Space Physics 119.1 (2014), pp. 620–629. ISSN: 2169-9402. DOI: 10.1002/2013JA019593.

[23] Otto Hyvärinen. “A Probabilistic Derivation of Heidke Skill Score”. In: Weather and Forecasting 29.1 (2014), pp. 177–181. DOI: 10.1175/WAF-D-13-00103.1.

[24] Guo Juntian, Gu Shanqiang, and Feng Wanxing. “A lightning motion prediction technology based on spatial clustering method”. In: 2011 7th Asia-Pacific International Conference on Lightning. (Nov. 1–4, 2011). Chengdu, China, 2011, pp. 788–793. DOI: 10.1109/APL.2011.6110234.

[25] Serrie I. Kamara. “THE ORIGINS AND TYPES OF RAINFALL IN WEST AFRICA”. In: Weather 41.2 (1986), pp. 48–56. ISSN: 1477-8696. DOI: 10.1002/j.1477-8696.1986.tb03787.x.

[26] Andrej Karpathy, Justin Johnson, and Fei-Fei Li. “Visualizing and Understanding Recurrent Networks”. In: ArXiv e-prints (2015). arXiv: 1506.02078.

[27] Moriah Kohn et al. “Nowcasting thunderstorms in the Mediterranean region using lightning data”. In: Atmospheric Research 100.4 (2011), pp. 489–502. ISSN: 0169-8095. DOI: 10.1016/j.atmosres.2010.08.010.

[28] Valliappa Lakshmanan, Benjamin Herzog, and Darrel Kingfield. “A Method for Extracting Postevent Storm Tracks”. In: Journal of Applied Meteorology and Climatology 54.2 (2015), pp. 451–462. DOI: 10.1175/JAMC-D-14-0132.1.

[29] Valliappa Lakshmanan and Travis Smith. “Data Mining Storm Attributes from Spatial Grids”. In: Journal of Atmospheric and Oceanic Technology 26.11 (2009), pp. 2353–2365. DOI: 10.1175/2009JTECHA1257.1.

[30] Zachary C. Lipton, John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning”. In: ArXiv e-prints (2015). arXiv: 1506.00019.

[31] Jenny Matthews and John Trostel. “An improved storm cell identification and tracking (SCIT) algorithm based on DBSCAN clustering and JPDA tracking methods”. In: 21st International Lightning Detection Conference. (Apr. 19–20, 2010). Orlando, Florida, 2010.

[32] Karen I. Mohr and Edward J. Zipser. “Mesoscale Convective Systems Defined by Their 85-GHz Ice Scattering Signature: Size and Intensity Comparison over Tropical Oceans and Continents”. In: Monthly Weather Review 124.11 (1996), pp. 2417–2437. DOI: 10.1175/1520-0493(1996)124<2417:MCSDBT>2.0.CO;2.

[33] Mina Moradi Kordmahalleh, Mohammad Gorji Sefidmazgi, and Abdollah Homaifar. “A Sparse Recurrent Neural Network for Trajectory Prediction of Atlantic Hurricanes”. In: GECCO ’16. Proceedings of the Genetic and Evolutionary Computation Conference 2016. Denver, Colorado, 2016, pp. 957–964. DOI: 10.1145/2908812.2908834.

[34] Xingjian Shi et al. “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”. In: Advances in Neural Information Processing Systems 28. (Dec. 7–12, 2015). Ed. by C. Cortes et al. Montreal, Canada, 2015, pp. 802–810.

[35] Nitish Srivastava et al. “Dropout: A simple way to prevent neural networks from overfitting”. In: The Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.

[36] Tijmen Tieleman and Geoffrey Hinton. Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Class: Neural networks for machine learning. 2012.

[37] Tapio J. Tuomi and Markku Larjavaara. “Identification and analysis of flash cells in thunderstorms”. In: Quarterly Journal of the Royal Meteorological Society 131.607 (2005), pp. 1191–1214. ISSN: 1477-870X. DOI: 10.1256/qj.04.64.

[38] Vaisala. Unique Vaisala Global Lightning Dataset GLD360. Brochure. 2015. URL: https://www.vaisala.com/sites/default/files/documents/WEA-MET-GLD360%20Brochure-B211271EN.pdf.

[39] Hans Volland. Atmospheric Electrodynamics. Berlin, Heidelberg: Springer Berlin Heidelberg, 1984. ISBN: 978-3-642-69813-2. DOI: 10.1007/978-3-642-69813-2_4.

[40] Earle R. Williams. “Lightning and climate: A review”. In: Atmospheric Research 76.1 (2005). Atmospheric Electricity, pp. 272–287. ISSN: 0169-8095. DOI: 10.1016/j.atmosres.2004.11.014.

[41] WMO. Nowcasting. 2017. URL: http://www.wmo.int/pages/prog/amp/pwsp/Nowcasting.htm (visited on 02/06/2018).

[42] Edward J. Zipser et al. “Where are the most intense thunderstorms on Earth?” In: Bulletin of the American Meteorological Society 87.8 (2006), pp. 1057–1071.
