DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2017

A performance study of an evolutionary algorithm for two point stock forecasting

FREDRIK HYYRYNEN

MARCUS LIGNERCRONA

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Abstract

This study was conducted to determine whether it is possible to accurately predict stock behavior by analyzing general patterns in historical stock data. This was done by creating an evolutionary algorithm that learned and weighted possible outcomes by studying the behaviour of the Nasdaq stock market between 2000 and 2016, and then using the result of the training to make predictions.

Testing with varied parameters showed that clear patterns could not reliably be established with the suggested method, as small adjustments to the measuring dates yielded wildly different results. The results also suggest that the amount of data is more relevant to the performance than how closely the stocks are related, and that less precise predictions perform better than predicting multiple degrees of change. The seemingly better setting was shown to perform worse than random predictions, but research with other settings might yield more accurate predictions.

Summary

This study was carried out to establish whether it is possible to reliably predict the behaviour of a stock price by analysing general patterns in historical stock data. This was done by creating an evolutionary algorithm that learns and assigns weights to possible outcomes by studying stock prices on the Nasdaq stock market between 2000 and 2016, and then uses the result of the learning to make forecasts.

The results of tests with varied parameters established that clear patterns could not be found with the proposed method, since small adjustments to the measuring dates gave large differences in the results. They did suggest, however, that the amount of data was more relevant to the performance than whether the stocks were related to each other, and that lower precision gave better performance than forecasts of more degrees of change. The setting that seemed better was shown to perform worse than random forecasts, but further research with other settings could give more reliable forecasts.

Contents

1 Introduction
    1.1 Problem statement
    1.2 Scope

2 Background
    2.1 Adjusted closing price
    2.2 Yahoo! Finance
    2.3 Efficient market hypothesis
    2.4 Evolutionary algorithms
    2.5 Literature review
        2.5.1 Carl Sandström - An evolutionary approach to time series forecasting with artificial neural networks
        2.5.2 Defu Zhang, QiSen Cai et al. - A Novel Stock Forecasting Model based on Fuzzy Time Series and Genetic Algorithm

3 Method
    3.1 Data
    3.2 Algorithm
        3.2.1 Overview
        3.2.2 Variations
    3.3 Measurements
        3.3.1 Hit rate
        3.3.2 Same direction rate

4 Results
    4.1 Variations
        4.1.1 Unrelated stock selection - eight inclination segments
        4.1.2 Unrelated stock selection - two inclination segments
        4.1.3 Similar stock selection - eight inclination segments
        4.1.4 Similar stock selection - two inclination segments
        4.1.5 Random predictions

5 Discussion
    5.1 Discussion of results
    5.2 Discussion of method

6 Conclusion

Bibliography

Appendices

A Stock symbols
    A.1 Stock symbols of the supplied stocks
    A.2 Unrelated stock symbols for training
    A.3 Unrelated stock symbols used for evaluation
    A.4 Health care stock symbols
    A.5 Health care stock symbols used for training
    A.6 Health care stock symbols used for evaluation

Chapter 1

Introduction

Stock forecasting is a subject that is interesting not only because of the possibility of monetary gain but also because of the opportunity to develop forecasting algorithms. The subject has been studied extensively[1][2], and the results vary depending on many factors. One such factor is the design of the algorithm; another is the selection of analysed stocks and time periods.

Perfect stock precognition may be an impossibility because the factors that affect stock values are far too many to account for, or so complex that there is no way of measuring them. The efficient market hypothesis[3] also implies that it is impossible for an expert analyst to invest in an undervalued stock and sell when it is overvalued. The changes in the stock price directly reflect the actions and value of the company, and the only way to “beat the market” is by investing in companies that will be valued higher in the future due to unforeseen actions.

Even though they seem impossible to find, it is commonly believed that there may exist patterns in the history of a stock[4] that affect its future values: patterns that people subconsciously consider when they trade their shares, and patterns that companies might base their actions on and that will be reflected in their value. The existence of professionals in the business of forecasting stock values suggests that these patterns should exist, or else their job depends solely on luck. If these patterns exist, then a well-designed algorithm will be able to find them.


1.1 Problem statement

This report investigates a stock forecasting algorithm inspired by methods used in Sandström's study[1] and in a study by QiSen Cai and others[2].

The questions this investigation will answer are the following:

• Can the investigated evolutionary algorithm accurately predict stock price changes using only historical adjusted closing prices on a set of stocks?

• Does it perform better on a set of related stocks or on a set of randomly selected stocks from the same stock market?

• How is the performance affected by the precision with which the data is interpreted?

• How is the performance affected by the amount of data available for learning?

1.2 Scope

The stock data used in this investigation are supplied by Yahoo! Finance[5] and come from the Nasdaq[6] stock market.

Because of a short time frame, this investigation tests only a few of the many possible variations of settings for the algorithm.

The number of measuring points and the precision used in the algorithm affect the execution time and are therefore kept at experimental values (see section 3.2.1).

The data used are the stock prices of 53 different stocks from 2000-01-01 (or later if data is missing) to 2016-12-30. Among these 53 stocks are 12 stocks from the health care category. These health care stocks are used as related stocks due to availability and their shared categorization.

The settings of the algorithm were chosen by the authors and are experimental, in order to reduce the execution time of the algorithm. Other settings might result in much better performance.

Chapter 2

Background

This section provides brief information about the kind of data, API and algorithm used in the study, and contains a statement about the efficient market hypothesis (EMH).

2.1 Adjusted closing price

Closing price refers to the final price of a security on a given trading day.

The definition of adjusted closing price used in this investigation follows the definition from Investopedia[7]:

“An adjusted closing price is a stock’s closing price on any given day of trading that has been amended to include any distributions and corporate actions that occurred at any time prior to the next day’s open.”

2.2 Yahoo! Finance

Yahoo! Finance[5] supplies a way of gathering historical stock data from different stock markets. The data supplied contains date, opening price, daily high value, daily low value, closing price, adjusted closing price, and volume for many different dates and stocks. The data features used in this study are the date and the adjusted closing price.


2.3 Efficient market hypothesis

From Investopedia[3]:

“The efficient market hypothesis (EMH) is an investment theory that states it is impossible to ‘beat the market’ because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, stocks always trade at their fair value on stock exchanges, making it impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. As such, it should be impossible to outperform the overall market through expert stock selection or market timing, and the only way an investor can possibly obtain higher returns is by purchasing riskier investments.”

2.4 Evolutionary algorithms

An evolutionary algorithm is a kind of machine learning algorithm, that is, an algorithm that utilizes the computational power of computers to learn how to solve different problems.

Evolutionary algorithms are inspired by biological evolution. They pass through a set of mechanisms such as selection, reproduction, mutation, and recombination[8].

Following is a summary of the stages of an evolutionary algorithm:

• A set of variables is set to initial values, randomized or predefined by the author. The variables are used to generate a population of solutions. Each solution strives to complete a task, and how well the solution has achieved said task is measured by a fitness value.

• A subset of the solutions in the population is selected and passed to the reproduction stage. The selection compares the fitness values, and a number of top solutions are chosen. Pairs of the selected solutions are recombined in the reproduction stage, and the results become children of the pairs.

• The combination stage combines the variable settings of both parents in different ways, and the generated children thus have variable settings defined only by their parents.

• Each child undergoes a mutation. This mutation stage shifts a number of variables, and the mutated children are subjected to the same process as the parents, but this time with their own variable settings.

Each iteration of population creation and child production, called a generation, is believed to produce better solutions than the previous generation. The process halts once a halting condition is reached.
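The stages above can be sketched as a short loop. This is only an illustration of a generic evolutionary algorithm, not the algorithm investigated in this report: the function names, the population size, the toy fitness function and the choice to let the selected parents survive next to their children are all assumptions made for the example.

```python
import random

def evolve(fitness, n_vars, pop_size=20, n_parents=6, generations=50,
           mutation_scale=0.1, seed=0):
    """Generic evolutionary loop: select, recombine, mutate, repeat."""
    rng = random.Random(seed)
    # Initial population: randomized variable settings.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # halting condition: fixed generation count
        # Selection: keep the solutions with the highest fitness values.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            # Recombination: each variable is taken from one of the parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: shift one randomly chosen variable slightly.
            i = rng.randrange(n_vars)
            child[i] += rng.gauss(0.0, mutation_scale)
            children.append(child)
        # Assumption: parents survive next to their children, so the best
        # fitness in the population never decreases between generations.
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(solution):
    # Hypothetical task: bring every variable close to 0.5.
    return -sum((x - 0.5) ** 2 for x in solution)

best = evolve(toy_fitness, n_vars=3)
```

Keeping the parents in the next generation is one common design choice; without it, a generation can be worse than the previous one.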

2.5 Literature review

This section focuses on research made prior to this investigation, the results it produced and how it relates to this investigation.

2.5.1 Carl Sandström - An evolutionary approach to time series forecasting with artificial neural networks

In his research[1], Carl Sandström examined whether an approach to stock forecasting using an evolutionary algorithm performed better than a back-propagating algorithm. Testing was performed on three stocks selected from the Nasdaq stock market: Apple, Microsoft and Yahoo. While the author deemed that his findings did not provide a definite conclusion on whether an evolutionary algorithm performs better than a back-propagating one, the results presented hint that an evolutionary algorithm performs at least as well as a back-propagating algorithm. Using the evolutionary algorithm, the author made an approximate 57% return on investments in Microsoft, an approximate 13% return on investments in Apple and an approximate 17% return on investments in Yahoo.

2.5.2 Defu Zhang, QiSen Cai et al. - A Novel Stock Forecasting Model based on Fuzzy Time Series and Genetic Algorithm

In their research[2], Defu Zhang and others attempted to predict the stock market using a combination of a genetic algorithm and fuzzy time series. The results indicate that a combination of genetic algorithms and fuzzy time series surpasses methods using only fuzzy time series, as was demonstrated by comparing the results of that study's proposed method with those of three more conventional models implementing fuzzy time series.

Chapter 3

Method

This section first describes the data and then the algorithm and the variables used to conduct the investigation.

3.1 Data

Random stock data refers to the data on each of the stocks supplied by Yahoo! Finance[5]. It consists of historical data from 53 different stocks on the Nasdaq market[6]. Each stock's data consists of the date and the adjusted closing price.

Similar stock data refers to the stock data on 12 different stocks in the health care category on the Nasdaq market[6]. These stocks are believed to be related as they share a category.

Prediction table refers to the table generated by the algorithm in each evolutionary iteration. The generated table contains probabilities of how the stock price might change, expressed in inclination segments, depending on the inclinations between the measuring points. The probability is calculated in the learning stage for every possible combination of segmented inclinations between the measuring points. For more information on the segmentation of inclinations, see figure 3.1.

Best prediction table refers to the table with the greatest fitness value of all the prediction tables generated across the evolutionary iterations. This table is believed to give the best possible prediction given the learning period and the other variable settings.

Figure 3.1: An illustration of the usage of the variable inclination segments with values 8 and 2.

3.2 Algorithm

This section first defines the variables that affect the performance of the algorithm and then describes the different stages of the algorithm. The different settings used in this investigation are specified in section 3.2.2.

3.2.1 Overview

Variables

Number of measuring points. The number of dates that will be used in the analysis. This value is set to six. It is experimental and chosen by the authors. It is believed to be low enough to reduce the amount of data needed for a prediction and high enough to find useful patterns.

Number of inclination segments. The number of segments into which the different inclinations are classified. This is varied to answer the question of how the results relate to the precision with which the data is interpreted. Figure 3.1 shows how this variable is used and the effect of changing it.

Analysis time frame. The number of years, months or days across which the measuring points are spread in each step of the analysis, and thus the time frame in which a pattern can be found. This is kept at 4 months, as this is believed to be long enough to find a pattern and short enough to generate a sufficient amount of data for an increase in performance.

Analysis period. The total number of years, months or days that will be analyzed. This is kept at 16 years in this investigation. This number is experimental and chosen by the authors. It is believed to generate a sufficient amount of data to increase the performance.

Prediction dates. The dates, relative to the analysis time frame, that will be predicted by the algorithm. These dates are set to one and two months after the last measuring point. These numbers are experimental and chosen by the authors. More dates allow for more information about the performance, and two dates are believed to be enough for this performance study.

Number of evolutions. The number of times the algorithm should pass through the initialization, learning, and evolving stages. A greater number is believed to yield a better best prediction table. This number is kept at 200 to reduce the overall execution time, and it is believed to be high enough to stabilize at a pattern that enables accurate predictions.

Initialization stage

This stage sets the initial values of the variables and, most importantly, the initial relative dates of the measuring points. These dates are first evenly spread across the analysis time frame and are later adjusted in the evolving stage.

Learning stage

In this stage the algorithm reads data from a set of stocks on the dates of all the measuring points and the prediction points in the analysis time frame. The analysis time frame then moves one month at a time across the stock data to cover the whole analysis period. In each analysis step, the inclination between any two adjacent points is classified into one of the number of inclination segments. The sequence of classified inclinations between adjacent measuring points decides the path in the prediction table. The classified inclinations between the last measuring point and the first prediction point, and between any two adjacent prediction points, increase a weight on this path in the table. These weights give the probability that the same pattern occurs given only the same measuring points.
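A minimal sketch of the learning stage follows. It is not the authors' implementation. It assumes that an inclination is the relative price change between two adjacent points, that prices are given as a plain list indexed by trading day, and that the segment boundaries are evenly spaced around zero; the actual boundaries are defined by figure 3.1 and are not reproduced here. All function names and the `max_change` parameter are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

def segment_boundaries(n_segments, max_change=0.15):
    """Hypothetical evenly spaced boundaries, symmetric around zero change."""
    half = n_segments // 2
    step = max_change / half
    return [step * k for k in range(-half + 1, half)]

def classify(change, boundaries):
    """Map a relative price change to an inclination segment index."""
    return bisect_right(boundaries, change)

def learn(prices, measure_idx, predict_idx, boundaries, step=1):
    """Build a prediction table: measured pattern -> counts of outcomes."""
    table = defaultdict(Counter)
    points = list(measure_idx) + list(predict_idx)
    n_measure = len(measure_idx)
    # Slide the analysis time frame across the price series.
    for start in range(0, len(prices) - points[-1], step):
        values = [prices[start + p] for p in points]
        segs = tuple(classify(b / a - 1.0, boundaries)
                     for a, b in zip(values, values[1:]))
        # Inclinations between measuring points select the path; the
        # remaining inclinations are the outcome that gains weight.
        pattern, outcome = segs[:n_measure - 1], segs[n_measure - 1:]
        table[pattern][outcome] += 1
    return table

def most_probable(table, pattern):
    """Return the outcome with the largest weight, or None if unseen."""
    outcomes = table.get(pattern)
    return outcomes.most_common(1)[0][0] if outcomes else None

# Toy series that alternates between two prices; two segments (fall, rise).
prices = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
table = learn(prices, measure_idx=[0, 1, 2], predict_idx=[3],
              boundaries=segment_boundaries(2))
forecast = most_probable(table, (1, 0))
```

With two segments the single boundary is zero, so segment 0 means a fall and segment 1 means no change or a rise. In the toy series a rise followed by a fall is always followed by a rise, so `forecast` is `(1,)`.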

Benchmark stage

The algorithm runs a test on a new set of stocks using the settings and probabilities from the prediction table calculated in the learning stage. Predictions are made on dates with known values, and the predicted values are compared to the real values. Any pair of predicted inclination and real inclination that falls in the same inclination segment is counted as a hit. The number of hits is that prediction table's fitness value. If the fitness value is higher than that of the current best prediction table, the best prediction table is replaced.

The algorithm continues with the evolving stage if the number of iterations is below the number of evolutions; otherwise it finishes the process and supplies the current best prediction table as the solution.

Evolving stage

In this stage the relative dates of the measuring points are altered. Randomized variations occur at randomly chosen intermediate measuring points. The first and last measuring points and the prediction points are kept at fixed dates as defined in the initialization stage. After a number of evolutionary steps, ten in this investigation, the settings are first copied from the current best prediction table and then altered. This is done to prevent the settings of the children from differing too much and to keep the search for a pattern near the current best one in order to improve it.

After this stage the iteration starts over at the learning stage with the new measuring point settings.
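The interplay of the initialization and evolving stages can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the investigation: the analysis time frame is taken to be 120 days, a mutation shifts one intermediate point by at most five days, and `fitness` stands in for the combined learning and benchmark stages. All names are hypothetical.

```python
import random

def initial_offsets(n_points=6, frame_days=120):
    """Spread the measuring points evenly across the analysis time frame."""
    step = frame_days / (n_points - 1)
    return [round(i * step) for i in range(n_points)]

def mutate(offsets, rng, max_shift=5):
    """Shift one randomly chosen intermediate point; endpoints stay fixed."""
    new = list(offsets)
    i = rng.randrange(1, len(new) - 1)
    shifted = new[i] + rng.randint(-max_shift, max_shift)
    # Keep the point strictly between its neighbours to preserve the order.
    new[i] = min(max(shifted, new[i - 1] + 1), new[i + 1] - 1)
    return new

def search(fitness, n_evolutions=200, reset_every=10, seed=0):
    """Chain of settings in which the best one is re-used every tenth step."""
    rng = random.Random(seed)
    current = initial_offsets()
    best, best_fit = current, fitness(current)
    for step in range(1, n_evolutions + 1):
        # Every tenth step, copy the settings of the best table so far.
        base = best if step % reset_every == 0 else current
        current = mutate(base, rng)
        fit = fitness(current)
        if fit > best_fit:
            best, best_fit = current, fit
    return best, best_fit

def toy_fitness(offsets):
    # Hypothetical stand-in for the number of hits in the benchmark stage.
    return -abs(offsets[2] - 50)

best, best_fit = search(toy_fitness)
```

Because each new setting is derived from the previous one, the search behaves like a chain rather than a population, which matches the description in section 5.2.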

Forecast stage

This stage is used to make the most probable prediction given a generated pattern. In this investigation the pattern generated during the training period is used to analyze a new set of stocks that was not used during training, and to output the most probable outcome, which is compared to the true value to generate the results.

3.2.2 Variations

Some variable settings are kept constant throughout the investigation while others are varied. These variations are described here.

Unrelated stock selection - eight inclination segments

The set of stocks used in training consists of 47 of the 53 stocks supplied by Yahoo! Finance (see A.1 and A.2). The set of stocks used in the forecasting stage consists of the remaining 6 stocks (see A.3). The number of inclination segments is set to eight.

Unrelated stock selection - two inclination segments

The sets of stocks used in training and in the forecasting stage are the same as above. The number of inclination segments is set to two.

Similar stock selection - eight inclination segments

The set of stocks used in training consists of 8 of the 12 stocks supplied by Yahoo! Finance in the health care category (see A.4 and A.5). The set of stocks used in the forecasting stage consists of the remaining 4 stocks (see A.6). The number of inclination segments is set to eight.

Similar stock selection - two inclination segments

The sets of stocks used in training and in the forecasting stage are the same as above. The number of inclination segments is set to two.

Random predictions

This setting is used for evaluating the performance of the investigated method. The predictions are not based on the probabilities from the best prediction table but are instead random values. The method is otherwise used in the same manner as for the other settings.

The set of stocks used in training consists of 47 of the 53 stocks supplied by Yahoo! Finance (see A.1 and A.2). The set of stocks used in the forecasting stage consists of the remaining 6 stocks (see A.3). The number of inclination segments is set to eight.

3.3 Measurements

The results indicate how far the predictions are from the true values. This is measured in two ways: hit rate and same direction rate. The most promising setting is tested with different stock symbols to obtain more hit rates, from which a mean value and the standard deviation (see formula 3.1) are calculated to give more information about the performance of the setting.

$$ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} \qquad (3.1) $$
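Formula 3.1 is the sample standard deviation, which divides by N − 1 rather than N. The sketch below computes it directly and can be compared with `statistics.stdev` from the Python standard library, which uses the same definition. The hit rates in the example are made up for illustration and are not results from this investigation.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation as in formula 3.1 (divides by N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

hit_rates = [40.0, 50.0, 60.0]  # made-up hit rates in percent
s = sample_std(hit_rates)
```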

3.3.1 Hit rate

This measurement is the percentage of true value and prediction value pairs that are classified in the same inclination segment on each prediction date.

3.3.2 Same direction rate

This measurement is the percentage of predictions that predict a rise when the true value rises and a fall when the true value falls.
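Both measurements can be expressed in a few lines. The sketch assumes that inclination segments are numbered from the steepest fall to the steepest rise, so that the lower half of the indices means a fall and the upper half a rise. That numbering, the function names and the example values are assumptions made for illustration.

```python
def hit_rate(predicted, actual):
    """Percentage of predictions in the same segment as the true value."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

def same_direction_rate(predicted, actual, n_segments):
    """Percentage of predictions on the same side (rise or fall) as the truth."""
    half = n_segments // 2
    same = sum((p >= half) == (a >= half) for p, a in zip(predicted, actual))
    return 100.0 * same / len(actual)

# Made-up segment indices for four predictions with eight segments.
predicted = [0, 3, 5, 7]
actual = [0, 2, 5, 1]
exact = hit_rate(predicted, actual)
direction = same_direction_rate(predicted, actual, n_segments=8)
```

In this example two of four predictions land in the correct segment (50%), while three of four are on the correct side of zero (75%). With two inclination segments the two measurements coincide.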

Chapter 4

Results

4.1 Variations

The following are the individual results of each variation during benchmarking, together with the results of the final prediction during forecasting.

4.1.1 Unrelated stock selection - eight inclination segments

With eight segments, the evolution with the highest hit rate found during benchmarking reached around 18% on unrelated stocks, as seen in figure 4.1. Using this particular evolution for forecasting gave the following hit rates:

• 8.3% were exact hits

• 25% were 1 segment away from hit


• 8.3% were 3 segments away from hit


With a hit redefined as correctly guessing whether the value increases or decreases, the maximum hit rate found during benchmarking was around 64% on unrelated stocks. With this definition, the forecast yielded a hit rate of 66.6%.

4.1.2 Unrelated stock selection - two inclination segments

With 2 segments, the best evolution yielded a hit rate of about 80% on unrelated stocks during benchmarking, as seen in figure 4.3, and a hit rate of 75% during forecasting.

As this setting performed best, further investigation was made to determine its mean value and standard deviation (see formula 3.1). This was done by repeating the test with the same settings while using varied learning stocks and forecasting stocks from the supplied stock data (see section A.1). This yielded a mean hit rate of 47% and a standard deviation of 16.3 percentage points.

4.1.3 Similar stock selection - eight inclination segments

With 8 segments, the evolution with the highest hit rate yielded around 31.25% hits during benchmarking on related stocks, as seen in figure 4.4.

Using this particular evolution for forecasting gave the following hit rates:

• 25% were exact hits





With a hit redefined as correctly guessing whether the value increases or decreases, the maximum hit rate found on related stocks was close to 68.75%. With this definition, the forecast yielded a hit rate of 62.5%.

Figure 4.1: Hit rate with 8 inclination segments on unrelated stock data. “1 away” refers to predicted inclinations that differed from the real inclination by 1 inclination segment. The same goes for “2 away” and so on. The diagram shows how each evolutionary step performed. The settings of the measuring points of two neighbouring evolution points (with the exception of every tenth point) differ by the small adjustments made on each evolutionary step.

4.1.4 Similar stock selection - two inclination segments

With 2 segments, the best evolution yielded a hit rate of 87.5% during benchmarking on related stocks, as seen in figure 4.6, and a hit rate of 50% during forecasting.

4.1.5 Random predictions

For easy comparison, a test using randomly generated numbers as guesses was run and resulted in a hit rate of around 21%, as seen in figure 4.7.


Figure 4.2: Same direction rate with 8 inclination segments on unrelated stock data. The diagram is read in the same manner as figure 4.1.

Figure 4.3: Hit rate with 2 inclination segments on unrelated stock data. The diagram is read in the same manner as figure 4.1.

Figure 4.4: Hit rates on each evolutionary step with 8 inclination segments on similar stock data. The diagram is read in the same manner as figure 4.1.

Figure 4.5: Same direction rate with 8 inclination segments on similar stock data. The diagram is read in the same manner as figure 4.1.

Figure 4.6: Hit rates on each evolutionary step with 2 inclination segments on similar stock data. The diagram is read in the same manner as figure 4.1.

Figure 4.7: Hit rates on each evolutionary step with 8 inclination segments on unrelated stock data and random predictions. The diagram is read in the same manner as figure 4.1.

Chapter 5

Discussion

5.1 Discussion of results

The numbers presented in chapter 4 are the results of one iteration of each variation. These numbers are not final: because of the random nature of the adjustments made on each evolutionary step, other iterations may yield much better or worse results. The numbers were used as a measurement of the fitness of the different settings. As two inclination segments on unrelated stocks performed better, further investigation was made on this setting.

The results and corresponding diagrams allow for many different interpretations. One reason is the decision to present every evolution, whether or not that evolution was an improvement; another is the variations that were tested. Presenting every evolution makes it easier to see whether there were clear improvements between evolutions or whether improvements appeared merely because a random mutation fared better than a previous evolution. The latter is the likelier explanation given the volatile nature of the hit rates. One interpretation of the diagrams and this volatility is that small adjustments to the measuring dates greatly affect the results, which makes the search for a desired pattern sensitive to change and therefore hard to complete with a small number of evolutions. The difficulty of finding patterns supports the efficient market hypothesis[3].

Figure 5.1: Illustration of a theoretical outcome of the evolutionary steps.

Figure 5.1 illustrates what the diagram of evolutionary steps would look like if more steps ensured better results. The results of the investigation show no such increase in hit rates. Thus a larger number of evolutions only increases the probability of finding a reliable pattern, and any number of evolutions may produce the same results.

With random predictions and eight inclination segments, one may observe an even spread between the hit rates at different levels of accuracy. Compared with the proposed method for guessing inclination segments, the random predictions appear to be more accurate. It is interesting to note, however, that the spread between the levels of accuracy differs between the two.

Although the random predictions (figure 4.7) performed better than the proposed method with eight inclination segments (figure 4.1) in the evolutionary steps, the forecast with an error margin of one inclination segment shows that the proposed method with the same settings outperformed the random predictions. The same direction rate differs by 25 percentage points, which also supports the conclusion that the proposed method is better than completely random guesses. The better results of the random predictions in the evolutionary steps may be due to the coincidence of guessing the correct values at that one time; using the settings from that generation did not change the low probability of guessing the correct inclination again.

The hit rate obtained with only two inclination segments (see figure 3.1 and figure 4.3) shows the same volatile nature as the preceding results, but the average hit rate is above that of a random method. The hit rates of the evolutionary steps vary between 40% and 80%, with an average of 55%. Randomly guessing up or down would give a hit rate of around 50%, while the proposed method reached 80% during learning and 75% on the forecast. The further investigation shows that this case could be one of the more unlikely ones, as the mean value was 47% with a standard deviation of 16.3 percentage points. Assuming normally distributed hit rates, this means that 15.9% of the runs would give a hit rate of 63.3% or more. More than 50% of the runs would, however, perform worse than randomly guessing whether the stock value increases or decreases. The other settings have not been tested this thoroughly and might yield a higher mean value and a smaller standard deviation.
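The two shares quoted above follow from a normal approximation, which is an assumption here, since the distribution of the measured hit rates is not stated. Under that assumption they can be checked with `statistics.NormalDist` from the Python standard library, using the reported mean of 47% and standard deviation of 16.3 percentage points:

```python
from statistics import NormalDist

# Normal approximation of the hit rates (assumed, not measured).
hit_rates = NormalDist(mu=47.0, sigma=16.3)

# Share of runs at least one standard deviation above the mean (63.3%).
share_above = 1.0 - hit_rates.cdf(63.3)

# Share of runs below the 50% expected from random up/down guesses.
share_below_random = hit_rates.cdf(50.0)
```

Here `share_above` is about 0.159 and `share_below_random` is about 0.57, which agrees with the statement that more than half of the runs would do worse than random guessing.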

A comparison can be made between the diagrams using two inclination segments (see figure 4.3 and figure 4.6) and the diagrams using eight inclination segments (see figure 4.2 and figure 4.5) when only considering whether the predicted value increased or decreased. The difference then lies in the number of inclination segments used during the learning process, while what these data sets consider a hit is the same. The comparison makes it clear that, on similar stocks, using two inclination segments outperforms using eight in both the highest hit rate reached and the average hit rate. A comparison of the corresponding diagrams for unrelated stocks shows much the same behaviour.

Regarding the comparison of related and unrelated stocks, no definitive conclusion can be drawn. This can be seen by comparing figure 4.1, which uses a large number of unrelated stocks, with figure 4.4, which uses a smaller number of related stocks and where the results appear to be more volatile than the other results in general. A conclusion that can be drawn, however, is that the amount of data is of greater importance.

5.2 Discussion of method

The proposed method uses an algorithm with a questionable asso-ciation to evolutionary algorithms as it can resemble a chain whereeach link is generated from the previous link and the strongest linkinfluence every tenth link, instead of the typical population and se-lection stages. The method does however evolve in a way such thatthe strongest offspring survives throughout the process of selectionbut the population is handled differently. This questionable imple-mentation of evolutionary algorithms may have a great impact on theperformance compared to other evolutionary algorithms.

Only six measurement points were chosen throughout the experimentwhich affects the precision and the amount of data analyzed. No con-clusions can be drawn considering this variable except that the sixmeasurement points generated promising results.

The decision to use two prediction points, instead of only one or a long period of data points, is believed to decrease not only the memory usage and execution time but also the uncertainty of the predictions. The proposed method would most likely perform worse with more prediction points if no other changes to the settings are made.

The proposed method measures dates scattered over four months to predict the dates one and two months after the last measuring point. Analyzing months instead of years makes more data available and allows more drastic changes in the stock value to be analyzed more thoroughly, while shorter intervals such as weeks or days would affect the execution time or the ability to find long-term patterns in the history of the stock value. No conclusions can be drawn, as only one interval setting was tested.

The learning stage analyzed 16 years' worth of data (or less if the stock was missing data), which might hide modern, short-term patterns and therefore make the performance suffer. Decreasing this period might result in better performance, but no conclusions on this matter can be made.

Chapter 6

Conclusion

It was found that the proposed method for predicting stock price changes had the potential to perform better than random predictions. Only predicting whether the stock price would increase or decrease seemed to perform better than predicting with higher precision, but further investigation of this setting revealed that it might perform worse than or equal to random predictions. Less precise predictions seemed to perform better, but a conclusive statement on whether this method can be used to accurately predict stock price changes cannot be made.

No conclusive statement can be made on whether related stocks or randomly selected stocks perform better, due to a number of variables affecting the results. One variable was the amount of data available to train the proposed algorithm; more data seems to have a positive effect on the performance, but the larger time periods being analyzed might have the opposite effect.

Some questions are left unanswered as further research is required. Settings not tested in this investigation might produce predictions of higher accuracy.

Bibliography

[1] Carl Sandström. An evolutionary approach to time series forecasting with artificial neural networks. Degree project from KTH Royal Institute of Technology - CSC school (2015), http://kth.diva-portal.org. Accessed: 2017-03-28.

[2] Defu Zhang, QiSen Cai et al. A Novel Stock Forecasting Model based on Fuzzy Time Series and Genetic Algorithm. International Conference on Computational Science (2013), http://www.sciencedirect.com. Accessed: 2017-03-28.

[3] Investopedia. Efficient market hypothesis. http://www.investopedia.com/terms/e/efficientmarkethypothesis.asp. Accessed: 2017-03-28.

[4] Investopedia. Trendline. http://www.investopedia.com/terms/t/trendline.asp. Accessed: 2017-03-31.

[5] Yahoo! Finance. https://finance.yahoo.com/. Accessed: 2017-03-28.

[6] Nasdaq. http://www.nasdaq.com/. Accessed: 2017-03-28.

[7] Investopedia. Adjusted Closing Price. http://www.investopedia.com/terms/a/adjusted_closing_price.asp. Accessed: 2017-03-28.

[8] J.E. Smith and A.E. Eiben. “What is an Evolutionary Algorithm?” In: Introduction to Evolutionary Computing. 2003. Chap. 2, pp. 15–18.

Appendix A

Stock symbols

The following stock symbols are supplied by Yahoo! Finance [5] from the Nasdaq [6] stock market.

A.1 Stock symbols of the supplied stocks

LJPC • DRYS • AAPL • BCRX • ERIC • GOLD • NVDA • XXIA • TSEM • MNKD • URBN • SIRI • AMZN • ASML • CBMX • LOGI • NVMI • BIDU • STX • NFLX • SHIP • CYNO • GILT • QCOM • PTEN • SSRI • XTLB • AFSI • CELG • SAFT • GPRO • KFRC • VOXX • MELI • FSLR • MSFT • CSCO • MASI • ALLT • DNBF • ETRM • PCLN • CERN • GILD • GOOG • SBUX • INFN • HIMX • KTOS • MYGN • SWKS • AMGN • ACAD

A.2 Unrelated stock symbols for training

LJPC • DRYS • AAPL • BCRX • ERIC • GOLD • NVDA • XXIA • TSEM • MNKD • URBN • SIRI • AMZN • ASML • CBMX • LOGI • NVMI • BIDU • STX • NFLX • SHIP • CYNO • GILT • QCOM • PTEN • SSRI • XTLB • AFSI • CELG • SAFT • GPRO • KFRC • VOXX • MELI • FSLR • MSFT • CSCO • MASI • HIMX • KTOS • MYGN • SWKS • AMGN • ACAD • ALLT • DNBF • ETRM

A.3 Unrelated stock symbols used for evaluation

PCLN • CERN • GILD • GOOG • SBUX • INFN

A.4 Health care stock symbols

LJPC • BCRX • MNKD • CBMX • XTLB • CELG • MASI • ETRM • GILD • MYGN • AMGN • ACAD

A.5 Health care stock symbols used for training

LJPC • BCRX • MNKD • CELG • MASI • GILD • MYGN • ACAD

A.6 Health care stock symbols used for evaluation

AMGN • ETRM • XTLB • CBMX
