DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019
Preprocessing Data: A Study on Testing Transformations for Stationarity of Financial Data
SARA BARWARY
TINA ABAZARI
KTH SCHOOL OF ENGINEERING SCIENCES
Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology, year 2019
Supervisors: Rickard Henricsson, Peyman Dabiri & Cecilia Pettersson
Supervisors at KTH: Camilla Landén, Per Jörgen Säve-Söderbergh & Julia Liljegren
Examiner at KTH: Per Jörgen Säve-Söderbergh
TRITA-SCI-GRU 2019:270 MAT-K 2019:29
Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci
Abstract
In this thesis within Industrial Economics and Applied Mathematics, written in
cooperation with Svenska Handelsbanken, a given set of transformations was
examined in order to assess their ability to make a given time series stationary.
In addition, a parameter α belonging to each of the transformation formulas was
to be determined. To do this, an extensive study of previous research was
conducted, and two different hypothesis tests were used to confirm the output.
As a result, a value or interval for α was chosen for each transformation.
Moreover, the first difference transformation was found to have a positive effect
on the stationarity of financial data.
Summary
This Bachelor's thesis within Industrial Economics and Applied Mathematics,
written in cooperation with Handelsbanken, examines given transformations in
order to assess their ability to make given time series stationary. In addition,
a parameter α belonging to each transformation formula was to be determined. To
do this, an extensive study of previous research was carried out and two
different hypothesis tests were performed to confirm the output. A result was
compiled in which a value or an interval for α was chosen for each
transformation. Moreover, the first difference transformation turned out to be
beneficial for the stationarity of financial data.
Keywords
Bachelor Thesis, financial outcome, transformations, stationarity, tests of
hypothesis, EWMA
1 Preface
This Bachelor's thesis was written in the spring of 2019 by Sara Barwary and
Tina Abazari during a five-year Master's program in Industrial Engineering
and Management at KTH Royal Institute of Technology. The thesis is based on the
application of theory from mathematical statistics as well as the field of
industrial economics. We would like to thank Cecilia Pettersson, Rickard
Henricsson and Peyman Dabiri at Handelsbanken for contributing to the work and
providing the resources needed. We would also like to express our appreciation
to our supervisor Camilla Landén and to Per Jörgen Säve-Söderbergh at KTH for
their help and support when we faced problems throughout the work. Julia
Liljegren at the Department of Industrial Engineering and Management also
provided valuable input and guidance to the project.
Contents
1 Preface
2 Introduction
2.1 Background
2.2 Research Question
2.3 Goal and Purpose
2.4 Scope and Limitations
3 Economic Theory
3.1 Terminology
3.1.1 Securities
3.1.2 Market Index
3.1.3 Exchange Rates
3.1.4 Commodities
3.1.5 Volatility
3.1.6 Bonds
3.2 Timing of Entry Framework
3.2.1 First Mover Advantages
3.2.2 First Mover Disadvantages
3.3 Porter’s Five Forces
4 Mathematical Theory
4.1 Time Series
4.1.1 The Objectives of Time Series Analysis
4.1.2 Time Series Decomposition
4.1.3 Trends
4.1.4 Seasonality
4.1.5 Stationarity
4.2 Stationarity Hypothesis Testing
4.2.1 Dickey-Fuller Test
4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test
4.3 Transformations
4.3.1 Level Transformation
4.3.2 First Difference Transformation
4.3.3 Mean EWMA-transformation
4.3.4 Variance-EWMA Transformation
4.3.5 Skewness EWMA Transformation
4.3.6 Kurtosis-EWMA Transformation
4.3.7 Autocorrelation Transformation
4.3.8 Correlation-EWMA Transformation
5 Methodology
5.1 Data Collection
5.2 Data and Notations
5.2.1 Exchange Rates (FX data)
5.2.2 US Sectors Data
5.2.3 Countries- Stock Index Data
5.2.4 Commodities Data
5.2.5 VIX- Market Volatility Index Data
5.2.6 Bond (IR) Data
5.2.7 Transformations
5.3 Selection of Transformations and Hypothesis Tests
5.4 Selection of Market Entry Frameworks
5.5 Literature Study
5.6 Procedure of Work
6 Results
6.1 First Trial: Plots for Currency Rates, with Fixed α
6.1.1 Statistics for First Trial
6.2 Second Trial: Plots for Commodity, with a Fixed α
6.2.1 Statistics for Second Trial
6.3 Third Trial: Plots for Commodity Prices, with a Fixed α
6.3.1 Statistics with trial 3
6.4 Seasonality and Trends
6.5 Skewness and Kurtosis
6.6 First Differences on all Data
6.7 Finding the Optimal α
6.7.1 Currencies
6.7.2 US-Sectors
6.7.3 Countries Index
6.7.4 Commodities
6.7.5 VIX (Market Volatility)
6.7.6 IR (Bonds)
6.7.7 Aggregated α
7 Conclusions
7.1 Interpretation and Impact
7.1.1 Trial 1
7.1.2 Trial 2
7.1.3 Trial 3
7.1.4 Skewness and Kurtosis
7.1.5 First Difference as a Transformation
7.1.6 Finding the Optimal α
7.2 Analysis of Timing of Entry and Competitive Rivalry
7.3 Future Work
7.4 Benefits for SHB and its Stakeholders
7.5 Final Words
2 Introduction
2.1 Background
In the last couple of years, machine learning based forecasting has gained
increasing attention and become more established. Moreover, using machine
learning for prediction of financial outcome has become desirable among
financial institutions and private investors.[1] There are ongoing discussions
and research about how to improve these prediction models, as well as about how
to pre-process input data in order to obtain predictions with high accuracy.[2]
This is why machine learning has become essential: it combines computer science
and mathematics to develop models with the intent of delivering maximal
predictive precision.
Predictions of financial outcome, for example security prices or market indices,
involve a time component since future price movements may depend on past values.
Thus, the time dimension needs to be taken into account when using a machine
learning based prediction model. Prices in the financial market can be seen as
observations at points in time, so a financial price observed over a time period
can be described as a time series. As mentioned, the interest in using machine
learning for prediction of price movements in the financial market has grown.
Consequently, time series forecasting has become an increasingly important area
of machine learning.[3]
The underlying assumption in time series forecasting and the related machine
learning methods is that the input data is a stationary process. That is, the
statistical properties of the time series, for example the mean, variance and
autocorrelation, should not change over time.[4] However, most data is not
stationary.
[1] Sarlin, Peter; Björk, Kaj-Mikael. "Machine learning in finance". Neurocomputing, Vol. 264, 2017: 1-88. (Retrieved 2019-02-02)
[2] Palaniappan, Vivek. "Using Machine Learning to Predict Stock Prices", 2018-10-31. https://medium.com/analytics-vidhya/using-machine-learning-to-predict-stock-prices-c4d0b23b029a (Retrieved 2019-02-02)
[3] Brownlee, Jason. "What is Time Series Forecasting?". Machine Learning Mastery, 2016-12-2. https://machinelearningmastery.com/time-series-forecasting/?fbclid=IwAR1Zpv80x-4EEN-IIo-h1HL5fGHF6fD-OZYpknScLWdmU-p3uJ803ZF9Ag (Retrieved 2019-05-01)
[4] Lindgren, George. "Stationary stochastic processes", p. 13-16. http://www.math.chalmers.se/~rootzen/fintid/stationary120312.pdf (Retrieved 2019-02-02)
The longer the time span of historical observations, the greater the probability
that the time series shows non-stationary characteristics.[5]
For many machine learning methods, handling non-stationary data sets is a
challenge, since it increases the risk of obtaining predictions that differ
significantly from the real outcomes. A non-stationary time series results from
data showing trends, seasonal effects, cycles, noise and other structures that
depend on the time of observation. Therefore, it cannot be analyzed reliably
with traditional techniques. Instead, forecasting non-stationary time series
may require models with higher complexity. To obtain more reliable output from
a prediction model, effects such as seasonal components and trends may need to
be removed from the input data set.[6] It is possible to make data stationary,
or at least approximately stationary, by the use of mathematical
transformations.
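As a minimal sketch of how a transformation can remove a trend, the Python
snippet below applies first-order differencing to a simulated random walk with
drift. The simulated data, the drift of 0.1, the seed and the helper names
`first_difference` and `half_means` are illustrative assumptions and are not
taken from the thesis data or code. Comparing the means of the two halves of
each series is only an informal check, not a formal stationarity test.

```python
import random
import statistics


def first_difference(series):
    """Return the first-order differences x[t] - x[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def half_means(series):
    """Mean of the first and second half of a series (informal check only)."""
    mid = len(series) // 2
    return statistics.fmean(series[:mid]), statistics.fmean(series[mid:])


# Simulated random walk with drift (an assumption, not data from the thesis).
random.seed(0)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + 0.1 + random.gauss(0.0, 1.0))

diffs = first_difference(walk)

# The level series drifts, so its half means typically differ a lot;
# the differenced series fluctuates around the drift in both halves.
print("level half means:     ", half_means(walk))
print("difference half means:", half_means(diffs))
```

Differencing shortens the series by one observation, which is why `diffs` has
one element fewer than `walk`.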
In the last couple of months, Svenska Handelsbanken AB (SHB) has been
discussing a market entry for new financial products. The idea is to predict the
return of securities with a machine learning based model, upon which the
products can be based in the future.
Rickard Henricsson at SHB conducted research ten years ago regarding
mathematical transformations and their ability to generate stationary financial
data. As a result of considering this potential business idea, SHB has raised
the question of whether the transformations are still applicable to data today.
Henricsson found several transformations, including both established ones and
his own approximations. The approximations were derived with the aim of reducing
the complexity of some of the transformations. His studies resulted in seven
chosen transformations.
• Differencing (First order)
• Exponentially weighted moving average: Mean
• Exponentially weighted moving average: Variance
[5] Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting", p. 16-19. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-19)
[6] Kang, Eugine. "Time Series: Check Stationarity", 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)
• Exponentially weighted moving average: Skewness
• Exponentially weighted moving average: Kurtosis
• Exponentially weighted moving average: Autocorrelation
• Exponentially weighted moving average: Correlation
The definition and meaning of these will be explained more thoroughly in the
theoretical background, Section 4. Except for the first difference
transformation, the transformations depend on an unknown constant α. Changing
the value of α results in different output from each transformation.
Consequently, the choice of α for a chosen transformation may have an impact on
whether the data can be made stationary. Accordingly, SHB is interested in
examining which specific values of α can potentially make financial time series
stationary.
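To illustrate how the choice of α changes the output of an exponentially
weighted moving average, the sketch below smooths a short made-up price series
for three values of α. The recursion used here is the common textbook form and
is an assumption: the thesis defines its own transformation formulas in Section
4.3, and they may weight the terms differently. The function name `ewma_mean`
and the sample prices are hypothetical.

```python
def ewma_mean(series, alpha):
    """Exponentially weighted moving average with smoothing weight alpha.

    Assumed recursion (textbook form, not taken from the thesis):
        m[0] = x[0]
        m[t] = alpha * x[t] + (1 - alpha) * m[t-1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up values

# A small alpha reacts slowly to new observations; alpha close to 1
# follows the latest observation almost exactly.
for alpha in (0.1, 0.5, 0.9):
    print(alpha, [round(m, 3) for m in ewma_mean(prices, alpha)])
```

Under this assumed form, α = 1 reproduces the input series unchanged, which
gives a simple sanity check when experimenting with different values.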
Furthermore, SHB is one of the biggest banks in the Nordic countries. In
the Nordic financial sector, there are not many commercial players today
providing financial products related to machine learning based financial outcome
prediction. Consequently, SHB has the potential to be among the first players in
this area. It is therefore of interest for SHB to understand how the timing of entry
to market can affect their business.
2.2 Research Question
The work of this thesis was done in cooperation with SHB. The main aim is to
examine whether financial data can be transformed to become stationary, and for
what value or values of the parameter α stationarity is achieved. The time span
of all the data sets is 2001-01-01 to 2018-12-31. The main research questions to
be answered are consequently the following:
1. Are the given transformations sufficient to make the data stationary?
2. Which value or values of the parameter α will potentially make the data
stationary for each transformation?
Also, a discussion will be held regarding the effects of the timing of entry to
a new market. More precisely, given that the transformations can make financial
data stationary and SHB can develop financial products based on machine learning
based financial outcome prediction, how will the timing of potential new product
launches affect their competitive advantage?
2.3 Goal and Purpose
Predicting financial outcome for securities is relevant for private investors as well
as for financial institutions. The goal of this thesis is to examine how financial
data may be pre-processed in order to make it useful as input data to a prediction
model.
The main goal for SHB is to separately develop, based on the results of this
thesis, a machine learning based forecasting model for prediction of financial
outcome. More precisely, their model will indicate future price movements in the
financial market, mainly for stocks in well developed countries such as the US.
For example, this can be stock indices from the US such as the Dow Jones
Industrial Average or the S&P 500.
Since SHB is currently in the initial phase of the model development, it is
important for them to know whether it is possible to make the input data for the
future model stationary. The goal of this research is to provide insight into
this question. If it is not possible to make the data stationary, they may need
to consider conducting further research on building a model on non-stationary
data. Alternatively, this thesis can answer whether SHB needs to conduct further
research on how to make data stationary. Therefore, the greater purpose of this
study is to give a direction for the future work of SHB.
In the market entry discussion, the market of SHB will be limited to other
banking institutions in Sweden and the Nordics, since SHB's main business
activity lies within this area.
2.4 Scope and Limitations
The scope of this thesis is limited to examining transformations given by SHB.
The data is also provided by SHB and is mainly related to the financial markets
of the US and other well developed countries. Moreover, it is necessary to
determine what qualifies as stationarity, since there exist a strict and a weak
form. For this research, it has been decided that it is sufficient if a time
series fulfills the requirements of weak stationarity, since proving strict
stationarity for a whole data set is complex. The difference between these types
of stationarity is explained in Section 4.1.5.[7] Moreover, the project is
limited to two different hypothesis tests, both chosen by SHB. These were chosen
since they are based on different model assumptions and hypotheses and may
therefore give a wider perspective to the analysis of the results.
[7] "Stationarity Differencing". https://www.statisticshowto.datasciencecentral.com/stationarity/ (Retrieved 2019-03-02)
3 Economic Theory
Terms related to the financial market that are mentioned in the thesis or used
as input data in the research are explained in this section. The purpose is to
make the content of this thesis easier to understand. Theory regarding stocks,
bonds and other financial assets is provided to explain why they are important
to look at when studying an economy. Moreover, Porter's Five Forces model is
introduced and discussed, as well as the benefits of being a first mover to the
market.
3.1 Terminology
3.1.1 Securities
A security is a financial asset that can be traded. There are several types of
securities, and these are in general classified as equity securities, debt
securities and derivatives.

Equity securities represent ownership in an entity. The most common equity
security is a stock, which represents ownership of a share of a company.[8]

A debt security represents borrowed money that must later be repaid: the issuer
borrows from the holder. When a debt security is issued, terms are specified,
for example the size of the loan, the maturity date and the interest rate.
Corporate bonds and government bonds are two examples of frequently used debt
securities.[9]

Derivatives are contracts between at least two parties. The value of the
contract is based on an underlying asset such as a stock, an interest rate or a
market index. There are various derivatives, such as options and futures.[10]
[8] Kenton, Will. "Security", 2019-05-20. https://www.investopedia.com/terms/s/security.asp (Retrieved 2019-05-22)
[9] Chen, James. "Debt Security", 2019-03-23. https://www.investopedia.com/terms/d/debtsecurity.asp (Retrieved 2019-05-20)
[10] Chen, James. "What is a Derivate?", 2019-05-19. https://www.investopedia.com/ask/answers/12/derivative.asp (Retrieved 2019-05-22)
3.1.2 Market Index
A market index is a measurement of a segment of the financial market. More
precisely, the index shows the performance of the securities within the chosen
segment. A market index is computed from the prices of the securities. There are
several weighting methods for determining the impact of each price.[11]
3.1.3 Exchange Rates
An exchange rate is the value of a nation's or economic zone's currency
expressed in the currency of another nation or economic zone. The currency
exchange rate is one of the most important factors when indicating a country's
economic health relative to others. It is vital to a country's level of trade
and financial flows in the area.[12] Movements in the exchange rate have an
influence on the decisions of businesses, governments and individuals in
society. Collectively, this may have an effect on the activity on the financial
markets (for example on how people trade and how securities are valued).[13]
3.1.4 Commodities
Commodities are basic goods used in commerce and as inputs in the production of
both products and services. The price of a commodity is usually determined by
the market as a whole. A commodity could be anything from raw materials to
chemicals. Commodities are most commonly sold and purchased through futures
contracts that standardize the quantity and minimum quality of the commodity
being traded. The commodities market is important since it offers a marketplace
where members can transact business. It also establishes regulated trading with
rules and regulations. Moreover, it is a place for collecting and disseminating,
as well as grading, commodities depending on their quality.[14]
[11] Young, Julie. "Market Index", 2019-05-02. https://www.investopedia.com/terms/m/marketindex.asp (Retrieved 2019-05-22)
[12] Twin, Alexandra. "6 Factors that Influence Exchange Rate", 2019-05-20. https://www.investopedia.com/trading/factors-influence-exchange-rates/ (Retrieved 2019-05-20)
[13] Hamilton, Adam. "Understanding Exchange Rates and Why They Are Important", 2018. https://www.rba.gov.au/publications/bulletin/2018/dec/pdf/understanding-exchange-rates-and-why-they-are-important.pdf (Retrieved 2019-05-20)
One example that will be used in the thesis is the spot price of crude oil,
which is considered one of the most important commodities in the world. Since
today's society and economy are dependent on non-renewable fossil fuels, crude
oil plays an important role in the commodities market. The cost of a barrel of
crude oil is determined by the global market, more precisely by its supply and
demand. For example, if the demand for crude oil is high and the supply is low,
the result will be higher oil prices. This is important for economists and
experts to predict since the prices are volatile. The price of oil can, directly
or indirectly through multiple steps, affect the costs of goods and services in
the economy, which can result in inflation. West Texas Intermediate crude oil is
considered one of the major benchmarks of crude oil.[15]
3.1.5 Volatility
Volatility is the standard deviation of the return of an asset. The standard
deviation is the square root of the variance. Both variance and standard
deviation measure the variability of a return.
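The definition above can be sketched in a few lines of Python. The example
assumes simple (arithmetic) returns and the sample standard deviation; the
thesis does not state in this section which return definition or estimator is
used, so both choices, the made-up prices and the helper names
`simple_returns` and `volatility` are assumptions for illustration only.

```python
import math
import statistics


def simple_returns(prices):
    """Simple returns (p[t] - p[t-1]) / p[t-1] (assumed return definition)."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of the simple returns."""
    return statistics.stdev(simple_returns(prices))


prices = [100.0, 101.0, 99.0, 102.0, 103.0]  # made-up values

vol = volatility(prices)
var = statistics.variance(simple_returns(prices))

# The standard deviation is the square root of the variance.
print("volatility:       ", vol)
print("sqrt of variance: ", math.sqrt(var))
```

A price series that grows by the same percentage in every step has constant
returns and therefore zero volatility under this definition.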
Volatility is an indicator of the risk level of an asset, for instance a
security, portfolio or market. It is expected to be more challenging to predict
the price of a highly volatile asset. Consequently, volatile assets are viewed
as riskier than less volatile assets. In short, volatility is considered the
risk related to the change in the asset's price.

The VIX Index is an example of a market volatility measure. Before making an
investment decision, investors normally look at the VIX values to gain insight
into the market risk.[16]
[14] Lioudis, Nick. "Commodities Trading: An Overview", 2018-05-18. https://www.investopedia.com/investing/commodities-trading-overview/ (Retrieved 2019-05-20)
[15] Premkumar, Divya. "How do oil prices affect stock market", 2019-01-08. https://www.tradebrains.in/how-do-oil-prices-affect-the-stock-market/ (Retrieved 2019-05-01)
[16] Kuepper, Justin. "Volatility Definition", 2019-04-18. https://www.investopedia.com/terms/v/volatility.asp (Retrieved 2019-05-01)
3.1.6 Bonds
A bond is a fixed income instrument that represents a loan made by an investor
to a borrower. When companies or financial institutions need to finance new
projects, ongoing operations or other financial investments, they can issue
bonds directly to investors. The borrower, that is, the issuer of the bond,
specifies for example the terms of the loan, the interest payments and the
maturity date. The interest payment, the coupon, is what bondholders earn for
lending their funds. The interest rate that determines the payment is called the
coupon rate. A government bond is a bond issued by a government. The Treasury
yield is the return on investment on the U.S. government's debt obligations. It
is important when analysing stocks since it tends to signal investor confidence.
When confidence is high, bond prices drop and yields increase, since investors
believe they can find investments with higher returns. When confidence is low,
the opposite occurs.

Bonds affect the amount of liquidity in a country since they determine, for
example, how easy or difficult it is to take loans and buy on credit. Since
bonds are so strongly related to the economy, they are important for
forecasting. Bond yields indicate what investors think the economy will do.[17]
3.2 Timing of Entry Framework
When firms are about to enter a new market, either by launching a new product
or by expanding to new regions, one main concern is when to enter the market.
Entrants are usually divided into three categories depending on their time of
entrance: first movers, early followers and late entrants. Earlier research has
resulted in contradictory answers to the question of which entry timing strategy
is optimal and why.
The first movers of a market are the first to bring and sell a new good or
service to the market. Early followers are relatively early to the market, even
though they are not the first to enter. Lastly, late entrants enter when a
product is becoming or has become more commercial, in other words when the
product gains mass market penetration.
[17] Amadeo, Kimberly. "How Bonds Affect the U.S. Economy", 2019-01-20. https://www.thebalance.com/how-do-bonds-affect-the-us-economy-3305601 (Retrieved 2019-05-01)
3.2.1 First Mover Advantages
The theory of timing of entry also covers the advantages and disadvantages of
being the first mover. According to theory, the first mover will gain brand
loyalty and technological leadership. Additionally, first movers have more time
on the market, enabling them to gain more market share. This could eventually
result in a winner-takes-all market. The reason is that the company may be
perceived as a technological innovator and gain a reputation as a leader. Being
first also enables the player to shape the characteristics of the technology,
for instance its features and functionality, as well as its pricing.
Firms that enter the market early can capture important resources such as key
locations, government permits, patents on the technology and access to
distribution channels, and can develop relationships with suppliers. Another
advantage of being early is exploiting buyer switching costs. In other words, if
a buyer has invested time in a technology and faces switching costs when
changing to another, even superior, technology, the first mover that captured
the customer may be able to keep that customer. If the industry pressures and
encourages the adoption of a dominant design, the timing of entry could be
critical to a firm's likelihood of success.
3.2.2 First Mover Disadvantages
Studies have shown that many first movers are exposed to higher costs, which
reduce the profits of their businesses. To become the first mover, it may be
necessary to commit resources to research and development work. Late entrants,
on the other hand, have the possibility to use already existing work, technology
and knowledge developed by the first mover to create a similar product. They
can also adapt the product or service development to the customers' preferences
instead of facing uncertainty about customer requirements. As a result, they can
avoid high development expenses.
Another negative aspect is that newly developed technologies may require other
technologies or components produced by other firms. First movers are therefore
dependent on the efforts of other firms and cannot rely on enabling technologies
being available. Moreover, when firms introduce new technology and innovations,
there are often no appropriate suppliers or distributors. The firm then has to
assist the suppliers or perhaps develop its own suppliers, which is a time and
resource demanding task.
3.3 Porter’s Five Forces
Porter's Five Forces Framework, developed by Michael Porter, is a tool for
analyzing the market dynamics and the competition of a business. The purpose
of the model is to identify and analyze five competitive forces that shape every
industry and help determine an industry's weaknesses and strengths. The
insights are often used to see if new product or service offerings can be
profitable. The framework may also be used for answering strategic questions
such as how, where and when a market entry should be done. The five forces are
the threat of new entrants, the bargaining power of suppliers, the bargaining
power of customers, the threat of substitute products and competitive rivalry.
Together, the first four forces describe the competitive rivalry.
Figure 3.1: Porter’s five forces model and important questions to answer during the analysis
4 Mathematical Theory
The following section provides information regarding the mathematical theories
and models used in the thesis. It also intends to explain the assumptions which
the models are based upon.
4.1 Time Series
A time series is a series of data points, measured over a time period and
indexed in time order. In other words, values are taken by a variable over time
in chronological order.[18] The time series is denoted as a sequence $\{X_t\}$,
$t = 0, 1, 2, \ldots$, where $t$ represents the time and $X_t$ is seen as a
random variable. Time series can be either discrete or continuous; in the
continuous case, $t \in [0, \infty)$.
4.1.1 The Objectives of Time Series Analysis
The primary objective of time series analysis is the development of mathematical
models that describe the data sample. The purpose is to extract meaningful
statistics and characteristics of the data. There are in general two main goals of
the time series analysis:
1. Identifying the nature of the phenomenon. What does it contain?
2. Forecasting or in other words predicting future values of the time series
variable.
These goals require that the pattern observed in the time series is identified.
Once identified, the pattern can be interpreted and integrated with other data
in a forecast model.[19]
[18] "Time Series". http://www.businessdictionary.com/definition/time-series.html (Retrieved 2019-01-30)
[19] "Time Series". https://www.stat.ncsu.edu/people/bloomfield/courses/st730/slides/SnS-01-2.pdf (Retrieved 2019-02-02)
4.1.2 Time Series Decomposition
Within time series analysis, one can decompose a time series into several
components. Let $\{X_t\}$ be a sequence of random variables. Then, a time series
can be decomposed either additively as
$$X_t = T_t + S_t + \epsilon_t$$
or multiplicatively as
$$X_t = T_t \cdot S_t \cdot \epsilon_t$$
where $T_t$ is the trend component at time $t$, $S_t$ is the seasonal component
at time $t$ and $\epsilon_t$ is an irregular component at time $t$.[20]
Over a long time period a time series may show a general tendency of decrease,
increase or stagnation. This is represented by the trend component in a
decomposition. The seasonal component exhibits patterns affected by seasonal
factors such as the day of the weak or the quarter of the year. The period of
the seasonality is fixed and known. Further, the irregular component portrays
events that do not occur regularly and are of unpredictable characteristics.21
The irregular component corresponds to the residual obtained after the trend
and seasonality have been removed, that is, ϵt is a random noise component.
Additionally, ϵt is stationary at least in the weak (described in Section 4.1.3) sense.22
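The two decompositions can be illustrated with a minimal sketch. The trend, seasonal and noise components below are hypothetical values chosen only for illustration and are not taken from the data used in this thesis. The sketch also shows that taking logarithms turns a multiplicative series into an additive one.

```python
import math
import random


def additive_series(trend, seasonal, noise):
    """Combine the components as X_t = T_t + S_t + e_t."""
    return [t + s + e for t, s, e in zip(trend, seasonal, noise)]


def multiplicative_series(trend, seasonal, noise):
    """Combine the components as X_t = T_t * S_t * e_t."""
    return [t * s * e for t, s, e in zip(trend, seasonal, noise)]


random.seed(0)
n = 48  # hypothetical: four years of monthly observations

# Hypothetical components: linear trend, yearly cycle, random noise
trend = [100.0 + 0.5 * t for t in range(n)]
seasonal_add = [5.0 * math.sin(2 * math.pi * t / 12) for t in range(n)]
noise_add = [random.gauss(0.0, 1.0) for _ in range(n)]
x_add = additive_series(trend, seasonal_add, noise_add)

# For the multiplicative form the seasonal and noise factors vary around 1
seasonal_mul = [1.0 + 0.05 * math.sin(2 * math.pi * t / 12) for t in range(n)]
noise_mul = [math.exp(random.gauss(0.0, 0.01)) for _ in range(n)]
x_mul = multiplicative_series(trend, seasonal_mul, noise_mul)

# log(T * S * e) = log(T) + log(S) + log(e), which is an additive form
log_x_mul = [math.log(v) for v in x_mul]
```

In the multiplicative series the size of the seasonal swings grows with the trend, whereas in the additive series it stays constant.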
4.1.3 Trends
Usually one wants to know if there is a trend in the time series to support future
forecasting. In some cases a trend is seen as an accumulated effect of certain
factors and in other cases trends indicate a kind of influence that needs further
investigation. The trend could for example be linear, exponential or even mixed
20Adhikari, Ratnadip et al. ”An Introductory Study on Time Series Modeling and Forecasting” https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-16)
21Adhikari, Ratnadip et al. ”An Introductory Study on Time Series Modeling and Forecasting” p. 12-18 https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)
22Brockwell, J Peter. Davis, A Richard. ”Introduction to Time Series and Forecasting”, p.20. Third ed, Springer
between different types.23
4.1.4 Seasonality
In time series data, seasonality is the presence of variations that occur at specific
regular intervals, for example every autumn, and that repeat regularly over time.
Identifying or removing seasonal components can result in a clearer
relationship between the input and output variables. It can also provide
information that helps improve model performance.24
4.1.5 Stationarity
A stationarity assumption is equivalent to saying that the generating mechanism
of the process is itself time-invariant, so that neither the form nor the parameter
values of the generation procedure change over time. A process {Xt}, t ∈ Z (where
Z is the set of integers) is defined to be weakly stationary if it satisfies
1. E[Xt] = µ for all t ∈ Z
2. Var[Xt] = σX² < ∞ for all t ∈ Z
3. γX(s, t) = γX(s + h, t + h) for all s, t, h ∈ Z, where γX is the autocovariance
function.
In other words, a stationary stochastic process has a
mean and variance that do not change over time. Also, the autocovariance,
meaning the covariance between the values of the process at two points in
time, depends only on the distance between the time points and not on
time itself.25 There is also a more restrictive definition of stationarity than
the one above. A time series {Xt, t = 0,±1,±2, ...} is strictly stationary if, for any
time points t1, ..., tn, the same joint probability distribution holds for
(Xt1 , ..., Xtn) as for (Xt1+h, ..., Xtn+h), that is
23Deshpande, Bala. 2014-03-12 ”Time series forecasting: understanding trend and seasonality” http://www.simafore.com/blog/bid/205420/Time-series-forecasting-understanding-trend-and-seasonality (Retrieved 2019-05-01)
24Brownlee, Jason. 2016-12-23 ”How to Identify and Remove Seasonality from Time Series Data with Python” https://machinelearningmastery.com/time-series-seasonality-with-python/ (Retrieved 2019-04-14)
25A. Lincoln. Introduction to the theory of time series, Chapter 1 p.4-6
$$(X_{t_1}, \ldots, X_{t_n}) \overset{d}{=} (X_{t_1+h}, \ldots, X_{t_n+h})$$
for all integers h and n > 0.26
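Condition 3 of weak stationarity says that the autocovariance depends only on the lag h. As a minimal sketch, the autocovariance at lag h can be estimated from a single observed series with the usual sample estimator, which divides by the series length n. The function name is introduced here for illustration only.

```python
def sample_autocovariance(x, h):
    """Sample autocovariance of the series x at lag h (0 <= h < len(x)).

    Uses the estimator (1/n) * sum_{t=1}^{n-h} (x_{t+h} - mean)(x_t - mean).
    """
    n = len(x)
    mean = sum(x) / n
    return sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n


# Small worked example
series = [1.0, 2.0, 3.0, 4.0]
gamma_0 = sample_autocovariance(series, 0)  # sample variance (divisor n)
gamma_1 = sample_autocovariance(series, 1)  # autocovariance at lag 1
```

Under weak stationarity this estimate should be roughly the same regardless of which part of the series it is computed on.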
Stationarity is of great importance. If a time series is non-stationary, its
behaviour and properties change over time. Thus, a regression based on the
data points is hard to justify. Also, if the variables in a regression model are
not stationary, the assumptions for the asymptotic analysis may not be valid.27
Non-stationary time series contain trends, seasonal effects and other
structures that depend on the time of observation.28 A time series is usually
non-deterministic, hence what occurs in the future cannot be predicted with
certainty. The concept of stationarity therefore reduces the complexity of
forecasting the future.29
To check for stationarity there are a number of different
approaches that could be useful. The most common methods are examining plots
and statistical tests.30 One can produce a sequence of plots and examine them to
find any obvious trends or seasonal effects. In addition, summary statistics can be
computed, which summarize a set of observations in order to communicate as
much of the information as possible. In this process the data is partitioned into
intervals and it is then checked whether there are obvious or significant differences in
the summary statistics between them. Statistical tests provide a method for
making quantitative decisions about a particular sample.
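The summary-statistics check described above can be sketched as follows. The function name, the number of intervals and the example series are assumptions made for illustration. The series is split into consecutive intervals and the mean and sample variance of each interval are compared.

```python
import statistics


def summary_by_interval(x, k):
    """Split x into k consecutive intervals and return (mean, variance) for each.

    The last interval absorbs any leftover observations.
    """
    size = len(x) // k
    summaries = []
    for j in range(k):
        start = j * size
        end = (j + 1) * size if j < k - 1 else len(x)
        chunk = x[start:end]
        summaries.append((statistics.mean(chunk), statistics.variance(chunk)))
    return summaries


# Hypothetical example data: a series with an upward trend
trending = [0.5 * t for t in range(12)]
parts = summary_by_interval(trending, 3)
# The interval means increase from one interval to the next,
# which is a sign that the series is not stationary in the mean.
```

If the means or variances differ clearly between the intervals, the series is unlikely to be stationary. Similar values do not prove stationarity, so a statistical test is still needed.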
26Brockwell, J Peter. Davis, A Richard. ”Introduction to Time Series and Forecasting”, p.13. Third ed, Springer
27Ryabko, Daniil. ”Asymptotic Nonparametric Statistical Analysis of Stationary Time Series”, 2019-03-30 https://arxiv.org/abs/1904.00173 (Retrieved 2019-05-01)
28Kang, Eugine. ”Time Series: Check Stationarity”, 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)
29Adhikari, Ratnadip et al. ”An Introductory Study on Time Series Modeling and Forecasting” p. 12-18 https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)
30”Tests of Stationarity” https://people.maths.bris.ac.uk/ magpn/Research/LSTS/TOS.html(Retrieved 2019-02-12)
Figure 4.1: The following graph illustrates a non-stationary time series, a random walk that has not been adjusted
Figure 4.2: This figure illustrates the same data after stationarity is obtained with the first difference transformation. As one can see, the graph looks more like an even line, indicating stationarity.
4.2 Stationarity Hypothesis Testing
As mentioned in the limitations of this project, we will only use two different
stationarity tests. These hypothesis tests are used to obtain an indication as to
whether a time series is stationary. However, they cannot be used as a proof
of stationarity. If the null hypothesis is not rejected, it is not thereby
confirmed. A non-significant result only means it can be concluded that the
counter-hypothesis is not a strong competitor to the null hypothesis. Also, in
general there can be many other null hypotheses that also would not have been
rejected.31
4.2.1 Dickey-Fuller Test
A commonly used method for checking the existence of a unit root is the
Dickey-Fuller test, which was developed by David Dickey and Wayne Fuller (1979).
The Dickey-Fuller hypothesis test gives an indication of whether a process is
stationary or not.32 The test checks whether a process follows a unit root process. The
augmented Dickey-Fuller (ADF) test is an expansion of the original Dickey-Fuller
(DF) test, used for higher order correlations, since the Dickey-Fuller test is only valid
for AR(1)-processes. An AR(1)-process is an autoregressive process of the first
order. This means that the current value is based on the immediately preceding
value.33 Similar to the original DF test, the ADF tests for a unit root in a time series
sample. The primary difference is that the ADF is used for more complicated and
larger sets of time series models.34 If there is higher order correlation instead of
only AR(1)-processes, the augmented version must be used.
The purpose is to test the null hypothesis that a unit root is present against the
alternative hypothesis that there is no unit root, which indicates that the data is stationary.
31”Hypotesprövning” http://gauss.stat.su.se/gu/sg/2012VT/Kompendium/KAP17new.pdf(Retrieved 2019-05-03)
32 ”ADF — Augmented DickeyFuller Test ” https://www.statisticshowto.datasciencecentral.com/adf-augmented-dickey-fuller-test/ (Retrieved 2019-03-15)
33Pantelis, Anastasios. 2008. ”Testing for unit roots in the presence of structural change”http://lup.lub.lu.se/luur/download?func=downloadFilerecordOId=1338330fileOId=1646631(Retrieved 2019-03-09)
34”The Augmented Dickey-Fuller Test” https://www.thoughtco.com/the-augmented-dickey-fuller-test-1145985 (Retrieved 2019-02-27)
Consider the first order autoregressive model
Xt = δ + θXt−1 + ϵt
where θ = 1 corresponds to a unit root and ϵt is a white noise process with a
constant variance and zero mean. In a stationary AR(1)-process, the constant term
δ can be expressed as δ = (1− θ)µ, where µ is the mean of the series.
The null hypothesis of a unit root is that θ = 1, which also implies that δ = 0.
Hence, to test the null hypothesis, θ = 1 and δ = 0 must be tested jointly. This is difficult
to test, therefore the model is rewritten by subtracting Xt−1 from both sides as
∆Xt = δ + (θ − 1)Xt−1 + ϵt = δ + πXt−1 + ϵt
where π = θ − 1. The null hypothesis states that θ − 1 = 0 or equivalently π = 0. The hypothesis is
thus formulated as
H0 : π = 0
H1 : π < 0
When the hypotheses are established, the Dickey-Fuller test performs a t-test on
H0. The test yields a test statistic τ̂, which is compared to the critical values
of the Dickey-Fuller distribution.35
$$\hat{\tau} = \frac{\hat{\theta} - 1}{SE(\hat{\theta})} = \frac{\hat{\pi}}{SE(\hat{\pi})}$$
When performing the ADF test, a p-value < 0.05 indicates strong evidence against
the null hypothesis of a unit root, so the unit root is rejected in favour of
stationarity. If instead the p-value ≥ 0.05, the evidence against the null hypothesis
is weak, hence a unit root cannot be rejected and the time series is treated as
non-stationary.
35Verbeek, Marno. ”A Guide to Modern Econometrics” 2014, 2nd Edition, p.265-268
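As a minimal sketch of the test statistic, the regression of ∆Xt on a constant and Xt−1 can be fitted by ordinary least squares and the t-ratio of the slope computed directly. This covers only the plain Dickey-Fuller case with a constant and no lagged differences. The function name and the simulated series are assumptions made for illustration. In practice a library routine such as `adfuller` from `statsmodels.tsa.stattools` would normally be used instead.

```python
import math
import random


def dickey_fuller_tau(x):
    """t-ratio of pi in the OLS regression dX_t = delta + pi * X_{t-1} + e_t."""
    lagged = x[:-1]
    diffs = [curr - prev for prev, curr in zip(x[:-1], x[1:])]
    m = len(diffs)
    mean_lag = sum(lagged) / m
    mean_diff = sum(diffs) / m
    sxx = sum((v - mean_lag) ** 2 for v in lagged)
    sxy = sum((v - mean_lag) * (d - mean_diff) for v, d in zip(lagged, diffs))
    pi_hat = sxy / sxx
    delta_hat = mean_diff - pi_hat * mean_lag
    ssr = sum((d - delta_hat - pi_hat * v) ** 2 for v, d in zip(lagged, diffs))
    se_pi = math.sqrt(ssr / (m - 2) / sxx)  # standard error of pi_hat
    return pi_hat / se_pi


random.seed(1)
n = 500

# Simulated random walk (theta = 1, unit root)
walk = [0.0]
for _ in range(n - 1):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

# Simulated stationary AR(1) process with theta = 0.2
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.2 * ar1[-1] + random.gauss(0.0, 1.0))

tau_walk = dickey_fuller_tau(walk)
tau_ar1 = dickey_fuller_tau(ar1)
```

The statistic is compared with Dickey-Fuller critical values rather than with the usual t-distribution. For the model with a constant, the large-sample 5% critical value is approximately −2.86, so the unit root is rejected when τ̂ falls below that value.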
4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test
The KPSS-test is a test of the stationarity hypothesis proposed by Kwiatkowski,
Phillips, Schmidt and Shin (1990). Similar to the Dickey-Fuller test, the
KPSS-test gives an indication of whether there
exists a unit root or the process is stationary.36
Let Xt, t = 1, 2, ..., T be a time series of observed values. Assume the series can be
decomposed into a deterministic trend, a random walk, and a stationary error.
The data generating process (DGP) of Xt in KPSS can then be defined as
Xt = Yt + ϵt + ξt
where Yt is the deterministic trend term, ϵt is the error term, and ξt is the random
walk term, so that
ξt = ξt−1 + ηt
By definition of the random walk, ηt ∼ iid(0, σ²).37 If σ² = 0, meaning the variance of
ηt is zero, then it holds that
ξt = ξt−1
That is, the random walk process devolves to a constant term and Xt becomes
36”What is a Critical Value?”, 2019. https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/basic-statistics/inference/supporting-topics/basics/what-is-a-critical-value/ (Retrieved 2019-05-04)
37Nabeya, Seiji et al. ”Asymptotic Theory of a Test for the Constancy of Regression Coefficients Against the Random Walk Alternative” 1987. https://projecteuclid.org/download/pdf1/euclid.aos/1176350701 (Retrieved 2019-04-30)
trend-stationary, meaning that the series grows around the deterministic trend.
Consequently, the null hypothesis can be formulated as
H0 : σ² = 0
H1 : σ² > 0
Under the null hypothesis the process is trend-stationary (and the counter
hypothesis implies that Xt, t = 1, 2, ..., T is a unit root process).38 To reduce
complexity, the deterministic component of the series may also be removed, that is, Yt =
0. This is a special case for which the null hypothesis is that Xt is stationary
around a level or mean (ξ0) instead of around a trend, meaning that the mean
value no longer depends on t.39 A statistic that can be used for the null hypothesis
is the LM statistic, which is defined as
$$LM = \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}^2}$$
where
$$S_t = \sum_{i=1}^{t} e_i$$
That is, St² is the squared partial sum of the residuals from a regression of X on the
deterministic term. Further, et, t = 1, 2, ..., T denotes the residuals from a regression
of X on a time trend and an intercept. Also, σ̂² is the notation for the estimated
value of the error variance obtained from the regression. If the aim is to test for level
stationarity then the residual is redefined as
$$e_i = X_i - \bar{X}$$
38Cappuccio, Nunzio et al. ”The Fragility of the KPSS Stationarity Test” 2009. http://leonardo3.dse.univr.it/home/workingpapers/fragilitykpss.pdf (Retrieved 2019-04-30)
39Journal of Econometrics ”Testing the null hypothesis of stationarity against the alternative of a unit root” 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf (Retrieved 2019-04-30)
which is the regression of X only on an intercept.40
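The LM statistic for the intercept-only case can be sketched as below. Two choices in the sketch are assumptions and not taken from the thesis: the error variance is estimated as the plain mean of the squared residuals, and an optional flag divides the statistic by T², which is the scaling used when comparing with published KPSS critical values. Library implementations typically use a long-run variance estimator instead.

```python
def kpss_lm_level(x, scale_by_t_squared=False):
    """LM statistic with residuals e_t = X_t - mean(X) (intercept only).

    Assumption: sigma^2 is estimated as the mean of the squared residuals.
    """
    t_len = len(x)
    mean = sum(x) / t_len
    residuals = [v - mean for v in x]

    partial_sum = 0.0
    sum_sq_partial = 0.0
    for e in residuals:
        partial_sum += e                    # S_t = e_1 + ... + e_t
        sum_sq_partial += partial_sum ** 2  # accumulate S_t^2

    sigma2_hat = sum(e * e for e in residuals) / t_len
    lm = sum_sq_partial / sigma2_hat
    return lm / t_len ** 2 if scale_by_t_squared else lm


# Small worked example: residuals (-1, 0, 1), partial sums (-1, -1, 0)
lm_raw = kpss_lm_level([1.0, 2.0, 3.0])
lm_scaled = kpss_lm_level([1.0, 2.0, 3.0], scale_by_t_squared=True)
```

A large value of the statistic means that the partial sums of the residuals wander far from zero, which speaks against the null hypothesis of stationarity.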
4.3 Transformations
This section provides theory regarding the transformations that Henricsson
found to be relevant when doing research. Furthermore, their purpose
will be discussed. Data transformation is a process where information or data
is converted from one format to another. In this case the goal is to transform data
from non-stationary to stationary. To describe the given equations the following
variables are introduced:
Data is measured on the range (t0, ..., t, ..., tmax) and consists of T elements. The
dataset X is an N×T matrix containing the N variable vectors (x1, x2, ..., xN), where
xi = (xi,t0, ..., xi,t, ..., xi,tmax). For a certain point in time t and a specific variable i,
we will present a number of approximations of transformations.
Most of the approximated transformations depend on the rate of
decay α. Since α can be varied, each transformation comes in a number of
varieties and a suitable value of α may need to be estimated. Generally the formula for the
new forecast after the transformation follows the pattern
NewForecast = α(NewData) + (1− α)MostRecentForecast
One can say that the choice of α decides how much the new
forecast reflects new data and how much it reflects the past.41 Previous studies
have suggested that the value of α should be below
0.3 for a smoothing result.42
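The role of α can be made concrete with a small sketch. The function names are introduced here for illustration. Unrolling the recursion shows that the observation j steps back receives the weight α(1 − α)^j, so a small α spreads the weight over a long history while a large α concentrates it on the newest data.

```python
def update_forecast(alpha, new_data, most_recent_forecast):
    """One step of NewForecast = alpha * NewData + (1 - alpha) * MostRecentForecast."""
    return alpha * new_data + (1 - alpha) * most_recent_forecast


def ewma_weights(alpha, k):
    """Weights placed on the observations 0, 1, ..., k-1 steps back in time."""
    return [alpha * (1 - alpha) ** j for j in range(k)]


# With alpha = 0.5 the weights halve for every step back in time
weights_half = ewma_weights(0.5, 3)

# A small alpha keeps the forecast close to the previous forecast
slow = update_forecast(0.1, 10.0, 20.0)
# A large alpha moves the forecast most of the way to the new data
fast = update_forecast(0.9, 10.0, 20.0)
```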
40Journal of Econometrics ”Testing the null hypothesis of stationarity against the alternative of a unit root” 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf (Retrieved 2019-04-30)
41Ragnarstrom, Elsa. ”How to calculate forecast accuracy for stocked items with a lumpy demands”, 2015. https://www.diva-portal.org/smash/get/diva2:901177/FULLTEXT01.pdf (Retrieved 2019-05-03)
42”How To Identify Patterns in Time Series Data: Time Series Analysis”
4.3.1 Level Transformation
Let {Xt, t = 0, 1, 2, ...} be a time series. Then the level transformation is defined as
F1i,t = Xi,t̄
where
t̄ = max(tj ≤ t)
is the largest observation time tj in the sample that does not exceed t. That is,
it corresponds to the latest observation. In other words, if there are any missing
values, the most recent value obtained will be used.
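A minimal sketch of the level transformation follows. Representing a missing value by `None` is an assumption made for illustration, as is the function name.

```python
def level_transform(values):
    """Return the series where each missing value (None) is replaced by the
    most recent observed value. Leading missing values stay None."""
    result = []
    last_observed = None
    for v in values:
        if v is not None:
            last_observed = v
        result.append(last_observed)
    return result


# Hypothetical price series with two missing observations
prices = [101.2, None, None, 103.5, 102.9]
levels = level_transform(prices)
```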
4.3.2 First Difference Transformation
The first difference at time t, F2i,t is obtained by looking at the change between
an observation at time t and the previous time step, t-1, from the original series.43
The first difference transformation is defined as
F2i,t = Xi,t̄ −Xi,t̄−1
A non-stationary behavior commonly encountered is when the level of the process
changes, although the process still shows homogeneity in the variability. Taking
the (first) difference may in these cases lead to stationarity.44 In time series
analysis, differencing is frequently used for removing dependency on time, for
which structures such as trend and seasonality may be included.
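The first difference can be sketched in a few lines. The function name and the example series are chosen for illustration only. Differencing a series with a linear trend leaves a constant series, which shows how the transformation removes the dependency on time.

```python
def first_difference(x):
    """Return the changes x[t] - x[t-1]; the result has one element fewer than x."""
    return [curr - prev for prev, curr in zip(x[:-1], x[1:])]


# Hypothetical series with a linear trend: the level changes over time
trend_series = [2.0 * t + 5.0 for t in range(6)]
trend_diffs = first_difference(trend_series)  # constant step of the trend
```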
http://www.statsoft.com/Textbook/Time-Series-Analysis (Retrieved 2019-05-03)
43Kulahci, Murat et al. ”Time Series Analysis and Forecasting by Example”, 2011. p. 90
44Bisgaard, S., Kulahci, M. ”Time Series Analysis and Forecasting”, 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting (Retrieved 2019-02-18)
4.3.3 Mean EWMA-transformation
An exponentially weighted moving average, also called EWMA, is a type of moving
average that places a greater weight and significance on the most recent data
points. For example, it can be assumed that a security’s price is mostly dependent
on more recent prices compared to historical data from long ago. The previous value of
the EWMA is taken into consideration in the calculation of the following EWMA.
The weights are based on the exponential function, as the name indicates.45 This
is a very popular scheme to produce a smoothed time series. In general, given
a time series {Xt}, the smoothed version is46
St = α ∗ Xt + (1− α)St−1
The definition for the EWMA mean in this case is
F3i,t = (1− α) ∗ F3i,t−1 + α ∗ F2i,t
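The recursion for the EWMA mean can be sketched as follows. The function name and the way the starting value is passed in are assumptions made for illustration.

```python
def ewma_mean(diffs, alpha, initial):
    """Apply F3_t = (1 - alpha) * F3_{t-1} + alpha * F2_t to the first differences.

    `initial` is the starting value F3 used before the first element of diffs.
    """
    result = []
    previous = initial
    for d in diffs:
        previous = (1 - alpha) * previous + alpha * d
        result.append(previous)
    return result


# Small worked example with alpha = 0.5 and starting value 0
smoothed = ewma_mean([1.0, 2.0], 0.5, 0.0)
```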
4.3.4 Variance-EWMA Transformation
As mentioned, exponentially weighted moving averages are often used for
smoothing irregular fluctuations in a time series to better find the patterns over a
specific time period. The EWMA variance applies the same weighting to the squared
deviation of the first difference from its EWMA mean, so the formula used for
the EWMA variance transformation is
F4i,t = (1− α) ∗ F4i,t−1 + α(F2i,t − F3i,t)²
From EWMA variance, a future variance is estimated by the weighted average of
45”Exponentially Weighted Moving Average” https://www.value-at-risk.net/exponentially-weighted-moving-average-ewma (Retrieved 2019-03-02)
46Jinka, Preetam. ”Exponential Smoothing for Time Series Forecasting”, 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting (Retrieved 2019-02-18)
past variances.47
4.3.5 Skewness EWMA Transformation
This transformation measures the skewness and uses it in order to transform the
data. The formula used is
F5i,t = (1− α) ∗ F5i,t−1 + α(F2i,t − F3i,t)³
4.3.6 Kurtosis-EWMA Transformation
This transformation measures the kurtosis of the change in the variable.
F6i,t = (1− α) ∗ F6i,t−1 + α(F2i,t − F3i,t)⁴
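The variance, skewness and kurtosis transformations share the same recursion and differ only in the power applied to the deviation F2 − F3. A minimal sketch with the power as a parameter is given below. The function name and the explicit starting values are assumptions made for illustration.

```python
def ewma_central_moment(diffs, alpha, power, mean0, moment0):
    """EWMA of (F2_t - F3_t) ** power, with the EWMA mean F3 updated alongside.

    power = 2, 3 and 4 correspond to the F4, F5 and F6 recursions.
    mean0 and moment0 are the starting values of F3 and of the moment.
    """
    mean_prev, moment_prev = mean0, moment0
    result = []
    for d in diffs:
        mean_prev = (1 - alpha) * mean_prev + alpha * d  # F3_t
        deviation = d - mean_prev                        # F2_t - F3_t
        moment_prev = (1 - alpha) * moment_prev + alpha * deviation ** power
        result.append(moment_prev)
    return result


# Small worked example with alpha = 0.5 and all starting values 0
example_diffs = [2.0, 0.0]
f4 = ewma_central_moment(example_diffs, 0.5, 2, 0.0, 0.0)
f5 = ewma_central_moment(example_diffs, 0.5, 3, 0.0, 0.0)
f6 = ewma_central_moment(example_diffs, 0.5, 4, 0.0, 0.0)
```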
4.3.7 Autocorrelation Transformation
In probability theory and statistics, the autocorrelation of a stochastic process
is a number that represents the similarity between
a given time series and a lagged version of it over successive time intervals. In
other words, it is the same as calculating the correlation between two different
time series, the current values versus the past values. The result varies between -1 and 1. If
the autocorrelation is positive, it means that an increase in one time series tends to come with
an increase in the other time series as well.48 Firstly, the EWMA autocovariance
is calculated by the following formula
47Breaking Down Finance. EXPONENTIALLY MOVING AVERAGE VOLATILITY (EWMA). https://breakingdownfinance.com/finance-topics/risk-management/ewma/ (Retrieved 2019-05-03)
48Kenton, Will. ”Autocorrelation”, 2019-03-31. https://www.investopedia.com/terms/a/autocorrelation.asp (Retrieved 2019-04-13)
F7i,t = (1− α) ∗ F7i,t−1 + α(F2i,t − F3i,t)(F2i,t−1 − F3i,t−1)
Normally, the autocovariance function between time t1 and t2 for Xt is defined
as
γX(t1, t2) = Cov(Xt1 , Xt2)
and the autocorrelation is defined as
$$\varphi_{X,X}(t_1, t_2) = \frac{\gamma_X(t_1, t_2)}{\sigma_{t_1} \sigma_{t_2}}$$
where σt² is the variance at time t.49 To obtain the EWMA autocorrelation
between t1 = t and t2 = t − 1, the standard variances are replaced with the
corresponding EWMA variances. Also, the EWMA autocovariance is used and
the formula is hence
$$\text{EWMA autocorr}_t = \frac{F7_{i,t}}{\sqrt{F4_{i,t}} \sqrt{F4_{i,t-1}}}$$
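Putting the recursions together, the EWMA autocorrelation can be sketched as below. The function name, the explicit starting values and the handling of the first time step are assumptions made for illustration: at the first step no lagged deviation exists, so the starting autocovariance is carried forward unchanged.

```python
import math


def ewma_autocorrelation(diffs, alpha, mean0, var0, autocov0):
    """Return F7_t / (sqrt(F4_t) * sqrt(F4_{t-1})) for each time step.

    mean0, var0 and autocov0 are the starting values of F3, F4 and F7.
    """
    f3_prev, f4_prev, f7_prev = mean0, var0, autocov0
    dev_prev = None  # lagged deviation F2_{t-1} - F3_{t-1}
    result = []
    for d in diffs:
        f3 = (1 - alpha) * f3_prev + alpha * d
        dev = d - f3
        f4 = (1 - alpha) * f4_prev + alpha * dev ** 2
        if dev_prev is None:
            f7 = f7_prev  # assumption: no update without a lagged deviation
        else:
            f7 = (1 - alpha) * f7_prev + alpha * dev * dev_prev
        denominator = math.sqrt(f4) * math.sqrt(f4_prev)
        result.append(f7 / denominator if denominator > 0 else float("nan"))
        f3_prev, f4_prev, f7_prev, dev_prev = f3, f4, f7, dev
    return result


# Small worked example: an up move followed by a down move
autocorr = ewma_autocorrelation([1.0, -1.0], 0.5, 0.0, 1.0, 0.0)
```

In the example the second value is negative, since a deviation above the EWMA mean is followed by a deviation below it.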
4.3.8 Correlation-EWMA Transformation
In probability theory the correlation measures the degree to which two time
series move in relation to each other. Just like in the autocorrelation case, if
the correlation is positive, it indicates that if one series moves up the other tends to
follow.50 Let {Xt, t = 0, 1, 2, ...} be a time series representing one set of observed data, and {Yt, t = 0, 1, 2, ...} be another time series which represents another set of observed data.
To begin with, the EWMA covariance is calculated by the formula
F8i,j ,t = (1− α) ∗ F8i,j ,t−1 + α(F2i,t − F3i,t)(F2j ,t−1 − F3j ,t−1)
49Kulahci, Murat et al. ”Time Series Analysis and Forecasting by Example”, 2011 p.62
50Hayes, Adam. ”Correlation”, 2019-04-30. https://www.investopedia.com/terms/c/correlation.asp (Retrieved 2019-05-01)
where index i and index j correspond to Xt and Yt, respectively. In general, the
covariance between two random variables X and Y is denoted Cov(X,Y) and the
correlation between the random variables is defined as
$$\varphi_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$
where σX² is the variance of X and σY² is the variance of Y.51
Using the EWMA covariance and replacing the standard variances with their
corresponding EWMA variances, the EWMA correlation is formulated as
$$\text{EWMA corr}_t = \frac{F8_{i,j,t}}{\sqrt{F4_{i,t}} \sqrt{F4_{j,t}}}$$
51Kulahci, Murat et al. ”Time Series Analysis and Forecasting by Example”, 2011 p.62
5 Methodology
The tools for this project were limited to the programming language Python
and the spreadsheet application Microsoft Excel. These tools were chosen since they are
easy to use for time series data and all the required hypothesis tests and
transformations can be performed with them.52
5.1 Data Collection
The data was provided by SHB and consisted of different security prices and
indices. These covered the time period from 2001-01-01 to 2018-12-31 and were
noted on a daily basis. This was in order to capture real trends and seasonality
of the time series. The data regarded US related securities, such as US sector
stock indices, US treasury bonds, exchange rates with the US dollar and more.
Processing this type of data may lay the basis for SHB to use the data and predict
future outcomes of the US stock market. For example, future values of the US stock
market indices Dow Jones Index or S&P 500 may potentially be forecasted by a
prediction model after the data is pre-processed. This was an area of interest for
SHB.
The data is considered to be quantitative since it only contains numbers.53
Qualitative data was also used, in the form of discussions with professionals
with previous expertise in data pre-processing, for example discussions
on how to interpret results or to understand more about the data chosen.
5.2 Data and Notations
This section contains the data and notation used in this thesis and explanations
regarding them.
52Brownlee, Jason. ”How to Check if Time Series Data is Stationary with Python”, 2016-12-30 https://machinelearningmastery.com/time-series-data-stationary-python/ (Retrieved 2019-03-09)
53”Collecting Data” http://betterthesis.dk/research-methods/lesson-1different-approaches-to-research/collecting-data (Retrieved 2019-02-09)
5.2.1 Exchange Rates (FX data)
An exchange rate shows the value of one currency unit relative to a unit of
another currency in the foreign exchange market.54 Further in this report, a
currency pair Currency1Currency2 represents the price, given in currency 2,
of one unit of currency 1. As FX data, the currency pairs used are EURUSD,
GBPUSD, AUDUSD, NZDUSD, USDCAD, USDCHF, USDJPY, USDNOK and
USDSEK.
5.2.2 US Sectors Data
The sector data used are indices, each one describing the performance of
a chosen sector in the United States. The index is designed by Morgan
Stanley Capital International (MSCI) and covers securities in the large and
mid cap segment within the specific sector. MSCI is a provider of security
indices and performance analytics.55 The classification of the securities
follows the Global Industry Classification Standard (GICS®).56 Notations
for each sector are MXUS0EN (Energy), MXUS0MT (Materials), MXUS0IN
(Industrials), MXUS0CD (Consumer Discretionary), MXUS0CS (Consumer
Staples), MXUS0HC (Health Care), MXUS0FN (Financials), MXUS0IT
(Information Technology), MXUS0TC (Telecom Services) and MXUS0UT
(Utilities).
5.2.3 Countries- Stock Index Data
The country (and region) indices used are MXDE (Denmark), MXEU (Europe),
MXGB (United Kingdom), MXFR (France), MXCH (Switzerland), MXES
(Spain), MXIT (Italy) and MXUS (the United States). Each index is developed
54Investopedia, ”Currency Pair Definition”. https://www.investopedia.com/terms/c/currencypair.asp (Retrieved 2019-05-04)
55”Index solutions”. MSCI, https://www.msci.com/index-solutions (Retrieved 2019-05-18)
56”MSCI USA MATERIALS INDEX”. MSCI, 2019-04-30. https://www.msci.com/documents/10199/6ce4617e-9127-480f-8f3b-1fdf4c0c8962 (Retrieved 2019-05-03)
by MSCI and is used as a measure of the stock market performance of
large and mid cap stocks of a particular country or region.57
5.2.4 Commodities Data
The commodity data covers WTI Crude Oil prices (denoted C1 Comdty) and
secondly, the spot prices of 1 troy ounce of gold in terms of US dollars (denoted
XAUUSD). 58
5.2.5 VIX- Market Volatility Index Data
The two market volatility data sets used are denoted as VIX Index and V2X Index.
VIX Index, which is also known as the Chicago Board Options Exchange (CBOE) Market
Volatility Index, is a real-time market index reflecting the market’s expectation
of the volatility. It is a 30-day forward-looking volatility based on S&P 500 index
options.59 V2X Index is based on real-time prices of EURO STOXX 50 Index
options. The index corresponds to the market expectations of the two month
forward-looking volatility.60
5.2.6 Bond (IR) Data
The indices used were USGG30YR Index, USGG10YR Index, USGG2YR Index and
CSI BARC Index. The USGGXYR Index, where X is a number, denotes a United
States government bond index with X years maturity from when it was first issued.
It is a measure of the generic government X-yield for US issues of treasuries. For
example, the USGG10YR represents the index of the 10-year US government bond.61
57”MSCI US Index”. MSCI, 2019-04-30. https://www.msci.com/documents/10199/67a768a1-71d0-4bd0-8d7e-f7b53e8d0d9f (Retrieved 2019-05-03)
58”XAUUSD”. TradersTrus. https://traders-trust.com/instrument/xauusd-gold-spot-vs-us-dollar/ (Retrieved 2019-05-03)
59”CBOE Volatility Index (VIX) Definition”. Investopedia.https://www.investopedia.com/terms/v/vix.asp (Retrieved 2019-05-03)
60”EURO STOXX® 50 VOLATILITY (VSTOXX) INDEX”. STOXX, 2019-03-29. https://www.stoxx.com/document/Bookmarks/CurrentFactsheets/V2TX.pdf (Retrieved2019-05-03)
61InvestmentFinance. ”USGG10YR”, 2014-02-07.https://www.investment-and-finance.net/finance/u/usgg10yr.html (Retrieved 2019-05-02)
The CSI Barc Index represents Barclays Capital US Corporate High Yield Bond
Index - Yield-to-Worst (YTW) 10 Year Treasury spread. 62
5.2.7 Transformations
To make the results and conclusion more concise, the following notations
will be used for the transformations
• First difference transformation: Transformation 1
• EWMA-mean transformation: Transformation 2
• EWMA-variance transformation: Transformation 3
• Autocorrelation transformation: Transformation 4
• Correlation-EWMA transformation: Transformation 5
5.3 Selection of Transformations and Hypothesis Tests
As mentioned, the selection of transformations and hypothesis tests was made by
SHB and was based on their knowledge and experience of data pre-processing
and commonly used transformations. Both established transformations and
approximated formulas derived by Rickard Henricsson were chosen. The
hypothesis tests included were also chosen since they are formulated with opposite
null hypotheses. This would therefore provide a wider perspective on the results
of the hypothesis tests compared to having two tests whose null hypotheses
indicate the same outcome.
5.4 Selection of Market Entry Frameworks
The timing of entry framework was chosen to get a comprehensive view of how
SHB can be affected by the decision of when to commercialize their machine
learning based prediction model, for example by launching new financial products
based on their technology.
62 BlackRock.”ETF Landscape: Industry Highlights”, 2012-04-12 (Retrieved 2019-05-02)
The benefits of entering at different times also depend on the current market
dynamics. Porter’s Five Forces is a well-established model for
understanding the characteristics and dynamics of a market, therefore it is
included as a complement to the analysis of the potential market entry. However,
since Porter’s Five Forces is an extensive model covering various perspectives
of the competitive environment of an industry or market, not every aspect was
considered relevant for the discussion regarding market entry. Therefore, it was
determined not to include every aspect in the analysis.
5.5 Literature Study
Prior to diving into the specific scope of the thesis a comprehensive examination
of existing knowledge was performed. Literature and journals were both collected
online and provided by SHB. The first part of the literature study was regarding
the aspects of economics and management within this thesis. It was important
to understand the greater purpose of it to SHB and how different economic
theories can be applied to the long term project. Two important books for the
thesis included in the study were Porter’s Five Forces by Newton and Bristoll and
Schilling’s Strategic Management of Technological Innovation (2017).
The literature selected for the mathematical theory was used in order to
understand concepts and terms presented within time series analysis. For the
research question of this thesis, an understanding is necessary of the
different components of time series as well as how and why time series can be
processed. Moreover, in a systematic order, studies examining similar theses
were collected and summarized to gain an indication of what could be considered
a reasonable result. Theory for the mathematical section was mainly retrieved
from Introduction to Time Series and Forecasting by Brockwell and Davis (2002)
and Time Series Analysis and Forecasting by Example by Bisgaard and Kulahci
(2011). It was also supported by knowledge received from Rickard Henricsson
and Peyman Dabiri at SHB.
5.6 Procedure of Work
As mentioned, the first step of the thesis was to do a comprehensive literature
study to gain a deep understanding of the long term purpose regarding economics
and management for SHB. It was important to analyze possibilities, competitors,
advantages and potential outcomes. For instance, extensive research was
conducted on which banks in the Nordics currently offered products based
on machine learning prediction of security prices.
After this study was completed the focus shifted to understanding the mathematical
theories. For the time series analysis, it was chosen to work with the additive
model presented in the theory section since it has been shown that this general
model fits data smoothly and is flexible without adding much complexity or
variance to the process.63 To confirm this, a few data sets were plotted to
graphically identify whether or not the series were additive. By inspecting the
pattern of increase of the amplitude of the time series it was decided that the time
series most likely followed an additive model.
The mission was to investigate whether or not the given transformations could
make the given data stationary. Firstly, all the given data was imported and the
transformations were coded, as well as the two hypothesis tests, using specific
Python functions from the ”stattools” package. It is important to note that the ADF
function tests the null hypothesis of a unit root, whereas the KPSS function in Python
tests the null hypothesis that the data is level or trend stationary.64 This is
automatically integrated in the two functions.
A relevant concern for this project was missing values, since the value of a
transformation at time t depends on the value at time t-1. Therefore, an important
first step of the procedure was to establish a methodology for handling potential
missing values in the data sets. In this case it was agreed to replace
a missing value with the most recent value prior to it as an approximation. It was
later noticed that the given data sets contained no missing values. Therefore,
63”Generalized Additative Models”, 2017-07-06 https://machinelearningmastery.com/time-series-data-stationary-python/ (Retrieved 2019-05-06)
64”statsmodels.tsa.stattools.kpss”https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html (Retrieved2019-05-04)
Figure 5.1: Figure showing an additive time series. As one can see, the amplitude of the series seems to remain constant for different time steps, which is an indication of an additive model.
Figure 5.2: Figure showing a multiplicative time series. As one can see, the amplitude of the series seems to increase by a factor for different time steps, which is an indication of a multiplicative model.
this treatment was not used.
As mentioned, for the majority of the transformations the calculation for time
t depends on the value at the previous time step. In other words, the
calculations are made recursively, and in order to start the recursion an initial
value is needed for all the transformations except the first difference, which does
not depend on a previous transformed value. In all cases n=19 was chosen, that is, the
first 19 values of a time series. These values were later removed from the time
series and not included when the transformations were applied. This seemed like a
reasonable amount to give a good estimated starting point without removing too
much of the data. The sample size was chosen large enough to smooth out the
effects of any extreme values. However, since
the length of the transformed series decreases as n increases, a relatively small
sample size was desired.
Starting with the EWMA-mean, the first transformation value was estimated as
the average of a sample of the first n first differences. The first differences were
chosen as the base data for all the initial approximations, since differencing in
general may remove some trends or seasonal effects. Hence, compared
to the original series, a differenced series may behave more like a stationary
series. More precisely, the initial EWMA-mean is65

$$F3_{i,1} = \bar{F2}_{i,t} = \frac{\sum_{t=1}^{n} F2_{i,t}}{n}$$
The initial EWMA-variance was calculated as the variance of the sample of
differences.66

$$F4_{i,1} = \frac{\sum_{t=1}^{n} (F2_{i,t} - \bar{F2}_{i,t})^{2}}{n-1}$$
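As a minimal sketch of the initialisation described above, the snippet below computes the first differences of a series and then the sample mean and sample variance of the first n of them, following the formulas for F3 and F4. The function names and the toy price list are hypothetical and only serve as an illustration; the thesis code itself is not reproduced here.

```python
from statistics import mean, variance


def first_differences(series):
    """Return the first differences x[t] - x[t-1] of a series (F2 in the text)."""
    return [b - a for a, b in zip(series, series[1:])]


def initial_mean_and_variance(diffs, n=19):
    """Sample mean and sample variance (denominator n - 1) of the first n differences."""
    sample = diffs[:n]
    if len(sample) < n:
        raise ValueError("need at least n first differences")
    return mean(sample), variance(sample)


# Hypothetical toy series, only to show the call; n is shortened to 4 here.
prices = [10.0, 11.0, 13.0, 12.0, 15.0, 15.5]
diffs = first_differences(prices)
f3_init, f4_init = initial_mean_and_variance(diffs, n=4)
```

With n=19 as in the text, the first 19 differences would be used instead of the shortened sample in this toy call.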
Likewise, the initial skewness value was obtained by calculating the skewness of
65 ”Sample Mean”. https://www.statisticshowto.datasciencecentral.com/sample-mean/ (Retrieved 2019-04-03)
66 ”Sample Variance: Simple Definition, How to Find it in Easy Steps”. https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/descriptive-statistics/sample-variance/ (Retrieved 2019-04-03)
the sample, that is

$$F5_{i,1} = \frac{n}{(n-1)(n-2)} \sum_{t=1}^{n} \frac{(F2_{i,t} - \bar{F2}_{i,t})^{3}}{s^{3}}$$

where s is the standard deviation of the sample of the first 19 differences.67
For the EWMA-kurtosis calculations, the initial value was approximated by the
kurtosis of the sample

$$F6_{i,1} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{t=1}^{n} \frac{(F2_{i,t} - \bar{F2}_{i,t})^{4}}{s^{4}} - \frac{3(n-1)^{2}}{(n-2)(n-3)}$$

where s is the sample standard deviation.68
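The two sample formulas above can be sketched directly in Python. This is an illustrative implementation of the bias-adjusted sample skewness and kurtosis in the form given in the text, not the thesis code; the function names and the small test sample are assumptions.

```python
from math import sqrt


def _mean_and_std(x):
    """Sample mean and sample standard deviation (denominator n - 1)."""
    n = len(x)
    m = sum(x) / n
    s = sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return m, s


def sample_skewness(x):
    """n / ((n-1)(n-2)) * sum(((x - mean) / s)^3); needs n >= 3 and s > 0."""
    n = len(x)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    m, s = _mean_and_std(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def sample_kurtosis(x):
    """Leading factor times the sum of fourth powers, minus the correction term."""
    n = len(x)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    m, s = _mean_and_std(x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((v - m) / s) ** 4 for v in x) - correction


# Hypothetical sample of ten values, used only to exercise the functions.
sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
skew = sample_skewness(sample)   # roughly 0.36
kurt = sample_kurtosis(sample)   # roughly -0.15
```

A constant sample has s = 0 and would raise a division error, so a real implementation would need to decide how to treat that case.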
The initial value for the EWMA-autocorrelation was calculated by69

$$F7_{i,1} = \frac{1}{n-1} \cdot \frac{\sum_{t=1}^{n} (F2_{i,t} - \bar{F2}_{i,t})(F2_{i,t+1} - \bar{F2}_{i,t+1})}{F4_{i,1}}$$
Instead of using the variance, as the standard formula for autocorrelation suggests,
the EWMA-variance was used. This was because the EWMA-variance
should in theory be more stationary than the standard
variance, since it is exponentially weighted. Thus, using the transformed values
should result in a more stationary initial value for the autocorrelation.
The autocorrelation was calculated by taking the first to the 19th differenced
value as one sample, and the second to the 20th value as the other. In
other words, lag 1 was used for the autocorrelation function. Lag 1 was chosen
since the next EWMA-autocorrelation calculation after time t is the value at t+1.
67 ”SKEW function”. Microsoft. 2019. https://support.office.com/en-ie/article/skew-function-bdf49d86-b1ef-4804-a046-28eaea69c9fa (Retrieved 2019-04-03)
68 ”KURT function”. Microsoft. 2018. https://support.office.com/en-us/article/kurt-function-bc3a265c-5da4-4dcb-b7fd-c237789095ab (Retrieved 2019-04-03)
69 ”Autocorrelation function”. http://www.real-statistics.com/time-series-analysis/stochastic-processes/autocorrelation-function/ (Retrieved 2019-04-03)
The initial correlation value is calculated as

$$F8_{i,1} = \frac{1}{n-1} \cdot \frac{\sum_{t=1}^{n} (F2_{i,t} - \bar{F2}_{i,t})(F2_{j,t} - \bar{F2}_{j,t})}{\sqrt{F4_{i,1}}\,\sqrt{F4_{j,1}}}$$

where i and j denote two different time series.70
Similar to the case with the autocorrelation, the square roots of the EWMA-variances
of the two time series are used instead of the standard deviations
that the standard formula uses. Once again this was chosen because
the EWMA-variances should be more stationary than the standard deviations of the
original data. Therefore, this should result in a more stationary initial value.
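A minimal sketch of the two initial values follows, under the assumption that the initial EWMA-variance F4 is the sample variance of the first n differences, as defined earlier in this section. The function names are hypothetical, and the windows follow the description in the text: observations 1 to n and 2 to n+1 for the lag-1 autocorrelation.

```python
from math import sqrt
from statistics import mean, variance


def initial_autocorrelation(diffs, n=19):
    """Lag-1 start value: windows diffs[0:n] and diffs[1:n+1], scaled by F4."""
    if len(diffs) < n + 1:
        raise ValueError("need at least n + 1 first differences")
    current, shifted = diffs[:n], diffs[1:n + 1]
    m_cur, m_shift = mean(current), mean(shifted)
    cross = sum((a - m_cur) * (b - m_shift) for a, b in zip(current, shifted))
    # variance(current) plays the role of the initial EWMA-variance F4.
    return cross / (n - 1) / variance(current)


def initial_correlation(diffs_i, diffs_j, n=19):
    """Start value for the correlation between two differenced series."""
    a, b = diffs_i[:n], diffs_j[:n]
    if len(a) < n or len(b) < n:
        raise ValueError("need at least n first differences in both series")
    m_a, m_b = mean(a), mean(b)
    cross = sum((u - m_a) * (v - m_b) for u, v in zip(a, b))
    return cross / (n - 1) / (sqrt(variance(a)) * sqrt(variance(b)))


# Hypothetical toy inputs with n shortened to 4.
acf_start = initial_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], n=4)
corr_start = initial_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], n=4)
```

For the correlation start value the same sample is used in numerator and denominator, so it stays within -1 and 1; the later recursive EWMA values are not bounded in this way.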
With these initial values the transformation formulas could be applied to the
time series. Before using the coded program on all data sets and
transformations to check for stationarity and find optimal values for α, a few data
sets were tested. The original data set and the output of every transformation
were visualized with graphs, since graphical inspection of stationarity is
commonly used in time series analysis. The aim was to obtain an initial insight
into how a transformation might affect the financial time series. For this purpose,
only one fixed value for α was set.
To begin with, two different currencies were chosen for the initial
assessment. This is because currency exchange rates give important information
regarding a country’s relative level of economic health, which makes them important
data for the prediction model.71 Currency exchange rates are related not only to
interest rates but also to inflation, and a country with clear positive attributes will
draw more investment funds. This made FX data one of the easiest data sets for
understanding how such data affect different prediction models. In addition, exchange
rates are generally relatively stable, and we therefore predicted that FX data would
be one of the easiest sectors to make stationary. As a second choice it was decided to
plot a commodity, Crude Oil (CL1). Since it is known that oil prices in general
70 ”CORREL function”. Microsoft. 2019. https://support.office.com/en-gb/article/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92 (Retrieved 2019-04-03)
71 ”6 Factors that Influence Exchange Rates”. https://www.investopedia.com/trading/factors-influence-exchange-rates/ (Retrieved 2019-05-01)
depend on seasonality, we wanted to see how well the transformations would
handle the seasonal component.72
When plotting these initial graphs, a fixed α=0.5 was chosen for each
transformation. As mentioned, these plots were only for our own understanding,
and α=0.5 therefore seemed like a neutral choice, since it gives as much weight
to the current value as to the past. Before trying to find an optimal α, a
fixed α was again chosen in order to compare each transformation formula with a moving average
of the corresponding statistic. This shows whether the approximated transformation
formulas are good estimates. For the moving average, the remaining values are not
obtained from the initial value through the transformation formula. Instead,
the mathematical formulas for the initial values presented in this section (5.5) are
applied repeatedly, with the sample window moved one step forward for each point, which is what ”moving
average” implies. For example, the first value uses observations t=1 to 19 in each formula, and
the next point uses observations t=2 to 20 in the same formula, without
using the approximated transformation formulas. The moving average
and the transformation formula values are then plotted together to see whether their
values show a similar pattern. If they differ a lot, it can be concluded that the
transformation formula might not be specific enough to work for this type of data.
Such transformations were therefore not used further when searching for
an optimal α.
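The moving-window calculation described above can be sketched as follows. The helper applies any of the initial-value statistics to a window that moves one observation at a time; the function names are hypothetical. The thesis compares the two curves graphically, so the numeric gap helper is an added assumption and not part of the original procedure.

```python
from statistics import mean


def rolling_statistic(values, n, statistic):
    """Apply `statistic` to every window values[k:k+n], moving one step at a time."""
    return [statistic(values[k:k + n]) for k in range(len(values) - n + 1)]


def largest_gap(series_a, series_b):
    """Largest absolute difference between two equally long series (hypothetical check)."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return max(abs(a - b) for a, b in zip(series_a, series_b))


# Hypothetical toy differences with a window of 3 instead of 19.
diffs = [1.0, 2.0, 3.0, 4.0, 5.0]
moving_mean = rolling_statistic(diffs, 3, mean)   # windows 1-3, 2-4 and 3-5
gap = largest_gap(moving_mean, [2.0, 3.5, 4.0])
```

Passing a different function as `statistic`, for example a sample skewness, gives the corresponding moving curve for that transformation.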
Figure 5.3: Figure explaining how one can see the pattern of the moving average compared to the actual transformation outcome. As seen, the pattern of the moving average graph matches the transformation. This indicates that this transformation is a good and valid approximation.
72 Journal of International Studies. ”Seasonal patterns in oil prices and their implications for investors”. 2018. (Retrieved 2019-05-01)
After confirming the formulas, one of the main steps was to examine the first
difference transformation. Since this transformation does not depend on the
variable α, it was not tried for different values of α. The data sets were run
through this transformation separately and then checked with the hypothesis
tests to see whether the first difference transformation alone would be enough as a
transformation.
The final step of the project was searching for the optimal value or an optimal
interval for α for each EWMA transformation. A step length of 0.02
was used. This was chosen since it was noticed that taking smaller steps would
require too much time to run for all the given data sets. The search was done for all the
transformations that depend on α and whose
formulas proved to be good approximations in the moving average test. We
decided to define a passing test, in other words the data set being stationary, as
the case where it passed both hypothesis tests.
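The search over α can be sketched as a simple grid loop. In the snippet below, `transform` and `passes` are hypothetical callables standing in for one EWMA transformation and for the combined ADF and KPSS check; the start value of the grid is also an assumption, since only the step length 0.02 is stated. The α values that make the largest number of series pass are collected and merged into intervals.

```python
import math


def alpha_grid(start, stop, step=0.02):
    """Grid start, start + step, ... not exceeding stop, rounded to 3 decimals."""
    count = math.floor((stop - start) / step + 1e-9) + 1
    return [round(start + k * step, 3) for k in range(count)]


def best_alphas(series_list, transform, passes, grid):
    """Return (best_count, alphas) for the alphas that make the most series pass."""
    counts = {a: sum(1 for s in series_list if passes(transform(s, a))) for a in grid}
    best = max(counts.values())
    return best, [a for a in grid if counts[a] == best]


def to_intervals(alphas, step=0.02):
    """Merge neighbouring grid points into (low, high) intervals."""
    intervals = []
    for a in alphas:
        if intervals and abs(a - intervals[-1][1] - step) < 1e-9:
            intervals[-1][1] = a
        else:
            intervals.append([a, a])
    return [tuple(pair) for pair in intervals]


# Hypothetical stand-ins: scaling as the "transformation" and a threshold as the "test".
toy_series = [[1.0], [2.0]]
toy_grid = [0.1, 0.3, 0.5, 0.7]
best_count, winners = best_alphas(
    toy_series,
    transform=lambda s, a: [a * v for v in s],
    passes=lambda t: t[0] >= 0.5,
    grid=toy_grid,
)
winner_intervals = to_intervals(winners, step=0.2)
```

In the real setting `passes` would run both hypothesis tests on the transformed series, which is the expensive part and the reason a smaller step length increases the run time.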
6 Results
Initially, three different trials were conducted for a fixed α, using data from two
currency pairs and one commodity. Each data set was transformed separately
for each transformation. To visualize the result of a transformation, a graph
of the transformed values was plotted after a data set had been run
through the transformation. Afterwards the hypothesis tests were performed to
see whether the graphical indications were valid. The final step was to test
different α-values for each transformation in order to find the optimal value or
values of α, in the sense that they could potentially make the input data stationary.
6.1 First Trial: Plots for Currency Rates, with Fixed α
For the first trial, the currency pairs EURUSD and USDCHF were arbitrarily
chosen among the currencies. As mentioned, the intention was to get an overview
of how currency rate time series behave before and after each transformation. To
check for stationarity, each transformation was plotted and examined. The
transformed series were also checked using the two hypothesis tests, KPSS and ADF.
For all transformations depending on the weighting factor α, a fixed value α = 0.5
was used. This was done for the first two data sets of the currency data to see
the results of the plots and whether these agreed with the test statistics for each
currency.
Figure 6.1 shows the original time series for the EURUSD prices from 2001-01-01
to 2018-12-31. It is observed that the data does not seem to have a constant mean
or a constant variance and may therefore be non-stationary. However, there
are no distinguishable seasonal patterns or extremely large deviations, which may
indicate that the process is relatively stable. This behavior was expected since it
is common for FX data.
Figure 6.2 shows the original time series for the USDCHF currency plotted from
the data. It is observed that the time series does not look stationary, rather it
seems as if there is a general decrease, exhibited by a trend component.
Figure 6.3 shows the time series for the EURUSD currency but transformed
Figure 6.1: EURUSD graph
Figure 6.2: USDCHF graph
with the first difference transformation. It is observed that the transformed
series seems to move regularly around a certain value (a straight horizontal
line). Specifically, it seems to have a constant mean and variance, similar to the
behaviour of a stationary time series. Since the first difference transformation is
known to be a simple yet very useful transformation, these positive results were
expected.
Figure 6.4 shows the time series for the USDCHF currency but transformed with
the first difference transformation. With this the time series seems to become
more stationary, even though there are still peaks.
Figure 6.5 shows the time series for the EURUSD currency but transformed with
Figure 6.3: EURUSD graph- transformation 1
Figure 6.4: USDCHF graph- transformation 1
the EWMA-mean transformation. With this the time series seems to become even
more stationary, with a regular movement around zero. The theory describes the
EWMA-mean transformation as a smoothing one, and indeed it seems to smooth
the pattern of the graph. Some peaks can still be found, which may represent
extreme movements in the original series (a sharp increase or decrease in prices
in a short time period). Unusual events, such as the financial crisis in 2008, may
contribute to this kind of behaviour in a financial time series.
Figure 6.6 shows the time series for the USDCHF currency but transformed with
the EWMA-mean transformation. With this the time series also seems to become
more stationary.
Figure 6.5: EURUSD graph- transformation 2
Figure 6.6: USDCHF graph- transformation 2
Figure 6.7 shows the time series for the EURUSD currency but transformed with
the EWMA-variance transformation. With this the time series seems to become
more stationary: the scale on the y-axis decreases, which
means that the peaks become smaller.
Figure 6.8 shows the time series for the USDCHF currency but transformed with the
EWMA-variance transformation. With this there is barely any visual difference
compared to the previous transformation.
Figure 6.9 shows the time series for the EURUSD currency but transformed with
the EWMA-autocorrelation transformation. With this it seems that the correlation
Figure 6.7: EURUSD graph- transformation 3
Figure 6.8: USDCHF graph- transformation 3
is stable and stays roughly between -1 and 1, which is the range in which an
autocorrelation should lie.
Figure 6.10 shows the time series for the USDCHF currency but transformed with
the EWMA-autocorrelation transformation. Just as for the EURUSD currency, it
seems stable and varies roughly between -1 and 1.
Figure 6.11 shows the correlation between the two time series for EURUSD
currency and USDCHF currency. From the plot it is observed that there might
be some dependency between the two currency pairs. This was expected since
Figure 6.9: EURUSD graph- transformation 4
Figure 6.10: USDCHF graph- transformation 4
both currency pairs depend on the US dollar.
6.1.1 Statistics for First Trial
The following statistics were obtained for the Augmented Dickey Fuller-Test
(ADF) and KPSS-test for the two different currencies tried.
For the EURUSD currency pair:
P-value/Transformation   1             2             3             4             5
ADF                      1.84*10^-21   4.37*10^-21   0.0002        8.02*10^-21   2.50*10^-20
KPSS                     0.1           0.1           0.1           0.0204        0.0333
Table 6.1: Test Statistics obtained for EURUSD trial 1
Figure 6.11: USDCHF graph- transformation 5
For the USDCHF currency pair:
P-value/Transformation   1             2             3             4             5
ADF                      2.58*10^-24   6.47*10^-24   6.27*10^-16   6.05*10^-22   2.50*10^-20
KPSS                     0.1           0.1           0.01          0.0877        0.0333
Table 6.2: Test Statistics obtained for USDCHF trial 1
Table 6.1: One can see that the p-values for the ADF-test are low. The KPSS-test
gives large p-values for most of the transformations, except for
transformations 4 and 5. This indicates that, according to the ADF-test, the time
series are made stationary by all of the transformations. However, the KPSS-test
does not agree that transformations 4 and 5 make the data
stationary.
Table 6.2: One can see that the p-values for the ADF-test are low, which
indicates stationarity. The KPSS-test gives large p-values for
transformations 1, 2 and 4. Small p-values are obtained from the KPSS-test for
transformations 3 and 5, which indicates that the KPSS-test does not reach the
same conclusion as the ADF-test regarding making the data stationary.
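The way the two tests are combined can be illustrated with the p-values from Tables 6.1 and 6.2. The sketch below assumes a significance level of 0.05, and the helper name is hypothetical. Since the two tests have opposite null hypotheses, a series counts as stationary only if the ADF p-value is below the level and the KPSS p-value is above it.

```python
def passes_both(adf_p, kpss_p, level=0.05):
    """True if ADF rejects its unit-root null and KPSS does not reject stationarity."""
    return adf_p < level and kpss_p > level


# P-values copied from Table 6.1 (EURUSD) and Table 6.2 (USDCHF), transformations 1-5.
eurusd_adf = [1.84e-21, 4.37e-21, 0.0002, 8.02e-21, 2.50e-20]
eurusd_kpss = [0.1, 0.1, 0.1, 0.0204, 0.0333]
usdchf_adf = [2.58e-24, 6.47e-24, 6.27e-16, 6.05e-22, 2.50e-20]
usdchf_kpss = [0.1, 0.1, 0.01, 0.0877, 0.0333]

eurusd_pass = [passes_both(a, k) for a, k in zip(eurusd_adf, eurusd_kpss)]
usdchf_pass = [passes_both(a, k) for a, k in zip(usdchf_adf, usdchf_kpss)]
# eurusd_pass -> [True, True, True, False, False]
# usdchf_pass -> [True, True, False, True, False]
```

Note that the KPSS function in statsmodels interpolates its p-value from a table and returns a boundary value when the statistic falls outside it, so reported values of 0.1 and 0.01 should be read as "at least 0.1" and "at most 0.01".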
6.2 Second Trial: Plots for Commodity, with a Fixed α
In the second trial, a commodity, more precisely crude oil, was examined with
α = 0.5 for all transformations.
Figure 6.12: Commodity: Original Time Series
Figure 6.12 shows the original time series of the commodity plotted from the data.
The time series varies a lot, with several peaks. As seen, the scale on the y-axis
is much larger than for the currencies, which indicates that currencies are in fact
more stable than commodities. As mentioned, many commodity prices
are affected by seasonal fluctuations.
Figure 6.13: Commodity with the first difference transformation
Figure 6.13: The differenced series behaves as if it has become more stationary, but not
completely, since some values are remarkably larger or smaller than the rest of the
set. In Figure 6.12 (representing the original series) there is a sudden decrease in
the crude oil price around 2008 (which is also the year of the financial crisis). This
is also reflected in the transformed series, where extreme values can be found when
examining the same time period. In general, the first difference transformation
seems to work in accordance with theory, but might not work as effectively
for removing the impact of such extreme events.
Figure 6.14: Commodities with the EWMA transformation
Figure 6.14: For the commodities the EWMA-mean transformation seems to narrow
the range within which the time series values vary. For the difference
transformation seen in Figure 6.13, the values vary within a range of approximately -15
to 15 (except for the extreme values). After the EWMA-mean
transformation is performed, it is observed from the graph that the range
within which the time series values vary has been narrowed to around -8 to
8. In accordance with theory, the transformation has a smoothing effect on the
series. Similar to the differencing transformation, the EWMA-mean does not seem to
succeed in removing extreme events such as the 2008 financial crisis.
Figure 6.15 shows the EWMA-variance transformation of the CL1 series. As seen, the
transformation does not seem to have a beneficial effect on the time series. The
EWMA-variance depends on the values from the difference and the EWMA-mean
transformations. Therefore, it might be expected that if the EWMA-mean and
the differenced values are stationary, then the EWMA-variance should yield a
Figure 6.15: EWMA variance transformation of the CL1
stationary output too. However, the plot shows that the transformation
in fact seems to make the data less stationary and more varied. Especially for
short time periods with extreme movements and fluctuations in the original data
(such as around 2008), the EWMA-variance values
deviate strongly. This indicates that the EWMA-variance is highly sensitive to
variations in the input data. Normally, if the first difference transformation
and the EWMA-mean transformation make a data set stationary whereas
the EWMA-variance has a negative impact, one would not apply this
transformation or the following ones. Since the aim here is to understand all
of the transformations, they are all plotted.
Figure 6.16: EWMA-autocorrelation transformation
Figure 6.16 shows that the autocorrelation varies between -1 and 1 and graphically seems
stable.
6.2.1 Statistics for Second Trial
P-value/Transformation   1             2             3             4
ADF                      8.78*10^-18   5.83*10^-17   2.16*10^-4    8.45*10^-20
KPSS                     0.1           0.1           0.01          0.01
Table 6.3: Test Statistics obtained for Commodity trial 2
Table 6.3: The ADF p-values are low for all the transformations. The KPSS p-values are
high for transformations 1 and 2 but low for the rest. As for the currencies, the
ADF-test indicates stationarity for all the transformations, whereas the KPSS-test
only confirms stationarity for transformations 1 and 2.
6.3 Third Trial: Plots for Commodity Prices, with a Fixed α
For the third trial another time series of commodity prices was tried, this time
gold prices expressed in US dollars. As for the previous trials, a fixed value α = 0.5
was used as the weighting factor for the transformations.
Figure 6.17: Original data of XAUUSD currency
Figure 6.18: First Difference transformation of XAUUSD currency
Figure 6.18: The first difference seems to have a positive impact on the stationarity
of the time series and is similar to the results from the previous trials.
Figure 6.19: EWMA- mean transformation of XAUUSD currency
Figure 6.19: The EWMA-mean transformation seems to behave similarly to the
differenced values seen in Figure 6.18. More precisely, both transformed series
seem to move around the same level, and their variations resemble each other
in the sense that they have deviating values at the same points in time. As
for the previous trials, compared to the first order differencing transformation,
the EWMA-mean values seem to vary within a narrower
range. The EWMA-mean transformation is based on the
differenced values rather than the original series in the calculations. From theory
it is also known that the EWMA-mean should serve as a smoothing technique. An
interpretation of the results could therefore be that trends are first removed
by differencing, and thereafter the data set is smoothed by the EWMA-mean.
Therefore the graph of the resulting series seems more stationary.
Figure 6.20: The transformation does not seem to have a positive impact on the
time series. The EWMA-variance transformation takes both EWMA-mean values
and differenced values as input. Looking at Figure 6.18 (first difference of
XAUUSD data) and Figure 6.19 (EWMA-mean of XAUUSD data), it is noticed
that both transformations exhibit extreme values at the same points in time; for
example, one is found between 2013 and 2014, and another is found just before
2012. These deviations seem to increase remarkably after the EWMA-variance
Figure 6.20: EWMA-variance transformation on XAUUSD currency
transformation. As for the previous trials, the transformation presents a certain
sensitivity to sharp fluctuations in the input time series.
Figure 6.21: EWMA-autocorrelation transformation on XAUUSD currency
The transformed data looks stable in the sense that the series seems to move around
a certain mean value with a constant variance. More specifically, the values vary
from approximately -1 to 1. The result can be expected since it is known that in
general, autocorrelation values range from -1 to 1.
Figure 6.22: Correlation between commodity and XAUUSD currency
Most of the correlation values vary between -1 and 1, but some of the values exceed
these limits slightly.
6.3.1 Statistics with trial 3
P-value/Transformation   1             2             3             4             5
ADF                      5.16*10^-25   1.19*10^-24   1.51*10^-8    1.28*10^-20   8.71*10^-11
KPSS                     0.1           0.1           0.01          0.1           0.01
Table 6.4: Test Statistics obtained for commodity XAUUSD
As seen in Table 6.4, the ADF p-values are well below 0.05, which indicates
stationarity, whereas the KPSS p-values only exceed 0.05 for transformations 1, 2
and 4. That is, the KPSS-test only indicates stationarity after using these three
transformations. Also, transformation 4 seems
to work better for the gold commodity than for the oil.
6.4 Seasonality and Trends
The following graphs were obtained by using Python’s decompose function. This
was done to ”confirm” and visually see that the time series do indeed consist of a
trend component and a seasonal one even when transformed.
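As a rough illustration of what an additive decomposition does, the following minimal sketch splits a series into trend, seasonal and residual parts using a centered moving average. It is not the library function used for the graphs in this section; the name `additive_decompose`, the restriction to odd periods and the toy series are assumptions made for illustration.

```python
def additive_decompose(x, period):
    """Classical additive split x = trend + seasonal + residual for an odd period."""
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch handles odd periods >= 3 only")
    n = len(x)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    # Trend: centered moving average; undefined (None) at both ends.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(x[t - half:t + half + 1]) / period
    # Seasonal: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(half, n - half):
        buckets[t % period].append(x[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # center the seasonal component around zero
    seasonal = [raw[t % period] - offset for t in range(n)]
    residual = [None if trend[t] is None else x[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Hypothetical toy series: a linear trend plus a repeating pattern of length 3.
pattern = [1, 0, -1]
toy = [t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = additive_decompose(toy, period=3)
```

For this toy series the residual is zero wherever the trend is defined, since the series is exactly the sum of a linear trend and a periodic pattern.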
Figure 6.23: Graph showing the original data, its trend, its seasonality and the residual after decomposition
Figure 6.24: Graph showing the data transformed with the first difference, its trend, its seasonality and the residual after decomposition
6.5 Skewness and Kurtosis
As mentioned in the procedure of work (Section 4.5), before using the
transformation formulas implemented in the Python code, a simple moving
average, corresponding to an EWMA transformation, was plotted in Excel. In the
same graph the values of the EWMA transformation were also plotted. This was
done for each EWMA transformation that Henricsson at SHB had approximated,
which were those for skewness, kurtosis, autocorrelation and correlation. The
difference between the simple moving average and an exponentially weighted
moving average is that the EWMA gives more weight to recent values than to past
values. However, the EWMA is based on the simple moving average, and therefore
their graphs should approximately follow the same pattern. Consequently, if an
EWMA transformation followed the pattern of its corresponding moving average,
it was assumed that the transformation was well approximated. The aim was to see
whether the given formulas would serve as accurate approximations. If not, they were not
examined further; in other words, they were excluded from the research question.
All the approximated EWMA transformations seemed to follow a similar pattern
to the moving average, except for the skewness and kurtosis transformations.
Figure 6.25: Graph showing the moving average of the skewness versus the EWMA-skewness obtained by the given transformation formula. As one can see, the blue pattern representing the EWMA-skewness does not match the moving average (orange pattern), and there is an evident difference between their values.
Figure 6.26: Graph showing the EWMA-skewness up close, since it was not properly visible behind the moving average of the skewness.
Figure 6.27: Graph showing the moving average of the kurtosis versus the kurtosis obtained by the given transformation. Once again it is seen that the blue graph of the kurtosis (obtained by the transformation) does not match the moving average of the kurtosis.
Figure 6.28: Graph showing the kurtosis up close, since it was not properly visible behind the moving average of the kurtosis.
6.6 First Differences on all Data
The first difference transformation does not depend on the value of α and
was therefore treated separately. The results of the stationarity tests after the first difference
transformation are presented below for all the data. X represents passing both stationarity tests,
whereas 0 represents failing at least one of them.
Currencies
Figure 6.29: Table showing the currencies that passed transformation 1
US Sectors
Figure 6.30: Table showing the US sectors that passed transformation 1
Countries
Figure 6.31: Table showing the countries that passed transformation 1
Commodities
Figure 6.32: Table showing the commodities that passed transformation 1
VIX
Figure 6.33: Table showing the VIX data (market volatility data) that passed transformation 1
Bonds
Figure 6.34: Table showing the IR data (bond data) that passed transformation 1
6.7 Finding the Optimal α
6.7.1 Currencies
To find the optimal α:s between 0 and 1, a step length of 0.02 was chosen to loop
through different values of the variable. Each data category was examined, starting with
FX data. The transformations were tested separately, meaning that for a chosen
transformation, the goal was to find a value of α that would make the most FX
data sets stationary.
The value or values of α for a chosen transformation that made the transformed
series pass both the ADF and the KPSS-test for the largest number of currencies
was chosen as the optimal value or interval. In the table, the symbol X means that
the currency passed the two hypothesis tests for the transformation; the
value or interval for α is presented after the table. The symbol 0 corresponds to not passing
the tests, either both or one of them.
Currency/Transformation   2   3   4
EURUSD                    X   X   X
GBPUSD                    X   X   X
AUDUSD                    X   X   X
NZDUSD                    X   X   X
USDCAD                    X   X   X
USDCHF                    X   X   X
USDJPY                    X   X   X
USDNOK                    X   X   X
USDSEK                    X   X   X
Table 6.5: Table Obtained for Currencies
The optimal values of α for transformation 2 are 0.381≤ α ≤ 1.
The optimal value is α = 0.981 for transformation 3.
The optimal interval for transformation 4 is 0.701 ≤ α ≤ 0.981.
The optimal interval for transformation 5 is 0.141 ≤ α ≤ 1. Here, all combinations
of two currency pairs were used to calculate different correlations. For example,
the correlation between EURUSD and GBPUSD was tried, thereafter for EURUSD
and AUDUSD and so on.
6.7.2 US-Sectors
The same procedure was applied to the data of the US Sectors and so on.
US sector Index/Transformation   2   3   4
MXUS0EN                          X   X   X
MXUS0MT                          X   X   X
MXUS0IN                          X   X   X
MXUS0CD                          X   X   X
MXUS0CS                          X   X   X
MXUS0HC                          0   X   X
MXUS0FN                          X   X   X
MXUS0IT                          0   X   X
MXUS0TC                          X   X   X
MXUS0UT                          X   X   X
MXUS                             X   X   X
Table 6.6: Table Obtained for US Sectors
For transformation 2 there is no α that passes all the tests for the data.
The optimal interval for transformation 3 is 0.941 ≤ α ≤ 0.981.
The optimal intervals for transformation 4 are 0.061 ≤ α ≤ 0.521 and 0.861 ≤ α ≤ 0.981.
The optimal interval for transformation 5 is 0.261 ≤ α ≤ 1, where correlations
were calculated for various combinations of the US sectors. For example,
the correlation between MXUS0CD (Consumer Discretionary) and MXUS0CS
(Consumer Staples) was tried, thereafter for MXUS0CS and MXUS0UT (Utilities)
and so on.
6.7.3 Countries Index
Countries Index/Transformation   2   3   4
MXUS                             X   X   X
MXEU                             X   X   X
MXGB                             X   X   X
MXFR                             X   X   X
MXCH                             X   X   X
MXES                             X   X   X
MXIT                             X   X   X
MXDE                             X   X   X
Table 6.7: Table Obtained for Countries Index
The optimal interval for transformation 2 is 0.101 ≤ α ≤ 1.
The optimal interval for transformation 3 is 0.961 ≤ α ≤ 0.981.
The optimal interval for transformation 4 is 0.181 ≤ α ≤ 0.981.
The optimal interval for transformation 5 is 0.041 ≤ α ≤ 1, where correlations
were calculated as before.
6.7.4 Commodities
Commodity/Transformation   2   3   4
CL1                        X   X   X
XAUUSD                     X   0   X
Table 6.8: Table Obtained for Commodities
The optimal interval for transformation 2 is 0.161 ≤ α ≤ 1.
For transformation 3 there was no optimal value of α.
The optimal interval for transformation 4 is 0.181 ≤ α ≤ 0.981.
For transformation 5, correlations were calculated as for the previous financial
variables; in other words, correlations between different commodities were tried.
There was no value of α that made the data pass both tests.
6.7.5 VIX (Market Volatility)
Index/Transformation   2   3   4
VIX                    X   X   X
V2X                    0   0   X
Table 6.9: Table Obtained for VIX
For transformation 2 there is no optimal value that makes the data pass both
hypothesis tests.
For transformation 3 there is no optimal value that makes the data pass both hypothesis tests.
The optimal intervals for transformation 4 are 0.181 ≤ α ≤ 0.421 and 0.921 ≤ α ≤ 0.981.
For transformation 5 the correlations between all the different combinations of VIX
indexes were tried, and the interval 0.101 ≤ α ≤ 0.981 is optimal.
6.7.6 IR (Bonds)
IR Index/Transformation   2   3   4
USGG30YR                  X   X   X
USGG10YR                  X   X   X
USGG2YR                   X   X   X
CSI BARC                  X   X   X
Table 6.10: Table Obtained for IR Indexes
The optimal interval for transformation 2 is 0.121≤ α ≤ 1.
The optimal interval for transformation 3 is 0.921≤ α ≤ 0.981.
The optimal interval for transformation 4 is 0.961≤ α ≤ 0.981.
For transformation 5 correlations between all the various combinations of
different bond time series were tried, and there was no α that made all data pass
both tests.
6.7.7 Aggregated α
To obtain an aggregated α for each transformation, the values or intervals
of α that made the majority of the time series pass the hypothesis tests were
aggregated into one. As a result, the following α:s were obtained:
• Transformation 2: 0.381 ≤ α ≤ 1
• Transformation 3: α = 0.981
• Transformation 4: two different intervals were obtained, one in a lower range and one in a higher range, that potentially could make the data stationary. The intervals
were 0.181 ≤ α ≤ 0.421 and 0.921 ≤ α ≤ 0.981.
• Transformation 5: 0.141 ≤ α ≤ 1
Accordingly, for each transformation, the values of α presented above resulted in
making financial data potentially stationary.
7 Conclusions
7.1 Interpretation and Impact
7.1.1 Trial 1
For the first trial, most transformed series seem stationary when examining the pattern
of the graphs. It is also seen that the correlation between the variables exceeds
the limits -1 and 1, which is due to the fact that real data is used. To not
exceed the limits one could scale the data with a common factor. To confirm the
graphical conclusions, two hypothesis tests were performed. For the first trial, the
hypothesis tests are performed on each of the transformed data sets to see
which ones indicate stationary data and which ones do not.
For the EURUSD exchange rate all the p-values from the ADF-tests are small,
so this hypothesis test indicates that all the transformations
make the data stationary. On the other hand, the KPSS-test does not lead to
the same conclusion. There, transformations 4 and 5 have low
p-values, so the null hypothesis that the data is stationary is rejected. For
the USDCHF currency pair the same results are obtained for the ADF-test, but the
KPSS-test indicates that transformations 3 and 5 do not make the data stationary.
From the first trial it is concluded that even though all the graphs visually
seemed to become more stable, this does not have to coincide with the results of
the hypothesis tests. It is also concluded that the ADF-test and the KPSS-test
do not have to indicate the same result and can contradict each other regarding
the validity of the transformations.
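Because the two tests have opposite null hypotheses, their p-values have to be read in opposite directions before they can be compared. A minimal sketch of one way to combine them is given below. The function name `combine_tests`, the 5 % significance level and the three verdict labels are assumptions for illustration; the p-values themselves could come from, for example, `adfuller` and `kpss` in `statsmodels.tsa.stattools`.

```python
def combine_tests(adf_p, kpss_p, level=0.05):
    """Combine ADF and KPSS p-values into one verdict.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_says_stationary = adf_p < level      # ADF null rejected
    kpss_says_stationary = kpss_p >= level   # KPSS null not rejected
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "inconclusive"


print(combine_tests(0.01, 0.10))  # both tests point to stationarity
print(combine_tests(0.01, 0.01))  # the tests contradict each other
print(combine_tests(0.40, 0.01))  # both tests point to non-stationarity
```

Under this reading, a transformed series counts as passing only in the first case, and the contradictory cases seen in trial 1 fall into the "inconclusive" group.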
7.1.2 Trial 2
Just as in trial 1, the transformations graphically seem to improve the data,
with the exception of the EWMA-variance transformation, which seems to increase
the peaks of the time series. To confirm stationarity, the hypothesis tests were
performed. For the second trial, the p-values for the commodity data were
obtained. In this case all the p-values from the ADF-tests are low, which
indicates stationarity. In contrast, the p-values for transformations 3 and 4
from the KPSS-test are also low, which suggests that the data is non-stationary.
It is concluded from the KPSS-test that the EWMA-variance transformation makes
already stationary data non-stationary, which was also seen graphically.
Therefore, when performing these tests on real data for actual prediction, one
should stop as soon as one transformation has made the data stationary, since
further transformations do not necessarily result in more stationary data.
7.1.3 Trial 3
For the third trial the gold price data was used. As in trial 2, all
transformations except EWMA-variance seem to be stabilizing. The p-values from
the ADF-tests all indicated stationarity. For the KPSS-test, transformations 3
and 5 indicated a non-stationary data set. Trial 3 therefore serves as a further
indication that EWMA-variance is not a beneficial transformation.
7.1.4 Skewness and Kurtosis
As seen in the plotted moving averages, the approximated EWMAs of skewness and
kurtosis are not good enough estimates, since they do not resemble the moving
average patterns. For the skewness, for example, there is a large difference
between the values calculated by the mathematical formula and the values that
the transformation results in. Because of this, the transformations of skewness
and kurtosis are not used further, and it was concluded that better estimates
are required for these transformations to be valid. This is an example of an
area that could use more research to find better formulas and approximations for
these characteristics.
7.1.5 First Difference as a Transformation
In accordance with theory, the first difference proves to be a good
transformation. This was seen both graphically, when plotting the transformed
data, and when trying the transformation on all given data. Most of the data
transformed with the first difference passes both hypothesis tests. Therefore,
it is concluded that this will be a very useful transformation when
pre-processing data for future regression models.
One important aspect to mention is that, in this project, a time series that was
made stationary by the first difference transformation was still passed through
the rest of the transformations to see their effects. To extend the scope in
future projects, it could be relevant to add a step so that data that is already
stationary after the first difference is not treated further or analysed for
optimal α values. This could provide another perspective on the results.
Also, the first difference was performed with lag 1 only. For series with a
seasonal component, a better result might have been obtained if the lag had been
set to the length of the season. Hence, in future work the seasonal component
and its period should be examined before differencing a time series, in order to
find an appropriate lag and thereby increase the probability of making the
series stationary.
Additionally, for some data a higher order of differencing may be required to
make a process stationary. Further research can examine whether first-order
differencing is sufficient to make the financial data stationary.
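The two extensions discussed above, a seasonal lag and a higher order of differencing, can be illustrated with a short sketch. The helper `difference` and the toy series (a linear trend plus a pattern repeating every 4 steps) are assumptions made for the example and are not taken from the project data.

```python
def difference(series, lag=1, order=1):
    """Apply lagged differencing `order` times: y[t] = x[t] - x[t - lag]."""
    values = list(series)
    for _ in range(order):
        values = [values[t] - values[t - lag] for t in range(lag, len(values))]
    return values


# Made-up series: linear trend plus a pattern that repeats every 4 steps.
season = [0.0, 2.0, -1.0, 1.0]
x = [0.5 * t + season[t % 4] for t in range(12)]

print(difference(x, lag=1))           # the seasonal pattern is still visible
print(difference(x, lag=4))           # constant: trend and season are removed
print(difference(x, lag=1, order=2))  # second-order differencing with lag 1
```

In this constructed case, differencing with the lag equal to the season length leaves a constant series, whereas lag 1 leaves the repeating pattern in place.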
7.1.6 Finding the Optimal α
As mentioned in the results, a value or an interval of α was found for each
transformation. From the results in Section 6.7.7 it is seen that the value
differs considerably between transformations, and it is therefore difficult to
aggregate the values into a single α for all financial data. Compared with the
theory suggesting that α below 0.3 is a good value, many lower values are indeed
included in the intervals obtained. However, the intervals are wide, since many
values of α made the data pass the hypothesis tests. This means that this
project gives no indication that values of α below 0.3 are more useful than
values above it. Nevertheless, the lower the value, the more previous data is
taken into consideration, which is desirable since old data is necessary for
predicting future values. For future projects it is essential to include more
data in order to obtain a clearer indication of which values of α are adequate
for making financial time series stationary. The values of α obtained might only
be suitable for the data used in this research and not for other input data. A
wide interval does not have to indicate a bad result, since it could mean that
the data used is simply easy to transform and treat. As mentioned before, a
future goal for SHB is to be able to predict a US stock market index. Among
other factors, the index is affected by macro factors such as interest rates,
unemployment rates and GDP figures.73 Consequently, it might be required to
pre-process this type of data too. To broaden this research, the transformations
can be tried on various types of macro data (or other relevant data affecting
the stock market) to find optimal values of α that would make these data sets
stationary.
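The link between α and the amount of history that is used can be made explicit. Assuming a transformation follows the usual EWMA recursion $S_t = (1-\alpha)\,S_{t-1} + \alpha\,x_t$ (the symbols $S_t$, $x_t$ and the starting value $S_0$ are introduced here for illustration only), unrolling the recursion gives

$$S_t = \alpha \sum_{k=0}^{t-1} (1-\alpha)^k \, x_{t-k} + (1-\alpha)^t \, S_0 .$$

The observation $k$ steps back therefore enters with weight $\alpha(1-\alpha)^k$. For $\alpha = 0.1$ the observation 10 steps back still has weight $0.1 \cdot 0.9^{10} \approx 0.035$, while for $\alpha = 0.9$ its weight is $0.9 \cdot 0.1^{10} = 9 \cdot 10^{-11}$, so it is effectively ignored.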
73 ”Trading the Dow Jones industrial average”. UFX. https://www.ufx.com/en-gb/assets/indices/dow-jones/ (Retrieved 2019-05-10)
Using more hypothesis tests to make the conditions for ”passing” stricter could
also be necessary to narrow down the value or interval of α. In general, the
majority of the transformations seem to make data stationary, at least for some
values of α. EWMA-variance was an exception, since it sometimes tends to make
data even more unstable and non-stationary. More precisely, this could be seen
when the input data had outlying values or sharp fluctuations within a short
period of time. These deviations are even more apparent after applying the
transformation, and hence it is concluded that EWMA-variance is possibly
sensitive to extreme values and deviations in the input data. This was seen both
in the graphs and when looping to find an optimal α, since only α around
approximately 0.982 worked. To understand why EWMA-variance is a poor
transformation, its formula
$$F4_{i,t} = (1-\alpha)\,F4_{i,t-1} + \alpha\,(F2_{i,t} - F3_{i,t})^2$$
is investigated further. If α is close to 1, the EWMA-variance from time t-1 has
little influence. Most of the values will therefore depend solely on
transformation 1 (first difference) and transformation 2 (EWMA-mean) at time t.
Since there is so little influence from the previous values, the prediction of
future ones may be misleading. As mentioned, the results show that EWMA-variance
does not handle deviations in the data well. This transformation might be more
suitable for non-volatile data sets. However, to draw a conclusion, the
transformation has to be tried on much more data, both highly fluctuating and
relatively stable series, and the results then compared. As further research,
another weighting of time could also be tried, for instance a log-space
exponential moving average, since this has proven to give more accurate results
for very long-term and highly volatile data in other experiments.74
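The effect of α on how long an outlier stays in the EWMA-variance can be sketched in a few lines. The example below is an illustration under stated assumptions and not the project's implementation: the input list `f2` stands for a first-differenced series with made-up values, the EWMA-mean is assumed to follow the same recursive form as the variance, both recursions are started at 0, and the helper `remaining_share` is hypothetical.

```python
def ewma_mean(x, alpha, start=0.0):
    """EWMA-mean: m[t] = (1 - alpha) * m[t-1] + alpha * x[t] (assumed form)."""
    out, prev = [], start
    for value in x:
        prev = (1 - alpha) * prev + alpha * value
        out.append(prev)
    return out


def ewma_variance(x, alpha, start=0.0):
    """EWMA-variance: v[t] = (1 - alpha) * v[t-1] + alpha * (x[t] - m[t])**2."""
    out, prev = [], start
    for value, mean in zip(x, ewma_mean(x, alpha)):
        prev = (1 - alpha) * prev + alpha * (value - mean) ** 2
        out.append(prev)
    return out


# Made-up differenced series with a single outlier at index 5.
f2 = [0.1, -0.1, 0.1, -0.1, 0.1, 5.0, -0.1, 0.1, -0.1, 0.1]


def remaining_share(alpha):
    """Share of the variance at the outlier (index 5) still present at index 9."""
    v = ewma_variance(f2, alpha)
    return v[9] / v[5]


for alpha in (0.1, 0.9):
    print(alpha, round(remaining_share(alpha), 4))
```

In this toy series, most of the spike is still present four steps after the outlier when α = 0.1, whereas with α = 0.9 it has almost completely disappeared, which matches the observation that previous values have little influence when α is close to 1.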
7.2 Analysis of Timing of Entry and Competitive Rivalry
Machine-learning-based prediction of financial security prices is a fairly new
topic, and it is therefore of interest to discuss the advantages and
disadvantages of being among the first to commercialize the concept.
In the market for predicting securities’ returns with machine learning, being
the first mover can create many advantages for the firm. It could open the
opportunity to establish its model as the standard in the banking industry. This
could make a strong impression on both customers and investors, and SHB may be
perceived as an innovator or technological leader. In the long term this may
also result in increased customer loyalty.
Although there are many first-mover advantages in this project if SHB manages to
be first, there are also disadvantages. One is that SHB may spend too much money
and too many resources on R&D for this specific project. During the project it
was noticed that a lot of resources had been used in order to remain in the lead
of this development. For example, this entire project was spent on examining the
validity of specific data and transformations. In the case of the kurtosis and
skewness, an extensive amount of time was spent on transformations that later
proved to be poor approximations and could not be used for obtaining
stationarity of data. This could indicate that it may be more beneficial for SHB
to be, for example, second to the market. The method for developing a prediction
model and for pre-processing data might then already have been examined. There
could also be a prediction model to imitate.
74 ”Log-space Exponential Moving Average”, 2017-11-22. https://www.tradingview.com/script/cyfV1gLU-Log-space-Moving-Average/ (Retrieved 2019-05-07)
Since the long-term goal of this project is to develop a prediction model, it
was of interest to analyze the potential outcome through Porter's Five Forces.
The bargaining power of customers is not expected to be very high, since there
are no banks in Sweden offering such machine-learning-based opportunities to
investors and customers. This indicates that being the first mover may
strengthen SHB's bargaining power over customers, and SHB could for instance set
the market prices for its offerings.
Since this area of machine learning is relatively new, there are not many
competitors in this specific field among banks. However, this does not mean that
there will be none in the future. There could be many other banks investing in
this type of R&D and hoping to enter the market in the near future. In this
case, being first to the market can be beneficial since it gives additional time
to gain market share.
However, after applying these models there are still relevant aspects not
covered in the analysis. For instance, there is no perspective on the government
and how future regulations and laws could affect the market or limit and change
the structure of the banks. Banks have to follow regulations and laws not only
for legal purposes but also to demonstrate credibility to customers. This means
that if laws change, SHB has to be quick to adjust its business and models
accordingly.
Most importantly, there is currently no information about the end product or
service to be offered in the future. Consequently, no final conclusion can be
drawn regarding the strategy for entering a market. If the product were a mutual
fund whose investment decisions are based on the forecasts from the future
prediction model, then it may be beneficial to be early in order to attract
investors to the fund quickly. This is especially important for mutual funds,
since one of their advantages is that they provide economies of scale (reducing
transaction costs for investors). In this specific case, being the first mover
could be an advantage since it gives more time than competitors have to gain
these customers.75
7.3 Future Work
This thesis could be used in future work on more precise prediction of different
time series, for example to handle different types of data and to build common
knowledge on how to make them stationary. The focus in this research was on
specific financial data, but the spectrum could be widened. Moreover, data on
social aspects could give an even better estimate. For example, social media
data such as Twitter hashtags, or other important factors that may have an
impact on how people behave on the financial markets, could have been used. New
research has shown that social media sentiment may have an impact on the stock
market.76 In other words, compared to 10 years ago the influence of apps and
social media has increased significantly, which should be taken into
consideration when predicting financial outcomes. This raises the question of
how this type of data can be pre-processed before being used as input to a
prediction model. For instance, one question is whether it is possible to
quantify social media data and thereafter use the transformations assessed in
this research to make the data stationary. In that case it is also interesting
to examine for which values of the parameter α the data becomes stationary.
Other macro data could also be investigated further, to see whether data with a
lower frequency behaves differently from our financial data, which has an
approximately daily frequency.
75 Segal, Troy. ”Mutual Fund”, 2019-05-20. https://www.investopedia.com/terms/m/mutualfund.asp (Retrieved 2019-05-23)
76 Chousa, P. Ramon J. ”Influencing of social media over the stock market”. Psychology and Marketing, 2017. Vol. 34(1), pp. 101-108
It could also be useful to investigate approximately how much a financial time
series depends on a trend component and a seasonal component, respectively,
before transforming the series. In this way, it may be possible to identify
whether some transformations, or some specific values of the weighting factor α,
fit certain types of time series. If this suggested research is conducted in the
future, an example of a possible finding could be that one of the
transformations is suitable for making cyclical time series stationary. Such
findings would make it possible to adapt the transformation to the type of
series used as input.
Regarding the results, further research into the differences between the
ADF-test and the KPSS-test would be valuable, since their results are sometimes
contradictory. The most common hypothesis tests have either the same null
hypothesis as one of these tests or the opposite one. Therefore, in this project
it was seen as sufficient to use only two of them. With more time, more
hypothesis tests with other hypotheses that provide further support for
stationarity could be used. Moreover, in the future, more transformations and
more data can be tried to get an even more reliable result.
Our belief as to why the KPSS-test and the ADF-test show different results is
that they are based on different underlying assumptions. Moreover, both
hypothesis tests are based on the additive model for time series. As explained,
the additive model was chosen because the plots indicated it, since the
amplitude of the graphs did not seem to increase by a multiplicative factor. It
was also chosen for its simplicity. To improve this project, other tests of the
relationship between the time series and its amplitude could for example be used
to gain further evidence for selecting the model type. Since not all the data
was plotted from the start, the choice of model is a clear assumption in this
project and should be examined further in the future. It was also discovered
later that one can check whether a time series is additive or multiplicative by
decomposing the data and analysing the residuals obtained, for instance through
Python programs.77 For more accuracy this could be done.
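A rough check of the relationship between the level of a series and its amplitude can be sketched as below. It is a simpler heuristic than the decomposition and residual analysis mentioned above and does not replace it: the series is cut into consecutive windows, and the mean of each window is correlated with its range. The function names, the window length and the two toy series are assumptions made for illustration.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant input has no linear relationship to report
    return cov / (var_a * var_b) ** 0.5


def level_amplitude_correlation(series, window):
    """Correlate the mean and the range (max - min) of consecutive windows."""
    stop = len(series) - window + 1
    chunks = [series[i:i + window] for i in range(0, stop, window)]
    levels = [mean(c) for c in chunks]
    amplitudes = [max(c) - min(c) for c in chunks]
    return pearson(levels, amplitudes)


# Made-up series with period 4: the same swing added to, or scaled by, a trend.
pattern = [1.0, -1.0, 0.5, -0.5]
trend = [10.0 + t for t in range(40)]
additive = [trend[t] + pattern[t % 4] for t in range(40)]
multiplicative = [trend[t] * (1 + 0.1 * pattern[t % 4]) for t in range(40)]

print(level_amplitude_correlation(additive, window=4))
print(level_amplitude_correlation(multiplicative, window=4))
```

A correlation near zero is consistent with an additive model, while a strongly positive correlation suggests that the amplitude grows with the level, as in a multiplicative model.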
Another, more technical improvement would have been to loop through α with
smaller steps, for example $10^{-7}$. As mentioned, this was not done since it
requires a lot of computing capacity and takes too much time. With greater
resources this could have been done to see how sensitive the value of α is to
minor changes. Also, it was an extensive project during a short period, so there
was not enough time for further analysis of the data beyond the subject of
stationarity. With more time, more analysis could be done to find outliers in
the data and, for example, treat them with other diagnostics.
77 ”Is my time series additive or multiplicative?”, 2017. https://www.r-bloggers.com/is-my-time-series-additive-or-multiplicative/ (Retrieved 2019-05-06)
7.4 Benefits for SHB and its Stakeholders
SHB has many stakeholders affecting its business, internally as well as
externally. The main interest groups are the employees, customers, society,
owners and investors. When proceeding with a project it is important to identify
these and decide how to handle their different interests. This project was
limited to the pre-processing of the initial data, which means that for this
part there is no direct link to the different stakeholders, since the machine
learning model has not yet been built and the work is kept internally within the
project group. However, as mentioned, the broader hope for this project is that
SHB will be able to obtain a prediction model for, for example, different stocks
and indices. This would have a significant impact on investors, customers and
society in general. With such a machine-learning-based method, wiser investment
decisions and recommendations could be provided. This would increase not only
the profit of SHB but also the value of its brand, which has a clear link to
investors, customers and hopefully employees as stakeholders. SHB is also
outspoken about its will to contribute to the sustainable development goals
(SDGs). Since society and its people are also a vital stakeholder, the potential
value the project could add to society should be discussed through its impact on
the SDGs.78
One of the SDGs in focus is to promote sustained economic growth, higher levels
of productivity and technological innovation. In the long term this will lead to
better conditions for jobs and encourage entrepreneurship. This project could
contribute to this through a more precise focus on green investment
opportunities or infrastructure stocks when machine learning is used to predict
financial outcomes. If data of this kind is used for the model, this would then
also contribute to sustainable cities and communities as well as climate action,
two more very important development goals. Therefore, stakeholders such as
society in general can also be affected in a positive way by this project,
depending on the model developed. To summarize, we consider this to be not only
a financially beneficial project but also a project that could help SHB, its
brand and society overall, depending on the future model development, even
though this bachelor thesis covers only the data-handling part of the process.
78 Handelsbanken. ”Sustainability Report”, 2017. https://www.industrivarden.se/globalassets/innehavsbolagen/hallbarhetsred_2017_eng.pdf (Retrieved 2019-05-06)
7.5 Final Words
We are very thankful for the opportunity to participate in a project created and
directed by Handelsbanken. We have learned a lot about the process of working
within a specialized project and how to utilize the tools provided. It was a
learning experience to handle all the resources in the most effective way, and
we hope that this project will contribute to new research or applications.
We received support throughout the project both from the school and from
Handelsbanken which we value greatly. The project has resulted in a deep
understanding of time series analysis and treatments for stationarity.
References
[1] ADF — Augmented Dickey Fuller Test https://www.statisticshowto.
datasciencecentral.com/adf-augmented-dickey-fuller-test/ Retrieved
2019-03-15
[2] Adhikari, Ratnadip et al. An Introductory Study on Time Series Modeling and
Forecasting p.16-19
https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf
Retrieved 2019-02-19
[3] Autocorrelation function
http://www.real-statistics.com/time-series-analysis/
stochastic-processes/autocorrelation-function/ Retrieved 2019-04-03
[4] BlackRock. ETF Landscape: Industry Highlights, 2012-04-12.
https://www.fondsprofessionell.at/upload/attach/1336476343.pdf
Retrieved 2019-05-02
[5] Breaking Down Finance. EXPONENTIALLY MOVING AVERAGE
VOLATILITY (EWMA) https://breakingdownfinance.com/
finance-topics/risk-management/ewma/ Retrieved 2019-05-03
[6] Brownlee, Jason. How to Check if Time Series Data is Stationary with Python
2016-12-30
https://machinelearningmastery.com/time-series-data-stationary-python/
Retrieved 2019-03-09
[7] Brownlee, Jason. Time Series Forecasting as Supervised Learning
2015-12-05
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
Retrieved 2019-02-02
[8] Brownlee, Jason. How to Identify and Remove Seasonality from Time
Series Data with Python 2016-12-23 https://machinelearningmastery.
com/time-series-seasonality-with-python/ Retrieved 2019-04-14
[9] Brownlee, Jason. What is Time Series Forecasting? Machine Learning Mastery,
2016-12-02
https://machinelearningmastery.com/time-series-forecasting/?fbclid=IwAR1Zpv80x-4EEN-IIo-h1HL5fGHF6fD-OZYpknScLWdmU-p3uJ8_03ZF9Ag
Retrieved 2019-05-01
[10] Cappuccio, Nunzio et al.
The Fragility of the KPSS Stationarity Test 2009. http://leonardo3.dse.
univr.it/home/workingpapers/fragility_kpss.pdf?fbclid=
IwAR0snLcQCpmgyNCMq0eR9JgXXwFW3hnIZykKcv72IbZO7t57goM9d1W4xGI
Retrieved 2019-04-30
[11] Investopedia. CBOE Volatility Index (VIX) Definition
https://www.investopedia.com/terms/v/vix.asp Retrieved 2019-05-03
[12] Collecting Data
http://betterthesis.dk/research-methods/
lesson-1different-approaches-to-research/collecting-data Retrieved
2019-02-09
[13] CORREL function https://support.office.com/en-gb/article/
correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92 Retrieved
2019-04-03
[14] Chousa, P. Ramon J. Influencing of social media over the stock market
Psychology and Marketing, 2017. Vol.34(1), pp.101-108
[15] Deshpande, Bala. Time series forecasting: understanding trend
and seasonality, 2014-03-12 http://www.simafore.com/blog/bid/205420/
Time-series-forecasting-understanding-trend-and-seasonality
Retrieved 2019-05-01
[16] EURO STOXX® 50 VOLATILITY (VSTOXX) INDEX, 2019-03-29. https:
//www.stoxx.com/document/Bookmarks/CurrentFactsheets/V2T.pdf
Retrieved 2019-05-03
[17] Exponentially Weighted Moving Average
https://www.value-at-risk.net/exponentially-weighted-moving-average-ewma
Retrieved 2019-03-02
[18] Generalized Additative Models, 2017-07-06.
https://machinelearningmastery.com/
time-series-data-stationary-python/ Retrieved 2019-05-06
[19] Hayes, Adam. Correlation, 2019-04-30. https://www.investopedia.com/
terms/c/correlation.asp Retrieved 2019-05-01
[20] Handelsbanken. Sustainability Report, 2017.
https://www.industrivarden.se/globalassets/innehavsbolagen/
hallbarhetsred_2017_eng.pdf Retrieved 2019-05-06
[21] Hypotesprövning http://gauss.stat.su.se/gu/sg/2012VT/Kompendium/
KAP17new.pdf Retrieved 2019-05-03
[22] Investopedia Currency Pair Definition https://www.investopedia.com/
terms/c/currencypair.asp Retrieved 2019-05-04
[23] Investopedia CBOE Volatility Index (VIX) Definition
https://www.investopedia.com/terms/v/vix.asp Retrieved 2019-05-03
[24] Investment & Finance. USGG10YR, 2014-02-07.
https://www.investment-and-finance.net/finance/u/usgg10yr.html
Retrieved 2019-05-02
[25] Jinka, Preetam. Exponential Smoothing for Time Series Forecasting 2017-
06-22.
https://www.vividcortex.com/blog/
exponential-smoothing-for-time-series-forecasting?fbclid=
IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q
Retrieved 2019-02-18
[26] Journal of Econometrics. Testing the null hypothesis of stationarity
against the alternative of a unit root 1991. http://debis.deu.edu.tr/
userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=
IwAR3uwIVD3WTB1T865Kv3ZotZ3iBaM9nEuq44dIpRr1ULvrVTgvHefVQqwG8
Retrieved 2019-04-30
[27] Journal of International Studies. Seasonal patterns in oil prices and their
implications for investors, 2018.
[28] Kang, Eugine. Time Series, Check Stationarity 2018-08-26
https://medium.com/@kangeugine/
time-series-check-stationarity-1bee9085da05 Retrieved 2019-02-23
[29] Kenton, Will. Autocorrelation, 2019-03-31. https://www.investopedia.
com/terms/a/autocorrelation.asp Retrieved 2019-04-03
[30] KURT function
https://support.office.com/en-us/article/kurt-function-bc3a265c-5da4-4dcb-b7fd-c237789095ab
Retrieved 2019-04-03
[31] Kulahci, Murat et al. Time Series Analysis and Forecasting by Example,
2011 p.90
[32] Kuepper, Justin. Volatility Definition, 2019-04-18.
https://www.investopedia.com/terms/v/volatility.asp Retrieved 2019-05-01
[33] Lindgren, George. Stationary stochastic processes p.13-16
http://www.math.chalmers.se/~rootzen/fintid/stationary120312.pdf
Retrieved 2019-02-02
[34] Log-space Exponential Moving Average, 2017-11-22.
https://www.tradingview.com/script/
cyfV1gLU-Log-space-Moving-Average/ Retrieved 2019-05-07
[35] MSCI. MSCI US Index, 2019-04-30.
https://www.msci.com/documents/10199/67a768a1-71d0-4bd0-8d7e-f7b53e8d0d9f
Retrieved 2019-05-03
[36] MSCI. MSCI USA MATERIALS INDEX, 2019-04-30.
https://www.msci.com/documents/10199/6ce4617e-9127-480f-8f3b-1fdf4c0c8962
Retrieved 2019-05-03
[37] MSCI. Index solutions, https://www.msci.com/index-solutions Retrieved
2019-05-18
[38] Nabeya, Seiji et al. Asymptotic Theory of a Test for the Constancy of
Regression Coefficients Against the RandomWalk alternative 1987.
https://projecteuclid.org/download/pdf_1/euclid.aos/1176350701?
fbclid=IwAR2Rt2XpMITe_
xA880DiEC4qzo8VEjzmA7HjMKNyp3mKSoKSAXhOaYFf85c Retrieved 2019-04-30
[39] Palaniappan, Vivek. Using Machine Learning to Predict Stock Prices 2018-
10-31
https://medium.com/analytics-vidhya/
using-machine-learning-to-predict-stock-prices-c4d0b23b029a
Retrieved 2019-02-02
[40] Pantelis, Anastasios. Testing for unit roots in the presence of structural
changes 2008 http://lup.lub.lu.se/luur/download?func=downloadFile&
recordOId=1338330&fileOId=1646631 Retrieved 2019-03-09
[41] Newton, Paul and Bristoll, Helen. Porter's Five Forces p. 20-25
[42] Ragnarstrom, Elsa. How to calculate forecast accuracy for stocked items
with a lumpy demands, 2015. https://www.diva-portal.org/smash/get/
diva2:901177/FULLTEXT01.pdf Retrieved 2019-05-03
[43] Ryabko, Daniil. Asymptotic Nonparametric Statistical Analysis of
Stationary Time Series 2019-03-30
https://arxiv.org/abs/1904.00173 Retrieved 2019-05-01
[44] Sample Mean https://www.statisticshowto.datasciencecentral.com/
sample-mean/ Retrieved 2019-04-03
[45] Sample Variance: Simple Definition, How to Find
it in Easy Steps https://www.statisticshowto.datasciencecentral.com/
probability-and-statistics/descriptive-statistics/
sample-variance/ Retrieved 2019-04-03
[46] Sarlin, Peter and Björk, Kaj-Mikael.
Machine learning in finance. Neurocomputing. Vol. 264, 2017: 1-88
[47] Schilling, Melissa. Strategic Management of Technological Innovation 5th
edition 2017 p.93-97
[48] SKEW function https://support.office.com/en-ie/article/
skew-function-bdf49d86-b1ef-4804-a046-28eaea69c9fa
Retrieved 2019-04-03
[49] Stationarity and Differencing
https://www.statisticshowto.datasciencecentral.com/stationarity/
Retrieved 2019-03-02
[50] statsmodels.tsa.stattools.kpss https://www.statsmodels.org/dev/
generated/statsmodels.tsa.stattools.kpss.html Retrieved 2019-05-04
[51] Segal, Troy. Mutual Fund, 2019-05-20. https://www.investopedia.com/
terms/m/mutualfund.asp Retrieved 2019-05-23
[52] Tests of Stationarity
https://people.maths.bris.ac.uk/~magpn/Research/LSTS/TOS.html
Retrieved 2019-02-12
[53] The Augmented Dickey-Fuller Test https://www.thoughtco.com/
the-augmented-dickey-fuller-test-1145985 Retrieved 2019-02-27
[54] Time Series
http://www.businessdictionary.com/definition/time-series.html
Retrieved 2019-01-30
[55] Time Series Analysis: Building a Model on Non-stationary
Time Series, 2018-01-30. https://datascienceplus.com/
time-series-analysis-building-a-model-on-non-stationary-time-series/
Retrieved 2019-03-23
[56] UFX. Trading the Dow Jones industrial average, 2018-01-30. https://
www.ufx.com/en-gb/assets/indices/dow-jones/ Retrieved 2019-05-10
[57] Verbeek, Marno. A Guide to Modern Econometrics, 2014, 2nd Edition,
p.265-268
[58] 6 Factors that Influence Exchange Rates
https:
//www.investopedia.com/trading/factors-influence-exchange-rates/
Retrieved 2019-05-01