Forecasting Daily and Monthly Exchange Rates With Machine Learning Techniques

1

Forecasting daily and monthly exchange rates with machine learning

techniques

Vasilios Plakandaras1,

Theophilos Papadimitriou*, Periklis Gogas

+

Department of Economics

Democritus University of Thrace.

Abstract

We combine signal processing to machine learning methodologies by introducing a

hybrid Ensemble Empirical Mode Decomposition (EEMD), Multivariate Adaptive

Regression Splines (MARS) and Support Vector Regression (SVR) model in order to

forecast the monthly and daily Euro (EUR)/United States Dollar (USD), USD/Japanese

Yen (JPY), Australian Dollar (AUD)/Norwegian Krone (NOK), New Zealand Dollar

(NZD)/Brazilian Real (BRL) and South African Rand (ZAR)/Philippine Peso (PHP)

exchange rates. After the decomposition with EEMD of the original exchange rate series

into a smoothed and a fluctuation component, MARS selects the most informative input

datasets from the plethora of variables included in our initial data set. The selected

variables are fed into two distinctive SVR models for forecasting each component

separately one period ahead for daily and monthly data. The summation of the two

forecasted components provides exchange rate forecasts. The above implementation

exhibits superior forecasting ability in exchange rate forecasting compared to various

models. Overall the proposed model a) is a combination of empirically proven effective

techniques in forecasting time series, b) is data driven, c) relies on minimum initial

assumptions and d) provides a structural aspect of the forecasting problem.

JEL Code: G15

Key words: Exchange rate forecasting, Support Vector Regression, Multivariate

Adaptive Regression Splines, variable selection, Ensemble Empirical Mode

Decomposition, time series forecasting.

1 [email protected] * [email protected], corresponding author + [email protected]

mailto:[email protected]

2

1. Introduction

Following the breakdown of the Bretton Woods fixed exchange rate system, a large

number of models were proposed in order to outperform the random walk model in

exchange rate forecasting. Forecasting the evolution of exchange rates apart from

influencing the way an investor chooses to allocate his wealth into different portfolios, is

also a key determinant in macroeconomic variable forecasting, since future

macroeconomic information is projected on exchange rate values and vice versa (Rime,

Sarno and Sojli, 2010).

There is a vast literature regarding exchange rate forecasting. Overall, we can separate

existing research initiatives into two general categories: models examining the influence

of macroeconomic variables on the exchange rate markets and those that build on a

micro-structural market approach. The first category includes the so-called monetary

exchange rate models which attempt to establish a link between the evolution of

exchange rates and the variability of fundamentals. To mention just a few, one of the

first attempts to examine exchange rate evolution is the classic model of Mundell (1968)

and Fleming (1962) and its extension, i.e. the sticky price monetary model proposed by

Dornbuch (1976). On a similar perspective, the flexible price monetary model was for

many years the workhorse of exchange rate economics (Bilson; 1978, Stockman, 1980;

Lucas, 1982) extended by Frankel (1979) to its real interest rate differential variant.1

A large part of the critique against the ability of fundamentals to forecast exchange rates

stems from the work of Messe and Rogoff (1983). In their seminal paper they test

various monetary exchange rate models against a random walk (RW) with a drift model

in out-of-sample forecasting. They find that regardless of the version of the monetary

model used, none outperforms the RW in terms of Mean Square Error. This suggestion

has been for many years the standard benchmark for exchange rate forecasting models.

Mark and Sul (2001) using panel regression models reject the univariate framework of

Messe and Rogoff (1983) concluding that fundamentals possess significant forecasting

1 A complete reference of all monetary exchange rate literature is beyond the scope of this paper. An interest reader is referenced to Ronald Macdonald (2010).

3

ability on cross sectional data examination. On a broader perspective, Cheung et al

(2005) argue against the ability of the widely used Mean Square Error metric for

evaluating the forecasting power of a structural model. Studying a vast variety of

monetary exchange rate models, they find that the forecasting ability of each model

depends on the time period considered for evaluation. On a more promising path,

Molodtsova and Papell (2009) report superior out-of-sample predictability on short

forecasting horizons for monetary structural models compared to a RW for a sample of

eleven currencies. Overall, the examination of the existing literature leads to the

conclusion that fundamentals have not yet established their value in exchange rate

forecasting since their main drawbacks have not been convincingly overturned.

Under a different perspective, the microstructure proposition of the exchange rate

market focuses on extremely short term forecasting (from daily to tick-by-tick data) and

incorporates a broad category of input variables. Evans and Lyons (2002, 2005) find that

institutional aspects of the foreign exchange market such as order flow volume (net

value of long and short positions) can be of significant importance. As Lyons (2001)

states it, unlike macroeconomic expectations that are based on surveys, order flow is a

more realistic measurement of market sentiment since it represents one’s direct

willingness to back his beliefs up with money. Nevertheless, Sager and Taylor (2008)

examine the forecasting ability of various currencies and forecasting horizons and reject

the superiority of order flow models against the RW. Killeen et al (2006) argue that the

predictive information content in order flow models decays rapidly over time and thus

their forecasting ability is time limited since exchange rates revert back to a RW model,

implying that the exchange rate forecasting puzzle is far from being considered fully

tackled.

On similar strands, Karemera and Kim (2006) develop Auto Regressive Integrated

Moving Average (ARIMA) models that outperform random walk models for a number

of currencies on a monthly forecasting horizon but the forecasting ability is strongly tied

to the time period under evaluation. Cheung (1993) detects long-run memory in

exchange rates and proposes the use of Auto Regressive Fractionally Integrated Moving

Average, but with limited success. On the other hand, Generalized Auto Regressive

Conditional Heteroskedasticity (GARCH) models are a popular selection among

practitioners for volatility modeling, although lacking clear cut validation of their

4

forecasting ability. Bollerslev and Wright (2001) compare GARCH and Exponential

GARCH models to autoregressive models on high frequency data, concluding that

GARCH models possess forecasting power but tend to forecast worse than AR models

on high frequency series such as daily or intraday data. Later, Galbraith and Kisinbay

(2005) reach to similar findings with the above for Deutche Mark/USD and Japanese

Yen (JPY)/USD exchange rates. Projections based on past realized variance with

autoregressive models provide better one step ahead forecasts than GARCH and

Fractionally Integrated GARCH models for a period of up to 30 trading days.

Apart from econometricians, the intense economic interest for exchange rate forecasting

has attracted researchers from a wide variety of scientific areas. Machine Learning

methodologies and more specifically Neural Networks (NN) (Semprinis et al, 2013) and

Support Vector Machines (SVM) gained significant merit based on their ability to model

non-linear systems with minimum initial assumptions and high forecasting accuracy.

Dunis and Williams (2002) compare alternative models: NN, Random Walk, Auto

Regressive Conditional Heteroskedasticity, Moving Average Convergence/Divergence,

Auto Regressive Moving Average and a Logit model on the EUR/USD exchange rate.

The sample spans May 2000 to July 2001 and use a daily forecasting horizon. They

verify the statistical and economic superiority of the NN based scheme over the

alternative models. Brandl et al., in their 2009 paper, use Genetic Algorithms for feature

selection and set their model using the SVR methodology. The initial dataset includes

variables suggested by the Purchasing Power Parity, the Covered Interest Rate Parity,

and the Uncovered Interest Rate Parity. The proposed model outperforms an NN, an

OLS regression and an ARIMA model on monthly out-of-sample forecasting for

EUR/USD, USD/JPY and USD/GBP rates. On a similar research framework, Ince and

Trafalis (2006) couple ARIMA and SVM for improving directional forecasting of the

EUR/USD exchange rate and outperforming Logit/Probit models.

The use of the RW model as a forecasting benchmark is also a contemporaneous testing

of the Efficient Market Hypothesis (EMH). Proposed by Eugene Fama (1965), states

that the determination of prices in an efficient market follows a random walk and thus it

is impossible to create a forecasting model that achieves sustainable positive returns on

the long-run. The EMH is usually presented in three forms; the weak, the semi-strong

and the strong form of efficiency. We have a weak-form efficient market when historic

5

prices of the variable in question cannot forecast the future ones, as the generating

mechanism of prices follows a random walk. Thus, autoregressive models have no

forecasting power and the best forecast about next period’s price is today’s price. Semi-

strong efficiency imposes more strict assumptions in that all historic prices and all

publicly available information is already reflected in current asset prices and thus they

cannot be used successfully in forecasting. Finally, the strong EMH builds on the semi-

strong case adding all private information and thus making impossible to forecast

successfully the future evolution of an asset’s price. Overall, outperforming the RW

model is an indication of potential economic gains for a trader that follows an alternative

trading strategy.

In this paper we propose a hybrid model that combines signal processing to Machine

Learning methodologies. Following existing literature we compile an initial dataset

containing a plethora of variables. Then we suppress noise by decomposing all series

with EEMD into a smoothed and a fluctuation component. MARS selects the most

informative input dataset for each component of the forecasted exchange rate. On a final

stage, two SVR models are trained for forecasting each component of the exchange rate

separately. The additive series of the two forecasted component series is evaluated for

out-of-sample forecasting.

We choose one day ahead forecasting as suggested by Rime, Sarno and Sojli (2010)

because: (i) one-day ahead forecasts are implementable, (ii) it is a relevant horizon for

practitioners (e.g. most currency hedge funds), (iii) unlike intraday forecasts it involves

interest rate considerations, and (iv) it is unlikely that gradual learning based on this data

will allow forecasting at much longer horizons. Moreover, one month ahead forecasting

is evaluated for comparison to a longer horizon. We consider our method for the Euro

(EUR)/United States Dollar (USD), USD/Japanese Yen (JPY), Australian Dollar

(AUD)/Norwegian Krone (NOK), New Zealand Dollar (NZD)/Brazilian Real (BRL) and

South African Rand (ZAR)/Philippine Peso (PHP) exchange rates. According to the

triennial survey of BIS (2010) USD, EUR, JPY, AUD and NOK belong to the ten most

6

traded currencies in the foreign exchange rate market2, while the latter three exhibit very

small trading volumes. Since the trading volume of a currency reflects the economic

interest of traders, we select USD, EUR and JPN as high, NOK, AUD and NZD as

medium and BRL, PHP, ZAR as currencies with small trading volumes respectively. In

this way we evaluate the forecasting power of our proposed model by examining

exchange rates of different volumes and thus we expect that different trading interest

leads to different versions of market efficiency.

Following the empirical findings of West et al (1993), Cheung et al (2005) and Rime et

al (2010) statistical evaluation itself cannot guarantee tangible economic gains for a

trader. Therefore we extend our research framework not only to statistical metrics as

MSE, but also conduct mean-variance analysis of returns for a dynamic trading strategy

based on the forecasts provided by our model. We calculate the Sharpe ratio (SR) of a

trading strategy following forecasts of the proposed model, since SR is a common

performance measure among practitioners.

2 USD:42,45%, EUR: 19,05%, JPY:9,5%, NZD:0,9%, NOK: 0,65%, BRL: 0,35%, PHP: 0,1% and ZAR: 0,35% of the total trading volume.

7

2. Methodology – Dataset

2.1 Methodology overview

As reported by Ronald Macdonald (2010), the volatility of exchange rates time series is

significantly higher than the volatility of other macroeconomic variables, with the

exception of interest rate differentials. The above phenomenon can be attributed to

transaction costs (which shift the original price of the series away from its original value)

and to the difficulty of observing the exact price of the exchange rate, since the daily

exchange rate is the approximation of a large number of tick-by-tick data. Smoothing

techniques can provide an alternative representation of the exchange rate series closer to

the original underlying phenomenon, reducing noise on data set.

In the voluminous literature regarding time series smoothing, several methodologies are

proposed3. A key issue in all smoothing methodologies is the definition of the optimal

point, beyond which smoothing results to distortion rather than noise reduction.

Apparently, since the time series of the actual phenomenon is always unknown and we

can only observe its noise additive variant, we can only assume its actual representation

through the use of a smoothing function. Unfortunately, many smoothing

implementations select ad hoc the smoothness parameters of the model, lacking

empirical evidence.. To overcome this drawback, our implementation exploits the

unique abilities of a relatively novel signal decomposition method called Ensemble

Empirical Mode Decomposition (EEMD) coupled with the criteria proposed by Guo and

Tse (2013) and the trend extraction method of Moghtaderi et al (2013). The main

advantage of this smoothing framework is the absence of initial assumptions about data

and the tuning of all model parameters based solely on data characteristics.

Another difficulty in training a forecasting model is the proper selection of input

variables. A large part of existing literature selects arbitrary input dataset of the

forecasting model, based on the propositions of a theoretical framework such as the

sticky or the flexible price model aforementioned. Instead of “hand-picking and then

3 For a detailed survey on the field of trend extraction and smoothing on time series the interested reader is referred to Alexandrov et al (2009).

8

searching for the empirical confirmation of our decision, we implement Multivariate

Adaptive Regression Splines (MARS) for automatic variable selection based on

statistical loss metrics. After decomposing all initial time series into a smoothed and a

residual fluctuating component (from each series we get its two components), we train

two MARS models for selecting the input variables of every exchange rate; one MARS

model for the smoothed and one for the fluctuating component. The selected variables

are then fed into two Support Vector Regression (SVR) models for forecasting each

component separately. Finally, the summation of the two forecasted component series is

evaluated for out-of-sample forecasting. An overview of the proposed model is depicted

in Figure 1.

Figure 1: Overview of the proposed EEMD-MARS-SVR model.

An innovation of this paper in comparison to other noise reduction methodologies

(Premanode and Toumazou, 2013) is that we model the fluctuating part of the initial

time series separately from its smoothed variant, but taking into account forecasts of

both parts in order to avoid information loss. In other words, since the observed value of

the exchange rate in the exchange rate market is always a noise additive version of the

underlying phenomenon, we study the observed value into a segregated smoothed part

and a noise component (fluctuating part). Thus we can produce models that are better

fitted to the characteristics of each series. Then, we restructure a forecasted proxy of the

9

actual observed value from both component forecasts so as to approximate the actual

observed exchange rate value, which always includes noise.

2.2 Ensemble Empirical Mode Decomposition and time series smoothing

The EEMD is a data driven algorithm that decomposes a time series into finite additive

oscillatory components called Intrinsic Mode Functions (IMFs). Proposed by Wu and

Huang (2009), the main advantage of EEMD is the lack of initial assumptions about data,

such as the need for linearity or stationarity. The decomposition into IMFs is achieved

through an iterative scheme, until stopping criteria are met and the residual has no local

maxima. The generic EEMD procedure is described on the Appendix and an example of

a decomposition applied on daily EUR/USD exchange rate is depicted in figure 2.

Figure 2: Decomposition of the original (1

st row) daily EUR/USD exchange rate into 10

series. The last row (straight line) is the residue of the EEMD process.

The main difference of EEMD in comparison to simple Empirical Mode Decomposition

(EMD) is the addition of white Gaussian noise to the original series before

decomposition. Iteratively white noise is used and an ensemble mean of all IMFs is

10

taken as the final result (i.e. the mean of the 1st IMF of all decompositions is the final 1

st

IMF and so on). The ensemble of the decomposed IMF is relieved from the initially

added noise4.

The amplitude of the Gaussian noise variant added is chosen according to the

maximized relative RMSE criterion proposed by Guo and Tse (2013):

√∑ ( ( ) ( ))

∑ ( )

( )

where n is the number of observations, the original series and ( ) the IMF with

the maximum correlation with the initial exchange rate series.

The proposed Relative RMSE is the ratio between the RMSE of the original series and a

selected IMF. The IMF with the highest correlation is expected to preserve the main

elements of the exchange rate under decomposition, relieved from the added white noise.

So, this specific IMF is selected for evaluating the decomposition performance for

different Gaussian noise amplitudes.

Following Moghtaderi et al (2013) we compute a smoothed version of the initial time

series as the sum of the last few IMFs and the residual of the EEMD decomposition.

With mathematical notation, the above smoothing function is expressed with:

∑

( )

where is the smoothed variant of the initial series for index , L the total number of

IMFs, r the final EEMD residual and the optimum index of the IMFs to be summed.

So, smoothing depends on selecting the optimum index , for constructing the

smoothed series.

From the decomposition of the original exchange rate series in Figure 2, we notice that

the frequency of every IMF drops as we move from old to new IMFs, until no further

4 For more details the interested reader is referenced to Wu et al (2009).

11

decomposition is possible. Rilling et al (2005) examine fractional Gaussian noise

functions and find that apart from frequency, the averaged energy of every IMF

decreases as the index order increases. In general, the mathematical notation of a

signal’s energy is given by:

∑| |

( )

where denotes time observations of the i-th IMF, is the energy of the i-th IMF

and L is the total number of IMFs. Then, the averaged energy of an IMF would be:

∑

(4)

Moghtaderi et al (2013) expand the energy drop assumption to broadband functions

showing that is a decreasing series of i. Nevertheless, when we depart from fractional

Gaussian noise to general form functions, the energy of certain IMFs increase instead of

decreasing. Exploiting this empirical observation the authors argue that the optimum

index of the smoothing function lies with the smallest index, where the average

energy rises for the first time.

For instance, in Figure 2, after the decomposition of the daily time series into 10 IMFs

and the residual, energy rises on the 4th and 6

th IMF. So by summing up 4th to 10th IMF

plus the residual, we obtain the smoothed function representation of the red curve in

Figure 3, while figure 4 depicts implementation on monthly data.

12

Figure 3: Daily EUR/USD exchange rate and smoothed series.

Figure 4: Monthly EUR/USD exchange rate and smoothed series.

2.3 Support Vector Regression (SVR)

The Support Vector Regression is a direct extension of the classic Support Vector

Machine model. The algorithm proposed by Vladimir Vapnik et al (1992), originates

from the field of statistical learning. When it comes to regression, the basic idea is to

13

find a linear function that has at most a predetermined deviation from the actual values

of the initial series. In other words we do not care about the error of each forecast as

long as it doesn’t violate a threshold, but we will not tolerate a higher deviation. The

Support Vector (SV) set which bounds this “error-tolerance band” is located in the

dataset through a minimization procedure.

One of the main advantage of SVR in comparison to other machine learning techniques

is the ability to identify global minima avoiding local ones, thus reaching an optimal

solution. This aspect is crucial to the generalization ability of any model in producing

accurate and reliable forecasts. The model is built in two steps: the training step and the

testing step. In the training step, the largest part of the dataset is used for the estimation

of the function (i.e. the detection of the Support Vectors that define the band); in the

testing step, the generalization ability of the model is evaluated by checking the model’s

performance in the small subset that was left aside in the first step performing out-of-

sample forecasting.

Using mathematical notation we start from a training data set

( ) ( ) ( ) , where are

observation samples and is the dependent variable (the target of the regression system

that we need to approximate). The method’s scope is to minimize the loss function

(

‖ ‖ ) subject to| ( )| , thus we enforce an upper

deviation threshold creating an “error-tolerance” band.

Expanding this initial framework, Vapnik and Cortes (1995) propose a soft margin

model, i.e. they allowed the existence of data points outside the error tolerance zone,

though they penalized them proportionally to the distance from the edge of the zone. So,

they introduce slack variables to the loss function and controlled through a cost

parameter C, resulting in the loss function (

‖ ‖ ∑ (

) ).

In this way the primal problem that we wish to solve is:

14

‖ ‖ ∑(

) ∑(

) ∑ ( )

∑ (

)

( )

where

are the Lagrange multipliers from the Lagrangian primal function (5).

Instead of solving (3) we attack the dual form of the problem, which takes the form:

(

∑ (

) (

) ∑ (

) ∑ (

) ) (6)

subject to ∑ ( )

and

The solution of the primal problem (5) is

∑ ( )

( )

and ∑ ( )

( )

Naturally, not all phenomena can be described by strictly linear functions. To overcome

such a drawback, we map initial data into a higher dimensional space were such a linear

function exists (Figure 5). This mapping is achieved through the use of a kernel function,

also known as “kernel trick”, since the computation of such a projection for a

multivariate problem would be impossible. Then the linear solution of the procuring

univariate equation is re-projected back to its original dimensional space. In this way,

with the selection of both linear or non-linear kernels SVR can also approximate non-

linear phenomena.

15

Figure 5: Upper and lower threshold on error tolerance indicated with letter ε. The

boundaries of the error tolerance band are defined by Support Vectors (SVs). On the

right we see the projection form 2 to 3 dimensions space and the projected error

tolerance band. Forecasted values greater than ε get a penalty ζ according to their

distance from the tolerance accepted band (source: Scholckopf and Smola, 1998).

The described mapping is performed on four kernels: the linear, the radial basis function

(RBF), the sigmoid and the polynomial. The mathematical representation of each kernel

is:

Linear ( ) (9)

RBF ( ) ‖ ‖ (10)

Polynomial ( ) ( ) (11)

Sigmoid(MLP) ( ) ( ) (12)

with factors d, r, γ representing kernel parameters.

For the construction of the SVR model we used LIBSVM, an SVM model computation

software package developed by Chang and Lin (2011).

SV

SV

SV

16

2.4 Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines (MARS) proposed by Freidman (1991), is a

non-parametric form of piece-wise regression. In global parametric methods such as

linear regression, the relationship between a depended variable and a set of explanatory

variables is described using a global parametric function fitted universally to all data set.

In a non-parametric approach the available data are separated into sub-regions and a

model is locally fitted to each sub-region. The points separating the regions are called

knots. The position and number of the knots is determined iteratively based on a lack-of-

fit criterion.

Starting from a training dataset ( ) ( ) ( )

, where are input variables and is the dependent variable vector (the

target of the regression system that we need to approximate), MARS builds a model of

the form

( ) ∑ ( ) (13)

where is a constant, are local regression model coefficients of each sub-region, m

the number of the sub-regions of the data and ( ) are spline basis functions of the

form:

( ) ( ) (14)

where is the corresponding knot of the sub-region i (a constant number) and the

data sample of the sub-region i.

MARS models are developed through a two stage forward/backward stepwise regression

procedure. In the forward stage, the entire D is split arbitrarily into overlapping sub-

regions and model parameters are selected by minimizing a lack-of-fit criterion. On the

backward stage, basis functions (variables) and knots that no longer contribute to the

accuracy of the fit are removed. The lack-of-fit criterion is a modified version of the

generalized cross validation criterion (MGCV) expressed as:

∑ ( )

( ( ) ) ⁄ ⁄ (15)

17

where C(M) is the number of parameters being fit, n is the total number of observations

and d is a penalty factor. A representation of a MARS model compared to a parametric

linear regression model is depicted in Figure 6.

Figure 6: A MARS and a parametric linear regression model representation.

The basic advantages of MARS are its data driven regression procedure, the lack of

initial assumptions about data and the detection of an optimum input variable set

through a lack-of-fit criterion. For the development of MARS models we used the

software package ARESLab developed by Jekabsons (2011).

2.5 Dataset

As already stated, the scope of this paper is to create models that produce one period

ahead forecasts for 5 nominal exchange rates, using monthly and daily data. The data

span the period from 1/1/1999 to 30/10/2011, not including weekends and holidays. We

use as explanatory variables data of 7 exchange rates, closing prices of major stock

indices in the U.S. and the Euro zone, spot prices of 10 precious and non-precious

metals, 18 commodities including crude oil and moving averages of 3,5,10 and 30 days

of the EUR/USD exchange rate (since it is the exchange rate with the highest volume on

the exchange rate market). We also include trade weighted USD indices compiled by the

Federal Reserve Bank of Saint Louis (FRED), EURIBOR rates of various maturities and

interest rate spreads between commercial paper, the federal funds and the effective

federal funds rate. Finally, we include basic macroeconomic variables from the US, EU,

Japan, UK, Australia, Norway, South Africa, Brazil and New Zealand.

18

Table 1: Input Variables Commodities

Crude Oil

Cotton

Lumber

Cocoa

Coffee

Orange Juice

Sugar

Corn

Wheat

Oats

Rough Rice

Soybean Meal

Soybean Oil

Soybeans

Feeder Cattle

Lean Hogs

Live Cattle

Pork Bellies

Iron Ore

Crude Oil

Iron Ore

Cocoa

Cotton

Lumber

Metals

Gold

Copper

Pallladium

Platinum

Silver

Aluminium

Zinc

Nickel

Lead

Tin

Stock Indices

Dow Jones

Nasdaq 100

S & P 500

DAX

CAC 40

FTSE100

USD Trade Weighted

Indices Major partners

Broad Index

Other Partners

Interest rates

T-bill 6 months

T-bill 10 years

Spread MLP-EURIBOR 3M

Spread MLP-Eonia

Spread FF-CP

Spread FF-EFF

EONIA

EURIBOR 1 Week

ECB Interest rate

EURIBOR 1 Month

FED rate

Macroeconomic

Japan/New Zealand/South

Africa/ Brasil/ Australia/ UK

Variables

Cunsumer Price Index

Productivity index

Gross Domestic Product

Trade Balance

Unemployment rate

Exchange Rates

JPY/USD

USD/GBP

BRL/NZD

NOK/AUD

PHP/ZAR

EUR/GBP

EYR/USD

Technical Analysis

variables Relative Strengh

Index

MA3 EUR/USD

MA5 EUR/USD

MA10 EUR/USD

MA30 EUR/USD

Macroeconomic

EU Variables

Unemployment rate

Productivity

Consumer Price

Index

Current accounts

Debt

Deficit

M3

GDP

Macroeconomic

US variables

Consumer Price

Index

Debt

M3

trade balance

US deficit or

Surplus

GDP

Productivity

Unemployment rate

19

3. Empirical Results

3.1.1 Statistical Evaluation

A crucial step in measuring the generalization ability of a model is its out-of-sample

forecasting performance. Thus, in order to test the generalization ability of the selected

model, the dataset is split in two parts; the train subset and the test subset. The train/test

set ratio chosen both for monthly and daily data is 80/20. Specifically, we leave aside 30

monthly observations (6/2009-10/2011) and 648 daily observations (2/4/2009-

30/10/2011) for out-of-sample forecasting. As forecast evaluation metrics we use the

Root Mean Square Error (RMSE), the Root Mean Square Percentage Error (RMSPE)

and the Mean Absolute Percentage Error (MAPE).

√

∑( )

( )

√

∑ (

)

(17)

∑|

|

( )

where is the forecasted ith value of the actual Yi rate and n is the total number of the

observations.

After implementing EEMD to decompose each time series into a smoothed and the

remaining fluctuating component, MARS selects the most informative input dataset for

each component of the forecasted exchange rate.

Table 2: Optimum input variable number selected by MARS

Exchange

rate

Monthly data Daily data

Fluctuating

Component

Smoothed

Series

Fluctuating

Component

Smoothed Series

EUR/USD 16 1 22 1

USD/JPY 14 4 20 1

AUD/NOK 11 7 19 1

ZND/BRL 5 1 17 1

ZAR/PHP 1 1 22 1

As we observe from table 2, MARS selects only the smoothed component of the current

exchange rate as input variable for daily forecasting of the smoothed component, while

20

for monthly forecasting of USD/JPY and AUD/NOK constructs a structural model of

more input variables. For the fluctuating component, the input dataset consists from

more variable for daily forecasting horizon and a less for monthly data.

In order to test the forecasting ability of the EEMD-MARS-SVR model against

alternatives, we compare it with a Random Walk (RW) with a drift, an autoregressive

(using as input variables only the two decomposition components for each exchange rate)

AR-SVR and an EEMD-AR-SVR model. The results from the evaluation of all models

are depicted in tables 3 and 4.

Table 3: Out-of-sample forecasting results on monthly exchange rates

Model RMSE RMSPE MAPE

EUR/USD

RW 0.0512 0.0379 0.0295

AR-SVR(linear) 0.2260 0.0379 0.0294

EEMD-AR-SVR(linear) 0.0426 0.0314 0.0239

EEMD-MARS-SVR(RBF) 0.0407 0.0300 0.0238

USD/JPY

RW 2.4739 0.0281 0.0221

AR-SVR(linear) 3.3696 0.0395 0.0354


EEMD-MARS-SVR(linear) 1.9996 0.0226 0.0177

AUD/NOK

RW 0.1054 0.0189 0.0151

AR-SVR(linear) 0.1693 0.0301 0.0261

EEMD-AR-SVR(poly) 0.1017 0.0187 0.0144

EEMD-MARS-SVR(poly) 0.1287 0.0233 0.0184

ZND/BRL

RW 0.0359 0.0275 0.0220

AR-SVR(RBF) 0.0342 0.0263 0.0209

EEMD-AR-SVR(Sigmoid) 0.0253 0.0192 0.0158

EEMD-MARS-SVR(Sigmoid) 0.0256 0.0194 0.0149

ZAR/PHP

RW 0.2075 0.0348 0.0270

AR-SVR(linear) 0.2054 0.0344 0.0267

EEMD-AR-SVR(poly) 0.1862 0.0317 0.0213

EEMD-MARS-SVR(Sigmoid) 0.2017 0.0337 0.0243

Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.

21

Table 4: Out-of-sample forecasting results on daily exchange rates

Model RMSE RMSPE MAPE

EUR/USD

RW 0.0096 0.0070 0.0055

AR-SVR(linear) 0.0980 0.0070 0.0055



USD/JPY

RW 0.5950 0.0068 0.0050

AR-SVR(linear) 0.6088 0.0069 0.0053



AUD/NOK

RW 0.0340 0.0063 0.0049

AR-SVR(linear) 0.0343 0.0063 0.0050


EEMD-MARS-SVR(poly) 0.0583 0.0102 0.0078

ZND/BRL

RW 0.0106 0.0082 0.0064

AR-SVR(linear) 0.0108 0.0082 0.0064

EEMD-AR-SVR(RBF) 0.0093 0.0072 0.0057


ZAR/PHP

RW 0.0661 0.0111 0.0085

AR-SVR(linear) 0.0661 0.0111 0.0085



Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.

The model incorporating EEMD decomposition exhibits superior forecasting ability

both on daily and monthly data. In all exchange rates but the EUR/USD, the

autoregressive version of the EEMD-SVR model has the smallest RMSE, while for the

EUR/USD series the structural version of the model with the variables selected by

MARS for the fluctuating component has lower RMSE. With the exception of

AUD/NOK series, both the EEMD-AR-SVR and the EEMD-MARS-SVR outperform

the RW model, while for the AUD/NOK rate only the autoregressive EEMD-AR-SVR

model is more accurate.

The empirical findings are rather interesting and in line with our expectations about

trading volumes and EMH. The EUR/USD trading volume accounts for 28% of the total

22

daily trading volume in exchange rate market5. A successful estimation of the future

price of the exact exchange rate with the aforementioned trading volume could yield

high trading profits. Thus we expect that the intense economic interest for the EUR/USD

rate minimizes profit margin rapidly by moving the exchange rate to buy/sell

equilibrium and ultimately leads market to efficiency.

Indeed, our empirical findings are in favor of a weak form of market efficiency on the

EUR/USD exchange rate market. Both with daily and monthly data, the structural

variant of our proposed model presents better forecasting abilities, while for daily

forecasting horizon the autoregressive model is only slightly better than the RW, in

comparison to the structural one. So, we cannot reject the weak form efficiency for the

EUR/USD rate. On the contrary, market efficiency is rejected for all other exchange

rates, since the autoregressive model provides significantly better forecasts than the RW

and the structural variant, which is in line with our expectations due to their lower daily

trading volumes in comparison to EUR/USD.

Overall, SVR models coupled with EEMD decomposition as a preprocessing step

produce superior out-of-sample forecasts compared to the RW model, indicating the

possibility of potential profits for a trading strategy based on the EEMD-SVR forecasts.

3.2 Trading performance

In the previous section we measured the forecasting ability of the EEMD-SVR model

with forecasting performance metrics. Nevertheless, when it comes to trading, the

interest of a market participant is the evaluation of a model by means of overall

volatility and return, since statistical performance is not always synonymous to profit (as

pointed out by Cheng et al (2005)). Specifically, we examine whether there are any

additional economic gains for an investing strategy based on forecasts of the proposed

model, in comparison to trading based on the RW model; i.e. that the current price of the

exchange rate is the best approximation to future price.

5 According to the triennial report of BIS (2010) EUR/USD accounts for the 28% of the total daily trading volume, USD/JPY for 14% and all the other exchange rates are well below 1%.

23

Sharpe ratio, or return-to-variability ratio, measures the risk-adjusted returns of a

portfolio or investment strategy and is widely used by investment banks and asset

management companies to evaluate investment performance. It measures the risk-

adjusted performance of a portfolio and its mathematical expression is:

(19)

where is the annualized return of the exchange rate and the annualized standard

deviation of the trading strategy.

In order to measure the profitability of the model, we transform our forecasts into

returns with the formula:

(

) ( )

where is the return at time t+1 and is the exchange rate at time t. The trading

strategy is to stay long on the currency of the nominator of each exchange rate when the

forecasted return is positive and go short (sell) the currency at the opposite occasion.

When the forecasted return is zero, we are indifferent, keeping our previous position. In

other words the problem breaks down to a binary problem of 2 states; up/down, buy/sell

and in the special case stay neutral. For simplicity we assume that the investor begins

with an initial amount of all five currencies of the nominator, he invests entirely to one

of the two currencies for each exchange rate and receives his long/short decision based

on the forecast value of the exchange rate he has on time t, a position that he closes at

the end of the next trading period (t+1). At the end of the trading horizon, he measures

the annualized return and standard deviation, given by:

∑

( )

√ (22)

where is the annualized return, the annualized standard deviation, the standard

deviation computed at the end of the forecasting horizon for all out-of-sample forecasted

returns and the maximum annual transactions possible, here 252 for daily and 12 for

24

monthly data. Since the vast majority of transactions is made through electronic trade

platform such EBS and Reuters’ dealing 3000 thus minimizing trading costs, we assume

the least transaction cost possible for all exchange rates as of 1 pip per trade (0.0001

spread between buy and sell price of the exchange rate).

25

Table 6: Trading Performance on monthly data

Model EUR/USD RW EUR/USD USD/JPY RW USD/JPY AUD/NOK RW AUD/NOK NZD/BRL RW NZD/BRL ZAR/PHP RW ZAR/PHP

Annualized return (excluding costs)(%) 1,24 -0,44 8,12 0,66 -1,08 -2,61 1,12 -8,15 3,91 0,46

Annualized volatility (excluding costs)(%) 12,18 8 8,80 8,17 1,76 2,85 4,02 5,81 9,02 8,83

Sharpe ratio(excluding costs) 0,10 -0,06 0.92 0,08 -0,61 -0,92 0,28 -1,40 0,43 0,05

Positions taken (annualized) 10 5 10 8 2 4 2 5 6 6

Transaction costs (annualized)(%) 0,003 0,002 0.0005 0,0004 0.0022 0.0036 0,0078 0.0019 0,0049 0,0049

Annualized return (including costs)(%) 1,21 -0,46 8,12 0,66 -1,08 -2,61 1,11 -8,17 3,90 0,46

Sharpe ratio (including costs)(%) 0,09 -0,06 0,92 0,08 -0,62 -0.92 0,28 -1,40 0,43 0,05

Table 7: Trading Performance on daily data

Model EUR/USD RW EUR/USD USD/JPY RW USD/JPY AUD/NOK RW AUD/NOK NZD/BRL RW NZD/BRL ZAR/PHP RW ZAR/PHP

Annualized return (excluding costs)(%) 31,02 2,25 24,23 0,67 6,44 -3,05 9,59 -3,24 45,37 0,15

Annualized volatility (excluding costs)(%) 7,74 8,15 7,75 7,90 6,83 6,87 8,77 8,74 12,43 13,53

Sharpe ratio(excluding costs) 4 0,28 3,13 0.08 0,94 -0,44 1,09 -0.37 3,65 0,01

Positions taken (annualized) 121 125 142 134 116 117 124 123 128 120

Transaction costs (annualized)(%) 0,44 0,45 0.09 0.007 0,11 0,11 0,48 0,48 0,10 0,01

Annualized return (including costs)(%) 30,59 1,80 24,23 0.66 6,34 -3,15 9,10 -3,72 45,26 0,04

Sharpe ratio (including costs)(%) 3,95 0,22 3,12 0.08 0,93 -0,46 1,04 -0,43 3,64 0.004

26

The relatively high Sharpe ratio of the proposed trading strategies are generally in line

with the empirical findings of previous studies about portfolios including exchange rates

(Rime et al, 2010, Sermpinis et al, 2013). Nevertheless, the computed ratios from our

approach are higher than previous studies both in daily and monthly forecasting

horizons6.

Summarizing the results of table 6 and 7, a strategy based on the forecasts of the

proposed EEMD-SVR methodology exhibits positive annualized returns (after trading

costs) both for daily and monthly data on all exchange rates, with the exception of the

AUD/NOK monthly exchange rate. Overall, the EEMD-SVR based trading strategy

outperforms the RW model, producing economic gains.

6 Sarno, Valente, and Leon (2006) calculate Sharpe ratios of forward bias trading strategies that range between 0.16 and 0.88, while Lyons (2001) reports a Sharpe ratio of 0.48 for an equally weighted investment in six currencies.

27

4. Conclusion

The proposed hybrid EEMD-MARS-SVR model is a novel forecasting approach based

on data driven methods. By segregating variable time series to a smoothed and a

fluctuating component and forecasting each one separately, our model outperforms the

RW in out-of-sample forecasting on the Euro (EUR) /United States Dollar (USD),

Japanese Yien (JPY) /USD, Norwegian Krone (NOK) / Australian Dollar (AUD)/, New

Zealand Dollar (NZD) / Brazilian Real (BRL) and Philippine Peso (PHP) / South

African Rand (ZAR) exchange rates. Moreover, the adoption of an autoregressive

expression for all series rejects the EMH for all exchange rate markets, but the

EUR/USD rate where we cannot reject the weak form of market efficiency. Overall,

machine learning approximations coupled with signal processing methodologies for time

series smoothing can be exploited for adopting trading strategies that yield economic

profits.

Acknowledgments

This research has been co-financed by the European Union (European Social Fund –

ESF) and Greek national funds through the Operational Program "Education and

Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research

Funding Program: THALES. Investing in knowledge society through the European

Social Fund.

We would also like to thank the participants of the 75st International Atlantic Economic

conference held in Vienna, Austria in April 2013, for their helpful comments.

References

Alexandrov, T., Bianconcini, S., Bee Dagum, E., Maass, P. and Mc Elroy, T. (2009) A

review of some modern approaches to the Problem of trend extraction, Research

Report Series, Statistics vol. 3, U.S. Census Bureau, Washington.

Berger, D.W., Chaboud, A.P., Chernenko, S.V., Howorka, E. and Wright, J.H. (2008)

Order flow and exchange rate dynamics in electronic brokerage system data,

Journal of International Economics, vol.75, pp. 93–109.

28

Bilson, J. (1978) The monetary approach to the exchange rate some empirical evidence,

IMF Staff Papers, vol. 25(1), pp. 48–75.

Bollerslev, T. and Wright, J. H. (2001) High-frequency data, frequency domain

inference, and volatility forecasting, Review of Economics and Statistics, vol. 83,

pp. 596–602.

Brandl B., Wildburger U. and Pickl S. (2009) Increasing of the fitness of fundamental

exchange rate forecast models. International Journal of Contemporary

Mathematical Sciences, vol. 4 (16), pp. 779-798.

Chang C.-C. and Lin C.-J. (2011) LIBSVM: a library for support vector machines. ACM

Transactions on Intelligent Systems and Technology, vol. 2 (27), pp. 1-27.

Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cheung Y-W. (1993) Long memory in foreign-exchange rates, Journal of Business and

Economic Statistics, vol. 11, pp. 93- 101.

Cheung,Y.-W., Chinn M. and Pascual A.G. (2005) Empirical Exchange Rate Models of

the Nineties: Are any Fit to Survive?, Journal of International Money and Finance,

vol. 24(7), pp. 1150–75.

Cortes C. and Vapnik V. (1995) Support-Vector Networks, Machine Leaming, vol 20,

pp. 273-297.

Dornbusch, R., (1976) Expectations and Exchange Rate Dynamics, Journal of Political

Economy, vol. 86, pp. 1161-1176.

Dunis, C. and Williams, M. (2002) Modeling and Trading the EUR/USD Exchange Rate:

Do Neural Network Models Perform Better ?, Derivatives Use, Trading and

Regulation, vol.8 (3), pp. 211-239.

Evans, M.D.D. and Lyons, R.K. (2008) How is macro news transmitted to exchange

rates?, Journal of Financial Economics, vol. 88,pp. 26–50.

Evans, M.D.D. and Lyons, R.K. (2002) Order flow and exchange rate dynamics, Journal

of Political Economy, vol. 110, pp. 170–180.

Evans, M.D.D. and Lyons, R.K. (2005) Do currency markets absorb news quickly?,

Journal of International Money and Finance, vol. 24,pp. 197–217.

Fama, E (1965) The Behavior of Stock Market Prices, Journal of Business. vol. 38, pp.

34–105.

Fleming, J., Kirby, C. and Ostdiek, B. (2001) The economic value of volatility timing.

Journal of Finance, vol. 56, pp. 329–352.

29

Fleming,J.M. (1962),Domestic Financial Policies under Fixed and Floating Exchange

Rates, IMF Staff Papers, pp. 369–79.

Frankel, J.A., (1979) On the Mark.: A Theory of Floating Exchange Rates Based on

Real Interest Differentials, American Economic Review, vol. 69, pp. 610-622.

Freidman J. (1991) Multivariate adaptive regression splines, Annals of Statistics, vol. 19,

pp. 1-141.

Galbraith, J.W. and Kısınbay T. (2005) Content horizons for conditional variance

Forecasts, International Journal of Forecasting, vol. 21, pp. 249-260.

Guo W. and Tse P. (2013), A novel signal compression method based on optimal

ensemble empirical mode decomposition for bearing vibration signals, Journal of

Sound and Vibration, vol.332, pp. 423-441.

Hannan, E. J., and B. G. Quinn (1979) The Determination of the Order of an

Autoregression, Journal of the Royal Statistical Society, B, vol.41, pp.190–195.

Hodrick, R.J. and Prescott, E.C. (1997), Postwar US business cycles: an empirical

investigation, Journal of Money, Credit, and Banking, vol. 29, pp. 1-16.

Huang, M. L. Wu, S. R. Long, S. S. Shen, W. D. Qu, P. Gloersen, and K. L. Fan (1998),

The empirical mode decomposition and the Hilbert spectrum for nonlinear and

non-stationary time series analysis, Procccedings of the Royal Society of London,

vol. 454A, pp. 903-993.

Ince H. and Trafalis T. (2006) A hybrid model for exchange rate prediction. Decision

Support Systems, vol. 42(2), pp. 1054-1062.

Jekabsons G. (2011), ARESLab: Adaptive Regression Splines toolbox for

Matlab/Octave, available at http://www.cs.rtu.lv/jekabsons.

Karemera D. and Kim B, (2006), Assessing the forecasting accuracy of alternative

nominal exchange rate models: the case of long memory, Journal of Forecasting

Volume 25, Issue 5, pages 369–380.

Killeen, W.P., Lyons, R.K. and Moore, M.J. (2006) Fixed versus flexible: lessons from

EMS order flow, Journal of International Money and Finance, vol. 25, pp. 551–

579.

Lucas,R.E. (1982)‘Interest Rates and Currency Prices in a Two-Country World, Journal

of Monetary Economics, vol. 10, pp. 335–60.

Lyons,R.K. (2001), The Microstructure Approach to Exchange Rates, Cambridge, MA:

MIT Press.

http://en.wikipedia.org/wiki/Edward_J._Hannan

http://www.informatik.uni-trier.de/~ley/db/journals/dss/dss42.html#InceT06

http://www.informatik.uni-trier.de/~ley/db/journals/dss/dss42.html#InceT06

http://onlinelibrary.wiley.com/doi/10.1002/for.v25:5/issuetoc

30

Macdonald, Ronald. (2010) Exchange Rate Economics: Theory and Evidence. 1st

Edition. New York: Routledge.

Mark, N.C. and D. Sul (2001), Nominal Exchange Rates and Monetary Fundamentals:

Evidence from a Small Post-Bretton Woods Panel, Journal of International

Economics, vol. 53(1), pp. 29–52.

Meese, R. and Rogoff K. (1983) Empirical Exchange Rate Models of the Seventies: Do

they Fit out of Sample?, Journal of International Economics, vol. 14, pp. 3–24.

Moghtaderi A., Flandrin P, and Borgnat P, (2013), Trend filtering via Empirical Mode

Decompostition, Computational Statistics and Data Analysis, vol. 58, pp. 114-126.

Molodtsova, T. and Papell, D.H., (2009) Out- of- sample exchange rate predictability

with Taylor rule fundamentals. Journal of International Economics. vol. 77,

pp.167–180.

Mundell, R.A. (1968), International Economics (New York: Macmillan).

Premanode B. and Toumazou C. (2013) Improving prediction of exchange rates using

Differential EMD, Expert Systems with Applications, vol. 40, pp. 377–384.

Rilling G., Flandrin, P. and Goncalves, P. (2005) Empirical mode decomposition,

fractional Gaussian noise and Hurst exponent estimation. IEEE International

Conference on Acoustics, Speech, and Signal Processing, pp. 489–492.

Rime D., Sarno L. and Sojli E. (2010), Exchange rate forecasting, order flow and

macroeconomic information, Journal of International Economics, vol. 80, pp. 72-

88.

Sager M.J. and Taylor M.P. (2008) Commercially available order flow data and

exchange rate movements: caveat emptor. Journal of Money and Credit Banking,

vol. 40, pp. 583–625.

Sarno L., Valente G. and Leon H. (2006) Nonlinearity in Deviations from Uncovered

Interest Parity: An Explanation of the Forward Bias Puzzle, Review of Finance,

European Finance Association, vol. 10(3), pages 443-482.

Sermpinis G., Theofilatos K., Karathanasopoulos A., Georgopoulos F. E., Dunis C.

(2013) Forecasting foreign exchange rates with adaptive neural networks using

radial-basis functions and Particle Swarm Optimization, European Journal of

Operational Research, vol. 225, pp. 528–540.

Stockman, A. (1980) A Theory of Exchange Rate Determination, Journal of Political

Economy, vol. 88, pp. 673–98.

http://ideas.repec.org/a/oup/revfin/v10y2006i3p443-482.html

http://ideas.repec.org/a/oup/revfin/v10y2006i3p443-482.html

31

Vapnik, V., Boser, B. and Guyon, I. (1992) A training algorithm for optimal margin

classifiers, Fifth Annual Workshop on Computational Learning Theory, Pittsburgh,

ACM, pp.144–152.

West, K.D., Edison, H.J. and Cho, D. (1993) A utility based comparison of some models

for exchange rate volatility. Journal of International Economics, vol.35,pp.23–45.

Wu, Z., and N. E Huang (2009) Ensemble Empirical Mode Decomposition: a noise-

assisted data analysis method., Advances in Adaptive Data Analysis., vol. 1, No.1,

pp. 1-41.

32

APPENDIX

The generic EEMD method can be described as follows:

1) Add white noise with a predefined variation to the time series under

consideration.

2) Detect local minima and maxima.

3) With cubic interpolation compute the upper and lower envelopes from local

minima and maxima respectively.

4) Compute the mean value of the lower and the upper envelope. If a) the number

of local minima, local maxima and zero crossing points vary at most by one and

b) the mean is approximately zero, then we subtract the mean from the initial

time series. The residual is the first IMF. If the criteria are not met then we go

back to step (2) and repeat the procedure until criteria fulfillment.

5) Repeat step (4) until the residual has no local maxima and minima.

6) Go back to step (1) and repeat the process for different versions of added white

noise.

7) Take the mean of the ensemble of all decompositions for each IMF as the final

output.

Documents

Forecasting Daily and Monthly Exchange Rates With Machine Learning Techniques