
PERFORMANCE ANALYSIS OF ARTIFICIAL NEURAL NETWORKS IN
FORECASTING FINANCIAL TIME SERIES

by

Assia Lasfer

A Thesis Presented to the Faculty of the American University of Sharjah
College of Engineering in Partial Fulfillment of the Requirements
for the Degree of
Master of Science in Engineering Systems Management

Sharjah, United Arab Emirates

January 2013


© 2013 Assia Hanafi Lasfer. All rights reserved


Approval Signatures

We, the undersigned, approve the Master's Thesis of Assia Lasfer.

Thesis Title: Performance Analysis of Artificial Neural Networks in Forecasting Financial Time Series

Signature Date of Signature

___________________________ _______________
Dr. Hazim El-Baz
Associate Professor, Engineering Systems Management Graduate Program
Thesis Advisor

___________________________ _______________
Dr. Tarik Aouam
Associate Professor, Engineering Systems Management Graduate Program
Thesis Committee Member

___________________________ _______________
Dr. Imran Zualkernan
Associate Professor, Department of Computer Science and Engineering
Thesis Committee Member

___________________________ _______________
Dr. Tarik Ozkul
Professor, Department of Computer Science and Engineering
Thesis Committee Member

___________________________ _______________
Dr. Moncer Hariga
Director, Engineering Systems Management Graduate Program

___________________________ _______________
Dr. Hany El Kadi
Associate Dean, College of Engineering

___________________________ _______________
Dr. Hany El Kadi
Acting Dean, College of Engineering

___________________________ _______________
Dr. Khaled Assaleh
Director, Graduate Studies


Acknowledgments 

Foremost, I would like to express my sincere gratitude to my advisor Dr. Hazim El-Baz for the continuous support and guidance of my Master's study and research, and for his patience, motivation, and encouragement. His guidance helped me in the process of researching and writing this thesis, and I am very honored to have worked with him for the past two years.

I would also like to express my gratitude towards Dr. Imran Zualkernan and Dr. Tarik Ozkul, who have given me a lot of their time and knowledge and did not hesitate to help me every time I came to them with new questions. I greatly appreciate their patience and guidance, and I am truly grateful for being their student.

Furthermore, I must extend my appreciation to the following ESM professors who made my Master's experience a truly inspiring one. Dr. Moncer Hariga, Dr. Ibrahim Al Kattan, Dr. Norita Ahmed, and Dr. Tarik Aouam have all shared with me their knowledge and experience and expanded my understanding of different management and engineering fields. I would also like to thank Mr. Hisham Ahmad from the IT department for helping me with technological matters. Finally, I thank all my friends and colleagues who made my AUS experience an unforgettable one.


Dedication

This thesis could not have been completed without the care and support of my mother, father, sister, brother, uncle, and aunt. I dedicate this work to them, as they have been with me every step of my educational path.

I dedicate this work to all of my friends who supported me and shared in the long days, sleepless nights, and weekends working in the office: Noha Tarek, Manal Kaakani, Edi Ali, Haidar Karaghool, Ahmad Ghadban, Rana and Rami El-Haj, Fatemeh Makhsoos, Eman AlRaeesi, Sahar Choobbor, Leena Samarrai, Alaa Abu-Salah, Ghanim Kashwani, Basma Kaplan, Sitara Hola, and Maryam Raji.

Lastly, I dedicate this work to all of the ESM students, and I hope future students may find it helpful in their own quest for knowledge. I thank God for giving me the opportunity to be in this university and attend this program, and for making this part of my life a rewarding experience.


Abstract

Forecasting stock prices is of critical importance for investors who wish to reduce investment risks. Forecasting is based on the idea that stock prices move in patterns. So far, it is understood that developed, emerging, and frontier markets have different general characteristics. Subsequently, this research uses design of experiments (DOE) to study the significance and behavior of artificial neural networks' (ANN) design parameters and their effect on the performance of predicting movement of developed, emerging, and frontier markets. In this study, each classification is represented by two market indices. The data is based on the Morgan Stanley Country Index (MSCI), and includes the indices of UAE, Jordan, Egypt, Turkey, Japan, and UK. Two designed experiments are conducted where 5 neural network design parameters are varied between two levels. The first model is a 4 factor full factorial, which includes the parameters of type of network, number of hidden layer neurons, type of output transfer function, and the learning rate of the Levenberg-Marquardt (LM) algorithm. The second model, a 5 factor fractional factorial, includes all previous four parameters plus the shape of the hidden layer sigmoid function. The results show that, for a specific financial market, DOE is a useful tool in identifying the most significant ANN design parameters. Furthermore, the results show that there exist some commonly significant and commonly insignificant factors among all tested markets, and sometimes among markets of the same classification only. However, there do not seem to be any differences in ANN design parameters' effect based on market classification; all main effects and interactions that appear to be significant behave similarly through all tested markets.

Search Terms: Artificial neural networks (ANN), Design of experiments (DOE), Frontier, Emerging, Developed, Financial time series


Table of Contents

Abstract
Table of Contents
CHAPTER 1: Introduction
  1.1 Financial markets
  1.2 Research Objective
  1.3 Research Significance
  1.4 Thesis Outline
CHAPTER 2: Literature Review
  2.1 Technical, fundamental, and time series analyses as forecasting methods
  2.2 Artificial neural networks in financial forecasting
  2.3 Artificial neural networks in stock market forecasting
  2.4 Using Design of experiments with artificial neural networks
CHAPTER 3: Artificial Neural Networks
  3.1 Neurons
  3.2 Multilayer Feed-forward networks (FFNN)
  3.3 Nonlinear autoregressive exogenous model neural networks (NARX)
  3.4 Back-propagation algorithm
CHAPTER 4: Design of Experiments
  4.1 Design of Experiments
  4.2 Factorial and fractional factorial designs
  4.3 ANOVA
  4.4 Model graphs
CHAPTER 5: Methodology
  5.1 Definition of the problem statement
  5.2 Choice of factors and levels
    5.2.1 Variable selection
    5.2.2 Data collection
    5.2.3 Data preprocessing
    5.2.4 Training, testing, and validation sets
    5.2.5 Neural network paradigms
    5.2.6 Evaluation criteria
    5.2.7 Neural network training
    5.2.8 Implementation
  5.3 Selection of the response variable
  5.4 Choice of experimental design
    5.4.1 Model 1
    5.4.2 Model 2
CHAPTER 6: Results
  6.1 Model 1
    6.1.2 ANOVA results and significant factors
    6.1.3 Observations
  6.2 Model 2
    6.2.2 ANOVA results and significant factors
    6.2.3 Observations
CHAPTER 7: Conclusion
  7.1 Results conclusion
  7.2 Statistical significance and practical significance
  7.3 Limitations
  7.4 Future work
Appendix A
  Half normal probability plots for model 1
  Half normal probability plots for model 2
  Pareto charts for model 1
  Pareto charts for model 2
Appendix B
  ANOVA results for model 1
  ANOVA results for model 2
Appendix C
  Model Diagnosis
  Model 1 Diagnosis
  Model 2 Diagnosis


List of Tables

Table 1 MSCI classification guidelines from [4]
Table 2 Guidelines for designing an experiment [48] Pg 14
Table 3 ANOVA table
Table 4 Eight steps in designing a neural networks forecasting model, Kaastra and Boyd [21]
Table 5 ANN configurations of previous works
Table 6 Summary of the factor settings to be used in this experiment and their values
Table 7 Model 1
Table 8 Model 2
Table 9 Four factor full factorial (2^4)
Table 10 Response variables
Table 11 Significant effects for model 1
Table 12 Five factor fractional factorial (2^(5-1), resolution V)
Table 13 Response variables
Table 14 Aliases for model 2
Table 15 Significant effects for model 2


List of Figures

Figure 1 Basic neuron
Figure 2 Examples of transfer functions
Figure 3 Feed-forward Neural Network
Figure 4 Series NARX
Figure 5 Computation of the error function [42] Pg 156
Figure 6 Two-factor and three-factor full factorial designs
Figure 7 Checking constant variance with predicted vs. residual graph
Figure 8 Single effect and interaction graphs
Figure 9 List of market indices of each classification as listed by MSCI
Figure 10 Mean square error of ANN trained with 3 back-propagation algorithms
Figure 11 Training stopping due to convergence
Figure 12 ANN training in progress (MATLAB)
Figure 13 Markets Data
Figure 14 UAE MSE significant main effects
Figure 15 UAE MSE significant interactions
Figure 16 Jordan MSE significant main effects
Figure 17 Jordan MSE significant interactions
Figure 18 Egypt MSE significant main effects
Figure 19 Egypt MSE significant interactions
Figure 20 Turkey MSE significant main effects
Figure 21 Turkey MSE significant interactions
Figure 22 Hyperbolic tangent sigmoid transfer function
Figure 23 UAE MSE significant main effects
Figure 24 Jordan MSE significant main effects
Figure 25 Jordan MSE significant interactions
Figure 26 Egypt MSE significant main effects
Figure 27 Egypt MSE significant interactions
Figure 28 Turkey MSE significant main effects
Figure 29 UK MSE significant main effects
Figure 30 UK MSE significant interactions
Figure 31 Japan MSE significant main effects
Figure 32 Japan MSE significant interactions


CHAPTER 1: Introduction

The biggest challenge for financial professionals and researchers is the existence of uncertainty and risk. In fact, risk is a fundamental aspect of modern financial studies. Since gambling with investment choices is not an option for most, proper research and planning can greatly reduce the threat of uncertainty and guide investors towards the correct steps to take. For this reason, many resources are spent on risk management, as risk not only complicates decision making but also creates opportunities for those who plan for it efficiently.

The belief that equity markets move in patterns has led researchers to work on forecasting techniques. However, it is also known that different markets behave differently; while developed markets seem more efficient and harder to predict, emerging markets tend to be more predictable [1]. There have been numerous studies on whether stock markets follow the random walk hypothesis, and many time series analyses have been done in this area. None, however, studied these differences in behavior using neural networks as the comparison tool.

This research compares forecasting models of different nations that are built using artificial neural networks (ANN). As the forecasting ability of ANNs tends to be superior to that of many older methods [2], this new comparison is expected to yield more informative results.

1.1 Financial markets

Financial markets can be categorized into three main sectors: developed, emerging, and frontier; each sector exhibits unique characteristics. Standard & Poor's and the Morgan Stanley Country Index (MSCI) provide the most accurate and trusted classification guidelines for the inclusion of markets in each category. In their criteria, S&P and MSCI set minimum volume levels and market capitalization for securities after the adjustments of free float and foreign ownership; they also look at the size of markets and their liquidity. S&P looks more specifically into the market turnover, amount of foreign investments, and number of listings [3], while MSCI examines how stable a country is politically and economically, and whether there are established rules and regulations governing equity markets [4], [5]. Table 1 shows the three requirements that each country needs to meet in order to be classified in the MSCI classification system.

Table 1 MSCI classification guidelines from [4]

Bekaert and Harvey [6] and Arora et al. [7] agree on four points that differentiate emerging and more extreme frontier markets from developed ones: the higher returns and risk, the low correlation with developed markets, the high volatility, and the higher predictability. In addition, Sener [8] states that the existence of correlation between markets is an advantage that helps in studying market classifications. He adds that emerging and frontier markets exhibit low correlation with the world index and other markets. This can have great advantages for the global diversification of investors, as it lowers their portfolio's overall risk.

Indeed, the predictability of stock prices has long been the concern of many researchers. Discovering price patterns can have great advantages both academically and financially. However, locating these patterns has long been and remains a significant challenge. One accepted theory regarding stock market behavior is the Efficient-Market Hypothesis (EMH). The EMH in its strong form states that current prices reflect all the available information about a market; this means that one cannot gain higher returns by merely looking at the information available at the time of investment [9]. This hypothesis is consistent with the random walk hypothesis, which claims that stock prices move randomly and cannot be predicted. Although the EMH has gained acceptance, there is a lot of debate concerning its validity [9]. In fact, certain anomalies in stock behavior have been documented that contradict this hypothesis [10], [11]. An observation worth noting is that of Harvey [12] and Bekaert [13], in which they see that emerging markets are less efficient than developed markets and are thus more predictable. On the other hand, Robinson [1] relays that sometimes mixed findings exist where some developed markets behave like emerging ones and vice versa. He also adds that discovered anomalies sometimes exist in the methods used for the studies and not in the markets themselves; therefore, choosing a proper and accurate method is of great importance.

The belief in pattern existence has fueled researchers and financial professionals to develop forecasting techniques in which historical data is studied to determine future price trends. Previous forecasting techniques, including statistical, technical, and fundamental analysis, have not proven very effective given the chaotic and dynamic nature of stock movements. Advanced methods include time series financial forecasting and computer modeling using artificial intelligence. As the currently most accepted method in financial forecasting, artificial neural networks (ANN) have gained extensive popularity in this field due to their unique features that give researchers and investors hope for sorting out the stock market mystery [14].

1.2 Research Objective

For this research, several ANN design parameters are chosen, and their effect on ANN performance is investigated. The methodology is applied to market indices for developed, emerging, and frontier markets. The research has three objectives:

1. Identify important neural network design parameters that may impact performance in forecasting future moves of financial markets.

2. Develop designed experiments to identify which of the parameters identified in step 1 significantly impact the performance of ANNs.

3. Compare the performance of ANNs in forecasting future moves of developed, emerging, and frontier markets.

1.3 Research Significance

In this study, experiments are conducted to assess how different ANN designs behave with different types of market classifications in terms of prediction. Using ANN prediction error as a response, the significant design parameters are identified, and their effect on the performance of ANNs is analyzed and recorded. This could give insight into how ANNs can be calibrated differently for different market classifications. Consequently, this will provide a guide for future researchers to minimize trial and error, since previous research held that the best ANN design should be found by trial and error, offering no guidelines. There have also been initiatives to study individual markets, but no comparison of ANNs across multiple markets exists. This research attempts to provide these guidelines and thus save future researchers time and effort. Its intended contribution is important in that it covers the full spectrum of financial market types (developed, emerging, and frontier) and provides crucial insight into the behavior of these markets and how the design of ANNs is affected by market behavior.

1.4 Thesis Outline

Chapter 2: Literature Review. This chapter provides a brief introduction to forecasting of financial time series using previous methods. In addition, the role of neural networks in the financial field is discussed along with their advantages over previous methods.

Chapter 3: Artificial Neural Networks. This chapter provides a brief introduction to neural network theory, both feed-forward and recurrent NARX networks, and how they work, in addition to the back-propagation algorithm.

Chapter 4: Design of Experiments. This chapter introduces the concept of designing experiments and performing statistical analysis to find significant factors.

Chapter 5: Methodology. This chapter covers the steps of designing and preparing the experiment to be conducted. This is done by first choosing the right ANN parameter values and then defining the experimental models to be built.

Chapter 6: Results. This chapter assesses the results of the statistical analysis done on the experimental models, and draws conclusions from interaction diagrams about the significant factors and their behavior in each studied market.

Chapter 7: Conclusion. Here, the study is summarized along with the main results and contributions. The chapter also discusses the implications of this study for future research.


CHAPTER 2: Literature Review

In this chapter, financial forecasting methods are introduced. The first part briefly mentions previous forecasting techniques and states why ANNs are preferred. The next section moves to financial forecasting areas where ANNs are well established and used. Finally, a survey of past research and work regarding forecasting stock prices is presented. Neural networks have surpassed other forecasting methods in modeling financial time series, and numerous published works have proven their superiority.

2.1 Technical, fundamental, and time series analyses as forecasting methods

Technical and fundamental analyses are the most basic methods of forecasting stocks. Technical analysis uses past prices and trading volume to study future price movements [15], and fundamental analysis uses publicly available information like dividends, growth factors, and accounting earnings to find the core values of securities [16]. However, these two analysis methods are not practical; interpreting results is very subjective, and the data used may be old, causing a loss of opportunity for investors due to time delay [17]. Nowadays, these analyses are used as ANN inputs.

Moreover, time series analysis takes into account the sequential nature of stock prices and uses complex methods to model price behavior. However, these methods assume that data are generated from linear processes, which is not the case in most real world problems, including financial ones [14]. Time series analysis methods are usually compared to ANNs, the latter often being the better forecasting performer [18], [19].

2.2 Artificial neural networks in financial forecasting

ANNs have gained great popularity in the financial field for their ability to deal with uncertainty and handle noisy data [20]. Previous financial applications of ANNs include, but are not limited to, risk assessment for mortgages and fixed investments, prediction of default and bankruptcy, portfolio selection, and economic forecasting [21]. The features of ANNs make it convenient to study stock market behaviors; however, although theoretically an ANN can approximate any function, designing a good neural network by calibrating numerous parameters is a significant challenge. Therefore, a universal design does not exist, and different data require different designs. This creates a building process with a tedious trial and error nature [22].

ANNs are data-driven and self-adaptive because no assumptions have to be made about the problem before they are built; this makes them ideal for data that is complex and open to interpretation [14]. ANNs learn dynamically through training and then make educated guesses [17]. Theoretically, a correctly designed ANN is able to converge to any optimal result after being trained. ANNs are universal function estimators that can map any nonlinear function [20]. All these features make it very convenient to study stock market behaviors using ANNs.

In a theoretical sense, ANNs can approximate any function, but designing a good neural network and calibrating its parameters correctly is a serious challenge, as this is dependent on the specific data set used. Since different data need different designs, the building process can only be accomplished through a tedious trial and error process. Furthermore, given the large computer processing power and memory requirements for ANN training, the trial and error approach is limited because researchers cannot attempt countless combinations. Finally, models are dynamically built during the training process of an ANN; therefore, an ANN can be considered a black box that is only used but not transparent. This makes studying the generated model, as well as analyzing why it made good or bad predictions, nearly impossible [23].

2.3 Artificial neural networks in stock market forecasting

As stated earlier, neural networks occupy a large area in financial applications and research. Specifically, stock market forecasting is a very active field of exploration. Researchers have published several works setting guidelines for building good ANNs. Notably, Kaastra and Boyd [21] discuss a step by step approach for the proper building of ANNs for forecasting financial and economic time series. They focus on all the important design parameters of back-propagation feed-forward networks and ways of configuring them, starting from the preprocessing of data, the functions and methods to use, and the network configurations to make. Similarly, Zhang et al. [14] survey past practices and provide insights on ANN modeling issues. Similar works include that of Padhiary and Misha [9], who build ANNs with adaptive learning rates to predict long and short term returns. However, Yao et al. [23] note that most of the published research material lacks experimental data or does not use data from real world problems; furthermore, testing a single market or a short time period signifies little and does not provide a complete picture of the performance of ANNs. Finally, they conclude that multiple ANNs could be equally accurate in solving a problem, and that building a model construction system to help build the proper ANNs would free researchers from a trial and error basis.

Among practical published research, most of the surveyed works use back-propagation feed-forward networks because of their simplicity and ease of use; however, a few others explore other topologies and learning algorithms. As for studies done on developed markets, Patel and Marwala [22] use both feed-forward networks and radial basis networks to build forecasting models for the Dow Jones Industrial Average, the Johannesburg Stock Exchange, the Nasdaq 100, and the Nikkei 225 Stock Exchange. The best accuracy recorded by their study is 72% for the Dow Jones Industrial Average. Moreover, Isfan et al. [24] built an ANN model to predict the Portuguese stock market. The best topology is chosen by changing certain parameters, like the number of hidden layer neurons and the learning rate. These researchers conclude that ANNs outperform other forecasting methods and give hope for future understanding of stock markets' chaotic behavior. Moreover, Roman and Jameel [25] use recurrent networks with back-propagation to study five developed stock markets, namely those of the UK, Canada, Japan, Hong Kong, and the USA, and design portfolios across international markets.

Moving to emerging markets' studies, Thenmozhi [22] uses a feed-forward network to forecast the Bombay Stock Exchange Index (BSE SENSEX). Inputs to the network are four consecutive closing values, and the output is the closing value of the fifth day. After conducting a sensitivity analysis, Thenmozhi concludes that the latest price is of highest importance. Likewise, Desai et al. [26] propose a similar model for forecasting the prices of the S&P CNX Nifty 50 Index of India. The input to the ANN is the simple moving average (SMA) of the closing prices; this is after concluding that the SMA provides better results than raw prices. The researchers emphasize that ANNs can be very helpful in forecasting the volatile markets of emerging countries. Correspondingly, other applications build similar ANNs for the Tehran stock exchange [27], where the exponential moving average is used as input and different learning algorithms are used for training, the Brazilian stock market [18], the Kuwaiti stock exchange index [28], and the São Paulo Stock Exchange [29].

Several research surveys have been conducted regarding ANNs and financial forecasting. One such survey compares different applications in an effort to find the best topology for specific problems in financial time series [20]. It is observed that most applications use three layered ANNs, back-propagation, and sigmoid activation functions; however, no "recipe" can be found that relates methodology to topology. A more thorough survey done by Atsalakis and Valavanis summarizes 100 published articles [2]. The authors note that closing prices and technical indicators are commonly used as input variables. Moreover, sixty percent of all studies use feed-forward or recurrent neural networks with one or two hidden layers. The study classifies applications for developed and emerging markets and observes that more forecasting models are built for emerging markets than for developed markets. This could be due to emerging markets being more inefficient and thus more predictable. The paper concludes by stating that although some guidelines can be given for building ANNs, finding the best one is still a matter of trial and error.

2.4 Using Design of experiments with artificial neural networks

Research is ongoing to develop optimal ANN designs. There are many parameters that should be taken into consideration, and all of them affect the performance of ANNs to a certain degree; therefore, finding the most important parameters helps in focusing on crucial information, maximizing performance, and minimizing building costs. Some have attempted to tackle the problem using simple experimentation in a one-factor-at-a-time fashion, like Tan and Witting [30]. In their work, Tan and Witting study the effects of five ANN parameters, namely the momentum coefficient, learning rate, activation function, inputs, and number of neurons in the hidden layer. The network is a stock market prediction model, so the response variable is the difference between the actual price and the predicted price. The researchers start with six initial frameworks and change parameter value combinations one at a time. However, this work does not include any statistical conclusions or observations regarding parameter interactions.


Other works that apply design of experiments with statistical analysis include Balestrassi et al. [31], who study ANN models built for non-linear time series forecasting. It is clearly mentioned that one-factor-at-a-time analysis gives unreliable and misleading results, and that statistically designed experiments perform more efficiently. The work builds a mixed level design of twenty-three factors that are allowed to have two, three, or four levels. The response variable is the MSE. After the first run is completed, the most significant factors are chosen and a smaller fractional factorial design is built; this process helps in reducing confusion and finding more accurate results. The research concludes that better performing ANNs can be built using DOE. Other works that apply DOE to ANNs in other fields include Laosiritaworn and Chotchaithanakorn [32], who study a 2^4 design, and Behmanesh and Rahimi [33], who use DOE to optimize building a recurrent ANN using a 2^3 design.

In compiling this literature review, no study was found that compares the generalization ability of ANNs among markets of different classifications. Moreover, no study could be found that applies DOE to analyze the significance of ANN parameters for different stock markets. This research, therefore, is unique in that the significance of ANN design parameters is compared between different markets from different classifications.


CHAPTER 3: Artificial Neural Networks

Artificial neural networks mimic the human brain's ability to learn and identify patterns through training. This chapter introduces the topic, basic terminologies, and components of ANNs.

3.1 Neurons

Artificial neural networks are adaptive computational models inspired by the biological human brain system. Unlike other analytical tools, they are capable of solving complex problems, such as function approximation, classification, and pattern recognition. Moreover, they have been used as optimization tools for complicated and non-linear problems [34]. A typical ANN consists of multiple neurons organized in a layered fashion and connected to each other, forming an inter-dependent network. Neurons are the basic building blocks of all neural networks. In his book, Dreyfus [35] defines a neuron as a "nonlinear, parameterized, bounded function." Figure 1 illustrates a simple neuron.

Figure 1 Basic neuron

A neuron can have one or more inputs and one or more outputs; the output of the neuron is the result of a non-linear combination of the inputs $\{x_i\}$, weighted by the synaptic weights $\{w_i\}$, and the application of a function $f$ on the result. Gupta et al. [36] explain that each neuron has a relative weight that represents the importance of the signal it sends; these weights are assigned according to past experience gained through training. They add that after multiple weighted signals are combined in the neuron, further processing is conducted using a special function called the activation function $f$. The set of inputs to a neuron generally includes a bias input $b$ whose value is constant and equal to 1; sometimes it is also denoted as $x_0$ [35].

An activation function, sometimes called a transfer function, is a function applied to the weighted sum of the inputs and the bias, as shown in the following equation:

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$   (3.1)

The function can be of a linear or non-linear nature; commonly used choices include the pure-linear, sigmoid, hyperbolic tangent, and Gaussian functions. Figure 2 illustrates some of the commonly used functions.

Figure 2 Examples of transfer functions
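To make the neuron computation concrete, the following minimal sketch (written in Python with NumPy for illustration; the variable names and the choice of a logistic sigmoid are assumptions of the example, not specifications from this thesis) evaluates equation (3.1) for a single neuron:

import numpy as np

def logistic(v):
    """Logistic sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(x, w, b, f=logistic):
    """Evaluate equation (3.1): y = f(sum_i w_i * x_i + b)."""
    return f(np.dot(w, x) + b)

# Example: three inputs with arbitrary weights and a bias term
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w, b=0.2))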

Neurons are the building blocks for any type of network and always work in the form discussed above. These neurons are arranged and connected in a layered fashion; because data passes sequentially from one layer to the other, the first layer is called the input layer and the last layer is called the output layer. There are two types of neural networks: feed-forward and recurrent (feedback) neural networks. Both were utilized in this research and are discussed in the following sections.

3.2 Multilayer Feed-forward networks (FFNN)

Feed-forward neural networks are the most commonly used ANNs because of their simplicity. Information is transferred across the network in a forward manner, starting from the input layer, through one or more hidden layers, and out of the output layer (see Figure 3).


Figure 3 Feed-forward Neural Network

For a network with one hidden layer, if we denote $X$ as the input layer, $H$ as the hidden layer, and $Y$ as the output layer, the input is propagated through the hidden layer such that

$net_j(t) = \sum_{i=1}^{n} v_{ji} x_i(t) + b_h$   (3.2)

$h_j(t) = f_h\left(net_j(t)\right)$   (3.3)

where $h_j(t)$ is the output function for the $j$th hidden layer node for time $t$, $n$ is the number of inputs, $v_{ji}$ is the weight connecting the $j$th hidden node and the $i$th input node, and $b_h$ is the bias of the hidden layer. That function is then propagated to the output layer such that

$net_k(t) = \sum_{j=1}^{m} w_{kj} h_j(t) + b_o$   (3.4)

$y_k(t) = f_o\left(net_k(t)\right)$   (3.5)

where $y_k(t)$ is the output function of the $k$th output node for time $t$, $m$ is the number of hidden neurons, $w_{kj}$ is the weight connecting the $k$th output node and the $j$th hidden node, and $b_o$ is the bias of the output layer [37].
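As an illustration, the following short sketch (Python/NumPy; the layer sizes, names, and choice of activations are assumptions of the example, not the configuration used in this thesis) implements the forward pass of equations (3.2) through (3.5) for a single hidden layer:

import numpy as np

def forward_pass(x, V, b_h, W, b_o):
    """One forward pass through a single-hidden-layer FFNN.

    V : (m, n) weights from n inputs to m hidden nodes (eq. 3.2)
    W : (k, m) weights from m hidden nodes to k outputs (eq. 3.4)
    """
    h = np.tanh(V @ x + b_h)   # hidden activations, eqs. (3.2)-(3.3)
    y = W @ h + b_o            # linear output layer, eqs. (3.4)-(3.5)
    return y

rng = np.random.default_rng(0)
n, m, k = 4, 8, 1              # 4 inputs, 8 hidden neurons, 1 output
x = rng.normal(size=n)
print(forward_pass(x, rng.normal(size=(m, n)), np.zeros(m),
                   rng.normal(size=(k, m)), np.zeros(k)))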


The simplicity of FFNNs is the main reason for their popularity [34]. As the network is initialized, random weights are assigned to the network's neurons, and the weights are then modified so as to minimize the output error. In this research, only the supervised method is discussed, which is the training of a network through examples with which it can compare its output. Each input entry has a paired correct output. The network error is the difference between the correct output and the predicted output of the network. The network then works to minimize that error. The training algorithm usually associated with FFNNs is the error back-propagation algorithm. This algorithm works in two phases: first, the input signal propagates through the network in a forward direction with the weights being fixed; then the error is propagated backwards from the output layer to the input layer. The weights are then adjusted based on the error-correction rule [38].

3.3 Nonlinear autoregressive exogenous model neural networks (NARX)

NARX is a type of recurrent neural network. Recurrent networks are a more complex form of networks where connections between layers are cyclic, meaning that the output of neurons in a certain layer can be input to neurons in preceding layers. This creates a short-term memory for the network and allows it to behave in a dynamic temporal way [38]. Unlike FFNNs, recurrent networks depend not only on the inputs, but also on the state or the time sequence of the data. This feature has made them an interesting candidate for the study of financial time series [39].

For a NARX network, the output at time $t$ is used as an input at time $t+1$, as shown in Figure 4 [40]. For a network with one hidden layer, one output, and a recurrent connection from the output layer to the input layer, if we denote $X$ as the input layer, $H$ as the hidden layer, and $Y$ as the output layer, the input is propagated through the hidden layer and to the output layer in the same way as shown in equations (3.2) to (3.5); however, the recurrent input layer is denoted as

$u(t) = [x(t),\, y(t-1)]$   (3.6)

where $u(t)$ is the total input value [37].


Figure 4 Series NARX
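To illustrate this recurrence, the following minimal sketch (Python/NumPy; the stand-in one-step network and variable names are assumptions of the example) feeds each prediction back as part of the next input, as in equation (3.6):

import numpy as np

def narx_predict(x_series, step, y0=0.0):
    """Series NARX loop: the previous output y(t-1) is appended to the
    external input x(t) to form u(t) = [x(t), y(t-1)] (eq. 3.6)."""
    y_prev, outputs = y0, []
    for x_t in x_series:
        u_t = np.append(x_t, y_prev)   # recurrent input layer
        y_prev = step(u_t)             # one forward pass of the network
        outputs.append(y_prev)
    return np.array(outputs)

# Example with a stand-in "network": a fixed linear map of u(t)
x_series = np.random.default_rng(1).normal(size=(10, 3))
print(narx_predict(x_series, step=lambda u: 0.1 * u.sum()))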

Several sources state that NARX networks outperform other recurrent networks, like Elman networks, in forecasting time series in general and financial time series in particular [41], [42].

3.4 Back-propagation algorithm

The standard learning algorithm for ANNs is the back-propagation algorithm. There are several forms of this algorithm. The simplest one, however, follows the direction in which the error function decreases most rapidly (the negative gradient) to update the weights and biases. Thus, a single iteration can be written as:

$w_{k+1} = w_k - \alpha g_k$   (3.7)

where $w$ is the vector of weights and biases, $\alpha$ is the learning rate, and $g_k$ is the gradient [35].
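For illustration, a single gradient-descent update of equation (3.7) can be sketched as follows (Python/NumPy; the quadratic toy error surface and learning rate are assumptions of the example):

import numpy as np

def gradient_descent_step(w, grad, lr=0.1):
    """One iteration of eq. (3.7): move against the gradient."""
    return w - lr * grad(w)

# Toy error surface E(w) = ||w||^2, whose gradient is 2w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = gradient_descent_step(w, grad=lambda v: 2 * v)
print(w)  # approaches the minimum at the origin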

In general, the back-propagation algorithm looks for the minimum of the error function by following its gradient. For the purpose of computation, the function must be continuous and differentiable, and to ensure that, a proper activation function must be chosen. Figures 2a and 2b show two variations of the most popular activation function: the sigmoid [43]. For a neural network, the solution of the problem is the combination of weights that gives the minimum error. The weights are first initialized randomly; then the gradient of the error function is computed and used to correct the initial weights. This is done recursively until the error cannot decrease anymore, meaning a minimum is reached [43]. The error function most frequently used is the mean square error (MSE), which can be written as:

$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2$   (3.8)

where $y_i$ is the output of the neural network and $t_i$ is the target value. Figure 5 shows how this error function is computed per neuron; all values are then summed up.

Although back-propagation is usually used with FFNN networks, its use can also be extended to recurrent networks, including NARX networks [43], [44], [45].

Figure 5 Computation of the error function [42] Pg. 156
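As a worked example of equation (3.8), the small helper below (Python/NumPy; the sample values are made up for illustration) computes the MSE between network outputs and targets:

import numpy as np

def mse(outputs, targets):
    """Mean square error of eq. (3.8): average squared deviation."""
    outputs, targets = np.asarray(outputs), np.asarray(targets)
    return np.mean((outputs - targets) ** 2)

print(mse([0.9, 0.2, 0.4], [1.0, 0.0, 0.5]))  # 0.02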

Although the back-propagation algorithm has been used successfully in many real world applications, it suffers from two main drawbacks: slow learning speed and sensitivity to parameters [38]. For the purpose of improving on these drawbacks while still maintaining its positive features, several modifications to the algorithm were created, including gradient descent back-propagation with an adaptive learning rate and momentum coefficient, Quasi-Newton, resilient back-propagation, and the Levenberg-Marquardt algorithm.

The Levenberg-Marquardt algorithm outperforms simple gradient descent and generally performs well. It is thus one of the most widely used algorithms [46]. In fact, MathWorks has tested nine commonly used back-propagation algorithms and concluded that Levenberg-Marquardt has the fastest convergence and lowest mean square error for a function approximation problem built on a small to medium network. It was concluded that the resilient algorithm is the best for pattern recognition problems [47].

The Levenberg-Marquardt algorithm is a blend of simple gradient descent and Newton's method. While simple gradient descent suffers from various convergence problems, Newton's method improves on these problems by using second derivatives [46]. The Newton method can be expressed as:

$w_{k+1} = w_k - H_k^{-1} g_k$   (3.9)

where $k$ is the training step, $H$ is the Hessian matrix, and $g$ is the gradient. Although the Newton method provides good convergence, it is very expensive and complex to compute the Hessian matrix. Levenberg-Marquardt combines the complementary advantages of both gradient descent and Newton's method. It approaches the second-order convergence speed without having to calculate the Hessian matrix [47], [46]. Therefore, the Hessian matrix is approximated as:

$H = J^T J$   (3.10)

and the gradient as

$g = J^T e$   (3.11)

where $J$ is the Jacobian matrix (the first derivatives of the network errors) and $e$ is the vector of network errors. Calculating the Jacobian matrix is a significantly easier task than computing the Hessian matrix. Therefore, the Levenberg-Marquardt iteration can be written as:

$w_{k+1} = w_k - (J^T J + \mu I)^{-1} J^T e$   (3.12)

As the learning rate $\mu$ goes to zero, the algorithm becomes the Newton method as in equation (3.9); however, when $\mu$ increases, the algorithm becomes gradient descent with small steps. As the Newton method has fast convergence, the algorithm tends to shift towards it as fast as possible. Thus, $\mu$ decreases after each iteration if the performance function keeps decreasing, to reduce the influence of gradient descent. If the performance function increases, the value of $\mu$ increases again to follow the gradient more [44].
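The following compact sketch (Python/NumPy; a generic least-squares version with a made-up residual function, not the MATLAB implementation used in this thesis) shows one way equations (3.10) through (3.12), together with the damping update just described, can be realized:

import numpy as np

def levenberg_marquardt(w, residuals, jacobian, mu=0.01, iters=50):
    """Minimal Levenberg-Marquardt loop for least-squares problems.

    Implements w <- w - (J^T J + mu*I)^(-1) J^T e (eq. 3.12), decreasing
    mu after a successful step and increasing it after a failed one.
    """
    sse = lambda v: np.sum(residuals(v) ** 2)  # performance function
    for _ in range(iters):
        e, J = residuals(w), jacobian(w)
        step = np.linalg.solve(J.T @ J + mu * np.eye(len(w)), J.T @ e)
        w_new = w - step
        if sse(w_new) < sse(w):
            w, mu = w_new, mu / 10   # accept step, shift towards Newton
        else:
            mu *= 10                 # reject step, shift towards gradient descent
    return w

# Toy example: fit y = a*x + b to noisy data
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + np.random.default_rng(2).normal(scale=0.05, size=20)
res = lambda w: (w[0] * x + w[1]) - y
jac = lambda w: np.column_stack([x, np.ones_like(x)])
print(levenberg_marquardt(np.zeros(2), res, jac))  # approx [2.0, 1.0]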

Building proper neural networks requires the close study of all parameters and common practices. Chapter 5 further discusses the selection process of these parameters.


CHAPTER 4: Design of Experiments

Properly designing an experiment is of great importance when statistically analyzing an experimental model. There are several guidelines that must be followed. This chapter discusses these guidelines as well as the theory of experimental design and analysis.

4.1 Design of Experiments

An experiment is a systematic procedure done under controlled conditions to test a hypothesis about a process or system. In this research, neural networks served as the system experimented on, and their parameters were the factors tested. Experiments are used to find the significant factors that affect a process's outputs and the factor values that give the best output. Properly designing an experiment reduces time and narrows the focus to attaining the desired information.

In his work Design and Analysis of Experiments, Montgomery states that there are two aspects to any experimental problem: the design of the experiment and the statistical analysis of the data [48]. These two aspects are inter-related, and the success of one depends on the success of the other. Montgomery adds that "when a problem involves data that are subject to experimental errors, statistical methodology is the only objective approach to analysis" [48]. Table 2 summarizes the guidelines for properly designing an experiment, as discussed in Montgomery's work [48]. These guidelines were followed in this research, and their implementation is discussed in the following chapters.

When conducting an experiment, it is important to define all factors that affect the response. These factors include those of interest to the experimenter as well as other factors that need to be controlled so that their effect is not evident in the results. Randomization can minimize the effect of uncontrollable factors if they are present [49]. Furthermore, repetition is also an important concept in experimentation. Given the same treatment, it is unlikely that the experiment will yield the same results if repeated several times. Replication thus helps researchers attain more precise results by reducing their variability and increasing their significance. It allows for the calculation of pure error, which reflects the variability of observations within a single treatment. Replication is also used to check whether discrepancies are statistically different and whether the statistical model is adequate [50].

Table 2 Guidelines for designing an experiment [48] Pg 14

1. Definition of the problem statement
2. Choice of factors, levels, and ranges
3. Selection of response variables
4. Choice of the experimental design
5. Performing the experiment
6. Statistical analysis of the data
7. Conclusions and recommendations

4.2 Factorial and fractional factorial designs

The objective of conducting experiments is to investigate which factors are significant and which are not. However, the interactions between factors must also be taken into consideration. The presence of significant interactions means that a factor's effect changes at different levels of another factor. In such a scenario, a factorial design is appropriate, where factors are varied together [48]. Factorial designs with two levels are usually utilized. Here, each factor has a low and a high level only. For instance, a 2^4 factorial design is one with four factors, each having two levels. Since all possible combinations of levels of factors are used as treatments, a design with k factors will need 2^k runs. With a large number of factors this becomes infeasible, and so a fractional factorial design can be used instead.

A fractional factorial design is a variation of the original factorial design in which only a portion of the runs is performed [51]. It is usually represented in the form 2^(k-p), where k is the number of factors and 1/2^p is the fraction of the original design (see Figure 6). The fractional factorial design follows the assumptions that the system is dominated by main effects and low-order interactions and that higher-order interactions are insignificant. Following this assumption, higher-order interactions can be deliberately confounded (aliased) with main effects to reduce the number of runs required [51]. Confounding


occurs when the impact of a factor cannot be distinguished from that of another factor

[49].

Figure 6 Two-factor and Three-factor full factorial designs 
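The half-fraction construction can be made concrete in a few lines of code. The following Python sketch is illustrative only (the designs in this research were built with Minitab, as discussed in Chapter 5): it generates a 2^(5-1) design by crossing four factors fully and deriving the fifth factor from the generator E = ABCD.

    from itertools import product

    # Build a 2^(5-1) resolution V design: take a full 2^4 factorial in
    # A, B, C, D and derive the fifth factor from the generator E = ABCD.
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):  # full 2^4 factorial
        e = a * b * c * d                          # generator E = ABCD
        runs.append((a, b, c, d, e))

    print(len(runs))   # 16 runs instead of the 32 of a full 2^5 design
    print(runs[0])     # coded levels: -1 = low, +1 = high

Because E equals the product of the other four columns, each main effect is aliased only with a high-order interaction, which is what gives the half-fraction its high resolution.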

Before building a fractional factorial design the proper resolution must be chosen.

The commonly used resolutions are [51]:

- Resolution III: In these designs, main effects are not confounded with any other main effects, but main effects are confounded with two-factor interactions, and two-factor interactions are confounded with each other.
- Resolution IV: In these designs, main effects are not confounded with any other main effects or with two-factor interactions, but two-factor interactions are confounded with each other.
- Resolution V: In these designs, main effects are not confounded with other main effects, two-factor interactions, or three-factor interactions, but two-factor interactions are confounded with three-factor interactions.

It is common practice to follow recommended designs presented in textbooks in

order to produce a design with the highest resolution. Please refer to Montgomery [48]

Table 9-14 for such a guide.

4.3 ANOVA

Statistical analysis of the experimental model is important in determining how

likely it is that a result has occurred by chance alone. It answers the question of whether

an effect is statistically significant or not. Having a level of significance of p < 0.05

means that any effect that is likely to happen less than 5% of the time by chance is


statistically significant [52]. An ANOVA test compares the variance due to the factor under investigation with the variance due to chance; this is done using the F-test. Therefore,

F = MS_factor / MS_error    (4.1)

When conducting the F-test, the null hypothesis is Ho: the tested term is not significant. To test this hypothesis, the p-value associated with a significance level (usually 0.05) is checked. A term is said to be significant if its p-value is less than 0.05, i.e., Ho is rejected [48].

When conducting an ANOVA test, a typical generated table would appear similar to Table 3. In the table, d.f. is the degrees of freedom, SS is the sum of squares, MS is the mean square, and N is the total degrees of freedom (equal to the total number of treatments including repetition). The p-value is calculated based on the F value of the factors and the error. If the p-value < 0.05 (for a 95% confidence interval), then the Ho hypothesis is rejected and the factor is significant. The desired results are that a factor is significant and the error is not significant.

Table 3 ANOVA table
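To make equation (4.1) concrete, the following Python sketch computes the F statistic and its p-value for a single two-level factor with three replicates per level; the response values are invented for illustration, and the scipy F distribution supplies the p-value.

    import numpy as np
    from scipy import stats

    # Hypothetical MSE observations: three replicates at each factor level.
    low  = np.array([4.1e-4, 4.3e-4, 4.0e-4])
    high = np.array([3.2e-4, 3.5e-4, 3.3e-4])

    grand = np.concatenate([low, high]).mean()
    ss_factor = 3 * ((low.mean() - grand) ** 2 + (high.mean() - grand) ** 2)
    ss_error = ((low - low.mean()) ** 2).sum() + ((high - high.mean()) ** 2).sum()

    ms_factor = ss_factor / 1        # d.f. = levels - 1 = 1
    ms_error = ss_error / 4          # d.f. = N - levels = 6 - 2 = 4
    F = ms_factor / ms_error         # equation (4.1)
    p = stats.f.sf(F, 1, 4)          # right-tail p-value of the F distribution
    print(F, p)                      # reject Ho when p < 0.05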

Moreover, for the ANOVA test results to be valid, two points must be true:
- The data is normally distributed
- The errors are normally distributed and their variance is constant. Figure 7 shows a plot verifying constant variance, where the predicted values of the model are


 plotted against the residual. Here, the points are scattered randomly (no pattern) as

desired [52].

Figure 7 Checking constant variance with predicted vs. residual graph

4.4 Model graphs

Factor significance and behavior at different levels can be checked using model graphs. These graphs show how the response variable is affected by each factor at its different levels. Model graphs can show the effects of single factors, two-level interactions, and three-level interactions. When reviewing a single-factor graph, care should be taken with factors that are part of an interaction, since the interaction's impact is not visible there. Therefore, final conclusions cannot be made from single-factor graphs alone for factors that are part of significant interactions [49]. Figure 8 shows a single-factor graph and a two-factor interaction graph.

Figure 8 Single effect and interaction graphs


CHAPTER 5: Methodology

The aim of this research was to investigate the significance of each ANN parameter on different market classifications and to compare their behavior across different markets. This was tackled by following a systematic approach of experimentation. As mentioned in Chapter 2, design of experiments with statistical analysis leads to more informed results than a factor-at-a-time approach, and thus it was also used in this research. The guidelines for designing experiments (see Table 2) were followed to conduct the experimentation process. This chapter explains in detail how the research was designed and conducted.

5.1 Definition of the problem statement

The designed experiment aimed to answer the following questions:
1- When a group of ANN parameters is studied, which parameters are more significant to the ANN performance?
2- Are the significant ANN parameters the same for all market types? That is,
   a. Are significant parameters and their behavior similar for markets of the same classification?
   b. Is a significant parameter for a frontier market, for instance, also significant for other market types, or is it unique per market classification?

To tackle these questions, the first step was to choose markets and indices.
- Due to time and processing limitations, only two markets per classification were chosen, for a total of six markets.
- The MSCI classification system was used to define the frontier, emerging, and developed markets (as shown in Figure 9). According to this classification system, UAE and Jordan were chosen as frontier markets, Turkey and Egypt as emerging markets, and UK and Japan as developed markets.


Figure 9 List of market indices of each classification as listed by MSCI

- The MSCI indices average stocks across each country's overall market. Each of these indices was chosen in this research as input (training) data for the ANN system under study. Furthermore, using indices from the same source ensured that the effect of possible unknown lurking variables was minimized.


5.2 Choice of factors and levels

The second step in the process of designing the experiment was the choice of factors and levels. In the ANN context, the factors are the design parameters. In order to choose these factors, previous literature was reviewed to identify the best practices and the variables that are most important to ANN performance. Kaastra and Boyd [21] describe a method for designing a back-propagation ANN for financial forecasting in a step-by-step process (see Table 4). Other articles mentioned in the literature review section also describe similar procedures.

Table 4 Eight steps in designing a neural networks forecasting model by Kaastra and Boyd [21]

Step 1: Variable selection
Step 2: Data collection
Step 3: Data preprocessing
Step 4: Training, testing, and validation sets
Step 5: Neural network paradigms
        Number of hidden layers
        Number of hidden neurons
        Number of output neurons
        Transfer functions
Step 6: Evaluation criteria
Step 7: Neural network training
        Number of training iterations
        Learning rate
Step 8: Implementation

5.2.1 Variable selection

In the case of financial markets, defining the input data is of high importance since it greatly affects the generalization ability of an ANN. It is important to have a clear understanding of the problem. In the case of stock market prediction, previous prices and technical indicators are used as input data. As stated earlier in the literature review section, closing prices are widely used as inputs and as targets; that is, closing prices of previous days (e.g. days 1 to 5) can be used as inputs to the system to predict the closing price of the next day (all referenced papers use closing prices). Using technical indicators in conjunction with the previous days' closing price is also a common practice among


previous researchers, especially the moving average and the exponential moving average. Atsalakis and Valavanis [53] note that closing prices and technical indicators are common inputs among the one hundred surveyed works. Others like Desai et al. [26] conclude that using a moving average enhances the performance of ANNs, and Thenmozhi [54] states that the more recent the closing price is, the more effect it has on the result. This makes the exponential moving average an excellent candidate since it gives more weight to recent prices. When the simple moving average (SMA) is expressed as

SMA_t = (C_t + C_{t-1} + ... + C_{t-n+1}) / n    (5.1)

where C_t is the closing price, t is the day, and n is the total time period, the exponential moving average is expressed as

EMA_t = alpha * C_t + (1 - alpha) * EMA_{t-1}    (5.2)

where alpha = 2 / (n + 1). As can be seen, as a value goes further back in time, its influence on the result exponentially decreases.
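The two averages are straightforward to compute from a series of closing prices. The following Python sketch implements equations (5.1) and (5.2); seeding the EMA recursion with the first closing price is an assumed convention, since the initialization is not specified here.

    import numpy as np

    def sma(prices, n):
        # simple moving average of the last n closing prices, equation (5.1)
        return np.convolve(prices, np.ones(n) / n, mode="valid")

    def ema(prices, n):
        # exponential moving average, equation (5.2), with alpha = 2/(n + 1);
        # the recursion is seeded with the first price (an assumed convention)
        alpha, out = 2.0 / (n + 1), [prices[0]]
        for p in prices[1:]:
            out.append(alpha * p + (1 - alpha) * out[-1])
        return np.array(out)

    closes = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7])  # made-up data
    print(sma(closes, 3))
    print(ema(closes, 3))  # recent prices carry exponentially more weight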

5.2.2 Data collection

The data used are:

Frontier markets:

o  UAE MSCI index

o  Jordan MSCI index

Emerging markets:

o  Turkey MSCI index

o  Egypt MSCI index


Developed markets:

o  UK MSCI index

o  Japan MSCI index

All index data are from the past ten years: from August 1st, 2002 to July 31st, 2012.

5.2.3 Data preprocessing

Data preprocessing refers to the transformation of input and output data for the sake of reducing noise, detecting patterns, and flattening the data distribution to assist ANNs in learning relevant patterns. "Since neural networks are pattern matchers, the representation of the data is critical in designing a successful network" [21]. Raw inputs and outputs are rarely fed to the ANN directly; instead, preprocessed data is used because it tends to help ANNs learn patterns better [14].

Firstly, the frequency of the financial data should be constant throughout the data

set. This means it can be daily, every two days, weekly, or any other frequency, as long

as it is fixed. If this is not the case, some techniques must be used to estimate the missing

data. In this research, daily data (five business days per week) was used with no missing

values. Therefore, no correction was needed. Next is the scaling and normalization of

data. Here, both input and output data were scaled to common minimum and maximum

values, usually -1 and 1 or 0 and 1, depending on the transfer function used. In the case

that input and output data ranges do not match, the ANN faces a harder time learning

their relationship, whereas rescaling them to the same ranges greatly facilitates the

learning process. Moreover, this matches the range of data with the range of transfer

functions used [14].
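A minimal Python sketch of this rescaling step is given below; it maps a series linearly onto [-1, 1], the same job performed by MATLAB's mapminmax function.

    import numpy as np

    def scale_minmax(x, lo=-1.0, hi=1.0):
        # linearly rescale a series to [lo, hi] so inputs and outputs share
        # the output range of the tan-sigmoid transfer function
        return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

    closes = np.array([100.0, 101.5, 99.8, 102.3, 103.1])  # made-up prices
    print(scale_minmax(closes))  # all values now lie in [-1, 1]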

5.2.4 Training, testing, and validation sets

A common practice in training ANNs is to divide the data into training, testing,

and validation sets. The training set is the largest and is used to train the ANN to learn

data patterns. Next, a smaller testing set is used to test the generalization ability of the

newly trained network. Finally, the validation set is used to double check the


performance of the network. To properly divide the data, some points must be followed regarding each set. The training set should be the largest, to give the network enough data to learn, and the validation set should be chosen so as to balance having a large enough sample to evaluate the network against leaving enough data for the other two sets. Kaastra and Boyd [21] recommend a 10% to 30% portion for the testing data, and Zhang et al. [14] recommend 70% to 90% for the training data. Therefore, a 70-15-15 split is acceptable and was thus used in this research.
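Such a split takes only a few lines. The following Python sketch divides a series chronologically into 70-15-15 portions; preserving time order is an assumption consistent with forecasting practice rather than a rule stated here.

    import numpy as np

    def split_70_15_15(series):
        # chronological 70-15-15 split into training, testing, and validation
        # sets; time order is preserved so evaluation uses only later data
        n = len(series)
        i, j = int(0.70 * n), int(0.85 * n)
        return series[:i], series[i:j], series[j:]

    data = np.arange(100)  # stand-in for a scaled daily closing-price series
    train, test, val = split_70_15_15(data)
    print(len(train), len(test), len(val))  # 70 15 15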

5.2.5 Neural network paradigms

Neural networks can be built in an infinite number of ways. The ANN architecture depends on the number of input neurons, the number of hidden layers, the number of neurons in each hidden layer, the number of output neurons, and the types of connections between all these neurons. Another aspect is the transfer function applied to each neuron, which also has many options. In order to limit this research to certain values for each of the parameters, a review of previous practices was conducted. Table 5 summarizes the past practices of some previous applications.

- Number of input neurons: this is the same as the number of inputs; therefore, one input neuron is assigned for the closing price and another is assigned for the EMA.
- Number of hidden layers: the number of hidden layers depends on the number of inputs and the nature of the data. However, it is commonly accepted that one or two are enough, because more layers increase the danger of over-fitting (the ANN memorizes exact answers instead of learning how to predict) [55]. It is also accepted that one layer with a sufficient number of neurons is enough for good approximations [36].
- Number of hidden layer neurons: this is a function of the number of input neurons, commonly ranging between one half and three times the number of input neurons [56], [57]. For this research, since the number of input neurons is only two, the number of neurons in the hidden layer ranges from two to six.


- There is only one output neuron, which is for the forecasted closing price.
- Transfer function for each neuron: the most commonly used transfer functions are pure-linear and sigmoid functions because of their continuity [2]. The hyperbolic tangent sigmoid (tan-sig) and pure linear functions are both used in this research. The tan-sig function accepts inputs in any range and non-linearly transforms them into a value between -1 and 1. The function can be expressed as

f(x) = 2 / (1 + e^(-2*beta*x)) - 1    (5.3)

where x is the input and beta is a variable that controls the shape of the function. The larger the value of beta, the closer the function gets to the step function, and the smaller the value, the closer it gets to a pure linear function. The variable beta is varied in the experiment to test the effect of different shapes of the tan-sig function on ANN performance (see the sketch after this list).

- Error function: the mean square error (MSE) function is used for training. It was also used as a response variable for the statistical analysis of the conducted experiment [58], [59], [60].
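The following Python sketch evaluates the parameterized tan-sig function of equation (5.3) at the two shape values used later in the experiments.

    import numpy as np

    def tansig(x, beta=1.0):
        # hyperbolic tangent sigmoid with shape parameter beta, equation (5.3):
        # large beta approaches a step function, small beta is nearly linear
        return 2.0 / (1.0 + np.exp(-2.0 * beta * x)) - 1.0

    x = np.linspace(-2, 2, 5)
    print(tansig(x, beta=1))  # wide curve (low level in the experiments)
    print(tansig(x, beta=3))  # narrow, steeper curve (high level)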

5.2.6 Evaluation criteria

 Neural network performance is evaluated using specified performance functions.

Many error functions are used, such as the least absolute deviations, sum of squared

errors, and mean absolute percentage error. However, the mean squared error (MSE) is

the most common and was therefore used in this research for ANN training. It was also

used as the response variable for experimental analysis.

5.2.7 Neural network training

An ANN is trained by iteratively feeding it inputs and presenting it with the correct answers. After the training process is over, the ANN is meant to provide good generalization. The main aim of training is to reach the global minimum of the error function. To do this, a training algorithm is used, usually gradient-descent back-propagation, to go down the steepest slope of the error function. There are


many variations of the back-propagation algorithm, which have been developed to improve training for different types of problems, reduce training time, and lower memory requirements. In order to select which back-propagation algorithm to use, different choices were surveyed.

Table 5 ANN configurations of previous works

Paper | Input variables | Hidden layers | Neurons in hidden layers | Transfer functions | Error functions
Zhang et al. [14] | Closing price | Mostly 1, sometimes 2 | Between "inputs/2" and "2(inputs)+1" | Sigmoid functions | MSE, SSE, RMSE, MAPE for comparison
Lawrence [17] | - | Usually 1 | - | Sigmoid | -
de Faria et al. [18] | Closing price | 1 | Between "inputs/4" and "2(inputs)" | - | RMSE
Assaleh et al. [19] | Closing price | 1 | - | - | MSE, MAPE for comparison
Zekic [20] | Closing price and other technical indicators | Mostly 1, sometimes 2 | Between "inputs/3" and "2(inputs)" | Sigmoid | -
Kaastra and Boyd [21] | Closing price and other technical indicators | Mostly 1 | Between "inputs/2" and "3(inputs)" | Sigmoid | MSE
Patel and Marwala [22] | Closing price and moving average | 1 | 5 to 150 | Sigmoid | RMSE
Yao et al. [23] | - | 1 | Between "inputs/2" and "inputs+1" | Hyperbolic and sigmoid | -
Yixian et al. [39] | Closing price | 2 | "2(inputs)+1" | Sigmoid | -
Maria et al. [41] | Past values | 2 | "2(inputs)+1" | - | MSE
Isfan et al. [24] | Closing price | 1 | "2(inputs)+2" | Sigmoid and linear | MSE
Desai et al. [26] | Closing price and moving average | 1 | Between "inputs/2" and "2(inputs)" | Sigmoid | MSE
Mehrara et al. [27] | Moving average | 2 | "2(inputs)+2" | Sigmoid, linear, Volterra | MSE and MAPE
Mostafa [28] | Closing price | 1 | - | Sigmoid | MSE and RMSE
de Oliveira [29] | Closing price | 1 | Between "input" and "input*4" | - | MSE
Katz [57] | Closing price, moving average | 1 | Between "input/2" and "input*3" | Sigmoid | MSE, RMSE, MAPE

The gradient descent back-propagation with adaptive learning rate and

momentum coefficient (GDX) is one of the most basic forms and many pilot studies use

it. MathWorks, however, conducted a study comparing nine algorithms in different types

of problems, speed, and memory requirements. The results reveal that the Levenberg-


Marquardt algorithm is the fastest and most suitable for function approximation problems and small to medium networks. The resilient algorithm is the best performer for pattern recognition problems, whereas the gradient descent with learning rate and momentum is viewed as suitable only for larger networks. Since the maximum architecture to be used in this research is 2-6-1 and the problem is function approximation, Levenberg-Marquardt was chosen for training. In order to double-check the suitability of the chosen algorithm, an initial test was conducted in which 600 different ANNs were trained with the gradient descent (GDX), resilient, and Levenberg-Marquardt (LM) algorithms, while different combinations of learning rate and momentum coefficient were tested for the GDX. The results show that the LM performs best and gives the minimum MSE, as shown in Figure 10.

Figure 10 Mean square error of ANN trained with 3 back-propagation algorithms

Another aspect of training is the stopping criteria. Since the network is trained in

iterations, also called epochs, the number of these iterations must be specified to guide

the training process. The number of iterations can be specified before training starts, but

this method has two disadvantages. If the network converges very quickly, overtraining

will occur, but if the opposite happens then training would stop before the network is able

to converge [61]. On the other hand, the convergence method, which was used in this


research, solves this problem by stopping when the performance function stops improving, i.e. when the network converges [21]. Figure 11 shows a network stopping after the error stopped improving. Therefore, while training the ANN, the maximum number of epochs is fixed to a high value that is not likely to be reached (1000), to give training enough trials to converge.

Moreover, other stopping criteria include the maximum time for training. Since

the LM algorithm is usually very fast, training time is usually very short. However,

recurrent NARX networks tend to take a longer time and thus a maximum time of ten

minutes is set to stop training.
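A simplified sketch of these stopping rules is given below; this is assumed logic for illustration, not the internals of the MATLAB toolbox used in this research.

    import itertools
    import time

    def train(step, max_epochs=1000, max_seconds=600, tol=1e-12):
        # stop when the error stops improving (convergence), when the epoch
        # cap is reached, or when the wall-clock limit expires
        start, best = time.time(), float("inf")
        for _ in range(max_epochs):
            err = step()                 # one training epoch; returns the MSE
            if best - err < tol:         # performance stopped improving
                break
            best = err
            if time.time() - start > max_seconds:
                break
        return best

    fake_errors = itertools.chain([0.5, 0.2, 0.1], itertools.repeat(0.1))
    print(train(lambda: next(fake_errors)))  # converges and stops early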

Figure 11 Training stopping due to convergence

5.2.8 Implementation

After all parameters had been investigated in the literature review and their values had been chosen, the neural networks' building and training process was ready. Table 6 shows a summary of all parameters. Some were fixed, and others, the ones seen to be most important, were varied to investigate their statistical significance. The parameters that were varied were chosen as the experimental design factors. MATLAB neural


networks toolbox was used to build and train all ANNs for this experiment. Figure 12 shows a sample ANN in the training process.

Table 6 Summary of the factor settings to be used in this experiment and their values

Input variables | Closing price and EMA | Fixed
Output variable | Closing price | Fixed
Training, testing, validation | 70-15-15 | Fixed
ANN type | FFNN, RNN (NARX) | Experiment factor
Number of input neurons | 2 | Fixed
Number of hidden layers | 1 | Fixed
Number of neurons in hidden layer | 2 to 6 | Experiment factor
Number of output neurons | 1 | Fixed
Hidden layer transfer function | Hyperbolic tangent sigmoid shape (beta in equation 5.3) | Experiment factor
Output layer transfer function | Hyperbolic tangent sigmoid and pure linear | Experiment factor
Training algorithm | Levenberg-Marquardt back-propagation | Fixed
Training algorithm learning rate (Mu) | 0.001 to 1 | Experiment factor
Training algorithm learning rate updates (Mu step up and Mu step down) | 10 and 0.1 | Fixed
Error (performance) function | MSE | Fixed
Max epochs | 1000 | Fixed
Max training time | 10 mins | Fixed

5.3 Selection of the response variable

The response variable chosen was the mean square error (MSE), which is calculated as follows:

MSE = (1/N) * sum_{i=1..N} (y_i - t_i)^2    (5.4)


where y_i is the output of the neural network and t_i is the target value. MSE was chosen because it was used to train the neural networks, and would therefore better reflect the behavior of the network in the experiment.
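Computed directly, equation (5.4) is a one-liner, as the following Python sketch with made-up output and target values shows.

    import numpy as np

    def mse(outputs, targets):
        # equation (5.4): mean squared deviation of the network outputs y_i
        # from the target closing prices t_i
        return np.mean((np.asarray(outputs) - np.asarray(targets)) ** 2)

    print(mse([0.52, 0.48, 0.61], [0.50, 0.49, 0.60]))  # made-up values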

Figure 12 ANN training in progress  –  MATLAB

5.4 Choice of experimental design

The choice of experimental design involves the choice of sample size, which is

the number of replicates, the run order for the experimental trials, and whether blocking

is required or not. There are many books and guideline sources that provide

recommendations regarding the suitable designs given the number of factors and levels.

This research used Montgomery [48] as a catalog for selecting an appropriate design.

Minitab, a statistical software package, was used in this study to facilitate the building

and analysis of statistical design models. Two models were built, one four-factor full


factorial design, and one five-factor fractional factorial design. These designs assumed the following:
- The experimental data are normally distributed
- Errors are normally distributed and have a constant variance
- Factor effects are linear

5.4.1 Model 1

This model examines the effects of four factors and their interactions; each factor is varied along two levels: low and high. Factors can be categorical, like the network type and the output layer transfer function, or numeric, as in the case of the number of hidden layer neurons and the learning rate Mu. This experimental design was a full factorial, which means that no confounding existed between any two terms.

Design model: four-factor full factorial design (2^4)
Replications: 3
Number of runs: 48
Factors and levels: see Table 7

Table 7 Model 1

Factor | Name | Low | High
A | Type | FFNN | RNN
B | H-neurons | 2 | 6
C | Output TF | Tan | Linear
D | Mu | 0.001 | 1

5.4.2 Model 2

This model examined the effects of five factors and their interactions; each factor was

varied along two levels: low and high. The shape of the hidden layer sigmoid function

was varied in this model in addition to the same factors used in the previous model. This


experimental design was a fractional factorial, which means that a confounding effect existed; however, higher-order interactions were assumed to be negligible.

Design model: five-factor fractional factorial design of resolution V (2^(5-1) res. V)
Replications: 3
Number of runs: 48
Factors and levels: see Table 8

Table 8 Model 2

Factor | Name | Low | High
A | Type | FFNN | RNN
B | H-neurons | 2 | 6
C | Hidden Sigmoid | 1 | 3
D | Output TF | Tan | Linear
E | Mu | 0.001 | 1


CHAPTER 6: Results

The results of the study are presented in this chapter along with explanations

regarding these results. The objectives of this thesis are: to identify important parameters

of financial forecasting ANNs, to develop properly designed experiments to identify the

significant parameters in each market, and to compare the similarities and differences of

the results between developed, emerging, and frontier markets. In order to achieve these

objectives, a two-step approach was taken: the first step was to identify the most

important parameters in building ANNs and survey previous literature to find the best

 practices. This step was accomplished in the previous chapter (Chapter 5: Methodology).

The next step was to recognize the ANN parameters that can be varied along with their

suitable levels, then to develop designed experiments to produce meaningful results.

Two designed experiments were conducted with two models: a four-factor full

factorial design (2^4), and a five-factor fractional factorial design (2^(5-1)). These models

were used to study the degree of significance of the chosen factors on the mean square

error (MSE) of six chosen markets that represent frontier, emerging, and developed

markets. The six markets studied were: UAE and Jordan as frontier markets, Turkey and

Egypt as emerging markets, and UK and Japan as developed markets. Figure 13 shows

the data fed to the ANN for each market. These are closing prices of the MSCI country indices from the 1st of August 2002 till the 31st of July 2012, spanning a ten-year period. This period was chosen because it is long enough to contain both market ups and downs, which was thought to help the ANN generalize better. It should be noted, however, that the UAE market is new and thus data before June 2005 is not available.


Figure 13 Markets' Data: a) UAE, b) Jordan, c) UK, d) Japan, e) Turkey, f) Egypt


6.1 Model 1

The first experimental model was a four-factor full factorial design where only four parameters were varied, with two levels each. The number of experimental treatments was 2^4 (16), and each treatment was repeated three times in order to compute the pure error and achieve more accurate results; this made the total number of iterations forty-eight. There were also six response variables, which were the mean square errors (MSE) of each market.

The four factors varied were: the network type, being either feed-forward or recurrent (NARX); the number of neurons in the hidden layer; the starting learning rate of the LM algorithm; and the type of transfer function used in the output layer. Please refer to Chapter 3 for a description of feed-forward and NARX networks and the LM back-propagation algorithm. Also refer to section 5.2 for a discussion of the choice of factors and levels, and Table 6 for a summary of the final choices. Table 9 gives a summary of the 2^4 model and Table 10 shows the response variables.

Table 9 Four-factor full factorial (2^4)

Factor | Name | Low | High
A | Network type | FFNN | RNN
B | H-neurons | 2 | 6
C | Output TF | Tan | Linear
D | Mu | 0.001 | 1

Table 10 Response variables

MSE UAE

MSE Jordan

MSE Turkey

MSE Egypt

MSE UK

MSE Japan

Therefore, the total number of ANNs created for this model was 48 x 6 = 288.


MATLAB's neural network toolbox was used to create and train the ANNs. The training, testing, and validation portions were divided as 70-15-15, and the MSE was recorded for each run. All results were entered into Minitab for statistical analysis. The analysis was completed as follows:

1.  The analysis of variance (ANOVA) test was run to check for factor significance
2.  All significant factors and interactions were chosen; here, it was assumed that three- and four-level interactions are negligible (ignored) for simplicity
3.  The validity of the model was checked by verifying the two assumptions of normally distributed residuals and constant variance of residuals
4.  Model graphs were produced to check the behavior of factors and interactions
5.  The significant factors and their behaviors (increasing or decreasing) were compared across markets along with each market's regression model
6.  Final conclusions were made

6.1.2  ANOVA results and significant factors

The analysis of variance, ANOVA, is used to test hypotheses concerning means when more than one group is involved. ANOVA checks whether the means among two or more groups are equal under the assumptions of normal distribution and constant variance. The null hypothesis is as follows: Ho: the tested term is not significant. For the null hypothesis to be rejected and the term to be considered significant, assuming a 95% confidence interval, the p-value should be < 0.05. The lower the p-value, the more significant the term is. Please refer to Chapter 4 for a more in-depth explanation of how the ANOVA table is constructed and how values are calculated.

After running the ANOVA test and finding the significant factors, graphical methods were then used to represent them and their interactions. Appendix A shows the half-normal probability plot with the significant factors selected. The half-normal plot shows the position of each term's mean relative to the means of other factors and interactions. All values are absolute, and the further away a term is from the normal line, the more significant it is. Pareto charts can also be used to check for significant


factors and interactions; these are also added in Appendix A. The ANOVA test table and calculations are also presented in Appendix B. For the results of the ANOVA test to be meaningful, the assumptions of normally distributed residuals and constant variance of residuals must be true. These assumptions were tested and verified for this model. The results and evaluation can be found in Appendix C.

Table 11 Significant effects for model 1

Response | Significant effects
UAE MSE | B, C, D, BC, AB
Jordan MSE | B, C, BC
Egypt MSE | B, C, BC
Turkey MSE | C, BC
UK MSE | none
Japan MSE | none

After conducting a 2x2x2x2 ANOVA test with a 95% confidence interval, the list

of significant effects for each market MSE was retrieved. Table 11 summarizes the list of

significant effects for each response.

Frontier markets:

o  UAE MSE: There is a main effect of number of hidden layer neurons (B) with

networks made of six neurons having a lower average MSE than networks with two

neurons, F  (1, 41) = 7.93, p<0.05. There is also a main effect of output layer transfer

function (C) where networks with a pure linear function in the output layer result in

a lower average MSE than networks with a hyperbolic tangent function, F  (1, 41) =

26.68, p<0.05. Moreover, there is a main effect of learning rate Mu (D) where

initializing it to 1 results in a lower average MSE than initializing it to 0.001, F  (1,

41) = 8.93, p<0.05. Furthermore, there is an interaction between the type of

network and the number of hidden layer neurons (AB), with a much larger

difference between having two or six hidden neurons when the network type is

recurrent NARX. When the ANN is feed-forward, changing the number of hidden

neurons does not cause a significant difference in mean MSE. Only when the ANN

is recurrent (NARX) does the size of the network make a difference. On average,

having a NARX network with six neurons in the hidden layer gives the lowest


average MSE, F (1, 41) = 5.42, p<0.05. Finally, there is also an interaction between the number of hidden layer neurons and the output layer transfer function (BC); when the number of neurons is two, the effect of changing the output transfer function from hyperbolic tangent to pure linear is stronger than when there are six neurons in the hidden layer. This being said, a network with six hidden neurons gives a lower average MSE, F (1, 41) = 7.15, p<0.05. Figure 14 shows the significant main effects and Figure 15 shows the significant interactions. Please refer to Appendix B for the full ANOVA results table. The regression equation is:

MSE UAE =

+4.769E-004

-2.457E-006 * A

-3.721E-005 * B

-6.825E-005 * C

-3.949E-005 * D

-3.077E-005 * A * B

+3.534E-005 * B * C 
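As an illustration, this coded regression model can be evaluated at -1 (low) and +1 (high) factor levels to predict the mean MSE of a treatment; the following Python sketch simply evaluates the equation above and is not part of the original analysis.

    # Evaluate the coded regression model for UAE MSE (factors per Table 9,
    # coded -1 = low, +1 = high). Illustrative only; coefficients as above.
    def mse_uae(A, B, C, D):
        return (4.769e-4
                - 2.457e-6 * A - 3.721e-5 * B - 6.825e-5 * C - 3.949e-5 * D
                - 3.077e-5 * A * B + 3.534e-5 * B * C)

    # NARX (A=+1), six hidden neurons (B=+1), linear output (C=+1), Mu=1 (D=+1)
    print(mse_uae(+1, +1, +1, +1))  # predicted mean MSE for this treatment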

Figure 14 UAE MSE significant main effects

Figure 15 UAE MSE significant interactions


o  Jordan MSE: There is a main effect of number of hidden layer neurons (B) with

networks made of six neurons having a lower average MSE than networks with two

neurons, F  (1, 44) = 16.55, p<0.05. There is also a main effect of output layer

transfer function (C) where networks with a pure linear function in the output layer

result in lower average MSE than those with a hyperbolic tangent function, F  (1, 44)

= 33.72, p<0.05. Moreover, there is also an interaction between the number of hidden layer neurons and the output layer transfer function (BC). When the number of neurons is two, the effect of changing the output transfer function from hyperbolic tangent to pure linear is stronger than when there are six neurons in the hidden layer. This being said, a network with six hidden neurons gives a lower average MSE, F (1, 44) = 26.2, p<0.05. Figure 16 shows the significant main effects and Figure 17 shows the significant interactions. Please refer to Appendix B for the full ANOVA results table. The regression equation is:

MSE Jordan =

+3.882E-004

-3.090E-005 * B

-4.412E-005 * C

+3.889E-005 * B * C 

Figure 16 Jordan MSE significant main effects

Figure 17 Jordan MSE significant interactions


Both frontier markets have the main effects of the number of hidden layer neurons (B) and the output layer transfer function (C). They also both have an interaction between the number of hidden layer neurons and the output layer transfer function (BC). However, UAE MSE has a main effect of learning rate (D) and an interaction between the type of network and the number of hidden layer neurons (AB); both of these effects are absent for Jordan MSE.

o  Emerging markets:

o  Egypt MSE: There is a main effect of number of hidden layer neurons (B) with

networks made of six neurons having a lower average MSE than networks with two

neurons, F  (1, 44) = 28.09, p<0.05. There is also a main effect of output layer

transfer function (C) where networks with a pure linear function in the output layer

result in lower average MSE than those with a hyperbolic tangent function, F  (1, 44)

= 21.12, p<0.05. Moreover, there is an interaction between number of hidden layer

neurons and output layer transfer function (BC); when the number of neurons is

two, changing the output transfer function from hyperbolic tan to pure linear is

stronger than when there are six neurons in the hidden layer. Here, it is observed

that a network with six hidden neurons gives lower average MSE when the output

transfer function is hyperbolic tangent, yet a network with two hidden neurons

 performs better when the output transfer function is pure linear. In general, a

network with six hidden neurons performs better than a network with two hidden

neurons, F  (1, 44) = 25.2, p<0.05. Figure 18 shows the significant main effects and

Figure 19 shows the significant interactions. Please refer to Appendix B for the full

ANOVA results table. The regression equation is:

MSE Egypt =

+3.532E-004

-3.850E-005 * B

-3.338E-005 * C
+3.647E-005 * B * C


Figure 18 Egypt MSE significant main effects

Figure 19 Egypt MSE significant interactions

o  Turkey MSE: There is a main effect of output layer transfer function (C) where

networks with a pure linear function in the output layer result in lower average

MSE than networks with hyperbolic tangent function, F  (1, 44) = 5.06, p<0.05.

Moreover, there is an interaction between the number of hidden layer neurons and

output layer transfer function (BC); when the number of neurons is two, changing

the output transfer function from hyperbolic tangent to pure linear has a stronger effect than when there are six neurons in the hidden layer. Here, it is observed that a network with two hidden neurons and a pure linear output transfer function gives the lowest average MSE, yet a network with two hidden neurons performs worst when the output transfer function is hyperbolic tangent. A network with six hidden neurons gives a medium average MSE and is largely indifferent to the type of output transfer function used, F (1, 44) = 5.49, p<0.05. Figure 20 shows the significant main effects and Figure 21 shows the significant interactions. Please refer to Appendix B for the full ANOVA results table. The regression equation is:

MSE Turkey =

+8.714E-004

-1.193E-005 * B

-3.021E-005 * C

+3.148E-005 * B * C 


Figure 20 Turkey MSE significant main effects

Figure 21 Turkey MSE significant interactions

Both emerging markets have the main effect of the output layer transfer function (C) and an interaction between the number of hidden layer neurons and the output layer transfer function (BC). However, Egypt MSE has a main effect of the number of hidden layer neurons (B) that is not present in Turkey MSE.

o  Developed markets:

o  Both developed markets' responses, UK MSE and Japan MSE, have neither main effects nor any interactions that are statistically significant. This means that all neural networks built for these markets perform equally well on average; none of the four studied factors is important given this experimental model.

6.1.3 Observations

The main effects of the number of hidden layer neurons (B) and the output layer transfer function (C), as well as their interaction (BC), are significant for all frontier and emerging markets' responses. It could be generally assumed that these factors and their interaction are always significant due to their importance in building ANNs for forecasting financial


time series of the tested frontier and emerging markets. Furthermore, not only are these effects and their interaction mutually significant, but their behavior is similar too. For all of these markets, the number of hidden layer neurons (B) causes a decrease in MSE when changed from low (2) to high (6). This effect is strongest for Egypt, then UAE, Jordan, and Turkey (in descending order). Similarly, the output transfer function (C) reduces the MSE when changed from low (tan) to high (pure linear) in all four frontier and emerging markets. It is strongest for UAE, then Jordan, Egypt, and Turkey (in descending order). Finally, the two factors' interaction always shows that the change of output transfer function matters more when there are two hidden layer neurons. It is strongest for Jordan, then Egypt, UAE, and Turkey (in descending order). Thus it can be concluded that the effects for all frontier and emerging markets are similar in behavior, yet there is no meaningful order to their strength. Moreover, the tested developed markets agree in that no main effect or interaction is significant. Finally, it is surprising in this experiment that the type of network used is not a significant main effect in any of the markets' responses. Using recurrent NARX networks, which are considered better estimators for time-series data, does not make a clear improvement over feed-forward networks for most market MSEs (except for UAE MSE when six neurons are present in the hidden layer). At least for this model, the difference in average MSE between the two network types is statistically insignificant.

All in all, it can be concluded from this model that although frontier and emerging markets may not agree on the full list of significant factors, there tend to be common significant effects and interactions that behave similarly for all markets (increasing in all cases or decreasing in all cases); yet there does not seem to be an informative order for the strength of effects between markets. There are also common non-significant effects for all markets. However, developed markets' ANNs are unique in the sense that no parameter change causes a significant change in the average MSE.

6.2 Model 2

In the second experimental model, an extra factor was added to the previously

studied factors. The fifth factor was the shape of sigmoid function in the hidden layer.

The sigmoid function is the most widely used transfer function for the hidden layer as


stated in Chapter 5. Its shape was modified in this experiment to examine the effect of this change on the statistical significance of the function with respect to ANN performance. The hyperbolic tangent sigmoid function can be described as follows:

f(x) = 2 / (1 + e^(-2*beta*x)) - 1    (6.1)

where beta is a variable that controls the shape of the function, and is varied in this experiment between two levels. The two levels are low = 1 and high = 3. Figure 22 shows the shape of the function in both cases, where the red line shows the function with beta = 1, and the blue line shows the function with beta = 3.

Figure 22 Hyperbolic tangent sigmoid transfer function

The experimental model is a five-factor fractional factorial design, with the number of experimental treatments being 2^(5-1) (16). Each treatment is repeated three times to compute the pure error and get more accurate results; thus the total number of iterations is forty-eight. There are six response variables, which are the mean square errors (MSE) of each market. The model is a resolution V model, meaning that main effects are not confounded with other main effects, two-factor interactions, or three-factor interactions, but two-factor interactions are confounded with three-factor interactions. In a fractional factorial design, one is willing to pay the price of confounding, losing the ability to estimate interactions between three or more factors, in exchange for the ability to study more factors with fewer runs. Since this study considers higher-order interactions to be negligible, this trade-off is favorable. Table 12 gives a summary of the 2^(5-1) res. V model,


and Table 13 shows the response variables. Table 14 lists the terms that are confounded

in the experimental model.

Table 12 Five-factor fractional factorial (2^(5-1) res. V)

Factor | Name | Low | High
A | Network type | FFNN | RNN
B | H-neurons | 2 | 6
C | H-Sigmoid | 1 | 3
D | Output TF | Tan | Linear
E | Mu | 0.001 | 1

Table 13 Response variables

MSE UAE

MSE Jordan

MSE Turkey

MSE Egypt

MSE UK

MSE Japan

Table 14 Aliases for model 2

 A=BCDE

 B=ACDE

C=ABDE

 D=ABCE

 E=ABCD

 AB=CDE

 AC=BDE

 AD=BCE

 BC=ADE

 BD=ACE

 BE=ACD

CD=ABE

CE=ABD

 DE=ABC

6.2.2 ANOVA results and significant factors

For this experimental model, a 2x2x2x2x2 ANOVA test was conducted. The null hypothesis is as follows: Ho: the tested term is not significant. For the null hypothesis to be rejected and the term to be considered significant, assuming a 95% confidence


interval, the p-value should be < 0.05. The lower the p-value, the more significant the term is. After running the ANOVA test and finding the significant factors, graphical methods were used to represent the significant factors and their interactions. Appendix A shows the half-normal probability plot with the significant factors selected. Pareto charts can also be used to check for significant factors and interactions; these are added in Appendix A. The ANOVA test table and calculations are also presented in Appendix B. For the results of the ANOVA test to be valid, the assumptions of normally distributed residuals and constant variance of residuals must be true. These assumptions were tested and verified for this model; the results and evaluation can be found in Appendix C.

Table 15 Significant effects for model 2

Response | Significant factors
UAE MSE | B, C, D
Jordan MSE | B, C, D, BC, BD
Egypt MSE | A, B, C, D, BC, E, CD, AE, AC
Turkey MSE | A, B, C
UK MSE | C, D, BE, CD, AB, AE
Japan MSE | C, D, BC, CD, AE

After conducting a 2x2x2x2x2 ANOVA test with a 95% confidence interval, the

list of significant effects for each market MSE is retrieved.  Table 15 summarizes the list

of significant effects for each response.

o  Frontier markets:

o  UAE MSE: There is a main effect of the number of hidden layer neurons (B)

with networks made of six neurons having a lower average MSE than networks

with two neurons, F  (1, 44) = 6.12, p<0.05. In addition, there is a main effect of

the hidden sigmoid function (C) where a wider function in level low (1) gives a

 better performance than a narrower function in level high (3), F  (1, 44) = 16.92,

p<0.05. There is also a main effect of output layer transfer function (D) where

networks with pure linear function in the output layer result in a lower average

MSE than networks with hyperbolic tangent function, F  (1, 44) = 7.86, p<0.05.

There are no interactions for this response. Figure 23 shows the main effects.


Please refer to Appendix B for the full ANOVA results table. The regression

equation is:

MSE UAE =

+5.415E-004

-4.054E-005 * B
+6.743E-005 * C

-4.594E-005 * D 

Figure 23 UAE MSE significant main effects

o  Jordan MSE: There is a main effect of the number of hidden layer neurons (B)

with networks made of six neurons having a lower average MSE than networks

with two neurons, F  (1, 42) = 26.87, p<0.05. In addition, there is a main effect of

the hidden sigmoid function (C) where a wider function in level low (1) gives

 better performance than a narrower function in level high (3), F  (1, 42) = 47.66,

p<0.05. There is also a main effect of output layer transfer function (D) where

networks with a pure linear function in the output layer result in lower average

MSE than those with hyperbolic tangent function, F  (1, 42) = 19.71, p<0.05.

Furthermore, there is an interaction between the number of hidden neurons and

the shape of the hidden sigmoid function (BC). Here, when a network has only

two neurons in the hidden layer, the change of the sigmoid function from wide

(low =1) to narrow (high =3) yields a stronger increase in average MSE than

when the network has six neurons. A network with six neurons also tends to give

lower average MSE, F  (1, 42) = 7.14, p<0.05. Moreover, there is also an

interaction between the number of hidden layer neurons and output layer transfer

function (BD). When the number of neurons is two, changing the output transfer

function from hyperbolic tan to pure linear is stronger than when there are six

neurons in the hidden layer. That being said, a network with six hidden neurons


always gives a lower average MSE no matter what the output transfer function is,

F  (1, 42) = 4.39, p<0.05. Figure 24 shows the significant main effects and Figure

25 shows the significant interactions. Please refer to Appendix B for the full

ANOVA results table. The regression equation is:

MSE Jordan =

+4.396E-004

-5.044E-005 * B

+6.716E-005 * C

-4.320E-005 * D

-2.599E-005 * B * C

+2.038E-005 * B * D

Figure 24 Jordan MSE significant main effects

Figure 25 Jordan MSE significant interactions

Both frontier markets have the main effects of number of hidden layer neurons (B),

shape of hidden layer sigmoid function (C), and output layer transfer function (D).

However, while the UAE MSE has no significant interactions, Jordan MSE has an

interaction between the number of hidden layer neurons and shape of hidden sigmoid


(BC), as well as an interaction between the number of hidden layer neurons and output

layer transfer function (BD). 

o  Emerging markets:

o  Egypt MSE: There is a main effect of type of network (A) where using a feed-

forward network yields a lower average MSE than using a NARX network, F  (1,

38) = 15.12, p<0.05. There is also a main effect of the number of hidden layer

neurons (B) with networks made of six neurons having a lower average MSE than

networks with two neurons, F  (1, 38) = 37.16, p<0.05. In addition, there is a main

effect of the hidden sigmoid function (C) where a wider function in level low (1)

gives a better performance than a narrower function in level high (3), F  (1, 38) =

126.81, p<0.05. There is also a main effect of output layer transfer function (D)

where networks with pure linear function in the output layer result in lower

average MSE than networks with hyperbolic tangent function, F  (1, 38) = 55.73,

p<0.05. Adding to that, there is a smaller effect of the learning rate 'Mu' (E) where

setting it to 1 gives a lower average MSE than setting it to 0.001, F  (1, 38) = 4.50,

p<0.05. Moreover there are multiple interactions affecting Egypt MSE. Firstly,

an interaction between the number of hidden layer neurons and hidden sigmoid

function (BC) is present where having two neurons causes a stronger effect in

narrowing the sigmoid function (moving from low (1) to high (3)). That being

said, a wider sigmoid always gives a lower MSE, as well as a network with six

neurons, F  (1, 38) = 7.71, p<0.05. Secondly, the interaction between hidden

sigmoid function and output layer transfer function (CD) shows that when the

sigmoid is narrow (3), changing the output transfer function from hyperbolic tan

to pure linear is stronger than when the sigmoid function is wide (1); a network

with a wide sigmoid function is almost indifferent to the nature of the output function, F (1, 38) = 12.69, p<0.05. Furthermore, the interaction between the type of network and the shape of the hidden sigmoid (AC) shows that having a NARX network

makes changing the shape of sigmoid from low to high stronger. A wider sigmoid

(low) always gives a lower MSE, F  (1, 38) = 14.45, p<0.05. Finally, the

interaction between the type of network and the learning rate 'Mu' (AE) shows that it is

only significant to change the learning rate from 0.001 to 1 when the network is


recurrent NARX, F  (1, 38) = 15.20, p<0.05. Figure 26 shows the significant main

effects and Figure 27 shows the significant interactions. Please refer to Appendix

B for the full ANOVA results table. The regression equation is:

MSE Egypt =
+4.527E-004

+3.509E-005 * A

-5.501E-005 * B

+1.016E-004 * C

-6.738E-005 * D

-1.915E-005 * E

+3.430E-005 * A * C

-3.519E-005 * A * E

-2.507E-005 * B * C

-3.215E-005 * C * D 

Figure 26 Egypt MSE significant main effects


Figure 27 Egypt MSE significant interactions

o  Turkey MSE: There exists a main effect of type of network (A) where a feed-

forward network gives a lower average MSE than a recurrent NARX network,

F  (1, 44) = 7.35, p<0.05. There is also a main effect of the number of hidden layer

neurons (B) with networks made of six neurons having a lower average MSE than

networks with two neurons, F  (1, 44) = 7.18, p<0.05. In addition, there is a main

effect of the hidden sigmoid function (C) where a wider function in level low (1)

gives a better performance than a narrower function in level high (3), F  (1, 44) =

30.97, p<0.05. There are no interactions for this response. Figure 28 shows the

significant main effects. Please refer to Appendix B for the full ANOVA results

table. The regression equation is:


MSE Turkey =

+9.513E-004

+3.946E-005 * A

-3.901E-005 * B

+8.102E-005 * C

Figure 28 Turkey MSE significant main effects

Both emerging markets have main effects of the network type (A), the number of hidden layer neurons (B), and the hidden sigmoid function (C); of all the tested markets, only the emerging ones show a main effect of network type (A). Furthermore, Egypt MSE has four interactions that are not significant for Turkey MSE. Behavior-wise, the effects common to Egypt and Turkey behave in a similar fashion.

o  Developed markets:

UK MSE: There is a main effect of the hidden sigmoid function (C), where a wider function at the low level (1) gives a better performance than a narrower function at the high level (3), F(1, 38) = 33.32, p<0.05. There is also a main effect of the output layer transfer function (D), where networks with a pure linear function in the output layer result in a lower average MSE than networks with a hyperbolic tangent function, F(1, 38) = 17.80, p<0.05. Moreover, the interaction between the hidden sigmoid function and the output layer transfer function (CD) shows that when the sigmoid is narrow (3), changing the output transfer function from hyperbolic tangent to pure linear has a stronger effect than when the sigmoid function is wide (1); a network with a wide sigmoid function is almost indifferent to the nature of the output function, F(1, 38) = 10.49, p<0.05. Additionally, there are two interactions involving the type of network. The interaction of the type of network with the number of hidden neurons (AB) is a dis-ordinal interaction where


effect lines have opposite slopes and cross each other. When the network is feed-forward, moving the hidden neurons from two to six increases the average MSE; however, when the network is recurrent NARX, moving the hidden neurons from two to six decreases the average MSE, F(1, 38) = 9.15, p<0.05. The other interaction is between the type of network and the learning rate 'Mu' (AE). This is also a dis-ordinal interaction with crossing lines. When the network is feed-forward, increasing Mu from 0.001 to 1 decreases the average MSE; conversely, when the network is NARX, increasing Mu from 0.001 to 1 increases the average MSE, F(1, 38) = 4.29, p<0.05. Next, there is also an interaction between the number of hidden layer neurons and the learning rate (BE). Again, this is a crossing interaction. When the network has two neurons in the hidden layer, increasing Mu from 0.001 to 1 increases the average MSE, while with six neurons in the hidden layer, increasing Mu from 0.001 to 1 decreases the average MSE, F(1, 38) = 4.45, p<0.05. Figure 29 shows the significant main effects and Figure 30 shows the significant interactions. Please refer to Appendix B for the full ANOVA results table. The regression equation is:

MSE UK =

+9.147E-004

-6.643E-006 * A

-9.912E-006 * B
+1.047E-004 * C

-7.649E-005 * D

+3.358E-006 * E

-5.484E-005 * A * B

+3.755E-005 * A * E

-3.826E-005 * B * E

-5.871E-005 * C * D 

Figure 29 UK MSE significant main effects


Figure 30 UK MSE significant interactions
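Dis-ordinal interactions like AB and AE are easiest to read from an interaction plot of cell means, where crossing lines indicate that the effect of one factor reverses with the level of the other. The sketch below uses made-up cell means, chosen only to mimic the AE pattern described above for the UK; it illustrates the plotting mechanics, not the thesis data.

```python
# Minimal sketch of an interaction plot for a dis-ordinal (crossing)
# interaction such as A x E. The cell means are hypothetical; in practice they
# would be the average MSE of each (network type, Mu) combination.
import matplotlib.pyplot as plt

mu_levels = [0.001, 1.0]                 # factor E (learning rate Mu)
mean_mse_feedforward = [9.6e-4, 8.8e-4]  # hypothetical means, A = feed-forward
mean_mse_narx = [8.9e-4, 9.7e-4]         # hypothetical means, A = NARX

plt.plot(mu_levels, mean_mse_feedforward, "o-", label="feed-forward")
plt.plot(mu_levels, mean_mse_narx, "s--", label="NARX")
plt.xscale("log")
plt.xlabel("Mu (E)")
plt.ylabel("average MSE")
plt.legend()
plt.title("Crossing lines signal a dis-ordinal A x E interaction")
plt.show()
```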

o  Japan MSE: There is a main effect of the hidden sigmoid function (C), where a wider function at the low level (1) gives a better performance than a narrower function at the high level (3), F(1, 39) = 46.61, p<0.05. There is also a main effect of the output layer transfer function (D), where networks with a pure linear function in the output layer result in a lower average MSE than networks with a hyperbolic tangent function, F(1, 39) = 29.03, p<0.05. Furthermore, there is an interaction between the number of hidden layer neurons and the shape of the sigmoid function (BC); when the network has six neurons in the hidden layer, the effect of changing the sigmoid function from low (1) to high (3) is stronger, F(1, 39) = 4.05, p ≈ 0.05. Moreover, the interaction between the hidden sigmoid function and the output layer transfer function (CD) shows that when the sigmoid is narrow (3), changing the output transfer function from hyperbolic tangent to pure linear has a stronger effect than when the sigmoid function is wide (1); a network with a wide sigmoid function always


gives a lower average MSE, F(1, 39) = 11.65, p<0.05. Lastly, there exists an interaction between the type of network and the learning rate (AE), which is a dis-ordinal crossing interaction. When the network type is feed-forward, changing Mu from 0.001 to 1 causes an increase in average MSE; conversely, when the network is NARX, changing Mu from 0.001 to 1 causes a decrease in average MSE, F(1, 39) = 10.38, p<0.05. Figure 31 shows the significant main effects and Figure 32 shows the significant interactions. Please refer to Appendix B for the full ANOVA results table. The regression equation is:

MSE Japan =

+1.159E-003

+1.231E-005 * A

-6.984E-006 * B

+7.917E-005 * C

-6.248E-005 * D

-1.291E-005 * E

-3.736E-005 * A * E

+2.334E-005 * B * C

-3.957E-005 * C * D 

Figure 31 Japan MSE significant main effects


Figure 32 Japan MSE significant interactions

Both developed markets share the main effects of the hidden sigmoid function (C) and the output layer transfer function (D), which behave similarly in both markets. They also share two interactions: the one between the hidden sigmoid function and the output layer transfer function (CD), and the one between the type of network and the learning rate (AE). The first interaction behaves similarly in both markets; however, the second interaction is an interesting one because its behavior differs between the UK and Japan; in fact, its effect on UK MSE is opposite to that on Japan MSE.

6.2.2 Observations

The main effect of shape of sigmoid function (C) is common among all six tested

markets. The main effect of output layer transfer function (D) is also common among five

markets. It could, therefore, be generally assumed that these factors are the most

important and significant among the tested factors, and are necessary for building most

ANNs for financial time series forecasting. Moreover, the factor of hidden layer neurons

(B) is commonly significant for all frontier and emerging markets, and the factor of type


of network is common among emerging markets only. Therefore, it is observed that

markets of the same classification tend to share the list of main effects, with the exception

of Egypt and Turkey, where Egypt has two additional main effects. As for the behavior, all main effects behave in a very similar way across different markets, whether positive or negative, yet there does not seem to be any pattern in the factors' strength related to the modeled market MSE. Looking from the other side, the type of network (A) and the learning rate 'Mu' (E) are commonly insignificant and do not take part in any interaction for the frontier markets. Moving to the interactions, it is

observed that each market has its unique set of interactions; however, whenever an

interaction is present in more than one market, its behavior tends to be similar across

markets. An exception to this is the interaction between the type of network and the

learning rate (AE), which appears to be significant for both UK MSE and Japan MSE, yet

has an opposite behavior for each one of them. This seems to be unexpected behavior for

markets within the same classification.

All in all, it can be concluded from this model that there exist main effects that are

commonly significant among all markets, as well as factors that are commonly significant

or commonly insignificant among markets of the same classification. The same could be

said about the effects' behaviors. However, nothing can be said about the interactions, as

each market tends to contain its unique list of interactions.


CHAPTER 7: Conclusion

Using ANNs to build efficient predictive models for financial time series is an

active area of research. Moreover, design of experiments (DOE) is a common practice for

finding significant factors and interactions, but it has not been used before with ANNs in

the financial field. This research combines DOE and ANNs to build the best performing

ANNs given a selected list of factors, then aims to study which factors are most

significant to ANN performance, and whether or not these are the same across different markets.

To tackle this problem, a step-by-step approach was taken. After reviewing past literature and considering many parameters, some were fixed to common practices and five others were chosen as factors in the designed experiments. Two experimental models

were built to have a better picture of the same factors under different models, because

results of statistical models are very specific. The first model was a four factor full

factorial design, and the second model was a five factor fractional factorial design. The

factors studied were the type of network, the number of neurons in the hidden layer, the

shape of hidden layer tangent sigmoid function, the type of output transfer function, and

the learning rate of the Levenberg-Marquardt algorithm. As this is a pilot study, only two levels were used for each factor, and their behavior was assumed to be linear for simplicity.
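For concreteness, the run lists of such two-level designs are easy to enumerate. The sketch below generates a 2^4 full factorial like Model 1 and a five-factor half fraction like Model 2, assuming the common defining relation E = ABCD; the actual generator used in the experiments is not stated in this chapter, so that choice is an assumption.

```python
# Minimal sketch: enumerating two-level designs like those used in this study.
# Coded levels are -1/+1; the half-fraction generator E = A*B*C*D is assumed.
from itertools import product

full_factorial = list(product([-1, +1], repeat=4))   # 16 runs over A, B, C, D
half_fraction = [(a, b, c, d, a * b * c * d)         # 16 runs over A..E
                 for a, b, c, d in product([-1, +1], repeat=4)]

print(len(full_factorial), len(half_fraction))  # 16 and 16 runs before replication
```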

7.1 Results conclusion

The results reveal that certain factors among those studied are commonly significant, or commonly insignificant, among all studied markets. This,

however, seems to apply only to main effects and not to interactions. The main effects of

the number of hidden layer neurons and output transfer function appear to be significant

for all frontier and emerging markets in Model 1; they also behave in a similar manner

across all stated markets. In addition, the main effect of type of network appears to be

insignificant for all markets. The developed markets agree in that no particular effect is significant. Furthermore, commonly significant main effects also emerge in the results from Model 2, and these behave similarly across all tested markets. These factors


are shape of sigmoid function and output layer transfer function. The factor of hidden

layer neurons is commonly significant for all frontier and emerging markets, and the

factor of type of network is common among emerging markets only.  Therefore, it is

observed that markets of the same classification tend to agree on the list of main effects,

with the exception of Egypt and Turkey, where Egypt has two additional main effects. It

is also observed that the type of network and the learning rate 'Mu' are neither significant nor involved in any interaction for the tested frontier markets. The results of the study

confirm that it is possible to use designed experimentation with ANNs to find the most

appropriate combination of parameters to build networks with minimum average error.

This practice can greatly reduce the amount of trial and error needed to build good

generalizing networks. It can also lead to reductions in building costs by enabling

concentration on the most important parameters, and the elimination of less important

ones. The specific list of effects is unique for the experimental models built for this study,

yet the idea of common factors appearing among markets opens the possibility of

applying designed experimentation on different markets and with different factors.

In conclusion, this thesis has attempted to use DOE to find the significant factors

of each market and investigate whether there is a possibility that ANN parameters are

affected somehow by market maturity. The results show that DOE can indeed work with

ANNs built for financial time series, and it can identify the most important design

 parameters in building networks for each specific market. Furthermore, there exist main

effects that are commonly significant or commonly insignificant among all markets or

among markets of the same classification. However, there does not seem to be any

difference in behavior of ANN design parameters according to market classification; all

significant effects behave similarly among all markets. It is hoped that these findings add

to previous research by increasing the understanding of market price patterns and the

 behavior of ANNs.

7.2 Statistical significance and practical significance

When working with statistical significance, it is important not to forget how it compares to practical significance. A statistically significant difference in a response variable is not necessarily significant in practice. As for the current case of ANNs, where


MSE is the response variable, it is important to understand how this translates into reality. The mean square error is the average of the squared differences between the values predicted by the ANN and the true values, MSE = (1/n) * sum_i (yhat_i - y_i)^2; it is a risk function corresponding to the expected value of the squared loss. For a neural network trained to an average MSE of 0.0005, for instance, the error at any particular instance can be much higher or much lower, with the spread usually estimated by the standard deviation. Therefore, for transactions dealing with large sums of money, an average performance of MSE = 0.0005, equivalent to an expected error of about 2.24% (the square root of the MSE), yields huge losses, taking into consideration that the error can be much higher. If 10,000 shares are bought for $100 each, then there is an expected loss of about $22,360. Moreover, many investors trade with sums reaching millions of dollars; it can therefore be concluded that a statistical difference between an MSE of 0.0002 and one of 0.0005 may translate into losing about 1.41% or 2.24% of the investment, respectively. Finally, the issue of practical significance is a subjective matter and is always in the hands of the investor. While one may view a difference as insignificant, another may see it as quite significant.
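To make the arithmetic explicit: assuming the MSE is computed on normalized prices, so that the root mean square error sqrt(MSE) can be read as an expected fractional error of the forecast, a short sketch reproduces the quoted percentages and dollar figures (up to rounding):

```python
# Worked arithmetic behind the figures above (a sketch under the stated
# assumption that sqrt(MSE) reads as an expected fractional price error).
import math

position = 10_000 * 100  # 10,000 shares at $100 each = $1,000,000

for mse in (0.0002, 0.0005):
    rmse = math.sqrt(mse)  # about 1.41% and 2.24%
    print(f"MSE = {mse}: expected error {rmse:.2%}, "
          f"expected loss on the position ~${rmse * position:,.0f}")
```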

7.3 Limitations

The conclusions that were drawn from this study have several limitations. A generalization to all markets cannot be made, nor can one be made to all neural networks for a given market, because the results are restricted to the following:

  The time period that the data is taken from. Although the data used span a long

time period to allow the neural networks to generalize as much as possible, results

are dependent on that data only.

  The variable factors specified in the model. This means that if a certain factor is

found to be insignificant, it is only statistically insignificant compared to other

factors. Therefore, the same factor could be statistically significant in another

model.

  The levels chosen for each factor. This depends on how far away the low and high

levels are from each other, the properties of each level, and the type of values. For

example, if comparing two transfer functions leads to an insignificant difference,

a different transfer function may change the results. Another example would be


for the number of hidden layer neurons; whether this factor is significant or not

depends on whether the values range from two to six or from two to fifty; the

results could be different in this case.

  Randomness of the neural networks. ANNs tend to give varying results from run to run, and this contributes to inaccuracies. This is handled by observing individual ANN results and making sure that all values lie within three standard deviations of the mean (a sketch of this screening appears after this list). Replications were also included in the experimental model.

  Studying only two markets from each classification provides a minimal view of

the problem. A better understanding could be achieved by studying a much larger

number of markets.

  Effects are assumed to be linear. The final effect plots all show linear

relationships. Although this may not be the case in reality, this assumption is used

to simplify the work for this new pilot study.
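As referenced in the list above, the three-standard-deviation screening of replicate runs can be expressed in a few lines. This is a minimal sketch with hypothetical MSE values, not the thesis code.

```python
# Minimal sketch of the three-standard-deviation screening of replicate ANN
# runs: flag any run whose MSE lies more than k sample standard deviations
# from the mean of its replicate group. The values below are hypothetical.
import statistics

def flag_outliers(mse_values, k=3.0):
    """Return values lying more than k standard deviations from the mean."""
    mean = statistics.mean(mse_values)
    std = statistics.stdev(mse_values)
    return [v for v in mse_values if abs(v - mean) > k * std]

runs = [4.8e-4, 5.1e-4, 4.6e-4, 5.0e-4, 4.9e-4, 5.2e-4,
        4.7e-4, 5.0e-4, 4.9e-4, 5.1e-4, 4.8e-4, 8.0e-3]
print(flag_outliers(runs))  # the 8.0e-3 run lies just over 3 std devs out
```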

7.4 Future work

Since previous literature did not sufficiently cover the topic of comparing ANN

forecasting methods for different financial markets, this study aimed at exploring that

area of interest by building on previous literature and encouraging future research. Future

research topics could include studying ANNs with design of experiments including more

than two levels for different parameters, or taking into consideration the possibility of

non-linear behavior from one level to the other. Moreover, comparison must be done on

more market indices to have a better picture of the problem. The number of markets studied in this research (six) is a very small sample. The larger the number of markets studied, the better the generalizations one can make. In addition, more popular

indices can be studied such as S&P 500, NIKKEI, SENSEX, DFM, etc. The final aim

would be to discover the possibility of developing guidelines for building forecasting

models specific to each market classification or to how developed a market is. This

would ultimately increase the understanding of market price patterns and the behavior of

ANNs. Finally, it can be said that the area of comparing markets can always be

investigated from different angles, as the subject of ANN itself is a very wide subject

with new innovations constantly appearing.



Appendix A 

Half normal probability plots for model 1; panels: UAE, Jordan, UK, Japan, Turkey, Egypt [plots not reproduced]


Half normal probability plots for model 2; panels: UAE, Jordan, UK, Japan, Turkey, Egypt [plots not reproduced]


Pareto charts for model 1; panels: UAE, Jordan, UK, Japan, Turkey, Egypt [charts not reproduced]


Pareto charts for model 2; panels: UAE, Jordan, UK, Japan, Turkey, Egypt [charts not reproduced]


Appendix B

ANOVA results for model 1

Response 1 MSE UAE

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 4.71E-07 6 7.84E-08 9.359982 < 0.0001 Significant

A-Type 2.9E-10 1 2.9E-10 0.034582 0.8534

B-H neurons 6.65E-08 1 6.65E-08 7.933485 0.0074

C-Output TF 2.24E-07 1 2.24E-07 26.68182 < 0.0001

D-Mu 7.48E-08 1 7.48E-08 8.932767 0.0047

AB 4.54E-08 1 4.54E-08 5.422323 0.0249

BC 5.99E-08 1 5.99E-08 7.154915 0.0107

Residual 3.44E-07 41 8.38E-09

Lack of Fit 6.62E-08 9 7.35E-09 0.848013 0.5789 not significant

Pure Error 2.77E-07 32 8.67E-09

Cor Total 8.14E-07 47

The Model F-value of 9.36 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case B, C, D, AB, BC are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.85 implies the Lack of Fit is not significant relative to the pure

error. There is a 57.89% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 9.15E-05 R-Squared 0.578015

Mean 0.000477 Adj R-Squared 0.516261
C.V. % 19.19469 Pred R-Squared 0.421622

PRESS 4.71E-07 Adeq Precision 10.05344

The "Pred R-Squared" of 0.4216 is in reasonable agreement with the "Adj R-Squared" of 0.5163.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your


ratio of 10.053 indicates an adequate signal. This model can be used to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000477 1 1.32E-05 0.00045 0.000504

A-Type -2.5E-06 1 1.32E-05 -2.9E-05 2.42E-05 1

B-H neurons -3.7E-05 1 1.32E-05 -6.4E-05 -1.1E-05 1

C-Output TF -6.8E-05 1 1.32E-05 -9.5E-05 -4.2E-05 1

D-Mu -3.9E-05 1 1.32E-05 -6.6E-05 -1.3E-05 1

AB -3.1E-05 1 1.32E-05 -5.7E-05 -4.1E-06 1

BC 3.53E-05 1 1.32E-05 8.66E-06 6.2E-05 1

Final Equation in Terms of Coded Factors:

MSE UAE =

0.000477

-2.5E-06 * A

-3.7E-05 * B

-6.8E-05 * C

-3.9E-05 * D

-3.1E-05 * A * B

3.53E-05 * B * C
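Tables like the one above can be reproduced with standard statistical tooling. The sketch below fits a two-level factorial model to synthetic data and prints a partial (Type III) ANOVA table; the effect sizes, noise level, and column names are illustrative and are not the thesis data.

```python
# Minimal sketch: fitting a two-level factorial model and producing an ANOVA
# table similar in form to those in this appendix, using synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from itertools import product

rng = np.random.default_rng(0)
# 2^4 design in coded -1/+1 units, three replicates per run (48 rows).
runs = pd.DataFrame(
    [dict(zip("ABCD", lv)) for lv in product([-1, 1], repeat=4) for _ in range(3)]
)
# Synthetic response with B, C and a B:C interaction, plus noise.
runs["mse"] = (4.8e-4 - 3.7e-5 * runs.B - 6.8e-5 * runs.C
               + 3.5e-5 * runs.B * runs.C + rng.normal(0, 9e-5, len(runs)))

model = smf.ols("mse ~ A + B + C + D + B:C", data=runs).fit()
print(sm.stats.anova_lm(model, typ=3))  # partial (Type III) sums of squares
```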

Response 2 MSE Jordan

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 2.12E-07 3 7.06E-08 25.49035 < 0.0001 Significant

B-H neurons 4.58E-08 1 4.58E-08 16.54639 0.0002

C-Output TF 9.34E-08 1 9.34E-08 33.72424 < 0.0001

BC 7.26E-08 1 7.26E-08 26.20043 < 0.0001

Residual 1.22E-07 44 2.77E-09

Lack of Fit 3.47E-08 12 2.89E-09 1.060538 0.4228 not significant
Pure Error 8.72E-08 32 2.73E-09

Cor Total 3.34E-07 47

The Model F-value of 25.49 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.


Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case B, C, BC are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 1.06 implies the Lack of Fit is not significant relative to the pure

error. There is a 42.28% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 5.26E-05 R-Squared 0.634767

Mean 0.000388 Adj R-Squared 0.609865

C.V. % 13.55828 Pred R-Squared 0.565343
PRESS 1.45E-07 Adeq Precision 10.92589

The "Pred R-Squared" of 0.5653 is in reasonable agreement with the "Adj R-Squared" of 0.6099.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 10.926 indicates an adequate signal. This model can be used to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000388 1 7.6E-06 0.000373 0.000404

B-H neurons -3.1E-05 1 7.6E-06 -4.6E-05 -1.6E-05 1

C-Output TF -4.4E-05 1 7.6E-06 -5.9E-05 -2.9E-05 1

BC 3.89E-05 1 7.6E-06 2.36E-05 5.42E-05 1

Final Equation in Terms of Coded Factors:

MSE Jordan =

0.000388

-3.1E-05 * B

-4.4E-05 * C

3.89E-05 * B * C


Response 3 MSE UK

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 2.844413 3 0.948138 1.557808 0.2131 not significant

A-Type 0.860515 1 0.860515 1.413843 0.2408

C-Output TF 0.044311 1 0.044311 0.072804 0.7886

AC 1.939587 1 1.939587 3.186778 0.0811

Residual 26.77997 44 0.608636

Lack of Fit 3.760285 12 0.313357 0.435602 0.9364 not significant

Pure Error 23.01969 32 0.719365

Cor Total 29.62439 47

The "Model F-value" of 1.56 implies the model is not significant relative to the noise. There is a

21.31 % chance that a "Model F-value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case there are no significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.44 implies the Lack of Fit is not significant relative to the pure

error. There is a 93.64% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 0.780151 R-Squared 0.096016

Mean 17.48266 Adj R-Squared 0.034381

C.V. % 4.462428 Pred R-Squared -0.07582
PRESS 31.87038 Adeq Precision 2.974206

 A negative "Pred R-Squared" implies that the overall mean is a better predictor of your

response than the current model.

"Adeq Precision" measures the signal to noise ratio. A ratio of 2.97 indicates an inadequate

signal and we should not use this model to navigate the design space.


Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 17.48266 1 0.112605 17.25572 17.7096

A-Type -0.13389 1 0.112605 -0.36083 0.093047 1

C-Output TF -0.03038 1 0.112605 -0.25732 0.196557 1

AC -0.20102 1 0.112605 -0.42796 0.025923 1

Response 4 MSE Japan

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 1.71E+08 3 56913349 0.951914 0.4238 not significant

B-H neurons 2674279 1 2674279 0.044729 0.8335

C-Output TF 76795384 1 76795384 1.284455 0.2632

BC 91270385 1 91270385 1.526559 0.2232

Residual 2.63E+09 44 59788317

Lack of Fit 1.25E+08 12 10455014 0.133545 0.9997 not significant

Pure Error 2.51E+09 32 78288305

Cor Total 2.8E+09 47

The "Model F-value" of 0.95 implies the model is not significant relative to the noise. There is a

42.38 % chance that a "Model F-value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case there are no significant model terms.
Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.13 implies the Lack of Fit is not significant relative to the pure

error. There is a 99.97% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 7732.291 R-Squared 0.060948

Mean 46259.02 Adj R-Squared -0.00308
C.V. % 16.71521 Pred R-Squared -0.11755

PRESS 3.13E+09 Adeq Precision 2.368878

 A negative "Pred R-Squared" implies that the overall mean is a better predictor of your

response than the current model.


"Adeq Precision" measures the signal to noise ratio. A ratio of 2.37 indicates an inadequate

signal and we should not use this model to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF
Intercept 46259.02 1 1116.06 44009.75 48508.29

B-H neurons -236.038 1 1116.06 -2485.31 2013.233 1

C-Output TF 1264.873 1 1116.06 -984.398 3514.144 1

BC -1378.94 1 1116.06 -3628.21 870.3345 1

Response 5 MSE Turkey

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 9.82E-08 3 3.27E-08 3.779082 0.0170 Significant

B-H neurons 6.83E-09 1 6.83E-09 0.789019 0.3792

C-Output TF 4.38E-08 1 4.38E-08 5.056221 0.0296

BC 4.76E-08 1 4.76E-08 5.492005 0.0237

Residual 3.81E-07 44 8.66E-09

Lack of Fit 8.62E-08 12 7.18E-09 0.779256 0.6669 not significant

Pure Error 2.95E-07 32 9.22E-09

Cor Total 4.79E-07 47

The Model F-value of 3.78 implies the model is significant. There is only

a 1.70% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case C, BC are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.78 implies the Lack of Fit is not significant relative to the pure

error. There is a 66.69% chance that a "Lack of Fit F-value" this large could occur due
to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 9.31E-05 R-Squared 0.204875

Mean 0.000871 Adj R-Squared 0.150662

C.V. % 10.68032 Pred R-Squared 0.053736

PRESS 4.54E-07 Adeq Precision 4.592107


The "Pred R-Squared" of 0.0537 is in reasonable agreement with the "Adj R-Squared" of 0.1507.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 4.592 indicates an adequate signal. This model can be used to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000871 1 1.34E-05 0.000844 0.000898

B-H neurons -1.2E-05 1 1.34E-05 -3.9E-05 1.51E-05 1

C-Output TF -3E-05 1 1.34E-05 -5.7E-05 -3.1E-06 1

BC 3.15E-05 1 1.34E-05 4.41E-06 5.86E-05 1

Final Equation in Terms of Coded Factors:

MSE Turkey =

0.000871

-1.2E-05 * B

-3E-05 * C

3.15E-05 * B * C

Response 6 MSE Egypt

ANOVA for selected factorial model
Analysis of variance table [Partial sum of squares - Type III]
Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 1.88E-07 3 6.28E-08 24.80416 < 0.0001 Significant

B-H neurons 7.11E-08 1 7.11E-08 28.09254 < 0.0001

C-Output TF 5.35E-08 1 5.35E-08 21.1162 < 0.0001

BC 6.38E-08 1 6.38E-08 25.20374 < 0.0001

Residual 1.11E-07 44 2.53E-09

Lack of Fit 3.2E-08 12 2.67E-09 1.074721 0.4120 not significant

Pure Error 7.94E-08 32 2.48E-09

Cor Total 3E-07 47

The Model F-value of 24.80 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case B, C, BC are significant model terms.


Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 1.07 implies the Lack of Fit is not significant relative to the pure

error. There is a 41.20% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 5.03E-05 R-Squared 0.628418

Mean 0.000353 Adj R-Squared 0.603082

C.V. % 14.24897 Pred R-Squared 0.557786

PRESS 1.33E-07 Adeq Precision 10.32057

The "Pred R-Squared" of 0.5578 is in reasonable agreement with the "Adj R-Squared" of 0.6031.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 10.321 indicates an adequate signal. This model can be used to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000353 1 7.26E-06 0.000339 0.000368

B-H neurons -3.9E-05 1 7.26E-06 -5.3E-05 -2.4E-05 1

C-Output TF -3.3E-05 1 7.26E-06 -4.8E-05 -1.9E-05 1

BC 3.65E-05 1 7.26E-06 2.18E-05 5.11E-05 1

Final Equation in Terms of Coded Factors:

MSE Egypt =

0.000353

-3.9E-05 * B

-3.3E-05 * C

3.65E-05 * B * C


ANOVA results for model 2

Response 1 MSE UAE

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]
Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 3.98E-07 3 1.33E-07 10.29862 < 0.0001 Significant

B-H neurons 7.89E-08 1 7.89E-08 6.117453 0.0173

C-H Sigmoid 2.18E-07 1 2.18E-07 16.9232 0.0002

D-Output TF 1.01E-07 1 1.01E-07 7.855204 0.0075

Residual 5.67E-07 44 1.29E-08

Lack of Fit 1.08E-07 12 8.97E-09 0.624219 0.8058 not significant

Pure Error 4.6E-07 32 1.44E-08

Cor Total 9.66E-07 47

The Model F-value of 10.30 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case B, C, D are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.62 implies the Lack of Fit is not significant relative to the pure

error. There is a 80.58% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 0.000114 R-Squared 0.412518

Mean 0.000542 Adj R-Squared 0.372462

C.V. % 20.96895 Pred R-Squared 0.300847
PRESS 6.75E-07 Adeq Precision 9.389844

The "Pred R-Squared" of 0.3008 is in reasonable agreement with the "Adj R-Squared" of 0.3725.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 9.390 indicates an adequate signal. This model can be used to navigate the design space.


Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000542 1 1.64E-05 0.000509 0.000575

B-H neurons -4.1E-05 1 1.64E-05 -7.4E-05 -7.5E-06 1

C-H Sigmoid 6.74E-05 1 1.64E-05 3.44E-05 0.0001 1

D-Output TF -4.6E-05 1 1.64E-05 -7.9E-05 -1.3E-05 1

Final Equation in Terms of Coded Factors:

MSE UAE =

0.000542

-4.1E-05 * B

6.74E-05 * C

-4.6E-05 * D

Response 2 MSE Jordan

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 4.81E-07 5 9.61E-08 21.15336 < 0.0001 Significant

B-H neurons 1.22E-07 1 1.22E-07 26.87342 < 0.0001

C-H Sigmoid 2.17E-07 1 2.17E-07 47.65735 < 0.0001

D-Output TF 8.96E-08 1 8.96E-08 19.71369 < 0.0001

BC 3.24E-08 1 3.24E-08 7.136538 0.0107

BD 1.99E-08 1 1.99E-08 4.385786 0.0423

Residual 1.91E-07 42 4.54E-09

Lack of Fit 5.9E-08 10 5.9E-09 1.431933 0.2113 not significant

Pure Error 1.32E-07 32 4.12E-09

Cor Total 6.71E-07 47

The Model F-value of 21.15 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case B, C, D, BC, BD are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.


The "Lack of Fit F-value" of 1.43 implies the Lack of Fit is not significant relative to the pure

error. There is a 21.13% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 6.74E-05 R-Squared 0.715768

Mean 0.00044 Adj R-Squared 0.681931

C.V. % 15.33173 Pred R-Squared 0.628759

PRESS 2.49E-07 Adeq Precision 13.49456

The "Pred R-Squared" of 0.6288 is in reasonable agreement with the "Adj R-Squared" of 0.6819.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 13.495 indicates an adequate signal. This model can be used to navigate the design space.

Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.00044 1 9.73E-06 0.00042 0.000459

B-H neurons -5E-05 1 9.73E-06 -7E-05 -3.1E-05 1

C-H Sigmoid 6.72E-05 1 9.73E-06 4.75E-05 8.68E-05 1

D-Output TF -4.3E-05 1 9.73E-06 -6.3E-05 -2.4E-05 1

BC -2.6E-05 1 9.73E-06 -4.6E-05 -6.4E-06 1

BD 2.04E-05 1 9.73E-06 7.41E-07 4E-05 1

Final Equation in Terms of Coded Factors:

MSE Jordan =

0.00044

-5E-05 * B

6.72E-05 * C

-4.3E-05 * D

-2.6E-05 * B * C

2.04E-05 * B * D


Response 3 MSE UK

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 1.26E-06 9 1.4E-07 8.883877 < 0.0001 Significant

A-Type 2.12E-09 1 2.12E-09 0.134247 0.7161

B-H neurons 4.72E-09 1 4.72E-09 0.298864 0.5878

C-H Sigmoid 5.26E-07 1 5.26E-07 33.31536 < 0.0001

D-Output TF 2.81E-07 1 2.81E-07 17.79732 0.0001

E-Mu 5.41E-10 1 5.41E-10 0.034299 0.8541

AB 1.44E-07 1 1.44E-07 9.147887 0.0044

AE 6.77E-08 1 6.77E-08 4.289511 0.0452

BE 7.03E-08 1 7.03E-08 4.452347 0.0415

CD 1.65E-07 1 1.65E-07 10.48506 0.0025

Residual 6E-07 38 1.58E-08

Lack of Fit 9.92E-08 6 1.65E-08 1.057341 0.4080 not significant

Pure Error 5E-07 32 1.56E-08

Cor Total 1.86E-06 47

The Model F-value of 8.88 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case C, D, AB, AE, BE, CD are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 1.06 implies the Lack of Fit is not significant relative to the pure

error. There is a 40.80% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 0.000126 R-Squared 0.677843

Mean 0.000915 Adj R-Squared 0.601543

C.V. % 13.7326 Pred R-Squared 0.485977
PRESS 9.57E-07 Adeq Precision 10.00429

The "Pred R-Squared" of 0.4860 is in reasonable agreement with the "Adj R-Squared" of 0.6015.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 10.004 indicates an adequate signal. This model can be used to navigate the design space.


Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.000915 1 1.81E-05 0.000878 0.000951

A-Type -6.6E-06 1 1.81E-05 -4.3E-05 3.01E-05 1

B-H neurons -9.9E-06 1 1.81E-05 -4.7E-05 2.68E-05 1

C-H Sigmoid 0.000105 1 1.81E-05 6.79E-05 0.000141 1

D-Output TF -7.6E-05 1 1.81E-05 -0.00011 -4E-05 1

E-Mu 3.36E-06 1 1.81E-05 -3.3E-05 4.01E-05 1

AB -5.5E-05 1 1.81E-05 -9.2E-05 -1.8E-05 1

AE 3.76E-05 1 1.81E-05 8.47E-07 7.43E-05 1

BE -3.8E-05 1 1.81E-05 -7.5E-05 -1.6E-06 1

CD -5.9E-05 1 1.81E-05 -9.5E-05 -2.2E-05 1

Final Equation in Terms of Coded Factors:

MSE UK =

0.000915

-6.6E-06 * A

-9.9E-06 * B

0.000105 * C

-7.6E-05 * D

3.36E-06 * E

-5.5E-05 * A * B

3.76E-05 * A * E
-3.8E-05 * B * E

-5.9E-05 * C * D


Response 4 MSE Japan

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 6.74E-07 8 8.43E-08 13.05563 < 0.0001 significant

A-Type 7.28E-09 1 7.28E-09 1.127311 0.2949

B-H neurons 2.34E-09 1 2.34E-09 0.362738 0.5505

C-H Sigmoid 3.01E-07 1 3.01E-07 46.60721 < 0.0001

D-Output TF 1.87E-07 1 1.87E-07 29.02911 < 0.0001

E-Mu 8.01E-09 1 8.01E-09 1.240255 0.2722

AE 6.7E-08 1 6.7E-08 10.38049 0.0026

BC 2.62E-08 1 2.62E-08 4.052321 0.0511

CD 7.52E-08 1 7.52E-08 11.64557 0.0015

Residual 2.52E-07 39 6.45E-09

Lack of Fit 5.31E-08 7 7.58E-09 1.221579 0.3199 not significant

Pure Error 1.99E-07 32 6.21E-09

Cor Total 9.26E-07 47

The Model F-value of 13.06 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case C, D, AE, CD are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),
model reduction may improve your model.

The "Lack of Fit F-value" of 1.22 implies the Lack of Fit is not significant relative to the pure

error. There is a 31.99% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev. 8.03E-05 R-Squared 0.728119

Mean 0.001159 Adj R-Squared 0.672348

C.V. % 6.933846 Pred R-Squared 0.588156

PRESS 3.81E-07 Adeq Precision 11.05873

The "Pred R-Squared" of 0.5882 is in reasonable agreement with the "Adj R-Squared" of 0.6723.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 11.059 indicates an adequate signal. This model can be used to navigate the design space.


Factor  Coefficient Estimate  df  Standard Error  95% CI Low  95% CI High  VIF

Intercept 0.001159 1 1.16E-05 0.001135 0.001182

A-Type 1.23E-05 1 1.16E-05 -1.1E-05 3.58E-05 1

B-H neurons -7E-06 1 1.16E-05 -3E-05 1.65E-05 1

C-H Sigmoid 7.92E-05 1 1.16E-05 5.57E-05 0.000103 1

D-Output TF -6.2E-05 1 1.16E-05 -8.6E-05 -3.9E-05 1

E-Mu -1.3E-05 1 1.16E-05 -3.6E-05 1.05E-05 1

AE -3.7E-05 1 1.16E-05 -6.1E-05 -1.4E-05 1

BC 2.33E-05 1 1.16E-05 -1.1E-07 4.68E-05 1

CD -4E-05 1 1.16E-05 -6.3E-05 -1.6E-05 1

Final Equation in Terms of Coded Factors:

MSE Japan =

0.001159

1.23E-05 * A

-7E-06 * B

7.92E-05 * C

-6.2E-05 * D

-1.3E-05 * E

-3.7E-05 * A * E

2.33E-05 * B * C

-4E-05 * C * D

Response 5 MSE Turkey

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source  Sum of Squares  df  Mean Square  F Value  p-value (Prob > F)

Model 4.63E-07 3 1.54E-07 15.16422 < 0.0001 significant

A-Type 7.47E-08 1 7.47E-08 7.345702 0.0095

B-H neurons 7.31E-08 1 7.31E-08 7.18025 0.0103

C-H Sigmoid 3.15E-07 1 3.15E-07 30.9667 < 0.0001

Residual 4.48E-07 44 1.02E-08

Lack of Fit 1.21E-07 12 1E-08 0.983079 0.4849 not significant

Pure Error 3.27E-07 32 1.02E-08

Cor Total 9.11E-07 47

The Model F-value of 15.16 implies the model is significant. There is only

a 0.01% chance that a "Model F-Value" this large could occur due to noise.


Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case A, B, C are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.

If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 0.98 implies the Lack of Fit is not significant relative to the pure

error. There is a 48.49% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev.   0.000101      R-Squared        0.50834
Mean        0.000951      Adj R-Squared    0.474817
C.V. %      10.6033       Pred R-Squared   0.414883
PRESS       5.33E-07      Adeq Precision   10.95467

The "Pred R-Squared" of 0.4149 is in reasonable agreement with the "Adj R-Squared" of 0.4748.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 10.955 indicates an adequate signal. This model can be used to navigate the design space.

Factor        Coefficient Estimate   df   Standard Error   95% CI Low   95% CI High   VIF
Intercept     0.000951                1   1.46E-05          0.000922     0.000981
A-Type        3.95E-05                1   1.46E-05          1.01E-05     6.88E-05     1
B-H neurons  -3.9E-05                 1   1.46E-05         -6.8E-05     -9.7E-06      1
C-H Sigmoid   8.1E-05                 1   1.46E-05          5.17E-05     0.00011      1

Final Equation in Terms of Coded Factors:

MSE Turkey =
      0.000951
    + 3.95E-05 * A
    - 3.9E-05  * B
    + 8.1E-05  * C


Response 6 MSE Egypt

ANOVA for selected factorial model

Analysis of variance table [Partial sum of squares - Type III]

Source        Sum of Squares   df   Mean Square   F Value     p-value (Prob > F)
Model         1.13E-06          9   1.26E-07       32.1528    < 0.0001    significant
A-Type        5.91E-08          1   5.91E-08       15.11997   0.0004
B-H neurons   1.45E-07          1   1.45E-07       37.15615   < 0.0001
C-H Sigmoid   4.96E-07          1   4.96E-07      126.8067    < 0.0001
D-Output TF   2.18E-07          1   2.18E-07       55.73339   < 0.0001
E-Mu          1.76E-08          1   1.76E-08        4.504311  0.0404
AC            5.65E-08          1   5.65E-08       14.44691   0.0005
AE            5.94E-08          1   5.94E-08       15.20215   0.0004
BC            3.02E-08          1   3.02E-08        7.714965  0.0085
CD            4.96E-08          1   4.96E-08       12.69073   0.0010
Residual      1.49E-07         38   3.91E-09
Lack of Fit   3.22E-08          6   5.37E-09        1.476897  0.2175      not significant
Pure Error    1.16E-07         32   3.64E-09
Cor Total     1.28E-06         47

The Model F-value of 32.15 implies the model is significant. There is only a 0.01% chance that a "Model F-Value" this large could occur due to noise.

Values of "Prob > F" less than 0.0500 indicate model terms are significant.

In this case A, B, C, D, E, AC, AE, BC, CD are significant model terms.

Values greater than 0.1000 indicate the model terms are not significant.If there are many insignificant model terms (not counting those required to support hierarchy),

model reduction may improve your model.

The "Lack of Fit F-value" of 1.48 implies the Lack of Fit is not significant relative to the pure

error. There is a 21.75% chance that a "Lack of Fit F-value" this large could occur due

to noise. Non-significant lack of fit is good -- we want the model to fit.

Std. Dev.   6.25E-05      R-Squared        0.883925
Mean        0.000453      Adj R-Squared    0.856434
C.V. %      13.81337      Pred R-Squared   0.814795
PRESS       2.37E-07      Adeq Precision   16.10742

The "Pred R-Squared" of 0.8148 is in reasonable agreement with the "Adj R-Squared" of 0.8564.

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. Your

ratio of 16.107 indicates an adequate signal. This model can be used to navigate the design space.


Factor        Coefficient Estimate   df   Standard Error   95% CI Low   95% CI High   VIF
Intercept     0.000453                1   9.03E-06          0.000434     0.000471
A-Type        3.51E-05                1   9.03E-06          1.68E-05     5.34E-05     1
B-H neurons  -5.5E-05                 1   9.03E-06         -7.3E-05     -3.7E-05      1
C-H Sigmoid   0.000102                1   9.03E-06          8.34E-05     0.00012      1
D-Output TF  -6.7E-05                 1   9.03E-06         -8.6E-05     -4.9E-05      1
E-Mu         -1.9E-05                 1   9.03E-06         -3.7E-05     -8.8E-07      1
AC            3.43E-05                1   9.03E-06          1.6E-05      5.26E-05     1
AE           -3.5E-05                 1   9.03E-06         -5.3E-05     -1.7E-05      1
BC           -2.5E-05                 1   9.03E-06         -4.3E-05     -6.8E-06      1
CD           -3.2E-05                 1   9.03E-06         -5E-05       -1.4E-05      1

Final Equation in Terms of Coded Factors:

MSE Egypt =
      0.000453
    + 3.51E-05 * A
    - 5.5E-05  * B
    + 0.000102 * C
    - 6.7E-05  * D
    - 1.9E-05  * E
    + 3.43E-05 * A * C
    - 3.5E-05  * A * E
    - 2.5E-05  * B * C
    - 3.2E-05  * C * D


Appendix C

Model Diagnosis

After conducting the ANOVA test, it is necessary to diagnose the model for each market to make sure that the assumptions of the ANOVA are met. These two assumptions are:

•  The residuals are normally distributed:

o  This is checked from the normal probability plot of the residuals. For the assumption to be valid, most points should lie on or close to the normal line. After checking the normal plots generated by Minitab, the model is found to be valid for all 6 markets.

•  The variance of the residuals is constant. There are two ways to check this:

o  From the predicted vs. residual plot. The points in this graph should be randomly scattered around the mean of 0 with no clear pattern. The graphs generated using Minitab were checked, and no patterns are apparent for any of the responses of the model.

o  From formal tests for constant variance. Two such tests are used (a minimal sketch of both is given after this list):

▪  Bartlett's test at the 95% confidence level. The null hypothesis for this test is H0: the variance is constant. The null hypothesis should not be rejected for the assumption to be valid, which means that the p-value of the test must be greater than 0.05. Bartlett's test assumes that the data are normally distributed; since this assumption has already been verified, the results of Bartlett's test are valid. According to the results of this test, all model responses pass.

▪  Levene's test at the 95% confidence level. The null hypothesis for this test is H0: the variance is constant. The null hypothesis should not be rejected for the assumption to be valid, which means that the p-value of the test must be greater than 0.05. This test makes no distributional assumptions. According to the results of this test, all model responses pass.
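
A minimal sketch of these two tests, assuming Python with numpy and scipy rather than Minitab, and using synthetic residual groups as a hypothetical stand-in for the actual model residuals, is:

    import numpy as np
    from scipy import stats

    # Hypothetical stand-in: residuals grouped by factor setting; in the
    # actual diagnosis these come from the fitted ANOVA model for each market.
    rng = np.random.default_rng(0)
    groups = [rng.normal(0.0, 1.0, size=8) for _ in range(6)]

    # Bartlett's test: H0 is constant variance; it assumes normal residuals,
    # which the normal probability plot has already verified.
    stat_b, p_b = stats.bartlett(*groups)

    # Levene's test: same H0, but with no normality assumption.
    stat_l, p_l = stats.levene(*groups)

    # The constant-variance assumption holds when both p-values exceed 0.05.
    print(f"Bartlett p = {p_b:.3f}, Levene p = {p_l:.3f}")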


Model 1 Diagnosis

[Figures: for each market (UAE, Jordan, UK, Japan, Turkey, and Egypt), a normal probability plot of the residuals and a check for constant variance.]


Model 2 Diagnosis

[Figures: for each market (UAE, Jordan, UK, Japan, Turkey, and Egypt), a normal probability plot of the residuals and a check for constant variance.]


VITA

Assia Hanafi Lasfer was born on July 4, 1989, in Oran, Algeria. She was educated in the United Arab Emirates at Al-Nahda National Schools, graduating in 2006. She received her Bachelor's degree in Computer Engineering with Cum Laude honors from the American University of Sharjah, Sharjah, United Arab Emirates, in 2010.

Assia continued her education at the American University of Sharjah and joined the Master's program in Engineering Systems Management while working as a research assistant and a teaching assistant for statistics. She was awarded the Master of Science degree in Engineering Systems Management in 2012.