Using Statistics to understand Stock Market with forecasting

Using Statistics to Understand Stock Markets

Summary: This term paper outlines and demonstrates the use of statistics in understanding ten years of stock market prices and returns from fifty countries, using basic statistics, linear regression and time series forecasting models.

Keywords: Term paper, Statistics, Linear Regression, Average, Logarithmic Transformation, Kurtosis, F-Test for Linear Regression, T-Test for Linear Regression, P-value, Partial Linear Regression, Exponential Time Series Forecasting, Moving Averages Forecasting


Contents:

1. Abstract

2. Introduction

3. Basic Statistics:

4. Correlation

5. Linear Regression Model

5.1 Introduction.

5.2 Predicting the Dependent Variable Y.

5.3 Coefficient of Multiple Determination.

5.4 Test for the Significance of the Overall Multiple Regression Model.

5.6 Residual Analysis.

5.7 Inferences Concerning the Population Regression Coefficients.

5.8 Line Plots.

5.9 Further Analysis.

5.10 Multiple Linear Regression Conclusions.

6. Forecasting

6.1 Models.

6.2 Forecast Error Measures.

6.3 Charts for Forecast Models.

7. Conclusion

Appendix

A. References.

B. List of Excel files used for calculations.

1. Abstract


In this paper we demonstrate the use of statistical concepts to understand stock market returns for fifty different countries. The data are monthly closing prices of the MSCI country indices in US dollars.

We computed basic statistics for the monthly stock market returns of the fifty countries. We used linear regression to relate the returns of Finland first to the stock markets of the top seven economies of the world, then to seven bigger neighbors, and finally to a set of twelve economies: the four found significant in the earlier models plus eight other important economies. Lastly, we used time series forecasting models to forecast the monthly returns of the Austrian stock index. Excel and PHStat were our tools of choice.

2. Introduction

Most of us have some relationship with the stock market. We have either direct stock holdings or indirect holdings through mutual funds in retirement plans. Stock markets are where investors invest in companies big and small. The stock market is a direct indicator of the economic environment, so understanding stock markets helps managers make investment and capital expenditure decisions. Typically stock brokers, stock investors, government officials and banks are interested in stock market data and analysis.

We had monthly MSCI index prices for fifty countries. Index prices are not normally distributed, whereas returns are much closer to a normal distribution, so it is preferable not to perform statistical analysis directly on the price series. The raw price series are therefore converted into series of returns. Returns have the added benefit that they are unit-free and currency-free, allowing comparisons across markets.

There are two methods for calculating returns from a series of prices: simple returns and continuously compounded returns. They are computed as follows:

Simple returns: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$

Continuously compounded returns: $r_t = 100\% \times \ln\left(\frac{p_t}{p_{t-1}}\right)$

where:

$R_t$ denotes the simple return at time t

$r_t$ denotes the continuously compounded return at time t

$p_t$ denotes the asset price at time t

ln denotes the natural logarithm.

For our analysis we used continuously compounded returns, obtained through the logarithmic transformation. Excel and PHStat were the tools for our statistical analysis.
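The two return definitions above can be sketched in a few lines of Python. This is only an illustration: the paper's calculations were done in Excel, the function names are hypothetical, and the price list is made up rather than taken from the MSCI data.

```python
import math


def simple_returns(prices):
    """Percentage simple returns: (p_t - p_{t-1}) / p_{t-1} * 100."""
    return [(p1 - p0) / p0 * 100.0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """Percentage continuously compounded returns: 100 * ln(p_t / p_{t-1})."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


# Hypothetical month-end index levels (not from the paper's data set).
prices = [100.0, 105.0, 102.0, 110.0]

simple = simple_returns(prices)
logs = log_returns(prices)

# Log returns are additive: their sum is the log return over the whole span.
total = 100.0 * math.log(prices[-1] / prices[0])

print(simple)
print(logs)
print(sum(logs), total)
```

The additivity shown in the last line is one practical reason for preferring continuously compounded returns when aggregating monthly figures.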


3. Basic Statistics:

We found which countries gave the highest average returns over the ten-year period. Colombia, Czech Republic, Austria and Denmark were the top four, with annual compounded returns of 18.68%, 18.22%, 17.98% and 14.89% respectively for the nine-plus-year period from 10/31/1997 to 7/31/2007.

While Colombia, Czech Republic, Austria and Denmark gave investors the biggest returns, Denmark and Austria were the least volatile of the four. The standard deviations of Denmark and Austria were 17.44 and 23.99, compared to 32.16 and 40.72 for Czech Republic and Colombia.

The chart below shows the average rate of returns against the standard deviation.

Investing in the stock market always bears some risk, large or small, depending on the volatility of the stock price. A stock with large volatility may give high or negative returns depending on when the investor enters and exits the holding.

We found that Russia, Turkey and Indonesia have the highest volatility, whereas the USA, United Kingdom and Canada are comparatively stable (as of Sep 2007).

We also examined the statistical measure kurtosis. Higher kurtosis means that more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations. Kurtosis measures whether the data are peaked or flat relative to a normal distribution. Since kurtosis measures the shape of the distribution (the fatness of the tails), it focuses on how returns are arranged around the mean. A normal distribution has a kurtosis coefficient of three.

Kurtosis of less than three indicates a low peak with a fat midrange on either side (platykurtic). Conversely, kurtosis greater than three indicates a sharp, high peak with a thin midrange and fat tails (leptokurtic). Put simply, kurtosis describes how bunched around the center or spread toward the endpoints a frequency distribution is. Kurtosis is sometimes also called "the volatility of volatility."
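As a rough illustration of the benchmark of three, the sketch below computes the moment-based kurtosis for two made-up return series. The function name and the data are hypothetical. Spreadsheet functions such as Excel's KURT report excess kurtosis with a small-sample adjustment, so their normal benchmark is 0 rather than 3.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


# Hypothetical monthly returns: frequent modest moves versus rare extreme moves.
calm = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
jumpy = [-9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0]

print(kurtosis(calm))   # below 3: platykurtic
print(kurtosis(jumpy))  # above 3: leptokurtic
```

The second series has most of its variance concentrated in two extreme months, which is what pushes its kurtosis above three.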

The chart below shows kurtosis in increasing order against average monthly returns and their standard deviation. Russia has the largest standard deviation and kurtosis, meaning the Russian stock market is the most volatile among the 50 countries.

4. Correlation

The relationship between two variables is expressed through correlation. We used Excel to create a 50 X 50 correlation matrix, which is shown in the table. Netherlands and France had a correlation of 0.886035, one of the highest. On the other hand, Pakistan and Denmark had a correlation of -0.19975.

Any investment requires diversification so that risks are spread among unrelated investments. Correlation can be a good tool for finding diversification opportunities: a quick glance at the correlation matrix shows how to diversify. For example, investments in France and Netherlands, with a correlation coefficient of 0.886, will lead to similar returns and risks, so holding both adds little diversification.
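A correlation matrix like the one described above can be sketched as follows. The paper built its matrix in Excel; the function names here are hypothetical and the three return series are invented to show the two extremes of perfectly aligned and perfectly opposed markets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(returns):
    """Nested dict of pairwise correlations, keyed by market name."""
    names = list(returns)
    return {a: {b: pearson(returns[a], returns[b]) for b in names} for a in names}


# Hypothetical monthly returns for three markets (not MSCI data).
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # moves exactly with A
    "C": [4.0, 3.0, 2.0, 1.0],   # moves exactly against A
}

matrix = correlation_matrix(returns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Pairs with values near 1 offer little diversification, while pairs with low or negative values are the candidates for spreading risk.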

5. Linear Regression Model


Introduction

We picked Finland for creating linear regression models. Finland ranks 33rd among the top 50 economies of the world. The top 7 economies are:

United States
Japan
Germany
China
United Kingdom
France
Italy

(Source: The Economist Pocket World Figures, 2009 edition)

A multiple regression model with k independent variables is given as:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$

In our case we have 7 independent variables, so the model to be developed is:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_7 X_{7i}$$

Here $X_j$, for j = 1 to 7, are the monthly rates of return for the seven biggest economies of the world. $\hat{Y}$ is the estimated rate of return for Finland, for which the linear model was to be created. We first used Microsoft Excel to compute the values of the eight regression coefficients.

Output of Microsoft Excel Multiple Regression Analysis

From above figure, the computed values of the regression coefficients are

Therefore the multiple regression equation is:


The sample Y intercept ($b_0$) estimates the return of the Finnish stock market when the returns of all seven other stock markets are zero. Because the returns of all these markets are not zero at the same time, the value of $b_0$ has no practical interpretation.

The slope for the US rate of return indicates that for each one-unit increase in the US rate of return, Finland's estimated rate of return decreases by 0.14045 units, holding the other returns constant. The estimates of all $b_j$ allowed us to better understand the effect of the rates of return of the seven biggest economies on Finland.

Regression coefficients in multiple regression are called net regression coefficients. They estimate the mean change in Y per unit change in a particular X, holding constant the effect of other X variables.
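To make the idea of net regression coefficients concrete, here is a minimal least-squares sketch in plain Python. It is not the Excel procedure used in the paper: the helper names are hypothetical, the fit solves the normal equations directly, and the data are generated from a known equation so the recovered coefficients can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(xs, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]   # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in xs]

b0, b1, b2 = fit_ols(xs, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Each recovered slope is the change in y per unit change in its own x with the other x held fixed, which is the interpretation given to the net regression coefficients above.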

Predicting the Dependent Variable Y

We used the multiple regression equation to predict the value of the dependent variable. Taking the monthly rates of return of all seven countries for Feb 27, 1998, the model gave a 95% confidence interval of 1.43428 to 8.31292 for the average predicted value. The actual value of the dependent variable, Finland's monthly rate of return, was 9.80996, which lies outside this interval. We used PHStat's confidence interval estimate and prediction interval function to arrive at the range.

The table below shows the estimated monthly return for Finland at the 95% confidence level on Feb 27, 1998, given the rates of return of the seven countries:

Data
Confidence Level                   95%
USA given value                    6.287008
JAPAN given value                  1.1237
GERMANY given value                4.412191
CHINA given value                  30.57856
UNITED KINGDOM given value         5.309952
FRANCE given value                 8.189638
ITALY given value                  6.306493

For Average Predicted Y (YHat)
Interval Half Width                3.439322
Confidence Interval Lower Limit    1.434281
Confidence Interval Upper Limit    8.312925

Coefficient of Multiple Determination

The coefficient of multiple determination is equal to the regression sum of squares (SSR) divided by the total sum of squares (SST):

$$r^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{SSR}{SST}$$

For Finland's monthly rate of return we have $r^2 = 3299.203 / 7011.618 = 0.47053$.

Excel also created the same result for us when we did regression analysis:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.685955
R Square             0.470534
Adjusted R Square    0.43684
Standard Error       5.809409
Observations         118

The coefficient of multiple determination ($r^2$ = 0.47053) indicates that 47.05% of the variation in Finland's rate of return is explained by the rates of return of the seven biggest economies. One might have expected a stronger relationship, but that is not the case here.
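The R Square and Adjusted R Square figures in the Excel summary can be reproduced from the sums of squares quoted above. The sketch below uses the standard adjusted $r^2$ formula, with n = 118 observations and k = 7 independent variables as reported; the variable names are illustrative.

```python
# Sums of squares as quoted in the text for the seven-economy model.
ssr = 3299.203      # regression sum of squares
sst = 7011.618      # total sum of squares
n = 118             # observations
k = 7               # independent variables

r_squared = ssr / sst

# Adjusted r^2 penalizes the fit for the number of independent variables.
adjusted_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 5), round(adjusted_r_squared, 5))
```

The adjusted value is lower than $r^2$ because it accounts for the seven predictors used to achieve the fit.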

We also looked at the PHStat output in which $r^2$ was recalculated with one of the $X_j$ removed each time.

Condition (variable removed)    r²
All but Italy                   0.47041
All but France                  0.46662
All but United Kingdom          0.46382
All but China                   0.47012
All but Germany                 0.43121
All but Japan                   0.44104
All but USA                     0.46883

Test for the Significance of the Overall Multiple Regression Model

We tested the significance of the overall multiple regression model using the F test. The aim was to determine whether there is a significant relationship between the dependent variable and the entire set of independent variables. Since there is more than one independent variable, we used the following null and alternative hypotheses:

$H_0$: $\beta_1 = \beta_2 = \dots = \beta_7 = 0$ (There is no linear relationship between the dependent variable and the independent variables.)

$H_1$: At least one $\beta_j \neq 0$, j = 1, 2, …, 7 (There is a linear relationship between the dependent variable and at least one of the independent variables.)

The overall F test statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE):

F = MSR/MSE

where:

F = test statistic from an F distribution with k and n – k – 1 degrees of freedom

k = number of independent variables in the regression model

n = number of observations used to create the regression model

ANOVA table for our model

              df     SS             MS             F              Significance F
Regression    7      3299.203009    471.3147155    13.96519865    7.54666E-13
Residual      110    3712.415413    33.74923103
Total         117    7011.618422

The decision rule is to reject $H_0$ at the α level of significance if F > $F_U(k, n-k-1)$, the upper-tail critical value of the F distribution with k and n – k – 1 degrees of freedom; otherwise, do not reject $H_0$. Using a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom found from F tables is approximately 2.09. From the ANOVA table above, the F statistic is 13.9652.

Because 13.9652 > 2.09, we rejected $H_0$ and found statistical evidence that at least one of the independent variables (the rates of return of the seven biggest economies) is related to Finland's rate of return.

Figure showing Significance of the Overall Multiple Regression Model F-Test
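The overall F statistic can be recomputed from the ANOVA sums of squares as a check. In this sketch the sums of squares, n and k come from the ANOVA table for the seven-economy model; the critical value is an approximate F-table reading for 7 and 110 degrees of freedom at the 0.05 level, entered by hand as an assumption rather than computed.

```python
import math

# Values from the ANOVA table for the seven-economy model.
ssr = 3299.203009   # regression sum of squares
sse = 3712.415413   # residual (error) sum of squares
n = 118             # observations
k = 7               # independent variables

msr = ssr / k                 # regression mean square
mse = sse / (n - k - 1)       # error mean square
f_stat = msr / mse

# The standard error of the estimate is the square root of MSE.
standard_error = math.sqrt(mse)

# Assumption: approximate 0.05 critical value for (7, 110) degrees of freedom.
f_critical = 2.09
reject_h0 = f_stat > f_critical

print(round(f_stat, 4), round(standard_error, 4), reject_h0)
```

The same few lines apply to the later models by substituting their sums of squares and the matching degrees of freedom.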

Residual Analysis

We evaluated the appropriateness of the multiple linear regression model using residual analysis. Using Excel, we created the seven residual plots, one for each independent variable, along with the residual plot for the predicted Y. In these charts the pattern of residuals is random, so the use of linear regression was appropriate in this case.

Inferences Concerning the Population Regression Coefficients

To determine whether there is a significant linear relationship between Y (Finland's rate of return) and an independent variable $X_j$ (the monthly rate of return of one of the seven biggest economies), the null and alternative hypotheses are:

$H_0$: $\beta_j = 0$ (There is no linear relationship)

$H_1$: $\beta_j \neq 0$ (There is a linear relationship)

The t statistic equals the difference between the sample slope and the hypothesized value of the population slope, divided by the standard error of the slope:

$$t = \frac{b_j - \beta_j}{S_{b_j}}$$

where:

$b_j$ = slope of variable j with Y, holding constant the effects of the other independent variables

$S_{b_j}$ = standard error of the regression coefficient $b_j$

t = test statistic for a t distribution with n – k – 1 degrees of freedom

k = number of independent variables

$\beta_j$ = hypothesized value of the population slope for variable j, holding constant the effects of the other independent variables

The table below summarizes our findings. We used a 95% confidence level; where the p-value was greater than 0.05, the null hypothesis was not rejected (shown as "Accepted" in the table).

Country           t Stat      Critical t    t Stat in area of non-rejection    Null Hypothesis    p-value     p-value > 0.05
USA               -0.59559    -1.9799       Yes                                Accepted           0.552673    Yes
JAPAN             2.47559     1.9799        No                                 Rejected           0.014826    No
GERMANY           2.85814     1.9799        No                                 Rejected           0.005098    No
CHINA             0.27974     1.9799        Yes                                Accepted           0.780206    Yes
UNITED KINGDOM    1.18083     1.9799        Yes                                Accepted           0.240219    Yes
FRANCE            0.90193     1.9799        Yes                                Accepted           0.369066    Yes
ITALY             0.15824     1.9799        Yes                                Accepted           0.874559    Yes

We found that only Japan and Germany make a statistically significant contribution to Y, Finland's rate of return.
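The decision rule behind the table can be written out in a few lines. The t statistics and the critical value of 1.9799 are taken from the table above; the variable names are illustrative.

```python
T_CRITICAL = 1.9799  # two-tailed critical t used in the paper at the 0.05 level

# t statistics for the seven biggest economies, as reported in the table.
t_stats = {
    "USA": -0.59559,
    "JAPAN": 2.47559,
    "GERMANY": 2.85814,
    "CHINA": 0.27974,
    "UNITED KINGDOM": 1.18083,
    "FRANCE": 0.90193,
    "ITALY": 0.15824,
}

# Reject H0 (no linear relationship) when |t| exceeds the critical value.
significant = [country for country, t in t_stats.items() if abs(t) > T_CRITICAL]

print(significant)
```

Comparing the absolute value of t with a single positive critical value is equivalent to checking both tails of the non-rejection region.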

We then performed PHStat stepwise regression to confirm the findings above. The best model fit from PHStat's forward selection function selected only Germany and Japan as significant independent variables, confirming what we found earlier. Using the coefficients from the forward selection output below, the linear equation reduced to:

$$\hat{Y} = -0.1177 + 0.6604\,X_{\text{Germany}} + 0.3516\,X_{\text{Japan}}$$

PHStat Stepwise Regression Analysis Table
Finland and 7 biggest economies
Table of Results for Forward Selection

GERMANY entered.

              df     SS             MS             F              Significance F
Regression    1      2745.549597    2745.549597    74.65509029    3.55615E-14
Residual      116    4266.068825    36.77645538
Total         117    7011.618422

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -0.034545387    0.562181166       -0.061448851    0.951107497    -1.148015985    1.078925211
GERMANY       0.730094046     0.084498518       8.640317719     3.55615E-14    0.562734089     0.897454003

JAPAN entered.

              df     SS             MS             F              Significance F
Regression    2      3150.704997    1575.352499    46.92297325    1.2593E-15
Residual      115    3860.913425    33.57316021
Total         117    7011.618422

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -0.117655288    0.537672497       -0.218823333    0.82717555     -1.18268099     0.947370414
GERMANY       0.66036799      0.083192301       7.937849769     1.51106E-12    0.495580058     0.825155923
JAPAN         0.351600388     0.101212614       3.473879136     0.000724094    0.151117685     0.55208309

No other variables could be entered into the model. Stepwise ends.
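The t statistic and 95% interval for GERMANY in the first forward-selection step can be reproduced from its coefficient and standard error. In this sketch the critical t of 1.9806 is an approximate table value for 116 degrees of freedom and is an assumption; the other numbers are copied from the stepwise output above.

```python
# Values from the one-variable (GERMANY) forward-selection step.
coefficient = 0.730094046
standard_error = 0.084498518
hypothesized_slope = 0.0      # H0: the population slope is zero

t_stat = (coefficient - hypothesized_slope) / standard_error

# With a single independent variable, the overall F statistic equals t squared.
f_from_t = t_stat ** 2

# Assumption: approximate two-tailed 0.05 critical t for 116 degrees of freedom.
t_critical = 1.9806
lower_95 = coefficient - t_critical * standard_error
upper_95 = coefficient + t_critical * standard_error

print(round(t_stat, 4), round(f_from_t, 3))
print(round(lower_95, 4), round(upper_95, 4))
```

Because the interval does not contain zero, it leads to the same conclusion as the t test: the slope for Germany is significantly different from zero.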

Line Plots

We created line plots of Finland's returns against each of the independent variables:


Further Analysis

At this point we wondered whether Finland's stock market was more closely related to the bigger economies in its neighborhood. We picked the following countries for further analysis:

Country        Economic Rank
Russia         11
Netherlands    16
Belgium        18
Sweden         19
Poland         24
Norway         25
Denmark        28

We obtained the following Excel output:

Here $r^2$ = 0.596 told us that Finland's stock market has a stronger linear relationship with its seven neighboring countries than with the seven biggest economies of the world.

Again with a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom found from F tables is approximately 2.09. The F statistic is 23.1811. Because 23.1811 > 2.09, we rejected $H_0$ and found enough statistical evidence to conclude that at least one of the independent variables (the rates of return of the seven neighboring economies) is related to Finland's rate of return.

Hence the multiple linear regression equation is:

Again we created the t-test table and found that only Sweden and Poland make a statistically significant contribution to Y, Finland's rate of return.

Country        t Stat        Critical t    t Stat in area of non-rejection    Null Hypothesis    p-value     p-value > 0.05
RUSSIA         0.50122106    1.9799        Yes                                Accepted           0.617217    Yes
NETHERLANDS    -0.1346789    -1.9799       Yes                                Accepted           0.893112    Yes
BELGIUM        0.87458724    1.9799        Yes                                Accepted           0.383704    Yes
SWEDEN         4.76688076    1.9799        No                                 Rejected           5.77E-06    No
POLAND         4.229502      1.9799        No                                 Rejected           4.86E-05    No
NORWAY         -0.3460091    -1.9799       Yes                                Accepted           0.729997    Yes
DENMARK        -0.672525     -1.9799       Yes                                Accepted           0.50266     Yes

With this new insight, we changed the linear equation to:

Once again we performed PHStat stepwise regression to confirm the findings, and it gave the same result: Finland is most closely related to Sweden and Poland.


The findings above made us curious. We wanted to see whether Finland's rate of return had any linear relationship with other countries in Asia Pacific and Latin America. We kept the four countries that had shown a linear relationship in our earlier analyses (Germany, Japan, Poland and Sweden) and added eight others. The twelve countries picked were:

ARGENTINA    INDONESIA    CHINA      POLAND
AUSTRALIA    JAPAN        GERMANY    SWEDEN
BRAZIL       MEXICO       INDIA      SINGAPORE

We obtained following Excel output:


Here $r^2$ = 0.62 told us that Finland's stock market has a stronger linear relationship with the twelve countries in the list than with either of the earlier sets.

Again with a 0.05 level of significance, the F statistic of 14.715 is greater than the critical value of the F distribution with 12 and 105 (118 - 12 - 1) degrees of freedom, which is approximately 1.85. We therefore rejected $H_0$ and found statistical evidence that at least one of the independent variables (the rates of return of the twelve countries) is related to Finland's rate of return.

We created the t-test table and found that only Sweden, Poland, China and Australia had a statistically significant linear relationship with Y, Finland's rate of return.

Country      t Stat      Critical t    t Stat in area of non-rejection    Null Hypothesis    p-value        p-value > 0.05
ARGENTINA    -0.52605    -1.9799       Yes                                Accepted           0.59996518     Yes
AUSTRALIA    2.03623     1.9799        No                                 Rejected           0.044244445    No
BRAZIL       0.627811    1.9799        Yes                                Accepted           0.531490492    Yes
CHINA        -2.08348    -1.9799       No                                 Rejected           0.039635123    No
GERMANY      0.649476    1.9799        Yes                                Accepted           0.517449239    Yes
INDIA        0.892593    1.9799        Yes                                Accepted           0.374116105    Yes
INDONESIA    0.109509    1.9799        Yes                                Accepted           0.913007487    Yes
JAPAN        0.422115    1.9799        Yes                                Accepted           0.673804307    Yes
MEXICO       0.127667    1.9799        Yes                                Accepted           0.89865689     Yes
POLAND       3.339558    1.9799        No                                 Rejected           0.001162327    No
SWEDEN       3.297052    1.9799        No                                 Rejected           0.001333794    No
SINGAPORE    0.282879    1.9799        Yes                                Accepted           0.777826147    Yes


Again PHStat stepwise regression confirmed these findings: Finland is linearly related to Sweden, Poland, China and Australia.

Multiple Linear Regression Conclusions

We found that regression can be a very good tool for modeling stock market returns and finding relationships among the returns of different markets. As with any statistical analysis, there is always uncertainty, and this needs to be kept in mind when making investment decisions.

We found that Finland's rate of return was related to the rates of return of Sweden, Poland, China and Australia. Another way to read this is that these five countries present similar investment risks.

6. Forecasting

We applied forecasting techniques to Austria's stock index. First we plotted the monthly rate of return against time.

Here we found that the monthly returns are non-seasonal and have no consistent upward or downward trend. In this case, exponential smoothing and moving average models were the most appropriate for forecasting.

Models

We used the following forecasting techniques to create our forecasting models:

1. First Order Naïve

2. 2nd Order Naïve

3. 3 Period Moving Averages

4. 4 Period Moving Averages

5. 5 Period Moving Averages

6. Exponential Smoothing Forecast with ω = 0.758

7. Exponential Smoothing Forecast with ω = 0.2
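Three of the model families listed above can be sketched as follows. These are common textbook formulations and are assumptions here, since the paper's exact formulas are not reproduced in the text: the naïve forecast repeats the previous actual value, the moving average uses the mean of the previous `window` values, and exponential smoothing uses F(t+1) = ω·Y(t) + (1 − ω)·F(t), seeded with the first observation. The second-order naïve model is omitted because its definition is not given. Function names and data are hypothetical.

```python
def naive_forecast(series):
    """First-order naive: the forecast for each period is the previous actual value."""
    return [None] + series[:-1]


def moving_average_forecast(series, window):
    """Forecast each period as the mean of the preceding `window` actual values."""
    forecasts = [None] * window
    for t in range(window, len(series)):
        forecasts.append(sum(series[t - window:t]) / window)
    return forecasts


def exponential_smoothing_forecast(series, omega):
    """Forecast with F[t+1] = omega * Y[t] + (1 - omega) * F[t], seeded with Y[0]."""
    forecasts = [None]
    level = series[0]
    for actual in series[1:]:
        forecasts.append(level)   # forecast made before seeing `actual`
        level = omega * actual + (1 - omega) * level
    return forecasts


# Hypothetical monthly returns (not the Austrian series).
series = [2.0, 4.0, 6.0, 8.0, 10.0]

naive = naive_forecast(series)
ma3 = moving_average_forecast(series, 3)
es = exponential_smoothing_forecast(series, 0.5)

print(naive)
print(ma3)
print(es)
```

A larger ω makes the smoothed forecast follow recent values more closely, while a smaller ω averages over a longer history.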


Forecast Error Measures

We created these seven models using Excel. After creating the forecasted series we calculated the forecast error measures, given as:

Bias:

Average error $= \frac{\sum_{t=1}^{n} e_t}{n}$

Variability:

Mean squared error $MSE = \frac{\sum_{t=1}^{n} e_t^2}{n-1}$

Standard deviation $s = \sqrt{MSE}$

Mean absolute error $MAD = \frac{\sum_{t=1}^{n} |e_t|}{n-1}$

We calculated the error measures and here they are:

Measure          FNAIV Error    2nd NAIV Error    F_MA_3 Error    F_MA_4 Error    F_MA_5 Error    EXP_0.758 Error    Exp_0.2 Error
Average Error    -0.11329       -0.08389          -0.08465        -0.11619        -0.25684        -0.00982           0.237657
MSE              91.58189       91.62089          63.3329         58.75387        57.74957        73.89865           54.27465
SE               9.569843       9.57188           7.958197        7.665107        7.599314        8.596432           7.367133
MAD              7.172345       7.077884          5.530948        5.438287        5.551451        6.166996           5.396453


We found that the exponential smoothing forecast with ω = 0.758 had the least bias, with an average error of -0.00982, whereas the exponential smoothing forecast with ω = 0.2 had the least variability (MSE of 54.27465). The mean absolute error (MAD) was also lowest for the exponential smoothing forecast with ω = 0.2 (5.396453), with the 4 period moving averages model close behind (5.438287).
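The four error measures can be computed as in the sketch below, which follows the paper's definitions, including the n − 1 divisor for MSE and MAD. The error is taken as actual minus forecast, which is an assumption because the text does not define the sign of the error. The function name and data are hypothetical.

```python
import math


def forecast_error_measures(actual, forecast):
    """Return average error, MSE, SE and MAD over periods that have a forecast."""
    # Assumption: error = actual - forecast; periods without a forecast are skipped.
    errors = [a - f for a, f in zip(actual, forecast) if f is not None]
    n = len(errors)
    average_error = sum(errors) / n
    mse = sum(e * e for e in errors) / (n - 1)
    se = math.sqrt(mse)
    mad = sum(abs(e) for e in errors) / (n - 1)
    return {"average_error": average_error, "mse": mse, "se": se, "mad": mad}


# Hypothetical series with a first-order naive forecast (previous actual value).
actual = [2.0, 4.0, 6.0, 8.0, 10.0]
forecast = [None, 2.0, 4.0, 6.0, 8.0]

measures = forecast_error_measures(actual, forecast)
print(measures)
```

An average error near zero indicates little bias, while MSE, SE and MAD describe how widely the errors are spread regardless of sign.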

We created charts of actual versus forecasted values for all these models to show visually how the different forecasting models behave.

Charts for Forecast Models

First Order Naïve Forecast Model Chart (Blue Actual, Red Forecasted)

Second Order Naïve Forecast Model Chart (Blue Actual, Red Forecasted)

3 Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)




Exponential Smoothing Forecast Chart (with ω=0.758, Blue Actual, Red Forecasted)


Exponential Smoothing Forecast Chart (with ω=0.2, Blue Actual, Red Forecasted)

7. Conclusion

In this paper we used different statistical techniques to analyze international stock market returns. Such a study could be far more exhaustive, but within our limited scope we successfully demonstrated the use of basic statistics, linear regression and time series forecasting.


Appendix

A. References

1. Levine, Stephan, Krehbiel, Berenson, Statistics for Managers Using Microsoft Excel, 5th Edition, 2008.

2. Anderson, Sweeney, Williams, Essentials of Modern Business Statistics, 4th Edition, 2009.

3. Statistical terms at http://en.wikipedia.org/

4. Class notes.

B. List of Excel files used for calculations

1. country_data_in_pc.xlsm is the main Excel file, with the following worksheets:

a. The base data for the fifty countries' monthly rates of return is in the HistoryIndex worksheet.

b. The MarketReturns worksheet transforms the HistoryIndex worksheet to log returns in percentages using the formula =100*LN(HistoryIndex!B11/HistoryIndex!B10). It also has the following basic statistics for each country: Monthly Average Return, Monthly Variance, Monthly Std. Deviation, Yearly Avg Return, Yearly Std. Dev, Coefficient of Variation, Skewness, Kurtosis.

c. The CorrelationMatrix worksheet has the 50 X 50 correlation matrix table for the 50 countries.

d. The Histograms worksheet has histograms of the rates of return for all countries, including their frequency tables.

e. The BasicStatsUsingExcel worksheet again contains the basic statistics, but calculated using the Data Analysis Descriptive Statistics tool instead of the functions and calculations used for each statistical measure in b.

f. The line diagram of average yearly rates of return with standard deviation is in the YRLY_RETURN_CHART_BY_COUNTRIES worksheet.

g. The Kurtosis_Chart worksheet has the monthly kurtosis measure plotted with monthly average returns and standard deviation.

h. The list of all countries in one column is in the CountryNames worksheet.

2. Correlationmatrix.xlsx has the 50 X 50 correlation matrix table for the 50 countries (CorrelationMatrix).

3. Finland_n_other_7_big_economies.xlsx has Finland's regression calculations along with residual plots and line diagrams in the first worksheet, Finland_Regression_Model. The other worksheets contain output from PHStat.

4. File finland_vs_7_Neighboring _big_countries.xlsx has Finland's regression calculations with the seven biggest neighboring economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat; the Stepwise worksheet has the PHStat stepwise regression output.

5. File Finland_and_other_tweleve_countries.xlsx has Finland's regression calculations with 12 other major economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat; the Stepwise worksheet has the PHStat stepwise regression output.

6. Austria_time_forecasting.xlsm has the time series modeling output. These are the worksheets in the Excel file:

a. Worksheet Main has the basic time series and columns showing all time series models with values and errors.

b. The Basic Charts worksheet has Austria's stock price and returns plotted against time.

c. Worksheets FNAIV, FNAIV, F_MA_3, F_MA_4, F_MA_5, EXP_0.758 and EXP_0.2 are used to calculate Avg Error, MSE, SE and MAD.
