Using Statistics to Understand Stock Markets
Summary
This term paper outlines and demonstrates the use of statistics in understanding ten years of stock market prices and returns from fifty countries, using basic statistics, linear regression and time series forecasting models.
Keywords
Term paper, Statistics, Linear Regression, Average, Logarithmic Transformation, Kurtosis, F-Test for Linear Regression, T-Test for Linear Regression, P-value, Partial Linear Regression, Exponential Time Series Forecasting, Moving Averages Forecasting
Contents:
1. Abstract
2. Introduction
3. Basic Statistics:
4. Correlation
5. Linear Regression Model
5.1 Introduction.
5.2 Predicting the Dependent Variable Y.
5.3 Coefficient of Multiple Determination.
5.4 Test for the Significance of the Overall Multiple Regression Model.
5.6 Residual Analysis.
5.7 Inferences Concerning the Population Regression Coefficients.
5.8 Line Plots.
5.9 Further Analysis.
5.10 Multiple Linear Regression Conclusions.
6. Forecasting
6.1 Models.
6.2 Forecast Error Measures.
6.3 Charts for Forecast Models.
7. Conclusion
Appendix
A. References.
B. List of Excel files used for calculations.
1. Abstract
In this paper we demonstrate the use of statistical concepts to understand stock market returns for fifty countries. The data are monthly closing prices of the MSCI country indices in US dollars.
We computed basic statistics for the monthly stock market returns of the fifty countries. We used linear regression to relate the returns of Finland first to the stock markets of the world's seven largest economies, then to seven larger neighboring economies, and finally to eight other important economies. Lastly, we used time series forecasting to predict the returns of the Austrian stock index. Excel and PHStat were our tools of choice.
2. Introduction
Most of us have some relationship with the stock market. We either hold stocks directly or hold them indirectly through mutual funds in retirement plans. Stock markets are where investors invest in companies big and small. The stock market is a direct indicator of the economic environment. Hence understanding stock markets helps managers make investment and capital expenditure decisions. Typically stock brokers, stock investors, government officials and banks are interested in stock market data and analysis.
We had monthly MSCI index prices for fifty countries. The index prices are not normally distributed, but the returns approximately are. It is therefore preferable not to work directly with the price series when performing statistical analysis; instead, the raw price series are converted into series of returns. Returns have the added benefit that they are unit-free and currency-free, allowing comparisons across markets.
There are two methods used to calculate returns from a series of prices: simple returns and continuously compounded returns, which are computed as follows:
Simple returns: $R_t = \dfrac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$
Continuously compounded returns: $r_t = 100\% \times \ln\left(\dfrac{p_t}{p_{t-1}}\right)$
where:
$R_t$ denotes the simple return at time t
$r_t$ denotes the continuously compounded return at time t
$p_t$ denotes the asset price at time t
ln denotes the natural logarithm.
For our analysis we used continuously compounded returns, obtained through the logarithmic transformation. Excel and PHStat were the tools for our statistical analysis.
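As a minimal sketch of the two return definitions (written in Python rather than the Excel we actually used, and with hypothetical index levels rather than the MSCI data), the calculation could look like this. The function names are illustrative assumptions.

```python
import math

def simple_returns(prices):
    """Simple returns in percent: 100 * (p_t - p_{t-1}) / p_{t-1}."""
    return [100.0 * (p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Continuously compounded returns in percent: 100 * ln(p_t / p_{t-1})."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical month-end index levels (not taken from the MSCI data).
prices = [100.0, 105.0, 99.75, 110.0]

simple = simple_returns(prices)
logret = log_returns(prices)

# Log returns are additive over time: their sum equals the log return
# computed directly from the first and last price.
total_log_return = 100.0 * math.log(prices[-1] / prices[0])
```

The additivity shown in the last line is one practical reason to prefer continuously compounded returns when aggregating monthly figures.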
3. Basic Statistics:
We found out which countries gave the highest average returns over the ten-year period. Colombia, Czech Republic, Austria and Denmark were the top four, with annual compounded returns of 18.68%, 18.22%, 17.98% and 14.89% for the nine-plus-year period from 10/31/1997 to 7/31/2007.
While Colombia, Czech Republic, Austria and Denmark gave investors the biggest returns, Denmark and Austria were the least volatile of these four countries. The standard deviations for Denmark and Austria were 17.44 and 23.99, compared to 32.16 and 40.72 for Czech Republic and Colombia.
The chart below shows the average rate of returns against the standard deviation.
Investing in the stock market always bears some risk, large or small, depending on the volatility of the stock price. A stock with large volatility may give high or negative returns depending on when the investor enters and exits the holding.
We found that Russia, Turkey and Indonesia had the highest volatility, whereas the USA, United Kingdom and Canada were the most stable (as of Sep 2007).
We made an attempt to understand the statistical measure Kurtosis. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations. Kurtosis measures whether the data is sharp or flat relative to a normal distribution. Since Kurtosis measures the shape of the distribution (the fatness of the tails), it focuses on how returns are ranged around the mean. A Kurtosis coefficient of three indicates a normal distribution.
Kurtosis of less than three indicates a low peak with a fat midrange on either side (platykurtic). Conversely, kurtosis greater than three indicates a sharp, high peak with a thin midrange and fat tails (leptokurtic). Put simply, kurtosis describes how bunched around the center or spread toward the endpoints a frequency distribution is. Kurtosis is sometimes also called "the volatility of volatility."
The chart below shows kurtosis in increasing order against average monthly returns and their standard deviation. Russia has the largest standard deviation and kurtosis, meaning that Russia's stock market is the most volatile among the 50 countries.
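The kurtosis measure described above can be sketched in Python as the ratio of the fourth central moment to the squared second central moment, which is 3 for a normal distribution. This is an illustrative assumption about the formula, not a copy of our spreadsheet: Excel's KURT function reports excess kurtosis with a small-sample correction, so its values are centered on 0 rather than 3. The sample data are hypothetical.

```python
import math

def describe(returns):
    """Mean, sample standard deviation and (non-excess) kurtosis of a return series."""
    n = len(returns)
    mean = sum(returns) / n
    # Sample variance uses n - 1 in the denominator.
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    # Kurtosis from population central moments: m4 / m2^2 (3 for a normal distribution).
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return {"mean": mean, "std": math.sqrt(variance), "kurtosis": m4 / m2 ** 2}

# Hypothetical series: mostly quiet months with two extreme moves (fat tails).
fat_tailed = describe([0.0] * 8 + [-10.0, 10.0])

# Hypothetical series: returns alternating between two values (no tails at all).
two_point = describe([-1.0, 1.0, -1.0, 1.0])
```

The fat-tailed series has kurtosis above three (leptokurtic) and the two-point series has kurtosis below three (platykurtic), matching the interpretation given in the text.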
4. Correlation
The relationship between two variables is expressed through correlation. We used Excel to create a 50 X 50 correlation matrix, which is shown in the table. We found that the Netherlands and France had a correlation of 0.886035, one of the highest. On the other hand, Pakistan and Denmark had a correlation of -0.19975.
Any investment requires diversification so that the investing risks are spread among unrelated investments. Correlation can be a good tool for finding diversification opportunities. A quick glance at the correlation matrix shows how to diversify. For example, investments in France and the Netherlands, with a correlation coefficient of 0.886, will lead to similar returns and risks.
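A correlation matrix like the one we built in Excel can be sketched in a few lines of Python. The country labels and return series below are hypothetical placeholders, and the helper names are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(returns_by_country):
    """Nested dict: matrix[a][b] is the correlation between countries a and b."""
    names = sorted(returns_by_country)
    return {a: {b: pearson(returns_by_country[a], returns_by_country[b])
                for b in names}
            for a in names}

# Hypothetical monthly returns for three markets.
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # moves exactly with A
    "C": [4.0, 3.0, 2.0, 1.0],   # moves exactly against A
}
matrix = correlation_matrix(returns)
```

In this toy example A and B offer no diversification (correlation of 1), while A and C offset each other completely (correlation of -1); real markets fall somewhere in between.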
5. Linear Regression Model
Introduction
We picked Finland for creating linear regression models. Finland ranks 33rd among the world's top 50 economies. The top 7 economies are:
United States, Japan, Germany, China, United Kingdom, France and Italy
(Source: The Economist Pocket World Figures, 2009 edition)
The multiple regression model with k independent variables is given as:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
In our case we have 7 independent variables, so the model to be developed is:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_7 X_{7i}$
Here $X_{ji}$, for j = 1 to 7, are the monthly rates of return for the seven biggest economies of the world. $\hat{Y}_i$ is the estimated rate of return for Finland, for which the linear model was to be created. We first used Microsoft Excel to compute the values of the eight regression coefficients.
Output of Microsoft Excel Multiple Regression Analysis
From above figure, the computed values of the regression coefficients are
Therefore the multiple regression equation is:
The sample Y intercept ($b_0$) estimates the return of the Finland stock market when the returns of all seven other stock markets are zero. Because the stock returns cannot realistically be zero for all markets at the same time, the value of $b_0$ has no practical interpretation.
The slope of Finland's rate of return with respect to the US rate of return indicates that, for each one-unit increase in the US rate of return, Finland's rate of return is estimated to decrease by 0.14045 units, holding the returns of the other six markets constant. The estimates of all $b_j$ allowed us to better understand the effect of the rates of return of the seven biggest economies on Finland.
Regression coefficients in multiple regression are called net regression coefficients. They estimate the mean change in Y per unit change in a particular X, holding constant the effect of other X variables.
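To make the idea of net regression coefficients concrete, here is a minimal least-squares sketch in pure Python. It solves the normal equations directly and is only an illustration: we used Excel's regression tool, and the two-predictor data set below is hypothetical and constructed without noise so that the fitted coefficients are known in advance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in xs]   # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in xs]

b0, b1, b2 = ols(xs, y)
```

Here b1 recovers the change in y per unit change in x1 with x2 held constant, which is the sense in which each coefficient is a "net" effect.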
Predicting the Dependent Variable Y
We used the multiple regression equation to predict the value of the dependent variable. Using the monthly rates of return for Feb 27, 1998 for all seven countries, the model gave a 95% confidence interval for the mean predicted return of 1.43428 to 8.31292. The actual value of the dependent variable, Finland's monthly rate of return, was 9.80996, which lies above this interval. We used PHStat's confidence interval estimate and prediction interval function to arrive at the range.
The table below shows the estimated Finland monthly return at the 95% confidence level on Feb 27, 1998, given the rates of return for the seven countries:
Data
Confidence Level                    95%
USA given value                     6.287008
JAPAN given value                   1.1237
GERMANY given value                 4.412191
CHINA given value                   30.57856
UNITED KINGDOM given value          5.309952
FRANCE given value                  8.189638
ITALY given value                   6.306493

For Average Predicted Y (YHat)
Interval Half Width                 3.439322
Confidence Interval Lower Limit     1.434281
Confidence Interval Upper Limit     8.312925
Coefficient of Multiple Determination
The coefficient of multiple determination is equal to the regression sum of squares (SSR) divided by the total sum of squares (SST):
$r^2 = \dfrac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \dfrac{SSR}{SST}$
For Finland's monthly rate of return we have $r^2 = 3299.203 / 7011.618 = 0.47053$
Excel also created the same result for us when we did regression analysis:
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.685955
R Square             0.470534
Adjusted R Square    0.43684
Standard Error       5.809409
Observations         118
The coefficient of multiple determination ($r^2$ = 0.47053) indicates that 47.05% of the variation in Finland's rate of return is explained by the rates of return of the seven biggest economies. One might have expected a stronger relationship, but that is not the case here.
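The figures in the Excel summary output can be reproduced from the sums of squares reported above. The short Python check below is only a sketch; the adjusted r-square formula, 1 - (1 - r^2)(n - 1)/(n - k - 1), is the standard textbook definition and is assumed to be what Excel applies.

```python
# Values taken from the regression output reported in the text.
ssr = 3299.203   # regression sum of squares
sst = 7011.618   # total sum of squares
n = 118          # number of monthly observations
k = 7            # number of independent variables

r_squared = ssr / sst
multiple_r = r_squared ** 0.5
# Adjusted r-square penalizes additional independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

The three results agree with the Multiple R, R Square and Adjusted R Square rows of the summary output to the precision shown there.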
We also looked at the PHStat output where $r^2$ was calculated seven times, removing one of the $X_j$ each time.
Condition (variable removed)     r-square
All but Italy                    0.47041
All but France                   0.46662
All but United Kingdom           0.46382
All but China                    0.47012
All but Germany                  0.43121
All but Japan                    0.44104
All but USA                      0.46883
Test for the Significance of the Overall Multiple Regression Model
We performed the significance test of the overall multiple regression model using F-test. Here we tried to find if there is a significant relationship between the dependent variable and the entire set of independent variables. Since there is more than one independent variable, we used the following null and alternate hypothesis:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_7 = 0$ (There is no linear relationship between the dependent variable and the independent variables.)
$H_1$: At least one $\beta_j \neq 0$, j = 1, 2, ..., 7 (There is a linear relationship between the dependent variable and at least one of the independent variables.)
The overall F test statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE):
F = MSR/MSE
where:
F = test statistic from an F distribution with k and n – k – 1 degrees of freedom
k = number of independent variables in the regression model
n = number of observations used to create the regression model
ANOVA table for our model
              df     SS             MS             F              Significance F
Regression    7      3299.203009    471.3147155    13.96519865    7.54666E-13
Residual      110    3712.415413    33.74923103
Total         117    7011.618422
The decision rule is to reject $H_0$ at the α level of significance if $F > F_{U(k,\,n-k-1)}$; otherwise, do not reject $H_0$. Using a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom, found from F tables, is approximately 2.09. From the ANOVA table above, the F statistic is 13.9652.
Because 13.9652 > 2.09, we rejected $H_0$ and found statistical evidence to conclude that at least one of the independent variables (rates of return of the seven biggest economies) is related to Finland's rate of return.
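The F statistic can be recomputed from the ANOVA table as a quick check. The Python lines below are a sketch using the sums of squares reported above; the critical value itself still has to come from F tables or statistical software.

```python
# Sums of squares from the ANOVA table in the text.
ssr = 3299.203009   # regression sum of squares
sse = 3712.415413   # residual (error) sum of squares
n = 118             # observations
k = 7               # independent variables

msr = ssr / k              # regression mean square
mse = sse / (n - k - 1)    # error mean square, 110 degrees of freedom
f_stat = msr / mse
```

If SciPy happens to be installed, `scipy.stats.f.ppf(0.95, k, n - k - 1)` should give the upper-tail critical value for comparison; that call is not needed for the arithmetic above.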
Figure showing Significance of the Overall Multiple Regression Model F-Test
Residual Analysis
We evaluated the appropriateness of the multiple linear regression model using residual analysis. We created the seven residual plots using Excel, along with the residuals for the predicted Y. From these charts we saw that the pattern is random in all of them, so the use of linear regression was appropriate in this case.
Inferences Concerning the Population Regression Coefficients
To determine whether there is a significant linear relationship between Y (Finland's rate of return) and an independent variable $X_j$ (the monthly rate of return of one of the seven biggest economies), the null and the alternative hypotheses are:
$H_0: \beta_j = 0$ (There is no linear relationship)
$H_1: \beta_j \neq 0$ (There is a linear relationship)
The t-statistic equals the difference between the sample slope and the hypothesized value of the population slope divided by the standard error of the slope:
$t = \dfrac{b_j - \beta_j}{S_{b_j}}$
where
$b_j$ = slope of variable j with Y, holding constant the effects of the other independent variables
$S_{b_j}$ = standard error of the regression coefficient $b_j$
t = test statistic for a t distribution with n – k – 1 degrees of freedom
k = number of independent variables
$\beta_j$ = hypothesized value of the population slope for variable j, holding constant the effects of the other independent variables
The table below summarizes our findings. We used a 95% confidence level; for a p-value > 0.05 the null hypothesis was not rejected (shown as "Accepted" in the table).
Country           t Stat      Critical t   Is t-stat in area of non-rejection   Null Hypothesis   p-value     p-value > 0.05
USA               -0.59559    -1.9799      Yes                                  Accepted          0.552673    Yes
JAPAN             2.47559     1.9799       No                                   Rejected          0.014826    No
GERMANY           2.85814     1.9799       No                                   Rejected          0.005098    No
CHINA             0.27974     1.9799       Yes                                  Accepted          0.780206    Yes
UNITED KINGDOM    1.18083     1.9799       Yes                                  Accepted          0.240219    Yes
FRANCE            0.90193     1.9799       Yes                                  Accepted          0.369066    Yes
ITALY             0.15824     1.9799       Yes                                  Accepted          0.874559    Yes
We found that only Japan and Germany make a statistically significant contribution to Y, Finland's rate of return.
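The decision rule behind the table can be written out explicitly. The Python sketch below uses the t statistics and the critical value of 1.9799 reported in the table; the helper function and variable names are illustrative assumptions.

```python
def t_statistic(b_j, se_b_j, beta_j=0.0):
    """t = (sample slope - hypothesized slope) / standard error of the slope."""
    return (b_j - beta_j) / se_b_j

# t statistics for the seven biggest economies, as reported in the table.
t_stats = {
    "USA": -0.59559,
    "JAPAN": 2.47559,
    "GERMANY": 2.85814,
    "CHINA": 0.27974,
    "UNITED KINGDOM": 1.18083,
    "FRANCE": 0.90193,
    "ITALY": 0.15824,
}
critical_t = 1.9799  # two-tailed critical value at the 0.05 level, from the table

# Reject H0 (no linear relationship) when |t| exceeds the critical value.
significant = [country for country, t in t_stats.items() if abs(t) > critical_t]
```

Using the absolute value of t makes the two-tailed nature of the test explicit, so a single positive critical value serves for both positive and negative slopes.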
We decided to perform PHStat Stepwise Regression to confirm the findings above. We found that PHStat's forward selection (best model fit) selected only Germany and Japan as significant independent variables. This confirmed what we found earlier. So the linear equation, using the coefficients from the final step of the stepwise output, reduced to:
$\hat{Y}_i = -0.1177 + 0.6604\,X_{\text{Germany},i} + 0.3516\,X_{\text{Japan},i}$
PHStat Stepwise Regression Analysis Table
Finland and 7 biggest economies
Table of Results for Forward Selection

GERMANY entered.
              df     SS             MS             F              Significance F
Regression    1      2745.549597    2745.549597    74.65509029    3.55615E-14
Residual      116    4266.068825    36.77645538
Total         117    7011.618422

              Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept     -0.034545387    0.562181166      -0.061448851    0.951107497    -1.148015985    1.078925211
GERMANY       0.730094046     0.084498518      8.640317719     3.55615E-14    0.562734089     0.897454003

JAPAN entered.
              df     SS             MS             F              Significance F
Regression    2      3150.704997    1575.352499    46.92297325    1.2593E-15
Residual      115    3860.913425    33.57316021
Total         117    7011.618422

              Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept     -0.117655288    0.537672497      -0.218823333    0.82717555     -1.18268099     0.947370414
GERMANY       0.66036799      0.083192301      7.937849769     1.51106E-12    0.495580058     0.825155923
JAPAN         0.351600388     0.101212614      3.473879136     0.000724094    0.151117685     0.55208309

No other variables could be entered into the model. Stepwise ends.
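The first step of the stepwise output can be checked by hand. The Python sketch below recomputes Germany's t statistic and the step's F statistic from the reported values, and illustrates the known identity that in a regression with a single independent variable F equals t squared. The variable names are illustrative.

```python
# Values from the "GERMANY entered" step of the stepwise output.
b_germany = 0.730094046     # slope coefficient
se_germany = 0.084498518    # standard error of the slope
msr = 2745.549597           # regression mean square (1 degree of freedom)
mse = 36.77645538           # residual mean square (116 degrees of freedom)
lower_95, upper_95 = 0.562734089, 0.897454003

t_germany = b_germany / se_germany
f_step1 = msr / mse

# The 95% interval is b +/- t_crit * SE, so the critical t that PHStat
# used can be backed out from the reported interval limits.
implied_t_crit = (upper_95 - lower_95) / 2 / se_germany
```

The implied critical value comes out at about 1.98, in line with a t distribution with 116 degrees of freedom.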
Line Plots
We created line plots of Finland's returns against each of the independent variables:
Further Analysis
At this point we wondered if Finland's stock market was correlated more with the bigger economies in its neighborhood. We picked the following countries for further analysis:
Country        Economic Rank     Country     Economic Rank
Russia         11                Poland      24
Netherlands    16                Norway      25
Belgium        18                Denmark     28
Sweden         19
We obtained the following Excel output:
Here $r^2$ = 0.596 told us that Finland's stock market has a better linear relationship with its seven neighboring countries than with the seven biggest economies of the world.
Again with a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom, found from F tables, is approximately 2.09. The F statistic is 23.1811. Because 23.1811 > 2.09, we rejected $H_0$ and found enough statistical evidence to conclude that at least one of the independent variables (rates of return of the seven neighboring economies) is related to Finland's rate of return.
Hence the multiple linear regression equation is:
Again we created the t-test table and found that only Sweden and Poland make a statistically significant contribution to Y, Finland's rate of return.
Country        t Stat        Critical t   Is t-stat in area of non-rejection   Null Hypothesis   p-value     p-value > 0.05
RUSSIA         0.50122106    ±1.9799      Yes                                  Accepted          0.617217    Yes
NETHERLANDS    -0.1346789    ±1.9799      Yes                                  Accepted          0.893112    Yes
BELGIUM        0.87458724    ±1.9799      Yes                                  Accepted          0.383704    Yes
SWEDEN         4.76688076    ±1.9799      No                                   Rejected          5.77E-06    No
POLAND         4.229502      ±1.9799      No                                   Rejected          4.86E-05    No
NORWAY         -0.3460091    ±1.9799      Yes                                  Accepted          0.729997    Yes
DENMARK        -0.672525     ±1.9799      Yes                                  Accepted          0.50266     Yes
With this new insight, we changed the linear equation to:
Once again we performed PHStat Stepwise Regression to confirm the findings, and it gave the same result: Finland is more closely related to Sweden and Poland.
The findings above made us curious. We wanted to see if Finland's rate of return had any linear relationship with other countries in Asia Pacific and Latin America. We kept the four countries that showed a linear relationship in our earlier analyses (Germany, Japan, Poland and Sweden) and added eight others. These countries were picked:
ARGENTINA    INDONESIA    CHINA      POLAND
AUSTRALIA    JAPAN        GERMANY    SWEDEN
BRAZIL       MEXICO       INDIA      SINGAPORE
We obtained the following Excel output:
We found $r^2$ = 0.62, which told us that Finland's stock market has a somewhat better linear relationship with the twelve countries in the list.
Again with a 0.05 level of significance, the F statistic of 14.715 is greater than the critical value (approximately 1.85 for 12 and 105 degrees of freedom), so we rejected $H_0$ and found statistical evidence to conclude that at least one of the independent variables (rates of return of the twelve countries) is related to Finland's rate of return.
We created the t-test table and found that only Sweden, Poland, China and Australia had a statistically significant linear relationship with Y, Finland's rate of return.
Country      t Stat      Critical t   Is t-stat in area of non-rejection   Null Hypothesis   p-value        p-value > 0.05
ARGENTINA    -0.52605    -1.9799      Yes                                  Accepted          0.59996518     Yes
AUSTRALIA    2.03623     1.9799       No                                   Rejected          0.044244445    No
BRAZIL       0.627811    1.9799       Yes                                  Accepted          0.531490492    Yes
CHINA        -2.08348    -1.9799      No                                   Rejected          0.039635123    No
GERMANY      0.649476    1.9799       Yes                                  Accepted          0.517449239    Yes
INDIA        0.892593    1.9799       Yes                                  Accepted          0.374116105    Yes
INDONESIA    0.109509    1.9799       Yes                                  Accepted          0.913007487    Yes
JAPAN        0.422115    1.9799       Yes                                  Accepted          0.673804307    Yes
MEXICO       0.127667    1.9799       Yes                                  Accepted          0.89865689     Yes
POLAND       3.339558    1.9799       No                                   Rejected          0.001162327    No
SWEDEN       3.297052    1.9799       No                                   Rejected          0.001333794    No
SINGAPORE    0.282879    1.9799       Yes                                  Accepted          0.777826147    Yes
Again PHStat Stepwise Regression confirmed the same findings: Finland is linearly related to Sweden, Poland, China and Australia.
Multiple Linear Regression Conclusions
We found that regression can be a very good tool for modeling stock market returns and finding relationships among different market returns. With any statistical analysis there is always going to be uncertainty, and this needs to be kept in mind when making investing decisions.
We were able to find that Finland's rate of return was related to the rates of return of these countries: Sweden, Poland, China and Australia. Another way to understand this is that these five countries (Finland included) present related investment risks.
6. Forecasting
We applied forecasting techniques to Austria's stock index prices. First we plotted the monthly rate of return against time.
We found that the monthly returns are non-seasonal and have no consistent upward or downward trend. In this case, exponential smoothing and moving average models were the most appropriate for forecasting.
Models
We used the following forecasting techniques to create our forecasting models:
1. First Order Naïve:
2. 2nd Order Naïve:
3. 3 Period Moving averages:
4. 4 Period Moving averages:
5. 5 Period Moving averages:
6. Exponential Smoothing Forecast with ω = 0.758:
7. Exponential Smoothing Forecast with ω = 0.2
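The model families listed above can be sketched in Python as follows. This is a sketch under stated assumptions, not a transcription of our Excel formulas: the first-order naive forecast is taken to be the previous actual value, the moving average forecast is the mean of the previous `window` actual values, and exponential smoothing is taken as F(t+1) = ω·Y(t) + (1 − ω)·F(t), seeded with the first actual value. The second-order naive model is left out because its exact definition is not given in the text. The series below is hypothetical.

```python
def naive_forecast(y):
    """First-order naive: the forecast for period t is the actual value at t-1."""
    return [None] + y[:-1]

def moving_average_forecast(y, window):
    """Forecast for period t is the mean of the previous `window` actual values."""
    forecasts = [None] * len(y)
    for t in range(window, len(y)):
        forecasts[t] = sum(y[t - window:t]) / window
    return forecasts

def exponential_smoothing_forecast(y, w):
    """F(t+1) = w * Y(t) + (1 - w) * F(t), seeded with the first actual value."""
    forecasts = [None, y[0]]
    for t in range(2, len(y)):
        forecasts.append(w * y[t - 1] + (1 - w) * forecasts[t - 1])
    return forecasts

# Hypothetical series (not the Austrian index returns).
series = [2.0, 4.0, 6.0, 8.0, 10.0]

naive = naive_forecast(series)
ma3 = moving_average_forecast(series, 3)
exp_half = exponential_smoothing_forecast(series, 0.5)
```

A larger ω makes the smoothed forecast react faster to the latest observation, while a smaller ω averages over a longer history, which is the trade-off between the ω = 0.758 and ω = 0.2 models.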
Forecast Error Measures
We created these seven models using Excel. After creating forecasted series we calculated Forecast Error Measures. These are given as:
Bias:
Average error $= \dfrac{\sum_{t=1}^{n} e_t}{n}$
Variability:
Mean squared error $MSE = \dfrac{\sum_{t=1}^{n} e_t^2}{n-1}$
Standard deviation $s = \sqrt{MSE}$
Mean absolute error $MAD = \dfrac{\sum_{t=1}^{n} |e_t|}{n-1}$
We calculated the error measures and here they are:
Measure          FNAIV Error   2nd NAIV Error   F_MA_3 Error   F_MA_4 Error   F_MA_5 Error   EXP_0.758 Error   Exp_0.2 Error
Average Error    -0.11329      -0.08389         -0.08465       -0.11619       -0.25684       -0.00982          0.237657
MSE              91.58189      91.62089         63.3329        58.75387       57.74957       73.89865          54.27465
SE               9.569843      9.57188          7.958197       7.665107       7.599314       8.596432          7.367133
MAD              7.172345      7.077884         5.530948       5.438287       5.551451       6.166996          5.396453
We found that the exponential smoothing forecast with ω = 0.758 had the least bias, with an average error of -0.00982, whereas the exponential smoothing forecast with ω = 0.2 had the least variability (MSE of 54.27465). The mean absolute error (MAD) was also lowest for the exponential smoothing forecast with ω = 0.2 (5.396453), with the 4 Period Moving averages model (5.438287) close behind.
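The error measures used in the comparison can be sketched in Python as below. Two assumptions are made here: the forecast error is defined as actual minus forecast, and periods without a forecast are skipped. The n − 1 divisor for MSE and MAD follows the formulas given in this section. The data are hypothetical.

```python
import math

def error_measures(actual, forecast):
    """Average error (bias), MSE, SE and MAD for one forecast series."""
    # Skip the start-up periods where no forecast exists.
    errors = [a - f for a, f in zip(actual, forecast) if f is not None]
    n = len(errors)
    mse = sum(e * e for e in errors) / (n - 1)
    return {
        "average_error": sum(errors) / n,
        "mse": mse,
        "se": math.sqrt(mse),
        "mad": sum(abs(e) for e in errors) / (n - 1),
    }

# Hypothetical actual values with a first-order naive forecast.
actual = [3.0, 5.0, 4.0, 6.0]
forecast = [None, 3.0, 5.0, 4.0]

measures = error_measures(actual, forecast)
```

Applying the same function to every model's forecast column gives one row per measure, which is how a comparison table like the one above can be assembled.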
We created charts of actual versus forecasted values for all these models to show visually how the different forecasting models behave.
Charts for Forecast Models
First Order Naïve Forecast Model Chart (Blue Actual, Red Forecasted)
Second Order Naïve Forecast Model Chart (Blue Actual, Red Forecasted)
3 Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)
4 Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)
5 Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)
Exponential Smoothing Forecast Chart (with ω=0.758, Blue Actual, Red Forecasted)
Exponential Smoothing Forecast Chart (with ω=0.2, Blue Actual, Red Forecasted)
7. Conclusion
In this paper we used different statistical techniques to analyze international stock market returns. Such a study could be very exhaustive, but within our limited scope we successfully demonstrated the use of basic statistics, linear regression and time series forecasting.
Appendix
A. References
1. Levine, Stephan, Krehbiel, Berenson, Statistics for Managers Using Microsoft Excel, 5th Edition, 2008.
2. Anderson, Sweeney, Williams, Essentials of Modern Business Statistics, 4th Edition, 2009.
3. Statistical terms at http://en.wikipedia.org/
4. Class notes.
B. List of Excel files used for calculations
1. country_data_in_pc.xlsm is the main Excel file with the following worksheets:
a. The base data for the fifty countries' monthly rate of return is in the HistoryIndex worksheet.
b. The MarketReturns worksheet transforms the HistoryIndex worksheet to log returns in percentages using the formula =100*LN(HistoryIndex!B11/HistoryIndex!B10). It also has the following basic statistics for each country:
Monthly Average Return, Monthly Variance, Monthly Std. Deviation, Yearly Avg Return, Yearly Std. Dev, Coefficient of Variance, Skewness, Kurtosis
c. The CorrelationMatrix worksheet has the 50 X 50 correlation matrix table for the 50 countries.
d. The Histograms worksheet has histograms of the rates of return for all countries, including their frequency tables.
e. The BasicStatsUsingExcel worksheet again contains basic statistics, but calculated using the Data Analysis Descriptive Statistics tool instead of the functions and calculations used for each statistical measure in b.
f. The line diagram of average yearly rates of return with standard deviation is in the YRLY_RETURN_CHART_BY_COUNTRIES worksheet.
g. The Kurtosis_Chart worksheet has the monthly kurtosis measure plotted with monthly average returns and standard deviation.
h. The list of all countries in one column is in the CountryNames worksheet.
2. Correlationmatrix.xlsx has the 50 X 50 correlation matrix table for the 50 countries (CorrelationMatrix).
3. Finland_n_other_7_big_economies.xlsx has Finland's regression calculation along with residual plots and line diagrams in the first worksheet, Finland_Regression_Model. The other worksheets contain output from PHStat.
4. File finland_vs_7_Neighboring _big_countries.xlsx has Finland's regression calculations with the seven biggest neighboring economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat. Note that the Stepwise worksheet has the PHStat Stepwise Regression output.
5. File Finland_and_other_tweleve_countries.xlsx has Finland's regression calculations with 12 other major economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat. Note that the Stepwise worksheet has the PHStat Stepwise Regression output.
6. Austria_time_forecasting.xlsm has the time series modeling output. These are the worksheets in the Excel file:
a. Worksheet Main has the basic time series and columns showing all time series models with values and errors.
b. The Basic Charts worksheet has Austria's stock price and returns plotted against time.
c. Worksheets FNAIV, FNAIV, F_MA_3, F_MA_4, F_MA_5, EXP_0.758 and EXP_0.2 are used to calculate Avg Error, MSE, SE and MAD.