Regression Model to predict BSE SENSEX
FORECASTING MODEL TO PREDICT BSE (BOMBAY STOCK EXCHANGE) SENSEX
USING REGRESSION ANALYSIS
BY: MUNISH VIRANG, INDUSTRIAL & SYSTEM ENGINEER
ARIZONA STATE UNIVERSITY
Outline
• Dataset Description
• Full Model
• Issues & Remedy
• Final Model
• Comparison between Full Model & Final Model
• Conclusion & Scope of Improvement
• Tools Used
DATASET Description
Response Variable: BSE (Bombay Stock Exchange) SENSEX (No Unit)
Predictor Variables:
S.No  Regressor             Type                    Units
1     DOW                   Numeric (Global Queue)  No unit
2     NASDAQ (NAS)          Numeric (Global Queue)  No unit
3     DAX                   Numeric (Global Queue)  No unit
4     FTSE 100 (FTS)        Numeric (Global Queue)  No unit
5     NIKKEI 225 (NIK)      Numeric (Global Queue)  No unit
6     Straits Times (ST)    Numeric (Global Queue)  No unit
7     FOREX (FOR)           Numeric                 $ Billion
8     Crude Oil (CO)        Numeric                 $/BBL (US $)
9     Gold Bullion (GOLD)   Numeric                 $/Troy Ounce
10    GDP                   Numeric                 $ Billion
11    Inflation Rate (IR)   Numeric                 %
12    Rs vs Dollar (RD)     Numeric                 Rupee
Full Model: All Predictors in the Model
BSE SENSEX (Y) = - 6069 - 0.095 DOW - 0.229 NAS - 0.128 DAX + 1.06 FTS + 0.129 NIK + 0.23 ST + 2.00 GDP + 8.5 FOR - 17.2 IR - 33 RD + 11.0 CO + 14.4 GOLD
Issues with the Full Model:
• Prediction way off from target
• Overfitting the model
• Multicollinearity present
• Model adequacy check fails
[Figure: Actual BSE SENSEX vs. Predicted BSE SENSEX (full model), plotted by observation]
R-Sq = 97.5% > R-Sq (adj) = 96.5%
The statistics indicate that the model contains many redundant predictor variables, for which it is penalized in terms of accuracy (adjusted R-Sq falls below R-Sq).
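The gap between the two statistics follows from the adjusted R-squared formula, which charges the model for every predictor it carries. A minimal sketch, assuming n = 40 observations (read off the observation axis of the plots, not stated in the slides) and the 12 predictors of the full model; the function name is illustrative:

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a regression model that includes an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Assumption: n = 40 observations; the full model uses all 12 predictors.
full_model_adj = adjusted_r_squared(0.975, n_obs=40, n_predictors=12)
print(f"{full_model_adj:.4f}")
```

Under these assumptions the result is about 96.4%, close to the reported 96.5% once rounding of R-Sq is allowed for.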
The condition number κ = λmax / λmin = 1374.84. Strong multicollinearity is present in the data set.
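The condition number quoted above is the ratio of the largest to the smallest eigenvalue of the predictor correlation matrix. A minimal sketch with NumPy on synthetic data (not the SENSEX data set); the function name and the test columns are assumptions made for illustration:

```python
import numpy as np

def condition_number(X: np.ndarray) -> float:
    """Largest over smallest eigenvalue of the predictor correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # real eigenvalues, ascending order
    return float(eigenvalues[-1] / eigenvalues[0])

# Synthetic illustration only: x2 is almost an exact copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=40)
x2 = x1 + 0.01 * rng.normal(size=40)
x3 = rng.normal(size=40)
kappa = condition_number(np.column_stack([x1, x2, x3]))
print(kappa > 1000)
```

A common rule of thumb in regression texts treats values above 1000 as a sign of severe multicollinearity, which is the range the full model falls into.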
As NASDAQ and DOW rise, BSE also rises, i.e. a positive correlation is expected (subject matter expert).
The model indicates the opposite: the DOW and NAS coefficients are negative.
The residuals show a pattern; they are not structureless.
The model is not good for future prediction.
How to Deal with the Issues
• Collecting more data
• Standardization
• Ridge regression
• Re-specifying the model (using Principal Component Analysis)
The first three approaches produced no change.
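For reference, the standardization and ridge regression remedies listed above can be sketched together as follows. This is an illustrative NumPy version on synthetic data, not the MINITAB/SAS procedure used for the slides; the function name and the biasing parameter values are assumptions:

```python
import numpy as np

def ridge_coefficients(X: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Ridge estimate on standardized predictors: solve (Z'Z + kI) b = Z'y."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardization step
    y_centered = y - y.mean()
    return np.linalg.solve(Z.T @ Z + k * np.eye(Z.shape[1]), Z.T @ y_centered)

# Synthetic, nearly collinear data (illustration only, not the SENSEX data set).
rng = np.random.default_rng(0)
x1 = rng.normal(size=40)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=40)])
y = 3.0 * x1 + rng.normal(size=40)

ols_like = ridge_coefficients(X, y, k=0.0)  # k = 0 reduces to least squares
shrunk = ridge_coefficients(X, y, k=5.0)    # k > 0 shrinks the coefficients
print(np.linalg.norm(shrunk) < np.linalg.norm(ols_like))
```

Ridge regression trades a small bias for lower coefficient variance; it leaves every predictor in the model, which may be why it did not resolve the redundancy here.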
Principal Component Analysis (Re-specifying the Model)
Predictor variables can be classified as:
• Global Queue / stock market indices from the US, Europe and Asia
• Commodities & Exchange
Approach: Instead of using all the predictor variables in the model, linear combinations of the variables are used.
New Variables
PC Que
• PC QUE = 0.391*DOW + 0.406*NAS + 0.406*DAX + 0.437*FTS + 0.441*NIK + 0.334*ST
• All the Global Queue indices are about equally loaded
PC GFCG
• PC GFCG = 0.511*GDP + 0.508*FOR + 0.473*CO + 0.507*Gold
• Linear combination of GDP, FOREX, Crude Oil & Gold
• About equally loaded
Inflation Rate & Rupee Dollar Rate
• Unchanged variables
Linear Combination
• Eigenvalue
• Scree Plot
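Loadings of this kind are the entries of the unit-length eigenvector that belongs to the largest eigenvalue of the correlation matrix, and the sorted eigenvalues are what a scree plot displays. A minimal NumPy sketch on synthetic data; the slides do not say whether the variables were standardized before the combination was formed, so the standardization here is an assumption, as are all names:

```python
import numpy as np

def first_principal_component(X: np.ndarray):
    """Return (eigenvalues in descending order, PC1 loadings, PC1 scores)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # assumed standardization
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigenvalues)[::-1]              # largest first (scree order)
    loadings = eigenvectors[:, order[0]]
    if loadings.sum() < 0:                             # eigenvector sign is arbitrary
        loadings = -loadings
    return eigenvalues[order], loadings, Z @ loadings

# Synthetic illustration: six series driven by one common factor plus noise.
rng = np.random.default_rng(1)
common = rng.normal(size=(40, 1))
X = common + 0.3 * rng.normal(size=(40, 6))
eigenvalues, loadings, scores = first_principal_component(X)
print(np.round(loadings, 3))
```

When one factor drives all series, the loadings come out close to equal, which matches the "equally loaded" remark on the slide.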
Variable Selection
Model: Y = - 14323 + 125 RD + 14.1 PC GFCG + 0.363 PC Queue
Method                                                    Suggested Predictors     S     R-sq    R-sq(adj)   C-p
Forward Selection (α-to-enter: 0.1)                       RD, PC QUE and PC GFCG   369   96.54   96.26       3.2
Backward Elimination (α-to-remove: 0.1)                   RD, PC QUE and PC GFCG   369   96.54   96.26       3.2
Stepwise Regression (α-to-enter: 0.1, α-to-remove: 0.1)   RD, PC QUE and PC GFCG   369   96.54   96.26       3.2
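The C-p column can be read with the standard Mallows definition, C-p = SSE_p / MSE_full - n + 2p, where p counts the fitted parameters including the intercept. The sketch below assumes that definition and uses synthetic data; the slides do not state which candidate set served as the full model, and the function names are illustrative:

```python
import numpy as np

def sse(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    return float(residuals @ residuals)

def mallows_cp(X_subset: np.ndarray, X_full: np.ndarray, y: np.ndarray) -> float:
    """Mallows C-p of a subset model, using the full model's MSE as the variance estimate."""
    n = len(y)
    p = X_subset.shape[1] + 1                       # parameters incl. intercept
    mse_full = sse(X_full, y) / (n - X_full.shape[1] - 1)
    return sse(X_subset, y) / mse_full - n + 2 * p

# Synthetic illustration: only the first two of four candidates matter.
rng = np.random.default_rng(2)
X_full = rng.normal(size=(40, 4))
y = 2.0 + 1.5 * X_full[:, 0] - 0.8 * X_full[:, 1] + rng.normal(size=40)
cp_subset = mallows_cp(X_full[:, :2], X_full, y)
cp_full = mallows_cp(X_full, X_full, y)
print(round(cp_full, 6))
```

A subset whose C-p is at or below its parameter count p shows little evidence of bias from the omitted variables; by construction the full model's C-p equals its own parameter count.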
Model Adequacy Check Fail
[Figure: Residual Plots for Y. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
Model not good for future prediction: the residuals show a curved pattern (ideally they would be structureless).
How to Deal with Model Inadequacy
Data transformation: a square root transformation is applied to the response variable, BSE SENSEX.
After the transformation the model is good for future prediction.
[Figure: Residual Plots for SQRT RES. Panels: Normal Probability Plot (Percent vs. Standardized Residual), Versus Fits (Standardized Residual vs. Fitted Value), Histogram (Frequency vs. Standardized Residual), Versus Order (Standardized Residual vs. Observation Order)]
The residuals are structureless.
Variable Selection
Model: SQRT (BSE Sensex) = - 13.2 + 0.0875 PC GFCG + 0.00218 PC Queue
Method                                                    Suggested Predictors   S      R-sq    R-sq(adj)   C-p
Forward Selection (α-to-enter: 0.1)                       PC QUE and PC GFCG     2.16   97.26   97.11       1.4
Backward Elimination (α-to-remove: 0.1)                   PC QUE and PC GFCG     2.16   97.26   97.11       1.4
Stepwise Regression (α-to-enter: 0.1, α-to-remove: 0.1)   PC QUE and PC GFCG     2.16   97.26   97.11       1.4
Final Model
SQRT (BSE SENSEX) = - 13.2 + 0.0875 PC GFCG + 0.00218 PC Queue
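Because the response was square-root transformed, a prediction on the original SENSEX scale is obtained by squaring the fitted value. A minimal sketch using the coefficients above; the function name and the input scores are hypothetical and serve only to exercise the formula:

```python
def predict_sensex(pc_gfcg: float, pc_queue: float) -> float:
    """Back-transform the final model's square-root-scale prediction."""
    sqrt_prediction = -13.2 + 0.0875 * pc_gfcg + 0.00218 * pc_queue
    if sqrt_prediction < 0:
        # A negative fitted square root has no meaningful back-transform.
        raise ValueError("inputs lie outside the range where the model is usable")
    return sqrt_prediction ** 2

# Hypothetical component scores, not taken from the data set.
example = predict_sensex(pc_gfcg=500.0, pc_queue=20000.0)
print(round(example, 2))
```

Squaring simply reverses the transformation; it does not correct for the retransformation bias that arises when a fitted mean on the square-root scale is mapped back to the original scale.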
[Figure: BSE SENSEX vs. Predicted BSE (final model), by observation]
Better prediction
Comparison of Full Model and Final Model
[Figure: BSE SENSEX, Final Predicted Model and Full Model, by observation]
Conclusion & Scope of Improvement
As per the model, the important predictors are:
1. Global Queue
2. GDP of the country
3. FOREX
4. Gold and crude oil prices
Including the top 30 shares of the BSE SENSEX (direct cause-and-effect relation) should enhance the model's ability to predict more accurately.
Tools & References
Tools:
• MINITAB 14
• SAS 9.1
References:
• Montgomery D.C., Peck E.A., Vining G.G., Introduction to Linear Regression Analysis, Fourth Edition, John Wiley & Sons Inc.
• www.finance.yahoo.com
• www.bseindia.com