Munish Virang Rp

FORECASTING MODEL TO PREDICT BSE (BOMBAY STOCK EXCHANGE) SENSEX

USING REGRESSION ANALYSIS

BY: MUNISH VIRANG, INDUSTRIAL & SYSTEM ENGINEER

ARIZONA STATE UNIVERSITY

Outline

• DATASET Description
• Full Model
• Issues & Remedy
• Final Model
• Comparison between Full Model & Final Model
• Conclusion & Scope of Improvement
• Tools Used

DATASET Description

Response Variable: BSE (Bombay Stock Exchange) SENSEX (No Unit)

Predictor Variables:

S.No  Description of the Regressor  Type                    Units
1     DOW                           Numeric (Global Queue)  No Unit
2     NASDAQ (NAS)                  Numeric (Global Queue)  No Unit
3     DAX                           Numeric (Global Queue)  No Unit
4     FTSE 1000 (FTS)               Numeric (Global Queue)  No Unit
5     NIKKEI 225 (NIK)              Numeric (Global Queue)  No Unit
6     Straits Times (ST)            Numeric (Global Queue)  No Unit
7     FOREX (FOR)                   Numeric                 $ Billion
8     Crude Oil (CO)                Numeric                 $/BBL ($ US)
9     Gold Bullion (GOLD)           Numeric                 $/Troy Ounce
10    GDP                           Numeric                 $ Billion
11    Inflation Rate (IR)           Numeric                 %
12    Rs vs Dollar (RD)             Numeric                 Rupee

Full Model: All Predictors in the Model

BSE SENSEX (Y) = - 6069 - 0.095 DOW - 0.229 NAS - 0.128 DAX + 1.06 FTS + 0.129 NIK + 0.23 ST + 2.00 GDP + 8.5 FOR - 17.2 IR - 33 RD + 11.0 CO + 14.4 GOLD

Issues with the Full Model:

• Predictions way off from target
• Overfitting the model
• Multicollinearity present
• Model adequacy check fails

[Figure: ACTUAL BSE SENSEX vs. PREDICTED BSE SENSEX for the full model, by observation]

R-Sq = 97.5% > R-Sq (adj) = 96.5%

The statistics indicate that the model contains many redundant predictor variables, for which it is penalized in terms of accuracy.
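The gap between R-Sq and R-Sq (adj) comes from the usual adjusted R-squared formula, which charges the model for every extra predictor. A minimal Python sketch of that arithmetic follows. The sample size n = 40 is an assumption (the slides do not state it; it is inferred from the observation axes of the plots), and `adjusted_r2` is an illustrative name.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for a model with p predictors plus an intercept, fit to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumption: n = 40 observations; the full model has p = 12 predictors.
full = adjusted_r2(0.975, n=40, p=12)
print(f"{full:.3f}")  # 0.964
```

Under that assumption the result is close to the reported 96.5%; the small difference is consistent with R-Sq itself being rounded on the slide.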

The condition number κ = λmax / λmin = 1374.84

Strong multicollinearity is present in the data set.
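A condition number of this kind can be computed from the eigenvalues of the predictors' correlation matrix. The sketch below assumes that form (the slides do not say exactly which matrix was used) and runs on small hypothetical data, not the BSE data set; `condition_number` is an illustrative name.

```python
import numpy as np


def condition_number(X: np.ndarray) -> float:
    """Ratio of the largest to the smallest eigenvalue of the predictors' correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)   # columns of X are the predictor variables
    eigvals = np.linalg.eigvalsh(corr)    # symmetric matrix: real eigenvalues, ascending order
    return float(eigvals[-1] / eigvals[0])


# Hypothetical data: two nearly identical columns versus two uncorrelated columns.
x = np.arange(10.0)
wiggle = 0.01 * (-1.0) ** np.arange(10)
collinear = np.column_stack([x, x + wiggle])
orthogonal = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])

print(condition_number(collinear) > 1000)   # True
print(condition_number(orthogonal))         # approximately 1
```

Uncorrelated predictors give a ratio near 1, while near-duplicate predictors push it into the thousands, which is the situation the full model is in.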

According to subject matter experts, BSE rises as NASDAQ and DOW rise, i.e. the correlation is positive.

The model indicates the opposite: the DOW and NAS coefficients are both negative.

Residuals show a pattern; they are not structureless.

The model is not good for future prediction.

How to Deal with the Issues

• Collecting more data
• Standardization
• Ridge regression
• Re-specifying the model (using Principal Component Analysis)

The first three approaches produced no change.
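For reference, ridge regression (one of the remedies listed above) adds a constant k to the diagonal of the cross-product matrix before solving for the coefficients. The sketch below is a generic closed-form version on hypothetical data, not the procedure run in MINITAB or SAS; the function name and the value k = 1.0 are assumptions.

```python
import numpy as np


def ridge_coefficients(X: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Ridge estimate (Z'Z + kI)^-1 Z'y on standardized predictors Z and centered y."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)


# Hypothetical, nearly collinear predictors (not the BSE data).
t = np.arange(20.0)
X = np.column_stack([t, t + 0.1 * np.sin(t)])
y = 3.0 * t + np.cos(t)

ols = ridge_coefficients(X, y, k=0.0)     # k = 0 reduces to ordinary least squares
ridge = ridge_coefficients(X, y, k=1.0)   # k > 0 shrinks the coefficient vector
print(np.linalg.norm(ridge) < np.linalg.norm(ols))  # True
```

The shrinkage stabilizes the coefficients but keeps every predictor in the model, which is one reason re-specifying the model can still be the more useful remedy.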

Principal Component Analysis (Re-specifying the Model)

Predictor variables can be classified as:

• Global Queue / stock market indices from the US, Europe and Asia
• Commodities & Exchange

Approach: Instead of using all the predictor variables in the model, linear combinations of the variables are used.

New Variables

PC QUE

• PC QUE = 0.391*DOW + 0.406*NAS + 0.406*DAX + 0.437*FTS + 0.441*NIK + 0.334*ST
• All the Global Queue indices are roughly equally loaded

PC GFCG

• PC GFCG = 0.511*GDP + 0.508*FOR + 0.473*CO + 0.507*GOLD
• Linear combination of GDP, FOREX, Crude Oil & Gold
• Roughly equally loaded

Inflation Rate & Rupee Dollar Rate

• Unchanged variables

Linear Combination

• Eigenvalue
• Scree plot
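Loadings like these are the entries of the leading eigenvector of the variables' correlation matrix. The sketch below shows that calculation on hypothetical data, along with a helper that applies the PC QUE loadings reported on the slide. Whether the original analysis applied the loadings to raw or standardized values is not stated, so the helper simply takes whatever values it is given; both function names are illustrative.

```python
import numpy as np


def first_pc_loadings(X: np.ndarray) -> np.ndarray:
    """Unit-length loadings of the first principal component of the correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues come back in ascending order
    loadings = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    if loadings.sum() < 0:                    # eigenvector sign is arbitrary; make it positive
        loadings = -loadings
    return loadings


def pc_que(dow, nas, dax, fts, nik, st):
    """PC QUE score using the loadings reported on the slide."""
    return (0.391 * dow + 0.406 * nas + 0.406 * dax
            + 0.437 * fts + 0.441 * nik + 0.334 * st)


# Hypothetical, strongly correlated columns standing in for the market indices.
t = np.arange(20.0)
X = np.column_stack([t, t + 0.1 * np.sin(t), t + 0.1 * np.cos(t)])
print(np.round(first_pc_loadings(X), 2))
```

When the columns move together, the loadings come out nearly equal, which matches the "equally loaded" observation for the index group.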

Variable Selection

Model: Y = - 14323 + 125 RD + 14.1 PC GFCG + 0.363 PC Queue

Method, Suggested Predictors and Summary Statistics

Forward Selection (α-to-enter: 0.1)

• Suggested predictors: RD, PC QUE and PC GFCG
• S = 369, R-sq = 96.54, R-sq(adj) = 96.26, C-p = 3.2

Backward Elimination (α-to-remove: 0.1)

• S = 369, R-sq = 96.54, R-sq(adj) = 96.26, C-p = 3.2

Stepwise Regression (α-to-enter: 0.1, α-to-remove: 0.1)

• S = 369, R-sq = 96.54, R-sq(adj) = 96.26, C-p = 3.2
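The C-p statistic reported above is Mallows' C-p, which compares a subset model's error sum of squares with the error variance estimated from the full model. A minimal sketch of the textbook formula follows; the input numbers are hypothetical, since the slides do not report the sums of squares.

```python
def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' C-p for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - n + 2 * p


# Hypothetical values: a subset model whose error matches the full-model variance.
n, p = 40, 4
mse_full = 2.0
sse_p = mse_full * (n - p)
print(mallows_cp(sse_p, mse_full, n, p))  # 4.0
```

A subset model with little bias has C-p close to its number of parameters, so small values near or below p are the ones to look for.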

Model Adequacy Check Fail

[Figure: Residual Plots for Y. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Model not good for future prediction

Curved pattern in the residuals (ideal: structureless)

How to Deal with Model Inadequacy

Data Transformation: Using a square root transformation on the response variable, BSE Sensex

Model good for future prediction
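In practice the transformation means regressing the square root of the response on the predictors, then squaring the fitted values to get back to index points. The sketch below shows that round trip with ordinary least squares on hypothetical data; it is not the MINITAB or SAS procedure, and the function names are illustrative.

```python
import numpy as np


def fit_sqrt_response(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients (intercept first) for sqrt(y) regressed on X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, np.sqrt(y), rcond=None)
    return coef


def predict_original_scale(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict sqrt(y), then square to return to the original units."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return (A @ coef) ** 2


# Hypothetical data whose square root is exactly linear in x.
x = np.arange(1.0, 11.0).reshape(-1, 1)
y = (5.0 + 2.0 * x[:, 0]) ** 2
coef = fit_sqrt_response(x, y)
print(np.round(coef, 6))
```

On this toy data the fit recovers an intercept of 5 and a slope of 2, and the back-transformed predictions reproduce y.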

[Figure: Residual Plots for SQRT RES. Panels: Normal Probability Plot (Percent vs. Standardized Residual), Versus Fits (Standardized Residual vs. Fitted Value), Histogram (Frequency vs. Standardized Residual), Versus Order (Standardized Residual vs. Observation Order)]

Structureless

Variable Selection

Model: SQRT (BSE Sensex) = - 13.2 + 0.0875 PC GFCG + 0.00218 PC Queue

Method, Suggested Predictors and Summary Statistics

Forward Selection (α-to-enter: 0.1)

• Suggested predictors: PC QUE and PC GFCG
• S = 2.16, R-sq = 97.26, R-sq(adj) = 97.11, C-p = 1.4

Backward Elimination (α-to-remove: 0.1)

• Suggested predictors: PC QUE and PC GFCG
• S = 2.16, R-sq = 97.26, R-sq(adj) = 97.11, C-p = 1.4

Stepwise Regression (α-to-enter: 0.1, α-to-remove: 0.1)

• Suggested predictors: PC QUE and PC GFCG
• S = 2.16, R-sq = 97.26, R-sq(adj) = 97.11, C-p = 1.4

Final Model

SQRT (BSE SENSEX) = - 13.2 + 0.0875 PC GFCG + 0.00218 PC Queue
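Because the final model predicts the square root of the index, a forecast in index points requires squaring the result. The sketch below applies the reported coefficients; the input scores are hypothetical values chosen only to exercise the function, and `predict_bse_sensex` is an illustrative name.

```python
def predict_bse_sensex(pc_gfcg: float, pc_que: float) -> float:
    """Back-transform the final model's SQRT(BSE SENSEX) prediction to index points."""
    sqrt_pred = -13.2 + 0.0875 * pc_gfcg + 0.00218 * pc_que
    # Squaring hides the sign, so a negative sqrt_pred would signal inputs outside the fitted range.
    return sqrt_pred ** 2


# Hypothetical component scores, not taken from the data set.
print(round(predict_bse_sensex(pc_gfcg=400.0, pc_que=30000.0), 2))  # 7603.84
```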

[Figure: BSE SENSEX vs. PREDICTED BSE for the final model, observations 1 to 40]

Better prediction

Comparison of Full Model and Final Model

[Figure: BSE SENSEX, FINAL PREDICTED MODEL and FULL MODEL, by observation]

Conclusion & Scope of Improvement

As per the model, the important predictors are:

1. Global Queue
2. GDP of the country
3. FOREX
4. Gold and crude oil prices

Including the top 30 shares of the BSE Sensex (direct cause-and-effect relation) would enhance the model's ability to predict accurately.

Tools & References

Tools: MINITAB 14, SAS 9.1

References:

• Montgomery D.C., Introduction to Linear Regression Analysis, Fourth Edition, Wiley & Sons Inc
• www.finance.yahoo.com
• www.bseindia.com
