White Paper on Regression


PRAXIS BUSINESS SCHOOL

White Paper on regression

A Report

Submitted to

Dr. Prithwis Mukherjee

In partial fulfilment of the requirements of the course

Quantitative Technique-2

On 07/09/2010

By

Ashish Maheshwari

( B09004)

Statistical Modelling:



Statistical modelling involves the appropriate application of statistical techniques, each

requiring certain assumptions to perform hypothesis tests, interpret the data and reach valid

conclusions. Data from experiments, product testing, simulation, surveys, and statistical

process and quality control must be appropriately analyzed before results can be

determined and conclusions drawn. The results from experiment or testing must be obtained

following established statistical procedures, including experimental design and the

appropriate use of statistical analysis and modelling techniques. These results can then be

reproduced, within sampling error, by repeating the experiment.

Statistical modelling requires careful selection of analytical techniques, verification of assumptions, and verification of data. Descriptive statistics, graphs and relational plots of the data should first be examined to evaluate the legitimacy of the data, identify possible outcomes and assumptions, and form preliminary ideas on variable relationships for modelling.

Benefits:

Application of appropriate statistical analysis techniques

Development of appropriate conclusions and key learning from the data

Ensuring results address experimental objectives

Maximizing information gained from the data

Maximizing chances of the experiment being successful

Techniques:

1. Statistical analysis and modelling techniques
2. Descriptive techniques
3. Data graphs, plots and exploratory data analysis
4. Multiple linear regression analysis
5. Logistic regression
6. Time series analysis
7. Discriminant analysis
8. Factor analysis
9. Cluster analysis
10. Multivariate analysis
11. Nonparametric analysis
12. Experimental design

Pitfalls in using regression:

Regression analysis is a statistical tool that, when properly used, can help people make decisions. Often, however, it is misused, and as a result decision makers produce inaccurate forecasts. The most common errors made while using regression are as follows:

1. Specific limited range over which the regression equation holds:

A common mistake is to assume that the estimating line can be applied over any range of values. Hospital administrators can properly use regression analysis to predict the relationship between cost per bed and occupancy levels. Some administrators, however, incorrectly use the same regression to predict the cost per bed for occupancy levels that are significantly higher than those that were used to estimate the regression line. They make decisions on one set of costs and then find that the costs change drastically as occupancy increases.
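One way to guard against this pitfall is to refuse predictions outside the range of the data used for the fit. The following is a minimal sketch of that idea; the occupancy and cost figures are invented for illustration, and the function names `fit_line` and `predict_within_range` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_within_range(x_new, xs, intercept, slope):
    """Predict y, refusing to extrapolate beyond the observed x range."""
    if not min(xs) <= x_new <= max(xs):
        raise ValueError(
            f"x={x_new} lies outside the fitted range [{min(xs)}, {max(xs)}]"
        )
    return intercept + slope * x_new


# Hypothetical occupancy levels (%) and cost per bed, invented for illustration.
occupancy = [50, 60, 70, 80]
cost_per_bed = [400, 380, 360, 340]

intercept, slope = fit_line(occupancy, cost_per_bed)
print(predict_within_range(65, occupancy, intercept, slope))  # 370.0
```

Asking the same function for an occupancy of 95 raises a `ValueError` instead of silently extending the line.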

2. Regression analysis does not determine cause and effect:

Another mistake in regression analysis is to assume that a change in one variable is caused by a change in the other variable. Consider the example of research and development expenses and annual profit. It is unlikely that profit in a given year is caused by research and development expenditure in that same year. In high-technology industries research and development activity can be used to explain profits, but a better approach is to predict current profits from past research and development expenditure together with economic conditions, dollars spent on advertising and other variables. This can be done using multiple regression techniques.

3. Conditions change and invalidate the regression equation:

Care must be taken when historical data are used to estimate the regression equation. Conditions can change and violate one or more of the assumptions on which the regression analysis depends.

4. Values of variables change over time:

Another error arises when some variables depend on time. Suppose a firm uses regression analysis to determine the relationship between the number of employees and production volume. If the observations used in the analysis extend back several years, the resulting regression line may be too steep because it fails to recognise the effect of changing technology.

5. Relationships that have no common bond:

When applying regression analysis, people sometimes find a relationship between two variables that in fact have no common bond.

For example, one might find a statistical relationship between a random sample of the miles per gallon achieved by eight different cars and the distance from Earth to each of the other eight planets. Because there is no common bond between gas mileage and the distance to other planets, this relationship would be meaningless.

6. Finding things that do not exist:

If one runs a large number of regressions between many pairs of variables, some apparently interesting relationships will turn up by chance. For example, one might find a strong statistical relationship between one's income and the amount of beer consumed in the US, or even between the length of a freight train and the weather. In neither case is there a factor common to both variables, so such relationships are meaningless.
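A small simulation can make this pitfall concrete. The sketch below, assuming nothing beyond the Python standard library, correlates many pairs of independent random series and reports the strongest correlation found. The sample sizes (200 pairs of 8 points) are arbitrary choices for illustration.

```python
import random


def correlation(xs, ys):
    """Sample correlation coefficient r between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)  # fixed seed so the run is repeatable
n_pairs, n_points = 200, 8

strongest = 0.0
for _ in range(n_pairs):
    # Two series generated independently: no common bond by construction.
    xs = [random.gauss(0, 1) for _ in range(n_points)]
    ys = [random.gauss(0, 1) for _ in range(n_points)]
    strongest = max(strongest, abs(correlation(xs, ys)))

print(f"largest |r| among {n_pairs} unrelated pairs: {strongest:.2f}")
```

With so few points per series, some unrelated pairs will show a sizeable |r| purely by chance, which is why a relationship found by searching through many regressions needs an independent justification.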

7. Misinterpreting r and r²:

The coefficient of determination is misinterpreted if r² is used to describe the percentage of change in the dependent variable that is caused by a change in the independent variable. This is wrong because r² measures only how well one variable describes another, not how much of the change in one variable is caused by the other.
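To keep r and r² apart, it can help to compute both for a small data set. The numbers below are invented for illustration. The sketch only shows how the two quantities relate and makes no claim about causation.

```python
def correlation(xs, ys):
    """Sample correlation coefficient r between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Invented data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
r_squared = r ** 2
print(round(r, 2), round(r_squared, 2))  # 0.8 0.64
```

Here r is 0.8, yet only 64% of the variation in y is described by the fitted line, and neither figure says anything about whether x causes y.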

Techniques of regression that can be used to model social and business scenarios:

Regression analysis is a statistical forecasting technique concerned with describing and evaluating the relationship between a dependent variable and one or more independent variables.

Regression analysis can predict the outcome of a key business indicator (the dependent variable) based on the interactions of other related business drivers (the explanatory variables).



Use of regression in business models:

1. Trend line analysis:
Linear regression is used to create trend lines, which use past data to predict future performance or trends. In business, trend lines usually show the movement of financial or product attributes over time. Stock prices, oil prices and product specifications can all be analysed using trend lines.

2. Risk analysis for investments:
The capital asset pricing model was developed using linear regression analysis, and a common measure of the volatility of a stock or investment is its beta, which can be determined using linear regression. Linear regression is therefore key in assessing the risk associated with most investment vehicles.

3. Sales or market forecasts:
Multivariate linear regression is a method for forecasting sales volume or market movement in order to create comprehensive plans for growth. It can be more accurate than trend analysis, because trend analysis looks only at how one variable changes with respect to another.

4. Total quality control:
Quality control methods make frequent use of linear regression to analyse key product specifications and other measurable parameters of product or organisational quality (such as the number of complaints over time).

5. Linear regression in human resources:
Linear regression methods are also used to predict the demographics and types of future workforce for large companies. This helps companies prepare for their workforce needs by developing good hiring plans and training plans for existing employees.

Social Model:

1. Health survey:

Take the example of tuberculosis (TB) in the National Family Health Survey. To study how reporting of TB infection and treatment-seeking vary for men and women by socio-economic characteristics, multivariate logistic regression is applied to find the significant factors explaining TB reporting and treatment-seeking.

2. Analysis of urbanisation:

Take the example of China's urbanisation level, which can be projected by applying a linear regression model and an S-curve regression model.

The linear formula is: u_t = a0 + a1*t

where t, the year, is the independent variable and u_t, the urbanisation level in year t, is the dependent variable.

Based on the urbanisation level under the 1990 census definition for the period 1983-1999, the constants in this formula are estimated, giving the linear regression simulation equation:

u_t = -1026.54 + 0.529*t

The statistical features of this equation are as follows:

R² = 0.98, F = 714.46, sig F = 0.00000,

which indicates that the simulation model is statistically significant.

Source:(www.iiasa.ac.at/admin)
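As a quick worked example, the fitted linear equation above can be evaluated for a given year. The function name `urbanisation_level` is introduced here for illustration; the two constants are the ones quoted in the text.

```python
def urbanisation_level(year, a0=-1026.54, a1=0.529):
    """Urbanisation level (%) in a given year from the fitted linear equation."""
    return a0 + a1 * year


print(round(urbanisation_level(1990), 2))  # 26.17
```

Because the equation was estimated on 1983-1999 data, values for years far outside that period are extrapolations and should be treated with the caution described under the first pitfall.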



3. Land use change scenario projections:

If the study area includes all the countries in the world, we derive future proportions of artificial surfaces per region from projections of population and GDP, using a regression model. We calculated a linear regression model linking the proportion of artificial surfaces per region to the population and gross domestic product per capita, with the country and the urban type of city as additional factors.

How does one test the validity of a regression model in terms of:

a. Coefficient of determination:
In statistics, the coefficient of determination, R², is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model. It provides a measure of how well future outcomes are likely to be predicted by the model. There are several different definitions of R², which are only sometimes equivalent. One class of such cases includes that of linear regression. In this case, R² is simply the square of the sample correlation coefficient between the outcomes and their predicted values or, in the case of simple linear regression, between the outcome and the values being used for prediction. In such cases, the values vary from 0 to 1. The closer R² is to 1, the more of the variability the model explains; the closer it is to 0, the less it explains.
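For reference, one common way of writing R² for a least-squares fit is given below. The symbols are notation introduced here for illustration and are not taken from the source: y_i are the observed values, \hat{y}_i the fitted values and \bar{y} the mean of the observed values.

```latex
R^2 \;=\; 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
     \;=\; 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}
```

For a least-squares line that includes an intercept, this is the same as the ratio of the regression sum of squares to the total sum of squares.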

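b. Statistical significance of the identified slope coefficients:

The slope coefficient gives the magnitude and direction of the change in the dependent variable for a change in the independent variable. For example, a slope coefficient of -2 states that a one-unit increase in the independent variable is associated with a two-unit decrease in the dependent variable. Whether a slope differs significantly from zero is judged from its t statistic (the coefficient divided by its standard error) and the corresponding p-value; this indicates how important the independent variable is in predicting the dependent variable.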
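A slope is usually tested by dividing it by its standard error to obtain a t statistic. The sketch below shows only that arithmetic. The slope and standard error are hypothetical values chosen for illustration, and the function name `t_statistic` is introduced here.

```python
def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error


# Hypothetical values for illustration only.
slope = -2.0
slope_se = 0.5

t = t_statistic(slope, slope_se)
print(t)  # -4.0
```

As a rough rule of thumb for large samples, an absolute t value above about 2 corresponds to significance at roughly the 5% level; the exact threshold depends on the degrees of freedom.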



Business model

Date     OCL close   Change (Y)   Sensex close   Change (X)
Mar-04   212.05                   5,649.30
Apr-04   252         19%          5,599.12       -1%
May-04   299         19%          5,645.86       1%
Jun-04   305         2%           4,792.01       -15%
Jul-04   309.9       2%           4,813.76       0%
Aug-04   310         0%           5,193.25       8%
Sep-04   338         9%           5,202.16       0%
Oct-04   389.5       15%          5,587.46       7%
Nov-04   361         -7%          5,678.65       2%
Dec-04   359         -1%          6,259.28       10%
Jan-05   421         17%          6,626.49       6%
Feb-05   395         -6%          6,565.21       -1%
Mar-05   426         8%           6,725.92       2%
Apr-05   564         32%          6,506.60       -3%
May-05   580         3%           6,183.07       -5%
Jun-05   572.1       -1%          6,729.39       9%
Jul-05   575         1%           7,165.45       6%
Aug-05   650         13%          7,632.01       7%
Sep-05   188         -71%         7,818.90       2%
Oct-05   159.9       -15%         8,662.99       11%
Nov-05   120         -25%         7,989.86       -8%
Dec-05   151         26%          8,813.82       10%
Jan-06   155         3%           9,422.49       7%
Feb-06   150.3       -3%          9,959.24       6%
Mar-06   144         -4%          10,368.75      4%
Apr-06   148.95      3%           11,342.96      9%
May-06   206.9       39%          12,103.78      7%
Jun-06   159.95      -23%         10,472.46      -13%
Jul-06   142.65      -11%         10,616.97      1%
Aug-06   153.3       7%           10,737.50      1%
Sep-06   158.85      4%           11,699.57      9%
Oct-06   172.5       9%           12,473.79      7%
Nov-06   170.25      -1%          12,992.62      4%
Dec-06   172         1%           13,729.67      6%
Jan-07   166.6       -3%          13,827.77      1%
Feb-07   172         3%           14,124.36      2%
Mar-07   154.2       -10%         13,013.74      -8%
Apr-07   141         -9%          12,811.93      -2%
May-07   149         6%           13,987.77      9%
Jun-07   151.75      2%           14,610.28      4%
Jul-07   147.65      -3%          14,685.16      1%
Aug-07   148         0%           15,344.02      4%
Sep-07   143         -3%          15,401.99      0%
Oct-07   162         13%          17,356.99      13%
Nov-07   302.1       86%          20,130.23      16%
Dec-07   320         6%           19,547.09      -3%
Jan-08   340         6%           20,325.27      4%
Feb-08   227         -33%         17,820.67      -12%
Mar-08   209.6       -8%          17,227.56      -3%
Apr-08   150         -28%         15,771.72      -8%
May-08   138.25      -8%          17,560.15      11%
Jun-08   132         -5%          16,591.46      -6%
Jul-08   99.05       -25%         13,480.02      -19%
Aug-08   95.15       -4%          14,064.26      4%
Sep-08   96.6        2%           14,412.99      2%
Oct-08   68          -30%         13,006.72      -10%
Nov-08   62          -9%          10,209.37      -22%
Dec-08   41          -34%         9,162.94       -10%
Jan-09   43.45       6%           9,720.55       6%
Feb-09   50.9        17%          9,340.37       -4%
Mar-09   43.6        -14%         8,762.88       -6%
Apr-09   45.95       5%           9,745.77       11%
May-09   71          55%          11,635.24      19%
Jun-09   95.55       35%          14,746.51      27%
Jul-09   96.9        1%           14,506.43      -2%
Aug-09   112.95      17%          15,694.78      8%
Sep-09   131.05      16%          15,691.27      0%
Oct-09   138         5%           17,186.20      10%
Nov-09   110         -20%         15,838.63      -8%
Dec-09   111.45      1%           16,947.46      7%
Jan-10   126.8       14%          17,473.45      3%
Feb-10   128.5       1%           16,339.32      -6%
                     1.61367594%                 1.8631125%

(Source: www.bseindia.com)

The data above show the monthly closing price of Orissa Cements Limited (OCL) from March 2004 to February 2010 alongside the Sensex over the same period. By running a regression analysis on these data, we can calculate the beta of the stock.

When analysts use the capital asset pricing model (CAPM), they generally use regression to calculate beta. Beta is used to calculate the cost of capital for a company. It helps in valuing the company and in further equity research and recommendations to investors.

Hypothesis 1:

The stock price of a company depends upon the Sensex.

Hypothesis 2:

The stock price of the company is more sensitive to market movements than the Sensex itself, that is, its beta is greater than 1.

Since the statistical side of regression may overwhelm some users, Microsoft has packaged a regression tool in the standard copy of Excel. Below, Excel 7.0 is used to illustrate the ease of calculating the regression.

Step 1:

Dependent variable: stock price of OCL.

Step 2:

Independent variable: Sensex level.

Step 3:

Obtain data for the dependent and independent variables from past periods. For this business model, we use the stock price of OCL and the Sensex from March 2004 to February 2010.

Step 4:

Run the regression to assess the level of fit. To complete the regression analysis, we first need to enable an add-in that comes with the standard version of Excel (the Analysis ToolPak). Once the data have been entered, select the data to be analysed and run the regression tool to open the regression dialog box. Keep in mind that the Y range is the dependent variable and the X range is the independent variable.
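The steps above can also be sketched outside Excel. The following is a minimal illustration in plain Python, not the author's procedure: it converts monthly closes to percentage changes and takes the least-squares slope as beta. The function names `pct_changes` and `beta` are introduced here, and only the first six months from the table are used, so the result will not match the full 71-observation regression.

```python
def pct_changes(prices):
    """Month-over-month fractional changes of a price series."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def beta(stock_prices, index_prices):
    """Slope of the least-squares regression of stock returns on index returns."""
    y = pct_changes(stock_prices)   # dependent variable (Y range)
    x = pct_changes(index_prices)   # independent variable (X range)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# First six monthly closes from the table above (Mar-04 to Aug-04),
# used only as a small illustrative subset.
ocl = [212.05, 252, 299, 305, 309.9, 310]
sensex = [5649.30, 5599.12, 5645.86, 4792.01, 4813.76, 5193.25]

print(round(beta(ocl, sensex), 4))
```

Passing the full 72 monthly closes for both series would reproduce the kind of slope that the Excel regression tool reports as "X Variable 1".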

The performance of the Sensex reflects the collective performance of the thirty company stocks that make up the index on the BSE. We assume here that movements in the Sensex affect the stock price of a company. If an increase in the Sensex is accompanied by an increase in the stock price, there is a positive correlation between them; if the stock price falls instead, the correlation is negative.

Y=0.2305x+0.0159

Executive Summary:

The above linear regression model gives an estimate of the beta of the company's stock, which in turn indicates the volatility of that stock. It also shows how the stock is performing in the market and whether it moves in accordance with the economic growth of the country. In short, it quantifies whether the Sensex return in a given month has a positive or a negative impact on the stock's return in that month.

Regression Statistics

Multiple R           0.554717
R Square             0.307711
Adjusted R Square    0.297677
Standard Error       0.173784
Observations         71

ANOVA

             df    SS
Regression   1     0.926242
Residual     69    2.083865
Total        70    3.010107

               Coefficients   Standard Error
Intercept      -0.00916       0.021124
X Variable 1   1.357989       0.245213

Annotations on the output:

1. Basic R2
2. R2 statistic for analysis purpose
3. Standard error for each coefficient
4. Total sum of squared regression.
5. Total sum of squared errors.
6. Total sum of squares.
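The figures in the regression output above can be cross-checked against one another. The sketch below recomputes R Square, Adjusted R Square, the standard error and the slope's t statistic from the reported sums of squares and coefficients. The variable names are introduced here for illustration.

```python
import math

# Values copied from the regression output above.
ss_regression = 0.926242
ss_residual = 2.083865
ss_total = 3.010107
n_obs = 71
n_predictors = 1
slope, slope_se = 1.357989, 0.245213

residual_df = n_obs - n_predictors - 1  # 69, as in the ANOVA table

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / residual_df
std_error = math.sqrt(ss_residual / residual_df)
t_slope = slope / slope_se

print(f"R Square          {r_squared:.6f}")
print(f"Adjusted R Square {adj_r_squared:.6f}")
print(f"Standard Error    {std_error:.6f}")
print(f"t Stat (slope)    {t_slope:.3f}")
```

The recomputed values agree with the reported 0.307711, 0.297677 and 0.173784 to within rounding, and the slope of about 1.36 is roughly 5.5 standard errors away from zero.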



Business Model:

Year   No. of cars sold   Fuel price per barrel (Rs)   1/fuel price per barrel (Rs)   Per capita income
2002   6626387            1112.67                      0.000898738                    19040
2003   6240526            1292.85                      0.000773487                    20989
2004   6814554            1702.16                      0.000587491                    23241
2005   7338314            2177.74                      0.000459191                    20813
2006   8036010            2643.91                      0.000378228                    23222
2007   8534690            2605.88                      0.000383748                    29382
2008   9237780            4258.39                      0.00023483                     37490

I have taken data on the number of Toyota cars sold, the fuel price per barrel and per capita income for the years 2002 to 2008.

Source:

Number of passenger vehicle sold in India (2002-2008) www.siam.com

Per capita income of India ( 2002-2008) www.economywatch.com

Crude oil price ( 2002- 2008) www.ioga.com

Dependent variable: number of cars sold

Independent variables: 1/fuel price per barrel in Rs. and per capita income

The business model in this context is to find the dependence of Toyota car sales on fuel price and per capita income. From this model we can forecast Toyota's sales.

Hypothesis 1:

Sales of Toyota cars depend upon per capita income.

Hypothesis 2:

Sales of Toyota cars depend upon fuel price.

SUMMARY OUTPUT

Regression Statistics

Multiple R           0.949342
R Square             0.901249
Adjusted R Square    0.851874
Standard Error       421834.6
Observations         7

ANOVA

             df   SS         MS         F          Significance F
Regression   2    6.5E+12    3.25E+12   18.25304   0.009752
Residual     4    7.12E+11   1.78E+11
Total        6    7.21E+12

                                Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                       6958610        1556367          4.471058   0.011066   2637441     11279779    2637441       11279779
1/fuel price per barrel in Rs   -2.5E+09       1.14E+09         -2.23968   0.088653   -5.7E+09    6.11E+08    -5.7E+09      6.11E+08
Per capita income               77.99742       41.60968         1.874502   0.134131   -37.5296    193.5244    -37.5296      193.5244

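R² is 0.90, which is close to 1 and indicates that the model explains most of the variation in Toyota car sales through fuel price and per capita income. The regression as a whole is significant (Significance F = 0.0098), although with only seven observations neither slope coefficient is individually significant at the 5% level (P-values 0.089 and 0.134). The model is Y = 6958610 - 2.5E+09*x1 + 77.99742*x2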

Where,

Y= sale of Toyota car.

X1 =1/ fuel price per barrel in Rs.

X2= per capita income.
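As a check on how the fitted equation is used, the sketch below evaluates it for the 2008 row of the data table and compares the result with the actual figure. The function name `predicted_sales` is introduced here for illustration. The coefficient on x1 is taken as -2.5E+09, the rounded value shown in the output, so the result is approximate.

```python
def predicted_sales(inv_fuel_price, per_capita_income,
                    intercept=6958610, b_inv_fuel=-2.5e9, b_income=77.99742):
    """Evaluate Y = intercept + b_inv_fuel * x1 + b_income * x2."""
    return intercept + b_inv_fuel * inv_fuel_price + b_income * per_capita_income


# 2008 row of the data table: x1 = 1/fuel price, x2 = per capita income.
fitted_2008 = predicted_sales(0.00023483, 37490)
actual_2008 = 9237780

print(round(fitted_2008), actual_2008)
print(f"difference: {100 * (fitted_2008 - actual_2008) / actual_2008:.1f}%")
```

The fitted value comes out at roughly 9.3 million against an actual 9.24 million. The same function can be fed forecast values of fuel price and per capita income to obtain a sales forecast.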

Y=-4E+09x + 1E+07

Y= 149.56x+4E+06



Executive Summary:

The above model gives an estimate of the expected sales of Toyota cars next year. Fuel price and per capita income are the independent variables, and forecasts of both are easy to obtain. Putting those forecasts into the model gives the expected sales of Toyota cars for the next year. The model assumes that Toyota's sales depend only on these two variables, which may or may not be true. A further limitation is that the model is applicable only in India.
