White Paper on Regression


  • 8/8/2019 White Paper on Regression


    PRAXIS BUSINESS SCHOOL

    White Paper on regression

    A Report

    Submitted to

    Dr. Prithwis Mukherjee

    In partial fulfilment of the requirements of the course

    Quantitative Technique-2

    On 07/09/2010

    By

    Ashish Maheshwari

    ( B09004)

    Statistical Modelling:


    Statistical modelling involves the appropriate application of statistical techniques, each requiring certain assumptions, to perform hypothesis tests, interpret the data and reach valid conclusions. Data from experiments, product testing, simulation, surveys, and statistical process and quality control must be appropriately analysed before results can be determined and conclusions drawn. The results from an experiment or test must be obtained by following established statistical procedures, including experimental design and the appropriate use of statistical analysis and modelling techniques. These results can then be reproduced, within sampling error, by repeating the experiment.

    Statistical modelling requires careful selection of analytical techniques, verification of assumptions, and verification of data. Descriptive statistics, graphs and relational plots of the data should first be examined to evaluate the legitimacy of the data, identify possible outcomes and assumptions, and form preliminary ideas on variable relationships for modelling.

    Benefits:

    Application of appropriate statistical analysis techniques

    Development of appropriate conclusions and key learning from the data

    Ensuring results address experimental objectives

    Maximizing information gained from the data

    Maximizing chances of the experiment being successful

    Techniques:

    1. Statistical analysis and modelling techniques
    2. Descriptive techniques
    3. Data graphs, plots and exploratory data analysis
    4. Multiple linear regression analysis
    5. Logistic regression
    6. Time series analysis
    7. Discriminant analysis
    8. Factor analysis
    9. Cluster analysis
    10. Multivariate analysis
    11. Nonparametric analysis
    12. Experimental design

    Pitfalls in using regression:

    Regression analysis is a statistical tool that, when properly used, can help people make decisions. Often, however, it is misused, and as a result decision makers make inaccurate forecasts. The most common errors made while using regression are as follows:

    1. Limited range over which the regression equation holds:

    A common mistake is to assume that the estimating line can be applied over any range of values. Hospital administrators can properly use regression analysis to predict the relationship between cost per bed and occupancy levels. Some administrators, however, incorrectly use the same regression to predict the cost per bed for occupancy levels that are significantly higher than those used to estimate the regression line. They make decisions on the basis of one set of costs and then find that costs change drastically as occupancy increases.
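The range restriction can be made explicit in code. The following is a minimal Python sketch, not a model from any real hospital: the occupancy and cost figures are invented for illustration, and the helper names `fit_line` and `predict_within_range` are hypothetical. It fits a simple least-squares line and refuses to predict outside the range of occupancy levels used to fit it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_within_range(x_new, xs, intercept, slope):
    """Predict only if x_new lies inside the range of the fitted data."""
    lo, hi = min(xs), max(xs)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x={x_new} is outside the fitted range [{lo}, {hi}]; "
            "the regression line has not been validated there"
        )
    return intercept + slope * x_new


# Invented illustration data: occupancy (%) and cost per bed.
occupancy = [50, 55, 60, 65, 70, 75]
cost_per_bed = [400, 390, 381, 370, 362, 350]

intercept, slope = fit_line(occupancy, cost_per_bed)
print(predict_within_range(62, occupancy, intercept, slope))  # inside the range
try:
    predict_within_range(95, occupancy, intercept, slope)     # extrapolation
except ValueError as err:
    print("refused:", err)
```

Raising an error is only one possible design choice; a softer variant could return the prediction together with a warning flag.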

    2. Regression analysis does not determine cause and effect:

    Another mistake is to assume that a change in one variable is caused by a change in the other variable. Consider the example of research and development (R&D) expenses and annual profit. It is unlikely that profit in a given year is caused by R&D expenditure in that same year. In high-technology industries R&D activity can be used to explain profits, but a better approach is to predict current profits from past R&D expenditure together with economic conditions, dollars spent on advertising and other variables. This can be done using multiple regression techniques.

    3. Conditions change and invalidate the regression equation:

    Care must be taken when historical data are used to estimate the regression equation. Conditions can change and violate one or more of the assumptions on which the regression analysis depends.

    4. Values of variables change over time:

    Another error arises when some variables depend on time. Suppose a firm uses regression analysis to determine the relationship between the number of employees and production volume. If the observations used in the analysis extend back several years, the resulting regression line may be too steep because it fails to recognise the effect of changing technology.

    5. Relationships that have no common bond:

    When applying regression analysis, people sometimes find a relationship between two variables that in fact have no common bond.

    For example, one might find a statistical relationship between the number of miles per gallon achieved by eight different cars and the distance from Earth to each of the other eight planets. Because there is no common bond between gas mileage and the distance to other planets, this relationship would be meaningless.

    6. Finding things that do not exist:

    If one runs a large number of regressions between many pairs of variables, some apparently interesting relationships are likely to appear by chance. For example, one might find a high statistical relationship between one's income and the amount of beer consumed in the US, or even between the length of a freight train and the weather. In neither case is there a factor common to both variables, so such relationships are meaningless.
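A small simulation illustrates how easily such chance relationships arise. This is only a sketch under stated assumptions: the series are independent random draws, and the number of pairs, the sample size of 8 and the 0.7 threshold are arbitrary illustrative choices, not values from this paper.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def count_chance_relationships(n_pairs, n_points, threshold, seed=0):
    """Count pairs of unrelated random series with |r| above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(correlation(xs, ys)) > threshold:
            hits += 1
    return hits


# 1000 pairs of independent series with 8 observations each (assumed values).
hits = count_chance_relationships(n_pairs=1000, n_points=8, threshold=0.7)
print(f"{hits} of 1000 unrelated pairs have |r| > 0.7")
```

Every pair here is unrelated by construction, so any pair that clears the threshold is exactly the kind of meaningless relationship described above.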

    7. Misinterpreting r and r²:

    The coefficient of determination is misinterpreted if r² is used to describe the percentage of change in the dependent variable that is caused by a change in the independent variable. This is wrong because r² measures only how well one variable describes another, not how much of the change in one variable is caused by the other.

    Techniques of regression that can be used to model social and business scenarios:

    Regression analysis is a statistical forecasting technique concerned with describing and evaluating the relationship between a dependent variable and one or more independent variables.

    Regression analysis can predict the outcome of a key business indicator (the dependent variable) based on its relationship with other related business drivers (the explanatory variables).


    Use of regression in Business model:

    1. Trend line analysis:
    Linear regression is used to create trend lines, which use past data to predict future performance or trends. In business, trend lines usually show the movement of financial or product attributes over time. Stock prices, oil prices and product specifications can all be analysed using trend lines.

    2. Risk analysis for investments:
    The capital asset pricing model is commonly applied using linear regression analysis, and a common measure of the volatility of a stock or investment is its beta, which can be determined using linear regression. Linear regression is key to assessing the risk associated with most investment vehicles.

    3. Sales or market forecasts:
    Multivariate linear regression is a method for forecasting sales volume or market movement in order to create comprehensive plans for growth. It can be more accurate than trend analysis, because trend analysis looks only at how one variable changes with respect to another.

    4. Total quality control:

    Quality control methods make frequent use of linear regression to analyse key product specifications and other measurable parameters of product or organisational quality (such as the number of complaints over time).

    5. Linear regression in human resources:
    Linear regression methods are also used to predict the demographics and types of future workforce for large companies. This helps companies prepare for their workforce needs by developing good hiring plans and training plans for existing employees.

    Social Model:

    1. Health survey:

    Consider the tuberculosis (TB) data from the National Family Health Survey. To study how reporting of TB infection and treatment-seeking among men and women vary with socio-economic characteristics, multivariate logistic regression is applied to find the significant factors explaining TB reporting and treatment-seeking.

    2. Analysis of urbanisation:

    China's urbanisation level can be projected by applying a linear regression model and an S-curve regression model.

    The linear model is: U_t = a0 + a1 * t

    where t is the independent variable (the year) and U_t is the dependent variable (the urbanisation level in year t).

    Based on the urbanisation level under the 1990 census definition for the period 1983-1999, the constants in this formula are estimated, giving the linear regression equation:

    U_t = -1026.54 + 0.529 * t

    The statistical features of this equation are as follows:

    R² = 0.98, F = 714.46, Sig. F = 0.00000,

    which indicates that the model is statistically significant.

    Source: (www.iiasa.ac.at/admin)
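As a worked illustration, the fitted equation can be evaluated for individual years. The sketch below uses the two coefficients quoted above. The function name `urbanisation_level` is hypothetical, and reading the result as a percentage of population is an assumption, since the source does not state the unit.

```python
def urbanisation_level(year, a0=-1026.54, a1=0.529):
    """Fitted urbanisation level U_t = a0 + a1 * t for a given year t.

    The unit (assumed here to be percent) is not stated in the source.
    """
    return a0 + a1 * year


# Evaluate the line at the start, middle and end of the 1983-1999 sample.
for year in (1983, 1990, 1999):
    print(year, round(urbanisation_level(year), 2))
```

The slope of 0.529 means the fitted level rises by about 0.53 units per year. Following the first pitfall listed earlier, years far outside 1983-1999 should be treated with caution.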


    3. Land use change scenario projections:

    If the study area includes all the countries in the world, future proportions of artificial surfaces per region can be derived from projections of population and GDP using a regression model: a linear regression model links the proportion of artificial surfaces per region to the population and gross domestic product per capita, with the country and urban type of city as additional factors.

    How does one test the validity of a regression model in terms of:

    a. Coefficient of determination:
    In statistics, the coefficient of determination, R², is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model, and it provides a measure of how well future outcomes are likely to be predicted by the model. There are several different definitions of R², which are only sometimes equivalent. One class of such cases includes that of linear regression. In this case, R² is simply the square of the sample correlation coefficient between the outcomes and their predicted values or, in the case of simple linear regression, between the outcome and the values being used for prediction. In such cases the value varies from 0 to 1. The closer it is to 1, the better the model fits the data; the closer it is to 0, the poorer the fit.
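The equivalence mentioned for simple linear regression can be checked numerically. The sketch below is illustrative only: the five data points are invented, and the function names are hypothetical. It computes R² as 1 - SSE/SST for a least-squares line and compares it with the squared correlation coefficient.

```python
def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared errors around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Invented illustration data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("R^2 from sums of squares:", r_squared(x, y))
print("squared correlation     :", correlation(x, y) ** 2)
```

The two printed values agree up to floating-point rounding, which is the sense in which R² "is simply the square of the sample correlation coefficient" for a simple linear regression.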

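    b. Statistical significance of the identified slope coefficients:

    The slope coefficient gives the direction and magnitude of the change in the dependent variable associated with a change in the independent variable. For example, a slope coefficient of -2 means that a one-unit increase in the independent variable is associated with a two-unit decrease in the dependent variable. Its statistical significance is judged from the t statistic, which is the coefficient divided by its standard error, and from the corresponding p-value: a small p-value (conventionally below 0.05) indicates that the slope is unlikely to be zero and that the independent variable is useful for predicting the dependent variable.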


    Business model

    DATE      OCL       CHANGE (Y)     SENSEX       CHANGE (X)
    Mar-04    212.05                   5,649.30
    Apr-04    252       19%            5,599.12     -1%
    May-04    299       19%            5,645.86     1%
    Jun-04    305       2%             4,792.01     -15%
    Jul-04    309.9     2%             4,813.76     0%
    Aug-04    310       0%             5,193.25     8%
    Sep-04    338       9%             5,202.16     0%
    Oct-04    389.5     15%            5,587.46     7%
    Nov-04    361       -7%            5,678.65     2%
    Dec-04    359       -1%            6,259.28     10%
    Jan-05    421       17%            6,626.49     6%
    Feb-05    395       -6%            6,565.21     -1%
    Mar-05    426       8%             6,725.92     2%
    Apr-05    564       32%            6,506.60     -3%
    May-05    580       3%             6,183.07     -5%
    Jun-05    572.1     -1%            6,729.39     9%
    Jul-05    575       1%             7,165.45     6%
    Aug-05    650       13%            7,632.01     7%
    Sep-05    188       -71%           7,818.90     2%
    Oct-05    159.9     -15%           8,662.99     11%
    Nov-05    120       -25%           7,989.86     -8%
    Dec-05    151       26%            8,813.82     10%
    Jan-06    155       3%             9,422.49     7%
    Feb-06    150.3     -3%            9,959.24     6%
    Mar-06    144       -4%            10,368.75    4%
    Apr-06    148.95    3%             11,342.96    9%
    May-06    206.9     39%            12,103.78    7%
    Jun-06    159.95    -23%           10,472.46    -13%
    Jul-06    142.65    -11%           10,616.97    1%
    Aug-06    153.3     7%             10,737.50    1%
    Sep-06    158.85    4%             11,699.57    9%
    Oct-06    172.5     9%             12,473.79    7%
    Nov-06    170.25    -1%            12,992.62    4%
    Dec-06    172       1%             13,729.67    6%
    Jan-07    166.6     -3%            13,827.77    1%
    Feb-07    172       3%             14,124.36    2%
    Mar-07    154.2     -10%           13,013.74    -8%
    Apr-07    141       -9%            12,811.93    -2%
    May-07    149       6%             13,987.77    9%
    Jun-07    151.75    2%             14,610.28    4%
    Jul-07    147.65    -3%            14,685.16    1%
    Aug-07    148       0%             15,344.02    4%
    Sep-07    143       -3%            15,401.99    0%
    Oct-07    162       13%            17,356.99    13%
    Nov-07    302.1     86%            20,130.23    16%
    Dec-07    320       6%             19,547.09    -3%
    Jan-08    340       6%             20,325.27    4%
    Feb-08    227       -33%           17,820.67    -12%
    Mar-08    209.6     -8%            17,227.56    -3%
    Apr-08    150       -28%           15,771.72    -8%
    May-08    138.25    -8%            17,560.15    11%
    Jun-08    132       -5%            16,591.46    -6%
    Jul-08    99.05     -25%           13,480.02    -19%
    Aug-08    95.15     -4%            14,064.26    4%
    Sep-08    96.6      2%             14,412.99    2%
    Oct-08    68        -30%           13,006.72    -10%
    Nov-08    62        -9%            10,209.37    -22%
    Dec-08    41        -34%           9,162.94     -10%
    Jan-09    43.45     6%             9,720.55     6%
    Feb-09    50.9      17%            9,340.37     -4%
    Mar-09    43.6      -14%           8,762.88     -6%
    Apr-09    45.95     5%             9,745.77     11%
    May-09    71        55%            11,635.24    19%
    Jun-09    95.55     35%            14,746.51    27%
    Jul-09    96.9      1%             14,506.43    -2%
    Aug-09    112.95    17%            15,694.78    8%
    Sep-09    131.05    16%            15,691.27    0%
    Oct-09    138       5%             17,186.20    10%
    Nov-09    110       -20%           15,838.63    -8%
    Dec-09    111.45    1%             16,947.46    7%
    Jan-10    126.8     14%            17,473.45    3%
    Feb-10    128.5     1%             16,339.32    -6%
                        1.61367594%                 1.8631125%

    (Source: www.bseindia.com)

    The data above show the monthly closing price of Orissa Cements Limited (OCL) from March 2004 to February 2010 alongside the Sensex over the same period. By running a regression analysis on these data, we can calculate the beta of the stock. When analysts use the capital asset pricing model (CAPM), they generally use regression to calculate beta. Beta is used to calculate the cost of capital for a company. It helps in valuing a company and in further equity research and recommendations to investors.
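To show how beta feeds into the cost of capital, the sketch below applies the standard CAPM relation, expected return = risk-free rate + beta * (market return - risk-free rate). The beta is the slope reported in this paper's regression output for OCL. The risk-free rate and market return are assumed placeholder values and do not come from the source.

```python
def capm_cost_of_equity(risk_free_rate, beta, market_return):
    """CAPM: expected return = rf + beta * (rm - rf)."""
    return risk_free_rate + beta * (market_return - risk_free_rate)


beta_ocl = 1.357989       # slope from the paper's regression output
risk_free_rate = 0.07     # ASSUMED placeholder, not from the source
market_return = 0.15      # ASSUMED placeholder, not from the source

cost_of_equity = capm_cost_of_equity(risk_free_rate, beta_ocl, market_return)
print(f"CAPM cost of equity: {cost_of_equity:.2%}")
```

With a beta above 1, the required return exceeds the assumed market return, which is how a more market-sensitive stock translates into a higher cost of equity.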

    Hypothesis 1:

    The stock price of a company depends upon the Sensex.

    Hypothesis 2:

    The stock price of the company is more sensitive than the Sensex, i.e. its beta is greater than 1.

    Since the statistics behind regression may overwhelm some users, Microsoft Excel packages a regression tool in the standard copy of the software. Below, Excel 7.0 is used to illustrate the ease of calculating the regression.

    Step 1:

    Dependent variable: stock price of OCL.

    Step 2:

    Independent variable: Sensex level.

    Step 3:

    Obtain data for the dependent and independent variables from past periods. For this business model, we use the stock price of OCL and the Sensex from March 2004 to February 2010.

    Step 4:

    Run the regression to assess the level of fit. To complete the regression analysis, we first need to enable an add-in (the Analysis ToolPak) that comes with the standard version of Excel. Once the data have been entered, select the data to be analysed and run the regression tool to open the regression dialog box. Keep in mind that the Y range is the dependent variable (here the monthly percentage change in the OCL price) and the X range is the independent variable (the monthly percentage change in the Sensex).
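The four steps above can also be sketched outside Excel. The Python below is an illustrative equivalent, not the author's procedure. It converts prices to monthly percentage changes and fits a least-squares line, whose slope is the beta. Only the first six months of the table (Mar-04 to Aug-04) are used to keep the example short, so the printed slope is not expected to match the beta from the full 72-month sample. The helper names `pct_changes` and `ols` are hypothetical.

```python
def pct_changes(prices):
    """Simple period-on-period change: (current - previous) / previous."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def ols(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Steps 1-3: closing prices for Mar-04 .. Aug-04, taken from the table.
ocl_prices = [212.05, 252, 299, 305, 309.9, 310]                        # dependent
sensex_levels = [5649.30, 5599.12, 5645.86, 4792.01, 4813.76, 5193.25]  # independent

# Step 4: regress OCL changes (Y range) on Sensex changes (X range).
y = pct_changes(ocl_prices)
x = pct_changes(sensex_levels)
intercept, beta = ols(x, y)
print(f"intercept = {intercept:.4f}, beta (slope) = {beta:.4f}")
```

Extending the two lists to all 72 months gives the 71 paired changes that the regression tool operates on.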

    The performance of the Sensex reflects the collective performance of the thirty company stocks that make up the index on the BSE. We assume here that movements in the Sensex affect the stock price of a company. If an increase in the Sensex is accompanied by an increase in the stock price, there is a positive correlation between them, and vice versa.

    Y=0.2305x+0.0159

    Executive Summary:

    The above linear regression model gives us the beta of the company's stock, which in turn indicates the volatility of that stock relative to the market. It also shows how the stock is performing in the market and whether it moves in accordance with the economic growth of the country. In short, the monthly Sensex return has a positive or negative impact on the monthly stock return of the company.

    Regression Statistics
    Multiple R           0.554717
    R Square             0.307711
    Adjusted R Square    0.297677
    Standard Error       0.173784
    Observations         71

    ANOVA
                  df    SS
    Regression    1     0.926242
    Residual      69    2.083865
    Total         70    3.010107

                    Coefficients    Standard Error
    Intercept       -0.00916        0.021124
    X Variable 1    1.357989        0.245213

    Notes on the output:
    1. Basic R2
    2. R2 statistic for analysis purpose
    3. Standard error for each
    4. Total sum of squared regression.
    5. Total sum of squared errors.
    6. Total sum of squares.
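The figures in this output are linked by simple identities, which gives a quick way to check a regression table. The sketch below recomputes R², adjusted R², the standard error and the slope's t statistic from the sums of squares and coefficients reported above. The formulas are the standard ones for ordinary least squares. The variable names are illustrative.

```python
import math

# Values copied from the regression output above.
ss_regression = 0.926242
ss_residual = 2.083865
n_obs = 71
n_predictors = 1
slope, slope_se = 1.357989, 0.245213

ss_total = ss_regression + ss_residual
residual_df = n_obs - n_predictors - 1              # 69

r2 = ss_regression / ss_total                        # share of variation explained
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / residual_df    # penalised for predictors
std_error = math.sqrt(ss_residual / residual_df)     # residual standard error
t_stat = slope / slope_se                            # test of slope = 0

print(f"R Square          {r2:.6f}")
print(f"Adjusted R Square {adj_r2:.6f}")
print(f"Standard Error    {std_error:.6f}")
print(f"t Stat (slope)    {t_stat:.3f}")
```

The recomputed values agree with the table to rounding. The t statistic is well above the usual critical value of roughly 2 for 69 degrees of freedom, so the slope is statistically significant even though R² is only about 0.31.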


    Business Model:

    Year    No. of cars sold    Fuel price per barrel in Rs    1/fuel price per barrel in Rs    Per capita income
    2002    6626387             1112.67                        0.000898738                      19040
    2003    6240526             1292.85                        0.000773487                      20989
    2004    6814554             1702.16                        0.000587491                      23241
    2005    7338314             2177.74                        0.000459191                      20813
    2006    8036010             2643.91                        0.000378228                      23222
    2007    8534690             2605.88                        0.000383748                      29382
    2008    9237780             4258.39                        0.00023483                       37490

    I have taken data on the number of Toyota cars sold, the fuel price per barrel and per capita income for the years 2002 to 2008.

    Source:

    Number of passenger vehicles sold in India (2002-2008): www.siam.com

    Per capita income of India (2002-2008): www.economywatch.com

    Crude oil price (2002-2008): www.ioga.com

    Dependent variable: number of cars sold

    Independent variables: 1/fuel price per barrel in Rs. and per capita income

    The business model in this context is to find the dependence of Toyota car sales on fuel price and per capita income. From this model we can forecast Toyota sales.

    Hypothesis 1:

    Sales of Toyota cars depend upon per capita income.

    Hypothesis 2:

    Sales of Toyota cars depend upon fuel price.

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.949342
    R Square             0.901249
    Adjusted R Square    0.851874
    Standard Error       421834.6
    Observations         7

    ANOVA
                  df    SS          MS          F           Significance F
    Regression    2     6.5E+12     3.25E+12    18.25304    0.009752
    Residual      4     7.12E+11    1.78E+11
    Total         6     7.21E+12

                                     Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
    Intercept                        6958610         1556367           4.471058    0.011066    2637441      11279779     2637441        11279779
    1/fuel price per barrel in Rs    -2.5E+09        1.14E+09          -2.23968    0.088653    -5.7E+09     6.11E+08     -5.7E+09       6.11E+08
    Per capita income                77.99742        41.60968          1.874502    0.134131    -37.5296     193.5244     -37.5296       193.5244

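    R² is 0.90 (Multiple R is 0.949), which is close to 1 and indicates that Toyota car sales are well explained by fuel price and per capita income taken together; the overall regression is significant (Significance F = 0.009752). Individually, however, neither slope coefficient is significant at the 5% level (p-values 0.088653 and 0.134131), which is not surprising with only 7 observations. The model is Y = 6958610 - 2.5E+09 * X1 + 77.99742 * X2

    where

    Y = sales of Toyota cars,

    X1 = 1/fuel price per barrel in Rs.,

    X2 = per capita income.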
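As a usage illustration, the fitted equation can be wrapped in a small function and evaluated for one of the years in the data table. This is a sketch: the function name `predicted_sales` is hypothetical, and because the source reports the fuel-price coefficient only to two significant figures (-2.5E+09), the result is approximate.

```python
def predicted_sales(fuel_price_rs, per_capita_income,
                    b0=6958610.0, b1=-2.5e9, b2=77.99742):
    """Y = b0 + b1 * X1 + b2 * X2, with X1 = 1 / fuel price per barrel."""
    x1 = 1.0 / fuel_price_rs
    return b0 + b1 * x1 + b2 * per_capita_income


# 2008 row of the data table: fuel price 4258.39, per capita income 37490.
fitted_2008 = predicted_sales(4258.39, 37490)
actual_2008 = 9237780
print(f"fitted 2008 sales: {fitted_2008:,.0f}")
print(f"actual 2008 sales: {actual_2008:,}")
print(f"relative error   : {(fitted_2008 - actual_2008) / actual_2008:.2%}")
```

For a forecast, the same function would be called with projected values of fuel price and per capita income, ideally within or close to the 2002-2008 range used to fit the model.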

    Y=-4E+09x + 1E+07

    Y= 149.56x+4E+06


    Executive Summary:

    The above model gives an idea of the expected sales of Toyota cars next year. In this model, fuel price and per capita income are taken as the independent variables. It is easy to obtain forecasts of per capita income and fuel price, put them into the model and find the expected sales of Toyota cars for next year. The model assumes that Toyota sales depend only on these two variables, which may or may not be true. A further limitation is that the model is applicable only in India.