Regression & Correlation Class 2011-2013 L17


  • 7/31/2019 Regression & Correlation Class 2011-2013 L17


    Regression Models


    Introduction

    Regression analysis is a very valuable tool for a manager

    Regression can be used to

      Understand the relationship between variables

      Predict the value of one variable based on another variable

    Simple linear regression models have only two variables

    Multiple regression models have more variables


    Coefficient of Correlation

    Measures the relative strength of the linear relationship between two variables

    1. Price and quantity demanded (Simple Regression)

    2. Height and weight (Simple Regression)

    3. Advertisement expenditure and sales (Simple Regression)

    4. Family income and expenditure on luxury items (Simple Regression)

    5. Sales revenue of a product is influenced by advertising expenditure, quality of the product, and price (Multiple Regression)

    6. The employer-employee relationship in any organization may be examined with reference to training and development facilities, medical, housing, salary structure, etc. (Multiple Regression)


    Introduction

    The variable to be predicted is called the dependent variable

      Sometimes called the response variable

    The value of this variable depends on the value of the independent variable

      Sometimes called the explanatory or predictor variable

    Dependent variable = Independent variable + Independent variable


    Scatter Diagram

    Graphing is a helpful way to investigate the relationship between variables

    A scatter diagram or scatter plot is often used

    The independent variable is normally plotted on the X axis

    The dependent variable is normally plotted on the Y axis


    Triple A Construction

    Triple A Construction renovates old homes

    They have found that the dollar volume of renovation work is dependent on the area payroll

    TRIPLE A'S SALES (Rs100,000s)    LOCAL PAYROLL (Rs100,000,000s)
    6                                3
    8                                4
    9                                6
    5                                4
    4.5                              2
    9.5                              5


    Triple A Construction

    [Fig. 1: scatter diagram of sales (Rs100,000s, Y axis, 0 to 12) against payroll (Rs100 million, X axis, 0 to 8)]


    Simple Linear Regression

    Regression models are used to test if there is a relationship between variables

    There is some random error that cannot be predicted

    $Y = \beta_0 + \beta_1 X + \epsilon$

    where

    $Y$ = dependent variable (response)

    $X$ = independent variable (predictor or explanatory)

    $\beta_0$ = constant (value of $Y$ when $X = 0$)

    $\beta_1$ = slope of the regression line

    $\epsilon$ = random error


    Simple Linear Regression

    True values for the slope and constant are not known so they are estimated using sample data

    $\hat{Y} = b_0 + b_1 X$

    where

    $\hat{Y}$ = predicted value of the dependent variable (response)

    $X$ = independent variable (predictor or explanatory)

    $b_0$ = constant (value of $\hat{Y}$ when $X = 0$)

    $b_1$ = slope of the regression line


    Triple A Construction

    Triple A Construction is trying to predict sales based on area payroll

    Y = Sales

    X = Area payroll

    The line chosen in Figure 1 is the one that minimizes the errors

    Error = (Actual value) - (Predicted value)

    $e = Y - \hat{Y}$


    Triple A Construction

    For the simple linear regression model $\hat{Y} = b_0 + b_1 X$, the values of the constant and slope can be calculated using the formulae below

    Step 1. $\sum Y = n b_0 + b_1 \sum X$

    Step 2. $\sum XY = b_0 \sum X + b_1 \sum X^2$

    Each sum runs over all $n$ observations in the sample
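    The two equations in Step 1 and Step 2 can also be solved once and for all for the slope and constant. The lines below are a sketch of that algebra using the same symbols; $\bar{X}$ and $\bar{Y}$ (the sample means) are the only notation added here.

```latex
% Multiply Step 1 by \sum X and Step 2 by n, then subtract to eliminate b_0.
\begin{align*}
b_1 &= \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2} \\
% Divide Step 1 by n to recover the constant.
b_0 &= \frac{\sum Y - b_1 \sum X}{n} = \bar{Y} - b_1 \bar{X}
\end{align*}
```

    The second line also shows that the fitted line always passes through the point $(\bar{X}, \bar{Y})$.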


    Triple A Construction

    Regression calculations (Step 3)

    Y      X    XY      X^2
    6      3    18      9
    8      4    32      16
    9      6    54      36
    5      4    20      16
    4.5    2    9       4
    9.5    5    47.5    25

    $\sum Y = 42$, $\bar{Y} = 42/6 = 7$

    $\sum X = 24$, $\bar{X} = 24/6 = 4$

    $\sum XY = 180.5$

    $\sum X^2 = 106$


    Slope, constant and equation

    Substitute the values from Step 3 into the Step 1 and Step 2 equations

    Results are:

    $b_0 = 2$; $b_1 = 1.25$

    Therefore $\hat{Y} = 2 + 1.25X$

    Sales = 2 + 1.25(Payroll)

    If the payroll next year is Rs600 million

    $\hat{Y} = 2 + 1.25(6) = 9.5$, or Rs950,000
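    The calculation for Triple A Construction can be reproduced with a few lines of Python. This is only a sketch: the function name `fit_simple_regression` and the variable names are illustrative choices, and the data are the six sales and payroll pairs from the Triple A table.

```python
# Triple A Construction data from the slides.
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in Rs100,000s
payroll = [3, 4, 6, 4, 2, 5]     # X, in Rs100,000,000s


def fit_simple_regression(x, y):
    """Solve the Step 1 and Step 2 equations for (b0, b1)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Eliminate b0 between the two equations to get the slope.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Back-substitute into the Step 1 equation for the constant.
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = fit_simple_regression(payroll, sales)
predicted_sales = b0 + b1 * 6    # payroll of Rs600 million means X = 6

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")                          # b0 = 2.00, b1 = 1.25
print(f"Predicted sales = {predicted_sales:.2f} (Rs100,000s)")  # 9.50
```

    The predicted value of 9.5 is in units of Rs100,000, which corresponds to sales of Rs950,000.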


    Measuring the Fit of the Regression Model

    Regression models can be developed for any variables X and Y

    How do we know the model is actually helpful in predicting Y based on X?

    We could just take the average error, but the positive and negative errors would cancel each other out

    Three measures of variability are

      SST: Total variability about the mean

      SSE: Variability about the regression line

      SSR: Total variability that is explained by the model


    Measuring the Fit of the Regression Model

    Sum of the squares total: $SST = \sum (Y - \bar{Y})^2$

    Sum of the squared error: $SSE = \sum e^2 = \sum (Y - \hat{Y})^2$

    Sum of squares due to regression: $SSR = \sum (\hat{Y} - \bar{Y})^2$

    An important relationship: $SST = SSR + SSE$
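    The relationship between the three sums of squares follows from splitting each deviation from the mean into an unexplained part and an explained part. The sketch below assumes the line was fitted by least squares with a constant term, so that the errors satisfy $\sum e = 0$ and $\sum eX = 0$.

```latex
\begin{align*}
Y - \bar{Y} &= (Y - \hat{Y}) + (\hat{Y} - \bar{Y}) \\
\sum (Y - \bar{Y})^2 &= \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2
                        + 2 \sum (Y - \hat{Y})(\hat{Y} - \bar{Y}) \\
% With e = Y - \hat{Y} and \hat{Y} = b_0 + b_1 X, the cross term is
% (b_0 - \bar{Y}) \sum e + b_1 \sum eX = 0.
SST &= SSE + SSR
\end{align*}
```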


    Measuring the Fit of the Regression Model

    $Y$    $X$   $(Y - \bar{Y})^2$      $\hat{Y}$              $(Y - \hat{Y})^2$   $(\hat{Y} - \bar{Y})^2$
    6      3     (6 - 7)^2 = 1          2 + 1.25(3) = 5.75     0.0625              1.563
    8      4     (8 - 7)^2 = 1          2 + 1.25(4) = 7.00     1                   0
    9      6     (9 - 7)^2 = 4          2 + 1.25(6) = 9.50     0.25                6.25
    5      4     (5 - 7)^2 = 4          2 + 1.25(4) = 7.00     4                   0
    4.5    2     (4.5 - 7)^2 = 6.25     2 + 1.25(2) = 4.50     0                   6.25
    9.5    5     (9.5 - 7)^2 = 6.25     2 + 1.25(5) = 8.25     1.5625              1.563
                 Sum = 22.5                                    Sum = 6.875         Sum = 15.625

    $\bar{Y} = 7$, SST = 22.5, SSE = 6.875, SSR = 15.625
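    The three sums of squares in the table can be checked with a short Python sketch. The variable names are illustrative, and the constant and slope (2 and 1.25) are taken from the fitted line for Triple A Construction rather than re-estimated here.

```python
# Triple A Construction data and the fitted line Y-hat = 2 + 1.25 X.
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X
b0, b1 = 2.0, 1.25

y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * x for x in payroll]

# Total variability about the mean.
sst = sum((y - y_bar) ** 2 for y in sales)
# Variability about the regression line (unexplained).
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))
# Variability explained by the model.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

print(sst, sse, ssr)   # 22.5 6.875 15.625
```

    Adding the last two values gives 22.5, the same as SST.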


    Measuring the Fit of the Regression Model

    For Triple A Construction

    SST = 22.5

    SSE = 6.875

    SSR = 15.625

    These satisfy the relationship $SST = SSR + SSE$: 22.5 = 15.625 + 6.875


    Measuring the Fit of the Regression Model

    [Figure: sales (Rs100,000s, Y axis) against payroll (Rs100 million, X axis) with the regression line $\hat{Y} = 2 + 1.25X$, marking the deviations $Y - \bar{Y}$, $Y - \hat{Y}$ and $\hat{Y} - \bar{Y}$]


    Coefficient of Determination

    The proportion of the variability in Y explained by the regression equation is called the coefficient of determination

    The coefficient of determination is $r^2$

    $r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

    $r^2$ ranges from 0 to 1

    For Triple A Construction

    $r^2 = \frac{15.625}{22.5} = 0.6944$

    About 69% of the variability in Y is explained by the equation based on payroll (X)


    Correlation Coefficient

    The correlation coefficient is an expression of the strength of the linear relationship

    It will always be between +1 and -1

    The correlation coefficient is $r$

    $r = \pm\sqrt{r^2}$

    $r$ will be negative if the slope is negative and positive if the slope is positive

    For Triple A Construction

    $r = \sqrt{0.6944} = 0.8333$
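    As a check on the Triple A figures, the sketch below computes $r^2$ from the sums of squares, gives $r$ the sign of the slope, and compares it with the correlation computed directly from the deviations. The variable names and the direct formula $S_{xy} / \sqrt{S_{xx} S_{yy}}$ are additions for illustration and do not appear on the slides.

```python
import math

sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X

n = len(sales)
x_bar = sum(payroll) / n
y_bar = sum(sales) / n

# Sums of squared and cross deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(payroll, sales))
s_xx = sum((x - x_bar) ** 2 for x in payroll)
s_yy = sum((y - y_bar) ** 2 for y in sales)   # this is SST

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(payroll, sales))

r_squared = 1 - sse / s_yy
r = math.copysign(math.sqrt(r_squared), b1)   # r takes the sign of the slope
r_direct = s_xy / math.sqrt(s_xx * s_yy)      # correlation from deviations

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}, direct r = {r_direct:.4f}")
# r^2 = 0.6944, r = 0.8333, direct r = 0.8333
```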


    Correlation Coefficient

    [Figure: four scatter diagrams of Y against X]

    (a) Perfect Positive Correlation: r = +1

    (b) Positive Correlation: 0 < r < 1

    (c) No Correlation: r = 0

    (d) Perfect Negative Correlation: r = -1


    Using Excel for Regression


    The correlation coefficient is called Multiple R in Excel


    Multiple Regression Analysis

    Multiple regression models are extensions to the simple linear model and allow the creation of models with several independent variables

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon$

    where

    $Y$ = dependent variable (response variable)

    $X_i$ = ith independent variable (predictor or explanatory variable)

    $\beta_0$ = constant (value of $Y$ when all $X_i = 0$)

    $\beta_i$ = coefficient of the ith independent variable

    $k$ = number of independent variables

    $\epsilon$ = random error


    Multiple Regression Analysis

    To estimate these values, a sample is taken and the following equation developed

    $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

    where

    $\hat{Y}$ = predicted value of $Y$

    $b_0$ = sample constant (and is an estimate of $\beta_0$)

    $b_i$ = sample coefficient of the ith variable (and is an estimate of $\beta_i$)
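    For two independent variables, the sample coefficients can be found from three equations that extend the two used for simple regression. The block below is a sketch of that system under the usual least-squares criterion; the slides themselves obtain the coefficients with Excel rather than by hand.

```latex
\begin{align*}
\sum Y     &= n b_0 + b_1 \sum X_1 + b_2 \sum X_2 \\
\sum X_1 Y &= b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 \\
\sum X_2 Y &= b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2
\end{align*}
```

    Each additional independent variable adds one more equation and one more unknown, which is why software is normally used.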


    Jenny Wilson Realty

    Jenny Wilson wants to develop a model to determine the suggested listing price for houses based on the size and age of the house

    $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$ (here $k = 2$)

    where

    $\hat{Y}$ = predicted value of dependent variable (selling price)

    $b_0$ = Y intercept (constant)

    $X_1$ and $X_2$ = value of the two independent variables (square footage and age) respectively

    $b_1$ and $b_2$ = slopes for $X_1$ and $X_2$ respectively

    She selects a sample of houses that have sold recently and records the data shown in the following table


    Jenny Wilson Realty

    SELLING PRICE (Rs)    SQUARE FOOTAGE    AGE OF HOUSE    CONDITION
    95,000                1,926             30              Good
    119,000               2,069             40              Excellent
    124,800               1,720             30              Excellent
    135,000               1,396             15              Good
    142,000               1,706             32              Mint
    145,000               1,847             38              Mint
    159,000               1,950             27              Mint
    165,000               2,323             30              Excellent
    182,000               2,285             26              Mint
    183,000               3,752             35              Good
    200,000               2,300             18              Good
    211,000               2,525             17              Good
    215,000               3,800             40              Excellent
    219,000               1,740             12              Mint


    Press OK


    There is one dependent variable (selling price) and two independent variables (square footage and age).

    So the Input X Range will contain both data sets (square footage and age together, i.e. B2:C15).

    The remaining process is the same.


    Constant, i.e. $b_0$

    Coefficient of the first independent variable (square footage), i.e. $b_1$

    Coefficient of the second independent variable (age), i.e. $b_2$


    Jenny Wilson Realty

    $\hat{Y} = 146631 + 44X_1 - 2899X_2$
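    The slides obtain this equation from Excel. As a cross-check, the sketch below fits the same model in plain Python by building and solving the least-squares normal equations for the 14 houses in the table. The helper names `solve_linear_system` and `fit_multiple_regression` are illustrative, and Gaussian elimination is simply one convenient way to solve the small system; it is not claimed to be what Excel does internally.

```python
# Jenny Wilson Realty sample: (selling price in Rs, square footage, age in years).
houses = [
    (95000, 1926, 30), (119000, 2069, 40), (124800, 1720, 30),
    (135000, 1396, 15), (142000, 1706, 32), (145000, 1847, 38),
    (159000, 1950, 27), (165000, 2323, 30), (182000, 2285, 26),
    (183000, 3752, 35), (200000, 2300, 18), (211000, 2525, 17),
    (215000, 3800, 40), (219000, 1740, 12),
]


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows):
    """Least-squares coefficients [b0, b1, ...] for rows of (y, x1, x2, ...)."""
    design = [[1.0] + [float(v) for v in row[1:]] for row in rows]  # 1 for b0
    y = [float(row[0]) for row in rows]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_multiple_regression(houses)
print(f"Y-hat = {b0:.0f} + ({b1:.0f}) X1 + ({b2:.0f}) X2")
```

    The printed coefficients can be compared with the rounded values in the equation for Jenny Wilson Realty. The negative coefficient on age means that, for a given square footage, an older house is predicted to sell for less.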


    Assumptions of the Regression Model

    If we make certain assumptions about the errors in a regression model, we can perform statistical tests to determine if the model is useful

    1. Errors are independent

    2. Errors are normally distributed

    3. Errors have a mean of zero

    4. Errors have a constant variance

    A plot of the residuals (errors) will often highlight any glaring violations of the assumptions
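    Before plotting, the residuals have to be computed. The sketch below does this for the Triple A Construction line $\hat{Y} = 2 + 1.25X$ and lists them in order of payroll, which is the information a residual plot would display. The variable names are illustrative. With only six observations this can do little more than confirm that the residuals average to zero; judging independence, normality and constant variance would need more data and an actual plot.

```python
# Triple A Construction data and fitted line from the earlier slides.
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X
b0, b1 = 2.0, 1.25

# Residual = actual value - predicted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]
mean_residual = sum(residuals) / len(residuals)

# List residuals in order of X, as they would appear left to right on a plot.
for x, e in sorted(zip(payroll, residuals)):
    print(f"payroll = {x}: residual = {e:+.2f}")
print(f"mean residual = {mean_residual:.2f}")
```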


    Example

    Bus and subway ridership in Delhi during winter months is believed to be heavily tied to the number of tourists visiting the city. During the past 12 years, the following data have been obtained:

    Year    No. of Tourists (10,000s)    Ridership (1000s)
    1       7                            15
    2       2                            10
    3       6                            13
    4       4                            15
    5       14                           25
    6       15                           27
    7       16                           24
    8       12                           20
    9       14                           27
    10      20                           44
    11      15                           34
    12      7                            17

    Develop a regression model.

    What is expected ridership if 10 thousand tourists visit the city?

    If there are no tourists at all, explain the predicted ridership.
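    One way to work this example is to reuse the simple linear regression formulae from the Triple A Construction slides. The sketch below is a possible approach, not a model answer supplied with the slides. It assumes that tourists are measured in units of 10,000, so that 10 thousand tourists corresponds to X = 1, and that ridership is in thousands; the variable names are illustrative.

```python
tourists = [7, 2, 6, 4, 14, 15, 16, 12, 14, 20, 15, 7]          # X, in 10,000s
ridership = [15, 10, 13, 15, 25, 27, 24, 20, 27, 44, 34, 17]    # Y, in 1000s

n = len(tourists)
sum_x = sum(tourists)
sum_y = sum(ridership)
sum_xy = sum(x * y for x, y in zip(tourists, ridership))
sum_x2 = sum(x * x for x in tourists)

# Slope and constant from the two simple-regression equations.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

# Assumption: 10 thousand tourists is X = 1 in the table's units.
ridership_at_10k = b0 + b1 * 1
# With no tourists, X = 0 and the prediction is the constant b0.
ridership_no_tourists = b0

print(f"Y-hat = {b0:.3f} + {b1:.3f} X")   # Y-hat = 5.060 + 1.593 X
print(f"10 thousand tourists: {ridership_at_10k:.3f} thousand riders")
print(f"No tourists: {ridership_no_tourists:.3f} thousand riders")
```

    Under these assumptions the constant can be read as baseline ridership by residents. Because the smallest observed value is X = 2, both predictions lie outside the range of the data and should be treated with caution.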