BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final




    Business Statistics

    ABS-Bangalore


Dr. R. Venkatamuni Reddy, Associate Professor

    Contact: 09632326277, 080-30938181

    [email protected]

    [email protected]


Correlation, Simple and Multiple Regression


    Learning Objectives

In this chapter:

How to use regression analysis to predict the value of a dependent variable based on an independent variable

The meaning of the regression coefficients b0 and b1

How to evaluate the assumptions of regression analysis and what to do if the assumptions are violated

To make inferences about the slope and correlation coefficient

To estimate mean values and predict individual values


    Correlation vs. Regression

A scatter diagram can be used to show the relationship between two variables

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables

Correlation is only concerned with the strength of the relationship

No causal effect is implied with correlation


Introduction to Regression Analysis

Regression analysis is used to:

Predict the value of a dependent variable based on the value of at least one independent variable

Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to predict or explain

Independent variable: the variable used to explain the dependent variable


Simple Linear Regression Model

Only one independent variable, X

The relationship between X and Y is described by a linear function

Changes in Y are assumed to be caused by changes in X


Types of Relationships

(Scatter plots: linear relationships vs. curvilinear relationships)


Types of Relationships (continued)

(Scatter plots: strong relationships vs. weak relationships)


Types of Relationships (continued)

(Scatter plots: no relationship between X and Y)


Simple Linear Regression Model

Population model:

Yi = β0 + β1 Xi + εi

where Yi is the dependent variable, Xi is the independent variable, β0 is the population Y intercept, β1 is the population slope coefficient, and εi is the random error term. β0 + β1 Xi is the linear component; εi is the random error component.


Simple Linear Regression Model (continued)

Yi = β0 + β1 Xi + εi

(Diagram: for a given Xi, the observed value of Y differs from the predicted value on the population regression line by the random error εi; the line has intercept β0 and slope β1.)


Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1 Xi

where Ŷi is the estimated (or predicted) Y value for observation i, Xi is the value of X for observation i, b0 is the estimate of the regression intercept, and b1 is the estimate of the regression slope.

The individual random error terms ei have a mean of zero.


    Least Squares Method

b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:

min Σ(Yi − Ŷi)² = min Σ(Yi − (b0 + b1 Xi))²
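As an illustration, a minimal Python/NumPy sketch of the closed-form least-squares estimates (the function and variable names are illustrative, not part of the original slides):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y ~ b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1
```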


Finding the Least Squares Equation

The coefficients b0 and b1, and other regression results, will be found using Excel, SPSS, or STATA


Interpretation of the Slope and the Intercept

b0 is the estimated average value of Y when the value of X is zero

b1 is the estimated change in the average value of Y as a result of a one-unit change in X


Simple Linear Regression Example

A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)

A random sample of 10 houses is selected

Dependent variable (Y) = house price in $1000s

Independent variable (X) = square feet


Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700


Graphical Presentation

House price model: scatter plot (house price in $1000s vs. square feet)


Regression Using Excel: Tools / Data Analysis / Regression


Excel Output

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)
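For readers working outside Excel, a minimal Python sketch (assuming SciPy is installed) that should reproduce the key figures above from the sample data:

```python
import numpy as np
from scipy import stats

# House price model data from the slides (price in $1000s, size in square feet)
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

result = stats.linregress(sqft, price)
print(f"b0 (intercept)  = {result.intercept:.5f}")   # ~98.24833
print(f"b1 (slope)      = {result.slope:.5f}")       # ~0.10977
print(f"r               = {result.rvalue:.5f}")      # ~0.76211
print(f"r-squared       = {result.rvalue ** 2:.5f}") # ~0.58082
print(f"p-value (slope) = {result.pvalue:.5f}")      # ~0.01039
```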


Graphical Presentation

House price model: scatter plot and regression line (house price in $1000s vs. square feet)

house price = 98.24833 + 0.10977 (square feet)

Slope = 0.10977, Intercept = 98.248


Interpretation of the Intercept, b0

house price = 98.24833 + 0.10977 (square feet)

b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)

Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet


Predictions using Regression Analysis

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850


Interpolation vs. Extrapolation

When using a regression model for prediction, only predict within the relevant range of data

(Scatter plot: the relevant range for interpolation is the range of observed X values; do not try to extrapolate beyond the range of observed Xs)


    Measures of Variation

Total variation is made up of two parts:

SST = SSR + SSE

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(Yi − Ȳ)²
SSR = Σ(Ŷi − Ȳ)²
SSE = Σ(Yi − Ŷi)²

where:
Ȳ  = average value of the dependent variable
Yi = observed values of the dependent variable
Ŷi = predicted value of Y for the given Xi value


Measures of Variation

SST = total sum of squares
Measures the variation of the Yi values around their mean Ȳ

SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y

SSE = error sum of squares
Variation attributable to factors other than the relationship between X and Y


Measures of Variation (continued)

(Diagram: at each observed Xi, SST = Σ(Yi − Ȳ)² is split into SSE = Σ(Yi − Ŷi)², the distance from the point to the regression line, and SSR = Σ(Ŷi − Ȳ)², the distance from the line to Ȳ.)


Coefficient of Determination, r²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called r-squared and is denoted as r²

r² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ r² ≤ 1
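A short NumPy sketch that partitions the variation and checks r² = SSR/SST (assuming the `price`, `sqft` arrays and the `least_squares_fit` helper from the earlier sketches):

```python
b0, b1 = least_squares_fit(sqft, price)     # ~98.248, ~0.10977
y_hat = b0 + b1 * sqft                      # predicted values

sst = np.sum((price - price.mean()) ** 2)   # total sum of squares      (~32600.5)
ssr = np.sum((y_hat - price.mean()) ** 2)   # regression sum of squares (~18934.9)
sse = np.sum((price - y_hat) ** 2)          # error sum of squares      (~13665.6)

print(f"r^2 = SSR/SST = {ssr / sst:.5f}")   # ~0.58082
```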


Examples of Approximate r² Values

r² = 1

(Scatter plots: all points lie exactly on the line)

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X


Examples of Approximate r² Values

0 < r² < 1

(Scatter plots: points scattered around the line)

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X


Examples of Approximate r² Values

r² = 0

(Scatter plot: no pattern)

No linear relationship between X and Y: the value of Y does not depend on X. (None of the variation in Y is explained by variation in X)


Excel Output (coefficient table as shown earlier)

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

r² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet


Standard Error of Estimate

The standard deviation of the variation of observations around the regression line is estimated by

S_YX = sqrt( SSE / (n − 2) ) = sqrt( Σ(Yi − Ŷi)² / (n − 2) )

where
SSE = error sum of squares
n = sample size
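Continuing the NumPy sketch (same arrays and fitted values assumed), the standard error of the estimate:

```python
n = len(price)
s_yx = np.sqrt(np.sum((price - y_hat) ** 2) / (n - 2))   # sqrt(SSE / (n - 2))
print(f"S_YX = {s_yx:.5f}")                               # ~41.33032
```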


Excel Output (ANOVA and coefficient table as shown earlier)

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

S_YX = 41.33032


Comparing Standard Errors

(Scatter plots: small S_YX vs. large S_YX)

S_YX is a measure of the variation of observed Y values from the regression line

The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data

i.e., S_YX = $41.33K is moderately small relative to house prices in the $200K to $300K range


Assumptions of Regression

Use the acronym LINE:

Linearity
The underlying relationship between X and Y is linear

Independence of Errors
Error values are statistically independent

Normality of Error
Error values (ε) are normally distributed for any given value of X

Equal Variance (Homoscedasticity)
The probability distribution of the errors has constant variance


Residual Analysis

The residual for observation i, ei, is the difference between its observed and predicted value:

ei = Yi − Ŷi

Check the assumptions of regression by examining the residuals:
Examine for linearity
Evaluate independence
Evaluate the normal distribution assumption
Examine for constant variance for all levels of X (homoscedasticity)

Graphical analysis of residuals: can plot residuals vs. X (see the sketch below)
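A minimal Matplotlib/NumPy sketch of a residuals-vs-X plot for the house-price model (assuming the arrays and fitted values from the earlier sketches; plotting details are illustrative):

```python
import matplotlib.pyplot as plt

residuals = price - y_hat        # e_i = Y_i - Yhat_i

plt.scatter(sqft, residuals)
plt.axhline(0, linestyle="--")   # reference line at zero
plt.xlabel("Square Feet")
plt.ylabel("Residuals")
plt.title("House Price Model Residual Plot")
plt.show()
```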


Residual Analysis for Linearity

(Residual plots vs. X: a curved pattern in the residuals indicates the relationship is not linear; residuals scattered randomly around zero are consistent with linearity)


Residual Analysis for Independence

(Residual plots vs. X: a systematic pattern in the residuals suggests the errors are not independent; a patternless scatter is consistent with independence)


Residual Analysis for Normality

A normal probability plot of the residuals can be used to check for normality

(Normal probability plot: percent vs. residual)


Residual Analysis for Equal Variance

(Residual plots vs. X: a spread that widens or narrows with X indicates non-constant variance; a uniform spread around zero indicates constant variance)


Excel Residual Output

(Residual plot: residuals vs. square feet for the house price model)

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.92316
2             273.87671               38.12329
3             284.85348               -5.85348
4             304.06284               3.93716
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348

Does not appear to violate any regression assumptions


Measuring Autocorrelation: The Durbin-Watson Statistic

Used when data are collected over time to detect if autocorrelation is present

Autocorrelation exists if residuals in one time period are related to residuals in another period


Autocorrelation

Autocorrelation is correlation of the errors (residuals) over time

This violates the regression assumption that residuals are random and independent

(Time-series residual plot: here, the residuals show a cyclic pattern, not random. Cyclical patterns are a sign of positive autocorrelation)


The Durbin-Watson Statistic

The Durbin-Watson statistic is used to test for autocorrelation:

H0: residuals are not correlated
H1: positive autocorrelation is present

D = Σ(i=2..n) (ei − ei−1)² / Σ(i=1..n) ei²

The possible range is 0 ≤ D ≤ 4

D should be close to 2 if H0 is true

D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation
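A small NumPy sketch of the Durbin-Watson statistic (the same quantity is also available as statsmodels.stats.stattools.durbin_watson; the function name below is illustrative):

```python
import numpy as np

def durbin_watson(residuals):
    """D = sum of squared successive differences of residuals / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Values near 2 are consistent with no autocorrelation;
# values well below 2 suggest positive autocorrelation.
```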



Testing for Positive Autocorrelation

H0: positive autocorrelation does not exist
H1: positive autocorrelation is present

Calculate the Durbin-Watson test statistic D (the Durbin-Watson statistic can be found using Excel or Minitab)

Find the values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k)

Decision rule: reject H0 if D < dL

(Decision regions: 0 to dL: reject H0; dL to dU: inconclusive; above dU: do not reject H0)


Testing for Positive Autocorrelation (continued)

Suppose we have the following time series data (sales over time):

(Scatter plot of sales vs. time with fitted trend line: y = 30.65 + 4.7038x, R² = 0.8976)

Is there autocorrelation?


Testing for Positive Autocorrelation (continued)

Example with n = 25. Excel/PHStat output:

Durbin-Watson Calculations
Sum of Squared Difference of Residuals   3296.18
Sum of Squared Residuals                 3279.98
Durbin-Watson Statistic                  1.00494

(Trend line as before: y = 30.65 + 4.7038x, R² = 0.8976)

D = Σ(ei − ei−1)² / Σ ei² = 3296.18 / 3279.98 = 1.00494



Testing for Positive Autocorrelation (continued)

Here, n = 25 and there is k = 1 independent variable

Using the Durbin-Watson table, dL = 1.29 and dU = 1.45

Decision: reject H0 since D = 1.00494 < dL = 1.29, and conclude that significant positive autocorrelation exists

(Decision regions: 0 to dL = 1.29: reject H0; 1.29 to dU = 1.45: inconclusive; above 1.45: do not reject H0)

Therefore the linear model is not the appropriate model to forecast sales


    Inferences About the Slope

The standard error of the regression slope coefficient (b1) is estimated by

S_b1 = S_YX / sqrt(SSX) = S_YX / sqrt( Σ(Xi − X̄)² )

where:
S_b1 = estimate of the standard error of the least squares slope
S_YX = sqrt( SSE / (n − 2) ) = standard error of the estimate


Excel Output (Regression Statistics and ANOVA as shown earlier)

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

S_b1 = 0.03297



Comparing Standard Errors of the Slope

(Scatter plots: small S_b1 vs. large S_b1)

S_b1 is a measure of the variation in the slope of regression lines from different possible samples


Inference about the Slope: t Test

t test for a population slope: is there a linear relationship between X and Y?

Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)

Test statistic:

t = (b1 − β1) / S_b1,  with d.f. = n − 2

where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope
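A short Python sketch of the slope t test for the house-price model (assuming SciPy and the quantities b1, s_yx, price, sqft from the earlier sketches; under H0 the hypothesized slope is 0):

```python
from scipy import stats

n = len(price)
s_b1 = s_yx / np.sqrt(np.sum((sqft - sqft.mean()) ** 2))   # ~0.03297

t_stat = (b1 - 0) / s_b1                           # ~3.329
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-tailed p-value (~0.01039)
print(t_stat, p_value)
```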



Inference about the Slope: t Test (continued)

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Simple linear regression equation:

house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098

Does square footage of the house affect its sales price?


Inferences about the Slope: t Test Example

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

t = (b1 − β1) / S_b1 = (0.10977 − 0) / 0.03297 = 3.32938


Inferences about the Slope: t Test Example (continued)

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

P-value = P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)

Decision: P-value < α, so reject H0

There is sufficient evidence that square footage affects house price


Excel Output

(Regression Statistics and coefficient table as shown earlier)

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848

With 1 and 8 degrees of freedom; Significance F is the P-value for the F test
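A tiny SciPy check of the F statistic and its P-value (assuming the ANOVA sums of squares above; note that for simple regression F equals the square of the slope t statistic):

```python
from scipy import stats

msr = 18934.9348 / 1    # regression mean square (SSR / df_regression)
mse = 13665.5652 / 8    # error mean square      (SSE / df_residual)
f_stat = msr / mse                            # ~11.0848
p_value = stats.f.sf(f_stat, dfn=1, dfd=8)    # ~0.01039
print(f_stat, p_value)
```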


F Test for Significance (continued)

H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8

Critical value: F.05 = 5.32

Test statistic: F = MSR / MSE = 11.08

Decision: reject H0 at α = 0.05, since F = 11.08 > 5.32

Conclusion: there is sufficient evidence that house size affects selling price


Confidence Interval Estimate for the Slope

Confidence interval estimate of the slope:

b1 ± t(n−2) S_b1,  with d.f. = n − 2

Excel printout for house prices:

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
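A short sketch of the 95% confidence interval for the slope (assuming SciPy and the n, b1, s_b1 values from the earlier sketches):

```python
from scipy import stats

t_crit = stats.t.ppf(0.975, df=n - 2)                        # ~2.306 for 8 d.f.
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")   # ~(0.0337, 0.1858)
```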


t Test for a Correlation Coefficient

Hypotheses:
H0: ρ = 0 (no correlation between X and Y)
HA: ρ ≠ 0 (correlation exists)

Test statistic (with n − 2 degrees of freedom):

t = r / sqrt( (1 − r²) / (n − 2) )

where r = +sqrt(r²) if b1 > 0, and r = −sqrt(r²) if b1 < 0
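A minimal sketch of the correlation t test for the house-price example (assuming NumPy/SciPy, with n = 10 and r ≈ 0.7621 from the earlier output):

```python
import numpy as np
from scipy import stats

r, n = 0.7621, 10
t_stat = r / np.sqrt((1 - r ** 2) / (n - 2))       # ~3.329
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-tailed (~0.0104)
print(t_stat, p_value)
```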


Example: House Prices

Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?

H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)

α = .05, d.f. = 10 − 2 = 8

t = r / sqrt( (1 − r²) / (n − 2) ) = 0.762 / sqrt( (1 − 0.762²) / (10 − 2) ) = 3.329


Example: Test Solution

t = r / sqrt( (1 − r²) / (n − 2) ) = 0.762 / sqrt( (1 − 0.762²) / (10 − 2) ) = 3.329

d.f. = 10 − 2 = 8, α/2 = .025, critical values ±t(α/2) = ±2.3060

Decision: reject H0, since t = 3.329 falls in the rejection region beyond 2.3060

Conclusion: there is evidence of a linear association at the 5% level of significance


Estimating Mean Values and Predicting Individual Values

Goal: form intervals around Ŷ to express uncertainty about the value of Y for a given Xi

(Diagram: at Xi on the line Ŷ = b0 + b1 Xi, a confidence interval for the mean of Y given Xi, and a wider prediction interval for an individual Y given Xi)


Confidence Interval for the Average Y, Given X

Confidence interval estimate for the mean value of Y given a particular Xi:

Ŷ ± t(n−2) S_YX sqrt(hi)

where hi = 1/n + (Xi − X̄)² / SSX = 1/n + (Xi − X̄)² / Σ(Xi − X̄)²

The size of the interval varies according to the distance of Xi from the mean, X̄


Prediction Interval for an Individual Y, Given X

Interval estimate for an individual value of Y given a particular Xi:

Ŷ ± t(n−2) S_YX sqrt(1 + hi)

The extra term under the square root adds to the interval width to reflect the added uncertainty for an individual case


Estimation of Mean Values: Example

Find the 95% confidence interval for the mean price of 2,000-square-foot houses

Predicted price Ŷi = 317.85 ($1,000s)

Confidence interval estimate for the mean of Y given X = Xi:

Ŷ ± t(n−2) S_YX sqrt( 1/n + (Xi − X̄)² / Σ(Xi − X̄)² ) = 317.85 ± 37.12

The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900


Estimation of Individual Values: Example

Find the 95% prediction interval for an individual house with 2,000 square feet

Predicted price Ŷi = 317.85 ($1,000s)

Prediction interval estimate for an individual Y given X = Xi:

Ŷ ± t(n−2) S_YX sqrt( 1 + 1/n + (Xi − X̄)² / Σ(Xi − X̄)² ) = 317.85 ± 102.28

The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070
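A Python sketch that reproduces both intervals (assuming SciPy and the n, b0, b1, s_yx, sqft quantities from the earlier sketches; x_new is the X value of interest):

```python
from scipy import stats

x_new = 2000.0
y_hat_new = b0 + b1 * x_new     # point prediction (~317.8, in $1000s)
h = 1 / n + (x_new - sqft.mean()) ** 2 / np.sum((sqft - sqft.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

ci_half = t_crit * s_yx * np.sqrt(h)       # half-width of CI for the mean      (~37.1)
pi_half = t_crit * s_yx * np.sqrt(1 + h)   # half-width of PI for an individual (~102.3)

print(f"95% CI for mean Y at X=2000:       ({y_hat_new - ci_half:.2f}, {y_hat_new + ci_half:.2f})")
print(f"95% PI for individual Y at X=2000: ({y_hat_new - pi_half:.2f}, {y_hat_new + pi_half:.2f})")
```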


Finding Confidence and Prediction Intervals in Excel

In Excel, use PHStat | Regression | Simple Linear Regression

Check the "confidence and prediction interval for X =" box and enter the X value and confidence level desired


Finding Confidence and Prediction Intervals in Excel (continued)

(PHStat output: input values, the confidence interval estimate for the mean of Y given X = Xi, and the prediction interval estimate for an individual Y given X = Xi)


Pitfalls of Regression Analysis

Lacking an awareness of the assumptions underlying least-squares regression

Not knowing how to evaluate the assumptions

Not knowing the alternatives to least-squares regression if a particular assumption is violated

Using a regression model without knowledge of the subject matter

Extrapolating outside the relevant range


Strategies for Avoiding the Pitfalls of Regression

Start with a scatter diagram of X vs. Y to observe possible relationships

Perform residual analysis to check the assumptions

Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity

Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality


Chapter Summary (continued)

Described inference about the slope

Discussed correlation: measuring the strength of the association

Addressed estimation of mean values and prediction of individual values

Discussed possible pitfalls in regression and recommended strategies to avoid them


    Thank You