Business Statistics: A First Course (3rd Edition)
Chapter 11: Multiple Regression
2003 Prentice-Hall, Inc.
Slide 2
Chapter Topics
The Multiple Regression Model
Residual Analysis
Testing for the Significance of the Regression Model
Inferences on the Population Regression Coefficients
Testing Portions of the Multiple Regression Model
Slide 3
Chapter Topics (continued)
The Quadratic Regression Model
Dummy Variables
Using Transformations in Regression Models
Collinearity
Model Building
Pitfalls in Multiple Regression and Ethical Issues
Slide 4
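The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
$Y_i$: dependent (response) variable
$X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
$\beta_0$: population Y-intercept
$\beta_1, \ldots, \beta_k$: population slopes
$\varepsilon_i$: random error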
Slide 5
Multiple Regression Model
Bivariate model
Slide 6
Multiple Regression Equation
Bivariate model
Slide 7
Multiple Regression Equation
Too complicated by hand! Ouch!
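
To see why software is used instead of hand calculation, the least-squares estimates can be sketched in a few lines. This is a minimal illustration, not the method PHStat or Excel uses internally: the function name `fit_least_squares` and the data are made up for this example, and the data are deliberately noise-free so the fit recovers the generating coefficients.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up, noise-free data generated from Y = 10 - 2*X1 + 3*X2.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [10 - 2 * x1 + 3 * x2 for x1, x2 in rows]
b0, b1, b2 = fit_least_squares(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

With real, noisy data the same function returns the estimated intercept and slopes; a statistics package additionally reports standard errors and test statistics.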
Slide 8
Interpretation of Estimated Coefficients
Slope ($b_i$)
  The average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus).
  Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
Y-intercept ($b_0$)
  The estimated average value of Y when all $X_i = 0$.
Slide 9
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
Slide 10
Multiple Regression Equation: Example
Excel Output
For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
Slide 11
Multiple Regression in PHStat
PHStat | Regression | Multiple Regression
Excel spreadsheet for the heating oil example.
Slide 12
Simple and Multiple Regression Compared
Coefficients in a simple regression pick up the impact of that variable on the dependent variable, plus the impacts of other variables that are correlated with it.
Coefficients in a multiple regression account for the impacts of the other variables in the equation.
Slide 13
Simple and Multiple Regression Compared: Example
Two simple regressions:
Multiple regression:
Slide 14
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: overlapping circles for Oil and Temp]
Overlap: variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil
Oil only: variations in Oil explained by the error term
Temp only: variations in Temp not used in explaining variation in Oil
Slide 15
Venn Diagrams and Explanatory Power of Regression (continued)
[Venn diagram: Oil and Temp]
Slide 16
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: Oil, Temp and Insulation]
Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$
Remaining part of Oil: variation NOT explained by Temp nor Insulation
Slide 17
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: Oil, Temp and Insulation]
Slide 18
Coefficient of Multiple Determination
Proportion of total variation in Y explained by all X variables taken together: $r^2 = SSR / SST$
Never decreases when a new X variable is added to the model
  This is a disadvantage when comparing models
Slide 19
Adjusted Coefficient of Multiple Determination
Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used and the sample size
Penalizes excessive use of independent variables
Smaller than $r^2$
Useful in comparing among models
Slide 20
Coefficient of Multiple Determination
Excel Output
Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.
Slide 21
Interpretation of Coefficient of Multiple Determination
96.56% of the total variation in heating oil can be explained by temperature and amount of insulation.
95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size.
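
The adjustment can be checked with a short calculation. The sketch below uses the standard formula $r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-k-1}$; the function name is illustrative, and the sample size n = 15 is an inference from the 12 error degrees of freedom reported in the chapter's tests (n - k - 1 = 12 with k = 2), not a value stated on this slide.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.9656   # reported coefficient of multiple determination
k = 2         # explanatory variables: temperature, insulation
n = 15        # assumption: inferred from df = n - k - 1 = 12

adj = adjusted_r2(r2, n, k)
print(round(adj, 4))   # 0.9599
```

The result agrees with the 95.99% reported above, and it is always below the unadjusted value whenever k is at least 1 and $r^2 < 1$.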
Slide 22
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
The predicted heating oil used is 278.97 gallons.
Slide 23
Predictions in PHStat
PHStat | Regression | Multiple Regression
Check the Confidence and Prediction Interval Estimate box
Excel spreadsheet for the heating oil example.
Slide 24
Residual Plots
Residuals vs. $\hat{Y}$: may need to transform the Y variable
Residuals vs. $X_1$: may need to transform the $X_1$ variable
Residuals vs. $X_2$: may need to transform the $X_2$ variable
Residuals vs. time: may have autocorrelation
Slide 25
Residual Plots: Example
[Two residual plots: one with no discernible pattern, one with maybe some non-linear relationship]
Slide 26
Testing for Overall Significance
Shows if there is a linear relationship between all of the X variables together and Y
Use the F test statistic
Hypotheses:
  $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
  $H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)
The null hypothesis is a very strong statement
The null hypothesis is almost always rejected
Slide 27
Testing for Overall Significance (continued)
Test statistic:
$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$
where F has k numerator and (n - k - 1) denominator degrees of freedom
Slide 28
Test for Overall Significance
Excel Output: Example
[Excel ANOVA output with callouts: k = 2, the number of explanatory variables (regression df); n - 1 (total df); p-value]
Slide 29
Test for Overall Significance: Example Solution
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one $\beta_i \neq 0$
$\alpha = .05$
df = 2 and 12
Critical value: F = 3.89
Test statistic: F = 168.47 (Excel output)
Decision: Reject $H_0$ at $\alpha = 0.05$
Conclusion: There is evidence that at least one independent variable affects Y
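
As a rough cross-check, the overall F statistic can be recomputed from the rounded $r^2$, since $SSR = r^2 \cdot SST$ and $SSE = (1 - r^2) \cdot SST$ give $F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)}$. The sketch below assumes n = 15 (from df = 2 and 12) and uses the rounded value 0.9656, so it lands slightly below the Excel figure of 168.47, which is computed from unrounded sums of squares.

```python
def overall_f(r2, n, k):
    """Overall F statistic expressed through r-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

k = 2            # numerator degrees of freedom
n = 15           # assumption: n - k - 1 = 12 denominator degrees of freedom
f_crit = 3.89    # critical value at alpha = .05 with df = 2 and 12

f_stat = overall_f(0.9656, n, k)
reject_h0 = f_stat > f_crit
print(round(f_stat, 1), reject_h0)
```

Either way the statistic is far beyond the critical value, so the decision to reject $H_0$ does not depend on the rounding.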
Slide 30
Test for Significance: Individual Variables
Shows if there is a linear relationship between the variable $X_i$ and Y
Use the t test statistic
Hypotheses:
  $H_0$: $\beta_i = 0$ (no linear relationship)
  $H_1$: $\beta_i \neq 0$ (linear relationship between $X_i$ and Y)
Slide 31
t Test Statistic
Excel Output: Example
[Excel output with callouts: t test statistic for $X_1$ (temperature); t test statistic for $X_2$ (insulation)]
Slide 32
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
df = 12
Critical values: $\pm 2.1788$ (0.025 in each tail)
Test statistic: t test statistic = 16.1699
Decision: Reject $H_0$ at $\alpha = 0.05$
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
Slide 33
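Venn Diagrams and Estimation of Regression Model
[Venn diagram: Oil, Temp and Insulation]
Non-overlapping region shared with Oil: only this information is used in the estimation of that variable's slope
Region where Temp and Insulation overlap: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$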
Slide 34
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$-6.169 \le \beta_1 \le -4.704$
The estimated average consumption of oil is reduced by between 4.7 and 6.17 gallons for each 1 °F increase in temperature.
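
The interval follows the usual form $b_1 \pm t_{n-k-1} S_{b_1}$. The sketch below rebuilds it from figures quoted elsewhere in this chapter; the standard error is not shown in the transcript, so it is backed out here from the slope and the magnitude of its t statistic, which is an assumption of this example rather than a value read from the Excel output.

```python
b1 = -5.437        # estimated slope for temperature
t_stat = 16.1699   # magnitude of the t statistic for temperature
t_crit = 2.1788    # critical t value, df = 12, 0.025 in each tail

# Assumption: standard error recovered as |b1| / |t|.
se_b1 = abs(b1) / t_stat
half_width = t_crit * se_b1

lo = b1 - half_width
hi = b1 + half_width
print(round(lo, 2), round(hi, 2))   # -6.17 -4.7
```

The endpoints match the reported interval to two decimals; small differences in the third decimal come from rounding in the inputs.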
Slide 35
Contribution of a Single Independent Variable
Let $X_k$ be the independent variable of interest
$SSR(X_k \mid \text{all others except } X_k) = SSR(\text{all}) - SSR(\text{all others except } X_k)$
Measures the contribution of $X_k$ in explaining the total variation in Y
Slide 36
Contribution of a Single Independent Variable
Measures the contribution of the variable of interest in explaining Y
Each sum of squares comes from the ANOVA section of the corresponding regression
Slide 37
Coefficient of Partial Determination of $X_k$
Measures the proportion of variation in the dependent variable that is explained by $X_k$ while controlling for (holding constant) the other independent variables
Slide 38
Coefficient of Partial Determination for $X_k$ (continued)
Example: Model with two independent variables
$r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$
Slide 39
Venn Diagrams and Coefficient of Partial Determination
[Venn diagram: Oil, Temp and Insulation]
Slide 40
Coefficient of Partial Determination in PHStat
PHStat | Regression | Multiple Regression
Check the Coefficient of Partial Determination box
Excel spreadsheet for the heating oil example
Slide 41
Contribution of a Subset of Independent Variables
Let $X_s$ be the subset of independent variables of interest
$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
Measures the contribution of the subset $X_s$ in explaining SST
Slide 42
Contribution of a Subset of Independent Variables: Example
Let $X_s$ be $X_1$ and $X_3$
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$
Each term comes from the ANOVA section of the corresponding regression
Slide 43
Testing Portions of Model
Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y
Null hypothesis: Variables in the subset do not significantly improve the model when all other variables are included
Alternative hypothesis: At least one variable is significant
Slide 44
Testing Portions of Model (continued)
One-tailed rejection region
Requires comparison of two regressions
  One regression includes everything
  Another regression includes everything except the portion to be tested
Slide 45
Partial F Test for the Contribution of a Subset of X Variables
Hypotheses:
  $H_0$: Variables $X_s$ do not significantly improve the model given all other variables included
  $H_1$: Variables $X_s$ significantly improve the model given all others included
Test statistic:
$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$
with df = m and (n - k - 1), where m = number of variables in the subset $X_s$
Slide 46
Partial F Test for the Contribution of a Single $X_j$
Hypotheses:
  $H_0$: Variable $X_j$ does not significantly improve the model given all others included
  $H_1$: Variable $X_j$ significantly improves the model given all others included
Test statistic:
$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
with df = 1 and (n - k - 1); m = 1 here
Slide 47
Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model given that insulation is included.
Slide 48
Testing Portions of Model: Example
$H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included
$H_1$: $X_1$ does improve the model
$\alpha = .05$, df = 1 and 12
Critical value = 4.75
[ANOVA output for the model with $X_1$ and $X_2$, and for the model with $X_2$ only]
Conclusion: Reject $H_0$; $X_1$ does improve the model
Slide 49
Testing Portions of Model in PHStat
PHStat | Regression | Multiple Regression
Check the Coefficient of Partial Determination box
Excel spreadsheet for the heating oil example.
Slide 50
Do We Need to Do This for One Variable?
The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable
The only reason to do an F test is to test several variables together
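
The equivalence rests on the identity $F_{1,\,n-k-1} = t_{n-k-1}^2$, which holds for both the test statistic and the critical value. A quick numeric check with the heating oil figures quoted in this chapter (the variable names are illustrative):

```python
t_stat = 16.1699   # magnitude of the t statistic for temperature
t_crit = 2.1788    # two-tailed critical t value, df = 12, alpha = .05
f_crit = 4.75      # critical F value, df = 1 and 12, alpha = .05

# Squaring the t statistic gives the partial F statistic for one variable.
f_partial = t_stat ** 2

same_decision = (f_partial > f_crit) == (t_stat > t_crit)
print(round(f_partial, 2), round(t_crit ** 2, 2), same_decision)
```

The squared critical t value reproduces the critical F value of 4.75 up to rounding, so both tests reject or fail to reject together.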
Slide 51
The Quadratic Regression Model
The relationship between the response variable and the explanatory variable is a quadratic polynomial function
Useful when the scatter diagram indicates a non-linear relationship
Quadratic model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
The second explanatory variable is the square of the first variable
Slide 52
Quadratic Regression Model (continued)
A quadratic model may be considered when a scatter diagram takes on the following shapes:
[Four scatter-diagram sketches of Y against $X_1$, labeled $\beta_2 > 0$ or $\beta_2 < 0$]
$\beta_2$ = the coefficient of the quadratic term
Slide 53
Testing for Significance: Quadratic Model
Testing for overall relationship
  Similar to the test for the linear model
  F test statistic = $MSR / MSE$
Testing the quadratic effect
  Compare the quadratic model with the linear model
  Hypotheses:
    $H_0$: $\beta_2 = 0$ (no quadratic term)
    $H_1$: $\beta_2 \neq 0$ (quadratic term is needed)
Slide 54
Heating Oil Example
Determine if a quadratic model is needed for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
Slide 55
Heating Oil Example: Residual Analysis (continued)
[Two residual plots: one with no discernible pattern, one with a possible non-linear relationship]
Slide 56
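Heating Oil Example: t Test for Quadratic Model (continued)
Testing the quadratic effect
  Model with quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$
  Model without quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  Hypotheses:
    $H_0$: $\beta_3 = 0$ (no quadratic term in insulation)
    $H_1$: $\beta_3 \neq 0$ (quadratic term is needed in insulation)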
Slide 57
Example Solution
Is a quadratic term in insulation needed to explain monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0$: $\beta_3 = 0$
$H_1$: $\beta_3 \neq 0$
df = 11
Critical values: $\pm 2.2010$ (0.025 in each tail)
Test statistic: 1.6611
Decision: Do not reject $H_0$ at $\alpha = 0.05$
Conclusion: There is not sufficient evidence for the need to include a quadratic effect of insulation on oil consumption.
Slide 58
Example Solution in PHStat
PHStat | Regression | Multiple Regression
Excel spreadsheet for the heating oil example.
Slide 59
Dummy Variable Models
Categorical explanatory variable with 2 or more levels: yes or no, on or off, male or female
Use dummy variables (coded as 0 or 1)
Only intercepts are different
Assumes equal slopes across categories
The number of dummy variables needed is (number of levels - 1)
Regression model has the same form:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
Slide 60
Dummy-Variable Models (with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
  Y = assessed value of house
  $X_1$ = square footage of house
  $X_2$ = desirability of neighborhood (0 if undesirable, 1 if desirable)
Desirable ($X_2 = 1$): $\hat{Y}_i = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i}$
Same slopes
Slide 61
Dummy-Variable Models (with 2 Levels) (continued)
[Plot of Y (assessed value) against $X_1$ (square footage): two parallel lines, Desirable location with intercept $b_0 + b_2$ and Undesirable location with intercept $b_0$]
Same slopes
Intercepts different
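
The parallel-lines picture can be illustrated numerically. In the sketch below every coefficient is hypothetical (chosen as round numbers, not estimated from any data), and square footage is measured in hundreds of square feet purely to keep the arithmetic exact.

```python
# Hypothetical coefficients, for illustration only.
B0 = 50   # intercept for an undesirable neighborhood (thousand $)
B1 = 3    # slope: thousand $ per hundred square feet
B2 = 20   # shift in intercept for a desirable neighborhood (thousand $)

def assessed_value(sqft_hundreds, desirable):
    """Predicted assessed value; `desirable` is the 0/1 dummy variable."""
    return B0 + B1 * sqft_hundreds + B2 * desirable

# The gap between the two lines is B2 at every square footage.
gaps = [assessed_value(x, 1) - assessed_value(x, 0) for x in (10, 15, 20, 30)]
print(gaps)   # [20, 20, 20, 20]
```

Because the dummy variable enters only additively, it moves the intercept and leaves the slope on square footage unchanged.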
Slide 62
Interpretation of the Dummy Variable Coefficient (with 2 Levels)
Example:
  Y: annual salary of college graduate in thousand $
  $X_1$: GPA
  $X_2$: 0 for a non-business degree, 1 for a business degree
With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
Slide 63
Regression Model Containing an Interaction Term
Hypothesizes interaction between a pair of X variables
  Response to one X variable varies at different levels of another X variable
Contains a cross-product term
Can be combined with other models
  E.g., dummy variable model
Slide 64
Effect of Interaction
Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$
With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
The effect changes as $X_2$ changes
Slide 65
Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
When $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
When $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
The effect (slope) of $X_1$ on Y depends on the $X_2$ value
[Plot of Y against $X_1$ showing the two lines]
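
The same example can be written as two small functions, which makes the dependence of the slope on $X_2$ explicit. The function names are illustrative; the coefficients are the ones in the equation above.

```python
def y_value(x1, x2):
    """Y = 1 + 2*X1 + 3*X2 + 4*X1*X2 from the interaction example."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_of_x1(x2):
    """Change in Y per unit change in X1, at a fixed X2."""
    return 2 + 4 * x2

# A one-unit step in X1 changes Y by slope_of_x1(X2).
for x2 in (0, 1):
    step = y_value(1, x2) - y_value(0, x2)
    print(x2, slope_of_x1(x2), step)
```

At $X_2 = 0$ the slope is 2 and at $X_2 = 1$ it is 6, matching the two lines $1 + 2X_1$ and $4 + 6X_1$.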
Slide 66
Interaction Regression Model Worksheet
Multiply $X_1$ by $X_2$ to get $X_1X_2$
Run the regression with Y, $X_1$, $X_2$, $X_1X_2$

Case, i   Y_i   X_1i   X_2i   X_1i X_2i
   1       1     1      3        3
   2       4     8      5       40
   3       1     3      2        6
   4       3     5      6       30
   :       :     :      :        :
Slide 67
Interpretation when there are 3+ Levels
MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALEMARRIED = 1 if male married; 0 otherwise (= MALE times MARRIED)
MALEDIVORCED = 1 if male divorced; 0 otherwise (= MALE times DIVORCED)
Slide 68
Interpretation when there are 3+ Levels (continued)
Slide 69
Interpreting Results
FEMALE
  Single:
  Married:
  Divorced:
MALE
  Single:
  Married:
  Divorced:
Difference
Main effects: MALE, MARRIED and DIVORCED
Interaction effects: MALEMARRIED and MALEDIVORCED
Slide 70
Evaluating Presence of Interaction
Hypothesize interaction between a pair of independent variables
Contains a cross-product term
Hypotheses:
  $H_0$: $\beta_3 = 0$ (no interaction between $X_1$ and $X_2$)
  $H_1$: $\beta_3 \neq 0$ ($X_1$ interacts with $X_2$)
Slide 71
Using Transformations
Requires data transformation
Either or both independent and dependent variables may be transformed
Can be based on theory, logic or scatter diagrams
Slide 72
Inherently Non-Linear Models
Non-linear models that can be expressed in linear form
  Can be estimated by least squares in linear form
Require data transformation
Slide 73
Transformed Multiplicative Model (Log-Log)
[Plots of Y against $X_1$; similarly for $X_2$]
Slide 74
Square Root Transformation
[Plots of Y against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for $X_2$]
Transforms a non-linear model to one that appears linear.
Often used to overcome heteroscedasticity.
Slide 75
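Exponential Transformation (Log-Linear)
Original model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
[Plots of Y against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$]
Transformed into: $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$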
Slide 76
Interpretation of Coefficients
Transformed exponential model (the dependent variable is logged)
The coefficient of the independent variable $X_1$ can be approximately interpreted as: a 1-unit change in $X_1$ leads to an estimated average change of $100 \cdot b_1$ percent in Y
Slide 77
Interpretation of Coefficients (continued)
Transformed multiplicative model (both dependent and independent variables are logged)
The coefficient of the independent variable $X_1$ can be approximately interpreted as: a 1 percent change in $X_1$ leads to an estimated average change of $b_1$ percent in Y. Therefore $b_1$ is the elasticity of Y with respect to a change in $X_1$.
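
A small numeric sketch can make the elasticity reading concrete. It assumes the usual one-variable multiplicative form $Y = \beta_0 X_1^{\beta_1}$, so that $\ln Y = \ln \beta_0 + \beta_1 \ln X_1$; the data are made up and noise-free, and `simple_ols` is an illustrative helper rather than anything from PHStat.

```python
import math

def simple_ols(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up multiplicative data: Y = 3 * X1 ** 1.5 (no error term).
x1 = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * v ** 1.5 for v in x1]

# Regress ln(Y) on ln(X1); the slope estimates the elasticity.
b0, b1 = simple_ols([math.log(v) for v in x1], [math.log(v) for v in y])
print(round(b1, 6), round(math.exp(b0), 6))
```

The fitted slope recovers the exponent 1.5: a 1 percent change in $X_1$ goes with roughly a 1.5 percent change in Y, and exponentiating the intercept recovers the multiplier 3.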
Slide 78
Collinearity (Multicollinearity)
High correlation between explanatory variables
Coefficient of multiple determination measures the combined effect of the correlated explanatory variables
Little or no new information provided
Leads to unstable coefficients (large standard error)
Slide 79
Venn Diagrams and Collinearity
[Venn diagram: Oil, Temp and Insulation, with a large overlap between Temp and Insulation]
The large overlap in variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating $\beta_1$ and $\beta_2$
The large overlap reflects collinearity between Temp and Insulation
Slide 80
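Detect Collinearity (Variance Inflationary Factor)
$VIF_j$ is used to measure collinearity:
$VIF_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination from regressing $X_j$ on the other explanatory variables
If $VIF_j > 5$, $X_j$ is highly correlated with the other explanatory variables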
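
The variance inflationary factor is conventionally computed as $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the remaining explanatory variables. The sketch below only evaluates that formula for a few hypothetical $R_j^2$ values to show where the rule-of-thumb threshold of 5 falls; none of these values come from the heating oil data.

```python
def vif(r2_j):
    """Variance inflationary factor for a given R_j squared (must be < 1)."""
    return 1.0 / (1.0 - r2_j)

THRESHOLD = 5.0   # rule of thumb used in this chapter

# Hypothetical R_j^2 values, for illustration only.
for r2_j in (0.0, 0.5, 0.8, 0.9):
    value = vif(r2_j)
    print(r2_j, round(value, 2), value > THRESHOLD)
```

An $R_j^2$ of 0.8 corresponds to a VIF of about 5, so the threshold flags a variable once roughly 80% of its variation is explained by the other explanatory variables.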
Slide 81
Detect Collinearity in PHStat
PHStat | Regression | Multiple Regression
Check the Variance Inflationary Factor (VIF) box
Excel spreadsheet for the heating oil example
  Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet
  No VIF is > 5
  There is no evidence of collinearity
Slide 82
Model Building
Goal is to develop a good model with the fewest explanatory variables
  Easier to interpret
  Lower probability of collinearity
Stepwise regression procedure
  Provides limited evaluation of alternative models
Best-subset approach
  Uses the adjusted $r^2$ or $C_p$ statistic
  Selects the model with the largest adjusted $r^2$, or a small $C_p$ near k + 1
Slide 83
Model Building Flowchart
Choose $X_1, X_2, \ldots, X_p$
Run regression to find VIFs
Any VIF > 5?
  Yes: More than one?
    Yes: Remove the variable with the highest VIF, then rerun the regression
    No: Remove this X, then rerun the regression
  No: Run subsets regression to obtain the best models in terms of $C_p$
Do complete analysis
Add curvilinear term and/or transform variables as indicated
Perform predictions
Slide 84
Additional Pitfalls and Ethical Issues
Fail to understand that the interpretation of the estimated regression coefficients is performed holding all other independent variables constant
Fail to evaluate residual plots for each independent variable
Fail to evaluate interaction terms
Slide 85
Additional Pitfalls and Ethical Issues (continued)
Fail to obtain the VIF for each independent variable and remove variables that exhibit a high collinearity with other independent variables before performing the significance test on each independent variable
Fail to examine several alternative models
Fail to use other methods when the assumptions necessary for least-squares regression have been seriously violated
Slide 86
Chapter Summary
Developed the multiple regression model
Discussed residual plots
Addressed testing the significance of the multiple regression model
Discussed inferences on population regression coefficients
Addressed testing portions of the multiple regression model
Slide 87
Chapter Summary (continued)
Described the quadratic regression model
Addressed dummy variables
Discussed using transformations in regression models
Described collinearity
Discussed model building
Addressed pitfalls in multiple regression and ethical issues