Chapter 12
Multiple Regression Analysis and Model Building
Chapter 12 - Chapter Outcomes
After studying the material in this chapter, you should be able to:
• Understand the general concepts behind model building using multiple regression analysis.
• Apply multiple regression analysis to business decision-making situations.
• Analyze the computer output for a multiple regression model and test the significance of the independent variables in the model.
Chapter 12 - Chapter Outcomes (continued)
After studying the material in this chapter, you should be able to:
• Recognize potential problems when using multiple regression analysis and take the steps to correct the problems.
• Incorporate qualitative variables into the regression model by using dummy variables.
Multiple Regression Analysis
SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL)
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where:
y = Value of the dependent variable
x = Value of the independent variable
$\beta_0$ = Population's y-intercept
$\beta_1$ = Slope of the population regression line
$\varepsilon$ = Error term, or residual
Multiple Regression Analysis
ESTIMATED SIMPLE LINEAR REGRESSION MODEL
$$\hat{y}_i = b_0 + b_1 x$$
where:
$b_0$ = Estimated y-intercept
$b_1$ = Estimated slope coefficient
Multiple Regression Analysis
A residual, or prediction error, is the difference between the actual value of y and the predicted value of y.
$$e = y - \hat{y}$$
Multiple Regression Analysis
The standard error of the estimate refers to the standard deviation of the model errors. The standard error measures the dispersion of the actual values of the dependent variable around the fitted regression plane.
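As a minimal sketch of the definitions so far, the snippet below fits the estimated simple linear regression by least squares, then computes the residuals e = y - ŷ and the standard error of the estimate. The data values and variable names are hypothetical and do not come from the chapter; the divisor n - k - 1 assumes k = 1 independent variable.

```python
import math

# Hypothetical sample data (not from the chapter): x = independent, y = dependent.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope b1 and the intercept b0.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted values and residuals e = y - y_hat.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Standard error of the estimate: sqrt(SSE / (n - k - 1)) with k = 1.
k = 1
sse = sum(e ** 2 for e in residuals)
s_e = math.sqrt(sse / (n - k - 1))

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, s_e = {s_e:.4f}")
```

With a least squares fit that includes an intercept, the residuals sum to zero (up to floating-point error), which is a quick sanity check on the arithmetic.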
Multiple Regression Analysis
MULTIPLE REGRESSION MODEL (POPULATION MODEL)
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where:
$\beta_0$ = Population's regression constant
$\beta_j$ = Population's regression coefficient for variable j; j = 1, 2, ..., k
k = Number of independent variables
$\varepsilon$ = Model error
Multiple Regression Analysis
ESTIMATED MULTIPLE REGRESSION MODEL
$$\hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
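The coefficients b0, b1, ..., bk of the estimated model are normally produced by statistical software. As an illustrative sketch only, the code below obtains them by solving the least squares normal equations in plain Python. The helper names `solve` and `fit_ols` and the data are hypothetical; the data were constructed so that y = 1 + 2x1 + 3x2 exactly, which makes the result easy to check.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical observations of (x1, x2), with y = 1 + 2*x1 + 3*x2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [9.0, 8.0, 19.0, 18.0, 29.0]

b = fit_ols(rows, y)
print([round(bj, 6) for bj in b])
```

Solving the normal equations directly is fine for a small teaching example; production software typically uses more numerically stable decompositions.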
Multiple Regression Analysis
A model is a representation of an actual system using either a physical or mathematical portrayal.
Model Specification
• Decide what you want to do and select the dependent variable.
• List the potential independent variables for your model.
• Gather the sample data (observations) for all variables.
Multiple Regression Analysis
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation coefficient, r, ranges between -1.0 and +1.0.
Multiple Regression Analysis
CORRELATION COEFFICIENT
One x variable with y
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
or
Multiple Regression Analysis
CORRELATION COEFFICIENT
One x variable with another x
$$r = \frac{\sum (x_i - \bar{x}_i)(x_j - \bar{x}_j)}{\sqrt{\sum (x_i - \bar{x}_i)^2 \sum (x_j - \bar{x}_j)^2}}$$
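The same formula serves both cases: one x variable with y, and one x variable with another x. The sketch below implements it as a hypothetical helper `correlation` and applies it to made-up data that are not from the chapter.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cross = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    ss_a = sum((ai - a_bar) ** 2 for ai in a)
    ss_b = sum((bi - b_bar) ** 2 for bi in b)
    return cross / math.sqrt(ss_a * ss_b)


# Hypothetical data: y is the dependent variable, x1 and x2 are independent variables.
y = [10.0, 12.0, 15.0, 19.0, 24.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [9.0, 7.0, 6.0, 3.0, 1.0]

r_x1_y = correlation(x1, y)    # one x variable with y
r_x1_x2 = correlation(x1, x2)  # one x variable with another x

print(f"r(x1, y)  = {r_x1_y:.3f}")
print(f"r(x1, x2) = {r_x1_x2:.3f}")
```

A strong correlation between x1 and y suggests a useful predictor, while a strong correlation between x1 and x2 (in either direction) is an early warning of multicollinearity.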
Multiple Regression Analysis
(Example 12-1)
Multiple Regression Model:
$$\hat{y} = 31{,}127.6 + 63.1(\text{Sq. feet}) - 1{,}144.4(\text{Age}) - 8{,}410.4(\text{Bedrooms}) + 3{,}522.0(\text{Bathrooms}) + 28{,}203.5(\text{Garage})$$
House Characteristics:
x1 = Square feet = 2,100
x2 = Age = 15
x3 = Number of bedrooms = 4
x4 = Number of baths = 3
x5 = Size of garage = 2
Point Estimate for Sale Price:
$$\hat{y} = 31{,}127.6 + 63.1(2{,}100) - 1{,}144.4(15) - 8{,}410.4(4) + 3{,}522.0(3) + 28{,}203.5(2)$$
$$\hat{y} = \$179{,}802.70$$
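The point estimate in Example 12-1 can be re-computed with a few lines of code. This is a sketch under one assumption: the signs of the Age and Bedrooms coefficients are taken as negative, which is consistent with the negative calculated t-values reported for those two variables and with the slide's point estimate. The dictionary keys and the function name are illustrative only.

```python
# Coefficients of the estimated model in Example 12-1
# (Age and Bedrooms assumed negative; see the note above).
intercept = 31127.6
coefficients = {
    "sq_feet": 63.1,
    "age": -1144.4,
    "bedrooms": -8410.4,
    "bathrooms": 3522.0,
    "garage": 28203.5,
}

# Characteristics of the house being priced.
house = {"sq_feet": 2100, "age": 15, "bedrooms": 4, "bathrooms": 3, "garage": 2}


def point_estimate(intercept, coefficients, values):
    """Evaluate y_hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


price = point_estimate(intercept, coefficients, house)
print(f"Estimated sale price: {price:,.2f}")
```

Using the rounded coefficients shown on the slide, the result is about 179,803, within a few tenths of the 179,802.70 on the slide; the small gap is presumably due to rounding of the displayed coefficients.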
Coefficient of Determination
MULTIPLE COEFFICIENT OF DETERMINATION
The percentage of variation in the dependent variable explained by the independent variables in the regression model:
$$R^2 = \frac{SSR}{TSS} = \frac{\text{Sum of squares regression}}{\text{Total sum of squares}}$$
Model Diagnosis
• Is the overall model significant?
• Are the individual variables significant?
• Is the standard deviation of the model error too large to provide meaningful results?
• Is multicollinearity a problem?
Is the Model Significant?
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_A: \text{At least one } \beta_i \text{ does not equal } 0$$
If the null hypothesis is true, the overall regression model is not useful for predictive purposes.
Is the Model Significant?
F-TEST STATISTIC
$$F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE}$$
where:
SSR = Sum of squares regression
SSE = Sum of squares error
n = Number of data points
k = Number of independent variables
Degrees of freedom = D1 = k and D2 = n - k - 1
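To make the arithmetic concrete, the sketch below computes R-squared and the F-test statistic from the sums of squares. The values of SSR, SSE, n, and k are hypothetical and are not taken from the chapter's figures.

```python
# Hypothetical sums of squares and sample dimensions (not from the chapter).
ssr = 800.0   # sum of squares regression
sse = 200.0   # sum of squares error
n = 25        # number of data points
k = 4         # number of independent variables

tss = ssr + sse                 # total sum of squares
r_squared = ssr / tss           # multiple coefficient of determination

msr = ssr / k                   # mean square regression, D1 = k
mse = sse / (n - k - 1)         # mean square error, D2 = n - k - 1
f_statistic = msr / mse

print(f"R-squared = {r_squared:.2f}")
print(f"F = {f_statistic:.2f} with D1 = {k} and D2 = {n - k - 1}")
```

The calculated F is then compared with the critical F-value for D1 and D2 degrees of freedom at the chosen significance level; a calculated value above the critical value leads to rejecting the null hypothesis that all slope coefficients are zero.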
Is the Model Significant?
ADJUSTED R-SQUARED
A measure of the percentage of explained variation in the dependent variable that takes into account the relationship between the number of cases and the number of independent variables in the regression model.
$$R\text{-sq(adj)} = R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$
where:
n = Number of data points
k = Number of independent variables
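A short sketch of the adjustment, using hypothetical values for R-squared, n, and k (the function name `adjusted_r_squared` is illustrative):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical model: R-squared of 0.80 from n = 25 observations.
few_variables = adjusted_r_squared(0.80, n=25, k=4)
many_variables = adjusted_r_squared(0.80, n=25, k=12)

print(f"k = 4:  adjusted R-squared = {few_variables:.2f}")
print(f"k = 12: adjusted R-squared = {many_variables:.2f}")
```

For the same R-squared and sample size, the adjusted value falls as more independent variables are used, which is the penalty the definition describes.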
Are the Individual Variables Significant?
$$H_0: \beta_i = 0$$
$$H_A: \beta_i \neq 0 \quad \text{for all } i$$
Are the Individual Variables Significant?
t-TEST FOR SIGNIFICANCE OF EACH REGRESSION COEFFICIENT
$$t = \frac{b_i - 0}{s_{b_i}}$$
where:
$b_i$ = Sample slope coefficient for the ith independent variable
$s_{b_i}$ = Estimate of the standard error for the ith sample slope coefficient
n - k - 1 = Degrees of freedom
Are the Individual Variables Significant?
(From Figure 12-7)
$$H_0: \beta_i = 0, \text{ given all other variables are already in the model}$$
$$H_A: \beta_i \neq 0, \text{ given all other variables are already in the model}$$
$\alpha = 0.02$
d.f. = n - k - 1 = 319 - 5 - 1 = 313
Critical values: $-t_{0.01} = -2.364$ and $t_{0.01} = 2.364$, with $\alpha/2 = 0.01$ in each tail of the t-distribution centered at 0.
Decision Rule: If $-2.364 \le t \le 2.364$, do not reject $H_0$. Otherwise, reject $H_0$.
Are the Individual Variables Significant?
(From Figure 12-7)
For $\beta_1$: Calculated t = 15.70. Since 15.70 > 2.364, reject $H_0$.
For $\beta_2$: Calculated t = -10.15. Since -10.15 < -2.364, reject $H_0$.
For $\beta_3$: Calculated t = -2.80. Since -2.80 < -2.364, reject $H_0$.
For $\beta_4$: Calculated t = 2.23. Since 2.23 < 2.364, do not reject $H_0$.
For $\beta_5$: Calculated t = 9.87. Since 9.87 > 2.364, reject $H_0$.
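The five comparisons from Figure 12-7 follow one rule, which can be sketched in code. The calculated t-values and the critical value 2.364 are the ones quoted on the slides; the function name `decide` is illustrative.

```python
# Critical value quoted from Figure 12-7 (alpha/2 = 0.01, d.f. = 313).
T_CRITICAL = 2.364

# Calculated t-values for the five regression coefficients, as quoted on the slide.
calculated_t = {1: 15.70, 2: -10.15, 3: -2.80, 4: 2.23, 5: 9.87}


def decide(t, t_critical):
    """Two-tailed decision rule: reject H0 when t falls outside +/- t_critical."""
    return "reject H0" if abs(t) > t_critical else "do not reject H0"


decisions = {i: decide(t, T_CRITICAL) for i, t in calculated_t.items()}
for i, decision in decisions.items():
    print(f"beta_{i}: t = {calculated_t[i]:6.2f} -> {decision}")
```

Only the fourth coefficient fails the test at this significance level, matching the slide.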
Is the Standard Deviation of the Regression Model Too Large?
ESTIMATE FOR THE STANDARD DEVIATION OF THE MODEL
$$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE}$$
where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables
Is Multicollinearity A Problem?
Multicollinearity refers to the situation when high correlation exists between two independent variables. This means the two variables contribute redundant information to the multiple regression model. When highly correlated independent variables are included in the regression model, they can adversely affect the regression results.
Some Indications of Severe Multicollinearity
• Incorrect signs on the coefficients.
• A sizable change in the values of the previous coefficients when a new variable is added to the model.
• A variable previously significant in the model becomes insignificant when a new independent variable is added.
• The estimate of the standard deviation of the model increases when a variable is added to the model.
Is Multicollinearity A Problem?
The variance inflation factor is a measure of how much the variance of an estimated regression coefficient increases if the independent variables are correlated. A VIF equal to one for a given independent variable indicates that this independent variable is not correlated with the remaining independent variables in the model. The greater the multicollinearity, the larger the VIF will be.
Is Multicollinearity A Problem?
VARIANCE INFLATION FACTOR
$$VIF = \frac{1}{1 - R_j^2}$$
where:
$R_j^2$ = Coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables.
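The VIF formula is simple enough to tabulate. The sketch below evaluates it for a few hypothetical values of the auxiliary R-squared; these are illustrative numbers, not results from the chapter.

```python
def variance_inflation_factor(r_squared_j):
    """VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other x variables."""
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical auxiliary R-squared values for three independent variables.
aux_r_squared = {"x1": 0.0, "x2": 0.8, "x3": 0.9}

vifs = {name: variance_inflation_factor(r2) for name, r2 in aux_r_squared.items()}
for name, vif in vifs.items():
    print(f"{name}: R_j^2 = {aux_r_squared[name]:.1f}, VIF = {vif:.1f}")
```

An R_j^2 of zero gives a VIF of 1 (no correlation with the other independent variables), and the VIF grows without bound as R_j^2 approaches 1.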
Multiple Regression Analysis
CONFIDENCE INTERVAL FOR THE REGRESSION COEFFICIENT
$$b_i \pm t_{\alpha/2} \, s_{b_i}$$
where:
$b_i$ = Point estimate for the regression coefficient $\beta_i$
$t_{\alpha/2}$ = Critical t-value for a $1 - \alpha$ confidence interval
$s_{b_i}$ = The standard error of the ith regression coefficient
Multiple Regression Analysis
(Example from Figure 12-9)
$$b_i \pm t_{\alpha/2} \, s_{b_i}$$
$$63.06 \pm 1.967(4.017)$$
$$63.06 \pm 7.90$$
$$\$55.16 \text{ to } \$70.97$$
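The interval from Figure 12-9 can be reproduced directly from the three quoted inputs. This is a sketch with an illustrative function name; only the numbers 63.06, 1.967, and 4.017 come from the slide.

```python
def coefficient_interval(b, t_critical, std_error):
    """Confidence interval b +/- t * s_b for a regression coefficient."""
    margin = t_critical * std_error
    return b - margin, b + margin


# Inputs quoted from the Figure 12-9 example.
lower, upper = coefficient_interval(b=63.06, t_critical=1.967, std_error=4.017)
print(f"{lower:.2f} to {upper:.2f}")
```

With these rounded inputs the upper limit computes to about 70.96 rather than the 70.97 shown on the slide; the difference is presumably because the slide's figures come from unrounded software output.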
Using Qualitative Independent Variables
A dummy variable is a variable that is assigned a value equal to 0 or 1 depending on whether the observation possesses a given characteristic or not.
Using Qualitative Independent Variables
(Example 12-2)
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
where:
y = Salary
$x_1$ = Age
Dummy Variable: $x_2$ = 1 if MBA, 0 if not
Estimated Regression:
$$\hat{y} = 6{,}974 + 2{,}055 x_1 + 35{,}236 x_2$$
Using Qualitative Independent Variables
(Example 12-2)
If No MBA:
$$\hat{y} = 6{,}974 + 2{,}055 x_1 + 35{,}236(0)$$
$$\hat{y} = 6{,}974 + 2{,}055 x_1$$
If MBA:
$$\hat{y} = 6{,}974 + 2{,}055 x_1 + 35{,}236(1)$$
$$\hat{y} = 42{,}210 + 2{,}055 x_1$$
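The effect of the dummy variable in Example 12-2 can be illustrated by evaluating the estimated regression for the same age with and without an MBA. The coefficients are those quoted in the example; the function name and the age of 30 are illustrative choices.

```python
def predict_salary(age, has_mba):
    """Estimated regression from Example 12-2: y_hat = 6,974 + 2,055*x1 + 35,236*x2."""
    x2 = 1 if has_mba else 0  # dummy variable: 1 if MBA, 0 if not
    return 6974 + 2055 * age + 35236 * x2


age = 30  # hypothetical age used only for illustration
without_mba = predict_salary(age, has_mba=False)
with_mba = predict_salary(age, has_mba=True)

print(f"No MBA: {without_mba:,}")
print(f"MBA:    {with_mba:,}")
print(f"Difference: {with_mba - without_mba:,}")
```

The difference equals the dummy coefficient, 35,236, at every age: the dummy variable shifts the intercept while leaving the slope on age unchanged.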
Using Qualitative Independent Variables
(Figure 12-11)
[Figure 12-11: Salary (vertical axis, 0 to 200,000) plotted against Age (horizontal axis, 0 to 70), showing two lines with the same slope: MBAs, $\hat{y} = 42{,}210 + 2{,}055 x_1$, and Non-MBAs, $\hat{y} = 6{,}974 + 2{,}055 x_1$.]
$b_2$ = 35,236 = Regression coefficient on the dummy variable
Stepwise Regression
Stepwise regression refers to a method which develops the least squares regression equation in steps, either through forward selection, backward elimination, or through standard stepwise regression.
Stepwise Regression
The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model.
Best Subsets Regression
Cp STATISTIC
$$C_p = \frac{(1 - R_p^2)(n - T)}{1 - R_T^2} - (n - 2p)$$
where:
p = (Number of independent variables in model) + 1
T = 1 + The total number of independent variables to be considered for inclusion in the model
$R_p^2$ = Coefficient of multiple determination for the model with p parameters
$R_T^2$ = Coefficient of multiple determination for the model that contains all T parameters
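A brief sketch of the Cp calculation follows. The function name and all input values are hypothetical; n denotes the number of data points, as elsewhere in the chapter.

```python
def cp_statistic(r2_p, r2_t, n, t, p):
    """Cp = (1 - R_p^2)(n - T) / (1 - R_T^2) - (n - 2p)."""
    return (1.0 - r2_p) * (n - t) / (1.0 - r2_t) - (n - 2 * p)


# Hypothetical setting: 4 candidate independent variables, so T = 5, with n = 25.
n, t = 25, 5
r2_full = 0.80

# A sub-model with 2 independent variables (p = 3) and a slightly lower R-squared.
cp_sub = cp_statistic(r2_p=0.75, r2_t=r2_full, n=n, t=t, p=3)

# The full model itself (p = T, R_p^2 = R_T^2).
cp_full = cp_statistic(r2_p=r2_full, r2_t=r2_full, n=n, t=t, p=t)

print(f"Cp for the sub-model:  {cp_sub:.2f}")
print(f"Cp for the full model: {cp_full:.2f}")
```

For the full model the formula reduces algebraically to Cp = T. A common guideline, not stated on the slide, is to prefer sub-models whose Cp is close to or below p.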
Key Terms
• Adjusted R-Squared
• Correlation Coefficient
• Correlation Matrix
• Dummy Variables
• Multicollinearity
• Multiple Coefficient of Determination
• Multiple Regression Model
Key Terms (continued)
• Residual (Prediction Error)
• Standard Error of the Estimate
• Standardized Residual
• Variance Inflation Factor